24Hibench

1. Hibench

官网

​ HiBench is a big data benchmark suite that helps evaluate different big data frameworks in terms of speed, throughput and system resource utilizations. It contains a set of Hadoop, Spark and streaming workloads, including Sort, WordCount, TeraSort, Repartition, Sleep, SQL, PageRank, Nutch indexing, Bayes, Kmeans, NWeight and enhanced DFSIO, etc. It also contains several streaming workloads for Spark Streaming, Flink, Storm and Gearpump.

1.1 workloads

There are totally 29 workloads in HiBench. The workloads are divided into 6 categories which are micro, ml(machine learning), sql, graph, websearch and streaming.

1.2 install maven

首先需要安装maven,并配好环境安装教程

mkdir repo
cd conf
vim settings.xml

修改仓库地址

<localRepository>/opt/module/maven/apache-maven-3.8.6/repo</localRepository>

阿里云镜像文件中已经有了,注释掉其他mirror

  <mirror><id>alimaven</id><name>aliyun maven</name><url>http://maven.aliyun.com/nexus/content/groups/public/</url><mirrorOf>central</mirrorOf>
</mirror>

1.3 bulid

下载zip文件,上传解压后的文件 不要使用7.11 会出现版本问题

参照文档中的,构建HiBench项目,我使用的是全部安装:

#ALL
mvn -Dspark=3.1 -Dscala=2.12 clean package
#SPARK
mvn -Psparkbench -Dspark=3.1 -Dscala=2.12 clean package

image-20221113191059611

如果出现以下错误:

image-20221113171241273

原因是maven没有安装好,没有设置好镜像以及安装仓库,详情见安装教程

build成功

image-20221114134337141

第二次

image-20221214105422318

1.4 configure

hadoop.conf

image-20221122101805886

spark.conf

image-20221122101910798

Input data size

image-20230408154837846

if you chose a real large data size ,you may find the errors:

image-20230608165007024

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

you need to modify the mapred-site.xml, and add the context:

<property><name>mapred.task.timeout</name><value>800000</value><final>true</final>
</property>
cluster mode
vim /opt/module/hibench/HiBench-master/HiBench-master/bin/functions/workload_functions.sh# 修改run_spark_job 方法

image-20230702113310572

1.5 hadoop example

/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/wordcount/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/wordcount/hadoop/run.sh

官网地址

运行成功

image-20221122121433468

image-20221122121637126

更详细的介绍

/opt/module/Hibench/HiBench-master/report/wordcount/hadoop

1.6 spark example

准备输入数据

/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/wordcount/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/terasort/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/sort/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/ml/kmeans/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/ml/bayes/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/ml/lr/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/websearch/pagerank/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/graph/nweight/prepare/prepare.sh

控制台输出

image-20221122102053983

yarn

image-20221122102505411

进入HDFS查看准备的输入数据

image-20221122102227590

准备命令

/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/wordcount/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/terasort/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/sort/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/ml/kmeans/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/ml/bayes/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/ml/lr/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/websearch/pagerank/prepare/prepare.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/graph/nweight/prepare/prepare.sh

注意jar文件夹中不要包含其他备用的jar包

运行命令

/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/wordcount/spark/run.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/terasort/spark/run.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/micro/sort/spark/run.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/ml/kmeans/spark/run.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/ml/bayes/spark/run.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/ml/lr/spark/run.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/websearch/pagerank/spark/run.sh/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/graph/nweight/spark/run.sh

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

成功!

image-20221214112321741

结果

image-20230610145858128

trouble

网络配置问题

所有任务在yarn上都用的是内网IP

image-20230608180507018

image-20230609113252954

Permission denied

# 进入bin目录
chmod -R +x ./bin/

multi-job

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

Parsing conf: /opt/module/hibench/HiBench-master/HiBench-master/conf/hibench.conf
Parsing conf: /opt/module/hibench/HiBench-master/HiBench-master/conf/spark.conf
Parsing conf: /opt/module/hibench/HiBench-master/HiBench-master/conf/workloads/websearch/pagerank.conf
probe sleep jar: /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.1.3-tests.jar
ERROR, execute cmd: '( /opt/module/hadoop-3.1.3/bin/yarn node -list 2> /dev/null | grep RUNNING )' timedout.STDOUT:STDERR:Please check!
Traceback (most recent call last):File "/opt/module/hibench/HiBench-master/HiBench-master/bin/functions/load_config.py", line 685, in <module>load_config(conf_root, workload_configFile, workload_folder, patching_config)File "/opt/module/hibench/HiBench-master/HiBench-master/bin/functions/load_config.py", line 217, in load_configgenerate_optional_value()File "/opt/module/hibench/HiBench-master/HiBench-master/bin/functions/load_config.py", line 613, in generate_optional_valueprobe_masters_slaves_hostnames()File "/opt/module/hibench/HiBench-master/HiBench-master/bin/functions/load_config.py", line 549, in probe_masters_slaves_hostnamesprobe_masters_slaves_by_Yarn()File "/opt/module/hibench/HiBench-master/HiBench-master/bin/functions/load_config.py", line 500, in probe_masters_slaves_by_Yarnassert 0, "Get workers from yarn-site.xml page failed, reason:%s\nplease set `hibench.masters.hostnames` and `hibench.slaves.hostnames` manually" % e
AssertionError: Get workers from yarn-site.xml page failed, reason:( /opt/module/hadoop-3.1.3/bin/yarn node -list 2> /dev/null | grep RUNNING ) executed timedout for 5 seconds
please set `hibench.masters.hostnames` and `hibench.slaves.hostnames` manually
start ScalaSparkPagerank bench
/opt/module/hibench/HiBench-master/HiBench-master/bin/functions/workload_functions.sh: line 38: .: filename argument required
.: usage: . filename [arguments]
/opt/module/hibench/HiBench-master/HiBench-master/bin/workloads/websearch/pagerank/spark/run.sh: line 26: OUTPUT_HDFS: unbound variable

原因可能是系统负载过高导致的响应迟钝

image-20230710195619538

2.records

parallelism = 18

有input的stage的任务数是由数据数据的大小决定的,spark.default.parallelism决定的是shuffle后的stage的任务数

LR

内存不够

http://192.168.10.102:3000/d/e9e40733-bb3a-42c8-8704-38ec27cbee3f/spark-perf-dashboard-v04-custom?from=1692258304054&to=1692264023063&orgId=1&var-UserName=jaken&var-ApplicationId=app-20230817154625-0003&var-groupbyInterval=1s

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

image-20230710222144197

http://192.168.10.102:18080/history/app-20230817154625-0003/stages/

image-20230817182124188

image-20230823160549361

Bayes

http://192.168.10.102:3000/d/e9e40733-bb3a-42c8-8704-38ec27cbee3f/spark-perf-dashboard-v04-custom?from=1692265076383&to=1692265140060&orgId=1&var-UserName=jaken&var-ApplicationId=app-20230817173746-0004&var-groupbyInterval=1s

image-20230817182326359

http://192.168.10.102:18080/history/app-20230817173746-0004/jobs/

image-20230817182448207

image-20230817182410877

image-20230823160616907

NWeightGraphX

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

image-20230710222031686

ScalaPageRank

http://192.168.10.102:3000/d/e9e40733-bb3a-42c8-8704-38ec27cbee3f/spark-perf-dashboard-v04-custom?from=1692277885573&to=1692278062264&orgId=1&var-UserName=jaken&var-ApplicationId=app-20230817211121-0012&var-groupbyInterval=1s

image-20230817212034137

http://192.168.10.102:18080/history/app-20230817211121-0012/stages/

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

distinct

image-20230817211940302

flatMap

image-20230817211756998

DenseKMeans

http://192.168.10.102:3000/d/e9e40733-bb3a-42c8-8704-38ec27cbee3f/spark-perf-dashboard-v04-custom?from=1692273890319&to=1692274051643&orgId=1&var-UserName=jaken&var-ApplicationId=app-20230817200442-0009&var-groupbyInterval=1s

image-20230817203320695

http://192.168.10.102:18080/history/app-20230817200442-0009/stages/

image-20230817203354123

image-20230823160320490

map

image-20230817210720121

collect

image-20230817210901877

image-20230817210942909

sort

http://192.168.10.102:3000/d/e9e40733-bb3a-42c8-8704-38ec27cbee3f/spark-perf-dashboard-v04-custom?from=1692276453862&to=1692276644236&orgId=1&var-UserName=jaken&var-ApplicationId=app-20230817204743-0011&var-groupbyInterval=1s

image-20230814105408629

http://192.168.10.102:18080/history/app-20230817204743-0011/stages/

image-20230817205354207

map

image-20230817205420719

reduce

image-20230817205507056

TeraSort

http://192.168.10.102:3000/d/e9e40733-bb3a-42c8-8704-38ec27cbee3f/spark-perf-dashboard-v04-custom?from=1692275803257&to=1692276079162&orgId=1&var-UserName=jaken&var-ApplicationId=app-20230817203719-0010&var-groupbyInterval=1s

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

http://192.168.10.102:18080/history/app-20230817203719-0010/stages/

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

map

外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传

reduce

image-20230817205732605

WordCount

http://192.168.10.102:3000/d/e9e40733-bb3a-42c8-8704-38ec27cbee3f/spark-perf-dashboard-v04-custom?from=1691983226006&to=1691983545077&orgId=1&var-UserName=jaken&var-ApplicationId=app-20230814112024-0002&var-groupbyInterval=1s

image-20230814112936597

parallelism=20

image-20230817210134565

http://192.168.10.102:18080/history/app-20230814112024-0002/jobs/

map

image-20230814113241553

reduce

image-20230814113216632

hdfs维护

hadoop fs -rm -r -skipTrash /hibench_test/HiBench/

hadoop dfsadmin -safemode leave

var-ApplicationId=app-20230814112024-0002&var-groupbyInterval=1s

[外链图片转存中…(img-eGnFeoml-1696143711685)]

parallelism=20

[外链图片转存中…(img-tqMq87MT-1696143711685)]

http://192.168.10.102:18080/history/app-20230814112024-0002/jobs/

map

[外链图片转存中…(img-ckoZlP4I-1696143711686)]

reduce

[外链图片转存中…(img-yA8CbUjp-1696143711686)]

hdfs维护

hadoop fs -rm -r -skipTrash /hibench_test/HiBench/

hadoop dfsadmin -safemode leave

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.mzph.cn/news/93433.shtml

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

JavaSE | 初始Java(九) | 包的使用

包 包是对类、接口等的封装机制的体现&#xff0c;是一种对类或者接口等的很好的组织方式&#xff0c;比如&#xff1a;一个包中的类不想被其他包中的类使用。包还有一个重要的作用&#xff1a;在同一个工程中允许存在相同名称的类&#xff0c;只要处在不同的包中即可。 可以…

软断言你也学不会

断言是测试用例的一部分&#xff0c;也是测试工程师开发测试用例的核心。断言通常集成在单元测试和集成测试中&#xff0c;断言分为硬断言和软断言。 硬断言是我们狭义上听到的普通断言:当用例运行后得到的[实际]结果与预期结果不匹配时&#xff0c;测试框架将停止测试执行并抛…

Python实时采集Windows CPU\MEMORY\HDD使用率

文章目录 安装psutil库在Python脚本中导入psutil库获取CPU当前使用率&#xff0c;并打印结果获取内存当前使用率&#xff0c;并打印结果获取磁盘当前使用情况&#xff0c;并打印结果推荐阅读 要通过Python实时采集Windows性能计数器的数据&#xff0c;你可以使用psutil库。psut…

AutoCAD 产品设计:图形单位

本文讲解 AutoCAD 产品的图形单位功能产品设计&#xff0c;没有任何代码实现。 使用的 AutoCAD 为 2020 版本 图形单位是什么&#xff1f; 图形单位是用于设置 一些属性数据应该用什么格式显示 的命令&#xff0c;命令标识为 un&#xff08;units&#xff09;。 举个例子。 …

WebGL笔记:绘制多个点,三角形,以及画各种不同的线条,面

绘制多点 1 &#xff09; WebGL 缓冲区 我们在用js定点位的时候&#xff0c;肯定是要建立一份顶点数据的&#xff0c;这份顶点数据是给着色器的&#xff0c;因为着色器需要这份顶点数据绘图然而&#xff0c;我们在js中建立顶点数据&#xff0c;着色器肯定是拿不到的&#xff…

基于SpringBoot的反诈宣传平台设计与实现(源码+lw+部署文档+讲解等)

文章目录 前言具体实现截图论文参考详细视频演示为什么选择我自己的网站自己的小程序&#xff08;小蔡coding&#xff09;有保障的售后福利 代码参考源码获取 前言 &#x1f497;博主介绍&#xff1a;✌全网粉丝10W,CSDN特邀作者、博客专家、CSDN新星计划导师、全栈领域优质创作…

SpringBoot注册web组件

目录 前言 一、注册Servlet组件 1.1 使用SpringBoot注解加继承HttpServet类注册 1.2 通过继承HttpServet类加配置类来进行注册 二、注册Listener组件 2.1 使用SpringBoot注解和实现ServletContextListener接口注册 2.2 ServletContextListener接口和配置类来进行注册 …

【Spring】Spring 创建和使用

Spring 创建和使用 一. 创建 Spring 项目1. 创建⼀个 Maven 项目2. 添加 Spring 框架⽀持3. 添加启动类 二. 存储 Bean 对象1. 创建 Bean2. 将 Bean 注册到容器 三. 获取并使⽤ Bean 对象1. 创建 Spring 上下文2. 获取指定的 Bean 对象3. 使用 Bean Spring 就是⼀个包含了众多⼯…

【vue3】wacth监听,监听ref定义的数据,监听reactive定义的数据,详解踩坑点

假期第二篇&#xff0c;对于基础的知识点&#xff0c;我感觉自己还是很薄弱的。 趁着假期&#xff0c;再去复习一遍 之前已经记录了一篇【vue3基础知识点-computed和watch】 今天在学习的过程中发现&#xff0c;之前记录的这一篇果然是很基础的&#xff0c;很多东西都讲的不够…

python复习

1.python属于解释型语言&#xff0c;解释器逐行解释每一句代码&#xff0c;然后执行 编译型语言需要由编译器生成最终可执行文件再执行 2. #单行注释""" 多行注释 """ 注释快捷键ctrl/ 3.变量是在计算机语言中能储存计算结果或表示某个数据…

Docker介绍与安装

目录 一、Docker 概述 1、什么时Docker 2、Docker的设计宗旨 4、Docker的优点 5、Docker容器和虚拟机的区别 6、 namespace的隔离&#xff08;命名空间&#xff09; 7、 Docker的三个核心概念 7.1 镜像 7.2 容器 7.3 仓库&#xff08;Docker Hapu&#xff09; 二、D…

Sentinel-2波段合成

Sentinel-2波段合成 在上一篇博客中下载了Sentinel-2数据&#xff0c;他有13个波段的.jp2文件&#xff0c;下面选取需要使用的波段进行合成。 导入了B2&#xff08;蓝色&#xff09;、B3&#xff08;绿色&#xff09;、B4&#xff08;红色&#xff09;、B8&#xff08;近红外&…

Linux--网络编程-字节序

进程间的通信&#xff1a; 管道、消息队列、共享内存、信号、信号量。 特点&#xff1a;都依赖于linux内核。 缺陷&#xff1a;无法多机通信。 一、网络编程&#xff1a; 1、地址&#xff1a;基于网络&#xff0c;ip地址端口号。 端口号作用&#xff1a; 一台拥有ip地址的主机…

Windows11安装MySQL8.1

安装过程中遇到任何问题均可以参考(这个博客只是单纯升级个版本和简化流程) Windows安装MySQL8教程-CSDN博客 到官网下载mysql8数据库软件 MySQL :: Download MySQL Community Server 下载完后,解压到你需要安装的文件夹 其中的配置文件内容了如下 [mysqld]# 设置3306端口po…

(c++)类和对象 下篇

目录 1.再次了解构造函数 2. Static成员 3. 友元 4. 内部类 5.匿名对象 6.拷贝对象时的一些编译器优化 1.再次了解构造函数 1.1 构造函数体赋值 在创建对象时&#xff0c;编译器通过调用构造函数&#xff0c;给对象中各个成员变量一个合适的初始值。 class Date { pub…

WARNING:tensorflow:Your input ran out of data; interrupting training. 解决方法

问题详情&#xff1a; WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least steps_per_epoch * epochs batches (in this case, 13800 batches). You may need to use the repeat() funct…

1.7.C++项目:仿muduo库实现并发服务器之Poller模块的设计

项目完整在&#xff1a; 文章目录 一、Poller模块&#xff1a;描述符IO事件监控模块二、提供的功能三、实现思想&#xff08;一&#xff09;功能&#xff08;二&#xff09;意义&#xff08;三&#xff09;功能设计 四、封装思想五、代码&#xff08;一&#xff09;框架&#…

Springboo整合Sentinel

Springboo整合Sentinel 1.启动Sentinel java -jar sentinel-dashboard-1.8.6.jar2.访问localhost:8080到Sentinel管理界面(默认账号和密码都是sentinel) 3.引入依赖(注意版本对应) <dependency><groupId>com.alibaba.cloud</groupId><artifactId>spr…

stm32 - 中断/定时器

stm32 - 中断/定时器 概念时钟树定时器类型基准时钟&#xff08;系统时钟&#xff09;预分频器 - 时基单元CNT计数器 - 时基单元自动重装寄存器 - 时基单元基本定时器结构通用定时器计数器模式内外时钟源选择 定时中断基本结构时序预分频器时序计数器时序 概念 时钟树 https:…

Vue中如何进行多语言处理

Vue中的多语言处理 在开发多语言Web应用程序时&#xff0c;处理文本翻译和国际化是一个重要的任务。Vue.js提供了多种方法来实现多语言处理&#xff0c;以确保您的应用程序能够支持不同语言的用户。本文将深入探讨在Vue中进行多语言处理的方法&#xff0c;并提供示例代码来帮助…