Testing a jar on a Hadoop cluster (MapReduce)

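This post uses a MapReduce example, which counts the total number of occurrences of each word in a given text document, to walk through testing a jar on a cluster.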
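As a reference for what the job is expected to compute, here is a minimal plain-Java sketch of the word-count logic. It uses no Hadoop classes. The class name `WordCountSketch` and its methods are hypothetical and are not part of the original project; splitting on whitespace is an assumption about how the mapper tokenizes lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: mimics the map and reduce steps in memory.
class WordCountSketch {

    // "Map" step: split one line on whitespace and emit each non-empty word.
    static List<String> map(String line) {
        List<String> words = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    // "Reduce" step: sum the occurrences of each word.
    // A TreeMap keeps the words sorted by key.
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String line : lines) {
            for (String word : map(line)) {
                totals.merge(word, 1, Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hadoop hdfs", "hadoop mapreduce");
        // Print one "word<TAB>count" pair per line.
        count(lines).forEach((word, total) -> System.out.println(word + "\t" + total));
    }
}
```

Running a small sample through this sketch locally gives a quick baseline to compare against the cluster output.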

1. Add the packaging plugins

    <build>
        <plugins>
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <!-- replace with the version you use -->
                <version>3.6.2</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                    <archive>
                        <manifest>
                            <!-- replace with the main class of your own project -->
                            <mainClass>com.lizhengi.mr.WordcountDriver</mainClass>
                        </manifest>
                    </archive>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

2. Modify WcDriver

Change

FileInputFormat.setInputPaths(job, "/Users/marron27/test/input");
FileOutputFormat.setOutputPath(job, new Path("/Users/marron27/test/output"));

to

FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
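Once the paths come from `args`, running the driver without both arguments fails with an `ArrayIndexOutOfBoundsException`. One possible way to fail with a clearer message is to validate the arguments first. The sketch below is plain Java; the `DriverArgs` class is hypothetical and not part of the original project.

```java
// Hypothetical helper: validates the two command-line paths before the job is configured.
class DriverArgs {
    final String inputPath;
    final String outputPath;

    private DriverArgs(String inputPath, String outputPath) {
        this.inputPath = inputPath;
        this.outputPath = outputPath;
    }

    // Expects exactly two arguments: the input path and the output path.
    static DriverArgs parse(String[] args) {
        if (args == null || args.length != 2) {
            throw new IllegalArgumentException(
                    "Expected 2 arguments: <input path> <output path>");
        }
        return new DriverArgs(args[0], args[1]);
    }

    public static void main(String[] args) {
        DriverArgs parsed = parse(args);
        System.out.println("input=" + parsed.inputPath + ", output=" + parsed.outputPath);
    }
}
```

In the driver, `parsed.inputPath` and `parsed.outputPath` would then take the place of `args[0]` and `args[1]` when constructing the `Path` objects.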

3. Package the program as a jar, then copy it to the Hadoop cluster

Select the Maven project.
Choose Hadoop_API >> Lifecycle >> package.

Packaging is complete.

4. Rename the jar without dependencies to wc.jar and copy it to the Hadoop cluster

mv Hadoop-API-1.0-SNAPSHOT.jar wc.jar
scp wc.jar root@Carlota1:/root/test/input

5. Create a test input file and upload it to HDFS

ssh root@Carlota1
hadoop fs -copyFromLocal hello.txt /demo/test/input

6. Run the WordCount program

hadoop jar wc.jar com.lizhengi.mapreduce.WcDriver /demo/test/input /demo/test/output
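
Here I ran into a problem where the job hung at INFO mapreduce.Job: Running job: job_1595222530661_0003; modifying mapred-site.xml resolved it.

After the job finishes, download the result to the local machine and view it:

hadoop fs -copyToLocal /demo/test/output /root/test/output
cat /root/test/output/part-r-00000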

flume	2
hadoop	2
hdfs	1
hive	1
kafka	2
mapreduce	1
spark	1
spring	1
take	2
tomcat	2
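To check the downloaded result programmatically, each line of the output file can be parsed as a word followed by a tab and a count. The following is a minimal plain-Java sketch; the `ReducerOutputParser` class is hypothetical, and it assumes the word/tab/count layout shown in the listing above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: parses "word<TAB>count" lines such as those in part-r-00000.
class ReducerOutputParser {

    static Map<String, Long> parse(List<String> lines) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            // The count is whatever follows the last tab on the line.
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                throw new IllegalArgumentException("Expected <word>TAB<count>: " + line);
            }
            String word = line.substring(0, tab).trim();
            long count = Long.parseLong(line.substring(tab + 1).trim());
            counts.put(word, count);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the local path of the downloaded output file.
        Map<String, Long> counts = parse(Files.readAllLines(Paths.get(args[0])));
        counts.forEach((word, count) -> System.out.println(word + " -> " + count));
    }
}
```

Because the word is trimmed and the count is taken after the last tab, stray spaces or doubled tabs between the two fields do not affect the parsed values.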
