1.1 File preparation
Create a local directory and two text files, and enter some words into each file; these will be used for the word-frequency count (an illustrative sample follows the commands below).
cd /usr/local/hadoop
mkdir WordFile
cd WordFile
touch wordfile1.txt
touch wordfile2.txt
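For example, a few words can be written into the two files (the content here is only an illustration; any words will do):
echo "hello world" > wordfile1.txt
echo "hello hadoop" > wordfile2.txt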
1.2 Create an HDFS directory (it lives in HDFS and is not visible in the local file system) and upload the local text files into it. Use the following commands.
cd /usr/local/hadoop
./bin/hdfs dfs -mkdir wordfileinput
./bin/hdfs dfs -put ./WordFile/wordfile1.txt wordfileinput
./bin/hdfs dfs -put ./WordFile/wordfile2.txt wordfileinput
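To verify that the upload succeeded, list the HDFS directory:
./bin/hdfs dfs -ls wordfileinput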
1.3 Make sure the output directory does not exist in HDFS by running the following command; the output directory must be deleted before every run of the word count. Note that /user/hadoop/ is the HDFS user directory, not a local directory.
./bin/hdfs dfs -rm -r /user/hadoop/output
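On the very first run, output does not exist yet and the command prints an error that can be safely ignored; adding the -f flag suppresses the message:
./bin/hdfs dfs -rm -r -f /user/hadoop/output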
1.4 Writing the code in Eclipse
Create a Java project named MapReduceWordCount, right-click the project name, and import the required JAR packages.
1.5 Click Add External JARs, go to the directory /usr/local/hadoop/share/hadoop, and import the following packages (a version check is sketched after this list):
- hadoop-common-3.1.3.jar and hadoop-nfs-3.1.3.jar from the “/usr/local/hadoop/share/hadoop/common” directory;
- all JAR files in the “/usr/local/hadoop/share/hadoop/common/lib” directory;
- all JAR files in the “/usr/local/hadoop/share/hadoop/mapreduce” directory, excluding the jdiff, lib, lib-examples, and sources subdirectories;
- all JAR files in the “/usr/local/hadoop/share/hadoop/mapreduce/lib” directory.
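The version number 3.1.3 reflects the Hadoop release assumed by this tutorial; if your release differs, check the actual JAR names first, for example:
ls /usr/local/hadoop/share/hadoop/common/hadoop-common-*.jar
ls /usr/local/hadoop/share/hadoop/common/hadoop-nfs-*.jar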
1.6 Create the class WordCount.java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class WordCount {

    public WordCount() {
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: wordcount <in> [<in>...] <out>");
            System.exit(2);
        }
        // Configure the job: the mapper tokenizes lines, the combiner and reducer sum counts.
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordCount.TokenizerMapper.class);
        job.setCombinerClass(WordCount.IntSumReducer.class);
        job.setReducerClass(WordCount.IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // All arguments except the last one are input paths; the last one is the output path.
        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public TokenizerMapper() {
        }

        // Split each input line into whitespace-separated tokens and emit (word, 1) for each.
        public void map(Object key, Text value, Mapper<Object, Text, Text, IntWritable>.Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                this.word.set(itr.nextToken());
                context.write(this.word, one);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public IntSumReducer() {
        }

        // Sum all counts received for a word and emit (word, total).
        public void reduce(Text key, Iterable<IntWritable> values,
                           Reducer<Text, IntWritable, Text, IntWritable>.Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            this.result.set(sum);
            context.write(key, this.result);
        }
    }
}
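Note that IntSumReducer serves as both the combiner and the reducer: word counting is a sum, which is associative and commutative, so partial sums can safely be computed on the map side before the shuffle, reducing the amount of data sent over the network.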
1.7 Compile and package the program
Package the program into the /usr/local/hadoop/myapp directory:
cd /usr/local/hadoop
mkdir myapp
- Run the program once with Run As; this creates the launch configuration used during export;
- Right-click the project name -> Export -> Java -> Runnable JAR file;
- “Launch configuration” specifies the main class to run when the generated JAR is deployed and launched; select the configuration created just now, “WordCount-MapReduceWordCount”, from the drop-down list. “Export destination” specifies the directory and file name for the generated JAR, here /usr/local/hadoop/myapp/WordCount.jar. Click Finish; several dialogs will appear along the way, and clicking OK through them is fine.
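To confirm that the class was packaged correctly, the JAR's contents can be inspected (assuming the JDK's jar tool is on the PATH):
jar tf /usr/local/hadoop/myapp/WordCount.jar | grep WordCount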
1.8 Run the program
Start Hadoop:
cd /usr/local/hadoop
./sbin/start-dfs.sh
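Optionally, confirm that the HDFS daemons are up with jps (part of the JDK); the list should include at least NameNode, DataNode, and SecondaryNameNode:
jps
Then submit the word count job: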
./bin/hadoop jar ./myapp/WordCount.jar wordfileinput output
1.9 View the results
cd /usr/local/hadoop
./bin/hdfs dfs -cat output/*
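Each output line is a word and its count, separated by a tab. With the illustrative sample files from step 1.1, the result would look like:
hadoop	1
hello	2
world	1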
1.10 Browse the HDFS file system
Go to the /usr/local/hadoop/bin directory and run the relevant commands, for example:
./hadoop fs -ls
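With no path argument, this lists the HDFS user directory /user/hadoop. A specific directory can also be passed, for example to see the job's output files (typically a _SUCCESS marker and a part-r-00000 result file):
./hadoop fs -ls output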
1.11 Source document
http://dblab.xmu.edu.cn/blog/2481-2/#more-2481