First, upload the January order data to HDFS. Each order record consists of two fields, ID and Goods, separated by a space.
Save the order data in a file named order.txt before uploading (remember to start the cluster first).
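For reference, order.txt might look like the following (the goods names are made-up placeholders; each line is a user ID and a goods name separated by a single space):

```text
1 apple
2 banana
3 apple
3 orange
```

It can then be uploaded with, for example, hdfs dfs -put order.txt /bigdata/ (the /bigdata directory matches the input path used when running the job later).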
Open IDEA and create a new project.
Modify pom.xml and add the dependencies:
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>3.1.4</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-log4j12</artifactId>
        <version>1.7.30</version>
    </dependency>
</dependencies>
Specify the packaging type as jar.
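In pom.xml this is the packaging element:

```xml
<packaging>jar</packaging>
```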
Plugin configuration for packaging:
<build>
    <plugins>
        <plugin>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.1</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <!-- The single goal must be bound here, otherwise the assembly is never built -->
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
Create a new log4j configuration file named log4j.properties under the resources directory:
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=D:\\ordercount.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n
Create a new class ShoppingOrderCount in the com.maidu.ordercount package and implement the following modules.
1. Writing the Mapper module
Define an inner class MyMap in ShoppingOrderCount:
public static class MyMap extends Mapper<Object, Text, Text, IntWritable> {
    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        // A line such as "3 apple" holds the user ID and the goods name.
        // Emit (goods, 1): the 1 is the count for this order, not the user ID.
        String[] arr = line.split(" ");
        if (arr.length == 2) {
            context.write(new Text(arr[1]), new IntWritable(1));
        }
    }
}
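The mapper's parsing step can be sanity-checked on its own; a minimal sketch using a made-up order line:

```java
public class SplitCheck {
    public static void main(String[] args) {
        // "3 apple" is a hypothetical order line: user 3 ordered an apple
        String[] arr = "3 apple".split(" ");
        System.out.println(arr.length);  // 2, so the mapper would emit a pair
        System.out.println(arr[1]);      // "apple" becomes the output key
    }
}
```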
2. Writing the Reducer module
Define an inner class MyReduce in ShoppingOrderCount:
public static class MyReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Every value is a 1 emitted by the mapper, so counting the values
        // gives the number of orders for this goods item.
        int count = 0;
        for (IntWritable val : values) {
            count++;
        }
        context.write(key, new IntWritable(count));
    }
}
3. Writing the Driver module
Write the main method in the ShoppingOrderCount class:
public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length < 2) {
        System.out.println("An input path and an output path are required");
        System.exit(2);
    }
    Job job = Job.getInstance(conf, "order count");
    job.setJarByClass(ShoppingOrderCount.class);
    job.setMapperClass(MyMap.class);
    job.setReducerClass(MyReduce.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // Add the input paths (all arguments except the last)
    for (int i = 0; i < otherArgs.length - 1; i++) {
        FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
    }
    // Set the output path (the last argument)
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
    // Run the job
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}
4. Use Maven to compile and package the project into a jar.
Run the lifecycle steps from top to bottom in IDEA's Maven panel, four steps in all; the jar file is generated under the target directory (equivalently, run mvn clean package from the command line).
5. Copy orderCount-1.0-SNAPSHOT.jar to the master host, e.g. with scp target/orderCount-1.0-SNAPSHOT.jar yt@master:~/.
6. Run the jar
[yt@master ~]$ hadoop jar orderCount-1.0-SNAPSHOT.jar com.maidu.ordercount.ShoppingOrderCount /bigdata/order.txt /output-2301-02/
7. After the job completes, check the results, e.g. with hdfs dfs -cat /output-2301-02/part-r-00000.
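Each line of the reducer output is a goods name and its order count separated by a tab; a hypothetical example (the goods names and counts depend on the actual data):

```text
apple	3
banana	1
orange	1
```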
Note: if the job fails with insufficient virtual memory (an error such as "is running 261401088B beyond the 'VIRTUAL' memory limit. Current usage: 171.0 MB of 1 GB physical memory used"), see the CSDN blog post on this error.
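A commonly used workaround for that error is to relax YARN's virtual-memory check in yarn-site.xml on every node and then restart YARN; this is a sketch, not a tuning recommendation:

```xml
<!-- Disable YARN's virtual-memory check entirely -->
<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>
<!-- Or raise the allowed virtual-to-physical memory ratio (default 2.1) -->
<property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>4</value>
</property>
```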