一、Objective
Build the Hadoop WordCount example in Eclipse, connect it to the Hadoop environment running on a virtual machine, and run the word count.
二、Procedure
2.1 Software versions
jdk 1.8.0_31
hadoop 2.7.3
2.2 Eclipse plugin installation (see https://www.cnblogs.com/zimo-jing/p/8579065.html)
2.3 Write WordCountMapper.java in the project
package com.lizp.test.mapper;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCountMapper {

    // Map phase: emit (word, 1) for every token in the input line
    public static class WordCountMap extends Mapper<Object, Text, Text, IntWritable> {
        private final IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer token = new StringTokenizer(line);
            while (token.hasMoreTokens()) {
                word.set(token.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reduce phase: sum the counts for each word
    public static class WordCountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        conf.set("mapreduce.cluster.local.dir", "E:\\hadoop-data\\tmp");

        Job job = Job.getInstance(conf, "lizp-worldcounter");
        job.setJarByClass(WordCountMapper.class);
        job.setMapperClass(WordCountMap.class);
        job.setCombinerClass(WordCountReduce.class);   // pre-aggregate on the map side
        job.setReducerClass(WordCountReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));    // args[0]: input path on HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // args[1]: output path, must not exist yet
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
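Because word counts are plain sums, the same WordCountReduce class also serves as the combiner, which cuts down the data shuffled from map to reduce. To run the job on the VM instead of from Eclipse, one possible flow (the jar name wordcount.jar and the output directory are only examples) is:

# export the project as a runnable jar from Eclipse, copy it to the VM, then:
hadoop jar wordcount.jar com.lizp.test.mapper.WordCountMapper /input/a.txt /output6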
2.4 Upload the test data and configure the run arguments
2.4.1 Upload a.txt to hdfs://172.16.77.186:9000/input/a.txt
hello hadoop hello ketty hello cat hadoop
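One way to do the upload from the VM shell, assuming the Hadoop binaries are on PATH and a.txt is in the current directory:

hdfs dfs -mkdir -p hdfs://172.16.77.186:9000/input
hdfs dfs -put a.txt hdfs://172.16.77.186:9000/input/a.txt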
2.4.2 Set the arguments in the Java run configuration
program arguments:hdfs://172.16.77.186:9000/input/a.txt hdfs://172.16.77.186:9000/output5
VM arguments:-Djava.library.path=$HADOOP_HOME/lib/native/Linux-amd64-64/
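If the paths were passed without the hdfs:// prefix, the client would also need to know which NameNode to talk to; a minimal sketch of setting that in the driver (not needed here, since the full hdfs:// URLs are given above):

conf.set("fs.defaultFS", "hdfs://172.16.77.186:9000"); // point the client at the NameNode on the VM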
2.5 Run result
cat 1
hadoop 2
hello 3
ketty 1
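The same result can be checked directly on HDFS; assuming a single reduce task, the output lands in part-r-00000:

hdfs dfs -cat hdfs://172.16.77.186:9000/output5/part-r-00000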
三、Summary
3.1 Problems encountered
3.1.1 On a single-node Hadoop install on Windows, configure the CPU and memory parameters, otherwise the MapReduce job hangs at 0%
https://blog.youkuaiyun.com/dai451954706/article/details/50464036
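The settings involved live in yarn-site.xml on the Hadoop node; a sketch of the kind of entries meant here (the property names are standard YARN ones, the values are only examples and must be sized to the machine):

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>2048</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>2</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>512</value>
</property>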
3.1.2 Eclipse plugin installation
https://www.cnblogs.com/supiaopiao/p/7240308.html
3.1.3 Modify the NativeIO source to skip the Windows permission check
https://blog.youkuaiyun.com/congcong68/article/details/42043093
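The workaround commonly described there (a sketch; compare it with the NativeIO.java shipped with your Hadoop version) is to copy org.apache.hadoop.io.nativeio.NativeIO.java into the project under the same package and make Windows.access() skip the native check:

public static boolean access(String path, AccessRight desiredAccess) throws IOException {
    // original implementation: return access0(path, desiredAccess.accessRight());
    return true; // skip the Windows permission check for local runs
}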
3.1.4 VM argument configuration
-Djava.library.path=$HADOOP_HOME/lib/native/Linux-amd64-64/
https://zhidao.baidu.com/question/1382439112860211060.html
3.1.5 Check the read permissions on the temporary directory Hadoop uses when running on Windows
E:\tmp\hadoop-T\mapred\staging\hadoop40875801
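If that directory is not readable/writable for the current user, one option (a sketch; the path is only an example) is to point hadoop.tmp.dir at a directory the user owns before the Job is created, since the mapred/staging directory is derived from it:

conf.set("hadoop.tmp.dir", "E:/hadoop-data/tmp"); // example path; pick any directory the current user can write to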
3.1.6 Note the initialization of private final IntWritable one = new IntWritable(1); in the code
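The reason this matters: context.write() serializes the key and value immediately, so one mutable IntWritable allocated up front can be reused for every record, as long as it is assigned before map() first runs (at the declaration, as above, or in setup()). For comparison:

// works, but allocates a new object for every token:
context.write(word, new IntWritable(1));
// reuse pattern from the mapper above: allocate once, write many times
context.write(word, one);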