The previous article covered running Hadoop's wordcount example on Linux; the natural next step is writing the MapReduce functions in Eclipse and connecting to the Hadoop cluster to run them.
Running the Hadoop wordcount example on Linux: https://mp.youkuaiyun.com/mdeditor/84143774#
Deploying a Hadoop cluster on Linux: https://mp.youkuaiyun.com/mdeditor/84073712#
1. Download the Hadoop plugin for Eclipse, hadoop2x-eclipse-plugin-2.6.0:
https://download.youkuaiyun.com/download/qq_22830285/10792412
After downloading, extract the archive and copy hadoop-eclipse-plugin-2.6.0.jar from the release directory into Eclipse's plugins directory.
2. Restart Eclipse and open Window -> Show View -> Other to bring up the Map/Reduce Locations view.
3. New Hadoop Location: configure the connection between Eclipse and the Hadoop cluster. The DFS Master host and port must match fs.defaultFS on the cluster (192.168.80.130:9000 in this setup, as used in the paths below). When the fields are filled in, click Finish.
4. Once the configuration is saved, a new DFS Locations entry appears in the Project Explorer.
5. If a permissions error appears when browsing HDFS, there are several fixes: add a HADOOP_USER_NAME=root environment variable to the system, rename the Windows user to root, or add the following to Hadoop's hdfs-site.xml to disable permission checking (a programmatic alternative follows the snippet):
<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>
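A fourth option, shown here only as a sketch (it is not in the original steps): set the user programmatically in the driver before any Job or FileSystem object is created, so the Hadoop client identifies itself as root without touching system settings or cluster config.

public static void main(String[] args) throws Exception {
    // Assumption: "root" is the user that owns /user/root on this cluster.
    System.setProperty("HADOOP_USER_NAME", "root");
    // ... job setup continues as in WordCountTest below ...
}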
6. Create a new Map/Reduce project.
Create the map function; the WordCountMap class is shown below:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMap extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // For each input line, emit (word, 1) for every whitespace-separated token.
    // E.g. the line "hello world hello" produces (hello,1), (world,1), (hello,1).
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer token = new StringTokenizer(line);
        while (token.hasMoreTokens()) {
            word.set(token.nextToken());
            context.write(word, one);
        }
    }
}
Create the reduce function, WordCountReduce:
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    // All counts for one word arrive together; sum them and emit (word, total).
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
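An optional optimization not in the original code: because summing counts is associative and commutative, this same reducer class can be registered as a combiner in the driver, pre-aggregating map output locally and cutting shuffle traffic.

// Hypothetical extra line for the driver, next to setReducerClass:
job.setCombinerClass(WordCountReduce.class);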
Create the WordCount main class, WordCountTest:
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCountTest {

    @SuppressWarnings("deprecation")
    public static void main(String[] args) throws Exception {
        Job job = new Job();
        job.setJarByClass(WordCountTest.class);
        job.setJobName("wordcount");

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(WordCountMap.class);
        job.setReducerClass(WordCountReduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        // Input path: my a.txt and b.txt live under /user/root/input on HDFS.
        FileInputFormat.addInputPath(job, new Path("hdfs://192.168.80.130:9000/user/root/input"));
        // Output path: must NOT already exist, or the job fails at submission
        // (a deletion sketch follows this class).
        FileOutputFormat.setOutputPath(job, new Path("hdfs://192.168.80.130:9000/user/root/out3"));

        job.waitForCompletion(true);
    }
}
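Since the job fails when the output directory already exists, a common convenience is to delete it from the driver before submission. A minimal sketch, assuming the same NameNode URI and output path as above:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Place at the top of main(), before the job is submitted.
FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.80.130:9000"), new Configuration());
Path out = new Path("/user/root/out3");
if (fs.exists(out)) {
    fs.delete(out, true); // true = recursive delete
}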
Then run the WordCountTest class: Run As -> Run on Hadoop.
7. If no job logs show up in the Eclipse console, copy Hadoop's log4j.properties file into the project's src directory.
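If the cluster's copy is not at hand, a minimal log4j 1.x configuration like the following (a generic sketch, not taken from the original post) is enough to see job progress in the console:

log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n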
8. After the run completes, the output appears in the corresponding directory under DFS Locations (refresh the folder to see it).
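The result can also be read programmatically. A minimal sketch, assuming the default part file name part-r-00000 that TextOutputFormat produces with a single reducer under the output directory used above:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadWordCountOutput {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.80.130:9000"), new Configuration());
        Path result = new Path("/user/root/out3/part-r-00000");
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(fs.open(result)))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line); // each line is "word<TAB>count"
            }
        }
    }
}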