使用命令行编译打包运行自己的MapReduce程序 Hadoop2.4.1
网上的MapReduce WordCount教程对于如何编译WordCount.java几乎是一笔带过… 而有写到的,大多又是 0.20 等旧版本版本的做法,即 javac -classpath /usr/local/hadoop/hadoop-1.0.1/hadoop-core-1.0.1.jar WordCount.java,但较新的 2.X 版本中,已经没有 hadoop-core*.jar 这个文件,因此编辑和打包自己的MapReduce程序与旧版本有所不同。
本文以 Hadoop 2.4.1 环境下的WordCount实例来介绍 2.x 版本中如何编辑自己的MapReduce程序。
Hadoop 2.x 版本中的依赖 jar
Hadoop 2.x 版本中jar不再集中在一个 hadoop-core*.jar 中,而是分成多个 jar,如运行WordCount实例需要如下三个 jar:
- $HADOOP_HOME/share/hadoop/common/hadoop-common-2.4.1.jar
- $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.4.1.jar
- $HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar
编译、打包 Hadoop MapReduce 程序
将上述 jar 添加至 classpath 路径:
export CLASSPATH="$HADOOP_HOME/share/hadoop/common/hadoop-common-2.4.1.jar:$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.4.1.jar:$HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar:$CLASSPATH"
接着就可以编译 WordCount.java 了(使用的是 2.4.1 源码中的 WordCount.java,源码在文本最后面):
javac WordCount.java
(注:
按照上面的操作不能成功,推荐使用命令行如下,
设置环境变量,
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export PATH=$JAVA_HOME/bin:$PATH
export HADOOP_CLASSPATH=$JAVA_HOME/lib/tools.jar
编译,
/usr/local/hadoop/bin/hadoop com.sun.tools.javac.Main WordCount.java
)
编译时会有警告,可以忽略。编译后可以看到生成了几个.class文件。
使用Javac编译自己的MapReduce程序
接着把 .class 文件打包成 jar,才能在 Hadoop 中运行:
jar -cvf WordCount.jar ./WordCount*.class
打包完成后,运行试试,创建几个输入文件:
Mkdir input
echo "echo of the rainbow" > ./input/file0
echo "the waiting game" > ./input/file1
创建WordCount的输入
开始运行:
/usr/local/hadoop/bin/hadoop jar WordCount.jar WordCount input output
不过这边可能会遇到如下的提示 Exception in thread "main" java.lang.NoClassDefFoundError: WordCount :
提示找不到 WordCount 类
因为程序中声明了 package ,所以在命令中也要 org.apache.hadoop.examples 写完整:
/usr/local/hadoop/bin/hadoop jar WordCount.jar org.apache.hadoop.examples.WordCount input output
(注:
在这里,不会遇到上述的问题,
但是会遇到如下问题,
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:9000/user/hadoop/input
和hadoop单机版遇到的问题相同,
暂时先用以前的方法解决,
使用命令,vim /usr/local/hadoop/etc/hadoop/core-site.xml
将core-site.xml文件重新编辑为原来的样子,也就是删除新添加的内容,
vim的删除行指令,dd为删除当前行,i为从当前位置插入
删除后,顺利完成,符合实验预期,
)
正确运行后的结果如下:
WordCount 运行结果
(
注:
第二条指令也可以使用,
cat output/*
)
进阶:使用Eclipse编译运行MapReduce程序
使用命令行编译运行MapReduce程序毕竟有些麻烦,修改一次就得手动编译、打包一次,使用Eclipse编译运行MapReduce程序会更加方便。
WordCount.java 源码
文件位于 hadoop-2.4.1-src\hadoop-mapreduce-project\hadoop-mapreduce-examples\src\main\java\org\apache\hadoop\examples 中:
/** * Licensed to the Apache Software Foundation (ASF) under one * or more contributor license agreements. See the NOTICE file * distributed with this work for additional information * regarding copyright ownership. The ASF licenses this file * to you under the Apache License, Version 2.0 (the * "License"); you may not use this file except in compliance * with the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */package org.apache.hadoop.examples;import java.io.IOException;import java.util.StringTokenizer;import org.apache.hadoop.conf.Configuration;import org.apache.hadoop.fs.Path;import org.apache.hadoop.io.IntWritable;import org.apache.hadoop.io.Text;import org.apache.hadoop.mapreduce.Job;import org.apache.hadoop.mapreduce.Mapper;import org.apache.hadoop.mapreduce.Reducer;import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;import org.apache.hadoop.util.GenericOptionsParser;public class WordCount { public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable>{ private final static IntWritable one = new IntWritable(1); private Text word = new Text(); public void map(Object key, Text value, Context context ) throws IOException, InterruptedException { StringTokenizer itr = new StringTokenizer(value.toString()); while (itr.hasMoreTokens()) { word.set(itr.nextToken()); context.write(word, one); } } } public static class IntSumReducer extends Reducer<Text,IntWritable,Text,IntWritable> { private IntWritable result = new IntWritable(); public void reduce(Text key, Iterable<IntWritable> values, Context context ) throws IOException, InterruptedException { int sum = 0; for (IntWritable val : values) { sum += val.get(); } result.set(sum); context.write(key, result); } } public static void main(String[] args) throws Exception { Configuration conf = new Configuration(); String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs(); if (otherArgs.length != 2) { System.err.println("Usage: wordcount <in> <out>"); System.exit(2); } Job job = new Job(conf, "word count"); job.setJarByClass(WordCount.class); job.setMapperClass(TokenizerMapper.class); job.setCombinerClass(IntSumReducer.class); job.setReducerClass(IntSumReducer.class); job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class); FileInputFormat.addInputPath(job, new Path(otherArgs[0])); FileOutputFormat.setOutputPath(job, new Path(otherArgs[1])); System.exit(job.waitForCompletion(true) ? 0 : 1); }}

本文详细介绍了在Hadoop 2.x环境下如何编译、打包及运行MapReduce程序的具体步骤,包括设置环境变量、编译Java源文件、打包成jar文件以及在Hadoop中运行程序。
_使用命令行编译打包运行自己的MapReduce程序&spm=1001.2101.3001.5002&articleId=47439623&d=1&t=3&u=d95764e04c744fbb817419777fa17d02)
1110





