1. jar cf WordCount.jar WordCount*.class
Usage:
Compile WordCount.java. The command for compiling Java source files is javac, as shown in the screenshot below:
[Figure: compiling WordCount.java]
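The screenshot showed the exact invocation; a plausible sketch, assuming the source file sits in workspace/ and the Hadoop libraries are added to the classpath via the hadoop classpath subcommand:

javac -classpath "$(bin/hadoop classpath)" workspace/WordCount.java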
At this point, three .class files are generated in the workspace folder:
[Figure: the three generated .class files]
Once compilation succeeds, the three .class files can be packaged into a JAR file:
[Figure: packaging the .class files into a JAR]
After the command completes, a WordCount.jar file has been created in the workspace folder:
[Figure: WordCount.jar created]
-c: create a new JAR archive;
-f: specify the name of the JAR file;
WordCount.jar: the [jar-file] argument, i.e. the JAR archive to be created, viewed, updated, or unpacked; it accompanies the -f flag;
WordCount*.class: all .class files whose names begin with WordCount.
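As an optional sanity check (not part of the original steps), the archive contents can be listed with jar's -t flag:

jar tf WordCount.jar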
2. bin/hadoop jar workspace/WordCount.jar WordCount input output
Usage:
Create a new input folder under /usr/local/hadoop to hold the input data:
[Figure: creating the input folder]
Then cd into the input folder and run the two commands below, which write 'Hello World Bye World' into file01 and 'Hello Hadoop Goodbye Hadoop' into file02:
[Figure: creating the input data]
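The screenshot showed the exact commands; presumably something along these lines, using echo with output redirection:

echo "Hello World Bye World" > file01
echo "Hello Hadoop Goodbye Hadoop" > file02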
Finally, run the program:
[Figure: running the program]
Equivalent invocations:
hadoop jar WordCount.jar WordCount input output
hadoop jar WordCount.jar WordCount /tmp/input /tmp/output
/usr/local/hadoop/bin/hadoop jar WordCount.jar WordCount input output
/usr/local/hadoop/bin/hadoop jar WordCount.jar org.apache.hadoop.examples.WordCount input output
(Some programs declare a package, so the command must spell out the fully qualified class name, including org.apache.hadoop.examples; the first line of such programs is: package org.apache.hadoop.examples)
bin/hadoop: i.e. /usr/local/hadoop/bin/hadoop, the path of the hadoop executable (a file, not a folder). It is another layer of wrapping around the java command and can be thought of as Hadoop's shell-side launcher script;
jar: run a job whose code is packaged in the given JAR;
workspace/WordCount.jar: the location of WordCount.jar; combined with the /usr/local/hadoop prefix above, the full path is /usr/local/hadoop/workspace/WordCount.jar;
WordCount: the main class inside the JAR to run;
input: the input directory for the data in HDFS;
output: the output directory for the results in HDFS;
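Note that when the cluster runs against HDFS rather than the local filesystem, the input folder must first be copied into HDFS, and the results are read back out of it afterwards. A sketch, assuming the default HDFS working directory:

bin/hadoop fs -put input input
bin/hadoop fs -cat output/part-r-00000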
Code example:
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  // Mapper: splits each input line into tokens and emits (word, 1) pairs.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer (also used as the combiner): sums the counts for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
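With file01 and file02 above as input, the job should produce a single result file, output/part-r-00000, containing:

Bye	1
Goodbye	1
Hadoop	2
Hello	2
World	2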
