1. jar cf WordCount.jar WordCount*.class
Usage:
Compile WordCount.java. The command for compiling Java source files is javac, as shown in the screenshot below:
[Figure: compiling WordCount.java]
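The screenshot showed the exact invocation; a plausible sketch, assuming the source file sits in workspace/ and the Hadoop libraries are added to the classpath via the hadoop classpath subcommand:

javac -classpath "$(bin/hadoop classpath)" workspace/WordCount.java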
At this point, three .class files are generated in the workspace folder:
[Figure: the three generated .class files]
Once compilation succeeds, the three .class files can be packaged into a JAR file:
[Figure: packaging the .class files into a JAR]
After the command completes, a WordCount.jar file has been created in the workspace folder:
[Figure: WordCount.jar created]
-c: create a new JAR archive;
-f: specify the name of the JAR file;
WordCount.jar: the [jar-file] argument, i.e. the JAR archive to be created, viewed, updated, or unpacked; it accompanies the -f flag;
WordCount*.class: all .class files whose names begin with WordCount.
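As an optional sanity check (not part of the original steps), the archive contents can be listed with jar's -t flag:

jar tf WordCount.jar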
2. bin/hadoop jar workspace/WordCount.jar WordCount input output
Usage:
Create a new input folder under /usr/local/hadoop to hold the input data:
[Figure: creating the input folder]
Then cd into the input folder and run the two commands below, which write 'Hello World Bye World' into file01 and 'Hello Hadoop Goodbye Hadoop' into file02:
[Figure: creating the input data]
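The screenshot showed the exact commands; presumably something along these lines, using echo with output redirection:

echo "Hello World Bye World" > file01
echo "Hello Hadoop Goodbye Hadoop" > file02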
Finally, run the program:
[Figure: running the program]
Equivalent invocations:
hadoop jar WordCount.jar WordCount input output
hadoop jar WordCount.jar WordCount /tmp/input /tmp/output
/usr/local/hadoop/bin/hadoop jar WordCount.jar WordCount input output
/usr/local/hadoop/bin/hadoop jar WordCount.jar org.apache.hadoop.examples.WordCount input output
(Some programs declare a package, so the command must spell out the fully qualified class name, including org.apache.hadoop.examples; the first line of such programs is: package org.apache.hadoop.examples)
bin/hadoop: i.e. /usr/local/hadoop/bin/hadoop, the path of the hadoop executable (a file, not a folder). It is another layer of wrapping around the java command and can be thought of as Hadoop's shell-side launcher script;
jar: run a job whose code is packaged in the given JAR;
workspace/WordCount.jar: the location of WordCount.jar; combined with the /usr/local/hadoop prefix above, the full path is /usr/local/hadoop/workspace/WordCount.jar;
WordCount: the main class inside the JAR to run;
input: the input directory for the data in HDFS;
output: the output directory for the results in HDFS;
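Note that when the cluster runs against HDFS rather than the local filesystem, the input folder must first be copied into HDFS, and the results are read back out of it afterwards. A sketch, assuming the default HDFS working directory:

bin/hadoop fs -put input input
bin/hadoop fs -cat output/part-r-00000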
Code example:
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  // Mapper: splits each input line into tokens and emits (word, 1) pairs.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer (also used as the combiner): sums the counts for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
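With file01 and file02 above as input, the job should produce a single result file, output/part-r-00000, containing:

Bye	1
Goodbye	1
Hadoop	2
Hello	2
World	2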
