Chapter 6 Getting Started with MapReduce
6.2 Understanding WordCount
The WordCount program is the "Hello World" of MapReduce. By analyzing it, we can understand the basic structure and execution flow of a MapReduce program.
6.2.1 WordCount Design
The WordCount program is a good illustration of the MapReduce programming model.
In general, a text file serves as the MapReduce input. MapReduce splits the text into lines and builds the input key-value pairs, where the key is the offset of the line within the file and the value is the line's content. The map method processes each line and emits intermediate results of the form <word,1>. By default MapReduce groups these pairs by key and passes each group to the reduce method, which completes the count and outputs the final result <word,count>.
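For example, suppose the input file contains the following two lines (illustrative data, not this chapter's sample file); the map phase emits one <word,1> pair per word, the shuffle groups the pairs by word, and the reduce phase sums each group:

Hello Hadoop
Hello Java

map output:     <Hello,1> <Hadoop,1> <Hello,1> <Java,1>
after shuffle:  <Hadoop,[1]> <Hello,[1,1]> <Java,[1]>
reduce output:  <Hadoop,1> <Hello,2> <Java,1>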

6.2.2 Ways to Run MapReduce
A MapReduce program can be run in two ways: locally or on the server side.
Local running usually means a local Windows environment, which is convenient for development and debugging.
Server-side running is mostly used in real production environments.
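Which mode is used is normally controlled by configuration in the driver. The following lines are only a sketch (the property name and values are the standard Hadoop 2.x ones; your cluster's settings may differ):

Configuration conf = new Configuration();
// Local mode: the whole job runs inside a single JVM, convenient for IDE debugging
conf.set("mapreduce.framework.name", "local");
// Cluster mode: submit the job to YARN instead
// conf.set("mapreduce.framework.name", "yarn");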
6.2.3 Writing the Code
(1) Create a Java project

(2) Modify the Hadoop source code
Note that when you run a MapReduce program locally on Windows, you need to modify the Hadoop source code. When you run it on a Linux server, no modification is needed.
Modifying the Hadoop source code here simply means making a small change to Hadoop's NativeIO class.
Download the matching Hadoop source package hadoop-2.7.3-src.tar.gz and unpack it, then copy NativeIO.java from hadoop-2.7.3-src\hadoop-common-project\hadoop-common\src\main\java\org\apache\hadoop\io\nativeio into the Eclipse project.
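The copied NativeIO.java must keep its original package, org.apache.hadoop.io.nativeio, so that the class compiled from the project is picked up ahead of the one inside the Hadoop jar. A possible source layout (the exact folder and file names are only an example):

src/
├── org/apache/hadoop/io/nativeio/NativeIO.java   (copied here and modified)
└── cn/hadron/mr/
    ├── WordCountMapper.java
    ├── WordCountReducer.java
    └── RunJob.java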
Modify the access method in NativeIO.java as follows:
public static boolean access(String path, AccessRight desiredAccess)
        throws IOException {
    return true;
    //return access0(path, desiredAccess.accessRight());
}
If the NativeIO class is not modified, running the MapReduce program locally on Windows produces the following exception:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
	at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
	at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:609)
	at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:977)
	at org.apache.hadoop.util.DiskChecker.checkAccessByFileMethods(DiskChecker.java:187)
	at org.apache.hadoop.util.DiskChecker.checkDirAccess(DiskChecker.java:174)
	at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:108)
	at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.confChanged(LocalDirAllocator.java:285)
	at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:344)
	at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
	at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
	at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:115)
	at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:125)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:163)
	at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:240)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Unknown Source)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
	at cn.hadron.mr.RunJob.main(RunJob.java:33)
(3) Define the Mapper class
package cn.hadron.mr;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.util.StringUtils;

// Four generic parameters: the first two are the key/value types of the map input,
// the last two are the key/value types of the map output
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Called once for every line read from the file split; the key is the position (offset)
    // of the line within the file and the value is the content of the line
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] words = StringUtils.split(value.toString(), ' ');
        for (String w : words) {
            context.write(new Text(w), new IntWritable(1));
        }
    }
}
Code notes:
- The Mapper class reads the input and executes the map method. To write a Mapper class, extend org.apache.hadoop.mapreduce.Mapper and implement the map method for the problem at hand.
- The Mapper class has four generic parameters: the first two are the key and value types of the map input, and the last two are the key and value types of the map output.
- The MapReduce framework passes each input key-value pair to the map method. The method takes three parameters: the first (usually LongWritable) is the offset of the line within the file, the second (usually Text) is the content of the line, and the third is the Context, which represents the task context.
- The full name of the Context class is org.apache.hadoop.mapreduce.Mapper.Context; that is, Context is a nested class of Mapper, so it can be used directly inside the Mapper class.
- The map method uses StringUtils.split to break the input line into words on spaces, then writes each word as an intermediate result through the Context's write method.
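Note that splitting on a single space character will not handle tabs or runs of spaces. If that matters for your input, the map method could split on arbitrary whitespace instead; the following is only a sketch of such a variant, not the code used in this chapter:

protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    // split the line on one or more whitespace characters (spaces, tabs)
    for (String w : value.toString().split("\\s+")) {
        if (!w.isEmpty()) {   // skip the empty token a leading space would produce
            context.write(new Text(w), new IntWritable(1));
        }
    }
}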
(4) Define the Reducer class
package cn.hadron.mr;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    /**
     * In the map output <key,values>, key is a single word and values is the list of counts
     * for that word. The map output is the reduce input. The reduce method is called once per
     * group; within a group the key is the same and there may be several values.
     * So reduce only needs to iterate over values and sum them to get the total count of a word.
     */
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable i : values) {
            sum = sum + i.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
Code notes:
- The Reducer class receives the intermediate results output by the Mapper as its input and executes the reduce method.
- The Reducer class has four generic parameters: the first two are the key and value types of the reduce input (matching the map output types), and the last two are the key and value types of the reduce output.
- reduce method parameters: key is a single word and values is the list of counts for that word; the Context type is org.apache.hadoop.mapreduce.Reducer.Context, the Reducer's context.
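For example, if the map phase emitted <Hello,1> twice and <Java,1> once, the framework groups the pairs by key and reduce is called once per group (illustrative values):

reduce("Hello", [1, 1])  -> writes <Hello, 2>
reduce("Java",  [1])     -> writes <Java, 1>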
(5) Define the main method (driver class)
package cn.hadron.mr;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RunJob {

    public static void main(String[] args) {
        // Set HADOOP_USER_NAME (as a JVM system property) to root
        System.setProperty("HADOOP_USER_NAME", "root");
        // The Configuration class holds the Hadoop configuration
        Configuration config = new Configuration();
        // Set fs.defaultFS
        config.set("fs.defaultFS", "hdfs://192.168.80.131:9000");
        // Set the yarn.resourcemanager node
        config.set("yarn.resourcemanager.hostname", "node1");
        try {
            FileSystem fs = FileSystem.get(config);
            Job job = Job.getInstance(config);
            job.setJarByClass(RunJob.class);
            job.setJobName("wc");
            // Set the Mapper class
            job.setMapperClass(WordCountMapper.class);
            // Set the Reducer class
            job.setReducerClass(WordCountReducer.class);
            // Set the key type of the reduce output
            job.setOutputKeyClass(Text.class);
            // Set the value type of the reduce output
            job.setOutputValueClass(IntWritable.class);
            // Specify the input path
            FileInputFormat.addInputPath(job, new Path("/user/root/input/"));
            // Specify the output path (created automatically by the job)
            Path outpath = new Path("/user/root/output/");
            // MapReduce creates the output path itself; if it already exists it must be deleted first
            if (fs.exists(outpath)) {
                fs.delete(outpath, true);
            }
            FileOutputFormat.setOutputPath(job, outpath);
            // Submit the job and wait for it to finish
            boolean f = job.waitForCompletion(true);
            if (f) {
                System.out.println("Job executed successfully");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
(6) Run locally
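Before running, the input directory referenced by the driver (/user/root/input/) must already exist on HDFS and contain at least one text file. A minimal sketch of how it might be prepared (the local file name words.txt is only an example):

hdfs dfs -mkdir -p /user/root/input
hdfs dfs -put words.txt /user/root/input/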
Execution result:
[root@node1 ~]# hdfs dfs -ls /user/root/output
Found 2 items
-rw-r--r--   3 root supergroup          0 2017-05-28 09:01 /user/root/output/_SUCCESS
-rw-r--r--   3 root supergroup         46 2017-05-28 09:01 /user/root/output/part-r-00000
[root@node1 ~]# hdfs dfs -cat /user/root/output/part-r-00000
Hadoop	2
Hello	2
Hi	1
Java	2
World	1
world	1
[root@node1 ~]#
6.2.4 Running on the Server Side
(1) Modify the source code
The main method above was written for local running; to run on the server side it can be simplified somewhat.
Refer to the official tutorial:
http://hadoop.apache.org/docs/r2.7.3/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html
Write the Mapper and Reducer classes as static nested classes of the main class.
package cn.hadron.mr;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Four generic parameters: the key/value types of the map input and the key/value types of the map output
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // In the map method, value holds one line of the text file (terminated by a newline),
        // and key is the offset of the first character of that line from the start of the file
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            // StringTokenizer splits the line into individual words, and each <word,1> pair is written as map output
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        // In the map output <key,values>, key is a single word and values is the list of counts for that word;
        // the map output is the reduce input, so reduce only needs to iterate over values and sum them
        // to get the total count of a word
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    // Run the MapReduce job
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "wordCount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // The first command-line argument is the input path, the second is the output path
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
(2) Export the jar package

(3) Upload to the server and run
As before, upload the wordcount.jar just exported to the desktop to the node1 node via Xftp.
[root@node1 ~]# hadoop jar wordcount.jar cn.hadron.mr.WordCount input output
17/05/28 10:41:41 INFO client.RMProxy: Connecting to ResourceManager at node1/192.168.80.131:8032
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://node1:9000/user/root/output already exists
	at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
	at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:266)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
	at cn.hadron.mr.WordCount.main(WordCount.java:59)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
This happens because the output directory already exists; just delete it:
[root@node1 ~]# hdfs dfs -rmr /user/root/output
rmr: DEPRECATED: Please use 'rm -r' instead.
17/05/28 10:42:01 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /user/root/output
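Alternatively, the output directory can be removed from the driver itself before the job is submitted, just as the earlier RunJob class does. A minimal sketch using the same FileSystem calls (conf, job and args as in the WordCount main method):

FileSystem fs = FileSystem.get(conf);
Path outpath = new Path(args[1]);
if (fs.exists(outpath)) {
    fs.delete(outpath, true);   // recursively delete the old output directory
}
FileOutputFormat.setOutputPath(job, outpath);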
Run the job again:
[root@node1 ~]# hadoop jar wordcount.jar cn.hadron.mr.WordCount input output
17/05/28 10:43:12 INFO client.RMProxy: Connecting to ResourceManager at node1/192.168.80.131:8032
17/05/28 10:43:14 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
17/05/28 10:43:15 INFO input.FileInputFormat: Total input paths to process : 2
17/05/28 10:43:15 INFO mapreduce.JobSubmitter: number of splits:2
17/05/28 10:43:16 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1495804618534_0001
17/05/28 10:43:17 INFO impl.YarnClientImpl: Submitted application application_1495804618534_0001
17/05/28 10:43:17 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1495804618534_0001/
17/05/28 10:43:17 INFO mapreduce.Job: Running job: job_1495804618534_0001
17/05/28 10:43:43 INFO mapreduce.Job: Job job_1495804618534_0001 running in uber mode : false
17/05/28 10:43:43 INFO mapreduce.Job:  map 0% reduce 0%
17/05/28 10:44:19 INFO mapreduce.Job:  map 100% reduce 0%
17/05/28 10:44:33 INFO mapreduce.Job:  map 100% reduce 100%
17/05/28 10:44:35 INFO mapreduce.Job: Job job_1495804618534_0001 completed successfully
17/05/28 10:44:36 INFO mapreduce.Job: Counters: 50
	File System Counters
		FILE: Number of bytes read=89
		FILE: Number of bytes written=355368
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=301
		HDFS: Number of bytes written=46
		HDFS: Number of read operations=9
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Killed map tasks=1
		Launched map tasks=2
		Launched reduce tasks=1
		Data-local map tasks=2
		Total time spent by all maps in occupied slots (ms)=62884
		Total time spent by all reduces in occupied slots (ms)=12445
		Total time spent by all map tasks (ms)=62884
		Total time spent by all reduce tasks (ms)=12445
		Total vcore-milliseconds taken by all map tasks=62884
		Total vcore-milliseconds taken by all reduce tasks=12445
		Total megabyte-milliseconds taken by all map tasks=64393216
		Total megabyte-milliseconds taken by all reduce tasks=12743680
	Map-Reduce Framework
		Map input records=6
		Map output records=14
		Map output bytes=140
		Map output materialized bytes=95
		Input split bytes=216
		Combine input records=14
		Combine output records=7
		Reduce input groups=6
		Reduce shuffle bytes=95
		Reduce input records=7
		Reduce output records=6
		Spilled Records=14
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=860
		CPU time spent (ms)=10230
		Physical memory (bytes) snapshot=503312384
		Virtual memory (bytes) snapshot=6236766208
		Total committed heap usage (bytes)=301146112
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=85
	File Output Format Counters
		Bytes Written=46
[root@node1 ~]#
View the result:
[root@node1 ~]# hdfs dfs -ls /user/root/output
Found 2 items
-rw-r--r--   3 root supergroup          0 2017-05-28 10:44 /user/root/output/_SUCCESS
-rw-r--r--   3 root supergroup         46 2017-05-28 10:44 /user/root/output/part-r-00000
[root@node1 ~]# hdfs dfs -cat /user/root/output/part-r-00000
Hadoop	2
Hello	2
Hi	1
Java	2
World	1
world	1
Additional note
2017-06-24
Running the MapReduce program written earlier again today failed with the following error:
(null) entry in command string: null chmod 0700
Solution:
(1) Download hadoop-2.7.3.tar.gz and unpack it, for example to drive D, so that the Hadoop root directory is D:\hadoop-2.7.3.
(2) Copy the debug tool (winutils.exe) into HADOOP_HOME/bin.
(3) Set the environment variable (typically HADOOP_HOME pointing to D:\hadoop-2.7.3; adding %HADOOP_HOME%\bin to PATH is also commonly needed); see the sketch below.
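If editing the Windows environment variables is inconvenient, the same effect can usually be achieved from code at the very beginning of main, before the Configuration is created. A sketch assuming Hadoop was unpacked to D:\hadoop-2.7.3:

// Point Hadoop at the local installation whose bin directory contains winutils.exe
System.setProperty("hadoop.home.dir", "D:\\hadoop-2.7.3");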
