A distributed parallel computing framework: the computation goes to where the data lives
[The key is understanding what the map input <k1,v1>, the map output / reduce input <k2,v2>, and the reduce output <k3,v3> each represent; once those relationships are clear, the rest falls into place]
[k1: the byte offset into the file; v1: the content of that line]
[k2: the key passed to the map's context.write(key, value); v2: the corresponding value]
[k3: the grouping key; v3: the result of applying some operation to the values grouped under that key]
-
MapReduce: the distributed parallel computing framework that Hadoop provides
Map phase: the mapping (parallel) phase; multiple datanodes process the contents of multiple files in parallel
Reduce phase: the reduction (merge) phase; the data summarized by the maps is merged and handed to the reducers
-
Implementation: all data is transported as key/value pairs, i.e. <key,value>
-
Requirement:
Several files exist on the Hadoop cluster:
hello1:
hello world
how are you
ni hao
hello2:
pleace, i wanna sleep right now
but I can't
learning is so hard
hello China
so, just work harder
Count how many times each word appears across the files. Expected output (excerpt):
hello 2
world 1
how 1
are 1
you 1
ni 1
hao 1
China 1
-
Programming notes:
[core-site.xml must be on the classpath, or the cluster must be configured via conf.set(); otherwise the job will not connect to the cluster. Configuration loads core-default.xml first, then core-site.xml, then whatever conf.set() sets; settings loaded later override those loaded earlier.]
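A minimal sketch of the conf.set() alternative (hdfs://master:9000 is this cluster's NameNode address, taken from the logs later in these notes; adjust to your own):
Configuration conf = new Configuration();
// Later settings override values loaded from core-default.xml / core-site.xml.
conf.set("fs.defaultFS", "hdfs://master:9000");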
a. Prepare the data and upload it to HDFS.
b. Create a Mapper class that extends [org.apache.hadoop.mapreduce.Mapper]: class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
Note: a Mapper transforms input records into intermediate records. The intermediate records need not have the same types as the input records,
and a given input pair may map to zero or many output pairs.
c. Create a Reducer class that extends [org.apache.hadoop.mapreduce.Reducer]. For each Job submitted to the cluster, the MapReduce framework computes the InputSplits and assigns one map task per split.
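(This is visible in the job logs later in these notes: "number of splits:3" is followed directly by "Launched map tasks=3".)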
File hello1:
hello world
how are you
ni hao
Key/value pairs in the map phase: <k1,v1> --> <k2,v2>
hello world <k1,v1>:<0,'hello world'> --> <k2,v2>:<hello,1>,<world,1>
how are you <k1,v1>:<11,'how are you'> --> <k2,v2>:<how,1>,<are,1>,<you,1>
ni hao <k1,v1>:<22,'ni hao'> --> <k2,v2>:<ni,1>,<hao,1>
In the map phase, K1 is the [byte offset] of the line (hence LongWritable, not a line number) and V1 is the [line text] (hence Text).
In the reduce phase, K2 is the intermediate key and V2 the associated value.
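After the shuffle groups the map output by key, the reduce phase receives <k2, list(v2)> and emits <k3,v3>, for example:
<hello,[1,1]> --> <hello,2> (once from hello1, once from hello2)
<world,[1]> --> <world,1>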
-
Code: word count
-
WordCountMapper class
package com.hyxy.hadoop.mr;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private Text word = new Text();
    private IntWritable one = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, IntWritable>.Context context)
            throws IOException, InterruptedException {
        // key is the byte offset of the line; value is the line text.
        String line = value.toString();
        String[] words = line.split(" ");
        for (String wd : words) {
            // Reuse the Text wrapper for the current token and emit <word, 1>.
            word.set(wd);
            System.out.println(word);
            context.write(word, one);
        }
    }
}
-
WordCountReducer class
package com.hyxy.hadoop.mr;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable _sum = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values,
            Reducer<Text, IntWritable, Text, IntWritable>.Context context)
            throws IOException, InterruptedException {
        // values holds every 1 emitted for this word; add them up.
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        _sum.set(sum);
        // Emit <word, total count>.
        context.write(key, _sum);
    }
}
-
WordCount class
package com.hyxy.hadoop.mr;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static void main(String[] args) throws ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        try {
            // Create a job
            Job job = Job.getInstance(conf);
            // Set the mapper for the job
            job.setMapperClass(WordCountMapper.class);
            // Set the reducer for the job
            job.setReducerClass(WordCountReducer.class);
            // Set the job name
            job.setJobName("wordCount");
            // Locate the jar via the class it contains
            job.setJarByClass(WordCount.class);
            // Alternatively name the jar explicitly; the packaged jar must use exactly
            // this name, otherwise the resource will not be found
            //job.setJar("wcc.jar");
            // Declare the mapper's output key/value types
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(IntWritable.class);
            // Declare the reducer's output key/value types
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            // Input path
            FileInputFormat.setInputPaths(job, new Path(args[0]));
            // Output path (must not already exist)
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
-
-
Running from Eclipse [local test]
[The job is not submitted to the cluster; it runs only locally and no job appears on the cluster]
-
The data files must sit in a directory of their own
-
The target directory must be given 777 permissions recursively; otherwise access to it is denied with a permission exception. If that is not enough, open up the permissions on / itself.
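For example, assuming the job directory is /mr (as in the logs in these notes):
$> hadoop fs -chmod -R 777 /mr
Without it, the job fails as follows: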
19/08/09 23:11:19 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
19/08/09 23:11:19 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
org.apache.hadoop.security.AccessControlException: Permission denied: user=mumu, access=EXECUTE, inode="/mr/result3":hadoop:supergroup:drwxrw-rw-
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:259)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:205)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1728)
at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getFileInfo(FSDirStatAndListingOp.java:108)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3857)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1012)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:843)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
-
Submitting the job to the cluster via configuration
-
If the job still is not submitted to the cluster, it is again the configuration files: by default [mapreduce.framework.name=local] runs everything locally and nothing is uploaded to the cluster, and with [yarn.resourcemanager.hostname=0.0.0.0] the cluster cannot be reached either. So at minimum set these two properties via conf.set(), or put all four *-site.xml configuration files on the classpath so they are loaded automatically.
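A minimal sketch of those two conf.set() calls (assuming the ResourceManager runs on the host named master, as in the logs below):
conf.set("mapreduce.framework.name", "yarn");
conf.set("yarn.resourcemanager.hostname", "master");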
-
Give /tmp on the cluster 777 permissions recursively, otherwise a permission exception follows:
19/08/10 15:07:39 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.204.204:8032
org.apache.hadoop.security.AccessControlException: Permission denied: user=mumu, access=EXECUTE, inode="/tmp/hadoop-yarn/staging/mumu/.staging":hadoop:supergroup:d---------
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:259)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:205)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1728)
at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getFileInfo(FSDirStatAndListingOp.java:108)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3857)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1012)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:843)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
-
If the following error appears, set the cross-platform property [mapreduce.app-submission.cross-platform] to true:
19/08/10 16:57:48 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.204.204:8032
19/08/10 16:57:49 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/08/10 16:57:49 INFO input.FileInputFormat: Total input paths to process : 3
19/08/10 16:57:49 INFO mapreduce.JobSubmitter: number of splits:3
19/08/10 16:57:50 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1565422129055_0003
19/08/10 16:57:50 INFO impl.YarnClientImpl: Submitted application application_1565422129055_0003
19/08/10 16:57:50 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1565422129055_0003/
19/08/10 16:57:50 INFO mapreduce.Job: Running job: job_1565422129055_0003
19/08/10 16:57:52 INFO mapreduce.Job: Job job_1565422129055_0003 running in uber mode : false
19/08/10 16:57:52 INFO mapreduce.Job: map 0% reduce 0%
19/08/10 16:57:52 INFO mapreduce.Job: Job job_1565422129055_0003 failed with state FAILED due to: Application application_1565422129055_0003 failed 2 times due to AM Container for appattempt_1565422129055_0003_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://master:8088/cluster/app/application_1565422129055_0003Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1565422129055_0003_02_000001
Exit code: 1
Exception message: /bin/bash: line 0: fg: no job control
Stack trace: ExitCodeException exitCode=1: /bin/bash: line 0: fg: no job control
at org.apache.hadoop.util.Shell.runCommand(Shell.java:582)
at org.apache.hadoop.util.Shell.run(Shell.java:479)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
19/08/10 16:57:52 INFO mapreduce.Job: Counters: 0 -
Add the jar to the [Build path] or copy it to the classpath, then run
-
If it runs slower than before, that is the sign it was actually processed on the cluster
-
19/08/10 15:48:17 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.204.204:8032
19/08/10 15:48:18 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/08/10 15:48:18 INFO input.FileInputFormat: Total input paths to process : 3
19/08/10 15:48:18 INFO mapreduce.JobSubmitter: number of splits:3
19/08/10 15:48:18 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1565422129055_0002
19/08/10 15:48:18 INFO impl.YarnClientImpl: Submitted application application_1565422129055_0002
19/08/10 15:48:19 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1565422129055_0002/
19/08/10 15:48:19 INFO mapreduce.Job: Running job: job_1565422129055_0002
19/08/10 15:48:30 INFO mapreduce.Job: Job job_1565422129055_0002 running in uber mode : false
19/08/10 15:48:30 INFO mapreduce.Job: map 0% reduce 0%
19/08/10 15:48:51 INFO mapreduce.Job: map 33% reduce 0%
19/08/10 15:48:52 INFO mapreduce.Job: map 100% reduce 0%
19/08/10 15:48:59 INFO mapreduce.Job: map 100% reduce 100%
19/08/10 15:48:59 INFO mapreduce.Job: Job job_1565422129055_0002 completed successfully
19/08/10 15:48:59 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=617
FILE: Number of bytes written=475779
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=571
HDFS: Number of bytes written=290
HDFS: Number of read operations=12
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters [job statistics]
Launched map tasks=3 [3 map tasks; the count is determined by the number of input splits]
Launched reduce tasks=1 [1 reduce task]
Data-local map tasks=3
Total time spent by all maps in occupied slots (ms)=58330
Total time spent by all reduces in occupied slots (ms)=5117
Total time spent by all map tasks (ms)=58330
Total time spent by all reduce tasks (ms)=5117
Total vcore-milliseconds taken by all map tasks=58330
Total vcore-milliseconds taken by all reduce tasks=5117
Total megabyte-milliseconds taken by all map tasks=59729920
Total megabyte-milliseconds taken by all reduce tasks=5239808
Map-Reduce Framework
Map input records=16 [lines read by the maps]
Map output records=58 [records emitted by the maps]
Map output bytes=495 [bytes emitted by the maps]
Map output materialized bytes=629
Input split bytes=308
Combine input records=0
Combine output records=0
Reduce input groups=42 [key groups received by the reduce]
Reduce shuffle bytes=629 [bytes shuffled to the reduce]
Reduce input records=58 [records read by the reduce]
Reduce output records=42 [records emitted by the reduce]
Spilled Records=116
Shuffled Maps =3
Failed Shuffles=0
Merged Map outputs=3
GC time elapsed (ms)=793
CPU time spent (ms)=8180
Physical memory (bytes) snapshot=695209984
Virtual memory (bytes) snapshot=8250441728
Total committed heap usage (bytes)=383655936
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters [total bytes read]
Bytes Read=263
File Output Format Counters [total bytes written]
Bytes Written=290
-
-
-
-
Error summary:
-
Using [job.setJar("wordcount.jar");] does not by itself ship the jar to every datanode
-
[The name of the packaged jar must match the name passed to setJar, otherwise the resource cannot be found and an error is raised]
-
The jar must be copied into the current directory on master; the command only works when run from a directory containing that jar
[hadoop@master Desktop]$ hadoop jar /mnt/hgfs/VMLink/WordCount.jar com.hyxy.hadoop.mr.WordCount /mm /mm/result
19/08/08 22:58:55 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.204.204:8032
19/08/08 22:58:58 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/08/08 22:58:58 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/hadoop/.staging/job_1565255121801_0001
java.io.FileNotFoundException: File wordcount.jar does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1969)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1937)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1902)
at org.apache.hadoop.mapreduce.JobResourceUploader.copyJar(JobResourceUploader.java:246)
at org.apache.hadoop.mapreduce.JobResourceUploader.uploadFiles(JobResourceUploader.java:166)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:95)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:190)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
at com.hyxy.hadoop.mr.WordCount.main(WordCount.java:37)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
-
If the main class was not specified when packaging, the fully qualified class name must be given on the command line, otherwise the main class cannot be found:
[hadoop@master Desktop]$ hadoop jar /mnt/hgfs/vm_link/WordCount.jar /mr/wordcount /mr/result1
Exception in thread "main" java.lang.ClassNotFoundException: /mr/wordcount
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
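The fix is to pass the fully qualified main class before the input/output paths:
[hadoop@master Desktop]$ hadoop jar /mnt/hgfs/vm_link/WordCount.jar com.hyxy.hadoop.mr.WordCount /mr/wordcount /mr/result1
-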
If the main class was chosen when packaging (Main-Class in the manifest) and the fully qualified class name is passed anyway, the class name is consumed as args[0]; the paths shift by one, so here the input directory /mr/wordcount became the output path, which already exists:
[hadoop@master Desktop]$ hadoop jar /mnt/hgfs/vm_link/wordcount.jar com.hyxy.hadoop.mr.WordCount /mr/wordcount /mr/result2
19/08/08 23:57:15 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.204.204:8032
org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://master:9000/mr/wordcount already exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:266)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
at com.hyxy.hadoop.mr.WordCount.main(WordCount.java:39)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
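As a convenience, the FileAlreadyExistsException above can be avoided by deleting a stale output directory in the driver before submission. A minimal sketch of a hypothetical addition to WordCount.main (not in the original code; needs import org.apache.hadoop.fs.FileSystem;):
Path out = new Path(args[1]);
FileSystem fs = FileSystem.get(conf);
// Remove a leftover output directory so resubmission does not fail.
if (fs.exists(out)) {
    fs.delete(out, true);
}
FileOutputFormat.setOutputPath(job, out);
-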
The map method in the Mapper called super.map, so the parent's identity mapping ran and the output key/value types no longer matched:
[hadoop@master ~]$ hadoop jar /mnt/hgfs/G/wordcount.jar /mr/wordcount /mr/result5
19/08/08 19:53:49 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.33.128:8032
19/08/08 19:53:50 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/08/08 19:53:50 INFO input.FileInputFormat: Total input paths to process : 2
19/08/08 19:53:50 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1249)
at java.lang.Thread.join(Thread.java:1323)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:609)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:370)
...
Task Id : attempt_1565317188061_0006_m_000000_0, Status : FAILED
Error: java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, received org.apache.hadoop.io.LongWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1072)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:715)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:125)
at map.WordCountMapper.map(WordCountMapper.java:18)
at map.WordCountMapper.map(WordCountMapper.java:1)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
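A sketch of the broken pattern versus the fix, using the names from WordCountMapper above:
// Wrong: super.map is the identity mapping and emits <LongWritable, Text>,
// which does not match the declared <Text, IntWritable> output types:
// super.map(key, value, context);
// Right: convert and emit the pair yourself:
word.set(wd);
context.write(word, one);
-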
Wrong import: it must be org.apache.hadoop.io.Text, otherwise a class-cast exception is thrown:
[hadoop@master ~]$ hadoop jar /mnt/hgfs/G/wordcount.jar /mr/wordcount /mr/result4
19/08/08 19:49:24 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.33.128:8032
19/08/08 19:49:24 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/08/08 19:49:25 INFO input.FileInputFormat: Total input paths to process : 2
19/08/08 19:49:25 INFO mapreduce.JobSubmitter: number of splits:2
19/08/08 19:49:25 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1565317188061_0005
19/08/08 19:49:25 INFO impl.YarnClientImpl: Submitted application application_1565317188061_0005
19/08/08 19:49:25 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1565317188061_0005/
19/08/08 19:49:25 INFO mapreduce.Job: Running job: job_1565317188061_0005
19/08/08 19:49:33 INFO mapreduce.Job: Job job_1565317188061_0005 running in uber mode : false
19/08/08 19:49:33 INFO mapreduce.Job: map 0% reduce 0%
19/08/08 19:49:41 INFO mapreduce.Job: Task Id : attempt_1565317188061_0005_m_000000_0, Status : FAILED
Error: java.io.IOException: Initialization of all the collectors failed. Error in last collector was :class com.sun.jersey.core.impl.provider.entity.XMLJAXBElementProvider$Text
at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:414)
at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:698)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassCastException: class com.sun.jersey.core.impl.provider.entity.XMLJAXBElementProvider$Text
at java.lang.Class.asSubclass(Class.java:3404)
at org.apache.hadoop.mapred.JobConf.getOutputKeyComparator(JobConf.java:887)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:1004)
at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:402)
... 9 more
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
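The fix is a one-line import change in the source (the wrong line below is what an IDE auto-import typically picks; shown as an assumption):
// Wrong: import com.sun.jersey.core.impl.provider.entity.XMLJAXBElementProvider.Text;
// Right:
import org.apache.hadoop.io.Text;
-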
Cross-platform property not set to true: the resulting failure and the fix are the same '/bin/bash: line 0: fg: no job control' case already shown above.
-
A local file path was used for the input without setting the job to run locally; run on the cluster, it fails as follows:
19/08/12 16:20:59 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.204.204:8032
19/08/12 16:21:01 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/08/12 16:21:02 INFO input.FileInputFormat: Total input paths to process : 1
19/08/12 16:21:02 INFO mapreduce.JobSubmitter: number of splits:1
19/08/12 16:21:02 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1565422129055_0055
19/08/12 16:21:02 INFO impl.YarnClientImpl: Submitted application application_1565422129055_0055
19/08/12 16:21:02 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1565422129055_0055/
19/08/12 16:21:02 INFO mapreduce.Job: Running job: job_1565422129055_0055
19/08/12 16:21:13 INFO mapreduce.Job: Job job_1565422129055_0055 running in uber mode : false
19/08/12 16:21:13 INFO mapreduce.Job: map 0% reduce 0%
19/08/12 16:21:22 INFO mapreduce.Job: map 100% reduce 0%
19/08/12 16:21:22 INFO mapreduce.Job: Task Id : attempt_1565422129055_0055_m_000000_0, Status : FAILED
Error: java.io.FileNotFoundException: File file:/D:/airdata/1901 does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:142)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:346)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769)
at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:85)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:548)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:786)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
- Submitting a Job to the cluster from Eclipse
-
Copy the *-site.xml files into src (the classpath)
-
If a permission problem appears, change the permissions on the HDFS directory
-
Set the cross-platform property:
conf.set("mapreduce.app-submission.cross-platform", "true");
-
Build the jar and copy it to the classpath
-
If a 0.0.0.0:10020 exception appears, start the history server:
$> mr-jobhistory-daemon.sh start historyserver
-