Distributed Parallel Computing Framework: the Computation Runs Where the Data Lives

[The key is understanding what the map input <k1,v1>, the map output / reduce input <k2,v2>, and the reduce output <k3,v3> each represent; once that relationship is clear, the rest falls into place.]

[k1: the byte offset of the line; v1: the content of that line]

[k2: the key passed to the map's context.write(key, value); v2: the corresponding value]

[k3: a key after grouping by key; v3: the result of applying some operation to the values grouped under that key]

  1. MapReduce: the distributed parallel computing framework provided by Hadoop
    Map phase: the mapping (parallel) phase; multiple datanodes process the contents of multiple files in parallel
    Reduce phase: the reduction (merge) phase; the data summarized by the Map phase is merged and handed to the reducer

  2. Mechanism: all data is transferred as key-value pairs, i.e. <key,value>

  3. Requirement:
    Several files exist on the Hadoop cluster, for example:

    hello1          hello2
    hello world     pleace, i wanna sleep right now
    how are you     but I can't
    ni hao          learning is so hard
    hello China     so, just work harder

    Count the word frequencies in file hello1:
     	hello 2
     	world 1
     	how 1
     	are 1
     	you 1
     	ni 1
     	hao 1
     	China 1

  4. Programming approach:

    [core-site.xml must be placed on the classpath, or the cluster must be configured via the conf.set() method; otherwise the job will not connect to the cluster. Configuration loads core-default.xml first, then core-site.xml, then anything set via conf.set(); settings loaded later override settings loaded earlier.]
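
    A minimal sketch of that override order (the fs.defaultFS host and port are assumptions for illustration):

      Configuration conf = new Configuration();        // loads core-default.xml, then core-site.xml if present on the classpath
      conf.set("fs.defaultFS", "hdfs://master:9000");  // set last, so it overrides the *-site.xml value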

    a. Prepare the data and upload it to HDFS.
    b. Create a Mapper class that extends [org.apache.hadoop.mapreduce.Mapper]:

       class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>

       Note: a Mapper transforms input records into intermediate records. The intermediate records need not be of the same type as the input records; a given input pair may map to zero or many output pairs.

    c. Create a Reducer class that extends [org.apache.hadoop.mapreduce.Reducer].

       For every Job submitted to the cluster, the MapReduce framework computes the InputSplits and assigns one map task per split.

File hello1:
  hello world
  how are you
  ni hao
Key-value pairs handled in the map phase: <k1,v1> --> <k2,v2>
hello world   <k1,v1>:<0,'hello world'>   --> <k2,v2>:<hello,1>,<world,1>
how are you   <k1,v1>:<12,'how are you'>  --> <k2,v2>:<how,1>,<are,1>,<you,1>
ni hao        <k1,v1>:<24,'ni hao'>       --> <k2,v2>:<ni,1>,<hao,1>
In the Map phase, K1 is the [byte offset] of the line (counting the newline bytes), hence LongWritable rather than a line number; V1 is the [line content], hence Text.
In the reduce phase, the framework groups the map output by key: the reducer receives each K2 together with the list of its V2 values, e.g. <hello,[1,1]> --> <hello,2>.


  1. Code implementation: word count

    • WordCountMapper class

      package com.hyxy.hadoop.mr;

      import java.io.IOException;

      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Mapper;

      public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
          // Reuse the same Writable instances across map() calls to avoid per-record allocation.
          private Text word = new Text();
          private IntWritable one = new IntWritable(1);

          @Override
          protected void map(LongWritable key, Text value,
                  Mapper<LongWritable, Text, Text, IntWritable>.Context context)
                  throws IOException, InterruptedException {
              String line = value.toString();
              String[] words = line.split(" ");
              for (String wd : words) {
                  word.set(wd);
                  // Emit <word, 1> for every word on the line.
                  context.write(word, one);
              }
          }
      }

    • WordCountReducer class

      package com.hyxy.hadoop.mr;
      
      import java.io.IOException;
      
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Reducer;
      
      public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
      	private IntWritable _sum = new IntWritable();

      	@Override
      	protected void reduce(Text key, Iterable<IntWritable> values,
      			Reducer<Text, IntWritable, Text, IntWritable>.Context context) throws IOException, InterruptedException {
      		// Sum every count emitted for this key, then write the total once.
      		int sum = 0;
      		for (IntWritable value : values) {
      			sum += value.get();
      		}
      		_sum.set(sum);
      		context.write(key, _sum);
      	}
      }
      
      
      
    • WordCount class (driver)

      package com.hyxy.hadoop.mr;

      import java.io.IOException;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Job;
      import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

      public class WordCount {
          public static void main(String[] args) throws ClassNotFoundException, InterruptedException {
              Configuration conf = new Configuration();
              try {
                  // Create a job
                  Job job = Job.getInstance(conf);
                  // Set the mapper for the job
                  job.setMapperClass(WordCountMapper.class);
                  // Set the reducer for the job
                  job.setReducerClass(WordCountReducer.class);
                  // Set the job name
                  job.setJobName("wordCount");
                  // Locate the jar via the class name
                  job.setJarByClass(WordCount.class);
                  // Alternative: name the jar explicitly; the built jar must carry exactly this name,
                  // otherwise the resource will not be found
                  //job.setJar("wcc.jar");
                  // Declare the mapper's output key/value types
                  job.setMapOutputKeyClass(Text.class);
                  job.setMapOutputValueClass(IntWritable.class);
                  // Declare the reducer's output key/value types
                  job.setOutputKeyClass(Text.class);
                  job.setOutputValueClass(IntWritable.class);
                  // Input path
                  FileInputFormat.setInputPaths(job, new Path(args[0]));
                  // Output path (must not exist yet)
                  FileOutputFormat.setOutputPath(job, new Path(args[1]));
                  System.exit(job.waitForCompletion(true) ? 0 : 1);
              } catch (IOException e) {
                  e.printStackTrace();
              }
          }
      }

  2. Running under Eclipse [local test]

    [The job is not submitted to the cluster; it only runs locally, and no job appears on the cluster.]

    1. The data files must be kept in a directory of their own.

    2. That directory must be granted 777 permission recursively, otherwise access is denied with a permission exception (a chmod example follows the log below); as a last resort, open up the / directory completely.

      19/08/09 23:11:19 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
      19/08/09 23:11:19 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
      org.apache.hadoop.security.AccessControlException: Permission denied: user=mumu, access=EXECUTE, inode="/mr/result3":hadoop:supergroup:drwxrw-rw-
      at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
      at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:259)
      at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:205)
      at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
      at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1728)
      at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getFileInfo(FSDirStatAndListingOp.java:108)
      at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3857)
      at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1012)
      at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:843)
      at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
      at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
      at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
      at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:422)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
      at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
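
      A recursive chmod clears the permission error above (/mr is the directory from this log; adjust as needed):

      $>hdfs dfs -chmod -R 777 /mr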

      • Submitting the job to the cluster via configuration

        1. If the job is still not submitted to the cluster, the cause is again the configuration: by default [mapreduce.framework.name=local] is loaded, so everything runs locally and nothing is uploaded to the cluster, and with [yarn.resourcemanager.hostname=0.0.0.0] the cluster still cannot be reached. So at minimum set these two properties via conf.set() (see the sketch below), or place all four *-site.xml configuration files on the classpath so they are loaded automatically.
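
          A minimal sketch of the two conf.set() calls; the host name master matches the cluster in these logs:

          Configuration conf = new Configuration();
          // submit to YARN instead of the local runner
          conf.set("mapreduce.framework.name", "yarn");
          // point the client at the ResourceManager host
          conf.set("yarn.resourcemanager.hostname", "master");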

        2. Grant 777 permission recursively on /tmp on the cluster, otherwise a permission exception follows:

          19/08/10 15:07:39 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.204.204:8032
          org.apache.hadoop.security.AccessControlException: Permission denied: user=mumu, access=EXECUTE, inode="/tmp/hadoop-yarn/staging/mumu/.staging":hadoop:supergroup:d---------
          at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
          at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:259)
          at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:205)
          at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
          at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1728)
          at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getFileInfo(FSDirStatAndListingOp.java:108)
          at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3857)
          at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1012)
          at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:843)
          at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
          at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
          at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
          at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
          at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
          at java.security.AccessController.doPrivileged(Native Method)
          at javax.security.auth.Subject.doAs(Subject.java:422)
          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
          at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

        3. If the following error appears, set the cross-platform property [mapreduce.app-submission.cross-platform] to true (a one-line fix is sketched after the log):

          19/08/10 16:57:48 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.204.204:8032
          19/08/10 16:57:49 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
          19/08/10 16:57:49 INFO input.FileInputFormat: Total input paths to process : 3
          19/08/10 16:57:49 INFO mapreduce.JobSubmitter: number of splits:3
          19/08/10 16:57:50 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1565422129055_0003
          19/08/10 16:57:50 INFO impl.YarnClientImpl: Submitted application application_1565422129055_0003
          19/08/10 16:57:50 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1565422129055_0003/
          19/08/10 16:57:50 INFO mapreduce.Job: Running job: job_1565422129055_0003
          19/08/10 16:57:52 INFO mapreduce.Job: Job job_1565422129055_0003 running in uber mode : false
          19/08/10 16:57:52 INFO mapreduce.Job: map 0% reduce 0%
          19/08/10 16:57:52 INFO mapreduce.Job: Job job_1565422129055_0003 failed with state FAILED due to: Application application_1565422129055_0003 failed 2 times due to AM Container for appattempt_1565422129055_0003_000002 exited with exitCode: 1
          For more detailed output, check application tracking page:http://master:8088/cluster/app/application_1565422129055_0003Then, click on links to logs of each attempt.
          Diagnostics: Exception from container-launch.
          Container id: container_1565422129055_0003_02_000001
          Exit code: 1
          Exception message: /bin/bash: line 0: fg: no job control

          Stack trace: ExitCodeException exitCode=1: /bin/bash: line 0: fg: no job control

          at org.apache.hadoop.util.Shell.runCommand(Shell.java:582)
          at org.apache.hadoop.util.Shell.run(Shell.java:479)
          at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
          at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
          at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
          at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
          at java.lang.Thread.run(Thread.java:745)

          Container exited with a non-zero exit code 1
          Failing this attempt. Failing the application.
          19/08/10 16:57:52 INFO mapreduce.Job: Counters: 0
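
          The fix referenced above, as a single call in the driver:

          conf.set("mapreduce.app-submission.cross-platform", "true");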

        4. Add the jar to the [Build path] or copy it to the classpath, then run.

        5. If the run is noticeably slower than before, that is the sign it was actually processed on the cluster.

        6. Log of a successful cluster run:

          19/08/10 15:48:17 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.204.204:8032
          19/08/10 15:48:18 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
          19/08/10 15:48:18 INFO input.FileInputFormat: Total input paths to process : 3
          19/08/10 15:48:18 INFO mapreduce.JobSubmitter: number of splits:3
          19/08/10 15:48:18 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1565422129055_0002
          19/08/10 15:48:18 INFO impl.YarnClientImpl: Submitted application application_1565422129055_0002
          19/08/10 15:48:19 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1565422129055_0002/
          19/08/10 15:48:19 INFO mapreduce.Job: Running job: job_1565422129055_0002
          19/08/10 15:48:30 INFO mapreduce.Job: Job job_1565422129055_0002 running in uber mode : false
          19/08/10 15:48:30 INFO mapreduce.Job: map 0% reduce 0%
          19/08/10 15:48:51 INFO mapreduce.Job: map 33% reduce 0%
          19/08/10 15:48:52 INFO mapreduce.Job: map 100% reduce 0%
          19/08/10 15:48:59 INFO mapreduce.Job: map 100% reduce 100%
          19/08/10 15:48:59 INFO mapreduce.Job: Job job_1565422129055_0002 completed successfully
          19/08/10 15:48:59 INFO mapreduce.Job: Counters: 49
          File System Counters
          FILE: Number of bytes read=617
          FILE: Number of bytes written=475779
          FILE: Number of read operations=0
          FILE: Number of large read operations=0
          FILE: Number of write operations=0
          HDFS: Number of bytes read=571
          HDFS: Number of bytes written=290
          HDFS: Number of read operations=12
          HDFS: Number of large read operations=0
          HDFS: Number of write operations=2
          Job Counters [job statistics]
          Launched map tasks=3 [3 map tasks; the count is determined by the number of input splits]
          Launched reduce tasks=1 [1 reduce task]
          Data-local map tasks=3
          Total time spent by all maps in occupied slots (ms)=58330
          Total time spent by all reduces in occupied slots (ms)=5117
          Total time spent by all map tasks (ms)=58330
          Total time spent by all reduce tasks (ms)=5117
          Total vcore-milliseconds taken by all map tasks=58330
          Total vcore-milliseconds taken by all reduce tasks=5117
          Total megabyte-milliseconds taken by all map tasks=59729920
          Total megabyte-milliseconds taken by all reduce tasks=5239808
          Map-Reduce Framework
          Map input records=16 [map input record count]
          Map output records=58 [map output record count]
          Map output bytes=495 [map output byte count]
          Map output materialized bytes=629
          Input split bytes=308
          Combine input records=0
          Combine output records=0
          Reduce input groups=42 [reduce input group count]
          Reduce shuffle bytes=629 [shuffled byte count]
          Reduce input records=58 [reduce input record count]
          Reduce output records=42 [reduce output record count]
          Spilled Records=116
          Shuffled Maps =3
          Failed Shuffles=0
          Merged Map outputs=3
          GC time elapsed (ms)=793
          CPU time spent (ms)=8180
          Physical memory (bytes) snapshot=695209984
          Virtual memory (bytes) snapshot=8250441728
          Total committed heap usage (bytes)=383655936
          Shuffle Errors
          BAD_ID=0
          CONNECTION=0
          IO_ERROR=0
          WRONG_LENGTH=0
          WRONG_MAP=0
          WRONG_REDUCE=0
          File Input Format Counters [total input bytes]
          Bytes Read=263
          File Output Format Counters [total output bytes]
          Bytes Written=290

  3. Error summary:

  • Using [job.setJar("wordcount.jar");] by itself did not distribute the jar to every datanode.

  • [The name of the built jar must match the name passed to setJar(), otherwise the resource is not found and an error is raised.]

  • The jar must be copied into the current directory on master; it is only picked up when the command is run from a directory that contains that jar:

    [hadoop@master Desktop]$ hadoop jar /mnt/hgfs/VMLink/WordCount.jar com.hyxy.hadoop.mr.WordCount /mm /mm/result
    19/08/08 22:58:55 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.204.204:8032
    19/08/08 22:58:58 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
    19/08/08 22:58:58 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/hadoop/.staging/job_1565255121801_0001
    java.io.FileNotFoundException: File wordcount.jar does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
    at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1969)
    at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1937)
    at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1902)
    at org.apache.hadoop.mapreduce.JobResourceUploader.copyJar(JobResourceUploader.java:246)
    at org.apache.hadoop.mapreduce.JobResourceUploader.uploadFiles(JobResourceUploader.java:166)
    at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:95)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:190)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
    at com.hyxy.hadoop.mr.WordCount.main(WordCount.java:37)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

  • If the main class was not specified when the jar was packaged, the fully qualified class name must be given on the command line, otherwise the main class is not found:

    [hadoop@master Desktop]$ hadoop jar /mnt/hgfs/vm_link/WordCount.jar /mr/wordcount /mr/result1
    Exception in thread "main" java.lang.ClassNotFoundException: /mr/wordcount
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
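
    Supplying the fully qualified class name fixes it:

    [hadoop@master Desktop]$ hadoop jar /mnt/hgfs/vm_link/WordCount.jar com.hyxy.hadoop.mr.WordCount /mr/wordcount /mr/result1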

  • If the main class was selected at packaging time and the fully qualified class name is passed again on the command line, the class name is treated as an ordinary argument: the arguments shift by one, the intended input path /mr/wordcount becomes the output path, and the job fails because that directory already exists:

    [hadoop@master Desktop]$ hadoop jar /mnt/hgfs/vm_link/wordcount.jar com.hyxy.hadoop.mr.WordCount /mr/wordcount /mr/result2
    19/08/08 23:57:15 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.204.204:8032
    org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://master:9000/mr/wordcount already exists
    at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
    at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:266)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:139)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
    at com.hyxy.hadoop.mr.WordCount.main(WordCount.java:39)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

  • The map method in the Mapper called super.map(), which re-emits the input pair unchanged and causes a type mismatch (see the sketch after the log):

    [hadoop@master ~]$ hadoop jar /mnt/hgfs/G/wordcount.jar /mr/wordcount /mr/result5
    19/08/08 19:53:49 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.33.128:8032
    19/08/08 19:53:50 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
    19/08/08 19:53:50 INFO input.FileInputFormat: Total input paths to process : 2
    19/08/08 19:53:50 WARN hdfs.DFSClient: Caught exception
    java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1249)
    at java.lang.Thread.join(Thread.java:1323)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:609)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:370)
    [...]
    Task Id : attempt_1565317188061_0006_m_000000_0, Status : FAILED
    [...]
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1072)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:715)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
    at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:125)
    at map.WordCountMapper.map(WordCountMapper.java:18)
    at map.WordCountMapper.map(WordCountMapper.java:1)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

    Container killed by the ApplicationMaster.
    Container killed on request. Exit code is 143
    Container exited with a non-zero exit code 143
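
    The failing pattern looks roughly like this (a sketch): the inherited Mapper.map() forwards <key,value>, i.e. <LongWritable,Text>, straight to the output, which clashes with the declared <Text,IntWritable> output types.

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // BUG: the default Mapper.map() writes the input pair unchanged,
        // emitting <LongWritable,Text> where <Text,IntWritable> is declared.
        super.map(key, value, context);
    }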

  • Wrong import: auto-import pulled in com.sun.jersey.core.impl.provider.entity.XMLJAXBElementProvider$Text instead of org.apache.hadoop.io.Text, hence the class-cast exception (corrected import shown after the log):

    [hadoop@master ~]$ hadoop jar /mnt/hgfs/G/wordcount.jar /mr/wordcount /mr/result4
    19/08/08 19:49:24 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.33.128:8032
    19/08/08 19:49:24 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
    19/08/08 19:49:25 INFO input.FileInputFormat: Total input paths to process : 2
    19/08/08 19:49:25 INFO mapreduce.JobSubmitter: number of splits:2
    19/08/08 19:49:25 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1565317188061_0005
    19/08/08 19:49:25 INFO impl.YarnClientImpl: Submitted application application_1565317188061_0005
    19/08/08 19:49:25 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1565317188061_0005/
    19/08/08 19:49:25 INFO mapreduce.Job: Running job: job_1565317188061_0005
    19/08/08 19:49:33 INFO mapreduce.Job: Job job_1565317188061_0005 running in uber mode : false
    19/08/08 19:49:33 INFO mapreduce.Job: map 0% reduce 0%
    19/08/08 19:49:41 INFO mapreduce.Job: Task Id : attempt_1565317188061_0005_m_000000_0, Status : FAILED
    Error: java.io.IOException: Initialization of all the collectors failed. Error in last collector was :class com.sun.jersey.core.impl.provider.entity.XMLJAXBElementProvider$Text
    at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:414)
    at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:698)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
    Caused by: java.lang.ClassCastException: class com.sun.jersey.core.impl.provider.entity.XMLJAXBElementProvider$Text
    at java.lang.Class.asSubclass(Class.java:3404)
    at org.apache.hadoop.mapred.JobConf.getOutputKeyComparator(JobConf.java:887)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:1004)
    at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:402)
    … 9 more

    Container killed by the ApplicationMaster.
    Container killed on request. Exit code is 143
    Container exited with a non-zero exit code 143
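
    The one-line fix in the Mapper/Reducer source (a sketch of the wrong and the right import):

    // Wrong (what auto-import may pick):
    //import com.sun.jersey.core.impl.provider.entity.XMLJAXBElementProvider.Text;
    // Correct:
    import org.apache.hadoop.io.Text;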

  • Cross-platform property not set to true: the job fails with the same "/bin/bash: line 0: fg: no job control" error shown in step 3 of the cluster-submission section above.


  • The input path points at a local directory; the job was not set to run locally, so when it runs on the cluster that path does not exist there and the following error appears:

    19/08/12 16:20:59 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.204.204:8032
    19/08/12 16:21:01 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
    19/08/12 16:21:02 INFO input.FileInputFormat: Total input paths to process : 1
    19/08/12 16:21:02 INFO mapreduce.JobSubmitter: number of splits:1
    19/08/12 16:21:02 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1565422129055_0055
    19/08/12 16:21:02 INFO impl.YarnClientImpl: Submitted application application_1565422129055_0055
    19/08/12 16:21:02 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1565422129055_0055/
    19/08/12 16:21:02 INFO mapreduce.Job: Running job: job_1565422129055_0055
    19/08/12 16:21:13 INFO mapreduce.Job: Job job_1565422129055_0055 running in uber mode : false
    19/08/12 16:21:13 INFO mapreduce.Job: map 0% reduce 0%
    19/08/12 16:21:22 INFO mapreduce.Job: map 100% reduce 0%
    19/08/12 16:21:22 INFO mapreduce.Job: Task Id : attempt_1565422129055_0055_m_000000_0, Status : FAILED
    Error: java.io.FileNotFoundException: File file:/D:/airdata/1901 does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:142)
    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:346)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769)
    at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:85)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:548)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:786)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

    Container killed by the ApplicationMaster.
    Container killed on request. Exit code is 143
    Container exited with a non-zero exit code 143


  1. Submitting a Job to the cluster from Eclipse
    1. Copy the *-site.xml files into src (the classpath)

    2. When a permission problem occurs, fix the permissions on the HDFS directory and it works!

    3. Set the cross-platform property:

      conf.set("mapreduce.app-submission.cross-platform", "true");

    4. Build the jar and copy it to the classpath

    5. If the 0.0.0.0:10020 exception occurs,
      start the history server:

      $>mr-jobhistory-daemon.sh start historyserver
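
      The 10020 port is the job history server address (property mapreduce.jobhistory.address); as a sketch, it can also be set explicitly in the driver (the host name master is an assumption):

      conf.set("mapreduce.jobhistory.address", "master:10020");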
