Once Hadoop is installed on Linux, you can log in to the machine and use Hadoop's shell commands to browse the HDFS file system and run your MapReduce programs. That quickly gets inconvenient, though; the usual workflow is to develop and debug on Windows, and only deploy to Linux once everything works. This part shows how to configure Eclipse on Windows so that it can connect to Hadoop running on Linux.
1. Install the Hadoop development plugin in Eclipse

(For a standard Hadoop 0.20 tarball this typically means copying the hadoop-*-eclipse-plugin.jar shipped under contrib/eclipse-plugin into your Eclipse plugins directory and restarting Eclipse; the Map/Reduce perspective should then be available.)
2. Configure the connection parameters
With the plugin installed, you can create a connection, much like configuring a WebLogic connection in Eclipse.
Step 1: open the Map/Reduce Locations view, as shown in the figure, and click the elephant icon in its top-right corner.
Step 2: fill in the parameters in the dialog that pops up after clicking the elephant, as shown below.
Location name: anything you like; I used hadoop.
In the Map/Reduce Master box:
Host: the machine the JobTracker runs on. Mine is a single-node pseudo-distributed setup, so the JobTracker is on this machine; enter that machine's IP.
Port: the JobTracker's port, 9001 here.
These two values are the IP and port from mapred.job.tracker in mapred-site.xml.
In the DFS Master box:
Host: the machine the NameNode runs on. Again, in this pseudo-distributed setup the NameNode is on the same machine, so enter that machine's IP.
Port: the NameNode's port, 9000 here.
These two values are the IP and port from fs.default.name in core-site.xml. (If the "Use M/R Master host" checkbox is checked, the host defaults to the one in the Map/Reduce Master box; uncheck it to enter a different one. Here the JobTracker and NameNode are on the same machine, so leave it checked.)
User name: the user to connect to Hadoop as. I installed Hadoop on Linux as root and created no other users, so root it is.
The remaining fields can be left blank.
Then click the Finish button, and a new entry appears in the view.
Step 3: restart Eclipse, and once it is back up, re-open the connection entry you just created for editing. In step 2 we filled in the General tab; now edit the Advanced parameters tab.
You may ask why we didn't just edit that tab in step 2. The reason is that when a connection is first created, some properties on the Advanced parameters tab don't show up, and what doesn't show up can't be set (a bit of a flaw; they really should be visible up front and spare us the restart). Only after restarting Eclipse and editing the entry again do they appear.
Most of the properties here are filled in automatically. As you can see, this tab essentially exposes the settings from core-default.xml, hdfs-default.xml, and mapred-default.xml. Since we overrode some of them in the site files when installing Hadoop, we have to make the same changes here. The ones to pay attention to:
fs.default.name: already set on the General tab.
mapred.job.tracker: also already set on the General tab.
dfs.replication: defaults to 3 here; we set it to 1 in hdfs-site.xml, so set it to 1 here as well.
hadoop.tmp.dir: defaults to /tmp/hadoop-{user.name}; we set hadoop.tmp.dir to /usr/local/hadoop/hadooptmp in core-site.xml, so change it to /usr/local/hadoop/hadooptmp here too. The other properties based on this directory will update automatically.
hadoop.job.ugi: this is the property I said wasn't visible earlier. Enter root,Tardis: before the comma is the user to connect as, and after the comma you literally write Tardis.
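These Advanced parameters are simply client-side entries of Hadoop's Configuration object. For reference, here is a rough Java sketch that sets the same values programmatically (the class name ClusterConf is made up for illustration; the IP, ports, and paths are the ones used in this walkthrough, and the hadoop.job.ugi trick only applies to the old pre-security 0.20-era client):

import org.apache.hadoop.conf.Configuration;

public class ClusterConf {
    // Mirrors the Eclipse plugin settings described above.
    public static Configuration create() {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://192.168.133.128:9000"); // DFS Master (core-site.xml)
        conf.set("mapred.job.tracker", "192.168.133.128:9001");     // Map/Reduce Master (mapred-site.xml)
        conf.set("dfs.replication", "1");                           // matches our hdfs-site.xml
        conf.set("hadoop.tmp.dir", "/usr/local/hadoop/hadooptmp");  // matches our core-site.xml
        conf.set("hadoop.job.ugi", "root,Tardis");                  // user,group for the old UGI mechanism
        return conf;
    }
}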
Then click Finish and the connection is established. You can tell it worked as shown in the figure:
under DFS Locations an elephant appears, and under it a folder labeled "(2)"; that is the HDFS root directory, and the tree shows the directory structure of the distributed file system.
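If you want to verify the connection outside the plugin's tree view, a few lines of Java against the same client API will list the HDFS root (a sketch; the class name ListHdfsRoot is made up, and the IP is the one used above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListHdfsRoot {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://192.168.133.128:9000");
        FileSystem fs = FileSystem.get(conf);
        // Prints the same entries the DFS Locations tree shows under the root.
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
    }
}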
3. Write a WordCount program and run it from Eclipse
Create a Map/Reduce project in Eclipse, as shown in the figure, and name it exam. Then create the following Java classes under that project.
First, MyMap.java, the mapper: it splits each input line into tokens and emits a (word, 1) pair for every token.
package org;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMap extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // Split each input line into tokens and emit (token, 1) for every token.
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}
Second, MyReduce.java, the reducer: it sums the counts collected for each word.
package org;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    // Sum the counts emitted for each word and write (word, total).
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
Third, MyDriver.java, the driver that configures and submits the job.
package org;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MyDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "Hello Hadoop");
        job.setJarByClass(MyDriver.class);

        // Map output and final output are both (Text, IntWritable).
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(MyMap.class);
        // The reducer doubles as a combiner, since summing counts is associative.
        job.setCombinerClass(MyReduce.class);
        job.setReducerClass(MyReduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        // args[0] = input directory, args[1] = output directory (must not exist yet).
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);
    }
}
All three classes target the then-new Hadoop 0.20, i.e. the new org.apache.hadoop.mapreduce API.
Now pay attention, because this next step is crucial and took me quite a while to figure out: it is a Windows-side setting. Go to the C:\Windows\System32\drivers\etc directory, open the hosts file, and add: 192.168.133.128 hadoopName
The IP is my Linux machine's IP and hadoopName is its hostname. This entry is mandatory; without it you will get an error (see section 5). All it does is map the master's hostname to its IP.
Next, set the run configuration arguments for the MyDriver class, i.e. the input and output parameters. Exactly as when running on Linux, you specify an input folder and an output folder, as shown in the figure: input is where the source files live, and outchen is the folder where MapReduce writes its processed results.
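For example, the two program arguments might look like the line below (the /user/root prefix is an assumption on my part: it is HDFS's default home directory for the root user; point the arguments at wherever your input actually lives):

hdfs://192.168.133.128:9000/user/root/input hdfs://192.168.133.128:9000/user/root/outchen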
Then choose Run on Hadoop, as shown in the figure, and the console prints the following:
11/05/14 19:08:07 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
11/05/14 19:08:08 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/05/14 19:08:08 INFO input.FileInputFormat: Total input paths to process : 4
11/05/14 19:08:09 INFO mapred.JobClient: Running job: job_201105140203_0002
11/05/14 19:08:10 INFO mapred.JobClient: map 0% reduce 0%
11/05/14 19:08:35 INFO mapred.JobClient: map 50% reduce 0%
11/05/14 19:08:41 INFO mapred.JobClient: map 100% reduce 0%
11/05/14 19:08:53 INFO mapred.JobClient: map 100% reduce 100%
11/05/14 19:08:55 INFO mapred.JobClient: Job complete: job_201105140203_0002
11/05/14 19:08:55 INFO mapred.JobClient: Counters: 17
11/05/14 19:08:55 INFO mapred.JobClient: Job Counters
11/05/14 19:08:55 INFO mapred.JobClient: Launched reduce tasks=1
11/05/14 19:08:55 INFO mapred.JobClient: Launched map tasks=4
11/05/14 19:08:55 INFO mapred.JobClient: Data-local map tasks=4
11/05/14 19:08:55 INFO mapred.JobClient: FileSystemCounters
11/05/14 19:08:55 INFO mapred.JobClient: FILE_BYTES_READ=2557
11/05/14 19:08:55 INFO mapred.JobClient: HDFS_BYTES_READ=3361
11/05/14 19:08:55 INFO mapred.JobClient: FILE_BYTES_WRITTEN=5260
11/05/14 19:08:55 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=1688
11/05/14 19:08:55 INFO mapred.JobClient: Map-Reduce Framework
11/05/14 19:08:55 INFO mapred.JobClient: Reduce input groups=192
11/05/14 19:08:55 INFO mapred.JobClient: Combine output records=202
11/05/14 19:08:55 INFO mapred.JobClient: Map input records=43
11/05/14 19:08:55 INFO mapred.JobClient: Reduce shuffle bytes=2575
11/05/14 19:08:55 INFO mapred.JobClient: Reduce output records=192
11/05/14 19:08:55 INFO mapred.JobClient: Spilled Records=404
11/05/14 19:08:55 INFO mapred.JobClient: Map output bytes=5070
11/05/14 19:08:55 INFO mapred.JobClient: Combine input records=488
11/05/14 19:08:55 INFO mapred.JobClient: Map output records=488
11/05/14 19:08:55 INFO mapred.JobClient: Reduce input records=202
This shows the job ran successfully.
Have a look in DFS Locations: a new outchen directory appears, containing the execution results, exactly as if the job had been run on Linux directly.
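If you prefer to read the result without leaving Java, you can open the reducer's output file through the HDFS client API. A sketch, with assumptions flagged: CatOutput is a made-up class name, the /user/root prefix is the same assumption as before, and part-r-00000 is the file name a single reducer produces under the 0.20 mapreduce API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class CatOutput {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://192.168.133.128:9000");
        FileSystem fs = FileSystem.get(conf);
        // Stream the single reduce output file to stdout.
        FSDataInputStream in = fs.open(new Path("/user/root/outchen/part-r-00000"));
        try {
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}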
4. A few caveats
1. When installing Hadoop, fs.default.name in core-site.xml and mapred.job.tracker in mapred-site.xml must use an IP address, not localhost. Even on a single machine, write the machine's real IP; otherwise Eclipse cannot connect.
2. The masters and slaves files should also contain the IP, not localhost.
5. Analyzing some errors
1. The client keeps retrying the connection and finally fails with Connection refused, as in the log below. Eclipse cannot reach the JobTracker at 192.168.133.128:9001, typically because nothing is listening on that address; see caveat 1 above: if mapred.job.tracker (or fs.default.name) is set to localhost, the daemons bind to the loopback interface rather than the machine's real IP.
11/05/08 21:41:37 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
job new 之前-----------------------------------
11/05/08 21:41:40 INFO ipc.Client: Retrying connect to server: /192.168.133.128:9001. Already tried 0 time(s).
11/05/08 21:41:42 INFO ipc.Client: Retrying connect to server: /192.168.133.128:9001. Already tried 1 time(s).
11/05/08 21:41:44 INFO ipc.Client: Retrying connect to server: /192.168.133.128:9001. Already tried 2 time(s).
11/05/08 21:41:46 INFO ipc.Client: Retrying connect to server: /192.168.133.128:9001. Already tried 3 time(s).
11/05/08 21:41:48 INFO ipc.Client: Retrying connect to server: /192.168.133.128:9001. Already tried 4 time(s).
11/05/08 21:41:50 INFO ipc.Client: Retrying connect to server: /192.168.133.128:9001. Already tried 5 time(s).
11/05/08 21:41:52 INFO ipc.Client: Retrying connect to server: /192.168.133.128:9001. Already tried 6 time(s).
11/05/08 21:41:54 INFO ipc.Client: Retrying connect to server: /192.168.133.128:9001. Already tried 7 time(s).
11/05/08 21:41:56 INFO ipc.Client: Retrying connect to server: /192.168.133.128:9001. Already tried 8 time(s).
11/05/08 21:41:58 INFO ipc.Client: Retrying connect to server: /192.168.133.128:9001. Already tried 9 time(s).
Exception in thread "main" java.net.ConnectException: Call to /192.168.133.128:9001 failed on connection exception: java.net.ConnectException: Connection refused: no further information
at org.apache.hadoop.ipc.Client.wrapException(Client.java:767)
at org.apache.hadoop.ipc.Client.call(Client.java:743)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
at org.apache.hadoop.mapred.$Proxy0.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
at org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:429)
at org.apache.hadoop.mapred.JobClient.init(JobClient.java:423)
at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:410)
at org.apache.hadoop.mapreduce.Job.<init>(Job.java:50)
at org.apache.hadoop.mapreduce.Job.<init>(Job.java:54)
at org.apache.hadoop.examples.WordCount.main(WordCount.java:59)
Caused by: java.net.ConnectException: Connection refused: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:304)
at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:176)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:860)
at org.apache.hadoop.ipc.Client.call(Client.java:720)
... 9 more
2. UnknownHostException: unknown host: hadoopName, as in the log below. The Windows machine cannot resolve the Linux hostname; fix it by adding the hosts entry described in section 3 (192.168.133.128 hadoopName).
11/05/14 20:08:26 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
11/05/14 20:08:46 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
Exception in thread "main" java.net.UnknownHostException: unknown host: hadoopName
at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:195)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:850)
at org.apache.hadoop.ipc.Client.call(Client.java:720)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
at $Proxy1.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:359)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:106)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:207)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1378)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
at org.apache.hadoop.mapred.JobClient.getFs(JobClient.java:463)
at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:567)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:761)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
at org.MyDriver.main(MyDriver.java:40)