Part 1: Cygwin Installation
Cygwin 1.7.15 download link
Installation steps are omitted here (remember to select the OpenSSH package).
After installation, add the usr\sbin directory to the PATH environment variable.
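For example, assuming Cygwin is installed under D:\cygwin (the install path used in the examples later in this post), you can test the setting for the current command prompt session and confirm that sshd is found:
set PATH=%PATH%;D:\cygwin\usr\sbin
where sshd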
Part 2: SSH Configuration
$ ssh-host-config
*** Query: Should privilege separation be used? (yes/no) no
*** Query: (Say "no" if it is already installed as a service) (yes/no)yes
*** Query: Enter the value of CYGWIN for the daemon: [] ntsec
*** Query: Do you want to use a different name? (yes/no) yes
*** Query: Enter the new user name: admin
*** Query: Reenter: admin
*** Query: Create new privileged user account 'admin'? (yes/no) yes
*** Query: Please enter the password: (your password)
*** Query: Reenter: (repeat the password)
Start the ssh service:
net start sshd
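To confirm the service is running you can query it from a Cygwin shell (a sketch; ssh-host-config registers the service under the name sshd):
$ cygrunsrv -Q sshd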
Configure passwordless login:
$ ssh-keygen    (on Windows 7, run it as Administrator)
Enter file in which to save the key (/home/Administrator/.ssh/id_rsa): (press Enter)
Enter passphrase (empty for no passphrase): (press Enter)
Enter same passphrase again: (press Enter)
cd /cygdrive/c/cygwin/home/Administrator/.ssh
(this corresponds to your Cygwin installation directory, e.g. D:\cygwin\home\Administrator\.ssh)
cp id_rsa.pub authorized_keys
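If you later need to add public keys from other machines, appending instead of copying avoids overwriting keys that are already authorized (a common alternative; not required for this single-machine setup):
cat id_rsa.pub >> authorized_keys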
Log in via ssh:
$ ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is 86:07:88:db:34:94:f8:09:6d:f4:7d:19:48:67:fe:e1.
Are you sure you want to continue connecting (yes/no)? yes
Part 3: Hadoop Configuration and Startup (hadoop-1.0.0)
1. Configuration: edit the following 4 files under hadoop/conf:
hadoop-env.sh core-site.xml hdfs-site.xml mapred-site.xml
①.hadoop-env.sh
export JAVA_HOME=/cygdrive/d/Java/jdk1.6.0_10
②.conf/core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
③.conf/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
④.conf/mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
2. Startup
Change to the Hadoop installation directory: cd /cygdrive/d/hadoop/hadoop-1.0.0
Format the NameNode: bin/hadoop namenode -format
Start Hadoop: bin/start-all.sh
Create a directory named test in HDFS: bin/hadoop fs -mkdir test
Upload files: bin/hadoop fs -put *.txt test (this uploads all text files in the Hadoop root directory to the test directory)
You can also verify that the upload succeeded through the "Browse the filesystem" link on the NameNode web UI - http://localhost:50070/
JobTracker - http://localhost:50030/
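The upload can also be checked directly from the command line (a minimal sketch; README.txt is only an assumption based on the files shipped in the Hadoop root directory, so list first and cat whatever is actually there):
bin/hadoop fs -ls test
bin/hadoop fs -cat test/README.txt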
Note: if the ssh service fails to start, for example:
Administrator@PC2011120115bjx ~/.ssh
$ ssh localhost
ssh: connect to host localhost port 22: Connection refused
check the log file /var/log/sshd.log. In this case it showed that the failure was caused by "Privilege separation user sshd does not exist". Edit the passwd file in the etc directory of the Cygwin installation (e.g. D:\cygwin\etc), add the following line, save, and then start the service again:
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
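The same fix can also be applied from a Cygwin shell (a minimal sketch; the uid/gid of 74 and the home directory simply repeat the line quoted above):
$ echo 'sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin' >> /etc/passwd
$ net start sshd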
Related articles:
http://blog.youkuaiyun.com/java2000_wl/article/details/7586890
http://blog.163.com/tienan_feng@126/blog/static/173379258201132021831344/
http://blog.youkuaiyun.com/java2000_wl/article/details/7598040
Note: on CentOS you must run chmod 600 ~/.ssh/authorized_keys, otherwise the connection will still prompt for a password.
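The usual full set of permissions for passwordless login looks roughly like this (a sketch; the chmod 700 on ~/.ssh is an additional common requirement of sshd, not mentioned above):
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys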
Also, if HDFS cannot be started in a cluster installation, the configuration files on the different nodes may be inconsistent; keep them consistent with the NameNode's configuration.
The following problem still needs further investigation:
2013-01-14 20:16:28,700 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 9100, call addBlock(/tmp/hadoop-root/mapred/system/jobtracker.info, DFSClient_-1519564247, null) from 127.0.0.1:54270: error: java.io.IOException: File /tmp/hadoop-root/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
java.io.IOException: File /tmp/hadoop-root/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
The problem above may be caused by stale data left over from a previous run; it is recommended to delete the data, reformat, and try again.
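A cleanup sequence along these lines should work (a sketch; /tmp/hadoop-root is the default hadoop.tmp.dir for the root user, as seen in the log above, so adjust the path if you have changed that setting):
bin/stop-all.sh
rm -rf /tmp/hadoop-root
bin/hadoop namenode -format
bin/start-all.sh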
2013-01-16 18:01:35,197 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names occuring more than 10 times
2013-01-16 18:01:35,199 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /home/yf/hadoopRun/tmp/dfs/name does not exist.
2013-01-16 18:01:35,201 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /home/yf/hadoopRun/tmp/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:303)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:388)
The problem above may be caused by inconsistency with leftover DataNode data from a previous run; the fix is to delete the old data, reformat, and then restart.
Startup reports the following errors:
192.168.9.228: Address 192.168.9.228 maps to bida, but this does not map back to the address - POSSIBLE BREAK-IN ATTEMPT!
192.168.9.228: no datanode to stop
192.168.9.229: no datanode to stop
192.168.9.228: Address 192.168.9.228 maps to bida, but this does not map back to the address - POSSIBLE BREAK-IN ATTEMPT!
This is caused by incorrect masters and slaves configuration; it is best to use hostnames in these files to avoid duplicated entries.
The correct configuration is as follows.
Environment: 192.168.9.228 (bida), 192.168.9.229 (bidb)
The physical layout is two machines, a (228) and b (229):
a runs the NameNode, DataNode, TaskTracker, JobTracker, and SecondaryNameNode;
b runs a DataNode and a TaskTracker.
a is the master node, b is an ordinary worker node.
On both 228 and 229, add the following to /etc/hosts:
192.168.9.228 bida
192.168.9.229 bidb
The masters file contains:
bida
The slaves file contains:
bida
bidb
core-site.xml is configured as follows:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://bida:9100</value>
<description>
URI of the NameNode, in the form hdfs://hostname:port/
</description>
</property>
<property>
<name>fs.checkpoint.period</name>
<value>3600</value>
<description>
Interval between checkpoints, in seconds.
</description>
</property>
<property>
<name>fs.checkpoint.size</name>
<value>67108864</value>
<description>
When the edit log reaches this size, a checkpoint is forced; unit: bytes.
</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/yf/tmp</value>
<description>
Hadoop's base temporary directory. It is best to set this explicitly: if a DataNode mysteriously fails to start after adding a node or in other situations, deleting the tmp directory under this path usually fixes it. Note that if you delete this directory on the NameNode machine, you must re-run the NameNode format command. The path /home/yf/tmp does not need to be created in advance; it is generated automatically.
</description>
</property>
</configuration>
hdfs-site.xml is configured as follows:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
mapred-site.xml is configured as follows:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>hdfs://bida:9001</value>
</property>
</configuration>
Finally, start the cluster.
hdfs-site.xml can also be configured as follows to specify redundant name and data directories:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/home/yf/hadoopRun/name1,/home/yf/hadoopRun/name2</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/yf/hadoopRun/data1,/home/yf/hadoopRun/data2</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
mapred-site.xml can likewise be configured as follows:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>hdfs://bida:9001</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/home/yf/hadoopRun/var</value>
</property>
</configuration>
However, you must clean out the old data directories and reformat, otherwise problems like the following appear:
2013-01-16 18:37:02,254 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: bida/127.0.0.1:9100. Already tried 9 time(s).
2013-01-16 18:37:02,256 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:root cause:java.net.ConnectException: Call to bida/127.0.0.1:9100 failed on connection exception: java.net.ConnectException: Connection refused
2013-01-16 18:37:02,257 INFO org.apache.hadoop.mapred.JobTracker: problem cleaning system directory: null
java.net.ConnectException: Call to bida/127.0.0.1:9100 failed on connection exception: java.net.ConnectException: Connection refused
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1099)
at org.apache.hadoop.ipc.Client.call(Client.java:1075)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at $Proxy5.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
This problem is caused by leftover junk data; delete the old data directories and then reformat.
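A cleanup matching the directories configured above would look roughly like this (a sketch; the paths come straight from the dfs.name.dir, dfs.data.dir, and mapred.local.dir values shown earlier):
rm -rf /home/yf/hadoopRun/name1 /home/yf/hadoopRun/name2
rm -rf /home/yf/hadoopRun/data1 /home/yf/hadoopRun/data2 /home/yf/hadoopRun/var
bin/hadoop namenode -format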
Finally, visit
http://192.168.9.228:50070/dfshealth.jsp
which will now list the two configured storage paths, and you can check the NameNode status there. (Screenshot omitted.)
Also, the Map/Reduce web UI at the following address will show one node:
http://192.168.9.228:50030/jobtracker.jsp
(Screenshot omitted.)
After startup, jps shows the following processes:
[root@bida bin]# jps
1581 JobTracker
1070 NameNode
1324 DataNode
2342 Jps
1466 SecondaryNameNode
1723 TaskTracker
The following problem is caused by the Chinese text in the configuration files (the files were presumably not saved as UTF-8, so the XML parser rejects the bytes); removing the Chinese descriptions fixes it:
redmap-master: Exception in thread "main" java.lang.RuntimeException: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.
redmap-master: at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1560)
redmap-master: at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1425)
redmap-master: at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1371)
The following problem is caused by not having run the Hadoop format command:
13/01/20 09:29:45 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9100. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
Run ./hadoop namenode -format and then start Hadoop again.
Another issue: after configuration was complete, Hadoop commands failed on the DataNode. Checking the log files on the NameNode server,
hadoop-root-namenode-bida.log contained the exception java.io.IOException: File jobtracker.info could only be replicated to 0 nodes, instead of 1, which means the NameNode and the DataNode cannot communicate, so data cannot be replicated. The same file also contained another suspicious message: connection to bida/192.168.9.228:9000 failed.
In bida/192.168.9.228:9000 the second part is the local IP and the first part is the hostname, so why were they tied together? A close look at the hosts file showed an extra entry, 127.0.0.1 bida localhost; removing bida from that line fixed the problem. It is recorded here for reference.
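After that change, the /etc/hosts file on bida would look roughly like this (a sketch assembled from the entries mentioned above):
127.0.0.1       localhost
192.168.9.228   bida
192.168.9.229   bidb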
On CentOS, the hostname is changed in the /etc/sysconfig/network file.
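For example (a sketch for CentOS 5/6, where HOSTNAME is the relevant field; replace bida with the desired name):
NETWORKING=yes
HOSTNAME=bida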