Installing Hadoop on Cygwin and Linux

This post describes how to install and configure Cygwin, SSH, and Hadoop in a Cygwin environment: installing Cygwin, setting up the SSH service, enabling passwordless login, configuring and starting Hadoop, and fixing common deployment problems.



Part 1: Cygwin Installation

        Cygwin 1.7.15 (download link)

        Installation steps are omitted here (remember to install ssh, i.e. the openssh package).


        After the installation completes, add the usr\sbin directory to the PATH environment variable.

Part 2: SSH Configuration

         $ ssh-host-config

         *** Query: Should privilege separation be used? (yes/no) no

         *** Query: (Say "no" if it is already installed as a service) (yes/no)yes

         *** Query: Enter the value of CYGWIN for the daemon: [] ntsec

         *** Query: Do you want to use a different name? (yes/no) yes

         *** Query: Enter the new user name: admin

         *** Query: Reenter: admin

         *** Query: Create new privileged user account 'admin'? (yes/no) yes

         *** Query: Please enter the password:  <password>

         *** Query: Reenter:  <repeat the password>

         Start the SSH service:

        net start sshd
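
        If the Windows service manager cannot start it, the service can also be inspected and started with cygrunsrv from a Cygwin shell; a minimal sketch, assuming the service was registered under the name sshd by ssh-host-config above:

        # list the services registered with Cygwin and confirm sshd is among them
        cygrunsrv -L
        # start sshd (equivalent to "net start sshd")
        cygrunsrv -S sshd
        # query its status if it still refuses to start
        cygrunsrv -Q sshd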


         Configure passwordless login

         $ ssh-keygen    (on Windows 7, run the terminal as Administrator)

         Enter file in which to save the key (/home/Administrator/.ssh/id_rsa):  <press Enter>

         Enter passphrase (empty for no passphrase):  <press Enter>

         Enter same passphrase again:  <press Enter>

 

         cd /cygdrive/c/cygwin/home/Administrator/.ssh  

        (adjust the path to your Cygwin installation directory, e.g. D:\cygwin\home\Administrator\.ssh)

         cp id_rsa.pub authorized_keys
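
         If an authorized_keys file already exists, appending is safer than overwriting, and tightening the permissions avoids the password prompt mentioned for CentOS later in this post; a minimal sketch:

         # append the public key instead of replacing any existing entries
         cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
         # sshd may reject a group- or world-readable key file
         chmod 700 ~/.ssh
         chmod 600 ~/.ssh/authorized_keys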

        Log in via SSH:

         $ ssh localhost

         The authenticity of host 'localhost (127.0.0.1)' can't be established.
         ECDSA key fingerprint is 86:07:88:db:34:94:f8:09:6d:f4:7d:19:48:67:fe:e1.
         Are you sure you want to continue connecting (yes/no)? yes

Part 3: Hadoop Configuration and Startup (hadoop-1.0.0)

       1. Configuration: edit the following 4 files in the hadoop/conf directory

       hadoop-env.sh  core-site.xml  hdfs-site.xml  mapred-site.xml

       ①.hadoop-env.sh
       export JAVA_HOME=/cygdrive/d/Java/jdk1.6.0_10

       ②.conf/core-site.xml:

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

       ③.conf/hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

       ④.conf/mapred-site.xml

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
    </property>
</configuration>

       2. Startup

       Change to the Hadoop installation directory:   cd /cygdrive/d/hadoop/hadoop-1.0.0

       Format the NameNode:           bin/hadoop namenode -format

       Start Hadoop:                  bin/start-all.sh

       Create a directory named test in HDFS:   bin/hadoop fs -mkdir test

       Upload files:   bin/hadoop fs -put *.txt test   (uploads all text files in the Hadoop root directory to the test directory)
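
       The upload can also be checked from the command line; a small sketch using the same FsShell commands (README.txt is just an example file name from the Hadoop root directory):

       # list the contents of the test directory in HDFS
       bin/hadoop fs -ls test
       # print one of the uploaded files
       bin/hadoop fs -cat test/README.txt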

       You can also verify the upload through the "Browse the filesystem" link on the NameNode web UI at http://localhost:50070/.

      JobTracker - http://localhost:50030/


Note: the SSH service may fail to come up, as shown below:

Administrator@PC2011120115bjx ~/.ssh
$ ssh localhost
ssh: connect to host localhost port 22: Connection refused

The /var/log/sshd.log file shows that the failure is caused by "Privilege separation user sshd does not exist". Edit the passwd file in the etc directory of the Cygwin installation (e.g. D:\cygwin\etc), append the line sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin, save, and then start the service again.
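
A one-line way to apply that fix from a Cygwin shell (a sketch of the change described above; run it from a shell with write access to /etc/passwd, then restart the service):

# add the privilege-separation user that sshd expects
echo 'sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin' >> /etc/passwd
# restart the service
cygrunsrv -S sshd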

Related articles:

http://blog.youkuaiyun.com/java2000_wl/article/details/7586890

http://blog.163.com/tienan_feng@126/blog/static/173379258201132021831344/

http://blog.youkuaiyun.com/java2000_wl/article/details/7598040

Note: on CentOS you must run chmod 600 ~/.ssh/authorized_keys, otherwise the connection will still prompt for a password.


Also, if HDFS will not start in a cluster installation, the cause may be that the configuration files on the nodes are inconsistent; keep them identical to the NameNode's configuration.
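
One simple way to keep them consistent is to push the NameNode's conf directory to every slave; a sketch, where the hostname bidb follows the cluster example later in this post and the install path /home/yf/hadoop-1.0.0 is only an assumption:

# run on the NameNode host; repeat for each slave listed in conf/slaves
scp /home/yf/hadoop-1.0.0/conf/*.xml yf@bidb:/home/yf/hadoop-1.0.0/conf/
scp /home/yf/hadoop-1.0.0/conf/masters /home/yf/hadoop-1.0.0/conf/slaves yf@bidb:/home/yf/hadoop-1.0.0/conf/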

The following problem still needs further investigation:

2013-01-14 20:16:28,700 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 9100, call addBlock(/tmp/hadoop-root/mapred/system/jobtracker.info, DFSClient_-1519564247, null) from 127.0.0.1:54270: error: java.io.IOException: File /tmp/hadoop-root/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
java.io.IOException: File /tmp/hadoop-root/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)

The problem above is probably caused by leftover data from a previous run; delete the old data, reformat, and try again.
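
A typical recovery sequence looks like the following sketch; the paths come from the hadoop.tmp.dir and dfs.name.dir/dfs.data.dir values used later in this post, so adjust them to your own configuration:

# stop the cluster, remove the stale HDFS data, then reformat and restart
bin/stop-all.sh
rm -rf /home/yf/tmp /home/yf/hadoopRun/name1 /home/yf/hadoopRun/name2 \
       /home/yf/hadoopRun/data1 /home/yf/hadoopRun/data2
bin/hadoop namenode -format
bin/start-all.sh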


2013-01-16 18:01:35,197 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names occuring more than 10 times
2013-01-16 18:01:35,199 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory /home/yf/hadoopRun/tmp/dfs/name does not exist.
2013-01-16 18:01:35,201 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /home/yf/hadoopRun/tmp/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:303)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:388)


The problem above is probably caused by inconsistent leftover DataNode data; the fix is to delete the old data, reformat, and restart.


Startup reports the following error:

192.168.9.228: Address 192.168.9.228 maps to bida, but this does not map back to the address - POSSIBLE BREAK-IN ATTEMPT!
192.168.9.228: no datanode to stop
192.168.9.229: no datanode to stop
192.168.9.228: Address 192.168.9.228 maps to bida, but this does not map back to the address - POSSIBLE BREAK-IN ATTEMPT!

This is caused by an incorrect masters/slaves configuration; it is best to use hostnames so that entries are not duplicated.





A correct configuration is shown below:

Environment: 192.168.9.228 (bida), 192.168.9.229 (bidb)

The physical layout is two machines, a (228) and b (229):
a runs the NameNode, DataNode, TaskTracker, JobTracker, and SecondaryNameNode
b runs a DataNode and a TaskTracker
a is the master node; b is an ordinary worker node


On both 228 and 229, add the following to /etc/hosts:

192.168.9.228 bida

192.168.9.229 bidb


The masters file contains:

bida


The slaves file contains:
bida
bidb


core-site.xml is configured as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://bida:9100</value>
        <description>
            URI of the NameNode, in the form hdfs://hostname:port/
        </description>
    </property>
    <property>
        <name>fs.checkpoint.period</name>
        <value>3600</value>
        <description>
            Interval between checkpoints, in seconds
        </description>
    </property>
    <property>
        <name>fs.checkpoint.size</name>
        <value>67108864</value>
        <description>
            When the edit log reaches this size, a checkpoint is forced, in bytes
        </description>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/yf/tmp</value>
        <description>
            Hadoop's default temporary directory. Setting it explicitly is recommended: if a DataNode mysteriously fails to start after adding a node (or in similar situations), delete the tmp directory under this path. If you delete this directory on the NameNode machine, you must rerun the NameNode format command. The path /home/yf/tmp does not need to be created beforehand; it is generated automatically.
        </description>
    </property>
</configuration>

hdfs-site.xml is configured as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>


mapred-site.xml is configured as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>hdfs://bida:9001</value>
    </property>
</configuration>

Finally, start the cluster.
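
Everything is started from the master only; the slaves file drives the SSH-based start of the other nodes. A minimal sketch, assuming the commands are run from the Hadoop installation directory on bida:

# format once, then start HDFS and MapReduce on all nodes
bin/hadoop namenode -format
bin/start-all.sh
# check the daemons on each node
jps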



hdfs-site.xml can also be configured as follows, specifying redundant name/data directories:


<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>dfs.name.dir</name>
        <value>/home/yf/hadoopRun/name1,/home/yf/hadoopRun/name2</value>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>/home/yf/hadoopRun/data1,/home/yf/hadoopRun/data2</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
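
If these directories are created by hand, keep the data directories at permission 755; that is the default value Hadoop 1.x expects for dfs.data.dir (dfs.datanode.data.dir.perm). A sketch for each node:

# create the name/data directories and set the permissions the DataNode expects
mkdir -p /home/yf/hadoopRun/name1 /home/yf/hadoopRun/name2
mkdir -p /home/yf/hadoopRun/data1 /home/yf/hadoopRun/data2
chmod 755 /home/yf/hadoopRun/data1 /home/yf/hadoopRun/data2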

mapred-site.xml can likewise be configured as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>hdfs://bida:9001</value>
    </property>
    <property>
        <name>mapred.local.dir</name>
        <value>/home/yf/hadoopRun/var</value>
    </property>
</configuration>


However, you must clean out the old data directories and reformat; otherwise errors similar to the following will appear:

2013-01-16 18:37:02,254 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: bida/127.0.0.1:9100. Already tried 9 time(s).
2013-01-16 18:37:02,256 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:root cause:java.net.ConnectException: Call to bida/127.0.0.1:9100 failed on connection exception: java.net.ConnectException: Connection refused
2013-01-16 18:37:02,257 INFO org.apache.hadoop.mapred.JobTracker: problem cleaning system directory: null
java.net.ConnectException: Call to bida/127.0.0.1:9100 failed on connection exception: java.net.ConnectException: Connection refused
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:1099)
        at org.apache.hadoop.ipc.Client.call(Client.java:1075)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
        at $Proxy5.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)

This problem is caused by leftover junk data; delete the old data directories and reformat.


Finally, visit

http://192.168.9.228:50070/dfshealth.jsp

The two directories configured above are now listed there, and the NameNode status can be checked. (Screenshot omitted.)


The MapReduce side shows one active node at:

http://192.168.9.228:50030/jobtracker.jsp

(Screenshot omitted.)


After startup, the following processes should be visible:

[root@bida bin]# jps
1581 JobTracker
1070 NameNode
1324 DataNode
2342 Jps
1466 SecondaryNameNode
1723 TaskTracker


The following problem is caused by Chinese (non-UTF-8) characters in the configuration files; removing the Chinese descriptions fixes it:

redmap-master: Exception in thread "main" java.lang.RuntimeException: com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.
redmap-master:  at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1560)
redmap-master:  at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1425)
redmap-master:  at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1371)


The following problem is caused by not having run the Hadoop format command:

13/01/20 09:29:45 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9100. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)

Run ./hadoop namenode -format, then start Hadoop again.


Another issue: after configuration was complete, Hadoop commands could not be run successfully from the DataNode, so I checked the log files on the NameNode server.

hadoop-root-namenode-bida.log contained the exception java.io.IOException: File jobtracker.info could only be replicated to 0 nodes, instead of 1, which means the NameNode and the DataNodes could not communicate and data could not be replicated. The same file also showed: connection to bida/192.168.9.228:9000 failed.

What is bida/192.168.9.228:9000? The second part is the local IP and the first is the hostname, so why were they mapped together? A look at the hosts file showed an extra entry, 127.0.0.1 bida localhost; removing bida from that line fixed the problem. It is recorded here for reference.
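
After the fix, /etc/hosts on bida looks roughly like this (a sketch based on the addresses used in this post):

127.0.0.1       localhost
192.168.9.228   bida
192.168.9.229   bidb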


On CentOS, the hostname is changed in the /etc/sysconfig/network file.
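
For reference, the relevant lines in /etc/sysconfig/network look like this (a sketch, using the hostname bida from the example above; also run hostname bida if you want the change to take effect without a reboot):

NETWORKING=yes
HOSTNAME=bida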

