Quick Install Hadoop on Windows


For Mahout, today I installed Hadoop on my PC. Here is the installation guide; hope it is useful :)

Required Software
1. Java 1.6.x
2. Cygwin: a Linux-like environment for Windows; it is required for shell support in addition to the software above.
3. SSH must be installed and SSHD must be running to use the Hadoop scripts that manage remote Hadoop daemons.
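
A quick way to check these prerequisites from the bash shell is a small script like the following (a sketch; it only reports what is reachable on your PATH):

```shell
# Check that the required tools are reachable from the shell.
# Prints where each tool was found, or a warning if it is missing.
for tool in java ssh sshd; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found at $(command -v "$tool")"
  else
    echo "$tool: NOT FOUND - install it before continuing"
  fi
done
# Hadoop 0.17 needs Java 1.6.x; this prints the installed version:
java -version 2>&1 | head -n 1
```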

Install Cygwin:
1. Download setup.exe from http://www.cygwin.com/
2. Select "Install from Internet", and specify the download folder and install folder; for downloading, please select a nearby mirror.
3. After the component list is downloaded, search for "SSH" (it is in the Net category) and change the default "Skip" to a version of OpenSSH.
4. Download and install the components.


After installation, you will see a Cygwin icon on your desktop; run it to get a bash shell that provides a Linux-like environment on Windows.
Your Linux filesystem lives under the %Your_Cygwin_Install% folder.

SSH Configuration:

1. Add System Environment Variables:
A. Add a new system environment variable named CYGWIN with the value 'ntsec tty'.
B. Edit the PATH system environment variable and add your 'Cygwin/bin' folder to it.
2. Config SSH
A. Change to the bin folder: "cd /bin"
B. Execute the configuration command: "ssh-host-config -y". When the "CYGWIN=" prompt comes up, enter "ntsec tty". After this, the SSH service has been installed as a Windows service; please restart your computer.

C. Change to the home folder under your Cygwin install folder; you will see that a folder named after your Windows user account has been generated.
D. Execute the connect command: "ssh yourname@127.0.0.1"
If you connect successfully, your configuration is correct; it prints something like "Last login: Sun Jun 8 19:47:14 2008 from localhost".
If your connection fails, you may need to allow SSH through your firewall; the default SSH port is 22.
E. If you are asked for a password every time you connect, set up key-based authentication with an empty passphrase:
"ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa"
"cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys"
After this, you will no longer be asked for a password when connecting through SSH.
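
The key setup can be rehearsed safely in a throwaway directory first (a sketch; it uses an RSA key because DSA key generation is disabled in recent OpenSSH releases, and it does not touch your real ~/.ssh):

```shell
# Generate a passphrase-less key pair into a temporary directory,
# then append the public key to an authorized_keys file -- the same
# two steps as above, without modifying your real configuration.
KEYDIR=$(mktemp -d)
ssh-keygen -q -t rsa -P '' -f "$KEYDIR/id_rsa"
cat "$KEYDIR/id_rsa.pub" >> "$KEYDIR/authorized_keys"
ls "$KEYDIR"          # shows the generated key files
rm -rf "$KEYDIR"      # clean up the rehearsal
```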

Hadoop Install and Configuration
1. Download the Hadoop ".tar.gz" file and extract it under your Cygwin file system; /usr/local is suggested.
2. Configure hadoop-env.sh under the hadoop/conf folder:
export JAVA_HOME=<Your Java Location>   # putting Java under the Cygwin tree (or another path without spaces) makes the location easier to specify
export HADOOP_IDENT_STRING=MYHADOOP
After the configuration, you can verify your installation with the following commands:
cd /usr/local/hadoop
bin/hadoop version
It should print out:
Hadoop 0.17.0
Subversion http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.17 -r 656523
Compiled by hadoopqa on Thu May 15 07:22:55 UTC 2008
3. Hadoop can also be run on a single node in a pseudo-distributed mode where each Hadoop daemon runs in a separate Java process.
For the "Pseudo-Distributed Operation" mode, you need to do the following configurations:
A. in conf/core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
B. in conf/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
C. in conf/mapred-site.xml:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
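
The three files above follow the same pattern and can be written from the Cygwin shell with here-documents (a sketch, shown for core-site.xml only; run it from the Hadoop install directory):

```shell
# Write conf/core-site.xml from the shell; hdfs-site.xml and
# mapred-site.xml are created the same way with their own
# name/value pairs.
mkdir -p conf
cat > conf/core-site.xml <<'EOF'
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
EOF
grep -c '<name>' conf/core-site.xml   # 1 property configured
```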
4. Execution
A. Format a new distributed filesystem:
$ bin/hadoop namenode -format
B. Start the hadoop daemons:
$ bin/start-all.sh   (you can also start them individually: $ bin/start-dfs.sh and $ bin/start-mapred.sh)
The hadoop daemon log output is written to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).
C. Browse the web interface for the NameNode and the JobTracker; by default they are available at:
* NameNode - http://localhost:50070/
* JobTracker - http://localhost:50030/
D. Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input
E. Run some of the examples provided:
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
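
The example job scans the files copied into HDFS for matches of the regular expression 'dfs[a-z.]+'. You can preview what that pattern matches locally with plain grep (assuming a grep that supports -E and -o):

```shell
# The pattern matches 'dfs' followed by lowercase letters or dots,
# i.e. Hadoop property names such as dfs.replication; names without
# a 'dfs' substring produce no match.
printf 'dfs.replication\nfs.default.name\nmapred.job.tracker\n' | grep -Eo 'dfs[a-z.]+'
# prints: dfs.replication
```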
F. Examine the output files:
Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*
   or
View the output files on the distributed filesystem:
$ bin/hadoop fs -cat output/*
G. When you're done, stop the daemons with:
$ bin/stop-all.sh
