Single Node Setup
Purpose
Prerequisites
Supported Platforms
- GNU/Linux is supported as a development and production platform. Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.
- Win32 is supported as a development platform. Distributed operation has not been well tested on Win32, so it is not supported as a production platform.
Required Software
Required software for Linux and Windows includes:
- Java™ 1.6.x, preferably from Sun, must be installed.
- ssh must be installed and sshd must be running in order to use the Hadoop scripts that manage remote Hadoop daemons.
Additional requirements for Windows include:
- Cygwin - Required for shell support in addition to the required software above.
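On GNU/Linux you can quickly sanity-check both requirements before continuing; the exact commands vary slightly by distribution, but something like the following works on most systems:
$ java -version        # should report a 1.6.x JVM
$ ssh -V               # the ssh client is installed
$ ps -e | grep sshd    # sshd is running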
Installing Software
For example, on Ubuntu Linux:
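On Ubuntu, ssh (client and server) can be installed with apt-get; rsync, which some of the Hadoop scripts can use, is typically installed the same way. Installing Sun Java 1.6 is release-specific and not shown here:
$ sudo apt-get install ssh
$ sudo apt-get install rsync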
Download
Prepare to Start the Hadoop Cluster
Unpack the downloaded Hadoop distribution. In the unpacked directory, edit the file conf/hadoop-env.sh and set JAVA_HOME to the root of your Java JDK installation.
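For instance, if the Sun JDK is installed under /usr/lib/jvm/java-6-sun (an example path; substitute the location on your machine), the line in conf/hadoop-env.sh would read:
export JAVA_HOME=/usr/lib/jvm/java-6-sun
Afterwards, running bin/hadoop with no arguments should print the usage message for the hadoop command script.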
Now you are ready to start your Hadoop cluster in one of the three supported modes:
Standalone Operation
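By default Hadoop is configured to run in non-distributed mode, as a single Java process, which is useful for debugging. A minimal standalone run, reusing the unpacked conf directory as input and the same examples jar used later in this document, looks like this:
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
$ cat output/*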
Pseudo-Distributed Operation
Hadoop can also be run on a single node in pseudo-distributed mode, where each Hadoop daemon runs in a separate Java process.
Configuration
Use the following:
conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
Setup passphraseless ssh
Now check that you can ssh to the localhost without a passphrase:
$ ssh localhost
If you cannot ssh to localhost without a passphrase, execute the following commands:
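For example, by generating a passphraseless dsa key and adding it to your authorized keys (an rsa key works equally well):
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys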
Execution
Format a new distributed-filesystem:
$ bin/hadoop namenode -format
Start the Hadoop daemons:
$ bin/start-all.sh
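To confirm that the daemons actually started, jps from the JDK should list NameNode, SecondaryNameNode, DataNode, JobTracker and TaskTracker processes:
$ jps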
The Hadoop daemon log output is written to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).
Browse the web interface for the NameNode and the JobTracker; by default they are available at:
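- NameNode - http://localhost:50070/
- JobTracker - http://localhost:50030/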
Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input
Run some of the examples provided:
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*
View the output files on the distributed filesystem:
$ bin/hadoop fs -cat output/*
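When you are done, stop the daemons with the matching stop script:
$ bin/stop-all.sh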