I. Install vim
sudo apt-get install vim
II. Change the hostname (Master, Slave1, Slave2, ...)
sudo vim /etc/hostname
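For example, on the master node /etc/hostname would contain just the single line:
Master
The new name takes effect after a reboot (or immediately with sudo hostname Master).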
III. Edit the hostname resolution file
sudo vim /etc/hosts
127.0.0.1 localhost
113.55.112.52 Master
113.55.112.7 Slave1
113.55.112.44 Slave2
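To confirm that the entries resolve, each node can be pinged by hostname, e.g.:
ping -c 3 Slave1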
IV. Create the hadoop group and hadoop user
1. Create the hadoop user group
sudo addgroup hadoop-group
2. Create the hadoop user and add it to hadoop-group
sudo adduser --ingroup hadoop-group hadoop
3. Give the hadoop user the same privileges as root
sudo vim /etc/sudoers
root ALL=(ALL:ALL) ALL
hadoop ALL=(ALL:ALL) ALL
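A safer alternative to editing /etc/sudoers directly is visudo, which validates the syntax before saving:
sudo visudo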
V. Install the JDK
1. Copy the JDK into the installation directory
1-1. Create a java directory under /usr/local
cd /usr/local
sudo mkdir java
1-2. Extract jdk-8u40-linux-i586.tar.gz into the target directory
sudo tar -xzvf jdk-8u40-linux-i586.tar.gz -C /usr/local/java
2. Configure the environment variables
2-1. Open the /etc/profile file
sudo vim /etc/profile
Note: /etc/profile is the system-wide configuration file; once set, it applies to all users. ~/.bashrc is a per-user configuration file and only affects the current user.
2-2. Add the following variables
# /etc/profile: system-wide .profile file for the Bourne shell (sh(1))
# and Bourne compatible shells (bash(1), ksh(1), ash(1), ...).
#set java environment
export JAVA_HOME=/usr/local/java/jdk1.8.0_40
export JRE_HOME=/usr/local/java/jdk1.8.0_40/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
2-3. Make the file take effect
source /etc/profile
3. Verify the installation
java -version
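To double-check that the variables were picked up, you can also run, for example:
echo $JAVA_HOME
which java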
VI. Install SSH
1. Update apt-get
sudo apt-get update
2. Install the openssh-server service
sudo apt-get install openssh-server
3. Set up passwordless SSH login
3-1. Create an SSH key
ssh-keygen -t rsa -P ""
3-2. On the Master, go into ~/.ssh/ and append id_rsa.pub to the authorized_keys file
cd ~/.ssh
cat id_rsa.pub >> authorized_keys
or
cp id_rsa.pub authorized_keys
3-3. On each Slave, copy the Master's authorized_keys into ~/.ssh/ (for example with scp, as sketched after these commands), then apply the following settings
chmod 600 authorized_keys            # set the file permissions to 600
chgrp hadoop-group authorized_keys   # change the file's group
chown hadoop authorized_keys         # change the file's owner
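One way to get authorized_keys from the Master onto each Slave is scp (a sketch; it assumes the hadoop user already exists on the slaves and that password login is still allowed at this point):
ssh hadoop@Slave1 'mkdir -p ~/.ssh'
scp ~/.ssh/authorized_keys hadoop@Slave1:~/.ssh/
Repeat the same two commands for Slave2.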
3-4. Turn off the firewall and reboot: install gufw, disable the firewall for the Home, Office and Public profiles, and also set ufw logging to off in Preferences
sudo apt-get install gufw
sudo gufw
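If you prefer the command line to the gufw GUI, ufw itself can be turned off directly:
sudo ufw disable
sudo ufw status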
3-5. Log in to localhost
ssh localhost
3-6. Run the exit command
exit
3-7. Log in to a Slave
ssh Slave1
3-8. Run the exit command
exit
VII. Install Hadoop
1. Extract the archive into the home directory (~) and rename it to hadoop
sudo tar -xzvf hadoop-2.6.0.tar.gz -C ~/
sudo mv ~/hadoop-2.6.0 ~/hadoop
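Since the archive was extracted with sudo, the files end up owned by root; handing them to the hadoop user created in section IV avoids permission problems later:
sudo chown -R hadoop:hadoop-group ~/hadoop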
2. Edit the Hadoop configuration files in the ${HADOOP_HOME}/etc/hadoop/ directory
2-1. Set the Java installation directory in hadoop-env.sh
export JAVA_HOME=/usr/local/java/jdk1.8.0_40
2-2. Edit core-site.xml and add the following content
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://Master:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>hadoop.proxyuser.hduser.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hduser.groups</name>
<value>*</value>
</property>
</configuration>
2-3. Edit hdfs-site.xml and add the following content
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>Master:9001</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
2-4. Rename mapred-site.xml.template to mapred-site.xml (a sample rename command follows the XML below) and add the following content
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>Master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>Master:19888</value>
</property>
</configuration>
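The rename mentioned at the start of step 2-4 can be done in ${HADOOP_HOME}/etc/hadoop with, for example:
cp mapred-site.xml.template mapred-site.xml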
2-5. Edit yarn-site.xml and add the following content
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>Master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>Master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>Master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>Master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>Master:8088</value>
</property>
</configuration>
3. To make the hadoop command and scripts such as start-all.sh easy to run from anywhere, add the following to /etc/profile on every node:
export HADOOP_HOME=/home/hadoop/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
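As with the JDK variables earlier, reload the file so the change takes effect in the current shell:
source /etc/profile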
4. Add Slave1 and Slave2 to the slaves file (see below)
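The slaves file is located in ${HADOOP_HOME}/etc/hadoop/ and lists one worker hostname per line; with the hostnames used above it would contain:
Slave1
Slave2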
5. Copy hadoop to each node
sudo scp -r /home/hadoop/hadoop Slave1:/home/hadoop/
sudo scp -r /home/hadoop/hadoop Slave2:/home/hadoop/
6. Format HDFS on the Master
bin/hdfs namenode -format
7. Start HDFS on the Master; after this command runs, a dfs folder is created automatically under the home directory of each slave node
sbin/start-dfs.sh
8. Start YARN
sbin/start-yarn.sh
9. After HDFS and YARN have started, running jps on the Master shows the following processes:
10308 NameNode
10583 SecondaryNameNode
11255 Jps
10971 ResourceManager
10. Running jps on a slave shows the following processes:
19217 NodeManager
19474 Jps
18869 DataNode
11. If startup fails, check whether the clusterID in dfs/data/current/VERSION is identical on every node (including the Master); if not, make them consistent. To reformat HDFS, first delete the dfs folder on all nodes.
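A quick way to compare the IDs, using the paths configured in hdfs-site.xml above:
grep clusterID /home/hadoop/dfs/name/current/VERSION   # on the Master
grep clusterID /home/hadoop/dfs/data/current/VERSION   # on every DataNode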
VIII. Testing
1. Open http://Master:50070 (or http://localhost:50070 on the Master itself) in a browser to view the Hadoop administration page:
Overview 'Master:9000' (active)
Started: Sun May 17 13:54:04 CST 2015
Version: 2.6.0, re3496499ecb8d220fba99dc5ed4c99c8f9e33bb1
Compiled: 2014-11-13T21:10Z by jenkins from (detached from e349649)
Cluster ID: CID-4cef2df9-1ae9-4409-9486-a71254487f7e
Block Pool ID: BP-786922400-113.55.112.196-1431841994289
Datanode Information
In operation
Node Last contact Admin State Capacity Used Non DFS Used Remaining Blocks Block pool used Failed Volumes Version
slave1 (113.55.112.190:50010) 2 In Service 458.23 GB 24 KB 24.13 GB 434.1 GB 0 24 KB (0%) 0 2.6.0
Slave2 (113.55.112.44:50010) 2 In Service 458.23 GB 24 KB 23.96 GB 434.27 GB 0 24 KB (0%) 0 2.6.0
2. View DataNode information with bin/hdfs dfsadmin -report:
hadoop@Master:~/hadoop$ bin/hdfs dfsadmin -report
Configured Capacity: 984037638144 (916.46 GB)
Present Capacity: 932398043136 (868.36 GB)
DFS Remaining: 932397993984 (868.36 GB)
DFS Used: 49152 (48 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Live datanodes (2):
Name: 113.55.112.190:50010 (slave1)
Hostname: slave1
Decommission Status : Normal
Configured Capacity: 492018819072 (458.23 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 25911414784 (24.13 GB)
DFS Remaining: 466107379712 (434.10 GB)
DFS Used%: 0.00%
DFS Remaining%: 94.73%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun May 17 14:12:42 CST 2015
Name: 113.55.112.44:50010 (Slave2)
Hostname: Slave2
Decommission Status : Normal
Configured Capacity: 492018819072 (458.23 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 25728180224 (23.96 GB)
DFS Remaining: 466290614272 (434.27 GB)
DFS Used%: 0.00%
DFS Remaining%: 94.77%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Sun May 17 14:12:42 CST 2015
hadoop@Master:~/hadoop$
3. Other dfsadmin commands:
3.1 Decommission the DataNode datanodename. In Hadoop 2.x there is no -decommission option; instead, add the hostname to the exclude file referenced by dfs.hosts.exclude in hdfs-site.xml, then run:
bin/hdfs dfsadmin -refreshNodes
3.2 Put the cluster into safe mode
bin/hdfs dfsadmin -safemode enter
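To bring the cluster back out of safe mode afterwards:
bin/hdfs dfsadmin -safemode leave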
3.3 List all currently supported commands
bin/hdfs dfsadmin -help