I recently spent a few days finally getting a Hadoop cluster up and running. The cluster information is as follows:
Hostname | Role | jps output | Install path
centos-1 | master | 2612 NameNode, 3017 TaskTracker, 3129 Jps, 2821 SecondaryNameNode, 2907 JobTracker | /opt/soft/hadoop1.0.3
centos-2 | slave | 4146 DataNode, 4222 Jps, 3998 TaskTracker | /opt/soft/hadoop1.0.3
centos-3 | slave | 3968 TaskTracker, 4208 Jps, 4132 DataNode | /opt/soft/hadoop1.0.3
1. Install the Java runtime environment
Download JDK 1.7 and extract it to /opt/soft/jdk1.7, then configure the environment variables in /etc/profile:
#Java
export JAVA_HOME=/opt/soft/jdk1.7
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
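After saving /etc/profile, reload it and confirm the JDK is picked up (paths as above):
source /etc/profile
java -version    # should report version 1.7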
2. Install hadoop 1.0.3
Download hadoop 1.0.3 and extract it to /opt/soft/hadoop1.0.3, then configure the Hadoop environment variables in /etc/profile:
#hadoop
export HADOOP_HOME=/opt/soft/hadoop1.0.3
export PATH=$HADOOP_HOME/bin:$PATH
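Reload the profile again and check that the hadoop command resolves:
source /etc/profile
hadoop version    # should print Hadoop 1.0.3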
Open the /opt/soft/hadoop1.0.3/conf/hadoop-env.sh file, uncomment the JAVA_HOME line, and change its value to /opt/soft/jdk1.7.
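After the edit, the JAVA_HOME line in conf/hadoop-env.sh should end up looking roughly like this:
export JAVA_HOME=/opt/soft/jdk1.7    # was commented out in the stock file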
At this point the standalone (single-node) Hadoop installation is complete; at this stage Hadoop's configuration files are still empty. Next comes the pseudo-distributed setup.
3. Pseudo-distributed installation of Hadoop on centos-1
1) Modify the core-site.xml file
Edit the /opt/soft/hadoop1.0.3/conf/core-site.xml file and add the following:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/hadoop</value>
  <description>A base for other temporary directories.</description>
</property>
<!-- Specifies the default file system protocol used by the dfs command module -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://centos-1:9000</value>
  <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>/hadoop/name</value>
  <description>Determines where on the local filesystem the DFS name node should store the name table. If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.</description>
</property>
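One thing the snippets here leave implicit: in each of these conf files, the <property> elements must sit inside the file's <configuration> root element, roughly:
<configuration>
  <property> ... </property>
</configuration>
Also, since hadoop.tmp.dir and dfs.name.dir point under /hadoop, it does no harm to create that base directory up front (the namenode format step later will populate /hadoop/name):
mkdir -p /hadoop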
2) Modify the hdfs-site.xml file
Edit the /opt/soft/hadoop1.0.3/conf/hdfs-site.xml file and add the following:
<property>
  <name>dfs.data.dir</name>
  <value>/hadoop/data</value>
  <description>Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.</description>
</property>
<!-- Default number of block replicas -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
</property>
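Since the description above says directories that do not exist are ignored, it is safer to create the data directory beforehand on every node that will run a DataNode (path as configured above):
mkdir -p /hadoop/data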
3) Modify the mapred-site.xml file
Edit the /opt/soft/hadoop1.0.3/conf/mapred-site.xml file and add the following:
<property>
  <name>mapred.job.tracker</name>
  <value>centos-1:9001</value>
  <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
</property>
4) Modify the masters file under the conf folder and write the master node's address into it: centos-1
5) Modify the slaves file under the conf folder and add all the slave nodes:
centos-1
centos-2
centos-3
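For reference, both files can also be written straight from the shell (assuming HADOOP_HOME is set as in section 2):
echo centos-1 > $HADOOP_HOME/conf/masters
printf 'centos-1\ncentos-2\ncentos-3\n' > $HADOOP_HOME/conf/slaves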
6) Modify the /etc/hosts file and write the IP mappings of the master and slave nodes into it:
192.168.1.20 centos-1
192.168.1.21 centos-2
192.168.1.22 centos-3
If you skip adding the slave nodes' IPs in steps 5 and 6 above, what you end up with is a pseudo-distributed installation.
4. Distributed installation
Copy the files modified above to the corresponding directories on the slave nodes (a sketch follows); after that, passwordless SSH login still needs to be set up.
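A minimal sketch of that copy step, assuming each slave already has the same /opt/soft/hadoop1.0.3 layout (scp will still prompt for passwords until the SSH keys below are installed):
for host in centos-2 centos-3; do
  scp -r $HADOOP_HOME/conf $host:$HADOOP_HOME/    # push the whole conf directory
done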
First go into the /root directory and run ssh-keygen -t rsa, then cd .ssh to enter the hidden .ssh directory, then run cp id_rsa.pub authorized_keys to create the authorized_keys file, and finally copy that file to the corresponding directory on each slave node:
scp authorized_keys centos-2:/root/.ssh
scp authorized_keys centos-3:/root/.ssh
Then log in with ssh centos-2 and ssh centos-3; after entering the password the first time, later logins no longer require one.
Note: the .ssh directory's permissions must be 700, and the files inside .ssh are best set to 600.
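If logins still ask for a password, set those permissions explicitly on every node:
chmod 700 /root/.ssh
chmod 600 /root/.ssh/authorized_keys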
5. At this point, the distributed Hadoop environment is ready
Before using it for the first time, run: hadoop namenode -format
start-all.sh    # start the cluster
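To verify the cluster came up, run jps on each node and compare with the table at the top of this article; the NameNode and JobTracker web UIs should also respond on the Hadoop 1.x default ports, http://centos-1:50070 and http://centos-1:50030. As a quick end-to-end smoke test, the examples jar bundled with the release can be run (jar name assumed from the 1.0.3 tarball):
hadoop jar $HADOOP_HOME/hadoop-examples-1.0.3.jar pi 2 10    # estimate pi with 2 maps, 10 samples each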
#####Note: the Hadoop setup in this article was done as the root user; if you switch to another user, the Hadoop installation and the passwordless SSH configuration must be done under that same user########
#####If the DataNode and TaskTracker on the slave nodes do not start after running start-all.sh, please turn off the firewall on each node########
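On CentOS 5/6 (assumed here), the iptables firewall can be stopped and kept off like this:
service iptables stop      # stop the firewall immediately
chkconfig iptables off     # keep it disabled across reboots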