Eight Major Steps for Configuring Hadoop

Latest recommended article published on 2025-07-16 14:41:34

影湛_SK

License: CC 4.0 BY-SA

Category: Hadoop. Tags: Hadoop configuration, hadoop

Article link: https://blog.youkuaiyun.com/sk__________________/article/details/12789781

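This article walks through setting up a Hadoop cluster: preparing the environment, configuring hostnames and IP bindings, setting up passwordless SSH, editing the key configuration files, and starting the cluster.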

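Configuring Hadoop in standalone, pseudo-distributed, or fully distributed mode is largely the same process, and it breaks down into the following major steps (a fully distributed cluster is used as the example throughout):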

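0. Preparation

a) Prepare at least two machines running Linux.

b) Install the JDK and export the related environment variables, mainly JAVA_HOME and PATH.

c) Download and unpack Hadoop, and export the related environment variables, mainly PATH.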
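As a rough sketch of items b) and c), the exports might look like the following, typically appended to ~/.bashrc or /etc/profile. Both paths are assumptions for illustration only; HADOOP_INSTALL is the variable name this article uses in step 5.

```shell
# Assumed locations; replace them with the real JDK and Hadoop directories.
export JAVA_HOME="/usr/lib/jvm/java"          # hypothetical JDK path
export HADOOP_INSTALL="/home/hadoop/hadoop"   # hypothetical unpack directory

# Put the java and hadoop launchers on PATH.
export PATH="$JAVA_HOME/bin:$HADOOP_INSTALL/bin:$PATH"
```

After opening a new shell, java -version and which hadoop should both resolve if the two paths are correct.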

1. Change the hostname in /etc/hostname (this step is optional)

Assume the two machines are named master and slave

(master acts as NameNode, SecondaryNameNode, JobTracker, DataNode, and TaskTracker; slave acts as DataNode and TaskTracker).

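2. Edit the IP bindings in /etc/hosts (the entries must be identical and correct on both machines)

Add two lines: xxx.xxx.xxx(MASTER_IP) master

xxx.xxx.xxx(SLAVE_IP) slave

(Note: if /etc/hosts contains a line such as [127.0.x.x master/slave], it is best to comment it out. Otherwise the hostname resolves to a loopback address and the DataNode will not run properly.)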
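The edit described above can be sketched as follows. To stay harmless, the sketch works on a scratch file rather than the real /etc/hosts, and the two IP addresses are made-up placeholders, not values from this article.

```shell
MASTER_IP="192.168.1.10"   # hypothetical address
SLAVE_IP="192.168.1.11"    # hypothetical address

# Scratch file with sample content standing in for an existing /etc/hosts.
hosts_file="$(mktemp)"
printf '127.0.0.1 localhost\n127.0.1.1 master\n' > "$hosts_file"

# Comment out loopback lines (127.x.x.x) that name master or slave.
sed -E '/^127\.[0-9.]+[[:space:]]+(.*[[:space:]])?(master|slave)([[:space:]]|$)/ s/^/#/' \
  "$hosts_file" > "$hosts_file.new"
mv "$hosts_file.new" "$hosts_file"

# Append the real bindings, one host per line.
printf '%s master\n%s slave\n' "$MASTER_IP" "$SLAVE_IP" >> "$hosts_file"

cat "$hosts_file"
```

On a real machine the same two steps would be applied to /etc/hosts as root, on both master and slave.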

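3. Create a Linux user (on both machines; this step is also optional)

Use groupadd and useradd to create a hadoop user in a hadoop group.

(It is advisable to add hadoop to the adm and sudo groups for easier system-level administration, using useradd -G or gpasswd.)

su hadoop (switch to the hadoop user)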
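One possible command sequence for this step is sketched below. Because the real commands need root, the sketch wraps them in a hypothetical run helper that only prints each command unless DRY_RUN is set to 0. The -m and -s options are additions for illustration, and the adm and sudo groups are assumed to exist on the system.

```shell
# DRY_RUN=1 (the default here) only prints commands; DRY_RUN=0 executes them (needs root).
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

run groupadd hadoop                                        # create the hadoop group
run useradd -m -s /bin/bash -g hadoop -G adm,sudo hadoop   # user with home dir and extra groups
run passwd hadoop                                          # set a login password
```

Repeat the same commands on slave so that the hadoop account exists on both machines.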

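4. Configure SSH (only master needs to be configured)

The purpose of this step is to let the NameNode node (master) reach the DataNode node (slave) without typing a password.

Run ssh-keygen -t rsa (RSA is a public-key algorithm) and accept the defaults for everything else (empty passphrase, home directory /home/hadoop). This creates a .ssh directory under the home directory containing id_rsa and id_rsa.pub, the private and public key files respectively. As the names suggest, the public key file may be shared freely. man ssh-copy-id explains what to do next.

cp /home/hadoop/.ssh/id_rsa.pub /home/hadoop/.ssh/authorized_keys (note that cp overwrites any existing authorized_keys file)

ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub hadoop@slave (appends master's public key to slave's authorized_keys file)

To test, ssh master and ssh slave should now log in without prompting for a password.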
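The key setup above can also be done non-interactively. The following is a minimal sketch that uses a scratch directory in place of /home/hadoop/.ssh so it can be tried without touching real keys. It appends to authorized_keys instead of copying over it, and the permission values are the conventional ones rather than something stated in this article.

```shell
# Scratch directory standing in for /home/hadoop/.ssh (assumption for a safe demo).
ssh_dir="$(mktemp -d)/.ssh"
mkdir -p "$ssh_dir"
chmod 700 "$ssh_dir"   # keep the directory private to its owner

# -N "" sets an empty passphrase, -f names the key file, -q suppresses the banner.
ssh-keygen -t rsa -N "" -f "$ssh_dir/id_rsa" -q

# Append rather than overwrite, so keys already authorized are preserved.
cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
chmod 600 "$ssh_dir/authorized_keys"

# On the real master, the public key would then be pushed to slave:
#   ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub hadoop@slave
```

If passwordless login still prompts for a password, overly permissive modes on the home directory, .ssh, or authorized_keys are a common cause worth checking.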

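5. Configure Hadoop (in Hadoop 1.X, all configuration files are under $HADOOP_INSTALL/conf/)

First make sure that java -version and which hadoop both produce correct output.

Assume Hadoop is unpacked under /home/hadoop/. Configure master first, then copy the result to slave with scp -r $HADOOP_INSTALL hadoop@slave:/home/hadoop/.

a) masters file: set the file's content to master

b) slaves file: set the file's content to slave (if master should also run a DataNode and TaskTracker, as assumed in step 1, list master on its own line as well)

c) hadoop-env.sh: find the export JAVA_HOME line, uncomment it, and set JAVA_HOME to your own JAVA_HOME value

d) core-site.xml (place the properties inside the configuration element; the same applies to the files below)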

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://master:54310</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
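To make the placement inside the configuration element concrete, here is a sketch that writes a complete core-site.xml with the two properties above (descriptions omitted for brevity). It writes into a scratch directory, which is an assumption standing in for $HADOOP_INSTALL/conf.

```shell
conf_dir="$(mktemp -d)"   # scratch stand-in for $HADOOP_INSTALL/conf

# The quoted 'EOF' keeps the shell from expanding anything inside the XML.
cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:54310</value>
  </property>
</configuration>
EOF

# Quick sanity check: count the lines holding the opening and closing configuration tags.
grep -c 'configuration>' "$conf_dir/core-site.xml"
```

The other site files follow the same pattern, with their own property elements between the configuration tags.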

e) hdfs-site.xml

<property>
  <name>dfs.replication</name>
  <value>2</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>/home/hadoop/hdfs/name</value>
  <description>NameNode hdfs directory.</description>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/home/hadoop/hdfs/data</value>
  <description>DataNode hdfs directory.</description>
</property>

f) mapred-site.xml

<property>
  <name>mapred.job.tracker</name>
  <value>master:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>
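Items a) through c) of this step can be scripted as well. The sketch below is one way to do it, run against a scratch directory instead of the real conf directory. The JDK path and the sample commented-out line are assumptions made for the demo.

```shell
conf_dir="$(mktemp -d)"         # scratch stand-in for $HADOOP_INSTALL/conf
java_home="/usr/lib/jvm/java"   # hypothetical JDK path

# Stand-in for the commented-out JAVA_HOME line found in hadoop-env.sh.
echo '# export JAVA_HOME=/path/to/jdk' > "$conf_dir/hadoop-env.sh"

# a) and b): one hostname per line.
echo master > "$conf_dir/masters"
echo slave  > "$conf_dir/slaves"

# c): uncomment the JAVA_HOME line and point it at the local JDK.
sed -E "s|^#[[:space:]]*export JAVA_HOME=.*|export JAVA_HOME=$java_home|" \
  "$conf_dir/hadoop-env.sh" > "$conf_dir/hadoop-env.sh.new"
mv "$conf_dir/hadoop-env.sh.new" "$conf_dir/hadoop-env.sh"

cat "$conf_dir/hadoop-env.sh"

# Once every file is in place on the real master, the tree is copied to slave:
#   scp -r $HADOOP_INSTALL hadoop@slave:/home/hadoop/
```

Copying the whole directory only after all six files are edited keeps master and slave on identical configuration.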