Hadoop Cluster Configuration and Startup

This article walks through configuring a Hadoop cluster: setting up the core configuration files (core-site.xml, hdfs-site.xml, and so on), syncing the configuration to every node with scp, and finally starting the cluster.


Prerequisites

  • Passwordless SSH login among the three Linux machines (or VMs); this article uses the root user
  • A running ZooKeeper cluster
  • Java environment variables (JAVA_HOME etc.) configured
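The passwordless-SSH prerequisite can be set up roughly as follows. This is a dry-run sketch: the hostnames node1..node3 are carried over from this article's examples, and the commands are only echoed; drop the leading `echo` to execute them.

```shell
# Dry run of passwordless-SSH setup from node1 (remove `echo` to execute).
# Generate a key only if one does not exist yet, then push it to every node.
test -f ~/.ssh/id_rsa || echo ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
for host in node1 node2 node3; do
  echo ssh-copy-id root@"$host"
done
```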

Configuration

  • Configure everything on one machine first, then copy the files to the other machines with scp
  • All of the files below live under hadoop/etc/hadoop
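The copy step can be sketched as below. The hostnames node2/node3 and the install path /usr/local/hadoop are assumptions taken from this article's own examples; the scp commands are echoed as a dry run, so remove the `echo` to actually copy.

```shell
# Dry run: sync the edited Hadoop config directory to the other nodes.
HADOOP_CONF=/usr/local/hadoop/etc/hadoop
for host in node2 node3; do
  # Remove `echo` to perform the copy over the passwordless-SSH link.
  echo scp -r "$HADOOP_CONF" root@"$host":/usr/local/hadoop/etc/
done
```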

core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>

	<!-- Default file system (fs.defaultFS replaces the deprecated fs.default.name) -->
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://node1:8020</value>
	</property>

	<!-- Base directory for temporary files -->
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/usr/local/hadoop/data/tmp</value>
	</property>

	<!-- I/O buffer size in bytes; tune to your workload -->
	<property>
		<name>io.file.buffer.size</name>
		<value>4096</value>
	</property>

	<!-- Trash retention in minutes (10080 minutes = 7 days) -->
	<property>
		<name>fs.trash.interval</name>
		<value>10080</value>
	</property>
	
</configuration>

hadoop-env.sh

export JAVA_HOME=/usr/local/jdk

# Hadoop 3.0 tightened permission checks: declare which user runs each daemon
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root

hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
	
	<!-- NameNode web UI address (the default port changed to 9870 in Hadoop 3.0) -->
	<property>
		<name>dfs.namenode.http-address</name>
		<value>node1:9870</value>
	</property>
	<!-- SecondaryNameNode web UI address -->
	<property>
		<name>dfs.namenode.secondary.http-address</name>
		<value>node2:9868</value>
	</property>

	<!-- NameNode / DataNode data directories (comma-separated, no spaces) -->
	<property>
		<name>dfs.namenode.name.dir</name>
		<value>file:///usr/local/hadoop/data/namenodeData,file:///usr/local/hadoop/data/namenodeData2</value>
	</property>
	<property>
		<name>dfs.datanode.data.dir</name>
		<value>file:///usr/local/hadoop/data/datanodeData,file:///usr/local/hadoop/data/datanodeData2</value>
	</property>
	
	<!-- NameNode edit log directory -->
	<property>
		<name>dfs.namenode.edits.dir</name>
		<value>file:///usr/local/hadoop/data/nn/edits</value>
	</property>

	<!-- SecondaryNameNode checkpoint directories -->
	<property>
		<name>dfs.namenode.checkpoint.dir</name>
		<value>file:///usr/local/hadoop/data/snn/name</value>
	</property>
	<property>
		<name>dfs.namenode.checkpoint.edits.dir</name>
		<value>file:///usr/local/hadoop/data/snn/edits</value>
	</property>
	
	<!-- Number of block replicas -->
	<property>
		<name>dfs.replication</name>
		<value>3</value>
	</property>
	
	<!-- Disable HDFS file permission checks
	     (dfs.permissions.enabled in Hadoop 3; dfs.permissions is the deprecated name) -->
	<property>
		<name>dfs.permissions.enabled</name>
		<value>false</value>
	</property>
	
	<!-- Block size in bytes (134217728 = 128 MB) -->
	<property>
		<name>dfs.blocksize</name>
		<value>134217728</value>
	</property>
	
</configuration>
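When hand-editing a file this long it is easy to paste a property twice; Hadoop silently takes the last definition, so a quick duplicate check is worthwhile before distributing the files. A sketch, demonstrated on an inline sample; in practice point the grep at etc/hadoop/hdfs-site.xml:

```shell
# List <name> entries that appear more than once in a site file.
# Written against a small sample here; substitute the real config path.
cat > /tmp/site-sample.xml <<'EOF'
<configuration>
  <property><name>dfs.replication</name><value>3</value></property>
  <property><name>dfs.replication</name><value>2</value></property>
</configuration>
EOF
grep -o '<name>[^<]*</name>' /tmp/site-sample.xml | sort | uniq -d
```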

mapred-env.sh

export JAVA_HOME=/usr/local/jdk

mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>

	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>
	
	<property>
		<name>yarn.app.mapreduce.am.env</name>
		<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
	</property>
	<property>
		<name>mapreduce.map.env</name>
		<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
	</property>
	<property>
		<name>mapreduce.reduce.env</name>
		<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
	</property>
	
	<property>
		<name>yarn.app.mapreduce.am.resource.mb</name>
		<value>1200</value>
		<description>Total memory for the MRAppMaster container; default 1536</description>
	</property>
	<property>
		<name>yarn.app.mapreduce.am.command-opts</name>
		<value>-Xmx800m</value>
		<description>MRAppMaster JVM heap size; default -Xmx1024m</description>
	</property>
	<property>
		<name>yarn.app.mapreduce.am.resource.cpu-vcores</name>
		<value>1</value>
		<description>Virtual cores for the MRAppMaster; default 1</description>
	</property>

	<property>
		<name>mapreduce.map.memory.mb</name>
		<value>512</value>
		<description>Total memory for each MapTask container; default 1024</description>
	</property>
	<property>
		<name>mapreduce.map.java.opts</name>
		<value>-Xmx300m</value>
		<description>MapTask JVM heap size; default -Xmx200m</description>
	</property>
	<property>
		<name>mapreduce.map.cpu.vcores</name>
		<value>1</value>
		<description>Virtual cores per MapTask; default 1</description>
	</property>

	<property>
		<name>mapreduce.reduce.memory.mb</name>
		<value>512</value>
		<description>Total memory for each ReduceTask container; default 1024</description>
	</property>
	<property>
		<name>mapreduce.reduce.java.opts</name>
		<value>-Xmx300m</value>
		<description>ReduceTask JVM heap size; default -Xmx200m</description>
	</property>
	<property>
		<name>mapreduce.reduce.cpu.vcores</name>
		<value>1</value>
		<description>Virtual cores per ReduceTask; default 1</description>
	</property>

</configuration>
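One relationship worth keeping straight in the values above: each JVM heap (-Xmx) must fit inside its container's `*.memory.mb` with headroom for off-heap usage, or YARN's memory checks (when enabled) will kill the task. A small sanity-check sketch over the pairs configured in this file:

```shell
# Verify each heap/-Xmx value (MB) is smaller than its container's memory.mb.
check() {  # usage: check <container-mb> <heap-mb> <label>
  if [ "$2" -lt "$1" ]; then
    echo "$3: ok (${2}m heap in ${1}m container)"
  else
    echo "$3: heap too large"
  fi
}
check 1200 800 "am"      # yarn.app.mapreduce.am.*
check 512  300 "map"     # mapreduce.map.*
check 512  300 "reduce"  # mapreduce.reduce.*
```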

yarn-site.xml

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>

<!-- Site specific YARN configuration properties -->


	<!-- ResourceManager host -->
	<property>
		<name>yarn.resourcemanager.hostname</name>
		<value>node1</value>
	</property>
	<!-- Auxiliary service required for MapReduce shuffle
	     (note: this is a NodeManager property, not a ResourceManager one) -->
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
	<!-- Disable YARN's memory checks (in preparation for running Spark) -->
	<property>
		<name>yarn.nodemanager.pmem-check-enabled</name>
		<value>false</value>
	</property>
	<property>
		<name>yarn.nodemanager.vmem-check-enabled</name>
		<value>false</value>
	</property>
	


	<!-- Enable log aggregation -->
	<property>
		<name>yarn.log-aggregation-enable</name>
		<value>true</value>
	</property>
	
	<!-- Retention time in seconds for aggregated logs on HDFS (604800 = 7 days) -->
	<property>
		<name>yarn.log-aggregation.retain-seconds</name>
		<value>604800</value>
	</property>
	
	<!-- Memory available to containers on each NodeManager, and allocation granularity -->
	<property>
		<name>yarn.nodemanager.resource.memory-mb</name>
		<value>20480</value>
	</property>
	<property>
		<name>yarn.scheduler.minimum-allocation-mb</name>
		<value>2048</value>
	</property>
	<property>
		<name>yarn.nodemanager.vmem-pmem-ratio</name>
		<value>2.1</value>
	</property>

</configuration>
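A rough capacity reading of the numbers above, assuming the default capacity scheduler (which rounds requests up to multiples of the minimum allocation): each NodeManager offers 20480 MB, so it can host at most ten 2048 MB containers, and the 2.1 ratio lets each such container use up to 2048 × 2.1 MB of virtual memory (a limit that the disabled vmem check above would otherwise enforce).

```shell
# Capacity math for the yarn-site.xml values above.
echo $(( 20480 / 2048 ))                      # max minimum-size containers per node
awk 'BEGIN { printf "%.1f\n", 2048 * 2.1 }'  # vmem ceiling per such container, MB
```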

workers

  • Called slaves in releases before Hadoop 3.0

node1
node2
node3

Remember to create every directory referenced in the configuration files above.
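Creating those directories can be scripted. A dry-run sketch, with the base path and directory names taken from the configs above; remove the `echo` and run it on each node to actually create them:

```shell
# Dry run: print the mkdir commands for every data directory referenced
# in core-site.xml and hdfs-site.xml (remove `echo` to create them).
BASE=/usr/local/hadoop
for d in data/tmp \
         data/namenodeData data/namenodeData2 \
         data/datanodeData data/datanodeData2 \
         data/nn/edits data/snn/name data/snn/edits; do
  echo mkdir -p "$BASE/$d"
done
```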

Startup

  • Format the NameNode with hdfs namenode -format (first start only; reformatting an existing cluster destroys its HDFS metadata)
  • Start the cluster with hadoop/sbin/start-all.sh
  • Check the running Java processes with jps
  • node1 should show NameNode, DataNode, ResourceManager, and NodeManager (with the hdfs-site.xml above, the SecondaryNameNode is started on node2)
  • The worker nodes should show DataNode and NodeManager
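The per-host check can be scripted; a small sketch assuming jps is on the PATH (run it on each node and compare the result with the lists above; the daemon names here are the ones this article expects on node1):

```shell
# Report which of the expected Hadoop daemons show up in jps output.
for p in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
  jps 2>/dev/null | grep -q "$p" && echo "$p up" || echo "$p MISSING"
done
```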