1. Hadoop Fully Distributed Cluster Setup and Related Hadoop Operations

This article walks through installing a Hadoop cluster on VM virtual machines, covering environment preparation, passwordless SSH login, hosts configuration, Hadoop installation and configuration, cluster startup verification, and a Java example for operating HDFS.


I previously wrote a post on installing a Hadoop cluster with a Dockerfile, but ran into many problems along the way (permissions, port access between containers and the host, etc.). This post installs the Hadoop cluster on VM virtual machines instead.

1. Environment

Software | Version
CentOS   | CentOS Linux release 7.5
JDK      | jdk-8u201-linux-x64.tar.gz
Hadoop   | hadoop-3.0.0.tar.gz

2. Hadoop Cluster Plan

IP           | Node
10.168.14.49 | hadoop-master
10.168.14.50 | hadoop-slave1
10.168.14.51 | hadoop-slave2

3. Set up passwordless SSH for Hadoop
Before installing Hadoop, configure passwordless SSH login.
1). Generate a key pair on the hadoop-master node

# 1. Generate a key pair
$ ssh-keygen -t rsa

# 2. Append the public key to authorized_keys
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

# 3. Restrict the permissions of authorized_keys
$ chmod 600 ~/.ssh/authorized_keys

# 4. Copy authorized_keys to the slave nodes (repeat for hadoop-slave2)
$ scp -P 42014 ~/.ssh/authorized_keys uaren@hadoop-slave1:~/

2). On hadoop-slave1 and hadoop-slave2, generate key pairs in a terminal

# 1. Generate a key pair
$ ssh-keygen -t rsa

# 2. Move the copied authorized_keys into the .ssh directory
$ mv authorized_keys ~/.ssh/
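
To verify, logging in from hadoop-master to each slave should now work without a password prompt (assuming sshd on the slaves listens on port 42014, as in the scp command above):

$ ssh -p 42014 uaren@hadoop-slave1 hostname
$ ssh -p 42014 uaren@hadoop-slave2 hostname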

4. Configure hosts
Append the following entries to /etc/hosts on all three nodes:

10.168.14.49 hadoop-master
10.168.14.50 hadoop-slave1
10.168.14.51 hadoop-slave2
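
A quick check that the names resolve on each node:

$ ping -c 1 hadoop-master
$ ping -c 1 hadoop-slave1
$ ping -c 1 hadoop-slave2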

5. Installation on the hadoop-master node
First install the JDK, then install and configure Hadoop.
1). Unpack hadoop-3.0.0.tar.gz and configure environment variables

# 1. Unpack hadoop-3.0.0.tar.gz
$ tar -zxvf hadoop-3.0.0.tar.gz

# 2. Configure environment variables
$ sudo vi /etc/profile

export HADOOP_HOME=/usr/local/nlp/hadoop-3.0.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

# 3. Reload the profile so the variables take effect, then run:
$ . /etc/profile
$ hadoop version

# 4. Output like the following means the configuration succeeded
Hadoop 3.0.0
Source code repository https://git-wip-us.apache.org/repos/asf/hadoop.git -r c25427ceca461ee979d30edd7a4b0f50718e6533
Compiled by andrew on 2017-12-08T19:16Z
Compiled with protoc 2.5.0
From source with checksum 397832cb5529187dc8cd74ad54ff22
This command was run using /usr/local/nlp/hadoop-3.0.0/share/hadoop/common/hadoop-common-3.0.0.jar

2). Edit the Hadoop configuration files

The files to modify:
hadoop-env.sh
core-site.xml
hdfs-site.xml
yarn-site.xml
mapred-site.xml
workers

(a). vi hadoop-env.sh

export JAVA_HOME=/usr/local/nlp/java/jdk1.8.0_201
# Override the default SSH port used by Hadoop's control scripts
export HADOOP_SSH_OPTS="-p 42214"

(b). vi core-site.xml

<configuration>
	<property>
			<name>fs.defaultFS</name>
			<value>hdfs://hadoop-master:8100/</value>
	</property>
	<property> 
		<name>hadoop.tmp.dir</name> 
		<value>/usr/local/nlp/hadoop-3.0.0/tmp</value> 
	</property> 
	<property>
		<name>dfs.permissions</name>
		<value>false</value>
	</property>
	<property> 
		<name>fs.trash.interval</name> 
		<value>1440</value> 
	</property>
</configuration>
name              | value                           | description
fs.defaultFS      | hdfs://hadoop-master:8100/      | NameNode RPC endpoint
hadoop.tmp.dir    | /usr/local/nlp/hadoop-3.0.0/tmp | Base directory for files Hadoop generates at runtime
dfs.permissions   | false                           | Whether DFS permission checking is on; usually set to false
fs.trash.interval | 1440                            | HDFS trash retention in minutes (0 disables); lets you recover accidental deletions, and takes effect without restarting Hadoop

(c). vi hdfs-site.xml

<configuration>
	<property>
		<name>dfs.replication</name>
		<value>1</value>
	</property>
	<property>
		<name>dfs.http.address</name>
		<value>0.0.0.0:8101</value>
	</property>
	<property>
		<name>dfs.datanode.address</name>
		<value>0.0.0.0:8201</value>
	</property>
	<property>
		<name>dfs.datanode.ipc.address</name>
		<value>0.0.0.0:8301</value>
	</property>
	<property>
		<name>dfs.datanode.http.address</name>
		<value>0.0.0.0:8401</value>
	</property>
	<property>
		<name>dfs.webhdfs.enabled</name>
		<value>true</value>
	</property>
	<property>
		<name>dfs.permissions</name>
		<value>false</value>
	</property>
</configuration>
name                      | value        | description
dfs.replication           | 1            | Replicas per HDFS block (default 3); more replicas can speed up jobs but use more storage
dfs.http.address          | 0.0.0.0:8101 | NameNode web UI port
dfs.datanode.address      | 0.0.0.0:8201 | DataNode data-transfer port
dfs.datanode.ipc.address  | 0.0.0.0:8301 | DataNode RPC server address and port
dfs.datanode.http.address | 0.0.0.0:8401 | DataNode HTTP server address and port
dfs.webhdfs.enabled       | true         | Enables WebHDFS operations via the NameNode's IP and port
dfs.permissions           | false        | Whether DFS permission checking is on; usually set to false

(d). vi yarn-site.xml

<configuration>
	<property> 
		<name>yarn.resourcemanager.hostname</name>
		<value>hadoop-master</value>
	</property> 
	<property> 
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property> 
	<property> 
		<name>yarn.log-aggregation-enable</name> 
		<value>true</value> 
	</property>
    <property>
		<name>yarn.resourcemanager.webapp.address</name>
        <value>0.0.0.0:8108</value>
	</property>
    <property>
		<name>yarn.nodemanager.webapp.address</name>
		<value>0.0.0.0:8142</value>
    </property>
</configuration>
name                                | value             | description
yarn.resourcemanager.hostname       | hadoop-master     | ResourceManager hostname
yarn.nodemanager.aux-services       | mapreduce_shuffle | Auxiliary service run by the NodeManager; must be mapreduce_shuffle for MapReduce jobs to run
yarn.log-aggregation-enable         | true              | Whether to enable log aggregation
yarn.resourcemanager.webapp.address | 0.0.0.0:8108      | ResourceManager web UI address; cluster information can be viewed here in a browser
yarn.nodemanager.webapp.address     | 0.0.0.0:8142      | NodeManager web UI address

(e). vi mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
	<property>
        <name>mapreduce.jobtracker.http.address</name>
        <value>hadoop-master:8104</value>
    </property>
	<property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop-master:8105</value>
    </property>
	<property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop-master:8106</value>
    </property>
	<property>
        <name>mapred.job.tracker</name>
        <value>http://hadoop-master:8107</value>
    </property>
</configuration>
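
Note: mapred.job.tracker and mapreduce.jobtracker.http.address date from Hadoop 1.x; with mapreduce.framework.name set to yarn, Hadoop 3 effectively ignores them.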

(f). vi workers

hadoop-slave1
hadoop-slave2

Note: many older guides have you edit the slaves file. This is a version difference: in Hadoop 3.x the file to edit is workers.

3). Copy the configuration from hadoop-master to the other two machines
The JDK and the /etc/profile settings from step 1 must also be in place on the slaves. Then copy the Hadoop directory:

scp -P 42014 -r /usr/local/nlp/hadoop-3.0.0/*  uaren@10.168.14.50:/usr/local/nlp/hadoop-3.0.0/
scp -P 42014 -r /usr/local/nlp/hadoop-3.0.0/*  uaren@10.168.14.51:/usr/local/nlp/hadoop-3.0.0/
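
Note that scp does not create the target directory itself; if /usr/local/nlp/hadoop-3.0.0/ does not yet exist on the slaves, create it first:

# Run on each slave node
$ mkdir -p /usr/local/nlp/hadoop-3.0.0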

4). Format the NameNode
Note: only hadoop-master needs to be formatted. The output should report that the storage directory has been successfully formatted.

$ cd /usr/local/nlp/hadoop-3.0.0/
$ bin/hdfs namenode -format

5). Start Hadoop
Start the cluster from the master node:

$ sbin/start-all.sh
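
start-all.sh brings up HDFS and YARN together; if anything fails, starting them separately makes the logs easier to follow:

$ sbin/start-dfs.sh
$ sbin/start-yarn.sh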

6). Check how the cluster came up with jps; a healthy cluster should show the following:
hadoop-master

[uaren@hadoop-master hadoop-3.0.0]$ jps
26355 Jps
24377 NameNode
24828 ResourceManager
24605 SecondaryNameNode

hadoop-slave1

[uaren@hadoop-slave1 hadoop-3.0.0]$ jps
9730 Jps
9447 NodeManager
9327 DataNode

hadoop-slave2

[uaren@hadoop-slave2 hadoop-3.0.0]$ jps
4205 Jps
3790 DataNode
3918 NodeManager
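
Beyond jps, the NameNode can confirm that both DataNodes have registered:

$ hdfs dfsadmin -report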

7). Access Hadoop in a browser
NameNode web UI: http://10.168.14.49:8101
With the configuration above, the YARN ResourceManager web UI is at http://10.168.14.49:8108.
(Screenshot of the NameNode web UI.)

6. Operating HDFS from Java
We create a new Maven project and add the following dependencies to its pom file.

Note: on the Windows development machine, also add these entries to the hosts file:
10.168.14.49 hadoop-master
10.168.14.50 hadoop-slave1
10.168.14.51 hadoop-slave2

	<properties>
		<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
		<hadoop.version>3.0.0</hadoop.version>
	</properties>

	<dependencies>
		<dependency>
			<groupId>junit</groupId>
			<artifactId>junit</artifactId>
			<version>4.12</version>
			<scope>test</scope>
		</dependency>
		<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-client</artifactId>
			<version>${hadoop.version}</version>
		</dependency>
		<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-common</artifactId>
			<version>${hadoop.version}</version>
		</dependency>
		<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-hdfs</artifactId>
			<version>${hadoop.version}</version>
		</dependency>
		<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-maven-plugins</artifactId>
			<version>${hadoop.version}</version>
		</dependency>
		<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-mapreduce-client-core</artifactId>
			<version>${hadoop.version}</version>
		</dependency>
		<dependency>
			<groupId>jdk.tools</groupId>
			<artifactId>jdk.tools</artifactId>
			<version>1.8</version>
			<scope>system</scope>
			<systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
		</dependency>
	</dependencies>

Create a test class:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Before;
import org.junit.Test;

public class HdfsTest {

	FileSystem fs = null;

	@Before
	public void init() throws Exception {
		Configuration conf = new Configuration();
		// Resolve DataNodes by hostname (per the hosts entries above) instead of
		// their cluster-internal IPs, so the Windows client can reach them
		conf.set("dfs.client.use.datanode.hostname", "true");
		fs = FileSystem.get(new URI("hdfs://hadoop-master:8100"), conf, "root");
	}

	/**
	 * Create a directory
	 * @throws Exception
	 * Author: will
	 * Date: Mar 19, 2019 2:36:59 PM
	 */
	@Test
	public void testMkdir() throws Exception {
		fs.mkdirs(new Path("/yrz"));
		fs.close();
	}

	/**
	 * Upload a file
	 * @throws Exception
	 * Author: will
	 * Date: Mar 15, 2019 4:31:07 PM
	 */
	@Test
	public void testUpload() throws Exception {
		fs.copyFromLocalFile(new Path("c:/CrackCaptcha.log"), new Path("/yrz"));
		fs.close();
	}

	/**
	 * Download a file
	 * @throws Exception
	 * Author: will
	 * Date: Mar 15, 2019 4:31:07 PM
	 */
	@Test
	public void testDownload() throws Exception {
		fs.copyToLocalFile(false, new Path("/yrz/CrackCaptcha.log"), new Path("E:/CrackCaptcha.log"), true);
		fs.close();
		System.out.println("Download succeeded");
	}

	/**
	 * Delete a file
	 * @throws Exception
	 * Author: will
	 * Date: Mar 15, 2019 4:31:07 PM
	 */
	@Test
	public void testDelete() throws Exception {
		fs.delete(new Path("/yrz/CrackCaptcha.log"), true);
		fs.close();
	}
}
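
As one more sketch (not in the original post), a directory listing can be added to the same class; it assumes one extra import, org.apache.hadoop.fs.FileStatus:

	/**
	 * List the entries directly under an HDFS directory
	 */
	@Test
	public void testList() throws Exception {
		// listStatus returns one FileStatus per entry under the given path
		FileStatus[] entries = fs.listStatus(new Path("/yrz"));
		for (FileStatus entry : entries) {
			System.out.println((entry.isDirectory() ? "dir  " : "file ") + entry.getPath());
		}
		fs.close();
	}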
