2024世界职业技能大赛大数据平台搭建hadoop(容器环境)

任务A:大数据平台搭建(容器环境)(15分)

环境搭建请看这篇文章大数据模块A环境搭建
环境说明:

服务端登录地址详见各任务服务端说明。
补充说明:宿主机可通过Asbru工具或SSH客户端进行SSH访问;
相关软件安装包在宿主机的/opt目录下,请选择对应的安装包进行安装,用不到的可忽略;
所有任务中应用命令必须采用绝对路径;
进入Master节点的方式为
docker exec -it master /bin/bash
进入Slave1节点的方式为
docker exec -it slave1 /bin/bash
进入Slave2节点的方式为
docker exec -it slave2 /bin/bash
三个容器节点的root密码均为123456

提前准备好hadoop-3.1.3.tar.gz、jdk-8u212-linux-x64.tar.gz放在/opt/下(模拟的自己准备,比赛时会提供)

子任务一:Hadoop 完全分布式安装配置

评分标准
主要知识与技能点分值
JDK的解压安装1
JDK的环境变量配置1
Host配置及三个节点的分发1
Hadoop解压安装及环境初始化2
Hadoop集群启动并查看2

本任务需要使用root用户完成相关配置,安装Hadoop需要配置前置环境。命令中要求使用绝对路径,具体要求如下:

任务1

1、 从宿主机/opt目录下将文件hadoop-3.1.3.tar.gz、jdk-8u212-linux-x64.tar.gz复制到容器Master中的/opt/software路径中(若路径不存在,则需新建),将Master节点JDK安装包解压到/opt/module路径中(若路径不存在,则需新建),将JDK解压命令复制并粘贴至客户端桌面【Release\任务A提交结果.docx】中对应的任务序号下;

用终端连接宿主机
检查容器是否启动

​ docker ps -a

连接master

docker exec -it master /bin/bash

检查/opt/software路径是否存在
[root@master ~]# ls /opt/
module  software
把宿主机里的资料复制到容器里面去
[root@Bigdata ~]# docker cp /opt/jdk-8u212-linux-x64.tar.gz master:/opt/software

Successfully copied 195MB to master:/opt/software

[root@Bigdata ~]# docker cp /opt/hadoop-3.1.3.tar.gz master:/opt/software

Successfully copied 338MB to master:/opt/software

将Master节点JDK安装包解压到/opt/module
tar zxvf /opt/software/jdk-8u212-linux-x64.tar.gz -C /opt/module/
任务2

2、 修改容器中/etc/profile文件,设置JDK环境变量并使其生效,配置完毕后在Master节点分别执行“java -version”和“javac”命令,将命令行执行结果分别截图并粘贴至客户端桌面【Release\任务A提交结果.docx】中对应的任务序号下;

重命名jdk文件夹
mv /opt/module/jdk1.8.0_212 /opt/module/java
在/etc/profile文件末尾写环境变量
#JAVA_HOME
export JAVA_HOME=/opt/module/java

#PATH
export PATH=$PATH:$JAVA_HOME/bin
使环境变量生效
source /etc/profile
输入 java -version
java version "1.8.0_212"
Java(TM) SE Runtime Environment (build 1.8.0_212-b10)
Java HotSpot(TM) 64-Bit Server VM (build 25.212-b10, mixed mode)
输入 javac
Usage: javac <options> <source files>
where possible options include:
  -g                         Generate all debugging info
  -g:none                    Generate no debugging info
  -g:{lines,vars,source}     Generate only some debugging info
  -nowarn                    Generate no warnings
  -verbose                   Output messages about what the compiler is doing
  -deprecation               Output source locations where deprecated APIs are used
  -classpath <path>          Specify where to find user class files and annotation processors
  -cp <path>                 Specify where to find user class files and annotation processors
  -sourcepath <path>         Specify where to find input source files
  -bootclasspath <path>      Override location of bootstrap class files
  -extdirs <dirs>            Override location of installed extensions
  -endorseddirs <dirs>       Override location of endorsed standards path
  -proc:{none,only}          Control whether annotation processing and/or compilation is done.
  -processor <class1>[,<class2>,<class3>...] Names of the annotation processors to run; bypasses default discovery process
  -processorpath <path>      Specify where to find annotation processors
  -parameters                Generate metadata for reflection on method parameters
  -d <directory>             Specify where to place generated class files
  -s <directory>             Specify where to place generated source files
  -h <directory>             Specify where to place generated native header files
  -implicit:{none,class}     Specify whether or not to generate class files for implicitly referenced files
  -encoding <encoding>       Specify character encoding used by source files
  -source <release>          Provide source compatibility with specified release
  -target <release>          Generate class files for specific VM version
  -profile <profile>         Check that API used is available in the specified profile
  -version                   Version information
  -help                      Print a synopsis of standard options
  -Akey[=value]              Options to pass to annotation processors
  -X                         Print a synopsis of nonstandard options
  -J<flag>                   Pass <flag> directly to the runtime system
  -Werror                    Terminate compilation if warnings occur
  @<filename>                Read options and filenames from file
任务3

3、 请完成host相关配置,将三个节点分别命名为master、slave1、slave2,并做免密登录,用scp命令并使用绝对路径从Master复制JDK解压后的安装文件到slave1、slave2节点(若路径不存在,则需新建),并配置slave1、slave2相关环境变量,将全部scp复制JDK的命令复制并粘贴至客户端桌面【Release\任务A提交结果.docx】中对应的任务序号下;

分别进入三个容器检查ip

输入 ifconfig

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.100.102  netmask 255.255.255.0  broadcast 192.168.100.255
        ether 00:50:56:80:3e:d7  txqueuelen 0  (Ethernet)
        RX packets 6934  bytes 575269 (561.7 KiB)
        RX errors 0  dropped 334  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
编写/etc/hosts文件 vi /etc/hosts

在末尾添加

192.168.100.101 master
192.168.100.102 slave1
192.168.100.103 slave2
配置免密

输入 ssh-keygen 一直敲回车

Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:a5KqXjGa6r1CO1pe9cG9bR3Pp2om6BstpOB9l6SW24E root@master
The key's randomart image is:
+---[RSA 2048]----+
|                 |
|                 |
|                 |
|       . .       |
|    + . S o   .  |
| . + * = O + . + |
|. = + = E.* o . +|
| B.o . =.*.oo  ..|
|=o*+o  .+..+...  |
+----[SHA256]-----+
复制密钥
ssh-copy-id master

/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: “/root/.ssh/id_rsa.pub”
The authenticity of host ‘master (192.168.100.102)’ can’t be established.
RSA key fingerprint is SHA256:nuf/qVhd2k6k0u5t7GvhylqRi+4xMC3MNGmJKJNXipo.
RSA key fingerprint is MD5:e6:fd:96:10:3f:2c:9a:68:40:cc:d7:7c:e2:ee:6e:67.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed – if you are prompted now it is to install the new keys
root@master’s password:

Number of key(s) added: 1

Now try logging into the machine, with: “ssh ‘master’”
and check to make sure that only the key(s) you wanted were added.

ssh-copy-id slave1

/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: “/root/.ssh/id_rsa.pub”
The authenticity of host ‘slave1 (192.168.100.103)’ can’t be established.
RSA key fingerprint is SHA256:nuf/qVhd2k6k0u5t7GvhylqRi+4xMC3MNGmJKJNXipo.
RSA key fingerprint is MD5:e6:fd:96:10:3f:2c:9a:68:40:cc:d7:7c:e2:ee:6e:67.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed – if you are prompted now it is to install the new keys
root@slave1’s password:

Number of key(s) added: 1

Now try logging into the machine, with: “ssh ‘slave1’”
and check to make sure that only the key(s) you wanted were added.

ssh-copy-id slave2

/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: “/root/.ssh/id_rsa.pub”
The authenticity of host ‘slave2 (192.168.100.104)’ can’t be established.
RSA key fingerprint is SHA256:nuf/qVhd2k6k0u5t7GvhylqRi+4xMC3MNGmJKJNXipo.
RSA key fingerprint is MD5:e6:fd:96:10:3f:2c:9a:68:40:cc:d7:7c:e2:ee:6e:67.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed – if you are prompted now it is to install the new keys
root@slave2’s password:

Number of key(s) added: 1

Now try logging into the machine, with: “ssh ‘slave2’”
and check to make sure that only the key(s) you wanted were added.

复制hosts文件到slave机器上
scp /etc/hosts slave1:/etc/

hosts 100% 219 218.0KB/s 00:00

scp /etc/hosts slave2:/etc/

hosts 100% 219 218.0KB/s 00:00

在两台slave机器里也配置免密连接
ssh-keygen
ssh-copy-id master
ssh-copy-id slave1
ssh-copy-id slave2
复制jdk到slave1、slave2
scp -rq /opt/module/java slave1:/opt/module
scp -rq /opt/module/java slave2:/opt/module
复制环境变量
scp /etc/profile slave1:/etc/profile
scp /etc/profile slave2:/etc/profile
任务4

4、 在Master将Hadoop解压到/opt/module(若路径不存在,则需新建)目录下,并将解压包分发至slave1、slave2中,其中master、slave1、slave2节点均作为datanode,配置好相关环境,初始化Hadoop环境namenode,将初始化命令及初始化结果截图(截取初始化结果日志最后20行即可)粘贴至客户端桌面【Release\任务A提交结果.docx】中对应的任务序号下;

解压Hadoop
[root@master ~]# tar zxvf /opt/software/hadoop-3.1.3.tar.gz -C /opt/module/
重命名文件夹
[root@master ~]# mv /opt/module/hadoop-3.1.3 /opt/module/hadoop
环境配置 vi /etc/profile
增加下面代码
#HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop

#PATH
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

​​​​环境配置

使环境生效
source /etc/profile
配置hadoop文件
core-site.xml
 vi /opt/module/hadoop/etc/hadoop/core-site.xml
    <!-- 指定NameNode的地址 -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:8020</value>
    </property>

    <!-- 指定hadoop数据的存储目录 -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/module/hadoop/data</value>
    </property>

core-site

hdfs-site.xml
 vi /opt/module/hadoop/etc/hadoop/hdfs-site.xml
	<!-- nn web端访问地址-->
	<property>
        <name>dfs.namenode.http-address</name>
        <value>master:9870</value>
    </property>
	<!-- 2nn web端访问地址-->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>slave2:9868</value>
    </property>

hdfs-site

yarn-site.xml
vi /opt/module/hadoop/etc/hadoop/yarn-site.xml
<!-- 指定MR走shuffle -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    <!-- 指定ResourceManager的地址-->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>

    <!-- 环境变量的继承 -->	·
    <property>
        <name>yarn.nodemanager.env-whitelist</name>       				 
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,
CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>

yarn-site

mapred-site.xml
vi /opt/module/hadoop/etc/hadoop/mapred-site.xml
<!-- 指定MapReduce程序运行在Yarn上 -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

mapred-site

hadoop-env.sh
vi /opt/module/hadoop/etc/hadoop/hadoop-env.sh

把下面代码复制到这个文件末尾

export JAVA_HOME=/opt/module/java
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
workers
vi /opt/module/hadoop/etc/hadoop/workers

localhost删除这个填下面的代码

master
slave1
slave2
分发到集群去
分发环境变量
[root@master ~]# scp /etc/profile slave1:/etc/
[root@master ~]# scp /etc/profile slave2:/etc/
分发hadoop
[root@master ~]# scp -rq /opt/module/hadoop slave1:/opt/module/
[root@master ~]# scp -rq /opt/module/hadoop slave2:/opt/module/
初始化Hadoop环境namenode
[root@master ~]# hdfs namenode -format

namenode

任务5

5、 启动Hadoop集群(包括hdfs和yarn),使用jps命令查看Master节点与slave1节点的Java进程,将jps命令与结果截图粘贴至客户端桌面【Release\任务A提交结果.docx】中对应的任务序号下。

启动Hadoop集群
[root@master ~]# start-all.sh 

或者

[root@master ~]#  /opt/module/hadoop/sbin/start-all.sh 
使用jps命令查看Master节点与slave1节点的Java进程
[root@master ~]# jps

jps

[root@slave1 ~]# jps
12161 Jps
11112 NodeManager
10846 DataNode
[root@slave2 ~]# jps
10451 DataNode
10821 NodeManager
10582 SecondaryNameNode
11944 Jps

测试web网页是否正常http://192.168.100.101:9870
测试web
datanode

### 搭建比赛用大数据集群环境的方法 #### 准备工作 为了构建一个高效稳定的大数据处理平台,在开始之前需确保所有必要的软件包已就绪。提前准备好`hadoop-3.1.3.tar.gz`、`jdk-8u212-linux-x64.tar.gz`并放置于`/opt/`目录下[^1]。 #### JDK 解压安装与配置 对于JDK而言,其解压缩过程相对简单直接;而更重要的是完成之后的路径设置工作。这一步骤涉及到编辑系统的全局配置文件来添加Java运行所需的环境变量。具体操作是在各节点上通过命令行工具打开`/etc/profile`文件进行修改[^2]。 #### Hosts 文件配置 正确配置主机名到IP地址之间的映射关系至关重要,它不仅简化了网络通信中的名称解析流程,还为后续实施无密码SSH登录以及部署各类分布式组件奠定了基础。此环节主要涉及更新每台机器上的`/etc/hosts`文件内容[^3]。 #### Hadoop 安装及初始化 接下来就是核心部分——Hadoop本身的安装及其初步设定。同样地,先将下载好的二进制版本释放至指定位置,随后依照官方文档指导调整相应参数选项以适应当前硬件条件和业务需求。特别需要注意的是,针对完全分布式的场景,还需额外关注NameNode、DataNode等角色的具体安排。 #### XML 配置优化 深入理解XML格式下的资源配置项是调优性能的关键所在。例如可以在`core-site.xml`, `hdfs-site.xml`, 或者其他相关配置文件内的`<configuration>`标签之间加入特定属性定义,从而更好地控制整个框架的行为模式[^4]。 ```xml <!-- core-site.xml --> <configuration> <property> <name>fs.defaultFS</name> <value>hdfs://namenode:9000</value> </property> </configuration> <!-- hdfs-site.xml --> <configuration> <property> <name>dfs.replication</name> <value>3</value> </property> </configuration> ``` 最后验证集群状态是否正常运作,可以通过启动服务后尝试访问Web界面或是利用CLI指令来进行基本的功能测试。
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

xfcloud

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值