Deploying Hadoop and Spark in Docker on Windows 11

Environment:

  • Operating system: Windows 11
  • Command-line tool: PowerShell
  • Docker Desktop for Windows (includes the Docker CLI client and Docker Compose)
  • JDK: openjdk-8-jdk
  • Scala: 2.11.6
  • Spark: spark-3.2.4
  • Hadoop: hadoop-3.3.5

I. Setting up Docker (reference articles):

Windows11下安装Docker_win11安装docker_zou_hailin226的博客-优快云博客

windows11如何安装docker desktop_如梦@_@的博客-优快云博客

windows docker 更改镜像安装目录_windows docker 目录_普通网友的博客-优快云博客

windwos11没有Hyper-V的解决方法 - 简书 (jianshu.com)

Win11安装Docker及简单使用 - 知乎 (zhihu.com)

II. Deploying Hadoop

Reference tutorial:

docker自主搭建Hadoop3.2.0 HBASE2.1.6 Spark2.4.8三节点集群(含docker镜像制作过程)_docker hbase集群_学亮编程手记的博客-优快云博客

Open PowerShell (running it as Administrator is recommended, although it appears to work without).

1. Pull the base image, then change the apt sources

docker pull ubuntu:16.04

2. Enter the container

PS C:\Users\> docker images
REPOSITORY   TAG       IMAGE ID       CREATED         SIZE
ubuntu       16.04     b6f507652425   20 months ago   135MB
docker run -it b6f507652425 bash

3. Switch to the Alibaba or Tsinghua apt mirror

Look up the sources.list entries for the matching Ubuntu release:

阿里巴巴开源镜像站-OPSX镜像站-阿里云开发者社区 (aliyun.com)

ubuntu | 镜像站使用帮助 | 清华大学开源软件镜像站 | Tsinghua Open Source Mirror

# deb-src entries are commented out by default to speed up apt update; uncomment them if needed
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ xenial main restricted universe multiverse
# deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ xenial main restricted universe multiverse
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ xenial-updates main restricted universe multiverse
# deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ xenial-updates main restricted universe multiverse
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ xenial-backports main restricted universe multiverse
# deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ xenial-backports main restricted universe multiverse

# deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ xenial-security main restricted universe multiverse
# # deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ xenial-security main restricted universe multiverse

deb http://security.ubuntu.com/ubuntu/ xenial-security main restricted universe multiverse
# deb-src http://security.ubuntu.com/ubuntu/ xenial-security main restricted universe multiverse

# Pre-release (proposed) repository; enabling it is not recommended
# deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ xenial-proposed main restricted universe multiverse
# # deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ xenial-proposed main restricted universe multiverse

Edit /etc/apt/sources.list directly inside the container (delete the original contents first); one way to do this without an editor is sketched below.

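Since vim is only installed in step 4, one way to replace the file at this point is with a heredoc; a minimal sketch using the Tsinghua entries above:

# back up the original list, then overwrite it
cp /etc/apt/sources.list /etc/apt/sources.list.bak
cat > /etc/apt/sources.list <<'EOF'
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ xenial main restricted universe multiverse
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ xenial-updates main restricted universe multiverse
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ xenial-backports main restricted universe multiverse
deb http://security.ubuntu.com/ubuntu/ xenial-security main restricted universe multiverse
EOF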

apt-get update

Note: the update normally finishes quickly. If it is extremely slow, check whether you have configured a Docker registry mirror (see: Windows Docker 配置国内镜像源的两种方法_docker镜像源 windows_灬倪先森_的博客-优快云博客). In practice this helps only a little and is probably not the main cause. In Docker Desktop the mirror list goes into the daemon configuration JSON under Settings → Docker Engine:

"registry-mirrors": [
    "https://ung2thfc.mirror.aliyuncs.com",
    "https://mirror.ccs.tencentyun.com",
    "https://docker.mirrors.ustc.edu.cn",
    "http://hub-mirror.c.163.com"
  ]
Additional commands that may come in handy:

Restarting an existing container:

PS C:\Users\> docker ps -a
CONTAINER ID   IMAGE          COMMAND   CREATED        STATUS          PORTS     NAMES
83f32487daf7   b6f507652425   "bash"    24 hours ago   Up 21 seconds             admiring_roentgen
PS C:\Users\> docker start 83f32487daf7
83f32487daf7
PS C:\Users\> docker exec -it 83f32487daf7 /bin/bash

4. Install vim and net-tools

apt-get install vim
apt install net-tools

5. Install JDK 1.8

apt install openjdk-8-jdk

6. Install Scala

apt install scala
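A quick sanity check after installing the JDK and Scala; on Ubuntu 16.04 apt typically installs OpenJDK 8 and Scala 2.11, but the exact version strings may differ:

java -version     # should report an openjdk 1.8.0_x build
scala -version    # should report a Scala 2.11.x code runner
ls /usr/lib/jvm   # confirms the java-8-openjdk-amd64 path used in /etc/profile later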

7. Passwordless SSH login

apt-get install openssh-server
apt-get install openssh-client
cd ~
ssh-keygen -t rsa -P ""
cat .ssh/id_rsa.pub >> .ssh/authorized_keys
service ssh start
ssh 127.0.0.1
vim ~/.bashrc

Append to the end of ~/.bashrc, so sshd starts automatically whenever a new shell opens in the container (there is no init system to start it for you):

service ssh start

8. Install Hadoop

wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/stable/hadoop-3.3.5.tar.gz
tar -zxf ~/hadoop-3.3.5.tar.gz -C /usr/local
cd /usr/local/
mv ./hadoop-3.3.5/ ./hadoop
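Note that the stable/ directory on the mirror tracks whatever the current stable release is, so this exact URL may stop working over time. If hadoop-3.3.5.tar.gz is no longer there, the Apache archive keeps old releases; the URL below follows the usual archive layout:

wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.5/hadoop-3.3.5.tar.gz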

Edit /etc/profile

vim /etc/profile
#java
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export JRE_HOME=${JAVA_HOME}/jre    
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib    
export PATH=${JAVA_HOME}/bin:$PATH
#hadoop
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_COMMON_HOME=$HADOOP_HOME 
export HADOOP_HDFS_HOME=$HADOOP_HOME 
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME 
export HADOOP_INSTALL=$HADOOP_HOME 
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native 
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_LIBEXEC_DIR=$HADOOP_HOME/libexec 
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
export HDFS_DATANODE_USER=root
export HDFS_DATANODE_SECURE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export HDFS_NAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
source /etc/profile
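One caveat: bash sessions started by docker run/exec are non-login shells, so /etc/profile is not read automatically in new sessions. A simple workaround is to source it from ~/.bashrc as well:

echo 'source /etc/profile' >> ~/.bashrc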

cd /usr/local/hadoop
./bin/hadoop version

Edit the following files under /usr/local/hadoop/etc/hadoop/ (edit them directly inside the container).

hadoop-env.sh

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root

core-site.xml (fs.default.name is the older name for fs.defaultFS; Hadoop 3 still accepts it)

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://h01:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop3/hadoop/tmp</value>
    </property>
</configuration>

hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/hadoop3/hadoop/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/hadoop3/hadoop/hdfs/data</value>
    </property>
</configuration>

mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>
            /usr/local/hadoop/etc/hadoop,
            /usr/local/hadoop/share/hadoop/common/*,
            /usr/local/hadoop/share/hadoop/common/lib/*,
            /usr/local/hadoop/share/hadoop/hdfs/*,
            /usr/local/hadoop/share/hadoop/hdfs/lib/*,
            /usr/local/hadoop/share/hadoop/mapreduce/*,
            /usr/local/hadoop/share/hadoop/mapreduce/lib/*,
            /usr/local/hadoop/share/hadoop/yarn/*,
            /usr/local/hadoop/share/hadoop/yarn/lib/*
        </value>
    </property>
</configuration>


yarn-site.xml

<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>h01</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

Edit workers (this completes the Hadoop configuration; the container is committed to an image in step 9)

cd /usr/local/hadoop/etc/hadoop
vim workers
h01
h02

9. Start the cluster with Docker

Exit the container

cd /
exit

Commit the current container as an image, then list the images

docker commit -m "hadoop" -a "hadoop" 83f32487daf7 newuhadoop
docker images


Here 83f32487daf7 is the container ID.

Create a dedicated bridge network for the Hadoop cluster.

docker network create --driver=bridge hadoop
docker network ls
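Optionally, inspect the new network to see its subnet and, later, which containers are attached (the output format depends on your Docker version):

docker network inspect hadoop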

Start the master container (ports 9870 and 8088 are published to the host so the NameNode and ResourceManager web UIs are reachable from Windows):

docker run -it --network hadoop -h "h01" --name "h01" -p 9870:9870 -p 8088:8088 newuhadoop /bin/bash

Start the worker container:

docker run -it --network hadoop -h "h02" --name "h02" newuhadoop /bin/bash

Inside h01, start the Hadoop cluster. Here 96870f9bc672 is the ID of the h01 container; docker exec -it h01 /bin/bash works just as well:

docker exec -it 96870f9bc672 /bin/bash

Format the NameNode

cd /usr/local/hadoop/bin
./hdfs namenode -format

Start the HDFS and YARN daemons

cd /usr/local/hadoop/sbin/
./start-all.sh 
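Once start-all.sh completes, jps is a quick way to confirm the daemons are running. A rough expectation for this setup (h01 appears in workers, so it also hosts a DataNode and NodeManager); the exact list and PIDs will vary:

jps
# expected on h01: NameNode, SecondaryNameNode, ResourceManager, DataNode, NodeManager
# on h02 (docker exec -it h02 /bin/bash, then jps): DataNode, NodeManager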

On the host machine, open localhost:8088 in a browser (the YARN ResourceManager UI); localhost:9870 shows the HDFS NameNode UI.


Check the status of the distributed file system

cd /usr/local/hadoop/bin
./hdfs dfsadmin -report

10. Run the built-in WordCount example

Use the bundled LICENSE file as the text to count (it is copied to file1.txt in the step shown below).

cd /usr/local/hadoop
ls
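The upload step below refers to ../file1.txt, so first make a copy of the bundled license file under that name (assuming the top-level LICENSE.txt that ships in the Hadoop tarball):

cp LICENSE.txt file1.txt    # creates /usr/local/hadoop/file1.txt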

Create an input directory in HDFS

cd /usr/local/hadoop/bin
./hadoop fs -mkdir /input

Upload file1.txt to HDFS

./hadoop fs -put ../file1.txt /input

List the contents of the input directory in HDFS

./hadoop fs -ls /input

Run the wordcount example program

./hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.5.jar wordcount /input /output

List the contents of /output in HDFS

./hadoop fs -ls /output
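To print the actual counts, cat the result file; with the default single reducer the output name below is what MapReduce typically produces:

./hadoop fs -cat /output/part-r-00000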

III. Spark in pseudo-distributed mode

Reference tutorial:

docker自主搭建Hadoop3.2.0 HBASE2.1.6 Spark2.4.8三节点集群(含docker镜像制作过程)_docker hbase集群_学亮编程手记的博客-优快云博客

cd  C:\Windows\system32
docker ps -a

Start h01 and h02

docker start 96870f9bc672
docker start c82291dcdb23

Enter h01

docker exec -it 96870f9bc672 /bin/bash

1. Install Spark on top of Hadoop (if the Tsinghua mirror no longer carries 3.2.4, old releases are kept under archive.apache.org/dist/spark/spark-3.2.4/)

wget https://mirrors.tuna.tsinghua.edu.cn/apache/spark/spark-3.2.4/spark-3.2.4-bin-hadoop3.2.tgz

Extract it under /usr/local

tar -zxvf spark-3.2.4-bin-hadoop3.2.tgz  -C /usr/local/

Rename the directory

cd /usr/local/
mv spark-3.2.4-bin-hadoop3.2/ spark

2. Update the /etc/profile environment file

vim /etc/profile

Append:

export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin

Apply the changes:

source /etc/profile

Exit the container, enter h02, and append the same two environment variable lines to its /etc/profile

cd /
exit
docker exec -it c82291dcdb23 /bin/bash
cd /usr/local/
vim /etc/profile
source /etc/profile

Exit h02, re-enter h01 (docker exec -it 96870f9bc672 /bin/bash), and go to the Spark conf directory:

cd /usr/local/spark/conf

3. Rename the configuration template

mv spark-env.sh.template spark-env.sh

4. Edit spark-env.sh and append:

vim spark-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export SCALA_HOME=/usr/share/scala

export SPARK_MASTER_HOST=h01
export SPARK_MASTER_IP=h01
export SPARK_WORKER_MEMORY=4g

5. Rename the workers template

mv workers.template workers
vim workers

Replace the contents entirely (remove localhost) with:

h01
h02 

6. Restart Hadoop

Edit hadoop-env.sh and add the Hadoop home variables

cd /usr/local/hadoop/etc/hadoop
vim hadoop-env.sh

Append:

export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Format the NameNode again (note that reformatting wipes any existing HDFS data):

cd /usr/local/hadoop/bin
./hdfs namenode -format

Start the cluster

cd /usr/local/hadoop/sbin/
./start-all.sh 

7. Copy Spark to h02 and start it

Copy the configured Spark directory to h02 (this relies on sshd running in h02, which is started by the service ssh start line added to ~/.bashrc earlier)

cd /usr/local
scp -r /usr/local/spark root@h02:/usr/local/

Start Spark

cd /usr/local/spark/sbin/
./start-all.sh 
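As a quick smoke test of the standalone cluster, you can submit the bundled SparkPi example from h01 against the master at spark://h01:7077. The examples jar name below assumes the spark-3.2.4-bin-hadoop3.2 layout (Scala 2.12 build); adjust it if yours differs:

cd /usr/local/spark
./bin/spark-submit --master spark://h01:7077 \
  --class org.apache.spark.examples.SparkPi \
  examples/jars/spark-examples_2.12-3.2.4.jar 10
# jps should now also show a Master process on h01 and Worker processes on h01 and h02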
