Deploying Hadoop and Spark in Docker on Windows 11
Experiment environment:
- Operating system: Windows 11
- Command-line tool: PowerShell
- Docker Desktop for Windows (bundles the Docker CLI client and Docker Compose)
- JDK version: openjdk-8-jdk
- Scala version: Scala 2.11.6
- Spark version: spark-3.2.4
- Hadoop version: hadoop-3.3.5
I. Setting up Docker; the following articles may help:
Windows11下安装Docker (zou_hailin226, CSDN)
windows11如何安装docker desktop (如梦@_@, CSDN)
windows docker 更改镜像安装目录 (CSDN)
windows11没有Hyper-V的解决方法 (jianshu.com)
Win11安装Docker及简单使用 (zhihu.com)
II. Deploying Hadoop
Reference tutorial:
docker自主搭建Hadoop3.2.0 HBASE2.1.6 Spark2.4.8三节点集群(含docker镜像制作过程) (学亮编程手记, CSDN)
Open PowerShell (running it as Administrator is recommended, although it usually works without).
1. Pull the base image, then change the apt sources
docker pull ubuntu:16.04
2. Enter the container
PS C:\Users\> docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
ubuntu 16.04 b6f507652425 20 months ago 135MB
docker run -it b6f507652425 bash
3. Switch to the Alibaba or Tsinghua apt mirror
Look up the sources.list template for the matching Ubuntu release:
Alibaba open-source mirror site, OPSX (aliyun.com)
Ubuntu mirror usage help, Tsinghua Open Source Mirror
# Source-code repos are commented out by default to speed up apt update; uncomment them if needed
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ xenial main restricted universe multiverse
# deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ xenial main restricted universe multiverse
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ xenial-updates main restricted universe multiverse
# deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ xenial-updates main restricted universe multiverse
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ xenial-backports main restricted universe multiverse
# deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ xenial-backports main restricted universe multiverse
# deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ xenial-security main restricted universe multiverse
# # deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ xenial-security main restricted universe multiverse
deb http://security.ubuntu.com/ubuntu/ xenial-security main restricted universe multiverse
# deb-src http://security.ubuntu.com/ubuntu/ xenial-security main restricted universe multiverse
# Pre-release (proposed) repo; enabling it is not recommended
# deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ xenial-proposed main restricted universe multiverse
# # deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu/ xenial-proposed main restricted universe multiverse
Edit /etc/apt/sources.list directly inside the container, deleting the original entries and pasting in the list above.
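If you prefer not to paste the list by hand, here is a minimal sketch (my own shortcut, not part of the original tutorial); it assumes the image still contains the default archive.ubuntu.com entries and simply rewrites them to the Tsinghua mirror:
cp /etc/apt/sources.list /etc/apt/sources.list.bak
sed -i 's|http://archive.ubuntu.com/ubuntu/|http://mirrors.tuna.tsinghua.edu.cn/ubuntu/|g' /etc/apt/sources.list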
apt-get update
Note: apt-get update usually finishes quickly. If it is extremely slow, check whether a Docker registry mirror has been configured on Windows (see "Windows Docker 配置国内镜像源的两种方法", CSDN); in Docker Desktop this means adding the registry-mirrors key below to the JSON under Settings > Docker Engine. In practice this helps only a little and is probably not the main cause, but it is still worth doing:
"registry-mirrors": [
"https://ung2thfc.mirror.aliyuncs.com",
"https://mirror.ccs.tencentyun.com",
"https://docker.mirrors.ustc.edu.cn",
"http://hub-mirror.c.163.com"
]
Additional commands that may come in handy:
Restart an existing container:
PS C:\Users\> docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
83f32487daf7 b6f507652425 "bash" 24 hours ago Up 21 seconds admiring_roentgen
PS C:\Users\> docker start 83f32487daf7
83f32487daf7
PS C:\Users\> docker exec -it 83f32487daf7 /bin/bash
4. Install vim and the network tools package
apt-get install vim
apt install net-tools
5. Install JDK 1.8
apt install openjdk-8-jdk
6. Install Scala
apt install scala
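A quick sanity check after the installs (the exact output depends on the package versions apt pulls in):
java -version
scala -version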
7. Set up passwordless SSH login
apt-get install openssh-server
apt-get install openssh-client
cd ~
ssh-keygen -t rsa -P ""
cat .ssh/id_rsa.pub >> .ssh/authorized_keys
service ssh start
ssh 127.0.0.1
vim ~/.bashrc
Add this as the last line, so sshd starts whenever a shell is opened:
service ssh start
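Equivalently, the line can be appended without opening vim:
echo 'service ssh start' >> ~/.bashrc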
8. Install Hadoop
wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/stable/hadoop-3.3.5.tar.gz
tar -zxf ~/hadoop-3.3.5.tar.gz -C /usr/local
cd /usr/local/
mv ./hadoop-3.3.5/ ./hadoop
Edit /etc/profile:
vim /etc/profile
#java
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
#hadoop
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_LIBEXEC_DIR=$HADOOP_HOME/libexec
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
export HDFS_DATANODE_USER=root
export HDFS_DATANODE_SECURE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export HDFS_NAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
source /etc/profile
cd /usr/local/hadoop
/usr/local/hadoop# ./bin/hadoop version
Edit the following files in the /usr/local/hadoop/etc/hadoop/ directory (you can open and edit them directly inside the container).
hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://h01:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop3/hadoop/tmp</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop3/hadoop/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop3/hadoop/hdfs/data</value>
</property>
</configuration>
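The local paths referenced above do not exist in the image yet. Hadoop normally creates them when the NameNode is formatted and the DataNode starts, but it does no harm to create them up front:
mkdir -p /home/hadoop3/hadoop/tmp /home/hadoop3/hadoop/hdfs/name /home/hadoop3/hadoop/hdfs/data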
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>
/usr/local/hadoop/etc/hadoop,
/usr/local/hadoop/share/hadoop/common/*,
/usr/local/hadoop/share/hadoop/common/lib/*,
/usr/local/hadoop/share/hadoop/hdfs/*,
/usr/local/hadoop/share/hadoop/hdfs/lib/*,
/usr/local/hadoop/share/hadoop/mapreduce/*,
/usr/local/hadoop/share/hadoop/mapreduce/lib/*,
/usr/local/hadoop/share/hadoop/yarn/*,
/usr/local/hadoop/share/hadoop/yarn/lib/*
</value>
</property>
</configuration>
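Instead of listing the directories by hand, a commonly suggested alternative (useful if MapReduce jobs later fail with a "Could not find or load main class ... MRAppMaster" error) is to paste the output of the following command into the classpath value; this is a general Hadoop 3 tip rather than part of the original tutorial:
hadoop classpath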
yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>h01</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
Edit the workers file
cd /usr/local/hadoop/etc/hadoop
vim workers
h01
h02
9. Start the cluster with Docker
Exit the container:
cd /
exit
Commit the current container as a new image, then list the images:
docker commit -m "hadoop" -a "hadoop" 83f32487daf7 newuhadoop
docker images
83f32487daf7 is the container ID.
Create a dedicated bridge network for the Hadoop cluster:
docker network create --driver=bridge hadoop
docker network ls
Start the master:
docker run -it --network hadoop -h "h01" --name "h01" -p 9870:9870 -p 8088:8088 newuhadoop /bin/bash
Start the worker:
docker run -it --network hadoop -h "h02" --name "h02" newuhadoop /bin/bash
Inside h01, start the Hadoop cluster.
docker exec -it 96870f9bc672 /bin/bash
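Before formatting, it is worth checking that h01 can reach h02 by hostname; the user-defined bridge network provides DNS, and both containers share the SSH key pair baked into the image, so this should not ask for a password (only for host-key confirmation the first time):
ssh h02 hostname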
Format the NameNode:
cd /usr/local/hadoop/bin
./hdfs namenode -format
Start all the daemons:
cd /usr/local/hadoop/sbin/
./start-all.sh
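Once start-all.sh completes, jps (which ships with the JDK) gives a quick view of the running daemons; with this setup h01 would typically show NameNode, SecondaryNameNode, ResourceManager, DataNode and NodeManager, while h02 runs only DataNode and NodeManager:
jps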
On the Windows host, open localhost:8088 (the YARN web UI); the HDFS web UI is mapped to localhost:9870.
Check the status of the distributed file system:
cd /usr/local/hadoop/bin
./hdfs dfsadmin -report
10. Run the built-in WordCount example
Use the Hadoop LICENSE file as the text to count; copy it to file1.txt first (e.g. cp LICENSE.txt file1.txt inside /usr/local/hadoop):
cd /usr/local/hadoop
ls
Create an input directory in HDFS:
cd /usr/local/hadoop/bin
./hadoop fs -mkdir /input
Upload file1.txt to HDFS:
./hadoop fs -put ../file1.txt /input
List the contents of the input directory in HDFS:
./hadoop fs -ls /input
Run the wordcount example program:
./hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.5.jar wordcount /input /output
List the contents of the /output directory in HDFS:
./hadoop fs -ls /output
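To see the actual word counts, print the reducer output; with a single reducer the file is normally named part-r-00000:
./hadoop fs -cat /output/part-r-00000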
III. Spark (pseudo-distributed)
Reference tutorial:
docker自主搭建Hadoop3.2.0 HBASE2.1.6 Spark2.4.8三节点集群(含docker镜像制作过程) (学亮编程手记, CSDN)
cd C:\Windows\system32
docker ps -a
Start h01 and h02:
docker start 96870f9bc672
docker start c82291dcdb23
Enter h01:
docker exec -it 96870f9bc672 /bin/bash
1. Install Spark on top of Hadoop
wget https://mirrors.tuna.tsinghua.edu.cn/apache/spark/spark-3.2.4/spark-3.2.4-bin-hadoop3.2.tgz
Extract it to /usr/local:
tar -zxvf spark-3.2.4-bin-hadoop3.2.tgz -C /usr/local/
Rename the directory:
cd /usr/local/
mv spark-3.2.4-bin-hadoop3.2/ spark
2. Edit the /etc/profile environment file
vim /etc/profile
Append:
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin
Apply the changes:
source /etc/profile
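A quick check that the new PATH entry works (this only invokes the launcher; no cluster needs to be running):
spark-submit --version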
Exit the container, enter h02, and append the same two environment-variable lines to its /etc/profile:
cd /
exit
docker exec -it c82291dcdb23 /bin/bash
cd /usr/local/
vim /etc/profile
source /etc/profile
Exit h02 and go back into h01:
cd /usr/local/spark/conf
3. Rename the template file
mv spark-env.sh.template spark-env.sh
4. Edit spark-env.sh and append:
vim spark-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export SCALA_HOME=/usr/share/scala
export SPARK_MASTER_HOST=h01
export SPARK_MASTER_IP=h01
export SPARK_WORKER_MEMORY=4g
5. Rename the workers template
mv workers.template workers
vim workers
Replace the contents entirely (delete localhost):
h01
h02
6. Restart Hadoop
Edit hadoop-env.sh and add the Hadoop home variables:
cd /usr/local/hadoop/etc/hadoop
vim hadoop-env.sh
Append:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Reformat the NameNode:
cd /usr/local/hadoop/bin
./hdfs namenode -format
Start the Hadoop cluster:
cd /usr/local/hadoop/sbin/
./start-all.sh
7. Copy Spark to h02 and start it
Copy the configured Spark directory to h02:
cd /usr/local
scp -r /usr/local/spark root@h02:/usr/local/
Start Spark:
cd /usr/local/spark/sbin/
./start-all.sh
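After Spark's start-all.sh, running jps again should additionally show a Master and a Worker process on h01, and a Worker on h02. As a smoke test you can submit the bundled SparkPi example to the standalone master; the jar path below follows the standard spark-3.2.4-bin-hadoop3.2 layout, so adjust it if yours differs:
jps
/usr/local/spark/bin/spark-submit --master spark://h01:7077 --class org.apache.spark.examples.SparkPi /usr/local/spark/examples/jars/spark-examples_2.12-3.2.4.jar 10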