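I. Container list analysis
The `docker ps` output shows that the running big data cluster consists of four components (Hadoop, Hive, HBase, and MySQL) spread across multiple containers. The role and port mapping (host port → container port) of each component are listed below: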
1. Hadoop components
Container name | Role | Port mapping | Function |
---|---|---|---|
hadoop-hdfs-nn | HDFS NameNode | 30070 → 9870 (HDFS Web UI) | Manages file system metadata |
hadoop-hdfs-dn-0/1/2 | HDFS DataNode | 30864/30865/30866 → 9864 | Stores the actual data blocks |
hadoop-yarn-rm | YARN ResourceManager | 30888 → 8088 (YARN Web UI) | Cluster-wide resource scheduling and management |
hadoop-yarn-nm-0/1/2 | YARN NodeManager | 30042/30043/30044 → 8042 | Per-node resource management |
hadoop-mr-historyserver | MapReduce History | 31988 → 19888 | Records job history |
hadoop-yarn-proxyserver | YARN ProxyServer | 30911 → 9111 | Proxies access to YARN application web UIs and logs |
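2. Hive components
Container name | Role | Port mapping | Function |
---|---|---|---|
hive-metastore | Hive Metastore | 30983 → 9083 | Stores table schemas, partitions, and other metadata (depends on MySQL) |
hive-hiveserver2 | HiveServer2 | 31000 → 10000 | Provides the JDBC/ODBC interface for running Hive queries |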
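3. HBase components
Container name | Role | Port mapping | Function |
---|---|---|---|
hbase-master-1/2 | HBase Master | 36010/36011 → 36010/36011 | Manages Region assignment and cluster coordination |
hbase-regionserver-1/2/3 | HBase RegionServer | 36030/36031/36032 → 36030/36031/36032 | Stores and serves Region data |

Note: per the `docker ps` output, the HBase containers publish the same port number on the host and in the container (for example 36010 → 36010), not the HBase default ports 16010/16030.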
4. Other components
Container name | Role | Port mapping | Function |
---|---|---|---|
mysql | MySQL | 13306 → 3306 | Stores the Hive Metastore metadata |
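The port mappings in the tables above can be re-derived from the PORTS column of `docker ps`. The following is a minimal Python sketch of that parsing, not a definitive tool: the helper name `parse_ports` and the `sample` dictionary are illustrative assumptions, and the sample strings are copied from the output in the appendix.

```python
import re

# Matches one published IPv4 mapping such as "0.0.0.0:31000->10000/tcp".
# The IPv6 duplicate (":::31000->10000/tcp") and unpublished ports
# ("33060/tcp") do not match and are therefore skipped.
_MAPPING = re.compile(r"0\.0\.0\.0:(\d+)->(\d+)/(tcp|udp)")


def parse_ports(ports_field: str) -> list[tuple[int, int]]:
    """Return (host_port, container_port) pairs from one PORTS cell."""
    return [(int(h), int(c)) for h, c, _proto in _MAPPING.findall(ports_field)]


# PORTS cells copied from the docker ps output (hypothetical subset).
sample = {
    "hive-hiveserver2": "0.0.0.0:31000->10000/tcp, :::31000->10000/tcp",
    "hbase-master-1": "0.0.0.0:36010->36010/tcp, :::36010->36010/tcp",
    "mysql": "33060/tcp, 0.0.0.0:13306->3306/tcp, :::13306->3306/tcp",
}

port_map = {name: parse_ports(field) for name, field in sample.items()}

for name, pairs in port_map.items():
    for host_port, container_port in pairs:
        print(f"{name}: {host_port} -> {container_port}")
```

Parsing only the `0.0.0.0:` entries avoids counting each mapping twice, since Docker lists the same mapping again for IPv6.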
II. Architecture diagram
Below is the architecture diagram of the container-based big data platform, showing how the components interact and how data flows (ports shown are container ports):
```plaintext
+----------------------------------------------------------------------------------------+
|                                      Docker Host                                       |
|                                                                                        |
|  +----------------+    +----------------+    +----------------+    +----------------+  |
|  | HBase Master   |    | HBase Region   |    | HBase Region   |    | HBase Region   |  |
|  |hbase-master-1/2|    | Server 1       |    | Server 2       |    | Server 3       |  |
|  | Port: 36010/11 |    | Port: 36030    |    | Port: 36031    |    | Port: 36032    |  |
|  +-------+--------+    +-------+--------+    +-------+--------+    +-------+--------+  |
|          |                     |                     |                     |           |
|  +-------v---------------------v---------------------v---------------------v--------+  |
|  |                                  HDFS (Hadoop)                                   |  |
|  |   +--------------+    +--------------+    +--------------+    +--------------+   |  |
|  |   | NameNode     |    | DataNode 0   |    | DataNode 1   |    | DataNode 2   |   |  |
|  |   | Port: 9870   |    | Port: 9864   |    | Port: 9864   |    | Port: 9864   |   |  |
|  |   +------+-------+    +------+-------+    +------+-------+    +------+-------+   |  |
|  |          |                   |                   |                   |           |  |
|  |          +-------------------+-------------------+-------------------+           |  |
|  +-----------------------------------------+----------------------------------------+  |
|                                            |                                           |
|  +-----------------------------------------v----------------------------------------+  |
|  |                                  YARN (Hadoop)                                   |  |
|  |  +----------------+  +----------------+  +----------------+  +----------------+  |  |
|  |  | ResourceManager|  | NodeManager 0-2|  | MR History     |  | ProxyServer    |  |  |
|  |  | Port: 8088     |  | Port: 8042     |  | Port: 19888    |  | Port: 9111     |  |  |
|  |  +----------------+  +----------------+  +----------------+  +----------------+  |  |
|  +-------+---------------------+----------------------------------------------------+  |
|          |                     |                                                       |
|  +-------+--------+    +-------+--------+                                              |
|  | Hive Metastore |    | HiveServer2    |                                              |
|  | Port: 9083     |    | Port: 10000    |                                              |
|  +-------+--------+    +-------+--------+                                              |
|          |                     |                                                       |
|  +-------v---------------------v--------+                                              |
|  | MySQL                                |                                              |
|  | Port: 3306 (mapped to host 13306)    |                                              |
|  +--------------------------------------+                                              |
|                                                                                        |
+----------------------------------------------------------------------------------------+
```
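III. Key interaction flows
- Hive integration with Hadoop/HBase:
  - Hive Metastore stores metadata (table schemas, partition information) in MySQL.
  - HiveServer2 accepts client queries and runs them as MapReduce or Spark jobs on resources scheduled by YARN; the data resides in HDFS or HBase.
  - Hive can query HBase tables directly through `hive-hbase-handler`.
- HBase integration with Hadoop:
  - HBase Master manages Region assignment and cluster metadata and relies on ZooKeeper for coordination (no ZooKeeper container appears in the `docker ps` output, so it may be embedded in the image).
  - HBase RegionServer serves the actual data; the underlying files (HFile) are written to HDFS.
- YARN resource scheduling:
  - ResourceManager accepts job submissions and allocates resources on the NodeManagers, which run the tasks (for example MapReduce or Spark).
- Client access:
  - Clients operate the cluster through the ports mapped on the host (for example `31000` for HiveServer2 and `30070` for the HDFS Web UI).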
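The client access path described above can be sketched in Python as follows. This is only an illustration under stated assumptions: `DOCKER_HOST_ADDR`, the `default` database name, and the helper names `jdbc_url` and `is_reachable` are hypothetical, while the port numbers are the host-side ports from the container tables.

```python
import socket

# Assumed address of the Docker host; replace with the real host name or IP.
DOCKER_HOST_ADDR = "localhost"

# Host-side ports taken from the container tables.
ENDPOINTS = {
    "HDFS Web UI": 30070,
    "YARN Web UI": 30888,
    "HiveServer2": 31000,
    "MySQL": 13306,
}


def jdbc_url(host: str, port: int = 31000, database: str = "default") -> str:
    """Build a HiveServer2 JDBC URL (the database name is an assumption)."""
    return f"jdbc:hive2://{host}:{port}/{database}"


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, port in ENDPOINTS.items():
        state = "open" if is_reachable(DOCKER_HOST_ADDR, port) else "closed"
        print(f"{name:12s} {DOCKER_HOST_ADDR}:{port} {state}")
    print(jdbc_url(DOCKER_HOST_ADDR))
```

A plain TCP probe only confirms that the mapped port accepts connections; it does not verify that the service behind it is healthy.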
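IV. Potential issues and optimization suggestions
- Port conflicts and planning:
  - Make sure the host port mappings do not conflict (for example, the `36010-36032` range is used by HBase).
  - Consider putting a reverse proxy (such as Nginx) in front of the Web UI ports (for example HDFS 9870 and YARN 8088) to provide a single entry point.
- HBase RegionServer performance:
  - Monitor RegionServer heap usage and GC behavior, and adjust the Docker memory limit (the `-m` option) accordingly.
- Hive metadata backup:
  - Back up the `hive_metastore` database in MySQL regularly to prevent metadata loss.
- Persistent storage:
  - Mount persistent host volumes for the HDFS DataNodes and HBase RegionServers so that data is not lost when containers restart.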
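The Hive metadata backup suggested above could be scripted along the following lines. This is a minimal sketch, not a tested backup procedure: the `root` user, the output directory, and the helper names are assumptions, while the `hive_metastore` database name and host port `13306` come from the text above. It assumes `mysqldump` is installed on the machine running the script.

```python
import datetime
import subprocess


def build_backup_command(host="127.0.0.1", port=13306, user="root",
                         database="hive_metastore", out_dir="."):
    """Assemble a mysqldump argument list; nothing is executed here."""
    stamp = datetime.date.today().strftime("%Y%m%d")
    outfile = f"{out_dir}/{database}_{stamp}.sql"
    return [
        "mysqldump",
        f"--host={host}",           # 127.0.0.1 forces a TCP connection
        f"--port={port}",           # host-side port mapped to container 3306
        f"--user={user}",           # assumed account
        "-p",                       # prompt for the password interactively
        "--single-transaction",     # consistent snapshot for InnoDB tables
        f"--result-file={outfile}",
        "--databases", database,
    ]


def run_backup(**kwargs):
    """Run the dump and raise CalledProcessError if mysqldump fails."""
    subprocess.run(build_backup_command(**kwargs), check=True)


if __name__ == "__main__":
    print(" ".join(build_backup_command(out_dir="/tmp")))
```

Scheduling `run_backup` from cron or a similar scheduler would make the backup periodic; in that case the interactive `-p` prompt would need to be replaced by a non-interactive credential mechanism.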
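V. Summary
This architecture uses Docker containers to combine Hadoop, Hive, HBase, and MySQL into a single platform for data storage, computation, and querying. Each component is exposed through host port mappings, which makes the setup well suited to development and test environments. A production deployment would additionally need tuned resource allocation, network topology, and data persistence.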
Appendix: docker ps output (deepseek)
[root@localhost ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9db8135db867 registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hbase:2.5.4 "sh -c '/opt/apache/…" 58 minutes ago Up 58 minutes (healthy) 0.0.0.0:36032->36032/tcp, :::36032->36032/tcp hbase-regionserver-3
87d1c8805654 registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hbase:2.5.4 "sh -c '/opt/apache/…" 58 minutes ago Up 58 minutes (healthy) 0.0.0.0:36030->36030/tcp, :::36030->36030/tcp hbase-regionserver-1
d97ce4ca6f3a registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hbase:2.5.4 "sh -c '/opt/apache/…" 58 minutes ago Up 58 minutes (healthy) 0.0.0.0:36011->36011/tcp, :::36011->36011/tcp hbase-master-2
39871f7c91ee registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hbase:2.5.4 "sh -c '/opt/apache/…" 58 minutes ago Up 58 minutes (healthy) 0.0.0.0:36031->36031/tcp, :::36031->36031/tcp hbase-regionserver-2
41fd836e5a11 registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hbase:2.5.4 "sh -c '/opt/apache/…" 58 minutes ago Up 58 minutes (healthy) 0.0.0.0:36010->36010/tcp, :::36010->36010/tcp hbase-master-1
513bbef9012f registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1 "sh -c '/opt/apache/…" 4 hours ago Up 4 hours (healthy) 0.0.0.0:31000->10000/tcp, :::31000->10000/tcp hive-hiveserver2
19af74410ab7 registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1 "sh -c '/opt/apache/…" 4 hours ago Up 4 hours (healthy) 0.0.0.0:30983->9083/tcp, :::30983->9083/tcp hive-metastore
c3e5f0cb0040 registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1 "sh -c '/opt/apache/…" 4 hours ago Up 4 hours (healthy) 0.0.0.0:30864->9864/tcp, :::30864->9864/tcp hadoop-hdfs-dn-0
dc6ee3379416 registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1 "sh -c '/opt/apache/…" 4 hours ago Up 4 hours (healthy) 0.0.0.0:31988->19888/tcp, :::31988->19888/tcp hadoop-mr-historyserver
89b4e63a011c registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1 "sh -c '/opt/apache/…" 4 hours ago Up 4 hours (healthy) 0.0.0.0:30042->8042/tcp, :::30042->8042/tcp hadoop-yarn-nm-0
7848da3a81e1 registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1 "sh -c '/opt/apache/…" 4 hours ago Up 4 hours (healthy) 0.0.0.0:30044->8042/tcp, :::30044->8042/tcp hadoop-yarn-nm-2
ad9aee34797a registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1 "sh -c '/opt/apache/…" 4 hours ago Up 4 hours (healthy) 0.0.0.0:30866->9864/tcp, :::30866->9864/tcp hadoop-hdfs-dn-2
168c62d84f4f registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1 "sh -c '/opt/apache/…" 4 hours ago Up 4 hours (healthy) 0.0.0.0:30865->9864/tcp, :::30865->9864/tcp hadoop-hdfs-dn-1
26d0954eae63 registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1 "sh -c '/opt/apache/…" 4 hours ago Up 4 hours (healthy) 0.0.0.0:30911->9111/tcp, :::30911->9111/tcp hadoop-yarn-proxyserver
ee2206d52350 registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1 "sh -c '/opt/apache/…" 4 hours ago Up 4 hours (healthy) 0.0.0.0:30043->8042/tcp, :::30043->8042/tcp hadoop-yarn-nm-1
0bf7f4c3936a registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1 "sh -c '/opt/apache/…" 4 hours ago Up 4 hours (healthy) 0.0.0.0:30070->9870/tcp, :::30070->9870/tcp hadoop-hdfs-nn
c27b227026b8 registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/hadoop_hive:v1 "sh -c '/opt/apache/…" 4 hours ago Up 4 hours (healthy) 0.0.0.0:30888->8088/tcp, :::30888->8088/tcp hadoop-yarn-rm
517bde9dbd07 registry.cn-hangzhou.aliyuncs.com/bigdata_cloudnative/mysql:5.7 "docker-entrypoint.s…" 2 days ago Up 7 hours (healthy) 33060/tcp, 0.0.0.0:13306->3306/tcp, :::13306->3306/tcp mysql
[root@localhost ~]#