HDFS Architecture (daemon states)

NameNode
- Start methods (stop methods):
  - hdfs namenode (stop by closing the terminal)
  - hadoop-daemon.sh start namenode (hadoop-daemon.sh stop namenode, or kill the process)
  - start-dfs.sh (stop-dfs.sh, or kill the process)
- The NameNode's default JVM heap size is 1000 MB
- NameNode daemon responsibilities:
  - Maintains the image file of the HDFS cluster metadata [fsimage]; fsimage contains file attribute information and the file-to-block mapping
  - Records the clients' operations against HDFS in the edit log [edits_log]
  - Receives heartbeat reports from all DataNodes; report contents: block information and the node each block resides on
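To see what fsimage and edits_log actually contain, Hadoop ships offline viewers. A minimal sketch, assuming Hadoop 2.7.3; the file names and metadata directory are hypothetical (check dfs.namenode.name.dir on your cluster):

# Dump the metadata image to XML with the Offline Image Viewer:
hdfs oiv -p XML -i /home/hadoop/dfs/name/current/fsimage_0000000000000000042 -o /tmp/fsimage.xml

# Dump an edit-log segment with the Offline Edits Viewer (XML is the default output format):
hdfs oev -i /home/hadoop/dfs/name/current/edits_0000000000000000001-0000000000000000042 -o /tmp/edits.xml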
DataNode
- Start methods (stop methods):
  - hdfs datanode (stop by closing the terminal)
  - hadoop-daemon.sh start datanode (hadoop-daemon.sh stop datanode, or kill the process)
  - start-dfs.sh (stop-dfs.sh, or kill the process)
- DataNode daemon responsibilities:
  - Monitors the running state of its own node in real time
  - Stores data in the form of blocks
  - Responds to client read/write requests
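To check how a file is split into blocks and which DataNodes hold each replica, fsck gives a quick view; the file path below is a placeholder:

# List the blocks of a file and the DataNodes storing each replica:
hdfs fsck /user/hadoop/demo.txt -files -blocks -locations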
SecondaryNameNode
- Start methods (stop methods):
  - hdfs secondarynamenode (stop by closing the terminal)
  - hadoop-daemon.sh start secondarynamenode (hadoop-daemon.sh stop secondarynamenode, or kill the process)
  - start-dfs.sh (stop-dfs.sh, or kill the process)
- Periodically merges the fsimage file and the edits_log file, ensuring the reliability of the cluster
- SecondaryNameNode daemon's purpose: addresses HDFS reliability
  - Using the checkpoint mechanism, the SecondaryNameNode merges fsimage and edits_log, so that HDFS metadata can be recovered after the NameNode goes down
  - HA (high availability) is what addresses the HDFS single point of failure
- The difference:
  - The SecondaryNameNode addresses high reliability
  - HA addresses high availability; the two approach system stability from different angles
  - Once HA is used, a SecondaryNameNode is no longer needed: another node holds the NameNode state, and when the running NameNode dies, that node takes over, guaranteeing high availability
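When the checkpoint fires is configurable. A minimal sketch of reading the two trigger settings (Hadoop 2.x defaults shown in the comments):

# Checkpoint every N seconds (default 3600):
hdfs getconf -confKey dfs.namenode.checkpoint.period

# ...or sooner, once this many uncheckpointed transactions accumulate (default 1000000):
hdfs getconf -confKey dfs.namenode.checkpoint.txns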
Reliability:
- 3 replicas of the data: ensures the reliability of the data
- Heartbeat mechanism: ensures the reliability of the data nodes
- SecondaryNameNode: ensures reliable recovery after a crash
- Rack awareness: ensures reliable performance
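The first two points map directly onto configuration keys; a minimal sketch of checking their defaults:

# Replication factor (default 3 copies of each block):
hdfs getconf -confKey dfs.replication

# DataNode heartbeat interval in seconds (default 3):
hdfs getconf -confKey dfs.heartbeat.interval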
YARN Architecture

ResourceManager
- Start method (stop method):
  - start-yarn.sh
  - stop-yarn.sh, or kill the process
- Role:
  - Monitors and allocates the cluster's global resources: CPU, memory, disk, and network.
  - Obtains each NodeManager node's resource data and running state through the heartbeat mechanism.
  - Launches the ApplicationMaster and allocates the resources it requires.
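The ResourceManager's heartbeat-built view of the cluster can be inspected from the command line; a minimal sketch:

# List the NodeManagers currently registered with the ResourceManager:
yarn node -list

# List the applications the ResourceManager is tracking:
yarn application -list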
NodeManager daemon
- Start method (stop method):
  - start-yarn.sh
  - stop-yarn.sh, or kill the process
- Role:
  - Manages and monitors the resource usage of the current node (itself).
  - Reports to the ResourceManager through the heartbeat mechanism; report contents: CPU, memory, disk, and network.
  - Used to launch and run Tasks (MapTask and ReduceTask).
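A single NodeManager can also be started and stopped on its own node, analogous to hadoop-daemon.sh for the HDFS daemons; a minimal sketch using the Hadoop 2.x script:

# Run on the node itself:
yarn-daemon.sh start nodemanager
yarn-daemon.sh stop nodemanager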
ApplicationMaster daemon
- Role:
  - The MRAppMaster's lifecycle begins with the Job's creation.
  - Monitors and schedules the current Job application; contents: resources (jar, conf, split)
  - Requests the resources the submitted tasks need from the ResourceManager.
- Default replication of the job-submission files:
  - jar: 10 replicas by default
  - split: 10 by default
  - job.xml: 3 by default
  - splitmetainfo: 3 by default (the split metadata)
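The 10-replica default for the job jar and split files comes from a single client-side setting and can be overridden per job. A minimal sketch using the stock wordcount example; the jar path and the input/output paths are placeholders:

# mapreduce.client.submit.file.replication defaults to 10; lower it on a small cluster:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount \
  -D mapreduce.client.submit.file.replication=3 \
  /input /output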
YarnChild (started when YARN runs a MapReduce application)
[Starts when the application starts; its lifecycle matches the application's]
- Role:
  - A YarnChild's lifecycle begins with the Job's creation
  - Scheduled by the MRAppMaster to run a MapTask or a ReduceTask
  - A node starts at most 2 of them by default
  - Each process uses 200 MB of memory by default
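The 200 MB comes from the default task JVM options (mapred.child.java.opts = -Xmx200m in Hadoop 2.7.3) and can be raised per job; a minimal sketch with placeholder paths:

# Give the map and reduce JVMs 512 MB heaps instead of the 200 MB default:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount \
  -D mapreduce.map.java.opts=-Xmx512m \
  -D mapreduce.reduce.java.opts=-Xmx512m \
  /input /output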
JobHistoryServer
- Start method:
  [hadoop@master hadoop]$ mr-jobhistory-daemon.sh start historyserver
  starting historyserver, logging to /home/hadoop/soft/hadoop-2.7.3/logs/mapred-hadoop-historyserver-master.out
  [hadoop@master hadoop]$ jps
  21896 JobHistoryServer
  21931 Jps
- Role:
  - Records the history of Job runs
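Once the history server is up, finished jobs can be queried from the CLI (its web UI listens on port 19888 by default); a minimal sketch:

# List all jobs known to the cluster, including completed ones:
mapred job -list all

# Or filter finished applications through YARN:
yarn application -list -appStates FINISHED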
Pseudo-distributed cluster: daemon states

Method one: start-dfs.sh + start-yarn.sh

start-dfs.sh starts the following daemons:

[hadoop@master ~]$ start-dfs.sh
Starting namenodes on [master]
master: starting namenode, logging to /home/hadoop/soft/hadoop-2.7.3/logs/hadoop-hadoop-namenode-master.out
master: starting datanode, logging to /home/hadoop/soft/hadoop-2.7.3/logs/hadoop-hadoop-datanode-master.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/hadoop/soft/hadoop-2.7.3/logs/hadoop-hadoop-secondarynamenode-master.out

Master node (the only node):
[hadoop@master ~]$ jps
17922 Jps
17804 SecondaryNameNode
17583 NameNode
17886 DataNode
start-yarn.sh starts the following daemons: [run this command on the master node only; running it on a slave node will not start the ResourceManager]

[hadoop@master ~]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/soft/hadoop-2.7.3/logs/yarn-hadoop-resourcemanager-master.out
slave02: starting nodemanager, logging to /home/hadoop/soft/hadoop-2.7.3/logs/yarn-hadoop-nodemanager-slave02.out
slave03: starting nodemanager, logging to /home/hadoop/soft/hadoop-2.7.3/logs/yarn-hadoop-nodemanager-slave03.out
slave01: starting nodemanager, logging to /home/hadoop/soft/hadoop-2.7.3/logs/yarn-hadoop-nodemanager-slave01.out

Master node:
[hadoop@master ~]$ jps
18262 Jps
17804 SecondaryNameNode
17998 ResourceManager
17583 NameNode
17886 DataNode
Fully distributed cluster: daemon states

Method one: start-dfs.sh + start-yarn.sh

start-dfs.sh starts the following daemons:

[hadoop@master ~]$ start-dfs.sh
Starting namenodes on [master]
master: starting namenode, logging to /home/hadoop/soft/hadoop-2.7.3/logs/hadoop-hadoop-namenode-master.out
slave03: starting datanode, logging to /home/hadoop/soft/hadoop-2.7.3/logs/hadoop-hadoop-datanode-slave03.out
slave01: starting datanode, logging to /home/hadoop/soft/hadoop-2.7.3/logs/hadoop-hadoop-datanode-slave01.out
slave02: starting datanode, logging to /home/hadoop/soft/hadoop-2.7.3/logs/hadoop-hadoop-datanode-slave02.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/hadoop/soft/hadoop-2.7.3/logs/hadoop-hadoop-secondarynamenode-master.out

Master node:
[hadoop@master ~]$ jps
17922 Jps
17804 SecondaryNameNode (only 1; often placed on a dedicated host)
17583 NameNode (only 1; started on the master node)
[The master node runs the NameNode daemon and the SecondaryNameNode; a separate host is usually dedicated to the secondary name node]

Slave node:
[hadoop@slave01 ~]$ jps
19360 DataNode (many; one on every data node, maintaining and managing the data)
19472 Jps
[Each slave node runs a DataNode daemon; every data node starts one]
start-yarn.sh starts the following daemons: [run this command on the master node only; running it on a slave node will not start the ResourceManager]

[hadoop@master ~]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/soft/hadoop-2.7.3/logs/yarn-hadoop-resourcemanager-master.out
slave02: starting nodemanager, logging to /home/hadoop/soft/hadoop-2.7.3/logs/yarn-hadoop-nodemanager-slave02.out
slave03: starting nodemanager, logging to /home/hadoop/soft/hadoop-2.7.3/logs/yarn-hadoop-nodemanager-slave03.out
slave01: starting nodemanager, logging to /home/hadoop/soft/hadoop-2.7.3/logs/yarn-hadoop-nodemanager-slave01.out

Master node:
[hadoop@master ~]$ jps
18262 Jps
17804 SecondaryNameNode
17998 ResourceManager (only 1; started on the master node)
17583 NameNode
[The master node starts the ResourceManager, which schedules the cluster's overall resources]

Slave node:
[hadoop@slave03 ~]$ jps
18870 DataNode
19035 NodeManager (many; one on every data node)
19485 Jps
[Each slave node starts a NodeManager, which manages the resource scheduling of its own node]
Method two:
hadoop-daemon.sh start namenode +
hadoop-daemon.sh start datanode +
hadoop-daemon.sh start secondarynamenode +
start-yarn.sh

hadoop-daemon.sh start namenode

[hadoop@master ~]$ hadoop-daemon.sh start namenode
starting namenode, logging to /home/hadoop/soft/hadoop-2.7.3/logs/hadoop-hadoop-namenode-master.out
[hadoop@master ~]$ jps
18731 NameNode
18862 Jps
hadoop-daemon.sh start datanode

[hadoop@slave01 ~]$ hadoop-daemon.sh start datanode
starting datanode, logging to /home/hadoop/soft/hadoop-2.7.3/logs/hadoop-hadoop-datanode-slave01.out
[hadoop@slave01 ~]$ jps
19877 Jps
19829 DataNode
[Run this command on each data node you want to bring up; it starts that node's DataNode]
hadoop-daemon.sh start secondarynamenode

[hadoop@master ~]$ hadoop-daemon.sh start secondarynamenode
starting secondarynamenode, logging to /home/hadoop/soft/hadoop-2.7.3/logs/hadoop-hadoop-secondarynamenode-master.out
[hadoop@master ~]$ jps
18827 SecondaryNameNode
18862 Jps
[Can be started on any node; it performs the merge of the fsimage and edits_log files]
start-yarn.sh

[Daemons are usually started with method two; a consolidated sketch follows]
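Putting method two together, node by node (hostnames follow the cluster above):

# On master:
hadoop-daemon.sh start namenode
hadoop-daemon.sh start secondarynamenode   # or on any other node

# On each of slave01, slave02, slave03:
hadoop-daemon.sh start datanode

# Back on master, bring up the ResourceManager and all NodeManagers:
start-yarn.sh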
Daemons while a Job is running

Master node:

[hadoop@master hadoop]$ jps
20321 SecondaryNameNode
20119 NameNode
20761 Jps
20474 ResourceManager
[No change on the master]

Slave node:

[hadoop@slave03 ~]$ jps
20544 NodeManager
20420 DataNode
20857 YarnChild (after the MRAppMaster comes up, it launches YarnChild processes as Mapper tasks are scheduled, to execute the Mapper and Reducer tasks; a data node starts at most 2 YarnChild processes by default, and which data node a YarnChild lands on is random)
20905 Jps
20765 MRAppMaster (after the Job starts, the ResourceManager picks a NodeManager at random and has it start an application master, which manages the entire application's execution; its lifecycle begins when the Job starts and ends when the Job ends)
[Two extra daemons have appeared: MRAppMaster and YarnChild]
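While a job is running, the same picture can be seen through the YARN CLI instead of jps; a minimal sketch, where the application and attempt IDs are hypothetical:

# Find the running application's ID:
yarn application -list

# List its attempts, then its containers (the MRAppMaster and YarnChild JVMs):
yarn applicationattempt -list application_1500000000000_0001
yarn container -list appattempt_1500000000000_0001_000001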