Hadoop DataNode Code Analysis (4)

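This article examines the DataNode's role in the distributed file system and its key operations: block writes, block replacement, notifying the NameNode, heartbeats and block state reports, and data transfer between DataNodes.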


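(1) Write operation: BlockReceiver is the main class that handles writeBlock. Writes go through a pipeline. A block write may come from a client request or from a block-copy command issued by the NameNode, and the two cases are handled differently. An intermediate node in the pipeline has four network streams (data in from the upstream node, data out to the downstream node, acknowledgements in from downstream, and acknowledgements out to upstream), plus two file streams that write the block data file and the checksum file. The main thread reads each packet and forwards it directly to the next DataNode, while a separate PacketResponder thread handles replies to the upstream node and receives the messages sent by the downstream node. PacketResponder uses a heartbeat mechanism: after a certain time it sends a heartbeat upstream. Only the last DataNode has to send heartbeats on schedule; the intermediate DataNodes just forward the heartbeats they receive, so the node at the head of the pipeline learns the state of the whole chain. It follows that the last DataNode's PacketResponder sends two kinds of messages upstream: heartbeats and acknowledgements for packets that have been written. When an intermediate node receives a packet sequence number from its downstream node, it compares it with the sequence number of the packet it has finished writing itself, which ensures the write is correct along the whole chain.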
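The sequence-number check on an intermediate node can be sketched as follows. This is a simplified model, not Hadoop's actual PacketResponder: the class name `AckRelaySketch`, its methods, and the use of `-1` as the heartbeat marker are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical model of ack handling on an intermediate pipeline node. */
class AckRelaySketch {
    /** Assumed sentinel that marks a heartbeat rather than a data-packet ack. */
    static final long HEARTBEAT_SEQNO = -1L;

    /** Sequence numbers written locally that are still waiting for a downstream ack. */
    private final Deque<Long> pendingSeqnos = new ArrayDeque<>();
    /** Everything this node has relayed to its upstream neighbour, in order. */
    private final List<Long> sentUpstream = new ArrayList<>();

    /** Called by the receiving thread once a packet is written locally and forwarded. */
    void packetWritten(long seqno) {
        pendingSeqnos.addLast(seqno);
    }

    /** Called when an ack or a heartbeat arrives from the downstream node. */
    void onDownstreamAck(long seqno) {
        if (seqno == HEARTBEAT_SEQNO) {
            sentUpstream.add(seqno); // heartbeats are relayed unchanged
            return;
        }
        Long expected = pendingSeqnos.pollFirst();
        if (expected == null || expected != seqno) {
            throw new IllegalStateException(
                "ack for seqno " + seqno + " but expected " + expected);
        }
        sentUpstream.add(seqno); // local write and downstream ack agree
    }

    List<Long> sentUpstream() {
        return sentUpstream;
    }
}
```

Keeping the pending sequence numbers in a FIFO queue reflects the assumption that acks arrive in the same order the packets were written; any mismatch is treated as a pipeline error.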

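(2) Replace-block operation: when a DataNode receives a replace-block request, the request names the source DataNode that holds the block to be replaced. The current node sends a copy-block request to that source node; a copy request is handled much like a block read.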

(3) Whenever the block data on a DataNode changes in any way, whether a block is added or deleted, the DataNode must tell the NameNode through notifyNamenodeReceivedBlock.

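(4) The DataNode reports block state changes using the information held in two lists, receivedBlockList and delHints. receivedBlockList records the new blocks successfully created on this DataNode, while delHints names the node from which the corresponding block may be deleted. The elements of the two lists correspond one to one; an empty delHints entry means nothing needs to be deleted.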
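The pairing of the two lists can be illustrated with a small sketch. Only the names `receivedBlockList` and `delHints` come from the text above; the class `ReceivedBlockQueueSketch`, the use of plain `long` block IDs, the empty-string hint constant, and the report format are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the two parallel lists; not the DataNode's real code. */
class ReceivedBlockQueueSketch {
    /** Assumed marker for "no replica needs to be deleted". */
    static final String EMPTY_DEL_HINT = "";

    private final List<Long> receivedBlockList = new ArrayList<>();
    private final List<String> delHints = new ArrayList<>();

    /** Appends to both lists together so index i always describes the same block. */
    synchronized void notifyReceived(long blockId, String delHint) {
        if (delHint == null) {
            throw new IllegalArgumentException("delHint must not be null");
        }
        receivedBlockList.add(blockId);
        delHints.add(delHint);
    }

    /** Drains both lists and returns one human-readable line per reported block. */
    synchronized List<String> drainReport() {
        List<String> report = new ArrayList<>();
        for (int i = 0; i < receivedBlockList.size(); i++) {
            String hint = delHints.get(i);
            String line = "block " + receivedBlockList.get(i);
            if (!hint.equals(EMPTY_DEL_HINT)) {
                line += " (delete replica on " + hint + ")";
            }
            report.add(line);
        }
        receivedBlockList.clear();
        delHints.clear();
        return report;
    }
}
```

Adding to and clearing both lists inside the same synchronized methods is what keeps the one-to-one correspondence intact.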

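(5) The responses to heartbeats and block state reports can carry commands, and this is the only way for the NameNode to issue requests to a DataNode. Heartbeats are sent every heartBeatInterval, and block state reports every blockReportInterval. The main commands returned are:

  DNA_TRANSFER: copy blocks to other DataNodes

  DNA_INVALIDATE: delete blocks (handled by a simple method)

  DNA_SHUTDOWN: shut down the DataNode (handled by a simple method)

  DNA_REGISTER: re-register the DataNode (handled by a simple method)

  DNA_FINALIZE: finalize an upgrade (handled by a simple method)

  DNA_RECOVERBLOCK: recover a block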
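The two reporting intervals described in (5) can be sketched as a small scheduler that the DataNode's service loop would consult on each pass. The field names `heartBeatInterval` and `blockReportInterval` follow the text; the class `ReportSchedulerSketch` and its methods are hypothetical, and time is passed in explicitly so the logic is easy to check.

```java
/** Hypothetical model of the interval checks; not the DataNode's real service loop. */
class ReportSchedulerSketch {
    private final long heartBeatInterval;   // milliseconds between heartbeats
    private final long blockReportInterval; // milliseconds between block reports
    private long lastHeartbeat;
    private long lastBlockReport;

    ReportSchedulerSketch(long heartBeatInterval, long blockReportInterval, long startTime) {
        this.heartBeatInterval = heartBeatInterval;
        this.blockReportInterval = blockReportInterval;
        this.lastHeartbeat = startTime;
        this.lastBlockReport = startTime;
    }

    /** Returns true (and restarts the timer) when a heartbeat should be sent. */
    boolean heartbeatDue(long now) {
        if (now - lastHeartbeat >= heartBeatInterval) {
            lastHeartbeat = now;
            return true;
        }
        return false;
    }

    /** Returns true (and restarts the timer) when a block report should be sent. */
    boolean blockReportDue(long now) {
        if (now - lastBlockReport >= blockReportInterval) {
            lastBlockReport = now;
            return true;
        }
        return false;
    }
}
```

In this model, any command from the NameNode would arrive as the return value of whichever call was due, which matches the point that the NameNode never contacts the DataNode on its own.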

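A command is carried out by opening a socket to the ServerSocket at the target address and sending a request that tells that DataNode which operation to prepare for. The operations are: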

OP_WRITE_BLOCK (80): write a block

OP_READ_BLOCK (81): read a block

OP_READ_METADATA (82): read a block's metadata file

OP_REPLACE_BLOCK (83): replace a block

OP_COPY_BLOCK (84): copy a block

OP_BLOCK_CHECKSUM (85): read a block's checksum
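As a rough illustration of how a receiving DataNode might branch on these opcodes, here is a minimal sketch. The constant names and numeric values are the ones listed above; the class `DataTransferOps` and the `describe` method are hypothetical and stand in for the real request handlers.

```java
/** Hypothetical opcode table built from the values listed above. */
final class DataTransferOps {
    static final int OP_WRITE_BLOCK = 80;
    static final int OP_READ_BLOCK = 81;
    static final int OP_READ_METADATA = 82;
    static final int OP_REPLACE_BLOCK = 83;
    static final int OP_COPY_BLOCK = 84;
    static final int OP_BLOCK_CHECKSUM = 85;

    private DataTransferOps() {
    }

    /** Maps an opcode read from the socket to the operation it requests. */
    static String describe(int opcode) {
        switch (opcode) {
            case OP_WRITE_BLOCK:
                return "write block";
            case OP_READ_BLOCK:
                return "read block";
            case OP_READ_METADATA:
                return "read block metadata";
            case OP_REPLACE_BLOCK:
                return "replace block";
            case OP_COPY_BLOCK:
                return "copy block";
            case OP_BLOCK_CHECKSUM:
                return "read block checksum";
            default:
                // An unknown opcode is rejected rather than silently ignored.
                throw new IllegalArgumentException("unknown opcode " + opcode);
        }
    }
}
```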

(6) The transferBlocks method starts one DataTransfer thread per block to transfer the data; each thread acts as a client sending a request to the target DataNode.
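The thread-per-block pattern in (6) might look roughly like the sketch below. The names `transferBlocks` and `DataTransfer` come from the text; everything else is assumed, including the `long` block IDs, the string target names, and the log entry that stands in for the real socket transfer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical model of one-thread-per-block transfer; no real network I/O. */
class TransferBlocksSketch {
    /** Thread-safe record of the transfers that ran, used in place of a socket write. */
    final List<String> log = new CopyOnWriteArrayList<>();

    /** One unit of work: push a single block to a single target DataNode. */
    class DataTransfer implements Runnable {
        private final long blockId;
        private final String target;

        DataTransfer(long blockId, String target) {
            this.blockId = blockId;
            this.target = target;
        }

        @Override
        public void run() {
            // A real implementation would connect to the target and stream the block.
            log.add("block " + blockId + " -> " + target);
        }
    }

    /** Starts one thread per block and returns the started threads. */
    List<Thread> transferBlocks(long[] blockIds, String[] targets) {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < blockIds.length; i++) {
            Thread t = new Thread(new DataTransfer(blockIds[i], targets[i]));
            t.start();
            threads.add(t);
        }
        return threads;
    }

    /** Waits for all transfers to finish, restoring the interrupt flag if interrupted. */
    static void joinAll(List<Thread> threads) {
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}
```

Because each transfer runs in its own thread, a slow target only delays its own block and not the others.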
