Hadoop_DataNode_代码分析(3)

本文深入分析了DataNode在数据块接收与发送过程中的动态行为,重点介绍了DataXceiverServer与DataXceiver的工作原理及其实现方式,包括BlockSender与BlockReceiver的辅助作用。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

通过上面的一系列介绍,我们知道了DataNode工作时的文件结构和文件结构在内存中的对应对象。下面我们可以来开始分析DataNode上的动态行为。首先我们来分析DataXceiverServer和DataXceiver。DataNode上数据块的接受/发送并没有采用我们前面介绍的RPC机制,原因很简单,RPC是一个命令式的接口,而DataNode处理数据部分,往往是一种流式机制。DataXceiverServer和DataXceiver就是这个机制的实现。其中,DataXceiver还依赖于两个辅助类:BlockSender和BlockReceiver。

DataXceiverServer很简单,它打开一个端口,然后每接收到一个连接,就创建一个DataXceiver,服务于该连接,DataXceiver是一个线程读一次操作请求进行操作之后就返回,并记录该连接的socket,对应的实现在DataXceiverServer的run方法里。当系统关闭时,DataXceiverServer将关闭监听的socket和所有DataXceiver的socket,这样就导致了DataXceiver出错并结束线程。DataXceiverServer接受到的数据主要有操作码+操作数据+用户名。

(1)BlockSender用来发送block数据,返回给用户的是:成功与否+校验类型+实际offset(因为校验块的原因和用户请求的offset不一致)。BlockSender有配置参数corruptChecksumOk(校验数据读入出错忽略,出错用零填充),chunkOffsetOK(是否要告知实际的offset,如上所述),verifyChecksum(是否要求在把校验数据和实际数据读入包缓存中时校验数据,也就是在发送之前),向客户端传包的时候第一、二个参数为true,第三为false,为的是尽快发送数据。而用来校验已有数据时使用第一二参数为false,第三参数为true,为了及时发现错误数据。readBlock完成实际读数据的操作,比较简单。sendChunks方法中,对于客户端传包的包只有校验和而实际数据通过管道传输,具体见函数。

(2)SocketIOWithTimeout,被其他类继承完成超时非阻塞socket,真正的读写操作由子类控制,故设置抽象方法performIO,使用SelectorPool类来完成高效selector的新建和重用,子类只需要告诉他要注册的channel和需要select的操作。

@echo off @rem Licensed to the Apache Software Foundation (ASF) under one or more @rem contributor license agreements. See the NOTICE file distributed with @rem this work for additional information regarding copyright ownership. @rem The ASF licenses this file to You under the Apache License, Version 2.0 @rem (the "License"); you may not use this file except in compliance with @rem the License. You may obtain a copy of the License at @rem @rem http://www.apache.org/licenses/LICENSE-2.0 @rem @rem Unless required by applicable law or agreed to in writing, software @rem distributed under the License is distributed on an "AS IS" BASIS, @rem WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. @rem See the License for the specific language governing permissions and @rem limitations under the License. @rem Set Hadoop-specific environment variables here. @rem The only required environment variable is JAVA_HOME. All others are @rem optional. When running a distributed configuration it is best to @rem set JAVA_HOME in this file, so that it is correctly defined on @rem remote nodes. @rem The java implementation to use. Required. set JAVA_HOME=%JAVA_HOME% @rem The jsvc implementation to use. Jsvc is required to run secure datanodes. @rem set JSVC_HOME=%JSVC_HOME% @rem set HADOOP_CONF_DIR= @rem Extra Java CLASSPATH elements. Automatically insert capacity-scheduler. if exist %HADOOP_HOME%\contrib\capacity-scheduler ( if not defined HADOOP_CLASSPATH ( set HADOOP_CLASSPATH=%HADOOP_HOME%\contrib\capacity-scheduler\*.jar ) else ( set HADOOP_CLASSPATH=%HADOOP_CLASSPATH%;%HADOOP_HOME%\contrib\capacity-scheduler\*.jar ) ) @rem The maximum amount of heap to use, in MB. Default is 1000. @rem set HADOOP_HEAPSIZE= @rem set HADOOP_NAMENODE_INIT_HEAPSIZE="" @rem Extra Java runtime options. Empty by default. @rem set HADOOP_OPTS=%HADOOP_OPTS% -Djava.net.preferIPv4Stack=true @rem Command specific options appended to HADOOP_OPTS when specified if not defined HADOOP_SECURITY_LOGGER ( set HADOOP_SECURITY_LOGGER=INFO,RFAS ) if not defined HDFS_AUDIT_LOGGER ( set HDFS_AUDIT_LOGGER=INFO,NullAppender ) set HADOOP_NAMENODE_OPTS=-Dhadoop.security.logger=%HADOOP_SECURITY_LOGGER% -Dhdfs.audit.logger=%HDFS_AUDIT_LOGGER% %HADOOP_NAMENODE_OPTS% set HADOOP_DATANODE_OPTS=-Dhadoop.security.logger=ERROR,RFAS %HADOOP_DATANODE_OPTS% set HADOOP_SECONDARYNAMENODE_OPTS=-Dhadoop.security.logger=%HADOOP_SECURITY_LOGGER% -Dhdfs.audit.logger=%HDFS_AUDIT_LOGGER% %HADOOP_SECONDARYNAMENODE_OPTS% @rem The following applies to multiple commands (fs, dfs, fsck, distcp etc) set HADOOP_CLIENT_OPTS=-Xmx512m %HADOOP_CLIENT_OPTS% @rem set HADOOP_JAVA_PLATFORM_OPTS="-XX:-UsePerfData %HADOOP_JAVA_PLATFORM_OPTS%" @rem On secure datanodes, user to run the datanode as after dropping privileges set HADOOP_SECURE_DN_USER=%HADOOP_SECURE_DN_USER% @rem Where log files are stored. %HADOOP_HOME%/logs by default. @rem set HADOOP_LOG_DIR=%HADOOP_LOG_DIR%\%USERNAME% @rem Where log files are stored in the secure data environment. set HADOOP_SECURE_DN_LOG_DIR=%HADOOP_LOG_DIR%\%HADOOP_HDFS_USER% @rem @rem Router-based HDFS Federation specific parameters @rem Specify the JVM options to be used when starting the RBF Routers. @rem These options will be appended to the options specified as HADOOP_OPTS @rem and therefore may override any similar flags set in HADOOP_OPTS @rem @rem set HADOOP_DFSROUTER_OPTS="" @rem @rem The directory where pid files are stored. /tmp by default. @rem NOTE: this should be set to a directory that can only be written to by @rem the user that will run the hadoop daemons. Otherwise there is the @rem potential for a symlink attack. set HADOOP_PID_DIR=%HADOOP_PID_DIR% set HADOOP_SECURE_DN_PID_DIR=%HADOOP_PID_DIR% @rem A string representing this instance of hadoop. %USERNAME% by default. set HADOOP_IDENT_STRING=%USERNAME% 这个hadoophadoop-env.cmd该怎么改
最新发布
06-04
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值