Lease management is the mechanism HDFS uses when a client writes a file: before writing, the client must first obtain a lease, which is tracked by the LeaseManager. Put simply, a client cannot hold a file open for writing indefinitely; keeping a file open too long would block other users, so a mechanism is needed to bound it, and HDFS uses leases for this. A look at the member variables of a Lease makes this clear:
  private final String holder;     // the lease holder, a string such as "DFSClient_1960866591"
  private long lastUpdate;         // timestamp of the last renewal
  private final Collection<String> paths =
      new TreeSet<String>();       // paths covered by this lease, e.g. "/a.txt"
When LeaseManager$Monitor periodically scans the leases, it checks whether the hard limit has been exceeded. If not, it simply returns without doing anything; if it has, the lease is released internally via fsnamesystem.internalReleaseLeaseOne(oldest, p).
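To make the fields above concrete, here is a minimal, self-contained sketch of a lease record and its expiry checks, assuming the soft/hard limit constants shown later in this post (1 minute and 1 hour). The names mirror the HDFS source, but this is an illustrative toy class, not the real implementation:

```java
import java.util.Collection;
import java.util.TreeSet;

public class SimpleLease {
    static final long SOFT_LIMIT_MS = 60 * 1000L;          // 1 minute
    static final long HARD_LIMIT_MS = 60 * SOFT_LIMIT_MS;  // 1 hour

    final String holder;                 // e.g. "DFSClient_1960866591"
    long lastUpdate;                     // timestamp of the last renewal
    final Collection<String> paths = new TreeSet<String>(); // e.g. "/a.txt"

    SimpleLease(String holder, long now) {
        this.holder = holder;
        this.lastUpdate = now;
    }

    // Renewing a lease only bumps its timestamp.
    void renew(long now) { lastUpdate = now; }

    boolean expiredSoftLimit(long now) { return now - lastUpdate > SOFT_LIMIT_MS; }
    boolean expiredHardLimit(long now) { return now - lastUpdate > HARD_LIMIT_MS; }

    public static void main(String[] args) {
        SimpleLease lease = new SimpleLease("DFSClient_1960866591", 0L);
        System.out.println(lease.expiredHardLimit(61 * 60 * 1000L)); // 61 minutes elapsed
        lease.renew(61 * 60 * 1000L);
        System.out.println(lease.expiredHardLimit(61 * 60 * 1000L)); // just renewed
    }
}
```

As long as the client keeps renewing inside the soft limit, the lease never expires; it is only when the holder goes silent for over an hour that the Monitor steps in.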
To get a concrete feel for this, let's look at some lease-related entries recorded in the NameNode log:
13/08/26 12:11:00 INFO hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_5158458134414014528 is added to invalidSet of 192.168.0.43:50010
13/08/26 12:11:00 INFO namenode.FSNamesystem: Number of transactions: 1 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0 0
13/08/26 12:11:00 INFO hdfs.StateChange: BLOCK* ask 192.168.0.43:50010 to delete blk_5158458134414014528_1018
13/08/26 13:12:40 INFO namenode.LeaseManager: Lease [Lease. Holder: DFSClient_1960866591, pendingcreates:1] has expired hard limit
13/08/26 13:13:42 INFO namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_1960866591, pendingcreates:1], src=/cat1.txt
13/08/26 13:14:21 INFO hdfs.StateChange: Removing lease on file /cat1.txt from client DFSClient_1960866591
13/08/26 13:14:29 WARN hdfs.StateChange: BLOCK* internalReleaseLease: No blocks found, lease removed for /cat1.txt
13/08/26 13:15:44 INFO hdfs.StateChange: BLOCK* NameSystem.heartbeatCheck: lost heartbeat from 192.168.0.43:50010
13/08/26 13:15:44 INFO net.NetworkTopology: Removing a node: /default-rack/192.168.0.43:50010
We can see that the lease's hard limit was triggered at 13/08/26 13:12:40. The following lines show the NameNode attempting to recover the lease; since the open file had zero blocks (in my test I created the file and then simply waited for the timeout), there was nothing to recover, so the attempt failed and the lease was removed. The last two lines are a DataNode heartbeat timeout, which we can ignore here.
Now let's look at the client-side log:
13/08/26 13:15:44 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /cat1.txt File is not open for writing. Holder DFSClient_1960866591 does not have any open files.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1639)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:1622)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1538)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1383)
As you can see, when the client woke up and tried to continue writing, it hit a lease-expired exception, and the write ultimately failed. With these logs in mind, let's look at the source code.
Monitor is an inner class of LeaseManager that periodically checks all leases and removes any that have expired:
  class Monitor implements Runnable {
    final String name = getClass().getSimpleName();

    /** Check leases periodically. */
    public void run() {
      for(; fsnamesystem.isRunning(); ) {
        synchronized(fsnamesystem) {
          checkLeases();
        }
        try {
          Thread.sleep(2000); // check every 2 seconds
        } catch(InterruptedException ie) {
          if (LOG.isDebugEnabled()) {
            LOG.debug(name + " is interrupted", ie);
          }
        }
      }
    }
  }
The log above mentions a hard limit; the check actually involves two limits:
  public static final long LEASE_SOFTLIMIT_PERIOD = 60 * 1000;                    // 1 minute
  public static final long LEASE_HARDLIMIT_PERIOD = 60 * LEASE_SOFTLIMIT_PERIOD; // 1 hour
The main checking function in Monitor is checkLeases(); let's step into its source:
  synchronized void checkLeases() {
    for(; sortedLeases.size() > 0; ) {
      final Lease oldest = sortedLeases.first();
      // Check the oldest lease first: sortedLeases is kept sorted, so if the
      // oldest lease has not expired, none of the others have either.
      if (!oldest.expiredHardLimit()) {
        return;
      }
      LOG.info("Lease " + oldest + " has expired hard limit");

      final List<String> removing = new ArrayList<String>();
      // Need to create a copy of the oldest lease's paths, because
      // internalReleaseLease() removes paths corresponding to empty files,
      // i.e. it needs to modify the collection being iterated over,
      // which would cause a ConcurrentModificationException.
      String[] leasePaths = new String[oldest.getPaths().size()];
      oldest.getPaths().toArray(leasePaths);
      for(String p : leasePaths) {
        try {
          // Begin the internal release; note that this method lives in
          // FSNamesystem, and we will analyze it shortly.
          fsnamesystem.internalReleaseLeaseOne(oldest, p);
        } catch (IOException e) {
          LOG.error("Cannot release the path " + p + " in the lease " + oldest, e);
          removing.add(p);
        }
      }
      // Actually remove them; the mapping from file path to lease is kept
      // in sortedLeasesByPath, a SortedMap<String, Lease>.
      for(String p : removing) {
        removeLease(oldest, p);
      }
    }
  }
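The comment about ConcurrentModificationException deserves a quick demonstration. The sketch below (plain Java, not HDFS code) shows why checkLeases() snapshots the paths into an array before releasing them: removing entries from a TreeSet while iterating it directly trips the fail-fast iterator, while mutating after taking a snapshot is safe:

```java
import java.util.ConcurrentModificationException;
import java.util.TreeSet;

public class CmeDemo {
    // Remove every path from the set while iterating it directly;
    // returns true if ConcurrentModificationException was thrown.
    static boolean removeWhileIterating(TreeSet<String> paths) {
        try {
            for (String p : paths) {
                paths.remove(p); // structural change during iteration
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // The safe pattern used by checkLeases(): snapshot into an array first.
    static void removeViaSnapshot(TreeSet<String> paths) {
        String[] snapshot = paths.toArray(new String[0]);
        for (String p : snapshot) {
            paths.remove(p);
        }
    }

    public static void main(String[] args) {
        TreeSet<String> a = new TreeSet<String>();
        a.add("/a.txt"); a.add("/b.txt");
        System.out.println("CME thrown: " + removeWhileIterating(a));

        TreeSet<String> b = new TreeSet<String>();
        b.add("/a.txt"); b.add("/b.txt");
        removeViaSnapshot(b);
        System.out.println("paths left: " + b.size());
    }
}
```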
Next, let's analyze the internal release function, fsnamesystem.internalReleaseLeaseOne(oldest, p):
  void internalReleaseLeaseOne(Lease lease, String src) throws IOException {
    assert Thread.holdsLock(this);
    LOG.info("Recovering lease=" + lease + ", src=" + src);

    INodeFile iFile = dir.getFileINode(src);
    // Case 1: the file no longer exists
    if (iFile == null) {
      final String message = "DIR* NameSystem.internalReleaseCreate: "
        + "attempt to release a create lock on "
        + src + " file does not exist.";
      NameNode.stateChangeLog.warn(message);
      throw new IOException(message);
    }
    // Case 2: the file has already been closed
    if (!iFile.isUnderConstruction()) {
      final String message = "DIR* NameSystem.internalReleaseCreate: "
        + "attempt to release a create lock on "
        + src + " but file is already closed.";
      NameNode.stateChangeLog.warn(message);
      throw new IOException(message);
    }

    INodeFileUnderConstruction pendingFile = (INodeFileUnderConstruction) iFile;

    // Try to recover the lease: if the file has zero blocks there is nothing
    // to recover, so the lease is simply removed; otherwise it is reassigned.
    if (pendingFile.getTargets() == null ||
        pendingFile.getTargets().length == 0) {
      if (pendingFile.getBlocks().length == 0) {
        // Reclaim the lease and log it; finalizeINodeFileUnderConstruction
        // is worth a look if you are interested.
        finalizeINodeFileUnderConstruction(src, pendingFile);
        NameNode.stateChangeLog.warn("BLOCK*"
          + " internalReleaseLease: No blocks found, lease removed for " + src);
        return;
      }
      // Set up the inode's targets for the last block from the blocksMap
      Block[] blocks = pendingFile.getBlocks();
      Block last = blocks[blocks.length - 1];
      DatanodeDescriptor[] targets =
        new DatanodeDescriptor[blocksMap.numNodes(last)];
      Iterator<DatanodeDescriptor> it = blocksMap.nodeIterator(last);
      for (int i = 0; it != null && it.hasNext(); i++) {
        targets[i] = it.next();
      }
      pendingFile.setTargets(targets);
    }
    // Begin the actual lease recovery
    pendingFile.assignPrimaryDatanode();
    Lease reassignedLease = reassignLease(
        lease, src, HdfsConstants.NN_RECOVERY_LEASEHOLDER, pendingFile);
    leaseManager.renewLease(reassignedLease);
  }
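The method's control flow can be condensed into a small decision function over a simplified file state (hypothetical types, not real HDFS code; the targets-array setup is elided). The third branch is exactly the "No blocks found, lease removed" case we saw in the log for /cat1.txt:

```java
public class ReleaseFlow {
    enum Outcome { FILE_MISSING, ALREADY_CLOSED, LEASE_REMOVED_NO_BLOCKS, RECOVERY_STARTED }

    // Condensed decision flow of internalReleaseLeaseOne (illustrative only).
    static Outcome releaseDecision(boolean exists, boolean underConstruction, int blockCount) {
        if (!exists) return Outcome.FILE_MISSING;              // throws IOException in the real code
        if (!underConstruction) return Outcome.ALREADY_CLOSED; // throws IOException in the real code
        if (blockCount == 0) return Outcome.LEASE_REMOVED_NO_BLOCKS; // lease finalized and removed
        return Outcome.RECOVERY_STARTED; // lease reassigned to NN_RECOVERY_LEASEHOLDER and renewed
    }

    public static void main(String[] args) {
        // The /cat1.txt scenario: file exists, still under construction, zero blocks.
        System.out.println(releaseDecision(true, true, 0));
    }
}
```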
Reassigning a lease is actually very simple: renewing it just updates the lease's timestamp:
private void renew() {
this.lastUpdate = FSNamesystem.now();
}
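Finally, a note on why checkLeases() only ever examines sortedLeases.first(): the leases are kept in a set ordered by lastUpdate, oldest first, so a renewed lease moves toward the back and the head of the set is always the lease that has gone longest without renewal. A small illustrative sketch (the comparator here is hypothetical, mirroring but not copying the HDFS one):

```java
import java.util.Comparator;
import java.util.TreeSet;

public class SortedLeaseDemo {
    static class Lease {
        final String holder;
        final long lastUpdate;
        Lease(String holder, long lastUpdate) {
            this.holder = holder;
            this.lastUpdate = lastUpdate;
        }
    }

    // Order leases oldest-first by lastUpdate, breaking ties by holder name.
    static TreeSet<Lease> byLastUpdate() {
        return new TreeSet<Lease>(new Comparator<Lease>() {
            public int compare(Lease a, Lease b) {
                if (a.lastUpdate != b.lastUpdate) {
                    return a.lastUpdate < b.lastUpdate ? -1 : 1;
                }
                return a.holder.compareTo(b.holder);
            }
        });
    }

    public static void main(String[] args) {
        TreeSet<Lease> sortedLeases = byLastUpdate();
        sortedLeases.add(new Lease("DFSClient_A", 5000L));
        sortedLeases.add(new Lease("DFSClient_B", 1000L));
        sortedLeases.add(new Lease("DFSClient_C", 9000L));
        // first() is the lease that has gone longest without renewal:
        // if it has not crossed the hard limit, no lease has.
        System.out.println(sortedLeases.first().holder); // prints DFSClient_B
    }
}
```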