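Preface
Recently the data warehouse accumulated a large backlog of data, so we set out to collect data access statistics, derive each dataset's lifecycle from them, and decide whether it can be deleted. Because different users reach the data through different entry points, it is not possible to collect statistics at every one of them. We therefore work from the lowest layer instead: parsing the HDFS NameNode audit log yields the access time, accessed directory, accessing user, and similar information, which can then be organized into a data access lifecycle.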
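As a rough illustration of the parsing step, the sketch below extracts the time, user, command, and path from one audit line. The sample line and the regular expression are assumptions based on the common `FSNamesystem.audit` layout (tab-separated `key=value` pairs); the exact layout can differ between Hadoop versions and log4j configurations, and `parse_audit_line` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Assumed layout of a NameNode audit line; adjust to the log format in use.
AUDIT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\s+.*?"
    r"allowed=(?P<allowed>\S+)\s+"
    r"ugi=(?P<user>\S+).*?"
    r"cmd=(?P<cmd>\S+)\s+"
    r"src=(?P<src>\S+)"
)


def parse_audit_line(line):
    """Return a dict with time, user, cmd and src, or None if the line does not match."""
    m = AUDIT_RE.search(line)
    if m is None:
        return None
    return {
        "time": datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S"),
        "allowed": m.group("allowed") == "true",
        "user": m.group("user"),
        "cmd": m.group("cmd"),
        # Paths containing whitespace would be cut short by \S+ in this sketch.
        "src": m.group("src"),
    }


# Made-up sample line, for illustration only.
sample = (
    "2019-01-01 12:00:00,123 INFO FSNamesystem.audit: allowed=true\t"
    "ugi=hive (auth:SIMPLE)\tip=/10.0.0.1\tcmd=open\t"
    "src=/warehouse/db1.db/t1/part-00000\tdst=null\tperm=null\tproto=rpc"
)
record = parse_audit_line(sample)
print(record["user"], record["cmd"], record["src"])
```

Returning `None` for lines that do not match lets a caller skip unrelated log lines without raising.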
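Audit log types
- Audit log operations fall roughly into two categories, read and write. The operation names in each category, found by reading the source code, are listed below.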
OperationCategory.READ
operationName = "listOpenFiles";
operationName = "open";
operationName = "getAdditionalBlock";
operationName = needBlockToken ? "open" : "getfileinfo";
operationName = "isFileClosed";
operationName = "contentSummary";
operationName = "quotaUsage";
operationName = "listStatus";
operationName = "listSnapshottableDirectory";
operationName = "computeSnapshotDiff";
operationName = "computeSnapshotDiff";
operationName = "queryRollingUpgrade";
operationName = "listCacheDirectives";
operationName = "listCacheDirectives";
operationName = "getAclStatus";
operationName = "getEZForPath";
operationName = "listEncryptionZones";
operationName = "listReencryptionStatus";
operationName = "getErasureCodingPolicy";
operationName = "getXAttrs";
operationName = "listXAttrs";
operationName = "checkAccess";
OperationCategory.WRITE
operationName = "setPermission";
operationName = "concat";
operationName = "setTimes";
operationName = "truncate";
operationName = "createSymlink";
operationName = "setReplication";
operationName = "setStoragePolicy";
operationName = "satisfyStoragePolicy";
operationName = "unsetStoragePolicy";
operationName = "rename";
operationName = "delete";
operationName = "mkdirs";
operationName = "setBalancerBandwidth";
operationName = "getDelegationToken";
operationName = "renewDelegationToken";
operationName = "cancelDelegationToken";
operationName = "disallowSnapshot";
operationName = "createSnapshot";
operationName = "renameSnapshot";
operationName = "deleteSnapshot";
operationName = "startRollingUpgrade";
operationName = "finalizeRollingUpgrade";
operationName = "addCacheDirective";
operationName = "modifyCacheDirective";
operationName = "removeCacheDirective";
operationName = "modifyCachePool";
operationName = "setAcl";
operationName = "setReplication";
operationName = "append";
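When processing the log, the two lists above can be turned into a simple lookup. The following is a minimal sketch that uses only a subset of the names; `READ_OPS`, `WRITE_OPS`, and `classify` are hypothetical names chosen for illustration.

```python
# Subset of the read operation names listed above; extend as needed.
READ_OPS = {
    "open", "getfileinfo", "listStatus", "contentSummary",
    "getAclStatus", "checkAccess", "isFileClosed",
}

# Subset of the write operation names listed above.
# "create" is not in that list, but it is also a write operation in HDFS.
WRITE_OPS = {
    "create", "mkdirs", "rename", "delete", "append", "truncate",
    "concat", "setPermission", "setAcl", "setTimes", "setReplication",
}


def classify(cmd):
    """Map an audit-log cmd value to READ, WRITE, or OTHER."""
    if cmd in READ_OPS:
        return "READ"
    if cmd in WRITE_OPS:
        return "WRITE"
    return "OTHER"


for cmd in ("open", "mkdirs", "someUnknownOp"):
    print(cmd, classify(cmd))
```

Unknown commands map to `OTHER` rather than raising, so new operation names in later Hadoop versions do not break the analysis.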
Day-to-day analysis
- Judging from the logs, the operation commands that appear most often are roughly the following
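//Get file information before accessing a data file
getfileinfo
//Creating a database table triggers a corresponding directory creation
mkdirs
//Create a data file
create
//Set access control (ACL) permissions
setAcl -- checkOperation(OperationCategory.WRITE);
//Get access control (ACL) information
getAclStatus -- checkOperation(OperationCategory.READ);
//Rename a data file
rename -- Change the indicated filename.
//Access a data file
open -- Get block locations within the specified range.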
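- When a database table is created or data is inserted, operations such as create and mkdirs appear
- When a table is accessed or a partition is queried, operations such as getfileinfo (logged for getFileStatus calls), listStatus, and open appear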
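To connect these observations back to the lifecycle goal, one possible approach is to keep the most recent access time per table directory. The sketch below assumes records have already been parsed into `(time, user, cmd, src)` tuples and that a table sits three path components deep (for example `/warehouse/db1.db/t1`). Both the layout and the helper names `table_dir` and `last_access` are assumptions, and the sample records are made up.

```python
from datetime import datetime

# Commands treated as "access" here, following the observations above.
ACCESS_CMDS = {"open", "getfileinfo", "listStatus"}


def table_dir(src, depth=3):
    """Truncate a path to its first `depth` components."""
    parts = [p for p in src.split("/") if p]
    return "/" + "/".join(parts[:depth])


def last_access(records, depth=3):
    """Return {table_dir: (latest_time, user)} over the access commands only."""
    latest = {}
    for time, user, cmd, src in records:
        if cmd not in ACCESS_CMDS:
            continue
        key = table_dir(src, depth)
        if key not in latest or time > latest[key][0]:
            latest[key] = (time, user)
    return latest


# Made-up records, for illustration only.
records = [
    (datetime(2019, 1, 1, 12, 0, 0), "hive", "open",
     "/warehouse/db1.db/t1/dt=20190101/part-00000"),
    (datetime(2019, 1, 5, 8, 30, 0), "etl", "listStatus",
     "/warehouse/db1.db/t1/dt=20190105"),
    (datetime(2019, 1, 3, 9, 0, 0), "hive", "mkdirs",
     "/warehouse/db1.db/t2"),
]
result = last_access(records)
for path, (time, user) in result.items():
    print(path, time.isoformat(), user)
```

Directories whose latest access is older than a chosen retention threshold would then be candidates for review before deletion.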
#References
- https://cloud.tencent.com/developer/article/1357831