Parsing HDFS metadata to collect Hive partition file statistics (extracting table partitions from a parsed HDFS fsimage) - 优快云 Blog

Original link: https://blog.youkuaiyun.com/sunxunyong/article/details/140148465

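Using the Hive table built from the parsed fsimage, query every path under the Hive warehouse directory (the commands below assume the result was saved to out.log):
select path from hdfsinfo.fsimage_info_csv where path like '/user/hive/warehouse/%';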

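1. # Remove whitespace and | characters from the file
sed -i 's/[[:space:]|]//g' out.log
# Replace every / with a space so the path components become space-separated fields
sed -i 's/\// /g' out.log
# Drop the leading space left by the path's initial /
sed -i 's/^ //g' out.log
Delete lines containing specific strings (the table border and the "path" header row):
sed -i '/+-------------------/d' out.log
sed -i '/^path/d' out.log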
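To make step 1 concrete, here is a minimal sketch run against a small, hypothetical out.log. It assumes the query result was saved in beeline's default table layout (the `+---+` borders and the `| path |` header are what the delete commands target), assumes GNU sed for `-i`, and uses made-up database, table, and partition names. It works in a scratch directory so no real out.log is touched.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"   # scratch directory for the demo

# Hypothetical query result in beeline's table layout (names are made up).
cat > out.log <<'EOF'
+----------------------------------------------------------+
|                           path                           |
+----------------------------------------------------------+
| /user/hive/warehouse/sales.db/orders/dt=20240701         |
| /user/hive/warehouse/sales.db/orders/dt=20240701/000000_0 |
+----------------------------------------------------------+
EOF

# Remove every whitespace character and every '|'.
sed -i 's/[[:space:]|]//g' out.log
# Turn each '/' into a space so path components become fields.
sed -i 's/\// /g' out.log
# Drop the leading space produced by the path's first '/'.
sed -i 's/^ //g' out.log
# Delete the table borders and the header row.
sed -i '/+-------------------/d' out.log
sed -i '/^path/d' out.log

cat out.log
```

Each remaining line is a path whose components are separated by single spaces, for example `user hive warehouse sales.db orders dt=20240701`.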

2. # Check the last column against the regex "=": if it does not contain "=", set the last column to an empty string and print the whole line; if it does contain "=", print the whole line unchanged.
nohup cat out.log | awk -F ' ' '{if($NF !~ /=/) {$NF="";print $0;} else {print $0;}}' | cut -c 21- >> Formatout.log 2>&1 &
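A minimal sketch of step 2 on a hypothetical out.log shaped like the output of step 1 (the table and partition names are made up). It runs in the foreground without `nohup ... &` so the result can be inspected immediately, reads out.log directly instead of through `cat` (same result), and writes with `>` rather than `>>` so rerunning it does not append duplicates.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"   # scratch directory for the demo

# Hypothetical out.log as left by step 1: one path per line, components space-separated.
cat > out.log <<'EOF'
user hive warehouse sales.db orders
user hive warehouse sales.db orders dt=20240701
user hive warehouse sales.db orders dt=20240701 000000_0
user hive warehouse sales.db orders dt=20240701 000001_0
user hive warehouse sales.db customers 000000_0
EOF

# If the last field has no '=', blank it: awk rebuilds $0 joined by OFS (" "),
# so the line keeps a trailing space. Lines ending in a partition (dt=...) pass unchanged.
# cut -c 21- then drops the 20-character prefix "user hive warehouse ".
awk -F ' ' '{ if ($NF !~ /=/) { $NF = ""; print $0 } else { print $0 } }' out.log \
  | cut -c 21- > Formatout.log

cat Formatout.log
```

The trailing space matters: a file inside a partition becomes `sales.db orders dt=20240701 ` while the partition directory's own entry stays `sales.db orders dt=20240701`, so step 3 counts them as different lines.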
3. # Count duplicate lines
nohup sort Formatout.log | uniq -c | sed 's/^[ \t]*//g' | sort -rn >> SortFormatout.log 2>&1 &
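A minimal sketch of step 3 on a hypothetical Formatout.log (note the trailing spaces left by step 2). It runs in the foreground and overwrites its output file. One detail about the original command: `nohup` only applies to the first command of a pipeline (here `sort`); wrapping the whole pipeline, e.g. `nohup sh -c '...' > file 2>&1 &`, is one way to detach all of it.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"   # scratch directory for the demo

# Hypothetical Formatout.log as produced by step 2 (trailing spaces are intentional).
printf '%s\n' \
  'sales.db orders dt=20240701 ' \
  'sales.db orders dt=20240701 ' \
  'sales.db orders dt=20240702 ' \
  'sales.db customers ' \
  'sales.db orders dt=20240701 ' > Formatout.log

# sort groups identical lines, uniq -c prefixes each group with its count,
# sed strips the padding uniq -c puts before the count, sort -rn orders by count.
sort Formatout.log | uniq -c | sed 's/^[ \t]*//g' | sort -rn > SortFormatout.log

cat SortFormatout.log
```

The first line is `3 sales.db orders dt=20240701 `, i.e. three files were found inside that partition.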

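Command breakdown:
uniq -c counts how many times each line repeats; it only merges adjacent duplicates, which is why the file is sorted first.
sort -rn reads the number at the start of each line and sorts in descending numeric order.
cut -c 21- keeps each line from the 21st character onward, which drops the 20-character prefix "user hive warehouse " left after the slashes were turned into spaces.
sed 's/^[ \t]*//g' removes the leading whitespace that uniq -c places before each count.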
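The points in the breakdown can be checked with a few one-liners. This is a small illustrative sketch with made-up input strings: it confirms that the prefix removed by `cut -c 21-` is exactly 20 characters, and shows why `uniq -c` needs sorted input.

```shell
#!/usr/bin/env bash
set -euo pipefail

# cut -c 21-: the prefix left after step 1 is exactly 20 characters long.
prefix='user hive warehouse '
echo "prefix length: ${#prefix}"
trimmed=$(printf '%s\n' 'user hive warehouse sales.db orders dt=20240701' | cut -c 21-)
echo "$trimmed"

# uniq -c only merges adjacent duplicates, hence the sort in front of it.
unsorted=$(printf '%s\n' a b a a | uniq -c | sed 's/^[ \t]*//g')
sorted=$(printf '%s\n' a b a a | sort | uniq -c | sed 's/^[ \t]*//g' | sort -rn)
echo "$unsorted"
echo "$sorted"
```

Without the first `sort`, the two runs of `a` are counted separately (`1 a`, `1 b`, `2 a`); with it, the result is `3 a` followed by `1 b`.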