Posted 2016-08-28 14:07:25 by 小小程序员1986
Copyright notice: this is an original article by the blogger and may not be reposted without the blogger's permission. https://blog.youkuaiyun.com/jethai/article/details/52345352
#!/bin/bash
# Check HBase region sizes for a table and optionally split oversized regions.

die () {
    echo >&2 "$@"
    echo >&2 "usage:"
    echo >&2 "  $0 check|split table_name [split_size]"
    exit 1
}

[[ "$#" -lt 2 ]] && die "at least 2 arguments required, $# provided"

COMMAND=$1
TABLE=$2
SIZE="${3:-1073741824}"   # size threshold in bytes, default 1 GiB

# Look up the full region name in hbase:meta by its encoded name, then split it.
split() {
    local region_key
    region_key=$(python /home/hduser/hbase/hbase-scan.py -t hbase:meta -f "RowFilter (=, 'substring:$1')")
    echo "split '$region_key'" | hbase shell
}

# Any command other than "check" splits oversized regions before checking.
if [ "$COMMAND" != "check" ] ; then
    for region in $(hadoop fs -ls "/hbase/data/default/$TABLE" | awk '{print $8}')
    do
        # skip entries whose name starts with a dot (not region directories)
        [[ ${region##*/} =~ ^\. ]] && continue
        [[ $(hadoop fs -du -s "$region" | awk '{print $1}') -gt $SIZE ]] && split "${region##*/}"
    done

    # check after split
    sleep 60
fi

for region in $(hadoop fs -ls "/hbase/data/default/$TABLE" | awk '{print $8}')
do
    [[ ${region##*/} =~ ^\. ]] && continue
    size=$(hadoop fs -du -s "$region" | awk '{print $1}')
    human=$(hadoop fs -du -s -h "$region" | awk '{print $1 $2}')
    if [[ $size -gt $SIZE ]]; then
        echo "${region##*/} ($human) is a huge region"
    else
        echo "${region##*/} ($human) is a small region"
    fi
done
hbase-scan.py
# Scan an HBase table through happybase and print the matching row keys.
# Only the row keys go to stdout; log messages go to stderr.
import argparse
import logging

import happybase


def connect_to_hbase():
    return happybase.Connection('itr-hbasetest01')


def main():
    logging.basicConfig(format='%(asctime)s %(name)s %(levelname)s: %(message)s', level=logging.INFO)

    argp = argparse.ArgumentParser(description='EventLog Reader')
    argp.add_argument('-t', '--table', dest='table', default='eventlog')
    argp.add_argument('-p', '--prefix', dest='prefix')
    argp.add_argument('-f', '--filter', dest='filter')
    argp.add_argument('-l', '--limit', dest='limit', default=10)
    args = argp.parse_args()

    hbase_conn = connect_to_hbase()
    table = hbase_conn.table(args.table)

    logging.info("scan start")
    scanner = table.scan(row_prefix=args.prefix, batch_size=1000, limit=int(args.limit), filter=args.filter)
    logging.info("scan done")

    i = 0
    for key, data in scanner:
        logging.info(key)
        # The original targets Python 2 (it used "print key"). On Python 3
        # happybase yields bytes row keys, so decode them before printing.
        print(key)
        i += 1
    logging.info('%s rows read in total', i)


if __name__ == '__main__':
    main()
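The shell script passes this scanner a filter string of the form `RowFilter (=, 'substring:<encoded region name>')` through `-f`. As a minimal sketch, that string could be assembled by a small helper. The name `build_region_filter` is hypothetical (it is not part of happybase or of the original scripts), and the sketch assumes the encoded region name never contains a single quote.

```python
def build_region_filter(encoded_name):
    """Return the filter string that the shell script passes via -f.

    Hypothetical helper: it only mirrors the literal
    "RowFilter (=, 'substring:$1')" used in the shell script.
    """
    if "'" in encoded_name:
        # A quote would break out of the quoted comparator value.
        raise ValueError("encoded region name must not contain a single quote")
    return "RowFilter (=, 'substring:{}')".format(encoded_name)


if __name__ == '__main__':
    # The region name below is made up for illustration.
    print(build_region_filter('0123456789abcdef'))
```

Rejecting quotes up front is a deliberate choice here: the value ends up inside a quoted filter expression and later inside a quoted `split '...'` shell command, so failing early is safer than trying to escape it.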
This article comes from the "点滴积累" blog; please keep this source link: http://tianxingzhe.blog.51cto.com/3390077/1717714
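This article presents a shell script for HBase that checks for oversized regions and splits them automatically to improve HBase performance. Given a size threshold, the script walks through every region larger than that threshold and splits it, and its check mode reports whether each region is within the threshold after the split.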
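The threshold check comes down to comparing each region directory's size in bytes with a limit and skipping entries whose name starts with a dot. Here is a minimal sketch of that logic in Python. It assumes each input line carries the size in the first column and the path in the last column, which is what the script's `awk '{print $1}'` relies on for `hadoop fs -du -s`. The names `encoded_name` and `find_huge_regions` are hypothetical, and the sample paths are made up.

```python
def encoded_name(path):
    """Return the last path component, like ${region##*/} in the shell script."""
    return path.rstrip('/').rsplit('/', 1)[-1]


def find_huge_regions(du_lines, threshold=1073741824):
    """Return (encoded_name, size) pairs for regions larger than threshold.

    du_lines: iterable of "<size> ... <path>" lines (assumed layout).
    threshold: size limit in bytes; the default of 1 GiB matches the script.
    """
    huge = []
    for line in du_lines:
        fields = line.split()
        if len(fields) < 2 or not fields[0].isdigit():
            continue  # not a "<size> <path>" line
        size, name = int(fields[0]), encoded_name(fields[-1])
        if name.startswith('.'):
            continue  # dot-prefixed entries are skipped, as in the script
        if size > threshold:  # strictly greater, like -gt in the script
            huge.append((name, size))
    return huge


if __name__ == '__main__':
    sample = [
        "2147483648 /hbase/data/default/t1/aaa",
        "1024 /hbase/data/default/t1/bbb",
        "4294967296 /hbase/data/default/t1/.tmp",
    ]
    print(find_huge_regions(sample))
```

Keeping the parsing separate from the `hadoop` calls means the threshold logic can be tried on captured output without a cluster.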