Counting table rows with org.apache.hadoop.hbase.coprocessor.AggregateImplementation

Latest recommended article published on 2023-07-26 09:32:06

License: CC 4.0 BY-SA

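This article shows how to count the records in a table with an HBase coprocessor and compares it with the hbase shell approach, where the coprocessor is markedly faster. It walks through the concrete steps for enabling the coprocessor and running the aggregation.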

HBase ships with an aggregation coprocessor class: org.apache.hadoop.hbase.coprocessor.AggregateImplementation. With it you can count the total number of rows in a table.

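You can of course also run count <table_name> in the hbase shell. I compared the running time of the two on a table with a little over 7 million rows: count in the hbase shell took a full 12 minutes, while the coprocessor took only 78 seconds, roughly 9 times faster. That shows how powerful coprocessors are.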
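The difference comes from where the counting happens. The shell's count walks the rows through a single client-side scan, whereas the aggregation coprocessor lets each region count its own rows on the region server and send back only a number, which the client then adds up. Below is a minimal plain-Java sketch of that idea with no HBase dependency; the class RegionCountSketch and the in-memory "regions" are illustrative assumptions, not HBase APIs.

```java
import java.util.Arrays;
import java.util.List;

public class RegionCountSketch {

    /**
     * Each inner list stands in for one region's rows (an assumption for
     * illustration). Every "region" is counted independently and possibly
     * in parallel; only the partial counts are combined at the end.
     */
    public static long countRows(List<List<String>> regions) {
        return regions.parallelStream()
                .mapToLong(region -> region.size()) // partial count per region
                .sum();                             // caller only sums small numbers
    }

    public static void main(String[] args) {
        List<List<String>> regions = Arrays.asList(
                Arrays.asList("row1", "row2"),
                Arrays.asList("row3"),
                Arrays.asList("row4", "row5", "row6"));
        System.out.println("rowcount:" + countRows(regions));
    }
}
```

In the real coprocessor the row data never leaves the region server; only the per-region totals travel over the network, which is consistent with the speedup measured above.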

Adding the coprocessor through the HBase API:

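    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.coprocessor.AggregateImplementation;

    Configuration hbaseconfig = HBaseConfiguration.create();
    HBaseAdmin hbaseAdmin = new HBaseAdmin(hbaseconfig);

    // Take the table offline while its descriptor is modified.
    hbaseAdmin.disableTable(TABLE_NAME);

    // Register the aggregation coprocessor on the table, then bring it back online.
    HTableDescriptor htd = hbaseAdmin.getTableDescriptor(TABLE_NAME);
    htd.addCoprocessor(AggregateImplementation.class.getName());
    hbaseAdmin.modifyTable(TABLE_NAME, htd);
    hbaseAdmin.enableTable(TABLE_NAME);
    hbaseAdmin.close();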

Using the aggregation coprocessor provided by HBase:

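    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.coprocessor.AggregationClient;
    import org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter;
    import org.apache.hadoop.hbase.util.Bytes;

    AggregationClient aggregationClient = new AggregationClient(hbaseconfig);
    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("fr"));  // count over column family "fr"

    long start = System.currentTimeMillis();
    // rowCount is declared to throw Throwable, so the caller must handle or declare it.
    long rowcount = aggregationClient.rowCount(TABLE_NAME,
            new LongColumnInterpreter(), scan);
    long end = System.currentTimeMillis();
    System.out.println("rowcount:" + rowcount);
    System.out.println("timecost:" + (end - start));  // elapsed milliseconds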

Adding a coprocessor from the hbase shell:

disable 'member'
alter 'member',METHOD => 'table_att','coprocessor' => 'hdfs://master24:9000/user/hadoop/jars/test.jar|mycoprocessor.SampleCoprocessor|1001|'
enable 'member'
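The value passed to 'coprocessor' in the alter command is a single string of four fields separated by |: the jar path, the coprocessor class name, the priority, and optional arguments (empty in the example above). The following sketch splits such a string into its parts to make the layout explicit. It is plain Java written for illustration; CoprocessorSpecSketch is a hypothetical helper, not an HBase class, and HBase does its own parsing internally.

```java
public class CoprocessorSpecSketch {
    public final String jarPath;    // may be empty when the class is already on the classpath
    public final String className;  // required
    public final Integer priority;  // null when the field is left empty
    public final String args;       // may be empty

    private CoprocessorSpecSketch(String jarPath, String className,
                                  Integer priority, String args) {
        this.jarPath = jarPath;
        this.className = className;
        this.priority = priority;
        this.args = args;
    }

    /** Splits "path|class|priority|args" into its four fields. */
    public static CoprocessorSpecSketch parse(String spec) {
        // Limit -1 keeps trailing empty fields such as the empty args after the last '|'.
        String[] parts = spec.split("\\|", -1);
        if (parts.length < 2 || parts[1].trim().isEmpty()) {
            throw new IllegalArgumentException("missing class name in: " + spec);
        }
        String prio = parts.length > 2 ? parts[2].trim() : "";
        Integer priority = prio.isEmpty() ? null : Integer.valueOf(prio);
        String args = parts.length > 3 ? parts[3].trim() : "";
        return new CoprocessorSpecSketch(parts[0].trim(), parts[1].trim(), priority, args);
    }

    public static void main(String[] argv) {
        CoprocessorSpecSketch s = parse(
                "hdfs://master24:9000/user/hadoop/jars/test.jar"
                        + "|mycoprocessor.SampleCoprocessor|1001|");
        System.out.println(s.className + " at priority " + s.priority);
    }
}
```

Note that the shell example loads a custom class, mycoprocessor.SampleCoprocessor, from a jar on HDFS; for the built-in AggregateImplementation the class name would be the one given at the top of this article.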

Removing a coprocessor from the hbase shell:

disable 'member'
alter 'member',METHOD => 'table_att_unset',NAME =>'coprocessor$1'
enable 'member'