HBase Advanced Operations: Filters

zhangshk_

Category: HBase storage system

Article link: https://blog.youkuaiyun.com/zhangshk_/article/details/83691761

What can filters do?

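HBase provides a set of filters for selecting data. With them you can filter data along several dimensions (row, column, version, and so on).
In practice, filtering by row key and by column are the most common use cases.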

HBase filter categories

1. Filters based on rows, columns, and cell values

1.1----- Row-based filters

PrefixFilter: matches rows by row key prefix
PageFilter: row-based pagination
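To make these two row-based filters concrete, here is a small plain-Java model of what prefix matching and paging mean on lexicographically sorted row keys. It is not HBase client code: the class `RowFilterModel`, its methods, and the extra key `user1` are hypothetical and only illustrate the semantics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Plain-Java model of row-key prefix matching and paging; illustrative only, not the HBase API.
class RowFilterModel {

    // Keeps the row keys that start with the prefix, in sorted order.
    static List<String> byPrefix(TreeSet<String> rowKeys, String prefix) {
        List<String> matched = new ArrayList<>();
        for (String key : rowKeys) {
            if (key.startsWith(prefix)) {
                matched.add(key);
            }
        }
        return matched;
    }

    // Returns at most pageSize row keys starting at startKey (inclusive), like one page of a scan.
    static List<String> page(TreeSet<String> rowKeys, String startKey, int pageSize) {
        List<String> page = new ArrayList<>();
        for (String key : rowKeys.tailSet(startKey, true)) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(key);
        }
        return page;
    }

    // Sample row keys: the three from the 'man' table plus one hypothetical non-matching key.
    static TreeSet<String> sampleKeys() {
        TreeSet<String> keys = new TreeSet<>();
        keys.add("rowkey1");
        keys.add("rowkey2");
        keys.add("rowkey3");
        keys.add("user1");
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(byPrefix(sampleKeys(), "row"));   // [rowkey1, rowkey2, rowkey3]
        System.out.println(page(sampleKeys(), "rowkey2", 2)); // [rowkey2, rowkey3]
    }
}
```

One caveat the model hides: the real PageFilter is applied separately on each region server, so a scan that spans several regions can return more rows than the page size and the client may still need to cap the result.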

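1.2----- Column-based filters

ColumnPrefixFilter: matches columns by column name prefix
FirstKeyOnlyFilter: returns only the first column of each row

1.3----- Cell-value-based filters

KeyOnlyFilter: returns data without the cell values, only the row key and column
TimestampsFilter: filters by the timestamp (version) of the data

1.4----- Filters based on a column and its cell value

SingleColumnValueFilter: keeps or drops a row by comparing the cell value of one given column
SingleColumnValueExcludeFilter: the same comparison as SingleColumnValueFilter, but the tested column is left out of the returned result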
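The difference between these two filters is easier to see in a small model. The sketch below is plain Java, not HBase code: `SingleColumnValueModel` and its methods are hypothetical names, and rows are represented as nested maps. It also assumes that a row missing the tested column does not match, which in HBase corresponds to calling `setFilterIfMissing(true)` (by default such rows are kept).

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of "filter a row by one column's value"; illustrative only, not the HBase API.
class SingleColumnValueModel {

    // Keeps rows whose `column` equals `expected`.
    // When excludeTestedColumn is true, the tested column is removed from each returned row.
    static Map<String, Map<String, String>> filter(Map<String, Map<String, String>> rows,
                                                   String column, String expected,
                                                   boolean excludeTestedColumn) {
        Map<String, Map<String, String>> kept = new TreeMap<>();
        for (Map.Entry<String, Map<String, String>> row : rows.entrySet()) {
            if (expected.equals(row.getValue().get(column))) {
                Map<String, String> cells = new TreeMap<>(row.getValue());
                if (excludeTestedColumn) {
                    cells.remove(column);
                }
                kept.put(row.getKey(), cells);
            }
        }
        return kept;
    }

    static Map<String, String> cells(String name, String sex) {
        Map<String, String> c = new TreeMap<>();
        c.put("basic:name", name);
        c.put("basic:sex", sex);
        return c;
    }

    // A reduced copy of the 'man' table: only basic:name and basic:sex.
    static Map<String, Map<String, String>> sampleRows() {
        Map<String, Map<String, String>> rows = new TreeMap<>();
        rows.put("rowkey1", cells("zs", "male"));
        rows.put("rowkey2", cells("jack", "male"));
        rows.put("rowkey3", cells("rose", "female"));
        return rows;
    }

    public static void main(String[] args) {
        // {rowkey3={basic:name=rose, basic:sex=female}}
        System.out.println(filter(sampleRows(), "basic:sex", "female", false));
        // {rowkey3={basic:name=rose}}
        System.out.println(filter(sampleRows(), "basic:sex", "female", true));
    }
}
```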

2. Comparison filters

2.1----- A comparison filter usually takes a comparison operator and a comparator

RowFilter
FamilyFilter
QualifierFilter
ValueFilter
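The pairing of an operator with a comparator can be sketched in plain Java. This is a model, not HBase code: `CompareModel` and its `Op` enum are hypothetical, and the comparator is assumed to compare raw bytes lexicographically, which is how BinaryComparator is generally understood to behave. The example also shows a common pitfall: values stored as strings compare as text, not as numbers.

```java
import java.nio.charset.StandardCharsets;

// Plain-Java model of "comparison operator + byte comparator"; illustrative only, not the HBase API.
class CompareModel {

    enum Op { LESS, LESS_OR_EQUAL, EQUAL, NOT_EQUAL, GREATER_OR_EQUAL, GREATER }

    // Lexicographic comparison of two byte arrays, treating each byte as unsigned.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // True when "cellValue OP reference" holds under byte-wise comparison.
    static boolean matches(Op op, String cellValue, String reference) {
        int c = compareBytes(cellValue.getBytes(StandardCharsets.UTF_8),
                             reference.getBytes(StandardCharsets.UTF_8));
        switch (op) {
            case LESS:             return c < 0;
            case LESS_OR_EQUAL:    return c <= 0;
            case EQUAL:            return c == 0;
            case NOT_EQUAL:        return c != 0;
            case GREATER_OR_EQUAL: return c >= 0;
            case GREATER:          return c > 0;
            default: throw new IllegalArgumentException("unknown operator: " + op);
        }
    }

    public static void main(String[] args) {
        System.out.println(matches(Op.EQUAL, "zs", "zs"));        // true
        System.out.println(matches(Op.GREATER, "10000", "2000")); // false: '1' sorts before '2'
    }
}
```

Under this assumption, a "salary greater than 2000" filter on string-encoded salaries would not select 10000; fixed-width or numeric encodings avoid the problem.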

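Most commonly used filters

Filter	Function
RowFilter	Selects all rows whose row key matches the comparison
PrefixFilter	Selects rows whose row key has a given prefix
KeyOnlyFilter	Returns only the key part of each cell (row key and column); all values are empty
ColumnPrefixFilter	Selects cells whose column name has a given prefix
ValueFilter	Selects cells by their value
TimestampsFilter	Selects cells by timestamp (version)
FilterList	Combines multiple filters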

Let's test these common filters one by one.
First, here is the table we will work with:

hbase(main):003:0> scan 'man'
ROW                                    COLUMN+CELL                                                                                                   
 rowkey1                               column=basic:age, timestamp=1541251830545, value=20                                                           
 rowkey1                               column=basic:name, timestamp=1541251830506, value=zs                                                          
 rowkey1                               column=basic:sex, timestamp=1541251830540, value=male                                                         
 rowkey1                               column=extend:job, timestamp=1541251830548, value=student                                                     
 rowkey1                               column=extend:salary, timestamp=1541251830553, value=0                                                        
 rowkey2                               column=basic:age, timestamp=1541251830565, value=24                                                           
 rowkey2                               column=basic:name, timestamp=1541251830557, value=jack                                                        
 rowkey2                               column=basic:sex, timestamp=1541251830561, value=male                                                         
 rowkey2                               column=extend:job, timestamp=1541251830569, value=IT                                                          
 rowkey2                               column=extend:salary, timestamp=1541251830572, value=10000                                                    
 rowkey3                               column=basic:age, timestamp=1541251830585, value=19                                                           
 rowkey3                               column=basic:name, timestamp=1541251830577, value=rose                                                        
 rowkey3                               column=basic:sex, timestamp=1541251830580, value=female                                                       
 rowkey3                               column=extend:job, timestamp=1541251830588, value=teacher                                                     
 rowkey3                               column=extend:salary, timestamp=1541251830592, value=2000                                                     
3 row(s) in 0.2140 seconds

The code below builds on the earlier post on basic HBase Java API operations (https://blog.youkuaiyun.com/zhangshk_/article/details/83690790). Here is the one helper method from it that we need:

    /**
     * Opens a scanner over the row range [startKey, stopKey) with the given filters applied.
     *
     * @param tableName  name of the table to scan
     * @param startKey   first row key of the scan (inclusive)
     * @param stopKey    row key at which the scan stops (exclusive)
     * @param filterList filters to apply to the scan
     * @return a ResultScanner for the caller to iterate and close, or null if the scan could not be opened
     */
    public static ResultScanner getScanner(String tableName, String startKey, String stopKey, FilterList filterList) {
        try (Table table = HBaseConn.getTable(tableName)) {
            Scan scan = new Scan();
            scan.setFilter(filterList);
            scan.setStartRow(Bytes.toBytes(startKey));
            scan.setStopRow(Bytes.toBytes(stopKey));
            scan.setCaching(1000);
            // Do not iterate the scanner here: a ResultScanner can only be consumed once,
            // so printing the rows in this method would hand the caller an exhausted scanner.
            return table.getScanner(scan);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }
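One detail worth knowing when calling getScanner: by default the start row of a Scan is inclusive and the stop row is exclusive, so the range "rowkey1" to "rowkey3" used in the tests below does not return rowkey3. The following plain-Java sketch models that half-open range with a sorted map; `ScanRangeModel` is a hypothetical name and this is not HBase code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Plain-Java model of a scan over the half-open row range [startKey, stopKey); not the HBase API.
class ScanRangeModel {

    // Returns the row keys in [startKey, stopKey): start inclusive, stop exclusive.
    static List<String> scan(TreeMap<String, String> table, String startKey, String stopKey) {
        return new ArrayList<>(table.subMap(startKey, true, stopKey, false).keySet());
    }

    // Row keys of the 'man' table, each mapped to its basic:name value.
    static TreeMap<String, String> sampleTable() {
        TreeMap<String, String> table = new TreeMap<>();
        table.put("rowkey1", "zs");
        table.put("rowkey2", "jack");
        table.put("rowkey3", "rose");
        return table;
    }

    public static void main(String[] args) {
        System.out.println(scan(sampleTable(), "rowkey1", "rowkey3")); // [rowkey1, rowkey2]
        System.out.println(scan(sampleTable(), "rowkey1", "rowkey4")); // [rowkey1, rowkey2, rowkey3]
    }
}
```

To include rowkey3 in a real scan, pass a stop key that sorts after it, such as "rowkey4".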

Here are the test methods for these filters:

package com.zsk.hbase.api;

import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.filter.*;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.Test;

import java.util.Arrays;

public class HBaseFilterTest {

    // Scans table 'man' from "rowkey1" up to (but not including) "rowkey3" with a single
    // filter and prints every returned row. Columns the filter dropped print as null.
    private static void scanAndPrint(Filter filter) {
        FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL, Arrays.asList(filter));
        try (ResultScanner results = HBaseUtil.getScanner("man", "rowkey1", "rowkey3", filterList)) {
            results.forEach(result -> {
                System.out.println("rowkey == " + Bytes.toString(result.getRow()));
                System.out.println("basic:name == " + Bytes.toString(result.getValue(Bytes.toBytes("basic"), Bytes.toBytes("name"))));
                System.out.println("basic:age == " + Bytes.toString(result.getValue(Bytes.toBytes("basic"), Bytes.toBytes("age"))));
                System.out.println("basic:sex == " + Bytes.toString(result.getValue(Bytes.toBytes("basic"), Bytes.toBytes("sex"))));
                System.out.println("extend:salary == " + Bytes.toString(result.getValue(Bytes.toBytes("extend"), Bytes.toBytes("salary"))));
                System.out.println("extend:job == " + Bytes.toString(result.getValue(Bytes.toBytes("extend"), Bytes.toBytes("job"))));
            });
        }
    }

    // RowFilter: only the row whose row key equals "rowkey1".
    @Test
    public void testRowFilter() {
        scanAndPrint(new RowFilter(CompareFilter.CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("rowkey1"))));
    }

    // PrefixFilter: rows whose row key starts with "row".
    @Test
    public void testPrefixFilter() {
        scanAndPrint(new PrefixFilter(Bytes.toBytes("row")));
    }

    // KeyOnlyFilter: cells come back with their keys but with empty values.
    @Test
    public void testKeyOnlyFilter() {
        scanAndPrint(new KeyOnlyFilter());
    }

    // ColumnPrefixFilter: only columns whose name starts with "nam".
    @Test
    public void testColumnPrefixFilter() {
        scanAndPrint(new ColumnPrefixFilter(Bytes.toBytes("nam")));
    }

    // ValueFilter: only cells whose value equals "zs".
    @Test
    public void testValueFilter() {
        scanAndPrint(new ValueFilter(CompareFilter.CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("zs"))));
    }

    // TimestampsFilter: only cells written with one of the listed timestamps.
    @Test
    public void testTimestampsFilter() {
        scanAndPrint(new TimestampsFilter(Arrays.asList(1541251830545L, 1541251830565L)));
    }
}

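Pattern summary:
First create a Filter, then add it to a FilterList. When building the FilterList you can specify its Operator: either every filter must pass (Operator.MUST_PASS_ALL) or passing any one of them is enough (Operator.MUST_PASS_ONE).
Then pass the FilterList to the getScanner method.
It really is quite simple.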
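The two operators behave like logical AND and OR over the individual filters. The sketch below models this with plain Java predicates on row keys; it is not HBase code, and `FilterListModel`, `combine`, and the two sample predicates are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Plain-Java model of FilterList operators as AND / OR over predicates; not the HBase API.
class FilterListModel {

    enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

    // Combines the filters: MUST_PASS_ALL requires every filter to accept the item,
    // MUST_PASS_ONE requires at least one filter to accept it.
    static <T> Predicate<T> combine(Operator op, List<Predicate<T>> filters) {
        return item -> op == Operator.MUST_PASS_ALL
                ? filters.stream().allMatch(f -> f.test(item))
                : filters.stream().anyMatch(f -> f.test(item));
    }

    // Applies two sample row-key filters ("starts with row", "ends with 1") under the given operator.
    static boolean demo(Operator op, String rowKey) {
        Predicate<String> hasPrefix = key -> key.startsWith("row");
        Predicate<String> endsWithOne = key -> key.endsWith("1");
        return combine(op, Arrays.asList(hasPrefix, endsWithOne)).test(rowKey);
    }

    public static void main(String[] args) {
        System.out.println(demo(Operator.MUST_PASS_ALL, "rowkey1")); // true: both filters pass
        System.out.println(demo(Operator.MUST_PASS_ALL, "rowkey2")); // false: second filter fails
        System.out.println(demo(Operator.MUST_PASS_ONE, "rowkey2")); // true: first filter passes
    }
}
```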

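All filters take effect on the server side. If we write a custom Filter, we have to package it as a jar and deploy it to the servers. In production, custom filters are generally not used.
In most cases a well-designed rowkey is enough to serve the queries of different scenarios, so there is no need for a custom Filter.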
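As one illustration of what rowkey design can do, here is a plain-Java sketch of a composite rowkey that makes "latest N records of one user" a simple prefix scan. Everything in it is an assumption for the sake of the example: the key layout (user id, underscore, reversed timestamp), the class `RowKeyDesignModel`, and the sample data are hypothetical and do not come from the 'man' table above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Plain-Java model of a composite rowkey design; illustrative only, not the HBase API.
class RowKeyDesignModel {

    // userId + "_" + (Long.MAX_VALUE - timestamp), zero-padded to 19 digits so that
    // string order matches numeric order and newer records sort first.
    static String rowKey(String userId, long timestampMillis) {
        return userId + "_" + String.format("%019d", Long.MAX_VALUE - timestampMillis);
    }

    // Walks the sorted keys from the user's prefix and collects up to `limit` keys,
    // which are that user's most recent records.
    static List<String> latest(TreeSet<String> keys, String userId, int limit) {
        String prefix = userId + "_";
        List<String> out = new ArrayList<>();
        for (String key : keys.tailSet(prefix, true)) {
            if (!key.startsWith(prefix) || out.size() == limit) {
                break;
            }
            out.add(key);
        }
        return out;
    }

    // Hypothetical sample data: three records for user u1 and one for user u2.
    static TreeSet<String> sampleKeys() {
        TreeSet<String> keys = new TreeSet<>();
        keys.add(rowKey("u1", 1000L));
        keys.add(rowKey("u1", 2000L));
        keys.add(rowKey("u1", 3000L));
        keys.add(rowKey("u2", 1500L));
        return keys;
    }

    public static void main(String[] args) {
        // Prints the keys for u1's records at timestamps 3000 and 2000, newest first.
        System.out.println(latest(sampleKeys(), "u1", 2));
    }
}
```

With a layout like this, the query is answered by the sort order of the keys alone, without any value-based filter.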