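What can filters do?
- HBase provides a set of filters for selecting data. With them you can filter HBase data along several dimensions (row, column, version, and so on).
- In practice, filtering by row key and by column covers most use cases.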
HBase filter categories
1. Filters based on rows, columns, and cell values
1.1----- Row-based filters
- PrefixFilter: matches rows by row key prefix
- PageFilter: row-based paging
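To make the two row-based filters above concrete, here is a minimal plain-Java sketch of their behavior. It does not use the HBase API: `RowFilterSketch`, `byPrefix`, and `firstPage` are hypothetical names that only model prefix matching and paging over an already sorted list of row keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java model of row-based filtering. This is NOT HBase code;
// all names here are made up for illustration.
class RowFilterSketch {

    // Keeps the row keys that start with the given prefix (the idea behind PrefixFilter).
    static List<String> byPrefix(List<String> sortedRowKeys, String prefix) {
        List<String> kept = new ArrayList<>();
        for (String key : sortedRowKeys) {
            if (key.startsWith(prefix)) {
                kept.add(key);
            }
        }
        return kept;
    }

    // Keeps at most pageSize rows from the front (the idea behind PageFilter).
    static List<String> firstPage(List<String> sortedRowKeys, int pageSize) {
        int end = Math.min(Math.max(pageSize, 0), sortedRowKeys.size());
        return new ArrayList<>(sortedRowKeys.subList(0, end));
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("rowkey1", "rowkey2", "rowkey3", "user9");
        System.out.println(byPrefix(keys, "rowkey"));
        System.out.println(firstPage(keys, 2));
    }
}
```

One caveat about the real thing: as far as I know, PageFilter is applied on each region server separately, so a scan that spans several regions can return more rows in total than the page size. The sketch ignores that and models a single sorted list.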
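1.2----- Column-based filters
- ColumnPrefixFilter: matches columns by column name prefix
- FirstKeyOnlyFilter: returns only the first column of each row
1.3----- Cell-value-based filters
- KeyOnlyFilter: the returned data omits cell values and contains only the row key and column
- TimestampsFilter: filters by the timestamp (version) of the data
1.4----- Filters based on a column and its cell value
- SingleColumnValueFilter: filters rows by comparing the cell value of the given column
- SingleColumnValueExcludeFilter: same comparison on the given column, but the tested column itself is left out of the returned rows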
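The difference between the last two filters is easy to miss, so here is a small plain-Java sketch of it. It is not HBase code: `SingleColumnSketch` and `apply` are hypothetical names, a row is modeled as a simple map from `family:qualifier` to value, and only an equality comparison is modeled. Treating a row that lacks the tested column as a non-match is an assumption of this sketch (in HBase that behavior is configurable).

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Plain-Java model of "filter a row by one column's value". NOT HBase code.
class SingleColumnSketch {

    // Returns the row if row[column] equals expected, otherwise an empty Optional.
    // With excludeTestedColumn = true, the tested column is dropped from the
    // returned row (the idea behind SingleColumnValueExcludeFilter).
    static Optional<Map<String, String>> apply(Map<String, String> row,
                                               String column,
                                               String expected,
                                               boolean excludeTestedColumn) {
        // Assumption of this sketch: a missing column counts as "no match".
        if (!expected.equals(row.get(column))) {
            return Optional.empty();
        }
        Map<String, String> kept = new TreeMap<>(row);
        if (excludeTestedColumn) {
            kept.remove(column);
        }
        return Optional.of(kept);
    }

    public static void main(String[] args) {
        Map<String, String> row = new TreeMap<>();
        row.put("basic:name", "zs");
        row.put("basic:age", "20");
        System.out.println(apply(row, "basic:name", "zs", false));
        System.out.println(apply(row, "basic:name", "zs", true));
    }
}
```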
2. Comparison filters
2.1----- A comparison filter usually needs a comparison operator and a comparator to do its filtering
- RowFilter
- FamilyFilter
- QualifierFilter
- ValueFilter
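The "operator plus comparator" idea can be sketched in plain Java. This is not the HBase API: `CompareSketch`, `Op`, and `matches` are hypothetical names. The sketch assumes a comparator that, like a binary comparator, compares raw bytes lexicographically, which has one consequence worth seeing: numbers stored as strings compare as text, so "10000" sorts before "2000".

```java
import java.nio.charset.StandardCharsets;

// Plain-Java model of a comparison filter: an operator applied to the
// result of a byte-wise comparison. NOT HBase code.
class CompareSketch {

    enum Op { LESS, EQUAL, GREATER }

    // Lexicographic comparison of two byte arrays, treating bytes as unsigned.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // True when "cellValue OP reference" holds under byte-wise ordering.
    static boolean matches(Op op, String cellValue, String reference) {
        int c = compareBytes(cellValue.getBytes(StandardCharsets.UTF_8),
                             reference.getBytes(StandardCharsets.UTF_8));
        switch (op) {
            case LESS:
                return c < 0;
            case EQUAL:
                return c == 0;
            default:
                return c > 0;
        }
    }

    public static void main(String[] args) {
        System.out.println(matches(Op.EQUAL, "zs", "zs"));        // true
        System.out.println(matches(Op.GREATER, "10000", "2000")); // false: '1' < '2' byte-wise
    }
}
```

If salaries need numeric comparison, the usual options are to store them as fixed-width, zero-padded strings or as binary-encoded numbers, so that byte order and numeric order agree.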
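The most commonly used filters
Filter | Function |
---|---|
RowFilter | Selects all rows whose row key matches the comparison |
PrefixFilter | Selects the data of rows whose row key has a given prefix |
KeyOnlyFilter | Returns only the key part of each cell (row key and column); all values come back empty |
ColumnPrefixFilter | Selects cells by column name prefix |
ValueFilter | Selects cells by their value |
TimestampsFilter | Filters by timestamp (version) |
FilterList | Combines multiple filters |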
Below, each of these common filters is tested in turn.
First, take a look at our table:
hbase(main):003:0> scan 'man'
ROW COLUMN+CELL
rowkey1 column=basic:age, timestamp=1541251830545, value=20
rowkey1 column=basic:name, timestamp=1541251830506, value=zs
rowkey1 column=basic:sex, timestamp=1541251830540, value=male
rowkey1 column=extend:job, timestamp=1541251830548, value=student
rowkey1 column=extend:salary, timestamp=1541251830553, value=0
rowkey2 column=basic:age, timestamp=1541251830565, value=24
rowkey2 column=basic:name, timestamp=1541251830557, value=jack
rowkey2 column=basic:sex, timestamp=1541251830561, value=male
rowkey2 column=extend:job, timestamp=1541251830569, value=IT
rowkey2 column=extend:salary, timestamp=1541251830572, value=10000
rowkey3 column=basic:age, timestamp=1541251830585, value=19
rowkey3 column=basic:name, timestamp=1541251830577, value=rose
rowkey3 column=basic:sex, timestamp=1541251830580, value=female
rowkey3 column=extend:job, timestamp=1541251830588, value=teacher
rowkey3 column=extend:salary, timestamp=1541251830592, value=2000
3 row(s) in 0.2140 seconds
The code below builds on the earlier post on basic HBase Java API operations (https://blog.youkuaiyun.com/zhangshk_/article/details/83690790). Here is the one helper method it relies on:
/**
 * Opens a scanner over a row-key range with the given filters applied.
 *
 * @param tableName  name of the table to scan
 * @param startKey   first row key of the range (inclusive)
 * @param stopKey    end of the range (exclusive)
 * @param filterList filters applied to the scan
 * @return a scanner over the matching rows, or null if the scan could not be opened
 */
public static ResultScanner getScanner(String tableName, String startKey, String stopKey, FilterList filterList) {
    try (Table table = HBaseConn.getTable(tableName)) {
        Scan scan = new Scan();
        scan.setFilter(filterList);
        scan.setStartRow(Bytes.toBytes(startKey));
        scan.setStopRow(Bytes.toBytes(stopKey));
        // Number of rows fetched from the server per round trip.
        scan.setCaching(1000);
        // Do not iterate over the scanner here: that would use up the results
        // before the caller sees them. The caller reads and closes the scanner.
        return table.getScanner(scan);
    } catch (Exception e) {
        e.printStackTrace();
    }
    return null;
}
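One detail of the method above deserves a closer look: a scan's stop row is exclusive, so scanning from "rowkey1" to "rowkey3" never returns rowkey3. Here is a minimal plain-Java sketch of that range behavior, using a `TreeMap` as a stand-in for a table sorted by row key. `ScanRangeSketch` and `scan` are hypothetical names, not HBase API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Plain-Java model of a scan over the half-open range [startKey, stopKey).
// A TreeMap stands in for a table sorted by row key. NOT HBase code.
class ScanRangeSketch {

    // Returns the row keys in [startKey, stopKey): start inclusive, stop exclusive.
    // Java compares Strings by UTF-16 code unit while HBase compares raw bytes;
    // for plain ASCII keys like the ones here, the two orderings agree.
    static List<String> scan(TreeMap<String, String> table, String startKey, String stopKey) {
        return new ArrayList<>(table.subMap(startKey, true, stopKey, false).keySet());
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put("rowkey1", "zs");
        table.put("rowkey2", "jack");
        table.put("rowkey3", "rose");
        System.out.println(scan(table, "rowkey1", "rowkey3")); // [rowkey1, rowkey2]
        System.out.println(scan(table, "rowkey1", "rowkey4")); // [rowkey1, rowkey2, rowkey3]
    }
}
```

So to cover all three rows of the `man` table, the stop key has to sort after "rowkey3", for example "rowkey4".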
Here is a series of test methods for the filters:
package com.zsk.hbase.api;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.filter.*;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.Test;

import java.util.Arrays;

public class HBaseFilterTest {

    // Wraps one filter in a FilterList, scans the "man" table from rowkey1 up to
    // (but not including) rowkey3, and prints every row that comes back.
    private static void scanAndPrint(Filter filter) {
        FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL, Arrays.asList(filter));
        ResultScanner results = HBaseUtil.getScanner("man", "rowkey1", "rowkey3", filterList);
        if (results == null) {
            return; // getScanner already logged the failure
        }
        try {
            results.forEach(HBaseFilterTest::printResult);
        } finally {
            results.close();
        }
    }

    // Prints the row key and the five known columns; a column the filter removed prints as null.
    private static void printResult(Result result) {
        System.out.println("rowkey == " + Bytes.toString(result.getRow()));
        printColumn(result, "basic", "name");
        printColumn(result, "basic", "age");
        printColumn(result, "basic", "sex");
        printColumn(result, "extend", "salary");
        printColumn(result, "extend", "job");
    }

    private static void printColumn(Result result, String family, String qualifier) {
        byte[] value = result.getValue(Bytes.toBytes(family), Bytes.toBytes(qualifier));
        System.out.println(family + ":" + qualifier + " == " + Bytes.toString(value));
    }

    @Test
    public void testRowFilter() {
        // Keep only the row whose key equals "rowkey1".
        scanAndPrint(new RowFilter(CompareFilter.CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("rowkey1"))));
    }

    @Test
    public void testPrefixFilter() {
        // Keep rows whose key starts with "row".
        scanAndPrint(new PrefixFilter(Bytes.toBytes("row")));
    }

    @Test
    public void testKeyOnlyFilter() {
        // Keys are returned, but every value comes back empty.
        scanAndPrint(new KeyOnlyFilter());
    }

    @Test
    public void testColumnPrefixFilter() {
        // Keep only columns whose name starts with "nam" (here: basic:name).
        scanAndPrint(new ColumnPrefixFilter(Bytes.toBytes("nam")));
    }

    @Test
    public void testValueFilter() {
        // Keep only cells whose value equals "zs".
        scanAndPrint(new ValueFilter(CompareFilter.CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes("zs"))));
    }

    @Test
    public void testTimestampsFilter() {
        // Keep only cells written at exactly these timestamps (basic:age of rowkey1 and rowkey2).
        scanAndPrint(new TimestampsFilter(Arrays.asList(1541251830545L, 1541251830565L)));
    }
}
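Pattern summary:
First declare a Filter, then add it to a FilterList. When building the FilterList you can specify its Operator: either every filter must pass (Operator.MUST_PASS_ALL) or passing any one of them is enough (Operator.MUST_PASS_ONE).
Then pass the FilterList to the getScanner method.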
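The two operators behave like logical AND and OR over the individual filters. Here is a minimal plain-Java sketch of that combination rule, with `java.util.function.Predicate` standing in for a filter. `FilterListSketch` and `combine` are hypothetical names; only the two operator names mirror the ones above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Plain-Java model of how a filter list combines its members. NOT HBase code.
class FilterListSketch {

    enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

    // MUST_PASS_ALL: every filter must accept the item (logical AND).
    // MUST_PASS_ONE: at least one filter must accept the item (logical OR).
    static <T> Predicate<T> combine(Operator op, List<Predicate<T>> filters) {
        return item -> {
            if (op == Operator.MUST_PASS_ALL) {
                return filters.stream().allMatch(f -> f.test(item));
            }
            return filters.stream().anyMatch(f -> f.test(item));
        };
    }

    public static void main(String[] args) {
        Predicate<String> startsWithRow = key -> key.startsWith("row");
        Predicate<String> endsWithOne = key -> key.endsWith("1");
        List<Predicate<String>> filters = new ArrayList<>();
        filters.add(startsWithRow);
        filters.add(endsWithOne);

        System.out.println(combine(Operator.MUST_PASS_ALL, filters).test("rowkey2")); // false
        System.out.println(combine(Operator.MUST_PASS_ONE, filters).test("rowkey2")); // true
    }
}
```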
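It really is quite simple.
All filters are evaluated on the server side. If you write a custom Filter, you have to package it as a jar and deploy it to the server. In production, custom filters are generally not used.
In most cases a well-designed rowkey is enough to serve the queries that different scenarios need, so there is no need for a custom Filter.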
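As a rough illustration of what "a well-designed rowkey" can buy you, here is a plain-Java sketch. Everything in it is an assumption made for the example: the `userId#day` key layout, the names `RowKeyDesignSketch`, `rowKey`, and `scanByPrefix`, and the `TreeMap` standing in for a table sorted by row key. The point is that when the query field leads the row key, "all rows for one user" becomes a plain range scan with no filter at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Plain-Java model of querying by row-key range instead of by filter.
// The key layout "userId#day" is a made-up example. NOT HBase code.
class RowKeyDesignSketch {

    // Builds a composite row key with the query field (userId) first.
    static String rowKey(String userId, String day) {
        return userId + "#" + day;
    }

    // Returns all keys starting with prefix, expressed as the range
    // [prefix, prefix with its last character incremented by one).
    static List<String> scanByPrefix(TreeMap<String, String> table, String prefix) {
        if (prefix.isEmpty()) {
            return new ArrayList<>(table.keySet());
        }
        int last = prefix.length() - 1;
        String stopKey = prefix.substring(0, last) + (char) (prefix.charAt(last) + 1);
        return new ArrayList<>(table.subMap(prefix, true, stopKey, false).keySet());
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put(rowKey("u001", "20181101"), "login");
        table.put(rowKey("u001", "20181103"), "order");
        table.put(rowKey("u002", "20181101"), "login");
        System.out.println(scanByPrefix(table, "u001#")); // [u001#20181101, u001#20181103]
    }
}
```

Because the rows for one user sit next to each other in sorted order, the scan touches only that slice of the table instead of checking every row against a filter.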