HBase Coprocessor Endpoint in Practice: Querying Salted HBase Data
1. Introduction
The previous post introduced how to salt HBase data and briefly sketched how the salted data can be queried, but it did not give a concrete implementation. This post presents one, built on an HBase coprocessor endpoint.
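As a quick refresher (the exact scheme is in the previous post; the modulo-hash prefix below is only one common variant, and the bucket count and separator are made up for illustration), salting prepends a bounded, computable prefix to each rowkey so that sequential writes spread across regions:

import java.util.Arrays;

import org.apache.hadoop.hbase.util.Bytes;

public class SaltUtil {
  // One common salting variant (illustrative, not necessarily the scheme
  // from the previous post): prefix each rowkey with hash(rowkey) % buckets
  // so writes spread evenly across regions pre-split on the prefix.
  public static byte[] salt(byte[] originalKey, int bucketCount) {
    int bucket = (Arrays.hashCode(originalKey) & Integer.MAX_VALUE) % bucketCount;
    // Fixed-width prefix such as "07|" keeps each bucket's keys contiguous.
    return Bytes.add(Bytes.toBytes(String.format("%02d|", bucket)), originalKey);
  }
}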
Coprocessors come in two types by scope: a system coprocessor is loaded globally and applies to all tables on a region server, while a table coprocessor is attached by the user to a specific table. To keep its behavior flexible, the coprocessor framework also provides plugins on two different axes: the observer, which resembles a trigger in a relational database, and the endpoint, which is invoked dynamically and is somewhat like a stored procedure. The implementation in this post is an application of the endpoint.
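As a sketch of how a table coprocessor is attached through the Java admin API (using the 0.98-era API this post targets; the table name is a placeholder, and the endpoint jar is assumed to already be on the region servers' classpath):

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class AttachEndpoint {
  public static void main(String[] args) throws IOException {
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
    TableName table = TableName.valueOf("user_table");
    HTableDescriptor htd = admin.getTableDescriptor(table);
    // Register the endpoint on this one table only (a "table coprocessor").
    htd.addCoprocessor("com.bigdata.coprocessor.endpoint.QueryEndpoint");
    admin.disableTable(table);
    admin.modifyTable(table, htd);
    admin.enableTable(table);
    admin.close();
  }
}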
2. Implementation
2.1 Example
First, look at the row-counting example that ships with HBase, RowCountEndpoint.java; the source is in the hbase-examples module under org.apache.hadoop.hbase.coprocessor.example:
public void getRowCount(RpcController controller, ExampleProtos.CountRequest request,
    RpcCallback<ExampleProtos.CountResponse> done) {
  Scan scan = new Scan();
  scan.setFilter(new FirstKeyOnlyFilter());
  ExampleProtos.CountResponse response = null;
  InternalScanner scanner = null;
  try {
    scanner = env.getRegion().getScanner(scan);
    List<Cell> results = new ArrayList<Cell>();
    boolean hasMore = false;
    byte[] lastRow = null;
    long count = 0;
    do {
      hasMore = scanner.next(results);
      for (Cell kv : results) {
        byte[] currentRow = CellUtil.cloneRow(kv);
        if (lastRow == null || !Bytes.equals(lastRow, currentRow)) {
          lastRow = currentRow;
          count++;
        }
      }
      results.clear();
    } while (hasMore);
    response = ExampleProtos.CountResponse.newBuilder()
        .setCount(count).build();
  } catch (IOException ioe) {
    ResponseConverter.setControllerException(controller, ioe);
  } finally {
    if (scanner != null) {
      try {
        scanner.close();
      } catch (IOException ignored) {}
    }
  }
  done.run(response);
}
The implementation is straightforward: each region scans all of its rows and returns its own count (the FirstKeyOnlyFilter keeps the scan cheap by returning only the first cell of each row), and the client then sums the counts from all regions to get the total row count of the table.
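For completeness, the matching client side looks like the following, after the pattern in HBase's documentation for RowCountEndpoint; the table name is a placeholder. It invokes the endpoint once per region (null start/end keys cover the whole table) and sums the partial counts:

import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.coprocessor.Batch;
import org.apache.hadoop.hbase.coprocessor.example.generated.ExampleProtos;
import org.apache.hadoop.hbase.ipc.BlockingRpcCallback;
import org.apache.hadoop.hbase.ipc.ServerRpcController;

public class RowCountClient {
  public static void main(String[] args) throws Throwable {
    HTable table = new HTable(HBaseConfiguration.create(), "user_table");
    final ExampleProtos.CountRequest request =
        ExampleProtos.CountRequest.getDefaultInstance();
    // One RPC per region; the map key is each region's start key.
    Map<byte[], Long> results = table.coprocessorService(
        ExampleProtos.RowCountService.class, null, null,
        new Batch.Call<ExampleProtos.RowCountService, Long>() {
          public Long call(ExampleProtos.RowCountService counter) throws IOException {
            ServerRpcController controller = new ServerRpcController();
            BlockingRpcCallback<ExampleProtos.CountResponse> callback =
                new BlockingRpcCallback<ExampleProtos.CountResponse>();
            counter.getRowCount(controller, request, callback);
            ExampleProtos.CountResponse response = callback.get();
            if (controller.failedOnException()) {
              throw controller.getFailedOn();
            }
            return response == null ? 0 : response.getCount();
          }
        });
    long total = 0;
    for (Long partial : results.values()) {
      total += partial; // sum the per-region counts into the table total
    }
    System.out.println("row count = " + total);
    table.close();
  }
}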
2.2 Server Implementation
Next, modeled on RowCountEndpoint, we implement the query method for salted HBase data.
1) Defining the protocol
Since HBase uses protobuf for its internal communication, the first step is to generate the protocol classes (like ExampleProtos above). Define our own protocol, DataProtos, in DataProtos.proto:
package generated;

option java_package = "com.bigdata.coprocessor.endpoint.generated";
option java_outer_classname = "DataProtos";
option java_generic_services = true;
option java_generate_equals_and_hash = true;
option optimize_for = SPEED;

message DataQueryRequest {
  optional string tableName = 1;
  optional string startRow = 2;
  optional string endRow = 3;
  optional string rowKey = 4;
  optional bool incluedEnd = 5;
  optional bool isSalting = 6;
}

message DataQueryResponse {
  message Cell {
    required bytes value = 1;
    required bytes family = 2;
    required bytes qualifier = 3;
    required bytes row = 4;
  }
  message Row {
    optional bytes rowKey = 1;
    repeated Cell cellList = 2;
  }
  repeated Row rowList = 1;
}

service QueryDataService {
  rpc queryByStartRowAndEndRow(DataQueryRequest)
      returns (DataQueryResponse);
  rpc queryByRowKey(DataQueryRequest)
      returns (DataQueryResponse);
}
This defines the request message DataQueryRequest and the response message DataQueryResponse, along with a service, QueryDataService, exposing two methods: one queries by a start/end rowkey range, the other by a single rowkey. The corresponding Java classes are then generated with protoc.exe.
Running protoc.exe DataProtos.proto --java_out=e:\hbase\protoc-2.4.1 produces DataProtos.java. (I have also uploaded the protoc.exe tool, so you can download and use it.)
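Once DataProtos.java is on the client's classpath, a request is assembled with the standard protobuf builder API; for example (the table name and rowkey values here are invented):

DataProtos.DataQueryRequest request = DataProtos.DataQueryRequest.newBuilder()
    .setTableName("user_table")  // placeholder table name
    .setStartRow("20140101")     // logical (unsalted) start key, example value
    .setEndRow("20140131")       // logical (unsalted) end key, example value
    .setIncluedEnd(true)         // spelled as in the .proto above
    .setIsSalting(true)          // tells the endpoint the table is salted
    .build();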
2) Implementing the coprocessor
Server-side code:
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package com.bigdata.coprocessor.endpoint;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.Coprocessor;
import org.apache.hadoop.hbase.CoprocessorEnvironment;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.coprocessor.CoprocessorException;
import org.apache.hadoop.hbase.coprocessor.CoprocessorService;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.filter.InclusiveStopFilter;
import org.apache.hadoop.hbase.protobuf.ResponseConverter;
import org.apache.hadoop.hbase.regionserver.HRegion;
import org.apache.hadoop.hbase.regionserver.InternalScanner;
import org.apache.hadoop.hbase.util.Bytes;

import com.google.protobuf.ByteString;
import com.google.protobuf.RpcCallback;
import com.google.protobuf.RpcController;
import com.google.protobuf.Service;

import com.bigdata.coprocessor.endpoint.generated.DataProtos;
import com.bigdata.coprocessor.endpoint.generated.DataProtos.DataQueryRequest;
import com.bigdata.coprocessor.endpoint.generated.DataProtos.DataQueryResponse;
public class QueryEndpoint extends DataProtos.QueryDataService implements
    Coprocessor, CoprocessorService {

  private RegionCoprocessorEnvironment env;

  public QueryEndpoint() {
  }

  /**
   * Just returns a reference to this object, which implements the
   * QueryDataService interface.
   */
  @Override
  public Service getService() {
    return this;
  }

  /**
   * Returns the rows between the given start and end rowkeys within the
   * region where this coprocessor is loaded.
   */
  @Override
  public void queryByStartRowAndEndRow(RpcController controller,
      DataProtos.DataQueryRequest request,
      RpcCallback<DataProtos.DataQueryResponse> done) {
    DataProtos.DataQueryResponse response = null;
    InternalScanner scanner = null;
    try {
      String startRow = request.getStartRow();