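Sqoop is a data transfer tool: it moves data in both directions between relational databases and HBase/Hive/HDFS. Each transfer is run as a generated MapReduce job, which means Sqoop is built on Hadoop. Taobao has its own data transfer tool, DataX, which works on a different principle and is not based on Hadoop.
Sqoop does not yet cover every import scenario well. For example, database column names must not collide with Java keywords, and row keys cannot be imported as integers. So Sqoop still has to be extended before it supports our workloads properly. Today I implemented integer row-key import.
Extend the PutTransformer class: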
// Package taken from the class name configured in sqoop-site.xml.
package org.apache.sqoop.hbase;

import java.io.IOException;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

/**
 * PutTransformer that writes the row key as a 4-byte int instead of a string.
 * All other columns are still written as strings.
 */
public class ToIntPutTransformer extends PutTransformer {

  public static final Log LOG = LogFactory.getLog(
      ToIntPutTransformer.class.getName());

  // A mapping from field name -> bytes for that field name.
  // Used to cache serialization work done for field names.
  private Map<String, byte[]> serializedFieldNames;

  public ToIntPutTransformer() {
    serializedFieldNames = new TreeMap<String, byte[]>();
  }

  /**
   * Return the serialized bytes for a field name, using
   * the cache if it's already in there.
   */
  private byte[] getFieldNameBytes(String fieldName) {
    byte[] cachedName = serializedFieldNames.get(fieldName);
    if (null != cachedName) {
      // Cache hit. We're done.
      return cachedName;
    }

    // Do the serialization and memoize the result.
    byte[] nameBytes = Bytes.toBytes(fieldName);
    serializedFieldNames.put(fieldName, nameBytes);
    return nameBytes;
  }

  /** {@inheritDoc} */
  @Override
  public List<Put> getPutCommand(Map<String, Object> fields)
      throws IOException {
    String rowKeyCol = getRowKeyColumn();
    String colFamily = getColumnFamily();
    byte[] colFamilyBytes = Bytes.toBytes(colFamily);

    Object rowKey = fields.get(rowKeyCol);
    if (null == rowKey) {
      // If the row-key column is null, we don't insert this row.
      LOG.warn("Could not insert row with null value for row-key column: "
          + rowKeyCol);
      return null;
    }

    // Convert the row key to an int. The database column chosen as the
    // row key must be an integer type; otherwise parseInt throws a
    // NumberFormatException.
    Put put = new Put(Bytes.toBytes(Integer.parseInt(rowKey.toString())));

    for (Map.Entry<String, Object> fieldEntry : fields.entrySet()) {
      String colName = fieldEntry.getKey();
      if (!colName.equals(rowKeyCol)) {
        // This is a regular field, not the row key.
        // Add it if it's not null.
        Object val = fieldEntry.getValue();
        if (null != val) {
          put.add(colFamilyBytes, getFieldNameBytes(colName),
              Bytes.toBytes(val.toString()));
        }
      }
    }

    return Collections.singletonList(put);
  }
}
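To make the effect of the row-key conversion concrete, here is a minimal standalone sketch of the two encodings. It does not use HBase: intKey and stringKey are hypothetical helpers that mimic the byte layouts of Bytes.toBytes(int) (4 bytes, big-endian) and Bytes.toBytes(String) (UTF-8). Wrapping the parse error in an IllegalArgumentException is my own addition, not something the transformer above does.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class IntRowKeySketch {

    // Hypothetical helper: same layout as Bytes.toBytes(int),
    // i.e. 4 bytes in big-endian order (ByteBuffer's default).
    static byte[] intKey(Object rowKey) {
        int value;
        try {
            value = Integer.parseInt(rowKey.toString());
        } catch (NumberFormatException e) {
            // Assumption: a clearer message than the raw parse error.
            throw new IllegalArgumentException(
                "Row key is not a 32-bit integer: " + rowKey, e);
        }
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Hypothetical helper: same layout as Bytes.toBytes(String).
    static byte[] stringKey(Object rowKey) {
        return rowKey.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Readers of the table must decode with the same convention.
    static int decodeIntKey(byte[] key) {
        return ByteBuffer.wrap(key).getInt();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(stringKey(12))); // [49, 50]
        System.out.println(Arrays.toString(intKey(12)));    // [0, 0, 0, 12]
        System.out.println(decodeIntKey(intKey(12)));       // 12
    }
}
```

The practical consequence is that a client reading this table has to build its row keys from ints as well (for example with Bytes.toBytes(12), not Bytes.toBytes("12")), otherwise lookups will miss.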
Compile and package the class, then put the jar in Sqoop's lib directory.
Edit conf/sqoop-site.xml and add:
<property>
  <name>sqoop.hbase.insert.put.transformer.class</name>
  <value>org.apache.sqoop.hbase.ToIntPutTransformer</value>
</property>
That's it.
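The configuration step works because the transformer is chosen by class name at run time. The sketch below only illustrates that general mechanism with plain Java reflection; it is not Sqoop's actual loading code. RowTransformer, StringKeyTransformer and IntKeyTransformer are hypothetical stand-ins, and the fallback to a string-based default when the property is unset is an assumption.

```java
import java.util.Properties;

public class TransformerLoaderSketch {

    // Hypothetical stand-in for Sqoop's PutTransformer base class.
    public abstract static class RowTransformer {
        public abstract String describe();
    }

    // Hypothetical stand-in for the default, string-based transformer.
    public static class StringKeyTransformer extends RowTransformer {
        @Override
        public String describe() { return "string row key"; }
    }

    // Hypothetical stand-in for ToIntPutTransformer.
    public static class IntKeyTransformer extends RowTransformer {
        @Override
        public String describe() { return "int row key"; }
    }

    // Property name as it appears in sqoop-site.xml.
    static final String KEY = "sqoop.hbase.insert.put.transformer.class";

    // Look up the class name, falling back to the default (assumption),
    // and instantiate it through its no-argument constructor.
    static RowTransformer load(Properties conf) {
        String className =
            conf.getProperty(KEY, StringKeyTransformer.class.getName());
        try {
            return Class.forName(className)
                .asSubclass(RowTransformer.class)
                .getDeclaredConstructor()
                .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                "Cannot load transformer " + className, e);
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(load(conf).describe()); // string row key

        conf.setProperty(KEY, IntKeyTransformer.class.getName());
        System.out.println(load(conf).describe()); // int row key
    }
}
```

This is also why the class needs a public no-argument constructor and has to be on Sqoop's classpath (the lib directory) before the property takes effect.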