Data structure
id  name       star
2   Warcraft   [{"starringid":1001,"starringname":"cheng"},{"starringid":1002,"starringname":"dian"},{"starringid":1003,"starringname":"hu"}]
3   CrossFire  [{"starringid":1002,"starringname":"dian"},{"starringid":1004,"starringname":"li"},{"starringid":1005,"starringname":"lei"}]
4   FIFA       [{"starringid":1007,"starringname":"王"},{"starringid":1008,"starringname":"月"},{"starringid":1011,"starringname":"为"}]
The star column is a nested Array[struct] type. This post shows how to save it to HBase using Python and Thrift.
For how to connect to HBase from Python via Thrift, see
http://blog.youkuaiyun.com/umbrellacheng/article/details/51848802
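Converting the nested type
Inside the HBase cell, the default delimiters are \x02 and \x03: \x02 separates the array elements and \x03 separates the fields within each struct. The nested value
[{"starringid":1007,"starringname":"王"},
{"starringid":1008,"starringname":"月"},
{"starringid":1011,"starringname":"为"}]
is therefore stored as
u'1007\x03王\x021008\x03月\x021011\x03为'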
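The conversion described above can be sketched as a small helper. This is only a minimal sketch, assuming every struct has exactly the two fields starringid and starringname and that they are written in that order. The name encode_star and the two constants are hypothetical, introduced here for illustration, and the snippet is written for Python 3.

```python
# -*- coding: utf-8 -*-
# Hypothetical helper (not part of the original script): flatten a list of
# structs into the delimited string that is stored in the HBase cell.
ITEM_SEP = u'\x02'   # assumed separator between array elements
FIELD_SEP = u'\x03'  # assumed separator between struct fields


def encode_star(stars):
    """Join each struct's fields with \\x03, then join the structs with \\x02."""
    return ITEM_SEP.join(
        FIELD_SEP.join([str(s['starringid']), s['starringname']])
        for s in stars
    )


stars = [
    {'starringid': 1007, 'starringname': u'王'},
    {'starringid': 1008, 'starringname': u'月'},
    {'starringid': 1011, 'starringname': u'为'},
]
cell = encode_star(stars)
# The bytes that would be passed as the Mutation value
cell_bytes = cell.encode('utf-8')
```

Building the string from the list keeps the delimiters in one place, which is easier to maintain than typing the escape sequences by hand for every row.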
Code
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Python 2 script; the coding line above is required because the source contains Chinese characters
import sys
sys.path.append('/usr/local/lib/python2.7/site-packages/hbase')
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
from hbase import Hbase
from hbase.ttypes import Mutation

# Open a buffered Thrift connection to the HBase Thrift server
transport = TSocket.TSocket('10.18.210.202', 9090)
transport = TTransport.TBufferedTransport(transport)
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = Hbase.Client(protocol)
transport.open()

row = str(4)
# \x02 separates array elements, \x03 separates struct fields
star = u'1007\x03王\x021008\x03月\x021011\x03为'
# The Mutation value must be a byte string, so encode the unicode value as UTF-8
star2 = star.encode("utf8")
name = 'FIFA'  # not used below; only starinfo:star is written
mutations = [Mutation(column="starinfo:star", value=star2)]
print(mutations)
client.mutateRow('doubanhbase', row, mutations, {})

# Read the row back to verify the write
result = client.getRow('doubanhbase', row, None)
for r in result:
    print(r.row)
    print(str(r.columns.get('starinfo:star').value))

transport.close()
Output
The script prints:
[Mutation(column='starinfo:star', isDelete=False, writeToWAL=True, value='1007\x03\xe7\x8e\x8b\x021008\x03\xe6\x9c\x88\x021011\x03\xe4\xb8\xba')]
4
1007王1008月1011为
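Because \x02 and \x03 are non-printing control characters, the value printed above looks like one unbroken run of text. The following is a minimal sketch of splitting such a value back into structs, assuming the cell holds UTF-8 bytes in the layout written by the script. The name decode_star is hypothetical and the snippet is written for Python 3.

```python
# -*- coding: utf-8 -*-
# Hypothetical helper (not part of the original script): turn the raw cell
# bytes back into a list of dicts.


def decode_star(value):
    """Split on \\x02 into array elements, then on \\x03 into struct fields."""
    text = value.decode('utf-8')
    if not text:
        return []
    stars = []
    for item in text.split(u'\x02'):
        starring_id, starring_name = item.split(u'\x03', 1)
        stars.append({'starringid': int(starring_id),
                      'starringname': starring_name})
    return stars


# The same bytes shown in the Mutation printed by the script
raw = b'1007\x03\xe7\x8e\x8b\x021008\x03\xe6\x9c\x88\x021011\x03\xe4\xb8\xba'
decoded = decode_star(raw)
```

With the script's result, the same function could be applied to r.columns.get('starinfo:star').value instead of the literal bytes used here.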
Inspect the HBase table doubanhbase
hbase(main):089:0> scan 'doubanhbase'
ROW COLUMN+CELL
2 column=basicinfo:name, timestamp=1466752946508, value=Warcraft
2 column=starinfo:star, timestamp=1466752946508, value=1001\x03cheng\x021002\x03dian\x021003\x03hu
3 column=basicinfo:name, timestamp=1466752946508, value=CrossFire
3 column=starinfo:star, timestamp=1466752946508, value=1002\x03dian\x021004\x03li\x021005\x03lei
4 column=basicinfo:name, timestamp=1467946899884, value=FIFA
4 column=starinfo:star, timestamp=1467966426845, value=1007\x03\xE7\x8E\x8B\x021008\x03\xE6\x9C\x88\x021011\x03\xE6\xB3\x89
3 row(s) in 0.1060 seconds
Query the Hive table d_doubanhbase, which is mapped onto the HBase table
hive> select * from d_doubanhbase;
OK
2 Warcraft [{"starringid":1001,"starringname":"cheng"},{"starringid":1002,"starringname":"dian"},{"starringid":1003,"starringname":"hu"}]
3 CrossFire [{"starringid":1002,"starringname":"dian"},{"starringid":1004,"starringname":"li"},{"starringid":1005,"starringname":"lei"}]
4 FIFA [{"starringid":1007,"starringname":"王"},{"starringid":1008,"starringname":"月"},{"starringid":1011,"starringname":"为"}]
Time taken: 0.281 seconds, Fetched: 3 row(s)
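Notes
Because the code contains Chinese characters, make sure the .py file is saved as UTF-8. Under Python 2 the file also needs an encoding declaration such as # -*- coding: utf-8 -*- on its first or second line, otherwise the interpreter rejects the non-ASCII literals.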
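An alternative that avoids depending on the file encoding at all is to write the non-ASCII characters as \u escapes, so the source stays pure ASCII. This is a sketch of that swapped-in technique, not what the original script does; the variable names are illustrative.

```python
# The three names written as Unicode escapes:
# \u738b is 王, \u6708 is 月, \u4e3a is 为
star = u'1007\x03\u738b\x021008\x03\u6708\x021011\x03\u4e3a'

# Encoding yields the same bytes as the Mutation value printed by the script
star_bytes = star.encode('utf-8')
```

The trade-off is readability: the escapes are robust against editors that save the file in another encoding, but the names are no longer legible in the source.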