Lesson 7: HBase Tutorial

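This article walks through basic HBase Shell operations and common commands, including creating a table, inserting data, and querying data, and explains what column families are in HBase and why they matter.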

Note:

This article is based on CentOS 6.x + CDH 5.x
HBase in this example is installed in cluster mode

This article uses a student table and a few related operations to give a brief introduction to the HBase shell.

Creating the student table

Use the hbase shell command to enter the HBase command line:

[root@localhost conf]# hbase shell  
2014-08-22 16:10:47,662 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available  
HBase Shell; enter 'help<RETURN>' for list of supported commands.  
Type "exit<RETURN>" to leave the HBase Shell  
Version 0.96.1.1-cdh5.0.1, rUnknown, Tue May  6 13:27:24 PDT 2014  

Then use create to build a table. Our table has the following properties:

Table name: student

Column family: info (holding the columns sid, name, and age)

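What is a column family?

HBase is a column-oriented database, and its columns are organized into column families. A column family is roughly the sum of several column definitions of a MySQL table. Unlike MySQL, a table can have multiple column families, and the columns inside a family do not have to be declared up front. That is enough to know for now; it will make more sense once you have worked through the examples. Doing first and asking why afterwards is an efficient way to learn.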

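Why do column families exist?

Columns in the same column family are stored together. My guess at the reasoning is this: NoSQL stores do not require columns to be defined in advance, but the data is distributed, so for performance it helps to keep columns that are often used together physically close. The simplest approach is to let the user decide which columns belong together, and that grouping is the column family. In HBase each column family is kept in its own set of store files, while rows are split across machines by rowkey range (regions) rather than by column.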
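The idea that column families are fixed when the table is created, while the columns inside them appear on first write, can be sketched in a few lines of Python. This is only a toy model of the logical layout, not of how HBase stores data; ToyTable and its methods are hypothetical names invented for the illustration.

```python
# Toy model of HBase's logical layout, NOT its on-disk storage.
# table -> rowkey -> column family -> qualifier -> value
class ToyTable:
    def __init__(self, name, families):
        self.name = name
        self.families = set(families)   # fixed when the table is created
        self.rows = {}                  # rowkey -> {family -> {qualifier -> value}}

    def put(self, rowkey, column, value):
        family, qualifier = column.split(":", 1)
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # The qualifier is created on first write; it was never declared.
        self.rows.setdefault(rowkey, {}).setdefault(family, {})[qualifier] = value

    def get(self, rowkey, column):
        family, qualifier = column.split(":", 1)
        return self.rows[rowkey][family][qualifier]


student = ToyTable("student", ["info"])
student.put("row1", "info:name", "jack")
student.put("row1", "info:age", "22")
print(student.get("row1", "info:name"))
```

Writing to a family that was not declared, such as f2:grade here, fails in this model, which mirrors why a new column family has to be added to the table before it can be used.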

With the concepts covered, let's create the table:

hbase(main):001:0> create 'student', 'info'  
0 row(s) in 4.3300 seconds  
  
=> Hbase::Table - student  

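Adding data

Use put to add data. Each put writes a single cell, that is, one column of one row in one column family. In MySQL terms this is like filling in just one column of one row.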

hbase(main):002:0> put 'student','row1','info:name','jack'  
0 row(s) in 0.1990 seconds  

This inserts the value jack into the info:name column of row row1 in the student table.

Let's query this data:

hbase(main):003:0> get 'student','row1','info:name'  
COLUMN                        CELL                                                                                 
 info:name                    timestamp=1408697225683, value=jack                                                  
1 row(s) in 0.0490 seconds  

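There it is.

Does it feel like a lot of effort just to insert one column of one row? That is because HBase is modeled on the Bigtable paper by Google engineer Fay Chang and colleagues. A Bigtable is a table with a huge number of columns, so large that it does not fit on one machine and has to be distributed across many, so data operations use one column of one row as the smallest unit.

Here row1 is the rowkey.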

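rowkey

Each row is uniquely identified by its rowkey. A rowkey is a byte array, so anything can be used as one, for example a string or a number. Rows are stored in the table sorted by rowkey in lexicographic order, from low to high.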
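Lexicographic byte order is easy to confuse with numeric order, so here is a small Python illustration. The rowkeys row10 and the zero-padded variants are made up for the example and do not appear in the student table.

```python
# Rowkeys are compared as raw bytes, so ordering is lexicographic, not numeric.
rowkeys = [b"row10", b"row2", b"row1", b"row3"]
ordered = sorted(rowkeys)           # Python compares bytes objects byte by byte
print(ordered)                      # [b'row1', b'row10', b'row2', b'row3']

# Zero-padding is one common way to make byte order match numeric order.
padded = sorted(f"row{n:04d}".encode() for n in (10, 2, 1, 3))
print(padded)
```

Note that row10 lands between row1 and row2, which is worth keeping in mind when designing rowkeys that contain numbers.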

Let's insert the other columns of this row:

hbase(main):004:0> put 'student','row1','info:sid','1'  
0 row(s) in 0.0200 seconds  
  
hbase(main):005:0> put 'student','row1','info:age','22'  
0 row(s) in 0.0210 seconds  

Then use the scan command to query the whole table:

hbase(main):006:0> scan 'student'  
ROW                           COLUMN+CELL                                                                          
 row1                         column=info:age, timestamp=1408697651322, value=22                                   
 row1                         column=info:name, timestamp=1408697225683, value=jack                                
 row1                         column=info:sid, timestamp=1408697640490, value=1                                    
1 row(s) in 0.0580 seconds  

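The output lists three entries, but they all belong to one row. This row is what corresponds to a row in MySQL.

After inserting more records, the table ends up like this: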

hbase(main):005:0> scan 'student'  
ROW                              COLUMN+CELL                                                                                    
 row1                            column=info:age, timestamp=1420817226790, value=22                                             
 row1                            column=info:name, timestamp=1420817205836, value=jack                                          
 row1                            column=info:sid, timestamp=1420817219869, value=1                                              
 row2                            column=info:age, timestamp=1420817278346, value=28                                             
 row2                            column=info:name, timestamp=1420817252182, value=terry                                         
 row2                            column=info:sid, timestamp=1420817267780, value=2                                              
 row3                            column=info:age, timestamp=1420817315351, value=18                                             
 row3                            column=info:name, timestamp=1420817294342, value=billy                                         
 row3                            column=info:sid, timestamp=1420817304621, value=3                                              
 row4                            column=info:name, timestamp=1420858768667, value=karry                                         
 row4                            column=info:sid, timestamp=1420858794556, value=4                                              
4 row(s) in 1.0990 seconds  

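Command reference

With some base data in place we can learn the HBase commands by trying them out. The previous example already introduced one new command, scan.

scan: query a table

Without any parameters, scan is equivalent to select * from table in SQL.

LIMIT: number of rows to return

Use LIMIT to restrict the number of rows returned: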

scan 'student',{'LIMIT'=>2}  

The result looks like this:

hbase(main):006:0> scan 'student',{'LIMIT'=>2}  
ROW                              COLUMN+CELL                                                                                    
 row1                            column=info:age, timestamp=1420817226790, value=22                                             
 row1                            column=info:name, timestamp=1420817205836, value=jack                                          
 row1                            column=info:sid, timestamp=1420817219869, value=1                                              
 row2                            column=info:age, timestamp=1420817278346, value=28                                             
 row2                            column=info:name, timestamp=1420817252182, value=terry                                         
 row2                            column=info:sid, timestamp=1420817267780, value=2                                              
2 row(s) in 0.8250 seconds  

STARTROW: starting rowkey

Use STARTROW to set the rowkey at which the returned results begin. It acts as greater than or equal to, for example:

hbase(main):007:0> scan 'student',{'STARTROW'=>'row2'}  
ROW                              COLUMN+CELL                                                                                    
 row2                            column=info:age, timestamp=1420817278346, value=28                                             
 row2                            column=info:name, timestamp=1420817252182, value=terry                                         
 row2                            column=info:sid, timestamp=1420817267780, value=2                                              
 row3                            column=info:age, timestamp=1420817315351, value=18                                             
 row3                            column=info:name, timestamp=1420817294342, value=billy                                         
 row3                            column=info:sid, timestamp=1420817304621, value=3                                              
 row4                            column=info:name, timestamp=1420858768667, value=karry                                         
 row4                            column=info:sid, timestamp=1420858794556, value=4  

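STARTROW may look as if it accepts wildcards, but it does not: the value is treated as a literal key. In the example below, 'row*' returns every row only because '*' sorts before the digits in byte order, so row1 through row4 are all greater than or equal to it: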

hbase(main):008:0> scan 'student',{'STARTROW'=>'row*'}  
ROW                              COLUMN+CELL                                                                                    
 row1                            column=info:age, timestamp=1420817226790, value=22                                             
 row1                            column=info:name, timestamp=1420817205836, value=jack                                          
 row1                            column=info:sid, timestamp=1420817219869, value=1                                              
 row2                            column=info:age, timestamp=1420817278346, value=28                                             
 row2                            column=info:name, timestamp=1420817252182, value=terry                                         
 row2                            column=info:sid, timestamp=1420817267780, value=2                                              
 row3                            column=info:age, timestamp=1420817315351, value=18                                             
 row3                            column=info:name, timestamp=1420817294342, value=billy                                         
 row3                            column=info:sid, timestamp=1420817304621, value=3                                              
 row4                            column=info:name, timestamp=1420858768667, value=karry                                         
 row4                            column=info:sid, timestamp=1420858794556, value=4                                              
4 row(s) in 0.2830 seconds  
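Why 'row*' returns every row can be checked with a plain byte comparison. The Python sketch below treats the start key as a literal, inclusive lower bound; the key 'rox*' is a made-up counterexample, not something from the table.

```python
# '*' is byte 0x2A and '1' is byte 0x31, so the literal key 'row*' sorts before 'row1'.
start = b"row*"
rowkeys = [b"row1", b"row2", b"row3", b"row4"]
matched = [k for k in rowkeys if k >= start]   # STARTROW is an inclusive lower bound
print(matched)

# A start key that sorts after every rowkey matches nothing, wildcard or not.
unmatched = [k for k in rowkeys if k >= b"rox*"]
print(unmatched)
```

If the asterisk really were a wildcard, the second comparison would still be expected to match something; under literal comparison it matches nothing.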

Multiple parameters can be combined. For example, to start at row2 and return only one row:

hbase(main):009:0> scan 'student',{'STARTROW'=>'row2','LIMIT'=>1}  
ROW                              COLUMN+CELL                                                                                    
 row2                            column=info:age, timestamp=1420817278346, value=28                                             
 row2                            column=info:name, timestamp=1420817252182, value=terry                                         
 row2                            column=info:sid, timestamp=1420817267780, value=2                                              
1 row(s) in 0.1890 seconds  

STOPROW: ending rowkey

Works like STARTROW, except that the stop row itself is excluded from the results.
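How STARTROW, STOPROW, and LIMIT interact can be sketched with a toy scan over an in-memory dict. This is an illustration under stated assumptions rather than HBase's implementation: toy_scan is a hypothetical helper, rowkeys are plain ASCII strings, and only the name column of each student is included.

```python
def toy_scan(rows, startrow=None, stoprow=None, limit=None):
    """rows: dict mapping rowkey (str) -> dict of column -> value."""
    result = []
    for key in sorted(rows):                     # rows come back in rowkey order
        if startrow is not None and key < startrow:
            continue                             # STARTROW is inclusive
        if stoprow is not None and key >= stoprow:
            break                                # STOPROW is exclusive
        if limit is not None and len(result) >= limit:
            break                                # LIMIT counts rows, not cells
        result.append((key, rows[key]))
    return result


students = {
    "row1": {"info:name": "jack"},
    "row2": {"info:name": "terry"},
    "row3": {"info:name": "billy"},
    "row4": {"info:name": "karry"},
}
print([k for k, _ in toy_scan(students, startrow="row2", stoprow="row4")])
print([k for k, _ in toy_scan(students, startrow="row2", limit=1)])
```

The first call returns row2 and row3 but not row4, and the second returns only row2, matching the combined STARTROW and LIMIT scan shown earlier.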

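COLUMNS: control which columns are returned

This corresponds to the column list in SQL's select xx,xxx,xxx from. For example, to fetch only the name and age of every student, without sid: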

hbase(main):011:0> scan 'student',{'COLUMNS'=>['info:name','info:age'],LIMIT=>3}  
ROW                              COLUMN+CELL                                                                                    
 row1                            column=info:age, timestamp=1420817226790, value=22                                             
 row1                            column=info:name, timestamp=1420817205836, value=jack                                          
 row2                            column=info:age, timestamp=1420817278346, value=28                                             
 row2                            column=info:name, timestamp=1420817252182, value=terry                                         
 row3                            column=info:age, timestamp=1420817315351, value=18                                             
 row3                            column=info:name, timestamp=1420817294342, value=billy                                         
3 row(s) in 0.4470 seconds  

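Remember to include the column family when writing column names, for example info:name.

TIMESTAMP: locate data by time

TIMESTAMP restricts the scan to cells written with exactly the given timestamp: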

hbase(main):012:0> scan 'student',{'TIMESTAMP'=>1420817315351}  
ROW                              COLUMN+CELL                                                                                    
 row3                            column=info:age, timestamp=1420817315351, value=18                                             
1 row(s) in 0.1920 seconds  

get: fetch one row

Use get to fetch a single row:

hbase(main):073:0> get 'student','row1'  
COLUMN                           CELL                                                                                           
 info:age                        timestamp=1420817226790, value=22                                                              
 info:name                       timestamp=1420817205836, value=jack                                                            
 info:sid                        timestamp=1420817219869, value=1                                                               
3 row(s) in 0.1730 seconds  

More complex parameters can be added:

hbase(main):076:0> get 'student','row1',{COLUMN=>['info:name','info:sid']}  
COLUMN                           CELL                                                                                           
 info:name                       timestamp=1420817205836, value=jack                                                            
 info:sid                        timestamp=1420817219869, value=1                                                               
2 row(s) in 0.0490 seconds  
  
hbase(main):077:0> get 'student','row1',{COLUMN=>['info:name','info:sid'],TIMESTAMP=>1420817219869,VERSION=>1}  
COLUMN                           CELL                                                                                           
 info:sid                        timestamp=1420817219869, value=1                                                               
1 row(s) in 0.0740 seconds  

describe: view table information

describe shows information about a table. You will use this command often.

hbase(main):013:0> describe 'student'  
DESCRIPTION                                                                       ENABLED                                       
 'student', {NAME => 'info', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', true                                          
  REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS                                                
 => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', I                                               
 N_MEMORY => 'false', BLOCKCACHE => 'true'}                                                                                     
1 row(s) in 7.6720 seconds  

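alter: modify a table's column families

Use alter to modify a table's column families. The schema of an HBase table is essentially just its column family definitions. For example, we can add a column family f2: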

alter 'student', {NAME => 'f2', VERSION => 2}  

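According to the official description, this setting means each cell can have 2 versions: one column of one row can hold two values, each with a different version. Note that the column family attribute is named VERSIONS (plural), and the describe output that follows still shows VERSIONS => '1' for f2, which suggests the VERSION key used above had no effect.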
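To illustrate what keeping several versions of a cell means, here is a minimal Python sketch. VersionedCell and its methods are hypothetical names made up for this illustration, and the timestamps 100, 200, 300 are arbitrary; HBase handles version trimming inside its storage layer, which this toy does not try to reproduce.

```python
class VersionedCell:
    """Toy cell that keeps at most max_versions values, newest first."""

    def __init__(self, max_versions):
        self.max_versions = max_versions
        self.versions = []                        # list of (timestamp, value)

    def put(self, timestamp, value):
        self.versions.append((timestamp, value))
        self.versions.sort(key=lambda tv: tv[0], reverse=True)
        del self.versions[self.max_versions:]     # drop versions beyond the limit

    def get(self, versions=1):
        return self.versions[:versions]


cell = VersionedCell(max_versions=2)
cell.put(100, "1")
cell.put(200, "2")
cell.put(300, "3")
print(cell.get())            # newest value only
print(cell.get(versions=2))  # the two newest values
```

With a limit of 2, the oldest of the three writes is no longer retrievable, while a reader that asks for one version sees only the newest value.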

After adding it, look at the table structure again:

hbase(main):057:0> describe 'student'  
DESCRIPTION                                                                       ENABLED                                       
 'student', {NAME => 'f2', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', R true                                          
 EPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '1', TTL => 'FOREVER                                               
 ', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_                                               
 MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'info', DATA_BLOCK_ENCODING =                                               
 > 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPR                                               
 ESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => '                                               
 false', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}                                                      
1 row(s) in 0.6180 seconds  

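Now there are two column families, f2 and info.

Using TTL to expire data automatically

Here is a more practical example of alter. In production you often need to give a table an expiry time so that old data is cleaned up automatically and the table does not grow without bound. After all, any environment that needs Hadoop is dealing with massive amounts of data.

Now set the TTL of column family f2 to 20 seconds: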

alter 'student', {NAME => 'f2', TTL => 20}  

Then look at the table information again:

hbase(main):061:0> describe 'student'  
DESCRIPTION                                                                       ENABLED                                       
 'student', {NAME => 'f2', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', R true                                          
 EPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '1', TTL => '20 SECO                                               
 NDS', MIN_VERSIONS => '0', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536',                                                
 IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'info', DATA_BLOCK_ENCODIN                                               
 G => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', CO                                               
 MPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS =                                               
 > 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}                                                   
1 row(s) in 0.1540 seconds  

The TTL of f2 is now 20 seconds.

Next, as a test, add a record to f2, wait 20 seconds, and look again:

hbase(main):065:0> put 'student','row3','f2:grade','2'  
0 row(s) in 0.0650 seconds  
  
hbase(main):066:0> scan 'student',{STARTROW=>'row3',LIMIT=>1}  
ROW                              COLUMN+CELL                                                                                    
 row3                            column=f2:grade, timestamp=1420872179176, value=2                                              
 row3                            column=info:age, timestamp=1420817315351, value=18                                             
 row3                            column=info:name, timestamp=1420817294342, value=billy                                         
 row3                            column=info:sid, timestamp=1420817304621, value=3                                              
1 row(s) in 0.0630 seconds  
  
hbase(main):067:0> scan 'student',{STARTROW=>'row3',LIMIT=>1}  
ROW                              COLUMN+CELL                                                                                    
 row3                            column=info:age, timestamp=1420817315351, value=18                                             
 row3                            column=info:name, timestamp=1420817294342, value=billy                                         
 row3                            column=info:sid, timestamp=1420817304621, value=3                                              
1 row(s) in 0.1370 seconds  

Right after the put, row3 still has the f2:grade data, but a little later it is gone.
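The expiry behaviour seen in the two scans can be modelled as a filter on cell age. The Python sketch below is a simplification: visible_cells is a hypothetical helper, the current time is passed in explicitly, and the exact behaviour at precisely 20 seconds is an assumption of this model.

```python
def visible_cells(cells, ttl_seconds, now_ms):
    """cells: list of (timestamp_ms, value). Returns the cells younger than the TTL."""
    return [(ts, v) for ts, v in cells if now_ms - ts < ttl_seconds * 1000]


written_at = 1420872179176                 # timestamp of f2:grade in the scan output
grade = [(written_at, "2")]
print(visible_cells(grade, 20, written_at + 5_000))    # 5 s later: still visible
print(visible_cells(grade, 20, written_at + 25_000))   # 25 s later: expired
```

The cell is never explicitly deleted in this model; it simply stops being returned once it is older than the TTL, which is the effect observed in the shell.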

Deleting a column family with alter

To delete a column family with alter, pass a METHOD parameter with the value delete:

hbase(main):068:0> alter 'student', {NAME => 'f2', METHOD=>'delete'}  
Updating all regions with the new schema...  
0/1 regions updated.  
1/1 regions updated.  
Done.  
0 row(s) in 3.9750 seconds  
  
hbase(main):069:0> describe 'student'  
DESCRIPTION                                                                       ENABLED                                       
 'student', {NAME => 'info', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', true                                          
  REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS                                                
 => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', I                                               
 N_MEMORY => 'false', BLOCKCACHE => 'true'}                                                                                     
1 row(s) in 0.2210 seconds  

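count: count the rows in a table

Unlike in a traditional relational database, this command can take a long time to run, because it has to scan the whole table: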

hbase(main):082:0> count 'student'  
4 row(s) in 0.6410 seconds  
  
=> 4  

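The command also has a curious feature: while counting, it can print the current rowkey every X rows, presumably so you can see how far the count has progressed. For example, here it is with an interval of 2 rows and then with an interval of 1 row: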

hbase(main):083:0> count 'student',2  
Current count: 2, row: row2                                                                                                     
Current count: 4, row: row4                                                                                                     
4 row(s) in 0.0480 seconds  
  
=> 4  
hbase(main):084:0> count 'student',1  
Current count: 1, row: row1                                                                                                     
Current count: 2, row: row2                                                                                                     
Current count: 3, row: row3                                                                                                     
Current count: 4, row: row4                                                                                                     
4 row(s) in 0.0650 seconds  
  
=> 4  
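The interval behaviour shown above amounts to a counter that reports progress whenever the running count is a multiple of the interval. A rough Python equivalent, with toy_count as a hypothetical helper over an in-memory list of rowkeys:

```python
def toy_count(rowkeys, interval=None):
    """Count rows, recording a progress line every `interval` rows."""
    progress = []
    count = 0
    for key in sorted(rowkeys):
        count += 1
        if interval and count % interval == 0:
            progress.append(f"Current count: {count}, row: {key}")
    return count, progress


total, lines = toy_count(["row1", "row2", "row3", "row4"], interval=2)
print("\n".join(lines))
print(total)
```

On a large table a bigger interval keeps the progress output short while still showing that the count is advancing.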

list: show all tables

Use list to show all tables in the current HBase instance:

hbase(main):079:0> list  
TABLE                                                                                                                           
employee                                                                                                                        
employee2                                                                                                                       
student                                                                                                                         
3 row(s) in 0.2020 seconds  
  
=> ["employee", "employee2", "student"]  

status

Shows the service status:

hbase(main):013:0> status  
1 servers, 0 dead, 3.0000 average load  

version

Shows the version number.

whoami

Shows the connected user:

hbase(main):014:0> whoami  
root (auth:SIMPLE)  

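truncate: quickly clear a table's data

This differs from truncate in most databases. When you run truncate, HBase disables the table, drops it, and recreates it; it simply saves you from doing those steps by hand.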

hbase(main):086:0> truncate 'student'  
Truncating 'student' table (it may take a while):  
 - Disabling table...  
 - Dropping table...  
 - Creating table...  
0 row(s) in 4.6330 seconds