HBase shell指令总结

最新推荐文章于 2024-09-29 15:31:41 发布

weixin_30522183

最新推荐文章于 2024-09-29 15:31:41 发布

阅读量205

点赞数

CC 4.0 BY-SA版权

文章标签：大数据 shell 数据库

原文链接：http://www.cnblogs.com/FightingNoob/p/9087253.html

本文详细介绍了HBase Shell中的各种命令及其用法，包括DDL和DML操作、Region管理等，适合HBase初学者及需要复习相关命令的读者。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

hbase是面向列的nosql，其指令较之传统关系型数据库是有所不同的，我们可以利用hbase shell命令行来熟悉hbase的基本指令。

首先进入hbase:　　$HBASE_HOME/bin/hbase shell

输入help指令，可以查看基本命令集合，一般常用的命令如下：

whoami 查用户
help查看基本命令集合
help command 查看命令帮助
list看库中所有表
status 查看当前运行服务器状态
version 版本查询
exists '表名字' 判断表存在

hbase shell中删除为 ctrl + backspace（单按删除键仅能删除掉当前光标所指以后的部分）

下面总结一些常用的hbase指令：

1.DDL部分

　i. 【Namespace】

　　1）创建namespace

　　create_namespace 'ns1'

　　2）描述namespace

　　describe_namespace 'ns1'

　　3）删除namespace

　　drop_namespace 'ns1'

ii.【Table】

　　1）创建表（此处示例包含两个列簇）

　　create 'ns1:t1',{NAME => 'f1',VERSION => 5},{NAME => 'f2'}

　　或create 'ns1:t1', 'f1','f2'

　　2）修改表结构（禁能 -》修改-》使能）

　　disable 't1'

　　alter 't1',{NAME => 'f1'},{NAME => 'f2',METHOD => 'delete'}

　　enable 't1'

　　3）删除表（禁能-》删除）

　　disable 't1'

　　drop 't1'

　　4）描述

　　describe 't1'

此处注明一下hbase表结构属性(指定时属性名须大写)：

　　NAME:列簇名

　　BLOOMFILTER: 过滤storefile，一般为下列三种：

　　　　ROW: 行键过滤

　　　　ROWCOL:行列过滤

　　　　NONE:无

　　VERSIONS：版本数

　　MIN_VERSIONS:最小版本数

　　TTL:版本存活时间，默认为FORVER

　　BLOCKSIZE:数据块大小

　　IN_MEMORY:赋予某些列簇具有较高的优先级（true/false）

　　BLOCKCACHE:数据块缓存，一般常用的列簇我们将其设定为true

2. DML部分

i.【put】

增加一行数据

Put a cell 'value' at specified table/row/column and optionally
timestamp coordinates.  To put a cell value into table 'ns1:t1' or 't1'
at row 'r1' under column 'c1' marked with the time 'ts1', do:

  hbase> put 'ns1:t1', 'r1', 'c1', 'value'
  hbase> put 't1', 'r1', 'c1', 'value'
  hbase> put 't1', 'r1', 'c1', 'value', ts1
  hbase> put 't1', 'r1', 'c1', 'value', {ATTRIBUTES=>{'mykey'=>'myvalue'}}
  hbase> put 't1', 'r1', 'c1', 'value', ts1, {ATTRIBUTES=>{'mykey'=>'myvalue'}}
  hbase> put 't1', 'r1', 'c1', 'value', ts1, {VISIBILITY=>'PRIVATE|SECRET'}

The same commands also can be run on a table reference. Suppose you had a reference
t to table 't1', the corresponding command would be:

  hbase> t.put 'r1', 'c1', 'value', ts1, {ATTRIBUTES=>{'mykey'=>'myvalue'}}

ii. 【get】

获取一行数据，可以指定列簇，列，版本

Get row or cell contents; pass table name, row, and optionally
a dictionary of column(s), timestamp, timerange and versions. Examples:

  hbase> get 'ns1:t1', 'r1'
  hbase> get 't1', 'r1'
  hbase> get 't1', 'r1', {TIMERANGE => [ts1, ts2]}
  hbase> get 't1', 'r1', {COLUMN => 'c1'}
  hbase> get 't1', 'r1', {COLUMN => ['c1', 'c2', 'c3']}
  hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
  hbase> get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
  hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
  hbase> get 't1', 'r1', {FILTER => "ValueFilter(=, 'binary:abc')"}
  hbase> get 't1', 'r1', 'c1'
  hbase> get 't1', 'r1', 'c1', 'c2'
  hbase> get 't1', 'r1', ['c1', 'c2']
  hbsase> get 't1','r1', {COLUMN => 'c1', ATTRIBUTES => {'mykey'=>'myvalue'}}
  hbsase> get 't1','r1', {COLUMN => 'c1', AUTHORIZATIONS => ['PRIVATE','SECRET']}

iii. 【scan】

扫描全表，指定对应行的过滤条件并返回

Scan a table; pass table name and optionally a dictionary of scanner
specifications.  Scanner specifications may include one or more of:
TIMERANGE, FILTER, LIMIT, STARTROW, STOPROW, TIMESTAMP, MAXLENGTH,
or COLUMNS, CACHE

If no columns are specified, all columns will be scanned.
To scan all members of a column family, leave the qualifier empty as in
'col_family:'.

The filter can be specified in two ways:
1. Using a filterString - more information on this is available in the
Filter Language document attached to the HBASE-4176 JIRA
2. Using the entire package name of the filter.

Some examples:

  hbase> scan 'hbase:meta'
  hbase> scan 'hbase:meta', {COLUMNS => 'info:regioninfo'}
  hbase> scan 'ns1:t1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
  hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
  hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]}
  hbase> scan 't1', {REVERSED => true}
  hbase> scan 't1', {FILTER => "(PrefixFilter ('row2') AND
    (QualifierFilter (>=, 'binary:xyz'))) AND (TimestampsFilter ( 123, 456))"}
  hbase> scan 't1', {FILTER =>
    org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)}
For setting the Operation Attributes 
  hbase> scan 't1', { COLUMNS => ['c1', 'c2'], ATTRIBUTES => {'mykey' => 'myvalue'}}
  hbase> scan 't1', { COLUMNS => ['c1', 'c2'], AUTHORIZATIONS => ['PRIVATE','SECRET']}
For experts, there is an additional option -- CACHE_BLOCKS -- which
switches block caching for the scanner on (true) or off (false).  By
default it is enabled.  Examples:

  hbase> scan 't1', {COLUMNS => ['c1', 'c2'], CACHE_BLOCKS => false}

Also for experts, there is an advanced option -- RAW -- which instructs the
scanner to return all cells (including delete markers and uncollected deleted
cells). This option cannot be combined with requesting specific COLUMNS.
Disabled by default.  Example:

  hbase> scan 't1', {RAW => true, VERSIONS => 10}

Besides the default 'toStringBinary' format, 'scan' supports custom formatting
by column.  A user can define a FORMATTER by adding it to the column name in
the scan specification.  The FORMATTER can be stipulated: 

 1. either as a org.apache.hadoop.hbase.util.Bytes method name (e.g, toInt, toString)
 2. or as a custom class followed by method name: e.g. 'c(MyFormatterClass).format'.

Example formatting cf:qualifier1 and cf:qualifier2 both as Integers: 
  hbase> scan 't1', {COLUMNS => ['cf:qualifier1:toInt',
    'cf:qualifier2:c(org.apache.hadoop.hbase.util.Bytes).toInt'] }

示例：scan 'stu_info',{COLUMN => 'info:name',STARTROW => '20180525_10001',ENDROW => '20180525_10005'}

iv.【count】

统计表中数据的记录数，其中默认为每1000行显示一次当前计数，默认缓存大小为10行，如果当前行大小较小，应该增加此项属性值

Count the number of rows in a table.  Return value is the number of rows.
This operation may take a LONG time (Run '$HADOOP_HOME/bin/hadoop jar
hbase.jar rowcount' to run a counting mapreduce job). Current count is shown
every 1000 rows by default. Count interval may be optionally specified. Scan
caching is enabled on count scans by default. Default cache size is 10 rows.
If your rows are small in size, you may want to increase this
parameter. Examples:

 hbase> count 'ns1:t1'
 hbase> count 't1'
 hbase> count 't1', INTERVAL => 100000
 hbase> count 't1', CACHE => 1000
 hbase> count 't1', INTERVAL => 10, CACHE => 1000

v.【deleteall】

删除行

Delete all cells in a given row; pass a table name, row, and optionally
a column and timestamp. Examples:

  hbase> deleteall 'ns1:t1', 'r1'
  hbase> deleteall 't1', 'r1'
  hbase> deleteall 't1', 'r1', 'c1'
  hbase> deleteall 't1', 'r1', 'c1', ts1
  hbase> deleteall 't1', 'r1', 'c1', ts1, {VISIBILITY=>'PRIVATE|SECRET'}

其中不指定列，为删除整行

3. Region管理

i.【移动region】move

Move a region.  Optionally specify target regionserver else we choose one
at random.  NOTE: You pass the encoded region name, not the region name so
this command is a little different to the others.  The encoded region name
is the hash suffix on region names: e.g. if the region name were
TestTable,0094429456,1289497600452.527db22f95c8a9e0116f0cc13c680396. then
the encoded region name portion is 527db22f95c8a9e0116f0cc13c680396
A server name is its host, port plus startcode. For example:
host187.example.com,60020,1289493121758
Examples:

  hbase> move 'ENCODED_REGIONNAME'
  hbase> move 'ENCODED_REGIONNAME', 'SERVER_NAME'

ii.【开启/关闭region】

Enable/Disable balancer. Returns previous balancer state.
Examples:

  hbase> balance_switch true
  hbase> balance_switch false

iii.【手动split】

Split entire table or pass a region to split individual region.  With the 
second parameter, you can specify an explicit split key for the region.  
Examples:
    split 'tableName'
    split 'namespace:tableName'
    split 'regionName' # format: 'tableName,startKey,id'
    split 'tableName', 'splitKey'
    split 'regionName', 'splitKey'

iv.【手动触发major compact】

Run major compaction on passed table or pass a region row
          to major compact an individual region. To compact a single
          column family within a region specify the region name
          followed by the column family name.
          Examples:
          Compact all regions in a table:
          hbase> major_compact 't1'
          hbase> major_compact 'ns1:t1'
          Compact an entire region:
          hbase> major_compact 'r1'
          Compact a single column family within a region:
          hbase> major_compact 'r1', 'c1'
          Compact a single column family within a table:
          hbase> major_compact 't1', 'c1'