HBase study notes - operational tasks

Latest recommended article published on 2022-10-02 15:20:51

License: CC 4.0 BY-SA

Tags: #big data #ui #shell

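This article describes how to add and remove nodes in HBase, including how to stop and start a RegionServer, use the load balancer, and perform rolling restarts. It also covers the data import and export tools, the table copy tool, and bulk loading of data.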

[color=red][size=medium]Node Decommissioning[/size][/color]
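1. Stop the region server directly, on the node you want to decommission:
$ ./bin/hbase-daemon.sh stop regionserver
Disable the load balancer before decommissioning a node, and reenable it afterward:
hbase(main):001:0> balance_switch false
hbase(main):002:0> balance_switch true
2. Alternatively, use the graceful_stop.sh script, which moves the regions off the server before stopping it:
$ ./bin/graceful_stop.sh HOSTNAME
where HOSTNAME is the host carrying the region server you want to decommission.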
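
The notes above can be read as one sequence: switch the balancer off, drain and stop the server with graceful_stop.sh, then switch the balancer back on. The following is a minimal sketch of that reading, not a tested procedure. The function name decommission_plan and the host rs1.example.com are hypothetical, and the function only prints the commands so they can be reviewed before being run from the HBase install directory.

```shell
# Hypothetical helper: print the decommissioning commands for one host.
# It only prints; nothing is executed against the cluster.
decommission_plan() {
  plan_host="$1"
  if [ -z "$plan_host" ]; then
    echo "usage: decommission_plan HOSTNAME" >&2
    return 1
  fi
  # Keep the balancer from moving regions while the node drains.
  echo 'echo "balance_switch false" | ./bin/hbase shell'
  # Move regions off the region server, then stop it.
  echo "./bin/graceful_stop.sh $plan_host"
  # Reenable the balancer once the node is out of the cluster.
  echo 'echo "balance_switch true" | ./bin/hbase shell'
}

decommission_plan rs1.example.com
```

Printing instead of executing keeps the sketch safe to try on a machine without HBase installed.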

[color=red][size=medium]Rolling Restarts[/size][/color]
1. Unpack your release, make sure of its configuration, and then rsync it across
the cluster. If you are using version 0.90.2, patch it with HBASE-3744 and
HBASE-3756.
2. Run hbck to ensure the cluster is consistent:
$ ./bin/hbase hbck
Effect repairs if inconsistent.
3. Restart the master:
$ ./bin/hbase-daemon.sh stop master; ./bin/hbase-daemon.sh start master
4. Disable the region balancer:
$ echo "balance_switch false" | ./bin/hbase shell
5. Run the graceful_stop.sh script per region server. For example:
$ for i in `cat conf/regionservers|sort`; do ./bin/graceful_stop.sh \
--restart --reload --debug $i; done &> /tmp/log.txt &
If you are running Thrift or REST servers on the region server, pass the --thrift
or --rest option, as per the script’s usage instructions (run it
without any command-line options to get the instructions).
6. Restart the master again. This will clear out the dead servers list and reenable the
balancer.
7. Run hbck to ensure the cluster is consistent.

[color=red][size=large]Pseudodistributed mode[/size][/color]
[color=red][size=medium]Adding Servers[/size][/color]
Starting a local backup master process is accomplished by
using the local-master-backup.sh script in the bin directory, like so:
$ ./bin/local-master-backup.sh start 1
The number at the end of the command signifies an offset that is added to the default ports of 60000 for RPC and 60010 for the web-based UI. In this example, a new master process would be started that reads the same configuration files as usual, but would listen on ports 60001 and 60011, respectively.

$ ./bin/local-master-backup.sh start 1 3 5
This starts three backup masters on ports 60001, 60003, and 60005 for RPC, plus 60011, 60013, and 60015 for the web UIs.
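
The offset arithmetic can be checked with a few lines of shell. This is only an illustration of the port calculation described above; backup_master_ports is a hypothetical helper and is not part of HBase.

```shell
# Default master ports as stated above: 60000 (RPC) and 60010 (web UI).
# backup_master_ports is a hypothetical helper, not part of HBase.
backup_master_ports() {
  for offset in "$@"; do
    echo "offset $offset: rpc=$((60000 + offset)) ui=$((60010 + offset))"
  done
}

backup_master_ports 1 3 5
```

For the offsets 1, 3, and 5 this prints the RPC ports 60001, 60003, and 60005 and the web UI ports 60011, 60013, and 60015.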

Stopping the backup master(s) involves the same command, but replacing the start command with the aptly named stop, like so:
$ ./bin/local-master-backup.sh stop 1

[color=red][size=medium]Adding a local region server.[/size][/color]
$ ./bin/local-regionservers.sh start 1
This command will start an additional region server using port 60201 for RPC, and 60301 for the web UI.
Starting more than one region server is accomplished by adding more offsets:
$ ./bin/local-regionservers.sh start 1 2 3
Stopping any additional region server involves replacing the start command with the stop command:
$ ./bin/local-regionservers.sh stop 1

[color=red][size=large]Fully distributed cluster[/size][/color]

[color=red][size=medium]Adding a backup master.[/size][/color]

The master process uses ZooKeeper to negotiate which is the currently active master: there is a dedicated ZooKeeper znode that all master processes race to create, and the first one to create it wins. This happens at startup and the winning process moves on to become the current master. All other machines simply loop around the znode check and wait for it to disappear, which triggers the race again.
The /hbase/master znode is ephemeral, and is the same kind the region servers use to report their presence. When the master process that created the znode fails, ZooKeeper will notice the end of the session with that server and remove the znode accordingly, triggering the election process.
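
The create-if-absent race can be mimicked on a local filesystem. The sketch below is an analogy only and does not involve ZooKeeper: a scratch directory stands in for the /hbase/master znode, because mkdir also succeeds for exactly one caller. All names in it (ZNODE_DIR, try_become_master, release_master) are hypothetical.

```shell
# Local-filesystem analogy only: a directory stands in for the
# /hbase/master znode, because mkdir, like znode creation, succeeds
# for exactly one caller. No ZooKeeper is involved.
ZNODE_DIR="${TMPDIR:-/tmp}/hbase-master-demo.$$"

try_become_master() {
  # $1 is a label for the contending master process.
  if mkdir "$ZNODE_DIR" 2>/dev/null; then
    echo "$1: active"
  else
    echo "$1: backup"
  fi
}

release_master() {
  # Stands in for ZooKeeper removing the ephemeral znode
  # when the active master's session ends.
  rmdir "$ZNODE_DIR"
}

try_become_master master-a   # creates the directory first
try_become_master master-b   # finds it taken and stays a backup
release_master               # the active master goes away
try_become_master master-b   # the race runs again
release_master               # clean up the demo directory
```

Unlike a real ephemeral znode, the directory is not removed automatically when its creator dies, which is why the sketch needs an explicit release_master step.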
Starting a master on multiple machines requires that each one is configured just like the rest of the HBase cluster. The master servers usually share the same configuration with the other servers in the cluster. Once you have confirmed that this is set up appropriately, you can run the following command on a server that is supposed to host the backup master:
$ ./bin/hbase-daemon.sh start master
Assuming you already had a master running, this command will bring up the new master to the point where it waits for the znode to be removed. If you want to start many masters in an automated fashion and dedicate a specific server to host the current one, while all the others are considered backup masters, you can add the --backup switch like so:
$ ./bin/hbase-daemon.sh start master --backup

Since HBase 0.90.x, there is also the option of creating a backup-masters file in the conf directory. This is akin to the regionservers file, listing one hostname per line that is supposed to start a backup master. Assuming three backup masters running on the ZooKeeper servers, the conf/backup-masters file would contain these entries:
zk1.foo.com
zk2.foo.com
zk3.foo.com

[color=red][size=medium]Adding a region server.[/size][/color]
The first thing you should do is to edit the regionservers file in the conf directory, to enable the launcher scripts to automate the server start and stop procedure. Simply add a new line to the file specifying the hostname to add.
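
Editing the file by hand is enough, but when the step is scripted it helps to avoid listing a host twice. The following is a minimal sketch under that assumption; add_regionserver is a hypothetical helper, not an HBase script, and it expects the file to end with a newline.

```shell
# Hypothetical helper: append a hostname to a regionservers file
# unless an identical line is already present.
add_regionserver() {
  rs_file="$1"
  rs_host="$2"
  if grep -qxF "$rs_host" "$rs_file" 2>/dev/null; then
    echo "$rs_host already listed in $rs_file"
  else
    echo "$rs_host" >> "$rs_file"
    echo "added $rs_host to $rs_file"
  fi
}
```

A call such as add_regionserver conf/regionservers rs4.example.com (the hostname is made up) would then be safe to repeat.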

Then you have a few choices to start the new region server process. One option is to run the start-hbase.sh script on the master machine.

Another option is to use the launcher script directly on the new server. This is done like so:
$ ./bin/hbase-daemon.sh start regionserver

[color=red][size=large]Data Tasks[/size][/color]
[color=red][size=medium]Import and Export Tools[/size][/color]
HBase ships with a handful of useful tools, two of which are the Import and Export
MapReduce jobs. They can be used to write subsets, or an entire table, to files in HDFS,
and subsequently load them again. They are contained in the HBase JAR file and you
need the hadoop jar command to get a list of the tools:
$ hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar
Adding the export program name then displays the options for its usage:
$ hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar export
You need to specify the parameters from left to right, and you cannot omit any in between.
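
The positional rule is easier to see with a concrete command line. The sketch below assumes the parameter order tablename, outputdir, then the optional versions, starttime, and endtime, as in the Export usage text of HBase releases from that era. Check the usage output of your own release before relying on it. export_cmd is a hypothetical helper that only prints the command.

```shell
# Hypothetical helper: print an export command line.
# Assumed positional order: tablename outputdir [versions [starttime [endtime]]].
# The JAR name is the one used in this article; adjust it to your release.
export_cmd() {
  if [ "$#" -lt 2 ] || [ "$#" -gt 5 ]; then
    echo "usage: export_cmd TABLE OUTPUTDIR [VERSIONS [STARTTIME [ENDTIME]]]" >&2
    return 1
  fi
  echo "hadoop jar \$HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar export $*"
}

# To limit the time range, versions must be given too, because it comes first.
export_cmd testtable /user/larsgeorge/backup-testtable 1 1265875194289 1265878794289
```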

Running the command will start the MapReduce job and print out the progress:
$ hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar export \
testtable /user/larsgeorge/backup-testtable

Once the job is complete, you can check the filesystem for the exported data. Use the
hadoop dfs command:
$ hadoop dfs -lsr /user/larsgeorge/backup-testtable

Importing the data is the reverse operation. First we can get the usage details by invoking
the command without any parameters, and then we can start the job with the table name and inputdir (the directory containing the exported files):
$ hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar import
ERROR: Wrong number of arguments: 0
Usage: Import <tablename> <inputdir>

$ hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar import \
testtable /user/larsgeorge/backup-testtable

[color=red][size=medium]CopyTable Tool[/size][/color]
Another supplied tool is CopyTable, which is primarily designed to bootstrap cluster replication. You can use it to make a copy of an existing table from the master cluster to the slave cluster. Here are its command-line options:
$ hadoop jar $HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar copytable
Examples:
To copy 'TestTable' to a cluster that uses replication for a 1 hour window:
$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
--rs.class=org.apache.hadoop.hbase.ipc.ReplicationRegionInterface \
--rs.impl=org.apache.hadoop.hbase.regionserver.replication.ReplicationRegionServer \
--starttime=1265875194289 --endtime=1265878794289 \
--peer.adr=server1,server2,server3:2181:/hbase TestTable
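
The two timestamps in the example are epoch milliseconds exactly one hour apart. A small sketch of how such a window could be computed follows; copytable_window is a hypothetical helper, and only the --starttime and --endtime option names come from the example above.

```shell
# Hypothetical helper: print --starttime/--endtime options for a window
# of the given number of hours, starting at START_MS (epoch milliseconds).
copytable_window() {
  start_ms="$1"
  hours="$2"
  end_ms=$((start_ms + hours * 3600 * 1000))
  echo "--starttime=$start_ms --endtime=$end_ms"
}

copytable_window 1265875194289 1
```

With the start time from the example and a window of one hour, this reproduces the end time 1265878794289.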

[color=red][size=medium]Bulk Import[/size][/color]
Bulk load procedure
The HBase bulk load process consists of two main steps:
1. Preparation of data
2. Load data
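
As a rough sketch, the two steps could map onto the importtsv and completebulkload tools as follows. This rests on several assumptions: the JAR name follows the other examples in this article, importtsv accepts the -Dimporttsv.columns and -Dimporttsv.bulk.output properties, and the table, column, and paths are made up. bulk_load_plan is a hypothetical helper that only prints the commands.

```shell
# Hypothetical helper: print the two bulk load commands.
# Arguments: TABLE COLUMNS TSV_INPUT_DIR HFILE_OUTPUT_DIR
# Assumptions: the JAR name follows the other examples in this article,
# and importtsv accepts -Dimporttsv.columns and -Dimporttsv.bulk.output.
bulk_load_plan() {
  jar="\$HBASE_HOME/hbase-0.91.0-SNAPSHOT.jar"
  # Step 1: prepare the data as HFiles instead of writing to the table.
  echo "hadoop jar $jar importtsv -Dimporttsv.columns=$2 -Dimporttsv.bulk.output=$4 $1 $3"
  # Step 2: load the prepared HFiles into the running table.
  echo "hadoop jar $jar completebulkload $4 $1"
}

bulk_load_plan testtable HBASE_ROW_KEY,colfam1:col1 \
  /user/larsgeorge/tsv-input /user/larsgeorge/hfile-output
```

The output directory of the first command is the input of the second, which is why it appears in both lines.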

[color=red][size=medium]Using the importtsv tool[/size][/color]

[color=red][size=medium]Using the completebulkload Tool[/size][/color]