Some Research on HBase
1. The Architecture of HBase
There are three major components of the HBase architecture:
1. The HBaseMaster (analogous to the Bigtable master server)
2. The HRegionServer (analogous to the Bigtable tablet server)
3. The HBase client, defined by org.apache.hadoop.hbase.client.HTable
1) HBase Master duties:
- Cluster initialization
- Assigning regions to, and unassigning them from, HRegionServers (unassigning is used for load balancing)
- Monitoring the health and load of each HRegionServer
- Handling table schema changes and other table administrative functions
- Data localization
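The assignment and load-balancing duties can be pictured with a toy model. This is only a sketch, not HBase's actual balancing algorithm: the function name rebalance and the policy (move one region at a time from the most loaded server to the least loaded one until the loads differ by at most one) are assumptions made for illustration.

```python
def rebalance(assignments):
    """Even out region counts across servers (hypothetical policy).

    `assignments` maps a server name to the list of regions it serves.
    Returns the list of (region, source, target) moves that were made.
    """
    moves = []
    if len(assignments) < 2:
        return moves
    while True:
        busiest = max(assignments, key=lambda s: len(assignments[s]))
        idlest = min(assignments, key=lambda s: len(assignments[s]))
        if len(assignments[busiest]) - len(assignments[idlest]) <= 1:
            return moves
        region = assignments[busiest].pop()    # unassign from the loaded server
        assignments[idlest].append(region)     # assign to the idle server
        moves.append((region, busiest, idlest))


servers = {"rs1": ["r1", "r2", "r3", "r4"], "rs2": [], "rs3": ["r5"]}
moves = rebalance(servers)
```

After the call, no server holds more than one region above any other, and every region is still assigned to exactly one server.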
2) HBase Region Server duties:
- Serving the HRegions assigned to it
- Handling client read and write requests
- Flushing the cache to HDFS
- Maintaining the HLog
- Compactions
- Region splits
3) HBase Client
HBase is a heavy-client system: each client manages its own connections to the appropriate servers.
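A rough picture of what "heavy client" means: the client itself decides which server owns a row and keeps one connection per server. The sketch below is a toy model under stated assumptions. The class ToyClient, its methods, and the hard-coded region map are hypothetical; a real client obtains region locations from the cluster rather than from a constructor argument.

```python
import bisect


class ToyClient:
    """Hypothetical client that caches region locations and reuses connections."""

    def __init__(self, region_map):
        # region_map: (start_key, server) pairs; the first start key is "".
        self._regions = sorted(region_map)
        self._start_keys = [start for start, _ in self._regions]
        self._connections = {}   # server name -> stand-in connection object

    def locate(self, row_key):
        # The owning region is the last one whose start key is <= row_key.
        index = bisect.bisect_right(self._start_keys, row_key) - 1
        return self._regions[index][1]

    def connection_for(self, row_key):
        server = self.locate(row_key)
        if server not in self._connections:
            # Open a connection lazily, once per server.
            self._connections[server] = {"server": server, "requests": 0}
        conn = self._connections[server]
        conn["requests"] += 1
        return conn


client = ToyClient([("", "rs1"), ("g", "rs2"), ("p", "rs3")])
first = client.connection_for("apple")
second = client.connection_for("banana")
third = client.connection_for("zebra")
```

Here "apple" and "banana" fall in the same region, so both requests reuse the single connection to rs1, while "zebra" opens a separate connection to rs3.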
2. The Process of Committing and Updating Data
Write Requests
When a write request is received, it is first written to a write-ahead log called an HLog. Write requests for every region that the region server is serving are written to the same log. Once the request has been written to the HLog, it is stored in an in-memory cache called the Memcache. There is one Memcache for each HStore.
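The ordering described above (log first, then cache) can be sketched as a toy model. The class ToyRegionServer and its put method are hypothetical names, and identifying an HStore by a (region, column family) pair is an assumption of this sketch, not something stated here.

```python
class ToyRegionServer:
    """Toy model of the write path: shared HLog first, then the HStore's Memcache."""

    def __init__(self):
        self.hlog = []        # a single log shared by every region on this server
        self.memcaches = {}   # (region, family) -> {row: value}, one per HStore

    def put(self, region, family, row, value):
        # Step 1: record the edit in the write-ahead log.
        self.hlog.append((region, family, row, value))
        # Step 2: only after logging, apply the edit to the in-memory cache.
        self.memcaches.setdefault((region, family), {})[row] = value


server = ToyRegionServer()
server.put("region-a", "f1", "r1", "v1")
server.put("region-b", "f1", "r9", "v2")
```

Both edits land in the same hlog list, but each goes to the Memcache of its own store.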
Read Requests
Reads are handled by first checking the Memcache; if the requested data is not found there, the MapFiles are searched for results.
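A minimal sketch of this lookup order, with dictionaries standing in for the Memcache and the MapFiles. The function name read is hypothetical, and searching the MapFiles newest first is an assumption of this sketch rather than a description of the real store.

```python
def read(row, memcache, mapfiles):
    """Look up `row` in the Memcache, then fall back to the MapFiles.

    `memcache` is a dict; `mapfiles` is a list of dicts, oldest first.
    Returns the value, or None if the row is not stored anywhere.
    """
    if row in memcache:
        return memcache[row]
    for mapfile in reversed(mapfiles):   # assumption: newest file wins
        if row in mapfile:
            return mapfile[row]
    return None


memcache = {"r3": "new"}
mapfiles = [{"r1": "old", "r2": "x"}, {"r1": "newer"}]
from_cache = read("r3", memcache, mapfiles)
from_newest_file = read("r1", memcache, mapfiles)
from_oldest_file = read("r2", memcache, mapfiles)
missing = read("zz", memcache, mapfiles)
```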
Cache Flushes
When the Memcache reaches a configurable size, it is flushed to disk, creating a new MapFile. A marker is then written to the HLog so that, when the log is replayed, entries written before the last flush can be skipped. A flush may also be triggered to relieve memory pressure on the region server.
Cache flushes happen concurrently with the region server processing read and write requests. Just before the new MapFile is moved into place, reads and writes are suspended until the MapFile has been added to the list of active MapFiles for the HStore.
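The role of the flush marker during log replay can be sketched as follows. This is a single-threaded toy model under stated assumptions: the class ToyStore is hypothetical, the flush threshold counts entries rather than bytes, and the brief suspension of reads and writes is not modeled.

```python
class ToyStore:
    """Toy store: Memcache, flushes to MapFile snapshots, and HLog replay."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold   # entries, not bytes (assumption)
        self.hlog = []       # entries: ("put", row, value) or ("flush",)
        self.memcache = {}
        self.mapfiles = []   # sorted dict snapshots, oldest first

    def put(self, row, value):
        self.hlog.append(("put", row, value))
        self.memcache[row] = value
        if len(self.memcache) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the cache out as a new MapFile, then mark the log.
        self.mapfiles.append(dict(sorted(self.memcache.items())))
        self.memcache = {}
        self.hlog.append(("flush",))

    def replay(self):
        """Rebuild the Memcache from log entries after the last flush marker."""
        last_flush = -1
        for index, entry in enumerate(self.hlog):
            if entry[0] == "flush":
                last_flush = index
        recovered = {}
        for _, row, value in self.hlog[last_flush + 1:]:
            recovered[row] = value
        return recovered


store = ToyStore(flush_threshold=3)
for row, value in [("a", 1), ("b", 2), ("c", 3), ("d", 4)]:
    store.put(row, value)
recovered = store.replay()
```

The first three puts trigger a flush, so replay only has to recover the fourth edit; the earlier ones are already safe in a MapFile.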
3. Basic Operations on HBase Tables
Operation | HBase shell Command |
Create Table | create 't1', {NAME => 'f1', VERSIONS => 5} |
Add the column family | alter 't1', {NAME => 'f1', VERSIONS => 5} |
Delete the column family | alter 't1', {NAME => 'f1', METHOD => 'delete'} |
Get row | get 't1', 'r1' |
Get cell content | get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1} |
List all tables | list |
Count the rows in a table | count 't1' |
Scan | 1. scan '.META.', {COLUMNS => 'info:regioninfo'} 2. scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'} (scanner specifications may include one or more of the following: LIMIT, STARTROW, STOPROW, TIMESTAMP, or COLUMNS) |
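To make the scanner options in the last row more concrete, here is a toy model of STARTROW, STOPROW, and LIMIT over an in-memory table. The function scan below is a hypothetical Python stand-in, not the shell command. It assumes rows are visited in sorted key order, that the start row is inclusive, and that the stop row is exclusive.

```python
def scan(rows, startrow="", stoprow=None, limit=None):
    """Return (row_key, value) pairs in key order, honoring the scan bounds.

    `rows` is a dict mapping row key to value.
    """
    results = []
    for key in sorted(rows):
        if limit is not None and len(results) >= limit:
            break
        if key < startrow:
            continue              # before the (inclusive) start row
        if stoprow is not None and key >= stoprow:
            break                 # reached the (exclusive) stop row
        results.append((key, rows[key]))
    return results


table = {"abc": 1, "xyz": 2, "xza": 3, "yaa": 4, "zzz": 5}
limited = scan(table, startrow="xyz", limit=2)
bounded = scan(table, startrow="xyz", stoprow="yaa")
```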
4. Checking the Status
1) Check the DFS namenode status: http://{DFSNameNodeIP}:50070/dfshealth.jsp
2) Check the HBase status: http://{masterServerIP}:60010/master.jsp
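Both checks can be scripted. The sketch below builds the two URLs from the templates above and probes them with the Python standard library. The function names status_urls and is_up are hypothetical, the IP addresses are placeholders, and treating an HTTP 200 response as "healthy" is an assumption of this sketch.

```python
from urllib.request import urlopen


def status_urls(namenode_ip, master_ip):
    """Fill in the two status-page URL templates."""
    return {
        "dfs": f"http://{namenode_ip}:50070/dfshealth.jsp",
        "hbase": f"http://{master_ip}:60010/master.jsp",
    }


def is_up(url, timeout=5):
    """Return True if the page answers with HTTP 200 within the timeout."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:   # covers connection errors, timeouts, and HTTP errors
        return False


urls = status_urls("192.0.2.10", "192.0.2.11")   # placeholder addresses
# for name, url in urls.items():
#     print(name, "up" if is_up(url) else "down")
```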
5. Errors Encountered and Their Solutions
Error | Root cause | Resolution |
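org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /tmp/hadoop-root/dfs/data: namenode namespaceID = 1743665947; datanode namespaceID = 352063137 | The namespaceID stored by the DataNode does not match the NameNode's (typically after the NameNode has been reformatted) | rm -rf /tmp/* (this also deletes the DataNode's data under /tmp/hadoop-root/dfs/data) |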
NativeException: org.apache.hadoop.hbase.TableNotDisabledException: org.apache.hadoop.hbase.TableNotDisabledException: t1 | The table must be disabled before its column families are altered | hbase> disable 't1' |
6. How to Deploy HBase
http://hi.baidu.com/webcell/blog/item/83ee17303e7d2391a8018e5d.html/cmtid/1ae33bfa2d7bfe14a9d311b5
7. How to Tune HBase Performance
http://wiki.apache.org/hadoop/PerformanceTuning