HBase 第一篇-优快云博客

本文链接：https://blog.youkuaiyun.com/iteye_8075/article/details/82235410

本文详细介绍了HBase，一种基于Hadoop的数据库，适用于大规模数据存储和实时读写操作。它提供了线性和模块化的可扩展性，严格一致的读写特性，以及自动分片、故障转移支持等功能。通过利用HDFS提供的分布式数据存储，HBase能够高效地处理亿级行乘以百万级列的大表数据，同时提供快速记录查找和更新能力。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

Welcome to Apache HBase!

HBase is the Hadoop database. Think of it as a distributed scalable Big Data store.

hadoop 的database，类似与google的Big Table

When Would I Use HBase?

Use HBase when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware. HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable: A Distributed Storage System for Structured by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.

Features

HBase provides:

Linear and modular scalability.
Strictly consistent reads and writes.
Automatic and configurable sharding of tables
Automatic failover support between RegionServers.
Convenient base classes for backing Hadoop MapReduce jobs with HBase tables.
Easy to use Java API for client access.
Block cache and Bloom Filters for real-time queries.
Query predicate push down via server side Filters
Thrift gateway and a REST-ful Web service that supports XML, Protobuf, and binary data encoding options
Extensible jruby-based (JIRB) shell
Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia; or via JMX

When Should I Use HBase?(这个据对的是重点哦，找了很久，貌似要找到适合我应用的框架了，赞。 fast record lookups (and updates) )

First, make sure you have enough data. HBase isn't suitable for every problem. If you have hundreds of millions or billions of rows, then HBase is a good candidate. If you only have a few thousand/million rows, then using a traditional RDBMS might be a better choice due to the fact that all of your data might wind up on a single node (or two) and the rest of the cluster may be sitting idle.

Second, make sure you have enough hardware. Even HDFS doesn't do well with anything less than 5 DataNodes (due to things such as HDFS block replication which has a default of 3), plus a NameNode.

HBase can run quite well stand-alone on a laptop - but this should be considered a development configuration only.

What Is The Difference Between HBase and Hadoop/HDFS?

HDFS is a distributed file system that is well suited for the storage of large files. It's documentation states that it is not, however, a general purpose file system, and does not provide fast individual record lookups in files. HBase, on the other hand, is built on top of HDFS and provides fast record lookups (and updates) for large tables. This can sometimes be a point of conceptual confusion. HBase internally puts your data in indexed "StoreFiles" that exist on HDFS for high-speed lookups. See the Chapter 5, Data Model and the rest of this chapter for more information on how HBase achieves its goals.