A note to self: start by translating the HBase docs. Hadoop is not necessarily needed in full, but HBase is essential here (it is built on Hadoop's HDFS and so actually depends on Hadoop; for single-machine testing you do not need to install or configure Hadoop, but for a distributed deployment you still do). I also took a look at Cassandra and Accumulo; they are all much the same, though I have not gone down to the source-code level.
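To illustrate that standalone-versus-distributed point, here is a minimal sketch of hbase-site.xml. The file:///tmp/hbase path and the hdfs://namenode:8020 address are made-up placeholders, not tested values; only the property names (hbase.rootdir, hbase.cluster.distributed) come from HBase itself.

```xml
<!-- hbase-site.xml sketch for standalone mode: data goes to the local
     filesystem, so no Hadoop installation is needed. -->
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <!-- placeholder path on the local filesystem -->
    <value>file:///tmp/hbase</value>
  </property>
  <!-- For a fully-distributed setup you would instead point hbase.rootdir
       at HDFS (e.g. hdfs://namenode:8020/hbase, a placeholder address) and
       set hbase.cluster.distributed to true; that is where Hadoop comes in. -->
</configuration>
```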
When Would I Use HBase?
Use HBase when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware. HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.
Features
- Linear and modular scalability.
- Strictly consistent reads and writes.
- Automatic and configurable sharding of tables
- Automatic failover support between RegionServers.
- Convenient base classes for backing Hadoop MapReduce jobs with HBase tables.
- Easy to use Java API for client access (see the client sketch after this list).
- Block cache and Bloom Filters for real-time queries.
- Query predicate push down via server side Filters
- Thrift gateway and a REST-ful Web service that supports XML, Protobuf, and binary data encoding options
- Extensible jruby-based (JIRB) shell
- Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia; or via JMX
(A side note on the Bloom filter item above: a Bloom filter is a fairly efficient hash-based in-memory index with a built-in tension: on the one hand it can test very quickly whether an element is present, on the other hand it unavoidably has some false-positive rate.)
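To make the "Java API" and "server-side Filters" items concrete, here is a minimal client sketch. It assumes the pre-1.0 HBase client API (HTable-style); the table name 'test', column family 'cf', and the row keys and values are made-up placeholders for illustration, and the table is assumed to exist already.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseClientSketch {
    public static void main(String[] args) throws IOException {
        // Reads hbase-site.xml from the classpath.
        Configuration conf = HBaseConfiguration.create();
        // Assumes a table 'test' with column family 'cf' already exists.
        HTable table = new HTable(conf, "test");
        try {
            // Write one cell: row 'row1', column 'cf:a', value 'value1'.
            Put put = new Put(Bytes.toBytes("row1"));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("a"), Bytes.toBytes("value1"));
            table.put(put);

            // Random, realtime read of the same row.
            Get get = new Get(Bytes.toBytes("row1"));
            Result result = table.get(get);
            System.out.println("get: " + Bytes.toString(
                    result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("a"))));

            // Scan with a server-side filter: the predicate is evaluated on
            // the RegionServers, so only matching rows come back to the client.
            Scan scan = new Scan();
            scan.setFilter(new SingleColumnValueFilter(
                    Bytes.toBytes("cf"), Bytes.toBytes("a"),
                    CompareFilter.CompareOp.EQUAL, Bytes.toBytes("value1")));
            ResultScanner scanner = table.getScanner(scan);
            try {
                for (Result r : scanner) {
                    System.out.println("scan: " + r);
                }
            } finally {
                scanner.close();
            }
        } finally {
            table.close();
        }
    }
}
```

The filter on the scan is the "query predicate push down" from the feature list: the comparison runs where the data lives, so unmatched rows never cross the network.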
How to start JIRB:
$ ./bin/hbase org.jruby.Main PATH_TO_SCRIPT
PATH_TO_SCRIPT points to a .rb file. Ruby and Python really are quite popular these days...
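Besides running a script file this way, the same JIRB-based shell can be used interactively via bin/hbase shell. A quick throwaway session might look like the following (the table name 'test' and column family 'cf' are placeholders, matching the sketch above):

```
$ ./bin/hbase shell
hbase> create 'test', 'cf'
hbase> put 'test', 'row1', 'cf:a', 'value1'
hbase> get 'test', 'row1'
hbase> scan 'test'
```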