Key Implementation Details of the HBase Block Cache and Characteristics of the In-Memory Cache

HBase Optimization Tips
This article describes HBase's cache management, including how the block cache works and how to configure the in-memory cache sensibly for better performance. It also covers row key design principles, column family best practices, data versioning, and data time-to-live management.
Every time a block is loaded into the cache, HBase checks whether the cache's current size has crossed a "watermark": a configured safe ratio of the block cache's current total size to its nominal capacity. The default value is 0.85, meaning that once loading a block pushes the total size past 85% of the configured capacity, an asynchronous evict operation is triggered.

The evict logic works as follows: iterate over all blocks in the cache and, according to their priority level (single, multi, in-memory), distribute them into three priority queues, where the head of each queue is the oldest block (the one with the smallest last-access time). The head elements of these three queues are then evicted in turn to free space.
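The three-queue eviction described above can be sketched as follows. This is a simplified illustration, not HBase's actual LruBlockCache code: the `Block` class, its fields, and the round-robin eviction order are stand-ins chosen to mirror the prose.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class EvictionSketch {
    // The three priority levels a cached block can belong to.
    enum Priority { SINGLE, MULTI, IN_MEMORY }

    static class Block {
        final String name;
        final long size;
        final long lastAccess; // smaller value = older block
        final Priority priority;
        Block(String name, long size, long lastAccess, Priority priority) {
            this.name = name; this.size = size;
            this.lastAccess = lastAccess; this.priority = priority;
        }
    }

    // Distribute blocks into three queues ordered by last-access time,
    // then evict the head of each queue in turn until enough bytes are freed.
    static List<String> evict(List<Block> cache, long bytesToFree) {
        List<PriorityQueue<Block>> queues = new ArrayList<>();
        for (int i = 0; i < Priority.values().length; i++) {
            queues.add(new PriorityQueue<>(
                (a, b) -> Long.compare(a.lastAccess, b.lastAccess)));
        }
        for (Block b : cache) {
            queues.get(b.priority.ordinal()).add(b);
        }
        List<String> evicted = new ArrayList<>();
        long freed = 0;
        outer:
        while (freed < bytesToFree) {
            boolean polledAny = false;
            for (PriorityQueue<Block> q : queues) {
                Block head = q.poll(); // oldest block of this priority
                if (head == null) continue;
                polledAny = true;
                evicted.add(head.name);
                freed += head.size;
                if (freed >= bytesToFree) break outer;
            }
            if (!polledAny) break; // all queues drained
        }
        return evicted;
    }
}
```

Note how an in-memory block participates in eviction just like the other two kinds, which is exactly the point the next paragraph makes.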

So an in-memory block is not fundamentally different from the other block types: it does not stay in the cache indefinitely. When new in-memory blocks keep being accessed and the in-memory cache has already reached its limit, old in-memory blocks get evicted, unless the total size of all in-memory blocks is smaller than the in-memory cache.

Where in-memory blocks do differ from the other two kinds is that the "in-memory" attribute is assigned statically (set on the column family) and, unlike the other two categories, never changes with access frequency. This guarantees the segment's independence: blocks of the other two kinds are never promoted into the in-memory segment no matter how often they are accessed, while an in-memory block is always placed in the in-memory segment regardless of how many times it has been accessed.

Given these characteristics of the in-memory cache, the following points deserve special emphasis:

1. The total size of column families marked IN_MEMORY=>'true' should preferably not exceed the size of the in-memory cache (in-memory cache = heap size * hfile.block.cache.size * 0.85 * 0.25). In particular, when their total size far exceeds the in-memory cache, severe thrashing occurs in the in-memory cache.
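As a concrete instance of the formula above, here is the arithmetic for an illustrative 8 GB heap with hfile.block.cache.size at its default of 0.4 (the heap size is an assumption, not from the original text):

```java
public class InMemoryCacheSize {
    // in-memory cache = heap size * hfile.block.cache.size * 0.85 * 0.25
    static double inMemoryCacheBytes(double heapBytes, double blockCacheFraction) {
        return heapBytes * blockCacheFraction * 0.85 * 0.25;
    }

    public static void main(String[] args) {
        double heap = 8L * 1024 * 1024 * 1024; // 8 GB heap (illustrative)
        double size = inMemoryCacheBytes(heap, 0.4); // default hfile.block.cache.size
        // 8 GB * 0.4 * 0.85 * 0.25 = 0.68 GB, roughly 696 MB
        System.out.printf("in-memory cache ~ %.0f MB%n", size / (1024 * 1024));
    }
}
```

With these numbers, column families flagged IN_MEMORY=>'true' should together stay well under roughly 700 MB to avoid thrashing.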

2. From another angle, the commonly cited use case for the in-memory cache is declaring the column families of metadata tables as IN_MEMORY=>'true'. The implicit assumption is that metadata tables are all small. In fact, we can also boldly set IN_MEMORY=>'true' on any frequently accessed column families whose total size will not exceed the in-memory cache, making fuller use of the cache space. As mentioned above, ordinary blocks are never placed into the in-memory cache, so storing only a small amount of metadata there wastes the in-memory cache's resources (future versions should expose configuration of the ratios among the three segments).

1.2 Row Key

In HBase, the row key is used to look up records in a table. Three access patterns are supported:

  • Access by a single row key: perform a get on a specific row key;
  • Scan over a row key range: set a startRowKey and an endRowKey and scan within that range;
  • Full table scan: scan all rows in the table directly.

In HBase a row key can be any string, up to 64KB long; in practice it is usually 10~100 bytes. It is stored as a byte[] array and is generally designed with a fixed length.
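A common way to get fixed-length row keys is to zero-pad numeric IDs, so that byte-wise lexicographic order matches numeric order. A minimal sketch (the 10-digit width is an arbitrary illustrative choice):

```java
public class FixedLengthRowKey {
    // Left-pad a numeric id to a fixed width of 10 digits so that
    // lexicographic ordering of the keys equals numeric ordering of the ids.
    static String rowKey(long id) {
        return String.format("%010d", id);
    }
}
```

Without padding, "9" would sort after "10" lexicographically; with padding, "0000000009" correctly sorts before "0000000010".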

Row keys are stored in lexicographic order. When designing row keys, take full advantage of this ordering: store data that is frequently read together close to each other, and keep data that is likely to be accessed soon in the same area.

For example, if the most recently written data in an HBase table is the most likely to be accessed, consider making the timestamp part of the row key. Because ordering is lexicographic, use Long.MAX_VALUE - timestamp as the row key; this ensures that newly written data can be located quickly on read.
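The Long.MAX_VALUE - timestamp trick can be sketched as follows. Big-endian fixed-width encoding is assumed (the same layout HBase's Bytes.toBytes(long) produces), so that unsigned byte-wise comparison matches numeric comparison; the helper names here are illustrative.

```java
import java.nio.ByteBuffer;

public class ReversedTimestampKey {
    // Encode a long as 8 big-endian bytes, like HBase's Bytes.toBytes(long).
    static byte[] toBytes(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // Newer timestamps yield smaller reversed values, so newer rows sort first.
    static byte[] reversedTimestampKey(long timestampMillis) {
        return toBytes(Long.MAX_VALUE - timestampMillis);
    }

    // Unsigned lexicographic comparison, matching HBase's row key ordering.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) return cmp;
        }
        return Integer.compare(a.length, b.length);
    }
}
```

A scan starting from the beginning of the table therefore encounters the most recently written rows first.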

1.3 Column Family

Do not define too many column families in a single table. HBase currently does not handle tables with more than 2~3 column families well: when one column family is flushed, its neighboring column families are also triggered to flush by the association effect, ultimately causing the system to generate more I/O. Interested readers can benchmark their own HBase clusters and verify this with the measured results.

1.4 In Memory

When creating a table, you can call HColumnDescriptor.setInMemory(true) to place the table in the RegionServer's cache so that reads hit the cache.

1.5 Max Version

When creating a table, you can call HColumnDescriptor.setMaxVersions(int maxVersions) to set the maximum number of versions kept for the table's data. If only the latest version is needed, use setMaxVersions(1).

1.6 Time To Live

When creating a table, you can call HColumnDescriptor.setTimeToLive(int timeToLive) to set the storage lifetime of the table's data; expired data is deleted automatically. For example, to keep only the last two days of data, use setTimeToLive(2 * 24 * 60 * 60).
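The three settings from sections 1.4 through 1.6 can be combined on a single column family. The sketch below uses the HBase 1.x client API; the table name "demo_table" and family name "cf" are made-up examples, and running it requires an hbase-client dependency plus a reachable cluster, so treat it as an illustrative fragment rather than a standalone program.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class CreateTuningTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            HColumnDescriptor cf = new HColumnDescriptor("cf"); // hypothetical family name
            cf.setInMemory(true);               // 1.4: cache blocks in the in-memory segment
            cf.setMaxVersions(1);               // 1.5: keep only the latest version
            cf.setTimeToLive(2 * 24 * 60 * 60); // 1.6: expire data after two days (seconds)
            HTableDescriptor table =
                new HTableDescriptor(TableName.valueOf("demo_table")); // hypothetical name
            table.addFamily(cf);
            admin.createTable(table);
        }
    }
}
```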



```
[root@ljm1 apache-flume-1.8.0-bin]# bin/flume-ng agent --conf ./conf --conf-file flume-conf.properties --name agent -Dflume.root.logger=INFO,console
Info: Sourcing environment configuration script /home/ljm1/daolun/servers/apache-flume-1.8.0-bin/conf/flume-env.sh
Info: Including Hadoop libraries found via (/home/ljm1/daolun/servers/hadoop-2.7.4/bin/hadoop) for HDFS access
Info: Including HBASE libraries found via (/home/ljm1/daolun/servers/hbase-1.4.0/bin/hbase) for HBASE access
Info: Including Hive libraries found via (/home/ljm1/daolun/servers/apache-hive-1.2.1-bin) for Hive access
+ exec /home/ljm1/daolun/servers/jdk/bin/java -Xmx20m -Dflume.root.logger=INFO,console -cp '(long Flume/Hadoop/HBase/Hive classpath truncated)' org.apache.flume.node.Application --conf-file flume-conf.properties --name agent
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/ljm1/daolun/servers/apache-flume-1.8.0-bin/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ljm1/daolun/servers/hadoop-2.7.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/ljm1/daolun/servers/hbase-1.4.0/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
2025-07-04 11:08:36,654 (main) [ERROR - org.apache.flume.node.Application.main(Application.java:348)] A fatal error occurred while running. Exception follows.
org.apache.commons.cli.ParseException: The specified configuration file does not exist: /home/ljm1/daolun/servers/apache-flume-1.8.0-bin/flume-conf.properties
        at org.apache.flume.node.Application.main(Application.java:316)
```

What is going on here?
## Problem Restatement

You ran the command to start Flume but hit an error:

```
ERROR: The specified configuration file does not exist: /home/ljm1/daolun/servers/apache-flume-1.8.0-bin/flume-conf.properties
```

This means Flume cannot find the configuration file you specified.

---

## Detailed Explanation

### Root cause

Your startup command was:

```bash
bin/flume-ng agent --conf ./conf --conf-file flume-conf.properties --name agent -Dflume.root.logger=INFO,console
```

Flume resolves `flume-conf.properties` relative to the current directory. The message

> **The specified configuration file does not exist**

means Flume did not find a `flume-conf.properties` file in the current directory.

---

### Solutions

#### ✅ Option 1: Verify the configuration file exists

Check whether the current directory really contains `flume-conf.properties`:

```bash
ls -l flume-conf.properties
```

If it does not exist, create one or copy a valid configuration file into the current directory.

---

#### ✅ Option 2: Specify the configuration file with an absolute path (recommended)

If you know the file lives in another directory, say `/home/ljm1/flume-conf.properties`, start Flume like this:

```bash
bin/flume-ng agent --conf ./conf --conf-file /home/ljm1/flume-conf.properties --name agent -Dflume.root.logger=INFO,console
```

This avoids the path-not-found problem entirely.

---

#### ✅ Option 3: Switch to the correct working directory

Suppose your configuration file is in `/home/ljm1/flume/`. Change into that directory first, then run the command:

```bash
cd /home/ljm1/flume/
../apache-flume-1.8.0-bin/bin/flume-ng agent --conf ../apache-flume-1.8.0-bin/conf --conf-file flume-conf.properties --name agent -Dflume.root.logger=INFO,console
```

---

### Example: quickly check whether the file exists

```bash
cd /home/ljm1/daolun/servers/apache-flume-1.8.0-bin
ls -l flume-conf.properties
```

If the file is missing, you can create a test configuration by hand:

```bash
vim flume-conf.properties
```

Paste the following and save:

```properties
agent.sources = r1
agent.channels = c1
agent.sinks = k1

agent.sources.r1.type = netcat
agent.sources.r1.bind = 0.0.0.0
agent.sources.r1.port = 44444

agent.sinks.k1.type = logger

agent.channels.c1.type = memory
agent.channels.c1.capacity = 1000
agent.channels.c1.transactionCapacity = 100

agent.sources.r1.channels = c1
agent.sinks.k1.channel = c1
```

Then try starting Flume again.

---

## Key Concepts

- **Flume configuration loading**: understand how Flume loads the configuration file via the `--conf-file` parameter.
- **Linux path handling**: know the difference between relative and absolute paths and how to use each.
- **Log troubleshooting**: learn to extract the key error message from logs to locate and fix the problem.