A First Try with hbase-0.98.3

License: CC 4.0 BY-SA

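This article walks through installing HBase on an existing Hadoop cluster: the configuration files to change, the environment it runs in, the startup steps, and how to verify the installation afterwards. Along the way it sets a few basic tuning options, including the heap size, the DataNode's maximum number of receiver connections, and the garbage collector, with efficient data storage and retrieval as the goal.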

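Lately I have been thinking about where to keep the base data, intermediate data, and result data of our statistical analysis so that writing, reading, and aggregating all stay easy. MySQL is of course the obvious choice, but every update needs a select followed by an update, which is slow. Worse, sorting statistics along different dimensions means indexes everywhere: the table ends up being mostly indexes, and that slows writes down even further. There are plenty of reliable ways to speed this up, but I was not satisfied and wanted to see whether something better exists. HBase was the natural candidate. It lives alongside Hadoop, other dimensions can be computed from a base dimension, and those computations can be rerun whenever needed. A statistic is essentially a dimension plus a number, which maps neatly onto key/value, and sorting by value can be handled as well. So I decided to at least give it a try, which is how this experiment came about.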

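Hadoop environment:

hadoop2.2.0 + HA (QJM), 4 nodes

HBase environment:

hbase-0.98.3-hadoop2 (install directory: /home/hadoop/hbase-0.98.3-hadoop2), 4 nodes (hadoop25, hadoop28, hadoop201, hadoop224)

ZooKeeper environment:

A standalone three-node ZooKeeper ensemble (ZK25, ZK28, ZK224; clientPort 2181)

Here are the details of installing HBase 0.98.3.

I already had a stable Hadoop 2.2.0 + HA (QJM) environment, so setting up HBase on top of it was easy (this is only a test, so the requirements are modest).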

1. Download hbase-0.98.3-hadoop2-bin.tar.gz

Download the latest stable release, hbase-0.98.3-hadoop2, from http://hbase.apache.org. This build supports hadoop2.2.0 out of the box, which saves a lot of trouble.

2. Configuration

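(1) Edit the hadoop2.2.0 configuration file hdfs-site.xml (enable append support and raise the limit on the number of files a DataNode serves at once). In Hadoop 2 the first property is named dfs.datanode.max.transfer.threads; the old spelling dfs.datanode.max.xcievers used here is still accepted as a deprecated alias.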

     <property>
        <name>dfs.datanode.max.xcievers</name>
        <value>4096</value>
     </property>
     <property> 
        <name>dfs.support.append</name> 
        <value>true</value> 
     </property>

After changing this configuration, the Hadoop cluster must be restarted. (I
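
Before restarting, it can help to confirm that both properties really are in the file. The following is a minimal sketch, not part of the original procedure: `get_prop` is a hypothetical helper, it assumes each `<name>` and `<value>` sits on its own line as in the snippet above, and the demo runs against a throwaway file rather than the real hdfs-site.xml.

```shell
#!/usr/bin/env bash
# get_prop FILE NAME: print the <value> of the named <property>.
# Hypothetical helper; assumes <name> and <value> are on separate lines.
get_prop() {
  awk -v want="$2" '
    /<name>/  { gsub(/.*<name>|<\/name>.*/, ""); cur = $0; next }
    /<value>/ { if (cur == want) { gsub(/.*<value>|<\/value>.*/, ""); print; exit } }
  ' "$1"
}

# Demo on a temporary copy of the two properties from this step.
demo=$(mktemp)
cat > "$demo" <<'EOF'
<configuration>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>
  <property>
    <name>dfs.support.append</name>
    <value>true</value>
  </property>
</configuration>
EOF
get_prop "$demo" dfs.datanode.max.xcievers
get_prop "$demo" dfs.support.append
rm -f "$demo"
```

On a live node, `hdfs getconf -confKey dfs.support.append` reports the value Hadoop itself resolves, which is the more authoritative check once the cluster is back up.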

(2) Edit the HBase configuration file conf/hbase-env.sh

#
#/**
# * Copyright 2007 The Apache Software Foundation
# *
# * Licensed to the Apache Software Foundation (ASF) under one
# * or more contributor license agreements.  See the NOTICE file
# * distributed with this work for additional information
# * regarding copyright ownership.  The ASF licenses this file
# * to you under the Apache License, Version 2.0 (the
# * "License"); you may not use this file except in compliance
# * with the License.  You may obtain a copy of the License at
# *
# *     http://www.apache.org/licenses/LICENSE-2.0
# *
# * Unless required by applicable law or agreed to in writing, software
# * distributed under the License is distributed on an "AS IS" BASIS,
# * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# * See the License for the specific language governing permissions and
# * limitations under the License.
# */


# Set environment variables here.


# This script sets variables multiple times over the course of starting an hbase process,
# so try to keep things idempotent unless you want to take an even deeper look
# into the startup scripts (bin/hbase, etc.)


# The java implementation to use.  Java 1.6 required.
export JAVA_HOME=/home/hadoop/jdk1.7.0_45


# Extra Java CLASSPATH elements.  Optional.
# export HBASE_CLASSPATH=


# The maximum amount of heap to use, in MB. Default is 1000.
export HBASE_HEAPSIZE=1024


# Extra Java runtime options.
# Below are what we set by default.  May only work with SUN JVM.
# For more on why as well as other possible settings,
# see http://wiki.apache.org/hadoop/PerformanceTuning
export HBASE_OPTS="-XX:+UseConcMarkSweepGC"


# Uncomment one of the below three options to enable java garbage collection logging for the server-side processes.


# This enables basic gc logging to the .out file.
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"


# This enables basic gc logging to its own file.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"


# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"


# Uncomment one of the below three options to enable java garbage collection logging for the client processes.


# This enables basic gc logging to the .out file.
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"


# This enables basic gc logging to its own file.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"


# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"


# Uncomment below if you intend to use the EXPERIMENTAL off heap cache.
# export HBASE_OPTS="$HBASE_OPTS -XX:MaxDirectMemorySize="
# Set hbase.offheapcache.percentage in hbase-site.xml to a nonzero value.




# Uncomment and adjust to enable JMX exporting
# See jmxremote.password and jmxremote.access in $JRE_HOME/lib/management to configure remote password access.
# More details at: http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html
#
# export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10101"
# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10102"
# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10103"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10104"
# export HBASE_REST_OPTS="$HBASE_REST_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10105"


# File naming hosts on which HRegionServers will run.  $HBASE_HOME/conf/regionservers by default.
# export HBASE_REGIONSERVERS=${HBASE_HOME}/conf/regionservers


# Uncomment and adjust to keep all the Region Server pages mapped to be memory resident
#HBASE_REGIONSERVER_MLOCK=true
#HBASE_REGIONSERVER_UID="hbase"


# File naming hosts on which backup HMaster will run.  $HBASE_HOME/conf/backup-masters by default.
# export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters


# Extra ssh options.  Empty by default.
# export HBASE_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HBASE_CONF_DIR"


# Where log files are stored.  $HBASE_HOME/logs by default.
# export HBASE_LOG_DIR=${HBASE_HOME}/logs


# Enable remote JDWP debugging of major HBase processes. Meant for Core Developers 
# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8070"
# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8071"
# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8072"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8073"


# A string representing this instance of hbase. $USER by default.
# export HBASE_IDENT_STRING=$USER


# The scheduling priority for daemon processes.  See 'man nice'.
# export HBASE_NICENESS=10


# The directory where pid files are stored. /tmp by default.
export HBASE_PID_DIR=/home/hadoop/hbase-0.98.3-hadoop2


# Seconds to sleep between slave commands.  Unset by default.  This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HBASE_SLAVE_SLEEP=0.1


# Tell HBase whether it should manage it's own instance of Zookeeper or not.
export HBASE_MANAGES_ZK=false


# The default log rolling policy is RFA, where the log file is rolled as per the size defined for the 
# RFA appender. Please refer to the log4j.properties file to see more details on this appender.
# In case one needs to do log rolling on a date change, one should set the environment property
# HBASE_ROOT_LOGGER to "<DESIRED_LOG LEVEL>,DRFA".
# For example:
# HBASE_ROOT_LOGGER=INFO,DRFA
# The reason for changing default to RFA is to avoid the boundary case of filling out disk space as 
# DRFA doesn't put any cap on the log size. Please refer to HBase-5655 for more context.
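
Most of the file above is the stock template, and only a handful of lines are actually active. One way to see the effective settings at a glance is to list the `export` lines that are not commented out. This is a small sketch under that assumption; `active_exports` is a hypothetical helper name, and the demo uses a short excerpt instead of the real file.

```shell
#!/usr/bin/env bash
# active_exports FILE: list export lines that are not commented out.
active_exports() {
  grep -E '^[[:space:]]*export[[:space:]]' "$1"
}

# Demo on a small excerpt; on a real node pass conf/hbase-env.sh instead.
demo=$(mktemp)
cat > "$demo" <<'EOF'
# export HBASE_CLASSPATH=
export HBASE_HEAPSIZE=1024
# export HBASE_LOG_DIR=${HBASE_HOME}/logs
export HBASE_MANAGES_ZK=false
EOF
active_exports "$demo"
rm -f "$demo"
```

Run against the file listed in this step, it should show the five lines that matter: JAVA_HOME, HBASE_HEAPSIZE, HBASE_OPTS, HBASE_PID_DIR, and HBASE_MANAGES_ZK.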

(3) Edit the HBase configuration file conf/hbase-site.xml (use the existing ZooKeeper ensemble rather than letting HBase manage its own)

<configuration>
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://mycluster/hbase</value> <!-- must be consistent with the fs.defaultFS value in core-site.xml -->
    </property>
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
    <property>
        <name>hbase.tmp.dir</name>
        <value>/home/hadoop/hbase-0.98.3-hadoop2/tmp</value>
    </property>
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>zk25,zk28,zk224</value>
    </property>
</configuration>
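
Because HBASE_MANAGES_ZK is false, HBase depends on the external ensemble named in hbase.zookeeper.quorum being reachable. The sketch below expands a quorum string into host:port pairs and defines a probe based on ZooKeeper's `ruok` four-letter command, to which a healthy server answers `imok`. The helper names are hypothetical, and the probe assumes `nc` is installed and that four-letter commands are enabled on the servers.

```shell
#!/usr/bin/env bash
# quorum_endpoints QUORUM [PORT]: turn "a,b,c" into one host:port per line.
quorum_endpoints() {
  local port="${2:-2181}" host
  local IFS=','
  for host in $1; do
    printf '%s:%s\n' "$host" "$port"
  done
}

# zk_is_ok HOST PORT: succeed if the server answers "imok" to "ruok".
# Assumes nc is available; its flags differ between netcat variants.
zk_is_ok() {
  [ "$(printf 'ruok' | nc -w 2 "$1" "$2" 2>/dev/null)" = "imok" ]
}

# Show the endpoints derived from the quorum value used in this step.
quorum_endpoints "zk25,zk28,zk224" 2181

# To probe them from a cluster node (not run here):
#   quorum_endpoints "zk25,zk28,zk224" 2181 | while IFS=: read -r h p; do
#     zk_is_ok "$h" "$p" && echo "$h ok" || echo "$h not answering"
#   done
```

Splitting the endpoint expansion from the probe keeps the first part usable even on machines where `nc` is missing.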

(4) Edit the HBase configuration file conf/regionservers (lists the nodes that run a regionserver)

hadoop25
hadoop28
hadoop201
hadoop224

(5) Copy hadoop/etc/hadoop/hdfs-site.xml into HBase's conf directory

(This one matters: skip it and you get an error. I ran into exactly that, with mycluster reported as an unknown host.)
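
The likely reason the copy matters is that mycluster is a logical HA nameservice rather than a real host name, and the mapping is defined in hdfs-site.xml (under dfs.nameservices in a standard HA setup). A rough sketch of a sanity check follows; both helper names are hypothetical, and the check is a plain text match, not a real XML parse.

```shell
#!/usr/bin/env bash
# rootdir_authority URI: print the authority part of an hdfs:// URI,
# e.g. "mycluster" for hdfs://mycluster/hbase.
rootdir_authority() {
  printf '%s\n' "$1" | sed -E 's#^hdfs://([^/]+).*#\1#'
}

# nameservice_defined FILE NAME: succeed if the line following the
# dfs.nameservices <name> line in FILE is a <value> containing NAME.
nameservice_defined() {
  grep -A1 '<name>dfs.nameservices</name>' "$1" | grep -q "<value>.*$2.*</value>"
}

ns=$(rootdir_authority "hdfs://mycluster/hbase")
echo "nameservice in hbase.rootdir: $ns"

# On a real node (path taken from this article's install directory):
#   nameservice_defined /home/hadoop/hbase-0.98.3-hadoop2/conf/hdfs-site.xml "$ns" \
#     && echo "defined" || echo "missing: expect an unknown-host error"
```

If the check fails after the copy, the file in HBase's conf directory is probably not the one the HA cluster is actually using.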

------------------------------------------------------------------------------------------------------------------------------------------------

=== Repeat the steps above on every node that will run HBase. Easier still: configure one machine and simply copy the result to the others. ===

------------------------------------------------------------------------------------------------------------------------------------------------
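
The "configure one machine, then copy" approach could be scripted roughly as follows. This sketch only prints one rsync command per host listed in conf/regionservers, skipping the local machine, so the commands can be reviewed before being piped to `sh`. It assumes passwordless SSH between the nodes, rsync on every node, and the same install path everywhere; `sync_commands` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# sync_commands HOSTS_FILE DIR SELF: print one rsync command per host
# listed in HOSTS_FILE, skipping blank lines, comments and SELF.
sync_commands() {
  local hosts_file="$1" dir="${2%/}" self="$3" host
  while read -r host; do
    case "$host" in ''|'#'*) continue ;; esac
    [ "$host" = "$self" ] && continue
    printf 'rsync -a %s/ %s:%s/\n' "$dir" "$host" "$dir"
  done < "$hosts_file"
}

# Demo with the four hosts from this article, run as if on hadoop25.
demo=$(mktemp)
printf '%s\n' hadoop25 hadoop28 hadoop201 hadoop224 > "$demo"
sync_commands "$demo" /home/hadoop/hbase-0.98.3-hadoop2 hadoop25
rm -f "$demo"
```

Printing instead of executing is a deliberate choice here: a wrong path in an rsync command is much cheaper to catch on screen than on three remote machines.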

(6) Then comes the part everyone likes: run

/home/hadoop/hbase-0.98.3-hadoop2/bin/start-hbase.sh

and startup is done.

(7) Check whether HBase started successfully

http://hmaster:60010 (hmaster is whichever machine you ran start-hbase.sh on)
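
Besides the web UI, the JDK's `jps` tool shows whether the daemons are running: the master appears as HMaster and each regionserver as HRegionServer. The sketch below reads jps-style output on stdin and prints whichever expected process names are absent; `missing_daemons` is a hypothetical helper, and the demo feeds it canned input rather than real jps output.

```shell
#!/usr/bin/env bash
# missing_daemons NAME...: read "PID NAME" lines (jps output) on stdin
# and print each expected NAME that does not appear.
missing_daemons() {
  local running name
  running=$(awk '{ print $2 }')
  for name in "$@"; do
    printf '%s\n' "$running" | grep -qx "$name" || echo "$name"
  done
}

# Demo with canned input. On a node, use:
#   jps | missing_daemons HMaster HRegionServer
printf '1234 HRegionServer\n5678 Jps\n' | missing_daemons HMaster HRegionServer
```

On the machine where start-hbase.sh was run, both names should be present; on the other three nodes only HRegionServer is expected.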

The most direct check, of course, is to look through the logs for errors:

hbase-hadoop-regionserver-hadoop25.log

hbase-hadoop-master-hadoop25.log
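
Scanning those logs can be automated with a short helper. This is a sketch that assumes the log level appears as a space-delimited word (ERROR or FATAL) on each line; `log_problems` is a hypothetical name, and the sample lines in the demo are made-up placeholders, not real HBase output.

```shell
#!/usr/bin/env bash
# log_problems DIR: print ERROR/FATAL lines from hbase-*.log files in DIR,
# prefixed with the file name. Prints nothing if there are none.
log_problems() {
  grep -H -E ' (ERROR|FATAL) ' "$1"/hbase-*.log 2>/dev/null
}

# Demo with a throwaway directory and placeholder lines.
demo=$(mktemp -d)
printf '%s\n' \
  'ts INFO  [t] placeholder: all fine' \
  'ts ERROR [t] placeholder: made-up failure' \
  > "$demo/hbase-hadoop-master-hadoop25.log"
log_problems "$demo"
rm -rf "$demo"
```

With the configuration in this article the logs live under the default location, the logs directory inside /home/hadoop/hbase-0.98.3-hadoop2, since HBASE_LOG_DIR is left commented out in hbase-env.sh.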