Start with three machines that already have Hadoop configured.
Install the HBase master on the namenode machine.
On the namenode machine:
yum install ~ hbase-master
Edit the configuration file hbase-site.xml:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://namenode-ip-or-hostname/hbase</value>
<description>The directory shared by region servers.</description>
</property>
<property>
<name>hbase.hregion.max.filesize</name>
<value>1073741824</value>
<description>
Maximum HStoreFile size. If any one of a column families' HStoreFiles has
grown to exceed this value, the hosting HRegion is split in two.
Default: 256M.
</description>
</property>
<property>
<name>hbase.hregion.memstore.flush.size</name>
<value>134217728</value>
<description>
Memstore will be flushed to disk if size of the memstore
exceeds this number of bytes. Value is checked by a thread that runs
every hbase.server.thread.wakefrequency.
</description>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
<description>The mode the cluster will be in. Possible values are
false: standalone and pseudo-distributed setups with managed Zookeeper
true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
</description>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
<description>Property from ZooKeeper's config zoo.cfg.
The port at which the clients will connect.
</description>
</property>
<property>
<name>zookeeper.session.timeout</name>
<value>120000</value>
</property>
<property>
<name>hbase.zookeeper.property.tickTime</name>
<value>6000</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>IP addresses of every machine running ZooKeeper, comma separated (here that includes all the regionservers)</value>
<description>Comma separated list of servers in the ZooKeeper Quorum.
For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
By default this is set to localhost for local and pseudo-distributed modes
of operation. For a fully-distributed setup, this should be set to a full
list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
this is the list of servers which we will start/stop ZooKeeper on.
</description>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/hadoop/hbase</value>
</property>
</configuration>
Edit the configuration file hbase-env.sh, appending at the end:
export JAVA_HOME=/usr/java/jdk1.6.0_45
export HBASE_CLASSPATH=/etc/hadoop/conf
export HBASE_MANAGES_ZK=false   # whether HBase manages its own bundled ZooKeeper; true means HBase starts/stops it itself
export HBASE_HEAPSIZE=2048
Configure the regionservers file:
one IP address or hostname per line; list only the regionservers.
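For example, assuming the four quorum machines listed in the zoo.cfg and /etc/hosts sections further down are also the regionservers (which matches the hbase.zookeeper.quorum note above), the regionservers file would look like:

```
opd19hdp03.dev.optimad.cn
opd22hdp01.dev.optimad.cn
opd04hbs01.dev.optimad.cn
opd04hbs02.dev.optimad.cn
```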
Start the HMaster:
/etc/init.d/hbase-master start
Install ZooKeeper on every machine; the steps are the same on each:
yum install ~ zookeeper
Edit zoo.cfg, appending the following at the end; these entries identify all the regionserver hosts:
server.1=172.16.26.25:2888:3888
server.2=172.16.26.39:2888:3888
server.3=172.16.26.53:2888:3888
server.4=172.16.26.54:2888:3888
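One step zoo.cfg alone doesn't cover: each ZooKeeper host also needs a myid file in its dataDir whose number matches that host's server.N line. A minimal sketch — the dataDir here defaults to a local directory purely for illustration; on the real hosts use the dataDir set in your zoo.cfg (commonly /var/lib/zookeeper):

```shell
# Each ZooKeeper host needs a myid file in its dataDir matching its
# server.N line in zoo.cfg. ZK_DATADIR defaults to a local dir for
# illustration; on real hosts use the dataDir from zoo.cfg.
ZK_DATADIR="${ZK_DATADIR:-./zookeeper-data}"
mkdir -p "$ZK_DATADIR"
echo 1 > "$ZK_DATADIR/myid"   # on 172.16.26.25 (server.1); write 2, 3, 4 on the other hosts
cat "$ZK_DATADIR/myid"
```

Without a matching myid, zkServer.sh starts but the node cannot join the quorum.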
Start it from the /usr/lib/zookeeper/bin/ directory: sh zkServer.sh start
One thing to watch: /etc/hosts must be configured correctly on every machine, e.g.:
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
172.16.27.78 puppet.madhouse.cn
172.16.26.13 opd18hdp04.dev.optimad.cn
172.16.26.25 opd19hdp03.dev.optimad.cn
172.16.26.39 opd22hdp01.dev.optimad.cn
172.16.26.53 opd04hbs01.dev.optimad.cn
172.16.26.54 opd04hbs02.dev.optimad.cn
Next, on the machine where Hive is installed, set up hive-site.xml:
<configuration>
<!-- Hive Configuration can either be stored in this file or in the hadoop configuration files -->
<!-- that are implied by Hadoop setup variables. -->
<!-- Aside from Hadoop setup variables - this file is provided as a convenience so that Hive -->
<!-- users do not have to edit hadoop configuration files (that may be managed as a centralized -->
<!-- resource). -->
<!-- Hive Execution Parameters -->
<property>
<name>hive.metastore.local</name>
<value>true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<!-- 172.16.26.24 here is the host of your Hive metastore database -->
<value>jdbc:mysql://172.16.26.24:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
</property>
<property>
<name>hive.exec.scratchdir</name>
<value>/usr/lib/hive/tmp</value>
</property>
<property>
<name>hive.querylog.location</name>
<value>/usr/lib/hive/logs</value>
</property>
<property>
<name>hive.aux.jars.path</name>
<value>file:///usr/lib/hive/lib/hive-hbase-handler-0.10.0-cdh4.3.1.jar,file:///usr/lib/hive/lib/hbase.jar,file:///usr/lib/hive/lib/zookeeper-3.4.5-cdh4.3.1.jar</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>172.16.26.25,172.16.26.39,172.16.26.53,172.16.26.54</value>
</property>
</configuration>
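A missing jar in hive.aux.jars.path tends to surface only later, as ClassNotFound errors on HBase-backed queries, so it's worth confirming the three jars exist before entering Hive. A quick sketch, with the paths copied from the property above:

```shell
# Check that every jar listed in hive.aux.jars.path is present on disk.
missing=0
for jar in /usr/lib/hive/lib/hive-hbase-handler-0.10.0-cdh4.3.1.jar \
           /usr/lib/hive/lib/hbase.jar \
           /usr/lib/hive/lib/zookeeper-3.4.5-cdh4.3.1.jar; do
  if [ -f "$jar" ]; then
    echo "ok: $jar"
  else
    echo "MISSING: $jar"
    missing=$((missing + 1))
  fi
done
echo "$missing jar(s) missing"
```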
OK. Enter Hive as the hdfs user and create a Hive table that HBase can recognize:
CREATE TABLE adunit_hbase(key string,
id int,
resourceid string,
adunitname string,
adprojectid int,
campaignid int,
mediaid int,
mediatype boolean,
mccaccountid int,
mcccampaignid string,
prebudgetdaily double,
preactivedaily bigint,
brief string,
briefstartdate string,
briefenddate string,
adunitstatus boolean,
isdel boolean,
owner int,
puturl string,
analyticsadunitid string,
isreturnactive boolean,
conversionid string,
label string,
bundleid string,
createuser int,
createtime string,
modifyuser int,
modifytime string,
verifyuser int,
verifytime string,
puttype boolean,
isfixputprice boolean,
fixputprice double,
impressionurl string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf0:id,cf1:resourceid,cf2:adunitname,cf3:adprojectid,cf4:campaignid,cf5:mediaid,cf6:mediatype,cf7:mccaccountid,cf8:mcccampaignid,cf9:prebudgetdaily,cf10:preactivedaily,cf11:brief,cf12:briefstartdate,cf13:briefenddate,cf14:adunitstatus,cf15:isdel,cf16:owner,cf17:puturl,cf18:analyticsadunitid,cf19:isreturnactive,cf20:conversionid,cf21:label,cf22:bundleid,cf23:createuser,cf24:createtime,cf25:modifyuser,cf26:modifytime,cf27:verifyuser,cf28:verifytime,cf29:puttype,cf30:isfixputprice,cf31:fixputprice,cf32:impressionurl")
TBLPROPERTIES ("hbase.table.name" = "adunit_hbase");
INSERT OVERWRITE TABLE call_postbacklog_hbase SELECT a.* FROM call_postbacklog a;