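Notes
- This article is based on CentOS 6.x + CDH 5.x
- In this example HBase is installed in cluster mode
- This article uses Maven 3.5+ and Eclipse 4.3
- I strongly recommend reading the references at the end of the tutorial
We are not setting up HBase just to query data from the shell; we want to write applications on top of it, so learning how to call HBase from Java is a must.
Environment setup
Create the project
Open Eclipse and create a Maven project. Choose the quickstart archetype; the artifactId and groupId can be whatever you like.
Edit pom.xml: set the JDK to 1.6+ and add the Hadoop-related jars (here, hbase-client)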
- <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
- xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
- <modelVersion>4.0.0</modelVersion>
- <groupId>org.crazycake</groupId>
- <artifactId>playhbase</artifactId>
- <version>0.0.1-SNAPSHOT</version>
- <packaging>jar</packaging>
- <name>playhbase</name>
- <url>http://maven.apache.org</url>
- <properties>
- <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
- </properties>
- <dependencies>
- <dependency>
- <groupId>junit</groupId>
- <artifactId>junit</artifactId>
- <version>3.8.1</version>
- <scope>test</scope>
- </dependency>
- <dependency>
- <groupId>org.apache.hbase</groupId>
- <artifactId>hbase-client</artifactId>
- <version>0.98.4-hadoop2</version>
- </dependency>
- </dependencies>
- <build>
- <resources>
- <resource>
- <directory>${basedir}/src/main/resources</directory>
- <filtering>false</filtering>
- <includes>
- <include>hbase-site.xml</include>
- </includes>
- </resource>
- </resources>
- <plugins>
- <plugin>
- <artifactId>maven-compiler-plugin</artifactId>
- <version>2.0.2</version>
- <configuration>
- <source>1.6</source>
- <target>1.6</target>
- <encoding>UTF-8</encoding>
- <optimise>true</optimise>
- <compilerArgument>-nowarn</compilerArgument>
- </configuration>
- </plugin>
- <plugin>
- <groupId>org.apache.maven.plugins</groupId>
- <artifactId>maven-shade-plugin</artifactId>
- <version>2.3</version>
- <configuration>
- <transformers>
- <transformer
- implementation="org.apache.maven.plugins.shade.resource.ApacheLicenseResourceTransformer">
- </transformer>
- </transformers>
- </configuration>
- <executions>
- <execution>
- <phase>package</phase>
- <goals>
- <goal>shade</goal>
- </goals>
- </execution>
- </executions>
- </plugin>
- </plugins>
- </build>
- </project>
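- Besides the HBase jar, the pom also adds a Maven plugin called maven-shade-plugin. It prevents duplicate license files in the built jar; duplicate license files cause errors at run time on an HDInsight cluster.
- The configuration also adds a resource that points to the configuration file hbase-site.xml, which is where the HBase connection settings go
Create the configuration file
Create the src/main/resources folder, add it as a source folder, and create an hbase-site.xml file in it with the following content: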
- <?xml version="1.0"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <!--
- /**
- * Copyright 2010 The Apache Software Foundation
- *
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
- -->
- <configuration>
- <property>
- <name>hbase.cluster.distributed</name>
- <value>true</value>
- </property>
- <property>
- <name>hbase.zookeeper.quorum</name>
- <value>host1,host2</value>
- </property>
- <property>
- <name>hbase.zookeeper.property.clientPort</name>
- <value>2181</value>
- </property>
- </configuration>
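host1 and host2 are the machines where ZooKeeper is installed. I only set up two machines, so there are only host1 and host2; normally you need at least 3, and the count should grow in odd numbers.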
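Why odd numbers? ZooKeeper only keeps serving while a strict majority of its servers is alive, so adding a fourth server to a three-server ensemble does not let you survive any more failures. Here is a rough sketch of that arithmetic in plain Java. The class and method names are made up for illustration and are not part of ZooKeeper or HBase.

```java
// Sketch: majority-quorum arithmetic for a ZooKeeper ensemble.
// QuorumMath is a hypothetical helper, not a ZooKeeper or HBase class.
class QuorumMath {
    /** Smallest number of servers that forms a strict majority of n. */
    static int majority(int n) {
        return n / 2 + 1;
    }

    /** Number of servers that can fail while a majority is still alive. */
    static int toleratedFailures(int n) {
        return n - majority(n);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.println(n + " server(s): majority=" + majority(n)
                    + ", tolerated failures=" + toleratedFailures(n));
        }
    }
}
```

With two servers, as in my setup, the tolerated failure count is 0, which is why two machines are fine for playing around but not for anything serious.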
Operations
Create a table and insert data
Create a class named CreateTable (CreateTable.java). This example comes from http://azure.microsoft.com/en-us/documentation/articles/hdinsight-hbase-build-java-maven/?rnd=1 ; I translated it.
- package org.crazycake.playhbase;
- import java.io.IOException;
- import org.apache.hadoop.conf.Configuration;
- import org.apache.hadoop.hbase.HBaseConfiguration;
- import org.apache.hadoop.hbase.HColumnDescriptor;
- import org.apache.hadoop.hbase.HTableDescriptor;
- import org.apache.hadoop.hbase.MasterNotRunningException;
- import org.apache.hadoop.hbase.TableName;
- import org.apache.hadoop.hbase.ZooKeeperConnectionException;
- import org.apache.hadoop.hbase.client.HBaseAdmin;
- import org.apache.hadoop.hbase.client.HTable;
- import org.apache.hadoop.hbase.client.Put;
- import org.apache.hadoop.hbase.util.Bytes;
- public class CreateTable {
- public static void main(String[] args) throws MasterNotRunningException, ZooKeeperConnectionException, IOException {
- Configuration config = HBaseConfiguration.create();
- // The commented-out lines below set the ZooKeeper parameters programmatically;
- // use them if you have no hbase-site.xml or want to change the settings at runtime
- //
- // config.set("hbase.zookeeper.quorum",
- // "zookeepernode0,zookeepernode1,zookeepernode2");
- //config.set("hbase.zookeeper.property.clientPort", "2181");
- //config.set("hbase.cluster.distributed", "true");
- // Create an admin object from the configuration
- HBaseAdmin admin = new HBaseAdmin(config);
- // Create the table
- HTableDescriptor tableDescriptor = new HTableDescriptor(TableName.valueOf("people"));
- // Create 2 column families
- tableDescriptor.addFamily(new HColumnDescriptor("name"));
- tableDescriptor.addFamily(new HColumnDescriptor("contactinfo"));
- admin.createTable(tableDescriptor);
- // Now put some data in
- String[][] people = {
- { "1", "Marcel", "Haddad", "marcel@fabrikam.com"},
- { "2", "Franklin", "Holtz", "franklin@contoso.com" },
- { "3", "Dwayne", "McKee", "dwayne@fabrikam.com" },
- { "4", "Rae", "Schroeder", "rae@contoso.com" },
- { "5", "Rosalie", "burton", "rosalie@fabrikam.com"},
- { "6", "Gabriela", "Ingram", "gabriela@contoso.com"} };
- HTable table = new HTable(config, "people");
- // Insert the data into the table
- for (int i = 0; i < people.length; i++) {
- // The first column is used as the row key
- Put person = new Put(Bytes.toBytes(people[i][0]));
- // Put the first name (e.g. Marcel) into the "first" column of the "name" column family
- person.add(Bytes.toBytes("name"), Bytes.toBytes("first"), Bytes.toBytes(people[i][1]));
- person.add(Bytes.toBytes("name"), Bytes.toBytes("last"), Bytes.toBytes(people[i][2]));
- person.add(Bytes.toBytes("contactinfo"), Bytes.toBytes("email"), Bytes.toBytes(people[i][3]));
- table.put(person);
- }
- // Finally, remember to flush and close the table, then close the admin
- table.flushCommits();
- table.close();
- admin.close();
- }
- }
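Note: before running, connect with hbase shell and run the list command to check that everything works. If it does not, use jps to check whether all the HBase and Hadoop services are up, and start whatever is missing. Otherwise, when the Java program fails you will not know where the problem is and will waste time hunting for it in your code.
Once everything is ready, run the code!
If your code hangs for a long time, don't just sit and wait; go look at the logs on the machine where HBase is deployed: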
- tail -200f /var/log/hbase/hbase-hbase-master-host1.localdomain.log
- hbase(main):003:0> scan 'people'
- ROW COLUMN+CELL
- 1 column=contactinfo:email, timestamp=1421338694666, value=marcel@fabrikam.com
- 1 column=name:first, timestamp=1421338694666, value=Marcel
- 1 column=name:last, timestamp=1421338694666, value=Haddad
- 2 column=contactinfo:email, timestamp=1421338694932, value=franklin@contoso.com
- 2 column=name:first, timestamp=1421338694932, value=Franklin
- 2 column=name:last, timestamp=1421338694932, value=Holtz
- 3 column=contactinfo:email, timestamp=1421338694977, value=dwayne@fabrikam.com
- 3 column=name:first, timestamp=1421338694977, value=Dwayne
- 3 column=name:last, timestamp=1421338694977, value=McKee
- 4 column=contactinfo:email, timestamp=1421338695034, value=rae@contoso.com
- 4 column=name:first, timestamp=1421338695034, value=Rae
- 4 column=name:last, timestamp=1421338695034, value=Schroeder
- 5 column=contactinfo:email, timestamp=1421338695054, value=rosalie@fabrikam.com
- 5 column=name:first, timestamp=1421338695054, value=Rosalie
- 5 column=name:last, timestamp=1421338695054, value=burton
- 6 column=contactinfo:email, timestamp=1421338695076, value=gabriela@contoso.com
- 6 column=name:first, timestamp=1421338695076, value=Gabriela
- 6 column=name:last, timestamp=1421338695076, value=Ingram
- 6 row(s) in 0.3910 seconds
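One thing the scan output above hints at: HBase keeps rows sorted by row key and compares the keys as raw bytes. With string keys like the ones used here, "10" would sort before "2" once the table grows past nine rows. Below is a small sketch in plain Java (no HBase dependency) of that byte-wise comparison. The names compareBytes and padded are made up for illustration, and zero-padding is just one common workaround, not something the original example does.

```java
import java.nio.charset.Charset;

// Sketch: why string row keys "10" and "2" do not sort numerically.
// RowKeyOrder is a hypothetical helper, not an HBase class.
class RowKeyOrder {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    /** Compares two byte arrays lexicographically, treating bytes as unsigned. */
    static int compareBytes(byte[] a, byte[] b) {
        int len = Math.min(a.length, b.length);
        for (int i = 0; i < len; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    static byte[] utf8(String s) {
        return s.getBytes(UTF8);
    }

    /** Left-pads a numeric id with zeros so byte order matches numeric order. */
    static String padded(int id, int width) {
        return String.format("%0" + width + "d", id);
    }

    public static void main(String[] args) {
        // "10" sorts before "2" because '1' < '2' in the first byte
        System.out.println(compareBytes(utf8("10"), utf8("2")) < 0);
        // "0002" sorts before "0010", which matches numeric order
        System.out.println(compareBytes(utf8(padded(2, 4)), utf8(padded(10, 4))) < 0);
    }
}
```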
Search by email
Create a SearchByEmail class
- package org.crazycake.playhbase;
- import java.io.IOException;
- import org.apache.hadoop.conf.Configuration;
- import org.apache.hadoop.hbase.HBaseConfiguration;
- import org.apache.hadoop.hbase.client.HTable;
- import org.apache.hadoop.hbase.client.Result;
- import org.apache.hadoop.hbase.client.ResultScanner;
- import org.apache.hadoop.hbase.client.Scan;
- import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
- import org.apache.hadoop.hbase.filter.RegexStringComparator;
- import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
- import org.apache.hadoop.hbase.util.Bytes;
- /**
- * Search for a user by email
- * @author alexxiyang (https://github.com/alexxiyang)
- *
- */
- public class SearchByEmail {
- public static void main(String[] args) throws IOException {
- // Create the configuration
- Configuration config = HBaseConfiguration.create();
- // Open the table
- HTable table = new HTable(config, "people");
- // Define the column families and columns we will use
- // Column family
- byte[] contactFamily = Bytes.toBytes("contactinfo");
- // Column
- byte[] emailQualifier = Bytes.toBytes("email");
- // Column family
- byte[] nameFamily = Bytes.toBytes("name");
- // Columns
- byte[] firstNameQualifier = Bytes.toBytes("first");
- byte[] lastNameQualifier = Bytes.toBytes("last");
- // Create a regular-expression comparator
- RegexStringComparator emailFilter = new RegexStringComparator("rosalie@fabrikam.com");
- // Create a filter and pass the regex comparator to it
- SingleColumnValueFilter filter = new SingleColumnValueFilter(contactFamily, emailQualifier, CompareOp.EQUAL, emailFilter);
- // Create a Scan object
- Scan scan = new Scan();
- // Attach the filter to the scan
- scan.setFilter(filter);
- // Run the query and get the results
- ResultScanner results = table.getScanner(scan);
- // Iterate over the results and print the data
- for (Result result : results) {
- String id = new String(result.getRow());
- byte[] firstNameObj = result.getValue(nameFamily, firstNameQualifier);
- String firstName = new String(firstNameObj);
- byte[] lastNameObj = result.getValue(nameFamily, lastNameQualifier);
- String lastName = new String(lastNameObj);
- System.out.println(firstName + " " + lastName + " - ID: " + id);
- byte[] emailObj = result.getValue(contactFamily, emailQualifier);
- String email = new String(emailObj);
- System.out.println(firstName + " " + lastName + " - " + email + " - ID: " + id);
- }
- // Close the scanner
- results.close();
- // Close the table
- table.close();
- }
- }
- Rosalie burton - ID: 5
- Rosalie burton - rosalie@fabrikam.com - ID: 5
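A small caveat about the search above: the email string is handed to the comparator as a regular expression, and in a regex the dot matches any character. The sketch below uses plain java.util.regex to show what that means. It assumes the comparator does find-style matching with Java's regex engine, which is my understanding of RegexStringComparator but worth verifying against your HBase version. EmailRegexDemo and found are names made up for illustration.

```java
import java.util.regex.Pattern;

// Sketch: an unescaped email used as a regex matches more than you might expect.
// EmailRegexDemo is a hypothetical helper, not an HBase class.
class EmailRegexDemo {
    /** True if the regex is found anywhere in the value (find-style matching). */
    static boolean found(String regex, String value) {
        return Pattern.compile(regex).matcher(value).find();
    }

    public static void main(String[] args) {
        String email = "rosalie@fabrikam.com";
        // The intended match
        System.out.println(found(email, "rosalie@fabrikam.com"));
        // Also matches, because "." stands for any character
        System.out.println(found(email, "rosalie@fabrikamXcom"));
        // Quoting the literal stops the accidental match
        System.out.println(found(Pattern.quote(email), "rosalie@fabrikamXcom"));
        // Anchoring as well gives an exact whole-value match
        System.out.println(found("^" + Pattern.quote(email) + "$", "rosalie@fabrikam.com"));
    }
}
```

For the six rows in this example it makes no difference, but if you need an exact match on real data, quoting the pattern (or looking at HBase's BinaryComparator instead of a regex) is probably the safer route.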
Delete the table
Create a DeleteTable class
- package org.crazycake.playhbase;
- import java.io.IOException;
- import org.apache.hadoop.conf.Configuration;
- import org.apache.hadoop.hbase.HBaseConfiguration;
- import org.apache.hadoop.hbase.client.HBaseAdmin;
- /**
- * Delete the table
- * @author alexxiyang (https://github.com/alexxiyang)
- *
- */
- public class DeleteTable {
- public static void main(String[] args) throws IOException {
- // Create the configuration
- Configuration config = HBaseConfiguration.create();
- // Create the admin
- HBaseAdmin admin = new HBaseAdmin(config);
- // Disable the table first, then delete it
- admin.disableTable("people");
- admin.deleteTable("people");
- // Release the admin connection
- admin.close();
- }
- }
Check the result in HBase
- hbase(main):004:0> list
- TABLE
- employee
- employee2
- student
- users
- 4 row(s) in 4.8460 seconds
- => ["employee", "employee2", "student", "users"]
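The people table is gone.
There was a small episode along the way that cost me a whole day:
When I first followed this tutorial http://azure.microsoft.com/en-us/documentation/articles/hdinsight-hbase-build-java-maven/?rnd=1 , the code hung for a long time when I ran it, so I went to the hbase-master machine and looked at the log: no exceptions at all.
Then I checked the log on the RegionServer machine: no exceptions either.
So I figured something might be wrong with HBase itself. I stopped the program and tried things out in hbase shell: created a table, inserted a row, and everything worked.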
- hbase(main):002:0> create 'users','info'
- 0 row(s) in 36.5110 seconds
- => Hbase::Table - users
- hbase(main):003:0> list
- TABLE
- employee
- employee2
- student
- users
- 4 row(s) in 0.4520 seconds
- => ["employee", "employee2", "student", "users"]
- hbase(main):004:0> put 'users',1,'info:name','ted'
- 0 row(s) in 0.8350 seconds
- hbase(main):005:0> scan 'users'
- ROW COLUMN+CELL
- 1 column=info:name, timestamp=1421252020520, value=ted
- 1 row(s) in 0.3140 seconds
That meant the problem might be in ZooKeeper, because the Java API does not talk to HBase directly; it goes through ZooKeeper first. So I went to look at the ZooKeeper log, following it with tail:
- tail -200f /var/log/zookeeper/zookeeper.log
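Then I ran the Java code to see whether ZooKeeper would report any exception. Disappointingly, ZooKeeper reported nothing.
Connecting to both ZooKeeper machines with telnet also worked fine; no problem found.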
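If you would rather run the same reachability check from the machine and JVM that run your client code, a rough Java equivalent of the telnet test could look like this. It is only a sketch: PortProbe and canConnect are made-up names, the 2-second timeout is arbitrary, and a successful TCP connect only proves the port is open, not that ZooKeeper is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Sketch: a telnet-style TCP reachability check.
// PortProbe is a hypothetical helper, not part of HBase or ZooKeeper.
class PortProbe {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts and unresolvable host names
            return false;
        } finally {
            try {
                socket.close();
            } catch (IOException ignored) {
                // Nothing useful to do if close fails
            }
        }
    }

    public static void main(String[] args) {
        // host1 and host2 are the quorum hosts from hbase-site.xml;
        // localhost is included because that is where a misconfigured client ends up
        String[] hosts = { "host1", "host2", "localhost" };
        for (String host : hosts) {
            System.out.println(host + ":2181 reachable = " + canConnect(host, 2181, 2000));
        }
    }
}
```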
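Out of options, I fell back on the crudest method: stepping through the source code in the debugger. I found that it was stuck in the code that checks whether ZooKeeper is available.
A localhost showed up here.
That reminded me of what I had seen on the HBase admin page (host1:60010).
My guess was that this localhost was the problem: it made the Java program probe port 2181 on the local machine, which would of course hang.
Then I checked the hbase-site.xml file on host1 and, sure enough, the hbase.zookeeper.quorum parameter was not configured. So I added it, restarted hbase-master, and opened host1:60010 again.
This looked more like it. I ran the Java code again, and it still hung! Breaking at checkIfBaseNodeAvailable showed that the configuration it got was still localhost:2181! So the problem had to be in how the configuration file was parsed.
I went on to set breakpoints in the configuration parsing.
Then I found that the file the program was reading was not hbase-site.xml but zoo.cfg. What was going on?! I looked at the official documentation: "HBase will prefer the configuration in zoo.cfg, overriding the values in hbase-site.xml." Damn! In that case, I simply copied ZooKeeper's configuration file zoo.cfg into the conf folder and tried again. Still no luck.
Then it suddenly hit me: this hbase-site.xml was never included in the build, so how could Java possibly read it? So I ditched the tutorial, created a resources folder myself, put the file in it, and ran again. It worked!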
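In hindsight, a two-minute check would have saved me the day. HBaseConfiguration.create() looks for hbase-site.xml as a classpath resource, so you can ask the class loader directly whether the file is visible before touching HBase at all. A minimal sketch in plain Java (ClasspathCheck and locate are names I made up):

```java
import java.net.URL;

// Sketch: check whether hbase-site.xml is visible on the classpath.
// ClasspathCheck is a hypothetical helper, not an HBase class.
class ClasspathCheck {
    /** Returns the URL of a classpath resource, or null if it is not visible. */
    static URL locate(String resourceName) {
        return ClasspathCheck.class.getClassLoader().getResource(resourceName);
    }

    public static void main(String[] args) {
        URL url = locate("hbase-site.xml");
        if (url == null) {
            System.out.println("hbase-site.xml is NOT on the classpath");
        } else {
            System.out.println("hbase-site.xml found at " + url);
        }
    }
}
```

Run it from the same project as your HBase code. If it reports that the file is not on the classpath, the client falls back to its defaults, which is consistent with the localhost:2181 I kept seeing in the debugger. Once the file is found, printing config.get("hbase.zookeeper.quorum") in your HBase code should show your real hosts.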