Lesson 8: Calling HBase from Java


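This article explains how to use the Java API to interact with HBase, covering how to create a table, insert data, search, and delete a table. Example code shows how to build a Maven project in Eclipse and configure the HBase environment.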


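Notes

This article is based on CentOS 6.x + CDH 5.x
In this example HBase is installed in cluster mode
This article is based on Maven 3.5+ and Eclipse 4.3
Please do read the references at the end of the tutorial

We don't set up HBase in order to query data from the shell; we set it up to write applications on top of it, so learning how to call HBase from Java is a must.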

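Environment setup

Creating the project

Open Eclipse and create a Maven project. Choose the quickstart archetype; the artifactId and groupId can be anything you like.

Edit pom.xml: set the JDK level to 1.6 or higher and pull in the Hadoop-related JARs (here, hbase-client).
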
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"  
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">  
    <modelVersion>4.0.0</modelVersion>  
  
    <groupId>org.crazycake</groupId>  
    <artifactId>playhbase</artifactId>  
    <version>0.0.1-SNAPSHOT</version>  
    <packaging>jar</packaging>  
  
    <name>playhbase</name>  
    <url>http://maven.apache.org</url>  
  
    <properties>  
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>  
    </properties>  
    <dependencies>  
        <dependency>  
            <groupId>junit</groupId>  
            <artifactId>junit</artifactId>  
            <version>3.8.1</version>  
            <scope>test</scope>  
        </dependency>  
        <dependency>  
            <groupId>org.apache.hbase</groupId>  
            <artifactId>hbase-client</artifactId>  
            <version>0.98.4-hadoop2</version>  
        </dependency>  
    </dependencies>  
  
    <build>  
        <!-- <resources> is only valid inside <build>. The directory points at
             src/main/resources, the folder that holds hbase-site.xml in this article. -->  
        <resources>  
            <resource>  
                <directory>${basedir}/src/main/resources</directory>  
                <filtering>false</filtering>  
                <includes>  
                    <include>hbase-site.xml</include>  
                </includes>  
            </resource>  
        </resources>  
        <plugins>  
            <plugin>  
                <artifactId>maven-compiler-plugin</artifactId>  
                <version>2.0.2</version>  
                <configuration>  
                    <source>1.6</source>  
                    <target>1.6</target>  
                    <encoding>UTF-8</encoding>  
                    <optimize>true</optimize>  
                    <compilerArgument>-nowarn</compilerArgument>  
                </configuration>  
            </plugin>  
            <plugin>  
                <groupId>org.apache.maven.plugins</groupId>  
                <artifactId>maven-shade-plugin</artifactId>  
                <version>2.3</version>  
                <configuration>  
                    <transformers>  
                        <transformer  
                            implementation="org.apache.maven.plugins.shade.resource.ApacheLicenseResourceTransformer">  
                        </transformer>  
                    </transformers>  
                </configuration>  
                <executions>  
                    <execution>  
                        <phase>package</phase>  
                        <goals>  
                            <goal>shade</goal>  
                        </goals>  
                    </execution>  
                </executions>  
            </plugin>  
        </plugins>  
    </build>  
</project>  

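Besides the HBase JAR, the POM also pulls in a Maven plugin called maven-shade-plugin. This plugin prevents the "duplicate license file" problem: duplicated license files cause errors at run time on an HDInsight cluster.
The configuration also adds a resource that references a configuration file, hbase-site.xml. This file is where the HBase connection information goes.

Creating the configuration file

Create the src/main/resources folder and add it as a source folder, then create an hbase-site.xml file inside it with the following content:
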
<?xml version="1.0"?>  
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>  
<!--  
/**  
 * Copyright 2010 The Apache Software Foundation  
 *  
 * Licensed to the Apache Software Foundation (ASF) under one  
 * or more contributor license agreements.  See the NOTICE file  
 * distributed with this work for additional information  
 * regarding copyright ownership.  The ASF licenses this file  
 * to you under the Apache License, Version 2.0 (the  
 * "License"); you may not use this file except in compliance  
 * with the License.  You may obtain a copy of the License at  
 *  
 *     http://www.apache.org/licenses/LICENSE-2.0  
 *  
 * Unless required by applicable law or agreed to in writing, software  
 * distributed under the License is distributed on an "AS IS" BASIS,  
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.  
 * See the License for the specific language governing permissions and  
 * limitations under the License.  
 */  
-->  
<configuration>  
  <property>  
    <name>hbase.cluster.distributed</name>  
    <value>true</value>  
  </property>  
  <property>  
    <name>hbase.zookeeper.quorum</name>  
    <value>host1,host2</value>  
  </property>  
  <property>  
    <name>hbase.zookeeper.property.clientPort</name>  
    <value>2181</value>  
  </property>  
</configuration>  

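host1 and host2 are the machines on which you installed ZooKeeper. I only set up two machines, so there are only host1 and host2. Normally you need at least 3 ZooKeeper nodes, and the number should grow in odd steps (3, 5, 7, ...).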
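Why at least three, and why odd? ZooKeeper keeps working only while a strict majority of the ensemble is reachable. The little sketch below just does that arithmetic; the class and method names are made up for illustration and are not part of any ZooKeeper or HBase API.

```java
// Sketch: how many ZooKeeper servers may fail before the ensemble loses its majority.
// QuorumMath is a hypothetical helper written only for this illustration.
final class QuorumMath {

    /** Smallest number of servers that forms a strict majority of the ensemble. */
    static int majority(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    /** Number of servers that can be down while a majority is still reachable. */
    static int toleratedFailures(int ensembleSize) {
        return ensembleSize - majority(ensembleSize);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.println(n + " server(s): majority=" + majority(n)
                    + ", tolerated failures=" + toleratedFailures(n));
        }
    }
}
```

A two-node setup like mine tolerates zero failures, the same as a single node, and a fourth node tolerates no more failures than three do. That is why even sizes buy you nothing.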

Operations

Creating a table and inserting data

Create a class named CreateTable (CreateTable.java). This example comes from: http://azure.microsoft.com/en-us/documentation/articles/hdinsight-hbase-build-java-maven/?rnd=1 and I translated the comments.

package org.crazycake.playhbase;  
  
import java.io.IOException;  
  
import org.apache.hadoop.conf.Configuration;  
import org.apache.hadoop.hbase.HBaseConfiguration;  
import org.apache.hadoop.hbase.HColumnDescriptor;  
import org.apache.hadoop.hbase.HTableDescriptor;  
import org.apache.hadoop.hbase.MasterNotRunningException;  
import org.apache.hadoop.hbase.TableName;  
import org.apache.hadoop.hbase.ZooKeeperConnectionException;  
import org.apache.hadoop.hbase.client.HBaseAdmin;  
import org.apache.hadoop.hbase.client.HTable;  
import org.apache.hadoop.hbase.client.Put;  
import org.apache.hadoop.hbase.util.Bytes;  
  
public class CreateTable {  
    public static void main(String[] args) throws MasterNotRunningException, ZooKeeperConnectionException, IOException {  
        Configuration config = HBaseConfiguration.create();  
          
        // The commented-out lines below set the ZooKeeper parameters programmatically.
        // Use this approach if you have no hbase-site.xml or want to change the settings at run time.
        //  
        // config.set("hbase.zookeeper.quorum",  
        //            "zookeepernode0,zookeepernode1,zookeepernode2");  
        //config.set("hbase.zookeeper.property.clientPort", "2181");  
        //config.set("hbase.cluster.distributed", "true");  
          
        // Create an admin object from the configuration  
        HBaseAdmin admin = new HBaseAdmin(config);  
          
        // Describe the table to create  
        HTableDescriptor tableDescriptor = new HTableDescriptor(TableName.valueOf("people"));  
          
        // Add 2 column families  
        tableDescriptor.addFamily(new HColumnDescriptor("name"));  
        tableDescriptor.addFamily(new HColumnDescriptor("contactinfo"));  
          
        admin.createTable(tableDescriptor);  
        // The admin object is no longer needed, so release its connection  
        admin.close();  
          
        // Next, put some data in  
        String[][] people = {  
            { "1", "Marcel", "Haddad", "marcel@fabrikam.com"},  
            { "2", "Franklin", "Holtz", "franklin@contoso.com" },  
            { "3", "Dwayne", "McKee", "dwayne@fabrikam.com" },  
            { "4", "Rae", "Schroeder", "rae@contoso.com" },  
            { "5", "Rosalie", "burton", "rosalie@fabrikam.com"},  
            { "6", "Gabriela", "Ingram", "gabriela@contoso.com"} };  
  
        HTable table = new HTable(config, "people");  
          
        // Insert the data into the table  
        for (int i = 0; i < people.length; i++) {  
          // The first column is the row key  
          Put person = new Put(Bytes.toBytes(people[i][0]));  
            
          // e.g. put "Marcel" into the "first" column of the "name" column family  
          person.add(Bytes.toBytes("name"), Bytes.toBytes("first"), Bytes.toBytes(people[i][1]));  
          person.add(Bytes.toBytes("name"), Bytes.toBytes("last"), Bytes.toBytes(people[i][2]));  
          person.add(Bytes.toBytes("contactinfo"), Bytes.toBytes("email"), Bytes.toBytes(people[i][3]));  
          table.put(person);  
        }  
          
        // Finally, remember to flush the writes and close the table  
        table.flushCommits();  
        table.close();  
    }  
}  

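Note: before running, connect with hbase shell and run the list command to check that everything works. If it doesn't, use jps to check whether all the HBase and Hadoop services are up, and start whatever is missing. Otherwise, when the Java program fails you won't know where the error is and will waste time hunting through the code for the cause.

Once everything is ready, run the code!

If your code hangs for a long time, don't just sit and wait; go to the machine where HBase is deployed and read the log:

tail -200f  /var/log/hbase/hbase-hbase-master-host1.localdomain.log  

After the run finishes, check the result:
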
hbase(main):003:0> scan 'people'  
ROW                              COLUMN+CELL                                                                                    
 1                               column=contactinfo:email, timestamp=1421338694666, value=marcel@fabrikam.com                   
 1                               column=name:first, timestamp=1421338694666, value=Marcel                                       
 1                               column=name:last, timestamp=1421338694666, value=Haddad                                        
 2                               column=contactinfo:email, timestamp=1421338694932, value=franklin@contoso.com                  
 2                               column=name:first, timestamp=1421338694932, value=Franklin                                     
 2                               column=name:last, timestamp=1421338694932, value=Holtz                                         
 3                               column=contactinfo:email, timestamp=1421338694977, value=dwayne@fabrikam.com                   
 3                               column=name:first, timestamp=1421338694977, value=Dwayne                                       
 3                               column=name:last, timestamp=1421338694977, value=McKee                                         
 4                               column=contactinfo:email, timestamp=1421338695034, value=rae@contoso.com                       
 4                               column=name:first, timestamp=1421338695034, value=Rae                                          
 4                               column=name:last, timestamp=1421338695034, value=Schroeder                                     
 5                               column=contactinfo:email, timestamp=1421338695054, value=rosalie@fabrikam.com                  
 5                               column=name:first, timestamp=1421338695054, value=Rosalie                                      
 5                               column=name:last, timestamp=1421338695054, value=burton                                        
 6                               column=contactinfo:email, timestamp=1421338695076, value=gabriela@contoso.com                  
 6                               column=name:first, timestamp=1421338695076, value=Gabriela                                     
 6                               column=name:last, timestamp=1421338695076, value=Ingram                                        
6 row(s) in 0.3910 seconds  

Searching by email

Create a SearchByEmail class:

package org.crazycake.playhbase;  
  
import java.io.IOException;  
  
import org.apache.hadoop.conf.Configuration;  
import org.apache.hadoop.hbase.HBaseConfiguration;  
import org.apache.hadoop.hbase.client.HTable;  
import org.apache.hadoop.hbase.client.Result;  
import org.apache.hadoop.hbase.client.ResultScanner;  
import org.apache.hadoop.hbase.client.Scan;  
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;  
import org.apache.hadoop.hbase.filter.RegexStringComparator;  
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;  
import org.apache.hadoop.hbase.util.Bytes;  
  
/** 
 * Search for a user by email 
 * @author alexxiyang (https://github.com/alexxiyang) 
 * 
 */  
public class SearchByEmail {  
    public static void main(String[] args) throws IOException {  
          
        // Create the configuration (reads hbase-site.xml from the classpath)  
        Configuration config = HBaseConfiguration.create();  
  
        // Open the table  
        HTable table = new HTable(config, "people");  
          
        // Define the column families and columns (qualifiers) used below  
        // Column family  
        byte[] contactFamily = Bytes.toBytes("contactinfo");  
        // Column  
        byte[] emailQualifier = Bytes.toBytes("email");  
        // Column family  
        byte[] nameFamily = Bytes.toBytes("name");  
        // Columns  
        byte[] firstNameQualifier = Bytes.toBytes("first");  
        byte[] lastNameQualifier = Bytes.toBytes("last");  
        // Create a regular-expression comparator  
        RegexStringComparator emailFilter = new RegexStringComparator("rosalie@fabrikam.com");  
        // Create a filter and pass the regex comparator to it  
        SingleColumnValueFilter filter = new SingleColumnValueFilter(contactFamily, emailQualifier, CompareOp.EQUAL, emailFilter);  
  
        // Create a Scan object  
        Scan scan = new Scan();  
          
        // Attach the filter to the scan  
        scan.setFilter(filter);  
  
        // Run the query and get the results  
        ResultScanner results = table.getScanner(scan);  
        // Iterate over the results and print the data  
        for (Result result : results) {  
            // Bytes.toString decodes as UTF-8 and returns null for a missing cell  
            String id = Bytes.toString(result.getRow());  
            String firstName = Bytes.toString(result.getValue(nameFamily, firstNameQualifier));  
            String lastName = Bytes.toString(result.getValue(nameFamily, lastNameQualifier));  
            System.out.println(firstName + " " + lastName + " - ID: " + id);  
            String email = Bytes.toString(result.getValue(contactFamily, emailQualifier));  
            System.out.println(firstName + " " + lastName + " - " + email + " - ID: " + id);  
        }  
        // Close the scanner  
        results.close();  
          
        // Close the table  
        table.close();  
    }  
}  

Output

Rosalie burton - ID: 5  
Rosalie burton - rosalie@fabrikam.com - ID: 5  
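One thing to keep in mind about the search above: the email address is handed to the comparator as a regular expression, so the "." matches any character and nothing anchors the pattern to the start or end of the value. As far as I know RegexStringComparator does an unanchored search (like Matcher.find()), but treat that as an assumption and check the Javadoc of your HBase version. The sketch below uses only java.util.regex to show the difference; EmailRegexDemo and its methods are hypothetical names for this illustration.

```java
import java.util.regex.Pattern;

// Sketch: unanchored regex search versus an exact, quoted match.
// This uses plain java.util.regex; it does not call HBase.
final class EmailRegexDemo {

    /** True if the pattern occurs anywhere in value (find-style semantics). */
    static boolean found(String regex, String value) {
        return Pattern.compile(regex).matcher(value).find();
    }

    /** Builds an anchored pattern that matches the literal text and nothing else. */
    static String exact(String literal) {
        return "^" + Pattern.quote(literal) + "$";
    }

    public static void main(String[] args) {
        String email = "rosalie@fabrikam.com";
        String lookalike = "xrosalie@fabrikamXcom.cn";

        System.out.println(found(email, email));            // the intended match
        System.out.println(found(email, lookalike));        // also matches: "." accepts "X"
        System.out.println(found(exact(email), lookalike)); // rejected by the anchored, quoted pattern
    }
}
```

With six rows of sample data this makes no difference, but on real data a pattern like the one returned by exact(...) is probably closer to what "search by email" means.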

Deleting the table

Create a DeleteTable class:

package org.crazycake.playhbase;  
  
import java.io.IOException;  
  
import org.apache.hadoop.conf.Configuration;  
import org.apache.hadoop.hbase.HBaseConfiguration;  
import org.apache.hadoop.hbase.client.HBaseAdmin;  
  
/** 
 * Delete the table 
 * @author alexxiyang (https://github.com/alexxiyang) 
 * 
 */  
public class DeleteTable {  
    public static void main(String[] args) throws IOException {  
          
        // Create the configuration  
        Configuration config = HBaseConfiguration.create();  
  
        // Create the admin object  
        HBaseAdmin admin = new HBaseAdmin(config);  
  
        // A table must be disabled before it can be deleted  
        admin.disableTable("people");  
        admin.deleteTable("people");  
        admin.close();  
    }  
}  

Check the result in HBase:

hbase(main):004:0> list  
TABLE                                                                                                                           
employee                                                                                                                        
employee2                                                                                                                       
student                                                                                                                         
users                                                                                                                           
4 row(s) in 4.8460 seconds  
  
=> ["employee", "employee2", "student", "users"]  

The people table is gone.

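Problems I ran into

There was a small detour along the way that cost me a whole day:

At first, when I followed this tutorial http://azure.microsoft.com/en-us/documentation/articles/hdinsight-hbase-build-java-maven/?rnd=1 the code hung for a long time when I ran it. I went to the hbase-master machine to look at the log: nothing abnormal.

I then looked at the log on the RegionServer machine: nothing abnormal there either.

So I thought something might be wrong with HBase itself. I stopped the program and tried a few operations in hbase shell: I created a table and inserted one row, and everything worked.
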
hbase(main):002:0> create 'users','info'   
0 row(s) in 36.5110 seconds  
  
=> Hbase::Table - users  
hbase(main):003:0> list  
TABLE                                                                                                                           
employee                                                                                                                        
employee2                                                                                                                       
student                                                                                                                         
users                                                                                                                           
4 row(s) in 0.4520 seconds  
  
=> ["employee", "employee2", "student", "users"]  
hbase(main):004:0> put 'users',1,'info:name','ted'   
0 row(s) in 0.8350 seconds  
  
hbase(main):005:0> scan 'users'  
ROW                              COLUMN+CELL                                                                                    
 1                               column=info:name, timestamp=1421252020520, value=ted                                           
1 row(s) in 0.3140 seconds  

That meant the problem was probably in ZooKeeper, because the Java API does not talk to HBase directly; the client first goes through ZooKeeper. So I went to look at the ZooKeeper log, following it with tail:

tail -200f /var/log/zookeeper/zookeeper.log  

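Then I ran the Java code and watched whether ZooKeeper reported any exception. To my disappointment, ZooKeeper reported nothing at all.

Connecting to both ZooKeeper machines with telnet also worked, so no problem there either.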
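If telnet is not installed on the machine where you run the Java code, the same reachability check can be done from Java itself. This is only a sketch: PortCheck is a name I made up, and host1, host2 and port 2181 are the values from my hbase-site.xml, so substitute your own.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Sketch: a telnet-style check that a TCP port accepts connections.
// PortCheck is a hypothetical helper, not part of HBase or ZooKeeper.
final class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts and unresolvable host names
            return false;
        }
    }

    public static void main(String[] args) {
        String[] hosts = {"host1", "host2"}; // assumption: the ZooKeeper hosts used in this article
        for (String host : hosts) {
            System.out.println(host + ":2181 reachable = " + canConnect(host, 2181, 2000));
        }
    }
}
```

Note that a successful connection only shows that the port is open. It says nothing about which ZooKeeper address the HBase client is actually using, which turned out to be the real problem here.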

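With no other option left, I fell back on the crudest method: stepping into the library source code with the debugger. I found that the program was stuck in the code that checks whether ZooKeeper is available.

A localhost showed up here.

That reminded me of what I had seen on the HBase admin UI (host1:60010).

My guess was that this localhost was the problem: it made the Java program probe port 2181 on the local machine, which would of course hang.

I then checked the hbase-site.xml file on host1 and found that, sure enough, the hbase.zookeeper.quorum parameter was not configured. I added the parameter, restarted hbase-master, and visited host1:60010 again.

Now it looked more reasonable. I ran the Java code again, and it still hung! A breakpoint in checkIfBaseNodeAvailable showed that the configuration it obtained was still localhost:2181! So the problem had to be in how the configuration file was being parsed.

I went on to debug the parsing of the configuration file.

I then discovered that the configuration file the program was reading was not hbase-site.xml but zoo.cfg. What was going on?! I checked the official documentation: "HBase gives preference to the configuration in zoo.cfg, overriding what is in hbase-site.xml." Unbelievable! In that case, I simply copied ZooKeeper's configuration file zoo.cfg into the conf folder and tried again. Still no luck.

Then it suddenly occurred to me that this hbase-site.xml was never included in the build at all, so how could Java possibly read it? I ditched the tutorial on this point, created the resources folder myself, and put the file in that directory. I ran the program again and it worked!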
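Looking back, a thirty-second check would have saved me the day. HBaseConfiguration.create() looks for hbase-site.xml on the classpath, so you can ask the class loader directly whether the file is visible before involving HBase at all. A minimal sketch (ClasspathCheck is a hypothetical name; run it from the same project so that it sees the same classpath as your HBase code):

```java
import java.net.URL;

// Sketch: check whether configuration files are visible on the classpath.
// ClasspathCheck is a hypothetical helper written for this article.
final class ClasspathCheck {

    /** Returns the location of a classpath resource, or null if it is not visible. */
    static URL locate(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResource(resourceName);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"hbase-site.xml", "zoo.cfg"}) {
            URL url = locate(name);
            System.out.println(name + " -> " + (url == null ? "NOT on classpath" : url));
        }
    }
}
```

If hbase-site.xml is reported as not on the classpath, the client has no quorum setting of its own and ends up with the default, which matches the localhost:2181 I kept seeing in the debugger.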