Hbase实战--HBASE的API操作（增删改查）

最新推荐文章于 2025-06-20 09:47:41 发布

阿华田512

最新推荐文章于 2025-06-20 09:47:41 发布

阅读量4.5k

点赞数 3

CC 4.0 BY-SA版权

分类专栏： hbase 文章标签： hbase插入数据的几种方式 hbase的api spark操作hbase BufferedMutatorParams

本文链接：https://blog.youkuaiyun.com/aA518189/article/details/85298889

hbase 专栏收录该内容

3 篇文章

订阅专栏

扫一扫加入大数据公众号和技术交流群，了解更多大数据技术，还有免费资料等你哦

连接HBase的正确姿势

Connection是什么

在众多HBase用户中，常见的使用Connection的错误方法有：

（1）自己实现一个Connection对象的资源池，每次使用都从资源池中取出一个Connection对象；
（2）每个线程一个Connection对象。
（3）每次访问HBase的时候临时创建一个Connection对象，使用完之后调用close关闭连接。
从这些做法来看，这些用户显然是把Connection对象当成了单机数据库里面的连接对象来用了。然而，作为一个分布式数据库，HBase客户端需要和多个服务器中的不同服务角色建立连接，所以HBase客户端中的Connection对象并不是简单对应一个socket连接。HBase的API文档当中对Connection的定义是：
A cluster connection encapsulating lower level individual connections to actual servers and a connection to zookeeper.
我们知道，HBase访问一条数据的过程中，需要连接三个不同的服务角色：

（1）Zookeeper
（2）HBase Master
（3）HBase RegionServer

而HBase客户端的Connection包含了对以上三种socket连接的封装。Connection对象和实际的socket连接之间的对应关系如下图：

在HBase客户端代码中，真正对应socket连接的是RpcConnection对象。HBase使用PoolMap这种数据结构来存储客户端到HBase服务器之间的连接。PoolMap封装了ConcurrentHashMap>的结构，key是ConnectionId（封装了服务器地址和用户ticket）,value是一个RpcConnection对象的资源池。当HBase需要连接一个服务器时，首先会根据ConnectionId找到对应的连接池，然后从连接池中取出一个连接对象。HBase中提供了三种资源池的实现，分别是Reusable，RoundRobin和ThreadLocal。

具体实现可以通过hbase.client.ipc.pool.type配置项指定，默认为Reusable。连接池的大小也可以通过hbase.client.ipc.pool.size配置项指定，默认为1。

连接HBase的正确方式

在HBase中Connection类已经实现了对连接的管理功能，所以我们不需要自己在Connection之上再做额外的管理。另外，Connection是线程安全的，而Table和Admin则不是线程安全的，因此正确的做法是一个进程共用一个Connection对象，而在不同的线程中使用单独的Table和Admin对象。

所有进程共用一个connection对象

connection = ConnectionFactory.createConnection(config);

每个线程使用单独的table对象

 Table table = connection.getTable(TableName.valueOf("test"));
           try {
               ...
           } finally {
               table.close();
           }

HBase客户端默认的是连接池大小是1，也就是每个RegionServer 1个连接。如果应用需要使用更大的连接池或指定其他的资源池类型，也可以通过修改配置实现：

config.set("hbase.client.ipc.pool.type",...);
config.set("hbase.client.ipc.pool.size",...);
connection = ConnectionFactory.createConnection(config);

con...创建一个连接

Connection conn = ConnectionFactory.createConnection(conf);

拿到一个DDL操作器：表管理器admin

Admin admin = conn.getAdmin();

用表管理器的api去建表、删表、修改表定义

admin.createTable(HTableDescriptor descriptor);

public class HbaseApiDemo {
@Test
public void testCreateTable() throws Exception {
创建hbase的配置对象
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "cts02:2181,cts03:2181,cts04:2181");
创建hbase的连接对象
Connection conn = ConnectionFactory.createConnection(conf);
 DDL操作工具
Admin admin = conn.getAdmin();
创建一个表定义描述对象
HTableDescriptor tUser = new HTableDescriptor(TableName.valueOf("t_user"));
构造一个列族描述对象
HColumnDescriptor f1 = new HColumnDescriptor("f1");
HColumnDescriptor f2 = new HColumnDescriptor("f2");
在表描述对象中加入列族描述
tUser.addFamily(f1);
tUser.addFamily(f2);
调用admin的建表方法来建表
admin.createTable(tUser);
// 关闭连接
admin.close();
conn.close();
}

修改表定义

public void testAlterTable() throws Exception {
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "cts02:2181,cts03:2181,cts04:2181");
Connection conn = ConnectionFactory.createConnection(conf);
DDL操作工具
Admin admin = conn.getAdmin();
HTableDescriptor user = admin.getTableDescriptor(TableName.valueOf("t_user"));
HColumnDescriptor f3 = new HColumnDescriptor("f3");
f3.setMaxVersions(3);
user.addFamily(f3);
admin.modifyTable(TableName.valueOf("t_user"), user);
admin.close();
conn.close();
}

删除表

public void testDropTable() throws Exception {
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "cts02:2181,cts03:2181,cts04:2181");
Connection conn = ConnectionFactory.createConnection(conf);
// DDL操作工具
Admin admin = conn.getAdmin();
// 先禁用
admin.disableTable(TableName.valueOf("t_user"));
// 再删除
admin.deleteTable(TableName.valueOf("t_user"));
admin.close();
conn.close();
}

插入|更新

public void testPut() throws Exception {
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum", "cts02:2181,cts03:2181,cts04:2181");
Connection conn = ConnectionFactory.createConnection(conf);
//获得表对象
Table table = conn.getTable(TableName.valueOf("t_user"));
//创建put对象 ，一个put操作一行数据，并设置rowkey名称
Put put1 = new Put("001".getBytes());
//添加一个列，要制定列族，和列名称
put1.addColumn("f1".getBytes(), "name".getBytes(), "张三".getBytes());
put1.addColumn("f1".getBytes(), Bytes.toBytes("age"), Bytes.toBytes(28));
Put put2 = new Put("002".getBytes());
// 添加一个列
put2.addColumn("f1".getBytes(), "name".getBytes(), "李四".getBytes());
put2.addColumn("f1".getBytes(), Bytes.toBytes("age"), Bytes.toBytes(38));
ArrayList<Put> puts = new ArrayList<>();
puts.add(put1);
puts.add(put2);
table.put(puts);
table.close();
conn.close();

sparkStreaming向hbase写数据

SparkStreaming怎么向Hbase中写数据。首先，需要说一下，下面的这个方法。
foreachRDD(func)
最通用的输出操作，把func作用于从stream生成的每一个RDD。
注意：这个函数是在运行streaming程序的driver进程中执行的。
下面跟着思路，看一下，怎么优雅的向Hbase中写入数据
向外部写数据常见的错误：
向外部数据库写数据，通常会建立连接，使用连接发送数据(也就是保存数据)。
开发者可能在driver中创建连接，而在spark worker 中保存数据
例如：

 dstream.foreachRDD { rdd =>
  val connection = createNewConnection()  // 这个会在driver中执行
  rdd.foreach { record =>
    connection.send(record) //这个会在 worker中执行
  }
}

上面这种写法是错误的！上面的写法，需要connection 对象被序列化，然后从driver发送到worker。
这样的connection是很少在机器之间传输的。知道这个问题后，我们可以写出以下的，修改后的代码：

dstream.foreachRDD { rdd =>
  rdd.foreach { record =>
    val connection = createNewConnection()
    connection.send(record)
    connection.close()
  }
}

这种写法也是不对的。这会导致，对于每条数据，都创建一个connection(创建connection是消耗资源的)。
下面的方法会好一些：

 dstream.foreachRDD { rdd =>
  rdd.foreachPartition { partitionOfRecords =>
    val connection = createNewConnection()
    partitionOfRecords.foreach(record => connection.send(record))
    connection.close()
  }
}

上面的方法，使用 rdd.foreachPartition 创建一个connection 对象，一个RDD分区中的所有数据，都使用这一个connection。
更优的方法，在多个RDD之间，connection对象是可以重用的，所以可以创建一个连接池。如下

dstream.foreachRDD { rdd =>
  rdd.foreachPartition { partitionOfRecords =>
    // ConnectionPool是一个静态的,延迟初始化的连接池
    val connection = ConnectionPool.getConnection()
    partitionOfRecords.foreach(record => connection.send(record))
    ConnectionPool.returnConnection(connection)  // 返回到池中 以便别人使用  }
}

连接池中的连接应该是，应需求而延迟创建，并且，如果一段时间没用，就超时了(也就是关闭该连接)

实战开发规范操作

在项目实际开发中我们操作hbase 一般需要单独创建一个hbase工具类，方便之后的操作，一下代码涉及的方法详细解析参考上篇博客：https://blog.youkuaiyun.com/aA518189/article/details/89190693

scala版----hbase工具类案例

package com.util.hadoop
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase._
import org.apache.hadoop.hbase.client._
import org.apache.hadoop.hbase.util.Bytes
import scala.collection.mutable
object HbaseUtil {
  var conf: Configuration = _
  //线程池
  lazy val connection: Connection = ConnectionFactory.createConnection(conf)
  lazy val admin: Admin = connection.getAdmin
  /**
    * hbase conf
    * @param quorum hbase的zk地址
    * @param port   zk端口2181
    * @return
    */
  def setConf(quorum: String, port: String): Unit = {
    val conf = HBaseConfiguration.create()
    conf.set("hbase.zookeeper.quorum", quorum)
    conf.set("hbase.zookeeper.property.clientPort", port)
    this.conf = conf
  }
  /**
    * 如果不存在就创建表
    * @param tableName 命名空间：表名
    * @param columnFamily 列族
    */
  def createTable(tableName: String, columnFamily: String): Unit = {
    val tbName = TableName.valueOf(tableName)
    if (!admin.tableExists(tbName)) {
      val htableDescriptor = new HTableDescriptor(tbName)
      val hcolumnDescriptor = new HColumnDescriptor(columnFamily)
      htableDescriptor.addFamily(hcolumnDescriptor)
      admin.createTable(htableDescriptor)
    }
  }
  def hbaseScan(tableName: String): ResultScanner = {
    val scan = new Scan()
    val table = connection.getTable(TableName.valueOf(tableName))
    table.getScanner(scan)
//    val scanner: CellScanner = rs.next().cellScanner()
  }
  /**
    * 获取hbase单元格内容
    * @param tableName 命名空间：表名
    * @param rowKey rowkey
    * @return 返回单元格组成的List
    */
  def getCell(tableName: String, rowKey: String): mutable.Buffer[Cell] = {
    val get = new Get(Bytes.toBytes(rowKey))
    /*if (qualifier == "") {
      get.addFamily(family.getBytes())
    } else {
      get.addColumn(family.getBytes(), qualifier.getBytes())
    }*/
    val table = connection.getTable(TableName.valueOf(tableName))
    val result: Result = table.get(get)
    import scala.collection.JavaConverters._
    result.listCells().asScala
    /*.foreach(cell=>{
    val rowKey=Bytes.toString(CellUtil.cloneRow(cell))
    val timestamp = cell.getTimestamp;  //取到时间戳
    val family = Bytes.toString(CellUtil.cloneFamily(cell))  //取到族列
    val qualifier  = Bytes.toString(CellUtil.cloneQualifier(cell))  //取到修饰名
    val value = Bytes.toString(CellUtil.cloneValue(cell))
    println(rowKey,timestamp,family,qualifier,value)
  })*/
  }
  /**
    * 单条插入
    * @param tableName 命名空间：表名
    * @param rowKey rowkey
    * @param family 列族
    * @param qualifier column列
    * @param value 列值
    */
  def singlePut(tableName: String, rowKey: String, family: String, qualifier: String, value: String): Unit = {
    //向表中插入数据//向表中插入数据
    //a.单个插入
    val put: Put = new Put(Bytes.toBytes(rowKey)) //参数是行健row01
    put.addColumn(Bytes.toBytes(family), Bytes.toBytes(qualifier), Bytes.toBytes(value))
    //获得表对象
    val table: Table = connection.getTable(TableName.valueOf(tableName))
    table.put(put)
    table.close()
  }
  /**
    * 删除数据
    * @param tbName 表名
    * @param row rowkey
    */
  def deleteByRow(tbName:String,row:String): Unit ={
    val delete = new Delete(Bytes.toBytes(row))
//    delete.addColumn(Bytes.toBytes("fm2"), Bytes.toBytes("col2"))
    val table = connection.getTable(TableName.valueOf(tbName))
    table.delete(delete)
  }
  def close(): Unit = {
    admin.close()
    connection.close()
  }
  def main(args: Array[String]): Unit = {
    setConf("ip", "2181")
    /*singlePut("kafka_offset:topic_offset_range", "gid_topic_name", "info", "partition0", "200")
    singlePut("kafka_offset:topic_offset_range", "gid_topic_name", "info", "partition1", "300")
    singlePut("kafka_offset:to·pic_offset_range", "gid_topic_name", "info", "partition2", "100")*/
    val cells = getCell("kafka_offset:grampus_double_groupid", "grampus_erp")
    cells.foreach(cell => {
      val rowKey = Bytes.toString(CellUtil.cloneRow(cell))
      val timestamp = cell.getTimestamp; //取到时间戳
      val family = Bytes.toString(CellUtil.cloneFamily(cell)) //取到族列
      val qualifier = Bytes.toString(CellUtil.cloneQualifier(cell)) //取到修饰名
      val value = Bytes.toString(CellUtil.cloneValue(cell))
      println(rowKey, timestamp, family, qualifier, value)
    })
    /*val topics=List("")
    val resultScanner: ResultScanner = hbaseScan("kafka_offset:topic_offset_range")
    resultScanner.asScala.foreach(rs=>{
      val cells = rs.listCells()
      cells.asScala.foreach(cell => {
        val rowKey = Bytes.toString(CellUtil.cloneRow(cell))
        val qualifier = Bytes.toString(CellUtil.cloneQualifier(cell)) //取到修饰名
        val value = Bytes.toString(CellUtil.cloneValue(cell))
      })
    })*/
//    deleteByRow("bi_odi:redis_ip","900150983cd24fb0d6963f7d28e17f72")
    this.close()
  }
}

java版---hbase工具类案例

public class HbaseUtil {
    public static Configuration conf = null;
    public static Connection connection = null;
    public static Admin admin = null;
    /**
     * @desc 取得连接
     */
    public static void setConf(String quorum, String port) {
        try {
            conf = HBaseConfiguration.create();
            conf.set("hbase.zookeeper.quorum", quorum);//zookeeper地址
            conf.set("hbase.zookeeper.property.clientPort", port);
            connection = ConnectionFactory.createConnection(conf);
            admin = connection.getAdmin();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    /**
     * @desc 连关闭接
     */
    public static void close() {
        try {
            if (connection != null) {
                connection.close();
            }
            if (admin != null) {
                admin.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    /**
     * @desc 创建表
     */
    public static void createTable(String tableName, String columnFamily) {
        try {
            TableName tbName = TableName.valueOf(tableName);
            if (!admin.tableExists(tbName)) {
                HTableDescriptor hTableDescriptor = new HTableDescriptor(tbName);
                HColumnDescriptor hColumnDescriptor = new HColumnDescriptor(columnFamily);
                hTableDescriptor.addFamily(hColumnDescriptor);
                admin.createTable(hTableDescriptor);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    /**
     * 添加多条记录
     */
    public static void addMoreRecord(String tableName, String family, String  qualifier, List < String > rowList, String value){
        Table table = null;
        try {
            table = connection.getTable(TableName.valueOf(tableName));
            List<Put> puts = new ArrayList<>();
            Put put = null;
            for (int i = 0; i < rowList.size(); i++) {
                put = new Put(Bytes.toBytes(rowList.get(i)));
                put.addColumn(Bytes.toBytes(family), Bytes.toBytes(qualifier), Bytes.toBytes(value));
                puts.add(put);
            }
            table.put(puts);
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (table != null) {
                try {
                    table.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
    /**
     * @desc 查询rowkey下某一列值
     */
    public static String getValue(String tableName, String rowKey, String family, String qualifier) {
        Table table = null;
        try {
            table = connection.getTable(TableName.valueOf(tableName));
            Get get = new Get(rowKey.getBytes());
            //返回指定列族、列名，避免rowKey下所有数据
            get.addColumn(family.getBytes(), qualifier.getBytes());
            Result rs = table.get(get);
            Cell cell = rs.getColumnLatestCell(family.getBytes(), qualifier.getBytes());
            String value = null;
            if (cell!=null) {
                value = Bytes.toString(CellUtil.cloneValue(cell));
            }
            return value;
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (table!=null){
                try {
                    table.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
        return null;
    }
     /**
     * @desc 查询rowkey下多列值，一起返回
     */
    public  String[] getQualifierValue(String tableName, String rowKey, String family, String[] qualifier) {
        Table table = null;
        try {
            table = connection.getTable(TableName.valueOf(tableName));
            Get get = new Get(rowKey.getBytes());
            //返回指定列族、列名，避免rowKey下所有数据
            get.addColumn(family.getBytes(), qualifier[0].getBytes());
            get.addColumn(family.getBytes(), qualifier[1].getBytes());
            Result rs = table.get(get);
            // 返回最新版本的Cell对象
            Cell cell = rs.getColumnLatestCell(family.getBytes(), qualifier[0].getBytes());
            Cell cell1 = rs.getColumnLatestCell(family.getBytes(), qualifier[1].getBytes());
            String[] value = new String[qualifier.length];
            if (cell!=null) {
                value[0] = Bytes.toString(CellUtil.cloneValue(cell));
                value[1] = Bytes.toString(CellUtil.cloneValue(cell1));
            }
            return value;
        } catch (IOException e) {
            e.printStackTrace();
            return null;
        } finally {
            if (table!=null){
                try {
                    table.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }

        }

    }
    /**
     * @desc 获取一行数据
     */
    public static List<Cell> getRowCells(String tableName, String rowKey, String family) {
        Table table = null;
        try {
            table = connection.getTable(TableName.valueOf(tableName));
            Get get = new Get(rowKey.getBytes());
            get.addFamily(family.getBytes());
            Result rs = table.get(get);
            List<Cell> cellList =   rs.listCells();
//    		如果需要,遍历cellList
//    		if (cellList!=null) {
//    			String qualifier = null;
//    			String value = null;
//    			for (Cell cell : cellList) {
//    				qualifier = Bytes.toString( cell.getQualifierArray(),cell.getQualifierOffset(),cell.getQualifierLength());
//    				value = Bytes.toString( cell.getValueArray(), cell.getValueOffset(), cell.getValueLength());
//    				System.out.println(qualifier+"--"+value);
//    			}
//			}
            return cellList;
        } catch (IOException e) {
            if (table!=null){
                try {
                    table.close();
                } catch (IOException e1) {
                    e1.printStackTrace();
                }
            }
        }
        return null;
    }
/**
     * 全表扫描
     * @param tableName
     * @return
     */
    public static ResultScanner scan(String tableName,String family,String qualifier) {
        Table table = null;
        try {
            table = connection.getTable(TableName.valueOf(tableName));
            Scan scan = new Scan();
            ResultScanner rs = table.getScanner(scan);
//			一般返回ResultScanner，遍历即可
//			if (rs!=null){
//				String row = null;
//				String quali = null;
//    			String value = null;
//				for (Result result : rs) {
//					row = Bytes.toString(CellUtil.cloneRow(result.getColumnLatestCell(family.getBytes(), qualifier.getBytes())));
//					quali =Bytes.toString(CellUtil.cloneQualifier(result.getColumnLatestCell(family.getBytes(), qualifier.getBytes())));
//					value =Bytes.toString(CellUtil.cloneValue(result.getColumnLatestCell(family.getBytes(), qualifier.getBytes())));
//					System.out.println(row+"-"+quali+"-"+value);
//				}
//			}
            return rs;
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (table!=null){
                try {
                    table.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
        return null;
    }
}

hive数据导入hbase

里面所有的hbase操作将会调用上面的hbase工具类，使用BufferedMutatorParams（）方式将数据导入hbase

object Hive2Hbase {
  def main(args: Array[String]): Unit = {
    val session: SparkSession = SparkSession.builder()
      .appName("Hive2Hbase")
      .enableHiveSupport()
      .config("spark.serializer","org.apache.spark.serializer.KryoSerializer")
      .getOrCreate()
    // 执行查询
    print("====================   任务开始       ========================")
    val hiveData: DataFrame = session.sql("select * from t_order")
    HbaseUtil.setConf("ip", "2181")
    val connection = HbaseUtil.connection
    HbaseUtil.createTable("test6", "hiveData")
    val hiveRdd: RDD[Row] = hiveData.rdd
    hiveRdd.foreachRDD { rdd =>
     rdd.foreachPartition { x =>
      val putList = new util.ArrayList[Put]()
      HbaseUtil.setConf("ip", "2181")
      val connection = HbaseUtil.connection
      //获取用户信息
      val userId = x.getAs[String]("id")
      val userName= x.getAs[String]("inamed")
      val userMoney= x.getAs[Double]("money")
      //一个Put对象就是一行记录，在构造方法中指定主键
      val put = new Put(Bytes.toBytes(MD5Util.getMD5(userId + userName)))
      put.addColumn(Bytes.toBytes("hiveData"), Bytes.toBytes("id"), Bytes.toBytes(userId))
        .addColumn(Bytes.toBytes("hiveData"), Bytes.toBytes("name"), Bytes.toBytes(userName))
        .addColumn(Bytes.toBytes("hiveData"), Bytes.toBytes("money"), Bytes.toBytes(userMoney))
      putList.add(put)
      //设置缓存1m，当达到1m时数据会自动刷到hbase
      val params = new BufferedMutatorParams(TableName.valueOf("test6"))
      params.writeBufferSize(1024 * 1024) //设置缓存的大小
      val mutator = connection.getBufferedMutator(params)
      mutator.mutate(putList)
      mutator.flush()
      putList.clear() 
    }
    })
    session.stop()
    HbaseUtil.close()
    println("======================  任务结束  ============")
  }

}

扫一扫加入大数据公众号和技术交流群，了解更多大数据技术，还有免费资料等你哦