Big Data (5e): Reading and Writing HBase Phoenix Tables with Spark and Scala

This post shows how to work with HBase from Spark through Phoenix: creating a Phoenix table for counting users per hour, configuring the HBase and Spark dependencies, writing data to HBase with a SparkContext, and reading the data back for display with a SparkSession. Source excerpts show how the HBaseConfiguration.create method builds the configuration and how phoenixTableAsDataFrame turns a Phoenix table into a DataFrame.

Creating the table in Phoenix

Use Phoenix to create a table on HBase for counting the number of users per hour. Phoenix stores unquoted identifiers in upper case, so the table ends up as UV_HOUR with columns UID and YMDH, and the composite primary key (uid, ymdh) becomes the HBase row key.

create table uv_hour (
    uid varchar,
    ymdh varchar
    constraint primary_key primary key (uid, ymdh)
);
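
For reference, the hourly-UV query this table is designed for can also be run directly against Phoenix. A minimal sketch, assuming the Phoenix JDBC driver (brought in through phoenix-core) is on the classpath and the same hadoop102:2181 quorum used later in this post:

import java.sql.DriverManager

// Connect through the Phoenix thick client: jdbc:phoenix:<zookeeper quorum>
val conn = DriverManager.getConnection("jdbc:phoenix:hadoop102:2181")
val rs = conn.createStatement().executeQuery(
  "SELECT YMDH, COUNT(DISTINCT UID) AS UV FROM UV_HOUR GROUP BY YMDH")
while (rs.next()) {
  println(s"${rs.getString("YMDH")} -> ${rs.getLong("UV")}")
}
conn.close()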

Dependencies

<properties>
    <spark.version>3.0.3</spark.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.12</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.12</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.12</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.phoenix</groupId>
        <artifactId>phoenix-spark</artifactId>
        <version>5.0.0-HBase-2.0</version>
        <exclusions>
            <exclusion>
                <groupId>org.glassfish</groupId>
                <artifactId>javax.el</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
</dependencies>

Writing to the Phoenix table

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.hadoop.hbase.HBaseConfiguration // HBase configuration
import org.apache.phoenix.spark._ // implicit conversions that provide saveToPhoenix
// Create the SparkContext
val conf = new SparkConf().setAppName("A0").setMaster("local[2]")
val sc = new SparkContext(conf)
// Build an RDD of (uid, ymdh) tuples
val rdd = sc.makeRDD(Seq(
  ("u1", "2021-08-08 09"),
  ("u2", "2021-08-08 09"),
  ("u1", "2021-08-08 10")
))
// Write to the Phoenix table on HBase
rdd.saveToPhoenix(
  "UV_HOUR", // table name (stored in upper case by Phoenix)
  Seq("UID", "YMDH"), // column names
  HBaseConfiguration.create(), // HBase configuration (picks up hbase-site.xml from the classpath)
  Some("hadoop102:2181") // ZooKeeper quorum URL
)

Check the result with sqlline.py.
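
Besides the RDD-based saveToPhoenix above, the phoenix-spark connector also registers a Spark SQL data source, so the same rows can be written from a DataFrame. A minimal sketch, assuming the same table and quorum; the connector only accepts SaveMode.Overwrite, but it upserts rows rather than truncating the table:

import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("A0").master("local[2]").getOrCreate()
import spark.implicits._

// Same sample rows as above, with column names matching the Phoenix table
val df = Seq(
  ("u1", "2021-08-08 09"),
  ("u2", "2021-08-08 09"),
  ("u1", "2021-08-08 10")
).toDF("UID", "YMDH")

df.write
  .format("org.apache.phoenix.spark")
  .mode(SaveMode.Overwrite) // required by the connector; rows are upserted, nothing is truncated
  .option("table", "UV_HOUR")
  .option("zkUrl", "hadoop102:2181")
  .save()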

Source excerpt: HBaseConfiguration.create

The create method builds the HBase configuration:

public static Configuration create() {
  Configuration conf = new Configuration();
  // In case HBaseConfiguration is loaded from a different classloader than
  // Configuration, conf needs to be set with appropriate class loader to resolve
  // HBase resources.
  conf.setClassLoader(HBaseConfiguration.class.getClassLoader());
  return addHbaseResources(conf);
}

addHbaseResources then automatically loads the hbase-default.xml and hbase-site.xml configuration files:

public static Configuration addHbaseResources(Configuration conf) {
  conf.addResource("hbase-default.xml");
  conf.addResource("hbase-site.xml");

  checkDefaultsVersion(conf);
  return conf;
}
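
Because create() resolves hbase-default.xml and hbase-site.xml from the classpath, it is easy to check which ZooKeeper settings a job will actually use. A minimal sketch:

import org.apache.hadoop.hbase.HBaseConfiguration

val conf = HBaseConfiguration.create()
// Quorum and client port the job will use: values from hbase-site.xml if it is on
// the classpath, otherwise the defaults shipped in hbase-default.xml
println(conf.get("hbase.zookeeper.quorum"))
println(conf.get("hbase.zookeeper.property.clientPort"))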

Reading the Phoenix table

import org.apache.spark.sql.SparkSession
import org.apache.phoenix.spark.toSparkSqlContextFunctions
val spark = SparkSession
  .builder()
  .appName("phoenix-test")
  .master("local[2]")
  .getOrCreate()
// Read the Phoenix table on HBase
spark.sqlContext.phoenixTableAsDataFrame(
  table = "UV_HOUR",
  columns = Seq("UID", "YMDH"),
  predicate = Some("UID='u1'"),
  zkUrl = Some("hadoop102:2181")
).show()

The show() call prints the two rows written above for UID 'u1' (hours 09 and 10).
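
With the table available as a DataFrame, the hourly user count that UV_HOUR was created for is a single aggregation. A minimal sketch, reusing the SparkSession above and dropping the UID predicate:

import org.apache.spark.sql.functions.countDistinct

val uvHour = spark.sqlContext.phoenixTableAsDataFrame(
  table = "UV_HOUR",
  columns = Seq("UID", "YMDH"),
  zkUrl = Some("hadoop102:2181")
)
// Distinct users per hour
uvHour.groupBy("YMDH")
  .agg(countDistinct("UID").alias("UV"))
  .show()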

Source excerpt: phoenixTableAsDataFrame

import org.apache.hadoop.conf.Configuration
import org.apache.spark.sql.{DataFrame, SQLContext}

class SparkSqlContextFunctions(@transient val sqlContext: SQLContext) extends Serializable {

  /*
  This will return a Spark DataFrame, with Phoenix types converted to Spark SQL catalyst types

  'table' is the corresponding Phoenix table
  'columns' is a sequence of columns to query
  'predicate' is a set of statements to go after a WHERE clause, e.g. "TID = 123"
  'zkUrl' is an optional Zookeeper URL to use to connect to Phoenix
  'conf' is a Hadoop Configuration object. If zkUrl is not set, the "hbase.zookeeper.quorum"
    property will be used
 */
  def phoenixTableAsDataFrame(table: String, columns: Seq[String],
                               predicate: Option[String] = None,
                               zkUrl: Option[String] = None,
                               tenantId: Option[String] = None,
                               conf: Configuration = new Configuration): DataFrame = {

    // Create the PhoenixRDD and convert it to a DataFrame
    new PhoenixRDD(sqlContext.sparkContext, table, columns, predicate, zkUrl, conf, tenantId = tenantId)
      .toDataFrame(sqlContext)
  }
}
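
As the comment notes, zkUrl is optional: when it is left unset, the connector falls back to the hbase.zookeeper.quorum property of the conf argument. A minimal sketch of that variant, assuming the same quorum host and port as above:

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.phoenix.spark.toSparkSqlContextFunctions

// Point the connector at ZooKeeper through the Hadoop configuration instead of zkUrl
val hbaseConf = HBaseConfiguration.create()
hbaseConf.set("hbase.zookeeper.quorum", "hadoop102")
hbaseConf.set("hbase.zookeeper.property.clientPort", "2181")

spark.sqlContext.phoenixTableAsDataFrame(
  table = "UV_HOUR",
  columns = Seq("UID", "YMDH"),
  conf = hbaseConf
).show()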