Spark operations on HBase

License: CC 4.0 BY-SA

Tags: #spark #hbase #big data

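This article describes how to use Hive from Spark: adding the dependency, configuring hive-site.xml, constructing a HiveContext, and running SQL statements. It focuses on creating a table, loading data, and querying with HiveQL.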

Hive Tables

Spark SQL also supports reading and writing data stored in Apache Hive. However, because Hive has a large number of dependencies, it is not included in the default Spark assembly. To enable Hive support, add the -Phive and -Phive-thriftserver flags to Spark's build; the build then produces a new assembly jar that includes Hive. This Hive assembly jar must also be present on every worker node, because the workers need the Hive serialization and deserialization libraries (SerDes) to access data stored in Hive.

Hive is configured by placing your hive-site.xml file in Spark's conf/ directory.

When working with Hive, you must construct a HiveContext, which inherits from SQLContext and adds support for finding tables in the metastore and for writing queries in HiveQL. Users who do not have an existing Hive deployment can still create a HiveContext. When no hive-site.xml is provided, the context automatically creates metastore_db and warehouse in the current directory.

// sc is an existing SparkContext.

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")

sqlContext.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")

// Queries are expressed in HiveQL

sqlContext.sql("FROM src SELECT key, value").collect().foreach(println)
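
The rows returned by a HiveQL query can also be read field by field. The following is a minimal sketch, assuming it runs in the same spark-shell session as the statements above, so that sqlContext and the src table already exist; the filter key < 10 and the names total and firstKeys are only illustrative.

```scala
// Assumes the `sqlContext` (HiveContext) and the `src` table created above.
import org.apache.spark.sql.Row

// COUNT(*) is a BIGINT in Hive, so read it back as a Long.
val total: Long = sqlContext.sql("SELECT COUNT(*) FROM src").collect().head.getLong(0)
println(s"rows in src: $total")

// Filter and sort in HiveQL, then pull typed fields out of each Row.
val firstKeys: Array[Row] =
  sqlContext.sql("SELECT key, value FROM src WHERE key < 10 ORDER BY key").collect()
firstKeys.foreach { row =>
  println(s"key=${row.getInt(0)}, value=${row.getString(1)}")
}
```

Note that collect() brings every result row back to the driver, so it is best reserved for small result sets like this one.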

RDD to HBase table
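
One common way to write an RDD into an HBase table is to turn each record into a Put and save it through HBase's TableOutputFormat. The sketch below is not taken from the original article and rests on several assumptions: an HBase 1.0+ client on the classpath (older clients use put.add instead of put.addColumn), an hbase-site.xml visible to the driver and executors, and a hypothetical existing table src_hbase with a column family cf.

```scala
// sc is an existing SparkContext.
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.SparkContext._ // pair-RDD implicits, needed on older Spark 1.x

// Hypothetical names: the table "src_hbase" and column family "cf" must already exist.
val hbaseConf = HBaseConfiguration.create() // picks up hbase-site.xml from the classpath
hbaseConf.set(TableOutputFormat.OUTPUT_TABLE, "src_hbase")

val job = Job.getInstance(hbaseConf)
job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])

// Sample (key, value) pairs standing in for real data.
val records = sc.parallelize(Seq((1, "val_1"), (2, "val_2")))

// Each record becomes one Put: the row key is the key, cf:value holds the value.
val puts = records.map { case (key, value) =>
  val put = new Put(Bytes.toBytes(key.toString))
  put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("value"), Bytes.toBytes(value))
  (new ImmutableBytesWritable, put)
}

// Write all Puts to the table named in the job configuration.
puts.saveAsNewAPIHadoopDataset(job.getConfiguration)
```

The Put objects are built inside map, on the executors, so only the plain (key, value) pairs need to be serializable.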

Reference

http://www.cloudera.com/content/www/zh-CN/documentation/enterprise/5-3-x/topics/admin_hbase_import.html#concept_asc_ctz_wp_unique_1