The difference between SQLContext and HiveContext in Spark SQL

The Spark SQL module is used for processing structured data and provides DataFrames and a SQL query engine. HiveContext is a superset of SQLContext that allows executing both SQL queries and Hive commands. After Spark 2.0 the dependence on Hive is reduced, but HiveContext is still needed to use Hive functionality. HiveContext has a more robust SQL parser and support for Hive UDFs, and it is required to start the Thrift server, but it comes with heavy dependencies.


I was confused about the difference between the two, so I googled it.
One of Spark's modules is SparkSQL. SparkSQL can be used to process structured data, so with SparkSQL your data must have a defined schema. In Spark 1.3.1, SparkSQL implements dataframes and a SQL query engine. SparkSQL has a SQLContext and a HiveContext. HiveContext is a superset of the SQLContext. Hortonworks and the Spark community suggest using the HiveContext. You can see below that when you run spark-shell, which is your interactive driver application, it automatically creates a SparkContext defined as sc and a HiveContext defined as sqlContext. The HiveContext allows you to execute SQL queries as well as Hive commands. The same behavior occurs for pyspark. You can review the Spark 1.3.1 documentation for SQLContext and HiveContext at the SQLContext documentation and HiveContext documentation.
Source: https://blogs.msdn.microsoft.com/bigdatasupport/2015/09/14/understanding-sparks-sparkconf-sparkcontext-sqlcontext-and-hivecontext/
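As a minimal sketch of how the two contexts are constructed in a standalone Spark 1.x application (the app name and master setting here are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext

val conf = new SparkConf().setAppName("ContextDemo").setMaster("local[*]")
val sc = new SparkContext(conf)

// Plain SQLContext: DataFrames plus the built-in SQL dialect only
val sqlContext = new SQLContext(sc)

// HiveContext: a subclass of SQLContext that adds HiveQL parsing,
// Hive UDF resolution, and access to the Hive metastore
val hiveContext = new HiveContext(sc)
```

In spark-shell you never write this yourself: the shell constructs sc and the HiveContext (named sqlContext) for you, which is the behavior the quoted passage describes.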

Spark 2.0 provides native window functions (SPARK-8641) and features some additional improvements in parsing and much better SQL 2003 compliance, so it is significantly less dependent on Hive to achieve core functionality, and because of that HiveContext seems to be slightly less important.
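In Spark 2.0 both contexts are subsumed by SparkSession, where Hive support becomes an opt-in flag rather than a separate class. A minimal sketch:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("SessionDemo")
  .enableHiveSupport() // opt-in; only needed for the Hive metastore and Hive UDFs
  .getOrCreate()

// The old context is still reachable for legacy code
val sqlContext = spark.sqlContext
```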

Spark < 2.0

Obviously if you want to work with Hive you have to use HiveContext. Beyond that, the biggest difference as of now (Spark 1.5) is support for window functions and the ability to access Hive UDFs.

Generally speaking, window functions are a pretty cool feature and can be used to solve quite complex problems in a concise way without going back and forth between RDDs and DataFrames. Performance is still far from optimal, especially without a PARTITION BY clause, but that is really nothing Spark-specific.
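A sketch of the kind of problem window functions solve concisely; the sales data here is hypothetical, and in Spark 1.4/1.5 this code needs the hiveContext built above, since a plain SQLContext rejects the window expression at analysis time:

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{avg, col}

// Hypothetical data: per-department sale amounts
val sales = hiveContext.createDataFrame(Seq(
  ("books", 100.0), ("books", 200.0), ("toys", 50.0)
)).toDF("department", "amount")

// Attach each department's average to every row, with no self-join
// and no round-trip through RDDs
val w = Window.partitionBy("department")
sales.withColumn("dept_avg", avg(col("amount")).over(w)).show()
```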

Regarding Hive UDFs, this is not a serious issue now, but before Spark 1.5 many SQL functions were expressed using Hive UDFs and required a HiveContext to work.
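For example, Hive built-ins such as percentile are resolvable only through a HiveContext in those versions. A sketch, reusing the hypothetical sales DataFrame from above:

```scala
// Register the DataFrame so it is visible to SQL, then call a Hive UDAF
// (Hive's percentile expects an integral column, hence the CAST)
sales.registerTempTable("sales")
hiveContext.sql(
  "SELECT department, percentile(CAST(amount AS BIGINT), 0.5) " +
  "FROM sales GROUP BY department"
).show()
```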

HiveContext also provides a more robust SQL parser. See, for example: py4j.protocol.Py4JJavaError when selecting nested column in dataframe using select statement

Finally, HiveContext is required to start the Thrift server.
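Besides the standalone sbin/start-thriftserver.sh script, a Thrift JDBC/ODBC server can also be embedded in a running application, which again presupposes a HiveContext. A sketch, assuming the spark-hive-thriftserver artifact is on the classpath:

```scala
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

// Expose the tables registered in this HiveContext over JDBC/ODBC
HiveThriftServer2.startWithContext(hiveContext)
```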

The biggest problem with HiveContext is that it comes with large dependencies.
Source: http://stackoverflow.com/questions/33666545/what-is-the-difference-between-apache-spark-sqlcontext-vs-hivecontext

I also heard someone say it is because enterprises these days all use Hive.
