https://www.cloudera.com/documentation/enterprise/latest/topics/admin_hos_oview.html
Configuring the Hive Dependency on a Spark Service
By default, if a Spark service is available, the Hive dependency on the Spark service is configured. To change this configuration, do the following:
- In the Cloudera Manager Admin Console, go to the Hive service.
- Click the Configuration tab.
- Search for the Spark On YARN Service. To configure the Spark service, select the Spark service name. To remove the dependency, select none.
- Click Save Changes.
- Go to the Spark service.
- Add a Spark gateway role to the host running HiveServer2.
- Return to the Home page by clicking the Cloudera Manager logo.
- Click the icon next to any stale services to invoke the cluster restart wizard.
- Click Restart Stale Services.
- Click Restart Now.
- Click Finish.
- In the Hive client, configure the Spark execution engine (see the sketch after this list).
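A minimal sketch of that last step, using the standard hive.execution.engine property (the same property appears in the walkthrough below):

hive> set hive.execution.engine=spark;
hive> set hive.execution.engine=mr;

The first command switches the current session to the Spark engine; the second switches it back to MapReduce. To make the change permanent, set the same property in hive-site.xml (in Cloudera Manager, through the hive-site.xml advanced configuration snippet / safety valve) rather than per session.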
1. Configuring Hive on Spark (the Spark gateway role must be on the same host as HiveServer2)
For releases before 5.7, the official documentation described it this way:
Important: Hive on Spark is included in CDH 5.6 but is not currently supported nor recommended for production use. To try this feature in CDH 5.6, use it in a test environment.
The following steps were performed on CDH 5.13.1:
hive> select count(*) from tb_emp_info;
Query ID = hdfs_20180205153434_df36b886-80b0-4ed8-a9fc-0a2efb0ca071
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1517815091654_0007, Tracking URL = http://bd129106:8088/proxy/application_1517815091654_0007/
Kill Command = /opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/lib/hadoop/bin/hadoop job -kill job_1517815091654_0007
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-02-05 15:34:56,439 Stage-1 map = 0%, reduce = 0%
2018-02-05 15:35:02,880 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.04 sec
2018-02-05 15:35:08,214 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.33 sec
MapReduce Total cumulative CPU time: 2 seconds 330 msec
Ended Job = job_1517815091654_0007
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 2.33 sec HDFS Read: 7529 HDFS Write: 3 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 330 msec
OK
10
Time taken: 26.218 seconds, Fetched: 1 row(s)
Hive's default execution engine is MapReduce (mr).
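You can verify which engine the current session uses by printing the property from the Hive shell; a minimal check (the second line shows the expected output on a default installation):

hive> set hive.execution.engine;
hive.execution.engine=mr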
Hardware requirements:
Set the Spark On YARN Service for Hive; it can be either the Spark service or none (before version 5.7 it had to be Spark, and when Spark is selected, Hive must also depend on YARN).
Restart the services and redeploy the Hive client configuration:
Enter the Hive shell:
hive> set hive.execution.engine=spark;
hive> select count(*) from tb_emp_info;
Query ID = hdfs_20180205160909_79be0d1b-e8a3-496a-ae5f-e990b71d4991
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Spark Job = d7b17038-8156-4f1b-b8d6-443c1db77f77
Running with YARN Application = application_1517815091654_0009
Kill Command = /opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/lib/hadoop/bin/yarn application -kill application_1517815091654_0009
Query Hive on Spark job[0] stages:
0
1
Status: Running (Hive on Spark job[0])
Job Progress Format
CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost]
2018-02-05 16:09:49,725 Stage-0_0: 0(+1)/1 Stage-1_0: 0/1
2018-02-05 16:09:51,753 Stage-0_0: 1/1 Finished Stage-1_0: 0/1
2018-02-05 16:09:52,761 Stage-0_0: 1/1 Finished Stage-1_0: 1/1 Finished
Status: Finished successfully in 10.09 seconds
OK
10
Time taken: 23.477 seconds, Fetched: 1 row(s)
View the result:
2. Configuration changes (parameter tuning)
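As a starting point, the Spark executor and driver resources can be overridden per session from the Hive shell. This is only a sketch with illustrative values; the property names are standard Spark settings, but the values must be sized to your own cluster:

hive> set hive.execution.engine=spark;
hive> set spark.executor.memory=4g;
hive> set spark.executor.cores=2;
hive> set spark.driver.memory=2g;

The same properties can be made permanent in hive-site.xml; note that changes to spark.* settings generally only take effect for Spark sessions started after the change.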