Hive on Spark Operations

This article describes how to configure Hive on Spark on Cloudera CDH 5.13.1, including setting the Spark dependency on the YARN service and restarting and redeploying the Hive client configuration, and provides execution examples with a performance comparison.


https://www.cloudera.com/documentation/enterprise/latest/topics/admin_hos_oview.html

Configuring the Hive Dependency on a Spark Service

By default, if a Spark service is available, the Hive dependency on the Spark service is configured. To change this configuration, do the following:

  1. In the Cloudera Manager Admin Console, go to the Hive service.

  2. Click the Configuration tab.

  3. Search for the Spark On YARN Service. To configure the Spark service, select the Spark service name. To remove the dependency, select none.

  4. Click Save Changes.

  5. Go to the Spark service.

  6. Add a Spark gateway role to the host running HiveServer2.

  7. Return to the Home page by clicking the Cloudera Manager logo.

  8. Click the icon next to any stale services to invoke the cluster restart wizard.

  9. Click Restart Stale Services.

  10. Click Restart Now.

  11. Click Finish.

  12. In the Hive client, configure the Spark execution engine.
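
Step 12 is performed inside the Hive session itself. A minimal example (also shown later in this post):

hive> set hive.execution.engine=spark;

This switches the engine for the current session only; to make Spark the default engine, hive.execution.engine can instead be set to spark in the Hive service configuration in Cloudera Manager before redeploying the client configuration.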

 

1. Configuring Hive on Spark (the Spark gateway must be on the same host as HiveServer2)

 

For versions before 5.7, the official documentation carried this note:

Important: Hive on Spark is included in CDH 5.6 but is not currently supported nor recommended for production use. To try this feature in CDH 5.6, use it in a test environment.

 

 

The following operations were performed on CDH 5.13.1:

 

hive> select count(*) from tb_emp_info;

Query ID = hdfs_20180205153434_df36b886-80b0-4ed8-a9fc-0a2efb0ca071

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks determined at compile time: 1

In order to change the average load for a reducer (in bytes):

  set hive.exec.reducers.bytes.per.reducer=<number>

In order to limit the maximum number of reducers:

  set hive.exec.reducers.max=<number>

In order to set a constant number of reducers:

  set mapreduce.job.reduces=<number>

Starting Job = job_1517815091654_0007, Tracking URL = http://bd129106:8088/proxy/application_1517815091654_0007/

Kill Command = /opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/lib/hadoop/bin/hadoop job  -kill job_1517815091654_0007

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1

2018-02-05 15:34:56,439 Stage-1 map = 0%,  reduce = 0%

2018-02-05 15:35:02,880 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.04 sec

2018-02-05 15:35:08,214 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 2.33 sec

MapReduce Total cumulative CPU time: 2 seconds 330 msec

Ended Job = job_1517815091654_0007

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 2.33 sec   HDFS Read: 7529 HDFS Write: 3 SUCCESS

Total MapReduce CPU Time Spent: 2 seconds 330 msec

OK

10

Time taken: 26.218 seconds, Fetched: 1 row(s)

 

Hive's default execution engine is MapReduce (mr), which is why the query above ran as a MapReduce job.
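
You can verify which engine is active from the CLI; issuing set with just the property name prints its current value:

hive> set hive.execution.engine;
hive.execution.engine=mr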

 

Hardware requirements: (the screenshot of the hardware requirements from the Cloudera documentation is not reproduced here; see the link at the top of this post)

Set the Spark On YARN Service for Hive: the value can be the Spark service or none (in versions before 5.7 it had to be Spark; if Spark is selected, Hive must depend on YARN).
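
Once the dependency is saved and the client configuration redeployed, the Spark-related properties should appear in the Hive client configuration. A quick sanity check from the HiveServer2 host (the path below is the CDH default and may differ on your cluster):

grep -A1 'spark' /etc/hive/conf/hive-site.xml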

 

Restart the stale services and redeploy the Hive client configuration:

 

Enter Hive:

 

hive> set hive.execution.engine=spark;

hive> select count(*) from tb_emp_info;

Query ID = hdfs_20180205160909_79be0d1b-e8a3-496a-ae5f-e990b71d4991

Total jobs = 1

Launching Job 1 out of 1

In order to change the average load for a reducer (in bytes):

  set hive.exec.reducers.bytes.per.reducer=<number>

In order to limit the maximum number of reducers:

  set hive.exec.reducers.max=<number>

In order to set a constant number of reducers:

  set mapreduce.job.reduces=<number>

Starting Spark Job = d7b17038-8156-4f1b-b8d6-443c1db77f77

Running with YARN Application = application_1517815091654_0009

Kill Command = /opt/cloudera/parcels/CDH-5.13.1-1.cdh5.13.1.p0.2/lib/hadoop/bin/yarn application -kill application_1517815091654_0009

Query Hive on Spark job[0] stages:

0

1

Status: Running (Hive on Spark job[0])

Job Progress Format

CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost]

2018-02-05 16:09:49,725 Stage-0_0: 0(+1)/1      Stage-1_0: 0/1

2018-02-05 16:09:51,753 Stage-0_0: 1/1 Finished Stage-1_0: 0/1

2018-02-05 16:09:52,761 Stage-0_0: 1/1 Finished Stage-1_0: 1/1 Finished

Status: Finished successfully in 10.09 seconds

OK

10

Time taken: 23.477 seconds, Fetched: 1 row(s)

 

View the results: the same count(*) query that took 26.218 seconds on MapReduce finished in 23.477 seconds with the Spark engine (the Spark job itself ran in 10.09 seconds).

2. Configuration changes (parameter tuning)
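
As a starting point, these are the Spark-side properties most commonly adjusted for Hive on Spark, set here per session for illustration (the values are examples only, not tuned recommendations for any particular cluster):

hive> set hive.execution.engine=spark;
hive> set spark.executor.memory=4g;
hive> set spark.yarn.executor.memoryOverhead=1024;
hive> set spark.executor.cores=2;
hive> set spark.dynamicAllocation.enabled=true;

spark.executor.memory controls the heap per executor, spark.yarn.executor.memoryOverhead the off-heap allowance in MB, spark.executor.cores the number of concurrent tasks per executor, and dynamic allocation lets YARN grow and shrink the executor count with the workload. Any of these can instead be set cluster-wide through the Hive service's advanced configuration snippet (safety valve) in Cloudera Manager.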

 

 
