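1. Under what circumstances can Hive skip launching a MapReduce job?
2. How does Method 1 avoid launching a job?
3. What does bin/hive --hiveconf hive.fetch.task.conversion=more do?
4. How do you configure Hive so that it always skips the MapReduce job?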
If you query specific columns of a table, Hive by default launches a MapReduce job to do the work, as shown here:
hive> SELECT id, money FROM m limit 10;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Cannot run job locally: Input Size (= 235105473) is larger than hive.exec.mode.local.auto.inputbytes.max (= 134217728)
Starting Job = job_1384246387966_0229, Tracking URL = http://l-datalogm1.data.cn1:9981/proxy/application_1384246387966_0229/
Kill Command = /home/q/hadoop-2.2.0/bin/hadoop job -kill job_1384246387966_0229
hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2013-11-13 11:35:16,167 Stage-1 map = 0%, reduce = 0%
2013-11-13 11:35:21,327 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.26 sec
2013-11-13 11:35:22,377 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.26 sec
MapReduce Total cumulative CPU time: 1 seconds 260 msec
Ended Job = job_1384246387966_0229
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 1.26 sec HDFS Read: 8388865 HDFS Write: 60 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 260 msec
OK
1 122
1 185
1 231
1 292
1 316
1 329
1 355
1 356
1 362
1 364
Time taken: 16.802 seconds, Fetched: 10 row(s)
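As we all know, launching a MapReduce job incurs system overhead. To address this, starting with Hive 0.10.0, simple queries that need no aggregation, such as SELECT <col> FROM <table> LIMIT n, do not have to start a MapReduce job; the data can be read directly by a Fetch task. This can be enabled in any of the following ways:
Method 1: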
hive> set hive.fetch.task.conversion=more;
hive> SELECT id, money FROM m limit 10;
OK
1 122
1 185
1 231
1 292
1 316
1 329
1 355
1 356
1 362
1 364
Time taken: 0.138 seconds, Fetched: 10 row(s)
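The statement set hive.fetch.task.conversion=more; above enables the Fetch task, so a simple column query like this one no longer launches a MapReduce job. The same query now returns in 0.138 seconds instead of 16.802 seconds.
Method 2: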
bin/hive --hiveconf hive.fetch.task.conversion=more
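When Hive is driven from a script instead of an interactive shell, the same --hiveconf flag can be combined with the CLI's -e option to run a single query. The following is a minimal Python sketch under stated assumptions: the helper names build_hive_command and run_query are hypothetical, and the default path bin/hive assumes the script is started from the Hive installation directory.

```python
import subprocess


def build_hive_command(query, conversion="more", hive_bin="bin/hive"):
    """Build the argument list for a one-shot Hive CLI call.

    The fetch conversion setting applies to this invocation only.
    hive_bin is an assumption; point it at your own installation.
    """
    return [
        hive_bin,
        "--hiveconf", f"hive.fetch.task.conversion={conversion}",
        "-e", query,  # -e runs the quoted query string and then exits
    ]


def run_query(query, **kwargs):
    """Run the query through the Hive CLI and return its standard output."""
    cmd = build_hive_command(query, **kwargs)
    # check=True raises CalledProcessError if the CLI exits with a non-zero status.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Only prints the command; call run_query(...) on a machine where Hive is installed.
    print(build_hive_command("SELECT id, money FROM m LIMIT 10;"))
```

Passing the arguments as a list avoids shell quoting problems when the query itself contains quotes.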
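Method 3:
Both methods above enable the Fetch task, but only temporarily: Method 1 lasts for the current session, and Method 2 for that one CLI invocation. To keep the feature enabled permanently, add the following configuration to ${HIVE_HOME}/conf/hive-site.xml: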
<property>
  <name>hive.fetch.task.conversion</name>
  <value>more</value>
  <description>
    Some select queries can be converted to single FETCH task minimizing latency.
    Currently the query should be single sourced not having any subquery and should not have
    any aggregations or distincts (which incurs RS), lateral views and joins.
    1. minimal : SELECT STAR, FILTER on partition columns, LIMIT only
    2. more    : SELECT, FILTER, LIMIT only (+TABLESAMPLE, virtual columns)
  </description>
</property>
With this in place the Fetch task stays enabled permanently. Give it a try!
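If hive-site.xml is managed by a deployment script, the property can also be added programmatically. The sketch below uses Python's standard xml.etree.ElementTree module; the function name set_hive_property is hypothetical, and the code assumes the file's root element is <configuration> with <property> children, as in a standard hive-site.xml. Note that the default parser does not preserve XML comments, so back up the file first.

```python
import xml.etree.ElementTree as ET


def set_hive_property(site_path, name, value):
    """Add a <property> to a hive-site.xml style file, or update it if present."""
    tree = ET.parse(site_path)
    root = tree.getroot()  # assumed to be the <configuration> element

    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            # The property already exists: overwrite its value in place.
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No matching property was found: append a new one.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value

    tree.write(site_path, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    import os
    import tempfile

    # Demonstrate on a throwaway file rather than a real Hive installation.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "hive-site.xml")
        with open(path, "w", encoding="utf-8") as f:
            f.write("<configuration></configuration>")
        set_hive_property(path, "hive.fetch.task.conversion", "more")
        with open(path, encoding="utf-8") as f:
            print(f.read())
```

Calling the function twice with the same property name updates the existing entry instead of creating a duplicate.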