Comparing the Efficiency of Hive Join Query SQL

This post walks through concrete examples of creating tables with Hive SQL, loading data, and running a range of queries, including advanced features such as subqueries and grouping.


Prepare four tables

Export the data from MySQL to Excel and convert it to txt.

These txt files will be the import files for the Hive tables. Create the tables first:
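For reference, each line of an import txt file is one row with tab-separated fields in column order. Reconstructing from the query output later in this post, a student.txt line would look like:

5	周梅	1991/12/1	女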

create table bdqn_student(
sno int,
sname string,
sbirthdate string,
sgender string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;

create table bdqn_teacher(
tno int,
tname string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;

create table bdqn_course(
cno int,
cname string,
tno int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;

create table bdqn_score(
sno int,
cno int,
score string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;

hive> create table bdqn_student(
sno int,
sname string,
sbirthdate string,
sgender string);
OK
Time taken: 0.583 seconds
hive> create table bdqn_teacher(
tno int,
tname string);
OK
Time taken: 0.106 seconds
hive> create table bdqn_course(
cno int,
cname string,
tno int);
OK
Time taken: 0.105 seconds
hive> create table bdqn_score(
sno int,
cno int,
score string);
OK
Time taken: 0.094 seconds

hive> show tables;
OK
bdqn_course
bdqn_score
bdqn_student
bdqn_teacher
ncdc
Time taken: 0.021 seconds, Fetched: 5 row(s)

Four tables in total (ncdc is a pre-existing table, not part of this exercise).
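If needed, describe confirms each table's schema (a hypothetical check, not part of the original session):

describe bdqn_student;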

load data local inpath '/opt/hadoop/hadoopDATA/sql_Query_do_not_delete/course.txt' into table bdqn_course;
load data local inpath '/opt/hadoop/hadoopDATA/sql_Query_do_not_delete/student.txt' into table bdqn_student;
load data local inpath '/opt/hadoop/hadoopDATA/sql_Query_do_not_delete/teacher.txt' into table bdqn_teacher;
load data local inpath '/opt/hadoop/hadoopDATA/sql_Query_do_not_delete/score.txt' into table bdqn_score;
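A quick sanity check (a minimal example, not part of the original session) confirms the rows loaded and the tab delimiter parsed:

select * from bdqn_student limit 3;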

Fixing garbled Chinese output:

Solution:
1. Change the configuration of the remote Linux machine:
[root@rhel ~]# vi /etc/sysconfig/i18n
Change LANG to a character set that supports UTF-8,
e.g. LANG="zh_CN.UTF-8" or LANG="en_US.UTF-8"; this post uses the latter.
2. Change SecureCRT's Session Options:
Options -> Session Options -> Appearance -> Font -> NSimSun (新宋体), character set: Chinese GB2312 -> Character encoding: UTF-8
3. Done.
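After step 1, /etc/sysconfig/i18n would contain a locale line like this (assuming the en_US choice above):

LANG="en_US.UTF-8"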

Queries:
Query the student number, student name, and average score of every student whose average score is at least 60 (hint: subquery, grouping).
select st.sname, ascore from bdqn_student st join
(select sno,avg(score) ascore from bdqn_score group by sno having avg(score)>=60) sc on sc.sno=st.sno

hive> select st.sname, ascore from bdqn_student st join
(select sno,avg(score) ascore from bdqn_score group by sno having avg(score)>=60) sc on sc.sno=st.sno;
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=
In order to set a constant number of reducers:
set mapred.reduce.tasks=
Starting Job = job_201507050950_0007, Tracking URL = http://master:50030/jobdetails.jsp?jobid=job_201507050950_0007
Kill Command = /opt/hadoop/hadoop-1.2.1/libexec/../bin/hadoop job -kill job_201507050950_0007
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2015-07-06 15:46:11,004 Stage-2 map = 0%, reduce = 0%
2015-07-06 15:46:15,029 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.86 sec
2015-07-06 15:46:16,034 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.86 sec
2015-07-06 15:46:17,040 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.86 sec
2015-07-06 15:46:18,046 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.86 sec
2015-07-06 15:46:19,051 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.86 sec
2015-07-06 15:46:20,057 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.86 sec
2015-07-06 15:46:21,063 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.86 sec
2015-07-06 15:46:22,068 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.86 sec
2015-07-06 15:46:23,074 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.86 sec
2015-07-06 15:46:24,079 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.86 sec
2015-07-06 15:46:25,090 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 5.08 sec
2015-07-06 15:46:26,096 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 5.08 sec
2015-07-06 15:46:27,102 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 5.08 sec
2015-07-06 15:46:28,108 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 5.08 sec
MapReduce Total cumulative CPU time: 5 seconds 80 msec
Ended Job = job_201507050950_0007
Launching Job 2 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=
In order to set a constant number of reducers:
set mapred.reduce.tasks=
Starting Job = job_201507050950_0008, Tracking URL = http://master:50030/jobdetails.jsp?jobid=job_201507050950_0008
Kill Command = /opt/hadoop/hadoop-1.2.1/libexec/../bin/hadoop job -kill job_201507050950_0008
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
2015-07-06 15:46:35,818 Stage-1 map = 0%, reduce = 0%
2015-07-06 15:46:39,836 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 1.85 sec
2015-07-06 15:46:40,841 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 1.85 sec
2015-07-06 15:46:41,848 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.69 sec
2015-07-06 15:46:42,853 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.69 sec
2015-07-06 15:46:43,859 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.69 sec
2015-07-06 15:46:44,864 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.69 sec
2015-07-06 15:46:45,869 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.69 sec
2015-07-06 15:46:46,875 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.69 sec
2015-07-06 15:46:47,880 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 3.69 sec
2015-07-06 15:46:48,888 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.73 sec
2015-07-06 15:46:49,894 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.73 sec
2015-07-06 15:46:50,900 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.73 sec
2015-07-06 15:46:51,906 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.73 sec
MapReduce Total cumulative CPU time: 6 seconds 730 msec
Ended Job = job_201507050950_0008
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 5.08 sec HDFS Read: 377 HDFS Write: 226 SUCCESS
Job 1: Map: 2 Reduce: 1 Cumulative CPU: 6.73 sec HDFS Read: 1109 HDFS Write: 73 SUCCESS
Total MapReduce CPU Time Spent: 11 seconds 810 msec
OK
赵雷 89.66666666666667
钱电 70.0
孙风 80.0
周梅 81.5
郑竹 93.5
Time taken: 51.375 seconds, Fetched: 5 row(s)
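The question also asks for the student number, which the query above leaves out; adding st.sno to the select list is enough (a variant sketch, not run in this session):

select st.sno, st.sname, ascore from bdqn_student st join
(select sno, avg(score) ascore from bdqn_score group by sno having avg(score)>=60) sc on sc.sno=st.sno;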

Hive only supports subqueries in the FROM clause; the subquery must be given a name, and its columns must be unique: SELECT ... FROM (subquery) name ...

Written as raw MapReduce this would be very complicated, but a simple subquery takes care of it. You can also see that this query actually runs as two jobs.
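To make the FROM-clause rule concrete, here is a hypothetical rewrite (not run in this session) of a WHERE-clause subquery, which this era's Hive rejects, into the form Hive requires:

-- MySQL-style, rejected by the Hive used here: a subquery in WHERE
-- select sname from bdqn_student where sno in (select sno from bdqn_score where score >= 60);
-- Hive form: push the subquery into FROM and give it a mandatory alias
select st.sname from bdqn_student st
join (select distinct sno from bdqn_score where score >= 60) t on t.sno = st.sno;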

3. Query every student's number, name, number of courses taken, and total score across all courses.

select st.sname, ascore ,sum from bdqn_student st join
(select sno,sum(score) ascore,count(*) sum from bdqn_score group by sno) sc on sc.sno=st.sno

hive> select st.sname, ascore ,sum from bdqn_student st join
(select sno,sum(score) ascore,count(*) sum from bdqn_score group by sno) sc on sc.sno=st.sno
;
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=
In order to set a constant number of reducers:
set mapred.reduce.tasks=
Starting Job = job_201507050950_0009, Tracking URL = http://master:50030/jobdetails.jsp?jobid=job_201507050950_0009
Kill Command = /opt/hadoop/hadoop-1.2.1/libexec/../bin/hadoop job -kill job_201507050950_0009
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2015-07-06 16:00:40,162 Stage-2 map = 0%, reduce = 0%
2015-07-06 16:00:43,179 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.65 sec
2015-07-06 16:00:44,184 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.65 sec
2015-07-06 16:00:45,189 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.65 sec
2015-07-06 16:00:46,194 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.65 sec
2015-07-06 16:00:47,199 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.65 sec
2015-07-06 16:00:48,205 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.65 sec
2015-07-06 16:00:49,210 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.65 sec
2015-07-06 16:00:50,215 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.65 sec
2015-07-06 16:00:51,220 Stage-2 map = 100%, reduce = 33%, Cumulative CPU 1.65 sec
2015-07-06 16:00:52,225 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 4.57 sec
2015-07-06 16:00:53,231 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 4.57 sec
2015-07-06 16:00:54,236 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 4.57 sec
2015-07-06 16:00:55,242 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 4.57 sec
MapReduce Total cumulative CPU time: 4 seconds 570 msec
Ended Job = job_201507050950_0009
Launching Job 2 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=
In order to set a constant number of reducers:
set mapred.reduce.tasks=
Starting Job = job_201507050950_0010, Tracking URL = http://master:50030/jobdetails.jsp?jobid=job_201507050950_0010
Kill Command = /opt/hadoop/hadoop-1.2.1/libexec/../bin/hadoop job -kill job_201507050950_0010
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
2015-07-06 16:01:01,938 Stage-1 map = 0%, reduce = 0%
2015-07-06 16:01:04,952 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 1.27 sec
2015-07-06 16:01:05,957 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 1.27 sec
2015-07-06 16:01:06,962 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 1.27 sec
2015-07-06 16:01:07,967 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.64 sec
2015-07-06 16:01:08,972 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.64 sec
2015-07-06 16:01:09,978 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.64 sec
2015-07-06 16:01:10,983 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.64 sec
2015-07-06 16:01:11,988 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.64 sec
2015-07-06 16:01:12,993 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.64 sec
2015-07-06 16:01:13,999 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.64 sec
2015-07-06 16:01:15,005 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.52 sec
2015-07-06 16:01:16,011 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.52 sec
2015-07-06 16:01:17,016 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.52 sec
MapReduce Total cumulative CPU time: 5 seconds 520 msec
Ended Job = job_201507050950_0010
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 4.57 sec HDFS Read: 377 HDFS Write: 285 SUCCESS
Job 1: Map: 2 Reduce: 1 Cumulative CPU: 5.52 sec HDFS Read: 1170 HDFS Write: 104 SUCCESS
Total MapReduce CPU Time Spent: 10 seconds 90 msec
OK
赵雷 269.0 3
钱电 210.0 3
孙风 240.0 3
李云 100.0 3
周梅 163.0 2
吴兰 65.0 2
郑竹 187.0 2
Time taken: 44.616 seconds, Fetched: 7 row(s)
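As before, the problem statement also wants the student number; a variant that returns it, and avoids reusing the built-in function name sum as a column alias, might look like this (a sketch, not run in this session):

select st.sno, st.sname, sc.total_score, sc.course_cnt from bdqn_student st join
(select sno, sum(score) total_score, count(*) course_cnt from bdqn_score group by sno) sc on sc.sno=st.sno;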

8. Query the information of students who have not taken all of the courses.

select * from bdqn_student st join (
select sno, count(*) from bdqn_score group by sno having count(*)<>3) temp on temp.sno=st.sno

hive> select * from bdqn_student st join (
select sno, count(*) from bdqn_score group by sno having count(*)<>3) temp on temp.sno=st.sno;
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=
In order to set a constant number of reducers:
set mapred.reduce.tasks=
Starting Job = job_201507050950_0011, Tracking URL = http://master:50030/jobdetails.jsp?jobid=job_201507050950_0011
Kill Command = /opt/hadoop/hadoop-1.2.1/libexec/../bin/hadoop job -kill job_201507050950_0011
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2015-07-06 16:05:29,038 Stage-1 map = 0%, reduce = 0%
2015-07-06 16:05:32,051 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.21 sec
2015-07-06 16:05:33,057 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.21 sec
2015-07-06 16:05:34,062 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.21 sec
2015-07-06 16:05:35,067 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.21 sec
2015-07-06 16:05:36,072 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.21 sec
2015-07-06 16:05:37,077 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.21 sec
2015-07-06 16:05:38,082 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.21 sec
2015-07-06 16:05:39,088 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.21 sec
2015-07-06 16:05:40,093 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 1.21 sec
2015-07-06 16:05:41,098 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 1.21 sec
2015-07-06 16:05:42,103 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.63 sec
2015-07-06 16:05:43,109 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.63 sec
2015-07-06 16:05:44,115 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.63 sec
MapReduce Total cumulative CPU time: 4 seconds 630 msec
Ended Job = job_201507050950_0011
Launching Job 2 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=
In order to set a constant number of reducers:
set mapred.reduce.tasks=
Starting Job = job_201507050950_0012, Tracking URL = http://master:50030/jobdetails.jsp?jobid=job_201507050950_0012
Kill Command = /opt/hadoop/hadoop-1.2.1/libexec/../bin/hadoop job -kill job_201507050950_0012
Hadoop job information for Stage-2: number of mappers: 2; number of reducers: 1
2015-07-06 16:05:51,818 Stage-2 map = 0%, reduce = 0%
2015-07-06 16:05:54,833 Stage-2 map = 50%, reduce = 0%, Cumulative CPU 1.0 sec
2015-07-06 16:05:55,838 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 2.06 sec
2015-07-06 16:05:56,844 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 2.06 sec
2015-07-06 16:05:57,849 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 2.06 sec
2015-07-06 16:05:58,854 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 2.06 sec
2015-07-06 16:05:59,859 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 2.06 sec
2015-07-06 16:06:00,865 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 2.06 sec
2015-07-06 16:06:01,870 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 2.06 sec
2015-07-06 16:06:02,875 Stage-2 map = 100%, reduce = 33%, Cumulative CPU 2.06 sec
2015-07-06 16:06:03,881 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 4.92 sec
2015-07-06 16:06:04,887 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 4.92 sec
2015-07-06 16:06:05,893 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 4.92 sec
MapReduce Total cumulative CPU time: 4 seconds 920 msec
Ended Job = job_201507050950_0012
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 4.63 sec HDFS Read: 377 HDFS Write: 153 SUCCESS
Job 1: Map: 2 Reduce: 1 Cumulative CPU: 4.92 sec HDFS Read: 1038 HDFS Write: 79 SUCCESS
Total MapReduce CPU Time Spent: 9 seconds 550 msec
OK
5 周梅 1991/12/1 女 5 2
6 吴兰 1992/3/1 女 6 2
7 郑竹 1989/7/1 女 7 2
Time taken: 43.597 seconds, Fetched: 3 row(s)
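The hard-coded 3 assumes bdqn_course holds exactly three courses. A variant that derives the count from the course table itself might look like this (a sketch, not run in this session; CROSS JOIN needs a reasonably recent Hive):

select st.* from bdqn_student st
join (select sno, count(*) cnt from bdqn_score group by sno) sc on sc.sno = st.sno
cross join (select count(*) total from bdqn_course) c
where sc.cnt <> c.total;

Note that, like the original, this still misses students with no score rows at all, since the inner join drops them.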

Copyright notice: this is an original post by the author; please do not repost without permission.

Reposted from: https://www.cnblogs.com/mrcharles/p/4731715.html
