HIVE 和 SPARK 中的 JOIN
参考了以下文档:
②
JOIN 的作用
JOIN 用来将又关系的行进行关联
常见的 JOIN
类型有:
[ INNER ] | CROSS | LEFT [ OUTER ] | [ LEFT ] SEMI | RIGHT [ OUTER ] | FULL [ OUTER ] | [ LEFT ] ANTI
INNER JOIN(内连接)
Inner Join 是 Spark
中默认的连接,将两张表中符合关联条件的行返回,Inner
可以省略
-- 语法
relation [ INNER ] JOIN relation [ join_criteria ]
SELECT id
,name
,employee.deptno
,deptname
FROM employee
INNER JOIN department
ON employee.deptno = department.deptno
;
LEFT JOIN(左外连接 | 左连接)
返回左表的所有值和右表的匹配值,如果没有匹配,则附加 NULL
。它也称为左外连接
注意:关联条件左边的表称为 左表(也称为主表)
,关联条件右边的表称为 右表(也称为副表)
-- 语法
relation LEFT [ OUTER ] JOIN relation [ join_criteria ]
SELECT id
,name
,employee.deptno
,deptname
FROM employee
LEFT JOIN department
ON employee.deptno = department.deptno;
注意:无论 ON
后面再加什么条件都不能改变结果个数,能改变的只是右表查出来的数据是不是 NULL
,示例:
-- 对 employee.name 进行过滤,只对右表生效,仔细想一下
SELECT id
,name
,employee.deptno
,deptname
FROM employee
LEFT JOIN department
ON employee.deptno = department.deptno AND employee.name = 'Lisa';
-- 对 epartment.deptno 进行过滤,只对右表生效,仔细想一下
SELECT id
,name
,employee.deptno
,deptname
FROM employee
LEFT JOIN department
ON employee.deptno = department.deptno AND department.deptno = 3;
RIGHT JOIN(右外连接 | 右连接)
右连接返回右表的所有值和左表的匹配值,如果没有匹配,则附加 NULL
-- 语法格式
relation RIGHT [ OUTER ] JOIN relation [ join_criteria ]
SELECT id
,name
,employee.deptno
,deptname
FROM employee
RIGHT JOIN department
ON employee.deptno = department.deptno;
员工表中增加一条数据,部门号为 3
,然后再次执行查询
FULL JOIN(完全连接 | 完全外连接)
完全连接返回两个表的所有值,在没有匹配的一侧加 NULL
值,返回的行数是两个表的总行数
-- 语法格式
relation FULL [ OUTER ] JOIN relation [ join_criteria ]
SELECT id
,name
,employee.deptno
,deptname
FROM employee
FULL JOIN department
ON employee.deptno = department.deptno;
向 部门表
中加一条数据,部门号为 7
此时再看
CROSS JOIN(交叉连接 | 笛卡尔积)
交叉连接返回两个表的笛卡尔积(左表 3 行,右表 2 行,笛卡尔积后结果为 6 行)
-- 格式
relation CROSS JOIN relation [ join_criteria ]
SELECT id
,name
,employee.deptno
,deptname
FROM employee
CROSS JOIN department;
SEMI JOIN(半连接)
从左表返回与右表匹配的值(并不是返回左表的全部行,只返回与右表匹配的行,右表字段是不会展示出来的,左表的字段都会显示)
-- 语法格式
relation [ LEFT ] SEMI JOIN relation [ join_criteria ]
-- Spark 中为 SEMI JOIN
SELECT *
FROM employee
SEMI JOIN department
ON employee.deptno = department.deptno;
-- Hive 中为 LEFT SEMI JOIN
SELECT *
FROM employee
LEFT SEMI JOIN department
ON employee.deptno = department.deptno;
ANTI JOIN(反连接)
从与右表中不匹配的左表返回值
-- 语法格式
relation [ LEFT ] ANTI JOIN relation [ join_criteria ]
-- Spark 独有的格式,Hive 中不支持
SELECT *
FROM employee
ANTI JOIN department
ON employee.deptno = department.deptno;