SQL Set Operations in Detail

1. Introduction to set operations:

There are situations when we need to combine the results from two or more SELECT statements. SQL enables us to handle these requirements by using set operations. The result of each SELECT statement can be treated as a set, and SQL set operations can be applied on those sets to arrive at a final result. Oracle SQL supports the following four set operations:

· UNION ALL

· UNION

· MINUS

· INTERSECT

SQL statements containing these set operators are referred to as compound queries, and each SELECT statement in a compound query is referred to as a component query. Two SELECTs can be combined into a compound query by a set operation only if they satisfy the following two conditions:

·The result sets of both the queries must have the same number of columns.

·The datatype of each column in the second result set must match the datatype of its corresponding column in the first result set.

The second condition is looser than it sounds: the column datatypes do not have to be identical, provided Oracle can implicitly convert one to the other (for example, CHAR and VARCHAR2).

These conditions are also referred to as union compatibility conditions. The term union compatibility is used even though these conditions apply to other set operations as well. Set operations are often called vertical joins, because the result combines data from two or more SELECTS based on columns instead of rows. The generic syntax of a query involving a set operation is:

component_query

{UNION | UNION ALL | MINUS | INTERSECT}

component_query
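The first union compatibility condition can be checked with a small experiment. The sketch below is an assumption-laden illustration, not Oracle itself: it uses Python's built-in sqlite3 module, and SQLite enforces only the column-count rule (it is dynamically typed, so it does not reproduce Oracle's datatype check). The column aliases are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two columns on each side: the component queries are union compatible.
compatible = conn.execute(
    "SELECT 1 AS num, 'a' AS string UNION ALL SELECT 2, 'b'"
).fetchall()

# Two columns against one: the compound query is rejected when it is prepared.
try:
    conn.execute("SELECT 1 AS num, 'a' AS string UNION ALL SELECT 2")
    mismatch_error = None
except sqlite3.OperationalError as exc:
    mismatch_error = str(exc)

print(compatible)
print(mismatch_error)
```

The second statement never returns rows; the engine refuses it because the two SELECT lists have different lengths.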

2. Set operators:

The following list briefly describes the four set operations supported by Oracle SQL:

·UNION ALL

Combines the results of two SELECT statements into one result set.

·UNION

Combines the results of two SELECT statements into one result set, and then eliminates any duplicate rows from that result set.

·MINUS

Takes the result set of one SELECT statement, and removes those rows that are also returned by a second SELECT statement.

·INTERSECT

Returns only those rows that are returned by each of two SELECT statements.
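The four operators described above can be compared side by side on two tiny tables. This is a minimal sketch using Python's built-in sqlite3 module rather than Oracle; SQLite spells Oracle's MINUS as EXCEPT, and the tables a and b are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (n INTEGER);
    CREATE TABLE b (n INTEGER);
    INSERT INTO a VALUES (1), (2), (2), (3);
    INSERT INTO b VALUES (2), (3), (4);
""")

def run(operator):
    """Combine the two tables with the given set operator and sort in Python."""
    sql = f"SELECT n FROM a {operator} SELECT n FROM b"
    return sorted(row[0] for row in conn.execute(sql))

# EXCEPT is SQLite's name for the operation Oracle calls MINUS.
results = {op: run(op) for op in ("UNION ALL", "UNION", "EXCEPT", "INTERSECT")}

for op, rows in results.items():
    print(f"{op:10} {rows}")
# UNION ALL  [1, 2, 2, 2, 3, 3, 4]
# UNION      [1, 2, 3, 4]
# EXCEPT     [1]
# INTERSECT  [2, 3]
```

Only UNION ALL keeps duplicates; the other three operators return distinct rows.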

A．Union All:

The UNION ALL operator merges the result sets of two component queries, returning every row retrieved by either of them. It does not check for duplicates, so a row that appears in both component queries (or more than once in one of them) appears that many times in the final result set.

B．Union：

The UNION operator returns all distinct rows retrieved by two component queries. The UNION operation eliminates duplicates while merging rows retrieved by either of the component queries.

To eliminate duplicate rows, a UNION operation has to do extra work compared to UNION ALL, typically sorting the combined rows and filtering out the duplicates. This is why the result set of UNION often comes back sorted while the result set of UNION ALL does not; the ordering is a side effect of duplicate elimination, not a guarantee, so add an ORDER BY if the order matters. The extra work is a performance overhead: a query involving UNION takes more time than the same query with UNION ALL, even if there are no duplicates to remove. Therefore, unless we have a valid need to retrieve only distinct rows, we should use UNION ALL instead of UNION for better performance.
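One way to see the extra work UNION does is to note that it gives the same rows as applying DISTINCT to the UNION ALL result. The sketch below checks that equivalence with Python's built-in sqlite3 module (an assumption; the article targets Oracle), using hypothetical tables east and west.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE east (city TEXT);
    CREATE TABLE west (city TEXT);
    INSERT INTO east VALUES ('Boston'), ('Austin'), ('Boston');
    INSERT INTO west VALUES ('Austin'), ('Denver');
""")

# UNION ALL: plain concatenation, duplicates kept.
union_all = conn.execute(
    "SELECT city FROM east UNION ALL SELECT city FROM west"
).fetchall()

# UNION: duplicates removed; ORDER BY makes the order explicit.
union = conn.execute(
    "SELECT city FROM east UNION SELECT city FROM west ORDER BY 1"
).fetchall()

# The same rows, spelled as DISTINCT over the UNION ALL result.
distinct_over_all = conn.execute("""
    SELECT DISTINCT city
    FROM (SELECT city FROM east UNION ALL SELECT city FROM west)
    ORDER BY 1
""").fetchall()

print(len(union_all), union, distinct_over_all)
```

When the component queries cannot produce overlapping rows, the duplicate-elimination step has nothing to remove, which is the case where UNION ALL is the better choice.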

C．Intersect:

INTERSECT returns only the rows retrieved by both component queries. Compare this with UNION, which returns the rows retrieved by any of the component queries. If UNION acts like 'OR', INTERSECT acts like 'AND'.

D．Minus:

MINUS returns all rows from the first SELECT that are not also returned by the second SELECT.

Example:

Query 1:

SELECT CUST_NBR, NAME

FROM CUSTOMER

WHERE REGION_ID = 5;

CUST_NBR NAME

---------- ------------------------------

1 Cooper Industries

2 Emblazon Corp.

3 Ditech Corp.

4 Flowtech Inc.

5 Gentech Industries

Query 2:

SELECT C.CUST_NBR, C.NAME

FROM CUSTOMER C

WHERE C.CUST_NBR IN (SELECT O.CUST_NBR

FROM CUST_ORDER O, EMPLOYEE E

WHERE O.SALES_EMP_ID = E.EMP_ID

AND E.LNAME = 'MARTIN');

CUST_NBR NAME

---------- ------------------------------

4 Flowtech Inc.

8 Zantech Inc.

Subtracting the second result set from the first:

SELECT CUST_NBR, NAME

FROM CUSTOMER

WHERE REGION_ID = 5

MINUS

SELECT C.CUST_NBR, C.NAME

FROM CUSTOMER C

WHERE C.CUST_NBR IN (SELECT O.CUST_NBR

FROM CUST_ORDER O, EMPLOYEE E

WHERE O.SALES_EMP_ID = E.EMP_ID

AND E.LNAME = 'MARTIN');

CUST_NBR NAME

---------- ------------------------------

1 Cooper Industries

2 Emblazon Corp.

3 Ditech Corp.

5 Gentech Industries

You might wonder why we don't see "Zantech Inc." in the output. The reason is that MINUS is not symmetric: it starts from the rows of the first component query and only removes rows, and the component queries in a set operation are evaluated from top to bottom. The results of UNION, UNION ALL, and INTERSECT do not change if we alter the order of the component queries. However, the result of MINUS does. If we rewrite the previous query by switching the positions of the two SELECTs, we get a completely different result:

SELECT C.CUST_NBR, C.NAME

FROM CUSTOMER C

WHERE C.CUST_NBR IN (SELECT O.CUST_NBR

FROM CUST_ORDER O, EMPLOYEE E

WHERE O.SALES_EMP_ID = E.EMP_ID

AND E.LNAME = 'MARTIN')

MINUS

SELECT CUST_NBR, NAME

FROM CUSTOMER

WHERE REGION_ID = 5;

CUST_NBR NAME

---------- ------------------------------

8 Zantech Inc.

In a MINUS operation, rows may be returned by the second SELECT that are not also returned by the first. These rows are not included in the output.
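The order dependence of MINUS can be reproduced with the two result sets shown in the example. The sketch below is an illustration under stated assumptions: it uses Python's built-in sqlite3 module, where the operator is spelled EXCEPT, and it loads the two result sets into hypothetical tables region5 and martin instead of recreating the CUSTOMER, CUST_ORDER, and EMPLOYEE tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for the results of Query 1 and Query 2.
    CREATE TABLE region5 (cust_nbr INTEGER, name TEXT);
    CREATE TABLE martin  (cust_nbr INTEGER, name TEXT);
    INSERT INTO region5 VALUES
        (1, 'Cooper Industries'), (2, 'Emblazon Corp.'), (3, 'Ditech Corp.'),
        (4, 'Flowtech Inc.'), (5, 'Gentech Industries');
    INSERT INTO martin VALUES (4, 'Flowtech Inc.'), (8, 'Zantech Inc.');
""")

# Query 1 minus Query 2: Flowtech is removed, Zantech was never a candidate.
first_minus_second = conn.execute("""
    SELECT cust_nbr, name FROM region5
    EXCEPT
    SELECT cust_nbr, name FROM martin
    ORDER BY 1
""").fetchall()

# Query 2 minus Query 1: only Zantech survives.
second_minus_first = conn.execute("""
    SELECT cust_nbr, name FROM martin
    EXCEPT
    SELECT cust_nbr, name FROM region5
    ORDER BY 1
""").fetchall()

print(first_minus_second)
print(second_minus_first)
```

Swapping the operands changes which side supplies the candidate rows, so the two results have no rows in common.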

3. Comparing tables using set operations:

The following query uses both MINUS and UNION ALL to compare two tables for equality. The query depends on each table having either a primary key or at least one unique index.

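(SELECT * FROM CUSTOMER_KNOWN_GOOD

MINUS

SELECT * FROM CUSTOMER_TEST)   -- rows in CUSTOMER_KNOWN_GOOD that are not in CUSTOMER_TEST

UNION ALL   -- both sides are parenthesized so that the MINUS operations run first

(SELECT * FROM CUSTOMER_TEST   -- rows in CUSTOMER_TEST that are not in CUSTOMER_KNOWN_GOOD

MINUS

SELECT * FROM CUSTOMER_KNOWN_GOOD);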

We can look at it as the union of two compound queries. The parentheses ensure that both MINUS operations take place first before the UNION ALL operation is performed. The result of the first MINUS query will be those rows in CUSTOMER_KNOWN_GOOD that are not also in CUSTOMER_TEST. The result of the second MINUS query will be those rows in CUSTOMER_TEST that are not also in CUSTOMER_KNOWN_GOOD. The UNION ALL operator simply combines these two result sets for convenience. If no rows are returned by this query, then we know that both tables have identical rows. Any rows returned by this query represent differences between the CUSTOMER_TEST and CUSTOMER_KNOWN_GOOD tables.
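The comparison can be wrapped in a small helper. The following is a sketch under assumptions, not the article's Oracle query: it runs on Python's built-in sqlite3 module, uses EXCEPT in place of MINUS, and wraps each side in a subquery because SQLite does not accept parentheses directly around component queries. The helper name table_diff and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_known_good (cust_nbr INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE customer_test       (cust_nbr INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO customer_known_good VALUES (1, 'Sony'), (2, 'Samsung'), (3, 'Panasonic');
    INSERT INTO customer_test       VALUES (1, 'Sony'), (2, 'Samsung'), (4, 'Philips');
""")

def table_diff(conn, left, right):
    """Return rows found in exactly one of the two tables.

    Table names are interpolated directly, so pass trusted names only.
    """
    sql = f"""
        SELECT * FROM (SELECT * FROM {left} EXCEPT SELECT * FROM {right})
        UNION ALL
        SELECT * FROM (SELECT * FROM {right} EXCEPT SELECT * FROM {left})
    """
    return sorted(conn.execute(sql).fetchall())

differences = table_diff(conn, "customer_known_good", "customer_test")
same_table = table_diff(conn, "customer_known_good", "customer_known_good")

print(differences)  # rows present in only one table
print(same_table)   # empty list: a table is equal to itself
```

An empty result means the two tables hold the same rows; each returned row points at a difference in one direction or the other.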

If one or both tables may contain duplicate rows, the short form can report the tables as equal when they are not, so we must use a more general form of this query. The general form groups each table by all of its columns and adds a row count, then applies the same pair of MINUS operations. A row that occurs the same number of times in both tables cancels out; a row whose counts differ shows up in the result:

(SELECT C1.*, COUNT(*) FROM CUSTOMER_KNOWN_GOOD C1

GROUP BY C1.CUST_NBR, C1.NAME

MINUS

SELECT C2.*, COUNT(*) FROM CUSTOMER_TEST C2

GROUP BY C2.CUST_NBR, C2.NAME)   -- grouped counts of CUSTOMER_KNOWN_GOOD minus grouped counts of CUSTOMER_TEST

UNION ALL

(SELECT C3.*, COUNT(*) FROM CUSTOMER_TEST C3

GROUP BY C3.CUST_NBR, C3.NAME

MINUS

SELECT C4.*, COUNT(*)

FROM CUSTOMER_KNOWN_GOOD C4

GROUP BY C4.CUST_NBR, C4.NAME);   -- grouped counts of CUSTOMER_TEST minus grouped counts of CUSTOMER_KNOWN_GOOD

CUST_NBR NAME COUNT(*)

----------- ------------------------------ ----------

2 Samsung 1   -- rows from CUSTOMER_KNOWN_GOOD minus CUSTOMER_TEST

3 Panasonic 3

2 Samsung 2   -- rows from CUSTOMER_TEST minus CUSTOMER_KNOWN_GOOD

3 Panasonic 1

These results indicate that one table (CUSTOMER_KNOWN_GOOD) has one record for "Samsung", whereas the second table (CUSTOMER_TEST) has two records for the same customer. Also, one table (CUSTOMER_KNOWN_GOOD) has three records for "Panasonic", whereas the second table (CUSTOMER_TEST) has one record for the same customer. Both the tables have the same number of rows (two) for "Sony", and therefore "Sony" doesn't appear in the output.

Duplicate rows are not possible in tables that have a primary key or at least one unique index. Use the short form of the table comparison query for such tables.
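To see why the row counts matter, the sketch below runs both forms of the comparison on tables with the duplicate counts described above. It is a rough illustration using Python's built-in sqlite3 module, with EXCEPT standing in for MINUS and subqueries standing in for Oracle's parentheses. The helper names compare and counted are hypothetical, and the CUST_NBR value 1 for "Sony" is an assumption, since the article does not show it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_known_good (cust_nbr INTEGER, name TEXT);
    CREATE TABLE customer_test       (cust_nbr INTEGER, name TEXT);
""")
# Same distinct rows in both tables, but different numbers of copies.
known_good = [(1, "Sony")] * 2 + [(2, "Samsung")] * 1 + [(3, "Panasonic")] * 3
test_rows = [(1, "Sony")] * 2 + [(2, "Samsung")] * 2 + [(3, "Panasonic")] * 1
conn.executemany("INSERT INTO customer_known_good VALUES (?, ?)", known_good)
conn.executemany("INSERT INTO customer_test VALUES (?, ?)", test_rows)

def counted(table):
    """SELECT that groups by every column and appends the number of copies."""
    return f"SELECT cust_nbr, name, COUNT(*) FROM {table} GROUP BY cust_nbr, name"

def compare(select_a, select_b):
    """(A minus B) UNION ALL (B minus A), sorted in Python for stable output."""
    sql = f"""
        SELECT * FROM ({select_a} EXCEPT {select_b})
        UNION ALL
        SELECT * FROM ({select_b} EXCEPT {select_a})
    """
    return sorted(conn.execute(sql).fetchall())

short_form = compare("SELECT * FROM customer_known_good",
                     "SELECT * FROM customer_test")
general_form = compare(counted("customer_known_good"),
                       counted("customer_test"))

print(short_form)    # [] because the set operator ignores duplicate copies
print(general_form)  # the Samsung and Panasonic count mismatches
```

The short form reports no differences here even though the tables differ, while the counted form surfaces the two customers whose copy counts do not match.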

4. Using NULLs in compound queries:

A NULL literal has no datatype of its own and can stand in for a value of any datatype. If we purposely select NULL as a column value in a component query, Oracle no longer has two datatypes to compare in order to see whether the two component queries are compatible.

For character columns, this is no problem. For example:

SELECT 1 NUM, 'DEFINITE' STRING FROM DUAL

UNION

SELECT 2 NUM, NULL STRING FROM DUAL;

NUM STRING

---------- --------

1 DEFINITE

2

Notice that Oracle considers the character string 'DEFINITE' from the first component query to be compatible with the NULL value supplied for the corresponding column in the second component query.

However, if a NUMBER or a DATE column of a component query is set to NULL, we must explicitly tell Oracle what "flavor" of NULL to use. Otherwise, we'll encounter errors.

For example:

SELECT 1 NUM, 'DEFINITE' STRING FROM DUAL

UNION

SELECT NULL NUM, 'UNKNOWN' STRING FROM DUAL;

SELECT 1 NUM, 'DEFINITE' STRING FROM DUAL

ERROR at line 1:

ORA-01790: expression must have same datatype as corresponding expression

Note that the use of NULL in the second component query causes a datatype mismatch between the first column of the first component query and the first column of the second component query. Using NULL for a DATE column causes the same problem.

In these cases, we need to cast the NULL to a suitable datatype to fix the problem, as in the following example:

SELECT 1 NUM, 'DEFINITE' STRING FROM DUAL

UNION

SELECT TO_NUMBER(NULL) NUM, 'UNKNOWN' STRING FROM DUAL;

NUM STRING

---------- --------

1 DEFINITE

UNKNOWN

This union compatibility problem with NULLs is encountered in Oracle8i. There is no such problem in Oracle9i, which is smart enough to know which flavor of NULL to use in a compound query.

5. Rules and restrictions on set operations:

There are some other rules and restrictions that apply to set operations.

Column names for the result set are derived from the first SELECT.

If we want to use ORDER BY in a query involving set operations, we must place the ORDER BY at the end of the entire statement. The ORDER BY clause can appear only once, at the end of the compound query; the component queries can't have individual ORDER BY clauses. For example:

SELECT CUST_NBR, NAME

FROM CUSTOMER

WHERE REGION_ID = 5

UNION

SELECT EMP_ID, LNAME

FROM EMPLOYEE

WHERE LNAME = 'MARTIN'

ORDER BY CUST_NBR;

CUST_NBR NAME

---------- ---------------------

1 Cooper Industries

2 Emblazon Corp.

3 Ditech Corp.

4 Flowtech Inc.

5 Gentech Industries

Note that the column name used in the ORDER BY clause of this query is taken from the first SELECT. We couldn't order these results by EMP_ID. If we attempt to ORDER BY EMP_ID, we will get an error, as in the following example:

SELECT CUST_NBR, NAME

FROM CUSTOMER

WHERE REGION_ID = 5

UNION

SELECT EMP_ID, LNAME

FROM EMPLOYEE

WHERE LNAME = 'MARTIN'

ORDER BY EMP_ID;

ORDER BY EMP_ID

ERROR at line 8:

ORA-00904: invalid column name

The ORDER BY clause doesn't recognize the column names of the second SELECT. To avoid confusion over column names, it is a common practice to ORDER BY column positions:

SELECT CUST_NBR, NAME

FROM CUSTOMER

WHERE REGION_ID = 5

UNION

SELECT EMP_ID, LNAME

FROM EMPLOYEE

WHERE LNAME = 'MARTIN'

ORDER BY 1;
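Two of these rules are easy to observe from a client program: the result columns take their names from the first SELECT, and a positional ORDER BY sorts the combined rows. The sketch below uses Python's built-in sqlite3 module as a stand-in for Oracle, with hypothetical sample rows. SQLite is more lenient than Oracle about column names in a compound ORDER BY, so the ORA-00904 error is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_nbr INTEGER, name TEXT, region_id INTEGER);
    CREATE TABLE employee (emp_id INTEGER, lname TEXT);
    -- Hypothetical rows, inserted out of order on purpose.
    INSERT INTO customer VALUES
        (3, 'Ditech Corp.', 5), (1, 'Cooper Industries', 5), (9, 'Other Co.', 6);
    INSERT INTO employee VALUES (2, 'MARTIN'), (7, 'SMITH');
""")

cursor = conn.execute("""
    SELECT cust_nbr, name FROM customer WHERE region_id = 5
    UNION
    SELECT emp_id, lname FROM employee WHERE lname = 'MARTIN'
    ORDER BY 1
""")

# Column names come from the first component query.
column_names = [col[0] for col in cursor.description]
rows = cursor.fetchall()

print(column_names)  # ['cust_nbr', 'name']
print(rows)          # sorted on the first column across both component queries
```

Ordering by position keeps the statement valid no matter how the component queries name their columns.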

Unlike ORDER BY, we can use GROUP BY and HAVING clauses in component queries.

Component queries are executed from top to bottom. If we want to alter the sequence of execution, we can use parentheses to group them appropriately.
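The effect of regrouping is easiest to see with three component queries. The following sketch relies on Python's built-in sqlite3 module, which also evaluates set operators from top to bottom; EXCEPT plays the role of MINUS, and a subquery plays the role of Oracle's parentheses. Tables a, b, and c are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (n INTEGER);
    CREATE TABLE b (n INTEGER);
    CREATE TABLE c (n INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (2);
    INSERT INTO c VALUES (2), (4);
""")

# Default order: (a EXCEPT b) first, then UNION with c.
top_to_bottom = [row[0] for row in conn.execute("""
    SELECT n FROM a
    EXCEPT
    SELECT n FROM b
    UNION
    SELECT n FROM c
    ORDER BY 1
""")]

# Regrouped: b UNION c is computed first, then subtracted from a.
regrouped = [row[0] for row in conn.execute("""
    SELECT n FROM a
    EXCEPT
    SELECT n FROM (SELECT n FROM b UNION SELECT n FROM c)
    ORDER BY 1
""")]

print(top_to_bottom)  # [1, 2, 3, 4]
print(regrouped)      # [1, 3]
```

The same three component queries give different results, so any compound query that mixes MINUS with other operators deserves explicit grouping.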