1.Introduction to Set Operations:
There are situations when we need to combine the results from two or more SELECT statements. SQL enables us to handle these requirements by using set operations. The result of each SELECT statement can be treated as a set, and SQL set operations can be applied on those sets to arrive at a final result. Oracle SQL supports the following four set operations:
· UNION ALL
· UNION
· MINUS
· INTERSECT
SQL statements containing these set operators are referred to as compound queries, and each SELECT statement in a compound query is referred to as a component query. Two SELECTs can be combined into a compound query by a set operation only if they satisfy the following two conditions:
·The result sets of both the queries must have the same number of columns.
·The datatype of each column in the second result set must match the datatype of its corresponding column in the first result set.
Note that the second condition does not require the datatypes to be identical. If Oracle can reconcile the two types implicitly (for example, CHAR and VARCHAR2, which belong to the same datatype group), the component queries are still compatible.
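The column-count condition can be tried out without an Oracle instance. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Oracle; the helper `try_compound` is a hypothetical name introduced here. SQLite does not enforce the datatype condition, so only the column-count rule is demonstrated.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle. It enforces the column-count rule
# but, unlike Oracle, not the datatype rule.
conn = sqlite3.connect(":memory:")

def try_compound(sql):
    """Run a compound query; return its rows, or the error text if it is rejected."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error as exc:
        return str(exc)

# Two columns in each component query: accepted.
ok = try_compound("SELECT 1, 'a' UNION ALL SELECT 2, 'b'")

# Two columns versus one column: rejected before any rows are produced.
bad = try_compound("SELECT 1, 'a' UNION ALL SELECT 2")

print(ok)
print(bad)
```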
These conditions are also referred to as union compatibility conditions. The term union compatibility is used even though these conditions apply to the other set operations as well. Set operations are often called vertical joins, because the result combines data from two or more SELECTs based on columns instead of rows. The generic syntax of a query involving a set operation is:
<component query>
{UNION | UNION ALL | MINUS | INTERSECT}
<component query>
2.Set Operators:
The following list briefly describes the four set operations supported by Oracle SQL:
·UNION ALL
Combines the results of two SELECT statements into one result set.
·UNION
Combines the results of two SELECT statements into one result set, and then eliminates any duplicate rows from that result set.
·MINUS
Takes the result set of one SELECT statement, and removes those rows that are also returned by a second SELECT statement.
·INTERSECT
Returns only those rows that are returned by both SELECT statements.
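The four operators can be compared side by side on two tiny tables. This is a sketch under stated assumptions: Python's sqlite3 module stands in for Oracle, the tables `a` and `b` are hypothetical, and SQLite spells Oracle's MINUS as EXCEPT. Results are sorted in Python because a compound query without ORDER BY guarantees no row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (n INTEGER);
    CREATE TABLE b (n INTEGER);
    INSERT INTO a VALUES (1), (2), (2), (3);
    INSERT INTO b VALUES (2), (3), (4);
""")

def run(operator):
    """Combine the two tables with the given set operator and return sorted values."""
    sql = f"SELECT n FROM a {operator} SELECT n FROM b"
    return sorted(row[0] for row in conn.execute(sql))

union_all = run("UNION ALL")  # every row from both tables, duplicates kept
union = run("UNION")          # every distinct value from either table
intersect = run("INTERSECT")  # distinct values present in both tables
minus = run("EXCEPT")         # EXCEPT is SQLite's spelling of Oracle's MINUS

print(union_all)  # [1, 2, 2, 2, 3, 3, 4]
print(union)      # [1, 2, 3, 4]
print(intersect)  # [2, 3]
print(minus)      # [1]
```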
A.Union All:
The UNION ALL operator merges the result sets of two component queries. This operation returns rows retrieved by either of the component queries. The UNION ALL operator simply merges the output of its component queries, without caring about any duplicates in the final result set.
B.Union:
The UNION operator returns all distinct rows retrieved by two component queries. The UNION operation eliminates duplicates while merging rows retrieved by either of the component queries.
To eliminate duplicate rows, a UNION operation has to do some extra work compared to UNION ALL, typically sorting and filtering the result set. As a side effect, the output of UNION often appears sorted whereas the output of UNION ALL does not, but that ordering is not guaranteed; add an ORDER BY clause if the order matters. This extra work introduces a performance overhead: a query involving UNION takes extra time compared to the same query with UNION ALL, even if there are no duplicates to remove. Therefore, unless we have a valid need to retrieve only distinct rows, we should use UNION ALL instead of UNION for better performance.
C.Intersect:
INTERSECT returns only the rows retrieved by both component queries. Compare this with UNION, which returns the rows retrieved by any of the component queries. If UNION acts like 'OR', INTERSECT acts like 'AND'.
D.Minus:
MINUS returns all rows from the first SELECT that are not also returned by the second SELECT.
Example:
Query 1:
SELECT CUST_NBR, NAME
FROM CUSTOMER
WHERE REGION_ID = 5;
CUST_NBR NAME
---------- ------------------------------
1 Cooper Industries
2 Emblazon Corp.
3 Ditech Corp.
4 Flowtech Inc.
5 Gentech Industries
Query 2:
SELECT C.CUST_NBR, C.NAME
FROM CUSTOMER C
WHERE C.CUST_NBR IN (SELECT O.CUST_NBR
FROM CUST_ORDER O, EMPLOYEE E
WHERE O.SALES_EMP_ID = E.EMP_ID
AND E.LNAME = 'MARTIN');
CUST_NBR NAME
---------- ------------------------------
4 Flowtech Inc.
8 Zantech Inc.
Subtracting the second result set from the first:
SELECT CUST_NBR, NAME
FROM CUSTOMER
WHERE REGION_ID = 5
MINUS
SELECT C.CUST_NBR, C.NAME
FROM CUSTOMER C
WHERE C.CUST_NBR IN (SELECT O.CUST_NBR
FROM CUST_ORDER O, EMPLOYEE E
WHERE O.SALES_EMP_ID = E.EMP_ID
AND E.LNAME = 'MARTIN');
CUST_NBR NAME
---------- ------------------------------
1 Cooper Industries
2 Emblazon Corp.
3 Ditech Corp.
5 Gentech Industries
You might wonder why we don't see "Zantech Inc." in the output. An important thing to note here is that the component queries in a set operation are executed from top to bottom. The results of UNION, UNION ALL, and INTERSECT do not change if we alter the order of the component queries. However, the result of MINUS does. If we rewrite the previous query by switching the positions of the two SELECTs, we get a completely different result:
SELECT C.CUST_NBR, C.NAME
FROM CUSTOMER C
WHERE C.CUST_NBR IN (SELECT O.CUST_NBR
FROM CUST_ORDER O, EMPLOYEE E
WHERE O.SALES_EMP_ID = E.EMP_ID
AND E.LNAME = 'MARTIN')
MINUS
SELECT CUST_NBR, NAME
FROM CUSTOMER
WHERE REGION_ID = 5;
CUST_NBR NAME
---------- ------------------------------
8 Zantech Inc.
In a MINUS operation, the second SELECT may return rows that are not returned by the first. These rows are not included in the output.
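The order dependence of MINUS can be reproduced with the rows shown above. This sketch assumes Python's sqlite3 module as a stand-in for Oracle (EXCEPT is SQLite's spelling of MINUS); the tables `region5` and `martin` are hypothetical and simply hold the outputs of Query 1 and Query 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables holding the results of Query 1 and Query 2.
    CREATE TABLE region5 (cust_nbr INTEGER, name TEXT);
    CREATE TABLE martin  (cust_nbr INTEGER, name TEXT);
    INSERT INTO region5 VALUES
        (1, 'Cooper Industries'), (2, 'Emblazon Corp.'), (3, 'Ditech Corp.'),
        (4, 'Flowtech Inc.'), (5, 'Gentech Industries');
    INSERT INTO martin VALUES (4, 'Flowtech Inc.'), (8, 'Zantech Inc.');
""")

def minus(first, second):
    """Rows of `first` that are not in `second` (Oracle MINUS, SQLite EXCEPT)."""
    sql = (f"SELECT cust_nbr, name FROM {first} "
           f"EXCEPT SELECT cust_nbr, name FROM {second} ORDER BY 1")
    return conn.execute(sql).fetchall()

region5_minus_martin = minus("region5", "martin")
martin_minus_region5 = minus("martin", "region5")

print(region5_minus_martin)  # customers 1, 2, 3 and 5
print(martin_minus_region5)  # [(8, 'Zantech Inc.')]
```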
3.Using Set Operations to Compare Two Tables:
The following query uses both MINUS and UNION ALL to compare two tables for equality. The query depends on each table having either a primary key or at least one unique index, so that neither table can contain duplicate rows.
(SELECT * FROM CUSTOMER_KNOWN_GOOD
MINUS
SELECT * FROM CUSTOMER_TEST) -- rows in table A that are not in table B
UNION ALL -- the parentheses on both sides ensure that each MINUS runs first
(SELECT * FROM CUSTOMER_TEST -- rows in table B that are not in table A
MINUS
SELECT * FROM CUSTOMER_KNOWN_GOOD);
We can look at it as the union of two compound queries. The parentheses ensure that both MINUS operations take place first before the UNION ALL operation is performed. The result of the first MINUS query will be those rows in CUSTOMER_KNOWN_GOOD that are not also in CUSTOMER_TEST. The result of the second MINUS query will be those rows in CUSTOMER_TEST that are not also in CUSTOMER_KNOWN_GOOD. The UNION ALL operator simply combines these two result sets for convenience. If no rows are returned by this query, then we know that both tables have identical rows. Any rows returned by this query represent differences between the CUSTOMER_TEST and CUSTOMER_KNOWN_GOOD tables.
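The comparison can be exercised end to end on a small data set. This is a sketch, not the original author's code: it assumes Python's sqlite3 module in place of Oracle, hypothetical sample rows, and EXCEPT in place of MINUS. SQLite does not accept parenthesized component queries, so each MINUS is wrapped in a subquery to get the same grouping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_known_good (cust_nbr INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE customer_test       (cust_nbr INTEGER PRIMARY KEY, name TEXT);
    -- Hypothetical sample data: the two tables differ only in customer 2.
    INSERT INTO customer_known_good VALUES (1, 'Sony'), (2, 'Samsung'), (3, 'Panasonic');
    INSERT INTO customer_test       VALUES (1, 'Sony'), (2, 'Samsung Corp'), (3, 'Panasonic');
""")

# Each subquery plays the role of one parenthesized MINUS in the Oracle query.
DIFF_SQL = """
    SELECT * FROM (SELECT * FROM customer_known_good
                   EXCEPT
                   SELECT * FROM customer_test)
    UNION ALL
    SELECT * FROM (SELECT * FROM customer_test
                   EXCEPT
                   SELECT * FROM customer_known_good)
"""

differences = sorted(conn.execute(DIFF_SQL).fetchall())
print(differences)  # [(2, 'Samsung'), (2, 'Samsung Corp')]

# After the test table is corrected, the query returns no rows.
conn.execute("UPDATE customer_test SET name = 'Samsung' WHERE cust_nbr = 2")
after_fix = conn.execute(DIFF_SQL).fetchall()
print(after_fix)  # []
```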
If the possibility exists for one or both tables to contain duplicate rows, the first form of the query can give a wrong answer, and we must use a more general form in order to test two tables for equality. This more general form groups the rows of each table, counts them, and then subtracts the grouped results. A row that occurs the same number of times in both tables does not appear in the final result set, whereas a row whose counts differ does:
(SELECT C1.*, COUNT(*) FROM CUSTOMER_KNOWN_GOOD C1
GROUP BY C1.CUST_NBR, C1.NAME
MINUS
SELECT C2.*, COUNT(*) FROM CUSTOMER_TEST C2
GROUP BY C2.CUST_NBR, C2.NAME) -- grouped counts of table A minus grouped counts of table B
UNION ALL
(SELECT C3.*, COUNT(*) FROM CUSTOMER_TEST C3
GROUP BY C3.CUST_NBR, C3.NAME
MINUS
SELECT C4.*, COUNT(*)
FROM CUSTOMER_KNOWN_GOOD C4
GROUP BY C4.CUST_NBR, C4.NAME); -- grouped counts of table B minus grouped counts of table A
CUST_NBR NAME COUNT(*)
----------- ------------------------------ ----------
2 Samsung 1 -- result of table A minus table B
3 Panasonic 3
2 Samsung 2 -- result of table B minus table A
3 Panasonic 1
These results indicate that one table (CUSTOMER_KNOWN_GOOD) has one record for "Samsung", whereas the second table (CUSTOMER_TEST) has two records for the same customer. Also, one table (CUSTOMER_KNOWN_GOOD) has three records for "Panasonic", whereas the second table (CUSTOMER_TEST) has one record for the same customer. Both the tables have the same number of rows (two) for "Sony", and therefore "Sony" doesn't appear in the output.
Duplicate rows are not possible in tables that have a primary key or at least one unique index. Use the short form of the table comparison query for such tables.
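The count-based comparison can be checked against the row counts described above. This sketch assumes Python's sqlite3 module as a stand-in for Oracle, EXCEPT in place of MINUS, and subqueries in place of parentheses. The customer number 1 for "Sony" and the helpers `counted` and `one_way` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_known_good (cust_nbr INTEGER, name TEXT);
    CREATE TABLE customer_test       (cust_nbr INTEGER, name TEXT);
""")

# Row counts follow the text; the customer number for Sony is an assumption.
sony, samsung, panasonic = (1, "Sony"), (2, "Samsung"), (3, "Panasonic")
conn.executemany("INSERT INTO customer_known_good VALUES (?, ?)",
                 [sony] * 2 + [samsung] * 1 + [panasonic] * 3)
conn.executemany("INSERT INTO customer_test VALUES (?, ?)",
                 [sony] * 2 + [samsung] * 2 + [panasonic] * 1)

def counted(table):
    """Each distinct row of `table` together with the number of times it occurs."""
    return f"SELECT cust_nbr, name, COUNT(*) FROM {table} GROUP BY cust_nbr, name"

def one_way(left, right):
    """Counted rows of `left` that have no identical counted row in `right`."""
    return f"SELECT * FROM ({counted(left)} EXCEPT {counted(right)})"

sql = (one_way("customer_known_good", "customer_test")
       + " UNION ALL "
       + one_way("customer_test", "customer_known_good"))

differences = sorted(conn.execute(sql).fetchall())
for row in differences:
    print(row)
```

Sony occurs twice in both tables, so its counted row is identical on both sides and is removed by each subtraction.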
4.Using NULLs in Compound Queries:
As we know, NULL doesn't have a datatype, and NULL can be used in place of a value of any datatype. If we purposely select NULL as a column value in a component query, Oracle no longer has two datatypes to compare in order to see whether the two component queries are compatible.
For character columns, this is no problem. For example:
SELECT 1 NUM, 'DEFINITE' STRING FROM DUAL
UNION
SELECT 2 NUM, NULL STRING FROM DUAL;
NUM STRING
---------- --------
1 DEFINITE
2
Notice that Oracle considers the character string 'DEFINITE' from the first component query to be compatible with the NULL value supplied for the corresponding column in the second component query.
However, if a NUMBER or a DATE column of a component query is set to NULL, we must explicitly tell Oracle what "flavor" of NULL to use. Otherwise, we'll encounter errors.
For example:
SELECT 1 NUM, 'DEFINITE' STRING FROM DUAL
UNION
SELECT NULL NUM, 'UNKNOWN' STRING FROM DUAL;
SELECT 1 NUM, 'DEFINITE' STRING FROM DUAL
*
ERROR at line 1:
ORA-01790: expression must have same datatype as corresponding expression
Note that the use of NULL in the second component query causes a datatype mismatch between the first column of the first component query and the first column of the second component query. Using NULL for a DATE column causes the same problem.
In these cases, we need to cast the NULL to a suitable datatype to fix the problem, as in the following example:
SELECT 1 NUM, 'DEFINITE' STRING FROM DUAL
UNION
SELECT TO_NUMBER(NULL) NUM, 'UNKNOWN' STRING FROM DUAL;
NUM STRING
---------- --------
1 DEFINITE
UNKNOWN
This union compatibility problem with NULLs is encountered in Oracle8i. There is no such problem in Oracle9i, which is smart enough to know which flavor of NULL to use in a compound query.
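A typed NULL can also be written with the standard CAST syntax, which Oracle accepts as CAST(NULL AS NUMBER). The sketch below uses Python's sqlite3 module as a stand-in for Oracle, so it only shows the shape of the query: SQLite never raises the datatype error described above, and DUAL is not needed there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CAST gives the NULL an explicit type. In Oracle the equivalent expressions
# would be CAST(NULL AS NUMBER) or TO_NUMBER(NULL), selected FROM DUAL.
rows = conn.execute("""
    SELECT 1 AS num, 'DEFINITE' AS string
    UNION
    SELECT CAST(NULL AS INTEGER), 'UNKNOWN'
""").fetchall()

# SQL NULL arrives in Python as None.
print(rows)
```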
5.Rules and Restrictions on Set Operations:
There are some other rules and restrictions that apply to set operations:
Column names for the result set are derived from the first SELECT.
If we want to use ORDER BY in a query involving set operations, we must place the ORDER BY at the end of the entire statement. The ORDER BY clause can appear only once, at the end of the compound query. The component queries can't have individual ORDER BY clauses:
SELECT CUST_NBR, NAME
FROM CUSTOMER
WHERE REGION_ID = 5
UNION
SELECT EMP_ID, LNAME
FROM EMPLOYEE
WHERE LNAME = 'MARTIN'
ORDER BY CUST_NBR;
CUST_NBR NAME
---------- ---------------------
1 Cooper Industries
2 Emblazon Corp.
3 Ditech Corp.
4 Flowtech Inc.
5 Gentech Industries
Note that the column name used in the ORDER BY clause of this query is taken from the first SELECT. We couldn't order these results by EMP_ID. If we attempt to ORDER BY EMP_ID, we get an error, as in the following example:
SELECT CUST_NBR, NAME
FROM CUSTOMER
WHERE REGION_ID = 5
UNION
SELECT EMP_ID, LNAME
FROM EMPLOYEE
WHERE LNAME = 'MARTIN'
ORDER BY EMP_ID;
ORDER BY EMP_ID
*
ERROR at line 8:
ORA-00904: invalid column name
The ORDER BY clause doesn't recognize the column names of the second SELECT. To avoid confusion over column names, it is a common practice to ORDER BY column positions:
SELECT CUST_NBR, NAME
FROM CUSTOMER
WHERE REGION_ID = 5
UNION
SELECT EMP_ID, LNAME
FROM EMPLOYEE
WHERE LNAME = 'MARTIN'
ORDER BY 1;
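Both rules (result column names come from the first SELECT, and ORDER BY can refer to a column position) can be observed from a client program. This sketch assumes Python's sqlite3 module as a stand-in for Oracle and hypothetical sample rows, including the employee number 7654. SQLite is more lenient than Oracle about names in ORDER BY, so the ORA-00904 error itself is not reproduced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_nbr INTEGER, name TEXT, region_id INTEGER);
    CREATE TABLE employee (emp_id INTEGER, lname TEXT);
    -- Hypothetical sample rows.
    INSERT INTO customer VALUES (2, 'Emblazon Corp.', 5), (1, 'Cooper Industries', 5);
    INSERT INTO employee VALUES (7654, 'MARTIN');
""")

cur = conn.execute("""
    SELECT cust_nbr, name FROM customer WHERE region_id = 5
    UNION
    SELECT emp_id, lname FROM employee WHERE lname = 'MARTIN'
    ORDER BY 1
""")

# The result set is labeled with the column names of the first SELECT.
column_names = [d[0].lower() for d in cur.description]
rows = cur.fetchall()

print(column_names)  # ['cust_nbr', 'name']
print(rows)          # sorted by the first column, whichever query it came from
```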
Unlike ORDER BY, we can use GROUP BY and HAVING clauses in component queries.
Component queries are executed from top to bottom. If we want to alter the sequence of execution, we can use parentheses.
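The effect of grouping can be seen with three small tables. This is a sketch under stated assumptions: Python's sqlite3 module stands in for Oracle, the tables `a`, `b` and `c` are hypothetical, EXCEPT replaces MINUS, and a subquery replaces the parentheses because SQLite does not accept parenthesized component queries. Like Oracle, SQLite evaluates an ungrouped compound query from the first component query to the last.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (n INTEGER);
    CREATE TABLE b (n INTEGER);
    CREATE TABLE c (n INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (2), (3);
    INSERT INTO c VALUES (2);
""")

# No grouping: evaluated as (a UNION ALL b) EXCEPT c.
top_to_bottom = sorted(r[0] for r in conn.execute("""
    SELECT n FROM a
    UNION ALL
    SELECT n FROM b
    EXCEPT
    SELECT n FROM c
"""))

# Grouped: a UNION ALL (b EXCEPT c); the subquery stands in for parentheses.
grouped = sorted(r[0] for r in conn.execute("""
    SELECT n FROM a
    UNION ALL
    SELECT n FROM (SELECT n FROM b EXCEPT SELECT n FROM c)
"""))

print(top_to_bottom)  # [1, 3]
print(grouped)        # [1, 2, 3]
```

In the ungrouped form the subtraction is applied last, so the value 2 is removed even though table `a` supplied it; in the grouped form only table `b` is filtered.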