Union 隐藏的优势

本文探讨了在 SQL Server 中使用 UNION 的高效方法,通过创建复合索引来优化查询性能,以及如何避免 OR 运算符导致的慢速执行。

Hidden Power of UNION

SQL Server Pro
InstantDoc ID #37531

I used to think that UNION was useful only when you needed to combine the results of two queries from different sources (e.g., tables, views) into one result set. However, using UNION is sometimes the quickest way to select from just one table.

Suppose you need to retrieve all OrderIDs from the Northwind Orders table where the CustomerID is VICTE or the EmployeeID is 5:

SELECT OrderID FROM Orders
WHERE CustomerID = 'VICTE' OR EmployeeID = 5

The Orders table has indexes on the CustomerID and EmployeeID columns, but SQL Server doesn't use them to execute the SELECT statement, as the following execution plan shows:

|--Clustered Index Scan(OBJECT:
([Northwind].[dbo].[Orders].[PK_Orders]
), WHERE:([Orders].[CustomerID]='VICTE'
OR [Orders].[EmployeeID]=5))

Instead, SQL Server scans the clustered index PK_Orders. SQL Server uses this plan because the OR operator in the query's WHERE clause makes the result satisfy both conditions at the same time, so SQL Server must double-scan the table. Scanning a clustered index in this case is almost the same as scanning the entire table; the server goes though the table record by record and checks for the specified predicate.

To improve the query's efficiency, you could create a composite index on CustomerID and EmployeeID:

CREATE INDEX IdxOrders001 ON Orders
(CustomerID, EmployeeID)

The Index Tuning Wizard advises you to create just such an index to improve performance of the SELECT statement. SQL Server can use the new index to find all the records in which EmployeeID = 5, then scan only the resulting range of records to return the required result. The estimated execution plan for the new query shows that SQL Server uses the composite index:

|--Index Scan(OBJECT:([Northwind]
   .[dbo].[Orders].[IdxOrders001]),
   WHERE:([Orders].[CustomerID]='VICTE' OR
   [Orders].[EmployeeID]=5))

But an employee can make thousands of deals with thousands of customers. So even if you create a composite index, SQL Server will need to scan a range of records in the index and won't perform a seek operation, which uses indexes to retrieve records and is the fastest way for SQL Server to find information.

You need to make SQL Server use a seek operation instead of a scan, but SQL Server won't perform a seek when an OR operator is in the WHERE clause. You can solve this dilemma by using UNION to rewrite the SELECT statement:

SELECT OrderID FROM Orders
WHERE CustomerID = 'VICTE'
UNION
SELECT OrderID FROM Orders
WHERE EmployeeID = 5

SQL Server 2000's estimated execution plan for this statement is

|--Merge Join(Union)
    |--Index Seek(OBJECT:([Northwind]
       .[dbo].[Orders].[CustomersOrders]),
       SEEK:([Orders].[CustomerID]='VICTE'
       ) ORDERED FORWARD)
    |--Index Seek(OBJECT:([Northwind]
       .[dbo].[Orders].[EmployeesOrders]),
       SEEK:([Orders].[EmployeeID]=5)
       ORDERED FORWARD)

This execution plan looks longer than the original one, but both operators are index-seek operators. SQL Server doesn't use the composite index; instead, it uses two single-column indexes. You might think that two seek operations would cost more than one seek, but performance improves when you use this method. You can check performance by using SQL Trace to analyze the three versions of the SELECT statement. For the UNION query, my SQL Server 2000 system performed four reads to return the result. The first SELECT query required 23 reads, and the second SELECT statement, which created the composite index, required 11 reads.

This special use of UNION can help you avoid the OR operator's slow performance, but use it carefully. If you don't have the appropriate indexes (CustomersOrders and EmployeesOrders, in this example), you can double-scan the table.

1
2
3
4
5



这个错误信息: ``` ODPS-0130241:[34,25] Illegal union operation - type mismatch for UNION, left has 24 columns while right has 41 columns ``` 是出现在 **ODPS(Open Data Processing Service,阿里云的MaxCompute)** 中的一个错误,意思是: > 在执行 `UNION` 操作时,左右两个查询结果的列数不一致,左边有 24 列,右边有 41 列,无法进行合并。 --- ## ✅ 解决方案 `UNION` 要求两个查询结果具有**完全相同的列数、列名和数据类型**。要解决这个问题,你需要让左右两边的查询结果列数一致。 ### ✅ 方法一:显式选择相同列(推荐) 你可以在两个 `SELECT` 语句中显式指定相同的列,比如: ```sql SELECT col1, col2, ..., col24 FROM tableA UNION SELECT col1, col2, ..., col24 FROM tableB; ``` 确保两个查询结果的列数量、顺序、类型都一致。 --- ### ✅ 方法二:使用 NULL 填充缺失列 如果右边的表比左边多列,你可以使用 `NULL` 或默认值来“补全”左边的查询列数: ```sql SELECT col1, col2, ..., col24 FROM tableA UNION SELECT col1, col2, ..., col24, NULL AS col25, NULL AS col26, ... FROM tableB; ``` 或者反过来: ```sql SELECT col1, col2, ..., col24, NULL AS col25, NULL AS col26, ... FROM tableA UNION SELECT col1, col2, ..., col41 FROM tableB; ``` 这样可以保证两边列数一致。 --- ### ✅ 方法三:使用视图或子查询对齐字段 你可以先分别构建两个子查询,使它们输出相同结构的数据,再进行 `UNION`: ```sql SELECT * FROM ( SELECT col1, col2, ..., col24 FROM tableA ) t1 UNION SELECT * FROM ( SELECT col1, col2, ..., col24 FROM tableB ) t2; ``` --- ### ✅ 方法四:使用字段别名统一字段名和类型 如果字段名或类型不一致,也可以通过别名和类型转换来统一: ```sql SELECT col1, CAST(col2 AS STRING) AS col2, col3 FROM tableA UNION SELECT col1, CAST(col2 AS STRING) AS col2, col3 FROM tableB; ``` --- ## 🔍 错误原因总结 | 原因 | 说明 | |------|------| | 列数量不一致 | 左右两边查询的列数不同,无法合并 | | 列类型不一致 | 同一列的数据类型不一致,例如一个是 `STRING`,另一个是 `BIGINT` | | 列名不一致 | 列名不同,虽然不影响结果,但可能导致后续处理出错 | --- ## ✅ 最佳实践建议 1. **始终显式列出字段**:不要使用 `SELECT *`,尤其在 `UNION` 时。 2. **统一字段名和类型**:使用别名和类型转换确保一致。 3. **测试子查询结构**:在合并前分别执行左右两边的查询,确认输出结构一致。 --- ##
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值