Indexing NULLs: (Empty Spaces)

最新推荐文章于 2025-06-27 02:20:57 发布

loryliu

最新推荐文章于 2025-06-27 02:20:57 发布

阅读量457

点赞数

分类专栏： oracle 文章标签： null index

oracle 专栏收录该内容

203 篇文章

订阅专栏

本文深入探讨了如何通过索引优化检索包含空值的数据库记录，提出了利用非空列、单字符附加及二进制浮点数替代空值等策略，并通过实例演示了这些方法的应用。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

There have always been issues with NULLs and indexes. The main issue being of course if the indexed columns are all null then the associated row is not indexed.

Generally, this is a good thing. If we have a table with lots of null values for indexed columns, then the associated rows are not indexed resulting in a smaller index structure. Also, very often we’re simply not interested in result sets where the indexed values are null so it’s generally not an issue.

However, what if the number of rows where the values are null are relatively small and what if we want to find all rows where the index column or columns are indeed null. If the column or columns don’t have nulls indexed then a potentially expensive Full Table Scan (FTS) is the CBO’s only option.

The first thing to point out is that nulls are actually indexed, if other columns in the index have a not null value. For example, if we have a concatenated index on columns (A,B), so long as A has a not null value then column B can have an indexed null value and if column B has a not null value then column A can have an indexed null value. Only if both columns A and B contain nulls, will the associated row not be indexed.

If column B has a NOT NULL constraint, then Oracle knows that B can not contain any null values. Therefore, if column A can contain null values, Oracle also knows that each and every null value of A must also be indexed as it’s not possible to have an entirely null indexed entry. Therefore, with an index on (A,B), we can use the index to return every null value for A, providing of course the CBO considers the costs of doing so to be cheaper than a FTS. We can also always of course use the index to return all null values of A for any corresponding not null value of B.

So with concatenated indexes and with at least one not null column, Oracle can guarantee that every null for all the other columns are contained within the index and so could potentially use the index for corresponding IS NULL predicates.

But what if the index has a single column or what if none of the indexes have a NOT NULL constraint, we’re done for, the CBO won’t be able to use the associated index to just retrieve nulls, right ?

Well not quite.

Let’s assume we have an index that consists just of column A and it’s a null column. Let’s also assume there are not too many rows that have a null for A and we have an important query that would dearly love to use an index to retrieve rows based on these null values for column A.

Well one alternative of course as I’ve seen a number of times is to just include a NOT NULL column in the index as well, say (A,B). Yes, we don’t particularly want to include column B in the index but at least by doing so, we ensure all null values for column A are indexed, making A IS NULL predicates viable through an index.

However a somewhat cheaper and less expensive alternative is to just simply append a single character to the index, for example a space (A, ‘ ‘). The space character takes up one byte, the column length in the index takes up an additional byte for a total of 2 bytes overhead per index entry. Yes this will reduce the capacity of a leaf block to contain as many index entries and so potentially increase somewhat the overall size of the index. However, this will also guarantee that the index can not contain all null entries thereby ensuring all other columns have all their null values indexed.

See this demo on Indexing Null Values for examples on how this all works.

Comments»

1. Christian Antognini - January 23, 2008

Hi Richard

For numeric values another possibility is to use BINARY_FLOAT/DOUBLE datatype and insert NAN instead of NULL. Since NAN has a specific binary value, it is indexed as any other value.

Cheers,
Chris

2. Richard Foote - January 23, 2008

Hi Christian

Good thinking, nice idea.

In case anyone is unsure what Christian is suggesting, don’t use null values for (say) binary_float numerics but use a Not A Number (NAN) instead. It’s not a null, is hence indexable and can be used to retrieve a number when it’s not really a number.

SQL> create table nan (id binary_float);

Table created.

SQL> insert into nan values (binary_float_nan);

1 row created.

SQL> commit;

Commit complete.

SQL> select * from nan where id is nan;

ID
———-
Nan

Create an index and all NaNs can be potenitally retrieved via the index.

Thanks for the suggestion Christian :)