Indexing NULLs: (Empty Spaces)

本文深入探讨了如何通过索引优化检索包含空值的数据库记录,提出了利用非空列、单字符附加及二进制浮点数替代空值等策略,并通过实例演示了这些方法的应用。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

There have always been issues with NULLs and indexes. The main issue being of course if the indexed columns are all null then the associated row is not indexed.

Generally, this is a good thing. If we have a table with lots of null values for indexed columns, then the associated rows are not indexed resulting in a smaller index structure. Also, very often we’re simply not interested in result sets where the indexed values are null so it’s generally not an issue.

However, what if the number of rows where the values are null are relatively small and what if we want to find all rows where the index column or columns are indeed null. If the column or columns don’t have nulls indexed then a potentially expensive Full Table Scan (FTS) is the CBO’s only option.

The first thing to point out is that nulls are actually indexed, if other columns in the index have a not null value. For example, if we have a concatenated index on columns (A,B), so long as A has a not null value then column B can have an indexed null value and if column B has a not null value then column A can have an indexed null value. Only if both columns A and B contain nulls, will the associated row not be indexed.

If column B has a NOT NULL constraint, then Oracle knows that B can not contain any null values. Therefore, if column A can contain null values, Oracle also knows that each and every null value of A must also be indexed as it’s not possible to have an entirely null indexed entry. Therefore, with an index on (A,B), we can use the index to return every null value for A, providing of course the CBO considers the costs of doing so to be cheaper than a FTS. We can also always of course use the index to return all null values of A for any corresponding not null value of B.

So with concatenated indexes and with at least one not null column, Oracle can guarantee that every null for all the other columns are contained within the index and so could potentially use the index for corresponding IS NULL predicates.

But what if the index has a single column or what if none of the indexes have a NOT NULL constraint, we’re done for, the CBO won’t be able to use the associated index to just retrieve nulls, right ?

Well not quite.

Let’s assume we have an index that consists just of column A and it’s a null column. Let’s also assume there are not too many rows that have a null for A and we have an important query that would dearly love to use an index to retrieve rows based on these null values for column A.

Well one alternative of course as I’ve seen a number of times is to just include a NOT NULL column in the  index as well, say (A,B). Yes, we don’t particularly want to include column B in the index but at least by doing so, we ensure all null values for column A are indexed, making A IS NULL predicates viable through an index.

However a somewhat cheaper and less expensive alternative is to just simply append a single character to the index, for example a space (A, ‘ ‘). The space character takes up one byte, the column length in the index takes up an additional byte for a total of 2 bytes overhead per index entry. Yes this will reduce the capacity of a leaf block to contain as many index entries and so potentially increase somewhat the overall size of the index. However, this will also guarantee that the index can not contain all null entries thereby ensuring all other columns have all their null values indexed.

 See this demo on Indexing Null Values for examples on how this all works.


Comments»

1. Christian Antognini -  January 23, 2008

Hi Richard

For numeric values another possibility is to use BINARY_FLOAT/DOUBLE datatype and insert NAN instead of NULL. Since NAN has a specific binary value, it is indexed as any other value.

Cheers,
Chris

2. Richard Foote -  January 23, 2008

Hi Christian

Good thinking, nice idea.

In case anyone is unsure what Christian is suggesting, don’t use null values for (say) binary_float numerics but use a Not A Number (NAN) instead. It’s not a null, is hence indexable and can be used to retrieve a number when it’s not really a number.

SQL> create table nan (id binary_float);

Table created.

SQL> insert into nan values (binary_float_nan);

1 row created.

SQL> commit;

Commit complete.

SQL> select * from nan where id is nan;

ID
———-
Nan

Create an index and all NaNs can be potenitally retrieved via the index.

Thanks for the suggestion Christian :)


资源下载链接为: https://pan.quark.cn/s/22ca96b7bd39 在当今的软件开发领域,自动化构建与发布是提升开发效率和项目质量的关键环节。Jenkins Pipeline作为一种强大的自动化工具,能够有效助力Java项目的快速构建、测试及部署。本文将详细介绍如何利用Jenkins Pipeline实现Java项目的自动化构建与发布。 Jenkins Pipeline简介 Jenkins Pipeline是运行在Jenkins上的一套工作流框架,它将原本分散在单个或多个节点上独立运行的任务串联起来,实现复杂流程的编排与可视化。它是Jenkins 2.X的核心特性之一,推动了Jenkins从持续集成(CI)向持续交付(CD)及DevOps的转变。 创建Pipeline项目 要使用Jenkins Pipeline自动化构建发布Java项目,首先需要创建Pipeline项目。具体步骤如下: 登录Jenkins,点击“新建项”,选择“Pipeline”。 输入项目名称和描述,点击“确定”。 在Pipeline脚本中定义项目字典、发版脚本和预发布脚本。 编写Pipeline脚本 Pipeline脚本是Jenkins Pipeline的核心,用于定义自动化构建和发布的流程。以下是一个简单的Pipeline脚本示例: 在上述脚本中,定义了四个阶段:Checkout、Build、Push package和Deploy/Rollback。每个阶段都可以根据实际需求进行配置和调整。 通过Jenkins Pipeline自动化构建发布Java项目,可以显著提升开发效率和项目质量。借助Pipeline,我们能够轻松实现自动化构建、测试和部署,从而提高项目的整体质量和可靠性。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值