Index Scans

最新推荐文章于 2025-11-26 15:49:07 发布

最新推荐文章于 2025-11-26 15:49:07 发布 · 203 阅读

文章标签：

#人工智能 #数据库

本文详细介绍了Oracle数据库中各种类型的索引扫描操作，包括索引扫描的基本原理、全索引扫描、快速全索引扫描、索引范围扫描、索引唯一扫描、索引跳过扫描等，并探讨了索引聚集因子的概念及其对查询性能的影响。

Index Scans

In an index scan, the database retrieves a row by traversing the index, using the indexed column values specified by the statement. If the database scans the index for a value, then it will find this value in n I/Os where n is the height of the B-tree index. This is the basic principle behind Oracle Database indexes.

If a SQL statement accesses only indexed columns, then the database reads values directly from the index rather than from the table. If the statement accesses columns in addition to the indexed columns, then the database uses rowids to find the rows in the table. Typically, the database retrieves table data by alternately reading an index block and then a table block.

Full Index Scan
In a full index scan, the database reads the entire index in order. A full index scan is available if a predicate (WHERE clause) in the SQL statement references a column in the index, and in some circumstances when no predicate is specified. A full scan can eliminate sorting because the data is ordered by index key.
Suppose that an application runs the following query:

SELECT department_id, last_name, salary
FROM employees
WHERE salary > 5000
ORDER BY department_id, last_name;

Also assume that department_id, last_name, and salary are a composite key in an index. Oracle Database performs a full scan of the index, reading it in sorted order (ordered by department ID and last name) and filtering on the salary attribute. In this way, the database scans a set of data smaller than the employees table, which contains more columns than are included in the query, and avoids sorting the data.
For example, the full scan could read the index entries as follows:

50,Atkinson,2800,rowid
60,Austin,4800,rowid
70,Baer,10000,rowid

80,Abel,11000,rowid
80,Ande,6400,rowid
110,Austin,7200,rowid
.
.
.

Fast Full Index Scan
A fast full index scan is a full index scan in which the database accesses the data in the index itself without accessing the table, and the database reads the index blocks in no particular order.
Fast full index scans are an alternative to a full table scan when both of the following conditions are met:
■The index must contain all columns needed for the query.
■A row containing all nulls must not appear in the query result set. For this result to be guaranteed, at least one column in the index must have either:
–A NOT NULL constraint
–A predicate applied to it that prevents nulls from being considered in the query result set
For example, an application issues the following query, which does not include an ORDER BY clause:

SELECT last_name, salary
FROM employees;

The last_name column has a not null constraint. If the last name and salary are a composite key in an index, then a fast full index scan can read the index entries to obtain the requested information:
Baida,2900,rowid
Zlotkey,10500,rowid
Austin,7200,rowid
Baer,10000,rowid
Atkinson,2800,rowid
Austin,4800,rowid
.
.
.

Index Range Scan
An index range scan is an ordered scan of an index that has the following characteristics:
■One or more leading columns of an index are specified in conditions. A condition specifies a combination of one or more expressions and logical (Boolean) operators and returns a value of TRUE, FALSE, or UNKNOWN.
■0, 1, or more values are possible for an index key.
The database commonly uses an index range scan to access selective data. The selectivity is the percentage of rows in the table that the query selects, with 0 meaning no rows and 1 meaning all rows. Selectivity is tied to a query predicate, such as WHERE last_name LIKE 'A%', or a combination of predicates. A predicate becomes more selective as the value approaches 0 and less selective (or more unselective) as the value approaches 1.
For example, a user queries employees whose last names begin with A. Assume that the last_name column is indexed, with entries as follows:

Abel,rowid
Ande,rowid
Atkinson,rowid
Austin,rowid
Austin,rowid
Baer,rowid
.
.
.
The database could use a range scan because the last_name column is specified in the predicate and multiples rowids are possible for each index key. For example, two employees are named Austin, so two rowids are associated with the key Austin.
An index range scan can be bounded on both sides, as in a query for departments with IDs between 10 and 40, or bounded on only one side, as in a query for IDs over 40. To scan the index, the database moves backward or forward through the leaf blocks. For example, a scan for IDs between 10 and 40 locates the first index leaf block that contains the lowest key value that is 10 or greater. The scan then proceeds horizontally through the linked list of leaf nodes until it locates a value greater than 40.

Index Unique Scan
In contrast to an index range scan, an index unique scan must have either 0 or 1 rowid associated with an index key. The database performs a unique scan when a predicate references all of the columns in a UNIQUE index key using an equality operator. An index unique scan stops processing as soon as it finds the first record because no second record is possible.
As an illustration, suppose that a user runs the following query:

SELECT *
FROM employees
WHERE employee_id = 5;

Assume that the employee_id column is the primary key and is indexed with entries as follows:
1,rowid
2,rowid
4,rowid
5,rowid
6,rowid
.
.
.

In this case, the database can use an index unique scan to locate the rowid for the employee whose ID is 5.

Index Skip Scan
An index skip scan uses logical subindexes of a composite index. The database "skips" through a single index as if it were searching separate indexes. Skip scanning is beneficial if there are few distinct values in the leading column of a composite index and many distinct values in the nonleading key of the index.
The database may choose an index skip scan when the leading column of the composite index is not specified in a query predicate. For example, assume that you run the following query for a customer in the sh.customers table:

SELECT * FROM sh.customers WHERE cust_email = 'Abbey@company.com';

The customers table has a column cust_gender whose values are either M or F. Assume that a composite index exists on the columns (cust_gender, cust_email). Example 3–1 shows a portion of the index entries.

The database can use a skip scan of this index even though cust_gender is not specified in the WHERE clause.
In a skip scan, the number of logical subindexes is determined by the number of distinct values in the leading column. In Example 3–1, the leading column has two possible values. The database logically splits the index into one subindex with the key F and a second subindex with the key M.
When searching for the record for the customer whose email is Abbey@company.com, the database searches the subindex with the value F first and then searches the subindex with the value M. Conceptually, the database processes the query as follows:

SELECT * FROM sh.customers WHERE cust_gender = 'F'
AND cust_email = 'Abbey@company.com'
UNION ALL
SELECT * FROM sh.customers WHERE cust_gender = 'M'
AND cust_email = 'Abbey@company.com';

Index Clustering Factor
The index clustering factor measures row order in relation to an indexed value such as employee last name. The more order that exists in row storage for this value, the lower the clustering factor.
The clustering factor is useful as a rough measure of the number of I/Os required to read an entire table by means of an index:
■If the clustering factor is high, then Oracle Database performs a relatively high number of I/Os during a large index range scan. The index entries point to random table blocks, so the database may have to read and reread the same blocks over and over again to retrieve the data pointed to by the index.
■If the clustering factor is low, then Oracle Database performs a relatively low number of I/Os during a large index range scan. The index keys in a range tend to point to the same data block, so the database does not have to read and reread the same blocks over and over.
The clustering factor is relevant for index scans because it can show:
■Whether the database will use an index for large range scans
■The degree of table organization in relation to the index key
■Whether you should consider using an index-organized table, partitioning, or table cluster if rows must be ordered by the index key

For example, assume that the employees table fits into two data blocks. Table 3–1 depicts the rows in the two data blocks (the ellipses indicate data that is not shown).

Rows are stored in the blocks in order of last name (shown in bold). For example, the bottom row in data block 1 describes Abel, the next row up describes Ande, and so on alphabetically until the top row in block 1 for Steven King. The bottom row in block 2 describes Kochar, the next row up describes Kumar, and so on alphabetically until the last row in the block for Zlotkey.
Assume that an index exists on the last name column. Each name entry corresponds to a rowid. Conceptually, the index entries would look as follows:
Abel,block1row1
Ande,block1row2
Atkinson,block1row3
Austin,block1row4
Baer,block1row5
.
.
.
Assume that a separate index exists on the employee ID column. Conceptually, the index entries might look as follows, with employee IDs distributed in almost random locations throughout the two blocks:

100,block1row50
101,block2row1
102,block1row9
103,block2row19
104,block2row39
105,block1row4
.
.
.
Example 3–2 queries the ALL_INDEXES view for the clustering factor for these two indexes. The clustering factor for EMP_NAME_IX is low, which means that adjacent index entries in a single leaf block tend to point to rows in the same data blocks. The clustering factor for EMP_EMP_ID_PK is high, which means that adjacent index entries in the same leaf block are much less likely to point to rows in the same data blocks.

Example 3–2 Clustering Factor
SQL> SELECT INDEX_NAME, CLUSTERING_FACTOR
 FROM ALL_INDEXES  WHERE INDEX_NAME IN ('EMP_NAME_IX','EMP_EMP_ID_PK');
INDEX_NAME CLUSTERING_FACTOR
-------------------- -----------------
EMP_EMP_ID_PK 19
EMP_NAME_IX 2