- 索引类别:
到10GR2为止,共有四种全文搜索类型:
| Index Type | Description | Supported Preferences and Parameters | Query Operator | Notes |
|---|---|---|---|---|
CONTEXT | Use this index to build a text retrieval application when your text consists of large coherent documents. You can index documents of different formats such as MS Word, HTML or plain text. With a context index, you can customize your index in a variety of ways. This index type requires | All CREATE INDEX preferences and parameters supported except for INDEX SET. These supported parameters include the index partition clause, and the format, charset, and language columns. | CONTAINS Grammar is called the The | Supports all documents services and query services. Supports indexing of partitioned text tables. |
CTXCAT | Use this index type for better mixed query performance. Typically, with this index type, you index small documents or text fragments. Other columns in the base table, such as item names, prices and descriptions can be included in the index to improve mixed query performance. This index type is transactional, automatically updating itself after DML to base table. No | INDEX SET
Format, charset, and language columns not supported. Table and index partitioning not supported. | CATSEARCH Grammar is called The Theme querying is supported. | This index is larger and takes longer to build than a CONTEXT index. The size of a The |
CTXRULE | Use CTXRULE index to build a document classification or routing application. The CTXRULE index is an index created on a table of queries, where the queries define the classification or routing criteria. | See "CTXRULE Parameters and Limitations". | MATCHES | Single documents (plain text, HTML, or XML) can be classified using the MATCHES operator, which turns a document into a set of queries and finds the matching rows in the CTXRULE index. |
CTXPATH | Create this index when you need to speed up existsNode() queries on an XMLType column. | STORAGE | Use with existsNode() | Can only create this index on XMLType column. Although this index type can be helpful for |
- 创建语法
CREATE INDEX [schema.]index ON [schema.]table(column) INDEXTYPE IS
ctxsys.context [ONLINE]
[LOCAL [(PARTITION [partition] [PARAMETERS('paramstring')]
[, PARTITION [partition] [PARAMETERS('paramstring')]])]
[PARAMETERS(paramstring)] [PARALLEL n] [UNUSABLE]];
这里的context还可以换成ctxcat、ctxrule、ctxpath几个类型,支持分区索引.这里很重要的部分
是PARAMETERS部分,定义了索引的各种创建条件.PARAMETERS(paramstring)
The syntax for paramstring is as follows:
paramstring =
'[DATASTORE datastore_pref] 定义要所索引的数据来自哪里, [FILTER filter_pref] 定义过滤器, [CHARSET COLUMN charset_column_name] 当选用显示指定
一个过滤器需要的列 [FORMAT COLUMN format_column_name] 过滤器需要格式化的列 [LEXER lexer_pref] 定义一个此法分析器 [LANGUAGE COLUMN language_column_name] 多语言环境中,显示指定lexer分析的多语言列 [WORDLIST wordlist_pref] 定义一个常用搜索词表 [STORAGE storage_pref] 定义索引的存储参数 [STOPLIST stoplist] 定义不需要index的词组列表 [SECTION GROUP section_group] 定义一个索引区域(通常用在索引html、xml的环境) [MEMORY memsize] 分配索引创建需要的内存空间(in 10G) [POPULATE | NOPOPULATE] 是否创建一个空的索引 [[METADATA] SYNC (MANUAL | EVERY "interval-string" | ON COMMIT)] 定义索引同步时间(in 10G) [TRANSACTIONAL]' 是否开启事务更新(in 10G)
索引创建
[oracle@ique ~]$ sqlplus mayp/mayp
SQL*Plus : Release 10.2.0.4.0 - Production on Thu Aug 21 19:45:43 2008
Copyright (c) 1982, 2007, Oracle. All Rights Reserved.
Connected to:Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - ProductionWith the Partitioning, OLAP, Data Mining and Real Application Testing option
SQL> create table oratext(id number,name varchar2(30),address varchar2(200))
pctfree 10 tablespace assm/ 2
Table created.
SQL> alter table oratext add constraint pk_oratext primary key(id) using index/ 2
Table altered.
SQL> exec ctx_ddl.create_preference('mayp_storage','basic_storage');Elapsed: 00:00:00.22
SQL> exec ctx_ddl.set_attribute('mayp_storage','I_TABLE_CLAUSE','TABLESPACE TEST');Elapsed: 00:00:00.11SQL> exec ctx_ddl.set_attribute('mayp_storage','K_TABLE_CLAUSE','TABLESPACE TEST');Elapsed: 00:00:00.04SQL> exec ctx_ddl.set_attribute('mayp_storage','R_TABLE_CLAUSE','TABLESPACE TEST');Elapsed: 00:00:00.06
SQL> set feedback onSQL> exec ctx_ddl.set_attribute('mayp_storage','N_TABLE_CLAUSE','TABLESPACE TEST');
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.01SQL> exec ctx_ddl.set_attribute('mayp_storage','I_INDEX_CLAUSE','TABLESPACE TEST COMPRESS 2');
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.01SQL> exec ctx_ddl.set_attribute('mayp_storage','P_TABLE_CLAUSE','TABLESPACE TEST');
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.04SQL> exec ctx_ddl.create_preference('mayp_lexer','chinese_lexer')
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.02SQL> exec ctx_ddl.create_preference('mayp_datastore','direct_datastore');
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.06
SQL> create index oratext_address on oratext(address) indextype is ctxsys.context
parameters('lexer mayp_lexer datastore mayp_datastore storage mayp_storage')/ 2
Index created.
Elapsed: 00:00:04.48
这里我重新定了了三个索引的参数:mayp_datastore(要索引的数据的存储方式),mayp_storage(索引的存储位置),
mayp_lexer(词法分割器).在索引创建完之后,会发现多了以dr$index_name开头的四张表和
一个dr$索引和oratext_address索引:
SQL> select table_name,tablespace_name from user_tables;
Tab.Name TABLESPACE_NAME------------------------------ ------------------------------DR$ORATEXT_ADDRESS$I TESTDR$ORATEXT_ADDRESS$R TESTSYS_IOT_OVER_45158 MSSMSYS_IOT_MAP_45158 MSSMORATEXT ASSMDR$ORATEXT_ADDRESS$NDR$ORATEXT_ADDRESS$K
SQL> select index_name,tablespace_name from user_indexes;
Ind.Name TABLESPACE_NAME------------------------------ ------------------------------SYS_IL0000011906C00002$$ TESTPK_P TESTSYS_IL0000011907C00002$$ TESTDR$ORATEXT_ADDRESS$X TESTSYS_IL0000046825C00006$$ TESTSYS_IL0000046830C00002$$ TESTORATEXT_ADDRESSSYS_IL0000046726C00006$$ TESTSYS_IL0000046731C00002$$ TEST
其中DR$ORATEXT_ADDRESS$I这张表最关键,它存储被lexer分割后的token,也就是我们可以利用搜索的关键字 .SQL> select token_text from dr$oratext_address$i;
TOKEN_TEXT----------------------------------------------------------------1128工业海街号街南路南苏桐苏州市桐路星海园区
13 rows selected.
Elapsed: 00:00:00.01
通过一个查询来看看:
SQL> select name,address from oratext where contains(address,'园区')>0;
NAME ADDRESS------------------------------ --------------------------------------------------王皓 苏州市工业园区星海街南11号司马格达 苏州市工业园区苏桐路28号张磊 苏州市工业园区苏桐路28号
3 rows selected.
Elapsed: 00:00:01.49
SQL> select name,address from oratext where contains(address,'苏桐路')>0;
NAME ADDRESS------------------------------ --------------------------------------------------司马格达 苏州市工业园区苏桐路28号张磊 苏州市工业园区苏桐路28号
2 rows selected.
Elapsed: 00:00:00.07
SQL> select name,address from oratext where contains(address,'星海街')>0;
NAME ADDRESS------------------------------ --------------------------------------------------王皓 苏州市工业园区星海街南11号
1 row selected.
索引维护
ctxsys.context全文索引在发生DML语句后,不同自动同步索引,必须通过ctx_ddl.sync_index同步,
可以定义一个job进行同步,
SQL> insert into oratext values(4,'马成','苏州市观前街7号');
1 row created.
Elapsed: 00:00:00.08SQL> commit;
Commit complete.
Elapsed: 00:00:00.00
SQL> select name,address from oratext where contains(address,'观前街')>0;
no rows selected
Elapsed: 00:00:00.09
SQL> exec ctx_ddl.sync_index('oratext_address','2m');
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.18
SQL> select name,address from oratext where contains(address,'观前街')>0;
NAME ADDRESS------------------------------ --------------------------------------------------马成 苏州市观前街7号
1 row selected.
Elapsed: 00:00:00.02
在索引的同步过程中,可能会产生碎片,我们可以定时通过ctx_ddl.optimize_index维护索引.
ctx_report package可以报告全文索引的一些统计信息:
ctx_report.index_size:报告索引的大小
Elapsed: 00:00:00.00SQL> select ctx_report.index_size('oratext_address') from dual;
CTX_REPORT.INDEX_SIZE('ORATEXT_ADDRESS')--------------------------------------------------------------------------------=========================================================================== INDEX SIZE FOR MAYP.ORATEXT_ADDRESS===========================================================================TABLE: MAYP.DR$ORATEXT_ADDRESS$ITABLESPACE NAME: TESTBLOCKS ALLOCATED: 8BLOCKS USED: 3BYTES ALLOCATED: 65,536 (64.00 KB)BYTES USED: 24,576 (24.00 KB)
LOB SEGMENT: MAYP.SYS_LOB0000046825C00006$$TABLE NAME: MAYP.DR$ORATEXT_ADDRESS$ILOB COLUMN: TOKEN_INFOTABLESPACE NAME: TESTBLOCKS ALLOCATED: 8BLOCKS USED: 2BYTES ALLOCATED: 65,536 (64.00 KB)BYTES USED: 16,384 (16.00 KB)
INDEX (NORMAL): MAYP.DR$ORATEXT_ADDRESS$XTABLE NAME: MAYP.DR$ORATEXT_ADDRESS$ITABLESPACE NAME: TEST.........................
TOTALS FOR INDEX MAYP.ORATEXT_ADDRESS---------------------------------------------------------------------------TOTAL BLOCKS ALLOCATED: 72TOTAL BLOCKS USED: 19TOTAL BYTES ALLOCATED: 589,824 (576.00 KB)TOTAL BYTES USED: 155,648 (152.00 KB)
ctx_report.create_index_script:创建索引定义脚本
SQL> select ctx_report.create_index_script('oratext_address') from dual;
CTX_REPORT.CREATE_INDEX_SCRIPT('ORATEXT_ADDRESS')--------------------------------------------------------------------------------begin ctx_ddl.create_preference('"ORATEXT_ADDRESS_DST"','DIRECT_DATASTORE');end;/
begin ctx_ddl.create_preference('"ORATEXT_ADDRESS_FIL"','NULL_FILTER');end;..........................
create index "MAYP"."ORATEXT_ADDRESS" on "MAYP"."ORATEXT" ("ADDRESS") indextype is ctxsys.context parameters(' datastore "ORATEXT_ADDRESS_DST" filter "ORATEXT_ADDRESS_FIL" section group "ORATEXT_ADDRESS_SGP" lexer "ORATEXT_ADDRESS_LEX" wordlist "ORATEXT_ADDRESS_WDL" stoplist "ORATEXT_ADDRESS_SPL" storage "ORATEXT_ADDRESS_STO" ')/
begin ctx_output.end_log;end;/1 row selected.
Elapsed: 00:00:00.23
ctx_report.index_stats:报告索引的内部结构,已经碎片的程度
SQL> declare l_index_stats clob; begin ctx_report.index_stats(index_name=>'oratext_address',report=>l_index_stats); dbms_output.put_line(l_index_stats); end; / 2 3 4 5 6 7 =========================================================================== STATISTICS FOR"MAYP"."ORATEXT_ADDRESS"===========================================================================
indexed documents: 3allocated docids:4$I rows: 17
--------------------------------------------------------------------------- TOKENSTATISTICS---------------------------------------------------------------------------
unique tokens: 17average $I rows per token:1.00tokens with most $I rows: 园区 (0:TEXT) 1 星海 (0:TEXT) 1 桐路 (0:TEXT)1 苏州市 (0:TEXT) 1 苏桐 (0:TEXT) 1 前街 (0:TEXT)1
...........
--------------------------------------------------------------------------- FRAGMENTATIONSTATISTICS---------------------------------------------------------------------------
total size of $I data: 93
$I rows:17estimated $I rows if optimal: 17estimated row fragmentation: 0 %
garbage docids:1estimated garbage size: 19
most fragmented tokens: 园区 (0:TEXT) 0 % 星海 (0:TEXT)0 % 桐路 (0:TEXT) 0 % 苏州市 (0:TEXT) 0 % 苏桐 (0:TEXT)0 % 前街 (0:TEXT) 0 % 南 (0:TEXT) 0 % 路 (0:TEXT)0 % 街南 (0:TEXT) 0 % 街 (0:TEXT) 0 % 号 (0:TEXT)0 % 海街 (0:TEXT) 0 % 观前 (0:TEXT) 0 % 工业 (0:TEXT)0 % 7 (0:TEXT) 0 % 28 (0:TEXT) 0 % 11 (0:TEXT)0 %
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.03这个索引没有碎片.
--结束
参考文档:
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/82387/viewspace-1017150/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/82387/viewspace-1017150/
Oracle全文索引详解
本文详细介绍了Oracle数据库中全文索引的创建与维护方法,包括索引的不同类型、创建语法及参数配置,并提供了实例演示如何使用全文索引进行高效文本搜索。
103

被折叠的 条评论
为什么被折叠?



