ZT oracle全文索引

Oracle全文索引详解
本文详细介绍了Oracle数据库中全文索引的创建与维护方法,包括索引的不同类型、创建语法及参数配置,并提供了实例演示如何使用全文索引进行高效文本搜索。
全文索引通常用在文本或网页搜索中,比如类似对 "大学生活真好!" 或者一个文件进行索引,查询方法跟使用平常的SELECT的区别是,需要在WHERE条件中使用一个特定的操作符,全文索引分几个类别,分别使用于不同的搜索环境,不同的类别使用不同的操作符进行搜索.没有全文索引时,这类查询通常用下面的sql查询:
SELECT id,doc FROM table WHERE doc like ‘%国家%’;
[@more@]
全文索引通常用在文本或网页搜索中,比如类似对 "大学生活真好!" 或者一个文件进行索引,查询方法跟使用平常的SELECT的区别是,需要在WHERE条件中使用一个特定的操作符,全文索引分几个类别,分别使用于不同的搜索环境,不同的类别使用不同的操作符进行搜索.没有全文索引时,这类查询通常用下面的sql查询:
SELECT id,doc FROM table WHERE doc like ‘%国家%’;
这些查询能够很好search我们想要的,但随着数据库的增长,查询速度是不可忍受的,数据库只能通过full table scan来完成这项工作.有了全文索引后,能够很好的解决这类问题,oracle会采用一种类似普通索引的技术来索引文档.其实在内部oracle利用了特殊的词法(lexer)分割器,词法分割器跟语言有关系,比如:english、日语、汉字等都有不同的分割器.分割器将文本根据平常的使用习惯分割成最小的词组单元进行存储,oracle 称他为token.比如"大学生活真好!" oracke可能会分割成"大学生"、"生活"、“真好"、"生活真好"等小单元进行存储,所以创建后的全文索引的容量是表的好多倍.索引的内容可以来自表中的某个列或者多个列或子表的列或者某个文件或某个网页等.
  • 索引类别:

到10GR2为止,共有四种全文搜索类型:

Index Type Description Supported Preferences and Parameters Query Operator Notes
CONTEXT Use this index to build a text retrieval application when your text consists of large coherent documents. You can index documents of different formats such as MS Word, HTML or plain text.

With a context index, you can customize your index in a variety of ways.

This index type requires CTX_DDL.SYNC_INDEX after DML on base table.

All CREATE INDEX preferences and parameters supported except for INDEX SET.

These supported parameters include the index partition clause, and the format, charset, and language columns.

CONTAINS

Grammar is called the CONTEXT grammar, which supports a rich set of operations.

The CTXCAT grammar can be used with query templating.

Supports all documents services and query services.

Supports indexing of partitioned text tables.

CTXCAT Use this index type for better mixed query performance. Typically, with this index type, you index small documents or text fragments. Other columns in the base table, such as item names, prices and descriptions can be included in the index to improve mixed query performance.

This index type is transactional, automatically updating itself after DML to base table. No CTX_DDL.SYNC_INDEX is necessary.

INDEX SET

LEXER

STOPLIST

STORAGE

WORDLIST (only prefix_index attribute supported for Japanese data)

Format, charset, and language columns not supported.

Table and index partitioning not supported.

CATSEARCH

Grammar is called CTXCAT, which supports logical operations, phrase queries, and wildcarding.

The CONTEXT grammar can be used with query templating.

Theme querying is supported.

This index is larger and takes longer to build than a CONTEXT index.

The size of a CTXCAT index is related to the total amount of text to be indexed, number of indexes in the index set, and number of columns indexed. Carefully consider your queries and your resources before adding indexes to the index set.

The CTXCAT index does not support table and index partitioning, documents services (highlighting, markup, themes, and gists) or query services (explain, query feedback, and browse words.)

CTXRULE Use CTXRULE index to build a document classification or routing application. The CTXRULE index is an index created on a table of queries, where the queries define the classification or routing criteria. See "CTXRULE Parameters and Limitations". MATCHES Single documents (plain text, HTML, or XML) can be classified using the MATCHES operator, which turns a document into a set of queries and finds the matching rows in the CTXRULE index.
CTXPATH Create this index when you need to speed up existsNode() queries on an XMLType column. STORAGE Use with existsNode() Can only create this index on XMLType column.

Although this index type can be helpful for existsNode() queries, it is not required for XML searching. See "XML Searching"

  • 创建语法

CREATE INDEX [schema.]index ON [schema.]table(column) INDEXTYPE IS
ctxsys.context [ONLINE]
[LOCAL [(PARTITION [partition] [PARAMETERS('paramstring')]
[, PARTITION [partition] [PARAMETERS('paramstring')]])]
[PARAMETERS(paramstring)] [PARALLEL n] [UNUSABLE]];
这里的context还可以换成ctxcat、ctxrule、ctxpath几个类型,支持分区索引.这里很重要的部分
是PARAMETERS部分,定义了索引的各种创建条件.

 PARAMETERS(paramstring)  
 

The syntax for paramstring is as follows:

paramstring =
'[DATASTORE datastore_pref]  定义要所索引的数据来自哪里,
 [FILTER filter_pref] 定义过滤器,
 [CHARSET COLUMN charset_column_name]  当选用显示指定
                                   一个过滤器需要的列
 [FORMAT COLUMN format_column_name] 过滤器需要格式化的列

 [LEXER lexer_pref] 定义一个此法分析器
 [LANGUAGE COLUMN language_column_name]  多语言环境中,显示指定lexer分析的多语言列

 [WORDLIST wordlist_pref]  定义一个常用搜索词表
 [STORAGE storage_pref]  定义索引的存储参数
 [STOPLIST stoplist] 定义不需要index的词组列表
 [SECTION GROUP section_group] 定义一个索引区域(通常用在索引html、xml的环境)
 [MEMORY memsize] 分配索引创建需要的内存空间(in 10G)
 [POPULATE | NOPOPULATE] 是否创建一个空的索引
 [[METADATA] SYNC (MANUAL | EVERY "interval-string" | ON COMMIT)] 定义索引同步时间(in 10G)
 [TRANSACTIONAL]' 是否开启事务更新(in 10G)
  • 索引创建      
  •  [oracle@ique ~]$ sqlplus mayp/mayp
SQL*Plus : Release 10.2.0.4.0 - Production on Thu Aug 21 19:45:43 2008
Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.
Connected to:Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - ProductionWith the Partitioning, OLAP, Data Mining and Real Application Testing option
SQL> create table oratext(id number,name varchar2(30),address varchar2(200)) 
pctfree 10 tablespace assm/  2  
Table created.
SQL> alter table oratext add constraint pk_oratext primary key(id) using index/  2  
Table altered.
SQL> exec ctx_ddl.create_preference('mayp_storage','basic_storage');Elapsed: 00:00:00.22
SQL> exec ctx_ddl.set_attribute('mayp_storage','I_TABLE_CLAUSE','TABLESPACE TEST');Elapsed: 00:00:00.11SQL> exec ctx_ddl.set_attribute('mayp_storage','K_TABLE_CLAUSE','TABLESPACE TEST');Elapsed: 00:00:00.04SQL> exec ctx_ddl.set_attribute('mayp_storage','R_TABLE_CLAUSE','TABLESPACE TEST');Elapsed: 00:00:00.06
SQL> set feedback onSQL> exec ctx_ddl.set_attribute('mayp_storage','N_TABLE_CLAUSE','TABLESPACE TEST');
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.01SQL> exec ctx_ddl.set_attribute('mayp_storage','I_INDEX_CLAUSE','TABLESPACE TEST COMPRESS 2');
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.01SQL> exec ctx_ddl.set_attribute('mayp_storage','P_TABLE_CLAUSE','TABLESPACE TEST');
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.04SQL> exec ctx_ddl.create_preference('mayp_lexer','chinese_lexer')
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.02SQL> exec ctx_ddl.create_preference('mayp_datastore','direct_datastore');
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.06
SQL> create index oratext_address on oratext(address) indextype is ctxsys.context 
parameters('lexer mayp_lexer datastore mayp_datastore storage mayp_storage')/  2  
Index created.
Elapsed: 00:00:04.48
这里我重新定了了三个索引的参数:mayp_datastore(要索引的数据的存储方式),mayp_storage(索引的存储位置),
mayp_lexer(词法分割器).在索引创建完之后,会发现多了以dr$index_name开头的四张表和
一个dr$索引和oratext_address索引:
SQL> select table_name,tablespace_name from user_tables;
Tab.Name                       TABLESPACE_NAME------------------------------ ------------------------------DR$ORATEXT_ADDRESS$I           TESTDR$ORATEXT_ADDRESS$R           TESTSYS_IOT_OVER_45158             MSSMSYS_IOT_MAP_45158              MSSMORATEXT                        ASSMDR$ORATEXT_ADDRESS$NDR$ORATEXT_ADDRESS$K
SQL> select index_name,tablespace_name from user_indexes;
Ind.Name                       TABLESPACE_NAME------------------------------ ------------------------------SYS_IL0000011906C00002$$       TESTPK_P                           TESTSYS_IL0000011907C00002$$       TESTDR$ORATEXT_ADDRESS$X           TESTSYS_IL0000046825C00006$$       TESTSYS_IL0000046830C00002$$       TESTORATEXT_ADDRESSSYS_IL0000046726C00006$$       TESTSYS_IL0000046731C00002$$       TEST
其中DR$ORATEXT_ADDRESS$I这张表最关键,它存储被lexer分割后的token,也就是我们可以利用搜索的关键字 .
SQL> select token_text from dr$oratext_address$i;
TOKEN_TEXT----------------------------------------------------------------1128工业海街号街南路南苏桐苏州市桐路星海园区
13 rows selected.
Elapsed: 00:00:00.01
通过一个查询来看看:
SQL> select name,address from  oratext where contains(address,'园区')>0;     
NAME                           ADDRESS------------------------------ --------------------------------------------------王皓                           苏州市工业园区星海街南11号司马格达                       苏州市工业园区苏桐路28号张磊                           苏州市工业园区苏桐路28号
3 rows selected.
Elapsed: 00:00:01.49
SQL> select name,address from  oratext where contains(address,'苏桐路')>0;
NAME                           ADDRESS------------------------------ --------------------------------------------------司马格达                       苏州市工业园区苏桐路28号张磊                           苏州市工业园区苏桐路28号
2 rows selected.
Elapsed: 00:00:00.07
SQL> select name,address from  oratext where contains(address,'星海街')>0;
NAME                           ADDRESS------------------------------ --------------------------------------------------王皓                           苏州市工业园区星海街南11号
1 row selected.
  • 索引维护
ctxsys.context全文索引在发生DML语句后,不同自动同步索引,必须通过ctx_ddl.sync_index同步,
可以定义一个job进行同步,
SQL> insert into oratext values(4,'马成','苏州市观前街7号');
1 row created.
Elapsed: 00:00:00.08SQL> commit;
Commit complete.
Elapsed: 00:00:00.00
SQL> select name,address from oratext where contains(address,'观前街')>0;
no rows selected
Elapsed: 00:00:00.09
SQL> exec ctx_ddl.sync_index('oratext_address','2m');
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.18
SQL> select name,address from oratext where contains(address,'观前街')>0;
NAME                           ADDRESS------------------------------ --------------------------------------------------马成                           苏州市观前街7号
1 row selected.
Elapsed: 00:00:00.02
在索引的同步过程中,可能会产生碎片,我们可以定时通过ctx_ddl.optimize_index维护索引.
ctx_report package可以报告全文索引的一些统计信息:
ctx_report.index_size:报告索引的大小
Elapsed: 00:00:00.00SQL> select ctx_report.index_size('oratext_address') from dual;
CTX_REPORT.INDEX_SIZE('ORATEXT_ADDRESS')--------------------------------------------------------------------------------===========================================================================                    INDEX SIZE FOR MAYP.ORATEXT_ADDRESS===========================================================================TABLE:                               MAYP.DR$ORATEXT_ADDRESS$ITABLESPACE NAME:                     TESTBLOCKS ALLOCATED:                                                        8BLOCKS USED:                                                             3BYTES ALLOCATED:                                         65,536 (64.00 KB)BYTES USED:                                              24,576 (24.00 KB)
LOB SEGMENT:                         MAYP.SYS_LOB0000046825C00006$$TABLE NAME:                          MAYP.DR$ORATEXT_ADDRESS$ILOB COLUMN:                          TOKEN_INFOTABLESPACE NAME:                     TESTBLOCKS ALLOCATED:                                                        8BLOCKS USED:                                                             2BYTES ALLOCATED:                                         65,536 (64.00 KB)BYTES USED:                                              16,384 (16.00 KB)
INDEX (NORMAL):                      MAYP.DR$ORATEXT_ADDRESS$XTABLE NAME:                          MAYP.DR$ORATEXT_ADDRESS$ITABLESPACE NAME:                     TEST.........................
TOTALS FOR INDEX MAYP.ORATEXT_ADDRESS---------------------------------------------------------------------------TOTAL BLOCKS ALLOCATED:                                                 72TOTAL BLOCKS USED:                                                      19TOTAL BYTES ALLOCATED:                                 589,824 (576.00 KB)TOTAL BYTES USED:                                      155,648 (152.00 KB)
ctx_report.create_index_script:创建索引定义脚本
SQL> select ctx_report.create_index_script('oratext_address') from dual;
CTX_REPORT.CREATE_INDEX_SCRIPT('ORATEXT_ADDRESS')--------------------------------------------------------------------------------begin  ctx_ddl.create_preference('"ORATEXT_ADDRESS_DST"','DIRECT_DATASTORE');end;/
begin  ctx_ddl.create_preference('"ORATEXT_ADDRESS_FIL"','NULL_FILTER');end;..........................
create index "MAYP"."ORATEXT_ADDRESS"  on "MAYP"."ORATEXT"      ("ADDRESS")  indextype is ctxsys.context  parameters('    datastore       "ORATEXT_ADDRESS_DST"    filter          "ORATEXT_ADDRESS_FIL"    section group   "ORATEXT_ADDRESS_SGP"    lexer           "ORATEXT_ADDRESS_LEX"    wordlist        "ORATEXT_ADDRESS_WDL"    stoplist        "ORATEXT_ADDRESS_SPL"    storage         "ORATEXT_ADDRESS_STO"  ')/
begin  ctx_output.end_log;end;/1 row selected.
Elapsed: 00:00:00.23
ctx_report.index_stats:报告索引的内部结构,已经碎片的程度
SQL> declare    l_index_stats clob;    begin     ctx_report.index_stats(index_name=>'oratext_address',report=>l_index_stats);     dbms_output.put_line(l_index_stats);  end;   /  2    3    4    5    6    7  ===========================================================================                  STATISTICS FOR"MAYP"."ORATEXT_ADDRESS"===========================================================================
indexed documents:                                                      3allocated docids:4$I rows:                                                               17
---------------------------------------------------------------------------                             TOKENSTATISTICS---------------------------------------------------------------------------
unique tokens:                                                         17average $I rows per token:1.00tokens with most $I rows:  园区 (0:TEXT)                                                         1  星海 (0:TEXT)                                                         1  桐路 (0:TEXT)1  苏州市 (0:TEXT)                                                       1  苏桐 (0:TEXT)                                                         1  前街 (0:TEXT)1
...........
---------------------------------------------------------------------------                         FRAGMENTATIONSTATISTICS---------------------------------------------------------------------------
total size of $I data:                                                 93
$I rows:17estimated $I rows if optimal:                                          17estimated row fragmentation:                                          0 %
garbage docids:1estimated garbage size:                                                19
most fragmented tokens:  园区 (0:TEXT)                                                       0 %  星海 (0:TEXT)0 %  桐路 (0:TEXT)                                                       0 %  苏州市 (0:TEXT)                                                     0 %  苏桐 (0:TEXT)0 %  前街 (0:TEXT)                                                       0 %  南 (0:TEXT)                                                         0 %  路 (0:TEXT)0 %  街南 (0:TEXT)                                                       0 %  街 (0:TEXT)                                                         0 %  号 (0:TEXT)0 %  海街 (0:TEXT)                                                       0 %  观前 (0:TEXT)                                                       0 %  工业 (0:TEXT)0 %  7 (0:TEXT)                                                          0 %  28 (0:TEXT)                                                         0 %  11 (0:TEXT)0 %
PL/SQL procedure successfully completed.
Elapsed: 00:00:00.03这个索引没有碎片.
--结束
参考文档:

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/82387/viewspace-1017150/,如需转载,请注明出处,否则将追究法律责任。

转载于:http://blog.itpub.net/82387/viewspace-1017150/

MATLAB主动噪声和振动控制算法——对较大的次级路径变化具有鲁棒性内容概要:本文主要介绍了一种在MATLAB环境下实现的主动噪声和振动控制算法,该算法针对较大的次级路径变化具有较强的鲁棒性。文中详细阐述了算法的设计原理与实现方法,重点解决了传统控制系统中因次级路径动态变化导致性能下降的问题。通过引入自适应机制和鲁棒控制策略,提升了系统在复杂环境下的稳定性和控制精度,适用于需要高精度噪声与振动抑制的实际工程场景。此外,文档还列举了多个MATLAB仿真实例及相关科研技术服务内容,涵盖信号处理、智能优化、机器学习等多个交叉领域。; 适合人群:具备一定MATLAB编程基础和控制系统理论知识的科研人员及工程技术人员,尤其适合从事噪声与振动控制、信号处理、自动化等相关领域的研究生和工程师。; 使用场景及目标:①应用于汽车、航空航天、精密仪器等对噪声和振动敏感的工业领域;②用于提升现有主动控制系统对参数变化的适应能力;③为相关科研项目提供算法验证与仿真平台支持; 阅读建议:建议读者结合提供的MATLAB代码进行仿真实验,深入理解算法在不同次级路径条件下的响应特性,并可通过调整控制参数进一步探究其鲁棒性边界。同时可参考文档中列出的相关技术案例拓展应用场景。
评论
成就一亿技术人!
拼手气红包6.0元
还能输入1000个字符  | 博主筛选后可见
 
红包 添加红包
表情包 插入表情
 条评论被折叠 查看
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值