SQLServer Full Text摘记

全文索引优化与应用

最新推荐文章于 2021-04-08 01:25:18 发布

转载最新推荐文章于 2021-04-08 01:25:18 发布 · 746 阅读

文章标签：

#sqlserver #search #performance #duplicates #conflict #database

数据库全文检索专栏收录该内容

7 篇文章

订阅专栏

本文探讨了全文索引在数据库中的实现与优化方法，包括不同填充方式的特点、最佳实践建议以及如何通过调整配置提高全文搜索性能。

Typically, a single catalog will handle all the full-text searches for a database, although dedicating a
single catalog to a very large table (one with over one million rows) will improve performance.

全文索引创建完之后，需要进行填充（如果在创建的时候没有选择创建后进行填充），创建后在表的右键盘菜单里选择全文索引－启用全文索引，然后再选启动完全填充就行了。
全文索引的更新和普通的sql索引不一样，它不是在事务过程中进行的，它是要等到sqlserver把数据发给索引才进行更新，这样可以不影响大文本更新的速度，但是缺点是它不是实时的。全文索引创建之初都是空的，如果表中已经有数据要进行完全填充，空的就不需要了。

全文索引的填充有两种方式：Incremental populations和Change tracking and background population (default)，
incremental population表需要有一个rowversion列，
Incremental populations present two problems. First, a built-in delay occurs between the time the data is entered and the time the user can find the data using full-text search.
Second,incremental populations consolidate all the changes into a single process that consumes a
significant amount of CPU time during the incremental change.

Change tracking and background population (default):每当一行发生变更的时候，sqlserver就会发送一行的增量修改，而且它不在事务范围内，并不会影响更新的执行，它对系统的消耗也不明显，所以这种方式在实时性跟系统消耗间取得了很好的平衡。这个一般来说是最好的方案。
Change tracking can also be configured to require manual pushes of only the changed data.

Best Practice:If the database project incorporates searching for words within columns, use full-text search with change tracking and background population. It’s the best overall way to balance search performance with update
performance.

使用全文索引搜索词用contains和containstable函数。
contains函数用来确定一个词是否在某列中出现
containstable函数用来根据The frequency/uniqueness of the word in the table和The frequency/uniqueness of the word in the column来给一个词评级，返回一个两列的表，一个是创建全文索引的时候用的标识列一个是评级结果。a rare word will be ranked as statistically more important than a common word.

全文索引是不会大小写的。即使sqlserver设置成区分大小写，对它也无效。

在多列中查多个词的解决方案：
/*可以实现在多列中查多个词，但是这个有性能问题*/
SELECT Title
FROM Fable
WHERE CONTAINS (*,‘Thrifty’)
AND CONTAINS(*,‘supperless’)

The other solution to the multiple-column search problem consists of adding an additional column
to hold all the text to be searched and duplicating the data from the original columns to a
FullTextSearch column within an after trigger or using a persisted computed column. This solution
is not smooth either. It duplicates data and costs performance time during inserts and updates. The crux
of the decision regarding how to solve the multiple-column search is the conflict between fast reads and
fast writes — OLAP versus OLTP.

Fuzzy Searches:
Internally, the free-form text is broken down into multiple words and phrases, and the full-text search with inflections and weighting is then performed on the result.

FREETEXT and FREETEXTTABLE essentially turn on every advanced feature of iFTS and　perform a fuzzy word search.

Performance：
在全文索引性能调整方面有以下两点：
iFTS benefits from a very fast subsystem. Place your catalog on its own controller, preferably
its own RAID 10 array. A sweet spot exists for SQL iFTS on eight-way servers. After a full or
incremental population, force a master merge, which will consolidate all the shadow indexes
into a single master index, by issuing the following command:
ALTER FULLTEXT CATALOG catalog_name REORGANIZE;

You can also increase the maximum number of ranges that the gathering process can use. To
do so, issue the following command:
EXEC sp_configure ‘max full-text crawl range’, 32;