Full-text Search for SQLite

介绍SQLite的全文搜索模块FTS1,该模块为SQLite添加了全文索引功能,支持创建全文索引表、查询匹配及偏移量检索等功能。

Full-text Search for SQLite
(as of 2006-09-29)

Introduction

The module fts1 (see also FtsTwo and FtsUsage ) adds full-text indexing capabilities to SQLite. It is not yet included in the SQLite binary distribution; you can find it in the SQLite CVS tree in the ext/fts1 directory.

Building fts1

fts1 can be built either as a standalone shared library, or statically linked into the SQLite library.

As a shared library

To build the module as a shared library, compile all source files in the fts1 directory into a shared library (.so or .dll) on your platform. (Sorry - there's no makefile checked in yet. Coming soon.)

Statically linked

To statically link fts1 into SQLite, add all .c files from the fts1 directory to the Makefile you use to compile SQLite so that they will be linked into the SQLite image. You must define the preprocessor symbols SQLITE_CORE and SQLITE_ENABLE_FTS1 when compiling these files.

Initializing fts1

When fts1 is built as a shared library, you can load it into SQLite using the ".load" shell command.

  sqlite> .load fts1

Or you can load it using a SELECT statement:

  SELECT load_extension('fts1');

Note that you may need to call sqlite3_enable_load_extension before loading the extension; see the SQLite LoadableExtensions documentation.

In a static build, fts1 is always available; there's no need to load or otherwise initialize it.

Using fts1

Full-text tables store one or more columns of fully indexed text. You can create a full-text table using the CREATE VIRTUAL TABLE statement. For example, the following creates a table with columns name and ingredients :

  sqlite>create virtual table recipe using fts1(name, ingredients);

You can insert rows into a full-text table in the same way as into an ordinary table with columns of type TEXT:

  sqlite>insert into recipe (name, ingredients) values ('broccoli stew', 'broccoli peppers cheese tomatoes');
sqlite>insert into recipe (name, ingredients) values ('pumpkin stew', 'pumpkin onions garlic celery');
sqlite>insert into recipe (name, ingredients) values ('broccoli pie', 'broccoli cheese onions flour');
sqlite>insert into recipe (name, ingredients) values ('pumpkin pie', 'pumpkin sugar flour butter');

The MATCH operator performs a full-text match on a column in a full-text table:

  sqlite> select rowid, name, ingredients from recipe where name match 'pie';
3|broccoli pie|broccoli cheese onions flour
4|pumpkin pie|pumpkin sugar flour butter
sqlite>

As can be seen in the preceding output, every row in a full-text table has a unique rowid, just as in any other SQLite table.

A query may contain multiple terms, in which case it will return only documents containing all of the terms:

  sqlite> select name, ingredients from recipe where ingredients match 'onions cheese';
broccoli pie|broccoli cheese onions flour
sqlite>

OR queries

Inside a query, the OR operator may be used to retrieve documents containing either of two terms:

  sqlite> select name, ingredients from recipe where ingredients match 'onions OR cheese';
broccoli stew|broccoli peppers cheese tomatoes
pumpkin stew|pumpkin onions garlic celery
broccoli pie|broccoli cheese onions flour
sqlite>

Note that the OR in this query must be capitalized.

The OR operator binds more tightly than the implicit AND between two adjacent terms. Thus, the query 'onions OR cheese pumpkin' matches text which contains either "onions" or "cheese", and also contains "pumpkin":

  sqlite> select name, ingredients from recipe where ingredients match 'onions OR cheese pumpkin';
pumpkin stew|pumpkin onions garlic celery
sqlite>

fts1 does not currently provide any grouping operator (i.e. parentheses) for overriding this default precedence.

Excluding terms

The - operator excludes any documents containing the term which follows it:

  sqlite> select name, ingredients from recipe where ingredients match 'onions -celery';
broccoli pie|broccoli cheese onions flour
sqlite>

Note that a query must contain at least one non-excluded term:

  sqlite> select name, ingredients from recipe where ingredients match '-celery';  /* invalid! */
SQL error: SQL logic error or missing database
sqlite>

Phrase searches

Phrases may be enclosed in double quotes:

  sqlite> select name, ingredients from recipe where ingredients match '"broccoli cheese"';
broccoli pie|broccoli cheese onions flour
sqlite>

Querying multiple columns

To query all columns in a full-text table simultaneously, use the table's name on the left-hand side of the MATCH operator:

  sqlite> select name, ingredients from recipe where recipe match 'pie';
broccoli pie|broccoli cheese onions flour
pumpkin pie|pumpkin sugar flour butter
sqlite>

When an all-column query contains multiple terms, a row will match even if the terms appear in different columns in the row:

  sqlite> select name, ingredients from recipe where recipe match 'sugar pie';
pumpkin pie|pumpkin sugar flour butter
sqlite>

Any term in a query string may be preceded by the name of a particular column to use for matching that term:

  sqlite> select name, ingredients from recipe where recipe match 'name:pie ingredients:onions';
broccoli pie|broccoli cheese onions flour
sqlite>

The following are entirely equivalent:

  sqlite> select name from recipe where ingredients match 'sugar';
sqlite> select name from recipe where recipe match 'ingredients:sugar';

When a specific column name appears to the left of the MATCH operator, that column is used for matching any term without an explicit column qualifier. Thus, the following are equivalent:

  sqlite> select name from recipe where recipe match 'name:pie ingredients:onions';
sqlite> select name from recipe where name match 'pie ingredients:onions';

Note that fts1 currently requires a query to contain at most one instance of the MATCH operator. This means that to match multiple specific columns you must use field specifiers as described above; the following won't work:

  sqlite> select name, ingredients from recipe
...> where name match 'pie' and ingredients match 'onions'; /* invalid! */
SQL error: unable to use function MATCH in the requested context
sqlite>

Retrieving offset information

When returning full-text matches, fts1 can return the character offsets of individual term matches. To retrieve offset information in a query, call the offsets function and pass it the name of your full-text table:

  sqlite> select name, ingredients, offsets(recipe) from recipe where recipe match 'sugar pie';
pumpkin pie|pumpkin sugar flour butter|0 1 8 3 1 0 8 5
sqlite>

offsets returns a string containing a series of integers separated by spaces. Each match is represented by a series of four consecutive integers:

  1. The index of the column containing the match. Columns are numbered starting from 0.
  2. The term in the query expression which was matched. Terms are numbered starting from 0.
  3. The byte offset of the first character of the matching phrase, measured from the beginning of the column's text.
  4. Number of bytes in the match.

For example, in the query above two matches are returned, which we can decode as follows:

  • In column 0 (name), term 1 ("pie") matched at byte offset 8; the match is 3 bytes long (the length of "pie").
  • In column 1 (ingredients), term 0 ("sugar") matched at byte offset 8; the match is 5 bytes long (the length of "sugar").

Generating snippets

fts1 can generate snippets of text surrounding each match returned in a full-text query. To produce a snippet, call the snippet function and pass it the name of your full-text table:

  sqlite> create virtual table poem using fts1(name, text);
sqlite> insert into poem values ('ozymandias', 'i met a traveller from an antique land who said: two vast and trunkless legs of stone stand in the desert');
sqlite> select name, snippet(poem) from poem where text match 'land';
ozymandias|i met a traveller from an antique <b>land</b> who said: two vast and trunkless legs of <b>...</b>
sqlite>

By default, each matching term in a snippet is surrounded with the delimiters "<b>" and "</b>", and an ellipsis "..." indicates text not included in a snippet. (Note that the ellipsis is itself surrounded by delimiters "<b>" and "</b>".)

You can specify your own term delimiters and ellipsis text by specifying extra arguments to the snippet function. Its arguments are as follows:

  1. The name of the virtual table being queried.
  2. Markup to put before each matching word. Default: <b>
  3. Markup to put after each matching word. Default: </b>
  4. Ellipsis markup. Default: <b>...</b>

For example:

  sqlite> select name, snippet(poem, '[', ']', '%%') from poem where text match 'land';
ozymandias|i met a traveller from an antique [land] who said: two vast and trunkless legs of %%
sqlite>

Joining full-text data

A full-text table stores only full-text-indexed strings. To store full-text-indexed values along with other values, it's convenient to use both a full-text table and an ordinary SQLite table, joined by rowid. For example, suppose that you'd like to store a set of messages; each message has a sender and priority, which are not full-text indexed, and a subject and body, which are full-text indexed. You can use the following schema:

  create table email(sender text, priority integer);
create virtual table email_text using fts1(subject, body);

You can insert a new message as follows:

  insert into email (sender, priority) values ('ashley@foo.com', 4);
insert into email_text (rowid, subject, body) values (last_insert_rowid(), 'update', 'coming home now');

To find the sender of all messages containing the word "jam" you can issue a query joining the email and email_text tables:

  select sender, subject from email join email_text on email.rowid = email_text.rowid
where body match 'jam';

Tokenization

As the module indexes a piece of text, it converts the text to a sequence of tokens. Each token becomes a term in the index and can be matched using a full-text query.

The module currently uses the following generic tokenization mechanism. A token is a contiguous sequence of alphanumeric ASCII characters (A-Z, a-z and 0-9). All non-ASCII characters are ignored. Each token is converted to lowercase before it is stored in the index, so all full-text searches are case-insensitive. The module does not perform stemming of any sort.

Soon, we hope to allow applications to define their own tokenizers (we in fact already have a generic tokenizer mechanism in our code; we just have yet to expose it to the outside world).

Performance

There are two steps you can take to greatly improve performance when inserting documents into a full-text index. First, you can set the synchronous pragma to OFF:

   sqlite>pragma synchronous = off;

In our testing we've found that this dramatically increases indexing speed. Of course, you should study the SQLite documentation (see http://www.sqlite.org/pragma.html ) to understand the safety/speed tradeoff which this pragma controls and to determine the setting which is right for your application.

Secondly, you can index more than one document per transaction. In our testing, we've found throughput to be best when we index at least 50 or so documents in each transaction; this dramatically improves performance over the one-document-per-transaction case.

We're still in the process of assessing and improving performance.

Internals

Every full-text table contains a column with the same name as the table itself. This self-named column is used in multi-column queries as described above, and will appear in queries for * :

  sqlite> create virtual table foo using fts1(name, address);
sqlite> insert into foo (name, address) values ('amanda', '43 elm avenue');
sqlite> .header on
sqlite> select * from foo;
name|address|foo
amanda|43 elm avenue|8&#232;5
sqlite>

This column's value is private to fts1. When querying a full-text table, you probably want to specify column names explicitly rather than using *, to avoid seeing this self-named column.

When you create a full-text table, fts1 creates two backing tables which hold the table's contents and full-text index:

  sqlite> .tables
foo foo_content foo_term
sqlite>

You should not access the backing tables directly; their format is internal to fts1 and is subject to change.

Caveats

Please note that the full-text database format is subject to change at any time. We are not planning to implement backward compatibility in updates in the near future, so new code releases may fail spectacularly with old databases. Of course this will change at some future point once our data structures become more stable.

Missing features

The full-text module is still in an early development phase. The following features are missing but hopefully coming soon:

  • We plan to support prefix queries (e.g. "foo*").
  • Applications will be able to specify custom tokenizers.

Requests/ideas/suggestions

  • Some way to return how many matches we had, or some "relevance" value (lalo dot martins at gmail dot com, 07-aug-04 )
【四轴飞行器】非线性三自由度四轴飞行器模拟器研究(Matlab代码实现)内容概要:本文围绕非线性三自由度四轴飞行器模拟器的研究展开,重点介绍基于Matlab代码实现的四轴飞行器动力学建模与仿真方法。研究构建了考虑非线性特性的飞行器数学模型,涵盖姿态动力学与运动学方程,实现了三自由度(滚转、俯仰、偏航)的精确模拟。文中详细阐述了系统建模过程、控制算法设计思路及仿真结果分析,帮助读者深入理解四轴飞行器的飞行动力学特性与控制机制;同时,该模拟器可用于算法验证、控制器设计与教学实验。; 适合人群:具备一定自动控制理论基础和Matlab编程能力的高校学生、科研人员及无人机相关领域的工程技术人员,尤其适合从事飞行器建模、控制算法开发的研究生和初级研究人员。; 使用场景及目标:①用于四轴飞行器非线性动力学特性的学习与仿真验证;②作为控制器(如PID、LQR、MPC等)设计与测试的仿真平台;③支持无人机控制系统教学与科研项目开发,提升对姿态控制与系统仿真的理解。; 阅读建议:建议读者结合Matlab代码逐模块分析,重点关注动力学方程的推导与实现方式,动手运行并调试仿真程序,以加深对飞行器姿态控制过程的理解。同时可扩展为六自由度模型或加入外部干扰以增强仿真真实性。
基于分布式模型预测控制DMPC的多智能体点对点过渡轨迹生成研究(Matlab代码实现)内容概要:本文围绕“基于分布式模型预测控制(DMPC)的多智能体点对点过渡轨迹生成研究”展开,重点介绍如何利用DMPC方法实现多智能体系统在复杂环境下的协同轨迹规划与控制。文中结合Matlab代码实现,详细阐述了DMPC的基本原理、数学建模过程以及在多智能体系统中的具体应用,涵盖点对点转移、避障处理、状态约束与通信拓扑等关键技术环节。研究强调算法的分布式特性,提升系统的可扩展性与鲁棒性,适用于多无人机、无人车编队等场景。同时,文档列举了大量相关科研方向与代码资源,展示了DMPC在路径规划、协同控制、电力系统、信号处理等多领域的广泛应用。; 适合人群:具备一定自动化、控制理论或机器人学基础的研究生、科研人员及从事智能系统开发的工程技术人员;熟悉Matlab/Simulink仿真环境,对多智能体协同控制、优化算法有一定兴趣或研究需求的人员。; 使用场景及目标:①用于多智能体系统的轨迹生成与协同控制研究,如无人机集群、无人驾驶车队等;②作为DMPC算法学习与仿真实践的参考资料,帮助理解分布式优化与模型预测控制的结合机制;③支撑科研论文复现、毕业设计或项目开发中的算法验证与性能对比。; 阅读建议:建议读者结合提供的Matlab代码进行实践操作,重点关注DMPC的优化建模、约束处理与信息交互机制;按文档结构逐步学习,同时参考文中提及的路径规划、协同控制等相关案例,加深对分布式控制系统的整体理解。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值