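I have spent the last two days on a project that uses Sphinx full-text indexing and finally got it working, so here is a summary.
1. Install Sphinx (I am on macOS; most of the commands used below also work on Linux)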
mkdir /usr/local/sphinx
cd /usr/local/sphinx
wget http://sphinxsearch.com/files/sphinx-2.2.11-release.tar.gz
tar -zvxf sphinx-2.2.11-release.tar.gz
cd sphinx-2.2.11-release
./configure
make && sudo make install
Check that the installation succeeded
searchd -h    # success if the usage text is printed
Errors I ran into during installation
configuring Sphinx
checking for CFLAGS needed for pthreads... none
checking for LIBS needed for pthreads... -lpthread
checking for pthreads... found
checking whether to compile with MySQL support... yes
checking for mysql_config... mysql_config
checking for mysql_real_connect... no
checking for mysql_real_connect... no
checking MySQL include files... configure: error: missing include files.
ERROR: cannot find MySQL include files.
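Fix (Debian/Ubuntu): install the MySQL development headers, for example sudo apt-get install libmysql++-dev (libmysqlclient-dev also provides them)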
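Before re-running ./configure it can help to confirm that the headers are really visible. The following is only a rough sketch: have_mysql_headers is a helper name I made up for this post, and it assumes mysql_config is on the PATH and reports its include directories as -I flags.

```shell
# have_mysql_headers: hypothetical helper, not part of Sphinx or MySQL.
# Returns 0 if mysql.h exists in one of the include directories
# reported by `mysql_config --include`, 1 otherwise.
have_mysql_headers() {
    command -v mysql_config >/dev/null 2>&1 || return 1
    for flag in $(mysql_config --include); do
        dir=${flag#-I}                      # strip the leading -I
        [ -f "$dir/mysql.h" ] && return 0
    done
    return 1
}

# Example use:
#   have_mysql_headers && ./configure || echo "MySQL headers not found"
```

If the function reports that the headers are missing, installing the development package and running ./configure again is the next step.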
2. sphinx.conf configuration (see the official Sphinx documentation for details on each parameter)
# Minimal Sphinx configuration sample (clean, simple, functional)
#
source main_src
{
type = mysql
sql_host = 192.168.1.221
sql_user = root
sql_pass =root.remote
sql_db = caiban
sql_port = 3306 # optional, default is 3306
sql_sock =/tmp/mysql.sock
sql_query_pre =SET NAMES utf8
sql_query_pre =SET SESSION query_cache_type=OFF
sql_query_pre =replace into sph_counter select 1,max(id) from register_enterprise_extends
sql_query = \
SELECT id,company_name,trademark,legal_person_name, UNIX_TIMESTAMP(created_at) AS created_at,reg_address,reg_number,business_scope,linkman,reg_organs,operating_period,views,reg_capital FROM register_enterprise_extends where id<=(select max_doc_id from sph_counter where counter_id=1)
sql_attr_uint =id
sql_field_string =company_name
sql_field_string =trademark
sql_field_string =legal_person_name
sql_attr_timestamp =created_at
sql_field_string =reg_address
sql_field_string =reg_number
sql_field_string =business_scope
sql_field_string =reg_organs
sql_field_string =operating_period
sql_field_string =reg_capital
sql_field_string =views
}
source delta_src: main_src{
sql_ranged_throttle=100
sql_query_pre=SET NAMES utf8
sql_query_pre=SET SESSION query_cache_type=OFF
sql_query= SELECT id,company_name,trademark,legal_person_name, UNIX_TIMESTAMP(created_at) AS created_at,reg_address,reg_number,business_scope,linkman,reg_organs,operating_period,views,reg_capital FROM register_enterprise_extends where id>(select max_doc_id from sph_counter where counter_id=1)
sql_attr_uint =id
sql_field_string =company_name
sql_field_string =trademark
sql_field_string =legal_person_name
sql_attr_timestamp =created_at
sql_field_string =reg_address
sql_field_string =reg_number
sql_field_string =business_scope
sql_field_string =reg_organs
sql_field_string =operating_period
sql_field_string =reg_capital
sql_field_string =views
}
index main{
source =main_src
path=/usr/local/sphinx/main
docinfo =extern
min_word_len =1
charset_type=utf-8
min_prefix_len=0
min_infix_len =1
ngram_len =1
charset_table = U+FF10..U+FF19->0..9, 0..9, U+FF41..U+FF5A->a..z, U+FF21..U+FF3A->a..z,A..Z->a..z, a..z, U+0149, U+017F, U+0138, U+00DF, U+00FF, U+00C0..U+00D6->U+00E0..U+00F6,U+00E0..U+00F6, U+00D8..U+00DE->U+00F8..U+00FE, U+00F8..U+00FE, U+0100->U+0101, U+0101,U+0102->U+0103, U+0103, U+0104->U+0105, U+0105, U+0106->U+0107, U+0107, U+0108->U+0109,U+0109, U+010A->U+010B, U+010B, U+010C->U+010D, U+010D, U+010E->U+010F, U+010F,U+0110->U+0111, U+0111, U+0112->U+0113, U+0113, U+0114->U+0115, U+0115,U+0116->U+0117,U+0117, U+0118->U+0119, U+0119, U+011A->U+011B, U+011B, U+011C->U+011D,U+011D,U+011E->U+011F, U+011F, U+0130->U+0131, U+0131, U+0132->U+0133, U+0133,U+0134->U+0135,U+0135, U+0136->U+0137, U+0137, U+0139->U+013A, U+013A, U+013B->U+013C,U+013C,U+013D->U+013E, U+013E, U+013F->U+0140, U+0140, U+0141->U+0142, U+0142,U+0143->U+0144,U+0144, U+0145->U+0146, U+0146, U+0147->U+0148, U+0148, U+014A->U+014B,U+014B,U+014C->U+014D, U+014D, U+014E->U+014F, U+014F, U+0150->U+0151, U+0151,U+0152->U+0153,U+0153, U+0154->U+0155, U+0155, U+0156->U+0157, U+0157, U+0158->U+0159,U+0159,U+015A->U+015B, U+015B, U+015C->U+015D, U+015D, U+015E->U+015F, U+015F,U+0160->U+0161,U+0161, U+0162->U+0163, U+0163, U+0164->U+0165, U+0165, U+0166->U+0167,U+0167,U+0168->U+0169, U+0169, U+016A->U+016B, U+016B, U+016C->U+016D, U+016D,U+016E->U+016F,U+016F, U+0170->U+0171, U+0171, U+0172->U+0173, U+0173, U+0174->U+0175,U+0175,U+0176->U+0177, U+0177, U+0178->U+00FF, U+00FF, U+0179->U+017A, U+017A, U+017B->U+017C,U+017C, U+017D->U+017E, U+017E, U+0410..U+042F->U+0430..U+044F,U+0430..U+044F,U+05D0..U+05EA, U+0531..U+0556->U+0561..U+0586, U+0561..U+0587, U+0621..U+063A, U+01B9,U+01BF, U+0640..U+064A, U+0660..U+0669, U+066E, U+066F, U+0671..U+06D3, U+06F0..U+06FF,U+0904..U+0939, U+0958..U+095F, U+0960..U+0963, U+0966..U+096F, U+097B..U+097F,U+0985..U+09B9, U+09CE, U+09DC..U+09E3, U+09E6..U+09EF,U+0A05..U+0A39, U+0A59..U+0A5E,U+0A66..U+0A6F, U+0A85..U+0AB9, U+0AE0..U+0AE3,U+0AE6..U+0AEF, 
U+0B05..U+0B39,U+0B5C..U+0B61, U+0B66..U+0B6F, U+0B71, U+0B85..U+0BB9,U+0BE6..U+0BF2, U+0C05..U+0C39,U+0C66..U+0C6F, U+0C85..U+0CB9, U+0CDE..U+0CE3,U+0CE6..U+0CEF, U+0D05..U+0D39, U+0D60,U+0D61, U+0D66..U+0D6F, U+0D85..U+0DC6,U+1900..U+1938, U+1946..U+194F, U+A800..U+A805,U+A807..U+A822, U+0386->U+03B1,U+03AC->U+03B1, U+0388->U+03B5, U+03AD->U+03B5,U+0389->U+03B7, U+03AE->U+03B7,U+038A->U+03B9, U+0390->U+03B9, U+03AA->U+03B9,U+03AF->U+03B9, U+03CA->U+03B9,U+038C->U+03BF, U+03CC->U+03BF, U+038E->U+03C5,U+03AB->U+03C5, U+03B0->U+03C5,U+03CB->U+03C5, U+03CD->U+03C5, U+038F->U+03C9,U+03CE->U+03C9, U+03C2->U+03C3, U+0391..U+03A1->U+03B1..U+03C1,U+03A3..U+03A9->U+03C3..U+03C9, U+03B1..U+03C1,U+03C3..U+03C9, U+0E01..U+0E2E,U+0E30..U+0E3A, U+0E40..U+0E45, U+0E47, U+0E50..U+0E59, U+A000..U+A48F, U+4E00..U+9FBF,U+3400..U+4DBF, U+20000..U+2A6DF, U+F900..U+FAFF,U+2F800..U+2FA1F, U+2E80..U+2EFF,U+2F00..U+2FDF, U+3100..U+312F, U+31A0..U+31BF,U+3040..U+309F, U+30A0..U+30FF,U+31F0..U+31FF, U+AC00..U+D7AF, U+1100..U+11FF, U+3130..U+318F, U+A000..U+A48F,U+A490..U+A4CF
ngram_chars =U+4E00..U+9FBF, U+3400..U+4DBF, U+20000..U+2A6DF, U+F900..U+FAFF,U+2F800..U+2FA1F, U+2E80..U+2EFF, U+2F00..U+2FDF, U+3100..U+312F, U+31A0..U+31BF,U+3040..U+309F, U+30A0..U+30FF, U+31F0..U+31FF, U+AC00..U+D7AF, U+1100..U+11FF, U+3130..U+318F, U+A000..U+A48F, U+A490..U+A4CF
}
index delta: main{
source=delta_src
path =/usr/local/sphinx/delta
}
indexer
{
mem_limit = 128M
}
searchd{
listen = 9312
listen = 9306:mysql41
log = /usr/local/sphinx/log/searchd.log
query_log = /usr/local/sphinx/log/query.log
read_timeout = 5
max_children = 30
pid_file = /usr/local/sphinx/log/searchd.pid
seamless_rotate = 1
preopen_indexes = 1
unlink_old = 1
max_matches = 1000
workers = threads # for RT to work
binlog_path = /usr/local/sphinx/data
}
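The main_src source above writes to a sph_counter table with REPLACE INTO, so that table has to exist in the caiban database before the first indexer run. A minimal sketch of creating it, modelled on the main+delta example in the Sphinx documentation; the column types are an assumption and should match the type of register_enterprise_extends.id, and the file name sph_counter.sql is just one I picked:

```shell
# Write the DDL to a file so it can be reviewed before it is applied.
cat > sph_counter.sql <<'SQL'
CREATE TABLE IF NOT EXISTS sph_counter (
    counter_id INTEGER PRIMARY KEY NOT NULL,
    max_doc_id INTEGER NOT NULL
);
SQL

# Apply it to the database named in sphinx.conf (prompts for the password).
# Left commented out so that running this snippet has no side effects:
#   mysql -h 192.168.1.221 -u root -p caiban < sph_counter.sql
```

The row with counter_id = 1 does not need to be inserted by hand, because the sql_query_pre in main_src creates or replaces it each time the main index is built.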
3. Build the indexes and start the Sphinx daemon
/usr/local/bin/indexer -c /usr/local/sphinx/sphinx.conf --all
/usr/local/bin/searchd -c /usr/local/sphinx/sphinx.conf
# Check that the process is running
ps aux|grep searchd
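Since the config also listens on 9306 with the mysql41 protocol, the indexes can be checked with an ordinary mysql client as well. A small sketch, assuming the mysql client is installed; sphinxql, SPHINX_HOST and SPHINX_PORT are names I invented for this example:

```shell
# sphinxql QUERY: send one SphinxQL statement to searchd over the MySQL protocol.
# SPHINX_HOST and SPHINX_PORT are hypothetical overrides for this sketch;
# the defaults match the "listen = 9306:mysql41" line in sphinx.conf.
sphinxql() {
    mysql -h "${SPHINX_HOST:-127.0.0.1}" -P "${SPHINX_PORT:-9306}" -e "$1"
}

# Example calls (they need a running searchd):
#   sphinxql "SHOW TABLES"
#   sphinxql "SELECT id FROM main WHERE MATCH('test') LIMIT 5"
```

The host is given as 127.0.0.1 rather than localhost because the mysql client would otherwise try a Unix socket instead of the TCP port.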
4. Use cron jobs to run shell scripts that periodically update the delta index (the index of newly added rows) and the main index
vim delta_index.sh
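#!/bin/sh
# Rebuild the delta index, then merge it into main. --rotate lets the running searchd switch to the new files without being stopped; output is appended to the log.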
/usr/local/bin/indexer -c /usr/local/sphinx/sphinx.conf delta --rotate >> /usr/local/sphinx/log/deltaindex.log;
/usr/local/bin/indexer --merge main delta --rotate -c /usr/local/sphinx/sphinx.conf >> /usr/local/sphinx/log/deltaindex.log
:wq
vim main_index.sh
#!/bin/sh
# Rebuild the main index; --rotate swaps it in without stopping the running searchd
/usr/local/bin/indexer -c /usr/local/sphinx/sphinx.conf main --rotate >> /usr/local/sphinx/log/mainindex.log
:wq
crontab -e
# Insert the following lines (the paths must point at the two scripts created above)
*/1 * * * * /bin/sh /usr/local/sphinx/delta_index.sh > /dev/null 2>&1
30 2 * * * /bin/sh /usr/local/sphinx/main_index.sh > /dev/null 2>&1
:wq
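Because the delta job fires every minute, a slow merge could still be running when the next one starts. This is not something I hit in practice, but one way to guard against it is a simple lock. The sketch below uses mkdir as the lock because it is atomic and available on both macOS and Linux; run_locked and the lock path are hypothetical names for this example.

```shell
# run_locked LOCKDIR COMMAND [ARGS...]
# Runs COMMAND only if LOCKDIR can be created. mkdir is atomic, so two
# overlapping cron runs cannot both acquire the lock.
# Returns 75 if the lock is already held, otherwise COMMAND's exit status.
run_locked() {
    lockdir=$1
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "previous run still active, skipping" >&2
        return 75
    fi
    "$@"
    status=$?
    rmdir "$lockdir"
    return "$status"
}

# Example use inside a cron-driven wrapper script:
#   run_locked /tmp/sphinx_delta.lock /bin/sh /usr/local/sphinx/delta_index.sh
```

One limitation of this approach: if the job is killed before rmdir runs, the lock directory stays behind and has to be removed by hand.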
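Watch out, there is a pitfall here. While writing the scripts I typed the commands out character for character many times, yet they never ran correctly. The log kept reporting that --merge was an unrecognized argument. I agonized over where the mistake was and spent a whole day without finding it. Eventually I looked at the command examples in indexer --help, copied one of them, and changed nothing except the index names at the end. The result was stunning: the command I had typed by hand failed, while the copied one, identical character for character as far as I could see, ran perfectly. I blame the vim editor on Ubuntu.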
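I never confirmed the root cause, but one plausible explanation (an assumption on my part) is that the hand-typed script contained a look-alike or invisible character, for example a typographic dash in place of the two hyphens in --merge. A quick sketch for checking a script for such bytes; find_non_ascii is a name I made up:

```shell
# find_non_ascii FILE: print (with line numbers) every line of FILE that
# contains a byte outside printable ASCII and ordinary whitespace.
# Exit status is 0 if such a line was found, 1 if the file is clean.
find_non_ascii() {
    LC_ALL=C grep -an '[^[:print:][:space:]]' "$1"
}

# Example use:
#   find_non_ascii /usr/local/sphinx/delta_index.sh
```

Running it with LC_ALL=C makes grep treat the file byte by byte, so any multi-byte UTF-8 character such as an en dash shows up as a match.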
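5. Using the sngrl plugin in Laravel (see the sngrl\SphinxSearch project source)
5.1.1 With the previous steps done, the environment Sphinx needs is basically in place. What follows is the basic usage of the sngrl plugin.
Add the following to the "require" section of composer.json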
"require": {
/*** Some others packages ***/
"sngrl/sphinxsearch": "dev-master",
},
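Run composer install or composer update. I personally recommend composer install: many dependencies are hosted overseas, so a full update can be slow and may be interrupted partway. Note that if a composer.lock already exists, composer install installs from the lock file, so a targeted composer update sngrl/sphinxsearch is what picks up the new requirement.
5.1.2 Alternatively, add the package directly from the composer command line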
composer require sngrl/sphinxsearch:dev-master
5.2 In config/app.php, add the following to the 'providers' array
'providers' => array(
/*** Some others providers ***/
'sngrl\SphinxSearch\SphinxSearchServiceProvider',
),
5.3 Publish the configuration file the component needs
php artisan vendor:publish --provider="sngrl\SphinxSearch\SphinxSearchServiceProvider" --force
5.4 Edit the published configuration file
return array (
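// Address of the local Sphinx server
'host' => '127.0.0.1',
// Port of the local Sphinx server
'port' => 9312,
'indexes' => array (
// my_index_name is the index name defined in sphinx.conf (with my config above it would be main). In the array, 'table' is the database table the index maps to, and 'column' is the id column of that table that matches the document ids in the search results.
'my_index_name' => array ( 'table' => 'my_keywords_table', 'column' => 'id' ),
// The table mapping can also be left out
//'my_index_name' => FALSE,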
)
);
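5.5 Basic, common usage
// Don't forget to import the class: use sngrl\SphinxSearch\SphinxSearch;
$sphinx = new SphinxSearch();
// The first argument of search() is the search keyword, the second is the index name added in the configuration file (my_index_name)
$results = $sphinx->search('my query', 'index_name')->query(); // returns the raw Sphinx result
$results = $sphinx->search('my query', 'index_name')->get(); // returns the plugin's wrapped result set
// Search for the keyword in one specific field (returns the raw Sphinx result array), with a pagination limit
$sphinx->limit($limit, ($page - 1) * $limit);
$result = $sphinx->search('@title "my query"', 'index_name')->query();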