Solr Full-Text Search System Explained

Article link: https://blog.youkuaiyun.com/manba123456/article/details/105106389

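The previous article covered Lucene. Creating and maintaining indexes directly with Lucene is fairly tedious, and Solr takes care of that work.

Solr is a high-performance full-text search server written in Java and built on Lucene. It extends Lucene with a richer query language, is configurable and extensible, optimizes query performance, and ships with a full-featured administration interface, which makes it an excellent full-text search engine.

Solr can run standalone, inside a Servlet container such as Jetty or Tomcat. Working with the index is simple: send HTTP POST requests to the Solr server to add, delete, or update index entries, and send HTTP GET requests to query.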
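To make the POST side more concrete, here is a minimal sketch, using only the Java standard library, of the XML update messages that Solr accepts on its update handler. The class name SolrUpdateMessage, its helper methods, and the title field are made up for illustration. Actually sending the message (an HTTP POST with Content-Type text/xml to the /update URL) is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Solr or SolrJ.
final class SolrUpdateMessage {

    /** Escapes the characters that are special in XML text and attribute values. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Builds an <add> message that contains a single document. */
    static String addMessage(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(escapeXml(e.getKey())).append("\">")
              .append(escapeXml(e.getValue()))
              .append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    /** Builds a <delete> message that removes one document by its id. */
    static String deleteByIdMessage(String id) {
        return "<delete><id>" + escapeXml(id) + "</id></delete>";
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "c0001");
        doc.put("title", "Solr & Lucene"); // "title" is an assumed field name
        System.out.println(addMessage(doc));
        System.out.println(deleteByIdMessage("c0001"));
    }
}
```

Changes only become visible to searches after a commit, which is why the SolrJ code later in this article calls commit() after each modification.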

I. Solr directory structure

Directory structure of the Solr package, including example:

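bin: Solr's run scripts

contrib: contributed software and plugins that extend Solr's functionality.

dist: the war and jar files produced by the build, together with their dependency files.

docs: Solr's API documentation

example: Solr's example project directory:

- example/solr:

This directory is a Solr Core directory that contains the default configuration.

- example/multicore:

This directory contains the multiple Core directories set up for Solr's multicore mode.

- example/webapps:

This directory contains solr.war, which can be deployed as a running Solr instance.

licenses: license information related to Solr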

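1. The focus here is on three folders under example, all of which are needed to set up a Solr server (as shown in the figure above):

lib/ext/: five jar files that Solr depends on

webapps/: solr.war (drop it straight into Tomcat)

solr: the "home" directory, which holds the index libraries (comparable to databases)

solr/collection1: data/index is generated automatically under it to store the index

Further cores such as collection2 can also be added to store indexes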

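II. Setting up standalone Solr (three steps)

1. Put solr.war in Tomcat's webapps directory; once unpacked, it yields a solr folder

2. Copy the five dependency jars from lib/ext/ into the WEB-INF/lib directory of that solr folder

3. Create your own index library (home) directory, solrhome, and copy the contents of the home directory from the original package into it (having your own home directory makes it easier to build a cluster later)

Edit the web.xml file under D:\solar\apache-tomcat-7.0.53\webapps\solr\WEB-INF to set the path of the index library (home)

Start Tomcat and visit http://localhost:8080/solr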

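III. The schema.xml configuration file (only fields defined in this file can be used)

1. Fields

The solrhome\collection1\conf directory contains schema.xml, which configures field-related settings (whether a field is analyzed, indexed, stored, or multi-valued). With Lucene, as shown in the previous article, these were set in code.

2. Dynamic fields

Any field whose name matches the pattern is accepted (for example, with *_i the names 1_i, qq_i, and aa_i are all valid)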
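The matching rule can be illustrated with a small sketch. This is not Solr's own code: the class DynamicFieldMatcher is hypothetical, and it assumes the wildcard appears only at the start or the end of the pattern, which is how the dynamic field patterns in this article are written.

```java
// Hypothetical illustration of dynamic field name matching; not Solr's implementation.
final class DynamicFieldMatcher {

    /**
     * Returns true if fieldName matches the pattern.
     * Assumption: '*' may appear only as the first or the last character.
     */
    static boolean matches(String pattern, String fieldName) {
        if (pattern.startsWith("*")) {
            // "*_i" matches any name that ends with "_i"
            return fieldName.endsWith(pattern.substring(1));
        }
        if (pattern.endsWith("*")) {
            // "attr_*" matches any name that starts with "attr_"
            return fieldName.startsWith(pattern.substring(0, pattern.length() - 1));
        }
        // No wildcard: the names must be identical
        return pattern.equals(fieldName);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"1_i", "qq_i", "aa_i", "qq_s"}) {
            System.out.println(name + " matches *_i: " + matches("*_i", name));
        }
    }
}
```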

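3. Copy fields

Used when a search spans two or more fields.

For example, when an e-commerce site searches over name (field) + description (field), both fields are copied into the text field, so a search only needs to query text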
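The effect of a copy field can be simulated in a few lines. The sketch below is only a model of the idea, not what Solr does internally: the class CopyFieldSketch and the field names name, description, and text are assumptions taken from the example in the text.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of copy fields; Solr applies the real rules at index time.
final class CopyFieldSketch {

    /**
     * Returns the fields that would be indexed: every original field,
     * plus each destination field filled with the values of its sources.
     */
    static Map<String, List<String>> applyCopyFields(Map<String, String> doc,
                                                     Map<String, String> sourceToDest) {
        Map<String, List<String>> indexed = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : doc.entrySet()) {
            indexed.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        for (Map.Entry<String, String> rule : sourceToDest.entrySet()) {
            String value = doc.get(rule.getKey());
            if (value != null) {
                // The destination collects several values, so it must be multi-valued
                indexed.computeIfAbsent(rule.getValue(), k -> new ArrayList<>()).add(value);
            }
        }
        return indexed;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("name", "Diamond ring");
        doc.put("description", "18k gold band");

        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("name", "text");
        rules.put("description", "text");

        System.out.println(applyCopyFields(doc, rules));
    }
}
```

Because the destination receives one value per source, it has to be declared multiValued in schema.xml.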

4. Primary key id (every Document must have one; it is unique and cannot repeat)

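IV. Configuring the Chinese analyzer (four steps)

1. Add IKAnalyzer2012FF_u1.jar to the solr/WEB-INF/lib directory

2. Copy IKAnalyzer's configuration file, custom dictionary, and stop-word dictionary onto Solr's classpath

3. Add a custom fieldType to schema.xml that uses the Chinese analyzer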

<!-- IKAnalyzer-->
    <fieldType name="text_ik" class="solr.TextField">
      <analyzer class="org.wltea.analyzer.lucene.IKAnalyzer"/>
    </fieldType>

4. Define fields and set their type attribute to text_ik

<!--IKAnalyzer Field-->
   <field name="title_ik" type="text_ik" indexed="true" stored="true" />
   <field name="content_ik" type="text_ik" indexed="true" stored="false" multiValued="true"/>

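V. The query page of the Solr admin console

q: the search keywords

fq: a filter applied on top of the keyword results

sort: the sort order

start, rows: the offset of the first result to show and the number of results to show

fl: which fields to return in the results

df: the default field (searched when q contains only a value and no field name)

wt: the response data format

hl: highlighting (matched keywords are wrapped in markup so they can be shown in a chosen color)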
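The parameters listed above are ordinary HTTP query parameters, so the admin page is in effect building a GET request. The following sketch assembles such a URL with the Java standard library. The class SolrQueryUrl is hypothetical, and the select path and field names in main are assumptions based on the setup described in this article.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that turns query parameters into a Solr select URL.
final class SolrQueryUrl {

    /** URL-encodes one parameter name or value as UTF-8. */
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    /** Appends the parameters to the select URL in iteration order. */
    static String build(String selectUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(selectUrl);
        char separator = '?';
        for (Map.Entry<String, String> p : params.entrySet()) {
            sb.append(separator)
              .append(encode(p.getKey()))
              .append('=')
              .append(encode(p.getValue()));
            separator = '&';
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "title_ik:solr");   // keyword query
        params.put("fq", "id:c*");          // filter on top of the query
        params.put("sort", "id asc");       // sort order
        params.put("start", "0");           // paging offset
        params.put("rows", "10");           // page size
        params.put("fl", "id,title_ik");    // fields to return
        params.put("wt", "json");           // response format
        params.put("hl", "true");           // enable highlighting
        // Assumed URL of the collection1 core in the Tomcat setup above
        System.out.println(build("http://localhost:8080/solr/collection1/select", params));
    }
}
```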

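VI. Bulk-importing data (four steps)

Prepare the database data in advance; each column corresponds to a Field and each row to a Document

1. Put the relevant jars and the database driver in D:\solar\solrhome\collection1\lib

2. Edit solrconfig.xml and add a requestHandler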

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
    </lst>
</requestHandler>

3. Create data-config.xml (in the same directory as solrconfig.xml)

<?xml version="1.0" encoding="UTF-8" ?>  
<dataConfig>   
<dataSource type="JdbcDataSource"   
		  driver="com.mysql.jdbc.Driver"   
		  url="jdbc:mysql://localhost:3306/lucene"   
		  user="root"   
		  password="root"/>   
<document>   
	<entity name="product" query="SELECT pid,name,catalog_name,price,description,picture FROM products ">
		 <field column="pid" name="id"/> 
		 <field column="name" name="product_name"/> 
		 <field column="catalog_name" name="product_catalog_name"/> 
		 <field column="price" name="product_price"/> 
		 <field column="description" name="product_description"/> 
		 <field column="picture" name="product_picture"/> 
	</entity>   
</document>   

</dataConfig>

4. Add the business system's fields to schema.xml

<!--product-->
   <field name="product_name" type="text_ik" indexed="true" stored="true"/>
   <field name="product_price"  type="float" indexed="true" stored="true"/>
   <field name="product_description" type="text_ik" indexed="true" stored="false" />
   <field name="product_picture" type="string" indexed="false" stored="true" />
   <field name="product_catalog_name" type="string" indexed="true" stored="true" />

   <field name="product_keywords" type="text_ik" indexed="true" stored="false" multiValued="true"/>
   <copyField source="product_name" dest="product_keywords"/>
   <copyField source="product_description" dest="product_keywords"/>

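Restart Tomcat

Once the steps above are complete and the import has been executed (for example from the Dataimport page of the admin console), the index library contains the index for our own business data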
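Conceptually, the import turns each database row into one document by renaming columns to fields. The sketch below models only that renaming step, with the column-to-field pairs copied from the data-config.xml above. The class RowToDocumentSketch is hypothetical and does not use the data import handler or JDBC.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the column-to-field mapping configured in data-config.xml.
final class RowToDocumentSketch {

    /** Column name -> Solr field name, as listed in the <field> elements. */
    static Map<String, String> columnToField() {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("pid", "id");
        mapping.put("name", "product_name");
        mapping.put("catalog_name", "product_catalog_name");
        mapping.put("price", "product_price");
        mapping.put("description", "product_description");
        mapping.put("picture", "product_picture");
        return mapping;
    }

    /** Renames the columns of one row; columns without a value are skipped. */
    static Map<String, Object> toDocument(Map<String, Object> row) {
        Map<String, Object> document = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : columnToField().entrySet()) {
            Object value = row.get(e.getKey());
            if (value != null) {
                document.put(e.getValue(), value);
            }
        }
        return document;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("pid", 1);
        row.put("name", "Diamond ring");
        row.put("price", 99.5f);
        System.out.println(toDocument(row));
    }
}
```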

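VII. Managing the index library with SolrJ

SolrJ is the Java client for accessing the Solr service. It provides request methods for indexing and searching, and is usually embedded in the business system, which operates the Solr service through the SolrJ API

1. Import the core packages

2. A simple add/delete/update implementation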

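import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public void addDocument() throws Exception {
    // Add a document to the index library
    // Create a connection to the Solr server
    // Argument: the address of the Solr server
    SolrServer solrServer = new HttpSolrServer("http://localhost:8080/solr");
    // Create a document object
    SolrInputDocument document = new SolrInputDocument();
    // Add fields to the document
    // First argument: the field name, which must be defined in schema.xml
    // Second argument: the field value
    document.addField("id", "c0001");
    document.addField("title_ik", "使用solrJ添加的文档");   // "document added with SolrJ"
    document.addField("content_ik", "文档的内容");          // "content of the document"
    document.addField("product_name", "商品名称");          // "product name"
    // Add the document object to the index library
    solrServer.add(document);
    // Commit the change
    solrServer.commit();

    // Delete the document by id
    solrServer.deleteById("c0001");
    // Commit the change
    solrServer.commit();

    // SolrJ has add but no separate update method: to update a document,
    // add a new document with the same id as the one being modified
}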

3. A more complex query example (important), covering querying, filtering, paging, sorting, and highlighting

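import java.util.List;
import java.util.Map;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrQuery.ORDER;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.junit.Test;

@Test
public void queryIndex2() throws Exception {
    // Covers querying (keywords), filtering, paging, sorting, and highlighting
    // Create the connection
    SolrServer solrServer = new HttpSolrServer("http://localhost:8080/solr");
    // Create a query object
    SolrQuery query = new SolrQuery();
    // Set the query condition ("钻石" means "diamond")
    query.setQuery("钻石");
    // Filter condition ("幽默杂货" is a catalog name in the sample data)
    query.setFilterQueries("product_catalog_name:幽默杂货");
    // Sort condition
    query.setSort("product_price", ORDER.asc);
    // Paging
    query.setStart(0);
    query.setRows(10);
    // Fields to include in the results
    query.setFields("id", "product_name", "product_price", "product_catalog_name", "product_picture");
    // Set the default search field
    query.set("df", "product_keywords");
    // Enable highlighting
    query.setHighlight(true);
    // Field to highlight
    query.addHighlightField("product_name");
    // Prefix placed before a highlighted term
    query.setHighlightSimplePre("<em>");
    // Suffix placed after a highlighted term
    query.setHighlightSimplePost("</em>");
    // Execute the query
    QueryResponse queryResponse = solrServer.query(query);
    // Get the query results
    SolrDocumentList solrDocumentList = queryResponse.getResults();
    // Total number of products found
    System.out.println("Total products found: " + solrDocumentList.getNumFound());
    // Highlighted snippets are returned in a separate Map, keyed by document id
    Map<String, Map<String, List<String>>> highlighting = queryResponse.getHighlighting();
    // Iterate over the query results
    for (SolrDocument solrDocument : solrDocumentList) {
        System.out.println(solrDocument.get("id"));
        String productName = "";
        List<String> list = highlighting.get(solrDocument.get("id")).get("product_name");
        // Use the highlighted text if there is any, otherwise the stored value
        if (null != list) {
            productName = list.get(0);
        } else {
            productName = (String) solrDocument.get("product_name");
        }

        System.out.println(productName);
        System.out.println(solrDocument.get("product_price"));
        System.out.println(solrDocument.get("product_catalog_name"));
        System.out.println(solrDocument.get("product_picture"));
    }
}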
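The highlight lookup in the query code can be isolated into a small helper that also tolerates a document with no highlighting entry. This is a sketch using plain java.util collections in the same shape as the Map returned by getHighlighting(). The class HighlightFallback and its method are made up for illustration and are not part of SolrJ.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: choose the highlighted snippet if present, else the stored value.
final class HighlightFallback {

    static String displayValue(Map<String, Map<String, List<String>>> highlighting,
                               String id, String field, String storedValue) {
        if (highlighting != null) {
            Map<String, List<String>> perDocument = highlighting.get(id);
            if (perDocument != null) {
                List<String> snippets = perDocument.get(field);
                if (snippets != null && !snippets.isEmpty()) {
                    return snippets.get(0); // first highlighted snippet
                }
            }
        }
        return storedValue; // no highlight available for this document and field
    }

    public static void main(String[] args) {
        Map<String, List<String>> fields = new HashMap<>();
        fields.put("product_name", Collections.singletonList("<em>Diamond</em> ring"));
        Map<String, Map<String, List<String>>> highlighting = new HashMap<>();
        highlighting.put("c0001", fields);

        System.out.println(displayValue(highlighting, "c0001", "product_name", "Diamond ring"));
        System.out.println(displayValue(highlighting, "c0002", "product_name", "Gold ring"));
    }
}
```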

VIII. Solr in enterprise use (search on JD.com as an example)