Lucene & Lucene Spatial

弗里曼的小伙伴

License: CC 4.0 BY-SA

Category: Java. Tags: java, full-text search, lucene, spatial search, Lucene Spatial

Article link: https://blog.youkuaiyun.com/sf2gis2/article/details/47191769

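Summary: this article describes how to use Lucene for full-text search and looks at how Lucene Spatial handles spatial indexing and search. It creates an index, queries it, and sets spatial query conditions and sort orders to implement location-based search. The sample code covers the whole process, from building the index to running spatial queries.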

Lucene

sf2gis@163.com

June 26, 2015

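1 Goal: find the documents that contain a target term.

Reference: http://www.cnblogs.com/forfuture1978/category/300665.html

2 Principle: build a term index over the target documents, then look up the documents associated with the query terms.

Creating the index: tokenize each target document and build a per-document term index from the tokens. Merge the indexes of the individual documents into a single term index for the whole folder. Terms are weighted by how often they occur in a single document and across all documents, and these weights drive the ranking.

Querying the index: tokenize the query, look the resulting terms up in the index, and rank the results by weight.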
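The weighting idea described above (frequency within one document versus frequency across all documents) can be illustrated with a classic TF-IDF score. This is a minimal sketch in plain Java, assuming the textbook formula tf * ln(N / df); it is not Lucene's actual scoring code, and the class name `TfIdfSketch`, its methods, and the sample corpus are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of term weighting; not Lucene's scoring implementation.
class TfIdfSketch {
    // A tiny, already tokenized sample corpus (made up for this sketch).
    static List<List<String>> sampleCorpus() {
        return Arrays.asList(
            Arrays.asList("lucene", "index", "search"),
            Arrays.asList("spatial", "index", "index"),
            Arrays.asList("lucene", "spatial", "query"));
    }

    // Term frequency: how often the term occurs in one document.
    static long termFrequency(List<String> docTokens, String term) {
        return docTokens.stream().filter(term::equals).count();
    }

    // Document frequency: in how many documents the term occurs at least once.
    static long documentFrequency(List<List<String>> corpus, String term) {
        return corpus.stream().filter(d -> d.contains(term)).count();
    }

    // Textbook weight tf * ln(N / df); 0 when the term does not occur in the corpus.
    static double score(List<List<String>> corpus, List<String> doc, String term) {
        long df = documentFrequency(corpus, term);
        if (df == 0) {
            return 0.0;
        }
        double idf = Math.log((double) corpus.size() / df);
        return termFrequency(doc, term) * idf;
    }

    public static void main(String[] args) {
        List<List<String>> corpus = sampleCorpus();
        for (int i = 0; i < corpus.size(); i++) {
            System.out.printf("doc %d, term 'index': %.3f%n", i, score(corpus, corpus.get(i), "index"));
        }
    }
}
```

A term that is frequent in one document but rare in the rest of the corpus gets the highest weight, which is the intuition behind ranking by both frequencies.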

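3 Method: Lucene

Full-text search: structured data is usually queried and managed with a database. Unstructured data, also called full-text data, is inefficient to query or process by sequential scanning. Full-text search is the process of organizing full-text data and building a structured index over it so that it can be retrieved easily.

Forward index: index information is created for the content of each file, so the index maps a document to its content.

Inverted index: index information is created for each term in the files, so the index maps a term to the files that contain it.

Dictionary: the keywords to be indexed. An analyzer breaks the text into terms, and the set of all resulting keywords is called the dictionary.

Lucene: an open-source Apache project that supports full-text search over plain text only. Its two main functions are creating an index and querying an index.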
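The difference between a forward index and an inverted index is easiest to see side by side. The following is a small sketch using plain Java collections, assuming documents arrive already tokenized; `IndexSketch` and its method names are hypothetical and say nothing about how Lucene stores its index internally.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory illustration of forward vs. inverted indexes.
class IndexSketch {
    // Forward index: document id -> terms of that document.
    private final Map<Integer, List<String>> forward = new HashMap<>();
    // Inverted index: term -> ids of the documents that contain it.
    private final Map<String, SortedSet<Integer>> inverted = new TreeMap<>();

    void add(int docId, String... terms) {
        forward.put(docId, Arrays.asList(terms));
        for (String term : terms) {
            inverted.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Forward lookup: which terms does this document contain?
    List<String> termsOf(int docId) {
        return forward.getOrDefault(docId, Collections.<String>emptyList());
    }

    // Inverted lookup: which documents contain this term?
    SortedSet<Integer> docsWith(String term) {
        return inverted.getOrDefault(term, Collections.<Integer>emptySortedSet());
    }

    public static void main(String[] args) {
        IndexSketch idx = new IndexSketch();
        idx.add(1, "lucene", "index");
        idx.add(2, "spatial", "index");
        System.out.println("terms of doc 2: " + idx.termsOf(2));
        System.out.println("docs with 'index': " + idx.docsWith("index"));
    }
}
```

A search for a word only needs the inverted map, which is why full-text engines are built around it.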

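4 Method: creating an index, IndexWriter.

Goal: use an analyzer to turn documents into an index and write it to a specified directory.

Method:

Configure the index: IndexWriterConfig.

Add index content: addDocument() adds a document to the index.

Generate the index: by default the index is generated on close().

Operate on the index: index content can be added, deleted, and updated.

4.1 Configuration: IndexWriterConfig

Takes the analyzer as its constructor argument.

Example: IndexWriterConfig cfg = new IndexWriterConfig(analyzer);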

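4.2 Document: a plain-text document, Document.

Goal: represents one original file. It stores strings.

Method: a Document consists of multiple Fields, and each Field is made up of a key, a value, and a type.

4.3 Analyzer: Analyzer.

Goal: tokenize documents and produce the dictionary.

Method: a commonly used analyzer for English is StandardAnalyzer. Analyzers for Chinese include cjk, cn, and smartcn.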
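To make "tokenize documents and produce the dictionary" concrete, here is a deliberately naive sketch in plain Java. It lower-cases the text and splits on anything that is not a letter or digit, which is an assumption made for illustration only; it does not reproduce StandardAnalyzer or any Chinese analyzer, and `DictionarySketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical, naive tokenizer; real analyzers are considerably more sophisticated.
class DictionarySketch {
    // Lower-case the text and split on runs of characters that are neither letters nor digits.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // The dictionary is the sorted set of distinct terms over all documents.
    static SortedSet<String> dictionary(List<String> documents) {
        SortedSet<String> dict = new TreeSet<>();
        for (String document : documents) {
            dict.addAll(tokenize(document));
        }
        return dict;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("This is a test document,1234xxx"));
        System.out.println(dictionary(Arrays.asList("This is a test", "A second test document")));
    }
}
```

The query text has to go through the same tokenization as the indexed text, otherwise the terms will not match.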

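4.4 Index directory: Directory.

Goal: the final output of indexing. It can be stored on the file system or in memory.

Method: FSDirectory, RAMDirectory.

4.4.1 File system output: FSDirectory.

Goal: an output directory on the file system.

Method: there are three subclass implementations for different environments. open() automatically picks the most suitable one for the current environment.

open() takes a Path, which is obtained from Paths. The Paths class provides two static methods that convert a URI or a String into a Path object.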
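The two conversions mentioned above are `Paths.get(String, String...)` and `Paths.get(URI)` from `java.nio.file`. The sketch below only exercises these JDK calls; the wrapper class `PathsSketch` and the directory name are illustrative, and no Lucene API is involved.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

// Shows the two ways of obtaining a java.nio.file.Path that FSDirectory.open() could be given.
class PathsSketch {
    // Convert a path string (relative or absolute) into a Path.
    static Path fromString(String dir) {
        return Paths.get(dir);
    }

    // Convert a file: URI into a Path.
    static Path fromUri(URI uri) {
        return Paths.get(uri);
    }

    public static void main(String[] args) {
        Path relative = fromString("test_xx_path");
        // Round trip through a URI to demonstrate the second conversion.
        Path viaUri = fromUri(relative.toAbsolutePath().toUri());
        System.out.println(relative.toAbsolutePath());
        System.out.println(viaUri);
    }
}
```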

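4.4.2 In-memory output: RAMDirectory.

Goal: an output directory held in memory, for fast processing.

Method: use it for small data volumes. It buffers in 1 KB units, and beyond a few hundred MB it wastes memory.

4.4.3 Generated content: index files

The whole output directory acts as one complete index.

Organization: segment (a group of documents whose files share the same prefix), document, field, term. The index contains both forward and inverted information.

Segment info: segments_xxx.

Field info: .fxx.

Term info: .txx.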

4.5 Example

    // Snippet only; see 5.3 for the complete class with its imports.
    public static void main(String[] args) throws IOException {
        // Create the index
        Document doc = new Document();
        doc.add(new Field("myField", "This is a test document,1234xxx",
                org.apache.lucene.document.TextField.TYPE_STORED));
        // Directory output = new RAMDirectory(); // write the index to RAM instead
        Directory output = FSDirectory.open(Paths.get("test_xx_path")); // index directory on the file system
        Analyzer analyzer = new StandardAnalyzer();
        IndexWriterConfig cfg = new IndexWriterConfig(analyzer);
        IndexWriter writer = new IndexWriter(output, cfg);
        writer.addDocument(doc);
        writer.close(); // the index is generated on close()
        System.out.println("OK");
    }

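5 Method: querying an index, IndexSearcher.

Goal: use an analyzer to break the query text into terms and build a query, then look the terms up in the index directory and output the results.

Method:

5.1 Building the query: QueryParser

Parsing the query: QueryParser tokenizes the query with the same analyzer that was used to build the index (a default Field can be specified).

Building the query: parse().

5.2 Searching: IndexSearcher.

Reading the index: DirectoryReader. Use open() to open the index Directory, and call close() when finished.

Searching: search().

5.3 Example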

    package com.thbd.luceneTest;

    import java.io.IOException;
    import java.nio.file.Paths;

    import org.apache.lucene.analysis.*;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.*;
    import org.apache.lucene.index.*;
    import org.apache.lucene.queryparser.classic.ParseException;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.store.*;

    /**
     * @author zhangli3
     */
    public class LuceneTest {

        /**
         * @param args
         * @throws IOException
         * @throws ParseException
         */
        public static void main(String[] args) throws IOException, ParseException {
            // Create the index
            Document doc = new Document();
            doc.add(new Field("myField", "This is a test document,1234xxx", TextField.TYPE_STORED));
            // Directory output = new RAMDirectory(); // write the index to RAM instead
            Directory output = FSDirectory.open(Paths.get("test_xx_path")); // index directory on the file system
            Analyzer analyzer = new StandardAnalyzer();
            IndexWriterConfig cfg = new IndexWriterConfig(analyzer);
            IndexWriter writer = new IndexWriter(output, cfg);
            writer.addDocument(doc);
            writer.close();
            System.out.println("OK");

            // Search: parse the query with the same analyzer that built the index
            QueryParser parser = new QueryParser("myField", analyzer);
            Query query = parser.parse("test");
            DirectoryReader reader = DirectoryReader.open(output);
            IndexSearcher searcher = new IndexSearcher(reader);
            ScoreDoc[] results = searcher.search(query, 1000).scoreDocs; // at most 1000 hits
            for (int i = 0; i < results.length; ++i) {
                System.out.println("Hit Query");
                Document docHit = searcher.doc(results[i].doc);
                System.out.println(docHit.get("myField"));
            }
            reader.close();
            output.close();
        }
    }

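6 Method: Lucene Spatial, spatial indexing

Goal: geography-based spatial search that can turn a textual description into a spatial relation.

For example, searching for "restaurants within 1 km of Jiuxianqiao".

Principle: build a spatial index over spatial features. The query is converted into a location and a distance, and spatial algorithms perform the search.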
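Without any index, "everything within 1 km of a location" can be answered by computing the great-circle distance to every candidate, which is the brute-force baseline that a spatial index avoids. The sketch below uses the haversine formula in plain Java; the class `RadiusFilterSketch`, the 6371 km Earth radius, and the sample coordinates are assumptions for illustration and are not taken from Lucene Spatial or Spatial4J.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical brute-force radius search; a spatial index exists to avoid scanning every point like this.
class RadiusFilterSketch {
    static final double EARTH_RADIUS_KM = 6371.0; // approximate mean Earth radius (assumption)

    // Great-circle distance in kilometres between two (latitude, longitude) pairs given in degrees.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Names of all places whose {lat, lon} lies within radiusKm of the given centre.
    static List<String> within(Map<String, double[]> places, double lat, double lon, double radiusKm) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, double[]> entry : places.entrySet()) {
            double[] p = entry.getValue();
            if (haversineKm(lat, lon, p[0], p[1]) <= radiusKm) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up coordinates, stored as {latitude, longitude}.
        Map<String, double[]> places = new LinkedHashMap<>();
        places.put("restaurant A", new double[] {40.005, 116.0});
        places.put("restaurant B", new double[] {40.020, 116.0});
        System.out.println(within(places, 40.0, 116.0, 1.0));
    }
}
```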

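Method:

6.1 Building the spatial index: use an indexing strategy to convert spatial features into indexable Fields, which are then searched like full-text data.

6.1.1 Spatial algorithm library: Spatial4J

Reference: Spatial4J (https://github.com/locationtech/spatial4j).

6.1.2 Basic functionality: SpatialContext is the basic entry point of Spatial4J and provides basics such as creating geometry types.

makePoint(), makeCircle().

6.1.3 Indexing strategy: SpatialStrategy.

Goal: define how shapes are indexed, matched, and sorted.

Method:

Create a spatial filter: makeFilter().

Create a filtered query: the FilteredQuery class represents a query that applies a filter.

Convert a shape into indexable spatial Fields: createIndexableFields().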
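The examples in 6.3 build their strategy on a GeohashPrefixTree, so it helps to see how a point becomes a string that a text index can store. The sketch below implements the generic, publicly documented geohash encoding in plain Java. It is an illustration of the idea only, under the assumption that prefix-tree cells behave like geohash prefixes; it is not the code Lucene runs, and `GeohashSketch` is a hypothetical name.

```java
// Hypothetical illustration of geohash encoding; not Lucene's GeohashPrefixTree implementation.
class GeohashSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    // Encode a latitude/longitude pair (degrees) into a geohash of the given length.
    static String encode(double lat, double lon, int length) {
        double latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
        StringBuilder hash = new StringBuilder();
        boolean useLon = true; // bits alternate between longitude and latitude, longitude first
        int bitCount = 0;
        int value = 0;
        while (hash.length() < length) {
            if (useLon) {
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) { value = (value << 1) | 1; lonMin = mid; }
                else { value = value << 1; lonMax = mid; }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { value = (value << 1) | 1; latMin = mid; }
                else { value = value << 1; latMax = mid; }
            }
            useLon = !useLon;
            if (++bitCount == 5) { // every 5 bits become one base-32 character
                hash.append(BASE32.charAt(value));
                bitCount = 0;
                value = 0;
            }
        }
        return hash.toString();
    }

    public static void main(String[] args) {
        // Same point at two precisions: the shorter hash is a prefix of the longer one.
        System.out.println(encode(40.0, 116.0, 5));
        System.out.println(encode(40.0, 116.0, 11));
    }
}
```

Because a shorter hash is always a prefix of a longer hash for the same point, nearby points tend to share prefixes, and prefix matching on indexed terms can stand in for a coarse spatial lookup.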

6.1.4 Example:

    // Add shapes to a document as indexable fields
    private Document addShapeDoc(SpatialStrategy strategy, int id, Shape... shapes) {
        Document doc = new Document();
        doc.add(new StoredField("id", id));            // stored so it can be printed
        doc.add(new NumericDocValuesField("id", id));  // doc values so it can be sorted on
        for (Shape shp : shapes) {
            for (Field field : strategy.createIndexableFields(shp)) { // create indexable fields
                doc.add(field);
            }
            doc.add(new StoredField("MyGeoShape" + id, shp.toString()));
        }
        return doc;
    }

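6.2 Spatial search: specify the search criteria and the sort order, then search

6.2.1 Search criteria: SpatialArgs sets the query shape and the spatial relation to test.

6.2.1.1 Searching by spatial relation (filtering, or sorting by distance):

    Point pt = ctx.makePoint(115.8, 40); // x = longitude, y = latitude
    // Circle of radius 2 around pt (in degrees when ctx is SpatialContext.GEO)
    SpatialArgs args = new SpatialArgs(SpatialOperation.Intersects, ctx.makeCircle(pt, 2));
    Filter filter = strategy.makeFilter(args);
    FilteredQuery query = new FilteredQuery(new MatchAllDocsQuery(), filter);

6.2.1.2 Searching by name: TermQuery, sorted by distance.

Reference: http://qxf567.iteye.com/blog/1984042

    Query query = new TermQuery(new Term("key", "jiandemen"));
    TopDocs results = searcher.search(query, 10);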

6.2.2 Sort order: Sort can order by a given field or by distance.

6.2.2.1 Sorting by a field: SortField

    Sort idSort = new Sort(new SortField("id", SortField.Type.INT));

6.2.2.2 Sorting by distance: ValueSource

    ValueSource valuesource = strategy.makeDistanceValueSource(pt, DistanceUtils.DEG_TO_KM);
    Sort distSort = new Sort(valuesource.getSortField(false).rewrite(searcher)); // false = ascending, nearest first
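What the distance sort amounts to can be sketched without Lucene: compute the angular distance from the query point to each indexed point in degrees, multiply by a degrees-to-kilometres factor, and order ascending. The plain Java sketch below uses the spherical law of cosines and the three sample points from the examples in 6.3. The class `DistanceSortSketch`, the 6371 km radius, and the locally computed `DEG_TO_KM` constant are assumptions and only approximate what Spatial4J's DistanceUtils provides.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of distance sorting; not the ValueSource machinery Lucene uses.
class DistanceSortSketch {
    static final double EARTH_RADIUS_KM = 6371.0;                           // assumption
    static final double DEG_TO_KM = 2 * Math.PI * EARTH_RADIUS_KM / 360.0;  // km per degree of arc

    // Central angle in degrees between two points given as (lon, lat) in degrees.
    static double angularDistanceDeg(double lon1, double lat1, double lon2, double lat2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dLambda = Math.toRadians(lon2 - lon1);
        double cosAngle = Math.sin(phi1) * Math.sin(phi2)
                + Math.cos(phi1) * Math.cos(phi2) * Math.cos(dLambda);
        // Clamp against rounding errors before acos.
        cosAngle = Math.max(-1.0, Math.min(1.0, cosAngle));
        return Math.toDegrees(Math.acos(cosAngle));
    }

    // Ids ordered by ascending distance (km) from the query point; points are stored as {lon, lat}.
    static List<Integer> sortIdsByDistance(Map<Integer, double[]> points, double lon, double lat) {
        List<Integer> ids = new ArrayList<>(points.keySet());
        ids.sort(Comparator.comparingDouble((Integer id) ->
                angularDistanceDeg(lon, lat, points.get(id)[0], points.get(id)[1]) * DEG_TO_KM));
        return ids;
    }

    public static void main(String[] args) {
        Map<Integer, double[]> points = new LinkedHashMap<>();
        points.put(10, new double[] {114, 40});
        points.put(23, new double[] {116, 40});
        points.put(12, new double[] {118, 40});
        System.out.println(sortIdsByDistance(points, 115.8, 40));
    }
}
```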

6.3 Examples

6.3.1 Example: find the points within a given range of a location, sorted by ID

    public void testLuceneSpatial() throws Exception {
        // Spatial tools
        SpatialContext ctx = SpatialContext.GEO;
        SpatialPrefixTree spt = new GeohashPrefixTree(ctx, 11);
        SpatialStrategy strategy = new RecursivePrefixTreeStrategy(spt, "mygeofield");

        // Create the spatial index
        IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
        Directory output = new RAMDirectory();
        IndexWriter writer = new IndexWriter(output, cfg);
        writer.addDocument(addShapeDoc(strategy, 10, ctx.makePoint(114, 40)));
        writer.addDocument(addShapeDoc(strategy, 23, ctx.makePoint(116, 40)));
        writer.addDocument(addShapeDoc(strategy, 12, ctx.makePoint(118, 40)));
        writer.close();

        // Search the spatial index: everything intersecting a circle of radius 2 around pt
        Point pt = ctx.makePoint(115.8, 40);
        SpatialArgs args = new SpatialArgs(SpatialOperation.Intersects, ctx.makeCircle(pt, 2));
        Filter filter = strategy.makeFilter(args);
        FilteredQuery query = new FilteredQuery(new MatchAllDocsQuery(), filter);
        DirectoryReader reader = DirectoryReader.open(output);
        IndexSearcher searcher = new IndexSearcher(reader);
        Sort idSort = new Sort(new SortField("id", SortField.Type.INT));
        TopDocs results = searcher.search(query, 10, idSort);
        printTopDocs(results, searcher);
        reader.close();
        output.close();
    }

    // Add shapes to a document as indexable fields
    private Document addShapeDoc(SpatialStrategy strategy, int id, Shape... shapes) {
        Document doc = new Document();
        doc.add(new StoredField("id", id));
        doc.add(new NumericDocValuesField("id", id));
        for (Shape shp : shapes) {
            for (Field field : strategy.createIndexableFields(shp)) { // create indexable fields
                doc.add(field);
            }
            doc.add(new StoredField("MyGeoShape" + id, shp.toString()));
        }
        return doc;
    }

    // Print the internal doc number and the stored fields of every hit
    private void printTopDocs(TopDocs docs, IndexSearcher searcher) throws IOException {
        for (ScoreDoc scoredoc : docs.scoreDocs) {
            System.out.println("Doc=" + scoredoc.doc);
            Document doc = searcher.doc(scoredoc.doc);
            System.out.println(doc.toString());
        }
    }

Result:

    Doc=0
    Document<stored<id:10>stored<MyGeoShape10:Pt(x=114.0,y=40.0)>>
    Doc=2
    Document<stored<id:12>stored<MyGeoShape12:Pt(x=118.0,y=40.0)>>
    Doc=1
    Document<stored<id:23>stored<MyGeoShape23:Pt(x=116.0,y=40.0)>>

6.3.2 Example: find the points around a given coordinate, sorted by distance

    public void testLuceneSpatial() throws Exception {
        // Spatial tools
        SpatialContext ctx = SpatialContext.GEO;
        SpatialPrefixTree spt = new GeohashPrefixTree(ctx, 11);
        SpatialStrategy strategy = new RecursivePrefixTreeStrategy(spt, "mygeofield");

        // Create the spatial index
        IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
        Directory output = new RAMDirectory();
        IndexWriter writer = new IndexWriter(output, cfg);
        writer.addDocument(addShapeDoc(strategy, 10, ctx.makePoint(114, 40)));
        writer.addDocument(addShapeDoc(strategy, 23, ctx.makePoint(116, 40)));
        writer.addDocument(addShapeDoc(strategy, 12, ctx.makePoint(118, 40)));
        writer.close();

        // Search the spatial index: match everything, nearest to pt first
        Point pt = ctx.makePoint(115.8, 40);
        DirectoryReader reader = DirectoryReader.open(output);
        IndexSearcher searcher = new IndexSearcher(reader);
        ValueSource valuesource = strategy.makeDistanceValueSource(pt, DistanceUtils.DEG_TO_KM);
        Sort distSort = new Sort(valuesource.getSortField(false).rewrite(searcher));
        TopDocs results = searcher.search(new MatchAllDocsQuery(), 10, distSort);
        printTopDocs(results, searcher);
        reader.close();
        output.close();
    }

    // Add shapes to a document as indexable fields
    private Document addShapeDoc(SpatialStrategy strategy, int id, Shape... shapes) {
        Document doc = new Document();
        doc.add(new StoredField("id", id));
        doc.add(new NumericDocValuesField("id", id));
        for (Shape shp : shapes) {
            for (Field field : strategy.createIndexableFields(shp)) { // create indexable fields
                doc.add(field);
            }
            doc.add(new StoredField("MyGeoShape" + id, shp.toString()));
        }
        return doc;
    }

    // Print the internal doc number and the stored fields of every hit
    private void printTopDocs(TopDocs docs, IndexSearcher searcher) throws IOException {
        for (ScoreDoc scoredoc : docs.scoreDocs) {
            System.out.println("Doc=" + scoredoc.doc);
            Document doc = searcher.doc(scoredoc.doc);
            System.out.println(doc.toString());
        }
    }

Result:

    Doc=1
    Document<stored<id:23>stored<MyGeoShape23:Pt(x=116.0,y=40.0)>>
    Doc=0
    Document<stored<id:10>stored<MyGeoShape10:Pt(x=114.0,y=40.0)>>
    Doc=2
    Document<stored<id:12>stored<MyGeoShape12:Pt(x=118.0,y=40.0)>>

6.3.3 Example: find points by place name, sorted by distance from a given location

    package com.thbd.luceneTest;

    import java.io.IOException;
    import java.nio.file.Paths;

    import org.apache.lucene.analysis.*;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.*;
    import org.apache.lucene.document.Field.Store;
    import org.apache.lucene.index.*;
    import org.apache.lucene.queries.function.ValueSource;
    import org.apache.lucene.queryparser.classic.*;
    import org.apache.lucene.search.*;
    import org.apache.lucene.spatial.SpatialStrategy;
    import org.apache.lucene.spatial.prefix.RecursivePrefixTreeStrategy;
    import org.apache.lucene.spatial.prefix.tree.GeohashPrefixTree;
    import org.apache.lucene.spatial.prefix.tree.SpatialPrefixTree;
    import org.apache.lucene.spatial.query.*;
    import org.apache.lucene.store.*;

    import com.spatial4j.core.context.SpatialContext;
    import com.spatial4j.core.distance.DistanceUtils;
    import com.spatial4j.core.shape.*;

    /**
     * Note: Filter/FilteredQuery and the com.spatial4j packages suggest this
     * code targets a Lucene 5.x release.
     *
     * @author zhangli3
     */
    public class LuceneTest {

        /**
         * @param args
         * @throws Exception
         */
        public static void main(String[] args) throws Exception {
            // testLucene();
            new LuceneTest().testLuceneSpatial();
        }

        public void testLuceneSpatial() throws Exception {
            // Spatial tools
            SpatialContext ctx = SpatialContext.GEO;
            SpatialPrefixTree spt = new GeohashPrefixTree(ctx, 11);
            SpatialStrategy strategy = new RecursivePrefixTreeStrategy(spt, "mygeofield");

            // Create the spatial index
            IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
            Directory output = new RAMDirectory();
            IndexWriter writer = new IndexWriter(output, cfg);
            writer.addDocument(addShapeDoc(strategy, 10, "sihui", ctx.makePoint(114, 40)));
            writer.addDocument(addShapeDoc(strategy, 23, "jiuxianqiao", ctx.makePoint(116, 40)));
            writer.addDocument(addShapeDoc(strategy, 12, "jiandemen", ctx.makePoint(118, 40)));
            writer.addDocument(addShapeDoc(strategy, 14, "jiandemen xi", ctx.makePoint(118.5, 40)));
            writer.close();

            // Search the spatial index
            Point pt = ctx.makePoint(115.8, 40);
            DirectoryReader reader = DirectoryReader.open(output);
            IndexSearcher searcher = new IndexSearcher(reader);

            // Alternative 1: filter query, sorted by id
            // SpatialArgs args = new SpatialArgs(SpatialOperation.Intersects, ctx.makeCircle(pt, 2));
            // Filter filter = strategy.makeFilter(args);
            // FilteredQuery query = new FilteredQuery(new MatchAllDocsQuery(), filter);
            // Sort idSort = new Sort(new SortField("id", SortField.Type.INT));
            // TopDocs results = searcher.search(query, 10, idSort);

            // Alternative 2: match everything, sorted by distance
            // ValueSource valuesource = strategy.makeDistanceValueSource(pt, DistanceUtils.DEG_TO_KM);
            // Sort distSort = new Sort(valuesource.getSortField(false).rewrite(searcher));
            // TopDocs results = searcher.search(new MatchAllDocsQuery(), 10, distSort);

            // Term query, sorted by distance
            ValueSource valuesource = strategy.makeDistanceValueSource(pt, DistanceUtils.DEG_TO_KM);
            Sort distSort = new Sort(valuesource.getSortField(false).rewrite(searcher));
            Query query = new TermQuery(new Term("key", "jiandemen"));
            TopDocs results = searcher.search(query, 10, distSort);
            printTopDocs(results, searcher);
            reader.close();
            output.close();
        }

        // Add shapes and a searchable name to a document
        private Document addShapeDoc(SpatialStrategy strategy, int id, String value, Shape... shapes) {
            Document doc = new Document();
            doc.add(new StoredField("id", id));
            doc.add(new NumericDocValuesField("id", id));
            for (Shape shp : shapes) {
                for (Field field : strategy.createIndexableFields(shp)) { // create indexable fields
                    doc.add(field);
                }
                doc.add(new StoredField("MyGeoShape", shp.toString()));
            }
            // Added once per document (the original added it inside the shape loop)
            doc.add(new TextField("key", value, Store.YES));
            return doc;
        }

        // Print the internal doc number and the stored fields of every hit
        private void printTopDocs(TopDocs docs, IndexSearcher searcher) throws IOException {
            for (ScoreDoc scoredoc : docs.scoreDocs) {
                System.out.println("Doc=" + scoredoc.doc);
                Document doc = searcher.doc(scoredoc.doc);
                System.out.println(doc.toString());
            }
        }

        // Plain full-text example from 5.3
        public static void testLucene() throws IOException, ParseException {
            // Create the index
            Document doc = new Document();
            doc.add(new Field("myField", "This is a test document,1234xxx", TextField.TYPE_STORED));
            // Directory output = new RAMDirectory(); // write the index to RAM instead
            Directory output = FSDirectory.open(Paths.get("test_xx_path")); // index directory on the file system
            Analyzer analyzer = new StandardAnalyzer();
            IndexWriterConfig cfg = new IndexWriterConfig(analyzer);
            IndexWriter writer = new IndexWriter(output, cfg);
            writer.addDocument(doc);
            writer.close();
            System.out.println("OK");

            // Search
            QueryParser parser = new QueryParser("myField", analyzer);
            Query query = parser.parse("test");
            DirectoryReader reader = DirectoryReader.open(output);
            IndexSearcher searcher = new IndexSearcher(reader);
            ScoreDoc[] results = searcher.search(query, 1000).scoreDocs;
            for (int i = 0; i < results.length; ++i) {
                System.out.println("Hit Query");
                Document docHit = searcher.doc(results[i].doc);
                System.out.println(docHit.get("myField"));
            }
            reader.close();
            output.close();
        }
    }

Result:

    Doc=2
    Document<stored<id:12>stored<MyGeoShape:Pt(x=118.0,y=40.0)>stored,indexed,tokenized<key:jiandemen>>
    Doc=3
    Document<stored<id:14>stored<MyGeoShape:Pt(x=118.5,y=40.0)>stored,indexed,tokenized<key:jiandemen xi>>