A Detailed Look at the Lucene Full-Text Search Engine

Article link: https://blog.youkuaiyun.com/name_110/article/details/6945934

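Lucene is an efficient, Java-based full-text search library. It provides a complete query engine and indexing engine, plus a partial text-analysis engine. Its goal is to give software developers a simple, easy-to-use toolkit for adding full-text search to a target system, or for building a complete full-text search engine on top of it.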

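As this introduction shows, Lucene is made up of an indexing engine and a query engine, so developing a Lucene-based application generally takes two steps: first build the index, then query the index to get results.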
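
To make the two steps concrete before touching Lucene itself, here is a toy inverted index in plain Java. It is only a sketch of the idea: the class name TinyInvertedIndex and its methods are hypothetical, the tokenizer is a crude split on non-alphanumeric characters, and none of this is meant to describe how Lucene works internally.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Toy illustration only: a hypothetical class, not part of Lucene.
public class TinyInvertedIndex {
    // term -> sorted ids of the documents that contain the term
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    // document id -> document name
    private final List<String> names = new ArrayList<>();

    // Step 1 (indexing): split the text into lowercase terms and
    // record the document id under every term.
    public int addDocument(String name, String text) {
        int docId = names.size();
        names.add(name);
        for (String term : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    // Step 2 (searching): look the term up and return the names of the
    // matching documents in the order they were added.
    public List<String> search(String term) {
        List<String> result = new ArrayList<>();
        TreeSet<Integer> docs = postings.get(term.toLowerCase(Locale.ROOT));
        if (docs != null) {
            for (int docId : docs) {
                result.add(names.get(docId));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.addDocument("a.txt", "I have a dream");
        index.addDocument("b.txt", "Lucene is a search library");
        index.addDocument("c.txt", "Dream big, search often");
        System.out.println(index.search("dream"));   // [a.txt, c.txt]
        System.out.println(index.search("search"));  // [b.txt, c.txt]
        System.out.println(index.search("missing")); // []
    }
}
```

The point of the split is that the expensive work (reading and tokenizing every document) happens once at indexing time, so a query only needs a map lookup.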

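To make Lucene easier to understand, let's start with an example. The code is adapted from Lucene in Action, Second Edition, and comes in two parts: one builds the index and the other queries it. (I did not change what the code does; I only restyled it to my own taste.)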

Indexing code:

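import java.io.File;
import java.io.FileFilter;
import java.io.FileReader;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Indexer {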
	private IndexWriter writer;

	public Indexer()throws Exception {
		String indexDir = "E:\\Test\\Index";         	//directory where the index is stored
		String dataDir = "E:\\Test\\Data";          	//directory holding the files to index
		long start = System.currentTimeMillis();		//start time
	  
		Directory dir = FSDirectory.open(new File(indexDir));
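		//Create the IndexWriter. true = create a new index, overwriting any existing one;
		//UNLIMITED = do not cap the number of terms indexed per field. See the API docs for details.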
		writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_30), true, IndexWriter.MaxFieldLength.UNLIMITED); 
		int numIndexed;
		try {
			numIndexed = index(dataDir, new TextFilesFilter());
		}finally {
			close();
		}
		long end = System.currentTimeMillis();
		System.out.println("Indexing " + numIndexed + " files took " + (end - start) + " milliseconds");
	}

  	public void close() throws IOException {
  		writer.close();  
  	}

  	public int index(String dataDir, FileFilter filter)throws Exception {
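  		File[] files = new File(dataDir).listFiles();//all entries in the data directory
  		if (files == null) //listFiles() returns null if dataDir is not a readable directory
  			throw new IOException("Cannot list directory: " + dataDir);

  		for (File f: files) 
  			if (!f.isDirectory() && !f.isHidden() && f.exists() && 
  					f.canRead() && (filter == null || filter.accept(f)))
  				indexFile(f);//index every file that passes the checks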

  		return writer.numDocs(); 
  	}

  	private void indexFile(File f) throws Exception {
  		System.out.println("Indexing " + f.getCanonicalPath());
  		Document doc = getDocument(f);
  		writer.addDocument(doc);             
  	}
  	
  	protected Document getDocument(File f) throws Exception {
  		Document doc = new Document();
  		doc.add(new Field("contents", new FileReader(f))); //add Fields to the Document
  		doc.add(new Field("filename", f.getName(), Field.Store.YES, Field.Index.NOT_ANALYZED));
  		doc.add(new Field("fullpath", f.getCanonicalPath(),  Field.Store.YES, Field.Index.NOT_ANALYZED));
  		return doc;
  	}
  
  	public static void main(String[] args) throws Exception {
  		new Indexer();
  	}
  
  	private static class TextFilesFilter implements FileFilter {
  		public boolean accept(File path) {
  			return path.getName().toLowerCase().endsWith(".txt");     
  		}
  	}
}
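
The file-selection logic in index() and TextFilesFilter can be tried on its own, without Lucene on the classpath. The sketch below reuses the same checks against a temporary directory. TextFileLister and listIndexable are hypothetical names introduced only for this illustration, and the result is sorted because File.listFiles() makes no promise about ordering.

```java
import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; it mirrors the checks in
// Indexer.index() but does not use any Lucene API.
public class TextFileLister {
    // Same rule as TextFilesFilter: accept names ending in ".txt", ignoring case.
    static final FileFilter TXT_ONLY =
            f -> f.getName().toLowerCase(Locale.ROOT).endsWith(".txt");

    // Returns the sorted names of the regular, readable, non-hidden files
    // in dir that the filter accepts (a null filter accepts everything).
    static List<String> listIndexable(File dir, FileFilter filter) throws IOException {
        File[] files = dir.listFiles();
        if (files == null) { // null means dir is not a readable directory
            throw new IOException("Cannot list directory: " + dir);
        }
        List<String> names = new ArrayList<>();
        for (File f : files) {
            if (!f.isDirectory() && !f.isHidden() && f.canRead()
                    && (filter == null || filter.accept(f))) {
                names.add(f.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("lucene-data");
        Files.write(dir.resolve("notes.txt"), "I have a dream".getBytes());
        Files.write(dir.resolve("README.TXT"), "upper-case extension".getBytes());
        Files.write(dir.resolve("photo.jpg"), new byte[] {1, 2, 3});
        Files.createDirectory(dir.resolve("sub.txt")); // a directory, so it is skipped
        System.out.println(listIndexable(dir.toFile(), TXT_ONLY)); // [README.TXT, notes.txt]
    }
}
```

Note that the check only looks at the immediate children of the directory, just like index() above, so files in subdirectories are not picked up.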

Search code:

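import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Searcher {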

  public static void main(String[] args) throws IllegalArgumentException, IOException, ParseException {
    String indexDir = "E:\\Test\\Index";     //directory where the index is stored
    String q = "dream";                      //keyword to search for
    
    search(indexDir, q);
  }

  public static void search(String indexDir, String q)throws IOException, ParseException {
    Directory dir = FSDirectory.open(new File(indexDir));
    IndexSearcher is = new IndexSearcher(dir);   //create the IndexSearcher, the search-side counterpart of IndexWriter

    //create the QueryParser ("contents" is the default field to search); see the API docs for the parameters
    QueryParser parser = new QueryParser(Version.LUCENE_30, "contents",new StandardAnalyzer(Version.LUCENE_30));
    Query query = parser.parse(q);          //QueryParser turns the human-readable search string into a Lucene Query
    long start = System.currentTimeMillis();
    TopDocs hits = is.search(query, 10); 	//run the search and return the top 10 matching documents
    long end = System.currentTimeMillis();
    
    System.err.println("Found " + hits.totalHits + " document(s) (in " + (end - start) + 
    		" milliseconds) that matched query '" + q + "':");  
    
    for(ScoreDoc scoreDoc : hits.scoreDocs) {
      Document doc = is.doc(scoreDoc.doc);   
      System.out.println(doc.get("fullpath"));//"fullpath" is the stored field set up in Indexer.getDocument
    }
    is.close();         
  }
}
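
The call is.search(query, 10) asks for only the ten best-scoring matches. As a rough illustration of what "top N by score" means, here is a plain-Java sketch that keeps the N highest-scoring hits with a bounded min-heap. TopHits and Hit are hypothetical names, the scores are made up, and this is not a description of Lucene's own collector code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration of top-N selection; not a Lucene class.
public class TopHits {
    // Stand-in for a scored hit: a document id plus a relevance score.
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
        @Override public String toString() { return doc + ":" + score; }
    }

    // Keeps at most n hits in a min-heap ordered by score,
    // then returns them with the best score first.
    static List<Hit> topN(List<Hit> all, int n) {
        Comparator<Hit> byScore = Comparator.comparingDouble((Hit h) -> h.score);
        PriorityQueue<Hit> heap = new PriorityQueue<>(byScore);
        for (Hit h : all) {
            heap.add(h);
            if (heap.size() > n) {
                heap.poll(); // evict the current lowest-scoring hit
            }
        }
        List<Hit> best = new ArrayList<>(heap);
        best.sort(byScore.reversed());
        return best;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit(0, 0.2f), new Hit(1, 0.9f), new Hit(2, 0.5f), new Hit(3, 0.7f));
        System.out.println(topN(hits, 2)); // [1:0.9, 3:0.7]
    }
}
```

Because the heap never holds more than n entries, memory stays proportional to n rather than to the total number of matches.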

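As the Indexer and Searcher code shows, using Lucene involves two steps: first build the index, then query the index to get results, roughly as in the figure below: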