Java tinkering notes: Lucene 1, simple index creation and querying

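This article walks through building a simple full-text index with Lucene and running a basic text query. The sample code indexes a few text files and then finds which files contain a given word.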


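This note covers the simplest possible case with Lucene: creating an index and querying it (I won't explain what Lucene is here).
First, create three txt files. Mine are in_the_end.txt, iridescent.txt and numb.txt (all Linkin Park songs; sadly the lead singer passed away this year), each containing the song's lyrics.
With those ready, create the index. The rough flow is: read the files, have the analyzer split the text into terms, build the index from those terms, and save the index. Here is the code.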
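To make that flow concrete before the Lucene code, here is a minimal sketch of the idea in plain Java. It is not how Lucene is implemented; the class `ToyInvertedIndex`, its methods, and the sample sentences are hypothetical names invented for illustration. It only shows the shape of "split text into terms, then map each term to the documents containing it".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy illustration only: a term -> document-names map, not Lucene's real index format.
class ToyInvertedIndex {

    // term -> names of the documents that contain it
    private final Map<String, Set<String>> postings = new TreeMap<>();

    // Rough stand-in for an analyzer: lowercase, then split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // "Index" one document: record the document name under each of its terms.
    void add(String docName, String content) {
        for (String term : tokenize(content)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docName);
        }
    }

    // Look up which documents contain the exact term.
    Set<String> lookup(String term) {
        return postings.getOrDefault(term, new TreeSet<>());
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add("a.txt", "The end of the day");
        index.add("b.txt", "It may end up late");
        index.add("c.txt", "Nothing to see here");
        System.out.println(index.lookup("end"));  // prints [a.txt, b.txt]
        System.out.println(index.lookup("numb")); // prints []
    }
}
```

Lucene additionally stores positions, frequencies and scoring data, and persists everything to disk; the sketch keeps only the term-to-document mapping in memory.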

Index creation


package cn;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
// Create the full-text index
public class Create {

	public static void main(String[] args) throws Exception {
		
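		// Open the storage directory (where the index is kept once created); think of it as the database
		Directory directory = FSDirectory.open(new File(System.getProperty("user.dir")+File.separator+"dir"));// this directory will end up holding .frq, .prx, .tim and similar files
		// Create the analyzer (splits text into terms); for Chinese text this analyzer splits into single characters
		Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);
		// Create the index writer configuration
		IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_40, analyzer);
		config.setOpenMode(OpenMode.CREATE);
		// Create the index writer
		IndexWriter iw = new IndexWriter(directory, config);
		// The folder containing the files that need to be indexed
		File needIndex = new File(System.getProperty("user.dir")+File.separator+"src"+File.separator+"txt");
		for(File f : needIndex.listFiles()) {
			Document d = new Document();// like one row in a database
			System.out.println(f.getName());
			d.add(new TextField("name", f.getName(), Store.YES));// like a column and its value: the field is name, the value is the txt file name; Store.YES keeps the original value in the index so it can be read back from search results
			d.add(new TextField("content", readTxt(f), Store.YES));// the field is content, the value is the full text of the txt file
			iw.addDocument(d);
		}
		// Commit once after all documents have been added (close() would also commit pending changes)
		iw.commit();
		iw.close();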
	}
	
	public static String readTxt(File file) {
		try {
			BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(file), "UTF-8"));
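			String str = null;
			StringBuilder sbf = new StringBuilder();
			while((str = br.readLine()) != null){
				// Note: lines are appended with no separator, so the last word of one line and the
				// first word of the next are indexed as a single token (e.g. "thingi" in the output below)
				sbf.append(str);
			}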
			br.close();
			return sbf.toString();
		} catch (Exception e) {
			e.printStackTrace();
			return "";
		}
	}

}
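One thing to watch in readTxt: readLine() strips the line terminator, so joining lines directly glues the last word of one line to the first word of the next, and the glued token is what gets indexed. A possible variant that keeps a newline between lines might look like the sketch below. It assumes Java 7 or later for `java.nio.file`; the class name `ReadWithSeparators` and the sample text are hypothetical, and this is not the code that produced the output shown later in this post.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Sketch of a line-preserving reader; not part of the original Create class.
class ReadWithSeparators {

    // Reads a UTF-8 text file and keeps one '\n' after every line,
    // so words at line boundaries remain separate tokens.
    static String readTxt(Path file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Write a two-line temporary file, read it back, then clean up.
        Path tmp = Files.createTempFile("sample", ".txt");
        Files.write(tmp, Arrays.asList("first line ends here", "second line starts"), StandardCharsets.UTF_8);
        String text = readTxt(tmp);
        System.out.println(text.contains("here\nsecond")); // prints true
        Files.delete(tmp);
    }
}
```

With the separator in place, "here" and "second" stay two words; with plain concatenation they would become "heresecond".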
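Once the index has been created, a bunch of files appear under the dir folder. Next, let's run a query: find which txt files contain the word 'end'.
Query code: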


package cn;

import java.io.File;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
// Query the index
public class Search {

	public static void main(String[] args) throws Exception {
		
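		// Path of the directory holding the index (it must be the directory that Create wrote to)
		Directory directory = FSDirectory.open(new File("G:\\eclipseworkspace\\lucence\\dir"));
		// Index reader
		IndexReader read = DirectoryReader.open(directory);
		// Index searcher
		IndexSearcher searcher = new IndexSearcher(read);
		// Build the query
		Query query = new TermQuery(new Term("content","end"));// look for the term 'end' in the content field; the term is matched as-is and StandardAnalyzer lowercased the text at index time, so it must be lowercase here
		// Run the query and get the top results
		TopDocs top = searcher.search(query, 3);// return the top 3
		System.out.println("得到"+top.totalHits+"条记录");// prints "得到N条记录" ("found N records"); totalHits counts every match, even though only the top 3 are returned
		
		// Get the matching docs
		ScoreDoc[] scoreDocs = top.scoreDocs;
		// Iterate over them
		for (ScoreDoc scoreDoc :scoreDocs){
			Document doc = searcher.doc(scoreDoc.doc);// load the stored document for this hit
			System.out.println(doc.get("name") + "      " + doc.get("content"));// print the stored name and content
		}
		read.close();// close the reader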
	}

}
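A point that is easy to trip over: a term query compares the given term exactly against what was indexed, without running it through the analyzer, and StandardAnalyzer lowercases text at index time. So a query for "End" would find nothing while "end" matches. The plain-Java sketch below imitates that behaviour; `ExactTermLookup` and its methods are hypothetical names, and the tokenizer is only a rough stand-in for a real analyzer.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Toy illustration of exact term matching against lowercased index terms.
class ExactTermLookup {

    // Index-time normalization: lowercase, split on non-alphanumerics.
    static Set<String> indexTerms(String content) {
        Set<String> terms = new HashSet<>();
        for (String t : content.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Exact lookup: the query term is compared as given, with no normalization.
    static boolean matchesExact(Set<String> terms, String queryTerm) {
        return terms.contains(queryTerm);
    }

    // Lookup that first applies the same lowercasing used at index time.
    static boolean matchesNormalized(Set<String> terms, String queryTerm) {
        return terms.contains(queryTerm.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Set<String> terms = indexTerms("Until the End of the day");
        System.out.println(matchesExact(terms, "end"));      // prints true
        System.out.println(matchesExact(terms, "End"));      // prints false
        System.out.println(matchesNormalized(terms, "End")); // prints true
    }
}
```

In Lucene itself, the usual way to get the normalized behaviour is to build the query through a query parser configured with the same analyzer that was used for indexing.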

OK, it's that simple; this is just a basic getting-started example.
Output:


得到2条记录
in_the_end.txt      it starts with one thingi don t know whyit doesn t even matterhow hard you trykeep that in mindi designed this rhymeto explain in due timeall i knowtime is a valuable thingwatch it fly byas the pendulum swingswatch it count downto the end of the daythe clock ticks life awayit s so unrealdidn t look out belowwatch the time goright out the windowtrying to hold on,but didn t even knowwasted it all justto watch you goi kept everything inside andeven though i tried,it all fell apartwhat it meant to me willeventually be amemory of a time wheni tried so hardand got so farbut in the end it doesn t even matteri had to fallto lose it allbut in the end it doesn t even matterone thing,i don t know whyit doesn’t even matterhow hard you try,keep that in mindi designed this rhyme,to remind myself howi tried so hardin spite of the wayyou were mocking meacting like i waspart of your propertyremembering all thetimes you fought with mei m surprised it got so (far)things aren t the waythey were beforeyou wouldn t evenrecognise me anymorenot that youknew me back thenbut it all comesback to me (in the end)you kept everything insideand even though i tried,it all fell apartwhat it meant to me willeventually be amemory of a time when ii tried so hardand got so farbut in the endit doesn t even matteri had to fallto lose it allbut in the endit doesn t even matteri ve put my trust in youpushed as far as i can gofor all thisthere s only one thing you should knowi ve put my trust in youpushed as far as i can gofor all thisthere s only one thing you should knowi tried so hardand got so farbut in the endit doesn t even matteri had to fallto lose it allbut in the endit doesn t even matter


numb.txt      i'm tired of being what you want me to be feeling so faithless lost under the surface don't know what you're expecting of me put under the pressure of walking in your shoes (caught in the undertow just caught in the undertow) every step i take is another mistake to you (caught in the undertow just caught in the undertow) i've become so numb i can't feel you there i've become so tired so much more aware i've becoming this all i want to do is be more like me and be less like you can't you see that you're smothering me holding too tightly afraid to lose control cause everything that you thought i would be has fallen apart right in front of you (caught in the undertow just caught in the undertow) every step that i take is another mistake to you (caught in the undertow just caught in the undertow) and every second i waste is more than i can take i've become so numb i can't feel you there i've become so tired so much more aware i've becoming this all i want to do is be more like me and be less like you and i know i may end up failing too but i know you were just like me with someone disappointed in you i've become so numb i can't feel you there i've become so tired so much more aware i've becoming this all i want to do  is be more like me and be less like you i've become so numb is everything what you want me to be i've become so numb is everything what you want me to be 
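The search asked for the top 3 results, but only 2 documents contain the term, so 2 hits are reported and 2 are printed. The relationship between "total hits" and "top n" can be sketched in plain Java as below. This is an assumption-heavy toy: it ranks by raw term count, which is not Lucene's scoring formula, and `TopNSketch`, `Result` and the sample documents are hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy illustration: count all matching documents, but return only the n best.
class TopNSketch {

    static final class Result {
        final int totalHits;          // number of documents that matched at all
        final List<String> topNames;  // names of at most n best-scoring documents
        Result(int totalHits, List<String> topNames) {
            this.totalHits = totalHits;
            this.topNames = topNames;
        }
    }

    // How many times the exact term occurs in the content (case-insensitive).
    static int countTerm(String content, String term) {
        int count = 0;
        for (String t : content.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (t.equals(term)) {
                count++;
            }
        }
        return count;
    }

    static Result search(Map<String, String> docs, String term, int n) {
        List<Map.Entry<String, Integer>> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : docs.entrySet()) {
            int c = countTerm(e.getValue(), term);
            if (c > 0) {
                hits.add(new AbstractMap.SimpleEntry<String, Integer>(e.getKey(), c));
            }
        }
        // Highest count first; the sort is stable, so ties keep insertion order.
        hits.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, hits.size()); i++) {
            top.add(hits.get(i).getKey());
        }
        return new Result(hits.size(), top);
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("a.txt", "end to end");
        docs.put("b.txt", "the end");
        docs.put("c.txt", "nothing");
        docs.put("d.txt", "end end end");
        Result r = search(docs, "end", 2);
        System.out.println(r.totalHits + " hits, top 2: " + r.topNames); // prints 3 hits, top 2: [d.txt, a.txt]
    }
}
```

When n is larger than the number of matches, as in the run above, the returned list simply contains every match.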
