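Introductions to Lucene are easy to find online, so I will skip that here. This post uses Lucene to build a very simple criminal-record search. The goal is for you to finish reading and think: so easy.
Here is the existing table of criminal records: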
Id | Name | Gender | Birthday | Crime | Description |
---|---|---|---|---|---|
1 | 隔壁老王 | Male | 1988/08/08 | 造大头,蹭饭 | 小头爸爸的天敌… |
2 | 王南翔 | Male | 1999/01/05 | 破坏公共安全 | 南翔挖掘机学校… |
3 | 绿坝妹 | Female | 2008/07/22 | 传播不良信息,蹭饭 | 向青少年传播不… |
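Goals:
1. Find all male criminals.
2. Find all criminals who have committed the crime ‘蹭饭’ (freeloading meals).
3. Find the criminals whose description contains ‘挖掘’ (excavate).
First, create a Java project. We build it with Gradle here; create a build.gradle file with the following content: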
apply plugin: 'java'
apply plugin: 'eclipse'

def projectName = "lucene-1"
version = '0.0.1-SNAPSHOT'
sourceCompatibility = 1.8

// Source directories
sourceSets {
    main {
        java {
            srcDirs = ['src/main/java']
        }
        resources {
            srcDirs = ['src/main/resources']
        }
    }
    test {
        java {
            srcDirs = ['src/test/java']
        }
        resources {
            srcDirs = ['src/test/resources']
        }
    }
}
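// Task that creates the source directories
// (doLast replaces the older `<<` shorthand, which newer Gradle versions no longer accept)
task createJavaProject {
    doLast {
        sourceSets*.java.srcDirs*.each { it.mkdirs() }
        sourceSets*.resources.srcDirs*.each { it.mkdirs() }
    }
}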
// Online repositories
// The OSC mirror is listed only for faster downloads; Maven Central alone also works
repositories {
    maven {
        url 'http://maven.oschina.net/content/groups/public/'
    }
    mavenCentral()
}

// Project dependencies
dependencies {
    compile 'org.apache.lucene:lucene-core:5.0.0'
    compile 'org.apache.lucene:lucene-queries:5.0.0'
    compile 'org.apache.lucene:lucene-analyzers-common:5.0.0'
    compile 'joda-time:joda-time:2.6'
    testCompile('junit:junit:4.12')
}
Run the following command to generate the source directories and the Eclipse project files:
gradle createJavaProject eclipse
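Import the newly created project into Eclipse and open it. Under src/main/java/, create a criminals package and add a Criminal class to it (an entity class that stores a criminal's record). Criminal.java looks like this: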
package criminals;

import org.joda.time.DateTime;

// Criminal
// Entity class that holds one criminal's record
public class Criminal {
    private long id;
    private String name;
    private String gender;
    private DateTime birthday;
    private String crime;
    private String description;

    public Criminal(long id, String name, String gender, DateTime birthday,
            String crime, String description) {
        super();
        this.id = id;
        this.name = name;
        this.gender = gender;
        this.birthday = birthday;
        this.crime = crime;
        this.description = description;
    }

    public long getId() {
        return id;
    }

    public void setId(long id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getGender() {
        return gender;
    }

    public void setGender(String gender) {
        this.gender = gender;
    }

    public DateTime getBirthday() {
        return birthday;
    }

    public void setBirthday(DateTime birthday) {
        this.birthday = birthday;
    }

    public String getCrime() {
        return crime;
    }

    public void setCrime(String crime) {
        this.crime = crime;
    }

    public String getDescription() {
        return description;
    }

    public void setDescription(String description) {
        this.description = description;
    }

    @Override
    public String toString() {
        return "Criminal [id=" + id + ", name=" + name + ", gender=" + gender
                + ", birthday=" + birthday + ", crime=" + crime
                + ", description=" + description + "]";
    }
}
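A side note on the birthday field: Joda's DateTime carries a time of day and a time zone, although a birthday only needs a date. Below is a minimal sketch, assuming you would rather use java.time.LocalDate (part of Java 8, and not what this post's code uses). BirthdaySketch and its methods are hypothetical names. It formats the date with the same yyyy-MM-dd pattern the indexing code uses later, and it prints the UTC offset at midnight in Asia/Shanghai, which may help explain why the test output shows +09:00 for the 1988 birthday and +08:00 for the others (presumably daylight saving time was in effect that summer).

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class BirthdaySketch {
    private static final DateTimeFormatter ISO_DAY = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    // Formats a date-only birthday as yyyy-MM-dd.
    static String format(LocalDate birthday) {
        return birthday.format(ISO_DAY);
    }

    // UTC offset in effect at the start of the given day in the given zone.
    static ZoneOffset offsetAtStartOfDay(LocalDate day, ZoneId zone) {
        return day.atStartOfDay(zone).getOffset();
    }

    public static void main(String[] args) {
        LocalDate birthday = LocalDate.parse("1988-08-08");
        System.out.println(format(birthday));
        // The printed offset depends on the time zone data shipped with the JDK.
        System.out.println(offsetAtStartOfDay(birthday, ZoneId.of("Asia/Shanghai")));
    }
}
```

A LocalDate has no zone at all, so the offset never leaks into toString() output or into the index.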
Next, create CriminalDao.java to simulate the database access layer:
package criminals;

import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Database access layer (an in-memory map stands in for a real database)
public class CriminalDao {
    private static Map<String, Criminal> criminals = new HashMap<String, Criminal>();

    public Collection<Criminal> getAllCriminals() {
        return criminals.values();
    }

    public void saveCriminal(Criminal criminal) {
        criminals.put(String.valueOf(criminal.getId()), criminal);
    }

    public Criminal getById(String id) {
        return criminals.get(id);
    }
}
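One detail of this DAO is easy to miss: the map is static, so every CriminalDao instance shares the same records, and the keys are the id converted to a String. A minimal sketch of that behavior follows. SharedStoreSketch and its nested Dao are hypothetical stand-ins that store only a name, not the real class.

```java
import java.util.HashMap;
import java.util.Map;

public class SharedStoreSketch {
    // Hypothetical stand-in for CriminalDao: the map is static, so it belongs
    // to the class and not to any single instance.
    static class Dao {
        private static final Map<String, String> STORE = new HashMap<>();

        void save(long id, String name) {
            STORE.put(String.valueOf(id), name); // key is the id as a String
        }

        String getById(String id) {
            return STORE.get(id);
        }
    }

    public static void main(String[] args) {
        Dao first = new Dao();
        first.save(1, "隔壁老王");

        Dao second = new Dao(); // a brand-new instance
        System.out.println(second.getById("1")); // sees the record saved through `first`
    }
}
```

For this post that is harmless, since each save simply overwrites the entry with the same id. In a larger test suite it would mean that records saved by one test are still visible to the next.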
Then the Jail class:
package criminals;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.cjk.CJKAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

// Jail
// Maintains the criminal records and provides the search service
public class Jail {
    private static final int MAX_COUNT = 16;

    private IndexWriter indexWriter = null;
    private Analyzer analyzer = null;
    private Directory directory = null;
    private IndexReader indexReader = null;
    private IndexSearcher indexSearcher = null;
    private CriminalDao criminalDao = null;

    // Initialize the jail system: in-memory index, CJK analyzer, DAO
    public void initialize() throws IOException {
        directory = new RAMDirectory();
        analyzer = new CJKAnalyzer();
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        indexWriter = new IndexWriter(directory, config);
        criminalDao = new CriminalDao();
    }

    // Put a criminal in jail
    public void addCriminal(Criminal criminal) {
        criminalDao.saveCriminal(criminal);
    }

    // Build the index over all criminals
    public void buildIndexies() throws IOException {
        System.out.println("开始建索引>>>");
        Collection<Criminal> criminals = criminalDao.getAllCriminals();
        indexWriter.deleteAll();
        for (Criminal criminal : criminals) {
            Document doc = new Document();
            // StringField types: the whole value is indexed as a single term
            doc.add(new Field("id", String.valueOf(criminal.getId()), StringField.TYPE_STORED));
            doc.add(new Field("name", criminal.getName(), StringField.TYPE_STORED));
            doc.add(new Field("gender", criminal.getGender(), StringField.TYPE_STORED));
            doc.add(new Field("birthday", criminal.getBirthday().toString("yyyy-MM-dd"), StringField.TYPE_STORED));
            // TextField types: the value is run through the analyzer before indexing
            doc.add(new Field("crime", criminal.getCrime(), TextField.TYPE_STORED));
            doc.add(new Field("description", criminal.getDescription(), TextField.TYPE_STORED));
            indexWriter.addDocument(doc);
            System.out.println("为罪犯:" + criminal.toString() + "建立索引完成.");
        }
        indexWriter.commit();
        System.out.println("共建立" + String.valueOf(indexWriter.numDocs()) + "个罪犯记录索引.");
        System.out.println("建索引结束<<<");
    }

    // Search criminal records by an exact term in the given field
    public Collection<Criminal> searchCriminals(String field, String keyWord) throws IOException {
        System.out.println("开始查所引>>>");
        indexReader = DirectoryReader.open(directory);
        indexSearcher = new IndexSearcher(indexReader);
        Collection<Criminal> criminals = new ArrayList<Criminal>();
        Term term = new Term(field, keyWord);
        TermQuery query = new TermQuery(term);
        System.out.println("查询条件:" + field + "->" + keyWord);
        TopDocs topDocs = indexSearcher.search(query, MAX_COUNT);
        System.out.println("查到如下罪犯信息:");
        for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
            int docId = scoreDoc.doc;
            Document doc = indexSearcher.doc(docId);
            // Load the full record from the DAO using the stored id
            Criminal criminal = criminalDao.getById(doc.get("id"));
            System.out.println(criminal.toString());
            criminals.add(criminal);
        }
        System.out.println("查索引结束<<<");
        return criminals;
    }
}
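Why can a TermQuery for a two-character word find anything inside a longer sentence? As far as I know, CJKAnalyzer indexes runs of Han characters as overlapping two-character terms (bigrams), and TermQuery looks up its term exactly as given, without analyzing it. The sketch below is a plain-Java simplification of that idea, not Lucene's implementation. BigramSketch is a hypothetical name, and the splitting rule (break on anything that is not a letter or digit) is my assumption and only makes sense for Han text.

```java
import java.util.ArrayList;
import java.util.List;

public class BigramSketch {
    // Simplified stand-in for CJK bigram analysis (NOT Lucene's implementation):
    // split on anything that is not a letter or digit, then emit overlapping
    // two-character terms for each run; a one-character run is emitted as-is.
    static List<String> bigrams(String text) {
        List<String> terms = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (int i = 0; i <= text.length(); i++) {
            boolean inRun = i < text.length() && Character.isLetterOrDigit(text.charAt(i));
            if (inRun) {
                run.append(text.charAt(i));
            } else if (run.length() > 0) {
                emit(run.toString(), terms);
                run.setLength(0);
            }
        }
        return terms;
    }

    private static void emit(String run, List<String> terms) {
        if (run.length() == 1) {
            terms.add(run);
            return;
        }
        for (int i = 0; i + 2 <= run.length(); i++) {
            terms.add(run.substring(i, i + 2));
        }
    }

    public static void main(String[] args) {
        System.out.println(bigrams("传播不良信息,蹭饭"));
        // prints [传播, 播不, 不良, 良信, 信息, 蹭饭]
        System.out.println(bigrams("南翔挖掘机学校毕业").contains("挖掘"));   // true
        System.out.println(bigrams("南翔挖掘机学校毕业").contains("挖掘机")); // false
    }
}
```

If this model is right, the searches in this post work because ‘蹭饭’ and ‘挖掘’ happen to be exactly two characters long. A three-character keyword such as ‘挖掘机’ is not a term in the index, so it would have to be analyzed first (for example through a query parser) before searching.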
Finally, create a test class for the jail to run the corresponding checks:
package criminals;

import static org.junit.Assert.*;

import java.io.IOException;
import java.util.Collection;

import org.joda.time.DateTime;
import org.junit.Before;
import org.junit.Test;

public class JailTest {
    private Jail jail;

    @Before
    public void setUp() throws Exception {
        // Initialize the jail system
        jail = new Jail();
        jail.initialize();
        // Put the criminals in jail
        Criminal criminal;
        criminal = new Criminal(1, "隔壁老王", "Male", new DateTime("1988-08-08"), "大头,蹭饭",
                "小头爸爸的天敌,经常上小头爸爸家蹭饭,不过此人对大头儿子不错.经常关心大头的学业.");
        jail.addCriminal(criminal);
        criminal = new Criminal(2, "王南翔", "Male", new DateTime("1999-01-05"), "破坏公共安全",
                "南翔挖掘机学校毕业,深得南翔黑客大师黑翔的真传,前不久刚挖断某ET家的网线.");
        jail.addCriminal(criminal);
        criminal = new Criminal(3, "绿坝妹", "Female", new DateTime("2008-07-22"), "传播不良信息,蹭饭",
                "向青少年传播不良网站列表,据传绿坝妹幼时曾备加菲猫挠过,留下心里阴影,所以但凡有加菲猫出现的内容都不让小朋友观看.");
        jail.addCriminal(criminal);
    }

    @Test
    public void test() throws IOException {
        jail.buildIndexies();
        Collection<Criminal> criminals;
        // Find all male criminals (JUnit's assertEquals takes the expected value first)
        criminals = jail.searchCriminals("gender", "Male");
        assertEquals(2, criminals.size());
        for (Criminal criminal : criminals) {
            assertEquals("Male", criminal.getGender());
        }
        // Find all criminals who have committed the crime ‘蹭饭’
        criminals = jail.searchCriminals("crime", "蹭饭");
        assertEquals(2, criminals.size());
        for (Criminal criminal : criminals) {
            assertTrue(criminal.getName().equals("隔壁老王") || criminal.getName().equals("绿坝妹"));
        }
        // Find the criminals whose description contains ‘挖掘’
        criminals = jail.searchCriminals("description", "挖掘");
        assertEquals(1, criminals.size());
    }
}
Time to witness the miracle. Run the test class and look at the output: exactly what we expected!
开始建索引>>>
为罪犯:Criminal [id=1, name=隔壁老王, gender=Male, birthday=1988-08-08T00:00:00.000+09:00, crime=大头,蹭饭, description=小头爸爸的天敌,经常上小头爸爸家蹭饭,不过此人对大头儿子不错.经常关心大头的学业.]建立索引完成.
为罪犯:Criminal [id=2, name=王南翔, gender=Male, birthday=1999-01-05T00:00:00.000+08:00, crime=破坏公共安全, description=南翔挖掘机学校毕业,深得南翔黑客大师黑翔的真传,前不久刚挖断某ET家的网线.]建立索引完成.
为罪犯:Criminal [id=3, name=绿坝妹, gender=Female, birthday=2008-07-22T00:00:00.000+08:00, crime=传播不良信息,蹭饭, description=向青少年传播不良网站列表,据传绿坝妹幼时曾备加菲猫挠过,留下心里阴影,所以但凡有加菲猫出现的内容都不让小朋友观看.]建立索引完成.
共建立3个罪犯记录索引.
建索引结束<<<
开始查所引>>>
查询条件:gender->Male
查到如下罪犯信息:
Criminal [id=1, name=隔壁老王, gender=Male, birthday=1988-08-08T00:00:00.000+09:00, crime=大头,蹭饭, description=小头爸爸的天敌,经常上小头爸爸家蹭饭,不过此人对大头儿子不错.经常关心大头的学业.]
Criminal [id=2, name=王南翔, gender=Male, birthday=1999-01-05T00:00:00.000+08:00, crime=破坏公共安全, description=南翔挖掘机学校毕业,深得南翔黑客大师黑翔的真传,前不久刚挖断某ET家的网线.]
查索引结束<<<
开始查所引>>>
查询条件:crime->蹭饭
查到如下罪犯信息:
Criminal [id=1, name=隔壁老王, gender=Male, birthday=1988-08-08T00:00:00.000+09:00, crime=大头,蹭饭, description=小头爸爸的天敌,经常上小头爸爸家蹭饭,不过此人对大头儿子不错.经常关心大头的学业.]
Criminal [id=3, name=绿坝妹, gender=Female, birthday=2008-07-22T00:00:00.000+08:00, crime=传播不良信息,蹭饭, description=向青少年传播不良网站列表,据传绿坝妹幼时曾备加菲猫挠过,留下心里阴影,所以但凡有加菲猫出现的内容都不让小朋友观看.]
查索引结束<<<
开始查所引>>>
查询条件:description->挖掘
查到如下罪犯信息:
Criminal [id=2, name=王南翔, gender=Male, birthday=1999-01-05T00:00:00.000+08:00, crime=破坏公共安全, description=南翔挖掘机学校毕业,深得南翔黑客大师黑翔的真传,前不久刚挖断某ET家的网线.]
查索引结束<<<
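To make the three results above less magical, here is a toy inverted index in plain Java that reproduces them. It is only a sketch of the general idea (field, then term, then document ids), not how Lucene stores anything. ToyIndexSketch, keyword, and hanBigrams are hypothetical names. The bigram rule is the same simplifying assumption about CJK analysis, and the descriptions are shortened to their first clause.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

public class ToyIndexSketch {
    // field -> term -> ids of the documents that contain the term
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();

    void add(int docId, String field, String value, Function<String, List<String>> analyzer) {
        for (String term : analyzer.apply(value)) {
            postings.computeIfAbsent(field, f -> new HashMap<>())
                    .computeIfAbsent(term, t -> new TreeSet<>())
                    .add(docId);
        }
    }

    // Exact lookup of one term in one field; the term itself is not analyzed.
    Set<Integer> termQuery(String field, String term) {
        Map<String, Set<Integer>> terms = postings.getOrDefault(field, Collections.emptyMap());
        return terms.getOrDefault(term, Collections.emptySet());
    }

    // Keyword-style field: the whole value is one term.
    static List<String> keyword(String value) {
        return Collections.singletonList(value);
    }

    // Assumed simplification of CJK analysis: overlapping two-character terms per Han run.
    static List<String> hanBigrams(String value) {
        List<String> terms = new ArrayList<>();
        for (String run : value.split("[^\\p{IsHan}]+")) {
            for (int i = 0; i + 2 <= run.length(); i++) {
                terms.add(run.substring(i, i + 2));
            }
        }
        return terms;
    }

    static ToyIndexSketch sample() {
        ToyIndexSketch index = new ToyIndexSketch();
        String[][] rows = {
            {"Male", "大头,蹭饭", "小头爸爸的天敌"},
            {"Male", "破坏公共安全", "南翔挖掘机学校毕业"},
            {"Female", "传播不良信息,蹭饭", "向青少年传播不良网站列表"},
        };
        for (int i = 0; i < rows.length; i++) {
            int id = i + 1;
            index.add(id, "gender", rows[i][0], ToyIndexSketch::keyword);
            index.add(id, "crime", rows[i][1], ToyIndexSketch::hanBigrams);
            index.add(id, "description", rows[i][2], ToyIndexSketch::hanBigrams);
        }
        return index;
    }

    public static void main(String[] args) {
        ToyIndexSketch index = sample();
        System.out.println(index.termQuery("gender", "Male"));      // [1, 2]
        System.out.println(index.termQuery("crime", "蹭饭"));        // [1, 3]
        System.out.println(index.termQuery("description", "挖掘"));  // [2]
        System.out.println(index.termQuery("gender", "male"));      // [] (exact match only)
    }
}
```

The returned ids play the role of the stored id field in Jail: the search only finds which documents match, and the full record is then loaded from the DAO.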