Lucene Study Notes --- (1) Finding Criminal Records with Lucene

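This post uses Lucene to implement a criminal-record search: finding male criminals, criminals who committed a particular crime, and criminals whose description contains a particular word.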

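I'll skip the introduction to Lucene; a quick web search turns up plenty. This post uses Lucene to build a very simple criminal-record search feature, with the aim that readers finish it thinking: so easy.
The existing table of criminal records looks like this: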

Id | Name | Gender | Birthday | Crime | Description
1 | 隔壁老王 | Male | 1988/08/08 | 造大头,蹭饭 | 小头爸爸的天敌…
2 | 王南翔 | Male | 1999/01/05 | 破坏公共安全 | 南翔挖掘机学校…
3 | 绿坝妹 | Female | 2008/07/22 | 传播不良信息,蹭饭 | 向青少年传播不…

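Goals:
1. Find all male criminals
2. Find all criminals who have committed the crime '蹭饭' (freeloading meals)
3. Find the criminals whose description contains '挖掘' (excavate)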

First, create a Java project. It is built with Gradle here; create a build.gradle file with the following content:

apply plugin: 'java'
apply plugin: 'eclipse'

def projectName = "lucene-1"
version = '0.0.1-SNAPSHOT'
sourceCompatibility = 1.8

// Source directories
sourceSets {
   main {
      java {
         srcDirs = ['src/main/java']
      }
      resources {
         srcDirs = ['src/main/resources']
      }
   }
   test {
      java {
         srcDirs = ['src/test/java']
      }
      resources {
         srcDirs = ['src/test/resources']
      }
   }
}

// Task that creates the source directories
task createJavaProject {
      doLast {
            sourceSets*.java.srcDirs*.each { it.mkdirs() }
            sourceSets*.resources.srcDirs*.each { it.mkdirs() }
      }
}


// Online repositories
// The OSC mirror is only there for speed; Maven Central alone also works
repositories {
      maven {
         url 'http://maven.oschina.net/content/groups/public/'
      }
      mavenCentral()
}

// Project dependencies
dependencies {
      compile 'org.apache.lucene:lucene-core:5.0.0'
      compile 'org.apache.lucene:lucene-queries:5.0.0'
      compile 'org.apache.lucene:lucene-analyzers-common:5.0.0'
      compile 'joda-time:joda-time:2.6'

      testCompile('junit:junit:4.12')
}

Run the following command to generate the directories and the Eclipse project files:

gradle createJavaProject eclipse

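Import the newly created project into Eclipse and open it. Under src/main/java/, create a criminals package and add a Criminal class to it (an entity class that stores a criminal's record). Criminal.java looks like this: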

package criminals;

import org.joda.time.DateTime;

// Criminal class
// Entity class that stores a criminal's record
public class Criminal {
    private long id;
    private String name;
    private String gender;
    private DateTime birthday;
    private String crime;
    private String description;
    public Criminal(long id, String name, String gender, DateTime birthday,
            String crime, String description) {
        super();
        this.id = id;
        this.name = name;
        this.gender = gender;
        this.birthday = birthday;
        this.crime = crime;
        this.description = description;
    }
    public long getId() {
        return id;
    }
    public void setId(long id) {
        this.id = id;
    }
    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
    public String getGender() {
        return gender;
    }
    public void setGender(String gender) {
        this.gender = gender;
    }
    public DateTime getBirthday() {
        return birthday;
    }
    public void setBirthday(DateTime birthday) {
        this.birthday = birthday;
    }
    public String getCrime() {
        return crime;
    }
    public void setCrime(String crime) {
        this.crime = crime;
    }
    public String getDescription() {
        return description;
    }
    public void setDescription(String description) {
        this.description = description;
    }
    @Override
    public String toString() {
        return "Criminal [id=" + id + ", name=" + name + ", gender=" + gender
                + ", birthday=" + birthday + ", crime=" + crime
                + ", description=" + description + "]";
    }


}

Create CriminalDao.java to simulate the data access layer:

package criminals;

import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Data access layer (an in-memory map stands in for the database)
public class CriminalDao {
    private static Map<String, Criminal> criminals = new HashMap<String, Criminal>();

    public Collection<Criminal> getAllCriminals() {
        return criminals.values();
    }

    public void saveCriminal(Criminal criminal) {
        criminals.put(String.valueOf(criminal.getId()), criminal);
    }
    public Criminal getById(String id) {
        return criminals.get(id);
    }
}

The Jail class looks like this:

package criminals;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.cjk.CJKAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

// Jail class
// Maintains the criminal records and provides the search service
public class Jail {
    private static final int MAX_COUNT = 16;
    private IndexWriter indexWriter = null;
    private Analyzer analyzer = null;
    private Directory directory = null;
    private IndexReader indexReader = null;
    private IndexSearcher indexSearcher = null;
    private CriminalDao criminalDao = null;

    // Initialize the jail system
    public void initialize() throws IOException {
        directory = new RAMDirectory();
        analyzer = new CJKAnalyzer();
        IndexWriterConfig config = new IndexWriterConfig(analyzer);
        indexWriter = new IndexWriter(directory, config);
        criminalDao = new CriminalDao();
    }

    // Put a criminal into the jail
    public void addCriminal(Criminal criminal) {
        criminalDao.saveCriminal(criminal);
    }

    // Build the index over all criminals
    public void buildIndexies() throws IOException {
        System.out.println("开始建索引>>>");
        Collection<Criminal> criminals = criminalDao.getAllCriminals();
        indexWriter.deleteAll();
        for (Criminal criminal : criminals) {
            Document doc = new Document();
            doc.add(new Field("id", String.valueOf(criminal.getId()), StringField.TYPE_STORED));
            doc.add(new Field("name", criminal.getName(), StringField.TYPE_STORED));
            doc.add(new Field("gender", criminal.getGender(), StringField.TYPE_STORED));
            doc.add(new Field("birthday", criminal.getBirthday().toString("yyyy-MM-dd"), StringField.TYPE_STORED));
            doc.add(new Field("crime", criminal.getCrime(), TextField.TYPE_STORED));
            doc.add(new Field("description", criminal.getDescription(), TextField.TYPE_STORED));
            indexWriter.addDocument(doc);
            System.out.println("为罪犯:" + criminal.toString() + "建立索引完成.");
        }
        indexWriter.commit();
        System.out.println("共建立" + String.valueOf(indexWriter.numDocs()) + "个罪犯记录索引.");
        System.out.println("建索引结束<<<"); 
    }

    // Search criminal records by an exact term in the given field
    public Collection<Criminal> searchCriminals(String field, String keyWord) throws IOException {
        System.out.println("开始查所引>>>");
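        if (indexReader != null) {
            indexReader.close(); // release the reader left open by the previous search
        }
        indexReader = DirectoryReader.open(directory);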
        indexSearcher = new IndexSearcher(indexReader);
        Collection<Criminal> criminals = new ArrayList<Criminal>();
        Term term = new Term(field, keyWord);
        TermQuery query = new TermQuery(term);
        System.out.println("查询条件:" + field + "->" + keyWord);

        TopDocs topDocs = indexSearcher.search(query, MAX_COUNT);
        System.out.println("查到如下罪犯信息:");
        for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
            int docId = scoreDoc.doc;
            Document doc = indexSearcher.doc(docId);
            Criminal criminal = criminalDao.getById(doc.get("id"));
            System.out.println(criminal.toString());
            criminals.add(criminal);
        }
        System.out.println("查索引结束<<<");
        return criminals;
    }

}
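
A note on why a plain TermQuery is enough here. The id, name, gender and birthday fields use StringField, so each value is indexed verbatim as a single term and "Male" only matches the exact string "Male". The crime and description fields use TextField, so they pass through CJKAnalyzer, which indexes runs of Chinese characters as overlapping two-character terms (bigrams). That is why the two-character keywords 蹭饭 and 挖掘 can be looked up directly as terms. The sketch below only imitates that bigram step in plain Java so the idea can be seen without Lucene on the classpath; the class name CjkBigramSketch and its methods are made up for illustration, and it ignores what the real analyzer also does (Latin tokens, lowercasing, width folding, stop words).

```java
import java.util.ArrayList;
import java.util.List;

// Simplified imitation of CJK bigram tokenization; this is NOT Lucene's CJKAnalyzer.
class CjkBigramSketch {

    // Assumption for this sketch: only CJK Unified Ideographs count as CJK.
    static boolean isCjk(char c) {
        return Character.UnicodeBlock.of(c) == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS;
    }

    // Split the text into runs of consecutive CJK characters, then emit every
    // overlapping two-character window of each run. A run of length 1 is
    // emitted as a single-character term. Non-CJK characters are skipped.
    static List<String> bigrams(String text) {
        List<String> terms = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (!isCjk(text.charAt(i))) {
                i++;
                continue;
            }
            int start = i;
            while (i < text.length() && isCjk(text.charAt(i))) {
                i++;
            }
            String run = text.substring(start, i);
            if (run.length() == 1) {
                terms.add(run);
            } else {
                for (int j = 0; j + 1 < run.length(); j++) {
                    terms.add(run.substring(j, j + 2));
                }
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // prints [大头, 蹭饭]
        System.out.println(bigrams("大头,蹭饭"));
        // prints [南翔, 翔挖, 挖掘, 掘机, 机学, 学校, 校毕, 毕业]
        System.out.println(bigrams("南翔挖掘机学校毕业"));
    }
}
```

A practical consequence: with this setup a one-character or three-character keyword would not match as a single term, because the indexed terms are all two characters long. Such keywords would need to be analyzed into bigrams first and searched as a phrase.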

Finally, add a test class for the jail to run the corresponding tests:

package criminals;

import static org.junit.Assert.*;

import java.io.IOException;
import java.util.Collection;

import org.joda.time.DateTime;
import org.junit.Before;
import org.junit.Test;

public class JailTest {
    private Jail jail;

    @Before
    public void setUp() throws Exception {
        // Initialize the jail system
        jail = new Jail();
        jail.initialize();

        // Put the criminals into the jail
        Criminal criminal;
        criminal = new Criminal(1, "隔壁老王", "Male", new DateTime("1988-08-08"), "大头,蹭饭",
                "小头爸爸的天敌,经常上小头爸爸家蹭饭,不过此人对大头儿子不错.经常关心大头的学业.");
        jail.addCriminal(criminal);
        criminal = new Criminal(2, "王南翔", "Male", new DateTime("1999-01-05"), "破坏公共安全",
                "南翔挖掘机学校毕业,深得南翔黑客大师黑翔的真传,前不久刚挖断某ET家的网线.");
        jail.addCriminal(criminal);
        criminal = new Criminal(3, "绿坝妹", "Female", new DateTime("2008-07-22"), "传播不良信息,蹭饭",
                "向青少年传播不良网站列表,据传绿坝妹幼时曾备加菲猫挠过,留下心里阴影,所以但凡有加菲猫出现的内容都不让小朋友观看.");
        jail.addCriminal(criminal);
    }

    @Test
    public void test() throws IOException {
        jail.buildIndexies();
        Collection<Criminal> criminals;
        // Find all male criminals
        criminals = jail.searchCriminals("gender", "Male");
        assertEquals(2, criminals.size());
        for (Criminal criminal : criminals) {
            assertEquals("Male", criminal.getGender());
        }
        // Find all criminals who have committed the crime '蹭饭'
        criminals = jail.searchCriminals("crime", "蹭饭");
        assertEquals(2, criminals.size());
        for (Criminal criminal : criminals) {
            assertTrue(criminal.getName().equals("隔壁老王") || criminal.getName().equals("绿坝妹"));
        }
        // Find the criminals whose description contains '挖掘'
        criminals = jail.searchCriminals("description", "挖掘");
        assertEquals(1, criminals.size());

    }

}

Now for the moment of truth. Run the test class and look at the output; it is exactly what we expected!

开始建索引>>>
为罪犯:Criminal [id=1, name=隔壁老王, gender=Male, birthday=1988-08-08T00:00:00.000+09:00, crime=大头,蹭饭, description=小头爸爸的天敌,经常上小头爸爸家蹭饭,不过此人对大头儿子不错.经常关心大头的学业.]建立索引完成.
为罪犯:Criminal [id=2, name=王南翔, gender=Male, birthday=1999-01-05T00:00:00.000+08:00, crime=破坏公共安全, description=南翔挖掘机学校毕业,深得南翔黑客大师黑翔的真传,前不久刚挖断某ET家的网线.]建立索引完成.
为罪犯:Criminal [id=3, name=绿坝妹, gender=Female, birthday=2008-07-22T00:00:00.000+08:00, crime=传播不良信息,蹭饭, description=向青少年传播不良网站列表,据传绿坝妹幼时曾备加菲猫挠过,留下心里阴影,所以但凡有加菲猫出现的内容都不让小朋友观看.]建立索引完成.
共建立3个罪犯记录索引.
建索引结束<<<
开始查所引>>>
查询条件:gender->Male
查到如下罪犯信息:
Criminal [id=1, name=隔壁老王, gender=Male, birthday=1988-08-08T00:00:00.000+09:00, crime=大头,蹭饭, description=小头爸爸的天敌,经常上小头爸爸家蹭饭,不过此人对大头儿子不错.经常关心大头的学业.]
Criminal [id=2, name=王南翔, gender=Male, birthday=1999-01-05T00:00:00.000+08:00, crime=破坏公共安全, description=南翔挖掘机学校毕业,深得南翔黑客大师黑翔的真传,前不久刚挖断某ET家的网线.]
查索引结束<<<
开始查所引>>>
查询条件:crime->蹭饭
查到如下罪犯信息:
Criminal [id=1, name=隔壁老王, gender=Male, birthday=1988-08-08T00:00:00.000+09:00, crime=大头,蹭饭, description=小头爸爸的天敌,经常上小头爸爸家蹭饭,不过此人对大头儿子不错.经常关心大头的学业.]
Criminal [id=3, name=绿坝妹, gender=Female, birthday=2008-07-22T00:00:00.000+08:00, crime=传播不良信息,蹭饭, description=向青少年传播不良网站列表,据传绿坝妹幼时曾备加菲猫挠过,留下心里阴影,所以但凡有加菲猫出现的内容都不让小朋友观看.]
查索引结束<<<
开始查所引>>>
查询条件:description->挖掘
查到如下罪犯信息:
Criminal [id=2, name=王南翔, gender=Male, birthday=1999-01-05T00:00:00.000+08:00, crime=破坏公共安全, description=南翔挖掘机学校毕业,深得南翔黑客大师黑翔的真传,前不久刚挖断某ET家的网线.]
查索引结束<<<
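
To make the three results above a little less magical, here is a toy stand-in for what an exact term lookup does: a map from "field:term" to the ids of the records containing that term. It is only a sketch under simplifying assumptions, not how Lucene stores its index. The class TermLookupSketch and all of its methods are hypothetical, the bigram step is a crude two-character window over the raw string (punctuation included), and the descriptions are shortened versions of the test data.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index for illustration only; it is not Lucene's data structure.
class TermLookupSketch {
    // "field:term" -> ids of the records that contain exactly that term
    private final Map<String, Set<Long>> postings = new HashMap<>();

    // Index the whole value as one term, the way an un-analyzed field behaves.
    void addVerbatim(long id, String field, String value) {
        postings.computeIfAbsent(field + ":" + value, k -> new TreeSet<>()).add(id);
    }

    // Index every two-character window of the value as its own term.
    void addBigrams(long id, String field, String value) {
        for (int i = 0; i + 1 < value.length(); i++) {
            addVerbatim(id, field, value.substring(i, i + 2));
        }
    }

    // Exact lookup: the term must equal an indexed term character for character.
    Set<Long> search(String field, String term) {
        return postings.getOrDefault(field + ":" + term, Collections.<Long>emptySet());
    }

    // Convenience helper: gender is verbatim, crime and description are bigrams.
    private void add(long id, String gender, String crime, String description) {
        addVerbatim(id, "gender", gender);
        addBigrams(id, "crime", crime);
        addBigrams(id, "description", description);
    }

    // The three records from the test, with shortened descriptions.
    static TermLookupSketch sample() {
        TermLookupSketch index = new TermLookupSketch();
        index.add(1, "Male", "大头,蹭饭", "小头爸爸的天敌");
        index.add(2, "Male", "破坏公共安全", "南翔挖掘机学校毕业");
        index.add(3, "Female", "传播不良信息,蹭饭", "向青少年传播不良网站列表");
        return index;
    }

    public static void main(String[] args) {
        TermLookupSketch index = sample();
        System.out.println(index.search("gender", "Male"));      // prints [1, 2]
        System.out.println(index.search("gender", "male"));      // prints []
        System.out.println(index.search("crime", "蹭饭"));        // prints [1, 3]
        System.out.println(index.search("description", "挖掘"));  // prints [2]
    }
}
```

The second lookup is the one worth noticing: a verbatim field is case-sensitive, so searching gender for "male" finds nothing even though two records hold "Male".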