Lucene Study Notes

[b]Apache Lucene is a high-performance, full-featured text search engine library.[/b]

1. [b]Here's a simple example of how to use Lucene for indexing and searching[/b] (the JUnit assertions that would check the results are shown commented out here; a runnable JUnit version follows the example):


import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

/**
 * @since V2.0
 * @author David.Wei
 * @date 2008-4-16
 */
public class Test {

    public static void main(String[] args) throws Exception {

        Analyzer analyzer = new StandardAnalyzer();

        // Store the index in memory:
        Directory directory = new RAMDirectory();
        // To store an index on disk, use this instead:
        // Directory directory = FSDirectory.getDirectory("/tmp/testindex");
        IndexWriter iwriter = new IndexWriter(directory, analyzer, true);
        iwriter.setMaxFieldLength(25000);
        Document doc = new Document();
        String text = "This is the text to be indexed.";
        doc.add(new Field("fieldname", text, Field.Store.YES,
                Field.Index.TOKENIZED));
        iwriter.addDocument(doc);
        iwriter.optimize();
        iwriter.close();

        // Now search the index:
        IndexSearcher isearcher = new IndexSearcher(directory);
        // Parse a simple query that searches for "text":
        QueryParser parser = new QueryParser("fieldname", analyzer);
        Query query = parser.parse("text");
        Hits hits = isearcher.search(query);
        // assertEquals(1, hits.length());
        // Iterate through the results:
        for (int i = 0; i < hits.length(); i++) {
            Document hitDoc = hits.doc(i);
            // assertEquals("This is the text to be indexed.", hitDoc.get("fieldname"));
            System.out.println(hitDoc.get("fieldname"));
        }
        isearcher.close();
        directory.close();
    }

}
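Since the parenthetical above mentions JUnit, here is a minimal sketch of the same flow written as a JUnit 3 test case, assuming JUnit is on the classpath (the class and method names are made up for illustration; everything else is the Lucene 2.x API used above):

import junit.framework.TestCase;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class IndexSearchTest extends TestCase {

    public void testIndexAndSearch() throws Exception {
        Analyzer analyzer = new StandardAnalyzer();
        Directory directory = new RAMDirectory();

        // Index a single document.
        IndexWriter iwriter = new IndexWriter(directory, analyzer, true);
        Document doc = new Document();
        doc.add(new Field("fieldname", "This is the text to be indexed.",
                Field.Store.YES, Field.Index.TOKENIZED));
        iwriter.addDocument(doc);
        iwriter.close();

        // Search and assert on the result.
        IndexSearcher isearcher = new IndexSearcher(directory);
        Query query = new QueryParser("fieldname", analyzer).parse("text");
        Hits hits = isearcher.search(query);
        assertEquals(1, hits.length());
        assertEquals("This is the text to be indexed.",
                hits.doc(0).get("fieldname"));
        isearcher.close();
        directory.close();
    }
}

Note that this is the Lucene 2.x API throughout: Hits and this IndexWriter constructor were deprecated in later releases in favor of TopDocs-based search methods.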


[b]2. The Lucene API is divided into several packages:[/b]
[list]
[*][u]org.apache.lucene.analysis[/u] defines an abstract Analyzer API for converting text from a java.io.Reader into a TokenStream, an enumeration of Tokens. A TokenStream is composed by applying TokenFilters to the output of a Tokenizer. A few simple implementations are provided, including StopAnalyzer and the grammar-based StandardAnalyzer.
[*][u]org.apache.lucene.document[/u] provides a simple Document class. A document is simply a set of named Fields, whose values may be strings or instances of java.io.Reader.
[*][u]org.apache.lucene.index[/u] provides two primary classes: IndexWriter, which creates and adds documents to indices; and IndexReader, which accesses the data in the index.
[*][u]org.apache.lucene.search[/u] provides data structures to represent queries (TermQuery for individual words, PhraseQuery for phrases, and BooleanQuery for boolean combinations of queries) and the abstract Searcher, which turns queries into Hits. IndexSearcher implements search over a single IndexReader. (A query-construction sketch follows this list.)
[*][u]org.apache.lucene.queryParser[/u] uses JavaCC to implement a QueryParser.
[*][u]org.apache.lucene.store[/u] defines an abstract class for storing persistent data, the Directory, a collection of named files written by an IndexOutput and read by an IndexInput. Two implementations are provided, FSDirectory, which uses a file system directory to store files, and RAMDirectory which implements files as memory-resident data structures.
[*][u]org.apache.lucene.util[/u] contains a few handy data structures, e.g., BitVector and PriorityQueue.
[/list]
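To make the org.apache.lucene.search description concrete, here is a small sketch that builds the three query types programmatically instead of going through QueryParser (the field name and terms are invented for the example; the calls are the Lucene 2.x API):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class QueryExamples {

    public static Query buildExampleQuery() {
        // TermQuery: matches documents containing a single term.
        Query termQuery = new TermQuery(new Term("contents", "lucene"));

        // PhraseQuery: matches terms that appear next to each other.
        PhraseQuery phraseQuery = new PhraseQuery();
        phraseQuery.add(new Term("contents", "text"));
        phraseQuery.add(new Term("contents", "search"));

        // BooleanQuery: combines other queries with MUST/SHOULD/MUST_NOT.
        BooleanQuery booleanQuery = new BooleanQuery();
        booleanQuery.add(termQuery, BooleanClause.Occur.MUST);
        booleanQuery.add(phraseQuery, BooleanClause.Occur.SHOULD);
        return booleanQuery;
    }
}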

[b]3. To use Lucene, an application should:[/b]
[list]
[*]Create Documents by adding Fields;
[*]Create an IndexWriter and add documents to it with addDocument();
[*]Call QueryParser.parse() to build a query from a string; and
[*]Create an IndexSearcher and pass the query to its search() method.
[/list]

[b]4. Some simple examples of code which do this are:[/b]
[list]
[*][u]FileDocument.java[/u] contains code to create a Document for a file.
[*][u]IndexFiles.java[/u] creates an index for all the files contained in a directory.
[*][u]DeleteFiles.java[/u] deletes some of these files from the index.
[*][u]SearchFiles.java[/u] prompts for queries and searches an index.
[/list]

[b]Code details:[/b]

(1)FileDocument.java

/**
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

import java.io.File;
import java.io.FileReader;

import org.apache.lucene.document.DateTools;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

/** A utility for making Lucene Documents from a File. */
public class FileDocument {
    /**
     * Makes a document for a File.
     * <p>
     * The document has three fields:
     * <ul>
     * <li><code>path</code>--containing the pathname of the file, as a
     * stored, untokenized field;
     * <li><code>modified</code>--containing the last modified date of the
     * file, as a field created by
     * <a href="lucene.document.DateTools.html">DateTools</a>; and
     * <li><code>contents</code>--containing the full contents of the file,
     * as a Reader field.
     * </ul>
     */
    public static Document Document(File f)
            throws java.io.FileNotFoundException {

        // Make a new, empty document.
        Document doc = new Document();

        // Add the path of the file as a field named "path". Use a field
        // that is indexed (i.e. searchable), but don't tokenize the field
        // into words.
        doc.add(new Field("path", f.getPath(), Field.Store.YES,
                Field.Index.UN_TOKENIZED));

        // Add the last modified date of the file as a field named
        // "modified". Use a field that is indexed (i.e. searchable), but
        // don't tokenize the field into words.
        doc.add(new Field("modified", DateTools.timeToString(f.lastModified(),
                DateTools.Resolution.MINUTE), Field.Store.YES,
                Field.Index.UN_TOKENIZED));

        // Add the contents of the file to a field named "contents". Specify
        // a Reader, so that the text of the file is tokenized and indexed,
        // but not stored. Note that FileReader expects the file to be in the
        // system's default encoding. If that's not the case, searching for
        // special characters will fail.
        doc.add(new Field("contents", new FileReader(f)));

        // Return the document.
        return doc;
    }

    private FileDocument() {
    }
}
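A minimal usage sketch for FileDocument (the index path and file name are hypothetical placeholders): build a Document from a file and add it to an IndexWriter:

import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class FileDocumentUsage {
    public static void main(String[] args) throws Exception {
        // "index" and "docs/readme.txt" are placeholder paths.
        IndexWriter writer = new IndexWriter("index", new StandardAnalyzer(), true);
        writer.addDocument(FileDocument.Document(new File("docs/readme.txt")));
        writer.close();
    }
}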



(2)IndexFiles.java

/**
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Date;

/** Index all text files under a directory. */
public class IndexFiles {

    private IndexFiles() {
    }

    static final File INDEX_DIR = new File("index");

    /** Index all text files under a directory. */
    public static void main(String[] args) {
        String usage = "java org.apache.lucene.demo.IndexFiles <root_directory>";
        if (args.length == 0) {
            System.err.println("Usage: " + usage);
            System.exit(1);
        }

        if (INDEX_DIR.exists()) {
            System.out.println("Cannot save index to '" + INDEX_DIR
                    + "' directory, please delete it first");
            System.exit(1);
        }

        final File docDir = new File(args[0]);
        if (!docDir.exists() || !docDir.canRead()) {
            System.out.println("Document directory '"
                    + docDir.getAbsolutePath()
                    + "' does not exist or is not readable, please check the path");
            System.exit(1);
        }

        Date start = new Date();
        try {
            IndexWriter writer = new IndexWriter(INDEX_DIR,
                    new StandardAnalyzer(), true);
            System.out.println("Indexing to directory '" + INDEX_DIR + "'...");
            indexDocs(writer, docDir);
            System.out.println("Optimizing...");
            writer.optimize();
            writer.close();

            Date end = new Date();
            System.out.println(end.getTime() - start.getTime()
                    + " total milliseconds");

        } catch (IOException e) {
            System.out.println(" caught a " + e.getClass()
                    + "\n with message: " + e.getMessage());
        }
    }

    static void indexDocs(IndexWriter writer, File file) throws IOException {
        // Do not try to index files that cannot be read.
        if (file.canRead()) {
            if (file.isDirectory()) {
                String[] files = file.list();
                // An IO error could occur, in which case list() returns null.
                if (files != null) {
                    for (int i = 0; i < files.length; i++) {
                        indexDocs(writer, new File(file, files[i]));
                    }
                }
            } else {
                System.out.println("adding " + file);
                try {
                    writer.addDocument(FileDocument.Document(file));
                } catch (FileNotFoundException fnfe) {
                    // At least on Windows, some temporary files raise this
                    // exception with an "access denied" message; checking
                    // whether the file can be read doesn't help, so such
                    // files are simply skipped.
                }
            }
        }
    }

}
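To try the indexer, compile the two classes above and run it with the directory to index (the path is illustrative): java IndexFiles /path/to/text/files. It writes the index into an "index" directory under the current working directory and, as the check at the top of main() shows, refuses to run if that directory already exists.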



(3)DeleteFiles.java

/**
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;

/** Deletes documents that contain the given term from an index. */
public class DeleteFiles {

    private DeleteFiles() {
    } // singleton

    /** Deletes documents that contain the given term from an index. */
    public static void main(String[] args) {
        String usage = "java org.apache.lucene.demo.DeleteFiles <unique_term>";
        if (args.length == 0) {
            System.err.println("Usage: " + usage);
            System.exit(1);
        }
        try {
            Directory directory = FSDirectory.getDirectory("index");
            IndexReader reader = IndexReader.open(directory);

            Term term = new Term("path", args[0]);
            int deleted = reader.deleteDocuments(term);

            System.out.println("deleted " + deleted + " documents containing "
                    + term);

            // One can also delete documents by their internal id:
            //
            // for (int i = 0; i < reader.maxDoc(); i++) {
            //     System.out.println("Deleting document with id " + i);
            //     reader.deleteDocument(i);
            // }

            reader.close();
            directory.close();

        } catch (Exception e) {
            System.out.println(" caught a " + e.getClass()
                    + "\n with message: " + e.getMessage());
        }
    }
}
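Because the "path" field was indexed untokenized by FileDocument, the term must match the stored path exactly; for example (the path is illustrative), java DeleteFiles /path/to/text/files/a.txt removes the document indexed for that file. Note that deletions made through an IndexReader are committed to the index when the reader is closed.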



(4)SearchFiles.java

/**
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.FilterIndexReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searcher;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Date;

/** Simple command-line based search demo. */
public class SearchFiles {

    /**
     * Use the norms from one field for all fields. Norms are read into
     * memory, using one byte per document per searched field. This can cause
     * searches of large collections with a large number of fields to run out
     * of memory. If all of the fields contain only a single token, then the
     * norms are all identical and a single norm vector may be shared.
     */
    private static class OneNormsReader extends FilterIndexReader {
        private String field;

        public OneNormsReader(IndexReader in, String field) {
            super(in);
            this.field = field;
        }

        public byte[] norms(String field) throws IOException {
            return in.norms(this.field);
        }
    }

    private SearchFiles() {
    }

    /** Simple command-line based search demo. */
    public static void main(String[] args) throws Exception {
        String usage = "Usage: java org.apache.lucene.demo.SearchFiles [-index dir] [-field f] [-repeat n] [-queries file] [-raw] [-norms field]";
        if (args.length > 0
                && ("-h".equals(args[0]) || "-help".equals(args[0]))) {
            System.out.println(usage);
            System.exit(0);
        }

        String index = "index";
        String field = "contents";
        String queries = null;
        int repeat = 0;
        boolean raw = false;
        String normsField = null;

        for (int i = 0; i < args.length; i++) {
            if ("-index".equals(args[i])) {
                index = args[i + 1];
                i++;
            } else if ("-field".equals(args[i])) {
                field = args[i + 1];
                i++;
            } else if ("-queries".equals(args[i])) {
                queries = args[i + 1];
                i++;
            } else if ("-repeat".equals(args[i])) {
                repeat = Integer.parseInt(args[i + 1]);
                i++;
            } else if ("-raw".equals(args[i])) {
                raw = true;
            } else if ("-norms".equals(args[i])) {
                normsField = args[i + 1];
                i++;
            }
        }

        IndexReader reader = IndexReader.open(index);

        if (normsField != null)
            reader = new OneNormsReader(reader, normsField);

        Searcher searcher = new IndexSearcher(reader);
        Analyzer analyzer = new StandardAnalyzer();

        BufferedReader in = null;
        if (queries != null) {
            in = new BufferedReader(new FileReader(queries));
        } else {
            in = new BufferedReader(new InputStreamReader(System.in, "UTF-8"));
        }
        QueryParser parser = new QueryParser(field, analyzer);
        while (true) {
            if (queries == null) // prompt the user
                System.out.println("Enter query: ");

            String line = in.readLine();
            if (line == null)
                break;

            line = line.trim();
            if (line.length() == 0)
                break;

            Query query = parser.parse(line);
            System.out.println("Searching for: " + query.toString(field));

            Hits hits = searcher.search(query);

            if (repeat > 0) { // repeat & time as benchmark
                Date start = new Date();
                for (int i = 0; i < repeat; i++) {
                    hits = searcher.search(query);
                }
                Date end = new Date();
                System.out.println("Time: " + (end.getTime() - start.getTime())
                        + "ms");
            }

            System.out.println(hits.length() + " total matching documents");

            final int HITS_PER_PAGE = 10;
            for (int start = 0; start < hits.length(); start += HITS_PER_PAGE) {
                int end = Math.min(hits.length(), start + HITS_PER_PAGE);
                for (int i = start; i < end; i++) {

                    if (raw) { // output raw format
                        System.out.println("doc=" + hits.id(i) + " score="
                                + hits.score(i));
                        continue;
                    }

                    Document doc = hits.doc(i);
                    String path = doc.get("path");
                    if (path != null) {
                        System.out.println((i + 1) + ". " + path);
                        String title = doc.get("title");
                        if (title != null) {
                            System.out.println("   Title: " + title);
                        }
                    } else {
                        System.out.println((i + 1) + ". "
                                + "No path for this document");
                    }
                }

                if (queries != null) // non-interactive
                    break;

                if (hits.length() > end) {
                    System.out.println("more (y/n) ? ");
                    line = in.readLine();
                    if (line == null || line.length() == 0
                            || line.charAt(0) == 'n')
                        break;
                }
            }
        }
        reader.close();
    }
}
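Typical invocations (all values illustrative): java SearchFiles starts an interactive prompt against the default "index" directory and "contents" field; java SearchFiles -queries queries.txt runs one query per line from a file; adding -repeat 100 re-runs each search 100 times and reports the elapsed time as a rough benchmark; and -raw prints internal document ids and scores instead of paths.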
