Solving search-related problems in GeoNetwork: operators, quotes, phrases, and Chinese.

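This article discusses problems encountered in the web Advanced Search and analyzes their causes in detail. It proposes a fix that changes how the query string is processed, so that operators, quotes, phrase queries, and Chinese-character queries give better results. The fix centers on how the standard analyzer is applied, so that operators, quotes, phrases, and Chinese characters are each handled correctly.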


Have you run into search-related problems in the web Advanced Search?
I have.
1. Problem.
These are the search-related problems I found:
1) Operators: the operators (and, or, not) have no effect.
2) Quotes: quotes have no effect either.
3) Phrase queries: a phrase query needs quotes, but quotes do not work (see 2).
4) Queries in Asian languages such as Chinese:
the result is not exact. GN returns the metadata that contains each character of the query, not the query phrase itself.
The effect is as if you searched for "any more" and GeoNetwork matched "any" and "more" separately.
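To make problem 4 concrete, here is a minimal sketch in plain Java (no Lucene involved) that contrasts the two behaviours: requiring every character to occur somewhere in the text, versus requiring the characters to occur as one contiguous phrase. The class and method names are hypothetical and are not part of GeoNetwork. The sample strings are written as Unicode escapes: the record is 中国地图 and the query is 中图.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not GeoNetwork or Lucene code.
final class PhraseVsTerms {

    // Split a query into single characters, skipping whitespace. This is
    // assumed to mimic Chinese text ending up as one token per character.
    static List<String> singleCharTokens(String query) {
        List<String> tokens = new ArrayList<>();
        query.codePoints()
             .filter(cp -> !Character.isWhitespace(cp))
             .forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    // "AND of tokens": every token must occur somewhere in the text.
    static boolean matchesAllTokens(String text, String query) {
        for (String token : singleCharTokens(query)) {
            if (!text.contains(token)) {
                return false;
            }
        }
        return true;
    }

    // Phrase match: the query must occur as one contiguous run.
    static boolean matchesPhrase(String text, String query) {
        return text.contains(query);
    }

    public static void main(String[] args) {
        String record = "\u4e2d\u56fd\u5730\u56fe"; // four characters
        String query = "\u4e2d\u56fe";              // first and last character
        System.out.println(matchesAllTokens(record, query)); // true
        System.out.println(matchesPhrase(record, query));    // false
    }
}
```

The record is reported as a hit under the first rule even though the two query characters never appear next to each other, which is the inexact result described in 4).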
2. Why?
So what causes this?
The analyzer is the main reason for the problems.
In the Java class services.main.Search,
the query string is passed to the MainUtil.splitWord function to split it into words, like this:
if (any != null)
any.setText(MainUtil.splitWord(any.getText()));
Now look at the splitWord function: it uses StandardAnalyzer.
public static String splitWord(String requestStr)
{
Analyzer a = new StandardAnalyzer();
.....
}
As we know, StandardAnalyzer filters out stop words such as "and", "or", "not", "as", ...,
and it also drops the quote character ("), so the string this function returns contains neither the operators nor the quotes.
Since the default operator is "and", GN then combines the remaining terms with "and" when it queries Lucene.
That is how the problems arise.
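The effect can be reproduced without Lucene. The sketch below is a toy stand-in for an analyzer with stop words. It is not StandardAnalyzer itself, and the class name, method name, and the short stop-word list are assumptions made for illustration. It lower-cases the input, splits on anything that is not a letter or digit (so quotes vanish), and drops the stop words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy stand-in for an analyzer with stop words; NOT Lucene's StandardAnalyzer.
final class ToyStopAnalyzer {

    // A few English stop words picked for illustration, including the ones
    // named in the text above.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("and", "or", "not", "as", "the", "a"));

    // Lower-case, split on every run of non-letter, non-digit characters
    // (this is where the quotes disappear), then drop the stop words.
    static String analyze(String query) {
        List<String> kept = new ArrayList<>();
        String lower = query.toLowerCase(Locale.ROOT);
        for (String token : lower.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        System.out.println(analyze("\"any more\" or data")); // any more data
        System.out.println(analyze("roads NOT rivers"));     // roads rivers
    }
}
```

Both the phrase boundary and the operator are gone from the output, so whatever consumes the result can only fall back to the default operator.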
3. Solution.
How can this be resolved?
Simply stop using StandardAnalyzer? No: we still need it to analyze the query string, for example
the phrase inside the quotes. So we must find the quotes before analyzing, and send only the phrase between
the quotes to the analyzer. My solution makes quotes, operators, and phrases take effect,
and it also works for queries that contain Chinese. Here is the change in Search:
if (any != null)
{
any.setText( splitWord(any.getText()) );
}
The call to MainUtil.splitWord is replaced by a call to a new local splitWord, and the new splitWord calls MainUtil.splitWord internally.
Below is the splitWord function in Search.java:
// Code starts here; it belongs in the .service.main.Search.java file
// author: zhuhuazha2004@yahoo.com.cn
private static final String OPER_AND = " and ";
private static final String OPER_OR = " or ";
private static final String OPER_NOT = " not ";

private String splitWord( String strValue )
{
    // basic string processing: trim, collapse runs of whitespace
    String strQuoteSg = "'";
    String strQuoteDb = "\"";
    // convert single quotes to double quotes
    strValue = strValue.replaceAll( strQuoteSg, strQuoteDb );
    // trim
    strValue = strValue.trim();
    // collapse consecutive whitespace into a single space
    strValue = strValue.replaceAll( "\\s\\s+", " " );
    // lower-case: the search is not case sensitive
    strValue = strValue.toLowerCase();
    if( strValue.length() > 0 )
    {
        int nFirstIndex = strValue.indexOf( strQuoteDb );
        if( nFirstIndex < 0 )
        {
            // no quotes in the query: add them around the terms,
            // using the operators and, or, not as separators
            strValue = replaceComponent( strValue );
        }
        return splitString( strValue );
    }
    else
        return strValue;
}

// " " --> " and "
private String replaceComponent( String strValue )
{
    String strQuoteDb = "\"";
    String strWhitespace = " ";
    // add quotes to head and tail
    strValue = strQuoteDb + strValue + strQuoteDb;
    // find the first whitespace
    int nIndex = strValue.indexOf( strWhitespace );
    if( nIndex < 0 )
        return strValue;
    else
    {
        // put quotes around the operators and, or, not
        strValue = checkKeyword( strValue );
        // if no operator is included, use "and" as the default
        if( strValue.contains( OPER_AND ) || strValue.contains( OPER_OR )
            || strValue.contains( OPER_NOT ) )
        {
            return strValue;
        }
        else
        {
            return strValue.replace( strWhitespace,
                strQuoteDb + strWhitespace + "and" + strWhitespace + strQuoteDb );
        }
    }
}

private String checkKeyword( String strValue )
{
    strValue = checkKeywordComponent( strValue, OPER_AND );
    strValue = checkKeywordComponent( strValue, OPER_OR );
    strValue = checkKeywordComponent( strValue, OPER_NOT );
    return strValue;
}

// make sure each occurrence of keyword has a quote right before and right after it;
// strValue and keyword must be lowercase
private String checkKeywordComponent( String strValue, String keyword )
{
    StringBuffer sb = new StringBuffer();
    sb.append( strValue );
    int nIndex = sb.indexOf( keyword );
    int offset = keyword.length();
    String strQuoteDb = "\"";
    while( nIndex >= 0 )
    {
        // insert a quote before the keyword if there is none
        if( !sb.substring( nIndex-1, nIndex ).equals( strQuoteDb ) )
        {
            sb.insert( nIndex, strQuoteDb );
            offset++;
        }
        // insert a quote after the keyword if there is none
        if( !sb.substring( nIndex+offset, nIndex+offset+1 ).equals( strQuoteDb ) )
        {
            sb.insert( nIndex+offset, strQuoteDb );
        }
        nIndex = sb.indexOf( keyword, nIndex+2 );
        offset = keyword.length();
    }
    return sb.toString();
}

private String splitString( String strValue )
{
    // strip leading and trailing whitespace
    strValue = strValue.trim();
    // collapse consecutive whitespace into a single space
    strValue = strValue.replaceAll( "\\s\\s+", " " );
    // add quotes around the operators and, or, not
    strValue = checkKeyword( strValue );
    String strQuoteDb = "\"";
    StringBuffer sb = new StringBuffer();
    int nStartIndex = 0;
    int nFirstIndex = strValue.indexOf( strQuoteDb );
    while( nFirstIndex >= 0 )
    {
        // copy everything up to and including the opening quote
        sb.append( strValue.substring( nStartIndex, nFirstIndex+1 ) );
        int nSecondQuote = strValue.indexOf( strQuoteDb, nFirstIndex+1 );
        // the phrase starts right after the opening quote
        // (this line is cut off in the source; reconstructed from how nStartIndex is used below)
        nStartIndex = nFirstIndex+1;
        if( nSecondQuote < 0 ) // the closing quote does not exist
        {
            String strLast = strValue.substring( nStartIndex, strValue.length() );
            strLast = MainUtil.splitWord( strLast );
            sb.append( strLast );
            sb.append( strQuoteDb );
            nStartIndex = strValue.length()-1;
            break;
        }
        else
        {
            String strLast = strValue.substring( nStartIndex, nSecondQuote );
            strLast = MainUtil.splitWord( strLast );
            sb.append( strLast );
            sb.append( strQuoteDb );
            nStartIndex = nSecondQuote+1;
        }
        // find the next opening quote
        nFirstIndex = strValue.indexOf( strQuoteDb, nStartIndex );
    }
    // the right-hand side of this comparison is cut off in the source;
    // strValue.length() is assumed
    if( nStartIndex+1 < strValue.length() )
    {
        sb.append( strValue.substring( nStartIndex+1 ) );
    }
    return sb.toString();
}
You can try it out.
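For experimenting outside GeoNetwork, the quote-aware idea can also be sketched in a self-contained form. The code below is not the listing above and does not behave identically: it always treats bare words as separate terms. All names are hypothetical, and the analyzer parameter stands in for MainUtil.splitWord, whose body is not shown here. It cuts the query into quoted phrases and bare words, leaves bare operators alone, passes only the term text to the analyzer, and inserts "and" between adjacent terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical sketch of quote-aware query normalization; not GeoNetwork code.
final class QuoteAwareSplit {

    // Cut the query into segments: a quoted phrase stays one segment (quotes
    // kept), text outside quotes is split on whitespace.
    static List<String> segments(String query) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (char c : query.trim().toCharArray()) {
            if (c == '"') {
                if (inQuotes) {                 // closing quote: emit the phrase
                    out.add("\"" + cur + "\"");
                } else if (cur.length() > 0) {  // opening quote ends a bare word
                    out.add(cur.toString());
                }
                cur.setLength(0);
                inQuotes = !inQuotes;
            } else if (Character.isWhitespace(c) && !inQuotes) {
                if (cur.length() > 0) {
                    out.add(cur.toString());
                    cur.setLength(0);
                }
            } else {
                cur.append(c);
            }
        }
        if (cur.length() > 0) {                 // trailing word or unterminated phrase
            out.add(inQuotes ? "\"" + cur + "\"" : cur.toString());
        }
        return out;
    }

    static boolean isOperator(String s) {
        return s.equals("and") || s.equals("or") || s.equals("not");
    }

    // Rebuild the query. Bare operators pass through untouched; every other
    // segment is analyzed and quoted; "and" joins two adjacent terms.
    static String normalize(String query, UnaryOperator<String> analyzer) {
        StringBuilder sb = new StringBuilder();
        boolean prevWasTerm = false;
        for (String seg : segments(query.toLowerCase(Locale.ROOT))) {
            boolean op = isOperator(seg);
            if (!op) {
                String inner = seg.startsWith("\"")
                        ? seg.substring(1, seg.length() - 1) : seg;
                seg = "\"" + analyzer.apply(inner) + "\"";
                if (prevWasTerm) {
                    sb.append(" and");          // default operator
                }
            }
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(seg);
            prevWasTerm = !op;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Identity analyzer used here; plug in a real one to test further.
        System.out.println(normalize("\"Any More\" or roads rivers",
                                     UnaryOperator.identity()));
        // prints: "any more" or "roads" and "rivers"
    }
}
```

Because the operators never reach the analyzer, a stop-word filter cannot remove them, and the quotes around each phrase are added back after analysis.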
4. Commit?
Who can commit this to the GN source?
Or how can I commit it myself?