Solr stopwords: things to watch out for

<p>So in Solr, we’re normally used to stopwords just kind of magically working. If you enter a stopword in a query, it’s silently ignored and stripped out (unlike my legacy OPAC, which gives you zero results whenever you include a stopword!). If you include a stopword in a <em>phrase</em> search, it does even better: “kill a mockingbird” effectively becomes “kill * mockingbird”, that is, “kill” and “mockingbird” separated by exactly one word, and it successfully matches indexed text containing “kill a mockingbird” (along with any other “kill * mockingbird”).</p>
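<p>To make the phrase behaviour concrete, here is a small plain-Python sketch (not Solr code) of stopword removal that keeps each surviving token’s original position, which is roughly how Lucene’s stop filter leaves a gap where “a” used to be. The stopword list and the function names are illustrative assumptions.</p>

```python
# Illustrative stopword list (an assumption, not Solr's shipped stopwords.txt).
STOPWORDS = {"a", "an", "the", "of", "to"}


def analyze(text, stopwords=STOPWORDS):
    """Lowercase, split on whitespace and drop stopwords, but keep each
    surviving token's original position so the gap is preserved."""
    return [(pos, tok) for pos, tok in enumerate(text.lower().split())
            if tok not in stopwords]


def phrase_matches(query, document, stopwords=STOPWORDS):
    """True if the query's surviving tokens occur in the document at the
    same relative offsets (an exact phrase match, gaps included)."""
    q = analyze(query, stopwords)
    if not q:
        return False
    positions = {}
    for pos, tok in analyze(document, stopwords):
        positions.setdefault(tok, set()).add(pos)
    first_pos, first_tok = q[0]
    for start in positions.get(first_tok, set()):
        if all(start + (pos - first_pos) in positions.get(tok, set())
               for pos, tok in q[1:]):
            return True
    return False


print(phrase_matches("kill a mockingbird", "To Kill a Mockingbird"))     # True
print(phrase_matches("kill a mockingbird", "kill any mockingbird"))      # True
print(phrase_matches("kill a mockingbird", "kill the big mockingbird"))  # False
```

<p>The second call shows the “any other kill * mockingbird” effect: the query only pins “kill” and “mockingbird” two positions apart, so whatever word sits in the gap is accepted.</p>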
<p></p>
<p>Great! So normally we don’t have to think about it too much.</p>
<p></p>
<p>The exception is when you throw dismax into the mix. Dismax lets you search multiple Solr fields at once (the qf parameter). It also lets you search with a multi-clause query where, depending on your “mm” (minimum match) setting, only SOME of those clauses have to match for a document to be included in the hit list.</p>
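<p>Here is a deliberately simplified model of how qf and mm interact, written in plain Python rather than against any Solr API. It assumes each query word becomes one clause that matches if the word appears in <em>any</em> qf field, and it only understands a plain integer or a positive percentage for mm (Solr also accepts negative and conditional forms, which this sketch ignores). The field names are hypothetical.</p>

```python
def required_clauses(num_clauses, mm):
    """Simplified mm: an int is an absolute count; a string like '50%' is a
    percentage of the clauses, rounded down. Capped at the clause count."""
    if isinstance(mm, str) and mm.endswith("%"):
        required = num_clauses * int(mm[:-1]) // 100
    else:
        required = int(mm)
    return min(required, num_clauses)


def dismax_match(query_words, doc, qf, mm):
    """doc maps field name -> set of indexed terms; qf lists the fields to
    search. A clause matches if its word is in any qf field; the doc matches
    when at least the mm-required number of clauses match."""
    if not query_words:
        return False
    matched = sum(1 for word in query_words
                  if any(word in doc.get(field, set()) for field in qf))
    return matched >= required_clauses(len(query_words), mm)


doc = {"title": {"kill", "mockingbird"}, "author": {"harper", "lee"}}
qf = ["title", "author"]
print(dismax_match(["mockingbird", "lee"], doc, qf, "100%"))     # True
print(dismax_match(["mockingbird", "orwell"], doc, qf, "100%"))  # False
print(dismax_match(["mockingbird", "orwell"], doc, qf, "50%"))   # True
```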
<p></p>
<p>So you have multiple Solr fields involved. As long as each of those fields is configured with stopwords (and the <em>same</em> stopwords), everything Just Works the way you’d expect. But if one of those fields does <em>not</em> have stopwords configured, then (depending on your mm setting) you can easily end up getting zero hits whenever a (non-phrase) query clause is a stopword. This kind of makes sense when you think about it: since at least one field doesn’t strip stopwords, dismax includes a clause for the stopword you entered. That clause can’t possibly match in any of your stopword-filtered fields, so unless the unfiltered field happens to contain that word, it’s a clause that can’t match, which, depending on your mm (and the contents of all your fields, phew), will result in no hits.</p>
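<p>The failure mode can be reproduced in a similarly simplified plain-Python model. Assumptions: mm is effectively 100% (every clause must match), a hypothetical <code>title</code> field uses stopwords while a hypothetical <code>subject</code> field does not, and a word that every field drops produces no clause at all. This mirrors the behaviour described above; it is not Solr’s actual query-building code.</p>

```python
# Illustrative stopword list (an assumption, not Solr's shipped stopwords.txt).
STOPWORDS = {"a", "an", "the", "of", "to"}


def analyze(text, stopwords):
    """Index-time analysis: lowercase, split on whitespace, drop stopwords."""
    return {tok for tok in text.lower().split() if tok not in stopwords}


def build_clauses(query, field_stopwords):
    """One clause per query word, listing the qf fields whose analyzer still
    emits that word. A word that every field drops yields no clause."""
    clauses = []
    for word in query.lower().split():
        fields = [f for f, stops in field_stopwords.items() if word not in stops]
        if fields:
            clauses.append((word, fields))
    return clauses


def hits(query, docs, field_stopwords):
    """Ids of docs matching ALL clauses, i.e. behaving like mm=100%."""
    clauses = build_clauses(query, field_stopwords)
    result = []
    for doc_id, doc in docs.items():
        indexed = {f: analyze(doc.get(f, ""), stops)
                   for f, stops in field_stopwords.items()}
        if clauses and all(any(word in indexed[f] for f in fields)
                           for word, fields in clauses):
            result.append(doc_id)
    return result


docs = {"doc1": {"title": "To Kill a Mockingbird", "subject": "fiction"}}
consistent = {"title": STOPWORDS, "subject": STOPWORDS}
mixed = {"title": STOPWORDS, "subject": set()}  # 'subject' has no stopwords

print(hits("a mockingbird", docs, consistent))  # ['doc1']
print(hits("a mockingbird", docs, mixed))       # []
```

<p>With the mixed setup, “a” becomes a clause that only <code>subject</code> can satisfy; this document’s subject doesn’t contain “a”, so the whole query fails even though the title is a perfect match.</p>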
<p></p>
<p>There’s a bit more information in <a href="http://n3.nabble.com/Dismax-Minimum-Match-Stopwords-Bug-td493483.html">this Solr listserv thread</a>.</p>
<p></p>
<p>If the fields in a dismax qf all have stopwords configured, but with <em>different</em> stopword lists, the results can be even <em>more</em> confusing: a word that is on some lists but not others produces a clause that can only match in the fields that kept it.</p>
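<p>A quick plain-Python illustration of that, using the same simplified assumption that each query word becomes a clause restricted to the fields whose analyzer keeps it. The two field names and their stopword lists are hypothetical.</p>

```python
def build_clauses(query, field_stopwords):
    """One clause per query word, listing only the qf fields whose analyzer
    still emits that word. A word every field drops yields no clause."""
    clauses = []
    for word in query.lower().split():
        fields = [f for f, stops in field_stopwords.items() if word not in stops]
        if fields:
            clauses.append((word, fields))
    return clauses


# Hypothetical fields with overlapping but different stopword lists.
field_stopwords = {"title": {"a", "the"}, "subject": {"the", "of"}}
for word, fields in build_clauses("the history of a war", field_stopwords):
    print(word, fields)
# history ['title', 'subject']
# of ['title']
# a ['subject']
# war ['title', 'subject']
```

<p>“the” disappears entirely, but “of” can now only ever match in <code>title</code> and “a” only in <code>subject</code>, so whether a document survives mm depends on which stray stopwords happen to appear in which field.</p>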
<p></p>
<p>The solution?</p>
<p></p>
<p>If you are using dismax, make sure all fields included in a qf have <em>exactly the same</em> stopwords settings. Either they all need stopwords configured with the same stopwords file, or none of them should have stopwords configured at all.</p>
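<p>One way to keep yourself honest is a small check run against your qf. This is a hypothetical helper, not part of Solr: it assumes you maintain a hand-written mapping from each field to its stopwords file (or <code>None</code> for no stop filter), for example copied from your schema. It does understand Solr’s space-separated qf syntax with optional <code>^boost</code> suffixes.</p>

```python
def qf_stopword_groups(qf, stopword_config):
    """Group the fields of a dismax qf string by their stopword setting.

    qf uses Solr's space-separated syntax with optional ^boosts, e.g.
    "title^2 subject". stopword_config is a hand-maintained (hypothetical)
    mapping from field name to its stopwords file, or None when the field
    has no stop filter. A field missing from the mapping raises KeyError.
    """
    groups = {}
    for entry in qf.split():
        field = entry.split("^")[0]  # strip any boost
        groups.setdefault(stopword_config[field], []).append(field)
    return groups


def qf_is_consistent(qf, stopword_config):
    """True when every qf field shares exactly the same stopword setting."""
    return len(qf_stopword_groups(qf, stopword_config)) <= 1


config = {"title": "stopwords.txt", "author": "stopwords.txt", "subject": None}
print(qf_is_consistent("title^2 author", config))          # True
print(qf_is_consistent("title^2 author subject", config))  # False
print(qf_stopword_groups("title^2 author subject", config))
# {'stopwords.txt': ['title', 'author'], None: ['subject']}
```

<p>Raising on an unknown field is a deliberate choice: silently treating it as “no stopwords” would hide exactly the kind of mismatch this check is meant to catch.</p>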
<p></p>
<p>Just not using stopwords seems like the simplest solution to me. What’s the reason for stopwords in the first place? Generally performance: a search clause on a very common word matches a huge result set, which slows down Lucene/Solr. My Solr is not as performant as I’d like, it’s true, but there are <em>a whole bunch</em> of different things I really need to look at for performance (so many that it’s kind of overwhelming to consider, honestly). Since using stopwords would make my Solr configuration more confusing and error-prone, I think assuming, without any kind of profiling, that the lack of stopwords is my most important bottleneck would be “premature optimization”. So no stopwords for now.</p>
<p></p>
<p>Erik Hatcher suggested in an IRC chat that if very common words are a performance bottleneck, then rather than stopwords it might make more sense to investigate Solr’s (or Lucene’s?) “commongrams” capability. I need to put that on my list to look into; I know little about it. I get the basic concept, but don’t know how it’s implemented in Solr/Lucene or how to set it up.</p>
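<p>For reference, the basic idea behind common grams fits in a few lines of plain Python. This is an illustration of the concept only, not Lucene’s implementation (Solr exposes the real thing through <code>CommonGramsFilterFactory</code> and <code>CommonGramsQueryFilterFactory</code>, whose token and position handling differ in detail): common words are kept, but they are also glued to their neighbours into bigram tokens, so a query can look up a much rarer bigram instead of the huge posting list of the common word. The common-word list and the <code>_</code> separator are assumptions.</p>

```python
def _is_pair(tokens, i, common_words):
    """True if tokens[i] and tokens[i + 1] form a bigram (either word is common)."""
    return (len(tokens) > i + 1
            and (tokens[i] in common_words or tokens[i + 1] in common_words))


def common_grams_index(tokens, common_words):
    """Index side: keep every token and add a bigram after each common pair."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if _is_pair(tokens, i, common_words):
            out.append(f"{tok}_{tokens[i + 1]}")
    return out


def common_grams_query(tokens, common_words):
    """Query side: prefer bigrams; keep a single word only when it is not
    part of any bigram."""
    out = []
    for i, tok in enumerate(tokens):
        covered = _is_pair(tokens, i, common_words) or (
            i > 0 and _is_pair(tokens, i - 1, common_words))
        if not covered:
            out.append(tok)
        if _is_pair(tokens, i, common_words):
            out.append(f"{tok}_{tokens[i + 1]}")
    return out


COMMON = {"a", "the", "of"}  # illustrative common-word list
words = "kill a mockingbird".split()
print(common_grams_index(words, COMMON))  # ['kill', 'kill_a', 'a', 'a_mockingbird', 'mockingbird']
print(common_grams_query(words, COMMON))  # ['kill_a', 'a_mockingbird']
```

<p>Unlike plain stopwords, nothing is thrown away: “a” is still indexed, so the dismax mismatch described above can’t arise from it, while a phrase query only has to intersect the posting lists for <code>kill_a</code> and <code>a_mockingbird</code>.</p>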