Solr 4.0: Realtime GET

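This article introduces the Realtime Get feature added in Solr 4.0, which lets users see data that has not yet been fully indexed, enabling near real-time data retrieval. By configuring the transaction log and the Realtime Get handler, users can access the latest data immediately after a change, without waiting for indexing to complete. The feature is particularly suited to scenarios with frequent updates, offering fast responses and efficient performance.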


The next functionality I decided to look at, from the upcoming Solr 4.0, is the so-called “Realtime Get”. It allows you to see data even though it has not yet been added to the index, that is, before a commit operation has been sent to Solr. Let’s see how it works.

Some theory

Data updates in Lucene and Solr have one disadvantage: when you submit index updates, they can’t be seen until a commit operation is run. The problem is that a commit is costly in terms of performance, and intense committing may cause performance problems. So, when you need your data to be visible right after being changed, you may be forced to choose between performance and fast updates. To address that, Lucene and Solr are working towards enabling Near Real Time (NRT) searching. Lucene already has that capability; in Solr 4.0 we will also be able to use it, and not only that.

Configuration

In order to use Realtime Get functionality we need to configure the following Solr features:

Transaction log

The first thing to configure is the transaction log writing. In order to do that you need to add the following to your updateHandler configuration:

<updateLog>
  <str name="dir">${solr.data.dir:}</str>
</updateLog>

The above entry says that the directory holding the transaction log will be located in the same directory as the index directory.

Realtime Get handler

The second thing that needs to be done to see Realtime Get in action is to configure the appropriate handler (or to add the component to an already defined handler). To do that, add the following to your solrconfig.xml file:

<requestHandler name="/get" class="solr.RealTimeGetHandler">
  <lst name="defaults">
    <str name="omitHeader">true</str>
  </lst>
</requestHandler>

The above entry is nothing unusual: it just adds a new request handler backed by the solr.RealTimeGetHandler class, which enables checking the transaction log.
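Since both pieces of configuration have to be present, a quick sanity check can be handy. The following is only a minimal sketch: the SAMPLE_SOLRCONFIG string is a hypothetical, trimmed-down solrconfig.xml made up for this illustration, and check_realtime_get_config is a helper name I invented, not anything shipped with Solr.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down solrconfig.xml used only for this illustration.
SAMPLE_SOLRCONFIG = """
<config>
  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog>
      <str name="dir">${solr.data.dir:}</str>
    </updateLog>
  </updateHandler>
  <requestHandler name="/get" class="solr.RealTimeGetHandler">
    <lst name="defaults">
      <str name="omitHeader">true</str>
    </lst>
  </requestHandler>
</config>
"""


def check_realtime_get_config(xml_text):
    """Return (has_update_log, has_get_handler) for a solrconfig.xml string."""
    root = ET.fromstring(xml_text)
    # The transaction log is declared as <updateLog> inside <updateHandler>.
    has_update_log = root.find("./updateHandler/updateLog") is not None
    # Look for any request handler backed by solr.RealTimeGetHandler.
    has_get_handler = any(
        handler.get("class") == "solr.RealTimeGetHandler"
        for handler in root.iter("requestHandler")
    )
    return has_update_log, has_get_handler


print(check_realtime_get_config(SAMPLE_SOLRCONFIG))
```

To check a real file, you would read its contents and pass the text to the same function.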

Action

To check how Realtime Get works, I decided to do a simple test. The first thing I did was to index one file (one of those available in the exampledocs directory) using the following bash command:

curl 'http://localhost:8983/solr/update' -d @hd.xml -H 'Content-type:application/xml'

Of course, I did not send a commit operation after indexing. As we could expect, the following query:

didn’t return any search results. So let’s check whether the handler registered as /get will be able to get us some results. In order to do that, I sent the following query:

As a result, I got the following document:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<doc name="doc">
  <str name="id">SP2514N</str>
  <str name="name">Samsung SpinPoint P120 SP2514N - hard drive - 250 GB - ATA-133</str>
  <str name="manu">Samsung Electronics Co. Ltd.</str>
  <str name="manu_id_s">samsung</str>
  <arr name="cat">
    <str>electronics</str>
    <str>hard drive</str>
  </arr>
  <arr name="features">
    <str>7200RPM, 8MB cache, IDE Ultra ATA-133</str>
    <str>NoiseGuard, SilentSeek technology, Fluid Dynamic Bearing (FDB) motor</str>
  </arr>
  <float name="price">92.0</float>
  <int name="popularity">6</int>
  <bool name="inStock">true</bool>
  <date name="manufacturedate_dt">2006-02-13T15:26:37Z</date>
  <str name="store">35.0752,-97.032</str>
</doc>
</response>

So Solr returned a document that hadn’t been added to the index yet. Nice!
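If you want to use such a response in your own code, it can be turned into a plain dictionary. Below is a minimal sketch, assuming the XML response format shown above; SAMPLE_RESPONSE is a shortened copy of that response, and parse_realtime_get is a hypothetical helper name of mine, not part of any Solr client library.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the /get response shown above.
SAMPLE_RESPONSE = """<?xml version="1.0" encoding="UTF-8"?>
<response>
<doc name="doc">
  <str name="id">SP2514N</str>
  <arr name="cat">
    <str>electronics</str>
    <str>hard drive</str>
  </arr>
  <float name="price">92.0</float>
  <int name="popularity">6</int>
  <bool name="inStock">true</bool>
</doc>
</response>
"""


def _convert(element):
    """Convert one typed Solr XML value element to a Python value."""
    if element.tag == "arr":
        return [_convert(child) for child in element]
    text = element.text or ""
    if element.tag in ("int", "long"):
        return int(text)
    if element.tag in ("float", "double"):
        return float(text)
    if element.tag == "bool":
        return text == "true"
    return text  # str, date and anything else stay as plain text


def parse_realtime_get(xml_text):
    """Return the document as a dict, or None if there is no <doc> element."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    doc = root.find("doc")
    if doc is None:
        return None
    return {field.get("name"): _convert(field) for field in doc}


print(parse_realtime_get(SAMPLE_RESPONSE))
```

Dates are deliberately left as strings here; converting them is a design choice that depends on how your application handles time zones.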

Usage possibilities

You probably noticed that in order to fetch a document with the /get handler I needed to provide its unique identifier (or a list of identifiers). That’s true: Realtime Get doesn’t support searching, because it was not created to support full searching. This functionality can show us updates to documents whose identifiers are known (for example, the ones already in the index), for instance by adding the component used by solr.RealTimeGetHandler to any of your defined handlers. And the good news is that you don’t have to worry about update performance, because Realtime Get is very fast. So, if one of your problems is frequent updates, you can look to the future with a smile :)
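To make the “identifier or identifiers list” part more concrete, here is a minimal sketch of how such request URLs could be built. It assumes the base URL from the update command used earlier and the id and ids request parameters of the /get handler; realtime_get_url is a hypothetical helper name, and the second identifier in the usage lines is only an illustrative value.

```python
from urllib.parse import urlencode

# Assumed base URL, taken from the update command used earlier.
SOLR_BASE = "http://localhost:8983/solr"


def realtime_get_url(doc_ids, base=SOLR_BASE):
    """Build a /get URL for one identifier or a list of identifiers."""
    if isinstance(doc_ids, str):
        doc_ids = [doc_ids]
    if not doc_ids:
        raise ValueError("at least one identifier is required")
    if len(doc_ids) == 1:
        # A single document is requested with the id parameter.
        params = {"id": doc_ids[0]}
    else:
        # Several documents are requested as a comma-separated ids list.
        params = {"ids": ",".join(doc_ids)}
    return f"{base}/get?{urlencode(params)}"


print(realtime_get_url("SP2514N"))
print(realtime_get_url(["SP2514N", "6H500F0"]))
```

Note that urlencode escapes the comma in the ids value as %2C, which is decoded again on the server side.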

Last few words

The Realtime Get functionality brings new possibilities to Solr and is also a step on the road to SolrCloud. With the use of the transaction log, one can implement automatic cluster node recovery or NRT updates of instances. As you can see, Solr 4.0 is not only about search, but also about data storage, bringing Solr closer to NoSQL solutions.

Reposted from: http://solr.pl/en/2012/01/09/solr-4-0-realtime-get-2/
