Two small Lucene problems

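1, RangeQuery
   When we want to query documents that fall within some range, RangeQuery is probably the first thing that comes to mind.
   Say the field we query is a date, stored as a string:
   RangeQuery rQuery = new RangeQuery(new Term("date", "2000-01-01"), new Term("date", "2009-01-01"), true);
   The result is a TooManyClauses exception. Why? Lucene implements RangeQuery by rewriting it into a BooleanQuery. For the query above, that BooleanQuery contains one TermQuery for every indexed value that falls between 2000-01-01 and 2009-01-01. Think about how many that can be, given that BooleanQuery allows only 1024 clauses by default.
   Solution: use RangeFilter.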
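To get a feel for the numbers, here is a minimal sketch in plain Java, not Lucene code. It assumes the index holds one distinct date term per day and uses a TreeSet as a stand-in for the term dictionary; the class and method names are made up for illustration. It counts how many terms the range above would expand to and compares that with the default limit of 1024 clauses.

```java
import java.time.LocalDate;
import java.util.TreeSet;

class RangeExpansionSketch {
    // Default BooleanQuery clause limit in the Lucene versions discussed here.
    static final int DEFAULT_MAX_CLAUSES = 1024;

    /** Stand-in for the term dictionary: one "yyyy-MM-dd" term per day (an assumption). */
    static TreeSet<String> dailyTerms(LocalDate first, LocalDate last) {
        TreeSet<String> terms = new TreeSet<>();
        for (LocalDate d = first; !d.isAfter(last); d = d.plusDays(1)) {
            terms.add(d.toString()); // ISO format, e.g. 2000-01-01
        }
        return terms;
    }

    /** Number of terms (one TermQuery clause each) inside an inclusive range. */
    static int clausesFor(TreeSet<String> terms, String lower, String upper) {
        return terms.subSet(lower, true, upper, true).size();
    }

    public static void main(String[] args) {
        TreeSet<String> terms = dailyTerms(LocalDate.of(1999, 1, 1), LocalDate.of(2009, 12, 31));
        int clauses = clausesFor(terms, "2000-01-01", "2009-01-01");
        System.out.println(clauses + " clauses needed, limit is " + DEFAULT_MAX_CLAUSES);
    }
}
```

With daily granularity the range already needs more than three times the default limit. Finer-grained values such as timestamps would make it far worse, which is why a filter that does not expand into clauses is the safer choice.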

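2, Sort
   Say we want to sort the query results by the date field. The first thing that comes to mind is probably:
   Hits hits = searcher.search(bQuery, new Sort("date"));
   But we are told there is not enough heap space. How can that be? Change it to Hits hits = searcher.search(bQuery); and it runs very fast and uses hardly any memory.
   The reason: Lucene loads the date field of every document into memory. Why? I am not sure. Does it sort first and query afterwards? I will read the source code some day when I have time. The thread reposted below gives the explanation: the values are cached for all documents (in FieldCache) because Lucene cannot know in advance which documents a query will match.
   Solution: query first, then sort the results yourself.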
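"Query first, then sort yourself" could look roughly like the sketch below. It is plain Java rather than Lucene code: the matched document ids are assumed to come from an unsorted search, and loadDate is a hypothetical stand-in for reading the stored date field of one document. Only the documents that matched have their field value loaded.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntFunction;

class SortHitsSketch {
    /** A matched document id paired with the value of its sort field. */
    static final class Hit {
        final int docId;
        final String date;

        Hit(int docId, String date) {
            this.docId = docId;
            this.date = date;
        }
    }

    /**
     * Loads the sort field for the matched documents only, then sorts them.
     * loadDate is a hypothetical callback that returns the "date" value of a document.
     */
    static List<Hit> sortByDate(int[] matchedDocIds, IntFunction<String> loadDate) {
        List<Hit> hits = new ArrayList<>();
        for (int docId : matchedDocIds) {
            hits.add(new Hit(docId, loadDate.apply(docId)));
        }
        // yyyy-MM-dd strings sort chronologically when compared as plain strings
        hits.sort(Comparator.comparing((Hit h) -> h.date));
        return hits;
    }
}
```

Memory use now grows with the number of hits instead of the number of documents in the index. The price is one field read per hit, so this only pays off when a query matches a small part of the index.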


------------------------------------------reposted----------------------------------------
http://www.nabble.com/java.lang.OutOfMemoryError:-Java-heap-space-when-sorting-the-fields-td16121128.html

java.lang.OutOfMemoryError: Java heap space when sorting the fields

java.lang.OutOfMemoryError: Java heap space when sorting the fields  by sandyg Mar 18, 2008; 09:23pm

This is my search code:

QueryParser parser = new QueryParser("keyword", new StandardAnalyzer());
Query query = parser.parse("1");

Sort sort = new Sort(new SortField(sortField));
Hits hits = searcher.search(query, sort);

I have a huge amount of data, about 13 million records.
I am not sure why it throws an out-of-memory exception,
while there is no exception when no sorting is done.
Please, someone help me.

Also, if I have to increase the heap space, how can I do it programmatically? I only know the command-line form:
java -Xms<initial heap size> -Xmx<maximum heap size>
Please.....

Re: java.lang.OutOfMemoryError: Java heap space when sorting the fields  by chrislusf Mar 18, 2008; 11:19pm

This is because sorting will load all values of that sort field into memory.

If it's an integer, you will need 4*N bytes, which is an additional 52 MB for you.

There is no programmatic way to increase the memory size.

--
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request)
got 2.6 Million Euro funding!
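
The 4*N figure from this reply is easy to check. The sketch below is only an illustration with made-up class and method names: it computes the cache size for an int or long sort field and reads the heap limit the JVM was started with. Runtime.maxMemory() reports the limit set by -Xmx, but, as the reply says, a running program cannot raise it.

```java
class FieldCacheEstimate {
    /** Bytes needed to cache one 32-bit int per document. */
    static long intCacheBytes(long numDocs) {
        return numDocs * Integer.BYTES;
    }

    /** Bytes needed to cache one 64-bit long per document. */
    static long longCacheBytes(long numDocs) {
        return numDocs * Long.BYTES;
    }

    /** True if the estimate is below the maximum heap the JVM may use. */
    static boolean fitsInHeap(long bytes) {
        return bytes < Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        long numDocs = 13_000_000L; // index size mentioned in the thread
        long bytes = intCacheBytes(numDocs);
        System.out.println("int sort cache: " + bytes + " bytes");
        System.out.println("max heap:       " + Runtime.getRuntime().maxMemory() + " bytes");
        System.out.println("fits in heap:   " + fitsInHeap(bytes));
    }
}
```

The estimate covers only the raw value array. A string sort field needs more, because the unique terms themselves have to be kept as well.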


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@...
For additional commands, e-mail: java-user-help@...

 

Re: java.lang.OutOfMemoryError: Java heap space when sorting the fields  by markrmiller Mar 19, 2008; 12:20am

To sort on 13 million docs will take at least 400 MB for the field
cache. That's if you only sort on one field...it can grow fast if you
allow multi-field sorting.

How much RAM are you giving your app?


 

Re: java.lang.OutOfMemoryError: Java heap space when sorting the fields  by markrmiller Mar 19, 2008; 12:26am

Whoops...10 times too much there. More like 40 MB, I think. A string
sort could be a bit higher though: you also need to store all of the terms
to index into.



 

Re: java.lang.OutOfMemoryError: Java heap space when sorting the fields  by sandyg Mar 19, 2008; 02:33pm

Thanks for the reply.
Actually, I am sorting on one specific field, the keyword field, which is unique,
and I have 1 GB of RAM.


Re: java.lang.OutOfMemoryError: Java heap space when sorting the fields  by sandyg Mar 19, 2008; 02:37pm

How can I sort only the results I get (if the hits are already there, how do I sort those hits), instead of sorting on all the values before getting the results?


Re: java.lang.OutOfMemoryError: Java heap space when sorting the fields  by sandyg Mar 19, 2008; 03:04pm

Hi,
Is it not possible to sort on the results we get, instead of on all the values? For example, after
Hits hits = searcher.search(query);
it would be good to sort just the hits, i.e. the result,
because my sorting is based on a specific field, and that field should be sorted when I click on it.
Thanks for the reply.



Re: java.lang.OutOfMemoryError: Java heap space when sorting the fields  by markrmiller Mar 20, 2008; 04:22am

Here's what happens: in order to sort all of the hits you get back on a
field, you need to get the value of that field for comparisons, right?
Well, it turns out that reading a field value from the index is pretty
slow (it's on the disk after all)...so Lucene will read all of the terms
in the field off disk once (on the first sort/search request) and cache
them. So if your field is an Int you're talking numDocs*32 bits for your
cache. For a Long field it's numDocs*64. For a String field Lucene caches
a String array with every unique term and then an int array indexing
into the term array.

You might think, if I only ask for the top 10 docs, don't I only read 10
field values? But of course you don't know what docs will be returned as
each search comes in...so you have to cache them all.
Like the other answer said, one field is going to be roughly 50 MB. You
can do the math yourself...I got the 40 doing some rough calculation in my
head, but around 50 MB is probably closer.

So anyway, you *are* only sorting on the hits you get back...but you'll
get different hits all the time, so you still have to cache all the field
values. The FieldCache class does this, and that's what's taking up the
RAM. You should have 50 MB to give it though, I would assume...how much RAM
are you giving the JVM?

- mark
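
The String case described in this message (an array of unique terms plus an int array pointing into it) can be sketched as follows. This is a simplified illustration in plain Java and not Lucene's actual FieldCache code. The class name is made up, and it assumes every document has exactly one non-null value for the field.

```java
import java.util.Arrays;
import java.util.TreeSet;

class StringSortCacheSketch {
    final String[] lookup; // every unique term, in sorted order
    final int[] order;     // order[docId] = position of that document's term in lookup

    /** valueByDoc[docId] is the field value of that document (assumed non-null). */
    StringSortCacheSketch(String[] valueByDoc) {
        lookup = new TreeSet<>(Arrays.asList(valueByDoc)).toArray(new String[0]);
        order = new int[valueByDoc.length];
        for (int doc = 0; doc < valueByDoc.length; doc++) {
            // lookup is sorted and contains every value, so the search always succeeds
            order[doc] = Arrays.binarySearch(lookup, valueByDoc[doc]);
        }
    }

    /** Compares two documents by their positions in lookup; no string comparison needed. */
    int compareDocs(int docA, int docB) {
        return Integer.compare(order[docA], order[docB]);
    }
}
```

The int array costs 4 bytes per document no matter what, and the term array grows with the number of distinct values. For a unique field such as the keyword field in this thread, that means one cached string per document on top of the ints.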


 

Re: java.lang.OutOfMemoryError: Java heap space when sorting the fields  by Daniel Noll-3 Mar 20, 2008; 06:41am

On Thursday 20 March 2008 07:22:27 Mark Miller wrote:
> You might think, if I only ask for the top 10 docs, don't i only read 10
> field values? But of course you don't know what docs will be returned as
> each search comes in...so you have to cache them all.

If it lazily cached one field value at a time, I guess this wouldn't be a problem...
but you would still eventually run out of memory after performing a
certain number of queries.

I guess one solution would be a FieldCache backed by storage on hard disk.

Daniel


 

Re: java.lang.OutOfMemoryError: Java heap space when sorting the fields  by hossman Mar 20, 2008; 07:53am

: You might think, if I only ask for the top 10 docs, don't i only read 10 field
: values? But of course you don't know what docs will be returned as each search
: comes in...so you have to cache them all.

Arguments have been made in the past that when you have an index
large enough that the FieldCache is prohibitively expensive, and you
expect the number of *matching* documents to be significantly smaller than
the total number of documents, it might make sense to sort using
lazy-loaded "stored" field values ... it's a pretty radical shift from the
current sorting approach, and would require some explicit work on the part
of the indexing/searching code, but a patch along those lines was created a
while back...

https://issues.apache.org/jira/browse/LUCENE-769

...based on the last comment I posted in that issue, I clearly
thought it showed a lot of promise, but needed to be cleaned up. I admittedly
lost track of it as other things took priority in life. If people want
to dust it off, try it out, and make comments about its effectiveness, I'm
sure it could be revived.

 


-Hoss



 

 

