Re: Sorting with Zend Lucene

Matt Schraeder

Re: Sorting with Zend Lucene

My current memory limit is 750 MB.  We're trying to do the sort WITHOUT sending the entire result set to the user.  Ideally we want a sorted result set, then pass pages of 10, 20, 30, 40, 50, etc results back to the user.  But I can't figure out a way of getting the results returned by Lucene to be sorted without running out of memory.

>>> till <[hidden email]> 7/1/2009 10:10:32 AM >>>
On Wed, Jul 1, 2009 at 5:11 PM, Matt Schraeder<[hidden email]> wrote:
> I'm having a rather drastic issue with sorting using Zend Search Lucene.
> Our index has 18,000 documents, so light as far as Lucene is concerned.  I
> can perform a search that will pull back all of our records with no problem.
>
> However, once I attempt to add a sort field (Sorting by Title for instance),
> the script crashes with an out of memory exception.  What is the best way to
> handle sorting to avoid running out of memory?

Not sure if it's helpful, but maybe sort on the client side with JS?

Also, what is your current memory_limit, and how much more does it require?

Till
mgkimsal

Re: Sorting with Zend Lucene

This seems a bit odd, as I've had SOLR (which uses Lucene) deal with far more than that and sorted without problem.

Open question - for those of you using Lucene with the Zend Framework library, what's been the largest set you've dealt with, and what was your performance like (sorting, etc)?  I've only personally known people using the ZFLucene stuff with under 20k, and they'd complained about performance (which was likely slow because of large amounts of memory being used, like you're seeing).

I see a couple of options:
1.  Don't try to pull all your fields back at once - just query for a doc id and the one field you want to sort by, and do your query that way.  Then figure out the doc ids in the range/slice you want, and then query for the full documents for each id.

2.  Investigate using Solr as the Lucene server.  This might not be an option for you, but if it is, it might be worth considering.

#1 might be easier to implement, but if the speed still isn't good, you may have to consider other options (like #2, or looking at Sphinx, or something else).
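
Option #1 can be sketched roughly as follows. This is only an illustration under stated assumptions: `sortedPageIds()` and `$loadDocument` are hypothetical names, the `$titles` array stands in for whatever a lightweight "id plus sort field" query would return, and none of it is Zend_Search_Lucene API. The idea is to sort only the small id => title map, slice out the ids for one page, and load full documents for just those ids.

```php
<?php
// Hypothetical helper: $sortKeysById maps a document id to the single
// field value used for sorting (for example, the title).
function sortedPageIds(array $sortKeysById, $page, $perPage)
{
    // Case-insensitive string sort that keeps the id => key association.
    asort($sortKeysById, SORT_STRING | SORT_FLAG_CASE);

    // Keep only the ids that belong to the requested page (1-based).
    $offset = ($page - 1) * $perPage;
    return array_slice(array_keys($sortKeysById), $offset, $perPage);
}

// Hypothetical loader: in a real application this would fetch the full
// document for one id from the index.
$loadDocument = function ($id) {
    return array('id' => $id, 'body' => "full document $id");
};

// Stand-in data: id => title, as it might come back from a light query.
$titles = array(7 => 'Zebra', 3 => 'apple', 12 => 'Mango', 5 => 'banana');

// Ids for page 1 with two results per page, then only those documents.
$pageIds = sortedPageIds($titles, 1, 2);
$pageDocs = array_map($loadDocument, $pageIds);
```

Only the id => title map is held in memory for the whole result set; full documents are loaded for one page at a time. Whether this keeps memory low enough in practice depends on how cheaply the ids and sort field can be read from the index.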

Hopefully someone can point you to an easier solution. 


--
Michael Kimsal
http://jsmag.com - for javascript developers
http://groovymag.com - for groovy developers
919.827.4724
till

Re: Sorting with Zend Lucene

On Wed, Jul 1, 2009 at 5:23 PM, Michael Kimsal<[hidden email]> wrote:

> This seems a bit odd, as I've had SOLR (which uses Lucene) deal with far
> more than that and sorted without problem.
>
> Open question - for those of you using Lucene with the Zend Framework
> library, what's been the largest set you've dealt with, and what was your
> performance like (sorting, etc)?  I've only personally known people using
> the ZFLucene stuff with under 20k, and they'd complained about performance
> (which was likely slow because of large amounts of memory being used, like
> you're seeing).
>
> I see a couple of options:
> 1.  Don't try to pull all your fields back at once - just query for a doc id
> and the one field you want to sort by, and do your query that way.  Then
> figure out the doc ids in the range/slice you want, and then query for the
> full documents for each id.
>
> 2.  investigate using solr as the lucene server.  this might not be an
> option for you, but if it is, it might be worth considering.

I can second that. We are currently using Solr with the implementation
in ezComponents.

On a side-note, there has been an article in a recent issue of
php|Architect which dealt with many caveats of Zend_Search_Lucene.
Maybe backorder the issue (or get the PDF) and check it out.

Till
Matt Schraeder

Re: Sorting with Zend Lucene

On closer inspection, it seems that foreaching through the hits alone is enough to cause memory usage to climb.
 
    foreach ($hits as $hit) {
        echo $hit->SomeData;
        echo '<br>' . memory_get_usage() . '<br>';
        @ob_flush();
        @flush();
    }
 
It looks as if each time through the loop, the previous $hit is still in memory.

pmjones

Re: Sorting with Zend Lucene


On Jul 1, 2009, at 12:26 , Matt Schraeder wrote:

> On closer inspection, it seems that foreaching through the hits
> alone is enough to cause memory usage to climb.
>
>     foreach ($hits as $hit) {
>      echo $hit->SomeData;
>      echo '<br>'.memory_get_usage().'<br>';
>      @ob_flush();
>      @flush();
>     }
>
> It looks as if each time through the loop, the previous $hit is  
> still in memory.

Perhaps this blog post has relevance:

   http://paul-m-jones.com/?p=262

Excerpt:

> If you have two objects in circular reference, such as in a parent-
> child relationship, calling unset() on the parent object will not  
> free the memory used for the parent reference in the child object.  
> (Nor will the memory be freed when the parent object is garbage-
> collected.)
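
To make the excerpt concrete, here is a minimal sketch, assuming PHP 5.3 or later, where `gc_collect_cycles()` is available. `ParentNode`, `ChildNode`, and `makeCycle()` are hypothetical names invented for the illustration; they are not Zend_Search_Lucene classes, and nothing in this thread confirms that the hit objects actually form such cycles. The sketch shows two ways of dealing with a parent/child cycle: letting the cycle collector reclaim it, or breaking the reference by hand before dropping the object.

```php
<?php
// Hypothetical classes that only exist to build a parent <-> child cycle.
class ParentNode
{
    public $child;
}

class ChildNode
{
    public $parent;
}

function makeCycle()
{
    $parent = new ParentNode();
    $child = new ChildNode();
    $parent->child = $child;
    $child->parent = $parent; // closes the circular reference
    return $parent;
}

gc_collect_cycles(); // start from a clean state

// Build and drop 100 cycles. unset() removes the variable, but each pair
// still references the other, so reference counting alone cannot free them.
for ($i = 0; $i < 100; $i++) {
    $parent = makeCycle();
    unset($parent);
}
$collected = gc_collect_cycles(); // the cycle collector reclaims them here

// Alternative: break the cycle by hand before dropping the object, so the
// plain reference counter can free both objects right away.
for ($i = 0; $i < 100; $i++) {
    $parent = makeCycle();
    $parent->child->parent = null;
    unset($parent);
}
$leftover = gc_collect_cycles(); // expected to find little or nothing
```

On PHP versions without a cycle collector, only the second approach (clearing the back-reference explicitly) is available.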


--

Paul M. Jones
http://paul-m-jones.com/