Zend_Search_Lucene: wildcard search

Ralf Eggert

Zend_Search_Lucene: wildcard search

Hi again,

sorry for writing so many new mails about Zend_Search_Lucene, but
questions come up consecutively while working with it.

In the documentation there is a chapter about wildcard search:

http://framework.zend.com/manual/en/zend.search.lucene.query-api.html#zend.search.lucene.queries.wildcard

When I try to use this, I get a fatal error because the class
Zend_Search_Lucene_Search_Query_Wildcard does not exist.

Is the documentation outdated? How can I process a wildcard search now?

I downloaded Luke 0.7.1 to access the index directly. I am able to enter
a search expression like this:

    contents:whatever AND (destination:1.40.44.* OR site:2)

This is parsed to

    +contents:whatever +(destination:1.40.44.* site:2)

Does Zend_Search_Lucene support this kind of query yet? If so, how
should I build the query with the Query Construction API?

Thanks and Best Regards,

Ralf
Ralf Eggert

Re: Zend_Search_Lucene: wildcard search

Hi,

This is weird.

> Is the documentation outdated? How can I process a wildcard search now?

In ZF 1.0.3 the Zend_Search_Lucene_Search_Query_Wildcard class is
missing. But according to SVN, the class was changed on 17.07.2007,
so it should be available in the current version.

Why is the file missing?

Best Regards,

Ralf

Ralf Eggert

Re: Zend_Search_Lucene: combine wildcard search with other terms?

Hi,

I downloaded the current snapshot which has the class
Zend_Search_Lucene_Search_Query_Wildcard included. So now, my wildcard
search works. But unfortunately only for a single term.

What do I need to do if I want to use more complex queries
like this one, which combines a wildcard search with AND and OR operators:

  contents:whatever AND (destination:1.40.44.* OR site:2)

or

  +contents:whatever +(destination:1.40.44.* site:2)

Any hints on how I can extend Zend_Search_Lucene to get this working?

Best Regards,

Ralf

Alexander Veremyev

RE: Zend_Search_Lucene: combine wildcard search with other terms?

Could you give an example of queries which don't work?


With best regards,
   Alexander Veremyev.

> -----Original Message-----
> From: Ralf Eggert [mailto:[hidden email]]
> Sent: Friday, January 04, 2008 3:00 PM
> To: [hidden email]
> Subject: Re: [fw-formats] Zend_Search_Lucene: combine wildcard search with
> other terms?
 
Ralf Eggert

Re: Zend_Search_Lucene: combine wildcard search with other terms?

Hi Alexander,

sorry, must have missed your mail. Thanks for your reply.

What I would like to do is send a query that combines the wildcard
search with other terms and boolean operators:

+contents:whatever +(destination:1.40.44.* site:2)

This should get all documents with the term "whatever" in the contents
field AND (the wildcard value "1.40.44.*" in the destination field OR
the value "2" in the site field).

I was able to successfully process such a query with Luke 0.7.1 but I am
afraid this is not possible with Zend_Search_Lucene (yet).

If I am right that this is not possible yet, and you are too busy to solve
this issue in the near future, maybe you could be so kind as to assist me
in solving it myself. I really do need this feature as soon as
possible. Please advise and point me in the right direction!

Thanks and Best Regards,

Ralf



Alexander Veremyev

RE: Zend_Search_Lucene: combine wildcard search with other terms?

In reply to this post by Ralf Eggert
Hi Ralf,

Queries like "+contents:whatever +(destination:1.40.44.* site:2)" should work correctly.

The problem is in the parsing of "1.40.44.*".
1) You should use the TextNum analyzer for both indexing and searching if you want numbers to be interpreted as parts of terms.
2) '.' is treated as a word delimiter. So 'destination:1.40.44.xx' is transformed into the phrase 'destination:"1 40 44 xx"', but if you use 'destination:1.40.44.*' you will get the exception 'Wildcard search is supported only for non-multiple word terms'.
Use your own analyzer or change '.' to some letter.
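
The second workaround (changing '.' to some letter) could look roughly like the sketch below. It assumes the same substitution is applied to the field value at indexing time and to the search pattern, and that the analyzer keeps letters and digits together in one token, as point 1 requires. The helper name encodeDottedValue and the letter 'x' are illustrative choices, not part of Zend_Search_Lucene.

```php
<?php
/**
 * Replace '.' with a letter so that an analyzer which treats '.' as a
 * word delimiter keeps the whole value as a single term.
 * Hypothetical helper; the replacement letter is an arbitrary choice.
 */
function encodeDottedValue($value, $replacement = 'x')
{
    return str_replace('.', $replacement, $value);
}

// Apply the same encoding when indexing and when searching.
$indexedValue  = encodeDottedValue('1.40.44.17'); // value stored in the "destination" field
$searchPattern = encodeDottedValue('1.40.44.*');  // wildcard pattern for the same field

$queryString = '+contents:whatever +(destination:' . $searchPattern . ' site:2)';
echo $queryString, "\n"; // +contents:whatever +(destination:1x40x44x* site:2)
```

The replacement letter must not already occur in the values being encoded, otherwise the encoding is ambiguous.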


PS: Keyword fields are intended for this case, but the Zend_Search_Lucene query parser doesn't support non-tokenized fields yet (see http://framework.zend.com/issues/browse/ZF-623 for details).


With best regards,
   Alexander Veremyev.



> -----Original Message-----
> From: Ralf Eggert [mailto:[hidden email]]
> Sent: Thursday, January 17, 2008 10:56 AM
> To: [hidden email]
> Cc: Alexander Veremyev
> Subject: Re: [fw-formats] Zend_Search_Lucene: combine
> wildcard search with other terms?
>
 
Ralf Eggert

Re: Zend_Search_Lucene: combine wildcard search with other terms?

Hi Alexander,

thanks, I will give it a try during the next week and send a note to the
list whether I got it to work or not.

Best Regards,

Ralf


Ralf Eggert

Re: Zend_Search_Lucene: combine wildcard search with other terms?

In reply to this post by Alexander Veremyev
Hi Alexander,

after an upgrade to ZF 1.5.0PR and changing the "1.40.44.*" to
"1x40x44x*" I basically got the wildcard search running. But now I
encountered other problems.

First, I would like to build this query with the Query Construction API
but I don't know how to do it.

  +contents:whatever +(destination:1x40x44x* site:2)

Second, when I use the query string above directly I get different
results between Zend_Search_Lucene and Luke. It seems as if Luke
presents the correct results, while Zend_Search_Lucene passes all
documents that match only "+contents:whatever". The rest of the query
string seems to be ignored.

Third, so far I have indexed only 3,000 of 150,000 documents and optimized
the index afterwards. While Luke shows the results almost immediately,
Zend_Search_Lucene already takes 1 second to find them. Now I am
afraid that, because of my wildcard search construct, the search time will
rise even more once all 150,000 documents are indexed.

Any ideas on how to solve any of these problems?

Thanks and Best Regards,

Ralf
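
For the first question in the post above, one way the query might be assembled with the Query Construction API is sketched here. It assumes Zend Framework 1.x with the Zend_Search_Lucene query classes already loaded (through an autoloader or require_once); the function name buildDestinationQuery is hypothetical. Terms built this way are not passed through the analyzer, so they have to match the tokens as they were indexed.

```php
<?php
/**
 * Build "+contents:<word> +(destination:<pattern> site:<site>)" as a
 * query object. Assumes the Zend Framework 1.x Zend_Search_Lucene
 * classes are available; the function name is hypothetical.
 */
function buildDestinationQuery($word, $destinationPattern, $site)
{
    // Plain term on the contents field
    $contents = new Zend_Search_Lucene_Search_Query_Term(
        new Zend_Search_Lucene_Index_Term($word, 'contents')
    );

    // Wildcard pattern on the destination field, e.g. "1x40x44x*"
    $destination = new Zend_Search_Lucene_Search_Query_Wildcard(
        new Zend_Search_Lucene_Index_Term($destinationPattern, 'destination')
    );

    // Plain term on the site field
    $siteQuery = new Zend_Search_Lucene_Search_Query_Term(
        new Zend_Search_Lucene_Index_Term($site, 'site')
    );

    // (destination:... site:...): both subqueries optional, i.e. OR
    $either = new Zend_Search_Lucene_Search_Query_Boolean();
    $either->addSubquery($destination, null);
    $either->addSubquery($siteQuery, null);

    // Both top-level clauses required, i.e. AND
    $query = new Zend_Search_Lucene_Search_Query_Boolean();
    $query->addSubquery($contents, true);
    $query->addSubquery($either, true);

    return $query;
}

// Usage, assuming $index was opened with Zend_Search_Lucene::open():
// $hits = $index->find(buildDestinationQuery('whatever', '1x40x44x*', '2'));
```

In addSubquery() the second argument is the sign: true marks the subquery as required, false as prohibited, and null as optional.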

dinok

Re: Zend_Search_Lucene: wildcard search

In reply to this post by Ralf Eggert
Hi guys,

I'm really surprised by the current version. The search runs very well and returns good results. The numeric function is working now too, thanks for the utf8num analyzer.
But the wildcard search brings one problem with it. If I search for
"php" in an index with about 1000 documents, I get the result (about 100 hits) in 0.2 seconds.
This is really nice! But when I try "php *", the search is very slow (it takes about 7 seconds!).
Yet another "deadly query" is "*", which times out after 60 seconds (Fatal error:  Maximum execution time of 60 seconds exceeded in Zend\Search\Lucene\Storage\File.php on line 302).
Now you might say: check whether the query is longer than 3 characters. But what if the query is "*?**??*"? This also times out.
So is there a way to eliminate system-killing queries?
The only solution I have found is to allow only one wildcard per query.
But this doesn't solve "* php" or "php *" or "php (*)" and so on :-/
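
One possible pre-check, sketched under assumptions: instead of counting wildcards or measuring the whole query, look at each word and require a few literal characters before its first wildcard. The function name hasSafeWildcards and the default prefix length of 3 are hypothetical choices, and the check runs on the raw string before it reaches the query parser.

```php
<?php
/**
 * Return true if every whitespace-separated word in $query either has no
 * wildcard or starts with at least $minPrefix non-wildcard characters.
 * Hypothetical pre-check; it does not use Zend_Search_Lucene itself.
 */
function hasSafeWildcards($query, $minPrefix = 3)
{
    $words = preg_split('/\s+/', trim($query), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        // Drop grouping, sign and quote characters so "(*)" is seen as "*"
        $word = trim($word, '()+-"');
        // Ignore a leading field name such as "destination:"
        $colon = strrpos($word, ':');
        if ($colon !== false) {
            $word = (string) substr($word, $colon + 1);
        }
        // Number of characters before the first '*' or '?'
        $prefixLength = strcspn($word, '*?');
        if ($prefixLength < strlen($word) && $prefixLength < $minPrefix) {
            return false;
        }
    }
    return true;
}

var_dump(hasSafeWildcards('php'));     // bool(true)
var_dump(hasSafeWildcards('php *'));   // bool(false)
var_dump(hasSafeWildcards('*?**??*')); // bool(false)
var_dump(hasSafeWildcards('php (*)')); // bool(false)
```

A query that fails the check can be rejected with a message to the user rather than being sent to the index.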

Any ideas?
Best regards
Matthew Ratzloff

Re: Zend_Search_Lucene: wildcard search

Unfortunately we've had several problems with Zend_Search_Lucene, including this.  Ultimately we were forced to filter out asterisks and question marks entirely.
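
A filter of that kind might look like the following sketch. It is not the poster's actual code; the function name stripWildcards is hypothetical, and it simply removes the two wildcard characters, drops any parentheses left empty, and tidies the whitespace.

```php
<?php
/**
 * Remove '*' and '?' from a user-supplied query string, drop parentheses
 * that end up empty, and collapse the whitespace left behind.
 * Hypothetical helper, not taken from Zend_Search_Lucene.
 */
function stripWildcards($query)
{
    $query = str_replace(array('*', '?'), '', $query);

    // "(*)" becomes "()", which is removed; repeat for nested groups
    do {
        $query = preg_replace('/\(\s*\)/', '', $query, -1, $count);
    } while ($count > 0);

    return trim(preg_replace('/\s+/', ' ', $query));
}

var_dump(stripWildcards('php *'));   // string(3) "php"
var_dump(stripWildcards('php (*)')); // string(3) "php"
var_dump(stripWildcards('*?**??*')); // string(0) ""
```

The result can be an empty string, so the caller should check for that before running a search.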

-Matt

In reply to dinok's message of Wed, Jul 16, 2008 at 2:14 AM.


wllm

RE: Zend_Search_Lucene: wildcard search


Is there an issue in the issue tracker for this? If so, please vote on it. If not, please create one (and vote on it :) ).

Time is of the essence; we will be prioritizing issues later this week to fix during our bug squashing next week.

 

,Wil

 

From: Matthew Ratzloff [mailto:[hidden email]]
Sent: Wednesday, July 16, 2008 10:19 AM
To: dinok
Cc: [hidden email]
Subject: Re: [fw-formats] Zend_Search_Lucene: wildcard search

 
