Zend_Lucene + UTF8 search problem... Help!

Maxim Savenko-2

Hi everybody,

I have a problem searching UTF-8 encoded Russian strings with
Zend_Search_Lucene. Here is my short sample code:


require_once 'ZendInit.php';
require_once 'Zend/Search/Lucene.php';
require_once 'Zend/Search/Lucene/Document.php';

// Create index
$index = Zend_Search_Lucene::create('data/index');
$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::Text('samplefield', 'русский текст; english text', 'utf-8'));
$index->addDocument($doc);
$index->commit();

// Open index and search:
$index = Zend_Search_Lucene::open('data/index');
Zend_Search_Lucene_Search_QueryParser::setDefaultEncoding('utf-8');
Zend_Search_Lucene::setDefaultSearchField('samplefield');

// Query the index:
$queryStr = 'english';
$query = Zend_Search_Lucene_Search_QueryParser::parse($queryStr, 'utf-8');
$hits = $index->find($query);
foreach ($hits as $hit) {
    /* @var $hit Zend_Search_Lucene_Search_QueryHit */
    $doc = $hit->getDocument();
    echo $doc->getField('samplefield')->value, PHP_EOL;
}


The 'samplefield' of the document contains a string in two languages,
Russian and English (see code). If we search for 'english', all is fine:
the document is found. But if we try to find the Russian part of the
field (set $queryStr to 'русский'), no document is found.

What is the problem with my code? Please help me find a solution.

Thank you guys

Maxim Savenko


Christopher Östlund

Re: Zend_Lucene + UTF8 search problem... Help!

What's up with the spam?

On Thu, Jul 24, 2008 at 3:21 PM, Maxim Savenko <[hidden email]> wrote:
Hi everybody,
[...]



Tobias Gies

Re: Zend_Lucene + UTF8 search problem... Help!

In reply to this post by Maxim Savenko-2
Maxim,

disregard the "Your message could not be delivered" spam. Your message has now been sent to this list 7 times. The mails with "Your message could not be delivered" are not sent by Zend; they come from some British bloke who seems unable to configure his/her mail server properly.

Best regards
Tobias

2008/7/24 Maxim Savenko <[hidden email]>:
Hi everybody,
[...]