Zend_Lucene + UTF8 search problem... Help!

Maxim Savenko

Hi everybody,

I have a problem searching for UTF-8 encoded Russian strings with
Zend_Search_Lucene. Here is my short sample code:

<?php
require_once 'ZendInit.php';
require_once 'Zend/Search/Lucene.php';
require_once 'Zend/Search/Lucene/Document.php';

// Create index
$index = Zend_Search_Lucene::create('data/index');
$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::Text('samplefield', 'русский текст; english text', 'utf-8'));
$index->addDocument($doc);
$index->commit();

// Open index and search:
$index = Zend_Search_Lucene::open('data/index');
Zend_Search_Lucene_Search_QueryParser::setDefaultEncoding('utf-8');
Zend_Search_Lucene::setDefaultSearchField('samplefield');

// Query the index:
$queryStr = 'english';
$query = Zend_Search_Lucene_Search_QueryParser::parse($queryStr, 'utf-8');
$hits = $index->find($query);
foreach ($hits as $hit) {
   /* @var $hit Zend_Search_Lucene_Search_QueryHit */
   $doc = $hit->getDocument();
   echo $doc->getField('samplefield')->value, PHP_EOL;
}

The 'samplefield' field of the document contains a string in two
languages, Russian and English (see the code). If we search for 'english',
everything is fine: the document is found. But if we search for the
Russian part of the field (setting $queryStr to 'русский'), no document
is found.

What is the problem with my code? Please help me find a solution.

Thank you guys

Maxim Savenko
[hidden email]