Detect indexing problems

Paco Avila-2

Detect indexing problems

Sometimes when I put a document in the repository, the text extractor fails
and the document is not indexed. It would be nice to be able to test whether
a document has been indexed, but currently I can't see an easy way to achieve
this.

Any idea?
Fabiano Nunes

Re: Detect indexing problems

What kind of documents?

On Thu, Jul 9, 2009 at 6:49 PM, Paco Avila <[hidden email]> wrote:

> Sometimes when I put a document in the repository, the text extractor fails
> and the document is not indexed. It would be nice to be able to test whether
> a document has been indexed, but currently I can't see an easy way to achieve
> this.
>
> Any idea?
>
Paco Avila

Re: Detect indexing problems

Sometimes it fails to index PDF, MS Word, and MS Excel documents. I know this
is due to the PDFBox and POI libraries (I have submitted some of these
documents to them), but it is important to know when there is a problem with
the text extractors.
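
One way to find out about such failures is to run the extraction yourself before (or right after) storing the document and record whether it produced any text. The following is only a minimal sketch of that idea, not the repository's own mechanism: the `Extractor` interface and the `IndexabilityCheck` class are hypothetical names standing in for whatever PDFBox- or POI-backed extractor is actually configured.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a real text extractor (for example one backed by PDFBox or POI).
interface Extractor {
    String extract(InputStream in) throws IOException;
}

final class IndexabilityCheck {

    // Returns true if the extractor produced non-blank text,
    // false if it threw an exception or returned nothing useful.
    static boolean isExtractable(Extractor extractor, byte[] content) {
        try (InputStream in = new ByteArrayInputStream(content)) {
            String text = extractor.extract(in);
            return text != null && !text.trim().isEmpty();
        } catch (IOException | RuntimeException e) {
            // Extraction failed, so the document would probably end up without full-text content.
            return false;
        }
    }

    public static void main(String[] args) {
        // Two toy extractors: one that decodes the bytes as UTF-8, one that always fails.
        Extractor plainText = in -> new String(in.readAllBytes(), StandardCharsets.UTF_8);
        Extractor broken = in -> { throw new IOException("simulated parser failure"); };

        byte[] doc = "some document text".getBytes(StandardCharsets.UTF_8);
        System.out.println("plain text extractable: " + isExtractable(plainText, doc));
        System.out.println("broken extractor extractable: " + isExtractable(broken, doc));
    }
}
```

A document for which this check returns false could be flagged (for example in a log or a custom property) so that it can be re-indexed once the extractor library is fixed.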

On Mon, Jul 20, 2009 at 8:02 PM, Fabiano Nunes <[hidden email]> wrote:

> What kind of documents?

--
Paco Avila
GIT Consultors
tel: +34 971 498310
fax: +34 971496189
e-mail: [hidden email]
http://www.git.es
Fabiano Nunes-2

Re: Detect indexing problems

About the PDF documents, see
https://issues.apache.org/jira/browse/PDFBOX-361.
I've resolved it using the PDFBox trunk version.

On Tue, Jul 21, 2009 at 2:59 AM, Paco Avila <[hidden email]> wrote:

> Sometimes it fails to index PDF, MS Word, and MS Excel documents. I know this
> is due to the PDFBox and POI libraries (I have submitted some of these
> documents to them), but it is important to know when there is a problem with
> the text extractors.