Performance query

Daniel Sanchez-3

Performance query

Hi,
When I have many items (about 5000) in //data/section/, why is the query
"//data/section/*[@ap:idsync='95']" slower than
"//element(*, ap:seccion)[@ap:idsync='95']"?

Thanks
Ard

Re: Performance query

Hello Daniel,

you can read the exact explanation (and more) in this mail:

http://www.nabble.com/Explanation-and-solutions-of-some-Jackrabbit-queries-regarding-performance-td15028655.html

regards Ard

Alexander Klimetschek

Re: Performance query

In reply to this post by Daniel Sanchez-3
On Thu, Jul 9, 2009 at 2:23 PM, Daniel Sanchez<[hidden email]> wrote:
> Hi,
> When i have many items (about 5000) in //data/section/ why this query
> "//data/section/*[@ap:idsync='95']" is more slow that this "//element(*,
> ap:seccion)[@ap:idsync='95']" ?

This is because the path is not indexed, so if there is a path
location step in the query, the query execution has to additionally
access the repository to filter out results inside that path.

See also http://markmail.org/message/d2e2v7lo6vx6t7my

Path queries were improved lately (for 1.4.9 and 1.5.x I think):
https://issues.apache.org/jira/browse/JCR-1872

Regards,
Alex

--
Alexander Klimetschek
[hidden email]
Ard

Re: Performance query

>
> This is because the path is not indexed, so if there is a path
> location step in the query, the query execution has to additionally
> access the repository to filter out results inside that path.

It is actually done within the Lucene indexes (which are technically
part of the repository, but I think you mean something else, like
database access :-)), and it gets really expensive for lots of
results. There is no database access for filtering on the path.
There is a hierarchical child axis query within the Lucene
indexes that is simply quite expensive.
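
To make the cost difference concrete, here is a toy Python model. It is only a sketch and not Jackrabbit's actual code: the `ToyIndex` class, its method names, and the order in which the child axis is evaluated are all assumptions made for illustration. The type query is two term lookups and one intersection, while the child axis query checks a parent reference for every indexed node.

```python
from collections import defaultdict


class ToyIndex:
    """Toy inverted index (hypothetical); each node stores a parent reference."""

    def __init__(self):
        self.name = {}                    # node id -> node name
        self.parent = {}                  # node id -> parent id
        self.by_type = defaultdict(set)   # node type -> node ids
        self.by_prop = defaultdict(set)   # (property, value) -> node ids
        self.parent_lookups = 0           # counts hierarchy resolutions

    def add(self, node_id, name, parent_id, node_type, props=None):
        self.name[node_id] = name
        self.parent[node_id] = parent_id
        self.by_type[node_type].add(node_id)
        for key, value in (props or {}).items():
            self.by_prop[(key, value)].add(node_id)

    def query_by_type(self, node_type, key, value):
        # Like //element(*, type)[@key='value']: two term lookups, one intersection.
        return self.by_type[node_type] & self.by_prop[(key, value)]

    def query_children_of(self, context_ids, key, value):
        # Like //data/section/*[@key='value']: resolve the parent of every
        # candidate node and keep those whose parent is in the context set.
        matching = self.by_prop[(key, value)]
        hits = set()
        for node_id in self.name:
            self.parent_lookups += 1
            if self.parent[node_id] in context_ids and node_id in matching:
                hits.add(node_id)
        return hits


index = ToyIndex()
index.add("data", "data", None, "nt:unstructured")
index.add("section", "section", "data", "nt:unstructured")
for i in range(5000):
    index.add(f"item{i}", f"item{i}", "section", "ap:seccion", {"ap:idsync": str(i)})

type_hits = index.query_by_type("ap:seccion", "ap:idsync", "95")
path_hits = index.query_children_of({"section"}, "ap:idsync", "95")
print(type_hits == path_hits, index.parent_lookups)
```

Both queries return the same node, but in this model the path variant pays one parent lookup per indexed node, so its cost grows with the number of items.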

Regards

Alexander Klimetschek

Re: Performance query

On Thu, Jul 9, 2009 at 3:13 PM, Ard Schrijvers<[hidden email]> wrote:

>>
>> This is because the path is not indexed, so if there is a path
>> location step in the query, the query execution has to additionally
>> access the repository to filter out results inside that path.
>
> It is actually done within the lucene indexes (which is technically
> part of the repository but I think you mean something else, like
> database access :-)) ), but it gets really expensive for lots of
> results. There is no database access for filtering the path or
> something. There is a hierarchical child axis query within the lucene
> indexes that is just quite expensive.

Ah, thanks for the heads up. With "repository" I was referring to the
persistence managers / item state managers / hierarchy manager. But
I didn't know this actually happened purely inside the Lucene index.
BTW, doesn't this make a move difficult as well, when the index
contains the hierarchy information itself? Or is it just parent node
references that are stored in the lucene documents?

Regards,
Alex


--
Alexander Klimetschek
[hidden email]
Daniel Sanchez-3

Re: Performance query

Thanks to all.

Ard

Re: Performance query

In reply to this post by Alexander Klimetschek
>
> Ah, thanks for the heads up. With "repository" I was referring to the
> persistence managers / item state managers / hierarchy manager. But
> didn't know this actually happened purely inside the Lucene index.
> BTW, doesn't this make a move difficult as well, when the index
> contains the hierarchy information itself? Or is it just parent node
> references that are stored in the lucene documents?

Yes, exactly. The lookups are done within Lucene. But since Jackrabbit
consists of a whole set of Lucene indices, the lookup for a parent might
land in a different index, which makes it quite a bit slower. The more
fragmented your indices are (many parents in different indices, which
happens when a lot of existing nodes are being updated), the slower it
becomes. There is quite a bit of hierarchy caching going on in Lucene,
but it is still really CPU intensive.
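
The effect of fragmentation can be sketched with another toy Python model. This is an assumption-laden illustration, not how Jackrabbit organizes its indices: the `SegmentedParentLookup` class and the `build` helper are hypothetical, and each "index" is just a dictionary that is probed in turn until the parent reference is found. A simple cache stands in for the hierarchy caching mentioned above.

```python
class SegmentedParentLookup:
    """Toy model (hypothetical): parent references spread over several indices."""

    def __init__(self, segments):
        self.segments = segments   # list of dicts: node id -> parent id
        self.cache = {}            # node id -> parent id, filled on first lookup
        self.probes = 0            # how many indices had to be consulted

    def parent_of(self, node_id):
        if node_id in self.cache:
            return self.cache[node_id]
        for segment in self.segments:
            self.probes += 1
            if node_id in segment:
                self.cache[node_id] = segment[node_id]
                return segment[node_id]
        raise KeyError(node_id)


def build(segment_count, node_count=1000):
    # Distribute the nodes round-robin over the given number of indices.
    segments = [{} for _ in range(segment_count)]
    for i in range(node_count):
        segments[i % segment_count][f"item{i}"] = "section"
    return SegmentedParentLookup(segments)


compact = build(1)
fragmented = build(10)
for lookup in (compact, fragmented):
    for i in range(1000):
        lookup.parent_of(f"item{i}")
first_pass = (compact.probes, fragmented.probes)

# A second pass is served from the cache and adds no further probes.
for i in range(1000):
    fragmented.parent_of(f"item{i}")
second_pass = fragmented.probes
print(first_pass, second_pass)
```

In this model the fragmented layout needs several times as many probes on the first pass, and the cache only helps for parents that were already resolved.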

OTOH, I have always found expensive moves a lesser problem than
expensive searches. Hence, we have chosen to index some 'pseudo
paths' in the index, which lets us search on (simple) path
constraints almost instantly, since it is then a single Lucene term
match...
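
The trade-off can be sketched in Python. The thread does not say how the pseudo paths are built, so this assumes the simplest scheme: each node gets one extra term holding its parent path. The `PseudoPathIndex` class and the `pseudo:parentPath` field name are hypothetical. The path constraint becomes a single term lookup, and the price is that a move has to re-index every node below the moved one.

```python
from collections import defaultdict


class PseudoPathIndex:
    """Toy index (hypothetical) that stores the parent path as a plain term."""

    def __init__(self):
        self.docs = {}                  # node path -> properties
        self.terms = defaultdict(set)   # (field, value) -> node paths

    @staticmethod
    def _terms_for(path, props):
        # Assumed scheme: one term for the parent path, one per property.
        parent_path = path.rsplit("/", 1)[0] or "/"
        yield ("pseudo:parentPath", parent_path)
        for item in props.items():
            yield item

    def add(self, path, props):
        self.docs[path] = dict(props)
        for term in self._terms_for(path, props):
            self.terms[term].add(path)

    def remove(self, path):
        props = self.docs.pop(path)
        for term in self._terms_for(path, props):
            self.terms[term].discard(path)

    def children_with(self, parent_path, key, value):
        # The path constraint is one term match, intersected with the property.
        return self.terms[("pseudo:parentPath", parent_path)] & self.terms[(key, value)]

    def move(self, old_prefix, new_prefix):
        # Every node at or below old_prefix must be re-indexed under its new path.
        moved = [p for p in self.docs if p == old_prefix or p.startswith(old_prefix + "/")]
        for old_path in moved:
            props = self.docs[old_path]
            self.remove(old_path)
            self.add(new_prefix + old_path[len(old_prefix):], props)
        return len(moved)


index = PseudoPathIndex()
index.add("/data/section", {})
for i in range(5000):
    index.add(f"/data/section/item{i}", {"ap:idsync": str(i)})

before = index.children_with("/data/section", "ap:idsync", "95")
reindexed = index.move("/data/section", "/data/archive")
after = index.children_with("/data/archive", "ap:idsync", "95")
print(before, reindexed, after)
```

Searching stays cheap regardless of how many items sit under the parent, while the cost of a move grows with the size of the moved subtree.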

Cheers Ard
