Losing documents from DATASTORE.

Alexander Wallace-2

Hi all.. I've no idea where the problem exists, and I am researching...

I am using the db for storage, and using DATASTORE as well...

The first time I rolled this out I migrated all documents to the db and
ended up with 300+ rows in DATASTORE...

I'm going through db backups to find out when it first happened, but
right now, when I count(*) from DATASTORE, I see only 10 rows... If I
go back a few days I see a few more...

Any idea of what could be happening?

I know I run GC every night...

Any clues?

Thanks!
Thomas Müller-2

Hi,

I'm not sure what could be the cause of the problem. I have a few
questions: What version of Jackrabbit do you use? How did you find out
you are missing data (could you post the exception stack trace)? What
does your repository.xml look like, and did you change it recently?
Did you migrate data recently using XML import? How exactly do you run
the data store garbage collection?

Regards,
Thomas

On Tue, Jun 23, 2009 at 9:03 PM, Alexander Wallace<[hidden email]> wrote:

> Hi all.. I've no idea where the problem exists, and I am researching...
>
> I am using the db for storage, and using DATASTORE as well...
>
> The first time I rolled this out I migrated all documents to the db and ended
> up with 300+ rows in DATASTORE...
>
> I'm going through db backups to find out when it first happened, but right
> now, when I count(*) from DATASTORE, I see only 10 rows... If I go back a
> few days I see a few more...
>
> Any idea of what could be happening?
>
> I know I run GC every night...
>
> Any clues?
>
> Thanks!
>
Alexander Wallace-2

Thank you very much for your response...

I will answer all your questions in order below.

First let me say that I just did an import on a clean db, from an XML
document, which brought in 170 documents. No errors anywhere. I did a
count(*) from DATASTORE and I could see 170 docs there... Then I caused my
GC job to run and now there are only 5 records in DATASTORE... So
evidently it is the GC job doing it.

It is worth saying that I have 2 nodes on each of the clusters I'm testing.

Yesterday I did an import to one of the clusters (150 docs). To the
other cluster I manually uploaded 3 documents to my app, which were added
to the DATASTORE as well... I let both clusters run for the night and in
both cases, this morning, most of the records in DATASTORE are gone...

In both cases there are a few documents (5) that remain there. I don't
know why.

It would seem that all new documents are deleted by GC, but that's not
true... I just uploaded one of the ones I did yesterday, and after
running GC, it remained. However, all the ones I imported are gone,
except for the mysterious 5.

Now I will answer your questions below:


Thomas Müller wrote:
> What version of Jackrabbit do you use?
jackrabbit-api.jar - 1.4.0
jackrabbit-core.jar - 1.4.1
jackrabbit-jcr-commons.jar - 1.4.0
jackrabbit-spi.jar - 1.4.0
jackrabbit-spi-commons.jar - 1.4.0
jackrabbit-text-extractors.jar - 1.4.0

It seems wrong that I have core 1.4.1 and 1.4.0 for the rest of the
stuff... But that's how the framework I have came (Liferay 5.1.2).

> How did you find out
> you are missing data (could you post the exception stack trace)?
No errors... Basically a user reported that some docs he added a day
before were not found (using our app's UI) and I went straight to the DB
and noticed that most of the docs were gone... I've researched database
backups and realized that this has been going on for a while.

> What
> does your repository.xml look like, and did you change it recently?

I will paste the repository.xml file now... I changed it on May 16th,
which is when I moved documents from FS to DB, and the problem has been
there, it looks like, since then... It is a shame that I did not realize
this before, but that's the way it is... I have a backup from May 17th and
the backup is missing most of the docs I imported on the 16th...
Evidently the GC job ran before the backup.

<?xml version="1.0"?>
<Repository>

  <DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
    <param name="driver" value="com.mysql.jdbc.Driver"/>
    <param name="url"
value="jdbc:mysql://dbserver/lportal_stage_cluster?autoReconnect=true" />
    <param name="user" value="username" />
    <param name="password" value="userPassword" />
    <param name="schema" value="mysql"/>
    <param name="databaseType" value="mysql"/>
    <param name="minRecordLength" value="1024"/>
    <param name="maxConnections" value="3"/>
    <param name="copyWhenReading" value="true"/>
    <!-- prefix can NOT be used other than to specify a schema when used
         this seems to be due to some inconsistency in jackrabbit when
         creating the table and when reading it. It uses the prefix for creation
         but not for using it. So, in MySql we use no prefix
         the table name is then DATASTORE, so it is reserved by jackrabbit
      -->
    <param name="tablePrefix" value=""/>
  </DataStore>

  <!-- FS should not be shared across nodes in the cluster,
       so this should either be local, or prefixed for each node in the db -->
  <FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
    <param name="driver" value="com.mysql.jdbc.Driver"/>
    <param name="url"
value="jdbc:mysql://dbserver/lportal_stage_cluster?autoReconnect=true" />
    <param name="user" value="username" />
    <param name="password" value="userPassword" />
    <param name="schema" value="mysql"/>
    <param name="schemaObjectPrefix" value="JCR_NODE1_FS_"/>
  </FileSystem>
  <Security appName="Jackrabbit">
    <AccessManager
class="org.apache.jackrabbit.core.security.SimpleAccessManager" />
    <LoginModule
class="org.apache.jackrabbit.core.security.SimpleLoginModule">
      <param name="anonymousId" value="anonymous" />
    </LoginModule>
  </Security>
  <Workspaces rootPath="${rep.home}/workspaces"
defaultWorkspace="liferay" />
  <Workspace name="${wsp.name}">
    <!-- FS should not be shared across nodes in the cluster,
         so this should either be local, or prefixed for each node in the db -->
    <FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
      <param name="driver" value="com.mysql.jdbc.Driver"/>
      <param name="url"
value="jdbc:mysql://dbserver/lportal_stage_cluster?autoReconnect=true" />
      <param name="user" value="username" />
      <param name="password" value="userPassword" />
      <param name="schema" value="mysql"/>
      <param name="schemaObjectPrefix" value="JCR_NODE1_${wsp.name}_FS_"/>
    </FileSystem>
    <!-- PM needs to be shared across the cluster -->
    <PersistenceManager
class="org.apache.jackrabbit.core.persistence.bundle.MySqlPersistenceManager">
      <param name="url"
value="jdbc:mysql://dbserver/lportal_stage_cluster?autoReconnect=true" />
      <param name="user" value="username" />
      <param name="password" value="userPassword" />
      <param name="schemaObjectPrefix" value="JCR_${wsp.name}_PM_"/>
      <param name="externalBLOBs" value="false"/>
    </PersistenceManager>
  </Workspace>
  <Versioning rootPath="${rep.home}/version">
    <!-- FS should not be shared across nodes in the cluster,
         so this should either be local, or prefixed for each node in the db -->
    <FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
      <param name="driver" value="com.mysql.jdbc.Driver"/>
      <param name="url"
value="jdbc:mysql://dbserver/lportal_stage_cluster?autoReconnect=true" />
      <param name="user" value="username" />
      <param name="password" value="userPassword" />
      <param name="schema" value="mysql"/>
      <param name="schemaObjectPrefix" value="JCR_NODE1_V_FS_"/>
    </FileSystem>
    <!-- PM needs to be shared across the cluster -->
    <PersistenceManager
class="org.apache.jackrabbit.core.persistence.bundle.MySqlPersistenceManager">
      <param name="url"
value="jdbc:mysql://dbserver/lportal_stage_cluster?autoReconnect=true" />
      <param name="user" value="username" />
      <param name="password" value="userPassword" />
      <param name="schemaObjectPrefix" value="JCR_V_PM_"/>
      <param name="externalBLOBs" value="false"/>
    </PersistenceManager>
  </Versioning>
 
  <!-- Each cluster node needs to have a unique node id -->
  <Cluster id="NODE1" syncDelay="5">
    <!-- Journal needs to be shared across the cluster -->
    <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
      <param name="revision" value="${rep.home}/revision"/>
      <param name="driver" value="com.mysql.jdbc.Driver"/>
      <param name="url"
value="jdbc:mysql://dbserver/lportal_stage_cluster?autoReconnect=true" />
      <param name="user" value="username" />
      <param name="password" value="userPassword" />
      <param name="schema" value="mysql"/>
      <param name="schemaObjectPrefix" value="JCR_JOURNAL_"/>
    </Journal>
  </Cluster>
</Repository>


> Did you migrate data recently using XML import?
Not recently, it was just back then (May 16th).
> How exactly do you run
> the data store garbage collection?

I have a quartz job that runs every night... I have changed the config to
run it every 5 minutes for testing... Here is the code:

                GarbageCollector gc;
                SessionImpl si = (SessionImpl) JCRFactoryUtil.createSession();
                gc = si.createDataStoreGarbageCollector();

                // optional (if you want to report progress sometime):
                //gc.setScanEventListener(this);

                // scan must be called to find unused elements
                gc.scan();
                gc.stopScan();

                // delete old data
                gc.deleteUnused();

It seems that for now I could just disable GC... But I think it would be
better to fix it, so that I don't end up with a lot of space being taken
by unused documents...
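
For readers following the thread, here is a simplified, self-contained
model of a mark-and-sweep collector of the kind the scan() and
deleteUnused() calls above appear to drive: a scan phase touches every
record it can reach, and a sweep phase deletes whatever was not touched
since the scan began. This is only a sketch under assumptions, not
Jackrabbit's implementation; the class DataStoreGcSketch, its methods, and
the in-memory map standing in for the DATASTORE table are all hypothetical.
It shows why a record the scan fails to reach is removed even when the
application still needs it.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Simplified model of a mark-and-sweep data store GC; NOT Jackrabbit code. */
public class DataStoreGcSketch {
    // record id -> last-modified timestamp (hypothetical stand-in for the DATASTORE table)
    private final Map<String, Long> lastModified = new HashMap<String, Long>();

    public void addRecord(String id, long timestamp) {
        lastModified.put(id, timestamp);
    }

    /** Mark phase: touch every existing record that the scan can reach. */
    public void scan(List<String> reachableIds, long scanTime) {
        for (String id : reachableIds) {
            if (lastModified.containsKey(id)) {
                lastModified.put(id, scanTime);
            }
        }
    }

    /** Sweep phase: delete records not touched since scanStart; returns how many were deleted. */
    public int deleteUnused(long scanStart) {
        int deleted = 0;
        Iterator<Map.Entry<String, Long>> it = lastModified.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() < scanStart) {
                it.remove();
                deleted++;
            }
        }
        return deleted;
    }

    public int size() {
        return lastModified.size();
    }

    public static void main(String[] args) {
        DataStoreGcSketch store = new DataStoreGcSketch();
        for (int i = 0; i < 170; i++) {
            store.addRecord("doc-" + i, 100L);
        }
        // Assumption for illustration: the scan only reaches 5 of the 170 records.
        store.scan(Arrays.asList("doc-0", "doc-1", "doc-2", "doc-3", "doc-4"), 200L);
        int deleted = store.deleteUnused(200L);
        System.out.println("deleted=" + deleted + ", remaining=" + store.size());
        // prints "deleted=165, remaining=5"
    }
}
```

In this model the sweep is only as safe as the scan is complete, so anything
that keeps the scan from visiting a referenced record looks exactly like the
symptom described here.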
> Regards,
> Thomas
>
>  
Again, I really appreciate your response and hope to hear your input on
my answers soon.

Best Regards!
Alex.

Alexander Klimetschek

On Wed, Jun 24, 2009 at 7:23 PM, Alexander Wallace<[hidden email]> wrote:
> jackrabbit-api.jar - 1.4.0
> jackrabbit-core.jar - 1.4.1
> jackrabbit-jcr-commons.jar - 1.4.0
> jackrabbit-spi.jar - 1.4.0
> jackrabbit-spi-commons.jar - 1.4.0
> jackrabbit-text-extractors.jar - 1.4.0
>
> It seems wrong that I have a core 1.4.1 and 1.4.0 for the rest of the
> stuff...  But that's how the framework i have came (liferay 5.1.2)

No, that's right. For patch releases (the third part of the version
number), only the jars that actually have changed get an increase in
the version number, the rest is not released again. In this case,
jackrabbit-core got a change, but not all the others.
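
As a small illustration of the rule described above, the sketch below
compares only the major.minor part of two version strings, so 1.4.0 and
1.4.1 count as the same release line. The class and method names are
hypothetical, and this is just one way to express the check, not something
Jackrabbit provides.

```java
/** Hypothetical helper: treats jars as compatible when major.minor match. */
public class ReleaseLineCheck {

    /** Returns the "major.minor" prefix of a version such as "1.4.1". */
    public static String releaseLine(String version) {
        String[] parts = version.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Expected major.minor[.patch]: " + version);
        }
        return parts[0] + "." + parts[1];
    }

    /** True when both versions differ at most in the patch (third) component. */
    public static boolean sameReleaseLine(String a, String b) {
        return releaseLine(a).equals(releaseLine(b));
    }

    public static void main(String[] args) {
        // jackrabbit-api 1.4.0 next to jackrabbit-core 1.4.1, as in the list above
        System.out.println(sameReleaseLine("1.4.0", "1.4.1")); // prints "true"
    }
}
```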

Regards,
Alex

Alexander Wallace-2

Thanks for the clarification!

I'm still baffled by the behavior presented by the garbage collection
process for the data store...

I've disabled it for now, but that means that I'll start accumulating
unused documents... Although that's way better than losing most
documents arbitrarily.

Again, thanks!

Alexander Wallace-2

I wonder if my issue is related to:
https://issues.apache.org/jira/browse/JCR-2063

Thomas Müller-2

Hi,

> jackrabbit-core.jar - 1.4.1

I suggest using the most recent version of Jackrabbit core. See also:
http://jackrabbit.apache.org/downloads.html

Regards,
Thomas

On Thu, Jun 25, 2009 at 6:07 PM, Alexander Wallace<[hidden email]> wrote:

> I wonder if my issue is related to:
> https://issues.apache.org/jira/browse/JCR-2063
Alexander Wallace-2

Thanks for the response... I've disabled GC for now... And will analyze
the feasibility of upgrading...

Thanks again!
