Medha C Sutaria
Hi,

We are using Jackrabbit (version 1.4.1) with Liferay (version 5.2.2). It uses the following configuration:

JCRHook + PersistenceManager + File system

The file system is stored in a local folder on the same machine where the application is running. Now we want to cluster our application, for which we need to share the Jackrabbit repository. We looked at a lot of possible solutions, but couldn't zero in on the best one.

We need to decide which combination of configurations is best for clustering and high performance:

JCRHook vs FileSystemHook?
PersistenceManager vs Datastore?
FileSystem vs Database?
If filesystem, sharing the file system? Or using SAN?

More concerns:

1. To select a different solution, the current documents have to be migrated to the new solution.
2. The repository is increasing exponentially. The file system size is already 10 GB. Does Jackrabbit support such large repositories? At what point will the performance start degrading?
3. What will it take to upgrade the Jackrabbit version from 1.4 to 1.6? Will it require any migration?

Thanks and regards,
Medha Sutaria
Alexander Klimetschek
On Fri, Nov 6, 2009 at 08:39, Medha C Sutaria <[hidden email]> wrote:
> We are using Jackrabbit (version 1.4.1) with liferay (version 5.2.2). It
> uses the following configuration -
> JCRHook + PersistenceManager + File system
>...
> JCRHook vs FileSystemHook?
> PersistenceManager vs Datastore?
> FileSystem vs Database?
> if filesystem, sharing the file system? or using SAN?

You need to be more specific. Which persistence manager are you using?

Quick notes:
- bundle based persistence managers are best
- local dbs (like derby or h2) have better performance than remote dbs
- datastore will only be used for large binaries; using filedatastore is a better choice than storing the binaries in a database (using a db pm)
- FileSystem (element in repository.xml) is not important anymore, does not influence performance
- JCRHook seems to be a proprietary liferay component, so we (jackrabbit devs) cannot give you any information on this
- if you do clustering and use the datastore, you will need a shared file system, SAN is typically the best (but know your network performance)

> 1. To select a different solution, migration of current documents to the
> new solutions has to be done

See here http://wiki.apache.org/jackrabbit/BackupAndMigration for some options.

> 2. The repository is increasing exponentially. The file system size is
> already 10 GB. Does jackrabbit support such large repositories? At what
> point will the performance start degrading?

Depends on what configuration you actually use.

> 3. What will it take to upgrade the jackrabbit version from 1.4 to 1.6?
> Will it require any migration?

AFAIK nothing would be required from 1.4 to 1.6. Minor version numbers are meant to be backwards compatible in Jackrabbit.

Regards,
Alex

--
Alexander Klimetschek
[hidden email]
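As a rough illustration of the quick notes above, here is a minimal sketch of how a bundle-based persistence manager backed by a local embedded database might be combined with a file datastore on a single, non-clustered node. The Derby URL and schemaObjectPrefix are assumed to follow Jackrabbit's stock configuration, and the datastore path is the one used elsewhere in this thread. Treat it as a starting point to check against the Jackrabbit documentation, not a tested setup.

```xml
<?xml version="1.0"?>
<!-- Sketch only: Security, Workspaces, SearchIndex and FileSystem
     elements are left out for brevity and are still needed in a real file. -->
<Repository>
  <!-- Binaries of at least minRecordLength bytes go to the file datastore
       instead of the persistence manager. -->
  <DataStore class="org.apache.jackrabbit.core.data.FileDataStore">
    <param name="path" value="${rep.home}/repository/datastore"/>
    <param name="minRecordLength" value="1000"/>
  </DataStore>

  <Workspace name="${wsp.name}">
    <!-- Bundle persistence manager on an embedded (local) Derby database.
         URL and prefix are assumptions based on the default configuration. -->
    <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.DerbyPersistenceManager">
      <param name="url" value="jdbc:derby:${wsp.home}/db;create=true"/>
      <param name="schemaObjectPrefix" value="${wsp.name}_"/>
    </PersistenceManager>
  </Workspace>
</Repository>
```

An embedded Derby database lives inside one node's repository home, so this layout is not shareable between cluster nodes; for clustering, the persistence manager has to point at a database that every node can reach.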
Medha C Sutaria
Thanks, Alex, for the prompt reply! Your answers have really cleared up some things. A few more queries inline:

Alexander Klimetschek <[hidden email]> wrote on 11/06/2009 02:40:38 PM:

> On Fri, Nov 6, 2009 at 08:39, Medha C Sutaria <[hidden email]> wrote:
> > We are using Jackrabbit (version 1.4.1) with liferay (version 5.2.2). It
> > uses the following configuration -
> > JCRHook + PersistenceManager + File system
> >...
> > JCRHook vs FileSystemHook?
> > PersistenceManager vs Datastore?
> > FileSystem vs Database?
> > if filesystem, sharing the file system? or using SAN?
>
> You need to be more specific. Which persistence manager are you using?

Medha - we use BundleFsPersistenceManager.

>
> Quick notes:
> - bundle based persistence managers are best
> - local dbs (like derby or h2) have better performance than remote dbs

Medha - Any idea about MySQL? I saw some posts about table locking and concurrent access issues while retrieving/updating files.

> - datastore will only be used for large binaries; using filedatastore
> is a better choice than storing the binaries in a database (using a db
> pm)
> - FileSystem (element in repository.xml) is not important anymore,
> does not influence performance

Is this the tag you are talking about? If yes, isn't this what decides whether we store data in a DB or on the LocalFileSystem?

    <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
      <param name="path" value="${rep.home}/repository" />
    </FileSystem>

> - JCRHook seems to be a proprietary liferay component, so we
> (jackrabbit devs) cannot give you any information on this
> - if you do clustering and use the datastore, you will need a shared
> file system, SAN is typically the best (but know your network
> performance)

Will adding this tag to our repository.xml make the repository usable with a SAN? (our repository.xml attached)

    <DataStore class="org.apache.jackrabbit.core.data.FileDataStore">
      <param name="path" value="${rep.home}/repository/datastore"/>
      <param name="minRecordLength" value="1000"/>
    </DataStore>

>
> > 1. To select a different solution, migration of current documents to the
> > new solutions has to be done
>
> See here http://wiki.apache.org/jackrabbit/BackupAndMigration for
> some options.

I checked out these options in the past. I've learned that there's a problem with migrating versions: that part of the repository tree is secured and cannot be exported to an XML file. We needed to migrate data when we tried to use DbFileSystem instead of LocalFileSystem. I guess we don't need to migrate when changing other configuration, e.g. when adding a datastore?

>
> > 2. The repository is increasing exponentially. The file system size is
> > already 10 GB. Does jackrabbit support such large repositories? At what
> > point will the performance start degrading?
>
> Depends on what configuration you actually use.

Can you suggest the best configuration for clustering (based on performance and large repositories)?

>
> > 3. What will it take to upgrade the jackrabbit version from 1.4 to 1.6?
> > Will it require any migration?
>
> AFAIK nothing would be required from 1.4 to 1.6. Minor version numbers
> are meant to be backwards compatible in Jackrabbit.

This sounds great!

>
> Regards,
> Alex
>
> --
> Alexander Klimetschek
> [hidden email]
Medha C Sutaria
My repository.xml (I don't know how to attach a file to this thread!):

    <?xml version="1.0"?>
    <Repository>
      <!-- added to use datastore for larger files -->
      <DataStore class="org.apache.jackrabbit.core.data.FileDataStore">
        <param name="path" value="${rep.home}/repository/datastore"/>
        <param name="minRecordLength" value="1000"/>
      </DataStore>
      <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
        <param name="path" value="${rep.home}/repository" />
      </FileSystem>

      <!--
        Database File System (Cluster Configuration)
        This is sample configuration for mysql persistence that can be used for
        clustering Jackrabbit. For other databases, change the connection,
        credentials, and schema settings.
      -->
      <!--<FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
        <param name="driver" value="com.mysql.jdbc.Driver"/>
        <param name="url" value="jdbc:mysql://localhost/jcr" />
        <param name="user" value="" />
        <param name="password" value="" />
        <param name="schema" value="mysql"/>
        <param name="schemaObjectPrefix" value="J_R_FS_"/>
      </FileSystem>-->

      <Security appName="Jackrabbit">
        <AccessManager class="org.apache.jackrabbit.core.security.SimpleAccessManager" />
        <LoginModule class="org.apache.jackrabbit.core.security.SimpleLoginModule">
          <param name="anonymousId" value="anonymous" />
        </LoginModule>
      </Security>

      <Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="liferay" />

      <Workspace name="${wsp.name}">
        <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
          <param name="path" value="${wsp.home}" />
        </FileSystem>
        <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.BundleFsPersistenceManager" />

        <!--
          Database File System and Persistence (Cluster Configuration)
          This is sample configuration for mysql persistence that can be used for
          clustering Jackrabbit. For other databases, change the connection,
          credentials, and schema settings.
        -->
        <!--<PersistenceManager class="org.apache.jackrabbit.core.state.db.SimpleDbPersistenceManager">
          <param name="driver" value="com.mysql.jdbc.Driver" />
          <param name="url" value="jdbc:mysql://localhost/jcr" />
          <param name="user" value="" />
          <param name="password" value="" />
          <param name="schema" value="mysql" />
          <param name="schemaObjectPrefix" value="J_PM_${wsp.name}_" />
          <param name="externalBLOBs" value="false" />
        </PersistenceManager>
        <FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
          <param name="driver" value="com.mysql.jdbc.Driver"/>
          <param name="url" value="jdbc:mysql://localhost/jcr" />
          <param name="user" value="" />
          <param name="password" value="" />
          <param name="schema" value="mysql"/>
          <param name="schemaObjectPrefix" value="J_FS_${wsp.name}_"/>
        </FileSystem>-->
      </Workspace>

      <Versioning rootPath="${rep.home}/version">
        <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
          <param name="path" value="${rep.home}/version" />
        </FileSystem>
        <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.BundleFsPersistenceManager" />

        <!--
          Database File System and Persistence (Cluster Configuration)
          This is sample configuration for mysql persistence that can be used for
          clustering Jackrabbit. For other databases, change the connection,
          credentials, and schema settings.
        -->
        <!--<FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
          <param name="driver" value="com.mysql.jdbc.Driver"/>
          <param name="url" value="jdbc:mysql://localhost/jcr" />
          <param name="user" value="" />
          <param name="password" value="" />
          <param name="schema" value="mysql"/>
          <param name="schemaObjectPrefix" value="J_V_FS_"/>
        </FileSystem>
        <PersistenceManager class="org.apache.jackrabbit.core.state.db.SimpleDbPersistenceManager">
          <param name="driver" value="com.mysql.jdbc.Driver" />
          <param name="url" value="jdbc:mysql://localhost/jcr" />
          <param name="user" value="" />
          <param name="password" value="" />
          <param name="schema" value="mysql" />
          <param name="schemaObjectPrefix" value="J_V_PM_" />
          <param name="externalBLOBs" value="false" />
        </PersistenceManager>-->
      </Versioning>

      <!--
        Cluster Configuration
        This is sample configuration for mysql persistence that can be used for
        clustering Jackrabbit. For other databases, change the connection,
        credentials, and schema settings.
      -->
      <!--<Cluster id="node_1" syncDelay="5">
        <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
          <param name="revision" value="${rep.home}/revision"/>
          <param name="driver" value="com.mysql.jdbc.Driver"/>
          <param name="url" value="jdbc:mysql://localhost/jcr"/>
          <param name="user" value=""/>
          <param name="password" value=""/>
          <param name="schema" value="mysql"/>
          <param name="schemaObjectPrefix" value="J_C_"/>
        </Journal>
      </Cluster>-->
    </Repository>
Alexander Klimetschek
On Fri, Nov 6, 2009 at 13:22, Medha C Sutaria <[hidden email]> wrote:
> Medha - we use BundleFsPersistenceManager.

Note that the FS-based persistence managers don't guarantee any consistency. See http://wiki.apache.org/jackrabbit/PersistenceManagerFAQ

>> - FileSystem (element in repository.xml) is not important anymore,
>> does not influence performance
> Is it this tag you are talking about? If yes, then isn't this which decides
> if we want to store data in DB or on LocalFileSystem?
> <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
>   <param name="path" value="${rep.home}/repository" />
> </FileSystem>

No, it doesn't. As I mentioned, it is legacy and not important. Where your data is stored depends on the persistence manager (central persistence component), datastore (if used, either db or file) and also search index (file only, if enabled) and the clustering journal.

> Will adding this tag in our repository.xml make the repository usable with
> SAN? (our repository.xml attached)
> <DataStore class="org.apache.jackrabbit.core.data.FileDataStore">
>   <param name="path" value="${rep.home}/repository/datastore"/>
>   <param name="minRecordLength" value="1000"/>
> </DataStore>

The datastore only stores large binaries. If you want to share your repository, you need clustering anyway. See http://wiki.apache.org/jackrabbit/DataStore and http://wiki.apache.org/jackrabbit/Clustering

>> See here http://wiki.apache.org/jackrabbit/BackupAndMigration for
>> some options.
> I checked out these options in the past. I've learned that there's a problem
> in migrating versions. That part of the repository tree is secured and
> cannot be exported to an xml file. We needed migration of data when we tried
> to use DbFileSystem instead of LocalFileSystem. I guess we don't need to
> migrate in case of changing other configuration? Eg. using datastore?

Have you tried the (fairly new, since 1.6) RepositoryCopier mentioned at the end of that wiki page?
http://jackrabbit.apache.org/api/1.6/org/apache/jackrabbit/core/RepositoryCopier.html

> Can you suggest which is the best configuration for clustering (based on
> performance and large repositories)

See the mentioned wiki pages. A database with good and fast clustering is important. For fast write and streaming of large binaries a shared filedatastore is optimal, though that depends on the speed of the SAN.

Regards,
Alex

--
Alexander Klimetschek
[hidden email]
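To make the advice above more concrete, here is a hedged sketch of the parts of a clustered repository.xml that would be shared between nodes: a database-backed bundle persistence manager, a file datastore on a shared mount, and a database journal. The host name `db-host`, the mount point `/mnt/shared/jackrabbit`, and the syncDelay value are hypothetical placeholders; the journal parameters are copied from the commented-out sample in the repository.xml posted earlier in this thread. Verify class names and parameters against the Clustering and DataStore wiki pages for the Jackrabbit version in use.

```xml
<?xml version="1.0"?>
<!-- Sketch only: Security, Workspaces, FileSystem and SearchIndex elements
     are omitted here and are still required in a real repository.xml. -->
<Repository>
  <!-- Large binaries on storage that every node mounts (hypothetical path). -->
  <DataStore class="org.apache.jackrabbit.core.data.FileDataStore">
    <param name="path" value="/mnt/shared/jackrabbit/datastore"/>
    <param name="minRecordLength" value="1000"/>
  </DataStore>

  <Workspace name="${wsp.name}">
    <!-- Bundle persistence manager on a database all nodes can reach.
         db-host is a placeholder; credentials are left empty on purpose. -->
    <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.MySqlPersistenceManager">
      <param name="driver" value="com.mysql.jdbc.Driver"/>
      <param name="url" value="jdbc:mysql://db-host/jcr"/>
      <param name="user" value=""/>
      <param name="password" value=""/>
      <param name="schema" value="mysql"/>
      <param name="schemaObjectPrefix" value="J_PM_${wsp.name}_"/>
    </PersistenceManager>
  </Workspace>

  <Versioning rootPath="${rep.home}/version">
    <!-- The version store goes into the shared database as well. -->
    <PersistenceManager class="org.apache.jackrabbit.core.persistence.bundle.MySqlPersistenceManager">
      <param name="driver" value="com.mysql.jdbc.Driver"/>
      <param name="url" value="jdbc:mysql://db-host/jcr"/>
      <param name="user" value=""/>
      <param name="password" value=""/>
      <param name="schema" value="mysql"/>
      <param name="schemaObjectPrefix" value="J_V_PM_"/>
    </PersistenceManager>
  </Versioning>

  <!-- The id must be different on every node; syncDelay here is illustrative. -->
  <Cluster id="node_1" syncDelay="2000">
    <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
      <param name="revision" value="${rep.home}/revision"/>
      <param name="driver" value="com.mysql.jdbc.Driver"/>
      <param name="url" value="jdbc:mysql://db-host/jcr"/>
      <param name="user" value=""/>
      <param name="password" value=""/>
      <param name="schema" value="mysql"/>
      <param name="schemaObjectPrefix" value="J_C_"/>
    </Journal>
  </Cluster>
</Repository>
```

In a layout like this, each node would keep its own repository home and its own copy of repository.xml, differing only in the Cluster id, while the database and the datastore directory are the shared pieces.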