gregoryjoseph
|
Hi list,
Given the following code,

    import java.text.Normalizer;
    ...

    final Session session = ...

    final Repository rep = session.getRepository();
    System.out.println(rep.getDescriptor("jcr.repository.name") + " " + rep.getDescriptor("jcr.repository.version"));

    final Node root = session.getRootNode();
    final String name = "föö";
    System.out.println("Normalizer.isNormalized(name, Normalizer.Form.NFC) = " + Normalizer.isNormalized(name, Normalizer.Form.NFC)); // true
    System.out.println("Normalizer.isNormalized(name, Normalizer.Form.NFD) = " + Normalizer.isNormalized(name, Normalizer.Form.NFD)); // false
    root.addNode(name);
    session.save();

    final Node node1 = root.getNode(name);
    System.out.println("node1 = " + node1);
    final Node node2 = root.getNode(Normalizer.normalize(name, Normalizer.Form.NFC));
    System.out.println("node2 = " + node2);
    final Node node3 = root.getNode(Normalizer.normalize(name, Normalizer.Form.NFD)); // fails
    System.out.println("node3 = " + node3);

There's a good chance fetching node3 won't work. It might be dependent on the underlying OS and database, but in the case of OSX and Derby, this fails. It's not that surprising, really, given that Normalizer.normalize(name, Normalizer.Form.NFC).equals(Normalizer.normalize(name, Normalizer.Form.NFD)) is NOT true.

Now, taking into account the fact that all sorts of clients will use a different normalization form (Firefox seems to encode URL parameters with NFD, Safari with NFC; Linux NFC, OSX Finder seems to favor NFD), wouldn't it be a safe bet to normalize all input at repository level? Or do you consider this something client applications should do?

ref: http://en.wikipedia.org/wiki/Unicode_equivalence#Normal_forms

Thanks for any tip, pointer, idea, feedback or reaction!

Cheers,

-greg
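For anyone who wants to see the mismatch without a repository at hand, here is a minimal stand-alone sketch. It relies only on java.text.Normalizer; the class name NormalizationFormsDemo and its helpers are made up for illustration, and the name is spelled with Unicode escapes so the result does not depend on the source file encoding.

```java
import java.text.Normalizer;

final class NormalizationFormsDemo {

    // f followed by two precomposed o-umlaut characters (U+00F6)
    static final String COMPOSED = "f\u00F6\u00F6";

    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    static String nfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        String c = nfc(COMPOSED);
        String d = nfd(COMPOSED);

        // NFC keeps one char per umlaut, NFD splits each into o + combining diaeresis
        System.out.println("NFC length = " + c.length());
        System.out.println("NFD length = " + d.length());

        // String.equals sees two different char sequences
        System.out.println("equals = " + c.equals(d));

        // Bringing both sides to the same form makes them comparable again
        System.out.println("equal after NFC = " + nfc(d).equals(c));
    }
}
```

The two strings render identically, yet the NFC one holds 3 chars and the NFD one 5, which is why a lookup by the "same" name can miss.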
||||||||||||||||
|
gregoryjoseph
|
fwiw, the following solves the simple problem shown by my previous example:

    private Session wrap(final SessionImpl origSession) throws RepositoryException {
        final WorkspaceImpl workspace = (WorkspaceImpl) origSession.getWorkspace();
        final RepositoryImpl rep = (RepositoryImpl) origSession.getRepository();
        return new SessionImpl(rep, origSession.getSubject(), workspace.getConfig()) {
            public Path getQPath(String path) throws MalformedPathException, IllegalNameException, NamespaceException {
                // this is the only relevant part:
                return super.getQPath(Normalizer.normalize(path, Normalizer.Form.NFC));
            }
        };
    }

If there was a way to swap the session implementation or the Name- and/or PathResolver implementations that are used by default, I might give this a spin.

Any opinions about the whole problem?

Cheers,

-g

On Nov 4, 2009, at 6:11 PM, Grégory Joseph wrote:
> [...] wouldn't it be a safe bet to normalize all input at repository level ? Or do you consider this is something client applications should do ?
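If patching the session is not an option, the same idea can be applied one level up, in client code, before a path ever reaches the JCR API. The sketch below is plain Java with no Jackrabbit dependency. JcrPaths and toNfc are hypothetical names, and picking NFC as the one canonical form is an assumption carried over from the wrapper above.

```java
import java.text.Normalizer;

final class JcrPaths {

    private JcrPaths() {
    }

    /**
     * Returns the given path in NFC. Strings that are already in NFC
     * (and null) are handed back unchanged, so the common case is cheap.
     */
    static String toNfc(String path) {
        if (path == null || Normalizer.isNormalized(path, Normalizer.Form.NFC)) {
            return path;
        }
        return Normalizer.normalize(path, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // Same visible path, once decomposed (o + U+0308) and once precomposed (U+00F6)
        String decomposed = "content/fo\u0308o\u0308";
        String composed = "content/f\u00F6\u00F6";

        System.out.println(toNfc(decomposed).equals(composed));
        System.out.println(toNfc(composed) == composed);
    }
}
```

A caller would then write root.getNode(JcrPaths.toNfc(relPath)) and, just as importantly, root.addNode(JcrPaths.toNfc(name)), so that creation and lookup agree on one form.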
||||||||||||||||
|
Tobias Bocanegra-3
|
hi,
i don't think it should be the job of the repository to do normalization of the paths. likewise, a good filesystem (a case sensitive one :-) does no normalization of its paths either.

regards, toby

2009/11/4 Grégory Joseph <[hidden email]>:
> Any opinions about the whole problem?
||||||||||||||||
|
gregoryjoseph
|
Hi Toby,
On Nov 5, 2009, at 12:26 AM, Tobias Bocanegra wrote:
> hi,
> i don't think this should be the job of the repository to do normalization of the paths. likewise a good filesystem (a case sensitive one :-) does no normalization of its paths either.

Since I wrote this yesterday in quite a rush, let me just stress the fact that I'm only talking about Unicode normalization forms; a filesystem won't have to bother about that, since it doesn't have a whole slew of clients who decide to use one form or the other for no apparent reason. For "fun", you might want to see this:
http://www.mail-archive.com/bug-bash@.../msg05818.html

I can see why one would want to make a differentiation between the 2 forms in *values*; in item names, not so much.

Thoughts?

-g
||||||||||||||||
|
Tobias Bocanegra-3
|
2009/11/5 Grégory Joseph <[hidden email]>:
> I can see why one would want to make a differentiation between the 2 forms in *values*; in item names, not so much.

well, i see a repository somewhere in between filesystems and databases.

however, i think the path to an item needs to be solid - the search can still provide you with all stemming and normalization you need.

regards, toby
||||||||||||||||
|
gregoryjoseph
|
On Nov 5, 2009, at 3:39 PM, Tobias Bocanegra wrote:
> well, i see a repository somewhere in between filesystems and databases.
>
> however, i think the path to an item needs to be solid - the search can still provide you with all stemming and normalization you need.

I can see why one wouldn't want this as the default behaviour; is there any chance the current PathResolver implementation could become configurable or swappable?
||||||||||||||||
|
Alexander Klimetschek
|
2009/11/6 Grégory Joseph <[hidden email]>:
> I can see why one wouldn't want this as the default behaviour; is there any chance the current PathResolver implementation could become configurable or swappable?

I think nobody sees a real issue with that (yet). Your original example code that fails under certain combinations (OSX and Derby) is not a good case: it can be expected to fail that way, because the original name "föö" is changed within the Java application itself. I expect that any string in a Java application follows the same UTF-8 encoding & normalization. If you find a combination (e.g. including a browser or other client, using WebDAV, etc.) where it fails, this would be helpful.

Also note that most (all?) people use the URL space as node names, to map it back and forth and unify the naming, just as in a plain unix filesystem. This gives plain ASCII and leaves out any umlauts.

Regards,
Alex

--
Alexander Klimetschek
[hidden email]
||||||||||||||||
|
Alexander Klimetschek
|
In reply to this post
by gregoryjoseph
2009/11/6 Grégory Joseph <[hidden email]>:
> I can see why one wouldn't want this as the default behaviour; is there any chance the current PathResolver implementation could become configurable or swappable?

Sorry, forgot to answer your question: no, it's not easily swappable by configuration.

Regards,
Alex

--
Alexander Klimetschek
[hidden email]
||||||||||||||||
|
gregoryjoseph
|
In reply to this post
by Alexander Klimetschek
Hi Alex,
On Nov 6, 2009, at 4:46 PM, Alexander Klimetschek wrote:
> I think nobody sees a real issue with that (yet). Your original example code that fails under certain combinations (OSX and Derby) is not a good case, as it can be expected to fail that way, as the original name "föö" provided is changed within the java application itself. I expect that any string in a Java application follows the same utf-8 encoding & normalization. If you find a combination (eg. including a browser or other client, using webdav, etc.) where it fails, this would be helpful.

Map a webdav folder to OSX's Finder, create a node with umlauts, and it will be created with the NFD form (use java.text.Normalizer.isNormalized() or String.getBytes() to see that).

Map the same folder using Linux or Windows, and I'm pretty sure the files will be created using the NFC form.

TBH, I still have to try that; I stumbled upon the issue earlier because of something rather silly: at some point, a path is passed to a servlet, and this path was not encoded on the client side (i.e. the HTML used to trigger this call was wrong); somehow, it seems Firefox respected the original form (NFD) while Safari apparently tampered with it and converted it to NFC first. Granted, this isn't really convincing. Now that this piece is patched and the URLs are encoded, clients seem to behave much better, in that they don't tamper with the normal form anymore. Still, I have no control over what form a node is created in. This could mean (to be verified) that in the case of a node type that does not allow same-name siblings, one could actually create two nodes with an "apparent" same name.

> Also note that most (all?) people use the URL space as node names, to map it back and forth and unify the naming, just as in a plain unix filesystem. This gives plain ASCII and leaves out any umlauts.

Sure; same remark as above though: without enforcing the normalization, you could end up with what could appear as "duplicates" (even though they're really not).

> Sorry forgot to answer your question: no, it's not easily swappable by configuration.

Encoding URLs properly is probably going to solve most of my problems; I've been looking at patching this, but it would indeed seem pretty contrived and require quite some code on our side just to change the type of PathResolver to use, for instance (starting from org.apache.jackrabbit.core.jndi.RegistryHelper and all the way down to javax.jcr.Repository#login). Could this maybe be something that would find its place in the WorkspaceConfig?

Cheers,

-g
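The "apparent duplicate" scenario is easy to picture with an ordinary collection standing in for a parent node's child names. This is a plain-Java illustration, not a test against Jackrabbit: SiblingNames and its two methods are invented for the example, and whether a given repository and persistence manager behave like the raw set still has to be verified, as said above.

```java
import java.text.Normalizer;
import java.util.HashSet;
import java.util.Set;

final class SiblingNames {

    private final Set<String> raw = new HashSet<>();
    private final Set<String> normalized = new HashSet<>();

    /** Adds the name exactly as given; false only if an identical string is already there. */
    boolean addRaw(String name) {
        return raw.add(name);
    }

    /** Adds the NFC form of the name; false if a canonically equivalent name is already there. */
    boolean addNormalized(String name) {
        return normalized.add(Normalizer.normalize(name, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        String composed = "f\u00F6\u00F6";      // as an NFC client might send it
        String decomposed = "fo\u0308o\u0308";  // as an NFD client might send it

        SiblingNames names = new SiblingNames();

        // Without normalization both names are accepted: two look-alike siblings
        System.out.println(names.addRaw(composed));
        System.out.println(names.addRaw(decomposed));

        // With normalization the second one is recognized as a same-name sibling
        System.out.println(names.addNormalized(composed));
        System.out.println(names.addNormalized(decomposed));
    }
}
```

The raw set ends up with two entries that display identically; the normalizing set rejects the second one.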
||||||||||||||||
|
Alexander Klimetschek
|
2009/11/6 Grégory Joseph <[hidden email]>:
> Map a webdav folder to OSX's finder, create a node with umlauts, it will be created with the NFD form.
> (java.text.Normalizer.isNormalized() to see that, or String.getBytes())
>
> Map the same folder using Linux or Windows, I'm pretty sure the files will be created using the NFC form.
> TBH, I still have to try that;

An explicit failure case would be good, as I think nobody has seen this issue (yet) with Jackrabbit. The only occurrence of this normalization mismatch was with certain filenames (containing "special" characters) in SVN, used on both Windows and Mac. But that was using the standard C-based SVN client. I think with Java the UTF-8 support is better.

> Still, I have no control under what form a node is created. This could mean (to be verified) that in the case of a node type that does not allow same-name siblings, one could actually create two nodes with an "apparent" same name.

I think (feel free to correct me here) that under Java both strings should be equal(), regardless of their normalization when serialized and stored onto disk.

> Encoding URLs properly is probably going to solve most of my problems; I've been looking at patching this, but it would seem indeed pretty contrived and requiring quite some code on our side to just change the type of PathResolver to use, for instance (starting from org.apache.jackrabbit.core.jndi.RegistryHelper and all the way down to javax.jcr.Repository#login. Could this maybe be something that would its place in the WorkspaceConfig ?

I think that would be an advanced setting, since JCR compliance is based on a PathResolver working according to the spec, and people should not be easily allowed to "break" Jackrabbit this way. Rather, if this is really an issue, it should simply be fixed in Jackrabbit (in the PathResolver or wherever else the String might need to be normalized).

Regards,
Alex

--
Alexander Klimetschek
[hidden email]
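On the equal() question: String.equals compares the UTF-16 code units one by one, and encoding to or decoding from UTF-8 leaves the normalization form untouched. The small check below assumes nothing beyond the JDK; the class name Utf8RoundTrip and the roundTrip helper are made up, and the round trip is only a stand-in for "serialized and stored onto disk", not a statement about what Jackrabbit or a particular database does.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

final class Utf8RoundTrip {

    /** Encodes to UTF-8 bytes and decodes again, as a simple store-and-load cycle might. */
    static String roundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String nfc = "f\u00F6\u00F6";
        String nfd = Normalizer.normalize(nfc, Normalizer.Form.NFD);

        // The two forms already differ at the byte level
        System.out.println(nfc.getBytes(StandardCharsets.UTF_8).length);
        System.out.println(nfd.getBytes(StandardCharsets.UTF_8).length);

        // The round trip gives back the form that went in, not a normalized one
        System.out.println(roundTrip(nfd).equals(nfd));
        System.out.println(roundTrip(nfd).equals(nfc));
    }
}
```

So if two clients hand over different forms, the strings stay different unless some layer normalizes them explicitly.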