[jira] Created: (LANG-516) Define standard for escape/unescape XML

JIRA jira@apache.org

[jira] Created: (LANG-516) Define standard for escape/unescape XML

Define standard for escape/unescape XML
---------------------------------------

                 Key: LANG-516
                 URL: https://issues.apache.org/jira/browse/LANG-516
             Project: Commons Lang
          Issue Type: Sub-task
            Reporter: Henri Yandell
             Fix For: 3.0




--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

JIRA jira@apache.org

[jira] Commented: (LANG-516) Define standard for escape/unescape XML


    [ https://issues.apache.org/jira/browse/LANG-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733042#action_12733042 ]

Henri Yandell commented on LANG-516:
------------------------------------

XML escaping is:

        new AggregateTranslator(
            new LookupTranslator(EntityArrays.BASIC_ESCAPE()),
            new LookupTranslator(EntityArrays.APOS_ESCAPE()),
            NumericEntityEscaper.above(0x7f)
        );

Unescaping is:

        new AggregateTranslator(
            new LookupTranslator(EntityArrays.BASIC_UNESCAPE()),
            new LookupTranslator(EntityArrays.APOS_UNESCAPE()),
            new NumericEntityUnescaper()
        );

The questions raised are whether to escape some of the characters below 0x20, and whether all characters above 0x7f should be escaped. One suggestion has been to provide XML_1_0 and XML_1_1 variants.
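
To make the behaviour under discussion concrete, here is a minimal sketch in plain Java. It is not the Commons Lang implementation: the class XmlEscapeSketch, the method escapeXml and the numericAbove7f flag are hypothetical names for illustration, and emitting decimal numeric entities is an assumption of the sketch. It applies the four basic entities plus &apos;, and optionally turns code points above 0x7f into numeric entities.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the escape chain above; not Commons Lang code. */
final class XmlEscapeSketch {

    // The five predefined XML entities: the four "basic" ones plus &apos;.
    private static final Map<Character, String> BASIC_AND_APOS = new LinkedHashMap<>();
    static {
        BASIC_AND_APOS.put('"', "&quot;");
        BASIC_AND_APOS.put('&', "&amp;");
        BASIC_AND_APOS.put('<', "&lt;");
        BASIC_AND_APOS.put('>', "&gt;");
        BASIC_AND_APOS.put('\'', "&apos;");
    }

    /**
     * Escapes the input. When numericAbove7f is true, every code point
     * above 0x7f is written as a decimal numeric entity (an assumption
     * of this sketch); otherwise it is copied through unchanged.
     */
    static String escapeXml(String input, boolean numericAbove7f) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            // Named entities only exist for characters in the BMP here.
            String named = cp <= 0xFFFF ? BASIC_AND_APOS.get((char) cp) : null;
            if (named != null) {
                out.append(named);
            } else if (numericAbove7f && cp > 0x7f) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeXml("a < b & \u00e9", true));  // a &lt; b &amp; &#233;
        System.out.println(escapeXml("a < b & \u00e9", false)); // a &lt; b &amp; é
    }
}
```

The two calls in main differ only in the flag, which is the choice the questions above are about: whether non-ASCII text should survive escaping as-is or be rewritten as numeric entities.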




JIRA jira@apache.org

[jira] Commented: (LANG-516) Define standard for escape/unescape XML


    [ https://issues.apache.org/jira/browse/LANG-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733043#action_12733043 ]

Henri Yandell commented on LANG-516:
------------------------------------

HTML escaping:

    public static final CharSequenceTranslator ESCAPE_HTML3 =
        new AggregateTranslator(
            new LookupTranslator(EntityArrays.BASIC_ESCAPE()),
            new LookupTranslator(EntityArrays.ISO8859_1_ESCAPE()),
            NumericEntityEscaper.above(0x7f)
        );
               
    public static final CharSequenceTranslator ESCAPE_HTML4 =
        new AggregateTranslator(
            new LookupTranslator(EntityArrays.BASIC_ESCAPE()),
            new LookupTranslator(EntityArrays.ISO8859_1_ESCAPE()),
            new LookupTranslator(EntityArrays.HTML40_EXTENDED_ESCAPE()),
            NumericEntityEscaper.above(0x7f)
        );

HTML unescaping:

    public static final CharSequenceTranslator UNESCAPE_HTML3 =
        new AggregateTranslator(
            new LookupTranslator(EntityArrays.BASIC_UNESCAPE()),
            new LookupTranslator(EntityArrays.ISO8859_1_UNESCAPE()),
            new NumericEntityUnescaper()
        );

    public static final CharSequenceTranslator UNESCAPE_HTML4 =
        new AggregateTranslator(
            new LookupTranslator(EntityArrays.BASIC_UNESCAPE()),
            new LookupTranslator(EntityArrays.ISO8859_1_UNESCAPE()),
            new LookupTranslator(EntityArrays.HTML40_EXTENDED_UNESCAPE()),
            new NumericEntityUnescaper()
        );

The major question raised is why we escape characters above 0x7f as numeric entities. There is also a request to escape characters below 0x20.
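
The following plain-Java sketch illustrates the first-match-wins chaining that these definitions rely on, and why the numeric step matters for characters without a named entity. It is a simplification, not the Commons Lang code: AggregateSketch and its members are hypothetical names, the tables hold only a tiny subset of the real entity arrays, and matching is done per code point.

```java
import java.util.List;
import java.util.Map;
import java.util.function.IntFunction;

/** Hypothetical first-match-wins chain; tables are small illustrative subsets. */
final class AggregateSketch {

    static final Map<Integer, String> BASIC = Map.of(
        0x22, "&quot;", 0x26, "&amp;", 0x3C, "&lt;", 0x3E, "&gt;");
    static final Map<Integer, String> LATIN1_SUBSET = Map.of(0xE9, "&eacute;");
    static final Map<Integer, String> HTML4_SUBSET = Map.of(0x20AC, "&euro;");

    /** A step that returns the table entry, or null when it has none. */
    static IntFunction<String> lookup(Map<Integer, String> table) {
        return cp -> table.get(cp);
    }

    /** A step that writes code points above the limit as decimal entities. */
    static IntFunction<String> numericAbove(int limit) {
        return cp -> cp > limit ? "&#" + cp + ";" : null;
    }

    static final List<IntFunction<String>> HTML3 =
        List.of(lookup(BASIC), lookup(LATIN1_SUBSET), numericAbove(0x7f));
    static final List<IntFunction<String>> HTML4 =
        List.of(lookup(BASIC), lookup(LATIN1_SUBSET), lookup(HTML4_SUBSET), numericAbove(0x7f));

    /** Tries each step in order; the first non-null replacement wins. */
    static String translate(String input, List<IntFunction<String>> chain) {
        StringBuilder out = new StringBuilder();
        input.codePoints().forEach(cp -> {
            String replacement = null;
            for (IntFunction<String> step : chain) {
                replacement = step.apply(cp);
                if (replacement != null) {
                    break;
                }
            }
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("\u00e9 \u20ac", HTML3)); // &eacute; &#8364;
        System.out.println(translate("\u00e9 \u20ac", HTML4)); // &eacute; &euro;
    }
}
```

With the numeric step in place, the euro sign falls through to &#8364; in the HTML3 chain because only the HTML4 table names it. Dropping that step would leave such characters untouched, which is the trade-off the question above is asking about.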




JIRA jira@apache.org

[jira] Updated: (LANG-516) Define standard for escape/unescape XML


     [ https://issues.apache.org/jira/browse/LANG-516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henri Yandell updated LANG-516:
-------------------------------

    Comment: was deleted

(was: the HTML escaping and unescaping comment quoted in full in the previous message, defining ESCAPE_HTML3, ESCAPE_HTML4, UNESCAPE_HTML3 and UNESCAPE_HTML4.)




JIRA jira@apache.org

[jira] Commented: (LANG-516) Define standard for escape/unescape XML


    [ https://issues.apache.org/jira/browse/LANG-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777871#action_12777871 ]

Henri Yandell commented on LANG-516:
------------------------------------

As with the HTML issue:

I think the best approach is to do the minimum, since it is easy to add extra escapes for characters above 0x7f and below 0x20 afterwards. So, with respect to the above, the NumericEntityEscaper sections would be removed from both escape options. Unescape would look the same.
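
As a rough illustration of how a caller might layer the extra escapes on top of a minimal escaper, here is a plain-Java sketch. The class ExtraEscapesSketch and the method escapeOutsideRange are hypothetical names, not part of Commons Lang, and the sketch assumes its input has already had the basic entities applied.

```java
/** Hypothetical post-processing step; not part of Commons Lang. */
final class ExtraEscapesSketch {

    /**
     * Rewrites every code point below low or above high as a decimal
     * numeric entity and copies everything else through unchanged.
     * Already-escaped text such as "&amp;" is plain ASCII, so it is
     * not escaped a second time when low <= 0x20 and high >= 0x7e.
     */
    static String escapeOutsideRange(String minimallyEscaped, int low, int high) {
        StringBuilder out = new StringBuilder(minimallyEscaped.length());
        minimallyEscaped.codePoints().forEach(cp -> {
            if (cp < low || cp > high) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeOutsideRange("caf\u00e9 &amp; tea", 0x20, 0x7f)); // caf&#233; &amp; tea
        System.out.println(escapeOutsideRange("a\tb", 0x20, 0x7f));                // a&#9;b
    }
}
```

One caveat for the low end of the range: XML 1.0 permits only tab, line feed and carriage return below 0x20, even as character references, so a step like this cannot make other control characters legal there. That may be relevant to the suggested XML_1_0 and XML_1_1 variants.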




JIRA jira@apache.org

[jira] Closed: (LANG-516) Define standard for escape/unescape XML


     [ https://issues.apache.org/jira/browse/LANG-516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henri Yandell closed LANG-516.
------------------------------

    Resolution: Fixed

I've changed it so that values above 0x7f are no longer escaped to numeric entities.


