[jira] Created: (LANG-507) StringEscapeUtils.unescapeJava should support \u+ notation

JIRA jira@apache.org

StringEscapeUtils.unescapeJava should support \u+ notation
----------------------------------------------------------

                 Key: LANG-507
                 URL: https://issues.apache.org/jira/browse/LANG-507
             Project: Commons Lang
          Issue Type: Improvement
    Affects Versions: 2.4
            Reporter: Gregor B. Rosenauer
            Priority: Trivial


Currently, when trying to unescape a String with Unicode escapes in the common notation, e.g., \u+0022, I get a NumberFormatException, wrapped in a NestableRuntimeException:

org.apache.commons.lang.exception.NestableRuntimeException: Unable to parse unicode value: +002

Note that the value in the message is also truncated by one character: the parser apparently counts the '+' as the first of the four characters it reads after \u, so the last digit is dropped.

I am aware that in Java a Unicode escape is "\u" followed by four hexadecimal digits that give the character's code, but the \u+ notation is commonly used outside the Java world, and it would be very handy if StringEscapeUtils supported it, at least as an option.

Would you please consider adding this feature to 3.0?
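
The truncated value in the error message can be reproduced with a small sketch. It assumes the parser simply takes the four characters that follow the "\u" marker; the class and method names are hypothetical, and this is not the Commons Lang source.

```java
// Hypothetical demo class; it mimics a fixed-width read and is not the
// Commons Lang source.
final class FixedWidthReadDemo {

    // Returns the four characters that follow the first backslash-u marker.
    static String fourCharsAfterMarker(String input) {
        int marker = input.indexOf("\\u");
        if (marker < 0 || marker + 6 > input.length()) {
            throw new IllegalArgumentException("No complete escape in: " + input);
        }
        return input.substring(marker + 2, marker + 6);
    }

    public static void main(String[] args) {
        // The Java literal "\\u+0022" holds seven characters: backslash, u, +, 0, 0, 2, 2.
        System.out.println(fourCharsAfterMarker("\\u+0022")); // prints +002
        System.out.println(fourCharsAfterMarker("\\u0022"));  // prints 0022
    }
}
```

With the '+' present, the four-character window ends one digit early, which matches the "+002" shown in the exception message.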

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

JIRA jira@apache.org

[jira] Updated: (LANG-507) StringEscapeUtils.unescapeJava should support \u+ notation

     [ https://issues.apache.org/jira/browse/LANG-507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henri Yandell updated LANG-507:
-------------------------------

    Fix Version/s: 3.0

It doesn't sound like it should be in unescapeJava if it's not in the Java spec, but sounds like an interesting feature to be able to support.

JIRA jira@apache.org

[jira] Commented: (LANG-507) StringEscapeUtils.unescapeJava should support \u+ notation

    [ https://issues.apache.org/jira/browse/LANG-507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725504#action_12725504 ]

Henri Yandell commented on LANG-507:
------------------------------------

I noticed in Java that \uuuuuuuu0022 is legal.
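
A minimal sketch of that rule, using a hypothetical demo class. It uses 0041 ('A') rather than 0022, because the compiler translates the escape before it tokenizes the source, so an escaped double quote would end the string literal early.

```java
// Hypothetical demo class showing that repeated 'u' characters are accepted
// in a Unicode escape inside Java source code.
final class RepeatedMarkerDemo {

    // Both literals are translated to the single character 'A' at compile time.
    static final String SINGLE = "\u0041";
    static final String REPEATED = "\uuuuuuuu0041";

    public static void main(String[] args) {
        System.out.println(SINGLE);                  // prints A
        System.out.println(SINGLE.equals(REPEATED)); // prints true
    }
}
```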

As an aside, I wonder whether the \u+ notation comes from misreading a regex-style description of that rule (\u+ meaning one or more 'u' characters) :) Or vice versa.

I think this is a very easy patch to make to text.translate.UnicodeEscaper if anyone wants to look at it. Perhaps a boolean parameter to the constructor, with the no-argument constructor defaulting to false.

JIRA jira@apache.org

[jira] Commented: (LANG-507) StringEscapeUtils.unescapeJava should support \u+ notation

    [ https://issues.apache.org/jira/browse/LANG-507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725506#action_12725506 ]

Henri Yandell commented on LANG-507:
------------------------------------

Note: unit tests need to be added. Currently UnicodeEscaper relies on the StringEscapeUtils unit tests to cover its code.

JIRA jira@apache.org

[jira] Commented: (LANG-507) StringEscapeUtils.unescapeJava should support \u+ notation

    [ https://issues.apache.org/jira/browse/LANG-507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12726221#action_12726221 ]

Gregor B. Rosenauer commented on LANG-507:
------------------------------------------

I have implemented a local workaround for this, so I could contribute a patch if nobody else is already working on it. I will look into it in the next few days.

JIRA jira@apache.org

[jira] Commented: (LANG-507) StringEscapeUtils.unescapeJava should support \u+ notation

    [ https://issues.apache.org/jira/browse/LANG-507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733034#action_12733034 ]

Henri Yandell commented on LANG-507:
------------------------------------

*nudge to Gregor*

JIRA jira@apache.org

[jira] Closed: (LANG-507) StringEscapeUtils.unescapeJava should support \u+ notation

     [ https://issues.apache.org/jira/browse/LANG-507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henri Yandell closed LANG-507.
------------------------------

    Resolution: Fixed

svn ci -m "Implementing an option to UnicodeUnescaper in which the syntax '\u+0047' is supported. By default it remains unsupported to match Java's method of parsing. Request in LANG-507"
Sending        src/java/org/apache/commons/lang/text/translate/UnicodeUnescaper.java
Sending        src/test/org/apache/commons/lang/text/translate/UnicodeUnescaperTest.java
Transmitting file data ..
Committed revision 826370.

Also changed a thrown RuntimeException to a thrown IllegalArgumentException, as a bare RuntimeException is not very pleasant to catch.
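
For readers who want a feel for the behaviour the commit message describes, here is a rough sketch of an unescaper with an opt-in '+' form. Everything in it is an assumption for illustration: the class name LenientUnicodeUnescaper, the allowPlus flag, and the error messages are hypothetical, and it is not the committed UnicodeUnescaper code. It also ignores escaped backslashes for brevity.

```java
// Hypothetical sketch; the class and its names are illustrative and this is
// not the UnicodeUnescaper source that was committed.
final class LenientUnicodeUnescaper {

    private final boolean allowPlus;

    // The no-argument constructor keeps the strict, Java-style behaviour.
    LenientUnicodeUnescaper() {
        this(false);
    }

    LenientUnicodeUnescaper(boolean allowPlus) {
        this.allowPlus = allowPlus;
    }

    String unescape(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            boolean startsEscape = c == '\\'
                    && i + 1 < input.length()
                    && input.charAt(i + 1) == 'u';
            if (!startsEscape) {
                out.append(c);
                i++;
                continue;
            }
            int pos = i + 1;
            // Java permits any number of 'u' characters after the backslash.
            while (pos < input.length() && input.charAt(pos) == 'u') {
                pos++;
            }
            // An optional '+' is consumed only when the option is switched on.
            if (pos < input.length() && input.charAt(pos) == '+') {
                if (!allowPlus) {
                    throw new IllegalArgumentException(
                            "'+' in a Unicode escape is not enabled: " + input);
                }
                pos++;
            }
            if (pos + 4 > input.length()) {
                throw new IllegalArgumentException(
                        "Fewer than four hex digits in escape: " + input);
            }
            int value = 0;
            for (int k = pos; k < pos + 4; k++) {
                int digit = Character.digit(input.charAt(k), 16);
                if (digit < 0) {
                    throw new IllegalArgumentException(
                            "Unable to parse unicode value: " + input.substring(pos, pos + 4));
                }
                value = value * 16 + digit;
            }
            out.append((char) value);
            i = pos + 4;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "\\u+0047" is the Java literal for backslash, u, plus, 0047.
        System.out.println(new LenientUnicodeUnescaper(true).unescape("\\u+0047")); // prints G
        System.out.println(new LenientUnicodeUnescaper().unescape("\\u0047"));      // prints G
    }
}
```

Keeping the flag off by default means existing callers see no change, and malformed input surfaces as an IllegalArgumentException that callers can catch specifically.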
