[Python-Dev] PEP 393 Summer of Code Project

fwierzbicki at gmail.com
Fri Sep 9 00:09:09 CEST 2011


On Fri, Aug 26, 2011 at 3:00 PM, Guido van Rossum <guido at python.org> wrote:
> I have a different question about IronPython and Jython now. Do their
> regular expression libraries support Unicode better than CPython's?
> E.g. does "." match a surrogate pair? Tom C suggests that Java's regex
> libraries get this and many other details right despite Java's use of
> UTF-16 to represent strings. So hopefully Jython's re library is built
> on top of Java's?
>
> PS. Is there a better contact for Jython?
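Guido's question about whether "." matches a surrogate pair can be checked directly against Java's own regex engine. The sketch below uses only java.util.regex; whether Jython's re module is actually built on it is what the question asks and is not settled here. The class and helper names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: checks how java.util.regex treats a surrogate pair.
// This is plain Java, not Jython's re implementation.
class DotSurrogateDemo {
    // True if the whole input matches the given regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // U+1F600 is outside the BMP, so Java stores it as two UTF-16 chars.
        String pair = new String(Character.toChars(0x1F600));
        System.out.println(pair.length());          // 2 (UTF-16 code units)
        System.out.println(fullMatch(".", pair));   // true: "." consumes the whole pair
        System.out.println(fullMatch("..", pair));  // false: only one code point is present
    }
}
```

In other words, Java's regex engine steps through the input by code point rather than by char, which is the behavior Tom C is crediting.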

The best contact for Unicode and Jython is Jim Baker (I added him to
the cc), but I'll do my best to answer. Java 5 added a number of
methods for handling Unicode characters that don't fit into 2 bytes,
and looking at the code for our Unicode object, I see that we use
methods like codePointCount on java.lang.String to compute
length[1], and similar methods throughout that code, so that we
deal in code points whenever we handle unicode. So it looks pretty
good for us as far as I can tell.
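For readers less familiar with the Java 5 APIs mentioned above, here is a minimal sketch of the distinction between UTF-16 code units (String.length()) and code points (String.codePointCount). It uses only standard java.lang.String and java.lang.Character methods. The class and helper names are hypothetical, and this is not Jython's actual Unicode object code.

```java
// Hypothetical demo class: contrasts char-based and code-point-based length.
// Not taken from Jython's source.
class CodePointLength {
    // Number of UTF-16 code units (what String.length() reports).
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair counts once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Code point at a code-point index (not a char index).
    static int codePointAtIndex(String s, int cpIndex) {
        int charIndex = s.offsetByCodePoints(0, cpIndex);
        return s.codePointAt(charIndex);
    }

    public static void main(String[] args) {
        // "a" + U+1F600 (stored as a surrogate pair) + "b"
        String s = "a" + new String(Character.toChars(0x1F600)) + "b";
        System.out.println(utf16Units(s));                                // 4
        System.out.println(codePoints(s));                                // 3
        System.out.println(Integer.toHexString(codePointAtIndex(s, 1)));  // 1f600
        System.out.println((char) codePointAtIndex(s, 2));                // b
    }
}
```

Indexing by code point needs offsetByCodePoints to translate the index into a char offset first; a plain charAt(2) on this string would return the low surrogate rather than 'b'.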

[1] http://download.oracle.com/javase/6/docs/api/java/lang/String.html#codePointCount(int, int)

-Frank Wierzbicki

