[Python-Dev] Multilingual programming article on the Red Hat Developer blog

Chris Angelico rosuav at gmail.com
Tue Sep 16 20:05:10 CEST 2014

On Wed, Sep 17, 2014 at 3:55 AM, Jim Baker <jim.baker at python.org> wrote:
> Of course, if you do actually have a smuggled isolated high surrogate
> FOLLOWED by a smuggled isolated low surrogate - guess what, the only
> interpretation is a single code point. Or perhaps more likely garbage. Of
> course it doesn't happen so often, so maybe we are fine with the occasional bug ;)
> I personally suspect that we will resolve this by also supporting UCS-4 as a
> representation in Jython 3.x for such Unicode strings, albeit with the
> limitation that we have simply moved the problem to when we try to call Java
> methods taking java.lang.String objects.

That'll cost efficiency, of course, but it'll guarantee correctness.
And maybe, just maybe, you'll be able to put some pressure on Java
itself to start supporting UCS-4 natively...

One can dream.
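
For anyone who wants to see the ambiguity concretely, here's a rough
sketch in CPython (3.4 or later, since it relies on the "surrogatepass"
error handler working with the UTF-16/UTF-32 codecs). It is only an
illustration of the general problem, not Jython code; the helper name
encode_both and the choice of U+1F600 as the sample character are mine.
It encodes a lone high surrogate followed by a lone low surrogate, and
compares the result with the encoding of the real astral code point.

```python
# Two separately smuggled lone surrogates vs. one genuine astral character.
# CPython keeps them distinct internally; the question is what survives
# a trip through a 16-bit or a 32-bit representation.
SMUGGLED = "\ud83d" + "\ude00"   # two code points, both lone surrogates
REAL = "\U0001F600"              # one code point


def encode_both(codec):
    """Hypothetical helper: encode both strings with the given codec.

    "surrogatepass" lets the lone surrogates through instead of raising.
    """
    return (SMUGGLED.encode(codec, "surrogatepass"),
            REAL.encode(codec, "surrogatepass"))


# In UTF-16 the two strings become the same code units, so the
# distinction is gone and decoding yields a single code point.
s16, r16 = encode_both("utf-16-le")
print(len(SMUGGLED), len(REAL))          # 2 1
print(s16 == r16)                        # True
print(len(s16.decode("utf-16-le")))      # 1

# In a 32-bit representation each surrogate keeps its own slot,
# so the two strings stay distinguishable.
s32, r32 = encode_both("utf-32-le")
print(s32 == r32)                        # False
```

The 32-bit case is the "guarantee correctness" part: you pay four bytes
per code point, but a smuggled pair can never be mistaken for a real
character.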

