[Python-Dev] Multilingual programming article on the Red Hat Developer blog

Chris Angelico rosuav at gmail.com
Tue Sep 16 20:05:10 CEST 2014


On Wed, Sep 17, 2014 at 3:55 AM, Jim Baker <jim.baker at python.org> wrote:
> Of course, if you do actually have a smuggled isolated low surrogate
> FOLLOWED by a smuggled isolated high surrogate - guess what, the only
> interpretation is a codepoint. Or perhaps more likely garbage. Of course it
> doesn't happen so often, so maybe we are fine with the occasional bug ;)
>
> I personally suspect that we will resolve this by also supporting UCS-4 as a
> representation in Jython 3.x for such Unicode strings, albeit with the
> limitation that we have simply moved the problem to when we try to call Java
> methods taking java.lang.String objects.
>

That'll cost efficiency, of course, but it'll guarantee correctness.
And maybe, just maybe, you'll be able to put some pressure on Java
itself to start supporting UCS-4 natively...

One can dream.
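
For anyone following along, here is a minimal sketch of the ambiguity being discussed. It assumes CPython 3.4 or later, where strings hold whole code points and the "surrogatepass" error handler works with the UTF-16 codecs. The helper name via_utf16 is made up for illustration, and the encode/decode round trip only stands in for what a UTF-16-backed string type such as java.lang.String would see. It is not Jython code.

```python
def via_utf16(text):
    """Round-trip text through UTF-16 code units (hypothetical helper).

    "surrogatepass" lets lone surrogates be encoded at all; the decode
    step then reads the code units the way any UTF-16 consumer would.
    """
    return text.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


# Two isolated surrogates, e.g. smuggled in one at a time.
# In a UCS-4 style string they remain two separate code points.
smuggled = "\ud83d" + "\ude00"
print(len(smuggled))                  # 2

# As UTF-16 code units, a high surrogate followed by a low surrogate
# can only be read as a single astral code point.
merged = via_utf16(smuggled)
print(len(merged), hex(ord(merged)))  # 1 0x1f600
```

With a UCS-4 representation the two smuggled values stay distinct, which is the correctness gain. The information is lost only at the point where the string is handed over as UTF-16.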

ChrisA

