[Python-3000] UTF-16

Paul Prescod paul at prescod.net
Fri Sep 1 06:24:19 CEST 2006


On 8/31/06, Guido van Rossum <guido at python.org> wrote:
>
> On 8/31/06, Paul Prescod <paul at prescod.net> wrote:
> > On 8/31/06, Guido van Rossum <guido at python.org> wrote:
> > > (Adding back py3k list assuming you just forgot it)
> >
> > Yes, thanks. Gmail's UI really optimizes the "Reply To" operation
> > instead of "Reply To All."
> >
> > > > Plus, it sounds like you're proposing that the encodings of the
> > > > underlying data would leak through to the application. As I
> > > > understood Fredrik's model, the intention was to treat the encoding
> > > > as an implementation detail. If it works well, this could be an
> > > > important differentiator for Python (versus Java) as Unicode already
> > > > is (versus Ruby).
> > >
> > > *Only* for UTF-16, which I consider a necessary evil since we can't
> > > rewrite the Java and .NET standards.
> >
> > I see what you're getting at.
> >
> > I'd say that decoding UTF-16 data in CPython and PyPy should (by default)
> > create true Unicode characters. Jython and IronPython could create
> > surrogates and characters when necessary. When you run the program in
> > CPython you'll get better behaviour than in Jython/IronPython. Maybe there
> > could be a way to make CPython run like Jython and IronPython if you wanted
> > 100% absolute compatibility between the environments. I think that we agree
> > that it would be unfortunate if CPython copied Java and .NET to its own
> > detriment. It's also not inconceivable that Java and .NET might evolve a
> > 4-byte mode in the long term.
>
> I think it would be best to do this as a CPython configuration option
> just like it's done today. You can choose 4-byte or 2-byte Unicode
> (essentially UCS-4 or UTF-16) in order to be compatible with other
> packages on the platform. Yes, 4-byte gives better Unicode support.
> But 2-byte may be more compatible with other stuff on the platform.
> Too bad .NET and Java don't have this option. :-)
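
To make the 2-byte versus 4-byte distinction concrete, here is a minimal sketch, assuming a modern Python 3 interpreter (which postdates this thread). The helper name utf16_code_units is made up for illustration; it only counts storage units and says nothing about how any particular interpreter lays out strings internally.

```python
def utf16_code_units(text):
    """Return how many 16-bit code units text occupies when stored as UTF-16."""
    # "utf-16-le" is used so that no 2-byte byte-order mark is prepended.
    return len(text.encode("utf-16-le")) // 2


def utf32_code_units(text):
    """Return how many 32-bit code units text occupies when stored as UTF-32."""
    return len(text.encode("utf-32-le")) // 4


bmp = "\u00e9"         # U+00E9, inside the Basic Multilingual Plane
astral = "\U0001D11E"  # U+1D11E, outside the BMP

print(utf16_code_units(bmp))     # one 16-bit unit
print(utf16_code_units(astral))  # a surrogate pair: two 16-bit units
print(utf32_code_units(astral))  # one 32-bit unit
```

A 2-byte representation has to spend a surrogate pair on any character above U+FFFF, which is where the compatibility-versus-correctness trade-off discussed above comes from.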


The current model is a hack (and I wrote the PEP!).

If you decide to go to all of the effort and expense of polymorphic strings,
I cannot understand why a user should be forced to choose between 16 and 32
bit strings AT BUILD TIME. PEP 261 says that the reason for the build-time
solution is:

"[The alternate solutions] ... would require a much more
complex implementation than the accepted solution. ...
Guido is not willing to undertake the implementation right
now. ...This PEP represents least-effort solution."

Fair enough. A world of finite resources. But I would be very annoyed if my
ISP had installed a Python version that could magically handle 8-bit and
16-bit strings efficiently but I had to ask them to install a special
version to handle 32-bit strings at all. Obviously build-time configuration
is the least flexible of all available options.
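
As a rough illustration of what a build-time choice means for a program, the sketch below inspects sys.maxunicode, which reports the largest code point a string element can hold. The function name unicode_build_width and its return strings are hypothetical. On the narrow (2-byte) builds discussed here the value is 0xFFFF; on CPython 3.3 and later it is always 0x10FFFF, because string width is chosen per string at run time rather than when the interpreter is compiled.

```python
import sys


def unicode_build_width():
    """Classify the running interpreter by the largest code point it stores."""
    if sys.maxunicode == 0xFFFF:
        # Narrow build: characters above U+FFFF appear as surrogate pairs.
        return "narrow (2-byte)"
    # Either a wide build or an interpreter with run-time flexible width.
    return "wide or flexible (4-byte range)"


print(hex(sys.maxunicode), unicode_build_width())
```

Code that has to run unchanged on both kinds of interpreter ends up branching on a check like this, which is exactly the burden a run-time solution would remove.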

 Paul Prescod