differences between 22 and python 23
fredrik at pythonware.com
Sun Dec 7 14:02:30 CET 2003
Martin v. Löwis wrote:
> This was by BDFL pronouncement, and I agree with that decision. I
> personally would have favoured UTF-8 as system encoding in Python, as
> it would support all languages, and would allow for as few mistakes
> as ASCII (e.g. you can't mistake a Latin-1 or KOI8-R string for UTF-8).
> I would consider choosing Latin-1 as euro-centric
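
A minimal sketch of the point about mistaking encodings, assuming a modern Python 3 interpreter (not the 2.2/2.3 releases under discussion). The helper name `is_valid_utf8` and the sample strings are made up for illustration:

```python
# Assumption: Python 3, where bytes.decode raises UnicodeDecodeError
# on byte sequences that are not well-formed UTF-8.
samples = {
    "latin-1": "café".encode("latin-1"),
    "koi8-r": "привет".encode("koi8-r"),
}


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


for name, raw in samples.items():
    # Both samples contain high bytes that do not form valid UTF-8 sequences.
    print(name, is_valid_utf8(raw))
```

The check is not airtight: a few Latin-1 byte pairs (for example b"\xc3\xa9") also happen to be valid UTF-8, so "can't" is closer to "very rarely will" for real-world text.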
otoh, it would make sense to use 8-bit strings to store Unicode strings
that happen to contain only Unicode code points in the full 8-bit range
(but that would make it almost-exactly-but-not-quite the same thing
as a Latin-1 string, which we all know is a euro-centric thingy... and
the "almost" part would give people even more reasons to complain
about how "rude" I am when I take them to task for flaming others ;-)
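
To illustrate why such an 8-bit buffer looks so much like Latin-1: in Python's "latin-1" codec, byte value N maps to code point N for every N in 0..255. A small sketch, assuming Python 3; `fits_in_8_bits` is a hypothetical helper, not an existing API:

```python
# Assumption: Python 3 str/bytes semantics.
all_bytes = bytes(range(256))
text = all_bytes.decode("latin-1")

# Each byte value N becomes the code point N, and the round trip is lossless.
assert [ord(ch) for ch in text] == list(range(256))
assert text.encode("latin-1") == all_bytes


def fits_in_8_bits(s: str) -> bool:
    """True if every code point in s is below 256 (hypothetical helper)."""
    return all(ord(ch) < 256 for ch in s)


print(fits_in_8_bits("café"))   # code points all below 256
print(fits_in_8_bits("€uro"))   # U+20AC does not fit in one byte
```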
> > or it will use unicode through unicode objects
> > and their interfaces, which I imagine would be the way it started.
> Yes, all library functions that expect strings should support Unicode
I assume you meant:
Yes, all library functions that expect *text* strings should support
Unicode
but maybe that was obvious from the thread context.
> I'm not too concerned with memory-limited implementations. It would be
> feasible to re-implement the Unicode type to use UTF-8 as its internal
> representation, but that would be tedious to do on the C level, and it
> would lead to really bad performance, given that slicing and indexing
> become inefficient.
and it *may* lead to really bad performance, given that slicing and
indexing *might* become inefficient.
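
A rough sketch of where the cost comes from, assuming Python 3 and well-formed UTF-8 input. The function `utf8_index` is made up for illustration; it finds the n-th code point by scanning from the start, since code points occupy a variable number of bytes:

```python
def utf8_index(buf: bytes, n: int) -> str:
    """Return the n-th code point (0-based) of valid UTF-8 data by linear scan."""
    if n < 0:
        raise IndexError("negative indexes are not supported in this sketch")
    count = -1
    start = None
    for pos, byte in enumerate(buf):
        # Continuation bytes look like 10xxxxxx; anything else starts a code point.
        if byte & 0xC0 != 0x80:
            count += 1
            if count == n:
                start = pos
            elif count == n + 1:
                return buf[start:pos].decode("utf-8")
    if start is None:
        raise IndexError("code point index out of range")
    # The requested code point was the last one in the buffer.
    return buf[start:].decode("utf-8")


data = "naïve €uro".encode("utf-8")
print(utf8_index(data, 2), utf8_index(data, 6))
```

Whether this matters in practice depends on the access pattern: sequential iteration over the buffer stays cheap, and only random indexing pays for the scan.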
having written Python's Unicode string type, I'm now thinking that it might
have been better to use a polymorphic "text" type with either UTF-8 or
encoded char or wchar buffers, and do dynamic translation based on usage
patterns. I've been playing with this idea in Pytte, but as usual, there's so
much code, and so little time...
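
One way to picture the idea, as a toy sketch only: this is not Pytte's design, and the class name `PolyText` and its methods are hypothetical. It assumes Python 3 and keeps the compact UTF-8 form until the first indexing operation, then translates once to a fixed-width form:

```python
class PolyText:
    """Hypothetical text object: keeps UTF-8 bytes until random access is needed."""

    def __init__(self, utf8_data: bytes):
        self._utf8 = utf8_data  # compact form, handed back unchanged for output
        self._wide = None       # fixed-width form, built on first indexing

    def _widen(self):
        if self._wide is None:
            # Assumption: a list of code points stands in for a wchar buffer.
            self._wide = [ord(ch) for ch in self._utf8.decode("utf-8")]
        return self._wide

    def __len__(self):
        return len(self._widen())

    def __getitem__(self, index: int) -> str:
        return chr(self._widen()[index])

    def encode_utf8(self) -> bytes:
        # No translation needed if the caller only wants the bytes back.
        return self._utf8


t = PolyText("smörgåsbord".encode("utf-8"))
print(t.encode_utf8())  # served from the UTF-8 buffer, no widening yet
print(t[2], len(t))     # first index triggers the one-time translation
```

Code that only passes text through never pays for the wide buffer, while code that indexes pays the conversion once.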