python 2.7 and unicode (one more time)

Chris Angelico rosuav at gmail.com
Sun Nov 23 07:37:30 CET 2014


On Sun, Nov 23, 2014 at 5:17 PM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
> If Python treated the character set as an implementation detail, the
> programmer would have no way of knowing whether
>
> s = u"ö"
>
> is legal or not, since you cannot know whether or not ö is a supported
> character in the running Python. It might work on your system, and fail for
> other people. That is worse than the old distinction between "narrow"
> and "wide" builds. It would be a lazy and stupid design, and especially
> stupid since there really is no good alternative to Unicode today. ASCII is
> not even sufficient for American English, the whole Windows code page idea
> is a horrible mess, none of the legacy encodings are suitable for more than
> a tiny fraction of the world.

(Code pages aren't a Windows concept, of course, though I guess that's
the main place where they're found on PCs today.)
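To make the "horrible mess" concrete, here's a rough sketch of what a code page does to a single byte. The byte value and the three code pages are just ones I picked for illustration; the escapes keep it runnable on both 2.7 and 3.x.

```python
# One byte has no inherent meaning until a code page is chosen.
raw = b"\xf6"

# Decode the same byte under three different legacy code pages.
decoded = {cp: raw.decode(cp) for cp in ("cp1252", "cp437", "cp1251")}

print(decoded["cp1252"] == u"\u00f6")  # LATIN SMALL LETTER O WITH DIAERESIS
print(decoded["cp437"] == u"\u00f7")   # DIVISION SIGN
print(decoded["cp1251"] == u"\u0446")  # CYRILLIC SMALL LETTER TSE
```

Same byte, three unrelated characters, and nothing in the data says which one was meant.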

The only trouble with enforcing Unicode is Japanese encodings and the
whole Han unification debate. Ultimately, you have to pick a side: are
you siding with those who say there are fewer characters with multiple
forms, or with those who say there are more distinct characters? If
the former, go with Unicode. If the latter, be prepared to do heaps of
work yourself, and probably be stuck with supporting only Japanese,
because encodings like Shift-JIS aren't going to be able to represent
Scandinavian text.
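A quick way to see that limitation for yourself, assuming Python's stock "shift_jis" codec (the helper name can_encode is mine, not anything standard):

```python
def can_encode(text, encoding):
    """Return True if every character in text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

swedish = u"sm\u00f6rg\u00e5sbord"   # "smorgasbord" with o-diaeresis and a-ring
japanese = u"\u65e5\u672c\u8a9e"     # "Japanese (language)" in kanji

for codec in ("shift_jis", "utf-8"):
    # Print which of the two sample strings each codec can represent.
    print(codec, can_encode(swedish, codec), can_encode(japanese, codec))
```

Shift-JIS handles the Japanese sample but not the Swedish one, while UTF-8 takes both.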

Me, I'm siding with Unicode. The politicking of Han unification
doesn't interest me, so I'm happy to accept a position that says that
they're all the same character, just as the Roman letter A can be used
in English, Italian, German, Swedish, etc, etc, etc (maybe with some
combining characters for diacriticals). That gives me access to all
the world's languages with a single character set and some trustworthy
encodings. I think it's a fine trade-off: philosophy I don't care
about versus correctness in my code.
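For the combining-character aside, here's a minimal sketch using the standard unicodedata module. The variable names are only illustrative, and the choice of o-diaeresis is mine.

```python
import unicodedata

precomposed = u"\u00f6"   # LATIN SMALL LETTER O WITH DIAERESIS, one code point
combining = u"o\u0308"    # plain "o" followed by COMBINING DIAERESIS

# The two spellings differ as code point sequences...
print(precomposed == combining)

# ...but normalization maps each form onto the other.
print(unicodedata.normalize("NFC", combining) == precomposed)
print(unicodedata.normalize("NFD", precomposed) == combining)

# Either form survives a round trip through a Unicode encoding unchanged.
for text in (precomposed, combining):
    print(text.encode("utf-8").decode("utf-8") == text)
```

If you compare user-supplied strings, normalizing both sides to the same form first avoids surprises from the two spellings.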

ChrisA


