Code of Conduct, Trolls, and Thankless Jobs [was Re: Python Unicode handling wins again -- mostly]
steve at pearwood.info
Tue Dec 3 06:41:07 CET 2013
On Tue, 03 Dec 2013 04:32:13 +0000, Grant Edwards wrote:
> On 2013-12-03, Roy Smith <roy at panix.com> wrote:
>> "I believe that Pythonistas should commit themselves to achieving the
>> goal, before this decade is out, of making Python 3 the default version
>> and having everybody be cool with unicode."
> I'm cool with Unicode as long as it "just works" without me ever having
> to understand it
That will never happen. Unicode is a bit like floating point maths:
there's always *some* odd corner case that will lead to annoyance and
confusion and even murder:
And then there are legacy encodings. There are three things in life that
are inevitable: death, taxes, and text with the wrong encoding. Anyone
dealing with text they didn't generate themselves is going to have to
deal with mojibake at some point.
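
To make that concrete, here is a minimal sketch of how mojibake arises. It assumes the text was written as UTF-8 and then read back with a legacy codec; Latin-1 is an arbitrary choice of "wrong" codec for illustration, not the only one you will meet in the wild.

```python
# Illustration only: Latin-1 stands in for whatever legacy codec
# the reader wrongly assumed.
original = "café"
raw = original.encode("utf-8")      # the bytes actually stored or sent
garbled = raw.decode("latin-1")     # wrong codec: no error, just wrong text
print(garbled)                      # the two UTF-8 bytes of 'é' show up as 'Ã©'

# The damage can be reversed, but only if you know which wrong codec was used.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == original)
```

Note that the bad decode raises no exception, which is why this sort of damage tends to go unnoticed until somebody reads the output.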
Having said that, if you control the text and always use UTF-8 for
storage and transmission, Unicode isn't that hard. Decode bytes to
Unicode as early as possible, do all your work in text rather than bytes,
then encode back to bytes as late as possible, and you'll be fine.
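
As a rough sketch of that decode-early, encode-late pattern, assuming the input really is UTF-8 (the function name shout is made up for this example):

```python
def shout(raw: bytes) -> bytes:
    """Upper-case UTF-8 encoded text (hypothetical helper for illustration)."""
    text = raw.decode("utf-8")    # decode as early as possible
    text = text.upper()           # do the actual work on str, not bytes
    return text.encode("utf-8")   # encode as late as possible


print(shout("naïve café".encode("utf-8")).decode("utf-8"))
```

In Python 3, open(path, encoding="utf-8") does the decoding and encoding for you at the file boundary, so the code in between only ever sees text.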
> and I can interact effortlessly with plain old ASCII files.
That at least is easy, provided you can guarantee that what you think is
plain ol' ASCII actually is plain ol' ASCII, which isn't as easy as you
might think given that an awful lot of people think that "extended ASCII"
is a thing and that you ought to be able to deal with it just like ASCII.
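
One way to check that guarantee, sketched here under the assumption that you have the raw bytes in hand (is_plain_ascii is a name invented for this example), is to decode strictly as ASCII and see whether it fails:

```python
def is_plain_ascii(raw: bytes) -> bool:
    """Return True only if every byte is in the 7-bit ASCII range."""
    try:
        raw.decode("ascii")       # strict by default: any byte >= 0x80 raises
    except UnicodeDecodeError:
        return False
    return True


print(is_plain_ascii(b"plain ol' ASCII"))          # True
print(is_plain_ascii("café".encode("latin-1")))    # False: 0xE9 is not ASCII
```

Anything that fails this check is in some other encoding, and you are back to working out which one.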
> Every time I start to read anything about Unicode with any
> technical detail at all, I start to get dizzy and bleed from the ears.
Heh, the standard certainly covers a lot of ground.