Code of Conduct, Trolls, and Thankless Jobs [was Re: Python Unicode handling wins again -- mostly]

Steven D'Aprano steve at pearwood.info
Tue Dec 3 06:41:07 CET 2013


On Tue, 03 Dec 2013 04:32:13 +0000, Grant Edwards wrote:

> On 2013-12-03, Roy Smith <roy at panix.com> wrote:
> 
>> "I believe that Pythonistas should commit themselves to achieving the
>> goal, before this decade is out, of making Python 3 the default version
>> and having everybody be cool with unicode."
> 
> I'm cool with Unicode as long as it "just works" without me ever having
> to understand it 

That will never happen. Unicode is a bit like floating point maths: 
there's always *some* odd corner case that will lead to annoyance and 
confusion and even murder:

http://gizmodo.com/382026/a-cellphones-missing-dot-kills-two-people-puts-three-more-in-jail

And then there are legacy encodings. There are three things in life that 
are inevitable: death, taxes, and text with the wrong encoding. Anyone 
dealing with text they didn't generate themselves is going to have to 
deal with mojibake at some point.
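
For anyone who hasn't met mojibake yet, here is a minimal sketch of how it 
arises, assuming Python 3. The sample word and the choice of Latin-1 as the 
wrong codec are mine, picked only for illustration:

```python
# Text is encoded with one codec (UTF-8)...
original = "café"
raw = original.encode("utf-8")        # b'caf\xc3\xa9'

# ...and then decoded with a different one (Latin-1). No error is raised,
# because every byte is a valid Latin-1 character; the text is just wrong.
garbled = raw.decode("latin-1")
print(garbled)                        # cafÃ©

# If you know (or can guess) what went wrong, the damage is reversible.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == original)           # True
```

The nasty part is that the wrong decode succeeds silently, so the damage 
often isn't noticed until much later.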

Having said that, if you control the text and always use UTF-8 for 
storage and transmission, Unicode isn't that hard. Decode bytes to 
Unicode as early as possible, do all your work in text rather than bytes, 
then encode back to bytes as late as possible, and you'll be fine.
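
A rough sketch of that decode-early, encode-late pattern, assuming Python 3 
and UTF-8 at both ends. The function name shout_lines and the processing it 
does are made up for the example:

```python
def shout_lines(raw: bytes) -> bytes:
    """Hypothetical example: upper-case each line of some UTF-8 input."""
    # Decode at the boundary: bytes -> str, as early as possible.
    text = raw.decode("utf-8")

    # All the real work happens on text (str), never on bytes.
    result = "\n".join(line.strip().upper() for line in text.splitlines())

    # Encode at the boundary: str -> bytes, as late as possible.
    return result.encode("utf-8")


incoming = "café\n naïve ".encode("utf-8")   # stands in for a file or socket
outgoing = shout_lines(incoming)
print(outgoing.decode("utf-8"))
```

When reading files, open(path, encoding="utf-8") does the decoding and 
encoding for you at the same boundaries.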


> and I can interact effortlessly with plain old ASCII files.  

That at least is easy, provided you can guarantee that what you think is 
plain ol' ASCII actually is plain ol' ASCII, which isn't as easy as you 
might think given that an awful lot of people think that "extended ASCII" 
is a thing and that you ought to be able to deal with it just like ASCII.
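
One way to check rather than assume, sketched for Python 3. The helper name 
is_plain_ascii is hypothetical; the only thing it relies on is that the 
"ascii" codec rejects any byte above 0x7F:

```python
def is_plain_ascii(raw: bytes) -> bool:
    """Return True only if every byte in raw is 7-bit ASCII."""
    try:
        # The strict ascii codec raises on any byte in the range 0x80-0xFF.
        raw.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


print(is_plain_ascii(b"plain old text"))   # True
print(is_plain_ascii(b"caf\xe9"))          # False: 0xE9 is "extended ASCII"
```

A False result tells you the data isn't ASCII, but not which of the many 
"extended ASCII" encodings it actually is.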


> Every time I start to read anything about Unicode with any
> technical detail at all, I start to get dizzy and bleed from the ears.

Heh, the standard certainly covers a lot of ground.


-- 
Steven


