Python usage numbers

Steven D'Aprano steve+comp.lang.python at pearwood.info
Sun Feb 12 03:23:24 CET 2012


On Sun, 12 Feb 2012 12:28:30 +1100, Chris Angelico wrote:

> On Sun, Feb 12, 2012 at 12:21 PM, Eric Snow
> <ericsnowcurrently at gmail.com> wrote:
>> However, in at
>> least one current thread (on python-ideas) and at a variety of times in
>> the past, _some_ people have found Unicode in Python 3 to make more
>> work.
> 
> If Unicode in Python is causing you more work, isn't it most likely that
> the issue would have come up anyway?

The argument being made is that in Python 2, if you try to read a file 
that contains Unicode characters encoded with some unknown codec, you 
don't have to think about it. Sure, you get mojibake rubbish in your 
database, but that's the fault of people who insist on not being 
American. Or who spell Zoe with an umlaut.

In Python 3, if you try the same thing, you get an error. Fixing the 
error requires thought, and even if that is only a minuscule amount of 
thought, that's too much for some developers who are scared of Unicode. 
Hence the FUD that Python 3 is too hard because it makes you learn 
Unicode.
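
To make that concrete, here is a minimal sketch of the difference. The 
sample bytes are made up for illustration, and I'm decoding explicitly 
rather than opening a real file so that the result doesn't depend on your 
locale's default encoding.

```python
# Made-up sample: mostly ASCII, plus one byte (0xEB) that is "ë" in Latin-1.
raw = b"Zo\xeb ordered 3 items"

# Python 3 decodes text strictly by default, so a wrong guess at the codec
# fails loudly instead of silently producing garbage.
try:
    outcome = raw.decode("utf-8")
except UnicodeDecodeError as exc:
    outcome = "error: " + exc.reason

# With the right codec, the same bytes decode without complaint.
decoded = raw.decode("latin-1")

print(outcome)
print(decoded)
```

Python 2 would have handed you the raw bytes untouched and let the 
problem surface somewhere further downstream.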

I know this isn't exactly helpful, but I wish they'd just HTFU. I'm with 
Joel Spolsky on this one: if you're a programmer in 2003 who doesn't have 
at least a basic working knowledge of Unicode, you're the equivalent of a 
doctor who doesn't believe in germs.

http://www.joelonsoftware.com/articles/Unicode.html

Acquiring a basic working knowledge of Unicode is not that hard. You don't 
need to be an expert, and it's just not that scary.

The use-case given is:

"I have a file containing text. I can open it in an editor and see it's 
nearly all ASCII text, except for a few weird and bizarre characters like 
£ © ± or ö. In Python 2, I can read that file fine. In Python 3 I get an 
error. What should I do that requires no thought?"

Obvious answers:

- Try decoding with UTF-8 or Latin-1. Even if you don't get the right 
characters, you'll get *something* (Latin-1 in particular accepts every 
possible byte, so it never raises).

- Use open(filename, encoding='ascii', errors='surrogateescape')

(Or possibly errors='ignore'.)
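
A rough sketch of those options side by side, assuming a file that is 
really Latin-1. The file name and its contents are invented for the 
example, and which option is "right" depends on what you plan to do with 
the text afterwards.

```python
import os
import tempfile

# Invented sample data: ASCII text plus "£" and "ö" encoded as Latin-1.
data = b"Price: \xa3 5, name: Z\xf6e\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(data)

    # Latin-1 maps every byte to exactly one character, so this cannot fail.
    with open(path, encoding="latin-1") as f:
        as_latin1 = f.read()

    # surrogateescape turns each undecodable byte into a lone surrogate
    # (0xA3 becomes U+DCA3), which can be converted back to bytes later.
    with open(path, encoding="ascii", errors="surrogateescape") as f:
        escaped = f.read()

    # ignore silently drops the undecodable bytes. The data is lost.
    with open(path, encoding="ascii", errors="ignore") as f:
        ignored = f.read()

# Encoding with the same error handler restores the original bytes.
round_trip = escaped.encode("ascii", errors="surrogateescape")

print(repr(as_latin1))
print(repr(escaped))
print(repr(ignored))
print(round_trip == data)
```

The surrogateescape version is the only one of the three that lets you 
write the file back out byte-for-byte without knowing the real encoding. 
The catch is that the escaped string can't be printed or encoded to UTF-8 
without the same error handler.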



-- 
Steven


