Python usage numbers
steve+comp.lang.python at pearwood.info
Sun Feb 12 03:23:24 CET 2012
On Sun, 12 Feb 2012 12:28:30 +1100, Chris Angelico wrote:
> On Sun, Feb 12, 2012 at 12:21 PM, Eric Snow
> <ericsnowcurrently at gmail.com> wrote:
>> However, in at
>> least one current thread (on python-ideas) and at a variety of times in
>> the past, _some_ people have found Unicode in Python 3 to make more
>> work.
> If Unicode in Python is causing you more work, isn't it most likely that
> the issue would have come up anyway?
The argument being made is that in Python 2, if you try to read a file
that contains Unicode characters encoded with some unknown codec, you
don't have to think about it. Sure, you get moji-bake rubbish in your
database, but that's the fault of people who insist on not being
American. Or who spell Zoe with an umlaut.
In Python 3, if you try the same thing, you get an error. Fixing the
error requires thought, and even if that is only a minuscule amount of
thought, that's too much for some developers who are scared of Unicode.
Hence the FUD that Python 3 is too hard because it makes you learn
about Unicode.
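To make the difference concrete, here is a small Python 3 sketch. The sample name and the choice of codecs are illustrative assumptions, not taken from the thread. Bytes written as UTF-8 but read back as Latin-1 decode "successfully" into moji-bake, while a strict ASCII decode raises an error instead of guessing.

```python
# "Zoë" encoded as UTF-8: the "ë" becomes the two bytes 0xC3 0xAB.
data = "Zo\u00eb".encode("utf-8")

# Decoding those bytes with the wrong codec succeeds, but yields moji-bake.
garbled = data.decode("latin-1")
print(garbled)  # ZoÃ«

# A strict codec that cannot represent the bytes raises instead of guessing.
try:
    data.decode("ascii")
except UnicodeDecodeError as exc:
    print("UnicodeDecodeError:", exc.reason)
```

The error is the "thought required" part: you have to decide which codec the file really uses.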
I know this isn't exactly helpful, but I wish they'd just HTFU. I'm with
Joel Spolsky on this one: if you're a programmer in 2003 who doesn't have
at least a basic working knowledge of Unicode, you're the equivalent of a
doctor who doesn't believe in germs.
Gaining a basic working knowledge of Unicode is not that hard. You don't
need to be an expert, and it's just not that scary.
The use-case given is:
"I have a file containing text. I can open it in an editor and see it's
nearly all ASCII text, except for a few weird and bizarre characters like
£ © ± or ö. In Python 2, I can read that file fine. In Python 3 I get an
error. What should I do that requires no thought?"
- Try decoding with UTF-8 or Latin-1. Even if you don't get the right
characters, you'll get *something*.
- Use open(filename, encoding='ascii', errors='surrogateescape')
(Or possibly errors='ignore'.)
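The two suggestions above might look something like this in Python 3. This is a minimal sketch: `decode_leniently` and `read_with_surrogateescape` are hypothetical helper names, and the cp1252 sample bytes merely stand in for "some unknown codec".

```python
import os
import tempfile


def decode_leniently(data: bytes) -> str:
    """Hypothetical helper: try UTF-8 first, then fall back to Latin-1.

    Latin-1 maps every byte 0x00-0xFF to a code point, so the fallback
    never raises, though the characters may be wrong (moji-bake).
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode("latin-1")


def read_with_surrogateescape(path: str) -> str:
    """Hypothetical helper: read as ASCII, passing undecodable bytes
    through as lone surrogate code points (U+DC80..U+DCFF)."""
    with open(path, encoding="ascii", errors="surrogateescape") as f:
        return f.read()


if __name__ == "__main__":
    # Bytes in a legacy codec that the reader is assumed not to know.
    raw = "caf\u00e9 \u00a3 \u00a9".encode("cp1252")
    print(decode_leniently(raw))

    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(raw)
    try:
        text = read_with_surrogateescape(tmp.name)
        # Encoding with the same codec and handler restores the original bytes.
        assert text.encode("ascii", errors="surrogateescape") == raw
    finally:
        os.remove(tmp.name)
```

The difference between the two error handlers: with `errors='surrogateescape'`, writing the text back out with the same codec and handler reproduces the original bytes exactly, whereas `errors='ignore'` silently drops the undecodable bytes.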