[Python-ideas] Python 3000 TIOBE -3%

Sat Feb 11 04:32:20 CET 2012

Terry Reedy writes:

 > > Sorry, Terry, but you're basically wrong here.
 > 
 > This is not a nice way to start a response, especially when you go on to 
 > admit that I was right as to the user case I discussed. Here is what 
 > you clipped.

The point is that the user case you discuss is a toy case.  Of course
the problem goes away if you get to define the problem away.  I don't
know of any nice way to say that.

 > In another post I detailed the *small* amount (one paragraph) that
 > I believe such people need to know to move to Python3. I have not
 > seen this minimum laid out before and I think it would be useful to
 > help such people move to Python3 without FUD-driven fear.

I'll go back and take a look at it.  It probably is useful.  But I
don't think it deals with the real issue.

The problem is that without substantially more knowledge than what you
describe as the minimum, the fear, uncertainty, and doubt is *real*.
Anybody who follows Mailman, for example, is going to hear (even
today, though much less frequently than 3 years ago, and only for
installations with ancient Mailman from 2006 or so) of weird Unicode
errors that cause messages to be "lost".  Hearing that Python 3
requires everything be decoded to Unicode is not going to give
innocent confidence.

There's also a lot of FUD being created out of whole cloth, such as
the alleged inefficiency of recoding ASCII into Unicode, etc., which
doesn't matter for most applications.  The problem is that the
FUD based on real issues that you don't understand gives credibility
to the FUD that somebody made up.

 > OK, real-life example. My wife has colleagues in China. They interchange 
 > emails (utf-8 encoded) with project budgets and some Chinese characters. 
 > Suppose she asks me to use Python to pick out ¥ renminbi/yuan figures 
 > and convert to dollars. What 'strong imposition' does Python3 make to 
 > learn things I would not have to know to do the same thing in
 > Python2?

None.  The FUD is not about *processing* non-ASCII.  It's about
non-ASCII horking your process even though you have no intention of
processing it.

 > I do not consider that adding an encoding argument to make the same work 
 > in Python3 to be "a strong imposition of unicode awareness". Do
 > you?

Yes, I do.  If you get it wrong, you will still get a fatal UnicodeError.
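
To make "get it wrong" concrete, here is a minimal sketch.  The sample
bytes and the helper name try_decode are made up for illustration;
they are not from any real program.  It shows that naming the wrong
codec for the same bytes raises a UnicodeError subclass, while the
right one succeeds.

```python
# Hypothetical sample: the bytes of 'Óscar Fuentes' encoded as Latin-1.
data = b'\xd3scar Fuentes'

def try_decode(raw, encoding):
    """Return the decoded text, or the name of the error that was raised."""
    try:
        return raw.decode(encoding)
    except UnicodeError as exc:
        # UnicodeDecodeError is a subclass of UnicodeError.
        return type(exc).__name__

# Wrong guess: 0xd3 starts a multi-byte UTF-8 sequence that never completes.
print(try_decode(data, 'utf-8'))
# Right guess: the name comes back intact.
print(try_decode(data, 'latin-1'))
```

In a real script the exception would typically be uncaught, which is
what makes it fatal.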

 > In order to do much other than pass, I believe one typically needs
 > to know the encoding of the file, even in Python2.

The gentleman once again seems to be suffering from a misconception.
Quite often you need to know nothing about the encoding of a file,
except that the parts you care about are ASCII-encoded.  For example,
in an American programming shop

    git log | ./count-files-touched-per-day.py

will founder on 'Óscar Fuentes' as author, unless you know what coding
system is used, or know enough to use latin-1 (because it's
effectively binary, not because it's the actual encoding).
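
A rough sketch of the latin-1 trick follows.  It is a simplified
stand-in, not the script named above: the function commits_per_day is
hypothetical, it counts "Date:" lines rather than files touched, and
it assumes git's default date format.  The point is only that
decoding as latin-1 cannot fail, so the ASCII parts you care about
stay usable whatever the author's name contains.

```python
from collections import Counter

def commits_per_day(raw_lines):
    """Count 'Date:' header lines per calendar day in `git log` output.

    raw_lines is an iterable of bytes.  Decoding as latin-1 never
    raises, because every byte value maps to a code point.  Non-ASCII
    text may come out garbled, but this function never looks at it.
    """
    counts = Counter()
    for raw in raw_lines:
        line = raw.decode('latin-1')
        if line.startswith('Date:'):
            # Assumed format: "Date:   Sat Feb 11 04:32:20 2012 +0100"
            fields = line.split()
            day = ' '.join(fields[2:4] + fields[5:6])  # month, day, year
            counts[day] += 1
    return counts

# Made-up sample; the author line holds UTF-8 bytes for a non-ASCII name.
sample = [
    b'commit 0123abc\n',
    b'Author: \xc3\x93scar Fuentes <oscar@example.invalid>\n',
    b'Date:   Sat Feb 11 04:32:20 2012 +0100\n',
    b'\n',
    b'    Fix something\n',
]
print(commits_per_day(sample))
```

Reading from a pipe, one would pass sys.stdin.buffer instead of the
sample list, so that the lines arrive as bytes.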

 > And of course, knowing about and using the one unicode byte
 > encoding is *much* easier than knowing about and using the 100 or
 > so non-unicode (or unicode subset) encodings.
 > 
 > To me, Python3's
 > 
 >    s = open('text.txt', encoding='utf-8').read()
 > 
 > is easier and simpler than either Python2

Indeed, it is.  But we're not talking about dealing with Unicode;
we're talking about why somebody who really only wants to deal with
ASCII needs to know more about Unicode in Python 3 than in Python 2.

 > (and please pardon any errors as I never actually did this)
 > 
 >    import codecs
 >    s = codecs.open('text.txt', encoding='utf-8').read()
 > 
 > or
 > 
 >    f = open('text.txt')
 >    s = unicode(f.read(), 'utf-8')

The reason why Unicode is part of the FUD is that in Python 2 you
never needed to do that, unless you wanted to deal with a non-English
language.  With Python 3 you need to deal with the codec, always, or
risk a UnicodeError simply because some Spaniard's name gets mentioned
by somebody who cares about orthography.