Terry Reedy writes:
Sorry, Terry, but you're basically wrong here.
This is not a nice way to start a response, especially when you go on to admit that I was right as to the user case I discussed. Here is what you clipped.
The point is that the user case you discuss is a toy case. Of course the problem goes away if you get to define the problem away. I don't know of any nice way to say that.
In another post I detailed the *small* amount (one paragraph) that I believe such people need to know to move to Python3. I have not seen this minimum laid out before and I think it would be useful to help such people move to Python3 without FUD fear.
I'll go back and take a look at it. It probably is useful. But I don't think it deals with the real issue. The problem is that without substantially more knowledge than what you describe as the minimum, the fear, uncertainty, and doubt are *real*. Anybody who follows Mailman, for example, is going to hear (even today, though much less frequently than 3 years ago, and only for installations with ancient Mailman from 2006 or so) of weird Unicode errors that cause messages to be "lost". Hearing that Python 3 requires everything be decoded to Unicode is not going to give innocent confidence. There's also a lot of FUD being created out of whole cloth, such as the alleged inefficiency of recoding ASCII into Unicode, which doesn't matter for most applications. The problem is that the FUD based on real issues that you don't understand gives credibility to the FUD that somebody made up.
OK, real-life example. My wife has colleagues in China. They interchange emails (utf-8 encoded) with project budgets and some Chinese characters. Suppose she asks me to use Python to pick out ¥ renminbi/yuan figures and convert to dollars. What 'strong imposition' does Python3 make to learn things I would not have to know to do the same thing in Python2?
None. The FUD is not about *processing* non-ASCII. It's about non-ASCII horking your process even though you have no intention of processing it.
I do not consider adding an encoding argument to make the same thing work in Python3 to be "a strong imposition of unicode awareness". Do you?
Yes, I do. If you get it wrong, you will still get a fatal UnicodeError.
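A minimal sketch of that failure mode, assuming a file that was written in Latin-1 and is then opened with encoding='utf-8'. The temporary file and its one-line contents are made up for illustration:

```python
import os
import tempfile

# Hypothetical input: one author line written in Latin-1, not UTF-8.
raw = "Author: Óscar Fuentes\n".encode("latin-1")

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # Naming the wrong codec: the byte 0xD3 followed by 's' is not valid UTF-8.
    with open(path, encoding="utf-8") as f:
        f.read()
    failed = False
except UnicodeDecodeError:
    failed = True
finally:
    os.remove(path)

print(failed)
```

The error is raised by the read, before any of the program's own logic gets a chance to ignore the name.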
In order to do much other than pass, I believe one typically needs to know the encoding of the file, even in Python2.
The gentleman once again seems to be suffering from a misconception. Quite often you need to know nothing about the encoding of a file, except that the parts you care about are ASCII-encoded. For example, in an American programming shop git log | ./count-files-touched-per-day.py will founder on 'Óscar Fuentes' as author, unless you know what coding system is used, or know enough to use latin-1 (because it's effectively binary, not because it's the actual encoding).
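Here is a rough sketch of the latin-1 trick, not the actual count-files-touched-per-day.py script. The log excerpt and the "files changed" line format are assumptions chosen only to show that the ASCII fields survive while the author's name is carried along as opaque bytes:

```python
# Hypothetical log excerpt as raw bytes; the author name is UTF-8 encoded.
log_bytes = (
    "Author: Óscar Fuentes\n"
    " 3 files changed\n"
).encode("utf-8")

# Strict ASCII decoding fails on the name, even though we never look at it.
try:
    log_bytes.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Latin-1 maps each byte 0x00-0xFF to one code point, so decoding cannot fail.
text = log_bytes.decode("latin-1")

# The ASCII fields we care about come through unchanged.
files_touched = sum(
    int(line.split()[0])
    for line in text.split("\n")
    if line.endswith("files changed")
)

# The name is mangled as text, but the original bytes survive a round trip.
round_trip_ok = text.encode("latin-1") == log_bytes

print(ascii_ok, files_touched, round_trip_ok)
```

The name is wrong as text after this decode, which is the point: latin-1 is being used as a byte-preserving wrapper, not as a claim about the real encoding.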
And of course, knowing about and using the one unicode byte encoding is *much* easier than knowing about and using the 100 or so non-unicode (or unicode subset) encodings.
To me, Python3's
s = open('text.txt', encoding='utf-8').read()
is easier and simpler than either Python2
Indeed, it is. But we're not talking about dealing with Unicode; we're talking about why somebody who really only wants to deal with ASCII needs to know more about Unicode in Python 3 than in Python 2.
(and please pardon any errors as I never actually did this)
import codecs
s = codecs.open('text.txt', encoding='utf-8').read()
or
f = open('text.txt')
s = unicode(f.read(), 'utf-8')
The reason why Unicode is part of the FUD is that in Python 2 you never needed to do that, unless you wanted to deal with a non-English language. With Python 3 you need to deal with the codec, always, or risk a UnicodeError simply because some Spaniard's name gets mentioned by somebody who cares about orthography.