Masklinn, 11.02.2012 13:41:
On 2012-02-11, at 13:33, Stefan Behnel wrote:
Paul Moore, 11.02.2012 11:47:
On 11 February 2012 00:07, Terry Reedy wrote:
Nor is there in 3.x.
I view that claim as FUD, at least for many users, and at least until the people making the claim demonstrate it. In particular, I claim that people who use Python 2 knowing nothing of Unicode do not need to know much more to do the same things in Python 3.
Concrete example, then.
I have a text file in an unknown encoding (yes, it does happen to me!), but opening it in an editor shows it's mainly ASCII. I want to find all the lines starting with a '*'. The simple
    with open('myfile.txt') as f:
        for line in f:
            if line.startswith('*'):
                print(line)
fails with encoding errors. What do I do? Short answer: grumble and go and use grep (or, in more complex cases, awk) :-(
Or just use the ISO-8859-1 encoding.
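A minimal sketch of that suggestion applied to the example above, assuming the non-ASCII bytes only need to survive, not be interpreted correctly. ISO-8859-1 (latin-1) maps each byte value 0-255 to the code point with the same number, so decoding can never fail, and ASCII bytes still decode to the same ASCII characters, so `startswith('*')` behaves as expected. The helper name `star_lines` and the demo file contents are hypothetical.

```python
import os
import tempfile

# latin-1 maps every byte 0x00-0xFF to code point U+0000-U+00FF, so any byte string decodes.
assert bytes(range(256)).decode('latin-1') == ''.join(map(chr, range(256)))


def star_lines(path):
    """Return the lines of *path* that start with '*' (hypothetical helper)."""
    matches = []
    # Explicit latin-1 instead of the platform default encoding: decoding never raises.
    with open(path, encoding='latin-1') as f:
        for line in f:
            if line.startswith('*'):
                matches.append(line.rstrip('\n'))
    return matches


# Demo file: mostly ASCII, plus a 0xE9 byte that a strict UTF-8 decode would reject here.
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
    tmp.write(b'* caf\xe9\nplain line\n* second\n')
    demo_path = tmp.name

try:
    found = star_lines(demo_path)
finally:
    os.remove(demo_path)

print(ascii(found))  # ['* caf\xe9', '* second']
```

The 0xE9 byte comes back as the character U+00E9, whatever the file's real encoding was; the lines are found, but non-ASCII characters are only meaningful if the file really was latin-1.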
It's true, though, that this requires handling encodings up front, where Python 2 allowed you to play fast and loose.
Well, except for the cases where that didn't work. Remember that implicit encoding behaves in a platform-dependent way in Python 2, so just because your code runs on your machine doesn't mean it will work for anyone else.
And using latin-1 in that context looks and feels weird/icky: the file is not encoded in latin-1; the encoding just happens to work for manipulating bytes as ASCII text plus non-ASCII stuff.
Correct. That's precisely the use case described above. Besides, it's perfectly possible to process bytes in Python 3. You just have to open the file in binary mode and do the processing at the byte string level. But if you don't care (and if most of the data is really ASCII-ish), using the ISO-8859-1 encoding in and out will work just fine for problems like the above.
Stefan
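A rough sketch of the two options described in the last message, assuming a mostly-ASCII file whose non-ASCII bytes only need to be passed through. `star_lines_binary` opens the file in binary mode and compares against a bytes literal; `copy_star_lines_latin1` uses ISO-8859-1 for both reading and writing. The `newline=''` arguments are an addition here so that line endings are not translated and the output bytes match the input bytes exactly. Both function names and the sample data are hypothetical.

```python
import os
import tempfile


def star_lines_binary(path):
    """Binary mode: work on raw bytes and never decode (hypothetical helper)."""
    with open(path, 'rb') as f:
        # Iterating a binary file still yields one b'\n'-terminated line at a time.
        return [line for line in f if line.startswith(b'*')]


def copy_star_lines_latin1(src, dst):
    """ISO-8859-1 "in and out": matching lines are copied byte for byte (hypothetical helper)."""
    # newline='' disables newline translation on both read and write.
    with open(src, encoding='latin-1', newline='') as fin, \
            open(dst, 'w', encoding='latin-1', newline='') as fout:
        for line in fin:
            if line.startswith('*'):
                fout.write(line)


sample = b'* caf\xe9\nplain line\n* second\n'

with tempfile.TemporaryDirectory() as tmpdir:
    src = os.path.join(tmpdir, 'in.txt')
    dst = os.path.join(tmpdir, 'out.txt')
    with open(src, 'wb') as f:
        f.write(sample)

    binary_result = star_lines_binary(src)
    copy_star_lines_latin1(src, dst)
    with open(dst, 'rb') as f:
        latin1_result = f.read()

# Both approaches keep the 0xE9 byte exactly as it appeared in the input.
print(binary_result)                             # [b'* caf\xe9\n', b'* second\n']
print(latin1_result == b''.join(binary_result))  # True
```

The binary version makes the "these are just bytes" intent explicit; the latin-1 version keeps ordinary `str` methods available while still round-tripping every byte unchanged.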