[Python-Dev] Python3 "complexity"
Kristján Valur Jónsson
kristjan at ccpgames.com
Thu Jan 9 14:00:59 CET 2014
> -----Original Message-----
> From: Paul Moore [mailto:p.f.moore at gmail.com]
> Sent: 9. janúar 2014 10:53
> To: Kristján Valur Jónsson
> Cc: Stefan Ring; python-dev at python.org
> > Moving to python 3, I found that this quickly caused problems.
> You don't say what problems, but I assume encoding/decoding errors. So the
> files apparently weren't in the system encoding. OK, at that point I'd
> probably say to heck with it and use latin-1. Assuming I was sure that (a) I'd
> never hit a non-ASCII-compatible file (e.g., UTF-16) and
> (b) I didn't have a decent means of knowing the encoding.
Right. But even latin-1, or better, cp1252 (on Windows), does not solve it, because these have undefined
code points. So you need 'surrogateescape' error handling as well, something that I didn't know at
the time, having just come from Python 2 and knowing its Unicode model well.
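To make the point concrete, here is a minimal sketch of what 'surrogateescape' buys you. The sample bytes are made up for illustration. Note that in CPython the cp1252 codec rejects the byte values it leaves undefined (0x81 is one of them), while the latin-1 codec maps every byte value to a character, so the failure below shows up with cp1252 only.

```python
# Hypothetical file contents: 0xe9 is 'é' in cp1252, 0x81 is undefined there.
raw = b"caf\xe9 \x81 end"

# Strict decoding fails on the undefined byte.
try:
    raw.decode("cp1252")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# With surrogateescape the undecodable byte is smuggled through as a lone
# surrogate (0x81 becomes U+DC81) instead of raising.
text = raw.decode("cp1252", errors="surrogateescape")

# Encoding with the same codec and error handler restores the original bytes.
round_tripped = text.encode("cp1252", errors="surrogateescape")

# latin-1 never fails in CPython: every byte maps to the same code point.
latin1_text = raw.decode("latin-1")
```

The same error handler can be passed to open(), e.g. open(path, encoding="cp1252", errors="surrogateescape"), provided the data is written back out with the same encoding and handler.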
> One thing that genuinely is difficult is that because disk files don't have any
> out-of-band data defining their encoding, it *can* be hard to know what
> encoding to use in an environment where more than one encoding is
> common. But this isn't really a Python issue - as I say, I've hit it with GNU
> tools, and I've had to explain the issue to colleagues using Java on many
> occasions. The key difference is that with grep, people blame the file,
> whereas with Python people blame the language :-) (Of course, with Java,
> people expect this sort of problem so they blame the perverseness of the
> universe as a whole... ;-))
Which reminds me, can Python 3 read text files with a BOM automatically yet?
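For reference, a small sketch of the closest thing I know of: the 'utf-8-sig' codec, which strips a leading UTF-8 BOM if one is present. It is not automatic, in that you still have to ask for it by name, and the sample text is made up.

```python
import codecs

# Hypothetical file contents: a UTF-8 BOM followed by UTF-8 text.
payload = "h\u00e9llo".encode("utf-8")
data = codecs.BOM_UTF8 + payload

# Plain utf-8 keeps the BOM as a leading U+FEFF character.
with_bom = data.decode("utf-8")

# utf-8-sig drops the BOM when it is there...
without_bom = data.decode("utf-8-sig")

# ...and is harmless when it is not.
no_bom_input = payload.decode("utf-8-sig")
```

The codec name works with open() as well, e.g. open(path, encoding="utf-8-sig").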