Python 3.2 has some deadly infection
Robin Becker
robin at reportlab.com
Mon Jun 2 07:10:48 EDT 2014
............
>
> I probably should have mentioned it, but in my case it's not even Python
> (Java). It's exactly the same principle - an assumption was made that has
> become entrenched due to the fear of breakage. If they'd been forced to
> think about encodings up-front, it shouldn't have been an issue, which was
> the point I was trying to make.
>
There seems to be an implicit assumption in Python land that encoded strings are
the norm. On virtually every computer I encounter that assumption is wrong. The
vast majority of bytes in most computers is not something that can be easily
printed out for humans to read. I suppose some clever Pythonista can figure out
an encoding to read my .o / .so etc. files, but they are practically meaningless
to a Unicode program today. The same goes for most image formats and media files.
Browsers routinely encounter mis/un-encoded pages.
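
A minimal sketch of the point about binary data, under stated assumptions: the
sample bytes and the helper name try_decode are made up for illustration and
are not taken from any real object file. It only shows that arbitrary bytes
need not be valid in a given text encoding, and that an encoding which accepts
every byte value does not make the result meaningful text.

```python
# Hypothetical sample: an ELF-style magic number followed by a few
# arbitrary high bytes. Real .o / .so content will differ.
blob = b"\x7fELF\x02\x01\x01\x00" + bytes([0xFF, 0xFE, 0x80, 0xC3])


def try_decode(data, encoding="utf-8"):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Strict UTF-8 rejects the data, because 0xFF is never a valid UTF-8 byte.
as_utf8 = try_decode(blob)

# Latin-1 maps every byte value to a code point, so decoding always
# "succeeds", but the resulting string is not human-readable text.
as_latin1 = try_decode(blob, "latin-1")
```

Operations that make sense on the raw data, such as blob.startswith(b"\x7fELF"),
work directly on the bytes object without any decoding step.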
> In Java, it's much worse. At least with Python you can perform string-like
> operations on bytes. In Java you have to convert it to characters before
> you can really do anything with it, so people just use the default encoding
> all the time - especially if they want the convenience of line-by-line
> reading using BufferedReader ...
..
In Python I would have preferred bytes to remain the default I/O mechanism;
at least that would allow me to decide whether I need any decoding.
As the cat example
http://lucumr.pocoo.org/2014/5/12/everything-about-unicode/
showed, these extra assumptions are sometimes really in the way.
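
For illustration only, a cat-like copy that stays in bytes throughout might look
roughly like the sketch below. This is not the code from the linked article;
the function name cat and its out parameter are assumptions made for this
example. It opens files in binary mode and writes to the binary layer beneath
sys.stdout, so no decoding is attempted at any point.

```python
import shutil
import sys


def cat(paths, out=None):
    """Copy each named file to `out` as raw bytes, without decoding anything."""
    if out is None:
        # sys.stdout is a text stream in Python 3; its .buffer attribute
        # is the underlying binary stream.
        out = sys.stdout.buffer
    for path in paths:
        # "rb" yields bytes, so invalid or unknown encodings cannot raise here.
        with open(path, "rb") as src:
            shutil.copyfileobj(src, out)
    out.flush()
```

Calling cat(sys.argv[1:]) from a script would then pass file contents through
unchanged, whatever they contain. Opening the same files in text mode instead
would decode them with a default encoding, which is the extra assumption being
objected to.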
--
Robin Becker