[Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts

anatoly techtonik techtonik at gmail.com
Sat Jun 8 15:13:22 CEST 2013


Without reading the subject of this letter, which encoding do you think
Python 3 uses for open() calls on a text file? Please write your answer in a
reply and then scroll down.


Without cheating, my guess was cp1252, because I thought that was how
Python 2 treated all text files. Or does Python 2 use ISO-8859-1 (latin-1)?

But it turned out to be different, see
http://docs.python.org/3/library/functions.html#open. Actually, I ran into it
here - https://bitbucket.org/techtonik/hexdump/pull-request/1/ and after a
short lecture I realized how bad things are.

By default, open() in Python 3 decodes text files using the system (locale)
encoding. So, if a Python script writes a text file containing Cyrillic
characters on my Russian Windows, another Python script on an English or
Greek Windows will not be able to read it correctly. This is exactly what
happened.
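
A minimal sketch of the failure, without touching real files. It assumes
the Russian system defaults to cp1251 and the English one to cp1252 (the
usual Windows ANSI code pages; the actual defaults depend on the machine),
and the sample string is made up for illustration:

```python
# Sample Cyrillic text ("Privet"), written with escapes so the snippet
# itself does not depend on the source file's encoding.
text = "\u041f\u0440\u0438\u0432\u0435\u0442"

# Bytes a script would write by default on Russian Windows (assumed cp1251).
raw = text.encode("cp1251")

# What a script would get by default on English Windows (assumed cp1252).
misread = raw.decode("cp1252")
print(misread == text)   # the text does not survive the round trip
print(ascii(misread))    # same length, but different characters

# Decoding the same bytes as UTF-8 fails loudly instead of silently.
try:
    raw.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True
print(utf8_failed)
```

Note that the cp1252 reader gets no error at all here, only wrong
characters, which is part of what makes the problem hard to troubleshoot.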

The proposed solution is to specify the encoding explicitly. That means I
have to know it. Luckily, in this case the text file is my own .py file,
whose encoding I knew beforehand. In the real world you can never know the
encoding beforehand.
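
For the case where the encoding is known, a sketch of the explicit form
could look like this. The file name and sample text are hypothetical; the
only real API used is the 'encoding' parameter of open():

```python
import os
import tempfile

# Sample Cyrillic text, escaped so the snippet is encoding-independent.
text = "\u041f\u0440\u0438\u0432\u0435\u0442"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sample.txt")  # hypothetical file name

    # Writer and reader both name the encoding, so the system default
    # no longer matters.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    with open(path, encoding="utf-8") as f:
        roundtrip = f.read()

print(roundtrip == text)
```

This only helps when both sides agree on the encoding, which is exactly
the knowledge that is often missing.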

So, what should Python do if it doesn't know the encoding of the text file
it opens:
1. Assume that the encoding of the text file is the encoding of your
operating system
2. Assume that the encoding of the text file is ASCII
3. Assume that the encoding of the text file is UTF-8

Please write in reply and then scroll down.


I propose option three, because ASCII is a binary-compatible subset of
UTF-8. Option one is the current behaviour, and it is very bad.
Troubleshooting this issue, which should be very common, requires a lot of
prior knowledge about encodings and awareness of different system defaults.
For cross-platform work with text files, this implicitly requires you to
always pass the 'encoding' parameter to open().
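
The subset claim can be checked directly, and the current default can be
inspected. This is a sketch only: it assumes open() takes its default from
locale.getpreferredencoding(False), which is what the documentation linked
above describes, and the printed default will differ between machines and
may differ between Python versions:

```python
import locale

# Every ASCII byte decodes to the same character under ASCII and UTF-8.
all_ascii = bytes(range(128))
same_decoding = all_ascii.decode("ascii") == all_ascii.decode("utf-8")
print(same_decoding)

# Pure-ASCII text also encodes to identical bytes either way.
sample = "plain ASCII text 0-9 a-z"
same_bytes = sample.encode("ascii") == sample.encode("utf-8")
print(same_bytes)

# The encoding that text-mode open() falls back to on this machine
# (assumption: the locale's preferred encoding).
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)
```

So a UTF-8 default would keep reading existing ASCII files unchanged,
while the last line shows the value that currently varies per system.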


Is this enough for a PEP? This stuff is rather critical IMO, so even if it
is rejected there will be a documented design decision.
-- 
anatoly t.