<div dir="ltr">Without reading the subject of this letter, which encoding do you think Python 3 uses for open() calls on a text file? Please write your answer in a reply and then scroll down.<br><br><br>Without cheating, my guess was cp1252, because I thought that was how Python 2 treated all text files. Or does Python 2 use ISO-8859-1 (latin-1)?<br>
<div><br>But it turned out to be different: <a href="http://docs.python.org/3/library/functions.html#open">http://docs.python.org/3/library/functions.html#open</a>. Actually, I found out here: <a href="https://bitbucket.org/techtonik/hexdump/pull-request/1/">https://bitbucket.org/techtonik/hexdump/pull-request/1/</a>, and after a short lecture I realized how bad things are.<br>
<br><div>By default, open() in Python 3 uses the system encoding to read text files. So if a Python script writes a text file with some Cyrillic characters on my Russian Windows, another Python script on an English or Greek Windows will not be able to read it correctly. This is exactly what happened.<br>
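To make the failure concrete, here is a minimal sketch. The code pages are my assumptions for illustration (cp1251 for a Russian Windows, cp1252 for an English one), and the encodings are passed explicitly only to simulate what each machine's default would do.<br>

```python
import locale
import os
import tempfile

# The encoding open() falls back to when none is given on this machine.
print(locale.getpreferredencoding(False))

text = "Привет"  # illustrative Cyrillic sample

# Simulate the writer on a Russian Windows (cp1251 is an assumption).
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(path, "w", encoding="cp1251") as f:
    f.write(text)

# Simulate a reader on an English Windows (cp1252 is an assumption).
# No exception is raised, but the characters come back wrong.
with open(path, "r", encoding="cp1252") as f:
    garbled = f.read()

# Simulate a reader whose default is UTF-8: these bytes are not valid UTF-8.
try:
    with open(path, "r", encoding="utf-8") as f:
        f.read()
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

os.remove(path)
```

Depending on the reader's default, the result is either silently garbled text or a UnicodeDecodeError, and neither points at the real cause.<br>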
<br></div><div>The proposed solution is to specify the encoding explicitly. That means I have to know it. Luckily, in this case the text file is my own .py file, so I knew the encoding beforehand. In the real world you can never be sure of the encoding beforehand.<br>
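As a sketch of that workaround, assuming you control both the writer and the reader and agree on UTF-8 (the choice of UTF-8 here is my assumption, not something the file format dictates):<br>

```python
import os
import tempfile

text = "Привет, мир"  # illustrative Cyrillic sample

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

# Both sides name the encoding, so the system default is never consulted.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)
with open(path, "r", encoding="utf-8") as f:
    round_tripped = f.read()

os.remove(path)
```

This round-trips the same way on any machine, but only because both sides know the encoding in advance.<br>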
<br></div><div>So, what should Python do if it doesn't know the encoding of the text file it opens?<br>1. Assume that the encoding of the text file is the encoding of your operating system<br>2. Assume that the encoding of the text file is ASCII<br>
3. Assume that the encoding of the text file is UTF-8<br><br>Please write your answer in a reply and then scroll down.<br><br><br>I propose option three, because ASCII is a binary-compatible subset of UTF-8. Option one is the current behaviour, and it is very bad. Troubleshooting this issue, which should be very common, requires a lot of prior knowledge about encodings and awareness of different system defaults. For cross-platform work with text files, it implicitly requires you to always pass the 'encoding' parameter to open().<br>
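The subset claim can be checked with a few lines. This is only a sketch of why a UTF-8 default would not break pure-ASCII files; the sample strings are made up for illustration.<br>

```python
ascii_text = "plain ASCII text, code points 0-127 only"

# Any ASCII string encodes to identical bytes under both codecs.
ascii_bytes = ascii_text.encode("ascii")
utf8_bytes = ascii_text.encode("utf-8")
same_bytes = ascii_bytes == utf8_bytes

# So a UTF-8 default still reads every pure-ASCII file correctly.
decoded = ascii_bytes.decode("utf-8")

# The reverse does not hold: non-ASCII UTF-8 bytes are rejected by ASCII.
try:
    "\u00e9".encode("utf-8").decode("ascii")
    ascii_accepts_non_ascii = True
except UnicodeDecodeError:
    ascii_accepts_non_ascii = False
```

In other words, option three reads everything option two can read, while option two fails on the first non-ASCII byte.<br>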
<br><br></div><div>Is this enough for a PEP? This issue is rather critical IMO, so even if the proposal is rejected, there will be a documented design decision.<br></div><div>-- <br>anatoly t.</div><div><div>
</div></div></div></div>