[Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts

MRAB python at mrabarnett.plus.com
Sat Jun 8 18:48:46 CEST 2013


On 08/06/2013 14:13, anatoly techtonik wrote:
> Without reading the subject of this message, what is your guess about
> which encoding Python 3 uses with open() calls on a text file? Please
> write it in your reply and then scroll down.
>
>
> Without cheating, my guess was cp1252 (latin-1), because that is how
> Python 2 assumed all text files were encoded. Or does Python 2 use
> ISO-8859-1?
>
> But it turned out to be different:
> http://docs.python.org/3/library/functions.html#open. Actually, it came
> up here: https://bitbucket.org/techtonik/hexdump/pull-request/1/ and
> after a small lecture I realized how bad things are.
>
> open() in Python uses the system encoding to read files by default. So,
> if a Python script writes a text file with some Cyrillic characters on
> my Russian Windows, another Python script on English Windows or Greek
> Windows will not be able to read it. This is just what happened.
>
> The solution proposed is to specify the encoding explicitly. That means
> I have to know it. Luckily, in this case the text file is my .py, where
> I knew the encoding beforehand. In the real world you can never know
> the encoding beforehand.
>
> So, what should Python do if it doesn't know the encoding of text file
> it opens:
> 1. Assume that encoding of text file is the encoding of your operating
> system
> 2. Assume that encoding of text file is ASCII
> 3. Assume that encoding of text file is UTF-8
>
[snip]
I always use '''encoding="utf-8"''', but it's annoying that it's not
the default.
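
A minimal sketch of that habit, assuming a scratch file in a temporary
directory (the path and the sample text are made up for illustration):

```python
import os
import tempfile

# Sample Cyrillic text ("Privet"), written with escapes so the snippet
# does not depend on the encoding of the source file itself.
text = "\u041f\u0440\u0438\u0432\u0435\u0442"

# Hypothetical scratch file, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")

# Write and read with the same explicit encoding, independent of the
# operating system's locale.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

with open(path, encoding="utf-8") as f:
    round_tripped = f.read()

# The bytes on disk are UTF-8 whatever the platform default is.
with open(path, "rb") as f:
    raw = f.read()
```

Because both sides name the encoding, the file should read back the
same on a Russian, English or Greek system.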

'open' defaults to universal newline support when opening for reading
(though that's not possible when opening for writing!), and it would be
nice if it also defaulted to a 'universal' encoding, i.e. UTF-8.
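
For comparison, here is a small sketch of what the universal newline
default does on reading. The file name and contents are hypothetical;
the bytes are written in binary mode so that all three line-ending
styles end up in the file unchanged:

```python
import os
import tempfile

# Hypothetical scratch file, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "newlines.txt")

# Mix Windows (\r\n), old Mac (\r) and Unix (\n) line endings.
with open(path, "wb") as f:
    f.write(b"one\r\ntwo\rthree\n")

# Text mode with the default newline=None translates all of them
# to "\n" on reading.
with open(path, encoding="ascii") as f:
    content = f.read()
```

The reader does not need to know which platform produced the file,
which is the kind of behaviour a UTF-8 default would give for encodings.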

You could still use '''encoding=None''' if you wanted the operating
system's encoding.
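
As a rough sketch of what the operating system's encoding means in
practice: the documentation for open() says that with no encoding
given it uses locale.getpreferredencoding(False). The scratch file
below is hypothetical, and the values you see depend on the platform,
the locale settings and the Python version:

```python
import codecs
import locale
import os
import tempfile

# The encoding the locale reports for this process.
locale_encoding = locale.getpreferredencoding(False)

# Hypothetical scratch file, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "default.txt")

# encoding=None is the current default: open() picks the encoding itself.
with open(path, "w", encoding=None) as f:
    used_encoding = f.encoding

# Normalise the names, since the same codec can be spelled in several ways.
locale_codec = codecs.lookup(locale_encoding).name
used_codec = codecs.lookup(used_encoding).name
```

On a typical Linux box both names come out as utf-8, while a Russian
Windows setup would more likely report cp1251, which is exactly where
the cross-platform trouble starts.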

