[Python-ideas] Python 3 open() text files: make encoding parameter optional for cross-platform scripts

Victor Stinner victor.stinner at gmail.com
Mon Jun 10 11:34:45 CEST 2013


2013/6/9 anatoly techtonik <techtonik at gmail.com>:
>  For a cross-platform language, as a programmer, you're responsible to
> detect the particular feature of operating

It is not how Python is designed. Python tries to remove some minor
differencies betwen operating systems, but it cannot remove all
differencies. For example, the mmap module has an API different on
UNIX and on Windows:
http://docs.python.org/3/library/mmap.html#mmap.mmap

Python stays close to the operating system for best performances.

If you would like a real portable language with a well defined
behaviour, you may develop libraries on top of Python and its stdlib.

Slowly, Python begins to integrate such libraries to have an higher
level API. shutil is based on the os module for example, and provide
an higher level API.

>> Just one example: configure script generates a Makefile using the locale
>> encoding, Python gets data from Makefile. If you use a path with non-ascii
>> character, use utf-8 in python whereas the locale is iso-8859-1,  python
>> cannot be compiled anymore or will refuse to start.
>
> I am not a C developer, but as SCons committer I don't know Python tools
> that directly work with Makefiles.

It is just one example. Please read the old thread of python-dev for
other examples.

> This choice also breaks key Unix principle of doing one thing good, because
> it is not the responsibility of open() call to determine system encoding.

Most applications on all platforms use the locale encoding.
Applications do not have to "guess" the encoding, it's simple and
reliable to get it (ex: sys.getfilesystemencoding() in Python).

>> When i made the encoding mandatory in my test, more than 70% of calls to
>> open() used encoding="locale". So it's simpler to keep the current default
>> choice.
>
> How many systems have you covered?

I ran tests on Mac OS X, FreeBSD, Windows, Solaris, Linux. I got
errors in the test suite, so no need to copy a file from a platform to
another to get issues. I'm not sure that you understood the real
problem: in short, using the locale encoding provides the best
compliance and cause less bugs (mojibake) than any other encoding.

Anyway, as written in the python-dev thread, and as Nick repeated: the
default vale of encoding parameter of open() is not gonna to change in
Python 3.

Victor


More information about the Python-ideas mailing list