[Python-Dev] open(): set the default encoding to 'utf-8' in Python 3.3?
victor.stinner at haypocalc.com
Wed Jun 29 00:34:54 CEST 2011
On Tuesday, 28 June 2011 at 09:33 -0700, Toshio Kuratomi wrote:
> Issuing a warning like "open used without explicit encoding may lead
> to errors" if open() is used without an explicit encoding would help
> a little (at least, people who get errors would then have an inkling
> that the culprit might be an open() call). If I read Victor's previous
> email correctly, though, he said this was previously rejected.
Oh sorry, I used the wrong word. I listed two other possible solutions,
but they were not really rejected. I just thought that changing the
default encoding to UTF-8 was the most widely accepted idea.
If I mix different suggestions together: another solution is to emit a
warning if the encoding is not specified (not only if the locale
encoding is different from UTF-8). Using encoding="locale" would make it
quiet. It would be annoying if the warning were displayed by default
("This will make things harder for simple scripts which are not
intended to be cross-platform." wrote Paul Moore). It only makes sense
if we use the same policy as for unclosed files/sockets: hidden by
default, but it can be configured using command line options (-Werror,
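To make the warning idea concrete, here is a minimal sketch. The wrapper name open_warn and the category MissingEncodingWarning are hypothetical and exist only for this illustration. Deriving the category from ResourceWarning is an assumption, chosen because the default warning filters ignore ResourceWarning, which gives the same "hidden by default" behaviour as unclosed files. The "locale" value is resolved by hand with locale.getpreferredencoding(False).

```python
import locale
import warnings


class MissingEncodingWarning(ResourceWarning):
    """Hypothetical category; ResourceWarning subclasses are ignored by default."""


def open_warn(path, mode="r", encoding=None, **kwargs):
    """Hypothetical wrapper around open(): warn when a text-mode call
    does not specify an encoding."""
    if "b" not in mode:
        if encoding is None:
            # Same wording as the message suggested earlier in the thread.
            warnings.warn(
                "open() used without explicit encoding may lead to errors",
                MissingEncodingWarning,
                stacklevel=2,
            )
        elif encoding == "locale":
            # Explicit request for the locale encoding: stay quiet.
            encoding = locale.getpreferredencoding(False)
    return open(path, mode, encoding=encoding, **kwargs)
```

With this sketch, a plain run prints nothing, while running the script with -Werror should turn the missing encoding into an exception.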
> Another brainstorming solution would be to use different default encodings on
> different platforms. For instance, for writing files, utf-8 on *nix systems
> (including macosX) and utf-16 on windows.
I don't think that UTF-16 is a better choice than UTF-8 on Windows :-(
> For reading files, check for a utf-16 BOM, if not present, operate as utf-8.
Oh oh. I already suggested reading the BOM. See
http://bugs.python.org/issue7651 and read the email thread "Improve
open() to support reading file starting with an unicode BOM"
Reading the BOM is a can of worms: everybody expects something different.
I gave up on the idea of changing that.
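For reference, the quoted proposal (check for a UTF-16 BOM, otherwise operate as UTF-8) could be sketched roughly as follows. The helper names sniff_encoding and open_sniffed are hypothetical, and this is only one possible reading of the proposal, not what open() does.

```python
import codecs


def sniff_encoding(path, default="utf-8"):
    """Hypothetical helper: return "utf-16" if the file starts with a
    UTF-16 BOM, otherwise return `default`."""
    with open(path, "rb") as f:
        head = f.read(2)
    if head in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE):
        # The "utf-16" codec consumes the BOM and picks the byte order.
        return "utf-16"
    return default


def open_sniffed(path):
    """Open `path` for reading text with the sniffed encoding."""
    return open(path, "r", encoding=sniff_encoding(path))
```

Even this small sketch shows the problem: the UTF-32-LE BOM starts with the same two bytes as the UTF-16-LE BOM, and a UTF-8 BOM is not handled at all here, so each caller may expect a different rule.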