
On Tuesday, 28 June 2011 at 09:33 -0700, Toshio Kuratomi wrote:
Issuing a warning like "open used without explicit encoding may lead to errors" if open() is used without an explicit encoding would help a little (at least, people who get errors would then have an inkling that the culprit might be an open() call). If I read Victor's previous email correctly, though, he said this was previously rejected.
Oh sorry, I used the wrong word. I listed two other possible solutions, but they were not really rejected. I just thought that changing the default encoding to UTF-8 was the most widely accepted idea.
If I mix different suggestions together: another solution is to emit a warning whenever the encoding is not specified (not only when the locale encoding is something other than UTF-8). Using encoding="locale" would silence it. It would be annoying if the warning were displayed by default ("This will make things harder for simple scripts which are not intended to be cross-platform." wrote Paul Moore). It only makes sense if we use the same policy as for unclosed files/sockets: hidden by default, but configurable with command line options (-Werror, yeah!).
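To make the warning policy concrete, here is a minimal sketch written as a wrapper around open(). The names checked_open and UnspecifiedEncodingWarning are made up for illustration and are not part of any proposal in this thread; the sketch only assumes that "locale" would mean the locale's preferred encoding.

```python
import locale
import warnings


class UnspecifiedEncodingWarning(Warning):
    """Hypothetical warning category, invented for this sketch."""


# Hidden by default, like the warning for unclosed files/sockets.
# append=True puts the filter at the end of the list, so filters given
# on the command line (for example -W error) still take precedence.
warnings.simplefilter("ignore", UnspecifiedEncodingWarning, append=True)


def checked_open(file, mode="r", encoding=None, **kwargs):
    """Hypothetical wrapper around open() illustrating the policy."""
    if "b" not in mode:
        if encoding is None:
            # Text mode without an explicit encoding: warn the caller.
            warnings.warn(
                "open used without explicit encoding may lead to errors",
                UnspecifiedEncodingWarning,
                stacklevel=2,
            )
        if encoding is None or encoding == "locale":
            # Both cases end up using the locale encoding; only the
            # explicit encoding="locale" spelling is quiet.
            encoding = locale.getpreferredencoding(False)
    return open(file, mode, encoding=encoding, **kwargs)
```

With this sketch, checked_open(path) and checked_open(path, encoding="locale") open the file the same way, but only the first one can warn, and running the script with -W error would turn that warning into an exception.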
Another brainstorming solution would be to use different default encodings on different platforms. For instance, for writing files, UTF-8 on *nix systems (including Mac OS X) and UTF-16 on Windows.
I don't think that UTF-16 is a better choice than UTF-8 on Windows :-(
For reading files, check for a UTF-16 BOM; if none is present, read as UTF-8.
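As a rough illustration of this two-part proposal (platform-dependent encoding for writing, BOM check for reading), the following sketch uses hypothetical helper names default_write_encoding, sniff_read_encoding and read_text. It relies on the standard "utf-16" codec, which writes a BOM when encoding and consumes it when decoding.

```python
import codecs
import sys


def default_write_encoding(platform=sys.platform):
    """Hypothetical helper: UTF-16 on Windows, UTF-8 elsewhere."""
    return "utf-16" if platform == "win32" else "utf-8"


def sniff_read_encoding(path):
    """Hypothetical helper: 'utf-16' if the file starts with a
    UTF-16 BOM (either byte order), otherwise 'utf-8'."""
    with open(path, "rb") as f:
        head = f.read(2)
    if head in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE):
        return "utf-16"
    return "utf-8"


def read_text(path):
    """Read a whole text file using the sniffed encoding."""
    with open(path, encoding=sniff_read_encoding(path)) as f:
        return f.read()
```

Even this small sketch has to make choices that not everybody would agree with: for example, a UTF-32 little-endian BOM starts with the same two bytes as the UTF-16 little-endian one, so such a file would be misdetected here.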
Oh oh. I already suggested reading the BOM. See http://bugs.python.org/issue7651 and read the email thread "Improve open() to support reading file starting with an unicode BOM":
http://mail.python.org/pipermail/python-dev/2010-January/097102.html
Reading the BOM is a can of worms: everybody expects something different. I gave up on the idea of changing that.
Victor