
On 28 June 2011 14:43, Victor Stinner <victor.stinner@haypocalc.com> wrote:
As discussed before on this list, I propose to set the default encoding of open() to UTF-8 in Python 3.3, and add a warning in Python 3.2 if open() is called without an explicit encoding and if the locale encoding is not UTF-8. Using the warning, you will quickly notice the potential problem (using Python 3.2.2 and -Werror) on Windows or by using a different locale encoding (e.g. using LANG="C").
-1. This will make things harder for simple scripts which are not intended to be cross-platform. I use Windows, and come from the UK, so 99% of my text files are ASCII. So the majority of my code will be unaffected. But in the occasional situation where I use a £ sign, I'll get encoding errors, where currently things will "just work". And the failures will be data dependent, and hence intermittent (the worst type of problem). I'll write a quick script, use it once and it'll be fine, then use it later on some different data and get an error. :-( I appreciate that the point here is to make sure that people think a bit more carefully about encoding issues. But doing so by making Python less friendly for casual, ad hoc script use seems to me to be a mistake.
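To make the failure mode concrete, here is a minimal sketch. The sample strings and the helper name try_utf8 are made up for illustration, and cp1252 is assumed to be the ANSI code page on a UK Windows machine. It decodes bytes directly rather than going through open(), so it behaves the same on any platform:

```python
# Hypothetical sample data: the bytes a cp1252 (assumed ANSI code page)
# program would write to disk. "\u00a3" is the pound sign.
ascii_line = "Total: 10 pounds".encode("cp1252")
pound_line = "Total: \u00a310".encode("cp1252")


def try_utf8(raw):
    """Return raw decoded as UTF-8, or None if it is not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


# Pure ASCII data decodes identically under both encodings.
print(try_utf8(ascii_line))

# The pound sign is the single byte 0xA3 in cp1252, which is not a valid
# UTF-8 sequence on its own, so only this input fails.
print(try_utf8(pound_line))
```

The same script works or fails depending only on whether the input happens to contain a non-ASCII character.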
I don't think that Windows developers even know that they are writing files into the ANSI code page. MSDN documentation of WideCharToMultiByte() warns developers that the ANSI code page is not portable, even across Windows computers:
Probably true. But for many uses they also don't care. If you're writing something solely for a one-off job on your own PC, the ANSI code page is fine, and provides interoperability with other programs on your PC, which is really what you care about. (UTF-8 without BOM displays incorrectly in Vim, WordPad, and PowerShell Get-Content. mbcs works fine in all of these. UTF-8 also displays incorrectly in CMD type, but in a less familiar form than the incorrect display mbcs produces, for what that's worth...)
It will always be possible to use the ANSI code page using encoding="mbcs" (only works on Windows), or an explicit code page number (e.g. encoding="cp1252").
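As a rough illustration of what passing an explicit encoding looks like, here is a small sketch. The file name and sample text are invented, cp1252 is used in place of "mbcs" so that it runs on any platform, and locale.getpreferredencoding(False) is shown on the assumption that it reflects what open() falls back to when no encoding is given:

```python
import locale
import os
import tempfile

# What open() is assumed to fall back to when encoding is omitted.
default_encoding = locale.getpreferredencoding(False)
print("locale default:", default_encoding)

# Hypothetical file and content, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "example.txt")
text = "Price: \u00a310\n"  # "\u00a3" is the pound sign

# Write and read back with the code page named explicitly, so the
# result does not depend on the locale of the machine running this.
with open(path, "w", encoding="cp1252") as f:
    f.write(text)

with open(path, "r", encoding="cp1252") as f:
    round_trip = f.read()

# On disk the pound sign is the single cp1252 byte 0xA3.
with open(path, "rb") as f:
    raw = f.read()

print(round_trip == text, raw)
```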
So, in effect, you propose making the default favour writing multiplatform portable code at the expense of quick and dirty scripts? My personal view is that this is the wrong choice ("practicality beats purity") but I guess it's ultimately a question of Python's design philosophy.
The two other (rejected?) options to improve open() are:
- raise an error if the encoding argument is not set: will break most programs
- emit a warning if the encoding argument is not set
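For anyone who wants to see what the warning variant would feel like in practice, it can be approximated in user code. This is only a sketch: open_checked is a hypothetical wrapper, not a proposed API, and the choice of UserWarning is arbitrary:

```python
import builtins
import codecs
import locale
import warnings


def open_checked(file, mode="r", buffering=-1, encoding=None, **kwargs):
    """Hypothetical wrapper around open() that warns when a text-mode
    call omits encoding and the locale encoding is not UTF-8."""
    if "b" not in mode and encoding is None:
        locale_encoding = locale.getpreferredencoding(False)
        # codecs.lookup() normalises aliases such as "UTF8" to "utf-8".
        if codecs.lookup(locale_encoding).name != "utf-8":
            warnings.warn(
                "open_checked() called without an explicit encoding; "
                "locale encoding is %r" % locale_encoding,
                UserWarning,
                stacklevel=2,
            )
    return builtins.open(file, mode, buffering, encoding=encoding, **kwargs)
```

Running a script with -Werror would then turn each such call into an exception, which is the behaviour described for Python 3.2.2 above.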
IMHO, you missed another option - open() does not need improving, the current behaviour is better than any of the 3 options noted.

Paul.