[Python-Dev] open(): set the default encoding to 'utf-8' in Python 3.3?

Toshio Kuratomi a.badger at gmail.com
Tue Jun 28 18:33:51 CEST 2011

On Tue, Jun 28, 2011 at 03:46:12PM +0100, Paul Moore wrote:
> On 28 June 2011 14:43, Victor Stinner <victor.stinner at haypocalc.com> wrote:
> > As discussed before on this list, I propose to set the default encoding
> > of open() to UTF-8 in Python 3.3, and add a warning in Python 3.2 if
> > open() is called without an explicit encoding and if the locale encoding
> > is not UTF-8. Using the warning, you will quickly notice the potential
> > problem (using Python 3.2.2 and -Werror) on Windows or by using a
> > different locale encoding (e.g. using LANG="C").
> -1. This will make things harder for simple scripts which are not
> intended to be cross-platform.
> I use Windows, and come from the UK, so 99% of my text files are
> ASCII. So the majority of my code will be unaffected. But in the
> occasional situation where I use a £ sign, I'll get encoding errors,
> where currently things will "just work". And the failures will be data
> dependent, and hence intermittent (the worst type of problem). I'll
> write a quick script, use it once and it'll be fine, then use it later
> on some different data and get an error. :-(
I don't think this change would make things "harder".  It will just move
where the pain occurs.  Right now, the failures are intermittent A) on
computers other than the one that you're using, or B) when the script is run
as a different user than yourself.  Sysadmins where I work are constantly
writing ad hoc scripts in Python that break because you stick something in a
cron job, the locale settings suddenly become "C", and the script therefore
only handles ASCII characters.
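
To make the cron case concrete, here is a minimal sketch (the file name and
sample text are made up for illustration).  It prints the encoding that
open() would fall back to in the current process, then shows that passing an
explicit encoding takes the locale out of the picture:

```python
import locale
import os
import tempfile

# The encoding open() falls back to when none is given.  It depends on the
# locale settings of the process (for example LANG=C under cron).
default_encoding = locale.getpreferredencoding(False)
print("locale default:", default_encoding)

text = "Price: £5\n"  # illustrative sample data
path = os.path.join(tempfile.mkdtemp(), "price.txt")  # hypothetical file

# With an explicit encoding, writing and reading no longer depend on the
# locale of whoever runs the script.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

with open(path, encoding="utf-8") as f:
    round_tripped = f.read()

# Under an ASCII-only locale the same text cannot be encoded at all.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

Running it once from a login shell and once with LANG=C should print
different "locale default" values on most *nix systems, while the round trip
itself stays the same.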

I don't know that Victor's proposed solution is the best.  I personally would
like it a whole lot more than the current guessing, but I never develop on
Windows, so I can certainly see that your environment can lead to the
opposite assumption :-)  Still, something should change here.  Issuing a
warning like "open used without explicit encoding may lead to errors" if
open() is used without an explicit encoding would help a little (at least,
people who get errors would then have an inkling that the culprit might be an
open() call).  If I read Victor's previous email correctly, though, he said
this was previously rejected.
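
As a rough sketch of what such a warning could look like from user code (this
is not how Python itself does or would implement it; the wrapper name
open_with_warning and the choice of UnicodeWarning as the category are
assumptions made for illustration):

```python
import builtins
import warnings


def open_with_warning(file, mode="r", *args, **kwargs):
    """Hypothetical wrapper: warn when a text-mode open() has no encoding."""
    # After file and mode, open() takes buffering and then encoding
    # positionally, so args[1] is the encoding when it is passed that way.
    encoding_given = kwargs.get("encoding") is not None or (
        len(args) >= 2 and args[1] is not None
    )
    if "b" not in mode and not encoding_given:
        warnings.warn(
            "open used without explicit encoding may lead to errors",
            UnicodeWarning,
            stacklevel=2,  # point at the caller, not at this wrapper
        )
    return builtins.open(file, mode, *args, **kwargs)
```

Run with -Werror, a call such as open_with_warning("data.txt") would then
fail at the open() call itself rather than later on some particular input.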

Another brainstorming solution would be to use different default encodings on
different platforms.  For instance, for writing files, UTF-8 on *nix systems
(including Mac OS X) and UTF-16 on Windows.  For reading files, check for a
UTF-16 BOM and, if it is not present, operate as UTF-8.  That would seem to
address your issue with detection by vim, etc., but I'm not sure about getting
"£" in your input stream.  I don't know where your input is coming from or how
the Windows equivalent of locale plays into that.
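
The reading half of that idea can be sketched in a few lines.  The helper
name read_text_bom_aware is hypothetical, and the sketch reads the whole file
as bytes and does no newline translation, so it only illustrates the
detection rule and is not a drop-in replacement for open():

```python
import codecs


def read_text_bom_aware(path):
    """Hypothetical helper: UTF-16 if a UTF-16 BOM is present, else UTF-8."""
    with open(path, "rb") as f:
        raw = f.read()
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec consumes the BOM and takes the byte order
        # from it.
        return raw.decode("utf-16")
    return raw.decode("utf-8")
```

One caveat with this rule: a UTF-32 little-endian BOM begins with the same
two bytes as the UTF-16 little-endian one, so such files would be
misdetected.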
