[Python-3000] locale-aware strings ?

David Hopwood david.nospam.hopwood at blueyonder.co.uk
Tue Sep 5 02:28:54 CEST 2006

Guido van Rossum wrote:
> On 9/4/06, David Hopwood <david.nospam.hopwood at blueyonder.co.uk> wrote:
>> Guido van Rossum wrote:
>> > I've always said (can someone find a quote perhaps?) that there ought
>> > to be a sensible default encoding for files (including but not limited
>> > to stdin/out/err), perhaps influenced by personalized settings,
>> > environment variables, the OS, etc.
>> While it should be possible to find out what the OS believes to be
>> the current "system" charset (GetCPInfoEx(CP_ACP, ...) on Windows;
>> the LC_ALL, LC_CTYPE or LANG environment variables on Unix), that does
>> not mean that it is this charset that Python programs should normally
>> use. When defining a new text-based file type, it is simpler to define
>> it to be always UTF-8.
> In this particular case I don't care what's simpler to implement,

The issue is not simplicity of implementation; it is what will provide
the simplest usage model in the long term. If new files are encoded in X
just because most of a user's existing files are encoded in X, then how is
the user supposed to migrate to a different encoding? Language specifications
can have a significant effect in helping migration to Unicode.

> but what's most likely to do what the user expects.

In practice, the system charset is often set to the charset that should
be used as a fallback *for applications that do not support Unicode*. This
is especially true on Windows systems.
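
As a rough illustration of the point, the sketch below asks Python what
the platform reports as its preferred charset and then checks whether that
charset can represent text in several scripts at once. It assumes a
present-day Python 3 with locale.getpreferredencoding(); the helper names
system_charset and is_unicode_capable are hypothetical, chosen only for
this example.

```python
import locale


def system_charset():
    """Return the name of the encoding the platform reports as preferred."""
    # do_setlocale=False: report the encoding without temporarily
    # changing the process locale.
    return locale.getpreferredencoding(False)


def is_unicode_capable(name):
    """Return True if codec `name` can encode a small multi-script sample."""
    sample = "\u00e9 \u0416 \u65e5"  # Latin, Cyrillic and CJK characters
    try:
        sample.encode(name)
    except (UnicodeEncodeError, LookupError):
        # Either the codec cannot represent the sample, or it is unknown.
        return False
    return True


if __name__ == "__main__":
    charset = system_charset()
    print(charset, is_unicode_capable(charset))
```

A legacy code page such as cp1252 fails the check, while UTF-8 passes, which
is why the reported system charset alone is a weak guide for new file types.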

Using UTF-8 by default for new file types is not only simpler, it's more
functional. If a BOM is written at the start of the file, and if the user
edits files with a text editor that recognizes this, then everything,
including writing text in multiple scripts, will Just Work from the user's
point of view.
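
A minimal sketch of writing such a file, assuming Python 3's open() with an
encoding parameter and the standard "utf-8-sig" codec. The function names
below are hypothetical and exist only for this example.

```python
import codecs


def write_utf8_with_bom(path, text):
    """Write `text` to `path` as UTF-8, preceded by a BOM."""
    # "utf-8-sig" emits the UTF-8 BOM (EF BB BF) once, at the start.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        f.write(text)


def read_utf8(path):
    """Read a UTF-8 file, dropping a leading BOM if one is present."""
    # "utf-8-sig" also accepts files that have no BOM.
    with open(path, "r", encoding="utf-8-sig", newline="") as f:
        return f.read()


def has_bom(path):
    """Return True if the file's first bytes are the UTF-8 BOM."""
    with open(path, "rb") as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8
```

Text mixing several scripts round-trips unchanged through this pair of
functions, and an editor that recognizes the BOM can pick the encoding
without consulting the system charset.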

> If on a particular box
> most files are encoded in encoding X, and the user did whatever is
> necessary to tell the tools that that's their preferred encoding, I
> want Python to honor that encoding when opening text files, unless the
> program makes other arrangements explicitly (such as specifying an
> explicit encoding as a parameter to open()).

I would prefer that there be no default encoding at all. But since that is
incompatible with the existing API for open(), I accept that I'm not likely
to win that argument.
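
For what it is worth, a program can get "no default" behaviour for itself
without any change to open(). The sketch below assumes Python 3
keyword-only arguments; open_text is a hypothetical wrapper, not an
existing or proposed library function.

```python
def open_text(path, mode="r", *, encoding, **kwargs):
    """Open a text file; the caller must always name the encoding.

    `encoding` is keyword-only and has no default, so leaving it out
    raises TypeError before any file is touched.
    """
    if "b" in mode:
        raise ValueError("open_text() is for text files; use open() for binary")
    return open(path, mode, encoding=encoding, **kwargs)
```

A call such as open_text("notes.txt", "w", encoding="utf-8") works as usual,
while open_text("notes.txt") fails immediately, so the encoding decision is
always visible at the call site.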

David Hopwood <david.nospam.hopwood at blueyonder.co.uk>
