[Python-3000] locale-aware strings ?

Paul Prescod paul at prescod.net
Tue Sep 5 18:08:47 CEST 2006


On 9/4/06, Guido van Rossum <guido at python.org> wrote:
>
> In this particular case I don't care what's simpler to implement, but
> what's most likely to do what the user expects. If on a particular box
> most files are encoded in encoding X, and the user did whatever is
> necessary to tell the tools that that's their preferred encoding, I
> want Python to honor that encoding when opening text files, unless the
> program makes other arrangements explicitly (such as specifying an
> explicit encoding as a parameter to open()).


It does not strike me as accurate that on a modern computer system, a
Swedish person's computer is full of ISO/Swedish encoded files and a Chinese
person's computer is full of a speciifc Chinese encoding etc. Maybe that was
true before the notion of variant encodings became so popular.

But now Europeans are just as likely to use UTF-8 as a national encoding and
Asians each have MANY different encodings to select from (some defined by
Unicode, some national). I doubt you'll frequently guess correctly except in
specialized apps where a user has very explicit control over their file
encodings and doesn't depend on applications to choose. The direction over
the lifetype of Python 3000 will be AWAY from national, local,
locale-predictable encodings and TOWARDS global, standard encodings. Once we
get to a place where Unicode encodings are dominant, a local-encodings
feature will be useless. In the transition period, it will be actually
harmful.

Also, only a portion of the text data on a computer is in "documents" where
the end-user has control over the encoding. There are also  many, many
configuration files, emails, saved web pages, chat logs etc. where the
encoding was selected by someone else with a potentially different
nationality.

I would guess that "most" text files on "most" computers in any particular
locale are in ASCII/utf-8. Japanese people also have hosts files and
.htaccess files and INI files and log files and ... Python can't know
whether it is dealing with one of these files or an end-user document.

Of the subset of documents that actually have their encoding controlled by
the local user's preferences, an increasing portion with be XML and XML
documents describe their encoding explicitly. It would be wrong to use the
locale to override that.

Beyond all of that: It just seems wrong to me that I could send someone a
bunch of files and a Python program and their results processing them would
be different from mine, despite the fact that we run the same version of
Python on the same operating system.

 Paul Prescod
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20060905/18f98882/attachment.html 


More information about the Python-3000 mailing list