[Python-3000] locale-aware strings ?

Brian Quinlan brian at sweetapp.com
Wed Sep 6 13:33:43 CEST 2006

Marcin 'Qrczak' Kowalczyk wrote:
> Why would it matter? If most of their programs use UTF-8, and it's
> specified by the locale, then fine. My system uses mostly ISO-8859-2,
> and it's also fine, as long as there is a way for the program to get
> that information.

The problem is that blindly using the system encoding is error prone.

For example, I would imagine that when you type:

% less /usr/lib/python2.4/getopt.py

you see "Peter Ĺstrand" rather than "Peter Åstrand".

That happens because getopt.py is encoded in ISO-8859-1 and you are 
using ISO-8859-2 as your default encoding. Maybe you don't care about 
the display glitch, but there are applications where it would be a big 
deal, e.g. when you are populating a database from the contents of text files.
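The mismatch can be reproduced in a few lines of Python 3. This is a minimal sketch: the string literal stands in for the bytes of getopt.py (the file itself is not read), and only the standard `iso-8859-1` and `iso-8859-2` codecs are used.

```python
import unicodedata

# Bytes as they would sit on disk in a file saved as ISO-8859-1 (Latin-1).
raw = "Peter Åstrand".encode("iso-8859-1")
print(hex(raw[6]))  # 0xc5

# Decoding with the wrong single-byte codec raises no error; it simply
# maps byte 0xC5 to a different character.
wrong = raw.decode("iso-8859-2")
right = raw.decode("iso-8859-1")

print(unicodedata.name(right[6]))  # LATIN CAPITAL LETTER A WITH RING ABOVE
print(unicodedata.name(wrong[6]))  # LATIN CAPITAL LETTER L WITH ACUTE
```

Nothing in the second decode signals a problem, which is why the wrong text can flow straight into a database.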

> If a program can't read my text files or filenames or environment
> variables or program invocation arguments, while they are encoded
> according to the locale, then the program is broken.

How can the program know if the file is encoded according to your 
locale? Do you think that all of the text files on your system are 
encoded using ISO-8859-2? Should Python really just guess for you?

> If a file is not encoded using the encoding specified by the locale,
> and I don't tell the program explicitly about the encoding, then it's
> not the program's fault when it can't read that.
> If a language requires extra steps in order to make the locale
> encoding work, then it's unhelpful.

No, it's favoring caution and trying to avoid letting errors slip 
through. If the programmer believes that they understand the issues and 
wants to use the locale encoding, it will cost her fewer than 20 
characters of typing per file open to do so.
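For illustration, here is what that opt-in looks like with the `open()` and `locale` APIs of Python 3 as eventually released (a sketch; the file name `notes.txt` and the variable `locale_enc` are hypothetical, chosen for the example).

```python
import locale
import os
import tempfile

# The encoding the user's locale asks for. Passing False means "don't call
# setlocale()", so the process's current locale settings are reported as-is.
locale_enc = locale.getpreferredencoding(False)

# Hypothetical file, created only so the example has something to read.
path = os.path.join(tempfile.mkdtemp(), "notes.txt")
with open(path, "w", encoding=locale_enc) as f:
    f.write("hello\n")

# The explicit opt-in is a single keyword argument per open() call.
with open(path, encoding=locale_enc) as f:
    text = f.read()

print(text, end="")  # hello
```

Since Python 3.10, `open()` also accepts the string `encoding="locale"` for the same purpose.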

> Most programmers won't bother,
> and their programs will work most of the time when they test it,
> assuming they use it with English texts. Such programs suddenly break
> when used in a non-English speaking country.

And that is a great thing! Their program will break in a nice clean 
understandable way, instead of proceeding and generating incorrect results.
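The difference between breaking cleanly and silently producing wrong results can be sketched with Python 3 codecs. This assumes, purely for illustration, a program that decodes strictly as UTF-8 instead of falling back to a locale's single-byte encoding; the Latin-1 input stands in for a file in an unexpected encoding.

```python
# Latin-1 bytes arriving at a program that did not expect them.
raw = "Peter Åstrand".encode("iso-8859-1")

# A wrong single-byte codec accepts these bytes and returns the wrong text.
silent = raw.decode("iso-8859-2")

# A strict UTF-8 decode (errors="strict" is the default) rejects them.
loud = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xC5 starts a two-byte UTF-8 sequence, but the next byte ("s", 0x73)
    # is not a continuation byte, so decoding stops with a precise error.
    loud = exc
    print(exc.reason, "at position", exc.start)  # invalid continuation byte at position 6
```

The exception names the offending byte offset, which is far easier to diagnose than mangled rows discovered later.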
