[Python-3000] locale-aware strings ?
Brian Quinlan
brian at sweetapp.com
Wed Sep 6 13:33:43 CEST 2006
Marcin 'Qrczak' Kowalczyk wrote:
> Why would it matter? If most of their programs use UTF-8, and it's
> specified by the locale, then fine. My system uses mostly ISO-8859-2,
> and it's also fine, as long as there is a way for the program to get
> that information.
The problem is that blindly using the system encoding is error-prone.
For example, I would imagine that when you type:
% less /usr/lib/python2.4/getopt.py
you see "Peter Ĺstrand" rather than "Peter Åstrand".
That happens because getopt.py is encoded in ISO-8859-1 and you are
using ISO-8859-2 as your default encoding. Maybe you don't care about
the display glitch, but there are applications where it would be a big
deal, e.g. when you are populating a database from the contents of text files.
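
To make the glitch concrete, here is a minimal sketch in present-day
Python 3 syntax (an assumption; it is not the API under discussion in
this thread). It encodes the name as ISO-8859-1, the way getopt.py
stores it, and then decodes the same bytes under both encodings:

```python
# The name as it would be stored in a file saved as ISO-8859-1.
raw = "Peter Åstrand".encode("iso-8859-1")

# Decoding with the encoding the file was actually written in.
as_latin1 = raw.decode("iso-8859-1")

# Decoding with a different locale encoding: no error is raised,
# the byte 0xC5 is simply mapped to a different character.
as_latin2 = raw.decode("iso-8859-2")

print(as_latin1)
print(as_latin2)
```

Note that the second decode succeeds silently, which is why the wrong
text can end up in a database without anyone noticing.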
> If a program can't read my text files or filenames or environment
> variables or program invocation arguments, while they are encoded
> according to the locale, then the program is broken.
How can the program know if the file is encoded according to your
locale? Do you think that all of the text files on your system are
encoded using ISO-8859-2? Should Python really just guess for you?
> If a file is not encoded using the encoding specified by the locale,
> and I don't tell the program explicitly about the encoding, then it's
> not the program's fault when it can't read that.
>
> If a language requires extra steps in order to make the locale
> encoding work, then it's unhelpful.
No, it's favoring caution and trying to avoid letting errors slip
through. If the programmer believes that they understand the issues and
want to use the locale encoding setting, it will cost them fewer than 20
characters of typing per file open to do so.
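
As a rough illustration of that opt-in, here is a sketch using functions
that exist in today's Python 3 (locale.getpreferredencoding and the
encoding argument of open). The helper name open_with_locale_encoding is
hypothetical, and the exact spelling, and therefore the exact character
count, was not settled in this thread:

```python
import locale
import os
import tempfile


def open_with_locale_encoding(path, mode="r"):
    """Hypothetical helper: open a text file, naming the locale encoding explicitly."""
    # getpreferredencoding(False) reports the locale's encoding
    # without changing the process-wide locale settings.
    return open(path, mode, encoding=locale.getpreferredencoding(False))


# Usage: round-trip a small ASCII file through the helper.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open_with_locale_encoding(path, "w") as f:
    f.write("hello\n")
with open_with_locale_encoding(path) as f:
    text = f.read()
os.remove(path)
```

The point is only that the choice is visible in the source: a reader can
see that the programmer decided to trust the locale.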
> Most programmers won't bother,
> and their programs will work most of the time when they test it,
> assuming they use it with English texts. Such programs suddenly break
> when used in a non-English speaking country.
And that is a great thing! Their program will break in a nice clean
understandable way, instead of proceeding and generating incorrect results.
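
A small sketch of the difference, again in present-day Python 3 and
assuming a strict ASCII default purely for illustration (which default
Python 3000 would use was still open at this point):

```python
# Non-English text as bytes, here stored as ISO-8859-1.
raw = "Peter Åstrand".encode("iso-8859-1")

# Strict decoding (the default error handling) refuses to guess.
try:
    raw.decode("ascii")
    failed_cleanly = False
except UnicodeDecodeError as exc:
    failed_cleanly = True
    print("refusing to guess:", exc)

# A lenient decode keeps going and silently damages the data:
# the undecodable byte becomes U+FFFD, the replacement character.
lossy = raw.decode("ascii", errors="replace")
```

The exception points at the offending byte, so the programmer learns
about the problem immediately rather than after the bad data has spread.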
Cheers,
Brian