[Python-3000] locale-aware strings ?

David Hopwood david.nospam.hopwood at blueyonder.co.uk
Wed Sep 6 03:28:31 CEST 2006

Guido van Rossum wrote:
> On 9/5/06, David Hopwood <david.nospam.hopwood at blueyonder.co.uk> wrote:
>> Guido van Rossum wrote:
>> > On 9/5/06, Paul Prescod <paul at prescod.net> wrote:
>> >
>> >> Beyond all of that: It just seems wrong to me that I could send
>> >> someone a bunch of files and a Python program and their results
>> >> processing them would be different from mine, despite the fact that
>> >> we run the same version of Python on the same operating system.
>> >
>> > And it seems just as wrong if Python doesn't do what the user expects.
>> > If I were a beginning Python user, I'd hate it if I had prepared a
>> > simple data file in vi or notepad and my Python program wouldn't read
>> > it right because Python's idea of encoding differs from my editor's.
>> I don't know about vi, but notepad will open and save files that are
>> not in the system ("ANSI") encoding just fine. On opening it checks for
>> a BOM and auto-detects UTF-8 and UTF-16; on saving it will write a BOM
>> if you choose "Unicode" (UTF-16LE), "Unicode big-endian" (UTF-16BE), or
>> UTF-8 in the Encoding drop-down box.
>> This is exactly the behaviour that most users would expect of a
>> well-behaved Unicode-aware app. It should be as easy as possible to
>> match this behaviour in a Python program.
> And this is exactly why I want the determination of the default
> encoding (i.e. the encoding to be used when opening a file when no
> explicit encoding is specified by the Python code that does the
> opening) to be open-ended, rather than picking some standard default
> like UTF-8 and saying (like Paul seems to want to say) "this is it".
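Notepad's BOM check, as described above, is easy to approximate in Python. The following is a minimal sketch, assuming only the UTF-8 and UTF-16 BOMs matter and that the caller names the fallback encoding explicitly. The helpers `sniff_encoding`, `decode_text` and `read_text` are hypothetical names used only for illustration. The sketch looks only at the BOM and makes no attempt to guess the encoding of BOM-less files.

```python
import codecs

# (BOM, codec) pairs. "utf-8-sig" strips a UTF-8 BOM when decoding, and the
# plain "utf-16" codec reads the BOM to pick the byte order, then drops it.
# UTF-32 BOMs are deliberately not handled in this sketch.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def sniff_encoding(data: bytes, fallback: str) -> str:
    """Return a codec name chosen from the BOM, or the explicit fallback."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return fallback

def decode_text(data: bytes, fallback: str) -> str:
    """Decode bytes the way a BOM-aware editor would open them."""
    return data.decode(sniff_encoding(data, fallback))

def read_text(path: str, fallback: str) -> str:
    """Read a whole file as bytes, then decode it via BOM sniffing."""
    with open(path, "rb") as f:
        return decode_text(f.read(), fallback)
```

The fallback is a required argument rather than a hidden default, so the program still has to decide what to do with files that carry no BOM.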

The point I was making is that the system encoding *should not* be
treated as (or called) a "default" encoding. I can't speak for Paul, but
that seemed to also be what he was saying.

The whole idea of a default encoding is flawed. Ideally there would be
no default; programmers should be forced to think about the issue
on a case-by-case basis. In some cases they might choose to open a file
with the system encoding, but that should be an explicit decision.
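To illustrate what "no default" could look like at the API level, here is a minimal sketch built around a hypothetical wrapper `open_text` over the built-in `open()`. The encoding is a required keyword-only argument, so leaving it out is an error rather than a silent fallback. Anyone who wants the system encoding has to ask for it by name through `locale.getpreferredencoding(False)`.

```python
import locale

def open_text(path, mode="r", *, encoding):
    """Open a text file; the caller must always name the encoding.

    `encoding` is keyword-only and has no default, so omitting it raises
    TypeError instead of quietly using the system encoding.
    """
    return open(path, mode, encoding=encoding)

# Using the system encoding is still possible, but only as a visible,
# explicit choice made at the call site.
system_encoding = locale.getpreferredencoding(False)
```

With this wrapper, `open_text("notes.txt", encoding="utf-8")` behaves like `open()`, while `open_text("notes.txt")` fails with `TypeError` before any file is touched.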

>> (Setting different locales for different applications is far too much
>> hassle. On Windows, although I believe it is technically possible to
>> do the equivalent of selecting a UTF-8 locale, most users don't know
>> how to do it, even if they want to use UTF-8 exclusively.)
> Right. Of course, "locale" and "encoding" are somewhat orthogonal
> issues; the encoding may be UTF-8 but that doesn't determine other
> aspects of the locale (such as language-specific collation order, or
> culture-specific formatting of numbers, dates and money).

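The encoding is usually an attribute of the locale. This is certainly
the case on POSIX and Windows platforms: on POSIX the character encoding
is determined by the LC_CTYPE category of the locale, and on Windows the
ANSI code page is determined by the system locale.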
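As a sketch of how this looks from Python's standard `locale` module: after the program adopts the user's locale, the encoding follows LC_CTYPE, while number formatting comes separately from LC_NUMERIC. `describe_locale` is a hypothetical helper. `nl_langinfo` exists only on POSIX systems. On Windows, `getpreferredencoding` reports the ANSI code page instead, and if Python's UTF-8 mode is enabled it reports UTF-8 on every platform.

```python
import locale

def describe_locale():
    """Report the user's locale encoding alongside an unrelated locale facet."""
    saved = locale.setlocale(locale.LC_ALL)  # query, without changing anything
    try:
        try:
            # Adopt the user's environment (LANG, LC_ALL, LC_* variables).
            locale.setlocale(locale.LC_ALL, "")
        except locale.Error:
            # The environment names a locale that is not installed.
            locale.setlocale(locale.LC_ALL, "C")
        info = {
            # Follows LC_CTYPE on POSIX; the ANSI code page on Windows.
            "encoding": locale.getpreferredencoding(False),
            # Follows LC_NUMERIC, independently of the encoding.
            "decimal_point": locale.localeconv()["decimal_point"],
        }
        if hasattr(locale, "nl_langinfo"):
            # POSIX only: the codeset of the current LC_CTYPE category.
            info["codeset"] = locale.nl_langinfo(locale.CODESET)
        return info
    finally:
        # The locale is process-wide, so restore what was in effect on entry.
        locale.setlocale(locale.LC_ALL, saved)

if __name__ == "__main__":
    print(describe_locale())
```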

David Hopwood <david.nospam.hopwood at blueyonder.co.uk>
