[Python-3000] locale-aware strings ?

Guido van Rossum guido at python.org
Tue Sep 5 21:13:46 CEST 2006


On 9/5/06, Brian Quinlan <brian at sweetapp.com> wrote:
> Guido van Rossum wrote:
> > And it seems just as wrong if Python doesn't do what the user expects.
> > If I were a beginning Python user, I'd hate it if I had prepared a
> > simple data file in vi or notepad and my Python program wouldn't read
> > it right because Python's idea of encoding differs from my editor's.
>
> As a user, I don't have any expectations regarding non-ASCII text files.

What tools do you use to edit or view those files? How do those tools
know the encoding to use?

(Auto-detection from sniffing the data is a perfectly valid answer BTW
-- I see no reason why that couldn't be one option, as long as there's
a way to disable it.)
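
As a rough illustration of sniffing with an off switch, here is a minimal sketch that only looks for a byte order mark and otherwise falls back to a caller-supplied default. The helper name `sniff_encoding` and its `sniff` flag are hypothetical, not an existing or proposed Python API, and real detection would need far more than a BOM check.

```python
import codecs

# Hypothetical helper for illustration only; not part of any Python API.
# UTF-32 BOMs are checked before UTF-16 because the UTF-32-LE BOM
# begins with the same two bytes as the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_encoding(data, fallback, sniff=True):
    """Return a codec name for `data` (bytes).

    If sniffing is enabled and the data starts with a known BOM, return
    the matching codec; otherwise return `fallback` unchanged.
    """
    if sniff:
        for bom, name in _BOMS:
            if data.startswith(bom):
                return name
    return fallback


if __name__ == "__main__":
    sample = "h\u00e9llo".encode("utf-16")  # the utf-16 codec writes a BOM
    print(sample.decode(sniff_encoding(sample, "ascii")))
```

Passing `sniff=False` is the "way to disable it": the caller's fallback is then used no matter what the data looks like.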

> I'm using a US-English version of Windows XP (very common) and I haven't
> changed the default encoding (very common). Python claims that my system
> encoding is CP437 (from sys.stdin/stdout.encoding). I can assure you
> that most of the documents that I work with are not in CP437 - they are
> a combination of ASCII, ISO8859-1, and UTF-8. I would also guess that
> this is true of many Windows XP (US-English) users. So, for me and users
> like me, Python is going to silently misinterpret my data.

Not to any greater extent than Notepad or whatever other tool you are using.
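
To make the failure mode concrete, here is a small sketch of the same bytes read under different assumed encodings. The sample text and the list of codecs are chosen purely for illustration; the point is only that a wrong guess decodes without error and yields different text, while a strict ASCII default refuses the data outright.

```python
# The bytes a UTF-8 editor would write for a short non-ASCII word.
raw = "caf\u00e9".encode("utf-8")

# Decoding with the right codec round-trips; the wrong ones "succeed"
# silently but produce different characters.
for name in ("utf-8", "iso8859-1", "cp437"):
    print(name, repr(raw.decode(name)))

# A strict ASCII default raises instead of guessing.
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    print("ascii failed:", exc.reason)
```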

> How about using ASCII as the default encoding and raising an exception
> if non-ASCII text is encountered?

That would not be doing what the user wants. We have extensive
experience with defaulting to ASCII in Python 2.x and it's mostly bad.
There should definitely be a way to force ASCII as the default
encoding (if only as a debugging aid), both in the program code and in
the environment; but it shouldn't be the only default. There should
also be a way to force UTF-8 as the default, or ISO-8859-1. But if
CP437 is the default encoding set by the OS I don't see why Python
shouldn't use that as the default *in the absence of any other
preferences*.
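
The precedence described above could be sketched roughly as follows: an explicit choice in the program wins, then a setting in the environment, and only then the OS default. The function name and the `MYAPP_DEFAULT_ENCODING` variable are made up for this example; `locale.getpreferredencoding` is the existing standard-library call that reports the locale's encoding.

```python
import locale
import os


def pick_default_encoding(explicit=None, environ=None):
    """Choose a default text encoding by precedence.

    1. `explicit`: a value forced in program code (e.g. "ascii").
    2. MYAPP_DEFAULT_ENCODING: a hypothetical environment variable,
       named here only for illustration.
    3. Whatever the OS/locale reports, used in the absence of any
       other preference.
    """
    environ = os.environ if environ is None else environ
    if explicit:
        return explicit
    forced = environ.get("MYAPP_DEFAULT_ENCODING")
    if forced:
        return forced
    # False: query the current locale without changing it.
    return locale.getpreferredencoding(False)


if __name__ == "__main__":
    print(pick_default_encoding())          # OS default
    print(pick_default_encoding("ascii"))   # forced, e.g. for debugging
```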

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

