[Python-3000] locale-aware strings ?
paul at prescod.net
Wed Sep 6 03:53:53 CEST 2006
On 9/5/06, Guido van Rossum <guido at python.org> wrote:
> On 9/5/06, David Hopwood <david.nospam.hopwood at blueyonder.co.uk> wrote:
> > Guido van Rossum wrote:
> > > On 9/5/06, Paul Prescod <paul at prescod.net> wrote:
> > >
> > >> Beyond all of that: It just seems wrong to me that I could send someone
> > >> a bunch of files and a Python program and their results processing them
> > >> would be different from mine, despite the fact that we run the same
> > >> version of Python on the same operating system.
> > >
> > > And it seems just as wrong if Python doesn't do what the user expects.
> > > If I were a beginning Python user, I'd hate it if I had prepared a
> > > simple data file in vi or notepad and my Python program wouldn't read
> > > it right because Python's idea of encoding differs from my editor's.
> > I don't know about vi, but Notepad will open and save files that are not
> > in the system ("ANSI") encoding just fine. On opening, it checks for a BOM
> > and auto-detects UTF-8 and UTF-16; on saving, it will write a BOM if you
> > choose "Unicode" (UTF-16LE), "Unicode big-endian" (UTF-16BE), or UTF-8 in
> > the Encoding drop-down box.
> > This is exactly the behaviour that most users would expect of a
> > Unicode-aware app. It should be as easy as possible to match this
> > in a Python program.
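To make the Notepad behaviour described above concrete, here is a minimal Python sketch of BOM sniffing. The helper names `sniff_encoding` and `read_text` are hypothetical, and the strict UTF-8 trial decode for files without a BOM is an assumed stand-in for Notepad's own detection heuristics, which the message does not spell out.

```python
import codecs


def sniff_encoding(data: bytes, fallback: str = "ascii") -> str:
    """Hypothetical helper: guess an encoding from a file's raw bytes.

    Order: BOM check first, then a strict UTF-8 trial decode (an assumed
    heuristic), then the caller-supplied fallback.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # decoding with utf-8-sig strips the BOM
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # the utf-16 codec reads the BOM to pick byte order
    try:
        data.decode("utf-8")  # no BOM: accept the bytes if they are valid UTF-8
        return "utf-8"
    except UnicodeDecodeError:
        return fallback


def read_text(path: str, fallback: str = "ascii") -> str:
    """Hypothetical helper: read a file, decoding it with the sniffed encoding."""
    with open(path, "rb") as f:
        data = f.read()
    return data.decode(sniff_encoding(data, fallback))
```

With the default `fallback="ascii"`, a BOM-less Latin-1 file makes `read_text` raise `UnicodeDecodeError` instead of silently guessing; a caller that knows better passes its own fallback.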
> And this is exactly why I want the determination of the default
> encoding (i.e. the encoding to be used when opening a file when no
> explicit encoding is specified by the Python code that does the
> opening) to be open-ended, rather than picking some standard default
> like UTF-8 and saying (like Paul seems to want to say) "this is it".
I never suggested that UTF-8 should be the default. In fact, I think it was
very wise of Python 2.x to make ASCII the default and I'm astounded to hear
that you regret that decision. "In the face of ambiguity, refuse the
temptation to guess."
Python 2.x provided an option to allow users to change the default
system-wide, and ever since then we've (almost unanimously) counselled users
against changing it.
> > > Sorry Paul, I appreciate your standards-driven perspective, but in
> > > this area I'd rather build in more flexibility than strictly needed,
> > > than too little. If it turns out that on a particular platform all
> > > files are in UTF-8, making Python *on that platform* always choose
> > > UTF-8 is simple enough.
> > The problem is not the systems where all files are UTF-8, or all files
> > are in another known charset. The problem is the platforms where half of
> > the files are UTF-8 and half are in some other charset, determined either
> > by type or by presence of a UTF-8 BOM. This is a *very* common situation,
> > especially for European users.
> Right. (And Paul appears to be ignorant of this.)
I don't see how the fact that an individual system can have half of the
files in one encoding and half in another could argue IN FAVOUR of a
system-global default. I would have thought it strengthens my argument
AGAINST trying to apply a random encoding to files.
"If on a particular box
most files are encoded in encoding X, and the user did whatever is
necessary to tell the tools that that's their preferred encoding, I
want Python to honor that encoding when opening text files, unless the
program makes other arrangements explicitly (such as specifying an
explicit encoding as a parameter to open())."
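The policy quoted above (an explicit encoding wins; otherwise honor the platform's preferred encoding) could be sketched as a thin wrapper around Python 3's `open()`. `open_text` is a hypothetical name, and using `locale.getpreferredencoding(False)` as "whatever the user told the tools" is an assumption made for illustration.

```python
import locale


def open_text(path, mode="r", encoding=None):
    """Hypothetical wrapper: an explicit encoding always wins; otherwise
    fall back to the encoding the user's locale reports as preferred."""
    if encoding is None:
        # Ask the locale for its preferred encoding without calling setlocale().
        encoding = locale.getpreferredencoding(False)
    return open(path, mode, encoding=encoding)
```

A program that "makes other arrangements explicitly" simply passes `encoding="utf-8"` (or whatever it needs); only calls that omit it depend on the machine's locale, which is exactly the part Paul objects to.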
But there is no such thing that "most users do" to tell tools what their
preferred encoding is. Most users use some random (to them) operating system
default, which on Windows is usually wrong and which differs (for no
particular reason) between the Macintosh and Linux. Long-time Windows users
in this thread cannot even agree on what the default is for US English
Windows, because there is no single default. There are two.
Can we at least agree that if LC_CHARSET is demonstrably wrong most of the
time on Windows, we should not use it (at least on Windows)?
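For contrast, a minimal sketch of the alternative argued for in this message: don't consult the locale on Windows, and fall back to ASCII in the spirit of "refuse the temptation to guess". The function name `default_text_encoding` and the choice of ASCII as the Windows fallback are assumptions for illustration; the message itself only says the locale setting should not be used there.

```python
import locale
import sys


def default_text_encoding(platform=sys.platform):
    """Hypothetical policy: ignore the locale on Windows and use ASCII, so
    non-ASCII input fails loudly instead of being silently mis-decoded."""
    if platform == "win32":
        return "ascii"
    # Elsewhere, keep the locale-derived default.
    return locale.getpreferredencoding(False)
```

Under this policy, `b"caf\xe9".decode(default_text_encoding("win32"))` raises `UnicodeDecodeError` rather than picking one of the competing Windows code pages.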