[Python-Dev] Unicode strings as filenames
Martin v. Loewis
martin@v.loewis.de
Sun, 6 Jan 2002 01:20:27 +0100
> Hmm, shouldn't StringObjects themselves carry an encoding field
> (defaulting to sys.encoding)?
That approach has been discussed during the design phase of the
Unicode API; Bill Janssen was the first to propose this in response
to my talk:
http://www.python.org/workshops/1997-10/proceedings/loewis.html
During the Unicode design, this idea came up sometimes, but it always
turned out that proposers could not give a coherent semantics to such
tags. Just explain what happens if you add two strings that have
different encodings.
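
To make the difficulty concrete, here is a minimal sketch in present-day
Python 3. The TaggedBytes class is purely hypothetical (nothing like it
exists in Python); it only illustrates that concatenating byte strings
with different encoding tags has no obviously correct result.

```python
# Hypothetical illustration only: a byte string that carries an encoding tag.
class TaggedBytes:
    def __init__(self, data: bytes, encoding: str):
        self.data = data
        self.encoding = encoding

    def __add__(self, other: "TaggedBytes") -> "TaggedBytes":
        if self.encoding == other.encoding:
            return TaggedBytes(self.data + other.data, self.encoding)
        # Keep the left tag? The right one? Recode both operands?
        # Each choice surprises somebody, so this sketch simply refuses.
        raise ValueError("no coherent result for mixed encodings")


latin1 = TaggedBytes("Löwis".encode("latin-1"), "latin-1")
utf8 = TaggedBytes("Löwis".encode("utf-8"), "utf-8")

# Naive byte concatenation is valid under neither tag:
naive = latin1.data + utf8.data
as_latin1 = naive.decode("latin-1")   # decodes, but the second half is garbage
try:
    naive.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:            # the lone 0xF6 byte is not valid UTF-8
    utf8_ok = False
```

Any rule that avoids the error has to recode one operand silently, which
is the same implicit conversion the tag was supposed to make unnecessary.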
> That would solve quite a few issues.
And introduce many new ones.
> > Making UTF-8 the default Python system encoding would have many other
> > consequences -- and you'd probably lose a great deal of portability
> > since UTF-8 conversion (nearly) always will succeed while ASCII can
> > easily fail on other systems which use e.g. Latin-1 as native
> > encoding.
>
> What are your reasons for asserting this?
If I understand this claim correctly, he means:
"Currently, if auto-conversion (to ASCII) succeeds, the result is
likely correct. If the default encoding were UTF-8, conversion would
succeed for all Unicode objects, but give incorrect results for many
users, e.g. if they use Latin-1 on their terminal."
This is actually a frequent problem since the introduction of UTF-8:
Some applications display the bytes that make up a UTF-8 string as if
it were a Latin-1 string, rendering it completely unreadable (although
I can already recognize my name if I run into such an application).
This problem may go unnoticed during testing, whereas an exception
is likely noticed.
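
A small sketch of both failure modes, written in present-day Python 3
with the conversions spelled out explicitly. The Latin-1 terminal is an
assumption made for the example.

```python
name = "Löwis"

# Conversion to ASCII fails loudly for non-ASCII text.
try:
    name.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Conversion to UTF-8 always succeeds ...
utf8_bytes = name.encode("utf-8")

# ... but a consumer that assumes Latin-1 (assumed here) shows garbage,
# and no exception is raised anywhere.
shown_on_latin1_terminal = utf8_bytes.decode("latin-1")
print(ascii_failed, repr(shown_on_latin1_terminal))
```

The first failure stops the program where the mistake is made; the second
only shows up when somebody looks at the output.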
> If I read this correctly this would make Python compatible to the
> least common denominator of all platforms, while I think I would
> prefer it to allow access to all the niceties a platform gives.
It does no such thing. The application has full control over all
conversions, if it initiates them explicitly. Explicit is better than
implicit.
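
As a rough sketch of what initiating the conversion explicitly looks like
in present-day Python 3 (the terminal_encoding value is an assumption for
the example, not something Python determines here):

```python
text = "Löwis"

# Assumption for this example: the application knows its output
# device expects Latin-1 and says so explicitly.
terminal_encoding = "latin-1"

data = text.encode(terminal_encoding)        # explicit str -> bytes
roundtrip = data.decode(terminal_encoding)   # explicit bytes -> str

# The application also decides what happens to unrepresentable
# characters, instead of relying on a default.
fallback = "Ω".encode(terminal_encoding, errors="replace")
```

Because the encoding is named at each call, an application on a platform
with a richer native encoding can simply name that one instead.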
Regards,
Martin