[Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue
Stephen J. Turnbull
stephen at xemacs.org
Tue Sep 30 04:24:29 CEST 2008
Guido van Rossum writes:
> On Mon, Sep 29, 2008 at 4:29 PM, Victor Stinner
> <victor.stinner at haypocalc.com> wrote:
> > It would be hard for a newbie programmer to understand why he's
> > unable to find his very important file ("important r?port.doc")
> > using os.listdir().
> *Every* failure in this scenario will be hard to understand for a
> newbie programmer. We can just document the fact.
Guido is absolutely right. The Emacs/Mule people have been trying to
solve this kind of problem for 20 years, and the best they've come up
with is Martin's strategy: if you need really robust decoding, force
ISO 8859-1 (which for historical reasons maps all 256 octets) to get a
lossless internal text representation, then decode from that and *track
the encoding used* at the application level. The email-sig/Mailman
people will testify how hard this is to do well, even when you have a
handful of RFCs that specify how it is to be done!
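
A minimal sketch of the strategy described above, as I read it: decode the
raw bytes as ISO 8859-1 so that nothing is lost, then re-decode with a
candidate encoding and keep that encoding next to the result. The function
names and the (text, encoding) tuple are hypothetical, chosen only for
illustration; they are not an API from Emacs/Mule or from Python.

```python
def to_lossless_text(raw: bytes) -> str:
    """Map each byte to the code point with the same value (Latin-1)."""
    return raw.decode("latin-1")


def to_raw(text: str) -> bytes:
    """Invert to_lossless_text() and recover the original bytes exactly."""
    return text.encode("latin-1")


def redecode(text: str, encoding: str) -> tuple:
    """Reinterpret the lossless text under `encoding`.

    Returns (decoded_text, encoding) so the application can track which
    encoding was used. Raises UnicodeDecodeError if the bytes do not fit.
    """
    return to_raw(text).decode(encoding), encoding


# Illustrative file name, assumed here to be stored as UTF-8 bytes on disk.
raw_name = "important r\u00e9port.doc".encode("utf-8")
internal = to_lossless_text(raw_name)   # always succeeds, for any bytes
name, used_encoding = redecode(internal, "utf-8")
```

The hard part, as noted, is not this round trip but carrying the tracked
encoding correctly through the rest of the application.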
On the other hand, this kind of robustness is almost never needed in
"general newbie programming", except when you are writing a program to
be used to clean up after an undisciplined administration, or some
other system disaster. Under normal circumstances the system encoding
is well-known and conformance is universal.
The best you can do for a general programming system is to
heuristically determine a single system encoding and raise an error if
the decoding fails.
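
As a rough illustration of that last point, and only under stated
assumptions: the sketch below takes the locale's preferred encoding as the
heuristic for "the system encoding" and decodes strictly, so a
non-conforming name raises an error instead of being silently mangled. The
helper names are hypothetical, and the UTF-8 fallback is my assumption.

```python
import locale
from typing import Optional


def guess_system_encoding() -> str:
    """Heuristic: use the locale's preferred encoding, else assume UTF-8."""
    return locale.getpreferredencoding(False) or "utf-8"


def decode_filename(raw: bytes, encoding: Optional[str] = None) -> str:
    """Decode a file name with one system-wide encoding.

    Strict decoding means UnicodeDecodeError is raised on failure.
    """
    if encoding is None:
        encoding = guess_system_encoding()
    return raw.decode(encoding, errors="strict")
```

For example, decode_filename(b"r\xe9port.doc", "utf-8") raises
UnicodeDecodeError, because the byte 0xE9 followed by "p" is not valid
UTF-8.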