[Python-Dev] Filename as byte string in python 2.6 or 3.0?
Ulrich Eckhardt
eckhardt at satorlaser.com
Mon Sep 29 12:50:03 CEST 2008
On Sunday 28 September 2008, Gregory P. Smith wrote:
> "broken" systems will always exist. Code to deal with them must be
> possible to write in python 3.0.
> since any given path (not just fs) can have its own encoding it makes
> the most sense to me to let the OS deal with the errors and not try to
> enforce bytes vs string encoding type at the python lib. level.
Actually, I'm afraid that isn't really useful. I, too, would like to kick
people's backsides to get them to fix their systems or use the proper
codepage while mounting, etc., but that is not going to happen soon. Just
ignoring those broken systems is tempting, but alienating a large group of
users isn't worth it, IMHO.
Instead, I'd like to present a different approach:
1. For POSIX platforms (using a byte string for the path):
Here, the first step is to convert the path to Unicode according to the
locale's LC_CTYPE category. Hopefully that will be UTF-8, but other codepages
should work, too. If there is a segment (a byte sequence between two path
separators) that cannot be decoded this way, that segment instead uses an
ASCII mapping where possible and codepoints from the "Private Use Area" (PUA)
of Unicode for the non-decodable bytes.
In order to pass this path to fopen(), each segment would be converted back
to a byte string using the locale's LC_CTYPE encoding, except for segments
which use the PUA, where the codepoints are simply mapped back to the
original bytes.
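
To make the round trip concrete, here is a minimal sketch in Python 3. It is
only an illustration under stated assumptions, not an existing API: the names
decode_path/encode_path, the constant PUA_BASE, and the choice of mapping byte
b to U+E000 + b are all hypothetical. It also assumes "/" as the separator and
an ASCII-compatible encoding, passed in explicitly instead of being read from
the locale. A segment that decodes cleanly but already contains one of the
escape codepoints is treated as non-decodable, so that encoding stays
unambiguous.

```python
PUA_BASE = 0xE000  # hypothetical choice: byte b is escaped as U+E000 + b


def _is_escape(ch):
    """True if ch is one of the PUA codepoints used for bytes 0x80..0xFF."""
    return PUA_BASE + 0x80 <= ord(ch) <= PUA_BASE + 0xFF


def decode_segment(raw, encoding):
    """Decode one path segment, falling back to the ASCII/PUA mapping."""
    try:
        text = raw.decode(encoding)
    except UnicodeDecodeError:
        text = None
    # Fall back if decoding failed, or if the decoded text would collide
    # with the escape codepoints and make the reverse step ambiguous.
    if text is None or any(_is_escape(ch) for ch in text):
        return "".join(chr(b) if b < 0x80 else chr(PUA_BASE + b) for b in raw)
    return text


def encode_segment(text, encoding):
    """Encode one path segment; segments with escapes yield original bytes."""
    if not any(_is_escape(ch) for ch in text):
        return text.encode(encoding)
    out = bytearray()
    for ch in text:
        if _is_escape(ch):
            out.append(ord(ch) - PUA_BASE)
        elif ord(ch) < 0x80:
            out.append(ord(ch))
        else:
            raise ValueError("escaped segment may only contain ASCII")
    return bytes(out)


def decode_path(raw, encoding="utf-8"):
    """bytes path -> str, segment by segment."""
    return "/".join(decode_segment(s, encoding) for s in raw.split(b"/"))


def encode_path(text, encoding="utf-8"):
    """str path -> bytes, reversing decode_path."""
    return b"/".join(encode_segment(s, encoding) for s in text.split("/"))


raw = b"/home/caf\xc3\xa9/caf\xe9.txt"  # second name is Latin-1, not UTF-8
text = decode_path(raw)
restored = encode_path(text)
```

In this example the UTF-8 segment decodes normally, the Latin-1 segment keeps
its ASCII characters and carries the stray byte 0xE9 as U+E0E9, and
encode_path() returns the original byte string.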
2. For win32 platforms, the path is already Unicode (UTF-16) and the whole
problem is solved or not solved by the OS.
In the end, both approaches yield a path represented by a Unicode string for
intermediate use, which provides maximum flexibility. Further, it
preserves "broken" encodings by simply mapping their byte-values to the PUA
of Unicode. Maybe not using a string to represent a path would be a good
idea, too. At least it would make it very clear that the string is not
completely free-form.
Sator Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932
Visit our website at <http://www.satorlaser.de/>