[Python-Dev] Filename as byte string in python 2.6 or 3.0?

Ulrich Eckhardt eckhardt at satorlaser.com
Mon Sep 29 12:50:03 CEST 2008


On Sunday 28 September 2008, Gregory P. Smith wrote:
> "broken" systems will always exist.  Code to deal with them must be
> possible to write in python 3.0.
>
> since any given path (not just fs) can have its own encoding it makes
> the most sense to me to let the OS deal with the errors and not try to
> enforce bytes vs string encoding type at the python lib. level.

Actually, I'm afraid that isn't really useful. I, too, would like to kick 
people in the backside to get them to fix their systems or use the proper 
codepage while mounting, etc., but that is not going to happen soon. Just 
ignoring those broken systems is tempting, but alienating a large group of 
users IMHO isn't worth it.

Instead, I'd like to present a different approach:

1. For POSIX platforms (using a byte string for the path):
The first step is to convert the path to Unicode according to the 
locale's LC_CTYPE category. Hopefully that will be UTF-8, but other 
codepages should work too. If there is a segment (a byte sequence between 
two path separators) that cannot be decoded, that segment instead uses an 
ASCII mapping where possible and codepoints from the "Private Use Area" 
(PUA) of Unicode for the non-decodable bytes.
In order to pass this path to fopen(), each segment would be converted back 
to a byte string using the locale's LC_CTYPE category, except for segments 
that use the PUA, for which the original bytes are simply restored.

2. For win32 platforms, the path is already Unicode (UTF-16) and the whole 
problem is solved or not solved by the OS.
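
To make the POSIX scheme in point 1 concrete, here is a minimal sketch in 
Python 3. It is only an illustration under assumptions of my own: the names 
decode_path, encode_path and PUA_BASE are hypothetical, the PUA offset 
U+E000 is an arbitrary choice, and the encoding is passed in explicitly 
(defaulting to UTF-8) instead of being read from the locale.

```python
# Hypothetical base codepoint: byte 0xNN is stored as U+E0NN (assumption).
PUA_BASE = 0xE000


def _is_escape(ch):
    """True if ch is one of the PUA codepoints standing in for a raw byte."""
    return PUA_BASE + 0x80 <= ord(ch) <= PUA_BASE + 0xFF


def decode_path(raw, encoding="utf-8"):
    """Decode a byte path segment by segment.

    Segments that do not decode keep their ASCII bytes as-is and map every
    other byte into the PUA, so no information is lost.
    """
    segments = []
    for seg in raw.split(b"/"):
        try:
            segments.append(seg.decode(encoding))
        except UnicodeDecodeError:
            segments.append("".join(
                chr(b) if b < 0x80 else chr(PUA_BASE + b) for b in seg))
    return "/".join(segments)


def encode_path(path, encoding="utf-8"):
    """Convert a path from decode_path() back to the original bytes."""
    segments = []
    for seg in path.split("/"):
        if any(_is_escape(ch) for ch in seg):
            # Escaped segment: restore the original byte values.
            # Raises ValueError if the segment mixes escapes with other
            # non-ASCII characters, which decode_path() never produces.
            segments.append(bytes(
                ord(ch) - PUA_BASE if _is_escape(ch) else ord(ch)
                for ch in seg))
        else:
            segments.append(seg.encode(encoding))
    return b"/".join(segments)


# Round trip: "caf\xe9" is Latin-1 and invalid as UTF-8, the rest is UTF-8.
raw = b"/home/caf\xe9/ok-\xc3\xa9.txt"
assert encode_path(decode_path(raw)) == raw
```

One known weakness of this sketch: a correctly encoded name that already 
contains codepoints in the chosen PUA range would be mistaken for an escaped 
segment when converting back, so a real implementation would need a way to 
tell the two cases apart.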

In the end, both cases yield a path represented by a Unicode string for 
intermediate use, which provides maximum flexibility. Further, the scheme 
preserves "broken" encodings by simply mapping their byte values to the PUA 
of Unicode. Maybe it would also be a good idea not to represent a path as a 
string at all. That would at least make it very clear that a path is not 
completely free-form text.

Uli

-- 
Sator Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932
