On Sunday 28 September 2008, Gregory P. Smith wrote:
> "Broken" systems will always exist. Code to deal with them must be
> possible to write in Python 3.0.
>
> Since any given path (not just fs) can have its own encoding, it makes
> the most sense to me to let the OS deal with the errors and not try to
> enforce a bytes vs. string encoding type at the Python library level.

Actually, I'm afraid that isn't really useful. I, too, would like to kick people's backsides to get them to fix their systems or use the proper codepage while mounting, etc., but that is not going to happen soon. Just ignoring those broken systems is tempting, but alienating a large group of users isn't worth it, IMHO. Instead, I'd like to present a different approach:

1. For POSIX platforms (which use a byte string for the path): First, try to convert the path to Unicode according to the locale's CTYPE category. Hopefully that will be UTF-8, but codepages should work as well. If there is a segment (a byte sequence between two path separators) that cannot be decoded, use an ASCII mapping where possible and codepoints from the "Private Use Area" (PUA) of Unicode for the non-decodable bytes. To pass this path to fopen(), each segment is converted back to a byte string using the locale's CTYPE category, except for segments that use the PUA, for which the original bytes are simply restored.

2. For win32 platforms, the path is already Unicode (UTF-16), and the whole problem is solved, or not solved, by the OS.

In the end, both approaches yield a path represented by a Unicode string for intermediate use, which provides maximum flexibility. Further, this preserves "broken" encodings by simply mapping their byte values to the PUA of Unicode.

Maybe not using a string to represent a path would be a good idea, too. At least it would make it very clear that a path is not a completely free-form string.

Uli
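PS: To make the POSIX scheme from point 1 a bit more concrete, here is a rough sketch in Python. It is only an illustration under a few assumptions of mine, not a finished design: the names decode_path, encode_path and PUA_BASE are made up, the PUA block starting at U+E000 is an arbitrary choice, the encoding is passed in explicitly instead of being read from the locale's CTYPE category, and that encoding is assumed to be ASCII-compatible. I also added one rule that is not in the description above: a segment that decodes cleanly but already contains one of the marker codepoints is treated like an undecodable one, so the way back to bytes stays unambiguous.

```python
# Hypothetical sketch: per-segment path decoding with a PUA fallback.
PUA_BASE = 0xE000  # assumption: bytes 0x80..0xFF map to U+E080..U+E0FF


def _is_marker(ch: str) -> bool:
    """True if ch is one of the PUA codepoints reserved for raw bytes."""
    return PUA_BASE + 0x80 <= ord(ch) <= PUA_BASE + 0xFF


def decode_segment(segment: bytes, encoding: str) -> str:
    """Decode one path segment, falling back to the ASCII/PUA mapping."""
    try:
        text = segment.decode(encoding)
    except UnicodeDecodeError:
        text = None
    if text is None or any(_is_marker(ch) for ch in text):
        # Fallback: ASCII bytes stay as they are, every other byte is
        # mapped to its own PUA codepoint so that nothing is lost.
        return "".join(
            chr(b) if b < 0x80 else chr(PUA_BASE + b) for b in segment
        )
    return text


def encode_segment(segment: str, encoding: str) -> bytes:
    """Reverse decode_segment for a single path segment."""
    if not any(_is_marker(ch) for ch in segment):
        return segment.encode(encoding)
    out = bytearray()
    for ch in segment:
        cp = ord(ch)
        if cp < 0x80:
            out.append(cp)             # plain ASCII character
        elif _is_marker(ch):
            out.append(cp - PUA_BASE)  # restore the original byte
        else:
            raise ValueError("segment mixes PUA markers and other text")
    return bytes(out)


def decode_path(path: bytes, encoding: str) -> str:
    """Decode a POSIX byte path segment by segment."""
    return "/".join(decode_segment(s, encoding) for s in path.split(b"/"))


def encode_path(path: str, encoding: str) -> bytes:
    """Turn a decoded path back into the bytes to hand to the OS."""
    return b"/".join(encode_segment(s, encoding) for s in path.split("/"))
```

With this, a path such as b"/home/caf\xe9/notes.txt" (a Latin-1 name on a UTF-8 system) keeps its readable segments, only the "broken" segment carries a PUA codepoint, and encode_path() gives back exactly the original bytes. In real use the encoding argument would presumably come from something like locale.getpreferredencoding().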