[Python-Dev] Python-3.0, unicode, and os.environ

Ulrich Eckhardt eckhardt at satorlaser.com
Fri Dec 12 09:31:16 CET 2008


On Thursday 11 December 2008, Adam Olsen wrote:
> The simplest solution there is to have windows bytes APIs that return
> raw UTF-16 bytes (note that windows does NOT guaranteed to be valid
> unicode, despite being much more likely than on linux).

Actually, I'm not aware of this case. I only know that the OS refuses to mount 
media it can't decode, but that is on the OS-level. Can you give me a hint?

> The only real issue I see is that UTF-16 isn't an ASCII superset, so it
> won't print nicely.

True, but I personally couldn't care less. Actually, I would even prefer if 
printing a byte string always produced \x escaped byte values, that way it 
would at least be consistent. 

> In other words, bytes can be your special type.

That would actually be a lot of work to do, but I do agree that it would be a 
way. 

The problem though is that I have seen quite a few places in Python where such 
a byte string is passed as 'char*' and treated with the assumption that 
strlen() would yield a meaningful value there, so this calls at least for a 
distinct 'Py_Byte' type. Also, this still doesn't even remotely handle the 
problem that you do have two valid encodings on win32, even though the MBCS 
one could be called deprecated. People will try to interface to other 
libraries that use win32 CHAR strings and that will be much harder or even 
impossible. Further, and that is IMHO the worst part of it, things will fail 
too silently and programmers aren't encouraged to write portable code, but 
maybe I'm just too pessimistic.

Uli

-- 
Sator Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932

**************************************************************************************
           Visit our website at <http://www.satorlaser.de/>
**************************************************************************************
Diese E-Mail einschließlich sämtlicher Anhänge ist nur für den Adressaten bestimmt und kann vertrauliche Informationen enthalten. Bitte benachrichtigen Sie den Absender umgehend, falls Sie nicht der beabsichtigte Empfänger sein sollten. Die E-Mail ist in diesem Fall zu löschen und darf weder gelesen, weitergeleitet, veröffentlicht oder anderweitig benutzt werden.
E-Mails können durch Dritte gelesen werden und Viren sowie nichtautorisierte Änderungen enthalten. Sator Laser GmbH ist für diese Folgen nicht verantwortlich.

**************************************************************************************



More information about the Python-Dev mailing list