[Python-Dev] Python-3.0, unicode, and os.environ

Ulrich Eckhardt eckhardt at satorlaser.com
Tue Dec 9 19:31:29 CET 2008


On Monday 08 December 2008, Adam Olsen wrote:
> At this point someone suggests we have a type that can store an
> arbitrary mix of unicode and bytes, so the undecodable portions stay
> in their original form. :P

Well, not an arbitrary mix, but a type that just stores whatever comes from 
the system without further specifying it as either bytes or Unicode:

* If you want a string for displaying it, you first have to extract a string 
from that thing and there you optionally specify the encoding and error 
behaviour.
* If you want to append a string to it, it is automatically encoded in the 
default encoding, which obviously can fail.
* Similarly, e.g. globbing is done on the underlying representation's level, 
so "*.py" will first have to be converted according to the default encoding.
* If you just print it, you will get something that you can make out the 
decodable parts from, but it will probably be like "{Unicode:u'abcde'}" 
or "{bytes:b'ab\xf0\x0fcd'}".
* If you don't want to display it, but just want to pass it to the system, 
just use it as is.

Yes, this puts an inconvenience on application programmers that up to now 
always assumed that they received a list of strings from os.readdir(), but 
that's the way with false assumptions. In any case, they will be aware (from 
reading the docs) of what the problem is and why there is no way to return a 
text. Further, they will get tools to convert these paths or environment vars 
to texts, so it will be simply replacing "os.readdir()" 
with "map(to_unicode,os.readdir())".


Uli

-- 
Sator Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932

**************************************************************************************
           Visit our website at <http://www.satorlaser.de/>
**************************************************************************************
Diese E-Mail einschließlich sämtlicher Anhänge ist nur für den Adressaten bestimmt und kann vertrauliche Informationen enthalten. Bitte benachrichtigen Sie den Absender umgehend, falls Sie nicht der beabsichtigte Empfänger sein sollten. Die E-Mail ist in diesem Fall zu löschen und darf weder gelesen, weitergeleitet, veröffentlicht oder anderweitig benutzt werden.
E-Mails können durch Dritte gelesen werden und Viren sowie nichtautorisierte Änderungen enthalten. Sator Laser GmbH ist für diese Folgen nicht verantwortlich.

**************************************************************************************



More information about the Python-Dev mailing list