[Python-Dev] unifying str and unicode

Phillip J. Eby pje at telecommunity.com
Mon Oct 3 22:56:34 CEST 2005


At 10:38 PM 10/3/2005 +0200, Antoine Pitrou wrote:
>To which you apparently didn't read my answer, that is:
>you can never be sure that a variable containing something which
>is /semantically/ textual (*) will never contain anything other than
>ASCII text. For example raw_input() won't tell you that its 8-bit string
>result contains some chars > 0x7F. Same for many other library
>functions. How do you cope with (more or less occasional) non-ascii data
>coming in as 8-bit strings?

Presumably in Python 3.0, opening a file in "text" mode will require an 
encoding to be specified, and opening it in "binary" mode will cause it to 
produce or consume byte arrays, not strings.  This should apply to sockets 
too, and really any I/O facility, including GUI frameworks, DBAPI objects, 
os.listdir(), etc.
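
A minimal sketch of that text/binary split, assuming the behaviour of the built-in open() in later Python 3 releases, which this message only anticipates. There the encoding argument is optional rather than required, so it is passed explicitly here. The roundtrip helper and the sample string are hypothetical, chosen only for illustration.

```python
import os
import tempfile


def roundtrip(text, encoding="utf-8"):
    """Write text with an explicit encoding, then read it back in both modes."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Text mode: the encoding is stated up front, and str goes in.
        with open(path, "w", encoding=encoding) as f:
            f.write(text)
        # Text mode again: decoded str comes out.
        with open(path, "r", encoding=encoding) as f:
            as_text = f.read()
        # Binary mode: raw bytes come out, with no decoding applied.
        with open(path, "rb") as f:
            as_bytes = f.read()
    finally:
        os.remove(path)
    return as_text, as_bytes


as_text, as_bytes = roundtrip("caf\u00e9")
```

The same file yields a text string in one mode and a byte sequence in the other, so the caller always knows which kind of value it is holding.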

Of course, to get there we really need to add a convenient bytes type, 
perhaps by enhancing the current 'array' module.  It'd be nice to have a 
way to get this in 2.x versions so people can start fixing stuff to work 
the right way.  With no 8-bit strings coming in, there should be no 
unicode/str problems except those you create yourself.
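
As a rough illustration of the idea, the sketch below uses array('B') as a stand-in byte container and decodes it explicitly at the boundary. It assumes Python 3.2 or later, where array.tobytes() exists and mixing text with bytes raises TypeError. It is not the bytes type proposed above, and mixes_silently is a hypothetical helper.

```python
from array import array

# A byte container built on the existing array module.
# These five values are the UTF-8 encoding of "café".
raw = array("B", [0x63, 0x61, 0x66, 0xC3, 0xA9])

# Bytes become text only through an explicit decode with a named encoding.
text = raw.tobytes().decode("utf-8")


def mixes_silently(text_value, byte_value):
    """Return True if text and bytes can be concatenated without an error."""
    try:
        text_value + byte_value
    except TypeError:
        return False
    return True


mixed = mixes_silently(text, raw.tobytes())
```

Because the concatenation fails instead of guessing an encoding, any remaining text/bytes confusion shows up at the point where the two are mixed.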


