[Python-Dev] unifying str and unicode
Phillip J. Eby
pje at telecommunity.com
Mon Oct 3 22:56:34 CEST 2005
At 10:38 PM 10/3/2005 +0200, Antoine Pitrou wrote:
>To which you apparently didn't read my answer, that is:
>you can never be sure that a variable containing something which
>is /semantically/ textual (*) will never contain anything other than
>ASCII text. For example raw_input() won't tell you that its 8-bit string
>result contains some chars > 0x7F. Same for many other library
>functions. How do you cope with (more or less occasional) non-ascii data
>coming in as 8-bit strings?

Presumably in Python 3.0, opening a file in "text" mode will require an
encoding to be specified, and opening it in "binary" mode will cause it to
produce or consume byte arrays, not strings. This should apply to sockets
too, and really any I/O facility, including GUI frameworks, DBAPI objects,
os.listdir(), etc.
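
A minimal sketch of the text/binary split described above. It assumes Python 3 as it was later released, where the encoding argument is optional rather than required; it is passed explicitly here to match the proposal. The helper name roundtrip is hypothetical.

```python
import os
import tempfile


def roundtrip(text, encoding="utf-8"):
    """Write text with an explicit encoding, then read it back in both modes.

    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Text mode: the caller states the encoding; str goes in.
        with open(path, "w", encoding=encoding) as f:
            f.write(text)
        # Text mode again: str comes out, decoded with the same encoding.
        with open(path, "r", encoding=encoding) as f:
            as_text = f.read()
        # Binary mode: raw bytes come out, with no decoding applied.
        with open(path, "rb") as f:
            as_bytes = f.read()
    finally:
        os.remove(path)
    return as_text, as_bytes


as_text, as_bytes = roundtrip("caf\u00e9")
```

Here as_text is a str and as_bytes is a bytes object, so the two cannot be confused for one another at the point where the data enters the program.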

Of course, to get there we really need to add a convenient bytes type,
perhaps by enhancing the current 'array' module. It'd be nice to have a
way to get this in 2.x versions so people can start fixing stuff to work
the right way. With no 8-bit strings coming in, there should be no
unicode/str problems except those you create yourself.
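
As a rough illustration of "no 8-bit strings coming in", the sketch below decodes incoming bytes once, at the boundary, and works with text from then on. It assumes the bytes type of released Python 3, not the enhanced 'array' module suggested above; decode_input and the UTF-8 default are hypothetical choices for the example.

```python
def decode_input(raw, encoding="utf-8"):
    """Turn incoming bytes into text at the I/O boundary.

    Hypothetical helper: the encoding must be known by the caller.
    """
    if not isinstance(raw, bytes):
        raise TypeError("expected bytes at the I/O boundary")
    return raw.decode(encoding)


incoming = b"na\xc3\xafve"       # six bytes, as a socket or file might deliver
text = decode_input(incoming)    # five characters of text

# Mixing the two types is an error instead of a silent ASCII coercion.
try:
    "abc" + incoming
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

With this arrangement, non-ASCII data either decodes correctly or fails immediately at the boundary, rather than surfacing later as an implicit-conversion error deep inside the program.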