[Python-Dev] Unicode

Jack Jansen Jack.Jansen@oratrix.com
Mon, 29 Apr 2002 00:05:13 +0200

On Friday, April 26, 2002, at 06:26, Guido van Rossum wrote:

> No syntactic changes, no.  But the way we do things would become
> significantly different.  And think of binary I/O vs. textual I/O --
> currently, file.read() returns a string.  Code dealing with binary
> files will look significantly different, and old code won't work.

It could be argued that open(..., 'r').read() returns a text 
string and open(..., 'rb').read() returns a binary blob.

If textstrings and blobs become wholly different objects, this 
shouldn't create too many problems [see below], except for code 
that opens a file in binary mode and then reads (part of) the 
file expecting text. But that code would need revisiting anyway 
if the normal textstring became Unicode.
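
As a rough illustration of that split, here is a minimal sketch. It
assumes the str/bytes behaviour of present-day Python 3, where the two
modes do return unrelated types; the temporary file and its contents
are made up purely so the example is self-contained.

```python
import os
import tempfile

# Hypothetical sample file, created only so the sketch can run anywhere.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("caf\u00e9\n".encode("utf-8"))
    path = tmp.name

try:
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()   # text mode: a text string (str)
    with open(path, "rb") as f:
        blob = f.read()   # binary mode: a binary blob (bytes)
finally:
    os.remove(path)

# The two are unrelated types, so mixing them raises TypeError
# instead of silently guessing an encoding.
try:
    text + blob
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Code that reads in binary mode but expects text has to be revisited:
# it must decode explicitly.
decoded = blob.decode("utf-8")
```

The last line is the kind of change the "needs revisiting" code would
require: an explicit decode step where there used to be none.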

[here's below] To my surprise, I think that having blobs and 
textstrings be unrelated objects creates fewer problems than 
having the one be a subtype of the other. At least, every time I 
try to do the subtyping in my head I flip back and forth between 
textstrings-are-a-subtype-of-general-binary-buffers and 
binary-buffers-are-a-special-case-of-python-strings every couple 
of seconds. I think having them both be subtypes of a common 
base type (basestring) might work, but I'm not sure.
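
The common-base-type option could be sketched roughly as follows. All
three class names (BaseString, TextString, Blob) are hypothetical and
exist only for this illustration; this is one possible reading of the
idea, not a proposal for how it should actually be implemented.

```python
# All class names below are hypothetical; none is an existing Python type.

class BaseString:
    """Common base: holds only what text and blobs genuinely share."""

    def __init__(self, items):
        self._items = tuple(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]

    def __eq__(self, other):
        # A text string never compares equal to a blob.
        return type(self) is type(other) and self._items == other._items

    def __hash__(self):
        return hash((type(self).__name__, self._items))


class TextString(BaseString):
    """A sequence of characters; text-only operations live here."""

    def __init__(self, chars=""):
        super().__init__(str(chars))

    def upper(self):
        return TextString("".join(self._items).upper())

    def encode(self, encoding="utf-8"):
        return Blob("".join(self._items).encode(encoding))


class Blob(BaseString):
    """A sequence of raw bytes; it has no notion of characters or case."""

    def __init__(self, data=b""):
        super().__init__(bytes(data))

    def decode(self, encoding="utf-8"):
        return TextString(bytes(self._items).decode(encoding))
```

Neither class is a subtype of the other, so the flip-flopping question
does not arise; the only way from one to the other is an explicit
encode() or decode().
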
- Jack Jansen        <Jack.Jansen@oratrix.com>        http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -