[Python-Dev] unicodeobject.c,2.139,2.140 checkin

Jack Jansen Jack.Jansen@oratrix.com
Thu, 25 Apr 2002 23:40:51 +0200


On Thursday, April 25, 2002, at 08:59, Guido van Rossum wrote:

>> I don't know why it is, but Unicode always seems to unnecessarily
>> heat up any discussion involving it. I would really like to know
>> what is causing this: is it a religious issue, does it have to do
>> with the people involved or is Unicode inherently controversial ?
>
[...]
> Another issue is that adding Unicode was probably the most invasive
> set of changes ever made to the Python code base.  It has complicated
> many parts of the code, and added at least a proportional share of
> bugs.  (I found 166 source files in CVS containing some variation on
> the string "unicode", and 110 bug reports mentioning "unicode" in the
> SF bug tracker.)

Another thing that bothers me is that it retroactively changed
the interpretation of other Python objects. For me it's
perfectly logical that a character string is a character string,
unless there's a very good reason to treat it differently (a
framebuffer scanline, a binary blob, etc.). So if I have an
API OpenFileWithUnicodeName() that accepts a Unicode filename, I
expect that an 8-bit filename passed to it will be converted on
the fly. Other people focus on different sets of APIs, however,
and think there's nothing more logical than interpreting the
string object as a binary buffer containing UTF-16 values or
what-have-you.
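
To make the two readings concrete, here is a minimal sketch in
present-day Python. The helper names and the latin-1 default are my
own assumptions for illustration; they are not an existing API, and
OpenFileWithUnicodeName() above is itself only a hypothetical name.

```python
# Hypothetical helpers, for illustration only.

def coerce_filename(name, encoding="latin-1"):
    """Reading 1: an 8-bit string holds characters, so decode it on the fly.

    The latin-1 default is an assumption; a real API would have to
    choose or be told the encoding.
    """
    if isinstance(name, bytes):
        return name.decode(encoding)
    return name


def reinterpret_as_utf16(buf):
    """Reading 2: the same 8-bit string is a raw buffer of UTF-16 code units."""
    return buf.decode("utf-16-le")


raw = b"data"
as_chars = coerce_filename(raw)        # four characters: 'data'
as_buffer = reinterpret_as_utf16(raw)  # two code units, unrelated to 'data'
```

The same four bytes yield four characters under the first reading and
two unrelated characters under the second, which is why the two camps
cannot both get the conversion they find "logical".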

Scanlines and binary blobs hardly ever got mixed with filenames, so
there wasn't an issue before Unicode raised its pretty/ugly head.

(Of course it could be argued that Unicode has exposed a
design flaw in Python, namely that a single data type was used
to store both binary data of unknown interpretation and
character arrays, and that there's now little more to be done
about that.)
--
- Jack Jansen        <Jack.Jansen@oratrix.com>        http://www.cwi.nl/~jack -
- If I can't dance I don't want to be part of your revolution -- Emma Goldman -