[Python-Dev] Python 1.5.2 modules need porting to 2.0 because of unicode - comments please

M.-A. Lemburg mal@lemburg.com
Tue, 19 Sep 2000 12:34:40 +0200

Fredrik Lundh wrote:
> mal wrote:
> > > So in essence, I suggest that the Unicode object does not implement
> > > the buffer interface. If that has any undesirable consequences (which
> > > ones?), I suggest that 'binary write' operations (sockets, files)
> > > explicitly check for Unicode objects, and either reject them, or
> > > invoke the system encoding (i.e. ASCII).
> >
> > It's too late for any generic changes in the Unicode area.
>
> it's not too late to fix bugs.

I doubt that we can fix all Unicode-related bugs in the 2.0
stdlib before the final release... let's make this a project
for 2.1.

> > The right thing to do is to make the *tools* Unicode aware, since
> > you can't really expect the Unicode-string integration mechanism
> > to fiddle things right in every possible case out there.
>
> no, but people may expect Python to raise an exception instead
> of doing something that is not only non-portable, but also clearly
> wrong in most real-life cases.

I completely agree that the divergence between "s" and "s#"
is not ideal at all, but that's something the buffer interface
design has to fix (not the Unicode design) since this is a
general problem. AFAIK, no other object distinguishes
between getreadbuf and getcharbuf... this is why the problem
has never shown up before.
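
As a rough illustration of that divergence, here is a sketch in
present-day Python rather than the 2.0 C API. The names read_buffer
and char_buffer are hypothetical stand-ins for the getreadbuf and
getcharbuf slots, and the sketch assumes a little-endian UTF-16
internal representation and an ASCII default encoding:

```python
# Hypothetical stand-ins for the two buffer slots of a Unicode object.
# Assumptions: the internal representation is UTF-16 (little-endian here)
# and the default encoding is ASCII.

def read_buffer(text):
    """Model of getreadbuf: the raw internal representation."""
    return text.encode("utf-16-le")

def char_buffer(text):
    """Model of getcharbuf: the text in the default encoding."""
    return text.encode("ascii")

raw = read_buffer("abc")
chars = char_buffer("abc")

print(raw)    # b'a\x00b\x00c\x00'
print(chars)  # b'abc'
```

Code that expects character data but receives the first form ends up
with the interleaved NUL bytes, which is the kind of surprise being
discussed here.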

Grepping through the stdlib, there are lots of places where
"s#" is expected to work on raw data and others where
conversion to string would be more appropriate, so the one
true solution is not clear at all.

Here are some possible hacks to work around the Unicode problem:

1. switch off getreadbuf slot

   This would break many I/O calls with respect to Unicode support.

2. make getreadbuf return the same as getcharbuf (i.e. ASCII data)

   This could work, but would break slicing and indexing
   with, e.g., a UTF-8 default encoding.

3. leave things as they are implemented now and live with the
   consequences (mark the Python stdlib as not Unicode compatible)

   Not ideal, but leaves room for discussion.
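
To make the slicing concern under option 2 concrete, here is a small
sketch in present-day Python. It assumes a UTF-8 default encoding, and
the sample string is only an illustration:

```python
# Assumption: the default encoding is UTF-8, as in the example for option 2.
text = "h\u00e9llo"              # 5 characters, one of them non-ASCII
encoded = text.encode("utf-8")   # what a char-buffer view would expose

# The encoded buffer is longer than the text, so offsets no longer line up.
text_length = len(text)          # 5
buffer_length = len(encoded)     # 6

# Index 1 is a whole character in the text, but only the first byte of a
# two-byte sequence in the encoded buffer.
char_at_1 = text[1]              # U+00E9
byte_at_1 = encoded[1:2]         # b'\xc3'

print(text_length, buffer_length, repr(byte_at_1))
```

An offset computed on the Unicode object therefore cannot be applied
to the encoded buffer without cutting a character in half.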

Marc-Andre Lemburg
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/