[I18n-sig] Re: Unicode debate

M.-A. Lemburg mal@lemburg.com
Mon, 01 May 2000 21:48:51 +0200

Guido van Rossum wrote:
> > Here's a list of what I've found by running some of the
> > regression tests:
> >
> > * import string fails due to the way _idtable is constructed
> Hm, I don't see this -- string.py imports just fine.  There's no
> _idtable in my copy of string.py?!?!

Ehm, I meant _idmap... I would guess that your string.py
imports fine because the import still uses a cached PYC file.
(This is why I updated the -U patch to modify the magic number
for imports when the flag is set: when running in -U mode,
only PYC files that were also compiled with -U are used, and
when running without -U no such files are accepted. That makes
testing a little easier since it doesn't interfere with
existing implementations.)
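
To illustrate the magic number idea, here is a minimal sketch in present-day Python. It is not the actual patch: BASE_MAGIC is a placeholder value, and both the "base + 1" rule for -U mode and the helper names expected_magic() and pyc_is_usable() are assumptions made up for this example.

```python
import struct

# Placeholder value for illustration; the real magic number is defined
# by the interpreter and changes between releases.
BASE_MAGIC = 50428


def expected_magic(unicode_mode: bool) -> bytes:
    """Return the 4-byte magic the importer would expect in this mode.

    Assumption: -U mode simply uses base + 1 so the two modes never match.
    """
    value = BASE_MAGIC + 1 if unicode_mode else BASE_MAGIC
    # 16-bit little-endian number followed by "\r\n", as in a PYC header.
    return struct.pack("<H", value) + b"\r\n"


def pyc_is_usable(header: bytes, unicode_mode: bool) -> bool:
    """Accept a cached file only if it was compiled in the same mode."""
    return header[:4] == expected_magic(unicode_mode)


# A file compiled without -U is ignored when running with -U, and vice versa.
plain_header = expected_magic(False) + b"...rest of file..."
unicode_header = expected_magic(True) + b"...rest of file..."
```

With this scheme a stale PYC compiled in the other mode is simply treated as out of date and the source gets recompiled.
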
> > * getattr() doesn't like Unicode as second argument, same for
> >   delattr() and hasattr()
> > * eval() expects a string object
> These should all be fixed.
> > * there still are some string exceptions around in the regr.
> >   tests which cause a failure (Unicode exceptions don't work)
> Interesting.  One more reason to drop string exceptions sometime in
> the future.
> > * struct.pack('s') doesn't like Unicode as argument
> Fix it.
> > * re doesn't work: pcre_expand() needs a string object
> Fix it, but with low priority (the expectation is that sre will replace
> pcre in 1.6a3).

> > * regex doesn't work either because string objects are hard-coded
> Don't fix (regex is obsolete, only kept around because it used to be
> very common).
> > * mmap doesn't like Unicode: "mmap assignment must be
> >   single-character string"
> Yes, this has 8-bit string written all over it.  It really should be
> using the buffer API rather than requiring strings!
> > * cPickle.loads() doesn't like Unicode as data storage
> Hm, hard to fix.  Again, it really should use the buffer API, but it doesn't.

Note that this "bug" only occurs when using strings as
data storage... the test code should really be using
a buffer object for this (or some other sort of binary
data container).
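
As a rough sketch of what "using a binary data container" looks like, here is the same idea with the pickle module of present-day Python 3 (not the 1.6 cPickle discussed above, which behaves differently). The helper name load_from_buffer() is made up for the example.

```python
import pickle

# Pickled data is binary; dumps() returns a bytes object.
payload = pickle.dumps({"a": 4, "b": 5})


def load_from_buffer(data):
    """Unpickle from any bytes-like container (bytes, bytearray, ...)."""
    return pickle.loads(data)


# A bytearray works as the data storage just as well as bytes.
restored = load_from_buffer(bytearray(payload))

# A text string is not accepted as storage for pickled data.
try:
    pickle.loads(payload.decode("latin-1"))
except TypeError:
    text_rejected = True
else:
    text_rejected = False
```
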
> > * keywords must be strings (f(1, 2, 3, **{'a':4, 'b':5}) doesn't work)
> How hard would this be to fix?

Not sure... the keyword code is spread across many files.
> > * rotor doesn't work
> Not very important.
> > Some of these could be fixed by putting a str() call around
> > the '...' constants. Others need fixes in C code. Yet others
> > would be better off if they used the buffer interfaces (basically
> > all APIs which work on raw data like cPickle or rotor).
> What I said. :-)

Should we go ahead with this for the 1.6 series or wait until 1.7?

Marc-Andre Lemburg
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/