[Python-Dev] Python 1.5.2 modules need porting to 2.0 because of unicode - comments please

M.-A. Lemburg mal@lemburg.com
Tue, 19 Sep 2000 11:13:13 +0200

Barry Scott wrote:
> > But regardless of where Barry's Unicode objects come from, his point
> > remains open.  Do we consider the library's lack of Unicode awareness a
> > bug, or do we drop any pretence of string and unicode objects being
> > interchangeable?

Python's stdlib is *not* Unicode ready. This should be seen as a project
for 2.1.

> > As a related issue, do we consider the fact that str(unicode_ob) often
> > fails a problem?  The users on c.l.py appear to...

It will only fail if the Unicode object contains characters that the
default encoding (ASCII, unless site.py changes it) cannot represent.
If users want to use a different encoding when converting between
Unicode and strings, they should call .encode explicitly, possibly
through a helper function.
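A minimal sketch of such a helper, written for modern Python 3, where `bytes` plays the role of 2.0's 8-bit strings and `str` that of Unicode objects. The name `to_bytes` and its defaults are hypothetical, not part of any library; the point is only that the encoding is named at the call site instead of being picked up implicitly.

```python
def to_bytes(text, encoding="utf-8", errors="strict"):
    """Hypothetical helper: encode text with an explicitly chosen codec.

    Byte strings pass through unchanged; text is encoded with
    `encoding`, so a character the codec cannot represent raises
    UnicodeEncodeError (with errors="strict") right at the call site.
    """
    if isinstance(text, bytes):
        return text
    return text.encode(encoding, errors)


# An ASCII codec rejects non-ASCII text, which is the
# str(unicode_ob) failure mode discussed above.
try:
    to_bytes("caf\u00e9", "ascii")
except UnicodeEncodeError as exc:
    print("ascii failed:", exc.reason)

print(to_bytes("caf\u00e9", "latin-1"))   # b'caf\xe9'
print(to_bytes("caf\u00e9", "utf-8"))     # b'caf\xc3\xa9'
```

Passing `errors="replace"` instead of the default `"strict"` trades the exception for lossy output, which is a choice the caller should make deliberately rather than inherit.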

> > Mark.
> Exactly.
> I want unicode from Mark's code, unicode is goodness.
> But the principle of least astonishment may well be broken in the library,
> indeed in the language.
> It took me 40 minutes to prove that the unicode came from Mark's code and
> I know the code involved intimately. Debugging these failures is tedious.

To debug these things, simply switch off Unicode-to-string conversion
by editing site.py (see the comments at the end of the module).
Any attempted conversion will then raise an exception.
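In 2.0 the site.py switch works by setting the default encoding to the `undefined` codec, which refuses every conversion. `sys.setdefaultencoding` no longer exists in Python 3, but the `undefined` codec still ships with it, so the behaviour the switch relies on can be checked directly. The wrapper `attempt` below is a hypothetical name used only for this illustration.

```python
def attempt(convert, *args):
    """Hypothetical wrapper: return convert(*args), or the UnicodeError it raised."""
    try:
        return convert(*args)
    except UnicodeError as exc:
        return exc


# With a real codec, both directions succeed for ASCII data.
print(attempt(str.encode, "sx", "ascii"))        # b'sx'
print(attempt(bytes.decode, b"sx", "ascii"))     # sx

# The "undefined" codec raises UnicodeError for any input, in both
# directions, so every conversion routed through it becomes an exception.
print(repr(attempt(str.encode, "sx", "undefined")))
print(repr(attempt(bytes.decode, b"sx", "undefined")))
```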

> I don't have an opinion as to the best resolution yet.
> One option would be for Mark's code to default to string. But that does not
> help once someone chooses to enable unicode in Mark's code.
> Maybe '%s' % u'x' should return 'x', not u'x', and u'%s' % 's' should return u's'.
> Maybe 's' + u'x' should return 'sx', not u'sx', and u's' + 'x' should return u'sx'.
> The above two maybes would have hidden the problem in my code, barring exceptions.

When designing the Unicode-string integration we decided to
use the same coercion rules as for numbers: always coerce to the
"bigger" type. Anything else would have caused even more confusion.
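The number analogy, and a hypothetical model of the string rule, can be sketched in Python 3 terms: `bytes` stands in for 2.0's 8-bit strings, `str` for Unicode objects, and `coerce_concat` is an illustrative name, not a real API. Mixed operands are widened to the Unicode type by decoding the 8-bit side with the default encoding (assumed ASCII here), which is also where a non-ASCII 8-bit string makes the operation fail.

```python
DEFAULT_ENCODING = "ascii"  # assumed: 2.0's default unless site.py changes it


def coerce_concat(a, b, encoding=DEFAULT_ENCODING):
    """Hypothetical model of 2.0's 8-bit/Unicode concatenation rule."""
    if isinstance(a, bytes) and isinstance(b, bytes):
        return a + b  # both 8-bit: no coercion needed
    # Mixed operands: widen the 8-bit side to Unicode, the "bigger" type.
    if isinstance(a, bytes):
        a = a.decode(encoding)
    if isinstance(b, bytes):
        b = b.decode(encoding)
    return a + b


# Numbers follow the same idea: int + float widens to float.
print(type(1 + 2.0).__name__)            # float
print(repr(coerce_concat(b"s", "x")))    # 'sx'  (like 's' + u'x' -> u'sx')
print(repr(coerce_concat("s", b"x")))    # 'sx'  (like u's' + 'x' -> u'sx')
print(repr(coerce_concat(b"s", b"x")))   # b'sx'
```

Coercing downward instead, as in the maybes quoted above, would have to drop or guess at characters outside the 8-bit range, whereas widening is lossless whenever the decode succeeds.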

Again, what needs to be done is to make the tools Unicode aware,
not the magic ;-)

Marc-Andre Lemburg
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/