[Python-Dev] Python 1.5.2 modules need porting to 2.0 because of unicode - comments please

Mark Hammond MarkH@ActiveState.com
Tue, 19 Sep 2000 10:18:18 +1100


[Guido]

> Barry, I'm unclear on what exactly is happening.  Where does the
> Unicode come from?  You implied that your code worked under 1.5.2,
> which doesn't support Unicode.  How can code that works under 1.5.2
> suddenly start producing Unicode strings?  Unless you're now applying
> the existing code to new (Unicode) input data -- in which case, yes,
> we expect that fixes are sometimes needed.

My guess is that the Unicode strings are coming from COM.  In 1.5, we used
the Win32-specific Unicode object, and win32com did lots of explicit
str()s - the user of the end object usually saw real Python strings.

For 1.6 and later, I changed this, so that real Python Unicode objects are
used and returned instead of the strings.  I figured this would be a good
test for Unicode integration, as Unicode and strings are ultimately
supposed to be interchangeable ;-)

win32com.client.__init__ starts with:

NeedUnicodeConversions = not hasattr(__builtin__, "unicode")

This forces the flag "true" under 1.5, and false otherwise.  Barry can force it
to "true", and win32com will always force a str() over all Unicode objects.

However, this will _still_ break in a few cases (and I have had some
reported).  str() of a Unicode object can often raise that ugly "char out
of range" error.  As Barry notes, the code would have to change to do an
"encode('mbcs')" to be safe anyway...
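
A rough illustration of the difference, as a sketch only.  The helper
to_bytes and the UTF-8 fallback are assumptions added here (the "mbcs"
codec exists only on Windows); the explicit ASCII encode stands in for the
implicit conversion that str() performs on a Unicode object.

```python
import codecs


def to_bytes(text, preferred="mbcs", fallback="utf-8"):
    """Hypothetical helper: encode explicitly instead of relying on str()."""
    try:
        codecs.lookup(preferred)
    except LookupError:
        # "mbcs" is unavailable off Windows; the fallback is an assumption.
        preferred = fallback
    return text.encode(preferred)


sample = u"caf\xe9"  # contains one character outside the 7-bit ASCII range

# An ASCII-only conversion fails on the non-ASCII character.
try:
    sample.encode("ascii")
    ascii_ok = True
except UnicodeError:
    ascii_ok = False

# An explicit encode with a codec that covers the character succeeds.
encoded = to_bytes(sample)
```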

But regardless of where Barry's Unicode objects come from, his point
remains open.  Do we consider the library's lack of Unicode awareness a
bug, or do we drop any pretence of string and unicode objects being
interchangeable?

As a related issue, do we consider it a problem that str(unicode_ob) often
fails?  The users on c.l.py appear to...

Mark.