[IronPython] x = unicode(someExtendedUnicodeString) fails.

Vernon Cole vernondcole at gmail.com
Thu Dec 17 20:05:31 CET 2009

I just tripped over this one and it took some time to figure out what in
blazes was going on. You may want to watch for it when porting CPython code.

I was cleaning up an input argument using
     s = unicode(S.strip().upper())
where S is the argument supplying the value I need to convert.

When I handed the function a genuine unicode string, such as in:
     assert Roman(u'\u217b') == 12  # Unicode Roman numeral 'xii' as a single character
IronPython complains with:
    UnicodeEncodeError: ('unknown', '\x00', 0, 1, '')

The Python manual says:

> If no optional parameters are given, unicode() will mimic the behaviour of
> str() except that it returns Unicode strings instead of 8-bit strings.
> More precisely, if *object* is a Unicode string or subclass it will return
> that Unicode string without any additional decoding applied.

It turns out that this was already reported on CodePlex as:
but the reporter did not realize that he had found an incompatibility
with documented behavior.
It has been sitting on the back burner for some time.

Others may want to join me in voting this up. Meanwhile I will add an
exception handler to my own code that should not be necessary.
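For anyone hitting the same thing, here is a minimal sketch of a workaround. It assumes, as in the example above, that the failure only shows up when the argument is already a unicode object, so it skips the unicode() call in that case instead of catching the exception. The names `clean_arg` and `text_type` are hypothetical, and the `NameError` fallback is only there so the snippet also runs on Python 3, where `unicode` became `str`.

```python
# Pick the Unicode text type: `unicode` on Python 2 / IronPython 2.x,
# `str` on Python 3 (where the name `unicode` no longer exists).
try:
    text_type = unicode  # noqa: F821
except NameError:
    text_type = str


def clean_arg(S):
    """Strip and upper-case S, returning a Unicode string.

    Hypothetical helper: if the cleaned value is already Unicode it is
    returned unchanged, so unicode() is never called on a Unicode
    string (the call that fails in the report above).
    """
    s = S.strip().upper()
    if isinstance(s, text_type):
        # Already Unicode: the manual says unicode() would return it as-is anyway.
        return s
    # Byte string (Python 2 only): convert with the default decoding.
    return text_type(s)
```

With this, `clean_arg(u'\u217b')` hands back the upper-cased Unicode string without going through unicode() at all.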
Vernon Cole

More information about the Ironpython-users mailing list