[IronPython] Django, __unicode__, and #20366

Dino Viehland dinov at microsoft.com
Thu Feb 11 20:06:58 CET 2010


Vernon wrote:
> You need the 'byte' class for Python 3 anyway. Implement it now.

Done!  Assuming you mean bytes, it's in 2.6 already.  Now if everyone would upgrade their code to use b'' :)

> A small sample...
>
> <code x.py>
> import sys
> u = u'1234\u00f6'
> s = '1234'
> x = str(s)
> print type(x), repr(x)
> x = unicode(s)
> print type(x), repr(x)
> try:
>    x = unicode(u)
>    print type(x), repr(x)
> except:
>    print 'Error=',sys.exc_info()[0]
> try:
>    x = str(u)
>    print type(x), repr(x)
> except:
>    print 'Error=',sys.exc_info()[0]
> </code>
> --------------------
>
> The results...
>
> >c:\python26\python.exe x.py
> <type 'str'> '1234'
> <type 'unicode'> u'1234'
> <type 'unicode'> u'1234\xf6'
> Error= <type 'exceptions.UnicodeEncodeError'>
>
> >"c:\program files\Ironpython 2.6\ipy.exe" x.py
> <type 'str'> '1234'
> <type 'str'> '1234'
> Error= <type 'exceptions.UnicodeDecodeError'>
> Error= <type 'exceptions.UnicodeDecodeError'>
>
> >copy x.py x3.py
> >2to3 -w x3.py
> >c:\python31\python.exe x3.py
> <class 'str'> '1234'
> <class 'str'> '1234'
> <class 'str'> '1234ö'
> <class 'str'> '1234ö'
> ------------------------------
> One would think that IronPython should produce the same output as Python 3 -- since 'str' and 'unicode' are the same thing in both dialects. In particular, the exception when 'converting' unicode to unicode is just plain wrong.


I'm not going to argue the exception isn't wrong.  But saying IronPython should output the same thing as an entirely different script isn't right either.  After running 2to3 the script looks like this for me:

import sys
u = '1234\u00f6'
s = '1234'
x = str(s)
print(type(x), repr(x))
x = str(s)
print(type(x), repr(x))
try:
    x = str(u)
    print(type(x), repr(x))
except:
    print('Error=',sys.exc_info()[0])
try:
    x = str(u)
    print(type(x), repr(x))
except:
    print('Error=',sys.exc_info()[0])

You can argue whether or not 2to3 did the right thing here - it has completely dropped the distinction between str and unicode.  In reality, if this was a script written for Python 2.5 and above, your usage of str here is ambiguous.  If this script was written for 2.6 and above, then it's clear you want strings and not bytes, because you'd have used bytes/bytearray/b'' to indicate bytes.  The problem is there's still lots of code which runs on 2.5+ and won't be using bytes/bytearray/b'' but really is dealing with bytes and not strings.
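
For illustration only, here is a minimal sketch of what unambiguous 2.6+ code might look like.  It is not from the original script: the helper names to_text and to_bytes are hypothetical, the UTF-8 default is an assumption, and it is written against CPython 2.6+/3.3+ rather than IronPython.

```python
# Sketch only: helper names and the UTF-8 default are assumptions,
# not part of the script discussed in this thread.

raw = b'1234'            # explicitly bytes (an alias for str on CPython 2.6+)
text = u'1234\u00f6'     # explicitly text


def to_text(value, encoding='utf-8'):
    """Return text, decoding only when given bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_bytes(value, encoding='utf-8'):
    """Return bytes, encoding only when given text."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)


print(repr(to_text(raw)))
print(repr(to_bytes(text)))
```

Because every conversion names its direction and encoding, neither 2to3 nor an implementation has to guess whether a given str was meant as bytes or as text.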

The fact is this is going to be broken unless we were to make str be a distinct type from unicode - then there'd be no ambiguity and we wouldn't have to guess.  But that's a massive change which propagates through the entire IronPython code base and involves tons of breaking changes.  I've looked at doing this before: it spreads everywhere and there's lots of new ugliness.  We could look at doing it again, but making that massive change and then switching to 3k and changing it all back doesn't seem very productive.




