[IronPython] unicode bug?

Dino Viehland dinov at exchange.microsoft.com
Mon May 1 18:34:39 CEST 2006


The problem, as you correctly pointed out, is that we don't actually have separate data types for ASCII & Unicode strings - we only have Unicode strings.

Therefore, when you ask for a Unicode string from an ASCII string, we have to do some magic to figure out which case we're in: either you're using your string as a byte array that holds the encoded bytes of a Unicode string (and we need to decode it), or you're using it as a Unicode string already (and we need to hand the original string back to you).
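To make the ambiguity concrete, here is a minimal sketch of the kind of guess a runtime with a single string type has to make. It is written in Python 3 purely for illustration; the function name to_unicode and the heuristic itself are assumptions, not IronPython's actual implementation.

```python
def to_unicode(s, encoding="ascii"):
    """Hypothetical stand-in for unicode(s) when only one string type exists."""
    if all(ord(ch) < 128 for ch in s):
        # Pure ASCII reads the same as bytes or as text: return it unchanged.
        return s
    if any(ord(ch) > 255 for ch in s):
        # A character that cannot fit in a byte means this is already text.
        return s
    # Ambiguous case: every character fits in a byte, so the string could be
    # either a byte array or text. This sketch guesses "byte array" and decodes.
    raw = bytes(ord(ch) for ch in s)
    return raw.decode(encoding)
```

Under this assumed heuristic, a string such as "\xb2" falls into the ambiguous branch, gets treated as the byte B2, and fails to decode as ASCII with a UnicodeDecodeError, which is the same shape of failure as in the report.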

I agree w/ you that the error message could be better - my guess would be that the "specified code page" here is ASCII but that really is just a guess.  And if I'm guessing I'm betting most other people won't know what's going on either :).




-----Original Message-----
From: users-bounces at lists.ironpython.com [mailto:users-bounces at lists.ironpython.com] On Behalf Of J. Merrill
Sent: Monday, May 01, 2006 9:15 AM
To: Discussion of IronPython
Subject: Re: [IronPython] unicode bug?

(This presumes that IronPython has separate string and unicode types, like CPython does.  If that's not the case, well, "never mind...")

Shouldn't it be the case that calling   typename(value)   does as little work as possible if the value is already of the specified type?  That is, it would be a shame if
    i = 5
    j = int(i)
did a lot of work to ensure that i is a valid int (within range of 32-bit integer etc).  So the sample code should not be testing to see if the already-unicode-string can be converted to a unicode string -- should it?  (That doesn't mean that there's no problem that could be demonstrated by slightly different code, like   u = unicode (Sq2 + 'hello')   for example.)
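As a rough sketch of the "do as little work as possible" idea, assuming a runtime that does distinguish text from bytes (as_text is a hypothetical helper written in Python 3, not a real IronPython or CPython function):

```python
def as_text(value, encoding="ascii"):
    """Hypothetical converter that skips all work when value is already text."""
    if isinstance(value, str):
        # Already the target type: return the very same object, no validation.
        return value
    if isinstance(value, bytes):
        # Only real byte data goes through the decoder.
        return value.decode(encoding)
    # Anything else falls back to its ordinary string form.
    return str(value)
```

With this shape, a value like "\xb2" that is already text is returned as-is and never reaches the decoder, while byte data that is not valid in the given encoding would still raise an error.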

Shouldn't the error message say "from code page XXX to unicode" rather than saying "from specified code page to unicode"?  How else to know (without a lot of investigation) what code page was "specified"?
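A sketch of what a more specific message could look like, in Python 3. The helper name decode_naming_code_page and the exact wording are made up for illustration; only bytes.decode and the start attribute of UnicodeDecodeError are standard.

```python
def decode_naming_code_page(raw, encoding="ascii"):
    """Hypothetical decoder whose error message names the code page it used."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte that could not be decoded.
        message = (
            "Unable to translate bytes [%02X] at index %d "
            "from code page %s to Unicode" % (raw[exc.start], exc.start, encoding)
        )
        raise ValueError(message) from exc
```

The only change from the reported message is that the name of the encoding replaces the word "specified", so the reader does not have to investigate which code page was in effect.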

At 11:34 AM 5/1/2006, Dino Viehland wrote (in part)
>Thanks for the bug report, I've got it filed in our bug database.
>
>I'm thinking we should be able to get to this one for beta 7 if it doesn't end up being too complex (Unicode can always be trickier than you initially expect).
>
>-----Original Message-----
>From: users-bounces at lists.ironpython.com [mailto:users-bounces at lists.ironpython.com] On Behalf Of Cheemeng
>Sent: Sunday, April 30, 2006 2:59 AM
>To: users at lists.ironpython.com
>Subject: [IronPython] unicode bug?
>
>hi,
>
>Sq2 = u'\xb2'
>u = unicode(Sq2)
>print u is Sq2
>
>in CPython, the unicode function returns back the same str,
>in IP, an exception is thrown,
>UnicodeDecodeError: Unable to translate bytes [B2] at index 0 from
>specified code page to Unicode.
>
>regards,
>cheemeng


J. Merrill / Analytical Software Corp

