[IronPython] Problems with 8-bit strings
Patrick Dubroy
pdubroy at gmail.com
Wed Nov 21 21:18:30 CET 2007
Hi,
I've noticed some weird behaviour with 8-bit strings in the latest
version of IronPython (2.0A6):
IronPython console: IronPython 2.0A6 (2.0.11102.00) on .NET 2.0.50727.1378
Copyright (c) Microsoft Corporation. All rights reserved.
>>> str("\x7e")
'~'
>>> str("\x7f")
u'\x7f'
>>> str("\x80")
u'\x80'
>>> str("\x81")
Traceback (most recent call last):
File , line 0, in ##23
File mscorlib, line unknown, in GetString
File mscorlib, line unknown, in GetChars
File mscorlib, line unknown, in Fallback
File mscorlib, line unknown, in Throw
UnicodeDecodeError: Unable to translate bytes [81] at index 0 from
specified code page to Unicode.
The first problem is that if the string contains character 127 (0x7F)
or 128 (0x80), str() returns a Unicode string rather than an 8-bit
string. CPython returns a standard 8-bit string in both of those
cases. The second problem is that if the string contains any byte
greater than 128 (0x81 and up), str() throws a UnicodeDecodeError,
whereas CPython is happy to have bytes up to 0xFF in an 8-bit string.
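For comparison, the CPython side of this can be checked with a small
script along the following lines. It is only a sketch: the helper name
describe_bytes is made up for illustration, and it builds the strings
with chr() instead of literals, which is assumed (not verified) to go
through the same str() conversion as the console session above.

```python
def describe_bytes(codes):
    """Return (code, type name, length, ordinal) for str(chr(code))."""
    rows = []
    for code in codes:
        # chr() accepts values 0..255 on CPython; building the string this
        # way instead of with a literal is an assumption of this sketch.
        s = str(chr(code))
        rows.append((code, type(s).__name__, len(s), ord(s)))
    return rows


rows = describe_bytes([0x7e, 0x7f, 0x80, 0x81, 0xff])
for code, type_name, length, ordinal in rows:
    print("0x%02x -> %s, length %d, ordinal 0x%02x"
          % (code, type_name, length, ordinal))
```

On CPython every row should report type str, length 1, and an ordinal
equal to the input value. A row with a different type, or an exception
partway through the loop, would match the behaviour reported above.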
Is this a known issue? Should I open a bug?
Pat