str() should convert ANY object to a string without EXCEPTIONS !

Lie Lie.1296 at gmail.com
Sun Sep 28 11:49:11 CEST 2008


On Sep 28, 12:37 pm, est <electronix... at gmail.com> wrote:
> From python manual
>
> str( [object])
>
> Return a string containing a nicely printable representation of an
> object. For strings, this returns the string itself. The difference
> with repr(object) is that str(object) does not always attempt to
> return a string that is acceptable to eval(); its goal is to return a
> printable string. If no argument is given, returns the empty string,
> ''.
>
> now we try this under windows:
>
> >>> str(u'\ue863')
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\ue863' in
> position 0: ordinal not in range(128)
>
> FAIL.

And it is correct to fail. ASCII is only defined within range(128);
the rest (i.e. range(128, 256)) is not defined in ASCII. The values in
range(128, 256) are extension slots with many conflicting meanings.
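
To see the conflict concretely, here is a small sketch that decodes one
byte above 127 with three 8-bit codecs. The codec choice (latin-1,
cp1251, koi8_r) is mine, picked only for illustration, and the snippet
assumes Python 2.6+ or 3.x rather than the older interpreters quoted in
this thread:

```python
# One byte above 127, decoded with three different 8-bit codecs.
raw = b'\xe9'

decoded = {}
for codec in ('latin-1', 'cp1251', 'koi8_r'):
    decoded[codec] = raw.decode(codec)

# Each codec assigns a different character to the same byte.
distinct = len(set(decoded.values()))

# The ASCII codec refuses the byte instead of picking one meaning.
try:
    raw.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
```

Here distinct should come out as 3 and ascii_ok as False: three codecs,
three different characters, and ASCII declines to choose among them.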

>
> also almighty Linux
>
> Python 2.3.4 (#1, Feb  6 2006, 10:38:46)
> [GCC 3.4.5 20051201 (Red Hat 3.4.5-2)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> str(u'\ue863')
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'ascii' codec can't encode character u'\ue863' in
> position 0: ordinal not in range(128)
>
> Python 2.4.4 (#2, Apr  5 2007, 20:11:18)
> [GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> str(u'\ue863')
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'ascii' codec can't encode character u'\ue863' in
> position 0: ordinal not in range(128)
>
> Python 2.5 (release25-maint, Jul 20 2008, 20:47:25)
> [GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> str(u'\ue863')
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\ue863' in
> position 0: ordinal not in range(128)

If str() had returned anything but an error here, I'd file a bug
report.

> The problem is, why the f**k set ASCII encoding to range(128) ????????
> while str() is internally byte array it should be handled in
> range(256) !!!!!!!!!!

A string is a byte array, but Unicode and ASCII are NOT. A Unicode
string is a character array whose code points are defined over
range(0x110000). How many bytes each character takes depends on the
encoding: one to four in utf-8, two or four in utf-16, always four in
utf-32. An ASCII string is a character array defined over range(128).
Other than the Unicode encodings (utf-8, utf-16, and utf-32) and ASCII,
there are many other encodings ('EBCDIC', 'iso-8859-1', ...,
'iso-8859-16', 'KOI8', 'GB18030', 'Shift-JIS', etc, etc, etc), each
with conflicting byte-to-character mappings. Fortunately, most of
these encodings do share a common ground: ASCII.
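
The "common ground" claim can be checked with a short sketch. The list
of codecs below is my own selection (all ship with CPython as far as I
know), and the snippet assumes Python 2.6+ or 3.x:

```python
text = u'Python'   # pure-ASCII text

# Codecs that extend ASCII produce identical bytes for ASCII-only text.
codecs_to_try = ('ascii', 'latin-1', 'utf-8', 'cp1252',
                 'koi8_r', 'shift_jis', 'gb18030')
encoded = [text.encode(codec) for codec in codecs_to_try]
all_same = all(e == b'Python' for e in encoded)

# Outside ASCII the codecs part ways: same character, different bytes.
latin1_bytes = u'\xe9'.encode('latin-1')   # one byte
utf8_bytes = u'\xe9'.encode('utf-8')       # two bytes
```

For ASCII-only text the choice of codec makes no difference in this
list, which is why ASCII is the one safe default; as soon as a character
above 127 appears, the byte sequences diverge.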

Actually, if a strictly stupid str() received a Unicode string (i.e. a
character array), it would return something like <unicode s at
0x423549af813e4954>. It doesn't; str() is smarter than that and tries
to convert whatever fits into ASCII, i.e. characters lower than 128.
Why ASCII? Because the characters in range(128, 256) vary widely
between encodings and str() doesn't know which encoding you want, so
if you don't tell it which one to use, it will not guess (Python Zen:
In the face of ambiguity, refuse the temptation to guess).

Converting a character array (Unicode) into a byte string is done by
specifying which codec you want to use. str() tries to convert your
character array to a byte string using the ASCII codec;
s.encode(codec) converts it using the codec you name.
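
A minimal sketch of the difference, using the character from the
original report. I use utf-8 here instead of 'mbcs' only because 'mbcs'
exists on Windows alone; the error-handler arguments are standard
options of encode(). Assumes Python 2.6+ or 3.x:

```python
ch = u'\ue863'   # the private-use character from the report

# Explicit codec: succeeds and yields a byte string.
utf8_bytes = ch.encode('utf-8')

# ASCII codec: the same failure that Python 2's str() reports.
try:
    ch.encode('ascii')
    failed = False
except UnicodeEncodeError:
    failed = True

# Error handlers trade the exception for a lossy or escaped result.
replaced = ch.encode('ascii', 'replace')
escaped = ch.encode('ascii', 'backslashreplace')
```

The 'replace' and 'backslashreplace' handlers never raise, but they do
not preserve the character either, so they suit logging and debugging
output more than data you need to read back.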

> http://bugs.python.org/issue3648
>
> One possible solution(Windows Only)
>
> >>> str(u'\ue863'.encode('mbcs'))
> '\xfe\x9f'

Actually str() is not needed; you only need: u'\ue863'.encode('mbcs')

> >>> print u'\ue863'.encode('mbcs')
>>
> I now spending 60% of my developing time dealing with ASCII range(128)
> errors. It was PAIN!!!!!!

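Despair not, there is a quick hack:
# but only use it as a temporary solution, FIX YOUR CODE PROPERLY
str_ = str  # keep a reference to the built-in
# encode only unicode objects; everything else goes to the real str()
str = lambda s='': s.encode('mbcs') if isinstance(s, unicode) else str_(s)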
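
One way to fix the code properly, without shadowing the built-in, is a
small explicit helper. This is only a sketch: the name to_bytes and the
utf-8 default are hypothetical choices of mine, not anything from the
standard library, and it assumes Python 2.6+ or 3.x:

```python
def to_bytes(obj, encoding='utf-8'):
    """Hypothetical helper: return a byte string for any object."""
    text_type = type(u'')          # unicode on Python 2, str on Python 3
    if isinstance(obj, bytes):
        return obj                 # already bytes, pass through untouched
    if not isinstance(obj, text_type):
        obj = text_type(obj)       # numbers, lists, etc. become text first
    return obj.encode(encoding)    # the codec is always stated explicitly


as_bytes = to_bytes(u'\ue863')
number_bytes = to_bytes(42)
```

Because the encoding is a visible parameter, every call site states
which codec it relies on, and nothing depends on a platform default.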

> Please fix this issue.
>
> http://bugs.python.org/issue3648
>
> Please.



