[ python-Bugs-1331062 ] utf 7 codec broken
SourceForge.net
noreply at sourceforge.net
Wed Oct 19 12:58:12 CEST 2005
Bugs item #1331062, was opened at 2005-10-19 08:23
Message generated for change (Comment added) made by titty
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1331062&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Unicode
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Ralf Schmitt (titty)
Assigned to: M.-A. Lemburg (lemburg)
Summary: utf 7 codec broken
Initial Comment:
the following code doesn't work as expected:
ralf at stronzo:~$ cat t.py
#! /usr/bin/env python
s = 'Auguste and Louis Lumi\xe8re'
print repr(s)
u1 = s.decode('utf7')
print 'from utf7: %d %r' % (len(u1), u1)
u2 = u'Auguste and Louis Lumi\xe8re'
print ' u2: %d %r' % (len(u2), u2)
print 'u1==u2', u1==u2
e1 = u1.encode('utf8')
e2 = u2.encode('utf8')
print 'e1=%r' % e1
print 'e2=%r' % e2
unicode(e2, 'utf8')
unicode(e1, 'utf8')
ralf at stronzo:~$ python t.py
'Auguste and Louis Lumi\xe8re'
from utf7: 25 u'Auguste and Louis Lumi\xe8re'
u2: 25 u'Auguste and Louis Lumi\xe8re'
u1==u2 False
e1='Auguste and Louis Lumi\xff\xbf\xbf\xa8re'
e2='Auguste and Louis Lumi\xc3\xa8re'
Traceback (most recent call last):
File "t.py", line 19, in ?
unicode(e1, 'utf8')
File "/usr/local/lib/python2.4/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 22: unexpected code byte
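For readers following along on a current interpreter, here is a minimal sketch of the check that fails above. It is written for Python 3 (an assumption; the report uses Python 2.4), the two byte strings are copied from the e1 and e2 lines of the output, and the helper name first_bad_utf8_offset is hypothetical.

```python
# Python 3 sketch. The byte strings are copied from the e1/e2 output above.
e1 = b'Auguste and Louis Lumi\xff\xbf\xbf\xa8re'
e2 = b'Auguste and Louis Lumi\xc3\xa8re'


def first_bad_utf8_offset(data):
    """Return the offset of the first undecodable byte, or None if data is valid UTF-8.

    Hypothetical helper, introduced only for this illustration.
    """
    try:
        data.decode('utf-8')
    except UnicodeDecodeError as exc:
        return exc.start
    return None


print(first_bad_utf8_offset(e1))  # offset of the 0xff byte
print(first_bad_utf8_offset(e2))  # None: e2 is well-formed UTF-8
```

The offset reported for e1 matches the "position 22" in the traceback, since 0xff can never start a UTF-8 sequence.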
----------------------------------------------------------------------
>Comment By: Ralf Schmitt (titty)
Date: 2005-10-19 10:58
Message:
Logged In: YES
user_id=17929
On Debian testing and FreeBSD 4.11 using Python 2.4.2
'\xe8'.decode('utf7') succeeds...
Using the Windows version I also get that error.
----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2005-10-19 10:30
Message:
Logged In: YES
user_id=38388
Hmm, running Python 2.4.2 I get:
>>> s = 'Auguste and Louis Lumi\xe8re'
>>> print repr(s)
'Auguste and Louis Lumi\xe8re'
>>> u1 = s.decode('utf7')
Traceback (most recent call last):
File "<stdin>", line 1, in ?
UnicodeDecodeError: 'utf7' codec can't decode bytes in position 0-22: unexpected special character
Which looks correct as UTF-7 may not contain characters
having the high bit set.
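A minimal sketch of that point, assuming a Python 3 interpreter (the thread itself is about Python 2.4): a byte string containing 0xe8 is rejected by the utf-7 decoder, while the proper UTF-7 form of the same text uses only 7-bit bytes. The helper name decodes_as_utf7 is hypothetical.

```python
# Python 3 sketch: UTF-7 input must be 7-bit clean.
raw = b'Auguste and Louis Lumi\xe8re'   # contains the byte 0xe8 (high bit set)
text = 'Auguste and Louis Lumi\xe8re'   # the intended Unicode text


def decodes_as_utf7(data):
    """Return True if data is accepted by the utf-7 codec (hypothetical helper)."""
    try:
        data.decode('utf-7')
    except UnicodeDecodeError:
        return False
    return True


# Encoding the text properly shifts the non-ASCII character into a
# '+...-' base64 run, so every output byte is below 0x80.
encoded = text.encode('utf-7')

print(decodes_as_utf7(raw))   # False
print(encoded)                # b'Auguste and Louis Lumi+AOg-re'
```

Decoding the encoded form with 'utf-7' gives back the original text, which is the round trip the script in the initial comment would need in order to compare equal to u2.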
----------------------------------------------------------------------