codecs UTF-8 StreamReader ignores errors arg
wade at lightlink.com
Tue Dec 5 07:20:12 EST 2000
I've run into a problem using the codecs module to decode data that is
mostly UTF-8, but with some bogus characters thrown in. The
StreamReader class seems to ignore an 'errors' argument passed to its
constructor, so it uses the default, which is 'strict'.
A short session illustrating the problem is shown below. Any advice
appreciated.
Wade Leftwich
Ithaca, NY
------------------------------------
>>> import codecs
>>> from StringIO import StringIO
>>> encode, decode, reader, writer = codecs.lookup('UTF-8')
>>> s = 'ab\346c'
>>> decode(s, 'replace')
(u'ab\uFFFDc', 4)
>>> fh = StringIO(s)
>>> sr = reader(fh, 'replace')
>>> sr.read()
Traceback (innermost last):
  File "<interactive input>", line 1, in ?
  File "c:\python20\lib\codecs.py", line 208, in read
    return self.decode(self.stream.read())[0]
UnicodeError: UTF-8 decoding error: unexpected end of data
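
The line quoted in the traceback suggests that read() calls self.decode() without forwarding the 'errors' value, which would explain why 'strict' is used. One possible workaround, sketched below, is to bypass the StreamReader, read the raw bytes, and call the codec's stateless decode function with the error mode passed explicitly, as in the decode(s, 'replace') call that works in the session. This is only a sketch under assumptions: it is written for a current Python 3 interpreter (io.BytesIO, bytes literals) rather than the Python 2.0 used above, and read_with_errors is a hypothetical helper name, not part of the codecs module.

```python
import codecs
from io import BytesIO


def read_with_errors(stream, encoding="utf-8", errors="replace"):
    """Hypothetical helper: read everything from a byte stream and decode it,
    passing the error-handling mode to the decoder explicitly."""
    # codecs.getdecoder returns the codec's stateless decode function,
    # the counterpart of the `decode` obtained from codecs.lookup above.
    decode = codecs.getdecoder(encoding)
    text, _consumed = decode(stream.read(), errors)
    return text


# Same data as in the session: mostly UTF-8 with one bogus byte (octal 346).
raw = b"ab\xe6c"
result = read_with_errors(BytesIO(raw))
print(repr(result))
```

With the 'replace' mode the bogus byte should come back as U+FFFD, matching the result of the direct decode call in the session. Note that this reads the whole stream at once, so it only suits data that fits in memory.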