Using codecs.EncodedFile() with Python 2.5
Peter Otten
__peter__ at web.de
Wed Jan 3 07:14:49 EST 2007
David Hughes wrote:
> I used this function successfully with Python 2.4 to alter the encoding
> of a set of database records from latin-1 to utf-8, but the same
> program raises an exception using Python 2.5. This small example shows
> the problem:
>
> import codecs
> fo = open('test.dat', 'w')
> fo.write('G\xe2teaux')
> fo.close()
>
> fi = open("test.dat",'r')
> fx = codecs.EncodedFile(fi, 'utf-8', 'latin-1')
> astring = fx.readline()
> print astring
> ustring = unicode(astring, 'utf-8' )
> print repr(ustring)
> print ustring.encode('latin-1')
> print ustring.encode('utf-8')
>
> Python 2.4 gives:
>
> Gâteaux
> u'G\xe2teaux'
> Gâteaux
> Gâteaux
>
> which I believe is correct, while 2.5 produces
>
> Traceback (most recent call last):
>   File "test_codec.py", line 8, in <module>
>     astring = fx.readline()
>   File "C:\Python25\lib\codecs.py", line 709, in readline
>     data = self.reader.readline()
>   File "C:\Python25\lib\codecs.py", line 471, in readline
>     data = self.read(readsize, firstline=True)
>   File "C:\Python25\lib\codecs.py", line 418, in read
>     newchars, decodedbytes = self.decode(data, self.errors)
> UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-3: invalid data
>
> Is there a genuine problem here, or have I been misusing this function?
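For readers on a current Python 3, the sketch below shows the behaviour the 2.4 output above reflects: codecs.EncodedFile(file, data_encoding, file_encoding) decodes the underlying bytes with file_encoding and returns them to the caller encoded as data_encoding. It assumes an in-memory io.BytesIO stands in for test.dat; in Python 3 the wrapped file must be a binary stream.

```python
import codecs
import io

# Stand-in for test.dat (assumption): Latin-1 bytes, where 0xE2 is 'â'.
raw = io.BytesIO(b'G\xe2teaux\n')

# data_encoding='utf-8': what readline() hands back to the caller.
# file_encoding='latin-1': how the bytes in the underlying file are decoded.
fx = codecs.EncodedFile(raw, 'utf-8', 'latin-1')
line = fx.readline()
print(repr(line))  # b'G\xc3\xa2teaux\n'

# Decoding the returned UTF-8 bytes gives back the original text.
text = line.decode('utf-8')
```

So the call in the original report uses the arguments in the intended order; the failing 2.5 traceback shows its reader decoding the file as UTF-8 instead.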
This is indeed a bug in Python 2.5. It has been fixed in the Subversion trunk; see the change log for Lib/codecs.py:
http://svn.python.org/view/python/trunk/Lib/codecs.py?rev=52517&view=log
Peter
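Until a release with the fix is available, one possible workaround is to avoid EncodedFile and do the two steps explicitly: wrap the binary file in a Latin-1 StreamReader from codecs.getreader() and encode each decoded line as UTF-8. The sketch below is written in Python 3 syntax; the helper name transcode_lines and the sample data are hypothetical, chosen only for illustration.

```python
import codecs
import io

def transcode_lines(src, src_encoding='latin-1', dst_encoding='utf-8'):
    """Hypothetical helper: yield each line of the binary stream src,
    decoded from src_encoding and re-encoded as dst_encoding bytes."""
    reader = codecs.getreader(src_encoding)(src)  # bytes -> text, line by line
    for line in reader:  # StreamReader iterates via readline()
        yield line.encode(dst_encoding)

# Sample Latin-1 records (assumption): 'Gâteaux' and 'Café'.
src = io.BytesIO(b'G\xe2teaux\nCaf\xe9\n')
out = list(transcode_lines(src))
```

Doing the decode and encode by hand keeps the direction of the conversion visible, which also makes it easy to cross-check against EncodedFile once a fixed version is installed.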