[New-bugs-announce] [issue15278] UnicodeDecodeError when readline in codecs.py
lovelylain
report at bugs.python.org
Sat Jul 7 17:18:39 CEST 2012
New submission from lovelylain <lovelylain at keju.tk>:
Here is an example in which `for line in fp` raises UnicodeDecodeError:
# -*- coding: utf-8 -*-
import codecs
text = u'\u6731' + u'\U0002a6a5' * 18
print repr(text)
# write the text as UTF-16-LE
with codecs.open('test.txt', 'wb', 'utf-16-le') as fp:
    fp.write(text)
# read the whole file at once
with codecs.open('test.txt', 'rb', 'utf-16-le') as fp:
    print repr(fp.read())
# iterate line by line: this is what raises UnicodeDecodeError
with codecs.open('test.txt', 'rb', 'utf-16-le') as fp:
    for line in fp:
        print repr(line)
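To see where a chunked, line-by-line read can fail on this input, here is the byte arithmetic as a Python 3 sketch (the report's own code is Python 2). Each U+2A6A5 character takes a 4-byte surrogate pair in UTF-16, so the file is 74 bytes long. The 72-byte chunk size is an assumption about the default read size of StreamReader.readline(); `CHUNK` and `err` are names introduced here for illustration. With that assumption, the first chunk ends between the two halves of the last surrogate pair.

```python
import codecs

# Reporter's text: one BMP character followed by 18 copies of U+2A6A5,
# each of which is encoded as a 4-byte surrogate pair in UTF-16.
text = '\u6731' + '\U0002a6a5' * 18
data = text.encode('utf-16-le')
print(len(data))  # 2 + 18 * 4 = 74 bytes

# Assumed read size used by readline(); hypothetical constant name.
CHUNK = 72
chunk = data[:CHUNK]
# 72 - 2 = 70 bytes of pairs: 17 complete pairs (68 bytes) plus the
# 2-byte high surrogate of the 18th pair, whose low half is not read yet.

# A one-shot strict decode of the chunk fails at the incomplete pair.
err = None
try:
    chunk.decode('utf-16-le')
except UnicodeDecodeError as exc:
    err = exc  # keep a reference; 'exc' is unbound after the except block
print(err.start, err.reason)

# An incremental decoder buffers the incomplete pair instead of failing.
decoder = codecs.getincrementaldecoder('utf-16-le')()
part1 = decoder.decode(chunk)                     # 18 characters
part2 = decoder.decode(data[CHUNK:], final=True)  # the last character
assert part1 + part2 == text
```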
I read the code of StreamReader.read() in codecs.py:
def read(self, size=-1, chars=-1, firstline=False):
    """ Decodes data from the stream self.stream and returns the
        resulting object.
        ...
        If firstline is true, and a UnicodeDecodeError happens
        after the first line terminator in the input only the first line
        will be returned, the rest of the input will be kept until the
        next call to read().
    """
    ...
    try:
        newchars, decodedbytes = self.decode(data, self.errors)
    except UnicodeDecodeError, exc:
        if firstline:
            newchars, decodedbytes = self.decode(data[:exc.start], self.errors)
            lines = newchars.splitlines(True)
            if len(lines)<=1:
                raise
        else:
            raise
    ...
It seems that the behavior of the firstline argument is not consistent with its docstring.
I don't know why this argument was added, or why the number of lines is checked.
If it was added so that readline() could work around some decode errors, note that the data read so far may contain no line terminator at all; in that case the len(lines)<=1 check re-raises the UnicodeDecodeError anyway.
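The re-raise path can be reproduced outside codecs.py. The Python 3 sketch below is a hypothetical stand-alone copy of the quoted `firstline` branch; the function name `read_chunk_firstline` is made up, and it calls `codecs.utf_16_le_decode` with `final=True` so that a surrogate pair cut off at the end of the chunk raises UnicodeDecodeError, standing in for the decode failure in the report. When the chunk has no line terminator before the error position, `len(lines) <= 1` holds and the error is re-raised; with a line break earlier in the chunk, the decodable prefix is returned instead.

```python
import codecs

def read_chunk_firstline(data, firstline=True, errors='strict'):
    """Hypothetical stand-alone copy of the firstline branch of read()."""
    try:
        # final=True: an incomplete surrogate pair at the end raises.
        newchars, decodedbytes = codecs.utf_16_le_decode(data, errors, True)
    except UnicodeDecodeError as exc:
        if firstline:
            # Retry on the bytes before the error position only.
            newchars, decodedbytes = codecs.utf_16_le_decode(
                data[:exc.start], errors, True)
            lines = newchars.splitlines(True)
            if len(lines) <= 1:
                raise  # no line terminator before the error: re-raise exc
        else:
            raise
    return newchars, decodedbytes


# No line break before the truncated pair: the error is re-raised.
no_eol = ('\u6731' + '\U0002a6a5' * 18).encode('utf-16-le')[:72]
try:
    read_chunk_firstline(no_eol)
    reraised = False
except UnicodeDecodeError:
    reraised = True

# A line break early in the chunk: the decodable prefix is returned.
with_eol = ('\u6731\n\u6731' + '\U0002a6a5' * 18).encode('utf-16-le')[:72]
chars, used = read_chunk_firstline(with_eol)
print(reraised, chars.splitlines(True)[0] == '\u6731\n', used)
```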
Maybe read() should be written like the code below to support readline() in codecs:
def read(self, size=-1, chars=-1, autotruncate=False):
    ...
    try:
        newchars, decodedbytes = self.decode(data, self.errors)
    except UnicodeDecodeError, exc:
        if autotruncate and exc.start:
            newchars, decodedbytes = self.decode(data[:exc.start], self.errors)
        else:
            raise
    ...
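The proposed `autotruncate` behaviour can be modelled as a small Python 3 helper. This is a hypothetical sketch (`read_chunk_autotruncate` is an invented name) that uses `codecs.utf_16_le_decode` with `final=True` to stand in for the failing decode, and returns the undecoded leftover bytes so they can be prepended to the next chunk, much as read() keeps the rest of the input until the next call.

```python
import codecs

def read_chunk_autotruncate(data, autotruncate=False, errors='strict'):
    """Hypothetical sketch of the proposed autotruncate branch.

    Returns (newchars, leftover), where leftover holds the bytes that
    were not decoded and should be prepended to the next chunk.
    """
    try:
        # final=True: an incomplete surrogate pair at the end raises.
        newchars, decodedbytes = codecs.utf_16_le_decode(data, errors, True)
    except UnicodeDecodeError as exc:
        # Truncate only if something before the error can be decoded.
        if autotruncate and exc.start:
            newchars, decodedbytes = codecs.utf_16_le_decode(
                data[:exc.start], errors, True)
        else:
            raise
    return newchars, data[decodedbytes:]


text = '\u6731' + '\U0002a6a5' * 18
data = text.encode('utf-16-le')

# First 72-byte chunk: the incomplete pair is held back, not an error.
first, leftover = read_chunk_autotruncate(data[:72], autotruncate=True)
# Next call: leftover bytes plus the rest of the stream.
second, leftover = read_chunk_autotruncate(leftover + data[72:], autotruncate=True)
print(len(first), len(second), first + second == text)
```

The `exc.start` test means that if not even the first character of a chunk can be decoded, the error is still raised rather than an empty result being returned.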
----------
components: Library (Lib)
messages: 164869
nosy: lovelylain
priority: normal
severity: normal
status: open
title: UnicodeDecodeError when readline in codecs.py
type: behavior
versions: Python 2.6, Python 2.7, Python 3.1, Python 3.2
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue15278>
_______________________________________