[ python-Bugs-1379393 ] StreamReader.readline doesn't advance on decode errors

Sun Feb 19 01:58:54 CET 2006

Bugs item #1379393, was opened at 2005-12-13 11:35
Message generated for change (Comment added) made by birkenfeld
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1379393&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: None
>Status: Closed
>Resolution: Wont Fix
Priority: 5
Submitted By: Matthew Mueller (donut)
Assigned to: Walter Dörwald (doerwalter)
Summary: StreamReader.readline doesn't advance on decode errors

Initial Comment:
In previous versions of Python, when there was a
Unicode decode error, StreamReader.readline() would
advance to the next line.  In the current versions (2.4.2
and trunk), it doesn't.  Tested under Linux AMD64
(Ubuntu 5.10).

Attaching an example script.  In python2.3 it prints:

hi~
hi
error: 'utf8' codec can't decode byte 0x80 in position
2: unexpected code byte
error: 'utf8' codec can't decode byte 0x81 in position
2: unexpected code byte
all done

In python2.4 and trunk it prints:
hi~
hi
error: 'utf8' codec can't decode byte 0x80 in position
0: unexpected code byte
error: 'utf8' codec can't decode byte 0x80 in position
0: unexpected code byte
error: 'utf8' codec can't decode byte 0x80 in position
0: unexpected code byte
[repeats forever]

Maybe this isn't actually supposed to work (the docs
don't mention what should happen with strict error
checking), but it would be nice, given the alternatives:
1. Use errors='replace' and then search the result for
the replacement character. (ick)
2. Define a custom error handler, similar to ignore or
replace, that also sets some flag. (slightly less ick,
but more work.)
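
For illustration, alternative 2 could look roughly like the sketch below. It is written for modern Python 3, not the 2.4-era interpreter discussed here, and the names FlaggingReplace, "flagging_replace" and decode_lines, as well as the sample bytes, are made up for this example. It also assumes the input is decoded one raw line at a time instead of through StreamReader.readline(), since a StreamReader may decode more than one line per call, which would make a per-line flag unreliable.

```python
import codecs
import io


class FlaggingReplace:
    """Hypothetical error handler: acts like 'replace' but records that it ran."""

    def __init__(self):
        self.error_seen = False

    def __call__(self, exc):
        # Only decoding errors are handled here; anything else is re-raised.
        if not isinstance(exc, UnicodeDecodeError):
            raise exc
        self.error_seen = True
        # Substitute U+FFFD and resume decoding right after the bad bytes.
        return ("\ufffd", exc.end)


_handler = FlaggingReplace()
codecs.register_error("flagging_replace", _handler)


def decode_lines(stream, encoding="utf-8"):
    """Yield (text, had_error) for each raw line of a binary stream."""
    for raw_line in stream:
        _handler.error_seen = False
        text = raw_line.decode(encoding, errors="flagging_replace")
        yield text, _handler.error_seen


# Made-up sample data with two broken lines.
sample = io.BytesIO(b"hi~\nhi\nhi\x80\nhi\x81\nall done\n")
results = list(decode_lines(sample))
```

Splitting on the newline byte before decoding is only safe for encodings such as UTF-8 where that byte cannot occur inside a multi-byte sequence. The handler keeps its flag in shared state, so this sketch is not thread-safe.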

----------------------------------------------------------------------

>Comment By: Georg Brandl (birkenfeld)
Date: 2006-02-19 01:58

Message:
Logged In: YES 
user_id=1188172

Closing as Won't Fix, then.

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2005-12-16 18:25

Message:
Logged In: YES 
user_id=89016

IMHO the current behaviour is more consistent. To read the
broken UTF-8 stream from the test script, the appropriate
error handler should be used. What is the desired outcome?
If only the broken byte sequence should be skipped,
errors="replace" is appropriate. To skip a complete line
that contains a broken byte sequence, do something like in
the attached skipbadlines.py. The StreamReader can't know
which behaviour is wanted.
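
The attached skipbadlines.py is not reproduced in this message. As a rough idea of what "skip a complete line" can mean, here is a minimal sketch for modern Python 3 (an assumption, as are the name iter_good_lines and the sample bytes): read raw lines from a binary stream, decode each one strictly, and drop those that fail.

```python
import io


def iter_good_lines(stream, encoding="utf-8"):
    """Yield decoded lines from a binary stream, skipping undecodable ones."""
    for raw_line in stream:
        try:
            # Strict decoding: any broken byte sequence raises.
            yield raw_line.decode(encoding)
        except UnicodeDecodeError:
            # The whole line is dropped; the next raw line is tried.
            continue


# Made-up sample data: the middle line contains a stray 0x80 byte.
sample = io.BytesIO(b"first\nbad\x80line\nlast\n")
good = list(iter_good_lines(sample))
```

Because the stream advances by raw lines before any decoding happens, a bad line can never be retried forever. This only works for encodings where the newline byte is unambiguous, such as UTF-8.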

----------------------------------------------------------------------

Comment By: Georg Brandl (birkenfeld)
Date: 2005-12-15 22:42

Message:
Logged In: YES 
user_id=1188172

I don't know what the correct behaviour should be. Walter?

----------------------------------------------------------------------
