[ python-Bugs-844561 ] codecs.open().readlines(sizehint) bug

SourceForge.net noreply at sourceforge.net
Thu Feb 26 04:51:08 EST 2004


Bugs item #844561, was opened at 2003-11-18 18:22
Message generated for change (Comment added) made by lemburg
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=844561&group_id=5470

Category: Unicode
Group: Python 2.2
Status: Open
Resolution: None
Priority: 3
Submitted By: Jeff Epler (jepler)
Assigned to: M.-A. Lemburg (lemburg)
Summary: codecs.open().readlines(sizehint) bug

Initial Comment:
codecs.open().readlines(sizehint) can return truncated
lines.  The attached script, which uses
readlines(sizehint) to count the number of lines in a
file, demonstrates the problem.  Correct output would
be 1000 in both cases, but different values are
returned depending on sizehint because of the truncated
lines.
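
The attached script is not reproduced in this message. A minimal sketch of the kind of check described might look like the following. It is an assumption, not the original attachment: the test data, the helper name count_lines, and the use of in-memory streams (codecs.getreader() over io.BytesIO instead of codecs.open() on a real file) are all illustrative.

```python
import codecs
import io

# Hypothetical test data: 1000 ASCII lines of 71 bytes each.
RAW = ("x" * 70 + "\n").encode("ascii") * 1000


def count_lines(stream, sizehint):
    """Count lines by calling readlines(sizehint) until nothing is returned."""
    total = 0
    while True:
        lines = stream.readlines(sizehint)
        if not lines:
            return total
        total += len(lines)


# Plain byte stream versus a codecs StreamReader over the same bytes.
plain_count = count_lines(io.BytesIO(RAW), 512)
decoded_count = count_lines(codecs.getreader("ascii")(io.BytesIO(RAW)), 512)
print(plain_count, decoded_count)
```

If readlines() only ever returns whole lines, both counts are 1000. On an interpreter with the defect reported here, the second count would vary with sizehint.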

----------------------------------------------------------------------

>Comment By: M.-A. Lemburg (lemburg)
Date: 2004-02-26 10:51

Message:
Logged In: YES 
user_id=38388

Good catch. I must have overlooked the "whole lines" bit :-)

In that case, it's probably best to have .readlines() ignore
the sizehint argument altogether. An efficient implementation
is hard to do since the line breaking is not done at C level,
but after the data has been read.
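
A rough sketch of what "ignore the sizehint argument altogether" could mean in practice follows. The function name readlines_ignoring_hint and its signature are hypothetical and are not taken from codecs.py; the point is only that the whole stream is read and decoded first, and the line breaking happens afterwards.

```python
import codecs
import io


def readlines_ignoring_hint(stream, encoding, sizehint=None):
    """Return all decoded lines from a byte stream.

    sizehint is accepted for interface compatibility but deliberately
    ignored (hypothetical helper, not the codecs.py implementation).
    """
    data = stream.read()                   # read up to EOF
    text = codecs.decode(data, encoding)   # decode everything at once
    return text.splitlines(True)           # split afterwards, keeping line ends


lines = readlines_ignoring_hint(io.BytesIO(b"a\nbb\nccc"), "ascii", sizehint=2)
print(lines)
```

The trade-off is memory: the whole file is held at once, but no returned item can be a partial line.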

----------------------------------------------------------------------

Comment By: Jeff Epler (jepler)
Date: 2004-02-26 02:14

Message:
Logged In: YES 
user_id=2772

To me, the phrase "*whole lines* totalling approximately
sizehint" means that no item from readlines(sizehint) will
be an incomplete line.  I don't understand why this
requirement isn't clearly indicated to you by the text you
included in your comments.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2004-02-26 00:04

Message:
Logged In: YES 
user_id=38388

It's hard to say whether this is a bug or not. The sizehint
argument is not well documented and the way you use it
does not look like a proper way to use it.

From the docs:
"""
If the optional sizehint argument is present, instead of
reading up to EOF, whole lines totalling approximately
sizehint bytes (possibly after rounding up to an internal
buffer size) are read.
"""

In your example the underlying open() implementation 
seems to round up the sizehint value to include the whole
line, while the codecs.open() version will only read sizehint
bytes without any rounding (see the codecs.py 
implementation).
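
To illustrate why reading exactly sizehint bytes without rounding produces truncated lines, here is a simplified model of that behavior. It is an assumption for illustration only, not the codecs.py source; naive_readlines and the sample data are hypothetical.

```python
import io


def naive_readlines(stream, sizehint):
    """Read exactly sizehint bytes, then split: may cut a line in half."""
    data = stream.read(sizehint)
    return data.decode("ascii").splitlines(True)


stream = io.BytesIO(b"alpha\nbeta\ngamma\n")   # three real lines
chunks = []
while True:
    lines = naive_readlines(stream, 8)
    if not lines:
        break
    chunks.append(lines)

naive_count = sum(len(lines) for lines in chunks)
print(chunks)       # the first chunk ends with the fragment 'be'
print(naive_count)  # more than the three lines actually present
```

Each call cuts at the byte limit rather than at a line boundary, so fragments are counted as separate lines.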


----------------------------------------------------------------------

Comment By: Jeff Epler (jepler)
Date: 2003-11-18 18:28

Message:
Logged In: YES 
user_id=2772

The script triggers the assertion error using at least
python 2.3.2 (locally compiled) and python 2.2.2 (redhat 9 RPM)

----------------------------------------------------------------------



