[Patches] [ python-Patches-580331 ] xreadlines caching, file iterator

noreply@sourceforge.net
Mon, 05 Aug 2002 07:52:00 -0700


Patches item #580331, was opened at 2002-07-11 17:45
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=580331&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 3
Submitted By: Oren Tirosh (orenti)
Assigned to: Guido van Rossum (gvanrossum)
Summary: xreadlines caching, file iterator

Initial Comment:
Calling f.xreadlines() multiple times returns the same 
xreadlines object.

A file is an iterator - __iter__() returns self and next() calls 
the cached xreadlines object's next method.
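Here is a minimal Python 3 sketch of the two behaviors described above. It uses io.StringIO rather than the C file object that the patch actually modifies. The names CachingFile and LineReader are hypothetical. Fetching lines in batches with readlines(hint) is an assumption about how an xreadlines-style reader works. The reader holds the underlying stream rather than the wrapper, so this sketch creates no reference cycle.

```python
import io


class LineReader:
    """Hypothetical stand-in for the xreadlines object: fetches lines in batches."""

    def __init__(self, raw, hint=8192):
        self.raw = raw      # underlying stream, not the wrapper (no cycle)
        self.hint = hint    # approximate number of characters per batch
        self.lines = []
        self.pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.pos >= len(self.lines):
            # Refill: readlines(hint) returns whole lines totalling about `hint` chars.
            self.lines = self.raw.readlines(self.hint)
            self.pos = 0
            if not self.lines:
                raise StopIteration
        line = self.lines[self.pos]
        self.pos += 1
        return line


class CachingFile:
    """Hypothetical wrapper: xreadlines() is cached and the file is its own iterator."""

    def __init__(self, raw):
        self.raw = raw
        self._xreadlines = None

    def xreadlines(self):
        # Create the reader once; every later call returns the same object.
        if self._xreadlines is None:
            self._xreadlines = LineReader(self.raw)
        return self._xreadlines

    def __iter__(self):
        return self          # a file is an iterator: __iter__() returns self

    def __next__(self):
        return next(self.xreadlines())  # delegate to the cached reader


f = CachingFile(io.StringIO("one\ntwo\nthree\n"))
same = f.xreadlines() is f.xreadlines()
lines = list(f)
print(same, lines)
```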



----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2002-08-05 10:52

Message:
Logged In: YES 
user_id=6380

This begins to look good.

What's a normal text file?  One with a million bytes? :-)

Have you made sure this works as expected in Universal
newline mode?
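One way to answer this question is to compare line iteration with repeated readline() calls on input that mixes line endings. The sketch below runs against the io module in modern Python 3, where newline=None turns on universal newlines. In Python 2.3 the equivalent was the 'U' mode flag from PEP 278. The sketch does not exercise the 2002 patch itself, and the helper name lines_both_ways is hypothetical.

```python
import io


def lines_both_ways(data):
    """Hypothetical helper: split `data` into lines via iteration and via readline()."""
    def open_text():
        # newline=None enables universal newlines: "\r\n" and "\r" are read as "\n".
        return io.TextIOWrapper(io.BytesIO(data), encoding="ascii", newline=None)

    by_iter = list(open_text())
    by_readline = list(iter(open_text().readline, ""))  # call readline() until ""
    return by_iter, by_readline


by_iter, by_readline = lines_both_ways(b"unix\nwindows\r\nmac\rlast")
print(by_iter)
print(by_iter == by_readline)
```

A patched interpreter would pass this check if both lists match and every terminator has been translated to "\n".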

I'd like a patch that doesn't use #define WITH_READAHEAD_BUFFER.

You might also experiment with larger buffer sizes (I
predict that a larger buffer doesn't make much difference,
since it didn't for xreadlines, but it would be nice to
verify that and then add a comment; at least once a year
someone asks whether the buffer shouldn't be much larger).
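Below is a rough Python 3 timing harness for the suggested buffer-size experiment. The buffering argument of open() sets the size of the underlying binary buffer, so it is only an analogue of the readahead buffer in the patch. The function name time_iteration, the 100,000-line test file, and the chosen sizes are illustrative assumptions. No timings are claimed here.

```python
import os
import tempfile
import time


def time_iteration(path, buffer_sizes, repeat=3):
    """Hypothetical harness: best-of-`repeat` time to iterate over `path` per buffer size."""
    results = {}
    for size in buffer_sizes:
        best, count = float("inf"), 0
        for _ in range(repeat):
            start = time.perf_counter()
            # buffering sets the underlying binary buffer size; 0 is rejected in text mode.
            with open(path, "r", buffering=size) as f:
                count = sum(1 for _ in f)
            best = min(best, time.perf_counter() - start)
        results[size] = (best, count)
    return results


with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.writelines(f"line {i}\n" for i in range(100_000))
    path = tmp.name
try:
    results = time_iteration(path, [8192, 65536, 1 << 20])
finally:
    os.remove(path)

for size, (secs, n) in results.items():
    print(f"{size:>8} bytes: {secs:.4f} s for {n} lines")
```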

----------------------------------------------------------------------

Comment By: Oren Tirosh (orenti)
Date: 2002-08-05 02:27

Message:
Logged In: YES 
user_id=562624

This version of the patch still makes a file an iterator, but it no
longer depends on xreadlines: it implements the readahead
buffering inside the file object.

It is about 19% faster than xreadlines for normal text files and 
about 40% faster for files with 100k lines.

The methods readline and read do not use this readahead 
mechanism because it skews the current file position (just like 
xreadlines does).
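The following minimal sketch shows why readahead conflicts with the stream position. It assumes a binary stream that supports read(), readline() and tell(). ReadaheadFile is a hypothetical name. Iteration is served from a private buffer filled in bufsize chunks, while readline() reads directly from the underlying stream. This mirrors the split described above.

```python
import io


class ReadaheadFile:
    """Hypothetical sketch: iteration is served from a private readahead buffer."""

    def __init__(self, raw, bufsize=8192):
        self.raw = raw            # binary stream with read(), readline() and tell()
        self.bufsize = bufsize    # bytes fetched per refill
        self.buf = b""            # data read ahead but not yet returned

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            nl = self.buf.find(b"\n")
            if nl >= 0:
                line, self.buf = self.buf[:nl + 1], self.buf[nl + 1:]
                return line
            chunk = self.raw.read(self.bufsize)
            if not chunk:
                if self.buf:  # final line without a trailing newline
                    line, self.buf = self.buf, b""
                    return line
                raise StopIteration
            self.buf += chunk

    def readline(self):
        # Bypasses the readahead buffer and reads from the current stream position.
        return self.raw.readline()


f = ReadaheadFile(io.BytesIO(b"alpha\nbeta\ngamma\n"), bufsize=64)
first = next(f)                # b"alpha\n"
pos_after_iter = f.raw.tell()  # 17: the whole input was read ahead
mixed = f.readline()           # b"": the stream is already at EOF
rest = list(f)                 # [b"beta\n", b"gamma\n"] still come from the buffer
```

After one next() call, the underlying position is already at the end of the data. A following readline() therefore returns b"", even though two lines are still buffered. xreadlines has the same position skew, so mixing the two reading styles on one file gives surprising results.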


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-17 13:50

Message:
Logged In: YES 
user_id=6380

Alas, there's a fatal flaw. The file object and the
xreadlines object now both have pointers to each other,
creating an unbreakable cycle (since neither participates in
GC). Weak refs can't be used to resolve this dilemma. I
personally think that's reason enough to just stick with the status
quo (I was never more than +0 on the idea of making the file
an iterator anyway). But I'll leave it to Oren to come up
with another hack (please use this same SF patch).
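The cycle can be reproduced in pure Python, with one caveat: ordinary Python classes do participate in cyclic GC. The sketch therefore disables automatic collection to mimic C types that rely on reference counting alone. FileLike and Reader are hypothetical stand-ins for the file and xreadlines objects. Weak references are used only to observe whether the objects are still alive. The behavior shown is CPython's.

```python
import gc
import weakref


class FileLike:
    """Hypothetical stand-in for the file object."""
    def __init__(self):
        self.reader = None


class Reader:
    """Hypothetical stand-in for the cached xreadlines object."""
    def __init__(self, f):
        self.file = f


def make_cycle():
    f = FileLike()
    f.reader = Reader(f)   # file -> reader and reader -> file
    return weakref.ref(f), weakref.ref(f.reader)


gc.disable()               # mimic objects the cycle collector never sees
try:
    wf, wr = make_cycle()
    # Reference counting alone cannot free the pair: each keeps the other alive.
    alive_without_gc = wf() is not None and wr() is not None
    gc.collect()           # an explicit collection still runs while disabled
    freed_by_gc = wf() is None and wr() is None
finally:
    gc.enable()

print(alive_without_gc, freed_by_gc)
```

With collection disabled, each object keeps the other's reference count above zero. Both survive after the last outside reference is dropped, and only the cycle collector reclaims them. C types without GC support are never traversed by that collector, which is why the cycle in the patch could not be broken.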

Oren, if you'd like to give up, please say so and I'll close
the item in a jiffy. In fact, I positively encourage you to
give up. But I don't expect you to take this offer. :-)

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-16 21:33

Message:
Logged In: YES 
user_id=6380

I'm reviewing this and will check it in, or something like
it (probably).

----------------------------------------------------------------------

Comment By: Oren Tirosh (orenti)
Date: 2002-07-16 01:26

Message:
Logged In: YES 
user_id=562624

Now invalidates cache on a seek.
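Here is a minimal sketch of the invalidation rule. It assumes a text stream that supports readlines(hint) and seek(), and SeekAwareFile is a hypothetical name. Lines read ahead belong to the old position, so seek() discards them before moving the underlying stream.

```python
import io
from collections import deque


class SeekAwareFile:
    """Hypothetical wrapper: line iteration uses a read-ahead cache that seek() discards."""

    def __init__(self, raw, hint=8192):
        self.raw = raw          # underlying text stream (here io.StringIO)
        self.hint = hint        # approximate number of characters per refill
        self._cached = deque()  # lines read ahead but not yet returned

    def __iter__(self):
        return self

    def __next__(self):
        if not self._cached:
            self._cached.extend(self.raw.readlines(self.hint))
            if not self._cached:
                raise StopIteration
        return self._cached.popleft()

    def seek(self, offset, whence=io.SEEK_SET):
        # Cached lines belong to the old position, so drop them before seeking.
        self._cached.clear()
        return self.raw.seek(offset, whence)


f = SeekAwareFile(io.StringIO("a\nb\nc\n"))
first = next(f)          # fills the cache with all three lines, returns "a\n"
f.seek(0)                # invalidates the cached "b\n" and "c\n"
after_seek = list(f)
print(first, after_seek)
```

Without the self._cached.clear() call, list(f) would first yield the stale "b\n" and "c\n" and only then restart from the beginning of the stream.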



----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-07-15 10:38

Message:
Logged In: YES 
user_id=6380

I posted some comments to python-dev.

----------------------------------------------------------------------
