[Tutor] File Access

Nick Lunt nick at javacat.f2s.com
Mon Apr 5 13:21:55 EDT 2004


Hi Danny,

'for line in somefile' works very well, and as you say it will not bog my
system down (which is fortunate, as my lowly 700MHz machine is even
struggling with Linux now).
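
For the archives, here is a minimal sketch of that approach: iterate over
the file and remember only the most recent line. The helper name last_line
is made up for illustration, and the sketch assumes it is acceptable to scan
the whole file on each call.

```python
def last_line(path):
    """Return the last line of the file at `path`, or '' if the file is empty.

    `last_line` is a hypothetical helper name, not a standard library function.
    """
    last = ""
    with open(path) as somefile:
        # Iterating over the file object reads one line at a time, so only
        # the current line is held in memory.
        for line in somefile:
            last = line
    return last
```

The returned line keeps its trailing newline, if it has one, so call
.rstrip("\n") on the result if that is unwanted.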

> If you plan to do this sort of thing a lot, it might be worth it to build
> an "index" that records the byte offsets of each line.  We can talk about
> that if you'd like.

That sounds interesting. If you could give me a quick rundown of that
concept, or point me to something that explains it, I'd be very grateful.

Many thanks,
Nick.


On Mon, 2004-04-05 at 09:33, Danny Yoo wrote:
> On Sun, 4 Apr 2004, Nick Lunt wrote:
> 
> > I could probably do a readlines() until EOF overwriting a variable with
> > the contents of each line so that at the end the variable will hold the
> > contents of just the last line, or by putting each line of the file into
> > a list and pulling the last element of the list out, but eventually the
> > file.txt will get pretty big, so I thought of using seek().
> 
> Hi Nick,
> 
> 
> Yes, that's one approach.
> 
> If you are going to read a large file, definitely try to avoid using
> readlines(), because that tries to load the whole file into memory at
> once.  Usually, we don't need to read the whole file at once: things often
> work if we read and process a line at a time, which avoids holding the
> whole file in memory.
> 
> So instead of:
> 
> ###
> for line in somefile.readlines():     ## First approach
>     ...
> ###
> 
> it's more efficient to say:
> 
> ###
> while 1:                              ## Second approach
>     line = somefile.readline()
>     if not line: break
>     ...
> ###
> 
> 
> This second approach is a little ugly, even though it scales better than
> the first.  Thankfully, there's a way out: recent versions of Python let
> us write a line-by-line approach by taking advantage of new features of
> the for loop:
> 
> ###
> for line in somefile:                 ## Third approach
>     ...
> ###
> 
> Here, we "iterate" across a file, just as if we were iterating across a
> list.  This third approach looks just as nice as the first 'buggy'
> approach, but with the low memory requirements of the second.  So it's the
> best of both worlds.  *grin*
> 
> 
> > However, I could not find on python.org or by googling any way to do
> > this.
> 
> Try iterating across the whole file, and keep track of the very last line
> that's read.  That should be the line you're looking for.
> 
> 
> If you plan to do this sort of thing a lot, it might be worth it to build
> an "index" that records the byte offsets of each line.  We can talk about
> that if you'd like.
> 
> 
> If you have any questions on this, please feel free to ask.  Good luck to
> you!
> 
> 
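
One possible reading of the "index" idea mentioned above, as a rough sketch
rather than Danny's actual method: scan the file once, record the byte offset
at which each line starts, then use seek() to jump straight to any line. The
function names build_line_index and read_line are made up for illustration.

```python
def build_line_index(path):
    """Return a list whose entry i is the byte offset where line i starts.

    Hypothetical helper, not a standard library function.
    """
    offsets = []
    position = 0
    # Binary mode, so that len(line) counts bytes rather than decoded
    # characters and the offsets are valid arguments for seek().
    with open(path, "rb") as f:
        for line in f:
            offsets.append(position)
            position += len(line)
    return offsets


def read_line(path, offsets, lineno):
    """Return line `lineno` as bytes (0-based; -1 means the last line)."""
    with open(path, "rb") as f:
        f.seek(offsets[lineno])  # jump directly to the start of the line
        return f.readline()
```

With the index in hand, read_line(path, offsets, -1) fetches the last line
without rereading the file. The index only describes the file as it was when
it was built, so it has to be rebuilt or extended once the file grows.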



