Seek the one billionth line in a file containing 3 billion lines.
Sullivan WxPyQtKinter
sullivanz.pku at gmail.com
Wed Aug 8 02:41:37 EDT 2007
On Aug 8, 2:35 am, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
> Sullivan WxPyQtKinter <sullivanz.... at gmail.com> writes:
> > This program:
> > for i in range(1000000000):
> >     f.readline()
> > is absolutely very slow....
>
> There are two problems:
>
> 1) range(1000000000) builds a list of a billion elements in memory,
> which is many gigabytes and probably thrashing your machine.
> You want to use xrange instead of range, which builds an iterator
> (i.e. something that uses just a small amount of memory, and
> generates the values on the fly instead of precomputing a list).
>
> 2) f.readline() reads an entire line of input which (depending on
> the nature of the log file) could also be of very large size.
> If you're sure the log file contents are sensible (lines up to
> several megabytes shouldn't cause a problem) then you can do it
> that way, but otherwise you want to read fixed size units.
Thank you for pointing out these two problems. I wrote this program
just to show how inefficient it is to use a seemingly NATIVE way
to seek in such a big file. No other intention.
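For the record, Paul's second suggestion (reading fixed-size units and
counting newlines, instead of a billion readline() calls) could be
sketched roughly as follows. This is a hypothetical helper, not code
from the thread, written in modern Python where range() is already
lazy; `seek_to_line` and `chunk_size` are names invented here for
illustration:

```python
import io

def seek_to_line(f, target, chunk_size=1 << 20):
    """Position binary file f at the start of line `target` (1-indexed)
    by scanning fixed-size chunks and counting newlines, rather than
    calling readline() once per line."""
    f.seek(0)
    seen = 0  # newlines consumed so far
    while seen < target - 1:
        chunk = f.read(chunk_size)
        if not chunk:
            raise ValueError("file has fewer than %d lines" % target)
        n = chunk.count(b"\n")
        if seen + n >= target - 1:
            # The target line starts inside this chunk: locate the
            # (target - 1 - seen)-th newline, then seek just past it.
            pos = -1
            for _ in range(target - 1 - seen):
                pos = chunk.index(b"\n", pos + 1)
            f.seek(pos + 1 - len(chunk), io.SEEK_CUR)
            return f.tell()
        seen += n
    return f.tell()  # target == 1: already at offset 0
```

Because the file is read in megabyte-sized chunks, memory stays flat
regardless of how long individual lines are, which is exactly the
pathological case readline() can't guard against.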