[Tutor] use gzip with large files

Kent Johnson kent37 at tds.net
Tue Jul 19 18:51:36 CEST 2005


Adam Bark wrote:
> If you use something like this:
> 
> for line in file.readlines():

This will read the whole file into a list of lines; the OP doesn't want to read the whole file at once.

> then line is a string to the next newline and it automatically detects 
> the EOF and the same with file.readline() but that will give you one 
> character at a time.

?? file.readline() reads by lines, not by characters.
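
To make the difference concrete, here is a small sketch using an in-memory file (io.StringIO and the sample text are stand-ins of mine, not anything from the original script). readline() returns one whole line, while readlines() returns every remaining line as a list held in memory at once:

```python
import io

# In-memory stand-in for a text file (hypothetical sample data).
f = io.StringIO("first line\nsecond line\nthird line\n")

first = f.readline()   # one whole line, including its trailing newline
rest = f.readlines()   # every remaining line, loaded into a list at once

print(repr(first))
print(rest)
```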

A file is also an iterator (by lines) so the simplest way to iterate over lines in a file is just
for line in f:
  # process line

Specifically, Frank should be able to use
file = gzip.GzipFile(logfile)
for line in file:

though 'file' is a bad choice of name, since it is also a built-in.
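
A minimal sketch of the same idea, assuming a current Python 3 where gzip.open() accepts a text mode ("rt"). The function name count_lines, the sample file and its contents are hypothetical, made up only so the example runs on its own:

```python
import gzip
import os
import tempfile


def count_lines(path):
    """Count lines in a gzipped text file, reading one line at a time."""
    count = 0
    with gzip.open(path, "rt") as log:  # "rt": decompress and decode to text
        for line in log:                # only one line is held at a time
            count += 1
    return count


# Build a small throwaway gzip file to demonstrate (hypothetical sample data).
tmpdir = tempfile.mkdtemp()
sample_path = os.path.join(tmpdir, "sample.log.gz")
with gzip.open(sample_path, "wt") as out:
    out.write("alpha\nbeta\ngamma\n")

line_count = count_lines(sample_path)
print(line_count)
```

Because the loop pulls lines from the decompressor as it goes, memory use should stay roughly constant however large the uncompressed log is.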

To check for EOF when using readline() just look for an empty string:
while True:
  line = f.readline()
  if not line:
    break

but 'for line in f' is preferable.
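
The empty-string test is safe even when the file contains blank lines, because a blank line comes back as "\n", not "". A short sketch with an in-memory file (io.StringIO and the sample text are assumptions for illustration):

```python
import io

# The middle line is blank (hypothetical sample data).
f = io.StringIO("one\n\nthree\n")

lines = []
while True:
    line = f.readline()
    if not line:  # "" only at EOF; a blank line is "\n", which is truthy
        break
    lines.append(line)

print(lines)
```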

Kent

> 
> On 7/19/05, *frank h.* <frank.hoffsummer at gmail.com 
> <mailto:frank.hoffsummer at gmail.com>> wrote:
> 
>     hello all
>     I am trying to write a script in python that parses a gzipped logfile
> 
>     the unzipped logfiles can be very large (>2GB)
> 
>     basically the statements
> 
>     file = gzip.GzipFile(logfile)
>     data = file.read ()
> 
>     for line in data.splitlines():
>     ....
> 
> 
>     would do what I want, but this is not feasible because the gzip files
>     are so huge.
> 
>     So I do file.readline() in a for loop, but have no idea how long to
>     continue, because I don't know how many lines the files contain. How do
>     I check for end of file when using readline() ?
>     simply put it in a while loop and enclose it with try: except: ?
> 
>     what would be the best (fastest) approach to deal with such large gzip
>     files in python?
> 
>     thanks
>     _______________________________________________
>     Tutor maillist  -  Tutor at python.org <mailto:Tutor at python.org>
>     http://mail.python.org/mailman/listinfo/tutor
> 
> 
> 


