[Tutor] use gzip with large files

Hugo González Monteverde hugonz-lists at h-lab.net
Tue Jul 19 19:04:44 CEST 2005


for a file-like object with a readline method:

while True:
     line = file.readline()
     if not line:
         break
     # process line here

readline() returns "" at EOF and "\n" for an empty line, so the loop 
stops only at the true end of the file. (If the object also supports 
iteration, "for line in file:" does the same thing and ends on its own.)

This will not read the whole file into RAM at once. I'm not familiar 
with the gzip module, but if the read() solution works for small files, 
the approach I present here will also work for larger files.
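To make that concrete, here is a minimal self-contained sketch. The 
filename sample.log.gz and the sample contents are made up for 
illustration, and it uses gzip.open in text mode, a later convenience 
wrapper around the GzipFile class mentioned in the original question:

```python
import gzip

# Write a small sample file so the sketch is self-contained;
# in practice the .gz logfile would already exist on disk.
sample = gzip.open("sample.log.gz", "wt")
sample.write("first line\nsecond line\n\nlast line\n")
sample.close()

# Read it back one line at a time: iterating over the file object
# decompresses incrementally, so the whole file never sits in RAM.
count = 0
log = gzip.open("sample.log.gz", "rt")
for line in log:
    count += 1          # process each line here
log.close()
print(count)            # 4 lines, counting the empty one
```

Memory use stays roughly constant regardless of how large the 
uncompressed data is, which is the point when the logs run past 2 GB.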

Hugo

frank h. wrote:
> hello all
> I am trying to write a script in Python that parses a gzipped logfile
> 
> the unzipped logfiles can be very large (>2GB)
> 
> basically the statements
> 
> file = gzip.GzipFile(logfile)
> data = file.read()
> 
> for line in data.splitlines():
> ....
> 
> 
> would do what I want, but this is not feasible because the gzip files
> are so huge.
> 
> So I do file.readline() in a for loop, but have no idea how long to
> continue, because I don't know how many lines the files contain. How do
> I check for end of file when using readline()?
> Should I simply put it in a while loop and wrap it in try/except?
> 
> What would be the best (fastest) approach to deal with such large gzip
> files in Python?
> 
> thanks
> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> http://mail.python.org/mailman/listinfo/tutor
> 

