A fast way to read last line of gzip archive ?

Barak, Ron Ron.Barak at lsi.com
Sun May 24 02:52:29 EDT 2009


 

> -----Original Message-----
> From: MRAB [mailto:google at mrabarnett.plus.com] 
> Sent: Thursday, May 21, 2009 19:02
> To: 'python-list at python.org'
> Subject: Re: A fast way to read last line of gzip archive ?
> 
> Barak, Ron wrote:
> > Hi,
> >  
> > I need to read the end of 20 MB gzip archives (to extract the date
> > from the last line of a gzipped log file).
> > The solution I have below takes noticeable time to reach the end of 
> > the gzip archive.
> >  
> > Does anyone have a faster solution to read the last line of a gzip archive?
> >  
> > Thanks,
> > Ron.
> >  
> > #!/usr/bin/env python
> >  
> > import gzip
> >  
> > path = "./a/20/mb/file.tgz"
> >  
> > in_file = gzip.open(path, "r")
> > first_line = in_file.readline()
> > print "first_line ==",first_line
> > in_file.seek(-500)
> > last_line = in_file.readlines()[-1]
> > print "last_line ==",last_line
> > 
> It takes a noticeable time to reach the end because, well, 
> the data is compressed! The compression method used requires 
> the preceding data to be read first.

I was hoping someone might know a way to decompress just the end portion of the archive (instead of the whole archive), since only the last part is needed to read the last line.
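
For reference, here is a minimal sketch of the constraint discussed above, assuming an ordinary single-stream gzip file with no index or flush points to jump to. It still decompresses everything from the start, but it keeps only a small tail in memory instead of building the full list that readlines() would. The function name last_line_of_gzip and the parameters chunk_size and tail_size are hypothetical, chosen for illustration, and the code is written for Python 3 rather than the Python 2 used in the quoted script. It also assumes the last line is shorter than tail_size bytes.

```python
import gzip


def last_line_of_gzip(path, chunk_size=1 << 20, tail_size=4096):
    """Return the last line of a gzip file as bytes (without the newline).

    The whole stream is still decompressed sequentially; only the final
    tail_size bytes of decompressed data are kept in memory.
    """
    tail = b""
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Keep only the most recent tail_size bytes seen so far.
            tail = (tail + chunk)[-tail_size:]
    lines = tail.splitlines()
    return lines[-1] if lines else b""
```

This does not avoid the decompression cost; it only avoids holding the whole decompressed log at once. If the last line can be longer than tail_size, the result would be truncated, so that value would need to be raised.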

Bye,
Ron.


