really slow gzip decompress, why?
Diez B. Roggisch
deets at nospam.web.de
Mon Jan 26 16:47:03 CET 2009
> I've got one big (6.9 GB) .gz file with text inside it.
> zcat bigfile.gz > /dev/null does the job in 4 minutes 50 seconds.
> My Python code has been doing the same job for 25 minutes and still
> hasn't finished =( The code is the simplest I could ever imagine:
> import gzip
> import sys
> def main():
>     fh = gzip.open(sys.argv[1])  # sys.argv[1]: path to the .gz file
> As far as I understand, most of the time is spent executing C code, so
> Python's overhead shouldn't be noticeable. Why is it so slow?
I'm guessing here, but if gzip streams (and AFAIK it does), the command line
simply decompresses chunk by chunk and streams the output to /dev/null.

Python, on the other hand, is not streaming here: it will instead allocate
buffers for the whole decompressed file. For a 6.9 GB file that is already
*compressed*, that might take a while.
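If that guess is right, the fix is to read the file in pieces instead of all at
once. Below is a minimal sketch, assuming the goal is just to decompress and
discard the data as in the zcat comparison. The helper name
count_decompressed_bytes and the 1 MiB chunk size are hypothetical choices for
illustration, not something from the original post.

```python
import gzip
import sys

CHUNK_SIZE = 1 << 20  # 1 MiB per read; an arbitrary, hypothetical choice


def count_decompressed_bytes(path, chunk_size=CHUNK_SIZE):
    """Stream-decompress a .gz file and return the number of bytes produced.

    Only one chunk of decompressed data is held in memory at a time,
    roughly like `zcat file.gz > /dev/null`.
    """
    total = 0
    with gzip.open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)  # returns b"" at end of file
            if not chunk:
                break
            total += len(chunk)  # real processing of the chunk would go here
    return total


if __name__ == "__main__" and len(sys.argv) > 1:
    print(count_decompressed_bytes(sys.argv[1]))
```

For line-oriented text, iterating with `for line in fh:` over the gzip file
object also streams; the point is to avoid a single `fh.read()` that
materializes the entire decompressed contents.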