Read a gzip file from inside a tar file
Fredrik Lundh
fredrik at pythonware.com
Mon Dec 13 16:07:30 EST 2004
Craig Ringer wrote:
>> These are huge files. My goal is to analyze the content of the gzip
>> file in the tar file without having to un-gzip it, if that is possible.
>
> As far as I know, gzip is a stream compression algorithm that can't be
> decompressed in small blocks. That is, I don't think you can seek 500k
> into a 1MB file and decompress the next 100k.
correct.
> I'd say you'll have to progressively read the file from the beginning,
> processing and discarding as you go. It looks like a no-brainer to me -
> see zlib.decompressobj.
it can be a bit tricky to set things up properly, though. here's a piece
of code that uses Python's good old consumer interface to decode things
incrementally:
http://effbot.org/zone/consumer-gzip.htm
you can either use this as is (just create a "target consumer", wrap it in the
gzip consumer, and feed data to the gzip consumer in suitable pieces), or
hack it until it does what you want.
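
for illustration, here's a minimal sketch of the same idea. it is not the code
from the page linked above: it assumes a current Python 3 and lets zlib parse
the gzip header itself (wbits = 16 + MAX_WBITS). the names LineCounter,
GzipConsumer and scan_gzip_member are made up for this example, and it only
handles a gzip file with a single member.

```python
import tarfile
import zlib


class LineCounter:
    """Hypothetical target consumer: counts the bytes and newlines it is fed."""

    def __init__(self):
        self.bytes = 0
        self.lines = 0

    def feed(self, data):
        self.bytes += len(data)
        self.lines += data.count(b"\n")

    def close(self):
        pass


class GzipConsumer:
    """Decompress gzip data piece by piece and pass it on to a target consumer."""

    def __init__(self, target):
        self.target = target
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        # Assumption: the data is a single gzip member.
        self.decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)

    def feed(self, data):
        out = self.decoder.decompress(data)
        if out:
            self.target.feed(out)

    def close(self):
        # Hand over whatever zlib is still holding, then close the target.
        out = self.decoder.flush()
        if out:
            self.target.feed(out)
        self.target.close()


def scan_gzip_member(tar_path, member_name, target, chunk_size=64 * 1024):
    """Stream one gzip member of an uncompressed tar file through `target`."""
    with tarfile.open(tar_path, "r") as tar:
        member = tar.extractfile(member_name)
        if member is None:
            raise ValueError("%r is not a regular file in the archive" % member_name)
        consumer = GzipConsumer(target)
        with member:
            # Read the compressed bytes in pieces; nothing is written to disk
            # and the decompressed data is discarded once the target has seen it.
            for chunk in iter(lambda: member.read(chunk_size), b""):
                consumer.feed(chunk)
        consumer.close()
    return target
```

to use it, call something like scan_gzip_member("outer.tar", "inner.gz",
LineCounter()) (both file names are placeholders) and inspect the counter
afterwards. the data is still read from the beginning of the gzip stream, as
discussed above; the sketch only avoids holding the whole decompressed file
in memory or on disk.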
</F>