[docs] [issue21146] update gzip usage examples in docs

Wolfgang Maier report at bugs.python.org
Thu Apr 3 12:40:06 CEST 2014


New submission from Wolfgang Maier:

The current documentation of the gzip module should have its section "12.2.1. Examples of usage" updated to reflect the changes made to the module in Python 3.2 (https://docs.python.org/3.2/whatsnew/3.2.html#gzip-and-zipfile).

Currently, the recipe given for gz-compressing a file is:

import gzip
with open('/home/joe/file.txt', 'rb') as f_in:
    with gzip.open('/home/joe/file.txt.gz', 'wb') as f_out:
        f_out.writelines(f_in)

which is clearly sub-optimal because it is line-based: writelines() iterates over the binary input file, which splits the data at newline bytes.

An equally simple, but more efficient recipe would be:

import gzip

chunk_size = 1024  # bytes read per iteration
with open('/home/joe/file.txt', 'rb') as f_in:
    with gzip.open('/home/joe/file.txt.gz', 'wb') as f_out:
        while True:
            c = f_in.read(chunk_size)
            if not c:  # empty bytes object signals end of file
                break
            f_out.write(c)

Comparing the two examples, I find a >= 2x performance gain (both in terms of CPU time and wall time).
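
For illustration only (this is not part of the original report): the same fixed-size read/write loop can also be written with shutil.copyfileobj from the standard library, which performs a chunked copy between two file objects. The function name gzip_file and the 64 KiB default chunk size below are hypothetical choices for this sketch, and no timings are claimed for it.

```python
import gzip
import shutil


def gzip_file(src_path, dst_path, chunk_size=64 * 1024):
    """Compress src_path into dst_path, copying in fixed-size chunks.

    gzip_file and its default chunk_size are illustrative names/values,
    not something taken from the gzip documentation.
    """
    with open(src_path, 'rb') as f_in:
        with gzip.open(dst_path, 'wb') as f_out:
            # copyfileobj reads chunk_size bytes at a time from f_in and
            # writes them to f_out until f_in is exhausted.
            shutil.copyfileobj(f_in, f_out, chunk_size)
```

A call such as gzip_file('/home/joe/file.txt', '/home/joe/file.txt.gz') would then correspond to the recipe above.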

In the inverse scenario of file *de*-compression (which is not covered by the docs, though), the performance gain from replacing:

with gzip.open('/home/joe/file.txt.gz', 'rb') as f_in:
    with open('/home/joe/file.txt', 'wb') as f_out:
        f_out.writelines(f_in)

with:

with gzip.open('/home/joe/file.txt.gz', 'rb') as f_in:
    with open('/home/joe/file.txt', 'wb') as f_out:
        while True:
            c = f_in.read(chunk_size)  # chunk_size as defined above
            if not c:
                break
            f_out.write(c)

is even higher (a 4-5x speed-up).

In the de-compression case, another >= 2x speed-up can be achieved by avoiding the gzip module completely and going through a zlib.decompressobj instead. Of course, this is a bit more complicated and should be documented in the zlib docs rather than the gzip docs (if you're interested, I could provide my code for it, though).
When the zlib library is used directly, compression/decompression speed becomes comparable to that of Linux gzip/gunzip.
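
As a rough sketch of what such a zlib-based approach might look like (this is not the code offered above, which is not included in this message): a zlib.decompressobj created with wbits set to 16 + zlib.MAX_WBITS accepts gzip-framed input, so the file can be fed through it chunk by chunk. The function name gunzip_file and the default chunk size are hypothetical, and the sketch assumes the input holds a single gzip member; data after the first member would be ignored.

```python
import zlib


def gunzip_file(src_path, dst_path, chunk_size=64 * 1024):
    """Decompress a single-member gzip file using zlib directly.

    gunzip_file and its default chunk_size are illustrative choices.
    """
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer
    # instead of a bare zlib stream.
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    with open(src_path, 'rb') as f_in:
        with open(dst_path, 'wb') as f_out:
            while True:
                chunk = f_in.read(chunk_size)
                if not chunk:
                    break
                f_out.write(decomp.decompress(chunk))
            # Write out anything still buffered inside the decompressor.
            f_out.write(decomp.flush())
```

Unlike the gzip module, this sketch does not handle concatenated gzip members, which is part of why it is "a bit more complicated" to get right in general.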

----------
assignee: docs at python
components: Documentation
messages: 215440
nosy: docs at python, wolma
priority: normal
severity: normal
status: open
title: update gzip usage examples in docs
type: performance
versions: Python 3.2, Python 3.3, Python 3.4

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue21146>
_______________________________________

