Python vs. Java gzip performance
Peter Otten
__peter__ at web.de
Fri Mar 17 16:26:00 EST 2006
Caleb Hattingh wrote:
> I tried this:
>
> from timeit import *
>
> #Try readlines
> print Timer('import
> gzip;lines=gzip.GzipFile("gztest.txt.gz").readlines();[i+"1" for i in
> lines]').timeit(200) # This is one line
>
>
> # Try file object - uses buffering?
> print Timer('import gzip;[i+"1" for i in
> gzip.GzipFile("gztest.txt.gz")]').timeit(200) # This is one line
>
> Produces:
>
> 3.90938591957
> 3.98982691765
>
> Doesn't seem much difference, probably because the test file easily
> gets into memory, and so disk buffering has no effect. The file
> "gztest.txt.gz" is a gzipped file with 1000 lines, each being "This is
> a test file".
$ python -c"file('tmp.txt', 'w').writelines('%d This is a test\n' % n for n in range(1000))"
$ gzip tmp.txt
Now, if you follow Martin's advice:
$ python -m timeit -s"from gzip import GzipFile" "GzipFile('tmp.txt.gz').readlines()"
10 loops, best of 3: 20.4 msec per loop
$ python -m timeit -s"from gzip import GzipFile" "GzipFile('tmp.txt.gz').read().splitlines(True)"
1000 loops, best of 3: 534 usec per loop
That's a factor of 38. Not bad, I'd say :-)
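For anyone who wants to repeat the comparison as a script rather than from the shell, here is a rough sketch in Python 3 syntax. It is not part of the original measurements: the temporary directory and the function names via_readlines and via_splitlines are made up for illustration, and the numbers it prints will depend on the Python version and machine, so the gap may look very different from the one above.

```python
import gzip
import os
import tempfile
import timeit

# Hypothetical location; the commands above used tmp.txt.gz in the current directory.
path = os.path.join(tempfile.mkdtemp(), "tmp.txt.gz")

# Same content as the shell one-liner: 1000 lines of "<n> This is a test".
with gzip.open(path, "wb") as f:
    f.writelines(b"%d This is a test\n" % n for n in range(1000))


def via_readlines():
    # Let the gzip file object split the decompressed data into lines.
    with gzip.open(path, "rb") as f:
        return f.readlines()


def via_splitlines():
    # Decompress everything in one call, then split in memory.
    # True keeps the line endings, so the result matches readlines()
    # for data that only uses "\n" as a line terminator.
    with gzip.open(path, "rb") as f:
        return f.read().splitlines(True)


for func in (via_readlines, via_splitlines):
    # Best total time of 3 runs of 100 calls each, reported per call.
    best = min(timeit.repeat(func, number=100, repeat=3))
    print("%s: %.1f usec per loop" % (func.__name__, best / 100 * 1e6))
```

Note that read() pulls the whole decompressed file into memory at once, which is fine for a 1000-line test file but may not be what you want for very large inputs.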
Peter