[pypy-issue] [issue770] bzip2 decompression significantly slower than on CPython

Xavier Morel tracker at bugs.pypy.org
Wed Jun 29 13:41:31 CEST 2011

New submission from Xavier Morel <bugs.pypy.org at masklinn.net>:

Using a clone of PyPy's hg repo (working copy included) as the contents of the test archives, decompressing to the filesystem using `tarfile`.

Test archives were created using bsdtar with default options (`tar cjf` and `tar czf`); likewise, tar's decompression baseline used default options (`tar xf` in both cases).
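For anyone reproducing this without bsdtar, a rough Python counterpart of `tar czf` / `tar cjf` is sketched below. It is an assumption-laden sketch, not what the report used: `make_archives` and the `sample` file names are hypothetical, and the output is not byte-identical to bsdtar's.

```python
import os
import tarfile
import tempfile


def make_archives(src_dir, out_dir=None):
    """Pack src_dir into .tar.gz and .tar.bz2 archives (hypothetical helper).

    Roughly mirrors `tar czf` / `tar cjf`; returns {suffix: archive path}.
    """
    out_dir = out_dir or tempfile.mkdtemp()
    # Strip any trailing separator so basename() gives the directory name.
    root_name = os.path.basename(os.path.normpath(src_dir))
    paths = {}
    for mode, suffix in (("w:gz", ".tar.gz"), ("w:bz2", ".tar.bz2")):
        path = os.path.join(out_dir, "sample" + suffix)
        with tarfile.open(path, mode) as tar:
            # arcname keeps member paths relative to the archive root.
            tar.add(src_dir, arcname=root_name)
        paths[suffix] = path
    return paths
```

Pointing `make_archives` at a PyPy checkout yields one archive per compressor, so the gzip and bzip2 runs extract identical contents.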

hg id of local Pypy clone is 27df060341f0 tip

OS is OSX 10.6.8

Decompressors tested:
* CPython is Python 2.7.2
* Pypy 1.5 is Python 2.7.1 (?, May 22 2011, 11:59:12) [PyPy 1.5.0-alpha0 with GCC 4.0.1] from macports
* Pypy trunk is Pypy-65b1ed60d7da from nightlies
* Tar is bsdtar 2.6.2 - libarchive 2.6.2

CPython and PyPy ran the exact same script, which can be found at the end of this report.

All measurements were taken with `time` and are given in minutes:seconds; they are the decompression (extraction) times.

First I tested the behavior for gzipped files, in order to get an idea of what I could expect:
* tar: 0:19
* CPython: 0:31
* Pypy 1.5: 0:47
* Pypy trunk: 0:43

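PyPy trunk takes about 40% longer than CPython (0:43 vs 0:31) and PyPy 1.5 about 50% longer (0:47), while CPython itself takes about 60% longer than the native tar (0:31 vs 0:19).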
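As a quick check, the reported minutes:seconds timings can be turned into ratios with a small helper. The function names are illustrative; the timings are the ones listed above.

```python
def to_seconds(mmss):
    """Convert a 'minutes:seconds' string such as '0:47' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def slowdown(slow, fast):
    """Return how many times longer `slow` took than `fast`."""
    return to_seconds(slow) / float(to_seconds(fast))


# gzip timings as reported above
gzip_times = {"tar": "0:19", "CPython": "0:31",
              "Pypy 1.5": "0:47", "Pypy trunk": "0:43"}

print("%.2f" % slowdown(gzip_times["Pypy 1.5"], gzip_times["CPython"]))    # 1.52
print("%.2f" % slowdown(gzip_times["Pypy trunk"], gzip_times["CPython"]))  # 1.39
print("%.2f" % slowdown(gzip_times["CPython"], gzip_times["tar"]))         # 1.63
```

The same helper applies to the bzip2 numbers below: `slowdown("2:58", "1:10")` is about 2.54.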

Then I tested using a bz2-compressed archive:
* tar: 0:54
* CPython: 1:10
* Pypy 1.5: hard crash
* Pypy trunk: 2:58

Here, PyPy trunk takes about 2.5 times as long as CPython (2:58 vs 1:10), a much larger gap than with gzip. I believe this may be a source of performance issues when installing bz2-packed modules via pip.
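One way to check whether the gap lies in bz2 decompression itself rather than in `tarfile` or filesystem writes is to time the raw codecs on in-memory data. This is a hypothetical benchmark that is not part of the report. `time_decompress` and any payload passed to it are illustrative, and gzip is included only as a reference point.

```python
import bz2
import gzip
import io
import timeit


def time_decompress(payload, repeat=3):
    """Best-of-`repeat` seconds to decompress `payload` with bz2 and gzip.

    Everything happens in memory, so tarfile's member handling and the
    filesystem writes are excluded from the measurement.
    """
    bz2_blob = bz2.compress(payload)
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as f:
        f.write(payload)
    gz_blob = buf.getvalue()

    def run_bz2():
        return bz2.decompress(bz2_blob)

    def run_gzip():
        with gzip.GzipFile(fileobj=io.BytesIO(gz_blob), mode="rb") as f:
            return f.read()

    # Verify round-tripping once, outside the timed region.
    assert run_bz2() == payload and run_gzip() == payload
    return {
        "bz2": min(timeit.repeat(run_bz2, number=1, repeat=repeat)),
        "gzip": min(timeit.repeat(run_gzip, number=1, repeat=repeat)),
    }
```

Running e.g. `print(time_decompress(b"pypy " * 2000000))` under both CPython and PyPy and comparing the `bz2` entries would show whether the slowdown persists without `tarfile` involved.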

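Decompression script:
import sys
import tarfile

# Mode "r" (the default) auto-detects gzip or bzip2 compression.
tar = tarfile.open(sys.argv[1])
tar.extractall()  # extract all members into the current directory
tar.close()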
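To time the extraction in-process rather than with the shell's `time` (which also counts interpreter start-up), a sketch like the following could be used. `time_extract` is a hypothetical helper: it opens the archive and extracts every member into a scratch directory, as the report describes, then deletes the extracted files.

```python
import shutil
import tarfile
import tempfile
import timeit


def time_extract(archive_path):
    """Seconds spent opening archive_path and extracting all members.

    Extraction goes to a temporary directory that is removed afterwards.
    """
    dest = tempfile.mkdtemp()
    try:
        start = timeit.default_timer()
        tar = tarfile.open(archive_path)  # default mode "r" detects gz/bz2
        try:
            tar.extractall(dest)
        finally:
            tar.close()
        return timeit.default_timer() - start
    finally:
        shutil.rmtree(dest)  # discard the extracted files
```

Calling it as `print("%.1f s" % time_extract(sys.argv[1]))` under each interpreter gives numbers comparable across runs without shell overhead.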

messages: 2703
nosy: masklinn, pypy-issue
priority: bug
status: unread
title: bzip2 decompression significantly slower than on CPython

PyPy bug tracker <tracker at bugs.pypy.org>
