[pypy-dev] Speeding up zlib in standard library

Peter Cock p.j.a.cock at googlemail.com
Mon Mar 19 16:49:13 CET 2012


(Replying on the list - I assume Justin's going off list was my mistake)

On Fri, Mar 16, 2012 at 4:54 AM, Justin Peel <peelpy at gmail.com> wrote:
> Two things to mention. First, if you are going to use valgrind, you
> will need to build your own pypy because, as far as I know, the
> buildbot ones do not have the debug info so you won't have any helpful
> function names in your profile. If I remember correctly, most of the
> time used is in _operate in rzlib.py.
>
> Also, in my opinion, the next thing to try in speeding up this code is
> to do a faster copy than a char by char copy for copying to the input
> buffer that is sent to the external C function. I'm not sure if the
> copying from the output buffer to the string builder is char by char
> or not.

Looking at the code for rzlib.py, _operate does seem to be the
core function, and so likely the hot spot.
https://bitbucket.org/pypy/pypy/src/default/pypy/rlib/rzlib.py

It uses the StringBuilder class from rstring via the append_charpsize
method, defined as follows:
https://bitbucket.org/pypy/pypy/src/default/pypy/rlib/rstring.py

    def append_charpsize(self, s, size):
        l = []
        for i in xrange(size):
            l.append(s[i])
        self.l.append(self.tp("").join(l))
        self._grow(size)

So it is indeed doing a char-by-char copy of the string from Python
to zlib (in my case, to decompress a long chunk of data). I don't
know enough about PyPy's internals to say whether something naive
like this would be faster (guessing, based on the append_slice
method):

    def append_charpsize(self, s, size):
        assert 0 <= size
        self.l.append(s[0:size])
        self._grow(size)
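For intuition only, the difference between the two copy strategies can be
sketched in plain CPython. This is just an analogy, not RPython: in rzlib the
source is a raw C char pointer, not a bytes object, and the function names
below are made up for illustration:

```python
import timeit

# Stand-in for the output buffer returned by the C zlib call.
data = b"x" * (1 << 18)  # 256 KiB

def copy_char_by_char(s, size):
    # Mirrors the current append_charpsize: one list append per byte,
    # then a single join.
    l = []
    for i in range(size):
        l.append(s[i:i + 1])
    return b"".join(l)

def copy_slice(s, size):
    # The proposed bulk copy: a single slice of the buffer.
    return s[0:size]

# Both strategies produce the same bytes.
assert copy_char_by_char(data, len(data)) == copy_slice(data, len(data))

per_char = timeit.timeit(lambda: copy_char_by_char(data, len(data)), number=5)
bulk = timeit.timeit(lambda: copy_slice(data, len(data)), number=5)
print("char-by-char: %.4fs, slice: %.4fs" % (per_char, bulk))
```

On CPython the slice version is dramatically faster, since the copy happens
in one memcpy-like operation instead of a Python-level loop; whether the same
holds after RPython translation is exactly the open question here.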

Presumably, to try this idea out, I'll first need to get
PyPy to build locally?

Thanks,

Peter
