[pypy-dev] file reading/writing speed

Josh Ayers josh.ayers at gmail.com
Tue Aug 9 05:07:30 CEST 2011


Sorry if this is a dumb question - I'm very new to PyPy, but also very
interested in using it in several applications.

I currently use PyTables and NumPy in a particular application, and I was
playing around with the idea of migrating it to PyPy.  I don't really need
the querying or compression capabilities of PyTables, and most processing of
the data is done in pure Python code, so I was hoping to see a speed-up with
PyPy.

I ran some tests yesterday on storing data on disk in flat binary files, and
PyPy was significantly slower than CPython.  I haven't done any comparisons
with PyTables yet.  Note that I'm running CPython 2.7.2 and PyPy 1.5 on a
rather old Windows XP machine.

I generated a 1 million element array using the built-in array module, wrote
it to disk, and then read it back in.  See http://pastie.org/2342676 for the
code.
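
For readers who cannot reach the pastie link, here is a minimal sketch of this kind of test. It is not the exact code from the link: the function name `bench_array`, the "l" typecode, and the use of a temporary file are assumptions made for illustration.

```python
import array
import os
import tempfile
import time

N = 1000000  # one million elements, as in the test described above


def bench_array(n=N, typecode="l"):
    """Time array creation, file writing, and file reading.

    Returns (timings, ok), where timings maps each step to elapsed seconds
    and ok is True if the data read back equals the data written.
    """
    timings = {}

    # Step 1: build the array in memory.
    start = time.time()
    data = array.array(typecode, range(n))
    timings["create"] = time.time() - start

    # Hypothetical scratch file; any writable path would do.
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        # Step 2: dump the raw machine values to a flat binary file.
        start = time.time()
        with open(path, "wb") as f:
            data.tofile(f)
        timings["write"] = time.time() - start

        # Step 3: read the same number of items back into a new array.
        start = time.time()
        result = array.array(typecode)
        with open(path, "rb") as f:
            result.fromfile(f, n)
        timings["read"] = time.time() - start
    finally:
        os.remove(path)

    return timings, result == data


if __name__ == "__main__":
    timings, ok = bench_array()
    for step in ("create", "write", "read"):
        print("%-6s %.3fs" % (step, timings[step]))
    print("round trip ok: %s" % ok)
```

A single run like this is noisy, and under PyPy it includes JIT warm-up, so repeating each step several times would probably give a fairer comparison.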

Each operation was slower with PyPy than with CPython.

 * Array creation - CPython: 0.16s - PyPy: 0.47s
 * Writing file - CPython: 0.05s - PyPy: 0.11s
 * Reading file - CPython: 0.02s - PyPy: 0.08s

This method won't quite work for me in any case - I need to store 64-bit
integers, and the built-in array module doesn't support them.  To get around
that, I modified the pure-Python array.py that comes in the pypy\lib_pypy
directory.  I added a "q" to the end of the line "TYPECODES = ...".  In the
struct module, "q" is the format character for a 64-bit signed integer.  I
saved the modified file as array2.py and imported it in place of the built-in
array.  See http://pastie.org/2342721 for the code.
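
To show what the "q" format character does on its own, here is a small sketch that packs and unpacks 64-bit signed integers with the struct module directly. This is a different technique from the array2.py modification, and the helper names `pack_int64` and `unpack_int64` are hypothetical. It assumes little-endian byte order, selected with "<", which also makes "q" exactly 8 bytes on every platform.

```python
import struct

ITEM_SIZE = 8  # bytes per item for "q" in standard ("<") mode


def pack_int64(values):
    """Pack a sequence of ints into little-endian 64-bit signed binary data."""
    # A count prefix repeats the format character, e.g. "<3q" for 3 items.
    return struct.pack("<%dq" % len(values), *values)


def unpack_int64(raw):
    """Inverse of pack_int64: turn raw bytes back into a list of ints."""
    count = len(raw) // ITEM_SIZE
    return list(struct.unpack("<%dq" % count, raw))


if __name__ == "__main__":
    sample = [0, 1, -1, 2 ** 63 - 1, -2 ** 63]
    raw = pack_int64(sample)
    print(len(raw))                     # 5 items * 8 bytes
    print(unpack_int64(raw) == sample)  # round trip check
```

Packing a million values in one call passes them all as arguments, so for large data it may be better to pack in chunks.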

That allowed me to use 64-bit integers, but the array creation step was
again much slower on PyPy than on CPython.  The disk access steps were
closer, and are probably near the limit of the hard disk anyway.

 * Array creation - CPython: 0.31s - PyPy: 1.42s
 * Writing file - CPython: 0.16s - PyPy: 0.25s
 * Reading file - CPython: 0.83s - PyPy: 0.13s

Any ideas on what could be causing this speed difference?  Am I doing
anything egregiously stupid in my code?  Any ideas on better methods for
efficiently storing and retrieving binary data from disk under PyPy?

Thanks in advance for your help.

Sincerely,
Josh Ayers