[pypy-dev] file reading/writing speed
Hakan Ardo
hakan at debian.org
Tue Aug 9 08:22:19 CEST 2011
On Tue, Aug 9, 2011 at 5:07 AM, Josh Ayers <josh.ayers at gmail.com> wrote:
>
> I generated a 1 million element array using the built-in array module, wrote
> it to disk, and then read it back in. See http://pastie.org/2342676 for the
> code.
>
> Each operation was slower with PyPy than with CPython.
>
> * Array creation - CPython: 0.16s - PyPy: 0.47s
What's taking time here is iterating over the range list and
unwrapping all the integers. If all you want is to allocate an array,
it's significantly faster (both on PyPy and on CPython) to do:
a = array.array(outputDataType,[0]) * dataSize
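As a rough illustration of the difference, here is a sketch that times
both ways of building the array. The names outputDataType and dataSize
follow the line above; their values here ('l' and one million) and the
two helper functions are assumptions of mine, not taken from the pastie.
Note that the two results differ in content (0..n-1 versus all zeros),
so this only compares allocation cost. Timings will vary by interpreter
and machine.

```python
import array
import time

outputDataType = 'l'   # assumed typecode; the original script may use another
dataSize = 1000000     # assumed: the 1 million elements mentioned above

def build_from_range(typecode, n):
    # Iterates over n Python ints and unwraps each one into the array.
    return array.array(typecode, range(n))

def build_by_repeat(typecode, n):
    # Allocates a one-element array and repeats it; no per-item unwrapping.
    return array.array(typecode, [0]) * n

for build in (build_from_range, build_by_repeat):
    start = time.perf_counter()
    a = build(outputDataType, dataSize)
    elapsed = time.perf_counter() - start
    print(build.__name__, len(a), elapsed)
```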
> * Writing file - CPython: 0.05s - PyPy: 0.11s
> * Reading file: CPython: 0.02s - PyPy: 0.08s
The built-in array module uses space.call_method(w_f, 'write') and
space.call_method(w_f, 'read') to implement tofile and fromfile,
respectively. For fromfile that means copying the data at least once,
and maybe that's what's going on with tofile too. I don't know how hard
it would be to add a fast path for common cases that reads/writes data
directly from/into the array buffer?
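To make the copy concrete, here is an application-level analogy in
plain Python (not the interpreter-level code in PyPy's array module).
The first function reads into an intermediate bytes object and then
copies it into the array; the second preallocates the array and lets
the file fill its buffer directly. The function names, the 'l'
typecode, and the temporary file are assumptions for illustration, and
the sketch relies on Python 3's frombytes and on readinto accepting an
array.

```python
import array
import os
import tempfile

def read_with_copy(path, typecode, n):
    # f.read() builds an intermediate bytes object that is then
    # copied into the array.
    a = array.array(typecode)
    with open(path, 'rb') as f:
        a.frombytes(f.read(n * a.itemsize))
    return a

def read_in_place(path, typecode, n):
    # Preallocate the array and let the file fill its buffer directly.
    a = array.array(typecode, [0]) * n
    with open(path, 'rb') as f:
        got = f.readinto(a)
    if got != n * a.itemsize:
        raise EOFError("file shorter than expected")
    return a

# Write a small array to a temporary file and read it back both ways.
src = array.array('l', range(1000))
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, 'wb') as f:
        src.tofile(f)
    copied = read_with_copy(path, 'l', 1000)
    in_place = read_in_place(path, 'l', 1000)
finally:
    os.remove(path)

same = (copied == src) and (in_place == src)
print(same)
```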
>
> This method won't quite work for me in any case - I need to store 64 bit
> integers, and the built-in array module doesn't support them. To get around
> that, I modified the pure-python array.py that comes in the pypy\lib_pypy
> directory. I added a "q" to the end of the line "TYPECODES = ..." which
> represents a 64 bit signed integer within the struct module. I saved that
> modified file as array2.py and imported it in place of the built-in array.
> See http://pastie.org/2342721 for the code.
>
> That allowed me to use a 64 bit integer, but the array creation step was
> again much slower on PyPy than it was on CPython. The disk accessing steps
> were more similar, and are probably at about the limit of the hard disk
> anyway, but creating the array takes much longer under PyPy.
Why do you think this is limited by the hard disk? I would expect this
approach to be slower than using the built-in module. Did you try it
with a datatype supported by the built-in module, to compare the
performance of the two approaches?
--
Håkan Ardö