[pypy-issue] Issue #2071: file.readinto() uses too much memory (pypy/pypy)

Andrew Dalke issues-reply at bitbucket.org
Thu Jun 25 03:18:09 CEST 2015

New issue 2071: file.readinto() uses too much memory

Andrew Dalke:

I am using CFFI to read a file containing 7 GB of uint64_t data. I use ffi.new() to allocate the space, then readinto() the pre-allocated buffer, as suggested by the CFFI documentation. 
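The pre-allocate-then-readinto() pattern described above can be sketched as follows. This is not the reporter's exact code: it is a minimal Python 3 sketch that uses a stdlib bytearray as a stand-in for the ffi.new() allocation (CFFI's ffi.buffer() exposes such an allocation as a similar writable buffer), and a small temporary file in place of the 7 GB data file.

```python
import os
import tempfile

# Create a small sample file standing in for the real data file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"\x01\x02\x03\x04" * 1024)  # 4 KiB of sample bytes

# Pre-allocate a buffer of the full file size, then fill it in place.
size = os.path.getsize(path)
buf = bytearray(size)  # stand-in for ffi.buffer(ffi.new("uint64_t[]", n))
with open(path, "rb") as infile:
    nread = infile.readinto(buf)  # should write directly into buf

print(nread == size)   # True: the whole file was read in one call
os.remove(path)
```

In principle readinto() should need no memory beyond the pre-allocated buffer itself, which is the property the report below shows being violated.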

(Note: the docstring for readinto says "Undocumented. Don't use this; it may go away".)

It appears that something internal to readinto() makes an extra copy of the data, because the readinto() call ends up running out of memory on my 16 GB box, which has 15 GB free.

I am able to reproduce the problem using the array module, so it is not some oddity of the CFFI implementation. Here is an example of what causes a problem on my machine:


# (s is a large string, roughly 2 GB given the ~6 GB total below;
# its construction was elided from the report)
>>>> import array
>>>> a = array.array("c", s)
>>>> a.extend(s)
>>>> a.extend(s)

# do some cleanup, to be on the safe side.
>>>> del s
>>>> import gc
>>>> gc.collect()

# Read ~6GB from a file with >7GB in it
>>>> len(a)
>>>> filename = "pubchem.14"
>>>> import os
>>>> os.path.getsize(filename)
>>>> infile = open(filename, "rb")

# Currently, virtual memory size = 8.87 GB
>>>> infile.readinto(a)

# I killed it when the virtual memory was at 14 GB and still growing
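Not part of the original report, but a common way to bound the extra memory while a readinto() implementation misbehaves like this is to fill the pre-allocated buffer in fixed-size chunks through a memoryview, so that any internal copy stays limited to one chunk. A hedged Python 3 sketch (the helper name and chunk size are illustrative, not from the report):

```python
import os
import tempfile

def readinto_chunked(infile, buf, chunk_size=16 * 1024 * 1024):
    """Fill buf from infile in chunk_size pieces via a memoryview.

    Each readinto() call sees only a small writable slice, so any
    copy made internally is bounded by chunk_size rather than by
    the full buffer. Returns the total number of bytes read.
    """
    view = memoryview(buf)
    total = 0
    while total < len(buf):
        n = infile.readinto(view[total:total + chunk_size])
        if not n:          # EOF before the buffer was full
            break
        total += n
    return total

# Demo on a small temporary file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"ab" * 1000)          # 2000 bytes
buf = bytearray(2000)
with open(path, "rb") as infile:
    total = readinto_chunked(infile, buf, chunk_size=256)
print(total)                       # 2000
os.remove(path)
```

The same slicing works on a CFFI buffer, since ffi.buffer() objects support memoryview() as well.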

