[pypy-issue] Issue #2071: file.readinto() uses too much memory (pypy/pypy)
issues-reply at bitbucket.org
Thu Jun 25 03:18:09 CEST 2015
New issue 2071: file.readinto() uses too much memory
I am using CFFI to read a file containing 7 GB of uint64_t data. I use ffi.new() to allocate the space, then readinto() the pre-allocated buffer, as suggested by the CFFI documentation.
(Note: the docstring for readinto says "Undocumented. Don't use this; it may go away".)
It appears that something internal to readinto() makes a copy of the input, because readinto() ends up running out of memory on my 16 GB box, which has 15 GB free.
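A naive implementation of readinto() would explain this kind of doubling: if the interpreter first read()s the data into a temporary string and only then copies it into the caller's buffer, peak memory is roughly twice the requested size. The sketch below is an illustration of that hypothesis, not PyPy's actual code; the helper name is made up.

```python
def naive_readinto(f, buf):
    # Hypothetical "copying" readinto(): read into a temporary
    # object first, then copy into the caller's buffer. The
    # temporary is a second, full-size allocation, so peak memory
    # is roughly twice the buffer size.
    data = f.read(len(buf))
    buf[:len(data)] = data
    return len(data)
```

A genuinely in-place readinto() would instead hand the caller's buffer directly to the read syscall, never materializing the temporary.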
I am able to reproduce the problem using the array module, so it is not some oddity of the CFFI implementation. Here is an example of what causes a problem on my machine:
# Allocate a ~6 GB string to build the buffer from
>>>> s = "\0" * (6 * 1024 ** 3)
>>>> import array
>>>> a = array.array("c", s)
# do some cleanup, to be on the safe side
>>>> del s
>>>> import gc
>>>> gc.collect()
# Read ~6GB from a file with >7GB in it
>>>> filename = "pubchem.14"
>>>> import os
>>>> infile = open(filename, "rb")
# At this point, virtual memory size = 8.87 GB
>>>> infile.readinto(a)
# I killed it when the virtual memory was at 14 GB and still growing
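For contrast, the expected behavior is that file.readinto() fills the pre-allocated buffer in place, so peak memory stays at roughly one buffer's worth. Here is a small-scale sketch of the same pattern; the temporary file and sizes are illustrative stand-ins for the 7 GB input, and the "B" typecode is used because "c" exists only on Python 2.

```python
import array
import os
import tempfile

# Write 4 KiB of known data to a scratch file (a stand-in for the
# multi-gigabyte input in the report).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\x01\x02\x03\x04" * 1024)
    path = f.name

# Pre-allocate the destination buffer, then fill it in place.
buf = array.array("B", bytes(1024))
with open(path, "rb") as infile:
    n = infile.readinto(buf)  # returns the number of bytes read

os.remove(path)
```

With an in-place readinto(), reading N bytes should cost one N-byte allocation (made up front by the caller), not two.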