[pypy-issue] Issue #2071: file.readinto() uses too much memory (pypy/pypy)

Andrew Dalke issues-reply at bitbucket.org
Thu Jun 25 03:18:09 CEST 2015


New issue 2071: file.readinto() uses too much memory
https://bitbucket.org/pypy/pypy/issue/2071/filereadinto-uses-too-much-memory

Andrew Dalke:

I am using CFFI to read a file containing 7 GB of uint64_t data. I use ffi.new() to allocate the space, then readinto() to fill the pre-allocated buffer, as suggested by the CFFI documentation.
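Roughly, the pattern looks like this (a minimal sketch for illustration, not my exact code; the filename matches the transcript below):

```
#!python

import os
import cffi

ffi = cffi.FFI()
filename = "pubchem.14"
count = os.path.getsize(filename) // 8     # number of uint64_t values in the file
data = ffi.new("uint64_t[]", count)        # pre-allocate ~7 GB of storage
with open(filename, "rb") as infile:
    infile.readinto(ffi.buffer(data))      # fill the buffer in place
```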

(Note: the docstring for readinto says "Undocumented. Don't use this; it may go away".)

It appears that something internal to readinto() makes a copy of the buffer, because the call ends up running out of memory on my 16 GB box, which has 15 GB free.

I am able to reproduce the problem using the array module, so it is not some oddity of the CFFI implementation. Here is an example of what causes a problem on my machine:

```
#!python

>>>> # 's' was elided above; a 2 GiB string reproduces the sizes shown below
>>>> s = "\0" * (2 * 1024 ** 3)
>>>> import array
>>>> a = array.array("c", s)
>>>> a.extend(s)
>>>> a.extend(s)

# do some cleanup, to be on the safe side.
>>>> del s
>>>> import gc
>>>> gc.collect()
0

# Read ~6GB from a file with >7GB in it
>>>> len(a)
6442450944
>>>> filename = "pubchem.14"
>>>> import os
>>>> os.path.getsize(filename)
7662345264
>>>> infile = open(filename, "rb")

# Currently, virtual memory size = 8.87 GB
>>>> infile.readinto(a)
^CTerminated

# I killed it when the virtual memory was at 14 GB and still growing

```
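Until this is fixed, one possible workaround (my assumption, untested against PyPy's internals) is to fill the buffer in bounded chunks through a memoryview, so that any temporary copy the I/O layer makes stays small:

```
#!python

# Hedged workaround sketch: chunked readinto() via memoryview slices.
# CHUNK and readinto_chunked are hypothetical names, and this assumes the
# interpreter accepts a memoryview slice as the readinto() target.
CHUNK = 64 * 1024 * 1024   # 64 MB per read

def readinto_chunked(infile, buf):
    """Fill the writable buffer 'buf' from 'infile' in CHUNK-sized reads."""
    view = memoryview(buf)
    total = 0
    while total < len(view):
        n = infile.readinto(view[total:total + CHUNK])
        if not n:
            break  # EOF before the buffer was full
        total += n
    return total
```

With that, readinto_chunked(infile, a) would stand in for infile.readinto(a) in the transcript above.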