python reading file memory cost
Peter Otten
__peter__ at web.de
Tue Aug 2 04:26:22 EDT 2011
Chris Rebert wrote:
>> The result was that reading a 500 MB file consumed almost 2 GB of RAM;
>> I cannot figure out why. Can somebody help?
>
> If you could store the floats themselves, rather than their string
> representations, that would be more space-efficient. You could then
> also use the `array` module, which is more space-efficient than lists
> (http://docs.python.org/library/array.html). NumPy would also be
> worth investigating since multidimensional arrays are involved.
>
> The next obvious question would then be: do you /really/ need /all/ of
> the data in memory at once?
This is what you (the OP) should think hard about before resorting to the
optimizations mentioned above. Perhaps you can explain what you are doing
with the data once you've loaded it into memory?
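For scale, here's a minimal sketch of the array-based loading Chris
suggests, assuming a whitespace-separated text file of floats (the name
"data.txt" is made up). It reads the file line by line, so only one line's
worth of strings is alive at a time, and each value ends up costing 8 bytes
of payload instead of a whole string object:

import array

values = array.array("d")           # flat array of C doubles
with open("data.txt") as f:         # hypothetical input file
    for line in f:
        values.extend(float(x) for x in line.split())

With NumPy, numpy.loadtxt("data.txt") would build a two-dimensional
float64 array from the same file directly.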
> Also, just so you're aware:
> http://docs.python.org/library/sys.html#sys.getsizeof
To give you an idea of how memory usage explodes:
>>> import sys
>>> line = "1.23 4.56 7.89 0.12\n"
>>> len(line) # size in the file
20
>>> sys.getsizeof(line)
60
>>> formatted = ["%2.6E" % float(x) for x in line.split()]
>>> sys.getsizeof(formatted) + sum(sys.getsizeof(s) for s in formatted)
312
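For comparison, a sketch of what the same four values cost when stored as
machine doubles via the array module (I'm omitting the exact output because
getsizeof results vary across Python versions and platforms):
>>> from array import array
>>> values = array("d", (float(x) for x in line.split()))
>>> sys.getsizeof(values)  # 4 * 8 = 32 bytes of data plus a small fixed header
That's a fraction of the 312 bytes above, and the gap grows with the number
of values per line.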