[Tutor] Load Entire File into memory

Steven D'Aprano steve at pearwood.info
Mon Nov 4 17:30:10 CET 2013


On Mon, Nov 04, 2013 at 07:00:29PM +0530, Amal Thomas wrote:
> Yes I have found that after loading to RAM and then reading lines by lines
> saves a huge amount of time since my text files are very huge.

This is remarkable, and quite frankly incredible. I wonder whether you 
are misinterpreting what you are seeing? Under normal circumstances, 
with all but quite high-end machines, trying to read a 50GB file into 
memory all at once will be effectively impossible. Suppose your computer 
has 24GB of RAM. The OS and other running applications can be expected 
to use some of that, but even ignoring this, it is impossible to read a 
50GB file into memory all at once with only 24GB.
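As a quick sanity check before anything else, you can compare the file's 
size against the machine's physical RAM. Here is a minimal sketch, 
assuming a POSIX system (these os.sysconf names are not available on 
Windows):

import os

filename = "YOUR FILE NAME HERE"
file_size = os.stat(filename).st_size
# Total physical memory = page size * number of physical pages.
page_size = os.sysconf("SC_PAGE_SIZE")
phys_pages = os.sysconf("SC_PHYS_PAGES")
total_ram = page_size * phys_pages
print("File size:", file_size, "bytes")
print("Physical RAM:", total_ram, "bytes")
if file_size > total_ram:
    print("This file cannot possibly fit in RAM all at once.")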

What I would expect is that unless you have *at least* twice as much 
memory as the size of the file (in this case, at least 100GB), either 
Python will give you a MemoryError, or the operating system will try 
paging memory into swap-space, which is *painfully* slooooow. (Twice, 
because building the string typically involves temporary copies while 
reading.) I've been in the situation where I accidentally tried reading 
a file bigger than the installed RAM: the machine ran overnight (14+ 
hours), locked up and stopped responding, and I finally had to unplug 
the power and restart it.
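
For comparison, the usual memory-safe way to process a huge file is to 
iterate over it line by line, so that only one line is held in memory 
at a time. A sketch, where process_line is a hypothetical stand-in for 
whatever work you do with each line:

def process_line(line):
    # Hypothetical placeholder for the real per-line work.
    pass

# Iterating over the file object reads one line at a time,
# no matter how big the file is.
with open("YOUR FILE NAME HERE") as f:
    for line in f:
        process_line(line)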

So unless you have 100+ GB in your computer, which would put it in 
seriously high-end server class, I find it difficult to believe that 
you are actually reading the entire file into memory.

Please try this little bit of code, replacing the file name with the 
actual name of your 50GB data file:

import os

filename = "YOUR FILE NAME HERE"
# Size of the file on disk, in bytes.
print("File size:", os.stat(filename).st_size)
f = open(filename)
# Try to read the entire file into a single string.
content = f.read()
print("Length of content actually read:", len(content))
# If the whole file really was read, the file position
# should be at (or near) the end of the file.
print("Current file position:", f.tell())
f.close()


and send us the output.

Thanks,



-- 
Steven

