[Tutor] Load Entire File into memory
Steven D'Aprano
steve at pearwood.info
Mon Nov 4 17:30:10 CET 2013
On Mon, Nov 04, 2013 at 07:00:29PM +0530, Amal Thomas wrote:
> Yes, I have found that loading the file into RAM and then reading it line
> by line saves a huge amount of time, since my text files are very huge.
This is remarkable, and quite frankly incredible. I wonder whether you
are misinterpreting what you are seeing? Under normal circumstances,
on all but quite high-end machines, trying to read a 50GB file into
memory all at once will be effectively impossible. Suppose your computer
has 24GB of RAM. The OS and other running applications can be expected
to use some of that, but even ignoring this, it is impossible to read a
50GB file into memory all at once with only 24GB.
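As a quick sanity check (just a sketch, assuming a Unix-like system where
os.sysconf exposes these names), you can compare the file's size against
the machine's physical RAM:

import os

filename = "YOUR FILE NAME HERE"  # same placeholder as in the script below

file_size = os.stat(filename).st_size
# SC_PAGE_SIZE * SC_PHYS_PAGES gives total physical memory on most
# Unix-like systems; os.sysconf raises an error where unsupported.
total_ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")

print("File size:", file_size)
print("Physical RAM:", total_ram)
print("File fits in RAM?", file_size < total_ram)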
What I would expect is that unless you have *at least* twice as much
memory as the size of the file (in this case, at least 100GB), either
Python will give you a MemoryError, or the operating system will start
paging memory out to swap space, which is *painfully* slooooow. I've been
in the situation where I accidentally tried reading a file bigger than the
installed RAM, and it ran overnight (14+ hours), locked up and stopped
responding, and I finally had to unplug the power and restart the
machine.
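For comparison, the usual memory-safe way to handle a huge text file is to
iterate over it line by line, which never holds more than one line in
memory at a time. A minimal sketch, where process_line is a hypothetical
stand-in for whatever work you do on each line:

def process_line(line):
    # hypothetical placeholder for the real per-line work
    pass

with open("YOUR FILE NAME HERE") as f:
    for line in f:  # the file object yields one line at a time
        process_line(line)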
So unless you have 100+ GB of RAM in your computer, which would put it in
seriously high-end server class, I find it difficult to believe that
you are actually reading the entire file into memory.
Please try this little bit of code, replacing the file name with the
actual name of your 50GB data file:
import os
filename = "YOUR FILE NAME HERE"
print("File size:", os.stat(filename).st_size)
f = open(filename)
content = f.read()
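# Note: in text mode, len(content) counts decoded characters, which may
# differ slightly from the byte count reported by os.stat() above.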
print("Length of content actually read:", len(content))
print("Current file position:", f.tell())
f.close()
and send us the output.
Thanks,
--
Steven