[Tutor] Load Entire File into memory

Steven D'Aprano steve at pearwood.info
Tue Nov 5 04:28:24 CET 2013


I mostly agree with Alan, but a couple of little quibbles:

On Tue, Nov 05, 2013 at 01:10:39AM +0000, ALAN GAULD wrote:

> >@Alan: Thanks.. I have checked the both ways( reading line by line by not loading into ram , 
> > other loading entire file to ram and then reading line by line)  for files with 2-3GB. 
> 
> OK, But 2-3G will nearly always live entirely in RAM on a modern computer.

Speak for yourself. Some of us are still using "modern computers" with 
1-2 GB of RAM :-(


> > Only change which i have done is in the reading part , rest of the code was kept same. 
> > There was significant time difference. Please note that I started this thread stating that 
> > when I am using io.StringIO(f.read()) in code it uses a memory of almost 4-5 times the 
> > input file size. Now using read() or readlines() it has reduced to 1.5 times... 
> 
> Yes a raw string is always going to be more efficient in memory use than StringIO.

It depends what you're doing with it. The beauty of StringIO is that it 
emulates an in-memory file, so you can modify it in place. String 
objects are immutable and cannot be modified in place, so if you have to 
make changes to one, you have to make a copy with the change. For large 
strings, say, over 100MB, the overhead can get painful.
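
To make the difference concrete, here is a minimal sketch. The sample 
text and the variable names are made up for illustration, and the 
string is tiny so that the result is easy to check by eye. The same 
pattern applies to large data, where the copy in the first approach is 
what hurts.

```python
import io

text = "hello world"

# Strings are immutable: "changing" one character means building a
# complete new string out of slices of the old one.
changed = text[:6] + "W" + text[7:]

# StringIO emulates an in-memory file: seek to a position and
# overwrite the character there, without rebuilding the rest.
buf = io.StringIO(text)
buf.seek(6)
buf.write("W")
result = buf.getvalue()

print(changed)
print(result)
```

Note that getvalue() itself returns a new string, so the saving only 
applies while you keep working through the file-like interface.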


> > Also as I have mentioned I cant afford to run my code using 4-5 times memory. 
> > Total resource available in my server is about 180 GB memory (approx 64 GB RAM + 128GB swap). 
> 
> OK, There is a huge difference between having 100G of RAM and having 64G+128G swap.
> swap is basically disk so if you are reading your data into memory and that memory is 
> bouncing in and out of swap things will slow down by an order of magnitude. 

At least. Hard drive access is more like two or even three orders of 
magnitude slower than RAM access (100 or 1000 times slower, and worse 
still for random seeks on a spinning disk), and 
including the overhead of the memory manager moving things about, there 
is no upper limit to how large the penalty can be. If you get away with 
only 10 times slower, you're lucky. In my experience, 100-1000 times 
slower is more common (although my experience is on machines with fairly 
small amounts of RAM in the first place) and sometimes slow enough that 
even the operating system stops responding.

Plan to avoid using swap space :-)
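
One way to stay out of swap, if the processing allows it, is the 
line-by-line approach mentioned earlier in the thread. The following is 
only a rough sketch: the function count_matching_lines, the search 
string and the small temporary file are hypothetical stand-ins for the 
real job, which I haven't seen.

```python
import os
import tempfile


def count_matching_lines(path, needle):
    """Count the lines containing needle, reading one line at a time."""
    count = 0
    with open(path, "r") as f:
        # Iterating over the file object yields one line per step, so
        # the whole file is never held in memory at once.
        for line in f:
            if needle in line:
                count += 1
    return count


# Small demonstration file standing in for the multi-gigabyte input.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("spam\neggs\nspam and eggs\n")
    demo_path = tmp.name

demo_count = count_matching_lines(demo_path, "spam")
os.remove(demo_path)
print(demo_count)
```

Memory use here depends on the longest line rather than on the size of 
the file, which is the point of the exercise.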

> You need to try to optimise to use real RAM and minimise use of swap. 

Agreed.


-- 
Steven

