[Tutor] Load Entire File into memory

ALAN GAULD alan.gauld at btinternet.com
Tue Nov 5 02:10:39 CET 2013


Forwarding to tutor list. Please use Reply All in responses.


From: Amal Thomas <amalthomas111 at gmail.com>
>To: Alan Gauld <alan.gauld at btinternet.com> 
>Sent: Monday, 4 November 2013, 17:26
>Subject: Re: [Tutor] Load Entire File into memory
> 
>
>
>@Alan: Thanks. I have tried both ways (reading line by line without loading the file into RAM, 
> and loading the entire file into RAM and then reading line by line) for files of 2-3GB. 

OK, but 2-3GB will nearly always live entirely in RAM on a modern computer.

> The only change I made was in the reading part; the rest of the code was kept the same. 
> There was a significant time difference. Please note that I started this thread by stating that 
> when I use io.StringIO(f.read()) in my code it uses almost 4-5 times the 
> input file size in memory. Now, using read() or readlines(), that has dropped to 1.5 times... 

Yes, a plain string is always going to be more efficient in memory use than StringIO.
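
As a rough sketch of the three reading styles discussed in this thread, assuming a plain text file and a made-up task of counting lines (the function names are illustrative, not taken from Amal's code):

```python
import io


def count_lines_streaming(path):
    """Iterate over the file object itself: only the current line
    (plus a small read buffer) is held in memory at any time."""
    count = 0
    with open(path, "r", encoding="utf-8") as f:
        for _line in f:
            count += 1
    return count


def count_lines_read(path):
    """Read the whole file into one str, then split it into lines.
    splitlines() builds a list holding every line, so this needs
    memory beyond the string itself."""
    with open(path, "r", encoding="utf-8") as f:
        data = f.read()
    return len(data.splitlines())


def count_lines_stringio(path):
    """Read the whole file, then wrap it in StringIO, which copies
    the text into its own internal buffer before iteration starts."""
    with open(path, "r", encoding="utf-8") as f:
        buf = io.StringIO(f.read())
    return sum(1 for _line in buf)
```

All three return the same count for ordinary newline-terminated text; they differ only in how much of the file is resident at once.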

> Also, as I have mentioned, I can't afford to run my code using 4-5 times the memory. 
> The total resource available on my server is about 180 GB of memory (approx 64 GB RAM + 128GB swap). 

OK. There is a huge difference between having 100G of RAM and having 64G of RAM + 128G of swap.
Swap is basically disk, so if you are reading your data into memory and that memory is 
bouncing in and out of swap, things will slow down by an order of magnitude. 
You need to try to optimise to use real RAM and minimise use of swap. 

> So before starting to process my 30-50GB input files I am keen to know the best way.


Performance tuning is always a tricky topic and needs to be done on a case-by-case 
basis. There are simply too many factors to say that method (A) will always be 
faster than method (B). It depends on the nature of the data, its source, your target 
data structures, your algorithms, your output format and target, etc., as well as the 
physical machines being used. We need a lot more detail about the task before 
we can give any solid advice. And even then you should verify it before assuming 
you are done.
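
One way to do that verification is to time each candidate and record its peak memory on your own data. The helper below is only a sketch: measure() is a hypothetical name, and tracemalloc sees only allocations made through Python's allocator and adds overhead of its own, so treat the numbers as relative rather than absolute.

```python
import time
import tracemalloc


def measure(func, *args, **kwargs):
    """Run func once; return (result, seconds elapsed, peak bytes traced)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) in bytes
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


if __name__ == "__main__":
    # Stand-in workload; replace with your own reading function and file.
    result, seconds, peak = measure(sum, range(1_000_000))
    print(f"result={result} time={seconds:.3f}s peak={peak / 1e6:.2f} MB")
```

Run it against a representative slice of the real input on the real server, since results on a small test file or another machine may not carry over.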

Alan G.

