[Tutor] Load Entire File into memory

Danny Yoo dyoo at hashcollision.org
Tue Nov 5 02:44:22 CET 2013


> > Also, as I have mentioned, I can't afford to run my code using 4-5
> > times the memory.  Total resource available on my server is about
> > 180 GB (approx 64 GB RAM + 128 GB swap).
>
> OK.  There is a huge difference between having 100G of RAM and having
> 64G + 128G of swap.  Swap is basically disk, so if the data you read
> into memory is bouncing in and out of swap, things will slow down by
> an order of magnitude.  You need to optimise to use real RAM and
> minimise use of swap.


I concur with Alan, and want to state his point more forcefully: if you
are hitting swap, you are computationally DOOMED and must do something
different.


You _must_ avoid swap at all costs here.  In case the point isn't clear,
a little more explanation: touching swap is several orders of magnitude
more expensive than anything else your program does.

    CPU operations are on the order of nanoseconds. (10^-9)

    Disk operations are on the order of milliseconds.  (10^-3)

References:

    http://en.wikipedia.org/wiki/Instructions_per_second
    http://en.wikipedia.org/wiki/Hard_disk_drive_performance_characteristics

As soon as your program starts touching swap to extend its working
memory, you've lost the battle.
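To put a number on that gap, here is a back-of-the-envelope sketch in
Python.  The nanosecond and millisecond figures are the rough
approximations quoted above, not measurements from any particular
machine:

```python
# Rough cost comparison: one CPU operation vs. one disk (swap) access.
cpu_op_s = 1e-9   # ~1 nanosecond per CPU operation (approximation)
disk_op_s = 1e-3  # ~1 millisecond per disk access (approximation)

ratio = disk_op_s / cpu_op_s
print(ratio)  # 1000000.0 -- one swap hit costs about a million CPU ops
```

That factor of a million is why "it still runs, just slower" does not
apply once swapping starts.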


We were trying not to leap to conclusions till we knew more.  Now we know
more.  If your dataset is much larger than your system's RAM, then trying
to read the whole thing at once into an in-memory buffer on a single
machine is the wrong approach.
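The usual alternative is to stream the file one line at a time, so memory
use stays roughly constant no matter how big the file is.  A minimal
sketch (the line-counting logic is just a placeholder for whatever
per-line processing you actually need):

```python
# Process a large file one line at a time instead of reading it whole.
# Only one line is held in memory at any moment.

def count_lines(path):
    total = 0
    with open(path) as f:
        for line in f:    # a file object iterates over its lines lazily
            total += 1
    return total
```

Contrast this with f.read() or f.readlines(), both of which pull the
entire file into memory at once.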