Processing large CSV files - how to maximise throughput?

Roy Smith roy at panix.com
Fri Oct 25 20:22:44 EDT 2013


In article <mailman.1560.1382744694.18130.python-list at python.org>,
 Dennis Lee Bieber <wlfraed at ix.netcom.com> wrote:

> 	Memory is cheap -- I/O is slow. <G> Just how massive are these CSV
> files?

Actually, these days, the economics of hardware are more like, "CPU is 
cheap, memory is expensive".

I suppose it all depends on what kinds of problems you're solving, but 
my experience is that I'm much more likely to run out of memory on big 
problems than I am to peg the CPU.  Also, pegging the CPU leads to 
well-behaved performance degradation.  Running out of memory leads to 
falling off a performance cliff as you start to page.
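That memory point is the usual argument for streaming a big CSV rather 
than loading it whole.  A minimal sketch with the stdlib csv module 
(the column name and sample data here are made up for illustration):

```python
import csv
import io

def sum_column(lines, column):
    """Stream rows from an iterable of CSV lines and sum one column.

    Only one row is held in memory at a time, so memory use stays
    constant no matter how large the input file is.
    """
    reader = csv.DictReader(lines)
    total = 0.0
    for row in reader:
        total += float(row[column])
    return total

# Works the same on an open file object, e.g. open("big.csv", newline="")
sample = io.StringIO("id,value\n1,10.5\n2,4.5\n3,5.0\n")
print(sum_column(sample, "value"))  # 20.0
```

The same pattern (iterate the reader, never call list() on it) is what 
keeps you off the paging cliff described above.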

And, with the advent of large-scale SSD (you can get 1.6 TB SSD in 2.5 
inch form-factor!), I/O is as fast as you're willing to pay for :-)
