Christopher Barker writes:
> The worker pool approach is probably the way to go, but there is a fair bit of overhead to creating a multiprocessing job. So fewer, larger jobs are faster than many small jobs.
True, but the per-row processing would have to be awfully fast for the extra overhead of going from 16 chunks of 10^6 rows each to 64 chunks of 250,000 rows each to matter, and 64 chunks is plenty granular to approximate his nominal goal of 2 chunks on a fast core for every 1 chunk on a slow core with a single-queue, multiple-workers approach. (It will almost certainly do a lot better than that, since 2:1 was itself a very rough approximation, while the single-queue approach adjusts to speed differences automatically.)
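For concreteness, a minimal sketch of that single-queue, multiple-workers pattern using multiprocessing.Pool (the row counts, chunk size, and the body of process_chunk here are illustrative assumptions, not details from the original problem):

import multiprocessing as mp

TOTAL_ROWS = 16_000_000
CHUNK_ROWS = 250_000          # 64 chunks of 250,000 rows each

def process_chunk(bounds):
    """Stand-in for the real per-row work over rows [start, stop)."""
    start, stop = bounds
    return sum(range(start, stop))

def main():
    chunks = [(i, min(i + CHUNK_ROWS, TOTAL_ROWS))
              for i in range(0, TOTAL_ROWS, CHUNK_ROWS)]
    # imap_unordered hands out one chunk at a time from a single
    # task queue; a faster core simply comes back for its next chunk
    # sooner, so the 2:1 (or whatever) ratio falls out automatically.
    with mp.Pool() as pool:
        total = sum(pool.imap_unordered(process_chunk, chunks))
    print(total)

if __name__ == "__main__":
    main()

The point is that nobody schedules anything by hand: the pool's shared task queue is what turns per-core speed differences into proportionally more chunks processed.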
And if it's that fast, he could do it on a single core and still be done by the time he's finished savoring a sip of coffee. ;-)
Steve