Processing large CSV files - how to maximise throughput?
Dave Angel
davea at davea.name
Thu Oct 24 23:57:17 EDT 2013
On 24/10/2013 23:35, Steven D'Aprano wrote:
> On Fri, 25 Oct 2013 02:10:07 +0000, Dave Angel wrote:
>
>>> If I have multiple large CSV files to deal with, and I'm on a
>>> multi-core machine, is there anything else I can do to boost
>>> throughput?
>>
>> Start multiple processes. For what you're doing, there's probably no
>> point in multithreading.
>
> Since the bottleneck will probably be I/O, reading and writing data from
> files, I expect threading actually may help.
>
We approach the tradeoff from opposite sides: I would use
multiprocessing to utilize multiple cores unless the communication
costs between the processes got too high -- and they won't in this
case. But I'd concur: they'll probably both give about the same
speedup. Something like the sketch below is what I have in mind.
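(A rough sketch only -- the file names and the per-file function
process_file are invented for the example; the real per-row work would
go where the row counting is.)

import csv
from concurrent.futures import ProcessPoolExecutor

def process_file(filename):
    # Stand-in for the real per-row work; here we just count rows.
    count = 0
    with open(filename, newline='') as f:
        for row in csv.reader(f):
            count += 1
    return filename, count

if __name__ == '__main__':
    filenames = ['a.csv', 'b.csv', 'c.csv']   # made-up input files
    # By default ProcessPoolExecutor starts one worker per core.
    # Each worker takes a whole file, so the only data crossing the
    # process boundary is a filename in and a small tuple out.
    with ProcessPoolExecutor() as pool:
        for name, count in pool.map(process_file, filenames):
            print(name, count)

That's why the communication costs stay negligible here: the rows
themselves never leave the worker that read them.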
I just detest the pain that multithreading can bring, and tend to avoid
it if at all possible.
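That said, if you want to test Steven's theory that the job is
I/O-bound, the threaded version is a two-line change to the same
sketch (max_workers=4 is a guess; tune it for your machine):

from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=4) as pool:
    # Threads instead of processes: a win only if process_file spends
    # most of its time blocked on I/O rather than holding the GIL.
    for name, count in pool.map(process_file, filenames):
        print(name, count)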
--
DaveA