[Tutor] using multiprocessing efficiently to process large data file

Abhishek Pratap abhishek.vit at gmail.com
Fri Aug 31 00:19:50 CEST 2012


Hi Guys

I have a file with a few million lines. I want to process it in blocks
of 8 lines, and by my estimate the job is not I/O bound. In other
words, the computation takes a lot more time than simply reading the
file would.

I am wondering how I can read data from this file at a faster pace and
then farm the jobs out to a worker function using the multiprocessing
module.

I can think of two ways.

1. Split the file and read the pieces in parallel (this didn't work
well for me), primarily because I don't know how to read a file in
parallel efficiently.
2. Keep reading the file sequentially into a buffer of some size and
farm out chunks of the data through multiprocessing.
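
A minimal sketch of the second option, assuming Python 3 and the
standard multiprocessing.Pool. The names read_blocks, process_block
and process_file are hypothetical, and process_block is only a
placeholder for the real computation, which the post does not
describe. The parent process reads the file sequentially, groups the
lines into blocks of 8, and hands the blocks to a pool of workers:

```python
import itertools
import multiprocessing
import sys

BLOCK_SIZE = 8  # lines per block, as described in the post


def read_blocks(handle, block_size=BLOCK_SIZE):
    """Yield successive lists of up to block_size lines from an open file."""
    while True:
        block = list(itertools.islice(handle, block_size))
        if not block:
            return
        # The last block may be shorter if the line count is not a
        # multiple of block_size.
        yield block


def process_block(block):
    """Hypothetical placeholder for the CPU-heavy work on one block."""
    return sum(len(line) for line in block)


def process_file(path, workers=None, chunksize=100):
    """Read sequentially in the parent and compute in worker processes.

    chunksize controls how many blocks are sent to a worker per task;
    100 is an arbitrary starting point, not a tuned value.
    """
    with open(path) as handle, multiprocessing.Pool(workers) as pool:
        # imap pulls blocks from the generator as it goes and returns
        # the results in the same order as the input blocks.
        return list(pool.imap(process_block, read_blocks(handle), chunksize))


if __name__ == "__main__":
    # The guard lets worker processes import this module safely on
    # platforms that start workers by spawning a fresh interpreter.
    if len(sys.argv) > 1:
        for result in process_file(sys.argv[1]):
            print(result)
```

Since the job is described as compute bound, a single sequential
reader is probably enough to keep the workers busy. Sending several
blocks per task (the chunksize argument) reduces the overhead of
passing data between processes, and the best value depends on how
expensive each block is to process.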

Any example would be of great help.

Thanks!
-Abhi

