Multiprocessing and file I/O

Igor Katson descentspb at gmail.com
Sun May 24 08:10:05 EDT 2009


Infinity77 wrote:
> Hi All,
>
>     I am trying to speed up some code which reads a bunch of data from
> a disk file. Just for the fun of it, I thought I would try using
> parallel I/O to split the reading of the file between multiple
> processes. Although I have been warned that concurrent access to the
> same file by multiple processes may actually slow down the reading, I
> was curious to run some timings while varying the number of processes
> reading the file. I know almost nothing about multiprocessing, so I
> was wondering if anyone had a very simple snippet of code which
> demonstrates how to read a file using multiprocessing.
>
> My idea was to create a "big" file by doing:
>
> fid = open("somefile.txt", "wb")
> fid.write("HELLO\n" * 10**7)  # repeat count must be an int, not 1e7; ~60 MB
> fid.close()
>
> and then using fid.seek() to point every process I start to a position
> inside the file and start reading from there. For example, with 4
> processes and a 10 MB file, I would tell the first process to read
> from byte 0 to byte 2.5 million, the second one from 2.5 million to 5
> million and so on. I just have an academic curiosity :-D
>
> Any suggestion is very welcome, either on the approach or on the
> actual implementation. Thank you for your help.
>
> Andrea.
>   
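For what it's worth, the seek()-based partitioning you describe could
look roughly like the sketch below. It is untested; read_chunk, nprocs
and the even byte split are placeholder choices of mine, and note that
a chunk boundary can land in the middle of a line, so real code would
have to realign to the next newline:

import os
from multiprocessing import Process

def read_chunk(path, start, size):
    # Every worker opens its own file handle, jumps to its slice
    # and reads `size` bytes; the processing would happen here.
    f = open(path, "rb")
    f.seek(start)
    data = f.read(size)
    f.close()
    # ... do something with `data` ...

if __name__ == "__main__":
    path = "somefile.txt"
    nprocs = 4
    total = os.path.getsize(path)
    chunk = total // nprocs
    workers = []
    for i in range(nprocs):
        start = i * chunk
        # The last worker also takes any remainder bytes.
        size = total - start if i == nprocs - 1 else chunk
        p = Process(target=read_chunk, args=(path, start, size))
        p.start()
        workers.append(p)
    for p in workers:
        p.join()
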
That said, if the thing you want to speed up is the processing of the
file (and not the I/O itself), I would have one process actually read
the file and feed the data to the other processes through a queue,
roughly like this:
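
Here is a minimal sketch of that reader-plus-workers pattern (again
untested; worker, NWORKERS, the queue size and the per-line puts are
assumptions of mine):

from multiprocessing import Process, Queue

NWORKERS = 4
SENTINEL = None  # tells a worker that no more data is coming

def worker(q):
    # Consume lines from the queue until the sentinel arrives.
    while True:
        line = q.get()
        if line is SENTINEL:
            break
        # ... process `line` here ...

if __name__ == "__main__":
    q = Queue(maxsize=1000)  # bounded, so the reader cannot race too far ahead
    procs = [Process(target=worker, args=(q,)) for _ in range(NWORKERS)]
    for p in procs:
        p.start()
    # A single reader keeps the disk access strictly sequential.
    for line in open("somefile.txt", "rb"):
        q.put(line)
    for _ in procs:
        q.put(SENTINEL)  # one sentinel per worker
    for p in procs:
        p.join()

Pushing every line through the queue individually carries noticeable
IPC overhead, so in practice you would batch the puts (say, a few
thousand lines per item). The point is that the disk is read
sequentially by one process, which is usually the fastest access
pattern, while the actual work is spread across the others.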


