Multiprocessing and file I/O

Infinity77 andrea.gavana at gmail.com
Sun May 24 10:13:52 EDT 2009


Hi Igor,

On May 24, 1:10 pm, Igor Katson <descent... at gmail.com> wrote:
> Infinity77 wrote:
> > Hi All,
>
> >     I am trying to speed up some code which reads a bunch of data from
> > a disk file. Just for the fun of it, I thought to try and use parallel
> > I/O to split the reading of the file between multiple processes.
> > Although I have been warned that concurrent access by multiple
> > processes to the same file may actually slow down the reading of the
> > file, I was curious to try some timings by varying the number of
> > processes which read the file. I know almost nothing of
> > multiprocessing, so I was wondering if anyone had some very simple
> > snippet of code which demonstrates how to read a file using
> > multiprocessing.
>
> > My idea was to create a "big" file by doing:
>
> > fid = open("somefile.txt", "wb")
> > fid.write("HELLO\n" * 10**7)
> > fid.close()
>
> > and then using fid.seek() to point every process I start to a position
> > inside the file and start reading from there. For example, with 4
> > processes and a 10 MB file, I would tell the first process to read
> > from byte 0 to byte 2.5 million, the second one from 2.5 million to 5
> > million and so on. I just have an academic curiosity :-D
>
> > Any suggestion is very welcome, either to the approach or to the
> > actual implementation. Thank you for your help.
>
> > Andrea.
>
> If the thing you would want to speed up is the processing of the file
> (and not the IO), I would make one process actually read the file, and
> feed the other processes with the data from the file through a queue.
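
(For reference, a minimal sketch of that queue-based approach might look like the
following; the file name, the worker count and the bounded queue size are
illustrative choices, not something from this thread:)

    import multiprocessing

    def worker(queue):
        # Consume lines from the queue until the reader sends the None sentinel.
        while True:
            line = queue.get()
            if line is None:
                break
            # ... do the per-line processing here ...

    if __name__ == "__main__":
        # Bounded queue so the reader cannot run arbitrarily far ahead of the workers.
        queue = multiprocessing.Queue(maxsize=1000)
        workers = [multiprocessing.Process(target=worker, args=(queue,))
                   for _ in range(4)]
        for w in workers:
            w.start()

        # A single process (the parent) does all the disk I/O ...
        with open("somefile.txt") as fid:
            for line in fid:
                queue.put(line)

        # ... then tells each worker there is nothing left and waits for them.
        for _ in workers:
            queue.put(None)
        for w in workers:
            w.join()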

No, the processing of the data is fast enough, as it is very simple.
What I was asking was whether anyone could share an example of using
multiprocessing to read a file, along the lines I described above.
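
(A rough, untested sketch of that seek-based splitting is below. The chunk
boundaries are naive byte offsets, so a line straddling a boundary is not
handled here; the file name and process count are again just placeholders
for the timing experiment:)

    import os
    import multiprocessing

    def read_chunk(filename, start, size):
        # Each worker opens its own file handle, seeks to its slice and reads it.
        with open(filename, "rb") as fid:
            fid.seek(start)
            data = fid.read(size)
        return len(data)          # return something cheap; the point is the timing

    if __name__ == "__main__":
        filename = "somefile.txt"
        nprocs = 4
        total = os.path.getsize(filename)
        chunk = total // nprocs

        # Naive byte offsets; the last chunk absorbs any remainder.
        args = [(filename, i * chunk,
                 chunk if i < nprocs - 1 else total - i * chunk)
                for i in range(nprocs)]

        pool = multiprocessing.Pool(nprocs)
        jobs = [pool.apply_async(read_chunk, a) for a in args]
        pool.close()
        pool.join()
        print("%d bytes read in total" % sum(job.get() for job in jobs))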

Andrea.


