Multiprocessing and file I/O

Infinity77 andrea.gavana at gmail.com
Sun May 24 07:28:05 EDT 2009


Hi All,

    I am trying to speed up some code which reads a bunch of data
from a disk file. Just for the fun of it, I thought I would try using
parallel I/O to split the reading of the file between multiple
processes. Although I have been warned that concurrent access to the
same file by multiple processes may actually slow the reading down, I
was curious to run some timings while varying the number of processes
reading the file. I know almost nothing about multiprocessing, so I
was wondering if anyone had a very simple snippet of code which
demonstrates how to read a file using multiprocessing.

My idea was to create a "big" file by doing:

fid = open("somefile.txt", "wb")
fid.write(b"HELLO\n" * 10**7)   # repeat count must be an int (sequence repetition rejects floats)
fid.close()
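
(For the record, that is 10**7 copies of a 6-byte line, so roughly
60 MB; the 10 MB figure below is just a round number for the sake of
the example.)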

and then using fid.seek() to position each process I start at a
different offset inside the file and have it read from there. For
example, with 4 processes and a 10 MB file, I would tell the first
process to read from byte 0 to byte 2.5 million, the second one from
byte 2.5 million to byte 5 million, and so on. This is purely
academic curiosity on my part :-D
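
To make the idea more concrete, here is a rough, untested sketch of
what I have in mind. It assumes every worker can simply open the file
on its own, and it ignores (for now) the fact that a chunk boundary
will usually fall in the middle of a line; the filename and the
chunking logic are just placeholders for the experiment:

import os
from multiprocessing import Pool

FILENAME = "somefile.txt"   # the file created above (placeholder name)

def read_chunk(args):
    # Each worker opens its own handle, seeks to the start of its
    # chunk and reads its share of bytes.
    start, size = args
    fid = open(FILENAME, "rb")
    fid.seek(start)
    data = fid.read(size)
    fid.close()
    return len(data)   # return only the byte count, not the data

if __name__ == "__main__":
    nprocs = 4
    total = os.path.getsize(FILENAME)
    chunk = total // nprocs
    # (start, size) pairs; the last chunk picks up any remainder
    chunks = [(i * chunk, chunk if i < nprocs - 1 else total - i * chunk)
              for i in range(nprocs)]
    pool = Pool(nprocs)
    sizes = pool.map(read_chunk, chunks)
    pool.close()
    pool.join()
    print(sum(sizes) == total)   # sanity check: all bytes accounted for

If this is roughly the right shape, I would then just time pool.map
for different values of nprocs and compare against a plain
single-process read.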

Any suggestion is very welcome, either to the approach or to the
actual implementation. Thank you for your help.

Andrea.


