Hi All,
I have tried the solutions proposed in the previous thread, and it looks like Chris's is the fastest for my purposes. Now I have a question that is probably more conceptual than implementation-related.
I started this little thread because my task is to read medium to (relatively) big unformatted binary files written by another (black-box) program, which is written in Fortran. These files range from roughly 10 MB to 200 MB, and I read them using an f2py-wrapped Fortran subroutine. I got a stupendous speed improvement when I switched from Compaq Visual Fortran to G95 with "STREAM" access (from 8% to 90% faster, depending on the infamous "indices" I was talking about).
Now I was thinking about using the multiprocessing module in Python, as we have 4-CPU PCs at work and I could try to call my subroutine from multiple Python processes. I *really* should do this in Fortran directly, but I haven't found any reference on how to do parallel file I/O in Fortran, and I haven't received any help from comp.lang.fortran on that front (only a warning that I might slow everything down by using multiple processes).
Splitting the reading between 4 processes will require exchanging 5-20 MB from the child processes to the main one: do you think my script will benefit from using multiprocessing? Is there any drawback to using NumPy arrays in multiple processes? If multiprocessing in Python creates too much overhead, does anyone have any suggestion/reference/link/code on how to handle parallel I/O in Fortran directly? Should I try another approach?
Thanks a lot for your suggestions.
Andrea.
"Imagination Is The Only Weapon In The War Against Reality."
http://xoomer.alice.it/infinity77/
http://thedoomedcity.blogspot.com/
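As a rough illustration of the split-reading idea above, here is a minimal sketch using multiprocessing.Pool. It assumes a hypothetical file layout: a headerless stream of little-endian float64 values with no record markers (roughly what unformatted STREAM output of a plain real(8) array looks like). The real files in this thread may be structured differently, and split_ranges, read_chunk and parallel_read are illustrative names, not an existing API. Each worker reads its own byte range with np.fromfile(..., offset=...) (available in NumPy 1.17 and later), and the resulting arrays are pickled back to the parent, which is the 5-20 MB exchange mentioned above.

```python
import os
import tempfile
from multiprocessing import Pool

import numpy as np

# Assumption (hypothetical layout): a headerless STREAM-access dump of
# little-endian float64 values, i.e. no Fortran record markers.
DTYPE = np.dtype("<f8")


def split_ranges(n_items, n_parts):
    """Split n_items into n_parts contiguous (start, count) item ranges."""
    base, extra = divmod(n_items, n_parts)
    ranges, start = [], 0
    for i in range(n_parts):
        count = base + (1 if i < extra else 0)
        ranges.append((start, count))
        start += count
    return ranges


def read_chunk(args):
    """Read `count` items starting at item index `start` from `path`."""
    path, start, count = args
    # np.fromfile's `offset` is given in bytes, not items.
    return np.fromfile(path, dtype=DTYPE, count=count,
                       offset=start * DTYPE.itemsize)


def parallel_read(path, n_procs=4):
    """Read the whole file using n_procs worker processes."""
    n_items = os.path.getsize(path) // DTYPE.itemsize
    jobs = [(path, s, c) for s, c in split_ranges(n_items, n_procs)]
    with Pool(n_procs) as pool:
        # Each chunk is pickled and copied back to the parent process.
        chunks = pool.map(read_chunk, jobs)
    return np.concatenate(chunks)


if __name__ == "__main__":
    # The guard is required on platforms that spawn worker processes.
    data = np.arange(1_000_003, dtype=DTYPE)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.bin")
        data.tofile(path)
        result = parallel_read(path)
    assert np.array_equal(result, data)
```

If the disk is the bottleneck, four processes contending for it may be no faster (or even slower) than one, so it is worth timing this against a single np.fromfile call before adopting it.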
Andrea Gavana wrote:
I have tried the solutions proposed in the previous thread and it looks like Chris' one is the fastest for my purposes.
whoo hoo! What do I win? ;-)
Splitting the reading process between 4 processes will require the exchange of 5-20 MB from the child processes to the main one: do you think my script will benefit from using multiprocessing?
If you are talking about multiprocessing to read the data in -- I don't think so -- that's probably IO bound anyway. You can't make your disks faster with multiple processors.
Should I try another approach?
I don't know whether it will do anything for performance, but you might want to look at memory-mapped arrays -- they're a very cool way to work with data files too big to bring into memory all at once.
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception
Chris.Barker@noaa.gov
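A minimal np.memmap sketch of the memory-mapped approach, again assuming a hypothetical headerless little-endian float64 layout (the thread does not describe the actual file format). open_mapped and block_means are illustrative names. Opening the map reads nothing up front; the operating system pages data in as slices are touched, so the file can be processed block by block.

```python
import os
import tempfile

import numpy as np

# Hypothetical layout (an assumption, not the real file format): a flat
# stream of little-endian float64 values with no Fortran record markers.
DTYPE = np.dtype("<f8")


def open_mapped(path):
    """Return a read-only memory-mapped 1-D view of the whole file."""
    # With no shape given, the length is inferred from the file size.
    return np.memmap(path, dtype=DTYPE, mode="r")


def block_means(path, block=250_000):
    """Compute per-block means without loading the full file into memory."""
    arr = open_mapped(path)
    # Only the pages backing each slice are read from disk as it is used.
    means = [float(arr[i:i + block].mean())
             for i in range(0, arr.shape[0], block)]
    del arr  # drop the mapping so the file can be closed or removed
    return means
```

This does not make the disk any faster: the first pass over the data costs the same I/O, but the data Python touches at any one time stays at roughly one block.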
Christopher Barker wrote:
Depending on your system and OS, I would agree with Chris that you are most likely I/O bound. If so, you have to look at a different approach to overcome that barrier. If you are not I/O bound, then you need to find out what is limiting your performance (for example, using Robert Kern's line_profiler http://pypi.python.org/pypi/line_profiler/). If you find it is CPU-bound, then you might gain from multiple CPUs, which has been addressed multiple times on this list. Cython is probably a very viable option for what you have described.
Bruce
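Before reaching for multiprocessing or Cython, a crude way to see which side dominates is to time the raw read separately from the processing that follows it. This sketch uses only time.perf_counter rather than line_profiler; time_stages is an illustrative name, and `process` is a hypothetical stand-in for whatever work follows the read (decoding, index selection, and so on).

```python
import os
import tempfile
import time

import numpy as np


def time_stages(path, process):
    """Time the raw read and the processing step separately.

    `process` is a hypothetical callable that receives the raw bytes and
    does the CPU-side work (decoding, reshaping, index selection, ...).
    Returns (timings dict, result of process).
    """
    t0 = time.perf_counter()
    with open(path, "rb") as fh:
        raw = fh.read()  # pure I/O (may be served from the OS page cache)
    t1 = time.perf_counter()
    result = process(raw)  # CPU-side work
    t2 = time.perf_counter()
    return {"read_s": t1 - t0, "process_s": t2 - t1}, result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.bin")
        np.arange(2_000_000, dtype="<f8").tofile(path)
        timings, _ = time_stages(
            path, lambda raw: np.frombuffer(raw, dtype="<f8")[::7].sum())
        print(timings)
```

If read_s dominates, the job is likely I/O bound and extra CPUs will not help much; if process_s dominates, a line profiler can point at the hot lines. Re-reading the same file may be served from the OS page cache, which makes the read look faster than a cold read would be.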
participants (3): Andrea Gavana, Bruce Southey, Christopher Barker