Fwd: [mpi4py] Python on 10K of cores on BG/P
Just as a note, moving forward.

---------- Forwarded message ----------
From: Brian Granger <ellisonbg@gmail.com>
Date: Wed, Feb 10, 2010 at 11:34 AM
Subject: Re: [mpi4py] Python on 10K of cores on BG/P
To: mpi4py <mpi4py@googlegroups.com>
We have been developing an electronic structure simulation software, GPAW (https://wiki.fysik.dtu.dk/gpaw/). The software is written mostly in Python, with the core computational routines in C extensions. For parallel calculations we use MPI, which is called both from C and from Python (through our own Python interfaces for the MPI calls we need).
Nice!
We have run the code successfully on different supercomputing architectures such as Cray XT5 and Blue Gene. However, as we move to thousands or tens of thousands of processes, one limitation of the current approach has become evident: at start-up time, the imports of Python modules take an increasing amount of time as a huge number of processes try to read the same .py/.pyc files, and the filesystem cannot handle this efficiently.
Yes, I can imagine that if the .py files are on a shared filesystem, things would grind to a halt. The best way to fix this is to simply install all the .py files on the local disks of the compute nodes... assuming the compute nodes have local disks :-). If they don't have local disks, you are in a really tough situation.

In some cases it is feasible to think about saving the state of the Python interpreter (along with imported modules), but in this case I am doubtful that will work. If you are importing Python modules that link to C/C++/Fortran code, this will be very difficult. Furthermore, if your Python code is calling MPI, you will also have to handle the fact that you have a live MPI universe with open sockets and so on. Separating out the parts that you can/want to send from the parts you can't/don't want to send will be quite a mess. AND, even if you are able to serialize the entire state of the Python interpreter, you will still have to scatter it to all compute nodes (and deserialize it), which is what the shared filesystem is doing to begin with. While this scatter may take place over a faster interconnect, you won't be able to get rid of it.

Thus, in my mind, using a local disk is the only reasonable way to go. I realize it is likely that the local-disk solution is not an option for you. In that case, I think you should go back to Cray and ask for an upgrade ;-)

Cheers,

Brian
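If the compute nodes do happen to have some node-local storage, a minimal sketch of the "install on local disk" suggestion with mpi4py could look like the following. The archive path and /tmp target are made-up examples, the MPI-3 Split_type call is used only to pick one rank per node, and the interpreter plus mpi4py itself still have to come from the shared filesystem; this is an illustration of the idea, not a tested recipe.

    import os
    import sys
    import tarfile

    from mpi4py import MPI

    SHARED_ARCHIVE = "/shared/project/python-env.tar"  # hypothetical shared-FS path
    LOCAL_DIR = "/tmp/python-env"                      # hypothetical node-local path

    world = MPI.COMM_WORLD

    # Group ranks by node (MPI-3 shared-memory split) so exactly one rank
    # per node copies the packed Python environment to local storage.
    node = world.Split_type(MPI.COMM_TYPE_SHARED)
    if node.rank == 0 and not os.path.isdir(LOCAL_DIR):
        with tarfile.open(SHARED_ARCHIVE) as tar:
            tar.extractall(LOCAL_DIR)
    node.Barrier()  # wait until the local copy exists on this node

    # From here on, imports of the staged packages hit local disk,
    # not the shared filesystem.
    sys.path.insert(0, LOCAL_DIR)

The point of the per-node split is that the shared filesystem is read once per node rather than once per rank, which is where the start-up contention comes from.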
Is it possible to modify the Python interpreter in order to have a single process do the import and then broadcast the data to the rest of the tasks?
--
Nichols A. Romero, Ph.D.
Argonne Leadership Computing Facility
Argonne, IL 60490
(630) 252-3441 (O)
(630) 470-0462 (C)
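The single-reader idea in that last question can be sketched with mpi4py and a custom import hook: rank 0 locates and reads each plain .py module, then broadcasts the source so no other rank touches the shared filesystem. The sketch below uses today's importlib meta-path API (which postdates this thread) and is only an illustration of the idea, not GPAW's actual solution; packages and C extension modules deliberately fall through to the normal machinery, and every rank must perform the same imports in the same order or the collective calls will hang.

    import importlib.abc
    import importlib.machinery
    import importlib.util
    import sys

    from mpi4py import MPI


    class BroadcastLoader(importlib.abc.Loader):
        """Execute a module from source bytes that rank 0 broadcast."""

        def __init__(self, source, origin):
            self.source = source
            self.origin = origin

        def create_module(self, spec):
            return None  # use the default module object

        def exec_module(self, module):
            exec(compile(self.source, self.origin, "exec"), module.__dict__)


    class BroadcastFinder(importlib.abc.MetaPathFinder):
        """Only rank 0 searches the filesystem; other ranks receive the source.

        All ranks must import the same modules in the same order, otherwise
        the collective bcast calls will not match up.
        """

        def __init__(self, comm):
            self.comm = comm

        def find_spec(self, fullname, path=None, target=None):
            if self.comm.rank == 0:
                spec = importlib.machinery.PathFinder.find_spec(fullname, path)
                if (spec is not None and spec.origin
                        and spec.origin.endswith(".py")
                        and not spec.origin.endswith("__init__.py")):
                    with open(spec.origin, "rb") as f:
                        payload = (spec.origin, f.read())
                else:
                    payload = None  # packages, extension modules, not found
            else:
                payload = None
            payload = self.comm.bcast(payload, root=0)
            if payload is None:
                return None  # fall back to the normal import machinery
            origin, source = payload
            return importlib.util.spec_from_loader(
                fullname, BroadcastLoader(source, origin), origin=origin)


    # Install the hook; plain-module imports are now read once, on rank 0 only.
    sys.meta_path.insert(0, BroadcastFinder(MPI.COMM_WORLD))

The key design constraint is that find_spec becomes a collective operation, so this only works when all ranks follow the same import sequence, which is usually the case for SPMD codes of this kind.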
--
Brian E. Granger, Ph.D.
Assistant Professor of Physics
Cal Poly State University, San Luis Obispo
bgranger@calpoly.edu
ellisonbg@gmail.com