Geoffrey,
(1) How much data is read/written by the program? - After all the particles (3200^3 of them) are read in, they are linked with a Fortran KD-tree if they satisfy certain conditions.
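To make the linking step concrete: the real code uses a Fortran KD-tree to find neighboring particles efficiently, but the idea can be sketched with a brute-force search and a union-find over particle pairs closer than a linking length. The function name `link_particles` and the threshold are illustrative only, not taken from the actual halo finder.

```python
# Hypothetical illustration: group particles whose pairwise distance is
# below a linking threshold. The actual halo finder uses a Fortran KD-tree
# for the neighbor search; brute force is shown only to make the idea concrete.
import math


def link_particles(positions, linking_length):
    """Return groups of particle indices connected within linking_length."""
    n = len(positions)
    parent = list(range(n))  # union-find forest, one root per group

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Union every pair of particles closer than the linking length.
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) <= linking_length:
                parent[find(i)] = find(j)

    # Collect members of each connected group by their root.
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


# Two nearby particles and one distant one -> two groups.
pts = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 5.0, 5.0)]
print(sorted(map(sorted, link_particles(pts, 0.5))))  # -> [[0, 1], [2]]
```

A KD-tree replaces the O(n^2) pair loop with an O(n log n) neighbor query, which is what makes this feasible for 3200^3 particles.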
(2) How many parallel readers/writers are used by the program? - It reads using 512 cores from my submission script. The amount written to disk depends on the distribution of the particle haloes across processors; if haloes exist across processors, then more files are written out by write_particle_lists.

(3) Do you use MPI_IO, or something else? - Yes, the program uses mpi4py-1.2.2, installed in my home directory.

The details of the code can be found at: http://yt-project.org/doc/analysis_modules/running_halofinder.html#halo-find... under the section "Parallel HOP"
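For readers unfamiliar with the parallel-reading pattern above: with mpi4py, each task typically works out its own share of the input files from its rank, rather than using collective MPI_IO. The sketch below is a generic round-robin assignment, with assumed names (`assign_files`, `n_ranks`); it is not the actual reader, which is a custom HDF5 reader written in C.

```python
# Hypothetical sketch of dividing input files among MPI ranks.
# Rank r reads files[r::n_ranks]; no collective MPI_IO is involved.

def assign_files(filenames, n_ranks):
    """Round-robin assignment of data files to ranks."""
    return {rank: filenames[rank::n_ranks] for rank in range(n_ranks)}


# With mpi4py each task would look up its own share, e.g.:
#   from mpi4py import MPI
#   comm = MPI.COMM_WORLD
#   my_files = assign_files(all_files, comm.size)[comm.rank]
files = [f"data{i:04d}.h5" for i in range(8)]
shares = assign_files(files, 3)
print(shares[0])  # -> ['data0000.h5', 'data0003.h5', 'data0006.h5']
```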
The main thing you got wrong is that we do not use MPI_IO. The IO is done primarily through a custom HDF5 reader written in C, and each task does its own reading.

The issue that Pragnesh is probably seeing, and what Geoffrey alludes to, is how load balancing is done. Because of the details of how Enzo stores its data, it is difficult to know where to send the data for load balancing without reading it all in first. Out of convenience, once the layout is established, the data is read in again (instead of being distributed via communication), this time by the tasks that have been assigned it. Furthermore, the data as assigned may come from several files, meaning that each task will be opening/closing multiple files multiple times.

If all these IO calls are causing a problem, I could see about putting in some kind of IO wait (configurable by the user) that basically slows down the reading part of the process.

p.s. Geoffrey - what is the cosmological size of your box? If it's above about 300 Mpc/h, load balancing is probably not necessary, which should roughly halve the IO required.

--
Stephen Skory
s@skory.us
http://stephenskory.com/
510.621.3687 (google voice)
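The proposed "IO wait" could be as simple as a user-configurable pause between successive file opens, so that hundreds of tasks do not hit the filesystem simultaneously. A minimal sketch, with assumed names (`read_files_throttled`, `io_wait_seconds`) that do not come from the actual code:

```python
# Hypothetical sketch of a user-configurable IO wait: sleep between
# successive file opens to spread out metadata load on the filesystem.
import time


def read_files_throttled(filenames, read_one, io_wait_seconds=0.0):
    """Read each file via read_one(filename), pausing between opens if requested."""
    results = []
    for i, fn in enumerate(filenames):
        if io_wait_seconds and i > 0:
            time.sleep(io_wait_seconds)  # throttle successive opens
        results.append(read_one(fn))
    return results
```

In the real reader, `read_one` would be the C HDF5 routine; staggering the delay by rank would further desynchronize the tasks.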