Hi guys,

I brought this up a while ago and I'm not sure where we left it. I thought I needed it earlier, but it turned out I didn't, so I forgot about it. Now, however, I definitely need to get virial masses for L7 haloes, and a parallel halo profiler would make this a much more reasonable task.

I can probably figure out a way to do it myself (which would be a good exercise for me), but I don't want to step on any toes or do it in a bad or inextensible way. With that in mind, do any of you have suggestions?

I think the best approach is to parallelize similarly to HOP, using sub-volumes to keep memory usage low on each processor. Each process would then only run on the haloes in its sub-volume. This would also make things nicer if parallel HOP is run before the profiler, since the data wouldn't have to be read multiple times.

Thanks!

Stephen Skory <sskory@physics.ucsd.edu> | http://physics.ucsd.edu/~sskory/
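For concreteness, here is a minimal sketch of the sub-volume scheme being described, assuming mpi4py; the halo dicts and all function names are illustrative, not actual yt API:

```python
# Split the domain into one slab per MPI rank along x, and keep only
# the haloes whose centers fall inside this rank's slab.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

def my_subvolume(rank, size, domain_left, domain_right):
    """Slice the domain into `size` slabs along x; return this rank's bounds."""
    left = np.array(domain_left, dtype="float64")
    right = np.array(domain_right, dtype="float64")
    dx = (right[0] - left[0]) / size
    left[0] += rank * dx
    right[0] = left[0] + dx
    return left, right

def halos_in_subvolume(halos, left, right):
    """Keep haloes whose center lies within [left, right) in every dimension."""
    return [h for h in halos
            if all(left[i] <= h["center"][i] < right[i] for i in range(3))]

# Illustrative halo dicts; real halos would come from HOP output.
halos = [{"id": 0, "center": (0.51, 0.49, 0.50)},
         {"id": 1, "center": (0.10, 0.90, 0.30)}]
left, right = my_subvolume(rank, size, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
my_halos = halos_in_subvolume(halos, left, right)
```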
Hi Stephen,

I just chatted with Britton on the phone about this, and we came to a couple of conclusions:

* Right now, the halo profiler is about 10% away from running in parallel on *each* halo; i.e., each operation on the halos can be parallelized with a few more lines of work.
* That method of parallelization is a bad way to go, though.
* Volume decomposition is also not likely to be terribly useful with the current means of IO. At some point in the future this could and will change, but for now a round-robin approach is the way to go.
* To round-robin this, two things need to happen:
  * When projecting, we need to be able to set a flag to disable parallelization, a la '_processing'. I'll handle this; it should be 1-2 lines of code.
  * The ParallelAnalysisInterface needs to be generalized to provide iterators over any object, not just _grids. Only a couple of lines of code, but annoying.

At that point, HaloProfiler can subclass ParallelAnalysisInterface and use that as the iterator over halos. The IO needs to be looked at briefly, but it should be mostly okay as is. I think this can be done in a couple of hours.

However, I think the real problem is that we need to ensure that the serial halo profiler doesn't break. To that end, we should plan to apply Britton's sample parameter files to the RD0005-mine dataset and run that in both serial and in parallel as our testing mechanism.

My inclination is that we start working on this in isolation; I'll commit the changes I mention above tomorrow to the bitbucket yt repo, which I am also mirroring on hg.enzotools.org.

-Matt
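To make the round-robin idea concrete, here is a minimal sketch, assuming mpi4py; parallel_objects, profile_halo, and the halo dicts are all illustrative stand-ins, not the actual ParallelAnalysisInterface code:

```python
# Round-robin over an arbitrary object list: rank r processes items
# r, r+size, r+2*size, ...; an allgather at the end merges results.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

def parallel_objects(objects):
    """Yield only the objects this rank owns, round-robin style."""
    for i, obj in enumerate(objects):
        if i % size == rank:
            yield obj

def profile_halo(halo):
    """Stand-in for the real per-halo work (radial profiles, virial mass)."""
    return {"virial_mass": 0.0}

halo_list = [{"id": i, "center": (0.5, 0.5, 0.5)} for i in range(16)]

local_results = {}
for halo in parallel_objects(halo_list):
    local_results[halo["id"]] = profile_halo(halo)

# The "finalize" step: merge each rank's partial dict so every rank
# sees the complete set of results.
all_results = {}
for partial in comm.allgather(local_results):
    all_results.update(partial)
```

The appeal of this layout is that the work assignment needs no communication at all; only the final merge touches MPI.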
Matt,
> * Volume decomposition is also not likely to be terribly useful with the current means of IO. At some point in the future this could and will change, but for now a round-robin approach is the way to go.
Out of general curiosity, what aspect of the IO would have to change?
> My inclination is that we start working on this in isolation; I'll commit the changes I mention above tomorrow to the bitbucket yt repo, which I am also mirroring on hg.enzotools.org.
Thank you very much, Matt! I certainly wasn't asking you to do my work for me, but I had the feeling that what I wanted to do would be related to (or impinge upon) overarching plans you already had for yt. Let me know what I can do for you.

Stephen Skory <sskory@physics.ucsd.edu> | http://physics.ucsd.edu/~sskory/
> Out of general curiosity, what aspect of the IO would have to change?
Well, there are a couple of problems. The one I'll present as most problematic is the copying of arrays. We construct spheres to analyze halos, which then load data on demand. You could get around this either by running these as field cuts (in-line extracted regions) from the tiles, or by enabling preloading of data and disabling the 'pop' method in the DataQueue object. It's something to think about for later, but I'm not sure it's an issue right now. I think we should try implementing the round robin, then run cProfile on it to see if IO is the limiting factor.
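A quick way to do that profiling check might look like the following; run_profiler() is a hypothetical stand-in for however the HaloProfiler ends up being invoked:

```python
# Wrap the profiler run in cProfile and sort the output by cumulative
# time; IO-bound runs will show the read routines near the top.
import cProfile
import pstats

def run_profiler():
    pass  # stand-in for the actual HaloProfiler run

cProfile.run("run_profiler()", "halo_profiler.prof")
stats = pstats.Stats("halo_profiler.prof")
stats.sort_stats("cumulative").print_stats(20)  # top 20 by cumulative time
```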
> Thank you very much, Matt! I certainly wasn't asking you to do my work for me, but I had the feeling that what I wanted to do would be related to (or impinge upon) overarching plans you already had for yt. Let me know what I can do for you.
Will do. This is in support of a ticket I opened maybe a month or two ago, 192, which is really two-in-one.

-Matt
I've made my first pass at parallelizing the halo profiler. The writing-out stage still probably won't work, as all the procs will try to write out at the same time. You can see what I did in these two changesets:

http://bitbucket.org/MatthewTurk/yt/changeset/556490d86872/
http://bitbucket.org/MatthewTurk/yt/changeset/91933c49f17f/

(Note that I pushed a bunch more changesets where I fixed bugs I missed... oops! The main stuff is in the two above.)

I changed the GridIterator to be a general object iterator, and the halo profiler now runs with this. The halo profiler won't work just yet; what needs to be done is to change hopHalos into a dict keyed by the halo ids, and then it'll be a lot closer. Maybe Britton could take a look at where it is and suggest whether it's almost ready? I think it was close to nearly there already.

Specifically, I suspect we'll need multiple round-robins over the halos with the current workflow in the HaloProfiler, so we'll have to use arguments to initialize_parallel and finalize_parallel to govern what gets done. (We probably only need a finalize_parallel.)

-Matt
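A minimal sketch of the hopHalos-as-dict change and a finalize_parallel merge might look like this; the class, its methods, and the halo fields are all assumed for illustration, not the real yt code:

```python
# Sketch: hopHalos keyed by halo id, round-robin work assignment, and a
# finalize_parallel step that merges each rank's processed halos.
from mpi4py import MPI

class ParallelHaloProfiler:
    """Illustrative stand-in for a HaloProfiler subclass, not real yt code."""

    def __init__(self, halos, comm=MPI.COMM_WORLD):
        # hopHalos as a dict keyed by halo id, as suggested above.
        self.hopHalos = {h["id"]: h for h in halos}
        self.comm = comm

    def _my_halos(self):
        # Round-robin assignment: rank r takes every size-th halo.
        rank, size = self.comm.Get_rank(), self.comm.Get_size()
        for i, hid in enumerate(sorted(self.hopHalos)):
            if i % size == rank:
                yield self.hopHalos[hid]

    def run(self):
        processed = {}
        for halo in self._my_halos():
            halo["virial_mass"] = 0.0  # stand-in for the real profiling work
            processed[halo["id"]] = halo
        self.finalize_parallel(processed)

    def finalize_parallel(self, processed):
        # Merge the halos each rank actually processed back into hopHalos;
        # ids are disjoint across ranks, so a simple update suffices.
        for partial in self.comm.allgather(processed):
            self.hopHalos.update(partial)

halos = [{"id": i, "center": (0.5, 0.5, 0.5)} for i in range(8)]
hp = ParallelHaloProfiler(halos)
hp.run()
```

A rank-0-only write after finalize_parallel (e.g. guarded by `if comm.Get_rank() == 0:`) would be one way to address the concurrent write-out problem mentioned above.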