Why do halo objects need their particles?
Hi all, This week at the HIPACC Summer School, Chris Moody and I have been looking at finishing up Rockstar (rockstar.googlecode.com) integration. The current status is that Rockstar *works*, and Chris has added on TimeSeries support for it, but it alas only works for single-mass simulations. (The author of Rockstar has provided some suggestions about how to add multi-mass support.) Rockstar does have an interesting method of calculating halo properties. Specifically, you add in variables and whatnot to a struct and then insert your own routines into the C code, recompile, rerun, and get additional results. The way we've set up Rockstar integration, we could actually supply a generic Python function, and have it fill in a set of values that are then calculated on a halo. I think this is where we want to go with the Rockstar stuff -- you specify a set of functions you want calculated on a halo, these get calculated, you get back the halo object and the particles are flushed from memory. What this brings up in my mind is, why do we hold on to the particles at all? When you run the halo finders in yt, the returned halo objects keep all the particles that belong to each halo. This lets the user specify, later on, which analysis to run. But, it also leads to a *lot* of memory management, parallel-distribution of objects (in a way that's different than we do now), and really I think doesn't add a *huge* amount of value. Nearly all the remaining analysis that gets performed either requires you to dump the particle IDs to disk anyway (merger trees) or could be performed for very little cost inline. So if we think about this the way Christine, Stephen and everyone was talking about the other day, imagine you had a halo finder that was run like this: catalog = HaloFinder(ts, analysis = ["CenterOfMass", "MaximumRadii", "EllipsoidalParameters"]) The halo finder would go out, perform the analysis on each dataset in the time series, and then return to you the full tree. But where this would be different than before is that the halos would have the properties calculated ahead of time, and the particles themselves would be gone. They'd be far more manageable, and we could discard nearly *all* of the complexity in the existing halo objects. You could write your halo analysis routines *nearly identically* to the way you'd write your field definitions: def MaximumRadii(halo, pf): dx = halo["x"] - halo.properties["CenterOfMass"] ... return max_r The downside would be that to add on any analysis that requires *raw particle data*, you would either have to write out the particle IDs or information to disk, or re-run the halo finder. But honestly, I think this would probably be just fine to do. And the vast improvements to the infrstructure would probably be worth it. For instance, we would probably no longer need to subclass the halo objects, and we would be able to get rid of "claiming" on a processor-by-processor basis. Thoughts? Stephen, what do you think? -Matt
participants (3)
-
Christopher Moody
-
Matthew Turk
-
Stephen Skory