
Hi Geoffrey, Britton's done a very nice job of summarizing ways of approaching this, but I've got a couple tips as well.
My plan is to (if possible): 1) Create a hierarchy of clumps, based on their level of ionization instead of topology like in the cookbook find_clumps()
I think this is a misunderstanding. There are several related things here. Extracted Regions are any collection of cells, connected or not, represented as a single object in YT. This could for instance, be all the gas above a given density and temperature, or above an ionization level, etc etc. This type of object is generated by creating an object and then calling *either* .cut_region or .extract_region. cut_region works inline and accepts a list of filters: new_region = sphere.cut_region( ["grid['Density'] > 1e-18", "grid['Temperature'] > 1000"] ) Whereas extract region accepts a set of indices (and thus, extract_region is less memory conservative, whereas cut_region can operate before any data has been read): new_indices = (sphere["Density"] > 1e-18) & (sphere["Temperature"] > 1000) new_region = sphere.extract_region(new_indices) The cookbook uses the Clump object (which Britton wrote) which sits as a layer on top of a method I mostly wrote for extracting level sets. What the level set extraction method (contour finding) does is look for sets that satisfy the criteria, then split them up into individual objects that are required to be topologically connected. So I think what you want is to either use the Clump method or directly extract connected sets yourself. The extracted sets does not place any requirements on the number of children or the number of parents and it doesn't do any connection of one set to another. It accepts a field that one wants to extract level sets in, a min and a max for the contour finding, and the number of levels between those min and max values. It's accessible through the extract_connected_sets method: sphere.extract_connected_sets(field, num_levels, min_ val, max_val) It returns a set of the bounds and the objects themselves. But, like I said above, it doesn't do anything like the Clump does, with the parent/child relationship. "Connected" status is independent of the field over which you contour. :)
2) Calculate a global clumping factor from regions inside these clumps
Use a derived quantity like Britton suggested; the docs should describe how to do this.
3) Find the location(x,y,z) of the peaks of HI_Density and HI/H (ionized fraction) and the value of Density in those regions
MaxLocation is what you want. http://yt.enzotools.org/doc/advanced/derived_quantities.html#using-derived-q...
4) Find the volume inside each clump region
region.quantities["TotalValue"]("CellVolume")
5) From the hierarchy of clumps, create their merger history
Dave Collins has written something that does this. However, because you're looking at ionized regions overlapping (which doesn't quite work the same way as a moving set of overdensities) you don't need to do it quite that complicatedly. My suggestion is to use a one (since it's unigrid) or three dimensional (since it's easier) index for every point in each clump, then do pairwise comparisons using numpy's intersect1d function, across timesteps.
6) Save the clump information in such a way that I can come back to it if I find something else that's interesting to analyze about them
cPickle or the hierarchy's save_object function. Let us know if there's anything else! -Matt