Hi Stephen,

Could you quickly summarize the following two points for us? I want -- very much -- to eliminate the SS_HopOutput.py file and move its contents into the old location, but we *cannot* do so until these are answered satisfactorily. You are the one in the best position to answer them. :)

1. What is the status of determining, automatically, the necessary padding? I have suggested that this could be some small multiple of the root grid dx, but I don't recall hearing back from you. (I believe you have been using on the order of 6 root grid cells as padding.)

2. Are there any outstanding bugs in SS_HopOutput that you have found that would prevent it from replacing the old one and working in both serial and parallel? (I have found none.)

Thanks very much!

-Matt
Matt,
1. What is the status of determining, automatically, the necessary padding? I have suggested that this could be some small multiple of the root grid dx, but I don't recall hearing back from you. (I believe you have been using on the order of 6 root grid cells as padding.)
It depends on the size of your cosmological volume, not the gridding. Smaller volumes need bigger padding, but I've discovered that 'a little goes a long way.' The logic is that you want any object to exist fully in at least one of the padded subvolumes. So if your biggest object is a 200 kpc halo in a 20 Mpc volume, you need at least 0.01 padding (in simulation units). For this example, I'd set the padding to 0.02 or 0.03 to be conservative. Is there a simple way to estimate the largest object one can expect in a given cosmological volume?
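(For concreteness, a minimal sketch of that rule of thumb; the function name, the safety factor, and the 200 kpc / 20 Mpc numbers are just illustrations of the argument above, not anything in yt.)

def estimate_padding(largest_halo_size, box_size, safety=2.5):
    """Padding in simulation (code) units.

    largest_halo_size, box_size : physical sizes in the same units (e.g. Mpc).
    safety : multiplies the bare ratio to be conservative.
    """
    return safety * largest_halo_size / box_size

# 200 kpc halo in a 20 Mpc box: bare ratio 0.01, padded to 0.025.
print(estimate_padding(0.2, 20.0))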
2. Are there any outstanding bugs in SS_HopOutput that you have found that would prevent it from replacing the old one and working in both serial and parallel? (I have found none.)
I haven't found any either. In my opinion it's good to go.
It depends on the size of your cosmological volume, not the gridding. Smaller volumes need bigger padding, but I've discovered that 'a little goes a long way.' The logic is that you want any object to exist fully in at least one of the padded subvolumes. So if your biggest object is a 200 kpc halo in a 20 Mpc volume, you need at least 0.01 padding (in simulation units). For this example, I'd set the padding to 0.02 or 0.03 to be conservative. Is there a simple way to estimate the largest object one can expect in a given cosmological volume?
I personally feel we should allow the user to specify the minimum percentage of the matter enclosed in their box, and provide a relatively low value for this. From this, a length scale drops out, based on the total number of particles and the domain size. We cannot assume that the domain runs from 0 to 1 or that the three axes are identical.
I haven't found any either. In my opinion it's good to go.
Once we've addressed point 1, I agree. -Matt
I personally feel we should allow the user to specify the minimum percentage of the matter enclosed in their box, and provide a relatively low value for this. From this, a length scale drops out, based on the total number of particles and the domain size. We cannot assume that the domain runs from 0 to 1 or that the three axes are identical.
That's an OK solution for the dimensional problems you raise.
Hi Stephen and others,

After my first email, I realized that we should not require the domain to be equal along each axis, but I do think we can require the dx's to be identical. This breaks a bit of generality, but I'm okay with that. Domains, I think, we should keep general.

I propose that we accept a parameter:

minimum_halo_mass

which we can use to come up with padding by:

rho = total_mass / product(root_grid_dimensions)

This gives us a mass per root grid cell. (Note that this is also the mean dark matter density of the universe, if we only have DM particles. But HOP is more general than just DM, so we do not make this assumption.) We should have padding such that it over-resolves our minimum mass by some factor B:

padding = B * root_dx * minimum_halo_mass / rho

The reason I'm looking at this in terms of root grid cells is the typical sync between root grid cells and the particles; Brian's done some work with particle dimensions differing from the hydro dims, but I think what we can consider is that even in that case, the two are *correlated* by some multiple of RefineBy. The same is true for simulations with "zoom" particles.

Maybe I'm on the wrong track, but does this make sense to anyone else? Brian, Britton?

-Matt
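(To make the bookkeeping concrete, a minimal sketch of that calculation; the function and argument names, and the value of B, are illustrative assumptions, not anything already in yt.)

import numpy as np

def compute_padding(total_mass, root_grid_dimensions, domain_width,
                    minimum_halo_mass, B=2.0):
    """Padding in code units, per the rho / minimum_halo_mass argument above."""
    dims = np.asarray(root_grid_dimensions, dtype=np.float64)
    # Mass per root grid cell (the mean density, in units of mass per cell).
    rho = total_mass / np.prod(dims)
    # Root grid cell width along each axis; the domain need not run 0..1,
    # and the three axes need not be identical.
    root_dx = np.asarray(domain_width, dtype=np.float64) / dims
    # Over-resolve the minimum halo mass by a factor B.
    return B * root_dx * minimum_halo_mass / rho

# Example: 64^3 root grid on a unit cube, minimum halo carrying three
# root-grid cells' worth of the mean mass -> padding of about 0.094.
print(compute_padding(1.0, (64, 64, 64), (1.0, 1.0, 1.0), 3.0 / 64**3))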
Anybody have any thoughts on this? I'd like to wrap this up in the next four days, if possible.
Matt,
Anybody have any thoughts on this? I'd like to wrap this up in the next four days, if possible.
Your minimum_halo_mass parameter idea seems fine to me. Like I've said before, all that's needed is a bit of padding, dependent on the size of the biggest object, which is of course loosely dependent on its mass. I think all it would take is a bit of testing, similar to what I've done before, to make sure the ratios work out to give enough padding.
I'm not sure I have any better ideas than what's been presented. Matt's padding formula sounds as good as anything. Since we don't really have a good idea of how to define the padding from first principles, maybe it would be helpful to do a small convergence study where we run some unigrids with varying box size and topgrid resolution, and then vary this padding parameter. Maybe from that we will be able to draw up some crude formula that will help people choose the right value for this parameter. I'm about to do a bunch of unigrid runs for my own work, so I would be willing to do some of this.

Forgive my ignorance, but is there a sample script or something somewhere on how to use yt_hop in parallel?
Britton,
Forgive my ignorance, but is there a sample script or something somewhere on how to use yt_hop in parallel?
Here is the script I use to run parallel hop and dump all the halo data to HDF5 files; it will also write out a HopAnalysis.out file. Contained within this script is pretty much all you need, I think. Let me know if something is wrong!

http://paste.enzotools.org/show/57/
Okay, so then I see two things we need to do:

* Decide on a minimum halo size. Then we can move everyone over to HaloFinder, rather than HopGroup directly. (HopGroup will still be available, but HaloFinder by default works on the entire dataset, so we should encourage that.)

* Add the HDF5 write functions to the Halo objects.

The first one is the only one we need to do before moving SS_HopOutput.py over HopOutput.py. Britton, I've made a new paste that shows the simplest usage of the new halo finder:

http://paste.enzotools.org/show/58/

Let's decide on a minimum volume and then move it over. The parameter will be adjustable, but we'll make the *halo size* the argument, rather than the padding; this keeps us tied to the physical arguments, rather than the code abstractions. Once we've moved it over, we'll add the HDF5 halo write-outs to the Halo groups themselves. If anyone has to manually figure out which processor an object resides on, we have failed.

-Matt
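(For reference, a guess at what that "simplest usage" looks like; the dataset path is a placeholder and the loading call is an assumption about the API of the time, not the contents of the paste above.)

from yt.mods import *

# Placeholder dataset name; substitute your own parameter file.
pf = load("DD0010/DD0010")

# HaloFinder works on the entire dataset by default, decomposing it in parallel.
halos = HaloFinder(pf)
halos.write_out("HopAnalysis.out")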
Ah, sorry --
there is an obvious typo where I did not change the argument of write_out to remove the format string for a variable that is no longer there. Apologies. -Matt
All,
* Add the HDF5 write functions to the Halo objects.
When it comes to this, I think there should be a choice to write either in 'parallel' to multiple HDF5 files, which is very fast but less convenient, or in 'serial' to one HDF5 file, which is very slow but handy. As an example, writing out HOP data from a 512^3 particle dataset (L7) takes less than 10 minutes in parallel mode, but over five hours in serial mode. At least it takes that long on Ranger, and some of that may be due to pytables and very large datasets.

Do we think it's a good idea to include a text file with a parallel write, like for packed AMR Enzo, that lists the location of each halo dataset in the .cpu files? The situations I can think of in which one would only want a subset of the haloes are when one wants a specific halo, or haloes from a physical region of the box. Since the box is spatially decomposed, it may also be useful to record the boundaries of each .cpu file in a simple way.
When it comes to this, I think there should be a choice to write either in 'parallel' to multiple HDF5 files, which is very fast but less convenient, or in 'serial' to one HDF5 file, which is very slow but handy. As an example, writing out HOP data from a 512^3 particle dataset (L7) takes less than 10 minutes in parallel mode, but over five hours in serial mode. At least it takes that long on Ranger, and some of that may be due to pytables and very large datasets.
I'd like for halos to have only one method, receiving the file handle. The finder can also have a single method. When run in parallel this will distribute file handles based on the processor; when run in serial, just one will be distributed, to each in turn. The finder method will be a parallel transaction, blocking across processors at the end; the halo method will not.
Do we think it's a good idea to include a text file with a parallel write, like for packed AMR Enzo, that lists the location of each halo dataset in the .cpu files? The situations I can think of in which one would only want a subset of the haloes are when one wants a specific halo, or haloes from a physical region of the box. Since the box is spatially decomposed, it may also be useful to record the boundaries of each .cpu file in a simple way.
Writing out the cpu file is not a bad idea. Two-column ASCII, written out by the root processor, listing halo id and filename. The boundaries file would probably be seven columns: cpu, left edge, and right edge (without padding).

I'll implement these methods today.

-Matt
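(Below is a rough, self-contained sketch of that write pattern, using h5py rather than pytables purely for brevity; the Halo class, file names, and function names are illustrative assumptions, not the actual yt objects or methods.)

import h5py
import numpy as np

class Halo(object):
    def __init__(self, halo_id, indices, masses):
        self.id = halo_id
        self.particle_indices = np.asarray(indices, dtype=np.int64)
        self.particle_masses = np.asarray(masses, dtype=np.float64)

def write_local_halos(halos, rank):
    """Write this processor's halos to its own HDF5 file ('parallel' mode)."""
    filename = "HopHalos_%04d.h5" % rank
    with h5py.File(filename, "w") as f:
        for halo in halos:
            grp = f.create_group("Halo%08d" % halo.id)
            # Keep particle indices as integers, not floats.
            grp.create_dataset("particle_index", data=halo.particle_indices)
            grp.create_dataset("ParticleMassMsun", data=halo.particle_masses)
    return filename

def write_halo_map(owners, out="HopHaloMap.txt"):
    """Two-column ASCII map (root processor only): halo id and its file."""
    with open(out, "w") as m:
        for halo_id, rank in sorted(owners.items()):
            m.write("%d HopHalos_%04d.h5\n" % (halo_id, rank))

# Serial example standing in for a single MPI rank:
halos = [Halo(0, [1, 2, 3], [1.0, 1.0, 1.0]), Halo(1, [4, 5], [2.0, 2.0])]
write_local_halos(halos, rank=0)
write_halo_map({0: 0, 1: 0})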
I've checked in my first attempt at this in r1188. I've also (I believe) fixed the particle-index-as-float problem. I'd like any non-fields to be added as attributes; you can see where I've added a little stub for that. I've not yet tested in parallel (doing that now).
Participants (3):

- Britton Smith
- Matthew Turk
- Stephen Skory