Hi Stephen,

I was wondering how parallel HOP is going? I've got some big datasets I'd like to run it on, but I haven't heard if it's working as expected or not. What's the current status? Are the results converged? Do we have a good idea of how to pad the tiles? I'll probably use this information to update ticket #163.

-Matt
I was wondering how parallel HOP is going? I've got some big datasets I'd like to run it on, but I haven't heard if it's working as expected or not. What's the current status? Are the results converged? Do we have a good idea of how to pad the tiles?
I got distracted from some of my testing (sorry!), so my opinions aren't fully set yet. I'd say that for now, if you give it a 'healthy' amount of padding (larger than your largest object), you can do science with it right now. The changes between serial and parallel HOP are smaller than the fuzziness of the halo boundaries themselves, so go with it!

--Stephen
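In sketch form, that kind of padded parallel run looks something like the following (the dataset path is hypothetical, and the exact load call and HaloFinder signature may differ in your yt revision; padding here is in units of the box width):

    from yt.mods import *

    # hypothetical output; substitute your own dataset
    pf = EnzoStaticOutput("RD0005-mine/RedshiftOutput0005")

    # make the padding larger than the largest object you expect, so no
    # halo straddles a tile boundary without enough overlap
    halos = HaloFinder(pf, padding=0.1)
    halos.write_out("HopAnalysis.out")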
I was wondering how parallel HOP is going? I've got some big datasets I'd like to run it on, but I haven't heard if it's working as expected or not. What's the current status? Are the results converged? Do we have a good idea of how to pad the tiles?
I got distracted from some of my testing (sorry!), and so my opinions aren't fully set yet. I'd say for now if you give a 'healthy' amount of padding, so larger than your largest object, you can do science with it right now. The changes between serial and parallel HOP are smaller than the fuzziness of halo boundaries themselves, so go with it!
Could you quantify that a bit, Stephen? I definitely agree with the point about the fuzziness of halo boundaries, but how far off are the results when you vary processor count? If you were to make halo catalogs of the same dataset using 1, 2, 4, 8, etc. processors, and then compare them (assuming that the n=1 processor run is the "perfect" solution), how far off are the halo centers?

If you haven't done this particular comparison, I'm happy to do it. I have some code lying around from the Enzo/Gadget code comparison that would be perfect for this task, if you want to generate the halo catalogs and give 'em to me.

--Brian
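A minimal sketch of such a comparison: take the 1-processor catalog as the reference, match each halo in the other catalogs to its nearest reference halo by center, and report the mass ratio and center offset. The HopAnalysis_*.out filenames and the column layout (mass in column 1, center in columns 2-4) are assumptions; adjust to whatever the catalogs actually contain.

    import numpy as np

    def load_catalog(fn):
        # assumed layout: one halo per row, mass in column 1, center in columns 2-4
        data = np.loadtxt(fn)
        return data[:, 1], data[:, 2:5]

    ref_mass, ref_cen = load_catalog("HopAnalysis_1proc.out")
    for nproc in (2, 4, 8):
        mass, cen = load_catalog("HopAnalysis_%dproc.out" % nproc)
        for m, c in zip(mass, cen):
            # match to the nearest n=1 halo center (no periodic wrap here)
            d = np.sqrt(((ref_cen - c) ** 2).sum(axis=1))
            j = d.argmin()
            print("%d procs: mass ratio %.4f, center offset %.2e" %
                  (nproc, m / ref_mass[j], d[j]))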
All,
Could you quantify that a bit, Stephen? I definitely agree with the point about the fuzziness of halo boundaries, but how far off are the results when you vary processor count? If you were to make halo catalogs of the same dataset using 1,2,4,8, etc. processors, and then compare them (assuming that n=1 processor is the "perfect" solution), how far off are the halo centers?
If you're interested, I've made a parameter-space survey of processor counts and padding, here, with lots of pictures:

http://stephenskory.com/research/?p=1469

It's password protected. Contact me off-list if you want it.

But in summary, I found that parallel HOP differs from serial by no more than 1% in absolute particle count, and the change is larger in smaller haloes, which are already vague to begin with. Even for very large objects (relative to the box) a little bit of padding goes a long way: a padding of 0.05 is nearly identical to a padding of 0.2. However, some padding is still required to get good answers. The centers change by very, very little between serial and parallel.

--Stephen
Hi Stephen,

Awesome. I am going to see if I can replicate your methodology over here sometime this weekend, because this is great stuff and it should get written up.

Looks like basically the entire analysis workflow is now parallelized -- derived quantities, projections, halo profiling, radial & phase plots, I think even slices, and HOP. The only major component I see as missing is clump finding, and I have an idea for that, but for now it should be parallelized on the iterate-over-halos level.

This is tremendous.

-Matt
Hi everyone,

Sorry to pollute the inboxes. Thing is, I think that maybe I want to dial back my enthusiasm. I think what has been sitting poorly with me is that we don't know *why* the particles are disappearing, and where they are going. If they exist within the padding region, then they should still show up.

I'm going to run some of my own tests on the RD0005-mine dataset you gave me, Stephen, and see what I come up with. I want to figure out *why* it's doing what it's doing.

-Matt
Okay, after some off-list discussion, I'm bringing it back here.

I agree, the differences are not *necessarily* important. But they are dependent on tilings, and I don't like that. The way I see it, there are a few possibilities here. But let me start out by saying, I do not think we should present this unless we know *where* the differences come from; we should at least have an idea what's going on.

The HOP algorithm, from my reading of the paper, works in a few steps:

1. Associate with each particle a density.
2. Trace all particles to the 'maximum' density among their neighbors; all particles whose densest nearest neighbor is a given particle are a "group."
3. Remove all particles below the density threshold.
4. Groups sharing "sufficiently dense" boundaries are re-joined.

So I'd like to identify where in the process the particle results change. This will likely -- by necessity -- include some patches to change the memory structure. It is not clear to me that identifying the divergence is a trivial matter.

Stephen, if you could put your data up somewhere, along with the scripts you used to generate that blog post, we can get going on this in earnest.

-Matt

PS Are we sure none of this is related to periodicity? HOP is scale free, so I would suspect it wraps around the local tile.
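To make those four steps concrete, here is a toy, serial transcription of them; it is not the actual HOP/EnzoHop code (the density estimate is a crude nearest-neighbor one and the merge step is omitted), and it mainly serves to flag the places where a tile decomposition could change the answer.

    import numpy as np

    def hop_sketch(pos, mass, nhop=16, threshold=80.0):
        """pos: (N, 3) positions in box units; mass: (N,) particle masses."""
        n = len(pos)
        nbrs = np.empty((n, nhop), dtype=int)
        dens = np.empty(n)
        # 1. associate a density with each particle, estimated from its
        #    nhop nearest neighbors (brute force, and with no periodic
        #    wrap -- one obvious place a tiled run could diverge)
        for i in range(n):
            d2 = ((pos - pos[i]) ** 2).sum(axis=1)
            idx = d2.argsort()[:nhop]
            nbrs[i] = idx
            r = np.sqrt(d2[idx].max())
            dens[i] = mass[idx].sum() / (4.0 / 3.0 * np.pi * r ** 3)
        # 2. hop each particle to the densest of its neighbors until it
        #    reaches a local maximum; particles sharing a maximum are a group
        densest = np.array([nbrs[i][dens[nbrs[i]].argmax()] for i in range(n)])
        group = np.empty(n, dtype=int)
        for i in range(n):
            j = i
            while densest[j] != j and dens[densest[j]] > dens[j]:
                j = densest[j]
            group[i] = j
        # 3. remove all particles below the density threshold
        group[dens < threshold] = -1
        # 4. (not shown) re-join groups whose shared boundaries are
        #    "sufficiently dense" -- the step most sensitive to tile edges
        return group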
All,
So I'd like to identify where in the process the particle results change. This will likely -- by necessity -- include some patches to change the memory structure. It is not clear to me that identifying the divergence is a trivial matter.
As your PS alluded to, I'd suggest we start looking at periodicity as a possible source, simply because it's the easiest. The next step would then be to discover at which point in the process the divergence is happening. It would be instructive to be able to easily track the linked-to neighbor for each particle on the first pass, for example. What part of this would you like me to do, if any, Matt and Brian?
Stephen, if you could put your data up somewhere, along with the scripts you used to generate that blog post, we can get going on this in earnest.
Below you can find a link to download the dataset and the script I used to run it:

http://stephenskory.com/research/?p=1538

--Stephen
As your PS alluded to, I'd suggest we start looking at periodicity as a possible source, simply because it's the easiest.
I will commit a revision tonight to parameterize the periodicity. I think in the parallel halo finder, because we use boxes that we *manually* make periodic, we always want it off.
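A small illustration of why that flag matters (the periodic keyword and its plumbing here are hypothetical, not the actual yt signature): with periodicity on, distances wrap around the local box, which is wrong once the sub-tiles have already been made periodic by hand.

    import numpy as np

    def separation(a, b, box=1.0, periodic=True):
        d = np.abs(np.asarray(a) - np.asarray(b))
        if periodic:
            d = np.minimum(d, box - d)   # minimum-image convention
        return np.sqrt((d ** 2).sum())

    a = [0.02, 0.5, 0.5]
    b = [0.97, 0.5, 0.5]
    print(separation(a, b, periodic=True))    # ~0.05: wraps around the box
    print(separation(a, b, periodic=False))   # 0.95: what a hand-padded tile needs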
It would be instructive to be able to track easily the linked-to neighbor for each particle on the first pass, for example.
Agreed. I'll set this up, too. For this part I will likely distribute patches via the pastebin.
What part of this would you like me to do, if any, Matt and Brian?
I think the most effective thing would be if I set up the code parts to get the data out, and then we can all go nuts on it. I'll do that tonight, after I get home. This afternoon I will likely be writing the entire time, so I won't be doing any of this until after dinnertime. (Especially since I am without BSG viewing this week until tomorrow. Boo.)

Thanks for the data & script!

-Matt
All,

I think I've figured out the particle deficits. In the parallel HOP routine, there is a total_mass calculation that adds up the mass of the particles on all the procs with padding=0.0 temporarily. If you run parallel HOP with padding 0.0, and add up the numbers of particles going to each proc, you'll get *more* than the actual number of particles. Quoting Matt: "This is a result of adding grid['dx'] onto the selection criteria for cells." So total_mass is being calculated with too many particles, which changes the density adjustments that need to go into each run of HOP.

Here are pictures and a more detailed explanation:

http://stephenskory.com/research/?p=1544

On the negative side, the periodic particles are still not being plotted, which is odd since they appear to be included in the haloes with the total_mass fix.

So we need to make the total_mass calculation correct.

--Stephen
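A one-dimensional toy of that overcount (the cell size and the four-tile split are made up for illustration): when each tile's selection is widened by half a cell on each side, particles near an interior tile edge are claimed by two tiles at once, so the summed per-tile counts, and hence total_mass, come out too large.

    import numpy as np

    np.random.seed(0)
    x = np.random.random(100000)          # particle positions in a unit box
    dx = 1.0 / 64                         # root grid cell width
    edges = np.linspace(0.0, 1.0, 5)      # four tiles, padding = 0.0

    # selection with half a cell of slop on each edge (the grid['dx'] criterion)
    loose = sum(((x >= le - 0.5 * dx) & (x <= re + 0.5 * dx)).sum()
                for le, re in zip(edges[:-1], edges[1:]))
    # strict selection: each particle belongs to exactly one tile
    strict = sum(((x >= le) & (x < re)).sum()
                 for le, re in zip(edges[:-1], edges[1:]))

    print(loose, strict, len(x))          # loose > 100000, strict == 100000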
Stephen,

Sorry for being dense; if you change (from latest SVN, you will have to find the lines yourself in an older revision) lines 1616-1621 from:

(grid['x'] + 0.5 * grid['dx'] > self.left_edge[0])

to

(grid['x'] + 0.0 * grid['dx'] > self.left_edge[0])

(and so on, so that all of them are non-inclusive) do you get identical *textual* results over your suite of tests?

Would it be possible to impose upon you to run this on 1, 2, 8, 16 processors, with some fixed padding, and then send the HopOutput.txt from those four to the list? Plots would be added goodness, but not necessary, since it seems there's a glitch. (You should feel free to attach the plots to the emails, too. :)

Thanks!

-Matt
Matt,
(grid['x'] + 0.5 * grid['dx'] > self.left_edge[0])
to
(grid['x'] + 0.0 * grid['dx'] > self.left_edge[0])
(and so on, so that all of them are non-inclusive)
do you get identical *textual* results over your suite of tests?
Would it be possible to impose upon you to run this on 1, 2, 8, 16 processors, with some fixed padding, and then send the HopOutput.txt from those four to the list?
All of these things have been done. I've attached the text files, where the MattFix outputs use Matt's change above and the by-hand outputs are from my total_mass fix done by hand. The only difference between these is the value of 'max_dens,' which changes due to box size; all other values are identical for each run.

--Stephen
All of these things have been done. I've attached the text files, where MattFix outputs are using Matt's request above, by-hand is the output from my total_mass fix done by hand. The only differences between these is the value of 'max_dens,' which changes due to box size, but all other values are identical for each run.
Awesome. So we just need to rescale the value of max_dens *back*, which I will check into, and then we're set. I will add a new subclass of AMRRegionBase that is AMRRegionStrictBase, which does not include the +0.5[dx]. (I happen to think that we -- in general -- want the *current* behavior, but for this clearly we do not.) This will be the returned class for all domain decomp.

The files you sent differ on the scale of two particles; I will see if I can pick out which two those are, and then IF we are able to get that set correctly, that would be an amazing bonus. If NOT, I think I am prepared to accept it.

Thanks!

-Matt
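These are not the real yt classes, just the shape of the change described above: a toy base region that keeps half a cell of slop by default, and a "strict" subclass, for the domain decomposition, that takes the edges literally.

    import numpy as np

    class ToyRegion(object):
        _dx_pad = 0.5                      # default: half a cell of slop
        def __init__(self, left_edge, right_edge):
            self.left_edge = np.asarray(left_edge)
            self.right_edge = np.asarray(right_edge)
        def cut_mask(self, pos, dx):
            pad = self._dx_pad * dx
            return np.all((pos + pad > self.left_edge) &
                          (pos - pad < self.right_edge), axis=1)

    class ToyRegionStrict(ToyRegion):
        _dx_pad = 0.0                      # strict: no +0.5*dx on the edges

    pos = np.array([[0.249, 0.5, 0.5], [0.30, 0.5, 0.5]])
    print(ToyRegion([0.25, 0., 0.], [0.5, 1., 1.]).cut_mask(pos, 0.01))
    print(ToyRegionStrict([0.25, 0., 0.], [0.5, 1., 1.]).cut_mask(pos, 0.01))
    # the loose region claims the particle at x=0.249; the strict one does not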
Okay, I've gone ahead and made the changes I mentioned. Now the domain decomposition gives "strict" regions, which do not add on any dx padding. I also noticed that the periodic ones were giving a *full* dx of padding (I changed this to be 0.5 dx of padding in the default, 0 in a strict periodic region), which may have been the source of the other particle changes.

If you have it set up to go, and run again, that'd be sweet. I assert that now you should not need to make any changes to the dx padding, as they should all be done correctly in ParallelTools.py and BaseDataTypes.py. If not, I'll do it myself tonight. :)

Very close here...

-Matt
If you have it set up to go, and run again, that'd be sweet. I asset now you should not need to have any changes to the dx padding, as they should all be done correctly in the ParallelTools.py and BaseDataTypes.py. If not, I'll do it myself tonight. :)
I did another set of 2, 4, 8, 16 proc runs with 0.2 padding using r1143, and I got results identical to my hand-corrected total_mass runs. I think we can rest easy that it's almost the same as serial!

--Stephen
Okay, so we're looking at one particle different in the most massive halo, two in one of the less massive ones between 4 & 2 processors. What do the rest of you think? I am torn between being anal about this and just saying that's not a big deal.
Okay, so we're looking at one particle different in the most massive halo, two in one of the less massive ones between 4 & 2 processors. What do the rest of you think? I am torn between being anal about this and just saying that's not a big deal.
One or two parts in 260,000 is pretty darn good, so I am inclined to say it's not a big deal.

I've been bouncing ideas around in my head as to why I think this kind of variance is unavoidable. If you take a perverse situation where there is only one particle in a subbox, clearly determining its overdensity is ridiculous. By the same token, even for subboxes with more reasonable numbers of particles, the overdensity is not as well determined as it is for the whole box. I think subdividing the whole introduces error.

Uh, I dunno; at any rate I think we're at the point of diminishing returns.

--Stephen
Alright, I'm on board.

Next step is some determination of the padding. I think this likely should be done in terms of root grid dimensions; it seems we've had some good success with low integer multiples of (DomainRightEdge-DomainLeftEdge)/RootGridDimensions. As I recall, you said 0.02 was sufficient for your 64^3 run? Or am I misremembering?
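A sketch of that heuristic (the names are just the Enzo-style parameters spelled out in the message, not a particular yt API):

    import numpy as np

    def padding_from_root_grid(domain_left, domain_right, root_dims, n_cells=6):
        """Padding per axis (in code units) = a low integer multiple of the
        root grid cell width."""
        cell = ((np.asarray(domain_right) - np.asarray(domain_left))
                / np.asarray(root_dims, dtype=float))
        return n_cells * cell

    print(padding_from_root_grid([0., 0., 0.], [1., 1., 1.], [64, 64, 64]))
    # -> [0.09375 0.09375 0.09375], close to the 0.1 that worked for the 64^3 run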
Next step is some determination of the padding. I think this likely should be done in terms of root grid dimensions; it seems we've had some good success with low integer multiples of (DomainRightEdge-DomainLeftEdge)/RootGridDimensions . As I recall, you said 0.02 was sufficient for your 64^3 run? Or am I misremembering?
I did tests with 0.0, 0.05, and 0.2 padding, and even with my massive halo the 0.05 padding was almost the same (within one or two particles) as the 0.2 case for runs of p>=4. My inclination is to default to something conservative (say 0.1; I think 0.2 is too much and wasteful) and to document what padding means, so users understand why they might want a padding bigger or smaller than the default.

--Stephen
Okay, 0.1 works for the 64^3 run, which would be 6.4 root grid cells in that simulation. Obviously it would be nice to have this in a scale-dependent format, but since HOP is formally scale-free, I think we should be able to do it in terms of total-number-of-particles. I'll set up some tests on some 128^3 and 256^3 datasets I have here.

-Matt
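One way a particle-count rule could look (this is an assumption for illustration, not something settled in the thread): hold the padding at a fixed number of mean interparticle spacings, so it shrinks as the particle count grows.

    def padding_from_particle_count(n_particles, n_spacings=6.4):
        """Padding in box units = n_spacings mean interparticle spacings,
        i.e. n_spacings * n_particles**(-1/3) for a unit box."""
        return n_spacings * n_particles ** (-1.0 / 3.0)

    print(padding_from_particle_count(64 ** 3))    # 0.1, matching the 64^3 case
    print(padding_from_particle_count(256 ** 3))   # 0.025 for a 256^3 run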
Participants (3):
- Brian O'Shea
- Matthew Turk
- Stephen Skory