QuadTree projection now works in parallel
Hi everyone,

I've just pushed some changes to the quad tree projection that should parallelize it automatically. The old-style projection is parallelized through a spatial decomposition in the 2D plane of the image. This causes two problems: very poor load balancing in the current scheme, and the inability to use this operation in situ, since it requires passing data around in a manner different from the simulation code's own load balancing. Furthermore, it can be slow.

About a year ago I implemented a quadtree projection mechanism that was about an order of magnitude faster for big datasets. Unfortunately, because of the more complicated nature of the data structures, I never parallelized it. This last week I figured out how to do so, and I have now implemented that parallelization in the quad_proj object in yt. I've tested it and it gives very good results for both memory and speed; it's about an order of magnitude faster than the old-style projection on my datasets, and the time-to-completion is so low that I have been unable to measure its scaling. It would be great if other people could test it, to see how well it scales for them. It should perform best in parallel where spatial decomposition gives poor results -- this is often the case with deeply nested hierarchies, or with refinement regions that do not cover the entire box. If you are interested in testing it in situ, that would be helpful as well.

To use it, you can simply do:

    pf.h.proj = pf.h.quad_proj

and then run the normal PlotCollection, light cone, etc. operations, or you can create quad_proj objects manually:

    qp = pf.h.quad_proj(0, "Density")

and examine those and their timings.

I would like to replace the old-style projection with this for the 2.2 release, if we can go back and forth and make sure it is up to snuff, so your testing is GREATLY appreciated to avoid any hiccups along the way.

Thanks,

Matt
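A minimal end-to-end sketch of the two usage patterns Matt describes above (assuming a yt 2.x-style script; the dataset path, plot center, and output prefix are placeholders):

    from yt.mods import *

    pf = load("DD0010/DD0010")            # placeholder dataset path

    # Option 1: alias the quad tree projection over the old-style proj so
    # that PlotCollection, light cones, etc. pick it up automatically.
    pf.h.proj = pf.h.quad_proj
    pc = PlotCollection(pf, center=[0.5, 0.5, 0.5])
    pc.add_projection("Density", 0)       # project Density along the x axis
    pc.save("quad_proj_test")             # placeholder output prefix

    # Option 2: build a quad_proj object directly and inspect it.
    qp = pf.h.quad_proj(0, "Density")
    print(qp["Density"].max())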
Matt, great work on parallelizing it! I know how difficult it can be.
qp = pf.h.quad_proj(0, "Density")
I'll try to test it out later today. One comment: for add_projection we have

    pc.add_projection('Density', 0)

but above the arguments are in the other order. Do we want to make these more consistent?

--
Stephen Skory
s@skory.us
http://stephenskory.com/
510.621.3687 (google voice)
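To make the asymmetry concrete, here are the two calls side by side (both taken verbatim from the messages above):

    pc.add_projection('Density', 0)     # field first, then axis
    qp = pf.h.quad_proj(0, "Density")   # axis first, then field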
Hi Matt,

Well done! I tried it out already on a small dataset and it took the projection from 1.3 s to 0.61 s in serial. As soon as I have access to ssh (yes, still no external ssh here at the lab), I'll give it a shot on a large dataset in parallel and report back.

Sam
Nice work, Matt!
pf.h.proj = pf.h.quad_proj
Clever trick. I hadn't thought of doing this back when I was running QuadProj in serial.

I just tested it on pleiades on 128 processes on a large nested grid simulation. The run time decreased from 348 s to 104 s. Very nice. There were no errors, but there were some lingering RECV/SEND debugging messages.

Here are the stats on the dataset:

    level   # grids       # cells
    ------------------------------
    0           512     134217728
    1           850      25412184
    2          1229     140608000
    3         13775     283353768
    4         41136     195020184
    5         29687      74512760
    6         14847      17540640
    7          4104       3998824
    8           527       1099896
    9            10          6984
    ------------------------------
    Total    106677     875770968

John
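For anyone wanting to reproduce this kind of parallel test, a minimal sketch (assuming yt 2.x-style MPI parallelism; the script name, dataset path, process count, and output prefix are placeholders):

    # save as test_quad_proj.py and run with, e.g.:
    #   mpirun -np 128 python test_quad_proj.py --parallel
    from yt.mods import *

    pf = load("DD0100/DD0100")       # placeholder nested-grid dataset
    pf.h.proj = pf.h.quad_proj       # route projections through the quad tree code
    pc = PlotCollection(pf, center=[0.5, 0.5, 0.5])
    pc.add_projection("Density", 0)
    pc.save("parallel_test")         # placeholder output prefix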
John, Sam, Stephen,

Thanks very much for testing. I think this is just about ready for production; it sounds like we should push ahead with swapping it in for 2.2. The only remaining test is the light cone stuff, but I was able to test the source selection and that worked for me. It would be a real feather in our caps, I think, to release a 2.2 that includes a completely new web GUI (with PyLab support as well as Google Maps-style widgets!), a projection speedup of 3.5x, and a new field system. And even a new logo!

There are a few more optimizations I believe I can apply, which I will attempt over the next little while -- but they are less invasive. Mainly these involve moving from pre-generated arrays of positions to generating positions inside the Cython code on an as-needed basis.

John -- 350 s to 104 s is pretty good, I think. We may now be IO dominated, but if you supply the argument preload="all" it might cut the run time down even further.

Thanks, everyone.

-Matt
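One guess at how that preload hint might be supplied when creating the projection directly (the keyword name and value are taken from Matt's note above; the exact signature in yt may differ, so treat this as an assumption to verify):

    # assumed keyword -- check against the actual quad_proj signature
    qp = pf.h.quad_proj(0, "Density", preload="all")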
Hi all,

In the development repo, I have swapped the quad tree projection in for the old-style projection. The old style is now the data object "overlap_proj", and "quad_proj" is now "proj".

-Matt
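A short sketch of what that renaming means for scripts run against the development repo (the axis and field are just examples; the calls mirror the quad_proj usage shown earlier):

    p_new = pf.h.proj(0, "Density")          # quad tree projection (was quad_proj)
    p_old = pf.h.overlap_proj(0, "Density")  # old spatial-decomposition projection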
participants (4):
- John Wise
- Matthew Turk
- Sam Skillman
- Stephen Skory