Hi guys,

On Monday I rewrote the (adaptive) projection backend to use Quad Trees for point placement instead of the current system. As it stands, the current system for projecting works pretty well, except for a couple of caveats:

 - It has high peak memory usage.
 - It can only be decomposed in the image plane, so inline projections can't be made, and load balancing with AMR was poor.
 - It's complicated, but that complication comes from wanting things to be fast.

Essentially, it's fast by wall-clock standards, but you need to run in parallel for big datasets. Plus, if you want to run with big datasets, you probably need to fully occupy nodes because of the peak memory usage.

Anyway, I was getting very frustrated with points #1 and #3 (but not #2, oddly), so I wrote a new backend using Quad Trees. We did everything using integer math anyway, but because joining grids was O(N x M), things got a bit slow in some domains. The conversion wasn't too bad.

I ran some simple benchmarks. These were all conducted on the Triton resource, in serial, on a single node. They do not reflect time for disk IO, but they should roughly reflect real memory load.

 * For my 31-level PopIII binary star run (875 million cells, 8,000 grids), it took 9 seconds with peak memory usage of 176 megs.
 * For a large 768^3 run with (peak) 12 levels of refinement (874 million cells in 125,000 grids), it takes 81 seconds with a peak memory load of 840 megs.
 * For the 512^3 L7 lightcone run, at a time when the data had 300,000 grids, it took 170 seconds with peak memory usage of 2.5 gigs.

In all cases, the results are correct according to the standard check of projecting "Ones". These are crazy improvements, especially considering that these are times for serial projections.

An important note here is that this *will* parallelize independently of the image plane, although I have yet to see the need to parallelize. The load balancing may be better optimized with image-plane decomposition, but the algorithm should work with independent joins. Additionally, in parallel, the final join at the end of the calculation may have higher peak memory usage if the decomposition is not done in the image plane, but it should still work.

Unfortunately, my time is very constrained right now, so I am going to put this out there: if someone is willing to help me break apart the existing code and insert this code, then we can swap out the engines. In the absence of that, I'm not sure it'll get inserted into the mainline before the summer, or at the earliest late spring. As is, I can definitely provide a very simple and manual interface to use it (and I intend to do so), but integrating it into the mainline is not going to be my priority right now. But, again, I will provide *an* interface, even if it's not a *final* interface and likely not a *nice* one.

Anyway, I'm pretty excited about this -- definitely a huge improvement over the previous algorithm, and we still retain the *adaptive* nature of the projections: project once, plot many.

-Matt
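To make the idea above a bit more concrete, here is a small toy sketch of quadtree-based point placement in Python. It is not the new backend itself (which, per the message, works in integer math and presumably lives in compiled code); the class name, method names, and the way the weight field is carried along are all illustrative assumptions. What it shows is the key property: a cell at refinement level L is deposited in a tree node at depth L, so coarse and fine data coexist in one structure and the result stays adaptive -- project once, plot many -- rather than requiring grid-against-grid joins.

    class QuadTreeNode:
        """Toy node for adaptive 2D point placement: one node per
        (i, j, level) position in the image plane."""

        def __init__(self, i, j, level):
            self.i, self.j, self.level = i, j, level   # integer image-plane indices
            self.value = 0.0                           # accumulated projected field
            self.weight = 0.0                          # accumulated weight field
            self.children = None                       # four children once refined

        def _refine(self):
            # Create the four children covering this node at the next level.
            self.children = [
                QuadTreeNode(2 * self.i + di, 2 * self.j + dj, self.level + 1)
                for di in (0, 1) for dj in (0, 1)
            ]

        def deposit(self, i, j, level, value, weight=1.0):
            # Walk down to the node at (i, j, level), creating children on
            # demand, and accumulate this cell's contribution there.
            if level == self.level:
                self.value += value
                self.weight += weight
                return
            if self.children is None:
                self._refine()
            # Pick the child whose index range contains (i, j); integer math only.
            shift = level - self.level - 1
            di, dj = (i >> shift) & 1, (j >> shift) & 1
            self.children[2 * di + dj].deposit(i, j, level, value, weight)

        def filled_nodes(self):
            # Yield (i, j, level, value, weight) for every node holding data;
            # this is what a plotting/pixelization step would consume.
            if self.value != 0.0 or self.weight != 0.0:
                yield (self.i, self.j, self.level, self.value, self.weight)
            if self.children is not None:
                for child in self.children:
                    for node in child.filled_nodes():
                        yield node

    # Example: a level-3 cell and a coarser level-1 cell that overlaps it
    # both live in the same tree, at their own depths.
    root = QuadTreeNode(0, 0, 0)
    root.deposit(5, 2, 3, value=1.0)
    root.deposit(1, 0, 1, value=4.0)
    print(list(root.filled_nodes()))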
Hi all,

Another update! Today I was sent a 1024^3, 7-level, 1.6e9 cell dataset. The stats were:

  Level    Grids         Cells
      0      512    1073741824
      1   288126     433602800
      2    83777      92854272
      3    29655      23459488
      4    14328       7726208
      5     6733       4637416
      6     3512       3355336
      7     1647       1997864
  -----------------------------
  Total   428290    1641375208

That's all total cells, not effective cells.

I ran the quadtree projection code on it, including data IO but no weighting fields, in serial, on a low-memory node. The results made me very, very happy. To project the entire dataset to the finest resolution took 2340 seconds, with peak memory usage of 2.7 gigs. I'm *really* happy with how low those two numbers are. This would fit on a single core on Kraken.

I'm substantially more motivated to wrap this into the existing projection machinery now. Conversely, with these numbers the way they are, I'm much less motivated to parallelize it. :)

With the image pan-n-scanner, it's actually completely interactive to pan through it, zoom in however you like, all of that -- I was doing it here on my machine with frame rates of about 1-5, but I think on a faster machine it might work much better.

Anyway, in case anybody wants to try it at home, I've uploaded a zip file with the projection, the necessary info, and a couple of scripts that let you either pan interactively if you have Chaco installed (if you used the Snow Leopard install script, you should!) or, if you don't like the whole GUI thing, use a pan controller that saves out an image every time you make a move.

To run the pan_controller, do something like:

  ipython -q4thread pan_controller.py

To run the not-as-interactive script, do this:

  python2.6 -i pan_saver.py

Now you have access to an object, ip, that saves out a new image called "wimage_000.png" every time you call .zoom(factor) or any of the other methods on the object that change the bounding box of its image. This includes zoom, pan(delx, dely), pan_rel(reldelx, reldely), and some others, but those are the big ones. If you open wimage_000.png in Preview, it should update every time a new image is saved.

The zip file is available here, but I'll likely delete it in a few days:

  http://yt.enzotools.org/files/rd0027_panner.zip

It's about 92 megs compressed, but uncompresses to around 900 megs. Let me know if it breaks, but I think otherwise this is a really cool way to share images and data -- this 92-meg bundle lets you zoom all the way in on a humongous AMR dataset! I'll be writing up how to share data like this for regular projections in 1.7's docs.

-Matt
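For anyone curious what the pan/save object is doing under the hood, here is a small, self-contained Python sketch. It is not the pan_saver.py shipped in the zip file: the real one works against the saved adaptive projection, while this toy just crops a fixed array. The class name, the log scaling, and the argument conventions (factor > 1 means zoom in, pan takes domain units, pan_rel takes fractions of the current width and height) are all assumptions inferred from the method names above.

    import numpy as np
    import matplotlib
    matplotlib.use("Agg")          # render straight to files; no display needed
    import matplotlib.pyplot as plt

    class ToyPanner:
        """Toy stand-in for the `ip` object: keeps a bounding box over a
        2D projected field and rewrites wimage_000.png after every move."""

        def __init__(self, data):
            self.data = np.asarray(data)           # the projected field, 2D array
            self.bounds = [0.0, 1.0, 0.0, 1.0]     # xmin, xmax, ymin, ymax (domain units)
            self._save()

        def _save(self):
            xmin, xmax, ymin, ymax = self.bounds
            ny, nx = self.data.shape
            # Map the bounding box to array indices, clamped to the array.
            i0 = max(0, min(nx - 1, int(xmin * nx)))
            i1 = max(i0 + 1, min(nx, int(xmax * nx)))
            j0 = max(0, min(ny - 1, int(ymin * ny)))
            j1 = max(j0 + 1, min(ny, int(ymax * ny)))
            # Same filename every time, so an open viewer just refreshes.
            plt.imsave("wimage_000.png", np.log10(self.data[j0:j1, i0:i1]))

        def zoom(self, factor):
            # Shrink (factor > 1) or grow (factor < 1) the box about its center.
            xmin, xmax, ymin, ymax = self.bounds
            cx, cy = 0.5 * (xmin + xmax), 0.5 * (ymin + ymax)
            hw, hh = 0.5 * (xmax - xmin) / factor, 0.5 * (ymax - ymin) / factor
            self.bounds = [cx - hw, cx + hw, cy - hh, cy + hh]
            self._save()

        def pan(self, delx, dely):
            # Shift the box by an absolute offset in domain units (assumed).
            xmin, xmax, ymin, ymax = self.bounds
            self.bounds = [xmin + delx, xmax + delx, ymin + dely, ymax + dely]
            self._save()

        def pan_rel(self, reldelx, reldely):
            # Shift by a fraction of the current width/height (assumed).
            xmin, xmax, ymin, ymax = self.bounds
            self.pan(reldelx * (xmax - xmin), reldely * (ymax - ymin))

    if __name__ == "__main__":
        # A fake, strictly positive 1024^2 "projection" stands in for real data.
        fake = np.random.lognormal(size=(1024, 1024))
        ip = ToyPanner(fake)
        ip.zoom(4.0)           # wimage_000.png now shows the central quarter
        ip.pan_rel(0.25, 0.0)  # shift right by a quarter of the current width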