On Monday I rewrote the (adaptive) projection backend to use Quad Trees for point placement instead of the current system. As it stands, the current system for projecting works pretty well, except for a few caveats:
1. It has high peak memory usage.
2. It can only be decomposed in the image plane, so inline projections can't be made, and load balancing for AMR was poor.
3. It's complicated, but that complication comes from wanting things to be fast.
Essentially, it's fast by wall-clock standards, but big datasets require running in parallel -- and probably fully occupying nodes, because of the peak memory usage.
Anyway, I was getting very frustrated with points #1 and #3 (but not #2, oddly), so I wrote a new backend using Quad Trees. The old code did everything with integer math anyway, but because joining grids was O(NxM), things got slow in some domains. The conversion wasn't too bad.
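To make the idea concrete, here is a minimal sketch of quadtree point deposition with integer coordinates. All names here are hypothetical illustrations, not the actual backend -- the real code handles fields, weights, and level masks, but the core structure looks roughly like this:

```python
class QuadTreeNode:
    """One node of a quadtree addressed by integer cell indices per level."""

    def __init__(self, x, y, level):
        self.x, self.y, self.level = x, y, level  # integer index at this level
        self.value = 0.0
        self.children = None  # becomes a list of four children once refined

    def insert(self, x, y, level, value):
        """Deposit a value at integer position (x, y) on the given level."""
        if level == self.level:
            self.value += value
            return
        if self.children is None:
            self.refine()
        # Pick the child by reading the next bit of the integer position;
        # no floating-point comparisons are needed anywhere.
        shift = level - self.level - 1
        ci = (x >> shift) & 1
        cj = (y >> shift) & 1
        self.children[ci * 2 + cj].insert(x, y, level, value)

    def refine(self):
        self.children = [
            QuadTreeNode(self.x * 2 + i, self.y * 2 + j, self.level + 1)
            for i in (0, 1) for j in (0, 1)
        ]

root = QuadTreeNode(0, 0, 0)
root.insert(3, 1, 2, 10.0)  # deposit at level 2, cell (3, 1)
root.insert(3, 1, 2, 5.0)   # same cell: contributions accumulate
```

The win over the old O(NxM) grid join is that each point lands in its cell in O(levels) steps, independent of how many grids have already been deposited.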
I ran some simple benchmarks. These were all conducted on the Triton resource, in serial, on a single node. They do not reflect time for disk IO, but they should roughly reflect real memory load.
* For my 31-level PopIII binary star run (875 million cells, 8,000 grids), it took 9 seconds with peak memory usage of 176 megs.
* For a large 768^3 run with (peak) 12 levels of refinement (874 million cells in 125,000 grids), it takes 81 seconds with a peak memory load of 840 megs.
* For the 512^3 L7 lightcone run, at a time when the data had 300,000 grids, it took 170 seconds with peak memory usage of 2.5 gigs.
In all cases, the results are correct according to the standard check of projecting "Ones". These are crazy improvements, especially considering that these are times for serial projections.
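For readers unfamiliar with the "Ones" check, the logic is simple: in a weighted projection, a field that is 1.0 in every cell must come out as exactly 1.0 in every output pixel, no matter how many levels contribute along the line of sight. A toy version (not the real API) of the invariant:

```python
import numpy as np

def weighted_project(field, weight, axis=0):
    """Collapse a 3D array along an axis as sum(f * w) / sum(w)."""
    return (field * weight).sum(axis=axis) / weight.sum(axis=axis)

ones = np.ones((8, 8, 8))
weight = np.random.random((8, 8, 8))  # arbitrary positive weights
image = weighted_project(ones, weight)
assert np.allclose(image, 1.0)  # "Ones" must project to exactly 1.0
```

Any bookkeeping error in the deposition or join steps shows up immediately as a pixel that deviates from 1.0, which is what makes this such a useful smoke test.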
An important note here is that this *will* parallelize independently of the image plane, although I haven't yet seen the need to parallelize. Load balancing may be better with image-plane decomposition, but the algorithm works with independent joins. In parallel, the final join at the end of the calculation may have higher peak memory usage if the decomposition is not via the image plane, but it should still work.
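The reason independent joins work is that deposition is additive: two trees built from disjoint subsets of the data can be merged pairwise in any order. A hypothetical sketch of such a join (two children per node here for brevity; the real quadtree has four):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    value: float = 0.0
    children: Optional[List["Node"]] = None

def join(a: Node, b: Node) -> None:
    """Merge tree b into tree a in place; both cover the same region."""
    a.value += b.value
    if b.children is None:
        return
    if a.children is None:
        a.children = b.children  # adopt b's subtree wholesale
        return
    for ca, cb in zip(a.children, b.children):
        join(ca, cb)

a = Node(1.0, [Node(0.25), Node(0.75)])
b = Node(2.0, [Node(0.5), Node()])
join(a, b)  # a now holds the combined deposition from both trees
```

Because `join` is associative and commutative over the deposited values, no rank needs to own a particular strip of the image plane, which is exactly what frees the decomposition from the image plane.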
Unfortunately, my time is very constrained right now, so I'll put this out there: if someone is willing to help me break apart the existing code and insert this code, we can swap out the engines. Absent that, I'm not sure it'll make it into the mainline before the Summer, or at least the late Spring. As is, I can definitely provide a very simple, manual interface to use it (and I intend to do so), but integrating it into the mainline is not going to be my priority right now. But, again, I will provide *an* interface, even if it's not a *final* interface and likely not a *nice* interface.
Anyway, I'm pretty excited about this -- definitely a huge improvement over the previous algorithm, and we still retain the *adaptive* nature of the projections: project once, plot many.