Hi Abhishek,

I'll answer your questions inline,

On Thu, Mar 15, 2018 at 2:15 PM, Abhishek Singh <abhisheksing@umass.edu> wrote:
[cc'ing Nathan directly in case the mail gets stuck in Mailman]

Hi Nathan,

I started working on the proposal for the project idea *Interpolating particle data onto grids*. After familiarizing myself with the background information that you pointed to, I have a few doubts, which are as follows:

1. YTEP-32 and the paper by Dan Price mention the "scatter" approach for interpolation. I could not find any material on the "gather" approach. Do you have anything handy in this regard? Or should I take the path of the scatter approach and proceed further?

Both are important. For the "gather" approach, the idea is to calculate, at each cell center, the list of nearest neighbor particles and then use *those* particles to fill the cell.

Another way to look at it is in terms of big O analysis.

In the "scatter" approach the smoothing operation scales as O(n_particles) - for each particle, we check which zones the smoothing region for that particle overlaps with, and then deposit the contribution for that particle to that zone.

In the "gather" approach, the smoothing operation scales as O(n_zones) - for each zone we find which particles should contribute to that zone and use the SPH interpolation formula to find the estimate for a fied at the center of the zone.

(In principle there's nothing special about the center of the zone; we could sample at multiple points within a single zone and use something like Gaussian quadrature to get a more accurate, conservative estimate.)
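
A gather-style estimate at a single sample point might look something like the sketch below. It assumes the standard SPH interpolant, f(r) ~ sum_j (m_j / rho_j) f_j W(|r - r_j|, h_j), takes a vectorized kernel function like the placeholder above, and uses a brute-force neighbor search purely for clarity:

import numpy as np

def gather_estimate(center, pos, mass, dens, field, hsml, kernel, n_neighbors=64):
    # standard SPH estimate at an arbitrary point, restricted to the
    # n_neighbors particles closest to `center`:
    #   f(center) ~ sum_j (m_j / rho_j) * f_j * W(|center - r_j|, h_j)
    d = np.linalg.norm(pos - center, axis=1)
    nearest = np.argsort(d)[:n_neighbors]  # brute force; a KD-tree would replace this
    return np.sum(mass[nearest] / dens[nearest] * field[nearest]
                  * kernel(d[nearest], hsml[nearest]))

Calling that once per zone center is where the O(n_zones) scaling comes from.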

In principle, the smoothed interpolated data should converge to the same answer for the scatter and gather approaches as the number of particles goes to infinity. In practice, for datasets with a finite number of particles, there will be situations where they won't agree. For example, there might be a situation where some of the particles on the nearest neighbor list are very far away, e.g. near the edge of a blob of gas. In these situations, the "scatter" approach will likely not include the contributions of these distant particles. Fundamentally, there's no reason why a particle that appears on the nearest neighbor list of some other particle *has* to have that other particle on its own nearest neighbor list. That will depend on the precise distribution of particles.

Often users will want the scatter approach because it's faster; sometimes they will want the gather approach, as that is more "natural" to the SPH formalism. In fact, different SPH codes internally use either scatter- or gather-based approaches for different interpolation tasks. The idea for this project is that we will implement both and let users select which approach they want at runtime.

I suspect we will start with implementing just scatter, as it is faster and easier to think about conceptually, especially for uniform grids. Implementing the gather approach will require some thought about how to most efficiently search the KDTree to quickly generate nearest neighbor lists for each zone on the grid.
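
One possible way to generate those lists (just an illustration, not a design decision) is to build a KD-tree over the particle positions and query it with all of the zone centers at once, e.g. with scipy:

import numpy as np
from scipy.spatial import cKDTree

def neighbor_lists(particle_positions, zone_centers, n_neighbors=64):
    # particle_positions: (N, 3), zone_centers: (M, 3);
    # returns (M, n_neighbors) arrays of distances and particle indices
    tree = cKDTree(particle_positions)
    dists, indices = tree.query(zone_centers, k=n_neighbors)
    return dists, indices

Whether something like that is fast enough, or whether we need a lower-level implementation, is exactly the kind of question the proposal could dig into.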
 

2. The project idea lists one of the deliverables as the ability to interpolate data onto a uniform resolution mesh. In YTEP-32 you have plotted slice and projection plots using the octree and directly using particle data. Is my understanding correct? If we are able to plot the SPH data with this particle-centric approach, then what is expected to be delivered with respect to this project idea?

Right now we've only implemented interpolation onto 2D images using the "scatter" approach. One of the first parts of this project will be to extend that functionality to work with 3D grids and then to wire up the covering_grid and arbitrary_grid data objects to use the new 3D grid-filling code you will write.

Optimally we'll also add a "gather" version, as described above.
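
From the user's side, the goal is that the existing grid data objects just work on SPH datasets, something like the following (the filename and field here are only illustrative):

import yt

ds = yt.load("snapshot_033.hdf5")  # some SPH dataset

# smooth the particle data onto a 128^3 uniform grid covering the whole domain
ag = ds.arbitrary_grid(ds.domain_left_edge, ds.domain_right_edge,
                       dims=[128, 128, 128])
density = ag["gas", "density"]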
 

3. The optional deliverables list the ability to interpolate particle data onto an octree mesh. Is this not the current approach (yt version 3.*)?

Yes, it is similar to the current approach, and we may elect to re-use some of that code. However, there are some issues with the way it currently works:

1. It's relatively complex and low-level (because it had to be: *all* visualization and analysis operations happened on the octree). This means that it's hard for newcomers to understand the current octree code and that it's hard to modify. In addition, the SPH smoothing is done in a fashion that will sometimes create incorrect smoothing results, particularly if nearest neighbor particles are not located in neighboring octs.

2. There isn't an octree data object. So once the dataset is loaded, the octree parameters are set and users don't really have the ability to, say, define an octree that only covers a subset of the volume, or change the octree refinement criterion.

My hope is that at the end of this project, one will be able to do:

otree = ds.octree(left_edge, right_edge, n_ref=64)

This will create an octree between the left and right edges, with a maximum of 64 particles per leaf node.
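
And then (purely speculative at this point, just to illustrate the goal) query smoothed fields on it the same way as on any other yt data object:

density = otree["gas", "density"]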

Like I said, it's possible we'll re-use some of the existing octree code, but I want to make sure that we end up with something simpler and more maintainable. Given that in the demeshening, the octree will *only* exist for deposition and smoothing purposes and not for indexing, my hope is that we will be able to eliminate a lot of the complexity of the existing octree implementation.
 

If any of my understanding is incorrect, I would love to clarify these doubts as soon as possible so that I can work on the proposal. Thank you for your time!

Best,
Abhishek Singh