Hi,
Just something to consider:
While I was at McMaster, there was quite a bit of talk about a python visualisation tool called 'pin body'. I think this might have been developed by Greg Stinson to work with GASOLINE.
I'm unsure how far this progressed; grad students at Mac were definitely using it to create slices but I can't find any mention of it on the web.
Potentially, this might be worth checking out, either as a possible incorporation or comparison?
I can ask McMaster what the situation is if that would be helpful?
As a side line, I'd find this very useful as a way of converting between simulation code formats, combined with it's new initial condition generator.
Elizabeth
On Jan 9, 2013, at 12:18 AM, Matthew Turk
Hi all,
I am writing today to request comments and suggestions for supporting SPH data in yt. This is part of a broader effort -- which includes the native Octree support -- in yt 3.0 to correctly support non-gridpatch data.
This email contains a lot of information about where we are, and I was hoping to get some feedback from people who have more experience with SPH data, Gadget, AREPO, etc. In particular, (if they're around and have the time) it'd be great to hear from Marcel Haas, Michael J. Roberts, and Chris Moody.
Additionally, I am very interested in approaching this as an iterative, incremental process. Minimum Viable Product (MVP) first, then improving as we learn and go along.
= Background =
As seen in this pull request:
https://bitbucket.org/yt_analysis/yt-3.0/pull-request/11/initial-n-body-data...
I've implemented a first pass at N-body data support. This proceeds through the following steps:
* Read header * Create octree * For each file in output: * Add particles to octree * When a cell crests a certain number of particles (or when particles from more than one file are co-located) refine * Directly query particles to get data.
As it stands, for N-body data this is very poor at memory conservation. In fact, it's extremely poor. I have however written it so that in the future, we can move to a distributed memory octree, which should alleviate difficulties.
For quantitative analysis of quantities that are *in the output*, this already works. It does not yet provide any kind of kernel or density estimator. By mandating refinement based on file that the particles are in, load on demand operations can (more) efficiently read data.
Many of the related items are outlined in the corresponding YTEP:
https://yt.readthedocs.org/projects/ytep/en/latest/YTEPs/YTEP-0005.html
Currently, it works for OWLS data. However, since there has been something of a Gadget fragmentation, there is considerable difficulty in "supporting Gadget" when that includes supporting the binary format. I've instead taken the approach of attempting to isolate nearly all the functionality into classes so that, at worst, someone could implement a couple functions, supply them to the Gadget interface, and it will use those to read data from disk. Some integration work on this is still to come, but on some level it will be a 'build your own' system for some particle codes. OWLS data (which was run with Gadget) on the other hand is in HDF5, is self-describing, has a considerable number of metadata items affiliated with it, and even outputs the Density estimate for the hydro particles.
Where I'm now stuck is how to proceed with handling SPH data. By extension, this also applies somewhat to handling tesselation codes like AREPO and PHURBAS.
= Challenges =
Analyzing quantitative data, where fields can be generated from fully-local information, is not difficult. We can in fact largely consider this to be done. This includes anything that does not require applying a smoothing kernel.
The hard part comes in in two areas:
* Processing SPH particles as fluid quantities * Visualizing results
Here are a few of our tools:
* Data selection and reading: we can get rectangular prisms and read them from disk relatively quickly. Same for spheres. We can also get boolean regions of these and read them from disk. * Neighbor search in octrees. Given Oct A at Level N, we can get all M (<=26) Octs that neighbor A at Levels <= N. * Voronoi tesselations: we have a wrapper for Voro++ which is relatively performant, but not amazing. The DTFE method provides a way to use this to our advantage, although this requires non-local information. (The naive approach, of getting density by dividing the particle mass by the volume of the voronoi cell, does not give good agreement at least for the OWLS data.)
Here's what we want to support:
* Quantitative analysis (profiles, etc) * Visualization (Projections, slices, probably not VR yet) * Speed
== Gridding Data ==
In the past, one approach we have explored has been to simply grid all of the values. Assuming you have a well-behaved kernel and a large number of particles, you can calculate at every cell-center the value of the necessary fields. If this were done, we could reduce the complexity of the problem *after* gridding the data, but it could potentially increase the complexity of the problem *before* gridding the data.
For instance, presupposing we took our Octree and created blocks of 8^3 (like FLASH), we could then analyze the resultant eulerian grid identically to other codes -- projections, slices, and so on would be done exactly as elsewhere.
This does bring with it the challenge of conducting the smoothing kernel. Not only are there a handful of free parameters in the smoothing kernel, but it also requires handling edge effects. Individual octs are confined to a single domain; neighbors, however, are not. So we'd potentially be in a situation where to calculate the values inside a single Oct, we would have to read from many different files. From the standpoint of implementing in yt, though, we could very simply implement this using the new IO functions in 3.0. We already have a _read_particles function, so the _read_fluid function would identify the necessary region, read the particles, and then proceed to smooth them. We would need to ensure that we were not subject to edge effects (something Daniel Price goes into some detail about in the SPLASH paper) which may prove tricky.
It's been impressed upon me numerous times, however, that this is a sub-optimal solution. So perhaps we should avoid it.
== Not Gridding Data ==
I spent some time thinking about this over the last little while, and it's no longer obvious to me that we need to grid the data at all even to live within the yt infrastructure. What gridded data buys us is a clear "extent" of fluid influence (i.e., a particle's influence may extend beyond its center) and an easy way of transforming a fluid into an image. The former is hard. But I think the latter is easier than I realized before.
Recent, still percolating changes in how yt handles pixelization and ray traversal should allow a simple mechanism for swapping out pixelization functions based on characteristics of the dataset. A pixelization function takes a slice or projection data object (which is, in fact, an *actual* data object and not an image) and then converts them to an image. For SPH datasets, we could mandate that the pixelization object be swapped out for something that can apply an SPH-appropriate method of pixelization (see Daniel Price's SPLASH paper, or the GigaPan paper, for a few examples). We would likely also want to change out the Projection object, as well, and we may need to eliminate the ability to avoid IO during the pixelization procedure (i.e., the project-once-pixelize-many advantage of grid data). But I think it would be do-able.
So we could do this. I think this is the harder road, but I think it is also possible. We would *still* need to come up with a mechanism for applying the smoothing kernel.
= Conclusions and Contributions =
So I think we have two main paths forward -- gridding, and not gridding. I would very, very much appreciate comments or feedback on either of these approaches. I believe that, regardless of what we choose to do, we'll need to figure out how to best apply a smoothing kernel to data; I think this is probably going to be difficult. But I am optimistic we are nearing the point of usability. For pure N-body data, we are already there -- the trick is now to look at the data as fluids, not just particles.
The best way to contribute help right now is to brainstorm and give feedback. I'd love to hear ideas, criticisms, flaws in my logic, ways to improve things, and so on and so forth. I think once we have an MVP, we will be better set up to move forward.
If you'd like to add a reader for your own flavor of particle data, we should do that as well -- especially if you are keen to start exploring and experimenting. The code so far can be found in the repository linked above. I'd also appreciate comments (or acceptance) of the pull request. I am also happy to hold a Hangout to discuss this, go over things, help set up readers or whatever. I consider this a relatively high-priority project. It will be a key feature of yt 3.0.
(Speaking of which, today I've allotted time to focus on getting the last major patch-refinement feature done -- covering grids. Hopefully patch-refinement people can test it out after that's done!)
Thanks for your patience with such a long email! :)
-Matt _______________________________________________ yt-dev mailing list yt-dev@lists.spacepope.org http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org