Hi all,

I am writing today to request comments and suggestions for supporting SPH data in yt. This is part of a broader effort -- which includes the native Octree support -- in yt 3.0 to correctly support non-gridpatch data.

This email contains a lot of information about where we are, and I was hoping to get some feedback from people who have more experience with SPH data, Gadget, AREPO, etc. In particular, (if they're around and have the time) it'd be great to hear from Marcel Haas, Michael J. Roberts, and Chris Moody.

Additionally, I am very interested in approaching this as an iterative, incremental process: Minimum Viable Product (MVP) first, then improving as we learn and go along.

= Background =

As seen in this pull request:

https://bitbucket.org/yt_analysis/yt-3.0/pull-request/11/initial-n-body-data...

I've implemented a first pass at N-body data support. This proceeds through the following steps:

* Read the header
* Create an octree
* For each file in the output:
  * Add particles to the octree
  * When a cell crests a certain number of particles (or when particles from more than one file are co-located), refine
* Directly query particles to get data.
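To make the refine step concrete, here is a minimal, self-contained sketch of such an insert-and-refine octree in pure Python. To be clear, this is not the yt implementation (which lives in compiled extensions and also handles the multi-file co-location rule); the OctNode class and every name in it are hypothetical, purely for illustration.

import numpy as np

class OctNode:
    """Toy octree node: buffers particles, splits when a count is crested."""
    def __init__(self, center, width, max_particles=8):
        self.center = np.asarray(center, dtype="float64")
        self.width = float(width)            # full edge length of this oct
        self.max_particles = max_particles   # the "cresting" threshold
        self.particles = []                  # (position, index) pairs held here
        self.children = None                 # list of 8 children once refined

    def insert(self, pos, index):
        if self.children is not None:
            self._child_for(pos).insert(pos, index)
            return
        self.particles.append((pos, index))
        if len(self.particles) > self.max_particles:
            self._refine()

    def _refine(self):
        # Create 8 children at +/- width/4 offsets from this oct's center
        # and push the buffered particles down one level.
        offsets = [np.array([i, j, k]) for i in (-0.25, 0.25)
                   for j in (-0.25, 0.25) for k in (-0.25, 0.25)]
        self.children = [OctNode(self.center + o * self.width, self.width / 2,
                                 self.max_particles) for o in offsets]
        buffered, self.particles = self.particles, []
        for pos, index in buffered:
            self._child_for(pos).insert(pos, index)

    def _child_for(self, pos):
        # Children are ordered by (x, y, z) octant flags.
        fx, fy, fz = (np.asarray(pos) >= self.center).astype(int)
        return self.children[fx * 4 + fy * 2 + fz]

# Usage: scatter random particles into the unit box.
root = OctNode(center=[0.5, 0.5, 0.5], width=1.0)
for i, p in enumerate(np.random.RandomState(0).random_sample((1000, 3))):
    root.insert(p, i)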
As it stands, for N-body data this is very poor at memory conservation -- in fact, extremely poor. I have, however, written it so that in the future we can move to a distributed-memory octree, which should alleviate difficulties.

For quantitative analysis of quantities that are *in the output*, this already works. It does not yet provide any kind of kernel or density estimator. By mandating refinement based on the file that the particles are in, load-on-demand operations can (more) efficiently read data.

Many of the related items are outlined in the corresponding YTEP:

https://yt.readthedocs.org/projects/ytep/en/latest/YTEPs/YTEP-0005.html

Currently, it works for OWLS data. However, since there has been something of a Gadget fragmentation, there is considerable difficulty in "supporting Gadget" when that includes supporting the binary format. I've instead taken the approach of attempting to isolate nearly all the functionality into classes so that, at worst, someone could implement a couple of functions, supply them to the Gadget interface, and it will use those to read data from disk. Some integration work on this is still to come, but on some level it will be a 'build your own' system for some particle codes. OWLS data (which was run with Gadget), on the other hand, is in HDF5, is self-describing, has a considerable number of metadata items affiliated with it, and even outputs the Density estimate for the hydro particles.

Where I'm now stuck is how to proceed with handling SPH data. By extension, this also applies somewhat to handling tessellation codes like AREPO and PHURBAS.

= Challenges =

Analyzing quantitative data, where fields can be generated from fully-local information, is not difficult. We can in fact largely consider this to be done. This includes anything that does not require applying a smoothing kernel.

The hard part comes in two areas:

* Processing SPH particles as fluid quantities
* Visualizing results

Here are a few of our tools:

* Data selection and reading: we can get rectangular prisms and read them from disk relatively quickly. Same for spheres. We can also get boolean regions of these and read them from disk.
* Neighbor search in octrees: given Oct A at Level N, we can get all M (<= 26) Octs that neighbor A at Levels <= N.
* Voronoi tessellations: we have a wrapper for Voro++ which is relatively performant, but not amazing. The DTFE method provides a way to use this to our advantage, although it requires non-local information. (The naive approach of getting density by dividing the particle mass by the volume of the Voronoi cell does not give good agreement, at least for the OWLS data.)

Here's what we want to support:

* Quantitative analysis (profiles, etc.)
* Visualization (projections, slices; probably not VR yet)
* Speed

== Gridding Data ==

In the past, one approach we have explored has been to simply grid all of the values. Assuming you have a well-behaved kernel and a large number of particles, you can calculate at every cell center the value of the necessary fields. If this were done, we could reduce the complexity of the problem *after* gridding the data, but it could potentially increase the complexity of the problem *before* gridding the data.

For instance, supposing we took our octree and created blocks of 8^3 (like FLASH), we could then analyze the resultant Eulerian grid identically to other codes -- projections, slices, and so on would be done exactly as elsewhere.

This does bring with it the challenge of applying the smoothing kernel. Not only are there a handful of free parameters in the smoothing kernel, but it also requires handling edge effects. Individual octs are confined to a single domain; neighbors, however, are not. So we'd potentially be in a situation where, to calculate the values inside a single oct, we would have to read from many different files. From the standpoint of implementing this in yt, though, we could very simply do it using the new IO functions in 3.0. We already have a _read_particles function, so the _read_fluid function would identify the necessary region, read the particles, and then proceed to smooth them. We would need to ensure that we were not subject to edge effects (something Daniel Price goes into some detail about in the SPLASH paper), which may prove tricky.
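As a sketch of what that smoothing step might look like -- ignoring the edge effects just mentioned, using the Gadget-flavored cubic spline purely for definiteness, and with made-up function names rather than the actual _read_fluid internals:

import numpy as np

def cubic_spline(r, h):
    # Gadget-flavor M4 cubic spline W(r, h), with support radius h, in 3D.
    q = r / h
    w = np.where(q < 0.5,
                 1.0 - 6.0 * q**2 + 6.0 * q**3,
                 2.0 * (1.0 - np.minimum(q, 1.0))**3)
    return 8.0 / (np.pi * h**3) * np.where(q < 1.0, w, 0.0)

def smooth_onto_cells(cell_centers, pos, mass, hsml):
    # Smoothed density at each cell center:
    #     rho(x) = sum_j m_j W(|x - x_j|, h_j)
    # Brute force over all particles; a real version would restrict each
    # particle to the cells its kernel actually overlaps.
    rho = np.zeros(len(cell_centers))
    for x_j, m_j, h_j in zip(pos, mass, hsml):
        r = np.sqrt(((cell_centers - x_j) ** 2).sum(axis=1))
        rho += m_j * cubic_spline(r, h_j)
    return rho

The same loop, run per oct with neighbor particles gathered from adjacent (possibly off-domain) files, is exactly where the multi-file reads described above would come in.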
It's been impressed upon me numerous times, however, that this is a sub-optimal solution. So perhaps we should avoid it.

== Not Gridding Data ==

I spent some time thinking about this over the last little while, and it's no longer obvious to me that we need to grid the data at all, even to live within the yt infrastructure. What gridded data buys us is a clear "extent" of fluid influence (i.e., a particle's influence may extend beyond its center) and an easy way of transforming a fluid into an image. The former is hard. But I think the latter is easier than I realized before.

Recent, still-percolating changes in how yt handles pixelization and ray traversal should allow a simple mechanism for swapping out pixelization functions based on characteristics of the dataset. A pixelization function takes a slice or projection data object (which is, in fact, an *actual* data object and not an image) and converts it to an image. For SPH datasets, we could mandate that the pixelization object be swapped out for something that can apply an SPH-appropriate method of pixelization (see Daniel Price's SPLASH paper, or the GigaPan paper, for a few examples). We would likely want to change out the Projection object as well, and we may need to eliminate the ability to avoid IO during the pixelization procedure (i.e., the project-once-pixelize-many advantage of grid data). But I think it would be doable.

So we could do this. I think this is the harder road, but I think it is also possible. We would *still* need to come up with a mechanism for applying the smoothing kernel.

= Conclusions and Contributions =

So I think we have two main paths forward -- gridding, and not gridding. I would very, very much appreciate comments or feedback on either of these approaches. I believe that, regardless of what we choose to do, we'll need to figure out how best to apply a smoothing kernel to the data; I think this is probably going to be difficult. But I am optimistic that we are nearing the point of usability. For pure N-body data, we are already there -- the trick is now to look at the data as fluids, not just particles.

The best way to contribute right now is to brainstorm and give feedback. I'd love to hear ideas, criticisms, flaws in my logic, ways to improve things, and so on. I think once we have an MVP, we will be better set up to move forward.

If you'd like to add a reader for your own flavor of particle data, we should do that as well -- especially if you are keen to start exploring and experimenting. The code so far can be found in the repository linked above. I'd also appreciate comments on (or acceptance of) the pull request. I am also happy to hold a Hangout to discuss this, go over things, help set up readers, or whatever. I consider this a relatively high-priority project: it will be a key feature of yt 3.0.

(Speaking of which, today I've allotted time to focus on getting the last major patch-refinement feature done -- covering grids. Hopefully patch-refinement people can test it out after that's done!)

Thanks for your patience with such a long email! :)

-Matt
Hi,
Just something to consider:
While I was at McMaster, there was quite a bit of talk about a python visualisation tool called 'pin body'. I think this might have been developed by Greg Stinson to work with GASOLINE.
I'm unsure how far this progressed; grad students at Mac were definitely using it to create slices but I can't find any mention of it on the web.
Potentially, this might be worth checking out, either as a possible incorporation or comparison?
I can ask McMaster what the situation is if that would be helpful?
As a sideline, I'd find this very useful as a way of converting between simulation code formats, combined with its new initial condition generator.
Elizabeth
Ah, found it:
http://code.google.com/p/pynbody/
Apparently, they now support AMR too. Let's take them out… uh, I mean, look at their code with interest.
Elizabeth
Hi Elizabeth,
Thanks for the info! Looks like we could learn from them.
-Matt
Hi Matt,

Can you elaborate on why it will be so hard to implement the smoothing kernel? From my point of view it's straightforward to smooth the particles onto grids or, if we avoid gridding, splat the particles onto pixels according to the smoothing kernel. While there are a number of possible choices of kernel, most codes (Gadget in particular) use various flavors of the cubic spline kernel, which has only one free parameter, the smoothing length, and that is written to disk along with the particle data.

As for whether to grid the data or not, I think that it will be necessary for geometric selection of data. One could in principle set up profiles over the whole simulation using the capabilities you've already completed, querying the fluid fields at the particle locations and calculating derived fields based on the fluid fields written to disk at each particle location. However, it seems to me that calculating fields that require some sort of differencing, or calculating profiles over a geometric object, /requires/ a gridding step.

I think it should be possible to do most things without the expensive gridding, following your second suggestion for visualization based on the SPLASH algorithms, but I also think we should provide the capability to export to GDF or internally convert to gridded data in memory, since that enables a lot of sophisticated analysis tasks that are more or less straightforward on grid data.

That being said, I'm no SPH expert, so please correct me if I'm speaking out of turn here.

Cheers,

Nathan
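For concreteness, the splatting Nathan describes might look roughly like this: tabulate the line-of-sight integral of the cubic spline once, then scatter each particle's column contribution onto the pixels within its smoothing length. This is only a sketch of the SPLASH-style idea -- the kernel normalization assumes the Gadget convention, and all names here are made up:

import numpy as np

def w3d(q):
    # Cubic spline shape with support q = r/h in [0, 1] (Gadget convention).
    w = np.where(q < 0.5, 1 - 6 * q**2 + 6 * q**3,
                 2 * (1 - np.minimum(q, 1.0))**3)
    return 8.0 / np.pi * np.where(q < 1.0, w, 0.0)

# Tabulate the projected kernel Y(b): the integral of w3d along the line of
# sight, as a function of impact parameter b = R/h.
b_tab = np.linspace(0.0, 1.0, 256)
z_tab = np.linspace(-1.0, 1.0, 1025)
y_tab = np.trapz(w3d(np.sqrt(b_tab[:, None]**2 + z_tab[None, :]**2)),
                 z_tab, axis=1)

def splat(image, extent, pos_xy, mass, hsml):
    # Surface density: Sigma(x, y) += m_j * Y(R / h_j) / h_j**2 per particle.
    ny, nx = image.shape
    xs = np.linspace(extent[0], extent[1], nx)
    ys = np.linspace(extent[2], extent[3], ny)
    xx, yy = np.meshgrid(xs, ys)
    for (px, py), m_j, h_j in zip(pos_xy, mass, hsml):
        q = np.minimum(np.hypot(xx - px, yy - py) / h_j, 1.0)
        image += m_j * np.interp(q, b_tab, y_tab) / h_j**2
    return image

Registering something along these lines as the pixelizer for SPH datasets would amount to the swap-out Matt suggests in the Not Gridding section.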
Hi Matt,
Thanks for this work!
I'm not sure which side I come down on (gridding vs. not gridding), but I do think that ultimately we should address two things:

1. Be able to mandate the full SPH octree ahead of time. For ART, for example, we have unorganized particles for stars+dm but a defined octree for the hydro. Sometimes you might want the built-by-yt particle octree to match the hydro one. This will sometimes lead to large cells with little gas, but many particles.

2. Be able to operate over multiple pfs / merge them. How should the particle octree interact with the hydro one when we have both in a single snapshot/instant?
chris
On Tue, Jan 8, 2013 at 11:45 PM, Nathan Goldbaum
Hi Matt,
Can you elaborate on why it will be so hard to implement the smoothing kernel?
From my point of view its straightforward to smooth the particles onto grids, or, if we avoid gridding, splat the particles onto pixels according to the smoothing kernel. While there are a number of possible choices for kernel, most codes (Gadget in particular) use various flavors of the cubic spline kernel, which has only one free parameter, the smoothing length, which will be written to disk along with the particle data.
As for whether to grid the data or not, I think that it will be necessary for geometric selection of data. One could in principle set up profiles over the whole simulations using the capabilities that you've already completed, querying the fluid fields at the particle locations and calculating derived fields based on the fluid fields written to disk at each particle location. However, it seems to me that calculating fields that require some sort of differencing, or calculating profiles over a geometric object /requires/ a gridding step.
I think it should be possible to do most things without the expensive gridding, following your second suggestion for visualization based on the SPLASH algorithms, but I also think we should provide the capability to export to GDF or internally convert to gridded data in memory since it enables a lot of sophisticated analysis tasks that are more or less straightforward on grid data.
That being said, I'm no SPH expert, so please correct me if I'm speaking out of turn here.
Cheers,
Nathan
On 1/8/13 5:18 PM, Matthew Turk wrote:
Hi all,
I am writing today to request comments and suggestions for supporting SPH data in yt. This is part of a broader effort -- which includes the native Octree support -- in yt 3.0 to correctly support non-gridpatch data.
This email contains a lot of information about where we are, and I was hoping to get some feedback from people who have more experience with SPH data, Gadget, AREPO, etc. In particular, (if they're around and have the time) it'd be great to hear from Marcel Haas, Michael J. Roberts, and Chris Moody.
Additionally, I am very interested in approaching this as an iterative, incremental process. Minimum Viable Product (MVP) first, then improving as we learn and go along.
= Background =
As seen in this pull request:
https://bitbucket.org/yt_**analysis/yt-3.0/pull-request/** 11/initial-n-body-data-supporthttps://bitbucket.org/yt_analysis/yt-3.0/pull-request/11/initial-n-body-data...
I've implemented a first pass at N-body data support. This proceeds through the following steps:
* Read header * Create octree * For each file in output: * Add particles to octree * When a cell crests a certain number of particles (or when particles from more than one file are co-located) refine * Directly query particles to get data.
As it stands, for N-body data this is very poor at memory conservation. In fact, it's extremely poor. I have however written it so that in the future, we can move to a distributed memory octree, which should alleviate difficulties.
For quantitative analysis of quantities that are *in the output*, this already works. It does not yet provide any kind of kernel or density estimator. By mandating refinement based on file that the particles are in, load on demand operations can (more) efficiently read data.
Many of the related items are outlined in the corresponding YTEP:
https://yt.readthedocs.org/**projects/ytep/en/latest/YTEPs/** YTEP-0005.htmlhttps://yt.readthedocs.org/projects/ytep/en/latest/YTEPs/YTEP-0005.html
Currently, it works for OWLS data. However, since there has been something of a Gadget fragmentation, there is considerable difficulty in "supporting Gadget" when that includes supporting the binary format. I've instead taken the approach of attempting to isolate nearly all the functionality into classes so that, at worst, someone could implement a couple functions, supply them to the Gadget interface, and it will use those to read data from disk. Some integration work on this is still to come, but on some level it will be a 'build your own' system for some particle codes. OWLS data (which was run with Gadget) on the other hand is in HDF5, is self-describing, has a considerable number of metadata items affiliated with it, and even outputs the Density estimate for the hydro particles.
Where I'm now stuck is how to proceed with handling SPH data. By extension, this also applies somewhat to handling tesselation codes like AREPO and PHURBAS.
= Challenges =
Analyzing quantitative data, where fields can be generated from fully-local information, is not difficult. We can in fact largely consider this to be done. This includes anything that does not require applying a smoothing kernel.
The hard part comes in in two areas:
* Processing SPH particles as fluid quantities * Visualizing results
Here are a few of our tools:
* Data selection and reading: we can get rectangular prisms and read them from disk relatively quickly. Same for spheres. We can also get boolean regions of these and read them from disk. * Neighbor search in octrees. Given Oct A at Level N, we can get all M (<=26) Octs that neighbor A at Levels <= N. * Voronoi tesselations: we have a wrapper for Voro++ which is relatively performant, but not amazing. The DTFE method provides a way to use this to our advantage, although this requires non-local information. (The naive approach, of getting density by dividing the particle mass by the volume of the voronoi cell, does not give good agreement at least for the OWLS data.)
Here's what we want to support:
* Quantitative analysis (profiles, etc) * Visualization (Projections, slices, probably not VR yet) * Speed
== Gridding Data ==
In the past, one approach we have explored has been to simply grid all of the values. Assuming you have a well-behaved kernel and a large number of particles, you can calculate at every cell-center the value of the necessary fields. If this were done, we could reduce the complexity of the problem *after* gridding the data, but it could potentially increase the complexity of the problem *before* gridding the data.
For instance, presupposing we took our Octree and created blocks of 8^3 (like FLASH), we could then analyze the resultant eulerian grid identically to other codes -- projections, slices, and so on would be done exactly as elsewhere.
This does bring with it the challenge of conducting the smoothing kernel. Not only are there a handful of free parameters in the smoothing kernel, but it also requires handling edge effects. Individual octs are confined to a single domain; neighbors, however, are not. So we'd potentially be in a situation where to calculate the values inside a single Oct, we would have to read from many different files. From the standpoint of implementing in yt, though, we could very simply implement this using the new IO functions in 3.0. We already have a _read_particles function, so the _read_fluid function would identify the necessary region, read the particles, and then proceed to smooth them. We would need to ensure that we were not subject to edge effects (something Daniel Price goes into some detail about in the SPLASH paper) which may prove tricky.
It's been impressed upon me numerous times, however, that this is a sub-optimal solution. So perhaps we should avoid it.
== Not Gridding Data ==
I spent some time thinking about this over the last little while, and it's no longer obvious to me that we need to grid the data at all even to live within the yt infrastructure. What gridded data buys us is a clear "extent" of fluid influence (i.e., a particle's influence may extend beyond its center) and an easy way of transforming a fluid into an image. The former is hard. But I think the latter is easier than I realized before.
Recent, still percolating changes in how yt handles pixelization and ray traversal should allow a simple mechanism for swapping out pixelization functions based on characteristics of the dataset. A pixelization function takes a slice or projection data object (which is, in fact, an *actual* data object and not an image) and then converts them to an image. For SPH datasets, we could mandate that the pixelization object be swapped out for something that can apply an SPH-appropriate method of pixelization (see Daniel Price's SPLASH paper, or the GigaPan paper, for a few examples). We would likely also want to change out the Projection object, as well, and we may need to eliminate the ability to avoid IO during the pixelization procedure (i.e., the project-once-pixelize-many advantage of grid data). But I think it would be do-able.
So we could do this. I think this is the harder road, but I think it is also possible. We would *still* need to come up with a mechanism for applying the smoothing kernel.
= Conclusions and Contributions =
So I think we have two main paths forward -- gridding, and not gridding. I would very, very much appreciate comments or feedback on either of these approaches. I believe that, regardless of what we choose to do, we'll need to figure out how to best apply a smoothing kernel to data; I think this is probably going to be difficult. But I am optimistic we are nearing the point of usability. For pure N-body data, we are already there -- the trick is now to look at the data as fluids, not just particles.
The best way to contribute help right now is to brainstorm and give feedback. I'd love to hear ideas, criticisms, flaws in my logic, ways to improve things, and so on and so forth. I think once we have an MVP, we will be better set up to move forward.
If you'd like to add a reader for your own flavor of particle data, we should do that as well -- especially if you are keen to start exploring and experimenting. The code so far can be found in the repository linked above. I'd also appreciate comments (or acceptance) of the pull request. I am also happy to hold a Hangout to discuss this, go over things, help set up readers or whatever. I consider this a relatively high-priority project. It will be a key feature of yt 3.0.
(Speaking of which, today I've allotted time to focus on getting the last major patch-refinement feature done -- covering grids. Hopefully patch-refinement people can test it out after that's done!)
Thanks for your patience with such a long email! :)
-Matt ______________________________**_________________ yt-dev mailing list yt-dev@lists.spacepope.org http://lists.spacepope.org/**listinfo.cgi/yt-dev-spacepope.**orghttp://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
Hi all, Thanks, Matt, for the awesome set-up! Apologies for the delay, but here are my two cents. I am experienced with SPH, but far less with yt than any of you, so pardon my ignorance from time to time. I have left some of Matt's email in, and near the end I have reacted to some other emails, as hopefully indicated.
Currently, it works for OWLS data. However, since there has been something of a Gadget fragmentation, there is considerable difficulty in "supporting Gadget" when that includes supporting the binary format.
As an aside, I might at some point spend some time on a converter to the OWLS-like HDF5 format for simulations saved in the Gadget binary format (mostly to use the Oppenheimer & Dave simulations together with some stuff I have developed). If I get to that, it may prove useful.
Where I'm now stuck is how to proceed with handling SPH data. By extension, this also applies somewhat to handling tesselation codes like AREPO and PHURBAS.
As far as my knowledge of AREPO goes, I think using the mesh-generating points of the Voronoi tessellation (if those are stored) should be pretty much the same as using SPH particles, although I am not sure what number of neighbors to use (structure may be smoothed out some).
Analyzing quantitative data, where fields can be generated from fully-local information, is not difficult. We can in fact largely consider this to be done. This includes anything that does not require applying a smoothing kernel.
So, you mean that only includes DM and stars, right? Any given fluid quantity is only defined by smoothing over N_sph neighbors (note that this number is not the same for everybody, and maybe not stored in the output, but in order to have quantitative analysis physically consistent with the simulation, it should be the same in analysis and simulation). Extra complication: the smoothing length is not always equal to the distance to the N_sph'th neighbor, as there is a minimum set on the kernel; that is the reason why in OWLS this number is stored in the output as well (as Nathan indicated, not every simulation group does this though!).
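(To make that concrete, recomputing smoothing lengths when they are not stored could look roughly like the sketch below; both n_sph and the floor h_min are assumptions that have to match whatever the simulation actually used.)

    import numpy as np
    from scipy.spatial import cKDTree

    def recompute_smoothing_lengths(pos, n_sph=64, h_min=0.0):
        # Gather-style definition: h is the distance to the N_sph'th
        # neighbor, floored at h_min (both must match the simulation).
        tree = cKDTree(pos)
        # k = n_sph + 1 because each particle is its own nearest neighbor.
        dists, _ = tree.query(pos, k=n_sph + 1)
        return np.maximum(dists[:, -1], h_min)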
* Data selection and reading: we can get rectangular prisms and read them from disk relatively quickly. Same for spheres. We can also get boolean regions of these and read them from disk.
I am not completely sure what these prisms are (n00b…). Any fluid quantity inside some volume depends on particles just outside that volume as well. It sounds like that could be an issue.
* Neighbor search in octrees. Given Oct A at Level N, we can get all M (<=26) Octs that neighbor A at Levels <= N.
That is great, as long as the octs contain a good fraction of N_sph particles (such that fluid quantities are defined everywhere by the oct in question and its neighbors, and nothing more).
* Voronoi tesselations: we have a wrapper for Voro++ which is relatively performant, but not amazing. The DTFE method provides a way to use this to our advantage, although this requires non-local information. (The naive approach, of getting density by dividing the particle mass by the volume of the voronoi cell, does not give good agreement at least for the OWLS data.)
Indeed, for the same reasons as above: the volume of a Voronoi cell depends only on the nearest few neighbors, always much fewer than N_sph, whereas fluid quantities in SPH always depend on all N_sph neighbors. This is also the reason why e.g. AREPO and Gadget should be handled slightly differently (and is ultimately the reason why moving mesh codes do hydro better).
== Gridding Data == vs. == Not Gridding Data ==
Pro gridding: if you do it before you do anything else and separately store files with the gridded fluid quantities (disk-space-wise not the cutest solution), you can do it with any resolution you like, as using the same definition as for SPH particles, all fluid quantities are defined everywhere in space: take a point, grow a sphere to contain N_sph neighbors, and so on…
If not-gridding indeed is an option (i.e., if we figure out how to define fluid quantities everywhere we need in a physically consistent way) then I would go for that. This may require some brainstorming though; it at least isn't completely obvious to me.
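(As a seed for that brainstorming: the "grow a sphere" recipe can at least be written down directly. Below is a rough, unoptimized sketch, assuming a gather formulation with a Gadget-style cubic spline and treating n_sph as a free parameter; note the outermost neighbor sits at the kernel edge and contributes zero weight by construction.)

    import numpy as np
    from scipy.spatial import cKDTree

    def w_cubic_spline(r, h):
        # Cubic spline with support radius h, Gadget-style normalization.
        q = r / h
        w = np.zeros_like(q)
        inner = q <= 0.5
        outer = (q > 0.5) & (q <= 1.0)
        w[inner] = 1.0 - 6.0 * q[inner]**2 + 6.0 * q[inner]**3
        w[outer] = 2.0 * (1.0 - q[outer])**3
        return 8.0 / (np.pi * h**3) * w

    def density_at_points(points, pos, mass, n_sph=64):
        # rho(x) = sum_j m_j W(|x - x_j|, h(x)), with h(x) grown until
        # the sphere around x contains exactly n_sph particles.
        tree = cKDTree(pos)
        dists, idx = tree.query(points, k=n_sph)
        rho = np.empty(len(points))
        for i in range(len(points)):
            h = dists[i, -1]
            rho[i] = np.sum(mass[idx[i]] * w_cubic_spline(dists[i], h))
        return rho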
On Jan 8, 2013, at 4:45 PM, Nathan Goldbaum wrote:
From my point of view it's straightforward to smooth the particles onto grids, or, if we avoid gridding, splat the particles onto pixels according to the smoothing kernel. While there are a number of possible choices for the kernel, most codes (Gadget in particular) use various flavors of the cubic spline kernel, which has only one free parameter, the smoothing length, which will be written to disk along with the particle data.
Indeed. Two buts:
- the contribution of a particle may go to one or more pixels, which requires a nasty integration of a projected kernel over square pixels.
- the kernel has only one free parameter, and if it is written together with the particle data that is great. If not, it needs to be recomputed from the other particles, taking into account that it may be subject to a minimum in high-density regions. This is just something to keep in mind, and calculating the kernel for all particles is fairly expensive, but Gadget comes with efficient C routines to do it.
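(On the first "but": one crude way to sidestep the exact pixel integral is to integrate the 3D kernel numerically along the line of sight and sample the result at pixel centers. The sketch below reuses w_cubic_spline from the earlier sketch; it is a first-pass approximation, not the exact integration SPLASH performs, so flux is only approximately conserved once h shrinks toward the pixel size.)

    import numpy as np

    def projected_w(b, h, n_z=16):
        # Column-integrate the 3D cubic spline along the line of sight
        # for an array of impact parameters b, by simple quadrature.
        z = np.linspace(-h, h, n_z)
        dz = z[1] - z[0]
        r = np.sqrt(b[None, :]**2 + z[:, None]**2)
        return w_cubic_spline(r, h).sum(axis=0) * dz

    def splat_particles(pos2d, h, weight, nx, ny, extent):
        # Evaluate each particle's projected kernel at the pixel centers
        # that fall inside its support radius and accumulate an image.
        x0, x1, y0, y1 = extent
        xc = x0 + (np.arange(nx) + 0.5) * (x1 - x0) / nx
        yc = y0 + (np.arange(ny) + 0.5) * (y1 - y0) / ny
        image = np.zeros((ny, nx))
        for (px, py), hi, wi in zip(pos2d, h, weight):
            b = np.sqrt((xc[None, :] - px)**2 + (yc[:, None] - py)**2)
            flat = b.ravel()
            mask = flat < hi
            vals = np.zeros(flat.size)
            vals[mask] = projected_w(flat[mask], hi)
            image += wi * vals.reshape(ny, nx)
        return image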
As for whether to grid the data or not, I think that it will be necessary for geometric selection of data. One could in principle set up profiles over the whole simulation using the capabilities that you've already completed, querying the fluid fields at the particle locations and calculating derived fields based on the fluid fields written to disk at each particle location. However, it seems to me that calculating fields that require some sort of differencing, or calculating profiles over a geometric object, /requires/ a gridding step.
I am afraid Nathan is right here. If not, I'd be curious to know how to solve this! Especially if field properties are desired somewhere where there is no particle (the particles form an irregular and sometimes pretty sparsely sampled grid!).
On Jan 10, 2013, at 4:24 AM, Christopher Moody wrote:
1. Be able to mandate the full SPH octree ahead of time
+1
2. Be able to operate over multiple pfs / merge them. How should the particle octree interact with the hydro one when we have both in a single snapshot/instant?
Maybe they should not be created independently…?
So far my input, I'd be happy to discuss this in a hangout, over email, or via whatever platform! Cheers, Marcel
Hi Marcel et al,
Sorry for the delay, I was trying to wrap my head around this and also push on some acceptance tests, etc. I've added a first pass at a Gadget binary format reader, although right now it *only* reads vanilla Gadget files. As I noted before, providing an easy way of specifying a non-standard loader for this is high-priority.
On Thu, Jan 10, 2013 at 1:38 PM, Marcel Haas wrote:
Hi all,
Thanks, Matt, for the awesome set-up! Apologies for the delay, but here are my two cents. I am experienced with SPH, but far less with yt than any of you, so pardon my ignorance from time to time. I have left some of Matt's email in, and near the end I have reacted to some other emails, as hopefully indicated.
Currently, it works for OWLS data. However, since there has been something of a Gadget fragmentation, there is considerable difficulty in "supporting Gadget" when that includes supporting the binary format.
As an aside, I might at some point spend some time on a converter to the OWLS-like HDF5 format for simulations saved in the Gadget binary format (mostly to use the Oppenheimer & Dave simulations together with some stuff I have developed). If I get to that, it may prove useful.
I agree, it definitely would. Do you think you'd be at all interested in trying this out, starting from the binary reader I wrote into yt? It might be possible to get a flow of data from it to the OWLS-style format.
Where I'm now stuck is how to proceed with handling SPH data. By extension, this also applies somewhat to handling tesselation codes like AREPO and PHURBAS.
As far as my knowledge of AREPO goes, I think using the mesh-generating points of the Voronoi tessellation (if those are stored) should be pretty much the same as using SPH particles, although I am not sure what number of neighbors to use (structure may be smoothed out some).
My understanding is that in fact the tessellation structure is not stored, but I think we can use a similar (albeit reduced, since it needs < N_sph neighbors) technique to get the local density estimate for AREPO data. In fact, I think that might end up being easier and faster than the SPH data.
Analyzing quantitative data, where fields can be generated from fully-local information, is not difficult. We can in fact largely consider this to be done. This includes anything that does not require applying a smoothing kernel.
So, you mean that only includes DM and stars, right? Any given fluid quantity is only defined by smoothing over N_sph neighbors (note that this number is not the same for everybody, and maybe not stored in the output, but in order to have quantitative analysis physically consistent with the simulation, it should be the same in analysis and simulation). Extra complication: the smoothing length is not always equal to the distance to the N_sph'th neighbor, as there is a minimum set on the kernel; that is the reason why in OWLS this number is stored in the output as well (as Nathan indicated, not every simulation group does this though!).
Yup, DM and stars only. For the variation in N_sph I think we can actually have considerable flexibility in how we define both the smoothing kernel and the free parameters for it. The minimum length setting can also provide a bit of assistance for choosing the neighbors which will be examined for particles. The biggest stumbling block for me, conceptually, is the best way to search 26 (or more) oct neighbors efficiently, without holding all the particle data in memory, while minimizing the hits to the disk. I'm exploring a few options now, but I think it's going to come down to some caching of particle data and clever ordering of octs in the tree.
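(The caching piece, at least, can be close to trivial. A toy sketch of what I mean -- read_domain here is a hypothetical stand-in for the real per-file IO handler, not actual yt code:)

    import functools
    import numpy as np

    @functools.lru_cache(maxsize=8)
    def read_domain(filename):
        # Hypothetical stand-in for a per-file particle read; keeping a
        # handful of recently used domains in memory means that walking
        # neighboring octs does not re-hit the disk for every oct.
        return np.load(filename)

    def particles_for_oct(domain_files):
        # Gather the particles for an oct plus its neighbors, given the
        # set of files that touch those octs. Visiting octs in an order
        # that keeps neighbors together maximizes the cache hit rate.
        return np.concatenate([read_domain(f) for f in sorted(domain_files)])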
* Data selection and reading: we can get rectangular prisms and read them from disk relatively quickly. Same for spheres. We can also get boolean regions of these and read them from disk.
I am not completely sure what these prisms are (n00b…). Any fluid quantity inside some volume depends on particles just outside that volume as well. It sounds like that could be an issue.
Ah, sorry, I mean we can easily select particles that exist in a prism defined by two corners. As for the particles just outside the volume, this is where I'm starting to think we may need to stride over prisms that are somewhat larger than the oct leaf nodes we want to examine, so that we draw from neighbors as well.
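(In NumPy terms the selection itself is just a mask, and the padding is a widening of the two corners by the largest smoothing length in play. A sketch, not the actual yt selector code:)

    import numpy as np

    def select_in_prism(pos, left, right, pad=0.0):
        # pos: (N, 3) particle positions; left/right: the two corners.
        # pad widens the prism so that particles whose kernels reach
        # inside are included even when their centers fall outside.
        lo = np.asarray(left) - pad
        hi = np.asarray(right) + pad
        return np.all((pos >= lo) & (pos <= hi), axis=1)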
* Neighbor search in octrees. Given Oct A at Level N, we can get all M (<=26) Octs that neighbor A at Levels <= N.
That is great, as long as the octs contain a good fraction of N_sph particles (such that fluid quantities are defined everywhere by the oct in question and its neighbors, and nothing more).
Right now my thought is we can do N_sph * 4 or so in each, and then inside each oct assume a grid size of, say, between 8 and 32 cells on a side, depending on the dataset. I know I said earlier I wanted to avoid gridding, but I think as a first pass we may need to, after reading comments from both you and Nathan. (See below.)
* Voronoi tesselations: we have a wrapper for Voro++ which is relatively performant, but not amazing. The DTFE method provides a way to use this to our advantage, although this requires non-local information. (The naive approach, of getting density by dividing the particle mass by the volume of the voronoi cell, does not give good agreement at least for the OWLS data.)
Indeed, for the same reasons as above: the volume of a Voronoi cell depends only on the nearest few neighbors, always much fewer than N_sph, whereas fluid quantities in SPH always depend on all N_sph neighbors. This is also the reason why e.g. AREPO and Gadget should be handled slightly differently (and is ultimately the reason why moving mesh codes do hydro better).
== Gridding Data == vs. == Not Gridding Data ==
Pro gridding: if you do it before you do anything else and separately store files with the gridded fluid quantities (disk-space-wise not the cutest solution), you can do it with any resolution you like, as using the same definition as for SPH particles, all fluid quantities are defined everywhere in space: take a point, grow a sphere to contain N_sph neighbors, and so on…
If not-gridding indeed is an option (i.e., if we figure out how to define fluid quantities everywhere we need in a physically consistent way) then I would go for that. This may require some brainstorming though; it at least isn't completely obvious to me.
The best way to do that is not obvious to me either. But I think the solutions it will draw on will require much of the same infrastructure as the gridding solution, since we will ultimately need to evaluate the smoothing kernel at many different locations anyway. So I am going to proceed with a gridding step, with the admission that we hope to get rid of it in the future. And unlike in the 2.x series, I don't think we will face as much technical resistance to that.
On Jan 8, 2013, at 4:45 PM, Nathan Goldbaum wrote:
From my point of view it's straightforward to smooth the particles onto grids, or, if we avoid gridding, splat the particles onto pixels according to the smoothing kernel. While there are a number of possible choices for the kernel, most codes (Gadget in particular) use various flavors of the cubic spline kernel, which has only one free parameter, the smoothing length, which will be written to disk along with the particle data.
Indeed. Two buts:
- the contribution of a particle may go to one or more pixels, which requires a nasty integration of a projected kernel over square pixels.
- the kernel has only one free parameter, and if it is written together with the particle data that is great. If not, it needs to be recomputed from the other particles, taking into account that it may be subject to a minimum in high-density regions. This is just something to keep in mind, and calculating the kernel for all particles is fairly expensive, but Gadget comes with efficient C routines to do it.
Using the Gadget routines is actually one route I intend to explore.
As for whether to grid the data or not, I think that it will be necessary for geometric selection of data. One could in principle set up profiles over the whole simulation using the capabilities that you've already completed, querying the fluid fields at the particle locations and calculating derived fields based on the fluid fields written to disk at each particle location. However, it seems to me that calculating fields that require some sort of differencing, or calculating profiles over a geometric object, /requires/ a gridding step.
I am afraid Nathan is right here. If not, I'd be curious to know how to solve this! Especially if field properties are desired somewhere where there is no particle (the particles form an irregular and sometimes pretty sparsely sampled grid!).
Okay. Good point. This is ultimately the point that convinced me that we're not (yet) ready for gridless analysis, in that geometric selection simply works better when you can easily evaluate a value at every point. So for now, I am going to go down the gridding path. Once I have an outline of how to do this (i.e., how to go from the Octree -- which we have -- to a set of fluid particles) I'll write back either here or in a YTEP to describe the process, request feedback, and set about implementing.
On Jan 10, 2013, at 4:24 AM, Christopher Moody wrote:
1. Be able to mandate the full SPH octree ahead of time
+1
Good idea. Hadn't thought about this. I don't yet have a serialization format for the octree, but creating one should not be too hard.
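(For the structure alone, a depth-first walk that writes one refinement flag per oct would probably do for a first pass; the particle data itself can always be re-read from disk. A sketch with a hypothetical Oct node type, not yt's actual octree machinery:)

    class Oct:
        def __init__(self):
            self.children = None  # None for a leaf, else a list of 8 Octs

    def serialize(node, flags):
        # Pre-order, depth-first: 1 = refined (eight children follow),
        # 0 = leaf. The flag stream fully determines the tree shape.
        if node.children is None:
            flags.append(0)
        else:
            flags.append(1)
            for child in node.children:
                serialize(child, flags)

    def deserialize(flags):
        # flags: an iterator over the stream produced by serialize(),
        # e.g. deserialize(iter(bits)). Rebuilds the same tree shape.
        node = Oct()
        if next(flags) == 1:
            node.children = [deserialize(flags) for _ in range(8)]
        return node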
2. Be able to operate over multiple pfs / merge them. How should the particle octree interact with the hydro one when we have both in a single snapshot/instant?
Maybe they should not be created independently…?
Hm, this is also possible. As it stands for RAMSES I was not creating particle octrees at all yet.
So far my input, I'd be happy to discuss this in a hangout, over email, or via whatever platform!
Let's do that. I need to spend some time thinking this through further and researching, but I will write back soon and we can set up a hangout about it. Thank you all for your input! I am generally quite excited about this, and I appreciate the thoughts and guidance. -Matt
Cheers, Marcel
_______________________________________________ yt-dev mailing list yt-dev@lists.spacepope.org http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
Hi Matt et al., I removed most of the convo, as it got a bit long. I hope it's still clear what the different parts are about :)
I agree, it definitely would. Do you think you'd be at all interested in trying this out, starting from the binary reader I wrote into yt? It might be possible to get a flow of data from it to the OWLS-style format.
Yes, I think that would be worth my time for a bit. One issue could be that not even close to all metadata for those simulations is stored in the binary output (that is at least true for the Oppenheimer & Dave simulations, I am not sure about others). Also, I bet different groups have stored different (numbers of) arrays and may have dealt with that in a non-standardized way. Maybe the main pro of having one routine that converts data into a useful format is for others to copy and adapt to their own output standards… Writing an HDF5 file with all metadata included also makes it super-easy to check whether all has gone right in the data conversion.
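(The writing side of such a converter is nearly trivial with h5py. A skeleton along these lines -- where the group and dataset names follow the Gadget-HDF5 / OWLS-like convention, and the real files of course carry many more particle types, datasets, and metadata attributes than shown:)

    import h5py

    def write_owls_like(fname, header, coords, vels, ids):
        # Skeleton only: header is a dict of metadata to store as
        # attributes; PartType0 stands in here for the gas particles.
        with h5py.File(fname, "w") as f:
            hdr = f.create_group("Header")
            for key, val in header.items():
                hdr.attrs[key] = val
            gas = f.create_group("PartType0")
            gas.create_dataset("Coordinates", data=coords)
            gas.create_dataset("Velocities", data=vels)
            gas.create_dataset("ParticleIDs", data=ids)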
Where I'm now stuck is how to proceed with handling SPH data. By extension, this also applies somewhat to handling tesselation codes like AREPO and PHURBAS.
My understanding is that in fact the tessellation structure is not stored, but I think we can use a similar (albeit reduced, since it needs < N_sph neighbors) technique to get the local density estimate for AREPO data. In fact, I think that might end up being easier and faster than the SPH data.
Do you, or anyone in this community, have access to AREPO? It seems that code is still fairly non-public. I am sure some of us could get access (possibly me included) for some specific science case, if you ask Volker or Lars. I in fact don't quite know what they store. They could store particle-like data, or grid-like data, or both. If they store particle-like data, I'm sure they output all necessary hydro quantities to estimate the density everywhere in space, also where there is no particle (which in a Voronoi tessellation of space would just be the density of the nearest Voronoi cell center, i.e. particle, right?). So that always needs the knowledge of zero neighbors for particles, or 1 for points in space other than particle positions.
Yup, DM and stars only. For the variation in N_sph I think we can actually have considerable flexibility in how we define both the smoothing kernel and the free parameters for it.
If we want to be exact in our determination of hydro quantities, we will always have to use the same kernel, and the same kernel length, as used in the code. Differences won't be huge if we use other kernel prescriptions or numbers of neighbors, but they will be non-zero, which might be undesirable for quantitative analysis.
The minimum length setting can also provide a bit of assistance for choosing the neighbors which will be examined for particles.
The minimum length ensures that sometimes many _more_ than N_sph neighbors need to be taken into account. What I seem to remember is that in the centers of massive halos, up to ~150 particles fall within the kernel of a given particle (while N_sph=64 was used).
The biggest stumbling block for me, conceptually, is the best way to search 26 (or more) oct neighbors efficiently, without holding all the particle data in memory, while minimizing the hits to the disk. I'm exploring a few options now, but I think it's going to come down to some caching of particle data and clever ordering of octs in the tree.
I'm afraid I won't be of much help here; I would need a crash course in oct-tree construction + being smart first. Chris will likely have some ideas?
Right now my thought is we can do N_sph * 4 or so in each, and then inside each oct assume a grid size of, say, between 8 and 32 cells on a side, depending on the dataset. I know I said earlier I wanted to avoid gridding, but I think as a first pass we may need to, after reading comments from both you and Nathan. (See below.)
N_sph*4 should in general be enough. Would there be a way to check if this is OK at runtime? I mean to say: what if a situation arises where it is not quite enough; would there be a check that is simple and cheap enough to just do every time?
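(The kind of check I have in mind could be as cheap as verifying, per evaluation point, that the sphere holding its N_sph neighbors fits inside the padded region that was actually read; if it does not, the neighbor list may have been truncated at the region edge. A sketch:)

    import numpy as np

    def neighbors_complete(r_nsph, center, region_lo, region_hi):
        # r_nsph: distance from `center` to its N_sph'th neighbor found
        # within the region [region_lo, region_hi]. If that sphere fits
        # entirely inside the region, the neighbor set is guaranteed
        # complete; otherwise enlarge the region and re-read.
        margin = np.minimum(center - region_lo, region_hi - center).min()
        return r_nsph <= margin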
The best way to do that is not obvious to me either. But I think the solutions it will draw on will require much of the same infrastructure as the gridding solution, since we will ultimately need to evaluate the smoothing kernel at many different locations anyway. So I am going to proceed with a gridding step, with the admission that we hope to get rid of it in the future. And unlike in the 2.x series, I don't think we will face as much technical resistance to that.
Sounds good to me.
Okay. Good point. This is ultimately the point that convinced me that we're not (yet) ready for gridless analysis, in that geometric selection simply works better when you can easily evaluate a value at every point. So for now, I am going to go down the gridding path.
Once I have an outline of how to do this (i.e., how to go from the Octree -- which we have -- to a set of fluid particles) I'll write back either here or in a YTEP to describe the process, request feedback, and set about implementing.
Yay! Unfortunately, the week of the yt developers thing in SC, I am already occupied elsewhere… I do have funds myself nowadays, so any future meeting like that (or smaller/shorter…) should be no problem!
2. Be able to operate over multiple pfs / merge them. How should the particle octree interact with the hydro one when we have both in a single snapshot/instant?
Maybe they should not be created independently…?
Hm, this is also possible. As it stands for RAMSES I was not creating particle octrees at all yet.
I think, when a grid cell somewhere wants to know the total mass contained in it, or something like that, the grids should be the same, right? So creating the octree for the hydro and collisionless components together seems to make some sense to me. I am not sure whether that would give issues in regions where the density of one type of particle is much larger than that of another type (again, oct-tree n00b here)…
Thank you all for your input! I am generally quite excited about this, and I appreciate the thoughts and guidance.
Excitement seems contagious. Awesome work! Cheers, Marcel
Hi Marcel,
Do you, or anyone in this community, have access to AREPO? It seems that code is still fairly non-public. I am sure some of us could get access (possibly me included) for some specific science case, if you ask Volker or Lars. I in fact don't quite know what they store. They could store particle-like data, or grid-like data, or both. If they store particle-like data, I'm sure they output all necessary hydro quantities to estimate the density everywhere in space, also where there is no particle (which in a Voronoi tessellation of space would just be the density of the nearest Voronoi cell center, i.e. particle, right?). So that always needs the knowledge of zero neighbors for particles, or 1 for points in space other than particle positions.
I have no direct access to AREPO data, but I recently spoke with someone who is analyzing a cosmological AREPO dataset. It turns out that AREPO has an identical data format to Gadget. In fact, at least in the Hernquist group, they are using the exact same scripts to analyze Gadget and AREPO runs. The Voronoi grids are never saved to disk, so if one wants to infer what the Voronoi grid was during post-processing, it requires a re-tessellation of the domain. Sunrise actually does this when it is run on AREPO data; this is accomplished by linking against the AREPO tessellation routines when one compiles Sunrise. The upside is that we can legitimately say that we support AREPO datasets as soon as we finish support for Gadget. Doing AREPO support properly, by self-consistently accounting for the Voronoi mesh, will be more difficult. We probably want to hold off on making a big deal about this until we can find someone who uses AREPO data and is interested in becoming involved in yt development. -Nathan
On Wed, Jan 16, 2013 at 5:01 PM, Marcel Haas wrote:
Hi Matt et al.,
I removed most of the convo, as it got a bit long. I hope it's still clear what the different parts are about :)
I agree, it definitely would. Do you think you'd be at all interested in trying this out, starting from the binary reader I wrote into yt? It might be possible to get a flow of data from it to the OWLS-style format.
Yes, I think that would be worth my time for a bit. One issue could be that not even close to all metadata for those simulations is stored in the binary output (that is at least true for the Oppenheimer & Dave simulations, I am not sure about others). Also, I bet different groups have stored different (numbers of) arrays and may have dealt with that in a non-standardized way.
I had heard that incomplete sets of metadata were stored, but I had not realized that the output doesn't even include enough information to continue the simulation.
Maybe the main pro of having one routine that converts data into a useful format is for others to copy and adapt to their own output standards… Writing an HDF5 file with all metadata included also makes it super-easy to check whether all has gone right in the data conversion.
Yes, totally. (A few people here have been working on a data format, too ...)
Do you, or anyone in this community, have access to AREPO? It seems that code is still fairly non-public. I am sure some of us could get access (possibly me included) for some specific science case, if you ask Volker or Lars. I in fact don't quite know what they store. They could store particle-like data, or grid-like data, or both. If they store particle-like data, I'm sure they output all necessary hydro quantities to estimate the density everywhere in space, also where there is no particle (which in a Voronoi tessellation of space would just be the density of the nearest Voronoi cell center, i.e. particle, right?). So that always needs the knowledge of zero neighbors for particles, or 1 for points in space other than particle positions.
Let me back up and explain why I mentioned AREPO. I don't have access to it; I know that at least one member of the list does, but I don't. My understanding was similar to Nathan's -- that it's in the same format as Gadget, and mostly Gadget analysis tools were used. Since we already have tessellation support in yt via voro++, supporting tessellation-based density estimates should be possible. Anyway, the takeaway is that we should keep flexibility in supporting different density estimators and methods of calculating primitive quantities. So once we can support the Gadget format, this would enable someone interested in using yt for AREPO data to do so. But yes, I don't think we can do anything other than keep our thoughts on flexibility for supporting things like that. [snip]
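(By flexibility I am imagining nothing fancier than a registry keyed by method name, so a frontend -- or a user -- can swap density estimators without touching the core machinery. A hypothetical sketch, with the estimator bodies elided:)

    DENSITY_ESTIMATORS = {}

    def register_estimator(name):
        # Decorator so frontends or users can plug in their own estimator.
        def wrap(func):
            DENSITY_ESTIMATORS[name] = func
            return func
        return wrap

    @register_estimator("sph_gather")
    def sph_gather(points, pos, mass, n_sph=64):
        # e.g. a kernel-based gather estimate, as sketched earlier.
        raise NotImplementedError

    @register_estimator("nearest_cell")
    def nearest_cell(points, pos, mass):
        # e.g. an AREPO-flavored estimate: the density of the nearest
        # mesh-generating point, read directly from the output.
        raise NotImplementedError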
Yay! Unfortunately, the week of the yt developers thing in SC, I am already occupied elsewhere… I do have funds myself nowadays, so any future meeting like that (or smaller/shorter…) should be no problem!
Ah, congrats on the funding, bummer on not being able to come. We should consider a mini-sprint at some other time during the year.
I think, when a grid cell somewhere wants to know the total mass contained in it, or something like that, the grids should be the same, right? So creating the octree for the hydro and collisionless components together seems to make some sense to me. I am not sure whether that would give issues in regions where the density of one type of particle is much larger than that of another type (again, oct-tree n00b here)…
This is a good point. I think we should probably do it in a similar way, or else we end up essentially having to merge contributions between mesh structures. I've gotten much of the way through Tipsy file format support, although the row vs. column storage of the data caused some tricky bits. Hopefully I will start next week on the kernel evaluations. -Matt
participants (5)
- Christopher Moody
- Elizabeth Tasker
- Marcel Haas
- Matthew Turk
- Nathan Goldbaum