sorted particle indices when loading halos from disk

Recently I've been working with the HOP halo finder in yt 3.0, looking in particular at star particles from Enzo simulations in halos of different sizes. I've been getting strange results when I combine particle fields that are stored in the halo HDF5 files with particle fields that have to be retrieved from the original simulation data.

If I create a star-particle mask from a field saved to disk (creation_time prior to 3.0, or ParticleMassMsun now), applying it to other fields that were also saved to disk (particle positions or velocities) gives the correct values, but applying it to fields retrieved from the simulation (such as dynamical time) does not. Conversely, if I identify stars by creation_time in 3.0 (where it isn't saved in the HDF5 file), I get the correct dynamical_times but incorrect particle masses.

I think I've identified the source of the problem. When the "particle_index" field is read from the halo HDF5 files, it is then sorted into ascending order. Specifically, __getitem__ in the LoadedHalo class contains the following (around line 867 of halo_objects in the 3.0 experimental branch):

    field_data = self._get_particle_data(self.id, self.fnames, self.size, key)
    if field_data is not None:
        if key == 'particle_index':
            field_data = field_data[field_data.argsort()]

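A minimal NumPy sketch of the effect, using made-up data for one halo and a plain dict standing in for the simulation output (both hypothetical, not yt's API): sorting only particle_index puts simulation-derived fields in ascending-index order while fields read from the halo file stay in file order, so a mask built from one picks the wrong particles in the other.

```python
import numpy as np

# Hypothetical data for one halo, in the order the particles appear in the
# halo HDF5 file ("file order").
particle_index_on_disk = np.array([42, 7, 19, 3])
mass_on_disk = np.array([1.0, 5.0, 1.0, 5.0])  # saved to disk, file order

# Stand-in for a field that has to come from the simulation, looked up by
# particle index (hypothetical; a dict replaces the real data access).
creation_time_by_index = {42: 0.0, 7: 3.2, 19: 0.0, 3: 1.1}

# Mirrors the LoadedHalo snippet: particle_index is sorted after it is read ...
sorted_index = particle_index_on_disk[particle_index_on_disk.argsort()]
# ... and the simulation field is then gathered in that sorted order.
creation_time = np.array([creation_time_by_index[i] for i in sorted_index])

# A star mask built from the on-disk field is in file order.
is_star = mass_on_disk > 2.0

print(particle_index_on_disk[is_star].tolist())  # actual stars: [7, 3]
print(sorted_index[is_star].tolist())            # particles the mask hits: [7, 42]
print(creation_time[is_star].tolist())           # [3.2, 0.0]; should be [3.2, 1.1]
```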
These sorted particle indices are then used to retrieve fields from the simulation data, so those fields end up in a different order from the ones read directly from the halo HDF5 files. As a result, masks created from one set of fields don't work properly on the other set.

I think I can fix this, but before I do I want to make sure I won't break anything else in the process. Does anyone know why the particle_index field was being sorted? If so, do you know whether it would make more sense to also sort the other particle fields read from disk, or to leave particle_index unsorted?

Thanks in advance for any help.
- Josh

Hi Josh,
(In reply to Josh Moloney, Joshua.Moloney@colorado.edu, Tue, Apr 8, 2014 at 7:17 PM)
My inclination is that we should fix the behavior -- which I believe means not sorting the particles. That being said, I am not familiar with where this gets used, so perhaps Britton or someone else can chime in? I believe Britton has envisioned a teardown of the existing functionality.
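Either option Josh raises keeps masks consistent, as long as every field ends up in the same order as particle_index. A minimal sketch, assuming a hypothetical simplified loader (load_halo_fields is illustrative only, not yt's LoadedHalo code), where dicts stand in for the halo file and the simulation:

```python
import numpy as np


def load_halo_fields(on_disk, sim_lookup, sort_by_index=False):
    """Hypothetical simplified loader, not yt's LoadedHalo API.

    on_disk: dict of field name -> values in halo-file order; must include
             'particle_index'.
    sim_lookup: dict of field name -> {particle_index: value}, standing in
                for fields read back from the simulation.
    """
    index = np.asarray(on_disk["particle_index"])
    # One permutation shared by every field: identity when not sorting.
    order = index.argsort() if sort_by_index else np.arange(index.size)
    fields = {name: np.asarray(values)[order] for name, values in on_disk.items()}
    for name, table in sim_lookup.items():
        # Gather simulation fields in exactly the order of the on-disk fields.
        fields[name] = np.array([table[i] for i in fields["particle_index"]])
    return fields
```

With sort_by_index=False this corresponds to simply not sorting, the smaller change. With sort_by_index=True every on-disk field is permuted the same way as particle_index, which would only be worth the extra work if something downstream expects ascending indices; nothing in this thread identifies such a caller.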
-Matt
yt-dev mailing list yt-dev@lists.spacepope.org http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org

Hi Josh,
I agree with Matt. I don't see any reason to sort. I'm not terribly familiar with this area of the code, but please take a shot at fixing this and issue a PR if it works. This area of the code will eventually get redesigned, but I'm not sure when, so let's at least get it working correctly for now.
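One way to check that a fix "works" is to take a quantity that is available both from the halo file and from the simulation (in the same units) and confirm the two agree particle by particle when gathered with the loader's particle_index. A minimal sketch; check_order_consistent is a hypothetical helper, not part of yt, and the dict again stands in for simulation access:

```python
import numpy as np


def check_order_consistent(index_from_file, field_from_file, lookup_from_sim):
    """Hypothetical regression check, not part of yt.

    index_from_file: particle indices as returned by the halo loader.
    field_from_file: a field read from the halo HDF5 file, in loader order.
    lookup_from_sim: mapping particle index -> the same quantity (same units)
                     taken from the simulation output.
    """
    # Re-gather the simulation values using the loader's particle_index order.
    gathered = np.array([lookup_from_sim[i] for i in index_from_file])
    # True only if element k of both arrays refers to the same particle.
    return bool(np.allclose(gathered, field_from_file))
```

If particle_index is sorted while the other on-disk fields are not, this check fails for any halo whose particles are not already stored in ascending index order.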
Britton
(In reply to Matthew Turk, matthewturk@gmail.com, Wed, Apr 9, 2014 at 4:06 PM)
participants (3)
- Britton Smith
- Josh Moloney
- Matthew Turk