Hi (Matt, mainly),

I have a packed AMR, dm-only dataset that is giving me some odd issues with the hierarchy. The grids are not being referenced to the correct AMR HDF5 file. These commands:

  from yt.mods import *
  pf = load('RedshiftOutput0005', data_style="enzo_packed_3d")
  pf.h.cpu_map['/mirage/sskory/reddead-survey-amr/RD0005/RedshiftOutput0005.cpu0004'][:10]

give:

  [EnzoGrid_0005, EnzoGrid_1867, EnzoGrid_1868, EnzoGrid_1869, EnzoGrid_1870,
   EnzoGrid_1871, EnzoGrid_1872, EnzoGrid_1873, EnzoGrid_1874, EnzoGrid_1875]

This says that grid 1867 belongs to cpu0004. But the .hierarchy file says Grid 1867 is owned by cpu0005, and a manual inspection confirms it is indeed inside the cpu0005 HDF5 file:

  Grid = 1867
  Task = 5
  GridRank          = 3
  GridDimension     = 12 12 12
  GridStartIndex    = 3 3 3
  GridEndIndex      = 8 8 8
  GridLeftEdge      = 0.5078125 0.521484375 0.337890625
  GridRightEdge     = 0.513671875 0.52734375 0.34375
  Time              = 646.75066015177
  SubgridsAreStatic = 0
  NumberOfBaryonFields = 0
  NumberOfParticles    = 96
  ParticleFileName  = /mirage/sskory/reddead-survey-amr/RD0005/RedshiftOutput0005.cpu0005
  GravityBoundaryType = 2
  Pointer: Grid[1867]->NextGridThisLevel = 0
  Pointer: Grid[1867]->NextGridNextLevel = 0
  Pointer: Grid[5]->NextGridNextLevel = 1868

I'm using the latest trunk, and the files should be world-readable. Any ideas why this is happening?

Thanks!

_______________________________________________________
sskory@physics.ucsd.edu           o__  Stephen Skory
http://physics.ucsd.edu/~sskory/ _.>/ _Graduate Student
________________________________(_)_\(_)_______________
Hi Stephen,
This is the same problem Britton reported. cpu_map is unused, but it
gets created anyway. The current trunk hierarchy is not great.
Can you try commenting out the call to __obtain_filenames inside
populate_hierarchy, and also try with the current mercurial tip, to
see if either or both of those things fix the problem?
-Matt
_______________________________________________
Yt-dev mailing list
Yt-dev@lists.spacepope.org
http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
Yeah, this is exactly that problem. When reading the hierarchy file, all the grid filenames are put into a list and then re-associated with the grids later. In a dm-only dataset, grids with no particles have no filename entry in the hierarchy file, so you end up with a list of files and a list of grids of different lengths. The problem is that __obtain_filenames assumes the lists are the same size and just lays them down next to each other, so every grid that comes later in the hierarchy than a grid without particles gets associated with the wrong filename.
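The misalignment is easy to see in a few lines of plain Python; the grid and file names below are made up for illustration and are not the actual yt internals:

```python
# Each grid's true file:
#   Grid_0001 -> data.cpu0000
#   Grid_0002 -> no particles, so no ParticleFileName line at all
#   Grid_0003 -> data.cpu0002
#   Grid_0004 -> data.cpu0003
grids = ["Grid_0001", "Grid_0002", "Grid_0003", "Grid_0004"]
filenames = ["data.cpu0000", "data.cpu0002", "data.cpu0003"]  # one entry short

# Laying the two lists down next to each other, as __obtain_filenames
# effectively does:
naive = dict(zip(grids, filenames))

# Grid_0002 (which owns no file) picks up data.cpu0002, Grid_0003 picks
# up data.cpu0003 instead of data.cpu0002, and Grid_0004 gets nothing --
# every grid after the particle-less one is shifted by one file.
print(naive)
```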
Commenting out __obtain_filenames will work as long as the paths in the hierarchy are correct, since that routine is what works out the correct paths to the data. That's the reason I didn't commit the fix: I've found it pretty valuable not to have to change all those paths in the hierarchy file every time I copy data somewhere else.
On a related note, how would you guys feel if enzo were changed so that the hierarchy file and such only write the relative path to the filename? For restarts, it would read in the value of GlobalDir and then, when reading the hierarchy file and other places where file paths are given, it would say:

  filename = GlobalDir + <file path on line of hierarchy file>

Personally, I find it a bit redundant that we keep track of GlobalDir and also write full paths to files.
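A minimal sketch of that restart-time reconstruction, assuming a hypothetical helper name (this is not existing enzo or yt code):

```python
import os

def resolve_data_path(global_dir, hierarchy_entry):
    """Hypothetical helper: join GlobalDir with a relative path read
    from the hierarchy file; absolute paths pass through unchanged so
    old-style hierarchies keep working."""
    if os.path.isabs(hierarchy_entry):
        return hierarchy_entry
    return os.path.join(global_dir, hierarchy_entry)

# If the hierarchy stored only "RedshiftOutput0005.cpu0005", a restart
# would rebuild the full path from GlobalDir:
print(resolve_data_path("/mirage/sskory/reddead-survey-amr/RD0005",
                        "RedshiftOutput0005.cpu0005"))
```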
Britton
Commenting out __obtain_filenames will work as long as the paths in the hierarchy are correct, since it's that routine that works out the correct paths to the data. I didn't commit the fix for this reason, as I've found it pretty valuable not to have to change all those paths in the hierarchy file every time I copy data somewhere else.
Oh, drat, I'd forgotten this was why your fix wasn't committed. Boo. Last night I discovered another set of data that was broken with the current trunk parser but works with the new hg parser -- there seems to be something about WoC enzo that, in some combinations of configuration, gives hierarchies that don't work. (I saw it in a 2D dataset last night.) I'd like to move the full hg tree back to trunk -- does anyone object?
On a related note, how would you guys feel if enzo was changed such that the hierarchy file and such only write the relative path to the filename?
Really good.
Matt & Britton,
Commenting out __obtain_filenames will work as long as the paths in the hierarchy are correct, since it's that routine that works out the correct paths to the data.

Commenting out __obtain_filenames in trunk doesn't seem to do it for me. My dataset is located where the full pathnames say they are. It's dying inside _save_data in HierarchyType while trying to save the .yt file, because h5py refuses to save an empty array (h5py.h5.ArgsError: Zero sized dimension for non-unlimited dimension). From pdb:

  -> arr = myGroup.create_dataset(name, data=array)
  (Pdb) name
  'DataFields'
  (Pdb) array
  []
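(That error is what old h5py raises for zero-length data. One possible workaround, sketched here with a made-up helper name and not as the actual fix, is to partition out empty arrays before ever calling create_dataset:)

```python
def filter_writable(datasets):
    """Sketch of a guard for _save_data: split off zero-length arrays,
    which (old) h5py refuses to store as fixed-size datasets.
    'datasets' maps dataset names to sequences; the actual
    myGroup.create_dataset(name, data=arr) call is elided."""
    writable, skipped = {}, []
    for name, arr in datasets.items():
        if len(arr) == 0:
            skipped.append(name)   # e.g. 'DataFields' in a dm-only run
            continue
        writable[name] = arr       # safe to hand to create_dataset
    return writable, skipped

writable, skipped = filter_writable({"DataFields": [], "Level": [0, 1, 2]})
print(writable, skipped)
```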
Last night I discovered another set of data that was broken with the current trunk parser but works with the new hg parser -- there seems to be something about WoC enzo that in some combinations of configuration gives hierarchies that don't work.
This is a WoC-created dataset, so that may introduce more wrinkles into this.
...
I've spent the last while trying to get this to work in the hg yt tip, but I think I need some help. I've discovered a couple of things in HierarchyType.py that seem wrong. I haven't committed fixes for them because I'm not sure I understand what's going on well enough to do that.
h5py doesn't like directly accessing the file pointer. Adding an intermediate step fixes this hangup:
diff -r d2386cb60c6d yt/lagos/HierarchyType.py
--- a/yt/lagos/HierarchyType.py Wed Nov 18 10:07:25 2009 -0800
+++ b/yt/lagos/HierarchyType.py Thu Nov 19 10:06:07 2009 -0800
@@ -345,7 +345,8 @@
"%s.hierarchy" % (pf.parameter_filename))
harray_fn = self.hierarchy_filename[:-9] + "harrays"
if os.path.exists(harray_fn):
- self.num_grids = h5py.File(harray_fn)["/Level"].len()
+ harray_fp = h5py.File(harray_fn)
+ self.num_grids = harray_fp["/Level"].len()
elif os.path.getsize(self.hierarchy_filename) == 0:
raise IOError(-1,"File empty", self.hierarchy_filename)
self.directory = os.path.dirname(self.hierarchy_filename)
These three datasets in the .harrays file are not 1D, they're 3D, so trying to put them in a flatiter won't work. I might be doing something wrong here, but as it stands this doesn't seem quite right. Do we want to keep the self...arrays as flatiters?
@@ -483,9 +484,9 @@
def _parse_binary_hierarchy(self):
mylog.info("Getting the binary hierarchy")
f = h5py.File(self.hierarchy_filename[:-9] + "harrays")
- self.grid_dimensions.flat[:] = f["/ActiveDimensions"][:]
- self.grid_left_edge.flat[:] = f["/LeftEdges"][:]
- self.grid_right_edge.flat[:] = f["/RightEdges"][:]
+ self.grid_dimensions[:] = f["/ActiveDimensions"][:]
+ self.grid_left_edge[:] = f["/LeftEdges"][:]
+ self.grid_right_edge[:] = f["/RightEdges"][:]
levels = f["/Level"][:]
parents = f["/ParentIDs"][:]
procs = f["/Processor"][:]
With these fixes I'm hanging in the "for level in xrange(self.max_level+1):" loop in "_initialize_level_stats()" because self.max_level+1 is much much greater than the actual max:
(Pdb) self.max_level
2867
(Pdb) self.level_stats
rec.array([(8, 16777216, 0), (239, 256240, 1), (2, 944, 2), (1, 384, 3),
(3, 9536, 4), (2, 4992, 5), (2, 1488, 6), (1, 384, 7), (1, 384, 8),
(1, 216, 9), (1, 480, 10), (1, 384, 11), (2, 928, 12), (1, 512, 13),
(1, 288, 14), (3, 3200, 15), (1, 1200, 16), (1, 800, 17),
(1, 288, 18), (1, 512, 19), (1, 640, 20), (1, 384, 21),
(1, 216, 22), (1, 512, 23), (1, 288, 24), (1, 192, 25),
(1, 216, 26), (2, 8608, 27), (3, 4224, 28), (1, 384, 29),
(1, 1440, 30), (1, 640, 31), (1, 144, 32), (3, 1120, 33),
(1, 216, 34), (2, 728, 35), (1, 512, 36), (2, 432, 37),
(1, 384, 38), (1, 288, 39), (1, 288, 40), (1, 216, 41),
(2, 504, 42), (1, 5040, 43), (1, 3528, 44), (1, 1680, 45),
(1, 640, 46), (2, 1216, 47)],
dtype=[('numgrids', '
Commenting out __obtain_filenames in trunk doesn't seem to do it for me. My dataset is located where the full pathnames say they are. It's dying inside of _save_data in HierarchyType where it's trying to save the .yt file, due to h5py being unwilling to save an empty array (h5py.h5.ArgsError: Zero sized dimension for non-unlimited dimension). From pdb:
  -> arr = myGroup.create_dataset(name, data=array)
  (Pdb) name
  'DataFields'
  (Pdb) array
  []
Interesting. Any ideas on how to fix it? :(
I've spent the last while trying to get this to work in the hg yt tip, but I think I need some help. I've discovered a couple of things in HierarchyType.py that seem wrong. I haven't committed them because I'm not sure I understand what's going on well enough to do that.
As a sanity check, can you do this with the hierarchy-opt branch? To make sure I didn't screw anything up during the merge.
h5py doesn't like directly accessing the file pointer. Adding an intermediate step fixes this hangup:
Commit this.
These three datasets in the .harrays file are not 1D, they're 3D, so trying to put them in a flatiter array won't work. I might be doing something wrong here, but as it is doesn't seem quite right. Do we want to keep the self...arrays as flatiters?
No, that's not crucial.
With these fixes I'm hanging in the "for level in xrange(self.max_level+1):" loop in "_initialize_level_stats()" because self.max_level+1 is much much greater than the actual max:
self.max_level is being calculated wrong, and my suspicion is that it's getting the grid count, not the maximum level. Can you manually examine the .harrays file to make sure this is the case? Looking at the short output you have below, looks like it might actually be putting the ParentIDs into the level array.
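The suspected confusion is easy to state in code; the level values below are made up for illustration:

```python
# levels[i] is the refinement level of grid i, as stored in /Level.
levels = [0, 0, 1, 1, 1, 2, 2, 3]

max_level = max(levels)    # correct: the deepest level present
grid_count = len(levels)   # what a buggy parser might pick up instead

# If max_level accidentally ends up as the grid count (or, per the
# suspicion above, as a value from /ParentIDs), then a loop over
# xrange(max_level + 1) iterates thousands of times too many.
print(max_level, grid_count)
```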
Why is MAXLEVEL different than self.max_level?
It's a remnant from the very first piece of code I wrote for yt! :) It can probably be replaced with something, but I don't think it's that big a deal. It's only used in setting up the level stats, and if that happens *after* the max_level is set, we can just use that instead. -Matt
  -> arr = myGroup.create_dataset(name, data=array)
  (Pdb) name
  'DataFields'
  (Pdb) array
  []
Interesting. Any ideas on how to fix it? :(
None now, sorry. I'm focusing on the hg version.
As a sanity check, can you do this with the hierarchy-opt branch? To make sure I didn't screw anything up during the merge.
I get exactly the same issues. Which is a kind of good.
self.max_level is being calculated wrong, and my suspicion is that it's getting the grid count, not the maximum level. Can you manually examine the .harrays file to make sure this is the case? Looking at the short output you have below, looks like it might actually be putting the ParentIDs into the level array.
The maximum value in .harrays["/Level"] (I used h5dump to check, to be sure) is the same as self.max_level, which is as it should be, right? I have self.max_level = 2867, max(["/Level"]) = 2867, the number of grids is 5403, and MAXLEVEL is 48. Should self.level_stats have 2867 entries, 5403, or 48?
The maximum value in .harrays["/Level"] (I used h5dump to check, to be sure) is the same as self.max_level, which is as it should be, right? I have self.max_level = 2867, max(["/Level"]) = 2867, the number of grids is 5403, MAXLEVEL is 48. Should self.level_stats have 2867 entries, 5403 or 48?
No, self.max_level and ["/Level"].max() should NOT be 2867. The arrays mean exactly what you think they mean: the level of a grid. level_stats should have one entry for every level in the hierarchy.

Send me your .harrays file? Perhaps it was generated incorrectly; did it come straight out of Enzo? I wrote both the enzo and yt code for the .harrays, so maybe I screwed something up.

-Matt
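As a sanity check, a level_stats-like table can be rebuilt directly from the /Level array with plain Python (a toy array here, not the real dataset):

```python
from collections import Counter

levels = [0, 0, 0, 1, 1, 2]   # toy /Level array: level of each grid
per_level = Counter(levels)

# One (level, numgrids) entry per level actually present in the
# hierarchy -- this is what level_stats should look like:
level_stats = [(level, per_level[level]) for level in sorted(per_level)]
print(level_stats)   # -> [(0, 3), (1, 2), (2, 1)]
```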
Matt,
No, self.max_level and ["/Level"].max() should NOT be 2867. The arrays mean exactly what you think they mean: the level of a grid. level_stats should have one entry for every level in the hierarchy. Send me your .harrays file? Perhaps that has been generated incorrectly; did it come straight out of Enzo? I wrote both the enzo and yt code for the .harrays, so maybe I screwed something up.
I think it's best if I just point you to it; let me know if it's not readable:

  /mirage/sskory/reddead-survey-amr/RD0005

It did come straight out of enzo-woc rev. c9ad53e32b6c. Here's my enzo setup:

  MACHINE: Triton
  MACHINE-NAME: triton
  PARAMETER_MAX_SUBGRIDS: 100000
  PARAMETER_MAX_BARYONS: 20
  PARAMETER_MAX_TASKS_PER_NODE: 8
  CONFIG_PRECISION: 64
  CONFIG_PARTICLES: 64
  CONFIG_INTEGERS: 64
  CONFIG_INITS: 64
  CONFIG_IO: 64
  CONFIG_USE_MPI: yes
  CONFIG_OBJECT_MODE: 64
  CONFIG_TASKMAP: no
  CONFIG_PACKED_AMR: yes
  CONFIG_PACKED_MEM: no
  CONFIG_LCAPERF: no
  CONFIG_PAPI: no
  CONFIG_PYTHON: no
  CONFIG_ECUDA: no
  CONFIG_OOC_BOUNDARY: no
  CONFIG_OPT: high
  CONFIG_TESTING: no
  CONFIG_TPVEL: no
  CONFIG_PHOTON: yes
  CONFIG_USE_HDF4: no
  CONFIG_NEW_GRID_IO: yes
  CONFIG_BITWISE_IDENTICALITY: no
  CONFIG_FAST_SIB: yes
  CONFIG_FLUX_FIX: yes
participants (3)
- Britton Smith
- Matthew Turk
- Stephen Skory