Hi John,

On Tue, Feb 4, 2014 at 1:21 PM, John Wise <jwise@physics.gatech.edu> wrote:
Hi all,
I've been trying to run rockstar on a sizable Enzo simulation (150k grids) with ~100 outputs, where it's running out of memory just loading the hierarchies. One hierarchy instance consumes almost 1GB! I've found that this problem is not specific to rockstar but affects time series objects in general.
Hm. How are you iterating over the parameter files? With the time series we try to do a load/retain-on-demand system, where the parameter files and their hierarchies are only kept around as long as they need to be. Devin and Hilary looked at how this worked with Rockstar, and I thought they concluded that it was okay.
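To make the load/retain-on-demand idea concrete, here is a minimal sketch in plain Python. It is not yt's time series code: `Output`, `iter_outputs`, and the output names are made up for illustration, and it relies on CPython's reference counting to free each object as soon as the last reference goes away.

```python
import weakref


class Output:
    """Hypothetical stand-in for one loaded output and its hierarchy."""

    def __init__(self, name):
        self.name = name
        self.hierarchy = [0] * 1000  # stands in for large grid metadata


def iter_outputs(names, live):
    """Yield one Output at a time, keeping no reference after each step."""
    for name in names:
        out = Output(name)
        live.add(out)  # WeakSet: tracks the object without keeping it alive
        yield out
        del out  # drop the generator's own reference before the next load


live = weakref.WeakSet()
peak = 0
# The output names below are invented for this example.
for out in iter_outputs(["DD0000", "DD0001", "DD0002"], live):
    # Measured inside the loop body, after the loop variable was rebound.
    peak = max(peak, len(live))
```

Note that the caller's loop variable still holds the previous output while the next one is being constructed, so two outputs can coexist briefly during a load even in this scheme.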
My solution is to explicitly delete the hierarchy's metadata and grids. Since I haven't contributed to yt-3.0 yet, I wanted to run this by everyone before submitting a PR.
My question is about coding style, in that I see very few __del__() functions now. In my working version, I've defined a __del__ function for the grid_geometry_handler as
    def __del__(self):
        del self.grid_dimensions
        del self.grid_left_edge
        del self.grid_right_edge
        del self.grid_levels
        del self.grid_particle_count
        del self.grids
When I delete pf._instantiated_hierarchy after each loop of a time series iterator, I don't see any excessive memory usage anymore. It just reuses the allocated memory from the previous iteration, which is totally fine by me. However, when I include this in a __del__ function for a static_output, I still see excessive memory usage, which is bizarre to me.
Hmm. I'm of two minds on this. On the one hand, I am not really *opposed* to destructors, but I don't like that they are necessary. Because the hierarchy is weirdly self-referential to the static output, this sometimes causes problems and the garbage collector doesn't pick it up. However, when the parameter file is deallocated, it *should* deallocate all of the arrays. Whether it does or not may be related to the system allocator, and whether it reuses the memory is potentially also related to that.

On the other hand, I'd rather fix the issue of having a separate index and static output object, and break the reference cycle between them.

So I guess where I fall down on this is: I'd like to fix the underlying issue, which is something I have been off-and-on working on. But since you are measurably seeing improvement with this change, I'm okay with it going in. But hopefully it will become obsolete eventually. ;-)

Incidentally, I would still like to see how the time series is iterating, and how the references pass through the system. A related paper you might find interesting:

http://www.dlr.de/sc/en/Portaldata/15/Resources/dokumente/PyHPC2013/submissi...

-Matt
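The reference cycle described above can be illustrated with a small, self-contained sketch. The `StaticOutput` and `Hierarchy` classes here are toy stand-ins, not yt's actual classes, and the `use_weakref` switch is a hypothetical way of showing what breaking the cycle would change. The behavior shown is specific to CPython, where objects outside a cycle are freed by reference counting alone.

```python
import gc
import weakref


class Hierarchy:
    """Toy stand-in for a grid hierarchy (hypothetical, not yt's class)."""

    def __init__(self, pf, use_weakref=False):
        # A strong back-reference creates a cycle: pf -> hierarchy -> pf.
        # A weakref.proxy back-reference does not.
        self.pf = weakref.proxy(pf) if use_weakref else pf
        self.grid_levels = [0] * 1000  # stands in for large arrays


class StaticOutput:
    """Toy stand-in for a parameter file / static output (hypothetical)."""

    def __init__(self, use_weakref=False):
        self.hierarchy = Hierarchy(self, use_weakref=use_weakref)


def freed_without_gc(use_weakref):
    """Return True if dropping the last name frees the object at once."""
    gc.disable()  # isolate reference counting from the cycle collector
    try:
        pf = StaticOutput(use_weakref=use_weakref)
        probe = weakref.ref(pf)
        del pf
        return probe() is None
    finally:
        gc.enable()
        gc.collect()  # clean up any cycle left behind


cycle_freed = freed_without_gc(use_weakref=False)
no_cycle_freed = freed_without_gc(use_weakref=True)
```

With the strong back-reference, the object survives `del pf` until the cycle collector runs; with the weak back-reference it is released immediately, along with the hierarchy it owns.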
Should I define a new routine in the grid_geometry_handler, something like clear_hierarchy(), or keep the __del__ function? I ask because I want to keep in line with the overall structure of yt-3.0. This could also be included in the clear_all_data() call.
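For comparison, an explicit clearing routine might look roughly like the sketch below. The class is a hypothetical stand-in rather than the real grid_geometry_handler; only the attribute names and the `clear_hierarchy()` name come from this thread, and plain lists stand in for the metadata arrays.

```python
class GridGeometryHandlerSketch:
    """Hypothetical stand-in; only the attribute names come from the thread."""

    _grid_attrs = (
        "grid_dimensions",
        "grid_left_edge",
        "grid_right_edge",
        "grid_levels",
        "grid_particle_count",
        "grids",
    )

    def __init__(self, n_grids):
        # Plain lists stand in for the per-grid metadata arrays.
        for name in self._grid_attrs:
            setattr(self, name, [0] * n_grids)

    def clear_hierarchy(self):
        """Drop per-grid metadata; safe to call more than once."""
        for name in self._grid_attrs:
            # pop() with a default tolerates attributes already removed
            self.__dict__.pop(name, None)


handler = GridGeometryHandlerSketch(n_grids=8)
handler.clear_hierarchy()
handler.clear_hierarchy()  # second call is a no-op
```

Unlike `__del__`, a method like this runs at a point the caller chooses and does not depend on when, or whether, the object is finalized. It could also be invoked from an existing cleanup path such as clear_all_data().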
What do people think the best approach would be?
Thanks, John
--
John Wise
Assistant Professor of Physics
Center for Relativistic Astrophysics, Georgia Tech
http://cosmo.gatech.edu