parallel_objects not freeing communicators?
I have a script that I'm using to perform various analysis functions on halos from an Enzo simulation. In this script I'm using parallel_objects to split the halos up among multiple processors:

for sto, halo in parallel_objects(halos, num_procs, storage=halo_props):
    sto.result_id = halo.id
    sto.result = analyze_halo(pf, halo, radius, old_time, current_time, f_ej)

Within analyze_halo I use WeightedAverageQuantity on a region containing each halo, which from what I understand also uses parallel_objects:

reg = pf.h.sphere(halo.center_of_mass(), radius)
properties = {'metallicity': (reg.quantities['WeightedAverageQuantity']
    ('metallicity', 'Density')).in_units('Zsun')}

This works fine for a while, but after the script has analyzed several thousand halos it eventually crashes with an MPI exception 'Too many communicators'. I took a quick look at the code in parallel_analysis_interface, and it seems like parallel_objects creates new communicators with every call but never explicitly frees them. Does anyone know if this is actually the case, or is it an issue with my script or the MPI libraries on my machine? I'm using the experimental branch of yt-3.0.

I should be able to get this script working fairly easily by just calculating my weighted average directly rather than calling quantities (and I probably don't want this being done in parallel anyway, since I'm only looking at small regions), but if parallel_objects is actually leaking communicators then it should be fixed at some point.

- Josh
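P.S. For concreteness, by "calculating my weighted average directly" I mean something along these lines (an untested sketch, using the same field names as above):

reg = pf.h.sphere(halo.center_of_mass(), radius)
# Density-weighted average metallicity computed by hand, so no extra
# parallel_objects / communicator setup happens inside reg.quantities.
met = reg['metallicity']
rho = reg['Density']
properties = {'metallicity': ((met * rho).sum() / rho.sum()).in_units('Zsun')}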
Hi Josh,
I suspect you're right, but I can't tell where the problem is arising.
What we do at the end of parallel_objects is this:
if parallel_capable:
    communication_system.pop()
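As an aside, that failure mode is easy to reproduce outside of yt. This standalone mpi4py loop (just an illustration, not yt code) runs out of communicators in the same way if Free() is never called:

from mpi4py import MPI

comms = []
for i in range(100000):
    # Each Dup() allocates a new MPI communicator; most implementations only
    # have a few thousand context ids, so this eventually aborts with an
    # error along the lines of 'Too many communicators'.
    comms.append(MPI.COMM_WORLD.Dup())
    # Calling comms[-1].Free() here would release the context id and avoid it.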
I believe the pop() above was implemented under the understanding that the
communicators, when garbage collected, would be destroyed. Evidently
this is not the case, and so a __del__ method should be implemented
for the Communicator object. I think this will need to be some call
to self.comm.Free(). Can you try that and see if it helps?
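Something like this, roughly — shown here as a toy standalone class rather than the actual yt Communicator, since I'm writing from memory and haven't tested it:

from mpi4py import MPI

class Communicator(object):
    # Toy stand-in for yt's Communicator wrapper; the real one lives in
    # parallel_analysis_interface and carries a lot more machinery.
    def __init__(self, comm=None):
        self.comm = comm

    def __del__(self):
        # Free the underlying MPI communicator when the Python wrapper is
        # garbage collected, returning its context id to the implementation.
        if self.comm is not None and self.comm != MPI.COMM_NULL:
            if self.comm != MPI.COMM_WORLD:
                # Note that MPI_Comm_free is collective, so this only works
                # cleanly if every rank drops its wrapper at the same point.
                self.comm.Free()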
-Matt