I have a question about memory usage of yt's HOP HaloFinder.

I have an N=256^3 DM-only Enzo simulation that I ran with a 512^3 root grid and fairly aggressive DM refinement with MaximumRefinementLevel=7. Although the run has only 256^3 particles, the AMR has resulted in 163,951 grids and more than 1.5e9 grid cells.

Running HOP like so:

halo_list = HaloFinder(pf)

I'm finding that yt uses around 14 GB of memory during the particle reading (prior to actually starting the HOP process), which is way out of proportion to the relatively small number of particles. It seems that the memory usage is driven by the huge number of grids rather than by the number of particles.

I've traced the memory increase to the calculation of the total mass, specifically to this line:

total_mass = self.comm.mpi_allreduce(self._data_source["ParticleMassMsun"].sum(dtype='float64'), op='sum')

and again further down:

sub_mass = self._data_source["ParticleMassMsun"].sum(dtype='float64')

When I specify the total mass as a keyword and comment out the sub_mass calculation (forcing sub_mass = total_mass), the memory usage remains small. So something about the summing up is leaking memory.

Can anyone here shed any light on this puzzling memory hunger?

Mike

--
Dr. Michael Kuhlen
Theoretical Astrophysics Center, UC Berkeley
B-116 Hearst Field Annex # 3411, Berkeley, CA 94720
email: mqk@astro.berkeley.edu | cell: (831) 588-1468 | skype: mikekuhlen
Oops, I should have added that this is in yt/analysis_modules/halo_finding/halo_objects.py, under class HOPHaloFinder.
Hi Mike,
> total_mass = self.comm.mpi_allreduce(self._data_source["ParticleMassMsun"].sum(dtype='float64'), op='sum')
>
> and again further down:
>
> sub_mass = self._data_source["ParticleMassMsun"].sum(dtype='float64')

Could you try turning both of these into a quantity in the source file:

self._data_source.quantities["TotalQuantity"]("ParticleMassMsun")

and see if that changes anything?

--
Stephen Skory
s@skory.us
http://stephenskory.com/
510.621.3687 (google voice)
Mike, sorry to reply so quickly to my own email, but I realized I could have been clearer. Please replace:

self._data_source["ParticleMassMsun"].sum(dtype='float64')

with

self._data_source.quantities["TotalQuantity"]("ParticleMassMsun")

in both cases.

--
Stephen Skory
Hi Stephen,
Yes, that does the trick. However,
self._data_source.quantities["TotalQuantity"]("ParticleMassMsun")
returns a list, so I needed to add a '[0]' in order to get just the
number.
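For concreteness, the pattern I ended up with looks like this (sketched here with a stand-in object, since the real call needs a loaded yt data source; only the indexing behavior is being illustrated):

```python
# Stand-in for a yt data source, used only to show the indexing:
# quantities["TotalQuantity"](field) returns a one-element list, so
# the scalar has to be pulled out with [0].
class StubQuantities:
    def __getitem__(self, name):
        def total_quantity(field):
            return [float(256**3)]  # pretend total particle mass in Msun
        return total_quantity

class StubDataSource:
    quantities = StubQuantities()

source = StubDataSource()
total_mass = source.quantities["TotalQuantity"]("ParticleMassMsun")[0]
print(type(total_mass).__name__)  # float, not list
```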
It's not immediately clear to me how to implement this fix for the
dm_only=True case, in which you only want the sum over DM particles.
Lastly, does the sub_mass calculation have to be done even when
subvolume is None and only a single processor is being used? It seems
in this case sub_mass = total_mass and the second calculation could be
skipped.
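Schematically, the guard I have in mind would look something like this (the function name and the exact check are only my guess at the logic, sketched with stand-in values, not yt's actual code):

```python
# Hypothetical guard: when there is no subvolume and only one processor,
# sub_mass must equal total_mass, so the second pass over the particles
# can be skipped entirely.
def compute_sub_mass(total_mass, subvolume, comm_size, expensive_sum):
    if subvolume is None and comm_size == 1:
        return total_mass  # reuse the value; no second particle read
    return expensive_sum()

# Single-processor run with no subvolume: the expensive path never runs.
calls = []
def expensive_sum():
    calls.append(1)
    return -1.0

sub_mass = compute_sub_mass(100.0, None, 1, expensive_sum)
print(sub_mass, len(calls))  # 100.0 0
```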
Thanks for your help!
Mike
Hi Mike,
> Yes, that does the trick. However, self._data_source.quantities["TotalQuantity"]("ParticleMassMsun") returns a list, so I needed to add a '[0]' in order to get just the number.

I'm glad it helped. I will make this change to the source soon. I always forget about that list part!

> It's not immediately clear to me how to implement this fix for the dm_only=True case, in which you only want the sum over DM particles.

It may be possible to write a special field or something... I'll think about it.

> Lastly, does the sub_mass calculation have to be done even when subvolume is None and only a single processor is being used? It seems in this case sub_mass = total_mass and the second calculation could be skipped.

I think you're right. I'll make this change too! Thanks for pointing this out.

--
Stephen Skory
Hi Mike,
Thanks for letting me take a look at the data. I have identified the
problem. To convert from code-units to good-units, yt calculates the
conversion factor. However, it also batches the grids to convert. To
do so, it calculates -- in this case -- CellVolume for every grid.
(Your 512^3 topgrid exacerbates the problem.) However, because I (and
this was most definitely my fault) did not use the functionality in yt
to ensure that every grid that has a supplemental field loaded then
flushes that field from memory once it has been used, the CellVolume
fields are all retained. So, CellVolume -- along with maybe one or
two other fields -- was being generated for every grid.
Having fixed this, I see about what I would expect for memory use on
this dataset.
I've issued a pull request to fix this problem, and I would request
testing from both you and Stephen, as it touches the way particles are
read and converted. I am leery of changes like this without a few
more sets of eyes. Additionally, I have tested it, and while it gives
the same answer to very good precision, it is different enough
(likely because of concatenation order and FP roundoff; for moving7,
the relative difference in a sum is ~1e-8) that the gold standard will
have to be regenerated.
The PR is here:
https://bitbucket.org/yt_analysis/yt/pull-request/105/particle-io-fix
-Matt
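The failure mode Matt describes can be illustrated with a toy grid class (this is only a schematic of the caching pattern, not yt's real grid code): each grid caches derived fields like CellVolume, and without an explicit flush the cached arrays accumulate across all grids.

```python
# Toy grid that caches derived fields, schematically like yt grids do.
class ToyGrid:
    def __init__(self, ncells):
        self.ncells = ncells
        self.field_data = {}  # cache of derived fields

    def get_field(self, name):
        if name not in self.field_data:
            self.field_data[name] = [1.0] * self.ncells  # "compute" it
        return self.field_data[name]

    def flush(self, name):
        self.field_data.pop(name, None)  # the fix: drop it after use

grids = [ToyGrid(64) for _ in range(1000)]

# Leaky pattern: after the pass, every grid still holds CellVolume.
for g in grids:
    g.get_field("CellVolume")
retained_leaky = sum(len(g.field_data) for g in grids)

# Fixed pattern: flush after use, so nothing accumulates across grids.
for g in grids:
    g.get_field("CellVolume")
    g.flush("CellVolume")
retained_fixed = sum(len(g.field_data) for g in grids)

print(retained_leaky, retained_fixed)  # 1000 0
```

With 163,951 grids and a few retained fields per grid, this accumulation is enough to account for multi-GB memory growth even though the particle count is modest.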
Hi Matt
I cloned your repository and gave it a whirl. I can confirm that the
memory usage now is as expected. Also the halo catalog is identical to
the one that the earlier version produced, so it seems to me that
everything is good.
Cheers,
Mike
Small correction: the old and new halo catalogs aren't strictly
identical. However, the differences between the two are very small,
typically less than 1e-5. Only the center-of-mass positions and the
max_r field occasionally show differences at the 1e-2 level, and even
those fields are usually within 1e-5. I think this is consistent with
roundoff error.
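The kind of check described above can be sketched like this (the column values here are made up for illustration; the real ones come from the two halo catalogs):

```python
# Made-up stand-ins for one column of the two catalogs (old vs. new run).
old_com = [0.501, 0.252, 0.748]
new_com = [x * (1.0 + 1e-6) for x in old_com]  # perturbed at roundoff level

# Largest relative difference across the column.
rel_diff = max(abs(n - o) / abs(o) for n, o in zip(new_com, old_com))
print(rel_diff < 1e-5)  # True: consistent with roundoff error
```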
Hi Matt & Mike,

I just ran some tests and I found no differences between the current yt tip and Matt's branch with the PR. Admittedly, this was on smallish datasets (64^3, 128^3). The differences that Mike reported don't trouble me. I'll go ahead and accept the PR.

Thanks for looking into this issue, Matt!

--
Stephen Skory
participants (3)
- Matthew Turk
- Michael Kuhlen
- Stephen Skory