
I have a question about memory usage of yt's HOP HaloFinder.

I have an N=256^3 DM-only Enzo simulation that I ran with a 512^3 root grid and fairly aggressive DM refinement with MaximumRefinementLevel=7. Although the run only has 256^3 particles, the AMR has resulted in 163,951 grids and more than 1.5e9 grid cells.

Running HOP like so

halo_list = HaloFinder(pf)

I'm finding that yt uses around 14 GB of memory during the particle reading (before the HOP process actually starts), which is way out of proportion to the relatively small number of particles. It seems that the memory usage is being driven by the huge number of grids rather than by the number of particles.

I've traced the memory increase to the calculation of the total mass, specifically to this line:

total_mass = self.comm.mpi_allreduce(self._data_source["ParticleMassMsun"].sum(dtype='float64'), op='sum')

and again further down:

sub_mass = self._data_source["ParticleMassMsun"].sum(dtype='float64')

When I specify the total mass as a keyword and comment out the sub_mass calculation (forcing sub_mass = total_mass), the memory usage stays small. So something about the summing is leaking memory.

Can anyone here shed any light on this puzzling memory hunger?

Mike

--
Dr. Michael Kuhlen
Theoretical Astrophysics Center, UC Berkeley
mqk@astro.berkeley.edu
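A minimal sketch of the keyword workaround mentioned above, assuming yt 2.x and the total_mass keyword that the halo finder is described as accepting; the dataset path and mass value are placeholders:

from yt.mods import *

pf = load("DD0040/DD0040")             # placeholder dataset path
precomputed_total_mass = 8.8e15        # placeholder total particle mass in Msun

# Passing total_mass up front skips the in-place ParticleMassMsun sum that
# drives the memory use during particle reading.
halo_list = HaloFinder(pf, total_mass=precomputed_total_mass)
halo_list.write_out("HopAnalysis.out")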

Oops, I should have added that these lines are in yt/analysis_modules/halo_finding/halo_objects.py, under class HOPHaloFinder.

Hi Mike,
total_mass = self.comm.mpi_allreduce(self._data_source["ParticleMassMsun"].sum(dtype='float64'), op='sum')
and again further down:
sub_mass = self._data_source["ParticleMassMsun"].sum(dtype='float64')
Could you try turning both of these into a quantity in the source file:

self._data_source.quantities["TotalQuantity"]("ParticleMassMsun")

and see if that changes anything?

--
Stephen Skory
s@skory.us
http://stephenskory.com/

Mike,

Sorry to reply so quickly to my own email, but I realized I could have been clearer. Please replace

self._data_source["ParticleMassMsun"].sum(dtype='float64')

with

self._data_source.quantities["TotalQuantity"]("ParticleMassMsun")

in both cases.

--
Stephen Skory

Hi Stephen,

Yes, that does the trick. However, self._data_source.quantities["TotalQuantity"]("ParticleMassMsun") returns a list, so I needed to add a '[0]' in order to get just the number.

It's not immediately clear to me how to implement this fix for the dm_only=True case, in which you only want the sum over the DM particles.

Lastly, does the sub_mass calculation have to be done even when subvolume is None and only a single processor is being used? It seems that in this case sub_mass = total_mass and the second calculation could be skipped.

Thanks for your help!

Mike
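For reference, a sketch of how the two mass calculations read after this change (not a verbatim patch; self.comm and self._data_source are as in the HOPHaloFinder code quoted at the top of the thread):

# TotalQuantity returns a list, so [0] extracts the scalar value.
total_mass = self.comm.mpi_allreduce(
    self._data_source.quantities["TotalQuantity"]("ParticleMassMsun")[0],
    op='sum')

# ... and further down:
sub_mass = self._data_source.quantities["TotalQuantity"]("ParticleMassMsun")[0]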

Hi Mike,
Yes, that does the trick. However, self._data_source.quantities["TotalQuantity"]("ParticleMassMsun") returns a list, so I needed to add a '[0]' in order to get just the number.
I'm glad it helped. I will make this change soon to the source. I always forget about that list part!
It's not immediately clear to me how to implement this fix for the dm_only=True case, in which you only want the sum over DM particles.
It may be possible to write a special field or something... I'll think about it.
Lastly, does the sub_mass calculation have to be done even when subvolume is None and only a single processor is being used? It seems in this case sub_mass = total_mass and the second calculation could be skipped.
I think you're right. I'll make this change too! Thanks for pointing this out.

--
Stephen Skory
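One possible shape for the special field mentioned above, to cover the dm_only=True case: a derived particle field that zeroes out the mass of everything that is not dark matter, so the same TotalQuantity call can be reused. This is only a rough sketch, assuming yt 2.x field machinery and Enzo's convention that particle_type == 1 marks dark matter particles; the field name DMParticleMassMsun is made up for illustration:

from yt.mods import add_field

def _DMParticleMassMsun(field, data):
    # Keep the mass of dark-matter particles only; everything else
    # contributes zero to the sum.
    return data["ParticleMassMsun"] * (data["particle_type"] == 1)

add_field("DMParticleMassMsun", function=_DMParticleMassMsun,
          particle_type=True)

# The dm_only total could then come from the same derived-quantity route:
#   total_mass = self._data_source.quantities["TotalQuantity"]("DMParticleMassMsun")[0]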

Hi Mike,

Thanks for letting me take a look at the data. I have identified the problem. To convert from code units to good units, yt calculates the conversion factor, and it batches the grids to convert. To do so, it calculates -- in this case -- CellVolume for every grid. (Your 512^3 topgrid exacerbates the problem.) However, because I (and this was most definitely my fault) did not use the functionality in yt to ensure that every grid that loads a supplemental field then flushes that field from memory once it has been used, the CellVolume fields are all retained. So CellVolume -- along with maybe one or two other fields -- was being generated and kept for every grid.

Having fixed this, I see about what I would expect for memory use on this dataset.

I've issued a pull request to fix this problem, and I would request testing from both you and Stephen, as it touches the way particles are read and converted. I am leery of changes like this without a few more sets of eyes. Additionally, I have tested it, and while it gives the same answer to very good precision, it is different enough (likely because of concatenation order and FP roundoff; for moving7, the relative difference in a sum is ~1e-8) that the gold standard will have to be regenerated.

The PR is here:

https://bitbucket.org/yt_analysis/yt/pull-request/105/particle-io-fix

-Matt
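A schematic of the retention pattern described above (convert_particles is a stand-in name, not yt's actual particle I/O code, and clear_data() is shown only as one way a grid's cached fields might be released; the real fix is the one in the PR above). Accessing a field on a grid generates it and caches it on that grid object, so with ~164,000 grids the cached CellVolume arrays dominate memory unless each grid is flushed after use:

for grid in pf.h.grids:
    cell_volume = grid["CellVolume"]      # supplemental field, cached on the grid
    convert_particles(grid, cell_volume)  # stand-in for the conversion step
    grid.clear_data()                     # flush the cache; without this, every
                                          # grid keeps its CellVolume array alive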

Hi Matt,

I cloned your repository and gave it a whirl. I can confirm that the memory usage is now as expected. The halo catalog is also identical to the one that the earlier version produced, so it seems to me that everything is good.

Cheers,
Mike

A small correction: the old and new halo catalogs aren't strictly identical. However, the differences between the two are very small, typically less than 1e-5. Only the center-of-mass positions and the max_r field occasionally show differences at the 1e-2 level, but even those fields are usually within 1e-5. I think this is consistent with roundoff error.

Mike
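A quick illustration of why a change in summation or concatenation order produces differences at this level: summing the same values in a different order gives slightly different answers in finite precision (numpy, purely illustrative):

import numpy as np

np.random.seed(0)
masses = np.random.random(256**3).astype("float32")   # stand-in particle masses

in_order = masses.sum(dtype="float32")
reordered = masses[np.random.permutation(masses.size)].sum(dtype="float32")

# Typically a small but nonzero relative difference, not exact equality.
print(abs(in_order - reordered) / in_order)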

Hi Matt & Mike,

I just ran some tests and found no differences between the current yt tip and Matt's branch with the PR. Admittedly, this was on smallish datasets (64^3, 128^3). The differences that Mike reported don't trouble me. I'll go ahead and accept the PR.

Thanks for looking into this issue, Matt!

--
Stephen Skory
participants (3)
- Matthew Turk
- Michael Kuhlen
- Stephen Skory