Hi all,

I found that using an Enzo dataset I was getting slightly different numbers when

1) using the list of dark matter particles selected by creation_time < 0.0

In [97]: sph_dm = sph['creation_time'] < 0.0
In [98]: print "%12.12e" % (sph['ParticleMassMsun'][sph_dm]).sum()
1.211311468567e+11

2) compared to summing the dark matter density field inside the 3D container with TotalQuantity

In [101]: print "%12.12e" % (sph.quantities['TotalQuantity']('Dark_Matter_Density')[0]*vol/Msun)
1.188937185993e+11

I'm wondering if the field and the particles are handled differently when being counted as inside or outside the 3D container?

From
G.S.
Geoffrey,

It seems that what is going on here is that whereas the particles in the container are simply the ones contained within the container's cells, the dark matter density field is produced by Enzo by smearing the particle masses onto the grid with the cloud-in-cell (CIC) algorithm. For particles near the edge of the container, some of their mass has been deposited onto cells just outside it, so it's not really an apples-to-apples comparison.

John ZuHone
Laboratory for High-Energy Astrophysics
NASA/Goddard Space Flight Center
8800 Greenbelt Rd., Mail Code 662
Greenbelt, MD 20771
(w) 301-286-2531
(m) 781-708-5004
john.zuhone@nasa.gov
jzuhone@gmail.com
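To make the edge effect concrete, here is a minimal one-dimensional cloud-in-cell sketch in plain numpy. It is illustrative only -- the grid size, spacing, and particle position are made up, and this is not Enzo's or yt's actual deposition code:

import numpy

# 1D grid of 4 cells with spacing dx = 1.0; cell centers at 0.5, 1.5, 2.5, 3.5.
dx = 1.0
grid_mass = numpy.zeros(4)

# One unit-mass particle sitting near the right edge of the last cell.
x, mass = 3.8, 1.0

# CIC: split the mass linearly between the two nearest cell centers.
i = int(numpy.floor(x / dx - 0.5))   # index of the cell center to the left of x
frac = x / dx - 0.5 - i              # fractional distance toward the next center
for j, w in ((i, 1.0 - frac), (i + 1, frac)):
    if 0 <= j < grid_mass.size:
        grid_mass[j] += mass * w

print grid_mass
# prints roughly [ 0.  0.  0.  0.7]: the remaining 0.3 of the particle's mass
# would have landed in the neighboring cell to the right. If that cell lies
# just outside a data container, the container's density field misses that
# mass, which is exactly the kind of mismatch described above.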
So in that case, would using TotalQuantity to sum up particle information inside a 3D container be bad practice?

From
G.S.
I think it depends on what you are trying to measure. If you're concerned with grid quantities, it may make more sense to use the dark matter density field; it is certainly more accurate in a localized sense, since it is essentially the DM mass interpolated onto the grid. However, since you're looking at a total quantity, the larger the region is, the less the difference between the two will matter.

Neither quantity is really "better", if you ask me. It's more important just to be aware of what the subtle difference between them is. In any case, it looks like the difference in the total quantity is only a couple of percent. Are you attempting to quantify something with that level of sensitivity?

John
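For reference, here is a quick check of that estimate using the two totals from the first message (the variable names are just for illustration):

m_particles = 1.211311468567e+11   # sum over the dark matter particle list
m_field = 1.188937185993e+11       # TotalQuantity on Dark_Matter_Density
print (m_particles - m_field) / m_particles
# prints roughly 0.0185, i.e. a bit under 2%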
Hi Geoffrey,
I think John's answer is completely correct, and likely the dominant problem, but there's a funny thing that happens if you're not extra careful with big arrays. As an example, something I ran into a couple of years ago:

import numpy

# This is an array filled with ones, so its true sum equals its size.
arr = numpy.ones((512, 512, 512), dtype="float32")
print arr.sum()                   # accumulates in float32 and loses precision
print arr.size                    # 134217728
print arr.sum(dtype="float64")    # accumulates in float64: 134217728.0

The output isn't what you might expect, because numpy uses the dtype of the array for the accumulation. The quantities -- thanks to Doug Rudd -- are uniformly careful about upgrading the accumulators used for quantities to 64 bits. Again, I don't think this is dominant, but it is something to be aware of.

-Matt
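Applied to the original comparison, the same precaution would look like the following; sph and sph_dm are as defined in Geoffrey's first message, so this is a sketch rather than output from that session:

print "%12.12e" % sph['ParticleMassMsun'][sph_dm].sum(dtype="float64")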
Thanks, Matt, John. Some follow-up:

"In any case, it looks like the difference in the total quantity is only a couple of percent. Are you attempting to quantify something with that level of sensitivity?"

- I'm trying to compare star particle mass, using TotalQuantity with StarMassMsun versus summing the star particles' masses directly. For small halos that span only a few cells, the difference can be quite high, ~O(10%). So from the perspective of the particles actually inside the halo, I should use the particle list inside the 3D object, right? Same thing with dark matter particle mass?

"The output isn't what you might expect, because numpy uses the dtype of the array for the accumulation."

- I don't quite understand why float32 accumulation with sum() doesn't equal the true value but casting to float64 does (isn't 512^3 < 2^32, so no overflow should occur?). Should I always use arr.sum(dtype='float64') when using sum() with possibly large datasets?

From
G.S.
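For the star-particle side of that comparison, here is a minimal sketch of the particle-list approach, reusing sph and sph_dm from the first message and simply taking star particles to be everything not selected as dark matter -- whether that complement really matches your star-particle definition is an assumption to check against your dataset:

sph_stars = ~sph_dm    # complement of the dark matter selection above
print "%12.12e" % sph['ParticleMassMsun'][sph_stars].sum(dtype="float64")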
On Sat, Dec 21, 2013 at 5:43 AM, Matthew Turk
Hi Geoffrey,
On Fri, Dec 20, 2013 at 9:02 PM, Geoffrey So
wrote: Hi all,
I found that using a Enzo dataset I was getting slightly different numbers when 1) using the list of dark matter particles selected by creation_time < 0.0
In [97]: sph_dm = sph['creation_time'] < 0.0 In [98]: print "%12.12e" % (sph['ParticleMassMsun'][sph_dm]).sum() 1.211311468567e+11
2) compared to summing the dark matter particles inside a 3D container with TotalQuantity
In [101]: print "%12.12e" % (sph.quantities['TotalQuantity']('Dark_Matter_Density')[0]*vol/Msun) 1.188937185993e+11
I'm wondering if the field and particles are handled differently when being counted as inside or outside the 3D container?
I think John's answer is completely correct, and likely the dominant problem, but there's a funny thing that happens if you're not extra careful with big arrays. As an example, something that I ran into a couple years ago:
# This is an array filled with ones. arr = numpy.ones((512, 512, 512), dtype="float32") print arr.sum() print arr.size print arr.sum(dtype = "float64")
The output isn't what you might expect, because numpy uses the dtype of the array for the accumulation. The quantities -- thanks to Doug Rudd -- are uniformly careful about upgrading to 64 bits the accumulators used for quantities. Again, I don't think this is dominant, but it is something to be aware of.
-Matt
From G.S.
_______________________________________________ yt-users mailing list yt-users@lists.spacepope.org http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org
_______________________________________________ yt-users mailing list yt-users@lists.spacepope.org http://lists.spacepope.org/listinfo.cgi/yt-users-spacepope.org
Hi Geoffrey,
On Sun, Dec 22, 2013 at 7:51 PM, Geoffrey So wrote:

"I'm trying to compare star particle mass, using TotalQuantity with StarMassMsun versus summing the star particles' masses directly. For small halos that span only a few cells, the difference can be quite high, ~O(10%). So from the perspective of the particles actually inside the halo, I should use the particle list inside the 3D object, right? Same thing with dark matter particle mass?"
I think it all comes down to how you want to represent a halo, the force resolution in your simulation, and how the halos are identified. As John noted, you're smearing your stars over a handful of cells, so if that is what you want to be examining and calculating, then you are doing it right; if the precision of your halo definition is higher than the precision of a cell, then you should use the halo definition. I am not prepared to make a suggestion on that, as it will really depend on the end to which you put the results of the analysis, the specific manner in which the data was generated, and how you generate your halo definitions.
"The output isn't what you might expect, because numpy uses the dtype of the array for the accumulation." - I don't quite understand why would float32 accumulation with sum() not equal to the actual thing but casting it as float64 does (isn't 2^32 < 512^3 so no overflow should occur)? Should I always use arr.sum(dtype='float64') when using sum() with possible large datasets?
You should probably use float64 even for relatively small arrays -- the roundoff error can accumulate quite quickly.

As for the other issue, 2^32 isn't the maximum for float32; you have to track the mantissa as well as the exponent for float values. You might find these of some interest:

http://www.cplusplus.com/reference/climits/
http://en.wikipedia.org/wiki/C_data_types

But often it's easier to think about things in terms of relative differences -- the problem isn't representing big or small values, it's representing relatively small differences. A couple of years ago I forwarded to yt-dev the exchange I had on numpy-discussion about this; if you look in the May 2010 archives you'll see it.

-Matt
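A minimal illustration of that last point in plain numpy (nothing yt-specific): once a float32 value reaches 2**24, adding 1.0 no longer changes it, because the relative difference is too small for float32's 24-bit significand to represent.

import numpy

x = numpy.float32(2**24)          # 16777216.0
print x + numpy.float32(1.0)      # still 16777216.0: the increment is lost to rounding
print numpy.float64(x) + 1.0      # 16777217.0: a 64-bit accumulator keeps the difference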
participants (3)
- Geoffrey So
- John ZuHone
- Matthew Turk