Hi all, I was trying to calculate the spin parameter for a halo with the following simple script, center = [0.495013, 0.494046, 0.495160] radius = 100 # pc pf = load("DD0019/output_0019") sp = pf.h.sphere(center, radius/pf["pc"]) gas_spin = sp.quantities["BaryonSpinParameter"]() and it doesn't work in parallel. It gives the following traceback. --- Traceback (most recent call last): File "spin.py", line 9, in <module> gas_spin = sp.quantities["BaryonSpinParameter"]() File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/yt-2.1dev-py2.6-macosx-10.6-intel.egg/yt/data_objects/derived_quantities.py", line 85, in __call__ self.func(e, *args, **kwargs) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/yt-2.1dev-py2.6-macosx-10.6-intel.egg/yt/data_objects/derived_quantities.py", line 237, in _BaryonSpinParameter am = data["SpecificAngularMomentum"]*data["CellMassMsun"] ValueError: shape mismatch: objects cannot be broadcast to a single shape Traceback (most recent call last): File "spin.py", line 9, in <module> gas_spin = sp.quantities["BaryonSpinParameter"]() File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/yt-2.1dev-py2.6-macosx-10.6-intel.egg/yt/data_objects/derived_quantities.py", line 85, in __call__ self.func(e, *args, **kwargs) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/yt-2.1dev-py2.6-macosx-10.6-intel.egg/yt/data_objects/derived_quantities.py", line 237, in _BaryonSpinParameter am = data["SpecificAngularMomentum"]*data["CellMassMsun"] ValueError: shape mismatch: objects cannot be broadcast to a single shape --- This happens on multiple systems. I looked into this further, and I think the shapes of the angular momentum data are not being preserved in the MPI calls. Also the shapes appear wrong in a serial calculation. In serial, everything looks fine until data["SpecificAngularMomentum"] is returned. Here are the shapes of the arrays. v_vec: (3,16,16,16) cross product of r_vec and v_vec: (3,16,16,16) and the returned array from _SpecificAngularMomentum has a shape of (3,0)! Then when I try to run it with 2 processors, the shapes change to v_vec: (3,4096) cross: (3,4096) _SpecificAngularMomentum result: (12288,) Can someone replicate this error? Any ideas? Thanks, John
Hi John, (Sorry for the delay, I was away from email yesterday and that fortuitously gave me time to compose my reply in my head a bit.) I guess this isn't that surprising to me. I have done a poor job of handling vector fields in yt. I think the AngularMomentum fields actually date back to before the code was called 'yt,' and back in those days you could even calculate the inertial tensor as a fully-local field. The problem that I ended up running into is exactly the problem you describe. The problem is basically what you point out: 1) When generating a covering grid, you end up with (I,J,K,N) where N is the number of components to the vector or tensor field. 2) Concatenating and joining needs substantially more care, as do operations like multiplication where broadcasting may result in exceptionally large arrays if not handled correctly. 3) The weighting and averaging of fields like angular momentum often have to be handled differently based on both the user and the type of process being done; by making it epsilon harder, I guess I implicitly made it less likely that averaging errors were the result of yt's issues rather than the user. I know, I know, this kind of decision is a bit ... callous / inconsiderate? To get around this, I actually added SpecificAngularMomentum[XYZ], and that's what I've been using. This made it more clear when you wanted to generate, say, the angular momentum magnitude in a given radial bin, where the square root sign is placed. Anyway, those things aside, I do agree it would be nice to have a bit of an easier mechanism for handling vector fields. In fact, it would be nice to have an easier and more standard mechanism for addressing fields on the whole. I think while we could fix this specific problem, it would be better to hold off on addressing it directly until we have both a better method for handling fields in general as well as a better method for regression and answer testing. There are a few ways we could address the issue of fields. Right now fields are just numpy.ndarrays, and individual fields do not know anything about themselves: there is such a thing as a DerivedField object (line 220, yt/data_objects/field_info_container.py), but this object only serves as a value in a dictionary of fields and as a mechanism for generating an numpy.ndarray; once the numpy.ndarray had been created, it no longer retains any reference to the original field. This is good and bad; if we were to subclass ndarray to retain field name, units, etc, then it would be bad for this to contain an incorrect reference once any value had been changed. (i.e., if we multiplied by 10 and the old field name were to be retained.) However, we *could* subclass ndarray (http://docs.scipy.org/doc/numpy/user/basics.subclassing.html) to get some measure of functionality. If we were to subclass, then we would be able to implement combination functions, etc etc, that operated in a yt-specific way. However, there *are* only a few places that we actually do manipulations of the array content. We could construct a new utilities section for array handling; this could include the concatenation of arrays inside AMRData objects, multiplication, and so on. This is a bit of a longer term process, and again would require better testing mechanisms than we currently have in place. (And while we're at it, I'd like to standardize on field names that mean something and that are distinct from the fields in the file. This code can be a bit fragile, with translation tables and so on.) I guess that's kind of a winding response that ends with: right now, vector/tensor fields can be generated, but are not reliable inputs to derived quantities and parallelism. However, you can get around this with specific components. And, in the future, we might be able to fix this to make a general solution, but the code base isn't ready just yet. Best, Matt On Tue, Feb 8, 2011 at 9:02 AM, John Wise <jwise@astro.princeton.edu> wrote:
Hi all,
I was trying to calculate the spin parameter for a halo with the following simple script,
center = [0.495013, 0.494046, 0.495160] radius = 100 # pc pf = load("DD0019/output_0019") sp = pf.h.sphere(center, radius/pf["pc"]) gas_spin = sp.quantities["BaryonSpinParameter"]()
and it doesn't work in parallel. It gives the following traceback.
--- Traceback (most recent call last): File "spin.py", line 9, in <module> gas_spin = sp.quantities["BaryonSpinParameter"]() File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/yt-2.1dev-py2.6-macosx-10.6-intel.egg/yt/data_objects/derived_quantities.py", line 85, in __call__ self.func(e, *args, **kwargs) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/yt-2.1dev-py2.6-macosx-10.6-intel.egg/yt/data_objects/derived_quantities.py", line 237, in _BaryonSpinParameter am = data["SpecificAngularMomentum"]*data["CellMassMsun"] ValueError: shape mismatch: objects cannot be broadcast to a single shape Traceback (most recent call last): File "spin.py", line 9, in <module> gas_spin = sp.quantities["BaryonSpinParameter"]() File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/yt-2.1dev-py2.6-macosx-10.6-intel.egg/yt/data_objects/derived_quantities.py", line 85, in __call__ self.func(e, *args, **kwargs) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/yt-2.1dev-py2.6-macosx-10.6-intel.egg/yt/data_objects/derived_quantities.py", line 237, in _BaryonSpinParameter am = data["SpecificAngularMomentum"]*data["CellMassMsun"] ValueError: shape mismatch: objects cannot be broadcast to a single shape ---
This happens on multiple systems. I looked into this further, and I think the shapes of the angular momentum data are not being preserved in the MPI calls. Also the shapes appear wrong in a serial calculation. In serial, everything looks fine until data["SpecificAngularMomentum"] is returned. Here are the shapes of the arrays.
v_vec: (3,16,16,16) cross product of r_vec and v_vec: (3,16,16,16) and the returned array from _SpecificAngularMomentum has a shape of (3,0)!
Then when I try to run it with 2 processors, the shapes change to v_vec: (3,4096) cross: (3,4096) _SpecificAngularMomentum result: (12288,)
Can someone replicate this error? Any ideas?
Thanks, John _______________________________________________ Yt-dev mailing list Yt-dev@lists.spacepope.org http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
Hi Matt, Thanks for the in-depth explanation and no worries on the delay! This makes sense to me now. I think the idea of having a set of array utilities would be best for yt. But as you say, this will take some time and patience to migrate it to the rest of the code. I can go ahead and update the derived quantities that use vector fields to consider the components instead. John On 9 Feb 2011, at 10:29, Matthew Turk wrote:
Hi John,
(Sorry for the delay, I was away from email yesterday and that fortuitously gave me time to compose my reply in my head a bit.)
I guess this isn't that surprising to me. I have done a poor job of handling vector fields in yt. I think the AngularMomentum fields actually date back to before the code was called 'yt,' and back in those days you could even calculate the inertial tensor as a fully-local field.
The problem that I ended up running into is exactly the problem you describe. The problem is basically what you point out:
1) When generating a covering grid, you end up with (I,J,K,N) where N is the number of components to the vector or tensor field. 2) Concatenating and joining needs substantially more care, as do operations like multiplication where broadcasting may result in exceptionally large arrays if not handled correctly. 3) The weighting and averaging of fields like angular momentum often have to be handled differently based on both the user and the type of process being done; by making it epsilon harder, I guess I implicitly made it less likely that averaging errors were the result of yt's issues rather than the user. I know, I know, this kind of decision is a bit ... callous / inconsiderate?
To get around this, I actually added SpecificAngularMomentum[XYZ], and that's what I've been using. This made it more clear when you wanted to generate, say, the angular momentum magnitude in a given radial bin, where the square root sign is placed.
Anyway, those things aside, I do agree it would be nice to have a bit of an easier mechanism for handling vector fields. In fact, it would be nice to have an easier and more standard mechanism for addressing fields on the whole. I think while we could fix this specific problem, it would be better to hold off on addressing it directly until we have both a better method for handling fields in general as well as a better method for regression and answer testing.
There are a few ways we could address the issue of fields.
Right now fields are just numpy.ndarrays, and individual fields do not know anything about themselves: there is such a thing as a DerivedField object (line 220, yt/data_objects/field_info_container.py), but this object only serves as a value in a dictionary of fields and as a mechanism for generating an numpy.ndarray; once the numpy.ndarray had been created, it no longer retains any reference to the original field. This is good and bad; if we were to subclass ndarray to retain field name, units, etc, then it would be bad for this to contain an incorrect reference once any value had been changed. (i.e., if we multiplied by 10 and the old field name were to be retained.) However, we *could* subclass ndarray (http://docs.scipy.org/doc/numpy/user/basics.subclassing.html) to get some measure of functionality. If we were to subclass, then we would be able to implement combination functions, etc etc, that operated in a yt-specific way.
However, there *are* only a few places that we actually do manipulations of the array content. We could construct a new utilities section for array handling; this could include the concatenation of arrays inside AMRData objects, multiplication, and so on. This is a bit of a longer term process, and again would require better testing mechanisms than we currently have in place.
(And while we're at it, I'd like to standardize on field names that mean something and that are distinct from the fields in the file. This code can be a bit fragile, with translation tables and so on.)
I guess that's kind of a winding response that ends with: right now, vector/tensor fields can be generated, but are not reliable inputs to derived quantities and parallelism. However, you can get around this with specific components. And, in the future, we might be able to fix this to make a general solution, but the code base isn't ready just yet.
Best,
Matt
On Tue, Feb 8, 2011 at 9:02 AM, John Wise <jwise@astro.princeton.edu> wrote:
Hi all,
I was trying to calculate the spin parameter for a halo with the following simple script,
center = [0.495013, 0.494046, 0.495160] radius = 100 # pc pf = load("DD0019/output_0019") sp = pf.h.sphere(center, radius/pf["pc"]) gas_spin = sp.quantities["BaryonSpinParameter"]()
and it doesn't work in parallel. It gives the following traceback.
--- Traceback (most recent call last): File "spin.py", line 9, in <module> gas_spin = sp.quantities["BaryonSpinParameter"]() File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/yt-2.1dev-py2.6-macosx-10.6-intel.egg/yt/data_objects/derived_quantities.py", line 85, in __call__ self.func(e, *args, **kwargs) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/yt-2.1dev-py2.6-macosx-10.6-intel.egg/yt/data_objects/derived_quantities.py", line 237, in _BaryonSpinParameter am = data["SpecificAngularMomentum"]*data["CellMassMsun"] ValueError: shape mismatch: objects cannot be broadcast to a single shape Traceback (most recent call last): File "spin.py", line 9, in <module> gas_spin = sp.quantities["BaryonSpinParameter"]() File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/yt-2.1dev-py2.6-macosx-10.6-intel.egg/yt/data_objects/derived_quantities.py", line 85, in __call__ self.func(e, *args, **kwargs) File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/yt-2.1dev-py2.6-macosx-10.6-intel.egg/yt/data_objects/derived_quantities.py", line 237, in _BaryonSpinParameter am = data["SpecificAngularMomentum"]*data["CellMassMsun"] ValueError: shape mismatch: objects cannot be broadcast to a single shape ---
This happens on multiple systems. I looked into this further, and I think the shapes of the angular momentum data are not being preserved in the MPI calls. Also the shapes appear wrong in a serial calculation. In serial, everything looks fine until data["SpecificAngularMomentum"] is returned. Here are the shapes of the arrays.
v_vec: (3,16,16,16) cross product of r_vec and v_vec: (3,16,16,16) and the returned array from _SpecificAngularMomentum has a shape of (3,0)!
Then when I try to run it with 2 processors, the shapes change to v_vec: (3,4096) cross: (3,4096) _SpecificAngularMomentum result: (12288,)
Can someone replicate this error? Any ideas?
Thanks, John _______________________________________________ Yt-dev mailing list Yt-dev@lists.spacepope.org http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
_______________________________________________ Yt-dev mailing list Yt-dev@lists.spacepope.org http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
participants (2)
-
John Wise
-
Matthew Turk