I was wondering whether there is a consensus on what the precision of the data output by YT should be. I came upon this question when trying to output ellipsoidal parameters along with halo attributes. With the current setting, the halo attributes are output with 9 decimal places, but the ellipsoid parameters determined from the particles' positions (when the data is 64-bit) have 16 decimals. I was thinking that it is best to keep whatever precision we have, but Stephen brought up a good point: the halos are only vaguely defined, so the extra digits are a waste of storage. So I just want to see whether anyone cares, and what we want to go with. Either way, I think it is best to be consistent across the board, with both halo and ellipsoid IO using the same number of decimals. From G.S.
Hi all,
With the current setting, the halo attributes are output with 9 decimal places, but the ellipsoid parameters determined from the particles' positions (when the data is 64-bit) have 16 decimals.
Just to clarify, what I've done is add an option to the halos.write_out() function (which outputs the HopAnalysis.out file) to add 5 or so extra columns for the ellipsoid information. So what Geoffrey is thinking about is increasing the precision of all the floats in that text file. -- Stephen Skory s@skory.us http://stephenskory.com/ 510.621.3687 (google voice)
Hi Stephen and Geoffrey,
I would prefer we stick with the longer IO output. The reason is not so much that we believe a halo truly exists at that specified precision, but that we should do our very best to communicate the precise location between sessions. This may also come into play with very high precision runs.
My personal preference would be to use an all-binary storage format as our *primary* storage format and then allow ASCII for secondary, caveat emptor purposes. I believe that both the IRATE group and the Galacticus group are pushing forward with halo cataloging methods that will be binary.
-Matt
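The point about communicating a precise location between sessions can be checked with a small sketch. The value and the two format strings below are assumptions chosen for illustration; they are not taken from the halo writer itself. The sketch compares a fixed nine-decimal format with 17 significant digits, which is enough to round-trip any IEEE 754 double.

```python
# Illustrative only: the value and format strings are assumptions,
# not the formats used by halos.write_out().
x = 0.12345678901234567  # a float64 coordinate in code units

nine_decimals = "%.9f" % x    # fixed nine decimal places
seventeen_sig = "%.17g" % x   # 17 significant digits

# Reading the short form back does not recover the original float64,
lossy = float(nine_decimals)
# while 17 significant digits round-trip an IEEE 754 double exactly.
exact = float(seventeen_sig)

print(nine_decimals, lossy == x)
print(seventeen_sig, exact == x)
```

The shorter form is still accurate to the printed digits; it simply cannot reproduce the same 64-bit value when read back in a later session.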
I am all for binary, especially since storage is a big problem for some of the bigger simulations. I am just curious, though: which binary formats are the other simulation groups pushing for? Is there a consistent way of storing binary data in YT? I am asking because currently I sometimes pickle my own data, or store it in HDF5 (.h5) binary or in the .yt files, and I didn't realize until recently that the different binaries are not compatible. Are we looking for a universal YT way of storing binaries? One big advantage would be that everyone could read the binary if we all store it the same way, and I'm sure there are many other advantages. But the headache is getting everyone to agree on which way to store it. From G.S.
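As a rough illustration of the storage argument, here is a minimal sketch comparing one halo record written as fixed-width binary and as full-precision ASCII. The record layout (an integer id followed by five float64 values) and the sample numbers are hypothetical; this is not the IRATE or Galacticus format, and not what yt writes.

```python
import struct

# Hypothetical halo record: id, then mass, x, y, z, radius as float64.
# The field list and byte layout are assumptions for illustration.
RECORD = struct.Struct("<q5d")  # little-endian: one int64, five float64

halo = (42, 1.2345678901234567e13, 0.5123456789012345,
        0.25, 0.7512345678901234, 0.0012345678901234567)

# Binary form: fixed size, exact float64 values.
binary = RECORD.pack(*halo)
restored = RECORD.unpack(binary)

# ASCII form with enough digits to round-trip each float64.
ascii_line = " ".join(["%d" % halo[0]] + ["%.17g" % v for v in halo[1:]]) + "\n"

print(len(binary), "bytes as binary")
print(len(ascii_line), "bytes as full-precision ASCII")
print("binary round trip exact:", restored == halo)
```

Both forms preserve the values here; the binary record is smaller and fixed-width, but it is only readable by someone who knows the layout, which is the agreement problem raised above.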
Hi Geoffrey,
On Mon, Aug 15, 2011 at 10:19 PM, Geoffrey So <gsiisg@gmail.com> wrote:
I am all for binary, especially since storage is a big problem for some of the bigger simulations.
I am just curious, though: which binary formats are the other simulation groups pushing for? Is there a consistent way of storing binary data in YT? I am asking because currently I sometimes pickle my own data, or store it in HDF5 (.h5) binary or in the .yt files, and I didn't realize until recently that the different binaries are not compatible.
I can take a moment to explain the difference between pickling and the binary formats in yt, and why right now the .yt files use pickle occasionally.
The .yt files utilize two different methods of storage. For projections, individual fields from the .data attribute of the projection are stored such that they can be reproduced external to Python. However, generic objects are stored as pickles. Pickles are bytecode descriptions of how to reconstruct a Python object's state; they describe the various attributes, but more importantly, they include information about how to import the necessary modules. This works very well for simple objects; in yt, it is assumed that unprocessed data persists between calls and is easy to recreate. (Projections, which can take a while, are a natural exception to this rule. They are not serialized using pickle, but using a manual reconstruction method defined in the object.)
So the question then comes up: when you store a "Sphere" object, what exactly are you storing if you assume the loaded data is cheap to recreate? You store the field parameters, the radius, the center, and the parameter file off which it hangs. This naturally falls to a pickle. This is described in some more detail in the ApJS paper.
For naturally array-based objects, if you use store_data, it will pickle them and then store the resulting string as an HDF5 array. This is not naturally compatible with being read outside of yt; however, if you assume that there's going to be some handshake between you and the other user, you would have to agree on a format anyway, so you should take it upon yourself to store the data in that format.
To your question about what standards the other groups are supporting, I hope to have a better answer later this week. (Last week I was at a workshop on large data sets, and we came up with a few ideas which we're still iterating on.)
For more info on pickle: http://nadiana.com/python-pickle-insecure and the included links
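The description of pickles above can be made concrete with a short sketch using only the standard library. The dictionary of sphere parameters is hypothetical (the key names and the dataset name are made up) and is not how yt lays out its own pickles; it only shows that the cheap description of an object round-trips, and that a pickled class instance records the module it must be imported from.

```python
import pickle
from fractions import Fraction

# Hypothetical description of a sphere: only the cheap parameters,
# not the field data, which is assumed to be easy to reload.
sphere_params = {
    "center": (0.5, 0.5, 0.5),
    "radius": 0.1,
    "parameter_file": "DD0010/data0010",  # made-up dataset name
}

blob = pickle.dumps(sphere_params, protocol=2)
restored = pickle.loads(blob)

# A pickle of a class instance also records where to import the class from,
# which is why it is hard to read outside of Python.
fraction_blob = pickle.dumps(Fraction(1, 3), protocol=2)
names_module = b"fractions" in fraction_blob

print(len(blob), "bytes for the sphere description")
print("round trip equal:", restored == sphere_params)
print("pickle names its module:", names_module)
```

Because the byte stream refers to Python modules and classes by name, a reader written in another language would have to reimplement both the pickle protocol and the objects themselves, which is the incompatibility discussed in this thread.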
Are we looking for a universal YT way of storing binaries? One big advantage would be that everyone could read the binary if we all store it the same way, and I'm sure there are many other advantages. But the headache is getting everyone to agree on which way to store it.
I don't understand "storing binaries." Is there a gigantic set of objects that we want to store for access outside of yt, or is it just halos and merger trees and (maybe) projections? (I would argue that storing adaptive projections for use outside of yt is not necessarily productive.)
I think an important question to address is: who is our audience for this? Others may disagree, but my development time is somewhat limited, and my priority is not to interoperate with people who would rather plot images in IDL, for instance. I'm not strictly opposed to their being able to do this; I just don't think we should focus on that rather than on providing the best analysis environment possible.
That being said, there is a (very early) skunkworks project going on to create a data sharing service, which will require binary serialization of most yt objects, independent of the parameter file. It is not being designed for wide interop or for massive sets of standards, just very simple serve-n-share, where pickles are not necessarily the best way of passing data.
-Matt
_______________________________________________ Yt-dev mailing list Yt-dev@lists.spacepope.org http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
participants (3)
- Geoffrey So
- Matthew Turk
- Stephen Skory