Re: [yt-dev] Interacting with data in yt 3.0 (was Field units from code to code)
In general, I agree with the idea Nathan put out. (Also, I think this is a fine time to have a bikeshed discussion. Many of the underlying assumptions about how yt works were laid out a long time ago.) But I'm not entirely sure I understand how different it would be. Conceptually, yes, I see what you're getting at: we'd have a set number of attributes. In what I've been thinking of for the geometry refactor so far, I'm trying to get rid of the "hierarchy" as something that exists for every data set, and instead rely on what amounts to an object-finder and IO coordinator, which I'm calling a geometry handler. It sounds like what you would like is:
1) Get rid of accessing parameters with an implicit __getitem__ on the parameter file (i.e., pf["SomethingThatOnlyExistsInOneCode"]). I'm +10 on this.
2) Move units into the .units object. (I'm mostly with Casey on this, but I think it should be part of the field_info object.)
3) Have things like current_time, domain_dimensions and so on move into basic_info and make them dict objects.
Of those, I'm in favor of #1 and #2, but somewhat opposed to #3. Right now we have these attributes mandated for subclasses of StaticOutput:
refine_by, dimensionality, current_time, domain_dimensions, domain_left_edge, domain_right_edge, unique_identifier, current_redshift, cosmological_simulation, omega_matter, omega_lambda, hubble_constant
The only ones here that I think would be okay to move out of properties are the cosmology items, and even those I'm -0 on moving.
But, in general, I support moving away from this two-stage system of parameter file (rather than dataset) and hierarchy (rather than an implicitly handled geometry). The geometry is something that should nearly *always* be handled by the backend rather than by the user. So having the library require pf.h.sphere(...) is less than ideal, since it exposes something relatively unfortunate (that building a hundred thousand grid objects can take some time).
The main ways that the static output is interacted with:
* Parameter information specific to a simulation code
* Properties that yt needs to know about
* To get at the hierarchy
* Input to plot collections
The main ways that the hierarchy is interacted with:
* Getting data objects
* Finding max
* Statistics about the simulation
* Inspecting individual grids (a much less common use case now than it was before)
All of these use cases are still valid, but I think it's clear that accessing individual grids and accessing simulation-specific parameters are not "generic" functions. What a lot of this discussion has really brought up for me is that we're talking about *generic* functionality, not code-specific functionality, and right now we do not have the best enumeration of that functionality and where it lies.
With the geometry_refactor, I'd like to consolidate functionality into the main "dataset" object. The geometry can still provide access to the individual grids (of course), but data objects, finding max, getting stats about the simulation, etc., should all go into the main dataset object, and the geometry handler can simply be created on the fly if necessary.
This brings up two points, though:
1) Does our method of instantiating objects still hold up, i.e., ds.sphere(...) and so on? Or does our dataset object then become overcrowded? I would also like to move *all* plotting objects into whatever we end up deciding is the place data containers come from. For instance, that could look like ds.plot("slice", "x") (although we can bikeshed that later), which would return a plot window.
2) Datasets and time series should behave, if not identically, at least consistently in their APIs. Moving to a completely ds-mediated mechanism for generating, accessing and inspecting data opens up the ability to construct very nice and simple proxy objects. As an example, while something like this is technically possible with the current Time Series API, it's a bit tricky:
    ts = TimeSeriesData.from_filenames(...)
    plot = ts.plot("slice", "x", (100.0, 'au'))
    ts.seek(dt = (100, 'years'))
    plot.save()
    ts.seek(dt = (10, 'years'))
    plot.save()
(The time-slider, as Tom likes to call it ...)
In general, this move toward more thoughtful dataset construction, rather than the hokey parameter file + hierarchy construction, brings with it a mindset shift which I'd like to spread to the time series, which can continue to be a focus.
What do you think?
-Matt
On Thu, Mar 29, 2012 at 7:08 PM, Casey W. Stark <caseywstark@gmail.com> wrote:
+1 on datasets, although I would like to see the unit object(s) at the field level.
On Thu, Mar 29, 2012 at 4:04 PM, Cameron Hummels <chummels@astro.columbia.edu> wrote:
+1 on datasets.
On 3/29/12 6:58 PM, Nathan Goldbaum wrote:
+1. I'd also be up for helping out with the sprint. Doing a virtual sprint using a Google Hangout might help mitigate some of the distance problems.
While we're bringing up Enzo-isms that we should get rid of, I think it might be a good idea to make a conceptual shift in the basic Python UI. Instead of referring to the interface between the user and the data as a parameter file, I think we should be talking about datasets. One would instantiate a dataset just as we do now with parameter files:
ds = load(filename)
A dataset would also have some universal attributes which would present themselves to the user as a dict, e.g. ds.units, ds.parameters, ds.basic_info (like current_time, timestep, filename, and simulation code), and ds.hierarchy (not sure how that would interfere with the geometry refactor).
This may be a painting-the-bike-shed discussion, but I think this shift will help new users understand how to access their data. Thoughts?
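A minimal sketch of the dataset-centric interface Nathan describes, in which generic metadata and code-specific parameters live in separate dicts instead of being reached through an implicit `pf["..."]` lookup. `Dataset`, its attributes, and the `load` stand-in below are hypothetical illustrations (not yt's actual `load`), and the values are placeholders rather than anything read from a file.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Dataset:
    """Hypothetical dataset object: metadata is grouped into plain dicts."""
    filename: str
    code: str
    # Code-specific parameters, accessed explicitly rather than via ds["..."]
    parameters: Dict[str, Any] = field(default_factory=dict)
    # Generic information every frontend would be expected to provide
    basic_info: Dict[str, Any] = field(default_factory=dict)
    # Field name -> unit string (a stand-in for a real unit object)
    units: Dict[str, str] = field(default_factory=dict)


def load(filename: str) -> Dataset:
    """Toy stand-in for a loader; a real one would detect the frontend and parse the file."""
    return Dataset(
        filename=filename,
        code="Enzo",
        parameters={"HydroMethod": 0},
        basic_info={"current_time": 0.0, "timestep": 0,
                    "filename": filename, "simulation_code": "Enzo"},
        units={"density": "g/cm**3"},
    )


ds = load("DD0010/data0010")
print(ds.basic_info["current_time"])  # generic metadata
print(ds.parameters["HydroMethod"])   # code-specific, explicitly namespaced
```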
On Mar 29, 2012, at 3:40 PM, Matthew Turk <matthewturk@gmail.com> wrote:
Hi Nathan and Casey,
I agree with what both of you have said. The Orion/Nyx units should be made to be consistent, but more importantly I think we should continue breaking away from Enzo-isms in the code.
As it stands, all of the universal fields call underlying Enzo-named aliases: Density, ThermalEnergy, etc. I hope we can have a 3.0 out within a calendar year, hopefully by the end of this year. (I've been pushing on the geometry refactor, although other efforts have recently been paying off, which has decreased my output there.) I am much, much less doubtful than Casey that we can do this; in fact, I'm completely in favor of it and I think it would be relatively straightforward to implement.
In the existing system we have a mechanism for aliasing fields. What we can do is provide an additional translation system where we enumerate the fields that are available for items in UniversalFields, and then construct aliases to those. This would mean changing what is aliased in existing non-Enzo frontends, and adding aliases in Enzo. I also agree with the naming style Casey proposes: underscores, lower case, and erring on the side of verbosity. Off the top of my head, the fields we would need to do this for (with their current Enzo-isms) are:
x-velocity => velocity_x (same for y, z)
Density => density
TotalEnergy => ?
GasEnergy => thermal_energy_specific (and thermal_energy_density)
Temperature => temperature
and so on.
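A rough sketch of the translation layer described above, assuming a plain dict from current Enzo-style names to the proposed names. `ENZO_TO_UNIVERSAL`, `build_aliases` and `get_field` are hypothetical names, not existing yt functions; `TotalEnergy` is left out because the thread has not settled on a name for it.

```python
# Hypothetical translation table: current Enzo-style name -> proposed name.
ENZO_TO_UNIVERSAL = {
    "x-velocity": "velocity_x",
    "y-velocity": "velocity_y",
    "z-velocity": "velocity_z",
    "Density": "density",
    "GasEnergy": "thermal_energy_specific",
    "Temperature": "temperature",
    # "TotalEnergy": no agreed name yet
}


def build_aliases(on_disk_fields, translation):
    """Return {universal_name: on_disk_name} for the fields a frontend provides."""
    aliases = {}
    for disk_name in on_disk_fields:
        universal = translation.get(disk_name)
        if universal is not None:
            aliases[universal] = disk_name
    return aliases


def get_field(raw_data, aliases, name):
    """Look up a field by its universal name, falling back to the raw name."""
    return raw_data[aliases.get(name, name)]


raw = {"Density": [1.0, 2.0], "x-velocity": [0.1, 0.2], "Metal_Density": [0.01, 0.02]}
aliases = build_aliases(raw.keys(), ENZO_TO_UNIVERSAL)
print(aliases)  # {'density': 'Density', 'velocity_x': 'x-velocity'}
print(get_field(raw, aliases, "density"))
```

Fields with no entry in the table (here `Metal_Density`) remain reachable under their on-disk names.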
Once we have these aliases in place, an overall cleanup of UniversalFields should take place. One thing to clean up is conditionals: rather than branching inside the field functions, we should place those conditionals in the parameter file types. For instance, if a field is calculated differently depending on the parameter HydroMethod (in Enzo), you simply set a validator on the field requiring that the parameter be set to a particular value, and then only the field definition that satisfies that validator will be called when requested.
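One way the validator idea could look, sketched with hypothetical classes (`ValidateParameterValue`, `FieldRegistry`) rather than yt's real field machinery. Each field name can have several candidate definitions, and the registry calls the first one whose validators pass. The mapping of HydroMethod values to what the energy field stores is an assumption made only for the example.

```python
class ValidateParameterValue:
    """Hypothetical validator: passes only if parameters[name] == value."""

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def __call__(self, parameters):
        return parameters.get(self.name) == self.value


class FieldRegistry:
    """Keeps every candidate definition of a field and picks the valid one."""

    def __init__(self):
        self._candidates = {}  # field name -> list of (validators, function)

    def add_field(self, name, function, validators=()):
        self._candidates.setdefault(name, []).append((list(validators), function))

    def get(self, name, parameters, data):
        # The first definition whose validators all pass is used; no branching
        # on parameters happens inside the field functions themselves.
        for validators, function in self._candidates.get(name, []):
            if all(check(parameters) for check in validators):
                return function(data)
        raise KeyError(f"no definition of {name!r} is valid for these parameters")


registry = FieldRegistry()
# Assumption for illustration: HydroMethod 2 stores specific thermal energy
# directly, while HydroMethod 0 stores total energy (1-D kinetic term only).
registry.add_field("thermal_energy_specific",
                   lambda d: d["TotalEnergy"],
                   validators=[ValidateParameterValue("HydroMethod", 2)])
registry.add_field("thermal_energy_specific",
                   lambda d: d["TotalEnergy"] - 0.5 * d["velocity_x"] ** 2,
                   validators=[ValidateParameterValue("HydroMethod", 0)])

data = {"TotalEnergy": 10.0, "velocity_x": 2.0}
print(registry.get("thermal_energy_specific", {"HydroMethod": 0}, data))  # 8.0
print(registry.get("thermal_energy_specific", {"HydroMethod": 2}, data))  # 10.0
```

With this arrangement, supporting another code path means registering one more definition rather than editing an existing function.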
So we've gotten rid of a bunch of Enzo-isms in the parameter files; after fields, what else can we address? I'd be up for sprinting on this (which should take just a few hours) basically any time next week or after. I'd also be up for talking more about the geometry refactor, if anyone is interested, but I'm not yet satisfied enough with the architecture to request input or contributions. Sometimes (especially with big architectural changes like this) I think it's a shame we do all of our work virtually, as a lot of this would be easier to bang out in person over a couple of hours.
-Matt
On Wed, Mar 28, 2012 at 6:14 PM, Casey W. Stark <caseywstark@gmail.com> wrote:
Hi Nathan.
I'm also worried about this and I agree that fields with the same name should all be consistent. I would support some sort of cleanup of frontend fields, and I can get the Nyx fields in line and help with Enzo.
I doubt we can do this, but I would prefer changing the field names as part of the pushes to remove Enzo-isms and refactor geometry handling. For instance, the field in Orion could be thermal_energy_density and the field in Enzo could be specific_thermal_energy. I also noticed this issue when I was using "Density" in Enzo (proper density in cgs) and "density" in Nyx (comoving density in cgs).
Best,
Casey
On Wed, Mar 28, 2012 at 1:47 PM, Nathan Goldbaum <goldbaum@ucolick.org> wrote:
Hi all,
On IRC today we noticed that Orion defines its ThermalEnergy field per unit volume but Enzo and FLASH define ThermalEnergy per unit mass. Is this a problem? Since yt defaults to the Enzo field names, should we try to make sure that all fields are defined using the same units as in Enzo? Is there a convention for how different codes should define derived fields that are aliased to Enzo fields?
One problem for this particular example is that the Pressure field is defined in terms of ThermalEnergy in universal_fields.py so the units of ThermalEnergy become important if a user merely wants the gas pressure in the simulation.
One possible solution for this issue would be the units overhaul we're planning. If all fields are associated with a unit object, we can simply query the units to ensure that units are taken care of correctly and code-to-code comparisons aren't sensitive to the units chosen for fields in the frontend.
Personally, I think it would be best if we could make sure that all of the fields aliased to Enzo fields have the same units.
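A small sketch of how attaching units to fields would make the Pressure derivation insensitive to whether a frontend stores thermal energy per unit mass (as Enzo and FLASH do) or per unit volume (as Orion does). Units are plain strings here as a stand-in for a real unit object, `GAMMA = 5/3` is an assumed adiabatic index, and the ideal-gas relation used is p = (γ - 1) ρ e, with e the specific thermal energy. The function names are hypothetical.

```python
GAMMA = 5.0 / 3.0  # assumed adiabatic index (ideal monatomic gas)


def specific_thermal_energy(value, units, density):
    """Return thermal energy per unit mass (erg/g), whatever the frontend stored."""
    if units == "erg/g":
        return value
    if units == "erg/cm**3":
        return value / density  # per unit volume -> per unit mass
    raise ValueError(f"unrecognized thermal energy units: {units}")


def pressure(thermal_energy, units, density, gamma=GAMMA):
    """Ideal-gas pressure in erg/cm**3: p = (gamma - 1) * rho * e."""
    e = specific_thermal_energy(thermal_energy, units, density)
    return (gamma - 1.0) * density * e


# Same physical state described two ways: rho = 2 g/cm**3, e = 3 erg/g.
p_enzo_like = pressure(3.0, "erg/g", 2.0)       # stored per unit mass
p_orion_like = pressure(6.0, "erg/cm**3", 2.0)  # stored per unit volume
print(p_enzo_like, p_orion_like)
```

Because the conversion keys off the declared units, both frontends yield the same pressure for the same physical state.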
Nathan Goldbaum
Graduate Student
Astronomy & Astrophysics, UCSC
goldbaum@ucolick.org
http://www.ucolick.org/~goldbaum
_______________________________________________ yt-dev mailing list yt-dev@lists.spacepope.org http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
> 1) Get rid of accessing parameters with an implicit __getitem__ on the parameter file (i.e., pf["SomethingThatOnlyExistsInOneCode"]). I'm +10 on this.
> 2) Move units into the .units object (I'm mostly with Casey on this, but I think it should be a part of the field_info object)
> 3) Have things like current_time, domain_dimensions and so on move into basic_info and make them dict objects.
> I think of those, I'm in favor of one and two, but somewhat opposed to #3. Right now we have these attributes mandated for subclasses of StaticOutput:
I'd say #3 is the least important. I'd be fine with the dataset object having some non-dict attributes that describe the nature of the dataset rather than storing them all in a basic_info dict. One thing to think about: if we want to support pure-particle datasets, then we should drop the notion of refine_by as a basic dataset attribute.
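A sketch of Nathan's suggestion that the dataset keep plain (non-dict) attributes while treating refine_by as optional, so a pure-particle dataset need not invent one. `StaticOutputBase`, `GridOutput` and `ParticleOutput` are hypothetical classes, only a subset of the mandated attributes from the thread is checked, and the attribute values are placeholders.

```python
from abc import ABC, abstractmethod


class StaticOutputBase(ABC):
    """Hypothetical base class that checks mandated attributes after parsing."""

    # A subset of the mandated attributes listed in the thread (cosmology
    # items omitted for brevity). refine_by is deliberately not required.
    _required = ("dimensionality", "current_time", "domain_dimensions",
                 "domain_left_edge", "domain_right_edge", "unique_identifier")

    def __init__(self):
        # Optional: only meaningful when there is a refinement hierarchy.
        self.refine_by = None
        self._parse_parameters()
        missing = [name for name in self._required if not hasattr(self, name)]
        if missing:
            raise TypeError(
                f"{type(self).__name__} did not set: {', '.join(missing)}")

    @abstractmethod
    def _parse_parameters(self):
        """Frontend-specific: read the output and set the attributes above."""


class GridOutput(StaticOutputBase):
    def _parse_parameters(self):
        self.dimensionality = 3
        self.current_time = 0.0
        self.domain_dimensions = (64, 64, 64)
        self.domain_left_edge = (0.0, 0.0, 0.0)
        self.domain_right_edge = (1.0, 1.0, 1.0)
        self.unique_identifier = "grid-example"
        self.refine_by = 2  # factor between AMR levels


class ParticleOutput(StaticOutputBase):
    def _parse_parameters(self):
        self.dimensionality = 3
        self.current_time = 0.0
        self.domain_dimensions = (1, 1, 1)  # no natural mesh; placeholder
        self.domain_left_edge = (0.0, 0.0, 0.0)
        self.domain_right_edge = (1.0, 1.0, 1.0)
        self.unique_identifier = "particles-example"
        # refine_by stays None: there is no refinement hierarchy


grid, particles = GridOutput(), ParticleOutput()
print(grid.refine_by, particles.refine_by)
```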
> With the geometry_refactor, I'd like to consolidate functionality into the main "dataset" object. The geometry can still provide access to the individual grids (of course) but data objects, finding max, getting stats about the simulation, etc, should all go into the main dataset object, and the geometry handler can simply be created on the fly if necessary.
Why not get access to objects through a geometry attribute that hangs off of the dataset object? If I wanted to instantiate a sphere object, I would just do:
    sp = ds.geometry.sphere()
This is pretty much the same as the pf.h.sphere() syntax in place right now, but allows for arbitrary selection embedded inside of the new geometry code.
Nathan Goldbaum
Graduate Student
Astronomy & Astrophysics, UCSC
goldbaum@ucolick.org
http://www.ucolick.org/~goldbaum
On Mar 30, 2012, at 3:48 AM, Matthew Turk wrote:
On Fri, Mar 30, 2012 at 1:22 PM, Nathan Goldbaum <goldbaum@ucolick.org> wrote:
>> 1) Get rid of accessing parameters with an implicit __getitem__ on the parameter file (i.e., pf["SomethingThatOnlyExistsInOneCode"]). I'm +10 on this.
>> 2) Move units into the .units object (I'm mostly with Casey on this, but I think it should be a part of the field_info object)
>> 3) Have things like current_time, domain_dimensions and so on move into basic_info and make them dict objects.
>> I think of those, I'm in favor of one and two, but somewhat opposed to #3. Right now we have these attributes mandated for subclasses of StaticOutput:
> I'd say #3 is the least important. I'd be fine with the dataset object having some non-dict attributes that describe the nature of the dataset rather than storing them all in a basic_info dict. One thing to think about: if we want to support pure-particle datasets, then we should drop the notion of refine_by as a basic dataset attribute.
I think whether refine_by sticks around depends on how we end up wanting to address fluid quantities in particle datasets. One possibility for handling SPH data is to grid it, and while I don't want to lock us into that (myopic at best), I don't want to exclude it as an ultimate possibility. But yes, in general, I agree. As I have been working on the geometry refactor, the number of times refine_by is accessed has been going down, since for the most part its use relied on (for instance) the projection code knowing how to handle data from grids, and that has been pushed back onto the grids instead. Now projections simply receive data that is ordered spatially, and that data is appropriately added.
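A toy illustration of why a projection need not consult refine_by once each cell carries its own position and width. `project_cells` is hypothetical: it integrates along z into a fixed-resolution image of the unit square and assumes the input is a set of non-overlapping leaf cells (covered coarse cells already removed). It is not yt's projection algorithm.

```python
def project_cells(cells, resolution):
    """Sum value * width along z into a resolution x resolution image of [0, 1)^2.

    Each cell is (x, y, z, width, value) with (x, y, z) the cell center. A pixel
    receives a contribution when its center lies inside the cell's x-y footprint,
    so cells of different sizes mix freely without a refinement factor.
    """
    image = [[0.0] * resolution for _ in range(resolution)]
    centers = [(i + 0.5) / resolution for i in range(resolution)]
    for x, y, _z, width, value in cells:
        half = 0.5 * width
        cols = [i for i, c in enumerate(centers) if x - half <= c < x + half]
        rows = [j for j, c in enumerate(centers) if y - half <= c < y + half]
        for i in cols:
            for j in rows:
                image[i][j] += value * width  # column integral along z
    return image


# Eight coarse cells (width 0.5) fill the unit cube with value 1.0 ...
coarse = [(x, y, z, 0.5, 1.0)
          for x in (0.25, 0.75) for y in (0.25, 0.75) for z in (0.25, 0.75)]
# ... then one coarse cell is replaced by its eight children (width 0.25).
cells = [c for c in coarse if c[:3] != (0.25, 0.25, 0.25)]
cells += [(x, y, z, 0.25, 1.0)
          for x in (0.125, 0.375) for y in (0.125, 0.375) for z in (0.125, 0.375)]

image = project_cells(cells, 8)
print(min(min(row) for row in image), max(max(row) for row in image))
```

The column integral through the unit cube is 1.0 in every pixel, whether a pixel's column passes through coarse cells only or through a mix of coarse and refined cells.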
>> With the geometry_refactor, I'd like to consolidate functionality into the main "dataset" object. The geometry can still provide access to the individual grids (of course) but data objects, finding max, getting stats about the simulation, etc, should all go into the main dataset object, and the geometry handler can simply be created on the fly if necessary.
> Why not get access to objects through a geometry attribute that hangs off of the dataset object. If I wanted to instantiate a sphere object, I would just do:
>     sp = ds.geometry.sphere()
> This is pretty much the same as the pf.h.sphere() syntax in place right now but allows for arbitrary selection embedded inside of the new geometry code.
That's how I was implementing it; I just wasn't sure it was as clean. Having the plots then hang off the geometry feels a little funny.
Also, I don't think I explicitly commented on Casey's hangout suggestion; I am in favor. Could we do Tuesday afternoon (late morning CA time), or the same time Wednesday?
-Matt
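A minimal sketch of the arrangement discussed here: the geometry handler hangs off the dataset as `ds.geometry`, is built lazily on first access, and `ds.sphere(...)` simply forwards to it. `GeometryHandler`, `Dataset` and the dict returned by `sphere` are hypothetical placeholders; `functools.cached_property` requires Python 3.8 or later.

```python
from functools import cached_property


class GeometryHandler:
    """Hypothetical object-finder / IO coordinator that is expensive to build."""

    builds = 0  # counts constructions, to show the handler is built only once

    def __init__(self, ds):
        GeometryHandler.builds += 1
        self.ds = ds
        # A real handler would index grids, octs, or particles here.

    def sphere(self, center, radius):
        # Placeholder for a real data container.
        return {"type": "sphere", "center": center, "radius": radius, "ds": self.ds}


class Dataset:
    def __init__(self, filename):
        self.filename = filename  # opening a dataset stays cheap

    @cached_property
    def geometry(self):
        # Built on first access only, then cached on the instance.
        return GeometryHandler(self)

    def sphere(self, center, radius):
        # Convenience spelling: ds.sphere(...) forwards to ds.geometry.sphere(...)
        return self.geometry.sphere(center, radius)


ds = Dataset("DD0010/data0010")
sp1 = ds.sphere((0.5, 0.5, 0.5), 0.1)
sp2 = ds.geometry.sphere((0.5, 0.5, 0.5), 0.2)
print(GeometryHandler.builds)  # 1
```

Both spellings work, and the handler is constructed once no matter which is used first, so the dataset object can expose a short API without the user ever touching the geometry.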
Nathan Goldbaum Graduate Student Astronomy & Astrophysics, UCSC goldbaum@ucolick.org http://www.ucolick.org/~goldbaum
On Mar 30, 2012, at 3:48 AM, Matthew Turk wrote:
In general, I agree with the idea Nathan put out. (Also, I think this is a fine time to have a bikeshed discussion. Many of the underlying assumptions about how yt works were laid out a long time ago.) But, I'm not entirely sure I understand how different it would be -- conceptually, yes, I see what you're getting at, that we'd have a set number of attributes. In what I was thinking of for the geometry refactor so far I'm trying to get rid of the "hierarchy" as existing for every data set, and instead relying on what amounts to an object-finder and io-coordinator, which I'm calling a geometry handler. It sounds like what you would like is:
1) Get rid of accessing parameters with an implicit __getitem__ on the parameter file (i.e., pf["SomethingThatOnlyExistsInOneCode"]). I'm +10 on this. 2) Move units into the .units object (I'm mostly with Casey on this, but I think it should be a part of the field_info object) 3) Have things like current_time, domain_dimensions and so on move into basic_info and make them dict objects.
I think of those, I'm in favor of one and two, but somewhat opposed to #3. Right now we have these attributes mandated for subclasses of StaticOutput:
refine_by dimensionality current_time domain_dimensions domain_left_edge domain_right_edge unique_identifier current_redshift cosmological_simulation omega_matter omega_lambda hubble_constant
The only ones here that I think would be okay to move out of properties would be the cosmology items, and even those I'm -0 on moving.
But, in general, the idea of moving from this two-stage system of parameter file (rather than dataset) and hierarchy (rather than an implicitly-handled geometry) is something I am in support of. The geometry is something that should nearly *always* be handled by the backend, rather than by the user. So having the library require pf.h.sphere(...) is less than ideal, since it's exposing something relatively unfortunate (that building a hundred thousand grid objects can take some time).
The main ways that the static output is interacted with:
* Parameter information specific to a simulation code * Properties that yt needs to know about * To get at the hierarchy * Input to plot collections
The main ways that the hierarchy is interacted with:
* Getting data objects * Finding max * Statistics about the simulation * Inspecting individual grids (much less common use case now that it was before)
All of these use cases are still valid, but I think it's clear that accessing individual grids and accessing simulation-specific parameters are not "generic" functions. What a lot of this discussion has really brought up for me is that we're talking about *generic* functionality, not code-specific functionality, and we right now do not have the best enumeration of functionality and where it lies.
With the geometry_refactor, I'd like to consolidate functionality into the main "dataset" object. The geometry can still provide access to the individual grids (of course) but data objects, finding max, getting stats about the simulation, etc, should all go into the main dataset object, and the geometry handler can simply be created on the fly if necessary.
This brings up two points, though --
1) Does our method of instantiating objects still hold up? i.e., ds.sphere(...) and so on? Or does our dataset object then become overcrowded? I would also like to move *all* plotting objects into whatever we end up deciding is the location data containers come from, which for instance could look like ds.plot("slice", "x") (for instance, although we can bikeshed that later), which would return a plot window. 2) Datasets and time series should behave, if not identically, at least consistently in their APIs. Moving to a completely ds-mediated mechanism for generating, accessing and inspecting data opens up the ability to then construct very nice and simply proxy objects. As an example, while something this is currently technically possible with the current Time Series API, it's a bit tricky:
ts = TimeSeriesData.from_filenames(...) plot = ts.plot("slice", "x", (100.0, 'au')) ts.seek(dt = (100, 'years')) plot.save() ts.seek(dt = (10, 'years')) plot.save()
(The time-slider, as Tom likes to call it ...)
In general, this idea of moving toward more thoughtful dataset construction, rather than the hokey parameter file + hierarchy construction, brings with it a mindset shift that I'd like to spread to the time series, which can continue to be a focus.
What do you think?
-Matt
On Thu, Mar 29, 2012 at 7:08 PM, Casey W. Stark <caseywstark@gmail.com> wrote:
+1 on datasets, although I would like to see the unit object(s) at the field level.
On Thu, Mar 29, 2012 at 4:04 PM, Cameron Hummels <chummels@astro.columbia.edu> wrote:
+1 on datasets.
On 3/29/12 6:58 PM, Nathan Goldbaum wrote:
+1. I'd also be up to help out with the sprint. Doing a virtual sprint using a Google Hangout might help mitigate some of the distance problems.
While we're bringing up Enzo-isms that we should get rid of, I think it might be a good idea to make a conceptual shift in the basic Python UI. Instead of referring to the interface between the user and the data as a parameter file, I think we should be talking about datasets. One would instantiate a dataset just like we do now with parameter files:
ds = load(filename)
A dataset would also have some universal attributes which would present themselves to the user as a dict, e.g. ds.units, ds.parameters, ds.basic_info (like current_time, timestep, filename, and simulation code), and ds.hierarchy (not sure how that would interfere with the geometry refactor).
This may be a painting-the-bike-shed discussion, but I think this shift will help new users understand how to access their data. Thoughts?
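A minimal sketch of the split being discussed, assuming code-specific parameters move into an explicit `parameters` dict while properties every frontend must provide stay as plain attributes (the compromise Matt favors in his reply), with `basic_info` offered as a derived dict view. All names are hypothetical. Because the class defines no `__getitem__`, the implicit `pf["SomethingThatOnlyExistsInOneCode"]` style simply raises `TypeError`.

```python
class Dataset:
    """Hypothetical dataset: generic properties are attributes, code-specific
    parameters live in an explicit dict instead of behind ds[...]."""

    def __init__(self, filename, current_time, domain_dimensions, parameters):
        self.filename = filename
        # Properties every frontend must provide remain plain attributes.
        self.current_time = current_time
        self.domain_dimensions = tuple(domain_dimensions)
        # Values that exist for only one code (e.g. Enzo's HydroMethod)
        # are looked up explicitly, which makes that code-specificity visible.
        self.parameters = dict(parameters)

    @property
    def basic_info(self):
        # A dict view built from the attributes, so nothing is stored twice.
        return {
            "filename": self.filename,
            "current_time": self.current_time,
            "domain_dimensions": self.domain_dimensions,
        }
```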
On Mar 29, 2012, at 3:40 PM, Matthew Turk <matthewturk@gmail.com> wrote:
Hi Nathan and Casey,
I agree with what both of you have said. The Orion/Nyx units should be made to be consistent, but more importantly I think we should continue breaking away from Enzo-isms in the code.
As it stands, all of the universal fields call underlying Enzo-named aliases -- Density, ThermalEnergy, etc. I hope we can have a 3.0 out within a calendar year, hopefully by the end of this year. (I've been pushing on the geometry refactor, although recently other efforts have been paying off, which has decreased my output there.) I am much, much less doubtful than Casey is that we can do this; in fact, I'm completely in favor of this and I think it would be relatively straightforward to implement.
In the existing system we have a mechanism for aliasing fields. What we can do is provide an additional translation system where we enumerate the fields that are available for items in UniversalFields, and then construct aliases to those. This would mean changing what is aliased in existing non-Enzo frontends, and adding aliases in Enzo. The style of name Casey proposes is what I would also agree with: underscores, lower case, and erring on the side of verbosity. Offhand, the fields we would need to do this for (in their current Enzo-isms) are:
x-velocity => velocity_x (same for y, z)
Density => density
TotalEnergy => ?
GasEnergy => thermal_energy_specific (and thermal_energy_density)
Temperature => temperature
and so on.
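A sketch of the translation layer described above, assuming universal fields are defined under the new names and each frontend supplies a table mapping its on-disk names to them. The table reproduces only the mappings listed in the message (TotalEnergy is left out since its new name is undecided); `ENZO_ALIASES`, `FieldRegistry`, and its methods are hypothetical and do not reflect yt's actual aliasing code.

```python
# Hypothetical Enzo translation table, taken from the list above.
ENZO_ALIASES = {
    "x-velocity": "velocity_x",
    "y-velocity": "velocity_y",
    "z-velocity": "velocity_z",
    "Density": "density",
    "GasEnergy": "thermal_energy_specific",
    "Temperature": "temperature",
}


class FieldRegistry:
    """Hypothetical registry resolving universal names to on-disk fields."""

    def __init__(self, aliases):
        # Reverse lookup: universal name -> the name this code writes to disk.
        self.native_for = {universal: native for native, universal in aliases.items()}
        self.on_disk = {}

    def add_on_disk(self, native_name, values):
        self.on_disk[native_name] = values

    def __getitem__(self, name):
        # Accept either the universal name or the code's native name;
        # unknown or missing fields raise KeyError.
        native = self.native_for.get(name, name)
        return self.on_disk[native]
```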
Once we have these aliases in place, an overall cleanup of UniversalFields should take place. One place we should clean up is ensuring that there are no conditionals; rather than conditionals inside the functions, we should place those conditionals inside the parameter file types. So for instance, if you have a field that is calculated differently depending on the parameter HydroMethod (in Enzo for instance) you simply set a validator on the field requiring the parameter be set to a particular value, and then only the field which satisfies that validator will be called when requested.
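A sketch of replacing in-function conditionals with validators, assuming several definitions can be registered under one field name and the first whose validators all pass is used. `RequireParameter`, `FieldInfo`, and the two HydroMethod branches are hypothetical and purely illustrative; they are not yt's validator classes or Enzo's actual rules.

```python
class RequireParameter:
    """Hypothetical validator: passes only if a dataset parameter has a given value."""

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def __call__(self, parameters):
        return parameters.get(self.name) == self.value


class FieldInfo:
    """Hypothetical container holding every candidate definition of each field."""

    def __init__(self):
        self._candidates = {}

    def add_field(self, name, function, validators=()):
        # Several implementations may share a name; each carries its own validators.
        self._candidates.setdefault(name, []).append((function, tuple(validators)))

    def compute(self, name, parameters, data):
        # Use the first definition whose validators all accept these parameters.
        for function, validators in self._candidates[name]:
            if all(validator(parameters) for validator in validators):
                return function(data)
        raise ValueError(f"no definition of {name!r} matches these parameters")


# Illustrative field functions: neither contains a conditional.
def _thermal_from_total(data):
    return data["TotalEnergy"]


def _thermal_minus_kinetic(data):
    kinetic = 0.5 * (data["velocity_x"] ** 2 + data["velocity_y"] ** 2 + data["velocity_z"] ** 2)
    return data["TotalEnergy"] - kinetic


field_info = FieldInfo()
field_info.add_field("thermal_energy_specific", _thermal_from_total,
                     validators=[RequireParameter("HydroMethod", 2)])
field_info.add_field("thermal_energy_specific", _thermal_minus_kinetic,
                     validators=[RequireParameter("HydroMethod", 0)])
```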
So we've gotten rid of a bunch of enzo-isms in the parameter files; after fields, what else can we address? And, I'd be up for sprinting on this (which should take just a few hours) basically any time next week or after. I'd also be up for talking more about geometry refactoring, if anyone is interested, but it's not quite to the point that I think I am satisfied enough with the architecture to request input / contributions. Sometimes (especially with big architectural things like this) I think it's a shame we do all of our work virtually, as I think a lot of this would be easier to bang out in person for a couple hours.
-Matt
On Wed, Mar 28, 2012 at 6:14 PM, Casey W. Stark <caseywstark@gmail.com> wrote:
Hi Nathan.
I'm also worried about this and I agree that fields with the same name should all be consistent. I would support some sort of cleanup of frontend fields, and I can get the Nyx fields in line and help with Enzo.
I doubt we can do this, but I would prefer changing the field names as part of the pushes to remove Enzo-isms and refactor geometry handling. For instance, the field in Orion could be thermal_energy_density and the field in Enzo could be specific_thermal_energy. I also noticed this issue when I was using "Density" in Enzo (proper density in cgs) and "density" in Nyx (comoving density in cgs).
Best, Casey
On Wed, Mar 28, 2012 at 1:47 PM, Nathan Goldbaum <goldbaum@ucolick.org> wrote:
Hi all,
On IRC today we noticed that Orion defines its ThermalEnergy field per unit volume but Enzo and FLASH define ThermalEnergy per unit mass. Is this a problem? Since yt defaults to the Enzo field names, should we try to make sure that all fields are defined using the same units as in Enzo? Is there a convention for how different codes should define derived fields that are aliased to Enzo fields?
One problem for this particular example is that the Pressure field is defined in terms of ThermalEnergy in universal_fields.py, so the units of ThermalEnergy become important if a user merely wants the gas pressure in the simulation.
One possible solution for this issue would be the units overhaul we're planning. If all fields are associated with a unit object, we can simply query the units to ensure that units are taken care of correctly and code-to-code comparisons aren't sensitive to the units chosen for fields in the frontend.
Personally, I think it would be best if we could make sure that all of the fields aliased to Enzo fields have the same units.
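The Pressure concern comes down to the ideal-gas relation: with e the thermal energy per unit mass, P = (gamma - 1) rho e, while with E the thermal energy per unit volume, P = (gamma - 1) E. A minimal sketch of how unit-tagged fields would make the pressure field convention-independent, assuming gamma = 5/3 and using a plain string tag as a stand-in for the planned unit objects (`Quantity` and the helper functions are hypothetical):

```python
from dataclasses import dataclass

GAMMA = 5.0 / 3.0  # assumed adiabatic index (monatomic ideal gas)


@dataclass(frozen=True)
class Quantity:
    """Hypothetical stand-in for a unit-aware field value."""
    value: float
    units: str  # "erg/g" (per unit mass) or "erg/cm**3" (per unit volume)


def thermal_energy_density(thermal_energy, density):
    """Thermal energy per unit volume in erg/cm**3, whichever convention
    the frontend used; density is assumed to be in g/cm**3."""
    if thermal_energy.units == "erg/cm**3":  # Orion-style, per unit volume
        return thermal_energy.value
    if thermal_energy.units == "erg/g":  # Enzo/FLASH-style, per unit mass
        return thermal_energy.value * density.value
    raise ValueError(f"unexpected units {thermal_energy.units!r}")


def pressure(thermal_energy, density):
    # Ideal gas: P = (gamma - 1) * rho * e = (gamma - 1) * E
    return (GAMMA - 1.0) * thermal_energy_density(thermal_energy, density)
```

Both conventions then yield the same pressure, so the universal field no longer depends on which frontend produced ThermalEnergy.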
Nathan Goldbaum Graduate Student Astronomy & Astrophysics, UCSC goldbaum@ucolick.org http://www.ucolick.org/~goldbaum
_______________________________________________ yt-dev mailing list yt-dev@lists.spacepope.org http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
I think I forgot to reply -- Tuesday works for me and Wednesday is good before 11 or after 12:30 Pacific.
We can sort this out during the hangout, but which issue are we focusing on? Is this more for the units system, renaming fields in the 3.0 branch, or the dataset change? (Or maybe something else that was mentioned; there were a lot.)
Best, Casey
On Fri, Mar 30, 2012 at 12:59 PM, Matthew Turk <matthewturk@gmail.com> wrote:
On Fri, Mar 30, 2012 at 1:22 PM, Nathan Goldbaum <goldbaum@ucolick.org> wrote:
1) Get rid of accessing parameters with an implicit __getitem__ on the parameter file (i.e., pf["SomethingThatOnlyExistsInOneCode"]). I'm +10 on this.
2) Move units into the .units object (I'm mostly with Casey on this, but I think it should be a part of the field_info object)
3) Have things like current_time, domain_dimensions and so on move into basic_info and make them dict objects.
I think of those, I'm in favor of one and two, but somewhat opposed to #3. Right now we have these attributes mandated for subclasses of StaticOutput:
I'd say #3 is the least important. I'd be fine with the dataset object having some non-dict attributes that describe the nature of the dataset rather than storing them all in a basic_info dict. One thing to think about: if we want to support pure-particle datasets, then we should drop the notion of refine_by as a basic dataset attribute.
I think whether refine_by sticks around depends on how we end up wanting to address fluid quantities in particle datasets. One possibility for handling SPH data is to grid it, and while I don't want to lock us into that (myopic at best) I don't want to exclude it as an ultimate possibility. But yes, in general, I agree. As I have been working on the geometry refactor, the number of times refine_by is accessed has been going down, as for the most part it relies on (for instance) the projection code knowing how to handle data from grids, which has been pushed back onto the grids instead. Now projections simply receive data that is ordered spatially, and that data is appropriately added.
With the geometry_refactor, I'd like to consolidate functionality into the main "dataset" object. The geometry can still provide access to the individual grids (of course) but data objects, finding max, getting stats about the simulation, etc, should all go into the main dataset object, and the geometry handler can simply be created on the fly if necessary.
Why not get access to objects through a geometry attribute that hangs off of the dataset object? If I wanted to instantiate a sphere object, I would just do:
sp = ds.geometry.sphere()
This is pretty much the same as the pf.h.sphere() syntax in place right now but allows for arbitrary selection embedded inside of the new geometry code.
That's how I was implementing it. I just wasn't sure this was as clean. Having the plots then hang off the geometry feels a little funny.
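A sketch of what "arbitrary selection embedded inside of the new geometry code" could mean, assuming the geometry handler applies selector objects to point positions and the user only calls `ds.geometry.sphere(...)`. The classes are hypothetical and operate on a flat NumPy array of points; a real handler would also locate grids or particles and coordinate I/O.

```python
import numpy as np


class SphereSelector:
    """Hypothetical selector: decides which points lie inside a sphere."""

    def __init__(self, center, radius):
        self.center = np.asarray(center, dtype=float)
        self.radius = float(radius)

    def select(self, positions):
        # positions has shape (N, 3); returns a boolean mask of length N.
        offsets = positions - self.center
        return np.sum(offsets ** 2, axis=1) <= self.radius ** 2


class Geometry:
    """Hypothetical geometry handler over a flat array of point positions."""

    def __init__(self, positions):
        self.positions = np.asarray(positions, dtype=float)

    def sphere(self, center, radius):
        # Any selector with a select() method could be applied the same way.
        mask = SphereSelector(center, radius).select(self.positions)
        return self.positions[mask]


class Dataset:
    """Hypothetical dataset exposing its geometry as an attribute."""

    def __init__(self, positions):
        self.geometry = Geometry(positions)
```

Here `ds.geometry.sphere(center, radius)` plays the role `pf.h.sphere(...)` plays today; whether plots should also hang off `ds.geometry` is the open question raised in the reply.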
Also, I don't think I explicitly commented on Casey's hangout suggestion -- I am in favor. Could we do Tuesday afternoon (late morning CA time) or Wednesday same?
-Matt
Nathan Goldbaum Graduate Student Astronomy & Astrophysics, UCSC goldbaum@ucolick.org http://www.ucolick.org/~goldbaum
On Mar 30, 2012, at 3:48 AM, Matthew Turk wrote:
Hi Casey,
On Mon, Apr 2, 2012 at 1:01 PM, Casey W. Stark <caseywstark@gmail.com> wrote:
I think I forgot to reply -- Tuesday works for me and Wednesday is good before 11 or after 12:30 Pacific.
We can sort this out during the hangout, but which issue are we focusing on? Is this more for the units system, renaming fields in the 3.0 branch, or the dataset change? (or maybe something else that was mentioned, there were a lot)
How about 1:00 PM Pacific on Wednesday? And I was thinking we'd work in yt-refactor and change up the fields.
-Matt
Best, Casey
On Fri, Mar 30, 2012 at 12:59 PM, Matthew Turk <matthewturk@gmail.com> wrote:
On Fri, Mar 30, 2012 at 1:22 PM, Nathan Goldbaum <goldbaum@ucolick.org> wrote:
1) Get rid of accessing parameters with an implicit __getitem__ on the parameter file (i.e., pf["SomethingThatOnlyExistsInOneCode"]). I'm +10 on this. 2) Move units into the .units object (I'm mostly with Casey on this, but I think it should be a part of the field_info object) 3) Have things like current_time, domain_dimensions and so on move into basic_info and make them dict objects.
I think of those, I'm in favor of one and two, but somewhat opposed to #3. Right now we have these attributes mandated for subclasses of StaticOutput:
I'd say #3 is the least important. I'd be fine with the dataset object having some non-dict attributes that describe the nature of the dataset rather than storing them all in a basic_info dict. One thing to think about: if we want to support pure-particle datasets, then we should drop the notion of refine_by as a basic dataset attribute.
I think whether refine_by sticks around depends on how we end up wanting to address fluid quantities in particle datasets. One possibility for handling SPH data is to grid it, and while I don't want to lock us into that (myopic at best) I don't want to exclude it as an ultimate possibility. But yes, in general, I agree. As I have been working on the geometry refactor, the number of times refine_by is access has been going down, as for the most part it relies on (for instance) the projection code knowing how to handle data from grids, which has been pshed back onto the grids instead. Now projections simply receive data that is ordered spatially, and that data is appropriately added.
With the geometry_refactor, I'd like to consolidate functionality into the main "dataset" object. The geometry can still provide access to the individual grids (of course) but data objects, finding max, getting stats about the simulation, etc, should all go into the main dataset object, and the geometry handler can simply be created on the fly if necessary.
Why not get access to objects through a geometry attribute that hangs off of the dataset object. If I wanted to instantiate a sphere object, I would just do:
sp = ds.geometry.sphere()
This is pretty much the same as the pf.h.sphere() syntax in place right now but allows for arbitrary selection embedded inside of the new geometry code.
That's how I was implementing it. I just wasn't sure this was as clean. Having the plots then hang off the geometry feels a little funny.
Also, I don't think I explicitly commented on Casey's hangout suggestion -- I am in favor. Could we do Tuesday afternoon (late morning CA time) or Wednesday same?
-Matt
Nathan Goldbaum Graduate Student Astronomy & Astrophysics, UCSC goldbaum@ucolick.org http://www.ucolick.org/~goldbaum
On Mar 30, 2012, at 3:48 AM, Matthew Turk wrote:
In general, I agree with the idea Nathan put out. (Also, I think this is a fine time to have a bikeshed discussion. Many of the underlying assumptions about how yt works were laid out a long time ago.) But, I'm not entirely sure I understand how different it would be -- conceptually, yes, I see what you're getting at, that we'd have a set number of attributes. In what I was thinking of for the geometry refactor so far I'm trying to get rid of the "hierarchy" as existing for every data set, and instead relying on what amounts to an object-finder and io-coordinator, which I'm calling a geometry handler. It sounds like what you would like is:
1) Get rid of accessing parameters with an implicit __getitem__ on the parameter file (i.e., pf["SomethingThatOnlyExistsInOneCode"]). I'm +10 on this. 2) Move units into the .units object (I'm mostly with Casey on this, but I think it should be a part of the field_info object) 3) Have things like current_time, domain_dimensions and so on move into basic_info and make them dict objects.
I think of those, I'm in favor of one and two, but somewhat opposed to #3. Right now we have these attributes mandated for subclasses of StaticOutput:
refine_by dimensionality current_time domain_dimensions domain_left_edge domain_right_edge unique_identifier current_redshift cosmological_simulation omega_matter omega_lambda hubble_constant
The only ones here that I think would be okay to move out of properties would be the cosmology items, and even those I'm -0 on moving.
But, in general, the idea of moving from this two-stage system of parameter file (rather than dataset) and hierarchy (rather than an implicitly-handled geometry) is something I am in support of. The geometry is something that should nearly *always* be handled by the backend, rather than by the user. So having the library require pf.h.sphere(...) is less than ideal, since it's exposing something relatively unfortunate (that building a hundred thousand grid objects can take some time).
The main ways that the static output is interacted with:
* Parameter information specific to a simulation code * Properties that yt needs to know about * To get at the hierarchy * Input to plot collections
The main ways that the hierarchy is interacted with:
* Getting data objects * Finding max * Statistics about the simulation * Inspecting individual grids (much less common use case now that it was before)
All of these use cases are still valid, but I think it's clear that accessing individual grids and accessing simulation-specific parameters are not "generic" functions. What a lot of this discussion has really brought up for me is that we're talking about *generic* functionality, not code-specific functionality, and we right now do not have the best enumeration of functionality and where it lies.
With the geometry_refactor, I'd like to consolidate functionality into the main "dataset" object. The geometry can still provide access to the individual grids (of course) but data objects, finding max, getting stats about the simulation, etc, should all go into the main dataset object, and the geometry handler can simply be created on the fly if necessary.
This brings up two points, though --
1) Does our method of instantiating objects still hold up? i.e., ds.sphere(...) and so on? Or does our dataset object then become overcrowded? I would also like to move *all* plotting objects into whatever we end up deciding is the location data containers come from, which for instance could look like ds.plot("slice", "x") (for instance, although we can bikeshed that later), which would return a plot window. 2) Datasets and time series should behave, if not identically, at least consistently in their APIs. Moving to a completely ds-mediated mechanism for generating, accessing and inspecting data opens up the ability to then construct very nice and simply proxy objects. As an example, while something this is currently technically possible with the current Time Series API, it's a bit tricky:
ts = TimeSeriesData.from_filenames(...) plot = ts.plot("slice", "x", (100.0, 'au')) ts.seek(dt = (100, 'years')) plot.save() ts.seek(dt = (10, 'years')) plot.save()
(The time-slider, as Tom likes to call it ...)
In general, this idea of moving toward more thoughtful dataset-construction, rather than the hokey parameter file + hierarchy construction brings with it a mindset shift which I'd like to spread to the time series, which can continue to be a focus.
What do you think?
-Matt
On Thu, Mar 29, 2012 at 7:08 PM, Casey W. Stark <caseywstark@gmail.com> wrote:
+1 on datasets, although I would like to see the unit object(s) at the field level.
On Thu, Mar 29, 2012 at 4:04 PM, Cameron Hummels <chummels@astro.columbia.edu> wrote:
+1 on datasets.
On 3/29/12 6:58 PM, Nathan Goldbaum wrote:
> +1. I'd also be up to help out with the sprint. Doing a virtual sprint using a google hangout might help mitigate some of the distance problems.
>
> While we're bringing up Enzo-isms that we should get rid of, I think it might be a good idea to make a conceptual shift in the basic python UI. Instead of referring to the interface between the user and the data as a parameter file, I think we should be talking about datasets. One would instantiate a dataset just like we do now with parameter files:
>
> ds = load(filename)
>
> A dataset would also have some universal attributes which would present themselves to the user as a dict, e.g. ds.units, ds.parameters, ds.basic_info (like current_time, timestep, filename, and simulation code), and ds.hierarchy (not sure how that would interfere with the geometry refactor).
>
> This may be a painting-the-bike-shed discussion, but I think this shift will help new users understand how to access their data. Thoughts?
>
> On Mar 29, 2012, at 3:40 PM, Matthew Turk <matthewturk@gmail.com> wrote:
>
>> Hi Nathan and Casey,
>>
>> I agree with what both of you have said. The Orion/Nyx units should be made to be consistent, but more importantly I think we should continue breaking away from Enzo-isms in the code.
>>
>> As it stands, all of the universal fields call underlying Enzo-named aliases -- Density, ThermalEnergy, etc etc. I hope we can have a 3.0 out within a calendar year, hopefully by the end of this year. (I've been pushing on the geometry refactor, although recently other efforts have been paying off, which has decreased my output there.) I am much, much less doubtful than Casey is that we can do this; in fact, I'm completely in favor of this and I think it would be relatively straightforward to implement.
>>
>> In the existing system we have a mechanism for aliasing fields. What we can do is provide an additional translation system where we enumerate the fields that are available for items in UniversalFields, and then construct aliases to those. This would mean changing what is aliased in existing non-Enzo frontends, and adding aliases in Enzo. The style of name Casey proposes is what I would also agree with: underscores, lower case, and erring on the side of verbosity. The fields off hand that we would need to do this for (in their current enzo-isms):
>>
>> x-velocity => velocity_x (same for y, z)
>> Density => density
>> TotalEnergy => ?
>> GasEnergy => thermal_energy_specific (and thermal_energy_density)
>> Temperature => temperature
>>
>> and so on.
>>
>> Once we have these aliases in place, an overall cleanup of UniversalFields should take place. One place we should clean up is ensuring that there are no conditionals; rather than conditionals inside the functions, we should place those conditionals inside the parameter file types. So for instance, if you have a field that is calculated differently depending on the parameter HydroMethod (in Enzo for instance) you simply set a validator on the field requiring the parameter be set to a particular value, and then only the field which satisfies that validator will be called when requested.
>>
>> So we've gotten rid of a bunch of enzo-isms in the parameter files; after fields, what else can we address? And, I'd be up for sprinting on this (which should take just a few hours) basically any time next week or after. I'd also be up for talking more about geometry refactoring, if anyone is interested, but it's not quite to the point that I think I am satisfied enough with the architecture to request input / contributions. Sometimes (especially with big architectural things like this) I think it's a shame we do all of our work virtually, as I think a lot of this would be easier to bang out in person for a couple hours.
>>
>> -Matt
>>
>> On Wed, Mar 28, 2012 at 6:14 PM, Casey W. Stark <caseywstark@gmail.com> wrote:
>>>
>>> Hi Nathan.
>>>
>>> I'm also worried about this and I agree that fields with the same name should all be consistent. I would support some sort of cleanup of frontend fields, and I can get the Nyx fields in line and help with Enzo.
>>>
>>> I doubt we can do this, but I would prefer changing the field names as part of the enzo-ism removal and geometry handling refactoring pushes. For instance, the field in Orion could be thermal_energy_density and the field in Enzo could be specific_thermal_energy. I also noticed this issue when I was using "Density" in Enzo (proper density in cgs) and "density" in Nyx (comoving density in cgs).
>>>
>>> Best,
>>> Casey
>>>
>>> On Wed, Mar 28, 2012 at 1:47 PM, Nathan Goldbaum <goldbaum@ucolick.org> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> On IRC today we noticed that Orion defines its ThermalEnergy field per unit volume but Enzo and FLASH define ThermalEnergy per unit mass. Is this a problem? Since yt defaults to the Enzo field names, should we try to make sure that all fields are defined using the same units as in Enzo? Is there a convention for how different codes should define derived fields that are aliased to Enzo fields?
>>>>
>>>> One problem for this particular example is that the Pressure field is defined in terms of ThermalEnergy in universal_fields.py, so the units of ThermalEnergy become important if a user merely wants the gas pressure in the simulation.
>>>>
>>>> One possible solution for this issue would be the units overhaul we're planning. If all fields are associated with a unit object, we can simply query the units to ensure that units are taken care of correctly and code-to-code comparisons aren't sensitive to the units chosen for fields in the frontend.
>>>>
>>>> Personally, I think it would be best if we could make sure that all of the fields aliased to Enzo fields have the same units.
>>>>
>>>> Nathan Goldbaum
>>>> Graduate Student
>>>> Astronomy & Astrophysics, UCSC
>>>>
>>>> goldbaum@ucolick.org
>>>> http://www.ucolick.org/~goldbaum
_______________________________________________
yt-dev mailing list
yt-dev@lists.spacepope.org
http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
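The two field proposals quoted above, translating Enzo-style names into the new lower-case names and replacing in-function conditionals with validators on parameters such as HydroMethod, can be sketched in plain Python. This is a minimal illustration under stated assumptions: `FieldRegistry`, `DerivedField`, `ParameterValidator` and `ENZO_ALIASES` are hypothetical names, not yt's API, and the pairing of HydroMethod values with formulas is made up for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical translation table from Enzo-style names to the proposed names.
ENZO_ALIASES: Dict[str, str] = {
    "x-velocity": "velocity_x",
    "Density": "density",
    "GasEnergy": "thermal_energy_specific",
    "Temperature": "temperature",
}


@dataclass
class ParameterValidator:
    """Passes only when a dataset parameter has the required value."""
    name: str
    value: object

    def __call__(self, params: Dict[str, object]) -> bool:
        return params.get(self.name) == self.value


@dataclass
class DerivedField:
    name: str
    function: Callable[[Dict[str, float]], float]
    validators: List[ParameterValidator] = field(default_factory=list)

    def is_valid(self, params: Dict[str, object]) -> bool:
        return all(check(params) for check in self.validators)


class FieldRegistry:
    """Holds several definitions per field name; validators pick exactly one."""

    def __init__(self) -> None:
        self._fields: Dict[str, List[DerivedField]] = {}

    def add(self, f: DerivedField) -> None:
        self._fields.setdefault(f.name, []).append(f)

    def resolve(self, name: str, params: Dict[str, object]) -> DerivedField:
        name = ENZO_ALIASES.get(name, name)  # old Enzo-style names still work
        valid = [f for f in self._fields.get(name, []) if f.is_valid(params)]
        if len(valid) != 1:
            raise KeyError(f"{name!r}: expected 1 valid definition, found {len(valid)}")
        return valid[0]


registry = FieldRegistry()
# No `if HydroMethod == ...` inside the functions: each definition
# declares the parameter value it applies to (values are illustrative).
registry.add(DerivedField(
    "thermal_energy_specific",
    lambda d: d["TotalEnergy"]
    - 0.5 * (d["velocity_x"] ** 2 + d["velocity_y"] ** 2 + d["velocity_z"] ** 2),
    [ParameterValidator("HydroMethod", 0)],
))
registry.add(DerivedField(
    "thermal_energy_specific",
    lambda d: d["GasEnergy"],
    [ParameterValidator("HydroMethod", 2)],
))

data = {"TotalEnergy": 5.0, "velocity_x": 2.0, "velocity_y": 0.0,
        "velocity_z": 0.0, "GasEnergy": 3.5}
print(registry.resolve("GasEnergy", {"HydroMethod": 0}).function(data))  # 3.0
print(registry.resolve("thermal_energy_specific", {"HydroMethod": 2}).function(data))  # 3.5
```

With this shape, a frontend contributes alias entries and validated definitions instead of branching inside a shared field function.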
Sounds good.
On Mon, Apr 2, 2012 at 10:47 AM, Matthew Turk <matthewturk@gmail.com> wrote:
Hi Casey,
On Mon, Apr 2, 2012 at 1:01 PM, Casey W. Stark <caseywstark@gmail.com> wrote:
I think I forgot to reply -- Tuesday works for me and Wednesday is good before 11 or after 12:30 Pacific.
We can sort this out during the hangout, but which issue are we focusing on? Is this more for the units system, renaming fields in the 3.0 branch, or the dataset change? (or maybe something else that was mentioned, there were a lot)
How about 1:00PM pacific on Wednesday? And I was thinking we'd work in yt-refactor and change up the fields.
-Matt
Best, Casey
On Fri, Mar 30, 2012 at 12:59 PM, Matthew Turk <matthewturk@gmail.com> wrote:
On Fri, Mar 30, 2012 at 1:22 PM, Nathan Goldbaum <goldbaum@ucolick.org> wrote:
1) Get rid of accessing parameters with an implicit __getitem__ on the parameter file (i.e., pf["SomethingThatOnlyExistsInOneCode"]). I'm +10 on this.
2) Move units into the .units object (I'm mostly with Casey on this, but I think it should be a part of the field_info object)
3) Have things like current_time, domain_dimensions and so on move into basic_info and make them dict objects.
Of those, I think I'm in favor of one and two, but somewhat opposed to #3. Right now we have these attributes mandated for subclasses of StaticOutput:
I'd say #3 is the least important. I'd be fine with the dataset object having some non-dict attributes that describe the nature of the dataset rather than storing them all in a basic_info dict. One thing to think about: if we want to support pure-particle datasets, then we should drop the notion of refine_by as a basic dataset attribute.
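A minimal sketch of the compromise Nathan describes, assuming a plain Python dataclass: generic properties are ordinary attributes, anything code-specific lives in an explicit `parameters` dict, and there is deliberately no implicit `ds[...]` lookup. The class and its fields are hypothetical, loosely modeled on the StaticOutput attributes Matt lists in his original message; `refine_by` is left out, following the pure-particle concern.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class Dataset:
    """Hypothetical dataset: generic properties as attributes, code-specific ones in a dict."""
    current_time: float
    dimensionality: int
    domain_dimensions: Tuple[int, ...]
    domain_left_edge: Tuple[float, ...]
    domain_right_edge: Tuple[float, ...]
    # Anything only one simulation code knows about goes here, never on ds itself.
    parameters: Dict[str, Any] = field(default_factory=dict)
    # No __getitem__ is defined, so ds["HydroMethod"] raises TypeError
    # instead of silently reaching into code-specific parameters.


ds = Dataset(
    current_time=0.5,
    dimensionality=3,
    domain_dimensions=(64, 64, 64),
    domain_left_edge=(0.0, 0.0, 0.0),
    domain_right_edge=(1.0, 1.0, 1.0),
    parameters={"HydroMethod": 0},
)
print(ds.current_time)               # 0.5, same attribute for every frontend
print(ds.parameters["HydroMethod"])  # 0, explicitly code-specific
```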
I think whether refine_by sticks around depends on how we end up wanting to address fluid quantities in particle datasets. One possibility for handling SPH data is to grid it, and while I don't want to lock us into that (myopic at best), I don't want to exclude it as an ultimate possibility. But yes, in general, I agree. As I have been working on the geometry refactor, the number of times refine_by is accessed has been going down: for the most part those accesses came from (for instance) the projection code needing to know how to handle data from grids, and that logic has been pushed back onto the grids themselves. Now projections simply receive data that is ordered spatially, and that data is added appropriately.
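To illustrate why a projection need not know refine_by once each data source does its own spatial bookkeeping, here is a minimal sketch that is not yt's projection code. It assumes each chunk has already been mapped onto image pixels and has already masked out cells covered by finer data, so the projection only sums value * path length per pixel; the record layout is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (pixel_x, pixel_y, field_value, path_length) for one cell, as a hypothetical
# data source would emit it after handling its own refinement and masking.
Record = Tuple[int, int, float, float]


def project(chunks: Iterable[Iterable[Record]]) -> Dict[Tuple[int, int], float]:
    """Line-of-sight integral per pixel; knows nothing about grids or refine_by."""
    image: Dict[Tuple[int, int], float] = defaultdict(float)
    for chunk in chunks:
        for ix, iy, value, dl in chunk:
            image[(ix, iy)] += value * dl
    return dict(image)


coarse = [(0, 0, 1.0, 0.5), (1, 0, 2.0, 0.5)]  # e.g. a root-level grid
fine = [(0, 0, 4.0, 0.25), (0, 0, 4.0, 0.25)]  # e.g. a refined patch
print(project([coarse, fine]))  # {(0, 0): 2.5, (1, 0): 1.0}
```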
With the geometry_refactor, I'd like to consolidate functionality into the main "dataset" object. The geometry can still provide access to the individual grids (of course) but data objects, finding max, getting stats about the simulation, etc, should all go into the main dataset object, and the geometry handler can simply be created on the fly if necessary.
Why not get access to objects through a geometry attribute that hangs off of the dataset object? If I wanted to instantiate a sphere object, I would just do:
sp = ds.geometry.sphere()
This is pretty much the same as the pf.h.sphere() syntax in place right now but allows for arbitrary selection embedded inside of the new geometry code.
That's how I was implementing it. I just wasn't sure this was as clean. Having the plots then hang off the geometry feels a little funny.
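One way to keep plots on the dataset rather than on the geometry, without crowding the dataset's namespace, is a single `plot` method that dispatches on a string, as in the `ds.plot("slice", "x")` spelling floated in this thread. A minimal sketch; `Dataset`, `register_plot`, `PlotWindow`, `SliceWindow` and `ProjectionWindow` are hypothetical names, not yt classes.

```python
from typing import Callable, Dict, Type

# Registry of plot kinds; new kinds register themselves instead of
# adding a new method to the dataset.
PLOT_TYPES: Dict[str, Type["PlotWindow"]] = {}


def register_plot(kind: str) -> Callable[[Type["PlotWindow"]], Type["PlotWindow"]]:
    def decorator(cls: Type["PlotWindow"]) -> Type["PlotWindow"]:
        PLOT_TYPES[kind] = cls
        return cls
    return decorator


class PlotWindow:
    def __init__(self, ds: "Dataset", axis: str) -> None:
        self.ds = ds
        self.axis = axis

    def __repr__(self) -> str:
        return f"{type(self).__name__}({self.ds.name!r}, axis={self.axis!r})"


@register_plot("slice")
class SliceWindow(PlotWindow):
    pass


@register_plot("projection")
class ProjectionWindow(PlotWindow):
    pass


class Dataset:
    def __init__(self, name: str) -> None:
        self.name = name

    def plot(self, kind: str, axis: str) -> PlotWindow:
        try:
            cls = PLOT_TYPES[kind]
        except KeyError:
            raise ValueError(f"unknown plot kind {kind!r}; known: {sorted(PLOT_TYPES)}") from None
        return cls(self, axis)


ds = Dataset("DD0010")
print(ds.plot("slice", "x"))  # SliceWindow('DD0010', axis='x')
```

The dataset grows by one method no matter how many plot kinds exist; the trade-off is that plot kinds are discovered through the registry rather than through tab completion.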
Also, I don't think I explicitly commented on Casey's hangout suggestion -- I am in favor. Could we do Tuesday afternoon (late morning CA time) or Wednesday same?
-Matt
Nathan Goldbaum Graduate Student Astronomy & Astrophysics, UCSC goldbaum@ucolick.org http://www.ucolick.org/~goldbaum
On Mar 30, 2012, at 3:48 AM, Matthew Turk wrote:
In general, I agree with the idea Nathan put out. (Also, I think this is a fine time to have a bikeshed discussion. Many of the underlying assumptions about how yt works were laid out a long time ago.) But, I'm not entirely sure I understand how different it would be -- conceptually, yes, I see what you're getting at, that we'd have a set number of attributes. In what I was thinking of for the geometry refactor so far I'm trying to get rid of the "hierarchy" as existing for every data set, and instead relying on what amounts to an object-finder and io-coordinator, which I'm calling a geometry handler. It sounds like what you would like is:
1) Get rid of accessing parameters with an implicit __getitem__ on the parameter file (i.e., pf["SomethingThatOnlyExistsInOneCode"]). I'm +10 on this.
2) Move units into the .units object (I'm mostly with Casey on this, but I think it should be a part of the field_info object)
3) Have things like current_time, domain_dimensions and so on move into basic_info and make them dict objects.
Of those, I think I'm in favor of one and two, but somewhat opposed to #3. Right now we have these attributes mandated for subclasses of StaticOutput:
* refine_by
* dimensionality
* current_time
* domain_dimensions
* domain_left_edge
* domain_right_edge
* unique_identifier
* current_redshift
* cosmological_simulation
* omega_matter
* omega_lambda
* hubble_constant
The only ones here that I think would be okay to move out of properties would be the cosmology items, and even those I'm -0 on moving.
But, in general, the idea of moving from this two-stage system of parameter file (rather than dataset) and hierarchy (rather than an implicitly-handled geometry) is something I am in support of. The geometry is something that should nearly *always* be handled by the backend, rather than by the user. So having the library require pf.h.sphere(...) is less than ideal, since it's exposing something relatively unfortunate (that building a hundred thousand grid objects can take some time).
The main ways that the static output is interacted with:
* Parameter information specific to a simulation code
* Properties that yt needs to know about
* To get at the hierarchy
* Input to plot collections
The main ways that the hierarchy is interacted with:
* Getting data objects
* Finding max
* Statistics about the simulation
* Inspecting individual grids (a much less common use case now than it was before)
All of these use cases are still valid, but I think it's clear that accessing individual grids and accessing simulation-specific parameters are not "generic" functions. What a lot of this discussion has really brought up for me is that we're talking about *generic* functionality, not code-specific functionality, and right now we do not have the best enumeration of functionality and where it lies.
With the geometry_refactor, I'd like to consolidate functionality into the main "dataset" object. The geometry can still provide access to the individual grids (of course) but data objects, finding max, getting stats about the simulation, etc, should all go into the main dataset object, and the geometry handler can simply be created on the fly if necessary.
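A geometry handler that is "created on the fly if necessary" maps naturally onto a lazily computed attribute. A minimal sketch, assuming Python 3.8+ for `functools.cached_property` and `math.dist`; `Dataset`, `GeometryHandler` and the list-of-points "data object" are hypothetical stand-ins, not yt's classes. It supports both spellings from this thread, `ds.sphere(...)` and `ds.geometry.sphere(...)`, and the handler is built only on first use.

```python
import math
from functools import cached_property
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


class GeometryHandler:
    """Hypothetical object-finder / IO coordinator; imagine it is expensive to build."""

    def __init__(self, cell_centers: Sequence[Point]) -> None:
        self.cell_centers = list(cell_centers)

    def sphere(self, center: Point, radius: float) -> List[Point]:
        # Stand-in for a data object: the cell centers inside the sphere.
        return [p for p in self.cell_centers if math.dist(p, center) <= radius]


class Dataset:
    def __init__(self, cell_centers: Sequence[Point]) -> None:
        self._cell_centers = cell_centers

    @cached_property
    def geometry(self) -> GeometryHandler:
        # Built on first access, then stored on the instance.
        return GeometryHandler(self._cell_centers)

    def sphere(self, center: Point, radius: float) -> List[Point]:
        # Dataset-level shortcut: users never have to think about the geometry.
        return self.geometry.sphere(center, radius)


ds = Dataset([(0.1, 0.1, 0.1), (0.5, 0.5, 0.5), (0.9, 0.9, 0.9)])
print("geometry" in vars(ds))           # False: nothing built yet
print(ds.sphere((0.5, 0.5, 0.5), 0.2))  # [(0.5, 0.5, 0.5)]
print("geometry" in vars(ds))           # True: built once, now cached
```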
This brings up two points, though --
1) Does our method of instantiating objects still hold up? i.e., ds.sphere(...) and so on? Or does our dataset object then become overcrowded? I would also like to move *all* plotting objects into wherever we end up deciding data containers come from, which could look like ds.plot("slice", "x") (although we can bikeshed that later), which would return a plot window.
2) Datasets and time series should behave, if not identically, at least consistently in their APIs. Moving to a completely ds-mediated mechanism for generating, accessing and inspecting data opens up the ability to then construct very nice and simple proxy objects. As an example, while something like this is technically possible with the current Time Series API, it's a bit tricky:
ts = TimeSeriesData.from_filenames(...)
plot = ts.plot("slice", "x", (100.0, 'au'))
ts.seek(dt=(100, 'years'))
plot.save()
ts.seek(dt=(10, 'years'))
plot.save()
(The time-slider, as Tom likes to call it ...)
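A minimal sketch of the time-slider idea: the plot is bound to the series rather than to one output, and `seek` only moves the series' clock, so each `save` draws whichever output is current. `TimeSeries`, `SeriesPlot` and the output names are hypothetical; times are plain floats in an assumed unit (years) rather than the `(value, unit)` tuples above, and `save` only returns the filename it would write.

```python
import bisect
from typing import List, Sequence


class TimeSeries:
    """Hypothetical series of outputs, ordered by simulation time."""

    def __init__(self, times: Sequence[float], names: Sequence[str]) -> None:
        pairs = sorted(zip(times, names))
        self.times: List[float] = [t for t, _ in pairs]
        self.names: List[str] = [n for _, n in pairs]
        self.current_time = self.times[0]

    @property
    def current(self) -> str:
        # Latest output at or before the current time.
        i = bisect.bisect_right(self.times, self.current_time) - 1
        return self.names[max(i, 0)]

    def seek(self, dt: float) -> None:
        self.current_time += dt

    def plot(self, kind: str) -> "SeriesPlot":
        return SeriesPlot(self, kind)


class SeriesPlot:
    """A plot that follows its series instead of holding one dataset."""

    def __init__(self, series: TimeSeries, kind: str) -> None:
        self.series = series
        self.kind = kind

    def save(self) -> str:
        # Stand-in for rendering: report which output would be drawn.
        return f"{self.kind}_{self.series.current}.png"


ts = TimeSeries([0.0, 50.0, 100.0, 150.0], ["DD0000", "DD0001", "DD0002", "DD0003"])
plot = ts.plot("slice")
ts.seek(100.0)
print(plot.save())  # slice_DD0002.png
ts.seek(10.0)
print(plot.save())  # slice_DD0002.png (no newer output yet)
ts.seek(45.0)
print(plot.save())  # slice_DD0003.png
```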
In general, this idea of moving toward more thoughtful dataset-construction, rather than the hokey parameter file + hierarchy construction brings with it a mindset shift which I'd like to spread to the time series, which can continue to be a focus.
What do you think?
-Matt
On Thu, Mar 29, 2012 at 7:08 PM, Casey W. Stark < caseywstark@gmail.com> wrote:
+1 on datasets, although I would like to see the unit object(s) at the field level.
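Attaching units to each field, as Casey suggests, would let a derived field like pressure adapt to whichever ThermalEnergy convention a frontend uses (per unit mass in Enzo and FLASH, per unit volume in Orion, as noted earlier in the thread). A minimal sketch for an ideal gas, where p = (gamma - 1) * rho * eps with eps the specific thermal energy, or p = (gamma - 1) * e with e the thermal energy density; `FieldData`, the unit strings and gamma = 5/3 are assumptions, not yt's units system.

```python
import math
from dataclasses import dataclass

GAMMA = 5.0 / 3.0  # assumed adiabatic index (monatomic ideal gas)


@dataclass(frozen=True)
class FieldData:
    value: float
    units: str  # hypothetical unit labels, e.g. "erg/g" or "erg/cm**3"


def pressure(thermal_energy: FieldData, density: FieldData, gamma: float = GAMMA) -> float:
    """Ideal-gas pressure in erg/cm**3, whichever convention the frontend used."""
    if density.units != "g/cm**3":
        raise ValueError(f"unexpected density units {density.units!r}")
    if thermal_energy.units == "erg/g":  # per unit mass (Enzo/FLASH-style)
        return (gamma - 1.0) * density.value * thermal_energy.value
    if thermal_energy.units == "erg/cm**3":  # per unit volume (Orion-style)
        return (gamma - 1.0) * thermal_energy.value
    raise ValueError(f"unexpected thermal energy units {thermal_energy.units!r}")


rho = FieldData(2.0, "g/cm**3")
enzo_like = FieldData(3.0, "erg/g")       # specific thermal energy
orion_like = FieldData(6.0, "erg/cm**3")  # same gas, per unit volume
print(math.isclose(pressure(enzo_like, rho), pressure(orion_like, rho)))  # True
```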
On Thu, Mar 29, 2012 at 4:04 PM, Cameron Hummels <chummels@astro.columbia.edu> wrote:
+1 on datasets.
Sounds good to me as well. I may be a bit late as I will be getting back from lunch.
Nathan Goldbaum Graduate Student Astronomy & Astrophysics, UCSC goldbaum@ucolick.org http://www.ucolick.org/~goldbaum
On Apr 2, 2012, at 10:54 AM, Casey W. Stark wrote:
Sounds good.
On Mon, Apr 2, 2012 at 10:47 AM, Matthew Turk <matthewturk@gmail.com> wrote: Hi Casey,
On Mon, Apr 2, 2012 at 1:01 PM, Casey W. Stark <caseywstark@gmail.com> wrote:
I think I forgot to reply -- Tuesday works for me and Wednesday is good before 11 or after 12:30 Pacific.
We can sort this out during the hangout, but which issue are we focusing on? Is this more for the units system, renaming fields in the 3.0 branch, or the dataset change? (or maybe something else that was mentioned, there were a lot)
How about 1:00PM pacific on Wednesday? And I was thinking we'd work in yt-refactor and change up the fields.
-Matt
Best, Casey
On Fri, Mar 30, 2012 at 12:59 PM, Matthew Turk <matthewturk@gmail.com> wrote:
On Fri, Mar 30, 2012 at 1:22 PM, Nathan Goldbaum <goldbaum@ucolick.org> wrote:
1) Get rid of accessing parameters with an implicit __getitem__ on the parameter file (i.e., pf["SomethingThatOnlyExistsInOneCode"]). I'm +10 on this. 2) Move units into the .units object (I'm mostly with Casey on this, but I think it should be a part of the field_info object) 3) Have things like current_time, domain_dimensions and so on move into basic_info and make them dict objects.
I think of those, I'm in favor of one and two, but somewhat opposed to #3. Right now we have these attributes mandated for subclasses of StaticOutput:
I'd say #3 is the least important. I'd be fine with the dataset object having some non-dict attributes that describe the nature of the dataset rather than storing them all in a basic_info dict. One thing to think about: if we want to support pure-particle datasets, then we should drop the notion of refine_by as a basic dataset attribute.
I think whether refine_by sticks around depends on how we end up wanting to address fluid quantities in particle datasets. One possibility for handling SPH data is to grid it, and while I don't want to lock us into that (myopic at best) I don't want to exclude it as an ultimate possibility. But yes, in general, I agree. As I have been working on the geometry refactor, the number of times refine_by is access has been going down, as for the most part it relies on (for instance) the projection code knowing how to handle data from grids, which has been pshed back onto the grids instead. Now projections simply receive data that is ordered spatially, and that data is appropriately added.
With the geometry_refactor, I'd like to consolidate functionality into the main "dataset" object. The geometry can still provide access to the individual grids (of course) but data objects, finding max, getting stats about the simulation, etc, should all go into the main dataset object, and the geometry handler can simply be created on the fly if necessary.
Why not get access to objects through a geometry attribute that hangs off of the dataset object. If I wanted to instantiate a sphere object, I would just do:
sp = ds.geometry.sphere()
This is pretty much the same as the pf.h.sphere() syntax in place right now but allows for arbitrary selection embedded inside of the new geometry code.
That's how I was implementing it. I just wasn't sure this was as clean. Having the plots then hang off the geometry feels a little funny.
Also, I don't think I explicitly commented on Casey's hangout suggestion -- I am in favor. Could we do Tuesday afternoon (late morning CA time) or Wednesday same?
-Matt
Nathan Goldbaum Graduate Student Astronomy & Astrophysics, UCSC goldbaum@ucolick.org http://www.ucolick.org/~goldbaum
On Mar 30, 2012, at 3:48 AM, Matthew Turk wrote:
In general, I agree with the idea Nathan put out. (Also, I think this is a fine time to have a bikeshed discussion. Many of the underlying assumptions about how yt works were laid out a long time ago.) But, I'm not entirely sure I understand how different it would be -- conceptually, yes, I see what you're getting at, that we'd have a set number of attributes. In what I was thinking of for the geometry refactor so far I'm trying to get rid of the "hierarchy" as existing for every data set, and instead relying on what amounts to an object-finder and io-coordinator, which I'm calling a geometry handler. It sounds like what you would like is:
1) Get rid of accessing parameters with an implicit __getitem__ on the parameter file (i.e., pf["SomethingThatOnlyExistsInOneCode"]). I'm +10 on this. 2) Move units into the .units object (I'm mostly with Casey on this, but I think it should be a part of the field_info object) 3) Have things like current_time, domain_dimensions and so on move into basic_info and make them dict objects.
I think of those, I'm in favor of one and two, but somewhat opposed to #3. Right now we have these attributes mandated for subclasses of StaticOutput:
refine_by dimensionality current_time domain_dimensions domain_left_edge domain_right_edge unique_identifier current_redshift cosmological_simulation omega_matter omega_lambda hubble_constant
The only ones here that I think would be okay to move out of properties would be the cosmology items, and even those I'm -0 on moving.
But, in general, the idea of moving from this two-stage system of parameter file (rather than dataset) and hierarchy (rather than an implicitly-handled geometry) is something I am in support of. The geometry is something that should nearly *always* be handled by the backend, rather than by the user. So having the library require pf.h.sphere(...) is less than ideal, since it's exposing something relatively unfortunate (that building a hundred thousand grid objects can take some time).
The main ways that the static output is interacted with:
* Parameter information specific to a simulation code * Properties that yt needs to know about * To get at the hierarchy * Input to plot collections
The main ways that the hierarchy is interacted with:
* Getting data objects * Finding max * Statistics about the simulation * Inspecting individual grids (much less common use case now that it was before)
All of these use cases are still valid, but I think it's clear that accessing individual grids and accessing simulation-specific parameters are not "generic" functions. What a lot of this discussion has really brought up for me is that we're talking about *generic* functionality, not code-specific functionality, and we right now do not have the best enumeration of functionality and where it lies.
With the geometry_refactor, I'd like to consolidate functionality into the main "dataset" object. The geometry can still provide access to the individual grids (of course) but data objects, finding max, getting stats about the simulation, etc, should all go into the main dataset object, and the geometry handler can simply be created on the fly if necessary.
This brings up two points, though --
1) Does our method of instantiating objects still hold up? i.e., ds.sphere(...) and so on? Or does our dataset object then become overcrowded? I would also like to move *all* plotting objects into whatever we end up deciding is the location data containers come from, which for instance could look like ds.plot("slice", "x") (for instance, although we can bikeshed that later), which would return a plot window. 2) Datasets and time series should behave, if not identically, at least consistently in their APIs. Moving to a completely ds-mediated mechanism for generating, accessing and inspecting data opens up the ability to then construct very nice and simply proxy objects. As an example, while something this is currently technically possible with the current Time Series API, it's a bit tricky:
ts = TimeSeriesData.from_filenames(...) plot = ts.plot("slice", "x", (100.0, 'au')) ts.seek(dt = (100, 'years')) plot.save() ts.seek(dt = (10, 'years')) plot.save()
(The time-slider, as Tom likes to call it ...)
In general, this idea of moving toward more thoughtful dataset-construction, rather than the hokey parameter file + hierarchy construction brings with it a mindset shift which I'd like to spread to the time series, which can continue to be a focus.
What do you think?
-Matt
On Thu, Mar 29, 2012 at 7:08 PM, Casey W. Stark <caseywstark@gmail.com> wrote:
+1 on datasets, although I would like to see the unit object(s) at the field level.
On Thu, Mar 29, 2012 at 4:04 PM, Cameron Hummels <chummels@astro.columbia.edu> wrote:
> +1 on datasets.
>
> On 3/29/12 6:58 PM, Nathan Goldbaum wrote:
>> +1. I'd also be up to help out with the sprint. Doing a virtual sprint using a google hangout might help mitigate some of the distance problems.
>>
>> While we're bringing up Enzo-isms that we should get rid of, I think it might be a good idea to make a conceptual shift in the basic python UI. Instead of referring to the interface between the user and the data as a parameter file, I think we should be talking about datasets. One would instantiate a dataset just like we do now with parameter files:
>>
>>     ds = load(filename)
>>
>> A dataset would also have some universal attributes which would present themselves to the user as a dict, e.g. ds.units, ds.parameters, ds.basic_info (like current_time, timestep, filename, and simulation code), and ds.hierarchy (not sure how that would interfere with the geometry refactor).
>>
>> This may be a painting-the-bike-shed discussion, but I think this shift will help new users understand how to access their data. Thoughts?
>>
>> On Mar 29, 2012, at 3:40 PM, Matthew Turk <matthewturk@gmail.com> wrote:
>>> Hi Nathan and Casey,
>>>
>>> I agree with what both of you have said. The Orion/Nyx units should be made to be consistent, but more importantly I think we should continue breaking away from Enzo-isms in the code.
>>>
>>> As it stands, all of the universal fields call underlying Enzo-named aliases (Density, ThermalEnergy, etc.). I hope we can have a 3.0 out within a calendar year, hopefully by the end of this year. (I've been pushing on the geometry refactor, although recently other efforts have been paying off, which has decreased my output there.) I am much, much less doubtful than Casey is that we can do this; in fact, I'm completely in favor of this and I think it would be relatively straightforward to implement.
>>>
>>> In the existing system we have a mechanism for aliasing fields. What we can do is provide an additional translation system where we enumerate the fields that are available for items in UniversalFields, and then construct aliases to those. This would mean changing what is aliased in existing non-Enzo frontends, and adding aliases in Enzo. The style of name Casey proposes is what I would also agree with: underscores, lower case, and erring on the side of verbosity. The fields offhand that we would need to do this for (in their current Enzo-isms):
>>>
>>> x-velocity => velocity_x (same for y, z)
>>> Density => density
>>> TotalEnergy => ?
>>> GasEnergy => thermal_energy_specific (and thermal_energy_density)
>>> Temperature => temperature
>>>
>>> and so on.
>>>
>>> Once we have these aliases in place, an overall cleanup of UniversalFields should take place. One place we should clean up is ensuring that there are no conditionals; rather than conditionals inside the functions, we should place those conditionals inside the parameter file types. So for instance, if you have a field that is calculated differently depending on the parameter HydroMethod (in Enzo for instance) you simply set a validator on the field requiring the parameter be set to a particular value, and then only the field which satisfies that validator will be called when requested.
>>>
>>> So we've gotten rid of a bunch of Enzo-isms in the parameter files; after fields, what else can we address? And, I'd be up for sprinting on this (which should take just a few hours) basically any time next week or after. I'd also be up for talking more about geometry refactoring, if anyone is interested, but it's not quite to the point that I think I am satisfied enough with the architecture to request input / contributions. Sometimes (especially with big architectural things like this) I think it's a shame we do all of our work virtually, as I think a lot of this would be easier to bang out in person for a couple hours.
>>>
>>> -Matt
>>>
>>> On Wed, Mar 28, 2012 at 6:14 PM, Casey W. Stark <caseywstark@gmail.com> wrote:
>>>> Hi Nathan.
>>>>
>>>> I'm also worried about this and I agree that fields with the same name should all be consistent. I would support some sort of cleanup of frontend fields, and I can get the Nyx fields in line and help with Enzo.
>>>>
>>>> I doubt we can do this, but I would prefer changing the field names as part of the pushes to remove Enzo-isms and refactor geometry handling. For instance, the field in Orion could be thermal_energy_density and the field in Enzo could be specific_thermal_energy. I also noticed this issue when I was using "Density" in Enzo (proper density in cgs) and "density" in Nyx (comoving density in cgs).
>>>>
>>>> Best,
>>>> Casey
>>>>
>>>> On Wed, Mar 28, 2012 at 1:47 PM, Nathan Goldbaum <goldbaum@ucolick.org> wrote:
>>>>> Hi all,
>>>>>
>>>>> On IRC today we noticed that Orion defines its ThermalEnergy field per unit volume but Enzo and FLASH define ThermalEnergy per unit mass. Is this a problem? Since yt defaults to the Enzo field names, should we try to make sure that all fields are defined using the same units as in Enzo? Is there a convention for how different codes should define derived fields that are aliased to Enzo fields?
>>>>>
>>>>> One problem for this particular example is that the Pressure field is defined in terms of ThermalEnergy in universal_fields.py, so the units of ThermalEnergy become important if a user merely wants the gas pressure in the simulation.
>>>>>
>>>>> One possible solution for this issue would be the units overhaul we're planning. If all fields are associated with a unit object, we can simply query the units to ensure that units are taken care of correctly and code-to-code comparisons aren't sensitive to the units chosen for fields in the frontend.
>>>>>
>>>>> Personally, I think it would be best if we could make sure that all of the fields aliased to Enzo fields have the same units.
>>>>>
>>>>> Nathan Goldbaum
>>>>> Graduate Student
>>>>> Astronomy & Astrophysics, UCSC
>>>>>
>>>>> goldbaum@ucolick.org
>>>>> http://www.ucolick.org/~goldbaum
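The aliasing and validator ideas in Matt's message quoted above can be sketched together. This is a minimal, self-contained illustration and not yt's field machinery. `FIELD_ALIASES`, `FieldRegistry`, `add_field` and `requires_parameter` are hypothetical names. The two formulas registered in the usage part are placeholders, and which HydroMethod value each belongs to is invented for the example. Enzo-style names are translated to the proposed lower-case names. Several definitions may share one name, and the validators (not `if` statements inside the field functions) decide which definition applies to a dataset's parameters.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Enzo-style name -> proposed name, following the list in the thread.
# TotalEnergy is left out because its new name was still open ("?").
FIELD_ALIASES: Dict[str, str] = {
    "x-velocity": "velocity_x",
    "y-velocity": "velocity_y",
    "z-velocity": "velocity_z",
    "Density": "density",
    "GasEnergy": "thermal_energy_specific",
    "Temperature": "temperature",
}

Params = Dict[str, object]
Validator = Callable[[Params], bool]


def requires_parameter(name: str, value: object) -> Validator:
    """Build a validator that passes only if dataset parameter `name` equals `value`."""
    return lambda params: params.get(name) == value


class FieldRegistry:
    """Hypothetical registry in which several definitions can share one field name."""

    def __init__(self) -> None:
        self._defs: Dict[str, List[Tuple[Callable, List[Validator]]]] = {}

    def add_field(self, name: str, func: Callable, validators: Sequence[Validator] = ()) -> None:
        self._defs.setdefault(name, []).append((func, list(validators)))

    def get(self, name: str, params: Params, data: dict):
        name = FIELD_ALIASES.get(name, name)  # old Enzo-style names still resolve
        for func, validators in self._defs.get(name, []):
            # No branching inside the field functions: the validators pick the definition.
            if all(check(params) for check in validators):
                return func(data)
        raise KeyError(f"no valid definition of {name!r} for these parameters")


# Usage with placeholder formulas (illustrative only, not taken from Enzo).
registry = FieldRegistry()
registry.add_field("thermal_energy_specific", lambda d: d["GasEnergy"],
                   validators=[requires_parameter("HydroMethod", 0)])
registry.add_field("thermal_energy_specific",
                   lambda d: d["TotalEnergy"] - 0.5 * d["velocity_x"] ** 2,
                   validators=[requires_parameter("HydroMethod", 2)])

data = {"GasEnergy": 1.0, "TotalEnergy": 5.0, "velocity_x": 2.0}
print(registry.get("GasEnergy", {"HydroMethod": 0}, data))  # old name, resolved via alias
print(registry.get("thermal_energy_specific", {"HydroMethod": 2}, data))
```

Keeping the choice in validators means adding a new code path is a new registration rather than another branch in an existing function.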
_______________________________________________ yt-dev mailing list yt-dev@lists.spacepope.org http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
Hi all, I've put up a document on Google Docs. I've invited those of you that I know have comments/thoughts on it to be editors, but if anybody else wants to have direct editing privs please let me know. This will be an evolving document as we move toward a 3.0 design and implementation. Here's the link, which you can leave comments on without having edit privs. https://docs.google.com/document/d/17Q-rbmTj9PyaTgtN1h6C8vqoWeIZjw_OjFbQp8L3... -Matt PS I had a "black triangle" moment today with octree reading, where I was able to apply geometric selection using the same mechanisms as grid patches, but without the expensive regridding step. Hooray! On Mon, Apr 2, 2012 at 2:17 PM, Nathan Goldbaum <goldbaum@ucolick.org> wrote:
Sounds good to me as well. I may be a bit late as I will be getting back from lunch.
Nathan Goldbaum Graduate Student Astronomy & Astrophysics, UCSC goldbaum@ucolick.org http://www.ucolick.org/~goldbaum
On Apr 2, 2012, at 10:54 AM, Casey W. Stark wrote:
Sounds good.
On Mon, Apr 2, 2012 at 10:47 AM, Matthew Turk <matthewturk@gmail.com> wrote:
Hi Casey,
On Mon, Apr 2, 2012 at 1:01 PM, Casey W. Stark <caseywstark@gmail.com> wrote:
I think I forgot to reply -- Tuesday works for me and Wednesday is good before 11 or after 12:30 Pacific.
We can sort this out during the hangout, but which issue are we focusing on? Is this more for the units system, renaming fields in the 3.0 branch, or the dataset change? (or maybe something else that was mentioned, there were a lot)
How about 1:00PM pacific on Wednesday? And I was thinking we'd work in yt-refactor and change up the fields.
-Matt
Best, Casey
On Fri, Mar 30, 2012 at 12:59 PM, Matthew Turk <matthewturk@gmail.com> wrote:
On Fri, Mar 30, 2012 at 1:22 PM, Nathan Goldbaum <goldbaum@ucolick.org> wrote:
1) Get rid of accessing parameters with an implicit __getitem__ on the parameter file (i.e., pf["SomethingThatOnlyExistsInOneCode"]). I'm +10 on this.
2) Move units into the .units object (I'm mostly with Casey on this, but I think it should be a part of the field_info object).
3) Have things like current_time, domain_dimensions and so on move into basic_info and make them dict objects.
I think of those, I'm in favor of one and two, but somewhat opposed to #3. Right now we have these attributes mandated for subclasses of StaticOutput:
I'd say #3 is the least important. I'd be fine with the dataset object having some non-dict attributes that describe the nature of the dataset rather than storing them all in a basic_info dict. One thing to think about: if we want to support pure-particle datasets, then we should drop the notion of refine_by as a basic dataset attribute.
I think whether refine_by sticks around depends on how we end up wanting to address fluid quantities in particle datasets. One possibility for handling SPH data is to grid it, and while I don't want to lock us into that (myopic at best), I don't want to exclude it as an ultimate possibility. But yes, in general, I agree. As I have been working on the geometry refactor, the number of times refine_by is accessed has been going down, as for the most part it relies on (for instance) the projection code knowing how to handle data from grids, which has been pushed back onto the grids instead. Now projections simply receive data that is ordered spatially, and that data is appropriately added.
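The point that a projection need not know `refine_by` can be shown with a toy version. This is a sketch under simplifying assumptions and not yt's projection code. Cells arrive as plain `(x, y, z, width, value)` tuples in the unit cube, and only leaf (unrefined) cells are passed in. Every cell edge lines up with a pixel edge, and the projection is along z. Because each cell carries its own width, the deposit works for any mix of resolutions. `project_along_z` and `mixed_resolution_cells` are hypothetical helpers.

```python
from typing import Iterable, List, Tuple

# (x_center, y_center, z_center, width, value) for a cubic leaf cell in the unit cube.
Cell = Tuple[float, float, float, float, float]


def project_along_z(cells: Iterable[Cell], npix: int) -> List[List[float]]:
    """Accumulate value * path length along z onto an npix x npix image.

    Assumes only leaf cells are passed in and that every cell edge lies on a
    pixel edge, i.e. each width is a multiple of 1 / npix."""
    image = [[0.0] * npix for _ in range(npix)]
    pix = 1.0 / npix
    for x, y, _z, dx, value in cells:
        # Pixel index ranges covered by this cell; no refine_by is needed,
        # because each cell carries its own width.
        i0, i1 = round((x - dx / 2) / pix), round((x + dx / 2) / pix)
        j0, j1 = round((y - dx / 2) / pix), round((y + dx / 2) / pix)
        for i in range(i0, i1):
            for j in range(j0, j1):
                image[i][j] += value * dx
    return image


def mixed_resolution_cells(value: float = 1.0) -> List[Cell]:
    """Example domain: a 2x2x2 coarse mesh whose lower-corner octant is split into 8."""
    cells: List[Cell] = []
    for cx in (0.25, 0.75):
        for cy in (0.25, 0.75):
            for cz in (0.25, 0.75):
                if (cx, cy, cz) == (0.25, 0.25, 0.25):
                    for fx in (0.125, 0.375):
                        for fy in (0.125, 0.375):
                            for fz in (0.125, 0.375):
                                cells.append((fx, fy, fz, 0.25, value))
                else:
                    cells.append((cx, cy, cz, 0.5, value))
    return cells


image = project_along_z(mixed_resolution_cells(), npix=4)
# With a constant field, every column integral is the value times the domain depth.
print(image)
```

Since the deposit is a plain sum, the cells can also arrive in any order, which fits the "data that is ordered spatially, and appropriately added" description.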
With the geometry_refactor, I'd like to consolidate functionality into the main "dataset" object. The geometry can still provide access to the individual grids (of course) but data objects, finding max, getting stats about the simulation, etc, should all go into the main dataset object, and the geometry handler can simply be created on the fly if necessary.
Why not get access to objects through a geometry attribute that hangs off of the dataset object? If I wanted to instantiate a sphere object, I would just do:
sp = ds.geometry.sphere()
This is pretty much the same as the pf.h.sphere() syntax in place right now but allows for arbitrary selection embedded inside of the new geometry code.
That's how I was implementing it. I just wasn't sure this was as clean. Having the plots then hang off the geometry feels a little funny.
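A minimal sketch of the two access styles discussed here follows. It assumes hypothetical `Dataset`, `GeometryHandler` and `Sphere` classes, none of which are yt's real classes. The geometry handler is built lazily the first time it is needed, in line with the idea that it "can simply be created on the fly". `ds.sphere(...)` just forwards to `ds.geometry.sphere(...)`, so both spellings return the same kind of object and share one handler.

```python
from functools import cached_property
from typing import Sequence, Tuple


class Sphere:
    """Stand-in data container that only records its selection parameters."""

    def __init__(self, geometry: "GeometryHandler", center: Sequence[float], radius: float):
        self.geometry = geometry
        self.center: Tuple[float, ...] = tuple(center)
        self.radius = radius


class GeometryHandler:
    """Hypothetical object-finder / IO coordinator whose construction may be slow."""

    builds = 0  # how many handlers have been constructed (for illustration)

    def __init__(self, filename: str):
        GeometryHandler.builds += 1
        self.filename = filename
        # A real handler would index grids or octs here, which is the expensive part.

    def sphere(self, center: Sequence[float], radius: float) -> Sphere:
        return Sphere(self, center, radius)


class Dataset:
    def __init__(self, filename: str):
        self.filename = filename  # cheap: no geometry is read at load time

    @cached_property
    def geometry(self) -> GeometryHandler:
        # Built on first access only, then cached on the instance.
        return GeometryHandler(self.filename)

    def sphere(self, center: Sequence[float], radius: float) -> Sphere:
        # Shortcut: ds.sphere(...) forwards to ds.geometry.sphere(...).
        return self.geometry.sphere(center, radius)


if __name__ == "__main__":
    ds = Dataset("DD0010/DD0010")  # hypothetical output name; nothing is read
    print(GeometryHandler.builds)
    sp = ds.sphere((0.5, 0.5, 0.5), 0.1)
    sp2 = ds.geometry.sphere((0.5, 0.5, 0.5), 0.2)
    print(GeometryHandler.builds, sp.geometry is sp2.geometry)
```

With the forwarding method, the explicit `ds.geometry` route stays available for grid-level work while everyday selection goes through the dataset.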
Also, I don't think I explicitly commented on Casey's hangout suggestion -- I am in favor. Could we do Tuesday afternoon (late morning CA time) or Wednesday same?
-Matt
Nathan Goldbaum Graduate Student Astronomy & Astrophysics, UCSC goldbaum@ucolick.org http://www.ucolick.org/~goldbaum
On Mar 30, 2012, at 3:48 AM, Matthew Turk wrote:
In general, I agree with the idea Nathan put out. (Also, I think this is a fine time to have a bikeshed discussion. Many of the underlying assumptions about how yt works were laid out a long time ago.) But, I'm not entirely sure I understand how different it would be -- conceptually, yes, I see what you're getting at, that we'd have a set number of attributes. In what I was thinking of for the geometry refactor so far I'm trying to get rid of the "hierarchy" as existing for every data set, and instead relying on what amounts to an object-finder and io-coordinator, which I'm calling a geometry handler. It sounds like what you would like is:
1) Get rid of accessing parameters with an implicit __getitem__ on the parameter file (i.e., pf["SomethingThatOnlyExistsInOneCode"]). I'm +10 on this.
2) Move units into the .units object (I'm mostly with Casey on this, but I think it should be a part of the field_info object).
3) Have things like current_time, domain_dimensions and so on move into basic_info and make them dict objects.
I think of those, I'm in favor of one and two, but somewhat opposed to #3. Right now we have these attributes mandated for subclasses of StaticOutput:
* refine_by
* dimensionality
* current_time
* domain_dimensions
* domain_left_edge
* domain_right_edge
* unique_identifier
* current_redshift
* cosmological_simulation
* omega_matter
* omega_lambda
* hubble_constant
The only ones here that I think would be okay to move out of properties would be the cosmology items, and even those I'm -0 on moving.
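One way to read points 1 and 3 together is sketched below. The attributes every frontend must provide stay ordinary, explicitly declared attributes, here on a hypothetical `DatasetInfo` dataclass. The cosmology items are grouped into a separate `Cosmology` record, since those are the only ones suggested as movable. Code-specific values such as Enzo's `HydroMethod` live in an explicit `parameters` dict instead of being reached through `pf[...]`. The class names and example values are hypothetical. `refine_by` is kept only because it is on the current list (the thread questions it for particle datasets). `cosmological_simulation` is derived here rather than stored.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Cosmology:
    current_redshift: float
    omega_matter: float
    omega_lambda: float
    hubble_constant: float


@dataclass
class DatasetInfo:
    # Attributes every frontend must supply (mirrors the list in the thread).
    refine_by: int
    dimensionality: int
    current_time: float
    domain_dimensions: Tuple[int, int, int]
    domain_left_edge: Vec3
    domain_right_edge: Vec3
    unique_identifier: str
    # None means "not a cosmological simulation".
    cosmology: Optional[Cosmology] = None
    # Code-specific values: looked up explicitly, never via info["..."].
    parameters: Dict[str, object] = field(default_factory=dict)

    @property
    def cosmological_simulation(self) -> bool:
        return self.cosmology is not None


if __name__ == "__main__":
    info = DatasetInfo(2, 3, 0.0, (32, 32, 32), (0.0, 0.0, 0.0), (1.0, 1.0, 1.0),
                       "run0001", parameters={"HydroMethod": 0})
    print(info.current_time, info.parameters["HydroMethod"], info.cosmological_simulation)
```

Because no `__getitem__` is defined, code that still writes `info["HydroMethod"]` fails loudly instead of silently depending on one simulation code.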
But, in general, the idea of moving from this two-stage system of parameter file (rather than dataset) and hierarchy (rather than an implicitly-handled geometry) is something I am in support of. The geometry is something that should nearly *always* be handled by the backend, rather than by the user. So having the library require pf.h.sphere(...) is less than ideal, since it's exposing something relatively unfortunate (that building a hundred thousand grid objects can take some time).
The main ways that the static output is interacted with:
* Parameter information specific to a simulation code
* Properties that yt needs to know about
* To get at the hierarchy
* Input to plot collections
The main ways that the hierarchy is interacted with:
* Getting data objects
* Finding max
* Statistics about the simulation
* Inspecting individual grids (a much less common use case now than it was before)
All of these use cases are still valid, but I think it's clear that accessing individual grids and accessing simulation-specific parameters are not "generic" functions. What a lot of this discussion has really brought up for me is that we're talking about *generic* functionality, not code-specific functionality, and we right now do not have the best enumeration of functionality and where it lies.
With the geometry_refactor, I'd like to consolidate functionality into the main "dataset" object. The geometry can still provide access to the individual grids (of course) but data objects, finding max, getting stats about the simulation, etc, should all go into the main dataset object, and the geometry handler can simply be created on the fly if necessary.
This brings up two points, though --
1) Does our method of instantiating objects still hold up, i.e., ds.sphere(...) and so on? Or does our dataset object then become overcrowded? I would also like to move *all* plotting objects to wherever we end up deciding data containers come from, which could look like ds.plot("slice", "x") (although we can bikeshed that later) and would return a plot window.
2) Datasets and time series should behave, if not identically, at least consistently in their APIs. Moving to a completely ds-mediated mechanism for generating, accessing and inspecting data opens up the ability to construct very nice and simple proxy objects. As an example, while something like this is technically possible with the current Time Series API, it's a bit tricky:
    ts = TimeSeriesData.from_filenames(...)
    plot = ts.plot("slice", "x", (100.0, 'au'))
    ts.seek(dt=(100, 'years'))
    plot.save()
    ts.seek(dt=(10, 'years'))
    plot.save()
(The time-slider, as Tom likes to call it ...)
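The time-slider example relies on the plot following the series as it moves. Below is a minimal sketch of that proxy behaviour, assuming hypothetical `TimeSeries` and `SlicePlot` classes rather than the existing `TimeSeriesData` API. The plot stores a reference to the series rather than to a single dataset, so each `save()` renders whatever output the series currently points at. Times are plain floats in arbitrary units, and `seek` jumps to the output nearest the requested time. Unit tuples such as `(100, 'years')` are left out for brevity.

```python
import bisect
from typing import List


class TimeSeries:
    def __init__(self, times: List[float], names: List[str]):
        # Outputs sorted by simulation time; names stand in for loaded datasets.
        order = sorted(range(len(times)), key=times.__getitem__)
        self.times = [times[i] for i in order]
        self.names = [names[i] for i in order]
        self.index = 0

    @property
    def current(self) -> str:
        return self.names[self.index]

    def seek(self, dt: float) -> None:
        """Move to the output whose time is closest to (current time + dt)."""
        target = self.times[self.index] + dt
        i = bisect.bisect_left(self.times, target)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(self.times)]
        self.index = min(candidates, key=lambda j: abs(self.times[j] - target))

    def plot(self, kind: str, axis: str) -> "SlicePlot":
        return SlicePlot(self, kind, axis)


class SlicePlot:
    def __init__(self, series: TimeSeries, kind: str, axis: str):
        self.series = series  # a proxy: holds the series, not one dataset
        self.kind = kind
        self.axis = axis

    def save(self) -> str:
        # A real implementation would render here; this only builds the filename.
        return f"{self.series.current}_{self.kind}_{self.axis}.png"


if __name__ == "__main__":
    ts = TimeSeries([0.0, 50.0, 100.0, 110.0], ["DD0000", "DD0001", "DD0002", "DD0003"])
    plot = ts.plot("slice", "x")
    ts.seek(dt=100.0)
    print(plot.save())
    ts.seek(dt=10.0)
    print(plot.save())
```

The design choice that makes this work is that the plot never caches a dataset; it asks the series for its current member at render time.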
In general, this idea of moving toward more thoughtful dataset-construction, rather than the hokey parameter file + hierarchy construction brings with it a mindset shift which I'd like to spread to the time series, which can continue to be a focus.
What do you think?
-Matt
PS I had a "black triangle" moment today with octree reading, where I was able to apply geometric selection using the same mechanisms as grid patches, but without the expensive regridding step. Hooray!
Um, I'm not familiar with what you mean by "black triangle", I think it means you had a successful moment when things all came together, but I'm guessing it's not this? https://en.wikipedia.org/wiki/Black_triangle_(badge) -- Stephen Skory s@skory.us http://stephenskory.com/ 510.621.3687 (google voice)
http://rampantgames.com/blog/2004/10/black-triangle.html It gets tossed around as a term in hacker culture. On Tue, Apr 3, 2012 at 2:54 PM, Stephen Skory <s@skory.us> wrote:
PS I had a "black triangle" moment today with octree reading, where I was able to apply geometric selection using the same mechanisms as grid patches, but without the expensive regridding step. Hooray!
Um, I'm not familiar with what you mean by "black triangle", I think it means you had a successful moment when things all came together, but I'm guessing it's not this?
https://en.wikipedia.org/wiki/Black_triangle_(badge)
-- Stephen Skory s@skory.us http://stephenskory.com/ 510.621.3687 (google voice)
http://philosophistry.com/archives/2009/01/what-is-a-black.html Nathan Goldbaum Graduate Student Astronomy & Astrophysics, UCSC goldbaum@ucolick.org http://www.ucolick.org/~goldbaum On Apr 3, 2012, at 11:54 AM, Stephen Skory wrote:
PS I had a "black triangle" moment today with octree reading, where I was able to apply geometric selection using the same mechanisms as grid patches, but without the expensive regridding step. Hooray!
Um, I'm not familiar with what you mean by "black triangle", I think it means you had a successful moment when things all came together, but I'm guessing it's not this?
https://en.wikipedia.org/wiki/Black_triangle_(badge)
-- Stephen Skory s@skory.us http://stephenskory.com/ 510.621.3687 (google voice)
Back to the point... That looks great, Matt. I'll get some stuff going on libyt/gdf when I get the chance. I will also return to finishing up my part of the volume rendering refactor, which I think just needs to be touched up and would probably be ready for something more like 2.4. I also had some thoughts on the parameter file vs dataset, but perhaps I'll move those comments to the doc. Sam On Tue, Apr 3, 2012 at 12:56 PM, Nathan Goldbaum <goldbaum@ucolick.org> wrote:
http://philosophistry.com/archives/2009/01/what-is-a-black.html
Nathan Goldbaum Graduate Student Astronomy & Astrophysics, UCSC goldbaum@ucolick.org http://www.ucolick.org/~goldbaum
On Apr 3, 2012, at 11:54 AM, Stephen Skory wrote:
PS I had a "black triangle" moment today with octree reading, where I was able to apply geometric selection using the same mechanisms as grid patches, but without the expensive regridding step. Hooray!
Um, I'm not familiar with what you mean by "black triangle", I think it means you had a successful moment when things all came together, but I'm guessing it's not this?
https://en.wikipedia.org/wiki/Black_triangle_(badge)
-- Stephen Skory s@skory.us http://stephenskory.com/ 510.621.3687 (google voice)
Hi Sam, On Tue, Apr 3, 2012 at 3:05 PM, Sam Skillman <samskillman@gmail.com> wrote:
Back to the point...
That looks great, Matt. I'll get some stuff going on libyt/gdf when I get the chance. I will also return to finishing up my part of the volume rendering refactor, which I think just needs to be touched up and would probably be ready for something more like 2.4.
Awesome! I think this definitely should aim for 2.4. For those of you reading along who haven't been following the vr-refactor bookmark, the basic idea is that volume rendering is now separated into traversal and accumulator functions. The accumulator functions are all written in nogil Cython, which allows them to be threaded. Additionally, operations like off-axis projections no longer interpolate (or evaluate the transfer function), so they run much faster. Combine the optimized accumulators/samplers with threading, and you get much improved speed. (For off-axis projections, which again are no longer interpolated, this translated to ~20x in my tests.) Also, for some operations (out-of-order ones, which include the old-style volume renderings with no alpha blending!) the volume no longer needs to be homogenized; it can be fed in directly with vertex-centered data. Sam's also done a lot of work to improve the functionality of the volume renderer; last I remember he had a pretty nice looking opaque surface with specular lighting. When 2.4 goes out we should also include some benchmarking data that shows off the speed increase, as well as the new capabilities. -Matt
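The traversal/accumulator split described above can be illustrated in a few lines of pure Python. This is a conceptual sketch, not the nogil Cython code in the vr-refactor bookmark. `traverse`, `projection_sampler` and `alpha_blend_sampler` are hypothetical names, and the opacity rule `alpha = min(1, value * length)` is a toy assumption. `traverse` walks the segments a ray passes through and hands each `(value, path_length)` pair to an accumulator. A projection accumulator only adds, so the order of segments does not matter (an out-of-order operation in the terms used above). An alpha-blending accumulator composites front to back, so its result depends on order.

```python
from typing import Callable, Iterable, List, Tuple

Segment = Tuple[float, float]  # (field value in a cell, path length of the ray inside it)
Accumulator = Callable[[List[float], float, float], None]


def traverse(segments: Iterable[Segment], accumulate: Accumulator, state: List[float]) -> List[float]:
    """Generic traversal: visits ray segments and delegates all math to `accumulate`."""
    for value, length in segments:
        accumulate(state, value, length)
    return state


def projection_sampler(state: List[float], value: float, length: float) -> None:
    # Line integral: a plain sum, so the segment order is irrelevant.
    state[0] += value * length


def alpha_blend_sampler(state: List[float], value: float, length: float) -> None:
    # Front-to-back compositing; state = [accumulated color, remaining transmittance].
    color, transmittance = state
    alpha = min(1.0, value * length)  # toy opacity model
    state[0] = color + transmittance * alpha * value
    state[1] = transmittance * (1.0 - alpha)


if __name__ == "__main__":
    segs = [(1.0, 0.5), (2.0, 0.25), (0.5, 1.0)]
    back = list(reversed(segs))
    print(traverse(segs, projection_sampler, [0.0]), traverse(back, projection_sampler, [0.0]))
    print(traverse(segs, alpha_blend_sampler, [0.0, 1.0]), traverse(back, alpha_blend_sampler, [0.0, 1.0]))
```

Keeping the traversal generic means a new rendering mode is just a new accumulator, and order-independent accumulators are the ones that can be fed data in any order.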
I also had some thoughts on the parameter file vs dataset, but perhaps I'll move those comments to the doc.
Sam
participants (5)
- Casey W. Stark
- Matthew Turk
- Nathan Goldbaum
- Sam Skillman
- Stephen Skory