Re: [yt-dev] Soliciting opinions about how to handle data tables for spectral integrator port

Hi,
I'm okay with this, but let's try to make it obvious where things go. (which you have all suggested.)
I think astropy does this and has helper functions for it.
Matt
On Dec 8, 2016 9:28 AM, "Britton Smith" <brittonsmith@gmail.com> wrote:
Hi all,
I agree that it would be best to keep the data files out of the repo. What about adding some config variables to control where the data gets stored? We could even have it default to storing things in .config/yt. Perhaps, on the first import when the data is not there, you just get a notice with links to wget/curl them.
Britton
On Wed, Dec 7, 2016 at 12:12 PM, Nathan Goldbaum <nathan12343@gmail.com> wrote:
On Wed, Dec 7, 2016 at 1:30 PM, John Zuhone <jzuhone@gmail.com> wrote:
Hi everyone,
I just issued PR 2465 (https://bitbucket.org/yt_analysis/yt/pull-requests/2465/wip-port-the-spectral_integrator-analysis/) to port the spectral integrator analysis module (originally written by Britton) under yt.fields. spectral_integrator is an analysis module that creates X-ray emission fields in user-specified energy bands.
The hitch is that spectral_integrator uses HDF5 tables to compute the emissivity, since they are not analytical functions. We currently host those tables on yt-project.org/data. spectral_integrator downloads them automatically if the analysis module is used and they are not present.
There was some discussion on Slack as to whether or not this is the correct approach, since it’s not ideal for certain computing environments (e.g., HPC).
This is also a problem for, e.g., packaging yt on Linux distros.
The files are a total of about 2.4 MB in size, so there is some reticence to bundling them with yt.
So my main concern here is adding them to the repo. If you could figure out a way to add them to the source distribution without including them in the repo (and then issue an error with a URL for non-sdist builds of yt [e.g. from the repo]), it wouldn't be a problem.
This isn't really a generic solution for all binary files we might want to send to users, but 2.4 MB isn't all that big to be part of an sdist or wheel.
There is also the issue that uploading new versions of the tables breaks backward compatibility. I have sacrificed backward compatibility for considerable simplification of code.
I think the solution is just to use different names for the files, or for the HDF5 datasets or groups, so the code can tell the difference.
Does anyone have any ideas about the best way to handle this issue?
Best,
John Z
_______________________________________________ yt-dev mailing list yt-dev@lists.spacepope.org http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
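One way to read the suggestions above (a configurable storage location that defaults to .config/yt, a notice with wget/curl links when the tables are missing, and distinct file names per table version) is the minimal Python sketch below. The environment variable `YT_XRAY_DATA_DIR`, the file-name pattern, the version number, the `"example"` table label, and the `http://` scheme on the host are all hypothetical, chosen only for illustration; none of this is yt's actual API.

```python
import os
from pathlib import Path

# Host named in the thread; the scheme is an assumption.
DATA_URL = "http://yt-project.org/data"
# Hypothetical: bump this whenever the table layout changes.
TABLE_VERSION = 2
# Hypothetical override for where the tables are stored.
ENV_VAR = "YT_XRAY_DATA_DIR"


def table_filename(table_type, version=TABLE_VERSION):
    """Put the version in the file name so old and new tables never collide."""
    return f"{table_type}_emissivity_v{version}.h5"


def data_dir():
    """Return the override directory if set, else ~/.config/yt."""
    override = os.environ.get(ENV_VAR)
    if override:
        return Path(override).expanduser()
    return Path.home() / ".config" / "yt"


def find_table(table_type):
    """Return the path to a table, or raise with download instructions."""
    path = data_dir() / table_filename(table_type)
    if not path.is_file():
        url = f"{DATA_URL}/{path.name}"
        raise FileNotFoundError(
            f"Emissivity table {path.name} not found in {path.parent}.\n"
            f"Download it with one of:\n"
            f"  wget -P {path.parent} {url}\n"
            f"  curl -o {path} {url}"
        )
    return path
```

Because nothing is fetched automatically here, the same code path works on machines without network access: the user copies the file into place once, and `find_table("example")` then returns its path.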

Agreed that these should stay out of the repo. Trident has this issue as well, and it automatically downloads them when the code is used the first time. It also has a permanent location where HPC users and such can download them directly and put them in the appropriate directory. I think this is about as good as you can get to making it work for everyone. Just document it and hope people read the docs before using.
Cameron
On Thu, Dec 8, 2016 at 7:34 AM, Matthew Turk <matthewturk@gmail.com> wrote:
Hi,
I'm okay with this, but let's try to make it obvious where things go. (which you have all suggested.)
I think astropy does this and has helper functions for it.
Matt
-- Cameron Hummels NSF Postdoctoral Fellow Department of Astronomy California Institute of Technology http://chummels.org
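The download-on-first-use pattern Cameron describes, with a manual fallback for machines without network access, might look roughly like the Python sketch below. This is not Trident's or yt's actual code: the function name `ensure_table`, its parameters, and the `.part` temporary suffix are hypothetical. The `fetch` argument exists only so the download step can be swapped out (for example in tests); by default it is the standard library's `urllib.request.urlretrieve`.

```python
import urllib.request
from pathlib import Path


def ensure_table(filename, directory, base_url,
                 fetch=urllib.request.urlretrieve):
    """Return the local path to a data file, downloading it on first use.

    If the download fails (e.g. on an HPC compute node with no network),
    raise an error that says where to get the file and where to put it.
    """
    directory = Path(directory)
    path = directory / filename
    if path.is_file():
        return path  # already present: no network access needed

    url = f"{base_url}/{filename}"
    directory.mkdir(parents=True, exist_ok=True)
    # Download to a temporary name so an interrupted transfer never
    # leaves a truncated file under the final name.
    tmp = path.with_name(path.name + ".part")
    try:
        fetch(url, tmp)
        tmp.replace(path)
    except OSError as exc:  # urllib's URLError is a subclass of OSError
        tmp.unlink(missing_ok=True)
        raise RuntimeError(
            f"Could not download {url}.\n"
            f"Fetch it manually and place it in {directory}."
        ) from exc
    return path
```

The early return when the file already exists is what makes the manual route work: once the file has been copied into the directory by hand, no download is ever attempted.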

I agree with Cameron. I would simply add that, from a user's perspective, it's nice to have a reminder to download that data if it's not currently in the expected location.
Cheers,
Jason
----
Jason Galyardt
Dept. of Physics and Astronomy
University of Georgia
On Thu, Dec 8, 2016 at 11:26 AM, Cameron Hummels <chummels@gmail.com> wrote:
Agreed that these should stay out of the repo. Trident has this issue as well, and it automatically downloads them when the code is used the first time. It also has a permanent location where HPC users and such can download them directly and put them in the appropriate directory. I think this is about as good as you can get to making it work for everyone. Just document it and hope people read the docs before using.
Cameron
participants (3)
- Cameron Hummels
- Jason Galyardt
- Matthew Turk