Soliciting opinions about how to handle data tables for spectral integrator port

Hi everyone,
I just issued PR 2465 (https://bitbucket.org/yt_analysis/yt/pull-requests/2465/wip-port-the-spectra...) to port the spectral integrator analysis module (originally written by Britton) under yt.fields. spectral_integrator is an analysis module that creates X-ray emission fields in user-specified energy bands.
The hitch is that spectral_integrator uses HDF5 tables to compute the emissivity, since they are not analytical functions. We currently host those tables on yt-project.org/data. spectral_integrator downloads them automatically if the analysis module is used and they are not present.
There was some discussion on Slack as to whether or not this is the correct approach, since it’s not ideal for certain computing environments (e.g., HPC).
The files are a total of about 2.4 MB in size, so there is some reticence to bundling them with yt.
There is also the issue that uploading new versions of the tables breaks backwards-compatibility. I have sacrificed backwards-compatibility for considerable simplification of code.
Does anyone have any ideas about the best way to handle this issue?
Best,
John Z

On Wed, Dec 7, 2016 at 1:30 PM, John Zuhone <jzuhone@gmail.com> wrote:
> Hi everyone,
> I just issued PR 2465 (https://bitbucket.org/yt_analysis/yt/pull-requests/2465/wip-port-the-spectral_integrator-analysis/) to port the spectral integrator analysis module (originally written by Britton) under yt.fields. spectral_integrator is an analysis module that creates X-ray emission fields in user-specified energy bands.
> The hitch is that spectral_integrator uses HDF5 tables to compute the emissivity, since they are not analytical functions. We currently host those tables on yt-project.org/data. spectral_integrator downloads them automatically if the analysis module is used and they are not present.
> There was some discussion on Slack as to whether or not this is the correct approach, since it’s not ideal for certain computing environments (e.g., HPC).
Also a problem for e.g. packaging yt on Linux distros.
> The files are a total of about 2.4 MB in size, so there is some reticence to bundling them with yt.
So my main concern here is adding them to the repo. If you could figure out a way to add them to the source distribution without including them in the repo (and then issue an error with a URL for non-sdist builds of yt [e.g. from the repo]), it wouldn't be a problem. This isn't really a generic solution for all binary files we might want to send to users, but 2.4 MB isn't all that big to be part of an sdist or wheel.
> There is also the issue that uploading new versions of the tables breaks backwards-compatibility. I have sacrificed backwards-compatibility for considerable simplification of code.
I think the solution is just to use different names for the files, or for the HDF5 datasets or groups, so the code can tell the versions apart.
> Does anyone have any ideas about the best way to handle this issue?
> Best,
> John Z
_______________________________________________ yt-dev mailing list yt-dev@lists.spacepope.org http://lists.spacepope.org/listinfo.cgi/yt-dev-spacepope.org
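A minimal sketch of the "different names" idea from the reply above, assuming the table version is encoded in the file name. The naming template, the `model` labels, and both helper functions are hypothetical and are not part of yt or of the PR.

```python
import os

# Hypothetical naming scheme: the table version is part of the file name,
# so old and new tables can sit side by side in the same directory.
TABLE_TEMPLATE = "{model}_emissivity_v{version}.h5"


def table_filename(model, version):
    """Build the versioned file name for an emissivity table."""
    return TABLE_TEMPLATE.format(model=model, version=version)


def find_table(data_dir, model, version):
    """Return the path of the requested table version or raise IOError."""
    path = os.path.join(data_dir, table_filename(model, version))
    if not os.path.isfile(path):
        raise IOError(
            "Emissivity table %s (version %s) not found in %s"
            % (model, version, data_dir)
        )
    return path
```

With a scheme like this, code written against an older table keeps asking for its own version, and uploading a new table never overwrites the file that older code expects.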

On 12/07/2016 02:12 PM, Nathan Goldbaum wrote:
> On Wed, Dec 7, 2016 at 1:30 PM, John Zuhone <jzuhone@gmail.com> wrote:
>> Hi everyone,
>> I just issued PR 2465 (https://bitbucket.org/yt_analysis/yt/pull-requests/2465/wip-port-the-spectral_integrator-analysis/) to port the spectral integrator analysis module (originally written by Britton) under yt.fields. spectral_integrator is an analysis module that creates X-ray emission fields in user-specified energy bands.
>> The hitch is that spectral_integrator uses HDF5 tables to compute the emissivity, since they are not analytical functions. We currently host those tables on yt-project.org/data. spectral_integrator downloads them automatically if the analysis module is used and they are not present.
>> There was some discussion on Slack as to whether or not this is the correct approach, since it’s not ideal for certain computing environments (e.g., HPC).
Hi!
> Also a problem for e.g. packaging yt on Linux distros.
That went undetected because 1) it only happens at runtime and 2) the file is written to the cwd. The obvious drawback of the latter is that an unaware user may download that file during each execution. I'd suggest the following "upgrades":
1) Raise an error instead of autodownloading the files. That would allow checking whether the database has the minimal required version.
2) Make the location of the database configurable, e.g. via:
   yt config set xray cloudydb /foo/bar
3) Add an explicit download/update option such as:
   yt download extra_data
   This could list all available and registered auxiliary data, and let users choose what they need. That would also be a way for extensions to provide external data.
Cheers,
Kacper
>> The files are a total of about 2.4 MB in size, so there is some reticence to bundling them with yt.
> So my main concern here is adding them to the repo. If you could figure out a way to add them to the source distribution without including them in the repo (and then issue an error with a URL for non-sdist builds of yt [e.g. from the repo]), it wouldn't be a problem.
> This isn't really a generic solution for all binary files we might want to send to users, but 2.4 MB isn't all that big to be part of an sdist or wheel.
>> There is also the issue that uploading new versions of the tables breaks backwards-compatibility. I have sacrificed backwards-compatibility for considerable simplification of code.
> I think the solution is just to use different names for the files or the hdf5 datasets or groups here so the code can tell the difference.
>> Does anyone have any ideas about the best way to handle this issue?
>> Best,
>> John Z
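Suggestions 1) and 2) in the reply above could look roughly like the sketch below. It uses the standard-library `configparser` as a stand-in for yt's own configuration. The `[xray] cloudydb` option comes from the proposed `yt config set` command, while `locate_table`, the `http://` scheme on the download host, and the file names passed in are assumptions made for illustration. The minimum-version check is not covered.

```python
import configparser
import os

# Host named in the thread; the http:// scheme is an assumption.
DATA_URL = "http://yt-project.org/data"


def locate_table(config, filename):
    """Return the path to `filename` in the configured table directory.

    Nothing is downloaded: a missing setting or a missing file raises an
    error that tells the user where to get the table.
    """
    if not config.has_option("xray", "cloudydb"):
        raise RuntimeError(
            "No table directory configured; set the 'cloudydb' option "
            "in the [xray] section."
        )
    data_dir = config.get("xray", "cloudydb")
    path = os.path.join(data_dir, filename)
    if not os.path.isfile(path):
        raise RuntimeError(
            "%s not found in %s; download it from %s/%s"
            % (filename, data_dir, DATA_URL, filename)
        )
    return path
```

Because the lookup never writes to the current working directory, it also avoids the repeated-download problem described above.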

Hi all,
I agree that it would be best to keep the data files out of the repo. What about adding some config variables to control where the data gets stored? We could even have it default to storing things in .config/yt. Perhaps, on the first import when the data is not there, you just get a notice with links to wget/curl them.
Britton
On Wed, Dec 7, 2016 at 12:12 PM, Nathan Goldbaum <nathan12343@gmail.com> wrote:
> On Wed, Dec 7, 2016 at 1:30 PM, John Zuhone <jzuhone@gmail.com> wrote:
>> Hi everyone,
>> I just issued PR 2465 (https://bitbucket.org/yt_analysis/yt/pull-requests/2465/wip-port-the-spectral_integrator-analysis/) to port the spectral integrator analysis module (originally written by Britton) under yt.fields. spectral_integrator is an analysis module that creates X-ray emission fields in user-specified energy bands.
>> The hitch is that spectral_integrator uses HDF5 tables to compute the emissivity, since they are not analytical functions. We currently host those tables on yt-project.org/data. spectral_integrator downloads them automatically if the analysis module is used and they are not present.
>> There was some discussion on Slack as to whether or not this is the correct approach, since it’s not ideal for certain computing environments (e.g., HPC).
> Also a problem for e.g. packaging yt on Linux distros.
>> The files are a total of about 2.4 MB in size, so there is some reticence to bundling them with yt.
> So my main concern here is adding them to the repo. If you could figure out a way to add them to the source distribution without including them in the repo (and then issue an error with a URL for non-sdist builds of yt [e.g. from the repo]), it wouldn't be a problem.
> This isn't really a generic solution for all binary files we might want to send to users, but 2.4 MB isn't all that big to be part of an sdist or wheel.
>> There is also the issue that uploading new versions of the tables breaks backwards-compatibility. I have sacrificed backwards-compatibility for considerable simplification of code.
> I think the solution is just to use different names for the files or the hdf5 datasets or groups here so the code can tell the difference.
>> Does anyone have any ideas about the best way to handle this issue?
>> Best,
>> John Z
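The idea at the top of this message (store tables under .config/yt by default and print a notice with download commands when they are missing) might be sketched as follows. The function names, the `http://` scheme, and the file names passed in are hypothetical; only the .config/yt default and the yt-project.org/data host come from the thread.

```python
import os

# Host named in the thread; the http:// scheme is an assumption.
DATA_URL = "http://yt-project.org/data"


def default_data_dir():
    """Default table location suggested in the thread: ~/.config/yt."""
    return os.path.join(os.path.expanduser("~"), ".config", "yt")


def missing_table_notice(filenames, data_dir=None):
    """Return a notice with curl commands for tables absent from data_dir.

    Returns an empty string when every requested table is present.
    """
    data_dir = data_dir or default_data_dir()
    missing = [
        name for name in filenames
        if not os.path.isfile(os.path.join(data_dir, name))
    ]
    if not missing:
        return ""
    lines = ["The following emissivity tables were not found in %s:" % data_dir]
    for name in missing:
        target = os.path.join(data_dir, name)
        lines.append("  curl -o %s %s/%s" % (target, DATA_URL, name))
    return "\n".join(lines)
```

Printing the commands rather than running them leaves the download to the user, which suits environments without network access at runtime.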
participants (4): Britton Smith, John Zuhone, Kacper Kowalik, Nathan Goldbaum