[AstroPy] new pyfits version deletes NP_pyfits, breaking pickle
jh at physics.ucf.edu
Fri Nov 12 10:39:21 EST 2010
Stefan, you make very good points, and we're aware of them. We do
rigorously separate code and data, and we do save our final
lightcurves and tables as FITS. We also save PNG and PS plots (each
generated automatically in both projection and print versions), tables
in both straight ASCII and LaTeX, and the text output of each run.
But internally, we have to rely on complex Python objects to manage in
a simple way what has become a very complex analysis. The analysis
has branches, so we need to save at each branch point so that we can
start parallel sessions that will load from that branch point and
handle each of the branches (otherwise we end up reading the same
gigabyte data set and finding all the bad pixels in it 50 times, for
each of 100 data sets). It's the short delays between running the
early stages and doing the 50 variants that's giving us the headache.
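The branch-point saving described above can be sketched roughly as follows. This is only an illustration under assumptions: the function names, the dictionary standing in for the pipeline state, and the aperture_radius setting are all hypothetical and are not taken from the actual pipeline.

```python
import pickle
import tempfile
from pathlib import Path


def save_checkpoint(state, path):
    """Write the shared pipeline state to disk at a branch point."""
    with open(path, "wb") as fh:
        pickle.dump(state, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_checkpoint(path):
    """Reload the state that was saved at a branch point."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def run_branch(checkpoint_path, aperture_radius):
    """Run one variant, starting from the checkpoint instead of the raw data."""
    state = load_checkpoint(checkpoint_path)
    # Hypothetical variant step: record the setting this branch uses.
    state["aperture_radius"] = aperture_radius
    return state


# The expensive early stages run once (stand-in values, not real data).
shared_state = {"dataset": "example", "bad_pixels": [(3, 7), (12, 40)]}
checkpoint = Path(tempfile.mkdtemp()) / "branch_point.pkl"
save_checkpoint(shared_state, checkpoint)

# Each variant could run in its own session; all of them load the same file.
results = [run_branch(checkpoint, radius) for radius in (2.0, 2.5, 3.0)]
```

Each branch gets its own copy of the state, so the bad-pixel search is done once rather than once per variant. The cost is that the checkpoint is only loadable while every class it references can still be imported.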
Our archiving is simply to save the code (under SVN, with version
numbers of everything recorded in each run), input data, and ancillary
stuff that makes each analysis unique, and record how to run the
pipeline to regenerate what we need.
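One possible way to record version numbers with each run is sketched below. It assumes the code revision is supplied by the caller (for example, a string obtained from the version control system); the function names, the record layout, and the example command are hypothetical.

```python
import importlib
import json
import platform
from datetime import datetime, timezone


def module_versions(names):
    """Map each importable module name to its __version__, if it has one."""
    versions = {}
    for name in names:
        module = importlib.import_module(name)
        versions[name] = str(getattr(module, "__version__", "unknown"))
    return versions


def run_record(command, module_names, code_revision):
    """Collect what would be needed to describe and repeat one pipeline run."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "command": list(command),
        "python": platform.python_version(),
        "platform": platform.platform(),
        "code_revision": code_revision,  # supplied by the caller
        "modules": module_versions(module_names),
    }


# Hypothetical run: the command line and revision string are placeholders.
record = run_record(["python", "pipeline.py", "config.txt"], ["json"], "r1234")
record_text = json.dumps(record, indent=2, sort_keys=True)
```

Writing record_text next to the outputs of each run keeps the description of the software stack together with the results it produced.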
We update our OS and Python stack annually, and we archive those
(including sources) so that we can in principle rerun under old
software, though I hope that will never be necessary. My experience
has been that when questions are raised about old projects, they're
usually answered by looking at the code and work logs, along with the
plots, figures, and final data saved at the time.
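For readers wondering why removing a module breaks saved pickles at all: a pickle stores the import path of each object's class, not the class itself, so loading needs that module to still exist. The sketch below shows the effect with a throwaway module. The module name old_layout_demo and the Header class are made up for the demonstration and are not part of pyfits.

```python
import pickle
import sys
import types

# Build a throwaway module standing in for a library module that a later
# release removes. The name is hypothetical.
MODULE_NAME = "old_layout_demo"
old_module = types.ModuleType(MODULE_NAME)
exec(
    "class Header:\n"
    "    def __init__(self, cards):\n"
    "        self.cards = cards\n",
    old_module.__dict__,
)
sys.modules[MODULE_NAME] = old_module

# The pickle records "old_layout_demo.Header" plus the instance's attributes.
blob = pickle.dumps(old_module.Header({"EXPTIME": 30.0}))

# While the module is importable, loading works.
loaded_before = pickle.loads(blob)

# Simulate an upgrade that deletes the module.
del sys.modules[MODULE_NAME]

error_after = None
try:
    pickle.loads(blob)
except ImportError as exc:
    # The class can no longer be located, so the saved object is unreadable.
    error_after = exc
```

The data in the byte string is unchanged between the two loads; only the software environment differs, which is why a stack upgrade between an early stage and a later one can make a checkpoint unusable.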
> Date: Fri, 12 Nov 2010 10:29:53 +0100
> From: Stefan Schwarzburg <stefan.schwarzburg at googlemail.com>
> Subject: Re: [AstroPy] new pyfits version deletes NP_pyfits, breaking pickle
> To: astropy at scipy.org
> <AANLkTikAr47p55mhAhEDf-vLdjhri_iC-dCB6r3AgUmK at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
> Dear Joe,
> I know that you did not ask for this advice, but I feel the need to share
> this with you anyway.
> I've seen projects in the past that had problems similar to the ones you describe
> (although they did not use Python but ROOT or R), and I think there is only
> one real solution to this:
> You should try to separate the data and logic and store the data in a
> standard file format that will be readable in the far future. In astronomy
> this is the FITS format. Of course it is not able to do a lot of the cool
> things you can do with pickled objects and it has some strange restrictions
> that come from the time when FITS was supposed to be saved onto tapes, but
> this is actually its strength. Just as you are able to open FITS files that
> were saved 20 years ago, you will be able to open these files in 20 years.
> Probably it will not be the same software (I guess there is a reason why you
> chose python and not algol or fortran77) but the data will still be there.
> With pickled objects you will definitely get the same problems over and over
> in the future. Pickled objects are great, because you can send them over a
> network and so on, but they are not a way to archive data.
> FITS files are good enough for all data of all astronomy projects I've seen
> so far, although you sometimes have to give up a certain style of thinking,
> but in the end you are always able to save what you need in images, tables
> and meta-data in the headers.
> As I said, this was not what you asked for, but I think I needed to tell you
> that large astronomy collaborations started with something similar to your
> setup and in the end had to learn the hard way that this does not work and
> ties them to old software that they would like to give up (and will switch
> to FITS in the future...).
> Best regards,
> Institut für Astronomie und Astrophysik
> Eberhard Karls Universität Tübingen
> Sand 1 - D-72076 Tübingen