[AstroPy] pyfits: checksum/datasum not both required in fits files

Erik Bray embray at stsci.edu
Thu Sep 5 17:12:39 EDT 2013


On 09/04/2013 06:00 PM, John K. Parejko wrote:
> Hello,
>
> It looks like the fits checksum proposal does not require both DATASUM and CHECKSUM in the header (2nd paragraph of the PDF):
>
> http://fits.gsfc.nasa.gov/registry/checksum.html
>
> verify_checksum in pyfits 3.1.2 does not calculate a matching checksum if "CHECKSUM" is present but "DATASUM" is not, because it assumes datasum is 0. If I pre-compute datasum and pass that to hdu._calculate_checksum(), I do get the matching checksum.
>
> This problem came up because we have some data written with nom.tam.fits v1.06.0, which only writes CHECKSUM, not DATASUM. We've hacked around it when reading the data with pyfits, but it'd be nice to not need that hack. Unfortunately, the correct behavior seems unspecified in the proposal, in the case where CHECKSUM is present but DATASUM is not.
>
> Should pyfits relax this requirement, and attempt to pre-compute the data checksum before calling _calculate_checksum(), instead of assuming 0?

I agree, this is a bug.  Looking over the code, it seems PyFITS is trying to be 
"clever" by using the already calculated DATASUM and just adding to it the 
checksum of the header to produce the full CHECKSUM value.

Of course, if the DATASUM keyword isn't present it should just compute the full 
checksum of the HDU including the data portion.  Instead it seems to be assuming 
that if the DATASUM keyword is absent then there is no data for which to compute 
the checksum.

I think that it's far more "typical" for a FITS HDU to have both DATASUM and 
CHECKSUM in the header, rather than just one or the other.  But I agree, the 
specification does not require both.

(If I had to wager a guess, I'd say this confusion might have come in when 
whoever implemented this was following section A.2 of the spec: "Recommended 
CHECKSUM Keyword Implementation".  In step 3 it reads "Calculate the checksum 
for the entire HDU by adding ... the checksum accumulated over the header 
records to the checksum accumulated over the data records (i.e., the previously 
computed DATASUM keyword value)."  Even the specification *assumes* that the 
DATASUM keyword has been computed, and nowhere does it specify what to do if it 
hasn't been, but I assume the answer is to 'compute it' even if it's not saved 
in the header.)
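
For illustration, here is a minimal Python sketch of what step 3 amounts to, 
and of the fallback when DATASUM is absent.  This is not the PyFITS code: the 
names ones_complement_sum32 and hdu_sum are made up for this example, and it 
leaves out the complement and ASCII-encoding steps that produce the actual 
CHECKSUM string.  It only assumes the 32-bit ones' complement sum over 
big-endian 4-byte words that the checksum proposal describes.

```python
import struct


def ones_complement_sum32(raw: bytes, initial: int = 0) -> int:
    """32-bit ones' complement sum of big-endian 4-byte words (hypothetical helper)."""
    if len(raw) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    total = initial + sum(word for (word,) in struct.iter_unpack(">I", raw))
    # End-around carry: fold anything above 32 bits back into the low bits.
    while total >> 32:
        total = (total & 0xFFFFFFFF) + (total >> 32)
    return total


def hdu_sum(header_raw: bytes, data_raw: bytes, datasum=None) -> int:
    """Sum over the whole HDU; compute the data sum if none was supplied."""
    if datasum is None:
        # DATASUM keyword missing: compute it from the data, do not assume 0.
        datasum = ones_complement_sum32(data_raw)
    # Step 3: add the header sum to the (possibly just computed) data sum.
    return ones_complement_sum32(header_raw, initial=datasum)


# Toy HDU: one 2880-byte header block and four 2880-byte data blocks.
header = b"SIMPLE  =                    T".ljust(2880)
data = bytes(range(256)) * 45
```

Adding the header sum to the data sum gives the same value as summing the 
whole HDU in one pass, which is why reusing a stored DATASUM is a valid 
shortcut, whereas substituting 0 for a missing DATASUM is not.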

 > What is the correct datasum/checksum for unsigned int data with BZERO=37268,
 > BSCALE=1? Should it be calculated on the original bits, or on the scaled
 > values?

The checksum should use the raw bytes.
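
A small sketch of the difference, using only the standard library.  It 
assumes 16-bit data and BZERO=32768 (the conventional offset for unsigned 
16-bit integers, chosen here for the example), and the helper names are 
hypothetical.  The data sum is taken over the bytes as stored in the file, 
i.e. the signed integers before BZERO is applied, not over the scaled values.

```python
import struct

BLOCK = 2880  # FITS data are padded to whole 2880-byte blocks


def pad_to_block(raw: bytes) -> bytes:
    """Zero-pad raw bytes up to a multiple of the FITS block size."""
    remainder = len(raw) % BLOCK
    return raw + b"\0" * (BLOCK - remainder) if remainder else raw


def ones_complement_sum32(raw: bytes) -> int:
    """32-bit ones' complement sum of big-endian 4-byte words."""
    total = sum(word for (word,) in struct.iter_unpack(">I", raw))
    while total >> 32:
        total = (total & 0xFFFFFFFF) + (total >> 32)
    return total


BZERO = 32768  # assumed value for this example
physical = [0, 32768, 65535]            # what the user sees as unsigned ints
stored = [v - BZERO for v in physical]  # what is actually in the file (int16)

raw_stored = pad_to_block(struct.pack(">%dh" % len(stored), *stored))
raw_scaled = pad_to_block(struct.pack(">%dH" % len(physical), *physical))

datasum_raw = ones_complement_sum32(raw_stored)     # sum over the file's bytes
datasum_scaled = ones_complement_sum32(raw_scaled)  # sum over rescaled values
```

The two sums differ, so a tool that sums the scaled values will not agree 
with one that sums the bytes on disk.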

 > I ask because pyfits and IDL's fits_test_checksum[1] appear to disagree
 > on the correct checksums for unsigned int data. Also, pyfits
 > transparently removes BZERO/BSCALE when writing files if "uint=True" is
 > specified on open().

I think this might be the usual confusion, about how pyfits automatically 
rescales the data if you open a file and then write it to a new file with 
writeto().  See

https://pyfits.readthedocs.org/en/v3.1.2/appendix/faq.html#why-is-an-image-containing-integer-data-being-converted-unexpectedly-to-floats

(the symptoms are different, but the root cause is the same).

Erik

> John
>
> --
> *************************
> John Parejko
> john.parejko at yale.edu
> http://www.astro.yale.edu/~jp727/
> 203 432-9759
> JWG 465
> Department of Physics
> Yale University
> New Haven, CT
> **************************