[AstroPy] pyfits: checksum/datasum not both required in fits files
embray at stsci.edu
Thu Sep 5 17:12:39 EDT 2013
On 09/04/2013 06:00 PM, John K. Parejko wrote:
> It looks like the fits checksum proposal does not require both DATASUM and CHECKSUM in the header (2nd paragraph of the PDF):
> verify_checksum in pyfits 3.1.2 does not calculate a matching checksum if "CHECKSUM" is present, but "DATASUM" is not, because it assumes datasum is 0. If I pre-compute datasum and pass that to hdu._calculate_checksum(), I do get the matching checksum.
> This problem came up because we have some data written with nom.tam.fits v1.06.0, which only writes CHECKSUM, not DATASUM. We've hacked around it when reading the data with pyfits, but it'd be nice to not need that hack. Unfortunately, the correct behavior seems unspecified in the proposal, in the case where CHECKSUM is present but DATASUM is not.
> Should pyfits relax this requirement, and attempt to pre-compute the data checksum before calling _calculate_checksum(), instead of assuming 0?
I agree, this is a bug. Looking over the code, it seems PyFITS is trying to be
"clever" by using the already calculated DATASUM and just adding to it the
checksum of the header to produce the full CHECKSUM value.
Of course, if the DATASUM keyword isn't present it should just compute the full
checksum of the HDU including the data portion. Instead it seems to be assuming
that if the DATASUM keyword is absent then there is no data for which to compute
a checksum.
I think that it's far more "typical" for a FITS HDU to have both DATASUM and
CHECKSUM in the header, rather than just one or the other. But I agree, the
specification does not require both.
(If I had to wager a guess, I'd say this confusion might have come in when
whoever implemented this was following section A.2 of the spec: "Recommended
CHECKSUM Keyword Implementation". In step 3 it reads "Calculate the checksum
for the entire HDU by adding ... the checksum accumulated over the header
records to the checksum accumulated over the data records (i.e., the previously
computed DATASUM keyword value)." Even the specification *assumes* that the
DATASUM keyword has been computed (and nowhere does it specify what to do if it
hasn't been, but I assume the answer is to 'compute it' even if it's not saved
in the header).)
> What is the correct datasum/checksum for unsigned int data with BZERO=32768,
> BSCALE=1? Should it be calculated on the original bits, or on the scaled
> values?
The checksum should use the raw bytes.
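
As a rough illustration of why the distinction matters, the sketch below packs the same four pixel values two ways. It assumes 16-bit unsigned data stored with the conventional offset BZERO=32768 and BSCALE=1; the helper name is made up for this example and nothing here is pyfits code.

```python
import struct

BZERO = 32768  # assumed: conventional offset for unsigned 16-bit data


def raw_bytes_for_uint16(values):
    """Bytes as stored in the file: big-endian signed int16 of (value - BZERO)."""
    stored = [v - BZERO for v in values]
    return struct.pack(f">{len(stored)}h", *stored)


physical = [0, 1, 32768, 65535]

# What is actually on disk, and what the checksum should be computed over.
raw = raw_bytes_for_uint16(physical)

# What you would get by packing the scaled (physical) values directly.
scaled = struct.pack(f">{len(physical)}H", *physical)
```

The two byte strings differ in the top bit of every pixel, so a checksum accumulated over the scaled values will not match one accumulated over the raw bytes.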
> I ask because pyfits and IDL's fits_test_checksum seem to
> disagree on the correct checksums for unsigned int data. Also, pyfits
> transparently removes BZERO/BSCALE when writing files if "uint=True" is
> specified on open().
I think this might be the usual confusion about how pyfits automatically
rescales the data if you open a file and then write it to a new file with
(the symptoms are different, but the root cause is the same).
> John Parejko
> john.parejko at yale.edu
> 203 432-9759
> JWG 465
> Department of Physics
> Yale University
> New Haven, CT