[AstroPy] Read FITS headers without loading data

Erik Bray embray at stsci.edu
Mon Sep 8 12:21:08 EDT 2014


On 09/08/2014 11:58 AM, Gary Bernstein wrote:
> Thanks for setting me straight Erik.
>
> Typically 60 extensions per FITS file.  Each is a binary table of cataloged objects from one CCD of a mosaic camera.  So there are many keywords in the header and perhaps I am just seeing the processing time for this.

Thanks for clarifying-- indeed 60 headers each with many keywords is going to 
have an impact.  One way in which CFITSIO is different is that it does not 
actually parse the entire header at once, in general.  Instead when you request 
a particular keyword, for example, it will seek around until it finds it.
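
To illustrate the idea of looking up one keyword on demand, here is a minimal 
sketch in plain Python.  It is not CFITSIO's actual code: the helper names 
(make_card, make_header, find_keyword) and the CCDNUM keyword are made up for 
the example, and the value handling is deliberately naive.  It only relies on 
the FITS layout of 80-character cards in 2880-byte blocks.

```python
import io

CARD = 80      # a FITS header card is 80 ASCII characters
BLOCK = 2880   # headers are padded to a multiple of 2880 bytes


def make_card(keyword, value):
    """Build one 80-character card (hypothetical helper, demo only)."""
    return f"{keyword:<8}= {value:>20}".ljust(CARD).encode("ascii")


def make_header(pairs):
    """Build a space-padded header block from (keyword, value) pairs."""
    raw = b"".join(make_card(k, v) for k, v in pairs) + b"END".ljust(CARD)
    return raw + b" " * (-len(raw) % BLOCK)


def find_keyword(stream, keyword):
    """Scan cards one at a time and stop at the first match.

    Returns the raw value text, or None if END is reached first.
    No other card is parsed or stored along the way.
    """
    target = keyword.ljust(8).encode("ascii")
    while True:
        card = stream.read(CARD)
        if len(card) < CARD or card[:8] == b"END     ":
            return None
        if card[:8] == target and card[8:10] == b"= ":
            # Naive: drops a trailing "/ comment" and ignores quoted strings.
            return card[10:].split(b"/")[0].strip().decode("ascii")


header = make_header(
    [("SIMPLE", "T"), ("BITPIX", "8"), ("NAXIS", "0"), ("CCDNUM", "17")]
)
ccdnum = find_keyword(io.BytesIO(header), "CCDNUM")
```

The cost of a lookup like this depends on how far into the header the keyword 
sits, not on how many keywords the header holds in total.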

That is generally very fast, but it is not appropriate for the richer data 
structures PyFITS provides, where parsing much of the header in advance is 
necessary (though there are still a lot of shortcuts and lazy loading going on 
there too).
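
As a rough picture of what lazy loading can mean here, the sketch below 
indexes the keywords up front but only converts a value to a Python object the 
first time it is requested.  This is an assumption-laden toy, not how PyFITS 
is actually implemented: the LazyHeader class and its _parse method are 
hypothetical, and the value parsing ignores quoted strings containing "/".

```python
class LazyHeader:
    """Hypothetical container: index cards eagerly, parse values lazily."""

    def __init__(self, raw):
        self._cards = {}    # keyword -> raw 80-character card text
        self._values = {}   # keyword -> parsed value, filled in on demand
        for start in range(0, len(raw), 80):
            card = raw[start:start + 80]
            key = card[:8].strip()
            if key == "END":
                break
            if key:
                self._cards[key] = card

    def __getitem__(self, key):
        # Parse the card only on first access, then reuse the cached value.
        if key not in self._values:
            self._values[key] = self._parse(self._cards[key])
        return self._values[key]

    @staticmethod
    def _parse(card):
        # Naive conversion: logical, then integer, then float, else string.
        text = card[10:].split("/")[0].strip()
        if text in ("T", "F"):
            return text == "T"
        for convert in (int, float):
            try:
                return convert(text)
            except ValueError:
                pass
        return text.strip("'").strip()


# Demo header text; the keywords and values are invented for the example.
pairs = [("SIMPLE", "T"), ("BITPIX", "8"), ("EXPTIME", "90.5")]
raw = "".join(f"{k:<8}= {v:>20}".ljust(80) for k, v in pairs) + "END".ljust(80)

hdr = LazyHeader(raw)
exptime = hdr["EXPTIME"]   # only this one card's value has been parsed so far
```

Even in this toy version every card still has to be read and indexed once, 
which is the kind of per-keyword cost that adds up over 60 headers.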

Actually, if you wouldn't mind sending me a sample file, that would be very 
helpful.  I don't have time to work on it right now, but "out of the ordinary" 
files like that are often useful for tracking down areas where performance 
can be improved.

Thanks,

Erik

> On Sep 8, 2014, at 11:14 AM, Erik Bray <embray at stsci.edu> wrote:
>
>> On 09/08/2014 10:40 AM, Gary Bernstein wrote:
>>> I would like to extract the header information from a large multi-extension FITS file using astropy.io.fits.  This runs very slowly, and I suspect that is because it is reading the data of each extension when the header is accessed, e.g. via
>>>
>>> import astropy.io.fits as pf
>>> f = pf.open('mef.fits')
>>> for hdu in f:
>>>      h = hdu.header
>>>      # … do stuff with header…
>>>
>>> It is *much* faster using e.g. cfitsio utilities (<<1 second compared to 10’s of seconds for the above on a multiple-GB file).  Am I correct that the data is being loaded for each extension in the above method?  If so is there a workaround?  Neither using getheader nor (un)setting memmap seems to make a difference.
>>
>> No, it doesn't touch the data when just reading headers.  The difference depends
>> largely on what you're doing with the headers, though in general it's due to the
>> fact that CFITSIO is written in C while PyFITS is written in pure Python, and
>> does a lot more to parse headers into an in-memory data structure.
>>
>> That said, how many headers are in this file?  It shouldn't take "10's of
>> seconds" though again that might depend in part on what you're doing.
>>
>> Erik
>>
>> _______________________________________________
>> AstroPy mailing list
>> AstroPy at scipy.org
>> http://mail.scipy.org/mailman/listinfo/astropy
>>