[AstroPy] reading one line from many small fits files

Mon Jul 30 20:57:12 EDT 2012

Hi John,

On 31.07.2012, at 1:40AM, "John K. Parejko" <john.parejko at yale.edu> wrote:

> This is really more of a pyfits question, but I've upgraded to pyfits 3.1 (SVN), which is the version in astropy.
> 
> I have data stored in thousands of ~few MB .fits files (photoObj files from SDSS) totaling a few TB of data, and I know the one single line I want to extract from some known subset of those files. But pyfits is taking more than a second per file to extract the fields I want, which seems very long, especially if it is using memmapped access, and thus should only have to read that single line (plus the header) from each file.
> 
> I'm doing something like this:
> 
>    result = np.empty(len(data),dtype=dtype)
>    for i,x in enumerate(data):
>        filename = getfilename(x[somefield])
>        photo = pyfits.open(filename,memmap=True)
>        result[i] = photo[1].data[x[otherfield]-1]
> 
> Is there a better way to go about this? Is pyfits known to be quite slow when reading a single row from a lot of different files? Anyone have suggestions on how to speed this up?

That seems quite slow; it takes me about 50 ms to read a random line from the DR8 example file
with pyfits 3.0.2. Unless the file access itself takes that long, something appears to be odd.
The only thing that comes to mind right now is that pyfits supports scaled column data (similar to
BSCALE/BZERO in image HDUs, I assume), and if such keywords were present, they would probably
cause a corresponding transformation of the entire bintable. They don't seem to exist in the standard
SDSS files, though.
Naïve question: do you call photo.close() after each read?
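
For what it's worth, here is a minimal sketch of the loop with each file closed right after
its row is read, plus a per-file timing so that slow file access can be told apart from slow
pyfits. The name fill_rows and the open_fits argument are made up for illustration; open_fits
is assumed to be pyfits.open (or astropy.io.fits.open), and result is assumed to be a
preallocated NumPy structured array as in your loop.

```python
import time
from contextlib import closing


def fill_rows(result, requests, open_fits=None, ext=1):
    """Read one table row per file into result[i]; return seconds spent per file.

    requests is a sequence of (filename, row_index) pairs (hypothetical layout).
    open_fits is any callable with the pyfits.open call form.
    """
    if open_fits is None:
        # Assumption: astropy is installed; with pyfits 3.x pass pyfits.open instead.
        from astropy.io import fits
        open_fits = fits.open

    timings = []
    for i, (filename, row) in enumerate(requests):
        start = time.perf_counter()
        # closing() calls hdulist.close() even if the read raises,
        # so file handles and memory maps do not pile up across files.
        with closing(open_fits(filename, memmap=True)) as hdulist:
            # With a NumPy structured array as result, this assignment copies
            # the values while the file is still open.
            result[i] = hdulist[ext].data[row]
        timings.append(time.perf_counter() - start)
    return timings
```

Calling it as timings = fill_rows(result, pairs, open_fits=pyfits.open), with pairs a
hypothetical list of (filename, row index) tuples, should show whether the time per file
stays constant or grows as more files are processed.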

Cheers,
						Derek