[AstroPy] reading one line from many small fits files

Erin Sheldon erin.sheldon at gmail.com
Fri Aug 3 19:29:15 EDT 2012


On Fri, Aug 3, 2012 at 6:06 PM, Erik Bray <embray at stsci.edu> wrote:
>> This is a fundamental limitation for structured numpy arrays and the
>> memmap interface.
>
>
> Indeed--exactly my point.  If I were to write PyFITS from scratch I would
> probably not rely on Numpy as my primary I/O interface, but would instead
> serve up Numpy arrays as a high-level abstraction.  Under the hood I would
> probably do something more closely resembling CFITSIO's buffer ring (the
> returned Numpy arrays would of course not be contiguous by default).
>
>
>> I developed a C code to get around this limitation and added it to
>> numpy.  This can be used by pyfits for tables (but not variable length
>> columns). Unfortunately that project got a bit hijacked by other
>> developers because they want the ascii reading portion of the code to do
>> type inference. This has dramatically increased the scope of the project
>> and delayed things indefinitely.  I am no longer involved, but it is my
>> understanding that this functionality should be available in an upcoming
>> version of numpy.
>>
>> -e
>
>
> You'll have to tell me more about exactly what this Numpy development is--if
> it's something I can still incorporate into PyFITS (never mind what the Numpy
> people are doing with it) it could be very helpful.  We can take that
> discussion offlist if you want.

Hi Erik -

I'll say a bit here, since it may be of general interest.

The issue I addressed with my numpy addition was grabbing arbitrary
sub-elements of a file, ascii or binary, directly using a memmap-like
interface.  On many systems, notably OS X, the memmap interface to
binary files is somehow broken and will actually eat a lot of real
memory.  Furthermore, the ascii reading routines that come with numpy
are inefficient for inhomogeneous tabular data.

So the code I developed makes reading binary and ascii files efficient,
and it is built right into numpy behind a unified interface.  The main
portion of a FITS table falls under these categories.  The variable
length columns must be dealt with separately.
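
To make the access pattern concrete, here is a minimal sketch of
pulling a few rows of one column out of a binary file of fixed-length
records.  It uses only stock numpy (np.memmap with a structured dtype),
not the recfile code, so it shows the kind of request being discussed
rather than how recfile serves it.  The record layout, file name and
helper functions are made up for illustration.

```python
import os
import tempfile

import numpy as np

# Hypothetical record layout, chosen only for this example.
dtype = np.dtype([("id", "<i4"), ("ra", "<f8"), ("dec", "<f8")])


def write_demo_table(path, nrows=1000):
    """Write nrows fixed-length binary records to path and return them."""
    table = np.zeros(nrows, dtype=dtype)
    table["id"] = np.arange(nrows)
    table["ra"] = np.linspace(0.0, 360.0, nrows, endpoint=False)
    table["dec"] = np.linspace(-90.0, 90.0, nrows)
    table.tofile(path)
    return table


def read_subset(path, rows, column):
    """Return one column for the selected rows as an in-memory array."""
    # The number of records is inferred from the file size.
    mm = np.memmap(path, dtype=dtype, mode="r")
    # Fancy indexing copies only the requested elements out of the map.
    return np.array(mm[column][rows])


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "table.bin")
    full = write_demo_table(path)
    ra = read_subset(path, [3, 500, 999], "ra")
```

Whether a request like this touches only the pages it needs depends on
the platform's memmap behaviour, which is exactly the weak point
described above.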



Some technical details:

The code in numpy is nearly stand-alone and could be ripped out and used
anywhere.  My fork is here:
    https://github.com/esheldon/numpy
See numpy.recfile.  It includes some nice contributions from others.

As a reference, an early C++ incarnation of the code is here
    http://code.google.com/p/recfile/
The version incorporated into numpy is pure C and is, frankly, far
superior.

I've been working with the view that a structured ndarray is an adequate
representation for FITS table data.  It can represent all the types in
some way, even variable length columns as object arrays.  It is
important to me to have the data stored in a standard numpy object for
interoperability with other libraries.  I think it is adequate to make
the HDU classes smart and understand what they are doing during input
and output.  I agree the fitsio hdu class should probably be factored.


-- 
Erin Scott Sheldon
Brookhaven National Laboratory erin dot sheldon at gmail dot com


