[AstroPy] Consider ASDF for hierarchical numpy data

Daniel D'avella ddavella at stsci.edu
Mon Dec 4 18:00:27 EST 2017


There are some good suggestions in this thread. If you do in fact need to serialize your data to disk and if you're not tied to FITS for other reasons, you might consider using the Advanced Scientific Data Format (ASDF) which is designed specifically for this purpose. Here's an example of how to use ASDF to store the data set you described:


>>> import asdf
>>> import numpy as np
>>> data = {'D1': np.linspace(0, 100, 8*4).reshape(8, 4),
...         'D2': np.linspace(0, 100, 10*5).reshape(10, 5),
...         'ND': {'D1': np.linspace(0, 100, 10*5).reshape(10, 5),
...                'D2': np.linspace(0, 100, 8*4).reshape(8, 4)}}

# Writing data to file on disk
>>> outfile = asdf.AsdfFile(data)
>>> outfile.write_to('data.asdf')

# Reading data from file on disk
>>> infile = asdf.open('data.asdf')
>>> infile.tree
{'D1': <array (unloaded) shape: [8, 4] dtype: float64>,
 'D2': <array (unloaded) shape: [10, 5] dtype: float64>,
 'ND': {'D1': <array (unloaded) shape: [10, 5] dtype: float64>,
        'D2': <array (unloaded) shape: [8, 4] dtype: float64>}}

# Data arrays can be accessed hierarchically from the top-level tree:
>>> infile.tree['D1']
array([[   0.        ,    3.22580645,    6.4516129 ,    9.67741935],
       [  12.90322581,   16.12903226,   19.35483871,   22.58064516],
       [  25.80645161,   29.03225806,   32.25806452,   35.48387097],
       [  38.70967742,   41.93548387,   45.16129032,   48.38709677],
       [  51.61290323,   54.83870968,   58.06451613,   61.29032258],
       [  64.51612903,   67.74193548,   70.96774194,   74.19354839],
       [  77.41935484,   80.64516129,   83.87096774,   87.09677419],
       [  90.32258065,   93.5483871 ,   96.77419355,  100.        ]])
>>> infile.tree['ND']
{'D1': <array (unloaded) shape: [10, 5] dtype: float64>,
 'D2': <array (unloaded) shape: [8, 4] dtype: float64>}

The metadata in the ASDF file is stored as human-readable YAML:

#ASDF 1.0.0
#ASDF_STANDARD 1.1.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.0.0
D1: !core/ndarray-1.0.0
  source: 0
  datatype: float64
  byteorder: little
  shape: [8, 4]
D2: !core/ndarray-1.0.0
  source: 1
  datatype: float64
  byteorder: little
  shape: [10, 5]
ND:
  D1: !core/ndarray-1.0.0
    source: 2
    datatype: float64
    byteorder: little
    shape: [10, 5]
  D2: !core/ndarray-1.0.0
    source: 3
    datatype: float64
    byteorder: little
    shape: [8, 4]
asdf_library: !core/software-1.0.0 {author: Space Telescope Science Institute, homepage: 'http://github.com/spacetelescope/asdf',
  name: asdf, version: 1.3.2.dev1044}

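The data arrays themselves are stored efficiently as binary blocks after the YAML header (the source field of each array above is the index of its block), and they can optionally be compressed.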

ASDF is also capable of serializing various types from Astropy including tables, Time objects, units and quantities, and some transforms and coordinates.

ASDF can be installed using pip:
$ pip install asdf

Basic documentation can be found here:

http://asdf.readthedocs.io/en/latest/


If you have any questions, feel free to open an issue in our GitHub repository:

https://github.com/spacetelescope/asdf


________________________________
From: AstroPy <astropy-bounces+ddavella=stsci.edu at python.org> on behalf of astropy-request at python.org <astropy-request at python.org>
Sent: Monday, December 4, 2017 4:51 PM
To: astropy at python.org
Subject: AstroPy Digest, Vol 135, Issue 2

Send AstroPy mailing list submissions to
        astropy at python.org

To subscribe or unsubscribe via the World Wide Web, visit
        https://mail.python.org/mailman/listinfo/astropy
or, via email, send a message with subject or body 'help' to
        astropy-request at python.org

You can reach the person managing the list at
        astropy-owner at python.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of AstroPy digest..."


Today's Topics:

   1. Re: Nested recarrays in FITS (Paul Kuin)
   2. Re: Nested recarrays in FITS (Peter Teuben)


----------------------------------------------------------------------

Message: 1
Date: Mon, 4 Dec 2017 21:08:04 +0000
From: Paul Kuin <npkuin at gmail.com>
To: Astronomical Python mailing list <astropy at python.org>
Cc: Daniel Sela <danielsela42 at gmail.com>
Subject: Re: [AstroPy] Nested recarrays in FITS
Message-ID:
        <CANoQ6N3gCT1Ek91-VMgzxYc4z4+UuiK3gMu1HwigpdnKn-oxBg at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

I think that HDF does that for you. FITS is more flexible, but you have to
do your own writes and retrievals. In the end you will be reinventing the
wheel unless you check out how HDF does it. That's my opinion.

Cheers,

   Paul
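One way to follow Paul's suggestion is sketched below, assuming the h5py package (the thread does not name a particular HDF5 library). Nested dictionaries map onto HDF5 groups and arrays onto datasets, so the hierarchy comes back on read without a custom recarray layout. The helpers write_nested and read_nested and the file name nested.h5 are hypothetical names for illustration:

```python
import h5py
import numpy as np

def write_nested(group, tree):
    """Store a nested dict: sub-dicts become HDF5 groups, arrays become datasets."""
    for key, value in tree.items():
        if isinstance(value, dict):
            write_nested(group.create_group(key), value)
        else:
            group.create_dataset(key, data=value)

def read_nested(group):
    """Rebuild the nested dict from an HDF5 group, loading each dataset into memory."""
    result = {}
    for key, item in group.items():
        if isinstance(item, h5py.Group):
            result[key] = read_nested(item)
        else:
            result[key] = item[()]  # read the full dataset as a numpy array
    return result

# Same structure as datan_raw in the question quoted below.
datan_raw = {'DATA': {'D1': np.linspace(0, 100, 8*4).reshape(8, 4),
                      'D2': np.linspace(0, 100, 10*5).reshape(10, 5),
                      'ND': {'D1': np.linspace(0, 100, 10*5).reshape(10, 5),
                             'D2': np.linspace(0, 100, 8*4).reshape(8, 4)}}}

with h5py.File('nested.h5', 'w') as f:  # an h5py File is itself a Group
    write_nested(f, datan_raw)

with h5py.File('nested.h5', 'r') as f:
    restored = read_nested(f)

print(restored['DATA']['ND']['D1'].shape)  # (10, 5)
```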

On Mon, Dec 4, 2017 at 9:02 PM, Arnon Sela <arnon.sela at gmail.com> wrote:

> To whom it may concern,
>
> I have a nested numpy recarray structure to be stored in FITS.
> The following code is just a test I used to build a nested structure
> (the data_for_fits variable in the last line of the code).
>
> Code start >>>>>>
>
> import numpy as np
>
> ''' The following two functions are adapted from:
> https://stackoverflow.com/questions/32328889/numpy-structured-array-from-arbitrary-level-nested-dictionary
> '''
>
> def mkdtype(d):
>     ''' Creates dtype for nested dictionary with numpy based type objects
>     '''
>     result = []
>     for k, v in d.items():
>         if isinstance(v,np.ndarray):
>             result.append((k, v.dtype, v.shape))
>         else:
>             result.append((k, mkdtype(v)))
>     return np.dtype(result)
>
> def dict2recarray(data, rec=None):
>     ''' Creates numpy.recarray from data (dict)
>     '''
>     def _dict2recarray(data, rec):
>         if rec.dtype.names:
>             for n in rec.dtype.names:
>                 _dict2recarray(data[n], rec[n])
>         else:
>             rec[:] = data
>         return rec
>
>     dtype = mkdtype(data)
>     if rec is None:
>         rec = np.zeros(dtype.shape, dtype)
>
>     return _dict2recarray(data, rec)
>
> datan_raw = {'DATA': {'D1': np.linspace(0, 100, 8*4).reshape(8, 4),
>                       'D2': np.linspace(0, 100, 10*5).reshape(10, 5),
>                       'ND': {'D1': np.linspace(0, 100, 10*5).reshape(10, 5),
>                              'D2': np.linspace(0, 100, 8*4).reshape(8, 4)}}}
>
> dtype = mkdtype(datan_raw)
> data_for_fits = dict2recarray(datan_raw)
>
>
> >>>>>> Code ends
>
> I couldn't find documentation on how to build such a FITS structure
> (nested recarrays).
>
> One option is to build sub-recarrays into different BIN tables with a
> header that would correspond to a nested key in the recarray. But that
> would require creating another function to reconstruct the recarray
> structure after reading the BIN tables from the FITS file.
>
> The better option is to build the FITS file in such a manner that the
> structure is retrieved correctly on FITS load().
>
> Thank you for your help,
>
> Best regards.
>
> _______________________________________________
> AstroPy mailing list
> AstroPy at python.org
> https://mail.python.org/mailman/listinfo/astropy
>
>
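The first option described in the question (one extension per leaf array, with the nesting recorded in the header) could look roughly like this with astropy.io.fits. It is a sketch under several assumptions: it uses image extensions rather than binary tables, because each leaf here is a plain 2-D array; the nesting path is stored in EXTNAME using a hypothetical '/' separator; and, since astropy upper-cases extension names by default, it only round-trips keys that are already upper case, as in datan_raw. The helper names flatten, write_nested_fits, and read_nested_fits are hypothetical:

```python
import numpy as np
from astropy.io import fits

SEP = '/'  # hypothetical separator between nesting levels in EXTNAME

def flatten(tree, prefix=''):
    """Yield (path, array) pairs such as ('DATA/ND/D1', array)."""
    for key, value in tree.items():
        if isinstance(value, dict):
            yield from flatten(value, prefix + key + SEP)
        else:
            yield prefix + key, value

def write_nested_fits(filename, tree):
    """Write each leaf array to its own image extension named by its path."""
    hdus = [fits.PrimaryHDU()]
    for path, arr in flatten(tree):
        hdus.append(fits.ImageHDU(data=arr, name=path))
    fits.HDUList(hdus).writeto(filename, overwrite=True)

def read_nested_fits(filename):
    """Rebuild the nested dict by splitting each EXTNAME back into keys."""
    tree = {}
    with fits.open(filename) as hdul:
        for hdu in hdul[1:]:  # skip the empty primary HDU
            *parents, leaf = hdu.name.split(SEP)
            node = tree
            for key in parents:
                node = node.setdefault(key, {})
            node[leaf] = np.array(hdu.data)  # copy before the file is closed
    return tree

datan_raw = {'DATA': {'D1': np.linspace(0, 100, 8*4).reshape(8, 4),
                      'D2': np.linspace(0, 100, 10*5).reshape(10, 5),
                      'ND': {'D1': np.linspace(0, 100, 10*5).reshape(10, 5),
                             'D2': np.linspace(0, 100, 8*4).reshape(8, 4)}}}

write_nested_fits('nested.fits', datan_raw)
restored = read_nested_fits('nested.fits')
print(restored['DATA']['ND']['D1'].shape)  # (10, 5)
```

As the question anticipates, the reconstruction step (read_nested_fits) is custom code that every reader of the file would need; FITS itself has no notion of the hierarchy.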


--

* * * * * * * * http://www.mssl.ucl.ac.uk/~npmk/ * * * *


Dr. N.P.M. Kuin      (n.kuin at ucl.ac.uk)
phone +44-(0)1483 (prefix) -204111 (work)
mobile +44(0)7908715953  skype ID: npkuin
Mullard Space Science Laboratory · University College London ·
Holmbury St Mary · Dorking · Surrey RH5 6NT · U.K.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/astropy/attachments/20171204/8577ee9c/attachment-0001.html>

------------------------------

Message: 2
Date: Mon, 4 Dec 2017 22:46:27 +0100
From: Peter Teuben <teuben at astro.umd.edu>
To: astropy at python.org
Subject: Re: [AstroPy] Nested recarrays in FITS
Message-ID: <b1419060-493c-bf7e-0ff0-0d2bcd1d31a1 at astro.umd.edu>
Content-Type: text/plain; charset="utf-8"


another thought on this:

I think the original question was also limited in not explaining why FITS was needed. I could argue for pickle. Paul is right that HDF might be a better match, especially if you have to switch to another language, since HDF has more native support there. But does it have to be persistent data? Otherwise, using a Python-C/Fortran interface is far more efficient. (I believe HDF is actually more flexible than the F in FITS.)

You can't beat a native pickle:

    import pickle
    with open("test.dat", "wb") as f:
        pickle.dump(datan_raw, f)
    # ...
    with open("test.dat", "rb") as f:
        new_raw = pickle.load(f)

So perhaps we could turn the question around and ask in what situation you need this data structure.

- peter
