# [AstroPy] Consider ASDF for hierarchical numpy data

```There are some good suggestions in this thread. If you do in fact need to serialize your data to disk and if you're not tied to FITS for other reasons, you might consider using the Advanced Scientific Data Format (ASDF) which is designed specifically for this purpose. Here's an example of how to use ASDF to store the data set you described:

>>> import asdf
>>> import numpy as np
>>> data = {'D1': np.linspace(0, 100, 8*4).reshape(8, 4),
...         'D2': np.linspace(0, 100, 10*5).reshape(10, 5),
...         'ND': {'D1': np.linspace(0, 100, 10*5).reshape(10, 5),
...                'D2': np.linspace(0, 100, 8*4).reshape(8, 4)}}

# Writing data to file on disk
>>> outfile = asdf.AsdfFile(data)
>>> outfile.write_to('data.asdf')

# Reading data from file on disk
>>> infile = asdf.open('data.asdf')
>>> infile.tree
{'D1': <array (unloaded) shape: [8, 4] dtype: float64>,
 'D2': <array (unloaded) shape: [10, 5] dtype: float64>,
 'ND': {'D1': <array (unloaded) shape: [10, 5] dtype: float64>,
        'D2': <array (unloaded) shape: [8, 4] dtype: float64>}}

# Data arrays can be accessed hierarchically from the top-level tree:
>>> infile.tree['D1']
array([[   0.        ,    3.22580645,    6.4516129 ,    9.67741935],
       [  12.90322581,   16.12903226,   19.35483871,   22.58064516],
       [  25.80645161,   29.03225806,   32.25806452,   35.48387097],
       [  38.70967742,   41.93548387,   45.16129032,   48.38709677],
       [  51.61290323,   54.83870968,   58.06451613,   61.29032258],
       [  64.51612903,   67.74193548,   70.96774194,   74.19354839],
       [  77.41935484,   80.64516129,   83.87096774,   87.09677419],
       [  90.32258065,   93.5483871 ,   96.77419355,  100.        ]])
>>> infile.tree['ND']
{'D1': <array (unloaded) shape: [10, 5] dtype: float64>,
 'D2': <array (unloaded) shape: [8, 4] dtype: float64>}

The metadata in the resulting file is stored as human-readable YAML:

#ASDF 1.0.0
#ASDF_STANDARD 1.1.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.0.0
D1: !core/ndarray-1.0.0
  source: 0
  datatype: float64
  byteorder: little
  shape: [8, 4]
D2: !core/ndarray-1.0.0
  source: 1
  datatype: float64
  byteorder: little
  shape: [10, 5]
ND:
  D1: !core/ndarray-1.0.0
    source: 2
    datatype: float64
    byteorder: little
    shape: [10, 5]
  D2: !core/ndarray-1.0.0
    source: 3
    datatype: float64
    byteorder: little
    shape: [8, 4]
asdf_library: !core/software-1.0.0 {author: Space Telescope Science Institute, homepage: 'http://github.com/spacetelescope/asdf',
  name: asdf, version: 1.3.2.dev1044}

The data arrays themselves are stored efficiently in binary form, and can optionally be compressed.

ASDF is also capable of serializing various types from Astropy including tables, Time objects, units and quantities, and some transforms and coordinates.

ASDF can be installed using pip:
$ pip install asdf

Basic documentation can be found here:

If you have any questions, feel free to open an issue in our GitHub repo:

https://github.com/spacetelescope/asdf


I think that HDF does that for you. FITS is more flexible, but you have to
do your own writes and retrievals. In the end you will be reinventing the
wheel unless you check out how HDF does it. That's my opinion.

> Dear Whom that Can Help,
>
> I have a nested numpy recarray structure to be stored in FITS.
> The following code is just a test I used to build a nested structure
> (the data_for_fits variable in the last line of the code).
>
> Code start >>>>>>
>
> import numpy as np
>
> ''' The following two functions are adapted from:
> structured-array-from-arbitrary-level-nested-dictionary
> '''
>
> def mkdtype(d):
>     ''' Creates dtype for nested dictionary with numpy based type objects
>     '''
>     result = []
>     for k, v in d.items():
>         if isinstance(v,np.ndarray):
>             result.append((k, v.dtype, v.shape))
>         else:
>             result.append((k, mkdtype(v)))
>     return np.dtype(result)
>
> def dict2recarray(data, rec=None):
>     ''' Creates numpy.recarray from data (dict)
>     '''
>     def _dict2recarray(data, rec):
>         if rec.dtype.names:
>             for n in rec.dtype.names:
>                 _dict2recarray(data[n], rec[n])
>         else:
>             rec[:] = data
>         return rec
>
>     dtype = mkdtype(data)
>     if rec is None:
>         rec = np.zeros(dtype.shape, dtype)
>
>     return _dict2recarray(data, rec)
>
> datan_raw = {'DATA': {'D1': np.linspace(0, 100, 8*4).reshape(8, 4),
>                       'D2': np.linspace(0, 100, 10*5).reshape(10, 5),
>                       'ND': {'D1': np.linspace(0, 100, 10*5).reshape(10, 5),
>                              'D2': np.linspace(0, 100, 8*4).reshape(8, 4)}}}
>
> dtype = mkdtype(datan_raw)
> data_for_fits = dict2recarray(datan_raw)
>
>
> >>>>>> Code ends
>
> I couldn't find documentation on how to build such a FITS structure
> (nested recarrays).
>
> One option is to build sub-recarrays into different BIN tables with a
> header that would correspond to a nested key in the recarray. But that
> would require creating another function to reconstruct the recarray
> structure after reading the BIN tables from the FITS file.
>
> The better option is to build the FITS file in such a manner that the
> structure is retrieved correctly on FITS load().
>
> Thank you for your help,
>
> Best regards.
>
> _______________________________________________
> AstroPy mailing list
> AstroPy at python.org
> https://mail.python.org/mailman/listinfo/astropy
>
>

Another thought on this:

I think the original question was also limited in not explaining why FITS was needed. I could argue for pickle. Paul is right, HDF might be a better match, especially if you have to switch to another language, since HDF has a more native match to that. But does it have to be persistent data? Otherwise, using a Python-C/Fortran interface is far more efficient. (I believe HDF is actually more flexible than the F in FITS.)

You can't beat a native pickle:

    import pickle
    with open("test.dat", "wb") as f:
        pickle.dump(datan_raw, f)
    ...
    with open("test.dat", "rb") as f:
        new_raw = pickle.load(f)

So perhaps we could turn the question around and ask in what situation you need this data structure.

- peter

> I think that HDF does that for you. FITS is more flexible, but you have to do your own writes and retrievals. In the end you will be reinventing the wheel unless you check out how HDF does it. That's my opinion.
>
> Cheers,
>
>    Paul
>
> On Mon, Dec 4, 2017 at 9:02 PM, Arnon Sela <arnon.sela at gmail.com> wrote:

```
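
The original question mentions storing each sub-recarray in its own table, labelled by its nested key, at the cost of writing a separate function to rebuild the hierarchy on load. As a rough sketch of what that bookkeeping might look like, the snippet below flattens a nested dict into path-keyed entries and rebuilds it again. The helper names `flatten_tree` and `unflatten_tree` and the `.` separator are assumptions made for illustration; they are not part of astropy, asdf, or numpy. Plain lists stand in for the numpy arrays so the example has no dependencies.

```python
# Illustrative sketch only: flatten_tree and unflatten_tree are hypothetical
# helpers, not part of astropy, asdf, or numpy.

SEP = "."  # assumed separator; keys in the nested dict must not contain it


def flatten_tree(tree, prefix=""):
    """Map a nested dict to a flat dict keyed by dotted paths."""
    flat = {}
    for key, value in tree.items():
        path = f"{prefix}{SEP}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-dicts, extending the path
            flat.update(flatten_tree(value, path))
        else:
            flat[path] = value
    return flat


def unflatten_tree(flat):
    """Rebuild the nested dict from a flat dict keyed by dotted paths."""
    tree = {}
    for path, value in flat.items():
        *parents, leaf = path.split(SEP)
        node = tree
        for name in parents:
            # Create intermediate levels on demand
            node = node.setdefault(name, {})
        node[leaf] = value
    return tree


# Plain lists stand in for the numpy arrays used in the thread.
nested = {"DATA": {"D1": [0.0, 1.0],
                   "D2": [2.0, 3.0],
                   "ND": {"D1": [4.0], "D2": [5.0]}}}

flat = flatten_tree(nested)
print(sorted(flat))

restored = unflatten_tree(flat)
print(restored == nested)
```

Each dotted path could then act as the label for one stored table. Note that this sketch drops empty sub-dicts, and formats such as ASDF or HDF keep the hierarchy for you without this extra step.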