[Neuroimaging] Planning for data formats - upcoming journal club

Matthew Brett matthew.brett at gmail.com
Tue Nov 9 10:27:02 EST 2021


Hi,

On Tue, Nov 9, 2021 at 2:35 PM Satrajit Ghosh <satra at mit.edu> wrote:
>
> hi,
>
> i can help provide a data archiving, metadata, and cloud/hpc access perspective. we are using hdf5 quite a bit both through nwb and otherwise at present, and are also using zarr, a cloud native format. i would not write off hdf5 and instead ask which use cases in what physical environments would run you into trouble.

Just to clarify - I am acutely aware of the need to think carefully
about use cases. In particular, I wanted to think about libraries
for accessing the data via an API - like Zarr - and the backend
storage format, such as HDF5, or a more transparent format like ASDF,
Exdir or similar. What are the performance issues now, and what will
they be in the future? Can we unlock the potential of all those great
Python multiprocessing tools more easily with one format than with
another? And do we need the same or a different format for sharing
data between applications - such as SPM? For example, might we want
to have several Zarr backends: one for transparency, one for
performance, and one for cross-tool compatibility (such as HDF5)?
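
To make the API-versus-backend split concrete, here is a toy sketch
in Python. It is not Zarr's real API. The names PlainFileStore and
ChunkedVector are hypothetical, invented for illustration. The only
idea borrowed from Zarr is that the array layer talks to a key/value
mapping of chunks, so the storage underneath can be swapped without
touching the calling code.

```python
import os
import struct
import tempfile
from collections.abc import MutableMapping


class PlainFileStore(MutableMapping):
    """Hypothetical 'transparent' backend: one plain file per chunk key."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key):
        return os.path.join(self.root, key)

    def __getitem__(self, key):
        try:
            with open(self._path(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            raise KeyError(key)

    def __setitem__(self, key, value):
        with open(self._path(key), "wb") as f:
            f.write(value)

    def __delitem__(self, key):
        try:
            os.remove(self._path(key))
        except FileNotFoundError:
            raise KeyError(key)

    def __iter__(self):
        return iter(sorted(os.listdir(self.root)))

    def __len__(self):
        return len(os.listdir(self.root))


class ChunkedVector:
    """Toy 1-D float64 array API that only knows about a key/value store."""

    def __init__(self, store, chunk_len=4):
        self.store = store
        self.chunk_len = chunk_len

    def write(self, values):
        # Split the values into fixed-size chunks, one store entry per chunk.
        for start in range(0, len(values), self.chunk_len):
            chunk = values[start:start + self.chunk_len]
            key = str(start // self.chunk_len)
            self.store[key] = struct.pack(f"<{len(chunk)}d", *chunk)

    def read(self):
        # Reassemble the chunks in numeric key order.
        out = []
        for key in sorted(self.store, key=int):
            raw = self.store[key]
            out.extend(struct.unpack(f"<{len(raw) // 8}d", raw))
        return out


data = [float(i) for i in range(10)]

# Same API, two backends: an in-memory dict and a directory of files.
in_memory = ChunkedVector({})
in_memory.write(data)

tmpdir = tempfile.mkdtemp()
on_disk = ChunkedVector(PlainFileStore(tmpdir))
on_disk.write(data)
```

Because each chunk is an independent entry in the store, separate
worker processes could in principle read or write different chunks
without coordinating, which is part of why the choice of backend
matters for the multiprocessing question.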

I think the paper is very good at laying out those questions - that's
why I thought it would be a good place to start.

Cheers,

Matthew

