
On 20/2/24 01:24, philippe@loco-labs.io wrote:
Hi community,
This memo is a proposal to implement a compact and reversible (lossless round-trip) JSON interface for multi-dimensional data, and in particular for NumPy (see issue #12481). The links to the documents are at the end of the memo.
The JSON-NTV (Named and Typed Value) format is a JSON format that integrates a notion of type. It has also been implemented for tabular data (see the NTV-pandas package, available in the pandas ecosystem, and the PDEP12 specification).
The use of this format has the following advantages:
- Taking into account data types not known to NumPy
- Reversible format (lossless round-trip)
- Interoperability with other tools for tabular or multi-dimensional data (e.g. pandas, Xarray)
- Ease of sharing JSON format
- Binary coding possible (e.g. CBOR format)
- Format integrating data of different nature
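To make the idea of a typed, reversible JSON encoding concrete, here is a minimal Python sketch. It is not the JSON-NTV syntax: the keys `dtype`, `shape` and `data` and the helper names `to_typed_json` / `from_typed_json` are assumptions chosen only for illustration, and the sketch only covers numeric and boolean dtypes.

```python
import json

import numpy as np


def to_typed_json(arr):
    """Encode an ndarray as JSON text that also carries its dtype and shape.

    The layout (keys "dtype", "shape", "data") is hypothetical, not JSON-NTV.
    """
    payload = {
        "dtype": arr.dtype.str,        # e.g. '<i2', includes the byte order
        "shape": list(arr.shape),
        "data": arr.ravel().tolist(),  # flat list of plain Python scalars
    }
    return json.dumps(payload)


def from_typed_json(text):
    """Rebuild the ndarray from the text produced by to_typed_json."""
    payload = json.loads(text)
    flat = np.array(payload["data"], dtype=payload["dtype"])
    return flat.reshape(payload["shape"])


original = np.arange(6, dtype=np.int16).reshape(2, 3)
restored = from_typed_json(to_typed_json(original))
```

Without the explicit `dtype` and `shape` entries, a plain `json.loads` followed by `np.array` would return a default integer array rather than `int16`, which is the kind of loss a typed format is meant to avoid.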
The associated Jupyter Notebook presents some key points of this proposal (first draft):
Summary:
- Introduction
- Benefits
- Multi-dimensional data
- Multi-dimensional types
- Format JSON
- Using the NTV format
- Equivalence of tabular format and multidimensional format
- Astropy specific points
- Units and quantities
- Coordinates
- Tables
- Other structures
This subject seems important to me (in particular for interoperability issues) and I would like to have your feedback before working on the implementation. In particular:
- do you think this “semantic” format is interesting to use?
- do you have any particular expectations or subjects that I need to study beforehand?
- do you have any examples or test cases to offer me?
And of course, any remarks and comments are welcome.
Thanks in advance!
Links:
- Jupyter notebook: https://nbviewer.org/github/loco-philippe/Environmental-Sensing/blob/main/py...
- JSON-NTV format: https://www.ietf.org/archive/id/draft-thomy-json-ntv-02.html
- JSON-NTV overview: https://nbviewer.org/github/loco-philippe/NTV/blob/main/example/example_ntv....
- NTV tabular format: https://www.ietf.org/archive/id/draft-thomy-ntv-tab-00.html#name-tabular-str...
- NTV-pandas package: https://github.com/loco-philippe/ntv-pandas/blob/main/README.md
- NTV-pandas examples: https://nbviewer.org/github/loco-philippe/ntv-pandas/blob/main/example/examp...
- Pandas specification - PDEP12: https://pandas.pydata.org/pdeps/0012-compact-and-reversible-JSON-interface.h...

There is an open issue [1] about such a format. Is this the same proposal or a different one?

We discussed this at the latest triage meeting. While interoperability is one of NumPy's goals, and something we care deeply about, we were not sure how this initiative will play out. Perhaps, like the pandas package, it should live outside NumPy for a while until some wider consensus can emerge.

We did have a few questions about the standard:
- How does it handle sharing data? NumPy can handle very large ndarrays, and a read-only container with a shared memory location, as in DLPack [0], seems more natural than a format that precludes sharing data.
- Is there a size limitation, either on the data or on the number of dimensions? Could this format represent, for instance, data with more than 100 dimensions, which could not be mapped back to NumPy?

Matti

[0] https://dmlc.github.io/dlpack/latest/
[1] https://github.com/numpy/numpy/issues/12481