On Thu, Nov 25, 2021 at 10:21 PM Qianqian Fang <q.fang@neu.edu> wrote:
On 11/25/21 17:05, Stephan Hoyer wrote:
Hi Qianqian,
What is your concrete proposal for NumPy here?
Are you suggesting new methods or functions like to_json/from_json in NumPy itself?
That would work. Either defining a subclass of JSONEncoder that serializes ndarray, which users could pass as cls to json.dump, or, as you mentioned, defining to_json/from_json like pandas DataFrame <https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_json.html>, would save people from writing custom code and formats.
I am also wondering if there is a more automated way to tell json.dump/dumps to use a default serializer for ndarray without using cls=...? I saw an SO post that mentioned a method called "__serialize__" on a class, but I can't find it in the official docs. Is anyone aware of a method that defines a default JSON serializer for an object?
There isn't one. You have to explicitly provide the JSONEncoder, which is why there is nothing that we can really do in numpy to avoid the TypeError that you mention below. The stdlib json module just doesn't give us the hooks to be able to do that. We can provide top-level functions like to_json()/from_json() to encode/decode a top-level ndarray to a JSON text, but that doesn't help with ndarrays in dicts or other objects. We could also provide a JSONEncoder/JSONDecoder pair, but as I mention in one of the GitHub issues you link to, there are a number of different expectations that people could have for what the JSON representation of an array is. Some will want to use the JData standard. Others might just want the arrays to be represented as lists of lists of plain-old JSON numbers in order to talk with software in other languages that have no particular standard for array data.
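For readers following along, here is a minimal sketch of the explicit-encoder route described above. It assumes the "lists of lists of plain JSON numbers" representation; the class name NDArrayEncoder is hypothetical and is not part of numpy or the jdata package.

```python
import json

import numpy as np


class NDArrayEncoder(json.JSONEncoder):
    """Hypothetical encoder: ndarrays become nested lists of JSON numbers."""

    def default(self, obj):
        # json calls default() only for objects it cannot serialize itself.
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        if isinstance(obj, np.generic):
            # NumPy scalars such as np.int64 or np.float32.
            return obj.item()
        # Anything else still raises TypeError, as the stdlib does.
        return super().default(obj)


payload = {"name": "demo", "data": np.arange(6).reshape(2, 3)}

# Without an explicit encoder, the stdlib raises the TypeError from this thread.
try:
    json.dumps(payload)
except TypeError as exc:
    error_message = str(exc)

# With cls=..., arrays nested inside dicts and lists are handled as well.
text = json.dumps(payload, cls=NDArrayEncoder)
decoded = json.loads(text)
```

This representation drops the dtype, and decoding returns plain lists rather than arrays. A complex-valued array would still fail, because tolist() yields Python complex numbers that JSON cannot represent. Those are the kinds of gaps where the choice of representation (JData or something else) starts to matter.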
As far as I can tell, reading/writing in your custom JSON format already works with your jdata library.
Ideally, I was hoping the small jdata encoder/decoder functions could be integrated into numpy. That would avoid the "TypeError: Object of type ndarray is not JSON serializable" in json.dump/dumps without needing additional modules; more importantly, it would simplify the user experience of exchanging complex arrays (complex-valued, sparse, specially shaped) with other programming environments.
It seems to me that the jdata package is the right place for implementing the JData standard. I'm happy for our documentation to point to it in all the places that we talk about serialization of arrays. If the json module did have some way for us to specify a default representation for our objects, then that would be a different matter. But for the present circumstances, I'm not seeing a substantial benefit to moving this code inside of numpy. Outside of numpy, you can evolve the JData standard at its own pace.
--
Robert Kern