Re: Improvement: __json__

As the author of one of these third-party libraries, I feel like I can contribute to this discussion. It can indeed be done very elegantly with type annotations, and it should for sure be left to the ecosystem. The only things we need from core Python are good tools for dealing with runtime type information. For example, singledispatch doesn't really work with typing constructs (i.e. Optional, Union, and Sequence, as opposed to actual classes), but all of that can be worked around. In my experience, runtime type information is extremely useful precisely for deserialization, and in big projects it becomes useful well before type information starts paying off in a static-analysis context.
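To illustrate the workaround: here is a toy sketch of dispatching on runtime type information with typing.get_origin()/get_args() instead of singledispatch. The structure() helper is hypothetical, and real deserialization libraries are far more complete:

import typing

def structure(value, tp):
    # Deserialize `value` as `tp` using runtime type information.
    origin = typing.get_origin(tp)
    if origin is typing.Union:  # also covers Optional[X] == Union[X, None]
        for arg in typing.get_args(tp):
            if arg is type(None):
                if value is None:
                    return None
                continue
            try:
                return structure(value, arg)
            except (TypeError, ValueError):
                continue
        raise ValueError(f"{value!r} does not match {tp}")
    if origin is list:  # e.g. List[int]
        (item_tp,) = typing.get_args(tp)
        return [structure(v, item_tp) for v in value]
    return tp(value)  # plain classes: int, str, ...

assert structure(["1", "2"], typing.List[int]) == [1, 2]
assert structure(None, typing.Optional[int]) is None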

Would you generate a schema from the type annotations so that other languages can use the data?

JSON is really not the most efficient data representation for round-tripping to and from Python. A binary format (or pickle without arbitrary code execution) would solve for Python object > file > Python object. IMHO, JSON is most useful when the data needs to be read/written from (browser-side) JS, too. And then you need data validation for user-supplied input (which is not distinct from the deserialization problem). E.g. DRF has data validation. But (mypy) type annotations are insufficient for data validation: compare JSONschema and the maximum amount of information storable in mypy-compatible type annotations, for example. Data validation needs to return errors in a feedback loop with the user, so it's sort of more than just the preconditions that need to be satisfied before a function proceeds.

- serialization from Python (TA useful)
- deserialization with the same Python code (TA useful)
- deserialization with other Python code (insufficient)
- deserialization with other languages (insufficient)
- form generation (insufficient)
- data validation (insufficient)
- preconditions (insufficient)

So, IMHO type annotations are not insufficient and thus redundant and not elegant.

On Tue, Apr 7, 2020, 11:45 AM Tin Tvrtković <tinchester@gmail.com> wrote:

On Tue, Apr 7, 2020 at 11:17 AM Wes Turner <wes.turner@gmail.com> wrote:
Would you generate a schema from the type annotations so that other languages can use the data?
I haven't done this yet, but it would be pretty cool.
So, IMHO type annotations are not insufficient and thus redundant and not elegant.
you mean "are not sufficient"? or "are insufficient"?

But what is meant by "type annotations"? I'm using them via dataclasses -- really as a shorthand for assigning a type to every field -- the annotations are just a shorthand that auto-generates a schema, essentially.

@dataclass
class MyClass:
    x: A_Type = A_default

So now I know that this class has a field named x that is of type A_Type. So I use that type for validation, and serialization / deserialization.

But if you mean "type annotations" in the sense of the types provided out of the box in the typing module and used by mypy (so I've heard, never did it myself) -- no, they are not sufficient -- I need types that support my serialization / deserialization system, and my validation system. And I suppose we could have a standardized __json__ and __from_json__ protocol that I could use, but it seems a special case to me.

Note: I don't need to do anything special for types with standard json representation, but that's only the basics -- I end up using custom types for anything nested.
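A minimal sketch of what "use that type for validation" can look like at runtime: a __post_init__ hook that checks each field against its annotated type. This is hypothetical and only works when the annotations are actual classes like int or str, not typing generics or string annotations:

from dataclasses import dataclass, fields

@dataclass
class MyClass:
    x: int = 0

    def __post_init__(self):
        # Validate each field against its annotated type.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(f"{f.name} must be {f.type.__name__}, got {value!r}")

MyClass(x=1)        # fine
MyClass(x="spam")   # raises TypeError

-CHB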
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Apr 7, 2020, at 15:31, Christopher Barker <pythonchb@gmail.com> wrote:
This seems like one of the many things that’s impossible to do for Python classes with full generality, but pretty easy to do if you only want to support @dataclass. Either dynamically or statically, in fact. You could even write a tool that generates dataclasses (statically or dynamically) from a schema, if you wanted.
A type annotation is just the ": whatever". It doesn't matter whether that whatever is a real dynamic type like int or spam.Eggs or list, or a typing type like List[int]; it's still an annotation. And dataclass can handle either -- change that to "x: List[A_Type]" and at runtime, the dataclass will treat that the same as if you just used list, but mypy can know that it's only supposed to have A_Type members in that list. (So if you initialize MyClass([1, "spam", open("eggs.txt")]) it'll work at runtime, but mypy will flag it as a type error.)

An "automated JSON serialization for dataclasses" library could do either. It could be "dumb" like @dataclass and just treat x as a list of anything serializable, or it could be "smart" and treat it as a list of A_Type objects only. Either way seems like it makes sense.

A schema generator for dataclasses, though, I think you'd want it to use the typing information. A MyClass property x doesn't just have an attribute of {"type": "array"}; it has {"type": "array", "items": recursively_schematize(A_Type)}.
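A minimal sketch of such a "smart" schema generator, assuming only a handful of primitives, List[...], and nested dataclasses need handling. schematize() here plays the role of the recursively_schematize() above, and the type mapping is illustrative:

import dataclasses
import typing

PRIMITIVES = {int: "integer", float: "number", str: "string", bool: "boolean"}

def schematize(tp):
    if tp in PRIMITIVES:
        return {"type": PRIMITIVES[tp]}
    if typing.get_origin(tp) is list:  # e.g. List[A_Type]
        (item_tp,) = typing.get_args(tp)
        return {"type": "array", "items": schematize(item_tp)}
    if dataclasses.is_dataclass(tp):
        hints = typing.get_type_hints(tp)
        return {"type": "object",
                "properties": {name: schematize(t) for name, t in hints.items()}}
    raise TypeError(f"no schema mapping for {tp!r}")

@dataclasses.dataclass
class MyClass:
    x: typing.List[int]

schematize(MyClass)
# {'type': 'object',
#  'properties': {'x': {'type': 'array', 'items': {'type': 'integer'}}}}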

API schemas are a useful tool that allow for a range of use cases, including generating reference documentation, or driving dynamic client libraries that can interact with your API. Django REST Framework provides support for automatic generation of OpenAPI schemas.

*That* should read as "are not sufficient". Stuffing all of those into annotations is going to be cumbersome, resulting in multiple schema definitions to keep synchronized and to validate data against.

I think generating some amalgamation of JSON-LD @context & SHACL and JSON schema would be an interesting exercise. You'd certainly want to add more information to the generated schema than just the corresponding Python types:

- type URI(s) that correspond to the Python primitive types
- JSON schema format strings
- JSON schema length
- TODO JSON schema [...]

You could replace the first comma in this with a colon and call that a validatable type annotation:

attrname, pytype, type URI, JSONschema format, validators
name, str, xsd:string, string
url, str, xsd:anyURI, uri
dateCreated, datetime.datetime, xsd:dateTime, date-time, regex_iso8601
author_email, str, xsd:string, email
username, str, xsd:string, string, {length: {minLength: 3, maxLength: 32}}

https://docs.python.org/3/reference/datamodel.html
https://json-schema.org/understanding-json-schema/reference/string.html
https://www.w3.org/TR/json-ld11/#the-context
https://www.w3.org/TR/json-ld11/#advanced-context-usage
https://www.w3.org/TR/shacl/
https://github.com/OAI/OpenAPI-Specification/blob/master/versions/3.0.3.md (JSON schema)

Similarly unnecessarily redundant: having a models.py, forms.py, api.py, and an OpenAPI specification that includes the JSON schema (that DRF *tries* to generate from api.py): https://www.django-rest-framework.org/api-guide/schemas/

On Tue, Apr 7, 2020, 8:24 PM Andrew Barnert <abarnert@yahoo.com> wrote:

On Apr 7, 2020, at 18:10, Wes Turner <wes.turner@gmail.com> wrote:
Not everything in the world has to be built around RDF semantic triples. In fact, most things don't have to be. That's why there are a lot more things out there using plain old JSON Schema for their APIs and formats than using JSON-LD. And even more things just using free-form JSON.

And for either of those, type annotations are sufficient. You can serialize any instance of Spam to JSON, and deserialize JSON (that you know represents a Spam) to an equal Spam instance, as long as you know what the name and type of every attribute of Spam is (and all of those types are number/string/bool/null, types that match the same qualifications as Spam, lists of such a type, and dicts mapping str to such a type). Which is guaranteed to be knowable for dataclasses even without any external information. Or any classes with a (correct) accompanying schema. Or just any classes you design around such a serialization system. The fact that you don't have, e.g., a URI with metadata about Spam doesn't in any way stop any of that from working, or being useful. Type annotations are sufficient for this purpose.

In fact, even type annotations aren't necessary. Any value that can pickle, you can just msg = json.dumps(b64encode(pickle.dumps(obj))) and obj = pickle.loads(b64decode(json.loads(msg))), and you've got working JSON serialization. What type annotations add is JSON serialization that's human readable/editable, or computer verifiable, or both.

You don't need JSON-LD unless you're not just building APIs, but meta-indexes of APIs or automatic API generators or something.
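A runnable sketch of that pickle-in-JSON round trip (the helper names are illustrative):

import base64, json, pickle

def to_json(obj):
    # Any picklable value -> JSON string (opaque, not human-readable).
    return json.dumps(base64.b64encode(pickle.dumps(obj)).decode("ascii"))

def from_json(msg):
    return pickle.loads(base64.b64decode(json.loads(msg)))

assert from_json(to_json({"spam": [1, 2, 3]})) == {"spam": [1, 2, 3]}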

You don't need JSON at all in order to serialize and deserialize instances of Python objects and primitives. Pickle (or e.g. Arrow + [parquet,]) handles nested, arbitrarily complex types efficiently and without dataclasses and/or type annotations. I don't see the value in using JSON to round-trip from Python to the same Python code.

External schema is far more useful than embedding part of an ad-hoc nested object schema in type annotations that can't also do or even specify data validations.

You can already jsonpickle data classes. If you want to share or just publish data, external schema using a web standard is your best bet.

On Wed, Apr 8, 2020, 3:30 AM Andrew Barnert <abarnert@yahoo.com> wrote:

On Wed, Apr 8, 2020 at 1:18 AM Wes Turner <wes.turner@gmail.com> wrote:
I don't see the value in using JSON to round-trip from Python to the same Python code.
There is a bit of value: it's a human-readable format that matches the Python data model pretty well. I've used it for that reason. Maybe I should be using yaml or something instead, but it's nice to use something common. I have thought about using "PYSON" -- which would better match the Python data model -- but never got around to formalizing that. But yes, the real advantage to JSON is interaction with non-Python systems.
I suppose so -- I really need to see if I can make use of JSONSchema -- it would be nice to specify the schema in one place, and be able to build Python objects, and Javascript objects, and theoretically all kinds of other implementations as well. Someone may have done that; I need to go look.

NOTE: JSON-LD keeps coming up in these threads, but that really seems like an orthogonal issue. Maybe there should be JSON-LD support for Python (is there already?) but that shouldn't impact the core json library, nor a __json__ magic method.

-CHB

-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Apr 8, 2020, at 01:18, Wes Turner <wes.turner@gmail.com> wrote:
I don't see the value in using JSON to round-trip from Python to the same Python code.
External schema is far more useful than embedding part of an ad-hoc nested object schema in type annotations that can't also do or even specify data validations.
But dataclasses with type annotations can express a complete JSON Schema. Or, of course, an ad hoc schema only published in human-readable form, as most web APIs use today.
You can already jsonpickle data classes. If you want to share or just publish data, external schema using a web standard is your best bet.
Sure, but you don’t need JSON-LD for that. Again, the fact that type annotations are insufficient to represent semantic triples is irrelevant. They are sufficient for the case of writing code to parse what YouTube gives you, or to provide a documented API that you design to be consumed by other people in JS, or to generate a JSON Schema from your legacy collection of classes that you can then publish, or to validate that a dataclass hierarchy matches a published schema, and so on. All of which are useful.

I'm aware of a couple Python implementations of JSON-LD: pyld [1] and rdflib-jsonld [2]. But you don't need a JSON-LD parser to parse or produce JSON-LD: you can just frame the data in the JSON document correctly, such that other tools can easily parse your necessarily complex data types stored within the limited set of primitive datatypes supported by JSON.

We should welcome efforts to support linked data in Python. TimBL created the web on top of the internet so that we could share resources and collaborate on science. In order to collaborate on science, we need to be able to share, discover, merge, join, concatenate, analyze, and compare datasets. TimBL's 5-star Open Data plan [3] justifies the costs and benefits of sharing LOD: Linked Open Data.

★ make your stuff available on the Web (whatever format) under an open license
★★ make it available as structured data (e.g., Excel instead of image scan of a table)
★★★ make it available in a non-proprietary open format (e.g., CSV instead of Excel)
★★★★ use URIs to denote things, so that people can point at your stuff
★★★★★ link your data to other data to provide context

JSON is ★★★ data. JSON-LD, RDFa, and Microformats are ★★★★ or ★★★★★ data. We can link our data to other data with URIs in linked data formats. JSON-LD is one representation of RDF. RDF* (read as "RDF star") extends RDF for use with property graphs.

No one cares whether you believe that "semantic web failed" or "those standards are useless": being able to share, discover, merge, join, concatenate, analyze, and compare datasets is of significant value to the progress of the sciences and useful arts; so, I think that we should support the linked data use case with at least:

1. a __json__(obj, spec=None) method, and
2. a more easily modifiable make_iterencode/iterencode implementation in the json module of the standard library.

[1] https://github.com/digitalbazaar/pyld
[2] https://github.com/RDFLib/rdflib-jsonld
[3] https://5stardata.info/en/

*****

Here's a project that merges JSON-LD, SHACL, and JSON schema (it has 3 GitHub stars): https://github.com/mulesoft-labs/json-ld-schema

Validating JSON documents is indeed somewhat orthogonal to the __json__ / iterencode implementation details we're discussing; but specifying types in type annotations and then creating an additional, complete data validation specification is not DRY.

*****

re: generating JSON schema from type annotations

You can go from python:str to jsonschema:format:string easily enough; but, again, going from python:str to jsonschema:format:email will require either extending the type annotation syntax or modifying a generated schema stub, and changes to the stub will then need to be ported back to the type annotations. I suppose, if all you're working with are data classes, that generating part of the JSON schema from data class type annotations could be useful to you in your quest to develop a new subset of JSON that supports deserializing (and validating) complex types in at least Python and JS (when there are existing standards for doing so).
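One possible sketch of "extending the type annotation syntax": PEP 593's typing.Annotated (Python 3.9+, or typing_extensions before that) can carry a JSON Schema format string alongside a plain str annotation. The dict payload here is purely illustrative, not any library's actual convention:

from dataclasses import dataclass
from typing import Annotated, get_args, get_origin, get_type_hints

Email = Annotated[str, {"format": "email"}]

@dataclass
class Person:
    name: str
    email: Email

for name, tp in get_type_hints(Person, include_extras=True).items():
    extras = get_args(tp)[1:] if get_origin(tp) is Annotated else ()
    print(name, extras)
# name ()
# email ({'format': 'email'},)

On Wed, Apr 8, 2020 at 3:08 PM Andrew Barnert <abarnert@yahoo.com> wrote: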

On Apr 8, 2020, at 12:55, Wes Turner <wes.turner@gmail.com> wrote:
We should welcome efforts to support linked data in Python.
Fine, but that’s doesn’t meant we should derail every proposal that has anything to do with JSON by turning it into a proposal for linked data, or consider a useful proposal to be useless because it doesn’t give us linked data support on top of whatever it was intended to give us.
No one cares whether you believe that "semantic web failed" or "those standards are useless":
But no one is saying either of those things. People are saying that the semantic web is irrelevant to this discussion. JSON is useful, JSON Schema is useful, improving Python to make working with ad hoc or Schema-specified JSON easier is useful, and that's all equally true whether the semantic web is the one true future or a total failure.

Would a __json__ method protocol improve Python? I don't think so, but the reasons I don't think so have nothing to do with LD. Other people do think so, and their reasons also have nothing to do with LD. And the same is true for most of the counter- and side-ideas that have come up in this thread and the last one.
Sure, but who says you have to store email addresses as str?

@dataclass
class Person:
    name: str
    email: Email
    address: Address

The fact that str is a builtin type, Address is a dataclass with a bunch of builtin-typed attributes, and Email is (say) a str subclass (or a dataclass with just a str member, or whatever) that knows to render to and from type:string, format:email instead of just type:string doesn't change the fact that they're all perfectly good Python types and can all be used as type annotations and can all be mapped to a JSON Schema.

How does it know that? Lots of things would work. Which one you use would be up to your JSON Schema-driven serialization library, or to your Python-code-from-JSON-Schema generator tool or your static Schema-from-code generator tool or whatever. Maybe if we have multiple such libraries in wide use fighting it out, one of them will win. But even if there's never a category killer, any of them will work fine for the applications that use it.

And of course often just using str for email addresses is fine. There's a reason the Formats section of the JSON Schema spec is optional and explicitly calls out that many implementations don't include it. And in some applications it would make more sense to break an email address down into two parts (user and host) and store a dict of those two values instead, and that's also fine. Would it better serve the needs of the semantic web if the email format were a mandatory rather than optional part of the spec and everyone were required by law to always use it when storing email addresses? Maybe. But neither of those things is true.
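A minimal sketch of that Email-as-str-subclass idea; the json_schema_format attribute is a hypothetical hook that a schema-generating library could look for, not an established convention:

from dataclasses import dataclass

class Email(str):
    # Hypothetical hook: a schema generator could read this and emit
    # {"type": "string", "format": "email"} instead of plain "string".
    json_schema_format = "email"

@dataclass
class Person:
    name: str
    email: Email

# A generator walking Person's annotations might then produce:
# {"type": "object",
#  "properties": {"name": {"type": "string"},
#                 "email": {"type": "string", "format": "email"}}}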

I think that we should support the linked data use case with at least:

1. a __json__(obj, spec=None) method, and
2. a more easily modifiable make_iterencode/iterencode implementation in the json module of the standard library.

On Wed, Apr 8, 2020 at 5:00 PM Andrew Barnert <abarnert@yahoo.com> wrote:

If specified, default should be a function that gets called for objects that can't otherwise be serialized.
(from https://docs.python.org/3/library/json.html)

In trying to do this (again), I've realized that you *can't* just check for hasattr(obj, '__json__') in a JSONEncoder.default method, because default only gets called for types the encoder doesn't know how to serialize; so if you e.g. subclass a dict, default() never gets called for the dict subclass.

https://github.com/python/cpython/blob/master/Lib/json/encoder.py

https://stackoverflow.com/questions/16405969/how-to-change-json-encoding-beh... specifies how to overload _make_iterencode *in pure Python*, but AFAICS that NOPs use of the C-optimized json encoder.

The Python json module originally came from simplejson. Here's simplejson's for_json implementation in pure Python:
https://github.com/simplejson/simplejson/blob/288e4e005c39a2eb855b5225c5dc8e...

for_json = _for_json and getattr(value, 'for_json', None)
if for_json and callable(for_json):
    chunks = _iterencode(for_json(), _current_indent_level)

And simplejson's for_json implementation in C:
https://github.com/simplejson/simplejson/blob/288e4e005c39a2eb855b5225c5dc8e...

Is there a strong reason that the method would need to be called __json__ instead of for_json?

Passing a spec=None kwarg through to the __json__()/for_json() method would not be compatible with simplejson's existing implementation. Passing a spec=None kwarg would be necessary to support different JSON standards within the same method, which I think is desirable because JSON/JSON5/JSON-LD/JSON_future.
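A short, runnable demonstration of that dict-subclass problem (the __json__ method and default hook here are the hypothetical protocol under discussion):

import json

class MyDict(dict):
    def __json__(self):
        return {"wrapped": dict(self)}

def default(obj):
    if hasattr(obj, "__json__"):
        return obj.__json__()
    raise TypeError(f"{obj!r} is not JSON serializable")

# The encoder already knows how to serialize dicts (including subclasses),
# so default() -- and therefore __json__ -- is never consulted:
print(json.dumps(MyDict(a=1), default=default))  # {"a": 1}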

participants (4):
- Andrew Barnert
- Christopher Barker
- Tin Tvrtković
- Wes Turner