On Thursday, August 15, 2019, Andrew Barnert <abarnert@yahoo.com> wrote:
On Aug 15, 2019, at 10:23, Christopher Barker <pythonchb@gmail.com> wrote:
This is all making me think it's time for a broader discussion / PEP:
The future of the JSON module.
I think this is overreacting. There’s really only two issues here, and neither one is that major.
But nobody has shown any need for any of the other stuff.
Data interchange with structured types is worthwhile. Lack of an implementation does not indicate lack of need so much as lack of awareness of the concerns.

- There was once no JSON module in the stdlib, but that doesn't mean it wasn't needed.
- We all have CSV, which is insufficient for reuse because, for example, there's nowhere to list metadata such as unit URLs (liters/litres or deciliters), so we all have to waste time later determining what was serialized (hopefully with at least a URL in the references of a given ScholarlyArticle). CSVW JSON-LD is a solution for complex types, such as decimals and numbers with units and precision.

It's not that the functionality isn't needed; it's that nobody knows how to do it, because it doesn't just work with a simple function call. Decimals, datetimes, complex numbers, ±Infinity, NaN, float precision: these things aren't easy with JSON, and as a result we lose fidelity in exchange for serialization convenience. Doing more to make it easy to losslessly serialize and deserialize is a net win for the public good, and it's worth the effort to host discussions, in an unfragmented forum, to develop a PEP with a protocol for progress on the lossless serialization front.
* support for extensions to the JSON standard (JSON5, etc.)
One guy saw a bunch of shiny new protocols (half of which are actually stagnant dead protocols) and thought that if the Python stdlib added support for all of them, they’d all take over the world. That’s not what the stdlib is for.
Whether JSON was worthy of inclusion in the stdlib was contentious and required justification. It was a good idea because pickles are dangerous and interacting with JS is very useful. It is unfortunate that we all just use JSON and throw away decimals and float precision and datetimes because json.dumps is so easy. An object.__json__(**kwargs) protocol would inconvenience no one so long as:

- decimal isn't imported unless used
- all existing code continues to work
Plus, most of them aren’t even JSON extensions, they’re other formats that build on top of JSON. For example, a JSONlines document is not a superset of a JSON document, it’s a sequence of lines, each of which is a plain-old JSON document (with the restriction that any newlines must be escaped, which the stdlib can already handle). There are multiple JSONlines libraries out there that work fine today using the stdlib json module or their favorite third-party package; none of their authors have asked for any new features. So, what should the stdlib add to support JSONlines? Nothing.
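As a rough illustration of the point above, here is a minimal sketch of reading and writing JSON Lines with nothing but the stdlib json module. The helper names iter_jsonlines and dump_jsonlines are made up for this example and are not part of any library.

```python
import io
import json


def dump_jsonlines(records, fp):
    """Write each record as one JSON document per line (hypothetical helper)."""
    for record in records:
        # json.dumps escapes newlines inside strings as \n,
        # so each document stays on a single physical line.
        fp.write(json.dumps(record) + "\n")


def iter_jsonlines(fp):
    """Yield one parsed JSON document per non-blank line (hypothetical helper)."""
    for line in fp:
        line = line.strip()
        if line:
            yield json.loads(line)


buf = io.StringIO()
dump_jsonlines([{"a": 1}, {"b": "two\nlines"}], buf)
buf.seek(0)
records = list(iter_jsonlines(buf))
```

Because the reader is a generator, records are parsed one at a time as the stream is consumed rather than all at once.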
Streaming JSON is not possible without JSON lines support. There are packages to do it, but that's not an argument against making it easy for people to do safe serialization without a dependency.
Maybe a brief section in the docs about related protocols, explaining why they’re not JSON but how they’re related, would be helpful. But I’m not sure even that is needed. How many people come to the json docs looking for how to parse NDJ?
How many people know that:

- You can or should use decimal to avoid float precision error, but then you have to annoyingly write a JSONEncoder to save that data, and the type is then lost on parsing, because the value comes back as a float when it's deserialized?
- JSON-LD is the only non-ad-hoc solution to preserving precision, datetimes, and complex numbers and types with JSON
- JSON5 supports IEEE 754 ±Infinity and NaN
- Pickles do serialize arbitrary objects, but are not safe for data publishing because unmarshalling runs executable code in the pickle (this is in the docs now)

Including guidance in the docs would be great. Making it easy to do things correctly and consistently would also be great.
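To make the first bullet concrete, here is a small sketch of the round trip with today's stdlib. DecimalEncoder is a hypothetical class written for this example, and converting to float is just one of several ad-hoc choices an encoder could make.

```python
import json
from decimal import Decimal


class DecimalEncoder(json.JSONEncoder):
    """Hypothetical encoder that emits Decimal values as JSON numbers."""

    def default(self, o):
        if isinstance(o, Decimal):
            # Going through float is where the Decimal's type
            # and its trailing zero are thrown away.
            return float(o)
        return super().default(o)


price = Decimal("0.10")
text = json.dumps({"price": price}, cls=DecimalEncoder)

# By default the value comes back as a float.
as_float = json.loads(text)["price"]

# parse_float=Decimal restores the type, but not the original "0.10".
as_decimal = json.loads(text, parse_float=Decimal)["price"]
```

Note that parse_float only helps on the decoding side, and only if the reader already knows the numbers were meant to be decimals; nothing in the document itself says so.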
* support for serializing arbitrary precision decimal numbers
Multiple people want this. The use_decimal flag from simplejson is a proven way to do it. The only real objection is “Fine, but can you find a way to do it without slowing down imports for people who don’t use it?” People have suggested answers, and either someone will implement one, or not.
It might be nice to spin this off into its own thread to escape all the baggage. Going the other way and holding it hostage to a “let’s redesign everything from scratch before doing that” seems like a bad idea to me.
Why break the thread? A protocol for object.__json__(**kwargs) is a partial solution for the OT. A PEP PR would be a good place to continue discussion, but nobody will helpfully chime in there. Saving decimals as float strs which deserialize as floats does not preserve the type or precision of the decimals (which are complex types with a sign, coefficient digits, and an exponent). JSON-LD is the way to go for complex types in JSON.
* support for allowing custom serializations (i.e. not just what can be serialized, but controlling exactly how)
Only one person wants this. And he keeps saying he doesn’t want it. And he only wants it for Decimal or other float-substitutes, not for all types. And it won’t actually solve the problem he wants to solve.
Would optional kwargs passed to object.__json__(**kwargs) from e.g. json.dumps kwargs allow for parameter-customizable serializations?
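One way to picture the question is the sketch below. Nothing here exists in the stdlib: the __json__ method, the json_kwargs parameter, the places option, and the Money class are all assumptions invented for illustration, and the wrapper simply routes extra keyword arguments through the existing default hook.

```python
import json
from decimal import Decimal
from functools import partial


class Money:
    """Hypothetical type implementing the proposed __json__ protocol."""

    def __init__(self, amount, currency):
        self.amount = Decimal(amount)
        self.currency = currency

    def __json__(self, **kwargs):
        # 'places' is an invented option showing a parameterized serialization.
        places = kwargs.get("places")
        amount = self.amount if places is None else round(self.amount, places)
        return {"amount": str(amount), "currency": self.currency}


def json_default(obj, **kwargs):
    """Call type(obj).__json__(obj, **kwargs) if the type defines it."""
    method = getattr(type(obj), "__json__", None)
    if method is None:
        raise TypeError(
            f"Object of type {type(obj).__name__} is not JSON serializable"
        )
    return method(obj, **kwargs)


def dumps(obj, json_kwargs=None, **dumps_kwargs):
    """Hypothetical wrapper: forwards json_kwargs to every __json__ call."""
    default = partial(json_default, **(json_kwargs or {}))
    return json.dumps(obj, default=default, **dumps_kwargs)


full = dumps(Money("12.346", "EUR"))
rounded = dumps(Money("12.346", "EUR"), json_kwargs={"places": 2})
```

Under these assumptions the answer would be yes, though every type would then have to tolerate keyword arguments it does not understand.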
If there were prior art on this to consider, it might be worth trying to design a solution that has all the strengths of each of the existing answers. But if no JSON package has such a feature, because nobody needs it, then the answer is simple: don’t invent the first-ever solution for the stdlib module.
The alternative is for every package to eventually solve these real needs in a different way, for the stdlib to then write a JSON protocol PEP, and for every package to then have to support both its original approach and the newly spec'd protocol.
* a "dunder protocol" for customization
I personally don’t see much point in this, as it’s trivial to do that yourself. (I had a program that depended on simplejson’s for_json protocol; making it work with the stdlib and ujson was just a matter of writing a function that checks for and tries for_json, and partialling that in as the default function. We’re talking 5 minutes of work, and easily reusable in your personal toolbox.)
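The workaround described above might look roughly like this. It is a sketch of the general idea, not the original program's code; for_json_default and Point are names made up for the example.

```python
import json
from functools import partial


def for_json_default(obj):
    """Fallback for json.dumps: use obj.for_json() when the object has one."""
    method = getattr(obj, "for_json", None)
    if callable(method):
        return method()
    raise TypeError(
        f"Object of type {type(obj).__name__} is not JSON serializable"
    )


# "Partialling that in as the default function".
dumps = partial(json.dumps, default=for_json_default)


class Point:
    """Hypothetical class using simplejson's for_json method name."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def for_json(self):
        return {"x": self.x, "y": self.y}


encoded = dumps([Point(1, 2)])
```

The same for_json_default function can be passed to any encoder that accepts a default callable.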
There's good reason not to copy-paste code that needs to be tested into every application: that's 5 minutes of everyone's time to copy-paste, plus t time for every run of every app's test suite.
But if people really want it, the only real question is what to call it. And __json__ seems like the obvious answer.
object.__json__(**kwargs)
And again, it seems like this would benefit from being separated out to its own small discussion, not being bundled into a huge one.
https://github.com/python/peps
* what role, if any, should the json module have in ensuring only valid JSON is produced?
I think where it is today is fine.
The allow_nan flag is necessary for interoperability.
The separators option may not be the best possible design, but it rarely if ever causes problems for anyone, so why break compatibility?
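For reference, this is how those two options behave in the current stdlib. The is_strict_json helper is a hypothetical name used only for this sketch.

```python
import json

# With the default allow_nan=True, non-finite floats are written as
# JavaScript-style tokens that are not part of the JSON grammar.
lenient = json.dumps([float("nan"), float("inf")])


def is_strict_json(value):
    """Return False if value cannot be dumped with allow_nan=False."""
    try:
        json.dumps(value, allow_nan=False)
    except ValueError:
        return False
    return True


# separators is commonly used for compact output...
compact = json.dumps({"a": 1, "b": 2}, separators=(",", ":"))

# ...but it accepts any strings, so the caller can produce text
# that no JSON parser will accept.
odd = json.dumps({"a": 1, "b": 2}, separators=(";", "="))
```

Both options leave validity up to the caller, which matches the consenting-adults stance taken here.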
The fact that you can monkeypatch in a punning float.__repr__ or whatever isn’t a problem for a consenting-adults library to worry about.
Such global float precision (from a __repr__ with a format string) is lost when the JSON is deserialized. (JSON-LD supports complex types in a standard way; otherwise they are necessarily specific to one JSON implementation and thus non-portable.) It's worth specifying a JSON serialization protocol as a PEP that third-party and stdlib JSON implementations would use.