On Mon, Aug 12, 2019 at 9:53 AM Richard Musil <risa2000x@gmail.com> wrote:
Christopher, I understood that the risk of producing invalid JSON, if a custom type is allowed to serialize into the output stream, seems to be a major problem for many. This has already been mentioned in this discussion. However, I thought it was related to the original idea of "raw output" (for a generic custom user type, which could serialize into an arbitrary JSON type).

well, that would make it all the easier to produce arbitrarily invalid JSON, but no, I don't think that was the only issue.

And as Chris A. points out, it's plenty easy to make totally invalid JSON anyway:

In [29]:  json.dumps({"spam": [1,2,3]}, separators=(' } ','] '))                                  
Out[29]: '{"spam"] [1 } 2 } 3]}'

So *maybe* that's a non-issue?
Since we agreed that the only type which needs such treatment is the JSON fractional number, it should not be that hard to check whether the custom type's output is valid.

well, no -- we did not agree to that.

If your goal is to support the full precision of Decimal, then the way to do that is to support serialization of the Decimal type. And that would also allow serialization of any other (numeric) type into a valid JSON number.
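For context, a quick sketch of the status quo in the stdlib json module: full precision already survives *decoding* via the parse_float hook, but there is no symmetric hook for *encoding*, so you have to go through a lossy or type-changing workaround:

```python
import json
from decimal import Decimal

# Decoding can preserve full precision by parsing numbers as Decimal:
d = json.loads('0.1000000000000000055', parse_float=Decimal)
assert d == Decimal('0.1000000000000000055')

# Encoding has no equivalent hook -- Decimal is not serializable as-is:
try:
    json.dumps(d)
except TypeError:
    pass  # "Object of type Decimal is not JSON serializable"

# The usual `default=` workaround must return an already-supported type,
# so you either lose precision (float) or change the JSON type (str):
print(json.dumps(d, default=float))  # 0.1  -- precision lost
print(json.dumps(d, default=str))    # "0.1000000000000000055" -- now a JSON string
```

That asymmetry (round-trippable in, but not out) is the gap that serializing Decimal directly would close.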

But it seems your goal is not to be able to serialize Decimal numbers with full precision, but rather to be able to perfectly preserve the original JSON representation (not just its value) -- or, to be a bit more generic, to be able to control exactly the JSON representation of a value that may have more than one valid representation.

If that is the goal, then strings would need a hook, too, as Unicode allows different normalized forms for some "characters" (see the previous discussion for this; I, frankly, don't quite "get" it). Maybe it would be helpful to have full control over numbers, but still not strings -- but be clear that that's what's on the table if it is.
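To make the string point concrete (a small sketch using the stdlib unicodedata module): the NFC and NFD forms of the same text are canonically equivalent, yet they serialize to different, equally valid JSON strings -- so a decode/encode round trip cannot preserve which form the input used, exactly analogous to 1.0 vs 1.00 for numbers:

```python
import json
import unicodedata

# "café" in two canonically equivalent Unicode forms:
nfc = unicodedata.normalize('NFC', 'caf\u00e9')  # ends in U+00E9, one code point
nfd = unicodedata.normalize('NFD', 'caf\u00e9')  # ends in 'e' + U+0301 combining accent

# Both are valid JSON encodings of "the same" text, but the output differs:
print(json.dumps(nfc))  # "caf\u00e9"
print(json.dumps(nfd))  # "cafe\u0301"
assert nfc != nfd
assert json.dumps(nfc) != json.dumps(nfd)
```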

My summary:

I support full precision serialization of the Decimal type -- it would extend the json module to more fully support the JSON spec.

I don't think the json module should aim to allow users to fully control exactly how a given item is serialized, though if it did, it would be nice if it did a validity check as well.

Others, I'm sure, have different opinions, and mine hold no particular weight. But I suggest the way forward is to separate out the goal:

a) This is what I want to be able to do.


b) This is how I suggest that be achieved.

Because you need to know (a) in order to know how to do (b), and because you should make sure the core devs support (a) in the first place.

I have not verified whether the NUMBER_RE regex defined in scanner.py matches the JSON number syntax exactly, or whether there are some deviations, but I would say it would be a good start for checking the custom type's output -- by the rule that if the decoder lets this in, then the encoder should let it out.

and if it doesn't, that may be considered a bug in NUMBER_RE that could be fixed.
The check would involve only the custom type specified by 'dump_as_float', so it should not impact default use, and for those who want to use a custom type, it would be an acceptable price to pay for the flexibility they get.

This kind of consistency seems, to me, worth the performance impact of the additional check.
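The proposed check could be sketched roughly like this. Note that NUMBER_RE is a private implementation detail of json.scanner (a real patch would define its own regex), and the function name here is hypothetical -- this just illustrates the "if the decoder lets it in, the encoder should let it out" rule:

```python
import json.scanner

def is_valid_json_number(s: str) -> bool:
    """Hypothetical validity check for a custom type's numeric output.

    Reuses the regex the decoder uses to recognize JSON numbers; the
    output is valid only if the whole string matches.
    """
    m = json.scanner.NUMBER_RE.match(s)
    return m is not None and m.end() == len(s)

print(is_valid_json_number('1.0000000000000000001'))  # True
print(is_valid_json_number('-1.5e10'))                # True
print(is_valid_json_number('00.5'))                   # False: leading zero
print(is_valid_json_number('Infinity'))               # False: not a JSON number
```

Requiring a full match matters: NUMBER_RE.match('00.5') succeeds on the leading '0' alone, so a bare truthiness test would wave invalid output through.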



Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython