Christopher, let me go through your summary and add some remarks, hopefully for the benefit of all (who made it so far) in this conversation:
And more comments from me now :-)
Christopher Barker wrote:
> TL;DR : I propose that python's JSON encoder encode the Decimal type as
> maintaining full precision.
My proposal is to extend the JSON encoder to allow a custom type to encode into a "JSON fractional number" (explanation follows).
Here is where I think we have slightly different ideas: see the recent exchange with Greg Ewing, but my thought is that we have the built-in Python Decimal type that matches the "JSON fractional number" -- if it can losslessly encode into JSON, then any other custom numeric type can be encoded into JSON by going through the Decimal encoding. The advantage of this is that it would ensure that users couldn't accidentally create invalid JSON.
My (biggest) mistake was that I used word "float" loosely for different things based on the context. First, I used it for "JSON fractional number", i.e. the *number* (as defined by JSON spec) with decimal point. Sometimes I wrote it as "JSON float", sometimes as "float" only and I guess it was _the_ (unintentional) red herring.
yeah, the terminology can be confusing.
Second, I used it for IEEE-754 Floating Point number, usually referring to it as "floating point binary representation".
What did not occur to me (yes!) is that for most (all) here, "float" would probably prominently mean "Python's native float type", as I really did not make a distinction between IEEE-754 and the Python type and considered the latter just an implementation of the former.
Which is pretty much the case, so I don't think that's a real problem -- at least until/unless Python decides to support a different float implementation in the future.
Here I would like to clarify two things:
1) At the moment the Python `json` module allows using a custom type for parsing "JSON fractional number". This custom type is passed to the decoder (`json.load()` / `json.loads()`) in an optional keyword argument `parse_float`. This type could be Python's own decimal.Decimal, but can be something else.
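To make that concrete, here is a small sketch of `parse_float` as it exists today (the document and its values are made up for illustration):

```python
import json
from decimal import Decimal

# parse_float receives the raw number text from the JSON source,
# so Decimal can preserve every digit the document contained.
doc = '{"price": 0.1000000000000000055511151231257827}'

as_float = json.loads(doc)                       # default: float
as_decimal = json.loads(doc, parse_float=Decimal)

print(as_float["price"])     # 0.1 -- the extra digits were rounded away
print(as_decimal["price"])   # full-precision Decimal, all digits intact
```

Decoding full precision already works; it is the encoding direction that has no counterpart.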
OK -- *maybe* there could be a more "standard" way to make a Decimal out of a JSON number, but this seems fine to me. Having a JSON number become a Python float by default does seem like the best option (and is what it does now, and backwards compatibility and all that).
2) decimal.Decimal has been used throughout the discussion as an example of such custom type and because `simplejson` already allows using it for both (decoding and encoding).
I'm not sure simplejson is particularly relevant here, but yes, as an example, sure.
> `simplejson` however does not allow custom type to be used (neither for parsing nor encoding) decimal.Decimal is "hardcoded" in the API by `use_decimal` Boolean keyword argument.
I think that is the right approach -- while there may be a use case for people to use a custom numeric type with JSON numbers, the JSON spec does not allow all possible numeric values to be represented -- it can only represent numbers that can be written with a finite number of base-ten digits (i.e. not all rationals -- there is no way to precisely specify the value 1/3, for instance -- and no irrational numbers (sqrt(2), pi, etc.)). As it happens, the Python Decimal type can represent exactly these same numbers (and a few more -- JSON excludes Inf, -Inf, and NaN). So using int, float, and Decimal as the only way to encode/decode numbers provides access to the full functionality of JSON numbers.
If someone does need to encode another number type, they will need to convert to int, float, or decimal, and I'm pretty sure that will provide a way to access any legal JSON value.
So why not allow users to write any custom encoder they want for JSON numbers? Because we want the json lib to ideally only ever produce valid JSON. I don't think it's horrible to allow users to do something wrong (consenting adults and all that), but it's better not to make it easy to do, and only to allow it if it provides some functionality that cannot be provided with a more restrictive system.
An example -- say a user is using the Fraction type in Python. They want to store that value in JSON. How can they do that now? Two ways:
1) Convert to float and accept the loss of precision
2) Store it in some custom string (or object) representation
What should they be allowed to do? Well, in pure JSON the options are:
1) Store it as a JSON number, accepting loss of precision, but choosing what that loss will be: 15 digits? 100 digits? All valid JSON.
2) Store it in some custom string (Object) representation
If the python json lib provides a Decimal encoder, then users will have exactly these same two options.
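A sketch of those two options for Fraction as things stand today (the `"__fraction__"` key is just an illustrative convention I made up, not an established one):

```python
import json
from fractions import Fraction

f = Fraction(1, 3)

# Option 1: encode as a JSON number, accepting a fixed loss of precision.
lossy = json.dumps(float(f))                # "0.3333333333333333"

# Option 2: encode exactly in a custom object representation.
exact = json.dumps({"__fraction__": [f.numerator, f.denominator]})

# Option 2 round-trips losslessly, at the cost of a non-number encoding.
restored = Fraction(*json.loads(exact)["__fraction__"])
assert restored == f
```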
However, if it allows a custom encoder, then the user *could* use a non-legal JSON option -- which would NOT be a good idea -- so why allow it?
However, I see one reason folks may want to control the encoding of Decimal numbers:
As pointed out earlier in this thread, the way one can encode a particular value in JSON is not unique:
1.01 and 0.101e+1 and 101.0e-2 all represent the same value. The encoding of a Decimal in JSON will presumably normalize this, so that value would always be stored the same way. But other JSON libs could encode the same value in a different way while still being totally valid.
The Python decoder would produce the same Decimal value for any legal JSON that represented the same value, but there would be no way to ensure that the Python Decimal encoder produced exactly the same JSON text for all Decimal values. (much like it currently does not for float values)
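For example, with `parse_float=Decimal` the three spellings above decode as equal values, but Decimal's preserved trailing zeros show why the re-emitted text still need not match the original:

```python
import json
from decimal import Decimal

# Three spellings of the same value, all valid JSON:
docs = ["1.01", "0.101e+1", "101.0e-2"]
values = [json.loads(d, parse_float=Decimal) for d in docs]

# They decode to equal Decimal values...
assert values[0] == values[1] == values[2]

# ...but Decimal remembers trailing zeros, so re-encoding is not
# guaranteed to reproduce the original text byte for byte:
print([str(v) for v in values])   # ['1.01', '1.01', '1.010']
```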
One way to address this would be to allow the users to write their own custom encoder that could then match what some other encoder does, but I think that isn't a use case that we should aim to support, for all the reasons previously mentioned in this thread.
This is not only a question of interoperability (between two different codecs/platforms). Imagine an application which reads a JSON file containing those "problematic" values, performs some operation on completely unrelated parts (e.g. changes some metadata), and then dumps the file back. Imagine those "problematic" values are financial data.
Such an application, even without actually needing (or even caring about) any of the JSON numbers in the file, is not able to dump them back into the changed file without altering them.
This is the question: is this something to support? In my example, such an application would return JSON that represented exactly the same values, but might not be exactly the same JSON text.
Again, I think preserving exactly the same JSON text (rather than the value) should not be a goal of the Python json lib (and indeed, the only way to ensure that would be to not actually decode the data at all, or at least keep the original version around for round-tripping).
I think that a) this is not a valid goal for the lib, and b) that your proposal wouldn't solve it anyway -- it would work for a larger set of cases, but not in general.
If the extension to support a custom type for decoding and encoding "JSON fractional number" is accepted, then byte-for-byte accuracy for this particular type could also be implemented. Imagine this "custom type":

class JsonNumCopy:
    def __init__(self, src):
        self.src = src
It can already be used with a decoder (`json.loads(msg, parse_float=JsonNumCopy)`) and works as expected.
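A self-contained sketch of that copy-through type on the decode side (class repeated here for completeness; the document is made up):

```python
import json

class JsonNumCopy:
    """Keeps the exact number text as it appeared in the JSON source."""
    def __init__(self, src):
        self.src = src

parsed = json.loads('{"amount": 1.0100}', parse_float=JsonNumCopy)

# The original spelling survives, trailing zero and all...
print(parsed["amount"].src)   # 1.0100

# ...but json.dumps() has no way to emit .src back as a bare number,
# which is exactly the encoder-side gap being discussed.
```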
Well, OK. If you store the original JSON text, then you can reproduce it exactly. But you then have to use that special type -- you aren't getting a Decimal, or whatever numeric type you might want -- you are getting a special "reproduce the JSON exactly" type.
If I actually had that use case, I think I'd forget trying to use the built-in json module, and write code that was designed to manipulate and work with the JSON text itself.
> I also fully agree that the Python JSON encoder should not EVER generate
> invalid JSON, so the OP's idea of a "raw" encoder seems like a bad idea. I
I came to the same conclusion during the discussion here. It seems there is no need for similar treatment for other Python native types as they are already handled accordingly (int -> big int).
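For instance, arbitrary-precision ints already round-trip exactly through the stdlib codec, with no precision cap:

```python
import json

big = 10**40 + 7

# Python ints encode as exact JSON integers -- every digit is kept,
# so decoding the emitted text recovers the identical value.
text = json.dumps(big)
assert text == str(big)
assert json.loads(text) == big
```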
Well, as others have pointed out, Unicode (even UTF-8) doesn't guarantee reproducibility of the encoded bytes. So if you want to be able to guarantee exact round-tripping of JSON, you really have to do it everywhere.
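Strings illustrate the same problem as numbers: two different JSON texts can spell the same value, and the encoder picks one canonical spelling on the way back out:

```python
import json

# Two different JSON texts for the same string value:
escaped = json.loads('"\\u0041"')   # the JSON text "\u0041"
plain = json.loads('"A"')
assert escaped == plain == "A"

# Re-encoding does not reproduce the original escape sequence:
assert json.dumps(escaped) == '"A"'
```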
> The fundamental problem here is not that we don't allow a raw encoding, but
> that the JSON spec is based on decimal numbers, and Python also support
> Decimal numbers, but there was that one missing piece of how to "properly"
> encode the Decimal type -- it is clear in the JSON spec how best to do
> that, so Python should support it.
You summed it up nicely; I could not do it better. The only thing I am not sure we are aligned on is the actual implementation (or what we have in mind).
As I wrote, I am aiming for custom type (or types as suggested by Joao) being allowed to serialize into "JSON fractional number".
Not as far as I can tell -- as I wrote, there isn't any real reason to serialize ANY custom type to a "JSON fractional number". What you want is to be able to round-trip the exact JSON text, and you've identified the number type as the only JSON type that doesn't do this now. But IIUC, strings don't either, necessarily.
Then there are the other issues with whitespace and all that. That can be normalized out, but if the goal is to reproduce the original JSON, I think a library designed with that in mind would make more sense.
the proposed implementation should avoid importing decimal.
I didn't quite follow the logic here, but it's not hard to only import Decimal if it was asked for.
Unfortunately I will not be very responsive in the next week, so will only be able to come up with a proposal on bpo after that. Since you seem to have something else in mind for the implementation I guess you either go ahead with yours (and then I will join in later with my comments), or you wait for my proposal and then join with yours.
Frankly, I don't have a use case, and I have a lot of other things to do -- so I won't be doing much beyond this kibitzing.
But I don't think the ultimate goal of being able to preserve the exact text in the original JSON is appropriate for the json package.
The core devs may have another opinion, so good luck.