Christopher, let me go through your summary and add some remarks, hopefully for the benefit of all (who made it this far) in this conversation:
Christopher Barker wrote:
TL;DR : I propose that python's JSON encoder encode the Decimal type as maintaining full precision.
My proposal will be to extend the JSON encoder to allow a custom type to be encoded as a "JSON fractional number" (explanation will follow).
- The original post was inspired by a particular problem the OP is trying
to solve, and a suggested solution that I suspect the OP thought was the least disruptive and maybe most general solution. However, what I think that did was throw some red herrings into the conversation. However, it made me go read the RFC and see what the JSON spec really says about numbers, and think about whether the Python json module does as well as it could in transcoding numbers. And I think we've found a limitation.
After I decided to write a proposal on bpo (and the patch), I made a mental note that I would need to address some things in my proposal differently, to avoid the misunderstanding which was apparent in this thread.
My (biggest) mistake was that I used the word "float" loosely for different things depending on the context. First, I used it for a "JSON fractional number", i.e. the *number* (as defined by the JSON spec) with a decimal point. Sometimes I wrote it as "JSON float", sometimes only as "float", and I guess it was _the_ (unintentional) red herring. Second, I used it for an IEEE-754 floating point number, usually referring to it as "floating point binary representation".
What did not occur to me (yes!) was that for most (all) here, "float" would primarily mean "Python native type float", as I really did not make a distinction between IEEE-754 and the Python type and considered the latter just an implementation of the former.
So being loose was not helpful at all, and I realized I would need to address it in my "official" proposal.
- To be clear about vocabulary: a "float" is a binary floating point
number, for all intents and purposes an IEEE754 float -- or at the very least, a Python float. Which is compatible between many computer systems. JSON does not have a "float" specification.
Fully agree on that (as per my remark above).
OK -- so this means: if you want to be generally interoperable, then limit yourself to numbers that can be represented by IEEE-754. But it does not prohibit greater precision, or a different binary representation when decoded.
That was the premise with which I came in, and managed to communicate so poorly.
Python's json module, like I imagine most JSON decoders, takes the very practical approach of using (IEEE-754) float as a default for JSON numbers with a fractional part. But it also allows you to decode a JSON number as a Decimal type instead. But it does not have a way to losslessly encode a python Decimal as JSON.
This triggered this thread.
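To make the asymmetry concrete, here is a minimal sketch using only the standard `json` and `decimal` modules. The sample value is made up for illustration.

```python
import json
from decimal import Decimal

text = '{"price": 1.10000000000000000001}'

# Decoding: parse_float receives the number's text, so Decimal keeps every digit.
data = json.loads(text, parse_float=Decimal)
price = data["price"]

# Encoding: the stock encoder has no rule for Decimal and refuses it.
try:
    json.dumps(data)
    encode_error = None
except TypeError as exc:
    encode_error = exc
```

So the value survives decoding, but there is no built-in way to write it back as a JSON number.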
Since the JSON spec does in fact allow lossless representation of a Python Decimal, it seems that for completeness' sake, the json default encoding of Decimal should maintain the full precision. This would provide round tripping in the sense that a Python Decimal encoded and then decoded as JSON would get back the same value (but a given JSON number would not necessarily get the exact same text back when round-tripped through Decimal). And it would provide compatibility with any hypothetical other JSON implementation that fully supports Decimal numbers.
Here I would like to clarify two things:
1) At the moment, the Python `json` module allows using a custom type for parsing a "JSON fractional number". This custom type is passed to the decoder (`json.load(s)`) in the optional keyword argument `parse_float`. It could be Python's own decimal.Decimal, but it can be something else.
2) decimal.Decimal has been used throughout the discussion as an example of such a custom type, and because `simplejson` already allows using it for both decoding and encoding. `simplejson`, however, does not allow a custom type to be used (neither for parsing nor for encoding); decimal.Decimal is "hardcoded" in the API by the `use_decimal` Boolean keyword argument.
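As a small illustration of point 1, the sketch below passes a plain function as `parse_float` to show what the decoder hands over. The function name `record` and the sample document are my own, chosen only for the demonstration.

```python
import json

seen = []

def record(text):
    # The decoder calls parse_float with the number exactly as written in the JSON text.
    seen.append(text)
    return text

result = json.loads('{"a": 1.10, "b": 2, "c": 3.0e2}', parse_float=record)
```

Only numbers with a fraction or an exponent go through `parse_float`; the integer `2` is handled by `parse_int` (here left at its default) and never reaches the callable.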
Note that this might solve the OPs problem in this particular case, but not in the general case -- it relies on the Python user to know how some other JSON encoder is encoding its floats. But it would provide a consistent encoding of Decimal that should be compatible with other decimal numeric types.
This is not only a question of interoperability (between two different codecs/platforms). Imagine an application which reads a JSON file containing those "problematic" values, does some operation on completely unrelated parts (e.g. changes some metadata) and then dumps the file back. Imagine those "problematic" values are some financial data.
Such an application, even without actually needing (or even caring about) any JSON numbers in the file, is not able to dump them back into the changed file without changing them.
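A minimal sketch of that scenario with the standard `json` module and its default settings (the document content is invented for the example):

```python
import json

original = '{"note": "old", "amount": 1.00000000000000000001}'

doc = json.loads(original)   # "amount" silently becomes a Python float
doc["note"] = "new"          # the application only touches the metadata
rewritten = json.dumps(doc)
```

The application never looked at `amount`, yet the rewritten file now contains `1.0` instead of the original digits.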
Final points: I fully concur with many posters that byte for byte consistency of JSON is NOT a reasonable goal.
If the extension to support a custom type for decoding and encoding a "JSON fractional number" is accepted, then byte-for-byte accuracy for this particular type could also be implemented. Imagine this "custom type":
```
class JsonNumCopy():
    def __init__(self, src):
        self.src = src

    def __str__(self):
        return self.src

    def __repr__(self):
        return self.src
```
It can already be used with a decoder (`json.loads(msg, parse_float=JsonNumCopy)`) and works as expected.
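For completeness, a self-contained sketch of how this behaves with the current `json` module. The message content is made up, and the class is repeated here so the snippet runs on its own.

```python
import json

class JsonNumCopy:
    """Keeps the exact text of a JSON fractional number."""
    def __init__(self, src):
        self.src = src

    def __str__(self):
        return self.src

    def __repr__(self):
        return self.src

msg = '{"x": 1.10, "n": 3}'
data = json.loads(msg, parse_float=JsonNumCopy)
kept = str(data["x"])        # the original digits, trailing zero included

# The encoder side is the missing half: it rejects the unknown type.
try:
    json.dumps(data)
    dump_error = None
except TypeError as exc:
    dump_error = exc
```

Decoding preserves the text, but dumping the same structure fails today, which is exactly the gap the proposal is about.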
I also fully agree that the Python JSON encoder should not EVER generate invalid JSON, so the OP's idea of a "raw" encoder seems like a bad idea.
I came to the same conclusion during the discussion here. It seems there is no need for similar treatment for other Python native types as they are already handled accordingly (int -> big int).
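A quick check of the integer case with the standard `json` module, as a sketch (the value is arbitrary, just larger than 64 bits):

```python
import json

big = 2**100

encoded = json.dumps(big)      # written out digit for digit
decoded = json.loads(encoded)  # read back as an arbitrary-precision int
```

No precision is lost in either direction, so integers need no extra treatment.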
The fundamental problem here is not that we don't allow a raw encoding, but that the JSON spec is based on decimal numbers, and Python also supports Decimal numbers, but there was that one missing piece of how to "properly" encode the Decimal type -- it is clear in the JSON spec how best to do that, so Python should support it.
You summed it up nicely. I could not do it better. The only thing I am not sure we are aligned on is the actual implementation (or what we have in mind).
As I wrote, I am aiming for a custom type (or types, as suggested by Joao) being allowed to serialize into a "JSON fractional number". This may seem too broad (or risky) at first, but I believe there are two good reasons for that:
1) The parser already allows a custom type. It would be good to be able to do something like this:
```
json_dict = json.loads(json_in, parse_float=MyFloat)
json_out = json.dumps(json_dict, dump_as_float=MyFloat)
```
2) This allows offloading the burden of deciding which type it should use from JSONEncoder to the client code.
A very primitive custom type (example above) can be used instead of decimal.Decimal, but more importantly, as raised by others here, the proposed implementation should avoid importing decimal.
This solution will also solve that. The client code will import (or define) whatever type it needs. (I believe that was the reason why the implementer of the decoder in the `json` module went for this solution instead of a simple `use_decimal` flag as `simplejson` did.)
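For comparison, here is a sketch of what client code can do today with the existing `default` hook of `json.dumps`. The sample value is invented; both conversions shown are ordinary uses of the hook, and neither produces a lossless JSON number.

```python
import json
from decimal import Decimal

value = {"amount": Decimal("1.00000000000000000001")}

# Option A: convert to str. All digits survive, but as a JSON *string*.
as_string = json.dumps(value, default=str)

# Option B: convert to float. A JSON number, but the extra digits are gone.
as_float = json.dumps(value, default=float)
```

Whatever `default` returns is encoded by the normal rules for that type, so the hook alone cannot emit the exact digits as a number. That is why I think the encoder itself needs a counterpart to `parse_float`.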
Richard: if your proposal is different, I'd love to hear what it is, and why you think Python needs something else. -CHB
Thanks again for the summary. I realized (still in the learning process) that even though I planned something like this for my bpo post, it was better to do it here right now.
Unfortunately I will not be very responsive in the next week, so will only be able to come up with a proposal on bpo after that. Since you seem to have something else in mind for the implementation I guess you either go ahead with yours (and then I will join in later with my comments), or you wait for my proposal and then join with yours.