json library: non-standards-compliant by default, and what to do about it.

Hi list,

As you might be aware, the json library is non-standards-compliant [1] by default: when fed NaN/Inf/-Inf floating point values in the serialization input, it will output NaN/Infinity/-Infinity literals in serialized form, unless the keyword argument allow_nan is explicitly set to False - and by default it's set to True. Therefore, the current state of affairs is that a simple `import json; json.dumps(float("NaN"))` is non-standards-compliant.

There is a symmetrical issue with deserialization - json.load and friends will by default happily treat NaN/Inf/-Inf values as valid and convert them to Python float values. However, I'd like to focus on the emitting/encoding/serializing part, as I think that's the one that's more problematic in practice. A parser that is by default more lenient than the standard dictates is generally less of an issue than a serializer that is.

A quick Google search for 'allow_nan python github' brings up many, many examples of people being bitten by either the fact that their code was not emitting standards-compliant JSON, or by the fact that they have to deal with an external Python system that emits NaN/Inf/-Inf [2]. From my experience it's not uncommon to find existing, mature Python codebases that exhibit this issue, and I'd like future Python users to notice early that their code is likely emitting non-compliant JSON values, and take appropriate action.

Is there a general consensus on this state of affairs, or some discussion about this that I've missed? As far as I can tell, this behavior has existed in Python (and simplejson) since at least 2005.

What does the list think of the following two ideas:

1) Document this lack of standards compliance better - e.g., introduce a big emphasized box on top of the Python manual for the json library that mentions the importance of the allow_nan flag. Or,

2) Fix the current behavior of JSON encoding in Python with regards to NaN/Inf/-Inf values - keeping in mind that a simple flip of allow_nan to False by default would unfortunately cause obvious breakage to existing Python codebases (as attempts to emit out-of-range float values result in a ValueError being thrown). A discussion naturally arises on whether increased standards compliance is worth the breakage of backwards compatibility, or whether there is a way to implement this change in a less drastic way (transition period with warning? defaulting to converting invalid values to null? something else?).

I've been sufficiently annoyed by this behavior that I'm willing to drive either of these proposals to further discussion and possibly implementation, but I wanted to first gauge the consensus on this, and make sure there wasn't a previous discussion on this that I've missed.

Kind regards,
Serge Bazanski

[1] - By standards-compliant, I refer to compliance with RFC8259 - but I couldn't find _any_ JSON specification that would allow for NaN/Inf/-Inf.
[2] - I've personally had a similar experience not so long ago, which forced me to implement https://github.com/q3k/cursedjson - so I'm generally somewhat biased with regards to this.
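For reference, a minimal reproduction of the behaviour described above, using only the stock json module (output shown in comments):

    import json
    import math

    # Default: non-finite floats serialize to non-standard literals.
    print(json.dumps({"x": float("nan"), "y": float("inf")}))
    # -> {"x": NaN, "y": Infinity}   (not valid per RFC 8259)

    # With allow_nan=False the encoder refuses instead:
    try:
        json.dumps(float("nan"), allow_nan=False)
    except ValueError as e:
        print("refused:", e)

    # The symmetrical leniency on the decoding side:
    value = json.loads('{"x": NaN, "y": -Infinity}')
    print(math.isnan(value["x"]), value["y"])  # -> True -inf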

Please forgive the stupid question, but given that the JSON standard is so obviously broken (being unable to serialise valid values from a supported type, what on earth were they thinking???), wouldn't all this time and energy be better aimed at fixing the standard rather than making Python's JSON encoder broken by default?

On Tue, Jun 16, 2020 at 06:48:19PM +0200, Serge Bazanski wrote:
What does the list think of the following two ideas:
1) Document this lack of standards compliance better
2) Fix the current behavior of JSON encoding in Python with regards to NaN/Inf/-Inf values
For some value of "fix". A third option would be for the JSON encoder to emit a warning if it actually encodes an INF or NAN.
Because it would break backwards compatibility, there would have to be a transition period. At the moment, that is handled on a de facto basis, but PEP 387 aims to make it official and even more strict: https://www.python.org/dev/peps/pep-0387/ -- Steven
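To make that third option concrete, here is a rough, user-level sketch of what warn-on-encode could look like. The helper names (dumps_with_warning, _has_non_finite) and the recursive pre-scan are purely illustrative, not an existing API:

    import json
    import math
    import warnings

    def _has_non_finite(obj):
        # Recursively look for NaN/Infinity floats in the structure to encode.
        if isinstance(obj, float):
            return not math.isfinite(obj)
        if isinstance(obj, dict):
            return any(_has_non_finite(v) for v in obj.values())
        if isinstance(obj, (list, tuple)):
            return any(_has_non_finite(v) for v in obj)
        return False

    def dumps_with_warning(obj, **kwargs):
        # Warn, but still produce the traditional (non-standard) output.
        if _has_non_finite(obj):
            warnings.warn("Encoding NaN/Infinity produces non-RFC-8259 JSON",
                          stacklevel=2)
        return json.dumps(obj, **kwargs)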

On Wed, Jun 17, 2020 at 10:43 AM Steven D'Aprano <steve@pearwood.info> wrote:
What do you mean? JSON doesn't have a "float" type with IEEE semantics. It just has a "Number" type, which is defined syntactically but not semantically. It doesn't mandate 53-bit precision, for instance, so you can carry large integers between languages that support them. That's one of the good things about JSON... and also one of the bad things. IMO the default behaviour is very useful and should be kept, but I would agree with a cautionary note in the docs saying that the default settings aren't as strict as the standard demands. It'd be similar to the way PostgreSQL docs are very clear about which features are Postgres extensions to the standard. ChrisA
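Python's json is one of the implementations that exercises that freedom - a quick illustrative check:

    import json

    # Integers round-trip at arbitrary precision in CPython's json,
    # even though other implementations may clamp them to 64 bits.
    n = 2 ** 100
    assert json.loads(json.dumps(n)) == n
    print(json.dumps(n))
    # -> 1267650600228229401496703205376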

On Wed, Jun 17, 2020 at 10:58:38AM +1000, Chris Angelico wrote:
I never mentioned float type or 53 bit precision. In Javascript:

    js> typeof(NaN)
    number
    js> typeof(Infinity)
    number

Odd as it may seem, NANs and INFs are numbers. And the JSON standard isn't capable of encoding them. The JSON standard defines "number" in such a way that even in the language that originated JSON, it can't represent all numbers. -- Steven

On Wed, Jun 17, 2020 at 3:47 PM Steven D'Aprano <steve@pearwood.info> wrote:
That's JavaScript you're looking at, not JSON. The JSON standard never says anything about IEEE floats. You said the words "supported type" and clearly implied IEEE floats, complete with infinity and nan, but that is not what the standard says. The "supported type" in JSON is simply a string of digits, optionally with a decimal point and/or an exponent. That is all. ChrisA

On Wed, Jun 17, 2020 at 05:06:29PM +1000, Chris Angelico wrote:
Yes I know. You can tell I already knew that, by the way I explicitly referred to Javascript, sometimes I'm quite clever like that :-)
You're technically correct, which is the best kind of correct, apart from missing out on negatives :-)

Nevertheless, JSON fails to support things which are considered numbers by Javascript and other languages, such as those little-known exotic languages Java, C/C++, PHP, and Python, you might have heard of them :-)

We're not arguing about exotic numeric types like quaternions or complex numbers or even fractions. We're talking about the JSON standard not even being able to fully support probably the single most common floating point numeric type in the world. That doesn't strike you as a bit broken? Not even a little bit?

The JSON standard didn't just accidentally fail to specify what to do with NANs and INFs. It mandates that they are turned into null. JSON is designed to take your numeric data and throw values away, and this is a real problem for people:

https://github.com/dotnet/runtime/issues/31024
https://github.com/AppMetrics/AppMetrics/issues/502

which is probably why a lot of JSON implementations provide support for INFs and NANs no matter what the standard says.

The bottom line here is that the JSON definition of "number" fails to match numeric data types in not just other languages like Python, but even in Javascript. -- Steven

On Sat, Jun 27, 2020 at 03:33:55PM +0300, Serhiy Storchaka wrote:
Douglas Crockford recommended using nulls in place of NaNs: http://www.json.org/json.ppt (Look at slide 16)

The RFC states that Infinity and NaN are not permitted (but doesn't explain what to do with them): https://tools.ietf.org/html/rfc4627

And ECMA-262 section 24.5.2, JSON.stringify, NOTE 4, page 683 states: "Finite numbers are stringified as if by calling ToString(number). NaN and Infinity regardless of sign are represented as the String null." http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf

-- Steven

On Tue, Jun 16, 2020 at 5:45 PM Steven D'Aprano <steve@pearwood.info> wrote:
You're kidding, right? -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

On Tue, Jun 16, 2020 at 07:11:57PM -0700, Guido van Rossum wrote:
Was what I said so stupid that even when prefixed with an acknowledgement that it was a stupid question, you can't imagine how anyone could ask the question? What exactly is getting in the way here?

Standards do change. One standard (JSON) is not capable of representing all values from another standard (IEEE-754). Removing NANs and INFs would break floating point software everywhere, and a lot of hardware too. Adding support for them to JSON would be an enhancement, not a breakage. In my ignorance, that seems like a no-brainer.

So I don't understand your point here. Is it...?

- That the behaviour of JSON is perfect as it is.
- That it's not perfect, but there are reasons *aside from the standard* why it can't be changed.
- That it's not possible to change the standard, even if you managed to get (let's say...) Mozilla and Microsoft on board.
- It's possible to change the standard, but that costs a lot of money, and nobody cares enough to spend it.
- Or something else?

-- Steven

On Wed, Jun 17, 2020, 1:30 AM Steven D'Aprano <steve@pearwood.info> wrote:
I can't speak for Guido, of course. But it seems to me that changing the JSON standard used by many languages and libraries would be a long uphill walk. Not because it's perfect, but simply because there is a lot of institutional inertia and varying interested parties.

That said, Python is far from alone in supporting "JSON+" as an informal extension that adds +/-Infinity and NaN. Those values are definitely worth having available.

I think the argument 'allow_nan' is poorly spelled. Spelling it 'strict' would have been much better. Maybe 'conformant'. I'm not sure I hate the spelling enough to propose a change with a deprecation period, but I certainly wouldn't oppose that.

17.06.20 08:42, David Mertz wrote:
It is not the only thing which makes the Python implementation non-conforming to the standard or incompatible with other implementations.

1. The initial standard allowed only JSON objects and JSON arrays at the top level, but Python implementations allowed all values. Now the standard has been changed.

2. The initial standard allowed binary input and suggested an algorithm to determine the encoding (if it is one of UTF-8, UTF-16, UTF-32 with variations). The current standard requires UTF-8 encoding. The Python implementation uses the above algorithm (with variation). You can also use an arbitrary explicit encoding.

3. The Python implementation supports integers of arbitrary size. Other implementations can be limited to 32- or 64-bit integers.

4. The Python implementation is limited to the precision and range of IEEE-754 for non-integer numbers. Other implementations can support larger precision and range.

5. The Python implementation supports single surrogate characters in strings. Other implementations can be limited.

6. The Python implementation can produce JSON objects with duplicated keys (and their order was unspecified before 3.6), for example when serializing {1: 1, "1": 2} (see the demonstration below).

So there is more than one meaning in the term "strict", and it may be changed with changing the JSON standard.
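For point 6, a quick demonstration of the duplicated-key behaviour with the stock json module (key order may vary on older Pythons):

    import json

    # 1 and "1" are distinct dict keys in Python, but both stringify to "1",
    # so the encoder emits a JSON object with a duplicated key.
    print(json.dumps({1: 1, "1": 2}))
    # -> {"1": 1, "1": 2}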

You raise valid points - notably about how JSON isn't a great target in general.

On 6/17/20 8:42 AM, Serhiy Storchaka wrote:
This seems like a non-issue now, though - if we're explicitly making the decision that RFC8259 is what we refer to as the standard Python should conform to.
Good point - but the non-conformity is on the deserialization side, and serialization/encoding is conformant (with an option to break conformity). I still feel strongly about focusing on the encoding path for now (while responsibly keeping in mind the state of the decoding/deserialization path - trying to also fix it if some sort of rfc-compatibility is indeed implemented).
3, 4 & 5 are unfortunate effects of the JSON spec being Not Very Good. However, we can still strive to declare conformity to the spec while being potentially incompatible with some other implementations.
RFC 8259 doesn't prohibit this [1] by weaseling out^W^Wsaying that they SHOULD be unique. So this is technically conformant, but I personally wouldn't mind some way of optionally making serialization more strict in this regard.
So there is more than one meaning in the term "strict", and it may be changed with changing the JSON standard.
So I think there are two separate things here:

1) Strictness wrt the standard: with allow_nan set to False I can't see how the implementation is not strict/conformant wrt RFC 8259, at least on the serialization side - so I don't see any issues with evolving this option into a strict/conformant mode. Naturally the question arises of whether replacements to RFC 8259 won't break this compatibility - but this is something that is possible with any standard. JSON does indeed not have the greatest history in this regard, but with RFC 8259 I feel that we can hope for a glimpse of stability and responsible future revisions.

2) Compatibility with other implementations: as you noted in your points 3, 4, 5 and 6, even strict conformity to the standard does not guarantee interoperability with other implementations for some data. However, I still think that this isn't a problem if strictness/conformity is explicitly defined as conforming to a given standard, while explaining that even then there is no guarantee of interoperability.

Perhaps 'strict' isn't the best term because of its vagueness - but would you agree that some sort of named, conformant-with-RFC-8259 option (e.g. 'conformant', or 'rfc_compat{,ible,ibility}') would be a better choice than 'allow_nan=False'?

[1] - https://tools.ietf.org/html/rfc8259#section-4 “The names within an object SHOULD be unique.”

On Wed, Jun 17, 2020 at 09:18:00AM +0300, Serhiy Storchaka wrote:
It won't break anything that doesn't actually include NANs or INFs.

In another post, you replied to David Mertz:

"1. The initial standard allowed only JSON objects and JSON arrays at the top level, but Python implementations allowed all values. Now the standard has been changed."

"2. The initial standard allowed binary input and suggested an algorithm to determine the encoding (if it is one of UTF-8, UTF-16, UTF-32 with variations). The current standard requires UTF-8 encoding."

So the standard has changed in the past. Breakage can be managed.

I don't know anyone who likes the fact that JSON cannot round-trip all Javascript numbers. It's a source of pain to anyone who deals with such numbers where infinities and NANs might appear.

JSON has changed in the past, breaking backwards compatibility, and I dare say it will change again in the future. Why is it unthinkable for this issue? There might be a good reason. But it's not obvious to me what that would be, other than "But we've always done it this way". -- Steven

27.06.20 10:23, Steven D'Aprano wrote:
If it does not include NANs or INFs you do not need the ability to serialize them.
Javascript numbers are unrelated to Python. Python floats include NaNs and infinities, and it may be a problem if they occur in serialized data. But Python has a non-standard extension (enabled by default for historical reasons) to solve this problem. It is important to understand that it is a non-standard extension, and the result may not be compatible with other JSON implementations. Maybe future JSON standards will support NaNs and infinities, and other things. But this is not our business. The json module just implements the current standard, and supports compatibility with past implementations.

On 6/27/20 8:51 AM, Serhiy Storchaka wrote:
Another way to look at this is that currently, the JSON standard doesn't handle some values (like NAN and INF) that a Python program might try to send to a JSON encoder. The JSON encoder has a couple of choices of what to do:

1) (The current behaviour) Use an 'extension' to the JSON standard to output the value in a way that JSON doesn't strictly allow.

2) It could quietly accept the value, and output some legal value, making the JSON not round trip, and perhaps be considered 'lying' on the output. This could be something like outputting the maximum real value for infinity (or even a number bigger so it might actually come back as infinity). NAN might be harder to think of a reasonable value; some might use 0 or -9999. (A sketch of this approach follows below.)

3) It could complain (loudly or quietly) about the bad value, and if it quietly complained, it might generate an output based on one of the previous methods. Quietly would be returning an error code, loudly would be throwing an exception.

Depending on the actual application, any of these could be the 'right' choice. Fundamentally, we end up back at the maxim of being generous in what we accept, and rigorous in what we generate, but run into the quandary of: if we accept a value that JSON can't handle, what is the right trade-off of being generous and rigorous (we CAN'T be totally both). -- Richard Damon
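As a concrete illustration of option 2 (and of the ECMA-262 null behaviour quoted earlier in the thread), here is a small, purely illustrative pre-processing helper that maps non-finite floats to None/null before encoding; the name sanitize_non_finite is made up for this sketch:

    import json
    import math

    def sanitize_non_finite(obj):
        # Replace NaN/Infinity with None (JSON null), recursing into containers.
        if isinstance(obj, float) and not math.isfinite(obj):
            return None
        if isinstance(obj, dict):
            return {k: sanitize_non_finite(v) for k, v in obj.items()}
        if isinstance(obj, (list, tuple)):
            return [sanitize_non_finite(v) for v in obj]
        return obj

    data = {"temp": float("nan"), "ratio": float("inf"), "ok": 1.5}
    print(json.dumps(sanitize_non_finite(data), allow_nan=False))
    # -> {"temp": null, "ratio": null, "ok": 1.5}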

On Tue, Jun 16, 2020, 8:42 PM Steven D'Aprano <steve@pearwood.info> wrote:
I do not support changing the default in the Python library. However, I do want to note that JSON is not "broken."

The data that gets represented in JSON may have the type "Number" (RFC 7159). That is, there is no concept of IEEE-754 floating point, or integer, or Decimal, etc. There is simply the type "Number" that a particular programming language may approximate in one of its available types. The type Number has no specific semantics, only a grammar for what sequences of characters qualify. There is a large intersection with the things representable in IEEE-754, but it's neither a subset nor a superset of that.

From the RFC:

    A JSON number such as 1E400 or 3.141592653589793238462643383279 may
    indicate potential interoperability problems, since it suggests that
    the software that created it expects receiving software to have
    greater capabilities for numeric magnitude and precision than is
    widely available.

On the other side, strings like -inf, Infinity, or NaN are not matched by the grammar.
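The interoperability caveat in that RFC passage is easy to see in CPython's float-backed parsing; a quick, illustrative session (exact output may vary slightly by platform):

    import json

    # The grammar happily matches 1E400, but a 64-bit float can't hold it,
    # so the default parse turns it into infinity on the way in.
    print(json.loads('1E400'))
    # -> inf

    # Likewise, digits beyond double precision are silently dropped.
    print(json.loads('3.141592653589793238462643383279'))
    # -> 3.141592653589793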

no, the json encoder shouldn't change. But a clear warning in the docs is a fine idea.

On Tue, Jun 16, 2020 at 7:21 PM David Mertz <mertz@gnosis.cx> wrote:
Well, maybe not "Decimal", but it IS, in fact decimal -- i.e. base 10 -- it can only exactly represent values that can be exactly represented in base 10. Which is why, performance aside, it would be better if a JSON number mapped to a Python Decimal, rather than a float. Which I'd still like to see happen, at least optionally. -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Tue, Jun 16, 2020 at 11:55 PM Christopher Barker <pythonchb@gmail.com> wrote:
I strongly agree. simplejson has `use_decimal=True`, which is straightforward. The standard library would require users to painfully define their own subclasses of JSONEncoder and JSONDecoder that have the desired Decimal behavior (passing in the `cls=` argument to use the custom classes). It's doable, but it should *just work* with a switch instead. -- The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.
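For the decode direction, at least, the standard library does already accept a hook without subclassing (encoding Decimal values is where the custom-encoder pain comes in); a small illustration:

    import json
    from decimal import Decimal

    # parse_float is called with the literal text of each JSON number that
    # has a fraction or exponent, so no binary-float precision is lost.
    doc = json.loads('{"price": 0.1, "qty": 3}', parse_float=Decimal)
    print(doc)
    # -> {'price': Decimal('0.1'), 'qty': 3}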

participants (10): Antoine Pitrou, Chris Angelico, Christopher Barker, David Mertz, Greg Ewing, Guido van Rossum, Richard Damon, Serge Bazanski, Serhiy Storchaka, Steven D'Aprano