On Aug 8, 2019, at 03:22, Richard Musil firstname.lastname@example.org wrote:
What matters is that I did not find a way to fix it with the standard `json` module. I have the JSON file generated by another program (C++ code, which uses the nlohmann/json library), which serializes one of the floats to the value above.
If anyone wants to know why the last digit matters (or why I cannot double quote the floats), it is because the file has a secure hash attached and this basically breaks it.
If you need to exactly match a JSON file byte for byte, you really shouldn’t rely on parsing it and re-creating it in the first place, and especially not with two different libraries.
For one thing, your C++ library is apparently using a different rounding mode when representing floats than Python’s default round-to-even. But different libraries also have different rules for when they switch to exponential notation, and how they write it. And a C++ library may well represent 64-bit integers above 1<<53 imprecisely (if it stores every number as a double), while Python won’t. And, beyond numbers, different libraries produce different white space, different ordering within dicts, and different escaped representations of strings (not to mention how they handle things like “\uDEAD”, which the spec says is legal but doesn’t tell you how to interpret, because it doesn’t map to any Unicode character). There’s no way to guarantee that dumps(loads(x)) == x, even if you use Decimal instead of float.
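To make that concrete, here is a small sketch using only the standard `json` module. The helper name `round_trips` and the sample strings are mine, chosen for illustration; they are not taken from your file.

```python
import json

def round_trips(text):
    """Return True if parsing and re-serializing reproduces the exact input text."""
    return json.dumps(json.loads(text)) == text

# Illustrative inputs: each is valid JSON, but not written the way
# json.dumps would write it with its default settings.
samples = [
    '{"a":1}',    # no space after the colon
    '1e2',        # exponent notation for a float
    '"\\/"',      # escaped slash, which is legal but optional
    '"\u00e9"',   # a raw non-ASCII character (dumps escapes it by default)
    '{"a": 1}',   # already in the default dumps format
]

for text in samples:
    rewritten = json.dumps(json.loads(text))
    print(repr(text), "->", repr(rewritten), "same" if round_trips(text) else "DIFFERENT")

# Integers above 1<<53 survive in Python because loads() produces an int,
# but the same digits stored as a double lose the last bit.
big = json.loads('9007199254740993')
print(big, float(big))
```

Only the last sample comes back byte for byte; the others parse to the same data but serialize differently.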
And this isn’t really a limitation of either of the libraries you’re using; it’s the way JSON is supposed to work, by design. Even if both libraries follow all of the interoperability recommendations in the RFC, they’re still not expected to produce the same bytes for the same input.
Usually you just shouldn’t be hashing JSON files. But sometimes you have to, to fit into a poorly-designed ecosystem that you can’t change. In that case, suppose your goal is to write a program that sometimes makes a substantive change (in which case you want to re-sign the package, or tell the client there’s an update to download, etc.), but usually doesn’t, and you want it to leave the file byte-for-byte unchanged (so you don’t need to re-sign, re-download, etc.). Then the best thing to do is check that the dict is unchanged and, if so, not write the file at all, or write back the original un-parsed string.
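A minimal sketch of that approach, assuming the file is UTF-8 and that your program's changes can be expressed as a function that edits the parsed data in place. The names `update_json_file` and `mutate` are hypothetical, not part of any library.

```python
import copy
import json
from pathlib import Path

def update_json_file(path, mutate):
    """Apply mutate(data) to the parsed JSON; rewrite the file only on a real change.

    Returns True if the file was rewritten, False if it was left untouched.
    """
    original_text = Path(path).read_text(encoding="utf-8")
    original = json.loads(original_text)

    # Work on a copy so the parsed original stays available for comparison.
    updated = copy.deepcopy(original)
    mutate(updated)  # hypothetical callback that edits the data in place

    if updated == original:
        # Nothing substantive changed: leave the original bytes (and hash) alone.
        return False

    # A real change: the bytes will differ anyway, so re-serialize and re-sign.
    Path(path).write_text(json.dumps(updated), encoding="utf-8")
    return True
```

Note that the comparison uses Python equality, so replacing 1 with 1.0 would count as unchanged, and a NaN anywhere in the data would make it always look changed. If either matters for your files, you would need a stricter comparison.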