On Aug 12, 2019, at 15:18, Christopher Barker
If that is the goal, then strings would need a hook, too, as Unicode allows different normalized forms for some "characters" (see previous discussion for this; I, frankly, don't quite "get" it).
Although normalization can be a problem, there’s a much simpler, and more common, issue: what to escape, and how to escape it.

For example, many pre-ES5 JS implementations escaped forward slashes. This is legal, but unnecessary, and Python’s json module doesn’t do it. And you can’t make Python’s module do it. So, you load a JSON document with the string “abc\/def”, you get the Python string “abc/def”, you dump it and get a JSON document with “abc/def”. JSON treats the two documents as equivalent, but they obviously aren’t the same bytes.

Similar issues include case for hex letters in \u escapes, whether to use \u0008 instead of \b (and similar for all the other special escape sequences, but I think \b is the most commonly different one), whether to escape all non-ASCII (and what that means, since Python doesn’t count \x7f as ASCII for this purpose), whether to escape all non-BMP, whether to treat Unicode separators (or just the two that JS source doesn’t allow) as control characters, etc.

So, even if you only care about preserving the output of one specific library, and you know exactly what rule it uses for escaping, you still can’t reproduce it with Python’s json module.

And really, the same is true for array and object whitespace. At least most libraries are consistent in how they use whitespace, and most rules can be covered by the simple separators argument, so if you know which library generated the document, you can usually write code to reproduce the whitespace. But if you have to work with the output of two different libraries? Or a library you don’t know? Or JSON edited by hand or by sed scripts that might not even be consistent?
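
To make the escaping point concrete, here is a minimal sketch using only the stdlib json module. The input document is made up for illustration (it is not the output of any particular library); it just combines a few of the choices listed above: an escaped forward slash, \u0008 rather than \b, uppercase hex digits, and no whitespace.

```python
import json

# Hypothetical input, written the way some other encoder might have chosen:
# escaped forward slash, \u0008 instead of \b, uppercase hex, no whitespace.
original = '{"path":"abc\\/def","ctl":"\\u0008","name":"caf\\u00E9"}'

data = json.loads(original)
print(data["path"])  # the escape is gone once the string is decoded

# Default whitespace, and the compact form via the separators argument.
default_dump = json.dumps(data)
compact_dump = json.dumps(data, separators=(",", ":"))
print(default_dump)
print(compact_dump)

# The dumped document decodes to the same value, but the text differs.
same_value = json.loads(compact_dump) == data
same_text = compact_dump == original
print(same_value, same_text)
```

The separators argument is enough to get the whitespace back in this case, but nothing in dumps brings back the \/, the \u0008, or the uppercase hex digits, so the text still doesn’t match the input.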