On Mon, Aug 12, 2019 at 8:40 PM Andrew Barnert <abarnert@yahoo.com> wrote:

Although normalization can be a problem, there’s a much simpler—and more common—issue: what to escape, and how to escape it.

Yup -- I realized this after writing that post -- thanks for fleshing it out.

This tells me that trying to use the json module to create the exact same JSON is a pretty bad idea.
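
To make the escaping point concrete, here is a minimal sketch using only the standard json module. The sample strings are made up for illustration. Several spellings parse to the same value, but json.dumps can only write one of them back:

```python
import json

# Three spellings of the same one-character string inside a JSON document.
docs = ['"\\u00e9"', '"\\u00E9"', '"é"']
values = [json.loads(d) for d in docs]
same_value = all(v == "é" for v in values)  # all three parse to "é"

# json.dumps emits exactly one spelling per ensure_ascii setting.
escaped = json.dumps("é")                      # lowercase \u00e9 escape
literal = json.dumps("é", ensure_ascii=False)  # the raw character

# "/" may legally be written as "\/" in JSON. json.loads accepts it,
# but json.dumps never writes it, so the original bytes are lost.
slash_in = '"a\\/b"'
slash_out = json.dumps(json.loads(slash_in))   # '"a/b"'
```

So a document that used the uppercase escape or the escaped slash cannot be reproduced byte for byte after a load and dump, even though nothing in the data changed.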

On the other hand, if it did allow user code to control exactly how an object is represented, then a user could create a specific "version", and round-trip anything. I just don't think that's really the job of the json module -- I'd rather have it work harder to enforce valid JSON.

And really, the same is true for array and object whitespace. At least most libraries are consistent in how they use whitespace, and most rules can be covered by the simple separators hook, so if you know which library generated the document, you can usually write code to reproduce the whitespace. But if you have to work with the output of two different libraries? Or a library you don’t know? Or JSON edited by hand or by sed scripts that might not even be consistent?
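
For reference, a short sketch of what the separators hook covers, with made-up sample data. It switches between the default spacing and the fully compact form:

```python
import json

data = {"a": [1, 2], "b": {"c": None}}

# Default style: a space after each comma and after each colon.
default = json.dumps(data)

# Compact style: no whitespace at all.
compact = json.dumps(data, separators=(",", ":"))

print(default)  # {"a": [1, 2], "b": {"c": null}}
print(compact)  # {"a":[1,2],"b":{"c":null}}
```

The separators argument only controls the item and key separators. A style such as padding just inside the braces, as in `{ "a": 1 }`, is not reachable with it as far as I know.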

I think the OP suggested that they remove all whitespace to normalize it. Either way, these are more good reasons why trying to hash JSON to check for changes is touchy at best, and maybe impossible.
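
If the goal is only to detect changes in the data, one workaround is to hash a re-serialized form rather than the original text. This is a rough sketch, not something the json module promises to keep stable across versions. The helper name content_digest and the sample documents are hypothetical:

```python
import hashlib
import json


def content_digest(text):
    """Hash the parsed value of a JSON document, not its original bytes."""
    value = json.loads(text)
    # One fixed spelling: sorted keys, no whitespace, ASCII-only escapes.
    canonical = json.dumps(
        value, sort_keys=True, separators=(",", ":"), ensure_ascii=True
    )
    return hashlib.sha256(canonical.encode("ascii")).hexdigest()


# Same data, different key order, whitespace, and escaping.
a = '{"name": "caf\\u00e9", "n": [1, 2]}'
b = '{ "n":[1,2],\n  "name":"café" }'
# Genuinely different data.
c = '{"name": "cafe", "n": [1, 2]}'

print(content_digest(a) == content_digest(b))  # True
print(content_digest(a) == content_digest(c))  # False
```

This only says that two documents parse to equal Python values. It treats 1 and 1.0 as different, it keeps only the last of any duplicate keys, and it tells you nothing about whether the original bytes matched.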

-CHB

Christopher Barker, PhD

Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython