The hash is calculated over the "normalized" JSON output, where "normalized" basically means stripped of all whitespace by the "generator". This is as canonical as it gets. The same data are then transmitted in "loose" form, i.e. with some indentation so that they are human-readable. The other party has two options for verifying the hash (both are sketched in code after the list).
1) Treat the file as plain text, remove all the whitespace (which means doing some hardcoded, primitive "JSON parsing", probably very limited and very error-prone), and recalculate the hash from the result. Since this only uses the data already present in the original text input, it cannot corrupt or change them in any way; it just needs to know how to remove the whitespace correctly (e.g. without touching whitespace inside string values).
2) Use a JSON decoder to decode it (hopefully without losing anything in the process), then dump it back into the "normalized" form and compute the hash over that. This carries the risk of conversion errors, but if I could avoid that risk by using a custom type which does not have such an error, it would be a much easier and more maintainable solution.
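A minimal Python sketch of both verification routes, just to make the trade-off concrete. It assumes SHA-256 and assumes "normalized" means a compact dump with no whitespace between tokens; the function names are illustrative. Whether the verifier's re-dump in option 2 reproduces the generator's exact key order, escaping, and number formatting is precisely the "conversion error" risk mentioned above.

```python
import hashlib
import json

def normalized_hash(obj):
    """Hash over a compact, whitespace-free JSON dump (assumed normalization)."""
    normalized = json.dumps(obj, separators=(',', ':'))  # no spaces after ',' or ':'
    return hashlib.sha256(normalized.encode('utf-8')).hexdigest()

# Option 2: decode the loose text, re-dump it in normalized form, hash that.
# Risk: key order, escaping, and number formatting must match the generator's
# output exactly, otherwise the hash differs even though the data are "the same".
def verify_via_decode(loose_text, expected_hash):
    obj = json.loads(loose_text)
    return normalized_hash(obj) == expected_hash

# Option 1: strip whitespace from the text itself, without fully decoding it.
# Only whitespace outside string literals may be removed.
def strip_json_whitespace(text):
    out = []
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == '\\':
                escaped = True
            elif ch == '"':
                in_string = False
        else:
            if ch == '"':
                in_string = True
                out.append(ch)
            elif ch not in ' \t\r\n':
                out.append(ch)
    return ''.join(out)

def verify_via_stripping(loose_text, expected_hash):
    stripped = strip_json_whitespace(loose_text)
    return hashlib.sha256(stripped.encode('utf-8')).hexdigest() == expected_hash
```

Note how option 1 never re-encodes anything: it hashes a subset of the bytes that were actually transmitted, which is why it cannot introduce conversion errors.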
Re-encoding the data into some other format (binary or textual) for the hash would just add another level of complexity and would face exactly the same issues. Moreover, the goal of the hash is to protect the information in its transmitted form (i.e. its textual form), because that is the only form available to both the sender and the receiver, and not to authenticate some other representation of the same data which may be subject to "rounding errors" depending on the situation.
But as I said, discussing this was not the point of the OP.