Christopher Barker writes:
On Sun, Sep 13, 2020 at 7:58 AM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
encoding=None: this is the important one -- json is always UTF-8 yes?
Standard JSON is always UTF-8. Nevertheless, I'm quite sure that there's a ton of Japanese in Shift JIS, including some produced by default in Python on Windows. I'll bet the same is true of GBK for Chinese, and maybe even ISO-8859-1 in Europe.
So what should the json lib do with these?
Well, I'm a mail guy from way back, so I'm with Mr. Postel: be libertine in what you accept, puritan in what you emit. I think given the current architecture of json, dump and load are fine as is, dump should be discouraged (but not removed!) in favor of dumpf, and dumpf and loadf should provide no option but UTF-8. I just wanted to point out that it's very likely that there's a lot of "JSON-like" data out there, and probably a lot of "unwritten protocols" that expect it. While nobody has proposed removing dump and load, I don't want them deprecated or discouraged for the purpose of dealing with "JSON-like" data, especially not load.
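To make the split concrete, here is a rough sketch of how I picture it. Note that loadf and dumpf do not exist in the json module today; the two functions below are only my guess at what the proposed helpers might look like, and the file names and sample data are made up for illustration. The existing json.load stays the liberal path, because the caller opens the file and so picks the encoding.

```python
import json
import os
import tempfile


def loadf(path):
    """Hypothetical strict reader: the file must be UTF-8, no option offered."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def dumpf(obj, path):
    """Hypothetical strict writer: always emits UTF-8."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(obj, f, ensure_ascii=False)


data = {"名前": "東京"}  # made-up sample record

with tempfile.TemporaryDirectory() as tmp:
    # Simulate "JSON-like" legacy data written in Shift JIS.
    legacy = os.path.join(tmp, "legacy.json")
    with open(legacy, "w", encoding="shift_jis") as f:
        json.dump(data, f, ensure_ascii=False)

    # Liberal in what you accept: json.load reads from whatever text
    # stream the caller opened, so the caller chooses the encoding.
    with open(legacy, encoding="shift_jis") as f:
        loaded = json.load(f)

    # Strict in what you emit: re-save as UTF-8 and read it back.
    strict = os.path.join(tmp, "strict.json")
    dumpf(loaded, strict)
    roundtrip = loadf(strict)

    # The strict reader refuses the legacy file, since the Shift JIS
    # bytes for these characters are not valid UTF-8.
    try:
        loadf(legacy)
        strict_rejected = False
    except UnicodeDecodeError:
        strict_rejected = True
```

The point of the sketch is only that the two roles can coexist: load keeps working for the unwritten protocols, while the path-based helpers never have to grow an encoding parameter.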