On Mon, Sep 14, 2020 at 9:20 AM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:

Christopher Barker writes:

> On Sun, Sep 13, 2020 at 7:58 AM Stephen J. Turnbull <

> turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:

>

> > > encoding=None: this is the important one -- json is always UTF-8 yes?

> >

> > Standard JSON is always UTF-8. Nevertheless, I'm quite sure that

> > there's a ton of Japanese in Shift JIS, including some produced by

> > default in Python on Windows. I'll bet the same is true of GBK for

> > Chinese, and maybe even ISO-8859-1 in Europe.

>

> So what should the json lib do with these?

Well, I'm a mail guy from way back, so I'm with Mr. Postel: be

libertine in what you accept, puritan in what you emit. I think given

the current architecture of json, dump and load are fine as is, dump

should be discouraged (but not removed!) in favor of dumpf, and dumpf

and loadf should provide no option but UTF-8.
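
To make the "no option but UTF-8" idea concrete, here is a minimal sketch. It assumes dumpf and loadf are path-based helpers as proposed in this thread; they are not part of the json module today, and the signatures below are hypothetical. The only point is that the encoding is fixed and not exposed as a parameter.

```python
import json
import os
import tempfile


def dumpf(obj, path, **kwargs):
    # Hypothetical helper (not in the stdlib): always writes UTF-8,
    # and deliberately offers no encoding argument.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(obj, f, **kwargs)


def loadf(path, **kwargs):
    # Hypothetical helper (not in the stdlib): always reads UTF-8,
    # so a Shift JIS or GBK file fails loudly instead of being guessed at.
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f, **kwargs)


# Round trip through a temporary file to show the bytes on disk are UTF-8.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.json")
    dumpf({"名前": "太郎"}, path, ensure_ascii=False)
    round_tripped = loadf(path)
    with open(path, "rb") as f:
        raw_bytes = f.read()
```

Anything that needs a different encoding would keep using dump and load with a file object the caller opened.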

I just wanted to point out that it's very likely that there's a lot of

"JSON-like" data out there, and probably a lot of "unwritten

protocols" that expect it. While nobody has proposed removing dump

and load, I don't want them deprecated or discouraged for the purpose

of dealing with "JSON-like" data, especially not load.
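
As an illustration of why load stays useful for "JSON-like" data, here is a small sketch that reads a Shift JIS payload by handing json.load a text stream opened with an explicit encoding. The sample payload is invented for the example; an in-memory buffer stands in for a real file.

```python
import io
import json

# Invented sample: JSON-like text that was saved in Shift JIS, not UTF-8.
legacy_bytes = '{"名前": "太郎"}'.encode("shift_jis")

# json.load only needs an object with .read() returning str, so the caller
# chooses the encoding when opening the stream.
with io.TextIOWrapper(io.BytesIO(legacy_bytes), encoding="shift_jis") as f:
    legacy_data = json.load(f)

# The same bytes are not valid UTF-8, which is what a UTF-8-only reader
# would run into.
try:
    legacy_bytes.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```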