On Sun, Sep 13, 2020 at 7:58 AM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
 > encoding=None: this is the important one -- json is always UTF-8 yes?

Standard JSON is always UTF-8.  Nevertheless, I'm quite sure that
there's a ton of Japanese in Shift JIS, including some produced by
default in Python on Windows.  I'll bet the same is true of GBK for
Chinese, and maybe even ISO-8859-1 in Europe.

So what should the json lib do with these? It could take an encoding parameter that defaults to UTF-8. Or it could require users to open the file themselves when it's not UTF-8.
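
For reference, the "open the file yourself" option already works today. A minimal sketch, assuming a file written in Shift JIS (the file name and the data are made up for illustration):

```python
import json
import os
import tempfile

# Hypothetical data: write a JSON document encoded as Shift JIS.
data = {"name": "日本語"}
path = os.path.join(tempfile.mkdtemp(), "example_sjis.json")
with open(path, "w", encoding="shift_jis") as f:
    json.dump(data, f, ensure_ascii=False)

# The user picks the encoding when opening the file;
# json.load() only ever sees decoded text (str).
with open(path, encoding="shift_jis") as f:
    loaded = json.load(f)
```

The json module never has to know the encoding here, which is the argument for leaving that choice to open().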

BTW: I noticed that json.loads() takes:

Deserialize ``s`` (a ``str``, ``bytes`` or ``bytearray`` instance
containing a JSON document) to a Python object.

A str is a str (already Unicode, yes?) -- but for bytes, it must be assuming some encoding, presumably UTF-8. It doesn't seem to have a way to specify one, so this is already a missing feature.
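
As far as I can tell from the docs, bytes input to json.loads() is expected to be UTF-8, UTF-16 or UTF-32, so anything else has to be decoded by hand first. A small sketch of that workaround, assuming Shift JIS input (the sample document is made up):

```python
import json

text = '{"city": "東京"}'
raw = text.encode("shift_jis")  # bytes in a non-UTF encoding

# Passing the raw bytes straight in fails: they are not valid UTF-8.
try:
    json.loads(raw)
    failed = False
except UnicodeDecodeError:
    failed = True

# json.loads() has no encoding parameter, so decode explicitly first.
obj = json.loads(raw.decode("shift_jis"))
```

So the workaround is a one-liner, but the caller has to know to do it.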

-CHB

--
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython