[Python-ideas] Re: A shortcut to load a JSON file into a dict : json.loadf

Sept. 13, 2020


On 2020-09-13 11:57, Christopher Barker wrote:
> ...
> On Sun, Sep 13, 2020 at 7:58 AM Stephen J. Turnbull
> <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
>
>     > encoding=None: this is the important one -- json is always UTF-8
>     > yes?
>
>     Standard JSON is always UTF-8.  Nevertheless, I'm quite sure that
>     there's a ton of Japanese in Shift JIS, including some produced by
>     default in Python on Windows.  I'll bet the same is true of GBK for
>     Chinese, and maybe even ISO-8859-1 in Europe.
>
> So what should the json lib do with these? It could have an encoding
> parameter with utf-8 as default. Or it could require that the user open
> the file themselves if it's not UTF-8.
>
> BTW: I noticed that json.loads() takes:
>
>     Deserialize ``s`` (a ``str``, ``bytes`` or ``bytearray`` instance
>     containing a JSON document) to a Python object.
>
> A str is an str (already Unicode, yes?) -- but for bytes, it must be
> assuming some encoding, presumably UTF-8, but it doesn't seem to have a
> way to specify one -- so this is already a missing feature.
It's not a missing feature, because the JSON spec requires UTF-8.  If 
it's not UTF-8, it's invalid JSON.  If a user wants to handle a file 
that looks sort of like JSON but technically isn't because it's not 
UTF-8, it's on the user to first convert the file to UTF-8 before 
bringing JSON into the picture.
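
A minimal sketch of both options from this exchange, assuming Python 3.6+ (the version in which json.loads started accepting bytes). When given bytes, json.loads detects UTF-8, UTF-16 or UTF-32 from the leading bytes and has no parameter for naming any other encoding, so Shift JIS bytes fail to decode. `loadf` and `load_legacy_json` are hypothetical helpers for illustration only; json.loadf is just the proposal under discussion and is not in the standard library.

```python
import json
import os
import tempfile


def loadf(path, encoding="utf-8", **kwargs):
    # HYPOTHETICAL helper: models the "encoding parameter with utf-8 as
    # default" option. Not part of the standard json module.
    with open(path, encoding=encoding) as f:
        return json.load(f, **kwargs)


def load_legacy_json(raw, source_encoding):
    # HYPOTHETICAL helper for the "convert it first" option: decode the
    # non-UTF-8 bytes to str, then let json parse the resulting text.
    return json.loads(raw.decode(source_encoding))


doc = {"city": "東京"}
text = json.dumps(doc, ensure_ascii=False)  # '{"city": "東京"}'

# UTF-8 bytes, and UTF-16/UTF-32 bytes with a BOM, are decoded automatically.
print(json.loads(text.encode("utf-8")) == doc)   # True
print(json.loads(text.encode("utf-16")) == doc)  # True

# Shift JIS bytes are decoded as UTF-8 and rejected.
sjis = text.encode("shift_jis")
try:
    json.loads(sjis)
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)

# Decoding to str first, then parsing, works.
print(load_legacy_json(sjis, "shift_jis") == doc)  # True

# The hypothetical loadf() does the same thing for a file on disk.
with tempfile.NamedTemporaryFile("wb", suffix=".json", delete=False) as tmp:
    tmp.write(sjis)
try:
    print(loadf(tmp.name, encoding="shift_jis") == doc)  # True
finally:
    os.remove(tmp.name)
```

Either design comes down to the same step: the bytes are decoded to text with the correct codec before the JSON parser ever sees them.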

-- 
Brendan Barnwell
"Do not follow where the path may lead.  Go, instead, where there is no 
path, and leave a trail."
    --author unknown