[Python-ideas] Re: A shortcut to load a JSON file into a dict : json.loadf

Sept. 15, 2020

      On Tue, Sep 15, 2020 at 9:09 AM Wes Turner <wes.turner@gmail.com> wrote:
...
json.load and json.dump already default to UTF8 and already have
parameters for json loading and dumping.
yes, of course.

json.loads and json.dumps exist only because there was no way to
distinguish between a string containing JSON and a file path string.
(They probably should've been .loadstr and .dumpstr, but it's too late for
that now)
I think they exist because that was the pickle API from years ago -- though
maybe that's why the pickle API had them. Though I think you have it a bit
backwards -- you can't pass a path into loads/dumps for that reason. If
they were created because that distinction couldn't be made, then load/dump
would have accepted a string path back in the day.

TBH, I think it would be great to just have .load and .dump read the file
with standard params when a path-like ( hasattr(obj, '__fspath__') ) is
passed, but the suggested disadvantages of this are:
- https://docs.python.org/3/library/functions.html#open
...
The default encoding is platform dependent (whatever
locale.getpreferredencoding() returns), but any text encoding supported by
Python can be used. See the codecs module for the list of supported
encodings.
that's not a reason at all -- the reason is that some folks think
overloading a function like this is bad API design. And it's been the way
it's been for a long time, so probably better to add a new function(s),
rather than extend the API of an existing one.
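
For illustration only, the "new function(s)" idea could look roughly like the sketch below. The names loadf and dumpf come from this thread; their signatures and the fixed UTF-8 encoding are assumptions here, not an existing json API.

```python
import json
import os
import tempfile


def loadf(path, **kwargs):
    """Hypothetical helper: read JSON from a path, always decoding as UTF-8."""
    with open(path, "r", encoding="utf-8") as fp:
        return json.load(fp, **kwargs)


def dumpf(obj, path, **kwargs):
    """Hypothetical helper: write JSON to a path, always encoding as UTF-8."""
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(obj, fp, **kwargs)


# Round trip through a temporary file (the file name is arbitrary).
data = {"name": "café", "values": [1, 2, 3]}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.json")
    dumpf(data, path, ensure_ascii=False)
    restored = loadf(path)
```

Passing encoding="utf-8" explicitly avoids the platform-dependent default of open() quoted above, and json.load / json.dump keep their current signatures.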
...
- .load and .dump don't default to UTF8?
  AFAIU, they do default to UTF-8. Or do they currently default to
locale.getpreferredencoding() instead of the JSON spec(s)?
  * encoding= was removed from .loads and was never accepted by json.load or
json.dump
I think dump defaults to UTF-8. But load is a bit odd (and not that well
documented).

It appears to accept a file-like object whose read() method returns either a
string or a bytes object. If it returns a string, then the decoding is already
done. If bytes, then I assume that it's using UTF-8.

This, by the way, should be better documented.
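
A quick way to check the behaviour described above is to use in-memory file-like objects, one returning str and one returning UTF-8 bytes from read(). This is only a small experiment, not a statement of what the documentation guarantees.

```python
import io
import json

text = '{"greeting": "héllo"}'

# read() returns str: no decoding is needed inside json.load.
from_text = json.load(io.StringIO(text))

# read() returns bytes: json.load decodes the UTF-8 data itself.
from_bytes = json.load(io.BytesIO(text.encode("utf-8")))
```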
...
- .load and .dump would also need to accept an encoding= parameter for
non-spec data that don't want to continue handling the file themselves
  - pickle.load has an encoding= parameter
.loads doesn't now, so I don't see why they would need to with the proposed
change. You can always encode/decode ahead of time however you want, either
in the file-like object or by passing decoded str to .loads/dumps.
...
- Should we be using open(pth, 'rb') and open(pth, 'wb')? (Binary mode)
No, I think that's clear. In fact, you can't currently dump to a binary
file:

In [26]: json.dump(obj, open('tiny-enc.json', 'wb'))

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-26-02e9bcd47a3e> in <module>
----> 1 json.dump(obj, open('tiny-enc.json', 'wb'))

~/miniconda3/envs/py3/lib/python3.8/json/__init__.py in dump(obj, fp,
skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators,
default, sort_keys, **kw)
    178     # a debuggability cost
    179     for chunk in iterable:
--> 180         fp.write(chunk)
    181
    182

TypeError: a bytes-like object is required, not 'str'

That's the beauty of Python 3's text model :-)
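
As a sketch of a workaround, a binary stream can be wrapped in a text layer with an explicit encoding before it is handed to json.dump. Here io.BytesIO stands in for a file opened with 'wb'; that substitution is only for the sake of a self-contained example.

```python
import io
import json

obj = {"key": "value"}

# Writing straight to a binary stream fails, as in the traceback above.
try:
    json.dump(obj, io.BytesIO())
    binary_write_failed = False
except TypeError:
    binary_write_failed = True

# Wrapping the binary stream in a text layer makes the encoding explicit.
binary_buffer = io.BytesIO()
text_layer = io.TextIOWrapper(binary_buffer, encoding="utf-8")
json.dump(obj, text_layer)
text_layer.flush()  # push buffered text down into the binary stream
raw = binary_buffer.getvalue()
```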

JSON Specs:
...
- https://tools.ietf.org/html/rfc7159#section-8.1  :
...
JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32.  The default
   encoding is UTF-8,
So THAT is interesting. But the current implementation does not directly
support anything but UTF-8, and I think it's fine that that still be the
case. If anyone is using the other two, it's an esoteric case, and they can
encode/decode by hand.
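
For the rare UTF-16 case, encoding and decoding "by hand" might look like the following sketch. The sample object is made up for illustration.

```python
import json

obj = {"city": "Zürich"}

# Serialize to str first, then choose the encoding explicitly.
encoded = json.dumps(obj, ensure_ascii=False).encode("utf-16")

# Decode explicitly, then parse the resulting str.
decoded = json.loads(encoded.decode("utf-16"))
```

The json module only ever sees str here, so the choice of encoding stays entirely with the caller.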
...
So, could we just have .load and .dump accept a path-like and an
encoding= parameter (because they need to be able to specify UTF-8 / UTF-16
/ UTF-32 anyway)?
These are separate questions, but I'll say:

Yes, it could take a path-like. But I think there was not much support for
that in this discussion.

No -- there is no need for an encoding parameter -- the other two options are
rare and can be done by hand.

BTW: .dumps() dumps to, well, a string, so it's not assuming any encoding.
A user can encode it any way they want when passing it along.

This, in fact, is all very Python3 text model compatible -- the
encoding/decoding should happen as close to IO as possible.

If there were no backward compatibility concerns, and it were me, I would
only use strings in/out of the json module, but I think that ship has
sailed.

Anyway -- if anyone wants to push for overloading .load()/dump(), rather
than making two new loadf() and dumpf() functions, then speak now -- that
will take more discussion, and maybe a PEP.

-CHB

-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython