
On Tue, Sep 15, 2020 at 9:09 AM Wes Turner <wes.turner@gmail.com> wrote:
json.load and json.dump already default to UTF-8 and already have parameters for JSON loading and dumping.
Yes, of course.
json.loads and json.dumps exist only because there was no way to distinguish between a string containing JSON and a file path string. (They probably should've been .loadstr and .dumpstr, but it's too late for that now.)
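To make the str-vs-file distinction concrete, here is a small sketch using only the standard json and io modules. It shows that loads() parses text it is handed, load() reads from a file-like object, and a path string handed to loads() is simply treated as (invalid) JSON text rather than opened.

```python
import io
import json

# json.loads parses a str (or bytes) that already contains JSON text.
from_string = json.loads('{"answer": 42}')

# json.load reads JSON from any file-like object that has a .read() method.
from_file = json.load(io.StringIO('{"answer": 42}'))

# A path passed to loads is never opened; it is parsed as JSON text and rejected.
try:
    json.loads("data.json")
except json.JSONDecodeError as exc:
    error_msg = exc.msg
```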
I think they exist because that was the pickle API from years ago -- though maybe that's why the pickle API had them. Though I think you have it a bit backwards -- you can't pass a path into loads/dumps for that reason. If they were created because that distinction couldn't be made, then load/dump would have accepted a string path back in the day.
TBH, I think it would be great to just have .load and .dump read the file with standard params when a path-like ( hasattr(obj, '__fspath__') ) is passed, but the suggested disadvantages of this are:
- https://docs.python.org/3/library/functions.html#open
The default encoding is platform dependent (whatever locale.getpreferredencoding() returns), but any text encoding supported by Python can be used. See the codecs module for the list of supported encodings.
That's not a reason at all -- the reason is that some folks think overloading a function like this is bad API design. And it's been the way it is for a long time, so it's probably better to add new function(s) rather than extend the API of an existing one.
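Because open() falls back to the locale's preferred encoding, code that opens JSON files itself today has to ask for UTF-8 explicitly. A minimal sketch of that pattern follows; the temporary file and sample data are illustrative only.

```python
import json
import os
import tempfile

obj = {"name": "Zoë", "values": [1, 2, 3]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.json")
    # Without encoding=, open() would use the platform/locale default,
    # so pass encoding="utf-8" explicitly to match the JSON default.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(obj, f, ensure_ascii=False)  # writes "ë" unescaped
    with open(path, "rb") as f:
        raw = f.read()  # inspect the bytes that actually hit the disk
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)
```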
- .load and .dump don't default to UTF-8? AFAIU, they do default to UTF-8. Or do they instead currently default to locale.getpreferredencoding() rather than the JSON spec(s)?
  - encoding= was removed from .loads and was never accepted by json.load or json.dump
I think dump defaults to UTF-8. But load is a bit odd (and not that well documented): it appears to accept a file-like object whose read() method returns either a str or a bytes object. If str, then the decoding has already been done; if bytes, then I assume that it's using UTF-8. This, by the way, should be better documented.
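For reference, the json module docs state that since Python 3.6 loads() accepts bytes or bytearray, with input expected to be UTF-8, UTF-16 or UTF-32, and load() hands whatever fp.read() returns to loads(). The sketch below (sample document is illustrative) checks both text and binary file objects.

```python
import io
import json

doc = '{"greeting": "héllo"}'

# Text-mode file object: .read() returns str, so json does no decoding.
from_text = json.load(io.StringIO(doc))

# Binary file objects: .read() returns bytes, and json detects the encoding.
from_utf8 = json.load(io.BytesIO(doc.encode("utf-8")))
from_utf16 = json.load(io.BytesIO(doc.encode("utf-16")))  # includes a BOM
```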
- .load and .dump would also need to accept an encoding= parameter for non-spec data, for users who don't want to continue handling the file themselves
  - pickle.load has an encoding= parameter
.loads doesn't have one now, so I don't see why they would need one with the proposed change. You can always encode/decode ahead of time however you want, either in the file-like object or by passing a decoded str to .loads() and encoding the str that .dumps() returns.
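A minimal sketch of doing the encoding by hand for non-spec data, assuming (for illustration only) a Latin-1 encoded payload. It decodes either up front or through an io.TextIOWrapper around the binary stream, and encodes the output of dumps() on the way out.

```python
import io
import json

raw = '{"city": "Zürich"}'.encode("latin-1")  # non-spec, Latin-1 bytes

# Option 1: decode to str yourself, then hand the str to loads().
via_str = json.loads(raw.decode("latin-1"))

# Option 2: let a text layer over the binary stream do the decoding.
via_wrapper = json.load(io.TextIOWrapper(io.BytesIO(raw), encoding="latin-1"))

# Writing: dumps() returns str, so encode it however the consumer needs.
out = json.dumps(via_str, ensure_ascii=False).encode("latin-1")
```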
- Should we be using open(pth, 'rb') and open(pth, 'wb')? (Binary mode)
No, I think that's clear. In fact, you can't currently dump to a binary file:

    In [26]: json.dump(obj, open('tiny-enc.json', 'wb'))
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-26-02e9bcd47a3e> in <module>
    ----> 1 json.dump(obj, open('tiny-enc.json', 'wb'))

    ~/miniconda3/envs/py3/lib/python3.8/json/__init__.py in dump(obj, fp, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
        178     # a debuggability cost
        179     for chunk in iterable:
    --> 180         fp.write(chunk)
        181
        182

    TypeError: a bytes-like object is required, not 'str'

That's the beauty of Python 3's text model :-)
JSON Specs:
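If a binary stream is all you have, one way to keep the encoding at the IO boundary is to wrap it in io.TextIOWrapper. This sketch uses an in-memory io.BytesIO as a stand-in for a file opened with 'wb'.

```python
import io
import json

obj = {"a": 1}
buf = io.BytesIO()  # stand-in for open(path, 'wb')

# json.dump writes str chunks, so a raw binary stream rejects them.
raised = False
try:
    json.dump(obj, buf)
except TypeError:
    raised = True

# A text layer over the binary stream encodes each chunk as UTF-8.
text = io.TextIOWrapper(buf, encoding="utf-8")
json.dump(obj, text)
text.flush()
text.detach()  # release buf so it is not closed along with the wrapper
data = buf.getvalue()
```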
- https://tools.ietf.org/html/rfc7159#section-8.1 :
JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32. The default encoding is UTF-8,
So THAT is interesting. But the current implementation does not directly support anything but UTF-8, and I think it's fine that that still be the case. If anyone is using the other two, it's an esoteric case, and they can encode/decode by hand.
So, could we just have .load and .dump accept a path-like and an encoding= parameter (because they need to be able to specify UTF-8 / UTF-16 / UTF-32 anyway)?
These are separate questions, but I'll say: yes, it could take a path-like. But I think there was not much support for that in this discussion.
No -- there is no need for an encoding parameter; the other two options are rare and can be done by hand.
BTW: .dumps() dumps to, well, a string, so it's not assuming any encoding. A user can encode it any way they want when passing it along. This, in fact, is all very compatible with the Python 3 text model: the encoding/decoding should happen as close to IO as possible. If there were no backward-compatibility concerns, and it were up to me, I would only use strings in/out of the json module, but I think that ship has sailed.
Anyway -- if anyone wants to push for overloading .load()/.dump(), rather than making two new loadf() and dumpf() functions, then speak now; that will take more discussion, and maybe a PEP.
-CHB
--
Christopher Barker, PhD
Python Language Consulting
 - Teaching
 - Scientific Software Development
 - Desktop GUI and Web Development
 - wxPython, numpy, scipy, Cython
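For discussion purposes, a rough sketch of what separate loadf()/dumpf() functions might look like. Only the names come from this thread; the argument order, the hard-coded UTF-8, and forwarding **kwargs to json.load/json.dump are my assumptions, not an agreed API.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def loadf(path: "str | os.PathLike[str]", **kwargs: Any) -> Any:
    """Hypothetical helper: read JSON from a path, always decoding as UTF-8."""
    # open() accepts both str paths and os.PathLike objects (via __fspath__).
    with open(path, encoding="utf-8") as f:
        return json.load(f, **kwargs)


def dumpf(obj: Any, path: "str | os.PathLike[str]", **kwargs: Any) -> None:
    """Hypothetical helper: write obj as JSON to a path, always as UTF-8."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(obj, f, **kwargs)


# Usage: round-trip through a temporary file, passing a pathlib.Path and a str.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "config.json"
    dumpf({"debug": True, "ratio": 0.5}, target, indent=2)
    restored = loadf(target)
    restored_from_str = loadf(str(target))
```

Keeping these as new functions leaves the existing file-object contract of load()/dump() untouched, which is the design argument made above.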