
json.load and json.dump already default to UTF8 and already have parameters for json loading and dumping.

json.loads and json.dumps exist only because there was no way to distinguish between a string containing JSON and a file path string. (They probably should've been .loadstr and .dumpstr, but it's too late for that now.)

TBH, I think it would be great to just have .load and .dump read the file with standard params when a path-like ( hasattr(obj, '__fspath__') ) is passed, but the suggested disadvantages of this are:

- https://docs.python.org/3/library/functions.html#open
  > The default encoding is platform dependent (whatever locale.getpreferredencoding() returns), but any text encoding supported by Python can be used. See the codecs module for the list of supported encodings.
- .load and .dump don't default to UTF8? AFAIU, they do default to UTF-8. Or do they currently default to locale.getpreferredencoding() instead of the encoding the JSON spec(s) call for?
  * encoding= was removed from .loads and was never accepted by json.load or json.dump
- .load and .dump would also need to accept an encoding= parameter for callers with non-spec data who don't want to keep handling the file themselves
  - pickle.load has an encoding= parameter
  - marshal.load does not have (and probably doesn't need?) an encoding= parameter
- What if you need to specify parameters for the file context manager? Accepting a path-like object should not break any existing code: you could always still open and close a file-like yourself:
      with open('file', 'rb') as _file: json.load(_file)
- Should we be using open(pth, 'rb') and open(pth, 'wb')? (Binary mode)

JSON Specs:

- https://tools.ietf.org/html/rfc7159#section-8.1 :
  > JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32. The default encoding is UTF-8, and JSON texts that are encoded in UTF-8 are interoperable in the sense that they will be read successfully by the maximum number of implementations; there are many implementations that cannot successfully read texts in other encodings (such as UTF-16 and UTF-32). Implementations MUST NOT add a byte order mark to the beginning of a JSON text. In the interests of interoperability, implementations that parse JSON texts MAY ignore the presence of a byte order mark rather than treating it as an error.
- https://www.json.org/ > http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf (PDF!)
  > JSON syntax describes a sequence of Unicode code points. JSON also depends on Unicode in the hex numbers used in the \u escapement notation
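
For illustration, here is a minimal sketch of the path-like idea as wrapper functions. The names load_json and dump_json are hypothetical (they are not part of the json module), and the encoding= keyword is an assumption about how such an API might look; it only applies when a path is passed.

```python
import json
import os
import tempfile
from pathlib import Path


def load_json(source, *, encoding="utf-8", **kwargs):
    """Hypothetical wrapper: accept a path-like or an already-open file."""
    if isinstance(source, (str, bytes, os.PathLike)):
        # A path was given: open it ourselves, defaulting to UTF-8.
        with open(source, "r", encoding=encoding) as fp:
            return json.load(fp, **kwargs)
    # Otherwise assume a file-like object and leave its handling to the caller.
    return json.load(source, **kwargs)


def dump_json(obj, target, *, encoding="utf-8", **kwargs):
    """Hypothetical wrapper: write to a path-like or an already-open file."""
    if isinstance(target, (str, bytes, os.PathLike)):
        with open(target, "w", encoding=encoding) as fp:
            json.dump(obj, fp, **kwargs)
    else:
        json.dump(obj, target, **kwargs)


data = {"name": "João"}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.json"
    dump_json(data, path, ensure_ascii=False)   # path-like in
    from_path = load_json(path)                 # path-like in
    with open(path, "rb") as fp:                # existing code keeps working
        from_file = load_json(fp)
```

Passing an open file still goes straight to json.load, so code that manages the file itself is unaffected.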
So, could we just have .load and .dump accept a path-like and an encoding= parameter (because they need to be able to specify UTF-8 / UTF-16 / UTF-32 anyway)?

On Tue, Sep 15, 2020 at 3:22 AM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Joao S. O. Bueno writes:
If .load and .dump are super-charged, people coding with these methods in mind have _one_ less thing to worry about: whether the method accepts a path or an open file becomes irrelevant.
But then you either lose the primary benefit of this three line function (defaulting to the UTF-8 encoding to conform to the JSON standard), or you have a situation where what encoding you get can depend on whether you use the name of a file or that file already opened.
I consider that worse because it's precisely the kind of thing that people *don't* worry about and *do* have some difficulty debugging.
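
A small sketch of the failure mode described above, under one stated assumption: the explicit encoding="latin-1" stands in for a platform whose locale default is not UTF-8. The same UTF-8 file yields different results depending on how it was opened before being handed to json.load.

```python
import json
import tempfile
from pathlib import Path

data = {"name": "João"}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.json"
    # Write spec-conformant UTF-8 bytes, keeping the non-ASCII character.
    path.write_bytes(json.dumps(data, ensure_ascii=False).encode("utf-8"))

    # Binary mode: json.load receives bytes and detects UTF-8 itself.
    with open(path, "rb") as fp:
        from_binary = json.load(fp)

    # Text mode with a non-UTF-8 encoding (assumed stand-in for a locale
    # default): parsing succeeds, but the string is silently mis-decoded.
    with open(path, "r", encoding="latin-1") as fp:
        from_text = json.load(fp)
```

No exception is raised in the second case, which is what makes this hard to debug.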