A shortcut to load a JSON file into a dict: json.loadf
Hi All, this is the first time I'm posting to this mailing list, so forgive me if I make any mistakes. One of the most common ways to load JSON is from a file; this is used extensively in data science and the like. We often write something like:

    with open("filename.json", "r") as f:
        my_dict = json.load(f)

or:

    my_dict = json.load(open("filename.json", "r"))

Since this is so common, why doesn't Python have something like:

    json.loadf("filename.json")

Is there an obvious issue with defining this in CPython? I don't mind whipping up a PR if it gains traction.
+1 because the general idea is one of my most commonly used utility functions. There should also be a similar function for writing to a file, e.g. json.dumpf if we stick to this naming scheme. As an alternative, pathlib.Path.read_text/write_text are pretty cool; maybe we could have Path.read_json/write_json? In any case, if we add anything like this it would probably make sense to add similar functions for pickle, and maybe other formats if the API is obvious enough. On Fri, Sep 11, 2020 at 10:40 PM The Nomadic Coder <atemysemicolon@gmail.com> wrote:
Hi All,
This is the first time I'm posting to this mailing group, so forgive me if I'm making any mistakes.
One of the most common ways to load JSON is from a file; this is used extensively in data science and the like. We often write something like:

    with open("filename.json", "r") as f:
        my_dict = json.load(f)

or:

    my_dict = json.load(open("filename.json", "r"))

Since this is so common, why doesn't Python have something like:

    json.loadf("filename.json")

Is there an obvious issue with defining this in CPython? I don't mind whipping up a PR if it gains traction.
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-leave@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/YHO575...
Code of Conduct: http://python.org/psf/codeofconduct/
Personally, I prefer it to be in the json module, as it just feels more logical. But that's just a personal choice. I didn't mention dumpf because I saw a previous thread about it that became quite controversial (it seems dumps was the more common usage at the time). Something I forgot to mention: a (non-exact) search for this construct on GitHub (https://github.com/search?q=with+open+%3A+json.load&type=Code) gives 20 million+ results. It seems to be a popular set of statements that people use. ---- The Nomadic Coder
I'm pretty sure this came up recently, and was pretty much rejected. Another option would be to have json.dump take a file-like object or a path-like object -- there's plenty of code out there that does that. Hmm, maybe that was the version that was rejected. But I like the idea either way; it has always seemed cumbersome to me to write the whole context manager in these kinds of cases. -CHB On Fri, Sep 11, 2020 at 2:05 PM The Nomadic Coder <atemysemicolon@gmail.com> wrote:
Personally, I prefer it to be in the json module, as it just feels more logical. But that's just a personal choice.
I didn't mention dumpf because I saw a previous thread about it that became quite controversial (it seems dumps was the more common usage at the time).
Something I forgot to mention: a (non-exact) search for this construct on GitHub (https://github.com/search?q=with+open+%3A+json.load&type=Code) gives 20 million+ results. It seems to be a popular set of statements that people use.
---- The Nomadic Coder
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
I like the idea of having these functions, but I don't like overloading the argument to a function with "filename or file-like object" as is common in libraries like Pandas. I think there are a few places the standard library does it, but the separation seems better to me. I don't LOVE the names dumpf() and loadf(), but I don't have an obviously better choice. I guess probably I'd want fromfile() and tofile() as more "modern" names. On Fri, Sep 11, 2020 at 12:17 PM Christopher Barker <pythonchb@gmail.com> wrote:
I'm pretty sure this came up recently, and was pretty much rejected.
Another option would be to have json.dump take a file-like-object or a path-like object -- there's plenty of code out there that does that.
hmm.. maybe that was the version that was rejected.
But I like the idea either way. It has always seemed cumbersome to me to write the whole context manager in these kinds of cases.
-CHB
On Fri, Sep 11, 2020 at 2:05 PM The Nomadic Coder < atemysemicolon@gmail.com> wrote:
Personally, I prefer it to be in the json module, as it just feels more logical. But that's just a personal choice.
I didn't mention dumpf because I saw a previous thread about it that became quite controversial (it seems dumps was the more common usage at the time).
Something I forgot to mention: a (non-exact) search for this construct on GitHub (https://github.com/search?q=with+open+%3A+json.load&type=Code) gives 20 million+ results. It seems to be a popular set of statements that people use.
---- The Nomadic Coder
-- Christopher Barker, PhD
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
-- The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.
How about load_from_path or loadp? I can understand that loadf is a bit misleading: you might think it loads from a file-like object, when it actually parses from a file path instead.
On Fri, 11 Sep 2020 at 19:50, David Mertz <mertz@gnosis.cx> wrote:
I like the idea of having these functions, but I don't like overloading the argument to a function with "filename or file-like object" as is common in libraries like Pandas. I think there are a few places the standard library does it, but the separation seems better to me.
I don't LOVE the names dumpf() and loadf(), but I don't have an obviously better choice. I guess probably I'd want fromfile() and tofile() as more "modern" names.
While separating the methods to avoid overloading feels much cleaner (and I also think of it that way), we have to look at the other side of it: if .load and .dump are super-charged, people coding with these methods in mind have _one less_ thing to worry about: whether the method accepts a path or an open file becomes irrelevant. On the other hand, with two extra methods, anyone coding the same thing has one _extra_ thing to worry about -- even more things than they have today! Nowadays, one learns "OK, so load takes an open file", and that's it. With two new methods it becomes "OK, was it dump or dumpf that took an open file? Or was it the other way around?" -- and that is bad. Parameter overloading has been around since O.O.'s inception, it is used in other parts of Python, and permissive signatures are part of what has made Python so attractive over the years. So, this is "the other side".
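For what it's worth, the permissive-signature approach described above can be sketched roughly like this (a hypothetical `load_any` helper for illustration, not an actual json-module API; the dispatch-on-`read()` test and the UTF-8 default are my assumptions):

```python
import json
import os


def load_any(fp_or_path, **kwargs):
    """Hypothetical overloaded loader: accepts either an already-open
    file-like object (anything with a read() method) or a path-like
    object (str, bytes, or os.PathLike)."""
    if hasattr(fp_or_path, "read"):
        # Already-open file: respect whatever encoding it was opened with.
        return json.load(fp_or_path, **kwargs)
    # Path-like: open it ourselves, defaulting to UTF-8 per the JSON RFC.
    with open(os.fspath(fp_or_path), "r", encoding="utf-8") as f:
        return json.load(f, **kwargs)
```

With this shape, both `load_any("data.json")` and `load_any(open("data.json"))` work, which is exactly the "one less thing to worry about" argument.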
On Fri, Sep 11, 2020 at 12:17 PM Christopher Barker <pythonchb@gmail.com> wrote:
I'm pretty sure this came up recently, and was pretty much rejected.
Another option would be to have json.dump take a file-like-object or a path-like object -- there's plenty of code out there that does that.
hmm.. maybe that was the version that was rejected.
But I like the idea either way. it always seemed cumbersome to me to write the whole context manager in these kinds of cases.
-CHB
On Fri, Sep 11, 2020 at 2:05 PM The Nomadic Coder < atemysemicolon@gmail.com> wrote:
Personally, I prefer it to be in the json module, as it just feels more logical. But that's just a personal choice.
I didn't mention dumpf because I saw a previous thread about it that became quite controversial (it seems dumps was the more common usage at the time).
Something I forgot to mention: a (non-exact) search for this construct on GitHub (https://github.com/search?q=with+open+%3A+json.load&type=Code) gives 20 million+ results. It seems to be a popular set of statements that people use.
---- The Nomadic Coder
-- Christopher Barker, PhD
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
-- The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.
Joao S. O. Bueno writes:
If .load and .dump are super-charged, people coding with these methods in mind have _one_ less_ thing to worry about: if the method accepts a path or an open file becomes irrelevant.
But then you either lose the primary benefit of this three line function (defaulting to the UTF-8 encoding to conform to the JSON standard), or you have a situation where what encoding you get can depend on whether you use the name of a file or that file already opened. I consider that worse because it's precisely the kind of thing that people *don't* worry about and *do* have some difficulty debugging.
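The encoding pitfall Stephen describes is concrete enough to demonstrate: a UTF-8 JSON document can silently mojibake if it is decoded with a locale-dependent default such as cp1252 instead of UTF-8 (this sketch forces the mismatch explicitly rather than relying on any particular platform locale):

```python
import json

# A JSON document with non-ASCII content, encoded as UTF-8
# (the JSON RFC's default encoding).
payload = json.dumps({"name": "café"}, ensure_ascii=False).encode("utf-8")

# Decoding with the correct encoding round-trips cleanly...
assert json.loads(payload.decode("utf-8")) == {"name": "café"}

# ...but decoding with a Windows locale default such as cp1252 still
# parses as valid JSON, just with silently garbled text -- the kind of
# bug people don't notice until much later.
garbled = json.loads(payload.decode("cp1252"))
assert garbled != {"name": "café"}
```

This is why a helper that always opens the file with `encoding="utf-8"` has value beyond saving two lines of typing.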
json.load and json.dump already default to UTF-8 and already have parameters for JSON loading and dumping.

json.loads and json.dumps exist only because there was no way to distinguish between a string containing JSON and a file path string. (They probably should've been .loadstr and .dumpstr, but it's too late for that now.)

TBH, I think it would be great to just have .load and .dump read the file with standard params when a path-like ( hasattr(obj, '__path__') ) is passed, but the suggested disadvantages of this are:

- https://docs.python.org/3/library/functions.html#open :

      The default encoding is platform dependent (whatever locale.getpreferredencoding() returns), but any text encoding supported by Python can be used. See the codecs module for the list of supported encodings.

- .load and .dump don't default to UTF-8? AFAIU, they do default to UTF-8. Or do they currently default to locale.getpreferredencoding() instead of the JSON spec's default?
  * encoding= was removed from .loads and was never accepted by json.load or json.dump
- .load and .dump would also need to accept an encoding= parameter for non-spec data that doesn't want to continue handling the file itself
  - pickle.load has an encoding= parameter
  - marshal.load does not have (and probably doesn't need?) an encoding= parameter
- What if you need to specify parameters for the file context manager? Accepting a path-like object should not break any existing code: you could always still open and close a file-like yourself:

      with open('file', 'rb') as _file:
          json.load(_file)

- Should we be using open(pth, 'rb') and open(pth, 'wb')? (Binary mode)

JSON Specs:

- https://tools.ietf.org/html/rfc7159#section-8.1 :

      JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32. The default encoding is UTF-8, and JSON texts that are encoded in UTF-8 are interoperable in the sense that they will be read successfully by the maximum number of implementations; there are many implementations that cannot successfully read texts in other encodings (such as UTF-16 and UTF-32). Implementations MUST NOT add a byte order mark to the beginning of a JSON text. In the interests of interoperability, implementations that parse JSON texts MAY ignore the presence of a byte order mark rather than treating it as an error.

- https://www.json.org/ > http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf (PDF!)

      JSON syntax describes a sequence of Unicode code points. JSON also depends on Unicode in the hex numbers used in the \u escapement notation.

So, could we just have .load and .dump accept a path-like and an encoding= parameter (because they need to be able to specify UTF-8 / UTF-16 / UTF-32 anyway)?

On Tue, Sep 15, 2020 at 3:22 AM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
Joao S. O. Bueno writes:
If .load and .dump are super-charged, people coding with these methods in mind have _one_ less_ thing to worry about: if the method accepts a path or an open file becomes irrelevant.
But then you either lose the primary benefit of this three line function (defaulting to the UTF-8 encoding to conform to the JSON standard), or you have a situation where what encoding you get can depend on whether you use the name of a file or that file already opened.
I consider that worse because it's precisely the kind of thing that people *don't* worry about and *do* have some difficulty debugging.
On Tue, 15 Sep 2020 at 18:10, Wes Turner <wes.turner@gmail.com> wrote:
json.loads and json.dumps exist only because there was no way to distinguish between a string containing JSON and a file path string. (They probably should've been .loadstr and .dumpstr, but it's too late for that now)
Well, if you see the code of msutils.jsonLoad I linked before, it does a simple try. Not very elegant, but effective.
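I don't have msutils in front of me, but a try-based dispatch of the kind described presumably looks something like this (my reconstruction, not the actual msutils.jsonLoad code):

```python
import json


def json_load(src, **kwargs):
    """Try src as an already-open file first; fall back to treating
    it as a path. Simple, if not very elegant."""
    try:
        # json.load() calls src.read(); a str or Path has no read()
        # method, so this raises AttributeError for paths.
        return json.load(src, **kwargs)
    except AttributeError:
        with open(src, "r", encoding="utf-8") as f:
            return json.load(f, **kwargs)
```

The EAFP style avoids any isinstance checks, at the cost of masking a genuine AttributeError raised from inside a custom file-like object's read().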
On Tue, Sep 15, 2020 at 9:09 AM Wes Turner <wes.turner@gmail.com> wrote:
json.load and json.dump already default to UTF8 and already have parameters for json loading and dumping.
yes, of course. json.loads and json.dumps exist only because there was no way to
distinguish between a string containing JSON and a file path string. (They probably should've been .loadstr and .dumpstr, but it's too late for that now)
I think they exist because that was the pickle API from years ago -- though maybe that's why the pickle API had them. Though I think you have it a bit backwards -- you can't pass a path into loads/dumps for that reason. If they were created because that distinction couldn't be made, then load/dump would have accepted a string path back in the day. TBH, I think it would be great to just have .load and .dump read the file
with standard params when a path-like ( hasattr(obj, '__path__') ) is passed, but the suggested disadvantages of this are:
- https://docs.python.org/3/library/functions.html#open
The default encoding is platform dependent (whatever locale.getpreferredencoding() returns), but any text encoding supported by Python can be used. See the codecs module for the list of supported encodings.
That's not a reason at all -- the reason is that some folks think overloading a function like this is bad API design. And it's been the way it's been for a long time, so it's probably better to add new function(s) rather than extend the API of an existing one.
- .load and .dump don't default to UTF-8? AFAIU, they do default to UTF-8. Or do they currently default to locale.getpreferredencoding() instead of the JSON spec's default?
  * encoding= was removed from .loads and was never accepted by json.load or json.dump
I think dump defaults to UTF-8. But load is a bit odd (and not that well documented). It appears to accept a file-like object that returns either a str or a bytes object from its read() method. If strings, the decoding is already done; if bytes, then I assume it's using UTF-8. This, by the way, should be better documented.
- .load and .dump would also need to accept an encoding= parameter for non-spec data that doesn't want to continue handling the file itself
  - pickle.load has an encoding= parameter
.loads doesn't now, so I don't see why they would need to with the proposed change. You can always encode/decode ahead of time however you want, either in the file-like object or by passing a decoded str to .loads/.dumps.
- Should we be using open(pth, 'rb') and open(pth, 'wb')? (Binary mode)
No, I think that's clear. In fact, you can't currently dump to a binary file:

    In [26]: json.dump(obj, open('tiny-enc.json', 'wb'))
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-26-02e9bcd47a3e> in <module>
    ----> 1 json.dump(obj, open('tiny-enc.json', 'wb'))

    ~/miniconda3/envs/py3/lib/python3.8/json/__init__.py in dump(obj, fp, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
        178     # a debuggability cost
        179     for chunk in iterable:
    --> 180         fp.write(chunk)
        181
        182

    TypeError: a bytes-like object is required, not 'str'

That's the beauty of Python 3's text model :-)

JSON Specs:
- https://tools.ietf.org/html/rfc7159#section-8.1 :
JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32. The default encoding is UTF-8,
So THAT is interesting. But the current implementation does not directly support anything but UTF-8, and I think it's fine that that still be the case. If anyone is using the other two, it's an esoteric case, and they can encode/decode by hand.
So, could we just have .load and .dump accept a path-like and an encoding= parameter (because they need to be able to specify UTF-8 / UTF-16 / UTF-32 anyway)?
These are separate questions, but I'll say:

Yes, it could take a path-like. But I think there was not much support for that in this discussion.

No -- there is no need for an encoding parameter -- the other two options are rare and can be done by hand.

BTW: .dumps() dumps to, well, a string, so it's not assuming any encoding. A user can encode it any way they want when passing it along. This, in fact, is all very Python 3 text-model compatible -- the encoding/decoding should happen as close to the IO as possible. If there were no backward-compatibility concerns, and it were up to me, I would only use strings in/out of the json module, but I think that ship has sailed.

Anyway -- if anyone wants to push for overloading .load()/.dump(), rather than making two new loadf() and dumpf() functions, then speak now -- that will take more discussion, and maybe a PEP.

-CHB

-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On Tue, Sep 15, 2020 at 7:30 PM Christopher Barker <pythonchb@gmail.com> wrote:
On Tue, Sep 15, 2020 at 9:09 AM Wes Turner <wes.turner@gmail.com> wrote:
json.load and json.dump already default to UTF8 and already have parameters for json loading and dumping.
yes, of course.
json.loads and json.dumps exist only because there was no way to
distinguish between a string containing JSON and a file path string. (They probably should've been .loadstr and .dumpstr, but it's too late for that now)
I think they exist because that was the pickle API from years ago -- though maybe that's why the pickle API had them. Though I think you have it a bit backwards -- you can't pass a path into loads/dumps for that reason. If they were created because that distinction couldn't be made, then load/dump would have accepted a string path back in the day.
TBH, I think it would be great to just have .load and .dump read the file
with standard params when a path-like ( hasattr(obj, '__path__') ) is passed, but the suggested disadvantages of this are:
- https://docs.python.org/3/library/functions.html#open
The default encoding is platform dependent (whatever locale.getpreferredencoding() returns), but any text encoding supported by Python can be used. See the codecs module for the list of supported encodings.
that's not a reason at all -- the reason is that some folks think overloading a function like this is bad API design. And it's been the way it's been for a long time, so probably better to add a new function(s), rather than extend the API of an existing one.
.load - reads a file object
.loadf - reads a file object that it opens for you, from a str path or an object with an obj.__path__
.loads - reads from a string-like object

or:

.load - reads a file object, or creates a file object from a path or an obj.__path__ and closes it after reading
.loads - reads from a string-like object

For backwards compatibility (without a check for `sys.version_info[:2]` or `hasattr(json, 'loadf')`), handling the file (e.g. using a context manager) will still be the way it's done.
- .load and .dump don't default to UTF-8? AFAIU, they do default to UTF-8. Or do they currently default to locale.getpreferredencoding() instead of the JSON spec's default?
  * encoding= was removed from .loads and was never accepted by json.load or json.dump
I think dump defaults to UTF-8. But load is a bit odd (and not that well documented).
it appears to accept a file_like object that returns either a string or a byte object from its read() method. If strings, then the decoding is done. if bytes, then I assume that it's using utf-8.
This, by the way, should be better documented.
I agree: https://github.com/python/cpython/blob/master/Lib/json/__init__.py
- .load and .dump would also need to accept an encoding= parameter for non-spec data that doesn't want to continue handling the file itself
  - pickle.load has an encoding= parameter
.loads doesn't now, so I don't see why they would need to with the proposed change. You can always encode/decode ahead of time however you want, either in the file-like object or by passing decoded str to .loads/dumps.
pickle.loads does accept an encoding= parameter, and that's the API we were matching. Handling the file object will continue to be the backwards-compatible way to do it.
- Should we be using open(pth, 'rb') and open(pth, 'wb')? (Binary mode)
no, I think that's clear. in fact, you can't currently dump to a binary file:
In [26]: json.dump(obj, open('tiny-enc.json', 'wb'))
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-26-02e9bcd47a3e> in <module>
    ----> 1 json.dump(obj, open('tiny-enc.json', 'wb'))

    ~/miniconda3/envs/py3/lib/python3.8/json/__init__.py in dump(obj, fp, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
        178     # a debuggability cost
        179     for chunk in iterable:
    --> 180         fp.write(chunk)
        181
        182

    TypeError: a bytes-like object is required, not 'str'
That's the beauty of Python 3's text model :-)
JSON Specs:
- https://tools.ietf.org/html/rfc7159#section-8.1 :
JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32. The default encoding is UTF-8,
So THAT is interesting. But the current implementation does not directly support anything but UTF-8, and I think it's fine that that still be the case. If anyone is using the other two, it's an esoteric case, and they can encode/decode by hand.
The Python JSON implementation should support the full JSON spec (including UTF-8, UTF-16, and UTF-32) and should default to UTF-8.
So, could we just have .load and .dump accept a path-like and an encoding= parameter (because they need to be able to specify UTF-8 / UTF-16 / UTF-32 anyway)?
These are separate questions, but I'll say:
Yes, it could take a path-like. But I think there was not much support for that in this discussion.
A path str or a path-like: is there any reason not to also support path-like objects with this API?
No -- there is no need for encoding parameter -- the other two options are rare and can be done by hand.
There is a need for an encoding parameter in order to support the full JSON spec. Whether creating a new .loadf or just extending .load is the solution, the method should accept an encoding parameter.
BTW: .dumps() dumps to, well, a string, so it's not assuming any encoding. A user can encode it any way they want when passing it along.
This, in fact, is all very Python3 text model compatible -- the encoding/decoding should happen as close to IO as possible.
Is there precedent for handling the file for the user in any other stdlib functions? Extending the pickle and marshal APIs should also occur with this PR if accepted.
If there were no backward compatibility options, and it were me, I would only use strings in/out of the json module, but I think that ship has sailed.
The obj.__json__ protocol discussions discussed various ways to implement customizable serialization of object graphs containing complex types to JSON/JSON5 and/or JSON-LD (which BTW supports complex types like complex fractions)
Anyway -- if anyone wants to push for overloading .load()/dump(), rather than making two new loadf() and dumpf() functions, then speak now -- that will take more discussion, and maybe a PEP.
I don't see why one or the other would need a PEP so long as the new functionality is backward-compatible?
-CHB
-- Christopher Barker, PhD
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On Tue, Sep 15, 2020 at 5:26 PM Wes Turner <wes.turner@gmail.com> wrote:
On Tue, Sep 15, 2020 at 9:09 AM Wes Turner <wes.turner@gmail.com> wrote:
json.load and json.dump already default to UTF8 and already have
parameters for json loading and dumping.
So it turns out that loads(), which optionally takes a bytes or bytearray object, tries to determine whether the encoding is UTF-8, UTF-16, or UTF-32 (the ones allowed by the standard) (thanks, Guido, for the pointer). And load() calls loads(), so it should work with binary-mode files as well.
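That detection can be checked directly: json.loads() accepts bytes and sniffs the encoding (via json.detect_encoding in Lib/json/__init__.py), so all three spec encodings round-trip without any explicit encoding parameter. A quick check, assuming CPython 3.6+:

```python
import json

# json.loads() accepts bytes/bytearray and auto-detects
# UTF-8 / UTF-16 / UTF-32 per the RFC's detection rules.
data = {"k": "värde", "n": [1, 2]}
for enc in ("utf-8", "utf-16", "utf-32"):
    payload = json.dumps(data, ensure_ascii=False).encode(enc)
    assert json.loads(payload) == data
```

So a loadf() that opens the file in binary mode would inherit UTF-16/UTF-32 reading support for free.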
Currently, dump() simply uses the fp passed in, and it doesn't support binary files, so it'll use the encoding the user set (or the default, if not set, which is an issue here). dumps() returns a string, so no encoding there.

I think dumpf() should use UTF-8, and that's it. If anyone really wants something else, they can get it by providing an open text file object.

loadf(), on the other hand, is a bit tricky -- it could allow only UTF-8, but it seems it would be more consistent (and easy to do) to open the file in binary mode and use the existing code to determine the encoding.

-CHB
The Python JSON implementation should support the full JSON spec (including UTF-8, UTF-16, and UTF-32) and should default to UTF-8.
It turns out it does already, and no one is suggesting changing that. Anyway -- if anyone wants to push for overloading .load()/dump(), rather
than making two new loadf() and dumpf() functions, then speak now -- that will take more discussion, and maybe a PEP.
I don't see why one or the other would need a PEP so long as the new functionality is backward-compatible?
I'm just putting my finger in the wind. No need for a PEP if it's simple and non-controversial, but if even the few folks on this thread don't agree on the API we want, then maybe it's too controversial -- so either more discussion to come to consensus, or a PEP. Or not -- we can see what the core devs say if/when someone does a bpo / PR. -CHB
-CHB
-- Christopher Barker, PhD
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On Wed, Sep 16, 2020, 5:18 PM Christopher Barker <pythonchb@gmail.com> wrote:
On Tue, Sep 15, 2020 at 5:26 PM Wes Turner <wes.turner@gmail.com> wrote:
On Tue, Sep 15, 2020 at 9:09 AM Wes Turner <wes.turner@gmail.com> wrote:
json.load and json.dump already default to UTF8 and already have
parameters for json loading and dumping.
So it turns out that loads(), which optionally takes a bytes or bytearray object, tries to determine whether the encoding is UTF-8, UTF-16, or UTF-32 (the ones allowed by the standard) (thanks, Guido, for the pointer). And load() calls loads(), so it should work with binary-mode files as well.
Currently, dump() simply uses the fp passed in, and it doesn't support binary files, so it'll use the encoding the user set (or the default, if not set, which is an issue here) dumps() returns a string, so no encoding there.
So I was not correct: dump does not default to UTF-8 (and does not accept an encoding= parameter)
I think dumpf() should use UTF-8, and that's it. If anyone really wants something else, they can get it by providing an open text file object.
Why would we impose UTF-8 when the spec says UTF-8, UTF-16, or UTF-32?

How could this be improved? (I'm on my phone, so)

    def dumpf(obj, path, *args, **kwargs):
        with open(getattr(path, '__path__', path), 'w',
                  encoding=kwargs.pop('encoding', 'utf8')) as _file:
            return dump(obj, _file, *args, **kwargs)

    def loadf(path, *args, **kwargs):
        with open(getattr(path, '__path__', path),
                  encoding=kwargs.pop('encoding', 'utf8')) as _file:
            return load(_file, *args, **kwargs)
loadf(), on the other hand, is a bit tricky -- it could allow only UTF-8, but it seems it would be more consistent (and easy to do) to open the file in binary mode and use the existing code to determine the encoding.
-CHB
The Python JSON implementation should support the full JSON spec (including UTF-8, UTF-16, and UTF-32) and should default to UTF-8.
It turns out it does already, and no one is suggesting changing that.
Anyway -- if anyone wants to push for overloading .load()/dump(), rather
than making two new loadf() and dumpf() functions, then speak now -- that will take more discussion, and maybe a PEP.
I don't see why one or the other would need a PEP so long as the new functionality is backward-compatible?
I'm just putting my finger in the wind. No need for a PEP if it's simple and non-controversial, but if even the few folks on this thread don't agree on the API we want, then maybe it's too controversial -- so either more discussion to come to consensus, or a PEP.
Or not -- we can see what the core devs say if/when someone does a bpo / PR.
-CHB
-- Christopher Barker, PhD
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
https://docs.python.org/3/library/os.html#os.fspath *__fspath__ On Wed, Sep 16, 2020, 5:53 PM Wes Turner <wes.turner@gmail.com> wrote:
On Wed, Sep 16, 2020, 5:18 PM Christopher Barker <pythonchb@gmail.com> wrote:
On Tue, Sep 15, 2020 at 5:26 PM Wes Turner <wes.turner@gmail.com> wrote:
On Tue, Sep 15, 2020 at 9:09 AM Wes Turner <wes.turner@gmail.com> wrote:
json.load and json.dump already default to UTF-8 and already have parameters for JSON loading and dumping.
On Wed, Sep 16, 2020 at 2:53 PM Wes Turner <wes.turner@gmail.com> wrote:
So I was not correct: dump does not default to UTF-8 (and does not accept an encoding= parameter)
I think dumpf() should use UTF-8, and that's it. If anyone really wants something else, they can get it by providing an open text file object.
Why would we impose UTF-8 when the spec says UTF-8, UTF-16, or UTF-32?
The idea was that the encoding was one of the motivators to doing this in the first place. But I suppose as long as utf-8 is the default, and only the three "official" ones are allowed, then yeah, we could add an encoding keyword argument.

-CHB

-- Christopher Barker, PhD
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
Is all of this locale/encoding testing necessary (or even sufficient)?

```python
import json
import locale
import os


def get_default_encoding():
    """
    TODO XXX: ???
    """
    default_encoding = locale.getdefaultlocale()[1]
    if default_encoding.startswith("UTF-"):
        return default_encoding
    else:
        return "UTF-8"


def dumpf(obj, path, *args, **kwargs):
    with open(
        os.fspath(path),
        "w",
        encoding=kwargs.pop("encoding", get_default_encoding()),
    ) as file_:
        return json.dump(obj, file_, *args, **kwargs)


def loadf(path, *args, **kwargs):
    with open(
        os.fspath(path),
        "r",
        encoding=kwargs.pop("encoding", get_default_encoding()),
    ) as file_:
        return json.load(file_, *args, **kwargs)


import pathlib
import unittest


class TestJsonLoadfAndDumpf(unittest.TestCase):
    def setUp(self):
        self.locales = ["", "C", "en_US.UTF-8", "japanese"]
        self.encodings = [None, "UTF-8", "UTF-16", "UTF-32"]
        data = dict(
            obj=dict(a=dict(b=[1, 2, 3])),
            encoding=None,
            path=pathlib.Path(".") / "test_loadf_and_dumpf.json",
        )
        if os.path.isfile(data["path"]):
            os.unlink(data["path"])
        self.data = data
        self.previous_locale = locale.getlocale()

    def tearDown(self):
        locale.setlocale(locale.LC_ALL, self.previous_locale)

    def test_get_default_encoding(self):
        for localestr in self.locales:
            locale.setlocale(locale.LC_ALL, localestr)
            output = get_default_encoding()
            assert output.startswith("UTF-")

    def test_dumpf_and_loadf(self):
        data = self.data
        for localestr in self.locales:
            locale.setlocale(locale.LC_ALL, localestr)
            for encoding in self.encodings:
                dumpf_output = dumpf(data["obj"], data["path"], encoding=encoding)
                loadf_output = loadf(data["path"], encoding=encoding)
                assert loadf_output == data["obj"]
```

On Wed, Sep 16, 2020 at 8:30 PM Christopher Barker <pythonchb@gmail.com> wrote:
Is that suggested code? I don't follow. But if it is, no. Personally, I think ANY use of system settings is a bad idea [*]. But certainly no need to even think about it for JSON.

-CHB

* Have we not learned that in the age of the internet, the machine the code happens to be running on has nothing to do with the needs of the application's user? Timezones, encodings, number formats, NOTHING.

On Wed, Sep 16, 2020 at 8:45 PM Wes Turner <wes.turner@gmail.com> wrote:
Something like this in the docstring?: "In order to support the historical JSON specification and closed ecosystem JSON, it is possible to specify an encoding other than UTF-8."

8.1. Character Encoding
JSON text exchanged between systems that are not part of a closed ecosystem MUST be encoded using UTF-8 [RFC3629]. Previous specifications of JSON have not required the use of UTF-8 when transmitting JSON text. However, the vast majority of JSON- based software implementations have chosen to use the UTF-8 encoding, to the extent that it is the only encoding that achieves interoperability. Implementations MUST NOT add a byte order mark (U+FEFF) to the beginning of a networked-transmitted JSON text. In the interests of interoperability, implementations that parse JSON texts MAY ignore the presence of a byte order mark rather than treating it as an error.
```python
import json
import os


def dumpf(obj, path, *, encoding="UTF-8", **kwargs):
    with open(os.fspath(path), "w", encoding=encoding) as f:
        return json.dump(obj, f, **kwargs)


def loadf(path, *, encoding="UTF-8", **kwargs):
    with open(os.fspath(path), "r", encoding=encoding) as f:
        return json.load(f, **kwargs)


import pathlib
import unittest


class TestJsonLoadfAndDumpf(unittest.TestCase):
    def setUp(self):
        self.encodings = [None, "UTF-8", "UTF-16", "UTF-32"]
        data = dict(
            obj=dict(a=dict(b=[1, 2, 3])),
            path=pathlib.Path(".") / "test_loadf_and_dumpf.json",
        )
        if os.path.isfile(data["path"]):
            os.unlink(data["path"])
        self.data = data

    def test_dumpf_and_loadf(self):
        data = self.data
        for encoding in self.encodings:
            path = f'{data["path"]}.{encoding}.json'
            dumpf_output = dumpf(data["obj"], path, encoding=encoding)
            loadf_output = loadf(path, encoding=encoding)
            assert loadf_output == data["obj"]


# $ pip install pytest-cov
# $ pytest -v example.py
# https://docs.pytest.org/en/stable/parametrize.html
# https://docs.pytest.org/en/stable/tmpdir.html
import pytest


@pytest.mark.parametrize("encoding", [None, "UTF-8", "UTF-16", "UTF-32"])
@pytest.mark.parametrize("obj", [dict(a=dict(b=[1, 2, 3]))])
def test_dumpf_and_loadf(obj, encoding, tmpdir):
    pth = pathlib.Path(tmpdir) / f"test_loadf_and_dumpf.{encoding}.json"
    dumpf_output = dumpf(obj, pth, encoding=encoding)
    loadf_output = loadf(pth, encoding=encoding)
    assert loadf_output == obj
```

For whoever creates a PR for this:

- [ ] add parameter and return type annotations
- [ ] copy docstrings from json.load/json.dump and open#encoding
- [ ] correctly support the C module implementation (this just does `import json`)?
- [ ] keep or drop the encoding tests?

On Thu, Sep 17, 2020 at 1:25 AM Christopher Barker <pythonchb@gmail.com> wrote:
On Thu, Sep 17, 2020 at 3:02 PM Wes Turner <wes.turner@gmail.com> wrote:
Something like this in the docstring?: "In order to support the historical JSON specification and closed ecosystem JSON, it is possible to specify an encoding other than UTF-8."
I don't think dumpf should support an encoding parameter.

1. Output is ASCII unless `ensure_ascii=False` is specified.
2. Writing a new JSON file with the obsolete spec is not recommended.
3. If users really need it, they can write obsolete JSON with `dump` or `dumps` anyway.

I'm against adding an `encoding` parameter to dumpf and loadf. They are just shortcuts for common cases.

Regards,
-- Inada Naoki <songofacandy@gmail.com>
On Thu, Sep 17, 2020 at 6:54 AM Wes Turner <wes.turner@gmail.com> wrote:
Why would we impose UTF-8 when the spec says UTF-8, UTF-16, or UTF-32?
Obsolete JSON spec said UTF-8, UTF-16, and UTF-32. Current spec says UTF-8. See https://tools.ietf.org/html/rfc8259#section-8.1 So `dumpf` must use UTF-8, although `loadf` can support UTF-16 and UTF-32 like `loads`.
How could this be improved? (I'm on my phone, so)
def dumpf(obj, path, *args, **kwargs):
    with open(os.fspath(path), 'w', encoding=kwargs.pop('encoding', 'utf8')) as _file:
        return dump(obj, _file, *args, **kwargs)

def loadf(path, *args, **kwargs):
    with open(os.fspath(path), encoding=kwargs.pop('encoding', 'utf8')) as _file:
        return load(_file, *args, **kwargs)
def dumpf(obj, path, **kwargs):
    with open(path, "w", encoding="utf-8") as f:
        return dump(obj, f, **kwargs)

def loadf(path, **kwargs):
    with open(path, "rb") as f:
        return load(f, **kwargs)

Regards,
-- Inada Naoki <songofacandy@gmail.com>
On Tue, Sep 15, 2020 at 12:21 AM Stephen J. Turnbull < turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
If .load and .dump are super-charged, people coding with these methods in mind have _one less_ thing to worry about: whether the method accepts a path or an open file becomes irrelevant.
But then you either lose the primary benefit of this three line function (defaulting to the UTF-8 encoding to conform to the JSON standard), or you have a situation where what encoding you get can depend on whether you use the name of a file or that file already opened.
I don't follow here -- either way, if you open the file yourself, then you're responsible for the encoding, and if you use a filename, then the module takes care of that for you. If anything, I think it may be less likely that people will open the file themselves incorrectly, rather than passing in a file name, if they are using the same function to do it. Honestly, I've been known to try to pass a filename directly into json.load() when quickly writing code.

But I didn't hear much support for the overloading option anyway, so I don't think that's going anywhere.

-CHB

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-leave@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/KO3ZZN... Code of Conduct: http://python.org/psf/codeofconduct/
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
What happened to "not every three-line function needs to be a built-in"? This is *literally* a three-line function. And I know the proposal is not to make it a builtin, but still... ISTM down here lies the path to PHP. On Fri, Sep 11, 2020 at 3:16 PM Christopher Barker <pythonchb@gmail.com> wrote:
I'm pretty sure this came up recently, and was pretty much rejected.
Another option would be to have json.dump take a file-like object or a path-like object -- there's plenty of code out there that does that.
hmm.. maybe that was the version that was rejected.
But I like the idea either way. it always seemed cumbersome to me to write the whole context manager in these kinds of cases.
-CHB
On Fri, Sep 11, 2020 at 2:05 PM The Nomadic Coder < atemysemicolon@gmail.com> wrote:
Personally prefer it to be in the json module as it just feels more logical. But that's just a personal choice.
I didn't mention about dumpf as I see a previous thread that became quite controversial (seems like dumps was a more common usage at that time).
Something I forgot to mention: a (non-exact) search for this construct on GitHub (https://github.com/search?q=with+open+%3A+json.load&type=Code) gives 20 million+ results. Seems like it's a popular set of statements that people use ...
---- The Nomadic Coder
-- Christopher Barker, PhD
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
Oops, I was not aware of "not every three-line function needs to be a built-in". This came out of personal frustration, as I use this 3-line function very, very often, and the whole community does. Still learning to navigate what's accepted here and what's not :)
On 11Sep2020 23:09, The Nomadic Coder <atemysemicolon@gmail.com> wrote:
oops, was not aware of "not every three-line function needs to be a built-in"

This came out of personal frustration, as I use this 3-line function very, very often, and the whole community does. Still learning to navigate what's accepted here and what's not :)
DRY. Why don't people use modules more often? (I don't mean the stdlib, I mean extra modules.)

My personal solution to this kind of thing is to keep a little library/module of these 3-line things if I use them. Then you just import stuff and use it.

2 trite examples: I've got a cs.lex module with a bunch of little things in there for parsing stuff - identifiers, quoted strings, etc etc; it's on PyPI so using it elsewhere is trivial.

Closer to Nomadic's use case, I've got an @strable decorator, thus:

    @strable
    def func(f, ...):
        ... do something with an open file ...

It intercepts the first argument: if a str, it opens it as a file (by default; you can provide an arbitrary function for the "open" action). Then the function just has to work with a file (or whatever a str should turn into, domain specific). You could also write:

    load_json = strable(json.load)

and be on your way. This is also in a module (cs.deco), also on PyPI for reuse.

Not every three-line function needs to be a built-in, but for the three-line functions _you_ use a lot, write them _once_ and import them from your personal little module-of-three-line-functions. No need to publish to PyPI (extra work) - it's as easy to keep them locally unless you need them elsewhere. But don't rewrite - reuse!

Cheers, Cameron Simpson <cs@cskk.id.au>
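[A rough sketch of what such a decorator could look like; this is an illustration of the idea, not the actual cs.deco implementation.]

```python
import functools
import json


def strable(func, open_func=open):
    # If the first argument is a str, treat it as something to open
    # (a filename by default) and pass the open file to func instead.
    @functools.wraps(func)
    def wrapper(first, *args, **kwargs):
        if isinstance(first, str):
            with open_func(first) as f:
                return func(f, *args, **kwargs)
        return func(first, *args, **kwargs)
    return wrapper


# A one-liner then gives a load-from-path-or-file function:
load_json = strable(json.load)
```

With this, `load_json("config.json")` and `load_json(already_open_file)` both work, which is essentially the "super-charged" load() discussed elsewhere in the thread, but living in your own utility module.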
The Nomadic Coder writes:
This came out personal frustration, as I use this 3 line function very, very often, and the whole community does.
I don't deny your experience, but mine differs. Most JSON I get as single objects arrives as strings (eg as an attribute on an HTTP response object), not in files or file-like objects. Most JSON I get from file-like objects comes as streams of objects, rather than as single objects, which means that to use the json module I have to do something a little more complicated (sometimes quite a bit more complicated) than a 3-line function. The issue of handling such streams has come up in the past.
Still learning-to-navigate what's accepted here and what's not :)
On this list, you can ask as long as you don't get pissy about not receiving. See Naoki Inada's post for why this might be a good idea even though it's a three-line function. It's not open and shut (for one thing, on most modern systems the system default encoding is already UTF-8), but it definitely is food for thought. Also Serhiy's post on issues specific to working with files and json. So this is a richer learning experience than you might have thought!
On Sat, Sep 12, 2020 at 5:29 AM Stephen J. Turnbull < turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
See Naoki Inada's post for why this might be a good idea even though it's a three-line function. It's not open and shut (for one thing, on most modern systems the system default encoding is already UTF-8),
Actually that fact is the most compelling reason to do this -- since many (most) development systems use UTF-8 by default, if you leave the encoding out, your code will work most of the time, pass tests, pass tests on the CI, etc., and then barf when it's run on an unusual system.

I have to say that I'm not at all sure that I've remembered to set the encoding in my code that reads/writes JSON from files. Time to go check. In fact, I'm getting bytes/str errors in JSON-related parts of code I'm porting to py3 right now. This is probably related.

-CHB

-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On Fri, Sep 11, 2020, 12:59 PM Guido van Rossum <guido@python.org> wrote:
What happened to "not every three-line function needs to be a built-in"? This is *literally* a three-line function. And I know the proposal is not to make it a builtin, but still... ISTM down here lies the path to PHP.
By the same reasoning, though, if you have dumps(), writing dump() in terms of it is a three-line function. Same in the venerable pickle versions.
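[For concreteness, the wrapper being alluded to might look like this; `dump_via_dumps` is a hypothetical name for illustration.]

```python
import io
import json


def dump_via_dumps(obj, fp, **kwargs):
    # Serialize the whole object to one string, then write it out:
    # simple, but it builds the entire document in memory first.
    fp.write(json.dumps(obj, **kwargs))


# Works with any text-mode writable object:
buf = io.StringIO()
dump_via_dumps({"a": 1}, buf, indent=2)
```

The in-memory intermediate string is the catch: for very large documents, the real json.dump streams output chunk by chunk instead.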
On Fri, Sep 11, 2020 at 4:19 PM David Mertz <mertz@gnosis.cx> wrote:
On Fri, Sep 11, 2020, 12:59 PM Guido van Rossum <guido@python.org> wrote:
What happened to "not every three-line function needs to be a built-in"? This is *literally* a three-line function. And I know the proposal is not to make it a builtin, but still... ISTM down here lies the path to PHP.
By the same reasoning, though, if you have dumps(), writing dump() in terms of it is a three-line function. Same in the venerable pickle versions.
Not quite -- if the serialized data is really huge it makes sense to read or write it directly from/to a file, rather than building up a gigantic in-memory buffer as an intermediate structure. (And going in the other direction, if all you have is load/dump, constructing an io.StringIO instance is a fairly awkward bit of idiom.)

But I'm convinced by Inada's observation that it's easy to have encoding-related bugs -- we should add this in a way that avoids those. And I'm fine with loadf/dumpf as the names, since we already have loads/dumps.

-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
On Sat, Sep 12, 2020 at 7:59 AM Guido van Rossum <guido@python.org> wrote:
What happened to "not every three-line function needs to be a built-in"? This is *literally* a three-line function.
This is not only a common two-line idiom -- it creates a huge amount of potential bugs.

```
with open("myfile.json") as f:
    data = json.load(f)

with open("myfile.json", "w") as f:
    json.dump(data, f, ensure_ascii=False)
```

Both snippets have a bug: they don't specify `encoding` [1]. They use the locale encoding to read/write JSON, although the JSON file must be encoded in UTF-8. The locale encoding is a legacy encoding on Windows, so it is very easy to write code that "doesn't work on Windows".

My PEP 597 [2] will help to find such bugs. But it warns only in dev mode, to avoid too-noisy DeprecationWarnings; huge amounts of DeprecationWarnings make people dismiss DeprecationWarning. So helper functions will save people from this kind of bug too.

[1] In case of `json.load(f)`, we can use a binary file instead.
[2] https://www.python.org/dev/peps/pep-0597/

Regards,
-- Inada Naoki <songofacandy@gmail.com>
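[For contrast, here is a corrected version of those two snippets; the sample data is made up, and the only substantive change is the explicit encoding.]

```python
import json

data = {"greeting": "こんにちは"}  # non-ASCII, so the encoding matters

# Stating encoding="utf-8" explicitly makes the result independent of
# the locale encoding (e.g. cp1252 or Shift JIS on Windows).
with open("myfile.json", "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False)

with open("myfile.json", encoding="utf-8") as f:
    assert json.load(f) == data
```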
12.09.20 01:57, Guido van Rossum пише:
What happened to "not every three-line function needs to be a built-in"? This is *literally* a three-line function. And I know the proposal is not to make it a builtin, but still... ISTM down here lies the path to PHP.
Oh, I am very glad that this principle has not yet been forgotten.
11.09.20 23:28, The Nomadic Coder пише:
Hi All,
This is the first time I'm posting to this mailing group, so forgive me if I'm making any mistakes.
So one of the most common ways to load JSON is via a file. This is used extensively in data science and the like. We often write something like :-
with open("filename.json", "r") as f:
    my_dict = json.load(f)

or

my_dict = json.load(open("filename.json", "r"))
Since this is sooooo common, why doesn't python have something like :- json.loadf("filename.json")
Is there an obvious issue with defining this in CPython? I don't mind whipping up a PR if it gains traction.
Similar ideas were already proposed and rejected multiple times.

First, there is a principle: "not every three-line function needs to be a built-in". Every new function increases maintenance cost and cognitive burden. You can think that it is small, but there are thousands of such "useful" helpers. It is easier to learn how to combine simple builtin blocks than to remember names and arguments for all combinations.

Second, the resulting function would have a monstrous interface. open() takes 8 arguments, and json.load() can take at least 8 arguments, and they would need to be combined. And be thankful that open() does not accept arbitrary var-keyword arguments as json.load() does, or resolving this conflict would be impossible.

Third, this combination is not as common as you think. In my current work JSON is used everywhere, but in most cases it is received from the internet or loaded from a database. If loading from a file is the most common way to load JSON in your program, it is not hard to write your own helper with an interface and defaults that suit you. It will take less time than writing a letter to a mailing list.
On 12/09/20 8:36 pm, Serhiy Storchaka wrote:
it is not hard to write your own helper with interface and defaults that suit you. It will take less time than writing a letter in a mailing list.
Obviously what's needed is an IDE feature such that whenever you write a 3-line function that you haven't used before, it automatically posts a request to python-ideas asking for it to be added to the stdlib. Think of all the time it would save! -- Greg
On 9/12/2020 12:05 PM, Greg Ewing wrote:
On 12/09/20 8:36 pm, Serhiy Storchaka wrote:
it is not hard to write your own helper with interface and defaults that suit you. It will take less time than writing a letter in a mailing list.
Obviously what's needed is an IDE feature such that whenever you write a 3-line function that you haven't used before, it automatically posts a request to python-ideas asking for it to be added to the stdlib. Think of all the time it would save!
Nah, don't stop there. All you need is an IDE feature that creates and merges a pull request. Just think how easy that would make it for Python to be everything to everyone. --Edwin
On Sat, Sep 12, 2020 at 1:40 AM Serhiy Storchaka <storchaka@gmail.com> wrote:
Second, the resulting function would have monstrous interface. open() takes 8 arguments,
well, maybe not -- this would not need to support the entire open() interface:

* No one is suggesting getting rid of the current load() function, so if you need to do something less common, you can always open the file yourself.
* Many of the options to open() should always be the same when reading/writing JSON. In fact, that's (IMHO) the most compelling reason to have this built in -- the defaults may be not ideal for JSON, and other specs may be downright wrong.

Here are all eight parameters:

mode='r': this should be either 'r' or 'w' depending on loading or saving; I don't think there's any other (certainly not common) use case for any other mode.
buffering=-1: this seems very reasonable to put in control of the JSON encoder/decoder.
encoding=None: this is the important one -- JSON is always UTF-8, yes?
errors=None: this is also good to leave in control of the JSON encoder/decoder.
newline=None: also in control of the encoder.
closefd=True: I think this is irrelevant to this use case.
opener=None: also does not need to be supported.

If I have this right, then none of the optional arguments to open() would need to be passed through.
they should be combined. And if open() accepted arbitrary var-keyword arguments the way json.load() does, resolving this conflict would be impossible.
see above -- there is no conflict to resolve.
Third, this combination is not so common as you think.
agreed -- but still fairly common. And most of the use cases that aren't file-on-disk are strings anyway. I know I use file-on-disk most of the time I use .load()/dump(), even if I use .loads()/dumps() more frequently. -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
Christopher Barker writes:
encoding=None: this is the important one -- json is always UTF-8 yes?
Standard JSON is always UTF-8. Nevertheless, I'm quite sure that there's a ton of Japanese in Shift JIS, including some produced by default in Python on Windows. I'll bet the same is true of GBK for Chinese, and maybe even ISO-8859-1 in Europe.
On Sun, Sep 13, 2020 at 7:58 AM Stephen J. Turnbull < turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
encoding=None: this is the important one -- json is always UTF-8 yes?
Standard JSON is always UTF-8. Nevertheless, I'm quite sure that there's a ton of Japanese in Shift JIS, including some produced by default in Python on Windows. I'll bet the same is true of GBK for Chinese, and maybe even ISO-8859-1 in Europe.
So what should the json lib do with these? It could have an encoding parameter with utf-8 as default. Or it could require that the user open the file themselves if it's not UTF-8. BTW: I noticed that json.loads() takes: Deserialize ``s`` (a ``str``, ``bytes`` or ``bytearray`` instance containing a JSON document) to a Python object. A str is a str (already Unicode, yes?) -- but for bytes, it must be assuming some encoding, presumably UTF-8, and it doesn't seem to have a way to specify one -- so this is already a missing feature. -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On Sun, Sep 13, 2020 at 8:59 AM Christopher Barker <pythonchb@gmail.com> wrote:
On Sun, Sep 13, 2020 at 7:58 AM Stephen J. Turnbull < turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
encoding=None: this is the important one -- json is always UTF-8 yes?
Standard JSON is always UTF-8. Nevertheless, I'm quite sure that there's a ton of Japanese in Shift JIS, including some produced by default in Python on Windows. I'll bet the same is true of GBK for Chinese, and maybe even ISO-8859-1 in Europe.
If a document is not encoded as UTF-8, then it is not JSON. It might be "a format closely inspired by JSON" but not the real thing. I think it's reasonable for a new json.loadf() function to only handle actual JSON files. If something is encoded wrongly, users can do what they do now, and use the various options to open() combined with json.load() or json.loads(). A good error message about the encoding being the issue would be nice. -- The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.
On 2020-09-13 11:57, Christopher Barker wrote:
On Sun, Sep 13, 2020 at 7:58 AM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp <mailto:turnbull.stephen.fw@u.tsukuba.ac.jp>> wrote:
> encoding=None: this is the important one -- json is always UTF-8 yes?
Standard JSON is always UTF-8. Nevertheless, I'm quite sure that there's a ton of Japanese in Shift JIS, including some produced by default in Python on Windows. I'll bet the same is true of GBK for Chinese, and maybe even ISO-8859-1 in Europe.
So what should the json lib do with these? It could have an encoding parameter with utf-8 as default. Or it could require that the user open the file themselves if it's not UTF-8.
BTW: I noticed that json.loads() takes:
Deserialize ``s`` (a ``str``, ``bytes`` or ``bytearray`` instance containing a JSON document) to a Python object.
A str is an str (already Unicode, yes?) -- but for bytes, it must be assuming some encoding, presumably UTF-8, but it doesn't seem to have a way to specify one -- so this is already a missing feature.
It's not a missing feature, because the JSON spec requires UTF-8. If it's not UTF-8, it's invalid JSON. If a user wants to handle a file that looks sort of like JSON but technically isn't because it's not UTF-8, it's on the user to first convert the file to UTF-8 before bringing JSON into the picture. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
Oh, well, but stdlib json already emits (and parses) invalid JSON by default...

$ ipython
Python 3.8.3 (default, May 19 2020, 13:54:14)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.15.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import json

In [2]: json.dumps({'a': float('nan')})
Out[2]: '{"a": NaN}'

In [3]: json.loads(json.dumps({'a': float('nan')}))
Out[3]: {'a': nan}

NaN is not allowed by the JSON spec, so this is wrong, though you can still do it... Maybe it's one of those cases where practicality beats purity.
-- M
On Sun, 13 Sep 2020 at 12:31, Brendan Barnwell <brenbarn@brenbarn.net> wrote:
Yes, that is a design flaw in the stdlib. There ought to be an opt-in switch for accepting/producing those special values, not the current opt-out for strictness... And the misnamed parameter is 'allow_nan' whereas it also configures 'Infinity'. On Sun, Sep 13, 2020, 3:16 PM Matthias Bussonnier < bussonniermatthias@gmail.com> wrote:
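For reference, the strictness David is asking for does exist today, just as an opt-out rather than an opt-in. A small sketch of the current behaviour (the `reject` callback is just an illustrative name):

```python
import json

# By default the encoder emits the non-standard NaN/Infinity literals:
assert json.dumps(float("nan")) == "NaN"
assert json.dumps(float("inf")) == "Infinity"

# allow_nan=False (despite the name) rejects NaN *and* both infinities:
try:
    json.dumps(float("inf"), allow_nan=False)
except ValueError as e:
    print("strict encoder:", e)

# On the decoding side, parse_constant is called for these literals,
# so a raising callback opts in to strictness there too:
def reject(literal):
    raise ValueError(f"non-standard JSON literal: {literal}")

try:
    json.loads("NaN", parse_constant=reject)
except ValueError as e:
    print("strict decoder:", e)
```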
On Mon, Sep 14, 2020 at 10:52 AM David Mertz <mertz@gnosis.cx> wrote:
Yes, that is a design flaw in the stdlib. There ought to be an opt-in switch for accepting/producing those special values, not the current opt-out for strictness... And the misnamed parameter is 'allow_nan' whereas it also configures 'Infinity'.
In the case of encoding, we deprecated and ignored the parameter in json.loads() since Python 3.1, and removed it in 3.9. Users can still load/save JSON in legacy encodings with open() + dump/load. -- Inada Naoki <songofacandy@gmail.com>
From "[Python-ideas] JSON encoder protocol (was Re: adding support for a "raw output" in JSON serializer)" https://mail.python.org/archives/list/python-ideas@python.org/message/5C4UHZ...
- There's JSON5; which supports comments, trailing commas, IEEE 754 ±Infinity and NaN, [...] https://json5.org/
On Sun, Sep 13, 2020, 9:53 PM David Mertz <mertz@gnosis.cx> wrote:
Yes, that is a design flaw in the stdlib. There ought to be an opt-in switch for accepting/producing those special values, not the current opt-out for strictness... And the misnamed parameter is 'allow_nan' whereas it also configures 'Infinity'.
Christopher Barker writes:
On Sun, Sep 13, 2020 at 7:58 AM Stephen J. Turnbull < turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
encoding=None: this is the important one -- json is always UTF-8 yes?
Standard JSON is always UTF-8. Nevertheless, I'm quite sure that there's a ton of Japanese in Shift JIS, including some produced by default in Python on Windows. I'll bet the same is true of GBK for Chinese, and maybe even ISO-8859-1 in Europe.
So what should the json lib do with these?
Well, I'm a mail guy from way back, so I'm with Mr. Postel: be libertine in what you accept, puritan in what you emit. I think given the current architecture of json, dump and load are fine as is, dump should be discouraged (but not removed!) in favor of dumpf, and dumpf and loadf should provide no option but UTF-8. I just wanted to point out that it's very likely that there's a lot of "JSON-like" data out there, and probably a lot of "unwritten protocols" that expect it. While nobody has proposed removing dump and load, I don't want them deprecated or discouraged for the purpose of dealing with "JSON-like" data, especially not load.
There seems to be a fair bit of support for this idea. Will it need a PEP ? -CHB On Mon, Sep 14, 2020 at 9:20 AM Stephen J. Turnbull < turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
-- Christopher Barker, PhD
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On Tue, Sep 15, 2020 at 2:41 AM Christopher Barker <pythonchb@gmail.com> wrote:
There seems to be a fair bit of support for this idea.
Will it need a PEP ?
I think not; unless there are API details to be hashed out or counterarguments to be rebuffed, I think this is fairly simple and non-controversial. (If someone disagrees with me, then it's clearly at least somewhat controversial, and I withdraw the above.) Can you create a PR directly? If not, create a bugs.python.org issue to track it. Either way, it's up to the core devs to decide, but it shouldn't need all the overhead of a full PEP. ChrisA
On Mon, Sep 14, 2020 at 10:00 AM Chris Angelico <rosuav@gmail.com> wrote:
On Tue, Sep 15, 2020 at 2:41 AM Christopher Barker <pythonchb@gmail.com> wrote:
There seems to be a fair bit of support for this idea.
Will it need a PEP ?
I think not; unless there are API details to be hashed out or counterarguments to be rebuffed, I think this is fairly simple and non-controversial.
agreed -- at least the simple part. The only API detail I've seen is whether to make json.load() take a "path-like or file-like" object, or to have separate functions. And I think the consensus is to have separate functions, and to name them ``loadf()`` and ``dumpf()``. The other potential API issue is whether to support arbitrary encodings -- but I think the consensus there is also no. So yes -- pretty simple.
Can you create a PR directly? If not, create a bugs.python.org issue
to track it.
not me -- I'm having enough trouble finding time to write the inf/nan PEP. But I think the OP offered to do so.
Either way, it's up to the core devs to decide, but it shouldn't need all the overhead of a full PEP.
And if it does, the core devs can tell us then. -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On Mon, Sep 14, 2020 at 9:58 AM Chris Angelico <rosuav@gmail.com> wrote:
On Tue, Sep 15, 2020 at 2:41 AM Christopher Barker <pythonchb@gmail.com> wrote:
There seems to be a fair bit of support for this idea.
Will it need a PEP ?
I think not; unless there are API details to be hashed out or counterarguments to be rebuffed, I think this is fairly simple and non-controversial.
I agree.
(If someone disagrees with me, then it's clearly at least somewhat controversial, and I withdraw the above.)
Can you create a PR directly? If not, create a bugs.python.org issue to track it. Either way, it's up to the core devs to decide, but it shouldn't need all the overhead of a full PEP.
It will still need a bpo issue, so might as well create one now. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
Well, I didn't read the entire discussion... but back in unsuspicious times I wrote a stupid little module, msutils, with two stupid little functions, jsonLoad and jsonDump: https://github.com/Marco-Sulla/msutils/blob/master/msutils/jsonutil.py#L20
14.09.20 19:55, Chris Angelico wrote:
I think not; unless there are API details to be hashed out or counterarguments to be rebuffed, I think this is fairly simple and non-controversial.
I also considered this issue simple and non-controversial. But the discussion turned in an unexpected direction.
(If someone disagrees with me, then it's clearly at least somewhat controversial, and I withdraw the above.)
Can you create a PR directly? If not, create a bugs.python.org issue to track it. Either way, it's up to the core devs to decide, but it shouldn't need all the overhead of a full PEP.
And don't forget to also update the marshal, pickle and plistlib modules.
On 14/09/2020 17:36, Christopher Barker wrote:
There seems to be a fair bit of support for this idea.
Will it need a PEP ?
-CHB
If I've understood correctly (far from certain) the existing json.dumps and json.loads functions are permissive (allow some constructions that are not part of the JSON spec) but the proposed new functions will be strict. To minimise possible confusion, I think that the documentation (both the docstrings and the online docs) should be **very clear** about this. E.g.

loads: ... loads accepts blah-blah-blah. This is different from loadf, which only accepts strict JSON.

loadf: ... loadf only accepts strict JSON. This is different from loads, which blah-blah-blah

Etc.
Rob Cliffe
On Wed, Sep 16, 2020 at 12:59 AM Rob Cliffe via Python-ideas < python-ideas@python.org> wrote:
On 14/09/2020 17:36, Christopher Barker wrote:
(allow some constructions that are not part of the JSON spec) but the proposed new functions will be strict.
as it looks like I may be the one to write the PR -- no, I'm not suggesting any changes to compliance. The only thing even remotely on the table is only supporting UTF-8 -- but IIUC, the current functions, if they do the encoding/decoding for you, are already UTF-8 only, so no change.

load() and dump() work with text file-like objects -- they are not doing any encoding/decoding.

loads() works with strings or bytes. If strings, then no encoding. If bytes, then: "The ``encoding`` argument is ignored and deprecated since Python 3.1" -- which I figured meant UTF-8, but in fact it seems to work with UTF-16 as well:

In [17]: utf16 = '{"this": 5}'.encode('utf-16')

In [18]: json.loads(utf16)
Out[18]: {'this': 5}

which surprises me. I'll need to look at the code and see what it's doing. Unless someone wants to tell us :-)

dumps(), meanwhile, dumps a str, so again, no encoding.

The idea here is that if you want to use loadf() or dumpf(), it will be UTF-8, and if you want to use another encoding, you can open the file yourself and use load() or dump()
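If I'm reading the stdlib right, the explanation for that surprise is json.detect_encoding(), the helper loads() applies to bytes input; it appears to recognise the UTF-8/16/32 family from BOMs and the placement of zero bytes (a sketch of the observed behaviour, not documentation):

```python
import json

for enc in ("utf-8", "utf-16", "utf-32"):
    raw = '{"this": 5}'.encode(enc)   # utf-16/32 include a BOM
    # detect_encoding() sniffs BOMs / zero-byte patterns, and loads()
    # then decodes the bytes with the detected codec before parsing.
    print(enc, "->", json.detect_encoding(raw), json.loads(raw))
```

So loads() already handles the whole UTF family for bytes input, even though only UTF-8 is standard JSON.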
To minimise possible confusion, I think that the documentation (both the docstrings and the online docs) should be **very clear** about this.
Yes, and they need some help in that regard now anyway. -CHB
E.g. loads: ... loads accepts blah-blah-blah. This is different from loadf which only accepts strict JSON.
loadf: ... loadf only accepts strict JSON. This is different from loads which blah-blah-blah
Etc. Rob Cliffe
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
Maybe unrelated, but the same goes for `pickle.load` and `pickle.dump`. For consistency, any changes made to `json.load` and `json.dump` (e.g. adding `json.loadf` and `json.dumpf`, or accepting a path-like as argument) should also be applied equivalently to `pickle.load` and `pickle.dump`. Off the top of my head, I can't think of any more places in the standard library with the same parallel structure.
On Thu, Sep 17, 2020 at 9:53 AM <lammenspaolo@gmail.com> wrote:
Maybe unrelated, but the same goes for `pickle.load` and `pickle.dump`. For consistencies, any changes made to `json.load` and `json.dump` (e.g. adding `json.loadf` and `json.dumpf` or accepting a path like as argument) should be also applied equivalently to `pickle.load` and `pickle.dump`.
Off the top of my head, I can't think of any more places in the standard library with the same parallel structure.
marshal is the other one in that set, and a quick 'git grep' shows that plistlib also has that API. The xmlrpc.client module also has dumps/loads, but not dump/load. ChrisA
I believe Serhiy already suggested pickle and marshal, and I guess we can add plistlib to those. Personally, I'm not so sure it should be added to all of these. I see why the same API was used for all of them, but they really are fairly different beasts. So if they have a function with the same purpose, it should have the same name, but that doesn't mean that all these modules need to have all the functions. On the other hand, the fact that we might be adding two new functions to four different modules is, in my mind, an argument for overloading the existing dump() / load() instead: a lot less API churn. -CHB On Wed, Sep 16, 2020 at 5:10 PM Chris Angelico <rosuav@gmail.com> wrote:
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On the other hand, the fact that we might be adding two new functions to four different modules is, in my mind, an argument for overloading the existing dump() / load() instead: a lot less API churn.
I also believe that overloading is the better option here. The whole point of this change is to make a very frequent operation more convenient, and having to keep in mind two distinct pairs of functions, which only differ in the type of the first argument and a few minor things, is less convenient than just using the same functions for both file objects and paths. Besides, I don't understand what the downside of overloading is, apart from purism (?).
On Fri, Sep 18, 2020 at 12:07 AM Paolo Lammens <lammenspaolo@gmail.com> wrote:
Besides, I don't understand what the downside of overloading is, apart from purism (?).
I am one of those who are conservative about overloading. I agree this is purism, but I want to explain the reasoning behind it. In a statically and nominally typed language, overloading is simple and clear, because exactly one type is chosen by the compiler. In a duck-typed (or structurally typed) language, on the other hand, the compiler or VM cannot choose a single type. For example: * A str subtype can implement read/write methods; it is both PathLike and file-like. * A File subtype can implement `.__fspath__`; it is both PathLike and a File. Of course, statically typed languages like Java allow implementing multiple interfaces. But a Java programmer must choose one interface explicitly when it is ambiguous, so it is explicit which type is used in overloading. In the case of Python, there is no compiler/VM support for overloading, because Python is a duck-typed language. * `load(f, ...)` uses `f.read()` * `dump(f, ...)` uses `f.write()` * `loadf(path, ...)` and `dumpf(path, ...)` use `open(path, ...)` This is a natural design for a duck-typed language. Regards, -- Inada Naoki <songofacandy@gmail.com>
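The ambiguity can be made concrete with a deliberately pathological object (the `Weird` class is purely illustrative, not a real-world pattern) that satisfies both protocols at once:

```python
import json
import os

class Weird(str):
    """A str subclass that is simultaneously path-like and file-like."""
    def __fspath__(self):   # satisfies the os.PathLike protocol
        return str(self)
    def read(self):         # satisfies the file-like protocol json.load() uses
        return str(self)

w = Weird('{"a": 1}')
print(json.load(w))         # read as a file-like object: {'a': 1}
print(os.fspath(w))         # usable as a file name: {"a": 1}
# A single overloaded load() would have to pick one interpretation by fiat.
```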
On Thu, Sep 17, 2020 at 7:07 PM Inada Naoki <songofacandy@gmail.com> wrote:
On Fri, Sep 18, 2020 at 12:07 AM Paolo Lammens <lammenspaolo@gmail.com> wrote: Besides, I don't understand what the downside of overloading is, apart from purism (?).
I am one of those who are conservative about overloading. I agree this is purism, but I want to explain the reasoning behind it.
Thank you -- it is key to understanding where we should go with this feature.
For example,
* A str subtype can implement read/write methods; it is both PathLike and file-like. * A File subtype can implement `.__fspath__`; it is both PathLike and a File.
I see the issue here, but isn't that inherent in duck typing? It's not specific to overloading. In fact the current implementation of json.load() simply calls .read() on the object passed in. If it does not have a read method, you get an AttributeError. If someone were to pass in a string subclass with a read method, and that method returned the right thing, it would "just work". Here's an example:

In [5]: class ReadableString(str):
   ...:     def read(self):
   ...:         return self

In [6]: rs = ReadableString('{"some": "json"}')

In [7]: rs
Out[7]: '{"some": "json"}'

In [8]: json.load(rs)
Out[8]: {'some': 'json'}

This is the whole point of dynamic, duck typing, yes? In a sense, json.load() is already "overloaded" to take anything with a read() method that returns a string containing JSON.

If I were to overload load() to allow a path-like object, I would probably do:

def load(f_or_p, *, cls=None, object_hook=None, parse_float=None,
         parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):
    try:
        fp = open(f_or_p, 'r', encoding="utf-8")
    except TypeError:
        fp = f_or_p
    return loads(fp.read(),
                 cls=cls, object_hook=object_hook, parse_float=parse_float,
                 parse_int=parse_int, parse_constant=parse_constant,
                 object_pairs_hook=object_pairs_hook, **kw)

In this case, it would work on anything that either could be used in open() or had a read() method. Is that really much different?

On the other hand, in the case of Python, there is no compiler/VM support for overloading, because Python is a duck-typed language.
* `load(f, ...)` uses `f.read()` * `dump(f, ...)` uses `f.write()` * `loadf(path, ...)` and `dumpf(path, ...)` use `open(path, ...)`
This is a natural design for a duck-typed language.
or: * `load(f, ...)` uses `f.read()` or `open(f)` I can see how that is a bit more complicated, but I don't see how it makes anyone's life worse or more confusing. -CHB -- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On Fri, Sep 18, 2020 at 1:05 PM Christopher Barker <pythonchb@gmail.com> wrote:
I see the issue here, but isn't that inherent in duck typing? It's not specific to overloading. In fact the current implementation of json.load() simply calls the .read() in the object passed in. If it does not have a read method, you get an AttributeError. IF someone where to pass in a string subclass with a read method, and that method returned the right thing, it would "just work". Here's an example:
Yes - it would indeed just work, and that's because the function *name* specifies whether it's going to read from a file object or open a path name. There's no ambiguity; if you call load() on something that's both a path and a file, it'll work, and if you call loadf() on something that's both a path and a file, it'll also just work. But if you try to make those into one and the same function, how's it going to pick from the two options? ChrisA
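Chris's ambiguity point is easy to demonstrate with a hypothetical object (the class name and JSON payload below are made up for illustration) that satisfies both protocols at once. Today json.load() resolves it unambiguously, because it only ever calls .read(); a combined load()/loadf() would have to pick a branch:

```python
import json
import os

class WeirdHandle:
    """Hypothetical object that is BOTH file-like (has .read())
    and path-like (has .__fspath__()) -- made up for illustration."""

    def read(self):
        # as a "file" it yields one JSON document...
        return '{"source": "read"}'

    def __fspath__(self):
        # ...but as a "path" it names a (different) file on disk
        return "other.json"

w = WeirdHandle()

# Today there is no ambiguity: json.load() always calls .read()
assert json.load(w) == {"source": "read"}

# But os.PathLike's subclass hook keys on __fspath__, so a
# path-first overload would take the other branch for this object
assert isinstance(w, os.PathLike)
```

Whichever protocol the merged function checks first silently wins for such objects, which is exactly the documentation burden being debated.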
On Thu, Sep 17, 2020 at 8:11 PM Chris Angelico <rosuav@gmail.com> wrote:

> On Fri, Sep 18, 2020 at 1:05 PM Christopher Barker <pythonchb@gmail.com> wrote:
>> I see the issue here, but isn't that inherent in duck typing? It's not specific to overloading. In fact the current implementation of json.load() simply calls .read() on the object passed in. If it does not have a read method, you get an AttributeError. If someone were to pass in a string subclass with a read method, and that method returned the right thing, it would "just work". Here's an example: [...]
>
> Yes - it would indeed just work, and that's because the function *name* specifies whether it's going to read from a file object or open a path name. There's no ambiguity; if you call load() on something that's both a path and a file, it'll work, and if you call loadf() on something that's both a path and a file, it'll also just work. But if you try to make those into one and the same function, how's it going to pick from the two options?

We can certainly document that it tries to use it as a path first. I suppose you could catch the FileNotFoundError and then try to use it as a file-like object.

I understand the theory here, but practicality beats purity -- do we really imagine that folks will create these path-like and file-like objects, then pass them into json.load() and get upset when it behaves oddly?

And this really doesn't feel that different from the fact that you can create an object with a .read() method that does something different, and it will fail then, too.

-CHB

--
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
> I understand the theory here: but practicality beats purity -- do we really imagine that folks will create these path-like and file-like objects, and then pass them into json.load() and get upset when it behaves oddly?
>
> And this really doesn't feel that different from the fact that you can create an object with a .read() method that does something different, and it will fail then, too.
+1, emphasis on the *practicality beats purity*. The whole reason this change was proposed is practicality. Adding two separate functions doesn't solve that problem. (Or, to be more precise, it does solve it, but the complexity it introduces directly counteracts any practicality it introduces.)
There's a small amount of overhead to overloading:

- 2x conditionals
- 1x function call overhead (additional stack frame)

Instead of this approach we could either:

- inline the existing contents of _load by indenting within the conditional
- just add the new code at the top of the function
- use multiple dispatch

```python
import os
import json
import pathlib


def test_load(n):
    print(('n', n, load))
    with open('test.json', 'w') as f:
        json.dump(True, f)
    assert load('test.json')
    assert load(pathlib.Path('test.json'))
    with open('test.json', 'r') as f:
        assert load(f)


def _load(file_, **kwargs):
    # ... rename existing json.load to json._load ...
    print(('load', file_, kwargs))
    return True


def load(file_, *, encoding="UTF-8", **kwargs):
    if isinstance(file_, str) or hasattr(file_, "__fspath__"):
        with open(os.fspath(file_), "r", encoding=encoding) as f:
            return _load(f, **kwargs)
    else:
        return _load(file_, **kwargs)


test_load(1)


def load(file_, *, encoding="UTF-8", **kwargs):
    if isinstance(file_, str) or hasattr(file_, "__fspath__"):
        with open(os.fspath(file_), "r", encoding=encoding) as f:
            return _load(f, **kwargs)
    return _load(file_, **kwargs)  # (or inline the existing json.load)


test_load(2)


## singledispatch
# https://docs.python.org/3/library/functools.html?highlight=dispatch#functool...
from functools import singledispatch


@singledispatch
def load(file_, **kwargs):
    return _load(file_, **kwargs)


@load.register(str)
def load_str(file_: str, *, encoding='UTF-8', **kwargs):
    with open(file_, "r", encoding=encoding) as f:
        return _load(f, **kwargs)


@load.register(os.PathLike)
def load_pathlike(file_: os.PathLike, *, encoding='UTF-8', **kwargs):
    with open(os.fspath(file_), "r", encoding=encoding) as f:
        return _load(f, **kwargs)


test_load(3)
```

On Thu, Sep 17, 2020 at 10:08 PM Inada Naoki <songofacandy@gmail.com> wrote:
> On Fri, Sep 18, 2020 at 12:07 AM Paolo Lammens <lammenspaolo@gmail.com> wrote:
>> Besides, I don't understand what the downside of overloading is, apart
>> from purism (?).
>
> I am one of those who are conservative about overloading. I agree this is purism, but I want to explain the thinking behind this purism.
>
> In a statically and nominally typed language, overloading is simple and clear because only one type is chosen by the compiler. On the other hand, a compiler or VM cannot choose a single type in a duck-typed (or structurally typed) language.
>
> For example:
>
> * A str subtype can implement read/write methods. It is both path-like and file-like.
> * A File subtype can implement `__fspath__`. It is both path-like and a File.
>
> Of course, statically typed languages like Java allow implementing multiple interfaces. But the Java programmer must choose one interface explicitly when it is ambiguous. So it is explicit which type is used in overloading.
>
> On the other hand, in the case of Python, there is no compiler/VM support for overloading, because Python is a duck-typed language.
>
> * `load(f, ...)` uses `f.read()`
> * `dump(f, ...)` uses `f.write()`
> * `loadf(path, ...)` and `dumpf(path, ...)` use `open(path, ...)`
>
> This is such a natural design for a duck-typed language.
>
> Regards,
> --
> Inada Naoki <songofacandy@gmail.com>

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-leave@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/ZZOHJI...
Code of Conduct: http://python.org/psf/codeofconduct/
> - inline the existing contents of _load by indenting within the conditional
> - just add the new code at the top of the function

Yes, this is what I was referring to; I'd choose one of these options.

I'm not so sure about dispatch because it is based solely on type and isn't very flexible; if in the future any change is made to the definition of path-like, or another addition is made to `json.load`/`json.dump`, or whatever, the type-based dispatch option is more likely to need further changes.

On Fri, 18 Sep 2020 at 04:22, Wes Turner <wes.turner@gmail.com> wrote:
> [Wes Turner's overloading comparison and code examples, quoted in full above, trimmed]
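For concreteness, here is a minimal runnable sketch of the "add the new code at the top of the function" option. It is a wrapper around today's json.load rather than a patch to the stdlib, and the `encoding` keyword is an assumption about how the path branch would be parameterized:

```python
import json
import os

def load(file_or_path, *, encoding="utf-8", **kwargs):
    """Sketch of an overloaded load(): accept a str or os.PathLike
    path, or any object with a .read() method (today's behaviour)."""
    if isinstance(file_or_path, (str, os.PathLike)):
        # path branch: open, parse, and close the file ourselves
        with open(os.fspath(file_or_path), "r", encoding=encoding) as f:
            return json.load(f, **kwargs)
    # file-like branch: defer to the existing json.load()
    return json.load(file_or_path, **kwargs)

# exercise both branches
with open("demo.json", "w") as f:
    json.dump({"ok": True}, f)

assert load("demo.json") == {"ok": True}
with open("demo.json") as f:
    assert load(f) == {"ok": True}
```

Note that because os.PathLike's subclass hook checks for `__fspath__`, the isinstance test here covers duck-typed path objects as well as pathlib.Path.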
On Fri, Sep 18, 2020 at 9:39 AM Paolo Lammens <lammenspaolo@gmail.com> wrote:
> I'm not so sure about dispatch because it is based solely on type and isn't very flexible; if in the future any change is made to the definition of path-like, or another addition is made to `json.load`/`json.dump`, or whatever, the type-based dispatch option is more likely to need further changes.
Though I doubt the __fspath__ interface will change: unfortunately, the @singledispatch decorator accepts types but not functions (or methods), so there's no way to dispatch according to the presence or value of parameter attributes like file_.__fspath__; all @singledispatch will do is check whether isinstance(file_, os.PathLike). IDK if that's for (C) performance reasons?

There are a few references to singledispatch in the stdlib: https://github.com/python/cpython/search?q=singledispatch

Given the presented use cases, I don't think the extensibility of @singledispatch is worth the performance cost for json.load.

https://www.python.org/dev/peps/pep-0622/#performance-considerations :

> Ideally, a match statement should have good runtime performance compared to an equivalent chain of if-statements. Although the history of programming languages is rife with examples of new features which increased engineer productivity at the expense of additional CPU cycles, it would be unfortunate if the benefits of match were counter-balanced by a significant overall decrease in runtime performance.
>
> Although this PEP does not specify any particular implementation strategy, a few words about the prototype implementation and how it attempts to maximize performance are in order.
>
> Basically, the prototype implementation transforms all of the match statement syntax into equivalent if/else blocks - or more accurately, into Python byte codes that have the same effect. In other words, all of the logic for testing instance types, sequence lengths, mapping keys and so on are inlined in place of the match.
>
> This is not the only possible strategy, nor is it necessarily the best. For example, the instance checks could be memoized, especially if there are multiple instances of the same class type but with different arguments in a single match statement. It is also theoretically possible for a future implementation to process case clauses or sub-patterns in parallel using a decision tree rather than testing them one by one.
This reminds me of a previous proposal (I can't remember if I hit the list with it), allowing

    with open("filename.json", "r") as f:
        my_dict = json.load(f)

to be spelt as a single expression:

    my_dict = (json.load(f) with open("filename.json", "r") as f)

Obviously this would be more useful in comprehensions, but it might also encourage people not to lazily write `json.load(open("filename.json", "r"))` because they want an expression, and end up leaving files open.

Eric
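For what it's worth, the single-expression spelling can be approximated today with a tiny helper (the name `with_open` is made up here, and the sample file is created just so the sketch is self-contained); this also avoids the file-leak problem in comprehensions:

```python
import json

# create a sample file so the sketch is self-contained
with open("filename.json", "w") as f:
    json.dump({"a": 1}, f)

def with_open(path, fn):
    """Apply fn to an opened file and close it afterwards
    (hypothetical helper, sketching the proposed expression form)."""
    with open(path) as f:
        return fn(f)

# single expression, file is closed before the assignment completes
my_dict = with_open("filename.json", json.load)
assert my_dict == {"a": 1}

# and in a comprehension, no file handles are leaked
dicts = [with_open("filename.json", json.load) for _ in range(3)]
assert dicts == [{"a": 1}] * 3
```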
`my_dict = (json.load(f) with open("filename.json", "r") as f)`

Would that be called a generator expression / comprehension context manager?

https://docs.python.org/3/reference/datamodel.html#context-managers
https://docs.python.org/3/library/contextlib.html

PEP 343 added the "with" statement.

Tests that would need to be extended / that may be useful references:

- https://github.com/python/cpython/blob/master/Lib/test/test_with.py
- https://github.com/python/cpython/blob/master/Lib/test/test_contextlib.py
- https://github.com/python/cpython/blob/master/Lib/test/test_contextlib_async...
- https://github.com/python/cpython/blob/master/Lib/test/test_grammar.py#L1701 test_with_statement
- https://github.com/python/cpython/blob/master/Lib/test/test_grammar.py#L1864 test_async_with
- https://github.com/python/cpython/blob/master/Grammar/python.gram

Would there be any new scope issues?

On Mon, Sep 28, 2020, 11:12 AM Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
> [Eric Wieser's proposal, quoted in full above, trimmed]
https://github.com/python/cpython/blame/master/Lib/unittest/test/testmock/te...

On Mon, Sep 28, 2020 at 12:00 PM Wes Turner <wes.turner@gmail.com> wrote:
> [previous message quoted in full, trimmed]
I'm pretty opposed to the idea of adding this functionality *specifically for JSON*, because JSON loading is not the only use case that could benefit from the convenience and reduced verbosity.

I propose the following (including type annotations and tests): https://repl.it/@maximum__/loadfile#main.py

Short version below:

```python
def loadfile(path, encoding='utf-8', loader=methodcaller('read')):
    """Load data from a file, using a given "loader"."""
    with open(path, 'r', encoding=encoding) as fp:
        return loader(fp)
```

So for the JSON use case, you would write:

    data = loadfile('data.json', loader=json.load)

This is still shorter and more readable than the with-open idiom, but it also lets ANYONE benefit from the new convenience, including people who might want to load data from various other file formats (YAML, TOML, CSV, etc.) or even just plain Unicode or binary streams.

As a side effect, it also fixes the "what to call it" problem and, more importantly, the "how to delegate args to open() vs. to json.load()" problem: loadfile() would pass along its kwargs to open(), and the user could use lambda, partial, or a utility function to customize the loader itself.

If we are really leery of adding a new top-level builtin, we could namespace it under a "loadfile" module:

    from loadfile import loadfile

although having to explicitly import the thing somewhat cuts into the beginner-friendliness aspect.

-Greg Werbin
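To illustrate Greg's point that the loader argument generalizes beyond JSON, here is a self-contained sketch (repeating his short version, with made-up file names) that exercises JSON, CSV, and the plain-text default:

```python
import csv
import json
from operator import methodcaller

def loadfile(path, encoding="utf-8", loader=methodcaller("read")):
    # Greg's sketch, repeated here so the example is self-contained
    with open(path, "r", encoding=encoding) as fp:
        return loader(fp)

# JSON
with open("data.json", "w") as f:
    json.dump([1, 2, 3], f)
assert loadfile("data.json", loader=json.load) == [1, 2, 3]

# CSV: any callable taking a file object works as the loader
with open("data.csv", "w", newline="") as f:
    csv.writer(f).writerows([["a", "b"], ["1", "2"]])
rows = loadfile("data.csv", loader=lambda fp: list(csv.reader(fp)))
assert rows == [["a", "b"], ["1", "2"]]

# plain text falls out of the default loader (fp.read())
assert loadfile("data.json") == "[1, 2, 3]"
```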
participants (22)

- Alex Hall
- Brendan Barnwell
- Cameron Simpson
- Chris Angelico
- Christopher Barker
- David Mertz
- Edwin Zimmerman
- Eric Wieser
- Greg Ewing
- Greg Werbin
- Guido van Rossum
- Inada Naoki
- Joao S. O. Bueno
- lammenspaolo@gmail.com
- Marco Sulla
- Matthias Bussonnier
- Paolo Lammens
- Rob Cliffe
- Serhiy Storchaka
- Stephen J. Turnbull
- The Nomadic Coder
- Wes Turner