pathlib.Path(...).json() (again)
It's been brought up a few times, e.g. https://github.com/python/cpython/pull/12465, but I really think it's time for it to be reconsidered.

Specifically, I love being able to use `asyncio.to_thread(pathlib.Path(...).read_text, encoding="utf8")`. It does exactly the right thing for me! It's great because it interacts with the event loop as few times as possible and uses only the RAM absolutely required.

I'm looking to be able to do something like `asyncio.to_thread(pathlib.Path(...).json)`, defined as:

```python
def json(self):
    import json

    with self.open("rb") as f:
        return json.load(f)
```

While this may seem simple to implement in my own code, it handles a lot of stuff for me! Specifically PEP 538, minimal interactions with the asyncio waker pipe, the minimal amount of RAM, etc.

You might ask: if json, why not .html() and .xml() and .csv()? Well, each of these formats requires specific configuration, e.g. do you want defused or regular exploding XML? Do you want Excel-style CSV or GNU-style CSV? AFAIK json is the only format with a one-shot zeroconf bytes -> structure conversion, and so is worthy of the pathlib shortcut.
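To illustrate the "zeroconf bytes -> structure" point, here is a minimal sketch of the proposed method written as a standalone function. The name `read_json` and the temporary file names are hypothetical, not part of the proposal. It relies on `json.load` accepting a binary file and detecting UTF-8, UTF-16 or UTF-32 on its own, so the caller never passes an encoding.

```python
import json
import pathlib
import tempfile


def read_json(path):
    """Hypothetical helper mirroring the proposed Path.json() method."""
    # Binary mode: json.load() works out the encoding from the bytes,
    # so no encoding argument is needed.
    with pathlib.Path(path).open("rb") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    utf8_file = pathlib.Path(tmp, "utf8.json")
    utf16_file = pathlib.Path(tmp, "utf16.json")
    # The same document stored in two different encodings.
    utf8_file.write_bytes('{"name": "café"}'.encode("utf-8"))
    utf16_file.write_bytes('{"name": "café"}'.encode("utf-16"))
    from_utf8 = read_json(utf8_file)
    from_utf16 = read_json(utf16_file)

print(from_utf8 == from_utf16)
```

Both files parse to the same dictionary without any per-file configuration, which is the property the message above singles out for JSON.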
Hi Thomas, On Fri, Jul 09, 2021 at 05:54:27PM -0000, Thomas Grainger wrote:
It's been brought up a few times eg: https://github.com/python/cpython/pull/12465
but I really think it's time to be re-considered.
Has anything changed since the last time it was discussed? If nothing has changed, and there are no new arguments in favour of the change, why do you think the result will be any different?

Why not just subclass pathlib.Path and add the method yourself?

    class MyPath(pathlib.Path):
        def json(self): ...

Or for that matter, you can add an extension method to pathlib.Path:

    def json(self): ...

    pathlib.Path.json = json
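The two alternatives suggested here can be sketched as follows. This is only an illustration under stated assumptions: the helper name `_read_json` is hypothetical, and the subclass derives from the concrete class returned by `type(pathlib.Path())` rather than from `pathlib.Path` directly, because instantiating a direct `pathlib.Path` subclass fails on Python versions before 3.12.

```python
import json
import pathlib
import tempfile


def _read_json(self):
    # Same body as the method proposed at the top of the thread.
    with self.open("rb") as f:
        return json.load(f)


# Option 1: subclass. Deriving from the concrete class (PosixPath or
# WindowsPath) is an assumption made so the sketch also runs on older Pythons.
class MyPath(type(pathlib.Path())):
    json = _read_json


# Option 2: monkey-patch the method onto pathlib.Path itself.
pathlib.Path.json = _read_json

with tempfile.TemporaryDirectory() as tmp:
    target = pathlib.Path(tmp, "data.json")
    target.write_text('{"answer": 42}', encoding="utf-8")
    via_subclass = MyPath(target).json()
    via_patch = target.json()

print(via_subclass == via_patch)
```

The monkey-patch affects every `Path` in the process, including those created by third-party code, whereas the subclass only affects paths you construct as `MyPath`.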
On Fri, Jul 9, 2021 at 6:21 PM Steven D'Aprano <steve@pearwood.info> wrote:
Has anything changed since the last time it was discussed? If nothing has changed, and there are no new arguments in favour of the change, why do you think the result will be any different?
Note: one thing that has not been rejected, and would likely be accepted, is adding the capability in the json module to pass in a filename, rather than an open file, to have a json file read. So your example would be: (untested)

`asyncio.to_thread(pathlib.Path(...).read_text, encoding="utf8")`

    import json

    the_path = pathlib.Path(....)
    asyncio.to_thread(json.loadf, the_path)

which reads even better to me.

And while I would like json file reading to be simpler, I don't understand why any of this is needed to "interact with the event loop as few times as possible and use only the RAM absolutely required."

Honestly I'm still not totally clear on all things async, but what's wrong with:

    with open(the_path, encoding="utf-8") as the_file:
        asyncio.to_thread(json.load, the_file)

It opens the file in the main thread, and not asynchronously, but doesn't the file itself get read in the other thread, asynchronously? And is there any extra RAM used?

Also -- and this may be completely my ignorance -- but don't you need to wrap the json reading in a function anyway so that you can capture the result somewhere?

In short -- no, I don't think JSON is special enough to get a Path method, but a simple way to read JSON directly from a Path would be nice.

-CHB
-- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
It opens the file in the main thread, and not asynchronously, but doesn't the file itself get read in the other thread, asynchronously? And is there any extra RAM used?
The file could be on an external network drive and so opening it may block the main thread for seconds:

```
with open(the_path, 'rb') as the_file:  # no other tasks can progress while opening this file
    return await asyncio.to_thread(json.load, the_file)
```
Also -- and this may be completely my ignorance -- but don't you need to wrap the json reading in a function anyway so that you can capture the result somewhere?
asyncio.to_thread is an async function that returns the result of the passed-in synchronous function, and it can run concurrently with other tasks by scheduling the synchronous function on a thread.
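A minimal sketch of that behaviour, assuming a hypothetical helper `read_json` and a stand-in `heartbeat` coroutine (neither comes from the thread): the open and the parse both happen in the worker thread, the awaited call hands back the parsed result, and another task keeps running in the meantime.

```python
import asyncio
import json
import pathlib
import tempfile


def read_json(path):
    # Hypothetical helper: open *and* parse inside the worker thread, so a
    # slow open() (for example on a network drive) never blocks the event loop.
    with open(path, "rb") as f:
        return json.load(f)


async def heartbeat(ticks):
    # Stand-in for other tasks that should keep making progress meanwhile.
    for _ in range(3):
        ticks.append("tick")
        await asyncio.sleep(0)


async def main(path):
    ticks = []
    # Awaiting to_thread() yields whatever read_json returned.
    data, _ = await asyncio.gather(
        asyncio.to_thread(read_json, path),
        heartbeat(ticks),
    )
    return data, ticks


with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp, "data.json")
    path.write_text('{"ok": true}', encoding="utf-8")
    data, ticks = asyncio.run(main(path))

print(data, ticks)
```

No extra wrapper is needed to capture the result: `await asyncio.to_thread(...)` evaluates to the return value of the synchronous function.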
In short -- no, I don't think JSON is special enough to get a Path method, but a simple way to read JSON directly from a Path would be nice.
```
asyncio.to_thread(json.loadf, the_path)
```

would be fine too.
It's a utility method, so its usefulness derives from being available everywhere without having to patch it in. For me, what's changed is the introduction of `asyncio.to_thread` and PEP 597 making `pathlib.Path.open` slightly less ergonomic.
On Sat, Jul 10, 2021 at 09:21:14AM -0000, Thomas Grainger wrote:
It's a utility method, so its usefulness derives from being available everywhere without having to patch it in.
Don't mistake *convenience* with *usefulness*. It is convenient to have the functionality in the std lib, especially for those who are limited to only using the std lib without third party libraries. But it is no more useful.

If the functionality is the same, it makes no difference whether it was in pathlib.Path, patched in, subclassed or imported from a library. The usefulness of this method to *you* does not depend on whether other people have access to it or not.

Remember too that even if we agreed right now that this was an awesome idea, the earliest that it could appear in production would be 3.11. So you can't rely on it being in the std lib until you can stop using 3.10 and below. Until then, you have to rely on a subclass or monkey-patch.

If this method was hard to write correctly, that would be a good argument for adding it to pathlib. But you literally gave an implementation for this method and it was only a few lines long, so not difficult for people to write themselves.

So far, at least, it doesn't seem to have attracted much interest from other people saying "oh yes, that would be really useful", so it seems that the usefulness is very niche. That makes it hard to justify complicating the pathlib API with an extra method if hardly anyone is going to use it.

Put all of those together:

- you can't use it until you have abandoned 3.10 and older;
- not many other people seem interested, including the pathlib maintainers;
- it seems to be easy enough to implement yourself;

and the convenience argument isn't very strong.

If you're going to push this, I think you need a stronger justification.
For me what's changed is the introduction of `asyncio.to_thread` and PEP 597 making `pathlib.Path.open` slightly less ergonomic.
I don't know what you mean by ergonomic when it comes to software. Obviously you're not implying that it is physically harder to type "pathlib.Path.open" now, so I don't know what you mean. -- Steve
participants (3)
- Christopher Barker
- Steven D'Aprano
- Thomas Grainger