pathlib+os/shutil feedback

I talked to my colleague. He didn't remember the concrete use case, but he instantly mentioned three possible things (in no particular order):

1) pathlib + mtime
2) os.walk and pathlib
3) creation/removal of paths

He wasn't too sure, but I checked the docs and his memories seemed to be correct:

-----

1) https://docs.python.org/3/library/pathlib.html#pathlib.Path.stat

High-level path objects should return high-level [insert type here] objects. Put differently, an API for retrieving time stats as real date/time objects would be nice. I think that idea can be extended to other pathlib methods as well, to make them less "os-wrapper"-like and to provide added value.

-----

2) I remember a discussion on python-ideas about using "glob" or "rglob". However, when I search the docs for "walk" (as in "os.walk") or for "iter", I don't find "glob"/"rglob". I can imagine that we, as pathlib newbies back then, simply didn't discover them. It would be great if the docs could be improved like this:

"""
Path.rglob(pattern)
Walk down a given path; a wrapper for "os.scandir"/"os.listdir". This is like calling glob() with “**” added in front of the given pattern:
"""

I think that would make "glob" and "rglob" more discoverable to new users.

NOTE: """Using the “**” pattern in large directory trees may consume an inordinate amount of time.""" does not sound very encouraging. This is especially true for "rglob", as it is defined as "like calling glob() with “**”". That leads one to wonder whether "rglob" performs slow globbing instead of using "os.scandir"/"os.listdir". https://docs.python.org/3/library/pathlib.html#basic-use even promotes "glob" with "**" right at the beginning, which rather discourages using "rglob" as a fast alternative to "os.walk/scandir/listdir". Renaming "rglob" or adding a "scan" method would definitely help here.

-----

3) Again searching the docs for "create", "delete" (nothing found) and "remove", I found "Path.touch", "Path.rmdir" and "Path.unlink".
It would be great if we had an easy way to remove a complete subtree, as with "shutil.rmtree". We mostly don't care whether a directory is empty; we need the system to be in the state "this path does not exist anymore". Moreover, touching a file is good enough to "create" it if you don't care about changing its mtime. If you do care about its mtime, "touch" is a problem.

------

That's it for our past issues with pathlib. Additionally, I have two further observations:

A) pathlib tries to mimic/publish some low-level APIs to its users. "unlink" is not something people would expect to use when they want to "delete" or "remove" a file or a directory. I know where the term stems from, but it's the wrong layer of abstraction IMHO. Same for "touch" or "chmod".

B) "rename" vs "replace": the difference is not really clear from the docs. You need to read "Path.replace" in order to understand "Path.rename" completely (one raises an exception, the other doesn't, if I read it correctly).

If there's some agreement to change things with respect to those 5 points, I am willing to put some time into it.

Best,
Sven
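For illustration, here is a minimal sketch of what points 1, 3 and B currently take with the standard library, assuming Python 3.6+ (where os and shutil accept Path objects). `mtime_as_datetime` and `remove_path` are hypothetical helper names, not pathlib API. On point B, the pathlib docs state that `Path.rename()` silently replaces an existing file target on Unix but raises `FileExistsError` on Windows, while `Path.replace()` replaces the target on every platform.

```python
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def mtime_as_datetime(path: Path) -> datetime:
    """Hypothetical helper for point 1: a path's mtime as an aware UTC datetime.

    Path.stat() only exposes st_mtime as a float (seconds since the epoch),
    so the conversion to a datetime has to be done by hand.
    """
    return datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)


def remove_path(path: Path) -> None:
    """Hypothetical helper for point 3: ensure that `path` no longer exists.

    Removes a regular file, a symlink, or a whole directory tree; a path that
    is already missing is treated as success.
    """
    if path.is_dir() and not path.is_symlink():
        # A real directory (not a link to one): delete it and everything below.
        shutil.rmtree(path)
    else:
        try:
            # Files and symlinks; Path.unlink() only gained missing_ok in 3.8.
            path.unlink()
        except FileNotFoundError:
            pass  # Already gone, which is the state we want.


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "old.txt").write_text("old")
        (root / "new.txt").write_text("new")
        # Point B: replace() overwrites an existing target on every platform,
        # whereas rename() would raise FileExistsError here on Windows.
        (root / "new.txt").replace(root / "old.txt")
        print((root / "old.txt").read_text())  # new
        print(mtime_as_datetime(root / "old.txt"))

        (root / "sub" / "deeper").mkdir(parents=True)
        (root / "sub" / "deeper" / "f.txt").write_text("x")
        remove_path(root / "sub")
        remove_path(root / "does-not-exist")  # no error
        print((root / "sub").exists())  # False
```

The helpers deliberately express intent ("make this path not exist") rather than the underlying system call, which is the layer of abstraction point A asks for.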

On 10 April 2016 at 15:07, Sven R. Kunze <srkunze@mail.de> wrote:
If there's some agreement to change things with respect to those 5 points, I am willing to put some time into it.
In broad terms I agree with these points. Thanks for doing the research. It would certainly be good to try to improve pathlib based on this sort of feedback while it is still provisional.

One specific point. You say:

"""
Path.rglob(pattern) Walk down a given path; a wrapper for "os.scandir"/"os.listdir".
"""

However, at least in 3.5, Path.rglob does *not* wrap scandir. There's a difference in principle, in that scandir (DirEntry) objects cache stat data, whereas pathlib does not. Whether that makes using scandir in Path.rglob impossible, I don't know. Ideally I'd like to see pathlib modified to use scandir (because otherwise there will always be people saying "use os.walk rather than pathlib, as it's faster"). Or, if that's not possible because of the difference in principle, then I'd like to see a clear discussion of the issue in the docs, including the recommended approach for people who want scandir performance *without* having to abandon pathlib for lower-level functions.

Paul
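One possible shape for such a "recommended approach", sketched under assumptions: walk the tree with `os.scandir` (usable as a context manager since Python 3.6) and convert each matching `DirEntry` to a `Path` only when yielding it. `scan_tree` is a hypothetical name, not part of pathlib, and `fnmatch` only approximates rglob's matching rules (ordering and symlink handling may differ between versions). It also shows the difference in principle mentioned above: the stat data cached on each `DirEntry` is discarded once it becomes a plain `Path`.

```python
import fnmatch
import os
import tempfile
from pathlib import Path
from typing import Iterator


def scan_tree(root: Path, pattern: str = "*") -> Iterator[Path]:
    """Hypothetical scandir-based stand-in for Path.rglob(pattern).

    Yields a Path for every entry below `root` whose name matches `pattern`.
    DirEntry.is_dir() can usually be answered from the directory listing
    itself, so most entries need no extra stat() system call.
    """
    pending = [root]  # explicit stack instead of recursion
    while pending:
        current = pending.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if fnmatch.fnmatch(entry.name, pattern):
                    yield Path(entry.path)
                # Like os.walk's default, do not descend into symlinked dirs.
                if entry.is_dir(follow_symlinks=False):
                    pending.append(Path(entry.path))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a" / "b").mkdir(parents=True)
        for name in ("top.txt", "a/mid.txt", "a/b/deep.txt", "a/b/skip.py"):
            (root / name).write_text("x")
        found = sorted(scan_tree(root, "*.txt"))
        print(found == sorted(root.rglob("*.txt")))  # True
```

Callers keep working with Path objects throughout; only the traversal itself drops down to os.scandir.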

On 10.04.2016 16:51, Paul Moore wrote:
On 10 April 2016 at 15:07, Sven R. Kunze <srkunze@mail.de> wrote:
If there's some agreement to change things with respect to those 5 points, I am willing to put some time into it.
In broad terms I agree with these points. Thanks for doing the research. It would certainly be good to try to improve pathlib based on this sort of feedback while it is still provisional.
I'd appreciate some guidance on this. Just let me know what I can do, since I don't know the process for hacking on CPython.
""" Path.rglob(pattern) Walk down a given path; a wrapper for "os.scandir"/"os.listdir". """
However, at least in 3.5, Path.rglob does *not* wrap scandir. There's a difference in principle, in that scandir (DirEntry) objects cache stat data, whereas pathlib does not. Whether that makes using scandir in Path.rglob impossible, I don't know. Ideally I'd like to see pathlib modified to use scandir (because otherwise there will always be people saying "use os.walk rather than pathlib, as it's faster"). Or, if that's not possible because of the difference in principle, then I'd like to see a clear discussion of the issue in the docs, including the recommended approach for people who want scandir performance *without* having to abandon pathlib for lower-level functions.
Good point. The proposed docstring was just meant to illustrate the functionality to the uninformed reader. People mostly trust the docs without digging deeper, but the docs should of course be accurate.

Best,
Sven

On Mon, 11 Apr 2016 at 13:40 Sven R. Kunze <srkunze@mail.de> wrote:
On 10.04.2016 16:51, Paul Moore wrote:
On 10 April 2016 at 15:07, Sven R. Kunze <srkunze@mail.de> wrote:
If there's some agreement to change things with respect to those 5 points, I am willing to put some time into it.
In broad terms I agree with these points. Thanks for doing the research. It would certainly be good to try to improve pathlib based on this sort of feedback while it is still provisional.
I'd appreciate some guidance on this. Just let me know what I can do, since I don't know the process for hacking on CPython.
https://docs.python.org/devguide/ and https://mail.python.org/mailman/listinfo/core-mentorship are your friends. :) For new features in a module, you can discuss them on python-ideas before proposing a patch if you're worried that a patch implementing the feature might get rejected and you don't want to risk wasting your time.

-Brett
Participants (3): Brett Cannon, Paul Moore, Sven R. Kunze