New PEP proposal -- Pathlib Module Should Contain All File Operations

Good day all,

as it seemed to be a good idea, I wrote a PEP proposal for pathlib to contain file operations. Here is the draft. What do you think about this?

BR,
George

---------------------------

PEP: 9999
Title: Pathlib Module Should Contain All File Operations
Author: George Fischhof <george at fischhof.hu>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 15-Mar-2018
Python-Version: 3.8
Post-History:

Abstract
========

This PEP proposes that the pathlib module become a centralized place for all file-system related operations.

Rationale
=========

Right now we have several modules that contain functions related to file-system operations, mainly os, pathlib and shutil. For beginners it is quite hard to remember where to find a function (copy resides in shutil, but the remove function can be found in the os module). (And sometimes developers with moderate experience have to check the documentation as well.)

After the release of version 3.6 several methods became aware of path-like objects. There are only a few which do not support path-like objects. After making these methods path-like object aware, these functions could be added to pathlib.

With the functions in pathlib, developers would not have to think about which method (function) can be found in which module.

It makes life easier.

Implementation
==============

For compatibility reasons pathlib should contain wrappers to the original functions. The original functions should remain at their original place. (Or if pathlib contains the function, the original modules should have a wrapper to it.)

Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:

On 12 March 2018 at 20:57, George Fischhof <george@fischhof.hu> wrote:
I don't know for certain what I think I feel about the idea - in general, it seems plausible. But I think you'll need to get down to specifics in the PEP, exactly what functions do you suggest get added to pathlib? Paul

path.py conveniently defines very many (most?) file operations as methods:

https://github.com/jaraco/path.py/blob/master/path.py
https://pathpy.readthedocs.io/en/latest/api.html

The division operator is mapped to os.path.join:

    Path('a') / '/root' == Path('/root')

On Monday, March 12, 2018, George Fischhof <george@fischhof.hu> wrote:

Here's a comparison table of os, os.path, shutil, pathlib, and path.py. The full version is at https://github.com/westurner/pyfilemods (README.rst) and at https://westurner.github.io/pyfilemods. I ran a few set intersections and went ahead and wrote a report to compare function/method signatures and sources. attr table ========== ================== == ======= ====== ======= ======= attr os os.path shutil pathlib path.py ================== == ======= ====== ======= ======= `__div__`_ X `__rdiv__`_ X `absolute`_ X `abspath`_ X X `access`_ X X `altsep`_ X X `anchor`_ X `as_posix`_ X `as_uri`_ X `atime`_ X `basename`_ X X `bytes`_ X `capitalize`_ X `casefold`_ X `cd`_ X `center`_ X `chdir`_ X X `chmod`_ X X X `chown`_ X X X `chroot`_ X X `chunks`_ X `commonpath`_ X `commonprefix`_ X `copy`_ X X `copy2`_ X X `copyfile`_ X X `copymode`_ X X `copystat`_ X X `copytree`_ X X `count`_ X `ctime`_ X `curdir`_ X X `cwd`_ X `defpath`_ X X `devnull`_ X X `dirname`_ X X `dirs`_ X `drive`_ X X `encode`_ X `endswith`_ X `exists`_ X X X `expand`_ X `expandtabs`_ X `expanduser`_ X X X `expandvars`_ X X `ext`_ X `extsep`_ X X `files`_ X `find`_ X `fnmatch`_ X X `format`_ X `format_map`_ X `get_owner`_ X `getatime`_ X X `getctime`_ X X `getcwd`_ X X `getmtime`_ X X `getsize`_ X X `glob`_ X X `group`_ X `home`_ X `in_place`_ X `index`_ X `is_absolute`_ X `is_block_device`_ X `is_char_device`_ X `is_dir`_ X `is_fifo`_ X `is_file`_ X `is_reserved`_ X `is_socket`_ X `is_symlink`_ X `isabs`_ X X `isalnum`_ X `isalpha`_ X `isdecimal`_ X `isdigit`_ X `isdir`_ X X `isfile`_ X X `isidentifier`_ X `islink`_ X X `islower`_ X `ismount`_ X X `isnumeric`_ X `isprintable`_ X `isspace`_ X `istitle`_ X `isupper`_ X `iterdir`_ X `join`_ X X `joinpath`_ X X `lchmod`_ X `lexists`_ X `lines`_ X `link`_ X X `listdir`_ X X `ljust`_ X `lower`_ X `lstat`_ X X X `lstrip`_ X `makedirs`_ X X `makedirs_p`_ X `maketrans`_ X `match`_ X `merge_tree`_ X `mkdir`_ X X X `mkdir_p`_ X `module`_ X `move`_ X X `mtime`_ X 
`name`_ X X X `namebase`_ X `normcase`_ X X `normpath`_ X X `open`_ X X X `os`_ X X `owner`_ X X `pardir`_ X X `parent`_ X X `parents`_ X `partition`_ X `parts`_ X `pathconf`_ X X `pathsep`_ X X `read_bytes`_ X `read_hash`_ X `read_hexhash`_ X `read_md5`_ X `read_text`_ X `readlink`_ X X `readlinkabs`_ X `realpath`_ X X `relative_to`_ X `relpath`_ X X `relpathto`_ X `remove`_ X X `remove_p`_ X `removedirs`_ X X `removedirs_p`_ X `rename`_ X X X `renames`_ X X `replace`_ X X X `resolve`_ X `rfind`_ X `rglob`_ X `rindex`_ X `rjust`_ X `rmdir`_ X X X `rmdir_p`_ X `rmtree`_ X X `rmtree_p`_ X `root`_ X `rpartition`_ X `rsplit`_ X `rstrip`_ X `samefile`_ X X X `sameopenfile`_ X `samestat`_ X `sep`_ X X `size`_ X `special`_ X `split`_ X X `splitall`_ X `splitdrive`_ X X `splitext`_ X X `splitlines`_ X `splitpath`_ X `splitunc`_ X `startswith`_ X `stat`_ X X X X X `statvfs`_ X X `stem`_ X X `strip`_ X `stripext`_ X `suffix`_ X `suffixes`_ X `swapcase`_ X `symlink`_ X X `symlink_to`_ X `text`_ X `title`_ X `touch`_ X X `translate`_ X `uncshare`_ X `unlink`_ X X X `unlink_p`_ X `upper`_ X `using_module`_ X `utime`_ X X `walk`_ X X `walkdirs`_ X `walkfiles`_ X `with_name`_ X `with_suffix`_ X X `write_bytes`_ X X `write_lines`_ X `write_text`_ X X `zfill`_ X ================== == ======= ====== ======= ======= On Wed, Mar 14, 2018 at 2:22 PM, Wes Turner <wes.turner@gmail.com> wrote:

I added trio to the comparison table (Things are mostly just async-wrapped, though pathlib_not_trio does show a few missing methods?). https://github.com/westurner/pyfilemods/issues/2 https://github.com/westurner/pyfilemods/blob/master/README.rst#attr-table ================== == ======= ====== ======= ======= ==== attr os os.path shutil pathlib path.py trio ================== == ======= ====== ======= ======= ==== `__div__`_ X `__rdiv__`_ X `absolute`_ X X `abspath`_ X X `access`_ X X `altsep`_ X X `anchor`_ X `as_posix`_ X X `as_uri`_ X X `atime`_ X `basename`_ X X `bytes`_ X `capitalize`_ X `casefold`_ X `cd`_ X `center`_ X `chdir`_ X X `chmod`_ X X X X `chown`_ X X X `chroot`_ X X `chunks`_ X `commonpath`_ X `commonprefix`_ X `copy`_ X X `copy2`_ X X `copyfile`_ X X `copymode`_ X X `copystat`_ X X `copytree`_ X X `count`_ X `ctime`_ X `curdir`_ X X `cwd`_ X X `defpath`_ X X `devnull`_ X X `dirname`_ X X `dirs`_ X `drive`_ X X `encode`_ X `endswith`_ X `exists`_ X X X X `expand`_ X `expandtabs`_ X `expanduser`_ X X X X `expandvars`_ X X `ext`_ X `extsep`_ X X `files`_ X `find`_ X `fnmatch`_ X X `format`_ X `format_map`_ X `get_owner`_ X `getatime`_ X X `getctime`_ X X `getcwd`_ X X `getmtime`_ X X `getsize`_ X X `glob`_ X X X `group`_ X X `home`_ X X `in_place`_ X `index`_ X `is_absolute`_ X X `is_block_device`_ X X `is_char_device`_ X X `is_dir`_ X X `is_fifo`_ X X `is_file`_ X X `is_reserved`_ X X `is_socket`_ X X `is_symlink`_ X X `isabs`_ X X `isalnum`_ X `isalpha`_ X `isdecimal`_ X `isdigit`_ X `isdir`_ X X `isfile`_ X X `isidentifier`_ X `islink`_ X X `islower`_ X `ismount`_ X X `isnumeric`_ X `isprintable`_ X `isspace`_ X `istitle`_ X `isupper`_ X `iterdir`_ X X `join`_ X X `joinpath`_ X X X `lchmod`_ X X `lexists`_ X `lines`_ X `link`_ X X `listdir`_ X X `ljust`_ X `lower`_ X `lstat`_ X X X X `lstrip`_ X `makedirs`_ X X `makedirs_p`_ X `maketrans`_ X `match`_ X X `merge_tree`_ X `mkdir`_ X X X X `mkdir_p`_ X `module`_ X `move`_ X X `mtime`_ X `name`_ X X 
X `namebase`_ X `normcase`_ X X `normpath`_ X X `open`_ X X X X `os`_ X X `owner`_ X X X `pardir`_ X X `parent`_ X X `parents`_ X `partition`_ X `parts`_ X `pathconf`_ X X `pathsep`_ X X `read_bytes`_ X X `read_hash`_ X `read_hexhash`_ X `read_md5`_ X `read_text`_ X X `readlink`_ X X `readlinkabs`_ X `realpath`_ X X `relative_to`_ X X `relpath`_ X X `relpathto`_ X `remove`_ X X `remove_p`_ X `removedirs`_ X X `removedirs_p`_ X `rename`_ X X X X `renames`_ X X `replace`_ X X X X `resolve`_ X X `rfind`_ X `rglob`_ X X `rindex`_ X `rjust`_ X `rmdir`_ X X X X `rmdir_p`_ X `rmtree`_ X X `rmtree_p`_ X `root`_ X `rpartition`_ X `rsplit`_ X `rstrip`_ X `samefile`_ X X X X `sameopenfile`_ X `samestat`_ X `sep`_ X X `size`_ X `special`_ X `split`_ X X `splitall`_ X `splitdrive`_ X X `splitext`_ X X `splitlines`_ X `splitpath`_ X `splitunc`_ X `startswith`_ X `stat`_ X X X X X X `statvfs`_ X X `stem`_ X X `strip`_ X `stripext`_ X `suffix`_ X `suffixes`_ X `swapcase`_ X `symlink`_ X X `symlink_to`_ X X `text`_ X `title`_ X `touch`_ X X X `translate`_ X `uncshare`_ X `unlink`_ X X X X `unlink_p`_ X `upper`_ X `using_module`_ X `utime`_ X X `walk`_ X X `walkdirs`_ X `walkfiles`_ X `with_name`_ X X `with_suffix`_ X X X `write_bytes`_ X X X `write_lines`_ X `write_text`_ X X X `zfill`_ X ================== == ======= ====== ======= ======= ==== On Mon, Mar 19, 2018 at 5:23 AM, Wes Turner <wes.turner@gmail.com> wrote:

On Tue, Mar 20, 2018 at 1:03 AM, Wes Turner <wes.turner@gmail.com> wrote:
trio.Path is an automatically generated, exact mirror of pathlib.Path, so I don't think it's very useful to have in your table? Also the missing attributes are actually handled via __getattr__, so they aren't actually missing, they're just invisible to your detection mechanism :-)

In [21]: trio.Path("/a/b").anchor
Out[21]: '/'

In [22]: trio.Path("/a/b").name
Out[22]: 'b'

-n

--
Nathaniel J. Smith -- https://vorpus.org
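To illustrate why attributes forwarded through __getattr__ are invisible to a dir()-based comparison, here is a minimal sketch. ForwardingPath is a hypothetical stand-in written for this note; it is not trio's actual implementation, only the same general technique.

```python
from pathlib import PurePosixPath


class ForwardingPath:
    """Hypothetical wrapper; NOT trio's real code, just the same idea."""

    def __init__(self, *args):
        self._wrapped = PurePosixPath(*args)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so anything not
        # defined on the wrapper itself is looked up on the wrapped path.
        return getattr(self._wrapped, name)


p = ForwardingPath("/a/b")
print(p.anchor, p.name)        # forwarded attributes work
print("anchor" in dir(p))      # but dir() does not list them
print(hasattr(p, "anchor"))    # while hasattr() does find them
```

A detection script built on dir() or on class __dict__ inspection would therefore report such attributes as missing, whereas one built on hasattr() would not.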

I thought that because of __fspath__ conversions pathlib can be used with any function that needs a path. So long as shutil is pathlib friendly, isn't this a solved problem? No need to extend pathlib for every file op, just pass the existing file ops a Path. Wasn't this discussed at the time fspath was added? Barry

On Mon, Mar 12, 2018 at 09:57:32PM +0100, George Fischhof wrote:
This is certainly a problem. Not a big problem, but it is an annoyance.
I don't know that this will be true. It makes one problem better: you no longer have to remember which module the function is in. But it makes other things worse:

- code and/or API duplication: for backwards compatibility, every existing function must be in two places, the original and in pathlib;

- if new file functions are added, they will go only in pathlib, which makes pathlib effectively mandatory;

- the pathlib API becomes even more complicated: not only have you got all the methods of pathlib objects, but you have all the shutil and os functions as well.

I think this is a good place for an experiment. You could write a function which monkey-patches pathlib:

    from pathlib import Path
    import os
    import shutil

    def monkeypatch():
        Path.remove = os.remove
        # etc.

Then we can see how many functions are involved, how large this makes the Path object, and try it out and see whether it is better.

--
Steve
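A runnable variant of the experiment suggested above might look like the following sketch. One detail matters in practice: functions implemented in C, such as os.remove, do not bind `self` when stored on a class, so a direct assignment would call os.remove() with no argument. The sketch therefore uses small Python wrappers. The method names chosen here (remove, copyfile, rmtree) are illustrative assumptions, not part of any proposal.

```python
import os
import shutil
import tempfile
from pathlib import Path


def monkeypatch():
    """Attach a few os/shutil functions to Path as methods (experiment only)."""

    # Plain Python functions bind `self` when stored on a class;
    # C-implemented functions such as os.remove do not, hence the wrappers.
    def remove(self):
        os.remove(self)

    def copyfile(self, dst):
        return Path(shutil.copyfile(self, dst))

    def rmtree(self):
        shutil.rmtree(self)

    added = {"remove": remove, "copyfile": copyfile, "rmtree": rmtree}
    for name, func in added.items():
        setattr(Path, name, func)
    return sorted(added)


new_names = monkeypatch()
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "a.txt"
    src.write_text("hello")
    dst = src.copyfile(Path(tmp) / "b.txt")
    src.remove()
    copied_text = dst.read_text()
    src_exists = src.exists()

print(new_names, copied_text, src_exists)
```

Counting the entries in `added` once the full list is filled in would give the number of functions involved, which is what the experiment is meant to measure.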

2018-03-13 13:17 GMT+01:00 Steven D'Aprano <steve@pearwood.info>:
Duplication: it is true, but it is true for several other modules as well. I checked the pathlib module: right now more than 50% of the functions are duplicates - mainly from os - so it seems that pathlib already started to develop this way ;-)

"if new file functions are added, they will go only in pathlib, which makes pathlib effectively mandatory;"

Yes, but I think this is part of the evolution: slowly everyone will shift to pathlib, and being mandatory is true for the current status as well: if you need a function, you need the module. Right now if you want to execute some file operations, you need os plus shutil, because half of the functions are in one of them and the other half is in the other module.

I collected the functions that should be put into pathlib:

- os.remove
- os.removedirs (shutil.rmtree has the same functionality)
- os.truncate
- shutil.copyfileobj
- shutil.copyfile
- shutil.copymode
- shutil.copystat
- shutil.copy
- shutil.copy2
- shutil.copytree with shutil.ignore_patterns
- shutil.move
- shutil.disk_usage
- shutil.chown
- os.link => path.hardlink_to
- os.mkfifo
- os.readlink

Sum: 16 functions

And all functions from the os module accept path-like objects, and none of the shutil functions. Pathlib already contains 17 functions from the os and shutil modules.

George

On Fri, Mar 16, 2018 at 12:38 AM, George Fischhof <george@fischhof.hu> wrote:
The os module is cheap; pathlib has a definite cost. If every file operation goes through pathlib, that basically means pathlib becomes part of the startup cost:

rosuav@sikorsky:~$ python3 -m timeit -s 'import subprocess, sys, pathlib' 'subprocess.check_call([sys.executable, "-c", "import os"])'
50 loops, best of 5: 8.82 msec per loop
rosuav@sikorsky:~$ python3 -m timeit -s 'import subprocess, sys, pathlib' 'subprocess.check_call([sys.executable, "-c", "import pathlib"])'
20 loops, best of 5: 15.9 msec per loop
rosuav@sikorsky:~$ python3.6 -m timeit -s 'import subprocess, sys, pathlib' 'subprocess.check_call([sys.executable, "-c", "import os"])'
100 loops, best of 3: 14.1 msec per loop
rosuav@sikorsky:~$ python3.6 -m timeit -s 'import subprocess, sys, pathlib' 'subprocess.check_call([sys.executable, "-c", "import pathlib"])'
10 loops, best of 3: 19.7 msec per loop
rosuav@sikorsky:~$ python3.5 -m timeit -s 'import subprocess, sys, pathlib' 'subprocess.check_call([sys.executable, "-c", "import os"])'
100 loops, best of 3: 10.6 msec per loop
rosuav@sikorsky:~$ python3.5 -m timeit -s 'import subprocess, sys, pathlib' 'subprocess.check_call([sys.executable, "-c", "import pathlib"])'
100 loops, best of 3: 18.7 msec per loop

And this is with warm caches; for a true first-time startup, the cost could be significantly higher.

ChrisA
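The same measurement can be reproduced from a script rather than from the shell. The following is a rough sketch of that idea; the helper name startup_seconds and the repeat count are assumptions of this note, and absolute numbers will differ by machine and Python version.

```python
import subprocess
import sys
import time


def startup_seconds(statement, repeat=5):
    """Best-of-`repeat` wall time for a fresh interpreter running `statement`."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        subprocess.check_call([sys.executable, "-c", statement])
        best = min(best, time.perf_counter() - start)
    return best


baseline = startup_seconds("import os")
with_pathlib = startup_seconds("import pathlib")
print(f"import os:      {baseline * 1000:.1f} msec")
print(f"import pathlib: {with_pathlib * 1000:.1f} msec")
```

On Python 3.7 and later, running `python -X importtime -c "import pathlib"` gives a per-module breakdown of where the import time goes.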

On Fri, Mar 16, 2018 at 4:48 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
3.8, but I also tested with 3.7, 3.6, and 3.5. All of them showed the same kind of difference, but the numbers were a bit higher in the older versions. I fully expect that Python 3.4 (the first one where pathlib existed) would show the same too. ChrisA

If it is desirable to have pathlib used to represent paths that do not map directly to the filesystem, then it might be an acceptable compromise to have yet another... package that just imports os, pathlib, shutil etc and re-exports all relevant functions. I mean, we are talking about convenience here anyway, not new functionality, and this would enable 1-import convenience that gives you all-you-would-ever-want, but at the cost of maybe a bit more startup time if you only needed 1 of the 3 modules that do actual work. And it would avoid polluting pathlib with things that are... too useful in the real world :P

Joonas Liik writes:
then it might be an acceptable compromise to have yet another...
"There should be one-- and preferably only one -- obvious way to do it." The obvious way is to use the existing stdlib modules. So....
package that just imports os, pathlib, shutil etc and re-exports all relevant functions.
Anybody wanting this can easily do a better job than the stdlib ever can do -- by writing a package including all the modules they frequently use and only re-exporting those names that they use, perhaps with shorter or personally mnemonic aliases (Windows vs. Unix nomenclature for many shell utilities, for example -- but watch those builtins like "dir"!) Of course, this is horrible programming practice, making for burdens on reviewers and maintainers. I see the convenience for writing one-off scripts and perhaps a personal library, but I don't see it as a core function of the language and its standard library. So, -1 in the stdlib, and +0 for a base module on PyPI that demonstrates the principle and individual users could modify to personal taste. Steve
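As a sketch of the kind of personal re-export layer described above, the following collects a handful of functions under one namespace. The name `fileops` and the selection of functions are hypothetical choices for illustration; in real use this would be a small module file rather than a SimpleNamespace.

```python
import os
import shutil
import tempfile
import types
from pathlib import Path

# Hypothetical personal namespace: only the names one actually uses,
# optionally under shorter or more memorable aliases.
fileops = types.SimpleNamespace(
    Path=Path,
    makedirs=os.makedirs,
    remove=os.remove,
    copy=shutil.copy2,
    rmtree=shutil.rmtree,
)

with tempfile.TemporaryDirectory() as tmp:
    work = fileops.Path(tmp) / "work"
    fileops.makedirs(work)
    (work / "a.txt").write_text("data")
    fileops.copy(work / "a.txt", work / "b.txt")
    fileops.remove(work / "a.txt")
    left = sorted(p.name for p in work.iterdir())
    fileops.rmtree(work)

print(left)
```

The aliases are the same function objects as the originals, so nothing is duplicated; the cost is that readers of the code have to learn the personal names.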

2018-03-17 7:18 GMT+01:00 Stephen J. Turnbull < turnbull.stephen.fw@u.tsukuba.ac.jp>:
Hi Steve, as I wrote previously, pathlib already contains 17 functions from os and shutil. It seems that the original idea was something like my idea. It is just not finished yet. With the remaining functions added it would become a nice and whole thing. George

George Fischhof writes:
It seems that the original idea was something like my idea. It is just not finished yet,
Antoine (author and maintainer of pathlib) is not the kind of developer who leaves things unfinished. In PEP 428, there's a hint that some shutil functionality could be added, but I really don't think he meant anything as broad as in your PEP.

As far as I can recall, pathlib is intended from the beginning to (1) represent paths in hierarchical local filesystems as Paths, (2) manipulate individual Paths in various ways consistent with the semantics of a hierarchical filesystem, and (3) offer various ways to access the object denoted by a single Path. Its functionality is very complete as far as that goes.

It does not contain methods to (4) operate on directories as collections (with the exception of the iterdir, glob, and rglob methods, which expose directory contents as iterators of Paths), (5) perform operations on several objects denoted by Paths at once (copy and its multiple operand variants), (6) perform process control or access process characteristics, (7) perform operations (eg, mounting partitions and flow control on TTYs) on devices (block or character), even if they can be accessed via paths in some filesystem as in POSIX, or (8) deal with users and other specialized OS objects.

I conclude there never was any intention to overlap with os or shutil, except to the extent that they provide for any kind of path manipulation. Rather, I suppose the intent was to provide a substitute for os.path with a more convenient, complete, object-oriented API and consistent semantics, based on more than a decade of experience with os.path.

Regards,
Steve

On Sat, Mar 17, 2018 at 10:15 AM, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
(5) perform operations on several objects denoted by Paths at once (copy and its multiple operand variants),
Sure it does: Path.rename and Path.replace. I know why rename and copy have historically been in separate modules, but the distinction is pretty arcane and matters a lot more to implementers than it does to users. Similarly, it's hard to explain why we have Path.mkdir but not Path.makedirs -- and these have historically both lived in the 'os' module, so we can't blame it on Path being a mirror of os.path. It's also not obvious why we should have Path.rmdir, but not Path.rmtree. My understanding is that the point of Path is to be a convenient, pleasant-to-use mechanism for accessing common filesystem operations. And it does a pretty excellent job of that. But it seems obvious to me that it's still missing a number of fairly basic operations that people need all the time. I don't think the PEP is there yet, and we can quibble over the details -- just copying over all the historical decisions in shutil isn't obviously the right move (maybe it should be Path.mkdir(include_parents=True) and Path.unlink(recursive=True) instead of Path.makedirs and Path.rmtree?), but there's definitely room for improvement. -n -- Nathaniel J. Smith -- https://vorpus.org
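For reference, part of this gap is already covered: Path.mkdir accepts parents=True (and exist_ok=True since Python 3.5), which corresponds to os.makedirs. A recursive delete still goes through shutil.rmtree, which accepts a Path on Python 3.6 and later. A minimal sketch using only those existing calls:

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / "a" / "b" / "c"

    # Equivalent of os.makedirs(nested, exist_ok=True).
    nested.mkdir(parents=True, exist_ok=True)
    nested.mkdir(parents=True, exist_ok=True)  # second call is a no-op
    created = nested.is_dir()

    # No Path method removes a non-empty tree here; shutil.rmtree does.
    (nested / "data.txt").write_text("x")
    shutil.rmtree(Path(tmp) / "a")
    removed = not (Path(tmp) / "a").exists()

print(created, removed)
```

So the open API question in the message above is mainly about the recursive-removal side and about naming, not about creating parent directories.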

On 18 March 2018 at 04:41, Nathaniel Smith <njs@pobox.com> wrote:
IMO, the pathlib module (just) defines Path. So I'm -1 on adding anything to pathlib that isn't a method of a Path object. Beyond that, I agree with you that Path should be a convenient interface for filesystem path objects. I haven't personally found that there's much missing that I've needed, but I agree that there are some gaps from a theoretical point of view, and adding methods to fill those gaps could be justifiable. OTOH, the fspath protocol was explicitly designed so that standalone functions (such as the ones in os and shutil) can work cleanly with Path objects - so there's a strong argument that "not everything needs to be a method" applies here. For example, while there isn't a Path.makedirs(), what's so bad about os.makedirs(Path)? (There's consistency and discoverability arguments, but they are not what I'd call compelling on their own).
I agree that there are some potential candidates for "useful additional methods for Path objects", but I'd like to see these discussed on a case by case basis, much like you do here, rather than as a blanket "if it's in some other module and it works on paths, it should be in pathlib". My biggest problem with the proposal as it stands is that it makes no attempt to justify the suggestions on a case by case basis (the first version wasn't even explicit in the functions it was proposing!) but argues from a pure "lump everything together" standpoint. Paul
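The fspath protocol mentioned above can be sketched as follows: any object with an __fspath__ method is accepted by os functions, and pathlib.Path is one such object. ProjectDir is a hypothetical class invented for this illustration.

```python
import os
import tempfile
from pathlib import Path


class ProjectDir:
    """Hypothetical path-like class; the protocol only requires __fspath__."""

    def __init__(self, root, name):
        self.root = root
        self.name = name

    def __fspath__(self):
        return os.path.join(self.root, self.name)


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(Path(tmp) / "x" / "y")    # a Path passed to a standalone function
    os.makedirs(ProjectDir(tmp, "proj"))  # any path-like object works the same way
    made = sorted(os.listdir(tmp))

as_str = os.fspath(Path("a") / "b")       # explicit conversion to str
print(made, as_str)
```

This is the "not everything needs to be a method" position in code: the function stays in os, and the Path is simply an argument.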

Maybe this is obvious or I am missing something crucial, but I'm surprised that this hasn't been discussed yet: than the low-level module. It seems natural that the high-level module should simply use the low-level module to do the file operations, and just provide a nice (probably object-oriented) interface for those methods.

In python, `os` and `shutil` are currently the low-level modules, and it stands to reason that we might consider combining these somehow (although I'm assuming that there was a good reason not to in the first place, which is why they both exist, but I haven't looked into it). And `pathlib` is currently the "high-level" module.

I see two problems currently:

1) The low-level module is split in half (for example, operations for copying are contained in `shutil` and operations for removing are contained in `os`). This is a bit annoying for the user, but it's not game-breaking. It does, however, make python feel a bit unnatural in this context, and that's even more unusual because normally python feels very natural. So this becomes sort of a "huh this feels weird" situation.

2) The `pathlib` module only provides a high-level interface for working with _single_ Path objects. There isn't really functionality to work with multiple Path objects (as pointed out by someone previously if I am understanding correctly).

I don't think the current PEP under consideration adequately solves either of these problems. Currently, it seems like it's trying to make `pathlib` both a high- and low-level module, which imo doesn't make sense. But I do think we need, if not a single low-level module, at least a high-level module that makes it unnecessary to use the low-level modules. That means that `pathlib` needs more functionality added to it, which is similar in spirit to the current PEP proposal.

- Jason, a reader

On Sun, Mar 18, 2018 at 9:46 AM, Paul Moore <p.f.moore@gmail.com> wrote:

Gotcha, thank you! shutil being a high level library complicates things... So we have two "high-level" libraries (pathlib and shutil) and both of them provide different pieces of useful functionality. Maybe I am starting to see why this is complicated. Thanks for reading my above reply and taking the time to respond.

Hi Jason, os and shutil ended up in this state because of the C functions in the implementation (I got a similar answer before) ... What do you think would be a good way to solve this?

- add stuff from os to shutil
- add stuff from os and shutil to pathlib
- create a new module on top of os, shutil and pathlib (we could name it, for example, filelib)

George

2018-03-18 19:43 GMT+01:00 Jason Maldonis <jjmaldonis@gmail.com>:

On Sun, Mar 18, 2018 at 4:16 PM, George Fischhof <george@fischhof.hu> wrote:
But...why? It's not clear to me as to how this would actually, you know, *solve* anything.
-- Ryan (ライアン) Yoko Shimomura, ryo (supercell/EGOIST), Hiroyuki Sawano >> everyone else https://refi64.com/

On Sun, Mar 18, 2018 at 4:16 PM, George Fischhof <george@fischhof.hu> wrote:
I'm pretty ignorant of the situation so my opinion probably isn't worth very much, but I'll give it a try ;)

If shutil is functioning as a high-level library, it makes sense to me to add the stuff from os to shutil. Then users can simply import shutil and know that they have all the functionality they need to work with files. Someone said that pathlib's Paths work with the os module but not with shutil -- if os is the low-level module and shutil is the high-level, that seems a bit strange but I'm sure it made sense to do that.

It seems to me that adding os functionality to shutil to make shutil "complete", and also modifying shutil functions to work with Path objects makes the most sense. Once that's done, shutil will be _the_ high-level library for using file operations in python, with the option to use Path objects if you want (side note: I'm a big fan of the Path class!). After that, it may make sense to add some (maybe all) of the methods in shutil to the pathlib module similar to how Path already implements some file operations (e.g. Path.glob and Path.unlink).

All of that seems like a three step process to me:

1) Add the missing methods from os to shutil
2) Make the shutil methods work nicely with Path objects
3) Reevaluate and see what methods make sense to add to the pathlib module

As a user, I'd be thrilled with that setup.

Jason Maldonis writes:
So we have two "high-level" libraries (pathlib and shutil)
pathlib is currently "low-level" as I understand the word. The only complex things it does are resolving and globbing, which are reasonable things to do with a Path's target object. Everything else is either a formal Path manipulation, or acts on a single object that is the target of a Path.

On Sun, Mar 18, 2018 at 10:53 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
The functions in os are thin wrappers around system calls,
exactly -- and this is a very old legacy from way back. Modern Python users should not have to concern themselves with whether something they want to do is essentially a system call or a higher-level process. Typical users, and certainly newbies, think "I want to do this or that with the filesystem", and it would be really nice if there was one way, and one place to do that.

The old os vs shutil was annoying enough, then we got pathlib with very little support in the stdlib, which was really annoying. Now we finally have pathlib support in most of the stdlib, so I can really tell people that they can use Paths, rather than strings for paths -- great! But yes, the job is not yet finished, because we still have to go find _some_ functionality in os or shutil.

Yes, it seems like duplication, but that decision was made when pathlib was added. I do think we should not simply move everything, but rather work out each case -- and I like Nathaniel's idea of simplifying / cleaning up the API a bit while we are at it. (please don't have an "unlink"!).

-CHB

PS: does shutil really still not work with Path objects? aarrgg!

PPS: someone made a comment about "having to update every book about python" -- so I'll repeat: that decision was made when pathlib was added.

--
Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Mon, 19 Mar 2018 at 18:08 Chris Barker <chris.barker@noaa.gov> wrote:
[SNIP] PS: does shutil really still not work with Path objects? aarrgg!
Did you verify this or are you just guessing? If this is true then file a bug and optionally submit a patch. Saying "aarrgg" doesn't fix the situation nor motivate people to help out, especially when it sounds like you're not even sure yourself that it's even a problem.
PPS: someone made a comment about "having to update every book about python" -- so I'll repeat: that decision was made when pathlib was added.
That was me and yes, we had to have updates made because we thought pathlib was worth it. At this point the PEP as proposed has not made the case that what it wants to add is worth it, because right now it's just a huge bullet list of functions from two modules with no specific motivation behind the individual changes. The whole point of my comment is to say "we can't make changes just because; changes have to meet a certain bar of improvement" and at this point the PEP has not proven there is such an improvement based on what has been proposed. IOW the core devs I have seen comment on this have pretty much all said "justify the individual methods" and yet no one has done that yet, so any discussion other than trying to meet that need is not helping to move anything forward.

On Tue, Mar 20, 2018 at 4:23 PM Brett Cannon <brett@python.org> wrote:
My intent was, and still is, to encourage just that. And the rest of my message did say that ( I think). We all have only so many roundtoits to spend on this — so I won’t be writing that PEP, but I do think it’s worthwhile to encourage the OP ( or anyone else ) to do so. Writing a PEP is a lot of work, one wants to know it has a chance of being accepted. Python really could be improved in this regard — it’s had a confusing API for file system manipulations forever. Recent changes have helped, but it would be nice to get all the way there. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Tuesday, 20 March 2018 16:22:57 GMT Brett Cannon wrote:
Checking this on 3.6.4 on Fedora, it looks like the shutil module works well with Path(). There was one bit of code that I thought might not work, but it is not a problem as you cannot make a Path(b'filename'). Maybe the simplest change would be to add to the pathlib docs a notice at the top that says to look in shutil and os for the file operations, and also to show some examples of Path() working with shutil to do some typical operations. Barry
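The kind of documentation example suggested above could look like this sketch, which passes Path objects directly to a few shutil functions and also shows that a Path cannot be built from bytes. The file and directory names are arbitrary.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    src = root / "src"
    src.mkdir()
    (src / "notes.txt").write_text("hello")

    # shutil functions take Path objects directly (Python 3.6+).
    shutil.copy(src / "notes.txt", root / "copy.txt")
    shutil.copytree(src, root / "backup")
    shutil.rmtree(src)

    remaining = sorted(p.name for p in root.iterdir())

# A Path cannot be constructed from bytes, so bytes paths never reach shutil
# through pathlib.
try:
    Path(b"filename")
    bytes_rejected = False
except TypeError:
    bytes_rejected = True

print(remaining, bytes_rejected)
```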

Nathaniel Smith writes:
I was very careful about the semantics. Those are *single objects* denoted by multiple Paths (at different times). You could argue multiple objects for Path.replace, but I consider maybe removing the reference to the original target of the new Path to be an edge case that you need to address if you have Path.rename.
Similarly, it's hard to explain why we have Path.mkdir but not Path.makedirs
So what? Let's fix that. As you propose:
(maybe it should be Path.mkdir(include_parents=True)
is fine, although that default seems a little risky vs. typos. I know I have confused myself with mkdir -p that way a few (though very few) times. Perhaps Guido would prefer Path.makedirs for this functionality.
and Path.unlink(recursive=True)
I dislike that API, to be honest (at least two interpretations: rmtree and remove_empty_directories=True). I would definitely call the more destructive operation Path.rmtree.
but there's definitely room for improvement.
I didn't deny that. All I argued was that, no, it really seems unlikely to me that Antoine intended pathlib to become Emacs. And I am against the PEP in its current form where it clearly intends to incorporate practically everything in os (dealing with filesystem objects) and shutil. Those APIs are not clean.

I also feel that before we do anything but the minor filling-in exercises discussed explicitly above, we should see if we can add URIPath conforming to RFC 3986 and RFC 3987. Echoing Antoine's misgivings, I'm dubious about that, though, because Antoine implemented the concrete "realpath" semantics (resolve links before ..) in pathlib, while RFC 3986 specifies formal path manipulation semantics to prevent traversal above DocumentRoot and similar exploits. In web programming URIPath to Path conversions, and vice versa, will be very common, but I suspect at least one direction will be fragile because conversion and canonicalize won't commute. I hope I'm wrong![1]

Well we know where we're going
But we don't know where we've been
And we know what we're knowing
But we can't say what we've seen
We're on a Path to nowhere
Come on inside
Taking that ride to nowhere
We'll take that ride :-)

Footnotes:
[1] Perhaps that can be fixed by recommending a single composed operation that is safe. Maybe the simple conversion operations themselves can be "private" methods.
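The difference between formal (purely lexical) ".." handling and the concrete "resolve links before .." semantics can be seen in a small sketch. It assumes a platform where the process may create symlinks (typically POSIX); the directory names are arbitrary.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "real" / "sub").mkdir(parents=True)
    (root / "link").symlink_to(root / "real" / "sub")

    p = root / "link" / ".."          # pathlib keeps the ".." component as written

    # Formal manipulation: ".." cancels the preceding component textually.
    lexical = Path(os.path.normpath(p))

    # Concrete semantics: the symlink is followed first, then ".." is applied.
    concrete = p.resolve()

    lexical_is_root = lexical == root
    concrete_is_real = concrete == root / "real"

print(lexical_is_root, concrete_is_real)
```

The two results name different directories for the same input, which is the sense in which conversion and canonicalization may fail to commute.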

On 16 March 2018 at 03:15, Chris Angelico <rosuav@gmail.com> wrote:
Keep in mind that the `os` layer will never go away: `pathlib` still needs a lower level API to call to *do the work* of actually interacting with the underlying operating system APIs (e.g. this is why we added os.scandir). A similar situation applies when it comes to glob, fnmatch, etc. Even `shutil` will likely retain its place as a lower level procedural API behind pathlib's object-oriented facade, since raw strings are still frequently going to be easier to work with when mixing and matching Python code and native operating system shell code. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Hi Folks, it seems to me that the reception of this proposal is more positive than negative. Of course several details could be put into it, but I think it would be better to let the developers decide the details, because they know the environment and the possibilities. The name of the functions and the method they solve the problem (for example rmdir(tree=True) instead of removedirs()) is all the same. The (main) goal would be that file and directory operations reside in one module. And currently pathlib seems to be the best candidate. (We could put them into a brand-new module, but it would be just another duplication.) So what do you think: is this proposal PEPable, or should I do something with it to achieve the PEPable state? BR, George 2018-03-18 9:05 GMT+01:00 Nick Coghlan <ncoghlan@gmail.com>:

I think that is up for debate.
Of course several details could be put into it, but I think it would be better to let the developers decide the details, because they know the environment and the possibilities.
You mean you have no intention of doing the implementation? If not you, who is willing to do the not inconsiderable work?
The name of the functions and the method they solve the problem (for example rmdir(tree=True) instead of removedirs()) is all the same.
But it is not the same. os.removedirs only removes empty dirs, whereas shutil.rmtree will remove files and dirs.
The (main) goal would be that file and directory operations reside in one module.
As I said earlier, the question is: should that module be pathlib? There are good arguments on both sides. Barry
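
A minimal sketch of the difference between the two removal functions, run inside a throwaway temporary directory. The file and directory names are made up for illustration, and the sentinel file is only there to stop os.removedirs from pruning the temporary directory itself.

```python
import os
import shutil
import tempfile
from pathlib import Path


def demo():
    base = Path(tempfile.mkdtemp())
    try:
        # A file in base keeps os.removedirs from pruning base itself.
        (base / "keep.txt").write_text("sentinel")

        empty_leaf = base / "a" / "b"
        empty_leaf.mkdir(parents=True)
        os.removedirs(empty_leaf)  # removes b, then the now-empty a
        pruned = not (base / "a").exists()

        full_leaf = base / "c" / "d"
        full_leaf.mkdir(parents=True)
        (full_leaf / "data.txt").write_text("x")
        try:
            os.removedirs(full_leaf)  # the leaf is not empty
            refused = False
        except OSError:
            refused = True

        shutil.rmtree(base / "c")  # deletes files and directories alike
        deleted = not (base / "c").exists()
        return pruned, refused, deleted
    finally:
        shutil.rmtree(base, ignore_errors=True)
```

So a single rmdir(tree=True) spelling would have to pick one of these two behaviours, and they are not interchangeable.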

On Sun, Mar 18, 2018 at 4:58 AM, George Fischhof <george@fischhof.hu> wrote:
That's not how PEPs work :-). Someone has to do the work of collating contradictory feedback and making opinionated design proposals, and the person who does that is called the PEP author. In this case, I'd also suggest framing the PEP as a list of specific things that should be added to pathlib.Path, with justifications for each. If your argument is "X should be in pathlib because it's in some other module", then that's not very compelling -- by definition it already means we have an X, so why do we need another? I think for a number of these cases there actually is a good answer to that question, but your PEP has to actually provide that answer :-). -n -- Nathaniel J. Smith -- https://vorpus.org

On Sun, 18 Mar 2018 at 20:37 Nathaniel Smith <njs@pobox.com> wrote:
And just to make it super-clear, the advice Nathaniel has provided will be required to be met before any of the PEP editors accept this proposal. IOW a bullet list will not suffice and you will need clear justification for every change you make in order to explain why every book on Python will need to be updated due to this PEP. :)

On 18/03/18 11:58, George Fischhof wrote:
it seems to me that the reception of this proposal is more positive than negative.
I think you may have observer bias :-) As far as I am concerned you have yet to make a convincing case that there is a problem, never mind that your solution is appropriate. Your solution also isn't detailed enough, as several people have pointed out. -- Rhodri James *-* Kynesim Ltd

On 3/19/2018 11:31 AM, Rhodri James wrote:
Or, as often happens, George is making too much of a biased sample of opinions -- those who care enough to respond, *given what has also been said.* Python-ideas works best as a place to evaluate ideas and develop alternatives. Vote counting is premature. Proposers *must* be flexible and not defend their initial solution like the Alamo.
I am convinced that *some* people, especially but not limited to newbies, find the current situation confusing and less than optimal. I am also pretty convinced that the idea of dumping a copy of everything into pathlib is the wrong solution. I mostly agree with the latest posts from Nathaniel Smith and Stephen Turnbull. -- Terry Jan Reedy

On Mon, Mar 19, 2018 at 9:59 AM, Terry Reedy <tjreedy@udel.edu> wrote:
I'm confused by the whole discussion because I thought pathlib was a means for dealing with paths (i.e., names of things), not the objects to which they refer, and was simply a Pythonic modernization of os.path.

On 12/03/18 20:57, George Fischhof wrote:
I am mildly negative about this. In my copious spare time (ho ho) I have been considering the use of pathlib with things that aren't filing systems, such as http or ftp. Moving the file operations into pathlib would strain that idea even harder than my (extremely superficial) current thinking. -- Rhodri James *-* Kynesim Ltd

On Mar 12, 2018 1:57 PM, "George Fischhof" <george@fischhof.hu> wrote: This PEP proposes pathlib module to be a centralized place for all file-system related operations. I'd find this useful for another reason that hasn't been mentioned yet: having a single class collecting all the common/basic file operations makes it much easier to provide a convenient async version of that interface. For example: https://trio.readthedocs.io/en/latest/reference-io.html#trio.Path Obviously it can never be complete, since there are always going to be standalone functions that take paths and work on them internally (for example in third party libraries), but the operations we're talking about here are all pretty basic primitives. -n
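
A rough sketch of why a single class makes an async interface easy to derive. This is not how trio implements trio.Path; the AsyncPath name and the thread-pool approach via asyncio are assumptions chosen only to illustrate the idea that one wrapper can cover every method at once.

```python
import asyncio
import functools
import tempfile
from pathlib import Path


class AsyncPath:
    """Hypothetical wrapper: every Path method becomes a coroutine."""

    def __init__(self, *args):
        self._path = Path(*args)

    def __getattr__(self, name):
        attr = getattr(self._path, name)
        if not callable(attr):
            return attr  # plain attributes such as .name need no thread

        async def call(*args, **kwargs):
            loop = asyncio.get_running_loop()
            # Run the blocking call in the default thread pool.
            return await loop.run_in_executor(
                None, functools.partial(attr, *args, **kwargs)
            )

        return call


async def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = AsyncPath(tmp, "note.txt")
        await path.write_text("hello")
        return path.name, await path.read_text()
```

Standalone functions in os and shutil would each need their own wrapper, which is the convenience argument being made here.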

On 12 March 2018 at 20:57, George Fischhof <george@fischhof.hu> wrote:
I don't know for certain what I think I feel about the idea - in general, it seems plausible. But I think you'll need to get down to specifics in the PEP, exactly what functions do you suggest get added to pathlib? Paul

path.py conveniently defines very many (most?) file operations as methods: https://github.com/jaraco/path.py/blob/master/path.py https://pathpy.readthedocs.io/en/latest/api.html The division operator is mapped to os.path.join: Path('a') / '/root' == Path('/root') On Monday, March 12, 2018, George Fischhof <george@fischhof.hu> wrote:
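
For comparison, the stdlib pathlib division operator follows the same os.path.join rule. A small sketch using PurePosixPath, so it touches no files and behaves the same on any platform:

```python
import posixpath
from pathlib import PurePosixPath

relative = PurePosixPath("a")

# An absolute right-hand operand replaces everything to its left,
# just as posixpath.join("a", "/root") does.
joined = relative / "/root"

# Relative operands are simply appended.
nested = relative / "b" / "c.txt"

same_as_join = str(joined) == posixpath.join("a", "/root")
```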

Here's a comparison table of os, os.path, shutil, pathlib, and path.py. The full version is at https://github.com/westurner/pyfilemods (README.rst) and at https://westurner.github.io/pyfilemods. I ran a few set intersections and went ahead and wrote a report to compare function/method signatures and sources. attr table ========== ================== == ======= ====== ======= ======= attr os os.path shutil pathlib path.py ================== == ======= ====== ======= ======= `__div__`_ X `__rdiv__`_ X `absolute`_ X `abspath`_ X X `access`_ X X `altsep`_ X X `anchor`_ X `as_posix`_ X `as_uri`_ X `atime`_ X `basename`_ X X `bytes`_ X `capitalize`_ X `casefold`_ X `cd`_ X `center`_ X `chdir`_ X X `chmod`_ X X X `chown`_ X X X `chroot`_ X X `chunks`_ X `commonpath`_ X `commonprefix`_ X `copy`_ X X `copy2`_ X X `copyfile`_ X X `copymode`_ X X `copystat`_ X X `copytree`_ X X `count`_ X `ctime`_ X `curdir`_ X X `cwd`_ X `defpath`_ X X `devnull`_ X X `dirname`_ X X `dirs`_ X `drive`_ X X `encode`_ X `endswith`_ X `exists`_ X X X `expand`_ X `expandtabs`_ X `expanduser`_ X X X `expandvars`_ X X `ext`_ X `extsep`_ X X `files`_ X `find`_ X `fnmatch`_ X X `format`_ X `format_map`_ X `get_owner`_ X `getatime`_ X X `getctime`_ X X `getcwd`_ X X `getmtime`_ X X `getsize`_ X X `glob`_ X X `group`_ X `home`_ X `in_place`_ X `index`_ X `is_absolute`_ X `is_block_device`_ X `is_char_device`_ X `is_dir`_ X `is_fifo`_ X `is_file`_ X `is_reserved`_ X `is_socket`_ X `is_symlink`_ X `isabs`_ X X `isalnum`_ X `isalpha`_ X `isdecimal`_ X `isdigit`_ X `isdir`_ X X `isfile`_ X X `isidentifier`_ X `islink`_ X X `islower`_ X `ismount`_ X X `isnumeric`_ X `isprintable`_ X `isspace`_ X `istitle`_ X `isupper`_ X `iterdir`_ X `join`_ X X `joinpath`_ X X `lchmod`_ X `lexists`_ X `lines`_ X `link`_ X X `listdir`_ X X `ljust`_ X `lower`_ X `lstat`_ X X X `lstrip`_ X `makedirs`_ X X `makedirs_p`_ X `maketrans`_ X `match`_ X `merge_tree`_ X `mkdir`_ X X X `mkdir_p`_ X `module`_ X `move`_ X X `mtime`_ X 
`name`_ X X X `namebase`_ X `normcase`_ X X `normpath`_ X X `open`_ X X X `os`_ X X `owner`_ X X `pardir`_ X X `parent`_ X X `parents`_ X `partition`_ X `parts`_ X `pathconf`_ X X `pathsep`_ X X `read_bytes`_ X `read_hash`_ X `read_hexhash`_ X `read_md5`_ X `read_text`_ X `readlink`_ X X `readlinkabs`_ X `realpath`_ X X `relative_to`_ X `relpath`_ X X `relpathto`_ X `remove`_ X X `remove_p`_ X `removedirs`_ X X `removedirs_p`_ X `rename`_ X X X `renames`_ X X `replace`_ X X X `resolve`_ X `rfind`_ X `rglob`_ X `rindex`_ X `rjust`_ X `rmdir`_ X X X `rmdir_p`_ X `rmtree`_ X X `rmtree_p`_ X `root`_ X `rpartition`_ X `rsplit`_ X `rstrip`_ X `samefile`_ X X X `sameopenfile`_ X `samestat`_ X `sep`_ X X `size`_ X `special`_ X `split`_ X X `splitall`_ X `splitdrive`_ X X `splitext`_ X X `splitlines`_ X `splitpath`_ X `splitunc`_ X `startswith`_ X `stat`_ X X X X X `statvfs`_ X X `stem`_ X X `strip`_ X `stripext`_ X `suffix`_ X `suffixes`_ X `swapcase`_ X `symlink`_ X X `symlink_to`_ X `text`_ X `title`_ X `touch`_ X X `translate`_ X `uncshare`_ X `unlink`_ X X X `unlink_p`_ X `upper`_ X `using_module`_ X `utime`_ X X `walk`_ X X `walkdirs`_ X `walkfiles`_ X `with_name`_ X `with_suffix`_ X X `write_bytes`_ X X `write_lines`_ X `write_text`_ X X `zfill`_ X ================== == ======= ====== ======= ======= On Wed, Mar 14, 2018 at 2:22 PM, Wes Turner <wes.turner@gmail.com> wrote:

I added trio to the comparison table (Things are mostly just async-wrapped, though pathlib_not_trio does show a few missing methods?). https://github.com/westurner/pyfilemods/issues/2 https://github.com/westurner/pyfilemods/blob/master/README.rst#attr-table ================== == ======= ====== ======= ======= ==== attr os os.path shutil pathlib path.py trio ================== == ======= ====== ======= ======= ==== `__div__`_ X `__rdiv__`_ X `absolute`_ X X `abspath`_ X X `access`_ X X `altsep`_ X X `anchor`_ X `as_posix`_ X X `as_uri`_ X X `atime`_ X `basename`_ X X `bytes`_ X `capitalize`_ X `casefold`_ X `cd`_ X `center`_ X `chdir`_ X X `chmod`_ X X X X `chown`_ X X X `chroot`_ X X `chunks`_ X `commonpath`_ X `commonprefix`_ X `copy`_ X X `copy2`_ X X `copyfile`_ X X `copymode`_ X X `copystat`_ X X `copytree`_ X X `count`_ X `ctime`_ X `curdir`_ X X `cwd`_ X X `defpath`_ X X `devnull`_ X X `dirname`_ X X `dirs`_ X `drive`_ X X `encode`_ X `endswith`_ X `exists`_ X X X X `expand`_ X `expandtabs`_ X `expanduser`_ X X X X `expandvars`_ X X `ext`_ X `extsep`_ X X `files`_ X `find`_ X `fnmatch`_ X X `format`_ X `format_map`_ X `get_owner`_ X `getatime`_ X X `getctime`_ X X `getcwd`_ X X `getmtime`_ X X `getsize`_ X X `glob`_ X X X `group`_ X X `home`_ X X `in_place`_ X `index`_ X `is_absolute`_ X X `is_block_device`_ X X `is_char_device`_ X X `is_dir`_ X X `is_fifo`_ X X `is_file`_ X X `is_reserved`_ X X `is_socket`_ X X `is_symlink`_ X X `isabs`_ X X `isalnum`_ X `isalpha`_ X `isdecimal`_ X `isdigit`_ X `isdir`_ X X `isfile`_ X X `isidentifier`_ X `islink`_ X X `islower`_ X `ismount`_ X X `isnumeric`_ X `isprintable`_ X `isspace`_ X `istitle`_ X `isupper`_ X `iterdir`_ X X `join`_ X X `joinpath`_ X X X `lchmod`_ X X `lexists`_ X `lines`_ X `link`_ X X `listdir`_ X X `ljust`_ X `lower`_ X `lstat`_ X X X X `lstrip`_ X `makedirs`_ X X `makedirs_p`_ X `maketrans`_ X `match`_ X X `merge_tree`_ X `mkdir`_ X X X X `mkdir_p`_ X `module`_ X `move`_ X X `mtime`_ X `name`_ X X 
X `namebase`_ X `normcase`_ X X `normpath`_ X X `open`_ X X X X `os`_ X X `owner`_ X X X `pardir`_ X X `parent`_ X X `parents`_ X `partition`_ X `parts`_ X `pathconf`_ X X `pathsep`_ X X `read_bytes`_ X X `read_hash`_ X `read_hexhash`_ X `read_md5`_ X `read_text`_ X X `readlink`_ X X `readlinkabs`_ X `realpath`_ X X `relative_to`_ X X `relpath`_ X X `relpathto`_ X `remove`_ X X `remove_p`_ X `removedirs`_ X X `removedirs_p`_ X `rename`_ X X X X `renames`_ X X `replace`_ X X X X `resolve`_ X X `rfind`_ X `rglob`_ X X `rindex`_ X `rjust`_ X `rmdir`_ X X X X `rmdir_p`_ X `rmtree`_ X X `rmtree_p`_ X `root`_ X `rpartition`_ X `rsplit`_ X `rstrip`_ X `samefile`_ X X X X `sameopenfile`_ X `samestat`_ X `sep`_ X X `size`_ X `special`_ X `split`_ X X `splitall`_ X `splitdrive`_ X X `splitext`_ X X `splitlines`_ X `splitpath`_ X `splitunc`_ X `startswith`_ X `stat`_ X X X X X X `statvfs`_ X X `stem`_ X X `strip`_ X `stripext`_ X `suffix`_ X `suffixes`_ X `swapcase`_ X `symlink`_ X X `symlink_to`_ X X `text`_ X `title`_ X `touch`_ X X X `translate`_ X `uncshare`_ X `unlink`_ X X X X `unlink_p`_ X `upper`_ X `using_module`_ X `utime`_ X X `walk`_ X X `walkdirs`_ X `walkfiles`_ X `with_name`_ X X `with_suffix`_ X X X `write_bytes`_ X X X `write_lines`_ X `write_text`_ X X X `zfill`_ X ================== == ======= ====== ======= ======= ==== On Mon, Mar 19, 2018 at 5:23 AM, Wes Turner <wes.turner@gmail.com> wrote:

On Tue, Mar 20, 2018 at 1:03 AM, Wes Turner <wes.turner@gmail.com> wrote:
trio.Path is an automatically generated, exact mirror of pathlib.Path, so I don't think it's very useful to have in your table? Also the missing attributes are actually handled via __getattr__, so they aren't actually missing, they're just invisible to your detection mechanism :-)

    In [21]: trio.Path("/a/b").anchor
    Out[21]: '/'

    In [22]: trio.Path("/a/b").name
    Out[22]: 'b'

-n -- Nathaniel J. Smith -- https://vorpus.org
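
A small sketch of why attributes served through __getattr__ escape a dir()-based comparison. ForwardingPath is a made-up stand-in, not trio's actual code, and visible_names is only a guess at the kind of detection the table used.

```python
from pathlib import PurePosixPath


class ForwardingPath:
    """Hypothetical wrapper that forwards unknown attributes."""

    def __init__(self, *args):
        self._wrapped = PurePosixPath(*args)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        return getattr(self._wrapped, name)


def visible_names(cls):
    """Public names that a dir()-based scan of the class would report."""
    return {name for name in dir(cls) if not name.startswith("_")}
```

Here "anchor" is absent from visible_names(ForwardingPath), yet ForwardingPath("/a/b").anchor still works, because the lookup is forwarded at access time.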

I thought that because of __fspath__ conversions pathlib can be used with any function that needs a path. So long as shutil is pathlib-friendly, isn't this a solved problem? No need to extend pathlib for every file op; just pass the existing file ops a Path. Wasn't this discussed at the time fspath was added? Barry

On Mon, Mar 12, 2018 at 09:57:32PM +0100, George Fischhof wrote:
This is certainly a problem. Not a big problem, but it is an annoyance.
I don't know that this will be true. It makes one problem better: you no longer have to remember which module the function is in. But it makes other things worse:

- code and/or API duplication: for backwards compatibility, every existing function must be in two places, the original and in pathlib;
- if new file functions are added, they will go only in pathlib, which makes pathlib effectively mandatory;
- the pathlib API becomes even more complicated: not only have you got all the methods of pathlib objects, but you have all the shutil and os functions as well.

I think this is a good place for an experiment. You could write a function which monkey-patches pathlib:

    from pathlib import Path
    import os
    import shutil

    def monkeypatch():
        # os.remove is a C built-in and is not bound to the instance when
        # stored directly on the class, so wrap it in a plain function.
        Path.remove = lambda self: os.remove(self)
        # etc.

Then we can see how many functions are involved, how large this makes the Path object, and try it out and see whether it is better. -- Steve
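
One way the experiment could be carried out, as a sketch under stated assumptions: the CANDIDATES selection is hypothetical (picked from functions named in this thread), and as_method, public_size and demo are illustrative helper names. Each function is wrapped in a plain Python function so that it receives the Path instance as its first argument, and names that Path already defines are left alone.

```python
import os
import shutil
import tempfile
from pathlib import Path

# Hypothetical selection, taken from functions named in this thread.
CANDIDATES = {
    "remove": os.remove,
    "truncate": os.truncate,
    "copy2": shutil.copy2,
    "rmtree": shutil.rmtree,
    "disk_usage": shutil.disk_usage,
}


def as_method(func):
    """Wrap func so that it receives the Path instance as first argument."""
    def method(self, *args, **kwargs):
        return func(self, *args, **kwargs)
    method.__name__ = func.__name__
    return method


def public_size():
    """Number of public attributes currently visible on Path."""
    return len([name for name in dir(Path) if not name.startswith("_")])


def monkeypatch(candidates=CANDIDATES):
    """Add each candidate to Path unless that name already exists."""
    added = []
    for name, func in candidates.items():
        if not hasattr(Path, name):
            setattr(Path, name, as_method(func))
            added.append(name)
    return added


def demo():
    before = public_size()
    added = monkeypatch()
    growth = public_size() - before
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "scratch.txt"
        target.write_text("data")
        target.remove()  # now available as a method
        gone = not target.exists()
    return added, growth, gone
```

Comparing public_size() before and after gives a direct measure of how much the Path namespace grows for a given candidate list.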

2018-03-13 13:17 GMT+01:00 Steven D'Aprano <steve@pearwood.info>:
Duplication: it is true, but it is true for several other modules as well. I checked the pathlib module: right now more than 50% of its functions are duplicates, mainly from os, so it seems that pathlib has already started to develop this way ;-)

"if new file functions are added, they will go only in pathlib, which makes pathlib effectively mandatory;"

Yes, but I think this is part of the evolution: slowly everyone will shift to pathlib. And being mandatory is true of the current situation as well: if you need a function, you need its module. Right now, if you want to perform some file operations, you need os plus shutil, because half of the functions are in one of them and the other half are in the other.

I collected the functions that should be put into pathlib:

- os.remove
- os.removedirs (shutil.rmtree has the same functionality)
- os.truncate
- shutil.copyfileobj
- shutil.copyfile
- shutil.copymode
- shutil.copystat
- shutil.copy
- shutil.copy2
- shutil.copytree with shutil.ignore_patterns
- shutil.move
- shutil.disk_usage
- shutil.chown
- os.link => path.hardlink_to
- os.mkfifo
- os.readlink

Sum: 16 functions

All functions from the os module accept path-like objects, and none of the shutil functions do. Pathlib already contains 17 functions from the os and shutil modules. George

On Fri, Mar 16, 2018 at 12:38 AM, George Fischhof <george@fischhof.hu> wrote:
The os module is cheap; pathlib has a definite cost. If every file operation goes through pathlib, that basically means pathlib becomes part of the startup cost:

    rosuav@sikorsky:~$ python3 -m timeit -s 'import subprocess, sys, pathlib' 'subprocess.check_call([sys.executable, "-c", "import os"])'
    50 loops, best of 5: 8.82 msec per loop
    rosuav@sikorsky:~$ python3 -m timeit -s 'import subprocess, sys, pathlib' 'subprocess.check_call([sys.executable, "-c", "import pathlib"])'
    20 loops, best of 5: 15.9 msec per loop
    rosuav@sikorsky:~$ python3.6 -m timeit -s 'import subprocess, sys, pathlib' 'subprocess.check_call([sys.executable, "-c", "import os"])'
    100 loops, best of 3: 14.1 msec per loop
    rosuav@sikorsky:~$ python3.6 -m timeit -s 'import subprocess, sys, pathlib' 'subprocess.check_call([sys.executable, "-c", "import pathlib"])'
    10 loops, best of 3: 19.7 msec per loop
    rosuav@sikorsky:~$ python3.5 -m timeit -s 'import subprocess, sys, pathlib' 'subprocess.check_call([sys.executable, "-c", "import os"])'
    100 loops, best of 3: 10.6 msec per loop
    rosuav@sikorsky:~$ python3.5 -m timeit -s 'import subprocess, sys, pathlib' 'subprocess.check_call([sys.executable, "-c", "import pathlib"])'
    100 loops, best of 3: 18.7 msec per loop

And this is with warm caches; for a true first-time startup, the cost could be significantly higher. ChrisA
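
The same measurement can be sketched as a small script, which may be handier for repeating across interpreters. The import_cost and compare names are made up for this illustration, and the numbers will vary by machine and cache state, so none are shown here.

```python
import subprocess
import sys
import time


def import_cost(module, repeat=3):
    """Best wall-clock time in seconds to start Python and import module."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        subprocess.check_call([sys.executable, "-c", f"import {module}"])
        best = min(best, time.perf_counter() - start)
    return best


def compare(modules=("os", "pathlib")):
    """Map each module name to its measured start-up-plus-import time."""
    return {name: import_cost(name) for name in modules}
```

On Python 3.7 and later, running the interpreter with -X importtime gives a per-module breakdown and is probably the better tool for a closer look.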

On Fri, Mar 16, 2018 at 4:48 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
3.8, but I also tested with 3.7, 3.6, and 3.5. All of them showed the same kind of difference, but the numbers were a bit higher in the older versions. I fully expect that Python 3.4 (the first one where pathlib existed) would show the same too. ChrisA

If it is desirable to have pathlib used to represent paths that do not map directly to the filesystem, then it might be an acceptable compromise to have yet another... package that just imports os, pathlib, shutil etc and re-exports all relevant functions. I mean, we are talking about convenience here anyway, not new functionality, and this would enable 1-import convenience that gives you all-you-would-ever-want, but at the cost of maybe a bit more startup time if you only needed 1 of the 3 modules that do actual work. And it would avoid polluting pathlib with things that are... too useful in the real world :P

Joonas Liik writes:
then it might be an acceptable compromise to have yet another...
"There should be one-- and preferably only one -- obvious way to do it." The obvious way is to use the existing stdlib modules. So....
package that just imports os, pathlib, shutil etc and re-exports all relevant functions.
Anybody wanting this can easily do a better job than the stdlib ever can do -- by writing a package including all the modules they frequently use and only re-exporting those names that they use, perhaps with shorter or personally mnemonic aliases (Windows vs. Unix nomenclature for many shell utilities, for example -- but watch those builtins like "dir"!) Of course, this is horrible programming practice, making for burdens on reviewers and maintainers. I see the convenience for writing one-off scripts and perhaps a personal library, but I don't see it as a core function of the language and its standard library. So, -1 in the stdlib, and +0 for a base module on PyPI that demonstrates the principle and individual users could modify to personal taste. Steve

2018-03-17 7:18 GMT+01:00 Stephen J. Turnbull < turnbull.stephen.fw@u.tsukuba.ac.jp>:
Hi Steve, as I wrote previously, pathlib already contains 17 functions from os and shutil. It seems that the original idea was something like my idea; it just isn't finished yet. With the remaining functions added, it would become a nice and complete whole. George

George Fischhof writes:
It seems that the original idea was something like my idea; it just isn't finished yet.
Antoine (author and maintainer of pathlib) is not the kind of developer who leaves things unfinished. In PEP 428, there's a hint that some shutil functionality could be added, but I really don't think he meant anything as broad as in your PEP. As far as I can recall, pathlib was intended from the beginning to (1) represent paths in hierarchical local filesystems as Paths, (2) manipulate individual Paths in various ways consistent with the semantics of a hierarchical filesystem, and (3) offer various ways to access the object denoted by a single Path. Its functionality is very complete as far as that goes. It does not contain methods to (4) operate on directories as collections (with the exception of the iterdir, glob, and rglob methods, which expose directory contents as iterators of Paths), (5) perform operations on several objects denoted by Paths at once (copy and its multiple operand variants), (6) perform process control or access process characteristics, (7) perform operations (e.g., mounting partitions and flow control on TTYs) on devices (block or character), even if they can be accessed via paths in some filesystem as in POSIX, or (8) deal with users and other specialized OS objects. I conclude there never was any intention to overlap with os or shutil, except to the extent that they provide for any kind of path manipulation. Rather, I suppose the intent was to provide a substitute for os.path with a more convenient, complete, object-oriented API and consistent semantics, based on more than a decade of experience with os.path. Regards, Steve
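
A short sketch of categories (1) to (3) as described above, using made-up paths. The first part never touches the disk; the second accesses the single object that one Path denotes, inside a temporary directory.

```python
import tempfile
from pathlib import Path, PurePosixPath

# (1) and (2): represent and manipulate a path without touching the disk.
config = PurePosixPath("/etc/app/settings.ini")
backup = config.with_suffix(".bak")
parts = (config.parent, config.name, config.suffix)


def access_single_object():
    """(3): access the one object that a single Path denotes."""
    with tempfile.TemporaryDirectory() as tmp:
        note = Path(tmp) / "note.txt"
        note.write_text("hello")
        return note.is_file(), note.stat().st_size, note.read_text()
```

Nothing here involves two filesystem objects at once, which is the line category (5) draws.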

On Sat, Mar 17, 2018 at 10:15 AM, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
(5) perform operations on several objects denoted by Paths at once (copy and its multiple operand variants),
Sure it does: Path.rename and Path.replace. I know why rename and copy have historically been in separate modules, but the distinction is pretty arcane and matters a lot more to implementers than it does to users. Similarly, it's hard to explain why we have Path.mkdir but not Path.makedirs -- and these have historically both lived in the 'os' module, so we can't blame it on Path being a mirror of os.path. It's also not obvious why we should have Path.rmdir, but not Path.rmtree. My understanding is that the point of Path is to be a convenient, pleasant-to-use mechanism for accessing common filesystem operations. And it does a pretty excellent job of that. But it seems obvious to me that it's still missing a number of fairly basic operations that people need all the time. I don't think the PEP is there yet, and we can quibble over the details -- just copying over all the historical decisions in shutil isn't obviously the right move (maybe it should be Path.mkdir(include_parents=True) and Path.unlink(recursive=True) instead of Path.makedirs and Path.rmtree?), but there's definitely room for improvement. -n -- Nathaniel J. Smith -- https://vorpus.org
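
For reference, Path.mkdir already takes a parents argument (and exist_ok since Python 3.5), which covers the os.makedirs case, while recursive removal still goes through shutil.rmtree. A minimal sketch in a temporary directory, with made-up directory names:

```python
import shutil
import tempfile
from pathlib import Path


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        deep = Path(tmp) / "a" / "b" / "c"
        deep.mkdir(parents=True, exist_ok=True)  # like os.makedirs
        deep.mkdir(parents=True, exist_ok=True)  # repeating it is harmless
        created = deep.is_dir()

        try:
            (Path(tmp) / "a").rmdir()  # only works on empty directories
            rmdir_refused = False
        except OSError:
            rmdir_refused = True

        shutil.rmtree(Path(tmp) / "a")  # the recursive counterpart
        return created, rmdir_refused, (Path(tmp) / "a").exists()
```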

On 18 March 2018 at 04:41, Nathaniel Smith <njs@pobox.com> wrote:
IMO, the pathlib module (just) defines Path. So I'm -1 on adding anything to pathlib that isn't a method of a Path object. Beyond that, I agree with you that Path should be a convenient interface for filesystem path objects. I haven't personally found that there's much missing that I've needed, but I agree that there are some gaps from a theoretical point of view, and adding methods to fill those gaps could be justifiable. OTOH, the fspath protocol was explicitly designed so that standalone functions (such as the ones in os and shutil) can work cleanly with Path objects - so there's a strong argument that "not everything needs to be a method" applies here. For example, while there isn't a Path.makedirs(), what's so bad about os.makedirs(Path)? (There's consistency and discoverability arguments, but they are not what I'd call compelling on their own).
I agree that there are some potential candidates for "useful additional methods for Path objects", but I'd like to see these discussed on a case by case basis, much like you do here, rather than as a blanket "if it's in some other module and it works on paths, it should be in pathlib". My biggest problem with the proposal as it stands is that it makes no attempt to justify the suggestions on a case by case basis (the first version wasn't even explicit in the functions it was proposing!) but argues from a pure "lump everything together" standpoint. Paul
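
A minimal sketch of the fspath protocol at work: os.makedirs accepts a Path directly, and any object that implements __fspath__ is accepted the same way. ProjectDir is a hypothetical class invented for this illustration.

```python
import os
import tempfile
from pathlib import Path


class ProjectDir:
    """Hypothetical class; implementing __fspath__ makes it path-like."""

    def __init__(self, root, name):
        self.root = root
        self.name = name

    def __fspath__(self):
        return os.path.join(self.root, self.name)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        nested = Path(tmp) / "x" / "y"
        os.makedirs(nested)  # a Path works where a str would

        project = ProjectDir(tmp, "proj")
        os.makedirs(project)  # so does any other path-like object
        return (
            nested.is_dir(),
            os.path.isdir(project),
            os.fspath(project) == os.path.join(tmp, "proj"),
        )
```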

Maybe this is obvious or I am missing something crucial, but I'm surprised that this hasn't been discussed yet: than the low-level module. It seems natural that the high-level module should simply use the low-level module to do the file operations, and just provide a nice (probably object-oriented) interface for those methods. In python, `os` and `shutil` are currently the low-level modules, and it stands to reason that we might consider combining these somehow (although I'm assuming that there was a good reason not to in the first place, which is why they both exist, but I haven't looked into it). And `pathlib` is currently the "high-level" module. I see two problems currently: 1) the low-level module is split in half (for example, operations for copying are contained in `shutil` and operations for removing are contained in `os`). This is a bit annoying for the user, but it's not game-breaking. It does, however, make python feel a bit unnatural in this context, and that's even more unusual because normally python feels very natural. So this becomes sort of a "huh this feels weird" situation. 2) The `pathlib` module only provides a high-level interface for working with _single_ Path objects. There isn't really functionality to work with multiple Path objects (as pointed out by someone previously if I am understanding correctly). I don't think the current PEP under consideration adequately solves either of these problems. Currently, it seems like it's trying to make `pathlib` both a high- and low-level module, which imo doesn't make sense. But I do think we need, if not a single low-level module, at least a high-level module that makes it unnecessary to use the low-level modules. That means that `pathlib` needs more functionality added to it, which is similar in spirit to the current PEP proposal. - Jason, a reader On Sun, Mar 18, 2018 at 9:46 AM, Paul Moore <p.f.moore@gmail.com> wrote:

Gotcha, thank you! shutil being a high level library complicates things... So we have two "high-level" libraries (pathlib and shutil) and both of them provide different pieces of useful functionality. Maybe I am starting to see why this is complicated. Thanks for reading my above reply and taking the time to respond.

Hi Jason, os and shutil ended up in their current state because of the C functions in the implementation (I got a similar answer before) ... What do you think would be a good way to solve this?

- add stuff from os to shutil
- add stuff from os and shutil to pathlib
- create a new module on top of os, shutil and pathlib (we could name it, for example, filelib)

George 2018-03-18 19:43 GMT+01:00 Jason Maldonis <jjmaldonis@gmail.com>:

On Sun, Mar 18, 2018 at 4:16 PM, George Fischhof <george@fischhof.hu> wrote:
But...why? It's not clear to me as to how this would actually, you know, *solve* anything.
-- Ryan (ライアン) Yoko Shimomura, ryo (supercell/EGOIST), Hiroyuki Sawano >> everyone else https://refi64.com/

On Sun, Mar 18, 2018 at 4:16 PM, George Fischhof <george@fischhof.hu> wrote:
I'm pretty ignorant of the situation so my opinion probably isn't worth very much, but I'll give it a try ;) If shutil is functioning as a high-level library, it makes sense to me to add the stuff from os to shutil. Then users can simply import shutil and know that they have all the functionality they need to work with files. Someone said that pathlib's Paths work with the os module but not with shutil -- if os is the low-level module and shutil is the high-level, that seems a bit strange but I'm sure it made sense to do that. It seems to me that adding os functionality to shutil to make shutil "complete", and also modifying shutil functions to work with Path objects makes the most sense. Once that's done, shutil will be _the_ high-level library for using file operations in python, with the option to use Path objects if you want (side note: I'm a big fan of the Path class!). After that, it may make sense to add some (maybe all) of the methods in shutil to the pathlib module similar to how Path already implements some file operations (e.g. Path.glob and Path.unlink). All of that seems like a three step process to me: 1) Add the missing methods from os to shutil 2) Make the shutil methods work nicely with Path objects 3) Reevaluate and see what methods make sense to add to the pathlib module As a user, I'd be thrilled with that setup.

Jason Maldonis writes:
So we have two "high-level" libraries (pathlib and shutil)
pathlib is currently "low-level" as I understand the word. The only complex things it does are resolving and globbing, which are reasonable things to do with a Path's target object. Everything else is either a formal Path manipulation, or acts on a single object that is the target of a Path.

On Sun, Mar 18, 2018 at 10:53 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
The functions in os are thin wrappers around system calls,
exactly -- and this is a very old legacy from way back. Modern Python users should not have to concern themselves with whether something they want to do is essentially a system call or a higher-level process. Typical users, and certainly newbies, think "I want to do this or that with the filesystem", and it would be really nice if there was one way, and one place to do that. The old os vs shutil was annoying enough, then we got pathlib with very little support in the stdlib, which was really annoying. Now we finally have pathlib support in most of the stdlib, so I can really tell people that they can use Paths, rather than strings for paths -- great! But yes, the job is not yet finished, because we still have to go find _some_ functionality in os or shutil. Yes, it seems like duplication, but that decision was made when pathlib was added. I do think we should not simply move everything, but rather work out each case -- and I like Nathaniel's idea of simplifying / cleaning up the API a bit while we are at it. (please don't have an "unlink"!). -CHB PS: does shutil really still not work with Path objects? aarrgg! PPS: someone made a comment about "having to update every book about python" -- so I'll repeat: that decision was made when pathlib was added. -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Mon, 19 Mar 2018 at 18:08 Chris Barker <chris.barker@noaa.gov> wrote:
[SNIP] PS: does shutil really still not work with Path objects? aarrgg!
Did you verify this or are you just guessing? If this is true then file a bug and optionally submit a patch. Saying "aarrgg" doesn't fix the situation nor motivate people to help out, especially when it sounds like you're not even sure yourself that it's even a problem.
PPS: someone made a comment about "having to update every book about python" -- so I'll repeat: that decision was made when pathlib was added.
That was me and yes, we had to have updates made because we thought pathlib was worth it. At this point the PEP as proposed has not made the case that what it wants to add is worth it because right now it's just a huge bullet list of functions from two modules with no specific motivation behind the individual changes. The whole point of my comment is to say "we can't make changes just because; changes have to meet a certain bar of improvement" and at this point the PEP has not proven there is such an improvement based on what has been proposed. IOW the core devs I have seen comment on this have pretty much all said "justify the individual methods" and yet no one has done that, so any discussion other than trying to meet that need is not helping to move anything forward.

On Tue, Mar 20, 2018 at 4:23 PM Brett Cannon <brett@python.org> wrote:
My intent was, and still is, to encourage just that. And the rest of my message did say that (I think). We all have only so many roundtoits to spend on this -- so I won't be writing that PEP, but I do think it's worthwhile to encourage the OP (or anyone else) to do so. Writing a PEP is a lot of work, one wants to know it has a chance of being accepted. Python really could be improved in this regard -- it's had a confusing API for file system manipulations forever. Recent changes have helped, but it would be nice to get all the way there. -CHB

On Tuesday, 20 March 2018 16:22:57 GMT Brett Cannon wrote:
Checking this on 3.6.4 on Fedora, it looks like the shutil module works well with Path(). There was one bit of code that I thought might not work, but it is not a problem as you cannot make a Path(b'filename'). Maybe the simplest change would be to add a notice at the top of the pathlib docs that says to look in shutil and os for the file operations. Also show some examples of Path() working with shutil to do some typical operations. Barry
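
As an illustration of the kind of docs example suggested above, here is a minimal sketch of Path objects passed straight to shutil. It assumes Python 3.6 or later; the function name and the file names are made up for the example.

```python
import shutil
import tempfile
from pathlib import Path


def demo_shutil_with_paths(workdir: Path) -> list:
    """Run a few typical shutil operations using Path arguments only."""
    src = workdir / "src"
    src.mkdir()
    (src / "notes.txt").write_text("hello")

    # Copy a single file: both arguments are Path objects, not strings.
    shutil.copy(src / "notes.txt", workdir / "notes_copy.txt")

    # Copy a whole directory tree.
    shutil.copytree(src, workdir / "backup")

    # Remove a directory together with everything inside it.
    shutil.rmtree(src)

    return sorted(p.name for p in workdir.iterdir())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo_shutil_with_paths(Path(tmp)))
```

No str() conversions are needed anywhere, which matches the observation that shutil already cooperates with pathlib for these operations.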

Nathaniel Smith writes:
I was very careful about the semantics. Those are *single objects* denoted by multiple Paths (at different times). You could argue multiple objects for Path.replace, but I consider maybe removing the reference to the original target of the new Path to be an edge case that you need to address if you have Path.rename.
Similarly, it's hard to explain why we have Path.mkdir but not Path.makedirs
So what? Let's fix that. As you propose:
(maybe it should be Path.mkdir(include_parents=True)
is fine, although that default seems a little risky vs. typos. I know I have confused myself with mkdir -p that way a few (though very few) times. Perhaps Guido would prefer Path.makedirs for this functionality.
and Path.unlink(recursive=True)
I dislike that API, to be honest (at least two interpretations: rmtree and remove_empty_directories=True). I would definitely call the more destructive operation Path.rmtree.
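
For reference, a small sketch of what exists today. Path.mkdir already accepts parents=True (the keyword is parents, not the include_parents spelling floated above), while recursive removal still lives in shutil.rmtree. The helper name below is hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def make_and_remove(root: Path):
    """Create nested directories with pathlib, then remove them with shutil."""
    nested = root / "a" / "b" / "c"
    # Creates a, a/b and a/b/c in one call, like os.makedirs.
    nested.mkdir(parents=True, exist_ok=True)
    (nested / "data.txt").write_text("x")

    # Path.rmdir only removes empty directories, so this attempt fails.
    try:
        (root / "a").rmdir()
        rmdir_failed = False
    except OSError:
        rmdir_failed = True

    # The destructive, recursive operation is a separate function.
    shutil.rmtree(root / "a")
    return rmdir_failed, (root / "a").exists()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(make_and_remove(Path(tmp)))
```

Keeping the recursive delete under its own name, as argued above, means a typo in a keyword argument cannot silently turn a harmless rmdir into a tree removal.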
but there's definitely room for improvement.
I didn't deny that. All I argued was that, no, it really seems unlikely to me that Antoine intended pathlib to become Emacs. And I am against the PEP in its current form where it clearly intends to incorporate practically everything in os (dealing with filesystem objects) and shutil. Those APIs are not clean. I also feel that before we do anything but the minor filling-in exercises discussed explicitly above, we should see if we can add URIPath conforming to RFC 3986 and RFC 3987. Echoing Antoine's misgivings, I'm dubious about that, though, because Antoine implemented the concrete "realpath" semantics (resolve links before ..) in pathlib, while RFC 3986 specifies formal path manipulation semantics to prevent traversal above DocumentRoot and similar exploits. In web programming URIPath to Path conversions, and vice versa, will be very common, but I suspect at least one direction will be fragile because conversion and canonicalize won't commute. I hope I'm wrong![1] Well we know where we're going But we don't know where we've been And we know what we're knowing But we can't say what we've seen We're on a Path to nowhere Come on inside Taking that ride to nowhere We'll take that ride :-) Footnotes: [1] Perhaps that can be fixed by recommending a single composed operation that is safe. Maybe the simple conversion operations themselves can be "private" methods.
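
The difference between the two ".." semantics mentioned above can be sketched as follows. This assumes a POSIX system where symlinks can be created. os.path.normpath stands in here for purely lexical normalization, in the spirit of RFC 3986 dot-segment removal but not an implementation of it. The function name is hypothetical.

```python
import os
import tempfile
from pathlib import Path


def compare_dotdot(root: Path):
    """Show that lexical and link-resolving handling of '..' can disagree."""
    root = root.resolve()
    target = root / "real" / "sub"
    target.mkdir(parents=True)

    # link -> real/sub
    link = root / "link"
    link.symlink_to(target, target_is_directory=True)

    candidate = link / ".."  # pathlib keeps the ".." component as written

    # Lexical view: "link/.." collapses to the directory containing the link.
    lexical = Path(os.path.normpath(candidate))

    # Filesystem view: resolve the link first, then apply "..".
    resolved = candidate.resolve()

    return lexical == root, resolved == root / "real"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(compare_dotdot(Path(tmp)))
```

The same textual path names two different directories depending on which rule is applied, which is why converting and canonicalizing in different orders may not give the same result.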

On 16 March 2018 at 03:15, Chris Angelico <rosuav@gmail.com> wrote:
Keep in mind that the `os` layer will never go away: `pathlib` still needs a lower level API to call to *do the work* of actually interacting with the underlying operating system APIs (e.g. this is why we added os.scandir). A similar situation applies when it comes to glob, fnmatch, etc. Even `shutil` will likely retain its place as a lower level procedural API behind pathlib's object-oriented facade, since raw strings are still frequently going to be easier to work with when mixing and matching Python code and native operating system shell code. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
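
A rough sketch of the two layers side by side, listing one directory with the procedural os.scandir API and with the object-oriented Path.iterdir. It assumes Python 3.6 or later; the helper name is invented for the example.

```python
import os
import tempfile
from pathlib import Path


def list_both_ways(directory: Path):
    """Return (name, is_dir) pairs from os.scandir and from Path.iterdir."""
    # Lower level: os.scandir yields os.DirEntry objects.
    with os.scandir(directory) as entries:
        low = sorted((entry.name, entry.is_dir()) for entry in entries)

    # Higher level: Path.iterdir yields Path objects.
    high = sorted((p.name, p.is_dir()) for p in directory.iterdir())
    return low, high


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        (base / "d").mkdir()
        (base / "f.txt").write_text("")
        print(list_both_ways(base))
```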

Hi Folks, it seems for me that the welcoming of this proposal is rather positive than not. Of course several details could be put into it, but I think it would be better to let the developers decide the details, because they know the environment and the possibilities. The name of the functions and the method they solve the problem (for example rmdir(tree=True) instead of removedirs()) is all the same. The (main) goal would be that file and directory operations reside in one module. And currently the pathlib seems to be the best candidate. (we could put them into a very new module, but it would be just another duplication) So what do You think, this proposal IS PEPable or should I do something with this to achieve the PEPable state? BR, George 2018-03-18 9:05 GMT+01:00 Nick Coghlan <ncoghlan@gmail.com>:

I think that is up for debate.
Of course several details could be put into it, but I think it would be better to let the developers decide the details, because they know the environment and the possibilities.
You mean you have no intention of doing the implementation? If not you, who is willing to do the, not inconsiderable, work?
The name of the functions and the method they solve the problem (for example rmdir(tree=True) instead of removedirs()) is all the same.
But it is not the same. os.removedirs only removes dirs, whereas shutil.rmtree will remove files and dirs.
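
The distinction can be checked with a short sketch comparing os.removedirs with shutil.rmtree, the shutil function that removes a whole tree. The directory and file names are arbitrary and the helper name is hypothetical.

```python
import os
import shutil
import tempfile
from pathlib import Path


def compare_removal(root: Path):
    """Contrast os.removedirs (empty dirs only) with shutil.rmtree."""
    # A file in root keeps root itself non-empty, so pruning stops there.
    (root / "keep.txt").write_text("stay")

    # Case 1: a chain of empty directories.
    empty_leaf = root / "x" / "y" / "z"
    empty_leaf.mkdir(parents=True)
    os.removedirs(empty_leaf)  # removes z, then the now-empty y and x
    empty_chain_gone = not (root / "x").exists()

    # Case 2: a directory that contains a file.
    full_leaf = root / "full" / "sub"
    full_leaf.mkdir(parents=True)
    (full_leaf / "file.txt").write_text("data")
    try:
        os.removedirs(full_leaf)  # leaf is not empty, so this raises
        removedirs_refused = False
    except OSError:
        removedirs_refused = True

    shutil.rmtree(root / "full")  # removes files and directories alike
    return (
        empty_chain_gone,
        removedirs_refused,
        (root / "full").exists(),
        (root / "keep.txt").exists(),
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(compare_removal(Path(tmp)))
```

Because the two functions behave so differently, merging them behind a single flag would need a deliberate design choice rather than a mechanical rename.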
The (main) goal would be that file and directory operations reside in one module.
As I said earlier the question is should that module be pathlib? There are good arguments on both sides. Barry

On Sun, Mar 18, 2018 at 4:58 AM, George Fischhof <george@fischhof.hu> wrote:
That's not how PEPs work :-). Someone has to do the work of collating contradictory feedback and making opinionated design proposals, and the person who does that is called the PEP author. In this case, I'd also suggest framing the PEP as a list of specific things that should be added to pathlib.Path, with justifications for each. If your argument is "X should be in pathlib because it's in some other module", then that's not very compelling -- by definition it already means we have an X, so why do we need another? I think for a number of these cases there actually is a good answer to that question, but your PEP has to actually provide that answer :-). -n -- Nathaniel J. Smith -- https://vorpus.org

On Sun, 18 Mar 2018 at 20:37 Nathaniel Smith <njs@pobox.com> wrote:
And just to make it super-clear, the advice Nathaniel has provided will be required to be met before any of the PEP editors accept this proposal. IOW a bullet list will not suffice and you will need clear justification for every change you make in order to explain why every book on Python will need to be updated due to this PEP. :)

On 18/03/18 11:58, George Fischhof wrote:
it seems for me that the welcoming of this proposal is rather positive than not.
I think you may have observer bias :-) As far as I am concerned you have yet to make a convincing case that there is a problem, never mind that your solution is appropriate. Your solution also isn't detailed enough, as several people have pointed out. -- Rhodri James *-* Kynesim Ltd

On 3/19/2018 11:31 AM, Rhodri James wrote:
Or, as often happens, George is making too much of a biased sample of opinions -- those who care enough to respond, *given what has also been said.* Python-ideas works best as a place to evaluate ideas and develop alternatives. Vote counting is premature. Proposers *must* be flexible and not defend their initial solution like the Alamo.
I am convinced that *some* people, especially but not limited to newbies, find the current situation confusing and less than optimal. I am also pretty convinced that the idea of dumping a copy of everything into pathlib is the wrong solution. I mostly agree with the latest posts from Nathaniel Smith and Stephen Turnbull. -- Terry Jan Reedy

On Mon, Mar 19, 2018 at 9:59 AM, Terry Reedy <tjreedy@udel.edu> wrote:
I'm confused by the whole discussion because I thought pathlib was a means for dealing with paths (i.e., names of things), not the objects to which they refer, and was simply a Pythonic modernization of os.path.

On 12/03/18 20:57, George Fischhof wrote:
I am mildly negative about this. In my copious spare time (ho ho) I have been considering the use of pathlib with things that aren't filing systems, such as http or ftp. Moving the file operations into pathlib would strain that idea even harder than my (extremely superficial) current thinking. -- Rhodri James *-* Kynesim Ltd

On Mar 12, 2018 1:57 PM, "George Fischhof" <george@fischhof.hu> wrote:
This PEP proposes pathlib module to be a centralized place for all file-system related operations.
I'd find this useful for another reason that hasn't been mentioned yet: having a single class collecting all the common/basic file operations makes it much easier to provide a convenient async version of that interface. For example: https://trio.readthedocs.io/en/latest/reference-io.html#trio.Path Obviously it can never be complete, since there are always going to be standalone functions that take paths and work on them internally (for example in third party libraries), but the operations we're talking about here are all pretty basic primitives. -n
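
To make the point concrete, here is a minimal sketch of how a single class of file operations can be wrapped into an async interface. AsyncPath is a hypothetical name, this is not how trio.Path is actually implemented, and it assumes Python 3.7 or later with plain asyncio and a thread pool.

```python
import asyncio
import functools
import tempfile
from pathlib import Path


class AsyncPath:
    """Hypothetical wrapper: every Path method becomes an awaitable."""

    def __init__(self, *args):
        self._path = Path(*args)

    def __truediv__(self, other):
        # Pure path arithmetic does no I/O, so it stays synchronous.
        return AsyncPath(self._path / other)

    def __getattr__(self, name):
        attr = getattr(self._path, name)
        if not callable(attr):
            return attr  # plain attributes such as .name or .suffix

        async def call(*args, **kwargs):
            # Run the blocking Path method in the default thread pool.
            loop = asyncio.get_running_loop()
            bound = functools.partial(attr, *args, **kwargs)
            return await loop.run_in_executor(None, bound)

        return call


async def demo(tmp: str):
    p = AsyncPath(tmp) / "hello.txt"
    await p.write_text("hi")
    return await p.read_text(), await p.exists(), p.name


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(asyncio.run(demo(tmp)))
```

The wrapper only covers what Path itself offers, so operations that live in os or shutil would each need a separate async counterpart.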
participants (22)
- Antoine Pitrou
- Barry
- Barry Scott
- Brett Cannon
- Chris Angelico
- Chris Barker
- Eric Fahlgren
- George Fischhof
- Greg Ewing
- Jason Maldonis
- Joonas Liik
- Nathaniel Smith
- Nick Coghlan
- Paul Moore
- Petr Viktorin
- Rhodri James
- Ryan Gonzalez
- Serhiy Storchaka
- Stephen J. Turnbull
- Steven D'Aprano
- Terry Reedy
- Wes Turner