[Python-checkins] bpo-32689: Updates shutil.move to allow for Path objects to be used as source arg (GH-15326)

Miss Islington (bot) webhook-mailer at python.org
Mon Sep 30 22:41:29 EDT 2019


https://github.com/python/cpython/commit/cf57cabef82c4689ce9796bb1fcdb125fa05efcb
commit: cf57cabef82c4689ce9796bb1fcdb125fa05efcb
branch: master
author: Maxwell A McKinnon <maxwell.mckinnon at maximintegrated.com>
committer: Miss Islington (bot) <31488909+miss-islington at users.noreply.github.com>
date: 2019-09-30T19:41:16-07:00
summary:

bpo-32689: Updates shutil.move to allow for Path objects to be used as source arg (GH-15326)



Important work originally done by @emilyemorehouse two years ago and nearly ready to go in.

This bug has affected many people and in some cases has been a dealbreaker to the adoption of the otherwise wonderful pathlib and PEP519. https://stackoverflow.com/questions/33625931/copy-file-with-pathlib-in-python.

This adds the outstanding test request from that PR @vstinner (https://github.com/python/cpython/pull/5393).

Test fails without the change, passes with it, along with every other test in test_shutil.

Some variants were experimented with to make the one line change and the most performant one was picked.


# Added Test for PathLike directory destination, the current fail case

```
Lib/test/test_shutil.py::TestMove::test_move_file_pathlike FAILED                                                               [100%]

============================================================== FAILURES ===============================================================
__________________________________________________ TestMove.test_move_file_pathlike ___________________________________________________

self = <test.test_shutil.TestMove testMethod=test_move_file_pathlike>

    def test_move_file_pathlike(self):
        # Move a file to another location on the same filesystem.
        src = pathlib.Path(self.src_file)
>       self._check_move_file(src, self.dst_dir, self.dst_file)

Lib/test/test_shutil.py:1563:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Lib/test/test_shutil.py:1545: in _check_move_file
    shutil.move(src, dst)
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/shutil.py:562: in move
    real_dst = os.path.join(dst, _basename(src))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

path = PosixPath('/var/folders/r2/psq74t5x3nbfzlph8bh2pvdw0000gn/T/tmp9ie0wh9_/foo')

    def _basename(path):
        # A basename() variant which first strips the trailing slash, if present.
        # Thus we always get the last component of the path, even for directories.
        sep = os.path.sep + (os.path.altsep or '')
>       return os.path.basename(path.rstrip(sep))
E       AttributeError: 'PosixPath' object has no attribute 'rstrip'

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/shutil.py:526: AttributeError
============================================== 1 failed, 102 deselected in 0.30 seconds ===============================================
```

After change:

```
========================================================= test session starts =========================================================
platform darwin -- Python 3.7.4, pytest-5.0.1, py-1.8.0, pluggy-0.12.0 -- /Users/maxwellmckinnon/.venvs/TA3.7/bin/python3.7
cachedir: .pytest_cache
rootdir: /Users/maxwellmckinnon/dev/cpython
plugins: cov-2.7.1, mock-1.10.4
collected 103 items / 102 deselected / 1 selected

Lib/test/test_shutil.py::TestMove::test_move_file_pathlike PASSED                                                               [100%]

============================================== 1 passed, 102 deselected in 0.06 seconds ===============================================
```

Running all the tests in test_shutil.py
```
╰─ pytest Lib/test/test_shutil.py -v
========================================================= test session starts =========================================================
platform darwin -- Python 3.7.4, pytest-5.0.1, py-1.8.0, pluggy-0.12.0 -- /Users/maxwellmckinnon/.venvs/TA3.7/bin/python3.7
cachedir: .pytest_cache
rootdir: /Users/maxwellmckinnon/dev/cpython
plugins: cov-2.7.1, mock-1.10.4
collected 103 items

Lib/test/test_shutil.py::TestShutil::test_chown PASSED                                                                          [  0%]
Lib/test/test_shutil.py::TestShutil::test_copy PASSED                                                                           [  1%]
...
Lib/test/test_shutil.py::TermsizeTests::test_stty_match SKIPPED                                                                 [ 99%]
Lib/test/test_shutil.py::PublicAPITests::test_module_all_attribute PASSED                                                       [100%]

================================================ 96 passed, 7 skipped in 1.25 seconds =================================================
```

# Performance Considerations
Is it considered poor form to get rid of _basename altogether and make use of pathlib in the move function? I'm not sure if the idea is for all these modules to strictly avoid circular dependencies. They are already using os.path which is just as much a citizen in 3.8 as pathlib right?

e.g.

`real_dst = os.path.join(dst, _basename(src))`
becomes
`real_dst = Path(dst) / Path(src).name`

I've looked around and familiarized myself, and I now think importing pathlib here is fine. My only remaining concern is that of performance.

Here's the performance difference for this step. 

```
In [46]: %timeit real_dst = os.path.join("a/b/c", _basename('b/'))
2.71 µs ± 62.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [47]: %timeit real_dst = Path("a/b/c") / Path('b/').name
12.4 µs ± 65.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```

Is 10us significant or insignificant compared to the least expensive operation this function will do? I don't know. Let's find out.

```
In [55]: %timeit os.rename('/tmp/a/a.txt', '/tmp/a/b.txt'); os.rename('/tmp/a/b.txt', '/tmp/a/a.txt')
124 µs ± 2.18 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```
62us to rename. 10us seems significant enough that we wouldn't want to favor the Path sugar suggestion. 16% speed decrease from adding the 10us.

What do people think? I was hoping to get to use pathlib.Path here, but I suspect for this low level move, it should be as fast as possible, and 16% is not worth one line of sugary code to me.



https://bugs.python.org/issue32689



Automerge-Triggered-By: @gvanrossum

files:
A Misc/NEWS.d/next/Library/2018-02-13-17-58-30.bpo-32689.a-3SnH.rst
M Doc/library/shutil.rst
M Lib/shutil.py
M Lib/test/test_shutil.py

diff --git a/Doc/library/shutil.rst b/Doc/library/shutil.rst
index 174b7e875a38..59390d0e907e 100644
--- a/Doc/library/shutil.rst
+++ b/Doc/library/shutil.rst
@@ -355,6 +355,9 @@ Directory and files operations
       copy the file more efficiently. See
       :ref:`shutil-platform-dependent-efficient-copy-operations` section.
 
+   .. versionchanged:: 3.9
+      Accepts a :term:`path-like object` for both *src* and *dst*.
+
 .. function:: disk_usage(path)
 
    Return disk usage statistics about the given path as a :term:`named tuple`
diff --git a/Lib/shutil.py b/Lib/shutil.py
index 5c1255a6713d..f0d03366368e 100644
--- a/Lib/shutil.py
+++ b/Lib/shutil.py
@@ -730,8 +730,20 @@ def onerror(*args):
 rmtree.avoids_symlink_attacks = _use_fd_functions
 
 def _basename(path):
-    # A basename() variant which first strips the trailing slash, if present.
-    # Thus we always get the last component of the path, even for directories.
+    """A basename() variant which first strips the trailing slash, if present.
+    Thus we always get the last component of the path, even for directories.
+
+    path: Union[PathLike, str]
+
+    e.g.
+    >>> os.path.basename('/bar/foo')
+    'foo'
+    >>> os.path.basename('/bar/foo/')
+    ''
+    >>> _basename('/bar/foo/')
+    'foo'
+    """
+    path = os.fspath(path)
     sep = os.path.sep + (os.path.altsep or '')
     return os.path.basename(path.rstrip(sep))
 
@@ -769,7 +781,10 @@ def move(src, dst, copy_function=copy2):
             os.rename(src, dst)
             return
 
+        # Using _basename instead of os.path.basename is important, as we must
+        # ignore any trailing slash to avoid the basename returning ''
         real_dst = os.path.join(dst, _basename(src))
+
         if os.path.exists(real_dst):
             raise Error("Destination path '%s' already exists" % real_dst)
     try:
diff --git a/Lib/test/test_shutil.py b/Lib/test/test_shutil.py
index ab0f96d25135..428d4f341715 100644
--- a/Lib/test/test_shutil.py
+++ b/Lib/test/test_shutil.py
@@ -1835,6 +1835,16 @@ def test_move_file_to_dir(self):
         # Move a file inside an existing dir on the same filesystem.
         self._check_move_file(self.src_file, self.dst_dir, self.dst_file)
 
+    def test_move_file_to_dir_pathlike_src(self):
+        # Move a pathlike file to another location on the same filesystem.
+        src = pathlib.Path(self.src_file)
+        self._check_move_file(src, self.dst_dir, self.dst_file)
+
+    def test_move_file_to_dir_pathlike_dst(self):
+        # Move a file to another pathlike location on the same filesystem.
+        dst = pathlib.Path(self.dst_dir)
+        self._check_move_file(self.src_file, dst, self.dst_file)
+
     @mock_rename
     def test_move_file_other_fs(self):
         # Move a file to an existing dir on another filesystem.
diff --git a/Misc/NEWS.d/next/Library/2018-02-13-17-58-30.bpo-32689.a-3SnH.rst b/Misc/NEWS.d/next/Library/2018-02-13-17-58-30.bpo-32689.a-3SnH.rst
new file mode 100644
index 000000000000..d435351a3a75
--- /dev/null
+++ b/Misc/NEWS.d/next/Library/2018-02-13-17-58-30.bpo-32689.a-3SnH.rst
@@ -0,0 +1,2 @@
+Update :func:`shutil.move` function to allow for Path objects to be used as
+source argument. Patch by Emily Morehouse and Maxwell "5.13b" McKinnon.



More information about the Python-checkins mailing list