[New-bugs-announce] [issue32453] shutil.rmtree can have O(n^2) performance on large dirs
report at bugs.python.org
Sat Dec 30 00:15:09 EST 2017
New submission from Niklas Hambüchen <nh2 at deditus.de>:
See http://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=24412edeaf556a for the explanation and equivalent fix in coreutils.
The gist ist that deleting entries in inode order can improve deletion performance dramatically.
To obtain inode numbers and sort by them, one needs to `getdents()` all the entries ahead of time, but rmtree() already gets all dirents ahead of the deletion. https://bugs.python.org/issue28564 recently improved shutil.rmtree() performance by using scandir(), but nevertheless the returned generator is list()ed, so we already have all necessary informtaion in memory and would just have to perform an O(n) integer sort.
I propose we check if the improvements made to `rm -r` in coreutils should be ported to shutil.rmtree().
components: Library (Lib)
title: shutil.rmtree can have O(n^2) performance on large dirs
versions: Python 3.7
Python tracker <report at bugs.python.org>
More information about the New-bugs-announce