Backward compatibility of shutil.rmtree

Hi, as our shutil.rmtree() is vulnerable to symlink attacks (see <http://bugs.python.org/issue4489>) I’ve implemented a safe version using os.fwalk() and os.unlinkat() for Python 3.3. Now we face a problem I’d like a broad opinion on: rmtree has a callback hook called `onerror` that that gets called with amongst others the function that caused the error (see <http://docs.python.org/dev/library/shutil.html#shutil.rmtree>). Two of them differ in the new version: os.fwalk() is used instead of os.listdir() and os.unlinkat() instead of os.remove(). The safe version is used transparently if available, so this could potentially break code. Also it would mean that rmtree would behave differently on Linux & OS X for example. I’ve been thinking to "fake" the function names, as they map pretty good anyway. I.e. call onerror with os.listdir if os.fwalk failed and with os.remove instead of os.unlinkat. That could also make sense if some kind soul writes a safe rmtree for Windows or OS X so the function works the same across all platforms. It's a bit ugly though, a cleaner way would be to start using well defined symbols, but that would break code for sure. Opinions? Cheers, Hynek

Two of them differ in the new version: os.fwalk() is used instead of os.listdir() and os.unlinkat() instead of os.remove().
It would be os.flistdir instead of os.listdir, not os.fwalk, right? The way this interface is defined, it's IMO best to do "precise" reporting, i.e. pass the exact function that caused the failure. I'd weaken the documentation to just specify that the error-causing function is reported, indicating that the exact set of functions may depend on the operating system and change across Python versions. Regards, Martin

It’s actually os.fwalk. It has been implemented by Charles-François as a dependency of the ticket because it seemed generally useful – therefore I used it for the implementation. (There has been also been the idea to re-implement the default rmdir with os.walk to make them more similar; but that's a different story.)
So you suggest to not mention all the possible functions at all? That seems useful to me, as the list will (hopefully) grow anyway and nailing it down is getting less useful with every new implementation. Regards, Hynek

Zitat von Hynek Schlawack <hs@ox.cx>:
I think that's a mistake then, because of the limited error reporting. With os.fwalk, you don't know exactly what it is that failed, but it may be useful to know. So I propose to duplicate the walking in rmtree. I also wonder how exactly in your implementation directory handles get closed, and how that correlates to attempts at removing the directories.
(There has been also been the idea to re-implement the default rmdir with os.walk to make them more similar; but that's a different story.)
-1 on that, for the reasons above.
Exactly. Users would have to look at the code, but that will make them aware that the code may change. For that reason, also, using fwalk is a bad idea, since they then will need to trace their code reading into fwalk. Regards, Martin

Am 20.05.12 23:46, schrieb martin@v.loewis.de:
Well, as fwalk does only directory traversing, it means that something went wrong while doing so. The exception should be more helpful at this point, no?
So I propose to duplicate the walking in rmtree.
I'm -1 on that one; the information gain doesn’t seem that big to me and doing fwalk right isn't trivial (see <http://hg.python.org/cpython/file/e0f997a7aaa5/Lib/os.py#l305>). It’s easy to do a copy’n’paste now but the trade-off of having to maintain both for a bit more of information from a high level function doesn’t seem worth to me.
Directory handles get closed inside of fwalk (try/finally) – but I think it’s easier if you take a quick look yourself before I explain things to you you didn’t want to know. :) Regards, Hynek
participants (3)
-
"Martin v. Löwis"
-
Hynek Schlawack
-
martin@v.loewis.de