[issue13779] os.walk: bottom-up
New submission from patrick vrijlandt <patrick.vrijlandt@gmail.com>:

PythonWin 3.2 (r32:88445, Feb 20 2011, 21:29:02) [MSC v.1500 32 bit (Intel)] on win32.
Portions Copyright 1994-2008 Mark Hammond - see 'Help/About PythonWin' for further copyright information.

>>> import os
>>> os.makedirs("g:/a/b/c")
>>> os.listdir("g:/a")
['b']
>>> for root, dirs, files in os.walk("g:/a", topdown = False):
...     print(root, dirs, files, os.listdir(root))
...     os.rmdir(root)
...
g:/a\b\c [] [] []
g:/a\b ['c'] [] []
g:/a ['b'] [] []
From the documentation of os.walk: If topdown is False, the triple for a directory is generated after the triples for all of its subdirectories (directories are generated bottom-up).
As the above example shows, the directories are generated in the correct order, "generated" referring to yield from generator os.walk. However, the generated (files? and) dirs do not necessarily reflect the current situation as produced by os.listdir. Therefore, this does not clear the entire directory tree as I would expect.
>>> os.makedirs("g:/a/b/c")
>>> for root, dirs, files in os.walk("g:/a", topdown = False):
...     print(root, dirs, files, os.listdir(root))
...     if not (files + dirs):
...         os.rmdir(root)
...
g:/a\b\c [] [] []
g:/a\b ['c'] [] []
g:/a ['b'] [] ['b']
I think that at least the documentation should be clearer on this issue. Even better would be if files + dirs matched os.listdir at the moment they are generated (i.e. yielded).

----------
assignee: docs@python
components: Documentation
messages: 151169
nosy: docs@python, patrick.vrijlandt
priority: normal
severity: normal
status: open
title: os.walk: bottom-up
type: behavior
versions: Python 3.2

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue13779>
_______________________________________
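The behaviour reported above can be worked around by re-listing each directory at the moment it is yielded, rather than trusting the dirs and files lists that os.walk read earlier. The following is only a minimal sketch of that idea: the helper name remove_empty_dirs and the temporary directory layout are assumptions made for illustration and are not part of the report.

```python
import os
import tempfile


def remove_empty_dirs(top):
    """Remove every directory at or below top that is empty when visited.

    Hypothetical helper, not part of the os module.
    """
    removed = []
    for root, dirs, files in os.walk(top, topdown=False):
        # dirs and files were read before the subdirectories were visited,
        # so they may be stale; ask the filesystem again.
        if not os.listdir(root):
            os.rmdir(root)
            removed.append(root)
    return removed


# Throwaway tree: a/b/c is a chain of empty directories, keep/ holds one file.
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "a", "b", "c"))
os.makedirs(os.path.join(base, "keep"))
with open(os.path.join(base, "keep", "data.txt"), "w") as fh:
    fh.write("x")

removed = remove_empty_dirs(base)
print(removed)
```

With this layout the chain a/b/c, a/b, a is removed innermost first, while keep/ and the top directory survive because they are not empty when they are visited.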
Changes by Ross Lagerwall <rosslagerwall@gmail.com>:

----------
nosy: +rosslagerwall
Zbyszek Szmek <zbyszek@in.waw.pl> added the comment:

Hi,
I think that the documentation is pretty clear ("[if topdown=False] ... the directories in dirnames have already been generated by the time dirnames itself is generated"). And such behaviour is what one would expect, since it is the result of the simplest implementation (listdir(), yield <subdir>, yield <subdir>, yield <subdir>, yield <dir>).

I don't think that the implementation will be changed, since it is pretty easy to do listdir() yourself if needed, and it would make the function slower for the common use case.

Improving the documentation is always possible. What sentence would you like to see added (what would make the behaviour clearer to you)?

----------
nosy: +zbysz
Antoine Pitrou <pitrou@free.fr> added the comment:
> However, the generated (files? and) dirs do not necessarily reflect the current situation as produced by os.listdir.
What do you mean exactly? Another process has re-created "b" in parallel? This race condition is pretty impossible to solve in the general case (unless the filesystem is transactional): even if you call os.listdir() again, a new file or dir could appear just before the following call to os.rmdir().

(just for the record, shutil.rmtree already allows you to delete a filesystem tree)

----------
nosy: +pitrou
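Because no re-check can close the window between listing a directory and removing it, one possible approach is to skip the check and let os.rmdir act as the emptiness test: it raises OSError when the directory is not empty. The sketch below assumes that behaviour; the helper name prune_empty_dirs and the sample tree are hypothetical.

```python
import os
import tempfile


def prune_empty_dirs(top):
    """Attempt os.rmdir on every directory bottom-up (hypothetical helper)."""
    for root, _dirs, _files in os.walk(top, topdown=False):
        try:
            # Succeeds only if root is empty at this very moment.
            os.rmdir(root)
        except OSError:
            # Not empty, already gone, or not removable: leave it alone.
            pass


# Throwaway tree: a/b/c is empty throughout, d/ contains a file.
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "a", "b", "c"))
os.makedirs(os.path.join(base, "d"))
with open(os.path.join(base, "d", "note.txt"), "w") as fh:
    fh.write("x")

prune_empty_dirs(base)
print(os.listdir(base))
```

Note that catching OSError this broadly also hides permission errors, and that the top directory itself is removed if it ends up empty. Whether either is acceptable depends on the use case.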
patrick vrijlandt <patrick.vrijlandt@gmail.com> added the comment:

The documentation of this function is generally ambiguous, because os.walk is a generator. Thus "generate" means (1) yielded from a generator and (2) prepared for later use within the generator. To avoid the ambiguity, "generate" should be avoided if possible. (1) might be replaced by "return" and (2) by "prepare".

@zbysz The paragraph you cite is about "Modifying dirnames". If you are not "Modifying dirnames" (like me), this section easily escapes your attention. My problem was, of course, that I had changed the directory structure myself. It's not a race condition with another process but an effect of the loop os.walk was managing.

@pitrou shutil.rmtree cannot help me, because I'm only deleting empty dirs.

Proposal (very verbose I'm afraid):

If you change the directory structure below dirpath while topdown=True, you can modify dirnames in-place so that walk will visit the right directories. If you change the directory structure below dirpath while topdown=False (maybe while you were there), dirnames and filenames can still reflect the old situation and it might be necessary to call os.listdir again.
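For readers unfamiliar with the "modify dirnames in-place" technique mentioned in the proposal, here is a small sketch of it with topdown=True. The helper name walk_skipping, the skip set and the directory layout are illustrative assumptions only.

```python
import os
import tempfile


def walk_skipping(top, skip_names):
    """Walk top-down, not descending into directories named in skip_names.

    Hypothetical helper; returns the visited paths relative to top.
    """
    visited = []
    for root, dirs, _files in os.walk(top, topdown=True):
        # Slice assignment edits the very list os.walk will iterate next;
        # rebinding the name (dirs = [...]) would have no effect on the walk.
        dirs[:] = sorted(d for d in dirs if d not in skip_names)
        visited.append(os.path.relpath(root, top))
    return visited


# Throwaway tree: src/pkg should be visited, .git/objects should not.
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "src", "pkg"))
os.makedirs(os.path.join(base, ".git", "objects"))

visited = walk_skipping(base, {".git"})
print(visited)
```

This only works top-down: with topdown=False the subdirectories have already been visited by the time the caller sees dirnames, so editing the list changes nothing.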
Zbyszek Szmek <zbyszek@in.waw.pl> added the comment:
> The documentation of this function is generally ambiguous, because os.walk is a generator. Thus "generate" means (1) yielded from a generator and (2) prepared for later use within the generator. To avoid the ambiguity, "generate" should be avoided if possible. (1) might be replaced by "return" and (2) by "prepare".

I think that you are wrong here: "generate" is consistently used in meaning (1) in the docstring. Also, "return" means something completely different. "generate" is better than "prepare", because it is more specific, especially from the POV of the user of the function, who only cares about when something is yielded, not when it is prepared.

> Proposal (very verbose I'm afraid): If you change the directory structure below dirpath while topdown=True, you can modify dirnames in-place so that walk will visit the right directories. If you change the directory structure below dirpath while topdown=False (maybe while you were there), dirnames and filenames can still reflect the old situation and it might be necessary to call os.listdir again.

Hm, I think that the difference in behaviour between topdown=False and topdown=True is smaller than this proposal implies. When topdown=True, the list is not updated after changes on disk. So neither removing nor adding directories causes them to be removed from or added to the list of subdirectories to visit. Nevertheless, the default behaviour on error is to do nothing, so it _looks_ like they were skipped.

So I think that the documentation could only clarify that the list of subdirectories is queried first, before the dir or the subdirectories are "generated".
Zbyszek Szmek <zbyszek@in.waw.pl> added the comment:
>>> os.makedirs('/tmp/a/b/c')
>>> gen = os.walk('/tmp/a')
>>> next(gen)
('/tmp/a', ['b'], [])
>>> os.makedirs('/tmp/a/b2')
>>> next(gen)
('/tmp/a/b', ['c'], [])
>>> next(gen)
('/tmp/a/b/c', [], [])

>>> gen = os.walk('/tmp/a', onerror=print)
>>> next(gen)
('/tmp/a', ['b2', 'b'], [])
>>> os.rmdir('/tmp/a/b2')
>>> next(gen)
[Errno 2] No such file or directory: '/tmp/a/b2'
('/tmp/a/b', ['c'], [])
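The second session can also be reproduced non-interactively by collecting the errors instead of printing them. This is just a sketch: the temporary tree and the names gone and stays are made up, and it relies on os.walk passing an OSError instance to the onerror callback.

```python
import os
import tempfile

# Throwaway tree with two subdirectories.
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "gone"))
os.makedirs(os.path.join(base, "stays"))

errors = []
gen = os.walk(base, onerror=errors.append)

# The listing of base is read here, before anything below it is visited.
root, dirs, files = next(gen)

# Change the tree mid-walk: the stale listing still names "gone".
os.rmdir(os.path.join(base, "gone"))

# os.walk tries to list the vanished directory, reports the failure to
# onerror, and carries on with the remaining entries.
rest = [r for r, _d, _f in gen]

print(sorted(dirs), rest, errors)
```

Without an onerror callback the failure is silently ignored, which is why a removed directory merely looks as if it had been skipped.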
Zbyszek Szmek <zbyszek@in.waw.pl> added the comment:

Maybe this could be closed?

I'm attaching a small patch with a documentation clarification. (Adds "In either case the list of subdirectories is retrieved before the tuples for the directory and its subdirectories are generated.").

----------
keywords: +patch
Added file: http://bugs.python.org/file24603/issue13779.patch
patrick vrijlandt <patrick.vrijlandt@gmail.com> added the comment:

Good solution. +1 for closing.

Patrick
Mark Lawrence added the comment:

The OP is happy with the proposed doc patch, so can we have this committed, please?

----------
nosy: +BreamoreBoy
Roundup Robot added the comment:

New changeset 351c1422848f by Benjamin Peterson in branch '2.7':
clarify when the list of subdirectories is read (closes #13779)
http://hg.python.org/cpython/rev/351c1422848f

New changeset b10322b5ef0f by Benjamin Peterson in branch '3.4':
clarify when the list of subdirectories is read (closes #13779)
http://hg.python.org/cpython/rev/b10322b5ef0f

New changeset 1411df211159 by Benjamin Peterson in branch 'default':
merge 3.4 (#13779)
http://hg.python.org/cpython/rev/1411df211159

----------
nosy: +python-dev
resolution: -> fixed
stage: -> resolved
status: open -> closed
participants (6)
- Antoine Pitrou
- Mark Lawrence
- patrick vrijlandt
- Ross Lagerwall
- Roundup Robot
- Zbyszek Szmek