New GitHub issue #111321 from cordery:<br>
<hr>
<pre>
# Documentation
When walking a directory tree using `os.walk` with `topdown=True`, the documentation indicates that we may mutate the `dirnames` list in place to alter which subdirectories will be recursed into.
## The Problem
This is a useful behavior; however, it is rather unusual and can lead the unwary programmer to write buggy code they would not otherwise write. For example, here is what I imagine is one of the most likely usage patterns for this function, from Django:
Line 93 of django/core/management/commands/compilemessages.py:
```python
for dirpath, dirnames, filenames in os.walk(".", topdown=True):
    for dirname in dirnames:
        if is_ignored_path(
            os.path.normpath(os.path.join(dirpath, dirname)), ignore_patterns
        ):
            dirnames.remove(dirname)
        elif dirname == "locale":
            basedirs.append(os.path.join(dirpath, dirname))
```
Notice that `dirnames` is modified in place by the `dirnames.remove(dirname)` line while the loop is still iterating over it. As a result, any dirname directly after an 'ignored' name is never seen by the loop body, so it is neither checked against the ignore patterns nor tested for `"locale"`. I believe this is because the list iterator tracks a position index: removing an item shifts the following items down by one, so the former next item lands on the index that was just processed and is skipped.
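The skipping can be reproduced with a plain list, no filesystem needed. This is only a minimal sketch: the names and the `ignored` set are made up for illustration and stand in for Django's `is_ignored_path` check.

```python
# Hypothetical names: "a" and "b" are both meant to be ignored.
dirnames = ["a", "b", "c"]
ignored = {"a", "b"}

visited = []
for dirname in dirnames:
    visited.append(dirname)
    if dirname in ignored:
        # Removing shifts later items left, but the loop's index still advances.
        dirnames.remove(dirname)

print(visited)   # ['a', 'c']: "b" was never examined
print(dirnames)  # ['b', 'c']: "b" survives even though it should be ignored
```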
Changing the line to `for dirname in list(dirnames):` would correct the problem by iterating over a copy instead of the original list that we are modifying.
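To check the effect against `os.walk` itself, here is a rough sketch that builds a throwaway tree and walks it with and without the copy. The helper `walk_pruned`, the `ignored` set, and the directory names are all hypothetical and not taken from Django.

```python
import os
import tempfile

# Hypothetical layout: "a" and "b" should be pruned, "c" should be walked.
ignored = {"a", "b"}

def walk_pruned(root, use_copy):
    """Return the sorted paths (relative to root) that os.walk visited."""
    seen = []
    for dirpath, dirnames, filenames in os.walk(root, topdown=True):
        dirnames.sort()  # os.walk makes no ordering promise; sort for a stable demo
        for dirname in (list(dirnames) if use_copy else dirnames):
            if dirname in ignored:
                dirnames.remove(dirname)
        seen.append(os.path.relpath(dirpath, root))
    return sorted(seen)

with tempfile.TemporaryDirectory() as root:
    for name in ("a", "b", "c"):
        os.makedirs(os.path.join(root, name))
    buggy = walk_pruned(root, use_copy=False)
    fixed = walk_pruned(root, use_copy=True)

print(buggy)  # ['.', 'b', 'c']: "b" slipped through and was walked
print(fixed)  # ['.', 'c']
```

Slice assignment, which the documentation already mentions, avoids the inner loop entirely: `dirnames[:] = [d for d in dirnames if d not in ignored]`.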
Here is the issue in the Django tracker for further exploration and example code: https://code.djangoproject.com/ticket/34925#comment:3
## Possible Fix
I think that perhaps the following paragraph of the documentation could be amended:
> When topdown is True, the caller can modify the dirnames list in-place (perhaps using [del](https://docs.python.org/3/reference/simple_stmts.html#del) or slice assignment), and [walk()](https://docs.python.org/3/library/os.html#os.walk) will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, impose a specific order of visiting, or even to inform [walk()](https://docs.python.org/3/library/os.html#os.walk) about directories the caller creates or renames before it resumes [walk()](https://docs.python.org/3/library/os.html#os.walk) again. Modifying dirnames when topdown is False has no effect on the behavior of the walk, because in bottom-up mode the directories in dirnames are generated before dirpath itself is generated.
To (added in bold):
> When topdown is True, the caller can modify the dirnames list in-place (perhaps using [del](https://docs.python.org/3/reference/simple_stmts.html#del) or slice assignment), and [walk()](https://docs.python.org/3/library/os.html#os.walk) will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, impose a specific order of visiting, or even to inform [walk()](https://docs.python.org/3/library/os.html#os.walk) about directories the caller creates or renames before it resumes [walk()](https://docs.python.org/3/library/os.html#os.walk) again.
> **Caution: Do not modify the dirnames list while iterating through it, or you may unintentionally skip names. Instead, iterate through a copy of dirnames (e.g. `for dirname in list(dirnames)`).**
> Modifying dirnames when topdown is False has no effect on the behavior of the walk, because in bottom-up mode the directories in dirnames are generated before dirpath itself is generated.
Or something to that effect. Thoughts?
</pre>
<hr>
<a href="https://github.com/python/cpython/issues/111321">View on GitHub</a>
<p>Labels: docs</p>
<p>Assignee: </p>