[Python-ideas] Please consider skipping hidden directories in os.walk, os.fwalk, etc.

Nathaniel Smith njs at pobox.com
Wed May 9 11:04:22 EDT 2018

There are hidden directories, and then there are hidden directories :-). It
makes sense to me to add an option to the stdlib functions to skip
directories (and files) that the system considers hidden, so I guess that
means dotfiles on Unix and files with the hidden attribute on Windows. But
if you want "smart" matching that has special knowledge of CVS directories
and so forth, then that seems like something that would fit better as a
library on PyPI.
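A minimal sketch of the "system considers hidden" check described above. It assumes the two conventions named in the message: a leading dot on Unix-like systems and the hidden file attribute on Windows. The helper name is_hidden is hypothetical and is not an existing os function. It relies on os.DirEntry, stat.FILE_ATTRIBUTE_HIDDEN and the Windows-only st_file_attributes field.

```python
import os
import stat
import sys


def is_hidden(entry: os.DirEntry) -> bool:
    """Return True if the platform would treat this directory entry as hidden.

    Assumption: "hidden" means a leading dot on Unix-like systems and the
    FILE_ATTRIBUTE_HIDDEN attribute on Windows.
    """
    if sys.platform == "win32":
        # On Windows, DirEntry.stat() is filled in from the directory listing,
        # so st_file_attributes is normally available without another system call.
        attrs = entry.stat(follow_symlinks=False).st_file_attributes
        return bool(attrs & stat.FILE_ATTRIBUTE_HIDDEN)
    # Unix convention: names starting with "." are hidden.
    return entry.name.startswith(".")
```

Usage would look like: with os.scandir(path) as it: visible = [e.name for e in it if not is_hidden(e)].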

The Rust "ignore" crate has a pretty good set of semantics, for reference.
It's not trivial, but it sure is handy :-):



On Tue, May 8, 2018, 00:43 Steve Barnes <gadgetsteve at live.co.uk> wrote:

> In a lot of uses of os.walk it is desirable to skip version control
> directories, (which are usually hidden directories), to the point that
> almost all of the examples given look like:
> import os
> for root, dirs, files in os.walk(some_dir):
>      if 'CVS' in dirs:
>          dirs.remove('CVS')  # or .svn or .hg etc.
>      # do something...
> But of course there are many version control systems, to the point that
> much of my personal code looks like this (note that I have to use a
> multitude of version control systems due to project requirements):
> import os
> vcs_dirs = ['.hg', '.svn', 'CVS', '.git', '.bzr']  # Version control directory names I know
> for root, dirs, files in os.walk(some_dir):
>      for dirname in vcs_dirs:
>          if dirname in dirs:  # dirs.remove() raises ValueError if the name is absent
>              dirs.remove(dirname)
>      # do something...
> I am sure that I am missing many other version control systems, but
> all of the ones that I am familiar with default to creating their files
> in hidden directories. I know that the above sometimes hits problems on
> Windows, where names are case-insensitive: if someone created a directory
> manually, you can end up with mis-cased names such as Cvs\ or .SVN that
> the list does not match.
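One way to sidestep the Windows case problem, sketched under the assumption that os.path.normcase() is an acceptable notion of "same name": it lowercases names on Windows and leaves them unchanged elsewhere, so matching stays exact where the file system is case-sensitive. The function name walk_without_vcs and the VCS_DIRS set are illustrative only.

```python
import os

# Directory names used by some common version control systems (not exhaustive).
VCS_DIRS = {os.path.normcase(name) for name in (".git", ".hg", ".svn", ".bzr", "CVS")}


def walk_without_vcs(top):
    """Yield os.walk() triples with version control directories pruned."""
    for root, dirs, files in os.walk(top):
        # normcase() lowercases on Windows, so "Cvs" or ".SVN" still match there;
        # on other platforms it is a no-op and the comparison stays exact.
        # Slice assignment edits the same list os.walk() uses to decide where to descend.
        dirs[:] = [d for d in dirs if os.path.normcase(d) not in VCS_DIRS]
        yield root, dirs, files
```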
> Since it could be argued that hidden directories are possibly more
> common than symlinks (which os.walk already lets you control through
> its followlinks parameter), especially in the Windows world, and that
> hidden directories have normally been hidden by someone for a reason,
> it seems to make sense to me to normally ignore them in directory
> traversal.
> Obviously there are also occasions when it makes sense to include VCS
> or other hidden directories and files (e.g. "Where did all of my disk
> space go?" or "delete recursively"), so I would like to suggest adding
> to the os.walk family of functions a parameter that controls skipping
> all hidden directories, phrased either positively or negatively.
> Names that spring to mind include:
>   * nohidden
>   * nohidden_dirs
>   * hidden
>   * hidden_dirs
> This change could be made with no impact on current behaviour by
> defaulting to hidden=True (or nohidden=False), which would just about
> ensure that no existing code is broken. Alternatively, making skipping
> the default could quietly fix quite a few bugs in existing code (and
> introduce some new ones).
> Since os.walk is now implemented on top of os.scandir, which exposes
> each entry's status through os.DirEntry.stat() (on Windows usually
> without an extra system call), and on Unix a dot-prefix check needs no
> stat() call at all, the overhead should be minimal.
> An alternative would be to add a new function, say os.vwalk(), that
> only walks visible entries.
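A rough sketch of what the proposed parameter could look like, assuming the name hidden (from the list above) with hidden=True as the backwards-compatible default, and os.vwalk() as a thin wrapper. Neither exists in the standard library; walk() and vwalk() here are hypothetical stand-ins. Filtering happens while os.scandir() lists each directory, so it also applies when topdown is False. Error handling (os.walk's onerror) and followlinks are left out for brevity.

```python
import os
import stat
import sys


def _is_hidden(entry):
    # Assumed meaning of "hidden": a leading dot on Unix-like systems,
    # the FILE_ATTRIBUTE_HIDDEN attribute on Windows.
    if sys.platform == "win32":
        attrs = entry.stat(follow_symlinks=False).st_file_attributes
        return bool(attrs & stat.FILE_ATTRIBUTE_HIDDEN)
    return entry.name.startswith(".")


def walk(top, topdown=True, hidden=True):
    """Sketch of os.walk() with a hypothetical ``hidden`` parameter.

    hidden=True keeps today's behaviour; hidden=False skips hidden files and
    directories. Error handling and followlinks are not implemented.
    """
    dirs, files = [], []
    with os.scandir(top) as it:
        for entry in it:
            # Filter while listing, so hidden entries are skipped in both modes.
            if not hidden and _is_hidden(entry):
                continue
            if entry.is_dir():
                dirs.append(entry.name)
            else:
                files.append(entry.name)
    if topdown:
        # Yield first; callers may still prune ``dirs`` in place, as with os.walk().
        yield top, dirs, files
    for name in dirs:
        path = os.path.join(top, name)
        # Like os.walk()'s default followlinks=False: list symlinked dirs, don't enter them.
        if not os.path.islink(path):
            yield from walk(path, topdown, hidden)
    if not topdown:
        yield top, dirs, files


def vwalk(top, topdown=True):
    """Hypothetical os.vwalk(): walk visible entries only."""
    return walk(top, topdown=topdown, hidden=False)
```

With this sketch, iterating over vwalk(some_dir) would never see .git or .hg, while walk(some_dir) lists everything, as os.walk(some_dir) does.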
> Note that a decision would have to be made on whether to apply such
> filtering when topdown is False (in that mode a caller cannot prune
> dirs, because the subdirectories are walked before their parent is
> yielded). Personally, I am tempted to apply the filtering anyway, for
> consistency, but ignoring the filter when topdown is False (or when
> topdown is False and the hidden behaviour is unspecified) might make
> sense if skipping hidden directories becomes the new default: recursively
> removing files and directories would then still process hidden items by
> default.
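The topdown question matters because the usual pruning idiom only works top-down: the os.walk() documentation notes that modifying dirnames has no effect when topdown is False, since subdirectories are generated before their parent. A small demonstration, assuming the Unix dot-prefix convention for "hidden" (the helper roots() is illustrative only):

```python
import os
import tempfile


def roots(top, topdown):
    """Collect relative dirpaths from os.walk() while trying to prune dot-directories."""
    seen = []
    for root, dirs, files in os.walk(top, topdown=topdown):
        # The usual in-place pruning idiom; it only affects the walk when topdown=True.
        dirs[:] = [d for d in dirs if not d.startswith(".")]
        seen.append(os.path.relpath(root, top))
    return sorted(seen)


with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, ".git", "objects"))
    os.makedirs(os.path.join(top, "src"))
    print(roots(top, topdown=True))   # ['.', 'src']
    print(roots(top, topdown=False))  # ['.', '.git', '.git/objects', 'src'] on POSIX
```

So a bottom-up caller who wants hidden directories skipped needs the filtering to happen inside the walk itself, not in the loop body.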
> If this receives a positive response I would be happy to undertake the
> effort involved in producing a PR.
> --
> Steve (Gadget) Barnes
> Any opinions in this message are my personal opinions and do not reflect
> those of my employer.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/