[Python-ideas] Please consider skipping hidden directories in os.walk, os.fwalk, etc.
Steve Barnes
gadgetsteve at live.co.uk
Mon May 7 02:05:15 EDT 2018
In a lot of uses of os.walk it is desirable to skip version control
directories (which are usually hidden directories), to the point that
almost all of the examples given look like:
    import os
    for root, dirs, files in os.walk(some_dir):
        if 'CVS' in dirs:
            dirs.remove('CVS')  # or .svn or .hg etc.
        # do something...
But of course there are many version control systems, to the point that
much of my personal code looks like this (note that I have to use a
multitude of version control systems due to project requirements):
    import os
    # Version control directory names I know
    vcs_dirs = ['.hg', '.svn', 'CVS', '.git', '.bzr']
    for root, dirs, files in os.walk(some_dir):
        for dirname in vcs_dirs:
            if dirname in dirs:  # remove() raises ValueError if absent
                dirs.remove(dirname)
I am sure that I am missing many other version control systems, but the
one thing that all of the ones I am familiar with have in common is that
they default to creating their files in hidden directories. I know that
the above sometimes hits problems on Windows, where file names are not
case-sensitive, if someone manually created a directory and you end up
with aberrations such as Cvs\ or .SVN ....
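One way around the case problem is sketched below. The helper name
walk_skipping_vcs and the VCS_DIRS set are purely illustrative and are
not part of os. The idea is to compare names case-insensitively and
prune with a slice assignment instead of repeated remove() calls:

```python
import os

# Illustrative set of VCS directory names, stored lower-case.
VCS_DIRS = {'.hg', '.svn', 'cvs', '.git', '.bzr'}

def walk_skipping_vcs(top):
    """Yield (root, dirs, files) like os.walk, pruning VCS directories."""
    for root, dirs, files in os.walk(top):
        # Slice assignment edits the very list os.walk holds, so the
        # pruned directories are never descended into (top-down only).
        dirs[:] = [d for d in dirs if d.lower() not in VCS_DIRS]
        yield root, dirs, files
```

This still relies on knowing every VCS name in advance, which is the
limitation a general "skip hidden directories" option would remove.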
Since it could be argued that hidden directories are possibly more
common than symlinks (especially in the Windows world, of course), and
that hidden directories have normally been hidden by someone for a
reason, it seems to make sense to me to normally ignore them in
directory traversal.
Obviously there are also occasions when it makes sense to include VCS,
or other hidden, directories and their files (e.g. "Where did all of my
disk space go?" or "delete recursively"), so I would like to suggest
including in the os.walk family of functions an additional parameter to
control skipping all hidden directories - either positively or negatively.
Names that spring to mind include:
* nohidden
* nohidden_dirs
* hidden
* hidden_dirs
This change could be made with no impact on current behaviour by
defaulting to hidden=True (or nohidden=False), which would just about
ensure that no existing code is broken. Alternatively, quite a few bugs
in existing code could be quietly fixed (and some new ones introduced)
by defaulting to skipping hidden directories.
Since the implementation of os.walk has changed to use os.scandir, which
exposes each entry's file status through os.DirEntry.stat(), the
overhead should be minimal.
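As a rough illustration of why the cost could stay low, here is a
minimal sketch of a hidden check on os.DirEntry objects. The names
is_hidden and visible_subdirs are hypothetical, and treating "dot
prefix, or the Windows hidden attribute" as the definition of hidden is
my assumption rather than a settled rule:

```python
import os
import stat

def is_hidden(entry):
    """Best-effort check whether an os.DirEntry is hidden (assumed rule)."""
    if entry.name.startswith('.'):
        return True  # dot-prefix convention; a plain string test
    if os.name == 'nt':
        # On Windows the stat result carries the file attribute bits.
        attrs = entry.stat(follow_symlinks=False).st_file_attributes
        return bool(attrs & stat.FILE_ATTRIBUTE_HIDDEN)
    return False

def visible_subdirs(path):
    """Return the sorted names of non-hidden subdirectories of path."""
    with os.scandir(path) as it:
        return sorted(e.name for e in it
                      if e.is_dir(follow_symlinks=False) and not is_hidden(e))
```

On POSIX this never calls stat() at all for the hidden test, and on
Windows the attributes are normally already cached on the DirEntry from
the directory scan.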
An alternative would be to add another new function, say os.vwalk(), to
only walk visible entries.
Note that a decision would have to be made on whether to apply such
filtering when topdown is False. Personally I am tempted to apply the
filtering so as to maintain consistency. However, ignoring the filter
when topdown is False (or when topdown is False and the hidden
behaviour is unspecified) might make sense if the skipping of hidden
directories becomes the new default: recursively removing files and
directories would then still process hidden items by default.
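The topdown question matters because with os.walk(topdown=False) the
dirs lists are generated after the subdirectories have already been
walked, so editing dirs cannot prune anything. The sketch below shows
one way a filter could behave consistently in both orders. It is only
an illustration: walk_filtered and its hidden flag are hypothetical,
"hidden" is taken to mean dot-prefixed, and error handling (onerror) is
left out:

```python
import os

def walk_filtered(top, topdown=True, hidden=True):
    """Sketch of an os.walk-like generator with a hypothetical `hidden` flag.

    hidden=True keeps today's behaviour; hidden=False skips dot-prefixed
    directories in both top-down and bottom-up order.
    """
    with os.scandir(top) as it:
        entries = list(it)
    dirs = [e.name for e in entries if e.is_dir()]
    files = [e.name for e in entries if not e.is_dir()]
    if not hidden:
        # Filter before descending, so it works for either order.
        dirs = [d for d in dirs if not d.startswith('.')]
    if topdown:
        yield top, dirs, files
    for d in dirs:
        path = os.path.join(top, d)
        if not os.path.islink(path):  # mirror os.walk's followlinks=False
            yield from walk_filtered(path, topdown, hidden)
    if not topdown:
        yield top, dirs, files
```

A recursive delete would call this with hidden=True (the default here)
and topdown=False, so hidden items are still processed.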
If this receives a positive response I would be happy to undertake the
effort involved in producing a PR.
--
Steve (Gadget) Barnes
Any opinions in this message are my personal opinions and do not reflect
those of my employer.