[Python-ideas] Please consider skipping hidden directories in os.walk, os.fwalk, etc.

Steve Barnes gadgetsteve at live.co.uk
Mon May 7 02:05:15 EDT 2018


In a lot of uses of os.walk it is desirable to skip version control
directories (which are usually hidden directories), to the point that
almost all of the examples given look like:

import os
for root, dirs, files in os.walk(some_dir):
     if 'CVS' in dirs:
         dirs.remove('CVS')  # or .svn or .hg etc.
     # do something...

But of course there are many version control systems, to the point that
much of my personal code looks like this (note that I have to use a
multitude of version control systems due to project requirements):


import os

# Version control directory names I know
vcs_dirs = ['.hg', '.svn', 'CVS', '.git', '.bzr']

for root, dirs, files in os.walk(some_dir):
     for dirname in vcs_dirs:
         if dirname in dirs:  # remove() raises ValueError if absent
             dirs.remove(dirname)

I am sure that I am missing many other version control systems, but the
one thing that all of the ones I am familiar with have in common is that
they default to creating their files in hidden directories. I know that
the above sometimes hits problems on Windows if someone manually created
a directory and you end up with variants such as Cvs\ or .SVN, which an
exact-case name match misses.
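
One way to cope with mixed-case names such as .SVN is to compare in
lower case and prune the list in place. This is only a minimal sketch:
walk_without_vcs and VCS_DIRS are hypothetical names used for
illustration, and the set of directory names is no more complete than
the list above.

```python
import os

# Hypothetical name set, compared in lower case so that variants such as
# 'Cvs' or '.SVN' are caught as well.
VCS_DIRS = {'.hg', '.svn', 'cvs', '.git', '.bzr'}


def walk_without_vcs(top):
    """Yield os.walk() tuples, never descending into VCS directories."""
    for root, dirs, files in os.walk(top):
        # Slice assignment edits the very list os.walk() holds, which is
        # what prunes the traversal when topdown is True (the default).
        dirs[:] = [d for d in dirs if d.lower() not in VCS_DIRS]
        yield root, dirs, files
```

Rebuilding the list with a slice assignment also avoids the ValueError
that dirs.remove() raises when a name is not present.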

Since it could be argued that hidden directories are possibly more
common than symlinks (especially in the Windows world of course), and
that hidden directories have normally been hidden by someone for a
reason, it seems to make sense to me to normally ignore them in
directory traversal.

Obviously there are also occasions when it makes sense to include VCS,
or other hidden, directories and files (e.g. "Where did all of my disk
space go?" or "delete recursively"), so I would like to suggest
including in the os.walk family of functions an additional parameter to
control skipping all hidden directories - either positively or negatively.

Names that spring to mind include:
  * nohidden
  * nohidden_dirs
  * hidden
  * hidden_dirs

This change could be made with no impact on current behaviour by
defaulting to hidden=True (or nohidden=False), which would just about
ensure that no existing code is broken. Alternatively, quite a few bugs
in existing code could be quietly fixed (and some new ones introduced)
by defaulting to skipping hidden directories.
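
To make the suggested parameter concrete, here is a rough sketch of a
wrapper around the existing os.walk. The function name
walk_with_hidden_flag is hypothetical, "hidden" is just one of the
candidate names listed above and not an existing os.walk argument, and
"hidden" is assumed here to mean only "name starts with a dot".

```python
import os


def walk_with_hidden_flag(top, hidden=True):
    """Wrap os.walk(); hidden=False prunes dot-prefixed directories.

    Illustration only: 'hidden' is a proposed parameter name, and the
    dot-prefix rule is an assumption about what counts as hidden.
    """
    for root, dirs, files in os.walk(top):
        if not hidden:
            # In-place pruning stops os.walk() from descending further.
            dirs[:] = [d for d in dirs if not d.startswith('.')]
        yield root, dirs, files
```

With hidden=True the wrapper behaves exactly like os.walk(top), which is
the backwards-compatible default; hidden=False only prunes directories,
so hidden files still appear in the files list.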

Since os.walk is now implemented on top of os.scandir, which exposes
each entry's file status through os.DirEntry.stat(), the overhead
should be minimal.
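
A possible shape for the per-entry check is sketched below. The helper
name is_hidden is hypothetical, and the definition of "hidden" (a
leading dot anywhere, plus the hidden file attribute on Windows) is an
assumption rather than anything os.walk currently defines. As far as I
know, DirEntry.stat(follow_symlinks=False) on Windows is answered from
the directory listing without a further system call.

```python
import os
import stat


def is_hidden(entry):
    """Best-effort test of whether an os.DirEntry is hidden (a sketch)."""
    if entry.name.startswith('.'):
        return True
    if os.name == 'nt':
        # st_file_attributes is only present on Windows stat results.
        attrs = entry.stat(follow_symlinks=False).st_file_attributes
        return bool(attrs & stat.FILE_ATTRIBUTE_HIDDEN)
    return False
```

On other platforms only the name is examined, so no stat call is made
at all.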

An alternative would be to add another new function, say os.vwalk(), to 
only walk visible entries.

Note that a decision would have to be made on whether to include such
filtering when topdown is False. Personally I am tempted to include the
filtering so as to maintain consistency, but ignoring the filter when
topdown is False (or if topdown is False and the hidden behaviour is
unspecified) might make sense if the skipping of hidden directories
becomes the new default; recursively removing files & directories would
then still include processing hidden items by default.
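
The topdown question matters because, with topdown=False, os.walk has
already visited the subdirectories by the time the caller sees dirs, so
pruning the list afterwards has no effect. Filtering in that mode has to
happen inside the traversal itself. The following is only a sketch under
simplifying assumptions: walk_visible is a hypothetical name, "hidden"
means a leading dot, errors are not handled, and symlinked directories
are listed under files and never followed (unlike os.walk, which lists
them under dirs).

```python
import os


def walk_visible(top, topdown=True):
    """Like a simplified os.walk() that never enters hidden directories."""
    dirs, files = [], []
    with os.scandir(top) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                # Hidden directories are dropped before any recursion,
                # so the filter applies in both traversal orders.
                if not entry.name.startswith('.'):
                    dirs.append(entry.name)
            else:
                files.append(entry.name)
    if topdown:
        yield top, dirs, files
    for name in dirs:
        yield from walk_visible(os.path.join(top, name), topdown)
    if not topdown:
        yield top, dirs, files
```

Because the filter is applied before recursing, a bottom-up walk such as
walk_visible(some_dir, topdown=False) skips hidden directories in the
same way a top-down one does.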

If this receives a positive response I would be happy to undertake the 
effort involved in producing a PR.
-- 
Steve (Gadget) Barnes
Any opinions in this message are my personal opinions and do not reflect 
those of my employer.
