Following up on this: in theory, the right way to walk a tree using pathlib already exists; it's the rglob() method. E.g. all paths under /foo/bar should be found as follows:
    for path in pathlib.Path('/foo/bar').rglob('*'):
        print(path)
The PermissionError bug you found is already reported: http://bugs.python.org/issue24120 -- it even has a patch but it's stuck in review.
Sadly there's another error: loops introduced by symlinks cause infinite recursion. I filed that here: http://bugs.python.org/issue26012. (The fix should be judicious use of is_symlink(), but the code is a little convoluted.)
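Until both issues are fixed, a hand-rolled walk can sidestep them. The following is only a rough sketch, not the proposed patch for either bug: iter_tree is a hypothetical helper name, it uses iterdir() instead of rglob(), it skips directories it cannot read, and it never descends into a symlinked directory (which is what prevents the loop).

```python
import pathlib


def iter_tree(root):
    """Yield every path under root, never descending into symlinked dirs.

    Hypothetical helper for illustration; not part of pathlib.
    """
    stack = [pathlib.Path(root)]
    while stack:
        current = stack.pop()
        try:
            children = list(current.iterdir())
        except OSError:
            # Unreadable directory (PermissionError is an OSError): skip it.
            continue
        for child in children:
            yield child
            try:
                # is_dir() follows symlinks, so check is_symlink() as well.
                descend = child.is_dir() and not child.is_symlink()
            except OSError:
                descend = False
            if descend:
                stack.append(child)
```

The symlink itself is still yielded; only the recursion into it is suppressed. The order of results is not the same as rglob()'s.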
On Mon, Dec 28, 2015 at 11:25 AM, Chris Barker chris.barker@noaa.gov wrote:
On Tue, Dec 22, 2015 at 4:23 PM, Guido van Rossum guido@python.org wrote:
> The two-level iteration forced upon you by os.walk() is indeed often unnecessary -- but handling dirs and files separately usually makes sense,

indeed, but not always, so a simple API that allows you to get a flat walk would be nice....

> Of course for that basic use case, you could just write your own wrapper around os.walk:

sure, but having to write "little" wrappers for common needs is unfortunate...

> The problem isn't designing a nice walk API; it's integrating it with pathlib.

indeed -- I'd really like to see a walk in pathlib itself. I've been trying to use pathlib whenever I need, well, a path, but then I find I almost immediately need to step out and use an os.path function, and have to stringify it anyway -- makes me wonder what the point is..

> And honestly, if open, os.walk, etc. aren't going to work with Path objects,

but they should -- of course they should.....

> Truly pushing for adoption of a new abstraction like this takes many years -- pathlib was new (and provisional) in 3.4 so it really hasn't been long enough to give up on it. The OP hasn't!

it will take many years for sure -- but the standard library could at least adopt it as much as possible.

Path.walk would be a nice start :-)
My example: one of our sysadmins wanted a little script to go through an entire drive (Windows), and check if any paths were longer than 256 characters (Windows, remember..)
I came up with this:
    import os

    def get_all_paths(start_dir='/'):
        for dirpath, dirnames, filenames in os.walk(start_dir):
            for filename in filenames:
                yield os.path.join(dirpath, filename)

    too_long = []
    for p in get_all_paths('/'):
        print("checking:", p)
        if len(p) > 255:
            too_long.append(p)
            print("Path too long!")
way too wordy!
I started with pathlib, but that just made it worse.
now that I think about it, maybe I could have simply used pathlib.Path.rglob....
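Something like the following is what that might look like. It is a sketch only: too_long_paths and its limit parameter are names made up for illustration, and unlike the os.walk version it also reports directories, since rglob('*') yields those too.

```python
import pathlib


def too_long_paths(start_dir, limit=255):
    """Return every path under start_dir whose string form exceeds limit.

    Hypothetical helper; rglob('*') yields directories as well as files.
    """
    return [p for p in pathlib.Path(start_dir).rglob('*')
            if len(str(p)) > limit]
```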
However, when I try that, I get a permission error:
/Users/chris.barker/miniconda2/envs/py3/lib/python3.5/pathlib.py in wrapped(pathobj, *args)
    369         @functools.wraps(strfunc)
    370         def wrapped(pathobj, *args):
--> 371             return strfunc(str(pathobj), *args)
    372         return staticmethod(wrapped)
    373

PermissionError: [Errno 13] Permission denied: '/Users/.chris.barker.xahome/caches/opendirectory'
as the error comes from inside the rglob() generator, I'm not sure how to tell it to ignore and move on....
os.walk is somehow able to deal with this.
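For what it's worth, os.walk() copes because by default it silently ignores errors raised while listing a directory; its optional onerror callback receives the OSError instead. A small sketch of using that to see what was skipped (walk_files and the skipped list are made-up names for illustration):

```python
import os


def walk_files(start_dir, skipped=None):
    """Yield file paths under start_dir; record unreadable dirs in skipped.

    Hypothetical wrapper; only os.walk() and its onerror hook are real API.
    """
    def remember(error):
        # os.walk() passes the OSError it hit; filename is the directory.
        if skipped is not None:
            skipped.append(error.filename)

    for dirpath, dirnames, filenames in os.walk(start_dir, onerror=remember):
        for filename in filenames:
            yield os.path.join(dirpath, filename)
```

Passing a list as skipped gives a record of the directories that could not be read, while the walk itself carries on.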
-CHB
--
Christopher Barker, Ph.D. Oceanographer
Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115        (206) 526-6317   main reception
Chris.Barker@noaa.gov