I don't know that anything actually needs to be addressed here at all. Struggling to see the real problem that needs to be solved means a bit of guesswork at what's relevant to the solution...
Yes, that's why I gave a few examples, using my stripped-down and Pythonicized wrapper, so you don't have to work it all out from scratch by trying to read the manpage and guess how you'd use it in C. But the point is, that's what something as flexible as find looks like as a function.
Yes--as I said below, sometimes you really do want to go a directory at a time, and for that, it's hard to beat the API of os.walk. But when going a directory at a time is unnecessary, it makes the code look more complicated than it needs to, so a flat iteration can be nicer. Significantly, that and the need to join all over the place are the only things I can imagine that people would find worth "solving" about os.walk's API.
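To make "flat iteration" concrete, here is a minimal sketch of a generator layered on os.walk. The name iter_files is made up for illustration; it is not an existing stdlib function.

```python
import os

def iter_files(top):
    """Yield the full path of every non-directory entry under top."""
    for root, dirs, files in os.walk(top):
        for name in files:
            # The join happens once here instead of at every call site.
            yield os.path.join(root, name)
```

A caller then writes "for path in iter_files(top):" and never sees the two-level structure, at the cost of no longer being able to prune directories along the way.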
Same reason this code uses with and a for loop:
    with open(path) as f:
        for line in f:
            do_stuff(line)
Cleaning up a file handle isn't _terribly_ important, but it's not _unimportant_, and isn't it generally a good habit?
Explaining the details of the API design takes this even farther off-topic, but: my initial design was based on the same Path class that the stdlib's Path is: a subclass of str that adds attributes/properties for things that are immediately available and methods for things that aren't. (The names are de-abbreviated versions of the C names.) As for stat, for one thing, people already have code (and mental models) to deal with stat (named)tuples. Plus, if you request a fast walk without stat information (which often goes considerably faster than scandir--I've got a Python tool that actually _beats_ the find invocation it replaced), or the stat on a file fails, I think it's clearer to have "stat" be None than to have 11-18 arbitrary attributes be None while the rest are still there.
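A rough sketch of the "stat is None" idea, just to show its shape. This is not the wrapper discussed above: the names Entry and make_entry are hypothetical, and os.stat stands in for whatever the real traversal would supply.

```python
import os

class Entry(str):
    """A path string that carries an optional os.stat_result (hypothetical)."""

    def __new__(cls, path, stat=None):
        self = super().__new__(cls, path)
        self.stat = stat  # None when not requested or when the stat failed
        return self

def make_entry(path, want_stat=True):
    """Build an Entry, leaving stat as None for a fast walk or on failure."""
    if not want_stat:
        return Entry(path)
    try:
        return Entry(path, os.stat(path))
    except OSError:
        return Entry(path)
```

Callers check "entry.stat is None" once, rather than testing a dozen separate attributes that may or may not be populated.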
At any rate, I was planning to take another pass at the design after finishing the Windows and generic implementations, but the project I was working on turned out to need this only for OS X, so I never got to that point.
The first example under os.walk in the library docs is identical to the wiki spool example, except the first line points at subpackages of the stdlib email package instead of the top email spool directory, and an extra little bit was added at the end:
    import os
    from os.path import join, getsize

    for root, dirs, files in os.walk('python/Lib/email'):
        print(root, "consumes", end=" ")
        print(sum(getsize(join(root, name)) for name in files), end=" ")
        print("bytes in", len(files), "non-directory files")
        if 'CVS' in dirs:
            dirs.remove('CVS')  # don't visit CVS directories
So, take that instead. Perfectly good example. And, while you could write that with a flat iterator in a number of ways, none are going to be as simple as with two levels.
The question is what code that uses (duck-typed) Path objects expects. I'm pretty sure there was extensive discussion of why Paths should never cache during the PEP 428 discussions, and I vaguely remember both Antoine Pitrou and Nick Coghlan giving good summaries more recently, but I don't remember enough details to say whether a duck-typed Path-like object would be just as bad. But I'm guessing it could have the same problems--if some function takes a Path object, stores it for later, and expects to use it to get live info, handing it something that quacks like a Path but returns snapshot info instead would be pretty insidious.
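The snapshot-versus-live difference is easy to see with objects the stdlib already has. In this sketch os.DirEntry plays the role of the "snapshot" object (its stat() result is cached after the first call) and pathlib.Path the "live" one. DirEntry is not a duck-typed Path, so treat this only as an illustration of the hazard; the function name is made up.

```python
import os
import tempfile
from pathlib import Path

def sizes_after_growth(directory):
    """Return (cached_size, live_size) for a file that grows after its first stat."""
    target = Path(directory) / "data.bin"
    target.write_bytes(b"x")            # 1 byte on disk
    with os.scandir(directory) as it:
        entry = next(e for e in it if e.name == "data.bin")
    entry.stat()                        # DirEntry caches this result
    target.write_bytes(b"x" * 100)      # the file changes afterwards
    # entry.stat() returns the cached value; Path.stat() asks the OS again.
    return entry.stat().st_size, target.stat().st_size
```

A function that stored the entry and later asked it for the size would quietly get the stale answer.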
Or just Path.glob with ** in the pattern.
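For anyone who hasn't used it, a minimal sketch of the recursive form; the helper name python_sources and the "*.py" filter are only for illustration.

```python
from pathlib import Path

def python_sources(top):
    """Return an iterator over every *.py file under top, at any depth."""
    # "**" matches top itself and all of its subdirectories, recursively.
    return Path(top).glob("**/*.py")
```

Path(top).rglob("*.py") is the equivalent shorthand.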
So, did Antoine Pitrou already solve this problem 3 years ago (or Jason Orendorff many years before that), possibly barring a minor docs tweak, or is there still something to consider here?
I agree with everything here. I believe Path.glob can do everything he needs, and what he asked for instead couldn't do any more.
It's dead-easy to imperatively apply a regex to decide whether to prune each dir in walk (or fts). Or to do the same to the joined path or the abspath. Or to use fnmatch instead of regex, or an arbitrary predicate function. Or to reverse the sense to mean only recurse on these instead of skip these. Imagine what a declarative API that allowed all that would look like. Even find doesn't have any of those options (at least not portably), and most people have to read guides to the manpage before they can read the manpage.
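As a sketch of how little code the imperative version takes with os.walk (walk_pruned is a made-up name, and matching on the bare directory name is just one of the choices listed above):

```python
import os
import re

def walk_pruned(top, skip):
    """Like os.walk, but don't descend into directories whose name matches skip."""
    pattern = re.compile(skip)
    for root, dirs, files in os.walk(top):
        # Assigning to dirs[:] edits the list in place, which is how a
        # top-down os.walk learns which subdirectories to skip.
        dirs[:] = [d for d in dirs if not pattern.fullmatch(d)]
        yield root, dirs, files
```

Swapping the test for fnmatch.fnmatch, for a match against os.path.join(root, d), for an arbitrary predicate, or dropping the "not" to mean "only recurse into these" is a one-line change each time, which is exactly what a declarative API would have to grow options for.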
At any rate, there's no reason you couldn't add some regex methods to Path and/or special Path handling code to regex to make that imperative code slightly easier, but I don't see how "pattern.match(str(path))" is any worse than "os.scandir(str(path))" or "json.load(str(path))" or any of the zillion other places where you have to convert paths to strings explicitly, or what makes regex more inherently path-related than those things.
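For comparison, a sketch of what the explicit conversion looks like in practice; matching_paths is a hypothetical helper, not a proposed API.

```python
import re
from pathlib import Path

def matching_paths(top, regex):
    """Yield every Path under top whose string form matches regex."""
    pattern = re.compile(regex)
    for path in Path(top).rglob("*"):
        # The only path-specific step is the explicit str() conversion.
        if pattern.search(str(path)):
            yield path
```

For example, matching_paths(top, r"test_\w+\.py$") yields the test modules at any depth under top.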