On Dec 28, 2015, at 11:25, Chris Barker <chris.barker@noaa.gov> wrote:

On Tue, Dec 22, 2015 at 4:23 PM, Guido van Rossum <guido@python.org> wrote:
The two-level iteration forced upon you by os.walk() is indeed often unnecessary -- but handling dirs and files separately usually makes sense,

indeed, but not always, so a simple API that allows you to get a flat walk would be nice....

Of course for that basic use case, you could just write your own wrapper around os.walk:

sure, but having to write "little" wrappers for common needs is unfortunate...

You're replying to me, not Guido, here...

Anyway, if the only thing anyone will ever need is a handful of simple one-liners that even a novice could write, maybe it's reasonable to just add one to the docs to show how to do it, instead of adding them to the stdlib.

The problem isn't designing a nice walk API; it's integrating it with pathlib.

indeed -- I'd really like to see a walk in pathlib itself.

But first you have to solve the problem that paragraph was all about: a general-purpose walk API shouldn't be throwing away all that stat information it wasted time fetching, but the pathlib module is designed around Path objects that are always live, not snapshots. If Path.walk yields something that isn't a Path, what's the point?
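A minimal sketch of the trade-off described above, assuming Python 3.6+ (needed to use `os.scandir()` as a context manager). The hypothetical `walk_entries()` helper flattens the walk and yields the `os.DirEntry` objects themselves rather than bare path strings, so the file-type and stat information that `scandir()` already gathered is kept instead of thrown away. A `DirEntry` is a snapshot, not a live `Path`, which is exactly the tension in question.

```python
import os

def walk_entries(top):
    """Yield an os.DirEntry for every non-directory entry under top.

    Hypothetical helper: unlike os.walk(), it hands back the DirEntry
    objects, so callers keep whatever type/stat data scandir() fetched.
    Traversal is depth-first; order within a directory is unspecified.
    """
    stack = [top]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as it:
                for entry in it:
                    # Don't follow symlinks, so we can't loop forever.
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    else:
                        yield entry
        except OSError:
            # Skip unreadable directories, as os.walk() does by default.
            continue
```

Usage is a single flat loop, e.g. `for entry in walk_entries(start_dir): print(entry.path, entry.stat().st_size)`. On Windows, `entry.stat()` normally needs no extra system call because the directory listing already carried that data; on Unix it costs one call, and the result is cached on the entry either way.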

I've been trying to use pathlib whenever I need, well, a path, but then I find I almost immediately need to step out and use an os.path function, and have to stringify it anyway -- makes me wonder what the point is...

I have the same impression as you, but, as Guido says, let's give it time before judging...

And honestly, if open, os.walk, etc. aren't going to work with Path objects,

but they should -- of course they should...

So far things have gone the opposite direction: open requires strings, but there's a Path.open method; walk requires strings, but people are proposing a Path.walk method; etc. I'm not sure how that's supposed to extend to things like json.load or NamedTemporaryFile.name.
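To make the split concrete, here is a small sketch (the `roundtrip_json()` name is hypothetical) contrasting the method form, `Path.open()`, with the function form that, as of the 3.4/3.5 era discussed here, needed an explicit `str(path)`. It also shows the reverse direction: many stdlib APIs hand back plain strings (here the temporary directory name), which you have to wrap in `Path` yourself. For what it's worth, Python 3.6 (PEP 519) later made `open()` and the `os` functions accept `Path` objects directly.

```python
import json
import tempfile
from pathlib import Path

def roundtrip_json(obj):
    """Write obj via Path.open() and read it back via open(str(path))."""
    with tempfile.TemporaryDirectory() as tmpdir:  # tmpdir is a plain str
        path = Path(tmpdir) / "data.json"          # wrap the str to get a Path
        # Method form: no conversion needed.
        with path.open("w") as f:
            json.dump(obj, f)
        # Function form: string-only APIs need an explicit str().
        with open(str(path)) as f:
            return json.load(f)

print(roundtrip_json({"answer": 42}))  # prints {'answer': 42}
```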

Truly pushing for adoption of a new abstraction like this takes many years -- pathlib was new (and provisional) in 3.4 so it really hasn't been long enough to give up on it. The OP hasn't!

it will take many years for sure -- but the standard library could at least adopt it as much as possible.

Path.walk would be a nice start :-)

My example: one of our sysadmins wanted a little script to go through an entire drive (Windows), and check if any paths were longer than 256 characters (Windows, remember...)

I came up with this:

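import os

def get_all_paths(start_dir='/'):
    # Flatten os.walk()'s two-level (dirpath, dirnames, filenames) output
    # into a single stream of full file paths.
    for dirpath, dirnames, filenames in os.walk(start_dir):
        for filename in filenames:
            yield os.path.join(dirpath, filename)

too_long = []
for p in get_all_paths('/'):
    print("checking:", p)
    if len(p) > 255:
        too_long.append(p)
        print("Path too long!")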
        print("Path too long!")

Do you really want it to print out "Path too long!" hundreds of times?

If not, this is a lot more concise, and I think readable, with comprehensions:

import os

start_dir = '/'  # same default as get_all_paths() above
walk = os.walk(start_dir)
files = (os.path.join(root, name) for root, dirnames, filenames in walk for name in filenames)
too_long = (path for path in files if len(path) > 255)

And now you've got a lazy iterator over your too-long files. (If you need a list, just use a listcomp instead of a genexpr in the last step.)
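For instance, a sketch that packages the same pipeline in a function (the `find_too_long()` name and its `limit` parameter are hypothetical) and uses a list comprehension in the last step, so the result is a reusable list rather than a one-shot iterator:

```python
import os

def find_too_long(start_dir, limit=255):
    """Return a list of file paths under start_dir longer than limit characters."""
    # Lazy flattening of os.walk(), exactly as in the genexpr version.
    files = (os.path.join(root, name)
             for root, dirnames, filenames in os.walk(start_dir)
             for name in filenames)
    # List comprehension in the last step: materialize only the matches.
    return [path for path in files if len(path) > limit]
```

Calling `find_too_long('/')` returns the offending paths without printing anything per file.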

way too wordy! 

I started with pathlib, but that just made it worse.

If we had a Path.walk, I don't think it could be that much better than the original version, since the only thing Path can help with is making that join a bit shorter--and at the cost of having to convert to str to check len():

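from pathlib import Path

start_path = Path('/')
walk = start_path.walk()  # hypothetical Path.walk(); Python 3.12 later added one
files = (root / name for root, dirnames, filenames in walk for name in filenames)
too_long = (file for file in files if len(str(file)) > 255)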

As a side note, there's no Windows restriction to 255 _characters_; it's to 255 UTF-16 code units, just under 32K UTF-16 code units, or 255 codepage bytes, depending on which API you use. So you really want something like len(str(file).encode('utf-16-le')) // 2 > 255. Also, I suspect you want either the bare filename or the abspath, not the path from the start dir (especially since a path rooted at the default '/' is two characters shorter than one rooted at 'C:\', so you're probably going to pass a bunch of files that then cause problems in your scripts).
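A small sketch of that measurement, with hypothetical helper names `utf16_len()` and `is_too_long()`. It counts UTF-16 code units rather than Python `str` characters (they differ for anything outside the Basic Multilingual Plane, which takes a surrogate pair), and it checks the absolute path as suggested. The 255 default simply mirrors the figure used in this thread.

```python
import os

def utf16_len(s):
    """Return the length of s in UTF-16 code units."""
    # 'utf-16-le' avoids the 2-byte BOM that plain 'utf-16' prepends;
    # 'surrogatepass' lets lone surrogates (which can appear in Windows
    # file names) encode instead of raising.
    return len(s.encode("utf-16-le", "surrogatepass")) // 2

def is_too_long(path, limit=255):
    """Measure the absolute path, not the path relative to the start dir."""
    return utf16_len(os.path.abspath(path)) > limit

# One str character outside the BMP is two UTF-16 code units.
print(len("\U0001F600"), utf16_len("\U0001F600"))  # prints: 1 2
```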