On Mon, Dec 28, 2015 at 2:43 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
sure, but having to write "little" wrappers for common needs is unfortunate...

You're replying to me, not Guido, here...

I was intending to reply to the list :-)
Anyway, if the only thing anyone will ever need is a handful of simple one-liners that even a novice could write, maybe it's reasonable to just add one to the docs to show how to do it, instead of adding them to the stdlib.

well, it's a four-liner, yes? but I'm not sure I agree -- the simple things should be simple. even if you can find the couple-liner in the docs, you've still got a lot more overhead than calling a ready-to-go function.
and it's not like it'd be a heavy maintenance burden....

The problem isn't designing a nice walk API; it's integrating it with pathlib.*

indeed -- I'd really like to see a *walk in pathlib itself.

But first you have to solve the problem that paragraph was all about: a general-purpose walk API shouldn't be throwing away all that stat information it wasted time fetching, but the pathlib module is designed around Path objects that are always live, not snapshots. If Path.walk yields something that isn't a Path, what's the point?
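To make the snapshot-versus-live distinction concrete, here is a minimal sketch using os.scandir (in the stdlib since 3.5). The function name walk_entries is made up for illustration; it is not a proposed API. It yields os.DirEntry objects, which hold on to the file-type and stat information gathered during the directory scan, rather than Path objects, which would look it up again.

```python
import os

def walk_entries(top):
    """Recursively yield an os.DirEntry for every non-directory under top.

    Hypothetical helper, for illustration only.
    """
    with os.scandir(top) as entries:
        for entry in entries:
            # is_dir() uses information cached on the entry where the OS
            # provides it, so it usually needs no extra system call.
            if entry.is_dir(follow_symlinks=False):
                yield from walk_entries(entry.path)
            else:
                # The caller gets the entry itself, so a later entry.stat()
                # result is cached on the object instead of thrown away.
                yield entry
```

A caller can then use entry.path for the string and entry.stat() for the metadata, at the cost of working with a snapshot object instead of a Path.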

OK -- you've gotten out of my technical depth now.....so I'll just shut up.

But at the end of the day, if you've got the few-liner in the docs that works, maybe it's OK that it's not optimized.....
I've been trying to use pathlib whenever I need, well, a path, but then I find I almost immediately need to step out and use an os.path function, and have to stringify it anyway -- makes me wonder what the point is..

> I have the same impression as you, but, as Guido says, let's give it time
> before judging...

time good -- but also maybe some more work to make it easy to use with the rest of the stdlib. I will say that one thing that bugs me about the "old style" os.path functions is that I find myself stringing them together, and that gets really ugly fast:

my_path = os.path.join(os.path.split(something)[0], something_else)

here's where an OO interface is much nicer.
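A small side-by-side sketch of that comparison. The values of something and something_else are placeholders assumed for illustration, and PurePath is used so nothing touches the filesystem.

```python
import os.path
from pathlib import PurePath

# Placeholder inputs, assumed for illustration only.
something = "data/raw/input.txt"
something_else = "output.txt"

# os.path style: split off the directory part, then join the new name onto it.
old_style = os.path.join(os.path.split(something)[0], something_else)

# pathlib style: .parent stands in for split()[0], and / stands in for join().
new_style = PurePath(something).parent / something_else
```

Both spellings name the same path; the pathlib one reads left to right instead of inside out.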
 And honestly, if open, os.walk, etc. aren't going to work with Path objects, 

but they should -- of course they should..... 

So far things have gone the opposite direction: open requires strings, but there's a Path.open method;

This sure feels to me like the wrong way to go -- too OO-heavy:

create a Path object, then use it to open a file. which is why we still have the regular old open() that takes strings.

I just finished teaching an intro to Python class, using py3 for the first time -- I found myself pointing students to pathlib, but then never using it in any examples, etc. That may be my old habits, but I really think we do have an ugly mix of APIs here.

> walk requires strings, but people are proposing a Path.walk method; etc.

well, walk "feels" to me like a path-y operation. whereas open() does not.

I'm not sure how that's supposed to extend to things like json.load or NamedTemporaryFile.name.

exactly -- that's why open() doesn't feel path-y to me. you have all sorts of places where you might want to open a file, and you want to open other things as well. And I like APIs that let you pass in either an open file-like object, OR a path -- so it seems allowing either a Path object or a path-in-a-string would be good.

so my "proposal" is to go through the stdlib and add the ability to accept a Path object everywhere a string path is accepted.

(hmm -- could you simply wrap str() around the input?)
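A rough sketch of what the str() wrapping could look like. Both function names here are hypothetical, not stdlib functions, and path_length just stands in for any existing string-only function.

```python
from pathlib import PurePosixPath

def as_path_string(path):
    """Return a plain string for either a str path or a Path object.

    Hypothetical helper: this name is not part of the stdlib.
    """
    # str() leaves a str unchanged and renders a Path as its path string.
    return str(path)

def path_length(path):
    """Stand-in for a string-only function made to accept both inputs."""
    return len(as_path_string(path))
```

One caveat with this approach: str() succeeds on almost anything, so passing None or an int by mistake would quietly produce a bogus path string rather than an error.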
My example: one of our sysadmins wanted a little script to go through an entire drive (Windows), and check if any paths were longer than 256 characters (Windows, remember..)

I came up with this:

import os

def get_all_paths(start_dir='/'):
    # yield the full path of every file below start_dir
    for dirpath, dirnames, filenames in os.walk(start_dir):
        for filename in filenames:
            yield os.path.join(dirpath, filename)

too_long = []
for p in get_all_paths('/'):
    print("checking:", p)
    if len(p) > 255:
        print("Path too long!")
        too_long.append(p)  # keep the offenders for a report at the end

> Do you really want it to print out "Path too long!" hundreds of times?

well, not in production, no, but was nice to test -- also, in theory, there shouldn't be many!

> If not, this is a lot more concise, and I think readable, with comprehensions:

walk = os.walk(start_dir)
files = (os.path.join(root, file) for root, dirs, files in walk for file in files)
too_long = (file for file in files if len(file) > 255)

thanks -- should have thought of that -- though that was to pass off to a sysadmin that doesn't know much python -- harder for him to read??

> And now you've got a lazy Iterator over your too-long files.
> (If you need a list, just use a listcomp instead of a genexpr in the last step.)

yup -- probably I'd write it out to a file in the real use case. or stdout.

way too wordy! 

I started with pathlib, but that just made it worse.

> If we had a Path.walk, I don't think it could be that much better than the
> original version,

sure -- the wordiness comes from the fact that you have to deal with dirs and files separately.

> since the only thing Path can help with is making that join a bit
> shorter--and at the cost of having to convert to str to check len():

maybe another argument for why Path doesn't buy much over string paths...

> walk = start_path.walk()
> files = (root / file for root, dirs, files in walk for file in files)
> too_long = (file for file in files if len(str(file)) > 255)

what I really want here is:

too_long = (filepath for filepath in Path(root) if len(filepath) > 255 )

I know python isn't a shell scripting language but it is a one liner in powershell or bash, or....
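For what it's worth, something fairly close to that can be sketched with Path.rglob, which pathlib already has. The function name too_long_paths and the limit parameter are assumptions made for this example, and the length check still needs the str() conversion.

```python
from pathlib import Path

def too_long_paths(root, limit=255):
    """Lazily yield files under root whose path string is longer than limit.

    Hypothetical helper, for illustration only.
    """
    # rglob('*') recurses through the whole tree and yields Path objects
    # for files and directories alike, so directories are filtered out here.
    return (p for p in Path(root).rglob('*')
            if p.is_file() and len(str(p)) > limit)
```

This gives a lazy iterator, so the results can be written to a file or stdout as they turn up. Note that is_file() goes back to the filesystem for every entry, so this is the unoptimized few-liner, not a fast walk.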


As a side note, there's no Windows restriction to 255 _characters_, it's to 255 UTF-16 code points,

IIUC, neither Windows itself nor NTFS has this restriction, but some older utilities do -- really pathetic. And I asked our sysadmin about the Unicode issue, and he had no idea.
just under 64K UTF-16 code points,

how is a codepoint different than a character???? I was wondering if it was a bytes restriction or codepoint restriction?
or 255 codepage bytes, depending on which API you use.

this is where it gets ugly -- who knows what API some utility is using???

So you really want something like len(file.encode('utf-16-le')) // 2 > 255.

but can't some characters use more than 2 bytes in utf-16? or is that what you're trying to catch here?
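A quick sketch to check that: counting 2-byte UTF-16 units rather than Python characters. The helper name utf16_code_units is made up for this example. It uses the 'utf-16-le' codec because plain 'utf-16' prepends a 2-byte byte-order mark, which would throw the count off by one.

```python
def utf16_code_units(s):
    """Count the 2-byte UTF-16 units needed to encode s, with no BOM.

    Hypothetical helper, for illustration only.
    """
    return len(s.encode('utf-16-le')) // 2

bmp = "\u00e9"         # e-acute: inside the Basic Multilingual Plane
astral = "\U0001F600"  # an emoji: outside the BMP, needs a surrogate pair

# len() counts code points, so it reports 1 for both strings;
# the UTF-16 unit count differs for the astral character.
counts = [(len(c), utf16_code_units(c)) for c in (bmp, astral)]
```

So yes, a character outside the BMP takes 4 bytes in UTF-16, and dividing the encoded byte length by 2 counts it as two units, which is what a limit expressed in UTF-16 units would see.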

Also, I suspect you want either the bare filename or the abspath, not the path from the start dir (especially since a path rooted at the default '/' is two characters shorter than one rooted at 'C:\',

well, the startdir would be C:\  and now I'm confused about whether the "C:\" is part of the 255-something restriction!

anyway, WAY OT -- and if this is used it will be mainly to flag potential problems, not really a robust test.


Christopher Barker, Ph.D.

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception