On 1 July 2014 03:05, Ben Hoyt <benhoyt@gmail.com> wrote:
> So, here's my alternative proposal: add an "ensure_lstat" flag to
> scandir() itself, and don't have *any* methods on DirEntry, only
> attributes.
...
> Most importantly, *regardless of platform*, the cached stat result (if
> not None) would reflect the state of the entry at the time the
> directory was scanned, rather than at some arbitrary later point in
> time when lstat() was first called on the DirEntry object.

I'm torn between whether I'd prefer the stat fields to be populated on Windows if ensure_lstat=False or not. There are good arguments each way, but overall I'm inclining towards having it consistent with POSIX - don't populate them unless ensure_lstat=True.

+0 for stat fields to be None on all platforms unless ensure_lstat=True.
 
Yeah, I quite like this. It does make the caching more explicit and
consistent. It's slightly annoying that it's less like pathlib.Path
now, but DirEntry was never pathlib.Path anyway, so maybe it doesn't
matter. The differences in naming may highlight the difference in
caching, so maybe it's a good thing.

See my comments below on .fullname.
 
Two further questions from me:

1) How does error handling work? Now os.stat() will/may be called
during iteration, so in __next__. But it hard to catch errors because
you don't call __next__ explicitly. Is this a problem? How do other
iterators that make system calls or raise errors handle this?

I think it just needs to be documented that iterating may throw the same exceptions as os.lstat(). It's a little trickier if you don't want the scope of your exception to be too broad, but you can always wrap the iteration in a generator to catch and handle the exceptions you care about, and allow the rest to propagate.
 
def scandir_accessible(path='.'):
    gen = os.scandir(path)

    while True:
        try:
            yield next(gen)
        except PermissionError:
            pass

2) There's still the open question in the PEP of whether to include a
way to access the full path. This is cheap to build, it has to be
built anyway on POSIX systems, and it's quite useful for further
operations on the file. I think the best way to handle this is a
.fullname or .full_name attribute as suggested elsewhere. Thoughts?

+1 for .fullname. The earlier suggestion to have __str__ return the name is killed I think by the fact that .fullname could be bytes.

It would be nice if pathlib.Path objects were enhanced to take a DirEntry and use the .fullname automatically, but you could always call Path(direntry.fullname).

Tim Delaney