[Python-Dev] Remaining decisions on PEP 471 -- os.scandir()

Ben Hoyt benhoyt at gmail.com
Tue Jul 15 04:48:41 CEST 2014


> Let's not multiply entities beyond necessity.
>
> There is well-defined *follow_symlinks* parameter
> https://docs.python.org/3/library/os.html#follow-symlinks
> e.g., os.access, os.chown, os.link, os.stat, os.utime and many other
> functions in os module support follow_symlinks parameter, see
> os.supports_follow_symlinks.

Huh, interesting. I didn't know os.stat() had a follow_symlinks
parameter -- when False, it's equivalent to lstat. If DirEntry has a
.stat(follow_symlinks=True) method, we don't actually need lstat().
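
For illustration, a minimal sketch of that equivalence. The helper name
compare_stat_calls and the file names are made up for this example, and it
assumes a platform where os.stat() supports follow_symlinks and the process
is allowed to create symlinks:

```python
import os
import stat
import tempfile


def compare_stat_calls(link_path):
    """Compare os.stat() with and without follow_symlinks against os.lstat().

    Returns a tuple of three booleans:
    (followed result is a regular file,
     non-followed result is a symlink,
     non-followed result matches os.lstat()).
    """
    followed = os.stat(link_path)  # follow_symlinks defaults to True
    not_followed = os.stat(link_path, follow_symlinks=False)
    via_lstat = os.lstat(link_path)
    matches_lstat = (
        not_followed.st_mode == via_lstat.st_mode
        and not_followed.st_ino == via_lstat.st_ino
    )
    return (
        stat.S_ISREG(followed.st_mode),
        stat.S_ISLNK(not_followed.st_mode),
        matches_lstat,
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "target.txt")
        link = os.path.join(tmp, "link")
        with open(target, "w") as f:
            f.write("hello")
        os.symlink(target, link)  # link -> target.txt
        print(compare_stat_calls(link))
```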

> os.walk is an exception that uses *followlinks*. It might be because it
> is an old function e.g., newer os.fwalk uses follow_symlinks.

Yes, I'm sure that's correct. Today it'd be called follow_symlinks,
but obviously one can't change os.walk() anymore.

> Only *recursive* functions such as os.walk, os.fwalk do not follow
> symlinks by default, to avoid symlink loops. [...]
>
> follow_symlinks=True as default for DirEntry.is_dir method allows to
> avoid easy-to-introduce bugs while replacing old
> os.listdir/os.path.isdir code or writing a new code using the same
> mental model.

I think these are good points, especially that of porting existing
listdir()/os.path.isdir() code and avoiding bugs. As I mentioned, I
was really on the fence about the link-following thing, but if
following links by default is a tiny bit harder to implement yet
avoids bugs (and I already hit a bug with this when implementing
os.walk), that's a worthwhile trade-off.
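
To illustrate the kind of porting bug in question, here is a small sketch
comparing the classic listdir()/isdir() idiom with a variant that does not
follow symlinks. Both function names are hypothetical; the second one simply
stands in for what an is_dir() that did not follow links would report:

```python
import os
import stat
import tempfile


def subdirs_following(path):
    """Classic idiom: os.path.isdir() follows symlinks, so a symlink
    that points at a directory is reported as a directory."""
    return sorted(
        name for name in os.listdir(path)
        if os.path.isdir(os.path.join(path, name))
    )


def subdirs_not_following(path):
    """Same loop based on os.lstat(): a symlink is never a directory,
    whatever it points at."""
    result = []
    for name in os.listdir(path):
        mode = os.lstat(os.path.join(path, name)).st_mode
        if stat.S_ISDIR(mode):
            result.append(name)
    return sorted(result)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "real"))
        os.symlink(os.path.join(tmp, "real"), os.path.join(tmp, "alias"))
        print(subdirs_following(tmp))
        print(subdirs_not_following(tmp))
```

Code ported from the first form to an API that behaved like the second would
silently stop seeing symlinked directories, which is the bug a
follow_symlinks=True default avoids.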

In light of that, I propose I update the PEP to basically follow
Victor's model of is_X() and stat() following symlinks by default, and
allowing you to specify follow_symlinks=False if you want something
other than that.

Victor had one other question:

> What happens to name and full_name with followlinks=True?
> Do they contain the name in the directory (name of the symlink)
> or name of the linked file?

I would say they should contain the name and full path of the entry
itself -- the symlink, NOT the linked file. They more or less have to:
otherwise they couldn't be plain attributes and would have to be
method calls that potentially make a system call.

In any case, here's the modified proposal:

scandir(path='.') -> generator of DirEntry objects, which have:

* name: name as per listdir()
* full_name: full path name (not necessarily absolute), equivalent of
  os.path.join(path, entry.name)
* is_dir(follow_symlinks=True): like os.path.isdir(entry.full_name),
  but free in most cases; cached per entry
* is_file(follow_symlinks=True): like os.path.isfile(entry.full_name),
  but free in most cases; cached per entry
* is_symlink(): like os.path.islink(), but free in most cases; cached
  per entry
* stat(follow_symlinks=True): like os.stat(entry.full_name,
  follow_symlinks=follow_symlinks); cached per entry
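
To make the intended semantics concrete, here is a rough pure-Python
emulation of the list above. It is only a sketch, not the real
implementation: it always calls os.stat(), so nothing is "free" here, and
the names _stat_cache and _mode_or_none are made up for the example.
Returning False on OSError in the is_X() methods is also an assumption,
chosen to mirror what os.path.isdir() and friends do.

```python
import os
import stat as stat_module
import tempfile


class DirEntry:
    """Illustrative emulation of the proposed entry object."""

    def __init__(self, directory, name):
        self.name = name  # name of the entry itself (the symlink, if it is one)
        self.full_name = os.path.join(directory, name)
        self._stat_cache = {}  # follow_symlinks (bool) -> os.stat_result

    def stat(self, follow_symlinks=True):
        # Cached per entry, separately for each follow_symlinks value.
        if follow_symlinks not in self._stat_cache:
            self._stat_cache[follow_symlinks] = os.stat(
                self.full_name, follow_symlinks=follow_symlinks)
        return self._stat_cache[follow_symlinks]

    def _mode_or_none(self, follow_symlinks):
        # Assumption: treat a failed stat (e.g. broken symlink) as "no type".
        try:
            return self.stat(follow_symlinks=follow_symlinks).st_mode
        except OSError:
            return None

    def is_dir(self, follow_symlinks=True):
        mode = self._mode_or_none(follow_symlinks)
        return mode is not None and stat_module.S_ISDIR(mode)

    def is_file(self, follow_symlinks=True):
        mode = self._mode_or_none(follow_symlinks)
        return mode is not None and stat_module.S_ISREG(mode)

    def is_symlink(self):
        mode = self._mode_or_none(False)
        return mode is not None and stat_module.S_ISLNK(mode)


def scandir(path='.'):
    """Yield a DirEntry for each name that os.listdir(path) returns."""
    for name in os.listdir(path):
        yield DirEntry(path, name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "real"))
        os.symlink(os.path.join(tmp, "real"), os.path.join(tmp, "alias"))
        for entry in sorted(scandir(tmp), key=lambda e: e.name):
            print(entry.name, entry.is_dir(),
                  entry.is_dir(follow_symlinks=False), entry.is_symlink())
```

Note that name and full_name stay those of the symlink regardless of the
follow_symlinks argument passed to the methods.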

The above may not be quite perfect, but it's good, and I think there's
been enough bike-shedding on the API. :-)

So please speak now or forever hold your peace. :-) I intend to update
the PEP to reflect this and make a few other clarifications in the
next few days.

-Ben

