[Python-Dev] Remaining decisions on PEP 471 -- os.scandir()

Ben Hoyt benhoyt at gmail.com
Tue Jul 15 04:48:41 CEST 2014

> Let's not multiply entities beyond necessity.
> There is well-defined *follow_symlinks* parameter
> https://docs.python.org/3/library/os.html#follow-symlinks
> e.g., os.access, os.chown, os.link, os.stat, os.utime and many other
> functions in os module support follow_symlinks parameter, see
> os.supports_follow_symlinks.

Huh, interesting. I didn't know os.stat() had a follow_symlinks
parameter -- when False, it's equivalent to lstat(). If DirEntry has a
.stat(follow_symlinks=True) method, we don't actually need a separate
lstat() method.
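
A quick check of that equivalence. This is a sketch, not part of the proposal: the helper name stat_matches_lstat is illustrative, and the demo creates a throwaway symlink, so it assumes a platform and account where os.symlink() is permitted (e.g. most Unix systems).

```python
import os
import stat
import tempfile

def stat_matches_lstat(path):
    """True if os.stat(path, follow_symlinks=False) describes the same
    object as os.lstat(path), i.e. the link itself rather than its target."""
    no_follow = os.stat(path, follow_symlinks=False)
    via_lstat = os.lstat(path)
    return (no_follow.st_dev, no_follow.st_ino, no_follow.st_mode) == (
        via_lstat.st_dev, via_lstat.st_ino, via_lstat.st_mode)

with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "target.txt")
    link = os.path.join(root, "link.txt")
    open(target, "w").close()
    os.symlink(target, link)
    print(stat_matches_lstat(link))              # True
    print(stat.S_ISLNK(os.lstat(link).st_mode))  # True: the link itself
    print(stat.S_ISREG(os.stat(link).st_mode))   # True: followed to the file
    # Whether os.stat accepts follow_symlinks on this platform:
    print(os.stat in os.supports_follow_symlinks)
```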

> os.walk is an exception that uses *followlinks*. It might be because it
> is an old function e.g., newer os.fwalk uses follow_symlinks.

Yes, I'm sure that's correct. Today it'd be called follow_symlinks,
but obviously one can't change os.walk() anymore.

> Only *recursive* functions such as os.walk, os.fwalk do not follow
> symlinks by default, to avoid symlink loops. [...]
> follow_symlinks=True as default for DirEntry.is_dir method allows to
> avoid easy-to-introduce bugs while replacing old
> os.listdir/os.path.isdir code or writing a new code using the same
> mental model.

I think these are good points, especially that of porting existing
listdir()/os.path.isdir() code and avoiding bugs. As I mentioned, I
was really on the fence about the link-following thing, but if it's a
tiny bit harder to implement and it avoids bugs (and I already had a
bug with this when implementing os.walk), that's a worthwhile tradeoff.
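
To make the porting hazard concrete, here is a minimal sketch (both helper names are hypothetical). Old listdir()/os.path.isdir() code counts a symlink to a directory as a directory, so a replacement whose is_dir() did not follow links by default would silently drop such entries. The second function uses os.lstat() to stand in for a non-following is_dir().

```python
import os
import stat
import tempfile

def subdirs_old_style(path):
    """Pre-scandir idiom: os.path.isdir() follows symlinks."""
    return sorted(name for name in os.listdir(path)
                  if os.path.isdir(os.path.join(path, name)))

def subdirs_without_following(path):
    """Same loop, but classify each entry with lstat(), i.e. without following links."""
    result = []
    for name in os.listdir(path):
        st = os.lstat(os.path.join(path, name))
        if stat.S_ISDIR(st.st_mode):
            result.append(name)
    return sorted(result)

with tempfile.TemporaryDirectory() as root:
    real = os.path.join(root, "real_dir")
    os.mkdir(real)
    os.symlink(real, os.path.join(root, "link_to_dir"), target_is_directory=True)
    print(subdirs_old_style(root))          # ['link_to_dir', 'real_dir']
    print(subdirs_without_following(root))  # ['real_dir']
```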

In light of that, I propose I update the PEP to basically follow
Victor's model of is_X() and stat() following symlinks by default, and
allowing you to specify follow_symlinks=False if you want something
other than that.
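
For reference, os.scandir() as it eventually shipped (Python 3.5) uses these defaults: is_dir(), is_file() and stat() follow symlinks unless follow_symlinks=False is passed. The sketch below uses that released API rather than the proposal text; the helper classify is hypothetical, and using the iterator as a context manager assumes Python 3.6+.

```python
import os
import tempfile

def classify(path):
    """Map each entry name to (is_dir(), is_dir(follow_symlinks=False), is_symlink())."""
    # Using os.scandir() as a context manager requires Python 3.6+.
    with os.scandir(path) as it:
        return {entry.name: (entry.is_dir(),
                             entry.is_dir(follow_symlinks=False),
                             entry.is_symlink())
                for entry in it}

with tempfile.TemporaryDirectory() as root:
    real = os.path.join(root, "real_dir")
    os.mkdir(real)
    os.symlink(real, os.path.join(root, "link_to_dir"), target_is_directory=True)
    result = classify(root)
    print(result["real_dir"])     # (True, True, False)
    print(result["link_to_dir"])  # (True, False, True)
```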

Victor had one other question:

> What happens to name and full_name with followlinks=True?
> Do they contain the name in the directory (name of the symlink)
> or name of the linked file?

I would say they should contain the name and full path of the entry --
the symlink, NOT the linked file. They kind of have to, right?
Otherwise they'd have to be method calls that potentially call the OS
to resolve the link.
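
In the released os.scandir() (Python 3.5+), full_name ended up being called path, and both name and path describe the symlink itself. A sketch under that API (symlink_entries is a hypothetical helper; the context-manager form assumes Python 3.6+) showing that finding out where a link points takes a separate os.readlink() call:

```python
import os
import tempfile

def symlink_entries(path):
    """Return (name, path, link target) for each symlink directly inside `path`."""
    found = []
    with os.scandir(path) as it:  # context-manager form: Python 3.6+
        for entry in it:
            if entry.is_symlink():
                # name and path describe the link itself; finding out what it
                # points to is a separate system call (readlink).
                found.append((entry.name, entry.path, os.readlink(entry.path)))
    return found

with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "target.txt")
    open(target, "w").close()
    os.symlink(target, os.path.join(root, "link.txt"))
    for name, path, points_to in symlink_entries(root):
        print(name)                                    # link.txt
        print(path == os.path.join(root, "link.txt"))  # True
        print(points_to == target)                     # True
```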

In any case, here's the modified proposal:

scandir(path='.') -> generator of DirEntry objects, which have:

* name: name as per listdir()
* full_name: full path name (not necessarily absolute), equivalent to
os.path.join(path, entry.name)
* is_dir(follow_symlinks=True): like os.path.isdir(entry.full_name),
but free in most cases; cached per entry
* is_file(follow_symlinks=True): like os.path.isfile(entry.full_name),
but free in most cases; cached per entry
* is_symlink(): like os.path.islink(entry.full_name), but free in most cases; cached per entry
* stat(follow_symlinks=True): like os.stat(entry.full_name,
follow_symlinks=follow_symlinks); cached per entry
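
To make the list above concrete, here is a rough pure-Python model of the proposed DirEntry. It is only a sketch: DirEntrySketch and scandir_sketch are hypothetical names, and because it is built on os.listdir() plus lazily cached os.stat()/os.lstat() calls, nothing here is "free". Each is_X() call may cost one system call the first time, and is cached after that.

```python
import os
from stat import S_ISDIR, S_ISLNK, S_ISREG

class DirEntrySketch:
    """Hypothetical pure-Python model of the proposed DirEntry (illustration only)."""

    def __init__(self, directory, name):
        self.name = name                                # as returned by listdir()
        self.full_name = os.path.join(directory, name)  # not necessarily absolute
        self._stat = None   # cached os.stat() result (follows symlinks)
        self._lstat = None  # cached os.lstat() result (does not follow)

    def stat(self, follow_symlinks=True):
        # Each variant hits the OS at most once per entry, then is cached.
        if follow_symlinks:
            if self._stat is None:
                self._stat = os.stat(self.full_name)
            return self._stat
        if self._lstat is None:
            self._lstat = os.lstat(self.full_name)
        return self._lstat

    def _mode(self, follow_symlinks):
        # Like os.path.isdir() and friends: treat an OSError (e.g. a broken
        # symlink when following) as "not that kind of file".
        try:
            return self.stat(follow_symlinks=follow_symlinks).st_mode
        except OSError:
            return 0

    def is_dir(self, follow_symlinks=True):
        return S_ISDIR(self._mode(follow_symlinks))

    def is_file(self, follow_symlinks=True):
        return S_ISREG(self._mode(follow_symlinks))

    def is_symlink(self):
        return S_ISLNK(self._mode(follow_symlinks=False))

    def __repr__(self):
        return f"<DirEntrySketch {self.name!r}>"

def scandir_sketch(path="."):
    """Generator of DirEntrySketch objects, one per name from os.listdir()."""
    for name in os.listdir(path):
        yield DirEntrySketch(path, name)

if __name__ == "__main__":
    # List subdirectories of the current directory, following symlinks.
    for entry in scandir_sketch("."):
        if entry.is_dir():
            print(entry.full_name)
```

Caching the following and non-following results separately mirrors the "cached per entry" wording; a C implementation would instead seed them from the type information that directory-listing calls such as readdir() (d_type) or Windows' FindFirstFile()/FindNextFile() already return.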

The above may not be quite perfect, but it's good, and I think there's
been enough bike-shedding on the API. :-)

So please speak now or forever hold your peace. :-) I intend to update
the PEP to reflect this and make a few other clarifications in the
next few days.


