[Python-Dev] My summary of the scandir (PEP 471)

Paul Moore p.f.moore at gmail.com
Tue Jul 1 23:20:17 CEST 2014

On 1 July 2014 14:00, Ben Hoyt <benhoyt at gmail.com> wrote:
> 2) Nick Coghlan's proposal on the previous thread
> (https://mail.python.org/pipermail/python-dev/2014-June/135261.html)
> suggesting an ensure_lstat keyword param to scandir if you need the
> lstat_result value
> I would make one small tweak to Nick Coghlan's proposal to make
> writing cross-platform code easier. Instead of .lstat_result being
> None sometimes (on POSIX), have it None always unless you specify
> ensure_lstat=True. (Actually, call it get_lstat=True to kind of make
> this more obvious.) Per (b) above, this means Windows developers
> wouldn't accidentally write code which failed on POSIX systems -- it'd
> fail fast on Windows too if you accessed .lstat_result without
> specifying get_lstat=True.

This is getting very complicated (at least to me, as a Windows user,
where the basic idea seems straightforward).

It seems to me that the right model is the standard "thin wrapper
round the OS feature" that acts as a building block - it's typical of
the rest of the os module. I think that thin wrapper is needed - even
if the various bells and whistles are useful, they can be built on top
of a low-level version (whereas the converse is not the case).
Typically, such thin wrappers expose POSIX semantics by default, and
Windows behaviour follows as closely as possible (see for example
stat, where st_ino makes no sense on Windows, but is present). In this
case, we're exposing Windows semantics, and POSIX is the one needing
to fit the model, but the principle is the same.

On that basis, optional attributes (as used in stat results) seem
entirely sensible.
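
To illustrate the stat precedent, here is a minimal sketch of code that
copes with optional attributes on a stat result. The helper name
"describe" is made up for this example. It assumes only that
st_file_attributes is present on Windows alone and st_blocks on some
Unix systems alone, so each is probed with hasattr rather than assumed.

```python
import os

def describe(path):
    """Collect stat fields for path, including optional ones when present."""
    st = os.stat(path)
    # These two fields exist on every platform.
    info = {'st_size': st.st_size, 'st_mtime': st.st_mtime}
    # Platform-specific fields: probe for them instead of assuming.
    for optional in ('st_file_attributes', 'st_blocks'):
        if hasattr(st, optional):
            info[optional] = getattr(st, optional)
    return info
```

Callers that need a platform-specific field check for the key (or use
dict.get) and fall back as they see fit.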

The documentation for DirEntry could easily be written to parallel
that of a stat result:

The return value is an object whose attributes correspond to the data
the OS returns about a directory entry:

  * name - the object's name
  * full_name - the object's full name (including path)
  * is_dir - whether the object is a directory
  * is_file - whether the object is a plain file
  * is_symlink - whether the object is a symbolic link

On Windows, the following attributes are also available:

  * st_size - the size, in bytes, of the object (only meaningful for files)
  * st_atime - time of last access
  * st_mtime - time of last write
  * st_ctime - time of creation
  * st_file_attributes - Windows file attribute bits (see the
    FILE_ATTRIBUTE_* constants in the stat module)

That's no harder to understand (or to work with) than the equivalent
stat result. The only difference is that the unavailable attributes
can still be queried on POSIX; it just takes a separate system call
(with implications for performance, error handling and potential race
conditions).

The version of scandir with the ensure_lstat argument is easy to write
based on one with optional attributes (I'm playing fast and loose with
adding attributes to DirEntry values here, just for the sake of an
example - the details are left as an exercise):

import os

def scandir_ensure(path='.', ensure_lstat=False):
    for entry in os.scandir(path):
        # st_size is missing when the OS did not supply stat data (POSIX)
        if ensure_lstat and not hasattr(entry, 'st_size'):
            stat_data = os.lstat(entry.full_name)
            entry.st_size = stat_data.st_size
            entry.st_atime = stat_data.st_atime
            entry.st_mtime = stat_data.st_mtime
            entry.st_ctime = stat_data.st_ctime
            # Ignore st_file_attributes, as we'll never get here on Windows
        yield entry

Variations on how you handle errors in the lstat call, etc, can be
added to taste.
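
As a rough illustration of one such variation, the sketch below records
a failed lstat on the entry instead of letting the exception escape the
generator. It is runnable without the proposed API: os.listdir and a
SimpleNamespace stand in for scandir and DirEntry, and the names
scandir_ensure_safe and lstat_error are invented for this example.

```python
import os
from types import SimpleNamespace

def scandir_ensure_safe(path='.'):
    """Yield entry-like objects; a failed lstat is stored in lstat_error."""
    for name in os.listdir(path):
        full_name = os.path.join(path, name)
        # Stand-in for the proposed DirEntry (hypothetical attributes).
        entry = SimpleNamespace(name=name, full_name=full_name,
                                lstat_error=None)
        try:
            stat_data = os.lstat(full_name)
        except OSError as exc:
            # E.g. the file vanished between listing and lstat.
            entry.lstat_error = exc
        else:
            entry.st_size = stat_data.st_size
            entry.st_atime = stat_data.st_atime
            entry.st_mtime = stat_data.st_mtime
            entry.st_ctime = stat_data.st_ctime
        yield entry
```

A caller checks entry.lstat_error before touching the st_* attributes;
raising, or silently skipping the entry, are equally easy alternatives.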

Please, let's stick to a low-level wrapper round the OS API for the
first iteration of this feature. Enhancements can be added later, when
real-world usage has proved their value.

