[Python-Dev] My summary of the scandir (PEP 471)

Wed Jul 2 08:35:48 CEST 2014

On 1 July 2014 14:20, Paul Moore <p.f.moore at gmail.com> wrote:
> On 1 July 2014 14:00, Ben Hoyt <benhoyt at gmail.com> wrote:
>> 2) Nick Coghlan's proposal on the previous thread
>> (https://mail.python.org/pipermail/python-dev/2014-June/135261.html)
>> suggesting an ensure_lstat keyword param to scandir if you need the
>> lstat_result value
>>
>> I would make one small tweak to Nick Coghlan's proposal to make
>> writing cross-platform code easier. Instead of .lstat_result being
>> None sometimes (on POSIX), have it None always unless you specify
>> ensure_lstat=True. (Actually, call it get_lstat=True to kind of make
>> this more obvious.) Per (b) above, this means Windows developers
>> wouldn't accidentally write code which failed on POSIX systems -- it'd
>> fail fast on Windows too if you accessed .lstat_result without
>> specifying get_lstat=True.
>
> This is getting very complicated (at least to me, as a Windows user,
> where the basic idea seems straightforward).
>
> It seems to me that the right model is the standard "thin wrapper
> round the OS feature" that acts as a building block - it's typical of
> the rest of the os module. I think that thin wrapper is needed - even
> if the various bells and whistles are useful, they can be built on top
> of a low-level version (whereas the converse is not the case).
> Typically, such thin wrappers expose POSIX semantics by default, and
> Windows behaviour follows as closely as possible (see for example
> stat, where st_ino makes no sense on Windows, but is present). In this
> case, we're exposing Windows semantics, and POSIX is the one needing
> to fit the model, but the principle is the same.
>
> On that basis, optional attributes (as used in stat results) seem
> entirely sensible.
>
> The documentation for DirEntry could easily be written to parallel
> that of a stat result:
>
> """
> The return value is an object whose attributes correspond to the data
> the OS returns about a directory entry:
>
>   * name - the object's name
>   * full_name - the object's full name (including path)
>   * is_dir - whether the object is a directory
>   * is file - whether the object is a plain file
>   * is_symlink - whether the object is a symbolic link
>
> On Windows, the following attributes are also available
>
>   * st_size - the size, in bytes, of the object (only meaningful for files)
>   * st_atime - time of last access
>   * st_mtime - time of last write
>   * st_ctime - time of creation
>   * st_file_attributes - Windows file attribute bits (see the
> FILE_ATTRIBUTE_* constants in the stat module)
> """
>
> That's no harder to understand (or to work with) than the equivalent
> stat result. The only difference is that the unavailable attributes
> can be queried on POSIX, there's just a separate system call involved
> (with implications in terms of performance, error handling and
> potential race conditions).
>
> The version of scandir with the ensure_lstat argument is easy to write
> based on one with optional arguments (I'm playing fast and loose with
> adding attributes to DirEntry values here, just for the sake of an
> example - the details are left as an exercise)
>
> def scandir_ensure(path='.', ensure_lstat=False):
>     for entry in os.scandir(path):
>         if ensure_lstat and not hasattr(entry, 'st_size'):
>             stat_data = os.lstat(entry.full_name)
>             entry.st_size = stat_data.st_size
>             entry.st_atime = stat_data.st_atime
>             entry.st_mtime = stat_data.st_mtime
>             entry.st_ctime = stat_data.st_ctime
>             # Ignore file_attributes, as we'll never get here on Windows
>         yield entry
>
> Variations on how you handle errors in the lstat call, etc, can be
> added to taste.
>
> Please, let's stick to a low-level wrapper round the OS API for the
> first iteration of this feature. Enhancements can be added later, when
> real-world usage has proved their value.

+1 from me - especially if this recipe goes in at least the PEP, and
potentially even the docs.

I'm also OK with postponing onerror support for the time being - that
should be straightforward to add later if we decide we need it.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia