[Python-Dev] Updates to PEP 471, the os.scandir() proposal

Akira Li 4kir4.1i at gmail.com
Thu Jul 10 04:28:09 CEST 2014

Ben Hoyt <benhoyt at gmail.com> writes:
> ``scandir()`` yields a ``DirEntry`` object for each file and directory
> in ``path``. Just like ``listdir``, the ``'.'`` and ``'..'``
> pseudo-directories are skipped, and the entries are yielded in
> system-dependent order. Each ``DirEntry`` object has the following
> attributes and methods:
> * ``name``: the entry's filename, relative to the ``path`` argument
>   (corresponds to the return values of ``os.listdir``)
> * ``full_name``: the entry's full path name -- the equivalent of
>   ``os.path.join(path, entry.name)``

I suggest renaming .full_name -> .path

.full_name might be misleading, e.g., it implies that .full_name ==
abspath(.full_name), which might be false. The .path name has no such
implication.

The semantics of the .path attribute are defined by these assertions::

    for entry in os.scandir(topdir):
        #NOTE: assume os.path.normpath(topdir) is not called to create .path
        assert entry.path == os.path.join(topdir, entry.name)
        assert entry.name == os.path.basename(entry.path)
        assert entry.name == os.path.relpath(entry.path, start=topdir)
        assert os.path.dirname(entry.path) == topdir
        assert (entry.path != os.path.abspath(entry.path) or
                os.path.isabs(topdir)) # it is absolute only if topdir is
        assert (entry.path != os.path.realpath(entry.path) or
                topdir == os.path.realpath(topdir)) # symlinks are not resolved
        assert (entry.path != os.path.normcase(entry.path) or
                topdir == os.path.normcase(topdir)) # no case-folding,
                                                    # unlike PureWindowsPath

> * ``is_dir()``: like ``os.path.isdir()``, but much cheaper -- it never
>   requires a system call on Windows, and usually doesn't on POSIX
>   systems

I suggest documenting the implicit follow_symlinks parameter for .is_X methods.

Note: lstat == partial(stat, follow_symlinks=False).

In particular, .is_dir() should probably use follow_symlinks=True by
default, as suggested by Victor Stinner, *if .is_dir() does it on Windows*.

MSDN says: GetFileAttributes() does not follow symlinks.

os.path.isdir docs imply follow_symlinks=True: "both islink() and
isdir() can be true for the same path."
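To make the follow_symlinks distinction concrete, here is a minimal sketch. The helper name ``is_dir`` and the temporary directory layout are illustrative assumptions, not part of the PEP; only os.stat(), os.path.isdir() and os.path.islink() are existing APIs. It assumes a platform where unprivileged symlink creation works.

```python
import os
import stat
import tempfile


def is_dir(path, follow_symlinks=True):
    """Hypothetical helper: directory test with an explicit follow_symlinks switch."""
    st = os.stat(path, follow_symlinks=follow_symlinks)
    return stat.S_ISDIR(st.st_mode)


def demo():
    """Create a directory plus a symlink to it and report how each test sees the link."""
    with tempfile.TemporaryDirectory() as top:
        target = os.path.join(top, "real_dir")
        link = os.path.join(top, "link_to_dir")
        os.mkdir(target)
        # May require extra privileges on Windows.
        os.symlink(target, link, target_is_directory=True)
        return {
            "isdir": os.path.isdir(link),    # follows the symlink
            "islink": os.path.islink(link),  # does not follow it
            "follow": is_dir(link, follow_symlinks=True),
            "no_follow": is_dir(link, follow_symlinks=False),
        }
```

For a symlink that points to a directory, the two settings give different answers, which is why the default is worth documenting.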

> Like the other functions in the ``os`` module, ``scandir()`` accepts
> either a bytes or str object for the ``path`` parameter, and returns
> the ``DirEntry.name`` and ``DirEntry.full_name`` attributes with the
> same type as ``path``. However, it is *strongly recommended* to use
> the str type, as this ensures cross-platform support for Unicode
> filenames.

Document when {e.name for e in os.scandir(path)} != set(os.listdir(path))

e.g., path can be an open file descriptor in os.listdir(path) since
Python 3.3 but the PEP doesn't mention it explicitly.
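For reference, a small sketch of the os.listdir() behaviour mentioned above: listing a directory through an open file descriptor. The function name ``listdir_via_fd`` is made up for illustration, and the check against os.supports_fd is there because not every platform allows this.

```python
import os


def listdir_via_fd(path):
    """List a directory through an open file descriptor, where the platform allows it.

    Returns None if os.listdir() does not accept file descriptors here.
    """
    if os.listdir not in os.supports_fd:
        return None
    fd = os.open(path, os.O_RDONLY)
    try:
        return sorted(os.listdir(fd))
    finally:
        os.close(fd)
```

Note that in this mode there is no directory path string available to join with each name, which is the root of the concern about entry.path.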

It has been discussed already e.g.,

PEP 471 should explicitly reject support for specifying a file
descriptor, so that code that uses os.scandir may assume that the
entry.path (.full_name) attribute is always present (no exceptions due
to a failure to read /proc/self/fd/NNN or an error while calling
fcntl(F_GETPATH) or GetFileInformationByHandleEx() -- see
http://stackoverflow.com/q/1188757 ).

Reject explicitly in PEP 471 the support for dir_fd parameter

aka the support for paths relative to directory descriptors.

Note: it is a *different* (but related) issue.

> Notes on exception handling
> ---------------------------
> ``DirEntry.is_X()`` and ``DirEntry.lstat()`` are explicitly methods
> rather than attributes or properties, to make it clear that they may
> not be cheap operations, and they may do a system call. As a result,
> these methods may raise ``OSError``.
> For example, ``DirEntry.lstat()`` will always make a system call on
> POSIX-based systems, and the ``DirEntry.is_X()`` methods will make a
> ``stat()`` system call on such systems if ``readdir()`` returns a
> ``d_type`` with a value of ``DT_UNKNOWN``, which can occur under
> certain conditions or on certain file systems.
> For this reason, when a user requires fine-grained error handling,
> it's good to catch ``OSError`` around these method calls and then
> handle as appropriate.

I suggest documenting that next(os.scandir()) may raise OSError

e.g., on POSIX it may happen due to an OS error in opendir/readdir/closedir.

Also, document whether os.scandir() itself may raise OSError (whether
opendir or other OS functions may be called before the first yield).
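Until that is documented, calling code can guard both places. A minimal sketch, assuming only that os.scandir() returns an iterator of entries with a .name attribute; the function name ``scan_names`` is hypothetical, and the getattr() check is there because a .close() method on the iterator is not something the PEP text quoted here guarantees.

```python
import os


def scan_names(path):
    """Collect entry names; report an OSError from the call or from iteration.

    Returns (names_collected_so_far, error_or_None).
    """
    names = []
    try:
        entries = os.scandir(path)       # an OSError may surface here...
        try:
            for entry in entries:        # ...or here, on any next() call
                names.append(entry.name)
        finally:
            # Release the directory handle if the iterator offers close().
            close = getattr(entries, "close", None)
            if close is not None:
                close()
    except OSError as exc:
        return names, exc
    return names, None
```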

os.scandir() should allow explicit cleanup::

    from contextlib import closing

    with closing(os.scandir()) as entries:
        for _ in entries:
            break

entries.close() is then called, which frees the resources if necessary, to
*avoid relying on garbage collection for managing file descriptors*.
(Check whether this is consistent with the .close() method from the
generator protocol; e.g., it might already be called on exit from the
loop, whether or not an exception happens, without requiring the
with-statement (I don't know).) *It should be possible to limit the
resource lifetime on non-refcounting Python implementations.*

The os.scandir() object may support the context manager protocol explicitly::

    with os.scandir() as entries:
        for _ in entries:
            break

``.__exit__`` method may just call ``.close`` method.
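A rough sketch of what that could look like, written as a wrapper class rather than a change to os.scandir() itself. ``ClosingScandir`` and its ``closed`` flag are hypothetical names for illustration; the underlying iterator's .close() is looked up with getattr() since its existence is exactly what is being proposed here.

```python
import os


class ClosingScandir:
    """Hypothetical wrapper: an os.scandir() iterator that is also a context manager."""

    def __init__(self, path="."):
        self._entries = os.scandir(path)
        self.closed = False

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._entries)

    def close(self):
        """Free the directory handle once; further calls are no-ops."""
        if not self.closed:
            close = getattr(self._entries, "close", None)
            if close is not None:
                close()
            self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()   # __exit__ just delegates to close()
        return False   # do not swallow exceptions
```

With this shape, leaving the with-block early (break, return, or an exception) still releases the handle without waiting for garbage collection.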

> Rejected ideas
> ==============
> Naming
> ------
> The only other real contender for this function's name was
> ``iterdir()``. However, ``iterX()`` functions in Python (mostly found
> in Python 2) tend to be simple iterator equivalents of their
> non-iterator counterparts. For example, ``dict.iterkeys()`` is just an
> iterator version of ``dict.keys()``, but the objects returned are
> identical. In ``scandir()``'s case, however, the return values are
> quite different objects (``DirEntry`` objects vs filename strings), so
> this should probably be reflected by a difference in name -- hence
> ``scandir()``.
> See some `relevant discussion on python-dev
> <https://mail.python.org/pipermail/python-dev/2014-June/135228.html>`_.

- The os.scandir() name is inconsistent with the pathlib module.
  pathlib.Path has an ``.iterdir()`` method
  that generates Path instances, i.e., the argument that iterdir()
  should return strings is not valid.

- The os.scandir() name conflicts with POSIX. POSIX already has a
  scandir() function. Most functions in the os module are thin wrappers
  around their corresponding POSIX analogs.

In principle, POSIX scandir(path, &entries, sel, compar) is emulated by::

    entries = sorted(filter(sel, os.scandir(path)),
                     key=functools.cmp_to_key(compar))

so that the above code snippet could be provided in the docs. We may
say that os.scandir is a pythonic analog of the POSIX function and
therefore there is no conflict even if os.scandir doesn't use POSIX
scandir function in its implementation. If we can't say that, then a
*different name/module should be used to allow adding POSIX-compatible
os.scandir() in the future*.
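As a runnable illustration of the emulation described above, here is a small sketch. The function name ``scandir_sorted`` is made up, and treating a missing ``sel`` or ``compar`` like a NULL pointer in the C call (select everything, leave the order unsorted) is an assumption of this sketch rather than anything the PEP specifies.

```python
import functools
import os


def scandir_sorted(path, sel=None, compar=None):
    """Emulate POSIX scandir(path, &entries, sel, compar) on top of os.scandir().

    sel    -- predicate on an entry; None keeps every entry
    compar -- old-style comparison function on two entries; None means unsorted
    """
    entries = filter(sel, os.scandir(path))
    if compar is None:
        return list(entries)
    return sorted(entries, key=functools.cmp_to_key(compar))
```

Usage might look like this, with an alphasort-style comparison on names::

    def by_name(a, b):
        return (a.name > b.name) - (a.name < b.name)

    texts = scandir_sorted(".", sel=lambda e: e.name.endswith(".txt"),
                           compar=by_name)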

