[Python-Dev] PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None)

Victor Stinner victor.stinner at gmail.com
Tue Jul 1 09:44:02 CEST 2014


Hi,

IMO we must decide if scandir() must support or not file descriptor.
It's an important decision which has an important impact on the API.


To support scandir(fd), the minimum is to store dir_fd in DirEntry:
dir_fd would be None for scandir(str).


scandir(fd) must not close the file descriptor, it should be done by
the caller. Handling the lifetime of the file descriptor is a
difficult problem, it's better to let the user decide how to handle
it.

There is the problem of the limit of open file descriptors, usually
1024 but it can be lower. It *can* be an issue for very deep file
hierarchy.

If we choose to support scandir(fd), it's probably safer to not use
scandir(fd) by default in os.walk() (use scandir(str) instead), wait
until the feature is well tested, corner cases are well known, etc.


The second step is to enhance pathlib.Path to support an optional file
descriptor. Path already has methods on filenames like chmod(),
exists(), rename(), etc.


Example:

fd = os.open(path, os.O_DIRECTORY)
try:
   for entry in os.scandir(fd):
      # ... use entry to benefit of entry cache: is_dir(), lstat_result ...
      path = pathlib.Path(entry.name, dir_fd=entry.dir_fd)
      # ... use path which uses dir_fd ...
finally:
    os.close(fd)

Problem: if the path object is stored somewhere and use after the
loop, Path methods will fail because dir_fd was closed. It's even
worse if a new directory uses the same file descriptor :-/ (security
issue, or at least tricky bugs!)

Victor


More information about the Python-Dev mailing list