[Python-Dev] PEP 471 "scandir" accepted
4kir4.1i at gmail.com
Wed Jul 23 01:21:14 CEST 2014
Ben Hoyt <benhoyt at gmail.com> writes:
>> Note: listdir() accepts an integer path (an open file descriptor that
>> refers to a directory) that is passed to fdopendir() on POSIX  i.e.,
>> *you can't use scandir() to replace listdir() in this case* (as I've
>> already mentioned in ). See the corresponding tests from .
>>  https://mail.python.org/pipermail/python-dev/2014-July/135296.html
>>  https://mail.python.org/pipermail/python-dev/2014-June/135265.html
>> From os.listdir() docs :
>>> This function can also support specifying a file descriptor; the file
>>> descriptor must refer to a directory.
>>  https://docs.python.org/3.4/library/os.html#os.listdir
>>  http://hg.python.org/cpython/file/3.4/Modules/posixmodule.c#l3736
> Fair point.
> Yes, I hadn't realized listdir supported dir_fd (must have been
> looking at 2.x docs), though you've pointed it out at  above. and I
> guess I wasn't thinking about implementation at the time.
FYI, dir_fd is related but *different*: compare "specifying a file
descriptor"  vs. "paths relative to directory descriptors" .
"NOTE: os.supports_fd and os.supports_dir_fd are different sets." :
    >>> import os
    >>> os.listdir in os.supports_fd
    True
    >>> os.listdir in os.supports_dir_fd
    False
To be clear: *listdir() does not support dir_fd* though it can be
emulated using os.open(dir_fd=..).
You can safely ignore the rest of the e-mail until you want to implement
path-fd  support for os.scandir() in several months.
Here's a code example that demonstrates both path-fd  and dir-fd :
    import contextlib
    import os
    with contextlib.ExitStack() as stack:
        dir_fd = os.open('/etc', os.O_RDONLY)
        stack.callback(os.close, dir_fd)
        fd = os.open('init.d', os.O_RDONLY, dir_fd=dir_fd) # dir-fd 
        stack.callback(os.close, fd)
        print("\n".join(os.listdir(fd))) # path-fd 
It is the same as os.listdir('/etc/init.d') unless '/etc' is symlinked
to refer to another directory after the first os.open('/etc',..)
call. See also, os.fwalk(dir_fd=..) 
> However, given that we have to support this for listdir() anyway, I
> think it's worth reconsidering whether scandir()'s directory argument
> can be an integer FD.
What is entry.path in this case? If the input directory is a file
descriptor (an integer) then os.path.join(directory, entry.name) won't work.
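A quick sketch of the problem (join_or_none is a hypothetical helper name used only for illustration, it is not part of os): joining an integer descriptor with an entry name fails, so there is no obvious value for entry.path.

```python
import os

def join_or_none(directory, name):
    """Hypothetical helper: try to build entry.path the way a path-based scandir() would."""
    try:
        return os.path.join(directory, name)
    except (TypeError, AttributeError):
        # An int file descriptor is not a path; the exact exception type
        # depends on the Python version.
        return None

print(join_or_none('/etc', 'init.d'))  # a normal string path joins fine
print(join_or_none(3, 'init.d'))       # an integer "directory" cannot be joined
```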
"PEP 471 should explicitly reject the support for specifying a file
descriptor so that a code that uses os.scandir may assume that
entry.path attribute is always present (no exceptions due
to a failure to read /proc/self/fd/NNN or an error while calling
fcntl(F_GETPATH) or GetFileInformationByHandleEx() -- see
http://stackoverflow.com/q/1188757 )." 
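To illustrate why recovering a path from a descriptor is fragile, here is a minimal sketch that assumes a Linux-style /proc filesystem (path_from_fd is a hypothetical name). It can fail whenever /proc is unavailable, and the result may already be stale if the directory was renamed after it was opened.

```python
import os

def path_from_fd(fd):
    """Best-effort guess at the path behind an open descriptor (assumes Linux /proc)."""
    try:
        return os.readlink('/proc/self/fd/%d' % fd)
    except OSError:
        # /proc not mounted, not Linux, or the descriptor is invalid
        return None

fd = os.open(os.curdir, os.O_RDONLY)
try:
    print(path_from_fd(fd))
finally:
    os.close(fd)
```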
On the other hand os.fwalk()  that supports both path-fd  and
dir-fd  could be implemented without entry.path property if
os.scandir() supports just path-fd . os.fwalk() provides a safe way
to traverse a directory tree without symlink races e.g., :
    def get_tree_size(directory):
        """Return total size of files in directory and subdirs."""
        return sum(entry.lstat().st_size
                   for root, dirs, files, rootfd in fwalk(directory)
                   for entry in files)
where fwalk() is the exact copy of os.fwalk() except that it uses
_fwalk() which is defined in terms of scandir():
    # adapt os._fwalk() to use scandir() instead of os.listdir()
    def _fwalk(topfd, toppath, topdown, onerror, follow_symlinks):
        # Note: This uses O(depth of the directory tree) file descriptors:
        # if necessary, it can be adapted to only require O(1) FDs, see
        entries = scandir(topfd)
        dirs, nondirs = [], []
        for entry in entries: #XXX call onerror on OSError on next() and return?
            # report symlinks to directories as directories (like os.walk)
            # but no recursion into symlinked subdirectories unless
            # follow_symlinks is true
            # add dangling symlinks as nondirs (DirEntry.is_dir() doesn't
            # raise on broken links)
            try:
                (dirs if entry.is_dir() else nondirs).append(entry)
            except FileNotFoundError:
                continue # ignore disappeared files

        if topdown:
            yield toppath, dirs, nondirs, topfd

        for entry in dirs:
            try:
                orig_st = entry.stat(follow_symlinks=follow_symlinks)
                #XXX O_DIRECTORY, O_CLOEXEC, [? O_NOCTTY, O_SEARCH ?]
                dirfd = os.open(entry.name, os.O_RDONLY, dir_fd=topfd)
            except OSError as err:
                if onerror is not None:
                    onerror(err)
                return
            try:
                if follow_symlinks or os.path.samestat(orig_st, os.stat(dirfd)):
                    dirpath = os.path.join(toppath, entry.name) # entry.path
                    yield from _fwalk(dirfd, dirpath, topdown, onerror,
                                      follow_symlinks)
            finally:
                os.close(dirfd) # or use with entry.opendir() as dirfd: ...

        if not topdown:
            yield toppath, dirs, nondirs, topfd
i.e., if os.scandir() supports specifying file descriptors  then it
is relatively straightforward to define os.fwalk() in terms of it. Would
scandir() provide the same performance benefits as for os.walk()?
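For comparison, the same tree-size idea can be written against the existing os.fwalk(), which yields plain names rather than entry objects. This is only a sketch (tree_size is a hypothetical name); it needs one extra os.stat() call per file, which is the cost a scandir()-based fwalk() might avoid on platforms where the directory listing already carries that information.

```python
import os

def tree_size(top):
    """Sum the sizes of non-directory entries under top, without following symlinks."""
    total = 0
    for root, dirs, files, rootfd in os.fwalk(top):
        for name in files:
            try:
                # stat relative to the directory descriptor, not a path string
                total += os.stat(name, dir_fd=rootfd,
                                 follow_symlinks=False).st_size
            except FileNotFoundError:
                pass  # the file disappeared between listing and stat
    return total

print(tree_size(os.curdir))
```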
entry.stat() can be implemented without entry.path when entry._directory
(or whatever other DirEntry's attribute that stores the first parameter
to os.scandir(fd)) is an open file descriptor that refers to a directory:
    def stat(self, *, follow_symlinks=True):
        return os.stat(self.name, #NOTE: ignore caching
                       follow_symlinks=follow_symlinks,
                       dir_fd=self._directory)
    lstat = lambda self: self.stat(follow_symlinks=False)
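A self-contained way to try this idea out, assuming a POSIX platform where os.stat() accepts dir_fd. FdEntry is a hypothetical stand-in written for illustration, not the real DirEntry, and like the snippet above it ignores caching.

```python
import os
import tempfile

class FdEntry:
    """Hypothetical stand-in for DirEntry when scandir() is given a descriptor."""

    def __init__(self, name, directory_fd):
        self.name = name
        self._directory = directory_fd  # open fd that refers to the directory

    def stat(self, *, follow_symlinks=True):
        # No path string needed: resolve name relative to the descriptor.
        return os.stat(self.name, dir_fd=self._directory,
                       follow_symlinks=follow_symlinks)

    def lstat(self):
        return self.stat(follow_symlinks=False)

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, 'data.txt'), 'w') as f:
        f.write('hello')
    fd = os.open(tmp, os.O_RDONLY)
    try:
        entry = FdEntry('data.txt', fd)
        print(entry.stat().st_size, entry.lstat().st_size)
    finally:
        os.close(fd)
```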