[Python-Dev] PEP 471 "scandir" accepted
Akira Li
4kir4.1i at gmail.com
Wed Jul 23 01:21:14 CEST 2014
Ben Hoyt <benhoyt at gmail.com> writes:
>> Note: listdir() accepts an integer path (an open file descriptor that
>> refers to a directory) that is passed to fdopendir() on POSIX [4] i.e.,
>> *you can't use scandir() to replace listdir() in this case* (as I've
>> already mentioned in [1]). See the corresponding tests from [2].
>>
>> [1] https://mail.python.org/pipermail/python-dev/2014-July/135296.html
>> [2] https://mail.python.org/pipermail/python-dev/2014-June/135265.html
>>
>> From os.listdir() docs [3]:
>>
>>> This function can also support specifying a file descriptor; the file
>>> descriptor must refer to a directory.
>>
>> [3] https://docs.python.org/3.4/library/os.html#os.listdir
>> [4] http://hg.python.org/cpython/file/3.4/Modules/posixmodule.c#l3736
>
> Fair point.
>
> Yes, I hadn't realized listdir supported dir_fd (must have been
> looking at 2.x docs), though you've pointed it out at [1] above. and I
> guess I wasn't thinking about implementation at the time.
FYI, dir_fd is related but *different*: compare "specifying a file
descriptor" [1] vs. "paths relative to directory descriptors" [2].
"NOTE: os.supports_fd and os.supports_dir_fd are different sets." [3]:
>>> import os
>>> os.listdir in os.supports_fd
True
>>> os.listdir in os.supports_dir_fd
False
[1] https://docs.python.org/3/library/os.html#path-fd
[2] https://docs.python.org/3/library/os.html#dir-fd
[3] https://mail.python.org/pipermail/python-dev/2014-July/135296.html
To be clear: *listdir() does not support dir_fd* though it can be
emulated using os.open(dir_fd=..).
You can safely ignore the rest of the e-mail until you want to implement
path-fd [1] support for os.scandir() in several months.
Here's a code example that demonstrates both path-fd [1] and dir-fd [2]:
    import contextlib
    import os

    with contextlib.ExitStack() as stack:
        dir_fd = os.open('/etc', os.O_RDONLY)
        stack.callback(os.close, dir_fd)
        fd = os.open('init.d', os.O_RDONLY, dir_fd=dir_fd) # dir-fd [2]
        stack.callback(os.close, fd)
        print("\n".join(os.listdir(fd))) # path-fd [1]
It is the same as os.listdir('/etc/init.d') unless '/etc' is symlinked
to refer to another directory after the first os.open('/etc',..)
call. See also os.fwalk(dir_fd=..) [4].
[4] https://docs.python.org/3/library/os.html#os.fwalk
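As a rough illustration of os.fwalk(dir_fd=..), here is a minimal sketch that lists files below a subdirectory looked up relative to an already-open directory descriptor. The helper name list_files_at() and the throwaway directory layout are made up for this example; it assumes a POSIX platform where os.fwalk is available.

```python
import os
import tempfile

def list_files_at(dir_fd, top):
    """Return sorted paths of files under `top`, resolved relative to dir_fd."""
    found = []
    # dir-fd [2]: `top` is looked up relative to the open directory dir_fd
    for root, dirs, files, rootfd in os.fwalk(top, dir_fd=dir_fd):
        found.extend(os.path.join(root, name) for name in files)
    return sorted(found)

# Demo on a throwaway tree: base/sub/a and base/sub/inner/b
with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, 'sub', 'inner'))
    for rel in ('sub/a', 'sub/inner/b'):
        with open(os.path.join(base, rel), 'w'):
            pass
    base_fd = os.open(base, os.O_RDONLY)
    try:
        print(list_files_at(base_fd, 'sub'))
    finally:
        os.close(base_fd)
```

The returned paths are relative to `top`, not absolute, because the walk never learns where dir_fd lives in the filesystem.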
> However, given that we have to support this for listdir() anyway, I
> think it's worth reconsidering whether scandir()'s directory argument
> can be an integer FD.
What is entry.path in this case? If the input directory is a file
descriptor (an integer), then os.path.join(directory, entry.name) won't work.
"PEP 471 should explicitly reject the support for specifying a file
descriptor so that a code that uses os.scandir may assume that
entry.path attribute is always present (no exceptions due
to a failure to read /proc/self/fd/NNN or an error while calling
fcntl(F_GETPATH) or GetFileInformationByHandleEx() -- see
http://stackoverflow.com/q/1188757 )." [5]
[5] https://mail.python.org/pipermail/python-dev/2014-July/135441.html
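To make the entry.path problem concrete, here is a small sketch. entry_path() is a hypothetical helper that builds the path the way described above, with os.path.join(directory, name); the integer 3 merely stands in for an open directory descriptor.

```python
import os

def entry_path(directory, name):
    """Build entry.path as os.path.join(directory, name)."""
    return os.path.join(directory, name)

print(entry_path('/etc', 'init.d'))   # a string directory works
try:
    entry_path(3, 'init.d')           # an integer fd has no path to join
except TypeError as exc:
    print('TypeError:', exc)
```

os.path.join() rejects the integer with TypeError, so a descriptor-based scandir() would have to define entry.path some other way or leave it out.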
On the other hand, os.fwalk() [4], which supports both path-fd [1] and
dir-fd [2], could be implemented without the entry.path property if
os.scandir() supports just path-fd [1]. os.fwalk() provides a safe way
to traverse a directory tree without symlink races, e.g., [6]:
    def get_tree_size(directory):
        """Return total size of files in directory and subdirs."""
        return sum(entry.lstat().st_size
                   for root, dirs, files, rootfd in fwalk(directory)
                   for entry in files)
[6] http://legacy.python.org/dev/peps/pep-0471/#examples
where fwalk() is an exact copy of os.fwalk() except that it uses
_fwalk(), which is defined in terms of scandir():
    import os

    # adapt os._fwalk() to use scandir() instead of os.listdir()
    # (assumes os.scandir() accepts an open directory descriptor, path-fd [1])
    def _fwalk(topfd, toppath, topdown, onerror, follow_symlinks):
        # Note: This uses O(depth of the directory tree) file descriptors:
        # if necessary, it can be adapted to only require O(1) FDs, see
        # http://bugs.python.org/issue13734
        entries = os.scandir(topfd)
        dirs, nondirs = [], []
        for entry in entries: #XXX call onerror on OSError on next() and return?
            # report symlinks to directories as directories (like os.walk)
            # but no recursion into symlinked subdirectories unless
            # follow_symlinks is true
            # add dangling symlinks as nondirs (DirEntry.is_dir() doesn't
            # raise on broken links)
            try:
                (dirs if entry.is_dir() else nondirs).append(entry)
            except FileNotFoundError:
                continue # ignore disappeared files
        if topdown:
            yield toppath, dirs, nondirs, topfd
        for entry in dirs:
            try:
                orig_st = entry.stat(follow_symlinks=follow_symlinks)
                #XXX O_DIRECTORY, O_CLOEXEC, [? O_NOCTTY, O_SEARCH ?]
                dirfd = os.open(entry.name, os.O_RDONLY, dir_fd=topfd)
            except OSError as err:
                if onerror is not None:
                    onerror(err)
                return
            try:
                if follow_symlinks or os.path.samestat(orig_st, os.stat(dirfd)):
                    dirpath = os.path.join(toppath, entry.name) # entry.path
                    yield from _fwalk(dirfd, dirpath, topdown, onerror,
                                      follow_symlinks)
            finally:
                os.close(dirfd) # or use with entry.opendir() as dirfd: ...
        if not topdown:
            yield toppath, dirs, nondirs, topfd
i.e., if os.scandir() supports specifying file descriptors [1], then it
is relatively straightforward to define os.fwalk() in terms of it. Would
scandir() give os.fwalk() the same performance benefits as it gives os.walk()?
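For a feel of what descriptor-based scanning looks like in practice, here is a compact sketch that sums file sizes using only directory descriptors. It assumes an interpreter where os.scandir() accepts an open directory descriptor (Python 3.7 and later on Unix); tree_size_fd() is a hypothetical name, and the O_DIRECTORY | O_NOFOLLOW flags are one possible choice, not something the discussion above settles.

```python
import os
import tempfile

def tree_size_fd(topfd):
    """Sum sizes of non-directory entries below the directory open as topfd."""
    total = 0
    with os.scandir(topfd) as entries:          # path-fd [1]
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                # dir-fd [2]: open the subdirectory relative to topfd and
                # refuse to follow a symlink swapped in after is_dir()
                subfd = os.open(entry.name,
                                os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW,
                                dir_fd=topfd)
                try:
                    total += tree_size_fd(subfd)
                finally:
                    os.close(subfd)
            else:
                total += entry.stat(follow_symlinks=False).st_size
    return total

# Demo: one 5-byte file at the top, one 3-byte file in a subdirectory
with tempfile.TemporaryDirectory() as base:
    os.mkdir(os.path.join(base, 'sub'))
    with open(os.path.join(base, 'a'), 'wb') as f:
        f.write(b'12345')
    with open(os.path.join(base, 'sub', 'b'), 'wb') as f:
        f.write(b'123')
    fd = os.open(base, os.O_RDONLY)
    try:
        print(tree_size_fd(fd))
    finally:
        os.close(fd)
```

Like the _fwalk() adaptation, this keeps one descriptor open per directory level, so it uses O(depth) descriptors.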
entry.stat() can be implemented without entry.path when entry._directory
(or whichever DirEntry attribute stores the first parameter to
os.scandir(fd)) is an open file descriptor that refers to a directory:
    def stat(self, *, follow_symlinks=True):
        return os.stat(self.name, #NOTE: ignore caching
                       follow_symlinks=follow_symlinks, dir_fd=self._directory)
    lstat = lambda self: self.stat(follow_symlinks=False)
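Putting those methods into a class gives something like the sketch below. FdDirEntry is a hypothetical stand-in, not the real DirEntry, and the per-flag cache is just one guess at how the caching ignored above might be added.

```python
import os
import tempfile

class FdDirEntry:
    """Hypothetical DirEntry-like object for a directory scanned by fd."""

    def __init__(self, directory_fd, name):
        self._directory = directory_fd   # open fd of the scanned directory
        self.name = name
        self._stat_cache = {}            # keyed by follow_symlinks

    def stat(self, *, follow_symlinks=True):
        # Look the name up relative to the directory fd; no entry.path needed
        if follow_symlinks not in self._stat_cache:
            self._stat_cache[follow_symlinks] = os.stat(
                self.name, dir_fd=self._directory,
                follow_symlinks=follow_symlinks)
        return self._stat_cache[follow_symlinks]

    def lstat(self):
        return self.stat(follow_symlinks=False)

# Demo: stat a 5-byte file through its directory's descriptor
with tempfile.TemporaryDirectory() as base:
    with open(os.path.join(base, 'f'), 'wb') as f:
        f.write(b'12345')
    fd = os.open(base, os.O_RDONLY)
    try:
        print(FdDirEntry(fd, 'f').lstat().st_size)
    finally:
        os.close(fd)
```

The entry is only usable while the directory descriptor stays open, which is a lifetime constraint that a path-based DirEntry does not have.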
--
Akira