
Hi, IMO we must decide if scandir() must support or not file descriptor. It's an important decision which has an important impact on the API. To support scandir(fd), the minimum is to store dir_fd in DirEntry: dir_fd would be None for scandir(str). scandir(fd) must not close the file descriptor, it should be done by the caller. Handling the lifetime of the file descriptor is a difficult problem, it's better to let the user decide how to handle it. There is the problem of the limit of open file descriptors, usually 1024 but it can be lower. It *can* be an issue for very deep file hierarchy. If we choose to support scandir(fd), it's probably safer to not use scandir(fd) by default in os.walk() (use scandir(str) instead), wait until the feature is well tested, corner cases are well known, etc. The second step is to enhance pathlib.Path to support an optional file descriptor. Path already has methods on filenames like chmod(), exists(), rename(), etc. Example: fd = os.open(path, os.O_DIRECTORY) try: for entry in os.scandir(fd): # ... use entry to benefit of entry cache: is_dir(), lstat_result ... path = pathlib.Path(entry.name, dir_fd=entry.dir_fd) # ... use path which uses dir_fd ... finally: os.close(fd) Problem: if the path object is stored somewhere and use after the loop, Path methods will fail because dir_fd was closed. It's even worse if a new directory uses the same file descriptor :-/ (security issue, or at least tricky bugs!) Victor

Thanks, Victor. I don't have any experience with dir_fd handling, so unfortunately can't really comment here. What advantages does it bring? I notice that even os.listdir() on Python 3.4 doesn't have anything related to file descriptors, so I'd be in favour of not including support. We can always add it later. -Ben On Tue, Jul 1, 2014 at 3:44 AM, Victor Stinner <victor.stinner@gmail.com> wrote:

2014-07-01 14:26 GMT+02:00 Ben Hoyt <benhoyt@gmail.com>:
See https://docs.python.org/dev/library/os.html#dir-fd The idea is to make sure that you get files from the same directory. Problems occur when a directory is moved or a symlink is modified. Example: - you're browsing /tmp/test/x as root (!), /tmp/copy/passwd is owned by www user (website) - you would like to remove the file "x": call unlink("/tmp/copy/passwd") - ... but just before that, an attacker replaces the /tmp/copy directory with a symlink to /etc - you will remove /etc/passwd instead of /tmp/copy/passwd, oh oh Using unlink("passwd", dir_fd=tmp_copy_fd), you don't have this issue. You are sure that you are working in /tmp/copy directory. You can imagine a lot of other scenarios to override files and read sensitive files. Hopefully, the Linux rm commands knows unlinkat() sycall ;-) haypo@selma$ mkdir -p a/b/c haypo@selma$ strace -e unlinkat rm -rf a unlinkat(5, "c", AT_REMOVEDIR) = 0 unlinkat(4, "b", AT_REMOVEDIR) = 0 unlinkat(AT_FDCWD, "a", AT_REMOVEDIR) = 0 +++ exited with 0 +++ We should implement a similar think in shutil.rmtree(). See also os.fwalk() which is a version of os.walk() providing dir_fd.
We can always add it later.
I would prefer to discuss that right now. My proposition is to accept an int for scandir() and copy the int into DirEntry.dir_fd. It's not that complex :-) The enhancement of the pathlib module can be done later. By the way, I know that Antoine Pitrou wanted to implemented file descriptors in pathlib, but the feature was rejected or at least delayed. Victor

Ben Hoyt <benhoyt@gmail.com> writes:
FYI, os.listdir does support file descriptors in Python 3.3+ try:
import os os.listdir(os.open('.', os.O_RDONLY))
NOTE: os.supports_fd and os.supports_dir_fd are different sets. See also, https://mail.python.org/pipermail/python-dev/2014-June/135265.html -- Akira P.S. Please, don't put your answer on top of the message you are replying to.

2014-07-01 8:44 GMT+01:00 Victor Stinner <victor.stinner@gmail.com>:
IMO we must decide if scandir() must support or not file descriptor. It's an important decision which has an important impact on the API.
I don't think we should support it: it's way too complicated to use, error-prone, and leads to messy APIs.

2014-07-02 12:51 GMT+02:00 Charles-François Natali <cf.natali@gmail.com>:
I don't think we should support it: it's way too complicated to use, error-prone, and leads to messy APIs.
Can you please elaborate? Which kind of issue do you see? Handling the lifetime of the directory file descriptor? You don't like the dir_fd parameter of os functions? I don't have an opinion of supporting scandir(int). I asked to discuss it in the PEP directly. Victor

Yes, among other things. You can e.g. have a look at os.fwalk() or shutil._rmtree_safe_fd() to see that using those *properly* is far from being trivial.
You don't like the dir_fd parameter of os functions?
Exactly, I think it complicates the API for little benefit (FWIW, no other language I know of exposes them).

Am 01.07.14 09:44, schrieb Victor Stinner:
This is an open issue still: when is the file descriptor closed. I think the generator returned from scandir needs to support a .close method that guarantees to close the file descriptor. AFAICT, the pure-Python prototype of scandir already does, but it should be specified in the PEP. While we are at it: is it intended that the generator will also support the other generator methods, in particular .send and .throw? Regards, Martin

Thanks, Victor. I don't have any experience with dir_fd handling, so unfortunately can't really comment here. What advantages does it bring? I notice that even os.listdir() on Python 3.4 doesn't have anything related to file descriptors, so I'd be in favour of not including support. We can always add it later. -Ben On Tue, Jul 1, 2014 at 3:44 AM, Victor Stinner <victor.stinner@gmail.com> wrote:

2014-07-01 14:26 GMT+02:00 Ben Hoyt <benhoyt@gmail.com>:
See https://docs.python.org/dev/library/os.html#dir-fd The idea is to make sure that you get files from the same directory. Problems occur when a directory is moved or a symlink is modified. Example: - you're browsing /tmp/test/x as root (!), /tmp/copy/passwd is owned by www user (website) - you would like to remove the file "x": call unlink("/tmp/copy/passwd") - ... but just before that, an attacker replaces the /tmp/copy directory with a symlink to /etc - you will remove /etc/passwd instead of /tmp/copy/passwd, oh oh Using unlink("passwd", dir_fd=tmp_copy_fd), you don't have this issue. You are sure that you are working in /tmp/copy directory. You can imagine a lot of other scenarios to override files and read sensitive files. Hopefully, the Linux rm commands knows unlinkat() sycall ;-) haypo@selma$ mkdir -p a/b/c haypo@selma$ strace -e unlinkat rm -rf a unlinkat(5, "c", AT_REMOVEDIR) = 0 unlinkat(4, "b", AT_REMOVEDIR) = 0 unlinkat(AT_FDCWD, "a", AT_REMOVEDIR) = 0 +++ exited with 0 +++ We should implement a similar think in shutil.rmtree(). See also os.fwalk() which is a version of os.walk() providing dir_fd.
We can always add it later.
I would prefer to discuss that right now. My proposition is to accept an int for scandir() and copy the int into DirEntry.dir_fd. It's not that complex :-) The enhancement of the pathlib module can be done later. By the way, I know that Antoine Pitrou wanted to implemented file descriptors in pathlib, but the feature was rejected or at least delayed. Victor

Ben Hoyt <benhoyt@gmail.com> writes:
FYI, os.listdir does support file descriptors in Python 3.3+ try:
import os os.listdir(os.open('.', os.O_RDONLY))
NOTE: os.supports_fd and os.supports_dir_fd are different sets. See also, https://mail.python.org/pipermail/python-dev/2014-June/135265.html -- Akira P.S. Please, don't put your answer on top of the message you are replying to.

2014-07-01 8:44 GMT+01:00 Victor Stinner <victor.stinner@gmail.com>:
IMO we must decide if scandir() must support or not file descriptor. It's an important decision which has an important impact on the API.
I don't think we should support it: it's way too complicated to use, error-prone, and leads to messy APIs.

2014-07-02 12:51 GMT+02:00 Charles-François Natali <cf.natali@gmail.com>:
I don't think we should support it: it's way too complicated to use, error-prone, and leads to messy APIs.
Can you please elaborate? Which kind of issue do you see? Handling the lifetime of the directory file descriptor? You don't like the dir_fd parameter of os functions? I don't have an opinion of supporting scandir(int). I asked to discuss it in the PEP directly. Victor

Yes, among other things. You can e.g. have a look at os.fwalk() or shutil._rmtree_safe_fd() to see that using those *properly* is far from being trivial.
You don't like the dir_fd parameter of os functions?
Exactly, I think it complicates the API for little benefit (FWIW, no other language I know of exposes them).

Am 01.07.14 09:44, schrieb Victor Stinner:
This is an open issue still: when is the file descriptor closed. I think the generator returned from scandir needs to support a .close method that guarantees to close the file descriptor. AFAICT, the pure-Python prototype of scandir already does, but it should be specified in the PEP. While we are at it: is it intended that the generator will also support the other generator methods, in particular .send and .throw? Regards, Martin
participants (5)
-
"Martin v. Löwis"
-
Akira Li
-
Ben Hoyt
-
Charles-François Natali
-
Victor Stinner