[Python-Dev] PEP 471 (scandir): Add a new DirEntry.inode() method?

Victor Stinner victor.stinner at gmail.com
Fri Feb 13 10:46:55 CET 2015


Hi,

TL;DR: on POSIX, is it useful to know the inode number (st_ino)
without the device number (st_dev)?

While reading feedback on the Python 3.5 alpha 1 release, I saw a
comment saying that the current design of os.scandir() (PEP 471)
doesn't fit a very specific use case where the inode number is needed:

"Ah, turns out we needed even more optimizations than that is able to
give us; in particular, the underlying system readdir call gives us
the inode number, which we need to compare against a cache of hard
links, in order to avoid having to stat the underlying files if we've
already done so on another hard link. It looks like the DirEntry API
used here only includes the path and name, not the inode number,
without invoking another stat call, and we needed to optimize out that
extra stat call."
https://www.reddit.com/r/Python/comments/2synry/so_8_peps_are_currently_being_proposed_for_python/cnvnz1w

Since the C function readdir() provides the inode number (d_ino field
of the dirent structure), I propose adding a new DirEntry.inode() method.
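To make the use case concrete, here is a minimal sketch of a hard-link cache built on DirEntry.inode() as proposed here (os.DirEntry.inode() is available from Python 3.5, and the `with os.scandir(...)` form needs 3.6+). It is not the Reddit poster's code. The function name `stat_unique_files` is hypothetical, and the sketch assumes every entry lives on the same device, because inode() alone carries no st_dev.

```python
import os

def stat_unique_files(path):
    """Stat each regular file in *path* once per inode (hypothetical helper).

    Assumes all entries share one device: the inode number alone is
    only unique within a single file system.
    """
    cache = {}   # inode number -> stat_result of the first link seen
    links = {}   # inode number -> names of all links sharing that inode
    with os.scandir(path) as it:
        for entry in it:
            if not entry.is_file(follow_symlinks=False):
                continue
            ino = entry.inode()  # on POSIX: d_ino from readdir(), no syscall
            if ino not in cache:
                # On POSIX, only the first link of each inode pays for lstat().
                cache[ino] = entry.stat(follow_symlinks=False)
            links.setdefault(ino, []).append(entry.name)
    return cache, links
```

With two hard links to the same file plus one unrelated file, `cache` holds two stat results and `links` groups the two linked names together.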


*** Now the real question: is it useful to know the inode number
(st_ino) without the device number (st_dev)? ***

On POSIX, you can still get the st_dev from DirEntry.stat(), but it
always requires a system call, which defeats the whole purpose of
DirEntry (no extra syscall).
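One possible shortcut, sketched here only as an illustration and not as part of the proposal, is to stat the directory once and pair its st_dev with each entry's inode() to form a (st_dev, st_ino) identity key. The helper name `file_keys` is hypothetical, and the shortcut is only valid if every entry shares the directory's device, which is exactly what the measurements that follow try to check.

```python
import os

def file_keys(path):
    """Map entry name -> (st_dev, st_ino) using one stat() per directory.

    Hypothetical shortcut: assumes every entry has the directory's st_dev.
    That is false for mount points (e.g. /dev/pts inside /dev).
    """
    dir_dev = os.stat(path).st_dev  # one syscall for the whole directory
    with os.scandir(path) as it:
        # On POSIX, entry.inode() is d_ino from readdir(): no extra syscall.
        return {entry.name: (dir_dev, entry.inode()) for entry in it}
```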

I wrote a script, check_stdev.py (attached to this email), to check
whether all entries of a directory have the same st_dev value as the
directory itself:

- same for /usr/bin, /usr/lib, /tmp, /proc, ...
- different for /dev

What about "union" file systems like UnionFS, or things like "mount -o
bind"? Can someone test? Does anyone have some information?

So the answer seems to be: it's useful for all directories except
/dev. Example:
---
/dev/hugepages st_dev is different: 35 vs 5
/dev/mqueue st_dev is different: 13 vs 5
/dev/pts st_dev is different: 11 vs 5
/dev/shm st_dev is different: 17 vs 5
---
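The attachment was scrubbed from the archive, so the following is not the original check_stdev.py but a minimal sketch of an equivalent check, written to print the same message format as the output above. The function name `different_devices` is hypothetical. It compares each entry's lstat()-style st_dev (no symlink following) against the st_dev of the directory given on the command line.

```python
import os
import sys

def different_devices(path):
    """Yield (entry_path, entry_dev, dir_dev) for entries on another device."""
    dir_dev = os.stat(path).st_dev
    with os.scandir(path) as it:
        for entry in it:
            # Do not follow symlinks: a link pointing elsewhere is not a mount.
            entry_dev = entry.stat(follow_symlinks=False).st_dev
            if entry_dev != dir_dev:
                yield entry.path, entry_dev, dir_dev

if __name__ == "__main__":
    for path in sys.argv[1:]:
        for entry_path, entry_dev, dir_dev in different_devices(path):
            print("%s st_dev is different: %s vs %s"
                  % (entry_path, entry_dev, dir_dev))
```

Running it as `python3 check_stdev_sketch.py /dev` would list mount points such as /dev/pts, while a plain directory like /usr/bin should print nothing.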


On POSIX, DirEntry.inode() just exposes the d_ino value from readdir().
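A quick way to sanity-check this on POSIX is to compare inode() against an explicit os.lstat() for every entry. The sketch below (hypothetical helper `inode_mismatches`) should report nothing for ordinary files. A mount point may be an exception: d_ino can describe the directory covered by the mount, while lstat() reports the root of the mounted file system.

```python
import os

def inode_mismatches(path):
    """Return names whose DirEntry.inode() differs from lstat().st_ino."""
    mismatches = []
    with os.scandir(path) as it:
        for entry in it:
            st = os.lstat(entry.path)  # explicit syscall, for comparison only
            if entry.inode() != st.st_ino:
                mismatches.append(entry.name)
    return mismatches
```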

On Windows, FindFirstFileW/FindNextFileW return an almost complete
stat_result structure, except for the st_ino, st_dev and st_nlink
fields, which are set to 0.

So DirEntry.inode() has to call os.lstat() to read the inode number.
The inode number is cached in the DirEntry object by
DirEntry.inode(), but the rest of the os.lstat() result is dropped.

On Windows, I don't want DirEntry.inode() to cache the full
os.lstat() result in the DirEntry, replacing the incomplete
stat_result from FindFirstFileW/FindNextFileW, because DirEntry.stat()
would then return a different result (st_ino, st_dev and st_nlink
fields set or not) depending on whether the inode() method was called.
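The caching behaviour described in the last two paragraphs can be modelled in pure Python. This is a hypothetical illustration, not the C implementation in the patch: `WinLikeDirEntry` and `make_partial_stat` are invented names, and the partial stat_result is simulated by zeroing st_ino, st_dev and st_nlink. The point is that inode() keeps only the inode number, so stat() returns the same object before and after inode() is called.

```python
import os

class WinLikeDirEntry:
    """Hypothetical model of the Windows DirEntry caching described above."""

    def __init__(self, path, find_stat):
        self.path = path
        self._find_stat = find_stat  # partial result, never replaced
        self._inode = None           # only the inode number is cached

    def stat(self):
        # Same result whether or not inode() was called.
        return self._find_stat

    def inode(self):
        if self._inode is None:
            # One extra syscall; the rest of the lstat() result is dropped.
            self._inode = os.lstat(self.path).st_ino
        return self._inode

def make_partial_stat(path):
    """Simulate the find-data stat_result: st_ino, st_dev, st_nlink = 0."""
    fields = list(os.lstat(path))  # the 10 sequence fields of stat_result
    fields[1] = fields[2] = fields[3] = 0  # st_ino, st_dev, st_nlink
    return os.stat_result(fields)
```

Keeping the two caches separate means callers never observe stat() changing as a side effect of inode().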

Note: scandir-6.patch at http://bugs.python.org/issue22524 contains an
implementation of os.scandir() with DirEntry.inode(), if you want to
play with it.

Victor
-------------- next part --------------
A non-text attachment was scrubbed...
Name: check_stdev.py
Type: text/x-python
Size: 300 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-dev/attachments/20150213/198967c7/attachment.py>

