On 10/19/20 9:52 AM, Gregory P. Smith wrote:
On Mon, Oct 19, 2020 at 6:28 AM Ivan Pozdeev via Python-Dev <python-dev@python.org> wrote:
On 19.10.2020 14:47, Steve Dower wrote:
> On 19Oct2020 1242, Steve Dower wrote:
>> On 15Oct2020 2239, Rob Cliffe via Python-Dev wrote:
>>> TLDR: In os.scandir directory entries, atime is always a copy of mtime rather than the actual access time.
>>
>> Correction - os.stat() updates the access time to _now_, while os.scandir() returns the last access time without updating it.
>
> Let me correct myself first :)
>
> *Windows* has decided not to update file access time metadata *in directory entries* on reads. os.stat() always[1] looks at the file entry metadata, while os.scandir() always looks at the directory entry metadata.
Is this behavior documented somewhere?
Such weirdness is certainly something that needs to be documented, but I really don't like describing quirks that are out of our control and may be subject to change in the Python documentation. So we should only consider doing so if there are no other options.
I'm sure this is covered in MSDN. If it has a concise explanation, linking to that from a note in our docs would make sense.
If I'm understanding Steve correctly, this is due to Windows/NTFS storing the access time, potentially redundantly, in two different places: one within the directory entry itself and one with the file's own metadata. Those of us with a traditional POSIX filesystem background may raise eyebrows at this duplication, seeing a directory as a place that merely maps names to inodes, with the inode structure (equiv: file entry metadata) being the sole source of truth. Which ones get updated when, and by what actions, is up to the OS.
So yes, just document the "quirk" as an intended OS behavior. This is one reason scandir() can return additional information on Windows vs what it can return on POSIX. The entire point of scandir() is to return as much as possible from the directory without triggering reads of the inodes/file-entry-metadata. :)
-gps
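For anyone who wants to see the two sources of metadata side by side, here is a minimal sketch. The helper name compare_atimes is made up for illustration; it only uses os.scandir(), DirEntry.stat() and os.lstat(). On Windows the two values may differ for the reasons described above; on POSIX both come from the same inode, so they would normally match.

```python
import os
import tempfile


def compare_atimes(directory):
    """Return {name: (scandir_atime, stat_atime)} for each entry in directory.

    Hypothetical helper for illustration only.
    """
    results = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            # DirEntry.stat() may be answered from data gathered while
            # listing the directory (this is the case on Windows).
            from_dir_entry = entry.stat(follow_symlinks=False).st_atime
            # os.lstat() asks the OS about the file itself.
            from_file = os.lstat(entry.path).st_atime
            results[entry.name] = (from_dir_entry, from_file)
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "example.txt"), "w") as fh:
            fh.write("hello")
        for name, (a, b) in compare_atimes(tmp).items():
            print(f"{name}: scandir={a} lstat={b} differ={a != b}")
```

Whether the values actually differ depends on the OS, the filesystem and its settings, so treat the output as an observation about one machine rather than a guarantee.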
Depending on atimes isn't a consistently reliable mechanism anyway, since filesystems on Linux et al. are allowed to be mounted so as to not independently update access times.
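As a rough illustration of that point, the sketch below reads the mount table on Linux and reports which atime-related option each mount point lists. It assumes a Linux system exposing /proc/mounts in the usual "device mountpoint fstype options ..." layout; the function names and the "unspecified" label are invented for this example.

```python
import os


def atime_mode(mount_options):
    """Classify a comma-separated mount option string by its atime option.

    Hypothetical helper: returns "unspecified" when no atime option is listed.
    """
    opts = set(mount_options.split(","))
    for mode in ("noatime", "relatime", "strictatime"):
        if mode in opts:
            return mode
    return "unspecified"


def atime_modes(mounts_path="/proc/mounts"):
    """Map each mount point to its atime option (Linux-only assumption)."""
    modes = {}
    with open(mounts_path) as fh:
        for line in fh:
            fields = line.split()
            # fields: device, mount point, fs type, options, dump, pass
            if len(fields) >= 4:
                modes[fields[1]] = atime_mode(fields[3])
    return modes


if __name__ == "__main__":
    if os.path.exists("/proc/mounts"):
        for mount_point, mode in sorted(atime_modes().items()):
            print(f"{mount_point}: {mode}")
```

A mount listed as noatime will not update access times on reads at all, and relatime updates them only under certain conditions, so code that relies on atime should check this before trusting the values.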