On 10/19/20, Steve Dower <steve.dower@python.org> wrote:
> On 15Oct2020 2239, Rob Cliffe via Python-Dev wrote:
>> TLDR: In os.scandir directory entries, atime is always a copy of mtime
>> rather than the actual access time.
>
> Correction - os.stat() updates the access time to _now_, while
> os.scandir() returns the last access time without updating it.

os.stat() shouldn't affect st_atime because it doesn't access the file data. That has me curious if it can be reproduced.

With NTFS in Windows 10, I'd expect the os.stat() st_atime to change immediately when the file data is read or modified. With other filesystems, it may not be updated until the kernel file object that was used to access the file's data is closed.

Note that updating the access time in NTFS can be disabled by the "NtfsDisableLastAccessUpdate" value in "HKLM\System\CurrentControlSet\Control\FileSystem". The default value in Windows 10 should be 0x80000002, which means the value is system managed and updating the access time is enabled.

If it's only the access time that changes, the directory entry may be updated with a significant granularity such as hourly or daily. For NTFS, it's hourly. To confirm this, wait an hour from the current access time in the directory entry; open the file; read some data; and close the file. The access time in the directory entry should be updated.

For details, download the [MS-FSA] PDF [1] and look for all references to the following sections:

    * 2.1.4.17 Algorithm for Noting That a File Has Been Modified
    * 2.1.4.19 Algorithm for Noting That a File Has Been Accessed
    * 2.1.4.18 Algorithm for Updating Duplicated Information

Also check the tables in Appendix A, which provide the update granularity of file time stamps (presumably for directory entries) for common Windows filesystems.

[1] https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-fsa/860b1516...

Going back to my initial message, I can't stress enough that this problem is at its worst when a file has multiple hardlinks. If a particular link in a directory wasn't the last link used to access the file, its duplicated metadata may have the wrong file size, access time, modify time, and change time (the latter is not reported by Python).
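
To check whether a given directory shows this kind of stale metadata, something like the following could be used. It is only a rough sketch: the function name compare_scandir_with_stat and the choice of fields are my own, and it simply compares what each os.scandir() entry reports against a fresh os.stat() call on the same path.

```python
import os


def compare_scandir_with_stat(directory, fields=("st_size", "st_atime", "st_mtime")):
    """Return {name: {field: (scandir_value, stat_value)}} for fields that differ.

    Hypothetical helper for illustration. On Windows, DirEntry.stat() is
    filled in from the directory entry, while os.stat() queries the file itself.
    """
    mismatches = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            # Value as reported via the directory listing.
            cached = entry.stat(follow_symlinks=False)
            # Value obtained by querying the file directly.
            fresh = os.stat(entry.path, follow_symlinks=False)
            diff = {
                field: (getattr(cached, field), getattr(fresh, field))
                for field in fields
                if getattr(cached, field) != getattr(fresh, field)
            }
            if diff:
                mismatches[entry.name] = diff
    return mismatches


if __name__ == "__main__":
    for name, diff in compare_scandir_with_stat(".").items():
        print(name, diff)
```

An empty result only means that nothing differed at the moment of the call. On platforms where DirEntry.stat() itself makes a stat call, no mismatch is expected at all.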
As is, for the current implementation, I'd only rely on the basic attributes such as whether it's a directory or reparse point (symlink, mountpoint, etc) when using scandir() to quickly process a directory. For reliable stat information, call os.stat().

I do think, however, that os.scandir() can be improved in Windows without significant performance loss if it calls GetFileAttributesExW to get st_file_attributes, st_size, st_ctime (create time), st_mtime, and st_atime. This API call is relatively fast because it doesn't require opening the file via CreateFileW, which is one of the more expensive operations in os.stat(). But I haven't tried modifying scandir() to benchmark it.

Ultimately, I'm waiting for Windows 10 to provide a WinAPI function that calls the relatively new NTAPI function NtQueryInformationByName [2] (by name, not by handle!) to get the FileStatInformation, as well as for this information to be made available by handle via GetFileInformationByHandleEx. Compared to GetFileAttributesExW, the FileStatInformation additionally provides the file ID (if implemented by the filesystem), change time, reparse tag, number of links, and the effective access of the security context of the caller (i.e. process or thread access token). The latter is something that we've never implemented with os.stat(). It's not the same as POSIX owner-group-other permissions. It would need a new attribute such as st_effective_access. It could be used to provide a real implementation of os.access() in Windows.

[2] https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs...
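
For anyone who wants to experiment with the GetFileAttributesExW idea from Python before touching the C implementation, here is a rough ctypes sketch. It is not how scandir() works today and has not been benchmarked. The names basic_stat and filetime_to_unix are made up for illustration; the structure layout follows the documented WIN32_FILE_ATTRIBUTE_DATA, and 0 is the GetFileExInfoStandard info level.

```python
import ctypes
import sys

# Seconds between the FILETIME epoch (1601-01-01) and the Unix epoch (1970-01-01).
FILETIME_EPOCH_OFFSET = 11644473600
GET_FILE_EX_INFO_STANDARD = 0


class FILETIME(ctypes.Structure):
    _fields_ = [
        ("dwLowDateTime", ctypes.c_uint32),
        ("dwHighDateTime", ctypes.c_uint32),
    ]


class WIN32_FILE_ATTRIBUTE_DATA(ctypes.Structure):
    _fields_ = [
        ("dwFileAttributes", ctypes.c_uint32),
        ("ftCreationTime", FILETIME),
        ("ftLastAccessTime", FILETIME),
        ("ftLastWriteTime", FILETIME),
        ("nFileSizeHigh", ctypes.c_uint32),
        ("nFileSizeLow", ctypes.c_uint32),
    ]


def filetime_to_unix(ft):
    """Convert a FILETIME (100 ns ticks since 1601) to Unix seconds."""
    ticks = (ft.dwHighDateTime << 32) | ft.dwLowDateTime
    return ticks / 10_000_000 - FILETIME_EPOCH_OFFSET


def basic_stat(path):
    """Hypothetical helper: query basic metadata by name, without opening the file."""
    if sys.platform != "win32":
        raise OSError("GetFileAttributesExW is only available on Windows")
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    func = kernel32.GetFileAttributesExW
    func.argtypes = (
        ctypes.c_wchar_p,
        ctypes.c_int,
        ctypes.POINTER(WIN32_FILE_ATTRIBUTE_DATA),
    )
    func.restype = ctypes.c_int
    data = WIN32_FILE_ATTRIBUTE_DATA()
    if not func(path, GET_FILE_EX_INFO_STANDARD, ctypes.byref(data)):
        raise ctypes.WinError(ctypes.get_last_error())
    return {
        "st_file_attributes": data.dwFileAttributes,
        "st_size": (data.nFileSizeHigh << 32) | data.nFileSizeLow,
        "st_ctime": filetime_to_unix(data.ftCreationTime),
        "st_atime": filetime_to_unix(data.ftLastAccessTime),
        "st_mtime": filetime_to_unix(data.ftLastWriteTime),
    }
```

The returned dict covers only the fields named above. The file ID, change time, reparse tag, and link count are not available from this call.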