On the sneaky caching behavior of twisted.python.filepath.FilePath

Hello,
I was reviewing some third-party code today and noticed many bugs in it that arise from FilePath's caching behavior.
I found https://github.com/twisted/twisted/issues/3161 (only after I wrote up https://github.com/twisted/twisted/issues/11676 though).
The cache behavior is long-standing but I think it needs to go. It represents a dangerous and subtle trap for anyone who tries to use FilePath, which I would *like *to be everyone - but that would be an easier sell without traps like this one.
Jean-Paul

On 21 Sep 2022, at 14:58, Jean-Paul Calderone exarkun@twistedmatrix.com wrote:
Hello,
I was reviewing some third-party code today and noticed many bugs in it that arise from FilePath's caching behavior.
I found https://github.com/twisted/twisted/issues/3161 (only after I wrote up https://github.com/twisted/twisted/issues/11676 though).
The cache behavior is long-standing but I think it needs to go. It represents a dangerous and subtle trap for anyone who tries to use FilePath, which I would like to be everyone - but that would be an easier sell without traps like this one.
There is talk I the ticket about os.listdir being a problem. Isn’t one option to use os.scandir that includes some of the stat info for free?
Barry
Jean-Paul
Twisted mailing list -- twisted@python.org To unsubscribe send an email to twisted-leave@python.org https://mail.python.org/mailman3/lists/twisted.python.org/ Message archived at https://mail.python.org/archives/list/twisted@python.org/message/36FC363TOBH... Code of Conduct: https://twisted.org/conduct

On Wed, Sep 21, 2022 at 11:03 AM Barry barry@barrys-emacs.org wrote:
On 21 Sep 2022, at 14:58, Jean-Paul Calderone exarkun@twistedmatrix.com wrote:
Hello,
I was reviewing some third-party code today and noticed many bugs in it that arise from FilePath's caching behavior.
I found https://github.com/twisted/twisted/issues/3161 (only after I wrote up https://github.com/twisted/twisted/issues/11676 though).
The cache behavior is long-standing but I think it needs to go. It represents a dangerous and subtle trap for anyone who tries to use FilePath, which I would *like *to be everyone - but that would be an easier sell without traps like this one.
There is talk I the ticket about os.listdir being a problem. Isn’t one option to use os.scandir that includes some of the stat info for free?
Barry
Neat. I didn't know about os.scandir (and it looks like it didn't exist at the time of that discussion). Based on the comments on the ticket about Windows behavior, it does look like this would get you closer to feature parity there (I'm not sure exactly what you get on Windows and unfortunately the msdn links glyph left are now 404s and don't give any hint about what APIs he was talking about).
If we just took advantage of os.scandir and updated the current implementation then you could probably have `FilePath.children` return a list of `FilePath` instances that already have their metadata cache populated so they would avoid doing even one stat().
However, something else we could do is introduce a new API like `FilePath.children` that returns both the metadata and the path object side-by-side. This would fit with the idea of removing the caching from `FilePath` altogether while still letting you avoid all of the stat() calls in the case where you're traversing the contents of a directory - both on Windows and POSIX.
Thanks for pointing that out!
Jean-Paul

On Sep 21, 2022, at 8:47 AM, Jean-Paul Calderone exarkun@twistedmatrix.com wrote:
On Wed, Sep 21, 2022 at 11:03 AM Barry <barry@barrys-emacs.org mailto:barry@barrys-emacs.org> wrote:
On 21 Sep 2022, at 14:58, Jean-Paul Calderone <exarkun@twistedmatrix.com mailto:exarkun@twistedmatrix.com> wrote:
Hello,
I was reviewing some third-party code today and noticed many bugs in it that arise from FilePath's caching behavior.
I found https://github.com/twisted/twisted/issues/3161 https://github.com/twisted/twisted/issues/3161 (only after I wrote up https://github.com/twisted/twisted/issues/11676 https://github.com/twisted/twisted/issues/11676 though).
The cache behavior is long-standing but I think it needs to go. It represents a dangerous and subtle trap for anyone who tries to use FilePath, which I would like to be everyone - but that would be an easier sell without traps like this one.
There is talk I the ticket about os.listdir being a problem. Isn’t one option to use os.scandir that includes some of the stat info for free?
Barry
Neat. I didn't know about os.scandir (and it looks like it didn't exist at the time of that discussion). Based on the comments on the ticket about Windows behavior, it does look like this would get you closer to feature parity there (I'm not sure exactly what you get on Windows and unfortunately the msdn links glyph left are now 404s and don't give any hint about what APIs he was talking about).
If we just took advantage of os.scandir and updated the current implementation then you could probably have `FilePath.children` return a list of `FilePath` instances that already have their metadata cache populated so they would avoid doing even one stat().
However, something else we could do is introduce a new API like `FilePath.children` that returns both the metadata and the path object side-by-side. This would fit with the idea of removing the caching from `FilePath` altogether while still letting you avoid all of the stat() calls in the case where you're traversing the contents of a directory - both on Windows and POSIX.
Thanks for pointing that out!
I've long been a defender (the main defender?) of the cache as a feature, so it is incumbent upon me to say this: the reasons I cared about this are long-since obsolete. Probably obsolete long before I realized they were, but certainly they have been dead and gone for at least 5 years, given the improved performance of OS filesystem caches and the widespread prevalence of SSDs.
I now regard the cache as a misfeature at best, and have no interest in its continued existence.
Let's kill it and get rid of the subtle traps. It is debatable whether they were ever worth it for the edge case performance increases it used to grant, but they certainly aren't today.
-g
participants (3)
-
Barry
-
Glyph
-
Jean-Paul Calderone