On Apr 26, 2020, at 10:49, Eric Fahlgren <ericfahlgren@gmail.com> wrote:

On Sun, Apr 26, 2020 at 9:46 AM Alex Hall <alex.mojaki@gmail.com> wrote:
It's not clear to me why people prefer an extra function which would be exactly equivalent to lru_cache in the expected use case (i.e. decorating a function without arguments). It seems like a good way to cause confusion, especially for beginners. Based on the Zen, there should be one obvious way to do it.

I don't believe it is.  lru_cache only guarantees that you will get the same result back for identical arguments, not that the function will only be called once.  Seems to me that if you call it and there's a thread switch in the middle of caching the value, you could reach the function wrapped by lru_cache twice (or more times).  In order to implement once, it needs to contain a thread lock to live up to its "once" moniker and support the singleton pattern for which it is currently being used (apparently incorrectly) in Django and other places.  Am I understanding threading correctly here?

There are three different use cases for “once” in a threaded program:

1. It’s incorrect or dangerous to even call the function twice.
2. The function isn’t idempotent, but you need it to be.
3. The function is idempotent, and it’s purely a performance optimization.

For the third case, you don’t need any synchronization for correctness (as long as reading and writing the cache value is atomic), and it may actually be a lot faster. Sure, it means occasionally you end up doing the work two or even more times at startup, but in exchange you avoid a zillion thread locks, which can be a lot more expensive. If that’s the case with those Django uses, they’re not using it incorrectly.
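Here’s a rough sketch of that third, unsynchronized case, just to make the trade-off concrete (the names — get_config, load_config, _config — are invented for illustration):

    _sentinel = object()
    _config = _sentinel

    def load_config():
        # Stand-in for something expensive but idempotent.
        return {"debug": False}

    def get_config():
        global _config
        if _config is _sentinel:       # unsynchronized check
            _config = load_config()    # may run more than once under contention
        return _config

Two threads racing at startup may both call load_config, but since the result is idempotent it doesn’t matter whose value wins, and after startup every call is just a global read with no lock.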

Also, if you know your app’s sequencing well enough and know exactly what the GIL guarantees, you might be able to prove (or at least convince yourself well enough that if test X passes it’s almost certainly safe) that there’s no chance of startup contention. This includes the really trivial case where you compute whatever’s needed before you fork any threads that might need it (although for a lot of those cases, in Python, it’s probably simpler to just use a module global; still, an unsynchronized cache isn’t terrible for readability).

Of course it’s also possible that Django is using it incorrectly, and it just shows up as a handful of web apps starting up wrong in one-in-a-million instances, and there are live bugs all over the internet that nobody’s handling right. But I wouldn’t just assume that it’s incorrect, add a new feature to Python, and encourage Django to rewrite a whole lot of code to use it, without finding an actual bug first.

Also, it’s pretty easy to turn an unsynchronized implementation into a synchronized one: just add a @synchronized decorator around the @lru_cache or @cached_property decorator (or write a simple @synchronized_lru_cache or @synchronized_cached_property decorator and use that). So, does Python really need to include anything in the stdlib to make it easier? (That’s not a rhetorical question; I’m not sure.)
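For instance, a synchronized_lru_cache could be as simple as this (just a sketch — the decorator name and the choice to hold one lock across every call are mine, and it trades away concurrent cache hits for simplicity):

    import functools
    import threading

    def synchronized_lru_cache(maxsize=128):
        def decorator(func):
            lock = threading.Lock()
            cached = functools.lru_cache(maxsize=maxsize)(func)

            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                # One lock around every call: at most one thread can be
                # populating (or even reading) the cache at a time.
                with lock:
                    return cached(*args, **kwargs)
            return wrapper
        return decorator

    @synchronized_lru_cache(maxsize=None)
    def settings():
        # Stand-in for expensive setup that must run exactly once.
        return {"initialized": True}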

On the other hand, if the bugs are actually the second case rather than the first, you can solve that with something faster than a full read-write mutex, but it’s a lot more complicated (and may not even be writable in Python at all): read-acquire the cache, and if it’s empty, call the function and then compare-and-swap-release the cache; if the CAS fails, that means someone else got there first, so discard your value and return theirs. If that comes up a lot and the performance benefit is often worth having, it seems like it should definitely be in the stdlib because people won’t get it right. But I doubt it does.
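CPython doesn’t expose a user-level compare-and-swap, so the closest approximation I can sketch holds a lock only for the instant it takes to publish (the names and the module-level cache are invented for illustration):

    import threading

    _publish_lock = threading.Lock()
    _empty = object()
    _value = _empty

    def compute():
        # Stand-in for the real, possibly non-idempotent work.
        return 42

    def get_value():
        global _value
        v = _value                   # "read-acquire"
        if v is not _empty:
            return v
        candidate = compute()        # do the work without holding anything
        with _publish_lock:          # approximate "compare-and-swap-release"
            if _value is _empty:
                _value = candidate   # we got there first
            return _value            # losers discard theirs, return the winner's

Every caller ends up seeing the same value — the property case 2 needs — even though the function itself can still run more than once.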

One last thing: the best way to cache an idempotent nullary function with lru_cache is to use maxsize=None. If people are leaving the default 128, maybe the docs need to be improved in some way?
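Concretely, that’s just this (expensive_setup is a made-up placeholder):

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def expensive_setup():
        # Nullary, so there is only ever one cache entry;
        # maxsize=None skips the LRU bookkeeping entirely.
        return sum(range(10**6))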