
Thread 2 wakes up with the lock, calls the function, fills the cache, and releases the lock.
What exactly would the issue be with this:

```
import functools
from threading import Lock

def once(func):
    sentinel = object()
    cache = sentinel
    lock = Lock()

    @functools.wraps(func)
    def _wrapper():
        nonlocal cache
        if cache is sentinel:
            with lock:
                if cache is sentinel:
                    cache = func()
        return cache

    return _wrapper
```

It ensures it’s invoked once, it’s fast, and it doesn’t require acquiring a lock needlessly. Always wrapping a lock around a static value to account for a small edge case seems redundant, and more than “it’s usually not optimal”: it’s never optimal. In multi-threaded cases you’re causing needless contention, and in single-threaded cases all your calls are now slower. Seems generally more correct, even in single threaded cases, to pay the overhead only in the first call if you want `call_once` semantics. Which is why you would be using `call_once` in the first place?

This recipe is the best variant so far and gives us something concrete to talk about :-)

Benefits: Guarantees the wrapped function is not called more than once.

Restrictions: Only works with zero-argument functions.

Risks: Any reentrancy or recursion will result in deadlock.

Limitations: No instrumentation. No ability to reset or clear. Won't work across multiple processes.

It would be nice to look at some compelling use cases. Offhand, I can't think of a time when I would have used this decorator. Also, I have a nagging worry that holding a non-reentrant lock across an arbitrary user-defined function call is a recipe for deadlocks. That's why during code reviews we typically check every single use of Lock() to see if it should have been an RLock(), especially in big systems where GC, __del__, or weakref callbacks can trigger running any code at just about any time.

Raymond

Risks: Any reentrancy or recursion will result in deadlock.
Reentrancy is a great point that I didn’t consider. I would say that, as the intended use case for this is returning lazy singletons of some kind, reentrant calls would be a bug that would end with a recursion error, which is infinitely better and more debuggable than a deadlock. So yes, this should be an `RLock` instead.
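A minimal sketch of that change, assuming the snippet from earlier in the thread: the only difference is swapping `Lock` for `RLock`, so a reentrant call re-acquires the lock, recurses, and eventually raises `RecursionError` instead of blocking on itself.

```
import functools
from threading import RLock

def once(func):
    sentinel = object()
    cache = sentinel
    lock = RLock()  # reentrant: the owning thread can re-acquire it

    @functools.wraps(func)
    def _wrapper():
        nonlocal cache
        if cache is sentinel:
            with lock:
                if cache is sentinel:
                    # If func() re-enters _wrapper, cache is still the
                    # sentinel, so this recurses until RecursionError.
                    cache = func()
        return cache

    return _wrapper
```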
Limitations: No instrumentation. No ability to reset or clear. Won't work across multiple processes.
I get the feeling that you might be about to make the point that this is somewhat re-inventing the `lru_cache` wheel, and I honestly agree with you. Given that the `lru_cache()` overhead with no arguments is so small, would you accept a PR that defaults `maxsize` to `None` if the wrapped function accepts no arguments, along with a documentation change showing that you can use `lru_cache` as a replacement for `call_once`? Or, we could add the ability to clear and add stats pretty easily. Ugly implementation:

```
def once(func):
    sentinel = object()
    cache = sentinel
    lock = Lock()
    hit_count = 0

    @functools.wraps(func)
    def _wrapper():
        nonlocal cache, hit_count
        if cache is sentinel:
            with lock:
                if cache is sentinel:
                    cache = func()
                else:
                    hit_count += 1
        else:
            hit_count += 1
        return cache

    def clear():
        nonlocal cache
        cache = sentinel

    def stats():
        return hit_count

    _wrapper.clear = clear
    _wrapper.stats = stats
    return _wrapper
```
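For comparison, a sketch of how `lru_cache` already covers the reset and stats part for a zero-argument function; `cache_info()` and `cache_clear()` are part of its documented API. One caveat that matters for this thread: `lru_cache` does not hold a lock around the wrapped function, so two threads racing on the very first call can both invoke it.

```
import functools

@functools.lru_cache(maxsize=None)
def get_config():
    print("expensive setup")
    return {"answer": 42}

get_config()                    # miss: runs the body
get_config()                    # hit: served from the cache
print(get_config.cache_info())  # CacheInfo(hits=1, misses=1, maxsize=None, currsize=1)
get_config.cache_clear()        # reset: the next call runs the body again
```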
It would be nice to look at some compelling use cases. Offhand, I can't think of a time when I would have used this decorator. Also, I have a nagging worry that holding a non-reentrant lock across an arbitrary user-defined function call is a recipe for deadlocks. That's why during code reviews we typically check every single use of Lock() to see if it should have been an RLock(), especially in big systems where GC, __del__, or weakref callbacks can trigger running any code at just about any time.
There are two specific use cases that I’ve seen:

1. You want a module-level singleton that creates an object that depends on something that’s not present yet. In its simplest form that might involve circular dependencies:

```
@call_once()
def create_client():
    # some_module imports the outer module
    from some_module import something
    something(…)
```

In the case of Django <https://github.com/django/django/blob/77aa74cb70dd85497dbade6bc0f394aa41e88c...>, the “something that’s not present yet” is the Django settings:

```
@functools.lru_cache()
def get_default_renderer():
    renderer_class = import_string(settings.FORM_RENDERER)
    return renderer_class()
```

We don’t want to needlessly re-create the default form renderer, and we cannot create this instance at the module level.

2. You want to create a singleton instance that is lazily created and can be easily mocked. For example, a `redis` client:

```
@call_once()
def get_redis_connection():
    return redis.Redis(host='localhost', port=6379, db=0)
```

You don’t want to create this at the module level, as the class might well start creating connections to Redis when it’s instantiated. On top of that, other modules can do `from redis import get_redis_connection` rather than `from redis import redis_connection`, which is easier and more consistent to mock.
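To make the mocking point concrete, a hypothetical test using `unittest.mock.patch`; the module path `myapp.cache` is illustrative, not from the thread.

```
from unittest import mock

def test_uses_fake_redis():
    # One patch on the accessor covers every call site, because callers
    # fetch the client lazily via get_redis_connection(). A module-level
    # `redis_connection` object bound at import time would be missed by
    # code that had already imported it directly.
    with mock.patch("myapp.cache.get_redis_connection") as fake_conn:
        fake_conn.return_value = mock.Mock()
        ...  # code under test sees the mock client
```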

On Apr 29, 2020, at 11:15, Tom Forbes <tom@tomforb.es> wrote:
You’ve written an exact equivalent of the double-checked locking for singletons examples that broke Java 1.4 and C++03 and led to us having once functions in the first place. In both of those languages, and most others, there is no guarantee that the write to cache in thread 1 happens between the two reads from cache in thread 2. That gives you the fun kind of bug where every few thousand runs you have corrupted data an hour later, or it works fine on your computer but crashes for one of your users because they have two CPUs that don’t share L2 cache while you have all your cores on the same die, or it works fine until you change some completely unrelated part of the code, etc.

Java solved this by adding volatile variables in Java 5 (existing code was still broken, but just mark cache volatile and it’s fixed); C++11 added a compiler-assisted call_once function (and added a memory model that allows them to specify exactly what happens and when, so that the desired behavior was actually guaranteeable). Newer languages learned from their experience and got it right the first time, rather than repeating the same mistake.

Is there anything about Python’s memory model guarantee that means it can’t happen in Python? I don’t think there _is_ a memory model. In CPython, or any GIL-based implementation, I _think_ it’s safe (the other thread can’t be running at the same time on a different core, so there can’t be a cache coherency ordering issue between the cores, right?), but what about on Jython, or PyPy-STM, or a future GIL-less Python? And in both of those languages, double-checked locking is still nowhere near as efficient as using a local static.
Seems generally more correct, even in single threaded cases, to pay the overhead only in the first call if you want `call_once` semantics. Which is why you would be using `call_once` in the first place?
But you won’t be paying the overhead only on the first call, you’ll be paying it on all of the calls that arrive before the first one completes. That’s the whole point of the lock, after all: they have to wait until it’s ready, and they can’t possibly do that without the lock overhead. And for the next few afterward, because they’ll have gotten far enough to check even if they haven’t gotten far enough to get the lock, and there’s no way they can know they don’t need the lock. And for the next few after that, because unless the system only runs one thread at a time and synchronizes all of memory every time you switch threads, they may not see the write yet anyway.
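For concreteness, a sketch of the alternative being defended here: drop the unsynchronized fast-path read and always take the lock, so the check and the write are always ordered by the lock and nothing depends on the host implementation's memory model. This is an illustration, not code from the thread.

```
import functools
from threading import Lock

def once(func):
    sentinel = object()
    cache = sentinel
    lock = Lock()

    @functools.wraps(func)
    def _wrapper():
        nonlocal cache
        # Every call pays for the lock, but there is no racy read of
        # cache: all reads and the single write are serialized.
        with lock:
            if cache is sentinel:
                cache = func()
            return cache

    return _wrapper
```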

You’ve written an exact equivalent of the double-checked locking for singletons examples that broke Java 1.4 and C++03 and led to us having once functions in the first place. … but what about on Jython, or PyPy-STM, or a future GIL-less Python?
While I truly do appreciate your feedback on this idea, I’m really not clear on your line of reasoning here. What specifically do you propose would be the issue with the *Python* implementation? Are you proposing that under some Python implementations `cache = func()` could be… the result of half a function call? I could buy an issue with some implementations meaning that `cache` still appears as `sentinel` in specific situations, but I feel that would constitute a pretty obvious bug in the implementation that would impact a _lot_ of other multithreaded code, rather than a glaring issue with this snippet. Both the issues you’ve referenced are valid, but they are also rather specific to the languages they affect. I don’t believe they apply to Python.
But you won’t be paying the overhead only on the first call, you’ll be paying it on all of the calls that arrive before the first one completes.
Sure, that was a typo. It should have read:
Seems generally more correct, even in single threaded cases, to pay the overhead only in the first call (or first few calls if there is contention) if you want `call_once` semantics
I still think the point stands. With your two-separate-decorators approach you’re paying it on every call. As a general-purpose `call_once()` implementation I think the snippet works well; obviously, if you have some very specific use case where it’s not appropriate, then you are probably able to write a very specific and suitable decorator.

On May 1, 2020, at 09:51, Tom Forbes <tom@tomforb.es> wrote:
But the issues really aren’t specific to C++ and Java. The only reason C#, Swift, Go, etc. don’t have the same problem is that their memory models were designed from the start to provide a way to do this correctly. Python was not. There was an attempt to define a memory model in the ’00s (PEP 583), but it was withdrawn. According to the discussion around that PEP about when you can see uninitialized variables (not exactly the same issue, but closely related), Jython is safe when they’re globals or instance attributes and you haven’t replaced the module or object dict, but otherwise probably not; IronPython is probably safe in the same cases and more, but nobody’s actually sure. Does that sound good enough to dismiss the problem?
I still think the point stands. With your two-separate-decorators approach you’re paying it on every call. As a general-purpose `call_once()` implementation I think the snippet works well; obviously, if you have some very specific use case where it’s not appropriate, then you are probably able to write a very specific and suitable decorator.
Being willing to trade safety or portability for speed is sometimes a good tradeoff, but that’s the special use case, not the other way around. People who don’t know exactly what they need should get something safe and portable. Plus, there’s still the huge issue with single-threaded programs. It’s not as if multi-threaded programs are ubiquitous in Python while, e.g., asyncio is some rare niche thing that the stdlib doesn’t have to worry about. A bunch of coroutines using a once function needs either nothing, or a coro lock; if you build a threading lock into the function, they waste time and maybe deadlock every 10000 startups for no benefit whatsoever. Why is that acceptable for a general-purpose function?
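A minimal sketch of the coroutine flavour, using `asyncio.Lock` so first callers that have to wait yield to the event loop instead of blocking it. The name `async_once` is illustrative; nothing like this was proposed verbatim in the thread.

```
import asyncio
import functools

def async_once(coro_func):
    sentinel = object()
    cache = sentinel
    # Note: before Python 3.10, asyncio.Lock() binds to the event loop
    # that is current when it is created, so constructing one at import
    # time is its own trap on older versions.
    lock = asyncio.Lock()

    @functools.wraps(coro_func)
    async def _wrapper():
        nonlocal cache
        if cache is sentinel:
            async with lock:
                if cache is sentinel:
                    cache = await coro_func()
        return cache

    return _wrapper
```

Here the double check is safe because everything runs on one thread; the lock only guards against another coroutine being scheduled between the check and the `await`.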

Does that sound good enough to dismiss the problem?
Sure, tangentially related issues with other implementations from a discussion in the 00s don’t seem like a fantastic argument, especially given that we use this exact pattern already in the stdlib without issue: https://github.com/python/cpython/blob/master/Lib/functools.py#L1200
they waste time and maybe deadlock every 10000 startups for no benefit whatsoever. Why is that acceptable for a general purpose function?
Acquiring a lock once (or even a couple of times when contested) in order to make a general-purpose function work in single- and multi-threaded contexts does seem acceptable to me. There would be no deadlocks (assuming an RLock is used instead), nor any real perceivable performance issue. Basically, in the grand scheme of things, some single-threaded code acquiring a lock once is not really important, especially when it’s more performant in the general case. And clearly the current use of this pattern in the stdlib means others have come to the same conclusion.

participants (4): Andrew Barnert, Antoine Pitrou, Raymond Hettinger, Tom Forbes