
Hi everyone,

For many years, I've used a `cache` decorator that I built <https://github.com/cool-RR/python_toolbox/blob/master/python_toolbox/caching...> for caching Python functions. Then `functools.lru_cache` was implemented, which is much more standard. However, as far as I know, it holds normal references to its keys rather than weak references. This means that it can cause memory leaks when it's storing items that don't have any references elsewhere. This often makes me reluctant to use it.

What do you think about supporting weakrefs for keys in lru_cache? If I remember correctly, the main difficulty was that not all keys are of a type that can be weakreffed. If I remember correctly again, I solved this by including logic that attempts a weakref when possible and degrades to a strong ref otherwise. What do you think about that?

Thanks,
Ram.
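[For concreteness, here is a minimal sketch of that fallback idea. This is not the actual python_toolbox implementation; the decorator name and the eviction wiring are invented for illustration. Keys hold weak references where the argument's type allows it, strong references otherwise, and a cache entry is evicted when a weakly-referenced argument is collected:]

import weakref
from functools import wraps

def weak_key_cache(func):
    cache = {}

    @wraps(func)
    def wrapper(*args):  # positional arguments only, to keep the sketch short
        key_box = []  # lets the eviction callback find the finished key

        def evict(dead_ref):
            if key_box:
                cache.pop(key_box[0], None)

        def wrap(obj):
            try:
                # weakref.ref hashes and compares by referent while it is
                # alive, so it can serve as part of a dict key.
                return weakref.ref(obj, evict)
            except TypeError:
                return obj  # e.g. ints and tuples can't be weakreffed

        key = tuple(map(wrap, args))
        key_box.append(key)
        try:
            return cache[key]
        except KeyError:
            result = cache[key] = func(*args)
            return result

    return wrapper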

Usually when I use lru_cache, I don't want items to randomly disappear from the cache. Just because a key isn't referenced anywhere else doesn't mean that I want it to automatically disappear; that is the whole point. Take this example:

from functools import lru_cache

@lru_cache
def expensive_function(a, b, c):
    return a**b**c

# No reference kept, so the weakref key gets evicted.
# (Illustrative values only; a computation of this size would never finish.)
print(expensive_function(5000, 5000, 5000))

# Calls expensive_function all over again.
print(expensive_function(5000, 5000, 5000))

Doesn't your idea take away half the point of lru_cache, or am I not understanding your suggestion?

Your use case is valid; there are times when I'd like a strong reference to be kept. But there are also lots of times when I don't want it kept. So I would offer this feature as a flag `weakref=True` with a default of `False`. What do you think?

On Thu, Oct 15, 2020 at 9:10 PM <edwin@211mainstreet.net> wrote:
Usually when I use lru_cache, I don't want items to randomly disappear from the cache. Just because a key isn't referenced anywhere else doesn't mean that I want it to automatically disappear; that is the whole point.

That would satisfy my concerns.

October 15, 2020 2:12 PM, "Ram Rachum" <ram@rachum.com> wrote:

So I would offer this feature as a flag `weakref=True` with a default of `False`. What do you think?

On Thu, 15 Oct 2020 at 19:14, Ram Rachum <ram@rachum.com> wrote:
Your use case is valid; there are times when I'd like a strong reference to be kept. But there are also lots of times when I don't want it kept. So I would offer this feature as a flag `weakref=True` with a default of `False`. What do you think?
I'd like to know what your use cases are. The key point is the comment from edwin@211mainstreet.net: "Usually when I use lru_cache, I don't want items to randomly disappear from the cache." Your response is to make your desired behaviour optional, but you still haven't explained why it's needed in the first place.

(To be clear, I'm -1 on the proposal anyway. As Chris Angelico said, building your own solution (or continuing to use the one you said you already had) seems like a fine approach to me, and having items get lost from the cache just because they were weakly referenced sounds like a bug magnet unless you know exactly what you're doing, even if people have to opt in to it.)

Paul

On Thu, Oct 15, 2020 at 2:16 PM Ram Rachum <ram@rachum.com> wrote:
Your use case is valid; there are times when I'd like a strong reference to be kept. But there are also lots of times when I don't want it kept. So I would offer this feature as a flag `weakref=True` with a default of `False`. What do you think?
I understand your use case, but it feels uncommon and niche to me, especially given that you already have `f.cache_clear()` available if you want it.

Most of the time, if I have large objects, they are mutable objects. And if I am passing mutable objects into a function, it feels unlikely that I want to cache the result of the computation (which, after all, might depend on the data inside the mutable object). If the interpreter holds onto an extra copy of a 4-tuple of scalars, or a dataclass instance, that seems unlikely to make a measurable difference. So the use case needs to be:

* Function operates on large objects
* Function operates on large, immutable objects
* Function never takes literal or computed arguments (i.e. not `fun(big1+big2)`)
* Large immutable objects are deleted selectively (OK, this is plausible)
* The performance hit of clearing the entire cache is not suitable

Given that it's not really hard to write your own caching decorator, I don't feel like this needs to change the API of lru_cache.
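[For reference, the escape hatch mentioned above is part of the existing functools API; no new flag is needed for whole-cache invalidation:]

from functools import lru_cache

@lru_cache(maxsize=None)
def expensive(key):
    ...

expensive.cache_info()   # CacheInfo(hits=0, misses=0, maxsize=None, currsize=0)
expensive.cache_clear()  # drops every cached entry at once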

On Thu, Oct 15, 2020 at 3:33 PM David Mertz <mertz@gnosis.cx> wrote:
So the use case needs to be:
* Function operates on large objects
* Function operates on large, immutable objects
* Function never takes literal or computed arguments (i.e. not `fun(big1+big2)`)
* Large immutable objects are deleted selectively (OK, this is plausible)
* The performance hit of clearing the entire cache is not suitable
One real-world use case I've seen personally that meets all these criteria is per-request memoization of expensive work in a server application (using an `lru_cache`-like decorator on functions that take the request as an argument). The request object, plus the things hanging off it, can be quite large, and it's often effectively immutable (at least inasmuch as the decorated functions care). It's desirable for all cached data for a request to expire as soon as handling of the request is finished, but you wouldn't want to fully clear the caches: the application may handle multiple requests concurrently, so other requests may still be in flight. (A sketch of this pattern follows this message.)
Given that it's not really hard to write your own caching decorator, I don't feel like this needs to change the API of lru_cache.
But I don't disagree with this conclusion. There are often other specific needs that would mandate a custom decorator anyway.

Carl
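[A sketch of the per-request pattern described above, using the stdlib's WeakKeyDictionary. The function names are invented, and it assumes the request object is hashable and supports weak references:]

import weakref

def compute_summary(request):
    ...  # stand-in for the expensive work

# request -> computed result; an entry disappears once the request
# object is garbage-collected, without touching other requests' entries.
_summaries = weakref.WeakKeyDictionary()

def expensive_summary(request):
    try:
        return _summaries[request]
    except KeyError:
        result = _summaries[request] = compute_summary(request)
        return result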

That's similar to my use case: a big mutable object (right now, a state in a multi-agent simulation) with lots of methods that I want to cache.

"Given that it's not really hard to write your own caching decorator, I don't feel like this needs to change the API of lru_cache."

Well, given that it's not really hard to write your own caching decorator, why is lru_cache in the standard library to begin with? ;)

Thanks all for the explanation of use cases. In this specific example, if you're caching methods where `self` is the "big object", wouldn't a per-instance cache be a better approach? Then the cache is automatically dropped when the object is.

Paul

Did you mean keeping a hidden attribute on the object with the result? That would require manually keeping track of these attributes for each method I'm caching. I do that sometimes, but it's verbose.

You can use a global WeakKeyDictionary keyed by the object to achieve the same thing without having anything on the object.

I do that in a part of my code, and I need to handle the logic of first looking in that dictionary and, if the value isn't there, calculating it and storing it. Is this what you meant? That's what I'm trying to avoid: all that manual work that a cache decorator is supposed to do for me in one line.
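[Packaged as a decorator, that lookup logic collapses back into one line at each use site. A sketch, not a real library; it assumes instances are hashable and support weak references:]

import weakref
from functools import wraps

# instance -> {(method name, args): result}; nothing is stored on the
# object itself, and the per-instance cache dies with the instance.
_caches = weakref.WeakKeyDictionary()

def method_cache(method):
    @wraps(method)
    def wrapper(self, *args):
        per_instance = _caches.setdefault(self, {})
        key = (method.__qualname__, args)
        if key not in per_instance:
            per_instance[key] = method(self, *args)
        return per_instance[key]
    return wrapper

class SimulationState:
    @method_cache
    def expensive_score(self, horizon):
        ...  # stand-in for the expensive computation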

When you initially opened this thread, I thought you were suggesting a decorator that does the equivalent of a WeakValueDictionary. I guess you would need both in the library.

We have both types of weak caches in our system (weak on value and weak on key), which we implemented ourselves. But the one that is weak on key (the function args) evicts an entry when any element of the key is GCed, rather than when the whole key is GCed. (It's tied to our system's memory management scheme in a very particular way.) So there are variations to think about here. Maybe this is too complicated to be in the standard library?
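[For contrast, the weak-on-value variant keeps an entry only while the result object is referenced somewhere else, which suits interning and identity-map caches. A sketch, with class and function names invented:]

import weakref

class Expensive:
    pass

# key -> result object; the entry vanishes as soon as nothing outside
# the cache references the result.
_by_key = weakref.WeakValueDictionary()

def get_instance(key):
    try:
        return _by_key[key]
    except KeyError:
        obj = _by_key[key] = Expensive()
        return obj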

On Fri, 16 Oct 2020 11:34:08 +0300 Ram Rachum <ram@rachum.com> wrote:
Did you mean keeping a hidden attribute on the object with the result? That would require manually keeping track of these attributes for each method I'm caching. I do that sometimes, but it's verbose.
You can write a decorator that does that, generating the hidden attribute name based on the decorated function's name and module.

Regards

Antoine.
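[A rough sketch of that; the attribute-naming scheme is invented, and for brevity it handles no-argument methods only:]

from functools import wraps

def cached_on_self(method):
    # Hidden per-instance attribute name derived from the method's
    # module and qualified name, so each method gets its own slot.
    attr = ('_cache_' + method.__module__ + '_' + method.__qualname__).replace('.', '_')

    @wraps(method)
    def wrapper(self):
        try:
            return getattr(self, attr)
        except AttributeError:
            result = method(self)
            setattr(self, attr, result)
            return result

    return wrapper

[For the no-argument case, the stdlib also has functools.cached_property, added in Python 3.8, which does essentially this by storing the computed value in the instance's __dict__ under the property's name.]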

Well, I can also write a decorator that does what lru_cache does, but there are a lot of advantages to having it available in the standard library...

But the standard library doesn't have to contain every utility function on Earth. And if you want it to be in the standard library, your best bet is to write it anyway and propose it for inclusion.

Regards

Antoine.

On Thu, 15 Oct 2020 22:31:25 -0600 Carl Meyer <carl@oddbird.net> wrote:
One real-world use case I've seen personally that meets all these criteria is per-request memoization of expensive work in a server application (using an `lru_cache`-like decorator on functions that take the request as an argument). [...]
A common recipe for that is simply to set a custom attribute on the request:

request._myapp_cached_data = ...

It's probably much faster than caching through a dict, as well.

Regards

Antoine.
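[Spelled out, the recipe might look like this; `compute_data` is a stand-in for the expensive work, and `_myapp_cached_data` is the placeholder attribute name from above:]

def compute_data(request):
    ...  # stand-in for the expensive work

def get_cached_data(request):
    # Compute once per request, then reuse the attribute; the cached
    # value goes away with the request object itself.
    try:
        return request._myapp_cached_data
    except AttributeError:
        data = request._myapp_cached_data = compute_data(request)
        return data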

-1; you can always build your own when you need special-purpose differences like weakrefs.

ChrisA

We have a couple of weak caches in our system, so I see the use case.

I don't like the idea of silently reverting to a strong ref, though. I've had bad experiences with systems that try to be too helpful in this manner. It seems like a good idea until it becomes a big problem, and then you have interesting debates about whose fault it is.

Irit

Agreed. A possible solution is that, if given weakref=True and a non-weakreffable key, an exception will be raised when the function attempts to save it in the cache.

Paul, I'll see if I can give an example use case tomorrow.
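[Under that refinement, behaviour might look like this. To be clear, the `weakref` parameter is hypothetical; it does not exist in functools today:]

from functools import lru_cache

@lru_cache(weakref=True)  # hypothetical flag, not in today's functools
def f(obj):
    ...

f(42)  # under the proposal: TypeError, ints can't be weakly referenced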

Not all objects support weak references, and the function can have a mix of arguments that support weak references and arguments that do not. Instead of spending time trying to create a weak reference to objects that do not support it, it is better to create a weak reference only for the arguments that are expected to support it. Currently you can implement this by adding one more function which wraps the relevant arguments in weak references, and unwraps them in the cached function:

import functools
import weakref

def func(a, b, c):
    return _cached_weakref_func(a, weakref.ref(b), c)

@functools.lru_cache()
def _cached_weakref_func(a, bref, c):
    b = bref()  # may be None if the referent was already collected
    ...  # implementation

This way you control for which arguments you create a weak reference.
participants (9)

- Antoine Pitrou
- Carl Meyer
- Chris Angelico
- David Mertz
- edwin@211mainstreet.net
- Irit Katriel
- Paul Moore
- Ram Rachum
- Serhiy Storchaka