[issue41220] add optional make_key argument to lru_cache
Itay azolay
report at bugs.python.org
Wed Jul 8 04:03:26 EDT 2020
Itay azolay <itayazolay at gmail.com> added the comment:
Thanks, you have some very good points.
Let me try to address them:
* Key functions for a cache really are expected to be cheap, but what they really need to be is *cheaper* than the cached computation. If my computation is expensive enough, I might be okay with running a cheaper, though still somewhat expensive, computation instead. I believe it's for the developer to decide.
* key is usually used in contexts that involve a sequence of elements, but even there it is expected to run multiple times. I believe that running it on every call is what users would expect here (what else could someone expect to happen?). I believe this is solvable through good docs (or by changing the name of the parameter).
* I believe that a key-making function whose signature matches the cached function is a good thing. It forces the user to revisit the key-making function if he changes the behaviour of the cached function, and so to rethink the cache; otherwise it will not work.
* I can't argue about API simplicity, you probably have much more experience there. However, I believe that if we can agree that this is a useful feature, we can find a way to make the API clear and welcoming.
BTW, I agree about the problems with the typed argument; I never quite understood when it can be useful.
I'd like to compare the key argument suggested here to the key argument of other Python functions. Let's take `sorted` as an example.
sorted supports key so that it can sort other types of data structures.
Even though I like your suggestion to use a dataclass, I believe that if it's applicable here, we can say the same thing for sorted.
We could require sorted to work the same way:
@total_ordering  # If I'm not mistaken
@dataclass
class MyData:
    # ... fields ...
    def __gt__(self, other):
        return self.field > other.field

sorted(MyData(item) for item in my_data_instance)
I think we both see the reason why this wouldn't be optimal in some cases here.
Without the key function, sorted would not support a large share of Python objects.
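To make the comparison concrete, here is a minimal sketch of what key buys us in sorted. The event dicts and the "ts" field are made up for illustration; dicts cannot be ordered with <, so sorted needs the key function to handle them at all.

```python
# Hypothetical events stored as plain dicts; "ts" is an assumed field name.
events = [{"ts": 30, "name": "c"}, {"ts": 10, "name": "a"}, {"ts": 20, "name": "b"}]

# sorted(events) would raise TypeError, because dicts do not support "<".
# The key function tells sorted what to compare, with no wrapper class needed.
by_timestamp = sorted(events, key=lambda event: event["ts"])

names_in_order = [event["name"] for event in by_timestamp]
print(names_in_order)
```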
I think the same applies to lru_cache. Right now, we just can't use it with all Python objects. We have to change the API, the way we move data around, and the way we keep our objects, just so that lru_cache will work.
And after all that, lru_cache will still break if someone sends some data in a list instead of a tuple. I think that causes a lot of developers to give up on the default stdlib lru_cache.
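A small sketch of the list-versus-tuple problem, using a made-up function named total. lru_cache hashes the arguments to build its cache key, so a tuple works and a list raises TypeError.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def total(values):
    # Stand-in for an expensive computation (hypothetical example function).
    return sum(values)


tuple_result = total((1, 2, 3))  # a tuple is hashable, so the call is cached

error_message = None
try:
    total([1, 2, 3])  # a list is not hashable, so building the cache key fails
except TypeError as exc:
    error_message = str(exc)

print(tuple_result, error_message)
```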
In my case, I have a few lists of lists; each inner list describes an event that happened. Each event has a unique timestamp.
I have an object that has a few different lists:
class Myobj:
    events_1: List[list]
    events_2: List[list]
I have a small, esoteric function that currently looks like this:
def calc(list_of_events):
    # calculation
    pass
It is called from multiple places in the code, which takes a lot of time, like this:
calc(events_1)  # multiple times
calc(events_2)  # multiple times
I wanted to cache the function calc, but now I have to do something like this:
from functools import lru_cache

@lru_cache
def calc_events_1(myobj):
    return calc(myobj.events_1)

@lru_cache
def calc_events_2(myobj):
    return calc(myobj.events_2)
Right now I can't change the API of the lists, because they are being used in multiple places; some of these lists (I have multiple event lists) are being converted to numpy, and some aren't.
Regarding the API, we could make it simpler by either requiring keyword-only arguments, like lru_cache(maxsize, typed, *, key=None),
or by following the property setter/getter pattern:
@lru_cache
def function(args, ...):
    pass

@function.make_key  # or key, whatever name is good
def _(args, ...):
    return new_key
However, I like the second option less.
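To illustrate the first option, here is a rough sketch of a decorator with a keyword-only key argument, applied to my calc case. It is not the functools implementation: keyed_lru_cache is a hypothetical name, it is not thread-safe, and the event layout (timestamp first, value second) is assumed only for the example.

```python
from collections import OrderedDict
from functools import wraps


def keyed_lru_cache(maxsize=128, *, key):
    """Hypothetical decorator: cache results on key(*args, **kwargs)."""

    def decorator(func):
        cache = OrderedDict()  # cache key -> result, oldest entry first

        @wraps(func)
        def wrapper(*args, **kwargs):
            cache_key = key(*args, **kwargs)  # same signature as func
            if cache_key in cache:
                cache.move_to_end(cache_key)  # mark as most recently used
                return cache[cache_key]
            result = func(*args, **kwargs)
            cache[cache_key] = result
            if maxsize is not None and len(cache) > maxsize:
                cache.popitem(last=False)  # evict the least recently used entry
            return result

        return wrapper

    return decorator


calls = []  # records each time the real computation runs


# Assumed layout: each event is [timestamp, value]; timestamps are unique,
# so a tuple of timestamps identifies a list of events.
@keyed_lru_cache(maxsize=32, key=lambda events: tuple(e[0] for e in events))
def calc(list_of_events):
    calls.append(len(list_of_events))
    return sum(event[1] for event in list_of_events)


events_1 = [[1594195406, 2], [1594195407, 3]]
first = calc(events_1)   # computed
second = calc(events_1)  # served from the cache, even though events_1 is a list
print(first, second, len(calls))
```

The key function runs on every call, so it only pays off when it is cheaper than the cached computation.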
Thanks
----------
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue41220>
_______________________________________