Re: [Python-ideas] Using functools.lru_cache only on some arguments of a function

(Replying to an email I didn't get in my inbox. I hope this goes through correctly.) On Friday, December 4, 2015 at 4:44:47 PM UTC-5, Bill Winslow wrote:
Solutions:

1. Rewrite your recursive function so that the partial state is a nonlocal variable (in the closure), and memoize the recursive part.

2. Attach the state to the function, rather than passing it as a parameter. (Also suggested in this comment: https://www.reddit.com/r/learnpython/comments/3v75g4/using_functoolslru_cach...)

3. Wrap the state in a class for which __eq__ is defined to say that all instances of it are the same, and __hash__ always returns a constant (say, the id of the class).

I think #1 is the best: the state is a property of the current recursive execution. You make a function which solves the problem, and the recursive function is an inner function of the solver. The state will be lost when you finish computing the answer.

If you want to keep the partial state even after the recursion finishes, then #2 is the right answer: the state is a property of the solver.

I feel that #3 is icky and heavy-handed. The function won't be semantically responsible for the state. Hacking/extending lru_cache is saying the wrong thing about what the partial state is. Cached information (which is what your state is) isn't really an argument to the function as a mathematical input-output box.

Also, if you're calling this function multiple times, you as the caller shouldn't be expected to keep track of the cache, so you shouldn't be expected to store it (in global scope). But if it's reusable, it should be stored, so the function should be responsible for storing it.

If you want to keep the lru_cache, too, then you can store the recursive function as an attribute. It will even keep itself and your partial state in its closure.

Sample code, so you can confirm that the state is stored between calls. Just call solve(50) twice in a row, and compare the times.

    from functools import lru_cache

    # WARNING: I don't recommend x > 100.
    # I've killed two interpreters while messing with this.
    def solve(x):
        try:
            rec = solve.rec  # To trigger the exception the first time.
        except AttributeError:
            @lru_cache(None)
            def rec(n, m):
                return sum(rec(i, j)
                           for i in reversed(range(n))
                           for j in reversed(range(m)))
            solve.rec = rec
        return rec(x, x)

By the way, there are other use cases for ignoring arguments for caching. For example, dynamic programming where the arguments are the indices of a sequence, or some other object (a tree?) which isn't a recursive argument. I recommend that those also be done with a closure (separating the recursive part from the initial arguments), but I think it's worth considering an lru_cache implementation for students who haven't learned to, er, abuse closures. Unless someone thinks a recipe can/should be added to the docs.

On Fri, Dec 11, 2015 at 8:01 PM, Franklin? Lee <leewangzhong+python@gmail.com> wrote:
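Recursing only on the indices, with the sequence itself held in the closure, can look like this sketch (longest common subsequence is my example, not the original problem):

```python
from functools import lru_cache

def lcs_length(a, b):
    """Length of the longest common subsequence of a and b.
    The sequences are captured by the closure; only the integer
    indices (i, j) are cache keys."""
    @lru_cache(None)
    def rec(i, j):
        if i == len(a) or j == len(b):
            return 0
        if a[i] == b[j]:
            return 1 + rec(i + 1, j + 1)
        return max(rec(i + 1, j), rec(i, j + 1))
    return rec(0, 0)

print(lcs_length("ABCBDAB", "BDCABA"))  # 4 (e.g. "BCBA")
```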
(What's the etiquette for adding onto your own email? I snipped most of it for the kb bloat.)

On Fri, Dec 11, 2015, 8:20 PM Franklin? Lee <leewangzhong+python@gmail.com> wrote:
This whole thing is probably best implemented as two separate functions rather than using a closure, depending on how intertwined the code paths are for the shortcut/non-shortcut versions.

    from functools import lru_cache

    @lru_cache()
    def factorial(n):
        if n < 2:
            return 1
        return n * factorial(n-1)

    @lru_cache()
    def factorial_faster(n, shortcut=None):
        if shortcut is not None:
            return shortcut
        return factorial(n)
I'd flip the rare case into the except block and put the normal case in the try block. I believe this will be more compute-efficient and more readable.

    from functools import lru_cache

    def factorial(n, answer=None):
        try:
            return factorial.recursive(n)
        except AttributeError:
            @lru_cache()
            def recursive(n):
                if answer is not None:  # shortcut
                    return answer
                if n < 2:               # non-shortcut
                    return 1
                return n * recursive(n-1)
            factorial.recursive = recursive
            return recursive(n)

Note that the original question was how to handle an optional shortcut parameter that would not change the output but simply increase speed if (and only if) that call was a cache miss. A successive cache hit should be near instantaneous, regardless of the optional parameter.
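A quick check of that last claim, as a runnable sketch (the miss counter is my addition, for demonstration only):

```python
from functools import lru_cache

calls = 0  # counts cache misses in the inner recursive function

def factorial(n, answer=None):
    """Same attribute-caching pattern, instrumented with a miss counter."""
    try:
        return factorial.recursive(n)
    except AttributeError:
        @lru_cache(None)
        def recursive(n):
            global calls
            calls += 1
            if answer is not None:  # shortcut
                return answer
            if n < 2:               # non-shortcut
                return 1
            return n * recursive(n - 1)
        factorial.recursive = recursive
        return recursive(n)

print(factorial(10))             # 3628800, with 10 misses (n = 10 .. 1)
print(calls)                     # 10
print(factorial(10, answer=-1))  # 3628800: cache hit ignores the shortcut
print(calls)                     # still 10
```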

On Sat, Dec 12, 2015 at 1:34 PM, Michael Selik <mike@selik.org> wrote:
I like the closure because it has semantic ownership: the inner function is a worker for the outer function. Also, with many dynamic programming problems which have a non-recursive variable, if you don't have a closure, you would need global state instead. Here's an example:

    inf = float('inf')

    def max_nonconsecutive_subset_sum(lst, n):
        """Returns the biggest possible sum of n non-consecutive items."""
        def rec(i, k):
            """i: index; k: number of items summed so far."""
            if k == n:          # Stop summing
                return 0
            if i >= len(lst):   # Reached (or passed) the end of the list
                return -inf
            return max(rec(i+1, k), rec(i+2, k+1) + lst[i])
        return rec(0, 0)

(We can also shortcut out if there aren't enough items left to take k of them, and calculate the length only once, but neither is needed for correctness. Note that the bounds check is i >= len(lst) rather than i == len(lst), because rec(i+2, k+1) can skip past the end of the list.)

There should be no significant cost for the closure, as the recursive part should be the bulk of the work.
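Memoizing that closure is then just a matter of decorating the inner function, since only the indices (i, k) are cache keys while lst and n stay in the closure. A sketch:

```python
from functools import lru_cache

inf = float('inf')

def max_nonconsecutive_subset_sum(lst, n):
    """Biggest possible sum of n non-consecutive items, memoized."""
    @lru_cache(None)
    def rec(i, k):
        if k == n:           # took n items: stop summing
            return 0
        if i >= len(lst):    # ran past the end without n items
            return -inf
        # Either skip lst[i], or take it and jump past its neighbor.
        return max(rec(i + 1, k), rec(i + 2, k + 1) + lst[i])
    return rec(0, 0)

print(max_nonconsecutive_subset_sum((5, 1, 2, 6), 2))  # 11 (5 + 6)
```

Each call gets a fresh cache, which is dropped with the closure when the call returns, matching the "state is a property of the current recursive execution" framing above.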
The rare case is in the except block, though.

On Tue, Dec 29, 2015 at 2:14 AM Franklin? Lee <leewangzhong+python@gmail.com> wrote:
True, a closure has better encapsulation, making it less likely someone will misuse the helper function. On the other hand, that means there's less modularity and it would be difficult for someone to use the inner function. It's hard to know the right choice without seeing the exact problem the original author was working on.
You're correct. Sorry, I somehow misinterpreted the comment, "# To trigger the exception the first time" as indicating that code path would run only once.

participants (2)
- Franklin? Lee
- Michael Selik