[Numpy-discussion] Decorator to access intermediate results in a function/algorithm

Mon Jun 28 13:29:23 EDT 2010

Dear Mike,
Thanks for the feedback.

On Mon, Jun 28, 2010 at 12:51 PM, Michael Droettboom <mdroe at stsci.edu> wrote:
> What are the implications of this with respect to memory usage?  When
> working with large arrays, if the intermediate values of a number of
> functions are kept around (whether we want to access them or not) could
> this not lead to excessive memory usage?  Maybe this behavior should
> only apply when (as you suggest in the counterexample) a "locals=True"
> kwarg is passed in.

I've been thinking about it, but I haven't decided on a final
implementation yet. I find it a bit messy to add a new kwarg to the
signature of an existing function, as it might conflict with an
existing *args argument.
For example, redefining
f(x, *args)
as
f(x, locals=True, *args)
would break code calling f as
f(1, 2, 3) .
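
To make the conflict concrete, here is a minimal sketch. The function
bodies and the names f_old/f_new are made up for illustration; only the
signatures matter:

```python
def f_old(x, *args):
    # Original signature: every extra positional value lands in args.
    return x, args


def f_new(x, locals=True, *args):
    # New signature: the second positional value is now bound to 'locals'.
    return x, locals, args


old_result = f_old(1, 2, 3)
new_result = f_new(1, 2, 3)
print(old_result)  # (1, (2, 3))
print(new_result)  # (1, 2, (3,))
```

The same call silently changes meaning: 2 is swallowed by the new
flag instead of reaching *args. In Python 3 the flag could be declared
keyword-only, f(x, *args, locals=True), which avoids the clash, but
that syntax is not available in Python 2.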

There are several alternatives:
1) add a property to the wrapping class to switch the behavior of the
decorator on and off
2) introduce a naming convention (e.g., variables whose name begins
with '_' are not saved)
3) have an option to dump the local variables to a file

The solution I prefer so far is the second, but since I have never had
the problem in my own code, I'm not sure which option is most useful in
practice.
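
As a rough sketch of what option 2 could look like, assuming the
decorator ends up with a plain dictionary of captured locals: the
helper name filter_private and the sample dictionary are hypothetical
and not part of persistent_locals.

```python
def filter_private(local_vars):
    """Return a copy without the names that start with an underscore."""
    return {name: value for name, value in local_vars.items()
            if not name.startswith('_')}


# Stand-in for the locals captured from a decorated function.
captured = {'a': 2, '_big_tmp': list(range(1000)), 'prod': 4}
kept = filter_private(captured)
print(sorted(kept))  # ['a', 'prod']
```

With this convention, the author of a function marks large temporaries
with a leading underscore and no reference to them is kept after the
call.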

>
> It seems like a lot of the maintainability issues raised in the
> counterexample could be solved by returning a dictionary or a bunch [1]
> instead of a tuple -- though that still (without care on the part of the
> user) has the "keeping around references to too much stuff" problem.
>
> [1]
> http://code.activestate.com/recipes/52308-the-simple-but-handy-collector-of-a-bunch-of-named/

It's true that the counter-example is slightly unrealistic, although I
have seen similar bits of code in real-life examples. Using a
decorator is an advantage when dealing with code defined in a
third-party library, since the function can be wrapped without
touching its source.
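
For reference, the dictionary/bunch variant could look roughly like
the sketch below. The Bunch class follows the spirit of the recipe
linked above; the function name is_sum_lt_prod_bunch and the field
names are chosen here for illustration.

```python
class Bunch(object):
    """Minimal collector of named values, in the spirit of the recipe."""

    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


def is_sum_lt_prod_bunch(a, b, c):
    total = a + b + c
    prod = a * b * c
    # Named fields: callers pick what they need, in any order.
    return Bunch(result=total < prod, sum=total, prod=prod)


res = is_sum_lt_prod_bunch(2, 1, 2)
print(res.result, res.sum, res.prod)  # False 5 4
```

Adding a field later does not break callers, unlike extending a tuple.
The return type still changes from a bool to a Bunch, though, so
existing callers of the original function have to be adapted.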

Pietro

>
> Mike
>
> On 06/28/2010 12:35 PM, Pietro Berkes wrote:
>> Dear everybody,
>>
>> This message belongs only marginally to a numpy-related mailing list,
>> but I thought it might be of interest here since it addresses what I
>> believe is a common pattern in scientific development. My apologies if
>> that is not the case...
>>
>> The code can be found at http://github.com/pberkes/persistent_locals
>> and requires the byteplay library
>> (http://code.google.com/p/byteplay/).
>>
>> The problem
>> =========
>>
>> In scientific development, functions often represent complex data
>> processing algorithms that transform input data into a desired output.
>> Internally, the function typically requires several intermediate
>> results to be computed and stored in local variables.
>>
>> As a simple toy example, consider the following function, which
>> takes three arguments and returns True if the sum of the arguments is
>> smaller than their product:
>>
>> def is_sum_lt_prod(a,b,c):
>>     sum = a+b+c
>>     prod = a*b*c
>>     return sum<prod
>>
>> A frequently occurring problem is that the developer/final user may
>> need to access the intermediate results at a later stage, in order to
>> analyze the detailed behavior of the algorithm, for debugging, or to
>> write more comprehensive tests.
>>
>> A possible solution would be to re-define the function and return the
>> needed internal variables, but this would break the existing code. A
>> better solution would be to add a keyword argument to return more
>> information:
>>
>> def is_sum_lt_prod(a,b,c, internals=False):
>>     sum = a+b+c
>>     prod = a*b*c
>>     if internals:
>>         return sum<prod, sum, prod
>>     else:
>>         return sum<prod
>>
>> This keeps the existing code intact, but only postpones the problem.
>> If the developer later needs access to even more local variables, the
>> function has to be modified again, and part of the calling code
>> breaks. Moreover, this style leads to
>> ugly code like
>>
>> res, _, _, _, var1, _, var3 = f(x)
>>
>> where most of the returned values are irrelevant.
>>
>> Proposed solution
>> =============
>>
>> The proposed solution consists of a decorator that makes the local
>> variables accessible from a function attribute, 'locals'. For example:
>>
>> @persistent_locals
>> def is_sum_lt_prod(a,b,c):
>>     sum = a+b+c
>>     prod = a*b*c
>>     return sum<prod
>>
>> After calling the function (e.g. is_sum_lt_prod(2,1,2), which returns
>> False) we can analyze the intermediate results as
>> is_sum_lt_prod.locals
>> ->  {'a': 2, 'b': 1, 'c': 2, 'prod': 4, 'sum': 5}
>>
>> This style is cleaner, is consistent with the principle of identifying
>> the value returned by a function as the output of an algorithm, and is
>> robust to changes in the needs of the researcher.
>>
>> Note that the local variables are saved even in case of an exception,
>> which turns out to be quite useful for debugging.
>>
>> How it works
>> =========
>>
>> The local variables in the inner scope of a function are not easily
>> accessible. One solution (which I have not tried) may be to use
>> tracing code like the one used in a debugger. This, however, would
>> have a considerable cost in time.
>>
>> The proposed approach is to wrap the function in a callable object,
>> and modify its bytecode by adding an external try...finally statement
>> as follows:
>>
>>    def f(self, *args, **kwargs):
>>        try:
>>            ... old code ...
>>        finally:
>>            self.locals = locals().copy()
>>            del self.locals['self']
>>
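
For comparison, roughly the same effect can be obtained without
rewriting bytecode by going the tracing route mentioned above. The
sketch below uses the standard sys.setprofile hook. It is not the
byteplay-based code in deco.py, the class name
persistent_locals_traced is made up, and its overhead is not measured
here.

```python
import functools
import sys


class persistent_locals_traced(object):
    """Hypothetical tracing-based variant, NOT the byteplay version."""

    def __init__(self, func):
        self._func = func
        self.locals = {}
        functools.update_wrapper(self, func)

    def __call__(self, *args, **kwargs):
        code = self._func.__code__

        def profiler(frame, event, arg):
            # 'return' fires when the wrapped function's frame exits.
            if event == 'return' and frame.f_code is code:
                self.locals = dict(frame.f_locals)

        previous = sys.getprofile()
        sys.setprofile(profiler)
        try:
            return self._func(*args, **kwargs)
        finally:
            # Always restore whatever profiler was active before.
            sys.setprofile(previous)


@persistent_locals_traced
def is_sum_lt_prod(a, b, c):
    sum = a + b + c
    prod = a * b * c
    return sum < prod


result = is_sum_lt_prod(2, 1, 2)
print(result, is_sum_lt_prod.locals)
```

The wrapper is a callable object here as well, so the captured values
are stored on self and not looked up through the function's name.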
>> The reason for wrapping the function in a class, instead of saving the
>> locals in a function attribute directly, is that there are all sorts
>> of complications when a function has to refer to itself. For
>> example, referring to the attribute as f.locals results in the
>> bytecode looking for the name 'f' in the namespace, and therefore
>> moving the function, e.g. with
>> g = f
>> del f
>> would break 'g'. There are even more problems for functions defined in
>> a closure.
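
The name-lookup problem described above is easy to reproduce. In this
small sketch the function stores its values on itself through the
name 'f' (the body is made up for illustration):

```python
def f(x):
    y = 2 * x
    # 'f' is looked up by name every time the function runs.
    f.locals = {'x': x, 'y': y}
    return y


g = f
g(1)   # works while the name 'f' is still bound
del f

try:
    g(1)
    broke = False
except NameError:
    # The function object is alive as 'g', but 'f' no longer exists.
    broke = True
print(broke)  # True
```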
>>
>> I tried modifying f.func_globals with a custom dictionary which keeps a
>> reference to f.func_globals, adding a static element to 'f', but this
>> does not work, as the Python interpreter does not access the
>> func_globals dictionary through Python-level calls but directly
>> through PyDict_GetItem (see
>> http://osdir.com/ml/python.ideas/2007-11/msg00092.html). It is thus
>> impossible to re-define __getitem__ to return 'f' as needed. Ideally,
>> one would like to define a new closure for the function with a cell
>> variable containing the reference, but this is impossible at present
>> as far as I can tell.
>>
>> An alternative solution (see persistent_locals_with_kwarg in deco.py)
>> is to change the signature of the function with an additional keyword
>> argument f(arg1, arg2, _self_ref=f). However, this approach breaks
>> functions that define an *args argument.
>>
>> Cost
>> ====
>> The increase in execution time of the decorated function is minimal.
>> Given its domain of application, most of the functions will take a
>> significant amount of time to complete, making the cost of the decoration
>> negligible:
>>
>> import time
>> def f(x):
>>    time.sleep(0.5)
>>    return 2*x
>>
>> df = deco.persistent_locals(f)
>>
>> %timeit f(1)
>> 10 loops, best of 3: 501 ms per loop
>> %timeit df(1)
>> 10 loops, best of 3: 502 ms per loop
>>
>> Conclusion
>> ========
>>
>> The problem of accessing the intermediate results in an algorithm is
>> a recurrent one in my research, and this decorator turned out to be
>> quite useful on several occasions, and made some of the code much
>> cleaner. Hopefully, it will be useful in other contexts as well!
>>
>> All the best,
>> Pietro Berkes
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
>
> --
> Michael Droettboom
> Science Software Branch
> Space Telescope Science Institute
> Baltimore, Maryland, USA
>