[Numpy-discussion] Decorator to access intermediate results in a function/algorithm

Michael Droettboom mdroe at stsci.edu
Mon Jun 28 12:51:23 EDT 2010


What are the implications of this with respect to memory usage?  When 
working with large arrays, if the intermediate values of a number of 
functions are kept around (whether we want to access them or not) could 
this not lead to excessive memory usage?  Maybe this behavior should 
only apply when (as you suggest in the counterexample) a "locals=True" 
kwarg is passed in.
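
As a rough sketch of that opt-in behavior (this is not the byteplay-based
implementation from the repository): the version below captures the locals
with sys.setprofile, and only when a hypothetical keep_locals=True keyword
is passed. Both the keyword name and the profiling approach are assumptions
made for illustration. Without the keyword nothing is retained, and any
earlier snapshot is dropped so that large arrays can be freed.

```python
import functools
import sys


def persistent_locals(func):
    """Store func's locals on wrapper.locals, but only when asked to."""

    @functools.wraps(func)
    def wrapper(*args, keep_locals=False, **kwargs):
        # Drop any earlier snapshot so it cannot pin large arrays in memory.
        wrapper.locals = {}
        if not keep_locals:
            return func(*args, **kwargs)

        captured = {}

        def profiler(frame, event, arg):
            # Copy the locals when the decorated function's frame exits.
            if event == "return" and frame.f_code is func.__code__:
                captured.update(frame.f_locals)

        previous = sys.getprofile()
        sys.setprofile(profiler)
        try:
            return func(*args, **kwargs)
        finally:
            # Restore whatever profiler was active and publish the snapshot.
            sys.setprofile(previous)
            wrapper.locals = captured

    wrapper.locals = {}
    return wrapper


@persistent_locals
def is_sum_lt_prod(a, b, c):
    total = a + b + c
    prod = a * b * c
    return total < prod
```

A plain call such as is_sum_lt_prod(2, 1, 2) leaves is_sum_lt_prod.locals
empty; passing keep_locals=True fills it with the arguments and the
intermediate values of that call. The snapshot still holds references until
the next call, so the caller remains responsible for clearing it.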

It seems like a lot of the maintainability issues raised in the 
counterexample could be solved by returning a dictionary or a bunch [1] 
instead of a tuple -- though that still (without care on the part of the 
user) has the "keeping around references to too much stuff" problem.
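
To make the bunch idea concrete, here is a minimal sketch applied to the toy
function from the original message. The Bunch class follows the general
shape of the recipe in [1]; the field names (result, total, prod) are
chosen here for illustration only.

```python
class Bunch:
    """A simple container whose fields are accessed as attributes."""

    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


def is_sum_lt_prod(a, b, c):
    total = a + b + c
    prod = a * b * c
    # Named fields: callers pick what they need and ignore the rest.
    return Bunch(result=total < prod, total=total, prod=prod)


out = is_sum_lt_prod(2, 1, 2)
print(out.result, out.total, out.prod)
```

New fields can be added later without breaking callers that only read
out.result, which avoids the tuple-unpacking problem. In current Python,
types.SimpleNamespace in the standard library serves the same purpose.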

[1] 
http://code.activestate.com/recipes/52308-the-simple-but-handy-collector-of-a-bunch-of-named/

Mike

On 06/28/2010 12:35 PM, Pietro Berkes wrote:
> Dear everybody,
>
> This message belongs only marginally to a numpy-related mailing list,
> but I thought it might be of interest here since it addresses what I
> believe is a common pattern in scientific development. My apologies if
> that is not the case...
>
> The code can be found at http://github.com/pberkes/persistent_locals
> and requires the byteplay library
> (http://code.google.com/p/byteplay/).
>
> The problem
> =========
>
> In scientific development, functions often represent complex data
> processing algorithms that transform input data into a desired output.
> Internally, the function typically requires several intermediate
> results to be computed and stored in local variables.
>
> As a simple toy example, consider the following function, that
> takes three arguments and returns True if the sum of the arguments is
> smaller than their product:
>
> def is_sum_lt_prod(a,b,c):
>     sum = a+b+c
>     prod = a*b*c
>     return sum<prod
>
> A frequently occurring problem is that the developer/final user may
> need to access the intermediate results at a later stage, in order to
> analyze the detailed behavior of the algorithm, for debugging, or to
> write more comprehensive tests.
>
> A possible solution would be to re-define the function and return the
> needed internal variables, but this would break the existing code. A
> better solution would be to add a keyword argument to return more
> information:
>
> def is_sum_lt_prod(a,b,c, internals=False):
>     sum = a+b+c
>     prod = a*b*c
>     if internals:
>          return sum<prod, sum, prod
>     else:
>          return sum<prod
>
> This would keep the existing code intact, but only moves the problem
> to later stages of the development. If the developer later needs
> access to even more local variables, the code has to be modified
> again, and part of the existing code breaks. Moreover, this style
> leads to ugly code like
>
> res, _, _, _, var1, _, var3 = f(x)
>
> where most of the returned values are irrelevant.
>
> Proposed solution
> =============
>
> The proposed solution consists of a decorator that makes the local
> variables accessible from a function attribute, 'locals'. For example:
>
> @persistent_locals
> def is_sum_lt_prod(a,b,c):
>     sum = a+b+c
>     prod = a*b*c
>     return sum<prod
>
> After calling the function (e.g. is_sum_lt_prod(2,1,2), which returns
> False) we can analyze the intermediate results as
> is_sum_lt_prod.locals
> ->  {'a': 2, 'b': 1, 'c': 2, 'prod': 4, 'sum': 5}
>
> This style is cleaner, is consistent with the principle of identifying
> the value returned by a function as the output of an algorithm, and is
> robust to changes in the needs of the researcher.
>
> Note that the local variables are saved even in case of an exception,
> which turns out to be quite useful for debugging.
>
> How it works
> =========
>
> The local variables in the inner scope of a function are not easily
> accessible. One solution (which I have not tried) may be to use
> tracing code like the one used in a debugger. This, however, would
> have a considerable cost in time.
>
> The proposed approach is to wrap the function in a callable object,
> and modify its bytecode by adding an external try...finally statement
> as follows:
>
>    def f(self, *args, **kwargs):
>        try:
>            ... old code ...
>        finally:
>            self.locals = locals().copy()
>            del self.locals['self']
>
> The reason for wrapping the function in a class, instead of saving the
> locals in a function attribute directly, is that there are all sorts
> of complications in referring to itself from within a function. For
> example, referring to the attribute as f.locals results in the
> bytecode looking for the name 'f' in the namespace, and therefore
> moving the function, e.g. with
> g = f
> del f
> would break 'g'. There are even more problems for functions defined in
> a closure.
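
The name-lookup problem described above can be reproduced in a few lines.
This is only an illustration of the failure mode, not code from the
repository; the attribute name 'calls' is made up for the example.

```python
def f():
    # The body looks up the global name 'f' every time it runs.
    f.calls = getattr(f, "calls", 0) + 1
    return f.calls


g = f
first = g()  # works while the name 'f' is still bound

del f
try:
    g()
    broke = False
except NameError:
    # 'g' is the same function object, but its body can no longer find 'f'.
    broke = True
```

Wrapping the function in a callable object avoids this, because the
wrapper reaches its own state through 'self' rather than through a global
name.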
>
> I tried modifying f.func_globals with a custom dictionary which keeps a
> reference to f.func_globals, adding a static element to 'f', but this
> does not work as the Python interpreter does not call the func_globals
> dictionary with Python calls but directly with PyDict_GetItem (see
> http://osdir.com/ml/python.ideas/2007-11/msg00092.html). It is thus
> impossible to re-define __getitem__ to return 'f' as needed. Ideally,
> one would like to define a new closure for the function with a cell
> variable containing the reference, but this is impossible at present
> as far as I can tell.
>
> An alternative solution (see persistent_locals_with_kwarg in deco.py)
> is to change the signature of the function with an additional keyword
> argument f(arg1, arg2, _self_ref=f). However, this approach breaks
> functions that define an *args argument.
>
> Cost
> ====
> The increase in execution time of the decorated function is minimal.
> Given its domain of application, most of the functions will take a
> significant amount of time to complete, making the cost of the decoration
> negligible:
>
> import time
> import deco  # deco.py from the persistent_locals repository
> def f(x):
>    time.sleep(0.5)
>    return 2*x
>
> df = deco.persistent_locals(f)
>
> %timeit f(1)
> 10 loops, best of 3: 501 ms per loop
> %timeit df(1)
> 10 loops, best of 3: 502 ms per loop
>
> Conclusion
> ========
>
> The problem of accessing the intermediate
> results in an algorithm is a recurrent one in my research, and this
> decorator turned out to be quite useful on several occasions, and made
> some of the code much cleaner. Hopefully, it will be useful in other
> contexts as well!
>
> All the best,
> Pietro Berkes


-- 
Michael Droettboom
Science Software Branch
Space Telescope Science Institute
Baltimore, Maryland, USA



