[Numpy-discussion] Decorator to access intermediate results in a function/algorithm

Mon Jun 28 12:35:37 EDT 2010

Dear everybody,

This message belongs only marginally to a numpy-related mailing list,
but I thought it might be of interest here since it addresses what I
believe is a common pattern in scientific development. My apologies if
that is not the case...

The code can be found at http://github.com/pberkes/persistent_locals
and requires the byteplay library
(http://code.google.com/p/byteplay/).

The problem
=========

In scientific development, functions often represent complex data
processing algorithm that transform input data into a desired output.
Internally, the function typically requires several intermediate
results to be computed and stored in local variables.

As a simple toy example, consider the following function, that
takes three arguments and returns True if the sum of the arguments is
smaller than their product:

def is_sum_lt_prod(a,b,c):
   sum = a+b+c
   prod = a*b*c
   return sum<prod

A frequently occurring problem is that the developer/final user may
need to access the intermediate results at a later stage, in order to
analyze the detailed behavior of the algorithm, for debugging, or to
write more comprehensive tests.

A possible solution would be to re-define the function and return the
needed internal variables, but this would break the existing code. A
better solution would be to add a keyword argument to return more
information:

def is_sum_lt_prod(a,b,c, internals=False):
   sum = a+b+c
   prod = a*b*c
   if internals:
        return sum<prod, sum, prod
   else:
        return sum<prod

This would keep the existing code intact, but only moves the problem
to later stages of the development. If successively the developer
needs access to even more local variables, the code has to be modified
again, and part of the code is broken. Moreover, this style leads to
ugly code like

res, _, _, _, var1, _, var3 = f(x)

where most of the returned values are irrelevant.

Proposed solution
=============

The proposed solution consists in a decorator that makes the local
variables accessible from a function attribute, 'locals'. For example:

@persistent_locals
def is_sum_lt_prod(a,b,c):
   sum = a+b+c
   prod = a*b*c
   return sum<prod

After calling the function (e.g. is_sum_lt_prod(2,1,2), which returns
False) we can analyze the intermediate results as
is_sum_lt_prod.locals
-> {'a': 2, 'b': 1, 'c': 2, 'prod': 4, 'sum': 5}

This style is cleaner, is consistent with the principle of identifying
the value returned by a function as the output of an algorithm, and is
robust to changes in the needs of the researcher.

Note that the local variables are saved even in case of an exception,
which turns out to be quite useful for debugging.

How it works
=========

The local variables in the inner scope of a function are not easily
accessible. One solution (which I have not tried) may be to use
tracing code like the one used in a debugger. This, however, would
have a considerable cost in time.

The proposed approach is to wrap the function in a callable object,
and modify its bytecode by adding an external try...finally statement
as follows:

  def f(self, *args, **kwargs):
      try:
          ... old code ...
      finally:
          self.locals = locals().copy()
          del self.locals['self']

The reason for wrapping the function in a class, instead of saving the
locals in a function attribute directly, is that there are all sorts
of complications in referring to itself from within a function. For
example, referring to the attribute as f.locals results in the
bytecode looking for the name 'f' in the namespace, and therefore
moving the function, e.g. with
g = f
del f
would break 'g'. There are even more problems for functions defined in
a closure.

I tried modfying f.func_globals with a custom dictionary which keeps a
reference to f.func_globals, adding a static element to 'f', but this
does not work as the Python interpreter does not call the func_globals
dictionary with Python calls but directly with PyDict_GetItem (see
http://osdir.com/ml/python.ideas/2007-11/msg00092.html). It is thus
impossible to re-define __getitem__ to return 'f' as needed. Ideally,
one would like to define a new closure for the function with a cell
variable containing the reference, but this is impossible at present
as far as I can tell.

An alternative solution (see persistent_locals_with_kwarg in deco.py)
is to change the signature of the function with an additional keyword
argument f(arg1, arg2, _self_ref=f). However, this approach breaks
functions that define an *args argument.

Cost
====
The increase in execution time of the decorated function is minimal.
Given its domain of application, most of the functions will take a
significant amount of time to complete, making the cost the decoration
negligible:

import time
def f(x):
  time.sleep(0.5)
  return 2*x

df = deco.persistent_locals(f)

%timeit f(1)
10 loops, best of 3: 501 ms per loop
%timeit df(1)
10 loops, best of 3: 502 ms per loop

Conclusion
========

The problem of accessing the intermediate
results in an algorithm is a recurrent one in my research, and this
decorator turned out to be quite useful in several occasions, and made
some of the code much cleaner. Hopefully, it will be useful in other
contexts as well!

All the best,
Pietro Berkes