On 3 January 2016 at 04:56, Andrew Barnert via Python-ideas wrote:
> On Jan 2, 2016, at 10:14, u8y7541 The Awesome Person wrote:
>> In most decorator tutorials, it's taught using functions inside functions. Isn't this inefficient because every time the decorator is called, you're redefining the function, which takes up much more memory?
No.
First, most decorators are only called once. For example:
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)
The lru_cache function gets called once, and creates and returns a decorator function. That decorator function is passed fib, and creates and returns a wrapper function. That wrapper function is what gets stored in the globals as fib. It may then get called a zillion times, but it doesn't create or call any new function.
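That layering can be made explicit with a hand-rolled sketch of the same structure. (The names my_cache, decorating_function and wrapper are illustrative, not the actual functools internals, and maxsize is accepted but ignored here.)

```python
from functools import wraps

def my_cache(maxsize=None):            # decorator factory: runs once per decoration site
    def decorating_function(func):     # transient decorator: runs once per function
        cache = {}                     # maxsize is ignored in this sketch
        @wraps(func)
        def wrapper(*args):            # wrapper: runs on every call to the wrapped function
            if args not in cache:
                cache[args] = func(*args)
            return cache[args]
        return wrapper
    return decorating_function

@my_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)

print(fib(30))  # 832040
```

Only the innermost wrapper survives past the function definition; the factory and the transient decorator each run exactly once.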
So, why should you care whether lru_cache is implemented with a function or a class? You're talking about a difference of a few dozen bytes, once in the entire lifetime of your program.
We need to make a slight terminology clarification here, as the answer to Surya's question changes depending on whether we're talking about implementing wrapper functions inside decorators (like the "_lru_cache_wrapper" that lru_cache wraps around the passed-in callable), or about implementing decorators inside decorator factories (like the transient "decorating_function" that lru_cache uses to apply the wrapper to the function being defined). Most of the time that distinction isn't important, so folks use the more informal approach of using "decorator" to refer to both decorators and decorator factories, but this is a situation where the difference matters.

Every decorator factory does roughly the same thing: when called, it produces a new instance of a callable type which accepts a single function as its sole argument. From the perspective of the *user* of the decorator factory, it doesn't matter whether internally that's handled using a def statement, a lambda expression, functools.partial, instantiating a class that defines a custom __call__ method, or some other technique. It's also rare for decorator factories to be invoked in code that's a performance bottleneck, so it's generally more important to optimise for readability and maintainability when writing them than it is to optimise for speed.

The wrapper functions themselves, though, exist in a one-to-one correspondence with the functions they're applied to: when you apply functools.lru_cache to a function, the transient decorator produced by the decorator factory only lasts as long as the execution of the function definition, but the wrapper function lasts for as long as the wrapped function does, and gets invoked every time that function is called (and if a function is performance critical enough for the results to be worth caching, then it's likely performance critical enough to be thinking about micro-optimisations).
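To illustrate that flexibility, here's a sketch of a caching wrapper implemented as a class with a custom __call__ method instead of a nested def (CachingWrapper and my_cache are made-up names for illustration, not part of functools):

```python
class CachingWrapper:
    """One instance exists per wrapped function, just like a nested def."""
    def __init__(self, func):
        self.func = func
        self.cache = {}

    def __call__(self, *args):   # invoked on every call to the wrapped function
        if args not in self.cache:
            self.cache[args] = self.func(*args)
        return self.cache[args]

def my_cache(maxsize=None):      # decorator factory: the class itself plays
    return CachingWrapper        # the role of the transient decorator

@my_cache()
def fib(n):
    return n if n < 2 else fib(n-1) + fib(n-2)

print(fib(30))  # 832040
```

From the caller's point of view, fib behaves the same either way; the difference is purely in how the wrapper object is constructed and invoked, which is exactly what the questions below measure.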
As such, from a micro-optimisation perspective, it's reasonable to want to know the answers to:

* Which is faster, defining a new function object, or instantiating an existing class?
* Which is faster, calling a function object that accepts a single parameter, or calling a class with a custom __call__ method?
* Which uses more memory, defining a new function object, or instantiating an existing class?

The answers to these questions can technically vary by implementation, but in practice, CPython's likely to be representative of their *relative* performance for any given implementation, so we can use it to check whether or not our intuitions about relative speed and memory consumption are correct.

For the first question, then, here are the numbers I get locally for CPython 3.4:

$ python3 -m timeit "def f(): pass"
10000000 loops, best of 3: 0.0744 usec per loop
$ python3 -m timeit -s "class C: pass" "c = C()"
10000000 loops, best of 3: 0.113 usec per loop

The trick here is to realise that *at runtime*, a def statement is really just instantiating a new instance of types.FunctionType - most of the heavy lifting has already been done at compile time. The reason it manages to be faster than typical class instantiation is that we get to use customised bytecode operating on constant values, rather than having to look the class up by name and make a standard function call:
>>> dis.dis("def f(): pass")
  1           0 LOAD_CONST               0 (<code object f at ...>)
              3 LOAD_CONST               1 ('f')
              6 MAKE_FUNCTION            0
              9 STORE_NAME               0 (f)
             12 LOAD_CONST               2 (None)
             15 RETURN_VALUE
>>> dis.dis("c = C()")
  1           0 LOAD_NAME                0 (C)
              3 CALL_FUNCTION            0 (0 positional, 0 keyword pair)
              6 STORE_NAME               1 (c)
              9 LOAD_CONST               0 (None)
             12 RETURN_VALUE
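The shell measurements above can also be reproduced from inside the interpreter with the timeit module. Absolute numbers will vary by machine and Python version, so no particular figures are assumed here:

```python
import timeit

# Time a million executions of each statement; results are total seconds.
def_time = timeit.timeit("def f(): pass", number=1000000)
cls_time = timeit.timeit("c = C()", setup="class C: pass", number=1000000)

print("def statement:       {:.4f}s".format(def_time))
print("class instantiation: {:.4f}s".format(cls_time))
```

Passing the class definition via setup mirrors the -s option used on the command line, so only the statement being measured runs inside the timing loop.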
For the second question:

$ python3 -m timeit -s "def f(arg): pass" "f(None)"
10000000 loops, best of 3: 0.111 usec per loop
$ python3 -m timeit -s "class C:" -s " def __call__(self, arg): pass" -s "c = C()" "c(None)"
1000000 loops, best of 3: 0.232 usec per loop

Again, we see that the native function outperforms the class with a custom __call__ method. There's no difference in the bytecode this time, but rather a difference in what happens inside the CALL_FUNCTION opcode: for the second case, we first have to retrieve the bound c.__call__() method, and then call *that* as c.__call__(None), which in turn internally calls C.__call__(c, None), while for the native function case we get to skip straight to running the called function.

The speed difference can be significantly reduced (but not entirely eliminated) by caching the bound method during setup:

$ python3 -m timeit -s "class C:" -s " def __call__(self, arg): pass" -s "c_call = C().__call__" "c_call(None)"
10000000 loops, best of 3: 0.115 usec per loop

Finally, we get to the question of relative size: are function instances larger or smaller than your typical class instance? Again, we don't have to guess - we can use the interpreter to experiment and check our assumptions:

>>> import sys
>>> def f(): pass
...
>>> sys.getsizeof(f)
136
>>> class C(): pass
...
>>> sys.getsizeof(C())
56

That's a potentially noticeable difference if we're applying the wrapper often enough - the native function is 80 bytes larger than an empty standard class instance. Looking at the available data attributes on f, we can see the likely causes of the difference:

>>> set(dir(f)) - set(dir(C()))
{'__code__', '__defaults__', '__name__', '__closure__', '__get__', '__kwdefaults__', '__qualname__', '__annotations__', '__globals__', '__call__'}

There are 10 additional attributes there, although 2 of them (__get__ and __call__) relate to methods our native function defines that the empty class doesn't.
The other 8 represent additional pieces of data stored (or potentially stored) per function that we don't store for a typical class instance.

However, we also need to account for the overhead of defining a new class object, and that's a non-trivial amount of memory when we're talking about a size difference of only 80 bytes per wrapped function:

>>> sys.getsizeof(C)
976

That means if a wrapper function is only used a few times in any given run of the program, then native functions will be faster *and* use less memory (at least on CPython). If the wrapper is used more often than that, then native functions will still be the fastest option, but not the lowest memory option.

Furthermore, if we decide to cache the bound __call__ method to reduce the speed impact of using a custom __call__ method, we give up most of the memory gains:

>>> sys.getsizeof(C().__call__)
64

This all suggests that if your application is severely memory constrained (e.g. it's running on an embedded interpreter like MicroPython), then it *might* make sense to incur the extra complexity of using classes with a custom __call__ method to define wrapper functions, rather than just using a nested function. For more typical cases, though, the difference is going to disappear into the noise, so you're likely to be better off defaulting to nested function definitions, and only switching to the class-based version in cases where it's more readable and maintainable (and in those cases considering whether or not it might make sense to return the bound __call__ method from the decorator, rather than the callable itself).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan@gmail.com   |   Brisbane, Australia