[Python-ideas] Before and after the colon in function defs.
Nick Coghlan
ncoghlan at gmail.com
Tue Sep 20 00:23:14 CEST 2011
On Mon, Sep 19, 2011 at 11:22 PM, Sven Marnach <sven at marnach.net> wrote:
> I don't see too much benefit of the proposed syntax for this use
> case. If f() is a local throw-away function, I wouldn't worry about
> its signature. If f() is a longer-lived object and I do care about
> its signature, I'd use a class:
>
> class Adder:
>     def __init__(self, i):
>         self.i = i
>     def __call__(self, x):
>         return x + self.i
>
> [...] adders.append(Adder(i))
>
> I still think classes are the Python way to hide state, not closures.
The thing is, real function objects genuinely *are* special. They have
a privileged place in the interpreter, the inspect module and other
introspection tools know more about how to deal with them, they have
instance method descriptor behaviour built in, etc.
Switching from a real function object to a custom class with a
__call__ method is genuinely expensive in terms of the difficulty of
writing the code in the first place, as well as in being able to read
it later. Your 'Adder' example above, for instance, doesn't implement
the descriptor protocol, so it will behave like a staticmethod when
placed in a class. That may be what you want, but it isn't easy to get
instance method behaviour in the cases where you'd prefer that.
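To make that concrete, here's a quick sketch (the names 'adder_func' and
'Demo' are invented purely for illustration) of how the two behave
differently when stored as class attributes:

class Adder:
    def __init__(self, i):
        self.i = i
    def __call__(self, x):
        return x + self.i

def adder_func(i):
    def f(x):
        return x + i
    return f

class Demo:
    # Callable instance: no __get__, so no implicit 'self' binding --
    # it behaves like a staticmethod.
    add_obj = Adder(1)
    # Real function: the descriptor protocol turns attribute access
    # into a bound method, so the instance is passed implicitly.
    add_fn = adder_func(1)

d = Demo()
print(d.add_obj(5))   # 6 -- 'd' is not passed in
d.add_fn(5)           # TypeError -- 'd' is passed as the first argument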
$ python3 -m timeit -s "from call import Adder; f = Adder(1)" "f(5)"
1000000 loops, best of 3: 0.362 usec per loop
$ python3 -m timeit -s "from call import Adder; f = Adder(1).__call__" "f(5)"
1000000 loops, best of 3: 0.222 usec per loop
$ python3 -m timeit -s "from closure import adder; f = adder(1)" "f(5)"
10000000 loops, best of 3: 0.174 usec per loop
$ python3 -m timeit -s "from default import adder; f = adder(1)" "f(5)"
10000000 loops, best of 3: 0.166 usec per loop
When what you're trying to express is a single short algorithm,
overriding __call__ isn't even in the contest - we aren't talking
minor percentage differences in call overhead, we're talking more than
double. You can gain a fair bit of that back by retrieving __call__ as
a bound method in advance, but really, your algorithm needs to be
complex enough for the difference in call overhead to be trivial
before implementing __call__ becomes an attractive alternative to
using a closure.
Now, if there are *multiple* algorithms operating on the same data,
then obviously you want a class with multiple methods. But in that
case, you're less likely to bless any one of them with the privilege of
occupying the '__call__' slot. Basically, classes make sense when the
state is the most important thing, while functions focus on a specific
algorithm. For the special case of "single algorithm with some
associated state", a closure (or the default argument hack) will often
be a better modelling tool than a class. (Obviously, anyone who
disagrees with me on this point will consider this whole thread silly
- however, the popularity of closures for the implementation of
factory functions, including decorator factories, shows that there is
plenty of code out there that happily follows this approach)
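As a reminder of what that pattern looks like, here's a minimal
decorator factory built on a closure (the 'repeat' decorator is just a
made-up example, not anything from the stdlib):

import functools

def repeat(times):
    # The closure over 'times' is the state that configures the
    # returned decorator.
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = None
            for _ in range(times):
                result = func(*args, **kwargs)
            return result
        return wrapper
    return decorator

@repeat(3)
def greet(name):
    print("Hello,", name)

greet("world")   # prints the greeting three times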
With closures vs the default argument hack, the performance and
introspection arguments don't apply - in both of these cases you have
a real function, so the only trade-off is between forcing readers to
understand how closures work and forcing them to ignore additional
arguments that are there just to prepopulate the local namespace with
some data.
However, the closure approach has some genuine downsides from a
readability point of view. The real function when using a closure is
the inner one. The outer function definition, the trailing return
statement and the invocation of the outer function are all boilerplate
that obscures the actual purpose of the code.
You can tidy some of that up with a decorator, but you can't avoid the
need for the nested function in order to create the closure.
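One possible way of tidying it up is a small helper decorator that calls
the factory immediately and rebinds the name to the result, hiding the
explicit trailing call - but the nested def is still there. (The
'call_with' name below is hypothetical, not an existing API:)

def call_with(*args):
    # Hypothetical helper: immediately call the decorated factory and
    # rebind the name to whatever it returns.
    def decorator(factory):
        return factory(*args)
    return decorator

@call_with(1)
def add_one(i):
    # The nested function is still required - it is the only thing
    # that actually creates the closure over 'i'.
    def f(x):
        return x + i
    return f

print(add_one(5))   # 6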
And that's why people use the default argument hack today - they weigh
those downsides up against the consequences of having a bit of noise
in the function signature (as seen by introspection tools) and decide
they're happy to accept that trade-off in return for a simple and
straightforward expression of a single algorithm with some associated
state.
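That signature noise is easy to see with the inspect module, reusing the
closure.py and default.py helpers from the P.S. below (output
abbreviated):

import inspect
from closure import adder as closure_adder
from default import adder as default_adder

# The closure keeps the signature clean:
print(inspect.getfullargspec(closure_adder(1)))
# FullArgSpec(args=['x'], ..., defaults=None, ...)

# The default argument hack leaks its state into the signature:
print(inspect.getfullargspec(default_adder(1)))
# FullArgSpec(args=['x', '_i'], ..., defaults=(1,), ...)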
Various ideas for addressing this have been proposed in the past. PEP
3150's statement local namespaces are one, as are the assorted
incarnations of the proposal for flagging arbitrary expressions within
a function for evaluation at definition time rather than runtime
(search the python-dev and python-ideas archives for phrases like 'once
statement' and 'atdef' - alas, nobody has stepped forward to codify
those ideas into a PEP).
Where those proposals all come unstuck is that they try to do more
than the default argument hack allows, *without compelling use cases
to guide the additional semantics*. The pre-initialised locals concept
deliberately avoids that problem by targeting exactly the use cases
that are *already* supported via the default argument hack, just in a
way that tries to avoid the negative effects on introspection and
readability.
Cheers,
Nick.
P.S. Code for the timeit runs:
$ cat > call.py
class Adder(object):
    def __init__(self, i):
        self.i = i
    def __call__(self, x):
        return x + self.i
$ cat > closure.py
def adder(i):
    def f(x):
        return x + i
    return f
$ cat > default.py
def adder(i):
    def f(x, _i=i):
        return x + _i
    return f
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia