More about thunks. I've resisted the temptation to look at the Wikipedia page -- this is from memory.

On Sat, Oct 23, 2021 at 11:39 PM Steven D'Aprano <steve@pearwood.info> wrote:
>       I would prefer to see this situation handled as part of a
>       larger-scale
> change of adding some kind of "inline lambda" which executes directly in
> the calling scope.

That would be what I called a "thunk" in two posts now, stealing the
term from Algol.

It would be nice if one of the core devs who understand the deep
internals of the interpreter could comment on whether that sort of
delayed evaluation of an expression is even plausible for Python.

IIRC, a thunk in Algol was more or less defined as "substitute the argument in each use". There was no expression in the function definition, just an argument. Translating to Python, we'd have something like this (all arguments were thunks by default):

def foo(arg):
    print(arg)
    arg = arg + 1

x = 42
foo(x)
print(x)

This would print 42 and 43. Writing foo(42) would produce an error. But if you had another function

def bar(arg):
    print(arg)

it would be okay to write bar(42).
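To make the intended semantics concrete, here's a hypothetical emulation in today's Python: imagine the compiler translating the call foo(x) into passing a getter and a setter closed over the caller's x, with each use of 'arg' inside foo becoming a call to one of them. The names get_arg/set_arg are mine, purely for illustration.

```python
x = 42

# Hidden functions the compiler would generate at the call site.
def get_arg():
    return x

def set_arg(val):
    global x          # the real mechanism would capture the caller's scope
    x = val

def foo(get_arg, set_arg):
    print(get_arg())            # arg
    set_arg(get_arg() + 1)      # arg = arg + 1

foo(get_arg, set_arg)
print(x)  # 43 -- the assignment inside foo updated the caller's x
```

This is why foo(42) would be an error under such a scheme: there is no variable for the setter to assign to.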

A key property (IIRC) was that the thunk would be evaluated each time it was used. This led to "Jensen's device", where you could write

def foo(a, b, n):
    total = 0
    for a in range(1, n+1):  # IIRC Algol spelled this 'for a := 1 to n'
        total += b
    return total

and you'd call it like this:

foo(x, x**2, 10)

which would compute the sum of the squares of i for i in range(1, 11).
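Jensen's device can be sketched in current Python by making the thunk arguments explicit closures, along the lines of the getter/setter translation discussed below. All names here (get_a, set_a, get_b) are hypothetical; and note that in real Algol 'n' would be a thunk too, while here it is passed by value for simplicity.

```python
def foo(get_a, set_a, get_b, n):
    total = 0
    for i in range(1, n + 1):
        set_a(i)          # assigning to thunk 'a' updates the caller's x
        total += get_b()  # thunk 'b' re-evaluates x**2 with the new x
    return total

x = None  # dummy; foo's loop assigns it through set_a

def set_a(val):
    global x
    x = val

# get_a (lambda: x) is passed for completeness but unused: foo only assigns to a.
result = foo(lambda: x, set_a, lambda: x**2, 10)
print(result)  # 385, the sum of the squares 1..10
```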

It went out of style because it was complex to implement and expensive to execute -- for example, the 'n' argument would be a thunk too.

You can't easily introduce this in Python because Python is built on not knowing the signature of foo when the code for the call foo(x, x**2, 10) is compiled. So it wouldn't be sufficient to mark 'a' and 'b' as thunks in the 'def foo(a, b, n)' line. You'd also have to mark the call site.

Suppose we introduce a \ to indicate thunks in both places. (This is the worst character but enough to explain the mechanism.)

You'd write

def foo(\a, \b, n):
    total = 0
    for a in range(1, n+1):
        total += b
    return total

and you'd call it like

x = None  # dummy
foo(\x, \x**2, 10)


Now the compiler has enough information to compile code for the thunks. Since thunks can be used as l-values as well as r-values, there would be two hidden functions, one to get the value, another to set it. The call would pass an object containing those two functions and the code in the callee would translate each use of a thunk argument into a call to either the getter or the setter. Above, the getter functions for x and x**2 are simple:

get_a = lambda: x
get_b = lambda: x**2

(Defined in the caller's scope.)

The setter function for the first argument would be a bit more complex:

def set_a(val):
    nonlocal x
    x = val

The second argument's setter function is missing, since 'x**2' is not an l-value. (We'd get a runtime error if foo() assigned to b.)
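A rough sketch of the "object containing those two functions" in plain Python might look like this. The Thunk class and its method names are invented for illustration; a thunk for a non-l-value expression simply carries no setter, and assigning through it raises at run time, as described above.

```python
class Thunk:
    """Hypothetical object pairing the two hidden functions."""
    def __init__(self, get, set=None):
        self.get = get
        self._set = set           # None when the expression is not an l-value

    def set(self, val):
        if self._set is None:
            raise TypeError("thunk expression is not an l-value")
        self._set(val)

x = 3

def _set_x(val):
    global x
    x = val

a = Thunk(lambda: x, _set_x)
b = Thunk(lambda: x**2)           # no setter: x**2 can't be assigned

a.set(5)                          # assignment through thunk 'a' rebinds x
print(b.get())                    # 25 -- 'b' re-evaluates with the updated x
try:
    b.set(0)
except TypeError as e:
    print(e)                      # runtime error, since x**2 is not an l-value
```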

If we wanted thunks without assignment and without Jensen's device, we would only need a getter function. Then \x in the caller would just pass lambda: x, and an argument marked with \ in a function definition would cause code to be generated that calls the getter each time the argument is used.

Getting rid of the need to mark all thunk arguments in the caller would require the compiler to have knowledge of which function is being called. That would require an amount of static analysis beyond what even mypy and friends can do, so I don't think we should try to pursue that.

The key property of a thunk, IMO, is that it is evaluated in the caller's scope. It's no different than a function defined in the caller. I don't think it would be a good substitute for late-binding default arguments. (You could make something up that uses dynamic scoping, but that's a whole different can of worms.)

--
--Guido van Rossum (python.org/~guido)