The Importance of Terminology's Quality

Rob Warnock rpw3 at rpw3.org
Tue Jul 22 14:23:22 CEST 2008


Martin Gregorie  <martin at see_sig_for_address.invalid> wrote:
+---------------
| John W Kennedy wrote:
| > No, the "thunks" were necessary at the machine-language level to 
| > /implement/ ALGOL 60, but they could not be expressed /in/ ALGOL.
|
| Are you sure about that? 
+---------------

I don't know if John is, but *I* am!  ;-}

+---------------
| I used Algol 60 on an Elliott 503 and the ICL 1900 series back when
| it was a current language. The term "thunking" did not appear in either
| compiler manual nor in any Algol 60 language definition I've seen.
+---------------

It wouldn't have been. Thunks were something used by Algol 60
*compiler writers* in the code generated by their compilers to
implement the semantics of Algol 60 call-by-name, but were not
visible to users at all [except that they allowed call-by-name
to "work right"].

+---------------
| A60 could pass values by name or value and procedures by name. That
| was it. Call by name is what is now referred to as reference passing.
+---------------

(*sigh*) NO, IT IS NOT!!!  Please go read the following:

    http://en.wikipedia.org/wiki/Thunk
    http://en.wikipedia.org/wiki/Evaluation_strategy#Call_by_name
    http://en.wikipedia.org/wiki/Jensen%27s_Device

+---------------
| [Algol 60] did not have a mechanism for declaring anonymous procedures.
+---------------

Quite correct, but completely off the mark. While an Algol 60 *user*
could not declare an anonymous procedure, the *implementation* of an
Algol 60 compilers required the ability for the compiler itself to
generate/emit internal anonymous procedures, to wit, the above-mentioned
thunks, sometimes creating them dynamically during the procedure call.
[Actually, a pair of them for each actual argument in a procedure call.]

+---------------
| That, like the incorporation of machine code inserts, would have been
| a compiler-specific extension, so it is a terminological mistake to
| refer to it without specifying the implementing compiler.
+---------------

Again, "incompetent, irrelevant, and immaterial" [as Perry Mason used
to so frequently object during trials]. Thunks were not "extensions" to
Algol 60 compilers; they were part of the basic implementation strategy
*within* Algol 60 compilers, necessary because of the semantics required
by call-by-name.

Basically, in Algol 60, each parameter must be passed [in general,
that is, one can optimize away many special cases] as *two* closures --
conventionally called "thunks" by Algol 60 compiler writers -- one
for "getting" (evaluating) and the other for "setting" the parameter
[if the parameter was a "place" in Common Lisp terms, else an error
was signalled]... IN THE CALLER'S LEXICAL ENVIRONMENT!!

The big deal was two-fold: (1) each time a formal parameter was
*referenced* in a callee, the expression for the actual parameter
in the caller had to be *(re)evaluated* in the *caller's* lexical
environment, and the value of that (re)evaluation used as the
value of the referenced formal parameter in the callee; and
(2) if a variable appeared twice (or more) in a parameter list,
say, once as a naked variable [which is a "place", note!] and again
as a sub-expression of a more complicated parameter, then setting
the formal parameter in the *callee* would *change* the value of
the actual parameter in the caller(!!), which in turn would change
the value of the *other* actual parameter in the caller the next time
it was referenced in the callee. The above-referenced "Jensen's Device"
shows how this can be used to do "very tricky stuff". A simpler and
shorter example is here:

    http://www.cs.rit.edu/~afb/20013/plc/slides/procedures-07.html

Because the actual parameters in the callee had to be evaluated
in the *caller's* lexical environment -- and because Algol 60 was
fully recursive, allowed nested procedure definitions, and could
pass "local" procedures as arguments -- efficient implementation of
Algol 60 procedure calls almost necessitated placing the bodies
of the compiler-generated actual parameter thunks on the caller's
dynamic stack frame [or at least call instructions *to* the thunks
which could pass the current lexical contours as sub-arguments].
Knuth's nasty "man or boy test" stressed this to the limit:

    http://en.wikipedia.org/wiki/Man_or_boy_test


-Rob

p.s. IIRC [but it's been a *long* time!], the ALGOL-10 compiler
for the DEC PDP-10 passed each actual parameter as the address of
a triple of words, of which the first two were executable and the
third could be used to store a variables value (simple case) or
to pass the lexical contour (more complicated case). When the
callee needed to reference (evaluate) an argument, it used the 
PDP-10 XCT ("execute") instruction to execute the first word of
the block, which was required to deliver the value to a standard
register [let's say "T0", just for concreteness], and if the callee
wanted to *set* an argument, it executed the *second* word pointed
to by the passed address, with the new value also in a defined place
[again, let's use T0]. So to implement "X := X + 1;" in the callee,
the compiler would emit code like this:

	    MOVE  T1,{the arg (address) corresponding to "X"}
	    XCT   0(T1)       ; fetch the current value of X into T0.
	    ADDI  T0, 1       ; increment it
	    XCT   1(T1)       ; execute the "setter" for X.

Now in the case where the actual parameter in the caller was a
simple global variable, call it "Y", then the address passed as
the arg could be the following "YTHNK" in static data space:

    YTHNK:  MOVE   T0,.+2     ; one-instruction "getter"
            MOVEM  T0,.+1     ; one-instruction "setter"
    Y:      BLOCK  1           ; the actual place where the value "Y" lives

Whereas if the argument being passed were some more complicated
expression, such as an array reference or a reference to a local
procedure in the caller, then the 3-word arg block would be passed
on the stack and the passed address would point to this [possibly
dynamically-constructed] triple, where PUSHJ is the PDP-10 stack-
oriented subroutine call instruction:

            PUSHJ  P,{lambda-lifted getter code}
            PUSHJ  P,{lambda-lifted setter code}
	    EXP    {lexical contour info needed for getter/setter to work}

Efficient for the simple case; slow-but-correct for the messy case.

-----
Rob Warnock			<rpw3 at rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607




More information about the Python-list mailing list