[Steven D'Aprano]
I'm hoping that the arguments for assignment expressions will be over by Christmas *wink* so as a partial (and hopefully less controversial) alternative, what do people think of the idea of flagging certain expressions as "pure functions" so the compiler can automatically cache results from it?
Let me explain: one of the use-cases for assignment expressions is to reduce repetition of code which may be expensive. A toy example:
func(arg) + func(arg)*2 + func(arg)**2
If func() is a pure function with no side-effects, that is three times as costly as it ought to be:
(f := func(arg)) + f*2 + f**2
Functional languages like Haskell can and do make this optimization all the time (or so I am led to believe), because the compiler knows that func must be a pure, side-effect-free function. But the Python interpreter cannot do this optimization for us, because it has no way of knowing whether func() is a pure function.
Now for the wacky idea: suppose we could tell the interpreter to cache the result of some sub-expression, and re-use it within the current expression? That would satisfy one use-case for assignment expressions, and perhaps weaken the need for the := operator.
Good idea? Dumb idea?
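Something close to this effect is available today, assuming func really is pure: functools.lru_cache memoizes across *all* calls, which is broader than the per-expression cache being proposed, but it illustrates the payoff. (func and its body here are illustrative stand-ins.)

```python
import functools

call_count = 0

@functools.lru_cache(maxsize=None)
def func(arg):
    global call_count
    call_count += 1       # count how often the body actually executes
    return arg * arg      # stand-in for an expensive pure computation

result = func(3) + func(3)*2 + func(3)**2
print(call_count)   # 1 -- the second and third calls hit the cache
print(result)       # 9 + 18 + 81 == 108
```

The difference is that lru_cache persists between expressions (and costs a dict lookup per hit), whereas the proposal would cache only within the surrounding expression.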
Despite the fact that Haskell can do optimizations like this, its "let ... in ..." and "... where ..." constructs (which create names for expressions, for use in another expression or code block) are widely used anyway. They don't care about the optimization (they already get it), but about improving clarity. In Haskell they'd spell it like, e.g. (mixing Python with Haskell keywords in UPPERCASE):

    LET fa = func(arg) IN fa + fa*2 + fa**2

which the compiler may (but almost certainly won't) optimize further to

    LET fa = func(arg) IN fa * (3 + fa)

if it knows that fa is of a type for which that makes sense. In Python today, I expect most people would do it as

    t = func(arg)
    t + 2*t + t*t  # or t*(3+t)

because they also know that multiplying t by itself once is usually faster than squaring ;-) And they wouldn't _want_ all the redundant typing in

    func(arg) + func(arg)*2 + func(arg)**2

anyway. So I'm not saying "good" or "bad", but that it needs a more compelling use case.
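Both claims above (hoisting the call pays off, and t*t beats t**2) can be checked with a quick timing sketch; func here is a made-up stand-in, and the numbers vary by machine and CPython version, so treat them as illustrative:

```python
import timeit

def func(arg):
    return sum(range(arg))   # made-up stand-in for a costly pure call

# Three calls per evaluation vs. one call plus cheap arithmetic.
repeated = timeit.timeit("func(100) + func(100)*2 + func(100)**2",
                         globals=globals(), number=10_000)
hoisted = timeit.timeit("t = func(100); t + 2*t + t*t",
                        globals=globals(), number=10_000)
print(f"repeated: {repeated:.4f}s  hoisted: {hoisted:.4f}s")
```

The hoisted form wins simply because the call happens once instead of three times; no compiler cleverness is required.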
Good idea, but you want the assignment operator regardless?
I'd probably write the example the way "I expect most people would do it" above even if we do get assignment expressions.
I don't have a suggestion for syntax yet, so I'm going to make up syntax which is *clearly and obviously rubbish*, a mere placeholder, so don't bother telling me all the myriad ways it sucks. I know it sucks, that's deliberate. Please focus on the *concept*, not the syntax.
We would need to flag which expression can be cached because it is PURE, and mark how far the CACHE extends:
<BEGIN CACHE> <PURE> func(arg) <END PURE> + func(arg)*2 + func(arg)**2 <END CACHE>
That syntax is clearly and obviously rubbish! It sucks. You're welcome ;-)
This would tell the compiler to only evaluate the sub-expression "func(arg)" once, cache the result, and re-use it each other time it sees that same sub-expression within the surrounding expression.
To be clear: it doesn't matter whether or not the sub-expression actually is pure. And it doesn't have to be a function call: it could be anything legal in an expression.
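Operationally, the proposal amounts to hoisting the tagged sub-expression into a hidden temporary. A hand-desugared sketch (the names and the side effect are illustrative; note that if func is *not* actually pure, its side effect now fires only once):

```python
def func(arg):
    print("evaluated")    # a side effect, to show the body runs only once
    return arg + 1

arg = 4

# Desugared form of:
#   <BEGIN CACHE> <PURE> func(arg) <END PURE> + func(arg)*2 + func(arg)**2 <END CACHE>
_cached = func(arg)                        # evaluate the tagged sub-expression once
result = _cached + _cached*2 + _cached**2  # every repetition reuses the cache
print(result)   # 5 + 10 + 25 == 40
```

This is exactly what a reader gets today by introducing a named temporary; the tag just asks the compiler to do it invisibly.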
If we had this, with appropriately awesome syntax, would that negate the usefulness of assignment expressions in your mind?
The use cases I envision for that have no intersection with use cases I have for assignment expressions, so, no. My first thought about where it might be handy probably has no intersection with what you were thinking of either ;-)

    <BEGIN CACHE> <PURE> math.ceil math.floor <END PURE>
    def int_away_from_zero(x):
        if x >= 0:
            return math.ceil(x)
        else:
            return math.floor(x)
    <END CACHE>

The body of `int_away_from_zero()` is the way I _want_ to write it. But in heavily used functions it's expensive to look up "math", then look up its "ceil" (or "floor") attribute, on every call. So stuff like this often abuses default arguments instead:

    def int_away_from_zero(x, mc=math.ceil, mf=math.floor):
        if x >= 0:
            return mc(x)
        else:
            return mf(x)

As the function grows over time, the default-arg abuse grows, and the body of the function gets more obscure as more-&-more "tiny names" are introduced to save on repeated global and module attribute lookups.

Indeed, in many cases I'd like to wrap an entire module in <BEGIN CACHE> .... <END CACHE>, with oodles of "module.attribute" thingies in the <PURE> block. _Most_ of my code gets no benefit from Python's "extremely dynamic" treatment of module.attribute. It would be great if Python could do those lookups once at module import time, then generate some kind of `LOAD_GLOBAL_FAST index` opcode to fetch the results whenever they're used anywhere inside the module. Which would doubtless delight all the people struggling to cut Python's startup time - "Jeez Louise - now he wants Python to do even _more_ at import time?!" ;-)

There are, e.g., other cases where invariant values of the form `n+1` or `n-1` are frequently used in a long function, and - cheap as each one is - it can actually make a time difference if they're pre-computed outside a loop. I'm ashamed of how many variables I have named "np1" and "nm1" :-(

So there's some interesting stuff to ponder here!
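The lookup cost behind the default-arg trick is measurable. A sketch comparing the two spellings of the function from the message (timings are machine-dependent and illustrative only):

```python
import math
import timeit

def with_lookups(x):
    # "math" is fetched from globals, then ".ceil"/".floor" looked up,
    # on every single call
    if x >= 0:
        return math.ceil(x)
    else:
        return math.floor(x)

def with_defaults(x, mc=math.ceil, mf=math.floor):
    # the default-arg trick: both lookups happen once, at def time,
    # and mc/mf are fast local-variable accesses thereafter
    if x >= 0:
        return mc(x)
    else:
        return mf(x)

slow = timeit.timeit("with_lookups(2.5)", globals=globals(), number=500_000)
fast = timeit.timeit("with_defaults(2.5)", globals=globals(), number=500_000)
print(f"module lookups: {slow:.3f}s  bound defaults: {fast:.3f}s")
```

Both spellings compute the same results; the proposed CACHE/PURE block would let the first spelling get the second spelling's speed.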