A partial (wacky?) alternative to assignment expressions

I'm hoping that the arguments for assignment expressions will be over by Christmas *wink*, so as a partial (and hopefully less controversial) alternative, what do people think of the idea of flagging certain expressions as "pure functions" so the compiler can automatically cache their results? Let me explain: one of the use-cases for assignment expressions is to reduce repetition of code which may be expensive. A toy example:

    func(arg) + func(arg)*2 + func(arg)**2

If func() is a pure function with no side-effects, that is three times as costly as it ought to be, compared to the assignment-expression spelling:

    (f := func(arg)) + f*2 + f**2

Functional languages like Haskell can and do make this optimization all the time (or so I am led to believe), because the compiler knows that func must be a pure, side-effect-free function. But the Python interpreter cannot do this optimization for us, because it has no way of knowing whether func() is a pure function.

Now for the wacky idea: suppose we could tell the interpreter to cache the result of some sub-expression, and re-use it within the current expression? That would satisfy one use-case for assignment expressions, and perhaps weaken the need for the := operator.

Good idea? Dumb idea? Good idea, but you want the assignment operator regardless?

I don't have a suggestion for syntax yet, so I'm going to make up syntax which is *clearly and obviously rubbish*, a mere placeholder, so don't bother telling me all the myriad ways it sucks. I know it sucks; that's deliberate. Please focus on the *concept*, not the syntax. We would need to flag which expression can be cached because it is PURE, and tag how far the CACHE operates over:

    <BEGIN CACHE>
        <PURE> func(arg) <END PURE> + func(arg)*2 + func(arg)**2
    <END CACHE>

This would tell the compiler to evaluate the sub-expression "func(arg)" only once, cache the result, and re-use it each subsequent time it sees that same sub-expression within the surrounding expression. To be clear: it doesn't matter whether or not the sub-expression actually is pure. And it doesn't have to be a function call: it could be anything legal in an expression.

If we had this, with appropriately awesome syntax, would that negate the usefulness of assignment expressions in your mind?

--
Steve
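P.S. For a sense of what such a cache would be doing behind the scenes, here is a hand-rolled sketch - purely illustrative, with a made-up `cached` helper and `_memo` dict, and assuming func really is pure (func and arg as in the toy example above):

    _memo = {}

    def cached(f, *args):
        # Evaluate f(*args) at most once per distinct argument tuple,
        # then reuse the stored result.
        key = (f, args)
        if key not in _memo:
            _memo[key] = f(*args)
        return _memo[key]

    total = cached(func, arg) + cached(func, arg)*2 + cached(func, arg)**2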

On Tue, May 15, 2018 at 10:35 AM, Steven D'Aprano <steve@pearwood.info> wrote:
Not for me, no. It doesn't eliminate lengthy expressions, only costly ones. It also doesn't deduplicate for convenience of editing. So that's one goal achieved, two not achieved.

If the function is actually pure, all you need is lru_cache and you get that result. If it isn't, you've hit one of the two hardest problems in computing; good luck.

ChrisA

The time machine is used again! We HAVE a spelling:

    @functools.lru_cache()

The problem is that there's still a time/space trade-off. If you might call a (pure) function with millions of different values, you likely don't want to cache them all. The current spelling makes this configurable, but there's no universal right answer.
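For concreteness, a minimal sketch of the configurable spelling - `func` here is just a stand-in for something genuinely expensive and side-effect-free, and 1024 is an arbitrary cap:

    import functools

    @functools.lru_cache(maxsize=1024)   # bound the cache; maxsize=None would keep every result
    def func(arg):
        return arg * arg   # placeholder for the expensive, pure computation

    arg = 7
    total = func(arg) + func(arg)*2 + func(arg)**2   # only the first call does the work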

[Steven D'Aprano <steve@pearwood.info>]
Despite that Haskell can do optimizations like this, its "let ... in ..." and "... where ..." constructs (which create names for expressions, for use in another expression or code block) are widely used anyway. They don't care about the optimization (they already get it), but about improving clarity. In Haskell they'd spell it like, e.g. (mixing Python with Haskell keywords in UPPERCASE):

    LET fa = func(arg) IN fa + fa*2 + fa**2

which the compiler may (but almost certainly won't) optimize further to

    LET fa = func(arg) IN fa * (3 + fa)

if it knows that fa is of a type for which that makes sense.

In Python today, I expect most people would do it as

    t = func(arg)
    t + 2*t + t*t  # or t*(3+t)

because they also know that multiplying t by itself once is usually faster than squaring ;-) And they wouldn't _want_ all the redundant typing in

    func(arg) + func(arg)*2 + func(arg)**2

anyway. So I'm not saying "good" or "bad", but that it needs a more compelling use case.
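For what it's worth, the closest single-expression analogue of Haskell's LET in today's Python is probably the one-element comprehension trick - legal, but hardly an improvement in clarity (`func` and `arg` as above):

    # Bind fa to func(arg) once, inside a single expression.
    result = [fa + fa*2 + fa**2 for fa in [func(arg)]][0]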
> Good idea, but you want the assignment operator regardless?
I'd probably write the example the way "I expect most people would do it" above even if we do get assignment expressions.
That syntax is clearly and obviously rubbish! It sucks. You're welcome ;-)
The use cases I envision for that have no intersection with use cases I have for assignment expressions, so, no. My first thought about where it might be handy probably has no intersection with what you were thinking of either ;-)

    <BEGIN CACHE>
    <PURE>
    math.ceil
    math.floor
    <END PURE>

    def int_away_from_zero(x):
        if x >= 0:
            return math.ceil(x)
        else:
            return math.floor(x)
    <END CACHE>

The body of `int_away_from_zero()` is the way I _want_ to write it. But in heavily used functions it's expensive to look up "math", then look up its "ceil" (or "floor") attribute, on every call. So stuff like this often abuses default arguments instead:

    def int_away_from_zero(x, mc=math.ceil, mf=math.floor):
        if x >= 0:
            return mc(x)
        else:
            return mf(x)

As the function grows over time, the default-arg abuse grows, and the body of the function gets more obscure as more-&-more "tiny names" are introduced to save on repeated global and module attribute lookups.

Indeed, in many cases I'd like to wrap an entire module in <BEGIN CACHE> ... <END CACHE>, with oodles of "module.attribute" thingies in the <PURE> block. _Most_ of my code gets no benefit from Python's "extremely dynamic" treatment of module.attribute. It would be great if Python could do those lookups once at module import time, then generate some kind of `LOAD_GLOBAL_FAST index` opcode to fetch the results whenever they're used anywhere inside the module. Which would doubtless delight all the people struggling to cut Python's startup time - "Jeez Louise - now he wants Python to do even _more_ at import time?!" ;-)

There are, e.g., other cases where invariant values of the form `n+1` or `n-1` are frequently used in a long function, and - cheap as each one is - it can actually make a time difference if they're pre-computed outside a loop. I'm ashamed of how many variables I have named "np1" and "nm1" :-(

So there's some interesting stuff to ponder here!
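For completeness, the usual middle ground today is a from-import, which does the attribute lookups once at import time but still leaves a plain global lookup on every call - readable, though not quite as fast as the default-argument trick. A sketch with the same toy function:

    from math import ceil, floor   # "math.ceil" / "math.floor" looked up once, here

    def int_away_from_zero(x):
        if x >= 0:
            return ceil(x)     # global lookup per call, but no attribute lookup
        else:
            return floor(x)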

On Mon, May 14, 2018 at 8:35 PM, Steven D'Aprano <steve@pearwood.info> wrote:
While I support the idea of marking pure functions somehow, automating this optimization is not a simple matter. Haskell (or at least its standard compiler) usually doesn't do it for you:

    """
    GHC doesn't actually perform CSE as often as you might expect. The
    trouble is, performing CSE can affect the strictness/laziness of the
    program. So GHC does do CSE, but only in specific circumstances ---
    see the GHC manual. (Section??)

    Long story short: "If you care about CSE, do it by hand."
    """
    https://wiki.haskell.org/GHC_optimisations#Common_subexpression_elimination

Pure functions do, however, help with constant propagation during compilation. A PyPy blog post goes into a bit of detail about how pure functions can be helpful:
https://morepypy.blogspot.com/2011/03/controlling-tracing-of-interpreter-wit...
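In Python, "doing it by hand" is just the familiar hoisting of the repeated call; a trivial sketch with made-up names (`func`, `arg`, `data`):

    # Instead of re-evaluating the pure call on every iteration...
    #   total = sum(func(arg) * x for x in data)
    # ...lift the common subexpression out yourself:
    fa = func(arg)
    total = sum(fa * x for x in data)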

On Tue, 15 May 2018 10:35:58 +1000, Steven D'Aprano wrote:
Does anyone else remember an ancient Python recipe that looked something like what follows?

    SENTINEL = object()

    class Magic:
        def __call__(self, arg=SENTINEL):
            if arg is not SENTINEL:
                self.value = arg
            return self.value

    magic = Magic()

And then you'd use it something like this:

    if magic(g(x)) != 4:
        print("probably not 4: ", magic())
    else:
        print("should be 4: ", magic())

    magic(f(x)) + magic()*2 + magic()**2
    consume_the_value_again(magic())

It's not perfect, but it's not too much worse than some of the other suggestions floating around. Sort of.

Dan

participants (6)

- Chris Angelico
- Dan Sommers
- David Mertz
- Franklin? Lee
- Steven D'Aprano
- Tim Peters