Generalized deferred computation in Python
Here is a very rough draft of an idea I've floated often, but not with much
specification.  Take this as "ideas" with little firm commitment to details
from me.  PRs, or issues, or whatever, can go to
https://github.com/DavidMertz/peps/blob/master/pep-9999.rst as well as
mentioning them in this thread.

PEP: 9999
Title: Generalized deferred computation
Author: David Mertz <dmertz@gnosis.cx>
Discussions-To: https://mail.python.org/archives/list/python-ideas@python.org/thread/
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 21-Jun-2022
Python-Version: 3.12
Post-History:

Abstract
========

This PEP proposes introducing the soft keyword ``later`` to express the
concept of deferred computation.  When an expression is preceded by the
keyword, the expression is not evaluated but instead creates a "thunk" or
"deferred object."  A later reference to the deferred object in program flow
causes the expression to be evaluated at that point, with both the value and
the type of the object becoming the result of the evaluated expression.

Motivation
==========

"Lazy" or "deferred" evaluation is a useful paradigm for expressing
relationships among potentially expensive operations prior to their actual
computation.  Many functional programming languages, such as Haskell, build
laziness into the heart of their language.  Within the Python ecosystem, the
popular scientific library `dask-delayed <dask-delayed>`_ provides a
framework for lazy evaluation that is very similar to the one proposed in
this PEP.

.. _dask-delayed: https://docs.dask.org/en/stable/delayed.html

Examples of Use
===============

Deferred computation is principally useful when computations are likely to
be expensive, but the simple examples shown here do not necessarily involve
such especially costly operations.  Most of them are directly inspired by
examples used in the documentation of dask-delayed.
In dask-delayed, ``Delayed`` objects are created by functions, and
operations on them create a *directed acyclic graph* rather than performing
actual computations.  For example::

    >>> import dask
    >>> @dask.delayed
    ... def later(x):
    ...     return x
    ...
    >>> output = []
    >>> data = [23, 45, 62]
    >>> for x in data:
    ...     x = later(x)
    ...     a = x * 3
    ...     b = 2**x
    ...     c = a + b
    ...     output.append(c)
    ...
    >>> total = sum(output)
    >>> total
    Delayed('add-8f4018dbf2d3c1d8e6349c3e0509d1a0')
    >>> total.compute()
    4611721202807865734
    >>> total.visualize()

.. figure:: pep-9999-dag.png
   :align: center
   :width: 50%
   :class: invert-in-dark-mode

   Figure 1. Dask DAG created from simple operations.

Under this PEP, the soft keyword ``later`` would work in a manner similar to
this dask.delayed code.  But rather than requiring a call to ``.compute()``
on a ``Delayed`` object to arrive at the result of a computation, every
reference to a binding would perform the "compute" *unless* that reference
was itself within a deferred expression.  So the equivalent code under this
PEP would be::

    >>> output = []
    >>> data = [23, 45, 62]
    >>> for later x in data:
    ...     a = later (x * 3)
    ...     b = later (2**x)
    ...     c = later (a + b)
    ...     output.append(later c)
    ...
    >>> total = later sum(output)
    >>> type(total)  # type() does not un-thunk
    <class 'DeferredObject'>
    >>> if value_needed:
    ...     print(total)  # Actual computation occurs here
    4611721202807865734

In the example, we assume that the built-in function ``type()`` is special
in not counting as a reference to the binding for the purpose of realizing a
computation.  Alternately, some new special function like ``isdeferred()``
might be used to check for ``Deferred`` objects.

In general, however, every regular reference to a bound object will force a
computation and re-binding on a ``Deferred``.  This includes access through
simple names, but likewise through instance attributes, index positions in
lists or tuples, or any other means by which an object may be referenced.
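As a rough sketch (not part of the proposal), the intended thunk semantics
can be emulated in today's Python with an ordinary wrapper class.  The
``Thunk`` name and its explicit ``force()`` method are hypothetical and for
illustration only; under the PEP the forcing would be implicit on any
ordinary reference::

```python
class Thunk:
    """Wrap a zero-argument callable; evaluate it at most once, on demand."""
    _UNSET = object()

    def __init__(self, func):
        self._func = func          # holds the deferred expression
        self._value = self._UNSET  # cached result after first evaluation

    def force(self):
        if self._value is self._UNSET:
            self._value = self._func()
        return self._value

# The same computation as the dask-delayed example, built as explicit thunks.
data = [23, 45, 62]
output = []
for x in data:
    a = Thunk(lambda x=x: x * 3)
    b = Thunk(lambda x=x: 2**x)
    c = Thunk(lambda a=a, b=b: a.force() + b.force())
    output.append(c)

total = Thunk(lambda: sum(t.force() for t in output))
print(total.force())  # 4611721202807865734
```

Note the ``x=x`` default-argument idiom to capture each loop value; without
it, every lambda would close over the final value of ``x``.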
Rejected Spellings
==================

A number of alternate spellings for creating a ``Deferred`` object are
possible.  This PEP's author has little preference among them.  The words
``defer`` or ``delay``, or their past participles ``deferred`` and
``delayed``, are commonly used in discussions of lazy evaluation.  All of
these would work equally well as the suggested soft keyword ``later``.  The
keyword ``lazy`` is not completely implausible, but does not seem to read as
well.

No punctuation is immediately obvious for this purpose, although surrounding
expressions with backticks is somewhat suggestive of quoting in Lisp, and
perhaps slightly reminiscent of the ancient use of backticks for ``repr()``
in Python 1.x.  E.g.::

    might_use = `math.gcd(a, math.factorial(b))`

Relationship to PEP-0671
========================

The concept of "late-bound function argument defaults" is introduced in
:pep:`671`.  Under that proposal, a special syntactic marker would be
permitted in function signatures to allow the expressions indicated as
defaults to be evaluated at call time rather than at function definition
time.  In current Python, we might write a toy function such as::

    def func(items=[], n=None):
        if n is None:
            n = len(items)
        items.append("Hello")
        print(n)

    func([1, 2, 3])   # prints: 3

Using the :pep:`671` approach this could be simplified somewhat as::

    def func(items=[], n=>len(items)):  # late-bound defaults act as if bound here
        items.append("Hello")
        print(n)

    func([1, 2, 3])   # prints: 3

Under the current PEP, evaluation of a ``Deferred`` object only occurs upon
reference.
That is, for the current toy function, the evaluation would not occur until
the ``print(n)`` line::

    def func(items=[], n=later len(items)):
        items.append("Hello")
        print(n)

    func([1, 2, 3])   # prints: 4

To completely replicate the behavior of PEP-0671, an extra line at the start
of the function body would be required::

    def func(items=[], n=later len(items)):
        n = n   # Evaluate the Deferred and re-bind the name n
        items.append("Hello")
        print(n)

    func([1, 2, 3])   # prints: 3

References
==========

https://github.com/DavidMertz/peps/blob/master/pep-9999.rst

Copyright
=========

This document is placed in the public domain or under the CC0-1.0-Universal
license, whichever is more permissive.

--
Keeping medicines from the bloodstreams of the sick; food from the bellies
of the hungry; books from the hands of the uneducated; technology from the
underdeveloped; and putting advocates of freedom in prisons.  Intellectual
property is to the 21st century what the slave trade was to the 16th.
On 2022-06-21 13:53, David Mertz, Ph.D. wrote:
In the example, we assume that the built-in function `type()` is special in not counting as a reference to the binding for purpose of realizing a computation. Alternately, some new special function like `isdeferred()` might be used to check for ``Deferred`` objects.
I'll have to ponder my thoughts about the proposal as a whole, but this
particular aspect seems dubious to me.  As I understand it, this would
require some fairly deep changes to how evaluation works in Python.  Right
now, in an expression like `type(blah)`, there isn't any way for the
evaluation of `blah` to depend on the fact that it happens to occur as an
argument to `type`.  The evaluation of the arguments of any function call
happens before the function gets any say in the matter.  For this proposal
to work, that would have to change.  Questions that immediately come to
mind:

1. Would this also apply to the three-argument form of type?  If not, then
   that makes the special-casing even more special.

2. What if I do `my_alias = type` and then do `my_alias(deferred_obj)`?
   Does that still not unwrap the thunk?  What if I instead reassign the
   name `type` by doing `type = some_unrelated_func` and then call
   `type(deferred_obj)`?  Now what happens?

3. Is there any way to write a new Python function that similarly bypasses
   un-thunking?  Are there any other builtins that similarly bypass
   un-thunking?

4. Why is this special-casing needed anyway?  What would/could the
   surrounding code do differently with the deferred object based on
   knowing that it is deferred, since any other operation you performed on
   it would evaluate it anyway?

--
Brendan Barnwell
"Do not follow where the path may lead.  Go, instead, where there is no
path, and leave a trail."
   --author unknown
On Tue, Jun 21, 2022 at 5:53 PM Brendan Barnwell <brenbarn@brenbarn.net> wrote:
In the example, we assume that the built-in function `type()` is special in not counting as a reference to the binding for purpose of realizing a computation. Alternately, some new special function like `isdeferred()` might be used to check for ``Deferred`` objects.
I'll have to ponder my thoughts about the proposal as a whole, but this particular aspect seems dubious to me. As I understand it this would require some fairly deep changes to how evaluation works in Python. Right now in an expression like `type(blah)`, there isn't any way for the evaluation of `blah` to depend on the fact that it happens to occur as an argument to `type`.
I absolutely agree that this is a sore point in my first draft.  I could
shift the magic from `type()` to `isdeferred()`, but that doesn't really
change anything for your examples.  Unless, that is, `isdeferred()` becomes
something other than a real function, more like some sort of macro.  That
doesn't make me happy either.

However, I *would* like to be able to answer the question "Is this object a
DeferredObject?" somehow.  For example, I'd like some way to write code
similar to:

    if isdeferred(expensive_result):
        log.debug("The computationally expensive result is not worth calculating here")
    else:
        log.debug(f"We already hit a path that needed the result, and it is {expensive_result}")

Any thoughts on what might be the least ugly way to get that?
On Tue, Jun 21, 2022 at 4:10 PM David Mertz, Ph.D. <david.mertz@gmail.com> wrote:
On Tue, Jun 21, 2022 at 5:53 PM Brendan Barnwell <brenbarn@brenbarn.net> wrote:
In the example, we assume that the built-in function `type()` is special in not counting as a reference to the binding for purpose of realizing a computation. Alternately, some new special function like `isdeferred()` might be used to check for ``Deferred`` objects.
I'll have to ponder my thoughts about the proposal as a whole, but this particular aspect seems dubious to me. As I understand it this would require some fairly deep changes to how evaluation works in Python. Right now in an expression like `type(blah)`, there isn't any way for the evaluation of `blah` to depend on the fact that it happens to occur as an argument to `type`.
I absolutely agree that this is a sore point in my first draft. I could shift the magic from `type()` to `isdeferred()`, but that doesn't really change anything for your examples. I suppose, that is, unless `isdeferred()` becomes something other than a real function, but more like some sort of macro. That doesn't make me happy either.
However, I *would* like to be able to answer the question "Is this object a DeferredObject?" somehow. For example, I'd like some way to write code similar to:
    if isdeferred(expensive_result):
        log.debug("The computationally expensive result is not worth calculating here")
    else:
        log.debug(f"We already hit a path that needed the result, and it is {expensive_result}")
Any thoughts on what might be the least ugly way to get that?
I think all it really requires is for isdeferred() to be a builtin
implemented in C rather than Python.  It will be much better IMO to have
isdeferred() returning a bool and not have any deferred object/type visible
to Python.

I think this PEP, although perhaps motivated by the PEP 671 discussion, is
really a generalization of PEP 690 (the lazy imports PEP).  The best
implementation would be almost identical (relying on the dictionary
implementation to reify lazy values on access); it's just that lazy objects
would have to be generalized to hold any expression, not only an import.

The updated and more detailed draft of PEP 690 might be interesting reading
here: https://github.com/python/peps/pull/2613

Carl
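As a rough illustration of the introspection being discussed, here is an
emulation with a Python-visible wrapper type.  All names here are
hypothetical: a real isdeferred() would be a C-level builtin precisely so
that no DeferredObject type needs to be exposed, and so the check itself
does not count as a forcing reference.

```python
class DeferredObject:
    """Hypothetical stand-in for an (invisible) deferred object."""
    def __init__(self, func):
        self._func = func  # the wrapped, not-yet-evaluated expression

    def force(self):
        return self._func()

def isdeferred(obj):
    # In the real proposal this test would happen in C, so that merely
    # asking the question cannot trigger evaluation.
    return isinstance(obj, DeferredObject)

expensive_result = DeferredObject(lambda: sum(range(10**6)))
print(isdeferred(expensive_result))  # True
print(isdeferred(42))                # False
print(expensive_result.force())      # 499999500000
```

In this emulation the check is safe only because forcing is explicit; the
hard part the thread identifies is keeping it safe once forcing is implicit.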
On Wed, 22 Jun 2022 at 08:21, Carl Meyer via Python-ideas <python-ideas@python.org> wrote:
On Tue, Jun 21, 2022 at 4:10 PM David Mertz, Ph.D. <david.mertz@gmail.com> wrote:
On Tue, Jun 21, 2022 at 5:53 PM Brendan Barnwell <brenbarn@brenbarn.net> wrote:
In the example, we assume that the built-in function `type()` is special in not counting as a reference to the binding for purpose of realizing a computation. Alternately, some new special function like `isdeferred()` might be used to check for ``Deferred`` objects.
I'll have to ponder my thoughts about the proposal as a whole, but this particular aspect seems dubious to me. As I understand it this would require some fairly deep changes to how evaluation works in Python. Right now in an expression like `type(blah)`, there isn't any way for the evaluation of `blah` to depend on the fact that it happens to occur as an argument to `type`.
I absolutely agree that this is a sore point in my first draft. I could shift the magic from `type()` to `isdeferred()`, but that doesn't really change anything for your examples. I suppose, that is, unless `isdeferred()` becomes something other than a real function, but more like some sort of macro. That doesn't make me happy either.
However, I *would* like to be able to answer the question "Is this object a DeferredObject?" somehow. For example, I'd like some way to write code similar to:
    if isdeferred(expensive_result):
        log.debug("The computationally expensive result is not worth calculating here")
    else:
        log.debug(f"We already hit a path that needed the result, and it is {expensive_result}")
Any thoughts on what might be the least ugly way to get that?
I think all it really requires is for isdeferred() to be a builtin implemented in C rather than Python. It will be much better IMO to have isdeferred() returning a bool and not have any deferred object/type visible to Python.
It would have to be non-assignable and probably a keyword.  Otherwise, the
exact same issues will keep occurring.

What are the scoping rules for deferred objects?  Do they access names
where they are evaluated, or where they are defined?  Consider:

    def f(x):
        spam = 1
        print(x)

    def g():
        spam = 2
        f(later spam)

    g()

Does this print 1 or 2?

This is a fundamental and crucial point, and must be settled early.  You
cannot defer this. :)

ChrisA
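For comparison, the same question posed with a lambda (a sketch, replacing
the hypothetical `later spam` with `lambda: spam`) is one that today's
Python answers with lexical scoping: the deferred expression sees the scope
where it was created, so the result is 2, not 1.

```python
def f(x):
    spam = 1           # never seen by the deferred expression
    return x()         # force the emulated thunk here, inside f

def g():
    spam = 2
    return f(lambda: spam)   # the lambda closes over g's "spam"

print(g())  # 2 -- lexical (definition-scope) lookup
```

Dynamic scoping would instead resolve `spam` at the point of forcing,
inside f, and print 1; that is the choice the thread is debating.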
On Tue, Jun 21, 2022 at 4:28 PM Chris Angelico <rosuav@gmail.com> wrote:
On Wed, 22 Jun 2022 at 08:21, Carl Meyer via Python-ideas <python-ideas@python.org> wrote:
On Tue, Jun 21, 2022 at 4:10 PM David Mertz, Ph.D. <david.mertz@gmail.com> wrote:
On Tue, Jun 21, 2022 at 5:53 PM Brendan Barnwell <brenbarn@brenbarn.net> wrote:
In the example, we assume that the built-in function `type()` is special in not counting as a reference to the binding for purpose of realizing a computation. Alternately, some new special function like `isdeferred()` might be used to check for ``Deferred`` objects.
I'll have to ponder my thoughts about the proposal as a whole, but this particular aspect seems dubious to me. As I understand it this would require some fairly deep changes to how evaluation works in Python. Right now in an expression like `type(blah)`, there isn't any way for the evaluation of `blah` to depend on the fact that it happens to occur as an argument to `type`.
I absolutely agree that this is a sore point in my first draft. I could shift the magic from `type()` to `isdeferred()`, but that doesn't really change anything for your examples. I suppose, that is, unless `isdeferred()` becomes something other than a real function, but more like some sort of macro. That doesn't make me happy either.
However, I *would* like to be able to answer the question "Is this object a DeferredObject?" somehow. For example, I'd like some way to write code similar to:
    if isdeferred(expensive_result):
        log.debug("The computationally expensive result is not worth calculating here")
    else:
        log.debug(f"We already hit a path that needed the result, and it is {expensive_result}")
Any thoughts on what might be the least ugly way to get that?
I think all it really requires is for isdeferred() to be a builtin implemented in C rather than Python. It will be much better IMO to have isdeferred() returning a bool and not have any deferred object/type visible to Python.
It would have to be non-assignable and probably a keyword. Otherwise, the exact same issues will keep occurring.
Mmm, you're right.  It would either need to be handled specially in the
compiler, or it would have to use the PEP 690 approach of
`is_lazy_import(globals(), "name")`, where it takes both the namespace
dictionary and the key as a string, to avoid triggering reification.

What are the scoping rules for deferred objects?  Do they access names
where they are evaluated, or where they are defined? Consider:
    def f(x):
        spam = 1
        print(x)

    def g():
        spam = 2
        f(later spam)

    g()
Does this print 1 or 2?
This is a fundamental and crucial point, and must be settled early. You cannot defer this. :)
Yes, I agree this is a crucial point, and the answer (which is already
somewhat implied by the analogy to lambdas) should be that it closes over
the definition scope (just like defining a lambda or inline function
would).

Carl
On Wed, 22 Jun 2022 at 08:36, Carl Meyer <carl@oddbird.net> wrote:
On Tue, Jun 21, 2022 at 4:28 PM Chris Angelico <rosuav@gmail.com> wrote:
On Wed, 22 Jun 2022 at 08:21, Carl Meyer via Python-ideas <python-ideas@python.org> wrote:
On Tue, Jun 21, 2022 at 4:10 PM David Mertz, Ph.D. <david.mertz@gmail.com> wrote:
On Tue, Jun 21, 2022 at 5:53 PM Brendan Barnwell <brenbarn@brenbarn.net> wrote:
In the example, we assume that the built-in function `type()` is special in not counting as a reference to the binding for purpose of realizing a computation. Alternately, some new special function like `isdeferred()` might be used to check for ``Deferred`` objects.
I'll have to ponder my thoughts about the proposal as a whole, but this particular aspect seems dubious to me. As I understand it this would require some fairly deep changes to how evaluation works in Python. Right now in an expression like `type(blah)`, there isn't any way for the evaluation of `blah` to depend on the fact that it happens to occur as an argument to `type`.
I absolutely agree that this is a sore point in my first draft. I could shift the magic from `type()` to `isdeferred()`, but that doesn't really change anything for your examples. I suppose, that is, unless `isdeferred()` becomes something other than a real function, but more like some sort of macro. That doesn't make me happy either.
However, I *would* like to be able to answer the question "Is this object a DeferredObject?" somehow. For example, I'd like some way to write code similar to:
    if isdeferred(expensive_result):
        log.debug("The computationally expensive result is not worth calculating here")
    else:
        log.debug(f"We already hit a path that needed the result, and it is {expensive_result}")
Any thoughts on what might be the least ugly way to get that?
I think all it really requires is for isdeferred() to be a builtin implemented in C rather than Python. It will be much better IMO to have isdeferred() returning a bool and not have any deferred object/type visible to Python.
It would have to be non-assignable and probably a keyword. Otherwise, the exact same issues will keep occurring.
Mmm, you’re right. It would either need to be handled specially in the compiler, or it would have to use the PEP 690 approach of `is_lazy_import(globals(), “name”)` where it takes both the namespace dictionary and the key as string, to avoid triggering reification.
What are the scoping rules for deferred objects? Do they access names where they are evaluated, or where they are defined? Consider:
def f(x): spam = 1 print(x)
def g(): spam = 2 f(later spam)
g()
Does this print 1 or 2?
This is a fundamental and crucial point, and must be settled early. You cannot defer this. :)
Yes, I agree this is a crucial point, and the answer (which is already
somewhat implied by the analogy to lambdas) should be that it closes over
the definition scope (just like defining a lambda or inline function
would).

Carl
If that's the case, then the example of late-bound argument defaults simply
won't work.

ChrisA
I haven't gotten to writing that into the PEP yet, but I think the rule has
to be to take the scope of evaluation, not the scope of definition.  I know
it needs to be there, and I'm just thinking about helpful examples.

Which is to say, the semantics are more like `eval()` than like a lambda
closure.

... and I know this is going to raise the neck hairs of many folks, because
effectively I'm proposing a kind of dynamic scoping.  Possibly in my
defense, I think Carl's PEP 690 can do the same thing. :-)

On Tue, Jun 21, 2022, 6:27 PM Chris Angelico
What are the scoping rules for deferred objects? Do they access names where they are evaluated, or where they are defined?
On Tue, Jun 21, 2022 at 4:55 PM David Mertz, Ph.D. <david.mertz@gmail.com> wrote:
I haven't gotten to writing that into the PEP yet, but I think the rule has to be to take the scope of evaluation not the scope of definition. I know it needs to be there, and I'm just thinking about helpful examples.
Which is to say the semantics are more like `eval()` than like a lambda closure.
... and I know this is going to raise the neck hairs of many folks, because effectively I'm proposing a kind of dynamic scoping.
I'll be curious to see the rationale for this in the PEP, because it seems
like clearly the wrong choice to me.  It turns the feature from "delayed
computation" into something much stranger that I doubt I would support.  If
the motivation is "so it can be used in argument default values and subsume
PEP 671," I think that's a bad reason.  IMO, putting complex default values
that would require delayed evaluation into the function signature line is
net harmful (because of the pass-through problem), so I oppose both PEP 671
and the use of deferred evaluation for that purpose.

Possibly in my defense, I think Carl's PEP 690 can do the same thing. :-)
I’m not sure what you mean. I don’t think there’s any way PEP 690 can introduce dynamic scoping like this. Can you give an example? Carl
Thanks Carl and Chris.  After reading your comments, and thinking some more
about it, I agree you are both correct that it only makes sense to have a
DeferredObject act like a closure in the scope of its creation.  That
really should suffice for anything I could sensibly want; it's enough for
Dask, and it's enough for functional programming languages with laziness.

Moreover, the limited laziness that Python currently has also follows that
rule.  I also haven't gotten to writing that needed section of the PEP, but
I need to address generators and boolean short-circuiting as limited kinds
of laziness.  Obviously, my PEP is much broader than those, but generators
also use lexical scope, not dynamic scope.

Possibly in my defense, I think Carl's PEP 690 can do the same thing. :-)
I’m not sure what you mean. I don’t think there’s any way PEP 690 can introduce dynamic scoping like this. Can you give an example?
Let me just withdraw that.  I think I might be able to construct something
where a module's top-level code does something perverse with the call stack
where it winds up getting evaluated, and that can act like dynamic scoping.
I might be wrong about that; but even if it's technically true, it would
need very abnormal abuse of the language (as bad as the lovely
zero-argument super() does :-)).

I don't think this actually has much effect on encompassing late binding of
default arguments, other than needing to say the arguments contribute to
the scope on a left-to-right basis.  But actually, despite writing the
section because of recent discussion, PEP-671 has little to do with why I
want generalized deferreds.  I've suggested them on a number of occasions
long before PEP-671 existed (but admittedly always as a passing thought,
not as a detailed proposal... which my draft still is not either).

On one small point Chris mentions, I realized before he commented that
there was no need to actually rebind the deferred argument to force
evaluation of its default.  Simply mentioning it forces the evaluation.  I
found something similar in PEP-690 that reminded me that doing this is
plenty:

    def func(items=[], n=later len(items)):
        n   # Evaluate the Deferred
        items.append("Hello")
        print(n)

Of course, that's an extra first line of the function for the motivating
case of emulating the sentinel pattern.  But it also lets us *decide* when
each deferred gets "fixed in place."  I.e., maybe a, b, and c are all
`later` arguments.  We could make the first line `a, b` but only reference
`c` at some later point where it was appropriate to the logic we wanted.
On Wed, 22 Jun 2022 at 11:59, David Mertz, Ph.D. <david.mertz@gmail.com> wrote:
Thanks Carl and Chris. After reading your comments, and thinking some more about it, I agree you are both correct that it only makes sense to have a DeferredObject act like a closure in the scope of its creation. That really should suffice for anything I could sensibly want; it's enough for Dask, and it's enough for functional programming languages with laziness.
Moreover, the limited laziness that Python currently has also follows that rule. I also haven't gotten to writing that needed section of the PEP, but I need to address generators and boolean shortcutting as limited kinds of laziness. Obviously, my PEP is much broader than those, but generators also use lexical scope not dynamic scope.
Possibly in my defense, I think Carl's PEP 690 can do the same thing. :-)
I’m not sure what you mean. I don’t think there’s any way PEP 690 can introduce dynamic scoping like this. Can you give an example?
Let me just withdraw that. I think I might be able to construct something where a module's top-level code does something perverse with the call stack where it winds up getting evaluated that can act like dynamic scoping. I might be wrong about that; but even if it's technically true, it would need very abnormal abuse of the language (as bad as the lovely zero-argument super() does :-)).
So then a deferred object is basically a lambda function that calls itself when referenced. That's reasonable, and broadly sane...
I don't think this actually has much effect on encompassing late-binding of default arguments, other than needing to say the arguments contribute to the scope on a left-to-right basis. But actually, despite writing the section because of recent discussion, PEP-671 has little to do with why I want generalized deferreds. I've suggested it on a number of occasions long before PEP-671 existed (but admittedly always as a passing thought, not as a detailed proposal... which my draft still is not either).
On one small point Chris mentions, I realized before he commented that there was no need to actually rebind the deferred argument to force evaluation of its default. Simply mentioning it would force the evaluation. I found something similar in PEP-690 that reminded me that doing this is plenty:
    def func(items=[], n=later len(items)):
        n   # Evaluate the Deferred
        items.append("Hello")
        print(n)
Of course, that's an extra first line of the function for the motivating case of emulating the sentinel pattern. But it also does let us *decide* when each deferred gets "fixed in place". I.e. maybe a, b, and c are all `later` arguments. We could make the first line `a, b` but only reference `c` at some later point where it was appropriate to the logic we wanted.
... but it doesn't work for this case.  If a deferred is defined by its
creation context, then the names 'len' and 'items' will be looked up in the
surrounding scope, NOT the function's.  It would be like this:

    def func(items=[], n=lambda: len(items)):
        n = n()

which, in turn, is similar to this:

    _default = lambda: len(items)

    def func(items=[], n=None):
        if n is None:
            n = _default
        n = n()

and it's fairly clear that this has the wrong scope.  That's why PEP 671
has the distinction that the late-bound default is scoped within the
function body, not its definition.  Otherwise, this doesn't work.

ChrisA
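This point can be checked in current Python with a toy emulation (the
lambda stands in for the hypothetical `later` expression): a lambda written
in the signature is created in the enclosing module scope, so its lookup of
`items` never sees the parameter and fails unless a global `items` happens
to exist.

```python
# "items" inside the lambda is a *global* lookup: the default expression is
# evaluated (and the closure created) at "def" time, in module scope, not
# inside the function body where the parameter "items" lives.
def func(items=[], n=lambda: len(items)):
    if callable(n):
        n = n()          # force the emulated deferred
    items.append("Hello")
    return n

try:
    func([1, 2, 3])
except NameError as exc:
    print("wrong scope:", exc)   # name 'items' is not defined
```

That NameError is exactly the "wrong scope" Chris describes, and why
definition-scope deferreds cannot subsume late-bound defaults as-is.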
On Wed, 22 Jun 2022 at 08:54, David Mertz, Ph.D. <david.mertz@gmail.com> wrote:
I haven't gotten to writing that into the PEP yet, but I think the rule has to be to take the scope of evaluation not the scope of definition. I know it needs to be there, and I'm just thinking about helpful examples.
Which is to say the semantics are more like `eval()` than like a lambda closure.
... and I know this is going to raise the neck hairs of many folks, because effectively I'm proposing a kind of dynamic scoping. Possibly in my defense, I think Carl's PEP 690 can do the same thing. :-)
Okay.  So, in order to make your generic deferreds do the same job as
late-bound argument defaults, you need:

1) All deferreds to be evaluated in the scope of evaluation, not the scope
   of definition.  This goes completely against all other Python scoping
   rules.

2) Deferreds used in argument defaults to be re-deferred every time the
   function is called, to avoid coalescing to the exact same object every
   time.

3) And then you still have to do "n=n" at the start of the function to
   un-defer it.

Point #1 is majorly problematic, because *every function* would have to be
deoptimized the way that a star import deoptimizes a Python 2 function.
Consider:

    def f(x):
        from sys import *
        print(x)
        x += y

This doesn't work in Python 3, but in Python 2, this deoptimizes the
function and stops it from knowing what *any* name means.  In fact, in a
closure, this is actually disallowed, because it's impossible to know what
variables from the outer function would need to be retained in a closure.
Making deferreds work the way you say would have this effect on *every
function*.  Consider:

    y = 1

    def f(x):
        print("x is", x)
        print("y is", y)

    def g():
        f(later y:=2)

What is the scope of y when referred to in x?  If a deferred object
retains the scope of its definition, then f is a simple function that
behaves sanely (it's possible for arbitrary code to be executed when you
refer to a simple name, but that's basically just extending the concept of
@property to all namespaces); y will be the global, and upon evaluating x,
the local inside g would be reassigned.  But if a deferred object is
evaluated in the scope where it's referenced, f has to reassign y.  That
means it has to be compiled to be compatible with y being reassigned,
simply because x could be a deferred object.  EVERY function has to assume
that EVERY name could be rebound in this way.  It will become impossible to
statically analyze anything, even to know what type of name something is.
Is that what you actually want?
Point #2 is also problematic.  In every other way, an early-bound default
is evaluated at definition time.  Deferred objects should be evaluated only
once, which means that this will print only once:

    x = later print("Hello")
    x
    x

But that means that, unless special magic is done, this will generate only
a single list:

    def f(x=later []):
        x.append(1)
        return x

The deferred object will be undeferred into a single list, and every
subsequent call will reuse that same object.  So you'd need some way to
signal to the compiler that you want this particular deferred object to be
redeferred every call, but undeferred within it.  In fact, what you really
want is for the deferred object to be created as the function begins, NOT
as the function is defined.

In other words, to get PEP 671 semantics with deferred objects, you
basically need PEP 671 semantics, plus the unnecessary overhead of
packaging the default up and then requiring that the programmer force
evaluation at the start of the function.

ChrisA
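The "evaluated only once" behavior can be demonstrated in current Python
with a memoizing wrapper (the `Once` class and its `force()` method are
hypothetical stand-ins for the proposed implicit semantics): the wrapped
expression's side effect fires on the first forced access only.

```python
class Once:
    """Evaluate a zero-argument callable at most once; cache the result."""
    _UNSET = object()

    def __init__(self, func):
        self._func = func
        self._value = self._UNSET

    def force(self):
        if self._value is self._UNSET:
            self._value = self._func()
        return self._value

effects = []
# The wrapped expression appends to `effects` as its side effect, then
# yields 42 as its value.
x = Once(lambda: (effects.append("Hello"), 42)[1])
x.force()
x.force()
print(effects)     # ['Hello'] -- the expression ran once, not twice
print(x.force())   # 42 -- subsequent forces return the cached value
```

This is exactly why a deferred created at definition time would yield one
shared list across calls, rather than a fresh list per call.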
Hi David, I read the PEP and I think it would be useful to expand the Motivation and Examples sections.

While indeed Dask uses lazy evaluation to build a complex computation without executing it, I don't think that is the whole story. Dask takes this deferred complex computation, *plans* how to execute it, and then *executes* it in non-obvious/indirect ways.

For example, the min() of a dataframe can be computed by taking the min() of each partition of the dataframe and then the min() of those. Here is where the plan and the execution stages come into play. All of this is hidden from the developer. From his/her perspective the min() is called once over the whole dataframe. Dask's deferred computations are "useless" without the planning/execution stages. PySpark, like Dask, does exactly the same.

But what about Django's ORM? Indeed Django allows you to build a SQL query without executing it. You can then perform more subqueries, joins and group-bys without executing them. Only when you need the real data is the query executed. This is another example of deferred execution similar to Dask/PySpark; however, when we consider the planning/execution stages, the similarities end there. Django's ORM writes a SQL query and sends it to a SQL database.

Another example of deferred execution would be my library to interact with web pages programmatically: selectq. Very much like an ORM, you can select elements from a web page and perform subselections and unions without really interacting with the web page. Only when you want to get the data from the page are the deferred computations executed, and like an ORM, the plan done by selectq is to build a single xpath and then execute it using Selenium.

So... Three cases: Dask/PySpark, Django's ORM and selectq. All of them implement deferred expressions but all of them "compute" them in very specific ways (aka, they plan and execute the computation differently).
Would those libs (and probably others) benefit from the PEP? How?

Thanks, Martin.

On Tue, Jun 21, 2022 at 04:53:44PM -0400, David Mertz, Ph.D. wrote:
Here is a very rough draft of an idea I've floated often, but not with much specification. Take this as "ideas" with little firm commitment to details from me. PRs, or issues, or whatever, can go to https://github.com/DavidMertz/peps/blob/master/pep-9999.rst as well as mentioning them in this thread.
PEP: 9999
Title: Generalized deferred computation
Author: David Mertz <dmertz@gnosis.cx>
Discussions-To: https://mail.python.org/archives/list/python-ideas@python.org/thread/
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 21-Jun-2022
Python-Version: 3.12
Post-History:
Abstract
========
This PEP proposes introducing the soft keyword ``later`` to express the concept of deferred computation. When an expression is preceded by the keyword, the expression is not evaluated but rather creates a "thunk" or "deferred object." Reference to the deferred object later in program flow causes the expression to be executed at that point, and for both the value and type of the object to become the result of the evaluated expression.
Motivation
==========
"Lazy" or "deferred" evaluation is a useful paradigm for expressing relationships among potentially expensive operations prior to their actual computation. Many functional programming languages, such as Haskell, build laziness into the heart of their language. Within the Python ecosystem, the popular scientific library `dask-delayed <dask-delayed>`_ provides a framework for lazy evaluation that is very similar to that proposed in this PEP.
.. _dask-delayed: https://docs.dask.org/en/stable/delayed.html
Examples of Use
===============
While deferred computation is principally useful when computations are likely to be expensive, the simple examples shown do not necessarily use especially expensive computations. Most of these are directly inspired by examples used in the documentation of dask-delayed.
In dask-delayed, ``Delayed`` objects are created by functions, and operations create a *directed acyclic graph* rather than performing actual computations. For example::
    >>> import dask
    >>> @dask.delayed
    ... def later(x):
    ...     return x
    ...
    >>> output = []
    >>> data = [23, 45, 62]
    >>> for x in data:
    ...     x = later(x)
    ...     a = x * 3
    ...     b = 2**x
    ...     c = a + b
    ...     output.append(c)
    ...
    >>> total = sum(output)
    >>> total
    Delayed('add-8f4018dbf2d3c1d8e6349c3e0509d1a0')
    >>> total.compute()
    4611721202807865734
    >>> total.visualize()
.. figure:: pep-9999-dag.png
   :align: center
   :width: 50%
   :class: invert-in-dark-mode
Figure 1. Dask DAG created from simple operations.
Under this PEP, the soft keyword ``later`` would work in a similar manner to this dask.delayed code. But rather than requiring calling ``.compute()`` on a ``Delayed`` object to arrive at the result of a computation, every reference to a binding would perform the "compute" *unless* it was itself a deferred expression. So the equivalent code under this PEP would be::
    >>> output = []
    >>> data = [23, 45, 62]
    >>> for later x in data:
    ...     a = later (x * 3)
    ...     b = later (2**x)
    ...     c = later (a + b)
    ...     output.append(later c)
    ...
    >>> total = later sum(output)
    >>> type(total)  # type() does not un-thunk
    <class 'DeferredObject'>
    >>> if value_needed:
    ...     print(total)  # Actual computation occurs here
    4611721202807865734
In the example, we assume that the built-in function ``type()`` is special in not counting as a reference to the binding for the purpose of realizing a computation. Alternately, some new special function like ``isdeferred()`` might be used to check for ``Deferred`` objects.
In general, however, every regular reference to a bound object will force a computation and re-binding on a ``Deferred``. This includes access to simple names, but likewise instance attributes, index positions in lists or tuples, or any other means by which an object may be referenced.
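Part of this "every reference forces computation" behavior can be approximated today with a proxy object that evaluates on first attribute or index access. A sketch only (the `DeferredProxy` name and design are illustrative, not part of the proposal), and a userspace proxy cannot intercept a bare name reference, which is why the PEP needs interpreter support:

```python
class DeferredProxy:
    # Userspace approximation: intercepts attribute and index access,
    # but cannot intercept a plain name lookup like "x".
    def __init__(self, func):
        self._func = func
        self._forced = False

    def _force(self):
        if not self._forced:           # evaluate on first real use
            self._value = self._func()
            self._forced = True
        return self._value

    def __getattr__(self, name):
        # only called for attributes not found on the proxy itself
        return getattr(self._force(), name)

    def __getitem__(self, key):
        return self._force()[key]

d = DeferredProxy(lambda: [10, 20, 30])
first = d[0]        # first reference: the list is built here
n = d.count(20)     # delegates to the already-realized list
```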
Rejected Spellings
==================
A number of alternate spellings for creating a ``Deferred`` object are possible. This PEP-author has little preference among them. The words ``defer`` or ``delay``, or their past participles ``deferred`` and ``delayed`` are commonly used in discussions of lazy evaluation. All of these would work equally well as the suggested soft keyword ``later``. The keyword ``lazy`` is not completely implausible, but does not seem to read as well.
No punctuation is immediately obvious for this purpose, although surrounding expressions with backticks is somewhat suggestive of quoting in Lisp, and perhaps slightly reminiscent of the ancient use of backtick for shell commands in Python 1.x. E.g.::
might_use = `math.gcd(a, math.factorial(b))`
Relationship to PEP-0671
========================
The concept of "late-bound function argument defaults" is introduced in :pep:`671`. Under that proposal, a special syntactic marker would be permitted in function signatures with default arguments to allow the expressions indicated as defaults to be evaluated at call time rather than at definition time. In current Python, we might write a toy function such as::
    def func(items=[], n=None):
        if n is None:
            n = len(items)
        items.append("Hello")
        print(n)
func([1, 2, 3]) # prints: 3
Using the :pep:`671` approach this could be simplified somewhat as::
    def func(items=[], n=>len(items)):  # late-bound defaults act as if bound here
        items.append("Hello")
        print(n)
func([1, 2, 3]) # prints: 3
Under the current PEP, evaluation of a ``Deferred`` object only occurs upon reference. That is, for the current toy function, the evaluation would not occur until the ``print(n)`` line::
    def func(items=[], n=later len(items)):
        items.append("Hello")
        print(n)
func([1, 2, 3]) # prints: 4
To completely replicate the behavior of PEP-0671, an extra line at the start of the function body would be required::
    def func(items=[], n=later len(items)):
        n = n  # Evaluate the Deferred and re-bind the name n
        items.append("Hello")
        print(n)
func([1, 2, 3]) # prints: 3
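The timing difference between the two variants above can be reproduced today with a zero-argument lambda standing in for the thunk. A sketch under that assumption only; forcing "at the reference point" is spelled here as an explicit call:

```python
# Emulating the "later" default: evaluation happens at the point of
# *use*, after the append, so the result differs from PEP 671's
# call-time behavior (which would see the list before the append).
def func(items=None, n=None):
    if items is None:
        items = []
    if n is None:
        n = lambda: len(items)        # stands in for "later len(items)"
    items.append("Hello")
    return n() if callable(n) else n  # forcing at the reference point

result = func([1, 2, 3])
# result == 4, like "later"; the PEP 671 version would give 3.
```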
References
==========
https://github.com/DavidMertz/peps/blob/master/pep-9999.rst
Copyright
=========
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
_______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-leave@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/DQTR3C... Code of Conduct: http://python.org/psf/codeofconduct/
Hi Martin,

Short answer: yes, I agree. Slightly longer: I would be eternally grateful if you wish to contribute to the PEP with any such expansion of the Motivation and Examples.

On Wed, Jun 22, 2022 at 10:47 AM Martin Di Paola <martinp.dipaola@gmail.com> wrote:
On Wed, 22 Jun 2022 at 18:35, David Mertz, Ph.D. <david.mertz@gmail.com> wrote:
Hi Martin,
Short answer: yes, I agree. Slightly longer: I would be eternally grateful if you wish to contribute to the PEP with any such expansion of the Motivation and Expansion.
One concern I have, triggered by Martin's Dask, PySpark and Django examples, is that we've seen proposals in the past for "deferred expression" objects that capture an unevaluated expression, and make its AST available for user code to manipulate. The three examples here could all use such a feature, as could other ORMs (and I'm sure there are other use cases). This is in contrast to your proposal, which doesn't seem to help those use cases (if it does, I'd like to understand how).

The key distinction seems to be that with your proposal, evaluation is "on reference" and unavoidable, whereas in the other proposals I've seen, evaluation happens on demand (and as a result, it's also possible to work with the expression AST *before* evaluation). My concern is that we're unlikely to be able to justify *two* forms of "deferred expression" construct in Python, and your proposal, by requiring transparent evaluation on reference, would preclude any processing (such as optimisation, name injection, or other forms of AST manipulation) of the expression before evaluation.

I suspect that you consider evaluation-on-reference as an important feature of your proposal, but could you consider explicit evaluation as an alternative? Or at the very least address in the PEP the fact that this would close the door on future explicit evaluation models?

Paul
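The ORM-style "explicit evaluation" model described here can be sketched with operator overloading: the operators build an inspectable expression tree, and nothing runs until evaluation is requested. All names below are illustrative, not any library's real API:

```python
# Sketch: operator overloading builds an AST; evaluation is explicit
# and on demand, so the tree can be inspected or rewritten first.
class Expr:
    def __add__(self, other):
        return BinOp("+", self, ensure_expr(other))
    def __mul__(self, other):
        return BinOp("*", self, ensure_expr(other))

class Name(Expr):
    def __init__(self, name):
        self.name = name
    def evaluate(self, env):
        return env[self.name]

class Const(Expr):
    def __init__(self, value):
        self.value = value
    def evaluate(self, env):
        return self.value

class BinOp(Expr):
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right
    def evaluate(self, env):
        l = self.left.evaluate(env)
        r = self.right.evaluate(env)
        return l + r if self.op == "+" else l * r

def ensure_expr(x):
    return x if isinstance(x, Expr) else Const(x)

# Building the tree evaluates nothing; it is plain data until asked.
tree = Name("x") * 3 + Name("y")
result = tree.evaluate({"x": 5, "y": 2})   # only now does work happen
```

Evaluation-on-reference would force `tree` to a value the moment it was mentioned, leaving no window in which to manipulate the AST.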
On Jun 22, 2022, at 2:12 PM, Paul Moore <p.f.moore@gmail.com> wrote:
Every time I’ve looked at this, I come back to: other than the clunky syntax, how is explicit evaluation different from a zero-argument lambda? Eric
On Wed, Jun 22, 2022 at 2:17 PM Eric V. Smith <eric@trueblade.com> wrote:
Every time I’ve looked at this, I come back to: other than the clunky syntax, how is explicit evaluation different from a zero-argument lambda?
The difference is in composition of operations. I can write a dozen zero-argument lambdas easily enough. But those are all isolated.

I've enhanced the PEP, so maybe look at the link for some of my updates; but I need to add a bunch more, so I don't want to repost each small draft change.

But basically, think about `x = (later expensive1() + later expensive2()) / later expensive3()`. How can we make `x` itself be a zero-argument lambda? And not just with those exact operations on the Deferreds, but arbitrary combinations of Deferreds to create more complex Deferreds, without performing the intermediate computations?
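For reference, here is what that composition looks like with plain closures today: combining thunks lazily means wrapping them in yet another zero-argument lambda, which defers everything but is exactly the manual bookkeeping being objected to. A sketch with dummy `expensive*` functions that merely record when they run:

```python
# Composing thunks with closures: building the composite runs nothing;
# calling the outer thunk runs all three parts in one go.
calls = []

def expensive1():
    calls.append(1)
    return 10

def expensive2():
    calls.append(2)
    return 20

def expensive3():
    calls.append(3)
    return 5

t1 = lambda: expensive1()
t2 = lambda: expensive2()
t3 = lambda: expensive3()

x = lambda: (t1() + t2()) / t3()   # the composite is itself a thunk
assert calls == []                 # nothing has been evaluated yet

value = x()                        # all three run here
```

This works, but every composition point must be wrapped by hand; the proposal's `later` would make the wrapping implicit.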
On Wed, Jun 22, 2022 at 02:30:14PM -0400, David Mertz, Ph.D. wrote:
The difference is in composition of operations. I can write a dozen zero-argument lambdas easily enough. But those are all isolated. But basically, think about `x = (later expensive1() + later expensive2()) / later expensive3()`. How can we make `x` itself be a zero argument lambda? [... see below ...]
The following three constructions are roughly equivalent:

    # Using proposed PEP
    x = (later expensive1() + later expensive2()) / later expensive3()
    x.compute()

    # Using zero-argument nested lambdas
    x = lambda: ((lambda: expensive1())() + (lambda: expensive2())()) / (lambda: expensive3())()
    x()  # compute

    # Using the good old function
    def x():
        return (expensive1() + expensive2()) / expensive3()
    x()  # compute
[... cont ...] And not just with those exact operations on the Deferreds, but arbitrary combinations of Deferreds to create more complex Deferreds, without performing the intermediate computations?
Perhaps this is the real focus to analyze. The PEP suggests that x.compute() does something *fundamentally* different from just executing the intermediate expressions.

The *key* difference between the PEP proposal and the use of zero-argument lambdas or good old functions is that no intermediate expression is computed (expensive1, expensive2 and expensive3 separately); rather, the whole combined expression is computed by *xxx*, allowing it to perform *yyy*.

Having said that, I want to remark that the "xxx" and "yyy" above are missing from the PEP. The PEP does not explain who executes these deferred computations, nor how (the xxx and the yyy).

As I mentioned in my previous email, the PEP gives an example of how Dask uses deferred expressions to build and distribute complex arithmetic/statistics operations over a partitioned dataframe, but this example cannot be used to assume that Python will do it in the same way.

In my previous email I mentioned other libs that implement deferred expressions and "should" benefit from the PEP, however I cannot see how. Perhaps a concrete example borrowed from selectq could spot the missing pieces (xxx and yyy):

    # from selectq
    red_divs = sQ.select("div", class_="red")    # this is deferred
    blue_divs = sQ.select("div", class_="blue")  # this is deferred
    both = red_divs | blue_divs                  # this is deferred
    count = both.count()  # the count() forces the real computation

Now, I imagine rewriting it using the PEP but....

    # using PEP
    red_divs = later sQ.select("div", class_="red")
    blue_divs = later sQ.select("div", class_="blue")
    both = later (red_divs | blue_divs)
    count = later both.count()
    count = count.compute()

...but how does the Python VM know what "compute()" means or how to do it? (the yyy). Is it the Python VM that should do it? (the xxx)

I would assume that it is not the Python VM but the library...

- how will the library know what compute() means?
- how will it know about the intermediate red_divs and blue_divs?
- what benefits would this PEP bring to libs like PySpark, selectq and Django's ORM (to mention a few)?

So far the PEP *does not* explain that; it only talks about how to delay plain Python code. In this sense, then, I would *agree* with Eric Smith. David may have a different intention, but in its current form the PEP is equivalent to zero-argument lambdas.

Thanks, Martin.
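One way to make the missing "xxx/yyy" concrete would be a dunder protocol through which the defining library, not the VM, supplies the execution strategy. This is purely hypothetical (neither `__compute__` nor `force` appears in the PEP); it only illustrates where the seam between interpreter and library could sit:

```python
# Hypothetical protocol sketch: the VM would only know to call
# __compute__ on a deferred when it is referenced; *how* to execute
# (plan a DAG, build an xpath, emit SQL) stays inside the library.
class LibraryDeferred:
    def __init__(self, plan):
        self.plan = plan              # library-specific description

    def __compute__(self):
        # e.g. selectq would run its xpath here; Dask its task graph
        return f"executed: {self.plan}"

def force(obj):
    # stand-in for what the interpreter would do on reference
    compute = getattr(type(obj), "__compute__", None)
    return compute(obj) if compute is not None else obj

out = force(LibraryDeferred("count div.red | div.blue"))
passthrough = force(42)   # non-deferred values are untouched
```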
On Wed, 22 Jun 2022 at 22:52, Martin Di Paola <martinp.dipaola@gmail.com> wrote:
Perhaps this is the real focus to analyze. The PEP suggests that x.compute() does something *fundamentally* different from just executing the intermediate expressions.
Hang on, did the PEP change? The version I saw didn't have a compute() method, deferred objects were just evaluated when they were referenced.

There's a *huge* difference (in my opinion) between auto-executing deferred expressions, and a syntax for creating *objects* that can be asked to calculate their value. And yes, the latter is extremely close to being nothing more than "a shorter and more composable form of zero-arg lambda", so it needs to be justifiable in comparison to zero-arg lambda (which is why I'm more interested in the composability aspect, building an AST by combining delayed expressions into larger ones).

Paul
On Wed, Jun 22, 2022 at 11:22:05PM +0100, Paul Moore wrote:
Hang on, did the PEP change? The version I saw didn't have a compute() method, deferred objects were just evaluated when they were referenced.
You are right, the PEP does not mention a compute() method, nor does it use that term. I just used it to make explicit when the evaluation takes place in the examples that I gave. My bad.
There's a *huge* difference (in my opinion) between auto-executing deferred expressions, and a syntax for creating *objects* that can be asked to calculate their value. And yes, the latter is extremely close to being nothing more than "a shorter and more composable form of zero-arg lambda", so it needs to be justifiable in comparison to zero-arg lambda (which is why I'm more interested in the composability aspect, building an AST by combining delayed expressions into larger ones).
Agree, the *huge* difference is what I tried to highlight, because it is there where I see holes in the PEP. Building an AST as you mentioned could fill one of those holes, but how the expressions are iterated and evaluated is still missing. Of course, the exact details will depend on the library that theoretically could use deferred expressions (like PySpark), but still I see non-trivial details to fill:

- what would be the API for the objects of the AST that represents the deferred expression(s)?
- how would the "evaluator" of the expressions iterate over them? Will the "evaluator" have to check that every one of the expressions is meaningful for it?
- does the AST simplify the implementation of existing libs implementing deferred methods?
- who is the "evaluator" in the case of expressions that don't share a common "implementation"?

Allow me to expand on the last item:

    # some Dask code
    df = later dask_df.filter(...)
    s = later df.sum()

    # some selectq code
    d = later sQ.select("div")
    c = later d.count()

    # now, mix and compute!
    (s + c).compute()

I can see how the deferred expressions are linked and how the AST is built, but "who" knows how to execute it... I'm not sure. Will it be Dask, which knows how to plan, optimize and execute the sum() over the partitions of the dataframe, or will it be selectq, which knows how to build an xpath and talk with Selenium? Maybe the Python VM? Maybe all three?

I know that those questions have an answer, but I still feel that there are more unknowns (especially of why the PEP would be useful for some lib).

Thanks, Martin.
On Wed, Jun 22, 2022 at 02:30:14PM -0400, David Mertz, Ph.D. wrote:
But basically, think about `x = (later expensive1() + later expensive2()) / later expensive3()`. How can we make `x` itself be a zero argument lambda? [... see below ...]
    x = lambda: (expensive1() + expensive2()) / expensive3()

What am I missing? I don't understand what you meant by saying that zero-argument lambdas are "isolated". It sounds like a way that you happen to think about lambdas, and not an objective property of lambdas.

This proposal is like lambda on the definition end, but on the invocation end the call happens implicitly. In effect you have to explicitly mark everywhere that you *don't* want it to be called instead of everywhere that you *do* want it to be called. It isn't clear to me that that's better, much less enough better to justify changing the semantics of what I suppose is the single most common operation in Python ("getting a value").

On Wed, Jun 22, 2022 at 2:53 PM Martin Di Paola <martinp.dipaola@gmail.com> wrote:
# Using zero-argument nested lambdas
x = lambda: ((lambda: expensive1())() + (lambda: expensive2())()) / (lambda: expensive3())()
Why not just expensive1() instead of (lambda: expensive1())()?
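The exchange above can be checked directly with stub functions (the `expensive*` bodies below are placeholders): the nested zero-argument lambdas add nothing, and either spelling defers all work until the one explicit call.

```python
# Stub "expensive" functions that record when they actually run.
calls = []

def expensive1():
    calls.append(1)
    return 2

def expensive2():
    calls.append(2)
    return 3

def expensive3():
    calls.append(3)
    return 5

x = lambda: (expensive1() + expensive2()) / expensive3()
y = lambda: ((lambda: expensive1())() + (lambda: expensive2())()) / (lambda: expensive3())()

assert calls == []           # defining either lambda runs nothing
assert x() == y() == 1.0     # evaluation happens only at the explicit () call
```

The visible difference under this PEP would be that the final `()` disappears: any plain reference would stand in for the call.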
On 22 Jun 2022, at 19:09, Paul Moore <p.f.moore@gmail.com> wrote:
I suspect that you consider evaluation-on-reference as an important feature of your proposal, but could you consider explicit evaluation as an alternative? Or at the very least address in the PEP the fact that this would close the door on future explicit evaluation models?
I can think of ways to implement evaluation-on-reference, but they all have the effect of making python slower. The simple a = b will need to slow down so that the object in b can be checked to see if it needs evaluating. How will you avoid making python slower with this feature? Barry
Barry Scott writes:
I can think of ways to implement evaluation-on-reference, but they all have the effect of making python slower.
Probably.
The simple
a = b
will need to slow down so that the object in b can be checked to see if it needs evaluating.
No, it doesn't. Binding a name is special in many ways, why not this one too? Or "a = a" could be the idiom for "resolve a deferred now", which would require the check for __evaluate_me_now__ as you say. But such simple "a = b" assignments are not so common that they would be a major slowdown. I would think the real problem would be the "oops" of doing "a = b" and evaluating a deferred you don't want to evaluate. But this isn't a completely new problem, it's similar to a = b = [] and expecting a is not b. Now consider a = b + 0. b.__add__ will be invoked in the usual way. Only if b is a deferred will evaluation take place. So I don't really see the rest of Python slowing down much.
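Stephen's "a = b + 0" point can be sketched with a minimal hand-rolled deferred (a hypothetical stand-in, not the PEP's object): the evaluation check lives inside the deferred's own dunder methods, so ordinary values never pay for it.

```python
# Minimal sketch of a deferred whose cost is confined to its own dunders.
class Deferred:
    def __init__(self, fn):
        self._fn = fn
        self._done = False
        self._value = None

    def _force(self):
        # evaluate once, cache the result
        if not self._done:
            self._value, self._done = self._fn(), True
        return self._value

    def __add__(self, other):
        return self._force() + other     # evaluation takes place here

    def __radd__(self, other):
        return other + self._force()

b = Deferred(lambda: 41)
a = b + 0          # Deferred.__add__ is invoked "in the usual way"
assert a == 41
assert 2 + 2 == 4  # plain ints never touch any deferred-checking code path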
On 23 Jun 2022, at 08:27, Stephen J. Turnbull <stephenjturnbull@gmail.com> wrote:
Barry Scott writes:
I can think of ways to implement evaluation-on-reference, but they all have the effect of making python slower.
Probably.
The simple
a = b
will need to slow down so that the object in b can checked to see if it need evaluating.
No, it doesn't. Binding a name is special in many ways, why not this one too? Or "a = a" could be the idiom for "resolve a deferred now", which would require the check for __evaluate_me_now__ as you say. But such simple "a = b" assignments are not so common that they would be a major slowdown. I would think the real problem would be the "oops" of doing "a = b" and evaluating a deferred you don't want to evaluate. But this isn't a completely new problem, it's similar to a = b = [] and expecting a is not b.
Interesting idea that a reference does not auto-evaluate in all cases. I was wondering about what the compile/runtime can do to avoid the costs of checking for an evaluation.
Now consider a = b + 0. b.__add__ will be invoked in the usual way. Only if b is a deferred will evaluation take place.
But the act of checking if b is deferred is a cost I am concerned about.
So I don't really see the rest of Python slowing down much.
Once we have the PEP address its semantics in detail we can estimate the costs. I would think that it's not that hard to add the expected check into the Python ceval.c and benchmark the impact of the checks. This would not need a full implementation of the deferred mechanism. Barry
Barry writes:
On 23 Jun 2022, at 08:27, Stephen J. Turnbull <stephenjturnbull@gmail.com> wrote:
Interesting idea that a reference does not auto-evaluate in all cases. I was wondering about what the compile/runtime can do to avoid the costs of checking for an evaluation.
I think the main thing to do is to put the burden on the deferred object, since it has to be something special in any case.
Now consider a = b + 0. b.__add__ will be invoked in the usual way. Only if b is a deferred will evaluation take place.
But the act of checking if b is deferred is a cost I am concerned about.
That's true in David's proposed semantics, where the runtime does that check. I'm suggesting modified semantics where deferreds can be a proxy object, whose normal reaction to *any* operation (possibly excepting name binding) is

1. check for a memoized value; if not found, evaluate its stored code and memoize the value
2. perform the action on the memoized value

That means that in the statement "a = b + 0", if b is an int, int.__add__(b, 0) gets called with no burden to code that uses no deferreds. Then the question is, why do we need syntax? Well, there's the PEP 671 rationale for deferring function argument defaults. There is also the question of whether name binding should trigger evaluation. If so,

    a = defer b

would (somewhat similarly to iter) defer a normal b (this could be optimized by checking for non-mutables) and "re-defer" a deferred b (i.e., just bind it to a without evaluation). The same consideration would apply to "as" and function arguments (possibly with different resolutions! I'm probably missing some name-binding syntax here, but it should be clear where this is going).
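A rough sketch of those modified semantics, with attribute access as the triggering operation (all names here are illustrative, not a proposed API). Step 1 memoizes the stored code's result; step 2 performs the action on the memoized value. Note that operator dunders bypass `__getattr__`, so a real proxy would need those slots filled in explicitly as well.

```python
# Sketch: a memoizing proxy. Attribute access forces evaluation once,
# then delegates to the cached value.
class LazyProxy:
    def __init__(self, thunk):
        object.__setattr__(self, "_thunk", thunk)
        object.__setattr__(self, "_done", False)
        object.__setattr__(self, "_memo", None)

    def _force(self):
        # step 1: check for a memoized value; evaluate and memoize if absent
        if not object.__getattribute__(self, "_done"):
            value = object.__getattribute__(self, "_thunk")()
            object.__setattr__(self, "_memo", value)
            object.__setattr__(self, "_done", True)
        return object.__getattribute__(self, "_memo")

    def __getattr__(self, name):
        # step 2: perform the action on the memoized value
        # (only reached for names the proxy itself does not define)
        return getattr(self._force(), name)

runs = []
p = LazyProxy(lambda: (runs.append(1), "hello")[1])
assert runs == []                 # nothing evaluated yet
assert p.upper() == "HELLO"       # first touch evaluates and memoizes
assert p.upper() == "HELLO" and runs == [1]   # stored code ran exactly once
```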
I would think that it’s not that hard to add the expected check into the python ceval.c And benchmark the impact of the checks. This would not need a full implementation of the deferred mechanism.
+1 Although I'm not going to spend next weekend it. ;-)
On Fri, 24 Jun 2022 at 17:51, Stephen J. Turnbull <stephenjturnbull@gmail.com> wrote:
Then the question is, why do we need syntax? Well, there's the PEP 671 rationales for deferring function argument defaults.
Which the OP hasn't actually explained in a way that works yet. A vanilla argument default value, with any sort of deferred object as described by anyone in this thread so far, would not achieve the semantics of a late-bound argument default - usually failing on this test:

    def f(x=[]):
        x.append(1)
        return x

How do you make this construct a new list every time it is called with no arguments? "def f(x=later [])" just delays the construction of the single default list until the first time it's needed, but it won't create a new one.
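Chris's failure can be simulated with an explicit memoizing thunk standing in for `later []` (the `Thunk` class below is a stand-in, not anything proposed): the single deferred default is built once on first use and then shared by every later call, which is exactly the old mutable-default gotcha rather than PEP 671 semantics.

```python
# A memoizing thunk: evaluates its expression once, caches it forever.
class Thunk:
    def __init__(self, fn):
        self._fn, self._value, self._done = fn, None, False

    def get(self):
        if not self._done:
            self._value, self._done = self._fn(), True
        return self._value

_default = Thunk(list)        # stands in for: x=later []

def f(x=_default):
    if isinstance(x, Thunk):
        x = x.get()           # first call builds the list; later calls reuse it
    x.append(1)
    return x

assert f() == [1]
assert f() == [1, 1]          # the same list again -- not a fresh [] per call
```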
There is also the question of whether name binding should trigger evalution. If so,
a = defer b
would (somewhat similar to iter) defer normal b (this could be optimized by checking for non-mutables) and "re-defer" a deferred b (ie, just bind it to a without evaluation).
Not sure what you mean by "similar to iter", but my understanding of "a = defer b" (or "a = later b" or whatever the syntax is) is "when a gets referenced, go figure out what b is", and that's going to have to go and poke b once a gets poked. There could be an optimization here, but the vanilla interpretation is that it's a new thunk. The underlying b won't be evaluated at this point.
The same consideration would apply to "as" and function arguments (possibly with different resolutions! I'm probably missing some name-binding syntax here, but it should be clear where this going).
It's not really the name binding that's the consideration - it's the lookup. Borrowing CPython's bytecode to explain, "a = b" looks like this: "Load b, then store your latest thing into a". (Both the load and the store need to know what kind of name to work with - a fast local, a closure cell (whether from the outer or inner function), a global, or something harder to optimize ("LOAD_NAME" and "STORE_NAME" are used in class context).) Name binding should never trigger evaluation, as that would make it impossible to use deferred objects at all; but using the name on the RHS would constitute a load, and based on some versions of the proposal, that is the point at which evaluation happens. ChrisA
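The load/store split Chris describes can be made visible with the `dis` module (CPython; the exact opcode names vary a little between interpreter versions):

```python
# Disassemble "a = b" where b is a global: the load of b is one instruction,
# the store into a is another. Under evaluation-on-reference, it is the
# load, not the store, where a deferred would plausibly be forced.
import dis

def assign():
    a = b      # "Load b, then store your latest thing into a"
    return a

ops = [ins.opname for ins in dis.get_instructions(assign)]
assert "LOAD_GLOBAL" in ops   # the lookup of b
assert "STORE_FAST" in ops    # the binding of a -- no evaluation needed here
```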
On Fri, Jun 24, 2022 at 3:50 AM Stephen J. Turnbull < stephenjturnbull@gmail.com> wrote:
That's true in David's proposed semantics, where the runtime does that check. I'm suggesting modified semantics where deferreds can be a proxy object, whose normal reaction to *any* operation (possibly excepting name binding) is
1. check for a memoized value, if not found evaluate its stored code, and memoize the value
2. perform the action on the memoized value
I think I like these semantics better than those of my draft proposal. I haven't had a chance to enhance the proto-PEP more in the last few days (other work). But all of these comments are extremely helpful, and I'll have a better version in a few days. Hopefully I can address many of the concerns raised.
David Mertz, Ph.D. writes:
On Fri, Jun 24, 2022 at 3:50 AM Stephen J. Turnbull <stephenjturnbull@gmail.com> wrote:
I'm suggesting modified semantics where deferreds can be a proxy object, whose normal reaction to *any* operation (possibly excepting name binding) is
1. check for a memoized value, if not found evaluate its stored code, and memoize the value
2. perform the action on the memoized value
I think I like these semantics better than those of my draft proposal. I haven't had a chance to enhance the proto-PEP more in the last few days (other work). But all of these comments are extremely helpful, and I'll have a better version in a few days. Hopefully I can address many of the concerns raised.
We (not sure how much help I'll be, but I'm in) need to deal with Chris A's point that a pure memoizing object doesn't help with the mutable defaults problem. That is, with

    def foo(cookiejar=defer []):

foo() produces a late-bound empty list that will be used again the next time foo() is invoked. Now, we could modify the defer syntax in function parameter default values to produce a deferred deferred object (or, more likely, a deferred object that lacks the memoization functionality). But I suspect Chris will respond with (a polite expression with the semantics of) the puke emoji, and I'm not sure I disagree, yet. ;-) Steve
On Sat, 25 Jun 2022 at 00:54, Stephen J. Turnbull <stephenjturnbull@gmail.com> wrote:
David Mertz, Ph.D. writes:
On Fri, Jun 24, 2022 at 3:50 AM Stephen J. Turnbull <stephenjturnbull@gmail.com> wrote:
I'm suggesting modified semantics where deferreds can be a proxy object, whose normal reaction to *any* operation (possibly excepting name binding) is
1. check for a memoized value, if not found evaluate its stored code, and memoize the value
2. perform the action on the memoized value
I think I like these semantics better than those of my draft proposal. I haven't had a chance to enhance the proto-PEP more in the last few days (other work). But all of these comments are extremely helpful, and I'll have a better version in a few days. Hopefully I can address many of the concerns raised.
We (not sure how much help I'll be, but I'm in) need to deal with Chris A's point that a pure memoizing object doesn't help with the mutable defaults problem. That is with
def foo(cookiejar=defer []):
foo() produces a late bound empty list that will be used again the next time foo() is invoked.
Now, we could modify the defer syntax in function parameter default values to produce a deferred deferred object (or, more likely, a deferred object that lacks the memoization functionality). But I suspect Chris will respond with (a polite expression with the semantics of) the puke emoji, and I'm not sure I disagree, yet. ;-)
It can't simply lack the memoization functionality, as that would break the obvious expectation that the list is the same throughout the function:

    def foo(cookiejar=defer []):
        cookiejar.append(1)  # new list here
        cookiejar.append(2)  # another new list???

So the only way around it would be to make the defer keyword somehow magical when used in a function signature, which kinda defeats the whole point about being able to reuse another mechanic to achieve this. Also, it would create some other oddity, depending on which way this is handled:

    _default = defer []
    def foo(cookiejar=_default):

Does this also get the magic, or doesn't it? Either way, there'd be a really weird inconsistency here. ChrisA
Chris Angelico writes:
So the only way around it would be to make the defer keyword somehow magical when used in a function signature, which kinda defeats the whole point about being able to reuse another mechanic to achieve this.
The defer keyword is already magical. Overloading it with more magic doesn't bother me. The question of internal consistency of the various magics does bother me.
Also, it would create some other oddity, depending on which way this is handled:
_default = defer []
def foo(cookiejar=_default):
Does this also get the magic, or doesn't it? Either way, there'd be a really weird inconsistency here.
Don't know, need to think about the definition and implementation of the magic first.
On Fri, Jun 24, 2022 at 10:52 AM Stephen J. Turnbull < stephenjturnbull@gmail.com> wrote:
We (not sure how much help I'll be, but I'm in) need to deal with Chris A's point that a pure memoizing object doesn't help with the mutable defaults problem. That is with
def foo(cookiejar=defer []):
foo() produces a late bound empty list that will be used again the next time foo() is invoked.
Yeah... this is a good point, and it is more frustrating to cover the late-bound argument case. I welcome edits to the proto-PEP (you can add yourself as co-author, even if you wind up at -1 on it :-)). I think I'm more-or-less U+1F922 on "deferred deferred" myself. 🤢 -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
On Wed, Jun 22, 2022 at 10:47 AM Martin Di Paola <martinp.dipaola@gmail.com> wrote:
Hi David, I read the PEP and I think it would be useful to expand the Motivation and Examples sections. While indeed Dask uses lazy evaluation to build a complex computation without executing it, I don't think that it is the whole story. Dask takes this deferred complex computation and *plans* how to execute it and then it *executes* it in non-obvious/direct ways.
Dask is very clever about execution planning. Ray is possibly even more clever in that regard. However, I think that that should be an explicit non-goal of the PEP. DeferredObjects should create a DAG, yes. But I think Python itself should not try to be *clever* in evaluating that DAG, nor think about parallelism. If my PEP were adopted, that would be something other libraries like Dask or Django could build on top of with more elaborate evaluation plans. But just the DAG itself gets you more than just "wait until needed to do the final computation." It allows for intermediate computation of nodes of the DAG lower down than the final result. For example, imagine dependencies like this (where all the computation steps are expensive):

    A -> B
    B -> Z
    B -> Y
    B -> X
    A -> C
    C -> X
    X -> F
    X -> G
    C -> W
    C -> V
    A -> D
    D -> V
    D -> U
    D -> T

Hopefully you either see my ASCII art in fixed font, or it's at least intelligible. If I want to force evaluation of A, I need to do everything. But if it turns out all I need within my program is C, then I have to do computations C, X, F, G, W, V. Which is maybe still expensive, but at least I don't worry about B, Z, Y, U, T, or A. Yes, I should add something like this to the PEP.
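The partial-evaluation claim can be simulated with each node as a memoized zero-argument function (a sketch only; the `node` helper is hypothetical). Forcing C touches exactly C, X, F, G, W, V and never B, D, A, or their other children.

```python
# Each node memoizes via functools.cache, so shared dependencies (like V,
# which both C and D need) are computed at most once.
import functools

evaluated = []

def node(name, *deps):
    @functools.cache
    def compute():
        for dep in deps:
            dep()
        evaluated.append(name)
        return name
    return compute

# The dependency graph from the message above:
Z, Y, F, G, W, V, U, T = (node(n) for n in "ZYFGWVUT")
X = node("X", F, G)
B = node("B", Z, Y, X)
C = node("C", X, W, V)
D = node("D", V, U, T)
A = node("A", B, C, D)

C()  # force only the node we actually need
assert set(evaluated) == {"F", "G", "X", "W", "V", "C"}
```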
Could this idea be used to specify the default factory of a dataclass field? For example,

    @dataclass
    class X:
        x: list[int] = deferred []

instead of

    @dataclass
    class X:
        x: list[int] = field(default_factory=list)

If so, it might be worth adding to your proposal as another common motivating example? Best, Neil On Wednesday, June 22, 2022 at 2:23:13 PM UTC-4 David Mertz, Ph.D. wrote:
On Wed, Jun 22, 2022 at 10:47 AM Martin Di Paola <martinp...@gmail.com> wrote:
Hi David, I read the PEP and I think it would be useful to expand the Motivation and Examples sections. While indeed Dask uses lazy evaluation to build a complex computation without executing it, I don't think that it is the whole story. Dask takes this deferred complex computation and *plans* how to execute it and then it *executes* it in non-obvious/direct ways.
Dask is very clever about execution planning. Ray is possibly even more clever in that regard.
However, I think that that should be an explicit non-goal of the PEP. DeferredObjects should create a DAG, yes. But I think Python itself should not think about being *clever* in evaluating that DAG, nor in itself think about parallelism. If my PEP were adopted, that would be something other libraries like Dask or Django could build on top of with more elaborate evaluation plans.
But just the DAG itself gets you more than just "wait until needed to do the final computation." It allows for intermediate computation of nodes of the DAG lower down than the final result. For example, imagine dependencies like this (where all the computation steps are expensive):
A -> B
B -> Z
B -> Y
B -> X
A -> C
C -> X
X -> F
X -> G
C -> W
C -> V
A -> D
D -> V
D -> U
D -> T
Hopefully you either see my ASCII art in fixed font, or it's at least intelligible. If I want to force evaluation of A, I need to do everything. But if it turns out all I need within my program is C, then I have to do computations C, X, F, G, W, V. Which is maybe still expensive, but at least I don't worry about B, Z, Y, U, T, or A.
Yes, I should add something like this to the PEP.
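For comparison with Neil's dataclass question above, here is what today's mechanism gives: `field(default_factory=...)` already produces a fresh list per instance, which is the behavior a deferred default would need to match.

```python
# Today's spelling of a per-instance mutable default in a dataclass.
from dataclasses import dataclass, field

@dataclass
class X:
    x: list[int] = field(default_factory=list)

a, b = X(), X()
a.x.append(1)
assert a.x == [1]
assert b.x == []   # each instance got its own fresh list
```

As with late-bound function defaults, a simple memoizing `deferred []` would build one list and share it; matching `default_factory` semantics would need re-evaluation per instance.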
Martin Di Paola wrote:
Three cases: Dask/PySpark, Django's ORM and selectq. All of them implement deferred expressions but all of them "compute" them in very specific ways (aka, they plan and execute the computation differently).
So - I've been hit with the "transparent execution of deferred code" dilemma before. What happens is that Python, at one point, will have to "use" an object - and that use is through calling one of the dunder methods. Up to that time, just writing the object name in a no-operation line does nothing (unless the line is in a REPL, which will then call the __repr__ method on the object).

I have implemented a toy project far in the past that would implement "all possible" dunder methods, and proxy those to the underlying object, for a "future type" that was calculated off-process and did not need any ".value()" or ".result()" methods to be called. Any such object, that has slots for all dunder methods, any of which, when called, would trigger the resolve, could work today, without any modification, to implement the proposed behavior.

And all that is needed to make it possible to manipulate the object before the evaluation takes place is a single reserved name, within the object namespace, that is not a proxy to evaluation. It could be special-cased within the object's class __getattribute__ itself: not even a new reserved dunder slot would be needed: that is, getattr(myobj, "_is_deferred", False) would not trigger the evaluation. (Although a special slot for it in object would allow plain checking using "myobj.__is_deferred__" without the need to use getattr or hasattr.)

So, all that would be needed for such a feature would be keyword support to build this special proxy type. That said, the usefulness or not of this proposal can be better thought through knowing that this "special attribute" mechanism can be used to add further inspection/modification mechanisms to the delayed objects. The act of "filling in all possible dunder methods" itself is quite hacky, but even if done in C, I don't think it could be avoided.
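A rough sketch of the mechanism Joao describes: dunder slots filled in programmatically on a proxy class, plus one reserved non-forwarding attribute. This is deliberately partial and hypothetical; a real version (like his lelo library) would cover far more dunder names, because special method lookup bypasses `__getattr__`.

```python
# Fill a few dunder slots on a proxy class; calling any of them resolves
# and memoizes the underlying value, then forwards the operation to it.
_UNSET = object()

def _forwarder(name):
    def method(self, *args, **kwargs):
        return getattr(self._resolve(), name)(*args, **kwargs)
    return method

class Proxy:
    _is_deferred = True          # the reserved name that does NOT resolve

    def __init__(self, thunk):
        self._thunk = thunk
        self._value = _UNSET

    def _resolve(self):
        if self._value is _UNSET:
            self._value = self._thunk()
        return self._value

# "Filling in all possible dunder methods" -- here, only a handful:
for _name in ("__len__", "__getitem__", "__iter__", "__str__"):
    setattr(Proxy, _name, _forwarder(_name))

p = Proxy(lambda: [1, 2, 3])
assert getattr(p, "_is_deferred", False)   # inspect without triggering
assert len(p) == 3                          # len() hits __len__ and resolves
assert p[0] == 1
```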
Here is the code I referred to that implements the same proxy type that would be needed for this feature - (IIRC it is even pip installable): https://bitbucket.org/jsbueno/lelo/src/master/lelo/_lelo.py On Wed, Jun 22, 2022 at 11:46 AM Martin Di Paola <martinp.dipaola@gmail.com> wrote:
Hi David, I read the PEP and I think it would be useful to expand the Motivation and Examples sections.
While indeed Dask uses lazy evaluation to build a complex computation without executing it, I don't think that it is the whole story.
Dask takes this deferred complex computation and *plans* how to execute it and then it *executes* it in non-obvious/direct ways.
For example, the computation of the min() of a dataframe can be done computing the min() of each partition of the dataframe and then computing the min() of them. Here is where the plan and the execution stages play.
All of this is hidden from the developer. From his/her perspective the min() is called once over the whole dataframe.
Dask's deferred computations are "useless" without the planning/execution plan.
PySpark, like Dask, does exactly the same.
But what about Django's ORM? Indeed Django allows you to build a SQL query without executing it. You can then perform more subqueries, joins and group by without executing them.
Only when you need the real data is the query executed.
This is another example of deferred execution similar to Dask/PySpark however when we consider the planning/execution stages the similarities ends there.
Django's ORM writes a SQL query and send it to a SQL database.
Another example of deferred execution would be my library to interact with web pages programmatically: selectq.
Very much like an ORM, you can select elements from a web page, perform subselections and unions without really interacting with the web page.
Only when you want to get the data from the page is when the deferred computations are executed and like an ORM, the plan done by selectq is to build a single xpath and then execute it using Selenium.
So...
Three cases: Dask/PySpark, Django's ORM and selectq. All of them implement deferred expressions but all of them "compute" them in very specific ways (aka, they plan and execute the computation differently).
Would those libs (and probably others) benefit from the PEP? How?
Thanks, Martin.
On Tue, Jun 21, 2022 at 04:53:44PM -0400, David Mertz, Ph.D. wrote:

Here is a very rough draft of an idea I've floated often, but not with much specification. Take this as "ideas" with little firm commitment to details from me. PRs, or issues, or whatever, can go to https://github.com/DavidMertz/peps/blob/master/pep-9999.rst as well as mentioning them in this thread.

PEP: 9999
Title: Generalized deferred computation
Author: David Mertz <dmertz@gnosis.cx>
Discussions-To: https://mail.python.org/archives/list/python-ideas@python.org/thread/
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 21-Jun-2022
Python-Version: 3.12
Post-History:

Abstract
========

This PEP proposes introducing the soft keyword ``later`` to express the concept of deferred computation. When an expression is preceded by the keyword, the expression is not evaluated but rather creates a "thunk" or "deferred object." Reference to the deferred object later in program flow causes the expression to be executed at that point, and for both the value and type of the object to become the result of the evaluated expression.
Motivation ==========
"Lazy" or "deferred" evaluation is a useful paradigm for expressing relationships among potentially expensive operations prior to their actual computation. Many functional programming languages, such as Haskell, build laziness into the heart of their language. Within the Python ecosystem, the popular scientific library `dask-delayed <dask-delayed>`_ provides a framework for lazy evaluation that is very similar to that proposed in this PEP.
.. _dask-delayed: https://docs.dask.org/en/stable/delayed.html
Examples of Use ===============
While the use of deferred computation is principally useful when computations are likely to be expensive, the simple examples shown do not necessarily use such especially expensive computations. Most of these are directly inspired by examples used in the documentation of dask-delayed.
In dask-delayed, ``Delayed`` objects are created by functions, and operations create a *directed acyclic graph* rather than performing actual computations. For example::
    >>> import dask
    >>> @dask.delayed
    ... def later(x):
    ...     return x
    ...
    >>> output = []
    >>> data = [23, 45, 62]
    >>> for x in data:
    ...     x = later(x)
    ...     a = x * 3
    ...     b = 2**x
    ...     c = a + b
    ...     output.append(c)
    ...
    >>> total = sum(output)
    >>> total
    Delayed('add-8f4018dbf2d3c1d8e6349c3e0509d1a0')
    >>> total.compute()
    4611721202807865734
    >>> total.visualize()
.. figure:: pep-9999-dag.png
   :align: center
   :width: 50%
   :class: invert-in-dark-mode
Figure 1. Dask DAG created from simple operations.
Under this PEP, the soft keyword ``later`` would work in a similar manner to this dask.delayed code. But rather than requiring calling ``.compute()`` on a ``Delayed`` object to arrive at the result of a computation, every reference to a binding would perform the "compute" *unless* it was itself a deferred expression. So the equivalent code under this PEP would be::
    >>> output = []
    >>> data = [23, 45, 62]
    >>> for later x in data:
    ...     a = later (x * 3)
    ...     b = later (2**x)
    ...     c = later (a + b)
    ...     output.append(later c)
    ...
    >>> total = later sum(output)
    >>> type(total)  # type() does not un-thunk
    <class 'DeferredObject'>
    >>> if value_needed:
    ...     print(total)  # Actual computation occurs here
    4611721202807865734
In the example, we assume that the built-in function ``type()`` is special in not counting as a reference to the binding for the purpose of realizing a computation. Alternately, some new special function like ``isdeferred()`` might be used to check for ``Deferred`` objects.
In general, however, every regular reference to a bound object will force a computation and re-binding on a ``Deferred``. This includes access to simple names, but also similarly to instance attributes, index positions in lists or tuples, or any other means by which an object may be referenced.
Rejected Spellings ==================
A number of alternate spellings for creating a ``Deferred`` object are possible. This PEP-author has little preference among them. The words ``defer`` or ``delay``, or their past participles ``deferred`` and ``delayed`` are commonly used in discussions of lazy evaluation. All of these would work equally well as the suggested soft keyword ``later``. The keyword ``lazy`` is not completely implausible, but does not seem to read as well.
No punctuation is immediately obvious for this purpose, although surrounding expressions with backticks is somewhat suggestive of quoting in Lisp, and perhaps slightly reminiscent of the ancient use of backtick for shell commands in Python 1.x. E.g.::
might_use = `math.gcd(a, math.factorial(b))`
Relationship to PEP-0671 ========================
The concept of "late-bound function argument defaults" is introduced in :pep:`671`. Under that proposal, a special syntactic marker would be permitted in function signatures with default arguments to allow the expressions indicated as defaults to be evaluated at call time rather than at function definition time. In current Python, we might write a toy function such as::
    def func(items=[], n=None):
        if n is None:
            n = len(items)
        items.append("Hello")
        print(n)
func([1, 2, 3]) # prints: 3
Using the :pep:`671` approach this could be simplified somewhat as::
    def func(items=[], n=>len(items)):  # late-bound defaults act as if bound here
        items.append("Hello")
        print(n)
func([1, 2, 3]) # prints: 3
Under the current PEP, evaluation of a ``Deferred`` object only occurs upon reference. That is, for the current toy function, the evaluation would not occur until the ``print(n)`` line::
    def func(items=[], n=later len(items)):
        items.append("Hello")
        print(n)
func([1, 2, 3]) # prints: 4
To completely replicate the behavior of PEP-0671, an extra line at the start of the function body would be required::
    def func(items=[], n=later len(items)):
        n = n  # Evaluate the Deferred and re-bind the name n
        items.append("Hello")
        print(n)
func([1, 2, 3]) # prints: 3
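Since the ``later`` keyword does not exist, the contrast between the two toy functions above can be checked with a closure standing in for the deferred default (the ``callable`` guard is a stand-in for "evaluate only if still deferred"):

```python
# "n=later len(items)" simulated with a lambda; the deferred variant
# evaluates at the reference (after the append -> 4), while forcing
# immediately with "n = n" reproduces PEP 671's result (3).
def func_deferred(items, n=None):
    if n is None:
        n = lambda: len(items)       # stands in for: n=later len(items)
    items.append("Hello")
    return n() if callable(n) else n  # the reference to n evaluates here

def func_forced(items, n=None):
    if n is None:
        n = lambda: len(items)
    n = n() if callable(n) else n     # "n = n": evaluate and re-bind up front
    items.append("Hello")
    return n

assert func_deferred([1, 2, 3]) == 4
assert func_forced([1, 2, 3]) == 3
```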
References ==========
https://github.com/DavidMertz/peps/blob/master/pep-9999.rst
Copyright =========
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
_______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-leave@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/DQTR3C... Code of Conduct: http://python.org/psf/codeofconduct/
On Wed, Jun 22, 2022 at 6:36 PM Joao S. O. Bueno <jsbueno@python.org.br> wrote:
implement "all possible" dunder methods, and proxy those to the underlying object, for a "future type" that was calculated off-process, and did not need any ".value()" or ".result()" methods to be called.
Here's a package on PyPI that seems to do that: https://pypi.org/project/lazy-object-proxy/ It's written partly in C, so it may be fast. I haven't tested it.
On Thu, 23 Jun 2022 at 11:35, Joao S. O. Bueno <jsbueno@python.org.br> wrote:
Martin Di Paola wrote:
Three cases: Dask/PySpark, Django's ORM and selectq. All of them implement deferred expressions but all of them "compute" them in very specific ways (aka, they plan and execute the computation differently).
So - I've been hit with the "transparent execution of deferred code" dilemma before.
What happens is that: Python, at one point will have to "use" an object - and that use is through calling one of the dunder methods. Up to that time, like, just writing the object name in a no-operation line, does nothing. (unless the line is in a REPL, which will then call the __repr__ method in the object).
Why are dunder methods special? Does being passed to some other function also do nothing? What about a non-dunder attribute? Especially, does being involved in an 'is' check count as using an object? dflt = fetch_cached_object("default") mine = later fetch_cached_object(user.keyword) ... if mine is dflt: ... # "using" mine? Or not? Does it make a difference whether the object has previously been poked in some other way? ChrisA
On Thu, Jun 23, 2022 at 2:53 AM Chris Angelico <rosuav@gmail.com> wrote:

> Why are dunder methods special? Does being passed to some other function also do nothing? What about a non-dunder attribute?

Non-dunder attributes go through obj.__getattribute__, at which point evaluation is triggered anyway.

> Especially, does being involved in an 'is' check count as using an object?

"is" is not "using", and will always be false or true as for any other object. Under this approach, the delayed object is a proxy, and remains a proxy, so this would have side-effects in code consuming the object (extensions expecting strict built-in types might not work with a proxy for an int or str) - but "is" comparison should bring 0 surprises.

> dflt = fetch_cached_object("default")
> mine = later fetch_cached_object(user.keyword)
> ...
> if mine is dflt: ...  # "using" mine? Or not?
>
> Does it make a difference whether the object has previously been poked in some other way?

In this case, "mine" would be a proxy for the evaluation of the call to "fetch_cached_object", which clearly IS NOT the returned object stored in "dflt".
This is so little, or so much, surprising as verifying that "bool([])" yields False: it just follows the language inner workings, with not special casing. Of course, this if this proposal goes forward - I am just pointing that the existing mechanisms in the language can already support it in a way with no modification. If "is" triggering the resolve is desired, or if is desired the delayed object should be replaced "in place", instead of using a proxy, another approach would be needed - and I'd favor the "already working" proxy approach I presented here. (I won't dare touch the bike-shedding about the syntax on this, though) > ChrisA > _______________________________________________ > Python-ideas mailing list -- python-ideas@python.org > To unsubscribe send an email to python-ideas-leave@python.org > https://mail.python.org/mailman3/lists/python-ideas.python.org/ > Message archived at > https://mail.python.org/archives/list/python-ideas@python.org/message/HUJ36AA34SZU7D5Q4G6N5UFFKYUOGOFT/ > Code of Conduct: http://python.org/psf/codeofconduct/ >
On Fri, 24 Jun 2022 at 13:26, Joao S. O. Bueno <jsbueno@python.org.br> wrote:
On Thu, Jun 23, 2022 at 2:53 AM Chris Angelico <rosuav@gmail.com> wrote:
On Thu, 23 Jun 2022 at 11:35, Joao S. O. Bueno <jsbueno@python.org.br> wrote:
Martin Di Paola wrote:
Three cases: Dask/PySpark, Django's ORM and selectq. All of them implement deferred expressions but all of them "compute" them in very specific ways (aka, they plan and execute the computation differently).
So - I've been hit with the "transparent execution of deferred code" dilemma before.
What happens is that Python, at some point, will have to "use" an object - and that use happens through calling one of its dunder methods. Up to that time, just writing the object's name on a no-op line does nothing (unless the line is in a REPL, which will then call the object's __repr__ method).
Why are dunder methods special? Does being passed to some other function also do nothing? What about a non-dunder attribute?
Non-dunder attributes go through obj.__getattribute__, at which point evaluation is triggered anyway.
Hmm, do they actually, or is that only if it's defined? But okay. In that case, simply describe it as "accessing any attribute".
Especially, does being involved in an 'is' check count as using an object?
"is" is not "using", and will always be false or true, as for any other object. Under this approach, the delayed object is a proxy, and remains a proxy, so this would have side-effects in code consuming the object (extensions expecting strict built-in types might not work with a proxy for an int or str) - but "is" comparison should bring zero surprises.
At this point, I'm wondering if the proposal's been watered down to being nearly useless. You don't get the actual object, it's always a proxy, and EVERY attribute lookup on EVERY object has to first check to see if it's a special proxy.
dflt = fetch_cached_object("default")
mine = later fetch_cached_object(user.keyword)
...
if mine is dflt: ...  # "using" mine? Or not?
Does it make a difference whether the object has previously been poked in some other way?
In this case, "mine" should be a proxy for the evaluation of the call of "fetch_cached_object" which clearly IS NOT the returned object stored in "dflt".
This is no more, and no less, surprising than verifying that "bool([])" yields False: it just follows the language's inner workings, with no special casing.
If it's defined as a proxy, then yes, that's the case - it will never be that object, neither before nor after the undeferral. But that means that a "later" expression will never truly become the actual object, so you always have to keep that in mind. I foresee a large number of style guides decrying the use of identity checks because they "won't work" with deferred objects.
Of course, this is only if this proposal goes forward - I am just pointing out that the existing mechanisms in the language can already support it, in a way, with no modification. If it is desired that "is" trigger the resolve, or that the delayed object be replaced "in place" instead of using a proxy, another approach would be needed - and I'd favor the "already working" proxy approach I presented here.
(I won't dare touch the bike-shedding about the syntax on this, though)
Right, but if the existing mechanisms are sufficient, why not just use them? We *have* lambda expressions. It wouldn't be THAT hard to define a small wrapper - okay, the syntax is a bit clunky, but bear with me:

class later:
    def __init__(self, func):
        self.func = func
        self.__is_real = False
    def __getattribute__(self, attr):
        self.__makereal()
        return getattr(self.__wrapped, attr)
    def __makereal(self):
        if self.__is_real: return
        self.__wrapped = self.func()
        self.__is_real = True

x = later(lambda: expensive+expression()*to/calc)

And we don't see a lot of this happening. Why? I don't know for sure, but I can guess at a few possible reasons:

1) It's not part of the standard library, so you have to go fetch a thing to do it. If that's significant enough, this is solvable by adding it to the stdlib, or even a new builtin.

2) "later(lambda: expr)" is clunky. Very clunky. Your proposal solves that, by making "later expr" do that job, but at the price of creating some weird edge cases (for instance, you *cannot* parenthesize the expression - this is probably the only place where that's possible, as even non-expressions can often be parenthesized, eg import and with statements).

3) It's never actually the result of the expression, but always this proxy.

4) There's no (clean) way to get at the true object, which means that all the penalties are permanent.

5) Maybe the need just isn't that strong.

How much benefit would this be? You're proposing a syntactic construct for something that isn't used all that often, so it needs to be a fairly dramatic improvement in the cases where it _is_ used.

ChrisA
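For reference, here is a runnable variant of that wrapper sketch (the `Later` name and helper names are hypothetical, not from the mail). One pitfall the compact sketch glosses over: the override's own internal state reads must bypass `__getattribute__`, otherwise reading `self.__is_real` from inside it would recurse forever.

```python
class Later:
    """A memoizing deferred-computation proxy (sketch, assumed names)."""

    def __init__(self, func):
        # Plain attribute *writes* are safe: __setattr__ is not overridden.
        self._func = func
        self._is_real = False

    def _makereal(self):
        # Internal *reads* go through object.__getattribute__ so they do
        # not re-enter our own __getattribute__ override.
        if not object.__getattribute__(self, '_is_real'):
            self._wrapped = object.__getattribute__(self, '_func')()
            self._is_real = True
        return object.__getattribute__(self, '_wrapped')

    def __getattribute__(self, attr):
        # Any ordinary attribute access forces evaluation, then forwards
        # the lookup to the computed result.
        wrapped = object.__getattribute__(self, '_makereal')()
        return getattr(wrapped, attr)


calls = []

def expensive():
    calls.append(1)
    return "hello world"

x = Later(expensive)
assert calls == []                      # nothing has been computed yet
assert x.upper() == "HELLO WORLD"       # first access forces the computation
assert x.split() == ["hello", "world"]  # later accesses reuse the cache
assert calls == [1]                     # ...so the function ran exactly once
```

Note that, as point 3 above says, the result is still a proxy: an operator like `x * 3` is not forwarded unless the corresponding dunder methods are also defined on the wrapper class.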
On Fri, Jun 24, 2022 at 1:06 AM Chris Angelico <rosuav@gmail.com> wrote:
On Fri, 24 Jun 2022 at 13:26, Joao S. O. Bueno <jsbueno@python.org.br> wrote:
On Thu, Jun 23, 2022 at 2:53 AM Chris Angelico <rosuav@gmail.com> wrote:
On Thu, 23 Jun 2022 at 11:35, Joao S. O. Bueno <jsbueno@python.org.br> wrote:
Martin Di Paola wrote:
Three cases: Dask/PySpark, Django's ORM and selectq. All of them implement deferred expressions but all of them "compute" them in very specific ways (aka, they plan and execute the computation differently).
So - I've been hit with the "transparent execution of deferred code" dilemma before.
What happens is that Python, at some point, will have to "use" an object - and that use happens through calling one of its dunder methods. Up to that time, just writing the object's name on a no-op line does nothing (unless the line is in a REPL, which will then call the object's __repr__ method).
Why are dunder methods special? Does being passed to some other function also do nothing? What about a non-dunder attribute?
Non-dunder attributes go through obj.__getattribute__, at which point evaluation is triggered anyway.
Hmm, do they actually, or is that only if it's defined? But okay. In that case, simply describe it as "accessing any attribute".
Especially, does being involved in an 'is' check count as using an object?
"is" is not "using", and will always be false or true, as for any other object. Under this approach, the delayed object is a proxy, and remains a proxy, so this would have side-effects in code consuming the object (extensions expecting strict built-in types might not work with a proxy for an int or str) - but "is" comparison should bring zero surprises.
At this point, I'm wondering if the proposal's been watered down to being nearly useless. You don't get the actual object, it's always a proxy, and EVERY attribute lookup on EVERY object has to first check to see if it's a special proxy.
dflt = fetch_cached_object("default")
mine = later fetch_cached_object(user.keyword)
...
if mine is dflt: ...  # "using" mine? Or not?
Does it make a difference whether the object has previously been poked in some other way?
In this case, "mine" should be a proxy for the evaluation of the call of "fetch_cached_object" which clearly IS NOT the returned object stored in "dflt".
This is no more, and no less, surprising than verifying that "bool([])" yields False: it just follows the language's inner workings, with no special casing.
If it's defined as a proxy, then yes, that's the case - it will never be that object, neither before nor after the undeferral. But that means that a "later" expression will never truly become the actual object, so you always have to keep that in mind. I foresee a large number of style guides decrying the use of identity checks because they "won't work" with deferred objects.
Of course, this is only if this proposal goes forward - I am just pointing out that the existing mechanisms in the language can already support it, in a way, with no modification. If it is desired that "is" trigger the resolve, or that the delayed object be replaced "in place" instead of using a proxy, another approach would be needed - and I'd favor the "already working" proxy approach I presented here.
(I won't dare touch the bike-shedding about the syntax on this, though)
Right, but if the existing mechanisms are sufficient, why not just use them? We *have* lambda expressions. It wouldn't be THAT hard to define a small wrapper - okay, the syntax is a bit clunky, but bear with me:
class later:
    def __init__(self, func):
        self.func = func
        self.__is_real = False
    def __getattribute__(self, attr):
        self.__makereal()
        return getattr(self.__wrapped, attr)
    def __makereal(self):
        if self.__is_real: return
        self.__wrapped = self.func()
        self.__is_real = True
x = later(lambda: expensive+expression()*to/calc)
And we don't see a lot of this happening. Why? I don't know for sure, but I can guess at a few possible reasons:
1) It's not part of the standard library, so you have to go fetch a thing to do it. If that's significant enough, this is solvable by adding it to the stdlib, or even a new builtin.
2) "later(lambda: expr)" is clunky. Very clunky. Your proposal solves that, by making "later expr" do that job, but at the price of creating some weird edge cases (for instance, you *cannot* parenthesize the expression - this is probably the only place where that's possible, as even non-expressions can often be parenthesized, eg import and with statements).
3) It's never actually the result of the expression, but always this proxy.
4) There's no (clean) way to get at the true object, which means that all the penalties are permanent.
5) Maybe the need just isn't that strong.
How much benefit would this be? You're proposing a syntactic construct for something that isn't used all that often, so it needs to be a fairly dramatic improvement in the cases where it _is_ used.
Excuse me - who is the "you" you are referring to in the last paragraphs? (honest question)

I am not proposing this - the proto-PEP is David Mertz's. I just pointed out that the language, as it is today, can handle the inner part of the deferred object as it is (if one just adds all possible dunder methods to your proxy example above, for example). Moreover, there could be an attribute namespace to deal with/modify the object, so retrieving the "real" object could be trivial. (The original would actually be retrieved in _any_ operation with the object that makes use of its dunder attributes - think "str", or "myobj + 3" - since the proxy dunder would forward the operation to the wrapped object's corresponding method.)

I am talking about this because I had played around with that "transparent future object" in the Lelo project I linked in the other e-mail, and it just works, and actually looks like magic, due to it auto-resolving whenever it is "consumed". But like you, I don't know how useful it would actually be - so I am not the "you" from your last paragraphs. I have not used the "lelo" proxy in production code: calling a ".result()" method, or, these days, having an "await" expression, offers something with a lot more control.

I just wrote it because it is something I made work before - and if there are indeed uses for it, the language might not even need changes to support it beyond an operator keyword.
_______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-leave@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/JTRM6Q... Code of Conduct: http://python.org/psf/codeofconduct/
On Fri, 24 Jun 2022 at 16:34, Joao S. O. Bueno <jsbueno@python.org.br> wrote:
On Fri, Jun 24, 2022 at 1:06 AM Chris Angelico <rosuav@gmail.com> wrote:
How much benefit would this be? You're proposing a syntactic construct for something that isn't used all that often, so it needs to be a fairly dramatic improvement in the cases where it _is_ used.
Excuse me - who is the "you" you are referring to in the last paragraphs? (honest question)

I am not proposing this - the proto-PEP is David Mertz's.
You, because you're the one who devised the version that I was responding to. His version is a much more in-depth change, although it has other issues.
I just pointed out that the language, as it is today, can handle the inner part of the deferred object, as it is.
Yes, but with the limitations that I described.
(if one just adds all possible dunder methods to your proxy example above, for example)
I still don't understand why you treat dunder methods as special here. Are you, or are you not, relying on __getattribute__? Have you taken tp_* slots into account?
Moreover, there could be an attribute namespace to deal with/modify the object, so retrieving the "real" object could be trivial. (The original would actually be retrieved in _any_ operation with the object that makes use of its dunder attributes - think "str", or "myobj + 3" - since the proxy dunder would forward the operation to the wrapped object's corresponding method.)
Okay, here's an exercise for you. Given any function f(), ascertain whether these two calls returned the same object:

x = f()
y = later f()

You do not know what kind of object it is. You just have to write the code that will answer the question of whether the second call to f() returned the exact same object as the first call. Calling str() on the two objects is insufficient, for instance. Calling id(y) is not going to touch any of y's dunder methods - it's just going to return the ID of the proxy, so it'll always show as different.
I am talking about this because I had played around with that "transparent future object" in the Lelo project I linked in the other e-mail, and it just works, and actually looks like magic, due to it auto-resolving whenever it is "consumed".
Right. That auto-resolving requires language support, but it means that it's not a "transparent future object". It's a real object, just one that you don't have yet. There *is no object* representing the pending state.
But like you, I don't know how useful it would actually be - so I am not the "you" from your last paragraphs. I have not used the "lelo" proxy in production code: calling a ".result()" method, or, these days, having an "await" expression, offers something with a lot more control.
Then you are not talking about the same thing at all. You're talking about a completely different concept, and you *are* the "you" from my last paragraphs.
I just wrote it because it is something I made work before - and if there are indeed uses for it, the language might not even need changes to support it beyond an operator keyword.
Yes, you've done something that is broadly similar to this proposal, but which, like every idea, has its own set of limitations. It's easy to say "I did something different from what you did, and it doesn't require language support", but your version of the proposal introduces new problems, which is why I responded to them. ChrisA
On Fri, Jun 24, 2022 at 5:38 AM Chris Angelico <rosuav@gmail.com> wrote:
On Fri, 24 Jun 2022 at 16:34, Joao S. O. Bueno <jsbueno@python.org.br> wrote:
On Fri, Jun 24, 2022 at 1:06 AM Chris Angelico <rosuav@gmail.com> wrote:
How much benefit would this be? You're proposing a syntactic construct for something that isn't used all that often, so it needs to be a fairly dramatic improvement in the cases where it _is_ used.
Excuse me - who is the "you" you are referring to in the last paragraphs? (honest question)

I am not proposing this - the proto-PEP is David Mertz's.
You, because you're the one who devised the version that I was responding to. His version is a much more in-depth change, although it has other issues.
ok.
I just pointed out that the language, as it is today, can handle the inner part of the deferred object, as it is.
Yes, but with the limitations that I described.
Indeed - I don't want to argue about that, just point out that, given the natural way things work in Python as is, some of those limitations do not apply.
(if one just adds all possible dunder methods to your proxy example above, for example)
I still don't understand why you treat dunder methods as special here. Are you, or are you not, relying on __getattribute__? Have you taken tp_* slots into account?

I had not thought about tp_* slots - I am just considering pure Python code: any slot which does not alias a visible dunder method would map to the proxy instead, in a straightforward way for one looking only at the Python code. Maybe some of the unmapped slots might cause some undesired effects, and should trigger the resolve as well.
The reason I am treating dunder attributes as special is simply that it is what CPython does when resolving any operator on an object - any other attribute access from Python code goes through __getattribute__, but the code path triggered by operators (+, -, ..., not, len, str) does not.
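That special-casing is easy to verify with a small sketch (hypothetical class name): an instance-level `__getattribute__` override logs ordinary attribute lookups, yet an operator-style call such as `len(obj)` bypasses it entirely, because CPython dispatches operators through the type's slot, not through instance attribute lookup.

```python
class Watcher:
    seen = []                         # class-level log (fine for a demo)

    def __getattribute__(self, name):
        # Class attribute access here does not recurse: it goes through
        # the metaclass, not this instance-level override.
        Watcher.seen.append(name)
        return object.__getattribute__(self, name)

    def __len__(self):
        return 3

w = Watcher()
assert len(w) == 3                    # operator form: dispatched via the type slot
assert Watcher.seen == []             # ...so __getattribute__ never ran
assert w.__len__() == 3               # explicit attribute lookup form
assert Watcher.seen == ['__len__']    # ...does go through __getattribute__
```

This is exactly why a proxy that relies on `__getattribute__` alone cannot intercept operators, and must also define the dunder methods it wants to forward.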
Moreover, there could be an attribute namespace to deal with/modify the object, so retrieving the "real" object could be trivial. (The original would actually be retrieved in _any_ operation with the object that makes use of its dunder attributes - think "str", or "myobj + 3" - since the proxy dunder would forward the operation to the wrapped object's corresponding method.)
Okay, here's an exercise for you. Given any function f(), ascertain whether these two calls returned the same object:
x = f()
y = later f()
You do not know what kind of object it is. You just have to write the code that will answer the question of whether the second call to f() returned the exact same object as the first call. Calling str() on the two objects is insufficient, for instance. Calling id(y) is not going to touch any of y's dunder methods - it's just going to return the ID of the proxy, so it'll always show as different.
It won't work, indeed - unless there are reserved attributes that would cause an explicit resolve. Even if that is not given, and there is no way to make an "is" comparison work, this derives from the natural usage of the proxy, with no exceptional behaviors needed. The proxy is not the underlying object, after all. And not even a convention such as a ".__deferred_resolve__" call could solve it: the simpler path I pointed out does not involve "in place attribute substitution". But such a method could resolve and return the wrapped object, and then `(z := resolve(y)) is x` would work, as would id(resolve(y)) == id(x) - but "y" would still be the proxy <- no magic needed, and that is the point I wanted to bring up. A similar proxy that is used in day-to-day coding is a super() instance, and I never saw anyone needing `super(cls, instance) is instance` to be true. [...]
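The escape hatch described here can be sketched with a small `resolve()` helper (hypothetical names throughout; `fetch_cached` stands in for the thread's `fetch_cached_object`): the proxy is never the wrapped object, but resolving it recovers the underlying identity.

```python
class Proxy:
    """Deferred call behind a proxy (sketch); resolve() unwraps it."""

    def __init__(self, func):
        self._func = func
        self._cached = None
        self._done = False

    def _force(self):
        # Evaluate at most once and cache the result.
        if not self._done:
            self._cached = self._func()
            self._done = True
        return self._cached

def resolve(obj):
    # Unwrap a Proxy; anything else passes through unchanged.
    return obj._force() if isinstance(obj, Proxy) else obj

sentinel = object()

def fetch_cached():               # hypothetical stand-in for a cached fetch
    return sentinel

x = fetch_cached()
y = Proxy(fetch_cached)
assert (y is x) is False          # the proxy is never the underlying object
assert resolve(y) is x            # ...but the resolved value is
assert resolve(x) is x            # non-proxies pass through unchanged
```

This mirrors the `(z := resolve(y)) is x` point: identity checks work only after explicitly unwrapping, while `y` itself stays a proxy.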
Then you are not talking about the same thing at all. You're talking about a completely different concept, and you *are* the "you" from my last paragraphs.
I see. I've stepped in because that approach worked _really_ well, and I don't think it is _all_ that different from the proposal in this thread; it is instead a middle ground, not involving "in-place object mutation", that could make something very close to that proposal feasible.

Maybe I'd be happier to see a generic way to implement "super proxies" like these in a less hacky way - and then those could be used to build the deferred objects in this proposal - than this specific implementation. In the example project itself, Lelo, the proxies are used to calculate the object in a subprocess, rather than just delaying their resolve in-thread.
I just wrote it because it is something I made work before - and if there are indeed uses for it, the language might not even need changes to support it beyond an operator keyword.
Yes, you've done something that is broadly similar to this proposal, but which, like every idea, has its own set of limitations. It's easy to say "I did something different from what you did, and it doesn't require language support", but your version of the proposal introduces new problems, which is why I responded to them.
Alright - but the only outstanding problem is the "is" and "id" comparison. I am still replying because I have the impression you had not grokked the main point: at some point, sooner or later, for any object in Python, one of the dunder methods _will_ be called (except for identity comparison, if one has it as an end in itself) - be it for printing, serializing, or being the target of a unary or binary operator. This path can be hooked to trigger the deferred resolve in the proposal in this thread.

That said, I am not super in favor of it being in the language, and I will leave that for other people to discuss.

So, thank you for your time. Really!

js -><-
On Sat, 25 Jun 2022 at 10:37, Joao S. O. Bueno <jsbueno@python.org.br> wrote:
On Fri, Jun 24, 2022 at 5:38 AM Chris Angelico <rosuav@gmail.com> wrote:
On Fri, 24 Jun 2022 at 16:34, Joao S. O. Bueno <jsbueno@python.org.br> wrote:
On Fri, Jun 24, 2022 at 1:06 AM Chris Angelico <rosuav@gmail.com> wrote:
How much benefit would this be? You're proposing a syntactic construct for something that isn't used all that often, so it needs to be a fairly dramatic improvement in the cases where it _is_ used.
Excuse me - who is the "you" you are referring to in the last paragraphs? (honest question)

I am not proposing this - the proto-PEP is David Mertz's.
You, because you're the one who devised the version that I was responding to. His version is a much more in-depth change, although it has other issues.
ok.
I just pointed out that the language, as it is today, can handle the inner part of the deferred object, as it is.
Yes, but with the limitations that I described.
Indeed - I don't want to argue about that, just point out that, given the natural way things work in Python as is, some of those limitations do not apply.
(if one just adds all possible dunder methods to your proxy example above, for example)
I still don't understand why you treat dunder methods as special here. Are you, or are you not, relying on __getattribute__? Have you taken tp_* slots into account?

I had not thought about tp_* slots - I am just considering pure Python code: any slot which does not alias a visible dunder method would map to the proxy instead, in a straightforward way for one looking only at the Python code. Maybe some of the unmapped slots might cause some undesired effects, and should trigger the resolve as well.
The reason I am treating dunder attributes as special is simply that it is what CPython does when resolving any operator on an object - any other attribute access from Python code goes through __getattribute__, but the code path triggered by operators (+, -, ..., not, len, str) does not.
Hmmm, I think possibly you're misunderstanding the nature of class slots, then. The most important part is that they are looked up on the *class*, not the instance; but there are some other quirks too:
>>> class Meta(type):
...     def __getattribute__(self, attr):
...         print("Fetching %s from the metaclass" % attr)
...         return super().__getattribute__(attr)
...
>>> class Demo(metaclass=Meta):
...     def __getattribute__(self, attr):
...         print("Fetching %s from the class" % attr)
...         return super().__getattribute__(attr)
...
>>> x = Demo()
>>> x * 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for *: 'Demo' and 'int'
Neither the metaclass nor the class itself had __getattribute__ called, because __mul__ goes into the corresponding slot. HOWEVER:
>>> Demo().__mul__
Fetching __mul__ from the class
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in __getattribute__
Fetching __dict__ from the class
Fetching __class__ from the class
Fetching __dict__ from the metaclass
Fetching __bases__ from the metaclass
AttributeError: 'Demo' object has no attribute '__mul__'. Did you mean: '__module__'?
If you explicitly ask for the dunder method, it does go through __getattribute__. In other words, even though methods are able to customize the behaviour of operators, the behaviour of operators is not defined in terms of method lookups. (This is particularly obvious with function objects, which have __call__ methods; obviously the effect of calling a function cannot be to look up its __call__ method and call that, as it would lead to infinite recursion.)
Moreover, there could be an attribute namespace to deal with/modify the object, so retrieving the "real" object could be trivial. (The original would actually be retrieved in _any_ operation with the object that makes use of its dunder attributes - think "str", or "myobj + 3" - since the proxy dunder would forward the operation to the wrapped object's corresponding method.)
Okay, here's an exercise for you. Given any function f(), ascertain whether these two calls returned the same object:
x = f()
y = later f()
You do not know what kind of object it is. You just have to write the code that will answer the question of whether the second call to f() returned the exact same object as the first call. Calling str() on the two objects is insufficient, for instance. Calling id(y) is not going to touch any of y's dunder methods - it's just going to return the ID of the proxy, so it'll always show as different.
It won't work, indeed - unless there are reserved attributes that would cause an explicit resolve. Even if that is not given, and there is no way to make an "is" comparison work, this derives from the natural usage of the proxy, with no exceptional behaviors needed. The proxy is not the underlying object, after all. And not even a convention such as a ".__deferred_resolve__" call could solve it: the simpler path I pointed out does not involve "in place attribute substitution". But such a method could resolve and return the wrapped object, and then `(z := resolve(y)) is x` would work, as would id(resolve(y)) == id(x) - but "y" would still be the proxy <- no magic needed, and that is the point I wanted to bring up.
That's a consequence of it being a proxy, though. You're assuming that a proxy is the only option. Proxies are never fully transparent, and that's a fundamental difficulty with working with them; you can't treat them like the underlying object, you have to think of them as proxies forever. The original proposal, if I'm not mistaken, was that the "deferred thing" really truly would become the resulting object. That requires compiler support, but it makes everything behave sanely: basic identity checks function as you'd expect, there are no bizarre traps with weak references, C-implemented functions don't have to be rewritten to cope with them, etc, etc, etc.
A similar proxy that is used in day-to-day coding is a super() instance, and I never saw anyone needing `super(cls, instance) is instance` to be true.
That's partly because super() deliberately does NOT return a transparent, or even nearly-transparent, proxy. The point of it is to have different behaviour from the underlying instance. So, obviously, the super object itself has to be a distinct thing. Usually, a proxy offers some kind of special value that makes it distinct from the original object (otherwise why have it?), so it'll often have some special attributes that tie in with that (for instance, a proxy for objects stored in a database might have an "is_unsaved" attribute/method to show whether it's been assigned a unique ID yet). This is one of very few places where there's no value whatsoever in keeping the proxy around; you just want to go straight to the real object with minimal fuss.
Then you are not talking about the same thing at all. You're talking about a completely different concept, and you *are* the "you" from my last paragraphs.
I see. I've stepped in because that approach worked _really_ well, and I don't think it is _all_ that different from the proposal in this thread; it is instead a middle ground, not involving "in-place object mutation", that could make something very close to that proposal feasible.
This seems to be a bit of a theme: a proposal is made, someone else says "but you could do it in this completely different way", and because code is so flexible, that's always technically true. But it's not the same proposal, and when you describe it as a different implementation of the same proposal, you confuse the issue quite a bit. Your proposal is basically just a memoized lambda function with proxying capabilities. The OP in this thread was talking about deferred expressions. And my proposal was about a different way to do argument defaults. All of these are *different* proposals, they are not just implementations of each other. Trying to force one proposal to be another just doesn't work.
Maybe I'd be happier to see a generic way to implement "super proxies" like these in a less hacky way - and then those could be used to build the deferred objects in this proposal - than this specific implementation. In the example project itself, Lelo, the proxies are used to calculate the object in a subprocess, rather than just delaying their resolve in-thread.
IMO that's a terrible idea. A proxy usually has some other purpose for existing; purely transparent proxies are usually useless. Making it easier to make transparent proxies in a generic way isn't going to be any value to anything that doesn't want to be fully transparent. Calculating in a subprocess means that everything needed for that calculation has to be able to be serialized (probably pickled) and sent to the subprocess, and the result likewise. That's very limiting, and where you're okay with that, you probably _aren't_ okay with that sort of thing magically happening with all attribute lookups.
I just wrote it because it is something I made work before - and if there are indeed uses for it, the language might not even need changes to support it beyond an operator keyword.
Yes, you've done something that is broadly similar to this proposal, but like every idea, has its own set of limitations. It's easy to say "I did something different from what you did, and it doesn't require language support", but your version of the proposal introduces new problems, which is why I responded to them.
Alright - but the only outstanding problem is the "is" and "id" comparison. I am still replying because I have the impression you had not grokked the main point: at some point, sooner or later, for any object in Python, one of the dunder methods _will_ be called (except for identity comparison, if one has it as an end in itself) - be it for printing, serializing, or being the target of a unary or binary operator. This path can be hooked to trigger the deferred resolve in the proposal in this thread.
As shown above, not true; dunder methods are not always called if they don't exist, so __getattribute__ cannot always proxy them.
That said, I am not super in favor of it being in the language, and I will leave that for other people to discuss.
So, thank you for your time. Really!
No probs. Always happy to discuss ideas; there's nothing wrong with throwing thoughts out there, as long as you don't mind people disagreeing with you :) ChrisA
On Fri, Jun 24, 2022 at 10:05 PM Chris Angelico <rosuav@gmail.com> wrote:
Hmmm, I think possibly you're misunderstanding the nature of class slots, then. The most important part is that they are looked up on the *class*, not the instance; but there are some other quirks too:
Sorry, no. I know how those work.
>>> class Meta(type):
...     def __getattribute__(self, attr):
...         print("Fetching %s from the metaclass" % attr)
...         return super().__getattribute__(attr)
...
>>> class Demo(metaclass=Meta):
...     def __getattribute__(self, attr):
...         print("Fetching %s from the class" % attr)
...         return super().__getattribute__(attr)
...
>>> x = Demo()
>>> x * 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for *: 'Demo' and 'int'
Neither the metaclass nor the class itself had __getattribute__
Yes - if you go back to my first e-mail on the thread, and the example code, that is why I am saying all along that the proxy has to explicitly define all possible dunder methods. I've repeatedly written that all _other_ method and attribute accesses go through __getattribute__.
called, because __mul__ goes into the corresponding slot. HOWEVER:
>>> Demo().__mul__
Fetching __mul__ from the class
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in __getattribute__
Fetching __dict__ from the class
Fetching __class__ from the class
Fetching __dict__ from the metaclass
Fetching __bases__ from the metaclass
AttributeError: 'Demo' object has no attribute '__mul__'. Did you mean: '__module__'?
If you explicitly ask for the dunder method, it does go through __getattribute__.
Please. I know that. The thing is, if I define in a class both "__add__" and "__getattribute__", I will cover both "instance + 0" and "instance.__add__(0)". (...)
That's a consequence of it being a proxy, though. You're assuming that a proxy is the only option. Proxies are never fully transparent, and that's a fundamental difficulty with working with them; you can't treat them like the underlying object, you have to think of them as proxies forever.
No, not in Python. Once you have a proxy class that covers all dunder methods to operate on and return the proxied object, whoever makes use of that proxy will have it working in a transparent way - in any code that doesn't try direct memory access to the proxied object's data. "Lelo" objects from that link can be freely passed around and used - at one point, if the object is not dropped, code has to go through one of the dunder methods - there is no way Python code can do any calculation or output the proxied object without doing so. And that experiment worked fantastically well in doing so, better than I thought it would, and that is the only thing I am trying to say.
The original proposal, if I'm not mistaken, was that the "deferred thing" really truly would become the resulting object. That requires compiler support, but it makes everything behave sanely: basic identity checks function as you'd expect, there are no bizarre traps with weak references, C-implemented functions don't have to be rewritten to cope with them, etc, etc, etc.
So, as I've written from the first message, this would require deep support in the language, which the proxy approach does not.
A similar proxy that is used in day to day coding is a super() instance, and I never saw one needing `super(cls, instance) is instance` to be true.
That's partly because super() deliberately does NOT return a transparent, or even nearly-transparent, proxy. The point of it is to have different behaviour from the underlying instance. So, obviously, the super object itself has to be a distinct thing.
Usually, a proxy offers some kind of special value that makes it distinct from the original object (otherwise why have it?), so it'll often have some special attributes that tie in with that (for instance, a proxy for objects stored in a database might have an "is_unsaved" attribute/method to show whether it's been assigned a unique ID yet). This is one of very few places where there's no value whatsoever in keeping the proxy around; you just want to go straight to the real object with minimal fuss.
And that is achieved when all the dunder methods transparently work on the real object: minimal fuss. inst + other works, str(inst) works, print(inst) works, because it will call str(inst) further down.
Then you are not talking about the same thing at all. You're talking about a completely different concept, and you *are* the "you" from my last paragraphs.
I see. I've stepped in because that approach worked _really_ well, and I don't think it is _all_ that different from the proposal on the thread, and is instead a middle ground not involving "in-place object mutation", that could make something very close to that proposal feasible.
This seems to be a bit of a theme: a proposal is made, someone else says "but you could do it in this completely different way", and because code is so flexible, that's always technically true. But it's not the same proposal, and when you describe it as a different implementation of the same proposal, you confuse the issue quite a bit.
Your proposal is basically just a memoized lambda function with proxying capabilities. The OP in this thread was talking about deferred expressions. And my proposal was about a different way to do argument defaults. All of these are *different* proposals, they are not just implementations of each other. Trying to force one proposal to be another just doesn't work.
I am not trying to force anything. My idea was to put on the table a way to achieve most of the effects of the initial proposal with less effort. I guess this is "python-ideas" for a reason. One of the things we collectively try to achieve is, IMHO, to think about innovative proposals, and avoid bloating the language with things that could easily be implemented as third-party packages. In this case, I'd like to 'save the core from further complexities' if the aspects deemed interesting from the proposal presented can be implemented with the existing mechanisms. And the "in-place object mutation" is a thing that, for me, looks especially scary in terms of complexity of the runtime. We recently had another lengthy thread - about splitting class declaration with a "forward class declaration" - that after some lengthy discussion was dismissed because it would also require this.
Maybe I'd be happier to see a generic way to implement "super proxies" like these in a less hacky way, and then those could be used to build the deferred objects as in this proposal, rather than this specific implementation. In the example project itself, Lelo, the proxies are used to calculate the object in a subprocess, rather than just delaying their resolution in-thread.
IMO that's a terrible idea.
Go back to e-mail one. I only used the thing as a toy and proof of concept, and presented it as such. It is just that the proxying concept it uses works. A conventionally wrapped "Future" is much more useful for actual work.
A proxy usually has some other purpose for existing; purely transparent proxies are usually useless. Making it easier to make transparent proxies in a generic way isn't going to be any value to anything that doesn't want to be fully transparent.
In my reading of the problem at hand, a "purely transparent proxy" is a nice approach. What one does not want is to compute the expression eagerly. And it does not even need to be "purely transparent", as the __getattribute__ implementation allows for some attribute namespace that will allow one to query or otherwise message the underlying object.
Calculating in a subprocess means that everything needed for that calculation has to be able to be serialized (probably pickled) and sent to the subprocess, and the result likewise. That's very limiting, and where you're okay with that, you probably _aren't_ okay with that sort of thing magically happening with all attribute lookups.
Please - I know you have some message to convey, but you don't have to reply to every sentence I write. In the case presented, of course, after the first lookup was required, the result was waited for, and held as an ordinary internal attribute of the proxy in the same process. Even for a relatively careless toy implementation, that is the obvious thing to do.
I just wrote because it is something I made work before - and if there are indeed uses for it, the language might not even need changes to support it beyond an operator keyword.
Yes, you've done something that is broadly similar to this proposal, but like every idea, has its own set of limitations. It's easy to say "I did something different from what you did, and it doesn't require language support", but your version of the proposal introduces new problems, which is why I responded to them.
Alright - but the only outstanding problem is the "is" and "id" comparison - I am replying still because I have the impression you had not grokked the main point: at some point, sooner or later, for any object in Python, one of the dunder methods _will_ be called (except for identity comparison, if one has it as an "end in itself"). Be it for printing, serializing, or being the target of a unary or binary operator. This path can be hooked to trigger the deferred resolution in the proposal in this thread.
As shown above, not true; dunder methods are not always called if they don't exist, so __getattribute__ cannot always proxy them.
Not sure which part of this mechanism you think I don't understand. For the second time in this message: dunder attributes are not looked up through __getattribute__, I know that, and that is why they have to be present in the proxy class. There is, of course, _another_ problem - that there are code paths, and not a few, that assume that when a dunder attribute _is_ present, the object has that capability. So, having the proxy raise TypeError or return NotImplemented if the proxied object doesn't have a certain dunder is not enough. (That is, one could check for "__len__" and treat the deferred object as a sequence - "__len__" would be in the proxy, but not on the resolved object.) And yes, that is a problem for this approach.
That said, I am not super in favor of it being in the language, and I will leave that for other people to discuss.
So, thank you for your time. Really!
No probs. Always happy to discuss ideas; there's nothing wrong with throwing thoughts out there, as long as you don't mind people disagreeing with you :)
:)
ChrisA
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-leave@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/MYPVXM...
Code of Conduct: http://python.org/psf/codeofconduct/
On Sat, 25 Jun 2022 at 11:56, Joao S. O. Bueno <gwidion@gmail.com> wrote:
On Fri, Jun 24, 2022 at 10:05 PM Chris Angelico <rosuav@gmail.com> wrote:
Hmmm, I think possibly you're misunderstanding the nature of class slots, then. The most important part is that they are looked up on the *class*, not the instance; but there are some other quirks too:
Sorry, no. I know how those work.
>>> class Meta(type):
...     def __getattribute__(self, attr):
...         print("Fetching %s from the metaclass" % attr)
...         return super().__getattribute__(attr)
...
>>> class Demo(metaclass=Meta):
...     def __getattribute__(self, attr):
...         print("Fetching %s from the class" % attr)
...         return super().__getattribute__(attr)
...
>>> x = Demo()
>>> x * 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for *: 'Demo' and 'int'
Neither the metaclass nor the class itself had __getattribute__
Yes - if you go back to my first e-mail on the thread, and the example code, that is why I am saying all along that the proxy has to explicitly define all possible dunder methods.
This is part of the problem: "your first email" is now trying to function as a brand new proposal, yet you're also saying that this is part of the deferred evaluation proposal that David Mertz put forward. Maybe this would have been less confusing if you'd simply proposed it as a stand-alone feature, instead of answering a different post.
No, not in Python. Once you have a proxy class that covers all dunder methods to operate on and return the proxied object, whoever makes use of that proxy will have it working in a transparent way - in any code that doesn't try direct memory access to the proxied object's data. "Lelo" objects from that link can be freely passed around and used - at one point, if the object is not dropped, code has to go through one of the dunder methods - there is no way Python code can do any calculation or output the proxied object without doing so. And that experiment worked fantastically well in doing so, better than I thought it would, and that is the only thing I am trying to say.
So your proxy has to have code for every single dunder method, and any time a new one is devised, it has to be added? That sounds like a maintenance nightmare.
The original proposal, if I'm not mistaken, was that the "deferred thing" really truly would become the resulting object. That requires compiler support, but it makes everything behave sanely: basic identity checks function as you'd expect, there are no bizarre traps with weak references, C-implemented functions don't have to be rewritten to cope with them, etc, etc, etc.
So, as I've written from the first message, this would require deep support in the language, which the proxy approach does not.
Yes, that proposal requires proper language support. But things that require no language support aren't even language proposals, they're just "hey check out this thing that can be done". Of course there's always a way to do it in pure Python right now; the question is really: how limited is the existing version, and how cumbersome is it to write? What you're proposing is VERY cumbersome - you have to enumerate *every single dunder method* and make sure they are perfectly proxied - and limited in a number of ways.
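[Editor's note: a minimal sketch of what "enumerate every single dunder method" looks like in practice. The `make_proxy_class` helper and all names are hypothetical, not from Lelo or any real library.]

```python
# A sketch of the hand-enumeration burden: each operator a lazy proxy
# should support must be explicitly forwarded, because dunder lookups
# bypass __getattribute__ and go straight to the type's slots.

def make_proxy_class(dunders):
    """Build a proxy class forwarding the listed dunder methods."""
    def forward(name):
        def method(self, *args):
            return getattr(self._resolve(), name)(*args)
        return method

    class Proxy:
        def __init__(self, func):
            self._func = func
            self._resolved = False
            self._value = None

        def _resolve(self):
            # Memoize: compute once, reuse afterwards.
            if not self._resolved:
                self._value = self._func()
                self._resolved = True
            return self._value

    for name in dunders:
        setattr(Proxy, name, forward(name))
    return Proxy

# Any dunder left off the list silently fails to proxy -- the
# maintenance problem described above.
LazyInt = make_proxy_class(["__add__", "__str__", "__index__"])
x = LazyInt(lambda: 40 + 2)
assert x + 0 == 42      # __add__ is forwarded
assert str(x) == "42"   # __str__ is forwarded
```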
Your proposal is basically just a memoized lambda function with proxying capabilities. The OP in this thread was talking about deferred expressions. And my proposal was about a different way to do argument defaults. All of these are *different* proposals, they are not just implementations of each other. Trying to force one proposal to be another just doesn't work.
I am not trying to force anything. My idea was to put on the table a way to achieve most of the effects of the initial proposal with less effort. I guess this is "python-ideas" for a reason.
You're posting as a response to a different thread. The obvious implication is that you believe your proposal to be a variant of this one. If that's not the case, and you really just want this to stand alone, start a new thread, and then it'll be obvious.
And the "in-place object mutation" is a thing that, for me, looks especially scary in terms of complexity of the runtime. We recently had another lengthy thread - about splitting class declaration with a "forward class declaration" - that after some lengthy discussion was dismissed because it would also require this.
It does have a lot of scariness, and frankly, I don't think it's worth the cost; but it does at least have something more to offer than just "magic lambda function that calls itself at some point".
In my reading of the problem at hand, a "purely transparent proxy" is a nice approach. What one does not want is to compute the expression eagerly. And it does not even need to be "purely transparent", as the __getattribute__ implementation allows for some attribute namespace that will allow one to query or otherwise message the underlying object.
If it doesn't need to be transparent, why not just use a perfectly normal closure (eg a lambda function)?
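[Editor's note: the "perfectly normal closure" alternative can be spelled with today's tools as an explicit, memoized zero-argument callable; the numbers are reused from the dask example earlier in the thread, and the spelling is a sketch, not part of any proposal.]

```python
from functools import cache

# Explicit deferral: nothing runs until total() is called, and the
# result is cached for later calls.  No transparency is attempted;
# the caller must write total(), not total.
data = [23, 45, 62]
total = cache(lambda: sum(x * 3 + 2**x for x in data))

print(total())  # computed here -> 4611721202807865734
```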
There is, of course, _another_ problem - that there are code paths, and not a few, that assume that when a dunder attribute _is_ present, the object has that capability. So, having the proxy raise TypeError or return NotImplemented if the proxied object doesn't have a certain dunder is not enough. (That is, one could check for "__len__" and treat the deferred object as a sequence - "__len__" would be in the proxy, but not on the resolved object.) And yes, that is a problem for this approach.
Yes. That's part of why it's non-trivial to "just proxy everything". Proxying is not easy, and usually has limitations. ChrisA
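[Editor's note: the capability-probing hazard discussed above - a proxy's __len__ advertising a capability the resolved object lacks - is easy to demonstrate. The Proxy class here is purely illustrative.]

```python
from collections.abc import Sized

class Proxy:
    """Illustrative proxy forwarding __len__ to a deferred result."""
    def __init__(self, func):
        self._func = func

    def __len__(self):
        # Raises TypeError at resolve time if the result has no length.
        return len(self._func())

p = Proxy(lambda: 42)
# Merely *defining* __len__ makes the proxy advertise a capability
# that the eventual object (an int) does not have:
assert isinstance(p, Sized)
assert not isinstance(42, Sized)
```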
I think I have an idea how to do something like what you're asking with less magic, and I think an example implementation of this could actually be done in pure Python code (though a more performant implementation would need support at the C level). What if a deferred object has 1 magic method ( __isdeferred__ ) that is invoked directly rather than causing a thunk, and invocation of any other method does cause a thunk. For the example implementation, a thunk would simply mean that the value is computed and stored within the instance, and method calls on the wrapper are now delegated to that. In the proper implementation, the object would change its identity to become its computed result.
Steve Jorgensen wrote:
I think I have an idea how to do something like what you're asking with less magic, and I think an example implementation of this could actually be done in pure Python code (though a more performant implementation would need support at the C level). What if a deferred object has 1 magic method ( __isdeferred__ ) that is invoked directly rather than causing a thunk, and invocation of any other method does cause a thunk. For the example implementation, a thunk would simply mean that the value is computed and stored within the instance, and method calls on the wrapper are now delegated to that. In the proper implementation, the object would change its identity to become its computed result.
I haven't had any replies to this, but I think it warrants some attention, so I'll try to clarify what I'm suggesting.

Basically, have a deferred object be a wrapper around any kind of callable, and give the wrapper a single method __is_deferred__ that does not trigger unwrapping. Any other method call or anything else that depends on knowing the actual object results in the callable being executed and the wrapper object being replaced by that result. From then on, it is no longer deferred.

I like this idea because it is very easy to reason about and fairly flexible. Whether the deferred object is a closure or not depends entirely on its callable. When it gets unwrapped is easy to understand (basically anything other than assignment, passing as an argument, or asking whether it is deferred).

What this does NOT help much with is using for argument defaults. Personally, I think that's OK. I think that there are good arguments (separately) for dynamic argument defaults and deferred objects and that trying to come up with 1 concept that covers both of those is not necessarily a good idea. It's not a good idea if we can't come up with a way to do it that IS easy to reason about, anyway.
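[Editor's note: a minimal pure-Python sketch of the wrapper described here. The names are hypothetical; this version caches the result inside the wrapper rather than replacing the object's identity, which is the part that would need C-level support, and it would still need explicit dunder forwarding, since dunder lookups bypass __getattr__.]

```python
_UNSET = object()

class Deferred:
    """Wrapper around a callable; computes and caches on first real use."""
    def __init__(self, func):
        self._func = func
        self._value = _UNSET

    def __is_deferred__(self):
        # The one probe that does NOT force evaluation.
        return self._value is _UNSET

    def _force(self):
        if self._value is _UNSET:
            self._value = self._func()
        return self._value

    def __getattr__(self, name):
        # Fires only for attributes missing on the wrapper itself:
        # evaluate the callable, cache it, and delegate to the result.
        return getattr(self._force(), name)

d = Deferred(lambda: 6 * 7)
assert d.__is_deferred__()        # nothing computed yet
assert d.bit_length() == 6        # first real use forces evaluation of 42
assert not d.__is_deferred__()
```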
Thank you for your proposal David. At last we have a counter-proposal to talk about. A few points:

(1) (As I pointed out in an earlier post) There is a flaw in using the syntax of an expression PRECEDED by a SOFT keyword:
    x = later -y
With your proposal, x is assigned a deferred-evaluation-object which will be evaluated at some time later as "minus y", right? Erm, no. This is already legal syntax for x being immediately assigned a value of "later minus y". If you put the soft keyword *after* the expression:
    x = -y later
it may or may not read as well (subjective) but AFAICS would work. Alternatively you could propose a hard keyword. Or a different syntax altogether.

(2) Delayed evaluation may be useful for many purposes. But for the specific purpose of providing late-bound function argument defaults, having to write the extra line ("n = n" in your example) removes much of the appeal. Two lines of boilerplate (using a sentinel) replaced by one obscure one plus one keyword is not much if any of a win, whereas PEP 671 would remove the boilerplate altogether apart from one sigil. Under your proposal, I for one would probably stick with the sentinel idiom, which is explicit. I think "n=n" is confusing to an inexperienced Python user. You may not think this is important. My opinion is that late-bound defaults are important. (We may have to agree to differ.) Apart from anything else: Python fully supports early-bound defaults, why discriminate against late-bound ones?

(3) You talk about "deferred objects" and in one place you actually say "Evaluate the Deferred". A "Deferred" is an important object but a different concept in Twisted; I think calling it something else would be better to avoid confusion.

Best wishes
Rob Cliffe

On 21/06/2022 21:53, David Mertz, Ph.D. wrote:
Here is a very rough draft of an idea I've floated often, but not with much specification. Take this as "ideas" with little firm commitment to details from me. PRs, or issues, or whatever, can go to https://github.com/DavidMertz/peps/blob/master/pep-9999.rst as well as mentioning them in this thread.
PEP: 9999
Title: Generalized deferred computation
Author: David Mertz <dmertz@gnosis.cx>
Discussions-To: https://mail.python.org/archives/list/python-ideas@python.org/thread/
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 21-Jun-2022
Python-Version: 3.12
Post-History:
Abstract
========
This PEP proposes introducing the soft keyword ``later`` to express the concept of deferred computation. When an expression is preceded by the keyword, the expression is not evaluated but rather creates a "thunk" or "deferred object." Reference to the deferred object later in program flow causes the expression to be executed at that point, and for both the value and type of the object to become the result of the evaluated expression.
Motivation
==========
"Lazy" or "deferred" evaluation is a useful paradigm for expressing relationships among potentially expensive operations prior to their actual computation. Many functional programming languages, such as Haskell, build laziness into the heart of their language. Within the Python ecosystem, the popular scientific library `dask-delayed <dask-delayed>`_ provides a framework for lazy evaluation that is very similar to that proposed in this PEP.
.. _dask-delayed: https://docs.dask.org/en/stable/delayed.html
Examples of Use
===============
While deferred computation is principally useful when computations are likely to be expensive, the simple examples shown do not necessarily use such especially spendy computations. Most of these are directly inspired by examples used in the documentation of dask-delayed.
In dask-delayed, ``Delayed`` objects are created by functions, and operations create a *directed acyclic graph* rather than performing actual computations. For example::
>>> import dask
>>> @dask.delayed
... def later(x):
...     return x
...
>>> output = []
>>> data = [23, 45, 62]
>>> for x in data:
...     x = later(x)
...     a = x * 3
...     b = 2**x
...     c = a + b
...     output.append(c)
...
>>> total = sum(output)
>>> total
Delayed('add-8f4018dbf2d3c1d8e6349c3e0509d1a0')
>>> total.compute()
4611721202807865734
>>> total.visualize()
.. figure:: pep-9999-dag.png
   :align: center
   :width: 50%
   :class: invert-in-dark-mode

   Figure 1. Dask DAG created from simple operations.
Under this PEP, the soft keyword ``later`` would work in a similar manner to this dask.delayed code. But rather than requiring calling ``.compute()`` on a ``Delayed`` object to arrive at the result of a computation, every reference to a binding would perform the "compute" *unless* it was itself a deferred expression. So the equivalent code under this PEP would be::
>>> output = []
>>> data = [23, 45, 62]
>>> for later x in data:
...     a = later (x * 3)
...     b = later (2**x)
...     c = later (a + b)
...     output.append(later c)
...
>>> total = later sum(output)
>>> type(total)  # type() does not un-thunk
<class 'DeferredObject'>
>>> if value_needed:
...     print(total)  # Actual computation occurs here
4611721202807865734
In the example, we assume that the built-in function `type()` is special in not counting as a reference to the binding for purpose of realizing a computation. Alternately, some new special function like `isdeferred()` might be used to check for ``Deferred`` objects.
In general, however, every regular reference to a bound object will force a computation and re-binding on a ``Deferred``. This includes access to simple names, but also similarly to instance attributes, index positions in lists or tuples, or any other means by which an object may be referenced.
Rejected Spellings
==================
A number of alternate spellings for creating a ``Deferred`` object are possible. This PEP-author has little preference among them. The words ``defer`` or ``delay``, or their past participles ``deferred`` and ``delayed`` are commonly used in discussions of lazy evaluation. All of these would work equally well as the suggested soft keyword ``later``. The keyword ``lazy`` is not completely implausible, but does not seem to read as well.
No punctuation is immediately obvious for this purpose, although surrounding expressions with backticks is somewhat suggestive of quoting in Lisp, and perhaps slightly reminiscent of the ancient use of backtick for shell commands in Python 1.x. E.g.::
might_use = `math.gcd(a, math.factorial(b))`
Relationship to PEP-0671
========================
The concept of "late-bound function argument defaults" is introduced in :pep:`671`. Under that proposal, a special syntactic marker would be permitted in function signatures with default arguments, allowing the expressions indicated as defaults to be evaluated at call time rather than at function definition time. In current Python, we might write a toy function such as::
def func(items=[], n=None):
    if n is None:
        n = len(items)
    items.append("Hello")
    print(n)
func([1, 2, 3]) # prints: 3
Using the :pep:`671` approach this could be simplified somewhat as::
def func(items=[], n=>len(items)):  # late-bound defaults act as if bound here
    items.append("Hello")
    print(n)
func([1, 2, 3]) # prints: 3
Under the current PEP, evaluation of a ``Deferred`` object only occurs upon reference. That is, for the current toy function, the evaluation would not occur until the ``print(n)`` line::
def func(items=[], n=later len(items)):
    items.append("Hello")
    print(n)
func([1, 2, 3]) # prints: 4
To completely replicate the behavior of PEP-0671, an extra line at the start of the function body would be required::
def func(items=[], n=later len(items)):
    n = n  # Evaluate the Deferred and re-bind the name n
    items.append("Hello")
    print(n)
func([1, 2, 3]) # prints: 3
References
==========
https://github.com/DavidMertz/peps/blob/master/pep-9999.rst
Copyright
=========
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
On Thu, 23 Jun 2022 at 10:44, Rob Cliffe via Python-ideas <python-ideas@python.org> wrote:
Thank you for your proposal David. At last we have a counter-proposal to talk about. A few points:
(1) (As I pointed out in an earlier post) There is a flaw in using the syntax of an expression PRECEDED by a SOFT keyword:
    x = later -y
With your proposal, x is assigned a deferred-evaluation-object which will be evaluated at some time later as "minus y", right? Erm, no. This is already legal syntax for x being immediately assigned a value of "later minus y". If you put the soft keyword *after* the expression:
    x = -y later
it may or may not read as well (subjective) but AFAICS would work. Alternatively you could propose a hard keyword. Or a different syntax altogether.
Or just define that the soft keyword applies only if not followed by an operator. That way, "later -y" would be interpreted the same way it always has, and if you actually want a deferred of y's negation, you'd need to spell it some other way. Although I'm not entirely sure how, since the obvious choice, grouping parentheses, just makes it look like a function call instead, and "later 0-y" might not have the same semantics. ChrisA
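[Editor's note: the parsing ambiguity under discussion can be checked in any current interpreter, since "later" is an ordinary name today.]

```python
# "later" is just a name in current Python, so the proposed spelling
# already has a meaning: binary subtraction.
later = 10
y = 3
x = later -y        # parses as (later) - (y), not a deferred -y
assert x == 7
```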
Thanks Rob, I recognize that I have so far skirted the order-of-precedence concern. I believe I have used parens in my examples everywhere there might be a question... But that's not a general description or rule. I have a bunch of issues that I know I need to flesh out, many coming as suggestions in this thread, which I appreciate. I just wanted to provide something concrete to start the conversation. FWIW, there is a bunch more at the link now than in my initial paste. But I want to clarify more before I copy a new version into the email thread. I haven't used Twisted in a while, but it is certainly an important library, and I don't want to cause confusion. Any specific recommendation on language to use? On Wed, Jun 22, 2022, 8:45 PM Rob Cliffe via Python-ideas <python-ideas@python.org> wrote:
Thank you for your proposal David. At last we have a counter-proposal to talk about. A few points:
(1) (As I pointed out in an earlier post) There is a flaw in using the syntax of an expression PRECEDED by a SOFT keyword: x = later -y With your proposal, x is assigned a deferred-evaluation-object which will be evaluated at some time later as "minus y", right? Erm, no. This is already legal syntax for x being immediately assigned a value of "later minus y". If you put the soft keyword *after* the expression: x = -y later it may or may not read as well (subjective) but AFAICS would work. Alternatively you could propose a hard keyword. Or a different syntax altogether.
(2) Delayed evaluation may be useful for many purposes. But for the specific purpose of providing late-bound function argument defaults, having to write the extra line ("n = n" in your example) removes much of the appeal. Two lines of boilerplate (using a sentinel) replaced by one obscure one plus one keyword is not much if any of a win, whereas PEP 671 would remove the boilerplate altogether apart from one sigil. Under your proposal, I for one would probably stick with the sentinel idiom which is explicit. I think "n=n" is confusing to an inexperienced Python user. You may not think this is important. My opinion is that late-bound defaults are important. (We may have to agree to differ.) Apart from anything else: Python fully supports early-bound defaults, why discriminate against late-bound ones?
(3) You talk about "deferred objects" and in one place you actually say "Evaluate the Deferred". A "deferred" is an important object but a different concept in Twisted, I think calling it something else would be better to avoid confusion.
Best wishes Rob Cliffe
On 21/06/2022 21:53, David Mertz, Ph.D. wrote:
Here is a very rough draft of an idea I've floated often, but not with much specification. Take this as "ideas" with little firm commitment to details from me. PRs, or issues, or whatever, can go to https://github.com/DavidMertz/peps/blob/master/pep-9999.rst as well as mentioning them in this thread.
PEP: 9999
Title: Generalized deferred computation
Author: David Mertz <dmertz@gnosis.cx>
Discussions-To: https://mail.python.org/archives/list/python-ideas@python.org/thread/
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 21-Jun-2022
Python-Version: 3.12
Post-History:
Abstract
========
This PEP proposes introducing the soft keyword ``later`` to express the concept of deferred computation. When an expression is preceded by the keyword, the expression is not evaluated but rather creates a "thunk" or "deferred object." Reference to the deferred object later in program flow causes the expression to be executed at that point, and for both the value and type of the object to become the result of the evaluated expression.
Motivation
==========
"Lazy" or "deferred" evaluation is useful paradigm for expressing relationships among potentially expensive operations prior their actual computation. Many functional programming languages, such as Haskell, build laziness into the heart of their language. Within the Python ecosystem, the popular scientific library `dask-delayed <dask-delayed>`_ provides a framework for lazy evaluation that is very similar to that proposed in this PEP.
.. _dask-delayed: https://docs.dask.org/en/stable/delayed.html
Examples of Use
===============
While deferred computation is principally useful when computations are likely to be expensive, the simple examples shown here do not necessarily involve especially expensive computations. Most of them are directly inspired by examples used in the documentation of dask-delayed.
In dask-delayed, ``Delayed`` objects are created by functions, and operations on them build a *directed acyclic graph* rather than performing actual computations. For example::
    >>> import dask
    >>> @dask.delayed
    ... def later(x):
    ...     return x
    ...
    >>> output = []
    >>> data = [23, 45, 62]
    >>> for x in data:
    ...     x = later(x)
    ...     a = x * 3
    ...     b = 2**x
    ...     c = a + b
    ...     output.append(c)
    ...
    >>> total = sum(output)
    >>> total
    Delayed('add-8f4018dbf2d3c1d8e6349c3e0509d1a0')
    >>> total.compute()
    4611721202807865734
    >>> total.visualize()
.. figure:: pep-9999-dag.png
   :align: center
   :width: 50%
   :class: invert-in-dark-mode
Figure 1. Dask DAG created from simple operations.
Under this PEP, the soft keyword ``later`` would work in a similar manner to this dask.delayed code. But rather than requiring a call to ``.compute()`` on a ``Delayed`` object to arrive at the result of a computation, every reference to a binding would perform the "compute" *unless* it was itself a deferred expression. So the equivalent code under this PEP would be::
    >>> output = []
    >>> data = [23, 45, 62]
    >>> for later x in data:
    ...     a = later (x * 3)
    ...     b = later (2**x)
    ...     c = later (a + b)
    ...     output.append(later c)
    ...
    >>> total = later sum(output)
    >>> type(total)  # type() does not un-thunk
    <class 'DeferredObject'>
    >>> if value_needed:
    ...     print(total)  # Actual computation occurs here
    4611721202807865734
In the example, we assume that the built-in function ``type()`` is special in not counting as a reference to the binding for the purpose of realizing a computation. Alternatively, some new special function like ``isdeferred()`` might be used to check for ``Deferred`` objects.
In general, however, every regular reference to a bound object will force a computation and re-binding on a ``Deferred``. This includes access via simple names, but likewise access via instance attributes, index positions in lists or tuples, or any other means by which an object may be referenced.
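To make the intended semantics concrete, here is a rough sketch in current Python. The ``Deferred`` class and ``force()`` method are illustrative names only; under the proposal the forcing would happen transparently on any ordinary reference, whereas this sketch must force explicitly:

```python
class Deferred:
    """Illustrative sketch only: forcing is explicit here, whereas the
    proposal would force automatically on any ordinary reference."""
    def __init__(self, func):
        self._func = func      # zero-argument callable holding the expression
        self._forced = False
        self._value = None

    def force(self):
        # The first reference evaluates the expression; later references
        # reuse the cached result, mimicking the proposed re-binding.
        if not self._forced:
            self._value = self._func()
            self._forced = True
        return self._value

a = Deferred(lambda: 23 * 3)
b = Deferred(lambda: 2 ** 23)
c = Deferred(lambda: a.force() + b.force())
print(c.force())  # nothing is computed until this line
```

Note that forcing ``c`` transitively forces ``a`` and ``b``, which is how a chain of deferred expressions forms an implicit dependency graph much like the Dask DAG above.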
Rejected Spellings
==================
A number of alternate spellings for creating a ``Deferred`` object are possible. This PEP-author has little preference among them. The words ``defer`` or ``delay``, or their past participles ``deferred`` and ``delayed`` are commonly used in discussions of lazy evaluation. All of these would work equally well as the suggested soft keyword ``later``. The keyword ``lazy`` is not completely implausible, but does not seem to read as well.
No punctuation is immediately obvious for this purpose, although surrounding expressions with backticks is somewhat suggestive of quoting in Lisp, and perhaps slightly reminiscent of the ancient use of backtick for shell commands in Python 1.x. E.g.::
    might_use = `math.gcd(a, math.factorial(b))`
Relationship to PEP-0671
========================
The concept of "late-bound function argument defaults" is introduced in :pep:`671`. Under that proposal, a special syntactic marker would be permitted in function signatures with default arguments, allowing the expressions indicated as defaults to be evaluated at call time rather than at function definition time. In current Python, we might write a toy function such as::
    def func(items=[], n=None):
        if n is None:
            n = len(items)
        items.append("Hello")
        print(n)
func([1, 2, 3]) # prints: 3
Using the :pep:`671` approach this could be simplified somewhat as::
    def func(items=[], n=>len(items)):  # late-bound defaults act as if bound here
        items.append("Hello")
        print(n)
func([1, 2, 3]) # prints: 3
Under the current PEP, evaluation of a ``Deferred`` object only occurs upon reference. That is, for the current toy function, the evaluation would not occur until the ``print(n)`` line::
    def func(items=[], n=later len(items)):
        items.append("Hello")
        print(n)
func([1, 2, 3]) # prints: 4
To completely replicate the behavior of PEP-0671, an extra line at the start of the function body would be required::
    def func(items=[], n=later len(items)):
        n = n  # Evaluate the Deferred and re-bind the name n
        items.append("Hello")
        print(n)
func([1, 2, 3]) # prints: 3
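The difference in timing between the two proposals can be emulated in current Python, with a zero-argument lambda standing in for ``later`` (a sketch of the semantics, not the proposed syntax):

```python
def func_pep671(items=None, n=None):
    # PEP 671 style: the default is computed at call time,
    # before the function body mutates items.
    if items is None:
        items = []
    if n is None:
        n = len(items)
    items.append("Hello")
    return n

def func_deferred(items=None, n=None):
    # This PEP's style: the default is a thunk, forced only at the
    # point where n is actually referenced -- after the append.
    if items is None:
        items = []
    if n is None:
        n = lambda: len(items)
    items.append("Hello")
    return n() if callable(n) else n

print(func_pep671([1, 2, 3]))    # 3: length taken before append
print(func_deferred([1, 2, 3]))  # 4: length taken after append
```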
References
==========
https://github.com/DavidMertz/peps/blob/master/pep-9999.rst
Copyright
=========
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
_______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-leave@python.orghttps://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/DQTR3C... Code of Conduct: http://python.org/psf/codeofconduct/
On 2022-06-21 13:53, David Mertz, Ph.D. wrote:
Here is a very rough draft of an idea I've floated often, but not with much specification. Take this as "ideas" with little firm commitment to details from me. PRs, or issues, or whatever, can go to https://github.com/DavidMertz/peps/blob/master/pep-9999.rst as well as mentioning them in this thread.
After looking at this a bit more (with the newer revisions) and following the discussion, I think this proposal doesn't really achieve what I would want from deferred evaluation. That may be because what I want is unreasonable, but, well, such is life. :-)

First, it is not clear to me what the real point of this type of deferred evaluation is. The PEP has a "motivation" section that makes a link to Haskell and Dask, but as far as I can see it doesn't explicitly say what is gained by introducing this new form of lazy evaluation into Python.

In particular (as I think someone else mentioned on this thread), Dask-style deferred computations are based on explicitly evaluating the thunk, whereas this proposal would automatically evaluate it on reference. I think that in practice this would make many Dask-style usages unwieldy because you would have to keep repeating the `later` keyword in order to gradually build up a complex deferred computation over multiple statements. For such cases it is more natural to explicitly evaluate the whole thing at the end, rather than explicitly not evaluate it until then.

In theory there could be performance gains, as mentioned in the PEP. But again I don't see a huge advantage to this in Python. It might make sense in Haskell where laziness is built into the language at a fundamental level. But in Python, where eager evaluation is the norm, it again seems more natural to me to use "explicit laziness" (i.e., explicit rather than automatic evaluation). It seems rather unusual to have cases where some variable or function argument might contain either a computationally cheap expression or an expensive one; usually for those types of applications you know where you might do something expensive. And even if you don't, I see little downside to requiring an explicit "eval this thunk" step at the end.
In contrast, what I would want out of deferred evaluation is precisely the ability to evaluate the deferred expression in the *evaluating* scope (not the definition scope) --- or in a custom provided namespace. Whether this evaluation is implicit or explicit is less important to me than the ability to control the scope in which it occurs. As others mentioned in early posts on this thread, this could complicate things too much to be feasible, but without it I don't really see the point.

The reason this is key for me is that I'm focused on a different set of motivating use cases. What I'm interested in is "query-type" situations where you want to pass an expression to some sort of "query engine", which will evaluate the expression in a namespace representing the dataset to be queried. One example would be SQL queries, where it would be nice to be able to do things like:

    my_sql_table.select(where=thunk (column1 == 2 and column2 > 5))

Likewise this would make pandas indexing less verbose, turning it from:

    df[(df.column1 == 2) & (df.column2 > 5)]

to:

    df[(column1 == 2) & (column2 > 3)]

or even potentially:

    df[column1 == 2 and column2 > 3]

. . . because the evaluator would have control over the evaluation and could provide a namespace in which `column1` and `column2` do not evaluate directly to numpy-like arrays (for which `and` doesn't work), but to some kind of combinable query object which converts the `and` into something that will work with numpy-like elementwise comparison.

In other words, the point here is not performance gains or even laziness, but simply the ability to use ordinary Python expression syntax (not, say, a string) to create an unevaluated chunk which can be passed to some other code which then gets to control its evaluation scope, rather than having that scope locked to where it was defined. Because of this, it is probably okay with me if explicit unwrapping of the thunk is required.
You know when you are writing a query handler and so you know that what you want is an unevaluated query expression; you don't need to have an argument whose value might either be an unevaluated expression or a fully-evaluated result.

This would also mean that such deferred objects could handle the late-bound default case, but the function would have to "commit" to explicit evaluation of such defaults. Probably there could be a no-op "unwrapping" operation that would work on non-deferred objects (so that `unwrap([])` or whatever would just evaluate to the same regular list you passed in), so you could still pass in a plain list to an argument whose default was `deferred []`, but the function would still have to explicitly evaluate it in its body. Again, I think I'm okay with this, partly because (as I mentioned in the other thread) I don't see PEP 671-style late-bound defaults as a particularly pressing need.

There are definitely some holes in my idea. For one thing, with explicit evaluation required, it is much closer to a regular lambda. The only real difference is that it would involve more flexible scope control (rather than unalterably closing over the defining scope). For another, because it is not lazy, it is closer to being achievable with existing mechanisms, like requiring all "field" references in the query to be specified as attributes on some base object (which is indeed how most SQL ORMs and pandas-like data structures do it currently). Other people might not be as annoyed with these existing solutions as I am. :-)

There is also the question of whether it would unacceptably slow down name references because functions would no longer know which variables were local; I think I would be okay with saying that the thunk could not mutate the enclosing namespace (so, e.g., walruses inside the thunk would only affect an internal thunk namespace).
The point here is for the consumer to *evaluate* the thunk and get the result, not inline it into the surrounding code.

My idea is much more half-baked than David's proto-PEP so this isn't really worthy of being called an alternative proposal right now. But I wanted to mention these ideas here to at least handwave about what to me the gain would be from deferred evaluation, as I'm coming at it from a somewhat different angle than the proto-PEP. I have a suspicion that the response will be a combination of disgust and deafening silence but that's life.

--
Brendan Barnwell
"Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
On Sun, 26 Jun 2022 at 04:41, Brendan Barnwell <brenbarn@brenbarn.net> wrote:
In contrast, what I would want out of deferred evaluation is precisely the ability to evaluate the deferred expression in the *evaluating* scope (not the definition scope) --- or in a custom provided namespace. Whether this evaluation is implicit or explicit is less important to me than the ability to control the scope in which it occurs. As others mentioned in early posts on this thread, this could complicate things too much to be feasible, but without it I don't really see the point.
A custom-provided namespace can already be partly achieved, but working in the evaluating scope is currently impossible and would require some major deoptimizations to become possible.
>>> expr = lambda: x + y
>>> expr.__code__.co_code
b't\x00t\x01\x17\x00S\x00'
>>> ns = {"x": 3, "y": 7}
>>> eval(expr.__code__, ns)
10
This works because the code object doesn't have any locals, so the name references are encoded as global lookups, and eval() is happy to use arbitrary globals. I say "partly achieved" because this won't work if there are any accidental closure variables - you can't isolate the lambda function from its original context and force everything to be a global:
>>> def f(x):
...     return lambda: x + y
...
>>> expr = f(42)
>>> eval(expr.__code__, ns)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: code object passed to eval() may not contain free variables
The mere fact that there's a local variable 'x' means that you can't compile the expression 'x + y'. So maybe there'd need to be some weird trick with class namespaces, but I'm really not sure what would be worth doing. But evaluating in the caller's namespace is not going to work without some fairly major reworking. At the very least, you'd have to forbid any form of assignment (including assignment expressions), and it would force every surrounding variable to become a nonlocal (no fast locals any more). I don't know what other costs there'd be, and whether it'd even be possible, but if it is, it would certainly be a massive deoptimization to all code, just to permit the possibility that something gets evaluated in this context.
The reason this is key for me is that I'm focused on a different set of motivating use cases. What I'm interested in is "query-type" situations where you want to pass an expression to some sort of "query engine", which will evaluate the expression in a namespace representing the dataset to be queried. One example would be SQL queries, where it would be nice to be able to do things like:
my_sql_table.select(where=thunk (column1 == 2 and column2 > 5))
Likewise this would make pandas indexing less verbose, turning it from:
df[(df.column1 == 2) & (df.column2 > 5)]
to:
df[(column1 == 2) & (column2 > 3)]
So far, so good. In fact, aside from the "accidental closure variable" problem, these could currently be done with a lambda function.
or even potentially:
df[column1 == 2 and column2 > 3]
. . . because the evaluator would have control over the evaluation and could provide a namespace in which `column1` and `column2` do not evaluate directly to numpy-like arrays (for which `and` doesn't work), but to some kind of combinable query object which converts the `and` into something that will work with numpy-like elementwise comparison.
Converting "and" isn't possible, nor should it ever be. But depending on how the lookup is done, it might be possible to actually reevaluate for every row (or maybe that'd be just hopelessly inefficient on numpy's end).
This would also mean that such deferred objects could handle the late-bound default case, but the function would have to "commit" to explicit evaluation of such defaults. Probably there could be a no-op "unwrapping" operation that would work on non-deferred objects (so that `unwrap([])` or whatever would just evaluate to the same regular list you passed in), so you could still pass in a plain list a to an argument whose default was `deferred []`, but the function would still have to explicitly evaluate it in its body. Again, I think I'm okay with this, partly because (as I mentioned in the other thread) I don't see PEP 671-style late-bound defaults as a particularly pressing need.
That seems all very well, but it does incur a fairly huge cost for a relatively simple benefit. Consider:

    def f(x=defer [], n=defer len(x)):
        unwrap(x); unwrap(n)
        print("You gave me", n, "elements to work with")

    f(defer (print := lambda *x: None))

Is it correct for every late-bound argument default to also be a code injection opportunity? And if so, then why should other functions *not* have such an opportunity afforded to them? I mean, if we're going to have spooky action at a distance, we may as well commit to it. Okay, I jest, but still - giving callers the ability to put arbitrary code into the function is going to be FAR harder to reason about than simply having the code in the function header.
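The injection concern can be demonstrated in today's Python, with a zero-argument lambda standing in for ``defer`` and a hypothetical ``unwrap`` helper that forces thunks and passes other values through: whatever the caller wraps gets executed inside the callee's body.

```python
def unwrap(obj):
    # Hypothetical no-op unwrapping: force thunks, pass plain values through.
    return obj() if callable(obj) else obj

def f(x=None, n=None):
    # Emulated deferred defaults: [] for x, len(x) for n.
    x = unwrap(x) if x is not None else []
    n = unwrap(n) if n is not None else len(x)
    return f"You gave me {n} elements to work with"

ran = []
# The "default" slot now executes caller-supplied code inside f's body:
result = f(lambda: ran.append("side effect") or [1, 2])
print(result)  # You gave me 2 elements to work with
print(ran)     # ['side effect']
```

The side effect runs at the point of ``unwrap`` inside ``f``, not at the call site, which is exactly the spooky-action property being objected to.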
There are definitely some holes in my idea. For one thing, with explicit evaluation required, it is much closer to a regular lambda. The only real difference is that it would involve more flexible scope control (rather than unalterably closing over the defining scope).
TBH I think that that's quite useful, just not for PEP 671. For query languages, it'd be very handy to be able to have a keyword that says "isolate the parsing of this". I could imagine this being useful for function annotations too, although they've been special-cased somewhat, so that might be less of a concern.
There is also the question of whether it would unacceptably slow down name references because functions would no longer know which variables were local; I think I would be okay with saying that the thunk could not mutate the enclosing namespace (so, e.g., walruses inside the thunk would only affect an internal thunk namespace). The point here is for the consumer to *evaluate* the thunk and get the result, not inline it into the surrounding code.
Yep; but the trouble is that referring to a name can also incur a cost, especially when it comes to closures. So I think the explicit namespace is going to be far safer than "evaluate in the caller's context".

That said: you can and should be able to prepopulate the evaluation namespace with whatever you like, so using locals() as a "seed" dictionary would basically give you what you want - a non-assignable namespace that has all of these locals available for reference.

ChrisA
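[A concrete sketch of the locals()-as-seed idea, for no-closure lambdas only; ``run_thunk`` is a hypothetical helper. Name reads resolve against the supplied namespace, but since it is a copy, nothing in the caller's real scope can be mutated:]

```python
def run_thunk(thunk, namespace):
    # Evaluate a no-closure lambda's code object against a copied
    # namespace: reads resolve there, writes cannot escape to the caller.
    return eval(thunk.__code__, dict(namespace))

expr = lambda: a * b            # 'a' and 'b' are free names, compiled as globals
print(run_thunk(expr, {"a": 6, "b": 7}))   # 42
print(run_thunk(expr, {"a": 2, "b": 10}))  # 20
```

In real use the caller would pass ``locals()`` (or any dict) as the seed, which is exactly the "non-assignable namespace" behavior described above.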
On 2022-06-25 13:41, Chris Angelico wrote:
On Sun, 26 Jun 2022 at 04:41, Brendan Barnwell <brenbarn@brenbarn.net> wrote:
In contrast, what I would want out of deferred evaluation is precisely the ability to evaluate the deferred expression in the *evaluating* scope (not the definition scope) --- or in a custom provided namespace. Whether this evaluation is implicit or explicit is less important to me than the ability to control the scope in which it occurs. As others mentioned in early posts on this thread, this could complicate things too much to be feasible, but without it I don't really see the point.
A custom-provided namespace can already be partly achieved, but working in the evaluating scope is currently impossible and would require some major deoptimizations to become possible.
>>> expr = lambda: x + y
>>> expr.__code__.co_code
b't\x00t\x01\x17\x00S\x00'
>>> ns = {"x": 3, "y": 7}
>>> eval(expr.__code__, ns)
10
This works because the code object doesn't have any locals, so the name references are encoded as global lookups, and eval() is happy to use arbitrary globals. I say "partly achieved" because this won't work if there are any accidental closure variables - you can't isolate the lambda function from its original context and force everything to be a global:
Yes, that is the blocker. It is an important blocker for the query use case, because if you're building a query involving variables called `length` and `width` and so on, the code building this query and/or working with the results may often have its own variables with the same names. So it needs to be possible to create a fully independent namespace that does not care what names happened to be defined in the surrounding scope.

Another complicating factor (which I didn't mention in my earlier post) is that you actually sometimes might want to explicitly pass through (that is, close over) variables in the enclosing scope. For instance you might want to make a query like `column1 == threshold` where `threshold` is a variable in the definition scope, whose value you want to "freeze" at that moment as part of the deferred query expression. This would require some way to mark which values are to be frozen in this way (as pandas DataFrame.query does with "@"), which could get a bit hairy.
This would also mean that such deferred objects could handle the late-bound default case, but the function would have to "commit" to explicit evaluation of such defaults. Probably there could be a no-op "unwrapping" operation that would work on non-deferred objects (so that `unwrap([])` or whatever would just evaluate to the same regular list you passed in), so you could still pass in a plain list a to an argument whose default was `deferred []`, but the function would still have to explicitly evaluate it in its body. Again, I think I'm okay with this, partly because (as I mentioned in the other thread) I don't see PEP 671-style late-bound defaults as a particularly pressing need.
That seems all very well, but it does incur a fairly huge cost for a relatively simple benefit. Consider:
def f(x=defer [], n=defer len(x)):
    unwrap(x); unwrap(n)
    print("You gave me", n, "elements to work with")
f(defer (print := lambda *x: None))
Is it correct for every late-bound argument default to also be a code injection opportunity? And if so, then why should other functions *not* have such an opportunity afforded to them? I mean, if we're going to have spooky action at a distance, we may as well commit to it. Okay, I jest, but still - giving callers the ability to put arbitrary code into the function is going to be FAR harder to reason about than simply having the code in the function header.
As I said, I don't really care so much about whether the deferred object has the ability to modify the scope in which it's evaluated. The important part is that it has to be able to *read* that scope, in a way that doesn't depend in an implicit, non-configurable way on what variables happened to exist in the defining scope. In other words it really is very similar to what a lambda currently is, but with more fine-grained control over which variables are bound in which namespaces (definition vs. eval).

I'm not talking about "putting arbitrary code in the function" in the sense of inlining into the eval scope. In fact, one of the things I dislike about PEP 671 is that it does exactly this with the late-bound defaults. I find it even more egregious in that case for extra reasons, but yeah, spooky action at a distance is not the goal here.
There are definitely some holes in my idea. For one thing, with explicit evaluation required, it is much closer to a regular lambda. The only real difference is that it would involve more flexible scope control (rather than unalterably closing over the defining scope).
TBH I think that that's quite useful, just not for PEP 671. For query languages, it'd be very handy to be able to have a keyword that says "isolate the parsing of this". I could imagine this being useful for function annotations too, although they've been special-cased somewhat, so that might be less of a concern.
Right, that's the point of this. In fact there's a part of me that wants something even crazier, like making the deferred object retain info about its AST, so that the eval-ing code could manipulate that if needed. R uses this kind of thing to do some pretty crazy stuff. Perhaps too crazy, which is why only part of me wants this. But it can be pretty powerful.
There is also the question of whether it would unacceptably slow down name references because functions would no longer know which variables were local; I think I would be okay with saying that the thunk could not mutate the enclosing namespace (so, e.g., walruses inside the thunk would only affect an internal thunk namespace). The point here is for the consumer to *evaluate* the thunk and get the result, not inline it into the surrounding code.
Yep; but the trouble is that referring to a name can also incur a cost, especially when it comes to closures. So I think the explicit namespace is going to be far safer than "evaluate in the caller's context".
That said: you can and should be able to prepopulate the evaluation namespace with whatever you like, so using locals() as a "seed" dictionary would basically give you what you want - a non-assignable namespace that has all of these locals available for reference.
It sounds like what you're saying is that the hard part is referring to the "real" evaluation namespace, but it would be easy to refer to a copy of that namespace. That again would probably be okay with me. Like if you could do `eval(unevaluated_expression)` and it auto-filled the namespace with `locals()` (i.e., a read-only copy of the eval-ing namespace) that would be cool.

As I mentioned before, the point is not for the deferred expression to be inlined into the eval-ing namespace; the point is for the programmer to be able to choose at will which names in the deferred expression will have their values taken from the eval-ing namespace (as opposed to the defining namespace).

--
Brendan Barnwell
"Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
On Sun, 26 Jun 2022 at 16:18, Brendan Barnwell <brenbarn@brenbarn.net> wrote:
On 2022-06-25 13:41, Chris Angelico wrote:
On Sun, 26 Jun 2022 at 04:41, Brendan Barnwell <brenbarn@brenbarn.net> wrote:
In contrast, what I would want out of deferred evaluation is precisely the ability to evaluate the deferred expression in the *evaluating* scope (not the definition scope) --- or in a custom provided namespace. Whether this evaluation is implicit or explicit is less important to me than the ability to control the scope in which it occurs. As others mentioned in early posts on this thread, this could complicate things too much to be feasible, but without it I don't really see the point.
A custom-provided namespace can already be partly achieved, but working in the evaluating scope is currently impossible and would require some major deoptimizations to become possible.
>>> expr = lambda: x + y
>>> expr.__code__.co_code
b't\x00t\x01\x17\x00S\x00'
>>> ns = {"x": 3, "y": 7}
>>> eval(expr.__code__, ns)
10
This works because the code object doesn't have any locals, so the name references are encoded as global lookups, and eval() is happy to use arbitrary globals. I say "partly achieved" because this won't work if there are any accidental closure variables - you can't isolate the lambda function from its original context and force everything to be a global:
Yes, that is the blocker. It is an important blocker for the query use case, because if you're building a query involving variables called `length` and `width` and so on, the code building this query and/or working with the results may often have its own variables with the same names. So it needs to be possible to create a fully independent namespace that does not care what names happened to be defined in the surrounding scope.
Another complicating factor (which I didn't mention in my earlier post) is that you actually sometimes might want to explicitly pass through (that is, close over) variables in the enclosing scope. For instance you might want to make a query like `column1 == threshold` where `threshold` is a variable in the definition scope, whose value you want to "freeze" at that moment as part of the deferred query expression. This would require some way to mark which values are to be frozen in this way (as pandas DataFrame.query does with "@"), which could get a bit hairy.
Hmm, that gets a bit messy, since it's entirely possible to want both namespaces (closed-over names and free names) at the same time. There's not going to be an easy fix. It might be safest to reject all closures completely, but then have the ability to define a separate set of constants that will be available in the expression. This might be getting outside the scope (pun intended) of a language proposal, but maybe there could be a general thing of "give me this expression as a code object, NO closures", and then build something on top of that to capture specific values.
This would also mean that such deferred objects could handle the late-bound default case, but the function would have to "commit" to explicit evaluation of such defaults. Probably there could be a no-op "unwrapping" operation that would work on non-deferred objects (so that `unwrap([])` or whatever would just evaluate to the same regular list you passed in), so you could still pass in a plain list a to an argument whose default was `deferred []`, but the function would still have to explicitly evaluate it in its body. Again, I think I'm okay with this, partly because (as I mentioned in the other thread) I don't see PEP 671-style late-bound defaults as a particularly pressing need.
That seems all very well, but it does incur a fairly huge cost for a relatively simple benefit. Consider:
def f(x=defer [], n=defer len(x)):
    unwrap(x); unwrap(n)
    print("You gave me", n, "elements to work with")
f(defer (print := lambda *x: None))
Is it correct for every late-bound argument default to also be a code injection opportunity? And if so, then why should other functions *not* have such an opportunity afforded to them? I mean, if we're going to have spooky action at a distance, we may as well commit to it. Okay, I jest, but still - giving callers the ability to put arbitrary code into the function is going to be FAR harder to reason about than simply having the code in the function header.
As I said, I don't really care so much about whether the deferred object has the ability to modify the scope in which it's evaluated. The important part is that it has to be able to *read* that scope, in a way that doesn't depend in an implicit, non-configurable way on what variables happened to exist in the defining scope.
Reading that scope is probably fairly doable, but it's easiest to have a variant of exec() then.
In other words it really is very similar to what a lambda currently is, but with more fine-grained control over which variables are bound in which namespaces (definition vs. eval). I'm not talking about "putting arbitrary code in the function" in the sense of inlining into the eval scope. In fact, one of the things I dislike about PEP 671 is that it does exactly this with the late-bound defaults. I find it even more egregious in that case for extra reasons, but yeah, spooky action at a distance is not the goal here.
PEP 671 doesn't put arbitrary code into the function. Only the function itself can define what gets executed in it. It just transforms this:

    def f(x=NOT_PROVIDED):
        if x was not provided:
            x = EXPR

into this:

    def f(x=>EXPR):

Either way, there's no "arbitrary code" being added to the function. The function signature is as much a part of the function as the body is. The problem starts happening when deferred expressions have to be provided from outside the function, such as:

    _default = later EXPR
    def f(x):
        if x was not provided:
            x = _default
        x = unlater x

which is the semantics of other argument defaults, and allows a passed-in argument to inject code.
There are definitely some holes in my idea. For one thing, with explicit evaluation required, it is much closer to a regular lambda. The only real difference is that it would involve more flexible scope control (rather than unalterably closing over the defining scope).
TBH I think that that's quite useful, just not for PEP 671. For query languages, it'd be very handy to be able to have a keyword that says "isolate the parsing of this". I could imagine this being useful for function annotations too, although they've been special-cased somewhat, so that might be less of a concern.
Right, that's the point of this. In fact there's a part of me that wants something even crazier, like making the deferred object retain info about its AST, so that the eval-ing code could manipulate that if needed. R uses this kind of thing to do some pretty crazy stuff. Perhaps too crazy, which is why only part of me wants this. But it can be pretty powerful.
TBH I wouldn't be averse to having some sort of syntax that takes executable code and yields the AST. Trouble is, it would need really really good syntax, otherwise it'll be simpler and safer to just use compile() and provide the code as a string.
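For comparison, the string-plus-compile() route available today (a minimal sketch):

```python
import ast

src = "x + y"
# Parsing gives us the AST, which the evaluating code could inspect
# or rewrite before compiling it.
tree = ast.parse(src, mode="eval")
code = compile(tree, "<deferred>", "eval")
print(eval(code, {"x": 3, "y": 7}))  # 10
```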
Yep; but the trouble is that referring to a name can also incur a cost, especially when it comes to closures. So I think the explicit namespace is going to be far safer than "evaluate in the caller's context".
That said: you can and should be able to prepopulate the evaluation namespace with whatever you like, so using locals() as a "seed" dictionary would basically give you what you want - a non-assignable namespace that has all of these locals available for reference.
It sounds like what you're saying is that the hard part is referring to the "real" evaluation namespace, but it would be easy to refer to a copy of that namespace. That again would probably be okay with me.
There are a few things that are hard, and it's entirely possible that you don't need any of them.

1) Closures need to capture names.
>>> def f():
...     x = 1
...     def g():
...         print(locals())
...     return g
...
>>> f()()
{}
There's nothing inside g() that says that the name x is important, so when f() returns, it's disposed of. My guess? This won't be a problem, and the semantics of locals() will be fine.

2) As mentioned, mutations. Again, the semantics of locals() are probably fine for your needs, although if you want to guarantee that it ignores mutations, locals().copy() will ensure that.

3) Class namespaces are unusual. Nested class namespaces can be weird and surprising if you don't think about them carefully.
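As an illustration of point 3 (this is well-defined behaviour today, just surprising):

```python
x = "global"

class C:
    x = "class"
    # The comprehension body runs in its own scope, which skips the
    # class namespace entirely when looking up x.
    found = [x for _ in range(1)]

print(C.found)  # ['global'], not ['class']
```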
Like if you could do `eval(unevaluated_expression)` and it auto-filled the namespace with `locals()` (i.e., a read-only copy of the eval-ing namespace), that would be cool. As I mentioned before, the point is not for the deferred expression to be inlined into the eval-ing namespace; the point is for the programmer to be able to choose at will which names in the deferred expression will have their values taken from the eval-ing namespace (as opposed to the defining namespace).
You can just use "eval(code_object, locals())" for that. Or locals().copy() if you want the safety. The key is getting the right code object, and I don't know of a way to do that without either (a) starting from a string, or (b) making an unwanted closure. But a variant of the lambda keyword could provide exactly that. ChrisA
On Sun, 26 Jun 2022 at 08:17, Brendan Barnwell <brenbarn@brenbarn.net> wrote:
In other words it really is very similar to what a lambda currently is, but with more fine-grained control over which variables are bound in which namespaces (definition vs. eval).
But lambdas already have fine-grained control: just pass the eval namespace as a lambda argument. You might want a lambda AST, but that's a separate issue.
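That is, something like (a trivial sketch):

```python
# The "deferred" expression takes its namespace explicitly,
# so the caller decides where the names come from.
expr = lambda ns: ns["x"] + ns["y"]

print(expr({"x": 3, "y": 7}))  # 10
print(expr(dict(x=40, y=2)))   # 42
```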
On Sat, 25 Jun 2022 at 13:44, Chris Angelico (<rosuav@gmail.com>) wrote:
On Sun, 26 Jun 2022 at 04:41, Brendan Barnwell <brenbarn@brenbarn.net> wrote:
In contrast, what I would want out of deferred evaluation is precisely the ability to evaluate the deferred expression in the *evaluating* scope (not the definition scope) --- or in a custom provided namespace. Whether this evaluation is implicit or explicit is less important to me than the ability to control the scope in which it occurs. As others mentioned in early posts on this thread, this could complicate things too much to be feasible, but without it I don't really see the point.
A custom-provided namespace can already be partly achieved, but working in the evaluating scope is currently impossible and would require some major deoptimizations to become possible.
>>> expr = lambda: x + y
>>> expr.__code__.co_code
b't\x00t\x01\x17\x00S\x00'
>>> ns = {"x": 3, "y": 7}
>>> eval(expr.__code__, ns)
10
This works because the code object doesn't have any locals, so the name references are encoded as global lookups, and eval() is happy to use arbitrary globals. I say "partly achieved" because this won't work if there are any accidental closure variables - you can't isolate the lambda function from its original context and force everything to be a global:
>>> def f(x):
...     return lambda: x + y
...
>>> expr = f(42)
>>> eval(expr.__code__, ns)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: code object passed to eval() may not contain free variables
The mere fact that there's a local variable 'x' means that you can't compile the expression 'x + y'. So maybe there'd need to be some weird trick with class namespaces, but I'm really not sure what would be worth doing.
Note that in Python 3.11 exec (but not eval) gains the ability to pass an explicit closure:
>>> import types
>>> def f(x):
...     return lambda: print(x + y)
...
>>> l = f(1)
>>> exec(l.__code__, {"y": 3}, closure=(types.CellType(2),))
5
This doesn't quite solve the problem being discussed here, but it may help. This was added in https://github.com/python/cpython/pull/92204. We didn't add it to eval() because there was no use case at the time, but it would be easy to add the same support to eval() too.
On Sun, 26 Jun 2022 at 23:20, Jelle Zijlstra <jelle.zijlstra@gmail.com> wrote:
This was added in https://github.com/python/cpython/pull/92204. We didn't add it to eval() because there was no use case at the time, but it would be easy to add the same support to eval() too.
Interesting. That would take a bit of extra work (preprocess the dictionary by checking the function object for its variable names, then lifting those out into the separate tuple), but it could be done. ChrisA
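A sketch of that preprocessing; `call_in_ns` is a hypothetical helper name, and it rebuilds the function via types.FunctionType rather than calling exec(), so it also works before 3.11:

```python
import types

def call_in_ns(func, ns):
    """Call func's code against ns, lifting its free variables
    out of ns into an explicit closure tuple."""
    code = func.__code__
    cells = tuple(types.CellType(ns.pop(name)) for name in code.co_freevars)
    return types.FunctionType(code, ns, closure=cells)()

def f(x):
    return lambda: x + y

# x becomes a closure cell; y is looked up in the supplied globals.
print(call_in_ns(f(1), {"x": 2, "y": 3}))  # 5
```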
participants (18)
- Barry
- Barry Scott
- Ben Rudiak-Gould
- Brendan Barnwell
- Carl Meyer
- Chris Angelico
- David Mertz, Ph.D.
- Eric V. Smith
- Jelle Zijlstra
- Joao S. O. Bueno
- Joao S. O. Bueno
- Martin Di Paola
- Neil Girdhar
- Paul Moore
- Piotr Duda
- Rob Cliffe
- Stephen J. Turnbull
- Steve Jorgensen