[Python-ideas] Thunks (lazy evaluation) [was Re: Delay evaluation of annotations]

אלעזר elazarg at gmail.com
Mon Sep 26 12:04:06 EDT 2016


You already know I want this for contracts etc. Here are some things that I
consider important:

1. There should be some way to bind the names to function parameters, as in

    @contract
    def invert(x: `x != 0`) -> float: return 1 / x

    @contract
    def invertdiff(x: int, y: `x != y`) -> float: return 1 / (x-y)

2. For this and other reasons, the AST should be available. I think it can
be a single AST per place in code, but it should be immutable.

3. Backticks are problematic because they cannot be nested. I suggest
(name: <expression>) or ('name': expression). This name can be googled.

    def compose(f: `such_that: pure(f)`,
                       g: `such_that: pure(g)`):
         return lambda x: f(g(x))

4. I think it's a bad idea to use thunks as a DSL (with different semantics
than standard expressions), except in annotations and for specification
purposes.
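
As a rough illustration of point 1, today's string annotations can stand in
for the proposed backtick thunks. The sketch below is an assumption about how
`@contract` might behave, not an existing library: it compiles each string
annotation and evaluates it with the parameter names bound to the arguments
of the call. The `contract` decorator and the choice of `ValueError` are
hypothetical.

```python
import functools
import inspect


def contract(func):
    """Treat string annotations as predicates over the bound arguments."""
    sig = inspect.signature(func)
    # Only string annotations are checked; they stand in for thunks here.
    checks = {
        name: (param.annotation, compile(param.annotation, "<contract>", "eval"))
        for name, param in sig.parameters.items()
        if isinstance(param.annotation, str)
    }

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        for name, (source, code) in checks.items():
            # Parameter names resolve to the actual arguments of this call.
            if not eval(code, func.__globals__, dict(bound.arguments)):
                raise ValueError(f"contract on {name!r} violated: {source}")
        return func(*args, **kwargs)

    return wrapper


@contract
def invertdiff(x: int, y: "x != y") -> float:
    return 1 / (x - y)
```

Here `invertdiff(3, 1)` runs normally, while `invertdiff(2, 2)` is rejected
before the division is attempted.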

In short, I want this thing. But only for annotations, assertions, and
possibly default arguments as an ad-hoc fix.

Elazar

On Mon, Sep 26, 2016 at 5:05 PM Joseph Jevnik <joejev at gmail.com> wrote:

> Hello everyone, this idea looks like something I have tried building
> already: https://github.com/llllllllll/lazy_python. This project
> implements a `thunk` class which builds up a deferred computation which is
> evaluated only when needed. One use case I have had for this project is
> building up a larger expression so that it may be simplified and then
> computed concurrently with dask:
> http://daisy-python.readthedocs.io/en/latest/. By building up a larger
> expression (and making the tree accessible) users have the ability to
> remove common subexpressions or remove intermediate objects. In numpy
> chained expressions often make lots of allocations which are quickly thrown
> away which is why projects like numexpr (https://github.com/pydata/numexpr)
> can be such a serious speed up. These intermediates are required because
> the whole expression isn't known at the start so it must be evaluated as
> written.
>
> Things to consider about when to evaluate:
>
> 1. Functions which branch on their input need to know which branch to
> select.
> 2. Iteration is really hard to defer in a way that is efficient.
> lazy_python just eagerly evaluates at iteration time but builds thunks in
> the body.
> 3. Stateful operations like IO which normally have an implied order of
> operation now need some explicit ordering.
>
> Regarding the `Py_TYPE` change: I don't think that is correct unless we
> made a thunk have the same binary representation as the underlying object.
> A lot of code does a type check and then calls macros that act on the
> actual type like `PyTuple_GET_ITEM` so we cannot fool C functions very
> easily.
>
> On Mon, Sep 26, 2016 at 9:27 AM, Sjoerd Job Postmus <
> sjoerdjob at sjoerdjob.com> wrote:
>
>> On Mon, Sep 26, 2016 at 10:46:57PM +1000, Steven D'Aprano wrote:
>> > Let's talk about lazy evaluation in a broader sense than just function
>> > annotations.
>> >
>> > If we had syntax for lazy annotation -- let's call them thunks, after
>> > Algol's thunks -- then we could use them in annotations as well as
>> > elsewhere. But if we special case annotations only, the Zen has
>> > something to say about special cases.
>> >
>> >
>> > On Mon, Sep 26, 2016 at 02:57:36PM +1000, Nick Coghlan wrote:
>> > [...]
>> > > OK, that does indeed make more sense, and significantly reduces the
>> > > scope for potential runtime compatibility breaks related to
>> > > __annotations__ access. Instead, it changes the discussion to focus on
>> > > the following main challenges:
>> > >
>> > > - the inconsistency introduced between annotations (lazily evaluated)
>> > > and default arguments (eagerly evaluated)
>> > > - the remaining compatibility breaks (depending on implementation
>> > > details)
>> > > - the runtime overhead of lazy evaluation
>> > > - the debugging challenges of lazy evaluation
>> >
>> >
>> > Default arguments are a good use-case for thunks. One of the most common
>> > gotchas in Python is early binding of function defaults:
>> >
>> > def func(arg=[]):
>> >     ...
>> >
>> > Nine times out of ten, that's probably not what you want. Now, to avoid
>> > all doubt, I do not want to change function defaults to late binding.
>> > I've argued repeatedly on comp.lang.python and elsewhere that if a
>> > language only offers one of early binding or late binding, it should
>> > offer early binding as Python does. The reason is, given early binding,
>> > it is trivial to simulate something like late binding:
>> >
>> > def func(arg=None):
>> >     if arg is None:
>> >         arg = []
>> >     ...
>> >
>> > but given late binding, it is ugly and inconvenient to get a poor
>> > substitute for early binding when that's what you want. So, please,
>> > let's not have a debate over the behaviour of function defaults.
>> >
>> > But what if we could have both? Suppose we use backticks `...` to make a
>> > thunk, then we could write:
>> >
>> > def func(arg=`[]`):
>> >     ...
>> >
>> > to get the late binding result wanted.
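
For comparison, late-bound defaults can be approximated in current Python
without new syntax. This is only a sketch: `Late` and `late_defaults` are
hypothetical names, and a lambda stands in for the backtick thunk.

```python
import functools
import inspect


class Late:
    """Hypothetical marker wrapping a zero-argument callable used as a default."""

    def __init__(self, factory):
        self.factory = factory


def late_defaults(func):
    """Re-evaluate every Late default on each call that omits the argument."""
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        # Copy the items so values can be replaced while looping.
        for name, value in list(bound.arguments.items()):
            if isinstance(value, Late):
                bound.arguments[name] = value.factory()
        return func(*bound.args, **bound.kwargs)

    return wrapper


@late_defaults
def func(arg=Late(lambda: [])):
    arg.append(1)
    return arg
```

Each call to `func()` gets a fresh list, so the mutation does not leak
between calls as it would with `arg=[]`.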
>> >
>> > Are there other uses for thunks? Potentially, they could be used for
>> > Ruby-like code blocks:
>> >
>> > result = function(arg1, arg2, block=```# triple backticks
>> >     do_this()
>> >     do_that()
>> >     while condition:
>> >        do_something_else()
>> >     print('Done')
>> >     ```,
>> >     another_arg=1)
>> >
>> >
>> > but then I'm not really sure what advantage code blocks have over
>> > functions.
>> >
>> >
>> > > The inconsistency argument is simply that people will be even more
>> > > confused than they are today if default arguments are evaluated at
>> > > definition time while annotations aren't. There is a lot of code out
>> > > there that actively relies on eager evaluation of default arguments,
>> > > so changing that is out of the question, which then provides a strong
>> > > consistency argument in favour of keeping annotations eagerly
>> > > evaluated as well.
>> >
>> > Indeed. There are (to my knowledge) only two places where Python
>> > delays evaluation of code:
>> >
>> > - functions (def statements and lambda expressions);
>> > - generator expressions;
>> >
>> > where the second can be considered to be syntactic sugar for a generator
>> > function (def with yield). Have I missed anything?
>> >
>> > In the same way that Haskell is fundamentally built on lazy evaluation,
>> > Python is fundamentally built on eager evaluation, and I don't think we
>> > should change that.
>> >
>> > Until now, the only way to delay the evaluation of code (other than the
>> > body of a function, of course) is to write it as a string, then pass it
>> > to eval/exec. Thunks offer an alternative for delayed evaluation that
>> > makes it easier for editors to apply syntax highlighting: don't apply it
>> > to ordinary strings, but do apply it to thunks.
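
A minimal sketch of the string-based approach described above, using only
the built-in `compile` and `eval`. The expression and the names `numerator`
and `denominator` are made up for illustration.

```python
# Compiling checks the syntax but does not evaluate the expression.
delayed = compile("numerator / denominator", "<delayed>", "eval")

# The names are only looked up when eval() finally runs the code object.
namespace = {"numerator": 1, "denominator": 4}
result = eval(delayed, namespace)
```

To an editor the source is an ordinary string, which is the syntax
highlighting problem that thunks would avoid.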
>> >
>> > I must admit that I've loved the concept of thunks for years now, but
>> > I'm still looking for the killer use-case for them, the one clear
>> > justification for why Python should include them.
>> >
>> > - Late-bound function default arguments? Nice to have, but we already
>> > have a perfectly serviceable way to get the equivalent behaviour.
>> >
>> > - Code blocks? Maybe a Ruby programmer can explain why they're so
>> > important, but we have functions, including lambda.
>> >
>> > - Function annotations? I'm not convinced thunks are needed or desirable
>> > for annotations.
>> >
>> > - A better way to write code intended for delayed execution? Sounds
>> > interesting, but not critical.
>> >
>> > Maybe somebody else can think of the elusive killer use-case for thunks,
>> > because I've been pondering this question for many years now and I'm no
>> > closer to an answer.
>>
>> Well, there's a use-case I have been pondering for a long while now
>> which could be satisfied by this: enumerated generator displays.
>>
>> So suppose you have a composite boolean value, composed by the 'and' of
>> many conditions (which all take long to compute), and you want to
>> short-circuit. Let's take the following example.
>>
>>     valid = True
>>     valid &= looks_like_emailaddress(username)
>>     valid &= more_than_8_characters(password)
>>     valid &= does_not_exist_in_database(username)
>>     valid &= domain_name_of_emailaddress_has_mx_record(username)
>>     ... some more options ...
>>
>> (I forgot the exact use-case, but I still remember the functionality I
>> wanted, so bear with me).
>>
>> Of course, the above is not short-circuiting, so it would be replaced by
>>
>>     def check_valid(username, password):
>>         if not looks_like_emailaddress(username): return False
>>         if not more_than_8_characters(password): return False
>>         if not does_not_exist_in_database(username): return False
>>         if not domain_name_of_emailaddress_has_mx_record(username):
>>             return False
>>         ...
>>         return True
>>
>>
>>     valid = check_valid(username, password)
>>
>> or
>>
>>     valid = True\
>>         and looks_like_emailaddress(username)\
>>         and more_than_8_characters(password)\
>>         and does_not_exist_in_database(username)\
>>         and domain_name_of_emailaddress_has_mx_record(username)
>>
>> But in all reality, I want to write something like:
>>
>>     valid = all(@@@
>>         looks_like_emailaddress(username),
>>         more_than_8_characters(password),
>>         does_not_exist_in_database(username),
>>         domain_name_of_emailaddress_has_mx_record(username),
>>     @@@)
>>
>> With `@@@` designating the beginning/ending of the enumerated generator
>> display.
>>
>> Now, this is currently not possible, but if we had some kind of thunk
>> syntax that would become possible, without needing an enumerated
>> generator display.
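
Without thunk syntax, the closest spelling today wraps each condition in a
lambda. In this sketch `all_lazy` and `validate` are hypothetical helpers,
and the three checks are simplified stand-ins that record when they run.

```python
calls = []  # records which checks actually ran


def looks_like_emailaddress(username):
    calls.append("email")
    return "@" in username


def more_than_8_characters(password):
    calls.append("length")
    return len(password) > 8


def does_not_exist_in_database(username):
    calls.append("database")  # stand-in for an expensive lookup
    return True


def all_lazy(*checks):
    """Call each zero-argument check in order; stop at the first falsy result."""
    return all(check() for check in checks)


def validate(username, password):
    return all_lazy(
        lambda: looks_like_emailaddress(username),
        lambda: more_than_8_characters(password),
        lambda: does_not_exist_in_database(username),
    )


valid = validate("user@example.com", "short")
```

The password check fails, so the database stand-in is never called. The cost
is a `lambda:` in front of every condition, which is the noise a thunk
syntax would remove.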
>>
>> However the problem I see with the concept of `thunk` is: When does it
>> get un-thunked? In which of the following cases?
>>
>> 1. When getting an attribute on it?
>> 2. When calling it? --> See 1. with `__call__`.
>> 3. When subindexing it? --> See 1. with `__getitem__`.
>> 4. When assigning it to a name? It shouldn't have to be un-thunked, I
>>    think.
>> 5. When adding it to a list? No un-thunking should be necessary, I
>>    think.
>>
>> However, the problem with thunks is (I think) that to make that happen
>> either
>>
>> - *all* objects need to include yet another level of redirection,
>> or
>> - a thunk needs to get allocated the maximum size of the value it could
>>   possibly store. (But a `unicode` object could have an arbitrary size)
>> or
>> - there needs to be some way to 'notify' objects holding the thunk that
>>   its value got updated.  For a dict/list/tuple this could readily grow
>>   into O(n) behaviour when un-thunking a thunk.
>> or
>> - any C-level functionality needs to learn how to deal with thunks. For
>>   instance, `Py_TYPE` would have to *resolve* the thunk, and then return
>>   the type of the value.
>> or
>> - I'm running out of ideas here, but maybe creating a custom type object
>>   for each thunk that does pass-through to a wrapped item? Thunked
>>   objects would work *exactly* the same as normal objects, but at a
>>   (small) indirection for any action taken. Still, somehow `Py_TYPE` and
>>   `Py_SIZE` and any other macros would still have to force evaluation.
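
The pass-through idea in the last bullet can be approximated in pure Python,
which also shows where it leaks. The `Thunk` class below is a hypothetical
sketch, not the lazy_python implementation: it forces evaluation on attribute
access, calls and subscripting, but not when it is assigned or stored.

```python
class Thunk:
    """Hypothetical proxy that evaluates a zero-argument callable on first use."""

    _UNSET = object()

    def __init__(self, compute):
        self._compute = compute
        self._value = Thunk._UNSET

    def force(self):
        # Evaluate at most once and cache the result.
        if self._value is Thunk._UNSET:
            self._value = self._compute()
            self._compute = None
        return self._value

    def __getattr__(self, name):
        # Only reached for attributes the proxy itself does not define.
        return getattr(self.force(), name)

    def __call__(self, *args, **kwargs):
        return self.force()(*args, **kwargs)

    def __getitem__(self, key):
        return self.force()[key]


evaluated = []


def expensive():
    evaluated.append("ran")
    return (10, 20, 30)


t = Thunk(expensive)
items = [t]                       # storing the thunk does not force it
forced_after_store = list(evaluated)
first = t[0]                      # subscripting forces evaluation
count = t.count(10)               # attribute access reaches tuple.count
```

`type(t)` still reports `Thunk`, and implicit special-method lookups such as
`len(t)` bypass `__getattr__`, so code that checks the type first is not
fooled by the proxy.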
>>
>> Kind regards,
>> Sjoerd Job
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/