[Python-ideas] PEP draft - Composable futures for reactive programming
Guido van Rossum
guido at python.org
Sat Dec 28 04:30:41 CET 2013
Hi Sergii,
I'm trying to give some constructive criticism here, please bear with me.
Perhaps the biggest issue is that the unification between
cooperative and threaded Futures still feels uncomfortable to me. A
symptom is the completely different semantics of result() -- when the
result isn't ready yet, this either raises an exception or blocks the
current thread, and that makes reasoning about what will happen difficult.
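To make the result() asymmetry concrete, here is a minimal sketch using the standard-library classes as they exist today, not the classes from the draft PEP. The names threaded, probe, threaded_outcome and asyncio_outcome are illustrative only, and the short timeout is there just so the blocking call eventually returns.

```python
import asyncio
import concurrent.futures

# Threaded future: result() blocks the calling thread until a result
# arrives. Nothing ever completes this future, so the call waits for
# the full timeout and then raises TimeoutError.
threaded = concurrent.futures.Future()
try:
    threaded.result(timeout=0.01)
    threaded_outcome = "returned"
except concurrent.futures.TimeoutError:
    threaded_outcome = "blocked until timeout"


async def probe():
    # asyncio future: result() never blocks. If no result has been
    # set yet, it raises InvalidStateError immediately.
    fut = asyncio.get_running_loop().create_future()
    try:
        fut.result()
        return "returned"
    except asyncio.InvalidStateError:
        return "raised InvalidStateError"


asyncio_outcome = asyncio.run(probe())
```

The same method name therefore means "wait here" in one class and "fail now" in the other, which is the reasoning problem described above.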
A lesser issue is naming -- I read some earlier example code you
posted, and I couldn't understand it, because the names for the new
operations you added are pretty arbitrary. Especially grating is your
reusing some well-known names of built-in Python functions for
different purposes, the worst offender being map(), but all() isn't so
great either. In general your FutureBaseExt class (also an awkward
name IMO) introduces a bunch of new functions with a wide variety of
functionality that seems to have little logic to it. Why this set of
functions and not another?
A separate question is why the distinction between FutureBase and FutureBaseExt.
It seems you copied some phrases from the asyncio docs or PEP 3156 --
e.g. add_done_callback() references call_soon(); this seems incorrect
for threaded Futures.
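The difference in callback scheduling can be seen with the existing standard-library futures. This is only a sketch of current behavior, assuming nothing about the draft's classes; the list name order and the function main are made up for the illustration.

```python
import asyncio
import concurrent.futures

order = []

# Threaded future: callbacks run synchronously inside set_result(),
# in whichever thread completes the future.
tf = concurrent.futures.Future()
tf.add_done_callback(lambda f: order.append("threaded callback"))
tf.set_result(1)
order.append("after threaded set_result")


async def main():
    # asyncio future: set_result() only schedules the callback with
    # call_soon(); it runs on a later pass through the event loop.
    af = asyncio.get_running_loop().create_future()
    af.add_done_callback(lambda f: order.append("asyncio callback"))
    af.set_result(1)
    order.append("after asyncio set_result")
    await asyncio.sleep(0)  # yield to the loop so the callback can run


asyncio.run(main())
```

With threads there is no event loop to defer to, so a description written in terms of call_soon() cannot apply to both kinds of Future.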
The definition of an Executor seems incomplete (SynchronousExecutor is
referenced but not defined), and very vague -- I don't believe that
making it just a callable suffices for the functionality. There is
also a mention of global configuration of a default executor by
assigning to config.Default.CALLBACK_EXECUTOR, which seems a bad idea
-- I'm sure a lot of code will in practice depend on the choice of executor.
Another issue: why the try_* methods?
Finally, I'm not sure I am convinced by your motivation section. Or,
at least, I'd like you to address how your proposal addresses each of
the bullets in your motivation, with some examples.
(I may have more, but at the current rate it would take me a day per
paragraph, so I'll get to more later.)
On Tue, Dec 24, 2013 at 8:56 AM, Sergii Mikhtoniuk <mikhtonyuk at gmail.com> wrote:
> Thanks everyone for your feedback.
> Taking all your suggestions into account I have revised my proposal.
> In short, it’s now:
> - defines separate Future classes for cooperative and multithreaded cases in
> concurrent.futures package
> - multithreaded implementation adds thread-safety to basic implementation,
> so in cooperative concurrency case there is absolutely no overhead
> - cooperative future’s interface is identical to asyncio.Future
> - asyncio.Future inherits from concurrent.futures.cooperative.Future adding
> only methods specific to `yield from`
> - adds common composition methods for futures (intended to replace and
> enhance asyncio.wait/gather and concurrent.futures.wait)
> There’s still some work to be done for backward compatibility of
> concurrent.futures.Future, but implementation is almost ready.
> Would really appreciate if you could take a look.
> Thanks,
> Sergii
> On Mon, Dec 23, 2013 at 12:42 AM, Guido van Rossum <guido at python.org> wrote:
>> Aha. That is clever. I will have to look into the details more, but the
>> idea is promising. Sorry I didn't see this before.
>> On Dec 22, 2013 12:30 PM, "Ben Darnell" <ben at bendarnell.com> wrote:
>>> On Sat, Dec 21, 2013 at 10:45 PM, Guido van Rossum <guido at python.org>
>>> wrote:
>>>> There's still the issue that in the threading version, you wait for a
>>>> Future by blocking the current thread, while in the asyncio version,
>>>> you must use "yield from" to block. For interoperability you would
>>>> have to refrain from *any* blocking operations (including "yield
>>>> from") so you would only be able to use callbacks. But whether you had
>>>> to write "x = f.result()" or "x = concurrent.futures.wait_for(f)",
>>>> either way you'd implicitly be blocking the current thread.
>>> Threaded *consumers* of Futures wait for them by blocking, while
>>> asynchronous consumers wait for them by yielding. It doesn't matter whether
>>> the *producer* of the Future is threaded or asynchronous (except that if you
>>> know you won't be using threads you can use a faster thread-unsafe Future
>>> implementation).
>>> -Ben
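The producer/consumer point above can be sketched with today's standard library: one kind of producer (a thread pool) and two kinds of consumer. asyncio.wrap_future() is the existing bridge from a concurrent.futures.Future to an awaitable; the names produce, consume, blocking_value and awaited_value are hypothetical, and "await" is used as the modern spelling of "yield from".

```python
import asyncio
import concurrent.futures


def produce():
    # Stand-in for real work done by a threaded producer.
    return 42


with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
    # Threaded consumer: block the current thread on the result.
    blocking_value = pool.submit(produce).result()

    async def consume():
        # Asynchronous consumer: wrap the thread-pool future so the
        # coroutine can wait for it without blocking the event loop.
        return await asyncio.wrap_future(pool.submit(produce))

    awaited_value = asyncio.run(consume())
```

Both consumers receive the same value; only the way they wait differs.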
>>>> Yes, a clever scheduler could run other callbacks while blocking, but
>>>> that's not a complete solution, because another callback might do a
>>>> similar blocking operation, and whatever that waited for could hold up
>>>> the earlier blocking operation, causing an ever-deeper recursion
>>>> of event loop invocations that might never complete. (I've had to
>>>> debug this in production code.) To cut through that you'd have to have
>>>> some kind of stack-swapping coroutine implementation like gevent, or a
>>>> syntactic preprocessor that inserts yield or yield-from operations
>>>> (I've heard from people who do this), or you'd need a clairvoyant
>>>> scheduler that would know which callbacks won't block.
>>>> I like the C# solution, but it depends on static typing so a compiler
>>>> can know when to emit the coroutine interactions. That wouldn't work
>>>> in Python, unless you made the compiler recognize the wait_for()
>>>> operation by name, which feels unsavory (although we do it for super()
>>>> :-).
>>>> I guess for extreme interop, callbacks that never block is your only
>>>> option anyway, but I'd be sad if we had to recommend this as the
>>>> preferred paradigm, or claim that it is all you need.
>>>> --Guido (if I don't respond to this thread for the next two weeks,
>>>> it's because I'm on vacation :-)
>>>> On Sat, Dec 21, 2013 at 5:37 PM, Ben Darnell <ben at bendarnell.com> wrote:
>>>> > On Sat, Dec 21, 2013 at 7:26 PM, Guido van Rossum <guido at python.org>
>>>> > wrote:
>>>> >>
>>>> >> On Sat, Dec 21, 2013 at 2:48 PM, Sergii Mikhtoniuk
>>>> >> <mikhtonyuk at gmail.com> wrote:
>>>> >> > Indeed there is a lot of overlap between the asyncio and
>>>> >> > concurrent.futures packages, so it would be very interesting to
>>>> >> > hear your overall thoughts on the role and future of the concurrent
>>>> >> > package. Do you consider it rudimentary and completely replaceable
>>>> >> > by asyncio.Future?
>>>> >>
>>>> >> They don't really compare. concurrent.futures is about *threads*.
>>>> >> asyncio.Future is about *avoiding* threads in favor of more
>>>> >> lightweight "tasks" and "coroutines", which in turn are built on top
>>>> >> of lower-level callbacks.
>>>> >
>>>> >
>>>> > concurrent.futures.ThreadPoolExecutor is about threads; the Future
>>>> > class itself is broader. When I integrated Futures into Tornado I
>>>> > used concurrent.futures.Future directly (when available).
>>>> > asyncio.Future is just an optimized version of c.f.Future (the
>>>> > optimization comes from assuming single-threaded usage). There
>>>> > should at least be a common ABC between them.
>>>> >
>>>> >>
>>>> >>
>>>> >> (While I want to get away from callbacks as a programming paradigm,
>>>> >> asyncio uses them at the lower levels both because they are a logical
>>>> >> low-level building block and for interoperability with other
>>>> >> frameworks like Tornado and Twisted.)
>>>> >>
>>>> >> > I think the question is not so much which implementation you
>>>> >> > prefer, but rather whether we want futures as a stand-alone
>>>> >> > package. Do you see all these implementations converging in future?
>>>> >>
>>>> >> I see them as not even competing. They use different paradigms and
>>>> >> apply to different use cases.
>>>> >>
>>>> >> > To me Future is a very simple and self-contained primitive,
>>>> >> > independent of thread pools, processes, IO, and networking.
>>>> >>
>>>> >> (Agreed on the I/O and networking part only.)
>>>> >>
>>>> >> > One thing the concurrent.futures package does a very good job at
>>>> >> > is defining an isolated namespace for futures, stressing this
>>>> >> > clear boundary (let’s disregard here the ThreadPoolExecutor and
>>>> >> > ProcessPoolExecutor classes, which for some reason ended up in it
>>>> >> > too).
>>>> >>
>>>> >> Actually the executor API is an important and integral part of that
>>>> >> package, and threads underlie everything you can do with its Futures.
>>>> >>
>>>> >> > So when I think of scheduler and event loop implementations I see
>>>> >> > them as ones that build on top of the Future primitive, not
>>>> >> > providing it as part of their implementation. What I think is
>>>> >> > important is that having a unified Future class simplifies
>>>> >> > interoperability between different kinds of schedulers, not
>>>> >> > necessarily associated with asyncio event loops (process pools
>>>> >> > for example).
>>>> >>
>>>> >> The interoperability is completely missing. I can't tell if you've
>>>> >> used asyncio at all, but the important operation of *waiting* for a
>>>> >> result is fundamentally different there than in concurrent.futures.
>>>> >> In the latter, you just write "x = f.result()" and your thread
>>>> >> blocks until the result is available. In asyncio, you have to write
>>>> >> "x = yield from f", which is a coroutine block that lets other
>>>> >> tasks run in the same thread. ("yield from" in this case is how
>>>> >> Python spells the operation that C# calls "await").
>>>> >
>>>> >
>>>> > The way I see it, the fundamental operation on Futures is
>>>> > add_done_callback. We then have various higher-level operations that
>>>> > let us get away from using callbacks directly. One of these happens
>>>> > to be a method on Future: the blocking mode of Future.result().
>>>> > Another is implemented in asyncio and Tornado, in the ability to
>>>> > "yield" a Future. The blocking mode of result() is just a
>>>> > convenience; if asyncio-style futures had existed first then we
>>>> > could instead have a function like
>>>> > "x = concurrent.futures.wait_for(f)". In fact, you could write this
>>>> > wait_for function today in a way that works for both concurrent and
>>>> > asyncio futures.
>>>> >
>>>> > This is already interoperable: Tornado's Resolver interface
>>>> >
>>>> > (https://github.com/facebook/tornado/blob/master/tornado/netutil.py#L184)
>>>> > returns a Future, which may be generated by a ThreadPoolExecutor or
>>>> > an asynchronous wrapper around pycares or twisted. In the other
>>>> > direction I've worked on hybrid apps that have one Tornado thread
>>>> > alongside a bunch of Django threads; in these apps it would work to
>>>> > have a Django thread block on f.result() for a Future returned by
>>>> > Tornado's AsyncHTTPClient.
>>>> >
>>>> > -Ben
>>>> >
>>>> >>
>>>> >> > Futures are definitely not the final solution to the concurrency
>>>> >> > problem but rather a well-established utility for representing
>>>> >> > async operations. It is the job of higher-layer systems to provide
>>>> >> > more convenient ways of dealing with async operations (yield from
>>>> >> > coroutines, async/await rewrites etc.),
>>>> >>
>>>> >> But unless you are proposing some kind of radical change to add
>>>> >> compile-time type checking/inference to Python, the rewrite option is
>>>> >> unavailable in Python.
>>>> >>
>>>> >> > so I would not
>>>> >> > say that futures encourage callback-style programming,
>>>> >>
>>>> >> The concurrent.futures.Future class does not. But unless I misread
>>>> >> your proposal, your extensions do.
>>>> >>
>>>> >> > it’s simply a
>>>> >> > lower-layer functionality. On the contrary, monadic
>>>> >>
>>>> >> (Say that word one more time and everyone tunes out. :-)
>>>> >>
>>>> >> > methods for futures composition (e.g. map(), all(), first() etc.)
>>>> >> > ensure that no errors would be lost in the process, so I think
>>>> >> > they would complement the yield from model quite nicely by hiding
>>>> >> > the complexity of state maintenance from the user and reducing the
>>>> >> > number of back-and-forth communications between event loop and
>>>> >> > coroutines.
>>>> >>
>>>> >> I'm not sure I follow. Again, I'm not sure if you've actually written
>>>> >> any code using asyncio.
>>>> >>
>>>> >> TBH I've written a fair number of example programs for asyncio and
>>>> >> I've very rarely felt the need for these composition functions. The
>>>> >> main composition primitive I tend to use is "yield from".
>>>> >>
>>>> >> > Besides Futures, reactive programming
>>>> >>
>>>> >> What *is* reactive programming? If you're talking about
>>>> >> http://en.wikipedia.org/wiki/Reactive_programming,
>>>> >> I'm not sure that it maps well to Python.
>>>> >>
>>>> >> > has more utilities to offer, such as Observables (representing
>>>> >> > asynchronous streams of values). It is also a very useful
>>>> >> > abstraction with a rich set of composition strategies (merging,
>>>> >> > concatenation, grouping), and may deserve its place in a separate
>>>> >> > package.
>>>> >>
>>>> >> It all sounds very abstract and academic. :-)
>>>> >>
>>>> >> > Hope to hear back from you to get a better picture of the overall
>>>> >> > design direction here before jumping to implementation details.
>>>> >> >
>>>> >> > Brief follow-up to your questions:
>>>> >> > - Idea behind the Future/Promise separation is to draw a clean
>>>> >> > line between client-facing and scheduler-side APIs respectively.
>>>> >> > Futures should not expose any completion methods, which clients
>>>> >> > should not call anyway.
>>>> >>
>>>> >> Given that Future and Promise are often used synonymously, using them
>>>> >> to make this distinction sounds confusing. I agree that Futures have
>>>> >> two different APIs, one for the consumer and another for the
>>>> >> producer. But I'm not sure that it's necessary to separate them more
>>>> >> strictly -- convention seems good enough to me here. (It's the same
>>>> >> with many communication primitives, like queues and even threads.)
>>>> >>
>>>> >> (The one thing that trips people up frequently is that, while
>>>> >> set_result() and set_exception() are producer APIs, cancel() is a
>>>> >> consumer API, and the cancellation signal travels from the consumer
>>>> >> to the producer.)
>>>> >>
>>>> >> > - Completely agree with you that Twisted-style callbacks are evil
>>>> >> > and it is better to have a single code path for getting the result
>>>> >> > or raising an exception
>>>> >> > - Sorry for bad grammar in the proposal, it’s an early draft
>>>> >> > written at 3 AM, so I will definitely improve on it if we decide
>>>> >> > to move forward.
>>>> >>
>>>> >> No problem!
>>>> >>
>>>> >> > Thanks,
>>>> >> > Sergii
>>>> >>
>>>> >> --
>>>> >> --Guido van Rossum (python.org/~guido)
>>>> >
>>>> >
>>>> --
>>>> --Guido van Rossum (python.org/~guido)
--Guido van Rossum (python.org/~guido)
