[Python-ideas] Async API: some code to review

Mon Oct 29 20:58:35 CET 2012

On Mon, Oct 29, 2012 at 8:43 PM, Guido van Rossum <guido at python.org> wrote:
> On Mon, Oct 29, 2012 at 11:08 AM, Giampaolo Rodolà <g.rodola at gmail.com> wrote:
>> 2012/10/29 Guido van Rossum <guido at python.org>
>>>
>>> I'm most interested in feedback on the design of polling.py and
>>> scheduling.py, and to a lesser extent on the design of sockets.py;
>>> main.py is just an example of how this style works out in practice.
>>
>> Follows my comments.
>>
>> === About polling.py ===
>>
>> 1 - I think DelayedCall should have a reset() method, other than just cancel().
>
> So, essentially an uncancel()? Why not just re-register in that case?
> Or what's your use case? (Right now there's no problem in calling one
> of these many times -- it's just that cancellation is permanent.)
>
>> 2 - EventLoopMixin should have a call_every() method other than just
>> call_later()
>
> Arguably you can emulate that with a simple loop:
>
> def call_every(secs, func, *args):
>     while True:
>         yield from scheduler.sleep(secs)
>         func(*args)
>
> (Flavor to taste to log exceptions, handle cancellation, automatically
> spawn a separate task, etc.)
>
> I can build lots of other useful things out of call_soon() and
> call_later() -- but I do need at least those two as "axioms".
>
>> 3 - call_later() and call_every() should also take **kwargs other than
>> just *args
>
> I just replied to that in a previous message; there's also a comment
> in the code. How important is this really? Are there lots of use cases
> that require you to pass keyword args? If it's only on occasion you
> can use a lambda. (The *args is a compromise so we don't need a lambda
> to wrap every callback. But I want to reserve keyword args for future
> extensions to the registration functions.)

Well, using keyword-only arguments for passing flags can be good point.
I can live with *args only. Maybe using **kwargs for call_later family
only is good compromise?
Really I don't care on add_reader/add_writer, that functions intended
to library writers.
call_later and call_soon can be used in user code often enough and
passing keyword arguments can be convenient.

>
>> 4 - I think PollsterBase should provide a method to modify() the
>> events registered for a certain fd (both poll() and epoll() have such
>> a method and it's faster compared to un/registering a fd).
>
> Did you see the concrete implementations? Those where this matters
> implicitly uses modify() if the required flags change. I can imagine
> more optimizations of the implementations (e.g. delaying
> register()/modify() calls until poll() is actually called, to avoid
> unnecessary churn) without making the API more complex.
>
>> Feel free to take a look at my scheduler implementation which looks
>> quite similar to what you've done in polling.py:
>> http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py?spec=svn1115&r=1115#85
>
> Thanks, I had seen it previously, I think this also proves that
> there's nothing particularly earth-shattering about this design. :-)
> I'd love to copy some more of your tricks, e.g. the occasional
> re-heapifying. (What usage pattern is this dealing with exactly?) I
> should also check that I've taken care of all the various flags and
> other details (I recall being quite surprised that with poll(), on
> some platforms I need to check for POLLHUP but not on others).
>
>> === About sockets.py ===
>>
>> 1 - In SocketTransport it seems there's no error handling provisioned
>> for send() and recv().
>> You should expect these errors
>> http://hg.python.org/cpython/file/95931c48a76f/Lib/asyncore.py#l60
>> signaling disconnection plus EWOULDBLOCK and EAGAIN for "retry"
>
> Right, I know have been naive about these and have already got a TODO note.
>
>> 2 - SslTransport's send() and recv() methods should suffer the same problem.
>
> Ditto, Antoine told me.
>
>> 3 - I don't fully understand how data transfer works exactly but keep
>> in mind that the transport should interact with the pollster.
>> What I mean is that generally speaking a connected socket should
>> *always* be readable ("r"), even when it's idle, then switch to "rw"
>> events when sending data, then get back to "r" when all the data has
>> been sent.
>> This is *crucial* if you want to achieve high performances/scalability
>> and that is why PollsterBase should probably provide a modify()
>> method.
>> Please take a look at what I've done here:
>> http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py?spec=svn1115&r=1115#809
>
> Hm. I am not convinced that managing this explicitly from the
> transport is the right solution (note that my transports are quite
> different from those in Twisted). But I'll keep this in mind -- I
> would like to set up a benchmark suite at some point. I will probably
> have to implement the server side of HTTP for that purpose, so I can
> point e.g. ab at my app.
>
>> ===  Other considerations ===
>>
>> This 'yield' / 'yield from' approach is new to me (I'm more of a
>> "callback guy") so I can't say I fully understand what's going on just
>> by reading the code.
>
> Fair enough. You should probably start by reading Greg Ewing's
> tutorial -- it's short and sweet:
> http://www.cosc.canterbury.ac.nz/greg.ewing/python/tasks/SimpleScheduler.html
>
>> What I would like to see instead of main.py is a bunch of code samples
>> / demos showing how this library is supposed to be used in different
>> circumstances.
>
> Agreed, more examples are needed.
>
>> In details I'd like to see at least:
>>
>> 1 - a client example (connect(), send() a string, recv() a response, close())
>
> Hm, that's all in urlfetch().
>
>> 2 - an echo server example (accept(), recv() string,  send() it back(), close()
>
> Yes, that's missing.
>
>> 3 - how to use a different transport (e.g. UDP)?
>
> I haven't looked into this yet. I expect I'll have to write a
> different SocketTransport for this (the existing transports are
> implicitly stream-oriented) but I know that the scheduler and
> eventloop implementation can handle this fine.
>
>> 4 - how to run long running tasks in a thread?
>
> That's implemented. Check out call_in_thread(). Note that you can pass
> it an alternate threadpool (executor).
>
>> Also:
>>
>> 5 - is it possible to use multiple "reactors" in different threads?
>
> Should be possible.
>
>> How?  (asyncore for example achieves this by providing a separate
>> 'map' argument for both the 'reactor' and the dispatchers)
>
> It works by making the Context class use thread-local storage (TLS).
>
>> I understand you just started with this so I'm probably asking too
>> much at this point in time.
>> Feel free to consider this a kind of a "long term review".
>
> You have asked many useful questions already. Since you have
> implemented a real-world I/O loop yourself, your input is extremely
> valuable. Thanks, and keep at it!
>
> --
> --Guido van Rossum (python.org/~guido)
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas

-- 
Thanks,
Andrew Svetlov