On Fri, Jan 4, 2013 at 6:53 PM, Markus <nepenthesdev@gmail.com> wrote:
> On Fri, Jan 4, 2013 at 11:33 PM, Guido van Rossum <guido@python.org> wrote:
>> On Wed, Dec 26, 2012 at 2:38 PM, Markus <nepenthesdev@gmail.com> wrote:
>>> First shot should be getting a well established event loop into python.
>> Perhaps. What is your definition of an event loop?
> I ask the loop to notify me via callback if something I care about happens.
Heh. That's rather too general -- it depends on "something I care
about" which could be impossible to guess. :-)
> Usually that's fds and read/writeability.
Ok, although on some platforms it can't be a fd (UNIX-style small
integer) but some other abstraction, e.g. a socket *object* in Jython
or a "handle" on Windows (but I am already starting to repeat myself
> I create a data structure which has the fd, the event I care about,
> the callback and userdata, pass it to the loop, and the loop will
> take care of the rest.
> Next, timers, same story,
> I create a data structure which has the time I care about, the
> callback and userdata, pass it to the loop, and the loop will take
> care of the rest.
The "create data structure" part is a specific choice of interface
style, not necessarily the best for Python. Most event loop
implementations I've seen for Python (pyev excluded) just have various
methods that express everything through the argument list, not with a
separate data structure.
> Signals - sometimes having signals in the event loop is handy too.
> Same story.
Agreed, I've added this to the open issues section in the PEP.
Do you have a suggestion for a minimal interface for signal handling?
I could imagine the following:
- add_signal_handler(sig, callback, *args). Whenever signal 'sig' is
received, arrange for callback(*args) to be called. Returns a Handler
which can be used to cancel the signal callback. Specifying another
callback for the same signal replaces the previous handler (only one
handler can be active per signal).
- remove_signal_handler(sig). Removes the handler for signal 'sig',
if one is set.
Is anything else needed?
Note that Python only receives signals in the main thread, and the
effect may be undefined if the event loop is not running in the main
thread, or if more than one event loop sets a handler for the same
signal. It also can't work for signals directed to a specific thread
(I think POSIX defines a few of these, but I don't know of any support
for these in Python.)
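For concreteness, usage might look like this (just a sketch; the
'loop' variable and the SIGTERM scenario are made up here, and
Handler.cancel() follows the Handler semantics described elsewhere in
the PEP):

    import signal

    # Sketch only: assume 'loop' is a conforming PEP 3156 event loop,
    # obtained from whatever accessor the implementation provides.

    def on_sigterm(logname):
        print('got SIGTERM, shutting down', logname)
        loop.stop()

    # Arrange for on_sigterm('server.log') to run on every SIGTERM.
    handler = loop.add_signal_handler(signal.SIGTERM, on_sigterm, 'server.log')

    # Later, either cancel via the returned Handler...
    handler.cancel()
    # ...or remove by signal number.
    loop.remove_signal_handler(signal.SIGTERM)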
>> But sockets are not native on Windows, and I am making some effort
>> with PEP 3156 to efficiently support higher-level abstractions without
>> tying them to sockets. (The plan is to support IOCP on Windows. The
>> previous version of Tulip already had a branch that did support that,
>> as a demonstration of the power of this abstraction.)
> Supporting IOCP on windows is absolutely required, as WSAPoll is
> broken and won't be fixed.
Wow. Now I'm even more glad that we're planning to support IOCP.
>> Only if the C code also uses libev, of course. But C programs may use
>> other event mechanisms -- e.g. AFAIK there are alternatives to libev
>> (during the early stages of Tulip development I chatted a bit with one
>> of the original authors of libevent, Niels Provos, and I believe
>> there's also something called libuv), and GUI frameworks (e.g. X, Qt,
>> Gtk, Wx) tend to have their own event loop.
> libuv is a wrapper around libev -adding IOCP- which adds some other
> things besides an event loop and is developed for/used in node.js.
Ah, that's helpful. I did not realize this after briefly skimming the
libuv page. (And the github logs suggest that it may no longer be
the case.)
>> PEP 3156 is designed to let alternative *implementations* of the same
>> *interface* be selected at run time. Hopefully it is possible to
>> provide a conforming implementation using libev -- then your goal
>> (smooth interoperability with C code using libev) is obtained.
> Smooth interoperability is not a major goal here - it's great if you
> get it for free.
> I'm just looking forward an event loop in the stdlib I want to use.
Heh, so stop objecting. :-)
>> (It would also be harder to implement initially as a 3rd party
>> framework. At the lowest level, no changes to Python itself are needed
>> -- it already supports non-blocking sockets, for example. But adding
>> optional callbacks to existing low-level APIs would require changes
>> throughout the stdlib.)
> As a result - making the stdlib async io aware - the complete stdlib.
> Would be great.
No matter what API style is chosen, making the entire stdlib async
aware will be tough. No matter what you do, the async support will
have to be "pulled through" every abstraction layer -- e.g. making
sockets async-aware doesn't automatically make socketserver or urllib2
async-aware(*). With the strong requirements for backwards
compatibility, in many cases it may be easier to define a new API that
is suitable for async use instead of trying to augment existing APIs.
(*) Unless you use microthreads, like gevent, but this has its own set
of problems -- I don't want to get into that here, since we seem to at
least agree on the need for an event loop with callbacks.
>> I am not so concerned about naming (it
>> seems inevitable that everyone uses somewhat different terminology
>> anyway, and it is probably better not to reuse terms when the meaning
>> is different), but I do like to look at guarantees (or the absence
>> thereof!) and best practices for dealing with the differences
>> between platforms.
> Handler - the best example for not re-using terms.
??? (Can't tell if you're sarcastic or agreeing here.)
>> You haven't convinced me about this.
> Fine, if you include transports, I'll pick on the transports as well ;)
>> However, you can help me by
>> comparing the event loop part of PEP 3156 (ignoring anything that
>> returns or takes a Future) to libev and pointing out things (either
>> specific APIs or certain guarantees or requirements) that would be
>> hard to implement using libev, as well as useful features in libev
>> that you think every event loop should have.
> Note: In libev only the "default event loop" can have timers.
Interesting. This seems an odd constraint.
> * run() - ev_run(struct ev_loop)
> * stop() - ev_break(EV_UNLOOP_ALL)
> * run_forever() - registering an idle watcher will keep the loop alive
> * run_once(timeout=None) - registering a timer, have the timer stop() the loop
> * call_later(delay, callback, *args) - ev_timer
> * call_repeatedly(interval, callback, *args) - ev_timer (periodic)
> * call_soon(callback, *args) - Equivalent to call_later(0, callback, *args).
> - call_soon_threadsafe(callback, *args) - it would be better to have
> the event loops taking care of signals too, else waking up an ev_async
> in the loop which checks an async queue which contains the required
> information to register the call_soon callback would be possible
Not sure I understand. PEP 3156/Tulip uses a self-pipe to prevent race
conditions when call_soon_threadsafe() is called from a signal handler
or other thread(*) -- but I don't know if that is relevant or not.
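For those unfamiliar, the self-pipe trick looks roughly like this (a
sketch of the shape of it, not Tulip's actual code):

    import os

    class SelfPipeSketch:
        def __init__(self):
            # The read end is registered with select()/poll() like any
            # other fd; the write end exists only to wake the loop up.
            self._read_fd, self._write_fd = os.pipe()
            self._pending = []   # callbacks queued from other threads

        def call_soon_threadsafe(self, callback, *args):
            self._pending.append((callback, args))
            # One byte makes the read end readable, so a blocked poll
            # wakes up even though we are in another thread (or in a
            # signal handler). Real code makes this fd non-blocking.
            os.write(self._write_fd, b'\x00')

        def _on_self_pipe_readable(self):
            os.read(self._read_fd, 4096)   # drain the wakeup bytes
            pending, self._pending = self._pending, []
            for callback, args in pending:
                callback(*args)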
> - getaddrinfo(host, port, family=0, type=0, proto=0, flags=0) - libev
> does not do dns
> - getnameinfo(sockaddr, flags=0) - libev does not do dns
Note that these exist at least in part so that an event loop
implementation may *choose* to implement its own DNS handling (IIUC
Twisted has this), whereas the default behavior is just to run
socket.getaddrinfo() -- but in a separate thread because it blocks.
(This is a useful test case for run_in_executor() too.)
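A sketch of that default behavior, using concurrent.futures directly
(the PEP's run_in_executor() would wrap the result in the event
loop's own Future flavor; the names here are illustrative):

    import socket
    from concurrent.futures import ThreadPoolExecutor

    executor = ThreadPoolExecutor(max_workers=5)

    def getaddrinfo_async(host, port, family=0, type=0, proto=0, flags=0):
        # Push the blocking stdlib call into a worker thread.
        return executor.submit(
            socket.getaddrinfo, host, port, family, type, proto, flags)

    fut = getaddrinfo_async('python.org', 80, type=socket.SOCK_STREAM)
    print(fut.result()[0])  # first (family, type, proto, canonname, sockaddr)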
> - create_transport(protocol_factory, host, port, **kwargs) - libev
> does not do transports
> - start_serving(protocol_factory, host, port, **kwds) - libev does
> not do transports
> * add_reader(fd, callback, *args) - create a ev_io watcher with EV_READ
> * add_writer(fd, callback, *args) - create ev_io watcher with EV_WRITE
> * remove_reader(fd) - in libev you have to name the watcher you want
> to stop, you can not remove watchers/handlers by fd, workaround is
> maintaining a dict with fd:Handler in the EventLoop
Ok, this does not sound like a show-stopper for a conforming PEP 3156
implementation on top of libev then, right? Just a minor
inconvenience. I'm sure everyone has *some* impedance mismatches to
deal with.
> * remove_writer(fd) - same
> * add_connector(fd, callback, *args) - poll for writeability, getsockopt, done
TBH, I'm not 100% convinced of the need for add_connector(), but
Richard Oudkerk claims that it is needed for Windows. (OTOH if
WSAPoll() is too broken to bother, maybe we don't need it. It's a bit
of a nuisance because code that uses add_writer() instead works just
fine on UNIX but would be subtly broken on Windows, leading to
disappointments when porting apps to Windows. I'd rather have things
break on all platforms, or on none...)
> * remove_connector(fd) - same as with all other remove-by-fd methods
> As Transports are part of the PEP - some more:
> * create_transport(protocol_factory, host, port, **kwargs)
> kwargs requires "local" - local address as tuple like
> ('fe80::14ad:1680:54e1:6a91%eth0',0) - so you can bind when using ipv6
> link local scope.
> or ('192.168.2.1',5060) - bind local port for udp
Not sure I understand. What socket.connect() (or other API) call
parameters does this correspond to? What can't be expressed through the
host and port parameters?
> * start_serving(protocol_factory, host, port, **kwds)
> what is the behaviour for SOCK_DGRAM - does this multiplex sessions
> based on src host/port / dst host/port - I'd love it.
TBH I haven't thought much about datagram transports. It's been years
since I used UDP. I guess the API may have to distinguish between
connected and unconnected UDP. I think the transport/protocol API will
be different than for SOCK_STREAM: for every received datagram, the
transport will call protocol.datagram_received(data, address) (the
address will be a dummy for connected use), and to send a datagram,
the protocol must call transport.write_datagram(data, [address]),
which returns immediately. Flow control (if supported) should work
the same as for streams: if the transport finds its buffers exceed a
certain limit, it will tell the protocol to back off by calling
protocol.pause().
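In sketch form (these method names follow the description above and
are not yet in the PEP):

    class EchoDatagramProtocol:
        def connection_made(self, transport):
            self.transport = transport

        def datagram_received(self, data, address):
            # For connected UDP the address would be a dummy;
            # echo the datagram back either way.
            self.transport.write_datagram(data, address)

        def connection_lost(self, exc):
            self.transport = None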
> Requiring 2 handlers for every active connection r/w is highly inefficient.
How so? What is the concern? The actions of the read and write handler
are typically completely different, so the first thing the handler
would have to do is to decide whether to call the read or the write
code. Also, depending on flow control, only one of the two may be
active at any given time.
If you are after minimizing the number of records passed to [e]poll or
kqueue, you can always collapse the handlers at that level and
distinguish between read/write based on the mask and recover the
appropriate user-level handler from the readers/writers array (and
this is what Tulip's epoll pollster class does).
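Roughly like this (a sketch with Linux-only epoll names; not Tulip's
actual code):

    import select

    readers = {}   # fd -> user-level read callback
    writers = {}   # fd -> user-level write callback

    def dispatch_once(epoll, timeout=-1):
        # One registration per fd at the epoll level; the direction is
        # recovered from the event mask.
        for fd, mask in epoll.poll(timeout):
            if mask & (select.EPOLLIN | select.EPOLLHUP) and fd in readers:
                readers[fd]()
            if mask & select.EPOLLOUT and fd in writers:
                writers[fd]()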
PS. Also check out this issue, where an implementation of *just*
Tulip's pollster class for the stdlib is being designed:
http://bugs.python.org/issue16853; also check out the code reviews
linked there.
> I'd prefer to be able to create a Handler from a loop.
> Handler = EventLoop.create_handler(socket, callback, events)
> and have the callback called with the returned events, so I can
> multiplex read/write op in the callback.
Hm. See above.
> Additionally, I can .stop() the handler without having to know the fd,
> .stop() the handler, change the events the handler is looking for,
> restart the handler with .start().
> In your proposal, I'd create a new handler every time I want to send
> something, poll for readability - discard the handler when I'm done,
> create a new one for the next send.
The questions are, does it make any difference in efficiency (when
using Python -- the performance of the C API is hardly relevant here),
and how often does this pattern occur.
> Not in the PEP - re-arming a timer
> lets say I want to do something if nothing happens for 5 seconds.
> I create a timer call_later(5.,cb), if something happens, I need to
> cancel the timer and create a new one. If there was a Timer, I could
> just re-arm it.
Actually it's one less call using the PEP's proposed API:
timer = loop.call_later(5, callback)
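Spelled out side by side (the create_timer()/restart() spelling below
is hypothetical, modeled on libev's re-arming timers, just for
comparison):

    # PEP style: cancel and re-create on each activity.
    timer = loop.call_later(5, callback)
    # ... some activity happens ...
    timer.cancel()
    timer = loop.call_later(5, callback)

    # libev style (hypothetical API): keep one timer object, re-arm it.
    timer = loop.create_timer(5, callback)
    # ... some activity happens ...
    timer.restart()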
Which of the two idioms is faster? Who knows? libev's pattern is
probably faster in C, but that has little bearing on the cost in
Python. My guess is that the amount of work is about the same -- the
real cost is that you have to make some changes to the heap used to keep
track of all timers in the order in which they will trigger, and those
changes are the same regardless of how you style the API.
> I think SSL should be a Protocol not a transport - implemented using BIO pairs.
> If you can chain protocols, like Transport / ProtocolA / ProtocolB you can have
> TCP / SSL / HTTP as https or TCP / SSL / SOCKS / HTTP as https via
> ssl enabled socks proxy without having to much problems. Another
> example, shaping a connection TCP / RATELIMIT / HTTP.
Interesting idea. This may be up to the implementation -- not every
implementation may have BIO wrappers available (AFAIK the stdlib
doesn't), so the stackability may not be easy to implement everywhere.
In any case, when you stack things like this, the stack doesn't look
like transport<-->protocol<-->protocol<-->protocol; rather, it's
A<-->B<-->C<-->D where each object has a "left" and a "right" API.
Each arrow connects the "transport (right) half" of the object on its
left (e.g. A) to the "protocol (left) half" of the object on the
arrow's right (e.g. B). So maybe we can visualise this as T1 <-->
P2:T2 <--> P3:T3 <--> P4.
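A sketch of one such middle element, acting as a protocol toward its
left neighbor and as a transport toward its right neighbor (the
method names follow the PEP; the class itself is purely illustrative):

    class PassThrough:
        # One element in a T1 <--> P2:T2 <--> P3:T3 <--> P4 stack.
        # A rate limiter or SSL unwrapping would transform data here.

        def __init__(self, right_protocol):
            self.right = right_protocol  # e.g. an HTTP protocol instance
            self.left = None             # the transport below us

        # ---- protocol half (called by the left neighbor) ----
        def connection_made(self, transport):
            self.left = transport
            self.right.connection_made(self)  # we are its transport

        def data_received(self, data):
            self.right.data_received(data)

        def connection_lost(self, exc):
            self.right.connection_lost(exc)

        # ---- transport half (called by the right neighbor) ----
        def write(self, data):
            self.left.write(data)

        def close(self):
            self.left.close()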
> Having SSL as a Protocol allows closing the SSL connection without
> closing the TCP connection, re-using the TCP connection, re-using a
> SSL session cookie during reconnect of the SSL Protocol.
That seems a pretty esoteric use case (though given your background in
honeypots maybe common for you :-). It also seems hard to get both
sides acting correctly when you do this (but I'm certainly no SSL
expert -- I just want it supported because half the web is
inaccessible these days if you don't speak SSL, regardless of whether
you do any actual verification).
All in all I think that stackable transports/protocols are mostly
something that is enabled by the interfaces defined here (the PEP
takes care not to specify any base classes from which you must inherit
-- you must just implement certain methods, and the rest is duck
typing) but otherwise does not concern the PEP much.
The only concern I have, really, is that the PEP currently hints that
both protocols and transports might have pause() and resume() methods
for flow control, where the protocol calls transport.pause() if
protocol.data_received() is called too frequently, and the transport
calls protocol.pause() if transport.write() has buffered more data
than sensible. But for an object that is both a protocol and a
transport, this would make it impossible to distinguish between
pause() calls by its left and right neighbors. So maybe the names must
differ. Given the tendency of transport method names to be shorter
(e.g. write()) vs. the longer protocol method names (data_received(),
connection_lost() etc.), perhaps it should be transport.pause() and
protocol.pause_writing() (and similar for resume()).
> * reconnect() - I'd love to be able to reconnect a transport
But what does that mean in general? It depends on the protocol (e.g.
FTP, HTTP, IRC, SMTP) how much state must be restored/renegotiated
upon a reconnect, and how much data may have to be re-sent. This seems
a higher-level feature that transports and protocols will have to
implement themselves.
> * timers - Transports need timers
I think you mean timeouts?
> * dns-resolve-timeout - dns can be slow
> * connecting-timeout - connecting can take too much time, more than
> we want to wait
> * idle-timeout (no action on the connection for a while) - call timeout_idle()
> * sustain-timeout ( max session time ) - close() transport
> * ssl-handshake-timeout ( in case ssl is a Transport ) - close transport
> * close-timeout (shutdown is async) - close transport hard
> * reconnect-timeout - (wait some seconds before reconnecting) -
> reconnect connection
This is an interesting point. I think some of these really do need
APIs in the PEP, others may be implemented using existing machinery
(e.g. call_later() to schedule a callback that calls cancel() on a
task). I've added a bullet on this to the Open Issues section.
> Now, in case we connect to a host by name, and have multiple addresses
> resolved, and the first connection can not be established, there is no
> way to 'reconnect()' - as the protocol does not yet exist.
Twisted suggested something here which I haven't implemented yet but
which seems reasonable -- using a series of short timeouts try
connecting to the various addresses and keep the first one that
connects successfully. If multiple addresses connect after the first
timeout, too bad, just close the redundant sockets, little harm is
done (though the timeouts should be tuned so that this is relatively
rare, because a server may waste significant resources on such
redundant connects).
> For almost all the timeouts I mentioned - the protocol needs to take
> care - so the protocol has to exist before the connection is
> established in case of outbound connections.
I'm not sure I follow. Can you sketch out some code to help me here?
ISTM that e.g. the DNS, connect and handshake timeouts can be
implemented by the machinery that tries to set up the connection
behind the scenes, and the user's protocol won't know anything of
these shenanigans. The code that calls create_transport() (actually
it'll probably be renamed create_client()) will just get a Future that
either indicates success (and then the protocol and transport are
successfully hooked up) or an error (and then no protocol was created
-- whether or not a transport was created is an implementation
detail).
> In case a connection is lost and reconnecting is required -
> .reconnect() is handy, so the protocol can request reconnecting.
I'd need more details of how you would like to specify this.
> As this does not work with the current Protocols callbacks I propose
> Protocols.connection_established() therefore.
How does this differ from connection_made()?
(I'm trying to follow Twisted's guidance here, they seem to have the
longest experience doing these kinds of things. When I talked to Glyph
IIRC he was skeptical about reconnecting in general.)
> I'd point out that protocol_factory can be an instance of a class, which can
> set specific parameters for 'things'
> class p:
>     def __init__(self, a=1, b=2, c=3):
>         self.a = a
>         self.b = b
>         self.c = c
>     def __call__(self):
>         return p(a=self.a, b=self.b, c=self.c)
>     def ... all protocol methods ...:
> EventLoop.start_serving(p(a=5,b=7), ...)
> EventLoop.start_serving(p(a=9,b=4), ...)
> Same Protocol, different parameters for it.
No such helper method (or class) is needed. You can use a lambda or
functools.partial for the same effect. I'll add a note to the PEP to
remind people of this.
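For example (same effect as the custom __call__ above; this is
ordinary stdlib usage, not a new API):

    import functools

    # Bind the constructor arguments up front; the factory is then
    # called with no arguments, yielding a fresh configured instance.
    EventLoop.start_serving(functools.partial(p, a=5, b=7), ...)
    EventLoop.start_serving(lambda: p(a=9, b=4), ...)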
> + connection_established()
> + timeout_dns()
> + timeout_idle()
> + timeout_connecting()
> * data_received(data) - if it was possible to return the number of
> bytes consumed by the protocol, and have the Transport buffer the rest
> for the next io in call, one would avoid having to do this in every
> Protocol on its own - learned from experience.
Twisted has a whole slew of protocol implementation subclasses that
implement various strategies like line-buffering (including a really
complex version where you can turn the line buffering on and off) and
"netstrings". I am trying to limit the PEP's size by not including
these, but I fully expect that in practice a set of useful protocol
implementations will be created that handles common cases. I'm not
convinced that putting this in the transport/protocol interface will
make user code less buggy: it seems easy for the user code to miscount
the bytes or not return a count at all in a rarely taken code branch.
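For illustration, such a helper might look like this (a sketch in the
spirit of Twisted's LineReceiver; not part of the PEP):

    class LineBufferingProtocol:
        # Subclasses implement line_received() and never have to deal
        # with partial reads themselves.
        delimiter = b'\r\n'

        def __init__(self):
            self._buffer = b''

        def data_received(self, data):
            self._buffer += data
            while True:
                line, sep, rest = self._buffer.partition(self.delimiter)
                if not sep:
                    break  # no complete line yet; keep the rest buffered
                self._buffer = rest
                self.line_received(line)

        def line_received(self, line):
            raise NotImplementedError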
> * eof_received()/connection_lost(exc) - a connection can be closed
> clean recv()=0, unclean recv()=-1, errno, SIGPIPE when writing and in
> case of SSL even more, it is required to distinguish.
Well, this is why eof_received() exists -- to indicate a clean close.
We should never receive SIGPIPE (Python disables this signal, so you
always get the errno instead). According to Glyph, SSL doesn't support
sending eof, so you have to use Content-length or a chunked encoding.
What other conditions do you expect from SSL that wouldn't be
distinguished by the exception instance passed to connection_lost()?
> + nextlayer_is_empty() - called if the Transport (or underlying
> Protocol in case of chaining) write buffer is empty - Imagine an http
> server sending a 1GB file, you do not want to send 1GB at once - as
> you do not have that much memory, but get a callback if the transport
> is done sending the chunk you've queued, so you can send the next chunk
> of data.
That's what the pause()/resume() flow control protocol is for. You
read the file (presumably it's a file) in e.g. 16K blocks and call
write() for each block; if the transport can't keep up and exceeds its
buffer space, it calls protocol.pause() (or perhaps
protocol.pause_writing(), see discussion above).
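A sketch of that pattern (the pause()/resume() names follow the
discussion above; everything else here is illustrative):

    class FileSender:
        BLOCK = 16 * 1024

        def __init__(self, transport, fileobj):
            self.transport = transport
            self.file = fileobj
            self.paused = False

        def start(self):
            self._send_some()

        def pause(self):    # called by the transport: buffers too full
            self.paused = True

        def resume(self):   # called by the transport: drained enough
            self.paused = False
            self._send_some()

        def _send_some(self):
            while not self.paused:
                block = self.file.read(self.BLOCK)
                if not block:
                    self.transport.close()  # done; nothing left to send
                    return
                self.transport.write(block)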
> Next, what happens if a dns can not be resolved, ssl handshake (in
> case ssl is transport) or connecting fails - in my opinion it's an
> error the protocol is supposed to take care of
> + error_dns
> + error_ssl
> + error_connecting
The future returned by create_transport() (aka create_client()) will
raise the exception.
> I'm not that much into futures - so I may have got some things wrong.
No problem. You may want to read PEP 3148, it explains Futures and
much of that explanation remains valid; just note that in PEP 3156,
to wait for a future you must use "yield from <future>".
--Guido van Rossum (python.org/~guido)
On December 28th, an unknown attacker used a previously unknown remote
code exploit on http://wiki.python.org/. The attacker was able to get
shell access as the "moin" user, but no other services were affected.
Some time later, the attacker deleted all files owned by the "moin"
user, including all instance data for both the Python and Jython
wikis. The attacker also had full access to all MoinMoin user data on
all wikis. In light of this, the Python Software Foundation encourages
all wiki users to change their password on other sites if the same one
is in use elsewhere. We apologize for the inconvenience and will post
further news as we bring the new and improved wiki.python.org online.
If you have any questions about this incident please contact
jnoller@python.org. Thank you for your patience.
There's an interesting python "variant" (more of an overlay actually)
that is rather intriguing on github -- Vigil: a truly safe
programming language. From the readme:
"Infinitely more important than mere syntax and semantics are its
addition of supreme moral vigilance. This is similar to contracts, but
less legal and more medieval."
On Fri, Jan 4, 2013 at 2:38 PM, Dustin Mitchell <djmitche@gmail.com> wrote:
> As the maintainer of a pretty large, complex app written in Twisted, I think
> this is great. I look forward to a future of being able to select from a
> broad library of async tools, and being able to write tools that can be used
> outside of Twisted.
Thanks. Me too. :-)
> Buildbot began, lo these many years ago, doing a lot of things in memory or
> on local disk, neither of which require asynchronous IO. So a lot of API
> methods did not originally return Deferreds. Those methods are then used by
> other methods, many of which also do not return Deferreds. Now, we want to
> use a database backend, and parallelize some of the operations, meaning that
> the methods need to return a Deferred. Unfortunately, that requires a
> complete tree traversal of all of the methods and methods that call them,
> rewriting them to take and return Deferreds. There's no "halfway" solution.
> This is a little easier with generators (@inlineCallbacks), since the syntax
> doesn't change much, but it's a significant change to the API (in fact, this
> is a large part of the reason for the big rewrite for Buildbot-0.9.x).
> I bring all this up to say, this PEP will introduce a new "kind" of method
> signature into standard Python, one which the caller must know, and the use
> of which changes the signature of the caller. That can cause sweeping
> changes, and debugging those changes can be tricky.
Yes, and this is the biggest unproven point of the PEP. (The rest is
all backed by a decade or more of experience.)
> Two things can help:
> First, `yield from somemeth()` should work fine even if `somemeth` is not a
> coroutine function, and authors of async tools should be encouraged to use
> this form to assist future-compatibility. Second, `somemeth()` without a
> yield should fail loudly if `somemeth` is a coroutine function. Otherwise,
> the effects can be pretty confusing.
That would be nice. But the way yield from and generators work, that's
hard to accomplish without further changes to the language -- and I
don't want to have to change the language again (at least not
immediately -- maybe in a few releases, after we've learned what the
real issues are). The best I can do for the first requirement is to
define @coroutine in a way that if the decorated function isn't a
generator, it is wrapped in one. For the second requirement, if you
call somemeth() and ignore the result, nothing happens at all -- this
is indeed infuriating but I see no way to change this.(*) If you use
the result, well, Futures have different attributes than most other
objects so hopefully you'll get a loud AttributeError or TypeError
soon, but of course if you pass it into something else which uses it,
it may still be difficult to track. Hopefully these error messages
provide a hint:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'Future' object has no attribute 'foo'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'Future' object is not callable
(*) There's a heavy gun we might use, but I would make this optional,
as a heavy duty debugging mode only. @coroutine could wrap generators
in a lightweight object with a __del__ method and an __iter__ method.
If __del__ is called before __iter__ is ever called, it could raise an
exception or log a warning. But this probably adds too much overhead
to have it always enabled.
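To sketch that heavy gun (illustrative only; note that __del__ can't
usefully raise, so a warning is the practical choice):

    import functools
    import warnings

    def coroutine_debug(func):
        # Warn when a coroutine is called but never iterated, i.e.
        # never used with yield from.
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return _GenWatcher(func(*args, **kwargs), func.__name__)
        return wrapper

    class _GenWatcher:
        def __init__(self, gen, name):
            self._gen = gen
            self._name = name
            self._iterated = False

        def __iter__(self):
            # yield from calls iter() on its operand, landing here.
            self._iterated = True
            return self._gen

        def __del__(self):
            if not self._iterated:
                warnings.warn('coroutine %r was never yielded from'
                              % self._name)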
> In http://code.google.com/p/uthreads, I accomplished the latter by taking
> advantage of garbage collection: if the generator is garbage collected
> before it's begun, then it's probably not been yielded. This is a bit
> gross, but good enough as a debugging technique.
Eh, yeah, what I said. :-)
> On the topic of debugging, I also took pains to make sure that tracebacks
> looked reasonable, filtering out scheduler code. I haven't looked
> closely at Tulip to see if that's a problem. Most of the "noise" in the
> tracebacks came from the lack of 'yield from', so it may not be an
> issue at all.
One of the great advantages of using yield from is that the tracebacks
automatically look nice.
--Guido van Rossum (python.org/~guido)
[Markus sent this to me off-list, but agreed to me responding on-list,
quoting his entire message.]
On Wed, Dec 26, 2012 at 2:38 PM, Markus <nepenthesdev@gmail.com> wrote:
I don't believe we've met before, have we? It would probably help if
you introduced yourself and your experience, since our past
experiences color our judgment.
> as I've been waiting for this to happen, I decided to speak up.
> While I really look forward to this, I disagree with the PEP.
Heh, we can't all agree on everything. :-)
> First shot should be getting a well established event loop into python.
Perhaps. What is your definition of an event loop?
> libev is great, it takes care of operating system specialities, and
> only does a single job, providing an event loop.
It is also written for C, and I presume much of its API design was
influenced by the conventions and affordances of that language.
> This event loop can take care of timers, sockets and signals,
But sockets are not native on Windows, and I am making some effort
with PEP 3156 to efficiently support higher-level abstractions without
tying them to sockets. (The plan is to support IOCP on Windows. The
previous version of Tulip already had a branch that did support that,
as a demonstration of the power of this abstraction.)
> pyev, a
> great python wrapper for libev already provides this simple eventing
> facility in python.
But, being a libev wrapper, it is likely also strongly influenced by C.
> In case you embed python in a c program, the libev default loop of the
> python code and c code can even be shared, providing a great amount
> of interoperability.
Only if the C code also uses libev, of course. But C programs may use
other event mechanisms -- e.g. AFAIK there are alternatives to libev
(during the early stages of Tulip development I chatted a bit with one
of the original authors of libevent, Niels Provos, and I believe
there's also something called libuv), and GUI frameworks (e.g. X, Qt,
Gtk, Wx) tend to have their own event loop.
PEP 3156 is designed to let alternative *implementations* of the same
*interface* be selected at run time. Hopefully it is possible to
provide a conforming implementation using libev -- then your goal
(smooth interoperability with C code using libev) is obtained.
It's possible that in order to do that the PEP 3156 interface may have
to be refactored into separate pieces. The Tulip implementation
already has separate "pollster" implementations (which concern
themselves *only* with polling for I/O using select, poll, or other
alternatives). It probably makes sense to factor the part that
implements transports out as well. However, the whole point of
including transports and protocols (and futures) in the PEP is that
some platforms may want to implement the same high-level API (e.g.
create a transport that connects to a certain host/port) using a
different approach altogether, e.g. on Windows the transport might not
even use sockets. OTOH on UNIX it may be possible to add file
descriptors representing pipes and pseudo-ttys.
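To illustrate how narrow the pollster layer is, here is a
select()-based sketch (Tulip's real classes differ in detail):

    import select

    class SelectPollster:
        def __init__(self):
            self.readers = set()
            self.writers = set()

        def register_reader(self, fd):
            self.readers.add(fd)

        def register_writer(self, fd):
            self.writers.add(fd)

        def unregister_reader(self, fd):
            self.readers.discard(fd)

        def unregister_writer(self, fd):
            self.writers.discard(fd)

        def poll(self, timeout=None):
            # Returns (readable_fds, writable_fds); the event loop
            # proper maps these back to callbacks.
            r, w, _ = select.select(self.readers, self.writers, [], timeout)
            return r, w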
> libev is great as it is small - it provides exactly what's required,
> and nothing beyond.
Depending on your requirements. :-)
> getaddrinfo/getnameinfo/create_transport are out of scope from a event
> loop point of view.
> This functionality already exists in python, it just does not use a
> event loop and is blocking, as every other io related api.
It wasn't random to add these. The "event loop" in PEP 3156 provides
abstractions that leave the platform free to implement connections
using the appropriate native constructs without letting those
constructs "leak" into the application -- after all, whether you're on
UNIX or on Windows, a TCP connection represents the same abstraction,
but the network stack may have a very different interface.
> I'd propose not to replicate the functionality in the event loop
> namespace, but to extend the existing implementations - by allowing to
> provide an event loop/callback/ctx as optional args which get used.
That's an interface choice that I would regret (I really don't like
writing code using callbacks).
(It would also be harder to implement initially as a 3rd party
framework. At the lowest level, no changes to Python itself are needed
-- it already supports non-blocking sockets, for example. But adding
optional callbacks to existing low-level APIs would require changes
throughout the stdlib.)
> If you specify something like pyev as PEP, you can still come up with
> another PEP which defines the semantics for upper layer protocols like
> udp/tcp on IPv4/6, which can be used to take care of dns and
> transports.
I could split up the PEP, but that wouldn't really change anything,
since to me it is still a package deal. I am willing to put an effort
into specifying a low-level event loop because I know that I can still
write high-level code which is (mostly) free of callbacks, using
futures, tasks and the yield-from construct. And in order to do that I
need a minimum set of high-level abstractions such as getaddrinfo()
and transport creation (the exact names of the transport creation
methods are still under debate, as are the details of their
signatures, but the need for them is established without a doubt in
my mind).
I note that the stdlib socket module has roughly the same set of
abstractions bundled together:
- socket objects
- getaddrinfo(), getnameinfo()
- the makefile() methods on socket objects, which create buffered streams
PEP 3156 offers alternatives for all of these, using higher-level
abstractions that have been developed and proven in practice by
Twisted, *and* offers a path to interop to frameworks that previously
couldn't very well interoperate -- Twisted, Tornado, and others have
traditionally been pretty segregated, but with PEP 3156 they can
interoperate both through the event loop and through Futures (which
are friendly both to a callback style and to yield-from).
> Anyway, I really hope you'll have a look on libev and pyev, both is
> great and well tested software and may give you an idea what people
> who dedicate themselves to event loops came up with already in terms
> of names, subclassing, requirements, guarantees and workarounds for
> platform specific failures (kqueue, epoll ...).
I will certainly have a look! I am not so concerned about naming (it
seems inevitable that everyone uses somewhat different terminology
anyway, and it is probably better not to reuse terms when the meaning
is different), but I do like to look at guarantees (or the absence
thereof!) and best practices for dealing with the differences
between platforms.
> All together, I'd limit the scope of the PEP to the API of the event
> loop, just focussing on io/timers/signals and propose to extend
> existing API to be usable with an event loop, instead of replicating it.
You haven't convinced me about this. However, you can help me by
comparing the event loop part of PEP 3156 (ignoring anything that
returns or takes a Future) to libev and pointing out things (either
specific APIs or certain guarantees or requirements) that would be
hard to implement using libev, as well as useful features in libev
that you think every event loop should have.
> For naming I'd prefer 'watcher' over 'Handler'.
Hm, 'watcher' to me sounds more active than the behavior I have in
mind for this class. It is just a reification of a specific function
and some arguments to pass to it, with the ability to cancel the
call before it happens.
Thanks for writing!
--Guido van Rossum (python.org/~guido)
I propose to add new standard collection types: IdentityDict and
IdentitySet. They are almost the same as ordinary dict and set, but
use identity checks instead of equality checks (and id() or
hash(id()) as the
hash). They will be useful for pickling, for implementing __sizeof__()
for compound types, and for other graph algorithms.
Of course, they can be implemented using ordinary dicts:
IdentityDict: key -> value as a dict: id(key) -> (key, value)
IdentitySet as a dict: id(value) -> value
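A pure-Python sketch of that scheme for IdentityDict (a C
implementation would share the dict internals instead):

    class IdentityDict:
        def __init__(self):
            self._data = {}

        def __setitem__(self, key, value):
            # Keep a reference to the key so its id() stays valid and
            # so we can recover it during iteration.
            self._data[id(key)] = (key, value)

        def __getitem__(self, key):
            return self._data[id(key)][1]

        def __delitem__(self, key):
            del self._data[id(key)]

        def __contains__(self, key):
            return id(key) in self._data

        def __iter__(self):
            return (key for key, value in self._data.values())

        def __len__(self):
            return len(self._data)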
However, implementing them directly in the core has advantages: it
consumes less memory and time, and is more comfortable to use from C.
IdentityDict and IdentitySet implementations will share almost all
code with the implementations of ordinary dict and set; only the
lookup function and the metainformation will differ. Conveniently,
dict and set already use lookup function overloading internally.
On Wednesday, 2 January 2013 at 21:43:47, Eli Bendersky wrote:
> I agree that the data structures may be useful, but is there no way
> to somehow allow the customization of existing data structures
> instead, without losing performance? It's a shame to have another
> kind of dict just for this purpose.
What interface for the customization is possible? Obviously, a dict
constructor can't have a special keyword argument (keyword arguments
are already taken as items to insert).
Sometimes, I have the flexibility to reduce the memory used by my
program (e.g., by destroying large cached objects, etc.). It would be
great if I could ask Python interpreter to notify me when memory is
running out, so I can take such actions.
Of course, it's nearly impossible for Python to know in advance if the
OS would run out of memory with the next malloc call. Furthermore,
Python shouldn't guess which memory (physical, virtual, etc.) is
relevant in the particular situation (for instance, in my case, I only
care about physical memory, since swapping to disk makes my
application as good as frozen). So the problem as stated above is
not solvable in general.
But let's say I am willing to do some work to estimate the maximum
amount of memory my application can be allowed to use. If I provide
that number to Python interpreter, it may be possible for it to notify
me when the next memory allocation would exceed this limit by calling
a function I provide it (hopefully passing as arguments the amount of
memory being requested, as well as the amount currently in use). My
callback function could then destroy some objects, and return True to
indicate that some objects were destroyed. At that point, the
interpreter could run its standard garbage collection routines to
release the memory that corresponded to those objects - before
proceeding with whatever it was trying to do originally. (If I
returned False, or if I didn't provide a callback function at all, the
interpreter would simply behave as it does today.) Any memory
allocations that happen while the callback function itself is
executing would not trigger further calls to it. The whole mechanism
would be disabled for the rest of the session if the memory freed by
the callback function was insufficient to prevent going over the
limit.
Would this be worth considering for a future language extension? How
hard would it be to implement?
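To make the proposal concrete, here is what user code might look like
-- everything in this sketch is hypothetical, including the
set_memory_limit_callback() hook, which does not exist today:

    import gc

    LIMIT = 2 * 1024 ** 3    # the application's own estimate: 2 GiB
    big_caches = {}          # stand-in for the app's large cached objects

    def low_memory(requested, in_use):
        # Called (hypothetically) by the interpreter when in_use plus
        # requested would exceed LIMIT.
        big_caches.clear()   # drop what we can live without
        gc.collect()         # then let the collector reclaim it
        return True          # True: objects were freed, retry the allocation

    # sys.set_memory_limit_callback(LIMIT, low_memory)  # hypothetical hook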