Mailman 3 January 2013 - Python-ideas

Re: [Python-ideas] PEP 3156 - Asynchronous IO Support Rebooted
by Guido van Rossum Jan. 8, 2013

Jan. 8, 2013

On Fri, Jan 4, 2013 at 6:53 PM, Markus <nepenthesdev(a)gmail.com> wrote: > On Fri, Jan 4, 2013 at 11:33 PM, Guido van Rossum <guido(a)python.org> wrote: >>On Wed, Dec 26, 2012 at 2:38 PM, Markus <nepenthesdev(a)gmail.com> wrote: >>> First shoot should be getting a well established event loop into python. >> >> Perhaps. What is your definition of an event loop? > > I ask the loop to notify me via callback if something I care about happens. … [View More]Heh. That's rather too general -- it depends on "something I care about" which could be impossible to guess. :-) > Usually that's fds and read/writeability. Ok, although on some platforms it can't be a fd (UNIX-style small integer) but some other abstraction, e.g. a socket *object* in Jython or a "handle" on Windows (but I am already starting to repeat myself :-). > I create a data structure which has the fd, the event I care about, > the callback and userdata, pass it to the loop, and the loop will take > care. > > Next, timers, same story, > I create a data structure which has the time I care about, the > callback and userdata, pass it to the loop, and the loop will take > care. The "create data structure" part is a specific choice of interface style, not necessarily the best for Python. Most event loop implementations I've seen for Python (pyev excluded) just have various methods that express everything through the argument list, not with a separate data structure. > Signals - sometimes having signals in the event loop is handy too. > Same story. Agreed, I've added this to the open issues section in the PEP. Do you have a suggestion for a minimal interface for signal handling? I could imagine the following: - add_signal_handler(sig, callback, *args). Whenever signal 'sig' is received, arrange for callback(*args) to be called. Returns a Handler which can be used to cancel the signal callback. Specifying another callback for the same signal replaces the previous handler (only one handler can be active per signal). - remove_signal_handler(sig). Removes the handler for signal 'sig', if one is set. Is anything else needed? Note that Python only receives signals in the main thread, and the effect may be undefined if the event loop is not running in the main thread, or if more than one event loop sets a handler for the same signal. It also can't work for signals directed to a specific thread (I think POSIX defines a few of these, but I don't know of any support for these in Python.) >> But sockets are not native on Windows, and I am making some effort >> with PEP 3156 to efficiently support higher-level abstractions without >> tying them to sockets. (The plan is to support IOCP on Windows. The >> previous version of Tulip already had a branch that did support that, >> as a demonstration of the power of this abstraction.) > > Supporting IOCP on windows is absolutely required, as WSAPoll is > broken and won't be fixed. > http://social.msdn.microsoft.com/Forums/hu/wsk/thread/18769abd-fca0-4d3c-98… Wow. Now I'm even more glad that we're planning to support IOCP. >> Only if the C code also uses libev, of course. But C programs may use >> other event mechanisms -- e.g. AFAIK there are alternatives to libev >> (during the early stages of Tulip development I chatted a bit with one >> of the original authors of libevent, Niels Provos, and I believe >> there's also something called libuv), and GUI frameworks (e.g. X, Qt, >> Gtk, Wx) tend to have their own event loop. > > libuv is a wrapper around libev -adding IOCP- which adds some other > things besides an event loop and is developed for/used in node.js. Ah, that's helpful. I did not realize this after briefly skimming the libuv page. (And the github logs suggest that it may no longer be the case: https://github.com/joyent/libuv/commit/1282d64868b9c560c074b9c9630391f3b18e… >> PEP 3156 is designed to let alternative *implementations* of the same >> *interface* be selected at run time. Hopefully it is possible to >> provide a conforming implementation using libev -- then your goal >> (smooth interoperability with C code using libev) is obtained. > > Smooth interoperability is not a major goal here - it's great if you > get it for free. > I'm just looking forward an event loop in the stdlib I want to use. Heh, so stop objecting. :-) >> (It would also be harder to implement initially as a 3rd party >> framework. At the lowest level, no changes to Python itself are needed >> -- it already supports non-blocking sockets, for example. But adding >> optional callbacks to existing low-level APIs would require changes >> throughout the stdlib.) > > As a result - making the stdlib async io aware - the complete stdlib. > Would be great. No matter what API style is chosen, making the entire stdlib async aware will be tough. No matter what you do, the async support will have to be "pulled through" every abstraction layer -- e.g. making sockets async-aware doesn't automatically make socketserver or urllib2 async-aware(*). With the strong requirements for backwards compatibility, in many cases it may be easier to define a new API that is suitable for async use instead of trying to augment existing APIs. (*) Unless you use microthreads, like gevent, but this has its own set of problems -- I don't want to get into that here, since we seem to at least agree on the need for an event loop with callbacks. >> I am not so concerned about naming (it >> seems inevitable that everyone uses somewhat different terminology >> anyway, and it is probably better not to reuse terms when the meaning >> is different), but I do like to look at guarantees (or the absence >> thereof!) and best practices for dealing with the differences between >> platforms. > > Handler - the best example for not re-using terms. ??? (Can't tell if you're sarcastic or agreeing here.) >> You haven't convinced me about this. > > Fine, if you include transports, I'll pick on the transports as well ;) ??? (Similar.) >> However, you can help me by >> comparing the event loop part of PEP 3156 (ignoring anything that >> returns or takes a Future) to libev and pointing out things (either >> specific APIs or certain guarantees or requirements) that would be >> hard to implement using libev, as well as useful features in libev >> that you think every event loop should have. > > > Note: In libev only the "default event loop" can have timers. Interesting. This seems an odd constraint. > EventLoop > * run() - ev_run(struct ev_loop) > * stop() - ev_break(EV_UNLOOP_ALL) > * run_forever() - registering an idle watcher will keep the loop alive > * run_once(timeout=None) - registering an timer, have the timer stop() the loop > * call_later(delay, callback, *args) - ev_timer > * call_repeatedly(interval, callback, **args) - ev_timer (periodic) > * call_soon(callback, *args) - Equivalent to call_later(0, callback, *args). > - call_soon_threadsafe(callback, *args) - it would be better to have > the event loops taking care of signals too, else waking up an ev_async > in the loop which checks a async queue which contains the required > information to register the call_soon callback would be possible Not sure I understand. PEP 3156/Tulip uses a self-pipe to prevent race conditions when call_soon_threadsafe() is called from a signal handler or other thread(*) -- but I don't know if that is relevant or not. (*) http://code.google.com/p/tulip/source/browse/tulip/unix_events.py#448 and http://code.google.com/p/tulip/source/browse/tulip/unix_events.py#576 > - getaddrinfo(host, port, family=0, type=0, proto=0, flags=0) - libev > does not do dns > - getnameinfo(sockaddr, flags=0) - libev does not do dns Note that these exist at least in part so that an event loop implementation may *choose* to implement its own DNS handling (IIUC Twisted has this), whereas the default behavior is just to run socket.getaddrinfo() -- but in a separate thread because it blocks. (This is a useful test case for run_in_executor() too.) > - create_transport(protocol_factory, host, port, **kwargs) - libev > does not do transports > - start_serving(protocol_factory, host, port, **kwds) - libev does > not do transports > * add_reader(fd, callback, *args) - create a ev_io watcher with EV_READ > * add_writer(fd, callback, *args) - create ev_io watcher with EV_WRITE > * remove_reader(fd) - in libev you have to name the watcher you want > to stop, you can not remove watchers/handlers by fd, workaround is > maintaining a dict with fd:Handler in the EventLoop Ok, this does not sound like a show-stopper for a conforming PEP 3156 implementation on top of libev then, right? Just a minor inconvenience. I'm sure everyone has *some* impedance mismatches to deal with. > * remove_writer(fd) - same > * add_connector(fd, callback, *args) - poll for writeability, getsockopt, done TBH, I'm not 100% convinced of the need for add_connector(), but Richard Oudkerk claims that it is needed for Windows. (OTOH if WSAPoll() is too broken to bother, maybe we don't need it. It's a bit of a nuisance because code that uses add_writer() instead works just fine on UNIX but would be subtly broken on Windows, leading to disappointments when porting apps to Windows. I'd rather have things break on all platforms, or on none...) > * remove_connector(fd) - same as with all other remove-by-fd methods > > As Transport are part of the PEP - some more: > > EventLoop > * create_transport(protocol_factory, host, port, **kwargs) > kwargs requires "local" - local address as tuple like > ('fe80::14ad:1680:54e1:6a91%eth0',0) - so you can bind when using ipv6 > link local scope. > or ('192.168.2.1',5060) - bind local port for udp Not sure I understand. What socket.connect() (or other API) call parameters does this correspond to? What can't expressed through the host and port parameters? > * start_serving(protocol_factory, host, port, **kwds) > what is the behaviour for SOCK_DGRAM - does this multiplex sessions > based on src host/port / dst host/port - I'd love it. TBH I haven't thought much about datagram transports. It's been years since I used UDP. I guess the API may have to distinguish between connected and unconnected UDP. I think the transport/protocol API will be different than for SOCK_STREAM: for every received datagram, the transport will call protocol.datagram_received(data, address), (the address will be a dummy for connected use) and to send a datagram, the protocol must call tranport.write_datagram(data, [address]), which returns immediately. Flow control (if supported) should work the same as for streams: if the transport finds its buffers exceed a certain limit it will tell the protocol to back off by calling protocol.pause(). > Handler: > Requiring 2 handlers for every active connection r/w is highly ineffective. How so? What is the concern? The actions of the read and write handler are typically completely different, so the first thing the handler would have to do is to decide whether to call the read or the write code. Also, depending on flow control, only one of the two may be active. If you are after minimizing the number of records passed to [e]poll or kqueue, you can always collapse the handlers at that level and distinguish between read/write based on the mask and recover the appropriate user-level handler from the readers/writers array (and this is what Tulip's epoll pollster class does). PS. Also check out this issue, where an implementation of *just* Tulip's pollster class for the stdlib is being designed: http://bugs.python.org/issue16853; also check out the code reviews here; http://bugs.python.org/review/16853/ > I'd prefer to be able to create a Handler from a loop. > Handler = EventLoop.create_handler(socket, callback, events) > and have the callback called with the returned events, so I can > multiplex read/write op in the callback. Hm. See above. > Additionally, I can .stop() the handler without having to know the fd, > .stop() the handler, change the events the handler is looking for, > restart the handler with .start(). > In your proposal, I'd create a new handler every time I want to sent > something, poll for readability - discard the handler when I'm done, > create a new one for the next sent. The questions are, does it make any difference in efficiency (when using Python -- the performance of the C API is hardly relevant here), and how often does this pattern occur. > Timers: > Not in the PEP - re-arming a timer > lets say I want to do something if nothing happens for 5 seconds. > I create a timer call_later(5.,cb), if something happens, I need to > cancel the timer and create a new one. If there was a Timer: > Timer.stop() > Timer.set(5) > Timer.start() Actually it's one less call using the PEP's proposed API: timer.cancel() timer = loop.call_later(5, callback) Which of the two idioms is faster? Who knows? libev's pattern is probably faster in C, but that has little to bear on the cost in Python. My guess is that the amount of work is about the same -- the real cost is that you have to make some changes the heap used to keep track of all timers in the order in which they will trigger, and those changes are the same regardless of how you style the API. > Transports: > I think SSL should be a Protocol not a transport - implemented using BIO pairs. > If you can chain protocols, like Transport / ProtocolA / ProtocolB you can have > TCP / SSL / HTTP as https or TCP / SSL / SOCKS / HTTP as https via > ssl enabled socks proxy without having to much problems. Another > example, shaping a connection TCP / RATELIMIT / HTTP. Interesting idea. This may be up to the implementation -- not every implementation may have BIO wrappers available (AFAIK the stdlib doesn't), so the stackability may not be easy to implement everywhere. In any case, when you stack things like this, the stack doesn't look like transport<-->protocol<-->protocol<-->protocol; rather, it's A<-->B<-->C<-->D where each object has a "left" and a "right" API. Each arrow connects the "transport (right) half" of the object on its left (e.g. A) to the "protocol (left) half" of the object on the arrow's right (e.g. B). So maybe we can visualise this as T1 <--> P2:T2 <--> P3:T3 <--> P4. > Having SSL as a Protocol allows closing the SSL connection without > closing the TCP connection, re-using the TCP connection, re-using a > SSL session cookie during reconnect of the SSL Protocol. That seems a pretty esoteric use case (though given your background in honeypots maybe common for you :-). It also seems hard to get both sides acting correctly when you do this (but I'm certainly no SSL expert -- I just want it supported because half the web is inaccessible these days if you don't speak SSL, regardless of whether you do any actual verification). All in all I think that stackable transports/protocols are mostly something that is enabled by the interfaces defined here (the PEP takes care not to specify any base classes from which you must inherit -- you must just implement certain methods, and the rest is duck typing) but otherwise does not concern the PEP much. The only concern I have, really, is that the PEP currently hints that both protocols and transports might have pause() and resume() methods for flow control, where the protocol calls transport.pause() if protocol.data_received() is called too frequently, and the transport calls protocol.pause() if transport.write() has buffered more data than sensible. But for an object that is both a protocol and a transport, this would make it impossible to distinguish between pause() calls by its left and right neighbors. So maybe the names must differ. Given the tendency of transport method names to be shorter (e.g. write()) vs. the longer protocol method names (data_received(), connection_lost() etc.), perhaps it should be transport.pause() and protocol.pause_writing() (and similar for resume()). > * reconnect() - I'd love to be able to reconnect a transport But what does that mean in general? It depends on the protocol (e.g. FTP, HTTP, IRC, SMTP) how much state must be restored/renegotiated upon a reconnect, and how much data may have to be re-sent. This seems a higher-level feature that transports and protocols will have to implement themselves. > * timers - Transports need timers I think you mean timeouts? > * dns-resolve-timeout - dns can be slow > * connecting-timeout - connecting can take too much time, more than > we want to wait > * idle-timeout ( no action on the connection for a while ) - call > protocol.timeout_idle() > * sustain-timeout ( max session time ) - close() transport > * ssl-handshake-timeout ( in case ssl is a Transport ) - close transport > * close-timeout (shutdown is async) - close transport hard > * reconnect-timeout - (wait some seconds before reconnecting) - > reconnect connection This is an interesting point. I think some of these really do need APIs in the PEP, others may be implemented using existing machinery (e.g. call_later() to schedule a callback that calls cancel() on a task). I've added a bullet on this to Open Issue. > Now, in case we connect to a host by name, and have multiple addresses > resolved, and the first connection can not be established, there is no > way to 'reconnect()' - as the protocol does not yet exist. Twisted suggested something here which I haven't implemented yet but which seems reasonable -- using a series of short timeouts try connecting to the various addresses and keep the first one that connects successfully. If multiple addresses connect after the first timeout, too bad, just close the redundant sockets, little harm is done (though the timeouts should be tuned that this is relatively rare, because a server may waste significant resources on such redundant connects). > For almost all the timeouts I mentioned - the protocol needs to take > care - so the protocol has to exist before the connection is > established in case of outbound connections. I'm not sure I follow. Can you sketch out some code to help me here? ISTM that e.g. the DNS, connect and handshake timeouts can be implemented by the machinery that tries to set up the connection behind the scenes, and the user's protocol won't know anything of these shenanigans. The code that calls create_transport() (actually it'll probably be renamed create_client()) will just get a Future that either indicates success (and then the protocol and transport are successfully hooked up) or an error (and then no protocol was created -- whether or not a transport was created is an implementation detail). > In case aconnection is lost and reconnecting is required - > .reconnect() is handy, so the protocol can request reconnecting. I'd need more details of how you would like to specify this. > As this does not work with the current Protocols callbacks I propose > Protocols.connection_established() therefore. How does this differ from connection_made()? (I'm trying to follow Twisted's guidance here, they seem to have the longest experience doing these kinds of things. When I talked to Glyph IIRC he was skeptical about reconnecting in general.) > Protocols > I'd outline protocol_factory can be a instance of a class, which can > set specific parameters for 'things' > class p: > def __init__(self, a=1,b=2,c=3): > self.a = a > self.b = b > self.c = c > def __call__(self): > return p(a=self.a, b=self.b, c=self.c) > def ... all protocol methods ...: > pass > > EventLoop.start_serving(p(a=5,b=7), ...) > EventLoop.start_serving(p(a=9,b=4), ...) > > Same Protocol, different parameters for it. No such helper method (or class) is needed. You can use a lambda or functools.partial for the same effect. I'll add a note to the PEP to remind people of this. > + connection_established() > + timeout_dns() > + timeout_idle() > + timeout_connecting() Signatures please? > * data_received(data) - if it was possible to return the number of > bytes consumed by the protocol, and have the Transport buffer the rest > for the next io in call, one would avoid having to do this in every > Protocol on it's own - learned from experience. Twisted has a whole slew of protocol implementation subclasses that implement various strategies like line-buffering (including a really complex version where you can turn the line buffering on and off) and "netstrings". I am trying to limit the PEP's size by not including these, but I fully expect that in practice a set of useful protocol implementations will be created that handles common cases. I'm not convinced that putting this in the transport/protocol interface will make user code less buggy: it seems easy for the user code to miscount the bytes or not return a count at all in a rarely taken code branch. > * eof_received()/connection_lost(exc) - a connection can be closed > clean recv()=0, unclean recv()=-1, errno, SIGPIPE when writing and in > case of SSL even more, it is required to distinguish. Well, this is why eof_received() exists -- to indicate a clean close. We should never receive SIGPIPE (Python disables this signal, so you always get the errno instead). According to Glyph, SSL doesn't support sending eof, so you have to use Content-length or a chunked encoding. What other conditions do you expect from SSL that wouldn't be distinguished by the exception instance passed to connection_lost()? > + nextlayer_is_empty() - called if the Transport (or underlying > Protocol in case of chaining) write buffer is empty - Imagine an http > server sending a 1GB file, you do not want to sent 1GB at once - as > you do not have that much memory, but get a callback if the transport > done sending the chunk you've queued, so you can send the next chunk > of data. That's what the pause()/resume() flow control protocol is for. You read the file (presumably it's a file) in e.g. 16K blocks and call write() for each block; if the transport can't keep up and exceeds its buffer space, it calls protocol.pause() (or perhaps protocol.pause_writing(), see discussion above). > Next, what happens if a dns can not be resolved, ssl handshake (in > case ssl is transport) or connecting fails - in my opinion it's an > error the protocol is supposed to take care of > + error_dns > + error_ssl > + error_connecting The future returned by create_transport() (aka create_client()) will raise the exception. > I'm not that much into futures - so I may have got some things wrong. No problem. You may want to read PEP 3148, it explains Futures and much of that explanation remains valid; just in PEP 3156 to wait for a future you must use "yield from <future>". -- --Guido van Rossum (python.org/~guido) [View Less]

4 8

FYI - wiki.python.org compromised
by Brian Curtin Jan. 8, 2013

Jan. 8, 2013

On December 28th, an unknown attacker used a previously unknown remote code exploit on http://wiki.python.org/. The attacker was able to get shell access as the "moin" user, but no other services were affected. Some time later, the attacker deleted all files owned by the "moin" user, including all instance data for both the Python and Jython wikis. The attack also had full access to all MoinMoin user data on all wikis. In light of this, the Python Software Foundation encourages all wiki users … [View More]

1 0

Vigil
by Mark Adam Jan. 6, 2013

Jan. 6, 2013

There's an interesting python "variant" (more of an overlay actually) that is rather intriguing on github -- Vigil: a truly safe progamming language. >From the readme: "Infinitely more important than mere syntax and semantics are its addition of supreme moral vigilance. This is similar to contracts, but less legal and more medieval." http://github.com/munificent/vigil Mark

3 3

Re: [Python-ideas] Updated PEP 432: Simplifying the CPython update sequence
by Nick Coghlan Jan. 6, 2013

Jan. 6, 2013

Gah, the PEP number in the subject should, of course, be 432 (not 342). Cheers, Nick. -- Nick Coghlan | ncoghlan(a)gmail.com | Brisbane, Australia

4 5

Re: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted
by Guido van Rossum Jan. 4, 2013

Jan. 4, 2013

On Fri, Jan 4, 2013 at 2:38 PM, Dustin Mitchell <djmitche(a)gmail.com> wrote: > As the maintainer of a pretty large, complex app written in Twisted, I think > this is great. I look forward to a future of being able to select from a > broad library of async tools, and being able to write tools that can be used > outside of Twisted. Thanks. Me too. :-) > Buildbot began, lo these many years ago, doing a lot of things in memory on > on local disk, neither of which require … [View More]asynchronous IO. So a lot of API > methods did not originally return Deferreds. Those methods are then used by > other methods, many of which also do not return Deferreds. Now, we want to > use a database backend, and parallelize some of the operations, meaning that > the methods need to return a Deferred. Unfortunately, that requires a > complete tree traversal of all of the methods and methods that call them, > rewriting them to take and return Deferreds. There's no "halfway" solution. > This is a little easier with generators (@inlineCallbacks), since the syntax > doesn't change much, but it's a significant change to the API (in fact, this > is a large part of the reason for the big rewrite for Buildbot-0.9.x). > > I bring all this up to say, this PEP will introduce a new "kind" of method > signature into standard Python, one which the caller must know, and the use > of which changes the signature of the caller. That can cause sweeping > changes, and debugging those changes can be tricky. Yes, and this is the biggest unproven point of the PEP. (The rest is all backed by a decade or more of experience.) > Two things can help: > > First, `yield from somemeth()` should work fine even if `somemeth` is not a > coroutine function, and authors of async tools should be encouraged to use > this form to assist future-compatibility. Second, `somemeth()` without a > yield should fail loudly if `somemeth` is a coroutine function. Otherwise, > the effects can be pretty confusing. That would be nice. But the way yield from and generators work, that's hard to accomplish without further changes to the language -- and I don't want to have to change the language again (at least not immediately -- maybe in a few releases, after we've learned what the real issues are). The best I can do for the first requirement is to define @coroutine in a way that if the decorated function isn't a generator, it is wrapped in one. For the second requirement, if you call somemeth() and ignore the result, nothing happens at all -- this is indeed infuriating but I see no way to change this.(*) If you use the result, well, Futures have different attributes than most other objects so hopefully you'll get a loud AttributeError or TypeError soon, but of course if you pass it into something else which uses it, it may still be difficult to track. Hopefully these error messages provide a hint: >>> f.foo Traceback (most recent call last): File "<stdin>", line 1, in <module> AttributeError: 'Future' object has no attribute 'foo' >>> f() Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: 'Future' object is not callable >>> (*) There's a heavy gun we might use, but I would make this optional, as a heavy duty debugging mode only. @coroutine could wrap generators in a lightweight object with a __del__ method and an __iter__ method. If __del__ is called before __iter__ is ever called, it could raise an exception or log a warning. But this probably adds too much overhead to have it always enabled. > In http://code.google.com/p/uthreads, I accomplished the latter by taking > advantage of garbage collection: if the generator is garbage collected > before it's begun, then it's probably not been yielded. This is a bit > gross, but good enough as a debugging technique. Eh, yeah, what I said. :-) > On the topic of debugging, I also took pains to make sure that tracebacks > looked reasonable, filtering out scheduler code[1]. I haven't looked > closely at Tulip to see if that's a problem. Most of the "noise" in the > tracebacks came from the lack of 'yield from', so it may not be an issue at > all. One of the great advantages of using yield from is that the tracebacks automatically look nice. > Dustin > > [1] > http://code.google.com/p/uthreads/source/browse/trunk/uthreads/core.py#253 -- --Guido van Rossum (python.org/~guido) [View Less]

1 0

Re: [Python-ideas] PEP 3156 - Asynchronous IO Support Rebooted
by Dustin J. Mitchell Jan. 4, 2013

Jan. 4, 2013

As the maintainer of a pretty large, complex app written in Twisted, I think this is great. I look forward to a future of being able to select from a broad library of async tools, and being able to write tools that can be used outside of Twisted. Buildbot began, lo these many years ago, doing a lot of things in memory on on local disk, neither of which require asynchronous IO. So a lot of API methods did not originally return Deferreds. Those methods are then used by other methods, many of … [View More]

1 0

Re: [Python-ideas] PEP 3156 - Asynchronous IO Support Rebooted
by Guido van Rossum Jan. 4, 2013

Jan. 4, 2013

[Markus sent this to me off-list, but agreed to me responding on-list, quoting his entire message.] On Wed, Dec 26, 2012 at 2:38 PM, Markus <nepenthesdev(a)gmail.com> wrote: > Hi, Hi Markus, I don't believe we've met before, have we? It would probably help if you introduced yourself and your experience, since our past experiences color our judgment. > as I've been waiting for this to happen, I decided to speak up. > While I really look forward to this, I disagree with the PEP.… [View More] Heh, we can't all agree on everything. :-) > First shoot should be getting a well established event loop into python. Perhaps. What is your definition of an event loop? > libev is great, it takes care of operating system specialities, and > only does a single job, providing an event loop. It is also written for C, and I presume much of its API design was influenced by the conventions and affordabilities of that language. > This event loop can take care of timers, sockets and signals, But sockets are not native on Windows, and I am making some effort with PEP 3156 to efficiently support higher-level abstractions without tying them to sockets. (The plan is to support IOCP on Windows. The previous version of Tulip already had a branch that did support that, as a demonstration of the power of this abstraction.) > pyev, a > great python wrapper for libev already provides this simple eventing > facility in python. But, being a libev wrapper, it is likely also strongly influenced by C. > In case you embed python in a c program, the libev default loop of the > python code and c code can even be shared, providing a great amount of > flexibility. Only if the C code also uses libev, of course. But C programs may use other event mechanisms -- e.g. AFAIK there are alternatives to libev (during the early stages of Tulip development I chatted a bit with one of the original authors of libevent, Niels Provos, and I believe there's also something called libuv), and GUI frameworks (e.g. X, Qt, Gtk, Wx) tend to have their own event loop. PEP 3156 is designed to let alternative *implementations* of the same *interface* be selected at run time. Hopefully it is possible to provide a conforming implementation using libev -- then your goal (smooth interoperability with C code using libev) is obtained. It's possible that in order to do that the PEP 3156 interface may have to be refactored into separate pieces. The Tulip implementation already has separate "pollster" implementations (which concern themselves *only* with polling for I/O using select, poll, or other alternatives). It probably makes sense to factor the part that implements transports out as well. However, the whole point of including transports and protocols (and futures) in the PEP is that some platforms may want to implement the same high-level API (e.g. create a transport that connects to a certain host/port) using a different approach altogether, e.g. on Windows the transport might not even use sockets. OTOH on UNIX it may be possible to add file descriptors representing pipes and pseudo-ttys. > libev is great as it is small - it provides exactly what's required, > and nothing beyond. Depending on your requirements. :-) > getaddrinfo/getnameinfo/create_transport are out of scope from a event > loop point of view. > This functionality already exists in python, it just does not use a > event loop and is blocking, as every other io related api. It wasn't random to add these. The "event loop" in PEP 3156 provides abstractions that leave the platform free to implement connections using the appropriate native constructs without letting those constructs "leak" into the application -- after all, whether you're on UNIX or on Windows, a TCP connection represents the same abstraction, but the network stack may have a very different interface. > I'd propose not to replicate the functionality in the event loop > namespace, but to extend the existing implementations - by allowing to > provide an event loop/callback/ctx as optional args which get used. That's an interface choice that I would regret (I really don't like writing code using callbacks). (It would also be harder to implement initially as a 3rd party framework. At the lowest level, no changes to Python itself are needed -- it already supports non-blocking sockets, for example. But adding optional callbacks to existing low-level APIs would require changes throughout the stdlib.) > If you specify something like pyev as PEP, you can still come up with > another PEP which defines the semantics for upper layer protocols like > udp/tcp on IPv4/6, which can be used to take care of dns and > 'echo-server-connections'. I could split up the PEP, but that wouldn't really change anything, since to me it is still a package deal. I am willing to put an effort into specifying a low-level event loop because I know that I can still write high-level code which is (mostly) free of callbacks, using futures, tasks and the yield-from construct. And in order to do that I need a minimum set of high-level abstractions such as getaddrinfo() and transport creation (the exact names of the transport creation methods are still under debate, as are the details of their signatures, but the need for them is established without a doubt in my mind). I note that the stdlib socket module has roughly the same set of abstractions bundled together: - socket objects - getaddrinfo(), getnameinfo() - create_connection() - the makefile() methods on socket objects, which create buffered streams PEP 3156 offers alternatives for all of these, using higher-level abstractions that have been developed and proven in practice by Twisted, *and* offers a path to interop to frameworks that previously couldn't very well interoperate -- Twisted, Tornado, and others have traditionally been pretty segregated, but with PEP 3156 they can interoperate both through the event loop and through Futures (which are friendly both to a callback style and to yield-from). > Anyway, I really hope you'll have a look on libev and pyev, both is > great and well tested software and may give you an idea what people > who dedicate themselves to event loops came up with already in terms > of names, subclassing, requirements, guarantees and workarounds for > platform specific failures (kqueue, epoll ...). I will certainly have a look! I am not so concerned about naming (it seems inevitable that everyone uses somewhat different terminology anyway, and it is probably better not to reuse terms when the meaning is different), but I do like to look at guarantees (or the absence thereof!) and best practices for dealing with the differences between platforms. > http://pod.tst.eu/http://cvs.schmorp.de/libev/ev.pod > http://code.google.com/p/pyev/ > > All together, I'd limit the scope of the PEP to the API of the event > loop, just focussing on io/timers/signals and propose to extend > existing API to be usable with an event loop, instead of replicating > it. You haven't convinced me about this. However, you can help me by comparing the event loop part of PEP 3156 (ignoring anything that returns or takes a Future) to libev and pointing out things (either specific APIs or certain guarantees or requirements) that would be hard to implement using libev, as well as useful features in libev that you think every event loop should have. > For naming I'd prefer 'watcher' over 'Handler'. Hm, 'watcher' to me sounds more active than the behavior I have in mind for this class. It is just a reification of a specific function and some arguments to pass to it, with the ability to cancel the call altogether. Thanks for writing! -- --Guido van Rossum (python.org/~guido) [View Less]

1 0

Identity dicts and sets
by Serhiy Storchaka Jan. 3, 2013

Jan. 3, 2013

I propose to add new standard collection types: IdentityDict and IdentitySet. They are almost same as ordinal dict and set, but uses identity check instead of equality check (and id() or hash(id()) as a hash). They will be useful for pickling, for implementing __sizeof__() for compound types, and for other graph algorithms. Of course, they can be implemented using ordinal dicts: IdentityDict: key -> value as a dict: id(key) -> (key, value) IdentitySet as a dict: id(value) -… [View More]

7 12

Re: [Python-ideas] Identity dicts and sets
by Serhiy Storchaka Jan. 3, 2013

Jan. 3, 2013

середа 02 січень 2013 21:43:47 Eli Bendersky ви написали: > I agree that the data structures may be useful, but is there no way to some > allow the customization of existing data structures instead, without losing > performance? It's a shame to have another kind of dict just for this > purpose. What interface for the customization is possible? Obviously, a dict constructor can't have a special keyword argument.

6 9

Preventing out of memory conditions
by Max Moroz Jan. 3, 2013

Jan. 3, 2013

Sometimes, I have the flexibility to reduce the memory used by my program (e.g., by destroying large cached objects, etc.). It would be great if I could ask Python interpreter to notify me when memory is running out, so I can take such actions. Of course, it's nearly impossible for Python to know in advance if the OS would run out of memory with the next malloc call. Furthermore, Python shouldn't guess which memory (physical, virtual, etc.) is relevant in the particular situation (for instance,… [View More]

7 8