Mailman 3 September 2010 - Python-ideas

Re: [Python-ideas] [Python-Dev] Python needs a standard asynchronous return object
by James Yonan 23 Sep '10

I think that Glyph hit the nail on the head when he said that "you can go from any arbitrary Future to a full-featured Deferred, but not the other way around."

This is exactly my concern, and the reason why I think it's important for Python to standardize on an async result type that is sufficiently general that it can accommodate the different kinds of async semantics in common use in the Python world today.

If you don't think this is a problem, just Google for "twisted vs. tornado". While the debate is sometimes passionate and rude, it points to the fragmentation that has occurred in the Python async space due to the lack of direction from the standard library. And there's a real cost to this fragmentation -- it's not easy to build an application that uses different async frameworks when there's no standardized result object or reactor model.

My concern is that PEP 3148 was really designed for the purpose of thread and process pooling, and that the Future object is designed with the minimum functionality required to achieve this end. The problem is that the Future object starts to look like a stripped-down version of a Twisted Deferred. And that begs the question of why we are standardizing on the special case and not the general case?

Wouldn't it be better to break this into two problems:

* Develop a full-featured standard async result type and reactor model to facilitate interoperability of different async libraries. This would consist of a standard async result type and an abstract base class for a reactor model.

* Let PEP 3148 focus on the problem of thread and process pooling and leverage on the above async result type.

The semantics that a general async type should support include:

1. Semantics that allow you to define a callback channel for results and optionally a separate channel for exceptions as well.

2. Semantics that offer the flexibility of working with async results at the callback level or at the generator level (having a separate channel for exceptions makes it easy for the generator decorator implementation (that facilitates "yield function_returning_async_object()") to dispatch exceptions into the caller).

3. Semantics that can easily be used to pass results and exceptions back from thread or process pools.

4. Semantics that allow for aggregate processing of parallel asynchronous results, such as "fire async result when all of the async results in an async set have fired" or "fire async result when the first result from an async set has fired."

Deferreds presently support all of the above. My point here is not so much that Deferreds should be the standard, but that whatever standard is chosen, that the semantics be general enough that different async Python libraries/platforms can interoperate.

James

> Thanks for the ping about this (I don't think I subscribe to python-ideas, so someone may have to moderate my post in). Sorry for the delay in responding, but I've been kinda busy and cooking up these examples took a bit of thinking.
>
> And thanks, James, for restarting this discussion. I obviously find it interesting :).
>
> I'm going to mix in some other stuff I found on the web archives, since it's easiest just to reply in one message. I'm sorry that this response is a bit sprawling and doesn't have a single clear narrative; the thread thus far didn't seem to lend it to one.
>
> For those of you who don't want to read my usual novel-length post, you can probably stop shortly after the end of the first block of code examples.
>
> On Sep 11, 2010, at 10:26 PM, Guido van Rossum wrote:
>
>>>> although he didn't say what
>>>> deferreds really added beyond what futures provide, and why the
>>>> "add_done_callback" method isn't adequate to provide interoperability
>>>> between futures and deferreds (which would be odd, since Brian made
>>>> changes to that part of PEP 3148 to help with that interoperability
>>>> after discussions with Glyph).
>>>>
>>>> Between PEP 380 and PEP 3148 I'm not really seeing a lot more scope
>>>> for standardisation in this space though.
>>>>
>>>> Cheers,
>>>> Nick.
>>>
>>> That was my initial reaction as well, but I'm more than open to
>>> hearing from Jean Paul/Glyph and the other twisted folks on this.
>
>> But thinking about this more I don't know that it will be easy to mix
>> PEP 3148, which is solidly thread-based, with a PEP 342 style
>> scheduler (whether or not the PEP 380 enhancements are applied, or
>> even PEP 3152). And if we take the OP's message at face value, his
>> point isn't so much that Twisted is great, but that in order to
>> benefit maximally from PEP 342 there needs to be a standard way of
>> using callbacks. I think that's probably true. And comparing the
>> blog's examples to PEP 3148, I find Twisted's terminology rather
>> confusing compared to the PEP's clean Futures API (where IMO you can
>> ignore almost everything except result()).
>
> That blog post was written to demonstrate why programs using generators are "... far easier to read and write ..." than ones using Deferreds, so it stands to reason it would choose an example where that helps :).
>
> When you want to write systems that manage varying levels of parallelism within a single computation, generators can start to get pretty hairy and the "normal" Deferred way of doing things looks more straightforward.
>
> Thinking in terms of asynchronicity is tricky, and generators can be a useful tool for promoting that understanding, but they only make it superficially easier.
> For example:
>
>>>> def serial():
>>>>     results = set()
>>>>     for x in ...:
>>>>         results.add((yield do_something_async(x)))
>>>>     return results
>
> If you're writing an application whose parallelism calls for an asynchronous approach, after all, you presumably don't want to be standing around waiting for each network round trip to complete. How do you re-write this so that there are always at least N outstanding do_something_async calls running in parallel?
>
> You can sorta do it like this:
>
>>>> def parallel(N):
>>>>     results = set()
>>>>     outstanding = []
>>>>     for x in ...:
>>>>         if len(outstanding) > N:
>>>>             results.add((yield outstanding.pop(0)))
>>>>         else:
>>>>             outstanding.append(do_something_async(x))
>
> but that will always block on one particular do_something_async, when you really want to say "let me know when any outstanding call is complete". So I could handwave about 'yield any_completed(outstanding)'...
>
>>>> def parallel(N):
>>>>     results = set()
>>>>     outstanding = set()
>>>>     for x in ...:
>>>>         if len(outstanding) > N:
>>>>             results.add((yield any_completed(outstanding)))
>>>>         else:
>>>>             outstanding.add(do_something_async(x))
>
> but that just begs the question of how you implement any_completed(), and I can't think of a way to do that with generators, without getting into the specifics of some Deferred-or-Future-like asynchronous result object.
> You could implement such a function with such primitives, and here's what it looks like with Deferreds:
>
>>>> def any_completed(setOfDeferreds):
>>>>     d = Deferred()
>>>>     called = []
>>>>     def fireme(result, whichDeferred):
>>>>         if not called:
>>>>             called.append(True)
>>>>             setOfDeferreds.remove(whichDeferred)
>>>>             d.callback(result)
>>>>         return result
>>>>     for subd in setOfDeferreds:
>>>>         subd.addBoth(fireme, subd)
>>>>     return d
>
> Here's how you do the top-level task in Twisted, without generators, in the truly-parallel fashion (keep in mind this combines the functionality of 'any_completed' and 'parallel', so it's a bit shorter):
>
>>>> def parallel(N):
>>>>     ds = DeferredSemaphore(N)
>>>>     l = []
>>>>     def release(result):
>>>>         ds.release()
>>>>         return result
>>>>     def after_acquire(sem, it):
>>>>         return do_something_async(it)
>>>>     for x in ...:
>>>>         l.append(ds.acquire().addCallback(after_acquire, x).addBoth(release))
>>>>     return gatherResults(l).addCallback(set)
>
> Some informal benchmarking has shown this method to be considerably faster (on the order of 1/2 to 1/3 as much CPU time) than at least our own inlineCallbacks generator-scheduling method. Take this with the usual fist-sized grain of salt that you do any 'informal' benchmarks, but the difference is significant enough that I do try to refactor into this style in my own code, and I have seen performance benefits from doing this on more specific benchmarks.
>
> This is all untested, and that's far too many lines of code to expect to work without testing, but hopefully it gives a pretty good impression of the differences in flavor between the different styles.
>
>> Yeah, please do explain why Twisted has so much machinery to handle exceptions?
>
> There are a lot of different implied questions here, so I'll answer a few of those.
>
> Why does twisted.python.failure exist?
> The answer to that is that we wanted an object that represented an exception as raised at a particular point, associated with a particular stack, that could live on without necessarily capturing all the state in that stack. If you're going to report failures asynchronously, you don't necessarily want to hold a reference to every single thing in a potentially giant stack while you're waiting to send it to some network endpoint. Also, in 1.5.2 we had no way of chaining exceptions, and this code is that old. Finally, even if you can chain exceptions, it's a serious performance hit to have to re-raise and re-catch the same exception 4 or 5 times in order to translate it or handle it at many different layers of the stack, so a Failure is intended to encapsulate that state such that it can just be returned, in performance-sensitive areas. (This is sort of a weak point though, since the performance of Failure itself is so terrible, for unrelated reasons.)
>
> Why is twisted.python.failure such a god damned mess? The answer to that is ... uh, sorry. Yes, it is. We should clean it up. It was written a long time ago and the equivalent module now could be _much_ shorter, simpler, and less of a performance problem. It just never seems to be the highest priority. Maybe after we're done porting to py3 :). My one defense here is that it's still a slight improvement over the stdlib 'traceback' module ;-).
>
> Why do Deferreds have an errback chain rather than just handing you an exception object in the callback chain? Basically, this is for the same reason that Python has exceptions instead of just making you check return codes. We wanted it to be easy to say:
>
>>>> d = getPage("http://...")
>>>> def ok(page):
>>>>     doSomething(...)
>>>> d.addCallback(ok)
>
> and know that the argument to 'ok' would always be what getPage promised (you don't need to typecheck it for exception-ness) and the default error behavior would be to simply bail out with a traceback, not to barrel through your success-path code wreaking havoc.
>
>> ISTM that the main difference is that add_done_callback() isn't meant for callbacks that return a value.
>
> add_done_callback works fine with callbacks that return a value. If it didn't, I'd be concerned, because then it would have the barrel-through-the-success-path flaw. But, I assume the idiomatic asynchronous-code-using-Futures would look like this:
>
>>>> f = some_future_thing(...)
>>>> def my_callback(future):
>>>>     result = future.result()
>>>>     do_something(result)
>>>> f.add_done_callback(my_callback)
>
> This is one extra line of code as compared to the Twisted version, and chaining involves a bit more gymnastics (somehow creating more futures to return further up the stack, I guess, I haven't thought about it too hard), but it does allow you to handle exceptions with a simple 'except:', rather than calling some exception-handling methods, so I can see why some people would prefer it.
>
>> Maybe it's possible to write a little framework that lets you create Futures using either threads, processes (both supported by PEP 3148) or generators. But I haven't tried it. And maybe the need to use 'yield' for everything that may block when using generators, but not when using threads or processes, will make this awkward.
>
> You've already addressed the main point that I really wanted to mention here, but I'd like to emphasize it. Blocking and not-blocking are fundamentally different programming styles, and if you sometimes allow blocking on asynchronous results, that means you are effectively always programming in the blocking-and-threaded style and not getting much benefit from the code which does choose to be politely non-blocking.
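Glyph's remark that chaining Futures "involves a bit more gymnastics (somehow creating more futures to return further up the stack)" can be made concrete with the stdlib `concurrent.futures` module. In this sketch the helper `then` and the `square` task are hypothetical names. The `concurrent.futures` documentation reserves bare `Future()` construction and `set_result()`/`set_exception()` for executor implementations and tests, so read this as an illustration of the idea rather than a recommended API.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def then(source, fn):
    """Return a new Future holding fn(source.result()).

    Hypothetical helper: if `source` failed, or `fn` raises, the
    exception is stored on the returned Future instead of a result,
    so errors travel down the chain rather than being swallowed.
    """
    chained = Future()

    def on_done(done):
        try:
            value = fn(done.result())  # re-raises source's exception, if any
        except Exception as exc:
            chained.set_exception(exc)
        else:
            chained.set_result(value)

    source.add_done_callback(on_done)
    return chained


def square(x):
    return x * x


with ThreadPoolExecutor(max_workers=2) as pool:
    # 3 -> 9 -> 10 -> "10", each step a new Future further down the chain.
    ok = then(then(pool.submit(square, 3), lambda r: r + 1), str)
    # int() fails in the worker; the lambda never runs.
    bad = then(pool.submit(int, "not a number"), lambda r: r + 1)
    print(ok.result())  # 10
    try:
        bad.result()
    except ValueError as exc:
        print("error propagated:", exc)
```

Every stage is a plain `add_done_callback()`. What the caller writes by hand here, a Deferred's callback/errback chain does internally.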
>
> I was somewhat pleased with the changes made to the Futures PEP because you could use them as an asynchronous result, and have things that implemented the Future API but raised an exception if you tried to wait on them. That would at least allow some layer of stdlib compatibility. If you are disciplined and careful, this would let you write async code which used a common interoperability mechanism, and if you weren't careful, it would blow up when you tried to use it the wrong way.
>
> But - and I am guessing that this is the main thrust of this discussion - I do think that having Deferred in the standard library would be much, much better if we can do that.
>
>> So maybe we'll be stuck with at least two Future-like APIs: PEP 3148 and something else, generator-based.
>
> Having something "generator-based" is, in my opinion, an abstraction inversion. The things which you are yielding from these generators are asynchronous results. There should be a specific type for asynchronous results which can be easily interacted with. Generators are syntactic sugar for doing that interaction in a way which doesn't involve defining tons of little functions. This is useful, and it makes the concept more accessible, so I don't say "just" syntactic sugar: but nevertheless, the generators need to be 'yield'ing something, and the type of thing that they're yielding is a Deferred-or-something-like-it.
>
> I don't think that this is really two 'Future-like APIs'. At least, they're not redundant, any more than having both socket.makefile() and socket.recv() is redundant.
>
> If Future had a deferred() method rather than an add_done_callback() method, then it would always be very clear whether you had a synchronous-but-possibly-not-ready or a purely-asynchronous result. Although it would be equally easy to just have a function that turned a Future into a Deferred by calling add_done_callback().
> You can go from any arbitrary Future to a full-featured Deferred, but not the other way around.
>
>> Or maybe PEP 3152.
>
> I don't like PEP 3152 aesthetically on many levels, but I can't deny that it would do the job. 'cocall', though, really? It would be nice if it read like an actual word, e.g. "yield to" or "invoke" or even just "call" or something.
>
> In another message, where Guido is replying to Antoine:
>
>>> I think the main reason, though, that people find Deferreds inconvenient is that they force you to think in terms of asynchronicity (...)
>>
>> Actually I think the main reason is historic: Twisted introduced callback-based asynchronous (thread-less) programming when there was no alternative in Python, and they invented both the mechanisms and the terminology as they were figuring it all out. That is no mean feat. But with PEP 342 (generator-based coroutines) and especially PEP 380 (yield from) there *is* an alternative, and while Twisted has added APIs to support generators, it hasn't started to deprecate its other APIs, and its terminology becomes hard to follow for people (like me, frankly) who first learned this stuff through PEP 342.
>
> I really have to go with Antoine on this one: people were confused about Deferreds long before PEP 342 came along :). Given that JavaScript environments have mostly adopted the Twisted terminology (oddly, Node.js doesn't, but Dojo and MochiKit both have pretty literal-minded Deferred translations), there are plenty of people who are familiar with the terminology but still get confused.
>
> See the beginning of the message for why we're not deprecating our own APIs.
>
> Once again, sorry for not compressing this down further! If you got this far, you win a prize :).
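Glyph's point that a Future can be turned into a Deferred "by calling add_done_callback()" can be checked with a toy. The sketch below assumes a hypothetical `MiniDeferred` class with separate callback and errback channels, loosely modelled on the semantics James lists. It is not Twisted's Deferred. It also assumes a hypothetical `deferred_from_future()` adapter. The futures are resolved by hand so the run is deterministic. Real code would need locking, because `add_done_callback()` may invoke its callback from a worker thread.

```python
from concurrent.futures import Future


class MiniDeferred:
    """A toy Deferred-like object with separate callback/errback channels.

    Hypothetical sketch for illustration only; it is NOT Twisted's
    Deferred and it is not thread-safe.
    """

    def __init__(self):
        self._chain = []          # pending (callback, errback) pairs
        self._fired = False
        self._result = None
        self._is_error = False

    def add_callbacks(self, callback, errback=None):
        self._chain.append((callback, errback))
        if self._fired:
            self._run_chain()
        return self               # allow d.add_callbacks(...).add_callbacks(...)

    def fire(self, result, is_error=False):
        self._fired = True
        self._result, self._is_error = result, is_error
        self._run_chain()

    def _run_chain(self):
        while self._chain:
            callback, errback = self._chain.pop(0)
            handler = errback if self._is_error else callback
            if handler is None:
                continue          # no handler on this channel: pass through
            try:
                self._result = handler(self._result)
                self._is_error = False
            except Exception as exc:
                self._result, self._is_error = exc, True


def deferred_from_future(future):
    """Adapt a concurrent.futures.Future using only add_done_callback()."""
    d = MiniDeferred()

    def on_done(f):
        exc = f.exception()
        if exc is not None:
            d.fire(exc, is_error=True)
        else:
            d.fire(f.result())

    future.add_done_callback(on_done)
    return d


# Usage: resolve the futures by hand so the example is deterministic.
seen = []
f = Future()
deferred_from_future(f).add_callbacks(lambda r: r * 2).add_callbacks(seen.append)
f.set_result(21)

g = Future()
deferred_from_future(g).add_callbacks(
    lambda r: "unreachable", lambda e: "recovered from %s" % type(e).__name__
).add_callbacks(seen.append)
g.set_exception(KeyError("missing"))

print(seen)  # [42, 'recovered from KeyError']
```

The adapter relies only on the public `add_done_callback()`, `exception()` and `result()` methods of the Future.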

New 3.x restriction in list comprehensions
by Raymond Hettinger 18 Sep '10

In Python2, you can transform:

    r = []
    for x in 2, 4, 6:
        r.append(x*x+1)

into:

    r = [x*x+1 for x in 2, 4, 6]

In Python3, the first still works but the second gives a SyntaxError. It wants the 2, 4, 6 to have parentheses.

The good parts of the change:
+ it matches what genexps do
+ that simplifies the grammar a bit (listcomps bodies and genexp bodies)
+ a listcomp can be reliably transformed to a genexp

The bad parts:
+ The restriction wasn't necessary (we could undo it)
+ It makes 2-to-3 conversion a bit harder
+ It no longer parallels other paren-free tuple constructions:
      return x, y
      yield x, y
      t = x, y
      ...
+ In particular, it no longer parallels regular for-loop syntax

The last part is the one that seems the most problematic. If you write for-loops day in and day out with the unrestricted syntax, you (or at least I) will tend to do the wrong thing when writing a list comprehension. It is a bit jarring to get the SyntaxError when the code looks correct -- it took me a bit of fiddling to figure out what was going on.

My question for the group is whether it would be a good idea to drop the new restriction.

Raymond
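A quick check of the behaviour Raymond describes, runnable on any Python 3 interpreter. `compile()` is used so that the rejected form can be tested without turning the whole snippet into a SyntaxError. The names `rejected`, `r_loop` and `r_comp` are illustrative only.

```python
# Python 3: the unparenthesized form is rejected by the compiler...
try:
    compile("r = [x*x+1 for x in 2, 4, 6]", "<listcomp>", "exec")
except SyntaxError:
    rejected = True
else:
    rejected = False

# ...while the plain for-loop still accepts a bare tuple.
r_loop = []
for x in 2, 4, 6:
    r_loop.append(x*x+1)

# Adding the parentheses gives the equivalent comprehension.
r_comp = [x*x+1 for x in (2, 4, 6)]

print(rejected, r_loop, r_comp)  # True [5, 17, 37] [5, 17, 37]
```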

Cofunctions: It's alive! Its alive!
by Greg Ewing 17 Sep '10

I've been doing some more hacking, and I now have a working implementation of cofunctions, which I'll upload somewhere soon.

I have also translated my yield-from examples to use cofunctions. In the course of doing this, the additional restrictions that cofunctions impose have already proved their worth -- I forgot a cocall, and it clearly told me so and pointed out exactly where it had to go!

-- 
Greg

list.sort with a int or str key
by Daniel Stutzbach 17 Sep '10

list.sort, sorted, and similar methods currently have a "key" argument that accepts a callable. Often, that leads to code looking like this:

    mylist.sort(key=lambda x: x[1])
    myotherlist.sort(key=lambda x: x.length)

I would like to propose that the "key" parameter be generalized to accept str and int types, so the above code could be rewritten as follows:

    mylist.sort(key=1)
    myotherlist.sort(key='length')

I find the latter to be much more readable. As a bonus, performance for those cases would also improve.

-- 
Daniel Stutzbach <http://stutzbachenterprises.com>
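For comparison, the stdlib `operator` module already covers both of these cases without a lambda. `itemgetter(1)` behaves like `lambda x: x[1]`, and `attrgetter('length')` behaves like `lambda x: x.length`. The `Segment` record type below is hypothetical and exists only so that a `.length` attribute is available.

```python
from collections import namedtuple
from operator import attrgetter, itemgetter

# Hypothetical record type with a `length` attribute, for illustration.
Segment = namedtuple("Segment", ["name", "length"])

mylist = [("b", 3), ("a", 1), ("c", 2)]
myotherlist = [Segment("x", 10), Segment("y", 2), Segment("z", 7)]

# Equivalent to key=lambda x: x[1]
mylist.sort(key=itemgetter(1))
# Equivalent to key=lambda x: x.length
myotherlist.sort(key=attrgetter("length"))

print(mylist)                          # [('a', 1), ('c', 2), ('b', 3)]
print([s.name for s in myotherlist])   # ['y', 'z', 'x']
```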

Re: [Python-ideas] [Python-Dev] Python needs a standard asynchronous return object
by Glyph Lefkowitz 16 Sep '10

On Sep 15, 2010, at 6:09 PM, exarkun(a)twistedmatrix.com wrote:

> Glyph meant this:
>
> def parallel(N):
>     ds = DeferredSemaphore(N)
>     l = []
>     for x in ...:
>         l.append(ds.run(do_something_async, x))
>     return gatherResults(l).addCallback(set)
>
> Jean-Paul

I knew it should have looked shorter and sweeter. Thanks.
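The following is a rough `concurrent.futures` counterpart of the same "at most N outstanding" pattern. Here `wait(..., return_when=FIRST_COMPLETED)` plays the role of the `any_completed()` helper discussed elsewhere in this thread. `do_something` is a hypothetical stand-in for a network call. Unlike the Twisted version, this sketch blocks the calling thread inside `wait()`, which is the blocking versus non-blocking distinction Glyph draws.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def do_something(x):
    """Hypothetical stand-in for a network round trip."""
    time.sleep(0.01 * (x % 3))
    return x * 10


def parallel(items, n, pool):
    """Run do_something over items with at most n calls outstanding."""
    results = set()
    outstanding = set()
    for x in items:
        if len(outstanding) >= n:
            # Roughly the futures equivalent of `yield any_completed(outstanding)`.
            done, outstanding = wait(outstanding, return_when=FIRST_COMPLETED)
            results.update(f.result() for f in done)
        outstanding.add(pool.submit(do_something, x))
    done, _ = wait(outstanding)  # drain whatever is still running
    results.update(f.result() for f in done)
    return results


with ThreadPoolExecutor(max_workers=4) as pool:
    collected = parallel(range(8), 3, pool)

print(sorted(collected))  # [0, 10, 20, 30, 40, 50, 60, 70]
```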

Re: [Python-ideas] [Python-Dev] Python needs a standard asynchronous return object
by Guido van Rossum 16 Sep '10

Moving to python-ideas.

Have you seen http://www.python.org/dev/peps/pep-3148/ ? That seems exactly what you want.

--Guido

On Fri, Sep 10, 2010 at 4:00 PM, James Yonan <james(a)openvpn.net> wrote:
> I'd like to propose that the Python community standardize on a "deferred"
> object for asynchronous return values, modeled after the well-thought-out
> Twisted Deferred class.
>
> With more and more Python libraries implementing asynchronicity (for example
> Futures -- PEP 3148), it's crucial to have a standard deferred object in
> place so that code using a single asynchronous reactor can interoperate with
> different asynchronous libraries.
>
> I think a lot of people don't realize how much cooler and more elegant it is
> to return a deferred object from an asynchronous function rather than using
> a generic callback approach (where you pass a function argument to the
> asynchronous function telling it where to call when the asynchronous
> operation completes).
>
> While asynchronous systems have been shown to have excellent scalability
> properties, the callback-based programming style often used in asynchronous
> programming has been criticized for breaking up the sequential readability
> of program logic.
>
> This problem is elegantly addressed by using Deferred Generators. Since
> Python 2.5 added enhanced generators (i.e. the capability for "yield" to
> return a value), the infrastructure is now in place to allow an asynchronous
> function to be written in a sequential style, without the use of explicit
> callbacks.
>
> See the following blog article for a nice write-up on the capability:
>
> http://blog.mekk.waw.pl/archives/14-Twisted-inlineCallbacks-and-deferredGen…
>
> Mekk's Twisted Deferred example:
>
> @defer.inlineCallbacks
> def someFunction():
>     a = 1
>     b = yield deferredReturningFunction(a)
>     c = yield anotherDeferredReturningFunction(a, b)
>     defer.returnValue(c)
>
> What's cool about this is that between the two yield statements, the Twisted
> reactor is in control meaning that other pending asynchronous tasks can be
> attended to or the thread's remaining time slice can be yielded to the
> kernel, yet this is all accomplished without the use of multi-threading.
> Another interesting aspect of this approach is that since it leverages on
> Python's enhanced generators, an exception thrown inside either of the
> deferred-returning functions will be propagated through to someFunction()
> where it can be handled with try/except.
>
> Think about what this means -- this sort of emulates the "stackless" design
> pattern you would expect in Erlang or Stackless Python without leaving
> standard Python. And it's made possible under the hood by Python Enhanced
> Generators.
>
> Needless to say, it would be great to see this coolness be part of the
> standard Python library, instead of having every Python asynchronous library
> implement its own ad-hoc callback system.
>
> James Yonan
> _______________________________________________
> Python-Dev mailing list
> Python-Dev(a)python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> http://mail.python.org/mailman/options/python-dev/guido%40python.org
>

-- 
--Guido van Rossum (python.org/~guido)
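The inlineCallbacks idea James describes can be sketched on top of stdlib futures. A driver advances the generator, uses `send()` to pass each yielded Future's result back in, and uses `throw()` to re-raise its exception at the `yield`, so ordinary try/except works. `run_inline`, `fetch` and `some_function` are hypothetical names, not Twisted's API. The generator's plain `return c` relies on Python 3.3+ (PEP 380) and stands in for `defer.returnValue()`.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def run_inline(gen):
    """Drive a generator that yields Futures; return a Future for its result.

    Hypothetical sketch in the spirit of inlineCallbacks, not Twisted's
    implementation.
    """
    outer = Future()

    def step(send_value=None, exc=None):
        try:
            if exc is not None:
                yielded = gen.throw(exc)      # re-raise at the pending yield
            else:
                yielded = gen.send(send_value)
        except StopIteration as stop:
            outer.set_result(stop.value)      # generator returned normally
            return
        except Exception as e:
            outer.set_exception(e)            # generator let an error escape
            return

        def resume(f):
            err = f.exception()
            if err is not None:
                step(exc=err)
            else:
                step(f.result())

        yielded.add_done_callback(resume)

    step()
    return outer


pool = ThreadPoolExecutor(max_workers=2)


def fetch(x):
    """Hypothetical blocking call run in the pool."""
    if x < 0:
        raise ValueError("negative input")
    return x + 1


def some_function(a):
    b = yield pool.submit(fetch, a)          # like deferredReturningFunction(a)
    try:
        c = yield pool.submit(fetch, -b)     # raises inside the worker...
    except ValueError:
        c = b * 100                          # ...and is caught right here
    return c


answer = run_inline(some_function(1)).result(timeout=5)
print(answer)  # 200
pool.shutdown()
```

As in the Twisted version, nothing blocks between the two `yield`s. Only the final `.result()` call at module level waits.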

Using * in indexes
by Greg Ewing 15 Sep '10

I just found myself writing a method like this:

    def __getitem__(self, index):
        return self.data[(Ellipsis,) + index + (slice(None),)]

I would have liked to write it like this:

    self.data[..., index, :]

because that would make it much easier to see what's being done. However, that won't work if index is itself a tuple of index elements.

So I'd like to be able to do this:

    self.data[..., *index, :]

-- 
Greg
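Greg's spelled-out tuple can be written more readably with tuple-display unpacking (Python 3.5+, PEP 448). Python 3.11 later accepted the exact `self.data[..., *index, :]` form he asks for (PEP 646). The `Recorder` and `Wrapper` classes below are hypothetical and exist only to show which key reaches `self.data`. The `isinstance` check also handles a single, non-tuple index, which tuple concatenation would reject with a TypeError.

```python
class Recorder:
    """Hypothetical stand-in for self.data: returns the key it receives."""

    def __getitem__(self, key):
        return key


class Wrapper:
    def __init__(self):
        self.data = Recorder()

    def __getitem__(self, index):
        if not isinstance(index, tuple):
            index = (index,)   # a single index element arrives unwrapped
        # Spelled-out form of self.data[..., *index, :]
        return self.data[(Ellipsis, *index, slice(None))]


w = Wrapper()
print(w[1, 2])   # (Ellipsis, 1, 2, slice(None, None, None))
print(w[5])      # (Ellipsis, 5, slice(None, None, None))
```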

Why not break cycles with one __del__?
by yoav glazner 14 Sep '10

Hi!

I was thinking, why not let the Python GC break cycles that contain only one object with a __del__ method?

I don't see a problem with calling the __del__ method and then proceeding as usual (breaking the cycle if it wasn't already broken by __del__).

Many thanks,
Yoav Glazner
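Python 3.4 later adopted a more general version of this idea. Under PEP 442 (safe object finalization), CPython's cyclic garbage collector calls `__del__` on objects in an unreachable cycle and then reclaims them, instead of parking the cycle in `gc.garbage`. Below is a quick check on CPython 3.4 or later, using a hypothetical two-node cycle in which *both* objects define `__del__`.

```python
import gc

finalized = []


class Node:
    """Hypothetical object with a finalizer, used to build a cycle."""

    def __init__(self, name):
        self.name = name
        self.other = None

    def __del__(self):
        finalized.append(self.name)


a, b = Node("a"), Node("b")
a.other, b.other = b, a      # build a two-object reference cycle
del a, b                     # only the cycle keeps them alive now

gc.collect()
print(sorted(finalized))     # ['a', 'b']
```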

Why not f(*my_list, *my_other_list) ?
by cool-RR 11 Sep '10

I noticed that it's impossible to call a Python function with two starred argument lists, like this: `f(*my_list, *my_other_list)`. I mean, if someone wants to feed two lists of arguments into a function, why not?

I understand why you can't have two stars in a function definition; but why can't you have two (or more) stars in a function call?

Ram.
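Python 3.5 lifted this restriction (PEP 448), so the call Ram wanted is now legal. On older versions, the usual workaround is to merge the iterables first and unpack once, for example with `itertools.chain`. The function `f` below is a hypothetical stand-in that simply returns its arguments.

```python
from itertools import chain


def f(*args):
    """Hypothetical function that just returns what it was called with."""
    return args


my_list = [1, 2]
my_other_list = (3, 4)   # any iterables work, not just lists

# Pre-3.5 workaround: merge the iterables first, then unpack once.
merged_once = f(*chain(my_list, my_other_list))

# Python 3.5+ (PEP 448): several starred arguments in one call.
unpacked_twice = f(*my_list, *my_other_list)

print(merged_once, unpacked_twice)  # (1, 2, 3, 4) (1, 2, 3, 4)
```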

with statement syntax forces ugly line breaks?
by Mark Summerfield 10 Sep '10

Hi,

I can't see a _nice_ way of splitting a with statement over multiple lines:

    class FakeContext:
        def __init__(self, name):
            self.name = name
        def __enter__(self):
            print("enter", self.name)
        def __exit__(self, *args):
            print("exit", self.name)

    with FakeContext("a") as a, FakeContext("b") as b:
        pass # works fine

    with FakeContext("a") as a,
         FakeContext("b") as b:
        pass # syntax error

    with (FakeContext("a") as a,
          FakeContext("b") as b):
        pass # syntax error

The use case where this mattered to me was this:

    with open(args.actual, encoding="utf-8") as afh, open(args.expected, encoding="utf-8") as efh:
        actual = [line.rstrip("\n\r") for line in afh.readlines()]
        expected = [line.rstrip("\n\r") for line in efh.readlines()]

Naturally, I could split the line in an ugly place:

    with open(args.actual, encoding="utf-8") as afh, open(args.expected,
            encoding="utf-8") as efh:

but it seems a shame to do so. Or am I missing something?

I'm using Python 3.1.2.

-- 
Mark Summerfield, Qtrac Ltd, www.qtrac.eu
C++, Python, Qt, PyQt - training and consultancy
"Rapid GUI Programming with Python and Qt" - ISBN 0132354187
http://www.qtrac.eu/pyqtbook.html
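Two spellings already worked around the time of Python 3.1: an explicit backslash continuation, and, from Python 3.3, `contextlib.ExitStack`. Python 3.10 also made a parenthesized, multi-line `with (... as a, ... as b):` header officially supported. The sketch reuses Mark's `FakeContext` with two changes: it records into a list instead of printing, and `__enter__` returns `self`. Both changes are there so the enter/exit order can be checked.

```python
from contextlib import ExitStack

log = []


class FakeContext:
    """Mark's FakeContext, recording to a list instead of printing."""

    def __init__(self, name):
        self.name = name

    def __enter__(self):
        log.append(("enter", self.name))
        return self

    def __exit__(self, *args):
        log.append(("exit", self.name))


# Option 1: an explicit backslash continuation is legal in a with header.
with FakeContext("a") as a, \
     FakeContext("b") as b:
    pass

# Option 2 (Python 3.3+): contextlib.ExitStack, one manager per line.
# Managers are exited in reverse order, just like nested with blocks.
with ExitStack() as stack:
    c = stack.enter_context(FakeContext("c"))
    d = stack.enter_context(FakeContext("d"))

print(log)
```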
