Mailman 3 October 2012 - Python-ideas

Is there a good reason to use * for multiplication?
by Ram Rachum Oct. 18, 2012

Hi everybody,

Today a funny thought occurred to me. Ever since I learned to program as a child, I've taken for granted that when programming, the sign used for multiplication is *. But now that I think about it, why? Now that we have Unicode, why not use · ? Do you think that we can make Python support · in addition to *?

I can think of a couple of problems, but none of them seem like deal-breakers:

- Backward compatibility: Python already uses *, but I don't see a …

25 47

asyncore: included batteries don't fit
by chrysn Oct. 18, 2012

hello python-ideas,

i'd like to start discussion about the state of asyncore/asynchat's adaption in the python standard library, with the intention of finding a roadmap for how to improve things, and of kicking off and coordinating implementations.

here's the problem (as previously described in [issue15978] and redirected here, with some additions):

the asyncore module would be much more useful if it were well integrated in the standard library. in particular, it should be supported by:

* subprocess
* BaseHTTPServer / http.server (and thus, socketserver)
* urllib2 / urllib, http.client
* probably many other network libraries except smtpd, which already uses asyncore
* third party libraries (if stdlib leads the way, the ecosystem will follow; eg pyserial)

without widespread asyncore support, it is not possible to easily integrate different servers and services with each other; with asyncore support, it's just a matter of creating the objects and entering the main loop. (eg, a http server for controlling a serial device, with a telnet-like debugging interface).

some examples of the changes required:

* the socketserver documents that it would like to have such a framework ("Future work: [...] Standard framework for select-based multiplexing"). due to the nature of socketserver based implementations (blocking reads), we can't just "add glue so it works", but there could be extensions so that implementations can be ported to asynchronous socketservers. i've done it for a particular case (ported SimpleHTTPServer, but it's a mess of monkey-patching and intermediate StringIOs).
* for subprocess, there's a bunch of recipes at [1].
* pyserial (not standard library, but might as well become) can be ported quite easily [2].

this touches several modules whose implementations can be handled independently from each other; i'd implement some of them myself.

terry.reedy redirected me from the issue tracker to this list, hoping for controversy and alternatives.
if you'd like to discuss, throw in questions, and we'll find a solution. if you think talk is cheap, i can try to work out first sketches.

python already has batteries for nonblocking operation included, and i say it's doing it right -- let's just make sure the batteries fit in the other gadgets!

yours truly
chrysn

[1] http://code.activestate.com/recipes/576957-asynchronous-subprocess-using-as…
[2] http://sourceforge.net/tracker/?func=detail&aid=3559321&group_id=46487&atid…
[issue15978] http://bugs.python.org/issue15978

--
Es ist nicht deine Schuld, dass die Welt ist, wie sie ist -- es wär' nur deine Schuld, wenn sie so bleibt.
(You are not to blame for the state of the world, but you would be if that state persisted.)
-- Die Ärzte

19 49

Expressiveness of coroutines versus Deferred callbacks (or possibly promises, futures)
by Glyph Oct. 17, 2012

Still working my way through zillions of messages on this thread, trying to find things worth responding to, I found this, from Guido:

> [Generators are] more flexible [than Deferreds], since it is easier to catch different exceptions at different points (...) In the past, when I pointed this out to Twisted aficionados, the responses usually were a mix of "sure, if you like that style, we got it covered, Twisted has inlineCallbacks," and "but that only works for the simple cases, for the real stuff you still need Deferreds." But that really sounds to me like Twisted people just liking what they've got and not wanting to change.

If you were actually paying attention, we did explain what "the real stuff" is, and why you can't do it with inlineCallbacks. ;-)

(Or perhaps I should say, why we prefer to do it with Deferreds explicitly.)

Managing parallelism is easy with the when-this-then-that idiom of Deferreds, but challenging with the sequential this-then-this-then-this idiom of generators. The examples in the quoted message were all sequential workflows, which are roughly equivalent in both styles. As soon as a for loop gets involved though, yield-based coroutines have a harder time expressing the kind of parallelism that a lot of applications should use, so it's easy to become accidentally sequential (and therefore less responsive) even if you don't need to be. For example, using some hypothetical generator coroutine library, the idiomatic expression of a loop across several request/responses would be something like this:

    @yield_coroutine
    def something_async():
        values = yield step1()
        results = set()
        for value in values:
            results.add(step3((yield step2(value))))
        return_(results)

Since it's in a set, the order of 'results' doesn't actually matter; but this code needs to sit and wait for each result to come back in order; it can't perform any processing on the ones that are already ready while it's waiting.
You express this with Deferreds:

    def something_deferred():
        return step1().addCallback(
            lambda values: gatherResults([step2(value).addCallback(step3)
                                          for value in values])).addCallback(set)

In addition to being a roughly equivalent amount of code (fewer lines, but denser), that will run step2() and step3() on demand, as results are ready from the set of Deferreds from step1. That means that your program will automatically spread out its computation, which makes better use of time as results may be arriving in any order.

The problem is that it is difficult to express laziness with generator coroutines: you've already spent the generator-ness of the function on responding to events, so there's no longer any syntactic support for laziness.

(There's another problem where sometimes you can determine that work needs to be done as it arrives; that's an even trickier abstraction than Deferreds though and I'm still working on it. I think I've mentioned <http://tm.tl/1956> already in one of my previous posts.)

Also, this is not at all a hypothetical or academic example. This pattern comes up all the time in e.g. web-spidering and chat applications.

To be fair, you could express this in a generator-coroutine library like this:

    @yield_coroutine
    def something_async():
        values = yield step1()
        thunks = []
        @yield_coroutine
        def do_steps(value):
            return_(step3((yield step2(value))))
        for value in values:
            thunks.append(do_steps(value))
        return_(set((yield multi_wait(thunks))))

but that seems bizarre and not very idiomatic; to me, it looks like the confusing aspects of both styles.

David Reid also wrote up some examples of how Deferreds can express sequential workflows more nicely as well (also indirectly as a response to Guido!) on his blog, here: <http://dreid.org/2012/03/30/deferreds-are-a-dataflow-abstraction>.

> Which I understand -- I don't want to change either. But I also observe that a lot of people find bare Twisted-with-Deferreds too hard to grok, so they use Tornado instead, or they build a layer on top of either (like Monocle),

inlineCallbacks (and the even-earlier deferredGenerator) predate Monocle. That's not to say Monocle has no value; it is a portability layer between Twisted and Tornado that does the same thing inlineCallbacks does but allows you to do it even if you're not using Deferreds, which will surely be useful to some people.

I don't want to belabor this point, but it bugs me a little bit that we get so much feedback from the broader Python community along the lines of "Why doesn't Twisted do X? I'd use it if it did X, but it's all weird and I don't understand Y that it forces me to do instead, that's why I use Z" when, in fact:

* Twisted does do X
* It's done X for years
* It actually invented X in the first place
* There are legitimate reasons why we (Twisted core developers) suggest and prefer Y for many cases, but you don't need to do it if you don't want to follow our advice
* Thing Z that is being cited as doing X actually explicitly mentions Twisted as an inspiration for its implementation of X

It's fair, of course, to complain that we haven't explained this very well, and I'll cop to that unless I can immediately respond with a pre-existing URL that explains things :).

One other comment that's probably worth responding to:

> I suppose on systems that support both networking and GUI events, in my design these would use different I/O objects (created using different platform-specific factories) and the shared reactor API would sort things out based on the type of I/O object passed in to it.

In my opinion, it is a mistake to try to harmonize or unify all GUI event systems, unless you are also harmonizing the GUI itself (i.e. writing a totally portable GUI toolkit that does everything).
And I think we can all agree that writing a totally portable GUI toolkit is an impossibly huge task that is out of scope for this (or, really, any other) discussion. GUI systems can already dispatch their events to user code just fine - interposing a Python reactor API between the GUI and the event registration adds additional unnecessary work, and may not even be possible in some cases. See, for example, the way that Xcode (formerly Interface Builder) and the Glade interface designer work: the name of the event handler is registered inside a somewhat opaque blob, which is data and not code, and then hooked up automatically at runtime based on reflection. The code itself never calls any event-registration APIs.

Also, modeling all GUI interaction as a request/response conversation is limiting and leads to bad UI conventions. Consider: the UI element that most readily corresponds to a request/response is a modal dialog box. Does anyone out there really like applications that consist mainly of popping up dialog after dialog to prompt you for the answers to questions?

-g
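
For readers who want to run the sequential-versus-parallel contrast from the post above, here is a minimal sketch. It is not the hypothetical generator library or Twisted code from the post: it swaps in the standard library's asyncio (which did not exist when this was written), and step1, step2 and step3 are made-up stand-ins with arbitrary delays.

```python
import asyncio

# Hypothetical stand-ins for the post's step1/step2/step3; the names and
# delays are assumptions made only for this sketch.
async def step1():
    return [1, 2, 3]

async def step2(value):
    await asyncio.sleep(0.01 * value)  # pretend network latency
    return value * 10

def step3(value):
    return value + 1

async def something_sequential():
    # Mirrors the first generator version: each step2 call must finish
    # before the next one is even started.
    results = set()
    for value in await step1():
        results.add(step3(await step2(value)))
    return results

async def something_parallel():
    # Mirrors the gatherResults version: every step2 call is in flight at
    # once, and step3 runs on each result as soon as it is ready.
    async def do_steps(value):
        return step3(await step2(value))
    values = await step1()
    return set(await asyncio.gather(*(do_steps(v) for v in values)))

sequential = asyncio.run(something_sequential())
parallel = asyncio.run(something_parallel())
```

Both functions produce the same set; the difference is only in how long the caller waits, which is the "accidentally sequential" point made in the post.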

3 4

Proposal: A simple protocol for generator tasks
by Piet Delport Oct. 16, 2012

[This is a lengthy mail; I apologize in advance!]

Hi,

I've been following this discussion with great interest, and would like to put forward a suggestion that might simplify some of the questions that are up in the air.

There are several key points being considered: what exactly constitutes a "coroutine" or "tasklet", what the precise semantics of "yield" and "yield from" should be, how the stdlib can support different event loops and reactors, and how exactly Futures, Deferreds, and other APIs fit into the whole picture.

This mail is mostly about the first point: I think everyone agrees roughly what a coroutine-style generator is, but there's enough variation in how they are used, both historically and presently, that the concept isn't as precise as it should be. This makes them hard to think and reason about (failing the "BDFL gets headaches" test), and makes it harder to define the behavior of all the parts that they interact with, too.

This is a sketch of an attempt to define what constitutes a generator-based task or coroutine more rigorously: I think that the essential behavior can be captured in a small protocol, building on the generator and iterator protocols. If anyone else thinks this is a good idea, maybe something like this could work its way into a PEP?

(For the sake of this mail, I will use the term "generator task" or "task" as a straw man term, but feel free to substitute "coroutine", or whatever the preferred name ends up being.)

Definition
==========

Very informally: A "generator task" is what you get if you take a normal Python function and replace its blocking calls with "yield from" calls to equivalent subtasks.

More formally, a "generator task" is a generator that implements an incremental, multi-step computation, and is intended to be externally driven to completion by a runner, or "scheduler", until it delivers a final result.

This driving process happens as follows:

1. A generator task is iterated by its scheduler to yield a series of intermediate "step" values.

2. Each value yielded as a "step" represents a scheduling instruction, or primitive, to be interpreted by the task's scheduler.

   This scheduling instruction can be None ("just resume this task later"), or a variety of other primitives, such as Futures ("resume this task with the result of this Future"); see below for more.

3. The scheduler is responsible for interpreting each "step" instruction as appropriate, and sending the instruction's result, if any, back to the task using send() or throw().

   A scheduler may run a single task to completion, or may multiplex execution between many tasks: generator tasks should assume that other tasks may have executed while the task was yielding.

4. The generator task completes by successfully returning (raising StopIteration), or by raising an exception. The task's caller receives this result.

(For the sake of discussion, I use "the scheduler" to refer to whoever calls the generator task's next/send/throw methods, and "the task's caller" to refer to whoever receives the task's final result, but this is not important to the protocol: a task should not care who drives it or consumes its result, just like an iterator should not.)

Scheduling instructions / primitives
====================================

(This could probably use a better name.)

The protocol is intentionally agnostic about the implementation of schedulers, event loops, or reactors: as long as they implement the same set of scheduling primitives, code should work across them.

There are multiple ways to accomplish this, but one possibility is to have a set of common, generic instructions in a standard library module such as "tasklib" (which could also contain things like default scheduler implementations, helper functions, and so on).

A partial list of possible primitives (the names are all made up, not serious suggestions):

1. None: The most basic "do nothing" instruction.
   This just instructs the scheduler to resume the yielding task later.

2. Futures: Instruct the scheduler to resume with the future's result.

   Similar types in third-party libraries, such as Deferreds, could potentially be implemented either natively by a scheduler that supports it, or using a wait_for_deferred(d) helper task, or using the idea of an "adapter" scheduler (see below).

3. Control primitives: spawn, sleep, etc.

   - Spawn a new (independent) task: yield tasklib.spawn(task())
   - Wait for multiple tasks: (x, y) = yield tasklib.par(foo(), bar())
   - Delay execution: yield tasklib.sleep(seconds)
   - etc.

   These could be simple marker objects, leaving it up to the underlying scheduler to actually recognize and implement them; some could also be implemented in terms of simpler operations (e.g. sleep(), in terms of lower-level suspend and resume operations).

4. I/O operations

   This could be anything from low-level "yield fd_readable(sock)" style requests to any of the higher-level APIs being discussed elsewhere.

   Whatever the exact API ends up being, the scheduler should implement these primitives by waiting for the I/O (or condition), and resuming the task with the result, if any.

5. Cooperative concurrency primitives, for working with locks, condition variables, and so on. (If useful?)

6. Custom, scheduler-specific instructions: Since a generator task can potentially yield anything as a scheduler instruction, it's not inconceivable for specialized schedulers to support specialized instructions. (Code that relies on such special instructions won't work on other schedulers, but that would be the point.)

A question open to debate is what a scheduler should do when faced with an unrecognized scheduling instruction. Raising TypeError or NotImplementedError back into the task is probably a reasonable action, and would allow code like:

    def task():
        try:
            yield fancy_magic_instruction()
        except NotImplementedError:
            yield from boring_fallback()
        ...
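
The driving process and the handling of unrecognized instructions described above could be sketched roughly as follows. This is only one possible reading of the protocol, not a proposed implementation: the `Sleep` marker, the `run_all` function, and the tick-based waiting are all invented for the illustration, and only the None and "unrecognized" cases come from the text.

```python
from collections import deque

class Sleep:
    """Hypothetical marker instruction: resume the task after `ticks` passes."""
    def __init__(self, ticks):
        self.ticks = ticks

def run_all(tasks):
    """Round-robin scheduler sketch: drive named generator tasks to completion."""
    results = {}
    # Queue entries: (name, generator, value to send, exception to throw, ticks left)
    queue = deque((name, gen, None, None, 0) for name, gen in tasks.items())
    while queue:
        name, gen, value, exc, wait = queue.popleft()
        if wait > 0:                      # still sleeping: let other tasks run
            queue.append((name, gen, value, exc, wait - 1))
            continue
        try:
            step = gen.throw(exc) if exc is not None else gen.send(value)
        except StopIteration as stop:     # task returned: record its result
            results[name] = stop.value
            continue
        if step is None:                  # "just resume this task later"
            queue.append((name, gen, None, None, 0))
        elif isinstance(step, Sleep):
            queue.append((name, gen, None, None, step.ticks))
        else:                             # unrecognized instruction
            queue.append((name, gen, None, NotImplementedError(repr(step)), 0))
    return results

def counter(n):
    total = 0
    for i in range(n):
        total += i
        yield                             # None instruction
    return total

def fancy():
    try:
        yield "fancy_magic_instruction"   # not understood by this scheduler
    except NotImplementedError:
        yield Sleep(2)                    # boring fallback
    return "fell back"

results = run_all({"counter": counter(4), "fancy": fancy()})
```

The two tasks interleave: `counter` keeps yielding None while `fancy` first has NotImplementedError thrown into it and then falls back to an instruction the scheduler does understand.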
Generator tasks as schedulers, and vice versa
=============================================

Note that there is a symmetry to the protocol when a generator task calls another using "yield from":

    def task():
        spam = yield from subtask()

Here, task() is both a generator task, and the effective scheduler for subtask(): it "implements" subtask()'s scheduling instructions by delegating them to its own scheduler.

This is a plain observation on its own; however, it raises one or two interesting possibilities for more interesting schedulers implemented as generator tasks themselves, including:

- Specialized sub-schedulers that run as a normal task within their parent scheduler, but implement for example weighted or priority queuing of their subtasks, or similar features.

- "Adapter" schedulers that intercept special scheduler instructions (say, Deferreds or other library-specific objects), and implement them using more generic instructions to the underlying scheduler.

--
Piet Delport
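
The symmetry noted above can be observed directly with plain generators. In this small sketch the instruction strings and the `drive` helper are placeholders invented for the demonstration; the only thing it relies on is the standard behavior of "yield from".

```python
def subtask():
    yield "instruction-1"
    yield "instruction-2"
    return "spam"

def task():
    # task() forwards subtask()'s instructions to whoever is driving task(),
    # and receives subtask()'s return value as the result of the expression.
    spam = yield from subtask()
    yield "instruction-3"
    return spam.upper()

def drive(gen):
    """Trivial driver: record every instruction, send back None each time."""
    seen = []
    try:
        while True:
            seen.append(gen.send(None))
    except StopIteration as stop:
        return seen, stop.value

seen, result = drive(task())
```

The driver never learns that two generators were involved: it sees one flat stream of instructions, which is what makes sub-schedulers and adapters expressible as ordinary tasks.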

4 5

re-implementing Twisted for fun and profit
by Glyph Oct. 16, 2012

There has been a lot written on this list about asynchronous, microthreaded and event-driven I/O in the last couple of days. There's too much for me to try to respond to all at once, but I would very much like to (possibly re-)introduce one very important point into the discussion. Would everyone interested in this please please please read <https://github.com/lvh/async-pep/blob/master/pep-3153.rst> several times? Especially this section: <https://github.com/lvh/async-pep/blob/…

6 16

filename comparison [was] Re: PEP 428 - object-oriented filesystem paths
by Jim Jewett Oct. 16, 2012

On 10/8/12, Greg Ewing <greg.ewing(a)canterbury.ac.nz> wrote:
> Ronald Oussoren wrote:
>> neither statvs, statvfs, nor pathconf seem to be able to tell if a
>> filesystem is case insensitive.
> Even if they could, you wouldn't be entirely out of the woods,
> because different parts of the same path can be on different
> file systems...
> But how important is all this anyway? I'm trying to think of
> occasions when I've wanted to compare two entire paths …

3 2

The async API of the future: Reactors
by Guido van Rossum Oct. 15, 2012

[This is the first spin-off thread from "asyncore: included batteries don't fit"]

On Thu, Oct 11, 2012 at 5:57 PM, Ben Darnell <ben(a)bendarnell.com> wrote:
> On Thu, Oct 11, 2012 at 2:18 PM, Guido van Rossum <guido(a)python.org> wrote:
>>> Re base reactor interface: drawing maximally from the lessons learned in
>>> twisted, I think IReactorCore (start, stop, etc), IReactorTime (call later,
>>> etc), asynchronous-looking name lookup, fd handling are the important parts.
>>
>> That actually sounds more concrete than I'd like a reactor interface
>> to be. In the App Engine world, there is a definite need for a
>> reactor, but it cannot talk about file descriptors at all -- all I/O
>> is defined in terms of RPC operations which have their own (several
>> layers of) async management but still need to be plugged in to user
>> code that might want to benefit from other reactor functionality such
>> as scheduling and placing a call at a certain moment in the future.
>
> So are you thinking of something like
> reactor.add_event_listener(event_type, event_params, func)? One thing
> to keep in mind is that file descriptors are somewhat special (at
> least in a level-triggered event loop), because of the way the event
> will keep firing until the socket buffer is drained or the event is
> unregistered. I'd be inclined to keep file descriptors in the
> interface even if they just raise an error on app engine, since
> they're fairly fundamental to the (unixy) event loop. On the other
> hand, I don't have any experience with event loops outside the
> unix/network world so I don't know what other systems might need for
> their event loops.

Hmm... This is definitely an interesting issue. I'm tempted to believe that it is *possible* to change every level-triggered setup into an edge-triggered setup by using an explicit loop -- but I'm not saying it is a good idea.
In practice I think we need to support both equally well, so that the *app* can decide which paradigm to use. E.g. if I were to implement an HTTP server, I might use level-triggered for the "accept" call on the listening socket, but edge-triggered for everything else. OTOH someone else might prefer a buffered stream abstraction that just keeps filling its read buffer (and draining its write buffer) using level-triggered callbacks, at least up to a certain buffer size -- we have to be robust here and make it impossible for an evil client to fill up all our memory without our approval!

I'm not at all familiar with the Twisted reactor interface. My own design would be along the following lines:

- There's an abstract Reactor class and an abstract Async I/O object class. To get a reactor to call you back, you must give it an I/O object, a callback, and maybe some more stuff. (I have gone back and like passing optional args for the callback, rather than requiring lambdas to create closures.) Note that the callback is *not* a designated method on the I/O object! In order to distinguish between edge-triggered and level-triggered, you just use a different reactor method. There could also be a reactor method to schedule a "bare" callback, either after some delay, or immediately (maybe with a given priority), although such functionality could also be implemented through magic I/O objects.

- In systems supporting file descriptors, there's a reactor implementation that knows how to use select/poll/etc., and there are concrete I/O object classes that wrap file descriptors. On Windows, those would only be socket file descriptors. On Unix, any file descriptor would do. To create such an I/O object you would use a platform-specific factory. There would be specialized factories to create e.g. listening sockets, connections, files, pipes, and so on.

- In systems like App Engine that don't support async I/O on file descriptors at all, the constructors for creating I/O objects for disk files and connection sockets would comply with the interface but fake out almost everything (just like today, using httplib or httplib2 on App Engine works by adapting them to a "urlfetch" RPC request).

>>> call_every can be implemented in terms of call_later on a separate object,
>>> so I think it should be (eg twisted.internet.task.LoopingCall). One thing
>>> that is apparently forgotten about is event loop integration. The prime way
>>> of having two event loops cooperate is *NOT* "run both in parallel", it's
>>> "have one call the other". Even though not all loops support this, I think
>>> it's important to get this as part of the interface (raise an exception for
>>> all I care if it doesn't work).
>>
>> This is definitely one of the things we ought to get right. My own
>> thoughts are slightly (perhaps only cosmetically) different again:
>> ideally each event loop would have a primitive operation to tell it to
>> run for a little while, and then some other code could tie several
>> event loops together.
>>
>> Possibly the primitive operation would be something like "block until
>> either you've got one event ready, or until a certain time (possibly
>> 0) has passed without any events, and then give us the events that are
>> ready and a lower bound for when you might have more work to do" -- or
>> maybe instead of returning the event(s) it could just call the
>> associated callback (it might have to if it is part of a GUI library
>> that has callbacks written in C/C++ for certain events like screen
>> refreshes).
>
> That doesn't work very well - while one loop is waiting for its
> timeout, nothing can happen on the other event loop. You have to
> switch back and forth frequently to keep things responsive, which is
> inefficient.
> I'd rather give each event loop its own thread; you can
> minimize the thread-synchronization concerns by picking one loop as
> "primary" and having all the others just pass callbacks over to it
> when their events fire.

That's a good point. I suppose on systems that support both networking and GUI events, in my design these would use different I/O objects (created using different platform-specific factories) and the shared reactor API would sort things out based on the type of I/O object passed in to it.

Note that many GUI events would be level-triggered, but sometimes using the edge-triggered paradigm can work well too: e.g. I imagine that writing code to draw a curve following the mouse as long as a button is pressed might be conveniently written as a loop of the form

    def on_mouse_press(x, y, buttons):
        <set up polygon starting current x, y>
        while True:
            x, y, buttons = yield <get mouse event>
            if not buttons:
                break
            <extend polygon to x, y>
        <finish polygon>

which itself is registered as a level-triggered handler for mouse presses. (Dealing with multiple buttons is left as an exercise. :-)

--
--Guido van Rossum (python.org/~guido)
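
The mouse-following loop sketched in the message above can be made runnable by filling in its placeholders. Everything concrete here is an assumption made for the illustration: the polygon is a plain list of points, the bare `yield` stands in for "get mouse event", and `feed` is a hypothetical stand-in for whatever reactor would deliver events.

```python
def on_mouse_press(x, y, buttons):
    polygon = [(x, y)]                  # set up polygon starting at current x, y
    while True:
        x, y, buttons = yield           # wait for the next mouse event
        if not buttons:                 # button released: stop following
            break
        polygon.append((x, y))          # extend polygon to x, y
    return polygon                      # finish polygon

def feed(handler, events):
    """Hypothetical stand-in for the reactor: push events into the handler."""
    next(handler)                       # run the handler up to its first yield
    try:
        for event in events:
            handler.send(event)
    except StopIteration as stop:
        return stop.value               # the handler finished its polygon
    return None                         # the button was never released

polygon = feed(on_mouse_press(0, 0, 1), [(1, 1, 1), (2, 3, 1), (2, 3, 0)])
```

The handler keeps its drawing state in local variables across events, which is the convenience the generator form is meant to show.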

15 30

Off-line most of the day
by Guido van Rossum Oct. 15, 2012

I'm about to enter an intense all-day-long meeting at work, and won't have time to keep up with email at all until late tonight. So have fun discussing async APIs without me, and please stay on topic! -- --Guido van Rossum (python.org/~guido)

1 0

Python as a tool to download stuff for bootstrapping
by anatoly techtonik Oct. 15, 2012

This one is practical. I am looking at NaCl SDK download page: https://developers.google.com/native-client/sdk/download

"you need Python installed", "download SDK update utility"

What makes me sad is that the update utility is a Python script in a zip file - nacl_sdk.zip which includes shell script and a .bat file for launching this Python script. This makes me kind of sad. You have Python installed. Why can't you just crossplatformly do:

    mkdir nacl
    cd nacl
    python -m urllib get http://commondatastorage.googleapis.com/nativeclient-mirror/nacl/nacl_sdk/u…
    python update_sdk.py
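
The "python -m urllib get" command above is a wish, not an existing command-line interface. As a rough sketch of what such a helper would have to do, the following uses only calls that do exist in the Python 3 standard library; the function name `get` is borrowed from the wished-for command and is otherwise an assumption.

```python
import shutil
import urllib.request

def get(url, dest):
    """Copy the resource at `url` to the local file `dest` and return `dest`."""
    # urlopen handles http://, https:// and file:// URLs; copyfileobj streams
    # the body to disk in chunks instead of reading it into memory at once.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```

A script built on this would call something like `get(url, "update_sdk.py")` with the full download URL, which is truncated in the message above and is not reconstructed here.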

5 8

The async API of the future: Some thoughts from an ignorant Tornado user
by Daniel McDougall Oct. 15, 2012

(This is a response to GVR's Google+ post asking for ideas; I apologize in advance if I come off as an ignorant programming newbie)

I am the author of Gate One (https://github.com/liftoff/GateOne/) which makes extensive use of Tornado's asynchronous capabilities. It also uses multiprocessing and threading to a lesser extent. The biggest issue I've had trying to write asynchronous code for Gate One is complexity. Complexity creates problems with expressiveness which results in code that, to me, feels un-Pythonic. For evidence of this I present the following example:

The retrieve_log_playback() function: http://bit.ly/W532m6 (link goes to Github)

All the function does is generate and return (to the client browser) an HTML playback of their terminal session recording. To do it efficiently without blocking the event loop or slowing down all other connected clients required loads of complexity (or maybe I'm just ignorant of "a better way"--feel free to enlighten me). In an ideal world I could have just done something like this:

    import async # The API of the future ;)
    async.async_call(retrieve_log_playback, settings, tws, mechanism=multiprocessing)
    # tws == instance of tornado.web.WebSocketHandler that holds the open connection

...but instead I had to create an entirely separate function to act as the multiprocessing.Process(), create a multiprocessing.Queue() to shuffle data back and forth, watch a special file descriptor for updates (so I can tell when the task is complete), and also create a closure because the connection instance (aka 'tws') isn't pickleable.

After reading through these threads I feel much of the discussion is over my head but as someone who will ultimately become a *user* of the "async API of the future" I would like to share my thoughts...

My opinion is that the goal of any async module that winds up in Python's standard library should be simplicity and portability.
In terms of features, here's my 'async wishlist':

* I should not have to worry about what is and isn't pickleable when I decide that a task should be performed asynchronously.
* I should be able to choose the type of event loop/async mechanism that is appropriate for the task: For CPU-bound tasks I'll probably want to use multiprocessing. For IO-bound tasks I might want to use threading. For a multitude of tasks that "just need to be async" (by nature) I'll want to use an event loop.
* Any async module should support 'basics' like calling functions at an interval and calling functions after a timeout occurs (with the ability to cancel).
* Asynchronous tasks should be able to access the same namespace as everything else. Maybe wishful thinking.
* It should support publish/subscribe-style events (i.e. an event dispatcher). For example, the ability to watch a file descriptor or socket for changes in state and call a function when that happens. Preferably with the flexibility to define custom events (i.e. don't have it tied to kqueue/epoll-specific events).

Thanks for your consideration; and thanks for the awesome language.

--
Dan McDougall - Chief Executive Officer and Developer
Liftoff Software ✈ Your flight to the cloud is now boarding.
904-446-8323
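
As a rough illustration of the "choose the mechanism" item on the wishlist above, an `async_call` along the lines imagined in the message could be approximated with the standard library's concurrent.futures. This is a sketch under stated assumptions, not the API the message asks for: the string mechanism names and the `callback` parameter are invented here, and the module cannot literally be called `async` because that word has since become a reserved keyword.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

# Hypothetical mapping from a mechanism name to an executor class.
_MECHANISMS = {
    "threading": ThreadPoolExecutor,
    "multiprocessing": ProcessPoolExecutor,
}

def async_call(func, *args, mechanism="threading", callback=None):
    """Run func(*args) in the background and return a Future for its result."""
    executor = _MECHANISMS[mechanism](max_workers=1)
    future = executor.submit(func, *args)
    if callback is not None:
        # The callback receives the finished call's result.
        future.add_done_callback(lambda f: callback(f.result()))
    # Do not block the caller; the already-submitted call still completes.
    executor.shutdown(wait=False)
    return future

future = async_call(pow, 2, 10)
answer = future.result(timeout=5)
```

This does not remove the pickling restriction from the first wishlist item: with mechanism="multiprocessing" the function and its arguments still have to be picklable, so an open connection object like `tws` could not be passed this way.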

3 5