Today a funny thought occurred to me. Ever since I learned to program
as a child, I've taken for granted that when programming, the sign
used for multiplication is *. But now that I think about it, why? Now that
we have Unicode, why not use · ?
Do you think that we can make Python support · in addition to *?
I can think of a couple of problems, but none of them seem like deal-breakers:
- Backward compatibility: Python already uses *, but I don't see a
backward compatibility problem with supporting · additionally. Let people
use whichever they want, like spaces and tabs.
- Input methods: I personally use an IDE that could be easily set to
automatically convert * to · where appropriate and to allow manual input of
·. People on Linux can type Alt-. . Anyone else can set up a script that'll
let them type · using whichever keyboard combination they want. I admit
this is pretty annoying, but since you can always use * if you want to, I
figure that anyone who cares enough about using · instead of * (I bet that
people in scientific computing would like that) would be willing to take
the time to set it up.
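For what it's worth, you can already play with the idea today without changing Python at all, by rewriting source text before compiling it. This is only a toy sketch (a real implementation would live in the tokenizer, and this version naively also rewrites · inside string literals):

```python
# Toy sketch: treat U+00B7 (·) as an alias for * by rewriting source
# text before handing it to compile(). Real support would need tokenizer
# changes; note this naive version also rewrites · inside string literals.
def compile_with_middot(source, filename='<string>'):
    return compile(source.replace('\u00b7', '*'), filename, 'eval')

result = eval(compile_with_middot('6 \u00b7 7'))  # i.e. "6 · 7"
```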
What do you think?
i'd like to start a discussion about the state of asyncore/asynchat's
adoption in the python standard library, with the intention of finding a
roadmap for how to improve things, and of kicking off and coordinating
the work.
here's the problem (as previously described in [issue15978] and
redirected here, with some additions):
the asyncore module would be much more useful if it were well integrated
in the standard library. in particular, it should be supported by:
* BaseHTTPServer / http.server (and thus, socketserver)
* urllib2 / urllib, http.client
* probably many other network libraries except smtpd, which already uses it
* third party libraries (if stdlib leads the way, the ecosystem will
follow; eg pyserial)
without widespread asyncore support, it is not possible to easily
integrate different servers and services with each other; with asyncore
support, it's just a matter of creating the objects and entering the
main loop. (eg, a http server for controlling a serial device, with a
telnet-like debugging interface).
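the "just create the objects and enter the main loop" style can be sketched with plain select-based multiplexing. (the sketch below uses the selectors module, a later stdlib addition, purely for brevity; asyncore's dispatcher API looks different but the shape of the program is the same.)

```python
import selectors
import socket

sel = selectors.DefaultSelector()

def accept(server):
    # Called by the loop when the listening socket is readable.
    conn, _addr = server.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, echo)

def echo(conn):
    # Called by the loop when a connection has data ready.
    data = conn.recv(4096)
    if data:
        conn.sendall(data)
    else:
        sel.unregister(conn)
        conn.close()

# "Creating the objects": one echo service; more services could be
# registered with the same selector and share the loop.
server = socket.socket()
server.bind(('127.0.0.1', 0))
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, accept)

def loop_once(timeout=1.0):
    # One iteration of the shared main loop; everything registered
    # with `sel` is multiplexed through this single select call.
    for key, _events in sel.select(timeout):
        key.data(key.fileobj)
```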
some examples of the changes required:
* the socketserver documents that it would like to have such a
framework ("Future work: [...] Standard framework for select-based
multiplexing"). due to the nature of socketserver based
implementations (blocking reads), we can't just "add glue so it
works", but there could be extensions so that implementations can be
ported to asynchronous socketservers. i've done it for a particular
case (ported SimpleHTTPServer, but it's a mess of monkey-patching and
workarounds).
* for subprocess, there's a bunch of recipes at .
* pyserial (not standard library, but might as well become one) can be
ported quite easily .
this touches several modules whose implementations can be handled
independently from each other; i'd implement some of them myself.
terry.reedy redirected me from the issue tracker to this list, hoping
for controversy and alternatives. if you'd like to discuss, throw in
questions, and we'll find a solution. if you think talk is cheap, i
can try to work out first sketches.
python already has batteries for nonblocking operation included, and i
say it's doing it right -- let's just make sure the batteries fit in the
rest of the standard library.
Es ist nicht deine Schuld, dass die Welt ist, wie sie ist -- es wär' nur
deine Schuld, wenn sie so bleibt.
(You are not to blame for the state of the world, but you would be if
that state persisted.)
-- Die Ärzte
Still working my way through zillions of messages on this thread, trying to find things worth responding to, I found this, from Guido:
> [Generators are] more flexible [than Deferreds], since it is easier to catch different exceptions at different points (...) In the past, when I pointed this out to Twisted aficionados, the responses usually were a mix of "sure, if you like that style, we got it covered, Twisted has inlineCallbacks," and "but that only works for the simple cases, for the real stuff you still need Deferreds." But that really sounds to me like Twisted people just liking what they've got and not wanting to change.
If you were actually paying attention, we did explain what "the real stuff" is, and why you can't do it with inlineCallbacks. ;-)
(Or perhaps I should say, why we prefer to do it with Deferreds explicitly.)
Managing parallelism is easy with the when-this-then-that idiom of Deferreds, but challenging with the sequential this-then-this-then-this idiom of generators. The examples in the quoted message were all sequential workflows, which are roughly equivalent in both styles. As soon as a for loop gets involved though, yield-based coroutines have a harder time expressing the kind of parallelism that a lot of applications should use, so it's easy to become accidentally sequential (and therefore less responsive) even if you don't need to be. For example, using some hypothetical generator coroutine library, the idiomatic expression of a loop across several request/responses would be something like this:
values = yield step1()
results = set()
for value in values:
    result = yield step2(value)
    results.add(step3(result))
Since it's in a set, the order of 'results' doesn't actually matter; but this code needs to sit and wait for each result to come back in order; it can't perform any processing on the ones that are already ready while it's waiting. You could express this with Deferreds instead:
return step1().addCallback(
    lambda values: gatherResults([step2(value).addCallback(step3)
                                  for value in values])).addCallback(set)
In addition to being a roughly equivalent amount of code (fewer lines, but denser), that will run step2() and step3() on demand, as results are ready from the set of Deferreds from step1. That means that your program will automatically spread out its computation, which makes better use of time as results may be arriving in any order.
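For readers coming to this thread later: the same sequential-versus-parallel distinction survives in today's coroutine world. A sketch using asyncio (which postdates this discussion), with trivial stand-ins for step1/step2/step3:

```python
import asyncio

async def step1():
    return [1, 2, 3]

async def step2(value):
    await asyncio.sleep(0)  # stand-in for real I/O
    return value * 2

async def step3(result):
    return result + 1

async def sequential():
    # Waits for each result in turn, like the generator version above;
    # nothing overlaps even though the order doesn't matter.
    results = set()
    for value in await step1():
        results.add(await step3(await step2(value)))
    return results

async def parallel():
    # Like gatherResults: all the step2/step3 chains run concurrently
    # and complete in whatever order their I/O finishes.
    values = await step1()
    async def chain(v):
        return await step3(await step2(v))
    return set(await asyncio.gather(*(chain(v) for v in values)))

seq = asyncio.run(sequential())
par = asyncio.run(parallel())
```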
The problem is that it is difficult to express laziness with generator coroutines: you've already spent the generator-ness of the function on responding to events, so there's no longer any syntactic support for laziness.
(There's another problem where sometimes you can determine that work needs to be done as it arrives; that's an even trickier abstraction than Deferreds though and I'm still working on it. I think I've mentioned <http://tm.tl/1956> already in one of my previous posts.)
Also, this is not at all a hypothetical or academic example. This pattern comes up all the time in e.g. web-spidering and chat applications.
To be fair, you could express this in a generator-coroutine library like this:
values = yield step1()
thunks = []
for value in values:
    thunks.append(step2(value).addCallback(step3))
results = set((yield gatherResults(thunks)))
but that seems bizarre and not very idiomatic; to me, it looks like the confusing aspects of both styles.
David Reid also wrote up some examples of how Deferreds can express sequential workflows more nicely as well (also indirectly as a response to Guido!) on his blog, here: <http://dreid.org/2012/03/30/deferreds-are-a-dataflow-abstraction>.
> Which I understand -- I don't want to change either. But I also observe that a lot of people find bare Twisted-with-Deferreds too hard to grok, so they use Tornado instead, or they build a layer on top of either (like Monocle),
inlineCallbacks (and the even-earlier deferredGenerator) predates Monocle. That's not to say Monocle has no value; it is a portability layer between Twisted and Tornado that does the same thing inlineCallbacks does but allows you to do it even if you're not using Deferreds, which will surely be useful to some people.
I don't want to belabor this point, but it bugs me a little bit that we get so much feedback from the broader Python community along the lines of "Why doesn't Twisted do X? I'd use it if it did X, but it's all weird and I don't understand Y that it forces me to do instead, that's why I use Z" when, in fact:
- Twisted does do X
- It's done X for years
- It actually invented X in the first place
- There are legitimate reasons why we (Twisted core developers) suggest and prefer Y for many cases, but you don't need to do it if you don't want to follow our advice
- Thing Z that is being cited as doing X actually explicitly mentions Twisted as an inspiration for its implementation of X
It's fair, of course, to complain that we haven't explained this very well, and I'll cop to that unless I can immediately respond with a pre-existing URL that explains things :).
One other comment that's probably worth responding to:
> I suppose on systems that support both networking and GUI events, in my design these would use different I/O objects (created using different platform-specific factories) and the shared reactor API would sort things out based on the type of I/O object passed in to it.
In my opinion, it is a mistake to try to harmonize or unify all GUI event systems, unless you are also harmonizing the GUI itself (i.e. writing a totally portable GUI toolkit that does everything). And I think we can all agree that writing a totally portable GUI toolkit is an impossibly huge task that is out of scope for this (or, really, any other) discussion. GUI systems can already dispatch their events to user code just fine - interposing a Python reactor API between the GUI and the event registration adds additional unnecessary work, and may not even be possible in some cases. See, for example, the way that Xcode (formerly Interface Builder) and the Glade interface designer work: the name of the event handler is registered inside a somewhat opaque blob, which is data and not code, and then hooked up automatically at runtime based on reflection. The code itself never calls any event-registration APIs.
Also, modeling all GUI interaction as a request/response conversation is limiting and leads to bad UI conventions. Consider: the UI element that most readily corresponds to a request/response is a modal dialog box. Does anyone out there really like applications that consist mainly of popping up dialog after dialog to prompt you for the answers to questions?
[This is a lengthy mail; I apologize in advance!]
I've been following this discussion with great interest, and would like
to put forward a suggestion that might simplify some of the questions
that are up in the air.
There are several key points being considered: what exactly constitutes a
"coroutine" or "tasklet", what the precise semantics of "yield" and
"yield from" should be, how the stdlib can support different event loops
and reactors, and how exactly Futures, Deferreds, and other APIs fit
into the whole picture.
This mail is mostly about the first point: I think everyone agrees
roughly what a coroutine-style generator is, but there's enough
variation in how they are used, both historically and presently, that
the concept isn't as precise as it should be. This makes them hard to
think and reason about (failing the "BDFL gets headaches" test), and
makes it harder to define the behavior of all the parts that they
interact with, too.
This is a sketch of an attempt to define what constitutes a
generator-based task or coroutine more rigorously: I think that the
essential behavior can be captured in a small protocol, building on the
generator and iterator protocols. If anyone else thinks this is a good
idea, maybe something like this could work its way into a PEP?
(For the sake of this mail, I will use the term "generator task" or
"task" as a straw man term, but feel free to substitute "coroutine", or
whatever the preferred name ends up being.)
Very informally: A "generator task" is what you get if you take a normal
Python function and replace its blocking calls with "yield from" calls
to equivalent subtasks.
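Concretely (fetch() below is a made-up subtask for illustration, not a real API):

```python
# A hypothetical blocking function:
#
#     def fetch_length(url):
#         data = fetch(url)      # blocks on I/O
#         return len(data)
#
# and its generator-task equivalent, with the blocking call replaced
# by "yield from" on an equivalent subtask:

def fetch(url):
    yield                      # stand-in for a real I/O suspension point
    return b'payload'

def fetch_length(url):
    data = yield from fetch(url)
    return len(data)

# Driving it by hand, as a scheduler would:
task = fetch_length('http://example.invalid/')
next(task)                     # run to the first suspension point
try:
    task.send(None)            # resume; the task then finishes
except StopIteration as stop:
    length = stop.value        # the task's return value
```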
More formally, a "generator task" is a generator that implements an
incremental, multi-step computation, and is intended to be externally
driven to completion by a runner, or "scheduler", until it delivers a
final result.
This driving process happens as follows:
1. A generator task is iterated by its scheduler to yield a series of
intermediate "step" values.
2. Each value yielded as a "step" represents a scheduling instruction,
or primitive, to be interpreted by the task's scheduler.
This scheduling instruction can be None ("just resume this task
later"), or a variety of other primitives, such as Futures ("resume
this task with the result of this Future"); see below for more.
3. The scheduler is responsible for interpreting each "step" instruction
as appropriate, and sending the instruction's result, if any, back to
the task using send() or throw().
A scheduler may run a single task to completion, or may multiplex
execution between many tasks: generator tasks should assume that
other tasks may have executed while the task was yielding.
4. The generator task completes by successfully returning (raising
StopIteration), or by raising an exception. The task's caller
receives this result.
(For the sake of discussion, I use "the scheduler" to refer to whoever
calls the generator task's next/send/throw methods, and "the task's
caller" to refer to whoever receives the task's final result, but this
is not important to the protocol: a task should not care who drives it
or consumes its result, just like an iterator should not.)
Scheduling instructions / primitives
(This could probably use a better name.)
The protocol is intentionally agnostic about the implementation of
schedulers, event loops, or reactors: as long as they implement the same
set of scheduling primitives, code should work across them.
There are multiple ways to accomplish this, but one possibility is to have a
set of common, generic instructions in a standard library module such as
"tasklib" (which could also contain things like default scheduler
implementations, helper functions, and so on).
A partial list of possible primitives (the names are all made up, not
final):
1. None: The most basic "do nothing" instruction. This just instructs
the scheduler to resume the yielding task later.
2. Futures: Instruct the scheduler to resume with the future's result.
Similar types in third-party libraries, such as Deferreds, could
potentially be implemented either natively by a scheduler that
supports it, or using a wait_for_deferred(d) helper task, or using
the idea of an "adapter" scheduler (see below).
3. Control primitives: spawn, sleep, etc.
- Spawn a new (independent) task: yield tasklib.spawn(task())
- Wait for multiple tasks: (x, y) = yield tasklib.par(foo(), bar())
- Delay execution: yield tasklib.sleep(seconds)
These could be simple marker objects, leaving it up to the underlying
scheduler to actually recognize and implement them; some could also
be implemented in terms of simpler operations (e.g. sleep(), in
terms of lower-level suspend and resume operations).
4. I/O operations
This could be anything from low-level "yield fd_readable(sock)" style
requests, or any of the higher-level APIs being discussed elsewhere.
Whatever the exact API ends up being, the scheduler should implement
these primitives by waiting for the I/O (or condition), and resuming
the task with the result, if any.
5. Cooperative concurrency primitives, for working with locks, condition
variables, and so on. (If useful?)
6. Custom, scheduler-specific instructions: Since a generator task can
potentially yield anything as a scheduler instruction, it's not
inconceivable for specialized schedulers to support specialized
instructions. (Code that relies on such special instructions won't
work on other schedulers, but that would be the point.)
A question open to debate is what a scheduler should do when faced with
an unrecognized scheduling instruction.
Raising TypeError or NotImplementedError back into the task is probably
a reasonable action, and would allow code like:

    try:
        yield fancy_magic_instruction()
    except NotImplementedError:
        yield from boring_fallback()
Generator tasks as schedulers, and vice versa
Note that there is a symmetry to the protocol when a generator task
calls another using "yield from":
def task():
    ...
    spam = yield from subtask()
    ...
Here, task() is both a generator task, and the effective scheduler for
subtask(): it "implements" subtask()'s scheduling instructions by
delegating them to its own scheduler.
On its own this is a plain observation; however, it raises one or two
interesting possibilities for more advanced schedulers implemented as
generator tasks themselves, including:
- Specialized sub-schedulers that run as a normal task within their
parent scheduler, but implement for example weighted or priority
queuing of their subtasks, or similar features.
- "Adapter" schedulers that intercept special scheduler instructions
(say, Deferreds or other library-specific objects), and implement them
using more generic instructions to the underlying scheduler.
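To make the protocol concrete, here is a deliberately minimal single-task scheduler; Sleep is a made-up marker object in the spirit of the hypothetical tasklib above:

```python
import time

class Sleep:
    # Made-up scheduling instruction, standing in for tasklib.sleep();
    # the scheduler decides what it actually means.
    def __init__(self, seconds):
        self.seconds = seconds

def run(task):
    # Drive one generator task to completion, interpreting each yielded
    # "step" value as a scheduling instruction (steps 1-4 of the protocol).
    value, exc = None, None
    while True:
        try:
            if exc is not None:
                step = task.throw(exc)   # deliver an error to the task
                exc = None
            else:
                step = task.send(value)  # resume with the last result
        except StopIteration as stop:
            return stop.value            # the task's final result
        value = None
        if step is None:
            pass                         # "just resume this task later"
        elif isinstance(step, Sleep):
            time.sleep(step.seconds)     # a real loop would set a timer
        else:
            # Unrecognized instruction: raise back into the task.
            exc = NotImplementedError(repr(step))

def demo_task():
    yield None            # plain "resume me later"
    yield Sleep(0)        # recognized instruction
    try:
        yield "bogus"     # unrecognized: thrown back into the task
    except NotImplementedError:
        pass
    return 42

final = run(demo_task())
```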
There has been a lot written on this list about asynchronous, microthreaded and event-driven I/O in the last couple of days. There's too much for me to try to respond to all at once, but I would very much like to (possibly re-)introduce one very important point into the discussion.
Would everyone interested in this please please please read <https://github.com/lvh/async-pep/blob/master/pep-3153.rst> several times? Especially this section: <https://github.com/lvh/async-pep/blob/master/pep-3153.rst#why-separate-prot…>. If it is not clear, please ask questions about it and I will try to needle someone qualified into improving the explanation.
I am bringing this up because I've seen a significant amount of discussion of level-triggering versus edge-triggering. Once you have properly separated out transport logic from application implementation, triggering style is an irrelevant, private implementation detail of the networking layer. Whether the operating system tells Python "you must call recv() once now" or "you must call recv() until I tell you to stop" should not matter to the application if the application is just getting passed the results of recv() which has already been called. Since not all I/O libraries actually have a recv() to call, you shouldn't have the application have to call it. This is perhaps the central design error of asyncore.
If it needs a name, I suppose I'd call my preferred style "event triggering".
Also, I would like to remind all participants that microthreading, request/response abstraction (i.e. Deferreds, Futures), generator coroutines and a common API for network I/O are all very different tasks and do not need to be accomplished all at once. If you try to build something that does all of this stuff, you get most of Twisted core plus half of Stackless all at once, which is a bit much for the stdlib to bite off in one chunk.
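The transport/protocol separation the PEP argues for can be illustrated with a tiny protocol object (the method names echo Twisted/asyncio conventions, but both classes here are made up for illustration):

```python
class EchoProtocol:
    # The application never calls recv(): the event loop / transport
    # layer hands it bytes that have already been read. Whether the OS
    # notification was level- or edge-triggered is invisible here.
    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        self.transport.write(data)

class RecordingTransport:
    # A fake transport for demonstration: records writes in memory
    # instead of touching a socket.
    def __init__(self):
        self.written = b''

    def write(self, data):
        self.written += data

transport = RecordingTransport()
proto = EchoProtocol()
proto.connection_made(transport)
proto.data_received(b'hello')   # as if the loop had read it for us
```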
On 10/8/12, Greg Ewing <greg.ewing(a)canterbury.ac.nz> wrote:
> Ronald Oussoren wrote:
>> neither statvs, statvfs, nor pathconf seem to be able to tell if a
>> filesystem is case insensitive.
> Even if they could, you wouldn't be entirely out of the woods,
> because different parts of the same path can be on different
> file systems...
> But how important is all this anyway? I'm trying to think of
> occasions when I've wanted to compare two entire paths for
> equality, and I can't think of *any*.
I can think of several, but when I thought a bit harder, they were
mostly bug attractors.
If I want my program (or a dict) to know that "CONFIG" and "config"
are the same, then I also want it to know that "My Documents" is the
same as "MYDOCU~1".*
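The stdlib's existing tool stops exactly where this example starts: os.path.normcase (shown via ntpath for Windows semantics) folds case, but knows nothing about short names:

```python
import ntpath

# Case folding is easy and already available:
assert ntpath.normcase('CONFIG') == ntpath.normcase('config')

# ...but 8.3 short-name aliases are invisible to it; only the
# filesystem itself knows these may name the same directory:
assert ntpath.normcase('My Documents') != ntpath.normcase('MYDOCU~1')
```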
Ideally, I would also have a way to find out that a pathname is likely
to be problematic for cross-platform uses, or at least whether two
specific pathnames are known to be collision-prone on existing
platforms other than mine. (But I'm not sure that sort of test can be
reliable enough for the stdlib. Would just checking for caseless
equality, reserved Windows names, and non-alphanumeric characters be
enough?)
*(Well, assuming it is. The short name depends on the history of the
directory.)
[This is the first spin-off thread from "asyncore: included batteries
don't fit"]
On Thu, Oct 11, 2012 at 5:57 PM, Ben Darnell <ben(a)bendarnell.com> wrote:
> On Thu, Oct 11, 2012 at 2:18 PM, Guido van Rossum <guido(a)python.org> wrote:
>>> Re base reactor interface: drawing maximally from the lessons learned in
>>> twisted, I think IReactorCore (start, stop, etc), IReactorTime (call later,
>>> etc), asynchronous-looking name lookup, fd handling are the important parts.
>> That actually sounds more concrete than I'd like a reactor interface
>> to be. In the App Engine world, there is a definite need for a
>> reactor, but it cannot talk about file descriptors at all -- all I/O
>> is defined in terms of RPC operations which have their own (several
>> layers of) async management but still need to be plugged in to user
>> code that might want to benefit from other reactor functionality such
>> as scheduling and placing a call at a certain moment in the future.
> So are you thinking of something like
> reactor.add_event_listener(event_type, event_params, func)? One thing
> to keep in mind is that file descriptors are somewhat special (at
> least in a level-triggered event loop), because of the way the event
> will keep firing until the socket buffer is drained or the event is
> unregistered. I'd be inclined to keep file descriptors in the
> interface even if they just raise an error on app engine, since
> they're fairly fundamental to the (unixy) event loop. On the other
> hand, I don't have any experience with event loops outside the
> unix/network world so I don't know what other systems might need for
> their event loops.
Hmm... This is definitely an interesting issue. I'm tempted to believe
that it is *possible* to change every level-triggered setup into an
edge-triggered setup by using an explicit loop -- but I'm not saying
it is a good idea. In practice I think we need to support both equally
well, so that the *app* can decide which paradigm to use. E.g. if I
were to implement an HTTP server, I might use level-triggered for the
"accept" call on the listening socket, but edge-triggered for
everything else. OTOH someone else might prefer a buffered stream
abstraction that just keeps filling its read buffer (and draining its
write buffer) using level-triggered callbacks, at least up to a
certain buffer size -- we have to be robust here and make it
impossible for an evil client to fill up all our memory without our
code noticing.
I'm not at all familiar with the Twisted reactor interface. My own
design would be along the following lines:
- There's an abstract Reactor class and an abstract Async I/O object
class. To get a reactor to call you back, you must give it an I/O
object, a callback, and maybe some more stuff. (I have gone back and
forth on this, but I currently like passing optional args for the
callback, rather than requiring lambdas to create closures.) Note that the callback is *not* a
designated method on the I/O object! In order to distinguish between
edge-triggered and level-triggered, you just use a different reactor
method. There could also be a reactor method to schedule a "bare"
callback, either after some delay, or immediately (maybe with a given
priority), although such functionality could also be implemented
through magic I/O objects.
- In systems supporting file descriptors, there's a reactor
implementation that knows how to use select/poll/etc., and there are
concrete I/O object classes that wrap file descriptors. On Windows,
those would only be socket file descriptors. On Unix, any file
descriptor would do. To create such an I/O object you would use a
platform-specific factory. There would be specialized factories to
create e.g. listening sockets, connections, files, pipes, and so on.
- In systems like App Engine that don't support async I/O on file
descriptors at all, the constructors for creating I/O objects for disk
files and connection sockets would comply with the interface but fake
out almost everything (just like today, using httplib or httplib2 on
App Engine works by adapting them to a "urlfetch" RPC request).
>>> call_every can be implemented in terms of call_later on a separate object,
>>> so I think it should be (eg twisted.internet.task.LoopingCall). One thing
>>> that is apparently forgotten about is event loop integration. The prime way
>>> of having two event loops cooperate is *NOT* "run both in parallel", it's
>>> "have one call the other". Even though not all loops support this, I think
>>> it's important to get this as part of the interface (raise an exception for
>>> all I care if it doesn't work).
>> This is definitely one of the things we ought to get right. My own
>> thoughts are slightly (perhaps only cosmetically) different again:
>> ideally each event loop would have a primitive operation to tell it to
>> run for a little while, and then some other code could tie several
>> event loops together.
>> Possibly the primitive operation would be something like "block until
>> either you've got one event ready, or until a certain time (possibly
>> 0) has passed without any events, and then give us the events that are
>> ready and a lower bound for when you might have more work to do" -- or
>> maybe instead of returning the event(s) it could just call the
>> associated callback (it might have to if it is part of a GUI library
>> that has callbacks written in C/C++ for certain events like screen
>> refreshes).
> That doesn't work very well - while one loop is waiting for its
> timeout, nothing can happen on the other event loop. You have to
> switch back and forth frequently to keep things responsive, which is
> inefficient. I'd rather give each event loop its own thread; you can
> minimize the thread-synchronization concerns by picking one loop as
> "primary" and having all the others just pass callbacks over to it
> when their events fire.
That's a good point. I suppose on systems that support both networking
and GUI events, in my design these would use different I/O objects
(created using different platform-specific factories) and the shared
reactor API would sort things out based on the type of I/O object
passed in to it.
Note that many GUI events would be level-triggered, but sometimes
using the edge-triggered paradigm can work well too: e.g. I imagine
that writing code to draw a curve following the mouse as long as a
button is pressed might be conveniently written as a loop of the form
def on_mouse_press(x, y, buttons):
    <set up polygon starting at current x, y>
    while True:
        x, y, buttons = yield <get mouse event>
        if not buttons:
            break
        <extend polygon to x, y>
which itself is registered as a level-triggered handler for mouse
presses. (Dealing with multiple buttons is left as an exercise. :-)
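A runnable rendition of that sketch, with synthetic events standing in for a real GUI's and the angle-bracket placeholders replaced by list operations:

```python
def on_mouse_press(x, y, buttons):
    polygon = [(x, y)]            # <set up polygon at current x, y>
    while True:
        x, y, buttons = yield     # <get mouse event>
        if not buttons:
            return polygon        # button released: drawing is done
        polygon.append((x, y))    # <extend polygon to x, y>

# Feed it a synthetic drag: press at (0, 0), move twice, release.
handler = on_mouse_press(0, 0, 1)
next(handler)                     # advance to the first yield
handler.send((1, 1, 1))
handler.send((2, 2, 1))
try:
    handler.send((3, 3, 0))      # no buttons: the handler finishes
except StopIteration as stop:
    polygon = stop.value
```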
--Guido van Rossum (python.org/~guido)
I'm about to enter an intense all-day-long meeting at work, and won't
have time to keep up with email at all until late tonight. So have fun
discussing async APIs without me, and please stay on topic!
--Guido van Rossum (python.org/~guido)
(This is a response to GVR's Google+ post asking for ideas; I
apologize in advance if I come off as an ignorant programming newbie)
I am the author of Gate One (https://github.com/liftoff/GateOne/)
which makes extensive use of Tornado's asynchronous capabilities. It
also uses multiprocessing and threading to a lesser extent. The
biggest issue I've had trying to write asynchronous code for Gate One
is complexity. Complexity creates problems with expressiveness which
results in code that, to me, feels un-Pythonic. For evidence of this
I present the following example: The retrieve_log_playback()
function: http://bit.ly/W532m6 (link goes to Github)
All the function does is generate and return (to the client browser)
an HTML playback of their terminal session recording. To do it
efficiently without blocking the event loop or slowing down all other
connected clients required loads of complexity (or maybe I'm just
ignorant of "a better way"--feel free to enlighten me). In an ideal
world I could have just done something like this:
import async # The API of the future ;)
async.async_call(retrieve_log_playback, settings, tws)
# tws == instance of tornado.web.WebSocketHandler that holds the open connection
...but instead I had to create an entirely separate function to act as
the multiprocessing.Process(), create a multiprocessing.Queue() to
shuffle data back and forth, watch a special file descriptor for
updates (so I can tell when the task is complete), and also create a
closure because the connection instance (aka 'tws') isn't pickleable.
After reading through these threads I feel much of the discussion is
over my head but as someone who will ultimately become a *user* of the
"async API of the future" I would like to share my thoughts...
My opinion is that the goal of any async module that winds up in
Python's standard library should be simplicity and portability. In
terms of features, here's my 'async wishlist':
* I should not have to worry about what is and isn't pickleable when I
decide that a task should be performed asynchronously.
* I should be able to choose the type of event loop/async mechanism
that is appropriate for the task: For CPU-bound tasks I'll probably
want to use multiprocessing. For IO-bound tasks I might want to use
threading. For a multitude of tasks that "just need to be async" (by
nature) I'll want to use an event loop.
* Any async module should support 'basics' like calling functions at
an interval and calling functions after a timeout occurs (with the
ability to cancel).
* Asynchronous tasks should be able to access the same namespace as
everything else. Maybe wishful thinking.
* It should support publish/subscribe-style events (i.e. an event
dispatcher). For example, the ability to watch a file descriptor or
socket for changes in state and call a function when that happens.
Preferably with the flexibility to define custom events (i.e. don't
have it tied to kqueue/epoll-specific events).
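For the timeout-with-cancel item, here is what that wish eventually looks like in asyncio terms (which did not exist when this was written): loop.call_later returns a handle whose cancel() revokes the callback.

```python
import asyncio

fired = []

async def main():
    loop = asyncio.get_running_loop()
    keep = loop.call_later(0.01, fired.append, 'kept')
    drop = loop.call_later(0.01, fired.append, 'dropped')
    drop.cancel()            # the "ability to cancel" from the wishlist
    await asyncio.sleep(0.05)  # let the surviving timer fire

asyncio.run(main())
```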
Thanks for your consideration; and thanks for the awesome language.
Dan McDougall - Chief Executive Officer and Developer
Liftoff Software ✈ Your flight to the cloud is now boarding.