From casevh at  Mon Oct  1 00:08:07 2012
From: casevh at (Case Van Horsen)
Date: Sun, 30 Sep 2012 15:08:07 -0700
Subject: [Python-ideas] Deprecate the round builtin
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Sep 30, 2012 at 2:51 PM, Joshua Landau
< at> wrote:
> On 30 September 2012 22:48, Joshua Landau < at>
> wrote:
>> This seems like a problem for the proposal, though: we can't have it in
>> the math library if it's a method!
> Now I think about it: yeah, it can be. We just coerce to float/decimal
> first. *sigh*
math.ceil(x), math.floor(x), math.trunc(x), and round(x) already
call the special methods x.__ceil__, x.__floor__, x.__trunc__, and
x.__round__. So those four functions already work with decimal
instances (and other numeric types that support those methods).
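
A minimal sketch of the dispatch described above, assuming Python 3. The
class name Metres is hypothetical and exists only for illustration; Decimal
is the standard library type.

```python
import math
from decimal import Decimal


class Metres:
    """Hypothetical numeric wrapper, used only for illustration."""

    def __init__(self, value):
        self.value = value

    # math.ceil, math.floor, math.trunc and round look these up on the type.
    def __ceil__(self):
        return math.ceil(self.value)

    def __floor__(self):
        return math.floor(self.value)

    def __trunc__(self):
        return math.trunc(self.value)

    def __round__(self, ndigits=None):
        return round(self.value, ndigits)


m = Metres(2.5)
results = (math.ceil(m), math.floor(m), math.trunc(m), round(m))

# Decimal defines the same methods, so no coercion to float is needed.
d = Decimal("2.5")
decimal_results = (math.ceil(d), math.floor(d), math.trunc(d), round(d))
```

Both tuples come out the same here because round() with no second argument
rounds halves to the nearest even integer for float and for Decimal.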


> _______________________________________________
> Python-ideas mailing list
> Python-ideas at

From at  Mon Oct  1 00:19:33 2012
From: at (Joshua Landau)
Date: Sun, 30 Sep 2012 23:19:33 +0100
Subject: [Python-ideas] Deprecate the round builtin
In-Reply-To: <>
References: <>
Message-ID: <>

On 30 September 2012 23:08, Case Van Horsen <casevh at> wrote:

> On Sun, Sep 30, 2012 at 2:51 PM, Joshua Landau
> < at> wrote:
> > On 30 September 2012 22:48, Joshua Landau < at>
> > wrote:
> >>
> >> This seems like a problem for the proposal, though: we can't have it in
> >> the math library if it's a method!
> >
> >
> > Now I think about it: yeah, it can be. We just coerce to float/decimal
> > first. *sigh*
> math.ceil(x), math.floor(x), math.trunc(x), and round(x) already
> call the special methods x.__ceil__, x.__floor__, x.__trunc__, and
> x.__round__. So those four functions already work with decimal
> instances (and other numeric types that support those methods).

>>> math.ceil("")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: *a float is required*

How deceptive... I hope you forgive me for not realizing that (even though
I must have seen the __ceil__ and __floor__ methods a thousand times).
OK, carry on.

From steve at  Mon Oct  1 03:44:36 2012
From: steve at (Steven D'Aprano)
Date: Mon, 1 Oct 2012 11:44:36 +1000
Subject: [Python-ideas] Deprecate the round builtin
In-Reply-To: <>
References: <>
Message-ID: <20121001014435.GC8499@ando>

On Sun, Sep 30, 2012 at 02:38:33PM -0700, Gregory P. Smith wrote:
> Why suggest adding new round-like functions to the math module rather than
> defining a new round method on all numerical objects?

round already calls the special __round__ method, and in 3.2 works with 
ints, floats, Decimals and Fractions. Only complex misses out.

py> round(12345, -2)
12300
py> from decimal import Decimal as D
py> round(D("1.2345"), 2)
Decimal('1.23')
py> from fractions import Fraction as F
py> round(F(12345, 10000), 2)
Fraction(123, 100)
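
A small sketch of the same point from another angle, assuming Python 3:
round() hands back whatever the type's __round__ returns, so the result
type follows the argument type, and complex is rejected because it defines
no __round__.

```python
from decimal import Decimal
from fractions import Fraction

# With an explicit ndigits, each type returns an instance of itself.
values = (12345, 1.2345, Decimal("1.2345"), Fraction(12345, 10000))
rounded_types = [type(round(v, 2)).__name__ for v in values]

# complex defines no __round__, so round() raises TypeError.
try:
    round(1 + 2j)
except TypeError:
    complex_supported = False
else:
    complex_supported = True
```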


From tjreedy at  Mon Oct  1 05:17:55 2012
From: tjreedy at (Terry Reedy)
Date: Sun, 30 Sep 2012 23:17:55 -0400
Subject: [Python-ideas] Deprecate the round builtin
In-Reply-To: <>
References: <>
Message-ID: <k4b217$ufg$>

On 9/30/2012 6:19 PM, Joshua Landau wrote:

>       >>> math.ceil("")
>     Traceback (most recent call last):
>        File "<stdin>", line 1, in <module>
>     TypeError: *a float is required*
> How deceptive... I hope you forgive me for not realizing that (even
> though I must have seen the __ceil__ and __floor__ methods a thousand
> times).
> OK, carry on.

The obsolete error message should be fixed. A number is required. Or 
perhaps 'float or number with __ceil__ method'.

Terry Jan Reedy

From dreamingforward at  Mon Oct  1 06:46:05 2012
From: dreamingforward at (Mark Adam)
Date: Sun, 30 Sep 2012 23:46:05 -0500
Subject: [Python-ideas] Deprecate the round builtin
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Sep 27, 2012 at 5:23 AM, Greg Ewing <greg.ewing at> wrote:
> Presumably they would be implemented as module objects,
> created automatically at interpreter startup instead of
> being loaded from a file.
> In which case "built-in module" might be a better term
> for them. And their names should start with lower case.

That's cool.  YES, lowercase.

> Also you wouldn't need new syntax to get names out of
> them, just the existing import machinery:
>   from numbers import *

Well, to me there must be a clear partitioning.

The stuff in the builtin [module] sets the tone for the whole
interpreter environment (and I think python culture itself).  If one
were to use the standard import language (like in your example), it
confuses one "semantically" -- because you're suggesting to treat it
(i.e. a whole class of "things") as something optional.

Does that make sense?  Thanks,


From steve at  Mon Oct  1 08:05:51 2012
From: steve at (Steven D'Aprano)
Date: Mon, 1 Oct 2012 16:05:51 +1000
Subject: [Python-ideas] Namespaces and modules [was Deprecate the round
In-Reply-To: <>
References: <>
Message-ID: <20121001060551.GA9193@ando>

On Sun, Sep 30, 2012 at 11:46:05PM -0500, Mark Adam wrote:
> On Thu, Sep 27, 2012 at 5:23 AM, Greg Ewing <greg.ewing at> wrote:
> > Presumably they would be implemented as module objects,
> > created automatically at interpreter startup instead of
> > being loaded from a file.
> >
> > In which case "built-in module" might be a better term
> > for them. And their names should start with lower case.
> That's cool.  YES, lowercase.

I'm not sure why "built-in module" is a better term for something which 
I gather is a separate namespace within a module, so you can have:

module.spam  # global namespace
module.sub.spam  # sub is a "submodule" or "namespace"

but sub has no independent existence as a file on disk. If that's what 
we're discussing, I don't think that "built-in module" is a good name, 
since it isn't *built-in*.

We already have something called "built-in modules" -- modules like sys 
which actually are built-in to the Python virtual machine.
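
A short sketch of what "built-in module" already means, assuming CPython.
The exact contents of sys.builtin_module_names vary by build and platform.

```python
import sys

# Names of the modules compiled into this interpreter.
compiled_in = sys.builtin_module_names

sys_is_builtin = "sys" in compiled_in

# A built-in module is not loaded from a file, so it has no __file__.
sys_has_file = hasattr(sys, "__file__")
```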

> > Also you wouldn't need new syntax to get names out of
> > them, just the existing import machinery:
> >
> >   from numbers import *
> Well, to me there must be a clear partitioning.
> The stuff in the builtin [module] sets the tone for the whole
> interpreter environment (and I think python culture itself).  If one
> were to use the standard import language (like in your example), it
> confuses one "semantically" -- because you're suggesting to treat it
> (i.e. a whole class of "things") as something optional.
> Does that make sense?

Not to me, I'm afraid.


From stefan_ml at  Mon Oct  1 09:22:50 2012
From: stefan_ml at (Stefan Behnel)
Date: Mon, 01 Oct 2012 09:22:50 +0200
Subject: [Python-ideas] make decimal the default non-integer instead of
In-Reply-To: <k4a29h$7rr$>
References: <>
	<k48ldn$bl7$> <k4a29h$7rr$>
Message-ID: <k4bgc9$b6$>

Serhiy Storchaka, 30.09.2012 18:35:
> Instructive story about fractions:

Sorry - I don't get it. Instructive in what way?


From solipsis at  Mon Oct  1 13:57:46 2012
From: solipsis at (Antoine Pitrou)
Date: Mon, 1 Oct 2012 13:57:46 +0200
Subject: [Python-ideas] Deprecate the round builtin
References: <>
Message-ID: <>

On Wed, 26 Sep 2012 17:21:40 -0400
Daniel Holth <dholth at> wrote:
> Normally deprecation means you keep it forever but don't mention it much in
> the docs...

Not really. Most deprecated things disappear one or two versions after
they are deprecated. We only keep something forever when removing it
would break a lot of code and keeping it is cheap.



Software development and contracting:

From jimjjewett at  Mon Oct  1 17:43:04 2012
From: jimjjewett at (Jim Jewett)
Date: Mon, 1 Oct 2012 11:43:04 -0400
Subject: [Python-ideas] Visually confusable unicode characters in
In-Reply-To: <>
References: <>
Message-ID: <>

On 9/30/12, Steven D'Aprano <steve at> wrote:
> On 01/10/12 00:00, Oscar Benjamin wrote:

> py> A = 42
> py> Α = 23
> py> A == Α
> False

It will never be possible to catch all confusables, which is one
reason that the unicode property stalled.

It seems like it would be reasonable to at least warn when identifiers
are not all in the same script -- but real-world examples from Emacs
Lisp made it clear that this is often intentional.  There were still
clear word-boundaries, but it wasn't clear how that word-boundary
detection could be properly automated in the general case.

> Besides, just because you and I can't distinguish A from Α in my editor,
> using one particular choice of font, doesn't mean that the author or his
> intended audience (Greek programmers perhaps?) can't distinguish them,

In many cases, it does -- for the letters to look different requires
an unnatural font choice, though perhaps not so extreme as the
print-the-hex-code font.

> I would welcome "confusable detection" in the standard library, possibly a
> string method "skeleton" or some other interface to the Confusables file,
> perhaps in unicodedata.

I would too, and agree that it shouldn't be limited to identifiers.
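
A rough sketch of a mixed-script check, assuming Python 3. unicodedata does
not expose a script property, so the helper below (scripts_in, a
hypothetical name) uses the first word of each character's Unicode name as
a stand-in. It is not an implementation of the Confusables data.

```python
import unicodedata


def scripts_in(identifier):
    """Return the leading word of each letter's Unicode name.

    This is only a rough stand-in for a real script property.
    """
    return {
        unicodedata.name(ch).split()[0]
        for ch in identifier
        if ch.isalpha()
    }


latin_a = "A"            # U+0041 LATIN CAPITAL LETTER A
greek_alpha = "\u0391"   # U+0391 GREEK CAPITAL LETTER ALPHA

# The two characters usually render alike but compare unequal.
same_glyph_different_char = latin_a != greek_alpha

# Hypothetical identifier mixing Latin letters with a Greek alpha.
mixed = scripts_in("p" + greek_alpha + "th")
```

A warning based on this would flag any identifier for which the set has
more than one element, which is exactly the case where intentional mixing
produces false positives.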


From grosser.meister.morti at  Mon Oct  1 18:07:19 2012
From: grosser.meister.morti at (Mathias Panzenböck)
Date: Mon, 01 Oct 2012 18:07:19 +0200
Subject: [Python-ideas] Visually confusable unicode characters in
In-Reply-To: <>
References: <>
Message-ID: <>

I still don't understand why unicode characters are allowed at all in identifier names. Is the 
reason for this written down somewhere?

On 10/01/2012 05:43 PM, Jim Jewett wrote:
> On 9/30/12, Steven D'Aprano <steve at> wrote:
>> On 01/10/12 00:00, Oscar Benjamin wrote:
>> py> A = 42
>> py> Α = 23
>> py> A == Α
>> False
> It will never be possible to catch all confusables, which is one
> reason that the unicode property stalled.
> It seems like it would be reasonable to at least warn when identifiers
> are not all in the same script -- but real-world examples from Emacs
> Lisp made it clear that this is often intentional.  There were still
> clear word-boundaries, but it wasn't clear how that word-boundary
> detection could be properly automated in the general case.
>> Besides, just because you and I can't distinguish A from Α in my editor,
>> using one particular choice of font, doesn't mean that the author or his
>> intended audience (Greek programmers perhaps?) can't distinguish them,
> In many cases, it does -- for the letters to look different requires
> an unnatural font choice, though perhaps not so extreme as the
> print-the-hex-code font.
>> I would welcome "confusable detection" in the standard library, possibly a
>> string method "skeleton" or some other interface to the Confusables file,
>> perhaps in unicodedata.
> I would too, and agree that it shouldn't be limited to identifiers.
> -jJ

From dreamingforward at  Mon Oct  1 18:12:46 2012
From: dreamingforward at (Mark Adam)
Date: Mon, 1 Oct 2012 11:12:46 -0500
Subject: [Python-ideas] Namespaces and modules [was Deprecate the round
In-Reply-To: <20121001060551.GA9193@ando>
References: <>
Message-ID: <>

On Mon, Oct 1, 2012 at 1:05 AM, Steven D'Aprano <steve at> wrote:
> I'm not sure why "built-in module" is a better term for something which
> I gather is a separate namespace within a module, so you can have:

Yeah, I'm not really sure it makes sense to call it a module at all.
I was sort of capitulating about the use of the word "module".   It's
not like you can do "import __builtins__" in the interpreter, so if
one is going to call it a module (like the interpreter currently
does), one should see that it is a very special exception of the word.

I prefer "namespace", it's the built-in namespace which is a synonym
for "the global module".

>> Well, to me there must be a clear partitioning.
>> The stuff in the builtin [module] sets the tone for the whole
>> interpreter environment (and I think python culture itself).  If one
>> were to use the standard import language (like in your example), it
>> confuses one "semantically" -- because you're suggesting to treat it
>> (i.e. a whole class of "things") as something optional.
>> Does that make sense?
> Not to me, I'm afraid.

Hopefully the above makes it a little clearer.  But, it's as if you're
going on a road trip, you want to travel efficient and light -- what
you include in your backpack ("interpreter environment") is your
"builtin" and everything else you'll "buy"/import on the road.
Modules are those things on the road.


From rosuav at  Mon Oct  1 18:19:40 2012
From: rosuav at (Chris Angelico)
Date: Tue, 2 Oct 2012 02:19:40 +1000
Subject: [Python-ideas] Visually confusable unicode characters in
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 2, 2012 at 2:07 AM, Mathias Panzenböck
<grosser.meister.morti at> wrote:
> I still don't understand why unicode characters are allowed at all in
> identifier names. Is the reason for this written down somewhere?

Same reason you're allowed more than two letters in your identifiers:
to allow programmers to make variable names meaningful. The problem
isn't with Unicode, anyway; there are plenty of fonts in which l and 1
are practically identical, and unless your font is monospaced, you
probably will have trouble distinguishing __________rn___ from
__________m___ (just how many underscores IS that?). It's up to the
programmer to be smart about his names.


From robert.kern at  Mon Oct  1 18:43:40 2012
From: robert.kern at (Robert Kern)
Date: Mon, 01 Oct 2012 17:43:40 +0100
Subject: [Python-ideas] Visually confusable unicode characters in
In-Reply-To: <>
References: <>
Message-ID: <k4ch7q$8rv$>

On 10/1/12 5:07 PM, Mathias Panzenböck wrote:
> I still don't understand why unicode characters are allowed at all in identifier
> names. Is the reason for this written down somewhere?

Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco

From grosser.meister.morti at  Mon Oct  1 19:02:07 2012
From: grosser.meister.morti at (Mathias Panzenböck)
Date: Mon, 01 Oct 2012 19:02:07 +0200
Subject: [Python-ideas] Visually confusable unicode characters in
In-Reply-To: <k4ch7q$8rv$>
References: <>
	<> <k4ch7q$8rv$>
Message-ID: <>

On 10/01/2012 06:43 PM, Robert Kern wrote:
> On 10/1/12 5:07 PM, Mathias Panzenböck wrote:
>> I still don't understand why unicode characters are allowed at all in identifier
>> names. Is the reason for this written down somewhere?

But the Python keywords and more importantly the documentation is English. Don't you need to be able 
to speak/write English in order to code Python anyway? And if you keep your code+comments English you 
can access a much larger developer pool (all developers who speak English should by my hypothesis be 
a superset of all developers who speak a certain language).

From massimo.dipierro at  Mon Oct  1 19:18:31 2012
From: massimo.dipierro at (Massimo DiPierro)
Date: Mon, 1 Oct 2012 12:18:31 -0500
Subject: [Python-ideas] Visually confusable unicode characters in
In-Reply-To: <k4ch7q$8rv$>
References: <>
	<> <k4ch7q$8rv$>
Message-ID: <>

The great thing about open source is that it has brought the world together. I am not an English speaker and I learned the meaning of IF, THEN, FOR, WHILE, not in the context of the English language, but as keywords of the Basic programming language. The fact that they are English words is accidental. The great thing about code is (used to be) that anybody can read and understand what others write.

When I used to program in Italy, I had to deal with latin-1 characters. This was never a problem. Not even in Cobol, Basic, Clipper, or Paradox because data should be separated from code. Data may contain latin-1 or unicode or whatever. Code always contains ASCII and if one does not mix the two there is never a problem.

Allowing unicode in variable names blurs this separation. It makes code written by people speaking one language unreadable by people speaking a different language.

I should point out that most of my students are Chinese. They do not have any problem with reading and writing code using the English alphabet.

Any one of us could design better power plugs for our homes. That does not mean it would be a good idea to do so.


On Oct 1, 2012, at 11:43 AM, Robert Kern wrote:

> On 10/1/12 5:07 PM, Mathias Panzenböck wrote:
>> I still don't understand why unicode characters are allowed at all in identifier
>> names. Is the reason for this written down somewhere?
> -- 
> Robert Kern
> "I have come to believe that the whole world is an enigma, a harmless enigma
> that is made terrible by our own mad attempt to interpret it as though it had
> an underlying truth."
>  -- Umberto Eco

From guido at  Mon Oct  1 19:44:42 2012
From: guido at (Guido van Rossum)
Date: Mon, 1 Oct 2012 10:44:42 -0700
Subject: [Python-ideas] Visually confusable unicode characters in
In-Reply-To: <>
References: <>
	<> <k4ch7q$8rv$>
Message-ID: <>

On Mon, Oct 1, 2012 at 10:02 AM, Mathias Panzenböck
<grosser.meister.morti at> wrote:
> On 10/01/2012 06:43 PM, Robert Kern wrote:
>> On 10/1/12 5:07 PM, Mathias Panzenböck wrote:
>>> I still don't understand why unicode characters are allowed at all in
>>> identifier
>>> names. Is the reason for this written down somewhere?
> But the Python keywords and more importantly the documentation is English.
> Don't you need to be able to speak/write English in order to code Python
> anyway? And if you keep your code+comments English you can access a much
> larger developer pool (all developers who speak English should by my
> hypothesis be a superset of all developers who speak a certain language).

Hi Mathias,

Your objections go pretty much exactly along the lines of my original
resistance to this proposal (which was proposed many times before it
got to be a PEP). What finally made me change my mind was talking to
educators who were teaching Python in countries where not only English
is not the primary language, the primary language is not even related
to English. (E.g. Chinese or Japanese.)

Teaching the students the necessary language keywords and standard
library names is not that difficult; even if English *is* your primary
language you have to learn what they mean in the context of
programming. (Example: "print" comes from a very ancient mode of using
computers where the only form of output was through a physical
printer.)

But these students often have a very limited English vocabulary, and
their science and math classes (which are often useful starting points
for programming exercises) are usually taught in the native language.
So when teachers show students example programs it helps if they can
name e.g. their variables and functions in the native language.
Comments are also often written in the native language. Here, it
really helps if the students can type their native language directly
rather than having to use the Latin transcription (even if they often
also have to learn the latter, for unrelated pragmatic reasons).

From your name and email it sounds like your native language might be
German. Like me, you probably take pride in your English skills and
like me, you write all your code using English for identifiers and
comments. However, for students just learning to program and not yet
well-versed in English, that would be like trying to teach them
multiple things at once. It may work for the smartest students, but it
probably would be unnecessarily off-putting for many others.

As an example in German, I found a Python book aimed at middle- and
high-schoolers written in German, Python für Kids. You can look inside
it on the Amazon website:
-- the examples use German words for most module and variable names.
Luckily German limited to ASCII is still fairly readable ("fuer"
instead of "für" etc.), so Unicode is not strictly needed for this
case -- but you can understand that in languages whose native alphabet
is not English, Unicode is essential for the same style of

I'm sure there are also examples beyond education -- e.g. in a program
for calculating dutch taxes I would use the dutch names for the
various technical terms naming concepts in dutch tax law, and again,
in the case of the Dutch language that doesn't require Unicode, but
for many other languages it would.

I hope this helps. (Also note, as the PEP states explicitly, that the
Python standard library should use only ASCII and English for
identifiers and comments, except in those unittests that are
specifically testing the Unicode identifiers feature.)
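
A minimal sketch of the kind of classroom program described above, assuming
Python 3 and a UTF-8 source file. The German names fläche, breite, höhe and
größe are hypothetical and chosen only for illustration.

```python
# Hypothetical classroom example with German identifiers.
def fläche(breite, höhe):
    """Fläche eines Rechtecks (area of a rectangle)."""
    return breite * höhe


größe = fläche(3, 4)

# str.isidentifier() reports whether a string is a valid identifier.
valid = "größe".isidentifier()
```

The keywords def and return stay in English; only the names chosen by the
teacher or student change.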

--Guido van Rossum

From guido at  Mon Oct  1 19:51:54 2012
From: guido at (Guido van Rossum)
Date: Mon, 1 Oct 2012 10:51:54 -0700
Subject: [Python-ideas] Visually confusable unicode characters in
In-Reply-To: <>
References: <>
	<> <k4ch7q$8rv$>
Message-ID: <>

On Mon, Oct 1, 2012 at 10:18 AM, Massimo DiPierro
<massimo.dipierro at> wrote:
> The great thing about open source is that it has brought the world together. I am not an English speaker and I learned the meaning of IF, THEN, FOR, WHILE, not in the context of the English language, but as keywords of the Basic programming language. The fact that they are English words is accidental. The great thing about code is (used to be) that anybody can read and understand what others write.
> When I used to program in Italy, I had to deal with latin-1 characters. This was never a problem. Not even in Cobol, Basic, Clipper, or Paradox because data should be separated from code. Data may contain latin-1 or unicode or whatever. Code always contains ASCII and if one does not mix the two there is never a problem.
> Allowing unicode in variable names blurs this separation. It makes code written by people speaking one language unreadable by people speaking a different language.
> I should point out that most of my students are Chinese. They do not have any problem with reading and writing code using the English alphabet.
> Any one of us could design better power plugs for our homes. That does not mean it would be a good idea to do so.

Our posts crossed. I hope my explanation makes sense to you. The age /
grade level of students probably matters; all classes in middle or
high school are typically taught in the native language, but in
University more and more courses are taught in English (some European
countries are even making English the mandatory teaching language at
the University level).

Not everything you design is meant to be a better power plug for the
world. Sometimes you just need to find a way to fit *your* oven in
*your* cabinet, and cutting up some planks in a way that wouldn't work
for anyone else is fine.

--Guido van Rossum

From g.brandl at  Mon Oct  1 19:48:44 2012
From: g.brandl at (Georg Brandl)
Date: Mon, 01 Oct 2012 19:48:44 +0200
Subject: [Python-ideas] Visually confusable unicode characters in
In-Reply-To: <>
References: <>
	<> <k4ch7q$8rv$>
Message-ID: <k4cl1q$dd1$>

On 10/01/2012 07:02 PM, Mathias Panzenböck wrote:
> On 10/01/2012 06:43 PM, Robert Kern wrote:
>> On 10/1/12 5:07 PM, Mathias Panzenböck wrote:
>>> I still don't understand why unicode characters are allowed at all in identifier
>>> names. Is the reason for this written down somewhere?
> But the Python keywords and more importantly the documentation is English. Don't you need to be able
> to speak/write English in order to code Python anyway? And if you keep your code+comments English you
> can access a much larger developer pool (all developers who speak English should by my hypothesis be
> a superset of all developers who speak a certain language).

Please; the PEP has been discussed quite a lot when it was proposed,
and believe me, yours is not an unfamiliar argument :)  You're about
5 years late.


From solipsis at  Mon Oct  1 20:04:06 2012
From: solipsis at (Antoine Pitrou)
Date: Mon, 1 Oct 2012 20:04:06 +0200
Subject: [Python-ideas] Visually confusable unicode characters in
References: <>
	<> <k4ch7q$8rv$>
Message-ID: <>

On Mon, 1 Oct 2012 10:44:42 -0700
Guido van Rossum <guido at> wrote:
> As an example in German, I found a Python book aimed at middle- and
> high-schoolers written in German, Python für Kids. You can look inside
> it on the Amazon website:

Oh but why isn't it named Python für Kinder? :-)




From guido at  Mon Oct  1 20:10:32 2012
From: guido at (Guido van Rossum)
Date: Mon, 1 Oct 2012 11:10:32 -0700
Subject: [Python-ideas] Visually confusable unicode characters in
In-Reply-To: <>
References: <>
	<> <k4ch7q$8rv$>
Message-ID: <>

On Mon, Oct 1, 2012 at 11:04 AM, Antoine Pitrou <solipsis at> wrote:
> On Mon, 1 Oct 2012 10:44:42 -0700
> Guido van Rossum <guido at> wrote:
>> As an example in German, I found a Python book aimed at middle- and
>> high-schoolers written in German, Python für Kids. You can look inside
>> it on the Amazon website:
> Oh but why isn't it named Python für Kinder? :-)

Probably to be "cool" for the "kids". Why is a mobile phone in Germany
called a "Handy" ?

--Guido van Rossum

From jkbbwr at  Mon Oct  1 20:12:41 2012
From: jkbbwr at (Jakob Bowyer)
Date: Mon, 1 Oct 2012 19:12:41 +0100
Subject: [Python-ideas] Visually confusable unicode characters in
In-Reply-To: <>
References: <>
	<> <k4ch7q$8rv$>
Message-ID: <>

Because it fits in your hand? And its handy? :)

On Mon, Oct 1, 2012 at 7:10 PM, Guido van Rossum <guido at> wrote:
> On Mon, Oct 1, 2012 at 11:04 AM, Antoine Pitrou <solipsis at> wrote:
>> On Mon, 1 Oct 2012 10:44:42 -0700
>> Guido van Rossum <guido at> wrote:
>>> As an example in German, I found a Python book aimed at middle- and
>>> high-schoolers written in German, Python für Kids. You can look inside
>>> it on the Amazon website:
>> Oh but why isn't it named Python für Kinder? :-)
> Probably to be "cool" for the "kids". Why is a mobile phone in Germany
> called a "Handy" ?
> --
> --Guido van Rossum

From tjreedy at  Mon Oct  1 20:21:05 2012
From: tjreedy at (Terry Reedy)
Date: Mon, 01 Oct 2012 14:21:05 -0400
Subject: [Python-ideas] Visually confusable unicode characters in
In-Reply-To: <>
References: <>
	<> <k4ch7q$8rv$>
Message-ID: <k4cmum$1no$>

On 10/1/2012 1:02 PM, Mathias Panzenböck wrote:
> On 10/01/2012 06:43 PM, Robert Kern wrote:
>> On 10/1/12 5:07 PM, Mathias Panzenböck wrote:
>>> I still don't understand why unicode characters are allowed at all in
>>> identifier
>>> names. Is the reason for this written down somewhere?

I have the impression that latin-1 chars were/are (unofficially) 
accepted in Python2.

> But the Python keywords and more importantly the documentation is
> English.

I know of at least one translation
though keeping up with changes is obviously a problem.

There are multiple books in multiple languages. When I went to a 
bookstore in Japan, the programming languages section had about 8 for 
Python. I suspect that is more than in most equivalent US bookstores.

Terry Jan Reedy

From ncoghlan at  Mon Oct  1 20:35:47 2012
From: ncoghlan at (Nick Coghlan)
Date: Tue, 2 Oct 2012 00:05:47 +0530
Subject: [Python-ideas] Namespaces and modules [was Deprecate the round
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 1, 2012 at 9:42 PM, Mark Adam <dreamingforward at> wrote:
> On Mon, Oct 1, 2012 at 1:05 AM, Steven D'Aprano <steve at> wrote:
>> I'm not sure why "built-in module" is a better term for something which
>> I gather is a separate namespace within a module, so you can have:
> Yeah, I'm not really sure it makes sense to call it a module at all.
> I was sort of capitulating about the use of the word "module".   It's
> not like you can do "import __builtins__" in the interpreter, so if
> one is going to call it a module (like the interpreter currently
> does), one should see that it is a very special exception of the word.

"import __builtin__" in Python 2, "import builtins" in Python 3. The
contents of those modules are implicitly made available to all Python
code running in that process.
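
A short sketch of the Python 3 spelling, showing that the module's contents
are the same objects that are visible without any import:

```python
import builtins

# Names in the builtins module are visible everywhere without an import.
same_object = builtins.round is round
round_is_listed = "round" in dir(builtins)
```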


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From massimo.dipierro at  Mon Oct  1 21:29:46 2012
From: massimo.dipierro at (Massimo DiPierro)
Date: Mon, 1 Oct 2012 14:29:46 -0500
Subject: [Python-ideas] Visually confusable unicode characters in
In-Reply-To: <>
References: <>
	<> <k4ch7q$8rv$>
Message-ID: <>

Hello Guido,

it does make sense. The only point I tried to make is that, because something is allowed, it does not mean it should be encouraged.
I am sure there are instructors who want to teach to code using Japanese or Chinese variable names. Python gives them a way to do so. 
Yet, if they do so, they would be isolating their students and their code from the rest of the world.


On Oct 1, 2012, at 12:51 PM, Guido van Rossum wrote:

> On Mon, Oct 1, 2012 at 10:18 AM, Massimo DiPierro
> <massimo.dipierro at> wrote:
>> The great thing about open source is that it has brought the world together. I am not an English speaker and I learned the meaning of IF, THEN, FOR, WHILE, not in the context of the English language, but as keywords of the Basic programming language. The fact that they are English words is accidental. The great thing about code is (used to be) that anybody can read and understand what others write.
>> When I used to program in Italy, I had to deal with latin-1 characters. This was never a problem. Not even in Cobol, Basic, Clipper, or Paradox because data should be separated from code. Data may contain latin-1 or unicode or whatever. Code always contains ASCII and if one does not mix the two there is never a problem.
>> Allowing unicode in variable names blurs this separation. It makes code written by people speaking one language unreadable by people speaking a different language.
>> I should point out that most of my students are Chinese. They do not have any problem with reading and writing code using the English alphabet.
>> Any one of us could design better power plugs for our homes. That does not mean it would be a good idea to do so.
> Our posts crossed. I hope my explanation makes sense to you. The age /
> grade level of students probably matters; all classes in middle or
> high school are typically taught in the native language, but in
> University more and more courses are taught in English (some European
> countries are even making English the mandatory teaching language at
> the University level).
> Not everything you design is meant to be a better power plug for the
> world. Sometimes you just need to find a way to fit *your* oven in
> *your* cabinet, and cutting up some planks in a way that wouldn't work
> for anyone else is fine.
> -- 
> --Guido van Rossum

From grosser.meister.morti at  Mon Oct  1 21:33:13 2012
From: grosser.meister.morti at (Mathias Panzenböck)
Date: Mon, 01 Oct 2012 21:33:13 +0200
Subject: [Python-ideas] Visually confusable unicode characters in
In-Reply-To: <k4cl1q$dd1$>
References: <>
	<> <k4ch7q$8rv$>
	<> <k4cl1q$dd1$>
Message-ID: <>

On 10/01/2012 07:48 PM, Georg Brandl wrote:
> On 10/01/2012 07:02 PM, Mathias Panzenböck wrote:
>> On 10/01/2012 06:43 PM, Robert Kern wrote:
>>> On 10/1/12 5:07 PM, Mathias Panzenböck wrote:
>>>> I still don't understand why unicode characters are allowed at all in identifier
>>>> names. Is the reason for this written down somewhere?
>> But the Python keywords and more importantly the documentation is English. Don't you need to be able
>> to speak/write English in order to code Python anyway? And if you keep your code+comments English you
>> can access a much larger developer pool (all developers who speak English should by my hypothesis be
>> a superset of all developers who speak a certain language).
> Please; the PEP has been discussed quite a lot when it was proposed,
> and believe me, yours is not an unfamiliar argument :)  You're about
> 5 years late.
> Georg

I didn't want to start a discussion. I just wanted to know why one would implement such a language 
feature. Guido's answer cleared it up for me, thanks. I can see the purpose in an educational 
setting (not in production code of anything a little bit bigger).


From ncoghlan at  Mon Oct  1 21:37:24 2012
From: ncoghlan at (Nick Coghlan)
Date: Tue, 2 Oct 2012 01:07:24 +0530
Subject: [Python-ideas] Visually confusable unicode characters in
In-Reply-To: <>
References: <>
	<> <k4ch7q$8rv$>
Message-ID: <>

On Tue, Oct 2, 2012 at 12:59 AM, Massimo DiPierro
<massimo.dipierro at> wrote:
> Hello Guido,
> it does make sense. The only point I tried to make is that, because something is allowed, it does not mean it should be encouraged.
> I am sure there are instructors who want to teach to code using Japanese or Chinese variable names. Python gives them a way to do so.
> Yet, if they do so, they would be isolating their students and their code from the rest of the world.

Only if they *stop* there. The idea is just to allow the learning
curve to be made gentler - as people learn the standard library and
the tools on PyPI, then yes, it will still be necessary to continue
learning English in order to make use of those tools (especially as
many of them won't have translated documentation).


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From g.brandl at  Mon Oct  1 22:03:21 2012
From: g.brandl at (Georg Brandl)
Date: Mon, 01 Oct 2012 22:03:21 +0200
Subject: [Python-ideas] Visually confusable unicode characters in
In-Reply-To: <>
References: <>
	<> <k4ch7q$8rv$>
	<> <k4cl1q$dd1$>
Message-ID: <k4csu8$o7s$>

On 10/01/2012 09:33 PM, Mathias Panzenböck wrote:
> On 10/01/2012 07:48 PM, Georg Brandl wrote:
>> On 10/01/2012 07:02 PM, Mathias Panzenböck wrote:
>>> On 10/01/2012 06:43 PM, Robert Kern wrote:
>>>> On 10/1/12 5:07 PM, Mathias Panzenböck wrote:
>>>>> I still don't understand why unicode characters are allowed at all in identifier
>>>>> names. Is the reason for this written down somewhere?
>>> But the Python keywords and more importantly the documentation is English. Don't you need to be able
>>> to speak/write English in order to code Python anyway? And if you keep your code+comments English you
>>> can access a much larger developer pool (all developers who speak English should by my hypothesis be
>>> a superset of all developers who speak a certain language).
>> Please; the PEP has been discussed quite a lot when it was proposed,
>> and believe me, yours is not an unfamiliar argument :)  You're about
>> 5 years late.
>> Georg
> I didn't want to start a discussion. I just wanted to know why one would implement such a language
> feature.

Well, in that case I would have said "read the PEP": I think it's well
explained there.


From g.brandl at  Mon Oct  1 22:04:51 2012
From: g.brandl at (Georg Brandl)
Date: Mon, 01 Oct 2012 22:04:51 +0200
Subject: [Python-ideas] Visually confusable unicode characters in
In-Reply-To: <>
References: <>
	<> <k4ch7q$8rv$>
Message-ID: <k4ct11$ogm$>

On 10/01/2012 08:10 PM, Guido van Rossum wrote:
> On Mon, Oct 1, 2012 at 11:04 AM, Antoine Pitrou <solipsis at> wrote:
>> On Mon, 1 Oct 2012 10:44:42 -0700
>> Guido van Rossum <guido at> wrote:
>>> As an example in German, I found a Python book aimed at middle- and
>>> high-schoolers written in German, Python für Kids. You can look inside
>>> it on the Amazon website:
>> Oh but why isn't it named Python für Kinder? :-)
> Probably to be "cool" for the "kids". Why is a mobile phone in Germany
> called a "Handy" ?

And why, oh why, do we have to buy our bread rolls at a "Backshop" nowadays...


From oscar.j.benjamin at  Mon Oct  1 22:26:07 2012
From: oscar.j.benjamin at (Oscar Benjamin)
Date: Mon, 1 Oct 2012 21:26:07 +0100
Subject: [Python-ideas] Visually confusable unicode characters in
In-Reply-To: <>
References: <>
	<> <k4ch7q$8rv$>
	<> <k4cl1q$dd1$>
Message-ID: <>

On 1 October 2012 20:33, Mathias Panzenböck
<grosser.meister.morti at> wrote:
> On 10/01/2012 07:48 PM, Georg Brandl wrote:
>> On 10/01/2012 07:02 PM, Mathias Panzenböck wrote:
>>> On 10/01/2012 06:43 PM, Robert Kern wrote:
>>>> On 10/1/12 5:07 PM, Mathias Panzenböck wrote:
>>>>> I still don't understand why unicode characters are allowed at all in identifier
>>>>> names. Is the reason for this written down somewhere?
>>> But the Python keywords and more importantly the documentation is English. Don't you need to be able
>>> to speak/write English in order to code Python anyway? And if you keep your code+comments English you
>>> can access a much larger developer pool (all developers who speak English should by my hypothesis be
>>> a superset of all developers who speak a certain language).
>> Please; the PEP has been discussed quite a lot when it was proposed,
>> and believe me, yours is not an unfamiliar argument :)  You're about
>> 5 years late.
>> Georg
> I didn't want to start a discussion. I just wanted to know why one would implement such a language feature. Guido's answer cleared it up for me, thanks. I can see the purpose in an educational setting (not in production code of anything a little bit bigger).

Non-ascii identifiers have other possible uses. I'll repost the case
that started this discussion on python-tutor (attached in case it
doesn't display):

#!/usr/bin/env python3
# -*- encoding: utf-8 -*-

# Parameters
α = 1
β = 0.1
γ = 1.5
δ = 0.075

# Initial conditions
xₒ = 10
yₒ = 5
Zₒ = xₒ, yₒ

# Solution parameters
tₒ = 0
Δt = 0.001
T = 10

# Lotka-Volterra derivative
def f(Z, t):
    x, y = Z
    ẋ = x * (α - β*y)
    ẏ = -y * (γ - δ*x)
    return ẋ, ẏ

# Accumulate results from Euler stepper
tₙ = tₒ
Zₙ = Zₒ
Zₜ, t = [], []
while tₙ <= tₒ + T:
    Zₜ.append(Zₙ)
    t.append(tₙ)
    Zₙ = [Zₙᵢ + Δt*Żₙᵢ for Zₙᵢ, Żₙᵢ in zip(Zₙ, f(Zₙ, tₙ))]
    tₙ += Δt

# Output since I don't have plotting libraries in Python 3
print('t', 'x', 'y')
for tₙ, (xₙ, yₙ) in zip(t, Zₜ):
    print(tₙ, xₙ, yₙ)


From dholth at  Mon Oct  1 22:44:21 2012
From: dholth at (Daniel Holth)
Date: Mon, 1 Oct 2012 16:44:21 -0400
Subject: [Python-ideas] use multiprocessing in included compileall script
Message-ID: <>

As an option, compileall should use a multiprocessing Pool() to speed
up its work.
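
A minimal sketch of what such an option might look like, assuming nothing about how compileall is implemented internally: walk a tree, hand each source file to a worker in a multiprocessing Pool, and byte-compile it there with py_compile. The helper names (find_sources, compile_one, compile_tree) are hypothetical and are not part of compileall.

```python
import multiprocessing
import os
import py_compile


def find_sources(root):
    """Yield the path of every .py file below root (hypothetical helper)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".py"):
                yield os.path.join(dirpath, name)


def compile_one(path):
    """Byte-compile a single file and report whether it succeeded."""
    try:
        py_compile.compile(path, doraise=True)
        return path, True
    except (py_compile.PyCompileError, OSError):
        return path, False


def compile_tree(root, processes=None):
    """Byte-compile a tree in parallel and return the paths that failed."""
    # processes=None lets Pool pick one worker per available CPU.
    with multiprocessing.Pool(processes) as pool:
        results = pool.map(compile_one, find_sources(root))
    return [path for path, ok in results if not ok]
```

On platforms where worker processes are spawned rather than forked, compile_tree() has to be called from under an `if __name__ == "__main__":` guard. Whether the speedup justifies the process start-up cost would need measuring on a real tree.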

From guido at  Mon Oct  1 22:51:34 2012
From: guido at (Guido van Rossum)
Date: Mon, 1 Oct 2012 13:51:34 -0700
Subject: [Python-ideas] Visually confusable unicode characters in
In-Reply-To: <>
References: <>
	<> <k4ch7q$8rv$>
	<k4cl1q$dd1$> <>
Message-ID: <>

On Mon, Oct 1, 2012 at 1:26 PM, Oscar Benjamin
<oscar.j.benjamin at> wrote:
> On 1 October 2012 20:33, Mathias Panzenböck
> <grosser.meister.morti at> wrote:
>> On 10/01/2012 07:48 PM, Georg Brandl wrote:
>>> On 10/01/2012 07:02 PM, Mathias Panzenböck wrote:
>>>> On 10/01/2012 06:43 PM, Robert Kern wrote:
>>>>> On 10/1/12 5:07 PM, Mathias Panzenböck wrote:
>>>>>> I still don't understand why unicode characters are allowed at all in identifier
>>>>>> names. Is the reason for this written down somewhere?
>>>> But the Python keywords and more importantly the documentation is English. Don't you need to be able
>>>> to speak/write English in order to code Python anyway? And if you keep your code+comments English you
>>>> can access a much larger developer pool (all developers who speak English should by my hypothesis be
>>>> a superset of all developers who speak a certain language).
>>> Please; the PEP has been discussed quite a lot when it was proposed,
>>> and believe me, yours is not an unfamiliar argument :)  You're about
>>> 5 years late.
>>> Georg
>> I didn't want to start a discussion. I just wanted to know why one would implement such a language feature. Guido's answer cleared it up for me, thanks. I can see the purpose in an educational setting (not in production code of anything a little bit bigger).
> Non-ascii identifiers have other possible uses. I'll repost the case
> that started this discussion on python-tutor (attached in case it
> doesn't display):
> '''
> #!/usr/bin/env python3
> # -*- encoding: utf-8 -*-
> # Parameters
> α = 1
> β = 0.1
> γ = 1.5
> δ = 0.075
> # Initial conditions
> xₒ = 10
> yₒ = 5
> Zₒ = xₒ, yₒ
> # Solution parameters
> tₒ = 0
> Δt = 0.001
> T = 10
> # Lotka-Volterra derivative
> def f(Z, t):
>     x, y = Z
>     ẋ = x * (α - β*y)
>     ẏ = -y * (γ - δ*x)
>     return ẋ, ẏ
> # Accumulate results from Euler stepper
> tₙ = tₒ
> Zₙ = Zₒ
> Zₜ, t = [], []
> while tₙ <= tₒ + T:
>     Zₜ.append(Zₙ)
>     t.append(tₙ)
>     Zₙ = [Zₙᵢ + Δt*Żₙᵢ for Zₙᵢ, Żₙᵢ in zip(Zₙ, f(Zₙ, tₙ))]
>     tₙ += Δt
> # Output since I don't have plotting libraries in Python 3
> print('t', 'x', 'y')
> for tₙ, (xₙ, yₙ) in zip(t, Zₜ):
>     print(tₙ, xₙ, yₙ)
> '''

Those examples would be a lot more compelling if there was an
acceptable way to input those characters. Maybe we could support some
kind of input method that enabled LaTeX style math notation as used by
scientists for writing equations in papers?

--Guido van Rossum (

From solipsis at  Mon Oct  1 22:51:23 2012
From: solipsis at (Antoine Pitrou)
Date: Mon, 1 Oct 2012 22:51:23 +0200
Subject: [Python-ideas] use multiprocessing in included compileall script
References: <>
Message-ID: <>

Hello Daniel,

On Mon, 1 Oct 2012 16:44:21 -0400
Daniel Holth <dholth at> wrote:
> As an option, compileall should use a multiprocessing Pool() to speed
> up its work.

This kind of concrete proposal can be brought directly on the bug
tracker, no need to go through python-ideas.



Software development and contracting:

From dholth at  Mon Oct  1 22:54:50 2012
From: dholth at (Daniel Holth)
Date: Mon, 1 Oct 2012 16:54:50 -0400
Subject: [Python-ideas] use multiprocessing in included compileall script
In-Reply-To: <>
References: <>
Message-ID: <>


From andre.roberge at  Mon Oct  1 22:55:30 2012
From: andre.roberge at (Andre Roberge)
Date: Mon, 1 Oct 2012 17:55:30 -0300
Subject: [Python-ideas] Visually confusable unicode characters in
In-Reply-To: <>
References: <>
	<> <k4ch7q$8rv$>
	<> <k4cl1q$dd1$>
Message-ID: <>

On Mon, Oct 1, 2012 at 5:51 PM, Guido van Rossum <guido at> wrote:

> On Mon, Oct 1, 2012 at 1:26 PM, Oscar Benjamin

> > Non-ascii identifiers have other possible uses. I'll repost the case
> > that started this discussion on python-tutor (attached in case it
> > doesn't display):
> >
> > '''
> > #!/usr/bin/env python3
> > # -*- encoding: utf-8 -*-
> >
> > # Parameters
> > α = 1
> > β = 0.1
> > γ = 1.5
> > δ = 0.075
> >
> > # Initial conditions
> > xₒ = 10
> > yₒ = 5
> > Zₒ = xₒ, yₒ
> >

> Those examples would be a lot more compelling if there was an
> acceptable way to input those characters. Maybe we could support some
> kind of input method that enabled LaTeX style math notation as used by
> scientists for writing equations in papers?

André Roberge

> --
> --Guido van Rossum (

From oscar.j.benjamin at  Mon Oct  1 23:46:50 2012
From: oscar.j.benjamin at (Oscar Benjamin)
Date: Mon, 1 Oct 2012 22:46:50 +0100
Subject: [Python-ideas] Visually confusable unicode characters in
In-Reply-To: <>
References: <>
	<> <k4ch7q$8rv$>
	<> <k4cl1q$dd1$>
Message-ID: <>

On 1 October 2012 21:51, Guido van Rossum <guido at> wrote:
> On Mon, Oct 1, 2012 at 1:26 PM, Oscar Benjamin
> <oscar.j.benjamin at> wrote:
>> # Parameters
>> α = 1
>> β = 0.1
>> γ = 1.5
>> δ = 0.075
>> # Initial conditions
>> xₒ = 10
>> yₒ = 5
>> Zₒ = xₒ, yₒ
> Those examples would be a lot more compelling if there was an
> acceptable way to input those characters. Maybe we could support some
> kind of input method that enabled LaTeX style math notation as used by
> scientists for writing equations in papers?

Sympy already has a few of the basic TeX concepts. I imagine that something
like Sympy notebooks (a browser-based interface) might one day gain support
for this. A readline-ish method to do it would be a great extension to
isympy (since it already works for output):

$ isympy
IPython console for SymPy 0.7.1.rc1 (Python 2.7.3-64-bit) (ground types:

In [1]: Symbol('beta')
Out[1]: β

In [2]: Symbol('c_1')
Out[2]: c₁


From g.brandl at  Mon Oct  1 23:54:04 2012
From: g.brandl at (Georg Brandl)
Date: Mon, 01 Oct 2012 23:54:04 +0200
Subject: [Python-ideas] Visually confusable unicode characters in
In-Reply-To: <>
References: <>
	<> <k4ch7q$8rv$>
	<> <k4cl1q$dd1$>
Message-ID: <k4d3dr$j38$>

On 10/01/2012 10:51 PM, Guido van Rossum wrote:
> On Mon, Oct 1, 2012 at 1:26 PM, Oscar Benjamin
>> Non-ascii identifiers have other possible uses. I'll repost the case
>> that started this discussion on python-tutor (attached in case it
>> doesn't display):

Very nice!

>> '''
>> #!/usr/bin/env python3
>> # -*- encoding: utf-8 -*-
>> # Parameters
>> α = 1
>> β = 0.1
>> γ = 1.5
>> δ = 0.075
>> # Initial conditions
>> xₒ = 10
>> yₒ = 5
>> Zₒ = xₒ, yₒ
>> # Solution parameters
>> tₒ = 0
>> Δt = 0.001
>> T = 10
>> # Lotka-Volterra derivative
>> def f(Z, t):
>>     x, y = Z
>>     ẋ = x * (α - β*y)
>>     ẏ = -y * (γ - δ*x)
>>     return ẋ, ẏ
>> # Accumulate results from Euler stepper
>> tₙ = tₒ
>> Zₙ = Zₒ
>> Zₜ, t = [], []
>> while tₙ <= tₒ + T:
>>     Zₜ.append(Zₙ)
>>     t.append(tₙ)
>>     Zₙ = [Zₙᵢ + Δt*Żₙᵢ for Zₙᵢ, Żₙᵢ in zip(Zₙ, f(Zₙ, tₙ))]
>>     tₙ += Δt
>> # Output since I don't have plotting libraries in Python 3
>> print('t', 'x', 'y')
>> for tₙ, (xₙ, yₙ) in zip(t, Zₜ):
>>     print(tₙ, xₙ, yₙ)
>> '''
> Those examples would be a lot more compelling if there was an
> acceptable way to input those characters. Maybe we could support some
> kind of input method that enabled LaTeX style math notation as used by
> scientists for writing equations in papers?

With the right editor, of course, it's not a problem :)

(Emacs has a TeX input method with which I could type this example without


From matthew at  Tue Oct  2 00:28:09 2012
From: matthew at (Matthew Woodcraft)
Date: Mon, 01 Oct 2012 23:28:09 +0100
Subject: [Python-ideas] Visually confusable unicode characters in
In-Reply-To: <>
References: <>
	<> <k4ch7q$8rv$>
	<> <k4cl1q$dd1$>
Message-ID: <k4d5do$5if$>

On 2012-10-01 21:51, Guido van Rossum wrote:
> Those examples would be a lot more compelling if there was an
> acceptable way to input those characters. Maybe we could support some
> kind of input method that enabled LaTeX style math notation as used by
> scientists for writing equations in papers?

I think that's up to the OS or the text editor.

In Emacs, this works:
M-x set-input-method tex


From greg.ewing at  Tue Oct  2 01:07:05 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 02 Oct 2012 12:07:05 +1300
Subject: [Python-ideas] Namespaces and modules [was Deprecate the round
In-Reply-To: <>
References: <>
Message-ID: <>

Mark Adam wrote:
> It's not like you can do "import __builtins__" in the interpreter,

But you *can* do "import __builtin__".

Also, "sys" is created at interpreter startup and doesn't correspond
to any disk file, but we don't seem to mind calling it a module and
using the same import syntax to access it.
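
A short sketch of both points, assuming Python 3, where the module is spelled builtins rather than __builtin__, and assuming CPython, where sys is compiled into the interpreter:

```python
import builtins
import sys

# The builtin namespace is importable like any other module.
same_len = builtins.len is len

# sys is listed among the modules compiled into the interpreter...
sys_is_builtin = "sys" in sys.builtin_module_names

# ...and, unlike a module loaded from disk, it has no __file__ attribute.
sys_has_file = hasattr(sys, "__file__")

print(same_len, sys_is_builtin, sys_has_file)
```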

The only difference I can see with these proposed namespace things is
that they would be pre-bound to names in the builtin namespace.

> But, it's as if you're
> going on a road trip, you want to travel efficient and light -- what
> you include in your backpack ("interpreter environment") is your
> "builtin" and everything else you'll "buy"/import on the road.
> Modules are those things on the road.

The sys module violates this taxonomy -- it's already in your
backpack, just tucked away in a paper bag that you need to open


From greg.ewing at  Tue Oct  2 01:24:27 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 02 Oct 2012 12:24:27 +1300
Subject: [Python-ideas] Visually confusable unicode characters in
In-Reply-To: <>
References: <>
	<> <k4ch7q$8rv$>
Message-ID: <>

Antoine Pitrou wrote:

> Oh but why isn't it named Python für Kinder? :-)

It looks like Germans have adopted "kid" as an abbreviation
for "kinder", just like we use it as an abbreviation for
"child". Or maybe we got it from them -- it's closer to
their original word than ours!

They seem to be using our plural, though -- "kids", not


From grosser.meister.morti at  Tue Oct  2 02:06:35 2012
From: grosser.meister.morti at (=?UTF-8?B?TWF0aGlhcyBQYW56ZW5iw7Zjaw==?=)
Date: Tue, 02 Oct 2012 02:06:35 +0200
Subject: [Python-ideas] Visually confusable unicode characters in
In-Reply-To: <>
References: <>
	<> <k4ch7q$8rv$>
Message-ID: <>

On 10/02/2012 01:24 AM, Greg Ewing wrote:
> Antoine Pitrou wrote:
>> Oh but why isn't it named Python für Kinder? :-)
> It looks like Germans have adopted "kid" as an abbreviation
> for "kinder", just like we use it as an abbreviation for
> "child". Or maybe we got it from them -- it's closer to
> their original word than ours!
> They seem to be using our plural, though -- "kids", not
> "kidden"...

Sometimes we use the ...s for plural as well, especially for acronyms, words of English or French 
origin and last names. But it would not be ...en, maybe Is there any German word that uses 
...en for plural? I don't think so. Anyway, "kids" is definitely an anglicism, because we pronounce 
it "English" and not like it would be pronounced if it were derived from "Kind" (it would be more 
like "keed"). German today is full of anglicisms.

But then, there are some German words used by English people as well: gesundheit, kindergarten,
über, blitz(krieg), angst (used to mean something different from the German word), abseiling
("abseilen" in German), doppelgänger, gestalt, poltergeist, Zeitgeist...

From steve at  Tue Oct  2 02:15:18 2012
From: steve at (Steven D'Aprano)
Date: Tue, 02 Oct 2012 10:15:18 +1000
Subject: [Python-ideas] use multiprocessing in included compileall script
In-Reply-To: <>
References: <>
Message-ID: <>

On 02/10/12 06:44, Daniel Holth wrote:
> As an option, compileall should use a multiprocessing Pool() to speed
> up its work.

Sounds like overkill.

In my experience, very few ideas are so self-evident that they don't
need any explanation, and this is certainly not one of them. What is
your rationale for why compileall should use multiprocessing?


From steve at  Tue Oct  2 03:32:06 2012
From: steve at (Steven D'Aprano)
Date: Tue, 02 Oct 2012 11:32:06 +1000
Subject: [Python-ideas] Visually confusable unicode characters in
In-Reply-To: <>
References: <>
	<> <k4ch7q$8rv$>
Message-ID: <>

On 02/10/12 05:29, Massimo DiPierro wrote:

> it does make sense. The only point I tried to make is that,
> because something is allowed, it does not mean it should be
> encouraged. I am sure there are instructors who want to teach
> to code using Japanese or Chinese variable names. Python gives
> them a way to do so. Yet, if they do so, they would be
> isolating their students and their code from the rest of the
> world.

People very often over-estimate the cost of that isolation, and
over-value access to the rest of the world.

The average open source piece of software has one, maybe two,
contributors. What do they care if millions of English-speaking
programmers can't contribute when they weren't going to contribute
regardless of the language? Perhaps the convenience of being able
to read your own code in your own native language outweighs the
loss of being able to attract contributors that you can't even
talk to.

And for proprietary software, again it is irrelevant. If a Chinese
company writes Chinese software for Chinese users with Chinese
developers, why would they want to write it in English? Perhaps
they have little choice due to the overwhelming trend towards English
in programming languages, but there's no positive benefit to using
a non-native language.

Quite frankly, and I'm saying this as somebody who only speaks
English, I think that the use of English as the single lingua franca
of computer programming is as unnecessary (and ultimately as harmful)
as the use of Latin and then French as the sole lingua franca of
science and mathematics. I expect that it too will be a passing phase.

By the way, are you familiar with ChinesePython and IronPerunis?


From stephen at  Tue Oct  2 05:48:07 2012
From: stephen at (Stephen J. Turnbull)
Date: Tue, 02 Oct 2012 12:48:07 +0900
Subject: [Python-ideas] Visually confusable unicode characters
	in	identifiers
In-Reply-To: <>
References: <>
Message-ID: <>

Mathias Panzenböck writes:

 > I still don't understand why unicode characters are allowed at all
 > in identifier names.

"Consenting adults."  'nuff said?

An anecdote.  Back when I was first learning Japanese, I maintained an
Emacs interface to EDICT, a free Japanese-English dictionary.  The
code was smart enough to parse morphosyntax (inflection of verbs and
adjectives) into dictionary forms, but I wasn't (and according to my
daughter, still am not<wink/>).  So I asked my tutor for help.

Although a total non-programmer, he was able to read the grammar
easily because the state names (identifiers for callable objects) were
written in Japanese, using the standard grammatical name for the
inflection.  The "easy" part comes in because although his English was
good, it wasn't good enough to disentangle Lisp gobbledygook from the
morphosyntax data had it been written in ASCII.  But he was able to
read and comment on the whole grammar in about half an hour because he
could just skip *all* the ASCII!

From stephen at  Tue Oct  2 06:11:58 2012
From: stephen at (Stephen J. Turnbull)
Date: Tue, 02 Oct 2012 13:11:58 +0900
Subject: [Python-ideas] Visually confusable unicode characters
	in	identifiers
In-Reply-To: <>
References: <>
	<> <k4ch7q$8rv$>
	<> <k4cl1q$dd1$>
Message-ID: <>

Guido van Rossum writes:

 > Those examples would be a lot more compelling if there was an
 > acceptable way to input those characters.

Hey!!  What's "unacceptable" about Emacs??<duck/>

 > Maybe we could support some kind of input method that enabled LaTeX
 > style math notation as used by scientists for writing equations in
 > papers?

If you're talking about interactive use, Emacs has a method based on
searching the Unicode character database.

LaTeX math notation has a number of potential pitfalls.  In
particular, the sub-/superscript notation can be applied to anything,
not just characters that happen to have *script versions in Unicode.
Also, not everything that seems to be a character in LaTeX necessarily
has a corresponding Unicode character.
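
A small sketch of the name-based lookup idea, using the standard unicodedata module. The helper by_name is hypothetical, and the character names are the official Unicode names. A name with no matching character comes back as None here, which is what a LaTeX-style subscript request would hit for any letter that Unicode does not provide in subscript form.

```python
import unicodedata


def by_name(name):
    """Return the character with this Unicode name, or None if there is none."""
    try:
        return unicodedata.lookup(name)
    except KeyError:
        return None


beta = by_name("GREEK SMALL LETTER BETA")
sub_one = by_name("SUBSCRIPT ONE")
sub_i = by_name("LATIN SUBSCRIPT SMALL LETTER I")
missing = by_name("NO SUCH CHARACTER NAME")

print(beta, sub_one, sub_i, missing)
```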

From ben+python at  Tue Oct  2 06:25:40 2012
From: ben+python at (Ben Finney)
Date: Tue, 02 Oct 2012 14:25:40 +1000
Subject: [Python-ideas] Visually confusable unicode characters in
References: <>
	<> <k4ch7q$8rv$>
	<> <k4cl1q$dd1$>
Message-ID: <>

Matthew Woodcraft <matthew at>

> On 2012-10-01 21:51, Guido van Rossum wrote:
> > Those examples would be a lot more compelling if there was an
> > acceptable way to input those characters. Maybe we could support
> > some kind of input method that enabled LaTeX style math notation as
> > used by scientists for writing equations in papers?
> I think that's up to the OS or the text editor.

Agreed. Many of these identifiers will need to be typed at an OS command
line, after all (e.g. for naming a test case to run, as one which
springs easily to mind).

Solve the keyboard input problem in the OS layer -- as someone who
anticipates working with non-ASCII characters must already do -- and you
solve it for Python code as well. I don't think it's Python's business
to get involved at the input method level.

 \       “The apparent lesson of the Inquisition is that insistence on |
  `\         uniformity of belief is fatal to intellectual, moral, and |
_o__)   spiritual health.” --_The Uses Of The Past_, Herbert J. Muller |
Ben Finney

From greg.ewing at  Tue Oct  2 07:09:14 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 02 Oct 2012 18:09:14 +1300
Subject: [Python-ideas] Visually confusable unicode characters in
In-Reply-To: <>
References: <>
	<> <k4ch7q$8rv$>
	<> <>
Message-ID: <>

Mathias Panzenböck wrote:
> But it would not be 
> ...en, maybe Is there any German word that uses ...en for plural? 
> I don't think so.

This page seems to think that some do:


From stephen at  Tue Oct  2 10:04:55 2012
From: stephen at (Stephen J. Turnbull)
Date: Tue, 02 Oct 2012 17:04:55 +0900
Subject: [Python-ideas] Visually confusable unicode characters
	in	identifiers
In-Reply-To: <>
References: <>
	<> <k4ch7q$8rv$>
	<> <k4cl1q$dd1$>
	<k4d5do$5if$> <>
Message-ID: <>

Ben Finney writes:

 > Solve the keyboard input problem in the OS layer -- as someone who
 > anticipates working with non-ASCII characters must already do -- and you
 > solve it for Python code as well.

That simply isn't true for symbol characters and Greek letters.  I
still let either TeX or XEmacs translate TeX macros for me.  I don't
even know how to type an integral sign in Mac OS X Terminal
(conveniently, that is -- of course there's always the character
palette), and if I wanted directed quotation marks (I don't), I'd just
use ASCII quotes and let XEmacs translate those, too.

There ought to be a standard way to get those symbols and punctuation,
preferably ASCII-based, on any terminal, using the standard Python

From storchaka at  Tue Oct  2 12:43:07 2012
From: storchaka at (Serhiy Storchaka)
Date: Tue, 02 Oct 2012 13:43:07 +0300
Subject: [Python-ideas] Visually confusable unicode characters in
In-Reply-To: <>
References: <>
	<> <k4ch7q$8rv$>
	<> <k4cl1q$dd1$>
Message-ID: <k4egfv$6i0$>

On 01.10.12 23:51, Guido van Rossum wrote:
> Those examples would be a lot more compelling if there was an
> acceptable way to input those characters. Maybe we could support some
> kind of input method that enabled LaTeX style math notation as used by
> scientists for writing equations in papers?


Java already allows this outside of the string literals. And it 
sometimes causes unpleasant effects.

From ben+python at  Tue Oct  2 13:39:12 2012
From: ben+python at (Ben Finney)
Date: Tue, 02 Oct 2012 21:39:12 +1000
Subject: [Python-ideas] Visually confusable unicode characters
	in	identifiers
References: <>
	<> <k4ch7q$8rv$>
	<> <k4cl1q$dd1$>
	<k4d5do$5if$> <>
Message-ID: <>

"Stephen J. Turnbull" <stephen at>

> I still let either TeX or XEmacs translate TeX macros for me. I don't
> even know how to type an integral sign in Mac OS X Terminal
> (conveniently, that is -- of course there's always the character
> palette), and if I wanted directed quotation marks (I don't), I'd just
> use ASCII quotes and let XEmacs translate those, too.

Right. So you've solved it for one program only, not the OS which is (or
should be) responsible for turning what you type into characters,
uniformly across all applications you have keyboard input for.

> There ought to be a standard way to get those symbols and punctuation,
> preferably ASCII-based, on any terminal

Definitely agreed with this. Indeed, it's my point: the problem should
be solved in one place for the user of the computer, not separately per
application or framework.

> using the standard Python interpreter.

If you mean that the Python interpreter should be aware of the solution,
why? That's solving it at the wrong level, because any non-Python
program (such as a shell or an editor) gets no benefit from that.

If you mean that the single, one-point solution should work across all
programs, including the standard Python interpreter, then yes I agree.

I'm saying the OS is the right place to solve it, by installing an
appropriate input method (or whatever each OS calls them).

 \         “In economics, hope and faith coexist with great scientific |
  `\     pretension and also a deep desire for respectability.” --John |
_o__)                                    Kenneth Galbraith, 1970-06-07 |
Ben Finney

From stephen at  Wed Oct  3 07:31:46 2012
From: stephen at (Stephen J. Turnbull)
Date: Wed, 03 Oct 2012 14:31:46 +0900
Subject: [Python-ideas] Visually confusable unicode
	characters	in	identifiers
In-Reply-To: <>
References: <>
	<> <k4ch7q$8rv$>
	<> <k4cl1q$dd1$>
	<k4d5do$5if$> <>
Message-ID: <>

Ben Finney writes:

 > "Stephen J. Turnbull" <stephen at>
 > writes:
 > > I still let either TeX or XEmacs translate TeX macros for me. I don't
 > > even know how to type an integral sign in Mac OS X Terminal
 > > (conveniently, that is -- of course there's always the character
 > > palette), and if I wanted directed quotation marks (I don't), I'd just
 > > use ASCII quotes and let XEmacs translate those, too.
 > Right. So you've solved it for one program only, not the OS

You seem to be under a misconception.  Emacs *is* an OS, it just runs
on top of the more primitive OSes normally associated with the term. ;-)

 > I'm saying the OS is the right place to solve it, by installing an
 > appropriate input method (or whatever each OS calls them).

I doubt very many people used to and fond of LaTeX would agree with
you, since AFAIK there aren't any OSes providing TeX macros as an
input method.  AFAICS it's not available on my Mac.

While I don't particularly favor it, it may be the best compromise, as
many people are familiar with it, and many many symbols are available
with familiar, intuitive names so that non-TeXnical typists can often
guess them.
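
To make the idea concrete, here is a rough sketch of a TeX-macro translation step of the kind an input method might perform. The table is a tiny hand-picked sample and expand_tex is a hypothetical name. Nothing here is an existing Python or OS feature.

```python
import re

# Illustrative subset only; a real table would cover far more macros.
TEX_TO_UNICODE = {
    "alpha": "α",
    "beta": "β",
    "gamma": "γ",
    "delta": "δ",
    "Delta": "Δ",
    "int": "∫",
}

_MACRO = re.compile(r"\\([A-Za-z]+)")


def expand_tex(text):
    """Replace known \\name macros with Unicode; leave unknown ones untouched."""
    def replace(match):
        return TEX_TO_UNICODE.get(match.group(1), match.group(0))
    return _MACRO.sub(replace, text)


print(expand_tex(r"x * (\alpha - \beta*y)"))
```

Unknown macros pass through unchanged, so a typo stays visible instead of silently disappearing.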

From bborcic at  Wed Oct  3 14:52:43 2012
From: bborcic at (Boris Borcic)
Date: Wed, 03 Oct 2012 14:52:43 +0200
Subject: [Python-ideas] Deprecate the round builtin
In-Reply-To: <>
References: <>
Message-ID: <k4hceq$5nn$>

Mike Graham wrote:
> round(x, n) for n>0 is quite simply not sane code.

I've occasionally used round(x,n) with n>0 - as a quick way to normalize away 
numeric imprecisions and have values generated by a computation recognized as 
identical set elements or dictionary keys. I'd have used a function to round in 
binary instead of decimal had one been handy, but otoh I don't see it would make 
a real difference, would it?
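
A minimal sketch of that use, assuming 9 decimal places is tight enough for the computation at hand (the helper name key is made up for illustration):

```python
def key(x, ndigits=9):
    """Normalize a float so that near-identical results compare equal."""
    return round(x, ndigits)


a = 0.1 + 0.2   # differs from 0.3 in the last bit
b = 0.3

distinct_raw = {a, b}
distinct_rounded = {key(a), key(b)}

print(len(distinct_raw), len(distinct_rounded))
```

Two values that straddle a rounding boundary still end up as different keys, so this is a convenience, not a general tolerance comparison.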

From chrysn at  Wed Oct  3 16:43:20 2012
From: chrysn at (chrysn)
Date: Wed, 3 Oct 2012 16:43:20 +0200
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Sep 26, 2012 at 10:02:24AM -0700, Josiah Carlson wrote:
> Go ahead and read PEP 3153, we will wait.
> A careful reading of PEP 3153 will tell you that the intent is to make
> a "light" version of Twisted built into Python. There isn't any
> discussion as to *why* this is a good idea, it just lays out the plan
> of action. Its ideas were gathered from the experience of the Twisted
> folks.
> Their experience is substantial, but in the intervening 1.5+ years
> since Pycon 2011, only the barest of abstract interfaces has been
> defined (,
> and no discussion has taken place as to forward migration of the
> (fairly large) body of existing asyncore code.

it doesn't look like twisted-light to me, more like an interface
suggestion for a small subset of twisted. in particular, it doesn't talk
about main loops / reactors / registration-in-the-first-place.

you mention interaction with the twisted people. is there willingness,
from the twisted side, to use a standard python middle layer, once it
exists and has sufficiently high quality?

> To the point, Giampaolo already has a reactor that implements the
> interface (more or less "idea #3" from his earlier message), and it's
> been used in production (under staggering ftp(s) load). Even better,
> it offers effectively transparent replacement of the existing asyncore
> loop, and supports existing asyncore-derived classes. It is available:

i've had a look at it, but honestly can't say more than that it's good
to have a well-tested asyncore compatible main loop with scheduling
support, and i'll try it out for my own projects.
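
For readers unfamiliar with the term, a toy sketch of what "a main loop with scheduling support" means: callbacks for readable sockets plus callbacks that fire after a delay, driven by one select() call. This is not Giampaolo's code and not the asyncore API. MiniLoop and its method names are invented for illustration.

```python
import heapq
import itertools
import select
import socket
import time


class MiniLoop:
    """Toy event loop: readable-socket callbacks plus delayed calls."""

    def __init__(self):
        self._readers = {}             # socket -> callback
        self._timers = []              # heap of (deadline, seq, callback)
        self._seq = itertools.count()  # tie-breaker for equal deadlines

    def add_reader(self, sock, callback):
        self._readers[sock] = callback

    def remove_reader(self, sock):
        self._readers.pop(sock, None)

    def call_later(self, delay, callback):
        deadline = time.monotonic() + delay
        heapq.heappush(self._timers, (deadline, next(self._seq), callback))

    def run(self):
        # Keep going until there is nothing left to wait for.
        while self._readers or self._timers:
            timeout = None
            if self._timers:
                timeout = max(0.0, self._timers[0][0] - time.monotonic())
            if self._readers:
                ready, _, _ = select.select(list(self._readers), [], [], timeout)
            else:
                time.sleep(timeout)
                ready = []
            for sock in ready:
                callback = self._readers.get(sock)
                if callback is not None:
                    callback(sock)
            # Fire every timer whose deadline has passed.
            now = time.monotonic()
            while self._timers and self._timers[0][0] <= now:
                _, _, callback = heapq.heappop(self._timers)
                callback()


events = []
loop = MiniLoop()
a, b = socket.socketpair()


def on_readable(sock):
    events.append(sock.recv(1024))
    loop.remove_reader(sock)


loop.add_reader(b, on_readable)
loop.call_later(0.01, lambda: a.sendall(b"ping"))
loop.call_later(0.05, lambda: events.append("timeout"))
loop.run()
a.close()
b.close()
print(events)
```

Because the delayed calls run in the same loop as the I/O callbacks, things like connection timeouts need no extra threads.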

> >> Again, at this point in time what you're proposing looks too vague,
> >> ambitious and premature to me.
> >
> > please don't get me wrong -- i'm not proposing anything for immediate
> > action, i just want to start a thinking process towards a better
> > integrated stdlib.
> I am curious as to what you mean by "a better integrated stdlib". A
> new interface that doesn't allow people to easily migrate from an
> existing (and long-lived, though flawed) standard library is not
> better integration. Better integration requires allowing previous
> users to migrate, while encouraging new users to join in with any
> later development. That's what Giampaolo's suggested interface offers
> on the lowest level; something to handle file-handle reactors,
> combined with a scheduler.

a new interface won't make integration automatically happen, but it's
something the standard library components can evolve on. whether, for
example urllib2 will then automatically work asynchronously in that
framework or whether we'll wait for urllib3, we'll see when we have it.

@migrate from an existing standard library: is there a big user base for
the current asyncore framework? my impression from is that it is not
very well known among python users, and most that could use it use

> > we've talked about many things we'd need in a python asynchronous
> > interface (not implementation), so what are the things we *don't* need?
> > (so we won't start building a framework like twisted). i'll start:
> >
> > * high-level protocol handling (can be extra modules atop of it)
> > * ssl
> > * something like the twisted delayed framework (not sure about that, i
> >   guess the twisted people will have good reason to use it, but i don't
> >   see compelling reasons for such a thing in a minimal interface from my
> >   limited pov)
> > * explicit connection handling (retries, timeouts -- would be up to the
> >   user as well, eg urllib might want to set up a timeout and retries for
> >   asynchronous url requests)
> I disagree with the last 3. If you have an IO loop, more often than
> not you want an opportunity to do something later in the same context.
> This is commonly the case for bandwidth limiting, connection timeouts,
> etc., which are otherwise *very* difficult to do at a higher level
> (which are the reasons why schedulers are built into IO loops).
> Further, SSL in async can be tricky to get right. Having the 20-line
> SSL layer as an available class is a good idea, and will save people
> time by not having them re-invent it (poorly or incorrectly) every
> time.

i see; those should be provided, then.

i'm afraid i don't completely get the point you're making, sorry for
that, maybe i've missed important statements or lack sufficiently deep
knowledge of topics affected and got lost in details.

what is your opinion on the state of asynchronous operations in python,
and what would you like it to be?

thanks for staying with this topic

To use raw power is to make yourself infinitely vulnerable to greater powers.
  -- Bene Gesserit axiom

From maxmoroz at  Thu Oct  4 13:48:03 2012
From: maxmoroz at (Max Moroz)
Date: Thu, 4 Oct 2012 04:48:03 -0700
Subject: [Python-ideas] checking for identity before comparing built-in
Message-ID: <>

It seems that built-in classes do not short-circuit `__eq__` method
when the objects are identical, at least in CPython:

    f = frozenset(range(200000000))
    f1 = f
    f1 == f # this operation will take about 1 sec on my machine

Is there any disadvantage to checking whether the equality was called
with the same object, and if it was, return `True` right away? I
noticed this when trying to memoize a function that has large
frozenset arguments. While hashing of a large argument is very fast
after it's done once (hash value is presumably cached), the equality
comparison is always slow even against itself. So when the same large
argument is provided over and over, memoization is slow.

Of course, there's a workaround: subclass frozenset, and redefine
__eq__ to check id() first. And arguably, for this particular use
case, I should redefine both __hash__ and __eq__, to make them only
look exclusively at id(), since it's not worth wasting memoizer time
trying to compare two non-identical large arguments that are highly
unlikely to compare equal anyway. So if there's any reason for the
current implementation, I don't have a strong argument against it.
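
The workaround described above could look roughly like the sketch below.
The class name IdentityFirstFrozenSet is made up for this illustration;
it only adds an identity check in front of the normal comparison and
keeps frozenset's own hashing.

```python
class IdentityFirstFrozenSet(frozenset):
    """Hypothetical frozenset subclass that checks identity before comparing."""

    __slots__ = ()

    def __eq__(self, other):
        # Same object: skip the element-by-element comparison entirely.
        if self is other:
            return True
        return frozenset.__eq__(self, other)

    def __ne__(self, other):
        if self is other:
            return False
        return frozenset.__ne__(self, other)

    # In Python 3, defining __eq__ sets __hash__ to None unless it is
    # restored explicitly, so reuse frozenset's (cached) hash.
    __hash__ = frozenset.__hash__


big = IdentityFirstFrozenSet(range(100000))
alias = big
print(alias == big)                       # identity path, no element scan
print(big == frozenset(range(100000)))    # falls back to normal comparison
```

Instances remain usable as dict keys, so a memoizer keyed on such
arguments would take the identity path whenever the very same object is
passed again.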

From steve at  Thu Oct  4 15:53:50 2012
From: steve at (Steven D'Aprano)
Date: Thu, 04 Oct 2012 23:53:50 +1000
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <>

On 04/10/12 21:48, Max Moroz wrote:
> It seems that built-in classes do not short-circuit `__eq__` method
> when the objects are identical, at least in CPython:
>      f = frozenset(range(200000000))
>      f1 = f
>      f1 == f # this operation will take about 1 sec on my machine

You shouldn't over-generalize. Some built-ins do short-circuit __eq__
when the objects are identical. I believe that strings and ints both
do. Other types might not.

> Is there any disadvantage to checking whether the equality was called
> with the same object, and if it was, return `True` right away?

That would break floats and Decimals, both of which support NANs.

The decision whether or not to optimize __eq__ should be left up to the
type. Some types, for example, might decide to optimize x == x even if
x contains a NAN or other objects that break reflexivity of equality.
Other types might prefer not to.

(Please do not start an argument about NANs and reflexivity. That's
been argued to death, and there are very good reasons for the IEEE 754
standard to define NANs the way they do.)

Since frozensets containing NANs are rare (I presume), I think it is
reasonable to optimize frozenset equality. But I do not think it is
reasonable for Python to mandate identity checking before __eq__.

> I noticed this when trying to memoize a function that has large
> frozenset arguments. While hashing of a large argument is very fast
> after it's done once (hash value is presumably cached), the equality
> comparison is always slow even against itself. So when the same large
> argument is provided over and over, memoization is slow.

I'm not sure what you are doing here, because dicts (at least in Python
3.2) already short-circuit equality:

py> NAN = float('nan')
py> NAN == NAN
False
py> d = {NAN: 42}
py> d[NAN]
42

Actually, that behaviour goes back to at least 2.4, so I'm not sure how
you are doing memoization and not seeing the same optimization.
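
A minimal sketch of the dict behaviour shown above, assuming CPython.
The name other_nan is introduced here only to contrast the same key
object with a distinct NaN object.

```python
nan = float("nan")
other_nan = float("nan")   # a different object that is also a NaN

d = {nan: 42}

# NaN never compares equal to itself under IEEE 754 ...
assert nan != nan

# ... yet lookup with the *same object* succeeds, because the dict checks
# identity before it falls back to ==.
assert d[nan] == 42

# A distinct NaN object is neither identical nor equal, so it is not found.
assert other_nan not in d
```

So a memoizer that stores its arguments as dict keys already avoids the
full comparison when it is handed the identical object a second time.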


From grosser.meister.morti at  Thu Oct  4 16:02:29 2012
From: grosser.meister.morti at (=?ISO-8859-1?Q?Mathias_Panzenb=F6ck?=)
Date: Thu, 04 Oct 2012 16:02:29 +0200
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <>

On 10/04/2012 03:53 PM, Steven D'Aprano wrote:
> py> NAN == NAN
> False

Why isn't this True anyway? Is there a PEP that explains this (IMHO odd) behavior?

From mikegraham at  Thu Oct  4 16:07:36 2012
From: mikegraham at (Mike Graham)
Date: Thu, 4 Oct 2012 10:07:36 -0400
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Thu, Oct 4, 2012 at 10:02 AM, Mathias Panzenböck
<grosser.meister.morti at> wrote:
> On 10/04/2012 03:53 PM, Steven D'Aprano wrote:
>> py> NAN == NAN
>> False
> Why isn't this True anyway? Is there a PEP that explains this (IMHO odd)
> behavior?

IEEE 754 specifies this.


From python at  Thu Oct  4 16:19:44 2012
From: python at (MRAB)
Date: Thu, 04 Oct 2012 15:19:44 +0100
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On 2012-10-04 15:07, Mike Graham wrote:
> On Thu, Oct 4, 2012 at 10:02 AM, Mathias Panzenböck
> <grosser.meister.morti at> wrote:
>> On 10/04/2012 03:53 PM, Steven D'Aprano wrote:
>>> py> NAN == NAN
>>> False
>> Why isn't this True anyway? Is there a PEP that explains this (IMHO odd)
>> behavior?
> IEEE 754 specifies this.
Think of it this way:

Calculation A returns NaN for some reason

Calculation B also returns NaN for some reason

Have they really returned the same result? Just because they're both
NaN doesn't mean that they're the _same_ NaN...

From rosuav at  Thu Oct  4 16:30:50 2012
From: rosuav at (Chris Angelico)
Date: Fri, 5 Oct 2012 00:30:50 +1000
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Fri, Oct 5, 2012 at 12:19 AM, MRAB <python at> wrote:
> On 2012-10-04 15:07, Mike Graham wrote:
>> On Thu, Oct 4, 2012 at 10:02 AM, Mathias Panzenböck
>> <grosser.meister.morti at> wrote:
>>> On 10/04/2012 03:53 PM, Steven D'Aprano wrote:
>>>> py> NAN == NAN
>>>> False
>>> Why isn't this True anyway? Is there a PEP that explains this (IMHO odd)
>>> behavior?
>> IEEE 754 specifies this.
> Think of it this way:
> Calculation A returns NaN for some reason
> Calculation B also returns NaN for some reason
> Have they really returned the same result? Just because they're both
> NaN doesn't mean that they're the _same_ NaN...

The only other viable option would be to declare that (NaN==NaN) is
NaN - kinda like SQL's NULL and its weird semantics. And that would be
*highly* confusing to many situations.


From victor.stinner at  Thu Oct  4 17:08:40 2012
From: victor.stinner at (Victor Stinner)
Date: Thu, 4 Oct 2012 17:08:40 +0200
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <>

2012/10/4 Steven D'Aprano <steve at>:
> On 04/10/12 21:48, Max Moroz wrote:
>> It seems that built-in classes do not short-circuit `__eq__` method
>> when the objects are identical, at least in CPython:
>>      f = frozenset(range(200000000))
>>      f1 = f
>>      f1 == f # this operation will take about 1 sec on my machine
> You shouldn't over-generalize. Some built-ins do short-circuit __eq__
> when the objects are identical. I believe that strings and ints both
> do. Other types might not.

This optimization is not implemented for Unicode strings.

PyObject_RichCompareBool() implements this optimization which leads to
incorrect results:

nan = float("nan")
mytuple = (nan,)
assert mytuple != mytuple # fails

I think that the optimization should be implemented for Unicode
strings, but disabled in PyObject_RichCompareBool().
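
The effect of that identity shortcut on containers can be seen with a
few assertions. This is only an illustration of the behaviour under
discussion as observed in CPython, not a statement about what it should
be.

```python
nan = float("nan")
mytuple = (nan,)

# The float itself follows IEEE 754: a NaN is not equal to itself.
assert not (nan == nan)

# Element comparison inside containers tries identity first, so a
# container holding the *same* NaN object compares equal to itself ...
assert mytuple == mytuple
assert (nan,) == (nan,)
assert [nan] == [nan]

# ... and membership tests find it as well.
assert nan in [nan]
assert [nan].count(nan) == 1

# Two distinct NaN objects get no such shortcut.
assert (float("nan"),) != (float("nan"),)
assert float("nan") not in [nan]
```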

@Max Moroz: Can you please open an issue on


From steve at  Thu Oct  4 17:53:36 2012
From: steve at (Steven D'Aprano)
Date: Fri, 05 Oct 2012 01:53:36 +1000
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <>

On 05/10/12 01:08, Victor Stinner wrote:
> 2012/10/4 Steven D'Aprano<steve at>:
>> On 04/10/12 21:48, Max Moroz wrote:
>>> It seems that built-in classes do not short-circuit `__eq__` method
>>> when the objects are identical, at least in CPython:
>>>       f = frozenset(range(200000000))
>>>       f1 = f
>>>       f1 == f # this operation will take about 1 sec on my machine
>> You shouldn't over-generalize. Some built-ins do short-circuit __eq__
>> when the objects are identical. I believe that strings and ints both
>> do. Other types might not.
> This optimization is not implemented for Unicode strings.

That does not match my experience. In Python 3.2, I generate a large
unicode string, and an equal but not identical copy:

s = "a?cdef"*100000
t = "a" + s[1:]
assert s is not t and s == t

Using timeit, s == s is about 10000 times faster than s == t.
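
A sketch of how such a measurement might be set up. The string below is
an ASCII stand-in, since the second character of the original did not
survive archiving, and time_eq is a helper name invented here. The
globals argument of timeit.timeit needs Python 3.5 or later, and the
actual ratio will depend on the interpreter and machine.

```python
import timeit

# ASCII stand-in for the original test string (an assumption).
s = "abcdef" * 100000
t = "a" + s[1:]            # equal contents, but a distinct object
assert s is not t and s == t


def time_eq(left, right, number=1000):
    """Return the total seconds taken by `number` evaluations of left == right."""
    return timeit.timeit(
        "left == right",
        globals={"left": left, "right": right},
        number=number,
    )


identical = time_eq(s, s)
equal_copy = time_eq(s, t)
print("s == s: %.6f s   s == t: %.6f s" % (identical, equal_copy))
```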


From python at  Thu Oct  4 18:05:43 2012
From: python at (MRAB)
Date: Thu, 04 Oct 2012 17:05:43 +0100
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-04 16:53, Steven D'Aprano wrote:
> On 05/10/12 01:08, Victor Stinner wrote:
>> 2012/10/4 Steven D'Aprano<steve at>:
>>> On 04/10/12 21:48, Max Moroz wrote:
>>>> It seems that built-in classes do not short-circuit `__eq__` method
>>>> when the objects are identical, at least in CPython:
>>>>       f = frozenset(range(200000000))
>>>>       f1 = f
>>>>       f1 == f # this operation will take about 1 sec on my machine
>>> You shouldn't over-generalize. Some built-ins do short-circuit __eq__
>>> when the objects are identical. I believe that strings and ints both
>>> do. Other types might not.
>> This optimization is not implemented for Unicode strings.
> That does not match my experience. In Python 3.2, I generate a large
> unicode string, and an equal but not identical copy:
> s = "a?cdef"*100000
> t = "a" + s[1:]
> assert s is not t and s == t
> Using timeit, s == s is about 10000 times faster than s == t.
In Python 3.3 I get a similar result.

From oscar.j.benjamin at  Thu Oct  4 18:48:59 2012
From: oscar.j.benjamin at (Oscar Benjamin)
Date: Thu, 4 Oct 2012 17:48:59 +0100
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <>

On 4 October 2012 17:05, MRAB <python at> wrote:
> On 2012-10-04 16:53, Steven D'Aprano wrote:
>> On 05/10/12 01:08, Victor Stinner wrote:
>>> 2012/10/4 Steven D'Aprano<steve at>:
>>>> On 04/10/12 21:48, Max Moroz wrote:
>>>>> It seems that built-in classes do not short-circuit `__eq__` method
>>>>> when the objects are identical, at least in CPython:
>>>>>       f = frozenset(range(200000000))
>>>>>       f1 = f
>>>>>       f1 == f # this operation will take about 1 sec on my machine
>>>> You shouldn't over-generalize. Some built-ins do short-circuit __eq__
>>>> when the objects are identical. I believe that strings and ints both
>>>> do. Other types might not.
>>> This optimization is not implemented for Unicode strings.
>> That does not match my experience. In Python 3.2, I generate a large
>> unicode string, and an equal but not identical copy:
>> s = "a?cdef"*100000
>> t = "a" + s[1:]
>> assert s is not t and s == t
>> Using timeit, s == s is about 10000 times faster than s == t.
> In Python 3.3 I get a similar result.

This was discussed not long ago in a different thread. Here is the line:

As I understood it that line is the reason that comparisons for
interned strings are faster.


From grosser.meister.morti at  Thu Oct  4 18:51:23 2012
From: grosser.meister.morti at (=?ISO-8859-1?Q?Mathias_Panzenb=F6ck?=)
Date: Thu, 04 Oct 2012 18:51:23 +0200
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <>

On 10/04/2012 03:53 PM, Steven D'Aprano wrote:
> On 04/10/12 21:48, Max Moroz wrote:
>> It seems that built-in classes do not short-circuit `__eq__` method
>> when the objects are identical, at least in CPython:
>>      f = frozenset(range(200000000))
>>      f1 = f
>>      f1 == f # this operation will take about 1 sec on my machine
> You shouldn't over-generalize. Some built-ins do short-circuit __eq__
> when the objects are identical. I believe that strings and ints both
> do. Other types might not.
>> Is there any disadvantage to checking whether the equality was called
>> with the same object, and if it was, return `True` right away?
> That would break floats and Decimals, both of which support NANs.
> The decision whether or not to optimize __eq__ should be left up to the
> type. Some types, for example, might decide to optimize x == x even if
> x contains a NAN or other objects that break reflexivity of equality.
> Other types might prefer not to.
> (Please do not start an argument about NANs and reflexivity. That's
> been argued to death, and there are very good reasons for the IEEE 754
> standard to define NANs the way they do.)
> Since frozensets containing NANs are rare (I presume), I think it is
> reasonable to optimize frozenset equality. But I do not think it is
> reasonable for Python to mandate identity checking before __eq__.

But it seems like set and frozenset behave like this anyway (using "is" to compare its items):

 >>> frozenset([float("nan")]) == frozenset([float("nan")])
 False

 >>> s = frozenset([float("nan")])
 >>> s == s
 True

 >>> NaN = float("nan")
 >>> NaN == NaN
 False
 >>> frozenset([NaN]) == frozenset([NaN])
 True

So the "is" optimization should not change its semantics.

(I tested this in Python 2.7.3 and 3.2.3)

>> I noticed this when trying to memoize a function that has large
>> frozenset arguments. While hashing of a large argument is very fast
>> after it's done once (hash value is presumably cached), the equality
>> comparison is always slow even against itself. So when the same large
>> argument is provided over and over, memoization is slow.
> I'm not sure what you are doing here, because dicts (at least in Python
> 3.2) already short-circuit equality:
> py> NAN = float('nan')
> py> NAN == NAN
> False
> py> d = {NAN: 42}
> py> d[NAN]
> 42
> Actually, that behaviour goes back to at least 2.4, so I'm not sure how
> you are doing memoization and not seeing the same optimization.

From maxmoroz at  Thu Oct  4 19:49:45 2012
From: maxmoroz at (Max Moroz)
Date: Thu, 4 Oct 2012 10:49:45 -0700
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Oct 4, 2012 at 6:53 AM, Steven D'Aprano <steve at> wrote:
> I'm not sure what you are doing here, because dicts (at least in Python
> 3.2) already short-circuit equality:
> py> NAN = float('nan')
> py> NAN == NAN
> False
> py> d = {NAN: 42}
> py> d[NAN]
> 42
> Actually, that behaviour goes back to at least 2.4, so I'm not sure how
> you are doing memoization and not seeing the same optimization.

It was my mistake... I do see this optimization now that I know where
to look for it. Thanks for clarifying this.

From maxmoroz at  Thu Oct  4 19:50:50 2012
From: maxmoroz at (Max Moroz)
Date: Thu, 4 Oct 2012 10:50:50 -0700
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Oct 4, 2012 at 7:19 AM, MRAB <python at> wrote:
> Think of it this way:
> Calculation A returns NaN for some reason
> Calculation B also returns NaN for some reason
> Have they really returned the same result? Just because they're both
> NaN doesn't mean that they're the _same_ NaN...

Someone who performs two calculations with float numbers should never
compare their results for equality. It's really a bug to rely on that

# this is a bug
# since the result of this comparison for regular numbers is unpredictable
# so does it really matter how this behaves when NaNs are compared?
if a/b == c/d:
    # ...

On the other hand, comparing one number to another, when neither
number is the result of a calculation, is perfectly fine:

# this is not a bug
# too bad that it won't work as expected
# when input1 == input2 == 'nan'
a = float(input1)
b = float(input2)
if a == b:
    # ...

So it seems to me your argument is this: "let's break the expectations
of developers who are writing valid code, in order to partially meet
the expectations of developers who are writing buggy code". If so, I disagree.
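
The two cases can be made concrete with a short sketch. The helper
inputs_match is hypothetical and shows just one way to get the
"expected" answer for parsed inputs; math.isclose was added in Python
3.5, after this thread.

```python
import math


def inputs_match(a, b):
    """Hypothetical helper: parsed inputs match if equal, or if both are NaN."""
    return a == b or (math.isnan(a) and math.isnan(b))


# Values that went through arithmetic: exact equality is unreliable,
# so a tolerance-based comparison is the usual choice.
assert 0.1 + 0.2 != 0.3
assert math.isclose(0.1 + 0.2, 0.3)

# Values parsed straight from input: == works for ordinary numbers ...
assert float("1.5") == float("1.5")

# ... but not when both inputs are 'nan'.
a = float("nan")
b = float("nan")
assert a != b
assert inputs_match(a, b)
```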

From solipsis at  Fri Oct  5 01:00:10 2012
From: solipsis at (Antoine Pitrou)
Date: Fri, 5 Oct 2012 01:00:10 +0200
Subject: [Python-ideas] checking for identity before comparing built-in
References: <>
Message-ID: <>

On Thu, 4 Oct 2012 17:08:40 +0200
Victor Stinner <victor.stinner at> wrote:
> PyObject_RichCompareBool() implements this optimization which leads to
> incorrect results:
> nan = float("nan")
> mytuple = (nan,)
> assert mytuple != mytuple # fails
> I think that the optimization should be implemented for Unicode
> strings, but disabled in PyObject_RichCompareBool().

I think we should wait for someone to complain before disabling it.
It's a useful optimization.



Software development and contracting:

From ben+python at  Thu Oct  4 00:20:34 2012
From: ben+python at (Ben Finney)
Date: Thu, 04 Oct 2012 08:20:34 +1000
Subject: [Python-ideas] Visually confusable unicode
	characters	in	identifiers
References: <>
	<> <k4ch7q$8rv$>
	<> <k4cl1q$dd1$>
	<k4d5do$5if$> <>
Message-ID: <>

"Stephen J. Turnbull" <stephen at>

> Ben Finney writes:
>  > Right. So you've solved it for one program only, not the OS
> You seem to be under a misconception.  Emacs *is* an OS […]

… all it needs is a good editor… :-)

(I'm claiming permission for that snark because Emacs is my primary

>  > I'm saying the OS is the right place to solve it, by installing an
>  > appropriate input method (or whatever each OS calls them).
> I doubt very many people used to and fond of LaTeX would agree with
> you, since AFAIK there aren't any OSes providing TeX macros as an
> input method.

I've shown several LaTeX-comfortable people IBus on GNOME and/or KDE
(for GNU+Linux), and they were very glad that it has a LaTeX input
method. So anyone who is fond of LaTeX and has IBus or an equivalent
input method engine on their OS can agree.

> AFAICS it's not available on my Mac.

That's a shame. Maybe some OS vendors don't want to support users
extending the OS functionality? Or maybe your OS does have such a thing
available. I haven't been motivated to look for it.

> While I don't particularly favor it, it may be the best compromise, as
> many people are familiar with it, and many many symbols are available
> with familiar, intuitive names so that non-TeXnical typists can often
> guess them.

Agreed. Which is why I advocate installing such an input method in one's
OS input method engine, so that input method is available for all
applications.

 \     ?I thought I'd begin by reading a poem by Shakespeare, but then |
  `\     I thought ?Why should I? He never reads any of mine.?? ?Spike |
_o__)                                                         Milligan |
Ben Finney

From stephen at  Fri Oct  5 05:11:34 2012
From: stephen at (Stephen J. Turnbull)
Date: Fri, 05 Oct 2012 12:11:34 +0900
Subject: [Python-ideas] Visually confusable
	unicode	characters	in	identifiers
In-Reply-To: <>
References: <>
	<> <k4ch7q$8rv$>
	<> <k4cl1q$dd1$>
	<k4d5do$5if$> <>
Message-ID: <>

Ben Finney writes:

 > I've shown several LaTeX-comfortable people IBus on GNOME and/or KDE
 > (for GNU+Linux), and they were very glad that it has a LaTeX input
 > method.

I'm happy to be proved wrong!

 > > AFAICS it's not available on my Mac.
 > That's a shame. Maybe some OS vendors don't want to support users
 > extending the OS functionality? Or maybe your OS does have such a thing
 > available. I haven't been motivated to look for it.

I have looked for it; if it's available on Mac OS X, it's not easy to
find.  I suspect the same is true for Windows.

 > Agreed. Which is why I advocate installing such an input method in one's
 > OS input method engine, so that input method is available for all
 > applications.

Whatever makes you think I don't?  That's *exactly* why I live in
XEmacs, because it provides me with a portable environment for mixing
English and math with a language whose orthography puts Brainf*ck
syntax to shame.

But pragmatically speaking, Unicode support is a sore point for
Python.  "Screw you if you don't know how to conveniently input
integral signs on your OS" is not a message we want to be sending.

From steve at  Fri Oct  5 06:52:55 2012
From: steve at (Steven D'Aprano)
Date: Fri, 5 Oct 2012 14:52:55 +1000
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <20121005045254.GA14666@ando>

On Fri, Oct 05, 2012 at 01:00:10AM +0200, Antoine Pitrou wrote:
> On Thu, 4 Oct 2012 17:08:40 +0200
> Victor Stinner <victor.stinner at>
> wrote:
> > PyObject_RichCompareBool() implements this optimization which leads to
> > incorrect results:
> > 
> > nan = float("nan")
> > mytuple = (nan,)
> > assert mytuple != mytuple # fails
> > 
> > I think that the optimization should be implemented for Unicode
> > strings, but disabled in PyObject_RichCompareBool().
> I think we should wait for someone to complain before disabling it.
> It's a useful optimization.


I will go to the wall to defend correct IEEE 754 semantics for NANs, but 
I also support containers that optimise away those semantics by default.

I think it's too early to talk about disabling it without even the 
report of a bug caused by it.


From andy at  Fri Oct  5 11:27:28 2012
From: andy at (Andy Buckley)
Date: Fri, 05 Oct 2012 11:27:28 +0200
Subject: [Python-ideas] History stepping in interactive session?
Message-ID: <>

A couple of weeks ago I posted a question on about whether
there is a way to get the same *very* convenient
stepping-through-command-history behaviour in an interactive Python
interpreter session as is possible in (at least) the bash shell with the
Ctrl-o keybinding:

I was spurred to ask this question by a painful development experience
full of Up Up Up Up Up Enter Up Up Up Up Up Enter ... keypresses to
repeat a previous set of Python commands/statements that weren't worth
putting in a script file, or which I wanted to make very minor changes
to on each iteration.

As you might have noticed, I didn't get any answers, which either means
that I'm the only person in the world to think this is an issue worth
getting bothered about, or that there is no such behaviour available.
Perhaps both -- but my feeling is that if this behaviour were available
and well-known, it would become heavily used and very popular. As many
other readline behaviours *do* work, this one would be really nice to
have -- any chance that it could be added to a future release? (if it's
not already there via some secret binding)


From stephen at  Fri Oct  5 12:26:03 2012
From: stephen at (Stephen J. Turnbull)
Date: Fri, 05 Oct 2012 19:26:03 +0900
Subject: [Python-ideas]  History stepping in interactive session?
In-Reply-To: <>
References: <>
Message-ID: <>

Andy Buckley writes:

 > A couple of weeks ago I posted a question on

Maybe it's a bug.  (See below.)  Have you checked the tracker?  Have
you posted to python-list?  That's a better place than here to get
that kind of information.

 > As you might have noticed,

The people on this list (and on python-dev) probably don't pay much
attention to questions on, unless they're the kind of
people who hang out on python-list.

Sorry for not being much help, but after trying the obvious (read the
GNU bash manpage, grep the output of "bind -p" to find out what C-o
does, check that python does link to "True GNU" readline on my
platform, try python, and when that didn't work, restart python after
doing "bind -p >> ~/.inputrc", which didn't work either), I don't know.

It *might* be a bug, or you could file for an RFE if it's by design.

From solipsis at  Fri Oct  5 14:09:27 2012
From: solipsis at (Antoine Pitrou)
Date: Fri, 5 Oct 2012 14:09:27 +0200
Subject: [Python-ideas] History stepping in interactive session?
References: <>
Message-ID: <>

On Fri, 05 Oct 2012 11:27:28 +0200
Andy Buckley <andy at> wrote:
> A couple of weeks ago I posted a question on about whether
> there is a way to get the same *very* convenient
> stepping-through-command-history behaviour in an interactive Python
> interpreter session as is possible in (at least) the bash shell with the
> Ctrl-o keybinding:

The interactive interpreter (and I mean the default one, not
third-party choices like IPython) uses libreadline for its
editing and history functionality, so it's really a question about
libreadline you're asking. I don't know if it allows such
customization, but perhaps the Web site has the answer you're looking




From oscar.j.benjamin at  Fri Oct  5 14:43:00 2012
From: oscar.j.benjamin at (Oscar Benjamin)
Date: Fri, 5 Oct 2012 13:43:00 +0100
Subject: [Python-ideas] History stepping in interactive session?
In-Reply-To: <>
References: <>
Message-ID: <>

On 5 October 2012 10:27, Andy Buckley <andy at> wrote:
> A couple of weeks ago I posted a question on about whether
> there is a way to get the same *very* convenient
> stepping-through-command-history behaviour in an interactive Python
> interpreter session as is possible in (at least) the bash shell with the
> Ctrl-o keybinding:
> I was spurred to ask this question by a painful development experience
> full of Up Up Up Up Up Enter Up Up Up Up Up Enter ... keypresses to
> repeat a previous set of Python commands/statements that weren't worth
> putting in a script file, or which I wanted to make very minor changes
> to on each iteration.

As soon as I find myself doing this I quit the interpreter and start
ipython. The feature that ipython has that makes what you are doing
much easier is the magic %edit command. Just type

In [1]: edit

and your favourite editor will open up allowing you to write/edit some
code. When you close the editor, ipython will run the code from
within the interactive session (as if you had typed it in directly).
If you want to rerun that code with modifications just type 'edit' again
and you can make the modifications within your editor.


From steve at  Fri Oct  5 16:24:39 2012
From: steve at (Steven D'Aprano)
Date: Sat, 06 Oct 2012 00:24:39 +1000
Subject: [Python-ideas] History stepping in interactive session?
In-Reply-To: <>
References: <>
Message-ID: <>

On 05/10/12 22:09, Antoine Pitrou wrote:
> On Fri, 05 Oct 2012 11:27:28 +0200
> Andy Buckley<andy at>  wrote:
>> A couple of weeks ago I posted a question on about whether
>> there is a way to get the same *very* convenient
>> stepping-through-command-history behaviour in an interactive Python
>> interpreter session as is possible in (at least) the bash shell with the
>> Ctrl-o keybinding:
> The interactive interpreter (and I mean the default one, not
> third-party choices like IPython) uses libreadline for its
> editing and history functionality, so it's really a question about
> libreadline you're asking.

I don't think so. I'm not an expert on readline, but it seems to me to be a
Python bug.

In bash, I check for the existence of the "operate-and-get-next" command,
and sure enough it is bound to C-o (Ctrl-o) as expected:

[steve at ando ~]$ bind -p | grep operate
"\C-o": operate-and-get-next

I don't believe that there is any direct mechanism for querying the current
readline bindings in Python, but I can fake it with the "dump-functions"

import readline
readline.parse_and_bind(r'"\C-xd": dump-functions')

If I then type Ctrl-x d at the interactive interpreter, readline dumps the
function bindings to screen:

py> readline.parse_and_bind(r'"\C-xd": dump-functions')

abort can be found on "\C-g", "\C-x\C-g", "\M-\C-g".
accept-line can be found on "\C-j", "\C-m".
arrow-key-prefix is not bound to any keys
backward-byte is not bound to any keys
backward-char can be found on "\C-b", "\M-OD", "\M-[D".

operate-and-get-next is absent from the list. I don't mean that it is
not bound. It just isn't there at all.

If I nevertheless try to use it:

readline.parse_and_bind(r'"\C-o": operate-and-get-next')

it does *not* enable Ctrl-o as expected, operate-and-get-next remains
absent from the list of bindings.

I have checked this on both Python 2.7 and 3.3.0rc3 under Centos 5,
and on 3.3.0rc3 under Debian Squeeze.


From solipsis at  Fri Oct  5 16:30:00 2012
From: solipsis at (Antoine Pitrou)
Date: Fri, 5 Oct 2012 16:30:00 +0200
Subject: [Python-ideas] History stepping in interactive session?
References: <>
Message-ID: <>

On Sat, 06 Oct 2012 00:24:39 +1000
Steven D'Aprano <steve at> wrote:

> On 05/10/12 22:09, Antoine Pitrou wrote:
> > On Fri, 05 Oct 2012 11:27:28 +0200
> > Andy Buckley<andy at>  wrote:
> >> A couple of weeks ago I posted a question on about whether
> >> there is a way to get the same *very* convenient
> >> stepping-through-command-history behaviour in an interactive Python
> >> interpreter session as is possible in (at least) the bash shell with the
> >> Ctrl-o keybinding:
> >
> > The interactive interpreter (and I mean the default one, not
> > third-party choices like IPython) uses libreadline for its
> > editing and history functionality, so it's really a question about
> > libreadline you're asking.
> I don't think so. I'm not an expert on readline, but it seems to me to be a
> Python bug.
[snip useful explanations]

Well, if there is a bug, then it should be reported on the tracker (and
a patch uploaded, if possible :-)).




From phd at  Fri Oct  5 15:17:17 2012
From: phd at (Oleg Broytman)
Date: Fri, 5 Oct 2012 17:17:17 +0400
Subject: [Python-ideas] History stepping in interactive session?
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 05, 2012 at 02:09:27PM +0200, Antoine Pitrou <solipsis at> wrote:

   The manual lacks the function "operate-and-get-next" bound in bash to
Ctrl-O. Either the manual is old or the function is not a function of
readline but rather one implemented by bash. That requires further
investigation (which I'm not going to do).

     Oleg Broytman              phd at
           Programmers don't die, they just GOSUB without RETURN.

From nadeem.vawda at  Fri Oct  5 17:23:08 2012
From: nadeem.vawda at (Nadeem Vawda)
Date: Fri, 5 Oct 2012 17:23:08 +0200
Subject: [Python-ideas] History stepping in interactive session?
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 5, 2012 at 3:17 PM, Oleg Broytman <phd at> wrote:
> On Fri, Oct 05, 2012 at 02:09:27PM +0200, Antoine Pitrou <solipsis at> wrote:
>    The manual lacks the function "operate-and-get-next" bound in bash to
> Ctrl-O. Either the manual is old or the function is not a function of
> readline but rather one implemented by bash. That requires further
> investigation (which I'm not going to do).

The function is implemented by bash; see operate_and_get_next() in bashline.c.


From phd at  Fri Oct  5 17:39:27 2012
From: phd at (Oleg Broytman)
Date: Fri, 5 Oct 2012 19:39:27 +0400
Subject: [Python-ideas] History stepping in interactive session?
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 05, 2012 at 05:23:08PM +0200, Nadeem Vawda <nadeem.vawda at> wrote:
> On Fri, Oct 5, 2012 at 3:17 PM, Oleg Broytman <phd at> wrote:
> > On Fri, Oct 05, 2012 at 02:09:27PM +0200, Antoine Pitrou <solipsis at> wrote:
> >>
> >
> >    The manual lacks the function "operate-and-get-next" bound in bash to
> > Ctrl-O. Either the manual is old or the function is not a function of
> > readline but rather one implemented by bash. That requires further
> > investigation (which I'm not going to do).
> The function is implemented by bash; see operate_and_get_next() in bashline.c.

   Thanks! That closes the issue -- the function has to be implemented
by (a user of) Python if one wants to have it in Python.

     Oleg Broytman              phd at
           Programmers don't die, they just GOSUB without RETURN.

From solipsis at  Fri Oct  5 20:25:34 2012
From: solipsis at (Antoine Pitrou)
Date: Fri, 5 Oct 2012 20:25:34 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
Message-ID: <>


This PEP is a resurrection of the idea of having object-oriented
filesystem paths in the stdlib. It comes with a general API proposal
as well as a specific implementation (*). The implementation is young
and discussion is quite open.




PS: You can all admire my ASCII-art skills.

PEP: 428
Title: The pathlib module -- object-oriented filesystem paths
Version: $Revision$
Last-Modified: $Date$
Author: Antoine Pitrou <solipsis at>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 30-July-2012
Python-Version: 3.4


This PEP proposes the inclusion of a third-party module, `pathlib`_, in
the standard library.  The inclusion is proposed under the provisional
label, as described in :pep:`411`.  Therefore, API changes can be done,
either as part of the PEP process, or after acceptance in the standard
library (and until the provisional label is removed).

The aim of this library is to provide a simple hierarchy of classes to
handle filesystem paths and the common operations users do over them.

.. _`pathlib`:

Related work

An object-oriented API for filesystem paths has already been proposed
and rejected in :pep:`355`.  Several third-party implementations of the
idea of object-oriented filesystem paths exist in the wild:

* The historical ` module`_ by Jason Orendorff, Jason R. Coombs
  and others, which provides a ``str``-subclassing ``Path`` class;

* Twisted's slightly specialized `FilePath class`_;

* An `AlternativePathClass proposal`_, subclassing ``tuple`` rather than

* `Unipath`_, a variation on the str-subclassing approach with two public
  classes, an ``AbstractPath`` class for operations which don't do I/O and a
  ``Path`` class for all common operations.

This proposal attempts to learn from these previous attempts and the
rejection of :pep:`355`.

.. _` module`:
.. _`FilePath class`:
.. _`AlternativePathClass proposal`:
.. _`Unipath`:

Why an object-oriented API

The rationale to represent filesystem paths using dedicated classes is the
same as for other kinds of stateless objects, such as dates, times or IP
addresses.  Python has been slowly moving away from strictly replicating
the C language's APIs to providing better, more helpful abstractions around
all kinds of common functionality.  Even if this PEP isn't accepted, it is
likely that another form of filesystem handling abstraction will be adopted
one day into the standard library.

Indeed, many people will prefer handling dates and times using the high-level
objects provided by the ``datetime`` module, rather than using numeric
timestamps and the ``time`` module API.  Moreover, using a dedicated class
makes it possible to enable desirable behaviours by default, for example the
case insensitivity of Windows paths.


Class hierarchy

The `pathlib`_ module implements a simple hierarchy of classes::

                           +----------+
                           |          |
                  ---------| PurePath |--------
                  |        |          |       |
                  |        +----------+       |
                  |             |             |
                  |             |             |
                  v             |             v
           +---------------+    |     +------------+
           |               |    |     |            |
           | PurePosixPath |    |     | PureNTPath |
           |               |    |     |            |
           +---------------+    |     +------------+
                  |             v             |
                  |          +------+         |
                  |          |      |         |
                  |   -------| Path |------   |
                  |   |      |      |     |   |
                  |   |      +------+     |   |
                  |   |                   |   |
                  |   |                   |   |
                  v   v                   v   v
             +-----------+              +--------+
             |           |              |        |
             | PosixPath |              | NTPath |
             |           |              |        |
             +-----------+              +--------+

This hierarchy divides path classes along two dimensions:

* a path class can be either pure or concrete: pure classes support only
  operations that don't need to do any actual I/O, which are most path
  manipulation operations; concrete classes support all the operations
  of pure classes, plus operations that do I/O.

* a path class is of a given flavour according to the kind of operating
  system paths it represents.  `pathlib`_ implements two flavours: NT paths
  for the filesystem semantics embodied in Windows systems, POSIX paths for
  other systems (````'s terminology is re-used here).

Any pure class can be instantiated on any system: for example, you can
manipulate ``PurePosixPath`` objects under Windows, ``PureNTPath`` objects
under Unix, and so on.  However, concrete classes can only be instantiated
on a matching system: indeed, it would be error-prone to start doing I/O
with ``NTPath`` objects under Unix, or vice-versa.

Furthermore, there are two base classes which also act as system-dependent
factories: ``PurePath`` will instantiate either a ``PurePosixPath`` or a
``PureNTPath`` depending on the operating system.  Similarly, ``Path``
will instantiate either a ``PosixPath`` or a ``NTPath``.

It is expected that, in most uses, using the ``Path`` class is adequate,
which is why it has the shortest name of all.
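
As a rough sketch of the two dimensions described above, the following uses
the ``pathlib`` module as it later shipped in Python 3.4+, not the draft API
of this PEP.  In the shipped module the Windows flavour is spelled
``PureWindowsPath``/``WindowsPath`` rather than ``PureNTPath``/``NTPath``.

```python
from pathlib import Path, PurePath, PurePosixPath, PureWindowsPath

# Pure classes do no I/O, so either flavour can be built on any system.
posix = PurePosixPath("/usr/lib")
windows = PureWindowsPath("c:/windows")

# The two base classes act as factories for the running platform.
pure_here = PurePath("docs")
concrete_here = Path("docs")

print(type(pure_here).__name__)      # PurePosixPath or PureWindowsPath
print(type(concrete_here).__name__)  # PosixPath or WindowsPath
```

A concrete path is also an instance of the pure class of the same flavour,
which is why the pure operations remain available on it.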

No confusion with builtins

In this proposal, the path classes do not derive from a builtin type.  This
contrasts with some other Path class proposals which were derived from
``str``.  They also do not pretend to implement the sequence protocol:
if you want a path to act as a sequence, you have to look up a dedicated
attribute (the ``parts`` attribute).

By not passing themselves off as builtin types, the path classes minimize the
potential for confusion if they are combined by accident with genuine builtin
types.


Path objects are immutable, which makes them hashable and also prevents a
class of programming errors.
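
A small sketch of what "not a ``str``, but hashable" means in practice,
assuming the ``pathlib`` module as later shipped in Python 3.4+ (the file
name is only an illustration):

```python
from pathlib import PurePosixPath

a = PurePosixPath("/etc/hosts")
b = PurePosixPath("/etc", "hosts")   # same path, built from two segments

is_str = isinstance(a, str)          # paths do not derive from str
unique = {a, b}                      # equal paths hash equally
owners = {a: "root"}                 # so they can serve as dict keys
```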

Sane behaviour

Little of the functionality from os.path is reused.  Many os.path functions
are tied by backwards compatibility to confusing or plain wrong behaviour
(for example, the fact that ``os.path.abspath()`` simplifies ".." path
components without resolving symlinks first).

Also, using classes instead of plain strings helps make system-dependent
behaviours natural.  For example, comparing and ordering Windows path
objects is case-insensitive, and path separators are automatically converted
to the platform default.
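
The behaviours mentioned above can be sketched with the ``pathlib`` module as
later shipped in Python 3.4+ (class names differ from this draft).  The
``/srv/link/../data`` path is a made-up example in which ``link`` could be a
symlink:

```python
import posixpath
from pathlib import PurePosixPath, PureWindowsPath

# Comparison follows the flavour's case rules.
same_on_windows = PureWindowsPath("C:/Foo") == PureWindowsPath("c:/foo")
same_on_posix = PurePosixPath("Foo") == PurePosixPath("foo")

# Separators are normalised to the flavour's own separator.
rendered = str(PureWindowsPath("c:/foo/bar"))

# os.path-style normalisation collapses ".." purely lexically,
# while a pure path keeps it because "link" might be a symlink.
collapsed = posixpath.normpath("/srv/link/../data")
kept = str(PurePosixPath("/srv/link/../data"))
```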

Useful notations

The API tries to provide useful notations all the while avoiding magic.
Some examples::

    >>> p = Path('/home/antoine/pathlib/')
    >>> p.ext
    >>> p.root
    < ['/', 'home', 'antoine', 'pathlib', '']>
    >>> list(p.parents())
    [PosixPath('/home/antoine/pathlib'), PosixPath('/home/antoine'), PosixPath('/home'), PosixPath('/')]
    >>> p.exists()
    >>> p.st_size

Pure paths API

The philosophy of the ``PurePath`` API is to provide a consistent array of
useful path manipulation operations, without exposing a hodge-podge of
functions like ``os.path`` does.


First a couple of conventions:

* All paths can have a drive and a root.  For POSIX paths, the drive is
  always empty.

* A relative path has neither drive nor root.

* A POSIX path is absolute if it has a root.  A Windows path is absolute if
  it has both a drive *and* a root.  A Windows UNC path (e.g.
  ``\\some\\share\\myfile.txt``) always has a drive and a root
  (here, ``\\some\\share`` and ``\\``, respectively).

* A path which has either a drive *or* a root is said to be anchored.
  Its anchor is the concatenation of the drive and root.  Under POSIX,
  "anchored" is the same as "absolute".

Construction and joining

We will present construction and joining together since they expose
similar semantics.

The simplest way to construct a path is to pass it its string representation::

    >>> PurePath('')

Extraneous path separators and ``"."`` components are eliminated::

    >>> PurePath('a///b/c/./d/')

If you pass several arguments, they will be automatically joined::

    >>> PurePath('docs', 'Makefile')

Joining semantics are similar to os.path.join, in that anchored paths ignore
the information from the previously joined components::

    >>> PurePath('/etc', '/usr', 'bin')

However, with Windows paths, the drive is retained as necessary::

    >>> PureNTPath('c:/foo', '/Windows')
    >>> PureNTPath('c:/foo', 'd:')

Calling the constructor without any argument creates a path object pointing
to the logical "current directory"::

    >>> PurePosixPath()
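
For reference, here is how the construction rules above behave in the
``pathlib`` module as later shipped in Python 3.4+.  This is a sketch against
that module (where the Windows class is ``PureWindowsPath``), not against the
draft implementation:

```python
from pathlib import PurePosixPath, PureWindowsPath

examples = {
    # Extra separators and "." components are dropped.
    "collapse": str(PurePosixPath("a///b/c/./d/")),
    # Several arguments are joined.
    "join": str(PurePosixPath("docs", "Makefile")),
    # An anchored argument discards what came before it.
    "anchored": str(PurePosixPath("/etc", "/usr", "bin")),
    # On Windows a rooted argument keeps the earlier drive...
    "keep_drive": str(PureWindowsPath("c:/foo", "/Windows")),
    # ...while a new drive replaces everything.
    "new_drive": str(PureWindowsPath("c:/foo", "d:")),
    # No argument means the logical current directory.
    "empty": str(PurePosixPath()),
}
```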

A path can be joined with another using the ``__getitem__`` operator::

    >>> p = PurePosixPath('foo')
    >>> p['bar']
    >>> p[PurePosixPath('bar')]

As with constructing, multiple path components can be specified at once::

    >>> p['bar/xyzzy']

A join() method is also provided, with the same behaviour.  It can serve
as a factory function::

    >>> path_factory = p.join
    >>> path_factory('bar')
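
Note that the ``pathlib`` module as later shipped in Python 3.4+ does not use
``__getitem__`` for joining.  The sketch below shows the equivalent
operations in that module, where joining is spelled with the ``/`` operator
and the method is called ``joinpath()``:

```python
from pathlib import PurePosixPath

p = PurePosixPath("foo")

joined = p / "bar"                       # join with a string
joined_path = p / PurePosixPath("bar")   # or with another path
nested = p / "bar/xyzzy"                 # several components at once

# The bound method can serve as a factory function.
path_factory = p.joinpath
made = path_factory("bar", "xyzzy")
```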


To represent a path (e.g. to pass it to third-party libraries), just call
``str()`` on it::

    >>> p = PurePath('/home/antoine/pathlib/')
    >>> str(p)
    >>> p = PureNTPath('c:/windows')
    >>> str(p)

To force the string representation with forward slashes, use the ``as_posix()``
method::

    >>> p.as_posix()

To get the bytes representation (which might be useful under Unix systems),
call ``bytes()`` on it, or use the ``as_bytes()`` method::

    >>> bytes(p)
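
A short sketch of the three representations, using the ``pathlib`` module as
later shipped in Python 3.4+.  That module provides ``str()``, ``as_posix()``
and ``bytes()``; it has no ``as_bytes()`` method:

```python
from pathlib import PurePosixPath, PureWindowsPath

w = PureWindowsPath("c:/windows")
native = str(w)           # uses the flavour's own separator
forward = w.as_posix()    # always forward slashes

# bytes() encodes with the filesystem encoding (os.fsencode).
raw = bytes(PurePosixPath("/home/antoine"))
```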


Five simple properties are provided on every path (each can be empty)::

    >>> p = PureNTPath('c:/pathlib/')
    >>> p.root
    >>> p.anchor
    >>> p.ext
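
To make the drive/root/anchor conventions concrete, here is a sketch using
the ``pathlib`` module as later shipped in Python 3.4+.  ``describe`` is a
helper invented for this example, and the shipped module names the absolute
test ``is_absolute()``:

```python
from pathlib import PurePosixPath, PureWindowsPath

def describe(path):
    """Return (drive, root, anchor, is_absolute) for a pure path."""
    return (path.drive, path.root, path.anchor, path.is_absolute())

posix_abs = describe(PurePosixPath("/etc/passwd"))
relative = describe(PurePosixPath("docs/Makefile"))
nt_abs = describe(PureWindowsPath("c:/foo"))
nt_drive_only = describe(PureWindowsPath("c:foo"))   # anchored, not absolute
nt_root_only = describe(PureWindowsPath("/foo"))     # anchored, not absolute
unc = describe(PureWindowsPath("//some/share/myfile.txt"))
```

The two "anchored but not absolute" Windows cases are the ones that have no
POSIX counterpart.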

Sequence-like access

The ``parts`` property provides read-only sequence access to a path object::

    >>> p = PurePosixPath('/etc/init.d')
    < ['/', 'etc', 'init.d']>

Simple indexing returns the individual path component as a string, while
slicing returns a new path object constructed from the selected components::


Windows paths handle the drive and the root as a single path component::

    >>> p = PureNTPath('c:/')
    < ['c:\\', '']>
    >>> p.root

(separating them would be wrong, since ``C:`` is not the parent of ``C:\\``).
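
For comparison, in the ``pathlib`` module as later shipped in Python 3.4+,
``parts`` is a plain tuple, so slicing yields a tuple rather than a new path.
A sketch, with rebuilding done explicitly:

```python
from pathlib import PurePosixPath, PureWindowsPath

posix_parts = PurePosixPath("/etc/init.d").parts
nt_parts = PureWindowsPath("c:/python33/bin").parts   # drive+root is one part

component = posix_parts[1]                      # a plain string
rebuilt = PurePosixPath(*posix_parts[:2])       # slice, then rebuild a path
```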

The ``parent()`` method returns an ancestor of the path::

    >>> p.parent()
    >>> p.parent(2)
    >>> p.parent(3)

The ``parents()`` method automates repeated invocations of ``parent()``, until
the anchor is reached::

    >>> p = PureNTPath('c:/python33/bin/python.exe')
    >>> for parent in p.parents(): parent
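
In the ``pathlib`` module as later shipped in Python 3.4+, these became the
``parent`` property and the ``parents`` sequence instead of methods.  A
sketch of the same walk up to the anchor:

```python
from pathlib import PureWindowsPath

p = PureWindowsPath("c:/python33/bin/python.exe")

direct = p.parent                 # one level up
grand = p.parents[1]              # two levels up
ancestors = [str(a) for a in p.parents]   # stops at the anchor
```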


``is_relative()`` returns True if the path is relative (see definition
above), False otherwise.

``is_reserved()`` returns True if a Windows path is a reserved path such
as ``CON`` or ``NUL``.  It always returns False for POSIX paths.

``match()`` matches the path against a glob pattern::

    >>> PureNTPath('c:/PATHLIB/').match('c:*lib/*.PY')

``relative()`` returns a new relative path by stripping the drive and root::

    >>> PurePosixPath('').relative()
    >>> PurePosixPath('/').relative()

``relative_to()`` computes the relative difference of a path to another::

    >>> PurePosixPath('/usr/bin/python').relative_to('/usr')

``normcase()`` returns a case-folded version of the path for NT paths::

    >>> PurePosixPath('CAPS').normcase()
    >>> PureNTPath('CAPS').normcase()
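
A sketch of the querying operations that survived into the ``pathlib`` module
as later shipped in Python 3.4+ (``match()`` and ``relative_to()``).  The
``setup.py`` file name is only an illustration:

```python
from pathlib import PurePosixPath, PureWindowsPath

# Glob matching follows the flavour's case rules.
match_nt = PureWindowsPath("c:/PATHLIB/setup.py").match("*.PY")
match_posix = PurePosixPath("/pathlib/setup.py").match("*.PY")

# relative_to() strips a leading prefix...
rel = PurePosixPath("/usr/bin/python").relative_to("/usr")

# ...and raises ValueError when the path is not under that prefix.
try:
    PurePosixPath("/usr/bin/python").relative_to("/etc")
    unrelated_raises = False
except ValueError:
    unrelated_raises = True
```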

Concrete paths API

In addition to the operations of the pure API, concrete paths provide
additional methods which actually access the filesystem to query or mutate


The classmethod ``cwd()`` creates a path object pointing to the current
working directory in absolute form::

    >>> Path.cwd()

File metadata

The ``stat()`` method caches and returns the file's stat() result;
``restat()`` forces refreshing of the cache. ``lstat()`` is also provided,
but doesn't have any caching behaviour::

    >>> p.stat()
    posix.stat_result(st_mode=33277, st_ino=7483155, st_dev=2053, st_nlink=1, st_uid=500, st_gid=500, st_size=928, st_atime=1343597970, st_mtime=1328287308, st_ctime=1343597964)

For ease of use, direct attribute access to the fields of the stat structure
is provided over the path object itself::

    >>> p.st_size
    >>> p.st_mtime

Higher-level methods help examine the kind of the file::

    >>> p.exists()
    >>> p.is_file()
    >>> p.is_dir()
    >>> p.is_symlink()

The file owner and group names (rather than numeric ids) are queried
through matching properties::

    >>> p = Path('/etc/shadow')
    >>> p.owner

Path resolution

The ``resolve()`` method makes a path absolute, resolving any symlink on
the way.  It is the only operation which will remove "``..``" path components.
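
The difference between resolving symlinks and collapsing ".." lexically can
be sketched as below, assuming a POSIX system where unprivileged symlink
creation works and the ``pathlib`` module as later shipped in Python 3.4+.
``demo_resolve`` and the directory names are invented for the example:

```python
import os
import tempfile
from pathlib import Path

def demo_resolve():
    """Build base/real/sub plus a symlink base/link -> base/real/sub."""
    base = Path(tempfile.mkdtemp()).resolve()
    target = base / "real" / "sub"
    target.mkdir(parents=True)
    link = base / "link"
    link.symlink_to(target, target_is_directory=True)

    # Lexical normalisation drops "link/.." without looking at the disk.
    lexical = Path(os.path.normpath(link / ".."))
    # resolve() follows the symlink first, then applies "..".
    resolved = (link / "..").resolve()
    return base, lexical, resolved

base, lexical, resolved = demo_resolve()
print(lexical == base, resolved == base / "real")
```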

Directory walking

Simple (non-recursive) directory access is done by iteration::

    >>> p = Path('docs')
    >>> for child in p: child

This allows simple filtering through list comprehensions::

    >>> p = Path('.')
    >>> [child for child in p if child.is_dir()]
    [PosixPath('.hg'), PosixPath('docs'), PosixPath('dist'), PosixPath('__pycache__'), PosixPath('build')]

Simple and recursive globbing is also provided::

    >>> for child in p.glob('**/*.py'): child
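
A self-contained sketch of directory walking, using the ``pathlib`` module as
later shipped in Python 3.4+.  There, direct iteration over a path was
replaced by the ``iterdir()`` method, while ``glob()`` kept its name.  The
directory layout is made up for the example:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "docs").mkdir()
    (root / "pkg" / "sub").mkdir(parents=True)
    (root / "setup.py").touch()
    (root / "pkg" / "sub" / "mod.py").touch()

    # Non-recursive listing, then filtering with a comprehension.
    children = sorted(c.name for c in root.iterdir())
    subdirs = sorted(c.name for c in root.iterdir() if c.is_dir())

    # Recursive globbing with "**".
    py_files = sorted(
        c.relative_to(root).as_posix() for c in root.glob("**/*.py")
    )
```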

File opening

The ``open()`` method provides a file opening API similar to the builtin
``open()`` function::

    >>> p = Path('')
    >>> with as f: f.readline()
    '#!/usr/bin/env python3\n'

The ``raw_open()`` method, on the other hand, is similar to ````::

    >>> fd = p.raw_open(os.O_RDONLY)
    >>>, 15)
    b'#!/usr/bin/env '
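
Both styles of opening can be sketched with the ``pathlib`` module as later
shipped in Python 3.4+.  That module has ``open()`` but no ``raw_open()``, so
the low-level half of this example passes the path to ``os.open()`` directly
(which accepts path objects from Python 3.6 on).  The file is a throwaway
created for the demonstration:

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / "script.py"
    p.write_text("#!/usr/bin/env python3\nprint('hi')\n")

    # High-level: a regular file object.
    with p.open() as f:
        first_line = f.readline()

    # Low-level: a raw file descriptor, closed explicitly.
    fd = os.open(p, os.O_RDONLY)
    try:
        head = os.read(fd, 15)
    finally:
        os.close(fd)
```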

Filesystem alteration

Several common filesystem operations are provided as methods: ``touch()``,
``mkdir()``, ``rename()``, ``replace()``, ``unlink()``, ``rmdir()``,
``chmod()``, ``lchmod()``, ``symlink_to()``.  More operations could be
provided, for example some of the functionality of the shutil module.
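
A brief sketch exercising a few of these methods inside a temporary
directory, using the ``pathlib`` module as later shipped in Python 3.4+ (the
file and directory names are arbitrary):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    build = Path(tmp) / "build"
    build.mkdir()

    stamp = build / "stamp"
    stamp.touch()                       # create an empty file

    renamed = build / "stamp.old"
    stamp.rename(renamed)
    after_rename = (stamp.exists(), renamed.exists())

    renamed.unlink()                    # remove the file
    build.rmdir()                       # directory must be empty by now
    after_cleanup = build.exists()
```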

Experimental openat() support

On compatible POSIX systems, the concrete PosixPath class can take advantage
of \*at() functions (`openat()`_ and friends), and manages the bookkeeping of
open file descriptors as necessary.  Support is enabled by passing the
*use_openat* argument to the constructor::

    >>> p = Path(".", use_openat=True)

Then all paths constructed by navigating this path (either by iteration or
indexing) will also use the openat() family of functions.  The point of using
these functions is to avoid race conditions whereby a given directory is
silently replaced with another (often a symbolic link to a sensitive system
location) between two accesses.
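
The underlying idea can be sketched without any path class, using the
``dir_fd`` parameter of ``os.open()`` (available on platforms listed in
``os.supports_dir_fd``).  This is not the draft's *use_openat* machinery;
``read_via_dirfd`` is a helper invented for illustration:

```python
import os

def read_via_dirfd(directory, name, size=64):
    """Read from `name`, looked up relative to an open directory descriptor.

    Because the lookup starts from the descriptor, replacing `directory`
    with something else after it was opened does not redirect the access.
    """
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        fd = os.open(name, os.O_RDONLY, dir_fd=dir_fd)
        try:
            return os.read(fd, size)
        finally:
            os.close(fd)
    finally:
        os.close(dir_fd)
```

Callers would typically check ``os.open in os.supports_dir_fd`` before
relying on this.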

.. _`openat()`:


This document has been placed into the public domain.

    Local Variables:
    mode: indented-text
    indent-tabs-mode: nil
    sentence-end-double-space: t
    fill-column: 70
    coding: utf-8

From josiah.carlson at  Fri Oct  5 20:51:21 2012
From: josiah.carlson at (Josiah Carlson)
Date: Fri, 5 Oct 2012 11:51:21 -0700
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Oct 3, 2012 at 7:43 AM, chrysn <chrysn at> wrote:
> On Wed, Sep 26, 2012 at 10:02:24AM -0700, Josiah Carlson wrote:
>> Go ahead and read PEP 3153, we will wait.
>> A careful reading of PEP 3153 will tell you that the intent is to make
>> a "light" version of Twisted built into Python. There isn't any
>> discussion as to *why* this is a good idea, it just lays out the plan
>> of action. Its ideas were gathered from the experience of the Twisted
>> folks.
>> Their experience is substantial, but in the intervening 1.5+ years
>> since Pycon 2011, only the barest of abstract interfaces has been
>> defined (,
>> and no discussion has taken place as to forward migration of the
>> (fairly large) body of existing asyncore code.
> it doesn't look like twisted-light to me, more like a interface
> suggestion for a small subset of twisted. in particular, it doesn't talk
> about main loops / reactors / registration-in-the-first-place.
> you mention interaction with the twisted people. is there willingness,
> from the twisted side, to use a standard python middle layer, once it
> exists and has sufficiently high quality?

>> To the point, Giampaolo already has a reactor that implements the
>> interface (more or less "idea #3" from his earlier message), and it's
>> been used in production (under staggering ftp(s) load). Even better,
>> it offers effectively transparent replacement of the existing asyncore
>> loop, and supports existing asyncore-derived classes. It is available:
> i've had a look at it, but honestly can't say more than that it's good
> to have a well-tested asyncore compatible main loop with scheduling
> support, and i'll try it out for my own projects.
>> >> Again, at this point in time what you're proposing looks too vague,
>> >> ambitious and premature to me.
>> >
>> > please don't get me wrong -- i'm not proposing anything for immediate
>> > action, i just want to start a thinking process towards a better
>> > integrated stdlib.
>> I am curious as to what you mean by "a better integrated stdlib". A
>> new interface that doesn't allow people to easily migrate from an
>> existing (and long-lived, though flawed) standard library is not
>> better integration. Better integration requires allowing previous
>> users to migrate, while encouraging new users to join in with any
>> later development. That's what Giampaolo's suggested interface offers
>> on the lowest level; something to handle file-handle reactors,
>> combined with a scheduler.
> a new interface won't make integration automatically happen, but it's
> something the standard library components can evolve on. whether, for
> example urllib2 will then automatically work asynchronously in that
> framework or whether we'll wait for urllib3, we'll see when we have it.

Things don't "automatically work" without work. You can't just make
urllib2 work asynchronously unless you do the sorts of greenlet-style
stack switching that lies to you about what is going on, or unless you
redesign it from scratch to do such. That's not to say that greenlets
are bad, they are great. But expecting that a standard library
implementing an updated async spec will all of a sudden hook itself
into a synchronous socket client? I think that expectation is

> @migrate from an existing standard library: is there a big user base for
> the current asyncore framework? my impression from is that it is not
> very well known among python users, and most that could use it use
> twisted.

"Well known" is an interesting claim. I believe it is actually known of
by quite a large part of the community, but due to a (perhaps
deserved) reputation (that may or may not still be the case), isn't
used as often as Twisted.

But along those lines, there are a few questions that should be asked:
1. Is it desirable to offer users the chance to transition from
asyncore-derived stuff to some new thing?
2. If so, what is necessary for an upgrade/replacement for
asyncore/asynchat in the long term?
3. Would 3rd parties use this as a basis for their libraries?
4. What are the short, mid, and long-term goals?

For my answers:
1. I think it is important to offer people who are using a standard
library module to continue using a standard library module if
2. A transition should offer either an adapter or similar-enough API
equivalency between the old and new.
3. I think that if it offers a reasonable API, good functionality, and
examples are provided - both as part of the stdlib and outside the
stdlib, people will see the advantages of maintaining less of their
own custom code. To the point: would Twisted use *whatever* was in the
stdlib? I don't know the answer, but unless the API is effectively
identical to Twisted, that transition may be delayed significantly.
4. Short: get current asyncore people transitioned to something
demonstrably better, that 3rd parties might also use. Mid: pull
parsers/logic out of cores of methods and make them available for
sync/async/3rd party parsing/protocol handling (get the best protocol
parsers into the stdlib, separated from the transport). Long: everyone
contributes/updates the stdlib modules because it has the best parsers
for protocols/formats, that can be used from *anywhere* (sync or
async).

My long-term dream (which has been the case for 6+ years, since I
proposed doing it myself on the python-dev mailing list and was told
"no") is that whether someone uses urllib2, httplib2, smtpd, requests,
ftplib, etc., they all have access to high-quality protocol-level
protocol parsers. So that once one person writes the bit that handles
http 30X redirects, everyone can use it. So that when one person
writes the gzip + chunked transfer encoding/decoding, everyone can use
it.
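
To illustrate what a transport-independent parser could look like, here is a
minimal sketch of an incremental decoder for HTTP chunked transfer encoding.
``ChunkedDecoder`` is a hypothetical name and not an existing stdlib class.
It ignores trailers, and as a simplification it only emits a chunk once the
whole chunk has arrived:

```python
class ChunkedDecoder:
    """Feed raw bytes in, get decoded payload bytes out. No sockets involved."""

    def __init__(self):
        self._buf = bytearray()
        self._remaining = None   # None means "waiting for a size line"
        self.done = False

    def feed(self, data):
        self._buf += data
        out = bytearray()
        while not self.done:
            if self._remaining is None:
                eol = self._buf.find(b"\r\n")
                if eol < 0:
                    break                          # size line incomplete
                size_field = bytes(self._buf[:eol]).split(b";", 1)[0]
                self._remaining = int(size_field.decode("ascii"), 16)
                del self._buf[:eol + 2]
                if self._remaining == 0:
                    self.done = True               # last-chunk marker
                    break
            if len(self._buf) < self._remaining + 2:
                break                              # chunk + CRLF incomplete
            out += self._buf[:self._remaining]
            del self._buf[:self._remaining + 2]
            self._remaining = None
        return bytes(out)

wire = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
decoder = ChunkedDecoder()
print(decoder.feed(wire))
```

Because the class only sees bytes, the same code could be driven by a
blocking socket loop, an asyncore handler, or a third-party reactor.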

>> > we've talked about many things we'd need in a python asynchronous
>> > interface (not implementation), so what are the things we *don't* need?
>> > (so we won't start building a framework like twisted). i'll start:
>> >
>> > * high-level protocol handling (can be extra modules atop of it)
>> > * ssl
>> > * something like the twisted delayed framework (not sure about that, i
>> >   guess the twisted people will have good reason to use it, but i don't
>> >   see compelling reasons for such a thing in a minimal interface from my
>> >   limited pov)
>> > * explicit connection handling (retries, timeouts -- would be up to the
>> >   user as well, eg urllib might want to set up a timeout and retries for
>> >   asynchronous url requests)
>> I disagree with the last 3. If you have an IO loop, more often than
>> not you want an opportunity to do something later in the same context.
>> This is commonly the case for bandwidth limiting, connection timeouts,
>> etc., which are otherwise *very* difficult to do at a higher level
>> (which are the reasons why schedulers are built into IO loops).
>> Further, SSL in async can be tricky to get right. Having the 20-line
>> SSL layer as an available class is a good idea, and will save people
>> time by not having them re-invent it (poorly or incorrectly) every
>> time.
> i see; those should be provided, then.
> i'm afraid i don't completely get the point you're making, sorry for
> that, maybe i've missed important statements or lack sufficiently deep
> knowledge of topics affected and got lost in details.
> what is your opinion on the state of asynchronous operations in python,
> and what would you like it to be?

I think it is functional, but flawed. I also think that every 3rd
party that does network-level protocols is a different mix of
functional and flawed. I think that there is a repeated and
often-times wasted effort where folks are writing different and
invariably crappy (to some extent) protocol parsers and network
handlers. I think that whenever possible, that should stop, and the
highest-quality protocol parsing functions/methods should be available
in the Python standard library, available to be called from any
library, whether sync, async, stdlib, or 3rd party.

Now, my discussions in the context of asyncore-related upgrades may
seem like a strange leap, but some of these lesser-quality parsing
routines exist in asyncore-derived classes, as well as
non-asyncore-derived classes. But if we make an effort on the asyncore
side of things, under the auspices of improving one stdlib module,
offering additional functionality, the need for
protocol-level parsers shared among sync/async should become obvious
to *everyone* (that it isn't now the case I suspect is because the
communities either don't spend a lot of time cross-pollinating, people
like writing parsers - I do too ;) - or the sync folks end up going
the greenlet route if/when threading bites them on the ass).

 - Josiah

From phd at  Fri Oct  5 21:16:25 2012
From: phd at (Oleg Broytman)
Date: Fri, 5 Oct 2012 23:16:25 +0400
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>


On Fri, Oct 05, 2012 at 08:25:34PM +0200, Antoine Pitrou <solipsis at> wrote:
> This PEP proposes the inclusion of a third-party module, `pathlib`_, in
> the standard library.

   +1 from me for sane path handling in the stdlib!

>     >>> p = Path('/home/antoine/pathlib/')
>     >>>
>     ''
>     >>> p.ext
>     '.py'
>     >>> p.root
>     '/'
>     >>>
>     < ['/', 'home', 'antoine', 'pathlib', '']>
>     >>> list(p.parents())
>     [PosixPath('/home/antoine/pathlib'), PosixPath('/home/antoine'), PosixPath('/home'), PosixPath('/')]

   Some attributes are properties and some are methods. Which is which?
Why is .root a property but .parents() a method? .owner/.group are
properties but .exists() is a method, and so on. .stat() just returns
self._stat, but said ._stat is a property!

>   A Windows UNC path (e.g.
>   ``\\some\\share\\myfile.txt``) always has a drive and a root
>   (here, ``\\some\\share`` and ``\\``, respectively).

   If I understand it correctly these should be either
\\\\some\\share\\myfile.txt and \\\\some\\share
or
\\some\share\myfile.txt and \\some\share

     Oleg Broytman              phd at
           Programmers don't die, they just GOSUB without RETURN.

From tjreedy at  Fri Oct  5 21:18:21 2012
From: tjreedy at (Terry Reedy)
Date: Fri, 05 Oct 2012 15:18:21 -0400
Subject: [Python-ideas] History stepping in interactive session?
In-Reply-To: <>
References: <>
Message-ID: <k4nbq7$pht$>

On 10/5/2012 8:43 AM, Oscar Benjamin wrote:
> On 5 October 2012 10:27, Andy Buckley <andy at> wrote:

>> I was spurred to ask this question by a painful development experience
>> full of Up Up Up Up Up Enter Up Up Up Up Up Enter ... keypresses to
>> repeat a previous set of Python commands/statements that weren't worth
>> putting in a script file, or which I wanted to make very minor changes
>> to on each iteration.

Using Windows for a couple of decades, I am not spoiled by bash ;-).

Idle lets me directly click on a previous statement and hit enter to 
make it the current statement. Edit if desired and hit enter again to 
execute again in the current workspace. But I agree with Oscar that even 
a few lines are worth a temporary script file.

> As soon as I find myself doing this I quit the interpreter and start
> ipython. The feature that ipython has that makes what you are doing
> much easier is the magic %edit command. Just type
> In [1]: edit
> and your favourite editor will open up allowing you to write/edit some
> code. When you close the editor, ipython will run the code from
> within the interactive session (as if you had typed it in directly).
> If you want to rerun that code with modifications just type 'edit
>' again and you can make the modifications within your editor.

In Idle, I click File - Recent files - .../ (in my misc. files 
directory) to open an edit window, which I leave open all day. Running 
from the edit window does restart the workspace, so one would have to 
cut and paste to not restart. I seldom want to re-run multiple lines 
without restarting.

If I want to keep the 'temporary' code, saving under a different name is 

Terry Jan Reedy

From p.f.moore at  Fri Oct  5 21:19:12 2012
From: p.f.moore at (Paul Moore)
Date: Fri, 5 Oct 2012 20:19:12 +0100
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On 5 October 2012 19:25, Antoine Pitrou <solipsis at> wrote:
> A path can be joined with another using the ``__getitem__`` operator::
>     >>> p = PurePosixPath('foo')
>     >>> p['bar']
>     PurePosixPath('foo/bar')
>     >>> p[PurePosixPath('bar')]
>     PurePosixPath('foo/bar')

There is a risk that this is too "cute". However, it's probably better
than overloading the '/' operator, and you do need something short.

> As with constructing, multiple path components can be specified at once::
>     >>> p['bar/xyzzy']
>     PurePosixPath('foo/bar/xyzzy')

That's risky. Are you proposing always using '/' regardless of OS? I'd
have expected os.sep (so \ on Windows). On the other hand, that would
mean two different things on Windows and Unix - 2 extra path levels on
Windows, only one on Unix (and a filename containing a backslash).

It would probably be better to allow tuples as arguments:

p['bar', 'baz']
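
A rough sketch of how tuple arguments could work.  ``IndexJoinPath`` is a
hypothetical wrapper written for this example (built on ``PurePosixPath``
from the pathlib module as later shipped); it is not part of the PEP or of
any library:

```python
from pathlib import PurePosixPath

class IndexJoinPath:
    """Toy path whose [] operator joins one or several components."""

    def __init__(self, *segments):
        self._path = PurePosixPath(*segments)

    def __getitem__(self, key):
        # p['a', 'b'] arrives as the tuple ('a', 'b'); p['a'] as a str.
        parts = key if isinstance(key, tuple) else (key,)
        return IndexJoinPath(self._path, *map(str, parts))

    def __str__(self):
        return str(self._path)

p = IndexJoinPath("foo")
print(p["bar", "baz"])
```

With this spelling each component is explicit, so no separator character
needs to be interpreted inside the strings.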

> Properties
> ----------
> Five simple properties are provided on every path (each can be empty)::
>     >>> p = PureNTPath('c:/pathlib/')
>     >>>
>     'c:'
>     >>> p.root
>     '\\'
>     >>> p.anchor
>     'c:\\'
>     >>>
>     ''
>     >>> p.ext
>     '.py'

I don't like the way the distinction between "root" and "anchor" works
here. Unix users are never going to use "anchor", as "root" is the
natural term, and it does exactly the right thing on Unix. So code
written on Unix will tend to do the wrong thing on Windows (where
generally you'd want to use "anchor" or you'll find yourself switching
accidentally to the current drive).

It's a rare situation where it would matter, which on the one hand
makes it much less worth worrying about, but on the other hand means
that when bugs *do* occur, they will be very obscure :-(

Also, there is no good terminology in current use here. The only
concrete thing I can suggest is that "root" would be better used as
the term for what you're calling "anchor" as Windows users would
expect the root of "C:\foo\bar\baz" to be "C:\". The term "drive"
would be right for "C:" (although some might expect that to mean "C:\"
as well, but there's no point wasting two terms on the one concept).
It might be more practical to use a new, but explicit, term like
"driveroot" for "\". It's the same as root on Unix, and on Windows
it's fairly obviously "the root on the current drive". And by using
the coined term for the less common option, it might act as a reminder
to people that something not entirely portable is going on.

But there's no really simple answer - Windows and Unix are just different here.

> The ``parts`` property provides read-only sequence access to a path object::
>     >>> p = PurePosixPath('/etc/init.d')
>     >>>
>     < ['/', 'etc', 'init.d']>

+1. There's lots of times I have wished os.path had this.

> Windows paths handle the drive and the root as a single path component::
>     >>> p = PureNTPath('c:/')
>     >>>
>     < ['c:\\', '']>
>     >>> p.root
>     '\\'
>     >>>[0]
>     'c:\\'
> (separating them would be wrong, since ``C:`` is not the parent of ``C:\\``).

This again suggests to me that "C:\" is more closely allied to the
term "root" here.

Also, I assume that paths will be comparable, using case sensitivity
appropriate to the platform. Presumably a PurePath and a Path are
comparable, too. What about a PosixPath and an NTPath? Would you
expect them to be comparable or not?

But in general, this looks like a pretty good proposal. Having a
decent path abstraction in the stdlib would be great.


From mikegraham at  Fri Oct  5 21:23:57 2012
From: mikegraham at (Mike Graham)
Date: Fri, 5 Oct 2012 15:23:57 -0400
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 5, 2012 at 2:25 PM, Antoine Pitrou <solipsis at> wrote:
> Hello,
> This PEP is a resurrection of the idea of having object-oriented
> filesystem paths in the stdlib. It comes with a general API proposal
> as well as a specific implementation (*). The implementation is young
> and discussion is quite open.
> (*)
> Regards
> Antoine.

The os.path approach probably isn't the best, but it does work pretty
well in practice. I'm not sure I see the benefit of introducing
something new.


From ubershmekel at  Fri Oct  5 21:36:56 2012
From: ubershmekel at (Yuval Greenfield)
Date: Fri, 5 Oct 2012 21:36:56 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 5, 2012 at 9:16 PM, Oleg Broytman <phd at> wrote:

>    Some attributes are properties and some are methods. Which is which?
> Why .root is a property but .parents() is a method? .owner/.group are
> properties but .exists() is a method, and so on. .stat() just returns
> self._stat, but said ._stat is a property!
Unobvious indeed. Maybe operations that cause OS api calls should have

Also, I agree with Paul Moore that the naming at its current state may
cause cross-platform bugs.

Though I don't understand why not to overload the "/" or "+" operators.
Sounds more elegant than square brackets. Just make sure the op fails on
anything other than Path objects.

I'm +1 on adding such a useful abstraction to python if and only if it were
>= os.path on every front,

Yuval Greenfield

From solipsis at  Fri Oct  5 21:41:01 2012
From: solipsis at (Antoine Pitrou)
Date: Fri, 5 Oct 2012 21:41:01 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
References: <>
Message-ID: <>

On Fri, 5 Oct 2012 23:16:25 +0400
Oleg Broytman <phd at> wrote:
> Hi!
> On Fri, Oct 05, 2012 at 08:25:34PM +0200, Antoine Pitrou <solipsis at> wrote:
> > This PEP proposes the inclusion of a third-party module, `pathlib`_, in
> > the standard library.
>    +1 from me for a sane path handling in the stdlib!
> >     >>> p = Path('/home/antoine/pathlib/')
> >     >>>
> >     ''
> >     >>> p.ext
> >     '.py'
> >     >>> p.root
> >     '/'
> >     >>>
> >     < ['/', 'home', 'antoine', 'pathlib', '']>
> >     >>> list(p.parents())
> >     [PosixPath('/home/antoine/pathlib'), PosixPath('/home/antoine'), PosixPath('/home'), PosixPath('/')]
>    Some attributes are properties and some are methods. Which is which?
> Why .root is a property but .parents() is a method? .owner/.group are
> properties but .exists() is a method, and so on. .stat() just returns
> self._stat, but said ._stat is a property!

parents() returns a generator (hence the list() call in the
example above). A generator-returning property sounds a bit too
confusing IMHO.

._stat is an implementation detail.  stat() and exists() both
mirror similar APIs in the os / os.path modules.

.name, .ext, .root, .parts just return static, immutable properties of
the path, I see no reason for them to be methods.

> >   A Windows UNC path (e.g.
> >   ``\\some\\share\\myfile.txt``) always has a drive and a root
> >   (here, ``\\some\\share`` and ``\\``, respectively).
>    If I understand it correctly these should be either
> \\\\some\\share\\myfile.txt and \\\\some\\share
>    or
> \\some\share\myfile.txt and \\some\share
>    no?

Ah, right. I'll correct it.



Software development and contracting:

From ethan at  Fri Oct  5 21:44:07 2012
From: ethan at (Ethan Furman)
Date: Fri, 05 Oct 2012 12:44:07 -0700
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

Paul Moore wrote:
> On 5 October 2012 19:25, Antoine Pitrou <solipsis at> wrote:
>> A path can be joined with another using the ``__getitem__`` operator::
>>     >>> p = PurePosixPath('foo')
>>     >>> p['bar']
>>     PurePosixPath('foo/bar')
>>     >>> p[PurePosixPath('bar')]
>>     PurePosixPath('foo/bar')
> There is a risk that this is too "cute". However, it's probably better
> than overloading the '/' operator, and you do need something short.

I actually like using the '/' operator for this.  My own path module 
uses it, and the resulting code is along the lines of:

    job = Path('c:/orders/38273')
    table = dbf.Table(job/'ABC12345')

>> As with constructing, multiple path components can be specified at once::
>>     >>> p['bar/xyzzy']
>>     PurePosixPath('foo/bar/xyzzy')
> That's risky. Are you proposing always using '/' regardless of OS?

Mine does; it also accepts `\\` on Windows machines.  Personally, I 
don't care for the index notation Antoine is suggesting.


From solipsis at  Fri Oct  5 21:55:20 2012
From: solipsis at (Antoine Pitrou)
Date: Fri, 5 Oct 2012 21:55:20 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
References: <>
Message-ID: <>

On Fri, 5 Oct 2012 20:19:12 +0100
Paul Moore <p.f.moore at> wrote:
> On 5 October 2012 19:25, Antoine Pitrou <solipsis at> wrote:
> > A path can be joined with another using the ``__getitem__`` operator::
> >
> >     >>> p = PurePosixPath('foo')
> >     >>> p['bar']
> >     PurePosixPath('foo/bar')
> >     >>> p[PurePosixPath('bar')]
> >     PurePosixPath('foo/bar')
> There is a risk that this is too "cute". However, it's probably better
> than overloading the '/' operator, and you do need something short.

I think overloading '/' is ugly (dividing paths??).

Someone else proposed overloading '+', which would be confusing since we
need to be able to combine paths and regular strings, for ease of use.
The point of using __getitem__ is that you get an error if you replace
the Path object with a regular string by mistake:

>>> PurePath('foo')['bar']
PurePosixPath('foo/bar')
>>> 'foo'['bar']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: string indices must be integers

If you were to use the '+' operator instead, 'foo' + 'bar' would work
but give you the wrong result.
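
A minimal sketch of that argument, assuming a hypothetical SketchPath
class (a stand-in that joins POSIX-style with posixpath.join, not the
pathlib implementation): indexing a plain string by mistake fails
loudly, while '+' on two strings silently concatenates.

```python
import posixpath


class SketchPath:
    """Hypothetical stand-in for a path object; not the real pathlib class."""

    def __init__(self, *parts):
        # Join all components POSIX-style; str() lets paths and strings mix.
        self._path = posixpath.join(*[str(part) for part in parts])

    def __str__(self):
        return self._path

    def __repr__(self):
        return "SketchPath(%r)" % self._path

    def __getitem__(self, key):
        # p['bar'] and p['bar', 'baz'] both join; a tuple carries several components.
        components = key if isinstance(key, tuple) else (key,)
        return SketchPath(self._path, *components)


def join_by_index(base, name):
    """Join with the index notation; raises TypeError if base is a plain str."""
    return base[name]


joined = join_by_index(SketchPath('foo'), 'bar')   # path object: join works
multi = SketchPath('foo')['bar', 'baz']            # tuple form joins two components

try:
    join_by_index('foo', 'bar')                    # plain string passed by mistake
    mistake_detected = False
except TypeError:
    mistake_detected = True

silent_wrong_result = 'foo' + 'bar'                # no error, and no separator either
```

The trade-off is visible in the last two statements: the index notation
turns the mistake into an exception, whereas '+' returns a string that
merely looks plausible.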

> > As with constructing, multiple path components can be specified at once::
> >
> >     >>> p['bar/xyzzy']
> >     PurePosixPath('foo/bar/xyzzy')
> That's risky. Are you proposing always using '/' regardless of OS? I'd
> have expected os.sep (so \ on Windows).

Both '/' and '\\' are accepted as path separators under Windows. Under
Unix, '\\' is a regular character:

>>> PurePosixPath('foo\\bar') == PurePosixPath('foo/bar')
False
>>> PureNTPath('foo\\bar') == PureNTPath('foo/bar')
True

> It would probably be better to allow tuples as arguments:
> p['bar', 'baz']

It already works indeed:

>>> p = PurePath('foo')
>>> p['bar', 'baz']

> > Five simple properties are provided on every path (each can be empty)::
> >
> >     >>> p = PureNTPath('c:/pathlib/')
> >     >>>
> >     'c:'
> >     >>> p.root
> >     '\\'
> >     >>> p.anchor
> >     'c:\\'
> >     >>>
> >     ''
> >     >>> p.ext
> >     '.py'
> I don't like the way the distinction between "root" and "anchor" works
> here. Unix users are never going to use "anchor", as "root" is the
> natural term, and it does exactly the right thing on Unix. So code
> written on Unix will tend to do the wrong thing on Windows (where
> generally you'd want to use "anchor" or you'll find yourself switching
> accidentally to the current drive).

Well, I expect .root or .anchor to be used mostly for presentation or
debugging purposes. There's nothing really useful to be done with them
otherwise, IMHO. Do you know of any use cases?

> Also, there is no good terminology in current use here. The only
> concrete thing I can suggest is that "root" would be better used as
> the term for what you're calling "anchor" as Windows users would
> expect the root of "C:\foo\bar\baz" to be "C:\".

But then the root of "C:foo" would be "C:", which sounds wrong:
"C:" isn't a root at all.

> But there's no really simple answer - Windows and Unix are just different here.

Yes, and Unix users are expecting something simpler than what's going on
under Windows ;)
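
For reference, a small sketch of the drive/root/anchor distinction using
the pathlib module as it later shipped in the standard library (Python
3.4+). Note the assumptions: the Windows flavour there is called
PureWindowsPath rather than the draft's PureNTPath, the file name
example.py is purely illustrative, and the mapping back to the draft API
discussed here is not guaranteed to be exact.

```python
from pathlib import PurePosixPath, PureWindowsPath


def describe(path):
    """Collect the three properties under discussion for one path."""
    return {'drive': path.drive, 'root': path.root, 'anchor': path.anchor}


# Absolute Windows path: drive and root are both present.
absolute_nt = describe(PureWindowsPath('c:/pathlib/example.py'))

# Drive-relative Windows path: there is a drive but no root.
drive_relative_nt = describe(PureWindowsPath('c:foo'))

# POSIX path: no drive, so root and anchor coincide.
absolute_posix = describe(PurePosixPath('/etc/init.d'))
```

On POSIX the root and the anchor are always the same string, which is
why code written on Unix has no reason to reach for the anchor.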

> Also, I assume that paths will be comparable, using case sensitivity
> appropriate to the platform. Presumably a PurePath and a Path are
> comparable, too. What about a PosixPath and an NTPath? Would you
> expect them to be comparable or not?

Currently, different flavours imply unequal (and unorderable) paths:

>>> PurePosixPath('foo') == PureNTPath('foo')
>>> PurePosixPath('foo') > PureNTPath('foo')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: PurePosixPath() > PureNTPath()

However, pure paths and concrete paths of the same flavour can be
equal, and ordered:

>>> PurePath('foo') == Path('foo')
>>> PurePath('foo') >= Path('foo')
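
The same rules can be checked against the pathlib module that later
shipped in the standard library. This is only a sketch under that
assumption: the draft's PureNTPath is spelled PureWindowsPath there, and
the behaviour of the 2012 draft may have differed in detail.

```python
from pathlib import Path, PurePath, PurePosixPath, PureWindowsPath

# Different flavours never compare equal...
cross_flavour_equal = PurePosixPath('foo') == PureWindowsPath('foo')

# ...and cannot be ordered against each other.
try:
    PurePosixPath('foo') > PureWindowsPath('foo')
    cross_flavour_orderable = True
except TypeError:
    cross_flavour_orderable = False

# Pure and concrete paths of the same flavour are equal and orderable.
same_flavour_equal = PurePath('foo') == Path('foo')
same_flavour_ordered = PurePath('foo') >= Path('foo')

# Case sensitivity follows the flavour, not the host platform.
nt_ignores_case = PureWindowsPath('FOO') == PureWindowsPath('foo')
posix_ignores_case = PurePosixPath('FOO') == PurePosixPath('foo')
```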




From amcnabb at  Fri Oct  5 21:53:27 2012
From: amcnabb at (Andrew McNabb)
Date: Fri, 5 Oct 2012 13:53:27 -0600
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 05, 2012 at 09:36:56PM +0200, Yuval Greenfield wrote:
> Though I don't understand why not to overload the "/" or "+" operators.
> Sounds more elegant than square brackets. Just make sure the op fails on
> anything other than Path objects.

Path concatenation is obviously not a form of division, so it makes
little sense to use the division operator for this purpose.  I always
wonder why the designers of C++ felt that it made sense to perform
output by left-bitshifting the output stream by a string:
    std::cout << "hello, world";

Fortunately, operator overloading in Python is generally limited to
cases where the operator's meaning is preserved (with the unfortunate
exception of the % operator for strings).

Andrew McNabb
PGP Fingerprint: 8A17 B57C 6879 1863 DE55  8012 AB4D 6098 8826 6868

From ethan at  Fri Oct  5 22:06:57 2012
From: ethan at (Ethan Furman)
Date: Fri, 05 Oct 2012 13:06:57 -0700
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>	<>
Message-ID: <>

Antoine Pitrou wrote:
> On Fri, 5 Oct 2012 20:19:12 +0100
> Paul Moore <p.f.moore at> wrote:
>> On 5 October 2012 19:25, Antoine Pitrou <solipsis at> wrote:
>>> A path can be joined with another using the ``__getitem__`` operator::
>>>     >>> p = PurePosixPath('foo')
>>>     >>> p['bar']
>>>     PurePosixPath('foo/bar')
>>>     >>> p[PurePosixPath('bar')]
>>>     PurePosixPath('foo/bar')
>> There is a risk that this is too "cute". However, it's probably better
>> than overloading the '/' operator, and you do need something short.
> I think overloading '/' is ugly (dividing paths??).

But '/' is the normal path separator, so it's not dividing; and it 
certainly makes more sense than `%` with string interpolations.  ;)

> Someone else proposed overloading '+', which would be confusing since we
> need to be able to combine paths and regular strings, for ease of use.
> The point of using __getitem__ is that you get an error if you replace
> the Path object with a regular string by mistake:
>>>> PurePath('foo')['bar']
> PurePosixPath('foo/bar')
>>>> 'foo'['bar']
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: string indices must be integers
> If you were to use the '+' operator instead, 'foo' + 'bar' would work
> but give you the wrong result.

I would rather use the `/` and `+` and risk the occasional wrong result. 
(And yes, I have spent time tracking bugs because of that wrong result 
when using my own Path module -- and I'd still rather make that trade-off.)


From solipsis at  Fri Oct  5 22:09:54 2012
From: solipsis at (Antoine Pitrou)
Date: Fri, 5 Oct 2012 22:09:54 +0200
Subject: [Python-ideas] asyncore: included batteries don't fit
References: <>
Message-ID: <>

On Fri, 5 Oct 2012 11:51:21 -0700
Josiah Carlson <josiah.carlson at>
> My long-term dream (which has been the case for 6+ years, since I
> proposed doing it myself on the python-dev mailing list and was told
> "no") is that whether someone uses urllib2, httplib2, smtpd, requests,
> ftplib, etc., they all have access to high-quality protocol-level
> protocol parsers.

I'm not sure what you're talking about: what were you told "no" about,
specifically? Your proposal sounds reasonable and (ideally) desirable to




From andrew.svetlov at  Fri Oct  5 22:59:24 2012
From: andrew.svetlov at (Andrew Svetlov)
Date: Fri, 5 Oct 2012 23:59:24 +0300
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

+1 in general. I'd like to have a library like that in the batteries.

I would like to see a note in the PEP on why [] is used instead of / or +,
though I agree with that choice.
+0 for /
-1 for +
For the method/property decision I'd suggest a (maybe stupid) rule:
properties for simple accessors and methods for operations which
require OS calls, with an exception for parents(), a method which
returns a generator.

On Fri, Oct 5, 2012 at 11:06 PM, Ethan Furman <ethan at> wrote:
> Antoine Pitrou wrote:
>> On Fri, 5 Oct 2012 20:19:12 +0100
>> Paul Moore <p.f.moore at> wrote:
>>> On 5 October 2012 19:25, Antoine Pitrou <solipsis at> wrote:
>>>> A path can be joined with another using the ``__getitem__`` operator::
>>>>     >>> p = PurePosixPath('foo')
>>>>     >>> p['bar']
>>>>     PurePosixPath('foo/bar')
>>>>     >>> p[PurePosixPath('bar')]
>>>>     PurePosixPath('foo/bar')
>>> There is a risk that this is too "cute". However, it's probably better
>>> than overloading the '/' operator, and you do need something short.
>> I think overloading '/' is ugly (dividing paths??).
> But '/' is the normal path separator, so it's not dividing; and it certainly
> makes more sense than `%` with string interpolations.  ;)
>> Someone else proposed overloading '+', which would be confusing since we
>> need to be able to combine paths and regular strings, for ease of use.
>> The point of using __getitem__ is that you get an error if you replace
>> the Path object with a regular string by mistake:
>>>>> PurePath('foo')['bar']
>> PurePosixPath('foo/bar')
>>>>> 'foo'['bar']
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> TypeError: string indices must be integers
>> If you were to use the '+' operator instead, 'foo' + 'bar' would work
>> but give you the wrong result.
> I would rather use the `/` and `+` and risk the occasional wrong result.
> (And yes, I have spent time tracking bugs because of that wrong result when
> using my own Path module -- and I'd still rather make that trade-off.)
> ~Ethan~

Andrew Svetlov

From ethan at  Fri Oct  5 23:38:57 2012
From: ethan at (Ethan Furman)
Date: Fri, 05 Oct 2012 14:38:57 -0700
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

Antoine Pitrou wrote:
> Extraneous path separators and ``"."`` components are eliminated::
>     >>> PurePath('a///b/c/./d/')
>     PurePosixPath('a/b/c/d')

I'm all for eliminating extra '.'s, but shouldn't extra '/'s be an error?

> The ``parent()`` method returns an ancestor of the path::
>     >>> p.parent()
>     PureNTPath('c:\\python33\\bin')
>     >>> p.parent(2)
>     PureNTPath('c:\\python33')
>     >>> p.parent(3)
>     PureNTPath('c:\\')
> The ``parents()`` method automates repeated invocations of ``parent()``, until
> the anchor is reached::
>     >>> p = PureNTPath('c:/python33/bin/python.exe')
>     >>> for parent in p.parents(): parent
>     ...
>     PureNTPath('c:\\python33\\bin')
>     PureNTPath('c:\\python33')
>     PureNTPath('c:\\')

What's the use-case for iterating through all the parent directories?

Say I have a .dbf table as PureNTPath('c:\orders\12345\abc67890.dbf'), 
and I export it to .csv in the same folder; how would I transform the 
above PureNTPath's ext from 'dbf' to 'csv'?


From phd at  Sat Oct  6 00:05:14 2012
From: phd at (Oleg Broytman)
Date: Sat, 6 Oct 2012 02:05:14 +0400
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 05, 2012 at 02:38:57PM -0700, Ethan Furman <ethan at> wrote:
> Antoine Pitrou wrote:
> >Extraneous path separators and ``"."`` components are eliminated::
> >
> >    >>> PurePath('a///b/c/./d/')
> >    PurePosixPath('a/b/c/d')
> I'm all for eliminating extra '.'s, but shouldn't extra '/'s be an error?

   Why? They aren't errors in the underlying OS.

> >    >>> p = PureNTPath('c:/python33/bin/python.exe')
> >    >>> for parent in p.parents(): parent
> >    ...
> >    PureNTPath('c:\\python33\\bin')
> >    PureNTPath('c:\\python33')
> >    PureNTPath('c:\\')
> What's the use-case for iterating through all the parent directories?

for parent in p.parents():
    if parent['.svn'].exists():
        last_seen = parent
        continue
    else:
        print("The topmost directory of the project: %s" % last_seen)
        break

     Oleg Broytman              phd at
           Programmers don't die, they just GOSUB without RETURN.

From ethan at  Sat Oct  6 00:21:06 2012
From: ethan at (Ethan Furman)
Date: Fri, 05 Oct 2012 15:21:06 -0700
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>	<>
Message-ID: <>

Oleg Broytman wrote:
> On Fri, Oct 05, 2012 at 02:38:57PM -0700, Ethan Furman <ethan at> wrote:
>> Antoine Pitrou wrote:
>>> Extraneous path separators and ``"."`` components are eliminated::
>>>    >>> PurePath('a///b/c/./d/')
>>>    PurePosixPath('a/b/c/d')
>> I'm all for eliminating extra '.'s, but shouldn't extra '/'s be an error?
>    Why? They aren't errors in the underlying OS.

They are on Windows (no comment on whether or not it qualifies as an OS ;).

c:\temp>dir \\\\\temp
The filename, directory name, or volume label syntax is incorrect.

c:\temp>dir \\temp
The filename, directory name, or volume label syntax is incorrect.

Although I see it works fine in between path pieces:

c:\temp\34400>dir \temp\\\34400
[snip listing]

>> What's the use-case for iterating through all the parent directories?
> for parent in p.parents():
>     if parent['.svn'].exists():
>         last_seen = parent
>         continue
>     else:
>         print("The topmost directory of the project: %s" % last_seen)
>         break

Cool, thanks.
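
A runnable variant of that use case, sketched against the pathlib module
as it later shipped in the standard library (an assumption: there,
parents is a property rather than a method, and joining is spelled with
'/'). The function name topmost_marked_dir and the directory layout are
hypothetical and only serve the demonstration.

```python
import tempfile
from pathlib import Path


def topmost_marked_dir(start, marker='.svn'):
    """Walk upwards from start and return the highest ancestor in the
    unbroken run of ancestors that contain marker (None if there is none)."""
    last_seen = None
    for parent in Path(start).parents:
        if (parent / marker).exists():
            last_seen = parent
        else:
            break
    return last_seen


# Build a throwaway tree: <tmp>/project/.svn and <tmp>/project/src/.svn
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / 'project'
    nested = project / 'src'
    (nested / '.svn').mkdir(parents=True)
    (project / '.svn').mkdir()
    source_file = nested / 'module.py'
    source_file.touch()

    found = topmost_marked_dir(source_file)
    found_project_root = (found == project)
```

Unlike the loop quoted above, this version returns None instead of
failing when the first parent has no marker directory.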


From steve at  Sat Oct  6 00:41:05 2012
From: steve at (Steven D'Aprano)
Date: Sat, 06 Oct 2012 08:41:05 +1000
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On 06/10/12 05:53, Andrew McNabb wrote:

> Path concatenation is obviously not a form of division, so it makes
> little sense to use the division operator for this purpose.

But / is not just a division operator. It is also used for:

* alternatives: "tea and/or coffee, breakfast/lunch/dinner"
* italic markup: "some apps use /slashes/ for italics"
* instead of line breaks when quoting poetry
* abbreviations such as n/a b/w c/o and even w/ (not applicable,
   between, care of, with)
* date separator

Since / is often (but not always) used as a path separator, using it as
a path component join operator makes good sense.

BTW, are there any supported platforms where the path separator or
alternate path are not slash? There used to be Apple Mac OS using


From grosser.meister.morti at  Sat Oct  6 00:47:28 2012
From: grosser.meister.morti at (=?ISO-8859-1?Q?Mathias_Panzenb=F6ck?=)
Date: Sat, 06 Oct 2012 00:47:28 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>	<>
Message-ID: <>

On 10/06/2012 12:21 AM, Ethan Furman wrote:
> Oleg Broytman wrote:
>> On Fri, Oct 05, 2012 at 02:38:57PM -0700, Ethan Furman <ethan at> wrote:
>>> Antoine Pitrou wrote:
>>>> Extraneous path separators and ``"."`` components are eliminated::
>>>>    >>> PurePath('a///b/c/./d/')
>>>>    PurePosixPath('a/b/c/d')
>>> I'm all for eliminating extra '.'s, but shouldn't extra '/'s be an error?
>>    Why? They aren't errors in the underlying OS.
> They are on Windows (no comment on whether or not it qualifies as an OS ;).
> c:\temp>dir \\\\\temp
> The filename, directory name, or volume label syntax is incorrect.
> c:\temp>dir \\temp
> The filename, directory name, or volume label syntax is incorrect.
> Although I see it works fine in between path pieces:
> c:\temp\34400>dir \temp\\\34400
> [snip listing]

\\ at the start of a path has a special meaning under windows:

>>> What's the use-case for iterating through all the parent directories?
>> for parent in p.parents():
>>     if parent['.svn'].exists():
>>         last_seen = parent
>>         continue
>>     else:
>>         print("The topmost directory of the project: %s" % last_seen)
>>         break
> Cool, thanks.
> ~Ethan~

From solipsis at  Sat Oct  6 01:16:09 2012
From: solipsis at (Antoine Pitrou)
Date: Sat, 6 Oct 2012 01:16:09 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
References: <>
	<> <>
Message-ID: <>

On Sat, 06 Oct 2012 00:47:28 +0200
Mathias Panzenböck <grosser.meister.morti at>
> \\ at the start of a path has a special meaning under windows:

And indeed the API preserves them:

    >>> PurePosixPath('//some/path')
    >>> PureNTPath('//some/path')
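
As a point of comparison, here is how the pathlib module that later
shipped in the standard library treats extraneous and leading
separators. This is a sketch under the assumption that it matches the
draft on these points; PureWindowsPath stands in for PureNTPath, and the
UNC example path is illustrative.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Extraneous separators and '.' components inside the path are dropped.
collapsed = str(PurePosixPath('a///b/c/./d/'))

# Exactly two leading slashes are preserved on POSIX...
double_slash = str(PurePosixPath('//some/path'))

# ...but three or more collapse to a single slash.
triple_slash = str(PurePosixPath('///some/path'))

# On Windows a leading double separator starts a UNC path, whose drive
# is the host-and-share prefix and whose root is a single backslash.
unc = PureWindowsPath('//some/share/myfile.txt')
unc_drive = unc.drive
unc_root = unc.root
```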




From solipsis at  Sat Oct  6 01:48:23 2012
From: solipsis at (Antoine Pitrou)
Date: Sat, 6 Oct 2012 01:48:23 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
References: <>
Message-ID: <>

On Fri, 05 Oct 2012 14:38:57 -0700
Ethan Furman <ethan at> wrote:
> Say I have a .dbf table as PureNTPath('c:\orders\12345\abc67890.dbf'), 
> and I export it to .csv in the same folder; how would I transform the 
> above PureNTPath's ext from 'dbf' to 'csv'?

Something like:

>>> p = PureNTPath('c:/orders/12345/abc67890.dbf')
>>> p.parent()[p.name.split('.')[0] + '.csv']
PureNTPath('c:\\orders\\12345\\abc67890.csv')

Any suggestion to ease this use case a bit?
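
For comparison, a sketch of the same transformation with the pathlib
module that later shipped in the standard library. The assumptions are
that PureWindowsPath stands in for PureNTPath, that parent is a property
there, and that '/' is the join operator; stem, with_name() and
with_suffix() are names from that later API, not from the draft under
discussion.

```python
from pathlib import PureWindowsPath

p = PureWindowsPath('c:/orders/12345/abc67890.dbf')

# Manual approach: split the final component on '.' and rebuild it.
manual = p.parent / (p.name.split('.')[0] + '.csv')

# stem is the final component without its last extension.
via_stem = p.with_name(p.stem + '.csv')

# with_suffix() swaps the extension directly.
via_suffix = p.with_suffix('.csv')
```

All three expressions produce the same path here; the manual split
differs from the other two only for names that contain more than one
dot.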




From amcnabb at  Sat Oct  6 01:54:57 2012
From: amcnabb at (Andrew McNabb)
Date: Fri, 5 Oct 2012 18:54:57 -0500
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 06, 2012 at 08:41:05AM +1000, Steven D'Aprano wrote:
> On 06/10/12 05:53, Andrew McNabb wrote:
> >Path concatenation is obviously not a form of division, so it makes
> >little sense to use the division operator for this purpose.
> But / is not just a division operator. It is also used for:
> * alternatives: "tea and/or coffee, breakfast/lunch/dinner"
> * italic markup: "some apps use /slashes/ for italics"
> * instead of line breaks when quoting poetry
> * abbreviations such as n/a b/w c/o and even w/ (not applicable,
>   between, care of, with)
> * date separator

This is the difference between C++ style operators, where the only thing
that matters is what the operator symbol looks like, and Python style
operators, where an operator symbol is just syntactic sugar.  In Python,
the "/" is synonymous with `operator.div` and is defined in terms of the
`__div__` special method.  This distinction is why I hate operator
overloading in C++ but like it in Python.
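
A small sketch of the dispatch being described, with one caveat: the
names operator.div and __div__ belong to Python 2, while in Python 3 the
"/" operator maps to operator.truediv and the __truediv__ special
method. JoinablePath is a hypothetical class written only for this
illustration.

```python
import operator
import posixpath


class JoinablePath:
    """Hypothetical path wrapper that joins components with '/'."""

    def __init__(self, path):
        self.path = path

    def __truediv__(self, other):
        # Called for `self / other` in Python 3.
        return JoinablePath(posixpath.join(self.path, str(other)))

    def __str__(self):
        return self.path


job = JoinablePath('c:/orders/38273')

# The operator form and the explicit function call reach the same method.
table_path = job / 'ABC12345'
same_path = operator.truediv(job, 'ABC12345')
```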

Andrew McNabb
PGP Fingerprint: 8A17 B57C 6879 1863 DE55  8012 AB4D 6098 8826 6868

From shibturn at  Sat Oct  6 02:27:49 2012
From: shibturn at (Richard Oudkerk)
Date: Sat, 06 Oct 2012 01:27:49 +0100
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <k4ntu7$9c1$>

On 06/10/2012 12:48am, Antoine Pitrou wrote:
>>>> p = PureNTPath('c:/orders/12345/abc67890.dbf')
>>>> p.parent()[p.name.split('.')[0] + '.csv']
> PureNTPath('c:\\orders\\12345\\abc67890.csv')
> Any suggestion to ease this use case a bit?

Maybe p.basename could be shorthand for p.name.split('.')[0].


From greg.ewing at  Sat Oct  6 02:37:26 2012
From: greg.ewing at (Greg Ewing)
Date: Sat, 06 Oct 2012 13:37:26 +1300
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

Antoine Pitrou wrote:

> Well, I expect .root or .anchor to be used mostly for presentation or
> debugging purposes.

I'm having trouble thinking of *any* use cases, even for
presentation or debugging.

Maybe they should be dropped altogether until someone comes
up with a use case.


From solipsis at  Sat Oct  6 02:38:14 2012
From: solipsis at (Antoine Pitrou)
Date: Sat, 6 Oct 2012 02:38:14 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
References: <>
	<> <k4ntu7$9c1$>
Message-ID: <>

On Sat, 06 Oct 2012 01:27:49 +0100
Richard Oudkerk <shibturn at>

> On 06/10/2012 12:48am, Antoine Pitrou wrote:
> >>>> p = PureNTPath('c:/orders/12345/abc67890.dbf')
> >>>> p.parent()[p.name.split('.')[0] + '.csv']
> > PureNTPath('c:\\orders\\12345\\abc67890.csv')
> >
> > Any suggestion to ease this use case a bit?
> Maybe p.basename could be shorthand for p.name.split('.')[0].

Wouldn't there be some confusion with os.path.basename:

> Richard


From solipsis at  Sat Oct  6 02:39:23 2012
From: solipsis at (Antoine Pitrou)
Date: Sat, 6 Oct 2012 02:39:23 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
References: <>
	<> <k4ntu7$9c1$>
Message-ID: <>

On Sat, 06 Oct 2012 01:27:49 +0100
Richard Oudkerk <shibturn at>

> On 06/10/2012 12:48am, Antoine Pitrou wrote:
> >>>> p = PureNTPath('c:/orders/12345/abc67890.dbf')
> >>>> p.parent()[p.name.split('.')[0] + '.csv']
> > PureNTPath('c:\\orders\\12345\\abc67890.csv')
> >
> > Any suggestion to ease this use case a bit?
> Maybe p.basename could be shorthand for p.name.split('.')[0].

Wouldn't there be some confusion with os.path.basename:

>>> os.path.basename('a/b/c.ext')
'c.ext'

(sorry for the earlier, unfinished reply)




From greg.ewing at  Sat Oct  6 02:54:21 2012
From: greg.ewing at (Greg Ewing)
Date: Sat, 06 Oct 2012 13:54:21 +1300
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

Andrew McNabb wrote:

> This is the difference between C++ style operators, where the only thing
> that matters is what the operator symbol looks like, and Python style
> operators, where an operator symbol is just syntactic sugar.  In Python,
> the "/" is synonymous with `operator.div` and is defined in terms of the
> `__div__` special method.  This distinction is why I hate operator
> overloading in C++ but like it in Python.

Not sure what you're saying here -- in both languages, operators
are no more than syntactic sugar for dispatching to an appropriate
method or function. Python just avoids introducing a special syntax
for spelling the name of the operator, which is nice, but it's
not a huge difference.

The same issues of what you *should* use operators for arises in
both communities, and it seems to be very much a matter of
personal taste.

(The use of << for output in C++ has never bothered me, BTW. There
are plenty of problems with the way I/O is done in C++, but the
use of << is the least of them, IMO...)


From greg.ewing at  Sat Oct  6 03:05:41 2012
From: greg.ewing at (Greg Ewing)
Date: Sat, 06 Oct 2012 14:05:41 +1300
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <k4ntu7$9c1$>
References: <>
	<> <>
Message-ID: <>

How about making a path object behave like a sequence
of pathname components? Then

* You can iterate over it directly instead of needing .parents()

* p[:-1] gives you the dirname

* p[-1] gives you the os.path.basename
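
A sketch of what the sequence view could look like, using the parts
tuple from the pathlib module that later shipped in the standard
library. This assumes parts is a fair stand-in for the behaviour
proposed here; in that module the path object itself is not indexable,
so the slicing is done on parts. The example path is illustrative.

```python
import posixpath
from pathlib import PurePosixPath

p = PurePosixPath('/etc/init.d/rc.local')
components = p.parts                 # the root counts as the first component

# Everything but the last component: the dirname.
dirname = PurePosixPath(*components[:-1])

# The last component: what os.path.basename would return.
basename = components[-1]

# Iterating over ever-shorter prefixes yields the ancestors, nearest first.
ancestors = [PurePosixPath(*components[:i])
             for i in range(len(components) - 1, 0, -1)]
```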


From massimo.dipierro at  Sat Oct  6 04:41:17 2012
From: massimo.dipierro at (massimo.dipierro at
Date: Fri, 5 Oct 2012 19:41:17 -0700 (PDT)
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
Message-ID: <>

An HTML attachment was scrubbed...
URL: <>

From ericsnowcurrently at  Sat Oct  6 06:57:48 2012
From: ericsnowcurrently at (Eric Snow)
Date: Fri, 5 Oct 2012 22:57:48 -0600
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 5, 2012 at 1:55 PM, Antoine Pitrou <solipsis at> wrote:
> I think overloading '/' is ugly (dividing paths??).

Agreed.  +1 on the proposed API in this regard.  It's pretty easy to
grok.  I also like that item access here mirrors how paths are treated
as sequences/iterables in other parts of the API.

It wouldn't surprise me if the join syntax is the most contentious
part of the proposal. ;)


From ericsnowcurrently at  Sat Oct  6 07:16:55 2012
From: ericsnowcurrently at (Eric Snow)
Date: Fri, 5 Oct 2012 23:16:55 -0600
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 5, 2012 at 5:48 PM, Antoine Pitrou <solipsis at> wrote:
> On Fri, 05 Oct 2012 14:38:57 -0700
> Ethan Furman <ethan at> wrote:
>> Say I have a .dbf table as PureNTPath('c:\orders\12345\abc67890.dbf'),
>> and I export it to .csv in the same folder; how would I transform the
>> above PureNTPath's ext from 'dbf' to 'csv'?
> Something like:
>>>> p = PureNTPath('c:/orders/12345/abc67890.dbf')
>>>> p.parent()[p.name.split('.')[0] + '.csv']
> PureNTPath('c:\\orders\\12345\\abc67890.csv')
> Any suggestion to ease this use case a bit?

Each namedtuple has a _replace() method that is used to generate a
new instance with one or more attributes changed.  We could do
something similar here:

>>> p = PureNTPath('c:/orders/12345/abc67890.dbf')
>>> p.replace(ext='.csv')
PureNTPath('c:\\orders\\12345\\abc67890.csv')


From ethan at  Sat Oct  6 07:36:49 2012
From: ethan at (Ethan Furman)
Date: Fri, 05 Oct 2012 22:36:49 -0700
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>	<>	<>	<>	<>
Message-ID: <>

Andrew McNabb wrote:
> On Sat, Oct 06, 2012 at 08:41:05AM +1000, Steven D'Aprano wrote:
>> On 06/10/12 05:53, Andrew McNabb wrote:
>>> Path concatenation is obviously not a form of division, so it makes
>>> little sense to use the division operator for this purpose.
>> But / is not just a division operator. It is also used for:
>> * alternatives: "tea and/or coffee, breakfast/lunch/dinner"
>> * italic markup: "some apps use /slashes/ for italics"
>> * instead of line breaks when quoting poetry
>> * abbreviations such as n/a b/w c/o and even w/ (not applicable,
>>   between, care of, with)
>> * date separator
> This is the difference between C++ style operators, where the only thing
> that matters is what the operator symbol looks like, and Python style
> operators, where an operator symbol is just syntactic sugar.  In Python,
> the "/" is synonymous with `operator.div` and is defined in terms of the
> `__div__` special method.  This distinction is why I hate operator
> overloading in C++ but like it in Python.

'/' is just a symbol.  One common interpretation is as division, but 
that is not its only purpose.  It's not even one of the first two 
symbols I learned for division when I was younger.


From ethan at  Sat Oct  6 07:42:00 2012
From: ethan at (Ethan Furman)
Date: Fri, 05 Oct 2012 22:42:00 -0700
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>	<>	<>
Message-ID: <>

Eric Snow wrote:
> On Fri, Oct 5, 2012 at 5:48 PM, Antoine Pitrou <solipsis at> wrote:
>> On Fri, 05 Oct 2012 14:38:57 -0700
>> Ethan Furman <ethan at> wrote:
>>> Say I have a .dbf table as PureNTPath('c:\orders\12345\abc67890.dbf'),
>>> and I export it to .csv in the same folder; how would I transform the
>>> above PureNTPath's ext from 'dbf' to 'csv'?
>> Something like:
>>>>> p = PureNTPath('c:/orders/12345/abc67890.dbf')
>>>>> p.parent()[p.name.split('.')[0] + '.csv']
>> PureNTPath('c:\\orders\\12345\\abc67890.csv')
>> Any suggestion to ease this use case a bit?
> Each namedtuple has a _replace() method that's is used to generate a
> new instance with one or more attributes changed.  We could do
> something similar here:
>>>> p = PureNTPath('c:/orders/12345/abc67890.dbf')
>>>> p.replace(ext='.csv')
> PureNTPath('c:\\orders\\12345\\abc67890.csv')


From turnbull at  Sat Oct  6 10:00:31 2012
From: turnbull at (Stephen J. Turnbull)
Date: Sat, 06 Oct 2012 17:00:31 +0900
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<> <k4ntu7$9c1$>
Message-ID: <>

Antoine Pitrou writes:
 > On Sat, 06 Oct 2012 01:27:49 +0100
 > Richard Oudkerk <shibturn at>
 > wrote:
 > > On 06/10/2012 12:48am, Antoine Pitrou wrote:
 > > >>>> p = PureNTPath('c:/orders/12345/abc67890.dbf')
 > > >>>> p.parent()[p.name.split('.')[0] + '.csv']
 > > > PureNTPath('c:\\orders\\12345\\abc67890.csv')
 > > >
 > > > Any suggestion to ease this use case a bit?
 > > 
 > > Maybe p.basename could be shorthand for p.name.split('.')[0].
 > Wouldn't there be some confusion with os.path.basename:
 > >>> os.path.basename('a/b/c.ext')
 > 'c.ext'

Not to mention standard Unix usage.  GNU basename will allow you to
specify a *particular* extension explicitly, which will be stripped if
present and otherwise ignored.  Eg, "basename a/b/c.ext ext" => "c."
(note the period!) and "basename a/b/c ext" => "c".  I don't know if
that's an extension to POSIX.  In any case, it would require basename
to be a method rather than a property.
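
A rough sketch of that suffix-stripping behaviour, to show why an argument
is needed.  The helper name is hypothetical, and this only approximates
the command-line tool (it does not handle a bare "/", for instance).

```python
import posixpath


def basename_strip_suffix(path, suffix=""):
    """Approximate `basename PATH SUFFIX` for POSIX-style paths."""
    name = posixpath.basename(path.rstrip("/"))
    # The suffix is removed only when present, and never when it is the
    # whole name.
    if suffix and name != suffix and name.endswith(suffix):
        name = name[:-len(suffix)]
    return name


print(basename_strip_suffix("a/b/c.ext", "ext"))  # c.
print(basename_strip_suffix("a/b/c", "ext"))      # c
```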

 > (sorry for the earlier, unfinished reply)

Also there are applications where "basenames" contain periods (eg,
wget often creates directories with names like ""), and
filenames may have multiple extensions, eg, "index.ja.html".
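
To make the multiple-extension case concrete, here is how the two obvious
spellings differ on such a name.  Both calls are existing stdlib/str
behaviour, nothing here is the proposed Path API.

```python
import os.path

name = "index.ja.html"

# Splitting on the first period throws away everything after it.
first_piece = name.split(".")[0]

# os.path.splitext() splits only on the last period.
stem, ext = os.path.splitext(name)

print(first_piece)  # index
print(stem, ext)    # index.ja .html
```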

I think it's reasonable to define "extension" to mean "the portion
after the last period (if any, maybe including the period)", but I
think usage of the complementary concept is pretty application-

From stephen at  Sat Oct  6 10:04:44 2012
From: stephen at (Stephen J. Turnbull)
Date: Sat, 06 Oct 2012 17:04:44 +0900
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

Ethan Furman writes:
 > Eric Snow wrote:
 > > On Fri, Oct 5, 2012 at 5:48 PM, Antoine Pitrou <solipsis at> wrote:
 > >> On Fri, 05 Oct 2012 14:38:57 -0700
 > >> Ethan Furman <ethan at> wrote:
 > >>> Say I have a .dbf table as PureNTPath('c:\orders\12345\abc67890.dbf'),
 > >>> and I export it to .csv in the same folder; how would I transform the
 > >>> above PureNTPath's ext from 'dbf' to 'csv'?
 > >> Something like:
 > >>
 > >>>>> p = PureNTPath('c:/orders/12345/abc67890.dbf')
 > >>>>> p.parent()[p.name.split('.')[0] + '.csv']
 > >> PureNTPath('c:\\orders\\12345\\abc67890.csv')
 > >>
 > >> Any suggestion to ease this use case a bit?
 > > 
 > > Each namedtuple has a _replace() method that's is used to generate a
 > > new instance with one or more attributes changed.  We could do
 > > something similar here:
 > > 
 > >>>> p = PureNTPath('c:/orders/12345/abc67890.dbf')
 > >>>> p.replace(ext='.csv')
 > > PureNTPath('c:\\orders\\12345\\abc67890.csv')
 > +1

How about a more general subst() method?  Indeed, it would need
keyword arguments for named components like ext, but I often do things
like "mv ~/Maildir/{tmp,new}/42" in the shell.  I think it would be
useful to be able to replace any component of a path.
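
Something along these lines, perhaps.  This is a sketch on plain strings
only; subst_component is a made-up name, and it replaces every component
equal to `old`, which may or may not be what a real subst() should do.

```python
def subst_component(path, old, new, sep="/"):
    """Return path with each component equal to `old` replaced by `new`."""
    parts = path.split(sep)
    return sep.join(new if part == old else part for part in parts)


# Roughly the Python spelling of the target of "mv ~/Maildir/{tmp,new}/42".
src = "~/Maildir/tmp/42"
dst = subst_component(src, "tmp", "new")
print(dst)  # ~/Maildir/new/42
```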

From turnbull at  Sat Oct  6 10:39:13 2012
From: turnbull at (Stephen J. Turnbull)
Date: Sat, 06 Oct 2012 17:39:13 +0900
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

Antoine Pitrou writes:

 > On Fri, 5 Oct 2012 20:19:12 +0100
 > Paul Moore <p.f.moore at> wrote:
 > > On 5 October 2012 19:25, Antoine Pitrou <solipsis at> wrote:
 > > > A path can be joined with another using the ``__getitem__`` operator::
 > > >
 > > >     >>> p = PurePosixPath('foo')
 > > >     >>> p['bar']
 > > >     PurePosixPath('foo/bar')
 > > >     >>> p[PurePosixPath('bar')]
 > > >     PurePosixPath('foo/bar')
 > >
 > > There is a risk that this is too "cute". However, it's probably better
 > > than overloading the '/' operator, and you do need something
 > > short.

I didn't like this much at first.  However, if you think of this as a
"collection" (cf. WebDAV), then the bracket notation is the obvious
way to do it in Python (FVO "it" == "accessing a member of a
collection by name").

I wonder if there is a need to distinguish between a path naming a
directory as a collection, and as a file itself?  Or can/should this
be implicit (wash my mouth out with soap!) in the operation using the
Path?

 > Someone else proposed overloading '+', which would be confusing
 > since we need to be able to combine paths and regular strings, for
 > ease of use.

Is it really that obnoxious to write "p + Path('bar')" (where p is a
Path)?

What about the case "'bar' + p"?  Since Python isn't C, you can't
express that as "'bar'[p]"!

 > The point of using __getitem__ is that you get an error if you replace
 > the Path object with a regular string by mistake:
 > > > As with constructing, multiple path components can be specified at once::
 > > >
 > > >     >>> p['bar/xyzzy']
 > > >     PurePosixPath('foo/bar/xyzzy')
 > > 
 > > That's risky. Are you proposing always using '/' regardless of OS? I'd
 > > have expected os.sep (so \ on Windows).
 > Both '/' and '\\' are accepted as path separators under Windows. Under
 > Unix, '\\' is a regular character:

That's outright ugly, especially from the "collections" point of view
(foo/bar/xyzzy is not a member of foo).  If you want something that
doesn't suffer from the bogosities of os.path, this kind of platform-
dependence should be avoided, I think.

 > > Also, there is no good terminology in current use here. The only
 > > concrete thing I can suggest is that "root" would be better used as
 > > the term for what you're calling "anchor" as Windows users would
 > > expect the root of "C:\foo\bar\baz" to be "C:\".
 > But then the root of "C:foo" would be "C:", which sounds wrong:
 > "C:" isn't a root at all.

Why not interpret the root of "C:foo" to be None?  The Windows user
can still get "C:" as the drive, and I don't think that will be
surprising to them.
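
A sketch of what I mean, built on ntpath.splitdrive() (which is purely
lexical and works on any platform).  The drive_and_root name and the
choice of None for "no root" are assumptions for illustration, not the
PEP's behaviour.

```python
import ntpath


def drive_and_root(path):
    """Split a Windows-style path into (drive, root), root being None
    when the path is drive-relative."""
    drive, rest = ntpath.splitdrive(path)
    # A root exists only if what follows the drive starts with a separator.
    root = "\\" if rest[:1] in ("\\", "/") else None
    return drive, root


print(drive_and_root("C:foo"))     # ('C:', None)
print(drive_and_root("C:\\foo"))   # ('C:', '\\')
```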

 > > But there's no really simple answer - Windows and Unix are just
 > > different here.
 > Yes, and Unix users are expecting something simpler than what's going on
 > under Windows ;)

Well, Unix users can do things more uniformly.  But there's also a lot
of complexity going on under the hood.  Every file system has a root,
of which only one is named "/".  I don't know if Python programs ever
need that information (I never have :-), but it would be nice to leave
room for extension.  Similarly, many "file systems" are actually just
hierarchically organized database access methods with no physical
existence on hardware.

I wonder if "mount_point" is sufficiently general to include the roots
of real local file systems, remote file systems, Windows drives, and
pseudo file systems?  An obvious problem is that Windows users would
not find that terminology natural.

From stephen at  Sat Oct  6 12:09:05 2012
From: stephen at (Stephen J. Turnbull)
Date: Sat, 06 Oct 2012 19:09:05 +0900
Subject: [Python-ideas]  PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

Antoine Pitrou writes:

 > ``relative()`` returns a new relative path by stripping the drive and root::

Does this have use cases so common that it deserves a convenience
method?  I would expect "relative" to require an argument.  (Ie, I
would expect it to have the semantics of "relative_to".)  Or is the
issue that you can't count on PureNTPath(p).relative_to('C:\\') to
DTRT?

Maybe the 

From p.f.moore at  Sat Oct  6 12:24:01 2012
From: p.f.moore at (Paul Moore)
Date: Sat, 6 Oct 2012 11:24:01 +0100
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On 6 October 2012 09:39, Stephen J. Turnbull <turnbull at> wrote:
> I wonder if "mount_point" is sufficiently general to include the roots
> of real local file systems, remote file systems, Windows drives, and
> pseudo file systems?  An obvious problem is that Windows users would
> not find that terminology natural.

Technically, newer versions of Windows (Vista and later, I think)
allow you to mount a drive on a directory rather than a drive letter,
just like Unix. Although I'm not sure I've ever seen it done, and I
don't know if there are suitable API calls to determine if a directory
is a mount point (I guess there must be).

An ugly, but viable, approach would be to have drive and mount_point
properties, which are synonyms.


From p.f.moore at  Sat Oct  6 12:27:58 2012
From: p.f.moore at (Paul Moore)
Date: Sat, 6 Oct 2012 11:27:58 +0100
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On 6 October 2012 11:09, Stephen J. Turnbull <stephen at> wrote:
> Antoine Pitrou writes:
>  > ``relative()`` returns a new relative path by stripping the drive and root::
> Does this have use cases so common that it deserves a convenience
> method?


> I would expect "relative" to require an argument.  (Ie, I
> would expect it to have the semantics of "relative_to".)

I agree that's what I thought relative() would be when I first read the name.

> Or is the
> issue that you can't count on PureNTPath(p).relative_to('C:\\') to
> DTRT?

It seems to me that if p isn't on drive C:, then the right thing is
clearly to raise an exception. No ambiguity there - although Unix
users might well write code that doesn't allow for exceptions from the
method, just because it's not a possible result on Unix. Having it
documented might help raise awareness of the possibility, though. And
that's about the best you can hope for.


From mark at  Sat Oct  6 12:49:35 2012
From: mark at (Mark Shannon)
Date: Sat, 06 Oct 2012 11:49:35 +0100
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

Just to add my 2p's worth.

On 05/10/12 19:25, Antoine Pitrou wrote:
> Hello,
> This PEP is a resurrection of the idea of having object-oriented
> filesystem paths in the stdlib. It comes with a general API proposal
> as well as a specific implementation (*). The implementation is young
> and discussion is quite open.
> (*)
> Regards
> Antoine.
> PS: You can all admire my ASCII-art skills.

In general I like it.

> Class hierarchy
> ---------------

Lovely ASCII art work :)
but it does have the n*m problem of such hierarchies.
N types of file:
file, directory, mount-point, drive, root, etc, etc
and M implementations
Posix, NT, linux, OSX, network, database, etc, etc

I would prefer duck-typing.
Add ABCs for all the N types of file and use concrete classes for the 
actual filesystems
That way there are N+M rather than N*M classes.

Although I'm generally against operator overloading, would the // 
operator be better than the / operator as it is more rarely used and 
more visually distinctive?


From solipsis at  Sat Oct  6 14:06:52 2012
From: solipsis at (Antoine Pitrou)
Date: Sat, 6 Oct 2012 14:06:52 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
References: <>
Message-ID: <>

On Sat, 06 Oct 2012 17:39:13 +0900
"Stephen J. Turnbull"
<turnbull at> wrote:
> I wonder if there is a need to distinguish between a path naming a
> directory as a collection, and as a file itself?  Or can/should this
> be implicit (wash my mouth out with soap!) in the operation using the
> Path?

I don't think there's a need to distinguish. Trying to
access /etc/passwd/somefile will simply raise an error on I/O.

>  > Someone else proposed overloading '+', which would be confusing
>  > since we need to be able to combine paths and regular strings, for
>  > ease of use.
> Is it really that obnoxious to write "p + Path('bar')" (where p is a
> Path)?
> What about the case "'bar' + p"?  Since Python isn't C, you can't
> express that as "'bar'[p]"!

The issue I envision is if you write `p + "bar"`, thinking p is a Path,
and p is actually a str object. It won't raise, but give you the wrong
result.
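
The failure mode can be shown with plain strings alone; nothing below
depends on the proposed Path classes.

```python
p = "foo"  # meant to be a Path, but is actually a plain str

# '+' silently concatenates: no separator, no error.
joined = p + "bar"
print(joined)  # foobar

# Indexing a str with a str fails loudly instead.
try:
    p["bar"]
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```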

>  > Both '/' and '\\' are accepted as path separators under Windows. Under
>  > Unix, '\\' is a regular character:
> That's outright ugly, especially from the "collections" point of view
> (foo/bar/xyzzy is not a member of foo).  If you want something that
> doesn't suffer from the bogosities of os.path, this kind of platform-
> dependence should be avoided, I think.

Well, you do want to be able to convert str paths to Path objects
without handling path separator conversion by hand. It's a matter of
practicality.

>  > > Also, there is no good terminology in current use here. The only
>  > > concrete thing I can suggest is that "root" would be better used as
>  > > the term for what you're calling "anchor" as Windows users would
>  > > expect the root of "C:\foo\bar\baz" to be "C:\".
>  > 
>  > But then the root of "C:foo" would be "C:", which sounds wrong:
>  > "C:" isn't a root at all.
> Why not interpret the root of "C:foo" to be None?  The Windows user
> can still get "C:" as the drive, and I don't think that will be
> surprising to them.

That's a possibility indeed. I'd like to have feedback from more
Windows users about your suggestion:

>>> PureNTPath('c:foo').root
>>> PureNTPath('c:\\foo').root

which would also give the following for UNC paths:

>>> PureNTPath('//network/share/foo/bar').root

> I wonder if "mount_point" is sufficiently general to include the roots
> of real local file systems, remote file systems, Windows drives, and
> pseudo file systems?  An obvious problem is that Windows users would
> not find that terminology natural.

Another is that finding mount points is I/O, while finding the root is
a purely lexical operation.



Software development and contracting:

From solipsis at  Sat Oct  6 14:09:24 2012
From: solipsis at (Antoine Pitrou)
Date: Sat, 6 Oct 2012 14:09:24 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
References: <>
Message-ID: <>

On Fri, 5 Oct 2012 23:16:55 -0600
Eric Snow <ericsnowcurrently at>
wrote:
> On Fri, Oct 5, 2012 at 5:48 PM, Antoine Pitrou <solipsis at> wrote:
> > On Fri, 05 Oct 2012 14:38:57 -0700
> > Ethan Furman <ethan at> wrote:
> >>
> >> Say I have a .dbf table as PureNTPath('c:\orders\12345\abc67890.dbf'),
> >> and I export it to .csv in the same folder; how would I transform the
> >> above PureNTPath's ext from 'dbf' to 'csv'?
> >
> > Something like:
> >
> >>>> p = PureNTPath('c:/orders/12345/abc67890.dbf')
> >>>> p.parent()[p.name.split('.')[0] + '.csv']
> > PureNTPath('c:\\orders\\12345\\abc67890.csv')
> >
> > Any suggestion to ease this use case a bit?
> Each namedtuple has a _replace() method that's is used to generate a
> new instance with one or more attributes changed.  We could do
> something similar here:

The concrete Path objects' replace() method already maps to
os.replace().
Note os.replace() is new in 3.3 and is a portable always-overwriting
alternative to os.rename():
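
A small demonstration of the overwriting behaviour, using a temporary
directory so it is safe to run.  The file names are arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "new.txt")
    dst = os.path.join(tmp, "old.txt")
    with open(src, "w") as f:
        f.write("new contents")
    with open(dst, "w") as f:
        f.write("old contents")

    # The destination already exists; os.replace() overwrites it.
    os.replace(src, dst)

    with open(dst) as f:
        result = f.read()
    src_still_exists = os.path.exists(src)

print(result)
print(src_still_exists)
```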




From solipsis at  Sat Oct  6 14:18:58 2012
From: solipsis at (Antoine Pitrou)
Date: Sat, 6 Oct 2012 14:18:58 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
References: <>
Message-ID: <>

On Sat, 6 Oct 2012 11:27:58 +0100
Paul Moore <p.f.moore at> wrote:
> > I would expect "relative" to require an argument.  (Ie, I
> > would expect it to have the semantics of "relative_to".)
> I agree that's what I thought relative() would be when I first read the name.

You are right, relative() could be removed and replaced with the
current relative_to() method. I wasn't sure about how these names would
feel to a native English speaker.

> > Or is the
> > issue that you can't count on PureNTPath(p).relative_to('C:\\') to
> > DTRT?
> It seems to me that if p isn't on drive C:, then the right thing is
> clearly to raise an exception.


>>> PureNTPath('/foo').relative_to('c:/foo')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "", line 894, in relative_to
    .format(str(self), str(formatted)))
ValueError: '\\foo' does not start with 'c:\\foo'

> No ambiguity there - although Unix
> users might well write code that doesn't allow for exceptions from the
> method, just because it's not a possible result on Unix.

Actually, it can raise too:

>>> PurePosixPath('/usr').relative_to('/usr/lib')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "", line 894, in relative_to
    .format(str(self), str(formatted)))
ValueError: '/usr' does not start with '/usr/lib'
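
The check behind that error is purely lexical.  A simplified sketch on
POSIX-style strings follows; it is not the actual implementation, and it
ignores the absolute/relative distinction.

```python
def relative_to(path, base):
    """Strip `base` from the front of `path`, component by component."""
    path_parts = [part for part in path.split("/") if part]
    base_parts = [part for part in base.split("/") if part]
    if path_parts[:len(base_parts)] != base_parts:
        raise ValueError("%r does not start with %r" % (path, base))
    return "/".join(path_parts[len(base_parts):])


print(relative_to("/usr/lib/python", "/usr"))  # lib/python

try:
    relative_to("/usr", "/usr/lib")
except ValueError as exc:
    print("ValueError:", exc)
```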

You can't really add '..' components and expect the result to be
correct, for example if '/usr/lib' is a symlink to '/lib', then
'/usr/lib/..' is '/', not '/usr'.

That's why the resolve() method, which resolves symlinks along the path,
is the only one allowed to muck with '..' components.
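
For contrast, this is what a purely lexical collapse of '..' does in the
existing stdlib.  It never looks at the filesystem, so it cannot know
about symlinks.

```python
import posixpath

# normpath() removes 'lib/..' textually, whatever 'lib' really is on disk.
lexical = posixpath.normpath("/usr/lib/..")
print(lexical)  # /usr
```

os.path.realpath() is the existing call that does consult the filesystem,
which is why its result depends on the machine it runs on.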




From solipsis at  Sat Oct  6 14:25:29 2012
From: solipsis at (Antoine Pitrou)
Date: Sat, 6 Oct 2012 14:25:29 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
References: <> <>
Message-ID: <>

Hello Mark,

On Sat, 06 Oct 2012 11:49:35 +0100
Mark Shannon <mark at> wrote:
> >
> > Class hierarchy
> > ---------------
> Lovely ASCII art work :)
> but it does have have the n*m problem of such hierarchies.
> N types of file:
> file, directory, mount-point, drive, root, etc, etc
> and M implementations
> Posix, NT, linux, OSX, network, database, etc, etc

There is no distinction per "type of file": files, directories, etc.
all share the same implementation. So you only have a per-flavour
distinction (Posix / NT).

> I would prefer duck-typing.
> Add ABCs for all the N types of file and use concrete classes for the 
> actual filesystems

It seems to me that "duck typing" and "ABCs" are mutually exclusive,
kind of :)

> Although I'm generally against operator overloading, would the // 
> operator be better than  the // operator as it is more rarely used and 
> more visually distinctive?

You mean "would the / operator be better than the [] operator"?

I didn't choose / at first because I knew this choice would be quite
contentious. However, if there happens to be a strong majority in its
favour, why not.




From phd at  Sat Oct  6 14:26:42 2012
From: phd at (Oleg Broytman)
Date: Sat, 6 Oct 2012 16:26:42 +0400
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 06, 2012 at 02:09:24PM +0200, Antoine Pitrou <solipsis at> wrote:
> On Fri, 5 Oct 2012 23:16:55 -0600
> Eric Snow <ericsnowcurrently at>
> wrote:
> > Each namedtuple has a _replace() method that's is used to generate a
> > new instance with one or more attributes changed.  We could do
> > something similar here:
> The concrete Path objects' replace() method already maps to
> os.replace().

   Call it "with":

newpath = path.with_drive('C:')
newpath = path.with_name('newname')
newpath = path.with_ext('.zip')
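
   A sketch of the semantics I have in mind, written as functions on
POSIX-style strings rather than as methods (the real thing would be
methods on the path object; these helpers are illustrative only):

```python
import posixpath


def with_ext(path, ext):
    """Return path with its last extension replaced by ext."""
    root, _old_ext = posixpath.splitext(path)
    return root + ext


def with_name(path, name):
    """Return path with its final component replaced by name."""
    return posixpath.join(posixpath.dirname(path), name)


print(with_ext("orders/12345/abc67890.dbf", ".csv"))
print(with_name("orders/12345/abc67890.dbf", "report.csv"))
```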

     Oleg Broytman              phd at
           Programmers don't die, they just GOSUB without RETURN.

From phd at  Sat Oct  6 14:40:49 2012
From: phd at (Oleg Broytman)
Date: Sat, 6 Oct 2012 16:40:49 +0400
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 06, 2012 at 04:26:42PM +0400, Oleg Broytman <phd at> wrote:
> newpath = path.with_drive('C:')
> newpath = path.with_name('newname')
> newpath = path.with_ext('.zip')

   BTW, I think having these three -- replacing drive, name and extension --
is enough.

     Oleg Broytman              phd at
           Programmers don't die, they just GOSUB without RETURN.

From solipsis at  Sat Oct  6 14:46:35 2012
From: solipsis at (Antoine Pitrou)
Date: Sat, 6 Oct 2012 14:46:35 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
References: <>
Message-ID: <>

On Sat, 6 Oct 2012 16:40:49 +0400
Oleg Broytman <phd at> wrote:
> On Sat, Oct 06, 2012 at 04:26:42PM +0400, Oleg Broytman <phd at> wrote:
> > newpath = path.with_drive('C:')
> > newpath = path.with_name('newname')
> > newpath = path.with_ext('.zip')
>    BTW, I think having these three -- replacing drive, name and extension --
> is enough.

What is the point of replacing the drive?

Replacing the name is already trivial: path.parent()[newname]

So we only need to replace the "basename" and the extension (I think
I'm ok with the "basename" terminology now :-)).




From g.brandl at  Sat Oct  6 14:55:16 2012
From: g.brandl at (Georg Brandl)
Date: Sat, 06 Oct 2012 14:55:16 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <k4p9lf$a5h$>

Am 06.10.2012 14:46, schrieb Antoine Pitrou:
> On Sat, 6 Oct 2012 16:40:49 +0400
> Oleg Broytman <phd at> wrote:
>> On Sat, Oct 06, 2012 at 04:26:42PM +0400, Oleg Broytman <phd at> wrote:
>> > newpath = path.with_drive('C:')
>> > newpath = path.with_name('newname')
>> > newpath = path.with_ext('.zip')
>>    BTW, I think having these three -- replacing drive, name and extension --
>> is enough.
> What is the point of replacing the drive?
> Replacing the name is already trivial: path.parent()[newname]
> So we only need to replace the "basename" and the extension (I think
> I'm ok with the "basename" terminology now :-)).

If my crystal ball is correct, the middle example above replaces not the
basename but the "part before the extension".  So we have to find another
name for it ...


From phd at  Sat Oct  6 14:52:27 2012
From: phd at (Oleg Broytman)
Date: Sat, 6 Oct 2012 16:52:27 +0400
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 06, 2012 at 02:46:35PM +0200, Antoine Pitrou <solipsis at> wrote:
> On Sat, 6 Oct 2012 16:40:49 +0400
> Oleg Broytman <phd at> wrote:
> > On Sat, Oct 06, 2012 at 04:26:42PM +0400, Oleg Broytman <phd at> wrote:
> > > newpath = path.with_drive('C:')
> > > newpath = path.with_name('newname')
> > > newpath = path.with_ext('.zip')
> > 
> >    BTW, I think having these three -- replacing drive, name and extension --
> > is enough.
> What is the point of replacing the drive?
> Replacing the name is already trivial: path.parent()[newname]
> So we only need to replace the "basename" and the extension (I think
> I'm ok with the "basename" terminology now :-)).

   I'm ok with that.

     Oleg Broytman              phd at
           Programmers don't die, they just GOSUB without RETURN.

From solipsis at  Sat Oct  6 14:57:44 2012
From: solipsis at (Antoine Pitrou)
Date: Sat, 6 Oct 2012 14:57:44 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
References: <>
	<> <k4p9lf$a5h$>
Message-ID: <>

On Sat, 06 Oct 2012 14:55:16 +0200
Georg Brandl <g.brandl at> wrote:
> Am 06.10.2012 14:46, schrieb Antoine Pitrou:
> > On Sat, 6 Oct 2012 16:40:49 +0400
> > Oleg Broytman <phd at> wrote:
> >> On Sat, Oct 06, 2012 at 04:26:42PM +0400, Oleg Broytman <phd at> wrote:
> >> > newpath = path.with_drive('C:')
> >> > newpath = path.with_name('newname')
> >> > newpath = path.with_ext('.zip')
> >> 
> >>    BTW, I think having these three -- replacing drive, name and extension --
> >> is enough.
> > 
> > What is the point of replacing the drive?
> > 
> > Replacing the name is already trivial: path.parent()[newname]
> > 
> > So we only need to replace the "basename" and the extension (I think
> > I'm ok with the "basename" terminology now :-)).
> If my crystal ball is correct, the middle example above replaces not the
> basename but the "part before the extension".  So we have to find another
> name for it ...

Well, "basename" is the name proposed for the "part before the
extension". "name" is the full filename.

(so path.name == path.basename + path.ext)
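
The same identity can be checked with today's os.path functions, which
also shows where the naming clash comes from: os.path.basename() returns
what is called "name" here.

```python
import posixpath

path = "a/b/c.ext"

name = posixpath.basename(path)       # full filename: 'c.ext'
base, ext = posixpath.splitext(name)  # 'c' and '.ext'

print(name, base, ext)
print(name == base + ext)
```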




From g.brandl at  Sat Oct  6 15:08:27 2012
From: g.brandl at (Georg Brandl)
Date: Sat, 06 Oct 2012 15:08:27 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<> <k4p9lf$a5h$>
Message-ID: <k4pae7$hpg$>

Am 06.10.2012 14:57, schrieb Antoine Pitrou:
> On Sat, 06 Oct 2012 14:55:16 +0200
> Georg Brandl <g.brandl at> wrote:
>> Am 06.10.2012 14:46, schrieb Antoine Pitrou:
>> > On Sat, 6 Oct 2012 16:40:49 +0400
>> > Oleg Broytman <phd at> wrote:
>> >> On Sat, Oct 06, 2012 at 04:26:42PM +0400, Oleg Broytman <phd at> wrote:
>> >> > newpath = path.with_drive('C:')
>> >> > newpath = path.with_name('newname')
>> >> > newpath = path.with_ext('.zip')
>> >> 
>> >>    BTW, I think having these three -- replacing drive, name and extension --
>> >> is enough.
>> > 
>> > What is the point of replacing the drive?
>> > 
>> > Replacing the name is already trivial: path.parent()[newname]
>> > 
>> > So we only need to replace the "basename" and the extension (I think
>> > I'm ok with the "basename" terminology now :-)).
>> If my crystal ball is correct, the middle example above replaces not the
>> basename but the "part before the extension".  So we have to find another
>> name for it ...
> Well, "basename" is the name proposed for the "part before the
> extension". "name" is the full filename.
> (so path.name == path.basename + path.ext)

Is it?  You said yourself it was easily confused with os.path.basename()'s result.


From mark at  Sat Oct  6 15:08:31 2012
From: mark at (Mark Shannon)
Date: Sat, 06 Oct 2012 14:08:31 +0100
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <> <>
Message-ID: <>

On 06/10/12 13:25, Antoine Pitrou wrote:
> Hello Mark,
> On Sat, 06 Oct 2012 11:49:35 +0100
> Mark Shannon <mark at> wrote:
>>> Class hierarchy
>>> ---------------
>> Lovely ASCII art work :)
>> but it does have have the n*m problem of such hierarchies.
>> N types of file:
>> file, directory, mount-point, drive, root, etc, etc
>> and M implementations
>> Posix, NT, linux, OSX, network, database, etc, etc
> There is no distinction per "type of file": files, directories, etc.
> all share the same implementation. So you only have a per-flavour
> distinction (Posix / NT).
>> I would prefer duck-typing.
>> Add ABCs for all the N types of file and use concrete classes for the
>> actual filesystems
> It seems to me that "duck typing" and "ABCs" are mutually exclusive,
> kind of :)
>> Although I'm generally against operator overloading, would the //
>> operator be better than  the // operator as it is more rarely used and
>> more visually distinctive?
> You mean "would the / operator be better than the [] operator"?

Actually I did mean the '//' (floor division) operator as it would stand 
out more than '/'.
It is just something for you to consider (in case you didn't have enough 
possibilities already :) )

> I didn't choose / at first because I knew this choice would be quite
> contentious. However, if there happens to be a strong majority in its
> favour, why not.
> Regards
> Antoine.

From solipsis at  Sat Oct  6 15:42:28 2012
From: solipsis at (Antoine Pitrou)
Date: Sat, 6 Oct 2012 15:42:28 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
References: <>
	<> <k4p9lf$a5h$>
	<> <k4pae7$hpg$>
Message-ID: <>

On Sat, 06 Oct 2012 15:08:27 +0200
Georg Brandl <g.brandl at> wrote:
> > 
> > Well, "basename" is the name proposed for the "part before the
> > extension". "name" is the full filename.
> > 
> > (so path.name == path.basename + path.ext)
> Is it?  You said yourself it was easily confused with os.path.basename()'s result.

True, but since we already have the name attribute it stands reasonable
for basename to mean something else than name :-)
Do you have another suggestion?




From ubershmekel at  Sat Oct  6 15:49:49 2012
From: ubershmekel at (Yuval Greenfield)
Date: Sat, 6 Oct 2012 15:49:49 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<k4p9lf$a5h$> <>
	<k4pae7$hpg$> <>
Message-ID: <>

On Sat, Oct 6, 2012 at 3:42 PM, Antoine Pitrou <solipsis at> wrote:

> On Sat, 06 Oct 2012 15:08:27 +0200
> Georg Brandl <g.brandl at> wrote:
> > >
> > > Well, "basename" is the name proposed for the "part before the
> > > extension". "name" is the full filename.
> > >
> > > (so path.name == path.basename + path.ext)
> >
> > Is it?  You said yourself it was easily confused with
> os.path.basename()'s result.
> True, but since we already have the name attribute it stands reasonable
> for basename to mean something else than name :-)
> Do you have another suggestion?
> It appears "base name" or "base" is the convention for the part before the
> extension.

Perhaps os.path.basename should be deprecated in favor of a better named
function one day. But that's probably for a different thread.

From phd at  Sat Oct  6 16:01:16 2012
From: phd at (Oleg Broytman)
Date: Sat, 6 Oct 2012 18:01:16 +0400
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<> <k4p9lf$a5h$>
	<> <k4pae7$hpg$>
Message-ID: <>

On Sat, Oct 06, 2012 at 03:49:49PM +0200, Yuval Greenfield <ubershmekel at> wrote:
> Perhaps os.path.basename should be deprecated in favor of a better named
> function one day. But that's probably for a different thread.

   That's certainly for a different Python. os.path.basename cannot be
renamed because:

1) it's used in millions of programs;
2) it's in line with GNU tools.

     Oleg Broytman              phd at
           Programmers don't die, they just GOSUB without RETURN.

From g.brandl at  Sat Oct  6 16:47:06 2012
From: g.brandl at (Georg Brandl)
Date: Sat, 06 Oct 2012 16:47:06 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<> <k4p9lf$a5h$>
	<> <k4pae7$hpg$>
Message-ID: <k4pg75$te0$>

Am 06.10.2012 15:42, schrieb Antoine Pitrou:
> On Sat, 06 Oct 2012 15:08:27 +0200
> Georg Brandl <g.brandl at> wrote:
>> > 
>> > Well, "basename" is the name proposed for the "part before the
>> > extension". "name" is the full filename.
>> > 
>> > (so path.name == path.basename + path.ext)
>> Is it?  You said yourself it was easily confused with os.path.basename()'s result.
> True, but since we already have the name attribute it stands reasonable
> for basename to mean something else than name :-)
> Do you have another suggestion?

Not really.  I'd prefer "base" or "namebase" though, to at least have a
tiny bit of difference.


From stephen at  Sat Oct  6 16:49:25 2012
From: stephen at (Stephen J. Turnbull)
Date: Sat, 06 Oct 2012 23:49:25 +0900
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

Antoine Pitrou writes:

 > >  > Someone else proposed overloading '+', which would be confusing
 > >  > since we need to be able to combine paths and regular strings, for
 > >  > ease of use.
 > > 
 > > Is it really that obnoxious to write "p + Path('bar')" (where p is a
 > > Path)?
 > > 
 > > What about the case "'bar' + p"?  Since Python isn't C, you can't
 > > express that as "'bar'[p]"!
 > The issue I envision is if you write `p + "bar"`, thinking p is a Path,
 > and p is actually a str object. It won't raise, but give you the wrong
 > result.

No, my point is that for me prepending new segments is quite common,
though not as common as appending them.  The asymmetry of the bracket
operator means that there's no easy way to deal with that.

On the other hand, `p + Path('foo')` and `Path('foo') + p` (where p is
a Path, not a string) both seem reasonable to me.  It's true that one
could screw up as you suggest, but that requires *two* mistakes, first
thinking that p is a Path when it's a string, and then forgetting to
convert 'bar' to Path.  I don't think that's very likely if you don't
allow mixing strings and Paths without explicit conversion.
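
A minimal sketch of that rule, assuming a toy class (SegPath is a
hypothetical name, not part of the PEP): `+` joins two path objects in
either order and refuses a bare string, so a forgotten conversion fails
loudly instead of giving a wrong result.

```python
import posixpath


class SegPath:
    """Hypothetical toy path type; not the PEP's API."""

    def __init__(self, text):
        self.text = text

    def __add__(self, other):
        # Only path + path is allowed; a plain str must be converted first.
        if not isinstance(other, SegPath):
            return NotImplemented
        return SegPath(posixpath.join(self.text, other.text))

    def __repr__(self):
        return "SegPath(%r)" % self.text


p = SegPath("foo")
appended = p + SegPath("bar")     # append a segment
prepended = SegPath("bar") + p    # prepend a segment

try:
    p + "bar"                     # forgot to convert 'bar'
except TypeError:
    mixed_rejected = True
else:
    mixed_rejected = False
```

With this rule the silent failure needs both mistakes at once: p being a
str and 'bar' being left unconverted.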

 > >  > Both '/' and '\\' are accepted as path separators under Windows. Under
 > >  > Unix, '\\' is a regular character:
 > > 
 > > That's outright ugly, especially from the "collections" point of view
 > > (foo/bar/xyzzy is not a member of foo).  If you want something that
 > > doesn't suffer from the bogosities of os.path, this kind of platform-
 > > dependence should be avoided, I think.
 > Well, you do want to be able to convert str paths to Path objects
 > without handling path separator conversion by hand. It's a matter of
 > practicality.

Sorry, cut too much context.  I was referring to the use of
path['foo/bar'] where path['foo', 'bar'] will do.  Of course
overloading the constructor is an obvious thing to do.

From ironfroggy at  Sat Oct  6 18:14:40 2012
From: ironfroggy at (Calvin Spealman)
Date: Sat, 6 Oct 2012 12:14:40 -0400
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

Responding late, but I didn't get a chance to get my very strong
feelings on this proposal in yesterday.

I do not like it. I'll give full disclosure and say that I think our
earlier failure to include the path library in the stdlib has been a
loss for Python and I'll always hope we can fix that one day. I still
hold out hope.

It feels like this proposal is "make it object oriented, because
object oriented is good" without any actual justification or obvious
problem this solves. The API looks clunky and redundant, and does not
appear to actually improve anything over the facilities in the os.path
module. This takes a lot of things we can already do with paths and
files and remixes them into a not-so intuitive API for the sake of
change, not for the sake of solving a real problem.

As for specific problems I have with the proposal:

Frankly, I think not keeping the / operator for joining is a huge
mistake. This is the number one best feature of path and despite that
many people don't like it, it makes sense. It makes our most common
path operation read very close to the actual representation of
what you're creating. This is great.

Not inheriting from str means that we can't directly pass these path
objects to existing code that just expects a string, so we have a
really hard boundary around the edges of this new API. It does not
lend itself well to incrementally transitioning to it from existing
code.

The stat operations and other file-facilities tacked on feel out of
place, and limited. Why does it make sense to add these facilities to
path and not other file operations? Why not give me a read method on
paths? or maybe a copy? Putting lots of file facilities on a path
object feels wrong because you can't extend it easily. This is one
place that function(thing) works better than thing.function()

Overall, I'm completely -1 on the whole thing.

On Fri, Oct 5, 2012 at 2:25 PM, Antoine Pitrou <solipsis at> wrote:
> Hello,
> This PEP is a resurrection of the idea of having object-oriented
> filesystem paths in the stdlib. It comes with a general API proposal
> as well as a specific implementation (*). The implementation is young
> and discussion is quite open.
> (*)
> Regards
> Antoine.
> PS: You can all admire my ASCII-art skills.
> PEP: 428
> Title: The pathlib module -- object-oriented filesystem paths
> Version: $Revision$
> Last-Modified: $Date
> Author: Antoine Pitrou <solipsis at>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 30-July-2012
> Python-Version: 3.4
> Post-History:
> Abstract
> ========
> This PEP proposes the inclusion of a third-party module, `pathlib`_, in
> the standard library.  The inclusion is proposed under the provisional
> label, as described in :pep:`411`.  Therefore, API changes can be done,
> either as part of the PEP process, or after acceptance in the standard
> library (and until the provisional label is removed).
> The aim of this library is to provide a simple hierarchy of classes to
> handle filesystem paths and the common operations users do over them.
> .. _`pathlib`:
> Related work
> ============
> An object-oriented API for filesystem paths has already been proposed
> and rejected in :pep:`355`.  Several third-party implementations of the
> idea of object-oriented filesystem paths exist in the wild:
> * The historical ` module`_ by Jason Orendorff, Jason R. Coombs
>   and others, which provides a ``str``-subclassing ``Path`` class;
> * Twisted's slightly specialized `FilePath class`_;
> * An `AlternativePathClass proposal`_, subclassing ``tuple`` rather than
>   ``str``;
> * `Unipath`_, a variation on the str-subclassing approach with two public
>   classes, an ``AbstractPath`` class for operations which don't do I/O and a
>   ``Path`` class for all common operations.
> This proposal attempts to learn from these previous attempts and the
> rejection of :pep:`355`.
> .. _` module`:
> .. _`FilePath class`:
> .. _`AlternativePathClass proposal`:
> .. _`Unipath`:
> Why an object-oriented API
> ==========================
> The rationale to represent filesystem paths using dedicated classes is the
> same as for other kinds of stateless objects, such as dates, times or IP
> addresses.  Python has been slowly moving away from strictly replicating
> the C language's APIs to providing better, more helpful abstractions around
> all kinds of common functionality.  Even if this PEP isn't accepted, it is
> likely that another form of filesystem handling abstraction will be adopted
> one day into the standard library.
> Indeed, many people will prefer handling dates and times using the high-level
> objects provided by the ``datetime`` module, rather than using numeric
> timestamps and the ``time`` module API.  Moreover, using a dedicated class
> allows to enable desirable behaviours by default, for example the case
> insensitivity of Windows paths.
> Proposal
> ========
> Class hierarchy
> ---------------
> The `pathlib`_ module implements a simple hierarchy of classes::
>                            +----------+
>                            |          |
>                   ---------| PurePath |--------
>                   |        |          |       |
>                   |        +----------+       |
>                   |             |             |
>                   |             |             |
>                   v             |             v
>            +---------------+    |     +------------+
>            |               |    |     |            |
>            | PurePosixPath |    |     | PureNTPath |
>            |               |    |     |            |
>            +---------------+    |     +------------+
>                   |             v             |
>                   |          +------+         |
>                   |          |      |         |
>                   |   -------| Path |------   |
>                   |   |      |      |     |   |
>                   |   |      +------+     |   |
>                   |   |                   |   |
>                   |   |                   |   |
>                   v   v                   v   v
>              +-----------+              +--------+
>              |           |              |        |
>              | PosixPath |              | NTPath |
>              |           |              |        |
>              +-----------+              +--------+
> This hierarchy divides path classes along two dimensions:
> * a path class can be either pure or concrete: pure classes support only
>   operations that don't need to do any actual I/O, which are most path
>   manipulation operations; concrete classes support all the operations
>   of pure classes, plus operations that do I/O.
> * a path class is of a given flavour according to the kind of operating
>   system paths it represents.  `pathlib`_ implements two flavours: NT paths
>   for the filesystem semantics embodied in Windows systems, POSIX paths for
>   other systems (````'s terminology is re-used here).
> Any pure class can be instantiated on any system: for example, you can
> manipulate ``PurePosixPath`` objects under Windows, ``PureNTPath`` objects
> under Unix, and so on.  However, concrete classes can only be instantiated
> on a matching system: indeed, it would be error-prone to start doing I/O
> with ``NTPath`` objects under Unix, or vice-versa.
> Furthermore, there are two base classes which also act as system-dependent
> factories: ``PurePath`` will instantiate either a ``PurePosixPath`` or a
> ``PureNTPath`` depending on the operating system.  Similarly, ``Path``
> will instantiate either a ``PosixPath`` or a ``NTPath``.
> It is expected that, in most uses, using the ``Path`` class is adequate,
> which is why it has the shortest name of all.
> No confusion with builtins
> --------------------------
> In this proposal, the path classes do not derive from a builtin type.  This
> contrasts with some other Path class proposals which were derived from
> ``str``.  They also do not pretend to implement the sequence protocol:
> if you want a path to act as a sequence, you have to look up a dedicated
> attribute (the ``parts`` attribute).
> By avoiding to pass as builtin types, the path classes minimize the potential
> for confusion if they are combined by accident with genuine builtin types.
> Immutability
> ------------
> Path objects are immutable, which makes them hashable and also prevents a
> class of programming errors.
> Sane behaviour
> --------------
> Little of the functionality from os.path is reused.  Many os.path functions
> are tied by backwards compatibility to confusing or plain wrong behaviour
> (for example, the fact that ``os.path.abspath()`` simplifies ".." path
> components without resolving symlinks first).
> Also, using classes instead of plain strings helps make system-dependent
> behaviours natural.  For example, comparing and ordering Windows path
> objects is case-insensitive, and path separators are automatically converted
> to the platform default.
> Useful notations
> ----------------
> The API tries to provide useful notations all the while avoiding magic.
> Some examples::
>     >>> p = Path('/home/antoine/pathlib/')
>     >>>
>     ''
>     >>> p.ext
>     '.py'
>     >>> p.root
>     '/'
>     >>>
>     < ['/', 'home', 'antoine', 'pathlib', '']>
>     >>> list(p.parents())
>     [PosixPath('/home/antoine/pathlib'), PosixPath('/home/antoine'), PosixPath('/home'), PosixPath('/')]
>     >>> p.exists()
>     True
>     >>> p.st_size
>     928
> Pure paths API
> ==============
> The philosophy of the ``PurePath`` API is to provide a consistent array of
> useful path manipulation operations, without exposing a hodge-podge of
> functions like ``os.path`` does.
> Definitions
> -----------
> First a couple of conventions:
> * All paths can have a drive and a root.  For POSIX paths, the drive is
>   always empty.
> * A relative path has neither drive nor root.
> * A POSIX path is absolute if it has a root.  A Windows path is absolute if
>   it has both a drive *and* a root.  A Windows UNC path (e.g.
>   ``\\some\\share\\myfile.txt``) always has a drive and a root
>   (here, ``\\some\\share`` and ``\\``, respectively).
> * A path which has either a drive *or* a root is said to be anchored.
>   Its anchor is the concatenation of the drive and root.  Under POSIX,
>   "anchored" is the same as "absolute".
> Construction and joining
> ------------------------
> We will present construction and joining together since they expose
> similar semantics.
> The simplest way to construct a path is to pass it its string representation::
>     >>> PurePath('')
>     PurePosixPath('')
> Extraneous path separators and ``"."`` components are eliminated::
>     >>> PurePath('a///b/c/./d/')
>     PurePosixPath('a/b/c/d')
> If you pass several arguments, they will be automatically joined::
>     >>> PurePath('docs', 'Makefile')
>     PurePosixPath('docs/Makefile')
> Joining semantics are similar to os.path.join, in that anchored paths ignore
> the information from the previously joined components::
>     >>> PurePath('/etc', '/usr', 'bin')
>     PurePosixPath('/usr/bin')
> However, with Windows paths, the drive is retained as necessary::
>     >>> PureNTPath('c:/foo', '/Windows')
>     PureNTPath('c:\\Windows')
>     >>> PureNTPath('c:/foo', 'd:')
>     PureNTPath('d:')
> Calling the constructor without any argument creates a path object pointing
> to the logical "current directory"::
>     >>> PurePosixPath()
>     PurePosixPath('.')
> A path can be joined with another using the ``__getitem__`` operator::
>     >>> p = PurePosixPath('foo')
>     >>> p['bar']
>     PurePosixPath('foo/bar')
>     >>> p[PurePosixPath('bar')]
>     PurePosixPath('foo/bar')
> As with constructing, multiple path components can be specified at once::
>     >>> p['bar/xyzzy']
>     PurePosixPath('foo/bar/xyzzy')
> A join() method is also provided, with the same behaviour.  It can serve
> as a factory function::
>     >>> path_factory = p.join
>     >>> path_factory('bar')
>     PurePosixPath('foo/bar')
> Representing
> ------------
> To represent a path (e.g. to pass it to third-party libraries), just call
> ``str()`` on it::
>     >>> p = PurePath('/home/antoine/pathlib/')
>     >>> str(p)
>     '/home/antoine/pathlib/'
>     >>> p = PureNTPath('c:/windows')
>     >>> str(p)
>     'c:\\windows'
> To force the string representation with forward slashes, use the ``as_posix()``
> method::
>     >>> p.as_posix()
>     'c:/windows'
> To get the bytes representation (which might be useful under Unix systems),
> call ``bytes()`` on it, or use the ``as_bytes()`` method::
>     >>> bytes(p)
>     b'/home/antoine/pathlib/'
> Properties
> ----------
> Five simple properties are provided on every path (each can be empty)::
>     >>> p = PureNTPath('c:/pathlib/')
>     >>>
>     'c:'
>     >>> p.root
>     '\\'
>     >>> p.anchor
>     'c:\\'
>     >>>
>     ''
>     >>> p.ext
>     '.py'
> Sequence-like access
> --------------------
> The ``parts`` property provides read-only sequence access to a path object::
>     >>> p = PurePosixPath('/etc/init.d')
>     >>>
>     < ['/', 'etc', 'init.d']>
> Simple indexing returns the individual path component as a string, while
> slicing returns a new path object constructed from the selected components::
>     >>>[-1]
>     'init.d'
>     >>>[:-1]
>     PurePosixPath('/etc')
> Windows paths handle the drive and the root as a single path component::
>     >>> p = PureNTPath('c:/')
>     >>>
>     < ['c:\\', '']>
>     >>> p.root
>     '\\'
>     >>>[0]
>     'c:\\'
> (separating them would be wrong, since ``C:`` is not the parent of ``C:\\``).
> The ``parent()`` method returns an ancestor of the path::
>     >>> p.parent()
>     PureNTPath('c:\\python33\\bin')
>     >>> p.parent(2)
>     PureNTPath('c:\\python33')
>     >>> p.parent(3)
>     PureNTPath('c:\\')
> The ``parents()`` method automates repeated invocations of ``parent()``, until
> the anchor is reached::
>     >>> p = PureNTPath('c:/python33/bin/python.exe')
>     >>> for parent in p.parents(): parent
>     ...
>     PureNTPath('c:\\python33\\bin')
>     PureNTPath('c:\\python33')
>     PureNTPath('c:\\')
> Querying
> --------
> ``is_relative()`` returns True if the path is relative (see definition
> above), False otherwise.
> ``is_reserved()`` returns True if a Windows path is a reserved path such
> as ``CON`` or ``NUL``.  It always returns False for POSIX paths.
> ``match()`` matches the path against a glob pattern::
>     >>> PureNTPath('c:/PATHLIB/').match('c:*lib/*.PY')
>     True
> ``relative()`` returns a new relative path by stripping the drive and root::
>     >>> PurePosixPath('').relative()
>     PurePosixPath('')
>     >>> PurePosixPath('/').relative()
>     PurePosixPath('')
> ``relative_to()`` computes the relative difference of a path to another::
>     >>> PurePosixPath('/usr/bin/python').relative_to('/usr')
>     PurePosixPath('bin/python')
> ``normcase()`` returns a case-folded version of the path for NT paths::
>     >>> PurePosixPath('CAPS').normcase()
>     PurePosixPath('CAPS')
>     >>> PureNTPath('CAPS').normcase()
>     PureNTPath('caps')
> Concrete paths API
> ==================
> In addition to the operations of the pure API, concrete paths provide
> additional methods which actually access the filesystem to query or mutate
> information.
> Constructing
> ------------
> The classmethod ``cwd()`` creates a path object pointing to the current
> working directory in absolute form::
>     >>> Path.cwd()
>     PosixPath('/home/antoine/pathlib')
> File metadata
> -------------
> The ``stat()`` method caches and returns the file's stat() result;
> ``restat()`` forces refreshing of the cache. ``lstat()`` is also provided,
> but doesn't have any caching behaviour::
>     >>> p.stat()
>     posix.stat_result(st_mode=33277, st_ino=7483155, st_dev=2053, st_nlink=1, st_uid=500, st_gid=500, st_size=928, st_atime=1343597970, st_mtime=1328287308, st_ctime=1343597964)
> For ease of use, direct attribute access to the fields of the stat structure
> is provided over the path object itself::
>     >>> p.st_size
>     928
>     >>> p.st_mtime
>     1328287308.889562
> Higher-level methods help examine the kind of the file::
>     >>> p.exists()
>     True
>     >>> p.is_file()
>     True
>     >>> p.is_dir()
>     False
>     >>> p.is_symlink()
>     False
> The file owner and group names (rather than numeric ids) are queried
> through matching properties::
>     >>> p = Path('/etc/shadow')
>     >>> p.owner
>     'root'
>     >>>
>     'shadow'
> Path resolution
> ---------------
> The ``resolve()`` method makes a path absolute, resolving any symlink on
> the way.  It is the only operation which will remove "``..``" path components.
> Directory walking
> -----------------
> Simple (non-recursive) directory access is done by iteration::
>     >>> p = Path('docs')
>     >>> for child in p: child
>     ...
>     PosixPath('docs/')
>     PosixPath('docs/_templates')
>     PosixPath('docs/make.bat')
>     PosixPath('docs/index.rst')
>     PosixPath('docs/_build')
>     PosixPath('docs/_static')
>     PosixPath('docs/Makefile')
> This allows simple filtering through list comprehensions::
>     >>> p = Path('.')
>     >>> [child for child in p if child.is_dir()]
>     [PosixPath('.hg'), PosixPath('docs'), PosixPath('dist'), PosixPath('__pycache__'), PosixPath('build')]
> Simple and recursive globbing is also provided::
>     >>> for child in p.glob('**/*.py'): child
>     ...
>     PosixPath('')
>     PosixPath('')
>     PosixPath('')
>     PosixPath('docs/')
>     PosixPath('build/lib/')
> File opening
> ------------
> The ``open()`` method provides a file opening API similar to the builtin
> ``open()`` method::
>     >>> p = Path('')
>     >>> with as f: f.readline()
>     ...
>     '#!/usr/bin/env python3\n'
> The ``raw_open()`` method, on the other hand, is similar to ````::
>     >>> fd = p.raw_open(os.O_RDONLY)
>     >>>, 15)
>     b'#!/usr/bin/env '
> Filesystem alteration
> ---------------------
> Several common filesystem operations are provided as methods: ``touch()``,
> ``mkdir()``, ``rename()``, ``replace()``, ``unlink()``, ``rmdir()``,
> ``chmod()``, ``lchmod()``, ``symlink_to()``.  More operations could be
> provided, for example some of the functionality of the shutil module.
> Experimental openat() support
> -----------------------------
> On compatible POSIX systems, the concrete PosixPath class can take advantage
> of \*at() functions (`openat()`_ and friends), and manages the bookkeeping of
> open file descriptors as necessary.  Support is enabled by passing the
> *use_openat* argument to the constructor::
>     >>> p = Path(".", use_openat=True)
> Then all paths constructed by navigating this path (either by iteration or
> indexing) will also use the openat() family of functions.  The point of using
> these functions is to avoid race conditions whereby a given directory is
> silently replaced with another (often a symbolic link to a sensitive system
> location) between two accesses.
> .. _`openat()`:
> Copyright
> =========
> This document has been placed into the public domain.
> ..
>     Local Variables:
>     mode: indented-text
>     indent-tabs-mode: nil
>     sentence-end-double-space: t
>     fill-column: 70
>     coding: utf-8

Read my blog! I depend on your acceptance of my opinion! I am interesting!
Follow me if you're into that sort of thing:

From ethan at  Sat Oct  6 18:20:00 2012
From: ethan at (Ethan Furman)
Date: Sat, 06 Oct 2012 09:20:00 -0700
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>	<>	<>
	<k4ntu7$9c1$>	<>
Message-ID: <>

Stephen J. Turnbull wrote:
> Antoine Pitrou writes:
>> Richard Oudkerk wrote:
>>> Maybe p.basename could be shorthand for'.')[0].
>> Wouldn't there be some confusion with os.path.basename:
>>--> os.path.basename('a/b/c.ext')
>> 'c.ext'

I wouldn't worry too much about this; after all, we are trying to 
replace a primitive system with a more advanced, user-friendly one.

> Also there are applications where "basenames" contain periods (eg,
> wget often creates directories with names like ""), and
> filenames may have multiple extensions, eg, "index.ja.html".
> I think it's reasonable to define "extension" to mean "the portion
> after the last period (if any, maybe including the period)", but I
> think usage of the complementary concept is pretty application-
> specific.

FWIW, my own implementation uses the names

.path  -> c:\foo\bar  or  \\computer_name\share\dir1\dir2
.vol   -> c:              \\computer_name\share
.dirs  ->   \foo\bar                           \dir1\dir2

.filename ->  some_file.txt  or  archive.tar.gz
.basename ->  some_file          archive
.ext      ->           .txt             .tar.gz
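
A minimal sketch of how such a split could be computed. The helper name
split_name and the "split at the first period" rule are assumptions read
off the table above (where archive.tar.gz gives .ext == .tar.gz), not a
description of the actual implementation.

```python
def split_name(filename):
    """Split a filename at its first period (hypothetical helper).

    A leading period, as in ".bashrc", is kept as part of the basename.
    """
    dot = filename.find(".", 1)   # start at 1 to skip a leading period
    if dot == -1:
        return filename, ""
    return filename[:dot], filename[dot:]


print(split_name("some_file.txt"))
print(split_name("archive.tar.gz"))
print(split_name("index.ja.html"))
```

Splitting at the last period instead would only need rfind() in place of
find().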


From ethan at  Sat Oct  6 18:27:26 2012
From: ethan at (Ethan Furman)
Date: Sat, 06 Oct 2012 09:27:26 -0700
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>	<>	<>	<>	<>
Message-ID: <>

Stephen J. Turnbull wrote:
> Ethan Furman writes:
>> Eric Snow wrote:
>>>--> p = PureNTPath('c:/orders/12345/abc67890.dbf')
>>>--> p.replace(ext='.csv')
>>> PureNTPath('c:\\orders\\12345\\abc67890.csv')
>> +1
> How about a more general subst() method?  Indeed, it would need
> keyword arguments for named components like ext, but I often do things
> like "mv ~/Maildir/{tmp,new}/42" in the shell.  I think it would be
> useful to be able to replace any component of a path.

How would 'subst' differ from 'replace'?  As you can see from the 
example, the keyword 'ext' is being used to specify which component gets 
replaced.
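
For comparison, a short sketch using the pathlib module as it later
shipped in the standard library. Its spelling differs from the
replace(ext=...) form discussed here: the class is PureWindowsPath
rather than PureNTPath, and component replacement is done with separate
with_suffix() and with_name() methods. The file names are only
illustrative.

```python
from pathlib import PureWindowsPath

p = PureWindowsPath('c:/orders/12345/abc67890.dbf')

# Each call returns a new path object; p itself is unchanged.
csv = p.with_suffix('.csv')
renamed = p.with_name('summary.dbf')

print(csv)
print(renamed)
```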


From ethan at  Sat Oct  6 18:38:54 2012
From: ethan at (Ethan Furman)
Date: Sat, 06 Oct 2012 09:38:54 -0700
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>
Message-ID: <>

Antoine Pitrou wrote:
> On Sat, 6 Oct 2012 16:40:49 +0400
> Oleg Broytman <phd at> wrote:
>> On Sat, Oct 06, 2012 at 04:26:42PM +0400, Oleg Broytman <phd at> wrote:
>>> newpath = path.with_drive('C:')
>>> newpath = path.with_name('newname')
>>> newpath = path.with_ext('.zip')
>>    BTW, I think having these three -- replacing drive, name and extension --
>> is enough.

I do not.

> What is the point of replacing the drive?

At my work we have identical path structures on several machines, and we 
sometimes move entire branches from one machine to another.  In those 
instances it is good to be able to change from one drive/mount/share to 
another.

> Replacing the name is already trivial: path.parent()[newname]

Or, if '/' is allowed, path.path/newname.

I can see the reasonableness of using indexing (to me, it sorta looks 
like a window onto the path ;) ), but I prefer other methods when 
possible (tender wrists -- arthritis sucks)


From solipsis at  Sat Oct  6 19:08:21 2012
From: solipsis at (Antoine Pitrou)
Date: Sat, 6 Oct 2012 19:08:21 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
References: <>
Message-ID: <>

On Sat, 6 Oct 2012 12:14:40 -0400
Calvin Spealman <ironfroggy at>
wrote:
> It feels like this proposal is "make it object oriented, because
> object oriented is good" without any actual justification or obvious
> problem this solves. The API looks clunky and redundant, and does not
> appear to actually improve anything over the facilities in the os.path
> module.

Personally, I cringe every time I have to type
`os.path.dirname(os.path.dirname(os.path.dirname(...)))` to go two
directories upwards of a given path. Compare, with, say:

>>> p = Path('/a/b/c/d')
>>> p.parent(2)
PosixPath('/a/b')

Really, I don't think os.path is the prettiest or most convenient
"battery" in the stdlib.
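
The same comparison as a runnable sketch. It uses posixpath so the
result does not depend on the host platform, and the pathlib API as it
later shipped, where the draft's p.parent(2) is spelled p.parents[1].

```python
import posixpath
from pathlib import PurePosixPath

path = '/a/b/c/d'

# Two levels up with the function-based API: one nested call per level.
via_os_path = posixpath.dirname(posixpath.dirname(path))

# Two levels up with a path object: parents[0] is the immediate parent.
via_pathlib = PurePosixPath(path).parents[1]

print(via_os_path)
print(via_pathlib)
```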

> This takes a lot of things we can already do with paths and
> files and remixes them into a not-so intuitive API for the sake of
> change, not for the sake of solving a real problem.

Ironing out difficulties such as platform-specific case-sensitivity
rules or the various path separators is a real problem that is not
solved by a os.path-like API, because you can't muck with str and give
it the required semantics for a filesystem path. So people end up
sprinkling their code with calls to os.path.normpath() and/or
os.path.normcase() in the hope that it will appease the Gods of
Portability (which will also lose casing information).
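
A small sketch of the casing problem, assuming the pathlib API as it
later shipped (PureWindowsPath). The sample path is only illustrative.
ntpath.normcase() has to destroy the original casing to make two
spellings compare equal, while a path object can compare
case-insensitively and still print what it was given.

```python
import ntpath
from pathlib import PureWindowsPath

raw = 'C:/Users/Antoine/README.txt'

# The str-based approach: comparable, but the casing is gone.
normalized = ntpath.normcase(raw)

# The object-based approach: equality ignores case, str() keeps it.
a = PureWindowsPath(raw)
b = PureWindowsPath('c:\\users\\antoine\\readme.TXT')

print(normalized)
print(a == b)
print(str(a))
```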

> Not inheriting from str means that we can't directly pass these path
> objects to existing code that just expects a string, so we have a
> really hard boundary around the edges of this new API. It does not
> lend itself well to incrementally transitioning to it from existing
> code.

As discussed in the PEP, I consider inheriting from str to be a mistake
when your intent is to provide different semantics from str.

Why should indexing or iterating over a path produce individual
characters?
Why should Path.split() split over whitespace by default?
Why should "c:\\" be considered unequal to "C:\\" under Windows? 
Why should startswith() work character by character, rather than path
component by path component?

These are all standard str behaviours that are unhelpful when applied
to filesystem paths.
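
Each of these can be seen with a plain str holding a path (the sample
paths below are illustrative):

```python
path = 'C:\\Program Files\\Python33'

first = path[0]           # indexing gives a character, not a component
pieces = path.split()     # split() breaks on the space in "Program Files"
same = ('c:\\' == 'C:\\')  # comparison is case-sensitive

# startswith() matches characters, so /usr/lib "contains" /usr/lib64.
prefix = '/usr/lib64'.startswith('/usr/lib')

print(first, pieces, same, prefix)
```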

As for the transition, you just have to call str() on the path object.
Since str() also works on plain str objects (and is a no-op), it seems
rather painless to me.

(Of course, you are not forced to transition. The PEP doesn't call for
deprecation of os.path.)

> The stat operations and other file-facilities tacked on feel out of
> place, and limited. Why does it make sense to add these facilities to
> path and not other file operations? Why not give me a read method on
> paths? or maybe a copy?

There is always room to improve and complete the API without breaking
compatibility. To quote the PEP: "More operations could be provided,
for example some of the functionality of the shutil module".

The focus of the PEP is not to enumerate every possible file operation,
but to propose the semantic and syntactic foundations (such as how to
join paths, how to divide them into their individual components, etc.).

> Putting lots of file facilities on a path
> object feels wrong because you can't extend it easily. This is one
> place that function(thing) works better than thing.function()

But you can still define a function() taking a Path as an argument, if
you need to.
Similarly, you can define a function() taking a datetime object if the
datetime object's API lacks some useful functionality for you.



Software development and contracting:

From grosser.meister.morti at  Sat Oct  6 19:26:26 2012
From: grosser.meister.morti at (Mathias Panzenböck)
Date: Sat, 06 Oct 2012 19:26:26 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

Would there be something like this:

 >>> prefix.join("some","sub","path")

This would be the same as:

 >>> prefix["some"]["sub"]["path"]

But the join variant would be much less of a finger-twister on non-English keyboards.
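
For reference, the draft PEP quoted earlier in this thread does list a
join() method with the same behaviour as indexing. A sketch of the
equivalent call in the pathlib module as it later shipped, where the
method is named joinpath() and the `/` operator replaces indexing:

```python
from pathlib import PurePosixPath

prefix = PurePosixPath('prefix')

# One call with several components...
joined = prefix.joinpath('some', 'sub', 'path')

# ...gives the same result as joining them one at a time.
chained = prefix / 'some' / 'sub' / 'path'

print(joined)
```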

From ericsnowcurrently at  Sat Oct  6 19:29:37 2012
From: ericsnowcurrently at (Eric Snow)
Date: Sat, 6 Oct 2012 11:29:37 -0600
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 6, 2012 at 6:09 AM, Antoine Pitrou <solipsis at> wrote:
> On Fri, 5 Oct 2012 23:16:55 -0600
> Eric Snow <ericsnowcurrently at>
> wrote:
>> Each namedtuple has a _replace() method that is used to generate a
>> new instance with one or more attributes changed.  We could do
>> something similar here:
> The concrete Path objects' replace() method already maps to
> os.replace().
> Note os.replace() is new in 3.3 and is a portable always-overwriting
> alternative to os.rename():

Sure.  The point is that the API should include some method that works this
way, regardless of what the name ultimately is.  :)  Stephen J.
Turnbull called it subst() and expanded on the idea.


From massimo.dipierro at  Sat Oct  6 19:32:08 2012
From: massimo.dipierro at (Massimo DiPierro)
Date: Sat, 6 Oct 2012 12:32:08 -0500
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

How about something along these lines:

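import os

class Path(str):
    def __add__(self, other):
        # Call str.__add__ directly; "self + ..." would recurse forever.
        return Path(str.__add__(self, os.path.sep + other))
    def __getitem__(self, i):
        return self.split(os.path.sep)[i]
    def with_item(self, i, v):
        # str is immutable, so return a new Path instead of __setitem__.
        items = self.split(os.path.sep)
        items[i] = v
        return Path(os.path.sep.join(items))
    def append(self, v):
        # Cannot change self in place; return the extended path.
        return self + v
    @property
    def filename(self):
        return self.split(os.path.sep)[-1]
    @property
    def folder(self):
        items = self.split(os.path.sep)
        return Path(os.path.sep.join(items[:-1]))

# Expected output below assumes a POSIX system (os.path.sep == '/').
path = Path('/this/is/an/example.png')
print(isinstance(path, str))  # True
print(path[-1])               # example.png
print(path.filename)          # example.png
print(path.folder)            # /this/is/an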

On Oct 6, 2012, at 12:08 PM, Antoine Pitrou wrote:

> On Sat, 6 Oct 2012 12:14:40 -0400
> Calvin Spealman <ironfroggy at>
> wrote:
>> It feels like this proposal is "make it object oriented, because
>> object oriented is good" without any actual justification or obvious
>> problem this solves. The API looks clunky and redundant, and does not
>> appear to actually improve anything over the facilities in the os.path
>> module.
> Personally, I cringe every time I have to type
> `os.path.dirname(os.path.dirname(os.path.dirname(...)))` to go two
> directories upwards of a given path. Compare, with, say:
>>>> p = Path('/a/b/c/d')
>>>> p.parent(2)
> PosixPath('/a/b')
> Really, I don't think os.path is the prettiest or most convenient
> "battery" in the stdlib.
>> This takes a lot of things we can already do with paths and
>> files and remixes them into a not-so intuitive API for the sake of
>> change, not for the sake of solving a real problem.
> Ironing out difficulties such as platform-specific case-sensitivity
> rules or the various path separators is a real problem that is not
> solved by a os.path-like API, because you can't muck with str and give
> it the required semantics for a filesystem path. So people end up
> sprinkling their code with calls to os.path.normpath() and/or
> os.path.normcase() in the hope that it will appease the Gods of
> Portability (which will also lose casing information).
>> Not inheriting from str means that we can't directly path these path
>> objects to existing code that just expects a string, so we have a
>> really hard boundary around the edges of this new API. It does not
>> lend itself well to incrementally transitioning to it from existing
>> code.
> As discussed in the PEP, I consider inheriting from str to be a mistake
> when your intent is to provide different semantics from str.
> Why should indexing or iterating over a path produce individual
> characters?
> Why should Path.split() split over whitespace by default?
> Why should "c:\\" be considered unequal to "C:\\" under Windows? 
> Why should startswith() work character by character, rather than path
> component by path component?
> These are all standard str behaviours that are unhelpful when applied
> to filesystem paths.
> As for the transition, you just have to call str() on the path object.
> Since str() also works on plain str objects (and is a no-op), it seems
> rather painless to me.
> (Of course, you are not forced to transition. The PEP doesn't call for
> deprecation of os.path.)
>> The stat operations and other file-facilities tacked on feel out of
>> place, and limited. Why does it make sense to add these facilities to
>> path and not other file operations? Why not give me a read method on
>> paths? or maybe a copy?
> There is always room to improve and complete the API without breaking
> compatibility. To quote the PEP: "More operations could be provided,
> for example some of the functionality of the shutil module".
> The focus of the PEP is not to enumerate every possible file operation,
> but to propose the semantic and syntactic foundations (such as how to
> join paths, how to divide them into their individual components, etc.).
>> Putting lots of file facilities on a path
>> object feels wrong because you can't extend it easily. This is one
>> place that function(thing) works better than thing.function()
> But you can still define a function() taking a Path as an argument, if
> you need to.
> Similarly, you can define a function() taking a datetime object if the
> datetime object's API lacks some useful functionality for you.
> Regards
> Antoine.

From ericsnowcurrently at  Sat Oct  6 19:41:00 2012
From: ericsnowcurrently at (Eric Snow)
Date: Sat, 6 Oct 2012 11:41:00 -0600
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 6, 2012 at 6:26 AM, Oleg Broytman <phd at> wrote:
> On Sat, Oct 06, 2012 at 02:09:24PM +0200, Antoine Pitrou <solipsis at> wrote:
>> On Fri, 5 Oct 2012 23:16:55 -0600
>> Eric Snow <ericsnowcurrently at>
>> wrote:
>> > Each namedtuple has a _replace() method that is used to generate a
>> > new instance with one or more attributes changed.  We could do
>> > something similar here:
>> The concrete Path objects' replace() method already maps to
>> os.replace().
>    Call it "with":
> newpath = path.with_drive('C:')
> newpath = path.with_name('newname')
> newpath = path.with_ext('.zip')

Yeah, having dedicated methods makes more sense here, given the small
number of candidates for replacement.
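
As a rough sketch of how dedicated methods read in practice: the pathlib
module that later grew out of this PEP spells them with_name() and
with_suffix(). It has no with_drive() or with_ext(), so those names from
the proposal above remain hypothetical. PureWindowsPath is used here so
the result does not depend on the platform running the code.

```python
from pathlib import PureWindowsPath

p = PureWindowsPath('c:/orders/12345/abc67890.dbf')

# Each call returns a new path object; p itself is never modified.
as_csv = p.with_suffix('.csv')
renamed = p.with_name('newname')

print(as_csv)   # c:\orders\12345\abc67890.csv
print(renamed)  # c:\orders\12345\newname
```

One method per replaceable component keeps the call sites readable
without needing a general keyword-based replace().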


From guido at  Sat Oct  6 19:44:37 2012
From: guido at (Guido van Rossum)
Date: Sat, 6 Oct 2012 10:44:37 -0700
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 5, 2012 at 11:25 AM, Antoine Pitrou <solipsis at> wrote:

> This PEP is a resurrection of the idea of having object-oriented
> filesystem paths in the stdlib. It comes with a general API proposal
> as well as a specific implementation (*). The implementation is young
> and discussion is quite open.

Thanks for getting this started! I haven't read the whole PEP or the
whole thread, but I like many of the principles, such as not deriving
from existing built-in types (str or tuple), immutability, explicitly
caring about OS differences, and distinguishing between pure and
impure (I/O-using) operations. (Though admittedly I'm not super-keen
on the specific term "pure".)

I can't say I'm thrilled about overloading p[s], but I can't get too
excited about p/s either; p+s makes more sense but that would beg the
question of how to append an extension to a path (transforming e.g.
'foo/bar' to 'foo/bar.py' by appending '.py'). At the same time I'm
not in the camp that says you can't use / because it's not division.

But rather than diving right into the syntax, I would like to focus on
some use cases. (Some of this may already be in the PEP, my
apologies.) Some things I care about (based on path manipulations I
remember I've written at some point or another):

- Distinguishing absolute paths from relative paths; this affects
joining behavior as for os.path.join().

- Various normal forms that can be used for comparing paths for
equality; there should be a pure normalization as well as an impure
one (like os.path.realpath()).

- An API that encourages Unix lovers to write code that is most likely
also to make sense on Windows.

- An API that encourages Windows lovers to write code that is most
likely also to make sense on Unix.

- Integration with fnmatch (pure) and glob (impure).

- In addition to stat(), some simple derived operations like
getmtime(), getsize(), islink().

- Easy checks and manipulations (applying to the basename) like "ends
with .pyc", "starts with foo", "ends with .tar.gz", "replace .pyc
extension with .py", "remove trailing ~", "append .tmp", "remove
leading @", and so on.

- While it's nice to be able to ask for "the extension" it would be
nice if the checks above would not be hardcoded to use "." as a
separator; and it would be nice if the extension-parsing code could
deal with multiple extensions and wasn't confused by names starting or
ending with a dot.

- Matching on patterns on directory names (e.g. "does not contain a
segment named .hg").

- A matching notation based on glob/fnmatch syntax instead of regular
expressions.
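
Several of the checks in the list above can be sketched with the pathlib
module that eventually came out of this PEP, combined with fnmatch. This
is only an illustration under that assumption; the file names are made
up, and the API under discussion in this thread may differ.

```python
import fnmatch
from pathlib import PurePosixPath

p = PurePosixPath('project/.hg/store/archive.tar.gz')

last_ext = p.suffix                          # '.gz'
all_exts = p.suffixes                        # ['.tar', '.gz']
dotfile_ext = PurePosixPath('.bashrc').suffix  # '': a leading dot is not an extension

# "does not contain a segment named .hg" is a test on the components
in_hg = '.hg' in p.parts                     # True for this path

# pure, glob-style matching on the basename
is_tarball = fnmatch.fnmatch(p.name, '*.tar.gz')

# "replace .pyc extension with .py" and "append .tmp"
source = PurePosixPath('mod.pyc').with_suffix('.py')
temp = p.with_name(p.name + '.tmp')

print(last_ext, all_exts, repr(dotfile_ext), in_hg, is_tarball)
print(source, temp)
```

All of these are pure operations: none of them touches the filesystem.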

PS. Another occasional use for "posix" style paths I have found is
manipulating the path portion of a URL. There are some posix-like
features, e.g. the interpretation of trailing / as "directory", the
requirement of leading / as root, the interpretation of "." and "..",
and the notion of relative paths (although path joining is different).
It would be nice if the "pure" posix path class could be reused for
this purpose, or if a related class with a subset or superset of the
same methods existed. This may influence the basic design somewhat in
showing the need for custom subclasses etc.

--Guido van Rossum

From g.brandl at  Sat Oct  6 19:51:40 2012
From: g.brandl at (Georg Brandl)
Date: Sat, 06 Oct 2012 19:51:40 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <k4pr17$i94$>

Am 06.10.2012 19:32, schrieb Massimo DiPierro:
> How about something along these lines:
> import os
> class Path(str):
>     def __add__(self,other):
>         return Path(self+os.path.sep+other)
>     def __getitem__(self,i):
>         return self.split(os.path.sep)[i]
>     def __setitem__(self,i,v):
>         items = self.split(os.path.sep)
>         items[i]=v
>         return Path(os.path.sep.join(items))
>     def append(self,v):
>         self += os.path.sep+v
>     @property
>     def filename(self):
>         return self.split(os.path.sep)[-1]
>     @property
>     def folder(self):
>         items =self.split(os.path.sep)
>         return Path(os.path.sep.join(items[:-1]))
> path = Path('/this/is/an/example.png')
> print isinstance(path,str) # True
> print path[-1] # example.png
> print path.filename # example.png
> print path.folder # /this/is/an

If you inherit from str, you cannot override any of the operations that
str already has (i.e. __add__, __getitem__).  And obviously you also
can't make it mutable, i.e. __setitem__.


From jeanpierreda at  Sat Oct  6 19:53:55 2012
From: jeanpierreda at (Devin Jeanpierre)
Date: Sat, 6 Oct 2012 13:53:55 -0400
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 6, 2012 at 12:14 PM, Calvin Spealman <ironfroggy at> wrote:
> The stat operations and other file-facilities tacked on feel out of
> place, and limited. Why does it make sense to add these facilities to
> path and not other file operations? Why not give me a read method on
> paths? or maybe a copy? Putting lots of file facilities on a path
> object feels wrong because you can't extend it easily. This is one
> place that function(thing) works better than thing.function()

The only reason to have objects for anything is to let people have
other implementations that do something else with the same method.

I remember one of the advantages to having an object-oriented path
API, that I always wanted, is that the actual filesystem doesn't have
to be what the paths access. They could be names for web resources, or
files within a zip archive, or virtual files on a pretend hard drive
in your demo application. That's fantastic to have, imo, and it's
something function calls (like you suggest) can't possibly support,
because functions aren't extensibly polymorphic.

If we don't get this sort of polymorphism of functionality, there's
very little point to an object oriented path API. It is syntax sugar
for function calls with slightly better type safety (NTPath(...) /
UnixPath(...) == TypeError -- I hope.)

So I'd assume the reason that these methods exist is to enable polymorphism.

As for why your suggested methods don't exist, they are better written
as functions because they don't need to be ad-hoc polymorphic, they
work just fine as regular functions that call methods on path objects.

def read(path):
    return path.open().read()

def copy(path1, path2):
    path2.open('w').write(read(path1))  # won't work for very large
                                        # files, blah blah blah

Whereas the open method cannot work this way, because the path should
define how file opening works. (It might return an io.StringIO for
example.) And the return value of .open() might not be a real file
with a real fd, so you can't implement a stat function in terms of
open and f.fileno() and such. And so on.
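
A minimal sketch of that kind of polymorphism, assuming a hypothetical
MemoryPath class (not part of the PEP or of any library) whose open()
returns an io.StringIO instead of a real file. The generic read()
function only relies on the path's own open() method, so it works
without any filesystem behind it.

```python
import io

class MemoryPath:
    """Hypothetical path whose 'files' live in a dict, not on disk."""

    def __init__(self, store, name):
        self.store = store
        self.name = name

    def open(self, mode='r'):
        # The path object decides what "opening" means.
        return io.StringIO(self.store[self.name])

def read(path):
    # Plain function: works with any object that provides open().
    with path.open() as f:
        return f.read()

store = {'greeting.txt': 'hello'}
content = read(MemoryPath(store, 'greeting.txt'))
print(content)  # hello
```

The object returned by open() here has no file descriptor, which is why
stat-like operations would also have to be methods on the path.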

-- Devin

From g.brandl at  Sat Oct  6 19:57:02 2012
From: g.brandl at (Georg Brandl)
Date: Sat, 06 Oct 2012 19:57:02 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <k4prb9$mhh$>

Am 06.10.2012 16:49, schrieb Stephen J. Turnbull:
> Antoine Pitrou writes:
>  > >  > Someone else proposed overloading '+', which would be confusing
>  > >  > since we need to be able to combine paths and regular strings, for
>  > >  > ease of use.
>  > > 
>  > > Is it really that obnoxious to write "p + Path('bar')" (where p is a
>  > > Path)?
>  > > 
>  > > What about the case "'bar' + p"?  Since Python isn't C, you can't
>  > > express that as "'bar'[p]"!
>  > 
>  > The issue I envision is if you write `p + "bar"`, thinking p is a Path,
>  > and p is actually a str object. It won't raise, but give you the wrong
>  > result.
> No, my point is that for me prepending new segments is quite common,
> though not as common as appending them.  The asymmetry of the bracket
> operator means that there's no easy way to deal with that.
> On the other hand, `p + Path('foo')` and `Path('foo') + p` (where p is
> a Path, not a string) both seem reasonable to me.  It's true that one
> could screw up as you suggest, but that requires *two* mistakes, first
> thinking that p is a Path when it's a string, and then forgetting to
> convert 'bar' to Path.  I don't think that's very likely if you don't
> allow mixing strings and Paths without explicit conversion.

But having to call Path() explicitly every time is not very convenient either;
in that case you can also call .join() -- and I bet people would prefer

  p + Path('foo/bar/baz')

(which is probably not correct in all cases) to

  p + Path('foo') + Path('bar') + Path('baz')

just because it's such a pain.

On the other hand, when the explicit conversion is not needed, confusion
will ensue, as Antoine says.

In any case, for me using "+" to join paths is quite ugly.  I guess it's
because after all, I think of the underlying path as a string, and "+" is
hardwired in my brain as string concatenation (at least in Python).
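
To make the confusion concrete, here is a small sketch using the "/"
spelling from the pathlib module that was eventually added to the
stdlib (an assumption relative to this thread, where the operator was
still undecided). With "+", a plain str that was mistaken for a path
concatenates silently; with "/", the same mistake raises.

```python
from pathlib import PurePosixPath

p = 'foo'                      # believed to be a path, actually a str
concatenated = p + 'bar'       # no error, but 'foobar' rather than 'foo/bar'

try:
    p / 'bar'                  # str does not define "/"
    raised = False
except TypeError:
    raised = True

appended = PurePosixPath('foo') / 'bar'    # foo/bar
prepended = 'foo' / PurePosixPath('bar')   # foo/bar, str on the left also works

print(concatenated, raised, appended, prepended)
```

The last line shows that prepending a plain string segment does not
require an explicit conversion either.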


From massimo.dipierro at  Sat Oct  6 20:22:06 2012
From: massimo.dipierro at (Massimo DiPierro)
Date: Sat, 6 Oct 2012 13:22:06 -0500
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <k4pr17$i94$>
References: <>
Message-ID: <>

I was thinking of the api more than the implementation.
The point to me is that it would be nice to have something that behaves as a string and as a list at the same time.
Here is another possible incomplete implementation.

import os
class Path(object):
    def __init__(self,s='/',sep=os.path.sep):
        self.sep = sep
        self.s = s.split(sep)
    def __str__(self):
        return self.sep.join(self.s)
    def __add__(self,other):
        if other[0]=='':
            return Path(other)
        else:
            return Path(str(self)+os.sep+str(other))
    def __getitem__(self,i):
        return self.s[i]
    def __setitem__(self,i,v):
        self.s[i] = v
    def append(self,v):
        self.s.append(v)
    @property
    def filename(self):
        return self.s[-1]
    @property
    def folder(self):
        return Path(self.sep.join(self.s[:-1]))

>>> path = Path('/this/is/an/example.png')
>>> print path[-1]
>>> print path.filename
>>> print path.folder
>>> path[1]='that'
>>> print path.folder + 'this'

On Oct 6, 2012, at 12:51 PM, Georg Brandl wrote:

> Am 06.10.2012 19:32, schrieb Massimo DiPierro:
>> How about something along these lines:
>> import os
>> class Path(str):
>>    def __add__(self,other):
>>        return Path(self+os.path.sep+other)
>>    def __getitem__(self,i):
>>        return self.split(os.path.sep)[i]
>>    def __setitem__(self,i,v):
>>        items = self.split(os.path.sep)
>>        items[i]=v
>>        return Path(os.path.sep.join(items))
>>    def append(self,v):
>>        self += os.path.sep+v
>>    @property
>>    def filename(self):
>>        return self.split(os.path.sep)[-1]
>>    @property
>>    def folder(self):
>>        items =self.split(os.path.sep)
>>        return Path(os.path.sep.join(items[:-1]))
>> path = Path('/this/is/an/example.png')
>> print isinstance(path,str) # True
>> print path[-1] # example.png
>> print path.filename # example.png
>> print path.folder # /this/is/an
> If you inherit from str, you cannot override any of the operations that
> str already has (i.e. __add__, __getitem__).  And obviously you also
> can't make it mutable, i.e. __setitem__.
> Georg

From phd at  Sat Oct  6 11:55:53 2012
From: phd at (Oleg Broytman)
Date: Sat, 6 Oct 2012 13:55:53 +0400
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 06, 2012 at 05:04:44PM +0900, "Stephen J. Turnbull" <stephen at> wrote:
>  > Eric Snow wrote:
>  > > Each namedtuple has a _replace() method that is used to generate a
>  > > new instance with one or more attributes changed.  We could do
>  > > something similar here:
>  > > 
>  > >>>> p = PureNTPath('c:/orders/12345/abc67890.dbf')
>  > >>>> p.replace(ext='.csv')
>  > > PureNTPath('c:\\orders\\12345\\abc67890.csv')
> How about a more general subst() method?  Indeed, it would need
> keyword arguments for named components like ext, but I often do things
> like "mv ~/Maildir/{tmp,new}/42" in the shell.  I think it would be
> useful to be able to replace any component of a path.

   I think this would be overgeneralization. IMO there is no need to
replace parts beyond drive/name/extension. To "replace" root or path
components just construct a new Path.

     Oleg Broytman              phd at
           Programmers don't die, they just GOSUB without RETURN.

From ironfroggy at  Sat Oct  6 20:42:22 2012
From: ironfroggy at (Calvin Spealman)
Date: Sat, 6 Oct 2012 14:42:22 -0400
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 6, 2012 at 1:08 PM, Antoine Pitrou <solipsis at> wrote:
> On Sat, 6 Oct 2012 12:14:40 -0400
> Calvin Spealman <ironfroggy at>
> wrote:
>> It feels like this proposal is "make it object oriented, because
>> object oriented is good" without any actual justification or obvious
>> problem this solves. The API looks clunky and redundant, and does not
>> appear to actually improve anything over the facilities in the os.path
>> module.
> Personally, I cringe every time I have to type
> `os.path.dirname(os.path.dirname(os.path.dirname(...)))` to go two
> directories upwards of a given path. Compare, with, say:
>>>> p = Path('/a/b/c/d')
>>>> p.parent(2)
> PosixPath('/a/b')

I would never do the first version in the first place. I would just
join(my_path, "../..")

Note that we really need to get out of the habit of "import os"
instead of "from os.path import join, etc..." We are making our code
uglier and arbitrarily creating many of your concerns by making the
use of os.path harder than it should be.
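
For comparison, a short sketch of the two os.path spellings, using
posixpath directly so the output is the same on every platform. The
path '/a/b/c/d' is just an example value.

```python
import posixpath

my_path = '/a/b/c/d'

nested = posixpath.dirname(posixpath.dirname(my_path))
joined = posixpath.join(my_path, '../..')
normalized = posixpath.normpath(joined)

print(nested)      # /a/b
print(joined)      # /a/b/c/d/../..
print(normalized)  # /a/b
```

Note that join() alone leaves the ".." segments in place; they only
disappear after normpath(), which works purely on the string and so can
change the meaning of a path that goes through symbolic links.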

> Really, I don't think os.path is the prettiest or most convenient
> "battery" in the stdlib.
>> This takes a lot of things we can already do with paths and
>> files and remixes them into a not-so intuitive API for the sake of
>> change, not for the sake of solving a real problem.
> Ironing out difficulties such as platform-specific case-sensitivity
> rules or the various path separators is a real problem that is not
> solved by a os.path-like API, because you can't muck with str and give
> it the required semantics for a filesystem path. So people end up
> sprinkling their code with calls to os.path.normpath() and/or
> os.path.normcase() in the hope that it will appease the Gods of
> Portability (which will also lose casing information).

I agree this stuff is difficult, but I think normalizing is a lot more
predictable than lots of platform specific paths (both FS and code

>> Not inheriting from str means that we can't directly path these path
>> objects to existing code that just expects a string, so we have a
>> really hard boundary around the edges of this new API. It does not
>> lend itself well to incrementally transitioning to it from existing
>> code.
> As discussed in the PEP, I consider inheriting from str to be a mistake
> when your intent is to provide different semantics from str.
> Why should indexing or iterating over a path produce individual
> characters?
> Why should Path.split() split over whitespace by default?
> Why should "c:\\" be considered unequal to "C:\\" under Windows?
> Why should startswith() work character by character, rather than path
> component by path component?

Good points, but I'm not convinced that subclassing str means
you can't change these in your subclass.

> These are all standard str behaviours that are unhelpful when applied
> to filesystem paths.

We agree there.
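
A brief sketch of the difference in semantics, using the pure Windows
path class from the pathlib module that was later added to the stdlib
(the class was called PureNTPath in the draft discussed here):

```python
from pathlib import PureWindowsPath

# Plain strings compare character by character.
str_equal = ('c:\\' == 'C:\\')

# Pure Windows paths compare case-insensitively but keep their casing.
path_equal = (PureWindowsPath('c:\\') == PureWindowsPath('C:\\'))
p = PureWindowsPath('c:/Orders/Job')

print(str_equal)   # False
print(path_equal)  # True
print(p.parts)     # ('c:\\', 'Orders', 'Job')
print(str(p))      # c:\Orders\Job
```

Unlike os.path.normcase(), the comparison does not throw away the
original casing of the path.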

> As for the transition, you just have to call str() on the path object.
> Since str() also works on plain str objects (and is a no-op), it seems
> rather painless to me.

But then I lose all the helpful path information. Something further
down the call chain, path aware, might be able to make use of it.

> (Of course, you are not forced to transition. The PEP doesn't call for
> deprecation of os.path.)

If we are only adding something redundant and intend to leave both
forever, it only feels like bloat. We should be shrinking the stdlib,
not growing it with redundant APIs.

>> The stat operations and other file-facilities tacked on feel out of
>> place, and limited. Why does it make sense to add these facilities to
>> path and not other file operations? Why not give me a read method on
>> paths? or maybe a copy?
> There is always room to improve and complete the API without breaking
> compatibility. To quote the PEP: "More operations could be provided,
> for example some of the functionality of the shutil module".

What I meant is that I can't extend it in third party code without
being second class. I can add another library that does file
operations os.path or stat() don't provide, and they sit side by side.

> The focus of the PEP is not to enumerate every possible file operation,
> but to propose the semantic and syntactic foundations (such as how to
> join paths, how to divide them into their individual components, etc.).
>> Putting lots of file facilities on a path
>> object feels wrong because you can't extend it easily. This is one
>> place that function(thing) works better than thing.function()
> But you can still define a function() taking a Path as an argument, if
> you need to.
> Similarly, you can define a function() taking a datetime object if the
> datetime object's API lacks some useful functionality for you.
> Regards
> Antoine.

Read my blog! I depend on your acceptance of my opinion! I am interesting!
Follow me if you're into that sort of thing:

From ethan at  Sat Oct  6 20:39:17 2012
From: ethan at (Ethan Furman)
Date: Sat, 06 Oct 2012 11:39:17 -0700
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <k4pr17$i94$>
References: <>	<>	<>	<>
Message-ID: <>

Georg Brandl wrote:
> If you inherit from str, you cannot override any of the operations that
> str already has (i.e. __add__, __getitem__).  

Is this a 3.x thing?  My 2.x version of Path overrides many of the str 
methods and works just fine.

> And obviously you also can't make it mutable, i.e. __setitem__.

Well, since Paths (both Antoine's and mine) are immutable that's not an
issue.


From ethan at  Sat Oct  6 20:44:02 2012
From: ethan at (Ethan Furman)
Date: Sat, 06 Oct 2012 11:44:02 -0700
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>
Message-ID: <>

Oleg Broytman wrote:
> On Sat, Oct 06, 2012 at 05:04:44PM +0900, "Stephen J. Turnbull" <stephen at> wrote:
>>  > Eric Snow wrote:
>>  > > Each namedtuple has a _replace() method that is used to generate a
>>  > > new instance with one or more attributes changed.  We could do
>>  > > something similar here:
>>  > > 
>>  > >>>> p = PureNTPath('c:/orders/12345/abc67890.dbf')
>>  > >>>> p.replace(ext='.csv')
>>  > > PureNTPath('c:\\orders\\12345\\abc67890.csv')
>> How about a more general subst() method?  Indeed, it would need
>> keyword arguments for named components like ext, but I often do things
>> like "mv ~/Maildir/{tmp,new}/42" in the shell.  I think it would be
>> useful to be able to replace any component of a path.
>    I think this would be overgeneralization. IMO there is no need to
> replace parts beyond drive/name/extension. To "replace" root or path
> components just construct a new Path.

And if your new path is exactly the same as the old, /except/ for the 
root?  Are you suggesting something like:

--> p = PureNTPath('c:/orders/12345/abc67890.dbf')
--> q = '//another_machine/share' + + p.filename



From mikegraham at  Sat Oct  6 20:56:21 2012
From: mikegraham at (Mike Graham)
Date: Sat, 6 Oct 2012 14:56:21 -0400
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<k4pr17$i94$> <>
Message-ID: <>

On Sat, Oct 6, 2012 at 2:39 PM, Ethan Furman <ethan at> wrote:
> Georg Brandl wrote:
>> If you inherit from str, you cannot override any of the operations that
>> str already has (i.e. __add__, __getitem__).
> Is this a 3.x thing?  My 2.x version of Path overrides many of the str
> methods and works just fine.

This is for theoretical/practical reasons, not technical ones.


From phd at  Sat Oct  6 21:05:34 2012
From: phd at (Oleg Broytman)
Date: Sat, 6 Oct 2012 23:05:34 +0400
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 06, 2012 at 11:44:02AM -0700, Ethan Furman <ethan at> wrote:
> Oleg Broytman wrote:
> >   IMO there is no need to
> >replace parts beyond drive/name/extension. To "replace" root or path
> >components just construct a new Path.
> And if your new path is exactly the same as the old, /except/ for
> the root?  Are you suggesting something like:
> --> p = PureNTPath('c:/orders/12345/abc67890.dbf')
> --> q = '//another_machine/share' + + p.filename
> ?

   Yes. Even if the new path differs from the old by one letter
somewhere in a middle component.
   "Practicality beats purity". We need to see real use cases to decide
what is really needed.

     Oleg Broytman              phd at
           Programmers don't die, they just GOSUB without RETURN.

From ethan at  Sat Oct  6 20:59:29 2012
From: ethan at (Ethan Furman)
Date: Sat, 06 Oct 2012 11:59:29 -0700
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>	<>	<>	<>	<k4pr17$i94$>	<>
Message-ID: <>

Mike Graham wrote:
> On Sat, Oct 6, 2012 at 2:39 PM, Ethan Furman <ethan at> wrote:
>> Georg Brandl wrote:
>>> If you inherit from str, you cannot override any of the operations that
>>> str already has (i.e. __add__, __getitem__).
>> Is this a 3.x thing?  My 2.x version of Path overrides many of the str
>> methods and works just fine.
> This is for theoretical/practical reasons, not technical ones.

Ah, you mean you can't give them different semantics.  Gotcha.


From storchaka at  Sat Oct  6 22:10:51 2012
From: storchaka at (Serhiy Storchaka)
Date: Sat, 06 Oct 2012 23:10:51 +0300
Subject: [Python-ideas] Propagating StopIteration value
Message-ID: <k4q38d$j8e$>

Now that StopIteration has a value, this value is lost when using functions
that work with iterators/generators (map, filter, itertools).
Therefore, wrapping an iterator, which preserved its semantics in
versions before 3.3, no longer preserves them:

   map(lambda x: x, iterator)
   filter(lambda x: True, iterator)
   itertools.accumulate(iterator, lambda x, y: y)
   itertools.compress(iterator, itertools.cycle([True]))
   itertools.dropwhile(lambda x: False, iterator)
   itertools.filterfalse(lambda x: False, iterator)
   next(itertools.groupby(iterator, lambda x: None))[1]
   itertools.takewhile(lambda x: True, iterator)
   itertools.tee(iterator, 1)[0]

Perhaps it would be worth propagating the original exception (or at least
its value) in functions for which it makes sense.

From g.brandl at  Sat Oct  6 22:20:27 2012
From: g.brandl at (Georg Brandl)
Date: Sat, 06 Oct 2012 22:20:27 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>	<>	<>	<>	<k4pr17$i94$>	<>
Message-ID: <k4q3o6$m9q$>

Am 06.10.2012 20:59, schrieb Ethan Furman:
> Mike Graham wrote:
>> On Sat, Oct 6, 2012 at 2:39 PM, Ethan Furman <ethan at> wrote:
>>> Georg Brandl wrote:
>>>> If you inherit from str, you cannot override any of the operations that
>>>> str already has (i.e. __add__, __getitem__).
>>> Is this a 3.x thing?  My 2.x version of Path overrides many of the str
>>> methods and works just fine.
>> This is for theoretical/practical reasons, not technical ones.
> Ah, you mean you can't give them different semantics.  Gotcha.

Yep.  Not much use being able to pass them directly to APIs expecting strings
if they can't operate on them like any other string :)
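
A minimal sketch of the problem, assuming a hypothetical StrPath class
that inherits from str but makes indexing return path components. Code
written for plain strings accepts the object and then quietly gets the
wrong answer.

```python
class StrPath(str):
    """Hypothetical str subclass whose indexing yields path components."""

    def __getitem__(self, i):
        return str.split(self, '/')[i]

def is_absolute(s):
    # Written for plain strings: looks at the first character.
    return s[0] == '/'

plain = is_absolute('/usr/lib')
subclassed = is_absolute(StrPath('/usr/lib'))

print(plain)       # True
print(subclassed)  # False: index 0 is now the empty first component
```

The subclass passes every isinstance(x, str) check, so nothing warns
the caller that the semantics have changed.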


From ethan at  Sat Oct  6 22:19:54 2012
From: ethan at (Ethan Furman)
Date: Sat, 06 Oct 2012 13:19:54 -0700
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

I was hesitant to put mine on PyPI because there's already a slew of 
others, but for the sake of discussion here it is [1].

Mine is str based, has no actual I/O components, and can easily be used 
in normal os, shutil, etc., calls.

Example usage:

job = '12345'
home = Path('c:/orders')/job
work = Path('c:/work/')
for pdf in glob(work/'*.pdf'):
     dash = pdf.filename.index('-')
     dest = home/'reports'/job + pdf.filename[dash:]
     shutil.copy(pdf, dest)

Assuming I haven't typo'ed anything, the above code takes all the pdf 
files, removes the standard (and useless to me) header info before the 
'-' in the filename, then copies it over to its final resting place.

If I understand Antoine's Path, the code would look something like:

job = '12345'
home = Path('c:/orders/')[job]
work = Path('c:/work/')
for child in work:
     if child.ext != '.pdf':
         continue
     name = child.filename
     dash = name.index('-')
     dest = home['reports'][name]
     shutil.copy(str(child), str(dest))

My biggest objections are the extra str calls, and indexing just doesn't 
look like path concatenation.



Oh, very nice ascii-art!

From solipsis at  Sat Oct  6 22:39:34 2012
From: solipsis at (Antoine Pitrou)
Date: Sat, 6 Oct 2012 22:39:34 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
References: <>
Message-ID: <>

On Sat, 06 Oct 2012 13:19:54 -0700
Ethan Furman <ethan at> wrote:
> If I understand Antoine's Path, the code would look something like:
> job = '12345'
> home = Path('c:/orders/')[job]
> work = Path('c:/work/')
> for child in work:
>      if child.ext != '.pdf':
>          continue

You could actually write `for child in work.glob('*.pdf')`
(non-recursive) or `for child in work.glob('**/*.pdf')` (recursive).
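
A short sketch of both patterns, using the glob() method of the Path
class in the pathlib module that was later added to the stdlib. The
directory layout is created in a temporary directory purely for the
example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp)
    (work / 'a.pdf').touch()
    (work / 'notes.txt').touch()
    (work / 'sub').mkdir()
    (work / 'sub' / 'b.pdf').touch()

    # Only the directory itself.
    flat = sorted(child.name for child in work.glob('*.pdf'))
    # The directory and everything below it.
    deep = sorted(child.name for child in work.glob('**/*.pdf'))

print(flat)  # ['a.pdf']
print(deep)  # ['a.pdf', 'b.pdf']
```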




From mikegraham at  Sat Oct  6 22:47:36 2012
From: mikegraham at (Mike Graham)
Date: Sat, 6 Oct 2012 16:47:36 -0400
Subject: [Python-ideas] Propagating StopIteration value
In-Reply-To: <k4q38d$j8e$>
References: <k4q38d$j8e$>
Message-ID: <>

On Sat, Oct 6, 2012 at 4:10 PM, Serhiy Storchaka <storchaka at> wrote:
> Now that StopIteration has a value, this value is lost when using functions
> that work with iterators/generators (map, filter, itertools). Therefore,
> wrapping an iterator, which preserved its semantics in versions before 3.3,
> no longer preserves them:
>   map(lambda x: x, iterator)
>   filter(lambda x: True, iterator)
>   itertools.accumulate(iterator, lambda x, y: y)
>   itertools.chain(iterator)
>   itertools.compress(iterator, itertools.cycle([True]))
>   itertools.dropwhile(lambda x: False, iterator)
>   itertools.filterfalse(lambda x: False, iterator)
>   next(itertools.groupby(iterator, lambda x: None))[1]
>   itertools.takewhile(lambda x: True, iterator)
>   itertools.tee(iterator, 1)[0]
> Perhaps it would be worth propagating the original exception (or at least its
> value) in functions for which it makes sense.

Can you provide an example of a time when you want to use such a value
with a generator on which you want to use one of these so I can better
understand why this is necessary? The times I'm familiar with wanting
this value I'd usually be manually stepping through my generator.


From storchaka at  Sat Oct  6 23:01:40 2012
From: storchaka at (Serhiy Storchaka)
Date: Sun, 07 Oct 2012 00:01:40 +0300
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <k4q67n$9cs$>

On 05.10.12 21:25, Antoine Pitrou wrote:
> PS: You can all admire my ASCII-art skills.

PurePosixPath and PureNTPath look closer to Path than to PurePath.

> The ``parent()`` method returns an ancestor of the path::

p[:-n] is shorter and looks neater than p.parent(n). Possibly the
``parent()`` method is unnecessary?

From amcnabb at  Sat Oct  6 23:45:40 2012
From: amcnabb at (Andrew McNabb)
Date: Sat, 6 Oct 2012 16:45:40 -0500
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 06, 2012 at 01:54:21PM +1300, Greg Ewing wrote:
> Andrew McNabb wrote:
> >This is the difference between C++ style operators, where the only thing
> >that matters is what the operator symbol looks like, and Python style
> >operators, where an operator symbol is just syntactic sugar.  In Python,
> >the "/" is synonymous with `operator.div` and is defined in terms of the
> >`__div__` special method.  This distinction is why I hate operator
> >overloading in C++ but like it in Python.
> Not sure what you're saying here -- in both languages, operators
> are no more than syntactic sugar for dispatching to an appropriate
> method or function. Python just avoids introducing a special syntax
> for spelling the name of the operator, which is nice, but it's
> not a huge difference.

To clarify my point: in Python, "/" is not just a symbol--it
specifically means "div".

> The same issues of what you *should* use operators for arises in
> both communities, and it seems to be very much a matter of
> personal taste.

Overriding the div operator requires creating a "__div__" special
method, which I think has helped influence personal taste within the
Python community.  I personally would feel dirty creating a "__div__"
method that had absolutely nothing to do with division.

Whether or not the sense of personal taste within the Python community
is directly attributable to this, I believe that overloaded
operators in Python tend to be more predictable and consistent than what
I have seen in C++.

Andrew McNabb
PGP Fingerprint: 8A17 B57C 6879 1863 DE55  8012 AB4D 6098 8826 6868

From ethan at  Sat Oct  6 23:27:52 2012
From: ethan at (Ethan Furman)
Date: Sat, 06 Oct 2012 14:27:52 -0700
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <k4q67n$9cs$>
References: <> <k4q67n$9cs$>
Message-ID: <>

Serhiy Storchaka wrote:
> On 05.10.12 21:25, Antoine Pitrou wrote:
>> PS: You can all admire my ASCII-art skills.
> PurePosixPath and PureNTPath look closer to Path than to PurePath.
>> The ``parent()`` method returns an ancestor of the path::
> p[:-n] is shorter and looks neater than p.parent(n). Possibly the
> ``parent()`` method is unnecessary?

Sequencing currently operates as an os.listdir, so [:-n] would give the 
last entry of the folder.

Perhaps Path should not have a default iteration, but instead have 
.children, .parents, .parts, etc.


From guido at  Sun Oct  7 00:00:54 2012
From: guido at (Guido van Rossum)
Date: Sat, 6 Oct 2012 15:00:54 -0700
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

This is an incredibly important discussion.

I would like to contribute despite my limited experience with the
various popular options. My own async explorations are limited to the
constraints of the App Engine runtime environment, where a rather
unique type of reactor is required. I am developing some ideas around
separating reactors, futures, and yield-based coroutines, but they
take more thinking and probably some experimental coding before I'm
ready to write it up in any detail. For a hint on what I'm after, you
might read up on monocle and my
approach to building coroutines on top of Futures.

In the mean time I'd like to bring up a few higher-order issues:

(1) How important is it to offer a compatibility path for asyncore? I
would have thought that offering an integration path forward for
Twisted and Tornado would be more important.

(2) We're at a fork in the road here. On the one hand, we could choose
to deeply integrate greenlets/gevents into the standard library. (It's
not monkey-patching if it's integrated, after all. :-) I'm not sure
how this would work for other implementations than CPython, or even
how to address CPython on non-x86 architectures. But users seem to
like the programming model: write synchronous code, get async
operation for free. It's easy to write protocol parsers that way. On
the other hand, we could reject this approach: the integration would
never be completely smooth, there's the issue of other implementations
and architectures, it probably would never work smoothly even for
CPython/x86 when 3rd party extension modules are involved.
Callback-based APIs don't have these downsides, but they are harder to
program; however we can make programming them easier by using
yield-based coroutines. Even Twisted offers those (inline callbacks).

Before I invest much more time in these ideas I'd like to at least
have (2) sorted out.

--Guido van Rossum

From massimo.dipierro at  Sun Oct  7 00:10:09 2012
From: massimo.dipierro at (Massimo DiPierro)
Date: Sat, 6 Oct 2012 17:10:09 -0500
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

I would strongly support integrating gevents into the standard library.
That would finally make me switch to Python 3. :-)

On Oct 6, 2012, at 5:00 PM, Guido van Rossum wrote:

> This is an incredibly important discussion.
> I would like to contribute despite my limited experience with the
> various popular options. My own async explorations are limited to the
> constraints of the App Engine runtime environment, where a rather
> unique type of reactor is required. I am developing some ideas around
> separating reactors, futures, and yield-based coroutines, but they
> take more thinking and probably some experimental coding before I'm
> ready to write it up in any detail. For a hint on what I'm after, you
> might read up on monocle ( and my
> approach to building coroutines on top of Futures
> (
> In the mean time I'd like to bring up a few higher-order issues:
> (1) How important is it to offer a compatibility path for asyncore? I
> would have thought that offering an integration path forward for
> Twisted and Tornado would be more important.
> (2) We're at a fork in the road here. On the one hand, we could choose
> to deeply integrate greenlets/gevents into the standard library. (It's
> not monkey-patching if it's integrated, after all. :-) I'm not sure
> how this would work for other implementations than CPython, or even
> how to address CPython on non-x86 architectures. But users seem to
> like the programming model: write synchronous code, get async
> operation for free. It's easy to write protocol parsers that way. On
> the other hand, we could reject this approach: the integration would
> never be completely smooth, there's the issue of other implementations
> and architectures, it probably would never work smoothly even for
> CPython/x86 when 3rd party extension modules are involved.
> Callback-based APIs don't have these downsides, but they are harder to
> program; however we can make programming them easier by using
> yield-based coroutines. Even Twisted offers those (inline callbacks).
> Before I invest much more time in these ideas I'd like to at least
> have (2) sorted out.
> -- 
> --Guido van Rossum (

From solipsis at  Sun Oct  7 00:24:02 2012
From: solipsis at (Antoine Pitrou)
Date: Sun, 7 Oct 2012 00:24:02 +0200
Subject: [Python-ideas] asyncore: included batteries don't fit
References: <>
Message-ID: <>

On Sat, 6 Oct 2012 15:00:54 -0700
Guido van Rossum <guido at> wrote:
> (2) We're at a fork in the road here. On the one hand, we could choose
> to deeply integrate greenlets/gevents into the standard library. (It's
> not monkey-patching if it's integrated, after all. :-) I'm not sure
> how this would work for other implementations than CPython, or even
> how to address CPython on non-x86 architectures. But users seem to
> like the programming model: write synchronous code, get async
> operation for free. It's easy to write protocol parsers that way. On
> the other hand, we could reject this approach: the integration would
> never be completely smooth, there's the issue of other implementations
> and architectures, it probably would never work smoothly even for
> CPython/x86 when 3rd party extension modules are involved.
> Callback-based APIs don't have these downsides, but they are harder to
> program; however we can make programming them easier by using
> yield-based coroutines. Even Twisted offers those (inline callbacks).

greenlets/gevents only get you half the advantages of single-threaded
"async" programming: they get you scalability in the face of a high
number of concurrent connections, but they don't get you the robustness
of cooperative multithreading (because it's not obvious when reading
the code where the possible thread-switching points are).

(I don't actually understand the attraction of gevent, except for
extreme situations; threads should be cheap on a decent OS)




From greg.ewing at  Sun Oct  7 00:41:01 2012
From: greg.ewing at (Greg Ewing)
Date: Sun, 07 Oct 2012 11:41:01 +1300
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <> <>
Message-ID: <>

Antoine Pitrou wrote:
> I didn't choose / at first because I knew this choice would be quite
> contentious. However, if there happens to be a strong majority in its
> favour, why not.

Count me as +1 on / as a path concatenation operator.

It's very intuitive, IMO, and it would free up indexing
for the purpose of extracting pathname components, which
is a more intuitive use for that as well, I think.


From christian at  Sun Oct  7 00:41:24 2012
From: christian at (Christian Heimes)
Date: Sun, 07 Oct 2012 00:41:24 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

Am 05.10.2012 20:25, schrieb Antoine Pitrou:
> Hello,
> This PEP is a resurrection of the idea of having object-oriented
> filesystem paths in the stdlib. It comes with a general API proposal
> as well as a specific implementation (*). The implementation is young
> and discussion is quite open.

I already gave you my +1 on #python-dev. I have some additional ideas that
I'd like to suggest for pathlib.

* Jason Orendorff's path module has some methods that are quite useful
for shell- and find-like scripts. I especially like
files(pattern=None), dirs(pattern=None) and their recursive counterparts
walkfiles() and walkdirs(). They make code such as recursively removing all
pyc files easy to write:

  for pyc in path.walkfiles('*.py'):
      pyc.remove()

* I'd like to see a convenient method to format sizes in SI units (for
example 1.2 MB, 5 GB) and non-SI units (MiB, GiB, aka human readable,
multiples of 2). I have some code that would be useful for the task.

* Web applications often need to know the mimetype of a file. How about a
mimetype property that returns the mimetype according to the extension?

* Symlink and directory traversal attacks are a constant threat. I'd like
to see a pathlib object that restricts itself and all its offspring to a
directory. Perhaps this can be implemented as a proxy object around a
pathlib object?

* While we are working on pathlib I'd like to improve os.listdir() in two
ways. The os.listdir() function currently returns a list of file names.
This can consume lots of memory for a directory with hundreds of
thousands of files. How about I implement an iterator version that returns
some additional information, too? On Linux and most BSDs you can get the
file type (d_type, e.g. file, directory, symlink) for free.

* Implement "if filename in directory" with os.path.exists().
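
The iterator-style listing from the os.listdir() item above can be sketched with os.scandir(), which was added to the standard library later (Python 3.5) and exposes the file type without an extra stat call where the platform provides it. The helper name list_entries and the demo files are made up for illustration; this is not the code Christian refers to:

```python
import os
import tempfile

def list_entries(directory):
    """Yield (name, kind) pairs lazily instead of building a full list."""
    with os.scandir(directory) as entries:
        for entry in entries:
            # Check for symlinks first, because is_dir() follows them.
            if entry.is_symlink():
                kind = "symlink"
            elif entry.is_dir():
                kind = "directory"
            else:
                kind = "file"
            yield entry.name, kind

# Small throwaway directory to try it on (hypothetical names).
demo = tempfile.mkdtemp()
os.mkdir(os.path.join(demo, "subdir"))
open(os.path.join(demo, "data.txt"), "w").close()

found = dict(list_entries(demo))
print(found)
```

Because list_entries is a generator, a directory with a very large number of entries is never materialised as one list.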


From josiah.carlson at  Sun Oct  7 00:44:05 2012
From: josiah.carlson at (Josiah Carlson)
Date: Sat, 6 Oct 2012 15:44:05 -0700
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 5, 2012 at 1:09 PM, Antoine Pitrou <solipsis at> wrote:
> On Fri, 5 Oct 2012 11:51:21 -0700
> Josiah Carlson <josiah.carlson at>
> wrote:
>> My long-term dream (which has been the case for 6+ years, since I
>> proposed doing it myself on the python-dev mailing list and was told
>> "no") is that whether someone uses urllib2, httplib2, smtpd, requests,
>> ftplib, etc., they all have access to high-quality protocol-level
>> protocol parsers.
> I'm not sure what you're talking about: what were you told "no" about,
> specifically? Your proposal sounds reasonable and (ideally) desirable to
> me.

I've managed to find the email where I half-way proposed it (though
not as pointed as what I posted above):

Phillip J. Eby said in a reply that policy would kill it. My
experience at the time told me that policy was a tough nut to crack,
and my 24-year old self wasn't confident enough to keep pushing (even
though I had the time). Now, my 32-year old self has the confidence
and the knowledge to do it (or advise how to do it), but not the time
(I'm finishing up my first book, doing a conference tour, running a
startup, and preparing for my first child).

One of the big reasons why I like and am pushing Giampaolo's ideas
(and existing code) is my faith that he *can* and *will* do it, if he
says he will.

 - Josiah

From greg.ewing at  Sun Oct  7 00:49:43 2012
From: greg.ewing at (Greg Ewing)
Date: Sun, 07 Oct 2012 11:49:43 +1300
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <> <>
	<> <>
Message-ID: <>

Mark Shannon wrote:
> Actually I did mean the '//' (floor division) operator as it would stand 
> out more than '/'.


This would weaken the mnemonic value. We separate
paths with single slashes, not double slashes.


From greg.ewing at  Sun Oct  7 00:57:04 2012
From: greg.ewing at (Greg Ewing)
Date: Sun, 07 Oct 2012 11:57:04 +1300
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<> <>
	<k4p9lf$a5h$> <>
	<k4pae7$hpg$> <>
Message-ID: <>

Antoine Pitrou wrote:

> True, but since we already have the name attribute it stands reasonable
> for basename to mean something else than name :-)
> Do you have another suggestion?

If we have a method for replacing the extension, I don't think
we have a strong need for a name for "all of the last name except the
extension", because usually all you want that for is so you can add
a different extension (possibly empty).

So I propose to avoid the term "basename" altogether, and just
have

    --> all of the last component
    path.ext --> the extension

    path.with_name(foo) -- replaces all of the last component
    path.with_ext(ext) -- replaces the extension

Then if you really want to extract the last component without the
extension (which I expect to be a rare requirement), you can do



From greg.ewing at  Sun Oct  7 01:01:21 2012
From: greg.ewing at (Greg Ewing)
Date: Sun, 07 Oct 2012 12:01:21 +1300
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

Stephen J. Turnbull wrote:

> On the other hand, `p + Path('foo')` and `Path('foo') + p` (where p is
> a Path, not a string) both seem reasonable to me.

I don't like the idea of using + as the path concatenation
operator, because

    path + ".c"

is an obvious way to add an extension or other suffix to a
filename, and it ought to work.


From greg.ewing at  Sun Oct  7 01:21:01 2012
From: greg.ewing at (Greg Ewing)
Date: Sun, 07 Oct 2012 12:21:01 +1300
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

Antoine Pitrou wrote:
> Personally, I cringe everytime I have to type
> `os.path.dirname(os.path.dirname(os.path.dirname(...)))` to go two
> directories upwards of a given path. Compare, with, say:
>>>> p = Path('/a/b/c/d')
>>>> p.parent(2)
> PosixPath('/a/b')

Or if we allow slicing,



From greg.ewing at  Sun Oct  7 01:22:32 2012
From: greg.ewing at (Greg Ewing)
Date: Sun, 07 Oct 2012 12:22:32 +1300
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <> <>
Message-ID: <>

Mathias Panzenböck wrote:
> Would there be something like this:
>  >>> prefix.join("some","sub","path")

Using a / operator, this would be

    prefix / "some" / "sub" / "path"
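
A small sketch of how this reads with pathlib as it was later released, where / is the join operator and joinpath() is the method spelling. The prefix value is made up:

```python
from pathlib import PurePosixPath

prefix = PurePosixPath("/usr/local")  # hypothetical prefix

# Operator form: each "/" appends one component.
joined = prefix / "some" / "sub" / "path"

# Method form taking several components at once.
same = prefix.joinpath("some", "sub", "path")

print(joined)
```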


From greg.ewing at  Sun Oct  7 01:28:51 2012
From: greg.ewing at (Greg Ewing)
Date: Sun, 07 Oct 2012 12:28:51 +1300
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

Massimo DiPierro wrote:
> How about something along this lines:
> class Path(str):
>     ...
> path = Path('/this/is/an/example.png')
> print path[-1] # example.png

Unfortunately, if you subclass from str, I don't think it will
be feasible to make indexing return pathname components, because
code that's treating it as a string will be expecting it to
index characters.

Similarly you can't make + mean path concatenation -- it must
remain ordinary string concatenation.
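
To make the conflict concrete, a minimal sketch of a str subclass. The class name StrPath and the parts property are invented for this example and are not part of any proposal in the thread:

```python
class StrPath(str):
    """A path type that inherits from str (illustrative name)."""

    @property
    def parts(self):
        # Components have to live under a separate name, because
        # indexing is already taken by str.
        return [part for part in self.split("/") if part]

path = StrPath("/this/is/an/example.png")

last_char = path[-1]         # str indexing: a single character
last_part = path.parts[-1]   # the pathname component
combined = path + ".bak"     # plain string concatenation
```

Here path[-1] is the character 'g', not 'example.png', and path + ".bak" gives back an ordinary str rather than a StrPath.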


From guido at  Sun Oct  7 02:23:48 2012
From: guido at (Guido van Rossum)
Date: Sat, 6 Oct 2012 17:23:48 -0700
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 6, 2012 at 3:24 PM, Antoine Pitrou <solipsis at> wrote:
> On Sat, 6 Oct 2012 15:00:54 -0700
> Guido van Rossum <guido at> wrote:
>> (2) We're at a fork in the road here. On the one hand, we could choose
>> to deeply integrate greenlets/gevents into the standard library. (It's
>> not monkey-patching if it's integrated, after all. :-) I'm not sure
>> how this would work for other implementations than CPython, or even
>> how to address CPython on non-x86 architectures. But users seem to
>> like the programming model: write synchronous code, get async
>> operation for free. It's easy to write protocol parsers that way. On
>> the other hand, we could reject this approach: the integration would
>> never be completely smooth, there's the issue of other implementations
>> and architectures, it probably would never work smoothly even for
>> CPython/x86 when 3rd party extension modules are involved.
>> Callback-based APIs don't have these downsides, but they are harder to
>> program; however we can make programming them easier by using
>> yield-based coroutines. Even Twisted offers those (inline callbacks).
> greenlets/gevents only get you half the advantages of single-threaded
> "async" programming: they get you scalability in the face of a high
> number of concurrent connections, but they don't get you the robustness
> of cooperative multithreading (because it's not obvious when reading
> the code where the possible thread-switching points are).

I used to think that too, long ago, until I discovered that as you add
abstraction layers, cooperative multithreading is untenable -- sooner
or later you will lose track of where the threads are switched.

> (I don't actually understand the attraction of gevent, except for
> extreme situations; threads should be cheap on a decent OS)

I think it's the observation that the number of sockets you can
realistically have open in a single process or machine is always 1-2
orders of magnitude larger than the number of threads you can have --
and this makes sense since the total amount of memory (kernel and
user) to represent a socket is just much smaller than needed for a
thread. Just check the configuration limits of your typical Linux
kernel if you don't believe me. :-)

--Guido van Rossum

From ben+python at  Sun Oct  7 02:41:14 2012
From: ben+python at (Ben Finney)
Date: Sun, 07 Oct 2012 11:41:14 +1100
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
References: <>
Message-ID: <>

Antoine Pitrou <solipsis at>
writes:

>     >>> p = Path('/home/antoine/pathlib/')
>     >>>
>     ''
>     >>> p.ext
>     '.py'

The term “extension” is a barnacle from mainframe filesystems where a
filename is necessarily divided into exactly two parts, the name and the
extension. It doesn't really apply to POSIX filesystems.

On filesystems where the user has always been free to have any number of
parts in a filename, the closest concept is better referred to by the
term “suffix”::

    >>> p.suffix
    '.py'

It may be useful to add an API method to query the *sequence* of
suffixes of a filename::

    >>> p = Path('/home/antoine/pathlib.tar.gz')
    >>>
    'pathlib.tar.gz'
    >>> p.suffix
    '.gz'
    >>> p.suffixes
    ['.tar', '.gz']
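
Both attributes exist under these names in pathlib as it was eventually released, so the suggestion can be tried directly. A short sketch using a pure path so that nothing touches the filesystem:

```python
from pathlib import PurePosixPath

p = PurePosixPath("/home/antoine/pathlib.tar.gz")

last_suffix = p.suffix      # only the final suffix
all_suffixes = p.suffixes   # every suffix, in order

print(last_suffix, all_suffixes)
```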

Thanks for keeping this proposal active, Antoine.

 \         “In any great organization it is far, far safer to be wrong |
  `\         with the majority than to be right alone.” --John Kenneth |
_o__)                                            Galbraith, 1989-07-28 |
Ben Finney

From carlopires at  Sun Oct  7 02:45:59 2012
From: carlopires at (Carlo Pires)
Date: Sat, 6 Oct 2012 21:45:59 -0300
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>


Can we dream of gevent integrated into standard CPython? This would be a
fantastic path for 3.4 :)

And I definitely should move to 3.x.

Because for web programming, I just can't think of another way to program
using Python. I'm seeing some people going to other languages where async
is easier, like Go (some are trying Erlang). Async is a MUST HAVE for web
programming these days...

In my experience, I've found that the "robustness of cooperative
multithreading" comes at the price of code that is difficult to maintain.
And, with a single thread, it never reaches the SMP benefits with ease.
That's why Erlang shines... it abstracts away the hard work of keeping the
switching under control. Gevent walks the same line: it makes the
programmer's life easier.

  Carlo Pires

2012/10/6 Guido van Rossum <guido at>

> This is an incredibly important discussion.
> I would like to contribute despite my limited experience with the
> various popular options. My own async explorations are limited to the
> constraints of the App Engine runtime environment, where a rather
> unique type of reactor is required. I am developing some ideas around
> separating reactors, futures, and yield-based coroutines, but they
> take more thinking and probably some experimental coding before I'm
> ready to write it up in any detail. For a hint on what I'm after, you
> might read up on monocle ( and my
> approach to building coroutines on top of Futures
> (
> ).
> In the mean time I'd like to bring up a few higher-order issues:
> (1) How important is it to offer a compatibility path for asyncore? I
> would have thought that offering an integration path forward for
> Twisted and Tornado would be more important.
> (2) We're at a fork in the road here. On the one hand, we could choose
> to deeply integrate greenlets/gevents into the standard library. (It's
> not monkey-patching if it's integrated, after all. :-) I'm not sure
> how this would work for other implementations than CPython, or even
> how to address CPython on non-x86 architectures. But users seem to
> like the programming model: write synchronous code, get async
> operation for free. It's easy to write protocol parsers that way. On
> the other hand, we could reject this approach: the integration would
> never be completely smooth, there's the issue of other implementations
> and architectures, it probably would never work smoothly even for
> CPython/x86 when 3rd party extension modules are involved.
> Callback-based APIs don't have these downsides, but they are harder to
> program; however we can make programming them easier by using
> yield-based coroutines. Even Twisted offers those (inline callbacks).
> Before I invest much more time in these ideas I'd like to at least
> have (2) sorted out.
> --
> --Guido van Rossum (

From ericsnowcurrently at  Sun Oct  7 02:47:56 2012
From: ericsnowcurrently at (Eric Snow)
Date: Sat, 6 Oct 2012 18:47:56 -0600
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On Oct 6, 2012 6:41 PM, "Ben Finney" <ben+python at> wrote:
> Antoine Pitrou <solipsis at>
> writes:
> >     >>> p = Path('/home/antoine/pathlib/')
> >     >>>
> >     ''
> >     >>> p.ext
> >     '.py'
> The term “extension” is a barnacle from mainframe filesystems where a
> filename is necessarily divided into exactly two parts, the name and the
> extension. It doesn't really apply to POSIX filesystems.
> On filesystems where the user has always been free to have any number of
> parts in a filename, the closest concept is better referred to by the
> term “suffix”::
>     >>> p.suffix
>     '.py'
> It may be useful to add an API method to query the *sequence* of
> suffixes of a filename::
>     >>> p = Path('/home/antoine/pathlib.tar.gz')
>     >>>
>     'pathlib.tar.gz'
>     >>> p.suffix
>     '.gz'
>     >>> p.suffixes
>     ['.tar', '.gz']



From steve at  Sun Oct  7 03:09:44 2012
From: steve at (Steven D'Aprano)
Date: Sun, 07 Oct 2012 12:09:44 +1100
Subject: [Python-ideas] Propagating StopIteration value
In-Reply-To: <k4q38d$j8e$>
References: <k4q38d$j8e$>
Message-ID: <>

On 07/10/12 07:10, Serhiy Storchaka wrote:
> As StopIteration now has a value, this value is lost when using functions which
> work with iterators/generators (map, filter, itertools). Therefore, wrapping
> the iterator, which preserved its semantics in versions before 3.3, no longer
> preserves it:
> Perhaps it would be worthwhile to propagate the original exception (or at
> least its value) in functions for which it makes sense.

A concrete example would be useful for those who don't know about the (new?)
StopIteration.value attribute. I think you are referring to this:

py> def myiter():
...     yield 1
...     raise StopIteration("spam")
...
py> it = map(lambda x:x, myiter())
py> next(it)
1
py> next(it)
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
StopIteration

The argument given to StopIteration is eaten by map.

But this is not *new* to 3.3, it goes back to at least 2.4, so I'm
not sure if you are talking about this or something different.


From ben+python at  Sun Oct  7 03:13:23 2012
From: ben+python at (Ben Finney)
Date: Sun, 07 Oct 2012 12:13:23 +1100
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
References: <>
	<> <>
	<> <k4p9lf$a5h$>
	<> <k4pae7$hpg$>
Message-ID: <>

Greg Ewing <greg.ewing at> writes:

> If we have a method for replacing the extension, I don't think
> we have a strong need for a name for "all of the last name except the
> extension", because usually all you want that for is so you can add
> a different extension (possibly empty).

This is based on the false concept that there is one “extension” in a
filename. On POSIX filesystems, that's just not true; filenames often
have several suffixes in sequence, e.g. “foo.tar.gz” or “,
and each one conveys meaningful intent by whoever named the file.

> So I propose to avoid the term "basename" altogether, and just
> have
> --> all of the last component
>    path.ext --> the extension
>    path.with_name(foo) -- replaces all of the last component
>    path.with_ext(ext) -- replaces the extension

+1 on avoiding the term “basename” for anything to do with the concept
being discussed here, since it already has a different meaning (“the
part of the filename without any leading directory parts”).

-1 on entrenching this false concept of “the extension” of a filename.

 \          Eccles: “I'll get [the job] too, you'll see. I'm wearing a |
  `\        Cambridge tie.”  Greenslade: “What were you doing there?”  |
_o__)  Eccles: “Buying a tie.” --The Goon Show, _The Greenslade Story_ |
Ben Finney

From ethan at  Sun Oct  7 03:13:17 2012
From: ethan at (Ethan Furman)
Date: Sat, 06 Oct 2012 18:13:17 -0700
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

Ben Finney wrote:
> Antoine Pitrou <solipsis at>
> writes:
>>     >>> p = Path('/home/antoine/pathlib/')
>>     >>>
>>     ''
>>     >>> p.ext
>>     '.py'
> The term “extension” is a barnacle from mainframe filesystems where a
> filename is necessarily divided into exactly two parts, the name and the
> extension. It doesn't really apply to POSIX filesystems.
> On filesystems where the user has always been free to have any number of
> parts in a filename, the closest concept is better referred to by the
> term “suffix”::
>     >>> p.suffix
>     '.py'
> It may be useful to add an API method to query the *sequence* of
> suffixes of a filename::
>     >>> p = Path('/home/antoine/pathlib.tar.gz')
>     >>>
>     'pathlib.tar.gz'
>     >>> p.suffix
>     '.gz'
>     >>> p.suffixes
>     ['.tar', '.gz']


From steve at  Sun Oct  7 03:36:30 2012
From: steve at (Steven D'Aprano)
Date: Sun, 07 Oct 2012 12:36:30 +1100
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <> <>
Message-ID: <>

On 07/10/12 09:41, Christian Heimes wrote:

> * Jason Orendorff's path module has some methods that are quite useful
> for shell- and find-like scripts. I especially like
> files(pattern=None), dirs(pattern=None) and their recursive counterparts
> walkfiles() and walkdirs(). They make code such as recursively removing all
> pyc files easy to write:
>    for pyc in path.walkfiles('*.py'):
>        pyc.remove()

Ouch! My source code!!! *grin*

> * I'd like to see a convenient method to format sizes in SI units (for
> example 1.2 MB, 5 GB) and non-SI units (MiB, GiB, aka human readable,
> multiples of 2). I have some code that would be useful for the task.

So do I.

Although it's only listed as an "alpha" package, that's just me being
conservative about allowing changes to the API. The code is actually
fairly mature.

If there is interest in having this in the standard library, I am more
than happy to target 3.4 and commit to maintaining it.


From steve at  Sun Oct  7 03:41:44 2012
From: steve at (Steven D'Aprano)
Date: Sun, 07 Oct 2012 12:41:44 +1100
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On 07/10/12 04:08, Antoine Pitrou wrote:

> Personally, I cringe everytime I have to type
> `os.path.dirname(os.path.dirname(os.path.dirname(...)))` to go two
> directories upwards of a given path. Compare, with, say:

I would cringe too if I did that, because it goes THREE directories
up, not two:

py> path = '/a/b/c/d'
py> os.path.dirname(os.path.dirname(os.path.dirname(path)))
'/a'


>>>> p = Path('/a/b/c/d')
>>>> p.parent(2)
> PosixPath('/a/b')

You know, I don't think I've ever needed to call dirname more than
once at a time, but if I was using it a lot:

parent = os.path.dirname

which is not as short as p.parent(3), but it's still pretty clear.
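
A sketch of the alias approach, assuming POSIX-style paths. The variable names other than parent are made up for illustration:

```python
import os.path

parent = os.path.dirname   # short alias for repeated use

path = "/a/b/c/d"
one_up = parent(path)
three_up = parent(parent(parent(path)))

print(one_up, three_up)
```

Three nested calls strip three components, which is the off-by-one pointed out above.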


From guido at  Sun Oct  7 03:45:54 2012
From: guido at (Guido van Rossum)
Date: Sat, 6 Oct 2012 18:45:54 -0700
Subject: [Python-ideas] Propagating StopIteration value
In-Reply-To: <>
References: <k4q38d$j8e$> <>
Message-ID: <>

On Sat, Oct 6, 2012 at 6:09 PM, Steven D'Aprano <steve at> wrote:
> A concrete example would be useful for those who don't know about the (new?)
> StopIteration.value attribute. I think you are referring to this:
> py> def myiter():
> ...     yield 1
> ...     raise StopIteration("spam")
> ...
> py> it = map(lambda x:x, myiter())
> py> next(it)
> 1
> py> next(it)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> StopIteration
> The argument given to StopIteration is eaten by map.
> But this is not *new* to 3.3, it goes back to at least 2.4, so I'm
> not sure if you are talking about this or something different.

What's new in 3.3 (due to PEP 380) is that instead of the rather
awkward and uncommon

  raise StopIteration("spam")

you can now write

  return "spam"

with exactly the same effect.

But yes, this was all considered and accepted when PEP 380 was debated
(endlessly :-), and I see no reason to change anything about this.
"Don't do that" is the best I can say about it -- there are a zillion
other situations in Python where that's the only sensible motto.
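
A minimal sketch of the PEP 380 behaviour described above, assuming
Python 3.3 or later. The generator name myiter is borrowed from the quoted
example; carried and mapped are illustrative names. Only the return form is
used, because later Python versions (PEP 479) turn an explicit
raise StopIteration inside a generator body into a RuntimeError:

```python
def myiter():
    yield 1
    return "spam"   # the value travels on the StopIteration instance


it = myiter()
first = next(it)

try:
    next(it)
except StopIteration as exc:
    carried = exc.value   # the returned object

# Consumers that only iterate never see the return value.
mapped = list(map(lambda x: x, myiter()))
```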

--Guido van Rossum (

From greg.ewing at  Sun Oct  7 04:11:54 2012
From: greg.ewing at (Greg Ewing)
Date: Sun, 07 Oct 2012 15:11:54 +1300
Subject: [Python-ideas] Propagating StopIteration value
In-Reply-To: <>
References: <k4q38d$j8e$> <>
Message-ID: <>

Steven D'Aprano wrote:

> py> def myiter():
> ...     yield 1
> ...     raise StopIteration("spam")
> ...
> py> it = map(lambda x:x, myiter())
> py> next(it)
> 1
> py> next(it)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> StopIteration
> The argument given to StopIteration is eaten by map.

It's highly debatable whether this is even wrong. The purpose
of StopIteration(value) is for a generator to return a value
to its immediate caller when invoked using yield-from. The
value is not intended to propagate any further than that.

A non-iterator analogy would be

    def f():
       return 42

    def g():
       f()

Would you expect g() to return 42 here?
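
The analogy can be spelled out as a sketch. f and g follow the example
above, with g assumed to call f() and drop the result; inner, outer_drops,
outer_forwards and final_value are hypothetical names added for
illustration:

```python
def f():
    return 42


def g():
    f()   # result discarded, so g() returns None


def inner():
    yield 1
    return "spam"


def outer_drops():
    # The value of the yield-from expression is ignored, like g() above.
    yield from inner()


def outer_forwards():
    # Propagating the value one level further has to be done explicitly.
    return (yield from inner())


def final_value(gen):
    """Exhaust a generator and return the value on its StopIteration."""
    while True:
        try:
            next(gen)
        except StopIteration as exc:
            return exc.value


plain = g()
direct = final_value(inner())
dropped = final_value(outer_drops())
forwarded = final_value(outer_forwards())
```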


From greg.ewing at  Sun Oct  7 04:19:44 2012
From: greg.ewing at (Greg Ewing)
Date: Sun, 07 Oct 2012 15:19:44 +1300
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<> <>
	<k4p9lf$a5h$> <>
	<k4pae7$hpg$> <>
	<> <>
Message-ID: <>

Ben Finney wrote:
> filenames often
> have several suffixes in sequence, e.g. "foo.tar.gz" or ?,
> and each one conveys meaningful intent by whoever named the file.

When I talk about "the extension", I mean the last one. The
vast majority of the time, that's all you're interested in --
you unwrap one layer of the onion at a time, and leave the
rest for the next layer of software up.
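
A short sketch of peeling off only the last extension, using
posixpath.splitext() and, for comparison, the suffix, suffixes and stem
attributes of the pathlib module as it later shipped (those attribute names
are not part of the draft being discussed here):

```python
import posixpath
from pathlib import PurePosixPath

# splitext() only ever removes the final extension.
rest, last = posixpath.splitext("foo.tar.gz")

p = PurePosixPath("foo.tar.gz")
last_suffix = p.suffix      # final extension only
all_suffixes = p.suffixes   # every extension, outermost last
remainder = p.stem          # name with the final extension removed
```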

That's not always true, but it's true often enough that I
think it's worth having special APIs for dealing with the last one.


From josiah.carlson at  Sun Oct  7 04:22:26 2012
From: josiah.carlson at (Josiah Carlson)
Date: Sat, 6 Oct 2012 19:22:26 -0700
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 6, 2012 at 3:00 PM, Guido van Rossum <guido at> wrote:
> This is an incredibly important discussion.
> I would like to contribute despite my limited experience with the
> various popular options. My own async explorations are limited to the
> constraints of the App Engine runtime environment, where a rather
> unique type of reactor is required. I am developing some ideas around
> separating reactors, futures, and yield-based coroutines, but they
> take more thinking and probably some experimental coding before I'm
> ready to write it up in any detail. For a hint on what I'm after, you
> might read up on monocle ( and my
> approach to building coroutines on top of Futures
> (

Yield-based coroutines like monocle are the simplest way to do
multi-paradigm in the same code. Whether you have an async-style
reactor, greenlet-style stack switching, cooperatively scheduled
generator trampolines, or just plain blocking threaded sockets; that
style works with all of them (the futures and wrapper around
everything just looks a little different).
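
As a rough illustration of the "cooperatively scheduled generator
trampoline" case, here is a minimal round-robin sketch. It is not monocle's
API; run_all and ticker are hypothetical names, and a real reactor would
wait on I/O readiness instead of simply rotating tasks:

```python
from collections import deque


def run_all(tasks):
    """Round-robin trampoline: resume each generator until all finish."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # run the task up to its next yield
        except StopIteration:
            continue            # task finished, drop it
        ready.append(task)      # otherwise reschedule it at the back


trace = []


def ticker(name, count):
    for i in range(count):
        trace.append((name, i))
        yield                   # explicit switch point


run_all([ticker("a", 2), ticker("b", 2)])
```

The only places where control can move between tasks are the yield
statements, which is the property the coroutine style relies on.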

That said, it forces everyone to drink the same coroutine-styled
kool-aid. That doesn't bother me. But I understand it, and have built
similar systems before. I don't have an intuition about whether 3rd
parties will like it or will migrate to it. Someone want to ping the
Twisted and Tornado folks about it?

> In the mean time I'd like to bring up a few higher-order issues:
> (1) How important is it to offer a compatibility path for asyncore? I
> would have thought that offering an integration path forward for
> Twisted and Tornado would be more important.
> (2) We're at a fork in the road here. On the one hand, we could choose
> to deeply integrate greenlets/gevents into the standard library. (It's
> not monkey-patching if it's integrated, after all. :-) I'm not sure
> how this would work for other implementations than CPython, or even
> how to address CPython on non-x86 architectures. But users seem to
> like the programming model: write synchronous code, get async
> operation for free. It's easy to write protocol parsers that way. On
> the other hand, we could reject this approach: the integration would
> never be completely smooth, there's the issue of other implementations
> and architectures, it probably would never work smoothly even for
> CPython/x86 when 3rd party extension modules are involved.
> Callback-based APIs don't have these downsides, but they are harder to
> program; however we can make programming them easier by using
> yield-based coroutines. Even Twisted offers those (inline callbacks).
> Before I invest much more time in these ideas I'd like to at least
> have (2) sorted out.

Combining your responses to #1 and now this, are you proposing a path
forward for Twisted/Tornado to be greenlets? That's an interesting
approach to the problem, though I can see the draw. ;)

I have been hesitant on the Twisted side of things for an arbitrarily
selfish reason. After 2-3 hours of reading over a codebase (which I've
done 5 or 6 times in the last 8 years), I ask myself whether I believe
I understand 80+% of how things work; how data flows, how
callbacks/layers are invoked, and whether I could add a piece of
arbitrary functionality to one layer or another (or to determine the
proper layer in which to add the functionality). If my answer is "no",
then my gut says "this is probably a bad idea". But if I start
figuring out the layers before I've finished my 2-3 hours, and I start
finding bugs? Well, then I think it's a much better idea, even if the
implementation is buggy.

Maybe something like Monocle would be better (considering your favor
for that style, it obviously has a leg-up on the competition). I don't
know. But if something like Monocle can merge it all together, then
maybe I'd be happy. Incidentally, I can think of a few different
styles of wrappers that would actually let people using
asyncore-derived stuff use something like Monocle. So maybe that's
really the right answer?

 - Josiah

P.S. Thank you for weighing in on this Guido. Even if it doesn't end
up the way I had originally hoped, at least now there's discussion.

From guido at  Sun Oct  7 06:05:13 2012
From: guido at (Guido van Rossum)
Date: Sat, 6 Oct 2012 21:05:13 -0700
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 6, 2012 at 7:22 PM, Josiah Carlson <josiah.carlson at> wrote:
> On Sat, Oct 6, 2012 at 3:00 PM, Guido van Rossum <guido at> wrote:
>> This is an incredibly important discussion.
>> I would like to contribute despite my limited experience with the
>> various popular options. My own async explorations are limited to the
>> constraints of the App Engine runtime environment, where a rather
>> unique type of reactor is required. I am developing some ideas around
>> separating reactors, futures, and yield-based coroutines, but they
>> take more thinking and probably some experimental coding before I'm
>> ready to write it up in any detail. For a hint on what I'm after, you
>> might read up on monocle ( and my
>> approach to building coroutines on top of Futures
>> (
> Yield-based coroutines like monocle are the simplest way to do
> multi-paradigm in the same code. Whether you have an async-style
> reactor, greenlet-style stack switching, cooperatively scheduled
> generator trampolines, or just plain blocking threaded sockets; that
> style works with all of them (the futures and wrapper around
> everything just looks a little different).

Glad I'm not completely crazy here. :-)

> That said, it forces everyone to drink the same coroutine-styled
> kool-aid. That doesn't bother me. But I understand it, and have built
> similar systems before. I don't have an intuition about whether 3rd
> parties will like it or will migrate to it. Someone want to ping the
> Twisted and Tornado folks about it?

They should be reading this. Or maybe we should bring it up on
python-dev before too long.

>> In the mean time I'd like to bring up a few higher-order issues:
>> (1) How important is it to offer a compatibility path for asyncore? I
>> would have thought that offering an integration path forward for
>> Twisted and Tornado would be more important.
>> (2) We're at a fork in the road here. On the one hand, we could choose
>> to deeply integrate greenlets/gevents into the standard library. (It's
>> not monkey-patching if it's integrated, after all. :-) I'm not sure
>> how this would work for other implementations than CPython, or even
>> how to address CPython on non-x86 architectures. But users seem to
>> like the programming model: write synchronous code, get async
>> operation for free. It's easy to write protocol parsers that way. On
>> the other hand, we could reject this approach: the integration would
>> never be completely smooth, there's the issue of other implementations
>> and architectures, it probably would never work smoothly even for
>> CPython/x86 when 3rd party extension modules are involved.
>> Callback-based APIs don't have these downsides, but they are harder to
>> program; however we can make programming them easier by using
>> yield-based coroutines. Even Twisted offers those (inline callbacks).
>> Before I invest much more time in these ideas I'd like to at least
>> have (2) sorted out.
> Combining your responses to #1 and now this, are you proposing a path
> forward for Twisted/Tornado to be greenlets? That's an interesting
> approach to the problem, though I can see the draw. ;)

Can't tell whether you're serious, but that's not what I meant. Surely
it will never fly for Twisted. Tornado apparently already works with
greenlets (though maybe through a third party hack). But personally
I'd be leaning towards rejecting greenlets, for the same reasons I've
kept the doors tightly shut for Stackless -- I like it fine as a
library, but not as a language feature, because I don't see how it can
be supported on all platforms where Python must be supported.

However I figured that if we define the interfaces well enough, it
might be possible to use (a superficially modified version of)
Twisted's reactors instead of the standard ones, and, orthogonally,
Twisted's Deferreds could be wrapped in the standard Futures (or the
other way around?) when used with a non-Twisted reactor. Which would
hopefully open the door for migrating some of their more useful
protocol parsers into the stdlib.

> I have been hesitant on the Twisted side of things for an arbitrarily
> selfish reason. After 2-3 hours of reading over a codebase (which I've
> done 5 or 6 times in the last 8 years), I ask myself whether I believe
> I understand 80+% of how things work; how data flows, how
> callbacks/layers are invoked, and whether I could add a piece of
> arbitrary functionality to one layer or another (or to determine the
> proper layer in which to add the functionality). If my answer is "no",
> then my gut says "this is probably a bad idea". But if I start
> figuring out the layers before I've finished my 2-3 hours, and I start
> finding bugs? Well, then I think it's a much better idea, even if the
> implementation is buggy.

Can't figure what you're implying here. On which side does Twisted fall for you?

> Maybe something like Monocle would be better (considering your favor
> for that style, it obviously has a leg-up on the competition). I don't
> know. But if something like Monocle can merge it all together, then
> maybe I'd be happy.

My worry is that monocle is too simple and does not cater for advanced
needs. It doesn't seem to have caught on much outside the company
where it originated.

> Incidentally, I can think of a few different
> styles of wrappers that would actually let people using
> asyncore-derived stuff use something like Monocle. So maybe that's
> really the right answer?

I still don't really think asyncore is going to be a problem. It can
easily be separated into a reactor and callbacks.

> Regards,
>  - Josiah
> P.S. Thank you for weighing in on this Guido. Even if it doesn't end
> up the way I had originally hoped, at least now there's discussion.

Hm, there seemed to be plenty of discussion before...

--Guido van Rossum (

From oubiwann at  Sun Oct  7 06:17:23 2012
From: oubiwann at (Duncan McGreggor)
Date: Sat, 6 Oct 2012 21:17:23 -0700
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 6, 2012 at 9:05 PM, Guido van Rossum <guido at> wrote:

> On Sat, Oct 6, 2012 at 7:22 PM, Josiah Carlson <josiah.carlson at>
> wrote:
> > On Sat, Oct 6, 2012 at 3:00 PM, Guido van Rossum <guido at>
> wrote:
> >> This is an incredibly important discussion.
> >>
> >> I would like to contribute despite my limited experience with the
> >> various popular options. My own async explorations are limited to the
> >> constraints of the App Engine runtime environment, where a rather
> >> unique type of reactor is required. I am developing some ideas around
> >> separating reactors, futures, and yield-based coroutines, but they
> >> take more thinking and probably some experimental coding before I'm
> >> ready to write it up in any detail. For a hint on what I'm after, you
> >> might read up on monocle ( and my
> >> approach to building coroutines on top of Futures
> >> (
> ).
> >
> > Yield-based coroutines like monocle are the simplest way to do
> > multi-paradigm in the same code. Whether you have an async-style
> > reactor, greenlet-style stack switching, cooperatively scheduled
> > generator trampolines, or just plain blocking threaded sockets; that
> > style works with all of them (the futures and wrapper around
> > everything just looks a little different).
> Glad I'm not completely crazy here. :-)
> > That said, it forces everyone to drink the same coroutine-styled
> > kool-aid. That doesn't bother me. But I understand it, and have built
> > similar systems before. I don't have an intuition about whether 3rd
> > parties will like it or will migrate to it. Someone want to ping the
> > Twisted and Tornado folks about it?
> They should be reading this.

Yup, we are. I've pinged others in the Twisted cabal on this matter, so
hopefully you'll be hearing from one or more of us soon...


> Or maybe we should bring it up on
> python-dev before too long.
> >> In the mean time I'd like to bring up a few higher-order issues:
> >>
> >> (1) How important is it to offer a compatibility path for asyncore? I
> >> would have thought that offering an integration path forward for
> >> Twisted and Tornado would be more important.
> >>
> >> (2) We're at a fork in the road here. On the one hand, we could choose
> >> to deeply integrate greenlets/gevents into the standard library. (It's
> >> not monkey-patching if it's integrated, after all. :-) I'm not sure
> >> how this would work for other implementations than CPython, or even
> >> how to address CPython on non-x86 architectures. But users seem to
> >> like the programming model: write synchronous code, get async
> >> operation for free. It's easy to write protocol parsers that way. On
> >> the other hand, we could reject this approach: the integration would
> >> never be completely smooth, there's the issue of other implementations
> >> and architectures, it probably would never work smoothly even for
> >> CPython/x86 when 3rd party extension modules are involved.
> >> Callback-based APIs don't have these downsides, but they are harder to
> >> program; however we can make programming them easier by using
> >> yield-based coroutines. Even Twisted offers those (inline callbacks).
> >>
> >> Before I invest much more time in these ideas I'd like to at least
> >> have (2) sorted out.
> >
> > Combining your responses to #1 and now this, are you proposing a path
> > forward for Twisted/Tornado to be greenlets? That's an interesting
> > approach to the problem, though I can see the draw. ;)
> Can't tell whether you're serious, but that's not what I meant. Surely
> it will never fly for Twisted. Tornado apparently already works with
> greenlets (though maybe through a third party hack). But personally
> I'd be leaning towards rejecting greenlets, for the same reasons I've
> kept the doors tightly shut for Stackless -- I like it fine as a
> library, but not as a language feature, because I don't see how it can
> be supported on all platforms where Python must be supported.
> However I figured that if we define the interfaces well enough, it
> might be possible to use (a superficially modified version of)
> Twisted's reactors instead of the standard ones, and, orthogonally,
> Twisted's Deferreds could be wrapped in the standard Futures (or the
> other way around?) when used with a non-Twisted reactor. Which would
> hopefully open the door for migrating some of their more useful
> protocol parsers into the stdlib.
> > I have been hesitant on the Twisted side of things for an arbitrarily
> > selfish reason. After 2-3 hours of reading over a codebase (which I've
> > done 5 or 6 times in the last 8 years), I ask myself whether I believe
> > I understand 80+% of how things work; how data flows, how
> > callbacks/layers are invoked, and whether I could add a piece of
> > arbitrary functionality to one layer or another (or to determine the
> > proper layer in which to add the functionality). If my answer is "no",
> > then my gut says "this is probably a bad idea". But if I start
> > figuring out the layers before I've finished my 2-3 hours, and I start
> > finding bugs? Well, then I think it's a much better idea, even if the
> > implementation is buggy.
> Can't figure what you're implying here. On which side does Twisted fall
> for you?
> > Maybe something like Monocle would be better (considering your favor
> > for that style, it obviously has a leg-up on the competition). I don't
> > know. But if something like Monocle can merge it all together, then
> > maybe I'd be happy.
> My worry is that monocle is too simple and does not cater for advanced
> needs. It doesn't seem to have caught on much outside the company
> where it originated.
> > Incidentally, I can think of a few different
> > styles of wrappers that would actually let people using
> > asyncore-derived stuff use something like Monocle. So maybe that's
> > really the right answer?
> I still don't really think asyncore is going to be a problem. It can
> easily be separated into a reactor and callbacks.
> > Regards,
> >  - Josiah
> >
> > P.S. Thank you for weighing in on this Guido. Even if it doesn't end
> > up the way I had originally hoped, at least now there's discussion.
> Hm, there seemed to be plenty of discussion before...
> --
> --Guido van Rossum (

From jeanpierreda at  Sun Oct  7 06:23:43 2012
From: jeanpierreda at (Devin Jeanpierre)
Date: Sun, 7 Oct 2012 00:23:43 -0400
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 7, 2012 at 12:05 AM, Guido van Rossum <guido at> wrote:
> However I figured that if we define the interfaces well enough, it
> might be possible to use (a superficially modified version of)
> Twisted's reactors instead of the standard ones, and, orthogonally,
> Twisted's Deferreds could be wrapped in the standard Futures (or the
> other way around?) when used with a non-Twisted reactor. Which would
> hopefully open the door for migrating some of their more useful
> protocol parsers into the stdlib.

I thought futures were meant for thread and process pools? The
blocking methods make them a bad fit for an asynchronous networking
toolset.

The Twisted folks have discussed integrating futures and Twisted (see
also the reply, which has some corrections):

-- Devin

From steve at  Sun Oct  7 06:33:42 2012
From: steve at (Steven D'Aprano)
Date: Sun, 07 Oct 2012 15:33:42 +1100
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On 06/10/12 09:54, Andrew McNabb wrote:
> On Sat, Oct 06, 2012 at 08:41:05AM +1000, Steven D'Aprano wrote:
>> On 06/10/12 05:53, Andrew McNabb wrote:
>>> Path concatenation is obviously not a form of division, so it makes
>>> little sense to use the division operator for this purpose.
>> But / is not just a division operator. It is also used for:
>> * alternatives: "tea and/or coffee, breakfast/lunch/dinner"
>> * italic markup: "some apps use /slashes/ for italics"
>> * instead of line breaks when quoting poetry
>> * abbreviations such as n/a b/w c/o and even w/ (not applicable,
>>    between, care of, with)
>> * date separator
> This is the difference between C++ style operators, where the only thing
> that matters is what the operator symbol looks like, and Python style
> operators, where an operator symbol is just syntactic sugar.  In Python,
> the "/" is synonymous with `operator.div` and is defined in terms of the
> `__div__` special method.  This distinction is why I hate operator
> overloading in C++ but like it in Python.

I'm afraid that it's a distinction that seems meaningless to me.

int + int and str + str are not the same, even though the operator symbol
looks the same. Likewise int - int and set - set are not the same even
though they use the same operator symbol. Similarly for & and | operators.
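
A small sketch of the point about one symbol carrying several meanings.
MiniPath is a hypothetical toy class, not the PEP 428 API, and the sketch
assumes Python 3, where "/" dispatches to __truediv__ rather than __div__:

```python
import posixpath


class MiniPath:
    """Toy path wrapper that reuses "/" for joining components."""

    def __init__(self, text):
        self.text = text

    def __truediv__(self, other):
        # Delegate the joining rules to posixpath.join().
        return MiniPath(posixpath.join(self.text, str(other)))

    def __str__(self):
        return self.text


int_sum = 1 + 2                 # arithmetic addition
str_concat = "1" + "2"          # concatenation, same symbol
set_difference = {1, 2, 3} - {2}

joined = MiniPath("/usr") / "lib" / "python3"
```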

For what it is worth, when I am writing pseudocode on paper, just playing
around with ideas, I often use / to join path components:

open(path/name)  # pseudo-code

sort of thing, so I would be much more comfortable writing either of these:




which looks like it ought to be a lookup, not a constructor.


From guido at  Sun Oct  7 06:35:42 2012
From: guido at (Guido van Rossum)
Date: Sat, 6 Oct 2012 21:35:42 -0700
Subject: [Python-ideas]  asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Saturday, October 6, 2012, Devin Jeanpierre wrote:

> On Sun, Oct 7, 2012 at 12:05 AM, Guido van Rossum <guido at>
> wrote:
> > However I figured that if we define the interfaces well enough, it
> > might be possible to use (a superficially modified version of)
> > Twisted's reactors instead of the standard ones, and, orthogonally,
> > Twisted's Deferreds could be wrapped in the standard Futures (or the
> > other way around?) when used with a non-Twisted reactor. Which would
> > hopefully open the door for migrating some of their more useful
> > protocol parsers into the stdlib.
> I thought futures were meant for thread and process pools? The
> blocking methods make them a bad fit for an asynchronous networking
> toolset.

The specific Future implementation in the py3k stdlib uses threads and is
indeed meant for thread and process pools.

But the *concept* of futures works fine in event-based systems, see the
link I posted into the NDB sources. I'm not keen on cancellation and
threadpools FWIW.

> The Twisted folks have discussed integrating futures and Twisted (see
> also the reply, which has some corrections):
> -- Devin

--Guido van Rossum (

From stephen at  Sun Oct  7 10:36:11 2012
From: stephen at (Stephen J. Turnbull)
Date: Sun, 07 Oct 2012 17:36:11 +0900
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

Greg Ewing writes:
 > Stephen J. Turnbull wrote:
 > > On the other hand, `p + Path('foo')` and `Path('foo') + p` (where p is
 > > a Path, not a string) both seem reasonable to me.
 > I don't like the idea of using + as the path concatenation
 > operator, because
 >     path + ".c"
 > is an obvious way to add an extension or other suffix to a
 > filename, and it ought to work.

I don't have a problem with it because I don't append extensions as
often as I substitute, and because I don't think of paths as strings.

I think of (some) strings as representatives of paths.

From stephen at  Sun Oct  7 10:40:18 2012
From: stephen at (Stephen J. Turnbull)
Date: Sun, 07 Oct 2012 17:40:18 +0900
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<> <k4ntu7$9c1$>
Message-ID: <>

Ethan Furman writes:
 > Stephen J. Turnbull wrote:
 > > Antoine Pitrou writes:
 > >> Richard Oudkerk wrote:
 > >>> Maybe p.basename could be shorthand for'.')[0].
 > >> 
 > >> Wouldn't there be some confusion with os.path.basename:
 > >> 
 > >>--> os.path.basename('a/b/c.ext')
 > >> 'c.ext'
 > I wouldn't worry too much about this; after all, we are trying to 
 > replace a primitive system with a more advanced, user-friendly one.

Please, don't oversell your case.  We are *not* replacing POSIX,
Antoine is proposing a system that coexists with it.

From solipsis at  Sun Oct  7 12:09:31 2012
From: solipsis at (Antoine Pitrou)
Date: Sun, 7 Oct 2012 12:09:31 +0200
Subject: [Python-ideas] asyncore: included batteries don't fit
References: <>
Message-ID: <>

On Sat, 6 Oct 2012 17:23:48 -0700
Guido van Rossum <guido at> wrote:
> On Sat, Oct 6, 2012 at 3:24 PM, Antoine Pitrou <solipsis at> wrote:
> > On Sat, 6 Oct 2012 15:00:54 -0700
> > Guido van Rossum <guido at> wrote:
> >>
> >> (2) We're at a fork in the road here. On the one hand, we could choose
> >> to deeply integrate greenlets/gevents into the standard library. (It's
> >> not monkey-patching if it's integrated, after all. :-) I'm not sure
> >> how this would work for other implementations than CPython, or even
> >> how to address CPython on non-x86 architectures. But users seem to
> >> like the programming model: write synchronous code, get async
> >> operation for free. It's easy to write protocol parsers that way. On
> >> the other hand, we could reject this approach: the integration would
> >> never be completely smooth, there's the issue of other implementations
> >> and architectures, it probably would never work smoothly even for
> >> CPython/x86 when 3rd party extension modules are involved.
> >> Callback-based APIs don't have these downsides, but they are harder to
> >> program; however we can make programming them easier by using
> >> yield-based coroutines. Even Twisted offers those (inline callbacks).
> >
> > greenlets/gevents only get you half the advantages of single-threaded
> > "async" programming: they get you scalability in the face of a high
> > number of concurrent connections, but they don't get you the robustness
> > of cooperative multithreading (because it's not obvious when reading
> > the code where the possible thread-switching points are).
> I used to think that too, long ago, until I discovered that as you add
> abstraction layers, cooperative multithreading is untenable -- sooner
> or later you will lose track of where the threads are switched.

Even with an explicit notation like "yield" / "yield from"?



Software development and contracting:

From solipsis at  Sun Oct  7 12:15:53 2012
From: solipsis at (Antoine Pitrou)
Date: Sun, 7 Oct 2012 12:15:53 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
References: <>
	<> <>
Message-ID: <>

On Sun, 07 Oct 2012 12:41:44 +1100
Steven D'Aprano <steve at> wrote:

> On 07/10/12 04:08, Antoine Pitrou wrote:
> > Personally, I cringe every time I have to type
> > `os.path.dirname(os.path.dirname(os.path.dirname(...)))` to go two
> > directories upwards of a given path. Compare, with, say:
> I would cringe too if I did that, because it goes THREE directories
> up, not two:
> py> path = '/a/b/c/d'
> py> os.path.dirname(os.path.dirname(os.path.dirname(path)))
> '/a'

Not if d is a file, actually (yes, the formulation was a bit ambiguous).




From guido at  Sun Oct  7 17:04:30 2012
From: guido at (Guido van Rossum)
Date: Sun, 7 Oct 2012 08:04:30 -0700
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 7, 2012 at 3:09 AM, Antoine Pitrou <solipsis at> wrote:
> On Sat, 6 Oct 2012 17:23:48 -0700
> Guido van Rossum <guido at> wrote:
>> On Sat, Oct 6, 2012 at 3:24 PM, Antoine Pitrou <solipsis at> wrote:
>> > greenlets/gevents only get you half the advantages of single-threaded
>> > "async" programming: they get you scalability in the face of a high
>> > number of concurrent connections, but they don't get you the robustness
>> > of cooperative multithreading (because it's not obvious when reading
>> > the code where the possible thread-switching points are).
>> I used to think that too, long ago, until I discovered that as you add
>> abstraction layers, cooperative multithreading is untenable -- sooner
>> or later you will lose track of where the threads are switched.
> Even with an explicit notation like "yield" / "yield from"?

If you strictly adhere to using those you should be safe (though
distinguishing between the two may prove challenging) -- but in
practice it's hard to get everyone and every API to use this style. So
you'll have some blocking API calls hidden deep inside what looks like
a perfectly innocent call to some helper function.

IIUC in Go this is solved by mixing threads and lighter-weight
constructs (say, greenlets) -- if a greenlet gets blocked for I/O, the
rest of the system continues to make progress by spawning another
thread.

My own experience with NDB is that it's just too hard to make everyone
use the async APIs all the time -- so I gave up and made async APIs an
optional feature, offering a blocking and an async version of every
API. I didn't start out that way, but once I started writing
documentation aimed at unsophisticated users, I realized that it was
just too much of an uphill battle to bother.

So I think it's better to accept this and deal with it, possibly
adding locking primitives into the mix that work well with the rest of
the framework. Building a lock out of a tasklet-based (i.e.
non-threading) Future class is easy enough.
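
A minimal sketch of what such a lock could look like, under stated
assumptions: Future, Lock and run below are hypothetical classes written
for this illustration, not NDB's or the standard library's API, and
everything runs in a single thread with generators as tasklets:

```python
from collections import deque


class Future:
    """Single-threaded future: stores a result and callbacks, never blocks."""

    def __init__(self):
        self._done = False
        self._result = None
        self._callbacks = []

    def set_result(self, value):
        self._done = True
        self._result = value
        callbacks, self._callbacks = self._callbacks, []
        for callback in callbacks:
            callback(self)

    def add_done_callback(self, callback):
        if self._done:
            callback(self)
        else:
            self._callbacks.append(callback)


class Lock:
    """Cooperative lock: acquire() returns a Future to yield on."""

    def __init__(self):
        self.locked = False
        self._waiters = deque()

    def acquire(self):
        fut = Future()
        if not self.locked:
            self.locked = True
            fut.set_result(True)
        else:
            self._waiters.append(fut)
        return fut

    def release(self):
        if self._waiters:
            # Hand the lock directly to the oldest waiter.
            self._waiters.popleft().set_result(True)
        else:
            self.locked = False


def run(task):
    """Drive a generator that yields Futures until it finishes."""
    def step(_fut=None):
        try:
            fut = next(task)
        except StopIteration:
            return
        fut.add_done_callback(step)
    step()


log = []
lock = Lock()
gate = Future()   # stands in for some pending I/O


def worker(name, wait_for=None):
    yield lock.acquire()
    log.append(name + " in")
    if wait_for is not None:
        yield wait_for          # hold the lock across a switch point
    log.append(name + " out")
    lock.release()


run(worker("A", gate))   # A takes the lock, then waits on the gate
run(worker("B"))         # B queues up behind A
gate.set_result(None)    # A finishes, which lets B run
```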

--Guido van Rossum (

From solipsis at  Sun Oct  7 19:37:35 2012
From: solipsis at (Antoine Pitrou)
Date: Sun, 7 Oct 2012 19:37:35 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
References: <>
Message-ID: <>

On Sat, 6 Oct 2012 10:44:37 -0700
Guido van Rossum <guido at> wrote:
> But rather than diving right into the syntax, I would like to focus on
> some use cases. (Some of this may already be in the PEP, my
> apologies.) Some things I care about (based on path manipulations I
> remember I've written at some point or another):
> - Distinguishing absolute paths from relative paths; this affects
> joining behavior as for os.path.join().

The proposed API does function like os.path.join() in that respect:
when joining a relative path to an absolute path, the relative path is
simply discarded:

>>> p = PurePath('a')
>>> q = PurePath('/b')
>>> p[q]
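
For comparison, a small sketch of the same joining rule using
posixpath.join() and the pathlib module as it later shipped. The "/"
operator used here is an assumption relative to this draft, which spells
joining as p[q]:

```python
import posixpath
from pathlib import PurePosixPath

# An absolute right-hand side replaces whatever came before it.
joined_str = posixpath.join("a", "/b")
joined_path = PurePosixPath("a") / PurePosixPath("/b")

# A relative right-hand side is appended as usual.
appended = PurePosixPath("/b") / "a"
```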

> - Various normal forms that can be used for comparing paths for
> equality; there should be a pure normalization as well as an impure
> one (like os.path.realpath()).

Impure normalization is done with the resolve() method:

>>> os.chdir('/etc')
>>> Path('ssl/certs').resolve()

(/etc/ssl/certs being a symlink to /etc/pki/tls/certs on my system)

Pure comparison already obeys case-sensitivity rules as well as the
different path separators:

>>> PureNTPath('a/b') == PureNTPath('A\\B')
>>> PurePosixPath('a/b') == PurePosixPath('a\\b')

Note the case information isn't lost either:

>>> str(PureNTPath('a/b'))
>>> str(PureNTPath('A/B'))
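
The comparison and case-preservation rules above can be checked with the
pathlib module as it later shipped, where the Windows flavour is named
PureWindowsPath rather than PureNTPath (the variable names are
illustrative):

```python
from pathlib import PurePosixPath, PureWindowsPath

# Windows flavour: case-insensitive, and "/" and "\" both separate parts.
equal_on_windows = PureWindowsPath("a/b") == PureWindowsPath("A\\B")

# POSIX flavour: a backslash is an ordinary character in a name.
equal_on_posix = PurePosixPath("a/b") == PurePosixPath("a\\b")

# The original case survives in the string form.
lower_text = str(PureWindowsPath("a/b"))
upper_text = str(PureWindowsPath("A/B"))
```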

> - An API that encourage Unix lovers to write code that is most likely
> also to make sense on Windows.
> - An API that encourages Windows lovers to write code that is most
> likely also to make sense on Unix.

I agree on these goals; that's why I'm trying to avoid system-specific
methods. For example, is_reserved() is also defined under Unix; it just
always returns False:

>>> PurePosixPath('CON').is_reserved()
False
>>> PureNTPath('CON').is_reserved()
True

> - Integration with fnmatch (pure) and glob (impure).

This is provided indeed, with the match() and glob() methods
respectively.

> - In addition to stat(), some simple derived operations like
> getmtime(), getsize(), islink().

The PEP proposes properties mimicking the stat object attributes:

>>> p = Path('')
>>> p.st_size
977
>>> p.st_mtime
1349461817.8768747

And methods to query the file type:

>>> p.is_symlink()
False
>>> p.is_file()
True

Perhaps the properties / methods mix isn't very consistent.

> - Easy checks and manipulations (applying to the basename) like "ends
> with .pyc", "starts with foo", "ends with .tar.gz", "replace .pyc
> extension with .py", "remove trailing ~", "append .tmp", "remove
> leading @", and so on.

I'll try to reconcile this with Ben Finney's suffix / suffixes proposal.

> - Matching on patterns on directory names (e.g. "does not contain a
> segment named .hg").

Sequence-like access on the parts property provides this:

>>> p = PurePath('foo/.hg/hgrc')
>>> '.hg' in
True



Software development and contracting:

From storchaka at  Sun Oct  7 20:40:41 2012
From: storchaka at (Serhiy Storchaka)
Date: Sun, 07 Oct 2012 21:40:41 +0300
Subject: [Python-ideas] Propagating StopIteration value
In-Reply-To: <>
References: <k4q38d$j8e$>
Message-ID: <k4sibc$rbv$>

On 06.10.12 23:47, Mike Graham wrote:
> Can you provide an example of a time when you want to use such a value
> with a generator on which you want to use one of these so I can better
> understand why this is necessary? the times I'm familiar with wanting
> this value I'd usually be manually stepping through my generator.

There are not many uses yet because it's a new feature; Python 3.3 was
just released. For example, see the proposed patch for
In the general case, `yield from` returns such a value.

From storchaka at  Sun Oct  7 21:06:29 2012
From: storchaka at (Serhiy Storchaka)
Date: Sun, 07 Oct 2012 22:06:29 +0300
Subject: [Python-ideas] Propagating StopIteration value
In-Reply-To: <>
References: <k4q38d$j8e$> <>
Message-ID: <k4sjrt$7hh$>

On 07.10.12 05:11, Greg Ewing wrote:
> It's highly debatable whether this is even wrong. The purpose
> of StopIteration(value) is for a generator to return a value
> to its immediate caller when invoked using yield-from. The
> value is not intended to propagate any further than that.

If the immediate caller can propagate generated values with the help of
"yield from", why can it not propagate the value returned from "yield from"?

> A non-iterator analogy would be
>     def f():
>        return 42
>     def g():
>        f()

No, a non-iterator analogy would be

   g = functools.partial(f)

or

   g = functools.lru_cache()(f)

I expect g() to return 42 here.

And it would be expected and useful if

   yield from itertools.chain([prefix], iterator)

returned the same value as

   yield from iterator

Currently chain is equivalent to:

   def chain(*iterables):
       for it in iterables:
           yield from it

I propose making it equivalent to:

   def chain(*iterables):
       value = None
       for it in iterables:
           value = yield from it
       return value
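
To make the difference concrete, here is a minimal runnable sketch. The names counted(), chain_last() and run() are hypothetical helpers made up for illustration; chain_last() mirrors the proposed definition above, and the comparison assumes only that itertools.chain behaves as it does in Python 3.3 and later.

```python
import itertools

def counted(items):
    """Yield each item, then return how many were yielded."""
    n = 0
    for item in items:
        yield item
        n += 1
    return n

def chain_last(*iterables):
    """Proposed behaviour: hand back the last sub-iterator's return value."""
    value = None
    for it in iterables:
        value = yield from it
    return value

def run(make_iterator):
    """Drive an iterator through `yield from`; collect items and the returned value."""
    returned = []
    def driver():
        returned.append((yield from make_iterator()))
    items = list(driver())
    return items, returned[0]

proposed = run(lambda: chain_last(counted('ab'), counted('cde')))
current = run(lambda: itertools.chain(counted('ab'), counted('cde')))
print(proposed)
print(current)
```

Both calls yield the same five items; only the value handed back to the `yield from` expression differs (the count from the last sub-generator versus None).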

From shibturn at  Sun Oct  7 21:18:37 2012
From: shibturn at (Richard Oudkerk)
Date: Sun, 07 Oct 2012 20:18:37 +0100
Subject: [Python-ideas] Propagating StopIteration value
In-Reply-To: <k4sjrt$7hh$>
References: <k4q38d$j8e$> <>
	<> <k4sjrt$7hh$>
Message-ID: <k4skie$d5d$>

On 07/10/2012 8:06pm, Serhiy Storchaka wrote:
> I propose make it equivalent to:
>    def chain(*iterables):
>        value = None
>        for it in iterables:
>            value = yield from it
>        return value

That means that all but the last return value are ignored.  Why is the 
last return value any more important than the earlier ones?

ISTM it would make just as much sense to do

   def chain(*iterables):
       values = []
       for it in iterables:
           values.append(yield from it)
       return values

But I don't see any point in changing the current behaviour.


From storchaka at  Sun Oct  7 21:30:16 2012
From: storchaka at (Serhiy Storchaka)
Date: Sun, 07 Oct 2012 22:30:16 +0300
Subject: [Python-ideas] Propagating StopIteration value
In-Reply-To: <>
References: <k4q38d$j8e$> <>
Message-ID: <k4sl8b$ibt$>

On 07.10.12 04:45, Guido van Rossum wrote:
> But yes, this was all considered and accepted when PEP 380 was debated
> (endlessly :-), and I see no reason to change anything about this.

The reason is that when someone uses StopIteration.value for some
purpose, they will lose this value if the iterator is wrapped in
itertools.chain (a quite commonly used technique) or in another standard
iterator wrapper.

> "Don't do that" is the best I can say about it -- there are a zillion
> other situations in Python where that's the only sensible motto.

The problem is that two different authors can use two legal techniques
(using values returned by "yield from" and wrapping iterators with
itertools.chain) which do not work in combination. The conflict is easily
solved by using handwritten code instead of the standard itertools.chain.
It looks like a bug in itertools.chain.

From rndblnch at  Sun Oct  7 21:37:41 2012
From: rndblnch at (rndblnch)
Date: Sun, 7 Oct 2012 19:37:41 +0000 (UTC)
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
References: <>
Message-ID: <>

Antoine Pitrou <solipsis at ...> writes:

> PS: You can all admire my ASCII-art skills.

but you got the direction of the "is a" arrows wrong.


From guido at  Sun Oct  7 22:19:11 2012
From: guido at (Guido van Rossum)
Date: Sun, 7 Oct 2012 13:19:11 -0700
Subject: [Python-ideas] Propagating StopIteration value
In-Reply-To: <k4sl8b$ibt$>
References: <k4q38d$j8e$> <>
Message-ID: <>

On Sun, Oct 7, 2012 at 12:30 PM, Serhiy Storchaka <storchaka at> wrote:
> On 07.10.12 04:45, Guido van Rossum wrote:
>> But yes, this was all considered and accepted when PEP 380 was debated
>> (endlessly :-), and I see no reason to change anything about this.
> The reason is that when someone uses StopIteration.value for some purposes,
> he will lose this value if the iterator will be wrapped into itertools.chain
> (quite often used technique) or into other standard iterator wrapper.

If this is just about itertools.chain() I may see some value in it (but
TBH the discussion so far mostly confuses -- please spend some more
time coming up with good examples that show actually useful use cases
rather than f() and g() or foo() and bar()).

OTOH yield from is not primarily for iterators -- it is for
coroutines. I suspect most of the itertools functionality just doesn't
work with coroutines.

>> "Don't do that" is the best I can say about it -- there are a zillion
>> other situations in Python where that's the only sensible motto.
> The problem is that two different authors can use two legal techniques
> (using values returned by "yield from" and wrap iterators with
> itertools.chain) which do not work in combination. The conflict easily
> solved if instead of standard itertools.chain to use handwriten code. It
> looks as bug in itertools.chain.

Okay, so please do work out a complete, useful use case. We may yet
see the light.

--Guido van Rossum (

From guido at  Sun Oct  7 22:24:59 2012
From: guido at (Guido van Rossum)
Date: Sun, 7 Oct 2012 13:24:59 -0700
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 7, 2012 at 10:37 AM, Antoine Pitrou <solipsis at> wrote:
> On Sat, 6 Oct 2012 10:44:37 -0700
> Guido van Rossum <guido at> wrote:
>> But rather than diving right into the syntax, I would like to focus on
>> some use cases. (Some of this may already be in the PEP, my
>> apologize.) Some things I care about (based on path manipulations I
>> remember I've written at some point or another):
>> - Distinguishing absolute paths from relative paths; this affects
>> joining behavior as for os.path.join().
> The proposed API does function like os.path.join() in that respect:
> when joining a relative path to an absolute path, the relative path is
> simply discarded:
>>>> p = PurePath('a')
>>>> q = PurePath('/b')
>>>> p[q]
> PurePosixPath('/b')
>> - Various normal forms that can be used for comparing paths for
>> equality; there should be a pure normalization as well as an impure
>> one (like os.path.realpath()).
> Impure normalization is done with the resolve() method:
>>>> os.chdir('/etc')
>>>> Path('ssl/certs').resolve()
> PosixPath('/etc/pki/tls/certs')
> (/etc/ssl/certs being a symlink to /etc/pki/tks/certs on my system)
> Pure comparison already obeys case-sensitivity rules as well as the
> different path separators:
>>>> PureNTPath('a/b') == PureNTPath('A\\B')
> True
>>>> PurePosixPath('a/b') == PurePosixPath('a\\b')
> False
> Note the case information isn't lost either:
>>>> str(PureNTPath('a/b'))
> 'a\\b'
>>>> str(PureNTPath('A/B'))
> 'A\\B'
>> - An API that encourage Unix lovers to write code that is most likely
>> also to make sense on Windows.
>> - An API that encourages Windows lovers to write code that is most
>> likely also to make sense on Unix.
> I agree on these goals, that's why I'm trying to avoid system-specific
> methods. For example is_reserved() is also defined under Unix, it just
> always returns False:
>>>> PurePosixPath('CON').is_reserved()
> False
>>>> PureNTPath('CON').is_reserved()
> True
>> - Integration with fnmatch (pure) and glob (impure).
> This is provided indeed, with the match() and glob() methods
> respectively.
>> - In addition to stat(), some simple derived operations like
>> getmtime(), getsize(), islink().
> The PEP proposes properties mimicking the stat object attributes:
>>>> p = Path('')
>>>> p.st_size
> 977
>>>> p.st_mtime
> 1349461817.8768747
> And methods to query the file type:
>>>> p.is_symlink()
> False
>>>> p.is_file()
> True
> Perhaps the properties / methods mix isn't very consistent.

I would warn about caching these results on the path object. I can
easily imagine cases where I want to repeatedly call stat() because
I'm waiting for a file to change (e.g. tail -f does something like
this). I would prefer to have a stat() method that always calls
os.stat(), and no caching of the results; the user can cache the
stat() return value. (Maybe we can add is_file() etc. as methods on
stat() results now that they are no longer just tuples?)
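
A rough sketch of the uncached style described above, using only os.stat() and the stat module. The snapshot() helper and the temporary file name are made up for illustration; the point is that every call hits the filesystem and the caller alone decides what to keep.

```python
import os
import stat
import tempfile

def snapshot(path):
    """Call os.stat() afresh; the caller decides whether to keep the result."""
    return os.stat(path)

with tempfile.TemporaryDirectory() as tmp:
    name = os.path.join(tmp, 'log.txt')
    with open(name, 'w') as f:
        f.write('one\n')
    before = snapshot(name)      # kept by the caller, not by a path object
    with open(name, 'a') as f:
        f.write('two\n')
    after = snapshot(name)       # a second call sees the new size
    grew = after.st_size > before.st_size
    is_regular = stat.S_ISREG(after.st_mode)

print(grew, is_regular)
```

A tail -f style loop would simply call snapshot() repeatedly and compare st_size or st_mtime against the previous result.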

>> - Easy checks and manipulations (applying to the basename) like "ends
>> with .pyc", "starts with foo", "ends with .tar.gz", "replace .pyc
>> extension with .py", "remove trailing ~", "append .tmp", "remove
>> leading @", and so on.
> I'll try to reconcile this with Ben Finney's suffix / suffixes proposal.
>> - Matching on patterns on directory names (e.g. "does not contain a
>> segment named .hg").
> Sequence-like access on the parts property provides this:
>>>> p = PurePath('foo/.hg/hgrc')
>>>> '.hg' in
> True

Sounds cool. I will try to refrain from bikeshedding much more on this
proposal; I'd rather focus on reactors and futures...

--Guido van Rossum (

From andy at  Sun Oct  7 22:25:34 2012
From: andy at (Andy Buckley)
Date: Sun, 07 Oct 2012 22:25:34 +0200
Subject: [Python-ideas] History stepping in interactive session?
In-Reply-To: <>
References: <>
Message-ID: <>

On 05/10/12 12:26, Stephen J. Turnbull wrote:
> Andy Buckley writes:
>  > A couple of weeks ago I posted a question on
> Maybe it's a bug.  (See below.)  Have you checked the tracker?  Have
> you posted to python-list?  That's a better place than here to get
> that kind of information.
>  > As you might have noticed,
> The people on this list (and on python-dev) probably don't pay much
> attention to questions on, unless they're the kind of
> people who hang out on python-list.

Hi Stephen -- thanks for the feedback. I know StackExchange sites are
not affiliated to the Python project! By "as you might have noticed" I
didn't mean to imply that you spend your time scouring all Q&A sites for
anything Python-related, but just that if you followed the link I posted
you'd probably notice the zero response :)

From searching around before that SuperUser post, and some more
afterwards, I couldn't find any reference at all to history-stepping as
an available Python interpreter feature, so I was trying to suggest that
as a new feature -- not a bug report. Sorry if python-ideas is only for
language/stdlib features rather than the standard infrastructure.

However, I hadn't remembered when I first posted that I was already
making use of a PYTHONSTARTUP script with the readline module to enable
some history functionality -- I'd set that up years ago and ported it
between systems. So my premise that readline *should* work was not
accurate: sorry for the noise. Notably the operate-and-get-next readline
function (thanks for the bind -p suggestion) bound to Ctrl-o does not
work with Python readline... but I will follow up on that potential bug
elsewhere.

So one last question, in case it is an acceptable python-ideas topic:
how about adding readline-like support by default in the interpreter?
But maybe there is a reason for new users to have a more bare-bones,
no-history introduction to the language, unless they start with ipython?

Thanks again,

From mikegraham at  Sun Oct  7 22:27:48 2012
From: mikegraham at (Mike Graham)
Date: Sun, 7 Oct 2012 16:27:48 -0400
Subject: [Python-ideas] Propagating StopIteration value
In-Reply-To: <k4sjrt$7hh$>
References: <k4q38d$j8e$> <>
	<> <k4sjrt$7hh$>
Message-ID: <>

On Sun, Oct 7, 2012 at 3:06 PM, Serhiy Storchaka <storchaka at> wrote:
> On 07.10.12 05:11, Greg Ewing wrote:
>> It's highly debatable whether this is even wrong. The purpose
>> of StopIteration(value) is for a generator to return a value
>> to its immediate caller when invoked using yield-from. The
>> value is not intended to propagate any further than that.
> If immediate caller can propagate generated values with the help of "yield
> from", why it can not propagate returned from "yield from" value?
>> A non-iterator analogy would be
>>     def f():
>>        return 42
>>     def g():
>>        f()
> No, a non-iterator analogy would be
>   g = functools.partial(f)
> or
>   g = functools.lru_cache()(f)
> I expect g() to return 42 here.

Rather than speaking in analogies, can we be concrete? I can't imagine
doing map(f, x) where x is a generator whose return value I cared
about. Can you show us a concrete example of something that looks like
practical code?


From phd at  Sun Oct  7 22:45:41 2012
From: phd at (Oleg Broytman)
Date: Mon, 8 Oct 2012 00:45:41 +0400
Subject: [Python-ideas] History stepping in interactive session?
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 07, 2012 at 10:25:34PM +0200, Andy Buckley <andy at> wrote:
> Sorry if python-ideas is only for
> language/stdlib features rather than the standard infrastructure.

   readline is a Python module so ideas about it are certainly allowed.

> Notably the operate-and-get-next readline
> function (thanks for the bind -p suggestion) bound to Ctrl-o does not
> work with Python readline... but I will follow up on that potential bug
> elsewhere.

   You probably need to reread the entire thread because the reason why
it does not work with Python was already found and reported.

     Oleg Broytman              phd at
           Programmers don't die, they just GOSUB without RETURN.

From ubershmekel at  Sun Oct  7 23:15:38 2012
From: ubershmekel at (Yuval Greenfield)
Date: Sun, 7 Oct 2012 23:15:38 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 7, 2012 at 7:37 PM, Antoine Pitrou <solipsis at> wrote:

> On Sat, 6 Oct 2012 10:44:37 -0700
> Guido van Rossum <guido at> wrote:
> >
> > But rather than diving right into the syntax, I would like to focus on
> > some use cases. (Some of this may already be in the PEP, my
> > apologize.) Some things I care about (based on path manipulations I
> > remember I've written at some point or another):
> >
> > - Distinguishing absolute paths from relative paths; this affects
> > joining behavior as for os.path.join().
> The proposed API does function like os.path.join() in that respect:
> when joining a relative path to an absolute path, the relative path is
> simply discarded:
> >>> p = PurePath('a')
> >>> q = PurePath('/b')
> >>> p[q]
> PurePosixPath('/b')
What's the use case for this behavior?

I'd much rather have joining an absolute path to a relative one fail and
reveal the potential bug....

    >>> os.unlink(Path('myproj') / Path('/lib'))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: absolute path can't be appended to a relative path

From arnodel at  Sun Oct  7 23:43:02 2012
From: arnodel at (Arnaud Delobelle)
Date: Sun, 7 Oct 2012 22:43:02 +0100
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On 7 October 2012 18:37, Antoine Pitrou <solipsis at> wrote:
> Pure comparison already obeys case-sensitivity rules as well as the
> different path separators:
>>>> PureNTPath('a/b') == PureNTPath('A\\B')
> True
>>>> PurePosixPath('a/b') == PurePosixPath('a\\b')
> False

Naive question: how do you deal with HFS+, which is case-preserving
but on most machines case-insensitive?


From solipsis at  Sun Oct  7 23:42:12 2012
From: solipsis at (Antoine Pitrou)
Date: Sun, 7 Oct 2012 23:42:12 +0200
Subject: [Python-ideas] PEP 428 - joining
References: <>
Message-ID: <>

On Sun, 7 Oct 2012 23:15:38 +0200
Yuval Greenfield <ubershmekel at> wrote:
> On Sun, Oct 7, 2012 at 7:37 PM, Antoine Pitrou <solipsis at> wrote:
> > On Sat, 6 Oct 2012 10:44:37 -0700
> > Guido van Rossum <guido at> wrote:
> > >
> > > But rather than diving right into the syntax, I would like to focus on
> > > some use cases. (Some of this may already be in the PEP, my
> > > apologize.) Some things I care about (based on path manipulations I
> > > remember I've written at some point or another):
> > >
> > > - Distinguishing absolute paths from relative paths; this affects
> > > joining behavior as for os.path.join().
> >
> > The proposed API does function like os.path.join() in that respect:
> > when joining a relative path to an absolute path, the relative path is
> > simply discarded:
> >
> > >>> p = PurePath('a')
> > >>> q = PurePath('/b')
> > >>> p[q]
> > PurePosixPath('/b')
> >
> >
> What's the use case for this behavior?
> I'd much rather if joining an absolute path to a relative one fail and
> reveal the potential bug....
>     >>> os.unlink(Path('myproj') / Path('/lib'))
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>     TypeError: absolute path can't be appended to a relative path

In all honesty I followed os.path.join's behaviour here. I agree a
ValueError (not TypeError) would be sensible too.
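
For illustration only, a stricter join could be sketched as follows. strict_join() is a hypothetical helper, not part of the proposed API, and the sketch assumes the pathlib module as later released in the standard library.

```python
from pathlib import PurePosixPath

def strict_join(base, other):
    """Hypothetical helper: join, but refuse an absolute right-hand side."""
    other = PurePosixPath(other)
    if other.is_absolute():
        raise ValueError("absolute path can't be appended: %s" % other)
    return PurePosixPath(base) / other

joined = strict_join('myproj', 'lib')
print(joined)

try:
    strict_join('myproj', '/lib')
except ValueError as exc:
    print('refused:', exc)
```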




From solipsis at  Sun Oct  7 23:47:18 2012
From: solipsis at (Antoine Pitrou)
Date: Sun, 7 Oct 2012 23:47:18 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
References: <>
Message-ID: <>

On Sun, 7 Oct 2012 22:43:02 +0100
Arnaud Delobelle <arnodel at> wrote:
> On 7 October 2012 18:37, Antoine Pitrou <solipsis at> wrote:
> > Pure comparison already obeys case-sensitivity rules as well as the
> > different path separators:
> >
> >>>> PureNTPath('a/b') == PureNTPath('A\\B')
> > True
> >>>> PurePosixPath('a/b') == PurePosixPath('a\\b')
> > False
> Naive question: how do you deal with HFS+, which is case-preserving
> but on most machines case-insensitive?

I don't know. How does os.path deal with it?




From g.brandl at  Mon Oct  8 00:11:26 2012
From: g.brandl at (Georg Brandl)
Date: Mon, 08 Oct 2012 00:11:26 +0200
Subject: [Python-ideas] PEP 428 - joining
In-Reply-To: <>
References: <>
Message-ID: <k4suk9$nv8$>

Am 07.10.2012 23:42, schrieb Antoine Pitrou:
> On Sun, 7 Oct 2012 23:15:38 +0200
> Yuval Greenfield <ubershmekel at>
> wrote:
>> On Sun, Oct 7, 2012 at 7:37 PM, Antoine Pitrou <solipsis at> wrote:
>> > On Sat, 6 Oct 2012 10:44:37 -0700
>> > Guido van Rossum <guido at> wrote:
>> > >
>> > > But rather than diving right into the syntax, I would like to focus on
>> > > some use cases. (Some of this may already be in the PEP, my
>> > > apologize.) Some things I care about (based on path manipulations I
>> > > remember I've written at some point or another):
>> > >
>> > > - Distinguishing absolute paths from relative paths; this affects
>> > > joining behavior as for os.path.join().
>> >
>> > The proposed API does function like os.path.join() in that respect:
>> > when joining a relative path to an absolute path, the relative path is
>> > simply discarded:
>> >
>> > >>> p = PurePath('a')
>> > >>> q = PurePath('/b')
>> > >>> p[q]
>> > PurePosixPath('/b')
>> >
>> >
>> What's the use case for this behavior?
>> I'd much rather if joining an absolute path to a relative one fail and
>> reveal the potential bug....
>>     >>> os.unlink(Path('myproj') / Path('/lib'))
>>     Traceback (most recent call last):
>>       File "<stdin>", line 1, in <module>
>>     TypeError: absolute path can't be appended to a relative path
> In all honesty I followed os.path.join's behaviour here. I agree a
> ValueError (not TypeError) would be sensible too.

Please no -- this is a very important use case (for os.path.join, at least):
resolving a path from config/user/command line that can be given either absolute
or relative to a certain directory.

Right now it's as simple as join(default, path), and I'd prefer to keep this.
There is no bug here, it's working as designed.
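
The use case can be sketched with posixpath.join(), which has the same rules as os.path.join() on POSIX systems. The directory and file names are invented for the example.

```python
import posixpath

default = '/etc/myapp'   # hypothetical base directory

# A relative setting is resolved against the default directory...
resolved_relative = posixpath.join(default, 'conf.d/site.ini')

# ...while an absolute setting replaces the default entirely.
resolved_absolute = posixpath.join(default, '/home/user/site.ini')

print(resolved_relative)
print(resolved_absolute)
```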


From python at  Mon Oct  8 00:29:25 2012
From: python at (MRAB)
Date: Sun, 07 Oct 2012 23:29:25 +0100
Subject: [Python-ideas] PEP 428 - joining
In-Reply-To: <k4suk9$nv8$>
References: <>
	<> <k4suk9$nv8$>
Message-ID: <>

On 2012-10-07 23:11, Georg Brandl wrote:
> Am 07.10.2012 23:42, schrieb Antoine Pitrou:
>> On Sun, 7 Oct 2012 23:15:38 +0200
>> Yuval Greenfield <ubershmekel at>
>> wrote:
>>> On Sun, Oct 7, 2012 at 7:37 PM, Antoine Pitrou <solipsis at> wrote:
>>> > On Sat, 6 Oct 2012 10:44:37 -0700
>>> > Guido van Rossum <guido at> wrote:
>>> > >
>>> > > But rather than diving right into the syntax, I would like to focus on
>>> > > some use cases. (Some of this may already be in the PEP, my
>>> > > apologize.) Some things I care about (based on path manipulations I
>>> > > remember I've written at some point or another):
>>> > >
>>> > > - Distinguishing absolute paths from relative paths; this affects
>>> > > joining behavior as for os.path.join().
>>> >
>>> > The proposed API does function like os.path.join() in that respect:
>>> > when joining a relative path to an absolute path, the relative path is
>>> > simply discarded:
>>> >
>>> > >>> p = PurePath('a')
>>> > >>> q = PurePath('/b')
>>> > >>> p[q]
>>> > PurePosixPath('/b')
>>> >
>>> >
>>> What's the use case for this behavior?
>>> I'd much rather if joining an absolute path to a relative one fail and
>>> reveal the potential bug....
>>>     >>> os.unlink(Path('myproj') / Path('/lib'))
>>>     Traceback (most recent call last):
>>>       File "<stdin>", line 1, in <module>
>>>     TypeError: absolute path can't be appended to a relative path
>> In all honesty I followed os.path.join's behaviour here. I agree a
>> ValueError (not TypeError) would be sensible too.
> Please no -- this is a very important use case (for os.path.join, at least):
> resolving a path from config/user/command line that can be given either absolute
> or relative to a certain directory.
> Right now it's as simple as join(default, path), and i'd prefer to keep this.
> There is no bug here, it's working as designed.
In that use case, wouldn't it be more likely that the default is itself
absolute, so it'd be either relative to that absolute path or
overriding that absolute path with another absolute path?

From greg.ewing at  Mon Oct  8 00:40:04 2012
From: greg.ewing at (Greg Ewing)
Date: Mon, 08 Oct 2012 11:40:04 +1300
Subject: [Python-ideas] PEP 428 - joining
In-Reply-To: <>
References: <>
	<> <k4suk9$nv8$>
Message-ID: <>

MRAB wrote:
> In that use case, wouldn't it be more likely that the default is itself
> absolute,

Not necessarily -- the default could be something provided on
the command line, to be interpreted relative to the current


From oscar.j.benjamin at  Mon Oct  8 00:43:15 2012
From: oscar.j.benjamin at (Oscar Benjamin)
Date: Sun, 7 Oct 2012 23:43:15 +0100
Subject: [Python-ideas] Propagating StopIteration value
In-Reply-To: <>
References: <k4q38d$j8e$> <>
Message-ID: <>

On 7 October 2012 21:19, Guido van Rossum <guido at> wrote:
> On Sun, Oct 7, 2012 at 12:30 PM, Serhiy Storchaka <storchaka at> wrote:
>> On 07.10.12 04:45, Guido van Rossum wrote:
>>> But yes, this was all considered and accepted when PEP 380 was debated
>>> (endlessly :-), and I see no reason to change anything about this.
>> The reason is that when someone uses StopIteration.value for some purposes,
>> he will lose this value if the iterator will be wrapped into itertools.chain
>> (quite often used technique) or into other standard iterator wrapper.
> If this is just about iterator.chain() I may see some value in it (but
> TBH the discussion so far mostly confuses -- please spend some more
> time coming up with good examples that show actually useful use cases
> rather than f() and g() or foo() and bar())
>  OTOH yield from is not primarily for iterators -- it is for
> coroutines. I suspect most of the itertools functionality just doesn't
> work with coroutines.

I think what Serhiy is saying is that although PEP 380 mainly
discusses generator functions it has effectively changed the
definition of what it means to be an iterator for all iterators:
previously an iterator was just something that yielded values but now
it also returns a value. Since the meaning of an iterator has changed,
functions that work with iterators need to be updated.

Before PEP 380 filter(lambda x: True, obj) returned an object that was
the same kind of iterator as obj (it would yield the same values). Now
the "kind of iterator" that obj is depends not only on the values that
it yields but also on the value that it returns. Since filter does not
pass on the same return value, filter(lambda x: True, obj) is no
longer the same kind of iterator as obj. The same considerations apply
to many other functions such as map, itertools.groupby,

Cases like itertools.chain and zip are trickier since they each act on
multiple underlying iterables. Probably chain should return a tuple of
the return values from each of its iterables.

This feature was new in Python 3.3 which was released a week ago so it
is not widely used but it has uses that have nothing to do with
coroutines. As an example of how you could use it, consider parsing a
file that can contain #include statements. When the #include
statement is encountered we need to insert the contents of the
included file. This is easy to do with a recursive generator. The
example uses the return value of the generator to keep track of which
line is being parsed in relation to the flattened output file:

def parse(filename, output_lineno=0):
    with open(filename) as fin:
        for input_lineno, line in enumerate(fin):
            if line.startswith('#include '):
                # Recurse into the included file; its return value is the
                # output line number reached so far.
                subfilename = line.split()[1]
                output_lineno = yield from parse(subfilename, output_lineno)
            else:
                try:
                    yield parse_line(line)
                except ParseLineError:
                    raise ParseError(filename, input_lineno, output_lineno)
                output_lineno += 1
    return output_lineno

When writing code like the above that depends on being able to get the
value returned from an iterator, it is no longer possible to freely
mix utilities like filter, map, zip, itertools.chain with the
iterators returned by parse() as they no longer act as transparent
wrappers over the underlying iterators (by not propagating the value
attached to StopIteration).

Hopefully, I've understood Serhiy and the docs correctly (I don't have
access to Python 3.3 right now to test any of this).


From steve at  Mon Oct  8 00:47:37 2012
From: steve at (Steven D'Aprano)
Date: Mon, 08 Oct 2012 09:47:37 +1100
Subject: [Python-ideas] Issue 8492 [was Re: [Python-dev] History stepping in
 interactive session?]
In-Reply-To: <>
References: <>
Message-ID: <>

Over on python-ideas, a question about readline was raised and, I think,
resolved. But while investigating the question, it became obvious to me
that the ability to inspect the current readline bindings from Python
was both useful and important.

I wrote:

> I don't believe that there is any direct mechanism for querying the current
> readline bindings in Python,

But it was requested some time ago:

Is there anyone willing and able to give this issue some attention please?

(Replies to python-dev only please.)


From greg.ewing at  Mon Oct  8 00:55:26 2012
From: greg.ewing at (Greg Ewing)
Date: Mon, 08 Oct 2012 11:55:26 +1300
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

Antoine Pitrou wrote:
> On Sun, 7 Oct 2012 22:43:02 +0100
> Arnaud Delobelle <arnodel at>
> wrote:
>>Naive question: how do you deal with HFS+, which is case-preserving
>>but on most machines case-insensitive?
> I don't know. How does os.path deal with it?

Not all that well, apparently. From the docs for os.path:

     Normalize the case of a pathname. On Unix and Mac OS X, this returns the
     path unchanged; on case-insensitive filesystems, it converts the path to
     lowercase. On Windows, it also converts forward slashes to backward slashes.

This is partially self-contradictory, since many MacOSX filesystems are
actually case-insensitive; it depends on the particular filesystem concerned.
Worse, different parts of the same path can have different case sensitivities.
Also, with network file systems, not all paths are necessarily case-insensitive
on Windows.

So there's really no certain way to compare pure paths for equality. Basing
it on which OS is running your code is no more than a guess.


From greg.ewing at  Mon Oct  8 01:30:29 2012
From: greg.ewing at (Greg Ewing)
Date: Mon, 08 Oct 2012 12:30:29 +1300
Subject: [Python-ideas] Propagating StopIteration value
In-Reply-To: <>
References: <k4q38d$j8e$> <>
Message-ID: <>

Oscar Benjamin wrote:
> Before pep 380 filter(lambda x: True, obj) returned an object that was
> the same kind of iterator as obj (it would yield the same values). Now
> the "kind of iterator" that obj is depends not only on the values that
> it yields but also on the value that it returns. Since filter does not
> pass on the same return value, filter(lambda x: True, obj) is no
> longer the same kind of iterator as obj.

Something like this has happened before, when the ability to
send() values into a generator was added. If you wrap a
generator with filter, you likewise don't get the same kind
of object -- you don't get the ability to send() things
into your filtered generator.

So, "provide the same kind of iterator" is not currently part
of the contract of these functions.
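
A quick way to see the send() precedent, as a minimal sketch with a made-up echo() generator:

```python
def echo():
    """Yield back whatever was last sent in."""
    received = None
    while True:
        received = yield received

gen = echo()
next(gen)                        # prime the generator
sent_back = gen.send('hello')    # plain generators accept send()

wrapped = filter(lambda x: True, echo())
has_send = hasattr(wrapped, 'send')   # the filter object does not

print(sent_back, has_send)
```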

> When writing code like the above that depends on being able to get the
> value returned from an iterator, it is no longer possible to freely
> mix utilities like filter, map, zip, itertools.chain with the
> iterators returned by parse() as they no longer act as transparent
> wrappers over the underlying iterators (by not propagating the value
> attached to StopIteration).

In many cases they *can't* act as transparent wrappers with
respect to the return value, because there is more than one return
value to deal with.

There's also the added complication that sometimes not all of the
sub-iterators are run to completion -- e.g. izip() stops as soon
as one of them reaches the end.
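
A small sketch of both points, assuming Python 3 (where the built-in zip plays the role of izip). The numbers() helper is hypothetical and exists only for this illustration.

```python
def numbers(n, finished):
    """Yield 0..n-1, then record completion and return a summary value."""
    for i in range(n):
        yield i
    finished.append(n)  # reached only if the generator runs to completion
    return n

finished = []
# zip stops as soon as the shorter generator is exhausted, so the longer
# one never reaches its return statement.
pairs = list(zip(numbers(2, finished), numbers(5, finished)))

# A generator has send(); the object returned by filter() does not.
has_send_before = hasattr(numbers(1, []), "send")
has_send_after = hasattr(filter(None, numbers(1, [])), "send")
```

Only the two-element generator records its completion here, so there is no return value from the other one to propagate.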


From greg.ewing at  Mon Oct  8 01:36:20 2012
From: greg.ewing at (Greg Ewing)
Date: Mon, 08 Oct 2012 12:36:20 +1300
Subject: [Python-ideas] Propagating StopIteration value
In-Reply-To: <k4sl8b$ibt$>
References: <k4q38d$j8e$> <>
Message-ID: <>

Serhiy Storchaka wrote:
> The conflict is easily solved by using handwritten code instead of the
> standard itertools.chain. It looks like a bug in itertools.chain.

Don't underestimate the value of handwritten code. It makes the
intent clear to the reader, whereas relying on some arbitrary
default behaviour of a function doesn't.
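
As an illustration of what such handwritten code might look like, here is a sketch of a chain-like generator that states its intent explicitly. The names chain_with_results, section and drain are hypothetical, and collecting the return values into a list is just one possible choice.

```python
def chain_with_results(*generators):
    """Yield everything from each input; return the list of their return values."""
    results = []
    for gen in generators:
        result = yield from gen  # the value attached to StopIteration
        results.append(result)
    return results

def section(lines, name):
    """Toy generator: yields its lines, then returns a (name, count) summary."""
    yield from lines
    return name, len(lines)

def drain(gen):
    """Collect the yielded items and the final return value of a generator."""
    items = []
    while True:
        try:
            items.append(next(gen))
        except StopIteration as stop:
            return items, stop.value

items, results = drain(
    chain_with_results(section(["a", "b"], "head"), section(["c"], "tail"))
)
```

A reader can see exactly what happens to each return value, which a default behaviour buried inside a library function would not show.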


From guido at  Mon Oct  8 01:36:20 2012
From: guido at (Guido van Rossum)
Date: Sun, 7 Oct 2012 16:36:20 -0700
Subject: [Python-ideas] Propagating StopIteration value
In-Reply-To: <>
References: <k4q38d$j8e$> <>
Message-ID: <>

On Sun, Oct 7, 2012 at 3:43 PM, Oscar Benjamin
<oscar.j.benjamin at> wrote:
> On 7 October 2012 21:19, Guido van Rossum <guido at> wrote:
>> On Sun, Oct 7, 2012 at 12:30 PM, Serhiy Storchaka <storchaka at> wrote:
>>> On 07.10.12 04:45, Guido van Rossum wrote:
>>>> But yes, this was all considered and accepted when PEP 380 was debated
>>>> (endlessly :-), and I see no reason to change anything about this.
>>> The reason is that when someone uses StopIteration.value for some purposes,
>>> he will lose this value if the iterator will be wrapped into itertools.chain
>>> (quite often used technique) or into other standard iterator wrapper.
>> If this is just about itertools.chain() I may see some value in it (but
>> TBH the discussion so far mostly confuses -- please spend some more
>> time coming up with good examples that show actually useful use cases
>> rather than f() and g() or foo() and bar())
>>  OTOH yield from is not primarily for iterators -- it is for
>> coroutines. I suspect most of the itertools functionality just doesn't
>> work with coroutines.
> I think what Serhiy is saying is that although pep 380 mainly
> discusses generator functions it has effectively changed the
> definition of what it means to be an iterator for all iterators:
> previously an iterator was just something that yielded values but now
> it also returns a value. Since the meaning of an iterator has changed,
> functions that work with iterators need to be updated.

I think there are different philosophical viewpoints possible on that
issue. My own perspective is that there is no change in the definition
of iterator -- only in the definition of generator. Note that the
*ability* to attach a value to StopIteration is not new at all.
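
To illustrate, here is a sketch of a plain class-based iterator (no generator involved) that attaches a value to StopIteration. The Countdown class is a made-up example. The only part that needs Python 3.3 is the "yield from" used to read the value back.

```python
class Countdown:
    """A non-generator iterator that attaches a value to StopIteration."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            # Any iterator may pass an argument to StopIteration.
            raise StopIteration("liftoff")
        self.current -= 1
        return self.current + 1

def delegate():
    # "yield from" evaluates to the value carried by StopIteration.
    result = yield from Countdown(3)
    yield result

values = list(delegate())
```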

> Before pep 380 filter(lambda x: True, obj) returned an object that was
> the same kind of iterator as obj (it would yield the same values). Now
> the "kind of iterator" that obj is depends not only on the values that
> it yields but also on the value that it returns. Since filter does not
> pass on the same return value, filter(lambda x: True, obj) is no
> longer the same kind of iterator as obj. The same considerations apply
> to many other functions such as map, itertools.groupby,
> itertools.dropwhile.

There are other differences between iterators and generators that are
not preserved by the various forms of "iterator algebra" that can be
applied -- in particular, non-generator iterators don't support
send(). I think it's perfectly valid to view generators as a kind of
special iterators with properties that aren't preserved by applying
generic iterator operations to them (like itertools or filter()).

> Cases like itertools.chain and zip are trickier since they each act on
> multiple underlying iterables. Probably chain should return a tuple of
> the return values from each of its iterables.

That's one possible interpretation, but I doubt it's the most useful one.

> This feature was new in Python 3.3 which was released a week ago

It's been in alpha/beta/candidate for a long time, and PEP 380 was
first discussed in 2009.

> so it is not widely used but it has uses that are not anything to do with
> coroutines.

Yes, as a shortcut for "for x in <iterator>: yield x". Note that the
for-loop ignores the value in the StopIteration -- would you want to
change that too?
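
A short sketch of the difference, with hypothetical generator names: the for-loop form silently drops the return value, while "yield from" makes it available to the delegating generator.

```python
def inner():
    yield "a"
    yield "b"
    return "inner result"

def with_for_loop():
    # The for-loop consumes StopIteration and discards its value.
    for x in inner():
        yield x

def with_yield_from():
    # Same items are re-yielded, but the return value is captured.
    result = yield from inner()
    yield result

for_items = list(with_for_loop())
yield_from_items = list(with_yield_from())
```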

> As an example of how you could use it, consider parsing a
> file that can contains #include statements. When the #include
> statement is encountered we need to insert the contents of the
> included file. This is easy to do with a recursive generator. The
> example uses the return value of the generator to keep track of which
> line is being parsed in relation to the flattened output file:
> def parse(filename, output_lineno=0):
>     with open(filename) as fin:
>         for input_lineno, line in enumerate(fin):
>             if line.startswith('#include '):
>                 subfilename = line.split()[1]
>                 output_lineno = yield from parse(subfilename, output_lineno)
>             else:
>                 try:
>                     yield parse_line(line)
>                 except ParseLineError:
>                     raise ParseError(filename, input_lineno, output_lineno)
>                 output_lineno += 1
>     return output_lineno

Hm. This example looks constructed to prove your point... It would be
easier to count the output lines in the caller. Or you could use a
class to hold that state. I think it's just a bad habit to start using
the return value for this purpose. Please use the same approach as you
would before 3.3, using "yield from" just as the shortcut I mentioned.
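
One way the class-based alternative could look is sketched below. This is only an illustration: the Flattener class and its in-memory files mapping are assumptions standing in for real files, and error handling is left out.

```python
class Flattener:
    """Expand '#include name' lines, keeping the output line count as state."""

    def __init__(self, files):
        self.files = files        # hypothetical: maps a name to its lines
        self.output_lineno = 0    # lines emitted so far in the flat output

    def lines(self, name):
        for line in self.files[name]:
            if line.startswith('#include '):
                # "yield from" is used purely as a shortcut for re-yielding.
                yield from self.lines(line.split()[1])
            else:
                self.output_lineno += 1
                yield line

files = {'main': ['a', '#include sub', 'd'], 'sub': ['b', 'c']}
flattener = Flattener(files)
output = list(flattener.lines('main'))
```

Because the counter lives on the object, wrapping flattener.lines() in filter, map or itertools.chain does not lose it.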

> When writing code like the above that depends on being able to get the
> value returned from an iterator, it is no longer possible to freely
> mix utilities like filter, map, zip, itertools.chain with the
> iterators returned by parse() as they no longer act as transparent
> wrappers over the underlying iterators (by not propagating the value
> attached to StopIteration).

I see that as one more argument for not using the return value here...

> Hopefully, I've understood Serhiy and the docs correctly (I don't have
> access to Python 3.3 right now to test any of this).

I don't doubt it. But I think you're fighting windmills.

--Guido van Rossum

From sven at  Mon Oct  8 01:43:25 2012
From: sven at (Sven Marnach)
Date: Mon, 8 Oct 2012 00:43:25 +0100
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <20121007234325.GA20216@bagheera>

On Thu, Oct 04, 2012 at 05:08:40PM +0200, Victor Stinner wrote:
> I think that the optimization should be implemented for Unicode
> strings, but disabled in PyObject_RichCompareBool().

Actually, this change to PyObject_RichCompareBool() has been made
before, but was reverted after the discussion in


From alexander.belopolsky at  Mon Oct  8 02:35:14 2012
From: alexander.belopolsky at (Alexander Belopolsky)
Date: Sun, 7 Oct 2012 20:35:14 -0400
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Oct 4, 2012 at 9:53 AM, Steven D'Aprano <steve at> wrote:
> (Please do not start an argument about NANs and reflexivity. That's
> been argued to death, and there are very good reasons for the IEEE 754
> standard to define NANs the way they do.)

Why not?  This is python-ideas, isn't it?  I've been hearing that IEEE
754 committee had some "very good reasons" to violate reflexivity of
equality comparison with NaNs since I first learned about NaNs some 20
years ago.   From time to time, I've also heard claims that there are
some important numeric algorithms that depend on this behavior.
However, I've never been able to dig out the actual rationale that
convinced the committee that voted for IEEE 754 or any very good
reasons to preserve this behavior in Python.

I am not suggesting any language changes, but I think it will be
useful to explain why float('nan') != float('nan') somewhere in the
docs.  A reference to IEEE 754 does not help much.  Java implements
IEEE 754 to some extent, but preserves reflexivity of object equality.

From rosuav at  Mon Oct  8 02:42:28 2012
From: rosuav at (Chris Angelico)
Date: Mon, 8 Oct 2012 11:42:28 +1100
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 8, 2012 at 11:35 AM, Alexander Belopolsky
<alexander.belopolsky at> wrote:
> I am not suggesting any language changes, but I think it will be
> useful to explain why float('nan') != float('nan') somewhere in the
> docs.  A reference to IEEE 754 does not help much.  Java implements
> IEEE 754 to some extent, but preserves reflexivity of object equality.

NaN isn't a single value, but a whole category of values.
Conceptually, it's an uncountably infinite set (I think that's the
technical term) of invalid results; in implementation, NaN has the
highest possible exponent and any non-zero mantissa.

So then the question becomes: Should *all* NaNs be equal, or only ones
with the same bit pattern? Aside from signalling vs non-signalling
NaNs, I don't think there's any difference between one and another, so
they should probably all compare equal. And once you go there, a huge
can o'worms is opened involving floating point equality.

It's much MUCH easier and simpler to defer to somebody else's standard
and just say "NaNs behave according to IEEE 754, blame them if you
don't like it". There would possibly be value in guaranteeing
reflexivity, but it would increase confusion somewhere else.
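
The exponent/mantissa description above can be checked with a short sketch, assuming the platform float is an IEEE 754 double (true on essentially all current hardware). The helper names are made up for this example. Note that in this 64-bit encoding the number of distinct NaN bit patterns is large but finite.

```python
import math
import struct

def bits_of(x):
    """Return the 64-bit pattern of a float as an integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def float_from_bits(bits):
    """Build a float from a 64-bit integer pattern."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

EXPONENT_MASK = 0x7FF            # 11 exponent bits
MANTISSA_MASK = (1 << 52) - 1    # 52 mantissa bits

nan_bits = bits_of(float("nan"))
exponent = (nan_bits >> 52) & EXPONENT_MASK
mantissa = nan_bits & MANTISSA_MASK

# Maximum exponent with a non-zero mantissa is a NaN ...
other_nan = float_from_bits(0x7FF8000000000001)
# ... while maximum exponent with a zero mantissa is infinity.
infinity = float_from_bits(0x7FF0000000000000)
```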


From mikegraham at  Mon Oct  8 02:43:35 2012
From: mikegraham at (Mike Graham)
Date: Sun, 7 Oct 2012 20:43:35 -0400
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 7, 2012 at 8:35 PM, Alexander Belopolsky
<alexander.belopolsky at> wrote:
> Java implements IEEE 754 to some extent, but preserves reflexivity of object equality.

I don't actually know Java, but if I run

class HelloNaN {
    public static void main(String[] args) {
        double nan1 = 0.0 / 0.0;
        double nan2 = 0.0 / 0.0;
        System.out.println(nan1 == nan2);
    }
}

I get the output "false".


From alexander.belopolsky at  Mon Oct  8 02:47:36 2012
From: alexander.belopolsky at (Alexander Belopolsky)
Date: Sun, 7 Oct 2012 20:47:36 -0400
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <>

Try this with Double instead of double.  Note that I said "*object*
equality".  In Java, lowercase double is not an object type.

On Sun, Oct 7, 2012 at 8:43 PM, Mike Graham <mikegraham at> wrote:
> On Sun, Oct 7, 2012 at 8:35 PM, Alexander Belopolsky
> <alexander.belopolsky at> wrote:
>> Java implements IEEE 754 to some extent, but preserves reflexivity of object equality.
> I don't actually know Java, but if I run
> class HelloNaN {
>     public static void main(String[] args) {
>         double nan1 = 0.0 / 0.0;
>         double nan2 = 0.0 / 0.0;
>         System.out.println(nan1 == nan2);
>     }
> }
> I get the output "false".
> Mike

From alexander.belopolsky at  Mon Oct  8 02:50:01 2012
From: alexander.belopolsky at (Alexander Belopolsky)
Date: Sun, 7 Oct 2012 20:50:01 -0400
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 7, 2012 at 8:42 PM, Chris Angelico <rosuav at> wrote:
> It's much MUCH easier and simpler to defer to somebody else's standard
> and just say "NaNs behave according to IEEE 754, blame them if you
> don't like it". There would possibly be value in guaranteeing
> reflexivity, but it would increase confusion somewhere else.

I agree, but a good thing about standards is that there are plenty to
choose from.  We can as easily refer to Java as a standard.

From guido at  Mon Oct  8 02:52:41 2012
From: guido at (Guido van Rossum)
Date: Sun, 7 Oct 2012 17:52:41 -0700
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 6, 2012 at 9:09 PM, Duncan M. McGreggor
<duncan.mcgreggor at> wrote:
> We're here ;-)
> I'm forwarding this to the rest of the Twisted cabal...

Quick question. I'd like to see how Twisted typically implements a
protocol parser. Where would be a good place to start reading example
code?

--Guido van Rossum

From guido at  Mon Oct  8 02:54:10 2012
From: guido at (Guido van Rossum)
Date: Sun, 7 Oct 2012 17:54:10 -0700
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 7, 2012 at 5:50 PM, Alexander Belopolsky
<alexander.belopolsky at> wrote:
> On Sun, Oct 7, 2012 at 8:42 PM, Chris Angelico <rosuav at> wrote:
>> It's much MUCH easier and simpler to defer to somebody else's standard
>> and just say "NaNs behave according to IEEE 754, blame them if you
>> don't like it". There would possibly be value in guaranteeing
>> reflexivity, but it would increase confusion somewhere else.
> I agree, but a good thing about standards is that there are plenty to
> choose from.  We can as easily refer to Java as a standard.

Very funny.

Seriously, we can't change our position on this topic now without
making a lot of people seriously unhappy. IEEE 754 it is.

--Guido van Rossum

From alexander.belopolsky at  Mon Oct  8 03:09:08 2012
From: alexander.belopolsky at (Alexander Belopolsky)
Date: Sun, 7 Oct 2012 21:09:08 -0400
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 7, 2012 at 8:54 PM, Guido van Rossum <guido at> wrote:
> Seriously, we can't change our position on this topic now without
> making a lot of people seriously unhappy. IEEE 754 it is.

I did not suggest a change.  I wrote: "I am not suggesting any
language changes, but I think it will be
useful to explain why float('nan') != float('nan') somewhere in the
docs."  If there is a concise explanation for the choice of IEEE 754
vs. Java, I think we should write it down and put an end to this
debate.

From ben at  Mon Oct  8 03:41:52 2012
From: ben at (Ben Darnell)
Date: Sun, 7 Oct 2012 18:41:52 -0700
Subject: [Python-ideas] asyncore: included batteries don't fit
Message-ID: <>

Hi python-ideas,

I'm jumping in to this thread on behalf of Tornado.  I think there are
actually two separate issues here and it's important to keep them
distinct:  at a low level, there is a need for a standardized event
loop, while at a higher level there is a question of what asynchronous
code should look like.

This thread so far has been more about the latter, but the need for
standardization is more acute for the core event loop.  I've written a
bridge between Tornado and Twisted so libraries written for both event
loops can coexist, but obviously that wouldn't scale if there were a
proliferation of event loop implementations out there.  I'd be in
favor of a simple event loop interface in the standard library, with
reference implementation(s) (select, epoll, kqueue, iocp) and some
means of configuring the global (or thread-local) singleton.  My
preference is to keep the interface fairly low-level and close to the
underlying mechanisms (i.e. like IReactorFDSet instead of
IReactor{TCP,UDP,SSL,etc}), so that different interfaces like
Tornado's IOStream or Twisted's protocols can be built on top of it.
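
To make the idea of a low-level, file-descriptor-oriented interface concrete, here is a rough sketch built on the standard selectors module (an assumption; it is a later addition to the stdlib). MiniLoop and its method names are hypothetical and are not the API of Tornado, Twisted, or any proposal in this thread.

```python
import selectors
import socket

class MiniLoop:
    """Toy event loop: register a callback to run when a socket is readable."""

    def __init__(self):
        self._selector = selectors.DefaultSelector()
        self._running = False

    def add_reader(self, fileobj, callback):
        self._selector.register(fileobj, selectors.EVENT_READ, callback)

    def remove_reader(self, fileobj):
        self._selector.unregister(fileobj)

    def stop(self):
        self._running = False

    def run(self):
        self._running = True
        while self._running:
            for key, _events in self._selector.select(timeout=1.0):
                key.data(key.fileobj)  # key.data holds the callback

    def close(self):
        self._selector.close()

loop = MiniLoop()
left, right = socket.socketpair()
received = []

def on_readable(sock):
    received.append(sock.recv(1024))
    loop.remove_reader(sock)
    loop.stop()

loop.add_reader(right, on_readable)
left.sendall(b"ping")
loop.run()
left.close()
right.close()
loop.close()
```

Higher-level abstractions such as streams or protocols would be layered on top of calls like add_reader(), rather than being part of the loop itself.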

As for the higher-level question of what asynchronous code should look
like, there's a lot more room for spirited debate, and I don't think
there's enough consensus to declare a One True Way.  Personally, I'm
-1 on greenlets as a general solution (what if you have to call
MySQLdb or getaddrinfo?), although they can be useful in particular
cases to convert well-behaved synchronous code into async (as in
Motor).  I like Futures, though, and I find that they work well in
asynchronous code.  The use of the result() method to encapsulate both
successful responses and exceptions is especially nice with generator
coroutines.

FWIW, here's the interface I'm moving towards for async code.  From
the caller's perspective, asynchronous functions return a Future (the
future has to be constructed by hand since there is no Executor
involved), and also take an optional callback argument (mainly for
consistency with currently-prevailing patterns for async code; if the
callback is given it is simply added to the Future with
add_done_callback).  In Tornado the Future is created by a decorator
and hidden from the asynchronous function (it just sees the callback),
although this relies on some Tornado-specific magic for exception
handling.  In a coroutine, the decorator recognizes Futures and
resumes execution when the future is done.  With these decorators
asynchronous code looks almost like synchronous code, except for the
"yield" keyword before each asynchronous call.
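
The pattern described above can be sketched roughly as follows, using concurrent.futures.Future constructed by hand. This is not Tornado's implementation: run_coroutine, fetch and handler are hypothetical names, and the "asynchronous" function completes its Future immediately to keep the example self-contained.

```python
from concurrent.futures import Future

def run_coroutine(gen):
    """Drive a generator that yields Futures; return a Future for its result."""
    outer = Future()

    def step(value=None, error=None):
        try:
            if error is not None:
                yielded = gen.throw(error)
            else:
                yielded = gen.send(value)
        except StopIteration as stop:
            outer.set_result(stop.value)
            return
        except Exception as exc:
            outer.set_exception(exc)
            return
        # Resume the generator once the yielded Future is done.
        yielded.add_done_callback(resume)

    def resume(future):
        try:
            value = future.result()  # returns the value or raises the exception
        except Exception as exc:
            step(error=exc)
        else:
            step(value)

    step()
    return outer

def fetch(value, callback=None):
    """Stand-in async function: returns a Future, optionally taking a callback."""
    future = Future()
    if callback is not None:
        future.add_done_callback(lambda f: callback(f.result()))
    future.set_result(value)
    return future

def handler():
    a = yield fetch(1)
    b = yield fetch(2)
    return a + b

seen = []
fetch(10, callback=seen.append)
result = run_coroutine(handler())
```

The body of handler() reads like synchronous code apart from the "yield" before each call, and result() carries either the return value or the exception.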


From guido at  Mon Oct  8 03:51:51 2012
From: guido at (Guido van Rossum)
Date: Sun, 7 Oct 2012 18:51:51 -0700
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 7, 2012 at 6:09 PM, Alexander Belopolsky
<alexander.belopolsky at> wrote:
> On Sun, Oct 7, 2012 at 8:54 PM, Guido van Rossum <guido at> wrote:
>> Seriously, we can't change our position on this topic now without
>> making a lot of people seriously unhappy. IEEE 754 it is.
> I did not suggest a change.  I wrote: "I am not suggesting any
> language changes, but I think it will be
> useful to explain why float('nan') != float('nan') somewhere in the
> docs."  If there is a concise explanation for the choice of IEEE 754
> vs. Java, I think we should write it down and put an end to this
> debate.

Referencing Java here is absurd and I still consider this suggestion
as a troll. Python is not in any way based on Java.

On the other hand referencing IEEE 754 makes all the sense in the
world, since every other aspect of Python float is based on IEEE 754
double whenever the underlying platform implements this standard --
and all modern CPUs do. I don't think there's anything else we need to
say.

--Guido van Rossum

From guido at  Mon Oct  8 04:01:42 2012
From: guido at (Guido van Rossum)
Date: Sun, 7 Oct 2012 19:01:42 -0700
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 7, 2012 at 6:41 PM, Ben Darnell <ben at> wrote:
> Hi python-ideas,
> I'm jumping in to this thread on behalf of Tornado.


> I think there are
> actually two separate issues here and it's important to keep them
> distinct:  at a low level, there is a need for a standardized event
> loop, while at a higher level there is a question of what asynchronous
> code should look like.

Yes, yes. I tried to bring up this distinction. I'm glad I didn't
completely fail.

> This thread so far has been more about the latter, but the need for
> standardization is more acute for the core event loop.  I've written a
> bridge between Tornado and Twisted so libraries written for both event
> loops can coexist, but obviously that wouldn't scale if there were a
> proliferation of event loop implementations out there.  I'd be in
> favor of a simple event loop interface in the standard library, with
> reference implementation(s) (select, epoll, kqueue, iocp) and some
> means of configuring the global (or thread-local) singleton.  My
> preference is to keep the interface fairly low-level and close to the
> underlying mechanisms (i.e. like IReactorFDSet instead of
> IReactor{TCP,UDP,SSL,etc}), so that different interfaces like
> Tornado's IOStream or Twisted's protocols can be built on top of it.

As long as it's not so low-level that other people shy away from it.

I also have a feeling that one way or another this will require
cooperation between the Twisted and Tornado developers in order to
come up with a compromise that both are willing to conform to in a
meaningful way. (Unfortunately I don't know how to define "meaningful
way" more precisely here. I guess the idea is that almost all things
*using* an event loop use the standardized abstract API without caring
whether underneath it's Tornado, Twisted, or some simpler thing in the
stdlib.)

> As for the higher-level question of what asynchronous code should look
> like, there's a lot more room for spirited debate, and I don't think
> there's enough consensus to declare a One True Way.  Personally, I'm
> -1 on greenlets as a general solution (what if you have to call
> MySQLdb or getaddrinfo?), although they can be useful in particular
> cases to convert well-behaved synchronous code into async (as in
> Motor:

Agreed on both counts.

>  I like Futures, though, and I find that they work well in
> asynchronous code.  The use of the result() method to encapsulate both
> successful responses and exceptions is especially nice with generator
> coroutines.


> FWIW, here's the interface I'm moving towards for async code.  From
> the caller's perspective, asynchronous functions return a Future (the
> future has to be constructed by hand since there is no Executor
> involved),

Ditto for NDB (though there's a decorator that often takes care of the
future construction).

> and also take an optional callback argument (mainly for
> consistency with currently-prevailing patterns for async code; if the
> callback is given it is simply added to the Future with
> add_done_callback).

That's interesting. I haven't found the need for this yet. Is it
really so common that you can't write this as a Future() constructor
plus a call to add_done_callback()? Or is there some subtle semantic

> In Tornado the Future is created by a decorator
> and hidden from the asynchronous function (it just sees the callback),

Hm, interesting. NDB goes the other way, the callbacks are mostly used
to make Futures work, and most code (including large swaths of
internal code) uses Futures. I think NDB is similar to monocle here.
In NDB, you can do

  f = <some function returning a Future>
  r = yield f

where "yield f" is mostly equivalent to f.result(), except it gives
better opportunity for concurrency.

> although this relies on some Tornado-specific magic for exception
> handling.  In a coroutine, the decorator recognizes Futures and
> resumes execution when the future is done.  With these decorators
> asynchronous code looks almost like synchronous code, except for the
> "yield" keyword before each asynchronous call.

Yes! Same here.

I am currently trying to understand if using "yield from" (and
returning a value from a generator) will simplify things. For example
maybe the need for a special decorator might go away. But I keep
getting headaches -- perhaps there's a Monad involved. :-)
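
A tiny sketch of what "yield from" buys here, with made-up coroutine names and a hand-rolled driver instead of a real scheduler: the sub-coroutine needs no decorator, and its return value arrives as the value of the "yield from" expression.

```python
def read_reply():
    """Sub-coroutine: waits for one value from the driver, returns it doubled."""
    raw = yield "need input"
    return raw * 2

def session():
    # Delegation: yields from read_reply() pass straight through to the driver.
    reply = yield from read_reply()
    return reply + 1

coro = session()
request = next(coro)      # runs until the inner yield
try:
    coro.send(5)          # resumes read_reply(), which then returns 10
except StopIteration as stop:
    final = stop.value    # value returned by session()
```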

--Guido van Rossum

From alexander.belopolsky at  Mon Oct  8 04:33:37 2012
From: alexander.belopolsky at (Alexander Belopolsky)
Date: Sun, 7 Oct 2012 22:33:37 -0400
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 7, 2012 at 9:51 PM, Guido van Rossum <guido at> wrote:
> Referencing Java here is absurd and I still consider this suggestion
> as a troll. Python is not in any way based on Java.

I did not suggest that.  Sorry if it came out this way.  I am well
aware that Python and Java were invented independently and have
different roots.  (IIRC, Java was born from Oak and Python from ABC
and Oak and ABC were both developed in the 1980s.)  IEEE 754 precedes
both languages and one team decided that equality reflexivity for
hashable objects was more important than IEEE 754 compliance while the
other decided otherwise.

Many Python features (mostly library) are motivated by C.  In the 90s,
"because C does it this way" was a good explanation for a language
feature.  Doing things differently from the "C way", on the other hand
would deserve an explanation.  These days, C is rarely the first language
that a student learns.  Hopefully Python will take this place in the not
so distant future, but many students graduated in the late 90s - early
2000s knowing nothing but Java.   As a result, these days it is a
valid question to ask about a language feature: "Why does Python do X
differently from Java?"  Hopefully in most cases the answer is
"because Python does it better."

In case of nan != nan, I would really like to know a modern reason why
Python's way is better.  Better compliance with a 20-year old standard
does not really qualify.

From ned at  Mon Oct  8 04:35:17 2012
From: ned at (Ned Batchelder)
Date: Sun, 07 Oct 2012 22:35:17 -0400
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <>

On 10/7/2012 9:51 PM, Guido van Rossum wrote:
> On Sun, Oct 7, 2012 at 6:09 PM, Alexander Belopolsky
> <alexander.belopolsky at> wrote:
>> On Sun, Oct 7, 2012 at 8:54 PM, Guido van Rossum <guido at> wrote:
>>> Seriously, we can't change our position on this topic now without
>>> making a lot of people seriously unhappy. IEEE 754 it is.
>> I did not suggest a change.  I wrote: "I am not suggesting any
>> language changes, but I think it will be
>> useful to explain why float('nan') != float('nan') somewhere in the
>> docs."  If there is a concise explanation for the choice of IEEE 754
>> vs. Java, I think we should write it down and put an end to this
>> debate.
> Referencing Java here is absurd and I still consider this suggestion
> as a troll. Python is not in any way based on Java.
> On the other hand referencing IEEE 754 makes all the sense in the
> world, since every other aspect of Python float is based on IEEE 754
> double whenever the underlying platform implements this standard --
> and all modern CPUs do. I don't think there's anything else we need to
> say.
I don't understand the reluctance to address a common conceptual 
speed-bump in the docs.  After all, the tutorial has an entire chapter 
that explains how 
floats work, even though they work exactly as IEEE 754 says they should.

A sentence in section 5.4 (Numeric Types) would help.  Something like, 
"In accordance with the IEEE 754 standard, NaN's are not equal to any 
value, even another NaN.  This is because NaN doesn't represent a 
particular number, it represents an unknown result, and there is no way 
to know if one unknown result is equal to another unknown result."
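
For reference, the behaviour under discussion can be reproduced with a few lines. The variable names are only illustrative. The container results reflect the identity check that list membership performs before falling back to ==.

```python
import math

x = float("nan")
y = float("nan")

equal_to_other = (x == y)       # a NaN is not equal to another NaN
equal_to_self = (x == x)        # ... nor to itself
found_by_identity = x in [x]    # membership tests identity first
found_by_value = y in [x]       # a different NaN object is not found
is_nan = math.isnan(x)          # the reliable way to test for NaN
```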


From tjreedy at  Mon Oct  8 04:40:31 2012
From: tjreedy at (Terry Reedy)
Date: Sun, 07 Oct 2012 22:40:31 -0400
Subject: [Python-ideas] Propagating StopIteration value
In-Reply-To: <>
References: <k4q38d$j8e$> <>
Message-ID: <k4tefg$52k$>

On 10/7/2012 7:30 PM, Greg Ewing wrote:
> Oscar Benjamin wrote:
>> Before pep 380 filter(lambda x: True, obj) returned an object that was
>> the same kind of iterator as obj (it would yield the same values). Now
>> the "kind of iterator" that obj is depends not only on the values that
>> it yields but also on the value that it returns. Since filter does not
>> pass on the same return value, filter(lambda x: True, obj) is no
>> longer the same kind of iterator as obj.
> Something like this has happened before, when the ability to
> send() values into a generator was added. If you wrap a
> generator with filter, you likewise don't get the same kind
> of object -- you don't get the ability to send() things
> into your filtered generator.
> So, "provide the same kind of iterator" is not currently part
> of the contract of these functions.

Iterators are Python's generic sequential access device. They do that 
one thing and do it well.

The iterator protocol is intentionally and properly minimal. An iterator 
class *must* have appropriate .__iter__ and .__next__ methods. It *may* 
also have any other method and any data attribute. Indeed, any iterator 
must have some specific internal data. But these are ignored in generic 
iterator (or iterable) functions. If one does not want that, one should 
write more specific code.

For instance, file objects are iterators. Wrappers such as filter(lambda 
line: line[0] != '\n', open('somefile')) do not have any of the many 
other file methods and attributes. No one expects otherwise. If one 
needs access to the other attributes of the file object, one keeps a 
direct reference to the file object. Hence, the recommended idiom is to 
use a with statement.

Generators are another class of objects that are both iterators (and 
hence iterables) and something more. When they are used as input 
arguments to generic functions of iterables, the other behaviors are 
ignored, and should be ignored, just as with file objects and any other 
iterator+ objects.
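
A quick sketch of the file-object case, using io.StringIO as a stand-in for a real file so that the example is self-contained (the sample text is made up):

```python
import io

source = io.StringIO("first\n\nsecond\n")

# The wrapper is a generic iterator over the lines, nothing more.
non_blank = filter(lambda line: line != "\n", source)

wrapper_has_readline = hasattr(non_blank, "readline")
source_has_readline = hasattr(source, "readline")

lines = list(non_blank)
```

Anything beyond iteration has to go through the direct reference to source, just as with the file object in the paragraph above.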

Serhiy, if you want a module of *generator* specific functions 
('gentools' ?), you should write one and submit it to pypi for testing.

Terry Jan Reedy

From alexander.belopolsky at  Mon Oct  8 04:48:53 2012
From: alexander.belopolsky at (Alexander Belopolsky)
Date: Sun, 7 Oct 2012 22:48:53 -0400
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 7, 2012 at 10:33 PM, Alexander Belopolsky
<alexander.belopolsky at> wrote:
> In case of nan != nan, I would really like to know a modern reason why
> Python's way is better.

To this end, a link to Kahan's "How Java's Floating-Point Hurts
Everyone Everywhere" <>
may be appropriate.

From guido at  Mon Oct  8 04:49:29 2012
From: guido at (Guido van Rossum)
Date: Sun, 7 Oct 2012 19:49:29 -0700
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 7, 2012 at 7:16 PM, Duncan McGreggor
<duncan.mcgreggor at> wrote:
> On Sun, Oct 7, 2012 at 5:52 PM, Guido van Rossum <guido at> wrote:
>> On Sat, Oct 6, 2012 at 9:09 PM, Duncan M. McGreggor
>> <duncan.mcgreggor at> wrote:
>> > We're here ;-)
>> >
>> > I'm forwarding this to the rest of the Twisted cabal...
>> Quick question. I'd like to see how Twisted typically implements a
>> protocol parser. Where would be a good place to start reading example
>> code?
> I'm not exactly sure what you're looking for (e.g., I'm not sure what your
> exact definition of a protocol parser is), but this might be getting close
> to what you want:
>  *
>  *
> The POP3 protocol implementation in Twisted is a pretty good example of how
> one should create a protocol. It's a subclass of the
> twisted.protocol.basic.LineOnlyReceiver, and I'm guessing when you said
> "parsing" you're wanting to look at what's in the dataReceived method of
> that class.
> Hopefully that's what you were after...

Yes, those are perfect. The term I used came from one of Josiah's
previous messages in this thread, but I think he really meant protocol

My current goal is to see if it would be possible to come up with an
abstraction that makes it possible to write protocol handlers that are
independent from the rest of the infrastructure (e.g. transport,
reactor). I honestly have no idea if this is a sane idea but I'm going
to look into it anyway; if it works it would be cool to be able to
reuse the same POP3 logic in different environments (e.g. synchronous
thread-based, Twisted) without having to pull in all of Twisted. I.e.
Twisted could contribute the code to the stdlib and the stdlib could
make it work with SocketServer but Twisted could still use it
(assuming Twisted ever gets ported to Py3k :-).
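
A rough sketch of what such a transport-independent handler might look like follows. Everything here is hypothetical (LineBuffer, GreeterProtocol and the toy HELLO command are invented for illustration and are not Twisted's API): the handler only maps incoming bytes to outgoing bytes, so a blocking socket loop or a reactor could drive it equally well.

```python
class LineBuffer:
    """Accumulate bytes and split off complete CRLF-terminated lines."""

    def __init__(self):
        self._pending = b""

    def feed(self, data):
        self._pending += data
        # The last element is the (possibly empty) incomplete remainder.
        *lines, self._pending = self._pending.split(b"\r\n")
        return lines

class GreeterProtocol:
    """Toy protocol: answers 'HELLO name' lines, rejects anything else."""

    def __init__(self):
        self._buffer = LineBuffer()

    def data_received(self, data):
        """Take raw bytes from any transport; return the bytes to send back."""
        replies = []
        for line in self._buffer.feed(data):
            command, _, argument = line.partition(b" ")
            if command == b"HELLO":
                replies.append(b"+OK hi " + argument + b"\r\n")
            else:
                replies.append(b"-ERR unknown command\r\n")
        return b"".join(replies)

protocol = GreeterProtocol()
first = protocol.data_received(b"HELLO wo")          # incomplete line
second = protocol.data_received(b"rld\r\nQUIT\r\n")  # completes it, adds another
```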

--Guido van Rossum

From rob.cliffe at  Mon Oct  8 05:09:06 2012
From: rob.cliffe at (Rob Cliffe)
Date: Mon, 08 Oct 2012 04:09:06 +0100
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <>

On 08/10/2012 03:35, Ned Batchelder wrote:
> On 10/7/2012 9:51 PM, Guido van Rossum wrote:
>> On Sun, Oct 7, 2012 at 6:09 PM, Alexander Belopolsky
>> <alexander.belopolsky at> wrote:
>>> On Sun, Oct 7, 2012 at 8:54 PM, Guido van Rossum <guido at> 
>>> wrote:
>>>> Seriously, we can't change our position on this topic now without
>>>> making a lot of people seriously unhappy. IEEE 754 it is.
>>> I did not suggest a change.  I wrote: "I am not suggesting any
>>> language changes, but I think it will be
>>> useful to explain why float('nan') != float('nan') somewhere in the
>>> docs."  If there is a concise explanation for the choice of IEEE 754
>>> vs. Java, I think we should write it down and put an end to this
>>> debate.
>> Referencing Java here is absurd and I still consider this suggestion
>> as a troll. Python is not in any way based on Java.
>> On the other hand referencing IEEE 754 makes all the sense in the
>> world, since every other aspect of Python float is based on IEEE 754
>> double whenever the underlying platform implements this standard --
>> and all modern CPUs do. I don't think there's anything else we need to
>> say.
> I don't understand the reluctance to address a common conceptual 
> speed-bump in the docs.  After all, the tutorial has an entire chapter 
> ( that explains how 
> floats work, even though they work exactly as IEEE 754 says they should.
> A sentence in section 5.4 (Numeric Types) would help.  Something like, 
> "In accordance with the IEEE 754 standard, NaN's are not equal to any 
> value, even another NaN.  This is because NaN doesn't represent a 
> particular number, it represents an unknown result, and there is no 
> way to know if one unknown result is equal to another unknown result."
> --Ned.
I understand that the undefined result of a computation is not the same 
as the undefined result of another computation.
(E.g. one might represent positive infinity, another might represent 
underflow or loss of accuracy.)
But I can't help feeling (strongly) that the result of a computation 
should be equal to itself.
In other words, after
     x = float('nan')
     y = float('nan')
I would expect
     x != y
     x == x

After all, how much sense does this make (I got this in a quick test 
with Python 2.7.3):
 >>> x=float('nan')
 >>> x is x
True            # Well I guess you'd sorta expect this
 >>> x==x
False           # You what?
 >>> D = {1:x, 2:x}
 >>> D[1]==D[2]
False          # I see, both NANs - hmph!
 >>> [x]==[x]
True            # Oh yeh, it doesn't always work that way then?

Making equality non-reflexive feels utterly wrong to me, partly no doubt 
because of my mathematical background, partly because of the difficulty 
in implementing container objects and algorithms and God knows what else 
when you have to remember that some of the objects they may deal with 
may not be equal to themselves.  In particular the difference between my 
last two examples ( D[1]!=D[2] but [x]==[x] ) looks impossible to 
justify except by saying that for historical reasons the designers of 
lists and the designers of dictionaries made different - but entirely 
reasonable - assumptions about the equality relation, and (perhaps) 
whether identity implies equality (how do you explain to a Python 
learner that it doesn't (pathological code examples aside) ???).
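
The difference between the two cases can be reproduced in a few lines. This is a sketch of the observed CPython behaviour, not a statement about design intent: sequence comparison and containment test identity first and only fall back to == when the objects differ.

```python
x = float("nan")
y = float("nan")   # a second, distinct NaN object

same_object_equal = (x == x)         # float comparison: NaN is never equal
list_equal_same = ([x] == [x])       # elements are the same object
list_equal_distinct = ([x] == [y])   # different objects, so == is consulted
contains = (x in [x])                # containment also checks identity first
```

D[1] == D[2] in the session above compares the two floats directly, so no container shortcut is involved there.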
Couldn't each NAN when generated contain something that identified it 
uniquely, so that different NANs would always compare as not equal, but 
any given NAN would compare equal to itself?
Rob Cliffe

From alexander.belopolsky at  Mon Oct  8 05:46:43 2012
From: alexander.belopolsky at (Alexander Belopolsky)
Date: Sun, 7 Oct 2012 23:46:43 -0400
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 7, 2012 at 11:09 PM, Rob Cliffe <rob.cliffe at> wrote:
> Couldn't each NAN when generated contain something that identified it
> uniquely, so that different NANs would always compare as not equal, but any
> given NAN would compare equal to itself?

If we take this route and try to distinguish NaNs with different
payload, I am sure you will want to distinguish between -0.0 and 0.0
as well.  The latter would violate transitivity in -0.0 == 0 == 0.0.

The only sensible thing to do with NaNs is either to treat them all
equal (the Eiffel way) or to stick to IEEE default.

I don't think NaN behavior in Python is a result of a deliberate
decision to implement IEEE 754.  If that were the case, why does 0.0/0.0
not produce NaN?  Similarly, Python's math library does not produce
infinities where an IEEE 754 compliant library should:

>>> math.log(0.0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: math domain error

Some other operations behave inconsistently:

>>> 2 * 10.**308
inf
>>> 10.**309
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: (34, 'Result too large')

I think non-reflexivity of nan in Python is an accidental feature.
Python's float type was not designed with NaN in mind and until
recently, it was relatively difficult to create a nan in pure python.

It is also not true that IEEE 754 requires that nan == nan is false.
IEEE 754 does not define operator '==' (nor does it define boolean
false).  Instead, IEEE defines a comparison operation that can have
one of four results: >, <, =, or unordered.  The standard does require
that NaN compares unordered with anything, including itself, but it
does not follow that a language that defines an == operator with
boolean results must define it so that nan == nan is false.
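
As a rough illustration of the four-way comparison described above, here is a sketch in Python. The function name ieee_compare is hypothetical; it only models the four outcomes and is not an implementation of the standard.

```python
import math


def ieee_compare(a, b):
    """Return '<', '=', '>' or 'unordered' for two floats."""
    if math.isnan(a) or math.isnan(b):
        return "unordered"   # NaN is unordered with everything, itself included
    if a < b:
        return "<"
    if a > b:
        return ">"
    return "="


nan = float("nan")
results = [
    ieee_compare(1.0, 2.0),
    ieee_compare(-0.0, 0.0),
    ieee_compare(nan, nan),
]
```

How a language maps the "unordered" outcome onto a boolean == is then a separate decision, which is the point being made here.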

From ben at  Mon Oct  8 06:44:27 2012
From: ben at (Ben Darnell)
Date: Sun, 7 Oct 2012 21:44:27 -0700
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 7, 2012 at 7:01 PM, Guido van Rossum <guido at> wrote:
> As long as it's not so low-level that other people shy away from it.

That depends on the target audience.  The low-level IOLoop and Reactor
are pretty similar -- you can implement one in terms of the other --
but as you move up the stack cross-compatibility becomes harder.  For
example, if I wanted to implement tornado's IOStreams in twisted, I
wouldn't start with the analogous class in twisted (Protocol?), I'd go
down to the Reactor and build from there, so putting something
like IOStream or Protocol in asyncore2 wouldn't do much to unify the two
worlds.  (it would help people build async stuff with the stdlib
alone, but at that point it becomes more like a peer or competitor to
tornado and twisted instead of a bridge between them)

> I also have a feeling that one way or another this will require
> cooperation between the Twisted and Tornado developers in order to
> come up with a compromise that both are willing to conform to in a
> meaningful way. (Unfortunately I don't know how to define "meaningful
> way" more precisely here. I guess the idea is that almost all things
> *using* an event loop use the standardized abstract API without caring
> whether underneath it's Tornado, Twisted, or some simpler thing in the
> stdlib.

I'd phrase the goal as being able to run both Tornado and Twisted in
the same thread without any piece of code needing to know about both
systems.  I think that's achievable as far as core functionality goes.
 I expect both sides have some lesser-used functionality that might
not make it into the stdlib version, but as long as it's possible to
plug in a "real" IOLoop or Reactor when needed it should be OK.

>> As for the higher-level question of what asynchronous code should look
>> like, there's a lot more room for spirited debate, and I don't think
>> there's enough consensus to declare a One True Way.  Personally, I'm
>> -1 on greenlets as a general solution (what if you have to call
>> MySQLdb or getaddrinfo?), although they can be useful in particular
>> cases to convert well-behaved synchronous code into async (as in
>> Motor:
> Agreed on both counts.
>>  I like Futures, though, and I find that they work well in
>> asynchronous code.  The use of the result() method to encapsulate both
>> successful responses and exceptions is especially nice with generator
>> coroutines.
> Yay!
>> FWIW, here's the interface I'm moving towards for async code.  From
>> the caller's perspective, asynchronous functions return a Future (the
>> future has to be constructed by hand since there is no Executor
>> involved),
> Ditto for NDB (though there's a decorator that often takes care of the
> future construction).
>> and also take an optional callback argument (mainly for
>> consistency with currently-prevailing patterns for async code; if the
>> callback is given it is simply added to the Future with
>> add_done_callback).
> That's interesting. I haven't found the need for this yet. Is it
> really so common that you can't write this as a Future() constructor
> plus a call to add_done_callback()? Or is there some subtle semantic
> difference?

It's a Future constructor, a (conditional) add_done_callback, plus the
calls to set_result or set_exception and the with statement for error
handling.  In full:

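from concurrent.futures import Future
from tornado.stack_context import ExceptionStackContext

def future_wrap(f):
    def wrapper(*args, **kwargs):
        future = Future()
        # If the caller passed a callback, run it once the Future is done.
        if kwargs.get('callback') is not None:
            future.add_done_callback(kwargs.pop('callback'))
        # The wrapped function reports its result through this callback.
        kwargs['callback'] = future.set_result
        def handle_error(typ, value, tb):
            future.set_exception(value)
            return True
        with ExceptionStackContext(handle_error):
            f(*args, **kwargs)
        return future
    return wrapper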

>> In Tornado the Future is created by a decorator
>> and hidden from the asynchronous function (it just sees the callback),
> Hm, interesting. NDB goes the other way, the callbacks are mostly used
> to make Futures work, and most code (including large swaths of
> internal code) uses Futures. I think NDB is similar to monocle here.
> In NDB, you can do
>   f = <some function returning a Future>
>   r = yield f
> where "yield f" is mostly equivalent to f.result(), except it gives
> better opportunity for concurrency.

Yes, tornado's gen.engine does the same thing here.  However, the
stakes are higher than "better opportunity for concurrency" - in an
event loop if you call future.result() without yielding, you'll
deadlock if that Future's task needs to run on the same event loop.

>> although this relies on some Tornado-specific magic for exception
>> handling.  In a coroutine, the decorator recognizes Futures and
>> resumes execution when the future is done.  With these decorators
>> asynchronous code looks almost like synchronous code, except for the
>> "yield" keyword before each asynchronous call.
> Yes! Same here.
> I am currently trying to understand if using "yield from" (and
> returning a value from a generator) will simplify things. For example
> maybe the need for a special decorator might go away. But I keep
> getting headaches -- perhaps there's a Monad involved. :-)

I think if you build generator handling directly into the event loop
and use "yield from" for calls from one async function to another then
you can get by without any decorators.  But I'm not sure if you can do
that and maintain any compatibility with existing non-generator async

I think the ability to return from a generator is actually a bigger
deal than "yield from" (and I only learned about it from another
python-ideas thread today).  The only reason a generator decorated
with @tornado.gen.engine needs a callback passed in to it is to act as
a pseudo-return, and a real return would prevent the common mistake of
running the callback then falling through to the rest of the function.
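
For reference, a small sketch of returning a value from a generator (available from Python 3.3 with PEP 380). The function name fetch_length is hypothetical, and the driver code stands in for what an event loop or decorator would do.

```python
def fetch_length():
    # Stand-in for an asynchronous step; the driver sends the result back in.
    data = yield "request"
    return len(data)   # a real return, instead of calling a callback


gen = fetch_length()
request = next(gen)          # run up to the first yield
try:
    gen.send(b"hello")       # resume; the generator then returns
except StopIteration as stop:
    result = stop.value      # the returned value travels on StopIteration
```

Nothing after the return statement can run, which is what rules out the fall-through mistake mentioned above.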

For concreteness, here's a crude sketch of what the APIs I'm talking
about would look like in use (in a hypothetical future version of

def async_http_client(url, callback):
    parsed_url = urlparse.urlsplit(url)
    # works the same whether the future comes from a thread pool or @future_wrap
    addrinfo = yield g_thread_pool.submit(socket.getaddrinfo,
                                          parsed_url.hostname, parsed_url.port)
    stream = IOStream(socket.socket())
    yield stream.connect((addrinfo[0][-1]))
    stream.write('GET %s HTTP/1.0\r\n\r\n' % parsed_url.path)
    header_data = yield stream.read_until('\r\n\r\n')
    headers = parse_headers(header_data)
    body_data = yield stream.read_bytes(int(headers['Content-Length']))
    # the callback acts as the pseudo-return described above
    callback(body_data)

# another function to demonstrate composability
def fetch_some_urls(url1, url2, url3, callback):
    body1 = yield async_http_client(url1)
    # yield a list of futures for concurrency
    future2 = async_http_client(url2)
    future3 = async_http_client(url3)
    body2, body3 = yield [future2, future3]
    callback((body1, body2, body3))

One hole in this design is how to deal with callbacks that are run
multiple times.  For example, the IOStream read methods take both a
regular callback and an optional streaming_callback (which is called
with each chunk of data as it arrives).  I think this needs to be
modeled as something like an iterator of Futures, but I haven't worked
out the details yet.


> --
> --Guido van Rossum (

From g.brandl at  Mon Oct  8 08:05:29 2012
From: g.brandl at (Georg Brandl)
Date: Mon, 08 Oct 2012 08:05:29 +0200
Subject: [Python-ideas] PEP 428 - joining
In-Reply-To: <>
References: <>
	<> <k4suk9$nv8$>
Message-ID: <k4tqd4$hk9$>

Am 08.10.2012 00:29, schrieb MRAB:

>>>> I'd much rather if joining an absolute path to a relative one fail and
>>>> reveal the potential bug....
>>>>     >>> os.unlink(Path('myproj') / Path('/lib'))
>>>>     Traceback (most recent call last):
>>>>       File "<stdin>", line 1, in <module>
>>>>     TypeError: absolute path can't be appended to a relative path
>>> In all honesty I followed os.path.join's behaviour here. I agree a
>>> ValueError (not TypeError) would be sensible too.
>> Please no -- this is a very important use case (for os.path.join, at least):
>> resolving a path from config/user/command line that can be given either absolute
>> or relative to a certain directory.
>> Right now it's as simple as join(default, path), and i'd prefer to keep this.
>> There is no bug here, it's working as designed.
> In that use case, wouldn't it be more likely that the default is itself
> absolute, so it'd be either relative to that absolute path or
> overriding that absolute path with another absolute path?

That doesn't really matter; the default could be anything (e.g. "." could be a
common value).
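
A short sketch of that use case. posixpath is used instead of os.path so the result is the same on every platform, and the directory and file names are made up for illustration.

```python
import posixpath

default = "/etc/myapp"   # hypothetical default directory

# A relative setting is resolved against the default...
relative = posixpath.join(default, "conf.d/site.ini")
# ...while an absolute setting simply replaces it.
absolute = posixpath.join(default, "/srv/site.ini")
# The default itself does not have to be absolute.
dot_default = posixpath.join(".", "site.ini")
```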


From solipsis at  Mon Oct  8 08:26:28 2012
From: solipsis at (Antoine Pitrou)
Date: Mon, 8 Oct 2012 08:26:28 +0200
Subject: [Python-ideas] checking for identity before comparing built-in
References: <>
Message-ID: <>

On Sun, 07 Oct 2012 22:35:17 -0400
Ned Batchelder <ned at>
> I don't understand the reluctance to address a common conceptual 
> speed-bump in the docs.  After all, the tutorial has an entire chapter 
> ( that explains how 
> floats work, even though they work exactly as IEEE 754 says they should.
> A sentence in section 5.4 (Numeric Types) would help.  Something like, 
> "In accordance with the IEEE 754 standard, NaN's are not equal to any 
> value, even another NaN.  This is because NaN doesn't represent a 
> particular number, it represents an unknown result, and there is no way 
> to know if one unknown result is equal to another unknown result."





From solipsis at  Mon Oct  8 08:30:08 2012
From: solipsis at (Antoine Pitrou)
Date: Mon, 8 Oct 2012 08:30:08 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
References: <>
Message-ID: <>

On Mon, 08 Oct 2012 11:55:26 +1300
Greg Ewing <greg.ewing at> wrote:
> Not all that well, apparently. From the docs for os.path:
> os.path.normcase(path)
>      Normalize the case of a pathname. On Unix and Mac OS X, this returns the
>      path unchanged; on case-insensitive filesystems, it converts the path to
>      lowercase. On Windows, it also converts forward slashes to backward slashes.
> This is partially self-contradictory, since many MacOSX filesystems are
> actually case-insensitive; it depends on the particular filesystem concerned.
> Worse, different parts of the same path can have different case sensitivities.
> Also, with network file systems, not all paths are necessarily case-insensitive
> on Windows.

That's true, but considering paths case-insensitive under Windows and
case-sensitive under (non-OS X) Unix is still a very good approximation
that seems to satisfy most everyone.
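
To make the approximation concrete, here is a sketch using the two platform-specific modules directly, so it behaves the same regardless of the OS running it. The example path is made up.

```python
import ntpath
import posixpath

# Windows rules: fold case and convert forward slashes to backslashes.
windows = ntpath.normcase("C:/Users/Guido/Notes.TXT")
# POSIX rules: the path is returned unchanged.
posix = posixpath.normcase("/Users/Guido/Notes.TXT")
```

Neither call looks at the filesystem, which is why the result can only ever be an approximation of what a given volume actually does.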

> So there's really no certain way to compare pure paths for equality. Basing
> it on which OS is running your code is no more than a guess.

I wonder how well other file-dealing tools cope under OS X, especially
those that are portable and not OS X-specific.




From stephen at  Mon Oct  8 10:12:24 2012
From: stephen at (Stephen J. Turnbull)
Date: Mon, 08 Oct 2012 17:12:24 +0900
Subject: [Python-ideas] History stepping in interactive session?
In-Reply-To: <>
References: <>
Message-ID: <>

Andy Buckley writes:

 > So one last question, in case it is an acceptable python-ideas topic:
 > how about adding readline-like support by default in the
 > interpreter?

If readline-like support is available on the system, it's used.
However, it's apparently only readline-like.  For example, on Mac OS
X, the BSD-licensed libedit readline emulation is used by default, it
appears.  I wouldn't expect full functionality there.

On GNU/Linux systems, as I wrote, True GNU readline is used.  Why this
particular function isn't bound or doesn't work right, I don't know
offhand.  It is apparently a bug (my Python sources are from April,
but I can't see why this would change), since the sources say
(ll. 927-931 of Modules/readline.c):

    /* Initialize (allows .inputrc to override)
     * XXX: A bug in the readline-2.2 library causes a memory leak
     * inside this function.  Nothing we can do about it.

 but even adding a binding to .inputrc doesn't work for me (Gentoo Linux).


are related; I don't know whether it's worth filing an additional bug
as I suspect it will get fixed in passing if 8492 is fixed.

From ncoghlan at  Mon Oct  8 12:31:06 2012
From: ncoghlan at (Nick Coghlan)
Date: Mon, 8 Oct 2012 16:01:06 +0530
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

I've said before that I like the general shape of the pathlib API and
that's still the case. It's the only OO API I've seen that's
semantically clean enough for me to support introducing it as "the"
standard path abstraction in the standard library.

However, there are still a few rough edges I would like to see smoothed out :)

On Sat, Oct 6, 2012 at 5:48 PM, Antoine Pitrou <solipsis at> wrote:
> On Sat, 6 Oct 2012 11:27:58 +0100
> Paul Moore <p.f.moore at> wrote:
>> I agree that's what I thought relative() would be when I first read the name.
> You are right, relative() could be removed and replaced with the
> current relative_to() method. I wasn't sure about how these names would
> feel to a native English speaker.

The minor problem is that "relative" on its own is slightly unclear
about whether the invariant involved is "a ==
b.subpath(a.relative(b))" or "b == a.subpath(a.relative(b))"

By including the extra word, the intended meaning becomes crystal
clear: "a == b.subpath(a.relative_to(b))"
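
The invariant can be checked against pathlib as it eventually shipped in the standard library, which kept relative_to(). joinpath() stands in here for the proposed subpath(), so treat that spelling as an assumption and not as the API under discussion in this thread.

```python
from pathlib import PurePosixPath

a = PurePosixPath("/usr/lib/python")
b = PurePosixPath("/usr")

rel = a.relative_to(b)      # "a relative to b"
rebuilt = b.joinpath(rel)   # joining it back onto b gives a again
```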

However, "a relative to b" is the more natural interpretation, so +1
for using "relative" for the semantics of the method based equivalent
to the current os.path.relpath(). I agree there's no need for a
shorthand for "a.relative(a.root)"

As the invariants above suggest, I'm also currently -1 on *any* of the
proposed shorthands for "p.subpath(subpath)", *as well as* the use of
"join" as the method name (due to the major difference in semantics
relative to str.join).

All of the shorthands are magical and/or cryptic and save very little
typing over the explicitly named method. As already noted in the PEP,
you can also shorten it manually by saving the bound method to a local

It's important to remember that you can't readily search for syntactic
characters or common method names to find out what they mean, and
these days that kind of thing should be taken into account when
designing an API. "p.subpath('foo', 'bar')" looks like executable
pseudocode for creating a new path based on existing one to me, unlike
"p / 'foo' / 'bar'", "p['foo', 'bar']", or "p.join('foo', 'bar')".

The method semantics are obvious by comparison, since they would be
the same as those for ordinary construction: "p.subpath(*args) ==
type(p)(p, *args)"
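
The same equality holds in pathlib as it later shipped, where the method ended up being called joinpath(). That name is an assumption relative to this thread, in which the spelling was still open.

```python
from pathlib import PurePosixPath

p = PurePosixPath("/usr")

joined = p.joinpath("lib", "python")        # explicit method form
constructed = type(p)(p, "lib", "python")   # ordinary construction
```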

I'm not 100% sold on "subpath" as an alternative (since ".." entries
may mean that the result isn't really a subpath of the original
directory at all), but I do like the way it reads in the absence of
parent directory references, and I definitely like it better than
"join" or "[]" or "/" or "+". This interpretation is also favoured by
the fact that the calculation of relative path references is strict by
default (i.e. it won't insert ".." to make the reference work when the
target isn't a subpath)

> You can't really add '..' components and expect the result to be
> correct, for example if '/usr/lib' is a symlink to '/lib', then
> '/usr/lib/..' is '/', not '/usr'.
> That's why the resolve() method, which resolves symlinks along the path,
> is the only one allowed to muck with '..' components.

This seems too strict for the general case. Configuration files in
bundled applications, for example, often contain paths relative to the
file (e.g. open up a Visual Studio project file). There are no
symlinks involved there. Perhaps a "require_subpath" flag that
defaults to True would be appropriate? Passing "require_subpath=False"
would then provide explicit permission to add ".." entries as
appropriate, and it would be up to the developer to document the "no
symlinks!" restriction on their layout.
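
A sketch of the two behaviours side by side, using APIs that exist today: the strict relative_to() from the shipped pathlib, and the purely lexical posixpath.relpath(), which inserts ".." entries. The project layout is hypothetical, and the require_subpath flag proposed above is not part of either API.

```python
import posixpath
from pathlib import PurePosixPath

project_file = PurePosixPath("/work/app/project/app.proj")   # hypothetical layout
shared = PurePosixPath("/work/app/shared/util.h")

try:
    shared.relative_to(project_file.parent)
    strict_ok = True
except ValueError:
    strict_ok = False   # not a subpath, so the strict form refuses

# The lexical form adds ".." as needed and never consults the filesystem,
# so it is only correct when no symlinks are involved.
loose = posixpath.relpath(str(shared), str(project_file.parent))
```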


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ronaldoussoren at  Mon Oct  8 12:00:22 2012
From: ronaldoussoren at (Ronald Oussoren)
Date: Mon, 08 Oct 2012 12:00:22 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On 7 Oct, 2012, at 23:43, Arnaud Delobelle <arnodel at> wrote:

> On 7 October 2012 18:37, Antoine Pitrou <solipsis at> wrote:
>> Pure comparison already obeys case-sensitivity rules as well as the
>> different path separators:
>>>>> PureNTPath('a/b') == PureNTPath('A\\B')
>> True
>>>>> PurePosixPath('a/b') == PurePosixPath('a\\b')
>> False
> Naive question: how do you deal with HFS+, which is case-preserving
> but on most machines case-insensitive?

Or CIFS filesystems mounted on Linux?  Case sensitivity is a file-system property, not an operating-system one.


From phd at  Mon Oct  8 13:07:48 2012
From: phd at (Oleg Broytman)
Date: Mon, 8 Oct 2012 15:07:48 +0400
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 08, 2012 at 12:00:22PM +0200, Ronald Oussoren <ronaldoussoren at> wrote:
> Or CIFS filesystems mounted on Linux?  Case sensitivity is a file-system property, not an operating-system one.

   But there is no API to ask what type of filesystem a path belongs to.
So guessing by OS name is the only heuristic we can do.
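
In the absence of such an API, one workaround is to probe a directory empirically. The sketch below is an assumption-laden illustration: the helper name is_case_sensitive is hypothetical, it needs write access to the directory, and the answer only holds for that one directory, not for every component of a longer path.

```python
import os
import tempfile


def is_case_sensitive(directory):
    """Probe one directory by creating a temporary file in it."""
    with tempfile.NamedTemporaryFile(prefix="CaseProbe", dir=directory) as handle:
        name = os.path.basename(handle.name)
        swapped = os.path.join(directory, name.swapcase())
        # If the swapped-case name also resolves, lookups ignore case here.
        return not os.path.exists(swapped)


sensitive = is_case_sensitive(tempfile.gettempdir())
```

Since this touches the disk, it could only ever apply to concrete paths; pure paths would still have to guess.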

     Oleg Broytman              phd at
           Programmers don't die, they just GOSUB without RETURN.

From flub at  Mon Oct  8 13:10:05 2012
From: flub at (Floris Bruynooghe)
Date: Mon, 8 Oct 2012 12:10:05 +0100
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On 8 October 2012 03:49, Guido van Rossum <guido at> wrote:
> My current goal is to see if it would be possible to come up with an
> abstraction that makes it possible to write protocol handlers that are
> independent from the rest of the infrastructure (e.g. transport,
> reactor).

This would be my ideal situation too and I think this is what PEP 3153
was trying to achieve.  While I am a greenlet (eventlet) user I agree
with the sentiment that it is not ideal to include it into the stdlib
itself and instead work to a solution where we can share protocol
implementations while having the freedom to run on a twisted reactor,
tornado, something greenlet based or something in the stdlib depending
on the preference of the developer.

FWIW I have implemented the AgentX protocol based on PEP-3153 and it
isn't complete yet (I had to go outside of what it defines).  It is
also rather heavy handed and I'm not sure how one could migrate the
stdlib to something like this.  So hopefully there are better
solutions possible.


From p.f.moore at  Mon Oct  8 13:11:52 2012
From: p.f.moore at (Paul Moore)
Date: Mon, 8 Oct 2012 12:11:52 +0100
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On 8 October 2012 11:31, Nick Coghlan <ncoghlan at> wrote:
> It's important to remember that you can't readily search for syntactic
> characters or common method names to find out what they mean, and
> these days that kind of thing should be taken into account when
> designing an API. "p.subpath('foo', 'bar')" looks like executable
> pseudocode for creating a new path based on existing one to me, unlike
> "p / 'foo' / 'bar'", "p['foo', 'bar']", or "p.join('foo', 'bar')".

Until precisely this point in your email, I'd been completely
confused, because I thought that p.subpath(xxx) was some sort of "is
xxx a subpath of p" query. It never occurred to me that it was the
os.path.join equivalent operation. In fact, I'm not sure where you got
it from, as I couldn't find it in either the PEP or in pathlib's

I'm not unhappy with using a method for creating a new path based on
an existing one (none of the operator forms seems particularly
compelling to me) but I really don't like subpath as a name.

I don't dislike p.join(parts) as it links back nicely to os.path.join.
I can't honestly see anyone getting confused in practice. But I'm not
so convinced that I would want to insist on it.

+1 on a method
-1 on subpath as its name
+0 on join as its name
I'm happy for someone to come up with a better name

-0 on a convenience operator form. Mainly because "only one way to do
it" and the general controversy over which is the best operator to
use, suggests that leaving the operator form out altogether at least
in the initial implementation is the better option.


From christian at  Mon Oct  8 14:39:14 2012
From: christian at (Christian Heimes)
Date: Mon, 08 Oct 2012 14:39:14 +0200
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

Hi Ben,

Am 08.10.2012 03:41, schrieb Ben Darnell:
> This thread so far has been more about the latter, but the need for
> standardization is more acute for the core event loop.  I've written a
> bridge between Tornado and Twisted so libraries written for both event
> loops can coexist, but obviously that wouldn't scale if there were a
> proliferation of event loop implementations out there.  I'd be in
> favor of a simple event loop interface in the standard library, with
> reference implementation(s) (select, epoll, kqueue, iocp) and some
> means of configuring the global (or thread-local) singleton.

Python's standard library doesn't contain an interface to I/O Completion
Ports. I think a common event loop system is a good reason to add IOCP
if somebody is up for the challenge.

Would you prefer an IOCP wrapper in the stdlib or your own version?
Twisted has its own Cython based wrapper, some other libraries use a
libevent-based solution.


From stephen at  Mon Oct  8 14:46:13 2012
From: stephen at (Stephen J. Turnbull)
Date: Mon, 08 Oct 2012 21:46:13 +0900
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

Paul Moore writes:
 > On 8 October 2012 11:31, Nick Coghlan <ncoghlan at> wrote:

 > > designing an API. "p.subpath('foo', 'bar')" looks like executable
 > > pseudocode for creating a new path based on existing one to me, unlike
 > > "p / 'foo' / 'bar'", "p['foo', 'bar']", or "p.join('foo', 'bar')".
 > Until precisely this point in your email, I'd been completely
 > confused, because I thought that p.subpath(xxx) was some sort of "is
 > xxx a subpath of p" query.

I agree with Paul on this.

If .join() doesn't work for you, how about .append() for adding new
path components at the end, vs. .suffix() for adding an extension to
the last component?

(I don't claim Paul would agree with this next, but as long as I'm
here....)  I really think that the main API for paths should be the
API for sequences specialized to "sequence of path components", with a
subsidiary set of operations for common textual manipulations applied
to individual components.
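
For comparison, pathlib as it later shipped exposes the components as a tuple via .parts and keeps textual operations on the last component as separate methods. This is a sketch of that shipped API, not of the sequence-style API proposed above.

```python
from pathlib import PurePosixPath

p = PurePosixPath("/usr/lib/python/site.py")

components = p.parts              # the path as a sequence of components
last = p.parts[-1]                # ordinary sequence indexing
renamed = p.with_suffix(".pyc")   # textual change to the last component only
```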

From him at  Mon Oct  8 15:34:52 2012
From: him at (=?ISO-8859-1?Q?Joachim_K=F6nig?=)
Date: Mon, 08 Oct 2012 15:34:52 +0200
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On 08/10/2012 03:41 Ben Darnell wrote:
> As for the higher-level question of what asynchronous code should look
> like, there's a lot more room for spirited debate, and I don't think
> there's enough consensus to declare a One True Way.  Personally, I'm
> -1 on greenlets as a general solution (what if you have to call
> MySQLdb or getaddrinfo?)

The caller of such a potentially blocking function could:

* spawn a new thread for the call
* call the function inside the thread and collect return value or exception
* register the thread (id) to inform the event loop (scheduler) it's
  waiting for its completion
* yield (aka "switch" in greenlet) to the event loop / scheduler
* upon continuation either continue with the result or reraise the
  exception that happened in the thread

Unfortunately on Unix systems select/poll/kqueue cannot specify threads as
event resources, so an additional pipe descriptor would be needed for the
scheduler to detect thread completions without blocking (threads would
write to the pipe upon completion); not elegant, but doable.
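
The steps above can be sketched roughly as follows. This is a Unix-only illustration (select() on pipes does not work on Windows), the helper name run_in_thread is hypothetical, and a real scheduler would add the read end of the pipe to its main select()/poll() call and not wait on it directly.

```python
import os
import select
import threading


def run_in_thread(func, *args):
    """Start func in a thread; return (read_fd, outcome dict)."""
    read_fd, write_fd = os.pipe()
    outcome = {}

    def worker():
        try:
            outcome["result"] = func(*args)
        except Exception as exc:
            outcome["error"] = exc      # to be re-raised by the caller
        finally:
            os.write(write_fd, b"x")    # wake up whoever watches the pipe
            os.close(write_fd)

    threading.Thread(target=worker, daemon=True).start()
    return read_fd, outcome


read_fd, outcome = run_in_thread(sum, [1, 2, 3])
# Stand-in for the event loop: wait until the pipe becomes readable.
ready, _, _ = select.select([read_fd], [], [], 5.0)
if ready:
    os.read(read_fd, 1)
os.close(read_fd)
value = outcome.get("result")
```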


From phd at  Mon Oct  8 16:28:12 2012
From: phd at (Oleg Broytman)
Date: Mon, 8 Oct 2012 18:28:12 +0400
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 08, 2012 at 03:59:18PM +0200, Ronald Oussoren <ronaldoussoren at> wrote:
> On 8 Oct, 2012, at 13:07, Oleg Broytman <phd at> wrote:
> > On Mon, Oct 08, 2012 at 12:00:22PM +0200, Ronald Oussoren <ronaldoussoren at> wrote:
> >> Or CIFS filesystems mounted on Linux?  Case sensitivity is a file-system property, not an operating-system one.
> > 
> >   But there is no API to ask what type of filesystem a path belongs to.
> > So guessing by OS name is the only heuristic we can do.
> I guess so, as neither statfs, statvfs, nor pathconf seems to be able to tell if a filesystem is case-insensitive.
> The alternative would be to have a list of case-insensitive filesystems and use that when comparing impure path objects. That would be fairly expensive though, as you'd have to check for every element of the path whether that element is on a case-insensitive filesystem.

   If a filesystem mounted on w32 is exported from a server via the CIFS/SMB
protocol -- is it case sensitive? What if said server is Linux? What if
said filesystem was actually imported to Linux from a Novell server via
NetWare Core Protocol? It's not a fictional situation -- I do it at; the server is a Linux box that mounts two filesystems (CIFS and NCP)
and reexports them via Samba.

     Oleg Broytman              phd at
           Programmers don't die, they just GOSUB without RETURN.

From ronaldoussoren at  Mon Oct  8 15:59:18 2012
From: ronaldoussoren at (Ronald Oussoren)
Date: Mon, 08 Oct 2012 15:59:18 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On 8 Oct, 2012, at 13:07, Oleg Broytman <phd at> wrote:

> On Mon, Oct 08, 2012 at 12:00:22PM +0200, Ronald Oussoren <ronaldoussoren at> wrote:
>> Or CIFS filesystems mounted on Linux?  Case sensitivity is a file-system property, not an operating-system one.
>   But there is no API to ask what type of filesystem a path belongs to.
> So guessing by OS name is the only heuristic we can do.

I guess so, as neither statfs, statvfs, nor pathconf seems to be able to tell whether a filesystem is case insensitive.

The alternative would be to have a list of case-insensitive filesystems and use that when comparing impure path objects. That would be fairly expensive though, as you'd have to check for every element of the path whether that element is on a case-insensitive filesystem.


From rosuav at  Mon Oct  8 17:03:59 2012
From: rosuav at (Chris Angelico)
Date: Tue, 9 Oct 2012 02:03:59 +1100
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 9, 2012 at 1:28 AM, Oleg Broytman <phd at> wrote:
>    If a filesystem mounted on w32 is exported from a server by the CIFS/SMB
> protocol -- is it case sensitive? What if said server is Linux? What if
> said filesystem was actually imported to Linux from a Novell server by the
> NetWare Core Protocol? It's not a fictional situation -- I do it at
>; the server is a Linux machine that mounts two filesystems, CIFS and NCP,
> and re-exports them via Samba.

And I thought I was weird in using sshfs and Samba together to
"bounce" drive access without having to set up SMB passwords for lots
of systems...

Would it be safer to simply assume that everything's case sensitive
until you actually do a filesystem call (a stat or something)? That
is, every Pure function works as though the FS is case sensitive?


From jsbueno at  Mon Oct  8 17:13:55 2012
From: jsbueno at (Joao S. O. Bueno)
Date: Mon, 8 Oct 2012 12:13:55 -0300
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On 8 October 2012 11:28, Oleg Broytman <phd at> wrote:
> On Mon, Oct 08, 2012 at 03:59:18PM +0200, Ronald Oussoren <ronaldoussoren at> wrote:
>> On 8 Oct, 2012, at 13:07, Oleg Broytman <phd at> wrote:
>> > On Mon, Oct 08, 2012 at 12:00:22PM +0200, Ronald Oussoren <ronaldoussoren at> wrote:
>> >> Or CIFS filesystems mounted on Linux?  Case sensitivity is a file-system property, not an operating-system one.
>> >
>> >   But there is no API to ask what type of filesystem a path belongs to.
>> > So guessing by OS name is the only heuristic we can do.
>> I guess so, as neither statfs, statvfs, nor pathconf seems to be able to tell whether a filesystem is case insensitive.
>> The alternative would be to have a list of case-insensitive filesystems and use that when comparing impure path objects. That would be fairly expensive though, as you'd have to check for every element of the path whether that element is on a case-insensitive filesystem.
>    If a filesystem mounted on w32 is exported from a server by the CIFS/SMB
> protocol -- is it case sensitive? What if said server is Linux? What if
> said filesystem was actually imported to Linux from a Novell server by the
> NetWare Core Protocol? It's not a fictional situation -- I do it at
>; the server is a Linux machine that mounts two filesystems, CIFS and NCP,
> and re-exports them via Samba.

Actually, after thinking through just a few corner cases (and, in this
thread, seeing some real-world scenarios), it is easy to conclude that it is
impossible to establish for certain whether a filesystem, or worse, a given
directory, is case sensitive.

So, regardless of general passive assumptions, I think Python should include a
way to actively verify filesystem case sensitivity. Something along the lines of
"assert_case_sensitiveness(<path>)" that would look for a filename
in the given path and try to retrieve it with some of its capitalization inverted.
If no suitable filename were found in the given directory, it could
raise an error - or attempt an active test by writing there (this behavior
should be controlled by keyword parameters).

So, whenever one needs to know about case sensitivity, there would
be one obvious way in place to know for sure, even at the cost of
some extra system resources.
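
A minimal sketch of the "active test by writing" variant described above. The name is_case_sensitive() is hypothetical (nothing like it exists in the stdlib), and the sketch assumes the caller has write permission in the directory being probed:

```python
import os
import tempfile


def is_case_sensitive(directory):
    """Probe `directory` by creating a temporary file and looking it up
    again under a name with inverted capitalization.

    Hypothetical helper: it needs write access to `directory`, and it only
    answers for that one directory, not for the whole path leading to it.
    """
    # The prefix contains letters, so swapcase() always yields a different name.
    with tempfile.NamedTemporaryFile(prefix="CaseProbe_", dir=directory) as probe:
        head, name = os.path.split(probe.name)
        swapped = os.path.join(head, name.swapcase())
        # If the swapped name resolves to an existing file, lookups in this
        # directory ignore case.
        return not os.path.exists(swapped)


print(is_case_sensitive(tempfile.gettempdir()))
```

The result depends on the filesystem behind the directory, which is exactly the point of probing instead of guessing by OS name.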


> Oleg.
> --
>      Oleg Broytman              phd at
>            Programmers don't die, they just GOSUB without RETURN.
Hmmm... maybe that applies only to programmers who have not kept up with
the times? I'd rather raise StopVitalFunctions when my time comes.

From steve at  Mon Oct  8 15:03:55 2012
From: steve at (Steven D'Aprano)
Date: Tue, 09 Oct 2012 00:03:55 +1100
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On 07/10/12 08:45, Andrew McNabb wrote:

> To clarify my point: in Python, "/" is not just a symbol--it
> specifically means "div".

I think that's wrong. / is a symbol that means whatever the class
gives it. It isn't like __init__ or __call__ that have defined
language semantics, and there is no rule that says that / means
division. I'll grant you that it's a strong convention, but it is
just a convention.

> Overriding the div operator requires creating a "__div__" special
> method,

Actually it is __truediv__ in Python 3. __div__ no longer has any
meaning or special status.

But it's just a name. __add__ doesn't necessarily perform addition,
__sub__ doesn't necessarily perform subtraction, and __or__ doesn't
necessarily have anything to do with either bitwise or boolean OR.
Why should we insist that __*div__ (true, floor or just plain div)
must only be used for numeric division when we don't privilege other
numeric operators like that?
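
To make the point concrete, here is a small sketch of a class that gives / a non-numeric meaning through __truediv__. PathSketch is a made-up illustration, not the API proposed in PEP 428:

```python
class PathSketch:
    """Hypothetical stand-in for a path object (not the PEP 428 API)."""

    def __init__(self, *parts):
        self.parts = tuple(parts)

    def __truediv__(self, other):
        # In Python 3, "/" dispatches to __truediv__; the language does not
        # require the method to perform numeric division.
        if isinstance(other, PathSketch):
            return PathSketch(*(self.parts + other.parts))
        if isinstance(other, str):
            return PathSketch(*self.parts, other)
        return NotImplemented

    def __str__(self):
        return "/".join(self.parts)


config = PathSketch("etc") / "ssh" / "sshd_config"
print(config)
```

Returning NotImplemented for unsupported operands lets Python raise the usual TypeError, the same as for any other operator.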


From guido at  Mon Oct  8 17:30:12 2012
From: guido at (Guido van Rossum)
Date: Mon, 8 Oct 2012 08:30:12 -0700
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 7, 2012 at 9:44 PM, Ben Darnell <ben at> wrote:
> On Sun, Oct 7, 2012 at 7:01 PM, Guido van Rossum <guido at> wrote:
>> As long as it's not so low-level that other people shy away from it.
> That depends on the target audience.  The low-level IOLoop and Reactor
> are pretty similar -- you can implement one in terms of the other --
> but as you move up the stack cross-compatibility becomes harder.  For
> example, if I wanted to implement tornado's IOStreams in twisted, I
> wouldn't start with the analogous class in twisted (Protocol?), I'd go
> down to the Reactor and build from there, so putting something
> like IOStream or Protocol in asyncore2 wouldn't do much to unify the two
> worlds.  (it would help people build async stuff with the stdlib
> alone, but at that point it becomes more like a peer or competitor to
> tornado and twisted instead of a bridge between them)

Sure. And of course we can't expect Twisted and Tornado to just merge
projects. They each have different strengths and weaknesses and they
each have strong opinions on how things should be done. I do get your
point that none of that is incompatible with a shared reactor

>> I also have a feeling that one way or another this will require
>> cooperation between the Twisted and Tornado developers in order to
>> come up with a compromise that both are willing to conform to in a
>> meaningful way. (Unfortunately I don't know how to define "meaningful
>> way" more precisely here. I guess the idea is that almost all things
>> *using* an event loop use the standardized abstract API without caring
>> whether underneath it's Tornado, Twisted, or some simpler thing in the
>> stdlib.)
> I'd phrase the goal as being able to run both Tornado and Twisted in
> the same thread without any piece of code needing to know about both
> systems.  I think that's achievable as far as core functionality goes.
>  I expect both sides have some lesser-used functionality that might
> not make it into the stdlib version, but as long as it's possible to
> plug in a "real" IOLoop or Reactor when needed it should be OK.

Sounds good. I think a reactor is always going to be an extension of
the shared spec.

>> That's interesting. I haven't found the need for this yet. Is it
>> really so common that you can't write this as a Future() constructor
>> plus a call to add_done_callback()? Or is there some subtle semantic
>> difference?
> It's a Future constructor, a (conditional) add_done_callback, plus the
> calls to set_result or set_exception and the with statement for error
> handling.  In full:
> def future_wrap(f):
>     @functools.wraps(f)
>     def wrapper(*args, **kwargs):
>         future = Future()
>         if kwargs.get('callback') is not None:
>             future.add_done_callback(kwargs.pop('callback'))
>         kwargs['callback'] = future.set_result
>         def handle_error(typ, value, tb):
>             future.set_exception(value)
>             return True
>         with ExceptionStackContext(handle_error):
>             f(*args, **kwargs)
>         return future
>     return wrapper

Hmm... I *think* it automatically adds a special keyword 'callback' to
the *call* site so that you can do things like

  fut = some_wrapped_func(blah, callback=my_callback)

and then, instead of using yield to wait for the callback, put the
continuation of your code in the my_callback() function. But it also
seems to pass callback=future.set_result as the callback to the
wrapped function, which suggests to me that the function was
written before Futures were widely used. This seems pretty impure to
me and I'd like to propose a "future" where such functions would either be
given the Future where the result is expected, or (more commonly)
create the Future themselves.

Unless I'm totally missing the programming model here.
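
For experimenting with this pattern outside Tornado, here is a stdlib-only approximation of the quoted decorator. It is a sketch under assumptions: concurrent.futures.Future stands in for Tornado's Future, and a plain try/except replaces ExceptionStackContext, so only exceptions raised synchronously inside the wrapped call are captured. The add_async() function is a made-up example.

```python
import functools
from concurrent.futures import Future


def future_wrap(f):
    """Adapt a callback-style function so that it returns a Future."""
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        future = Future()
        user_callback = kwargs.pop('callback', None)
        if user_callback is not None:
            # Note: the callback receives the Future, not the raw result.
            future.add_done_callback(user_callback)
        # The wrapped function only ever sees a plain callback.
        kwargs['callback'] = future.set_result
        try:
            f(*args, **kwargs)
        except Exception as exc:  # stand-in for ExceptionStackContext
            future.set_exception(exc)
        return future
    return wrapper


@future_wrap
def add_async(a, b, callback):
    # Hypothetical callback-style function used only for illustration.
    callback(a + b)


print(add_async(1, 2).result())
```

The wrapped function stays unaware of Futures, which is the design that looks impure to me; the alternative is for add_async() to create and return the Future itself.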

PS. I'd like to learn more about ExceptionStackContext() -- I've
struggled somewhat with getting decent tracebacks in NDB.

>>> In Tornado the Future is created by a decorator
>>> and hidden from the asynchronous function (it just sees the callback),
>> Hm, interesting. NDB goes the other way, the callbacks are mostly used
>> to make Futures work, and most code (including large swaths of
>> internal code) uses Futures. I think NDB is similar to monocle here.
>> In NDB, you can do
>>   f = <some function returning a Future>
>>   r = yield f
>> where "yield f" is mostly equivalent to f.result(), except it gives
>> better opportunity for concurrency.
> Yes, tornado's gen.engine does the same thing here.  However, the
> stakes are higher than "better opportunity for concurrency" - in an
> event loop if you call future.result() without yielding, you'll
> deadlock if that Future's task needs to run on the same event loop.

That would depend on the semantics of the event loop implementation.
In NDB's event loop, such a .result() call would just recursively
enter the event loop, and you'd only deadlock if you actually have two
pieces of code waiting for each other's completion.

>> I am currently trying to understand if using "yield from" (and
>> returning a value from a generator) will simplify things. For example
>> maybe the need for a special decorator might go away. But I keep
>> getting headaches -- perhaps there's a Monad involved. :-)
> I think if you build generator handling directly into the event loop
> and use "yield from" for calls from one async function to another then
> you can get by without any decorators.  But I'm not sure if you can do
> that and maintain any compatibility with existing non-generator async
> code.
> I think the ability to return from a generator is actually a bigger
> deal than "yield from" (and I only learned about it from another
> python-ideas thread today).  The only reason a generator decorated
> with @tornado.gen.engine needs a callback passed in to it is to act as
> a pseudo-return, and a real return would prevent the common mistake of
> running the callback then falling through to the rest of the function.

Ah, so you didn't come up with the clever hack of raising an exception
to signify the return value. In NDB, you raise StopIteration (though
it is given the alias 'Return' for clarity) with an argument, and the
wrapper code that is responsible for the Future takes the value from
the StopIteration exception and passes it to the Future's
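
A rough sketch of that wrapper idea, assuming Python 3.3 or later, where a plain "return value" inside a generator attaches the value to StopIteration. The tasklet() decorator and the done() helper are hypothetical names, not NDB's actual API, and the driver simply blocks on each yielded Future instead of scheduling it on an event loop. (Explicitly raising StopIteration inside a generator body is turned into RuntimeError from Python 3.7 on, so the sketch uses return.)

```python
import functools
from concurrent.futures import Future


def tasklet(func):
    """Run a generator to completion and deliver its return value via a Future."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        outer = Future()
        gen = func(*args, **kwargs)
        try:
            pending = next(gen)
            while True:
                # A real event loop would suspend here instead of blocking.
                pending = gen.send(pending.result())
        except StopIteration as stop:
            # "return value" in the generator arrives as stop.value.
            outer.set_result(stop.value)
        except Exception as exc:
            outer.set_exception(exc)
        return outer
    return wrapper


def done(value):
    """Hypothetical helper: a Future that is already resolved."""
    future = Future()
    future.set_result(value)
    return future


@tasklet
def double_sum(a, b):
    x = yield done(a)
    y = yield done(b)
    return 2 * (x + y)


print(double_sum(1, 2).result())
```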

> For concreteness, here's a crude sketch of what the APIs I'm talking
> about would look like in use (in a hypothetical future version of
> tornado).
> @future_wrap
> @gen.engine
> def async_http_client(url, callback):
>     parsed_url = urlparse.urlsplit(url)
>     # works the same whether the future comes from a thread pool or @future_wrap

And you need the thread pool because there's no async version of
getaddrinfo(), right?

>     addrinfo = yield g_thread_pool.submit(socket.getaddrinfo, parsed_url.hostname, parsed_url.port)
>     stream = IOStream(socket.socket())
>     yield stream.connect((addrinfo[0][-1]))
>     stream.write('GET %s HTTP/1.0' % parsed_url.path)

Why no yield in front of the write() call?

>     header_data = yield stream.read_until('\r\n\r\n')
>     headers = parse_headers(header_data)
>     body_data = yield stream.read_bytes(int(headers['Content-Length']))
>     stream.close()
>     callback(body_data)
> # another function to demonstrate composability
> @future_wrap
> @gen.engine
> def fetch_some_urls(url1, url2, url3, callback):
>     body1 = yield async_http_client(url1)
>     # yield a list of futures for concurrency
>     future2 = yield async_http_client(url2)
>     future3 = yield async_http_client(url3)
>     body2, body3 = yield [future2, future3]
>     callback((body1, body2, body3))

This second one is nearly identical to the way it's done in NDB.
However I think you have a typo -- I doubt that there should be yields
on the lines creating future2 and future3.

> One hole in this design is how to deal with callbacks that are run
> multiple times.  For example, the IOStream read methods take both a
> regular callback and an optional streaming_callback (which is called
> with each chunk of data as it arrives).  I think this needs to be
> modeled as something like an iterator of Futures, but I haven't worked
> out the details yet.

Ah. Yes, that's a completely different kind of thing, and probably
needs to be handled in a totally different way. I think it probably
needs to be modeled more like an infinite loop where at the blocking
point (e.g. a low-level read() or accept() call) you yield a Future.
Although I can see that this doesn't work well with the IOLoop's
concept of file descriptor (or other event source) registration.

--Guido van Rossum (

From guido at  Mon Oct  8 17:34:29 2012
From: guido at (Guido van Rossum)
Date: Mon, 8 Oct 2012 08:34:29 -0700
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 8, 2012 at 4:10 AM, Floris Bruynooghe <flub at> wrote:
> On 8 October 2012 03:49, Guido van Rossum <guido at> wrote:
>> My current goal is to see if it would be possible to come up with an
>> abstraction that makes it possible to write protocol handlers that are
>> independent from the rest of the infrastructure (e.g. transport,
>> reactor).
> This would be my ideal situation too and I think this is what PEP 3153
> was trying to achieve.  While I am a greenlet (eventlet) user I agree
> with the sentiment that it is not ideal to include it into the stdlib
> itself and instead work to a solution where we can share protocol
> implementations while having the freedom to run on a twisted reactor,
> tornado, something greenlet based or something in the stdlib depending
> on the preference of the developer.
> FWIW I have implemented the AgentX protocol based on PEP-3153 and it
> isn't complete yet (I had to go outside of what it defines).  It is
> also rather heavy handed and I'm not sure how one could migrate the
> stdlib to something like this.  So hopefully there are better
> solutions possible.

The more I think about this the more I think it will be really hard to
accomplish. I think we ought to try and go for goals that are easier
to obtain (and still useful) first, such as a common reactor/ioloop
specification and a "best practice" implementation (which may choose a
different polling mechanism depending on the platform OS) in the
stdlib. 3rd party code could then hook into this mechanism and offer
alternate reactors, e.g. integrated with a 3rd party GUI library such
as Wx, Gtk, Qt -- maybe we can offer Tk integration in the stdlib. 3rd
party reactors could also offer additional functionality, e.g.
advanced scheduling, threadpool integration, or whatever (my
imagination isn't very good here).

--Guido van Rossum (

From guido at  Mon Oct  8 17:35:08 2012
From: guido at (Guido van Rossum)
Date: Mon, 8 Oct 2012 08:35:08 -0700
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 8, 2012 at 5:39 AM, Christian Heimes <christian at> wrote:
> Python's standard library doesn't contain an interface to I/O Completion
> Ports. I think a common event loop system is a good reason to add IOCP
> if somebody is up for the challenge.
> Would you prefer an IOCP wrapper in the stdlib or your own version?
> Twisted has its own Cython based wrapper, some other libraries use a
> libevent-based solution.

What's an IOCP?

--Guido van Rossum (

From guido at  Mon Oct  8 17:37:28 2012
From: guido at (Guido van Rossum)
Date: Mon, 8 Oct 2012 08:37:28 -0700
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 8, 2012 at 6:34 AM, Joachim König <him at> wrote:
> On 08/10/2012 03:41 Ben Darnell wrote:
>> As for the higher-level question of what asynchronous code should look
>> like, there's a lot more room for spirited debate, and I don't think
>> there's enough consensus to declare a One True Way.  Personally, I'm
>> -1 on greenlets as a general solution (what if you have to call
>> MySQLdb or getaddrinfo?)
> The caller of such a potentially blocking function could:
> * spawn a new thread for the call
> * call the function inside the thread and collect return value or exception
> * register the thread (id) to inform the event loop (scheduler) it's waiting for its completion
> * yield (aka "switch" in greenlet) to the event loop / scheduler
> * upon continuation either continue with the result or reraise the exception that happened in the thread

Ben just posted an example of how to do exactly that for getaddrinfo().

> Unfortunately on Unix systems select/poll/kqueue cannot specify threads as
> event resources, so an additional pipe descriptor would be needed for the scheduler
> to detect thread completions without blocking (threads would write to the pipe upon
> completion), not elegant but doable.

However it must be done, this seems a useful thing to solve once and
for all in a standard reactor specification and stdlib implementation.
(Ditto for signal handlers BTW.)
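
The thread-plus-pipe approach described above can be sketched as follows. This is an illustration for Unix-like systems only (select() on pipes does not work on Windows), and call_in_thread() and wait_for() are hypothetical names rather than a proposed API:

```python
import os
import select
import threading


def call_in_thread(func, *args):
    """Run func(*args) in a worker thread.

    Returns the read end of a pipe that becomes readable on completion,
    plus a dict that will hold either 'result' or 'error'.
    """
    read_fd, write_fd = os.pipe()
    box = {}

    def worker():
        try:
            box['result'] = func(*args)
        except Exception as exc:
            box['error'] = exc
        finally:
            # One byte is enough to wake up select()/poll() in the loop.
            os.write(write_fd, b'x')
            os.close(write_fd)

    threading.Thread(target=worker, daemon=True).start()
    return read_fd, box


def wait_for(read_fd, box, timeout=5.0):
    """Stand-in for the event loop: block until the pipe is readable."""
    readable, _, _ = select.select([read_fd], [], [], timeout)
    if not readable:
        raise TimeoutError("worker thread did not finish in time")
    os.read(read_fd, 1)
    os.close(read_fd)
    if 'error' in box:
        raise box['error']
    return box['result']


fd, box = call_in_thread(pow, 2, 10)
print(wait_for(fd, box))
```

In a real reactor the read end would be registered next to the sockets, so a single select()/poll() call covers both I/O and thread completion.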

--Guido van Rossum (

From amcnabb at  Mon Oct  8 18:06:17 2012
From: amcnabb at (Andrew McNabb)
Date: Mon, 8 Oct 2012 10:06:17 -0600
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 09, 2012 at 12:03:55AM +1100, Steven D'Aprano wrote:
> / is a symbol that means whatever the class
> gives it. It isn't like __init__ or __call__ that have defined
> language semantics, and there is no rule that says that / means
> division. I'll grant you that it's a strong convention, but it is
> just a convention.

I'll grant you that the semantics of the __truediv__ method are defined
by convention.

> But it's just a name. __add__ doesn't necessarily perform addition,
> __sub__ doesn't necessarily perform subtraction, and __or__ doesn't
> necessarily have anything to do with either bitwise or boolean OR.
> Why should we insist that __*div__ (true, floor or just plain div)
> must only be used for numeric division when we don't privilege other
> numeric operators like that?

__add__ for strings doesn't mean numerical addition, but people find it
perfectly natural to speak of "adding two strings," for example.  Seeing
`string1.__add__(string2)` is readable, as is `operator.add(string1,
string2)`.  Every other example of operator overloading that I find
tasteful is analogous enough to the numerical operators to retain use
of the name.

Since this really is a matter of personal taste, I'll end my
participation in this discussion by voicing support for Nick Coghlan's
suggestion of a `join` method, whether it's named `join` or `append` or
something else.

Andrew McNabb
PGP Fingerprint: 8A17 B57C 6879 1863 DE55  8012 AB4D 6098 8826 6868

From guido at  Mon Oct  8 18:19:31 2012
From: guido at (Guido van Rossum)
Date: Mon, 8 Oct 2012 09:19:31 -0700
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 7, 2012 at 7:33 PM, Alexander Belopolsky
<alexander.belopolsky at> wrote:
> On Sun, Oct 7, 2012 at 9:51 PM, Guido van Rossum <guido at> wrote:
>> Referencing Java here is absurd and I still consider this suggestion
>> as a troll. Python is not in any way based on Java.
> I did not suggest that.  Sorry if it came out this way.  I am well
> aware that Python and Java were invented independently and have
> different roots.  (IIRC, Java was born from Oak and Python from ABC
> and Oak and ABC were both developed in the 1980s.)  IEEE 784 precedes
> both languages and one team decided that equality reflexivity for
> hashable objects was more important than IEEE 784 compliance while the
> other decided otherwise.
> Many Python features (mostly library) are motivated by C.  In the 90s,
> "because C does it this way" was a good explanation for a language
> feature.  Doing things differently from the "C way", on the other hand
> would deserve an explanation.  These days, C is rarely the first language
> that a student learns.  Hopefully Python will take this place in the not
> so distant future, but many students graduated in the late 90s - early
> 2000s knowing nothing but Java.   As a result, these days it is a
> valid question to ask about a language feature: "Why does Python do X
> differently from Java?"  Hopefully in most cases the answer is
> "because Python does it better."

Explaining the differences between Python and Java is a job for
educators, not for the language reference.

I agree that documenting APIs as "this behaves just like C" does not
have the same appeal -- but that turn of phrase was mostly used for
system calls anyway, and for those I think that a slightly modified
redirection (to the OS man pages) is still completely appropriate.

> In case of nan != nan, I would really like to know a modern reason why
> Python's way is better.  Better compliance with a 20-year old standard
> does not really qualify.

I am not aware of an update to the standard. Being 20 years old does
not make it outdated.

Again, there are plenty of reasons (you have to ask the numpy folks),
but I don't think it is the job of the Python reference manual to give
its motivations. It just needs to explain how things work, and if that
can be done best by deferring to an existing standard that's fine.

Of course a tutorial should probably mention this behavior, but a
tutorial does not have the task of giving you the reason for every
language feature either -- most readers of the tutorial don't have the
context yet to understand those reasons, many don't care, and whether
they like it or not, it's not going to change.

You keep getting very close to suggesting to make changes, despite
your insistence that you just want to know the reason. But assuming
you really just are asking in an obnoxious way for the reason, I
recommend that you ask the people who wrote the IEEE 754 standard. I'm
sure their explanation (which I recall having read once but can't
reproduce here) makes sense for Python too.

--Guido van Rossum (

From guido at  Mon Oct  8 18:25:16 2012
From: guido at (Guido van Rossum)
Date: Mon, 8 Oct 2012 09:25:16 -0700
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 7, 2012 at 7:35 PM, Ned Batchelder <ned at> wrote:
> I don't understand the reluctance to address a common conceptual speed-bump
> in the docs.  After all, the tutorial has an entire chapter
> ( that explains how
> floats work, even though they work exactly as IEEE 754 says they should.

I'm sorry. I didn't intend to refuse to document the behavior. I was
mostly reacting to things I thought I read between the lines -- the
suggestion that there is no reason for the NaN behavior except silly
compatibility with an old standard that nobody cares about. From this
it is only a small step to reading (again between the lines) the
suggestion to change the behavior.

> A sentence in section 5.4 (Numeric Types) would help.  Something like, "In
> accordance with the IEEE 754 standard, NaN's are not equal to any value,
> even another NaN.  This is because NaN doesn't represent a particular
> number, it represents an unknown result, and there is no way to know if one
> unknown result is equal to another unknown result."

That sounds like a great addition to the docs, except for the nit that
I don't like writing the plural of NaN as "NaN's" -- I prefer "NaNs"
myself. Also, the words here can still cause confusion. The exact
behavior is that every one of the 6 comparison operators (==, !=, <,
<=, >, >=) returns False when either argument (or both) is a NaN. I
think your suggested words could lead someone to believe that they
mean that x != NaN or NaN != NaN would return True.

Anyway, once we can agree to words I agree that we should update that section.
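
Before settling on wording, it is worth checking each operator in an interpreter. The sketch below assumes CPython's float type, which follows IEEE 754 here: ==, <, <=, > and >= all give False when a NaN is involved, while != gives True, so any doc sentence should treat != separately.

```python
import math

nan = float('nan')

# Evaluate every comparison operator with NaN on both sides.
comparisons = {
    '==': nan == nan,
    '!=': nan != nan,
    '<': nan < nan,
    '<=': nan <= nan,
    '>': nan > nan,
    '>=': nan >= nan,
}
for op, outcome in comparisons.items():
    print('nan %s nan -> %s' % (op, outcome))

# math.isnan() is the dependable test for "is this value a NaN?"
print(math.isnan(nan))
```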

--Guido van Rossum (

From guido at  Mon Oct  8 18:29:42 2012
From: guido at (Guido van Rossum)
Date: Mon, 8 Oct 2012 09:29:42 -0700
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Sun, Oct 7, 2012 at 8:09 PM, Rob Cliffe <rob.cliffe at> wrote:
> I understand that the undefined result of a computation is not the same as
> the undefined result of another computation.
> (E.g. one might represent positive infinity, another might represent
> underflow or loss of accuracy.)
> But I can't help feeling (strongly) that the result of a computation should
> be equal to itself.
> In other words, after
>     x = float('nan')
>     y = float('nan')
> I would expect
>     x != y
> but
>     x == x

That's too bad. It sounds like this mailing list really wouldn't have
enough space in its margins to convince you otherwise. And yet you are

> After all, how much sense does this make (I got this in a quick test with
> Python 2.7.3):
>>>> x=float('nan')
>>>> x is x
> True            # Well I guess you'd sorta expect this
>>>> x==x
> False           # You what?
>>>> D = {1:x, 2:x}
>>>> D[1]==D[2]
> False          # I see, both NANs - hmph!
>>>> [x]==[x]
> True            # Oh yeh, it doesn't always work that way then?
> Making equality non-reflexive feels utterly wrong to me, partly no doubt
> because of my mathematical background,

Do you have any background at all in *numerical* mathematics?

> partly because of the difficulty in
> implementing container objects and algorithms and God knows what else when
> you have to remember that some of the objects they may deal with may not be
> equal to themselves.  In particular the difference between my last two
> examples ( D[1]!=D[2] but [x]==[x] ) looks impossible to justify except by
> saying that for historical reasons the designers of lists and the designers
> of dictionaries made different - but entirely reasonable - assumptions about
> the equality relation, and (perhaps) whether identity implies equality (how
> do you explain to a Python learner that it doesn't (pathological code
> examples aside) ???).
> Couldn't each NAN when generated contain something that identified it
> uniquely, so that different NANs would always compare as not equal, but any
> given NAN would compare equal to itself?

It's not about equality. If you ask whether two NaNs are *unequal* the
answer is *also* False.

I admit that a tutorial section describing the behavior would be good.
But I am less than ever convinced that it's possible to explain the
*reason* for the behavior in a tutorial.
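
A tutorial section could at least show what happens. The sketch below, assuming CPython, reproduces the quoted session: list comparison and the "in" operator check identity before calling ==, whereas comparing the two dictionary values directly goes straight to float's own ==.

```python
x = float('nan')
y = float('nan')

identity = x is x            # the same object
equality = x == x            # float comparison follows IEEE 754

same_object_lists = [x] == [x]   # identity shortcut applies per element
other_object_lists = [x] == [y]  # different objects, so == decides

contains_same = x in [x]     # "in" also tries identity first
contains_other = y in [x]

d = {1: x, 2: x}
dict_values_equal = d[1] == d[2]  # plain float ==, no container involved

print(identity, equality)
print(same_object_lists, other_object_lists)
print(contains_same, contains_other)
print(dict_values_equal)
```

So the list and dict examples do not reflect different design assumptions: the dict example never compares containers, only the two floats.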

--Guido van Rossum (

From massimo.dipierro at  Mon Oct  8 18:38:31 2012
From: massimo.dipierro at (Massimo DiPierro)
Date: Mon, 8 Oct 2012 11:38:31 -0500
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

The + symbol means addition and union of disjoint sets. A path (including a fs path) is a set of links (for a fs path, a link is a folder name). Using the + symbol has a natural interpretation as concatenation of subpaths (sets) to form a longer path (superset).

The / symbol means the quotient of a group. It always returns a subgroup. When I see path1 / path2 I would expect it to return all paths that start by path2 or contain path2, not concatenation.

The fact that string paths in Unix use the / to represent concatenation is accidental. That's just how the path is serialized into a string. In fact Windows uses a different separator. I do not think a serialized representation of an object makes a good example for its abstract representation.


On Oct 8, 2012, at 11:06 AM, Andrew McNabb wrote:

> On Tue, Oct 09, 2012 at 12:03:55AM +1100, Steven D'Aprano wrote:
>> / is a symbol that means whatever the class
>> gives it. It isn't like __init__ or __call__ that have defined
>> language semantics, and there is no rule that says that / means
>> division. I'll grant you that it's a strong convention, but it is
>> just a convention.
> I'll grant you that the semantics of the __truediv__ method are defined
> by convention.
>> But it's just a name. __add__ doesn't necessarily perform addition,
>> __sub__ doesn't necessarily perform subtraction, and __or__ doesn't
>> necessarily have anything to do with either bitwise or boolean OR.
>> Why should we insist that __*div__ (true, floor or just plain div)
>> must only be used for numeric division when we don't privilege other
>> numeric operators like that?
> __add__ for strings doesn't mean numerical addition, but people find it
> perfectly natural to speak of "adding two strings," for example.  Seeing
> `string1.__add__(string2)` is readable, as is `operator.add(string1,
> string2)`.  Every other example of operator overloading that I find
> tasteful is analogous enough to the numerical operators to retain use
> of the name.
> Since this really is a matter of personal taste, I'll end my
> participation in this discussion by voicing support for Nick Coghlan's
> suggestion of a `join` method, whether it's named `join` or `append` or
> something else.
> --
> Andrew McNabb
> PGP Fingerprint: 8A17 B57C 6879 1863 DE55  8012 AB4D 6098 8826 6868

From barry at  Mon Oct  8 18:45:34 2012
From: barry at (Barry Warsaw)
Date: Mon, 8 Oct 2012 12:45:34 -0400
Subject: [Python-ideas] asyncore: included batteries don't fit
References: <>
Message-ID: <>

On Oct 06, 2012, at 03:00 PM, Guido van Rossum wrote:

>This is an incredibly important discussion.

Indeed.  If Python gets it right, it could be yet another killer reason for
upgrading to Python 3, at least for the growing subset of event-driven

>(1) How important is it to offer a compatibility path for asyncore?

I've written and continue to use async-based code.  I don't personally care
much about compatibility.  I've used async because it was the simplest and most
stdlibby of the options for the Python versions I can use, but I have no love
for it.  If there were a better, more readable and comprehensible way to do
it, I'd ditch the async-based versions as soon as possible.

>I would have thought that offering an integration path forward for Twisted
>and Tornado would be more important.

Agreed.  I share the same dream as someone else in this thread mentioned.  It
would be really fantastic if the experts in a particular protocol could write
support for that protocol Just Once and have it as widely shared as possible.
Maybe this is an unrealistic dream, but now's the time to have them anyway.

Even something like the email package could benefit from this.  The FeedParser
is our attempt to support asynchronous reading of email data for parsing.  I'm
not so sure that the asynchronous part of that is very useful.
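
As an illustration of the incremental style mentioned above, here is a
minimal sketch that feeds a message to email.feedparser.FeedParser in
arbitrary chunks, the way data might arrive from a socket.  The helper
name parse_in_chunks and the sample chunks are made up for this example.

```python
from email.feedparser import FeedParser


def parse_in_chunks(chunks):
    """Feed text chunks to a FeedParser and return the parsed message.

    Hypothetical helper: the chunks stand in for data arriving
    piecemeal from a non-blocking source.
    """
    parser = FeedParser()
    for chunk in chunks:
        # feed() accepts partial lines; the parser buffers them itself.
        parser.feed(chunk)
    # close() finishes parsing and returns the root message object.
    return parser.close()


# Chunk boundaries deliberately fall in the middle of lines.
chunks = ["Subject: he", "llo\nFrom: a@example.com\n", "\nbody ", "text\n"]
message = parse_in_chunks(chunks)
print(message["Subject"])
print(message.get_payload())
```

The caller never blocks inside the parser: it hands over whatever data
it has and asks for the result once the input is exhausted.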

From guido at  Mon Oct  8 18:47:48 2012
From: guido at (Guido van Rossum)
Date: Mon, 8 Oct 2012 09:47:48 -0700
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Sun, Oct 7, 2012 at 8:46 PM, Alexander Belopolsky
<alexander.belopolsky at> wrote:
> On Sun, Oct 7, 2012 at 11:09 PM, Rob Cliffe <rob.cliffe at> wrote:
>> Couldn't each NAN when generated contain something that identified it
>> uniquely, so that different NANs would always compare as not equal, but any
>> given NAN would compare equal to itself?
> If we take this route and try to distinguish NaNs with different
> payload, I am sure you will want to distinguish between -0.0 and 0.0
> as well.  The latter would violate transitivity in -0.0 == 0 == 0.0.
> The only sensible thing to do with NaNs is either to treat them all
> equal (the Eiffel way) or to stick to IEEE default.
> I don't think NaN behavior in Python is a result of a deliberate
> decision to implement IEEE 754.

Oh, it was. It was very deliberate. Like in many other areas of
Python, I refused to invent new rules when there was existing behavior
elsewhere that I could borrow and with which I had no reason to
quibble. (And in the case of floating point behavior, there really is
no alternate authority to choose from besides IEEE 754. Languages that
disagree with it do not make an authority.)

Even if I *did* have reasons to quibble with the NaN behavior (there
were no NaNs on the mainframe where I learned programming, so they
were as new and weird to me as they are to today's novices), Tim
Peters, who has implemented numerical libraries for Fortran compilers
in a past life and is an absolute authority on floating points,
convinced me to follow IEEE 754 as closely as I could.

> If that was the case, why 0.0/0.0 does not produce NaN?

Easy. It was an earlier behavior, from the days where IEEE 754
hardware did not yet rule the world, and Python didn't have much of an
opinion on float behavior at all -- it just did whatever the platform
did. Infinities and NaNs were not on my radar (I hadn't met Tim yet
:-). However division by zero (which is not just a float but also an
int behavior) was something that I just had to address, so I made the
runtime check for it and raise an exception. When we became more
formal about this, we considered changing this but decided that the
ZeroDivisionError was more user-friendly than silently propagating
NaNs everywhere, given the typical use of Python. (I suppose we could
make it optional, and IIRC that's what Decimal does -- but for floats
we don't have a well-developed numerical context concept yet.)
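
A small sketch of the contrast described above, assuming current
CPython behavior: float division by zero always raises, while the
decimal module lets a context turn the corresponding traps off.  The
two helper functions are hypothetical names used only for this
illustration.

```python
from decimal import Decimal, DivisionByZero, InvalidOperation, localcontext


def float_divide(a, b):
    """Divide two floats, reporting the exception by name instead of raising."""
    try:
        return a / b
    except ZeroDivisionError:
        return "ZeroDivisionError"


def decimal_divide_untrapped(a, b):
    """Divide as Decimals inside a context that does not trap on zero division."""
    with localcontext() as ctx:
        # With these traps disabled, the operation returns a special
        # value and only sets a flag on the context.
        ctx.traps[DivisionByZero] = False
        ctx.traps[InvalidOperation] = False
        return Decimal(a) / Decimal(b)


print(float_divide(1.0, 0.0))
print(float_divide(0.0, 0.0))
print(decimal_divide_untrapped(1, 0))
print(decimal_divide_untrapped(0, 0))
```

The local context is discarded when the with block exits, so the
relaxed behavior does not leak into the rest of the program.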

> Similarly, Python math library does not produce
> infinities where IEEE 754 compliant library should:
>>>> math.log(0.0)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: math domain error

Again, this mostly comes from backward compatibility with the math
module's origins (and it is as old as Python itself, again predating
its use of IEEE 754). AFAIK Tim went over the math library very
carefully and cleaned up what he could, so he probably thought about
this as well. Also, IIUC the IEEE library prescribes exceptions as
well as return values; e.g. "man 3 log" on my OSX computer says that
log(0) returns -inf as well as raising a divide-by-zero exception. So I
think this is probably compliant with the standard -- one can decide
to ignore the exceptions in certain contexts and honor them in others.
(Probably even the 1/0 behavior can be defended this way.)
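
To make the "honor or ignore the exception" idea concrete, here is a
rough sketch.  math.log is the real function; log_ieee_style is a
hypothetical wrapper written for this example that returns the value
instead of raising for a zero argument.  It is not part of the math
module.

```python
import math


def log_ieee_style(x):
    """Return -inf for log(0) instead of raising (hypothetical helper).

    Other domain errors, such as negative arguments, still propagate
    from math.log unchanged.
    """
    if x == 0.0:
        return -math.inf
    return math.log(x)


# The stock function honors the exception.
try:
    math.log(0.0)
except ValueError as exc:
    print("math.log(0.0) raised:", exc)

# The wrapper ignores it and reports the limiting value.
print(log_ieee_style(0.0))
print(log_ieee_style(1.0))
```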

> Some other operations behave inconsistently:
>>>> 2 * 10.**308
> inf
> but
>>>> 10.**309
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> OverflowError: (34, 'Result too large')

Probably the same. IEEE 754 may be more complex than you think!

> I think non-reflexivity of nan in Python is an accidental feature.

It is not.

> Python's float type was not designed with NaN in mind and until
> recently, it was relatively difficult to create a nan in pure python.

And when we did add NaN and Inf we thought about the issues carefully.

> It is also not true that IEEE 754 requires that nan == nan is false.
> IEEE 754 does not define operator '==' (nor does it define boolean
> false).  Instead, IEEE defines a comparison operation that can have
> one of four results: >, <, =, or unordered.  The standard does require
> that NaN compares unordered with anything including itself, but it
> does not follow that a language that defines an == operator with
> boolean results must define it so that nan == nan is false.

Are you proposing changes again? Because it sure sounds like you are
unhappy with the status quo and will not take an explanation, however
authoritative it is.

Given a language with the 6 comparisons like Python (and most do),
they have to be mapped to the IEEE comparison *somehow*, and I believe
we chose one of the most logical translations imaginable (given that
nobody likes == and != raising exceptions).
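
For reference, a short sketch that applies each of the six comparison
operators to a NaN and itself.  It only uses the operator module and a
float NaN; the dictionary names are chosen for this illustration.

```python
import operator

nan = float("nan")

# The six rich comparisons, keyed by their source-level spelling.
comparisons = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}

# Compare the NaN object with itself under every operator.
results = {symbol: compare(nan, nan) for symbol, compare in comparisons.items()}

for symbol, outcome in results.items():
    print("nan", symbol, "nan ->", outcome)
```

In this translation an "unordered" IEEE result makes every operator
answer False except !=, which answers True, and none of them raise.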

--Guido van Rossum (

From mikegraham at  Mon Oct  8 19:04:00 2012
From: mikegraham at (Mike Graham)
Date: Mon, 8 Oct 2012 13:04:00 -0400
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 8, 2012 at 11:35 AM, Guido van Rossum <guido at> wrote:
> On Mon, Oct 8, 2012 at 5:39 AM, Christian Heimes <christian at> wrote:
>> Python's standard library doesn't contain an interface to I/O Completion
>> Ports. I think a common event loop system is a good reason to add IOCP
>> if somebody is up for the challenge.
>> Would you prefer an IOCP wrapper in the stdlib or your own version?
>> Twisted has its own Cython based wrapper, some other libraries use a
>> libevent-based solution.
> What's an IOCP?

It's the non-crappy select equivalent on Windows.


From p.f.moore at  Mon Oct  8 19:07:25 2012
From: p.f.moore at (Paul Moore)
Date: Mon, 8 Oct 2012 18:07:25 +0100
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On 8 October 2012 18:04, Mike Graham <mikegraham at> wrote:
>> What's an IOCP?
> It's the non-crappy select equivalent on Windows.

I/O Completion port, just for clarity :-)

From ncoghlan at  Mon Oct  8 19:41:42 2012
From: ncoghlan at (Nick Coghlan)
Date: Mon, 8 Oct 2012 23:11:42 +0530
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 8, 2012 at 4:41 PM, Paul Moore <p.f.moore at> wrote:
> On 8 October 2012 11:31, Nick Coghlan <ncoghlan at> wrote:
>> It's important to remember that you can't readily search for syntactic
>> characters or common method names to find out what they mean, and
>> these days that kind of thing should be taken into account when
>> designing an API. "p.subpath('foo', 'bar')" looks like executable
>> pseudocode for creating a new path based on existing one to me, unlike
>> "p / 'foo' / 'bar'", "p['foo', 'bar']", or "p.join('foo', 'bar')".
> Until precisely this point in your email, I'd been completely
> confused, because I thought that p.subpath(xxx) was some sort of "is
> xxx a subpath of p" query.

That's OK, I don't set the bar for my mnemonics *that* high: I use
Guido's rule that good names are easy to remember once you know what
they mean. Being able to guess precisely just from the name is a nice
bonus, but not strictly necessary.

> It never occurred to me that it was the
> os.path.join equivalent operation. In fact, I'm not sure where you got
> it from, as I couldn't find it in either the PEP or in pathlib's
> documentation.

I made it up by using "make subpath" as the reverse of "get relative
path".  The "is subpath" query could be handled by calling

I'd be fine with "joinpath" as well (that is what uses to
avoid the conflict with str.join)

> I'm not unhappy with using a method for creating a new path based on
> an existing one (none of the operator forms seems particularly
> compelling to me) but I really don't like subpath as a name.
> I don't dislike p.join(parts) as it links back nicely to os.path.join.
> I can't honestly see anyone getting confused in practice. But I'm not
> so convinced that I would want to insist on it.

I really don't like it because of the semantic conflict with str.join.
That semantic conflict is the reason I only do "from os.path import
join as joinpath" or else call it as "os.path.join" - I find that
using the bare "join" directly is too hard to interpret when reading

I consider .append() and .extend() unacceptable for the same reason -
they're too closely tied to mutating method semantics on sequences.

> -0 on a convenience operator form. Mainly because "only one way to do
> it" and the general controversy over which is the best operator to
> use, suggests that leaving the operator form out altogether at least
> in the initial implementation is the better option.

Right, this is my main point as well. The method form *has* to exist.
I am *not* convinced that the cute syntactic shorthands actually
*improve* readability - they improve *brevity*, but that's not the
same thing.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Mon Oct  8 19:59:58 2012
From: ncoghlan at (Nick Coghlan)
Date: Mon, 8 Oct 2012 23:29:58 +0530
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 6, 2012 at 9:44 PM, Calvin Spealman <ironfroggy at> wrote:
> Responding late, but I didn't get a chance to get my very strong
> feelings on this proposal in yesterday.
> I do not like it. I'll give full disclosure and say that I think our
> earlier failure to include the path library in the stdlib has been a
> loss for Python and I'll always hope we can fix that one day. I still
> hold out hope.
> It feels like this proposal is "make it object oriented, because
> object oriented is good" without any actual justification or obvious
> problem this solves. The API looks clunky and redundant, and does not
> appear to actually improve anything over the facilities in the os.path
> module. This takes a lot of things we can already do with paths and
> files and remixes them into a not-so intuitive API for the sake of
> change, not for the sake of solving a real problem.

The PEP needs to better articulate the rationale, but the key points are:
- better abstraction and encapsulation of cross-platform logic so file
manipulation algorithms written on Windows are more likely to work
correctly on POSIX systems (and vice-versa)
- improved ability to manipulate paths with Windows semantics on a
POSIX system (and vice-versa)
- better support for creation of "mock" filesystem APIs
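
A rough illustration of the second point using modules that already
exist: ntpath and posixpath implement Windows and POSIX path semantics
respectively and can both be imported on any platform.  The sample
paths are made up.

```python
import ntpath
import posixpath

# Windows semantics, usable even when running on a POSIX system.
windows_path = ntpath.join(r"C:\Users", "guido", "notes.txt")
drive, rest = ntpath.splitdrive(windows_path)

# POSIX semantics, usable even when running on Windows.
posix_path = posixpath.join("/home", "guido", "notes.txt")

print(windows_path)
print(drive, rest)
print(posix_path)
```

The string-based modules work, but the caller has to remember which
flavor each plain string belongs to; a path object can carry that
information itself.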

> As for specific problems I have with the proposal:
> Frankly, I think not keeping the / operator for joining is a huge
> mistake. This is the number one best feature of path and despite that
> many people don't like it, it makes sense. It makes our most common
> path operation read very close to the actual representation of the
> what you're creating. This is great.

It trades readability (and discoverability) for brevity. Not good.

> Not inheriting from str means that we can't directly pass these path
> objects to existing code that just expects a string, so we have a
> really hard boundary around the edges of this new API. It does not
> lend itself well to incrementally transitioning to it from existing
> code.

It's exactly the same design philosophy as was used in the creation of the
new ipaddress module: the objects in ipaddress must still be converted
to a string or integer before they can be passed to other operations
(such as the socket module APIs). Strings and integers remain the data
interchange formats here as well (although far more focused on strings
in the path case).
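
A minimal sketch of that boundary with the ipaddress module: the
address object is converted to a string or an integer explicitly at
the point where it is handed to another API.  The loopback address is
just a sample value.

```python
import ipaddress
import socket

address = ipaddress.ip_address("127.0.0.1")

# The socket module works on strings, so convert explicitly.
packed = socket.inet_aton(str(address))

# Integers are the other interchange format.
as_int = int(address)

print(repr(address))
print(packed)
print(as_int)
```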

> The stat operations and other file-facilities tacked on feel out of
> place, and limited. Why does it make sense to add these facilities to
> path and not other file operations? Why not give me a read method on
> paths? or maybe a copy? Putting lots of file facilities on a path
> object feels wrong because you can't extend it easily. This is one
> place that function(thing) works better than thing.function()

Indeed, I'm personally much happier with the "pure" path classes than
I am with the ones that can do filesystem manipulation. Having both
"" and "open(str(p), mode)" seems strange. OTOH, I can see
the attraction in being able to better fake filesystem access through
the method API, so I'm willing to go along with it.

> Overall, I'm completely -1 on the whole thing.

I find this very hard to square with your enthusiastic support for Like ipaddr, which needed to clean up its semantic model
before it could be included in the standard library (as ipaddress), we
need a clean cross-platform semantic model for path objects before a
convenience API can be added for manipulating them.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From steve at  Mon Oct  8 20:23:45 2012
From: steve at (Steven D'Aprano)
Date: Tue, 09 Oct 2012 05:23:45 +1100
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On 08/10/12 21:31, Nick Coghlan wrote:
> I've said before that I like the general shape of the pathlib API and
> that's still the case. It's the only OO API I've seen that's
> semantically clean enough for me to support introducing it as "the"
> standard path abstraction in the standard library.

The use of indexing to join path components:

     # Example from the PEP
     >>> p = PurePosixPath('foo')
     >>> p['bar']

is an absolute deal breaker for me. I'd rather stick with the status quo
than have to deal with something which so clearly shouts "index/key lookup"
but does something radically different (join/concatenate components).

I would *much* rather use the / or + operator, but I think even better
(and less likely to cause arguments about the operator) is an explicit
`join` method. After all, we call it "joining path components", so the name
is intuitive (at least for English speakers) and simple.

I don't believe that there will be confusion with str.join -- we already
have an os.path.join method, and I haven't seen any sign of confusion
caused by that.

> It's important to remember that you can't readily search for syntactic
> characters or common method names to find out what they mean, and
> these days that kind of thing should be taken into account when
> designing an API.

To some degree, that's a failure of the search engine, not of the
language. Why can't we type "symbol=+" into the search field and
get information about addition? If Google can let you do mathematical
calculations in their search field, surely we could search for symbols?
But I digress.

>"p.subpath('foo', 'bar')" looks like executable
> pseudocode for creating a new path based on existing one to me,

That notation quite possibly goes beyond unintuitive to downright
perverse. You are using a method called "subpath" to generate a
*superpath* (deeper, longer path which includes p as a part).

p = /a/b/c
q = /a/b/c/d/e  # p.subpath(d, e)

p is a subpath of q, not the other way around: q is a path PLUS some
subdirectories of that path, i.e. a longer path.

It's also a pretty unusual term outside of graph theory: Googling finds
fewer than 400,000 references to "subpath". It gets used in graphics
applications, some games, and in an extension to mercurial for adding
symbolic names to repo URLs. I can't see any sign that it is used in
the sense you intend.

> unlike
> "p / 'foo' / 'bar'", "p['foo', 'bar']", or "p.join('foo', 'bar')".

Okay, I'll grant you that we'll probably never get a consensus on
operators + versus / but I really don't understand why you think that
p.join is unsuitable for a method which joins path components.


From solipsis at  Mon Oct  8 20:36:37 2012
From: solipsis at (Antoine Pitrou)
Date: Mon, 8 Oct 2012 20:36:37 +0200
Subject: [Python-ideas] asyncore: included batteries don't fit
References: <>
Message-ID: <>

On Mon, 8 Oct 2012 13:04:00 -0400
Mike Graham <mikegraham at> wrote:
> On Mon, Oct 8, 2012 at 11:35 AM, Guido van Rossum <guido at> wrote:
> > On Mon, Oct 8, 2012 at 5:39 AM, Christian Heimes <christian at> wrote:
> >> Python's standard library doesn't contain an interface to I/O Completion
> >> Ports. I think a common event loop system is a good reason to add IOCP
> >> if somebody is up for the challenge.
> >>
> >> Would you prefer an IOCP wrapper in the stdlib or your own version?
> >> Twisted has its own Cython based wrapper, some other libraries use a
> >> libevent-based solution.
> >
> > What's an IOCP?
> It's the non-crappy select equivalent on Windows.

Except that it's not exactly an equivalent, it's a whole different
programming model ;)

(but I understand what you mean: it allows doing non-blocking I/O on an
arbitrary number of objects in parallel)



Software development and contracting:

From guido at  Mon Oct  8 20:39:03 2012
From: guido at (Guido van Rossum)
Date: Mon, 8 Oct 2012 11:39:03 -0700
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Mon, Oct 8, 2012 at 10:36 AM, Guido van Rossum <guido at> wrote:
>>> It's not about equality. If you ask whether two NaNs are *unequal* the
>>> answer is *also* False.
>> Does this mean that the following behaviour of lists is a bug?
>> >>> x=float('NAN')
>> >>> [x]==[x], [x]<=[x], [x]>=[x]
>> (True, True, True)
> No. That's a special case in the comparisons for sequences.

[Now that I'm back at a real keyboard I can elaborate...]

This applies to all container comparisons: without the rule that if
two contained items reference the same object they are to be
considered equal without calling their __eq__, containers couldn't
take the shortcut that a container is always equal to itself (i.e. c1
is c2 => c1 == c2). Without this shortcut, container comparisons would
be much more expensive: any time a large container was compared to
itself, it would be forced to recursively compare all the contained
items. You might say that it has to do this anyway when comparing to a
container that is not itself, but if the answer is "unequal" the
comparison can stop as soon as two unequal items are found, whereas if
the answer is "equal" you end up comparing all items. For two
different containers there is no possible shortcut, but comparing a
container to itself is quite common and really does deserve the
shortcut. We discussed this in the past and always came to the same
conclusion: despite the rules for NaN, the shortcut for containers is
required. A similar shortcut exists for 'x in [x]' BTW.
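
The identity shortcut can be observed directly.  This sketch assumes
CPython's list comparison and containment behavior as described above;
the variable names are chosen for the example.

```python
nan = float("nan")
container = [nan]

# The float itself is not equal to itself.
nan_equals_itself = nan == nan

# A container compared with itself is equal.
container_equals_itself = container == container

# Two distinct lists holding the *same* NaN object also compare equal,
# because items are checked for identity before __eq__ is called.
same_object_lists_equal = [nan] == [nan]

# Two lists holding *different* NaN objects fall back to __eq__.
different_object_lists_equal = [float("nan")] == [float("nan")]

# Containment uses the same identity-first rule.
nan_in_container = nan in container

print(nan_equals_itself, container_equals_itself, same_object_lists_equal,
      different_object_lists_equal, nan_in_container)
```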

--Guido van Rossum (

From ncoghlan at  Mon Oct  8 20:39:23 2012
From: ncoghlan at (Nick Coghlan)
Date: Tue, 9 Oct 2012 00:09:23 +0530
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 8, 2012 at 11:53 PM, Steven D'Aprano <steve at> wrote:
>> "p.subpath('foo', 'bar')" looks like executable
>> pseudocode for creating a new path based on existing one to me,
> That notation quite possibly goes beyond unintuitive to downright
> perverse. You are using a method called "subpath" to generate a
> *superpath* (deeper, longer path which includes p as a part).

Huh? It's a tree structure. A subpath lives inside its parent path,
just as subnodes are children of their parent node. Agreed it's not a
widely used term though - it's a generalisation of subdirectory to
also cover file paths.

They're certainly not "super" anything, any more than a subdirectory
is really a superdirectory (which is what you appear to be arguing).

> Okay, I'll grant you that we'll probably never get a consensus on
> operators + versus / but I really don't understand why you think that
> p.join is unsuitable for a method which joins path components.

"p.join(r)" has exactly the same problem as "p + r": pass in a string
to a function expecting a path object and you get data corruption
instead of an exception. When you want *different* semantics, then
ducktyping is your enemy and it's necessary to take steps to avoid it,
including changing method names and avoiding some operators.
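
A sketch of the failure mode, under the assumption that a path class
offers a method called join.  DemoPath and log_file_for are
hypothetical names invented for this illustration; they are not part
of the PEP or of pathlib.

```python
import posixpath


class DemoPath:
    """Toy path object whose join() appends path components."""

    def __init__(self, text):
        self._text = text

    def join(self, *parts):
        return DemoPath(posixpath.join(self._text, *parts))

    def __str__(self):
        return self._text


def log_file_for(base):
    """Expects a DemoPath, but nothing enforces that."""
    return str(base.join("app.log"))


good = log_file_for(DemoPath("/var/log"))

# A plain string also has a join() method, so no exception is raised:
# str.join treats "app.log" as an iterable of characters and uses the
# directory name as the separator between them.
bad = log_file_for("/var/log")

print(good)
print(bad)
```

With a distinct method name the second call would fail immediately
with an AttributeError instead of producing a nonsense file name.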


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From guido at  Mon Oct  8 20:40:28 2012
From: guido at (Guido van Rossum)
Date: Mon, 8 Oct 2012 11:40:28 -0700
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 8, 2012 at 11:36 AM, Antoine Pitrou <solipsis at> wrote:
> On Mon, 8 Oct 2012 13:04:00 -0400
> Mike Graham <mikegraham at> wrote:
>> On Mon, Oct 8, 2012 at 11:35 AM, Guido van Rossum <guido at> wrote:
>> > On Mon, Oct 8, 2012 at 5:39 AM, Christian Heimes <christian at> wrote:
>> >> Python's standard library doesn't contain an interface to I/O Completion
>> >> Ports. I think a common event loop system is a good reason to add IOCP
>> >> if somebody is up for the challenge.
>> >>
>> >> Would you prefer an IOCP wrapper in the stdlib or your own version?
>> >> Twisted has its own Cython based wrapper, some other libraries use a
>> >> libevent-based solution.
>> >
>> > What's an IOCP?
>> It's the non-crappy select equivalent on Windows.
> Except that it's not exactly an equivalent, it's a whole different
> programming model ;)
> (but I understand what you mean: it allows doing non-blocking I/O on an
> arbitrary number of objects in parallel)

Now that I know what it is, I think that (a) the abstract reactor design
should support IOCP, and (b) the stdlib should enable IOCP by default
when on Windows.

--Guido van Rossum (

From solipsis at  Mon Oct  8 20:40:14 2012
From: solipsis at (Antoine Pitrou)
Date: Mon, 8 Oct 2012 20:40:14 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
References: <>
Message-ID: <>

On Mon, 8 Oct 2012 10:06:17 -0600
Andrew McNabb <amcnabb at> wrote:
> Since this really is a matter of personal taste, I'll end my
> participation in this discussion by voicing support for Nick Coghlan's
> suggestion of a `join` method, whether it's named `join` or `append` or
> something else.

The join() method already exists in the current PEP, but it's less
convenient, syntactically, than either '[]' or '/'.



Software development and contracting:

From p.f.moore at  Mon Oct  8 20:47:43 2012
From: p.f.moore at (Paul Moore)
Date: Mon, 8 Oct 2012 19:47:43 +0100
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On 8 October 2012 19:39, Nick Coghlan <ncoghlan at> wrote:
>> Okay, I'll grant you that we'll probably never get a consensus on
>> operators + versus / but I really don't understand why you think that
>> p.join is unsuitable for a method which joins path components.
> "p.join(r)" has exactly the same problem as "p + r": pass in a string
> to a function expecting a path object and you get data corruption
> instead of an exception. When you want *different* semantics, then
> ducktyping is your enemy and it's necessary to take steps to avoid it,
> including changing method names and avoiding some operators.

Ah, OK. I understand your objection now.

I concede that Path.join() is a bad idea based on this.
I still don't like subpath() though.
And pathjoin() is too likely to be redundant in real code:

temp_path = Path(tempfile.mkdtemp())
generated_file = temp_path.pathjoin('data_extract.csv')

I can't think of a better term, though :-(


From massimo.dipierro at  Mon Oct  8 20:48:05 2012
From: massimo.dipierro at (massimo.dipierro at
Date: Mon, 8 Oct 2012 11:48:05 -0700 (PDT)
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
Message-ID: <>

An HTML attachment was scrubbed...
URL: <>

From ncoghlan at  Mon Oct  8 20:49:03 2012
From: ncoghlan at (Nick Coghlan)
Date: Tue, 9 Oct 2012 00:19:03 +0530
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 9, 2012 at 12:10 AM, Antoine Pitrou <solipsis at> wrote:
> On Mon, 8 Oct 2012 10:06:17 -0600
> Andrew McNabb <amcnabb at> wrote:
>> Since this really is a matter of personal taste, I'll end my
>> participation in this discussion by voicing support for Nick Coghlan's
>> suggestion of a `join` method, whether it's named `join` or `append` or
>> something else.
> The join() method already exists in the current PEP, but it's less
> convenient, syntactically, than either '[]' or '/'.

Right. My objections boil down to:

1. The case has not been adequately made that a second way to do it is
needed. Therefore, the initial version should just include the method
API.

2. Using "join" as the method name is a bad idea for the same reason
that using "+" as the operator syntax would be a bad idea: it can
cause erroneous output instead of an exception if a string is passed
where a Path object is expected.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From solipsis at  Mon Oct  8 20:47:07 2012
From: solipsis at (Antoine Pitrou)
Date: Mon, 8 Oct 2012 20:47:07 +0200
Subject: [Python-ideas] PEP 428: poll about the joining syntax
Message-ID: <>


Since there has been some controversy about the joining syntax used in
PEP 428 (filesystem path objects), I would like to run an informal poll
about it. Please answer with +1/+0/-0/-1 for each proposal:

- `p[q]` joins path q to path p
- `p + q` joins path q to path p
- `p / q` joins path q to path p
- `p.join(q)` joins path q to path p

(you can include a rationale if you want, but don't forget to vote :-))
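
For voters who want to see the four spellings side by side, here is a
toy sketch.  PollPath is a hypothetical class written for this
message; it is not the implementation from the PEP, and it simply
routes every spelling to posixpath.join.

```python
import posixpath


class PollPath:
    """Toy path supporting all four candidate joining syntaxes."""

    def __init__(self, text):
        self.text = text

    def join(self, other):
        # str() lets the argument be either a PollPath or a plain string.
        return PollPath(posixpath.join(self.text, str(other)))

    # The three operator spellings all delegate to join().
    __getitem__ = join   # p[q]
    __add__ = join       # p + q
    __truediv__ = join   # p / q

    def __str__(self):
        return self.text


p = PollPath("a")
q = PollPath("b")

spellings = {
    "p[q]": str(p[q]),
    "p + q": str(p + q),
    "p / q": str(p / q),
    "p.join(q)": str(p.join(q)),
}

for syntax, result in spellings.items():
    print(syntax, "->", result)
```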

Thank you


Software development and contracting:

From guido at  Mon Oct  8 20:53:11 2012
From: guido at (Guido van Rossum)
Date: Mon, 8 Oct 2012 11:53:11 -0700
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<> <>
Message-ID: <>

On Mon, Oct 8, 2012 at 11:49 AM, Nick Coghlan <ncoghlan at> wrote:
> On Tue, Oct 9, 2012 at 12:10 AM, Antoine Pitrou <solipsis at> wrote:
>> On Mon, 8 Oct 2012 10:06:17 -0600
>> Andrew McNabb <amcnabb at> wrote:
>>> Since this really is a matter of personal taste, I'll end my
>>> participation in this discussion by voicing support for Nick Coghlan's
>>> suggestion of a `join` method, whether it's named `join` or `append` or
>>> something else.
>> The join() method already exists in the current PEP, but it's less
>> convenient, syntactically, than either '[]' or '/'.
> Right. My objections boil down to:
> 1. The case has not been adequately made that a second way to do it is
> needed. Therefore, the initial version should just include the method
> API.
> 2. Using "join" as the method name is a bad idea for the same reason
> that using "+" as the operator syntax would be a bad idea: it can
> cause erroneous output instead of an exception if a string is passed
> where a Path object is expected.

It took me a while before I realized that 'abc'.join('def') already
has a meaning (returning 'dabceabcf'). But yes, this makes it a poor
choice for a Path method.

--Guido van Rossum (

From guido at  Mon Oct  8 20:54:06 2012
From: guido at (Guido van Rossum)
Date: Mon, 8 Oct 2012 11:54:06 -0700
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

I don't like any of those; I'd vote for another regular method, maybe

On Mon, Oct 8, 2012 at 11:47 AM, Antoine Pitrou <solipsis at> wrote:
> Hello,
> Since there has been some controversy about the joining syntax used in
> PEP 428 (filesystem path objects), I would like to run an informal poll
> about it. Please answer with +1/+0/-0/-1 for each proposal:
> - `p[q]` joins path q to path p
> - `p + q` joins path q to path p
> - `p / q` joins path q to path p
> - `p.join(q)` joins path q to path p
> (you can include a rationale if you want, but don't forget to vote :-))
> Thank you
> Antoine.

--Guido van Rossum (

From solipsis at  Mon Oct  8 20:51:39 2012
From: solipsis at (Antoine Pitrou)
Date: Mon, 8 Oct 2012 20:51:39 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
References: <>
Message-ID: <>

On Tue, 9 Oct 2012 00:09:23 +0530
Nick Coghlan <ncoghlan at> wrote:
> On Mon, Oct 8, 2012 at 11:53 PM, Steven D'Aprano <steve at> wrote:
> >> "p.subpath('foo', 'bar')" looks like executable
> >> pseudocode for creating a new path based on existing one to me,
> >
> >
> > That notation quite possibly goes beyond unintuitive to downright
> > perverse. You are using a method called "subpath" to generate a
> > *superpath* (deeper, longer path which includes p as a part).
> Huh? It's a tree structure. A subpath lives inside its parent path,
> just as subnodes are children of their parent node. Agreed it's not a
> widely used term though - it's a generalisation of subdirectory to
> also cover file paths.

Well, it's a "subpath", except when it isn't:

  >>> p = Path('a')
  >>> p.join('/b')

I have to admit I didn't understand what you meant by "subpath" until
you explained that it was another name for "join". I really don't
think it's a good name. child() would be a good name, except for the
case above where you join with an absolute path. Actually,
child() could be a variant of join() which wouldn't allow for absolute
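
A sketch of the distinction, using posixpath.join to stand in for the
join behavior shown above.  The child function is a hypothetical
helper for this illustration; its name and its choice of ValueError
are assumptions, not part of the PEP.

```python
import posixpath


def join(base, other):
    """Plain join: an absolute second argument replaces the base."""
    return posixpath.join(base, other)


def child(base, name):
    """Hypothetical stricter variant that refuses absolute arguments."""
    if posixpath.isabs(name):
        raise ValueError("child() requires a relative path: %r" % (name,))
    return posixpath.join(base, name)


replaced = join("a", "/b")
nested = child("a", "b")

try:
    child("a", "/b")
except ValueError as exc:
    rejected = str(exc)

print(replaced)
print(nested)
print(rejected)
```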



Software development and contracting:

From solipsis at  Mon Oct  8 20:56:34 2012
From: solipsis at (Antoine Pitrou)
Date: Mon, 8 Oct 2012 20:56:34 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
References: <>
Message-ID: <>

On Tue, 9 Oct 2012 00:19:03 +0530
Nick Coghlan <ncoghlan at> wrote:
> >
> > The join() method already exists in the current PEP, but it's less
> > convenient, syntactically, than either '[]' or '/'.
> Right. My objections boil down to:
> 1. The case has not been adequately made that a second way to do it is
> needed. Therefore, the initial version should just include the method
> API.

But you really want a short method name, otherwise it's better to have
a dedicated operator.  joinpath() definitely doesn't cut it, IMO.

(perhaps that's the same reason I am reluctant to use str.format() :-))

By the way, I also thought of using __call__, but for some reason I
think it tastes a bit bad ("too clever"?).

> 2. Using "join" as the method name is a bad idea for the same reason
> that using "+" as the operator syntax would be a bad idea: it can
> cause erroneous output instead of an exception if a string is passed
> where a Path object is expected.

Admitted, although I think the potential for confusion is smaller
than with "+" (I can't really articulate why, it's just that I fear
one much less than the other :-)).



Software development and contracting:

From guido at  Mon Oct  8 21:04:56 2012
From: guido at (Guido van Rossum)
Date: Mon, 8 Oct 2012 12:04:56 -0700
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<> <>
Message-ID: <>

On Mon, Oct 8, 2012 at 11:56 AM, Antoine Pitrou <solipsis at> wrote:
> On Tue, 9 Oct 2012 00:19:03 +0530
> Nick Coghlan <ncoghlan at> wrote:
>> >
>> > The join() method already exists in the current PEP, but it's less
>> > convenient, syntactically, than either '[]' or '/'.
>> Right. My objections boil down to:
>> 1. The case has not been adequately made that a second way to do it is
>> needed. Therefore, the initial version should just include the method
>> API.
> But you really want a short method name, otherwise it's better to have
> a dedicated operator.  joinpath() definitely doesn't cut it, IMO.

Maybe you're overreacting? The current notation for this operation is
os.path.join(p, q) which is even longer than p.pathjoin(q). To me the
latter is fine.

> (perhaps that's the same reason I am reluctant to use str.format() :-))
> By the way, I also thought of using __call__, but for some reason I
> think it tastes a bit bad ("too clever"?).

__call__ overloading is often overused. Please don't go there. It is
really hard to figure out what some (semi-)obscure operation means if
it uses __call__ overloading.

>> 2. Using "join" as the method name is a bad idea for the same reason
>> that using "+" as the operator syntax would be a bad idea: it can
>> cause erroneous output instead of an exception if a string is passed
>> where a Path object is expected.
> Admitted, although I think the potential for confusion is smaller
> than with "+" (I can't really articulate why, it's just that I fear
> one much less than the other :-)).

Personally I fear '+' much more -- to me, + can be used to add an
extension without adding a new directory level. If we *have* to
overload an operator, I'd prefer p/q over p[q] any day.

--Guido van Rossum (

From stefan at  Mon Oct  8 21:14:44 2012
From: stefan at (Stefan Krah)
Date: Mon, 8 Oct 2012 21:14:44 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum <guido at> wrote:
> Personally I fear '+' much more -- to me, + can be used to add an
> extension without adding a new directory level. If we *have* to
> overload an operator, I'd prefer p/q over p[q] any day.

'^' or '@' are used for concatenation in some languages. At least accidental
confusion with xor is pretty unlikely.

Stefan Krah

From ncoghlan at  Mon Oct  8 21:15:41 2012
From: ncoghlan at (Nick Coghlan)
Date: Tue, 9 Oct 2012 00:45:41 +0530
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 9, 2012 at 12:24 AM, Guido van Rossum <guido at> wrote:
> I don't like any of those; I'd vote for another regular method, maybe
> p.pathjoin(q).

My own current preference is to take "p.joinpath(q)" straight from (

My rationale for disliking all of the poll options (clarified during
the previous discussions, so I can summarise it better now):

"p[q]", "p + q", "p / q": A method API is desirable *anyway* (for
better integration with all the tools that deal with callables in
general), and no compelling justification has been provided for
offering two ways to do it (mere brevity when writing doesn't cut it,
when the result is something that is more cryptic when reading and

"p + q", "p.join(q)": passing strings where path objects are needed is
expected to be a common error mode, especially for people just
starting to use the new API. It is desirable that such errors produce
an exception rather than silently producing an incorrect string.

I don't *love* joinpath as a name, I just don't actively dislike it
the way I do the four presented options (and it has the virtue of the precedent).
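
The failure mode behind the "p + q" / "p.join(q)" objection can be sketched with plain strings. This is only an illustration: program_dir() is a hypothetical helper, not anything from the PEP, and it assumes the caller was supposed to pass a path object but passed a str instead.

```python
def program_dir(configdir):
    # Hypothetical helper: expects a path object with a join() method.
    return configdir.join("ab")


# A plain string also has join(), so the mistake goes unnoticed:
# str.join() treats "ab" as an iterable of characters.
silently_wrong = program_dir("/etc")
print(silently_wrong)  # a/etcb

# A method name that str does not have fails loudly instead.
try:
    "/etc".joinpath("ab")
except AttributeError:
    failed_loudly = True
else:
    failed_loudly = False
print(failed_loudly)  # True
```

With a distinct method name, the same mistake turns into an AttributeError at the call site rather than a strange string further down the line.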


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ethan at  Mon Oct  8 20:58:50 2012
From: ethan at (Ethan Furman)
Date: Mon, 08 Oct 2012 11:58:50 -0700
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

`p[q]`      -1
`p + q`     -1 ('+' should just tack on to the filename field)
`p / q`     +1
`p.join(q)` +0

From phd at  Mon Oct  8 21:17:16 2012
From: phd at (Oleg Broytman)
Date: Mon, 8 Oct 2012 23:17:16 +0400
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 08, 2012 at 08:47:07PM +0200, Antoine Pitrou <solipsis at> wrote:
> - `p[q]` joins path q to path p

-1. Confusing with p[-2]

> - `p + q` joins path q to path p

-0. What is "path addition"? Concatenation? Joining? Puzzled...

> - `p / q` joins path q to path p

+0. Again, "path division" is a bit strange but at least I understand
    '/' is the separation symbol.

> - `p.join(q)` joins path q to path p

+1. That one I love best, even with the name "join". I used to use
    os.path.join() quite extensively so there is no chance I confuse that
    with str.join().

     Oleg Broytman              phd at
           Programmers don't die, they just GOSUB without RETURN.

From dreamingforward at  Mon Oct  8 21:20:57 2012
From: dreamingforward at (Mark Adam)
Date: Mon, 8 Oct 2012 14:20:57 -0500
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 7, 2012 at 9:01 PM, Guido van Rossum <guido at> wrote:
> On Sun, Oct 7, 2012 at 6:41 PM, Ben Darnell <ben at> wrote:
>> I think there are
>> actually two separate issues here and it's important to keep them
>> distinct:  at a low level, there is a need for a standardized event
>> loop, while at a higher level there is a question of what asynchronous
>> code should look like.
> Yes, yes. I tried to bring up this distinction. I'm glad I didn't
> completely fail.

Perhaps this is obvious to others, but (like hinted at above) there
seem to be two primary issues with event handlers:

1) event handlers for the machine-program interface (ex. network I/O)
2) event handlers for the program-user interface (ex. mouse I/O)

While similar, my gut tells me they have to be handled in completely
different ways in order to preserve order (i.e. sanity).

This issue, for me, has come up with wanting to make a p2p network
application with VPython.


From python at  Mon Oct  8 21:22:06 2012
From: python at (MRAB)
Date: Mon, 08 Oct 2012 20:22:06 +0100
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-08 19:47, Antoine Pitrou wrote:
> Hello,
> Since there has been some controversy about the joining syntax used in
> PEP 428 (filesystem path objects), I would like to run an informal poll
> about it. Please answer with +1/+0/-0/-1 for each proposal:
> - `p[q]` joins path q to path p

-1. I would much prefer subscripting to be used to slice paths, e.g.
p[-1] == os.path.basename(p).

> - `p + q` joins path q to path p

+0. I would prefer that to mean "join without directory separator",
e.g. Path("/foo/bar") + ".txt" == Path("/foo/bar.txt").

> - `p / q` joins path q to path p

+1. Join with directory separator, e.g. Path("/foo") / "bar" ==

> - `p.join(q)` joins path q to path p


> (you can include a rationale if you want, but don't forget to vote :-))
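
The two operator meanings voted on above ('/' joins with a separator, '+' appends to the last component) can be sketched with a toy class. SketchPath is a made-up name for illustration only; it is not the PEP 428 API, and it assumes POSIX separators.

```python
import posixpath


class SketchPath:
    """Toy path object; only illustrates the two operator meanings."""

    def __init__(self, text):
        self._text = str(text)

    def __str__(self):
        return self._text

    def __repr__(self):
        return "SketchPath(%r)" % (self._text,)

    def __eq__(self, other):
        return isinstance(other, SketchPath) and self._text == other._text

    def __hash__(self):
        return hash(self._text)

    def __truediv__(self, other):
        # '/' joins with a directory separator.
        return SketchPath(posixpath.join(self._text, str(other)))

    def __add__(self, suffix):
        # '+' tacks text onto the last component, with no separator.
        return SketchPath(self._text + str(suffix))


joined = SketchPath("/foo") / "bar"
extended = SketchPath("/foo/bar") + ".txt"
print(joined)    # /foo/bar
print(extended)  # /foo/bar.txt
```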

From ncoghlan at  Mon Oct  8 21:24:03 2012
From: ncoghlan at (Nick Coghlan)
Date: Tue, 9 Oct 2012 00:54:03 +0530
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 9, 2012 at 12:34 AM, Guido van Rossum <guido at> wrote:
> On Mon, Oct 8, 2012 at 11:56 AM, Antoine Pitrou <solipsis at> wrote:
>> Admitted, although I think the potential for confusion is smaller
>> than with "+" (I can't really articulate why, it's just that I fear
>> one much less than the other :-)).
> Personally I fear '+' much more -- to me, + can be used to add an
> extension without adding a new directory level. If we *have* to
> overload an operator, I'd prefer p/q over p[q] any day.

Yes, of all the syntactic shorthands, I also favour "/". However, I'm
also a big fan of starting with a minimalist core and growing it.
Moving from "os.path.join(a, b, c, d, e)" (or, the way I often write
it, "joinpath(a, b, c, d, e)") to "a.joinpath(b, c, d, e)" at least
isn't going backwards, and is more obvious in isolation than "a / b /
c / d / e".
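
For comparison, here is how the three spellings look side by side. This sketch uses pathlib as it eventually shipped in Python 3.4 (which provides both joinpath() and '/'), so it postdates this thread; PurePosixPath is used only to keep the result platform-independent.

```python
import posixpath
from pathlib import PurePosixPath

a = PurePosixPath("a")

via_function = posixpath.join("a", "b", "c", "d", "e")
via_method = a.joinpath("b", "c", "d", "e")
via_operator = a / "b" / "c" / "d" / "e"

print(via_function)                # a/b/c/d/e
print(via_method == via_operator)  # True
```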


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From amcnabb at  Mon Oct  8 21:25:46 2012
From: amcnabb at (Andrew McNabb)
Date: Mon, 8 Oct 2012 13:25:46 -0600
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 08, 2012 at 08:47:07PM +0200, Antoine Pitrou wrote:
> - `p[q]` joins path q to path p


> - `p + q` joins path q to path p

-1 (or +0 if q is forbidden from being a string)

> - `p / q` joins path q to path p


> - `p.join(q)` joins path q to path p


Andrew McNabb
PGP Fingerprint: 8A17 B57C 6879 1863 DE55  8012 AB4D 6098 8826 6868

From ncoghlan at  Mon Oct  8 21:29:17 2012
From: ncoghlan at (Nick Coghlan)
Date: Tue, 9 Oct 2012 00:59:17 +0530
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

Reducing to numeric votes:

p[q]: -1 (confusing w.r.t. indexing/slicing, not convinced it is needed)
p + q : -1 (confusing w.r.t. strings, not convinced it is needed)
p / q : -0 (not convinced it is needed)
p.join(q): -0 (confusing w.r.t strings)
p.joinpath(q): +1 (avoids confusion, precedent, need a method
API anyway)


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From at  Mon Oct  8 21:41:30 2012
From: at (Joshua Landau)
Date: Mon, 8 Oct 2012 20:41:30 +0100
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On 8 October 2012 20:14, Stefan Krah <stefan at> wrote:

> Guido van Rossum <guido at> wrote:
> > Personally I fear '+' much more -- to me, + can be used to add an
> > extension without adding a new directory level. If we *have* to
> > overload an operator, I'd prefer p/q over p[q] any day.
> '^' or '@' are used for concatenation in some languages. At least
> accidental
> confusion with xor is pretty unlikely.

On the basis that we want standard libraries to be non-contentious issues:
is it not obvious that "+", "/" and  "[]" *cannot* be the right choices as
they're contentious?

I would argue that a lot of this argument is *pointless* because there is
no right answer. For example, I prefer indexing out of the lot, but since a
lot of people really dislike it I'm not going to bother vouching for it.

I think we should argue more along the lines of:

# Possibility for accidental validity if configdir is a string
> configdir.join("myprogram")

# A bit long
> # My personal objection is that one shouldn't have to state "path" in the
> name: it's not str.stringjoin()
> configdir.joinpath("myprogram")
> configdir.pathjoin("myprogram")

# There's argument here, but I don't find them intuitive or nice
> configdir.subpath("myprogram")
> configdir.superpath("myprogram")

# My favorites ('cause my opinion: so there)
> configdir.child("myprogram")  # Does sorta' imply IO
> configdir.get("myprogram")  # 'Cause it's short, but it does sorta' imply
> IO
> configdir.goto("myprogram")  # "GOTO IS BAD!! BOO!"

# What I'm surprised (but half-glad) hasn't been mentioned"myprogam")  # Not a link, just GMail's silly-ness

We already know the semantics for the function; now it's *just a name*.

From mikegraham at  Mon Oct  8 21:44:54 2012
From: mikegraham at (Mike Graham)
Date: Mon, 8 Oct 2012 15:44:54 -0400
Subject: [Python-ideas] Make "is" checks on non-singleton literals errors
Message-ID: <>

I regularly see learners using "is" to check for string equality and
sometimes other equality. Due to optimizations, they often come away
thinking it worked for them.

There are no cases where

    if x is "foo":

or

   if x is 4:

is actually the code someone intended to write.

Although this has no benefit to anyone but new learners, it also
doesn't really do any harm.
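
A small sketch of why the mistake is easy to miss. The names are illustrative, and the identity result is a CPython implementation detail rather than a language guarantee: a string built at runtime is equal to the literal but is normally a different object. The literal is bound to a name first so that the comparison itself contains no literal.

```python
literal = "foo"
built = "".join(["fo", "o"])  # equal text, assembled at runtime

equal = built == literal        # value comparison
same_object = built is literal  # identity comparison

print(equal)        # True
print(same_object)  # False on CPython; do not rely on either outcome
```

In a simple test such as x = "foo" followed by x is "foo", the compiler often reuses one constant object, which is how the identity check can appear to work.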


From guido at  Mon Oct  8 21:46:43 2012
From: guido at (Guido van Rossum)
Date: Mon, 8 Oct 2012 12:46:43 -0700
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Mon, Oct 8, 2012 at 12:03 PM, Rob Cliffe <rob.cliffe at> wrote:
> On 08/10/2012 19:39, Guido van Rossum wrote:
>> Does this mean that the following behaviour of lists is a bug?
>>>>>>> x=float('NAN')
>>>>>>> [x]==[x], [x]<=[x], [x]>=[x]
>>>> (True, True, True)
>>> No. That's a special case in the comparisons for sequences.
>> [Now that I'm back at a real keyboard I can elaborate...]
>> This applies to all container comparisons: without the rule that if
>> two contained items reference the same object they are to be
>> considered equal without calling their __eq__, containers couldn't
>> take the shortcut that a container is always equal to itself (i.e. c1
>> is c2 => c1 == c2). Without this shortcut, container comparisons would
>> be much more expensive: any time a large container was compared to
>> itself, it would be forced to recursively compare all the contained
>> items. You might say that it has to do this anyway when comparing to a
>> container that is not itself, but if the answer is "unequal" the
>> comparison can stop as soon as two unequal items are found, whereas if
>> the answer is "equal" you end up comparing all items. For two
>> different containers there is no possible shortcut, but comparing a
>> container to itself is quite common and really does deserve the
>> shortcut. We discussed this in the past and always came to the same
>> conclusion: despite the rules for NaN, the shortcut for containers is
>> required. A similar shortcut exists for 'x in [x]' BTW.
> Thank you for elaborating, I was going to ask what the justification for the
> special case was.
> You have explained why
>>>> x=float('NAN'); A=[x]; A==A
> True
> but not as far as I can see why
>>>> x=float('NAN'); A=[x]; B=[x]; A==B, [x]==[x]
> (True, True)
> where neither of the results is comparing a container to itself.

It's so that when the container is iterating over pairs of elements it
can check for item identity (a simple pointer comparison) first, which
makes a pretty big difference in speed.
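
The identity-first rule can be observed directly. A minimal check, with behaviour as in CPython's list comparison and membership test:

```python
x = float("nan")

item_equal = x == x                       # NaN never equals itself
same_item_lists = [x] == [x]              # same object in both lists
membership = x in [x]                     # identity is checked before ==
distinct_nans = [float("nan")] == [float("nan")]  # two different objects

print(item_equal)       # False
print(same_item_lists)  # True
print(membership)       # True
print(distinct_nans)    # False
```

The last line shows that the shortcut depends on object identity only: two separately created NaNs still compare unequal inside containers.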

--Guido van Rossum (

From guido at  Mon Oct  8 21:48:07 2012
From: guido at (Guido van Rossum)
Date: Mon, 8 Oct 2012 12:48:07 -0700
Subject: [Python-ideas] Make "is" checks on non-singleton literals errors
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 8, 2012 at 12:44 PM, Mike Graham <mikegraham at> wrote:
> I regularly see learners using "is" to check for string equality and
> sometimes other equality. Due to optimizations, they often come away
> thinking it worked for them.
> There are no cases where
>     if x is "foo":
> or
>    if x is 4:
> is actually the code someone intended to write.
> Although this has no benefit to anyone but new learners, it also
> doesn't really do any harm.

I think the best we can do is to make these SyntaxWarnings. I had the
same thought recently and I do agree that these are common beginner
mistakes that can easily hide bugs by succeeding in simple tests.

--Guido van Rossum (

From masklinn at  Mon Oct  8 21:59:53 2012
From: masklinn at (Masklinn)
Date: Mon, 8 Oct 2012 21:59:53 +0200
Subject: [Python-ideas] Make "is" checks on non-singleton literals errors
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-08, at 21:48 , Guido van Rossum wrote:
> On Mon, Oct 8, 2012 at 12:44 PM, Mike Graham <mikegraham at> wrote:
>> I regularly see learners using "is" to check for string equality and
>> sometimes other equality. Due to optimizations, they often come away
>> thinking it worked for them.
>> There are no cases where
>>    if x is "foo":
>> or
>>   if x is 4:
>> is actually the code someone intended to write.
>> Although this has no benefit to anyone but new learners, it also
>> doesn't really do any harm.
> I think the best we can do is to make these SyntaxWarnings. I had the
> same thought recently and I do agree that these are common beginner
> mistakes that can easily hide bugs by succeeding in simple tests.

How would the rather common pattern of using an `object` instance as a
placeholder be handled? An identity test precisely expresses what is
meant and desired in that case, while an equality test does not.

Another one which seems to have some serious usage in the stdlib is
type-testing (e.g., decimal or tests of exception types).
Without type inference, I'm not too sure how that could be handled
as syntactic warnings, and as above an identity test expresses the
purpose of the code better than an equality one.

From massimo.dipierro at  Mon Oct  8 22:00:06 2012
From: massimo.dipierro at (Massimo DiPierro)
Date: Mon, 8 Oct 2012 15:00:06 -0500
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

p[q]: -1
p + q : +1 
p / q : -1
p.join(q): -1
p.joinpath(q): -0

Looks like I am a minority. :-(

A directory structure with symbolic links is a graph. A path is an ordered set of links. A path can start anywhere in the graph and can end up anywhere. Links are represented by folder names. To me this means a natural representation of a Path as a list of strings which can be serialized in an OS-specific path.

In fact today we all do, already: path.split(os.path.sep) and then manipulate the resulting list.

Representing the Path with an object that has the same API as a list of strings (add, radd, append, insert, get item, get slice) and a few extra ones, will make it easier for new users to understand it and remember the APIs.

I do not like p[q] and p/q because they fire the wrong neurons in my brain. p[q] picks an element in a set, p/q picks a subset of p. I also do not like p.join because p is not a string and it may be confusing.

I am not opposed to p.joinpath(q) but it would require that users learn a new API. They cannot just guess it; they would have to look it up. That gives away the main reason I use Python: it is intuitive. Moreover p.joinpath(q) seems to indicate that q is a path, but q could be a string, not a Path.
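
The "list of strings" view described above can be sketched with today's tools. This is only an illustration of the split-and-manipulate habit, using posixpath so the separator is fixed; the variable names are made up.

```python
import posixpath

path = "/usr/local/lib"

# Split into components; the leading '' marks the root.
parts = path.split(posixpath.sep)
print(parts)  # ['', 'usr', 'local', 'lib']

# List operations stand in for path operations.
longer = parts + ["python"]
last = longer[-1]

# Serialize back into a path string.
rebuilt = posixpath.sep.join(longer)
print(rebuilt)  # /usr/local/lib/python
```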


On Oct 8, 2012, at 2:29 PM, Nick Coghlan wrote:

> Reducing to numeric votes:
> p[q]: -1 (confusing w.r.t to indexing/slicing, not convinced it is needed)
> p + q : -1 (confusing w.r.t to strings, not convinced it is needed)
> p / q : -0 (not convinced it is needed)
> p.join(q): -0 (confusing w.r.t strings)
> p.joinpath(q): +1 (avoids confusion, precedent, need a method
> API anyway)
> Cheers,
> Nick.
> -- 
> Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at

From mwm at  Mon Oct  8 22:05:25 2012
From: mwm at (Mike Meyer)
Date: Mon, 8 Oct 2012 15:05:25 -0500
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 8, 2012 at 2:14 PM, Stefan Krah <stefan at> wrote:
> Guido van Rossum <guido at> wrote:
>> Personally I fear '+' much more -- to me, + can be used to add an
>> extension without adding a new directory level. If we *have* to
>> overload an operator, I'd prefer p/q over p[q] any day.
> '^' or '@' are used for concatenation in some languages. At least accidental
> confusion with xor is pretty unlikely.

@? I like it (@ is used for array indexing in some languages), but
don't see a special method for it.....

Maybe you meant **?

From guido at  Mon Oct  8 22:00:45 2012
From: guido at (Guido van Rossum)
Date: Mon, 8 Oct 2012 13:00:45 -0700
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 8, 2012 at 12:20 PM, Mark Adam <dreamingforward at> wrote:
> On Sun, Oct 7, 2012 at 9:01 PM, Guido van Rossum <guido at> wrote:
>> On Sun, Oct 7, 2012 at 6:41 PM, Ben Darnell <ben at> wrote:
>>> I think there are
>>> actually two separate issues here and it's important to keep them
>>> distinct:  at a low level, there is a need for a standardized event
>>> loop, while at a higher level there is a question of what asynchronous
>>> code should look like.
>> Yes, yes. I tried to bring up this distinction. I'm glad I didn't
>> completely fail.
> Perhaps this is obvious to others, but (like hinted at above) there
> seem to be two primary issues with event handlers:
> 1) event handlers for the machine-program interface (ex. network I/O)
> 2) event handlers for the program-user interface (ex. mouse I/O)
> While similar, my gut tells me they have to be handled in completely
> different ways in order to preserve order (i.e. sanity).
> This issue, for me, has come up with wanting to make a p2p network
> application with VPython.

Interesting. I agree that these are different in nature, but I think
it would still be useful to have a single event loop ("reactor") that
can multiplex them together. I think where the paths diverge is when
it comes to the signature of the callback; for GUI events there is
certain standard structure that must be passed to the callback and
which isn't readily available when you *specify* the callback. OTOH
for your typical socket event the callback can just call the
appropriate method on the socket once it knows the socket is ready.

But still, in many cases I would like to see these all serialized in
the same thread and multiplexed according to some kind of assigned or
implied priorities, and IIRC, GUI events often are "collapsed" (e.g.
multiple redraw events for the same window, or multiple mouse motion

I also imagine the typical GUI event loop has hooks for integrating
file descriptor polling, or perhaps it gives you a file descriptor to
add to your select/poll/etc. map.

Also, doesn't the Windows IOCP unify the two?
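
A toy sketch of the idea of serializing both kinds of events in one thread, with priorities and with collapsing of repeated GUI events. ToyReactor, post() and collapse_key are invented names for illustration; this does no real I/O polling and is not a proposal for the actual API.

```python
import heapq
import itertools


class ToyReactor:
    """Single-threaded callback queue with priorities and collapsing."""

    def __init__(self):
        self._queue = []
        self._order = itertools.count()  # tie-breaker keeps FIFO order
        self._pending_keys = set()

    def post(self, priority, callback, collapse_key=None):
        # Lower priority numbers run first.
        if collapse_key is not None:
            if collapse_key in self._pending_keys:
                return  # an equivalent event is already queued
            self._pending_keys.add(collapse_key)
        entry = (priority, next(self._order), collapse_key, callback)
        heapq.heappush(self._queue, entry)

    def run(self):
        while self._queue:
            _, _, key, callback = heapq.heappop(self._queue)
            self._pending_keys.discard(key)
            callback()


log = []
reactor = ToyReactor()
reactor.post(1, lambda: log.append("redraw"), collapse_key="redraw:main")
reactor.post(1, lambda: log.append("redraw"), collapse_key="redraw:main")
reactor.post(0, lambda: log.append("socket ready"))
reactor.run()
print(log)  # ['socket ready', 'redraw']
```

The two redraw requests collapse into one callback, and the socket callback runs first because it was posted with a higher priority (a lower number).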

--Guido van Rossum (

From guido at  Mon Oct  8 22:07:37 2012
From: guido at (Guido van Rossum)
Date: Mon, 8 Oct 2012 13:07:37 -0700
Subject: [Python-ideas] Make "is" checks on non-singleton literals errors
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 8, 2012 at 12:59 PM, Masklinn <masklinn at> wrote:
> On 2012-10-08, at 21:48 , Guido van Rossum wrote:
>> On Mon, Oct 8, 2012 at 12:44 PM, Mike Graham <mikegraham at> wrote:
>>> I regularly see learners using "is" to check for string equality and
>>> sometimes other equality. Due to optimizations, they often come away
>>> thinking it worked for them.
>>> There are no cases where
>>>    if x is "foo":
>>> or
>>>   if x is 4:
>>> is actually the code someone intended to write.
>>> Although this has no benefit to anyone but new learners, it also
>>> doesn't really do any harm.
>> I think the best we can do is to make these SyntaxWarnings. I had the
>> same thought recently and I do agree that these are common beginner
>> mistakes that can easily hide bugs by succeeding in simple tests.
> How would the rather common pattern of using an `object` instance as a
> placeholder be handled? An identity test precisely expresses what is
> meant and desired in that case, while an equality test does not.

It wouldn't be affected. The warning should only be emitted if either
argument to 'is' is a literal number or string. Even if x could be an
object instance I still don't see how it would lend meaning to "if x
is 4:".

> Another one which seems to have some serious usage in the stdlib is
> type-testing (e.g., decimal or tests of exception types).
> Without type inference, I'm not too sure how that could be handled
> as syntactic warnings, and as above an identity test expresses the
> purpose of the code better than an equality one.

Looks like you're mistaking the proposal for "reject 'is' whenever
either argument is a numeric or string value". The proposal is meant
to look at the source code and only trigger if a *literal* of those
types is used.
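
The "only for literals" rule can be tried out on later interpreters: CPython 3.8 and newer emit a SyntaxWarning at compile time for exactly this case. The helper below is an illustrative sketch (warns_on_compile is a made-up name) and assumes such an interpreter.

```python
import warnings


def warns_on_compile(source):
    """Return True if compiling `source` emits a SyntaxWarning."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<sketch>", "eval")
    return any(issubclass(w.category, SyntaxWarning) for w in caught)


literal_string = warns_on_compile('x is "foo"')
literal_number = warns_on_compile("x is 4")
sentinel_name = warns_on_compile("x is sentinel")
none_check = warns_on_compile("x is None")

print(literal_string, literal_number)  # True True
print(sentinel_name, none_check)       # False False
```

The check is purely syntactic, so the placeholder-object and type-testing patterns are untouched: no literal appears on either side of 'is' there.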

--Guido van Rossum (

From masklinn at  Mon Oct  8 22:08:05 2012
From: masklinn at (Masklinn)
Date: Mon, 8 Oct 2012 22:08:05 +0200
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-08, at 20:47 , Antoine Pitrou wrote:
> - `p[q]` joins path q to path p


> - `p + q` joins path q to path p


> - `p / q` joins path q to path p

+0, looks like a unix path although others will have issues

> - `p.join(q)` joins path q to path p

+1, especially if `p.join(*q)`, strongly reminiscent of os.path.join
(which I often import "bare" in path-heavy code), I don't think the
common naming with str.join is an issue anymore than it is for

> - `p.joinpath(q)` joins path q to path p

same as `join`, although more of a +0.9 as it's longer without

From masklinn at  Mon Oct  8 22:14:52 2012
From: masklinn at (Masklinn)
Date: Mon, 8 Oct 2012 22:14:52 +0200
Subject: [Python-ideas] Make "is" checks on non-singleton literals errors
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-08, at 22:07 , Guido van Rossum wrote:

> On Mon, Oct 8, 2012 at 12:59 PM, Masklinn <masklinn at> wrote:
>> On 2012-10-08, at 21:48 , Guido van Rossum wrote:
>>> On Mon, Oct 8, 2012 at 12:44 PM, Mike Graham <mikegraham at> wrote:
>>>> I regularly see learners using "is" to check for string equality and
>>>> sometimes other equality. Due to optimizations, they often come away
>>>> thinking it worked for them.
>>>> There are no cases where
>>>>   if x is "foo":
>>>> or
>>>>  if x is 4:
>>>> is actually the code someone intended to write.
>>>> Although this has no benefit to anyone but new learners, it also
>>>> doesn't really do any harm.
>>> I think the best we can do is to make these SyntaxWarnings. I had the
>>> same thought recently and I do agree that these are common beginner
>>> mistakes that can easily hide bugs by succeeding in simple tests.
>> How would the rather common pattern of using an `object` instance as a
>> placeholder be handled? An identity test precisely expresses what is
>> meant and desired in that case, while an equality test does not.
> It wouldn't be affected. The warning should only be emitted if either
> argument to 'is' is a literal number or string. Even if x could be an
> object instance I still don't see how it would lend meaning to "if x
> is 4:".

I went from the description and missed the "literals" part of "non-singleton

Sorry about that.

From mark at  Mon Oct  8 22:20:12 2012
From: mark at (Mark Shannon)
Date: Mon, 08 Oct 2012 21:20:12 +0100
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

On 08/10/12 19:47, Antoine Pitrou wrote:
> Hello,
> Since there has been some controversy about the joining syntax used in
> PEP 428 (filesystem path objects), I would like to run an informal poll
> about it. Please answer with +1/+0/-0/-1 for each proposal:
> - `p[q]` joins path q to path p
-1 Counter intuitive
> - `p + q` joins path q to path p
-1 Confusion with strings
> - `p / q` joins path q to path p
+1 Matches (unix) file separator, no confusion with strings
> - `p.join(q)` joins path q to path p
-1 Confusion with strings again


From p.f.moore at  Mon Oct  8 22:22:30 2012
From: p.f.moore at (Paul Moore)
Date: Mon, 8 Oct 2012 21:22:30 +0100
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

On Monday, 8 October 2012, Antoine Pitrou wrote:
> - `p[q]` joins path q to path p

-1 it isn't really indexing

> - `p + q` joins path q to path p

-1 risk of ambiguity (string concatenation, e.g. it's too easy to assume
you can add an extension with p + '.txt')

> - `p / q` joins path q to path p

-0 best of the operator options

> - `p.join(q)` joins path q to path p

+0 would like it except for the risk of silent errors if p is a string

+1 I wish there was a better name, but I doubt one will appear :-(


From barry at  Mon Oct  8 22:26:49 2012
From: barry at (Barry Warsaw)
Date: Mon, 8 Oct 2012 16:26:49 -0400
Subject: [Python-ideas] Make "is" checks on non-singleton literals errors
References: <>
Message-ID: <>

On Oct 08, 2012, at 03:44 PM, Mike Graham wrote:

>I regularly see learners using "is" to check for string equality and
>sometimes other equality. Due to optimizations, they often come away
>thinking it worked for them.
>There are no cases where
>    if x is "foo":
>or
>   if x is 4:
>is actually the code someone intended to write.
>Although this has no benefit to anyone but new learners, it also
>doesn't really do any harm.

Conversely, I often see this:

    if x == None

and even

    if x == True

Okay, so maybe these are less harmful than the original complaint, but still,
yuck!


From at  Mon Oct  8 22:38:31 2012
From: at (Joshua Landau)
Date: Mon, 8 Oct 2012 21:38:31 +0100
Subject: [Python-ideas] Make "is" checks on non-singleton literals errors
In-Reply-To: <>
References: <>
Message-ID: <>

On 8 October 2012 21:26, Barry Warsaw <barry at> wrote:

> On Oct 08, 2012, at 03:44 PM, Mike Graham wrote:
> >I regularly see learners using "is" to check for string equality and
> >sometimes other equality. Due to optimizations, they often come away
> >thinking it worked for them.
> >
> >There are no cases where
> >
> >    if x is "foo":
> >
> >or
> >
> >   if x is 4:
> >
> >is actually the code someone intended to write.
> >
> >Although this has no benefit to anyone but new learners, it also
> >doesn't really do any harm.
> Conversely, I often see this:
>     if x == None
> and even
>     if x == True
> Okay, so maybe these are less harmful than the original complaint, but
> still,
> yuck!

We can't really warn against these.

>>> class EqualToTrue:
...     def __eq__(self, other):
...         return other is True
...
>>> EqualToTrue() is True
False
>>> EqualToTrue() == True
True

From ned at  Mon Oct  8 22:39:52 2012
From: ned at (Ned Batchelder)
Date: Mon, 08 Oct 2012 16:39:52 -0400
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <>

On 10/8/2012 12:25 PM, Guido van Rossum wrote:
> On Sun, Oct 7, 2012 at 7:35 PM, Ned Batchelder <ned at> wrote:
>> I don't understand the reluctance to address a common conceptual speed-bump
>> in the docs.  After all, the tutorial has an entire chapter
>> ( that explains how
>> floats work, even though they work exactly as IEEE 754 says they should.
> I'm sorry. I didn't intend to refuse to document the behavior. I was
> mostly reacting to things I thought I read between the lines -- the
> suggestion that there is no reason for the NaN behavior except silly
> compatibility with an old standard that nobody cares about. From this
> it is only a small step to reading (again between the lines) the
> suggestion to change the behavior.
>> A sentence in section 5.4 (Numeric Types) would help.  Something like, "In
>> accordance with the IEEE 754 standard, NaN's are not equal to any value,
>> even another NaN.  This is because NaN doesn't represent a particular
>> number, it represents an unknown result, and there is no way to know if one
>> unknown result is equal to another unknown result."
> That sounds like a great addition to the docs, except for the nit that
> I don't like writing the plural of NaN as "NaN's" -- I prefer "NaNs"
> myself. Also, the words here can still cause confusion. The exact
> behavior is that every one of the 6 comparison operators (==, !=, <,
> <=, >, >=) returns False when either argument (or both) is a NaN. I
> think your suggested words could lead someone to believe that they
> mean that x != NaN or NaN != NaN would return True.
> Anyway, once we can agree to words I agree that we should update that section.
How about:

"In accordance with the IEEE 754 standard, when NaNs are compared to any value, even another NaN, the result is always False, regardless of the comparison.  This is because NaN represents an unknown result.  There is no way to know the relationship between an unknown result and any other result, especially another unknown one.  Even comparing a NaN to itself always produces False."
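
Any proposed wording can be checked against the interpreter. A small sketch that applies all six comparison operators to a NaN and itself, using CPython floats; it suggests that "always False" needs an exception for !=, which returns True.

```python
import operator

nan = float("nan")

comparisons = [
    ("==", operator.eq),
    ("!=", operator.ne),
    ("<", operator.lt),
    ("<=", operator.le),
    (">", operator.gt),
    (">=", operator.ge),
]

# Compare the same NaN object with itself under each operator.
results = {symbol: compare(nan, nan) for symbol, compare in comparisons}

for symbol, _ in comparisons:
    print(symbol, results[symbol])
# Only "!=" prints True; the other five print False.
```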


From guido at  Mon Oct  8 22:47:53 2012
From: guido at (Guido van Rossum)
Date: Mon, 8 Oct 2012 13:47:53 -0700
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 8, 2012 at 1:39 PM, Ned Batchelder <ned at> wrote:
> On 10/8/2012 12:25 PM, Guido van Rossum wrote:
>> Anyway, once we can agree to words I agree that we should update that
>> section.
> How about:
> "In accordance with the IEEE 754 standard, when NaNs are compared to any
> value, even another NaN, the result is always False, regardless of the
> comparison.  This is because NaN represents an unknown result.  There is no
> way to know the relationship between an unknown result and any other result,
> especially another unknown one.  Even comparing a NaN to itself always
> produces False."

Sounds good. (But now maybe we also need to come clean with the
exceptions for NaNs compared as part of container comparisons?)
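
For reference while settling the wording, here is a minimal sketch of what
current CPython does with a float NaN, covering both the bare operators and
the container case mentioned above. The variable names are only
illustrative. Note that != appears to be the odd one out: it returns True
for a NaN operand, which any final text would need to account for.

```python
import math

nan = float("nan")

# Five of the six comparison operators return False when an operand is NaN...
ordered = [nan == nan, nan < nan, nan <= nan, nan > nan, nan >= nan]
# ...but != returns True.
not_equal = nan != nan

# Containers check identity before equality, so the *same* NaN object
# compares equal to itself when it sits inside a list.
same_object_in_list = [nan] == [nan]
distinct_objects_in_list = [float("nan")] == [float("nan")]
membership = nan in [nan]

print(ordered, not_equal)
print(same_object_in_list, distinct_objects_in_list, membership)
print(math.isnan(nan))  # the reliable way to test for a NaN
```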

--Guido van Rossum (

From tjreedy at  Mon Oct  8 22:51:14 2012
From: tjreedy at (Terry Reedy)
Date: Mon, 08 Oct 2012 16:51:14 -0400
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <k4vech$7t1$>

On 10/8/2012 12:19 PM, Guido van Rossum wrote:

> I am not aware of an update to the standard. Being 20 years old does
> not make it outdated.

Similarly, being hundreds or thousands of years old does not make the 
equality standard, which includes reflexivity of equality, outdated. The 
IEEE standard violated that older standard.
illustrates some of the problems that come with that violation. But
given the compromise made to maintain sane behavior of Python's 
collection classes, I see little reason to change nan in isolation.

I wonder if it would be helpful to make a NaN subclass of floats with 
its own arithmetic and comparison methods. This would clearly mark a nan 
as Not a Normal float. Since subclasses rule (at least some) binary 
operations*, this might also simplify normal float code. But perhaps 
this was considered and rejected before adding math.isnan in 2.6. (And 
ditto for infinities.)

* in that class_ob op subclass_ob is delegated to subclass.__op__, but I 
am not sure if this applies only to arithmetic, comparisons, or both.
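
On the footnote's open question: as far as I can tell, the priority given to
a right-hand operand whose type is a subclass of the left-hand operand's
type applies to both arithmetic and rich comparisons, provided the subclass
overrides the relevant method. A small sketch, where MarkedFloat is a
made-up class that exists only to show which method gets called:

```python
class MarkedFloat(float):
    """Hypothetical float subclass used only to show dispatch order."""

    def __radd__(self, other):
        return "MarkedFloat.__radd__"

    def __eq__(self, other):
        return "MarkedFloat.__eq__"

    # Defining __eq__ sets __hash__ to None; restore it so instances stay hashable.
    __hash__ = float.__hash__


plain = 1.0
marked = MarkedFloat(2.0)

arithmetic_result = plain + marked    # float on the left, subclass on the right
comparison_result = plain == marked   # same layout for a comparison
print(arithmetic_result, comparison_result)
```

In both cases the subclass method runs before float gets a chance, so a NaN
subclass could in principle own its comparisons even when it appears on the
right-hand side.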

Terry Jan Reedy

From solipsis at  Mon Oct  8 22:50:47 2012
From: solipsis at (Antoine Pitrou)
Date: Mon, 8 Oct 2012 22:50:47 +0200
Subject: [Python-ideas] PEP 428: poll about the joining syntax
References: <>
Message-ID: <>

I'm forwarding Barry's answer:

-------- Forwarded message --------
From: Barry Warsaw <barry at>
To: Antoine Pitrou
<public-solipsis-xNDA5Wrcr86sTnJN9+BGXg at>
Subject: Re: PEP 428: poll about the joining syntax
Date: Mon, 8 Oct 2012 15:17:01 -0400

Like a good American low-information voter, I'll cast my ballot without
having read PEP 428.

On Oct 08, 2012, at 08:47 PM, Antoine Pitrou wrote:

>- `p[q]` joins path q to path p

-1 Definitely not intuitive.

>- `p + q` joins path q to path p

+0.  IMHO, the most intuitive, but causes problems when you just want
to tack on an extension, er, suffix.  I guess if PathObj + str works
it's not so bad.

>- `p / q` joins path q to path p

+0.  Cute!  Too *nix centric?

>- `p.join(q)` joins path q to path p

-0.  Explicit (yay), but a bit verbose (boo).  Maybe this should be the
default underlying API, with one of the above as nice syntactic sugar?


From storchaka at  Mon Oct  8 23:02:36 2012
From: storchaka at (Serhiy Storchaka)
Date: Tue, 09 Oct 2012 00:02:36 +0300
Subject: [Python-ideas] Propagating StopIteration value
In-Reply-To: <>
References: <k4q38d$j8e$> <>
Message-ID: <k4vf1f$d72$>

On 07.10.12 23:19, Guido van Rossum wrote:
> If this is just about iterator.chain() I may see some value in it (but
> TBH the discussion so far mostly confuses -- please spend some more
> time coming up with good examples that show actually useful use cases
> rather than f() and g() or foo() and bar())

I was not the first one to show an example with f() and g(). ;)  I
only showed that it was a wrong analogy.

Yes, first of all I am thinking about itertools.chain(). But then I found
all the other iterator tools that could also be extended to support
generators better. Perhaps.

I have only one imperfect example of using StopIteration's value from a
generator (my patch for issue16009). It is difficult to find examples
for a feature that appeared only recently. But I think I can find them
before the 3.4 feature freeze.

>   OTOH yield from is not primarily for iterators -- it is for
> coroutines. I suspect most of the itertools functionality just doesn't
> work with coroutines.

Indeed. But they work with a subset of generators, and this subset can be
extended. Please look at (Implement 
generator interface in itertools.chain). Does it make sense?

From storchaka at  Mon Oct  8 23:06:49 2012
From: storchaka at (Serhiy Storchaka)
Date: Tue, 09 Oct 2012 00:06:49 +0300
Subject: [Python-ideas] Propagating StopIteration value
In-Reply-To: <>
References: <k4q38d$j8e$> <>
Message-ID: <k4vf9c$fcp$>

On 08.10.12 01:43, Oscar Benjamin wrote:
> Hopefully, I've understood Serhiy and the docs correctly (I don't have
> access to Python 3.3 right now to test any of this).

Thank you for the explanation and example, Oscar.

From storchaka at  Mon Oct  8 23:12:18 2012
From: storchaka at (Serhiy Storchaka)
Date: Tue, 09 Oct 2012 00:12:18 +0300
Subject: [Python-ideas] Propagating StopIteration value
In-Reply-To: <k4tefg$52k$>
References: <k4q38d$j8e$> <>
	<> <k4tefg$52k$>
Message-ID: <k4vfjl$i24$>

On 08.10.12 05:40, Terry Reedy wrote:
> Serhiy, if you want a module of *generator* specific functions
> ('gentools' ?), you should write one and submit it to pypi for testing.

In there is a proposal to extend
itertools.chain to support generators (send(), throw() and close()
methods). Is it wrong?

From shibturn at  Mon Oct  8 23:17:07 2012
From: shibturn at (Richard Oudkerk)
Date: Mon, 08 Oct 2012 22:17:07 +0100
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <k4vfsk$enj$>

On 08/10/2012 9:22pm, Paul Moore wrote:
>   p.joinpath(q)
> +1 I wish there was a better name, but I doubt one will appear :-(

I would go for

     p.add(q)

which at least has the virtue of brevity.


From tjreedy at  Mon Oct  8 23:17:56 2012
From: tjreedy at (Terry Reedy)
Date: Mon, 08 Oct 2012 17:17:56 -0400
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <k4vfuj$ku5$>

On 10/8/2012 12:47 PM, Guido van Rossum wrote:

> this as well. Also, IIUC the IEEE library prescribes exceptions as
> well as return values; e.g. "man 3 log" on my OSX computer says that
> log(0) returns -inf as well as raising a divide-by-zero exception. So I
> think this is probably compliant with the standard -- one can decide
> to ignore the exceptions in certain contexts and honor them in others.
> (Probably even the 1/0 behavior can be defended this way.)

I agree. In C, as I remember, a function can both (passively) 'raise an 
exception' by setting errno *and* return a value. This requires the 
programmer to check for an exception, and forgetting to do so is a 
common bug. In Python, raising an exception actively aborts returning a 
value, so you had to choose one of the two behaviors.

>> Some other operations behave inconsistently:
>> >>> 2 * 10.**308
>> inf
>> but
>> >>> 10.**309
>> Traceback (most recent call last):
>>    File "<stdin>", line 1, in <module>
>> OverflowError: (34, 'Result too large')
> Probably the same. IEEE 754 may be more complex than you think!

Or this might be an accidental inconsistency, in that float 
multiplication was changed to return inf but pow was not. But I would be 
reluctant to fiddle with such details now.
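
For anyone who wants to reproduce the inconsistency, a short sketch follows.
It only restates the quoted session in script form; the exact error message
attached to the OverflowError is platform dependent, so it is not checked.

```python
import math

big = 10.0 ** 308

# Multiplication that overflows quietly returns infinity...
product = 2 * big

# ...while exponentiation that overflows raises instead.
base, exponent = 10.0, 309
try:
    power = base ** exponent
except OverflowError as exc:
    power = exc

print(product, repr(power))
```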

Alexander, while I might have chosen to make nan == nan True, I consider 
it a near tossup with no happy resolution and would not change it now. 
Guido's explanation is pretty clear: he went with the IEEE standard as 
interpreted for Python by Tim Peters.

Terry Jan Reedy

From storchaka at  Mon Oct  8 23:22:57 2012
From: storchaka at (Serhiy Storchaka)
Date: Tue, 09 Oct 2012 00:22:57 +0300
Subject: [Python-ideas] Propagating StopIteration value
In-Reply-To: <k4skie$d5d$>
References: <k4q38d$j8e$> <>
	<k4sjrt$7hh$> <k4skie$d5d$>
Message-ID: <k4vg7j$ni2$>

On 07.10.12 22:18, Richard Oudkerk wrote:
> That means that all but the last return value is ignored.  Why is the
> last return value any more important than the earlier ones?

Because I think the last return value is more useful for the idiom

   lookahead = next(iterator)
   iterator = itertools.chain([lookahead], iterator)

> ISTM it would make just as much sense to do
>    def chain(*iterables):
>        values = []
>        for it in iterables:
>            values.append(yield from it)
>        return values

It changes the behavior for iterators. And it is now more difficult to get a
generator which yields and returns the same values as the original. We
need yet another wrapper.

   def lastvalue(generator):
       return (yield from generator)[-1]

   iterator = lastvalue(itertools.chain([lookahead], iterator))

Yes, it can work.
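
To make the comparison concrete, here is a sketch of the list-returning
chain from the quoted message together with the lastvalue wrapper. It is a
pure-Python generator, not itertools.chain itself (which does not propagate
return values); chain_returning_all, numbered and drain are names invented
for this illustration.

```python
def chain_returning_all(*iterables):
    """Yield from each iterable in turn and return all their return values."""
    values = []
    for it in iterables:
        # For a plain iterable such as a list, 'yield from' evaluates to None.
        values.append((yield from it))
    return values


def lastvalue(generator):
    """Re-yield everything, but return only the last collected value."""
    return (yield from generator)[-1]


def numbered(n, result):
    """Example generator: yields 0..n-1, then returns 'result'."""
    yield from range(n)
    return result


def drain(gen):
    """Collect yielded items and the value attached to StopIteration."""
    items = []
    while True:
        try:
            items.append(next(gen))
        except StopIteration as stop:
            return items, stop.value


iterator = numbered(3, "done")
lookahead = next(iterator)
rebuilt = lastvalue(chain_returning_all([lookahead], iterator))
items, result = drain(rebuilt)
print(items, result)
```

The rebuilt generator yields the same items as the original and returns the
same value, at the cost of the extra wrapper.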

From grosser.meister.morti at  Mon Oct  8 23:39:22 2012
From: grosser.meister.morti at (=?ISO-8859-1?Q?Mathias_Panzenb=F6ck?=)
Date: Mon, 08 Oct 2012 23:39:22 +0200
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

`p[q]`           0
`p + q`         -1
`p / q`         +0
`p.join(q)`     +1
`p.pathjoin(q)` +0

Where .join/.pathjoin shall take argument lists. The arguments may be path objects or strings.

Example usage (where filename is a string):

 >>> prefix.join(some,path,components,filename+".txt")

I'm against + because how would you do the example above? Because this:

 >>> prefix + some + path + components + filename + ".txt"

would do something different than this:

 >>> prefix + some + path + components + (filename + ".txt")

Which might surprise a user and is in any case confusing.
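
A toy model of the problem, in case it helps. ToyPath is a hypothetical
class written for this message, not anything from PEP 428; it simply treats
+ as "append a component".

```python
class ToyPath:
    """Hypothetical path type where + means 'join as a new component'."""

    def __init__(self, *parts):
        self.parts = tuple(parts)

    def __add__(self, other):
        other_parts = other.parts if isinstance(other, ToyPath) else (other,)
        return ToyPath(*self.parts, *other_parts)

    def __str__(self):
        return "/".join(self.parts)


prefix = ToyPath("usr", "share")
filename = "notes"

# Evaluated left to right: ".txt" ends up as a separate path component.
as_component = prefix + filename + ".txt"

# With parentheses, str + str runs first and ".txt" becomes a suffix.
as_suffix = prefix + (filename + ".txt")

print(as_component, as_suffix)
```

The two expressions differ only in grouping, yet one produces a stray
".txt" component and the other produces the intended file name.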

From storchaka at  Tue Oct  9 00:00:43 2012
From: storchaka at (Serhiy Storchaka)
Date: Tue, 09 Oct 2012 01:00:43 +0300
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <k4vied$amr$>

On 08.10.12 21:47, Antoine Pitrou wrote:
> Since there has been some controversy about the joining syntax used in
> PEP 428 (filesystem path objects), I would like to run an informal poll
> about it. Please answer with +1/+0/-0/-1 for each proposal:

Of course I have no right to vote, but because the poll is informal, I 
give my humble opinion.

> - `p[q]` joins path q to path p

-1. Counterintuitive, and indexing can be used for path splitting.

> - `p + q` joins path q to path p

-1. Confusion with strings. path + str can be used for suffix appending.

> - `p / q` joins path q to path p

+1. Intuitive. No risk of conflicts.

> - `p.join(q)` joins path q to path p

-0. A bit of confusion with strings. -0.1 verbose. +0.1 can have many
arguments. +0.1 similar to os.path.join. -0.1 but has slightly
different semantics.

> - `p.pathjoin(q)` joins path q to path p

+0. Same as `p.join(q)`, but more verbose (-) and less confusion (+).

From rosuav at  Tue Oct  9 00:02:32 2012
From: rosuav at (Chris Angelico)
Date: Tue, 9 Oct 2012 09:02:32 +1100
Subject: [Python-ideas] Make "is" checks on non-singleton literals errors
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 9, 2012 at 6:44 AM, Mike Graham <mikegraham at> wrote:
> There are no cases where
>     if x is "foo":
> is actually the code someone intended to write.

Are literals guaranteed to be interned? If so, this code would make
sense, if the programmer knows that x is itself an interned string.

Although I guess a warning wouldn't be a problem there, as they're
easily ignored/suppressed.


From greg.ewing at  Tue Oct  9 00:11:26 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 09 Oct 2012 11:11:26 +1300
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

Nick Coghlan wrote:
> I'm not 100% sold on "subpath" as an alternative

I don't much like the term "subpath" at all. To me it suggests
extracting components out of the path somehow, rather than
adding them on.


From greg.ewing at  Tue Oct  9 00:15:26 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 09 Oct 2012 11:15:26 +1300
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

Paul Moore wrote:
> "only one way to do
> it" and the general controversy over which is the best operator to
> use, suggests that leaving the operator form out altogether at least
> in the initial implementation is the better option.

Although if we start with a method, it will be impossible
to add an operator later without there then being two ways
to do it.


From greg.ewing at  Tue Oct  9 00:18:54 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 09 Oct 2012 11:18:54 +1300
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<> <>
Message-ID: <>

Steven D'Aprano wrote:

> But it's just a name. __add__ doesn't necessarily perform addition,
> __sub__ doesn't necessarily perform subtraction, and __or__ doesn't
> necessarily have anything to do with either bitwise or boolean OR.

Maybe they should have been called __plus__, __dash__, __star__,
__slash__ etc., then we wouldn't keep having this argument...


From greg.ewing at  Tue Oct  9 00:23:53 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 09 Oct 2012 11:23:53 +1300
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

Ronald Oussoren wrote:
> neither statvs, statvfs,  nor pathconf seem to be able to tell if a filesystem is case insensitive.

Even if they could, you wouldn't be entirely out of the woods,
because different parts of the same path can be on different
file systems...

But how important is all this anyway? I'm trying to think of
occasions when I've wanted to compare two entire paths for
equality, and I can't think of *any*.


From guido at  Tue Oct  9 00:26:57 2012
From: guido at (Guido van Rossum)
Date: Mon, 8 Oct 2012 15:26:57 -0700
Subject: [Python-ideas] Make "is" checks on non-singleton literals errors
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 8, 2012 at 3:02 PM, Chris Angelico <rosuav at> wrote:
> On Tue, Oct 9, 2012 at 6:44 AM, Mike Graham <mikegraham at> wrote:
>> There are no cases where
>>     if x is "foo":
>> is actually the code someone intended to write.
> Are literals guaranteed to be interned? If so, this code would make
> sense, if the programmer knows that x is itself an interned string.

No, interning is not guaranteed.
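
A small sketch of the distinction, assuming nothing beyond the standard
library. The identity result for the runtime-built string is deliberately
left unasserted, since it is an implementation detail.

```python
import sys

literal = "foo"
built_at_runtime = "".join(["f", "o", "o"])

# Equality is what the language guarantees for equal strings.
equal = literal == built_at_runtime

# Identity between the two is an implementation detail and may go either way.
identical = literal is built_at_runtime

# Explicit interning is the documented way to get a shared object.
interned_identical = sys.intern(literal) is sys.intern(built_at_runtime)

print(equal, identical, interned_identical)
```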

> Although I guess a warning wouldn't be a problem there, as they're
> easily ignored/suppressed.

--Guido van Rossum (

From storchaka at  Tue Oct  9 00:42:24 2012
From: storchaka at (Serhiy Storchaka)
Date: Tue, 09 Oct 2012 01:42:24 +0300
Subject: [Python-ideas] Make "is" checks on non-singleton literals errors
In-Reply-To: <>
References: <>
Message-ID: <k4vksj$ukt$>

On 08.10.12 22:44, Mike Graham wrote:
> There are no cases where
>      if x is "foo":

I see such code in docutils (Doc/tools/docutils/writers/latex2e/

> or
>     if x is 4:

and in tests (Lib/test/, Lib/test/, 
Lib/test/, Lib/test/

From guido at  Tue Oct  9 00:44:12 2012
From: guido at (Guido van Rossum)
Date: Mon, 8 Oct 2012 15:44:12 -0700
Subject: [Python-ideas] Propagating StopIteration value
In-Reply-To: <k4vf1f$d72$>
References: <k4q38d$j8e$> <>
Message-ID: <>

On Mon, Oct 8, 2012 at 2:02 PM, Serhiy Storchaka <storchaka at> wrote:
> On 07.10.12 23:19, Guido van Rossum wrote:
>> If this is just about iterator.chain() I may see some value in it (but
>> TBH the discussion so far mostly confuses -- please spend some more
>> time coming up with good examples that show actually useful use cases
>> rather than f() and g() or foo() and bar())
> I was not the first one to show an example with f() and g(). ;)  I only
> showed that it was a wrong analogy.
> Yes, first of all I am thinking about itertools.chain(). But then I found
> all the other iterator tools that could also be extended to support
> generators better. Perhaps.
> I have only one imperfect example of using StopIteration's value from a
> generator (my patch for issue16009).

I don't understand that code at all, and it seems to be undocumented
(no docstrings, no mention in the external docs). Why is it using
StopIteration at all? There isn't an iterator or generator in sight.
AFAICT it should just use a different exception. But even if you did
use StopIteration -- why would you care about itertools here? AFAICT
it's just being used as a private communication channel between
scan_once() and its caller. Where is the possibility to wrap anything
in itertools at all?

> It is difficult to find examples for
> feature, which appeared only recently. But I think I can find them before
> 3.4 feature freezing.

I think you're going at this from the wrong direction. You shouldn't
be using this feature in circumstances where you're at all likely to
run into this "problem".

>>   OTOH yield from is not primarily for iterators -- it is for
>> coroutines. I suspect most of the itertools functionality just doesn't
>> work with coroutines.
> Indeed. But they work with subset of generators, and this subset can be
> extended. Please look at (Implement
> generator interface in itertools.chain). Does it make sense?

But that just seems to perpetuate the idea that you have, which IMO is
wrongheaded. Itertools is for iterators, and all the extra generator
features make no sense for it.

--Guido van Rossum (

From Andy.Henshaw at  Tue Oct  9 00:32:06 2012
From: Andy.Henshaw at (Henshaw, Andy)
Date: Mon, 8 Oct 2012 22:32:06 +0000
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <k4vfsk$enj$>
References: <>
Message-ID: <A9B1577E2DCC7545A1018EC14CC4B8BE67780A66@apatlisdmbx01>

On 08/10/2012 9:22pm, Paul Moore wrote:
>   p.joinpath(q)
> +1 I wish there was a better name, but I doubt one will appear :-(

How about

I can imagine getting "joinpath" wrong 50% of the time by typing "pathjoin".

From greg.ewing at  Tue Oct  9 00:47:55 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 09 Oct 2012 11:47:55 +1300
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<> <>
Message-ID: <>

Andrew McNabb wrote:
> Since this really is a matter of personal taste, I'll end my
> participation in this discussion by voicing support for Nick Coghlan's
> suggestion of a `join` method, whether it's named `join` or `append` or
> something else.

I'd prefer 'append', because

    path.append("somedir", "file.txt")

is pretty self-explanatory, whereas

    path.join("somedir", "path.txt")

looks confusingly similar to

    s.join("somedir", "path.txt")

where s is a string, but has very different semantics.
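
The difference in semantics can be seen with what the standard library
already has. This sketch uses posixpath.join as a stand-in for the proposed
path method, which is an assumption made purely for illustration.

```python
import posixpath

s = "-"

# str.join: the receiver is the separator, and the single argument is an
# iterable of pieces to glue together.
glued = s.join(["somedir", "path.txt"])

# Path-style join: the first argument is the starting point, and the
# remaining arguments are appended as components.
joined = posixpath.join("base", "somedir", "path.txt")

# Passing two positional arguments to str.join is an error.
two_arg_error = None
try:
    s.join("somedir", "path.txt")
except TypeError as exc:
    two_arg_error = exc

print(glued, joined, type(two_arg_error).__name__)
```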


From massimo.dipierro at  Tue Oct  9 00:54:04 2012
From: massimo.dipierro at (Massimo DiPierro)
Date: Mon, 8 Oct 2012 17:54:04 -0500
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>


On Oct 8, 2012, at 5:47 PM, Greg Ewing wrote:

> Andrew McNabb wrote:
>> Since this really is a matter of personal taste, I'll end my
>> participation in this discussion by voicing support for Nick Coghlan's
>> suggestion of a `join` method, whether it's named `join` or `append` or
>> something else.
> I'd prefer 'append', because
>   path.append("somedir", "file.txt")
> is pretty self-explanatory, whereas
>   path.join("somedir", "path.txt")
> looks confusingly similar to
>   s.join("somedir", "path.txt")
> where s is a string, but has very different semantics.
> -- 
> Greg

From bauertomer at  Tue Oct  9 01:02:05 2012
From: bauertomer at (T.B.)
Date: Tue, 09 Oct 2012 01:02:05 +0200
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <k4vfsk$enj$>
References: <>
Message-ID: <>

On 2012-10-08 23:17, Richard Oudkerk wrote:
> On 08/10/2012 9:22pm, Paul Moore wrote:
>>   p.joinpath(q)
>> +1 I wish there was a better name, but I doubt one will appear :-(
> I would go for
>      p.add(q)
I like the short 'add'. A small problem I see with 'add' (and with 
'append') is that the outcome of adding (or appending) an absolute path 
is too surprising, unlike with the 'join' or 'joinpath' names.

Also, How would we add an extension to a path (without turning it into a 
str first)? Will there be a method called addext() or addsuffix() as the 
.ext/.suffix property is immutable? The suggestions I saw in the thread 
so far targeted substituting the extension, not adding.

Regarding '/', I would like to mention Scapy [1], the packet 
manipulation program. From its documentation: "The / operator has been 
used as a composition operator between two layers". The '/' feels 
natural to use with Scapy. An example from the docs:
> Let's say I want a broadcast MAC address, and IP payload to and to, TTL value from 1 to 9, and an UDP payload:
> >>> Ether(dst="ff:ff:ff:ff:ff:ff")/IP(dst=["",""],ttl=(1,9))/UDP()



From guido at  Tue Oct  9 01:02:27 2012
From: guido at (Guido van Rossum)
Date: Mon, 8 Oct 2012 16:02:27 -0700
Subject: [Python-ideas] Make "is" checks on non-singleton literals errors
In-Reply-To: <k4vksj$ukt$>
References: <>
Message-ID: <>

On Mon, Oct 8, 2012 at 3:42 PM, Serhiy Storchaka <storchaka at> wrote:
> On 08.10.12 22:44, Mike Graham wrote:
>> There are no cases where
>>      if x is "foo":
> I see such code in docutils (Doc/tools/docutils/writers/latex2e/

And that's probably a bug.

>> or
>>     if x is 4:
> and in tests (Lib/test/, Lib/test/,
> Lib/test/, Lib/test/

The tests are easily rewritten using

four = 4
if x is four:

--Guido van Rossum (

From mikegraham at  Tue Oct  9 01:05:25 2012
From: mikegraham at (Mike Graham)
Date: Mon, 8 Oct 2012 19:05:25 -0400
Subject: [Python-ideas] Make "is" checks on non-singleton literals errors
In-Reply-To: <k4vksj$ukt$>
References: <>
Message-ID: <>

On Mon, Oct 8, 2012 at 6:42 PM, Serhiy Storchaka <storchaka at> wrote:
> On 08.10.12 22:44, Mike Graham wrote:
>> There are no cases where
>>      if x is "foo":
> I see such code in docutils (Doc/tools/docutils/writers/latex2e/

Thanks for finding these!

I can't find this in a couple versions of Python I checked. If this
code is still around, it sounds like it has a bug and should be fixed.

>> or
>>     if x is 4:
> and in tests (Lib/test/, Lib/test/,
> Lib/test/, Lib/test/ is correct, but trivially so. It merely ensures that
`1 is 1` and `1 is not 1` are proper Python syntax. As we're talking
about tweaking Python's syntax rules, obviously code that tests that
the grammar is the current thing would use the current thing. and are valid but unique, in that they rely
on the behavior that no other code should implicitly rely on to test
an implementation detail has an `is 0` check and an `is ""` check. Both should be fixed.

Thanks again,

From ryan at  Tue Oct  9 01:06:25 2012
From: ryan at (Ryan D Hiebert)
Date: Mon, 8 Oct 2012 16:06:25 -0700
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On Oct 8, 2012, at 3:47 PM, Greg Ewing <greg.ewing at> wrote:
> I'd prefer 'append', because
>   path.append("somedir", "file.txt")


In so many ways, I see a path as a list of its components. Because of that, path.append and path.extend, with similar semantics to list.append and list.extend, makes a lot of sense to me.

When I think about a path as a list of components rather than as a string, the '+' operator starts to make sense for joins as well. I'm OK with using the '/' for path joining as well, because the parallel with list doesn't fit in this case, although I understand Massimo's objection to it. In very many ways, I like thinking of a path as a list (slicing, append, etc).

The fact that list.append doesn't return the new list has always bugged me, but if we were to use append and extend, they should mirror the semantics from list.

I'm much more inclined to think of path as a special list than as a special string.


From phd at  Tue Oct  9 01:11:38 2012
From: phd at (Oleg Broytman)
Date: Tue, 9 Oct 2012 03:11:38 +0400
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
	<k4vfsk$enj$> <>
Message-ID: <>

On Tue, Oct 09, 2012 at 01:02:05AM +0200, "T.B." <bauertomer at> wrote:
> Regarding '/', I would like to mention Scapy [1], the packet
> manipulation program. From its documentation: "The / operator has
> been used as a composition operator between two layers". The '/'
> feels natural to use with Scapy. An example from the docs:
> >Let's say I want a broadcast MAC address, and IP payload to and to, TTL value from 1 to 9, and an UDP payload:
> >
> >>>> Ether(dst="ff:ff:ff:ff:ff:ff")/IP(dst=["",""],ttl=(1,9))/UDP()

   Except that the layers are divided (pun intended) in the wrong order. It
seems that Ether is at the top, whereas the traditional stack order from
Ether to UDP is from bottom to top.

     Oleg Broytman              phd at
           Programmers don't die, they just GOSUB without RETURN.

From ryan at  Tue Oct  9 01:19:00 2012
From: ryan at (Ryan D Hiebert)
Date: Mon, 8 Oct 2012 16:19:00 -0700
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

On Oct 8, 2012, at 11:47 AM, Antoine Pitrou <solipsis at> wrote:
> - `p[q]` joins path q to path p
> - `p + q` joins path q to path p
> - `p / q` joins path q to path p
> - `p.join(q)` joins path q to path p
-1, but +1 to p.append(q)

If we want a p.pathjoin method, it would make sense to me for it to work similarly to urllib.parse.urljoin, i.e., if the joined path is absolute, have it replace the path, except possibly for the drive on Windows.
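
For comparison, here is roughly how the two existing functions treat an
absolute second argument. The URL is a placeholder, and posixpath.join is
used instead of os.path.join only so the result does not depend on the
platform.

```python
import posixpath
from urllib.parse import urljoin

base_url = "http://example.com/a/b"

# A relative reference replaces the last segment of the base URL...
relative_url = urljoin(base_url, "c")
# ...and an absolute one replaces the whole path, keeping scheme and host.
absolute_url = urljoin(base_url, "/c")

# posixpath.join appends a relative component...
relative_path = posixpath.join("/usr/lib", "python")
# ...and discards everything before an absolute component.
absolute_path = posixpath.join("/usr/lib", "/etc")

print(relative_url, absolute_url)
print(relative_path, absolute_path)
```

Both already follow the "absolute argument wins" rule, although they differ
in what a relative argument does to the last segment of the base.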

I like to follow any parallels to list that make sense.


From oscar.j.benjamin at  Tue Oct  9 01:24:47 2012
From: oscar.j.benjamin at (Oscar Benjamin)
Date: Tue, 9 Oct 2012 00:24:47 +0100
Subject: [Python-ideas] Propagating StopIteration value
In-Reply-To: <>
References: <k4q38d$j8e$> <>
Message-ID: <>

On 8 October 2012 00:36, Guido van Rossum <guido at> wrote:
> On Sun, Oct 7, 2012 at 3:43 PM, Oscar Benjamin
> <oscar.j.benjamin at> wrote:
>> I think what Serhiy is saying is that although pep 380 mainly
>> discusses generator functions it has effectively changed the
>> definition of what it means to be an iterator for all iterators:
>> previously an iterator was just something that yielded values but now
>> it also returns a value. Since the meaning of an iterator has changed,
>> functions that work with iterators need to be updated.
> I think there are different philosophical viewpoints possible on that
> issue. My own perspective is that there is no change in the definition
> of iterator -- only in the definition of generator. Note that the
> *ability* to attach a value to StopIteration is not new at all.

I guess I'm viewing it from the perspective that an ordinary iterator
is simply an iterator that happens to return None just like a function
that doesn't bother to return anything. If I understand correctly,
though, it is possible for any iterator to return a value that yield
from would propagate, so the feature (returning a value) is not
specific to generators.
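
A minimal sketch of what I mean, using a hand-written iterator class rather
than a generator. CountDown and report are invented names, and I have not
checked this against anything beyond my reading of PEP 380.

```python
class CountDown:
    """Plain iterator class (not a generator) that 'returns' a value."""

    def __init__(self, start):
        self.remaining = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.remaining == 0:
            # The value attached here is what 'yield from' evaluates to.
            raise StopIteration("liftoff")
        self.remaining -= 1
        return self.remaining + 1


def report(iterator):
    """Re-yield every item, then yield the iterator's return value."""
    result = yield from iterator
    yield result


output = list(report(CountDown(2)))
print(output)
```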

>> This feature was new in Python 3.3 which was released a week ago
> It's been in alpha/beta/candidate for a long time, and PEP 380 was
> first discussed in 2009.
>> so it is not widely used but it has uses that are not anything to do with
>> coroutines.
> Yes, as a shortcut for "for x in <iterator>: yield x". Note that the
> for-loop ignores the value in the StopIteration -- would you want to
> change that too?

Not really. I thought about how it could be changed. Once APIs are
available that use this feature to communicate important information,
use cases will arise for using the same APIs outside of a coroutine
context. I'm not really sure how you could get the value from a for
loop. I guess it would have to be tied to the else clause in some way.
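
Without any language change, one possible workaround is a small wrapper
object that remembers the return value while the for loop runs.
ReturnCapture and numbered_lines are hypothetical names for this sketch.

```python
class ReturnCapture:
    """Wrap a generator so a for loop can read its return value afterwards."""

    def __init__(self, generator):
        self.generator = generator
        self.value = None

    def __iter__(self):
        # 'yield from' re-yields every item and hands back the return value.
        self.value = yield from self.generator


def numbered_lines():
    yield "first"
    yield "second"
    return 2  # e.g. the number of lines produced


capture = ReturnCapture(numbered_lines())
seen = []
for line in capture:
    seen.append(line)

print(seen, capture.value)
```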

>> As an example of how you could use it, consider parsing a
>> file that can contains #include statements. When the #include
>> statement is encountered we need to insert the contents of the
>> included file. This is easy to do with a recursive generator. The
>> example uses the return value of the generator to keep track of which
>> line is being parsed in relation to the flattened output file:
>> def parse(filename, output_lineno=0):
>>     with open(filename) as fin:
>>         for input_lineno, line in enumerate(fin):
>>             if line.startswith('#include '):
>>                 subfilename = line.split()[1]
>>                 output_lineno = yield from parse(subfilename, output_lineno)
>>             else:
>>                 try:
>>                     yield parse_line(line)
>>                 except ParseLineError:
>>                     raise ParseError(filename, input_lineno, output_lineno)
>>                 output_lineno += 1
>>     return output_lineno
> Hm. This example looks constructed to prove your point... It would be
> easier to count the output lines in the caller. Or you could use a
> class to hold that state. I think it's just a bad habit to start using
> the return value for this purpose. Please use the same approach as you
> would before 3.3, using "yield from" just as the shortcut I mentioned
> above.

I'll admit that the example is contrived but it's to think about how
to use the new feature rather than to prove a point (Otherwise I would
have contrived a reason for wanting to use filter()). I just wanted to
demonstrate that people can (and will) use this outside of a coroutine
context.

Also I envisage something like this being a common use case. The
'yield from' expression can only provide information to its immediate
caller by returning a value attached to StopIteration or by raising a
different type of exception. There will be many cases where people
want to get some information about what was yielded/done by 'yield
from' at the point where it is used.


From ericsnowcurrently at  Tue Oct  9 01:31:10 2012
From: ericsnowcurrently at (Eric Snow)
Date: Mon, 8 Oct 2012 17:31:10 -0600
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 8, 2012 at 1:24 PM, Nick Coghlan <ncoghlan at> wrote:
> However, I'm
> also a big fan of starting with a minimalist core and growing it.
> Moving from "os.path.join(a, b, c, d, e)" (or, the way I often write
> it, "joinpath(a, b, c, d, e)") to "a.joinpath(b, c, d, e)" at least
> isn't going backwards, and is more obvious in isolation than "a / b /
> c / d / e".


From ericsnowcurrently at  Tue Oct  9 01:35:52 2012
From: ericsnowcurrently at (Eric Snow)
Date: Mon, 8 Oct 2012 17:35:52 -0600
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 8, 2012 at 12:47 PM, Antoine Pitrou <solipsis at> wrote:
> - `p[q]` joins path q to path p
> - `p + q` joins path q to path p
> - `p / q` joins path q to path p
> - `p.join(q)` joins path q to path p
+1 (with a different name)

I've found Nick's argument against operators-from-day-1 to be
convincing, as well as his argument against join() or any other name
already provided by string/sequence APIs.


From oscar.j.benjamin at  Tue Oct  9 01:45:00 2012
From: oscar.j.benjamin at (Oscar Benjamin)
Date: Tue, 9 Oct 2012 00:45:00 +0100
Subject: [Python-ideas] Propagating StopIteration value
In-Reply-To: <k4tefg$52k$>
References: <k4q38d$j8e$> <>
	<> <k4tefg$52k$>
Message-ID: <>

On 8 October 2012 03:40, Terry Reedy <tjreedy at> wrote:
> On 10/7/2012 7:30 PM, Greg Ewing wrote:
>> Oscar Benjamin wrote:
>>> Before pep 380 filter(lambda x: True, obj) returned an object that was
>>> the same kind of iterator as obj (it would yield the same values). Now
>>> the "kind of iterator" that obj is depends not only on the values that
>>> it yields but also on the value that it returns. Since filter does not
>>> pass on the same return value, filter(lambda x: True, obj) is no
>>> longer the same kind of iterator as obj.
>> Something like this has happened before, when the ability to
>> send() values into a generator was added. If you wrap a
>> generator with filter, you likewise don't get the same kind
>> of object -- you don't get the ability to send() things
>> into your filtered generator.
>> So, "provide the same kind of iterator" is not currently part
>> of the contract of these functions.

They do provide the same kind of iterator in the sense that they
reproduce the properties of the object *in so far as it is an
iterator* by yielding the same values. I probably should have compared
filter(lambda x: True, obj) with iter(obj) rather than obj. In most
cases iter(obj) has a more limited interface.

send() is clearly specific to generators: user defined iterator
classes can provide any number of state-changing methods (usually with
more relevant names) but this is difficult for generators so a generic
mechanism is needed. The return value attached to StopIteration
"feels" more fundamental to me since there is now specific language
syntax both for extracting it and for returning it in generator
functions.

> Iterators are Python's generic sequential access device. They do that one
> thing and do it well.
> The iterator protocol is intentionally and properly minimal. An iterator
> class *must* have appropriate .__iter__ and .__next__ methods. It *may* also
> have any other method and any data attribute. Indeed, any iterator must have
> some specific internal data. But these are ignored in generic iterator (or
> iterable) functions. If one does not want that, one should write more
> specific code.

Generalising the concept of an iterator this way is entirely backwards
compatible with existing iterators and does not place any additional
burden on defining iterators: most iterators can simply be iterators
that return None. The feature is optional for any iterator but this
thread is about whether it should be optional for a generic processor
of iterators.

> Serhiy, if you want a module of *generator* specific functions ('gentools'
> ?), you should write one and submit it to pypi for testing.

This is probably the right idea. As the feature gains use cases the
best way to handle it will become clearer.
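
A minimal sketch of the point about wrappers, assuming Python 3.3 or
later. The names counted(), passthrough() and delegate() are made up for
illustration; passthrough() stands in for any generic iterator function
written as a plain for loop. It yields the same values as its argument
but does not pass on the return value, whereas 'yield from' on the
generator itself does.

```python
def counted(items):
    """Yield each item, then return how many were yielded."""
    n = 0
    for item in items:
        yield item
        n += 1
    return n  # carried on StopIteration.value


def passthrough(iterable):
    """Hypothetical generic wrapper: re-yields values with a for loop."""
    for item in iterable:  # the for loop discards StopIteration.value
        yield item


def delegate(iterable, results):
    """Re-yield everything and record what 'yield from' evaluated to."""
    value = yield from iterable
    results.append(value)


direct = []
direct_items = list(delegate(counted("abc"), direct))

wrapped = []
wrapped_items = list(delegate(passthrough(counted("abc")), wrapped))
```

Both calls yield the same items, but only the direct one sees the count;
the wrapped one sees None.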


From ericsnowcurrently at  Tue Oct  9 01:45:39 2012
From: ericsnowcurrently at (Eric Snow)
Date: Mon, 8 Oct 2012 17:45:39 -0600
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 8, 2012 at 4:47 PM, Greg Ewing <greg.ewing at> wrote:
> Andrew McNabb wrote:
>> Since this really is a matter of personal taste, I'll end my
>> participation in this discussion by voicing support for Nick Coghlan's
>> suggestion of a `join` method, whether it's named `join` or `append` or
>> something else.
> I'd prefer 'append', because
>    path.append("somedir", "file.txt")
> is pretty self-explanatory, whereas
>    path.join("somedir", "path.txt")
> looks confusingly similar to
>    s.join("somedir", "path.txt")
> where s is a string, but has very different semantics.

As Nick noted, the problem is that append() conflicts with
MutableSequence.append().  If someone subclasses Path and friends to
act like a list then it complicates the situation.  In my mind the
name should be one that is not already in use by strings or sequences.


From guido at  Tue Oct  9 01:47:23 2012
From: guido at (Guido van Rossum)
Date: Mon, 8 Oct 2012 16:47:23 -0700
Subject: [Python-ideas] Propagating StopIteration value
In-Reply-To: <>
References: <k4q38d$j8e$> <>
Message-ID: <>

On Mon, Oct 8, 2012 at 4:24 PM, Oscar Benjamin
<oscar.j.benjamin at> wrote:
> On 8 October 2012 00:36, Guido van Rossum <guido at> wrote:
>> On Sun, Oct 7, 2012 at 3:43 PM, Oscar Benjamin
>> <oscar.j.benjamin at> wrote:
>>> I think what Serhiy is saying is that although pep 380 mainly
>>> discusses generator functions it has effectively changed the
>>> definition of what it means to be an iterator for all iterators:
>>> previously an iterator was just something that yielded values but now
>>> it also returns a value. Since the meaning of an iterator has changed,
>>> functions that work with iterators need to be updated.
>> I think there are different philosophical viewpoints possible on that
>> issue. My own perspective is that there is no change in the definition
>> of iterator -- only in the definition of generator. Note that the
>> *ability* to attach a value to StopIteration is not new at all.
> I guess I'm viewing it from the perspective that an ordinary iterator
> is simply an iterator that happens to return None just like a function
> that doesn't bother to return anything. If I understand correctly,
> though, it is possible for any iterator to return a value that yield
> from would propagate, so the feature (returning a value) is not
> specific to generators.

Substitute "pass a value via StopIteration" and I'll agree that it is

I still don't think it is all that useful, nor that it should be
encouraged (outside the use case of coroutines).

>>> This feature was new in Python 3.3 which was released a week ago
>> It's been in alpha/beta/candidate for a long time, and PEP 380 was
>> first discussed in 2009.
>>> so it is not widely used but it has uses that are not anything to do with
>>> coroutines.
>> Yes, as a shortcut for "for x in <iterator>: yield x". Note that the
>> for-loop ignores the value in the StopIteration -- would you want to
>> change that too?
> Not really. I thought about how it could be changed. Once APIs are
> available that use this feature to communicate important information,
> use cases will arise for using the same APIs outside of a coroutine
> context. I'm not really sure how you could get the value from a for
> loop. I guess it would have to be tied to the else clause in some way.

Given the elusive nature of StopIteration (many operations catch and
ignore it, and that's the main intended use) I don't think it should
be used to pass along *important* information except for the specific
case of coroutines, where the normal use case is to use .send()
instead of .__next__() and to catch the StopIteration exception.

>>> As an example of how you could use it, consider parsing a
>>> file that can contain #include statements. When the #include
>>> statement is encountered we need to insert the contents of the
>>> included file. This is easy to do with a recursive generator. The
>>> example uses the return value of the generator to keep track of which
>>> line is being parsed in relation to the flattened output file:
>>> def parse(filename, output_lineno=0):
>>>     with open(filename) as fin:
>>>         for input_lineno, line in enumerate(fin):
>>>             if line.startswith('#include '):
>>>                 subfilename = line.split()[1]
>>>                 output_lineno = yield from parse(subfilename, output_lineno)
>>>             else:
>>>                 try:
>>>                     yield parse_line(line)
>>>                 except ParseLineError:
>>>                     raise ParseError(filename, input_lineno, output_lineno)
>>>                 output_lineno += 1
>>>     return output_lineno
>> Hm. This example looks constructed to prove your point... It would be
>> easier to count the output lines in the caller. Or you could use a
>> class to hold that state. I think it's just a bad habit to start using
>> the return value for this purpose. Please use the same approach as you
>> would before 3.3, using "yield from" just as the shortcut I mentioned
>> above.
> I'll admit that the example is contrived but it's to think about how
> to use the new feature rather than to prove a point (Otherwise I would
> have contrived a reason for wanting to use filter()). I just wanted to
> demonstrate that people can (and will) use this outside of a coroutine
> context.

Just that they will use it doesn't make it a good idea. I claim it's a
bad idea and I don't think you're close to convincing me otherwise.

> Also I envisage something like this being a common use case. The
> 'yield from' expression can only provide information to its immediate
> caller by returning a value attached to StopIteration or by raising a
> different type of exception. There will be many cases where people
> want to get some information about what was yielded/done by 'yield
> from' at the point where it is used.

Maybe. But I think we should wait a few years before we conclude that
we made a mistake. The story of iterators and generators has evolved
in many small steps, each informed by how the previous step turned
out. It's way too soon to say that the existence of yield-from
requires us to change all the other iterator algebra to preserve the
value from StopIteration.

I'll happily take this discussion up again after we've used it for a
couple of years though!

--Guido van Rossum (
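
A small sketch of the two behaviours mentioned above, assuming Python
3.3 or later. The generator gen() is invented for illustration. It
shows that iterating in the usual way drops the value attached to
StopIteration, and that the value is only visible to code that calls
next() itself and catches the exception.

```python
def gen():
    yield 1
    yield 2
    return "done"  # attached to StopIteration as .value


# Ordinary iteration: the return value is silently discarded.
seen = [x for x in gen()]

# Manual iteration: the value is available on the caught exception.
it = gen()
collected = []
result = None
while True:
    try:
        collected.append(next(it))
    except StopIteration as stop:
        result = stop.value
        break
```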

From oscar.j.benjamin at  Tue Oct  9 01:59:26 2012
From: oscar.j.benjamin at (Oscar Benjamin)
Date: Tue, 9 Oct 2012 00:59:26 +0100
Subject: [Python-ideas] Propagating StopIteration value
In-Reply-To: <>
References: <k4q38d$j8e$> <>
Message-ID: <>

On 9 October 2012 00:47, Guido van Rossum <guido at> wrote:
> Given the elusive nature of StopIteration (many operations catch and
> ignore it, and that's the main intended use) I don't think it should
> be used to pass along *important* information except for the specific
> case of coroutines, where the normal use case is to use .send()
> instead of .__next__() and to catch the StopIteration exception.

It certainly is elusive! I caught a bug a few weeks ago where
StopIteration was generated from a call to next and caught by a for
loop several frames above. I couldn't work out why the loop was
terminating early (since there was no attempt to catch any exceptions
anywhere in the code) and it took about 20 minutes of processing to
reproduce. With no traceback and no way to catch the exception with
pdb it had me stumped for a while.
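
A minimal sketch of how this kind of bug can arise, not a reconstruction
of the actual code. The class Pairs is hypothetical. A bare next() call
inside __next__ raises StopIteration when the underlying iterator runs
out, and whatever is consuming Pairs treats that as a normal end of
iteration. (Since Python 3.7 a StopIteration escaping a generator body
is turned into RuntimeError, but iterator classes like this one still
behave as shown.)

```python
class Pairs:
    """Group an iterable into consecutive pairs."""

    def __init__(self, items):
        self._it = iter(items)

    def __iter__(self):
        return self

    def __next__(self):
        first = next(self._it)
        # With an odd number of items this next() raises StopIteration,
        # which the consumer mistakes for "no more pairs".
        second = next(self._it)
        return (first, second)


result = list(Pairs([1, 2, 3, 4, 5]))  # the trailing 5 is lost, no error
```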


From greg.ewing at  Tue Oct  9 02:02:11 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 09 Oct 2012 13:02:11 +1300
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

Guido van Rossum wrote:

> It's not about equality. If you ask whether two NaNs are *unequal* the
> answer is *also* False.

That's the weirdest part about this whole business, I think.
Unless you're really keeping your wits about you, it's easy
to forget that the assumption (x == y) == False implies
(x != y) == True doesn't necessarily hold.

This is actually a very important assumption when it comes
to reasoning about programs -- even more important than
reflexivity, etc, I believe. Consider

    if x == y:
       dosomething()
    else:
       dosomethingelse()

where x and y are known to be floats. It's easy to see that
the following is equivalent:

    if not x == y:
       dosomethingelse()
    else:
       dosomething()

but it's not quite so easy to spot that the following is
*not* equivalent:

    if x != y:
       dosomethingelse()
    else:
       dosomething()

This trap is made all the easier to fall into because float
comparison is *mostly* well-behaved, except for a small subset
of the possible values. Most other nonstandard comparison behaviours
in Python apply to whole types. E.g. we refuse to compare complex
numbers for ordering, even if their values happen to be real,
so if you try that you get an early exception. But the weirdness
with NaNs only shows up in corner cases that may escape testing.

Now, there *is* a third possibility -- we could raise an exception
if a comparison involving NaNs is attempted. This would be a
more faithful way of adhering to the IEEE 754 specification that
NaNs are "unordered". More importantly, it would make the second code 
transformation above valid in all cases.

So the question that really needs to be answered, I think, is
not "Why is NaN == NaN false?", but "Why doesn't NaN == anything
raise an exception, when it would make so much more sense to
do so?"
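
For anyone who wants to check these transformations on their own
interpreter, here is a minimal sketch. It assumes CPython's built-in
float type, and the variable names are illustrative. As far as CPython
floats go, == and != stay complementary for NaN; it is the ordering
comparisons (for example < versus >=) whose negations stop being
interchangeable.

```python
import math

x, y = math.nan, 1.0

eq = (x == y)   # False
ne = (x != y)   # True: the complement of == still holds for floats
lt = (x < y)    # False
ge = (x >= y)   # False as well, so "not x < y" is not the same as "x >= y"

not_lt = not lt
```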


From ubershmekel at  Tue Oct  9 02:01:59 2012
From: ubershmekel at (Yuval Greenfield)
Date: Tue, 9 Oct 2012 02:01:59 +0200
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

`p[q]`            0
`p + q`          +0.5
`p / q`          +1
`p.join(q)`      -1
`p.pathjoin(q)`  -1
`p.pathjoin(q)`  -1
`p.add(q)`       +0.5

Joining/adding/appending paths is one of the most common ops. Please let's make
it short, easy and obvious.

From guido at  Tue Oct  9 02:11:33 2012
From: guido at (Guido van Rossum)
Date: Mon, 8 Oct 2012 17:11:33 -0700
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Mon, Oct 8, 2012 at 5:02 PM, Greg Ewing <greg.ewing at> wrote:
> Guido van Rossum wrote:
>> It's not about equality. If you ask whether two NaNs are *unequal* the
>> answer is *also* False.
> That's the weirdest part about this whole business, I think.
> Unless you're really keeping your wits about you, it's easy
> to forget that the assumption (x == y) == False implies
> (x != y) == True doesn't necessarily hold.
> This is actually a very important assumption when it comes
> to reasoning about programs -- even more important than
> reflexivity, etc, I believe. Consider
>    if x == y:
>       dosomething()
>    else:
>       dosomethingelse()
> where x and y are known to be floats. It's easy to see that
> the following is equivalent:
>    if not x == y:
>       dosomethingelse()
>    else:
>       dosomething()
> but it's not quite so easy to spot that the following is
> *not* equivalent:
>    if x != y:
>       dosomethingelse()
>    else:
>       dosomething()
> This trap is made all the easier to fall into because float
> comparison is *mostly* well-behaved, except for a small subset
> of the possible values. Most other nonstandard comparison behaviours
> in Python apply to whole types. E.g. we refuse to compare complex
> numbers for ordering, even if their values happen to be real,
> so if you try that you get an early exception. But the weirdness
> with NaNs only shows up in corner cases that may escape testing.
> Now, there *is* a third possibility -- we could raise an exception
> if a comparison involving NaNs is attempted. This would be a
> more faithful way of adhering to the IEEE 754 specification that
> NaNs are "unordered". More importantly, it would make the second code
> transformation above valid in all cases.
> So the question that really needs to be answered, I think, is
> not "Why is NaN == NaN false?", but "Why doesn't NaN == anything
> raise an exception, when it would make so much more sense to
> do so?"

Because == raising an exception is really unpleasant. We had this in
Python 2 for unicode/str comparisons and it was very awkward.

Nobody arguing against the status quo seems to care at all about
numerical algorithms though. I propose that you go find some numerical
mathematicians and ask them.

--Guido van Rossum (

From christian at  Tue Oct  9 02:13:03 2012
From: christian at (Christian Heimes)
Date: Tue, 09 Oct 2012 02:13:03 +0200
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

Am 08.10.2012 17:35, schrieb Guido van Rossum:
> On Mon, Oct 8, 2012 at 5:39 AM, Christian Heimes <christian at> wrote:
>> Python's standard library doesn't contain an interface to I/O Completion
>> Ports. I think a common event loop system is a good reason to add IOCP
>> if somebody is up for the challenge.
>> Would you prefer an IOCP wrapper in the stdlib or your own version?
>> Twisted has its own Cython based wrapper, some other libraries use a
>> libevent-based solution.
> What's an IOCP?

I/O Completion Ports,

It's a Windows (and apparently also Solaris) API for async IO that can
handle multiple threads.


From steve at  Tue Oct  9 02:28:13 2012
From: steve at (Steven D'Aprano)
Date: Tue, 09 Oct 2012 11:28:13 +1100
Subject: [Python-ideas] Subpaths [was Re: PEP 428 - object-oriented
 filesystem paths]
In-Reply-To: <>
References: <>
Message-ID: <>


I've come to the conclusion that you are right to prefer a named method
over an operator for joining paths. But I think you are wrong to name that
method "subpath" -- see below.

On 09/10/12 05:39, Nick Coghlan wrote:
> On Mon, Oct 8, 2012 at 11:53 PM, Steven D'Aprano<steve at>  wrote:
>>> "p.subpath('foo', 'bar')" looks like executable
>>> pseudocode for creating a new path based on existing one to me,
>> That notation quite possibly goes beyond unintuitive to downright
>> perverse. You are using a method called "subpath" to generate a
>> *superpath* (deeper, longer path which includes p as a part).
> Huh? It's a tree structure. A subpath lives inside its parent path,
> just as subnodes are children of their parent node. Agreed it's not a
> widely used term though - it's a generalisation of subdirectory to
> also cover file paths.

I believe you mentioned in an earlier email that you invented the term
for this discussion. Quote:

     I made it up by using "make subpath" as the reverse of
     "get relative path".

Unfortunately subpath already has an established meaning, and it is the
complete opposite of the sense you intend: paths are trees are graphs,
and the graph a->b->c->d is a superpath, not subpath, of a->b->c:

a->b->c is strictly contained within a->b->c->d; the reverse is not true.

Just as "abcd" is a superstring of "abc", not a substring. Likewise for
superset and subset.

And likewise for trees (best viewed in a monospaced font):


One can say that the tree a-f-g is a subtree of the whole, but one cannot
say that a-f-g-h is a subtree since h is not a part of the first tree.

> They're certainly not "super" anything, any more than a subdirectory
> is really a superdirectory (which is what you appear to be arguing).

Common usage is that "subdirectory" gets used for relative paths: given
path /a/b/c/d, we say that "d" is a subdirectory of /a/b/c. I've never
come across anyone giving d in absolute terms. Now perhaps I've lived a
sheltered life *wink* and people do talk about subdirectories in absolute
paths all the time. That's fine. But they don't talk about "subpaths" in
the sense you intend, and the sense you intend goes completely against
the established sense.

The point is, despite the common "sub" prefix, the semantics of
"subdirectory" is quite different from the semantics of "substring",
"subset", "subtree" and "subpath".


From tjreedy at  Tue Oct  9 02:28:51 2012
From: tjreedy at (Terry Reedy)
Date: Mon, 08 Oct 2012 20:28:51 -0400
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <k4vr4k$ck0$>

On 10/8/2012 2:47 PM, Antoine Pitrou wrote:
> Hello,
> Since there has been some controversy about the joining syntax used in
> PEP 428 (filesystem path objects), I would like to run an informal poll
> about it. Please answer with +1/+0/-0/-1 for each proposal:
> - `p[q]` joins path q to path p

-1 to this

> - `p + q` joins path q to path p
> - `p / q` joins path q to path p
> - `p.join(q)` joins path q to path p

currently neutral between these

Terry Jan Reedy

From oscar.j.benjamin at  Tue Oct  9 02:32:39 2012
From: oscar.j.benjamin at (Oscar Benjamin)
Date: Tue, 9 Oct 2012 01:32:39 +0100
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <>

On 9 October 2012 01:11, Guido van Rossum <guido at> wrote:
> On Mon, Oct 8, 2012 at 5:02 PM, Greg Ewing <greg.ewing at> wrote:
>> So the question that really needs to be answered, I think, is
>> not "Why is NaN == NaN false?", but "Why doesn't NaN == anything
>> raise an exception, when it would make so much more sense to
>> do so?"
> Because == raising an exception is really unpleasant. We had this in
> Python 2 for unicode/str comparisons and it was very awkward.
> Nobody arguing against the status quo seems to care at all about
> numerical algorithms though. I propose that you go find some numerical
> mathematicians and ask them.

The main purpose of quiet NaNs is to propagate through computation
ruining everything they touch. In a programming language like C that
lacks exceptions this is important as it allows you to avoid checking
all the time for invalid values, whilst still being able to know if
the end result of your computation was ever affected by an invalid
numerical operation. The reasons for NaNs to compare unequal are no
doubt related to this purpose.

It is of course arguable whether the same reasoning applies to a
language like Python that has a very good system of exceptions but I
agree with Guido that raising an exception on == would be unfortunate.
How many people would forget that they needed to catch those
exceptions? How awkward could your code be if you did remember to
catch all those exceptions? In an exception handling language it's
important to know that there are some operations that you can trust.
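
A short sketch of the propagation described above, assuming CPython
floats. The helper mean() is made up for illustration. An invalid
operation produces a NaN without raising, the NaN flows through later
arithmetic, and a single check at the end reveals that something went
wrong along the way.

```python
import math


def mean(values):
    return sum(values) / len(values)


invalid = math.inf - math.inf   # an invalid operation: yields NaN, no exception

clean = mean([1.0, 2.0, 3.0])
tainted = mean([1.0, invalid, 3.0])  # the NaN survives sum() and division

clean_ok = not math.isnan(clean)
tainted_detected = math.isnan(tainted)
```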


From christian at  Tue Oct  9 02:35:27 2012
From: christian at (Christian Heimes)
Date: Tue, 09 Oct 2012 02:35:27 +0200
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

Am 08.10.2012 21:15, schrieb Nick Coghlan:
> My own current preference is to take "p.joinpath(q)" straight from
> (
> I don't *love* joinpath as a name, I just don't actively dislike it
> the way I do the four presented options (and it has the virtue of the
> precedent).

I dislike + and [] because I find the result too surprising. If I were
forced to choose between +, / and [], I would go for / as it looks
kinda like a path.

+1 for p.joinpath(*args). It's really a must have feature. The name is
debatable, though.

+0 for p / sub

-1 for p + sub and p[sub]
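
For comparison, a sketch of the function, method and operator
spellings side by side. It assumes the pathlib module as it later
shipped in the standard library (PurePosixPath, joinpath() and the /
operator), which is not necessarily what the PEP draft under discussion
specified. The path components are arbitrary.

```python
import posixpath
from pathlib import PurePosixPath

base = PurePosixPath("/usr")

via_function = posixpath.join("/usr", "lib", "python")   # os.path style
via_method = base.joinpath("lib", "python")              # named method
via_operator = base / "lib" / "python"                   # operator

same_path = (via_method == via_operator)
as_text = str(via_method)
```

All three spellings produce the same path; the difference is purely in
how the call site reads.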


From stephen at  Tue Oct  9 02:37:52 2012
From: stephen at (Stephen J. Turnbull)
Date: Tue, 09 Oct 2012 09:37:52 +0900
Subject: [Python-ideas] checking for identity before comparing
	built-in	objects
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum writes:

 > Sounds good. (But now maybe we also need to come clean with the
 > exceptions for NaNs compared as part of container comparisons?)

For a second I thought you meant IEEE 754 Exceptions.  Whew!  How

For reasons of efficiency, Python allows comparisons of containers to
shortcut element comparisons.  These shortcuts mean that it is
possible that comparison of two containers may return True, even if
they contain NaNs.  For details, see the language reference[1].

Longer than I think it deserves, but maybe somebody has a better idea?

[1]  Sorry about that, but details don't really belong in a *Python*
tutorial.  Maybe this should be "see the implementation notes"?
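
A minimal sketch of the shortcut described in the proposed wording,
assuming CPython's built-in list and float types. Containers compare
elements by identity first, so a list containing a NaN compares equal
to another list holding the very same NaN object, but not to a list
holding a different NaN object.

```python
nan = float("nan")

same_object = ([nan] == [nan])                        # True: identity shortcut
distinct_objects = ([float("nan")] == [float("nan")]) # False: falls back to ==
membership = (nan in [nan])                           # True, for the same reason
element_equal = (nan == nan)                          # False
```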

From ironfroggy at  Tue Oct  9 02:39:13 2012
From: ironfroggy at (Calvin Spealman)
Date: Mon, 8 Oct 2012 20:39:13 -0400
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 8, 2012 at 2:47 PM, Antoine Pitrou <solipsis at> wrote:

> Hello,
> Since there has been some controversy about the joining syntax used in
> PEP 428 (filesystem path objects), I would like to run an informal poll
> about it. Please answer with +1/+0/-0/-1 for each proposal:
> - `p[q]` joins path q to path p

-1 This syntax makes no sense, it doesn't match the syntax in an obvious way

> - `p + q` joins path q to path p

-1 Too easy to confuse with string concat

> - `p / q` joins path q to path p

+1 Reads like a path, makes logical sense

> - `p.join(q)` joins path q to path p

+1 Allows passing as a callable (tho we could just use operator module). I
think we should have both / and .join()

> (you can include a rationale if you want, but don't forget to vote :-))
> Thank you
> Antoine.

Read my blog! I depend on your acceptance of my opinion! I am interesting!
Follow me if you're into that sort of thing:

From stephen at  Tue Oct  9 02:50:52 2012
From: stephen at (Stephen J. Turnbull)
Date: Tue, 09 Oct 2012 09:50:52 +0900
Subject: [Python-ideas] checking for identity before comparing
	built-in	objects
In-Reply-To: <k4vech$7t1$>
References: <>
Message-ID: <>

Terry Reedy writes:

 > I wonder if it would be helpful to make a NaN subclass of floats with 
 > its own arithmetic and comparison methods.

It can't be helpful, unless you go a lot further.  Specifically, you'd
need to require containers to check every element for NaN-ness.  That
doesn't seem very practical.

In any case, the presentation by Kahan (cited earlier by Alexander
himself) demolishes the idea that any sort of attempt to implement
DWIM for floats in a programming language can succeed at the present
state of the art.  The best we can get is DWGM ("do what Guido means",
even if what Guido means is "ask the Timbot"<wink/>).

Kahan pretty explicitly endorses this approach, by the way.  At least
in the context of choosing default policy for IEEE 754 Exceptions.

From carlopires at  Tue Oct  9 02:57:57 2012
From: carlopires at (Carlo Pires)
Date: Mon, 8 Oct 2012 21:57:57 -0300
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

2012/10/8 Antoine Pitrou <solipsis at>

> - `p[q]` joins path q to path p

> - `p + q` joins path q to path p

> - `p / q` joins path q to path p

> - `p.join(q)` joins path q to path p

  Carlo Pires

From ironfroggy at  Tue Oct  9 03:02:11 2012
From: ironfroggy at (Calvin Spealman)
Date: Mon, 8 Oct 2012 21:02:11 -0400
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 8, 2012 at 1:59 PM, Nick Coghlan <ncoghlan at> wrote:

> On Sat, Oct 6, 2012 at 9:44 PM, Calvin Spealman <ironfroggy at>
> wrote:
> > Responding late, but I didn't get a chance to get my very strong
> > feelings on this proposal in yesterday.
> >
> > I do not like it. I'll give full disclosure and say that I think our
> > earlier failure to include the path library in the stdlib has been a
> > loss for Python and I'll always hope we can fix that one day. I still
> > hold out hope.
> >
> > It feels like this proposal is "make it object oriented, because
> > object oriented is good" without any actual justification or obvious
> > problem this solves. The API looks clunky and redundant, and does not
> > appear to actually improve anything over the facilities in the os.path
> > module. This takes a lot of things we can already do with paths and
> > files and remixes them into a not-so intuitive API for the sake of
> > change, not for the sake of solving a real problem.
> The PEP needs to better articulate the rationale, but the key points are:
> - better abstraction and encapsulation of cross-platform logic so file
> manipulation algorithms written on Windows are more likely to work
> correctly on POSIX systems (and vice-versa)

Frankly, for 99% of file path work, anything I do on one "just works" on
the other, and complicating things with these POSIX versus NT path types
just seems to be a whole lot of early complication for a few edge cases
most people never see.

Simplest example is requiring the backslash separator on NT when it handles
forward slash, just like POSIX, just fine, and has for a long, long time.

> - improved ability to manipulate paths with Windows semantics on a
> POSIX system (and vice-versa)
> - better support for creation of "mock" filesystem APIs

I admit the mock FS intrigues me

> > As for specific problems I have with the proposal:
> >
> > Frankly, I think not keeping the / operator for joining is a huge
> > mistake. This is the number one best feature of path and despite that
> > many people don't like it, it makes sense. It makes our most common
> > path operation read very close to the actual representation of
> > what you're creating. This is great.
> It trades readability (and discoverability) for brevity. Not good.

I thought it had all three. In these situations, where my and another's
perception of a system's strengths and weaknesses are opposite, I don't
really know how to make a good response. :-/

> > Not inheriting from str means that we can't directly pass these path
> > objects to existing code that just expects a string, so we have a
> > really hard boundary around the edges of this new API. It does not
> > lend itself well to incrementally transitioning to it from existing
> > code.
> It's the exact design philosophy as was used in the creation of the
> new ipaddress module: the objects in ipaddress must still be converted
> to a string or integer before they can be passed to other operations
> (such as the socket module APIs). Strings and integers remain the data
> interchange formats here as well (although far more focused on strings
> in the path case).
> >
> > The stat operations and other file-facilities tacked on feel out of
> > place, and limited. Why does it make sense to add these facilities to
> > path and not other file operations? Why not give me a read method on
> > paths? or maybe a copy? Putting lots of file facilities on a path
> > object feels wrong because you can't extend it easily. This is one
> > place that function(thing) works better than thing.function()
> Indeed, I'm personally much happier with the "pure" path classes than
> I am with the ones that can do filesystem manipulation. Having both
> "" and "open(str(p), mode)" seems strange. OTOH, I can see
> the attraction in being able to better fake filesystem access through
> the method API, so I'm willing to go along with it.
> > Overall, I'm completely -1 on the whole thing.
> I find this very hard to square with your enthusiastic support for
> Like ipaddr, which needed to clean up its semantic model
> before it could be included in the standard library (as ipaddress), we
> need a clean cross-platform semantic model for path objects before a
> convenience API can be added for manipulating them.

I somewhat dislike this because I loved so much and this proposal
seems to actively avoid exactly the aspects of that I enjoyed the
most (like the / joining).

> Cheers,
> Nick.
> was in the wild, and is still in use. Why do we find ourselves
debating new libraries like this as PEPs? We need to let them play out, see
what sticks. If someone wants to make this library and stick it on PyPI,
I'm not stopping them. I'm encouraging it. Let's see how it plays out. If
it works out well, it deserves a PEP. In two or three years.


From zachary.ware+pyideas at  Tue Oct  9 03:05:35 2012
From: zachary.ware+pyideas at (Zachary Ware)
Date: Mon, 8 Oct 2012 20:05:35 -0500
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

Speaking as a relatively inexperienced user (whose opinions should
justly be given little weight as such),

> - `p[q]` joins path q to path p
-1: Doesn't make sense at first glance

> - `p + q` joins path q to path p
-1: For reasons stated elsewhere by several others; Path + (Path or
str) != str + str

> - `p / q` joins path q to path p
+1: Short, makes sense if you can get your brain past "/ in Python
means 'divide'"

> - `p.join(q)` joins path q to path p
+1: Except it needs a different name, for the same reasons as +

What about p.unite(q)? The one word definition of 'join' is 'unite'
and it's definitely not used by str, and I don't know of anywhere else
that it is used. And it's only one extra character instead of the 4 of
'pathjoin' or 'joinpath'.
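
For comparison, here is a minimal sketch of two of the polled spellings,
using the pathlib module that later shipped in the standard library
(Python 3.4 and newer, so it was not available when this poll ran).
PurePosixPath is used so the result does not depend on the host platform;
the path components are made up for illustration.

```python
from pathlib import PurePosixPath

base = PurePosixPath("/usr")

# Operator spelling: `p / q` joins q onto p.
via_operator = base / "lib"

# Method spelling: joinpath() accepts one or more components.
via_method = base.joinpath("lib", "python")

# `p + q` is not defined for these path objects, so it raises TypeError.
try:
    base + "lib"
    plus_is_supported = True
except TypeError:
    plus_is_supported = False

print(via_operator)
print(via_method)
print(plus_is_supported)
```

Both spellings return a new path object and leave `base` untouched.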

From guido at  Tue Oct  9 03:07:48 2012
From: guido at (Guido van Rossum)
Date: Mon, 8 Oct 2012 18:07:48 -0700
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Mon, Oct 8, 2012 at 5:32 PM, Oscar Benjamin
<oscar.j.benjamin at> wrote:
> On 9 October 2012 01:11, Guido van Rossum <guido at> wrote:
>> On Mon, Oct 8, 2012 at 5:02 PM, Greg Ewing <greg.ewing at> wrote:
>>> So the question that really needs to be answered, I think, is
>>> not "Why is NaN == NaN false?", but "Why doesn't NaN == anything
>>> raise an exception, when it would make so much more sense to
>>> do so?"
>> Because == raising an exception is really unpleasant. We had this in
>> Python 2 for unicode/str comparisons and it was very awkward.
>> Nobody arguing against the status quo seems to care at all about
>> numerical algorithms though. I propose that you go find some numerical
>> mathematicians and ask them.
> The main purpose of quiet NaNs is to propagate through computation
> ruining everything they touch. In a programming language like C that
> lacks exceptions this is important as it allows you to avoid checking
> all the time for invalid values, whilst still being able to know if
> the end result of your computation was ever affected by an invalid
> numerical operation. The reasons for NaNs to compare unequal are no
> doubt related to this purpose.
> It is of course arguable whether the same reasoning applies to a
> language like Python that has a very good system of exceptions but I
> agree with Guido that raising an exception on == would be unfortunate.
> How many people would forget that they needed to catch those
> exceptions? How awkward could your code be if you did remember to
> catch all those exceptions? In an exception handling language it's
> important to know that there are some operations that you can trust.
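
The propagation behaviour described above can be sketched with plain
Python floats. This is only an illustration with made-up input values:
a single NaN entering a computation survives to the end, and comparing
NaNs returns a bool rather than raising.

```python
import math

nan = float("nan")

# A quiet NaN entering a computation poisons everything downstream ...
total = sum([1.0, 2.0, nan, 4.0])
scaled = (total + 1.0) * 0.5

# ... so one check at the end is enough to detect the invalid input.
result_is_invalid = math.isnan(scaled)

# Comparisons with NaN never raise; they simply return a bool.
equal_to_itself = nan == nan

print(result_is_invalid, equal_to_itself)
```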

If we want to do *anything* I think we should first introduce a
floating point context similar to the Decimal context. Then we can
talk.

--Guido van Rossum (
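
As a point of reference for the Decimal context mentioned above, here is
a minimal sketch of how the decimal module lets a context choose between
returning a special value with a status flag and raising an exception.
It uses only the standard decimal API; the variable names are
illustrative, and nothing here is a proposal for how a float context
would look.

```python
from decimal import Decimal, DivisionByZero, localcontext

# Non-stop mode: the trap is disabled, so division by zero returns
# Infinity and records the event in a sticky flag.
with localcontext() as ctx:
    ctx.traps[DivisionByZero] = False
    ctx.clear_flags()
    quotient = Decimal(1) / Decimal(0)
    flag_was_set = bool(ctx.flags[DivisionByZero])

# Trapping mode: the same operation raises instead.
with localcontext() as ctx:
    ctx.traps[DivisionByZero] = True
    try:
        Decimal(1) / Decimal(0)
        raised = False
    except DivisionByZero:
        raised = True

print(quotient, flag_was_set, raised)
```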

From raymond.hettinger at  Tue Oct  9 03:13:25 2012
From: raymond.hettinger at (Raymond Hettinger)
Date: Mon, 8 Oct 2012 18:13:25 -0700
Subject: [Python-ideas] Make "is" checks on non-singleton literals errors
In-Reply-To: <>
References: <>
Message-ID: <>

On Oct 8, 2012, at 12:44 PM, Mike Graham <mikegraham at> wrote:

> I regularly see learners using "is" to check for string equality and
> sometimes other equality. Due to optimizations, they often come away
> thinking it worked for them.
> There are no cases where
>    if x is "foo":
> or
>   if x is 4:
> is actually the code someone intended to write.
> Although this has no benefit to anyone but new learners, it also
> doesn't really do any harm.

This seems like a job for pyflakes, pylint, or pychecker.
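
The pitfall Mike describes can be sketched as follows. Equality is
guaranteed by the language; whether the two strings are also the same
object is an implementation detail, so the sketch prints the identity
result without relying on it.

```python
expected = "foo"

# Build an equal string at run time instead of writing the literal again.
built = "".join(["f", "oo"])

# Equality compares contents and is always True here.
is_equal = built == expected

# Identity asks whether both names refer to one object. The answer
# depends on interning decisions made by the interpreter.
is_same_object = built is expected

print(is_equal)
print(is_same_object)  # implementation-dependent
```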


From ironfroggy at  Tue Oct  9 03:14:57 2012
From: ironfroggy at (Calvin Spealman)
Date: Mon, 8 Oct 2012 21:14:57 -0400
Subject: [Python-ideas] Make "is" checks on non-singleton literals errors
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 8, 2012 at 3:44 PM, Mike Graham <mikegraham at> wrote:
> I regularly see learners using "is" to check for string equality and
> sometimes other equality. Due to optimizations, they often come away
> thinking it worked for them.
> There are no cases where
>     if x is "foo":
> or
>    if x is 4:
> is actually the code someone intended to write.
> Although this has no benefit to anyone but new learners, it also
> doesn't really do any harm.


> Mike

From alexander.belopolsky at  Tue Oct  9 03:31:40 2012
From: alexander.belopolsky at (Alexander Belopolsky)
Date: Mon, 8 Oct 2012 21:31:40 -0400
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <k4vfuj$ku5$>
References: <>
Message-ID: <>

On Mon, Oct 8, 2012 at 5:17 PM, Terry Reedy <tjreedy at> wrote:
> Alexander, while I might have chosen to make nan == nan True, I consider it
> a near tossup with no happy resolution and would not change it now.

While I did suggest to change nan == nan result two years ago,
I am not suggesting it now.  Here I am merely trying to understand to
what extent Python's float is implementing IEEE 754 and why in some
cases Python's behavior deviates from the standard while in the case
of nan == nan, IEEE 754 is taken as gospel.

> Guido's
> explanation is pretty clear: he went with the IEEE standard as interpreted
> for Python by Tim Peters.

It would be helpful if that interpretation was clearly written
somewhere.  Without a written document this interpretation seems
apocryphal to me.

Earlier in this thread, Guido wrote: "I am not aware of an update to
the standard."  To the best of my knowledge IEEE Std 754 was last
updated in 2008.  I don't think the differences between 1985 and 2008
revisions matter much for this discussion, but since I am going to
refer to chapter and verse, I will start by citing the document that I
will use:

IEEE Std 754(TM)-2008
(Revision of IEEE Std 754-1985)
IEEE Standard for Floating-Point Arithmetic
Approved 12 June 2008
IEEE-SA Standards Board

(AFAICT, the main difference between 754-2008 and 754-1985 is that the
former includes decimal floats added in 854-1987.)

Now, let me put my language lawyer hat on and compare Python floating
point implementations to IEEE 754-2008 standard.   Here are the
relevant clauses:

3. Floating-point formats
4. Attributes and rounding
5. Operations
6. Infinity, NaNs, and sign bit
7. Default exception handling
8. Alternate exception handling attributes
9. Recommended operations
10. Expression evaluation
11. Reproducible floating-point results

Clause 3 (Floating-point formats) defines five formats: 3 binary and 2
decimal.  Python supports a superset of decimal formats and a single
binary format.  Section 3.1.2 (Conformance) contains the following
provision: "A programming environment conforms to this standard, in a
particular radix, by implementing one or more of the basic formats of
that radix as both a supported arithmetic format and a supported
interchange format."  I would say Python is conforming to Clause 3.

Clause 4 (Attributes and rounding) is supported only by Decimal
through contexts: "For attribute specification, the implementation
shall provide language-defined means, such as compiler directives, to
specify a constant value for the attribute parameter for all standard
operations in a block; the scope of the attribute value is the block
with which it is associated."  I believe Decimal is mostly conforming,
but float is not conforming at all.

Clause 5 requires "[a]ll conforming implementations of this standard
shall provide the operations listed in this clause for all supported
arithmetic formats, except as stated below."  In other words, a
language standard that claims conformance with IEEE 754 must provide
all operations unless the standard states otherwise.  Let's try to map
IEEE 754 required operations to Python float operations.

5.3.1 General operations

sourceFormat roundToIntegralTiesToEven(source)
sourceFormat roundToIntegralTiesToAway(source)
sourceFormat roundToIntegralTowardZero(source)
sourceFormat roundToIntegralTowardPositive(source)
sourceFormat roundToIntegralTowardNegative(source)
sourceFormat roundToIntegralExact(source)

Python only provides float.__trunc__ which implements
roundToIntegralTowardZero.  (The builtin round() belongs to a
different category because it changes format from double to int.)
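
To make the mapping concrete, here is a rough sketch of how two of the
roundToIntegral operations could be emulated on top of the existing
builtins while keeping the float format. The helper names are invented
for this example; they are not part of Python or of the standard.

```python
import math


def round_ties_to_even(x: float) -> float:
    """Float-to-float rounding with ties going to the even neighbour."""
    if math.isnan(x) or math.isinf(x):
        return x  # round() would raise for these inputs
    # round() with one argument rounds half to even and returns an int;
    # copysign restores the sign of a zero result (e.g. -0.4 -> -0.0).
    return math.copysign(float(round(x)), x)


def round_toward_zero(x: float) -> float:
    """Float-to-float truncation built on math.trunc()."""
    if math.isnan(x) or math.isinf(x):
        return x
    return math.copysign(float(math.trunc(x)), x)


print(round_ties_to_even(2.5), round_ties_to_even(3.5))
print(round_toward_zero(-2.7))
```

The copysign step matters only for results that round to zero, where the
int returned by round() or math.trunc() cannot carry a sign.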

sourceFormat nextUp(source)
sourceFormat nextDown(source)

I don't think these are available for Python floats.

sourceFormat remainder(source, source) - float.__mod__

Not fully conforming.  For example, the standard requires
remainder(-2.0, 1.0) to return -0.0, but in Python 3.3:

>>> -2.0 % 1.0
0.0

On the other hand,

>>> math.fmod(-2.0, 1.0)
-0.0
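
Since 0.0 and -0.0 compare equal, the difference between the two results
is easy to miss. A small sketch using math.copysign makes the sign of
each zero visible:

```python
import math

mod_result = -2.0 % 1.0
fmod_result = math.fmod(-2.0, 1.0)

# The two zeros compare equal ...
zeros_compare_equal = mod_result == fmod_result

# ... but copysign() reveals that only fmod() kept the negative sign.
mod_sign = math.copysign(1.0, mod_result)
fmod_sign = math.copysign(1.0, fmod_result)

print(zeros_compare_equal, mod_sign, fmod_sign)
```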

sourceFormat minNum(source, source)
sourceFormat maxNum(source, source)
sourceFormat minNumMag(source, source)
sourceFormat maxNumMag(source, source)

I don't think these are available for Python floats.

5.3.3 logBFormat operations

I don't think these are available for Python floats.

5.4.1 Arithmetic operations

formatOf-addition(source1, source2) - float.__add__
formatOf-subtraction(source1, source2) - float.__sub__
formatOf-multiplication(source1, source2) - float.__mul__
formatOf-division(source1, source2) - float.__truediv__
formatOf-squareRoot(source1) - math.sqrt
formatOf-fusedMultiplyAdd(source1, source2, source3) - missing
formatOf-convertFromInt(int) - float.__new__

With the exception of fusedMultiplyAdd, Python float is conforming.


Python has a single builtin round().

5.5.1 Sign bit operations

sourceFormat copy(source) - float.__pos__
sourceFormat negate(source) - float.__neg__
sourceFormat abs(source) - float.__abs__
sourceFormat copySign(source, source) - math.copysign

Python float is conforming.

Now we are getting close to the issue at hand:
5.6.1 Comparisons
Implementations shall provide the following comparison operations, for
all supported floating-point operands of the same radix in arithmetic

boolean compareQuietEqual(source1, source2)
boolean compareQuietNotEqual(source1, source2)
boolean compareSignalingEqual(source1, source2)
boolean compareSignalingGreater(source1, source2)
boolean compareSignalingGreaterEqual(source1, source2)
boolean compareSignalingLess(source1, source2)
boolean compareSignalingLessEqual(source1, source2)
boolean compareSignalingNotEqual(source1, source2)
boolean compareSignalingNotGreater(source1, source2)
boolean compareSignalingLessUnordered(source1, source2)
boolean compareSignalingNotLess(source1, source2)
boolean compareSignalingGreaterUnordered(source1, source2)
boolean compareQuietGreater(source1, source2)
boolean compareQuietGreaterEqual(source1, source2)
boolean compareQuietLess(source1, source2)
boolean compareQuietLessEqual(source1, source2)
boolean compareQuietUnordered(source1, source2)
boolean compareQuietNotGreater(source1, source2)
boolean compareQuietLessUnordered(source1, source2)
boolean compareQuietNotLess(source1, source2)
boolean compareQuietGreaterUnordered(source1, source2)
boolean compareQuietOrdered(source1, source2).

Signaling comparisons are missing.  Ordered/Unordered comparisons are
missing.  Note that the standard does not require any particular
spelling for operations.  "In this standard, operations are written as
named functions; in a specific programming environment they might be
represented by operators, or by families of format-specific functions,
or by operations or functions whose names might differ from those in
this standard."  (Sec. 5.1)  It would be perfectly conforming for
python to spell compareSignalingEqual() as '==' and
compareQuietEqual() as math.eq() or even
ieee754_2008.compareQuietEqual().  The choice that Python made was not
dictated by the standard.  (As I have shown above, Python's %
operation does not implement a conforming IEEE 754 remainder(), but
math.fmod() seems to fill the gap.)
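
As a sketch of the "any spelling is allowed" point, some of the missing
quiet predicates could be written as ordinary functions on top of
math.isnan and the existing operators. The function names below are
invented for illustration, and the reading of compareQuietNotLess as
NOT(a < b) is my understanding of the standard rather than a quotation.

```python
import math


def compare_quiet_unordered(a: float, b: float) -> bool:
    """True when at least one operand is a NaN."""
    return math.isnan(a) or math.isnan(b)


def compare_quiet_ordered(a: float, b: float) -> bool:
    """True when neither operand is a NaN."""
    return not compare_quiet_unordered(a, b)


def compare_quiet_not_less(a: float, b: float) -> bool:
    """True when a >= b or the operands are unordered."""
    return not (a < b)


nan = float("nan")
print(compare_quiet_unordered(nan, 1.0))
print(compare_quiet_ordered(1.0, 2.0))
print(compare_quiet_not_less(nan, 1.0))
```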

This post is already too long, so I'll leave Clauses 6-11 for another
time.  "IEEE 754 may be more complex than you think!" (GvR, earlier in
this thread.)  I hope I already made the case that Python's float does
not conform to IEEE 754 and that IEEE 754 does not require an
operation spelled "==" or "float.__eq__" to return False when
comparing two NaNs.  The standard requires support for 22 comparison
operations, but Python's float supports around six.  On top of that,
Python has an operation that has no analogue in IEEE 754 - the "is"
comparison.   This is why IEEE 754 standard does not help in answering
the main question in this thread: should (x is y) imply (x == y)?  We
need to formulate a rationale for breaking this implication without a
reference to IEEE 754 or Tim's interpretation thereof.


Alexander Belopolsky

From alexander.belopolsky at  Tue Oct  9 03:37:47 2012
From: alexander.belopolsky at (Alexander Belopolsky)
Date: Mon, 8 Oct 2012 21:37:47 -0400
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 8, 2012 at 9:07 PM, Guido van Rossum <guido at> wrote:
> If we want to do *anything* I think we should first introduce a
> floating point context similar to the Decimal context. Then we can
> talk.

+float('inf')


From steve at  Tue Oct  9 04:03:27 2012
From: steve at (Steven D'Aprano)
Date: Tue, 9 Oct 2012 13:03:27 +1100
Subject: [Python-ideas] Make "is" checks on non-singleton literals errors
In-Reply-To: <>
References: <>
Message-ID: <20121009020327.GB27445@ando>

On Mon, Oct 08, 2012 at 12:48:07PM -0700, Guido van Rossum wrote:
> On Mon, Oct 8, 2012 at 12:44 PM, Mike Graham <mikegraham at> wrote:
> > I regularly see learners using "is" to check for string equality and
> > sometimes other equality. Due to optimizations, they often come away
> > thinking it worked for them.
> >
> > There are no cases where
> >
> >     if x is "foo":
> >
> > or
> >
> >    if x is 4:
> >
> > is actually the code someone intended to write.
> >
> > Although this has no benefit to anyone but new learners, it also
> > doesn't really do any harm.
> I think the best we can do is to make these SyntaxWarnings. I had the
> same thought recently and I do agree that these are common beginners
> mistakes that can easily hide bugs by succeeding in simple tests.

In my experience beginners barely read error messages, let alone
warnings.

A SyntaxWarning might help intermediate users who have graduated beyond 
the stage of "my program doesn't work, please somebody fix it", but I 
believe that at best it will be ignored by beginners, if not actively 
confuse them. And I expect that most intermediate users will have 
already learned enough not to use "is" when they mean "==".

So I'm -0 on doing anything to "fix" this. Many things in Python are 
potentially misleading:

array = [[0]*10]*10
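
For readers who have not met this one, a short sketch of why the line
above is misleading: the outer multiplication repeats a reference to one
inner list, so a write through any row shows up in all of them. The
comprehension at the end is one common way to get independent rows.

```python
array = [[0] * 10] * 10

# Every row is the same list object.
rows_are_shared = all(row is array[0] for row in array)

# Writing through one row is therefore visible in every row.
array[0][0] = 1
first_column = [row[0] for row in array]

# Building each row separately avoids the aliasing.
independent = [[0] * 10 for _ in range(10)]
independent[0][0] = 1
independent_first_column = [row[0] for row in independent]

print(rows_are_shared)
print(first_column)
print(independent_first_column)
```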

On the other hand, I must admit that I've been known to accidentally write
"if x is 0:", so perhaps the real benefit is to prevent silly brainos 
(like typos -- thinkos perhaps?) among more experienced coders. Perhaps 
I should increase my vote to +0.


From guido at  Tue Oct  9 04:09:53 2012
From: guido at (Guido van Rossum)
Date: Mon, 8 Oct 2012 19:09:53 -0700
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Mon, Oct 8, 2012 at 6:31 PM, Alexander Belopolsky
<alexander.belopolsky at> wrote:
> IEEE 754 standard does not help in answering
> the main question in this thread: should (x is y) imply (x == y)?  We
> need to formulate a rationale for breaking this implication without a
> reference to IEEE 754 or Tim's interpretation thereof.

Such a rationale exists in my mind. Since floats are immutable, an
implementation may or may not intern certain float values (just as
certain string and int values are interned but others are not).
Therefore, the fact that "x is y" says nothing about whether the
computations that produced x and y had anything to do with each other.
This is not true for mutable objects: if I have two lists, computed
separately, and find they are the same object, the computations that
produced them must have communicated somehow, or the same list was
passed in to each computations. So, since two computations might
return the same object without having followed the same computational
path, in another implementation the exact same computation might not
return the same object, and so the == comparison should produce the
same value in either case -- in particular, if x and y are both NaN,
all 6 comparisons on them should return False (given that in general
comparing two NaNs returns False regardless of the operator used).

The reason for invoking IEEE 754 here is that without it, Python might
well have grown a language-wide rule stating that an object should
*always* compare equal to itself, as there would have been no
significant counterexamples. (As it is, such a rule only exists for
containers, and technically even there it is optional -- it is just
not required for containers to invoke == for contained items that
reference the same object.)
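
The container behaviour mentioned in the parenthesis can be observed
directly. The sketch below reflects what CPython does; as noted above,
other implementations are not required to take the identity shortcut.

```python
nan = float("nan")

# The float itself never compares equal to itself.
float_equal = nan == nan

# Containers check identity before calling ==, so the very same
# object is found and two lists holding it compare equal.
found_same_object = nan in [nan]
lists_with_same_object = [nan] == [nan]

# A distinct NaN object gets no such shortcut.
found_other_object = float("nan") in [nan]
lists_with_distinct_objects = [float("nan")] == [float("nan")]

print(float_equal, found_same_object, lists_with_same_object)
print(found_other_object, lists_with_distinct_objects)
```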

--Guido van Rossum (

From steve at  Tue Oct  9 04:12:15 2012
From: steve at (Steven D'Aprano)
Date: Tue, 9 Oct 2012 13:12:15 +1100
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <20121009021215.GC27445@ando>

On Mon, Oct 08, 2012 at 11:54:06AM -0700, Guido van Rossum wrote:
> I don't like any of those; I'd vote for another regular method, maybe
> p.pathjoin(q).


Like list.listappend and dict.dictupdate perhaps? :-)

I'm never going to remember whether it is pathjoin or joinpath.

-1 on method names that repeat the type name.


From guido at  Tue Oct  9 04:14:37 2012
From: guido at (Guido van Rossum)
Date: Mon, 8 Oct 2012 19:14:37 -0700
Subject: [Python-ideas] Make "is" checks on non-singleton literals errors
In-Reply-To: <20121009020327.GB27445@ando>
References: <>
Message-ID: <>

On Mon, Oct 8, 2012 at 7:03 PM, Steven D'Aprano <steve at> wrote:
> On Mon, Oct 08, 2012 at 12:48:07PM -0700, Guido van Rossum wrote:
>> On Mon, Oct 8, 2012 at 12:44 PM, Mike Graham <mikegraham at> wrote:
>> > I regularly see learners using "is" to check for string equality and
>> > sometimes other equality. Due to optimizations, they often come away
>> > thinking it worked for them.
>> >
>> > There are no cases where
>> >
>> >     if x is "foo":
>> >
>> > or
>> >
>> >    if x is 4:
>> >
>> > is actually the code someone intended to write.
>> >
>> > Although this has no benefit to anyone but new learners, it also
>> > doesn't really do any harm.
>> I think the best we can do is to make these SyntaxWarnings. I had the
>> same thought recently and I do agree that these are common beginners
>> mistakes that can easily hide bugs by succeeding in simple tests.
> In my experience beginners barely read error messages, let alone
> warnings.
> A SyntaxWarning might help intermediate users who have graduated beyond
> the stage of "my program doesn't work, please somebody fix it", but I
> believe that at best it will be ignored by beginners, if not actively
> confuse them. And I expect that most intermediate users will have
> already learned enough not to use "is" when they mean "==".
> So I'm -0 on doing anything to "fix" this. Many things in Python are
> potentially misleading:
> array = [[0]*10]*10
> On the other hand, I must admit that I've been known to accidentally write
> "if x is 0:", so perhaps the real benefit is to prevent silly brainos
> (like typos -- thinkos perhaps?) among more experienced coders. Perhaps
> I should increase my vote to +0.

Exactly. Pragmatically, in large code bases this occurs frequently
enough to worry about it, and (unlike language warts like the aliasing
problem you alluded to above) it serves no useful purpose. I have seen
this particular mistake reported many times in Google's extensive
Python codebase.

Maybe we should do something more drastic and always create a new,
unique constant whenever a literal occurs as an argument of 'is' or
'is not'? Then such code would never work, leading people to examine
their code more closely. I betcha we have people who could change the
bytecode compiler easily enough to do that. (I'm not seriously
proposing this, except as a threat of what we could do if the
SyntaxWarning is rejected. :-)
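
For what it is worth, CPython 3.8 and newer do emit a SyntaxWarning when
compiling an identity check against a str or int literal. A minimal
sketch of observing it is below; the source snippet and file name are
made up, and the exact warning text varies between versions, so only the
warning category is inspected.

```python
import sys
import warnings

source = "x = 4\nresult = x is 4\n"

# Compile (but do not run) the snippet and record any warnings raised
# by the compiler.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    compile(source, "<sketch>", "exec")

syntax_warnings = [w for w in caught if issubclass(w.category, SyntaxWarning)]

# Older interpreters compile the snippet silently.
expects_warning = sys.version_info >= (3, 8)

for w in syntax_warnings:
    print(w.category.__name__, w.message)
```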

--Guido van Rossum (

From tjreedy at  Tue Oct  9 04:19:44 2012
From: tjreedy at (Terry Reedy)
Date: Mon, 08 Oct 2012 22:19:44 -0400
Subject: [Python-ideas] Propagating StopIteration value
In-Reply-To: <k4vfjl$i24$>
References: <k4q38d$j8e$> <>
	<k4tefg$52k$> <k4vfjl$i24$>
Message-ID: <k501kh$qbk$>

On 10/8/2012 5:12 PM, Serhiy Storchaka wrote:
> On 08.10.12 05:40, Terry Reedy wrote:
>> Serhiy, if you want a module of *generator* specific functions
>> ('gentools' ?), you should write one and submit it to pypi for testing.
> In there is proposed extending of
> itertools.chain to support generators (send(), throw() and close()
> methods). Is it wrong?


Terry Jan Reedy

From steve at  Tue Oct  9 04:26:53 2012
From: steve at (Steven D'Aprano)
Date: Tue, 9 Oct 2012 13:26:53 +1100
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <20121009022653.GD27445@ando>


- favourite & second-favourite choices: `p.join(q)` or `p.add(q)`.

- most disliked:  `p[q]`

`p[q]`  -1: looks like indexing or key-lookup, isn't either.

`p + q`  +0: potential confusion between path component joining and 
    filename suffix appending.

`p / q`  +0: looks funny if either arg is a literal string.

`p.join(q)`:  +1: self-explanatory, suggests os.path.join, I am not 
    convinced that Nick's fears about confusing Path.join and str.join 
    will be a problem in practice.

`p.pathjoin(q)` and `p.joinpath(q)`  -1: dislike repeating the type 
    name in the method name;  also, too easy to forget which one is

`p.add(q)`  +0.5: nice and short, but not quite self-explanatory.

`p.append(q)`  -0: suggests an in-place modification.


From steve at  Tue Oct  9 04:42:04 2012
From: steve at (Steven D'Aprano)
Date: Tue, 9 Oct 2012 13:42:04 +1100
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <20121009022653.GD27445@ando>
References: <> <20121009022653.GD27445@ando>
Message-ID: <20121009024204.GE27445@ando>

And I knew there was another suggestion tickling around in my 
subconscious... I have a new favourite:

`p & q`  +1: unlikely to be confused with int or set &; strings 
    do not currently use it; suggests concatenation; short, can 
    work with two paths or path and string if needed.

p + ".ext" to add a suffix to the file name; an error if p is a

"spam" + p should probably be an error. I can't think of a good use case
for prepending a string to a path.

p & q to concatenate (join) path q to path p.

p.add(q [, r, s, ...]) to concatenate multiple path components at
once, more efficient than p & q & r & ..., and to make the function more 
discoverable and searchable.
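
To see how these spellings would sit together, here is a toy sketch.
SketchPath is a hypothetical class written only for this message; it is
not the PEP 428 API, it always uses POSIX separators, and it skips
everything a real path type would need beyond joining.

```python
import posixpath


class SketchPath:
    """Hypothetical path type; not the PEP 428 API."""

    def __init__(self, text):
        self._text = str(text)

    def __str__(self):
        return self._text

    def __and__(self, other):
        # p & q: join q onto p as a new path component.
        return SketchPath(posixpath.join(self._text, str(other)))

    def __add__(self, suffix):
        # p + ".ext": append a suffix to the file name.
        if not isinstance(suffix, str):
            return NotImplemented
        return SketchPath(self._text + suffix)

    # No __radd__ is defined, so "spam" + p raises TypeError.

    def add(self, *parts):
        # p.add(q, r, ...): join several components in one call.
        return SketchPath(posixpath.join(self._text, *map(str, parts)))


p = SketchPath("/home/user")
joined = p & "docs"
with_suffix = (p & "notes") + ".ext"
joined_many = p.add("docs", "2012", "poll.txt")

try:
    "spam" + p
    prepend_failed = False
except TypeError:
    prepend_failed = True

print(joined, with_suffix, joined_many, prepend_failed)
```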


From ben+python at  Tue Oct  9 04:54:05 2012
From: ben+python at (Ben Finney)
Date: Tue, 09 Oct 2012 13:54:05 +1100
Subject: [Python-ideas] PEP 428: poll about the joining syntax
References: <>
Message-ID: <>

Antoine Pitrou <solipsis at>

> Since there has been some controversy about the joining syntax used in
> PEP 428 (filesystem path objects), I would like to run an informal poll
> about it. Please answer with +1/+0/-0/-1 for each proposal:

I hope you count U+2212 MINUS SIGN and not only U+002D HYPHEN-MINUS :-)

> - `p[q]` joins path q to path p

−1. Ugly and counter-intuitive. Bracket syntax is for accessing items of
a collection.

> - `p + q` joins path q to path p

+1. Works as I'd expect it to work, and is easily discovered.

> - `p / q` joins path q to path p

−1. “/” as a Python operator strongly connotes “division”, and this
isn't it.

> - `p.join(q)` joins path q to path p

+1. Explicit and clear.

> (you can include a rationale if you want, but don't forget to vote :-))

Thanks for the poll.

 \            “The whole area of [treating source code as intellectual |
  `\    property] is almost assuring a customer that you are not going |
_o__)              to do any innovation in the future.” --Gary Barnett |
Ben Finney

From steve at  Tue Oct  9 05:11:09 2012
From: steve at (Steven D'Aprano)
Date: Tue, 9 Oct 2012 14:11:09 +1100
Subject: [Python-ideas] History stepping in interactive session?
In-Reply-To: <>
References: <>
Message-ID: <20121009031109.GF27445@ando>

On Mon, Oct 08, 2012 at 05:12:24PM +0900, Stephen J. Turnbull wrote:
> Andy Buckley writes:
>  > So one last question, in case it is an acceptable python-ideas topic:
>  > how about adding readline-like support by default in the
>  > interpreter?
> If readline-like support is available on the system, it's used.
> However, it's apparently only readline-like.  For example, on Mac OS
> X, the BSD-licensed libedit readline emulation is used by default, it
> appears.  I wouldn't expect full functionality there.
> On GNU/Linux systems, as I wrote, True GNU readline is used.  Why this
> particular function isn't bound or doesn't work right, I don't know
> offhand.  It is apparently a bug (my Python sources are from April,
> but I can't see why this would change), since the sources say
> (ll. 927-931 of Modules/readline.c):

I thought so too, but apparently the behaviour being talked about is a 
bash extension to readline. Adding it to Python would be a feature 
request, not a bug fix.

While it's a useful feature, I think that it's probably something which 
can distinguish the vanilla Python interactive interpreter from more 
advanced environments like iPython, which apparently already has it.


From casevh at  Tue Oct  9 05:38:13 2012
From: casevh at (Case Van Horsen)
Date: Mon, 8 Oct 2012 20:38:13 -0700
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 8, 2012 at 6:37 PM, Alexander Belopolsky
<alexander.belopolsky at> wrote:
> On Mon, Oct 8, 2012 at 9:07 PM, Guido van Rossum <guido at> wrote:
>> If we want to do *anything* I think we should first introduce a
>> floating point context similar to the Decimal context. Then we can
>> talk.
> +float('inf')

I implemented a floating point context manager for gmpy2 and the MPFR
floating point library. By default, it enables a non-stop mode where
infinities and NaN are returned but you can also raise exceptions. You
can experiment with gmpy2:

Some examples

>>> import gmpy2
>>> gmpy2.get_context()
context(precision=53, real_prec=Default, imag_prec=Default,
        round=RoundToNearest, real_round=Default, imag_round=Default,
        emax=1073741823, emin=-1073741823,
        trap_underflow=False, underflow=False,
        trap_overflow=False, overflow=False,
        trap_inexact=False, inexact=False,
        trap_invalid=False, invalid=False,
        trap_erange=False, erange=False,
        trap_divzero=False, divzero=False,
>>> gmpy2.log(0)
>>> gmpy2.get_context()
context(precision=53, real_prec=Default, imag_prec=Default,
        round=RoundToNearest, real_round=Default, imag_round=Default,
        emax=1073741823, emin=-1073741823,
        trap_underflow=False, underflow=False,
        trap_overflow=False, overflow=False,
        trap_inexact=False, inexact=False,
        trap_invalid=False, invalid=False,
        trap_erange=False, erange=False,
        trap_divzero=False, divzero=True,
>>> gmpy2.get_context().clear_flags()
>>> gmpy2.get_context().trap_divzero=True
>>> gmpy2.log(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
gmpy2.DivisionByZeroError: 'mpfr' division by zero in log()
>>> gmpy2.set_context(gmpy2.context())
>>> gmpy2.nan()==gmpy2.nan()
>>> gmpy2.get_context()
context(precision=53, real_prec=Default, imag_prec=Default,
        round=RoundToNearest, real_round=Default, imag_round=Default,
        emax=1073741823, emin=-1073741823,
        trap_underflow=False, underflow=False,
        trap_overflow=False, overflow=False,
        trap_inexact=False, inexact=False,
        trap_invalid=False, invalid=False,
        trap_erange=False, erange=True,
        trap_divzero=False, divzero=False,
>>> gmpy2.get_context().trap_erange=True
>>> gmpy2.nan()==gmpy2.nan()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
gmpy2.RangeError: comparison with NaN

Standard disclaimers:

* I'm the maintainer of gmpy2.

* Please use SVN or beta2 (when it is released) to avoid a couple of
embarrassing bugs. :(


From stephen at  Tue Oct  9 05:42:03 2012
From: stephen at (Stephen J. Turnbull)
Date: Tue, 09 Oct 2012 12:42:03 +0900
Subject: [Python-ideas] History stepping in interactive session?
In-Reply-To: <20121009031109.GF27445@ando>
References: <>
Message-ID: <>

Steven D'Aprano writes:

 > I thought so too, but apparently the behaviour being talked about is a 
 > bash extension to readline. Adding it to Python would be a feature 
 > request, not a bug fix.

In that case, I think it's unfortunate that Python doesn't provide a
way to warn about unimplemented stuff in .inputrc.  Both on my Mac and
on my Gentoo system, C-o simply does nothing.

From guido at  Tue Oct  9 06:13:58 2012
From: guido at (Guido van Rossum)
Date: Mon, 8 Oct 2012 21:13:58 -0700
Subject: [Python-ideas] History stepping in interactive session?
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 8, 2012 at 8:42 PM, Stephen J. Turnbull <stephen at> wrote:
> Steven D'Aprano writes:
>  > I thought so too, but apparently the behaviour being talked about is a
>  > bash extension to readline. Adding it to Python would be a feature
>  > request, not a bug fix.
> In that case, I think it's unfortunate that Python doesn't provide a
> way to warn about unimplemented stuff in .inputrc.  Both on my Mac and
> on my Gentoo system, C-o simply does nothing.

Please do file a bug about this. Python's interface to readline is
pretty old, I wouldn't be surprised if more functionality could be
added. Regarding operate-and-get-next, I searched for "gnu readline
operate-and-get-next" and found some feature requests about it for
Sage and IPython (not sure of the status there), plus an explanation
of why it's not part of GNU readline: it needs to be implemented by
the calling app because only the latter knows what constitutes a
complete statement. I think either of these would probably be a fun
project for an aspiring core developer interested in improving their C

--Guido van Rossum (

From steve at  Tue Oct  9 06:16:16 2012
From: steve at (Steven D'Aprano)
Date: Tue, 9 Oct 2012 15:16:16 +1100
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <20121009041613.GG27445@ando>

On Sun, Oct 07, 2012 at 10:35:17PM -0400, Ned Batchelder wrote:

> A sentence in section 5.4 (Numeric Types) would help.  Something like, 
> "In accordance with the IEEE 754 standard, NaN's are not equal to any 
> value, even another NaN.  This is because NaN doesn't represent a 
> particular number, it represents an unknown result, and there is no way 
> to know if one unknown result is equal to another unknown result."

NANs don't quite mean "unknown result". If they did they would probably 
be called "MISSING" or "UNKNOWN" or "NA" (Not Available).

NANs represent a calculation result which is Not A Number. Hence the 
name :-) Since we're talking about the mathematical domain here, a 
numeric calculation that doesn't return a numeric result could be said 
to have no result at all: there is no real-valued x for which x**2 == 
-1, hence sqrt(-1) can return a NAN.

It certainly doesn't mean "well, there is an answer, but I don't know 
what it is". It means "I know that there is no answer".
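
A side note on how this looks in Python itself, as a small sketch: the
math module reports the "no answer in the reals" case by raising rather
than by returning a NAN, while plain float arithmetic such as inf - inf
does produce one. The cmath line is included only to show that an answer
exists once the domain is widened to complex numbers.

```python
import cmath
import math

# math.sqrt() refuses a negative argument instead of returning a NAN.
try:
    math.sqrt(-1.0)
    sqrt_raised = False
except ValueError:
    sqrt_raised = True

# Float arithmetic with no meaningful result yields a NAN quietly.
inf = float("inf")
difference = inf - inf
difference_is_nan = math.isnan(difference)

# In the complex domain the square root does exist.
complex_root = cmath.sqrt(-1)

print(sqrt_raised, difference_is_nan, complex_root)
```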

Since neither sqrt(-1) nor sqrt(-2) exist in the reals, we cannot say 
that they are equal. If we did, we could prove anything:

sqrt(-1) = sqrt(-2)

Square both sides:

-1 = -2

I was not on the IEEE committee, so I can't speak for them, but my guess 
is that they reasoned that since there are an infinite number of "no 
result" not-a-number calculations, but only a finite number of NAN bit 
patterns available to be used for them, it isn't even safe to presume 
that two NANs with the same bit pattern are equal since they may have 
come from completely different calculations.

Of course this was before object identity was a relevant factor. As I've 
stated before, I think that having collections choose to optimize away 
equality tests using object identity is fine. If I need a tuple that 
honours NAN semantics, I can subclass tuple to get one. I shouldn't 
expect the default tuple behaviour to carry that cost.
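
A minimal sketch of the subclass idea, assuming the goal is simply to
compare element by element with == and skip the identity shortcut. The
name StrictTuple is made up for illustration; nothing like it exists in
the standard library.

```python
class StrictTuple(tuple):
    """Hypothetical tuple subclass whose equality never uses identity."""

    def __eq__(self, other):
        if not isinstance(other, tuple):
            return NotImplemented
        # Compare with == only, so a NAN element makes the tuples unequal.
        return len(self) == len(other) and all(
            a == b for a, b in zip(self, other))

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    # Defining __eq__ resets __hash__ to None, so restore it explicitly.
    __hash__ = tuple.__hash__


nan = float('nan')
plain = (1.0, nan)
strict = StrictTuple(plain)
print(plain == plain)    # built-in tuple: identity shortcut on elements
print(strict == strict)  # element-wise ==, so the NAN breaks equality
```

The cost mentioned above shows up here as a Python-level loop on every
comparison, which is why it seems reasonable to keep it opt-in.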

By the way, NANs are awesome and don't get anywhere near enough respect. 
Here's a great idea from the D language:


From steve at  Tue Oct  9 06:32:36 2012
From: steve at (Steven D'Aprano)
Date: Tue, 9 Oct 2012 15:32:36 +1100
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <20121009043236.GI27445@ando>

On Mon, Oct 08, 2012 at 04:39:52PM -0400, Ned Batchelder wrote:

> How about:
> "In accordance with the IEEE 754 standard, when NaNs are compared to any 
> value, even another NaN, the result is always False, regardless of the 
> comparison.  This is because NaN represents an unknown result.  There is no 
> way to know the relationship between an unknown result and any other 
> result, especially another unknown one.  Even comparing a NaN to itself 
> always produces False."

Two issues:

1) It is not the case that NaN <comp> NaN is always false.

2) "invalid result" is more appropriate than "unknown result".


From steve at  Tue Oct  9 06:35:56 2012
From: steve at (Steven D'Aprano)
Date: Tue, 9 Oct 2012 15:35:56 +1100
Subject: [Python-ideas] History stepping in interactive session?
In-Reply-To: <>
References: <>
Message-ID: <20121009043556.GJ27445@ando>

On Mon, Oct 08, 2012 at 09:13:58PM -0700, Guido van Rossum wrote:
> On Mon, Oct 8, 2012 at 8:42 PM, Stephen J. Turnbull <stephen at> wrote:
> > Steven D'Aprano writes:
> >
> >  > I thought so too, but apparently the behaviour being talked about is a
> >  > bash extension to readline. Adding it to Python would be a feature
> >  > request, not a bug fix.
> >
> > In that case, I think it's unfortunate that Python doesn't provide a
> > way to warn about unimplemented stuff in .inputrc.  Both on my Mac and
> > on my Gentoo system, C-o simply does nothing.
> Please do file a bug about this. Python's interface to readline is
> pretty old, I wouldn't be surprised if more functionality could be
> added. 

The time machine strikes again:


From steve at  Tue Oct  9 06:26:35 2012
From: steve at (Steven D'Aprano)
Date: Tue, 9 Oct 2012 15:26:35 +1100
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <20121009042635.GH27445@ando>

On Mon, Oct 08, 2012 at 09:29:42AM -0700, Guido van Rossum wrote:

> It's not about equality. If you ask whether two NaNs are *unequal* the
> answer is *also* False.

Not so. I think you are conflating NAN equality/inequality with ordering 
comparisons. Using Python 3.3:

py> nan = float('nan')
py> nan > 0
False
py> nan < 0
False
py> nan == 0
False
py> nan != 0
True

but:

py> nan == nan
False
py> nan != nan
True


From greg.ewing at  Tue Oct  9 07:08:18 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 09 Oct 2012 18:08:18 +1300
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

Nick Coghlan wrote:
> Huh? It's a tree structure. A subpath lives inside its parent path,
> just as subnodes are children of their parent node.

You're confusing the path, which is a name, with the
object that it names. It's called a path because it's
the route that you follow from the root to reach the
node being named. To reach a subnode of N requires
following a *longer* path than you did to reach N.
There's no sense in which the *path* to the subnode
is "contained" within the path to N -- rather it's
the other way around.


From ben at  Tue Oct  9 07:12:51 2012
From: ben at (Ben Darnell)
Date: Mon, 8 Oct 2012 22:12:51 -0700
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 8, 2012 at 8:30 AM, Guido van Rossum <guido at> wrote:
>> It's a Future constructor, a (conditional) add_done_callback, plus the
>> calls to set_result or set_exception and the with statement for error
>> handling.  In full:
>> def future_wrap(f):
>>     @functools.wraps(f)
>>     def wrapper(*args, **kwargs):
>>         future = Future()
>>         if kwargs.get('callback') is not None:
>>             future.add_done_callback(kwargs.pop('callback'))
>>         kwargs['callback'] = future.set_result
>>         def handle_error(typ, value, tb):
>>             future.set_exception(value)
>>             return True
>>         with ExceptionStackContext(handle_error):
>>             f(*args, **kwargs)
>>         return future
>>     return wrapper
> Hmm... I *think* it automatically adds a special keyword 'callback' to
> the *call* site so that you can do things like
>   fut = some_wrapped_func(blah, callback=my_callback)
> and then instead of using yield to wait for the callback, put the
> continuation of your code in the my_callback() function.

Yes.  Note that if you're passing in a callback you're probably going
to just ignore the return value.  The callback argument and the future
return value are essentially two alternative interfaces; it probably
doesn't make sense to use both at once (but as a library author it's
useful to provide both).
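
A self-contained variant of the quoted decorator may make the two
interfaces easier to see. This is only a sketch: it uses
concurrent.futures.Future and a plain try/except in place of
ExceptionStackContext, so it only catches errors raised synchronously,
and the add() function is a made-up example.

```python
import functools
from concurrent.futures import Future


def future_wrap(f):
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        future = Future()
        # Caller-facing interface 1: an optional callback argument.
        callback = kwargs.pop('callback', None)
        if callback is not None:
            future.add_done_callback(callback)
        # The wrapped function only ever sees future.set_result.
        kwargs['callback'] = future.set_result
        try:
            f(*args, **kwargs)
        except Exception as exc:  # stand-in for ExceptionStackContext
            future.set_exception(exc)
        # Caller-facing interface 2: the returned Future.
        return future
    return wrapper


@future_wrap
def add(a, b, callback):
    callback(a + b)


print(add(1, 2).result())
add(2, 3, callback=lambda fut: print(fut.result()))
```

Note that a callback registered with add_done_callback receives the
Future itself rather than the bare result.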

> But it also
> seems like it passes callback=future.set_result as the callback to the
> wrapped function, which looks to me like that function was apparently
> written before Futures were widely used. This seems pretty impure to
> me and I'd like to propose a "future" where such functions either be
> given the Future where the result is expected, or (more commonly) the
> function would create the Future itself.

Yes, it's impure and based on pre-Future patterns.  The caller's
callback argument and the inner function's callback are not really related
any more (they were the same in pre-Future async code of course).
They should probably have different names, although if the inner
function's return value were passed via exception (StopIteration or
return) the inner callback argument can just go away.

> Unless I'm totally missing the programming model here.
> PS. I'd like to learn more about ExceptionStackContext() -- I've
> struggled somewhat with getting decent tracebacks in NDB.

StackContext doesn't quite give you better tracebacks, although I
think it could be adapted to do that.  ExceptionStackContext is
essentially a try/except block that follows you around across
asynchronous operations - on entry it sets a thread-local state, and
all the tornado asynchronous functions know to save this state when
they are passed a callback, and restore it when they execute it.  This
has proven to be extremely helpful in ensuring that all exceptions get
caught by something that knows how to do the appropriate cleanup (i.e.
an asynchronous web page serves an error instead of just spinning
forever), although it has turned out to be a little more intrusive and
magical than I had originally anticipated.

>>>> In Tornado the Future is created by a decorator
>>>> and hidden from the asynchronous function (it just sees the callback),
>>> Hm, interesting. NDB goes the other way, the callbacks are mostly used
>>> to make Futures work, and most code (including large swaths of
>>> internal code) uses Futures. I think NDB is similar to monocle here.
>>> In NDB, you can do
>>>   f = <some function returning a Future>
>>>   r = yield f
>>> where "yield f" is mostly equivalent to f.result(), except it gives
>>> better opportunity for concurrency.
>> Yes, tornado's gen.engine does the same thing here.  However, the
>> stakes are higher than "better opportunity for concurrency" - in an
>> event loop if you call future.result() without yielding, you'll
>> deadlock if that Future's task needs to run on the same event loop.
> That would depend on the semantics of the event loop implementation.
> In NDB's event loop, such a .result() call would just recursively
> enter the event loop, and you'd only deadlock if you actually have two
> pieces of code waiting for each other's completion.

Hmm, I think I'd rather deadlock. :)  If the event loop is reentrant
then the application code has to be coded defensively as if it were
preemptively multithreaded, which introduces the possibility of
deadlock or (probably) more subtle/less frequent errors.  Reentrancy
has been a significant problem in my experience, so I've been moving
towards a policy where methods in Tornado that take a callback never
run it immediately; callbacks are always scheduled on the next
iteration of the IOLoop with IOLoop.add_callback.
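
As an illustration of that policy, here is a toy loop where a function
taking a callback never runs it inline, even when the answer is already
available. TinyLoop and fetch_cached are invented for this sketch and
only mimic the shape of IOLoop.add_callback.

```python
from collections import deque


class TinyLoop:
    """Toy event loop: callbacks run only from run_once()."""

    def __init__(self):
        self._callbacks = deque()

    def add_callback(self, callback, *args):
        self._callbacks.append((callback, args))

    def run_once(self):
        # Run only what was queued before this iteration started.
        for _ in range(len(self._callbacks)):
            callback, args = self._callbacks.popleft()
            callback(*args)


def fetch_cached(loop, key, callback, _cache={'a': 1}):
    # The value is already known, but the callback is still deferred,
    # so the caller's code after this call always runs first.
    loop.add_callback(callback, _cache.get(key))


loop = TinyLoop()
order = []
fetch_cached(loop, 'a', lambda value: order.append(('callback', value)))
order.append('after call')
loop.run_once()
print(order)
```

The caller can then rely on its own statements finishing before any
callback fires, which removes one source of reentrancy surprises.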

> [...]
>>> I am currently trying to understand if using "yield from" (and
>>> returning a value from a generator) will simplify things. For example
>>> maybe the need for a special decorator might go away. But I keep
>>> getting headaches -- perhaps there's a Monad involved. :-)
>> I think if you build generator handling directly into the event loop
>> and use "yield from" for calls from one async function to another then
>> you can get by without any decorators.  But I'm not sure if you can do
>> that and maintain any compatibility with existing non-generator async
>> code.
>> I think the ability to return from a generator is actually a bigger
>> deal than "yield from" (and I only learned about it from another
>> python-ideas thread today).  The only reason a generator decorated
>> with @tornado.gen.engine needs a callback passed in to it is to act as
>> a pseudo-return, and a real return would prevent the common mistake of
>> running the callback then falling through to the rest of the function.
> Ah, so you didn't come up with the clever hack of raising an exception
> to signify the return value. In NDB, you raise StopIteration (though
> it is given the alias 'Return' for clarity) with an argument, and the
> wrapper code that is responsible for the Future takes the value from
> the StopIteration exception and passes it to the Future's
> set_result().

I think I may have thought about "raise Return(x)" and dismissed it as
too weird.  But then, I'm abnormally comfortable with asynchronous
code that passes callbacks around.
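
For comparison, here is roughly how a driver can take the value carried
by StopIteration and hand it to a Future. This sketch assumes Python
3.3+, where "return x" in a generator raises StopIteration with
.value set; run() and resolved() are hypothetical helpers, not NDB or
Tornado APIs.

```python
from concurrent.futures import Future


def run(gen):
    """Drive a generator that yields Futures; return a Future for its result."""
    outer = Future()

    def step(advance, arg):
        try:
            yielded = advance(arg)
        except StopIteration as stop:
            outer.set_result(stop.value)   # the generator's return value
        except Exception as exc:
            outer.set_exception(exc)
        else:
            yielded.add_done_callback(resume)

    def resume(done):
        exc = done.exception()
        if exc is None:
            step(gen.send, done.result())
        else:
            step(gen.throw, exc)           # re-raise inside the generator

    step(gen.send, None)
    return outer


def resolved(value):
    future = Future()
    future.set_result(value)
    return future


def add_async(a, b):
    x = yield resolved(a)
    y = yield resolved(b)
    return x + y   # replaces both callback(x + y) and raise Return(x + y)


print(run(add_async(2, 3)).result())
```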

>> For concreteness, here's a crude sketch of what the APIs I'm talking
>> about would look like in use (in a hypothetical future version of
>> tornado).
>> @future_wrap
>> @gen.engine
>> def async_http_client(url, callback):
>>     parsed_url = urlparse.urlsplit(url)
>>     # works the same whether the future comes from a thread pool or @future_wrap
> And you need the thread pool because there's no async version of
> getaddrinfo(), right?


>>     addrinfo = yield g_thread_pool.submit(socket.getaddrinfo, parsed_url.hostname, parsed_url.port)
>>     stream = IOStream(socket.socket())
>>     yield stream.connect((addrinfo[0][-1]))
>>     stream.write('GET %s HTTP/1.0' % parsed_url.path)
> Why no yield in front of the write() call?

Because we don't need to wait for the write to complete before we
continue to the next statement.  write() doesn't return anything; it
just succeeds or fails, and if it fails the next read_until will fail
too. (although in this case it wouldn't hurt to have the yield either)

>>     header_data = yield stream.read_until('\r\n\r\n')
>>     headers = parse_headers(header_data)
>>     body_data = yield stream.read_bytes(int(headers['Content-Length']))
>>     stream.close()
>>     callback(body_data)
>> # another function to demonstrate composability
>> @future_wrap
>> @gen.engine
>> def fetch_some_urls(url1, url2, url3, callback):
>>     body1 = yield async_http_client(url1)
>>     # yield a list of futures for concurrency
>>     future2 = yield async_http_client(url2)
>>     future3 = yield async_http_client(url3)
>>     body2, body3 = yield [future2, future3]
>>     callback((body1, body2, body3))
> This second one is nearly identical to the way it's done in NDB.
> However I think you have a typo -- I doubt that there should be yields
> on the lines creating future2 and future3.


>> One hole in this design is how to deal with callbacks that are run
>> multiple times.  For example, the IOStream read methods take both a
>> regular callback and an optional streaming_callback (which is called
>> with each chunk of data as it arrives).  I think this needs to be
>> modeled as something like an iterator of Futures, but I haven't worked
>> out the details yet.
> Ah. Yes, that's a completely different kind of thing, and probably
> needs to be handled in a totally different way. I think it probably
> needs to be modeled more like an infinite loop where at the blocking
> point (e.g. a low-level read() or accept() call) you yield a Future.
> Although I can see that this doesn't work well with the IOLoop's
> concept of file descriptor (or other event source) registration.

It works just fine at the IOLoop level:  you call
IOLoop.add_handler(fd, func, READ), and you'll get read events
whenever there's new data until you call remove_handler(fd) (or
update_handler).  If you're passing callbacks around explicitly it's
pretty straightforward (as much as anything ever is in that style) to
allow for those callbacks to be run more than once.  The problem is
that generators more or less require that each callback be run exactly
once.  That's a generally desirable property, but the mismatch between
the two layers can be difficult to deal with.


From greg.ewing at  Tue Oct  9 07:18:20 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 09 Oct 2012 18:18:20 +1300
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<> <>
Message-ID: <>

Massimo DiPierro wrote:
> The fact that string paths in Unix use the / to represent concatenation 
> is accidental. 

Maybe so, but it can be regarded as a fortuitous accident, since
/ also happens to be an operator in Python, so it would have
mnemonic value to Unix users.

The correspondence is not exact for Windows users, but / is similar
enough to still have some mnemonic value for them. And all the
OSes using other separators seem to have died out.


From greg.ewing at  Tue Oct  9 07:31:09 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 09 Oct 2012 18:31:09 +1300
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<> <>
Message-ID: <>

Massimo DiPierro wrote:
> The + symbol means addition and union of disjoint sets. A path 
> (including a fs path) is a set of links (for a fs path, a link is a 
> folder name). Using the + symbols has a natural interpretation as 
> concatenation of subpaths (sets) to form a longer path (superset).

A reason *not* to use '+' is that it would violate associativity
in some cases, e.g.

    (path + "foo") + "bar"

would not be the same as

    path + ("foo" + "bar")

Using '/', or any other operator not currently defined on strings,
would prevent this mistake from occurring.
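
A small sketch of the associativity problem. PlusPath is a throwaway
class invented here to show what '+' joining would do. The '/' half
uses pathlib.PurePosixPath, the module that later grew out of PEP 428,
so it postdates this thread.

```python
from pathlib import PurePosixPath


class PlusPath:
    """Hypothetical path type that joins components with '+'."""

    def __init__(self, text):
        self.text = text

    def __add__(self, other):
        return PlusPath(self.text.rstrip('/') + '/' + str(other))

    def __str__(self):
        return self.text


path = PlusPath('/root')
print((path + "foo") + "bar")   # two components
print(path + ("foo" + "bar"))   # str + str ran first: one component

base = PurePosixPath('/root')
print(base / "foo" / "bar")
try:
    base / ("foo" / "bar")      # str / str is not defined
except TypeError as exc:
    print("TypeError:", exc)
```

With '/', regrouping the operands the wrong way fails loudly instead of
silently building a different path.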

A reason to want an operator is the symmetry of path concatenation.
Symmetrical operations deserve a symmetrical syntax, and to achieve
that in Python you need either an operator or a stand-alone function.

A reason to prefer an operator over a function is associativity.
It would be nice to be able to write

    path1 / path2 / path3

and not have to think about the order in which the operations are
being done.

If '/' is considered too much of a stretch, how about '&'? It
suggests a kind of addition or concatenation, and in fact is
used for string concatenation in some other languages.


From alexander.belopolsky at  Tue Oct  9 07:32:12 2012
From: alexander.belopolsky at (Alexander Belopolsky)
Date: Tue, 9 Oct 2012 01:32:12 -0400
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <20121009041613.GG27445@ando>
References: <>
	<> <20121009041613.GG27445@ando>
Message-ID: <>

On Tue, Oct 9, 2012 at 12:16 AM, Steven D'Aprano <steve at> wrote:
> NANs don't quite mean "unknown result". If they did they would probably
> be called "MISSING" or "UNKNOWN" or "NA" (Not Available).
> NANs represent a calculation result which is Not A Number. Hence the
> name :-)

This is quite true, but in Python "Not A Number" is spelled None.  In
many aspects, None is like signaling NaN - any numerical operation on
it results in a type error, but None == None is True.

> Since neither sqrt(-1) nor sqrt(-2) exist in the reals, we cannot say
> that they are equal. If we did, we could prove anything:
> sqrt(-1) = sqrt(-2)
> Square both sides:
> -1 = -2

This is a typical mathematical fallacy where a progression of
seemingly equivalent equations contains an invalid operation.  See

This is not an argument to make nan == nan false.  The IEEE 754
argument goes as follows: in the domain of 2**64 bit patterns most
patterns represent real numbers, some represent infinities and some do
not represent either infinities or numbers.  Boolean comparison
operations are defined on the entire domain,  but <, =, or > outcomes
are not exclusive if NaNs are present.  The fourth outcome is
"unordered."  In other words for any two patterns x and y one and only
one of the following is true: x < y or x = y or x > y or x and y are
unordered.  If x is NaN, it compares as unordered to any other pattern
including itself.   This explains why compareQuietEqual(x, x) is false
when x is NaN.  In this case, x is unordered with itself, unordered is
different from equal, so  compareQuietEqual(x, x) cannot be true.  It
cannot raise an exception either because it has to be quiet.  Thus the
only correct result is to return false.
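
The four-outcome view can be written down directly. This is just an
illustrative sketch; compare() and quiet_equal() are names made up
here, with quiet_equal standing in for the standard's
compareQuietEqual.

```python
import math


def compare(x, y):
    """Return exactly one of 'less', 'equal', 'greater', 'unordered'."""
    if math.isnan(x) or math.isnan(y):
        return 'unordered'
    if x < y:
        return 'less'
    if x > y:
        return 'greater'
    return 'equal'


def quiet_equal(x, y):
    # Never raises; true only when the single outcome is 'equal'.
    return compare(x, y) == 'equal'


nan = float('nan')
print(compare(1.0, 2.0), compare(0.0, -0.0), compare(nan, nan))
print(quiet_equal(nan, nan))
```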

The problem that we have in Python is that float.__eq__ is used for
too many different things and compareQuietEqual is not always
appropriate. Here is a partial list:

1. x == y
2. x in [y]
3. {y:1}[x]
4. x in {y}
5. [y].index(x)

In python 3, we already took a step away from using the same notion of
equality in all these cases.  Thus in #2, we use x is y or x == y
instead of plain x == y.  But that leads to some strange results:

>>> x = float('nan')
>>> x in [x]
True
>>> float('nan') in [float('nan')]
False

An alternative would be to define x in l as any(isnan(x) and isnan(y)
or x == y for y in l) when x and all elements of l are floats.  Again,
I am not making a change proposal - just mention a possibility.
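
For concreteness, the alternative membership test could look like this.
It is only a sketch of the possibility mentioned above, with
float_contains as a made-up helper name, and it assumes every element
is a float.

```python
from math import isnan


def float_contains(x, items):
    """Membership where any NaN matches any other NaN."""
    return any((isnan(x) and isnan(y)) or x == y for y in items)


x = float('nan')
print(x in [x], float('nan') in [float('nan')])
print(float_contains(x, [x]), float_contains(float('nan'), [float('nan')]))
```

Under this definition the answer no longer depends on whether the two
NaNs happen to be the same object.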

From greg.ewing at  Tue Oct  9 07:36:48 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 09 Oct 2012 18:36:48 +1300
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

Antoine Pitrou wrote:

> - `p[q]` joins path q to path p

-1, confuses operation on a path with operation on the object named by the path.

> - `p + q` joins path q to path p

-0.9, interacts with string concatenation in undesirable ways

> - `p / q` joins path q to path p


> - `p.join(q)` joins path q to path p

-0.9, 'append' would be clearer IMO


From greg.ewing at  Tue Oct  9 07:41:46 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 09 Oct 2012 18:41:46 +1300
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<> <>
Message-ID: <>

Antoine Pitrou wrote:

> But you really want a short method name, otherwise it's better to have
> a dedicated operator.  joinpath() definitely doesn't cut it, IMO.

I agree, it's far too longwinded. It would clutter your code
just as badly as using os.path.join() all over the place does
now, but without the option of aliasing it to a shorter name.


From greg.ewing at  Tue Oct  9 07:48:23 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 09 Oct 2012 18:48:23 +1300
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

Stefan Krah wrote:
> '^' or '@' are used for concatenation in some languages. At least accidental
> confusion with xor is pretty unlikely.

We'd have to add '@' as a new operator before we could use that.

But '^' might have possibilities... if you squint, it looks a
bit like a compromise between Unix and Windows path separators. :-)


From greg.ewing at  Tue Oct  9 07:56:15 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 09 Oct 2012 18:56:15 +1300
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

Mark Adam wrote:
> 1) event handlers for the machine-program interface (ex. network I/O)
> 2) event handlers for the program-user interface (ex. mouse I/O)
> While similar, my gut tells me they have to be handled in completely
> different ways in order to preserve order (i.e. sanity).

They can't be *completely* different, because deep down there
has to be a single event loop that can handle all kinds of
asynchronous events.

Upper layers can provide different APIs for them, but there
has to be some commonality in the lowest layers.


From greg.ewing at  Tue Oct  9 08:02:35 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 09 Oct 2012 19:02:35 +1300
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<> <>
Message-ID: <>

Nick Coghlan wrote:

> Moving from "os.path.join(a, b, c, d, e)" (or, the way I often write
> it, "joinpath(a, b, c, d, e)") to "a.joinpath(b, c, d, e)" at least
> isn't going backwards, and is more obvious in isolation than "a / b /
> c / d / e".

I think we should keep in mind that we're (hopefully) not going
to see things like "a / b / c / d / e" in real-life code. Rather
we're going to see things like

    backupath = destdir / "archive" / filename + ".bak"

In other words, there should be some clue from the names
that paths are involved, from which it should be fairly
easy to guess what the "/" means.
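
One caveat worth a sketch: '/' binds more tightly than '+', so the
expression above groups as (destdir / "archive" / filename) + ".bak".
The example below uses pathlib.PurePosixPath, the module that
eventually came out of PEP 428, which defines no '+' on paths; the
draft under discussion here may have behaved differently.

```python
from pathlib import PurePosixPath

destdir = PurePosixPath('/backups')
filename = 'notes.txt'

try:
    backup_path = destdir / 'archive' / filename + '.bak'
except TypeError as exc:
    print('unparenthesized form fails:', exc)

# Parenthesize the string concatenation so it happens first.
backup_path = destdir / 'archive' / (filename + '.bak')
print(backup_path)
```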


From greg.ewing at  Tue Oct  9 08:11:18 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 09 Oct 2012 19:11:18 +1300
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

> On 8 October 2012 20:14, Stefan Krah <stefan at 
> <mailto:stefan at>> wrote:
>     # A bit long
>     # My personal objection is that one shouldn't have to state "path"
>     in the name: it's not str.stringjoin()
>     configdir.joinpath("myprogram")
>     configdir.pathjoin("myprogram") 

I was just thinking the same thing.

My preference for this at the moment is 'append', notwithstanding
the fact that it will be non-mutating. It's a single, short word, it
avoids re-stating the datatype, and it resonates with the idea of
appending to a sequence of path components.

>     # My favorites ('cause my opinion: so there)
>     configdir.child("myprogram")  # Does sorta' imply IO

Except that the result isn't always a child (the RHS could be
an absolute path, start with "..", etc.)
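
To make that concrete, here is how the join behaves in
pathlib.PurePosixPath (the later stdlib outcome of PEP 428, shown only
as an illustration; the draft API in this thread may differ):

```python
from pathlib import PurePosixPath

configdir = PurePosixPath('/etc/myprogram')

child = configdir / 'settings.ini'      # an ordinary child
absolute = configdir / '/tmp/elsewhere' # absolute RHS replaces the left side
parent = configdir / '../other'         # '..' is kept as written

for result in (child, absolute, parent):
    print(result)
```

Only the first result is actually located under configdir.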


Aaaghh... my brain... the lobotomy does nothing...


From alexander.belopolsky at  Tue Oct  9 08:14:10 2012
From: alexander.belopolsky at (Alexander Belopolsky)
Date: Tue, 9 Oct 2012 02:14:10 -0400
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 8, 2012 at 10:09 PM, Guido van Rossum <guido at> wrote:
> Such a rationale exists in my mind. Since floats are immutable, an
> implementation may or may not intern certain float values (just as
> certain string and int values are interned but others are not).

This is an interesting argument, but I don't quite understand it.  Are
you suggesting that some valid Python implementation may intern NaNs?
Wouldn't that require that all NaNs are equal?

> Therefore, the fact that "x is y" says nothing about whether the
> computations that produced x and y had anything to do with each other.


> This is not true for mutable objects: if I have two lists, computed
> separately, and find they are the same object, the computations that
> produced them must have communicated somehow, or the same list was
> passed in to each computations.


> So, since two computations might
> return the same object without having followed the same computational
> path, in another implementation the exact same computation might not
> return the same object, and so the == comparison should produce the
> same value in either case

True, but this logic does not dictate what these values should be.

> -- in particular, if x and y are both NaN,
> all 6 comparisons on them should return False (given that in general
> comparing two NaNs returns False regardless of the operator used).

Except for operator compareQuietUnordered() which is missing in
Python.  Note that IEEE 754 also defines   totalOrder() operation
which is more or less lexicographical ordering of bit patterns.  A
hypothetical language could map its 6 comparisons to  totalOrder() and
still claim  IEEE 754 conformity as long as it implements the other 22
comparison predicates somehow.
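
A rough sketch of a totalOrder-style sort key built from the bit
pattern, assuming 64-bit IEEE doubles. total_order_key is a
hypothetical helper and this is not claimed to match every detail of
the standard's definition.

```python
import math
import struct


def total_order_key(x):
    """Map a float to an int so that int order follows the bit patterns."""
    (bits,) = struct.unpack('>q', struct.pack('>d', x))
    if bits < 0:
        # Negative floats are sign-magnitude: flip the low 63 bits so
        # that larger magnitudes sort earlier.
        bits ^= 0x7FFFFFFFFFFFFFFF
    return bits


values = [1.0, float('inf'), 0.0, -0.0, float('-inf'), -2.5]
ordered = sorted(values, key=total_order_key)
print(ordered)

pos_nan = math.copysign(float('nan'), 1.0)
print(total_order_key(float('inf')) < total_order_key(pos_nan))
```

Unlike the six Python comparisons, this ordering distinguishes -0.0
from 0.0 and gives every NaN a definite place.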

> The reason for invoking IEEE 754 here is that without it, Python might
> well have grown a language-wide rule stating that an object should
> *always* compare equal to itself, as there would have been no
> significant counterexamples.

Why would it be a bad thing?  Isn't this rule what Bertrand Meyer
calls one of the pillars of civilization?

It looks like you give a circular argument.  Python cannot have a rule
that x is y implies x == y because that would preclude implementing
float.__eq__ as IEEE 754 equality comparison and we implement
float.__eq__ as IEEE 754 equality comparison in order to provide a
significant counterexample to x is y implies x == y rule.  I am not
sure how interning comes into play here, so I must have missed

From dickinsm at  Tue Oct  9 08:43:57 2012
From: dickinsm at (Mark Dickinson)
Date: Tue, 9 Oct 2012 07:43:57 +0100
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 8, 2012 at 9:39 PM, Ned Batchelder <ned at> wrote:
> How about:
> "In accordance with the IEEE 754 standard, when NaNs are compared to any
> value, even another NaN, the result is always False, regardless of the
> comparison.  This is because NaN represents an unknown result.  There is no
> way to know the relationship between an unknown result and any other result,
> especially another unknown one.  Even comparing a NaN to itself always
> produces False."

Looks fine, but I'd suggest leaving out the philosophy ('there is no
way to know ...') and sticking to the statement that Python follows
the IEEE 754 standard in this respect.  The justification isn't
particularly convincing and (IMO) only serves to invite arguments.


From guido at  Tue Oct  9 08:44:12 2012
From: guido at (Guido van Rossum)
Date: Mon, 8 Oct 2012 23:44:12 -0700
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <20121009042635.GH27445@ando>
References: <>
Message-ID: <>

This smells like a bug in the != operator, it seems to fall back to not ==
which it didn't used to. More later.....

On Monday, October 8, 2012, Steven D'Aprano wrote:

> On Mon, Oct 08, 2012 at 09:29:42AM -0700, Guido van Rossum wrote:
> > It's not about equality. If you ask whether two NaNs are *unequal* the
> > answer is *also* False.
> Not so. I think you are conflating NAN equality/inequality with ordering
> comparisons. Using Python 3.3:
> py> nan = float('nan')
> py> nan > 0
> False
> py> nan < 0
> False
> py> nan == 0
> False
> py> nan != 0
> True
> but:
> py> nan == nan
> False
> py> nan != nan
> True
> --
> Steven

--Guido van Rossum (

From dickinsm at  Tue Oct  9 08:49:30 2012
From: dickinsm at (Mark Dickinson)
Date: Tue, 9 Oct 2012 07:49:30 +0100
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 9, 2012 at 7:44 AM, Guido van Rossum <guido at> wrote:
> This smells like a bug in the != operator, it seems to fall back to not ==
> which it didn't used to. More later.....

I'm fairly sure it's deliberate, and has been this way in Python for a
long time. IEEE 754 also has x != x when x is a NaN (at least, for
those IEEE 754 functions that return a boolean rather than signaling
an invalid exception), and it's a well documented property of NaNs
across languages.
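
A quick way to see all six Python comparisons at once, plus the classic
self-inequality test that relies on this property (is_nan is just an
illustrative helper; math.isnan is the usual spelling):

```python
import math
import operator

nan = float('nan')
comparisons = [operator.lt, operator.le, operator.eq,
               operator.ne, operator.ge, operator.gt]
results = {op.__name__: op(nan, nan) for op in comparisons}
print(results)   # only 'ne' comes out True


def is_nan(x):
    # Relies on NaN being the only float that is unequal to itself.
    return x != x


print(is_nan(nan), math.isnan(nan), is_nan(1.0))
```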


From ben at  Tue Oct  9 08:53:11 2012
From: ben at (Ben Darnell)
Date: Mon, 8 Oct 2012 23:53:11 -0700
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 8, 2012 at 10:56 PM, Greg Ewing <greg.ewing at> wrote:
> Mark Adam wrote:
>> 1) event handlers for the machine-program interface (ex. network I/O)
>> 2) event handlers for the program-user interface (ex. mouse I/O)
>> While similar, my gut tells me they have to be handled in completely
>> different ways in order to preserve order (i.e. sanity).
> They can't be *completely* different, because deep down there
> has to be a single event loop that can handle all kinds of
> asynchronous events.

There doesn't *have* to be - you could run a network event loop in one
thread and a GUI event loop in another and pass control back and forth
via methods like IOLoop.add_callback or Reactor.callFromThread.
However, Twisted has Reactor implementations that are integrated with
several different GUI toolkits' event loops, and while I haven't
worked with such a beast my gut instinct is that in most cases a
single shared event loop is the way to go.
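
The two-loop arrangement can be sketched with a pair of toy loops, each
owning a thread-safe queue. QueueLoop is invented for this example and
only imitates the role of IOLoop.add_callback or
Reactor.callFromThread; no real network or GUI code is involved.

```python
import queue
import threading


class QueueLoop:
    """Toy event loop whose add_callback is safe to call from any thread."""

    def __init__(self):
        self._queue = queue.Queue()

    def add_callback(self, callback, *args):
        self._queue.put((callback, args))

    def stop(self):
        self._queue.put(None)

    def run(self):
        while True:
            item = self._queue.get()
            if item is None:
                break
            callback, args = item
            callback(*args)


gui_loop = QueueLoop()
net_loop = QueueLoop()
shown = []


def on_data(data):
    # Runs on the "GUI" thread (here: the main thread).
    shown.append(data)
    gui_loop.stop()


def fetch():
    # Runs on the "network" thread, then hands control to the GUI loop.
    gui_loop.add_callback(on_data, 'payload')
    net_loop.stop()


net_loop.add_callback(fetch)
net_thread = threading.Thread(target=net_loop.run)
net_thread.start()
gui_loop.run()
net_thread.join()
print(shown)
```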


> Upper layers can provide different APIs for them, but there
> has to be some commonality in the lowest layers.
> --
> Greg

From guido at  Tue Oct  9 08:58:55 2012
From: guido at (Guido van Rossum)
Date: Mon, 8 Oct 2012 23:58:55 -0700
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Mon, Oct 8, 2012 at 11:49 PM, Mark Dickinson <dickinsm at> wrote:
> On Tue, Oct 9, 2012 at 7:44 AM, Guido van Rossum <guido at> wrote:
>> This smells like a bug in the != operator, it seems to fall back to not ==
>> which it didn't used to. More later.....
> I'm fairly sure it's deliberate, and has been this way in Python for a
> long time. IEEE 754 also has x != x when x is a NaN (at least, for
> those IEEE 754 functions that return a boolean rather than signaling
> an invalid exception), and it's a well documented property of NaNs
> across languages.

Yeah, sorry, I misremembered. :-) This does mean we need to update the
text Ned is proposing.

--Guido van Rossum (

From steve at  Tue Oct  9 09:05:49 2012
From: steve at (Steven D'Aprano)
Date: Tue, 9 Oct 2012 18:05:49 +1100
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <20121009070549.GA30054@ando>

On Mon, Oct 08, 2012 at 11:44:12PM -0700, Guido van Rossum wrote:
> This smells like a bug in the != operator, it seems to fall back to not ==
> which it didn't used to. More later.....

I'm pretty sure the behaviour is correct. When I get home this evening, 
I will check my copy of the Standard Apple Numerics manual (one of the 
first IEEE 754 compliant systems). In the meantime, I quote from 

"What Every Computer Scientist Should Know About Floating-Point Arithmetic":

"Since comparing a NaN to a number with <, ≤, >, ≥, or = (but not ≠) 
always returns false..."

(Admittedly it doesn't specifically state the case of comparing a NAN 
with a NAN.)


From senthil at  Tue Oct  9 09:10:47 2012
From: senthil at (Senthil Kumaran)
Date: Tue, 9 Oct 2012 00:10:47 -0700
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

Antoine Pitrou <solipsis at> wrote:

> - `p[q]` joins path q to path p


I think, this is listed as example in PEP 428.
I had to look it up to understand. Not intuitive (to me at least) as join.

> - `p + q` joins path q to path p


I would be +1. But in the PEP you have listed that we need a way to
keep path behaviors from being confused with builtins.
Though it provides a lot of convenience, it can be confused with str
behaviors or other object behaviors.

> - `p / q` joins path q to path p


> - `p.join(q)` joins path q to path p


> `p.pathjoin(q)`


It is very explicit and hard to get it wrong.

From p.f.moore at  Tue Oct  9 09:12:52 2012
From: p.f.moore at (Paul Moore)
Date: Tue, 9 Oct 2012 08:12:52 +0100
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

On 9 October 2012 00:19, Ryan D Hiebert <ryan at> wrote:
> If we want a p.pathjoin method, it would make sense to me for it to work similar to urllib.parse.urljoin

The parallel with urljoin also suggests that pathjoin is a better name
than joinpath. But note that I've seen both used in this thread -
there is obviously some level of confusion possible.


From guido at  Tue Oct  9 09:13:08 2012
From: guido at (Guido van Rossum)
Date: Tue, 9 Oct 2012 00:13:08 -0700
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Mon, Oct 8, 2012 at 11:14 PM, Alexander Belopolsky
<alexander.belopolsky at> wrote:
> On Mon, Oct 8, 2012 at 10:09 PM, Guido van Rossum <guido at> wrote:
>> Such a rationale exists in my mind. Since floats are immutable, an
>> implementation may or may not intern certain float values (just as
>> certain string and int values are interned but others are not).
> This is an interesting argument, but I don't quite understand it.  Are
> you suggesting that some valid Python implementation may intern NaNs?
> Wouldn't that require that all NaNs are equal?

Sorry, it seems I got this part slightly wrong. Forget interning. The
argument goes the other way: If you *do* compute x and y exactly the
same way, and if they don't return the same object, and if they both
return NaN, the rules for comparing NaN apply, and the values must
compare unequal. So if you compute them exactly the same way but
somehow you do return the same object, that shouldn't suddenly make
them compare equal.

>> Therefore, the fact that "x is y" says nothing about whether the
>> computations that produced x and y had anything to do with each other.
> True.
>> This is not true for mutable objects: if I have two lists, computed
>> separately, and find they are the same object, the computations that
>> produced them must have communicated somehow, or the same list was
>> passed in to each computation.
> True.
>> So, since two computations might
>> return the same object without having followed the same computational
>> path, in another implementation the exact same computation might not
>> return the same object, and so the == comparison should produce the
>> same value in either case
> True, but this logic does not dictate what this value should be.
>> -- in particular, if x and y are both NaN,
>> all 6 comparisons on them should return False (given that in general
>> comparing two NaNs returns False regardless of the operator used).
> Except for operator compareQuietUnordered() which is missing in
> Python.  Note that IEEE 754 also defines   totalOrder() operation
> which is more or less lexicographical ordering of bit patterns.  A
> hypothetical language could map its 6 comparisons to  totalOrder() and
> still claim  IEEE 754 conformity as long as it implements the other 22
> comparison predicates somehow.

Yes, but that's not the choice Python made, so it's irrelevant.
(Unless you now *do* want to change the language, despite stating
several times that you were just asking for explanations. :-)

>> The reason for invoking IEEE 754 here is that without it, Python might
>> well have grown a language-wide rule stating that an object should
>> *always* compare equal to itself, as there would have been no
>> significant counterexamples.
> Why would it be a bad thing?  Isn't this rule what Bertrand Meyer
> calls one of the pillars of civilization?

I spent a week with Bertrand recently. He is prone to exaggeration. :-)

> It looks like you give a circular argument.  Python cannot have a rule
> that x is y implies x == y because that would preclude implementing
> float.__eq__ as IEEE 754 equality comparison and we implement
> float.__eq__ as IEEE 754 equality comparison in order to provide a
> significant counterexample to x is y implies x == y rule.  I am not
> sure how interning comes into play here, so I must have missed
> something.

No, that's not what I meant -- maybe my turn of phrase "invoking IEEE"
was confusing. The first part is what I meant: "Python cannot have a
rule that x is y implies x == y because that would preclude
implementing float.__eq__ as IEEE 754 equality comparison." The second
half should be: "And we have already (independently from all this)
decided that we want to implement float.__eq__ as IEEE 754 equality
comparison." I'm sure a logician could rearrange the words a bit and
make it look more logical.

--Guido van Rossum (

From guido at  Tue Oct  9 09:13:38 2012
From: guido at (Guido van Rossum)
Date: Tue, 9 Oct 2012 00:13:38 -0700
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <20121009070549.GA30054@ando>
References: <>
	<> <>
Message-ID: <>

Already retracted. :-(

On Tue, Oct 9, 2012 at 12:05 AM, Steven D'Aprano <steve at> wrote:
> On Mon, Oct 08, 2012 at 11:44:12PM -0700, Guido van Rossum wrote:
>> This smells like a bug in the != operator, it seems to fall back to not ==
>> which it didn't used to. More later.....
> I'm pretty sure the behaviour is correct. When I get home this evening,
> I will check my copy of the Standard Apple Numerics manual (one of the
> first IEEE 754 compliant systems). In the meantime, I quote from
> "What Every Computer Scientist Should Know About Floating-Point
> Arithmetic"
> "Since comparing a NaN to a number with <, ≤, >, ≥, or = (but not ≠)
> always returns false..."
> (Admittedly it doesn't specifically state the case of comparing a NAN
> with a NAN.)
> --
> Steven

--Guido van Rossum (

From senthil at  Tue Oct  9 09:19:29 2012
From: senthil at (Senthil Kumaran)
Date: Tue, 9 Oct 2012 00:19:29 -0700
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 9, 2012 at 12:12 AM, Paul Moore <p.f.moore at> wrote:
> On 9 October 2012 00:19, Ryan D Hiebert <ryan at> wrote:
>> If we want a p.pathjoin method, it would make sense to me for it to work similar to urllib.parse.urljoin
> The parallel with urljoin also suggests that pathjoin is a better name
> than joinpath. But note that I've seen both used in this thread -
> there is obviously some level of confusion possible.

pathjoin works well, if we are already accustomed to the term 'urljoin'.

Ryan - the protocols of those two joins will vary and should not be
confused. Also pathjoin specifics would be listed in PEP 428.


From greg.ewing at  Tue Oct  9 09:35:05 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 09 Oct 2012 20:35:05 +1300
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
	<k4vfsk$enj$> <>
Message-ID: <>

T.B. wrote:
> A small problem I see with 'add' (and with 
> 'append') is that the outcome of adding (or appending) an absolute path 
> is too surprising, unlike with the 'join' or 'joinpath' names.

I don't think it's any less surprising with "join" -- when
you join two things, you just as much expect both of them to
be part of the result.

There doesn't seem to be any concise term that encompasses
all the nuances of the operation. Using an arbitrarily chosen
operator would at least have the advantage of sidestepping
the whole concern.

Programmer 1: "Hey, what does ^ do on path objects?"

Programmer 2: "It concatenates them with a path separator
between, except when the second one is an absolute path,
in which case it just returns the second one."

Programmer 1: "That's so obscure. Why didn't they just
define a concat_with_pathsep_or_second_if_absolute()
method... oh, wait, I think I see..."
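
The rule Programmer 2 describes can be seen with the existing `posixpath.join` function. This is only a sketch of today's function-based behaviour, not of any proposed Path method, and the example paths are made up.

```python
import posixpath

# Second argument is relative: the pieces are joined with a separator.
relative = posixpath.join("/home/user", "docs/report.txt")

# Second argument is absolute: the first argument is discarded.
absolute = posixpath.join("/home/user", "/etc/passwd")

print(relative)  # /home/user/docs/report.txt
print(absolute)  # /etc/passwd
```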


From p.f.moore at  Tue Oct  9 09:36:58 2012
From: p.f.moore at (Paul Moore)
Date: Tue, 9 Oct 2012 08:36:58 +0100
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On 9 October 2012 06:41, Greg Ewing <greg.ewing at> wrote:
> Antoine Pitrou wrote:
>> But you really want a short method name, otherwise it's better to have
>> a dedicated operator.  joinpath() definitely doesn't cut it, IMO.
> I agree, it's far too longwinded. It would clutter your code
> just as badly as using os.path.join() all over the place does
> now, but without the option of aliasing it to a shorter name.

Good point - the fact that it's not possible to alias a method name
means that it's important to get the name right if we're to use a
method, because we're all stuck with it forever. Because of that, I'm
much more reluctant to "just put up with" Path.pathjoin on the basis
that it's better than any other option.

Are there any libraries that use a method on a path object (or
something similar - URL objects, maybe) and if so, what method name
did they use? I'd like to see what real code using any proposed method
name would look like. As a point of reference, twisted's FilePath
class uses "child".


From greg.ewing at  Tue Oct  9 09:51:20 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 09 Oct 2012 20:51:20 +1300
Subject: [Python-ideas] Propagating StopIteration value
In-Reply-To: <>
References: <k4q38d$j8e$> <>
	<> <k4tefg$52k$>
Message-ID: <>

Oscar Benjamin wrote:

> They do provide the same kind of iterator in the sense that they
> reproduce the properties of the object *in so far as it is an
> iterator* by yielding the same values.

I think we agree on that. Where we seem to disagree is on
whether returning a value with StopIteration is part of the
iterator protocol or the generator protocol.

To my mind it's part of the generator protocol, and as such,
itertools functions are not under any obligation to support it.
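
A small sketch of the distinction, assuming Python 3.3+ semantics where a generator's return value travels on StopIteration. The names `countdown` and `drain` are hypothetical helpers invented for this illustration.

```python
import itertools


def countdown():
    """Generator that yields two values and then returns a final value."""
    yield 2
    yield 1
    return "liftoff"


def drain(iterator):
    """Exhaust an iterator and return whatever StopIteration carried."""
    while True:
        try:
            next(iterator)
        except StopIteration as stop:
            return stop.value


direct = drain(countdown())                     # the generator's return value
wrapped = drain(itertools.chain(countdown()))   # chain() does not forward it

print(direct, wrapped)  # liftoff None
```

The wrapped iterator yields the same items, but the return value is lost, which is the behaviour under discussion.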


From songofacandy at  Tue Oct  9 09:52:52 2012
From: songofacandy at (INADA Naoki)
Date: Tue, 9 Oct 2012 16:52:52 +0900
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

- `p[q]` joins path q to path p

Because I can't imagine consistent iterator and __contains__.

- `p + q` joins path q to path p

+0. I prefer '/' because it is very common path separator.

- `p / q` joins path q to path p


- `p.join(q)` joins path q to path p

+1. But `q` should be `*q`.

-1 on `pathjoin`.  `Path.pathjoin` is ugly.
The `urljoin()` is OK because it is just a function.

INADA Naoki  <songofacandy at>

From greg.ewing at  Tue Oct  9 10:13:25 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 09 Oct 2012 21:13:25 +1300
Subject: [Python-ideas] Subpaths [was Re: PEP 428 - object-oriented
 filesystem paths]
In-Reply-To: <>
References: <>
Message-ID: <>

Steven D'Aprano wrote:

> The point is, despite the common "sub" prefix, the semantics of
> "subdirectory" is quite different from the semantics of "substring",
> "subset", "subtree" and "subpath".

I think the "sub" in "subdirectory" is more in the
sense of "below", rather than "is a part of". Like a
submarine is something that travels below the surface of
the sea, not something that's part of the sea.


From him at  Tue Oct  9 10:18:10 2012
From: him at (=?ISO-8859-1?Q?Joachim_K=F6nig?=)
Date: Tue, 09 Oct 2012 10:18:10 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On 09/10/2012 00:47 Greg Ewing wrote:
> I'd prefer 'append', because
>    path.append("somedir", "file.txt")
> is pretty self-explanatory, whereas

As has already been stated by others, paths are immutable so using them
like lists is leading to confusion (and list's append() only wants one
arg, so extend() might be better in that case).

But paths could then be interpreted as tuples of "directory entries".

So adding a path to a path would "join" them:

pathA + pathB

and in order to not always need a path object for pathB one could also write
the right argument of __add__ as a tuple of strings:

pathA + ("somedir", "file.txt")

One could also use "+" for adding to the last segment if it isn't a path 
object or a tuple:

pathA + ".tar.gz"


From greg.ewing at  Tue Oct  9 10:19:54 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 09 Oct 2012 21:19:54 +1300
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

Oscar Benjamin wrote:
> The main purpose of quiet NaNs is to propagate through computation
> ruining everything they touch.

But they stop doing that as soon as they hit an if statement.
It seems to me that the behaviour chosen for NaN comparison
could just as easily make things go wrong as make them go
right. E.g.

    while not (error < epsilon):
        find_a_better_approximation()

If error ever ends up being NaN, this will go into an
infinite loop.


From ncoghlan at  Tue Oct  9 10:22:47 2012
From: ncoghlan at (Nick Coghlan)
Date: Tue, 9 Oct 2012 13:52:47 +0530
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 9, 2012 at 12:43 PM, Guido van Rossum <guido at> wrote:
> No, that's not what I meant -- maybe my turn of phrase "invoking IEEE"
> was confusing. The first part is what I meant: "Python cannot have a
> rule that x is y implies x == y because that would preclude
> implementing float.__eq__ as IEEE 754 equality comparison." The second
> half should be: "And we have already (independently from all this)
> decided that we want to implement float.__eq__ as IEEE 754 equality
> comparison." I'm sure a logician could rearrange the words a bit and
> make it look more logical.

I'll have a go. It's a lot longer, though :)

When designing their floating point support, language designers must
choose between two mutually exclusive options:
1. IEEE754 compliant floating point comparison where NaN != NaN, *even
if* they're the same object
2. The invariant that "x is y" implies "x == y"

The idea behind following the IEEE754 model is that mathematics is a
*value based system*. There is only really one NaN, just as there is
only one 4 (or 5, or any other specific value). The idea of a number
having an identity distinct from its value simply doesn't exist. Thus,
when modelling mathematics in an object system, it makes sense to say
that *object identity is irrelevant, and only value matters*.

This is the approach Python has chosen: for *numeric* operations,
including comparisons, object identity is irrelevant to the maximum
extent that is practical. Thus "x = float('nan'); assert x != x" holds
for *exactly the same reason* that "x = 10e50; y = 10e50; assert x ==
y" holds.

However, when it comes to containers, being able to assume that "x is
y" implies "x == y" has an immense practical benefit in terms of being
able to implement a large number of non-trivial optimisations. Thus
the Python language definition explicitly allows containers to make
that assumption, *even though it is known not to be universally true*.

This hybrid model means that even though "'x is y' implies 'x == y'"
is not true in the general case, it may still be *assumed to be true*
regardless by container implementations. In particular, the containers
defined in the standard library reference are *required* to make this
assumption.

This does mean that certain invariants about containers don't hold in
the presence of NaN values. This is mostly a theoretical concern, but,
in those cases where it *does* matter, then the appropriate solution
is to implement a custom container type that handles NaN values
correctly.

It's perhaps worth including a section explaining this somewhere in
the language reference. It's not an accident that Python behaves the
way it does, but it's certainly a rationale that can help implementors
correctly interpret the rest of the language spec.
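
A short sketch of the hybrid model described above, assuming CPython's built-in list (variable names are illustrative only): the float comparison ignores identity, while the container checks identity before falling back to ==.

```python
nan = float("nan")
other = float("nan")  # a second, distinct NaN object
items = [nan]

value_equal = nan == nan       # False: float comparison ignores identity
found_same = nan in items      # True: the list checks identity first
found_other = other in items   # False: different object, and NaN != NaN
lists_equal = items == [nan]   # True: element comparison also checks identity

print(value_equal, found_same, found_other, lists_equal)
```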


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From solipsis at  Tue Oct  9 10:30:58 2012
From: solipsis at (Antoine Pitrou)
Date: Tue, 9 Oct 2012 08:30:58 +0000 (UTC)
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
References: <>
Message-ID: <>

Nick Coghlan <ncoghlan at ...> writes:
> On Tue, Oct 9, 2012 at 12:10 AM, Antoine Pitrou <solipsis at ...> wrote:
> > On Mon, 8 Oct 2012 10:06:17 -0600
> > Andrew McNabb <amcnabb at ...> wrote:
> >>
> >> Since this really is a matter of personal taste, I'll end my
> >> participation in this discussion by voicing support for Nick Coghlan's
> >> suggestion of a `join` method, whether it's named `join` or `append` or
> >> something else.
> >
> > The join() method already exists in the current PEP, but it's less
> > convenient, syntactically, than either '[]' or '/'.
> Right. My objections boil down to:
> 1. The case has not been adequately made that a second way to do it is
> needed. Therefore, the initial version should just include the method
> API.

For the record, most Path objects out there seem to include an operator-based
join operation (Twisted's FilePath is an exception, but its API is generally not
very pretty).

Still, I'll let the poll run a bit more :-)



From ncoghlan at  Tue Oct  9 10:31:42 2012
From: ncoghlan at (Nick Coghlan)
Date: Tue, 9 Oct 2012 14:01:42 +0530
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 9, 2012 at 1:22 PM, INADA Naoki <songofacandy at> wrote:
> -1 on `pathjoin`.  `Path.pathjoin` is ugly.
> The `urljoin()` is OK because it is just a function.

Hmm, this is a *very* interesting point. *All* of the alternatives
presented are mainly replacements for just doing this:

Path(p, q)

And if you want a partially applied version, that's just:

prefix = functools.partial(Path, p)

So perhaps the right answer for the initial API is: no method, no
operator, just use the constructor?

The counterargument is that this approach doesn't let "p" control the
return type the way a method or operator does, though.
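
As a rough sketch of the constructor-only approach, using the `PurePosixPath` class from the pathlib module that later shipped in the standard library as a stand-in for the `Path` under discussion (an assumption; the API was still being designed in this thread, and the example paths are made up):

```python
import functools
from pathlib import PurePosixPath

# Joining by passing several segments to the constructor.
joined = PurePosixPath("/home/user", ".config")

# A partially applied constructor acts as a reusable prefix.
prefix = functools.partial(PurePosixPath, "/home/user")
config = prefix(".config", "app.conf")

print(joined)  # /home/user/.config
print(config)  # /home/user/.config/app.conf
```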

It does suggest a whole new class of verbs though, like "make" or "build".


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From greg.ewing at  Tue Oct  9 10:35:21 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 09 Oct 2012 21:35:21 +1300
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

Alexander Belopolsky wrote:
> "For attribute specification, the implementation
> shall provide language-defined means, such as compiler directives, to
> specify a constant value for the attribute parameter for all standard
> operations in a block; the scope of the attribute value is the block
> with which it is associated."  I believe Decimal is mostly conforming,

That depends on whether "scope" is meant lexically or
dynamically. Decimal contexts are scoped dynamically.


From storchaka at  Tue Oct  9 10:35:54 2012
From: storchaka at (Serhiy Storchaka)
Date: Tue, 09 Oct 2012 11:35:54 +0300
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <20121009024204.GE27445@ando>
References: <> <20121009022653.GD27445@ando>
Message-ID: <k50nle$pqv$>

On 09.10.12 05:42, Steven D'Aprano wrote:
> p + ".ext" to add a suffix to the file name; an error if p is a
> directory.

Why? A directory can have a suffix. E.g. /etc/init.d.

From steve at  Tue Oct  9 10:42:41 2012
From: steve at (Steven D'Aprano)
Date: Tue, 9 Oct 2012 19:42:41 +1100
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <20121009084239.GB30054@ando>

On Tue, Oct 09, 2012 at 08:36:58AM +0100, Paul Moore wrote:
> On 9 October 2012 06:41, Greg Ewing <greg.ewing at> wrote:
> > Antoine Pitrou wrote:
> >
> >> But you really want a short method name, otherwise it's better to have
> >> a dedicated operator.  joinpath() definitely doesn't cut it, IMO.
> >
> >
> > I agree, it's far too longwinded. It would clutter your code
> > just as badly as using os.path.join() all over the place does
> > now, but without the option of aliasing it to a shorter name.
> Good point - the fact that it's not possible to alias a method name
> means that it's important to get the name right if we're to use a
> method, because we're all stuck with it forever.


py> f = str.join  # "join" is too long and I don't like it
py> f("*", ["spam", "ham", "eggs"])
'spam*ham*eggs'

We should get the name right because we're stuck with it forever due to 
backwards compatibility, not because you can't alias it.


From steve at  Tue Oct  9 10:43:51 2012
From: steve at (Steven D'Aprano)
Date: Tue, 9 Oct 2012 19:43:51 +1100
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <k50nle$pqv$>
References: <> <20121009022653.GD27445@ando>
	<20121009024204.GE27445@ando> <k50nle$pqv$>
Message-ID: <20121009084351.GC30054@ando>

On Tue, Oct 09, 2012 at 11:35:54AM +0300, Serhiy Storchaka wrote:
> On 09.10.12 05:42, Steven D'Aprano wrote:
> >p + ".ext" to add a suffix to the file name; an error if p is a
> >directory.
> Why? A directory can have a suffix. E.g. /etc/init.d.

Fair point.


From rosuav at  Tue Oct  9 10:44:45 2012
From: rosuav at (Chris Angelico)
Date: Tue, 9 Oct 2012 19:44:45 +1100
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 9, 2012 at 7:19 PM, Greg Ewing <greg.ewing at> wrote:
> But they stop doing that as soon as they hit an if statement.
> It seems to me that the behaviour chosen for NaN comparison
> could just as easily make things go wrong as make them go
> right. E.g.
>    while not (error < epsilon):
>       find_a_better_approximation()
> If error ever ends up being NaN, this will go into an
> infinite loop.

But if you know that that's a possibility, you simply code your
condition the other way:

while error > epsilon:
    find_a_better_approximation()

Which will then immediately terminate the loop if error bonks to NaN.
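
A quick sketch of the two loop conditions with a NaN error value. The `refine` function and its halving step are hypothetical stand-ins for find_a_better_approximation(), and `max_steps` is only there to keep the sketch finite.

```python
nan = float("nan")
epsilon = 1e-9

keeps_looping = not (nan < epsilon)  # True: the negated test never becomes false
stops_at_once = nan > epsilon        # False: the direct test fails immediately


def refine(error, max_steps=5):
    """Count iterations of a loop guarded by `error > epsilon`."""
    steps = 0
    while error > epsilon and steps < max_steps:
        error = error / 2  # placeholder for a real refinement step
        steps += 1
    return steps


print(keeps_looping, stops_at_once, refine(nan))  # True False 0
```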


From oscar.j.benjamin at  Tue Oct  9 10:52:01 2012
From: oscar.j.benjamin at (Oscar Benjamin)
Date: Tue, 9 Oct 2012 09:52:01 +0100
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <>

On Oct 9, 2012 9:20 AM, "Greg Ewing" <greg.ewing at> wrote:
> Oscar Benjamin wrote:
>> The main purpose of quiet NaNs is to propagate through computation
>> ruining everything they touch.
> But they stop doing that as soon as they hit an if statement.
> It seems to me that the behaviour chosen for NaN comparison
> could just as easily make things go wrong as make them go
> right. E.g.
>    while not (error < epsilon):
>       find_a_better_approximation()
> If error ever ends up being NaN, this will go into an
> infinite loop.

I should expect that an experienced numericist would be aware of the
possibility of a NaN and make a trivial modification of your loop to take
advantage of the simple fact that any comparison with NaN returns false. It
is only because you have artificially placed a not in the while clause that
it doesn't work. I would have tested for error>eps without even thinking
about NaNs.


From victor.stinner at  Tue Oct  9 11:33:08 2012
From: victor.stinner at (Victor Stinner)
Date: Tue, 9 Oct 2012 11:33:08 +0200
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

> Since there has been some controversy about the joining syntax used in
> PEP 428 (filesystem path objects), I would like to run an informal poll
> about it. Please answer with +1/+0/-0/-1 for each proposal:
> - `p[q]` joins path q to path p
> - `p + q` joins path q to path p
> - `p / q` joins path q to path p
> - `p.join(q)` joins path q to path p

I cannot decide with such trivial examples. More realistic examples:

def read_config(name):
    home = Path(os.path.expanduser("~"))  # pathlib doesn't support expanduser??
    with open(home / ".config" / name + ".conf") as f:
        return f.read()

The join() method has an advantage: it avoids temporary objects
(home / ".config" in my example).

def read_config(name):
    home = Path(os.path.expanduser("~"))  # pathlib doesn't support expanduser??
    with open(home.join(".config", name + ".conf")) as f:
        return f.read()

It should work even if name is a Path object, so Path + str should
concatenate a suffix without adding directory separator.

My vote:

> - `p[q]` joins path q to path p

home[".config"][name] # + ".conf" ???


> - `p + q` joins path q to path p

home + ".config" + name # + ".conf" ???

-1 -> Path + str must be reserved to add a suffix

> - `p / q` joins path q to path p

home / ".config" / name + ".conf"

+1: it's natural, but maybe "suboptimal" in performance

> - `p.join(q)` joins path q to path p

home.join(".config", name + ".conf")

+0: more efficient, but it may be confusing with str.join() which is
very different.

a.join(b, c) : a is the separator or the root directory, depending on
the type of a (str or Path).

We should avoid confusion between Path and str methods and operator
(a+b and a.join(b)).
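
One detail worth spelling out about mixing `/` and `+`: `/` binds tighter than `+`, so `home / ".config" / name + ".conf"` parses as `(home / ".config" / name) + ".conf"`. The sketch below uses `PurePosixPath` from the pathlib module as it later shipped, which is an assumption relative to the draft API discussed here. That class defines no `+`, so the unparenthesised form raises TypeError there.

```python
from pathlib import PurePosixPath

home = PurePosixPath("/home/user")
name = "app"

# Building the file name first avoids relying on Path + str.
explicit = home / ".config" / (name + ".conf")

# Without parentheses the "+" is applied to a path object.
try:
    home / ".config" / name + ".conf"
    unparenthesised_ok = True
except TypeError:
    unparenthesised_ok = False

print(explicit)            # /home/user/.config/app.conf
print(unparenthesised_ok)  # False
```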


From solipsis at  Tue Oct  9 11:43:02 2012
From: solipsis at (Antoine Pitrou)
Date: Tue, 9 Oct 2012 09:43:02 +0000 (UTC)
Subject: [Python-ideas] PEP 428: poll about the joining syntax
References: <>
Message-ID: <>

Nick Coghlan <ncoghlan at ...> writes:
> On Tue, Oct 9, 2012 at 1:22 PM, INADA Naoki <songofacandy at ...> wrote:
> > -1 on `pathjoin`.  `Path.pathjoin` is ugly.
> > The `urljoin()` is OK because it is just a function.
> Hmm, this is a *very* interesting point. *All* of the alternatives
> presented are mainly replacements for just doing this:
> Path(p, q)
> And if you want a partially applied version, that's just:
> prefix = functools.partial(Path, p)
> So perhaps the right answer for the initial API is: no method, no
> operator, just use the constructor?

Well, you would have to use either PurePath(p, q) or Path(p, q) based on whether
p is pure or concrete. Unless we make the constructor more magic and let Path()
switch to PurePath() when the first argument is a pure path. Which does sound a
bit too magic to me (Path would instantiate something which is not a Path
instance...).

> It does suggest a whole new class of verbs though, like "make" or "build".

They are rather vague, though.



From storchaka at  Tue Oct  9 11:58:54 2012
From: storchaka at (Serhiy Storchaka)
Date: Tue, 09 Oct 2012 12:58:54 +0300
Subject: [Python-ideas] Make "is" checks on non-singleton literals errors
In-Reply-To: <>
References: <>
Message-ID: <k50sh0$4q8$>

On 09.10.12 02:05, Mike Graham wrote:
> I can't find this in a couple versions of Python I checked. If this
> code is still around, it sounds like it has a bug and should be fixed.

It's "if node.tagname is 'admonition':" line.
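
A sketch of why such a check is fragile, with made-up variable names: two strings holding the same text need not be the same object, so only `==` is reliable. The result of the identity test is deliberately not shown, because it depends on implementation details such as interning.

```python
expected = "admonition"
tagname = "".join(["admon", "ition"])  # same text, built at runtime

equal_by_value = tagname == expected   # always True
same_object = tagname is expected      # implementation-dependent

print(equal_by_value)
```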

> has an `is 0` check and an `is ""` check. Both should be fixed.

From greg.ewing at  Tue Oct  9 12:34:24 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 09 Oct 2012 23:34:24 +1300
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

I just consulted a thesaurus about synonyms for 'append',
and it came up with 'affix' and 'adjoin'.


From greg.ewing at  Tue Oct  9 11:11:43 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 09 Oct 2012 22:11:43 +1300
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

Ben Darnell wrote:

> StackContext doesn't quite give you better tracebacks, although I
> think it could be adapted to do that.  ExceptionStackContext is
> essentially a try/except block that follows you around across
> asynchronous operations - on entry it sets a thread-local state, and
> all the tornado asynchronous functions know to save this state when
> they are passed a callback, and restore it when they execute it.

This is something that generator-based coroutines using
yield-from ought to handle a lot more cleanly. You should
be able to just use an ordinary try-except block in your
generator code and have it do the right thing.

I hope that the new async core will be designed so that
generator-based coroutines can be plugged into it directly
and efficiently, without the need for a lot of decorators,
callbacks, Futures, etc. in between.


From ubershmekel at  Tue Oct  9 13:03:38 2012
From: ubershmekel at (Yuval Greenfield)
Date: Tue, 9 Oct 2012 13:03:38 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 9, 2012 at 3:02 AM, Calvin Spealman <ironfroggy at> wrote:

> was in the wild, and is still in use. Why do we find ourselves
> debating new libraries like this as PEPs? We need to let them play out, see
> what sticks. If someone wants to make this library and stick it on PyPI,
> I'm not stopping them. I'm encouraging it. Let's see how it plays out. If
> it works out well, it deserves a PEP. In two or three years.

I agree.

This discussion has been framed unfairly.

The only things that should appear in this PEP are the guidelines Guido
mentioned earlier in the discussion along with some use cases.

So python is chartering a path object module, and we should let whichever
module is the best on pypi eventually get into the std-lib.

Yuval Greenfield

From solipsis at  Tue Oct  9 13:26:47 2012
From: solipsis at (Antoine Pitrou)
Date: Tue, 9 Oct 2012 11:26:47 +0000 (UTC)
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
References: <>
Message-ID: <>

Yuval Greenfield <ubershmekel at ...> writes:
> On Tue, Oct 9, 2012 at 3:02 AM, Calvin Spealman
> <ironfroggy at> wrote:
> > was in the wild, and is still in use. Why do we find ourselves
> > debating new libraries like this as PEPs? We need to let them play out,
> > see what sticks. If someone wants to make this library and stick it on
> > PyPI, I'm not stopping them. I'm encouraging it. Let's see how it plays
> > out. If it works out well, it deserves a PEP. In two or three years.
> I agree,
> This discussion has been framed unfairly.

(or a similar API) has already been rejected as PEP 355. I see no need
to go through this again, at least not in this discussion thread. If you want to
re-discuss PEP 355, please open a separate thread.



From ncoghlan at  Tue Oct  9 14:07:35 2012
From: ncoghlan at (Nick Coghlan)
Date: Tue, 9 Oct 2012 17:37:35 +0530
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 9, 2012 at 3:13 PM, Antoine Pitrou <solipsis at> wrote:
>> It does suggest a whole new class of verbs though, like "make" or "build".
> They are rather vague, though.

Agreed, but what os.path.join actually *does* is rather vague, since
it is really "joins the path segments, starting with the last absolute
path segment".

I'm mostly playing Devil's Advocate here, but I thought it was a very
good point that requesting a Path or PurePath directly is *always*
going to be an option. And really, the only time you *need* a PurePath
is when you want to construct a non-native path - for native paths
you'll always be able to create it, some methods just won't work if it
doesn't actually exist on the filesystem.

Using Victor's more complicated example, compare:

    open(os.path.join(home, ".config", name + ".conf"))

    open(str(Path(home, ".config", name + ".conf")))
    open(str(home.join(".config", name + ".conf")))
    open(str(home / ".config" / name + ".conf"))

    Path(home, ".config", name + ".conf").open()
    home.join(".config", name + ".conf").open()
    (home / ".config" / name + ".conf").open()

One note regarding the extra "str()" calls relative to something like we get to define the language, so we can get the benefits of
implicit conversion without the many downsides by *defining a new
conversion method*, such as __fspath__. That may provide an attractive
alternative to offering methods that shadow builtin functions:

    open(Path(home, ".config", name + ".conf"))
    open(home.join(".config", name + ".conf"))
    open(home / ".config" / name + ".conf")

"As easy to use as, without the design compromises imposed by
inheriting from str" is a worthwhile goal.
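
The conversion-method idea above can be sketched as follows. This is only an illustration, not the PEP 428 API: ToyPath is a hypothetical class, and the sketch assumes a Python version (3.6 or later) in which os.fspath() and open() honor an __fspath__ method, which was not the case when this message was written. Note the parentheses around name + ".conf": "/" binds more tightly than "+", so without them the suffix would be added to the joined path rather than to the file name.

```python
import os
import tempfile


class ToyPath:
    """Hypothetical minimal path type; not the PEP 428 API."""

    def __init__(self, *segments):
        # Accept strings or other path-like objects as segments.
        self._segments = [os.fspath(s) for s in segments]

    def __truediv__(self, other):
        # Joining returns a new object; the original is left unchanged.
        return ToyPath(*self._segments, other)

    def __fspath__(self):
        # The single conversion hook that open() and os functions call.
        return os.path.join(*self._segments)


with tempfile.TemporaryDirectory() as tmp:
    home = ToyPath(tmp)
    name = "example"
    os.mkdir(home / ".config")
    conf = home / ".config" / (name + ".conf")
    with open(conf, "w") as fh:   # no str() call needed
        fh.write("ok\n")
    with open(conf) as fh:
        content = fh.read()
```

The design point is that the path type never has to inherit from str: consumers ask for the string form through one well-defined method.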


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Tue Oct  9 14:16:38 2012
From: ncoghlan at (Nick Coghlan)
Date: Tue, 9 Oct 2012 17:46:38 +0530
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 9, 2012 at 4:33 PM, Yuval Greenfield <ubershmekel at> wrote:
> So python is chartering a path object module, and we should let whichever
> module is the best on pypi eventually get into the std-lib.

No, the module has to at least have a nodding acquaintance with good
software design principles, avoid introducing too many ways to do the
same thing, and various other concerns many authors of modules on PyPI
often don't care about.

That's *why* got rejected in the first place. Just as
ipaddress is not the same as ipaddr due to those additional concerns,
so will whatever path abstraction makes it into the standard library take
those concerns into account.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From solipsis at  Tue Oct  9 14:34:36 2012
From: solipsis at (Antoine Pitrou)
Date: Tue, 9 Oct 2012 14:34:36 +0200 (CEST)
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

Nick Coghlan <ncoghlan at ...> writes:
> One note regarding the extra "str()" calls relative to something like
> we get to define the language, so we can get the benefits of
> implicit conversion without the many downsides by *defining a new
> conversion method*, such as __fspath__. That may provide an attractive
> alternative to offering methods that shadow builtin functions:
>     open(Path(home, ".config", name + ".conf"))
>     open(home.join(".config", name + ".conf"))
>     open(home / ".config" / name + ".conf")
> "As easy to use as, without the design compromises imposed by
> inheriting from str" is a worthwhile goal.

That's a very good idea! Even better if there's a way to make it work as
expected with openat support (for example by allowing __fspath__ to return
a (dir_fd, filename) tuple).



From breamoreboy at  Tue Oct  9 14:47:41 2012
From: breamoreboy at (Mark Lawrence)
Date: Tue, 09 Oct 2012 13:47:41 +0100
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <k516d6$u34$>

On 08/10/2012 19:47, Antoine Pitrou wrote:
> Hello,
> Since there has been some controversy about the joining syntax used in
> PEP 428 (filesystem path objects), I would like to run an informal poll
> about it. Please answer with +1/+0/-0/-1 for each proposal:
> - `p[q]` joins path q to path p

-1 yuck

> - `p + q` joins path q to path p

+1 Pythonic

> - `p / q` joins path q to path p

-0 veering to +0 it just seems wrong but I can't strongly put my finger 
on why.

> - `p.join(q)` joins path q to path p

-1 likely to confuse idiots like me as it's too similar to string.join.

For the last one would there be a real need for a path.join method, or 
has this already been discussed and I've forgotten about it?

> (you can include a rationale if you want, but don't forget to vote :-))
> Thank you
> Antoine.


Mark Lawrence.

From steve at  Tue Oct  9 14:54:52 2012
From: steve at (Steven D'Aprano)
Date: Tue, 09 Oct 2012 23:54:52 +1100
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
Message-ID: <>

On 09/10/12 11:32, Oscar Benjamin wrote:

> The main purpose of quiet NaNs is to propagate through computation
> ruining everything they touch. In a programming language like C that
> lacks exceptions this is important as it allows you to avoid checking
> all the time for invalid values, whilst still being able to know if
> the end result of your computation was ever affected by an invalid
> numerical operation.

Correct, but I'd like to point out that NaNs are a bit more
sophisticated than just "numeric contagion".

1) NaNs carry payload, so you can actually identify what sort of
calculation failed. E.g. NaN-27 might mean "logarithm of a negative
number", while NaN-95 might be "inverse trig function domain error".
Any calculation involving a single NaN is supposed to propagate the
same payload, so at the end of the calculation you can see that you
tried to take the log of a negative number and debug accordingly.

2) On rare occasions, NaNs can validly disappear from a calculation,
leaving you with a non-NaN answer. The rule is, if you can replace
the NaN with *any* other value, and still get the same result, then
the NaN is irrelevant and can be consumed. William Kahan gives an
example:

     For example, 0*NaN must be NaN because 0*∞ is an INVALID
     operation (NaN). On the other hand, for hypot(x, y) :=
     √(x*x + y*y) we find that hypot(∞, y) = +∞ for all real y,
     finite or not, and deduce that hypot(∞, NaN) = +∞ too;
     naive implementations of hypot may do differently.

Page 7 of
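
Both halves of the rule in point 2 can be checked from Python. This is a small sketch, assuming a CPython build whose floats follow IEEE 754 and whose math.hypot follows the C99 special-case rules; the variable names are only for illustration.

```python
import math

nan = float("nan")
inf = float("inf")

# 0 * NaN stays NaN: replacing the NaN with different values (for example
# a finite number versus infinity) would not give one consistent answer.
zero_times_nan = 0.0 * nan

# hypot(inf, y) is +inf for every y, so the NaN can be consumed.
hypot_inf_nan = math.hypot(inf, nan)

# Another instance of the same rule: x ** 0 is 1 for every x.
nan_to_zero = nan ** 0

print(zero_times_nan, hypot_inf_nan, nan_to_zero)
```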


From oscar.j.benjamin at  Tue Oct  9 15:07:45 2012
From: oscar.j.benjamin at (Oscar Benjamin)
Date: Tue, 9 Oct 2012 14:07:45 +0100
Subject: [Python-ideas] Propagating StopIteration value
In-Reply-To: <>
References: <k4q38d$j8e$> <>
Message-ID: <>

On 7 October 2012 23:43, Oscar Benjamin <oscar.j.benjamin at> wrote:
> Before pep 380 filter(lambda x: True, obj) returned an object that was
> the same kind of iterator as obj (it would yield the same values). Now
> the "kind of iterator" that obj is depends not only on the values that
> it yields but also on the value that it returns. Since filter does not
> pass on the same return value, filter(lambda x: True, obj) is no
> longer the same kind of iterator as obj. The same considerations apply
> to many other functions such as map, itertools.groupby,
> itertools.dropwhile.

I really should have checked this before posting but I didn't have
Python 3.3 available:

Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600
32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import itertools
>>> def f():
...     return 'Returned from generator!'
...     yield
>>> next(filter(lambda x:True, f()))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration: Returned from generator!

So filter does propagate the same StopIteration instance. However map does not:

>>> next(map(None, f()))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

The itertools module is inconsistent in this respect as well. As
already mentioned itertools.chain() hides the value:

>>> next(itertools.chain(f(), f()))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> next(itertools.chain(f()))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Other functions may or may not:

>>> next(itertools.dropwhile(lambda x:True, f()))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration: Returned from generator!
>>> next(itertools.groupby(f()))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

These next two seem wrong since there are two iterables (but I don't
think they can be done differently):

>>> def g():
...     return 'From the other generator...'
...     yield
>>> next(itertools.compress(f(), g()))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration: Returned from generator!
>>> next(zip(f(), g()))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration: Returned from generator!

I guess this should be treated as undefined behaviour? Perhaps it
should be documented as such so that anyone who chooses to rely on it
has been warned.
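
For code that does not want to depend on which builtins happen to pass the value along, the documented parts of the behaviour can be sketched like this. The helper names stop_value and passthrough are hypothetical; the sketch relies only on PEP 380 semantics (a generator's return value is attached to StopIteration.value, and "yield from" evaluates to it).

```python
def f():
    return 'Returned from generator!'
    yield  # unreachable, but makes f a generator function


def stop_value(iterator):
    """Return the value attached to the StopIteration raised by next()."""
    try:
        next(iterator)
    except StopIteration as exc:
        return exc.value
    raise RuntimeError("iterator produced an item instead of finishing")


def passthrough(iterable):
    """Hypothetical wrapper that forwards both the items and the return value."""
    result = yield from iterable
    return result


direct = stop_value(f())
via_genexp = stop_value(x for x in f())      # the for loop discards the value
via_wrapper = stop_value(passthrough(f()))   # yield from preserves it
```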

Also some of the itertools documentation is ambiguous in relation to
returning vs yielding values from an iterator. Those on the builtin
functions page are defined carefully:
    filter(function, iterable)
    Construct an iterator from those elements of iterable for which
function returns true.
    map(function, iterable, ...)
    Return an iterator that applies function to every item of
iterable, yielding the results.

But some places in the itertools module use 'return' in place of 'yield':
    itertools.filterfalse(predicate, iterable)
    Make an iterator that filters elements from iterable returning
only those for which the predicate is False. If predicate is None,
return the items that are false.
    itertools.groupby(iterable, key=None)
    Make an iterator that returns consecutive keys and groups from the
iterable. The key is a function computing a key value for each
element. If not specified or is None, key defaults to an identity
function and returns the element unchanged.


From ericsnowcurrently at  Tue Oct  9 15:30:00 2012
From: ericsnowcurrently at (Eric Snow)
Date: Tue, 9 Oct 2012 07:30:00 -0600
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

On Oct 9, 2012 1:12 AM, "Senthil Kumaran" <senthil at> wrote:
> > `p.pathjoin(q)`
> +1
> It is very explicit and hard to get it wrong.


+1

...and it's not _that_ long a name.  This would be a provisional module, so
we could try the name on for size <wink> or hide it behind an operator later.


From massimo.dipierro at  Tue Oct  9 15:49:06 2012
From: massimo.dipierro at (Massimo Di Pierro)
Date: Tue, 9 Oct 2012 08:49:06 -0500
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On Oct 9, 2012, at 12:31 AM, Greg Ewing wrote:

> Massimo DiPierro wrote:
>> The + symbol means addition and union of disjoint sets. A path (including a fs path) is a set of links (for a fs path, a link is a folder name). Using the + symbol has a natural interpretation as concatenation of subpaths (sets) to form a longer path (superset).
> A reason *not* to use '+' is that it would violate associativity
> in some cases, e.g.
>   (path + "foo") + "bar"
> would not be the same as
>   path + ("foo" + "bar")

I am missing something. Why not?

> Using '/', or any other operator not currently defined on strings,
> would prevent this mistake from occurring.
> A reason to want an operator is the symmetry of path concatenation.
> Symmetrical operations deserve a symmetrical syntax, and to achieve
> that in Python you need either an operator or a stand-alone function.
> A reason to prefer an operator over a function is associativity.
> It would be nice to be able to write
>   path1 / path2 / path3
> and not have to think about the order in which the operations are
> being done.
> If '/' is considered too much of a stretch, how about '&'? It
> suggests a kind of addition or concatenation, and in fact is
> used for string concatenation in some other languages.
> -- 
> Greg
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at

From solipsis at  Tue Oct  9 16:00:40 2012
From: solipsis at (Antoine Pitrou)
Date: Tue, 9 Oct 2012 14:00:40 +0000 (UTC)
Subject: [Python-ideas] PEP 428: poll about the joining syntax
References: <>
Message-ID: <>

Eric Snow <ericsnowcurrently at ...> writes:
> > > `p.pathjoin(q)`
> >
> > +1
> >
> > It is very explicit and hard to get it wrong.
> +1
> ...and it's not _that_ long a name. This would be a provisional module, so
> we could try the name on for size <wink> or hide it behind an operator later.

Or, precisely, since it's provisional, we needn't *wait* before we provide an 
operator. Any stdlib module API can be augmented; what provisional modules
allow in addition to that is to modify or remove existing APIs.

So we can, say, enable Path.__truediv__ and wait for people to complain about it.

By the way, it's not new to have dual operator / method APIs. For example
set.union and set.__or__; list.extend and list.__iadd__; etc.
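
The dual operator / method pattern mentioned above looks like this in practice. The set and list lines use long-standing builtins; the last lines assume pathlib as it later shipped in the standard library (Python 3.4 onwards), which was still undecided when this message was written.

```python
from pathlib import PurePosixPath

a, b = {1, 2}, {2, 3}
same_union = (a | b) == a.union(b)      # set.__or__ and set.union

x, y = [1], [1]
x += [2]                                 # list.__iadd__
y.extend([2])                            # list.extend
same_list = x == y

base = PurePosixPath("/home/user")
via_operator = base / ".config" / "app.conf"
via_method = base.joinpath(".config", "app.conf")
```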



From michelelacchia at  Tue Oct  9 16:27:24 2012
From: michelelacchia at (Michele Lacchia)
Date: Tue, 9 Oct 2012 07:27:24 -0700 (PDT)
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

> - `p[q]` joins path q to path p 

For some obscure reason I really like this one. I can understand the
arguments against it though. So I'll probably be the only one to be +0 on this.

> - `p + q` joins path q to path p

I agree with those who say this operator should be used for suffix appending, and
for path components.

> - `p / q` joins path q to path p

I'm not against the div operator, but I'd prefer to use another one. I'm a *nix
person but I find this proposal too *nix-centric.

About the operator: I really like *Steven D'Aprano*'s proposal: I find *&* just perfect.
I'm way more than +1 on it.

> - `p.join(q)` joins path q to path p

I feel the need for a method, in parallel with some operator. About the
name: if join is rejected I am:
    +1 on add()
    +1 on adjoin()
    +0 on append()
    -1 on pathjoin() / joinpath() -- too long, too similar, way too ugly.

From michelelacchia at  Tue Oct  9 16:30:08 2012
From: michelelacchia at (Michele Lacchia)
Date: Tue, 9 Oct 2012 07:30:08 -0700 (PDT)
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

> > 
> > A reason *not* to use '+' is that it would violate associativity 
> > in some cases, e.g. 
> > 
> >   (path + "foo") + "bar" 
> > 
> > would not be the same as 
> > 
> >   path + ("foo" + "bar") 
> > 
> I am missing something. Why not? 

Because the result would be (respectively): *path/foo/bar* and *path/foobar*
In the second example the two strings would be concatenated and only
then joined to the path.
This is a very good argument against the + operator!
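
The associativity problem is easy to reproduce with a toy class. PlusPath below is hypothetical and exists only to show what happens if "+" is used for joining while plain strings keep their usual "+".

```python
class PlusPath:
    """Hypothetical path type where `+` joins a new segment."""

    def __init__(self, *segments):
        self.segments = tuple(segments)

    def __add__(self, segment):
        return PlusPath(*self.segments, segment)

    def __str__(self):
        return "/".join(self.segments)


path = PlusPath("path")
left = str((path + "foo") + "bar")    # two joins
right = str(path + ("foo" + "bar"))   # string concatenation first, then one join
print(left, right)
```

With an operator that str does not define, such as "/", the second grouping raises a TypeError instead of silently producing a different path.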

From massimo.dipierro at  Tue Oct  9 16:57:12 2012
From: massimo.dipierro at (Massimo Di Pierro)
Date: Tue, 9 Oct 2012 09:57:12 -0500
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

This is an excellent point. I change my vote to using the / operator (wait, do I even have any right to vote on this?).

On Oct 9, 2012, at 9:30 AM, Michele Lacchia wrote:

> > 
> > A reason *not* to use '+' is that it would violate associativity 
> > in some cases, e.g. 
> > 
> >   (path + "foo") + "bar" 
> > 
> > would not be the same as 
> > 
> >   path + ("foo" + "bar") 
> > 
> I am missing something. Why not? 
> Because the result would be (respectively): path/foo/bar and path/foobar.
> In the second example the two strings would be concatenated and only
> then joined to the path.
> This is a very good argument against the + operator!


From him at  Tue Oct  9 16:58:49 2012
From: him at (Joachim König)
Date: Tue, 09 Oct 2012 16:58:49 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On 09/10/2012 16:30, Michele Lacchia wrote:
>     >
>     > A reason *not* to use '+' is that it would violate associativity
>     > in some cases, e.g.
>     >
>     >   (path + "foo") + "bar"
>     >
>     > would not be the same as
>     >
>     >   path + ("foo" + "bar")
>     >
>     I am missing something. Why not?
> Because the result would be (respectively): /path/foo/bar/ and 
> /path/foobar/.
> In the second example the two strings would be concatenated and only
> then joined to the path.
> This is a very good argument against the + operator!

But why not interpret a path as a tuple (not a list, it's immutable) of
path segments and have:

     path + ("foo", "bar")

and

     path + ".tar.gz"

behave differently (i.e. tuples add segments and strings add to the last
segment)?

And of course path1 + path2 adds the segments together.
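
A rough sketch of how that dispatch could look, assuming a hypothetical TuplePath class (this is an illustration of the proposal, not an existing API):

```python
class TuplePath:
    """Hypothetical path type; a sketch of the proposal, not a real API."""

    def __init__(self, *segments):
        self.segments = tuple(segments)

    def __add__(self, other):
        if isinstance(other, TuplePath):
            # path + path: concatenate the segments.
            return TuplePath(*self.segments, *other.segments)
        if isinstance(other, tuple):
            # path + tuple of strings: each string becomes a new segment.
            return TuplePath(*self.segments, *other)
        if isinstance(other, str):
            # path + string: extend the last segment instead of adding one.
            if not self.segments:
                raise ValueError("cannot add a suffix to an empty path")
            return TuplePath(*self.segments[:-1], self.segments[-1] + other)
        return NotImplemented

    def __str__(self):
        return "/".join(self.segments)


path = TuplePath("srv", "data")
added_segments = str(path + ("foo", "bar"))
added_suffix = str(path + ".tar.gz")
added_path = str(path + TuplePath("x"))
one_segment = str(path + ("file.txt",))   # a single segment needs a 1-tuple
```

The last line shows the cost of this design: adding a single segment requires the one-element tuple spelling.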



From barry at  Tue Oct  9 17:15:29 2012
From: barry at (Barry Warsaw)
Date: Tue, 9 Oct 2012 11:15:29 -0400
Subject: [Python-ideas] Make "is" checks on non-singleton literals errors
References: <>
Message-ID: <>

On Oct 08, 2012, at 06:13 PM, Raymond Hettinger wrote:

>On Oct 8, 2012, at 12:44 PM, Mike Graham <mikegraham-Re5JQEeQqe8AvxtiuMwx3w at> wrote:
>> I regularly see learners using "is" to check for string equality and
>> sometimes other equality. Due to optimizations, they often come away
>> thinking it worked for them.
>> There are no cases where
>>    if x is "foo":
>> or
>>   if x is 4:
>> is actually the code someone intended to write.
>> Although this has no benefit to anyone but new learners, it also
>> doesn't really do any harm.
>This seems like a job for pyflakes, pylint, or pychecker.



From storchaka at  Tue Oct  9 17:28:14 2012
From: storchaka at (Serhiy Storchaka)
Date: Tue, 09 Oct 2012 18:28:14 +0300
Subject: [Python-ideas] Propagating StopIteration value
In-Reply-To: <>
References: <k4q38d$j8e$> <>
Message-ID: <k51fql$r7k$>

On 09.10.12 16:07, Oscar Benjamin wrote:
> I really should have checked this before posting but I didn't have
> Python 3.3 available:

Generator expression also eats the StopIteration value:

 >>> next(x for x in f())
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
StopIteration

> These next two seem wrong since there are two iterables (but I don't
> think they can be done differently):
>>>> def g():
> ....     return 'From the other generator...'
> ....     yield
> ....
>>>> next(itertools.compress(f(), g()))
> Traceback (most recent call last):
>    File "<stdin>", line 1, in <module>
> StopIteration: Returned from generator!
>>>> next(zip(f(), g()))
> Traceback (most recent call last):
>    File "<stdin>", line 1, in <module>
> StopIteration: Returned from generator!

 >>> def h():
...     yield 42
...     return 'From the another generator...'
 >>> next(zip(f(), h()))
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
StopIteration: Returned from generator!
 >>> next(zip(h(), f()))
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
StopIteration: Returned from generator!

This is logical: the value is returned from the first exhausted iterator.

> I guess this should be treated as undefined behaviour? Perhaps it
> should be documented as such so that anyone who chooses to rely on it
> has been warned.

This should be treated as an implementation detail for now.

From storchaka at  Tue Oct  9 17:34:57 2012
From: storchaka at (Serhiy Storchaka)
Date: Tue, 09 Oct 2012 18:34:57 +0300
Subject: [Python-ideas] Propagating StopIteration value
In-Reply-To: <>
References: <k4q38d$j8e$> <>
	<> <k4tefg$52k$>
Message-ID: <k51g78$ur1$>

On 09.10.12 10:51, Greg Ewing wrote:
> Where we seem to disagree is on
> whether returning a value with StopIteration is part of the
> iterator protocol or the generator protocol.

Does a generator expression work with the iterator protocol or the 
generator protocol?

A generator expression eats a value with StopIteration:

 >>> def G():
...     return 42
...     yield
 >>> next(x for x in G())
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
StopIteration

Is it a bug?

From ethan at  Tue Oct  9 17:54:03 2012
From: ethan at (Ethan Furman)
Date: Tue, 09 Oct 2012 08:54:03 -0700
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <20121009043236.GI27445@ando>
References: <>	<>	<>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <>

Steven D'Aprano wrote:
> 1) It is not the case that NaN <comp> NaN is always false.

Huh -- well, apparently NaN != NaN --> True.

However, borrowing Steven's earlier example, and modifying slightly:

sqr(-1) != sqr(-1)

Shouldn't this be False?

Or, to look at it another way, surely somewhere out in the Real World 
(tm) it is the case that two NaNs are indeed equal.


From christian at  Tue Oct  9 18:11:08 2012
From: christian at (Christian Heimes)
Date: Tue, 09 Oct 2012 18:11:08 +0200
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

Am 08.10.2012 20:40, schrieb Guido van Rossum:
> Now I know what it is I think that (a) the abstract reactor design
> should support IOCP, and (b) the stdlib should have enabled by default
> IOCP when on Windows.

I've created a ticket for the topic:


From steve at  Tue Oct  9 18:11:42 2012
From: steve at (Steven D'Aprano)
Date: Wed, 10 Oct 2012 03:11:42 +1100
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>	<>	<>	<>
	<20121009043236.GI27445@ando> <>
Message-ID: <>

On 10/10/12 02:54, Ethan Furman wrote:

> Or, to look at it another way, surely somewhere out in the Real
>World (tm) it is the case that two NaNs are indeed equal.

By definition, no.
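
In Python terms, and tying back to the subject of this thread (containers checking identity before comparing), a short sketch assuming CPython floats with IEEE 754 semantics:

```python
import math

x = float("nan")
y = float("nan")

never_equal = (x == x, x == y)        # both False
always_unequal = (x != x, x != y)     # both True

# Containers check identity before equality, so the same NaN object
# is "found" even though it does not compare equal to itself.
found_same_object = x in [x]
found_other_object = y in [x]
lists_equal = [x] == [x]

# The reliable test is math.isnan(), not a comparison.
is_nan = math.isnan(x)
```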


From storchaka at  Tue Oct  9 18:18:09 2012
From: storchaka at (Serhiy Storchaka)
Date: Tue, 09 Oct 2012 19:18:09 +0300
Subject: [Python-ideas] Propagating StopIteration value
In-Reply-To: <>
References: <k4q38d$j8e$> <>
Message-ID: <k51io8$qmo$>

On 09.10.12 01:44, Guido van Rossum wrote:
> I don't understand that code at all, and it seems to be undocumented
> (no docstrings, no mention in the external docs). Why is it using
> StopIteration at all? There isn't an iterator or generator in sight.
> AFAICT it should just use a different exception.

I agree with you. StopIteration is not needed here (or I don't
understand that code); ValueError can be raised instead. Perhaps the
author was going to use it for iterative parsing.

This is a bad example, but it is the only real example I have. I also
have an artificial (but also imperfect) example:

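import bz2
import itertools
import zlib

def file_reader(f, chunk_size=8192):
    # The loop body was lost from the archived message; reading fixed-size
    # chunks until EOF (chunk_size is an assumed value) matches the intent.
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            return
        yield chunk

def zlib_decompressor(input):
    d = zlib.decompressobj()
    while not d.eof:
        yield d.decompress(d.unconsumed_tail or next(input))
    # Bytes found after the end of the compressed stream.
    return d.unused_data

def bzip2_decompressor(input):
    decomp = bz2.BZ2Decompressor()
    while not decomp.eof:
        yield decomp.decompress(next(input))
    return decomp.unused_data

def detect_decompressor(input):
    # Collect at least 5 bytes so the stream tag can be inspected;
    # input must be an iterator of bytes chunks.
    data = b''
    for chunk in input:
        data += chunk
        if len(data) >= 5:
            break
    if data.startswith(b'deflt'):
        decompressor = zlib_decompressor
        data = data[5:]
    elif data.startswith(b'bzip2'):
        decompressor = bzip2_decompressor
        data = data[5:]
    else:
        decompressor = None
    # Push back whatever was read beyond the tag.
    input = itertools.chain([data], input)
    return decompressor, input

def multi_stream_decompressor(input):
    while True:
        decompressor, input = detect_decompressor(input)
        if decompressor is None:
            return input
        # The sub-generator's return value is the data left over after its stream.
        unused_data = yield from decompressor(input)
        if not unused_data:
            return input
        input = itertools.chain([unused_data], input)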

Of course this can be implemented without generators, using a class to 
hold a state.

> I think you're going at this from the wrong direction. You shouldn't
> be using this feature in circumstances where you're at all likely to
> run into this "problem".

I think that new language features (such as returning a value from
generators/iterators) will generate new methods of solving problems,
and for these new methods it will be useful to extend the existing tools.
But now I see that it is still too early to talk about it.

> Itertools is for iterators, and all the extra generator
> features make no sense for it.

As Greg said, the question is whether returning a value with 
StopIteration is part of the iterator protocol or the generator protocol.

From amcnabb at  Tue Oct  9 18:35:02 2012
From: amcnabb at (Andrew McNabb)
Date: Tue, 9 Oct 2012 10:35:02 -0600
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 09, 2012 at 11:34:24PM +1300, Greg Ewing wrote:
> I just consulted a thesaurus about synonyms for 'append',
> and it came up with 'affix' and 'adjoin'.

Yet another possibility is "combine", which unlike "join", gives less of
an implicit guarantee that it's a straightforward concatenation.

Other synonyms for "combine" include "couple", "fuse", and "hitch".

Andrew McNabb
PGP Fingerprint: 8A17 B57C 6879 1863 DE55  8012 AB4D 6098 8826 6868

From ethan at  Tue Oct  9 18:37:09 2012
From: ethan at (Ethan Furman)
Date: Tue, 09 Oct 2012 09:37:09 -0700
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>	<>	<>	<>
	<>	<>
	<>	<>
	<>	<>	<>	<>	<>	<>	<>
Message-ID: <>

Greg Ewing wrote:
> Nick Coghlan wrote:
>> Moving from "os.path.join(a, b, c, d, e)" (or, the way I often write
>> it, "joinpath(a, b, c, d, e)") to "a.joinpath(b, c, d, e)" at least
>> isn't going backwards, and is more obvious in isolation than "a / b /
>> c / d / e".
> I think we should keep in mind that we're (hopefully) not going
> to see things like "a / b / c / d / e" in real-life code. Rather
> we're going to see things like
>    backupath = destdir / "archive" / filename + ".bak"
> In other words, there should be some clue from the names
> that paths are involved, from which it should be fairly
> easy to guess what the "/" means.


From ryan at  Tue Oct  9 18:59:06 2012
From: ryan at (Ryan D Hiebert)
Date: Tue, 9 Oct 2012 09:59:06 -0700
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Oct 9, 2012, at 1:18 AM, Joachim König <him at> wrote:
> As has already been stated by others, paths are immutable so using them
> like lists is leading to confusion (and list's append() only wants one arg, so
> extend() might be better in that case).
> But paths could then be interpreted as tuples of "directory entries" instead.
> So adding a path to a path would "join" them:
> pathA + pathB
> and in order to not always need a path object for pathB one could also write
> the right argument of __add__ as a tuple of strings:
> pathA + ("somedir", "file.txt")

I like it. As you pointed out, my comparison with list is inappropriate because of path's immutability. So .append() and .extend() probably don't make sense.

> One could also use "+" for adding to the last segment if it isn't a path object or a tuple:
> pathA + ".tar.gz"

This might be a reasonable way to appease both those who are viewing path as a special tuple and those who are viewing it as a special string. It breaks the parallel with tuple a bit, but it's clear that there are important properties of both strings and tuples that would be nice to preserve.


From guido at  Tue Oct  9 19:05:12 2012
From: guido at (Guido van Rossum)
Date: Tue, 9 Oct 2012 10:05:12 -0700
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 9, 2012 at 2:11 AM, Greg Ewing <greg.ewing at> wrote:
> Ben Darnell wrote:
>> StackContext doesn't quite give you better tracebacks, although I
>> think it could be adapted to do that.  ExceptionStackContext is
>> essentially a try/except block that follows you around across
>> asynchronous operations - on entry it sets a thread-local state, and
>> all the tornado asynchronous functions know to save this state when
>> they are passed a callback, and restore it when they execute it.

> This is something that generator-based coroutines using
> yield-from ought to handle a lot more cleanly. You should
> be able to just use an ordinary try-except block in your
> generator code and have it do the right thing.

Indeed, in NDB this works great. However tracebacks don't work so
great: If you don't catch the exception right away, it takes work to
make the tracebacks look right when you catch it a few generator calls
down on the (conceptual) stack. I fixed this to some extent in NDB, by
passing the traceback explicitly along when setting an exception on a
Future; before I did this, tracebacks looked awful. But there are
still quite a few situations in NDB where an uncaught
exception prints a baffling traceback, showing lots of frames from the
event loop and other async machinery but not the user code that was
actually waiting for anything. I have to study Tornado's StackContext to see if
there are ideas there for improving this.

> I hope that the new async core will be designed so that
> generator-based coroutines can be plugged into it directly
> and efficiently, without the need for a lot of decorators,
> callbacks, Futures, etc. in between.

That has been my hope too. But so far when thinking about this
recently I have found the goal elusive -- somehow it seems there *has*
to be a distinction between an operation you just *yield* (this would
be waiting for a specific low-level I/O operation) and something you
use with yield-from, which returns a value through StopIteration. I
keep getting a headache when I think about this, so there must be a
Monad in there somewhere... :-( Perhaps you can clear things up by
showing some detailed (but still simple enough) example code to handle
e.g. a simple web client?
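
The distinction described above can be made concrete with a toy scheduler. Everything here is hypothetical (the names read_line, fetch_two and run, and the ("read", name) request format); it is a sketch of the two roles, not a proposal for the real API. A bare yield hands a low-level request to the scheduler, while yield from calls another coroutine and receives its return value, which travels through StopIteration.

```python
def read_line(source):
    # Low-level operation: a bare yield hands a request to the scheduler,
    # which later sends the result back in.
    line = yield ("read", source)
    return line


def fetch_two(source):
    # Higher-level coroutine: yield from delegates to a sub-coroutine and
    # evaluates to the value that sub-coroutine returned.
    first = yield from read_line(source)
    second = yield from read_line(source)
    return first + second


def run(coro, data):
    """Toy scheduler: answers each ("read", name) request from `data`."""
    reply = None
    try:
        while True:
            op, name = coro.send(reply)
            assert op == "read"
            reply = data[name].pop(0)
    except StopIteration as stop:
        # The coroutine's return value arrives on the exception.
        return stop.value


result = run(fetch_two("a"), {"a": ["x", "y"]})
```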

--Guido van Rossum (

From eric at  Tue Oct  9 19:11:48 2012
From: eric at (Eric V. Smith)
Date: Tue, 09 Oct 2012 13:11:48 -0400
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On 10/09/2012 12:59 PM, Ryan D Hiebert wrote:
> On Oct 9, 2012, at 1:18 AM, Joachim König <him at> wrote:
>> As has already been stated by others, paths are immutable so using them
>> like lists is leading to confusion (and list's append() only wants one arg, so
>> extend() might be better in that case).
>> But paths could then be interpreted as tuples of "directory entries" instead.
>> So adding a path to a path would "join" them:
>> pathA + pathB
>> and in order to not always need a path object for pathB one could also write
>> the right argument of __add__ as a tuple of strings:
>> pathA + ("somedir", "file.txt")
> I like it. As you pointed out, my comparison with list is inappropriate because of path's immutability. So .append() and .extend() probably don't make sense.
>> One could also use "+" for adding to the last segment if it isn't a path object or a tuple:
>> pathA + ".tar.gz"

But then you'd have to say:

pathA + ("file.txt",)

right?


That doesn't seem very friendly.


From guido at  Tue Oct  9 19:44:27 2012
From: guido at (Guido van Rossum)
Date: Tue, 9 Oct 2012 10:44:27 -0700
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<> <>
	<> <>
Message-ID: <>

On Tue, Oct 9, 2012 at 10:11 AM, Eric V. Smith <eric at> wrote:
> On 10/09/2012 12:59 PM, Ryan D Hiebert wrote:
>> On Oct 9, 2012, at 1:18 AM, Joachim König <him at> wrote:
>>> As has already been stated by others, paths are immutable so using them
>>> like lists is leading to confusion (and list's append() only wants one arg, so
>>> extend() might be better in that case).
>>> But paths could then be interpreted as tuples of "directory entries" instead.
>>> So adding a path to a path would "join" them:
>>> pathA + pathB
>>> and in order to not always need a path object for pathB one could also write
>>> the right argument of __add__ as a tuple of strings:
>>> pathA + ("somedir", "file.txt")
>> I like it. As you pointed out, my comparison with list is inappropriate because of path's immutability. So .append() and .extend() probably don't make sense.
>>> One could also use "+" for adding to the last segment if it isn't a path object or a tuple:
>>> pathA + ".tar.gz"
> But then you'd have to say:
> pathA + ("file.txt",)
> right?
> That doesn't seem very friendly.

Yeah, like the problem with % formatting. Another argument for picking
a method name.

--Guido van Rossum (

From ryan at  Tue Oct  9 19:48:09 2012
From: ryan at (Ryan D Hiebert)
Date: Tue, 9 Oct 2012 10:48:09 -0700
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Oct 9, 2012, at 10:11 AM, Eric V. Smith <eric at> wrote:
>>> One could also use "+" for adding to the last segment if it isn't a path object or a tuple:
>>> pathA + ".tar.gz"
> But then you'd have to say:
> pathA + ("file.txt",)


pathA + Path("file.txt")

Just like with any tuple, if you wish to add a new part, it must be a tuple (Path) first.

I'm not convinced that adding a string to a path should be allowed, but if not then we should probably throw a TypeError if it's not a tuple or Path. That would leave the following method for appending a suffix:

path[:-1] + Path(path[-1] + '.tar.gz')

That's a lot more verbose than the option to "add a string".
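A minimal sketch of the "path as an immutable tuple of segments" idea discussed above. The class name `SketchPath` and its behaviour are assumptions made up for illustration, not the PEP 428 API: it accepts a path or a plain tuple on the right of `+` and rejects bare strings with a TypeError.

```python
class SketchPath(tuple):
    """Hypothetical immutable path modelled as a tuple of segments."""

    def __new__(cls, *segments):
        return super().__new__(cls, segments)

    def __add__(self, other):
        # Accept another SketchPath or a plain tuple of strings.
        if isinstance(other, tuple):
            return SketchPath(*self, *other)
        # Anything else (including str) ends up as a TypeError.
        return NotImplemented

    def __getitem__(self, index):
        result = super().__getitem__(index)
        # Slices come back as plain tuples, so re-wrap them.
        if isinstance(index, slice):
            return SketchPath(*result)
        return result

    def __str__(self):
        return "/".join(self)


base = SketchPath("usr", "lib")
joined = base + ("somedir", "file.txt")

archive = SketchPath("data", "archive")
# The verbose suffix spelling from the message above.
with_suffix = archive[:-1] + SketchPath(archive[-1] + ".tar.gz")

try:
    archive + ".tar.gz"
except TypeError:
    string_rejected = True
else:
    string_rejected = False
```

With this design the one-element tuple `("file.txt",)` is still required for a single segment, which is the unfriendliness pointed out earlier in the thread.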


From _ at  Tue Oct  9 20:00:21 2012
From: _ at (Laurens Van Houtven)
Date: Tue, 9 Oct 2012 20:00:21 +0200
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

Oh my me. This is a very long thread that I probably should have replied to
a long time ago. This thread is intensely long right now, and tonight is
the first chance I've had to try and go through it comprehensively. I'll
try to reply to individual points made in the thread -- if I missed yours,
please don't be offended, I promise it's my fault :)

FYI, I'm the sucker who originally got tricked into starting PEP 3153, aka

First of all, I'm glad to see that there's some more "let's get that pep
along" movement. I tabled it because:

a) I didn't have enough time to contribute,
b) a lot of promised contributions ended up not happening when it came down
to it, which was incredibly demotivating. The combination of this thread,
plus the fact that I was strong-armed at PyCon ZA by a bunch of community
members that shall not be named (Alex, Armin, Maciej, Larry ;-)), has got
me exploring this thing again.

First of all, I don't feel async-pep is an attempt at twisted light in the
stdlib. Other than separation of transport and protocol, there's not really
much there that even smells of twisted (especially since right now I'd
probably throw consumers/producers out) -- and that separation is simply
good practice. Twisted does the same thing, but it didn't invent it.
Furthermore, the advantages seem clear: reusability and testability are
more than enough for me.

If there's one take away idea from async-pep, it's reusable protocols.

The PEP should probably be a number of PEPs. At first sight, it seems that
this number is at least four:

1. Protocol and transport abstractions, making no mention of asynchronous
IO (this is what I want 3153 to be, because it's small, manageable, and
virtually everyone appears to agree it's a fantastic idea)
2. A base reactor interface
3. A way of structuring callbacks: probably deferreds with a built-in
inlineCallbacks for people who want to write synchronous-looking code with
explicit yields for asynchronous procedures
4+ adapting the stdlib tools to using these new things
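To make the first item concrete, here is a small sketch of a protocol that knows nothing about sockets or event loops, exercised against a fake transport. The method names (`connection_made`, `data_received`, `write`) and both classes are assumptions chosen for illustration, not text from the PEP.

```python
class EchoUpperProtocol:
    """Hypothetical protocol: pure logic, no I/O of its own."""

    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        # Reply with the upper-cased payload via whatever transport we got.
        self.transport.write(data.upper())


class FakeTransport:
    """Test double that records writes instead of touching a socket."""

    def __init__(self):
        self.written = []

    def write(self, data):
        self.written.append(data)


transport = FakeTransport()
protocol = EchoUpperProtocol()
protocol.connection_made(transport)
protocol.data_received(b"hello")
```

The same protocol object could be handed to a TCP, TLS, or in-memory transport unchanged, which is the reusability and testability argument made above.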

Re: forward path for existing asyncore code. I don't remember this being
raised as an issue. If anything, it was mentioned in passing, and I think
the answer to it was something to the tune of "asyncore's API is broken,
fixing it is more important than backwards compat". Essentially I agree
with Guido that the important part is an upgrade path to a good third-party
library, which is the part about asyncore that REALLY sucks right now.
Regardless, an API upgrade is probably a good idea. I'm not sure if it
should go in the first PEP: given the separation I've outlined above (which
may be too spread out...), there's no obvious place to put it besides it
being a new PEP.

Re base reactor interface: drawing maximally from the lessons learned in
twisted, I think IReactorCore (start, stop, etc), IReactorTime (call later,
etc), asynchronous-looking name lookup, fd handling are the important
parts. call_every can be implemented in terms of call_later on a separate
object, so I think it should be (eg twisted.internet.task.LoopingCall). One
thing that is apparently forgotten about is event loop integration. The
prime way of having two event loops cooperate is *NOT* "run both in
parallel", it's "have one call the other". Even though not all loops
support this, I think it's important to get this as part of the interface
(raise an exception for all I care if it doesn't work).
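The point about call_every can be sketched as follows. This is a toy reactor with a simulated clock, and `ToyReactor` and its `LoopingCall` are made-up names that only loosely mirror the Twisted concepts mentioned above. The repeating call lives on a separate object and needs nothing from the reactor beyond `call_later`.

```python
import heapq
import itertools


class ToyReactor:
    """Hypothetical reactor with a simulated clock (no real I/O or sleeping)."""

    def __init__(self):
        self.now = 0.0
        self._queue = []
        self._counter = itertools.count()  # tie-breaker for equal times

    def call_later(self, delay, func, *args):
        entry = (self.now + delay, next(self._counter), func, args)
        heapq.heappush(self._queue, entry)

    def run(self, until):
        # Fire every scheduled call whose time is not past `until`.
        while self._queue and self._queue[0][0] <= until:
            when, _, func, args = heapq.heappop(self._queue)
            self.now = when
            func(*args)
        self.now = until


class LoopingCall:
    """Repeating call built purely on top of call_later."""

    def __init__(self, reactor, interval, func):
        self.reactor = reactor
        self.interval = interval
        self.func = func
        self.running = False

    def start(self):
        self.running = True
        self.reactor.call_later(self.interval, self._tick)

    def stop(self):
        self.running = False

    def _tick(self):
        if not self.running:
            return
        self.func()
        # Reschedule ourselves; the reactor never learns about "every".
        self.reactor.call_later(self.interval, self._tick)


reactor = ToyReactor()
ticks = []
loop = LoopingCall(reactor, 1.0, lambda: ticks.append(reactor.now))
loop.start()
reactor.run(until=3.5)
```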


From g.brandl at  Tue Oct  9 21:24:08 2012
From: g.brandl at (Georg Brandl)
Date: Tue, 09 Oct 2012 21:24:08 +0200
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <k51tij$3ee$>

Am 08.10.2012 20:47, schrieb Antoine Pitrou:
> Hello,
> Since there has been some controversy about the joining syntax used in
> PEP 428 (filesystem path objects), I would like to run an informal poll
> about it. Please answer with +1/+0/-0/-1 for each proposal:
> - `p[q]` joins path q to path p


> - `p + q` joins path q to path p


> - `p / q` joins path q to path p


> - `p.join(q)` joins path q to path p


+0 for .joinpath() as the only way, +1 for .joinpath() as an alternative.
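For comparison, a short sketch of two of the polled spellings using the `pathlib` module that later shipped in the standard library (Python 3.4) as the outcome of PEP 428. The paths are invented examples, and `PurePosixPath` is used so that nothing touches the filesystem.

```python
from pathlib import PurePosixPath

p = PurePosixPath("/usr")
q = PurePosixPath("lib/python")

joined_with_operator = p / q           # the `p / q` spelling
joined_with_method = p.joinpath(q)     # the explicit method spelling
joined_with_strings = p / "lib" / "python"
```

All three expressions produce the same path object; the operator also accepts plain strings on the right-hand side.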


From him at  Tue Oct  9 21:30:15 2012
From: him at (Joachim König)
Date: Tue, 09 Oct 2012 21:30:15 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On 09.10.2012 19:11, Eric V. Smith wrote:
> But then you'd have to say:
> pathA + ("file.txt",)
> right?
> That doesn't seem very friendly.

You could of course write:

pathA + "/file.txt"

because with a separator it's still explicit. But this requires 
because "/file.txt" could be considered an absolute path. But IMO the
string addition should be concatenation. YMMV.
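The absolute-path ambiguity can be seen with the existing string-based join. This is a small illustration using `posixpath.join` from the standard library (the paths are invented examples), alongside plain string concatenation for the suffix case.

```python
import posixpath

# A relative second argument is appended to the first.
relative = posixpath.join("/home/user", "file.txt")

# A second argument starting with "/" is treated as absolute and
# discards everything before it.
absolute = posixpath.join("/home/user", "/file.txt")

# Plain string addition is simple concatenation, e.g. for a suffix.
concatenated = "/home/user/archive" + ".tar.gz"
```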


From andre.roberge at  Tue Oct  9 21:32:36 2012
From: andre.roberge at (Andre Roberge)
Date: Tue, 9 Oct 2012 16:32:36 -0300
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 8, 2012 at 3:47 PM, Antoine Pitrou <solipsis at> wrote:

> Hello,
> Since there has been some controversy about the joining syntax used in
> PEP 428 (filesystem path objects), I would like to run an informal poll
> about it. Please answer with +1/+0/-0/-1 for each proposal:
> - `p[q]` joins path q to path p
-1    ... semantics too different from usual meaning of [ ]

- `p + q` joins path q to path p
-1  ... too problematic with strings...

- `p / q` joins path q to path p
+0   ... my brain is hard-wired to see / as division or equivalent (e.g.
quotient groups, etc.)

- `p.join(q)` joins path q to path p
+1    .... only remaining choice.  Besides, I think an explicit method
makes more sense.

If paths were only directories, I would have really liked   [with support for multiple arguments of course]
as everyone (I think) would naturally recognize this...   However, since we
can have file as well, I was trying to think of something to mean
change path p so that it now points to the joining of path p and q ...
and suggest p.goto(q)  ;-)  ;-)


> (you can include a rationale if you want, but don't forget to vote :-))
> Thank you
> Antoine.

From tjreedy at  Tue Oct  9 21:32:44 2012
From: tjreedy at (Terry Reedy)
Date: Tue, 09 Oct 2012 15:32:44 -0400
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <k51u5c$85f$>

On 10/9/2012 9:30 AM, Eric Snow wrote:
> On Oct 9, 2012 1:12 AM, "Senthil Kumaran"
> <senthil at
> <mailto:senthil at>> wrote:
>  > > `p.pathjoin(q)`
>  >
>  > +1
>  >
>  > It is very explicit and hard to get it wrong.

or path.concat(otherpath)

Terry Jan Reedy

From g.brandl at  Tue Oct  9 22:15:42 2012
From: g.brandl at (Georg Brandl)
Date: Tue, 09 Oct 2012 22:15:42 +0200
Subject: [Python-ideas] Make "is" checks on non-singleton literals errors
In-Reply-To: <>
References: <>
Message-ID: <k520j8$sgj$>

Am 08.10.2012 22:38, schrieb Joshua Landau:

>     Conversely, I often see this:
>         if x == None
>     and even
>         if x == True
>     Okay, so maybe these are less harmful than the original complaint, but still,
>     yuck!
> We can't really warn against these.
>     >>> class EqualToTrue:
>     ...     def __eq__(self, other):
>     ...             return other is True
>     ... 
>     >>> EqualToTrue() is True
>     False
>     >>> EqualToTrue() == True
>     True

The point is that in 99.9...% of cases,

  if x == True:

is just

  if x:


From solipsis at  Tue Oct  9 22:16:00 2012
From: solipsis at (Antoine Pitrou)
Date: Tue, 9 Oct 2012 22:16:00 +0200
Subject: [Python-ideas] Make "is" checks on non-singleton literals errors
References: <>
Message-ID: <>

On Tue, 09 Oct 2012 22:15:42 +0200
Georg Brandl <g.brandl at> wrote:
> The point is that in 99.9...% of cases,
>   if x == True:
> is just
>   if x:

But it's not dangerous to write `if x == True`, and so there isn't any
point in warning. As Raymond said, this is a job for a style checker.




From g.brandl at  Tue Oct  9 22:16:49 2012
From: g.brandl at (Georg Brandl)
Date: Tue, 09 Oct 2012 22:16:49 +0200
Subject: [Python-ideas] Make "is" checks on non-singleton literals errors
In-Reply-To: <k50sh0$4q8$>
References: <>
Message-ID: <k520lc$sgj$>

Am 09.10.2012 11:58, schrieb Serhiy Storchaka:
> On 09.10.12 02:05, Mike Graham wrote:
>> I can't find this in a couple versions of Python I checked. If this
>> code is still around, it sounds like it has a bug and should be fixed.
> It's "if node.tagname is 'admonition':" line.

It's not part of Python anyway, and should be reported to the docutils
maintainers.


From storchaka at  Tue Oct  9 23:10:06 2012
From: storchaka at (Serhiy Storchaka)
Date: Wed, 10 Oct 2012 00:10:06 +0300
Subject: [Python-ideas] Make "is" checks on non-singleton literals errors
In-Reply-To: <k520j8$sgj$>
References: <>
Message-ID: <k523rh$ufi$>

On 09.10.12 23:15, Georg Brandl wrote:
> The point is that in 99.9...% of cases,
>    if x == True:
> is just
>    if x:

Of course. However in Lib/unittest/ I found a lot of "if x != 
False:" which is not equivalent to just "if x:". It is equivalent to "if 
x is None or x:" and so I left it as is.

From jeanpierreda at  Tue Oct  9 23:32:33 2012
From: jeanpierreda at (Devin Jeanpierre)
Date: Tue, 9 Oct 2012 17:32:33 -0400
Subject: [Python-ideas] Make "is" checks on non-singleton literals errors
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 8, 2012 at 10:14 PM, Guido van Rossum <guido at> wrote:
> Maybe we should do something more drastic and always create a new,
> unique constant whenever a literal occurs as an argument of 'is' or
> 'is not'? Then such code would never work, leading people to examine
> their code more closely. I betcha we have people who could change the
> bytecode compiler easily enough to do that. (I'm not seriously
> proposing this, except as a threat of what we could do if the
> SyntaxWarning is rejected. :-)

Is this any better than making `x is 0` raise a TypeError with a
message about what's wrong (as suggested by Mike Graham)?

In both cases, `x is 0` is basically worthless, but at least if it
raises an exception people can understand what "went wrong", because
of the error message that comes with the exception.

-- Devin

From tjreedy at  Tue Oct  9 23:37:46 2012
From: tjreedy at (Terry Reedy)
Date: Tue, 09 Oct 2012 17:37:46 -0400
Subject: [Python-ideas] Propagating StopIteration value
In-Reply-To: <k51g78$ur1$>
References: <k4q38d$j8e$> <>
	<> <k4tefg$52k$>
	<> <k51g78$ur1$>
Message-ID: <k525fr$dod$>

On 10/9/2012 11:34 AM, Serhiy Storchaka wrote:
> On 09.10.12 10:51, Greg Ewing wrote:
>> Where we seem to disagree is on
>> whether returning a value with StopIteration is part of the
>> iterator protocol or the generator protocol.

There is a generator class but no 'generator protocol'. Adding the extra 
generator methods to another iterator class will not give its instances 
the suspend/resume behavior of generators. That requires the special 
bytecodes and flags resulting from the presence of 'yield' in the 
generator function whose call produces the generator.

> Does a generator expression work with the iterator protocol or the
> generator protocol?

A generator expression produces a generator, which implements the 
iterator protocol and has the extra generator methods and suspend/resume 
behavior.

Part of the iterator protocol is that .__next__ methods raise 
StopIteration to signal that no more objects will be yielded. A value 
can be attached to StopIteration, but it is irrelevant to its use as a 
'done' signal. Any iterator .__next__ method can raise or pass along 
StopIteration(something). Whether 'something' is even seen or not is a 
different question. The main users of iterators, for statements, ignore 
anything extra.

> A generator expression eats a value with StopIteration:
>  >>> def G():
> ...     return 42
> ...     yield
> ...
>  >>> next(x for x in G())
> Traceback (most recent call last):
>    File "<stdin>", line 1, in <module>
> StopIteration
> Is it a bug?

Of course not. A generator expression is an abbreviation for a def 
statement defining a generator function followed by a call to that 
generator function. (x for x in G()) is roughly equivalent to

def __():
   for x in G():
     yield x
   # when execution reaches here, None is returned, as usual

_ = __()
del __
_  # IE, _ is the value of the expression

A for loop stops when it catches (and swallows) a StopIteration 
instance. That instance has served its function as a signal. The for 
mechanism ignores any attributes thereof.
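A small sketch of that difference: the same generator consumed once by a for-style construct and once by calling next() by hand. The generator `counted` is an invented example.

```python
def counted():
    yield 1
    yield 2
    return "done"   # attached to the final StopIteration as .value


# A for loop (here inside a comprehension) sees only the yielded items.
via_for = [item for item in counted()]

# Driving the iterator by hand exposes the attached value.
it = counted()
via_next = []
while True:
    try:
        via_next.append(next(it))
    except StopIteration as stop:
        return_value = stop.value
        break
```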

The generator .__next__ method that wraps the generator code object (the 
compiled body of the generator function) raises StopIteration if the 
code object ends by returning None. So the StopIteration printed in the 
traceback above is a different StopIteration instance and comes from a 
different callable than the one from G that stopped the for loop in the 
generator. There is no sensible way to connect the two. Note that a 
generator can iterate through multiple iterators, like map and chain do.

If the generator stops by raising StopIteration instead of returning 
None, *that* StopIteration instance is passed along by the .__next__ 
wrapper. (This may be an implementation detail, but it is currently true.)

 >>> def g2():
	SI = StopIteration('g2')
	print(SI, id(SI))
	raise SI
	yield 1

 >>> try: next(g2())
except StopIteration as SI:
	print(SI, id(SI))
g2 52759816
g2 52759816

If you want any iterator to raise or propagate a value-laden 
StopIteration, you must do it explicitly or avoid swallowing one.

 >>> def G():  return 42;     yield

 >>> def g3(): # replacement for your generator expression
	it = iter(G())
	while True:
		yield next(it)

 >>> next(g3())
Traceback (most recent call last):
   File "<pyshell#29>", line 1, in <module>
   File "<pyshell#28>", line 4, in g3
     yield next(it)
StopIteration: 42  # now you see the value
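A variant of the same idea that should also run on Python 3.7 and later, where PEP 479 turns a StopIteration escaping a generator body into RuntimeError. This is only a sketch: the helper names `passthrough` and `delegating` are made up, and the value is forwarded explicitly with `return` or `yield from` rather than by letting the exception leak.

```python
def G():
    return 42
    yield  # unreachable, but makes G a generator function


def passthrough(iterable):
    """Yield every item, then forward the inner return value."""
    it = iter(iterable)
    while True:
        try:
            item = next(it)
        except StopIteration as stop:
            # Re-attach the value to this generator's own StopIteration.
            return stop.value
        yield item


def delegating():
    """Same effect, letting yield from do the bookkeeping."""
    result = yield from G()
    return result


def value_of(generator):
    """Run a generator to its first stop and report the attached value."""
    try:
        next(generator)
    except StopIteration as stop:
        return stop.value
    return None


explicit = value_of(passthrough(G()))
delegated = value_of(delegating())
from_genexp = value_of(x for x in G())  # the genexp drops the value
```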

Since filter takes a single iterable, it can be written like g3 and not 
catch the StopIteration of the corresponding iterator.

def filter(pred, iterable):
   it = iter(iterable)
   while True:
     item = next(it)
     if pred(item):
       yield item
   # never reaches here, never returns None

Map takes multiple iterables. In 2.x, map extended short iterables with 
None to match the longest. So it had to swallow StopIteration until it 
had collected one for each iterator. In 3.x, map stops at the first 
StopIteration, so it probably could be rewritten to not catch it. 
Whether it makes sense to do that is another question.

Terry Jan Reedy

From guido at  Tue Oct  9 23:38:19 2012
From: guido at (Guido van Rossum)
Date: Tue, 9 Oct 2012 14:38:19 -0700
Subject: [Python-ideas] Make "is" checks on non-singleton literals errors
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 9, 2012 at 2:32 PM, Devin Jeanpierre <jeanpierreda at> wrote:
> On Mon, Oct 8, 2012 at 10:14 PM, Guido van Rossum <guido at> wrote:
>> Maybe we should do something more drastic and always create a new,
>> unique constant whenever a literal occurs as an argument of 'is' or
>> 'is not'? Then such code would never work, leading people to examine
>> their code more closely. I betcha we have people who could change the
>> bytecode compiler easily enough to do that. (I'm not seriously
>> proposing this, except as a threat of what we could do if the
>> SyntaxWarning is rejected. :-)
> Is this any better than making `x is 0` raise a TypeError with a
> message about what's wrong (as suggested by Mike Graham)?
> In both cases, `x is 0` is basically worthless, but at least if it
> raises an exception people can understand what "went wrong", because
> of the error message that comes with the exception.

But it's not a runtime error. It should depend on whether a literal is
used in the source code, not whether the argument is an int. (There
are tons of situations where it makes sense to dynamically compare two
objects that may happen to be ints using 'is' -- just not when it's a

So I claim that it should be a message produced during compilation --
or by a lint-like tool, as others have argued.
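As a rough illustration of the lint-like approach, the check can be done on the syntax tree without running any code. The function names here are invented, the sketch assumes a Python version where literals parse as `ast.Constant` (3.8 or later), and it ignores cases such as negative numbers, which parse as a unary operation.

```python
import ast

SINGLETONS = (None, True, False, Ellipsis)


def is_suspicious_literal(node):
    """True for a literal constant that is not one of the singletons."""
    if not isinstance(node, ast.Constant):
        return False
    # Compare by identity: 0 == False and 1 == True would fool "in".
    return not any(node.value is singleton for singleton in SINGLETONS)


def find_literal_is_checks(source):
    """Return sorted line numbers of 'is' / 'is not' against a literal."""
    lines = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Compare):
            continue
        operands = [node.left] + node.comparators
        for index, op in enumerate(node.ops):
            if isinstance(op, (ast.Is, ast.IsNot)) and (
                is_suspicious_literal(operands[index])
                or is_suspicious_literal(operands[index + 1])
            ):
                lines.add(node.lineno)
    return sorted(lines)


SAMPLE = """\
if x is 0:
    pass
if y is None:
    pass
if z is not 'abc':
    pass
if a is b:
    pass
"""

flagged = find_literal_is_checks(SAMPLE)
```

Only the source text is inspected, so `a is b` with two names that happen to hold ints at runtime is left alone, which matches the distinction drawn above.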

--Guido van Rossum (

From arnodel at  Tue Oct  9 23:50:29 2012
From: arnodel at (Arnaud Delobelle)
Date: Tue, 9 Oct 2012 22:50:29 +0100
Subject: [Python-ideas] Make "is" checks on non-singleton literals errors
In-Reply-To: <k523rh$ufi$>
References: <>
	<k520j8$sgj$> <k523rh$ufi$>
Message-ID: <>

On 9 October 2012 22:10, Serhiy Storchaka <storchaka at> wrote:
> On 09.10.12 23:15, Georg Brandl wrote:
>> The point is that in 99.9...% of cases,
>>    if x == True:
>> is just
>>    if x:
> Of course. However in Lib/unittest/ I found a lot of "if x != False:"
> which is not equivalent to just "if x:". It is equivalent to "if x is None
> or x:" and so I left it as is.


>>> x = ''
>>> bool(x != False)
True
>>> bool(x is None or x)
False

(same with any empty sequence)
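A quick way to see where the two conditions disagree is to evaluate both over a handful of sample values. The sample list and helper names are arbitrary choices for illustration.

```python
def ne_false(x):
    return x != False


def none_or_truthy(x):
    return bool(x is None or x)


samples = [None, False, True, 0, 1, "", "a", [], [0]]

# Values for which the two conditions give different answers.
differs = [x for x in samples if ne_false(x) != none_or_truthy(x)]
```

For these samples only the empty string and the empty list end up in `differs`; note that `0 != False` is False because `0 == False`.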


From storchaka at  Wed Oct 10 00:00:07 2012
From: storchaka at (Serhiy Storchaka)
Date: Wed, 10 Oct 2012 01:00:07 +0300
Subject: [Python-ideas] Make "is" checks on non-singleton literals errors
In-Reply-To: <>
References: <>
	<k520j8$sgj$> <k523rh$ufi$>
Message-ID: <k526p5$ndb$>

On 10.10.12 00:50, Arnaud Delobelle wrote:
>> Of course. However in Lib/unittest/ I found a lot of "if x != False:"
>> which is not equivalent to just "if x:". It is equivalent to "if x is None
>> or x:" and so I left it as is.
> ??? context of Lib/unittest/

From storchaka at  Wed Oct 10 00:00:41 2012
From: storchaka at (Serhiy Storchaka)
Date: Wed, 10 Oct 2012 01:00:41 +0300
Subject: [Python-ideas] Make "is" checks on non-singleton literals errors
In-Reply-To: <k520lc$sgj$>
References: <>
	<k50sh0$4q8$> <k520lc$sgj$>
Message-ID: <k526q8$ndb$>

On 09.10.12 23:16, Georg Brandl wrote:
> It's not part of Python anyway, and should be reported to the docutils
> maintainers.


From at  Wed Oct 10 00:13:57 2012
From: at (Joshua Landau)
Date: Tue, 9 Oct 2012 23:13:57 +0100
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
	<> <20121009043236.GI27445@ando>
	<> <>
Message-ID: <>

Just a curiosity here (as I can guess of plausible reasons myself, so there
probably are some official stances).

Is there a reason NaNs are not instances of NaN class? Then x == x would be
True (as they want), but [this NaN] == [that NaN] would be False, as
expected.

I guess that raises the question about why x == x but sqrt(-1) != sqrt(-1),
but it seems a lot less of a big deal than all of the exceptions with
container equalities.



From oscar.j.benjamin at  Wed Oct 10 00:49:08 2012
From: oscar.j.benjamin at (Oscar Benjamin)
Date: Tue, 9 Oct 2012 23:49:08 +0100
Subject: [Python-ideas] Propagating StopIteration value
In-Reply-To: <k525fr$dod$>
References: <k4q38d$j8e$> <>
	<> <k4tefg$52k$>
	<> <k51g78$ur1$>
Message-ID: <>

On 9 October 2012 22:37, Terry Reedy <tjreedy at> wrote:
> On 10/9/2012 11:34 AM, Serhiy Storchaka wrote:
>> On 09.10.12 10:51, Greg Ewing wrote:
>>> Where we seem to disagree is on
>>> whether returning a value with StopIteration is part of the
>>> iterator protocol or the generator protocol.


> Part of the iterator protocol is that .__next__ methods raise StopIteration
> to signal that no more objects will be yielded. A value can be attached to
> StopIteration, but it is irrelevant to its use as a 'done' signal. Any
> iterator .__next__ method can raise or pass along
> StopIteration(something). Whether 'something' is even seen or not is a
> different question. The main users of iterators, for statements, ignore
> anything extra.

I know this isn't going anywhere right now but since it might one day
I thought I'd say that I considered how it could be different and the
best I came up with was:

def f():
  return 42

for x in f():
else return_value:
   # return_value = 42 if we get here

> If the generator stops by raising StopIteration instead of returning None,
> *that* StopIteration instance is passed along by the .__next__ wrapper.
> (This may be an implementation detail, but it is currently true.)

I'm wondering whether propagating or not propagating the StopIteration
should be a documented feature of some iterator-based functions or
should always be considered an implementation detail (i.e. undefined
language behaviour). Currently in Python 3.3 I guess that it is always
an implementation detail since the behaviour probably results from an
implementation that was written under the assumption that
StopIteration instances are interchangeable.

> Since filter takes a single iterable, it can be written like g3 and not
> catch the StopIteration of the corresponding iterator.
> def filter(pred, iterable):
>   it = iter(iterable)
>   while True:
>     item = next(it)
>     if pred(item):
>       yield item
>   # never reaches here, never returns None
> Map takes multiple iterables. In 2.x, map extended short iterables with None
> to match the longest. So it had to swallow StopIteration until it had
> collected one for each iterator. In 3.x, map stops at the first
> StopIteration, so it probably could be rewritten to not catch it. Whether it
> makes sense to do that is another question.

Thanks. That makes more sense now as I hadn't considered this
behaviour of map before.


From greg.ewing at  Wed Oct 10 01:14:33 2012
From: greg.ewing at (Greg Ewing)
Date: Wed, 10 Oct 2012 12:14:33 +1300
Subject: [Python-ideas] Propagating StopIteration value
In-Reply-To: <k51g78$ur1$>
References: <k4q38d$j8e$> <>
	<> <k4tefg$52k$>
	<> <k51g78$ur1$>
Message-ID: <>

Serhiy Storchaka wrote:
> Does a generator expression work with the iterator protocol or the 
> generator protocol?

Iterator protocol, I think. There is no way to explicitly
return a value from a generator expression, and I don't
think it should implicitly return one either.

Keep in mind that there can be more than one iterable
involved in a genexp, so it's not clear what the return
value should be in general.


From greg.ewing at  Wed Oct 10 02:44:23 2012
From: greg.ewing at (Greg Ewing)
Date: Wed, 10 Oct 2012 13:44:23 +1300
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:

> Indeed, in NDB this works great. However tracebacks don't work so
> great: If you don't catch the exception right away, it takes work to
> make the tracebacks look right when you catch it a few generator calls
> down on the (conceptual) stack. I fixed this to some extent in NDB, by
> passing the traceback explicitly along when setting an exception on a
> Future;

Was this before or after the recent change that was supposed
to improve tracebacks from yield-from chains? If there's still
a problem after that, maybe exception handling in yield-from
requires some more work.

> But so far when thinking about this
> recently I have found the goal elusive -- 

 > Perhaps you can clear things up by
> showing some detailed (but still simple enough) example code to handle
> e.g. a simple web client?

You might like to take a look at this, where I develop a series of
examples culminating in a simple multi-threaded server:

Code here:

 > somehow it seems there *has*
 > to be a distinction between an operation you just *yield* (this would
 > be waiting for a specific low-level I/O operation) and something you
 > use with yield-from, which returns a value through StopIteration.

It may be worth noting that nothing in my server example uses 'yield'
to send or receive values -- yield is only used without argument as
a suspension point. But the functions containing the yields *are*
called with yield-from and may return values via StopIteration.

So I think there are (at least) two distinct ways of using generators,
but the distinction isn't quite the one you're making. Rather, we
have "coroutines" (don't yield values, do return values) and
"iterators" (do yield values, don't return values).

Moreover, it's *only* the "coroutine" variety that we need to cater
for when designing an async event system. Does that help to
alleviate any of your monad-induced headaches?
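A toy sketch of the "coroutine" style described here: bare `yield` is used only as a suspension point, and values travel back through `yield from`. The round-robin scheduler `run_all` and the task functions are invented for illustration and are not the server example referred to above.

```python
from collections import deque


def read_number(source):
    yield                  # suspension point only; no value is yielded
    return source.pop(0)   # value travels back via StopIteration


def add_two(source, results):
    a = yield from read_number(source)
    b = yield from read_number(source)
    results.append(a + b)


def run_all(tasks):
    """Round-robin scheduler: resume each task until it finishes."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
        except StopIteration:
            continue       # task is done; drop it
        queue.append(task)


results = []
run_all([add_two([1, 2], results), add_two([10, 20], results)])
```

The scheduler never looks at yielded values; it only cares that a task suspended or finished.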


From steve at  Wed Oct 10 03:14:26 2012
From: steve at (Steven D'Aprano)
Date: Wed, 10 Oct 2012 12:14:26 +1100
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
	<20121009043236.GI27445@ando> <>
Message-ID: <>

On 10/10/12 09:13, Joshua Landau wrote:
> Just a curiosity here (as I can guess of plausible reasons myself, so there
> probably are some official stances).
> Is there a reason NaNs are not instances of NaN class?

Because that would complicate Python's use of floats for absolutely no benefit.
Instead of float operations always returning a float, they would have to return
a float or a NAN. To check for a valid floating point instance, instead of
saying:

isinstance(x, float)

you would have to say:

isinstance(x, (float, NAN))

And what about infinities, denorm numbers, and negative zero? Do they get
dedicated classes too?

And what is the point of this added complexity? Nothing.

You *still* have the rule that "x == x for all x, except for NANs". The
only difference is that "NANs" now means "instances of NAN class" rather than
"NAN floats" (and Decimals). Working with IEEE 754 floats is now far more of
a nuisance because some valid floating point values aren't floats but have a
different class, but nothing meaningful is different.

> Then x == x would be True (as they want), but [this NaN] == [that NaN]
> would be False, as expected.

Making NANs their own class wouldn't give you that. If we wanted that
behaviour, we could have it without introducing a NAN class: just change the
list __eq__ method to scan the list for a NAN using math.isnan before checking
whether the lists were identical.

But that would defeat the purpose of the identity check (an optimization to
avoid scanning the list)! Replacing math.isnan with isinstance doesn't change

> I guess that raises the question about why x == x but sqrt(-1) != sqrt(-1),

That question has already been raised, and answered, repeatedly in this thread.

> but it seems a lot less of a big deal than all of the exceptions with
> container equalities.

Container equalities are not a big deal. I'm not sure what problem you think
you are solving.


From mikegraham at  Wed Oct 10 03:25:55 2012
From: mikegraham at (Mike Graham)
Date: Tue, 9 Oct 2012 21:25:55 -0400
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
	<> <20121009043236.GI27445@ando>
	<> <>
Message-ID: <>

On Tue, Oct 9, 2012 at 9:14 PM, Steven D'Aprano <steve at> wrote:
> On 10/10/12 09:13, Joshua Landau wrote:
>> Just a curiosity here (as I can guess of plausible reasons myself, so
>> there
>> probably are some official stances).
>> Is there a reason NaNs are not instances of NaN class?
> Because that would complicate Python's using floats for absolutely no
> benefit.
> Instead of float operations always returning a float, they would have to
> return
> a float or a NAN. To check for a valid floating point instance, instead of
> saying:
> isinstance(x, float)
> you would have to say:
> isinstance(x, (float, NAN))
> And what about infinities, denorm numbers, and negative zero? Do they get
> dedicated classes too?
> And what is the point of this added complexity? Nothing.
> You *still* have the rule that "x == x for all x, except for NANs". The
> only difference is that "NANs" now means "instances of NAN class" rather
> than
> "NAN floats" (and Decimals). Working with IEEE 754 floats is now far more of
> a nuisance because some valid floating point values aren't floats but have a
> different class, but nothing meaningful is different.
>> Then x == x would be True (as they want), but [this NaN] == [that NaN]
>> would be False, as expected.
> Making NANs their own class wouldn't give you that. If we wanted that
> behaviour, we could have it without introducing a NAN class: just change the
> list __eq__ method to scan the list for a NAN using math.isnan before
> checking
> whether the lists were identical.
> But that would defeat the purpose of the identity check (an optimization to
> avoid scanning the list)! Replacing math.isnan with isinstance doesn't
> change
> that.
>> I guess that raises the question about why x == x but sqrt(-1) !=
>> sqrt(-1),
> That question has already been raised, and answered, repeatedly in this
> thread.
>> but it seems a lot less of a big deal than all of the exceptions with
>> container equalities.
> Container equalities are not a big deal. I'm not sure what problem you think
> you are solving.
> --
> Steven

I'm sometimes surprised at the creativity and passion behind solutions
to this issue.

I've been a Python user for some years now, including time dealing
with stuff like numpy where you're fairly likely to run into NaNs.
I've been an active member of several support communities where I can
confidently say I have encountered tens of thousands of Python
questions. Not once can I recall having, or seeing anyone else have, an
actual problem due to the way Python handles NaN. As far as I can tell,
it works _perfectly_.

I appreciate the aesthetic concerns, but I really wish someone would
explain to me what's actually broken and in need of fixing.


From wuwei23 at  Wed Oct 10 04:23:23 2012
From: wuwei23 at (alex23)
Date: Tue, 9 Oct 2012 19:23:23 -0700 (PDT)
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Oct 9, 5:14 pm, Guido van Rossum <gu... at> wrote:
> I spent a week with Bertrand recently.

Any chance you might blog about this? :)

From dholth at  Wed Oct 10 04:34:59 2012
From: dholth at (Daniel Holth)
Date: Tue, 9 Oct 2012 22:34:59 -0400
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <k51u5c$85f$>
References: <>
Message-ID: <>

On Tue, Oct 9, 2012 at 3:32 PM, Terry Reedy <tjreedy at> wrote:
> On 10/9/2012 9:30 AM, Eric Snow wrote:
>> On Oct 9, 2012 1:12 AM, "Senthil Kumaran"
>> <senthil at
>> <mailto:senthil at>> wrote:
>>  > > `p.pathjoin(q)`
>>  >
>>  > +1
>>  >
>>  > It is very explicit and hard to get it wrong.
> or path.concat(otherpath)
> --
> Terry Jan Reedy

I like the [] syntax. ZODB works this way when the subpath name is not
a valid Python identifier. a.b['c-d'] would be like a/b/c-d if ZODB
was a filesystem. I like the + syntax.

No one has suggested overloading the > operator?

p1 > p2 > p3

The < operator would keep its normal use for sorting. ;-)

From ncoghlan at  Wed Oct 10 06:02:57 2012
From: ncoghlan at (Nick Coghlan)
Date: Wed, 10 Oct 2012 09:32:57 +0530
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 9, 2012 at 6:04 PM, Antoine Pitrou <solipsis at> wrote:
> That's a very good idea! Even better if there's a way to make it work as
> expected with openat support (for example by allowing __fspath__ to return
> a (dir_fd, filename) tuple).

The other thing I thought might be useful is to try to tie it into the
new "opener" parameter for open somehow. On the other hand, that's
getting further into full-blown filesystem emulation territory.
That may not be a bad thing, though
- a proper filesystem abstraction might finally let us deal with
encoding and case-sensitivity issues in a sane way, since they're
filesystem dependent rather than platform dependent (e.g. opening
FAT/FAT32 devices on *nix systems).
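
For readers unfamiliar with the "opener" parameter, here is a minimal sketch
of how it can already be combined with a directory file descriptor, which is
roughly the openat-style behaviour mentioned above. The helper names
open_relative and demo are hypothetical, and the (dir_fd, filename) tuple
protocol for __fspath__ is only a suggestion in this thread, so it is not
used here.

```python
import os
import tempfile

def open_relative(dir_fd, name, mode="r"):
    """Open `name` relative to the directory descriptor `dir_fd` (hypothetical helper)."""
    def opener(path, flags):
        # open() calls the opener with the path and the computed flags
        return os.open(path, flags, dir_fd=dir_fd)
    return open(name, mode, opener=opener)

def demo():
    # dir_fd is not supported on every platform, so check first
    if os.open not in os.supports_dir_fd:
        return None
    with tempfile.TemporaryDirectory() as directory:
        dir_fd = os.open(directory, os.O_RDONLY)
        try:
            with open_relative(dir_fd, "spam.txt", "w") as f:
                f.write("eggs")
            # Read the file back through its ordinary path to confirm
            with open(os.path.join(directory, "spam.txt")) as f:
                return f.read()
        finally:
            os.close(dir_fd)

print(demo())
```

On platforms without dir_fd support the demo simply returns None.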


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From stephen at  Wed Oct 10 08:06:10 2012
From: stephen at (Stephen J. Turnbull)
Date: Wed, 10 Oct 2012 15:06:10 +0900
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
	<> <20121009043236.GI27445@ando>
Message-ID: <>

Ethan Furman writes:

 > Or, to look at it another way, surely somewhere out in the Real World 
 > (tm) it is the case that two NaNs are indeed equal.

Sure, but according to Kahan's Uncertainty principle, you'll never be
able to detect it.

Really-there's no-alternative-to-backward-compatibility-or-IEEE754-ly y'rs

From greg.ewing at  Wed Oct 10 09:53:04 2012
From: greg.ewing at (Greg Ewing)
Date: Wed, 10 Oct 2012 20:53:04 +1300
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:
> But there are
> still quite a few situations in NDB where an uncaught
> exception prints a baffling traceback, showing lots of frames from the
> event loop and other async machinery but not the user code that was
> actually waiting for anything.

I just tried an experiment using Python 3.3. I modified the
parse_request() function of my spamserver example to raise
an exception that isn't caught anywhere:

def parse_request(line):
   tokens = line.split()
   if tokens and tokens[0] == b"EGGS":
     raise ValueError("Server is allergic to eggs")

The resulting traceback looks like this. The last two lines
show very clearly whereabouts the exception occurred in
user code. So it all seems to work quite happily.

Traceback (most recent call last):
   File "", line 73, in <module>
   File "/Local/Projects/D/Python/YieldFrom/3.3/Examples/Scheduler/", line 109, in run2
   File "/Local/Projects/D/Python/YieldFrom/3.3/Examples/Scheduler/", line 53, in run
   File "", line 50, in handler
     n = parse_request(line)
   File "", line 61, in parse_request
     raise ValueError("Server is allergic to eggs")
ValueError: Server is allergic to eggs


From ronaldoussoren at  Wed Oct 10 10:09:33 2012
From: ronaldoussoren at (Ronald Oussoren)
Date: Wed, 10 Oct 2012 10:09:33 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On 8 Oct, 2012, at 16:28, Oleg Broytman <phd at> wrote:

> On Mon, Oct 08, 2012 at 03:59:18PM +0200, Ronald Oussoren <ronaldoussoren at> wrote:
>> On 8 Oct, 2012, at 13:07, Oleg Broytman <phd at> wrote:
>>> On Mon, Oct 08, 2012 at 12:00:22PM +0200, Ronald Oussoren <ronaldoussoren at> wrote:
>>>> Or CIFS filesystems mounted on a Linux?   Case-sensitivity is a file-system property, not a operating system one.
>>>  But there is no API to ask what type of filesystem a path belongs to.
>>> So guessing by OS name is the only heuristic we can do.
>> I guess so, as neither statvs, statvfs,  nor pathconf seem to be able to tell if a filesystem is case insensitive.
>> The alternative would be to have a list of case insentive filesystems and use that that when comparing impure path objects. That would be fairly expensive though, as you'd have to check for every element of the path if that element is on a case insensitive filesystem.
>   If a filesystem mounted to w32 is exported from a server by CIFS/SMB
> protocol -- is it case sensitive? What if said server is Linux? What if
> said filesystem was actually imported to Linux from a Novel server by
> NetWare Core Protocol. It's not a fictional situation -- I do it at
>; the server is Linux that mounts two CIFS and NCP filesystem
> and reexport them via Samba.

Even more fun :-).  CIFS/SMB from Windows to Linux or OSX behaves like a case-preserving filesystem on the systems I tested. Likewise, an NFS filesystem exported from Linux to OSX behaves like a case-sensitive filesystem if the Linux filesystem is case sensitive.

All in all, the best we seem to be able to do is use the OS as a heuristic: most Unix filesystems are case sensitive, while Windows and OSX filesystems are case preserving.

> Oleg.
> -- 
>     Oleg Broytman              phd at
>           Programmers don't die, they just GOSUB without RETURN.

From ronaldoussoren at  Wed Oct 10 10:16:27 2012
From: ronaldoussoren at (Ronald Oussoren)
Date: Wed, 10 Oct 2012 10:16:27 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On 9 Oct, 2012, at 0:23, Greg Ewing <greg.ewing at> wrote:

> Ronald Oussoren wrote:
>> neither statvs, statvfs,  nor pathconf seem to be able to tell if a filesystem is case insensitive.
> Even if they could, you wouldn't be entirely out of the woods,
> because different parts of the same path can be on different
> file systems...
> But how important is all this anyway? I'm trying to think of
> occasions when I've wanted to compare two entire paths for
> equality, and I can't think of *any*.

AFAIK the only place I care about case sensitivity in my code is when I'm basically using glob or fnmatch.


> -- 
> Greg

From p.f.moore at  Wed Oct 10 11:54:58 2012
From: p.f.moore at (Paul Moore)
Date: Wed, 10 Oct 2012 10:54:58 +0100
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On 10 October 2012 09:16, Ronald Oussoren <ronaldoussoren at> wrote:
>> But how important is all this anyway? I'm trying to think of
>> occasions when I've wanted to compare two entire paths for
>> equality, and I can't think of *any*.
> AFAIK the only place I care about case sensitivity in my code is when I'm basically using glob or fnmatch.

Mercurial had to consider this issue when dealing with repositories
built on Unix and being used on Windows. Specifically, it needed to
know, if the repository contains files README and ReadMe, could it
safely write both of these files without one overwriting the other.

Actually, something as simple as an unzip utility could hit the same
issue (it's just that it's not as critical to be careful with unzip as
with a DVCS system... :-))

I don't know how Mercurial fixed the problem in the end - I believe
the in-repo format encodes filenames to preserve case even on case
insensitive systems, and I *think* it detects case insensitive
filesystems for writing by writing a test file and reading it back in
a different case. But that may have changed.
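
The probing approach described above (write a test file, then look it up
under a different case) can be sketched as follows. This is only an
illustration of the idea, not Mercurial's actual code; the function name
is_case_insensitive and the "CaseProbe" prefix are made up for the example.

```python
import os
import tempfile

def is_case_insensitive(directory):
    """Probe `directory` by creating a file and looking it up under a different case."""
    # The prefix contains letters, so swapcase() always yields a different name.
    with tempfile.NamedTemporaryFile(prefix="CaseProbe", dir=directory) as probe:
        head, tail = os.path.split(probe.name)
        # Only the final component is case-swapped; the directory part is untouched.
        return os.path.exists(os.path.join(head, tail.swapcase()))

print(is_case_insensitive(tempfile.gettempdir()))
```

The answer applies only to the filesystem that holds the probed directory,
which matches the earlier point that different parts of one path can live on
different filesystems.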


From robert.kern at  Wed Oct 10 15:23:38 2012
From: robert.kern at (Robert Kern)
Date: Wed, 10 Oct 2012 14:23:38 +0100
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
	<20121009043236.GI27445@ando> <>
Message-ID: <k53ssp$3ri$>

On 10/10/12 2:25 AM, Mike Graham wrote:

> I'm sometimes surprised at the creativity and passion behind solutions
> to this issue.
> I've been a Python user for some years now, including time dealing
> with stuff like numpy where you're fairly likely to run into NaNs.
> I've been an active member of several support communities where I can
> confidently say I have encountered tens of thousands of Python
> questions. Not once can I recall ever having or seeing anyone have an
> actual problem that I had or someone else had due to the way Python
> handles NaN. As far as I can tell, it works _perfectly_.
> I appreciate the aesthetic concerns, but I really wish someone would
> explain to me what's actually broken and in need of fixing.

While I also don't think that anything needs to be fixed, I must say that in my 
years of monitoring tens of thousands of Python questions, there have been a few 
legitimate problems with the NaN behavior. It does come up from time to time.

The most frequent problem is checking if a list contains a NaN. The obvious 
thing to do for many users:

   nan in list_of_floats

This is a reasonable prediction based on what one normally does for most objects 
in Python, but this is quite wrong. But because list.__contains__() checks for 
identity first, it can look like it works when people test it out:

   >>> nan = float('nan')
   >>> nan in [1.0, 2.0, nan]

Then they write their code doing the wrong thing thinking that they tested their 

I classify this as a wart: it breaks reasonable predictions from users, requires 
more exceptions-based knowledge about NaNs to use correctly, and can trap users 
who do try to experiment to determine the behavior. But I think that the cost of 
acquiring and retaining such knowledge is not so onerous as to justify the cost 
of any of the attempts to fix the wart.
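
A small sketch of the trap described above, together with the usual explicit
test. The helper contains_nan is a hypothetical name used only here.

```python
import math

nan = float("nan")
values = [1.0, 2.0, nan]

# The very same object is found, because containment checks identity first.
same_object = nan in values
# A different NaN object is not found, because nan == nan is False.
other_object = float("nan") in values

def contains_nan(floats):
    """Explicit test that does not depend on object identity."""
    return any(math.isnan(x) for x in floats)

print(same_object, other_object, contains_nan(values))
```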

The other NaN wart (unrelated to this thread) is that sorting a list of floats 
containing a NaN will usually leave the list unsorted because "inequality 
comparisons with a NaN always return False" breaks the assumptions of timsort 
and other sorting algorithms. You should remember this, as you once demonstrated 
the problem:

This is a real problem, so much so that numpy works around it by enforcing our 
sorts to always sort NaN at the end of the array. Unfortunately, lists do not 
have the luxury of cheaply knowing the type of all of the objects in the list, 
so this is not an option for them.
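
When a list is known to hold only floats, a caller can get a similar
"NaNs last" ordering by hand. The following is a sketch under that
assumption, using a hypothetical helper sort_nans_last; it is not how numpy
implements its sorts.

```python
import math

def sort_nans_last(floats):
    """Sort floats ascending, placing every NaN after the ordinary values."""
    # The first key element orders non-NaN (False) before NaN (True);
    # the second orders the ordinary values among themselves.
    return sorted(floats, key=lambda x: (math.isnan(x), x))

data = [3.0, float("nan"), 1.0, 2.0, float("nan")]
result = sort_nans_last(data)
print(result)
```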

Real problems, but nothing that motivates a change, in my opinion.

Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco

From g.brandl at  Wed Oct 10 16:21:52 2012
From: g.brandl at (Georg Brandl)
Date: Wed, 10 Oct 2012 16:21:52 +0200
Subject: [Python-ideas] Make "is" checks on non-singleton literals errors
In-Reply-To: <k523rh$ufi$>
References: <>
	<k520j8$sgj$> <k523rh$ufi$>
Message-ID: <k5407p$2sq$>

Am 09.10.2012 23:10, schrieb Serhiy Storchaka:
> On 09.10.12 23:15, Georg Brandl wrote:
>> The point is that in 99.9...% of cases,
>>    if x == True:
>> is just
>>    if x:
> Of course. However, in Lib/unittest/ I found a lot of "if x != 
> False:" which is not equivalent to just "if x:". It is equivalent to "if 
> x is None or x:" and so I left it as is.

Arguably, that should be "if x is not False", but it probably doesn't
matter too much.


From ben at  Wed Oct 10 18:41:33 2012
From: ben at (Ben Darnell)
Date: Wed, 10 Oct 2012 09:41:33 -0700
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 9, 2012 at 5:44 PM, Greg Ewing <greg.ewing at> wrote:
> You might like to take a look at this, where I develop a series of
> examples culminating in a simple multi-threaded server:

Thanks for this link, it was very helpful to see it all come together
from scratch.  And I think the most compelling thing about it is
something that I hadn't picked up on when I looked at "yield from"
before, that it naturally preserves the call stack for exception
handling.  That's a big deal, and may be worth the requirement of 3.3+
since the tricks we've used to get better exception handling in
earlier pythons have been pretty ugly.  On the other hand, it does
mean starting from scratch with a new asynchronous world that's not
directly compatible with the existing Twisted or Tornado ecosystems.
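
To make the call-stack point concrete, here is a minimal sketch (not Greg's
server code; the function bodies are stand-ins) that collects the function
names recorded in the traceback when an exception escapes from a generator
delegated to with "yield from".

```python
import traceback

def parse_request(line):
    yield  # bare yield used purely as a suspension point
    raise ValueError("Server is allergic to eggs")

def handler():
    # Delegation keeps handler's frame on the stack while parse_request runs
    yield from parse_request(b"EGGS")

def run():
    try:
        for _ in handler():
            pass
    except ValueError as exc:
        # Names of the functions recorded in the traceback, outermost first
        return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]

print(run())
```

Both the delegating generator and the one that raised appear in the list,
which is what makes the traceback readable without extra tricks.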


> Code here:
>> somehow it seems there *has*
>> to be a distinction between an operation you just *yield* (this would
>> be waiting for a specific low-level I/O operation) and something you
>> use with yield-from, which returns a value through StopIteration.
> It may be worth noting that nothing in my server example uses 'yield'
> to send or receive values -- yield is only used without argument as
> a suspension point. But the functions containing the yields *are*
> called with yield-from and may return values via StopIteration.
> So I think there are (at least) two distinct ways of using generators,
> but the distinction isn't quite the one you're making. Rather, we
> have "coroutines" (don't yield values, do return values) and
> "iterators" (do yield values, don't return values).
> Moreover, it's *only* the "coroutine" variety that we need to cater
> for when designing an async event system. Does that help to
> alleviate any of your monad-induced headaches?
> --
> Greg

From dreamingforward at  Wed Oct 10 18:56:17 2012
From: dreamingforward at (Mark Adam)
Date: Wed, 10 Oct 2012 11:56:17 -0500
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 9, 2012 at 1:53 AM, Ben Darnell <ben at> wrote:
> On Mon, Oct 8, 2012 at 10:56 PM, Greg Ewing <greg.ewing at> wrote:
>> Mark Adam wrote:
>>> 1) event handlers for the machine-program interface (ex. network I/O)
>>> 2) event handlers for the program-user interface (ex. mouse I/O)
>>> While similar, my gut tell me they have to be handled in completely
>>> different way in order to preserve order (i.e. sanity).
>> They can't be *completely* different, because deep down there
>> has to be a single event loop that can handle all kinds of
>> asynchronous events.
> There doesn't *have* to be - you could run a network event loop in one
> thread and a GUI event loop in another and pass control back and forth
> via methods like IOLoop.add_callback or Reactor.callFromThread.

No, this won't work.  The key FAIL in that sentence is "...and pass
control", because the O.S. has to be in charge of things that happen
in user space.  And everything in Python happens in user space.
(hence my suggestion of creating a Python O.S.).


From storchaka at  Wed Oct 10 19:13:22 2012
From: storchaka at (Serhiy Storchaka)
Date: Wed, 10 Oct 2012 20:13:22 +0300
Subject: [Python-ideas] Make "is" checks on non-singleton literals errors
In-Reply-To: <k5407p$2sq$>
References: <>
	<k520j8$sgj$> <k523rh$ufi$>
Message-ID: <k54abh$6l2$>

On 10.10.12 17:21, Georg Brandl wrote:
> Arguably, that should be "if x is not False", but it probably doesn't
> matter too much.

Some old code can use 1/0 instead of True/False. This change will break 
such code.
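
A short sketch of the difference for code that passes 0 where False is
meant. The function names old_check and proposed_check are made up for this
illustration.

```python
def old_check(x):
    # Equality test: 0 == False, so 0 is treated like False
    return x != False  # noqa: E712

def proposed_check(x):
    # Identity test: only the False singleton itself fails
    return x is not False

for value in (True, False, None, 1, 0):
    print(repr(value), old_check(value), proposed_check(value))
```

The two checks agree for True, False, None and 1, and differ only for 0.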

From ben at  Wed Oct 10 19:29:35 2012
From: ben at (Ben Darnell)
Date: Wed, 10 Oct 2012 10:29:35 -0700
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Oct 10, 2012 at 9:56 AM, Mark Adam <dreamingforward at> wrote:
> On Tue, Oct 9, 2012 at 1:53 AM, Ben Darnell <ben at> wrote:
>> On Mon, Oct 8, 2012 at 10:56 PM, Greg Ewing <greg.ewing at> wrote:
>>> Mark Adam wrote:
>>>> 1) event handlers for the machine-program interface (ex. network I/O)
>>>> 2) event handlers for the program-user interface (ex. mouse I/O)
>>>> While similar, my gut tell me they have to be handled in completely
>>>> different way in order to preserve order (i.e. sanity).
>>> They can't be *completely* different, because deep down there
>>> has to be a single event loop that can handle all kinds of
>>> asynchronous events.
>> There doesn't *have* to be - you could run a network event loop in one
>> thread and a GUI event loop in another and pass control back and forth
>> via methods like IOLoop.add_callback or Reactor.callFromThread.
> No, this won't work.  The key FAIL in that sentence is "...and pass
> control", because the O.S. has to be in charge of things that happen
> in user space.  And everything in Python happens in user space.
> (hence my suggestion of creating a Python O.S.).

Letting the OS/GUI library have control of the UI thread is exactly
the point I was making.  Perhaps "pass control" was a little vague,
but what I meant is that you'd have two threads, one for UI and one
for networking.  When you need to start a network operation from the
UI thread you'd use IOLoop.add_callback() to pass a function to the
network thread, and then when the network operation completes you'd
use the analogous function from the UI library to send the response
back and update the interface from the UI thread.
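
The hand-off described above can be sketched with the standard library
alone. This is an illustration under stated assumptions: the two queues
stand in for IOLoop.add_callback and for the UI library's equivalent, and
fetch, network_loop and demo are hypothetical names. No real network or GUI
code is involved.

```python
import queue
import threading

def demo():
    network_calls = queue.Queue()  # stand-in for the network loop's add_callback
    ui_calls = queue.Queue()       # stand-in for the UI library's equivalent
    shown = []                     # what the "UI" ends up displaying

    def network_loop():
        # Runs in the network thread: execute callbacks until told to stop
        while True:
            func = network_calls.get()
            if func is None:
                break
            func()

    def fetch(request):
        response = "response to " + request  # pretend network I/O
        # Send the result back to be handled on the UI thread
        ui_calls.put(lambda: shown.append(response))

    worker = threading.Thread(target=network_loop)
    worker.start()

    # "UI thread": start a network operation, then process the reply
    network_calls.put(lambda: fetch("GET /spam"))
    ui_callback = ui_calls.get()  # a real UI loop would not block like this
    ui_callback()

    network_calls.put(None)
    worker.join()
    return shown

print(demo())
```

Each thread only ever touches its own state directly; everything that
crosses threads goes through a queue.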


From solipsis at  Wed Oct 10 21:07:03 2012
From: solipsis at (Antoine Pitrou)
Date: Wed, 10 Oct 2012 21:07:03 +0200
Subject: [Python-ideas] PEP 428: poll about the joining syntax
References: <>
Message-ID: <>

On Tue, 9 Oct 2012 00:45:41 +0530
Nick Coghlan <ncoghlan at> wrote:
> On Tue, Oct 9, 2012 at 12:24 AM, Guido van Rossum <guido at> wrote:
> > I don't like any of those; I'd vote for another regular method, maybe
> > p.pathjoin(q).
> I don't *love* joinpath as a name, I just don't actively dislike it
> the way I do the four presented options (and it has the virtue of the
> precedent).

How about ?



Software development and contracting:

From michelelacchia at  Wed Oct 10 21:19:40 2012
From: michelelacchia at (Michele Lacchia)
Date: Wed, 10 Oct 2012 12:19:40 -0700 (PDT)
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

> > 
> > I don't *love* joinpath as a name, I just don't actively dislike it 
> > the way I do the four presented options (and it has the virtue of the 
> > precedent). 
> How about ? 

to() is just awesome. Short, rather easy to guess what it does, and easy
to remember once you start using it.
So now +1 on to() and &.

From ethan at  Wed Oct 10 21:13:19 2012
From: ethan at (Ethan Furman)
Date: Wed, 10 Oct 2012 12:13:19 -0700
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>	<>	<>
Message-ID: <>

Antoine Pitrou wrote:
> On Tue, 9 Oct 2012 00:45:41 +0530
> Nick Coghlan <ncoghlan at> wrote:
>> On Tue, Oct 9, 2012 at 12:24 AM, Guido van Rossum <guido at> wrote:
>>> I don't like any of those; I'd vote for another regular method, maybe
>>> p.pathjoin(q).
> [...]
>> I don't *love* joinpath as a name, I just don't actively dislike it
>> the way I do the four presented options (and it has the virtue of the
>> precedent).
> How about ?

.to  -> +0
.add -> +1

From mikegraham at  Wed Oct 10 21:36:08 2012
From: mikegraham at (Mike Graham)
Date: Wed, 10 Oct 2012 15:36:08 -0400
Subject: [Python-ideas] Make undefined escape sequences have SyntaxWarnings
Message-ID: <>

The literal "\c" should be an error but in practice means "\\c". It's
probably too late to make this invalid syntax as it ought to be, but I
wonder if a warning isn't in order, especially with the theoretical
potential of adding new string escapes in the future.
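
A minimal sketch of the behaviour being described. The literal is compiled
at run time with warnings suppressed, because some interpreter versions warn
about unknown escapes, and a future version could reject them outright. The
helper name unknown_escape_value is made up for the example.

```python
import warnings

def unknown_escape_value():
    # Source text of the literal: quote, backslash, c, quote
    source = r'"\c"'
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return eval(source)

value = unknown_escape_value()
print(len(value), value == "\\c")
```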

From solipsis at  Wed Oct 10 21:46:07 2012
From: solipsis at (Antoine Pitrou)
Date: Wed, 10 Oct 2012 21:46:07 +0200
Subject: [Python-ideas] Make undefined escape sequences have
References: <>
Message-ID: <>

On Wed, 10 Oct 2012 15:36:08 -0400
Mike Graham <mikegraham at> wrote:
> The literal"\c" should be an error but in practice means "\\c". It's
> probably too late to make this invalid syntax as it out to be, but I
> wonder if a warning isn't in order, especially with the theoretical
> potential of adding new string escapes in the future.

-1. This will make life more difficult with regular expressions (and
produce lots of spurious warnings in existing code).




From storchaka at  Wed Oct 10 22:04:25 2012
From: storchaka at (Serhiy Storchaka)
Date: Wed, 10 Oct 2012 23:04:25 +0300
Subject: [Python-ideas] Make undefined escape sequences have
In-Reply-To: <>
References: <>
Message-ID: <k54kck$8q8$>

On 10.10.12 22:46, Antoine Pitrou wrote:
> -1. This will make life more difficult with regular expressions (and
> produce lots of spurious warnings in existing code).

Strings for regular expressions should always be raw. Regular expressions
now support \u and \U escapes, so there is no reason to use non-raw strings.

From mikegraham at  Wed Oct 10 22:08:22 2012
From: mikegraham at (Mike Graham)
Date: Wed, 10 Oct 2012 16:08:22 -0400
Subject: [Python-ideas] Make undefined escape sequences have
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Oct 10, 2012 at 3:46 PM, Antoine Pitrou <solipsis at> wrote:
> On Wed, 10 Oct 2012 15:36:08 -0400
> Mike Graham <mikegraham at> wrote:
>> The literal"\c" should be an error but in practice means "\\c". It's
>> probably too late to make this invalid syntax as it out to be, but I
>> wonder if a warning isn't in order, especially with the theoretical
>> potential of adding new string escapes in the future.
> -1. This will make life more difficult with regular expressions (and
> produce lots of spurious warnings in existing code).
> Regards
> Antoine.

Regular expressions are difficult if you're remembering which escape
sequences exist and are easy if you're using raw string literals.


From python at  Wed Oct 10 22:11:49 2012
From: python at (MRAB)
Date: Wed, 10 Oct 2012 21:11:49 +0100
Subject: [Python-ideas] Make undefined escape sequences have
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-10 20:46, Antoine Pitrou wrote:
> On Wed, 10 Oct 2012 15:36:08 -0400
> Mike Graham <mikegraham at> wrote:
>> The literal"\c" should be an error but in practice means "\\c". It's
>> probably too late to make this invalid syntax as it out to be, but I
>> wonder if a warning isn't in order, especially with the theoretical
>> potential of adding new string escapes in the future.
> -1. This will make life more difficult with regular expressions (and
> produce lots of spurious warnings in existing code).
How would it make life more difficult with regular expressions?

I would've preferred:

1. Unknown escapes in string literals give a compile-time error

2. Raw string literals treat backslashes as pure literals

3. Unknown escapes in regex patterns give a run-time error

Unfortunately, changing them would break existing code. (I retain the
behaviour of re in the regex module for this reason, not that I like
it. :-()

It would've been nice if the 'fix' had been made in Python 3...

From solipsis at  Wed Oct 10 22:16:03 2012
From: solipsis at (Antoine Pitrou)
Date: Wed, 10 Oct 2012 22:16:03 +0200
Subject: [Python-ideas] Make undefined escape sequences have
References: <>
Message-ID: <>

On Wed, 10 Oct 2012 16:08:22 -0400
Mike Graham <mikegraham at> wrote:
> On Wed, Oct 10, 2012 at 3:46 PM, Antoine Pitrou <solipsis at> wrote:
> > On Wed, 10 Oct 2012 15:36:08 -0400
> > Mike Graham <mikegraham at> wrote:
> >> The literal"\c" should be an error but in practice means "\\c". It's
> >> probably too late to make this invalid syntax as it out to be, but I
> >> wonder if a warning isn't in order, especially with the theoretical
> >> potential of adding new string escapes in the future.
> >
> > -1. This will make life more difficult with regular expressions (and
> > produce lots of spurious warnings in existing code).
> >
> > Regards
> >
> > Antoine.
> Regular expressions are difficult if you're remembering which escape
> sequences exist and are easy if you're using raw string literals.

That's a misconception, since as the re docs mention:

"Most of the standard escapes supported by Python string literals are
also accepted by the regular expression parser: [snip]"

In other words, whether you put "\t" or "\\t" in a regexp doesn't
matter: it means the same to the regexp engine.




From solipsis at  Wed Oct 10 22:18:39 2012
From: solipsis at (Antoine Pitrou)
Date: Wed, 10 Oct 2012 22:18:39 +0200
Subject: [Python-ideas] Make undefined escape sequences have
References: <>
	<> <k54kck$8q8$>
Message-ID: <>

On Wed, 10 Oct 2012 23:04:25 +0300
Serhiy Storchaka <storchaka at> wrote:
> On 10.10.12 22:46, Antoine Pitrou wrote:
> > -1. This will make life more difficult with regular expressions (and
> > produce lots of spurious warnings in existing code).
> Strings for regular expressions always should be raw. Now regular 
> expressions supports \u and \U escapes and no reason to use non-raw strings.

That's a style issue, not a language rule.




From mikegraham at  Wed Oct 10 22:45:06 2012
From: mikegraham at (Mike Graham)
Date: Wed, 10 Oct 2012 16:45:06 -0400
Subject: [Python-ideas] Make undefined escape sequences have
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Oct 10, 2012 at 4:16 PM, Antoine Pitrou <solipsis at> wrote:
> On Wed, 10 Oct 2012 16:08:22 -0400
> Mike Graham <mikegraham at> wrote:
>> On Wed, Oct 10, 2012 at 3:46 PM, Antoine Pitrou <solipsis at> wrote:
>> > On Wed, 10 Oct 2012 15:36:08 -0400
>> > Mike Graham <mikegraham at> wrote:
>> >> The literal"\c" should be an error but in practice means "\\c". It's
>> >> probably too late to make this invalid syntax as it out to be, but I
>> >> wonder if a warning isn't in order, especially with the theoretical
>> >> potential of adding new string escapes in the future.
>> >
>> > -1. This will make life more difficult with regular expressions (and
>> > produce lots of spurious warnings in existing code).
>> >
>> > Regards
>> >
>> > Antoine.
>> Regular expressions are difficult if you're remembering which escape
>> sequences exist and are easy if you're using raw string literals.
> That's a misconception, since as the re docs mention:
> "Most of the standard escapes supported by Python string literals are
> also accepted by the regular expression parser: [snip]"
> In other words, whether you put "\t" or "\\t" in a regexp doesn't
> matter: it means the same to the regexp engine.
> Regards
> Antoine.

I'm not sure what misconception you're saying I have. An example of
when you have to remember what the escapes are is

>>>"\by\b", "x y z") is None
>>>"\\by\\b", "x y z") is None
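
The two lines above lost the start of the call in the archive. Assuming the
call was re.search (an assumption, since the archive does not show it), the
point can be reproduced with this sketch: "\b" in a normal string literal is
a backspace character, while the regex engine needs a backslash followed by
"b" to mean a word boundary.

```python
import re

text = "x y z"

backspace_pattern = "\by\b"   # backspace, "y", backspace
escaped_pattern = "\\by\\b"   # backslash-b, "y", backslash-b
raw_pattern = r"\by\b"        # same characters as escaped_pattern

no_match = re.search(backspace_pattern, text) is None
word_match = re.search(escaped_pattern, text) is not None

# For tab, by contrast, both spellings match, as noted earlier in the thread
tab_both = bool(re.search("\t", "a\tb")) and bool(re.search("\\t", "a\tb"))

print(no_match, word_match, escaped_pattern == raw_pattern, tab_both)
```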


From greg.ewing at  Wed Oct 10 23:23:00 2012
From: greg.ewing at (Greg Ewing)
Date: Thu, 11 Oct 2012 10:23:00 +1300
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

>>>>Mark Adam wrote:

>>>There doesn't *have* to be - you could run a network event loop in one
>>>thread and a GUI event loop in another and pass control back and forth
>>>via methods like IOLoop.add_callback or Reactor.callFromThread.

Well, that could be done, but one of the reasons for using
an event loop approach in the first place is to avoid having
to deal with threads and all their attendant concurrency


From at  Wed Oct 10 23:33:45 2012
From: at (Joshua Landau)
Date: Wed, 10 Oct 2012 22:33:45 +0100
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
	<> <20121009043236.GI27445@ando>
	<> <>
Message-ID: <>

On 10 October 2012 02:14, Steven D'Aprano <steve at> wrote:

> On 10/10/12 09:13, Joshua Landau wrote:
>> Just a curiosity here (as I can guess of plausible reasons myself, so
>> there
>> probably are some official stances).
>> Is there a reason NaNs are not instances of NaN class?
> Because that would complicate Python's using floats for absolutely no
> benefit.
> Instead of float operations always returning a float, they would have to
> return
> a float or a NAN. To check for a valid floating point instance, instead of
> saying:
> isinstance(x, float)
> you would have to say:
> isinstance(x, (float, NAN))

Not the way I'm proposing it.

>>> class NAN(float):
...     def __new__(self):
...             return float.__new__(self, "nan")
...     def __eq__(self, other):
...             return other is self
...
>>> isinstance(NAN(), float)
True
>>> NAN() is NAN()
False
>>> NAN() == NAN()
False
>>> x = NAN()
>>> x is x
True
>>> x == x
True
>>> x
nan

> And what about infinities, denorm numbers, and negative zero? Do they get
> dedicated classes too?

Infinities? No, although they might well if the infinities were different
(set of reals vs set of ints, for example).
Denorms? No, that's a completely different thing.
-0.0? No, that's a completely different thing.

I was asking because instances of a class map onto a behavior that
matches *almost exactly* what *both* parties want, so why was it not used?
This is not the case with anything other than that.

> And what is the point of this added complexity? Nothing.

Simplicity. It's simpler.

> You *still* have the rule that "x == x for all x, except for NANs".

False. I was proposing that x == x but NAN() != NAN().

> The only difference is that "NANs" now means "instances of NAN class"
> rather than
> "NAN floats" (and Decimals).

False, if you subclass float.

> Working with IEEE 754 floats is now far more of
> a nuisance because some valid floating point values aren't floats but have
> a
> different class, but nothing meaningful is different.

>> Then x == x would be True (as they want), but [this NaN] == [that NaN]
>> would be False, as expected.
> Making NANs their own class wouldn't give you that. If we wanted that
> behaviour, we could have it without introducing a NAN class: just change
> the
> list __eq__ method to scan the list for a NAN using math.isnan before
> checking
> whether the lists were identical.


>>> x == x
True
>>> [NAN()] == [NAN()]
False

as per my previous "implementation".
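
For anyone who wants to try the idea as a script, here is a sketch of the
class with two additions that are not in the session above and are
assumptions of this example: an explicit __ne__, since float's own
inequality would otherwise still report x != x for a NaN, and an explicit
__hash__, since defining __eq__ alone makes the class unhashable in
Python 3.

```python
import math

class NAN(float):
    """Sketch: a NaN that is equal only to itself (identity-based equality)."""

    def __new__(cls):
        return float.__new__(cls, "nan")

    def __eq__(self, other):
        return other is self

    def __ne__(self, other):
        # Added for consistency with __eq__; not part of the original session
        return other is not self

    # Keep instances hashable despite the custom __eq__
    __hash__ = float.__hash__

x = NAN()
print(x == x, NAN() == NAN(), [x] == [x], [NAN()] == [NAN()])
```

math.isnan still recognises such instances, so existing isnan-based checks
keep working in this sketch.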

> But that would defeat the purpose of the identity check (an optimization to
> avoid scanning the list)! Replacing math.isnan with isinstance doesn't
> change
> that.
>> I guess that raises the question about why x == x but sqrt(-1) !=
>> sqrt(-1),
> That question has already been raised, and answered, repeatedly in this
> thread.

False. x != x, so that has *not* been "answered". This was an example
problem with my own suggested implementation.

>> but it seems a lot less of a big deal than all of the exceptions with
>> container equalities.
> Container equalities are not a big deal. I'm not sure what problem you
> think
> you are solving.

Why would you assume that? I mentioned it from *honest* *curiosity*, and
all I got back was an attack. Please, I want to be civil but you need to
act less angrily.

[Has not been spell-checked, as I don't really have time </lie>]

Thank you for your time, even though I disagree,

Joshua Landau

From at  Wed Oct 10 23:38:32 2012
From: at (Joshua Landau)
Date: Wed, 10 Oct 2012 22:38:32 +0100
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
	<> <20121009043236.GI27445@ando>
	<> <>
Message-ID: <>

On 10 October 2012 22:33, Joshua Landau < at> wrote:

>   Why would you assume that? I mentioned it from *honest* *curiosity*,
> and all I got back was an attack. Please, I want to be civil but you need
> to act less angrily.

After reconsidering, I regret these sentences.

Yes, I do still believe your response was overly angry, but I did get a
thought-out response and you did try to address my concerns. In the
interest of benevolence, may I retract my statement?

From at  Thu Oct 11 00:05:43 2012
From: at (Joshua Landau)
Date: Wed, 10 Oct 2012 23:05:43 +0100
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
	<> <20121009043236.GI27445@ando>
	<> <>
Message-ID: <>

I don't normally triple-post, but here it goes.

After re-re-reading this thread, it turns out one *(1)* post and two *(2)*
answers to that post have covered a topic very similar to the one I have
raised. All of the others, to my understanding, do not dwell over the fact
that *float("nan") is not float("nan")*. The mentioned post was not quite
the same as mine, but it still had two replies.

I will respond to them here. My response, again, is a curiosity why, *not* a
suggestion to change anything. I agree that there is probably no real
concern with the current state, I have never had a concern and the concern
caused by change would dwarf any possible benefits.

Response 1:
This implies that you want to differentiate between -0.0 and +0.0. That is
bad.

My response:
Why would I want to do that?

Response 2:
"There is not space on this thread to convince you otherwise." [paraphrased]

My response:
That comment was not directed at me and thus has little relevance to my own

Hopefully now you should understand why I felt need to ask the question
after so much has already been said on the topic.

Finally, Mike Graham says (probably referring to me):
"I'm sometimes surprised at the creativity and passion behind solutions to
this issue."

My response:
It was an immediate thought, not one dwelled upon. The fact it was not
answered in the thread prompted my curiosity. It is *honestly* nothing more.

From oscar.j.benjamin at  Thu Oct 11 00:42:06 2012
From: oscar.j.benjamin at (Oscar Benjamin)
Date: Wed, 10 Oct 2012 23:42:06 +0100
Subject: [Python-ideas] Floating point contexts in Python core
Message-ID: <>

On 9 October 2012 02:07, Guido van Rossum <guido at> wrote:
> On Mon, Oct 8, 2012 at 5:32 PM, Oscar Benjamin
> <oscar.j.benjamin at> wrote:
>> On 9 October 2012 01:11, Guido van Rossum <guido at> wrote:
>>> On Mon, Oct 8, 2012 at 5:02 PM, Greg Ewing <greg.ewing at> wrote:
>>>> So the question that really needs to be answered, I think, is
>>>> not "Why is NaN == NaN false?", but "Why doesn't NaN == anything
>>>> raise an exception, when it would make so much more sense to
>>>> do so?"
>>> Because == raising an exception is really unpleasant. We had this in
>>> Python 2 for unicode/str comparisons and it was very awkward.
>>> Nobody arguing against the status quo seems to care at all about
>>> numerical algorithms though. I propose that you go find some numerical
>>> mathematicians and ask them.
>> The main purpose of quiet NaNs is to propagate through computation
>> ruining everything they touch. In a programming language like C that
>> lacks exceptions this is important as it allows you to avoid checking
>> all the time for invalid values, whilst still being able to know if
>> the end result of your computation was ever affected by an invalid
>> numerical operation. The reasons for NaNs to compare unequal are no
>> doubt related to this purpose.
>> It is of course arguable whether the same reasoning applies to a
>> language like Python that has a very good system of exceptions but I
>> agree with Guido that raising an exception on == would be unfortunate.
>> How many people would forget that they needed to catch those
>> exceptions? How awkward could your code be if you did remember to
>> catch all those exceptions? In an exception handling language it's
>> important to know that there are some operations that you can trust.
> If we want to do *anything* I think we should first introduce a
> floating point context similar to the Decimal context. Then we can
> talk.

The other thread has gone on for ages now and isn't going anywhere.
Guido's suggestion here is much more interesting (to me) so I want to
start a new thread on this subject. Python's default handling of
floating point operations is IEEE-754 compliant which in my opinion is
the obvious and right thing to do.

However, Python is a much more versatile language than some of the
other languages for which IEEE-754 was designed. Python offers the
possibility of a very rich approach to the control and verification of
the accuracy of numeric operations on both a function by function and
code block by code block basis. This kind of functionality is already
implemented in the decimal module [1] as well as numpy [2], gmpy [3],
sympy [4] and no doubt other numerical modules that I'm not aware of.
It would be a real blessing to numerical Python programmers if
either/both of the following were to occur:

1) Support for calculation contexts with floats
2) A generic kind of calculation context manager that was recognised
widely by the builtin/stdlib types and also by third party numerical
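
For comparison, a minimal sketch of what such a calculation context already looks like in the decimal module today. It relies only on `decimal.localcontext` and the `InvalidOperation` trap; the helper name `divide` is made up for illustration, and nothing here is a proposal for how a float context would be spelled.

```python
import decimal


def divide(a, b):
    """Divide two integers as Decimals under whatever context is active."""
    return decimal.Decimal(a) / decimal.Decimal(b)


# Inside this block the invalid-operation trap is switched off, so 0/0
# quietly produces a NaN instead of raising.
with decimal.localcontext() as ctx:
    ctx.traps[decimal.InvalidOperation] = False
    quiet = divide(0, 0)

# Outside the block the default context applies again, and the same
# operation raises an exception.
try:
    divide(0, 0)
    raised = False
except decimal.InvalidOperation:
    raised = True

print(quiet.is_nan(), raised)
```

The point of the sketch is only that the choice between "quiet NaN" and "exception" is made per code block by the caller, not globally by the type.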



From g.brandl at  Thu Oct 11 00:45:33 2012
From: g.brandl at (Georg Brandl)
Date: Thu, 11 Oct 2012 00:45:33 +0200
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <k54to7$p05$>

On 10.10.2012 21:07, Antoine Pitrou wrote:
> On Tue, 9 Oct 2012 00:45:41 +0530
> Nick Coghlan <ncoghlan at> wrote:
>> On Tue, Oct 9, 2012 at 12:24 AM, Guido van Rossum <guido at> wrote:
>> > I don't like any of those; I'd vote for another regular method, maybe
>> > p.pathjoin(q).
> [...]
>> I don't *love* joinpath as a name, I just don't actively dislike it
>> the way I do the four presented options (and it has the virtue of the
>> precedent).
> How about ?

I'd have no idea what it means, honestly.


From zuo at  Thu Oct 11 00:51:14 2012
From: zuo at (Jan Kaliszewski)
Date: Thu, 11 Oct 2012 00:51:14 +0200
Subject: [Python-ideas] Propagating StopIteration value
In-Reply-To: <k51fql$r7k$>
References: <k4q38d$j8e$> <>
Message-ID: <>

Hello .*

On 09.10.2012 17:28, Serhiy Storchaka wrote:
> On 09.10.12 16:07, Oscar Benjamin wrote:
>> I really should have checked this before posting but I didn't have
>> Python 3.3 available:
> Generator expression also eats the StopIteration value:
>>>> next(x for x in f())
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> StopIteration

Why shouldn't it "eat" that value? The full-generator equivalent, even 
with `yield from`, will "eat" it also:

     >>> def _make_this_gen_expr_equivalent():
     ...     yield from f()  # or:  for x in f(): yield x
     ...
     >>> g = _make_this_gen_expr_equivalent()
     >>> next(g)
     Traceback (most recent call last):
       File "<stdin>", line 1, in <module>
     StopIteration

After all, any generator will "eat" the StopIteration argument unless it:
* explicitly propagates it (with `return arg` or `raise 
StopIteration(arg)`), or
* iterates over the subiterator "by hand" using the next() builtin or 
the __next__() method and does not catch StopIteration.

I believe that the new `yield from...` feature changes nothing in the 
*iterator* protocol.

What it adds are only two things that can be placed in the code of a
generator:
1) a return statement with a value -- finishing execution of the
generator + raising StopIteration instantiated with that value passed as
the only argument,
2) a `yield from subiterator` expression which propagates the items
generated by the subiterator (not necessarily a generator) + does all
that dance with __next__/send/throw/close (see PEP 380...) + catches
StopIteration and returns the value passed with this exception (if any
value has been passed).

Not less, not more. Especially, the `yield from subiterator` expression
itself *does not propagate* its value outside the generator.
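
A small sketch of those two additions, in case it helps to see them side by side. The names `inner`, `outer` and `captured` are made up for illustration.

```python
def inner():
    yield "g"
    return 1000000  # becomes StopIteration(1000000) when inner() finishes


captured = []


def outer():
    # `yield from` re-yields inner()'s items, then evaluates to its return value.
    value = yield from inner()
    captured.append(value)
    # outer() itself returns None, so the value does not travel any further.


items = list(outer())

# Iterating "by hand" is the other way to get at the value.
gen = inner()
next(gen)
try:
    next(gen)
except StopIteration as exc:
    direct = exc.value

print(items, captured, direct)
```

Here `list(outer())` sees only the yielded item; the return value is visible inside `outer` (through `yield from`) and to the caller who inspects the StopIteration directly.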

The goal described by the OP could be reached with a wrapper generator 
-- something like this:

     def preservefrom(iter_factory, *args, which):
         final_value = None
         subiter = iter(which)
         def catching_gen():
             nonlocal final_value
             try:
                 while True:
                     yield next(subiter)
             except StopIteration as exc:
                 # remember the value carried by the sub-iterator's StopIteration
                 if exc.args:
                     final_value = exc.args[0]
         args = [arg if arg is not which else catching_gen()
                 for arg in args]
         yield from iter_factory(*args)
         return final_value

Example usage:

     >>> import itertools
     >>> def f():
     ...     yield 'g'
     ...     return 1000000
     ...
     >>> my_gen = f()
     >>> my_chain = preservefrom(itertools.chain, 'abc', 'def', my_gen,
     ...                         which=my_gen)
     >>> while True:
     ...     print(next(my_chain))
     ...
     a
     b
     c
     d
     e
     f
     g
     Traceback (most recent call last):
       File "<stdin>", line 2, in <module>
     StopIteration: 1000000
     >>> my_gen = f()
     >>> my_filter = preservefrom(filter, lambda x: True, my_gen,
     ...                          which=my_gen)
     >>> next(my_filter)
     'g'
     >>> next(my_filter)
     Traceback (most recent call last):
       File "<stdin>", line 1, in <module>
     StopIteration: 1000000

Best regards.

From alexander.belopolsky at  Thu Oct 11 00:56:08 2012
From: alexander.belopolsky at (Alexander Belopolsky)
Date: Wed, 10 Oct 2012 18:56:08 -0400
Subject: [Python-ideas] Floating point contexts in Python core
In-Reply-To: <>
References: <>
Message-ID: <>

On Oct 10, 2012, at 6:42 PM, Oscar Benjamin <oscar.j.benjamin at> wrote:

>> If we want to do *anything* I think we should first introduce a
>> floating point context similar to the Decimal context. Then we can
>> talk.
> The other thread has gone on for ages now and isn't going anywhere.
> Guido's suggestion here is much more interesting (to me) so I want to
> start a new thread on this subject. Python's default handling of
> floating point operations is IEEE-754 compliant which in my opinion is
> the obvious and right thing to do.

I gave this idea +float('inf') in the other thread and have been thinking about it since. I am now toying with the idea of unifying float and decimal in Python. IEEE recently combined their two FP standards into one, so we have a precedent for this.

We can start by extending decimal to support radix 2 and once that code is mature enough and has accelerated code for platform formats (single, double, long double), we can replace Python float with the new fully platform independent IEEE 754 compliant implementation. We can even supply a legacy context to support some current warts. 

From zuo at  Thu Oct 11 01:29:32 2012
From: zuo at (Jan Kaliszewski)
Date: Thu, 11 Oct 2012 01:29:32 +0200
Subject: [Python-ideas] Propagating StopIteration value
In-Reply-To: <>
References: <k4q38d$j8e$> <>
Message-ID: <>

> The goal described by the OP could be reached with a wrapper
> generator -- something like this:

PS. A more convenient version (you don't need to repeat yourself):

     import collections

     def genfrom(iter_factory, *args):
         final_value = None
         def catching(iterable):
             subiter = iter(iterable)
             nonlocal final_value
             try:
                 while True:
                     yield next(subiter)
             except StopIteration as exc:
                 if exc.args:
                     final_value = exc.args[0]
         # wrap only the arguments marked with genfrom.this(...)
         args = [catching(arg.iterable) if isinstance(arg, genfrom.this)
                 else arg
                 for arg in args]
         yield from iter_factory(*args)
         return final_value

     genfrom.this = collections.namedtuple('propagate_from_this',
                                           'iterable')

Some examples:

     >>> import itertools
     >>> def f():
     ...     yield 'g'
     ...     return 10000000
     ...
     >>> my_chain = genfrom(itertools.chain, 'abc', 'def',
     ...                    genfrom.this(f()))
     >>> while True:
     ...     print(next(my_chain))
     ...
     a
     b
     c
     d
     e
     f
     g
     Traceback (most recent call last):
       File "<stdin>", line 2, in <module>
     StopIteration: 10000000
     >>> my_filter = genfrom(filter, lambda x: True, genfrom.this(f()))
     >>> next(my_filter)
     'g'
     >>> next(my_filter)
     Traceback (most recent call last):
       File "<stdin>", line 1, in <module>
     StopIteration: 10000000

From zuo at  Thu Oct 11 01:38:21 2012
From: zuo at (Jan Kaliszewski)
Date: Thu, 11 Oct 2012 01:38:21 +0200
Subject: [Python-ideas] Propagating StopIteration value [PS. #2]
In-Reply-To: <>
References: <k4q38d$j8e$> <>
Message-ID: <>

On 11.10.2012 01:29, Jan Kaliszewski wrote:
>> The goal described by the OP could be reached with a wrapper
>> generator -- something like this:
> [snip]
> PS. A more convenient version (you don't need to repeat yourself):

PS2. Sorry for flooding, obviously it can be simpler:

     import collections

     def genfrom(iter_factory, *args):
         final_value = None
         def catching(iterable):
             nonlocal final_value
             # `yield from` evaluates to the sub-iterator's return value
             final_value = yield from iterable
         args = [catching(arg.iterable) if isinstance(arg, genfrom.this)
                 else arg
                 for arg in args]
         yield from iter_factory(*args)
         return final_value

     genfrom.this = collections.namedtuple('propagate_from_this',
                                           'iterable')


From steve at  Thu Oct 11 02:07:45 2012
From: steve at (Steven D'Aprano)
Date: Thu, 11 Oct 2012 11:07:45 +1100
Subject: [Python-ideas] Make undefined escape sequences have
In-Reply-To: <k54kck$8q8$>
References: <>
	<> <k54kck$8q8$>
Message-ID: <>

On 11/10/12 07:04, Serhiy Storchaka wrote:
> On 10.10.12 22:46, Antoine Pitrou wrote:
>> -1. This will make life more difficult with regular expressions (and
>> produce lots of spurious warnings in existing code).
> Strings for regular expressions always should be raw.

Why? The re module doesn't care how you construct the strings. It *can't* care
how you construct the strings.

Something like'\D*', 'abcd1234xyz') works perfectly well and there
is no need for a raw string. Any requirement to "always use raw strings" is a
style issue, not a language issue.
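
A quick sketch of that point: the pattern string that reaches the re module is identical whichever kind of literal spelled it. The choice of `re.match` below is an assumption made for illustration, and the cooked literal doubles its backslash so the snippet runs without escape warnings on current interpreters.

```python
import re

cooked = "\\D*"  # ordinary literal, backslash doubled
raw = r"\D*"     # raw literal

# Both literals build the same three-character string: backslash, D, star.
same = cooked == raw

# The re module only ever sees that string, not how it was written.
match = re.match(cooked, "abcd1234xyz")

print(same, match.group())
```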


From steve at  Thu Oct 11 02:08:13 2012
From: steve at (Steven D'Aprano)
Date: Thu, 11 Oct 2012 11:08:13 +1100
Subject: [Python-ideas] Make undefined escape sequences have
In-Reply-To: <>
References: <>
Message-ID: <>

On 11/10/12 07:08, Mike Graham wrote:
> On Wed, Oct 10, 2012 at 3:46 PM, Antoine Pitrou<solipsis at>  wrote:
>> On Wed, 10 Oct 2012 15:36:08 -0400
>> Mike Graham<mikegraham at>  wrote:

>>> The literal"\c" should be an error

Who says so? My bash shell disagrees with you:

[steve at ando ~]$ touch spam
[steve at ando ~]$ ls s\pa\m

and so do I.

There are three obvious behaviours for extraneous escapes:

1) backslash-c resolves to just c (what bash and VisualStudio do)
2) backslash-c resolves to backslash-c (what Python does)
3) raise an exception or compile-time error (what Java does)

It is undefined behaviour in C.

It is a matter of opinion that Java got it right and the others got it
wrong, one which I do not share.

>>> but in practice means "\\c". It's
>>> probably too late to make this invalid syntax as it ought to be, but I
>>> wonder if a warning isn't in order, especially with the theoretical
>>> potential of adding new string escapes in the future.
>> -1. This will make life more difficult with regular expressions (and
>> produce lots of spurious warnings in existing code).

I agree with Antoine here.

If and when there is a serious, concrete proposal to add a new string
escape, and not just a "theoretical potential", then we should consider
adding warnings.

> Regular expressions are difficult if you're remembering which escape
> sequences exist and are easy if you're using raw string literals.

Just because some people find it hard to remember doesn't mean that it
should be an error *not* to use raw strings.


From steve at  Thu Oct 11 02:25:20 2012
From: steve at (Steven D'Aprano)
Date: Thu, 11 Oct 2012 11:25:20 +1100
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

On 11/10/12 06:07, Antoine Pitrou wrote:
> On Tue, 9 Oct 2012 00:45:41 +0530
> Nick Coghlan<ncoghlan at>  wrote:
>> On Tue, Oct 9, 2012 at 12:24 AM, Guido van Rossum<guido at>  wrote:
>>> I don't like any of those; I'd vote for another regular method, maybe
>>> p.pathjoin(q).
> [...]
>> I don't *love* joinpath as a name, I just don't actively dislike it
>> the way I do the four presented options (and it has the virtue of the
>> precedent).
> How about ?


"To" implies to me either:

* one_path is mutated to become other_path; or

* you supply the end points, and the method finds a path between them

neither of which is remotely relevant. It certainly is not a synonym
for add/join/combine/concat paths. Brevity is not more important than


From trent at  Thu Oct 11 02:55:23 2012
From: trent at (Trent Nelson)
Date: Wed, 10 Oct 2012 20:55:23 -0400
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 08, 2012 at 05:13:03PM -0700, Christian Heimes wrote:
> On 08.10.2012 17:35, Guido van Rossum wrote:
> > On Mon, Oct 8, 2012 at 5:39 AM, Christian Heimes <christian at> wrote:
> >> Python's standard library doesn't contain an interface to I/O Completion
> >> Ports. I think a common event loop system is a good reason to add IOCP
> >> if somebody is up for the challenge.
> >>
> >> Would you prefer an IOCP wrapper in the stdlib or your own version?
> >> Twisted has its own Cython based wrapper, some other libraries use a
> >> libevent-based solution.
> > 
> > What's an IOCP?
> I/O Completion Ports,
> It's a Windows (and apparently also Solaris)

    And AIX, too.  For every OS IOCP implementation, there's a
    corresponding Snakebite box :-)

> API for async IO that can handle multiple threads.

    I find it helps to think of it in terms of a half-sync/half-async
    pattern.  The half-async part handles the I/O; the OS wakes up one
    of your "I/O" threads upon incoming I/O.  The job of such threads
    is really just to pull/push the bytes from/to kernel/user space as
    quickly as it can.

        (Since Vista, Windows has provided a corresponding thread pool
         API that gels really well with IOCP.  Windows will optimally
         manage threads based on incoming I/O; spawning/destroying
         threads as necessary.  You can even indicate to Windows
         whether your threads will be "compute" or I/O bound, which
         it uses to optimize its scheduling algorithm.)

    The half-sync part is the event-loop part of your app, which simply
    churns away on the data prepared for it by the async threads.

    What would be neat is if the half-async path could be run outside
    the GIL.  They would need to be able to allocate memory that could
    then be "owned" by the GIL-holding half-sync part.

    You could leverage this with kqueue and epoll; have similar threads
    set up to simply process I/O independent of the GIL, using the same
    facilities that would be used by IOCP-processing threads.

    Then the "asyncore" event-loop simply becomes the half-sync part of
    the pattern, enumerating over all the I/O requests queued up for it
    by all the GIL-independent half-async threads.
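
    A toy sketch of the shape of that pattern, using nothing but ordinary
    Python threads and a queue. It does not touch IOCP, kqueue or epoll, and
    it does not run anything outside the GIL; the names `io_worker`,
    `event_loop` and `ready` are made up for illustration.

```python
import queue
import threading


def io_worker(chunks, ready):
    """Half-async side: hand each received chunk over as fast as possible."""
    for chunk in chunks:
        ready.put(chunk)
    ready.put(None)  # sentinel: this worker has nothing more to deliver


def event_loop(ready, n_workers):
    """Half-sync side: churn through whatever the I/O threads queued up."""
    processed = []
    finished = 0
    while finished < n_workers:
        item = ready.get()
        if item is None:
            finished += 1
        else:
            processed.append(item.upper())
    return processed


ready = queue.Queue()
sources = [["a", "b"], ["c"]]  # stand-ins for data arriving on two sockets
workers = [threading.Thread(target=io_worker, args=(src, ready))
           for src in sources]
for worker in workers:
    worker.start()
result = event_loop(ready, len(workers))
for worker in workers:
    worker.join()

print(sorted(result))
```

    The queue is the only thing the two halves share, which is the property
    that would let the I/O side live outside the GIL in the scheme above.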


From steve at  Thu Oct 11 03:20:39 2012
From: steve at (Steven D'Aprano)
Date: Thu, 11 Oct 2012 12:20:39 +1100
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
	<20121009043236.GI27445@ando> <>
Message-ID: <>

On 11/10/12 09:05, Joshua Landau wrote:

> After re-re-reading this thread, it turns out one *(1)* post and two
> *(2)* answers
> to that post have covered a topic very similar to the one I have raised.
> All of the others, to my understanding, do not dwell over the fact
> that *float("nan") is not float("nan")* .

That's no different from any other float.

py> float('nan') is float('nan')
False
py> float('1.5') is float('1.5')
False

Floats are not interned or cached, although of course interning is
implementation dependent and this is subject to change without notice.

For that matter, it's true of *nearly all builtins* in Python. The
exceptions being bool(obj) which returns one of two fixed instances,
and int() and str(), where *some* but not all instances are cached.
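
A short sketch of how identity and equality interact here, which is what the subject line of this thread is about. It sticks to behaviour that does not depend on caching; the variable names are made up for illustration.

```python
nan = float("nan")

# Equality on the float itself follows IEEE 754: a NaN is unequal to itself.
self_equal = nan == nan

# Containers check identity before equality, so the *same* NaN object
# makes two lists compare equal...
same_object_lists = [nan] == [nan]

# ...while two distinct NaN objects do not.
distinct_object_lists = [float("nan")] == [float("nan")]

# bool() is the builtin that always hands back one of two fixed instances.
bool_is_singleton = bool(1) is bool("x")

print(self_equal, same_object_lists, distinct_object_lists, bool_is_singleton)
```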

> Response 1:
> This implies that you want to differentiate between -0.0 and +0.0. That is
> bad.
> My response:
> Why would I want to do that?

If you are doing numeric work, you *should* differentiate between -0.0
and 0.0. That's why the IEEE 754 standard mandates a -0.0.

Both -0.0 and 0.0 compare equal, but they can be distinguished (although
doing so is tricky in Python). The reason for distinguishing them is to
distinguish between underflow to zero from positive or negative values.
E.g. log(x) should return -infinity if x underflows from a positive value,
and a NaN if x underflows from a negative.
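
A sketch of the underflow case, assuming the usual IEEE 754 doubles. Note that Python's own math.log raises ValueError for a zero argument, so the example uses math.atan2, which does look at the sign of zero; the variable names are made up for illustration.

```python
import math

pos_zero = 1e-200 * 1e-200    # underflows to zero from the positive side
neg_zero = -1e-200 * 1e-200   # underflows to zero from the negative side

# The two zeros compare equal...
equal = pos_zero == neg_zero

# ...but they still remember which side they came from.
signs = (math.copysign(1.0, pos_zero), math.copysign(1.0, neg_zero))

# A function with a branch cut gives different answers for the two zeros.
angles = (math.atan2(0.0, pos_zero), math.atan2(0.0, neg_zero))

print(equal, signs, angles)
```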


From steve at  Thu Oct 11 03:30:13 2012
From: steve at (Steven D'Aprano)
Date: Thu, 11 Oct 2012 12:30:13 +1100
Subject: [Python-ideas] Floating point contexts in Python core
In-Reply-To: <>
References: <>
Message-ID: <>

On 11/10/12 09:56, Alexander Belopolsky wrote:

> I gave this idea +float('inf') in the other thread and was thinking
>  about it since. I  am now toying with the idea to unify float and
>decimal in Python.  IEEE combined their two FP standards in one
>  recently, so we have a precedent for this.
> We can start by extending decimal to support radix 2 and once that
>  code is mature enough and has accelerated code for platform formats
>  (single, double, long double),

I don't want to be greedy, but supporting minifloats would be a real
boon to beginners trying to learn how floats work.

>we can replace Python float with the
>  new fully platform independent IEEE 754 compliant implementation.
>We can even supply a legacy context to support some current warts.

This all sounds very exciting, but also like a huge amount of work.


From guido at  Thu Oct 11 03:44:04 2012
From: guido at (Guido van Rossum)
Date: Wed, 10 Oct 2012 18:44:04 -0700
Subject: [Python-ideas] Floating point contexts in Python core
In-Reply-To: <>
References: <>
Message-ID: <>

> This all sounds very exciting, but also like a huge amount of work.

Indeed. But that's what we're here for.

Anyway, as an indication of the amount of work, you might want to look
at the fpectl module -- the module itself is tiny, but its
introduction required a huge amount of changes to every place where
CPython uses a double. I don't know if anybody uses it, though it's
still in the Py3k codebase.

--Guido van Rossum (

From chris.jerdonek at  Thu Oct 11 04:24:07 2012
From: chris.jerdonek at (Chris Jerdonek)
Date: Wed, 10 Oct 2012 19:24:07 -0700
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <k51u5c$85f$>
References: <>
Message-ID: <>

On Tue, Oct 9, 2012 at 12:32 PM, Terry Reedy <tjreedy at> wrote:
> On 10/9/2012 9:30 AM, Eric Snow wrote:
>> On Oct 9, 2012 1:12 AM, "Senthil Kumaran"
>> <senthil at
>> <mailto:senthil at>> wrote:
>>  > > `p.pathjoin(q)`
>>  >
>>  > +1
>>  >
>>  > It is very explicit and hard to get it wrong.
> or path.concat(otherpath)

Or how about path.slash(other_path)? :)

I'm not feeling the operators, though I haven't thought about them much.


From mikegraham at  Thu Oct 11 04:24:12 2012
From: mikegraham at (Mike Graham)
Date: Wed, 10 Oct 2012 22:24:12 -0400
Subject: [Python-ideas] Make undefined escape sequences have
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Oct 10, 2012 at 8:08 PM, Steven D'Aprano <steve at> wrote:
> On 11/10/12 07:08, Mike Graham wrote:
>> On Wed, Oct 10, 2012 at 3:46 PM, Antoine Pitrou<solipsis at>
>> wrote:
>>> On Wed, 10 Oct 2012 15:36:08 -0400
>>> Mike Graham<mikegraham at>  wrote:
>>>> The literal"\c" should be an error
> Who says so? My bash shell disagrees with you:

Frankly, I don't look to bash for sensible language design advice. I
think concepts like "In the face of ambiguity, refuse the temptation
to guess" guides how we should see the decision here. "Backslash is
for escape sequences except when it's not" seemed like an
obviously-misfortunate thing to me. I'm truly perplexed people see it
as a feature they're eager to use, but I guess I should learn
something from that.

>> Regular expressions are difficult if you're remembering which escape
>> sequences exist and are easy if you're using raw string literals.
> Just because some people find it hard to remember doesn't mean that it
> should be an error *not* to use raw strings.

I didn't say that it should be an error not to use raw strings. I was
saying that the implication that this suggestion makes constructing
regex strings hard is silly and mentioning the thing that makes them
easy. I'm not suggesting that you shouldn't be able to use normal
string literals.

Antoine went on to point out that things like "\t" worked in regex
strings. This is an unrelated feature that I never suggested altering.
In that case, a tab character in your string is regarded like \t. This
behavior would remain.

I think four string escapes have been added since versions of Python I
was aware of. Writing code like "ab\c" seems seedy in light of that


From ericsnowcurrently at  Thu Oct 11 04:34:36 2012
From: ericsnowcurrently at (Eric Snow)
Date: Wed, 10 Oct 2012 20:34:36 -0600
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

On Oct 8, 2012 5:35 PM, "Eric Snow" <ericsnowcurrently at> wrote:
> On Mon, Oct 8, 2012 at 12:47 PM, Antoine Pitrou <solipsis at>
> > - `p[q]` joins path q to path p
> -1
> > - `p + q` joins path q to path p
> -1
> > - `p / q` joins path q to path p
> -1
> > - `p.join(q)` joins path q to path p
> +1 (with a different name)
> I've found Nick's argument against operators-from-day-1 to be
> convincing, as well as his argument against join() or any other name
> already provided by string/sequence APIs.

Changing my vote:

p[q]                 -1
p + q               -1
p / q               +0
p.pathjoin()   +1

A method is essential, regardless of the color the bikeshed ends up.  As
far as operators go, / is the only option here that doesn't conflict with
string/collection APIs.  The alternative has an adverse impact on
subclassing and on future design choices on the path API.  This goes for
the method name too.
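
For reference, a sketch using the pathlib module as it eventually shipped in the standard library, which ended up offering both a method and the / operator. This is offered as a look at the later outcome, not as something that had been decided at the time of this poll.

```python
from pathlib import PurePosixPath

p = PurePosixPath("/usr")
q = "lib"

via_operator = p / q          # operator spelling
via_method = p.joinpath(q)    # method spelling

print(via_operator, via_method, via_operator == via_method)
```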


From steve at  Thu Oct 11 04:58:33 2012
From: steve at (Steven D'Aprano)
Date: Thu, 11 Oct 2012 13:58:33 +1100
Subject: [Python-ideas] Make undefined escape sequences have
In-Reply-To: <>
References: <>
Message-ID: <>

On 11/10/12 13:24, Mike Graham wrote:
> On Wed, Oct 10, 2012 at 8:08 PM, Steven D'Aprano<steve at>  wrote:
>> On 11/10/12 07:08, Mike Graham wrote:
>>> On Wed, Oct 10, 2012 at 3:46 PM, Antoine Pitrou<solipsis at>
>>> wrote:
>>>> On Wed, 10 Oct 2012 15:36:08 -0400
>>>> Mike Graham<mikegraham at>   wrote:
>>>>> The literal"\c" should be an error
>> Who says so? My bash shell disagrees with you:
> Frankly, I don't look to bash for sensible language design advice.

Pity, because in this case I think bash is actually more sensible than
either Python or Java. If you escape a character, you should get
something. If it's a special character, you get the special meaning.
If it's not, escaping should be transparent: escaping something that
doesn't need escaping is a null op:

py> from urllib import quote_plus
py> quote_plus('abc')
'abc'

If we were designing Python from scratch, I'd prefer '\D' -> 'D'. But
we're not, so I'm happy with the current behaviour, and don't agree that
it should be an error or that it needs warning about.

> I
>  think concepts like "In the face of ambiguity, refuse the temptation
> to guess" guides how we should see the decision here.

Where is the ambiguity? Is there ever a context where \D could mean
two different things and it isn't clear which one?

"In the face of ambiguity..." does not mean "refuse to decide on
language behaviour". Everything is ambiguous until you decide what
something will mean. It's only when you have two possible meanings
and no clear, obvious way to determine which one applies that the
ambiguity koan applies.

> "Backslash is
> for escape sequences except when it's not" seemed like an
> obviously-misfortunate thing to me.

No. In cooked strings, backslash-C is always an escape sequence, for
any character (or hex/oct code) C. But some escape sequences resolve
to a single char (\n -> newline) and some resolve to a pair of chars
(\D -> backslash D). In Haskell, \& resolves to the empty string.
It's still an escape sequence.
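
A sketch of that distinction as current CPython behaves. The literal source text is evaluated at run time so the unknown escape can be looked at without the module itself triggering a warning; the helper name `cooked` is made up for illustration, and a future Python that turns unknown escapes into errors would make the second call fail.

```python
import warnings


def cooked(literal_source):
    """Evaluate the source text of a string literal, hiding escape warnings."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return eval(literal_source)


newline = cooked(r'"\n"')   # the source text "\n": a known escape
unknown = cooked(r'"\D"')   # the source text "\D": an unknown escape

# The known escape collapses to one character, the unknown one keeps both.
print(len(newline), len(unknown))
```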

> I think four string escapes have been added since versions of Python I
> was aware of. Writing code like "ab\c" seems seedy in light of that

Adding a new escape sequence is almost as big a step as adding a new
built-in or new syntax. I see that as a good thing, it discourages too
many requests for new escape sequences.


From alexander.belopolsky at  Thu Oct 11 06:07:33 2012
From: alexander.belopolsky at (Alexander Belopolsky)
Date: Thu, 11 Oct 2012 00:07:33 -0400
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
	<> <20121009043236.GI27445@ando>
	<> <>
Message-ID: <>

On Wed, Oct 10, 2012 at 9:20 PM, Steven D'Aprano <steve at> wrote:
> Both -0.0 and 0.0 compare equal, but they can be distinguished (although
> doing so is tricky in Python).

Not really:

>>> math.copysign(1.0,-0.0)
-1.0
>>> math.copysign(1.0,0.0)
1.0

From alexander.belopolsky at  Thu Oct 11 06:20:17 2012
From: alexander.belopolsky at (Alexander Belopolsky)
Date: Thu, 11 Oct 2012 00:20:17 -0400
Subject: [Python-ideas] Floating point contexts in Python core
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Oct 10, 2012 at 9:44 PM, Guido van Rossum <guido at> wrote:
> Anyway, as an indication of the amount of work, you might want to look
> at the fpectl module -- the module itself is tiny, but its
> introduction required a huge amount of changes to every place where
> CPython uses a double.

I would start from another end.  I would look at first.
This is a little over 6,400 lines of code and I think most of it can be
reused to implement base 2 (or probably better base 16) float.
Multi-precision binary float can coexist with existing float until
the code matures and accelerators are written for major platforms.  At
the same time we can make incremental improvements to builtin float
until it can be replaced by a multi-precision float in some
well-defined context.

From greg.ewing at  Thu Oct 11 07:34:32 2012
From: greg.ewing at (Greg Ewing)
Date: Thu, 11 Oct 2012 18:34:32 +1300
Subject: [Python-ideas] Make undefined escape sequences
	have	SyntaxWarnings
In-Reply-To: <>
References: <>
Message-ID: <>

Steven D'Aprano wrote:
> If you escape a character, you should get
> something. If it's a special character, you get the special meaning.
> If it's not, escaping should be transparent: escaping something that
> doesn't need escaping is a null op

I think that calling "\n", "\t" etc. "escape sequences" is a misnomer
that is causing confusion in this discussion.

The term "escape" in this context means to prevent something from
having a special meaning that it would otherwise have. But the
backslash in these is being used to *give* a special meaning to
the following character.

In Python string literals, the only true escape sequences associated
with the backslash are '\\', "\'" and '\"'.

So the backslash is a bit schizophrenic -- sometimes it's an escape
character, sometimes it's a prefix that imparts a special meaning.

This means that "\c" where c is not special in any way is somewhat
ambiguous. Are you redundantly escaping something that doesn't
need it, are you asking for a special meaning that doesn't exist
(which is probably a mistake), or do you just want a literal
backslash?

Python guesses that you want a literal backslash. This seems to be
motivated by the desire to minimise the need for backslash doubling.
That sounds fine in theory, but I don't think it helps much in
practice. I for one don't trust myself to keep the entire set of
special characters in my head, including all the rarely-used ones,
so I end up doubling every backslash anyway.
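
A minimal sketch of the guess described above, assuming a Python
version that still accepts unknown backslash sequences (newer releases
warn about them, hence the filter). The helper name evaluate_literal is
made up for this illustration:

```python
import warnings

def evaluate_literal(source):
    """Evaluate a string literal given as source text, hiding the warning
    that newer Python versions emit for unknown backslash sequences."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return eval(source)

unknown = evaluate_literal(r'"\c"')    # backslash before a non-special char
doubled = evaluate_literal(r'"\\c"')   # explicitly doubled backslash
special = evaluate_literal(r'"\n"')    # backslash giving "n" a special meaning

# The unknown sequence keeps its backslash, so it matches the doubled form.
print(unknown == doubled, len(unknown), len(special))
```

Doubling every backslash gives the same string either way, which is
why the guess buys so little in practice.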

Given that, I wouldn't have minded at all if Python had refused
to guess in this case, and raised a compile-time error. That would
have left the way open for extending the set of special chars in
the future.

> Adding a new escape sequence is almost as big a step as adding a new
> built-in or new syntax. I see that as a good thing, it discourages too
> many requests for new escape sequences.

I don't see it makes much difference. We get plenty of requests for
new syntax of all kinds, and we seem to have enough sense to reject
them unless they're backed by extremely good arguments. There's no
reason requests for new special chars should be treated any differently.


From greg.ewing at  Thu Oct 11 07:45:50 2012
From: greg.ewing at (Greg Ewing)
Date: Thu, 11 Oct 2012 18:45:50 +1300
Subject: [Python-ideas] Floating point contexts in Python core
In-Reply-To: <>
References: <>
Message-ID: <>

Alexander Belopolsky wrote:

> I gave this idea +float('inf') in the other thread and was thinking about it
> since. I am now toying with the idea to unify float and decimal in Python.

Are you sure there would be any point in this? People who
specifically *want* base-2 floats are probably quite happy
with the current float type, and wouldn't appreciate having
it slowed down, even by a small amount.

It might make sense for them to share whatever parts of the
fp context apply to both, and they might have a common base
type, but they should probably remain distinct types with
separate implementations.


From stephen at  Thu Oct 11 07:53:12 2012
From: stephen at (Stephen J. Turnbull)
Date: Thu, 11 Oct 2012 14:53:12 +0900
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

Ronald Oussoren writes:

 > All in all the best we seem to be able to do is use the OS as a
 > heuristic, most Unix filesystems are case sensitive while Windows
 > and OSX filesystems are case preserving.

We can do better than that heuristic.  All of the POSIX systems I know
publish mtab by default.  The mount utility by default will report the
types of filesystems.

While a path module should not depend on such information, I suppose[1],
there ought to be a way to ask for it.

Of course this is still an heuristic (at least some Mac filesystems
can be configured to be case sensitive rather than case-preserving,
and I don't think this information is available in mtab), but it's far
more accurate than using only the OS.

[1]  Requires a system call or subprocess execution, and since mounts
can be dynamically changed, doing it once at module initialization is
not good enough.
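
A rough sketch of what "asking for it" could look like, assuming
mtab-style text with whitespace-separated fields (device, mount point,
filesystem type, ...). The sample text and the names parse_mtab and
filesystem_type are invented for illustration, and escaped spaces in
mount points are ignored:

```python
import posixpath

SAMPLE_MTAB = """\
/dev/sda1 / ext4 rw 0 0
/dev/sdb1 /mnt/usb vfat rw 0 0
"""

def parse_mtab(text):
    """Map each mount point to its filesystem type."""
    mounts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and not line.startswith("#"):
            mounts[fields[1]] = fields[2]
    return mounts

def filesystem_type(path, mounts):
    """Return the type of the longest mount point containing path."""
    path = posixpath.normpath(path)
    best = None
    for mount_point in mounts:
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or (path + "/").startswith(prefix):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    return mounts.get(best)

mounts = parse_mtab(SAMPLE_MTAB)
print(filesystem_type("/mnt/usb/Foo.txt", mounts))   # vfat
print(filesystem_type("/home/user/foo.txt", mounts)) # ext4
```

The filesystem type is still only a hint about case sensitivity, and
the table would have to be re-read whenever mounts may have changed.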

From rohit0286 at  Thu Oct 11 08:19:09 2012
From: rohit0286 at (rohit sharma)
Date: Thu, 11 Oct 2012 11:49:09 +0530
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

p + q               +1

This is a familiar notation to any developer and it's been used widely.


On Thu, Oct 11, 2012 at 8:04 AM, Eric Snow <ericsnowcurrently at>wrote:

> On Oct 8, 2012 5:35 PM, "Eric Snow" <ericsnowcurrently at> wrote:
> >
> > On Mon, Oct 8, 2012 at 12:47 PM, Antoine Pitrou <solipsis at>
> wrote:
> > > - `p[q]` joins path q to path p
> > -1
> > > - `p + q` joins path q to path p
> > -1
> > > - `p / q` joins path q to path p
> > -1
> > > - `p.join(q)` joins path q to path p
> > +1 (with a different name)
> >
> > I've found Nick's argument against operators-from-day-1 to be
> > convincing, as well as his argument against join() or any other name
> > already provided by string/sequence APIs.
> Changing my vote:
> p[q]                 -1
> p + q               -1
> p / q               +0
> p.pathjoin()   +1
> A method is essential, regardless of the color the bikeshed ends up.  As
> far as operators go, / is the only option here that doesn't conflict with
> string/collection APIs.  The alternative has an adverse impact on
> subclassing and on future design choices on the path API.  This goes for
> the method name too.
> -eric

From breamoreboy at  Thu Oct 11 08:55:06 2012
From: breamoreboy at (Mark Lawrence)
Date: Thu, 11 Oct 2012 07:55:06 +0100
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <k55qgd$54h$>

On 10/10/2012 20:07, Antoine Pitrou wrote:
> How about ?
> Regards
> Antoine.

-1 two much chance of confusing it with the other ways that too can be 
spelt :)


Mark Lawrence.

From stephen at  Thu Oct 11 08:59:23 2012
From: stephen at (Stephen J. Turnbull)
Date: Thu, 11 Oct 2012 15:59:23 +0900
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

Antoine Pitrou writes:
 > On Tue, 9 Oct 2012 00:45:41 +0530
 > Nick Coghlan <ncoghlan at> wrote:
 > > On Tue, Oct 9, 2012 at 12:24 AM, Guido van Rossum <guido at> wrote:
 > > > I don't like any of those; I'd vote for another regular method, maybe
 > > > p.pathjoin(q).
 > > 
 > [...]
 > > 
 > > I don't *love* joinpath as a name, I just don't actively dislike it
 > > the way I do the four presented options (and it has the virtue of the
 > > precedent).


 > How about ?

TOOWDTI, yes, but to me what it obviously does is

Path("/usr/local/bin").to(Path("/usr/bin")) => Path("../bin")

Ie, to me it's another spelling for .relative_to(), except that the
operands have reversed.  FWIW M?2% YMMV etc.

Some random thoughts follow.  If you think that is out of keeping with
the progress of this thread<wink/>, stop reading now.

I just don't think this problem (of convenient and object-oriented
paths) is going to get solved.  Basically what most of the people who
are posting about this seem to want is a subclass of string that
DWIMs.  The problem is that "DWIM" varies substantially across
programmers, and seems to be nondeterministic for some (me, for one).
If path "objects" "should" behave like strings with specialized
convenience methods, how can you improve on os.path?  I haven't seen
any answers to that, only "WIBNI Paths looked like strings
representing paths?"  And only piece by piece at that, no coherent
overview of what Paths-like-str might look like from a space station.

If we're going to have an object-oriented path module, why can't it be
object-oriented?  Paths are sequences of path components.  They are
not sequences of characters.  Sorry!  Path components are strings (or
subclasses thereof), but they have additional syntax (extensions,
Windows devices, Windows remote paths).  Maybe that, we can do
something with!

Antoine says that Paths need to be immutable.  Makes sense, but does
that preclude having MutablePath?  Then `mp[-1] += ".ext"` is a
natural notation, no?  How about `mp[-1] %= ".tex"; mp[-1] += ".pdf"`?
Then just

    my_path = MutablePath(arg_path)
    return Path(my_path)

does the work safely.

As has been noted several times, all paths have syntax resembling URL
syntax.  Even the semantics are similar, except (of course you are in
no way surprised by this) on Windows, where the syntactic role of
"scheme" has semantics "device", and there is the issue of the
different path separator.  Maybe it would be reasonable to forget
object-oriented Paths restricted to filesystems and use URLs when you
want object-oriented behavior.  Under the hood URL methods working
with file URLs would be manipulating paths via os.path, perhaps.

I realize that this would impose an asymmetric burden on developers on
Windows.  On the other hand, these days who isn't familiar with URL
syntax and passing familiar with its minor differences from file
system path semantics?  Perhaps the benefits of working with a
well-defined object model would outweigh the costs, at least when
developing new code.  In ordinary maintenance or major refactoring,
the developer would have the option of continuing to use os.path or
homebrew functions to manipulate paths.


From g.brandl at  Thu Oct 11 09:01:56 2012
From: g.brandl at (Georg Brandl)
Date: Thu, 11 Oct 2012 09:01:56 +0200
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <k55qqt$ai5$>

Am 11.10.2012 08:19, schrieb rohit sharma:
> p + q               +1
> This is a familiar notation to any developer and its been used widely.

I'd like to see that claim supported.


From him at  Thu Oct 11 09:03:23 2012
From: him at (Joachim König)
Date: Thu, 11 Oct 2012 09:03:23 +0200
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

On 11/10/2012 04:24, Chris Jerdonek wrote:
> Or how about path.slash(other_path)? :)
and path.backslash(other_path) for windows compatibility ;-)


From storchaka at  Thu Oct 11 10:45:09 2012
From: storchaka at (Serhiy Storchaka)
Date: Thu, 11 Oct 2012 11:45:09 +0300
Subject: [Python-ideas] Floating point contexts in Python core
In-Reply-To: <>
References: <>
Message-ID: <k560uo$tus$>

On 11.10.12 07:20, Alexander Belopolsky wrote:
> This is little over 6,400 line of code and I think most of it can be
> reused to implement base 2 (or probably better base 16) float.

With base 16 floats you can't emulate x86 native 53-bit mantissa floats.

From oscar.j.benjamin at  Thu Oct 11 10:56:31 2012
From: oscar.j.benjamin at (Oscar Benjamin)
Date: Thu, 11 Oct 2012 09:56:31 +0100
Subject: [Python-ideas] Floating point contexts in Python core
In-Reply-To: <>
References: <>
Message-ID: <>

On 11 October 2012 06:45, Greg Ewing <greg.ewing at> wrote:
> Alexander Belopolsky wrote:
>> I gave this idea +float('inf') in the other thread and was thinking about
>> it
>> since. I am now toying with the idea to unify float and decimal in Python.
> Are you sure there would be any point in this? People who
> specifically *want* base-2 floats are probably quite happy
> with the current float type, and wouldn't appreciate having
> it slowed down, even by a small amount.
> It might make sense for them to share whatever parts of the
> fp context apply to both, and they might have a common base
> type, but they should probably remain distinct types with
> separate implementations.

This is what I was pitching at. It would be great if a single floating
point context could be used to control the behaviour of float,
decimal, ndarray etc simultaneously.

Something that would have made my life easier yesterday would have
been a way to enter a debugger at the point when a first NaN is
created during execution. Something like:

    python -m pdb --error-nan

Or perhaps:

    PYTHONRUNFIRST='import errornan' python

With numpy you can already do:

    export PYTHONRUNFIRST="import numpy; numpy.seterr(all='raise')"

(Except that PYTHONRUNFIRST isn't implemented yet:


From storchaka at  Thu Oct 11 11:00:01 2012
From: storchaka at (Serhiy Storchaka)
Date: Thu, 11 Oct 2012 12:00:01 +0300
Subject: [Python-ideas] Make undefined escape sequences have
In-Reply-To: <>
References: <>
	<> <k54kck$8q8$>
Message-ID: <k561qg$3vr$>

On 10.10.12 23:18, Antoine Pitrou wrote:
> On Wed, 10 Oct 2012 23:04:25 +0300
> Serhiy Storchaka <storchaka at>
> wrote:
>> On 10.10.12 22:46, Antoine Pitrou wrote:
>>> -1. This will make life more difficult with regular expressions (and
>>> produce lots of spurious warnings in existing code).
>> Strings for regular expressions always should be raw. Now regular
>> expressions supports \u and \U escapes and no reason to use non-raw strings.
> That's a style issue, not a language rule.

Yes, of course, that's a style advice. Sorry if I used the wrong words.

This will not make life more difficult with regular expressions because 
you always can use raw string literals.
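
For example (a small sketch, the pattern and sample text are made up),
a raw literal and a literal with every backslash doubled give the same
pattern, and neither contains an undefined escape sequence:

```python
import re

raw_pattern = r"\d+\.\d+"          # raw literal: backslashes kept as typed
doubled_pattern = "\\d+\\.\\d+"    # non-raw literal: every backslash doubled

# Both spellings produce the same string, so re sees the same pattern.
assert raw_pattern == doubled_pattern

match = re.search(raw_pattern, "pi is roughly 3.14")
print(match.group())
```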

From solipsis at  Thu Oct 11 12:03:10 2012
From: solipsis at (Antoine Pitrou)
Date: Thu, 11 Oct 2012 12:03:10 +0200
Subject: [Python-ideas] Floating point contexts in Python core
References: <>
Message-ID: <>

On Thu, 11 Oct 2012 18:45:50 +1300
Greg Ewing <greg.ewing at> wrote:
> Alexander Belopolsky wrote:
> > I gave this idea +float('inf') in the other thread and was thinking about it
> > since. I am now toying with the idea to unify float and decimal in Python.
> Are you sure there would be any point in this? People who
> specifically *want* base-2 floats are probably quite happy
> with the current float type, and wouldn't appreciate having
> it slowed down, even by a small amount.

Indeed, I don't see the point either. Decimal's strength over float is
to be able to represent *decimal* numbers of arbitrary precision, which
is useful because any common human activity uses base-10 numbers. I
don't see how adding a new binary float type would help any use case.




From ubershmekel at  Thu Oct 11 12:31:14 2012
From: ubershmekel at (Yuval Greenfield)
Date: Thu, 11 Oct 2012 12:31:14 +0200
Subject: [Python-ideas] Make undefined escape sequences have
In-Reply-To: <k561qg$3vr$>
References: <>
	<> <k54kck$8q8$>
	<> <k561qg$3vr$>
Message-ID: <>

I'm not sure I understand what this line from the docs means:

\newline Backslash and newline ignored

I understand that row as either "\n" won't appear in the resulting string
or that I should get "\\newline".

Yuval Greenfield

From storchaka at  Thu Oct 11 12:53:00 2012
From: storchaka at (Serhiy Storchaka)
Date: Thu, 11 Oct 2012 13:53:00 +0300
Subject: [Python-ideas] Make undefined escape sequences have
In-Reply-To: <>
References: <>
	<> <k54kck$8q8$>
	<> <k561qg$3vr$>
Message-ID: <k568ef$vr6$>

On 11.10.12 13:31, Yuval Greenfield wrote:
> I'm not sure I understand what this line from the docs means:
> \newline Backslash and newline ignored
> I understand that row as either "\n" won't appear in the resulting
> string or that I should get "\\newline".

Newline is newline in source code.

 >>> "a\
... b"

Type <Quote><Key "a"><Backslash><Enter><Key "b"><Quote>. Result is "ab".
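
The same thing as a script, for comparison with the "\n" escape:

```python
# Backslash immediately followed by a real line break in the source:
# both characters are dropped from the resulting string.
joined = "a\
b"

# The two-character escape sequence \n, by contrast, inserts a newline.
escaped = "a\nb"

print(repr(joined), len(joined), len(escaped))
```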

From arnodel at  Thu Oct 11 13:27:05 2012
From: arnodel at (Arnaud Delobelle)
Date: Thu, 11 Oct 2012 12:27:05 +0100
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

On 11 October 2012 08:03, Joachim König <him at> wrote:
> On 11/10/2012 04:24, Chris Jerdonek wrote:
>> Or how about path.slash(other_path)? :)
> and path.backslash(other_path) for windows compaptibility ;-)

That's made my day!

How about a past participle to express it's not mutating?

1. path.joined("foo/bar")
2. path.extended("foo", "bar")

(or some better and shorter one I can't think of).
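
For comparison only: the pathlib module that later grew out of PEP 428
ended up with both a non-mutating method (spelled joinpath, not one of
the names above) and the / operator. A small sketch using
PurePosixPath so it behaves the same on every platform:

```python
from pathlib import PurePosixPath

base = PurePosixPath("/usr/local")

# Both spellings return a new path object.
via_method = base.joinpath("foo", "bar")
via_operator = base / "foo" / "bar"

print(via_method, via_operator, base)
```

The original object is left untouched, which is the property the past
participle was meant to advertise.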


From steve at  Thu Oct 11 13:35:28 2012
From: steve at (Steven D'Aprano)
Date: Thu, 11 Oct 2012 22:35:28 +1100
Subject: [Python-ideas] Floating point contexts in Python core
In-Reply-To: <>
References: <>
Message-ID: <>

On 11/10/12 16:45, Greg Ewing wrote:
> Alexander Belopolsky wrote:
>> I gave this idea +float('inf') in the other thread and was thinking about it
>> since. I am now toying with the idea to unify float and decimal in Python.
> Are you sure there would be any point in this? People who
> specifically *want* base-2 floats are probably quite happy
> with the current float type, and wouldn't appreciate having
> it slowed down, even by a small amount.

I would gladly give up a small amount of speed for better control
over floats, such as whether 1/0.0 raised an exception or
returned infinity.

If I wanted fast code, I'd be using C. I'm happy with *fast enough*.

For example, 1/0.0 in a continued fraction is generally harmless,
provided it returns infinity. If it raises an exception, you have
to write slow, ugly code to evaluate continued fractions robustly.
I wouldn't expect 1/0.0 -> infinity to become the default, but
I'd like a runtime switch to turn it on and off as needed.
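
To make the continued fraction case concrete, here is a sketch of the
workaround needed today. The helper names are made up, and evaluating
from the last term backwards is just one simple way to do it:

```python
import math

def reciprocal(x):
    """1/x, returning a signed infinity instead of raising for x == 0."""
    try:
        return 1.0 / x
    except ZeroDivisionError:
        return math.copysign(math.inf, x)

def evaluate_continued_fraction(terms):
    """Evaluate a0 + 1/(a1 + 1/(a2 + ...)) from the last term backwards."""
    value = float(terms[-1])
    for term in reversed(terms[:-1]):
        value = term + reciprocal(value)
    return value

# 1 + 1/(-1) is zero, so the next step divides by zero; the infinity
# that results is harmless because 3 + 1/inf is simply 3.
result = evaluate_continued_fraction([3, 2, 1, -1])
print(result)
```

With a runtime switch the try/except wrapper around every division
would not be needed.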


From alexander.belopolsky at  Thu Oct 11 14:05:41 2012
From: alexander.belopolsky at (Alexander Belopolsky)
Date: Thu, 11 Oct 2012 08:05:41 -0400
Subject: [Python-ideas] Floating point contexts in Python core
In-Reply-To: <k560uo$tus$>
References: <>
Message-ID: <>

On Oct 11, 2012, at 4:45 AM, Serhiy Storchaka <storchaka at> wrote:

> With base 16 floats you can't emulate x86 native 53-bit mantissa floats.

I realized that as soon as I hit send. :-( I also realized that it does not matter for python implementation because decimal stores mantissa as an int rather than a list of digits. 

From sturla at  Thu Oct 11 14:31:31 2012
From: sturla at (Sturla Molden)
Date: Thu, 11 Oct 2012 14:31:31 +0200
Subject: [Python-ideas] Floating point contexts in Python core
In-Reply-To: <>
References: <>
Message-ID: <>

On 11.10.2012 13:35, Steven D'Aprano wrote:

> I would gladly give up a small amount of speed for better control
> over floats, such as whether 1/0.0 raised an exception or
> returned infinity.
> If I wanted fast code, I'd be using C. I'm happy with *fast enough*.
> For example, 1/0.0 in a continued fraction is generally harmless,
> provided it returns infinity. If it raises an exception, you have
> to write slow, ugly code to evaluate continued fractions robustly.
> I wouldn't expect 1/0.0 -> infinity to becomes the default, but
> I'd like a runtime switch to turn it on and off as needed.

For those who use Python for numerical or scientific computing or 
computer graphics this is a real issue.

First: The standard way of dealing with 1/0.0 in this context, since the 
days of FORTRAN, is to return an inf. Consequently, that is what NumPy 
does, as does Matlab, R and most C programs and libraries.

Now compare:

 >>> 1.0/0.0

Traceback (most recent call last):
   File "<pyshell#0>", line 1, in <module>
ZeroDivisionError: float division by zero

With this:

 >>> import numpy as np
 >>> np.float64(1.0)/np.float64(0.0)
inf

Thus, the NumPy float64 scalar behaves differently from the Python float 

In less than trivial expressions, we can have a combination of Python 
floats and ints and NumPy types (arrays or scalars). What this means is 
that the behavior is undefined. You might get an inf, or you might get 
an exception. Who can tell?

The issue also affects integers:

 >>> 1/0

Traceback (most recent call last):
   File "<pyshell#5>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero


 >>> np.int64(1)/np.int64(0)
0

 >>> np.int32(1)/np.int32(0)
0

And with arrays:

 >>> np.ones(10,,
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

 >>> np.ones(10, dtype=np.float64)/np.zeros(10, dtype=np.float64)
array([ inf,  inf,  inf,  inf,  inf,  inf,  inf,  inf,  inf,  inf])

I think for the sake of us who actually need computation -- believe it 
or not, Python is rapidly becoming the language of choice for numerical 
computing -- it would be very nice if this were controllable. Not just 
the behavior of floats, but also the behavior of ints.

A global switch in the sys module would make life a lot easier. Even 
better would be a context manager that allows us to set up a "numerical" 
context for local expressions using a with statement. That would not 
have a lasting effect, but just affect the context. Preferably it should 
not even propagate across function calls. Something like this:

def foobar():
     1/0.0 # raise an exception
     1/0   # raise an exception
with sys.numerical:
     1/0.0 # return inf
     1/0   # return 0

(NumPy actually prints divide by zero warnings on their first 
occurrence, but I removed it for clarity.)

Sturla Molden

From rosuav at  Thu Oct 11 15:18:20 2012
From: rosuav at (Chris Angelico)
Date: Fri, 12 Oct 2012 00:18:20 +1100
Subject: [Python-ideas] Floating point contexts in Python core
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Thu, Oct 11, 2012 at 11:31 PM, Sturla Molden <sturla at> wrote:
> A global switch in the sys module would make life a lot easier. Even better
> would be a context manager that allows us to set up a "numerical" context
> for local expressions using a with statement. That would not have a lasting
> effect, but just affect the context. Preferably it should not even propagate
> across function calls. Something like this:
> def foobar():
>     1/0.0 # raise an exception
>     1/0   # raise an exception
> with sys.numerical:
>     1/0.0 # return inf
>     1/0   # return 0
>     foobar()

Not propagating across function calls strikes me as messy, but I see
why you'd want it. Would this be better as a __future__ directive?
There's already the concept that they apply to a module but not to
what that module calls.


From oscar.j.benjamin at  Thu Oct 11 15:36:11 2012
From: oscar.j.benjamin at (Oscar Benjamin)
Date: Thu, 11 Oct 2012 14:36:11 +0100
Subject: [Python-ideas] Floating point contexts in Python core
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On 11 October 2012 14:18, Chris Angelico <rosuav at> wrote:
> On Thu, Oct 11, 2012 at 11:31 PM, Sturla Molden <sturla at> wrote:
>> A global switch in the sys module would make life a lot easier. Even better
>> would be a context manager that allows us to set up a "numerical" context
>> for local expressions using a with statement. That would not have a lasting
>> effect, but just affect the context. Preferably it should not even propagate
>> across function calls. Something like this:
>> def foobar():
>>     1/0.0 # raise an exception
>>     1/0   # raise an exception
>> with sys.numerical:
>>     1/0.0 # return inf
>>     1/0   # return 0
>>     foobar()
> Not propagating across function calls strikes me as messy, but I see
> why you'd want it. Would this be better as a __future__ directive?
> There's already the concept that they apply to a module but not to
> what that module calls.

__future__ directives are for situations in which the default
behaviour will be changed in the future but you want to get the new
behaviour now. The proposal is to always have widely supported,
convenient ways to switch between different handling modes for
numerical operations. The default Python behaviour would be unchanged
by this.


From rosuav at  Thu Oct 11 16:11:45 2012
From: rosuav at (Chris Angelico)
Date: Fri, 12 Oct 2012 01:11:45 +1100
Subject: [Python-ideas] Floating point contexts in Python core
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Fri, Oct 12, 2012 at 12:36 AM, Oscar Benjamin
<oscar.j.benjamin at> wrote:
> On 11 October 2012 14:18, Chris Angelico <rosuav at> wrote:
>> Not propagating across function calls strikes me as messy, but I see
>> why you'd want it. Would this be better as a __future__ directive?
>> There's already the concept that they apply to a module but not to
>> what that module calls.
> __future__ directives are for situations in which the default
> behaviour will be changed in the future but you want to get the new
> behaviour now. The proposal is to always have widely supported,
> convenient ways to switch between different handling modes for
> numerical operations. The default Python behaviour would be unchanged
> by this.

Sure, it's not perfect for __future__ either, but it does seem odd for
a function invocation to suddenly change semantics. This change
"feels" to me more like a try/catch block - it's a change to this code
that causes different behaviour around error conditions. That ought to
continue into a called function.


From solipsis at  Thu Oct 11 16:40:43 2012
From: solipsis at (Antoine Pitrou)
Date: Thu, 11 Oct 2012 16:40:43 +0200
Subject: [Python-ideas] asyncore: included batteries don't fit
References: <>
	<> <>
Message-ID: <>

On Wed, 10 Oct 2012 20:55:23 -0400
Trent Nelson <trent at> wrote:
>     You could leverage this with kqueue and epoll; have similar threads
>     set up to simply process I/O independent of the GIL, using the same
>     facilities that would be used by IOCP-processing threads.

Would you really win anything by doing I/O in separate threads, while
doing normal request processing in the main thread?

That said, the idea of a common API architected around async I/O,
rather than non-blocking I/O, sounds interesting at least theoretically.
Maybe all those outdated Snakebite Operating Systems are useful for
something after all. ;-P



Software development and contracting:

From guido at  Thu Oct 11 16:54:35 2012
From: guido at (Guido van Rossum)
Date: Thu, 11 Oct 2012 07:54:35 -0700
Subject: [Python-ideas] Floating point contexts in Python core
In-Reply-To: <>
References: <>
Message-ID: <>

I think you're mistaking my suggestion. I meant to recommend that
there should be a way to control the behavior (e.g. whether to
silently return NaN/Inf or raise an exception) of floating point
operations, using the capabilities of the hardware as exposed through
C, using Python's existing float type. I did not for a second consider
reimplementing IEEE 754 from scratch. Therein lies insanity.

That's also why I recommended you look at the fpectl module.

--Guido van Rossum

From storchaka at  Thu Oct 11 16:55:48 2012
From: storchaka at (Serhiy Storchaka)
Date: Thu, 11 Oct 2012 17:55:48 +0300
Subject: [Python-ideas] Floating point contexts in Python core
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <k56mlm$6c9$>

On 11.10.12 15:31, Sturla Molden wrote:
>  >>> np.int64(1)/np.int64(0)
> 0
>  >>> np.int32(1)/np.int32(0)
> 0

There must be some rationale for such behavior.

From oscar.j.benjamin at  Thu Oct 11 17:52:43 2012
From: oscar.j.benjamin at (Oscar Benjamin)
Date: Thu, 11 Oct 2012 16:52:43 +0100
Subject: [Python-ideas] Floating point contexts in Python core
In-Reply-To: <k56mlm$6c9$>
References: <>
	<> <>
Message-ID: <>

On 11 October 2012 15:55, Serhiy Storchaka <storchaka at> wrote:
> On 11.10.12 15:31, Sturla Molden wrote:
>>  >>> np.int64(1)/np.int64(0)
>> 0
>>  >>> np.int32(1)/np.int32(0)
>> 0
> For such behavior must be some rationale.

I don't know what the rationale for that is but it is at least
controllable in numpy:

>>> import numpy as np
>>> np.seterr(all='raise')  # Exceptions instead of mostly useless values
{'over': 'raise', 'divide': 'raise', 'invalid': 'raise', 'under': 'raise'}
>>> np.int32(1) / np.int32(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FloatingPointError: divide by zero encountered in long_scalars
>>> np.float32(1e20) * np.float32(1e20)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FloatingPointError: overflow encountered in float_scalars
>>> np.float32('inf')
inf
>>> np.float32('inf') / np.float32('inf')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FloatingPointError: invalid value encountered in float_scalars


From stephen at  Thu Oct 11 18:05:33 2012
From: stephen at (Stephen J. Turnbull)
Date: Fri, 12 Oct 2012 01:05:33 +0900
Subject: [Python-ideas] Floating point contexts in Python core
In-Reply-To: <>
References: <>
Message-ID: <>

Steven D'Aprano writes:

 > I would gladly give up a small amount of speed for better control
 > over floats, such as whether 1/0.0 raised an exception or
 > returned infinity.

Isn't that what the fpectl module is supposed to buy, albeit much less
pleasantly than Decimal contexts do?

From oscar.j.benjamin at  Thu Oct 11 19:17:33 2012
From: oscar.j.benjamin at (Oscar Benjamin)
Date: Thu, 11 Oct 2012 18:17:33 +0100
Subject: [Python-ideas] Floating point contexts in Python core
In-Reply-To: <>
References: <>
Message-ID: <>

On 11 October 2012 15:54, Guido van Rossum <guido at> wrote:
> I think you're mistaking  my suggestion. I meant to recommend that
> there should be a way to control the behavior (e.g. whether to
> silently return Nan/Inf or raise an exception) of floating point
> operations, using the capabilities of the hardware as exposed through
> C, using Python's existing float type. I did not for a second consider
> reimplementing IEEE 754 from scratch. Therein lies insanity.
> That's also why I recommended you look at the fpectl module.

I would like to have precisely the functionality you are suggesting
and I don't want to reimplement anything (I assume this message is
intended for me since it was addressed to me).

I don't know enough about the implementation details to agree on the
hardware capabilities part. From a quick glance at the fpectl module I
see that it has problems with portability:

   Setting up a given processor to trap IEEE-754 floating point errors
   currently requires custom code on a per-architecture basis.
  You may have to modify fpectl to control your particular hardware.

This presumably explains why I don't have the module in my Windows
build or on the Linux machines in the HPC cluster I use. Are these
problems that can be overcome? If it is necessary to have this
hardware-specific accelerator for floating point exceptions then is it
reasonable to expect implementations other than CPython to be able to
match the semantics of floating point contexts without a significant
degradation in performance?

I was expecting the implementation to be some checks in
straightforward C code for invalid values. I would expect this to cause a
small degradation in performance (the kind that you wouldn't notice
unless you went out of your way to measure it). Python already does
this by checking for a zero value on every division. As far as I can
tell from the numpy codebase this is how it works there.

This function seems to be responsible for the integer division by zero
result in numpy:

>>> import numpy as np
>>> np.seterr()
{'over': 'warn', 'divide': 'warn', 'invalid': 'warn', 'under': 'ignore'}
>>> np.int32(1) / np.int32(0)
__main__:1: RuntimeWarning: divide by zero encountered in long_scalars
0
>>> np.seterr(divide='ignore')
{'over': 'warn', 'divide': 'warn', 'invalid': 'warn', 'under': 'ignore'}
>>> np.int32(1) / np.int32(0)
0
>>> np.seterr(divide='raise')
{'over': 'warn', 'divide': 'ignore', 'invalid': 'warn', 'under': 'ignore'}
>>> np.int32(1) / np.int32(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FloatingPointError: divide by zero encountered in long_scalars

This works perfectly well in numpy and also in decimal. I see no reason
why it couldn't work for float/int. But what would be even
better is if you could control all of them with a single context
manager. Typically I don't care whether the error occurred as a result of
operations on ints/floats/ndarrays/decimals I just know that I got a
NaN from somewhere and I need to debug it.
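
For reference, this is roughly how the decimal side of that looks
today, using only the standard library. The reciprocal helper is just
for illustration:

```python
from decimal import Decimal, DivisionByZero, localcontext

def reciprocal(value):
    # Uses whatever decimal context is active when it is called.
    return Decimal(1) / Decimal(value)

with localcontext() as ctx:
    ctx.traps[DivisionByZero] = False    # return Infinity silently
    untrapped = reciprocal(0)

with localcontext() as ctx:
    ctx.traps[DivisionByZero] = True     # raise instead
    try:
        reciprocal(0)
        trapped = False
    except DivisionByZero:
        trapped = True

print(untrapped, trapped)
```

Note that the decimal context does carry over into called functions,
and that decimal.DivisionByZero is a subclass of ZeroDivisionError.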


From matt at  Thu Oct 11 19:33:38 2012
From: matt at (Matt Chaput)
Date: Thu, 11 Oct 2012 13:33:38 -0400
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

 > - `p[q]` joins path q to path p


 > - `p + q` joins path q to path p


 > - `p / q` joins path q to path p


 > - `p.join(q)` joins path q to path p


I think .join() should be the "obvious" way to do it and + should be a 


From oscar.j.benjamin at  Thu Oct 11 20:42:55 2012
From: oscar.j.benjamin at (Oscar Benjamin)
Date: Thu, 11 Oct 2012 19:42:55 +0100
Subject: [Python-ideas] Floating point contexts in Python core
In-Reply-To: <>
References: <>
Message-ID: <>

On 11 October 2012 17:05, Stephen J. Turnbull <stephen at> wrote:
> Steven D'Aprano writes:
>  > I would gladly give up a small amount of speed for better control
>  > over floats, such as whether 1/0.0 raised an exception or
>  > returned infinity.
> Isn't that what the fpectl module is supposed to buy, albeit much less
> pleasantly than Decimal contexts do?

But the fpectl module IIUC wouldn't work for 1 / 0. Since Python has
managed to unify integer/float division now it would be a shame to
introduce any new reasons to bring in superfluous .0s again:

with context(zero_division='infinity'):
    x = 1 / 0.0  # float('inf')
    y = 1 / 0  # I'd like to see float('inf') here as well

I've spent 4 hours this week in computer labs with students using
Python 2.7 as an introduction to scientific programming. A significant
portion of that time was spent explaining the int/float division
problem. They all get the issue now but not all of them understand
that it is specifically about division: many are putting .0s
everywhere. I expect it to be easier when we use Python 3 and I can
simply explain that there are two types of division with two different
operators.


From guido at  Thu Oct 11 20:45:02 2012
From: guido at (Guido van Rossum)
Date: Thu, 11 Oct 2012 11:45:02 -0700
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 9, 2012 at 5:44 PM, Greg Ewing <greg.ewing at> wrote:
> Guido van Rossum wrote:
>> Indeed, in NDB this works great. However tracebacks don't work so
>> great: If you don't catch the exception right away, it takes work to
>> make the tracebacks look right when you catch it a few generator calls
>> down on the (conceptual) stack. I fixed this to some extent in NDB, by
>> passing the traceback explicitly along when setting an exception on a
>> Future;
> Was this before or after the recent change that was supposed
> to improve tracebacks from yield-from chains? If there's still
> a problem after that, maybe exception handling in yield-from
> requires some more work.

Sadly it was with Python 2.5/2.7...

>> But so far when thinking about this
>> recently I have found the goal elusive --
>> Perhaps you can clear things up by
>> showing some detailed (but still simple enough) example code to handle
>> e.g. a simple web client?
> You might like to take a look at this, where I develop a series of
> examples culminating in a simple multi-threaded server:

Definitely very enlightening. Though I think you should not use
'thread' since that term is already reserved for OS threads as
supported by the threading module. In NDB I chose to use 'tasklet' --
while that also has other meanings, its meaning isn't fixed in core
Python. You could also use task, which also doesn't have a core Python
meaning. Just don't call it "process", never mind that Erlang uses
this (a number of other languages rooted in old traditions do too, I

Also I think you can now revisit it and rewrite the code to use Python 3.3.

> Code here:

It does bother me somehow that you're not using .send() and yield
arguments at all. I notice that you have a lot of three-line code
blocks like this:

      data = sock.recv(1024)

The general form seems to be:

      arrange for a callback when some operation can be done without blocking
      do the operation

This seems to be begging to be collapsed into a single line, e.g.

      data = yield sock.recv_async(1024)

(I would also prefer to see the socket wrapped in an object that makes
it hard to accidentally block.)
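
A minimal sketch of how such a one-line form could be driven, assuming
Python 3.3+. The AsyncSocket wrapper, the recv_async() method and the
run() driver are all hypothetical names made up for illustration (they
are not from Greg's code or from any library); only select.select() and
socket.socketpair() are real standard-library calls.

```python
import select
import socket

class AsyncSocket:
    """Hypothetical wrapper whose methods describe I/O instead of doing it."""

    def __init__(self, sock):
        sock.setblocking(False)  # an accidental direct recv() can no longer block
        self._sock = sock

    def recv_async(self, nbytes):
        # Return a request for the driver; nothing is read here.
        return (self._sock, nbytes)

def run(task):
    """Drive a single generator: wait for readability, recv, send the data back."""
    try:
        sock, nbytes = next(task)
        while True:
            select.select([sock], [], [])   # wait until recv() will not block
            sock, nbytes = task.send(sock.recv(nbytes))
    except StopIteration as stop:
        return stop.value

def read_once(asock):
    data = yield asock.recv_async(1024)
    return data

def demo(payload):
    """Send payload through a socket pair and read it back via the generator."""
    left, right = socket.socketpair()
    try:
        right.sendall(payload)
        return run(read_once(AsyncSocket(left)))
    finally:
        left.close()
        right.close()
```

A real scheduler would multiplex many tasks over one select() call; this
driver handles exactly one task so the yield/send round trip stays visible.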

>> somehow it seems there *has*
>> to be a distinction between an operation you just *yield* (this would
>> be waiting for a specific low-level I/O operation) and something you
>> use with yield-from, which returns a value through StopIteration.
> It may be worth noting that nothing in my server example uses 'yield'
> to send or receive values -- yield is only used without argument as
> a suspension point. But the functions containing the yields *are*
> called with yield-from and may return values via StopIteration.

Yeah, but see my remark above...

> So I think there are (at least) two distinct ways of using generators,
> but the distinction isn't quite the one you're making. Rather, we
> have "coroutines" (don't yield values, do return values) and
> "iterators" (do yield values, don't return values).

But surely there's still a place for send() and other PEP 342 features?

> Moreover, it's *only* the "coroutine" variety that we need to cater
> for when designing an async event system. Does that help to
> alleviate any of your monad-induced headaches?

Not entirely, no. I now have a fair amount of experience writing an async
system and helping users make sense of its error messages, and there
are some practical considerations. E.g. my users sometimes want to
treat something as a coroutine but they don't have any yields in it
(perhaps they are writing skeleton code and plan to fill in the I/O
later). Example:

def caller():
    data = yield from reader()

def reader():
    return 'dummy'

works, but if you drop the yield it doesn't work. With a decorator I
know how to make it work either way.
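
One way such a decorator could look, as a sketch only and assuming
Python 3.3+. The name coroutine and the run() helper are invented for
this illustration and are not NDB's actual API. The trick is an
unreachable yield that turns the wrapper into a generator function, so
"yield from" accepts it even when the wrapped body has no yields.

```python
import functools
import inspect

def coroutine(func):
    """Make func usable with 'yield from' whether or not it contains a yield."""
    if inspect.isgeneratorfunction(func):
        return func  # already a generator function, nothing to do

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
        yield  # never reached; its presence makes wrapper a generator function
    return wrapper

@coroutine
def reader():
    return 'dummy'  # skeleton code with no I/O yet

def caller():
    data = yield from reader()
    return data

def run(gen):
    """Step a generator to completion and hand back its return value."""
    try:
        while True:
            next(gen)
    except StopIteration as stop:
        return stop.value

result = run(caller())
```

Here result ends up as 'dummy', and reader() can later grow real yields
without any change to caller().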

--Guido van Rossum (

From guido at  Thu Oct 11 20:46:34 2012
From: guido at (Guido van Rossum)
Date: Thu, 11 Oct 2012 11:46:34 -0700
Subject: [Python-ideas] Floating point contexts in Python core
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Oct 11, 2012 at 11:42 AM, Oscar Benjamin
<oscar.j.benjamin at> wrote:
> On 11 October 2012 17:05, Stephen J. Turnbull <stephen at> wrote:
>> Steven D'Aprano writes:
>>  > I would gladly give up a small amount of speed for better control
>>  > over floats, such as whether 1/0.0 raised an exception or
>>  > returned infinity.
>> Isn't that what the fpectl module is supposed to buy, albeit much less
>> pleasantly than Decimal contexts do?
> But the fpectl module IIUC wouldn't work for 1 / 0. Since Python has
> managed to unify integer/float division now it would be a shame to
> introduce any new reasons to bring in superfluous .0s again:
> with context(zero_division='infinity'):
>     x = 1 / 0.0  # float('inf')
>     y = 1 / 0  # I'd like to see float('inf') here as well
> I've spent 4 hours this week in computer labs with students using
> Python 2.7 as an introduction to scientific programming. A significant
> portion of that time was spent explaining the int/float division
> problem. They all get the issue now but not all of them understand
> that it is specifically about division: many are putting .0s
> everywhere. I expect it to be easier when we use Python 3 and I can
> simply explain that there are two types of division with two different
> operators.

You could have just told them to "from __future__ import division"
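
For readers following along, this is the Python 3 behaviour that the
__future__ import switches on in 2.7, shown as a small sketch that
assumes a Python 3 interpreter (the variable names are arbitrary):

```python
# True division always produces a float, even for two ints.
true_quotient = 7 / 2        # 3.5

# Floor division is spelled with its own operator.
floor_quotient = 7 // 2      # 3

# Floor division rounds toward negative infinity, not toward zero.
negative_floor = -7 // 2     # -4
```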

--Guido van Rossum (

From oscar.j.benjamin at  Thu Oct 11 21:12:37 2012
From: oscar.j.benjamin at (Oscar Benjamin)
Date: Thu, 11 Oct 2012 20:12:37 +0100
Subject: [Python-ideas] Floating point contexts in Python core
In-Reply-To: <>
References: <>
Message-ID: <>

On 11 October 2012 19:46, Guido van Rossum <guido at> wrote:
> On Thu, Oct 11, 2012 at 11:42 AM, Oscar Benjamin
> <oscar.j.benjamin at> wrote:
>> I've spent 4 hours this week in computer labs with students using
>> Python 2.7 as an introduction to scientific programming. A significant
>> portion of that time was spent explaining the int/float division
>> problem. They all get the issue now but not all of them understand
>> that it is specifically about division: many are putting .0s
>> everywhere. I expect it to be easier when we use Python 3 and I can
>> simply explain that there are two types of division with two different
>> operators.
> You could have just told them to "from __future__ import division"

I know but the reason for choosing Python is the low barrier to
getting started with procedural programming. When they're having
trouble understanding the difference between the Python shell and the
OS shell I'd like to avoid introducing the concept that the
interpreter can change its calculation modes dynamically and forget
those changes when you restart it. It's also unfortunate for the
students to know that some of the things they're seeing on day one
will change in the next version (you can't just tell people to import
things from the "future" without some kind of explanation).

I used the opportunity to think a little bit about types by running
type(x) and explain that different types of objects behave
differently. I would rather explain that using genuinely incompatible
types like strings and numbers than ints and floats though.


From tjreedy at  Thu Oct 11 22:44:46 2012
From: tjreedy at (Terry Reedy)
Date: Thu, 11 Oct 2012 16:44:46 -0400
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <k57b4j$7ud$>

On 10/11/2012 2:45 PM, Guido van Rossum wrote:
> On Tue, Oct 9, 2012 at 5:44 PM, Greg Ewing <greg.ewing at> wrote:

>> You might like to take a look at this, where I develop a series of
>> examples culminating in a simple multi-threaded server:
> Definitely very enlightening. Though I think you should not use
> 'thread' since that term is already reserved for OS threads as
> supported by the threading module. In NDB I chose to use 'tasklet' --

I read through this also and agree that using 'thread' for 'task', 
'tasklet', 'microthread', or whatever is distracting. Part of the point, 
to me, is that the code does *not* use (OS) threads and the thread module.

Tim Peters intended iterators, including generators, to be an 
alternative to what he viewed as 'inside-out' callback code. The idea 
was that pausing where appropriate allowed code that belongs together to 
be kept together. I find generator-based event loops to be somewhat 
easier to understand than callback-based loops. I certainly was more 
comfortable with Greg's example than what I have read about twisted. So 
I would like to see a generator-based system in the stdlib.

Terry Jan Reedy

From guido at  Thu Oct 11 23:18:50 2012
From: guido at (Guido van Rossum)
Date: Thu, 11 Oct 2012 14:18:50 -0700
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 9, 2012 at 11:00 AM, Laurens Van Houtven <_ at> wrote:
> Oh my me. This is a very long thread that I probably should have replied to
> a long time ago. This thread is intensely long right now, and tonight is the
> first chance I've had to try and go through it comprehensively. I'll try to
> reply to individual points made in the thread -- if I missed yours, please
> don't be offended, I promise it's my fault :)

No problem, I'm running behind myself...

> FYI, I'm the sucker who originally got tricked into starting PEP 3153, aka
> async-pep.

I suppose that's your pet name for it. :-) For most everyone else it's PEP 3153.

> First of all, I'm glad to see that there's some more "let's get that pep
> along" movement. I tabled it because:
> a) I didn't have enough time to contribute,
> b) a lot of promised contributions ended up not happening when it came down
> to it, which was incredibly demotivating. The combination of this thread,
> plus the fact that I was strong armed at Pycon ZA by a bunch of community
> members that shall not be named (Alex, Armin, Maciej, Larry ;-)) into
> exploring this thing again.
> First of all, I don't feel async-pep is an attempt at twisted light in the
> stdlib. Other than separation of transport and protocol, there's not really
> much there that even smells of twisted (especially since right now I'd
> probably throw consumers/producers out) -- and that separation is simply
> good practice. Twisted does the same thing, but it didn't invent it.
> Furthermore, the advantages seem clear: reusability and testability are more
> than enough for me.
> If there's one take away idea from async-pep, it's reusable protocols.

Is there a newer version than what's on ? It seems to be missing any
specific proposals, after spending a lot of time giving a rationale
and defining some terms. The version on doesn't seem to be any more complete.

> The PEP should probably be a number of PEPs. At first sight, it seems that
> this number is at least four:
> 1. Protocol and transport abstractions, making no mention of asynchronous IO
> (this is what I want 3153 to be, because it's small, manageable, and
> virtually everyone appears to agree it's a fantastic idea)

But the devil is in the details. *What* specifically are you
proposing? How would you write a protocol handler/parser without any
reference to I/O? Most protocols are two-way streets -- you read some
stuff, and you write some stuff, then you read some more. (HTTP may be
the exception here, if you don't keep the connection open.)

> 2. A base reactor interface

I agree that this should be a separate PEP. But I do think that in
practice there will be dependencies between the different PEPs you are
proposing.

> 3. A way of structuring callbacks: probably deferreds with a built-in
> inlineCallbacks for people who want to write synchronous-looking code with
> explicit yields for asynchronous procedures

Your previous two ideas sound like you're not tied to backward
compatibility with Tornado and/or Twisted (not even via an adaptation
layer). Given that we're talking Python 3.4 here that's fine with me
(though I think we should be careful to offer a path forward for those
packages and their users, even if it means making changes to the
libraries). But Twisted Deferred is pretty arcane, and I would much
rather not use it as the basis of a forward-looking design. I'd much
rather see what we can mooch off PEP 3148 (Futures).

> 4+ adapting the stdlib tools to using these new things

We at least need to have an idea for how this could be done. We're
talking serious rewrites of many of our most fundamental existing
synchronous protocol libraries (e.g. httplib, email, possibly even
io.TextIOWrapper), most of which have had only scant updates even
through the Python 3 transition apart from complications to deal with
the bytes/str dichotomy.

> Re: forward path for existing asyncore code. I don't remember this being
> raised as an issue. If anything, it was mentioned in passing, and I think
> the answer to it was something to the tune of "asyncore's API is broken,
> fixing it is more important than backwards compat". Essentially I agree with
> Guido that the important part is an upgrade path to a good third-party
> library, which is the part about asyncore that REALLY sucks right now.

I have the feeling that the main reason asyncore sucks is that it
requires you to subclass its Dispatcher class, which has a rather
treacherous interface.

> Regardless, an API upgrade is probably a good idea. I'm not sure if it
> should go in the first PEP: given the separation I've outlined above (which
> may be too spread out...), there's no obvious place to put it besides it
> being a new PEP.

Aren't all your proposals API upgrades?

> Re base reactor interface: drawing maximally from the lessons learned in
> twisted, I think IReactorCore (start, stop, etc), IReactorTime (call later,
> etc), asynchronous-looking name lookup, fd handling are the important parts.

That actually sounds more concrete than I'd like a reactor interface
to be. In the App Engine world, there is a definite need for a
reactor, but it cannot talk about file descriptors at all -- all I/O
is defined in terms of RPC operations which have their own (several
layers of) async management but still need to be plugged in to user
code that might want to benefit from other reactor functionality such
as scheduling and placing a call at a certain moment in the future.

> call_every can be implemented in terms of call_later on a separate object,
> so I think it should be (eg twisted.internet.task.LoopingCall). One thing
> that is apparently forgotten about is event loop integration. The prime way
> of having two event loops cooperate is *NOT* "run both in parallel", it's
> "have one call the other". Even though not all loops support this, I think
> it's important to get this as part of the interface (raise an exception for
> all I care if it doesn't work).

This is definitely one of the things we ought to get right. My own
thoughts are slightly (perhaps only cosmetically) different again:
ideally each event loop would have a primitive operation to tell it to
run for a little while, and then some other code could tie several
event loops together.
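
A rough sketch of that shape, under heavy assumptions: MiniLoop,
call_later(), run_once() and run_together() are names invented here, the
loops only know about timers (no I/O, no GUI events), and run_once()
calls the callbacks itself rather than returning events.

```python
import heapq
import time

class MiniLoop:
    """Toy timer-only loop exposing a 'run for a little while' primitive."""

    def __init__(self):
        self._timers = []   # heap of (due_time, sequence, callback)
        self._seq = 0       # tie-breaker so callbacks are never compared

    def call_later(self, delay, callback):
        heapq.heappush(self._timers,
                       (time.monotonic() + delay, self._seq, callback))
        self._seq += 1

    def run_once(self):
        """Run callbacks that are due; return seconds until more work, or None."""
        now = time.monotonic()
        while self._timers and self._timers[0][0] <= now:
            _, _, callback = heapq.heappop(self._timers)
            callback()
        if not self._timers:
            return None
        return max(0.0, self._timers[0][0] - time.monotonic())

def run_together(loops):
    """Outer code tying several loops together via their run_once() primitive."""
    while True:
        waits = [w for w in (loop.run_once() for loop in loops) if w is not None]
        if not waits:
            return                  # every loop is idle
        time.sleep(min(waits))      # sleep until the earliest loop needs attention

events = []
gui, net = MiniLoop(), MiniLoop()
gui.call_later(0.0, lambda: events.append('gui tick'))
net.call_later(0.01, lambda: events.append('net data'))
gui.call_later(0.02, lambda: events.append('gui tick 2'))
run_together([gui, net])
```

The value returned by run_once() plays the role of the "lower bound for
when you might have more work to do"; a loop that also waits on I/O
would need a way to be woken early, which this sketch does not attempt.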

Possibly the primitive operation would be something like "block until
either you've got one event ready, or until a certain time (possibly
0) has passed without any events, and then give us the events that are
ready and a lower bound for when you might have more work to do" -- or
maybe instead of returning the event(s) it could just call the
associated callback (it might have to if it is part of a GUI library
that has callbacks written in C/C++ for certain events like screen

Anyway, it would be good to have input from representatives from Wx,
Qt, Twisted and Tornado to ensure that the *functionality* required is
all there (never mind the exact signatures of the APIs needed to
provide all that functionality).

--Guido van Rossum (

From guido at  Fri Oct 12 00:28:18 2012
From: guido at (Guido van Rossum)
Date: Thu, 11 Oct 2012 15:28:18 -0700
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 8, 2012 at 10:12 PM, Ben Darnell <ben at> wrote:
> On Mon, Oct 8, 2012 at 8:30 AM, Guido van Rossum <guido at> wrote:
>>> It's a Future constructor, a (conditional) add_done_callback, plus the
>>> calls to set_result or set_exception and the with statement for error
>>> handling.  In full:
>>> def future_wrap(f):
>>>     @functools.wraps(f)
>>>     def wrapper(*args, **kwargs):
>>>         future = Future()
>>>         if kwargs.get('callback') is not None:
>>>             future.add_done_callback(kwargs.pop('callback'))
>>>         kwargs['callback'] = future.set_result
>>>         def handle_error(typ, value, tb):
>>>             future.set_exception(value)
>>>             return True
>>>         with ExceptionStackContext(handle_error):
>>>             f(*args, **kwargs)
>>>         return future
>>>     return wrapper
>> Hmm... I *think* it automatically adds a special keyword 'callback' to
>> the *call* site so that you can do things like
>>   fut = some_wrapped_func(blah, callback=my_callback)
>> and then instead of using yield to wait for the callback, put the
>> continuation of your code in the my_callback() function.
> Yes.  Note that if you're passing in a callback you're probably going
> to just ignore the return value.  The callback argument and the future
> return value are essentially two alternative interfaces; it probably
> doesn't make sense to use both at once (but as a library author it's
> useful to provide both).

Definitely sounds like something that could be simplified if you
didn't have backward compatibility baggage...
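
For anyone who wants to experiment with the wrapper quoted above outside
Tornado, here is a sketch that substitutes concurrent.futures.Future for
Tornado's Future and a plain try/except for ExceptionStackContext. That
substitution is an assumption of this example: it only catches
exceptions raised synchronously inside the wrapped call, not ones raised
later from callbacks, so it is not equivalent to the original.

```python
import functools
from concurrent.futures import Future

def future_wrap(f):
    """Give a callback-style function a Future-returning interface as well."""
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        future = Future()
        user_callback = kwargs.pop('callback', None)
        if user_callback is not None:
            # As in the quoted code, the caller's callback receives the Future.
            future.add_done_callback(user_callback)
        # The wrapped function reports its result through this inner callback.
        kwargs['callback'] = future.set_result
        try:
            f(*args, **kwargs)
        except Exception as exc:  # stand-in for ExceptionStackContext
            future.set_exception(exc)
        return future
    return wrapper

@future_wrap
def add(a, b, callback):
    callback(a + b)

@future_wrap
def broken(callback):
    raise ValueError('no result')

seen = []
add(1, 1, callback=seen.append)   # callback interface
total = add(2, 3).result()        # future interface
```

The two call styles at the bottom correspond to the "two alternative
interfaces" described in the quoted reply.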

>> But it also
>> seems like it passes callback=future.set_result as the callback to the
>> wrapped function, which looks to me like that function was apparently
>> written before Futures were widely used. This seems pretty impure to
>> me and I'd like to propose a "future" where such functions either be
>> given the Future where the result is expected, or (more commonly) the
>> function would create the Future itself.
> Yes, it's impure and based on pre-Future patterns.  The caller's
> callback argument and the inner function's callback are not really related
> any more (they were the same in pre-Future async code of course).
> They should probably have different names, although if the inner
> function's return value were passed via exception (StopIteration or
> return) the inner callback argument can just go away.
>> Unless I'm totally missing the programming model here.
>> PS. I'd like to learn more about ExceptionStackContext() -- I've
>> struggled somewhat with getting decent tracebacks in NDB.
> StackContext doesn't quite give you better tracebacks, although I
> think it could be adapted to do that.  ExceptionStackContext is
> essentially a try/except block that follows you around across
> asynchronous operations - on entry it sets a thread-local state, and
> all the tornado asynchronous functions know to save this state when
> they are passed a callback, and restore it when they execute it.  This
> has proven to be extremely helpful in ensuring that all exceptions get
> caught by something that knows how to do the appropriate cleanup (i.e.
> an asynchronous web page serves an error instead of just spinning
> forever), although it has turned out to be a little more intrusive and
> magical than I had originally anticipated.

Heh. I'll try to mine it for gems.

>>>>> In Tornado the Future is created by a decorator
>>>>> and hidden from the asynchronous function (it just sees the callback),
>>>> Hm, interesting. NDB goes the other way, the callbacks are mostly used
>>>> to make Futures work, and most code (including large swaths of
>>>> internal code) uses Futures. I think NDB is similar to monocle here.
>>>> In NDB, you can do
>>>>   f = <some function returning a Future>
>>>>   r = yield f
>>>> where "yield f" is mostly equivalent to f.result(), except it gives
>>>> better opportunity for concurrency.
>>> Yes, tornado's gen.engine does the same thing here.  However, the
>>> stakes are higher than "better opportunity for concurrency" - in an
>>> event loop if you call future.result() without yielding, you'll
>>> deadlock if that Future's task needs to run on the same event loop.
>> That would depend on the semantics of the event loop implementation.
>> In NDB's event loop, such a .result() call would just recursively
>> enter the event loop, and you'd only deadlock if you actually have two
>> pieces of code waiting for each other's completion.
> Hmm, I think I'd rather deadlock. :)  If the event loop is reentrant
> then the application code has to be coded defensively as if it were
> preemptively multithreaded, which introduces the possibility of
> deadlock or (probably) more subtle/less frequent errors.  Reentrancy
> has been a significant problem in my experience, so I've been moving
> towards a policy where methods in Tornado that take a callback never
> run it immediately; callbacks are always scheduled on the next
> iteration of the IOLoop with IOLoop.add_callback.

The latter is a good tactic and I'm also using it. (Except for some
reason we had to add the concept of "immediate callbacks" to our
Future class, and those are run inside the set_result() call. But most
callbacks don't use that feature.)
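
The "never run it immediately" policy can be illustrated with a toy
loop. TinyLoop is a made-up class for this sketch; only the method name
add_callback is borrowed from the IOLoop.add_callback mentioned above,
and the real IOLoop does far more.

```python
from collections import deque

class TinyLoop:
    """Toy loop: callbacks are queued and run on a later iteration."""

    def __init__(self):
        self._ready = deque()

    def add_callback(self, fn, *args):
        self._ready.append((fn, args))

    def run(self):
        while self._ready:
            # Only run what was queued before this iteration started;
            # anything added meanwhile waits for the next iteration.
            for _ in range(len(self._ready)):
                fn, args = self._ready.popleft()
                fn(*args)

loop = TinyLoop()
log = []

def fetch(callback):
    # Even though the result is already known, do not call callback here.
    loop.add_callback(callback, 'result')
    log.append('fetch returned')

fetch(log.append)
loop.run()
```

After the run, log is ['fetch returned', 'result']: the caller always
finishes its own statement before the callback fires, which is what
removes the reentrancy surprises.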

I don't have a choice about making the event loop reentrant -- App
Engine's underlying RPC multiplexing implementation *is* reentrant,
and there is a large set of "classic" APIs that I cannot stop the user
from calling that reenter it. But even if my hand wasn't forced, I'm
not sure if I would make your choice. In NDB, there is a full
complement of synchronous APIs that exactly matches the async APIs,
and users are free to use the synchronous APIs in parts of their code
where they don't need concurrency. Hence, every synchronous API just
calls its async sibling and immediately waits for its result, which
implicitly invokes the event loop.

Of course, I have it easy -- multiple incoming requests are dispatched
to separate threads by the App Engine runtime, so I don't have to
worry about multiplexing at that level at all -- just end user code
that is essentially single-threaded unless they go out of their way.

I did end up debugging one user's problem where they were making a
synchronous call inside an async handler, and -- very rarely! -- the
recursive event loop calls kept stacking up until they hit a
StackOverflowError. So I would agree that async code shouldn't make
synchronous API calls; but I haven't heard yet from anyone who was
otherwise hurt by the recursive event loop invocations -- in
particular, nobody has requested locks.

Still, this sounds like an important issue to revisit when discussing
a standard reactor API as part of Laurens's PEP offensive.

>> [...]
>>>> I am currently trying to understand if using "yield from" (and
>>>> returning a value from a generator) will simplify things. For example
>>>> maybe the need for a special decorator might go away. But I keep
>>>> getting headaches -- perhaps there's a Monad involved. :-)
>>> I think if you build generator handling directly into the event loop
>>> and use "yield from" for calls from one async function to another then
>>> you can get by without any decorators.  But I'm not sure if you can do
>>> that and maintain any compatibility with existing non-generator async
>>> code.
>>> I think the ability to return from a generator is actually a bigger
>>> deal than "yield from" (and I only learned about it from another
>>> python-ideas thread today).  The only reason a generator decorated
>>> with @tornado.gen.engine needs a callback passed in to it is to act as
>>> a pseudo-return, and a real return would prevent the common mistake of
>>> running the callback then falling through to the rest of the function.
>> Ah, so you didn't come up with the clever hack of raising an exception
>> to signify the return value. In NDB, you raise StopIteration (though
>> it is given the alias 'Return' for clarity) with an argument, and the
>> wrapper code that is responsible for the Future takes the value from
>> the StopIteration exception and passes it to the Future's
>> set_result().
> I think I may have thought about "raise Return(x)" and dismissed it as
> too weird.  But then, I'm abnormally comfortable with asynchronous
> code that passes callbacks around.

As I thought about the issue of how to spell "return a value" and
looked at various approaches, I decided I definitely didn't like what
monocle does: they let you say "yield X" where X is a non-Future
value; and I saw some other solution (Twisted? Phillip Eby?) that
simply called a function named something like returnValue(X). But I
also wanted it to look like a control statement that ends a block (so
auto-indenting editors would auto-dedent the next line), and that
means there are only four choices: continue, break, raise or return.
Three of those are useless... So the only choice really was which
exception to raise. Fortunately I had the advantage of knowing that
PEP 380 was going to implement "return X" from a generator as "raise
StopIteration(X)" so I decided to be compatible with that.
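
A small sketch of the PEP 380 behaviour referred to here, assuming
Python 3.3 or later; the function names are arbitrary. Note that since
PEP 479 (the default from Python 3.7), an explicit "raise StopIteration"
inside a generator body is converted to RuntimeError, so in current
Python only the "return X" spelling works inside the generator itself.

```python
def fetch():
    yield 'waiting'      # suspension point
    return 'payload'     # surfaces as StopIteration('payload') to the driver

def drive(gen):
    """Step a generator to completion and extract its return value."""
    try:
        while True:
            next(gen)
    except StopIteration as stop:
        return stop.value

def outer():
    # 'yield from' unpacks the StopIteration value automatically.
    value = yield from fetch()
    return value.upper()

direct = drive(fetch())      # 'payload'
delegated = drive(outer())   # 'PAYLOAD'
```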

>>> For concreteness, here's a crude sketch of what the APIs I'm talking
>>> about would look like in use (in a hypothetical future version of
>>> tornado).
>>> @future_wrap
>>> @gen.engine
>>> def async_http_client(url, callback):
>>>     parsed_url = urlparse.urlsplit(url)
>>>     # works the same whether the future comes from a thread pool or @future_wrap
>> And you need the thread pool because there's no async version of
>> getaddrinfo(), right?
> Right.
>>>     addrinfo = yield g_thread_pool.submit(socket.getaddrinfo, parsed_url.hostname, parsed_url.port)
>>>     stream = IOStream(socket.socket())
>>>     yield stream.connect((addrinfo[0][-1]))
>>>     stream.write('GET %s HTTP/1.0' % parsed_url.path)
>> Why no yield in front of the write() call?
> Because we don't need to wait for the write to complete before we
> continue to the next statement.  write() doesn't return anything; it
> just succeeds or fails, and if it fails the next read_until will fail
> too. (although in this case it wouldn't hurt to have the yield either)

I guess you have a certain kind of buffering built in to your stream?
So if you make two write() calls without waiting in quick succession,
does the system collapse these into one, or does it end up making two
system calls, or what? In NDB, there's a similar issue with multiple
RPCs that can be batched. I ended up writing an abstraction that
automatically combines these; the call isn't actually made until there
are no other runnable tasks. I've had to explain this a few times to
users who try to get away with overlapping CPU work and I/O, but
otherwise it's worked quite well.

>>>     header_data = yield stream.read_until('\r\n\r\n')
>>>     headers = parse_headers(header_data)
>>>     body_data = yield stream.read_bytes(int(headers['Content-Length']))
>>>     stream.close()
>>>     callback(body_data)
>>> # another function to demonstrate composability
>>> @future_wrap
>>> @gen.engine
>>> def fetch_some_urls(url1, url2, url3, callback):
>>>     body1 = yield async_http_client(url1)
>>>     # yield a list of futures for concurrency
>>>     future2 = yield async_http_client(url2)
>>>     future3 = yield async_http_client(url3)
>>>     body2, body3 = yield [future2, future3]
>>>     callback((body1, body2, body3))
>> This second one is nearly identical to the way it's done in NDB.
>> However I think you have a typo -- I doubt that there should be yields
>> on the lines creating future2 and future3.
> Right.
>>> One hole in this design is how to deal with callbacks that are run
>>> multiple times.  For example, the IOStream read methods take both a
>>> regular callback and an optional streaming_callback (which is called
>>> with each chunk of data as it arrives).  I think this needs to be
>>> modeled as something like an iterator of Futures, but I haven't worked
>>> out the details yet.
>> Ah. Yes, that's a completely different kind of thing, and probably
>> needs to be handled in a totally different way. I think it probably
>> needs to be modeled more like an infinite loop where at the blocking
>> point (e.g. a low-level read() or accept() call) you yield a Future.
>> Although I can see that this doesn't work well with the IOLoop's
>> concept of file descriptor (or other event source) registration.
> It works just fine at the IOLoop level:  you call
> IOLoop.add_handler(fd, func, READ), and you'll get read events
> whenever there's new data until you call remove_handler(fd) (or
> update_handler).  If you're passing callbacks around explicitly it's
> pretty straightforward (as much as anything ever is in that style) to
> allow for those callbacks to be run more than once.  The problem is
> that generators more or less require that each callback be run exactly
> once.  That's a generally desirable property, but the mismatch between
> the two layers can be difficult to deal with.

Okay, I see that these are useful. However they feel like two very
different classes of callbacks -- one that is called when a *specific*
piece of I/O that was previously requested is done; another that will
be called *whenever* a certain condition becomes true on a certain
channel. The former would correspond to e.g. completion of the headers
of an incoming HTTP request; the latter might correspond to a
"listening" socket receiving another connection.

--Guido van Rossum (

From jeanpierreda at  Fri Oct 12 00:42:55 2012
From: jeanpierreda at (Devin Jeanpierre)
Date: Thu, 11 Oct 2012 18:42:55 -0400
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Oct 11, 2012 at 5:18 PM, Guido van Rossum <guido at> wrote:
> On Tue, Oct 9, 2012 at 11:00 AM, Laurens Van Houtven <_ at> wrote:
>> Oh my me. This is a very long thread that I probably should have replied to
>> a long time ago. This thread is intensely long right now, and tonight is the
>> first chance I've had to try and go through it comprehensively. I'll try to
>> reply to individual points made in the thread -- if I missed yours, please
>> don't be offended, I promise it's my fault :)
> No problem, I'm running behind myself...
>> FYI, I'm the sucker who originally got tricked into starting PEP 3153, aka
>> async-pep.
> I suppose that's your pet name for it. :-) For most everyone else it's PEP 3153.
>> First of all, I'm glad to see that there's some more "let's get that pep
>> along" movement. I tabled it because:
>> a) I didn't have enough time to contribute,
>> b) a lot of promised contributions ended up not happening when it came down
>> to it, which was incredibly demotivating. The combination of this thread,
>> plus the fact that I was strong armed at Pycon ZA by a bunch of community
>> members that shall not be named (Alex, Armin, Maciej, Larry ;-)) into
>> exploring this thing again.
>> First of all, I don't feel async-pep is an attempt at twisted light in the
>> stdlib. Other than separation of transport and protocol, there's not really
>> much there that even smells of twisted (especially since right now I'd
>> probably throw consumers/producers out) -- and that separation is simply
>> good practice. Twisted does the same thing, but it didn't invent it.
>> Furthermore, the advantages seem clear: reusability and testability are more
>> than enough for me.
>> If there's one take away idea from async-pep, it's reusable protocols.
> Is there a newer version than what's on
> ? It seems to be missing any
> specific proposals, after spending a lot of time giving a rationale
> and defining some terms. The version on
> doesn't seem to be any more complete.
>> The PEP should probably be a number of PEPs. At first sight, it seems that
>> this number is at least four:
>> 1. Protocol and transport abstractions, making no mention of asynchronous IO
>> (this is what I want 3153 to be, because it's small, manageable, and
>> virtually everyone appears to agree it's a fantastic idea)
> But the devil is in the details. *What* specifically are you
> proposing? How would you write a protocol handler/parser without any
> reference to I/O? Most protocols are two-way streets -- you read some
> stuff, and you write some stuff, then you read some more. (HTTP may be
> the exception here, if you don't keep the connection open.)
>> 2. A base reactor interface
> I agree that this should be a separate PEP. But I do think that in
> practice there will be dependencies between the different PEPs you are
> proposing.
>> 3. A way of structuring callbacks: probably deferreds with a built-in
>> inlineCallbacks for people who want to write synchronous-looking code with
>> explicit yields for asynchronous procedures
> Your previous two ideas sound like you're not tied to backward
> compatibility with Tornado and/or Twisted (not even via an adaptation
> layer). Given that we're talking Python 3.4 here that's fine with me
> (though I think we should be careful to offer a path forward for those
> packages and their users, even if it means making changes to the
> libraries). But Twisted Deferred is pretty arcane, and I would much
> rather not use it as the basis of a forward-looking design. I'd much
> rather see what we can mooch off PEP 3148 (Futures).

Could you be more specific? I've never heard Deferreds in particular
called "arcane". They're very popular in e.g. the JS world, and
possibly elsewhere. Moreover, they're extremely similar to futures, so
if one is arcane so is the other.

Maybe if you could elaborate on features of their designs that are better/worse?

As far as I know, they mostly differ in that:

- Callbacks are added in a pipeline, rather than "in parallel"
- Deferreds pass in values along the pipeline, rather than self (and
have a separate pipeline for error values).

Neither is clearly better or more obvious than the other. If anything
I generally find deferred composition more useful than deferred
tee-ing, so I feel like composition is the correct base operator, but
you could pick another. Either way, each is implementable in terms of
the other (ish?). The pipeline approach is particularly nice for the
errback pipeline, because it allows chained exception (Failure)
handling on the deferred to be very simple. The larger issue is that
futures don't make chaining easy at all, even if it is theoretically
possible.

For example, look at the following Twisted code: , and imagine how that
might generalize to more realistic error handling scenarios.

The equivalent Futures code would involve creating one Future per
callback in the pipeline and manually hooking them up with a special
callback that passes values to the next future. And if we add that to
the futures API, the API will almost certainly be somewhat similar to
what Twisted has with deferreds and chaining and such. So then,
equally arcane.
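
A minimal sketch of the "one Future per callback" hooking-up described
above, using PEP 3148's concurrent.futures.Future. The helper name then()
is hypothetical and is not part of the futures API; it only illustrates
how a value (or an error) would be relayed from one Future to the next.

```python
from concurrent.futures import Future

def then(source, fn):
    """Hypothetical helper: return a new Future holding fn(source's result)."""
    target = Future()

    def relay(done):
        try:
            # done.result() re-raises an upstream error, so it skips fn
            target.set_result(fn(done.result()))
        except Exception as exc:
            target.set_exception(exc)

    source.add_done_callback(relay)
    return target

start = Future()
end = then(then(start, lambda x: x + 1), lambda x: x * 2)
start.set_result(1)
print(end.result())
```

Each then() call allocates a fresh Future, which is the per-callback cost
the paragraph above refers to; an error set on the first Future travels
down the chain without running the later callbacks.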

To my mind, it is Futures that need to mooch off of Deferreds, not the
other way around. Twisted's Deferreds have a lot of history with
making asynchronous computation pleasant, and Futures are missing a
lot of good tools.

-- Devin

From guido at  Fri Oct 12 01:37:42 2012
From: guido at (Guido van Rossum)
Date: Thu, 11 Oct 2012 16:37:42 -0700
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Oct 11, 2012 at 3:42 PM, Devin Jeanpierre
<jeanpierreda at> wrote:
> On Thu, Oct 11, 2012 at 5:18 PM, Guido van Rossum <guido at> wrote:
>> [...] Twisted Deferred is pretty arcane, and I would much
>> rather not use it as the basis of a forward-looking design. I'd much
>> rather see what we can mooch off PEP 3148 (Futures).
> Could you be more specific? I've never heard Deferreds in particular
> called "arcane". They're very popular in e.g. the JS world,

Really? Twisted is used in the JS world? Or do you just mean the
pervasiveness of callback style async programming? That's one of the
things I am desperately trying to keep out of Python, I find that
style unreadable and unmanageable (whenever I click on a button in a
website and nothing happens I know someone has a bug in their
callbacks). I understand you feel different; but I feel the general
sentiment is that callback-based async programming is even harder than
multi-threaded programming (and nobody is claiming that threads are
easy :-).

> and possibly elsewhere. Moreover, they're extremely similar to futures, so
> if one is arcane so is the other.

I love Futures, they represent a nice simple programming model. But I
especially love that you can write async code using Futures and
yield-based coroutines (what you call inlineCallbacks) and never have
to write an explicit callback function. Ever.

> Maybe if you could elaborate on features of their designs that are better/worse?
> As far as I know, they mostly differ in that:
> - Callbacks are added in a pipeline, rather than "in parallel"
> - Deferreds pass in values along the pipeline, rather than self (and
> have a separate pipeline for error values).

These two combined are indeed what mostly feels arcane to me.

> Neither is clearly better or more obvious than the other. If anything
> I generally find deferred composition more useful than deferred
> tee-ing, so I feel like composition is the correct base operator, but
> you could pick another.

If you're writing long complicated chains of callbacks that benefit
from these features, IMO you are already doing it wrong. I understand
that this is a matter of style where I won't be able to convince you.
But style is important to me, so let's agree to disagree.

> Either way, each is implementable in terms of
> the other (ish?). The pipeline approach is particularly nice for the
> errback pipeline, because it allows chained exception (Failure)
> handling on the deferred to be very simple. The larger issue is that
> futures don't make chaining easy at all, even if it is theoretically
> possible.

But as soon as you switch from callbacks to yield-based coroutines the
chaining becomes natural, error handling is just a matter of
try/except statements (or not if you want the error to bubble up) and
(IMO) the code becomes much more readable.
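
A toy illustration of that point, not NDB's or anyone else's actual
machinery: the driver run() and the helper done() are assumptions made up
for this sketch, and the driver simply blocks on each Future instead of
using an event loop. What it shows is that the coroutine body chains steps
with plain assignment and handles a failed step with try/except.

```python
from concurrent.futures import Future

def done(value=None, error=None):
    """Hypothetical helper: build an already-completed Future."""
    fut = Future()
    if error is not None:
        fut.set_exception(error)
    else:
        fut.set_result(value)
    return fut

def run(gen):
    """Toy driver: feed each yielded Future's outcome back into the generator."""
    try:
        fut = next(gen)
        while True:
            try:
                value = fut.result()
            except Exception as exc:
                fut = gen.throw(exc)   # surfaces as an exception at the yield
            else:
                fut = gen.send(value)  # surfaces as the value of the yield
    except StopIteration as stop:
        return stop.value

def task():
    a = yield done(1)
    try:
        yield done(error=ValueError("boom"))
    except ValueError:
        b = 10
    return a + b

print(run(task()))
```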

> For example, look at the following Twisted code:
> , and imagine how that
> might generalize to more realistic error handling scenarios.

Looks fine to me. I have a lot of code like that in NDB and it works
great. (Note that NDB's Futures are not the same as PEP 3148 Futures,
although they have some things in common; in particular NDB Futures
are not tied to threads.)

> The equivalent Futures code would involve creating one Future per
> callback in the pipeline and manually hooking them up with a special
> callback that passes values to the next future. And if we add that to
> the futures API, the API will almost certainly be somewhat similar to
> what Twisted has with deferreds and chaining and such. So then,
> equally arcane.

The *implementation* of this stuff in NDB is certainly hairy; I
already posted the link to the code:
However, this is internal code and doesn't affect the Future API at all.

> To my mind, it is Futures that need to mooch off of Deferreds, not the
> other way around. Twisted's Deferreds have a lot of history with
> making asynchronous computation pleasant, and Futures are missing a
> lot of good tools.

I am totally open to learning from Twisted's experience. I hope that
you are willing to share even if the end result might not look like
Twisted at all -- after all in Python 3.3 we have "yield from" and
return from a generator and many years of experience with different
styles of async APIs. In addition to Twisted, there's Tornado and
Monocle, and then there's the whole greenlets/gevent and
Stackless/microthreads community that we can't completely ignore. I
believe somewhere is an ideal async architecture, and I hope you can
help us discover it.

(For example, I am very interested in Twisted's experiences writing
real-world performant, robust reactors.)

--Guido van Rossum (

From dreamingforward at  Fri Oct 12 02:08:21 2012
From: dreamingforward at (Mark Adam)
Date: Thu, 11 Oct 2012 19:08:21 -0500
Subject: [Python-ideas] Floating point contexts in Python core
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Oct 11, 2012 at 6:35 AM, Steven D'Aprano <steve at> wrote:
> On 11/10/12 16:45, Greg Ewing wrote:
>> Are you sure there would be any point in this? People who
>> specifically *want* base-2 floats are probably quite happy
>> with the current float type, and wouldn't appreciate having
>> it slowed down, even by a small amount.
> I would gladly give up a small amount of speed for better control
> over floats, such as whether 1/0.0 raised an exception or
> returned infinity.

Umm, you would be giving up a *lot* of speed.  Native floating point
happens right in the processor, so if you want special behavior, you'd
have to take the floating point out of the CPU and into "user space".


From steve at  Fri Oct 12 02:16:05 2012
From: steve at (Steven D'Aprano)
Date: Fri, 12 Oct 2012 11:16:05 +1100
Subject: [Python-ideas] Floating point contexts in Python core
In-Reply-To: <>
References: <>
Message-ID: <>

On 12/10/12 03:05, Stephen J. Turnbull wrote:
> Steven D'Aprano writes:
>   >  I would gladly give up a small amount of speed for better control
>   >  over floats, such as whether 1/0.0 raised an exception or
>   >  returned infinity.
> Isn't that what the fpectl module is supposed to buy, albeit much less
> pleasantly than Decimal contexts do?

I can't test it, because I don't have that module installed, but I would
think not.

Reading the docs:

I would say that fpectl exists to turn on floating point exceptions where
Python currently returns an inf or NaN, not to turn on special values
where Python currently raises an exception, e.g. 1/0.0.
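
For comparison, a short sketch of the asymmetry under discussion, using
only the standard float type and the decimal module (fpectl itself is not
used here): float division by zero always raises, whereas a Decimal
context lets the caller choose between an exception and a special value.

```python
import decimal

# Binary floats: division by zero raises, and there is no switch to change it.
try:
    1 / 0.0
except ZeroDivisionError as exc:
    float_outcome = type(exc).__name__

# Decimal: the same operation is governed by the context's trap settings.
with decimal.localcontext() as ctx:
    ctx.traps[decimal.DivisionByZero] = False
    quiet = decimal.Decimal(1) / decimal.Decimal(0)   # a signed infinity

with decimal.localcontext() as ctx:
    ctx.traps[decimal.DivisionByZero] = True
    try:
        decimal.Decimal(1) / decimal.Decimal(0)
    except decimal.DivisionByZero as exc:
        trapped = type(exc).__name__

print(float_outcome, quiet, trapped)
```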

Because it depends on a build-time option, using it is even less convenient
than most other non-standard libraries.

It only has a single exception type for any of Division by Zero, Overflow
and Invalid, and doesn't appear to trap Underflow or Inexact at all. It's
not just less pleasant than Decimal contexts, but much less powerful as well.


From tjreedy at  Fri Oct 12 02:29:05 2012
From: tjreedy at (Terry Reedy)
Date: Thu, 11 Oct 2012 20:29:05 -0400
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <k57o97$g2m$>

On 10/11/2012 5:18 PM, Guido van Rossum wrote:

> Anyway, it would be good to have input from representatives from Wx,
> Qt, Twisted and Tornado to ensure that the *functionality* required is
> all there (never mind the exact signatures of the APIs needed to
> provide all that functionality).

And of course tk/tkinter (tho perhaps we can represent that). It occurs 
to me that while i/o (file/socket) events can be added to a user 
(mouse/key) event loop, and I suspect that some tk/tkinter apps do so, 
it might be sensible to keep the two separate. A master loop could tell 
the user-event loop to handle all user events and then the i/o loop to 
handle one i/o event. This all depends on the relative speed of the 
handler code.
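
A rough sketch of such a master loop, with two plain queues standing in
for the real event sources. The names user_events, io_events and
master_step are assumptions for illustration only; nothing here is
tkinter API.

```python
from collections import deque

user_events = deque()   # e.g. mouse/key handlers waiting to run
io_events = deque()     # e.g. socket/file handlers waiting to run

def master_step():
    """One pass: drain all user events, then handle at most one i/o event."""
    while user_events:
        user_events.popleft()()
    if io_events:
        io_events.popleft()()

log = []
user_events.extend([lambda: log.append("key"), lambda: log.append("mouse")])
io_events.extend([lambda: log.append("read 1"), lambda: log.append("read 2")])
master_step()
print(log)
```

After one pass both user events and only the first i/o event have run,
which is where the relative speed of the handlers starts to matter.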

Terry Jan Reedy

From guido at  Fri Oct 12 02:34:33 2012
From: guido at (Guido van Rossum)
Date: Thu, 11 Oct 2012 17:34:33 -0700
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <k57o97$g2m$>
References: <>
Message-ID: <>

On Thu, Oct 11, 2012 at 5:29 PM, Terry Reedy <tjreedy at> wrote:
> On 10/11/2012 5:18 PM, Guido van Rossum wrote:
>> Anyway, it would be good to have input from representatives from Wx,
>> Qt, Twisted and Tornado to ensure that the *functionality* required is
>> all there (never mind the exact signatures of the APIs needed to
>> provide all that functionality).
> And of course tk/tkinter (tho perhaps we can represent that). It occurs to
> me that while i/o (file/socket) events can be added to a user (mouse/key)
> event loop, and I suspect that some tk/tkinter apps do so, it might be
> sensible to keep the two separate. A master loop could tell the user-event
> loop to handle all user events and then the i/o loop to handle one i/o
> event. This all depends on the relative speed of the handler code.

You should talk to a Tcl/Tk user (if there are any left :-). They
actually really like the unified event loop that's used for both
widget events and network events. Tk is probably also a good example
of a hybrid GUI system, where some of the callbacks (e.g. redraw
events) are implemented in C.

--Guido van Rossum (

From ben at  Fri Oct 12 02:41:57 2012
From: ben at (Ben Darnell)
Date: Thu, 11 Oct 2012 17:41:57 -0700
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Oct 11, 2012 at 3:28 PM, Guido van Rossum <guido at> wrote:
> On Mon, Oct 8, 2012 at 10:12 PM, Ben Darnell <ben at> wrote:
>> On Mon, Oct 8, 2012 at 8:30 AM, Guido van Rossum <guido at> wrote:
>>>> It's a Future constructor, a (conditional) add_done_callback, plus the
>>>> calls to set_result or set_exception and the with statement for error
>>>> handling.  In full:
>>>> def future_wrap(f):
>>>>     @functools.wraps(f)
>>>>     def wrapper(*args, **kwargs):
>>>>         future = Future()
>>>>         if kwargs.get('callback') is not None:
>>>>             future.add_done_callback(kwargs.pop('callback'))
>>>>         kwargs['callback'] = future.set_result
>>>>         def handle_error(typ, value, tb):
>>>>             future.set_exception(value)
>>>>             return True
>>>>         with ExceptionStackContext(handle_error):
>>>>             f(*args, **kwargs)
>>>>         return future
>>>>     return wrapper
>>> Hmm... I *think* it automatically adds a special keyword 'callback' to
>>> the *call* site so that you can do things like
>>>   fut = some_wrapped_func(blah, callback=my_callback)
>>> and then instead of using yield to wait for the callback, put the
>>> continuation of your code in the my_callback() function.
>> Yes.  Note that if you're passing in a callback you're probably going
>> to just ignore the return value.  The callback argument and the future
>> return value are essentially two alternative interfaces; it probably
>> doesn't make sense to use both at once (but as a library author it's
>> useful to provide both).
> Definitely sounds like something that could be simplified if you
> didn't have backward compatibility baggage...

Probably, although I still feel like callback-passing has its place.
For example, I think the Tornado chat demo
would be less clear with coroutines and Futures than it is now
(although it would fit better into Greg's schedule/unschedule style).
That doesn't mean that every method has to take a callback, but I'd be
reluctant to get rid of them until we have more experience with the
generator/future-focused style.
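
For readers without Tornado at hand, the two alternative interfaces of the
future_wrap quoted above can be approximated with the standard library
alone. This is a simplified sketch, not Tornado's code: it substitutes a
plain try/except for ExceptionStackContext, so it only catches errors
raised synchronously inside the wrapped function.

```python
import functools
from concurrent.futures import Future

def future_wrap(f):
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        future = Future()
        user_callback = kwargs.pop("callback", None)
        if user_callback is not None:
            # as in the quoted version, the callback receives the Future
            future.add_done_callback(user_callback)
        kwargs["callback"] = future.set_result
        try:
            f(*args, **kwargs)
        except Exception as exc:   # stand-in for ExceptionStackContext
            future.set_exception(exc)
        return future
    return wrapper

@future_wrap
def add(a, b, callback):
    callback(a + b)

# Interface 1: use the returned Future.
print(add(1, 2).result())

# Interface 2: pass a callback and ignore the return value.
seen = []
add(3, 4, callback=lambda fut: seen.append(fut.result()))
print(seen)
```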

>>>>>> In Tornado the Future is created by a decorator
>>>>>> and hidden from the asynchronous function (it just sees the callback),
>>>>> Hm, interesting. NDB goes the other way, the callbacks are mostly used
>>>>> to make Futures work, and most code (including large swaths of
>>>>> internal code) uses Futures. I think NDB is similar to monocle here.
>>>>> In NDB, you can do
>>>>>   f = <some function returning a Future>
>>>>>   r = yield f
>>>>> where "yield f" is mostly equivalent to f.result(), except it gives
>>>>> better opportunity for concurrency.
>>>> Yes, tornado's gen.engine does the same thing here.  However, the
>>>> stakes are higher than "better opportunity for concurrency" - in an
>>>> event loop if you call future.result() without yielding, you'll
>>>> deadlock if that Future's task needs to run on the same event loop.
>>> That would depend on the semantics of the event loop implementation.
>>> In NDB's event loop, such a .result() call would just recursively
>>> enter the event loop, and you'd only deadlock if you actually have two
>>> pieces of code waiting for each other's completion.
>> Hmm, I think I'd rather deadlock. :)  If the event loop is reentrant
>> then the application code has to be coded defensively as if it were
>> preemptively multithreaded, which introduces the possibility of
>> deadlock or (probably) more subtle/less frequent errors.  Reentrancy
>> has been a significant problem in my experience, so I've been moving
>> towards a policy where methods in Tornado that take a callback never
>> run it immediately; callbacks are always scheduled on the next
>> iteration of the IOLoop with IOLoop.add_callback.
> The latter is a good tactic and I'm also using it. (Except for some
> reason we had to add the concept of "immediate callbacks" to our
> Future class, and those are run inside the set_result() call. But most
> callbacks don't use that feature.)
> I don't have a choice about making the event loop reentrant -- App
> Engine's underlying RPC multiplexing implementation *is* reentrant,
> and there is a large set of "classic" APIs that I cannot stop the user
> from calling that reenter it. But even if my hand wasn't forced, I'm
> not sure if I would make your choice. In NDB, there is a full
> complement of synchronous APIs that exactly matches the async APIs,
> and users are free to use the synchronous APIs in parts of their code
> where they don't need concurrency. Hence, every synchronous API just
> calls its async sibling and immediately waits for its result, which
> implicitly invokes the event loop.

Tornado has a synchronous HTTPClient that does the same thing,
although each fetch creates and runs its own IOLoop rather than
spinning the top-level IOLoop.  (This means it doesn't really make
sense to run it when there is a top-level IOLoop; it's provided as a
convenience for scripts and multi-threaded apps who want an
HTTPRequest interface consistent with the async version).

> Of course, I have it easy -- multiple incoming requests are dispatched
> to separate threads by the App Engine runtime, so I don't have to
> worry about multiplexing at that level at all -- just end user code
> that is essentially single-threaded unless they go out of their way.
> I did end up debugging one user's problem where they were making a
> synchronous call inside an async handler, and -- very rarely! -- the
> recursive event loop calls kept stacking up until they hit a
> StackOverflowError. So I would agree that async code shouldn't make
> synchronous API calls; but I haven't heard yet from anyone who was
> otherwise hurt by the recursive event loop invocations -- in
> particular, nobody has requested locks.

I think that's because you don't have file descriptor support.  In a
(level-triggered) event loop if you don't drain the socket before
reentering the loop then your read handler will be called again, which
generally makes a mess.  I suppose with coroutines you'd want
edge-triggered instead of level-triggered though, which might make
this problem go away.

>>>> For concreteness, here's a crude sketch of what the APIs I'm talking
>>>> about would look like in use (in a hypothetical future version of
>>>> tornado).
>>>> @future_wrap
>>>> @gen.engine
>>>> def async_http_client(url, callback):
>>>>     parsed_url = urlparse.urlsplit(url)
>>>>     # works the same whether the future comes from a thread pool or @future_wrap
>>> And you need the thread pool because there's no async version of
>>> getaddrinfo(), right?
>> Right.
>>>>     addrinfo = yield g_thread_pool.submit(socket.getaddrinfo, parsed_url.hostname, parsed_url.port)
>>>>     stream = IOStream(socket.socket())
>>>>     yield stream.connect((addrinfo[0][-1]))
>>>>     stream.write('GET %s HTTP/1.0' % parsed_url.path)
>>> Why no yield in front of the write() call?
>> Because we don't need to wait for the write to complete before we
>> continue to the next statement.  write() doesn't return anything; it
>> just succeeds or fails, and if it fails the next read_until will fail
>> too. (although in this case it wouldn't hurt to have the yield either)
> I guess you have a certain kind of buffering built in to your stream?
> So if you make two write() calls without waiting in quick succession,
> does the system collapse these into one, or does it end up making two
> system calls, or what? In NDB, there's a similar issue with multiple
> RPCs that can be batched. I ended up writing an abstraction that
> automatically combines these; the call isn't actually made until there
> are no other runnable tasks. I've had to explain this a few times to
> users who try to get away with overlapping CPU work and I/O, but
> otherwise it's worked quite well.

Yes, IOStream does buffering for you.  Each IOStream.write() call will
generally result in a syscall, but once the outgoing socket buffer is
full subsequent writes will be buffered in the IOStream and written
when the IOLoop says the socket is writable.  (the callback argument
to write() can be used for flow control in this case)  I used to defer
the syscall until the IOLoop was idle to batch things up, but it turns
out to be more efficient in practice to just write things out each
time and let the higher level do its own buffering when appropriate.
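
The write path described above can be sketched as follows. This is not
IOStream's implementation; BufferedWriter and its attributes are
hypothetical names, and the flush() method stands for whatever the event
loop would call when it reports the socket writable.

```python
import socket

class BufferedWriter:
    """Hypothetical sketch: write at once, buffer only what the kernel refuses."""

    def __init__(self, sock):
        sock.setblocking(False)
        self.sock = sock
        self.pending = b""

    def write(self, data):
        self.pending += data
        self.flush()          # try the syscall right away, no batching

    def flush(self):
        """Also meant to be called when the loop says the socket is writable."""
        while self.pending:
            try:
                sent = self.sock.send(self.pending)
            except BlockingIOError:
                return        # kernel buffer full; keep the rest for later
            self.pending = self.pending[sent:]

a, b = socket.socketpair()
writer = BufferedWriter(a)
writer.write(b"hello")
received = b.recv(16)
print(received, writer.pending)
```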


From ben at  Fri Oct 12 02:57:38 2012
From: ben at (Ben Darnell)
Date: Thu, 11 Oct 2012 17:57:38 -0700
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Oct 11, 2012 at 2:18 PM, Guido van Rossum <guido at> wrote:
>> Re base reactor interface: drawing maximally from the lessons learned in
>> twisted, I think IReactorCore (start, stop, etc), IReactorTime (call later,
>> etc), asynchronous-looking name lookup, fd handling are the important parts.
> That actually sounds more concrete than I'd like a reactor interface
> to be. In the App Engine world, there is a definite need for a
> reactor, but it cannot talk about file descriptors at all -- all I/O
> is defined in terms of RPC operations which have their own (several
> layers of) async management but still need to be plugged in to user
> code that might want to benefit from other reactor functionality such
> as scheduling and placing a call at a certain moment in the future.

So are you thinking of something like
reactor.add_event_listener(event_type, event_params, func)?  One thing
to keep in mind is that file descriptors are somewhat special (at
least in a level-triggered event loop), because of the way the event
will keep firing until the socket buffer is drained or the event is
unregistered.  I'd be inclined to keep file descriptors in the
interface even if they just raise an error on app engine, since
they're fairly fundamental to the (unixy) event loop.  On the other
hand, I don't have any experience with event loops outside the
unix/network world so I don't know what other systems might need for
their event loops.

>> call_every can be implemented in terms of call_later on a separate object,
>> so I think it should be (eg twisted.internet.task.LoopingCall). One thing
>> that is apparently forgotten about is event loop integration. The prime way
>> of having two event loops cooperate is *NOT* "run both in parallel", it's
>> "have one call the other". Even though not all loops support this, I think
>> it's important to get this as part of the interface (raise an exception for
>> all I care if it doesn't work).
> This is definitely one of the things we ought to get right. My own
> thoughts are slightly (perhaps only cosmetically) different again:
> ideally each event loop would have a primitive operation to tell it to
> run for a little while, and then some other code could tie several
> event loops together.
> Possibly the primitive operation would be something like "block until
> either you've got one event ready, or until a certain time (possibly
> 0) has passed without any events, and then give us the events that are
> ready and a lower bound for when you might have more work to do" -- or
> maybe instead of returning the event(s) it could just call the
> associated callback (it might have to if it is part of a GUI library
> that has callbacks written in C/C++ for certain events like screen
> refreshes).

That doesn't work very well - while one loop is waiting for its
timeout, nothing can happen on the other event loop.  You have to
switch back and forth frequently to keep things responsive, which is
inefficient.  I'd rather give each event loop its own thread; you can
minimize the thread-synchronization concerns by picking one loop as
"primary" and having all the others just pass callbacks over to it
when their events fire.
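
A small sketch of that arrangement, assuming a thread-safe queue as the
only point of contact between loops. The secondary "loop" here is just a
function iterating over canned events; all names are made up for the
example.

```python
import queue
import threading

callbacks = queue.Queue()   # inbox of the primary loop
handled = []

def secondary_loop(events):
    """Runs in its own thread; never runs handlers, only forwards them."""
    for ev in events:
        callbacks.put(lambda ev=ev: handled.append(ev))
    callbacks.put(None)      # sentinel: this loop is finished

worker = threading.Thread(target=secondary_loop, args=(["click", "key"],))
worker.start()

# Primary loop: the only place where callbacks actually execute.
while True:
    cb = callbacks.get()
    if cb is None:
        break
    cb()
worker.join()
print(handled)
```

Because every handler runs on the primary thread, application code does
not need its own locking; only the queue is shared.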


> Anyway, it would be good to have input from representatives from Wx,
> Qt, Twisted and Tornado to ensure that the *functionality* required is
> all there (never mind the exact signatures of the APIs needed to
> provide all that functionality).

From greg.ewing at  Fri Oct 12 03:32:10 2012
From: greg.ewing at (Greg Ewing)
Date: Fri, 12 Oct 2012 14:32:10 +1300
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:
> Though I think you should not use
> 'thread' since that term is already reserved for OS threads as
> supported by the threading module. ... You could also use task, 
> which also doesn't have a core Python
> meaning. 
> Also I think you can now revisit it and rewrite the code to use Python 3.3.

Both good ideas. I'll see about publishing an updated version.

> It does bother me somehow that you're not using .send() and yield
> arguments at all. I notice that you have a lot of three-line code
> blocks like this:
>       block_for_reading(sock)
>       yield
>       data = sock.recv(1024)

I wouldn't say I have a "lot". In the spamserver, there are really
only three -- one for accepting a connection, one for reading from
a socket, and one for writing to a socket. These are primitive
operations that would be provided by an async socket library.

Generally, all the yields would be hidden inside primitives like
this. Normally, user code would never need to use 'yield', only
'yield from'.

This probably didn't come through as clearly as it might have in my
tutorial. Part of the reason is that at the time I wrote it, I was
having to manually expand yield-froms into for-loops, so I was
reluctant to use any more of them than I needed to. Also, yield-from
was a new and unfamiliar concept, and I didn't want to scare people
by overusing it. These considerations led me to push some of the
yields slightly further up the layer stack than they could be.

> The general form seems to be:
>       arrange for a callback when some operation can be done without blocking
>       yield
>       do the operation
> This seems to be begging to be collapsed into a single line, e.g.
>       data = yield sock.recv_async(1024)

I'm not sure how you're imagining that would work, but whatever
it is, it's wrong -- that just doesn't make sense.

What *would* make sense is

    data = yield from sock.recv_async(1024)

with sock.recv_async() being a primitive that encapsulates the
block/yield/process triplet.
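
As a rough sketch of such a primitive (the scheduler below is a toy
written for this example, and block_for_reading is reduced to appending to
a list; neither is taken from the tutorial's code):

```python
import select
import socket

waiting = []   # sockets the running task is blocked on (toy scheduler state)

def block_for_reading(sock):
    waiting.append(sock)

def recv_async(sock, n):
    block_for_reading(sock)   # arrange to be resumed when readable
    yield                     # hand control back to the scheduler
    return sock.recv(n)       # do the operation, now without blocking

def reader(sock):
    data = yield from recv_async(sock, 1024)
    return data.upper()

def run(task):
    """Toy scheduler for a single task."""
    try:
        while True:
            next(task)
            select.select(waiting, [], [])
            waiting.clear()
    except StopIteration as stop:
        return stop.value

a, b = socket.socketpair()
b.sendall(b"spam")
print(run(reader(a)))
```

The user-level function reader() contains only 'yield from'; the bare
'yield' lives inside the primitive.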

> (I would also prefer to see the socket wrapped in an object that makes
> it hard to accidentally block.)

It would be straightforward to make the primitives be methods of a
socket wrapper object. I only used functions in the tutorial in the
interests of keeping the amount of machinery to a bare minimum.

> But surely there's still a place for send() and other PEP 342 features?

In the wider world of generator usage, yes. If you have a
generator that it makes sense to send() things into, for
example, and you want to factor part of it out into another
function, the fact that yield-from passes through sent values
is useful.

But we're talking about a very specialised use of generators
here, and so far I haven't thought of a use for sent or yielded
values in this context that can't be done in a more straightforward
way by other means.

Keep in mind that a value yielded by a generator being used as
part of a coroutine is *not* seen by code calling it with
yield-from. Rather, it comes out in the inner loop of the
scheduler, from the next() call being used to resume the
coroutine. Likewise, any send() call would have to be made
by the scheduler, not the yield-from caller.

So, the send/yield channel is exclusively for communication
with the *scheduler* and nothing else. Under the old way of
doing generator-based coroutines, this channel was used to
simulate a call stack by yielding 'call' and 'return'
instructions that the scheduler interpreted. But all that
is now taken care of by the yield-from mechanism, and there
is nothing left for the send/yield channel to do.
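
A short demonstration of who sees what, with made-up names: the value
yielded by the innermost generator surfaces at the outermost next() call,
and a sent value lands at that same innermost yield, bypassing the
yield-from caller in both directions.

```python
def primitive():
    reply = yield "to-scheduler"      # seen by whoever calls next(), below
    return reply

def coroutine():
    result = yield from primitive()   # sees only primitive's return value
    return "coroutine got %r" % result

task = coroutine()
seen_by_scheduler = next(task)
try:
    task.send("from-scheduler")
except StopIteration as stop:
    final = stop.value

print(seen_by_scheduler)
print(final)
```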

> my users sometimes want to
> treat something as a coroutine but they don't have any yields in it
> def caller():
>   data = yield from reader()
> def reader():
>     return 'dummy'
>     yield
> works, but if you drop the yield it doesn't work. With a decorator I
> know how to make it work either way.

If you're talking about a decorator that turns a function
into a generator, I can't see anything particularly headachish
about that. If you mean something else, you'll have to elaborate.
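
One possible shape for such a decorator, offered as an assumption about
what is meant rather than as anyone's actual code. The name coroutine is
hypothetical; the trick is the same unreachable 'yield' as in the quoted
reader(), moved into a wrapper.

```python
import functools
import inspect

def coroutine(func):
    """Hypothetical decorator: make func usable with 'yield from' either way."""
    if inspect.isgeneratorfunction(func):
        return func

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
        yield  # never reached; its presence makes wrapper a generator function

    return wrapper

@coroutine
def reader():
    return 'dummy'   # no yield needed in user code

def caller():
    data = yield from reader()
    return data

try:
    next(caller())
except StopIteration as stop:
    result = stop.value
print(result)
```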


From steve at  Fri Oct 12 03:03:50 2012
From: steve at (Steven D'Aprano)
Date: Fri, 12 Oct 2012 12:03:50 +1100
Subject: [Python-ideas] Floating point contexts in Python core
In-Reply-To: <>
References: <>
Message-ID: <>

On 12/10/12 11:04, Mark Adam wrote:
> On Thu, Oct 11, 2012 at 6:35 AM, Steven D'Aprano<steve at>  wrote:
>> On 11/10/12 16:45, Greg Ewing wrote:
>>> Are you sure there would be any point in this? People who
>>> specifically *want* base-2 floats are probably quite happy
>>> with the current float type, and wouldn't appreciate having
>>> it slowed down, even by a small amount.
>> I would gladly give up a small amount of speed for better control
>> over floats, such as whether 1/0.0 raised an exception or
>> returned infinity.
> Umm, you would be giving up a *lot* of speed.  Native floating point
> happens right in the processor, so if you want special behavior, you'd
> have to take the floating point out of hardware and into "user space".

Any half-decent processor supports the IEEE-754 standard. If it doesn't,
it's broken by design.

Even in user-space, you're not giving up that much speed in practical
terms, at least not for my needs. The new decimal module in Python 3.3 is
less than a factor of 10 times slower than Python's floats, which makes it
pretty much instantaneous to my mind :)

numpy supports configurable numeric contexts, and I don't hear that many
complaints that numpy is slower than standard Python.


> mark

From dreamingforward at  Fri Oct 12 03:38:43 2012
From: dreamingforward at (Mark Adam)
Date: Thu, 11 Oct 2012 20:38:43 -0500
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Oct 11, 2012 at 7:34 PM, Guido van Rossum <guido at> wrote:
> On Thu, Oct 11, 2012 at 5:29 PM, Terry Reedy <tjreedy at> wrote:
>> On 10/11/2012 5:18 PM, Guido van Rossum wrote:
>>> Anyway, it would be good to have input from representatives from Wx,
>>> Qt, Twisted and Tornado to ensure that the *functionality* required is
>>> all there (never mind the exact signatures of the APIs needed to
>>> provide all that functionality).
>> And of course tk/tkinter (tho perhaps we can represent that). It occurs to
>> me that while i/o (file/socket) events can be added to a user (mouse/key)
>> event loop, and I suspect that some tk/tkinter apps do so, it might be
>> sensible to keep the two separate. A master loop could tell the user-event
>> loop to handle all user events and then the i/o loop to handle one i/o
>> event. This all depends on the relative speed of the handler code.

Here's the thing: the underlying O.S. is always handling two major I/O
channels at any given time and it needs all its attention to do this:
the GUI and one of the following (network, file) I/O.  You can
shuffle these around all you want, but somewhere the O.S. kernel is
going to have to be involved, which means either portability or speed
is sacrificed if one is going to pursue an abstract, unified
async API.

> You should talk to a Tcl/Tk user (if there are any left :-).

I used to be one of those :)


From stephen at  Fri Oct 12 05:01:12 2012
From: stephen at (Stephen J. Turnbull)
Date: Fri, 12 Oct 2012 12:01:12 +0900
Subject: [Python-ideas] Floating point contexts in Python core
In-Reply-To: <>
References: <>
Message-ID: <>

Steven D'Aprano writes:
 > On 12/10/12 03:05, Stephen J. Turnbull wrote:
 > > Steven D'Aprano writes:
 > >
 > >   >  I would gladly give up a small amount of speed for better control
 > >   >  over floats, such as whether 1/0.0 raised an exception or
 > >   >  returned infinity.
 > >
 > > Isn't that what the fpectl module is supposed to buy, albeit much less
 > > pleasantly than Decimal contexts do?
 > I can't test it, because I don't have that module installed, but I would
 > think not.
 > Reading the docs:
 > I would say that fpectl exists to turn on floating point exceptions where
 > Python currently returns an inf or NaN, not to turn on special values
 > where Python currently raises an exception, e.g. 1/0.0.

OK.  But if Python does that, it must be checking the value of the
operand as well as the type.  Surely that could be delegated to the
hardware easily by commenting out one line.  (Of course that would
need to be a build-time option, and requires care in initialization.)

 > Because it depends on a build-time option, using it is even less convenient
 > than most other non-standard libraries.

That is neither here nor there.  I think the people who would use such
facilities are a very small minority; imposing a slight extra burden
on them is not a huge cost to Python.  Eg, I'm perfectly happy with
Python's current behavior because I only write toy examples/classroom
demos in pure Python.  If I were going to try to write statistical
code in Python (vaguely plausible but not likely :-), I'd surely use

 > It only has a single exception type for any of Division by Zero, Overflow
 > and Invalid, and doesn't appear to trap Underflow or Inexact at all. It's
 > not just less pleasant than Decimal contexts, but much less powerful as
 > well.

Now you're really picking nits.  Nobody said fpectl is perfect for all
uses, just that you could get *better* control over floats.  If you're
going to insist that nothing less than Decimal contexts will do,
you're right for you -- but that's not what you said.

From guido at  Fri Oct 12 05:40:37 2012
From: guido at (Guido van Rossum)
Date: Thu, 11 Oct 2012 20:40:37 -0700
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Oct 11, 2012 at 5:41 PM, Ben Darnell <ben at> wrote:
> On Thu, Oct 11, 2012 at 3:28 PM, Guido van Rossum <guido at> wrote:
>> On Mon, Oct 8, 2012 at 10:12 PM, Ben Darnell <ben at> wrote:
>>> On Mon, Oct 8, 2012 at 8:30 AM, Guido van Rossum <guido at> wrote:
>>>>> It's a Future constructor, a (conditional) add_done_callback, plus the
>>>>> calls to set_result or set_exception and the with statement for error
>>>>> handling.  In full:
>>>>> def future_wrap(f):
>>>>>     @functools.wraps(f)
>>>>>     def wrapper(*args, **kwargs):
>>>>>         future = Future()
>>>>>         if kwargs.get('callback') is not None:
>>>>>             future.add_done_callback(kwargs.pop('callback'))
>>>>>         kwargs['callback'] = future.set_result
>>>>>         def handle_error(typ, value, tb):
>>>>>             future.set_exception(value)
>>>>>             return True
>>>>>         with ExceptionStackContext(handle_error):
>>>>>             f(*args, **kwargs)
>>>>>         return future
>>>>>     return wrapper
>>>> Hmm... I *think* it automatically adds a special keyword 'callback' to
>>>> the *call* site so that you can do things like
>>>>   fut = some_wrapped_func(blah, callback=my_callback)
>>>> and then instead of using yield to wait for the callback, put the
>>>> continuation of your code in the my_callback() function.
>>> Yes.  Note that if you're passing in a callback you're probably going
>>> to just ignore the return value.  The callback argument and the future
>>> return value are essentially two alternative interfaces; it probably
>>> doesn't make sense to use both at once (but as a library author it's
>>> useful to provide both).
>> Definitely sounds like something that could be simplified if you
>> didn't have backward compatibility baggage...
> Probably, although I still feel like callback-passing has its place.
> For example, I think the Tornado chat demo
> (
> would be less clear with coroutines and Futures than it is now
> (although it would fit better into Greg's schedule/unschedule style).

Hmm... That's an interesting challenge. I can't quite say I understand
that whole program yet, but I'd like to give it a try. I think it can
be made clearer than Tornado with Futures and coroutines -- it all
depends on how you define your primitives.

> That doesn't mean that every method has to take a callback, but I'd be
> reluctant to get rid of them until we have more experience with the
> generator/future-focused style.

Totally understood. Though the nice thing about Futures is that you can
tie callbacks to them *or* use them in coroutines.
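
A minimal sketch of that dual use, built on the standard
concurrent.futures.Future. The driver function `run` is a hypothetical
helper written for this example (it is not Tornado's or NDB's
scheduler), and it ignores exceptions for brevity.

```python
from concurrent.futures import Future


def run(gen):
    """Drive a generator that yields Futures (hypothetical mini-scheduler)."""
    def step(value=None):
        try:
            fut = gen.send(value)  # resume the coroutine with the last result
        except StopIteration:
            return
        # Resume again once the yielded Future has a result.
        fut.add_done_callback(lambda f: step(f.result()))
    step()


log = []
fut = Future()

# Interface 1: attach a plain callback to the Future.
fut.add_done_callback(lambda f: log.append(("callback", f.result())))


# Interface 2: wait for the same Future inside a coroutine.
def coro():
    value = yield fut
    log.append(("coroutine", value))


run(coro())
fut.set_result(42)  # both consumers now see the value
```

The same Future object feeds both the callback and the coroutine; the
producer calling set_result() does not need to know which style its
consumers prefer.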

>>>>>>> In Tornado the Future is created by a decorator
>>>>>>> and hidden from the asynchronous function (it just sees the callback),
>>>>>> Hm, interesting. NDB goes the other way, the callbacks are mostly used
>>>>>> to make Futures work, and most code (including large swaths of
>>>>>> internal code) uses Futures. I think NDB is similar to monocle here.
>>>>>> In NDB, you can do
>>>>>>   f = <some function returning a Future>
>>>>>>   r = yield f
>>>>>> where "yield f" is mostly equivalent to f.result(), except it gives
>>>>>> better opportunity for concurrency.
>>>>> Yes, tornado's gen.engine does the same thing here.  However, the
>>>>> stakes are higher than "better opportunity for concurrency" - in an
>>>>> event loop if you call future.result() without yielding, you'll
>>>>> deadlock if that Future's task needs to run on the same event loop.
>>>> That would depend on the semantics of the event loop implementation.
>>>> In NDB's event loop, such a .result() call would just recursively
>>>> enter the event loop, and you'd only deadlock if you actually have two
>>>> pieces of code waiting for each other's completion.
>>> Hmm, I think I'd rather deadlock. :)  If the event loop is reentrant
>>> then the application code has to be coded defensively as if it were
>>> preemptively multithreaded, which introduces the possibility of
>>> deadlock or (probably) more subtle/less frequent errors.  Reentrancy
>>> has been a significant problem in my experience, so I've been moving
>>> towards a policy where methods in Tornado that take a callback never
>>> run it immediately; callbacks are always scheduled on the next
>>> iteration of the IOLoop with IOLoop.add_callback.
>> The latter is a good tactic and I'm also using it. (Except for some
>> reason we had to add the concept of "immediate callbacks" to our
>> Future class, and those are run inside the set_result() call. But most
>> callbacks don't use that feature.)
>> I don't have a choice about making the event loop reentrant -- App
>> Engine's underlying RPC multiplexing implementation *is* reentrant,
>> and there is a large set of "classic" APIs that I cannot stop the user
>> from calling that reenter it. But even if my hand wasn't forced, I'm
>> not sure if I would make your choice. In NDB, there is a full
>> complement of synchronous APIs that exactly matches the async APIs,
>> and users are free to use the synchronous APIs in parts of their code
>> where they don't need concurrency. Hence, every synchronous API just
>> calls its async sibling and immediately waits for its result, which
>> implicitly invokes the event loop.
> Tornado has a synchronous HTTPClient that does the same thing,
> although each fetch creates and runs its own IOLoop rather than
> spinning the top-level IOLoop.  (This means it doesn't really make
> sense to run it when there is a top-level IOLoop; it's provided as a
> convenience for scripts and multi-threaded apps who want an
> HTTPRequest interface consistent with the async version).

I see. Yet another possible design choice.
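
For concreteness, the "never run a callback immediately" policy quoted
above can be sketched with a toy loop. `TinyLoop` and `fetch` are
made-up names for illustration; this is not Tornado's IOLoop, only a
queue that defers every callback to a later iteration.

```python
from collections import deque


class TinyLoop:
    """Hypothetical event loop: callbacks are queued, never run inline."""

    def __init__(self):
        self._ready = deque()

    def add_callback(self, fn, *args):
        # Only enqueue; the caller keeps running until it returns.
        self._ready.append((fn, args))

    def run(self):
        while self._ready:
            fn, args = self._ready.popleft()
            fn(*args)


events = []


def fetch(loop, callback):
    # The result is "ready" right away, but the callback is still deferred.
    loop.add_callback(callback, "result")
    events.append("fetch returned")


loop = TinyLoop()
loop.add_callback(fetch, loop, events.append)
loop.run()
```

Because the callback is queued rather than invoked inside fetch(), the
caller always finishes before its callback starts, which is what avoids
the reentrancy surprises described in the quoted text.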

>> Of course, I have it easy -- multiple incoming requests are dispatched
>> to separate threads by the App Engine runtime, so I don't have to
>> worry about multiplexing at that level at all -- just end user code
>> that is essentially single-threaded unless they go out of their way.
>> I did end up debugging one user's problem where they were making a
>> synchronous call inside an async handler, and -- very rarely! -- the
>> recursive event loop calls kept stacking up until they hit a
>> StackOverflowError. So I would agree that async code shouldn't make
>> synchronous API calls; but I haven't heard yet from anyone who was
>> otherwise hurt by the recursive event loop invocations -- in
>> particular, nobody has requested locks.
> I think that's because you don't have file descriptor support.  In a
> (level-triggered) event loop if you don't drain the socket before
> reentering the loop then your read handler will be called again, which
> generally makes a mess.  I suppose with coroutines you'd want
> edge-triggered instead of level-triggered though, which might make
> this problem go away.

Ah, good terminology. Coroutines definitely like being edge-triggered.

>>>>> For concreteness, here's a crude sketch of what the APIs I'm talking
>>>>> about would look like in use (in a hypothetical future version of
>>>>> tornado).
>>>>> @future_wrap
>>>>> @gen.engine
>>>>> def async_http_client(url, callback):
>>>>>     parsed_url = urlparse.urlsplit(url)
>>>>>     # works the same whether the future comes from a thread pool or @future_wrap
>>>> And you need the thread pool because there's no async version of
>>>> getaddrinfo(), right?
>>> Right.
>>>>>     addrinfo = yield g_thread_pool.submit(socket.getaddrinfo, parsed_url.hostname, parsed_url.port)
>>>>>     stream = IOStream(socket.socket())
>>>>>     yield stream.connect((addrinfo[0][-1]))
>>>>>     stream.write('GET %s HTTP/1.0' % parsed_url.path)
>>>> Why no yield in front of the write() call?
>>> Because we don't need to wait for the write to complete before we
>>> continue to the next statement.  write() doesn't return anything; it
>>> just succeeds or fails, and if it fails the next read_until will fail
>>> too. (although in this case it wouldn't hurt to have the yield either)
>> I guess you have a certain kind of buffering built in to your stream?
>> So if you make two write() calls without waiting in quick succession,
>> does the system collapse these into one, or does it end up making two
>> system calls, or what? In NDB, there's a similar issue with multiple
>> RPCs that can be batched. I ended up writing an abstraction that
>> automatically combines these; the call isn't actually made until there
>> are no other runnable tasks. I've had to explain this a few times to
>> users who try to get away with overlapping CPU work and I/O, but
>> otherwise it's worked quite well.
> Yes, IOStream does buffering for you.  Each IOStream.write() call will
> generally result in a syscall, but once the outgoing socket buffer is
> full subsequent writes will be buffered in the IOStream and written
> when the IOLoop says the socket is writable.  (the callback argument
> to write() can be used for flow control in this case)  I used to defer
> the syscall until the IOLoop was idle to batch things up, but it turns
> out to be more efficient in practice to just write things out each
> time and let the higher level do its own buffering when appropriate.

Makes sense. I think different people might want to implement slightly
different IOStream-like abstractions; this would be a good test of the
infrastructure. You should be able to craft one from scratch out of
sockets and Futures, but there should be one or two standard ones as
well, and they should all happily mix and match using the same

--Guido van Rossum (

From stephen at  Fri Oct 12 05:40:56 2012
From: stephen at (Stephen J. Turnbull)
Date: Fri, 12 Oct 2012 12:40:56 +0900
Subject: [Python-ideas] Floating point contexts in Python core
In-Reply-To: <>
References: <>
Message-ID: <>

Oscar Benjamin writes:

 > But the fpectl module IIUC wouldn't work for 1 / 0.

No, and it shouldn't.

 > Since Python has managed to unify integer/float division now it
 > would be a shame to introduce any new reasons to bring in
 > superfluous .0s again:

With all due respect to the designers, unification of integer/float
division is a compromise, even a mathematical kludge.  I'm not
complaining, it happens to work well for most applications, even for
me (at least where I need a computer to do the calculations :-).

Practicality beats purity.

 > with context(zero_division='infinity'):
 >     x = 1 / 0.0  # float('inf')
 >     y = 1 / 0  # I'd like to see float('inf') here as well

I'd hate that.  Zero simply isn't a unit in any ring of integers; if I
want to handle divide-by-zero specially (rather than consider it a
programming error in preceding code) a LBYL non-zero divisor test or a
try handler for divide-by-zero is appropriate.
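
The two alternatives just mentioned look roughly like this. Both helper
names and the `default` parameter are invented for the sketch, and the
sign of the dividend is ignored to keep it short.

```python
def div_lbyl(a, b, default=float("inf")):
    """Look before you leap: test the divisor first."""
    if b == 0:
        return default
    return a / b


def div_eafp(a, b, default=float("inf")):
    """Try the division and handle the failure afterwards."""
    try:
        return a / b
    except ZeroDivisionError:
        return default
```

Either way the special case is spelled out at the call site rather than
being changed globally by a context.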

And in the case of

    z = -1 / 0.0

should it be float('inf') (complex) or -float('inf') (real)?
(Obviously it should be the latter, as most scientific programming is
done using real algorithms.  But one could argue that just as integer
is corrupted to float in the interests of continuity in division
results, float should be corrupted to complex in the interest of a
larger domain for roots and trigonometric functions.)

 > I've spent 4 hours this week in computer labs with students using
 > Python 2.7 as an introduction to scientific programming. A significant
 > portion of that time was spent explaining the int/float division
 > problem. They all get the issue now but not all of them understand
 > that it is specifically about division: many are putting .0s
 > everywhere.

A perfectly rational approach for them, which may appeal to their
senses of beauty in mathematics -- I personally would always write
1.0/0.0, not 1/0.0, and more mathematically correct than what you try
to teach them.  I really don't understand why you have a problem with
it.  Your problem seems to be that Python shouldn't have integers,
except as an internal optimization for a subset of floating point
operations.  Then "1" could always be an abbreviation for "1.0"!

 > I expect it to be easier when we use Python 3 and I can simply
 > explain that there are two types of division with two different
 > operators.

Well, it's been more than 40 years since I studied this stuff in
America, but what they taught 10-year-olds then was that there are two
ways to view division: in integers with result and remainder, and as a
fraction.  And they used the same operator!  Not to mention that the
algorithm for reducing fractions depends on integer division.  It's a
shame students forget so quickly. :-)
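
In Python 3 terms, the two views of division and the dependence of
fraction reduction on integer division can be sketched as follows. The
helpers `gcd` and `reduce_fraction` are written out by hand purely for
illustration and assume positive integers.

```python
from fractions import Fraction


def gcd(a, b):
    """Euclid's algorithm: repeated integer division with remainder."""
    while b:
        a, b = b, a % b
    return a


def reduce_fraction(num, den):
    """Reduce num/den to lowest terms using integer division."""
    g = gcd(num, den)
    return num // g, den // g


quotient, remainder = divmod(7, 2)   # division with result and remainder
as_fraction = Fraction(7, 2)         # division viewed as a fraction
lowest_terms = reduce_fraction(6, 4)
```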

From jeanpierreda at  Fri Oct 12 06:29:05 2012
From: jeanpierreda at (Devin Jeanpierre)
Date: Fri, 12 Oct 2012 00:29:05 -0400
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

First of all, sorry for not snipping the reply I made previously.
Noticed that only after I sent it :(

On Thu, Oct 11, 2012 at 7:37 PM, Guido van Rossum <guido at> wrote:
> On Thu, Oct 11, 2012 at 3:42 PM, Devin Jeanpierre
> <jeanpierreda at> wrote:
>> Could you be more specific? I've never heard Deferreds in particular
>> called "arcane". They're very popular in e.g. the JS world,
> Really? Twisted is used in the JS world? Or do you just mean the
> pervasiveness of callback style async programming?

Ah, I mean Deferreds. I attended a talk earlier this year all about
deferreds in JS, and not a single reference to Python or Twisted was

These are the examples I remember mentioned in the talk:

- (not very twistedish
at all, ill-liked by the speaker)
- (maybe
not a good example, mochikit tries to be "python in JS")
- (also includes an explanation of why
the author likes deferreds)

There were a few more that the speaker mentioned, but didn't cover.
One of his points was that the various systems of deferreds are subtly
different, some very badly so, and that it was a mess, but that
deferreds were still awesome. JS is a language where async programming
is mainstream, so lots of people try to make it easier, and they all
do it slightly differently.

> That's one of the
> things I am desperately trying to keep out of Python, I find that
> style unreadable and unmanageable (whenever I click on a button in a
> website and nothing happens I know someone has a bug in their
> callbacks). I understand you feel different; but I feel the general
> sentiment is that callback-based async programming is even harder than
> multi-threaded programming (and nobody is claiming that threads are
> easy :-).


There are (at least?) four different styles of asynchronous
computation used in Twisted, and you seem to be confused as to which
ones I'm talking about.

1. Explicit callbacks:

    For example, reactor.callLater(t, lambda: print("woo hoo"))

2. Method dispatch callbacks:

    Similar to the above, the reactor or somebody has a handle on your
    object, and calls methods that you've defined when events happen
    e.g. IProtocol's dataReceived method

3. Deferred callbacks:

    When you ask for something to be done, it's set up, and you get an
    object back, which you can add a pipeline of callbacks to that will be
    called whenever whatever happens
    e.g. twisted.internet.threads.deferToThread(print,
    "x").addCallback(print, "x was printed in some other thread!")

4. Generator coroutines

    These are a syntactic wrapper around deferreds. If you yield a
    deferred, you will be sent the result if the deferred succeeds, or an
    exception if the deferred fails.
    e.g. examples from previous message

I don't see a reason for the first to exist at all, the second one is
kind of nice in some circumstances (see below), but perhaps overused.

I feel like you're railing on the first and second when I'm talking
about the third and fourth. I could be wrong.
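
To make the third style concrete without depending on Twisted, here is
a toy stand-in. `MiniDeferred` and its method names are hypothetical
and far simpler than Twisted's Deferred (no errbacks, no chaining of
deferreds); the sketch only shows the idea of a callback pipeline where
each callback receives the previous callback's return value.

```python
class MiniDeferred:
    """Hypothetical, heavily simplified deferred: a one-shot pipeline."""

    def __init__(self):
        self._callbacks = []
        self._fired = False
        self.result = None

    def add_callback(self, fn, *args):
        self._callbacks.append((fn, args))
        if self._fired:
            self._run()  # a result is already available: run right away
        return self      # allow chaining

    def callback(self, result):
        if self._fired:
            raise RuntimeError("already fired")  # one-shot by design
        self._fired = True
        self.result = result
        self._run()

    def _run(self):
        # Each callback gets the previous result and replaces it.
        while self._callbacks:
            fn, args = self._callbacks.pop(0)
            self.result = fn(self.result, *args)


d = MiniDeferred()
d.add_callback(str.upper).add_callback(lambda s, suffix: s + suffix, "!")
d.callback("done")
```

The fourth style is then a generator that yields such objects and is
resumed with their results, so the pipeline reads like straight-line
code.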

>> and possibly elsewhere. Moreover, they're extremely similar to futures, so
>> if one is arcane so is the other.
> I love Futures, they represent a nice simple programming model. But I
> especially love that you can write async code using Futures and
> yield-based coroutines (what you call inlineCallbacks) and never have
> to write an explicit callback function. Ever.

The reason explicit non-deferred callbacks are involved in Twisted is
because of situations in which deferreds are not present, because of
past history in Twisted. It is not at all a limitation of deferreds or
something futures are better at, best as I'm aware.

(In case that's what you're getting at.)

Anyway, one big issue is that generator coroutines can't really
effectively replace callbacks everywhere. Consider the GUI button
example you gave. How do you write that as a coroutine?

I can see it being written like this:

    def mycoroutine(gui):
        while True:
            clickevent = yield gui.mybutton1.on_click()
            # handle clickevent

But that's probably worse than using callbacks.

>> Neither is clearly better or more obvious than the other. If anything
>> I generally find deferred composition more useful than deferred
>> tee-ing, so I feel like composition is the correct base operator, but
>> you could pick another.
> If you're writing long complicated chains of callbacks that benefit
> from these features, IMO you are already doing it wrong. I understand
> that this is a matter of style where I won't be able to convince you.
> But style is important to me, so let's agree to disagree.

This is more than a matter of style, so at least for now I'd like to
hold off on calling it even.

In my day to day silly, synchronous, python code, I do lots of
synchronous requests. For example, it's not unreasonable for me to
want to load two different files from disk, or make several database
interactions, etc. If I want to make this asynchronous, I have to find
a way to execute multiple things that could hypothetically block, at
the same time. If I can't do that easily, then the asynchronous
solution has failed, because its entire purpose is to do everything
that I do synchronously, except without blocking the main thread.
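
One way to picture "several things that could block, at the same time"
is the standard concurrent.futures thread pool. The functions
`load_record`, `load_template` and `render` are placeholders invented
for this sketch; they stand in for a database query and a template
load and do no real I/O.

```python
from concurrent.futures import ThreadPoolExecutor


def load_record(key):
    # Placeholder for a blocking database lookup.
    return {"key": key, "filename": key + ".txt"}


def load_template(name):
    # Placeholder for a blocking template load from disk.
    return "<p>{body}</p>"


def render(key):
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Start both potentially blocking calls before waiting on either.
        record_future = pool.submit(load_record, key)
        template_future = pool.submit(load_template, "paste.html")
        record = record_future.result()
        template = template_future.result()
    # The final step needs the results of both earlier computations.
    return template.format(body=record["filename"])
```

The last line is the "synthesis" step: it cannot run until both
earlier results have been passed forward to it.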

Here's an example with lots of synchronous requests in Django:

def view_paste(request, filekey):
    try:
        fileinfo = Pastes.objects.get(key=filekey)
    except DoesNotExist:
        t = loader.get_template('pastebin/error.html')
        return HttpResponse(t.render(Context(dict(error='File does not exist'))))

    f = open(fileinfo.filename)
    fcontents = f.read()
    t = loader.get_template('pastebin/paste.html')
    return HttpResponse(t.render(Context(dict(file=fcontents))))

How many blocking requests are there? Lots. This is, in a word, a
long, complicated chain of synchronous requests. This is also very
similar to what actual django code might look like in some
circumstances. Even if we might think this is unreasonable, some
subset of alteration of this is reasonable. Certainly we should be
able to, say, load multiple (!) objects from the database, and open
the template (possibly from disk), all potentially-blocking

This is inherently a long, complicated chain of requests, whether we
implement it asynchronously or synchronously, or use Deferreds or
Futures, or write it in Java or Python. Some parts can be done at any
time before the end (loader.get_template(...)), some need to be done
in a certain order, and there's branching depending on what happens in
different cases. In order to even write this code _at all_, we need a
way to chain these IO actions together. If we can't chain them
together, we can't produce that final synthesis of results at the end.

We _need_ a pipeline or something computationally equivalent or more
powerful. Results from past "deferred computations" need to be passed
forward into future "deferred computations", in order to implement
this at all.

This is not a style issue, this is an issue of needing to be able to
solve problems that involve more than one computation where the
results of every computation matters somewhere. It's just that in this
case, some of the computations are computed asynchronously.

> I am totally open to learning from Twisted's experience. I hope that
> you are willing to share even if the end result might not look like
> Twisted at all -- after all in Python 3.3 we have "yield from" and
> return from a generator and many years of experience with different
> styles of async APIs. In addition to Twisted, there's Tornado and
> Monocle, and then there's the whole greenlets/gevent and
> Stackless/microthreads community that we can't completely ignore. I
> believe somewhere is an ideal async architecture, and I hope you can
> help us discover it.
> (For example, I am very interested in Twisted's experiences writing
> real-world performant, robust reactors.)

For that stuff, you'd have to speak to the main authors of Twisted.
I'm just a twisted user. :(

In the end it really doesn't matter what API you go with. The Twisted
people will wrap it up so that they are compatible, as far as that is

I hope I haven't detracted too much from the main thrust of the
surrounding discussion. Futures/deferreds are a pretty big tangent, so
sorry. I justified it to myself by figuring that it'd probably come up
anyway, somehow, since these are useful abstractions for asynchronous

-- Devin

From jeanpierreda at  Fri Oct 12 06:40:04 2012
From: jeanpierreda at (Devin Jeanpierre)
Date: Fri, 12 Oct 2012 00:40:04 -0400
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 12, 2012 at 12:29 AM, Devin Jeanpierre
<jeanpierreda at> wrote:
>> If you're writing long complicated chains of callbacks that benefit
>> from these features, IMO you are already doing it wrong. I understand
>> that this is a matter of style where I won't be able to convince you.
>> But style is important to me, so let's agree to disagree.
> This is more than a matter of style, so at least for now I'd like to
> hold off on calling it even.
-- snip boredom --
> together, we can't produce that final synthesis of results at the end.

Ugh, just realized way after the fact that of course you meant
callbacks, not composition. I feel dumb.

Never mind that whole segment.

-- Devin

From trent at  Fri Oct 12 06:45:06 2012
From: trent at (Trent Nelson)
Date: Fri, 12 Oct 2012 00:45:06 -0400
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Thu, Oct 11, 2012 at 07:40:43AM -0700, Antoine Pitrou wrote:
> On Wed, 10 Oct 2012 20:55:23 -0400
> Trent Nelson <trent at> wrote:
> > 
> >     You could leverage this with kqueue and epoll; have similar threads
> >     set up to simply process I/O independent of the GIL, using the same
> >     facilities that would be used by IOCP-processing threads.
> Would you really win anything by doing I/O in separate threads, while
> doing normal request processing in the main thread?

    If the I/O threads can run independent of the GIL, yes, definitely.
    The whole premise of IOCP is that the kernel takes care of waking
    one of your I/O handlers when data is ready.  IOCP allows that to
    happen completely independent of your application's event loop.

    It really is the best way to do I/O.  The Windows NT design team
    got it right from the start.  The AIX and Solaris implementations
    are semantically equivalent to Windows, without the benefit of
    automatic thread pool management (and a few other optimisations).

    On Linux and BSD, you could get similar functionality by spawning
    I/O threads that could also run independent of the GIL.  They would
    differ from the IOCP worker threads in the sense that they all have
    their own little event loops around epoll/kqueue+timeout.  i.e. they
    have to continually ask "is there anything to do with this set of
    fds", then process the results, then manage set synchronisation.

    IOCP threads, on the other hand, wait for completion of something
    that has already been requested.  The thread body implementation is
    significantly simpler, and no synchronisation primitives are needed.
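
    A minimal sketch of the readiness-style loop body described above,
    using the standard selectors module (which picks epoll, kqueue or
    select as available). The function name `read_when_ready` is made
    up for the example, and no attempt is made to model IOCP itself.

```python
import selectors
import socket


def read_when_ready(sock, timeout=1.0):
    """Ask "is there anything to do?", then perform the read ourselves."""
    sel = selectors.DefaultSelector()
    sel.register(sock, selectors.EVENT_READ)
    try:
        events = sel.select(timeout)   # step 1: wait for readiness
        if not events:
            return None                # nothing became readable in time
        return sock.recv(4096)         # step 2: the thread does the I/O
    finally:
        sel.close()


a, b = socket.socketpair()
a.sendall(b"ping")
data = read_when_ready(b)
a.close()
b.close()
```

    In the completion model the two steps are reversed: the read is
    requested first and the thread is only woken once the data has
    already been delivered.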

> That said, the idea of a common API architected around async I/O,
> rather than non-blocking I/O, sounds interesting at least theoretically.

    It's the best way to do it.  There should really be a libevent-type
    library (libiocp?) that leverages IOCP where possible, and fakes it
    when not, using a half-sync/half-async pattern with threads and epoll
    or kqueue on Linux and FreeBSD, falling back to processes and poll
    on everything else (NetBSD, OpenBSD and HP-UX (the former two not
    having robust-enough pthread implementations, the latter not having
    anything better than select or poll)).

    However, given that the best IOCP implementations are a) Windows by
    a huge margin, and then b) Solaris and AIX in equal, distant second
    place, I can't see that happening any time soon.

    (Trying to use IOCP in the reactor fashion described above for epoll
     and kqueue is far more limiting than having an IOCP-oriented API
     and faking it for platforms where native support isn't available.)

> Maybe all those outdated Snakebite Operating Systems are useful for
> something after all. ;-P

    All the operating systems are the latest version available!
    In addition, there's also a Solaris 9 and HP-UX 11iv2 box.
    The hardware, on the other hand... not so new in some cases.


From solipsis at  Fri Oct 12 09:14:54 2012
From: solipsis at (Antoine Pitrou)
Date: Fri, 12 Oct 2012 09:14:54 +0200
Subject: [Python-ideas] asyncore: included batteries don't fit
References: <>
Message-ID: <>

On Fri, 12 Oct 2012 00:29:05 -0400
Devin Jeanpierre <jeanpierreda at>
> These are the examples I remember mentioned in the talk:
> - (not very twistedish
> at all, ill-liked by the speaker)
> - (maybe
> not a good example, mochikit tries to be "python in JS")
> -
> - (also includes an explanation of why
> the author likes deferreds)

Mochikit has been dead for years.

As for the others, just because they are called "Deferred" doesn't mean
they are the same thing. None of them seems to look like Twisted's
Deferred abstraction.

> The reason explicit non-deferred callbacks are involved in Twisted is
> because of situations in which deferreds are not present, because of
> past history in Twisted. It is not at all a limitation of deferreds or
> something futures are better at, best as I'm aware.

A Deferred can only be called once, but a dataReceived method can be
called any number of times. So you can't use a Deferred for
dataReceived unless you introduce significant hackery.
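
A small sketch of why a method fits here: the handler is invoked once
per chunk, however many chunks arrive. `LineCollector` and its
`data_received` method are hypothetical names modelled loosely on the
dataReceived idea; this is not Twisted code.

```python
class LineCollector:
    """Hypothetical protocol object whose handler runs once per chunk."""

    def __init__(self):
        self._buffer = b""
        self.lines = []

    def data_received(self, data):
        # Called any number of times; each call may complete zero or
        # more lines and leave a partial line in the buffer.
        self._buffer += data
        *complete, self._buffer = self._buffer.split(b"\n")
        self.lines.extend(complete)


collector = LineCollector()
for chunk in (b"he", b"llo\nwor", b"ld\n"):
    collector.data_received(chunk)
```

A one-shot object could report only the first of these events, which is
the mismatch described above.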

> Anyway, one big issue is that generator coroutines can't really
> effectively replace callbacks everywhere. Consider the GUI button
> example you gave. How do you write that as a coroutine?
> I can see it being written like this:
>     def mycoroutine(gui):
>         while True:
>             clickevent = yield gui.mybutton1.on_click()
>             # handle clickevent
> But that's probably worse than using callbacks.

Agreed. And that's precisely because your GUI button handler is a
dataReceived-alike :-)



Software development and contracting:

From solipsis at  Fri Oct 12 09:18:25 2012
From: solipsis at (Antoine Pitrou)
Date: Fri, 12 Oct 2012 09:18:25 +0200
Subject: [Python-ideas] asyncore: included batteries don't fit
References: <>
Message-ID: <>

On Fri, 12 Oct 2012 09:14:54 +0200
Antoine Pitrou <solipsis at> wrote:

> On Fri, 12 Oct 2012 00:29:05 -0400
> Devin Jeanpierre <jeanpierreda at>
> wrote:
> > 
> > These are the examples I remember mentioned in the talk:
> > 
> > - (not very twistedish
> > at all, ill-liked by the speaker)
> > - (maybe
> > not a good example, mochikit tries to be "python in JS")
> > -
> > - (also includes an explanation of why
> > the author likes deferreds)
> Mochikit has been dead for years.
> As for the others, just because they are called "Deferred" doesn't mean
> they are the same thing. None of them seems to look like Twisted's
> Deferred abstraction.

Correction: actually, some of them do :-) I should have looked a bit




From _ at  Fri Oct 12 11:25:41 2012
From: _ at (Laurens Van Houtven)
Date: Fri, 12 Oct 2012 11:25:41 +0200
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Oct 11, 2012 at 11:18 PM, Guido van Rossum <guido at> wrote:

> > If there's one take away idea from async-pep, it's reusable protocols.
> Is there a newer version than what's on
> ? It seems to be missing any
> specific proposals, after spending a lot of time giving a rationale
> and defining some terms. The version on
> doesn't seem to be any more complete.


If I had to change it today, I'd throw out consumers and producers and just
stick to a protocol API.

Do you feel that there should be less talk about rationale?

> > The PEP should probably be a number of PEPs. At first sight, it seems
> that
> > this number is at least four:
> >
> > 1. Protocol and transport abstractions, making no mention of
> asynchronous IO
> > (this is what I want 3153 to be, because it's small, manageable, and
> > virtually everyone appears to agree it's a fantastic idea)
> But the devil is in the details. *What* specifically are you
> proposing? How would you write a protocol handler/parser without any
> reference to I/O? Most protocols are two-way streets -- you read some
> stuff, and you write some stuff, then you read some more. (HTTP may be
> the exception here, if you don't keep the connection open.)

It's not that there's *no* reference to IO: it's just that that reference
is abstracted away in data_received and the protocol's transport object,
just like Twisted's IProtocol.
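
To make that concrete, here is a minimal sketch of a protocol that only
ever sees bytes through data_received and only ever writes through its
transport. The names LineProtocol and ListTransport are hypothetical
illustrations, not part of the PEP draft or of Twisted.

```python
class ListTransport:
    """Stand-in transport that just records what the protocol writes."""

    def __init__(self):
        self.written = []

    def write(self, data):
        self.written.append(data)


class LineProtocol:
    """Echoes complete lines, upper-cased; never touches a socket itself."""

    def __init__(self, transport):
        self.transport = transport
        self._buffer = b""

    def data_received(self, data):
        # Bytes may arrive in arbitrary chunks, so buffer until a newline.
        self._buffer += data
        while b"\n" in self._buffer:
            line, self._buffer = self._buffer.split(b"\n", 1)
            self.transport.write(line.upper() + b"\n")


transport = ListTransport()
protocol = LineProtocol(transport)
for chunk in (b"hel", b"lo\nwor", b"ld\n"):
    protocol.data_received(chunk)
```

Because the protocol never imports socket or select, the same class could
in principle sit behind a reactor, a blocking loop, or a test harness.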

> > 2. A base reactor interface
> I agree that this should be a separate PEP. But I do think that in
> practice there will be dependencies between the different PEPs you are
> proposing.


> > 3. A way of structuring callbacks: probably deferreds with a built-in
> > inlineCallbacks for people who want to write synchronous-looking code
> with
> > explicit yields for asynchronous procedures
> Your previous two ideas sound like you're not tied to backward
> compatibility with Tornado and/or Twisted (not even via an adaptation
> layer). Given that we're talking Python 3.4 here that's fine with me
> (though I think we should be careful to offer a path forward for those
> packages and their users, even if it means making changes to the
> libraries).

I'm assuming that by previous ideas you mean points 1, 2: protocol
interface + reactor interface.

I don't see why twisted's IProtocol couldn't grow an adapter for stdlib
Protocols. Ditto for Tornado. Similarly, the reactor interface could be
*provided* (through a fairly simple translation layer) by different
implementations, including twisted.

> But Twisted Deferred is pretty arcane, and I would much
> rather not use it as the basis of a forward-looking design. I'd much
> rather see what we can mooch off PEP 3148 (Futures).

I think this needs to be addressed in a separate mail, since more stuff has
been said about deferreds in this thread.

> > 4+ adapting the stdlib tools to using these new things
> We at least need to have an idea for how this could be done. We're
> talking serious rewrites of many of our most fundamental existing
> synchronous protocol libraries (e.g. httplib, email, possibly even
> io.TextIOWrapper), most of which have had only scant updates even
> through the Python 3 transition apart from complications to deal with
> the bytes/str dichotomy.

I certainly agree that this is a very large amount of work. However, it has
obvious huge advantages in terms of code reuse. I'm not sure if I
understand the technical barrier though. It should be quite easy to create
a blocking API with a protocol implementation that doesn't care; just call
data_received with all your data at once, and presto! (Since transports in
general don't provide guarantees as to how bytes will arrive, existing
Twisted IProtocols have to do this already anyway, and that seems to work
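
A rough sketch of that blocking case, assuming a protocol object with a
data_received method. CollectingProtocol and drive_blocking are made-up
names for illustration; socket.socketpair is only used to keep the example
self-contained.

```python
import socket


class CollectingProtocol:
    """Hypothetical protocol: accumulates bytes, counts data_received calls."""

    def __init__(self):
        self.received = b""
        self.calls = 0

    def data_received(self, data):
        self.received += data
        self.calls += 1


def drive_blocking(sock, protocol, bufsize=4096):
    """Feed a protocol from an ordinary blocking socket until EOF."""
    while True:
        data = sock.recv(bufsize)
        if not data:
            break
        protocol.data_received(data)


left, right = socket.socketpair()
left.sendall(b"GET / HTTP/1.0\r\n\r\n")
left.close()  # signal EOF to the reader

protocol = CollectingProtocol()
drive_blocking(right, protocol)
right.close()
```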

> > Re: forward path for existing asyncore code. I don't remember this being
> > raised as an issue. If anything, it was mentioned in passing, and I think
> > the answer to it was something to the tune of "asyncore's API is broken,
> > fixing it is more important than backwards compat". Essentially I agree
> with
> > Guido that the important part is an upgrade path to a good third-party
> > library, which is the part about asyncore that REALLY sucks right now.
> I have the feeling that the main reason asyncore sucks is that it
> requires you to subclass its Dispatcher class, which has a rather
> treacherous interface.

There's at least a few others, but sure, that's an obvious one. Many of the
objections I can raise however don't matter if there's already an *existing
working solution*. I mean, sure, it can't do SSL, but if you have code that
does what you want right now, then obviously SSL isn't actually needed.

> > Regardless, an API upgrade is probably a good idea. I'm not sure if it
> > should go in the first PEP: given the separation I've outlined above
> (which
> > may be too spread out...), there's no obvious place to put it besides it
> > being a new PEP.
> Aren't all your proposals API upgrades?

Sorry, that was incredibly poor wording. I meant something more of an
adapter: an upgrade path for existing asyncore code to new and shiny 3153

> > Re base reactor interface: drawing maximally from the lessons learned in
> > twisted, I think IReactorCore (start, stop, etc), IReactorTime (call
> later,
> > etc), asynchronous-looking name lookup, fd handling are the important
> parts.
> That actually sounds more concrete than I'd like a reactor interface
> to be. In the App Engine world, there is a definite need for a
> reactor, but it cannot talk about file descriptors at all -- all I/O
> is defined in terms of RPC operations which have their own (several
> layers of) async management but still need to be plugged in to user
> code that might want to benefit from other reactor functionality such
> as scheduling and placing a call at a certain moment in the future.

I have a hard time understanding how that would work well outside of
something like GAE. IIUC, that level of abstraction was chosen because it
made sense for GAE (and I don't disagree), but I'm not sure it makes sense

In this example, where would eg the select/epoll/whatever calls happen? Is
it something that calls the reactor that then in turn calls whatever?

> > call_every can be implemented in terms of call_later on a separate object,
> > so I think it should be (eg twisted.internet.task.LoopingCall). One thing
> > that is apparently forgotten about is event loop integration. The prime way
> > of having two event loops cooperate is *NOT* "run both in parallel", it's
> > "have one call the other". Even though not all loops support this, I think
> > it's important to get this as part of the interface (raise an exception for
> > all I care if it doesn't work).
> This is definitely one of the things we ought to get right. My own
> thoughts are slightly (perhaps only cosmetically) different again:
> ideally each event loop would have a primitive operation to tell it to
> run for a little while, and then some other code could tie several
> event loops together.

As an API, that's pretty close to Twisted's IReactorCore.iterate, I think.
It'd work well enough. The issue is only with event loops that don't
cooperate so well.

> Possibly the primitive operation would be something like "block until
> either you've got one event ready, or until a certain time (possibly
> 0) has passed without any events, and then give us the events that are
> ready and a lower bound for when you might have more work to do" -- or
> maybe instead of returning the event(s) it could just call the
> associated callback (it might have to if it is part of a GUI library
> that has callbacks written in C/C++ for certain events like screen
> refreshes).
> Anyway, it would be good to have input from representatives from Wx,
> Qt, Twisted and Tornado to ensure that the *functionality* required is
> all there (never mind the exact signatures of the APIs needed to
> provide all that functionality).

>  --
> --Guido van Rossum (


From _ at  Fri Oct 12 11:29:06 2012
From: _ at (Laurens Van Houtven)
Date: Fri, 12 Oct 2012 11:29:06 +0200
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

I'm not quite sure why Deferreds + @inlineCallbacks is more complicated
than Futures + coroutines. They seem, at least from a high level
perspective, quite similar. You mention that you can both attach callbacks
and use them in coroutines: deferreds do pretty much exactly the same thing
(that is, at least there's something to translate your coroutine into a
sequence of callbacks/errbacks).

If the arcane part of deferreds is from people writing ridiculous
errback/callback chains, then I understand. Unfortunately people will write
terrible code.
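
For comparison, here is a small sketch of the Futures + coroutines side:
a driver that resumes a generator whenever the Future it yielded
completes. The run and fetch_length names are hypothetical, and this is
only an illustration of the idea, not a proposed API. The shape is close
to what @inlineCallbacks does with Deferreds.

```python
from concurrent.futures import Future


def run(gen):
    """Drive a generator that yields Futures; return a Future for its result."""
    outcome = Future()

    def step(send_value=None, exc=None):
        try:
            if exc is not None:
                yielded = gen.throw(exc)
            else:
                yielded = gen.send(send_value)
        except StopIteration as stop:
            outcome.set_result(stop.value)
            return
        except Exception as error:
            outcome.set_exception(error)
            return
        # Resume the generator when the yielded Future completes.
        yielded.add_done_callback(resume)

    def resume(fut):
        error = fut.exception()
        if error is not None:
            step(exc=error)
        else:
            step(fut.result())

    step()
    return outcome


def fetch_length(body_future):
    # Reads like synchronous code, but the yield is a suspension point.
    body = yield body_future
    return len(body)


body_future = Future()
result = run(fetch_length(body_future))
body_future.set_result(b"hello")  # the "I/O" completes later
```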


From sturla at  Fri Oct 12 12:39:37 2012
From: sturla at (Sturla Molden)
Date: Fri, 12 Oct 2012 12:39:37 +0200
Subject: [Python-ideas] Floating point contexts in Python core
In-Reply-To: <>
References: <>
Message-ID: <>

On 12.10.2012 03:03, Steven D'Aprano wrote:

 > Any half-decent processor supports the IEEE-754 standard. If it doesn't,
 > it's broken by design.
 > Even in user-space, you're not giving up that much speed in practical
 > terms, at least not for my needs. The new decimal module in Python 3.3 is
 > less than a factor of 10 times slower than Python's floats, which makes it
 > pretty much instantaneous to my mind :)

It will not have any effect on the flops rate. The other stuff the 
interpreter must do when using floats (allocating and deleting float 
objects on the heap, initializing new objects, etc.) will dominate the 
run-time performance. Even a simple check for divide-by-zero (as we have 
today) will be more expensive than using another numerical context 
inside the hardware.


From jeanpierreda at  Fri Oct 12 14:44:39 2012
From: jeanpierreda at (Devin Jeanpierre)
Date: Fri, 12 Oct 2012 08:44:39 -0400
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 12, 2012 at 3:14 AM, Antoine Pitrou <solipsis at> wrote:
> Mochikit has been dead for years.

From the front page: "MochiKit is "feature complete" at 1.4 and not
currently in active development. It has done what we've needed it to
do for a number of years so we haven't bothered to make any major
changes to it."

Last update to the github repository was a few months ago.

That said, looking at their APIs now, I'm pretty sure mochikit was not
in that presentation. Its API isn't jQuery-like.

> As for the others, just because they are called "Deferred" doesn't mean
> they are the same thing. None of them seems to look like Twisted's
> Deferred abstraction.

They have separate callbacks for error and success, which are passed
values. That is the same. The callback chains are formed from
sequences of deferreds. That's different. If a callback returns a
deferred, then the rest of the chain is only called once that deferred
resolves -- that's the same, and super important.
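
To illustrate the shared semantics (separate success/error callbacks,
fire-once, and pausing the chain when a callback returns another
deferred), here is a toy sketch. ToyDeferred is a made-up class written
for this example; it is not how Twisted or any of the JS libraries
actually implement it.

```python
class ToyDeferred:
    """Toy illustration only; not Twisted's or jQuery's implementation."""

    def __init__(self):
        self._callbacks = []      # (on_success, on_error) pairs, in order
        self._fired = False
        self._paused = False
        self._result = None
        self._failed = False

    def add_callbacks(self, on_success, on_error=None):
        self._callbacks.append((on_success, on_error))
        if self._fired:
            self._run()
        return self

    def callback(self, value):
        self._fire(value, failed=False)

    def errback(self, error):
        self._fire(error, failed=True)

    def _fire(self, result, failed):
        if self._fired:
            raise RuntimeError("a deferred can only be fired once")
        self._fired = True
        self._result, self._failed = result, failed
        self._run()

    def _resume(self, result, failed):
        self._result, self._failed = result, failed
        self._paused = False
        self._run()

    def _run(self):
        while self._callbacks and not self._paused:
            on_success, on_error = self._callbacks.pop(0)
            handler = on_error if self._failed else on_success
            if handler is None:
                continue          # nothing registered for this branch
            try:
                self._result, self._failed = handler(self._result), False
            except Exception as error:
                self._result, self._failed = error, True
            if isinstance(self._result, ToyDeferred):
                # A callback returned a deferred: wait for it to resolve.
                self._paused = True
                self._result.add_callbacks(
                    lambda value: self._resume(value, False),
                    lambda error: self._resume(error, True),
                )


log = []
inner = ToyDeferred()
outer = ToyDeferred()
outer.add_callbacks(lambda v: inner)          # returns a deferred: chain pauses
outer.add_callbacks(lambda v: log.append(v))  # runs only once inner resolves
outer.callback("start")
log.append("outer fired")
inner.callback("inner result")

failed = ToyDeferred()
failed.add_callbacks(lambda v: log.append("unreachable"),
                     lambda e: log.append(str(e)))
failed.errback(ValueError("boom"))
```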

There's some API differences, like .addCallbacks() --> .then(); and
.callback() --> .resolve(). And IIRC jQuery had other differences, but
maybe it's just that you use .pipe() to chain deferreds because
.then() returns a Promise instead of a Deferred? I don't remember what
was weird about jQuery, it's been a while since that talk. :(

>> The reason explicit non-deferred callbacks are involved in Twisted is
>> because of situations in which deferreds are not present, because of
>> past history in Twisted. It is not at all a limitation of deferreds or
>> something futures are better at, best as I'm aware.
> A Deferred can only be called once, but a dataReceived method can be
> called any number of times. So you can't use a Deferred for
> dataReceived unless you introduce significant hackery.

Haha, oops! I was being dumb and only thinking of minor cases when
callbacks are used, rather than major cases.

Some people complain that Twisted's protocols (and dataReceived)
should be like that GUI button example, though. Not major hackery,
just somewhat nasty and bug-prone.

-- Devin

From syrion at  Fri Oct 12 14:45:49 2012
From: syrion at (Blake Hyde)
Date: Fri, 12 Oct 2012 08:45:49 -0400
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

I'm a Python developer rather than a developer of Python, but I'd like to ask a
question about this option (and implicitly vote against it, I suppose); if you
specialize a method name, such as .pathjoin, aren't you implying that methods
must be unambiguous even across types and classes?  This seems negative.  Even
if .join is already used for strings, it also makes sense for this use case.

Of course, the proposed syntactic sugar options (operator overloading) seem
more pathological than either of the method-based options, so I suppose you
could consider my votes as -1 to everything else, +.5 to .pathjoin, and +1 to

On Mon, Oct 8, 2012 at 2:54 PM, Guido van Rossum <guido at> wrote:
> I don't like any of those; I'd vote for another regular method, maybe
> p.pathjoin(q).
> On Mon, Oct 8, 2012 at 11:47 AM, Antoine Pitrou <solipsis at> wrote:
> >
> > Hello,
> >
> > Since there has been some controversy about the joining syntax used in
> > PEP 428 (filesystem path objects), I would like to run an informal poll
> > about it. Please answer with +1/+0/-0/-1 for each proposal:
> >
> > - `p[q]` joins path q to path p
> > - `p + q` joins path q to path p
> > - `p / q` joins path q to path p
> > - `p.join(q)` joins path q to path p
> >
> > (you can include a rationale if you want, but don't forget to vote :-))
> >
> > Thank you
> >
> > Antoine.
> >
> >
> > --
> > Software development and contracting:
> >
> >
> > _______________________________________________
> > Python-ideas mailing list
> > Python-ideas at
> >
> --
> --Guido van Rossum (
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at

From g.brandl at  Fri Oct 12 18:12:26 2012
From: g.brandl at (Georg Brandl)
Date: Fri, 12 Oct 2012 18:12:26 +0200
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <k59ff3$i5i$>

Am 12.10.2012 14:45, schrieb Blake Hyde:
> I'm a Python developer rather than a developer of Python, but I'd like to ask a
> question about this option (and implicitly vote against it, I suppose); if you
> specialize a method name, such as .pathjoin, aren't you implying that methods
> must be unambiguous even across types and classes?  This seems negative.  Even
> if .join is already used for strings, it also makes sense for this use case.

Of course different classes can have methods of the same name.

The issue here is that due to the similarity (and interchangeability) of path
objects and strings it is likely that people get them mixed up every now and
then, and if .join() works on both objects the failure mode (strange result
from str.join when you expected path.join) is horribly confusing.

It's the same argument against the "+" operator.  (Apart from the other downside
that it will act differently depending on *two* objects, i.e. both operands.)
In contrast, the "/" operator is not defined on strings, but on numbers, and
both the confusion likelihood and failure mode of mixing numbers and
strings are much less severe.

It's really kind of the same reason why integer floor division was awkward
with "/", and has been changed to "//" in Python 3.
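
A short sketch of the two failure modes. It uses the pathlib module as it
later shipped in the standard library, which postdates this discussion, so
treat it as an illustration of the argument rather than of the PEP draft
under debate.

```python
from pathlib import PurePosixPath

base = PurePosixPath("/usr")

# str.join treats its argument as an iterable of characters, so a mix-up
# silently produces a plausible-looking but wrong string.
confused = "/usr".join("lib")

# The "/" operator joins path components; joinpath() is the method spelling.
joined = base / "lib"
same = base.joinpath("lib")

# Plain strings do not define "/", so forgetting to build a path fails loudly.
try:
    "/usr" / "lib"
except TypeError as error:
    failure = type(error).__name__
```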


From ethan at  Fri Oct 12 18:12:02 2012
From: ethan at (Ethan Furman)
Date: Fri, 12 Oct 2012 09:12:02 -0700
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>	<>	<>	<>	<>	<>	<>
Message-ID: <>

Greg Ewing wrote:
> Ronald Oussoren wrote:
>> neither statfs, statvfs, nor pathconf seem to be able to tell if a 
>> filesystem is case insensitive.
> Even if they could, you wouldn't be entirely out of the woods,
> because different parts of the same path can be on different
> file systems...
> But how important is all this anyway? I'm trying to think of
> occasions when I've wanted to compare two entire paths for
> equality, and I can't think of *any*.

Well, while I haven't had to compare the /entire/ path, I have had to 
compare (and sort) the filename portion.  And since the SMB share uses 
lower-case, and our legacy FoxPro code writes upper-case, and files get 
copied from SMB to the local Windows drive, having the case-insensitive 
compare option in Path makes my life much easier.
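
As a sketch of that use case, using the pathlib module as it later shipped
(an assumption relative to the PEP draft discussed here; the file names
are made up):

```python
from pathlib import PurePosixPath, PureWindowsPath

# Windows-flavoured pure paths compare case-insensitively...
windows_equal = (PureWindowsPath("DATA/ORDERS.DBF")
                 == PureWindowsPath("data/orders.dbf"))
# ...while POSIX-flavoured ones do not.
posix_equal = (PurePosixPath("DATA/ORDERS.DBF")
               == PurePosixPath("data/orders.dbf"))

# Sorting only the filename portion without regard to case.
names = ["orders.DBF", "Customers.dbf", "ITEMS.dbf"]
ordered = sorted(names, key=str.casefold)
```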


From ethan at  Fri Oct 12 18:27:48 2012
From: ethan at (Ethan Furman)
Date: Fri, 12 Oct 2012 09:27:48 -0700
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <k59ff3$i5i$>
References: <>	<>	<>
Message-ID: <>

Georg Brandl wrote:
> Am 12.10.2012 14:45, schrieb Blake Hyde:
>> I'm a Python developer rather than a developer of Python, but I'd like to ask a
>> question about this option (and implicitly vote against it, I suppose); if you
>> specialize a method name, such as .pathjoin, aren't you implying that methods
>> must be unambiguous even across types and classes?  This seems negative.  Even
>> if .join is already used for strings, it also makes sense for this use case.
> Of course different classes can have methods of the same name.
> The issue here is that due to the similarity (and interchangeability) of path
> objects and strings it is likely that people get them mixed up every now and
> then, and if .join() works on both objects the failure mode (strange result
> from str.join when you expected path.join) is horribly confusing.

I don't understand the "horribly confusing" part.  Sure, when I got them 
mixed up and ended up with a plain ol' string instead of a really cool 
Path it took a moment to figure out where I had made the error, but the 
traceback of "AttributeError: 'str' object has no attribute 'path'" left 
absolutely no room for confusion as to what the problem was.


From guido at  Fri Oct 12 18:41:10 2012
From: guido at (Guido van Rossum)
Date: Fri, 12 Oct 2012 09:41:10 -0700
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

I am going to start some new threads on this topic, to avoid going
over 100 messages. Topics will be roughly:

- reactors
- protocol implementations
- Twisted (esp. Deferred)
- Tornado
- yield from vs. Futures

It may be a while (hours, not days).

--Guido van Rossum (

From dickinsm at  Fri Oct 12 19:26:18 2012
From: dickinsm at (Mark Dickinson)
Date: Fri, 12 Oct 2012 18:26:18 +0100
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
	<> <20121009043236.GI27445@ando>
	<> <>
Message-ID: <>

On Thu, Oct 11, 2012 at 2:20 AM, Steven D'Aprano <steve at> wrote:
> E.g. log(x) should return -infinity if x underflows from a positive value,
> and a NaN if x underflows from a negative.

IEEE 754 disagrees. :-)  Both log(-0.0) and log(0.0) are required to
return -infinity (and/or signal the divideByZero exception).

And as for sqrt(-0.0) returning -0.0...  Grr.  I've never understood
the motivation for that one, especially as it disagrees with the usual
recommendations for complex square root (where the real part of the
result *always* has its sign bit cleared).
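
For reference, a small sketch of how CPython's math and cmath modules
behave on signed zeros. Note that math.log raises ValueError for either
zero rather than returning -infinity; that is Python's way of reporting
the condition, not a statement about what IEEE 754 requires.

```python
import cmath
import math

neg_zero = -0.0

# The two zeros compare equal but carry different sign bits.
zeros_equal = (neg_zero == 0.0)
sign_of_neg_zero = math.copysign(1.0, neg_zero)

# sqrt preserves the sign of zero.
sqrt_sign = math.copysign(1.0, math.sqrt(neg_zero))

# The complex square root returns a real part with the sign bit cleared.
complex_sqrt_sign = math.copysign(
    1.0, cmath.sqrt(complex(neg_zero, 0.0)).real)


def log_outcome(x):
    """Return math.log(x), or the string "ValueError" if it raises."""
    try:
        return math.log(x)
    except ValueError:
        return "ValueError"


log_results = (log_outcome(0.0), log_outcome(neg_zero))

# atan2 is one function whose result does depend on the sign of zero.
angle_pos = math.atan2(0.0, 0.0)
angle_neg = math.atan2(0.0, neg_zero)
```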


From guido at  Fri Oct 12 20:13:23 2012
From: guido at (Guido van Rossum)
Date: Fri, 12 Oct 2012 11:13:23 -0700
Subject: [Python-ideas] The async API of the future: Reactors
Message-ID: <>

[This is the first spin-off thread from "asyncore: included batteries
don't fit"]

On Thu, Oct 11, 2012 at 5:57 PM, Ben Darnell <ben at> wrote:
> On Thu, Oct 11, 2012 at 2:18 PM, Guido van Rossum <guido at> wrote:
>>> Re base reactor interface: drawing maximally from the lessons learned in
>>> twisted, I think IReactorCore (start, stop, etc), IReactorTime (call later,
>>> etc), asynchronous-looking name lookup, fd handling are the important parts.
>> That actually sounds more concrete than I'd like a reactor interface
>> to be. In the App Engine world, there is a definite need for a
>> reactor, but it cannot talk about file descriptors at all -- all I/O
>> is defined in terms of RPC operations which have their own (several
>> layers of) async management but still need to be plugged in to user
>> code that might want to benefit from other reactor functionality such
>> as scheduling and placing a call at a certain moment in the future.
> So are you thinking of something like
> reactor.add_event_listener(event_type, event_params, func)?  One thing
> to keep in mind is that file descriptors are somewhat special (at
> least in a level-triggered event loop), because of the way the event
> will keep firing until the socket buffer is drained or the event is
> unregistered.  I'd be inclined to keep file descriptors in the
> interface even if they just raise an error on app engine, since
> they're fairly fundamental to the (unixy) event loop.  On the other
> hand, I don't have any experience with event loops outside the
> unix/network world so I don't know what other systems might need for
> their event loops.

Hmm... This is definitely an interesting issue. I'm tempted to believe
that it is *possible* to change every level-triggered setup into an
edge-triggered setup by using an explicit loop -- but I'm not saying
it is a good idea. In practice I think we need to support both equally
well, so that the *app* can decide which paradigm to use. E.g. if I
were to implement an HTTP server, I might use level-triggered for the
"accept" call on the listening socket, but edge-triggered for
everything else. OTOH someone else might prefer a buffered stream
abstraction that just keeps filling its read buffer (and draining its
write buffer) using level-triggered callbacks, at least up to a
certain buffer size -- we have to be robust here and make it
impossible for an evil client to fill up all our memory without our

I'm not at all familiar with the Twisted reactor interface. My own
design would be along the following lines:

- There's an abstract Reactor class and an abstract Async I/O object
class. To get a reactor to call you back, you must give it an I/O
object, a callback, and maybe some more stuff. (I have gone back and forth,
but I like passing optional args for the callback, rather than requiring
lambdas to create closures.) Note that the callback is *not* a
designated method on the I/O object! In order to distinguish between
edge-triggered and level-triggered, you just use a different reactor
method. There could also be a reactor method to schedule a "bare"
callback, either after some delay, or immediately (maybe with a given
priority), although such functionality could also be implemented
through magic I/O objects.

- In systems supporting file descriptors, there's a reactor
implementation that knows how to use select/poll/etc., and there are
concrete I/O object classes that wrap file descriptors. On Windows,
those would only be socket file descriptors. On Unix, any file
descriptor would do. To create such an I/O object you would use a
platform-specific factory. There would be specialized factories to
create e.g. listening sockets, connections, files, pipes, and so on.

- In systems like App Engine that don't support async I/O on file
descriptors at all, the constructors for creating I/O objects for disk
files and connection sockets would comply with the interface but fake
out almost everything (just like today, using httplib or httplib2 on
App Engine works by adapting them to a "urlfetch" RPC request).
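
A very rough sketch of the first two points, using the selectors module
(which postdates this thread). ToyReactor and its method names are
hypothetical. Only a level-triggered add_reader is shown; an
edge-triggered variant would be a separate method. run_once also
illustrates the "run for a little while, then report when more work is
due" primitive.

```python
import heapq
import itertools
import selectors
import socket
import time


class ToyReactor:
    """Hypothetical reactor; method names are illustrative, not a proposed API."""

    def __init__(self):
        self._selector = selectors.DefaultSelector()
        self._timers = []              # heap of (when, seq, callback, args)
        self._seq = itertools.count()  # tie-breaker so callbacks never compare

    def add_reader(self, io_obj, callback, *args):
        # Level-triggered: fires on every pass while io_obj stays readable.
        self._selector.register(io_obj, selectors.EVENT_READ, (callback, args))

    def remove_reader(self, io_obj):
        self._selector.unregister(io_obj)

    def call_later(self, delay, callback, *args):
        when = time.monotonic() + delay
        heapq.heappush(self._timers, (when, next(self._seq), callback, args))

    def run_once(self, timeout):
        """Wait up to `timeout` seconds for events, then run what is due.

        Returns the number of seconds until the next timer, or None.
        """
        if self._timers:
            until_timer = max(0.0, self._timers[0][0] - time.monotonic())
            timeout = min(timeout, until_timer)
        for key, _events in self._selector.select(timeout):
            callback, args = key.data
            callback(*args)
        now = time.monotonic()
        while self._timers and self._timers[0][0] <= now:
            _when, _seq, callback, args = heapq.heappop(self._timers)
            callback(*args)
        if self._timers:
            return max(0.0, self._timers[0][0] - time.monotonic())
        return None


events = []
reader, writer = socket.socketpair()
reactor = ToyReactor()


def on_readable(sock, label):
    events.append((label, sock.recv(1024)))
    reactor.remove_reader(sock)


reactor.add_reader(reader, on_readable, reader, "net")
reactor.call_later(0, events.append, "timer")
writer.sendall(b"ping")
next_due = reactor.run_once(timeout=1.0)
reader.close()
writer.close()
```

Note that the callback and its optional args are passed to the reactor
directly, so no lambda is needed and the callback is not a method on the
I/O object.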

>>> call_every can be implemented in terms of call_later on a separate object,
>>> so I think it should be (eg twisted.internet.task.LoopingCall). One thing
>>> that is apparently forgotten about is event loop integration. The prime way
>>> of having two event loops cooperate is *NOT* "run both in parallel", it's
>>> "have one call the other". Even though not all loops support this, I think
>>> it's important to get this as part of the interface (raise an exception for
>>> all I care if it doesn't work).
>> This is definitely one of the things we ought to get right. My own
>> thoughts are slightly (perhaps only cosmetically) different again:
>> ideally each event loop would have a primitive operation to tell it to
>> run for a little while, and then some other code could tie several
>> event loops together.
>> Possibly the primitive operation would be something like "block until
>> either you've got one event ready, or until a certain time (possibly
>> 0) has passed without any events, and then give us the events that are
>> ready and a lower bound for when you might have more work to do" -- or
>> maybe instead of returning the event(s) it could just call the
>> associated callback (it might have to if it is part of a GUI library
>> that has callbacks written in C/C++ for certain events like screen
>> refreshes).
> That doesn't work very well - while one loop is waiting for its
> timeout, nothing can happen on the other event loop.  You have to
> switch back and forth frequently to keep things responsive, which is
> inefficient.  I'd rather give each event loop its own thread; you can
> minimize the thread-synchronization concerns by picking one loop as
> "primary" and having all the others just pass callbacks over to it
> when their events fire.

That's a good point. I suppose on systems that support both networking
and GUI events, in my design these would use different I/O objects
(created using different platform-specific factories) and the shared
reactor API would sort things out based on the type of I/O object
passed in to it.
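
The thread-per-loop arrangement Ben describes in the quoted text can be
sketched with a plain queue: secondary loops never run callbacks
themselves, they only hand them to the primary loop. The function names
and the sentinel convention here are assumptions made for the example.

```python
import queue
import threading

handoff = queue.Queue()      # thread-safe channel into the primary loop
handled = []


def secondary_loop(name, events):
    # Stand-in for a GUI or other foreign event loop running in its own thread.
    for event in events:
        handoff.put((handled.append, (name, event)))
    handoff.put(None)        # sentinel: this loop is finished


def primary_loop(expected_sentinels):
    # Only this thread ever runs callbacks, so they need no locking.
    while expected_sentinels:
        item = handoff.get()
        if item is None:
            expected_sentinels -= 1
            continue
        callback, arg = item
        callback(arg)


threads = [
    threading.Thread(target=secondary_loop, args=("gui", ["click", "key"])),
    threading.Thread(target=secondary_loop, args=("net", ["packet"])),
]
for thread in threads:
    thread.start()
primary_loop(expected_sentinels=len(threads))
for thread in threads:
    thread.join()
```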

Note that many GUI events would be level-triggered, but sometimes
using the edge-triggered paradigm can work well too: e.g. I imagine
that writing code to draw a curve following the mouse as long as a
button is pressed might be conveniently written as a loop of the form

def on_mouse_press(x, y, buttons):
  <set up polygon starting current x, y>
  while True:
    x, y, buttons = yield <get mouse event>
    if not buttons:
      break
    <extend polygon to x, y>
  <finish polygon>

which itself is registered as a level-triggered handler for mouse
presses. (Dealing with multiple buttons is left as an exercise. :-)
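
A runnable rendering of that pseudocode, assuming mouse events arrive as
(x, y, buttons) tuples. The dispatch function is a hypothetical stand-in
for whatever reactor would feed events into the generator.

```python
def on_mouse_press(x, y, buttons):
    polygon = [(x, y)]                 # set up polygon at the press position
    while True:
        x, y, buttons = yield          # suspend until the next mouse event
        if not buttons:
            break                      # button released: stop tracking
        polygon.append((x, y))         # extend polygon to x, y
    return polygon                     # finish polygon


def dispatch(events):
    """Hypothetical driver standing in for the reactor feeding mouse events."""
    first, rest = events[0], events[1:]
    handler = on_mouse_press(*first)
    next(handler)                      # run up to the first yield
    for event in rest:
        try:
            handler.send(event)
        except StopIteration as stop:
            return stop.value
    return None


curve = dispatch([(0, 0, 1), (1, 2, 1), (3, 4, 1), (3, 4, 0)])
```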

--Guido van Rossum (

From ncoghlan at  Fri Oct 12 20:26:44 2012
From: ncoghlan at (Nick Coghlan)
Date: Sat, 13 Oct 2012 04:26:44 +1000
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
	<k59ff3$i5i$> <>
Message-ID: <>

On Sat, Oct 13, 2012 at 2:27 AM, Ethan Furman <ethan at> wrote:
> Georg Brandl wrote:
>> Am 12.10.2012 14:45, schrieb Blake Hyde:
>>> I'm a Python developer rather than a developer of Python, but I'd like to
>>> ask a
>>> question about this option (and implicitly vote against it, I suppose);
>>> if you
>>> specialize a method name, such as .pathjoin, aren't you implying that
>>> methods
>>> must be unambiguous even across types and classes?  This seems negative.
>>> Even
>>> if .join is already used for strings, it also makes sense for this use
>>> case.
>> Of course different classes can have methods of the same name.
>> The issue here is that due to the similarity (and interchangeability) of
>> path
>> objects and strings it is likely that people get them mixed up every now
>> and
>> then, and if .join() works on both objects the failure mode (strange
>> result
>> from str.join when you expected path.join) is horribly confusing.
> I don't understand the "horribly confusing" part.  Sure, when I got them
> mixed up and ended up with a plain ol' string instead of a really cool Path
> it took a moment to figure out where I had made the error, but the traceback
> of "AttributeError: 'str' object has no attribute 'path'" left absolutely no
> room for confusion as to what the problem was.

Now, instead of retrieving an attribute, call str() and send the path
name over a pipe or socket, or save it to a file. Instead of an
immediate error, you'll get a bad path somewhere *else*, and have to
track down where the data corruption came from (which may not even be in
the current process, or in a process that was even running on the
current machine).

Making "+" and "Path.join" mean something different from what they
mean when called on strings is, in the specific case of a path
representation, far too likely to lead to data corruption bugs for us
to be happy with allowing it in the standard library. This is one I
think Jason Orendorff's original got right, which is why my
current preference is "just copy and use / and Path.joinpath".


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From solipsis at  Fri Oct 12 20:33:11 2012
From: solipsis at (Antoine Pitrou)
Date: Fri, 12 Oct 2012 20:33:11 +0200
Subject: [Python-ideas] The async API of the future: Reactors
References: <>
Message-ID: <>

Hello Guido,

On Fri, 12 Oct 2012 11:13:23 -0700
Guido van Rossum <guido at> wrote:
> OTOH someone else might prefer a buffered stream
> abstraction that just keeps filling its read buffer (and draining its
> write buffer) using level-triggered callbacks, at least up to a
> certain buffer size -- we have to be robust here and make it
> impossible for an evil client to fill up all our memory without our
> approval!

I'd like to know what a sane buffered API for non-blocking I/O may look
like, because right now it doesn't seem to make a lot of sense. At
least this bug is tricky to resolve:

> - There's an abstract Reactor class and an abstract Async I/O object
> class. To get a reactor to call you back, you must give it an I/O
> object, a callback, and maybe some more stuff. (I have gone back and forth,
> but I like passing optional args for the callback, rather than requiring
> lambdas to create closures.) Note that the callback is *not* a
> designated method on the I/O object!

Why isn't it? In practice, you need several callbacks: in Twisted
parlance, you have dataReceived but also e.g. connectionLost
(depending on the transport, you may even imagine other callbacks, for
example for things happening on the TLS layer?).

> - In systems supporting file descriptors, there's a reactor
> implementation that knows how to use select/poll/etc., and there are
> concrete I/O object classes that wrap file descriptors. On Windows,
> those would only be socket file descriptors. On Unix, any file
> descriptor would do.

Windows *is* able to do async I/O on things other than sockets (see the
discussion about IOCP). It's just that the Windows implementation of
select() (the POSIX function call) is limited to sockets.




From solipsis at  Fri Oct 12 20:34:23 2012
From: solipsis at (Antoine Pitrou)
Date: Fri, 12 Oct 2012 20:34:23 +0200
Subject: [Python-ideas] PEP 428: poll about the joining syntax
References: <>
	<k59ff3$i5i$> <>
Message-ID: <>

On Sat, 13 Oct 2012 04:26:44 +1000
Nick Coghlan <ncoghlan at> wrote:
> This is one I
> think Jason Orendorff's original got right, which is why my
> current preference is "just copy and use / and Path.joinpath".

This is my current preference too. I don't like joinpath(), but as long
as there is an operator to do the same thing I don't care :-)




From ethan at  Fri Oct 12 20:37:13 2012
From: ethan at (Ethan Furman)
Date: Fri, 12 Oct 2012 11:37:13 -0700
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>	<>	<>	<k59ff3$i5i$>	<>
Message-ID: <>

Nick Coghlan wrote:
> Making "+" and "Path.join" mean something different from what they
> mean when called on strings is, in the specific case of a path
> representation, far too likely to lead to data corruption bugs for us
> to be happy with allowing it in the standard library. This is one I
> think Jason Orendorff's original got right, which is why my
> current preference is "just copy and use / and Path.joinpath".

Okay, that makes sense.

I think we should settle on one of the possibilities that does /not/ 
duplicate the word 'path', however.  That's one of those things that 
drives me nuts.  ;)  (It's a Path object -- of /course/ it's joining 
path stuff!)


From at  Fri Oct 12 20:42:38 2012
From: at (Joshua Landau)
Date: Fri, 12 Oct 2012 19:42:38 +0100
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
	<> <20121009043236.GI27445@ando>
	<> <>
Message-ID: <>

On 11 October 2012 02:20, Steven D'Aprano <steve at> wrote:

> On 11/10/12 09:05, Joshua Landau wrote:
>  After re-re-reading this thread, it turns out one *(1)* post and two
>> *(2)* answers
>> to that post have covered a topic very similar to the one I have raised.
>> All of the others, to my understanding, do not dwell over the fact
>> that *float("nan") is not float("nan")* .
> That's no different from any other float.
> py> float('nan') is float('nan')
> False
> py> float('1.5') is float('1.5')
> False
> Floats are not interned or cached, although of course interning is
> implementation dependent and this is subject to change without notice.
> For that matter, it's true of *nearly all builtins* in Python. The
> exceptions being bool(obj) which returns one of two fixed instances,
> and int() and str(), where *some* but not all instances are cached.

>>> float(1.5) is float(1.5)
True
>>> float("1.5") is float("1.5")
False

Confusing re-use of identity strikes again. Can anyone care to explain what
causes this? I understand float(1.5) is likely to return the inputted
float, but that's as far as I can reason.

What I was saying, though, is that all other posts assumed equality between
two different NaNs should be the same as identity between a NaN and itself.
This is what I'm really asking about, I guess.
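
A small sketch of the distinction being asked about, separating equality from identity for a NaN. The variable names are only illustrative, and the last line relies on CPython creating a fresh float object for each float('nan') call, which other implementations need not do.

```python
nan = float('nan')

# Equality: under IEEE 754 a NaN never compares equal, not even to itself.
equal_to_itself = (nan == nan)

# Identity: the very same object is, of course, itself.
identical_to_itself = (nan is nan)

# Containers test identity before equality, so membership succeeds for
# the same NaN object even though nan == nan is false.
same_object_in_list = nan in [nan]

# A different NaN object is neither identical nor equal (CPython behaviour).
other_object_in_list = float('nan') in [nan]

print(equal_to_itself, identical_to_itself,
      same_object_in_list, other_object_in_list)
```

So "equality between two different NaNs" and "identity between a NaN and itself" really are separate questions, and the built-in containers answer the second one first.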

>  Response 1:
>> This implies that you want to differentiate between -0.0 and +0.0. That is
>> bad.
>> My response:
>> Why would I want to do that?
> If you are doing numeric work, you *should* differentiate between -0.0
> and 0.0. That's why the IEEE 754 standard mandates a -0.0.
> Both -0.0 and 0.0 compare equal, but they can be distinguished (although
> doing so is tricky in Python). The reason for distinguishing them is to
> distinguish between underflow to zero from positive or negative values.
> E.g. log(x) should return -infinity if x underflows from a positive value,
> and a NaN if x underflows from a negative.


Can you give me a more explicit example? When would you not *want* f(-0.0)
to always return the result of f(0.0)? [aka, for -0.0 to warp into 0.0 on
creation]

From guido at  Fri Oct 12 21:18:34 2012
From: guido at (Guido van Rossum)
Date: Fri, 12 Oct 2012 12:18:34 -0700
Subject: [Python-ideas] The async API of the future: yield-from
Message-ID: <>

[This is the second spin-off thread from "asyncore: included batteries
don't fit"]

On Thu, Oct 11, 2012 at 6:32 PM, Greg Ewing <greg.ewing at> wrote:
> Guido van Rossum wrote:
>> It does bother me somehow that you're not using .send() and yield
>> arguments at all. I notice that you have a lot of three-line code
>> blocks like this:
>>       block_for_reading(sock)
>>       yield
>>       data = sock.recv(1024)

> I wouldn't say I have a "lot". In the spamserver, there are really
> only three -- one for accepting a connection, one for reading from
> a socket, and one for writing to a socket. These are primitive
> operations that would be provided by an async socket library.

Hm. In such a small sample program, three near-identical blocks is a lot!

> Generally, all the yields would be hidden inside primitives like
> this. Normally, user code would never need to use 'yield', only
> 'yield from'.
> This probably didn't come through as clearly as it might have in my
> tutorial. Part of the reason is that at the time I wrote it, I was
> having to manually expand yield-froms into for-loops, so I was
> reluctant to use any more of them than I needed to. Also, yield-from
> was a new and unfamiliar concept, and I didn't want to scare people
> by overusing it. These considerations led me to push some of the
> yields slightly further up the layer stack than they could be.

But the fact remains that you can't completely hide these yields --
the best you can do is replace them with a single yield-from.

>> The general form seems to be:
>>       arrange for a callback when some operation can be done without blocking
>>       yield
>>       do the operation
>> This seems to be begging to be collapsed into a single line, e.g.
>>       data = yield sock.recv_async(1024)

> I'm not sure how you're imagining that would work, but whatever
> it is, it's wrong -- that just doesn't make sense.

That's a strong statement! It makes a lot of sense in a world using
Futures and a Future-aware trampoline/scheduler, instead of yield-from
and bare generators. I can see however that you don't like it in the
yield-from world you're envisioning, and how it would be confusing
there. I'll get back to this in a bit.

> What *would* make sense is
>    data = yield from sock.recv_async(1024)
> with sock.recv_async() being a primitive that encapsulates the
> block/yield/process triplet.

Right, that's how you would spell it.
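
For illustration, the block/yield/process triplet can be wrapped in one primitive roughly like this. It is only a sketch: FakeSocket, block_for_reading(), recv_async() as a free function, and the toy run() scheduler are all stand-ins invented here, not code from Greg's tutorial.

```python
class FakeSocket:
    """Hypothetical stand-in for a non-blocking socket."""

    def __init__(self, data):
        self._data = data

    def recv(self, n):
        chunk, self._data = self._data[:n], self._data[n:]
        return chunk


waiting = []  # sockets an imaginary scheduler would watch for readability


def block_for_reading(sock):
    # Arrange to be woken up when sock is readable (here: just record it).
    waiting.append(sock)


def recv_async(sock, n):
    block_for_reading(sock)   # 1. arrange for a wakeup
    yield                     # 2. suspend; only the scheduler sees this
    return sock.recv(n)       # 3. resumed: do the operation


def handler(sock):
    # User code needs no bare 'yield', only 'yield from'.
    data = yield from recv_async(sock, 1024)
    return data.upper()


def run(task):
    """Toy scheduler: resume the task at once every time it yields."""
    yields = 0
    try:
        while True:
            next(task)
            yields += 1
    except StopIteration as stop:
        return stop.value, yields


print(run(handler(FakeSocket(b'hello'))))
```

The bare yield inside recv_async() surfaces in run(), not in handler(), which only ever sees the value returned through yield from.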

>> (I would also prefer to see the socket wrapped in an object that makes
>> it hard to accidentally block.)

> It would be straightforward to make the primitives be methods of a
> socket wrapper object. I only used functions in the tutorial in the
> interests of keeping the amount of machinery to a bare minimum.


>> But surely there's still a place for send() and other PEP 342 features?

> In the wider world of generator usage, yes. If you have a
> generator that it makes sense to send() things into, for
> example, and you want to factor part of it out into another
> function, the fact that yield-from passes through sent values
> is useful.

But the only use for send() on a generator is when using it as a
coroutine for a concurrent tasks system -- send() really makes no
sense for generators used as iterators. And you're claiming, it seems,
that you prefer yield-from for concurrent tasks.

> But we're talking about a very specialised use of generators
> here, and so far I haven't thought of a use for sent or yielded
> values in this context that can't be done in a more straightforward
> way by other means.
> Keep in mind that a value yielded by a generator being used as
> part of a coroutine is *not* seen by code calling it with
> yield-from. Rather, it comes out in the inner loop of the
> scheduler, from the next() call being used to resume the
> coroutine. Likewise, any send() call would have to be made
> by the scheduler, not the yield-from caller.

I'm very much aware of that. There is a *huge* difference between
yield-from and yield.

However, now that I've implemented a substantial library (NDB, which
has thousands of users in the App Engine world, if not hundreds of
thousands), I feel that "value = yield <something that returns a
Future>" is quite a good paradigm, and the only part of PEP 380 I'm
really looking forward to embracing (once App Engine supports Python
3.3) is the option to return a value from a generator -- which my
users currently have to spell as "raise ndb.Return(<value>)".
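
A rough sketch of the "value = yield <Future>" paradigm, using concurrent.futures.Future only as a convenient container. This is not how NDB is implemented: some_async_op() and run() are made up for the example, and the futures are assumed to be already completed when yielded, so result() never blocks.

```python
from concurrent.futures import Future


def some_async_op(x):
    """Hypothetical async operation that completes immediately."""
    f = Future()
    if x < 0:
        f.set_exception(ValueError('negative input'))
    else:
        f.set_result(x * 2)
    return f


def run(gen):
    """Toy trampoline: feed each yielded Future's outcome back in."""
    try:
        fut = next(gen)
        while True:
            try:
                value = fut.result()
            except Exception as exc:
                fut = gen.throw(exc)    # re-raise inside the generator
            else:
                fut = gen.send(value)   # becomes the value of 'yield'
    except StopIteration as stop:
        return stop.value               # the generator's return value


def task():
    result = yield some_async_op(21)
    try:
        yield some_async_op(-1)
    except ValueError:
        result += 1
    return result


print(run(task()))
```

A real scheduler would register a callback on the Future instead of calling result() directly, but the send()/throw() round trip is the same.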

> So, the send/yield channel is exclusively for communication
> with the *scheduler* and nothing else. Under the old way of
> doing generator-based coroutines, this channel was used to
> simulate a call stack by yielding 'call' and 'return'
> instructions that the scheduler interpreted. But all that
> is now taken care of by the yield-from mechanism, and there
> is nothing left for the send/yield channel to do.

I understand that's the state of the world that you're looking forward
to. However I'm slightly worried that in practice there are some
issues to be resolved. One is what to do with operations directly
implemented in C. It would be horrible to require C to create a fake
generator. It would be mildly nasty to have to wrap these all in
Python code just so you can use them with yield-from. Fortunately an
iterator whose final __next__() raises StopIteration(<value>) works in
the latest Python 3.3 (it didn't work in some of the betas IIRC).
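
To make the last point concrete, here is a minimal sketch of a plain iterator (not a generator) that finishes at once with a value and still works as the target of yield-from. ImmediateResult and run() are names invented for the example; a C implementation would do the equivalent in its tp_iternext slot.

```python
class ImmediateResult:
    """Iterator whose first __next__() raises StopIteration(value)."""

    def __init__(self, value):
        self.value = value

    def __iter__(self):
        return self

    def __next__(self):
        raise StopIteration(self.value)


def caller():
    # yield-from takes the StopIteration value as the expression's value.
    data = yield from ImmediateResult('dummy')
    return data


def run(gen):
    """Drive a generator to completion and return its return value."""
    try:
        while True:
            next(gen)
    except StopIteration as stop:
        return stop.value


print(run(caller()))
```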

>> my users sometimes want to
>> treat something as a coroutine but they don't have any yields in it
>> def caller():
>>   data = yield from reader()
>> def reader():
>>     return 'dummy'
>>     yield
>> works, but if you drop the yield it doesn't work. With a decorator I
>> know how to make it work either way.

> If you're talking about a decorator that turns a function
> into a generator, I can't see anything particularly headachish
> about that. If you mean something else, you'll have to elaborate.

Well, I'm talking about a decorator that you *always* apply, and which
does nothing (or very little) when wrapping a generator, but adds
generator behavior when wrapping a non-generator function.
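
One possible shape for such a decorator, as a sketch only. The name coroutine and the run() helper are assumptions made for the example, and NDB's real decorator does considerably more.

```python
import functools
import inspect


def coroutine(func):
    """Leave generator functions alone; wrap plain functions in a generator."""
    if inspect.isgeneratorfunction(func):
        return func

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
        yield  # never reached; its presence makes wrapper a generator function

    return wrapper


@coroutine
def reader():
    return 'dummy'          # note: no yield needed here


@coroutine
def caller():
    data = yield from reader()
    return data


def run(gen):
    """Drive a generator to completion and return its return value."""
    try:
        while True:
            next(gen)
    except StopIteration as stop:
        return stop.value


print(run(caller()))
```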

Anyway, I am trying to come up with a table comparing Futures and your
yield-from-using generators. I'm basing this on a subset of the PEP
3148 API, and I'm not presuming threads -- I'm just looking at the
functionality around getting and setting callbacks, results, and
exceptions. My reference is actually based on NDB, but the API there
differs from PEP 3148 in uninteresting ways, so I'll use the PEP 3148
method names.

(1) Calling an async operation and waiting for its result, using yield

Futures:
  result = yield some_async_op(args)

Yield-from:
  result = yield from some_async_op(args)

(2) Setting the result of an async operation

Futures:
  f.set_result(value)  # From any callback

Yield-from:
  return value  # From the outermost generator

(3) Handling an exception

Futures:
  try:
    result = yield some_async_op(args)
  except MyException:
    <handle exception>

Yield-from:
  try:
    result = yield from some_async_op(args)
  except MyException:
    <handle exception>

Note: with yield-from, the tracebacks for unhandled exceptions are
possibly prettier.

(4) Raising an exception as the outcome of an async operation

Futures:
  f.set_exception(<Exception instance>)

Yield-from:
  raise <Exception instance or class>  # From any of the generators

Note: with Futures, the traceback also needs to be stored; in Python 3
it is stored on the Exception instance's __traceback__ attribute. But
when letting exceptions bubble through multiple levels of nested
calls, you must do something special to ensure the traceback looks
right to the end user.

(5) Having one async operation invoke another async operation

Futures:
  def outer(args):
    res = yield inner(args)
    return res

Yield-from:
  def outer(args):
    res = yield from inner(args)
    return res

Note: I'm including this because in the Futures case, each level of
yield requires the creation of a separate Future. In practice this
requires decorating all async functions. And also as a lead-in to the
next item.

(6) Spawning off multiple async subtasks

Futures:
  f1 = subtask1(args1)  # Note: no yield!!!
  f2 = subtask2(args2)
  res1, res2 = yield f1, f2


*** Greg, can you come up with a good idiom to spell concurrency at
this level? Your example only has concurrency in the philosophers
example, but it appears to interact directly with the scheduler, and
the philosophers don't return values. ***

(7) Checking whether an operation is already complete

Futures:
  if f.done(): ...


(8) Getting the result of an operation multiple times


Futures:
  f = async_op(args)
  # squirrel away a reference to f somewhere else
  r = yield f
  # ... later, elsewhere
  r = f.result()


(9) Canceling an operation



Note: I haven't needed canceling yet, and I believe Devin said that
Twisted just got rid of it. However some of the JS Deferred
implementations seem to support it.

(10) Registering additional callbacks



Note: this is used in NDB to trigger "hooks" that should run e.g. when
a database write completes. The user's code just writes yield
ent.put_async(); the trigger is automatically called by the Future's
machinery. This also uses (8).

--Guido van Rossum (

From python at  Fri Oct 12 21:19:48 2012
From: python at (MRAB)
Date: Fri, 12 Oct 2012 20:19:48 +0100
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
	<20121009043236.GI27445@ando> <>
Message-ID: <>

On 2012-10-12 19:42, Joshua Landau wrote:
> On 11 October 2012 02:20, Steven D'Aprano <steve at
> <mailto:steve at>> wrote:
>     On 11/10/12 09:05, Joshua Landau wrote:
>         After re-re-reading this thread, it turns out one *(1)* post and two
>         *(2)* answers
>         to that post have covered a topic very similar to the one I have
>         raised.
>         All of the others, to my understanding, do not dwell over the fact
>         that *float("nan") is not float("nan")* .
>     That's no different from any other float.
>     py> float('nan') is float('nan')
>     False
>     py> float('1.5') is float('1.5')
>     False
>     Floats are not interned or cached, although of course interning is
>     implementation dependent and this is subject to change without notice.
>     For that matter, it's true of *nearly all builtins* in Python. The
>     exceptions being bool(obj) which returns one of two fixed instances,
>     and int() and str(), where *some* but not all instances are cached.
>  >>> float(1.5) is float(1.5)
> True

It re-uses an immutable literal:

 >>> 1.5 is 1.5
True
 >>> "1.5" is "1.5"
True

and 'float' returns its argument if it's already a float:

 >>> float(1.5) is 1.5
True

 >>> float(1.5) is float(1.5)
True

But apart from that, when a new object is created, it doesn't check
whether it's identical to another, except in certain cases such as ints
in a limited range:

 >>> float("1.5") is float("1.5")
False
 >>> float("1.5") is 1.5
False
 >>> int("1") is 1
True

And it's an implementation-specific behaviour.

>  >>> float("1.5") is float("1.5")
> False
> Confusing re-use of identity strikes again. Can anyone care to explain
> what causes this? I understand float(1.5) is likely to return the
> inputted float, but that's as far as I can reason.
> What I was saying, though, is that all other posts assumed equality
> between two different NaNs should be the same as identity between a NaN
> and itself. This is what I'm really asking about, I guess.
>         Response 1:
>         This implies that you want to differentiate between -0.0 and
>         +0.0. That is
>         bad.
>         My response:
>         Why would I want to do that?
>     If you are doing numeric work, you *should* differentiate between -0.0
>     and 0.0. That's why the IEEE 754 standard mandates a -0.0.
>     Both -0.0 and 0.0 compare equal, but they can be distinguished (although
>     doing so is tricky in Python). The reason for distinguishing them is to
>     distinguish between underflow to zero from positive or negative values.
>     E.g. log(x) should return -infinity if x underflows from a positive
>     value,
>     and a NaN if x underflows from a negative.
> Interesting.
> Can you give me a more explicit example? When would you not *want*
> f(-0.0) to always return the result of f(0.0)? [aka, for -0.0 to warp
> into 0.0 on creation]

From dickinsm at  Fri Oct 12 21:22:37 2012
From: dickinsm at (Mark Dickinson)
Date: Fri, 12 Oct 2012 20:22:37 +0100
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
	<> <20121009043236.GI27445@ando>
	<> <>
Message-ID: <>

On Fri, Oct 12, 2012 at 7:42 PM, Joshua Landau
< at> wrote:
> Can you give me a more explicit example? When would you not *want* f(-0.0)
> to always return the result of f(0.0)? [aka, for -0.0 to warp into 0.0 on
> creation]

A few examples:

(1) In the absence of exceptions, 1 / 0.0 is +inf, while 1 / -0.0 is
-inf.  So e.g. the function exp(-exp(1/x)) has different values at
-0.0 and 0.0:

>>> from numpy import float64, exp
>>> exp(-exp(1/float64(0.0)))
0.0
>>> exp(-exp(1/float64(-0.0)))
1.0

(2) For the atan2 function, we have e.g.,

>>> from math import atan2
>>> atan2(0.0, -1.0)
3.141592653589793
>>> atan2(-0.0, -1.0)
-3.141592653589793

This gives atan2 a couple of nice invariants:  the sign of the result
always matches the sign of the first argument, and atan2(-y, x) ==
-atan2(y, x) for any (non-nan) x and y.
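
A short sketch of how the two zeros can be told apart in Python even though they compare equal, and of the atan2 behaviour described above. The helper is_negative_zero() is a name made up for this example.

```python
import math


def is_negative_zero(x):
    """True only for -0.0: compares equal to zero but carries a minus sign."""
    return x == 0.0 and math.copysign(1.0, x) < 0


zeros_compare_equal = (0.0 == -0.0)

# Approaching the negative real axis from above or below gives +pi or -pi.
upper = math.atan2(0.0, -1.0)
lower = math.atan2(-0.0, -1.0)

print(zeros_compare_equal, is_negative_zero(0.0), is_negative_zero(-0.0))
print(upper, lower)
```

If -0.0 were collapsed into 0.0 on creation, the two atan2 calls could not give different answers and the sign invariant would be lost on the branch cut.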

(3) Similarly, for complex math functions (which aren't covered by
IEEE 754, but are standardised in various other languages), it's
sometimes convenient to be able to depend on invariants like e.g.
asin(z.conj()) == asin(z).conj().  Those are only possible if -0.0 and
0.0 are distinguished;  the effect is most visible if you pick values
lying on a branch cut.

>>> from cmath import asin
>>> z = complex(2.0, 0.0)
>>> asin(z).conjugate()
>>> asin(z.conjugate())

You can't take that too far, though:  e.g., it would be nice if
complex multiplication had the property that (z * w).conjugate() was
always the same as z.conjugate() * w.conjugate(), but it's impossible
to keep both that invariant and the commutativity of multiplication.
(E.g., consider the result of complex(1, 1) * complex(1, -1).)
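
Working that last example through, as a sketch: here w happens to be z.conjugate(), so z.conjugate() * w.conjugate() is just w * z, and the hoped-for invariant would require z * w to equal its own conjugate, sign of zero included. The sign() helper is only for display.

```python
import math


def sign(x):
    """Return +1.0 or -1.0 according to the sign bit, zeros included."""
    return math.copysign(1.0, x)


z = complex(1, 1)
w = complex(1, -1)

lhs = (z * w).conjugate()             # conjugate of the product
rhs = z.conjugate() * w.conjugate()   # product of the conjugates

# The values compare equal, since -0.0 == 0.0 ...
print(lhs == rhs)

# ... but the imaginary parts carry opposite signs.
print(sign(lhs.imag), sign(rhs.imag))

# Commutativity itself is intact.
print(sign((z * w).imag) == sign((w * z).imag))
```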


From ethan at  Fri Oct 12 21:23:46 2012
From: ethan at (Ethan Furman)
Date: Fri, 12 Oct 2012 12:23:46 -0700
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <k4q3o6$m9q$>
References: <>	<>	<>	<>	<k4pr17$i94$>	<>	<>	<>
Message-ID: <>

Georg Brandl wrote:
> Am 06.10.2012 20:59, schrieb Ethan Furman:
>> Mike Graham wrote:
>>> On Sat, Oct 6, 2012 at 2:39 PM, Ethan Furman <ethan at> wrote:
>>>> Georg Brandl wrote:
>>>>> If you inherit from str, you cannot override any of the operations that
>>>>> str already has (i.e. __add__, __getitem__).
>>>> Is this a 3.x thing?  My 2.x version of Path overrides many of the str
>>>> methods and works just fine.
>>> This is for theoretical/practical reasons, not technical ones.
>> Ah, you mean you can't give them different semantics.  Gotcha.
> Yep.  Not much use being able to pass them directly to APIs expecting strings
> if they can't operate on them like any other string :)

Which is why I would like to see Path based on str, despite Guido's 
misgivings.  (Yes, I know I'm probably tilting at windmills here...)

If Path is string based we get backwards compatibility with all the os 
and third-party tools that expect and use strings; this would allow a 
gentle migration to using them, as opposed to the all-or-nothing if Path 
is a completely new type.  This would be especially useful for accessing 
the functions that haven't been added on to Path yet.

If Path is string based some questions evaporate:  '+'?  It does what 
str does; iterate? Just like str (we can make named methods for the 
iterations that we want, such as Path.dirs).

If Path is string based we still get to use '/' to combine them together 
(I think that was the preference from the poll... but that could be 
wishful thinking on my part. ;) )  Even Path.joinpath would make sense 
to differentiate from Path.join (which is really str.join).
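
A very rough sketch of what a str-based path could look like, to make the trade-off visible. StrPath is a hypothetical class written for this example (not Ethan's implementation and not the PEP 428 design), and it uses posixpath so the result does not depend on the platform.

```python
import posixpath


class StrPath(str):
    """Hypothetical str subclass that adds '/' and joinpath()."""

    def __truediv__(self, other):
        return StrPath(posixpath.join(self, str(other)))

    def joinpath(self, *others):
        return StrPath(posixpath.join(self, *map(str, others)))


p = StrPath('/usr')

lib = p / 'lib'                        # path-aware joining
same = p.joinpath('lib')
base = posixpath.basename(lib)         # string-based APIs accept it as-is

inherited_join = p.join('lib')         # still str.join
concatenated = p + '/lib'              # still str.__add__, returns plain str

print(lib, same, base, inherited_join, type(concatenated).__name__)
```

The inherited methods keep their string meaning and return plain str, which is the backwards-compatibility benefit and also the source of the mix-ups discussed in the joining-syntax thread.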

Anyway, my two cents worth.

From guido at  Fri Oct 12 21:32:11 2012
From: guido at (Guido van Rossum)
Date: Fri, 12 Oct 2012 12:32:11 -0700
Subject: [Python-ideas] The async API of the future: Reactors
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 12, 2012 at 11:33 AM, Antoine Pitrou <solipsis at> wrote:
> On Fri, 12 Oct 2012 11:13:23 -0700
> Guido van Rossum <guido at> wrote:
>> OTOH someone else might prefer a buffered stream
>> abstraction that just keeps filling its read buffer (and draining its
>> write buffer) using level-triggered callbacks, at least up to a
>> certain buffer size -- we have to be robust here and make it
>> impossible for an evil client to fill up all our memory without our
>> approval!
> I'd like to know what a sane buffered API for non-blocking I/O may look
> like, because right now it doesn't seem to make a lot of sense. At
> least this bug is tricky to resolve:

Good question. It actually depends quite a bit on whether you have an
event loop or not -- with the help of an event loop, you can have a
level-triggered callback that fills the buffer behind your back (up to
a given limit, at which point it should unregister the I/O object);
that bug seems to be about a situation without an event loop, where
you can't do that. Also the existing io module design never
anticipated cooperation with an event loop.
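
As a sketch of the event-loop variant described here: a level-triggered callback fills a bounded buffer and unregisters the socket once the limit is reached. It uses the selectors module, which was added to the stdlib after this thread; BufferedReader and demo() are names invented for the example, not a proposed API.

```python
import selectors
import socket


class BufferedReader:
    """Fill a bounded read buffer from a level-triggered callback."""

    def __init__(self, selector, sock, limit):
        self.selector = selector
        self.sock = sock
        self.limit = limit
        self.buffer = bytearray()
        self.registered = True
        # The callback travels as the registration's data, separate
        # from the I/O object itself.
        selector.register(sock, selectors.EVENT_READ, self.on_readable)

    def on_readable(self):
        room = self.limit - len(self.buffer)
        data = self.sock.recv(room)     # never read past the limit
        self.buffer += data
        if not data or len(self.buffer) >= self.limit:
            # Buffer full or peer closed: stop listening, so a client
            # cannot make us buffer more than we agreed to.
            self.selector.unregister(self.sock)
            self.registered = False


def demo():
    a, b = socket.socketpair()
    sel = selectors.DefaultSelector()
    reader = BufferedReader(sel, a, limit=8)
    b.sendall(b'0123456789abcdef')      # more than the buffer accepts
    while reader.registered:
        for key, _events in sel.select(timeout=1):
            key.data()                  # invoke the stored callback
    result = bytes(reader.buffer)
    sel.close()
    a.close()
    b.close()
    return result


print(demo())
```

The remaining eight bytes stay in the kernel's socket buffer until the consumer drains the read buffer and registers the socket again.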

>> - There's an abstract Reactor class and an abstract Async I/O object
>> class. To get a reactor to call you back, you must give it an I/O
>> object, a callback, and maybe some more stuff. (I have gone back and
>> like passing optional args for the callback, rather than requiring
>> lambdas to create closures.) Note that the callback is *not* a
>> designated method on the I/O object!
> Why isn't it? In practice, you need several callbacks: in Twisted
> parlance, you have dataReceived but also e.g. ConnectionLost
> (depending on the transport, you may even imagine other callbacks, for
> example for things happening on the TLS layer?).

Yes, but I really want to separate the callbacks from the object, so
that I don't have to inherit from an I/O object class -- asyncore
requires this and IMO it's wrong. It also makes it harder to use the
same callback code with different types of I/O objects.

>> - In systems supporting file descriptors, there's a reactor
>> implementation that knows how to use select/poll/etc., and there are
>> concrete I/O object classes that wrap file descriptors. On Windows,
>> those would only be socket file descriptors. On Unix, any file
>> descriptor would do.
> Windows *is* able to do async I/O on things other than sockets (see the
> discussion about IOCP). It's just that the Windows implementation of
> select() (the POSIX function call) is limited to sockets.

I know, but IOCP is currently not supported in the stdlib. I expect
that on Windows, to use IOCP, you'd need to use a different reactor
implementation and a different I/O object than the vanilla fd-based
ones. My design is actually *inspired* by the desire to support this

--Guido van Rossum (

From g.brandl at  Fri Oct 12 21:39:16 2012
From: g.brandl at (Georg Brandl)
Date: Fri, 12 Oct 2012 21:39:16 +0200
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>	<>	<>
	<k59ff3$i5i$> <>
Message-ID: <k59rit$vh3$>

Am 12.10.2012 18:27, schrieb Ethan Furman:
> Georg Brandl wrote:
>> Am 12.10.2012 14:45, schrieb Blake Hyde:
>>> I'm a Python developer rather than a developer of Python, but I'd like to ask a
>>> question about this option (and implicitly vote against it, I suppose); if you
>>> specialize a method name, such as .pathjoin, aren't you implying that methods
>>> must be unambiguous even across types and classes?  This seems negative.  Even
>>> if .join is already used for strings, it also makes sense for this use case.
>> Of course different classes can have methods of the same name.
>> The issue here is that due to the similarity (and interchangeability) of path
>> objects and strings it is likely that people get them mixed up every now and
>> then, and if .join() works on both objects the failure mode (strange result
>> from str.join when you expected path.join) is horribly confusing.
> I don't understand the "horribly confusing" part.  Sure, when I got them 
> mixed up and ended up with a plain ol' string instead of a really cool 
> Path it took a moment to figure out where I had made the error, but the 
> traceback of "AttributeError: 'str' object has no attribute 'path'" left 
> absolutely no room for confusion as to what the problem was.

"no attribute 'path'"?  Not sure where that exception comes from.
This is what I meant:

>>> p = Path('/usr')
>>> p.join('lib')

>>> p = '/usr'
>>> p.join('lib')
'l/usri/usrb'
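
The same failure mode, written out as a small script. The path half uses pathlib as it later shipped in the standard library (where the joining spellings are '/' and joinpath()), not the draft Path class under discussion in this thread.

```python
from pathlib import PurePosixPath

# str.join treats '/usr' as the separator and 'lib' as a sequence of
# characters, so no error is raised: the result is just silently wrong.
as_string = '/usr'.join('lib')

# Path joining, via the operator and via the named method.
as_path = PurePosixPath('/usr') / 'lib'
also_path = PurePosixPath('/usr').joinpath('lib')

print(repr(as_string))
print(as_path, also_path)
```

Because the string version "works", the mistake only shows up later, when the bogus path is used.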


From tim.peters at  Fri Oct 12 21:42:34 2012
From: tim.peters at (Tim Peters)
Date: Fri, 12 Oct 2012 14:42:34 -0500
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
	<> <20121009043236.GI27445@ando>
	<> <>
Message-ID: <>

[Mark Dickinson]
> ...
> And as for sqrt(-0.0) returning -0.0...  Grr.  I've never understood
> the motivation for that one, especially as it disagrees with the usual
> recommendations for complex square root (where the real part of the
> result *always* has its sign bit cleared).

The only rationale I've seen for this is in Kahan's obscure paper
"Branch Cuts for Complex Elementary Functions or Much Ado About
Nothing's Sign Bit".  Hard to find.  Here's a mostly readable scan:

In part it's to preserve various identities, such as that
sqrt(conjugate(z)) is the same as conjugate(sqrt(z)).  When z is +0,
that becomes

    sqrt(conjugate(+0)) same_as conjugate(sqrt(+0))

which is

    sqrt(-0) same_as conjugate(+0)

which is

    sqrt(-0) same_as -0

Convinced?  LOL.  There are others in the paper ;-)
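
For reference, a minimal check of the real-valued behaviour being discussed, using only the math module. The sign has to be read with copysign(), since -0.0 == 0.0.

```python
import math

root = math.sqrt(-0.0)

# The result compares equal to zero but keeps the sign bit of its argument.
is_zero = (root == 0.0)
sign_of_root = math.copysign(1.0, root)

print(is_zero, sign_of_root)
```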

From solipsis at  Fri Oct 12 21:42:24 2012
From: solipsis at (Antoine Pitrou)
Date: Fri, 12 Oct 2012 21:42:24 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
References: <>
	<k4pr17$i94$> <>
	<> <k4q3o6$m9q$>
Message-ID: <>

On Fri, 12 Oct 2012 12:23:46 -0700
Ethan Furman <ethan at> wrote:
> Which is why I would like to see Path based on str, despite Guido's 
> misgivings.  (Yes, I know I'm probably tilting at windmills here...)
> If Path is string based we get backwards compatibility with all the os 
> and third-party tools that expect and use strings; this would allow a 
> gentle migration to using them, as opposed to the all-or-nothing if Path 
> is a completely new type.

It is not all-or-nothing since you can just call str() and it will work
fine with both strings and paths.




From ram.rachum at  Fri Oct 12 22:27:41 2012
From: ram.rachum at (Ram Rachum)
Date: Fri, 12 Oct 2012 13:27:41 -0700 (PDT)
Subject: [Python-ideas] Is there a good reason to use * for multiplication?
Message-ID: <>

Hi everybody,

Today a funny thought occurred to me. Ever since I've learned to program 
when I was a child, I've taken for granted that when programming, the sign 
used for multiplication is *. But now that I think about it, why? Now that 
we have Unicode, why not use ? ?

Do you think that we can make Python support ? in addition to *? 

I can think of a couple of problems, but none of them seem like
deal-breakers:

 - Backward compatibility: Python already uses *, but I don't see a 
backward compatibility problem with supporting ? additionally. Let people 
use whichever they want, like spaces and tabs.
 - Input methods: I personally use an IDE that could be easily set to 
automatically convert * to ? where appropriate and to allow manual input of 
?. People on Linux can type Alt-. . Anyone else can set up a script that'll 
let them type ? using whichever keyboard combination they want. I admit 
this is pretty annoying, but since you can always use * if you want to, I 
figure that anyone who cares enough about using ? instead of * (I bet that 
people in scientific computing would like that) would be willing to take 
the time to set it up.

What do you think?



From ram.rachum at  Fri Oct 12 22:37:47 2012
From: ram.rachum at (Ram Rachum)
Date: Fri, 12 Oct 2012 22:37:47 +0200
Subject: [Python-ideas] Is there a good reason to use * for
	multiplication?
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 12, 2012 at 10:34 PM, Mike Graham <mikegraham at> wrote:

> On Fri, Oct 12, 2012 at 4:27 PM, Ram Rachum <ram.rachum at> wrote:
> > Hi everybody,
> >
> > Today a funny thought occurred to me. Ever since I've learned to program
> > when I was a child, I've taken for granted that when programming, the
> sign
> > used for multiplication is *. But now that I think about it, why? Now
> that
> > we have Unicode, why not use ? ?
> >
> > Do you think that we can make Python support ? in addition to *?
> >
> > I can think of a couple of problems, but none of them seem like
> > deal-breakers:
> >
> >  - Backward compatibility: Python already uses *, but I don't see a
> backward
> > compatibility problem with supporting ? additionally. Let people use
> > whichever they want, like spaces and tabs.
> >  - Input methods: I personally use an IDE that could be easily set to
> > automatically convert * to ? where appropriate and to allow manual input
> of
> > ?. People on Linux can type Alt-. . Anyone else can set up a script
> that'll
> > let them type ? using whichever keyboard combination they want. I admit
> this
> > is pretty annoying, but since you can always use * if you want to, I
> figure
> > that anyone who cares enough about using ? instead of * (I bet that
> people
> > in scientific computing would like that) would be willing to take the
> time
> > to set it up.
> >
> >
> > What do you think?
> >
> >
> > Ram
> Python should not expect characters that are hard for most people to
> type.

No one will be forced to type it. If you can't type it, use *.

> Python should not expect characters that are still hard to
> display on many common platforms.

We allow people to have unicode variable names, if they wish, don't we? So
why not allow them to use unicode operator, if they wish, as a completely
optional thing?

> I think you'll find strong opposition to adding any non-ASCII
> characters or characters that don't occur on almost all keyboards as
> part of the language.
> Mike

From guido at  Fri Oct 12 22:43:24 2012
From: guido at (Guido van Rossum)
Date: Fri, 12 Oct 2012 13:43:24 -0700
Subject: [Python-ideas] The async API of the future: Reactors
In-Reply-To: <>
References: <>
Message-ID: <>

[Responding to a different message that also pertains to the reactors thread]

On Thu, Oct 11, 2012 at 6:38 PM, Mark Adam <dreamingforward at> wrote:
> Here's the thing:  the underlying O.S is always handling two major I/O
> channels at any given time and it needs all its attention to do this:
>  the GUI and one of the following (network, file) I/O.  You can
> shuffle these around all you want, but somewhere the O.S. kernel is
> going to have to be involved, which means either portability is
> sacrificed or speed if one is going to pursue an abstract, unified
> async API.

I'm convinced that the OS has to get involved. I'm not convinced that
it will get in the way of designing an abstract unified API -- however
that API will have to be more complicated than the kind of event loop
that *only* handles network I/O or the kind that *only* handles GUI
events.

I wonder if Windows' IOCP API that was mentioned before in the parent
thread wouldn't be able to handle both though. Windows' event concept
seems more general than sockets or GUI events. However I don't know if
this is actually how GUI events are handled in Windows.

>> You should talk to a Tcl/Tk user (if there are any left :-).
> I used to be one of those :)

So tell us more about the user experience of having a standard event
loop always available in the language, and threads, network I/O and
GUI events all integrated. What worked, what didn't? What did you wish
had been different?

--Guido van Rossum (

From ram.rachum at  Fri Oct 12 22:45:40 2012
From: ram.rachum at (Ram Rachum)
Date: Fri, 12 Oct 2012 22:45:40 +0200
Subject: [Python-ideas] Is there a good reason to use * for
	multiplication?
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 12, 2012 at 10:40 PM, Blake Hyde <syrion at> wrote:

> Is anything gained from this addition?

To give a practical answer, I could say that for newbies it's one small
confusion that could be removed from the language. You and I have been
programming for a long time so we take it for granted that * means
multiplication, but for any other person that's just another
weird idiosyncrasy that further alienates programming.

Also, I think that using * for multiplication is ugly.

> On Fri, Oct 12, 2012 at 4:37 PM, Ram Rachum <ram.rachum at> wrote:
> >
> >
> > On Fri, Oct 12, 2012 at 10:34 PM, Mike Graham <mikegraham at>
> wrote:
> >>
> >> On Fri, Oct 12, 2012 at 4:27 PM, Ram Rachum <ram.rachum at>
> wrote:
> >> > Hi everybody,
> >> >
> >> > Today a funny thought occurred to me. Ever since I've learned to
> program
> >> > when I was a child, I've taken for granted that when programming, the
> >> > sign
> >> > used for multiplication is *. But now that I think about it, why? Now
> >> > that
> >> > we have Unicode, why not use ? ?
> >> >
> >> > Do you think that we can make Python support ? in addition to *?
> >> >
> >> > I can think of a couple of problems, but none of them seem like
> >> > deal-breakers:
> >> >
> >> >  - Backward compatibility: Python already uses *, but I don't see a
> >> > backward
> >> > compatibility problem with supporting ? additionally. Let people use
> >> > whichever they want, like spaces and tabs.
> >> >  - Input methods: I personally use an IDE that could be easily set to
> >> > automatically convert * to ? where appropriate and to allow manual
> input
> >> > of
> >> > ?. People on Linux can type Alt-. . Anyone else can set up a script
> >> > that'll
> >> > let them type ? using whichever keyboard combination they want. I
> admit
> >> > this
> >> > is pretty annoying, but since you can always use * if you want to, I
> >> > figure
> >> > that anyone who cares enough about using ? instead of * (I bet that
> >> > people
> >> > in scientific computing would like that) would be willing to take the
> >> > time
> >> > to set it up.
> >> >
> >> >
> >> > What do you think?
> >> >
> >> >
> >> > Ram
> >>
> >> Python should not expect characters that are hard for most people to
> >> type.
> >
> >
> > No one will be forced to type it. If you can't type it, use *.
> >
> >
> >>
> >> Python should not expect characters that are still hard to
> >> display on many common platforms.
> >
> >
> > We allow people to have unicode variable names, if they wish, don't we?
> So
> > why not allow them to use unicode operator, if they wish, as a completely
> > optional thing?
> >
> >>
> >>
> >> I think you'll find strong opposition to adding any non-ASCII
> >> characters or characters that don't occur on almost all keyboards as
> >> part of the language.
> >>
> >> Mike
> >
> >
> >

From dickinsm at  Fri Oct 12 22:46:00 2012
From: dickinsm at (Mark Dickinson)
Date: Fri, 12 Oct 2012 21:46:00 +0100
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
	<> <20121009043236.GI27445@ando>
	<> <>
Message-ID: <>

On Fri, Oct 12, 2012 at 8:42 PM, Tim Peters <tim.peters at> wrote:
> In part it's to preserve various identities, such as that
> sqrt(conjugate(z)) is the same as conjugate(sqrt(z)).  When z is +0,
> that becomes
>     sqrt(conjugate(+0)) same_as conjugate(sqrt(+0))
> which is
>     sqrt(-0) same_as conjugate(+0)
> which is
>     sqrt(-0) same as -0
> Convinced?

Not really. :-)  In fact, it's exactly that paper that made me think
sqrt(-0.0) -> -0.0 is suspect.

The way I read it, the argument from the paper implies that
cmath.sqrt(complex(0.0, -0.0)) should be complex(0.0, -0.0), which I
have no problem with---it makes things nice and neat:  quadrants 1 and
2 in the complex plane map to quadrant 1, and quadrants 3 and 4 to
quadrant 4, with the signs of the zeros making it clear what
'quadrant' means in all (non-nan) cases.  But I don't see how to get
from there to math.sqrt(-0.0) being -0.0.

It's exactly the mismatch between the real and complex math that makes
no sense to me:  math.sqrt(-0.0) should resemble
cmath.sqrt(complex(-0.0, +/-0.0)).  But the latter, quite reasonably,
is complex(0.0, +/-0.0) (at least according to both Kahan and C99
Annex G), while the former is specified to be -0.0 in IEEE 754.
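
The mismatch can be checked directly. A minimal sketch, assuming a CPython
build whose math and cmath modules follow IEEE 754 and C99 Annex G for
signed zeros; the helper name sign_of is made up for illustration:

```python
import cmath
import math


def sign_of(x):
    """Return 1.0 or -1.0 according to the sign bit, so -0.0 and 0.0 differ."""
    return math.copysign(1.0, x)


# Real square root: the sign of a negative zero is preserved.
real_root = math.sqrt(-0.0)

# Complex square root of signed zeros: the real part of the result is +0.0
# and the imaginary part keeps the sign of the input's imaginary part.
cases = [(-0.0, 0.0), (-0.0, -0.0), (0.0, -0.0)]
complex_roots = [cmath.sqrt(complex(re, im)) for re, im in cases]

for (re, im), root in zip(cases, complex_roots):
    print((re, im), '->', (root.real, root.imag))
print('sign of math.sqrt(-0.0):', sign_of(real_root))
```

copysign is used because -0.0 == 0.0 compares true, so an ordinary
comparison cannot tell the two zeros apart.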


From at  Fri Oct 12 22:45:58 2012
From: at (Joshua Landau)
Date: Fri, 12 Oct 2012 21:45:58 +0100
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <k59rit$vh3$>
References: <>
	<k59ff3$i5i$> <>
Message-ID: <>

On 12 October 2012 20:39, Georg Brandl <g.brandl at> wrote:

> Am 12.10.2012 18:27, schrieb Ethan Furman:
> > Georg Brandl wrote:
> >> Am 12.10.2012 14:45, schrieb Blake Hyde:
> >>> I'm a Python developer rather than a developer of Python, but I'd like
> to ask a
> >>> question about this option (and implicitly vote against it, I
> suppose); if you
> >>> specialize a method name, such as .pathjoin, aren't you implying that
> methods
> >>> must be unambiguous even across types and classes?  This seems
> negative.  Even
> >>> if .join is already used for strings, it also makes sense for this use
> case.
> >>
> >> Of course different classes can have methods of the same name.
> >>
> >> The issue here is that due to the similarity (and interchangeability)
> of path
> >> objects and strings it is likely that people get them mixed up every
> now and
> >> then, and if .join() works on both objects the failure mode (strange
> result
> >> from str.join when you expected path.join) is horribly confusing.
> >
> > I don't understand the "horribly confusing" part.  Sure, when I got them
> > mixed up and ended up with a plain ol' string instead of a really cool
> > Path it took a moment to figure out where I had made the error, but the
> > traceback of "AttributeError: 'str' object has no attribute 'path'" left
> > absolutely no room for confusion as to what the problem was.
> "no attribute 'path'"?  Not sure where that exception comes from.
> This is what I meant:
> >>> p = Path('/usr')
> >>> p.join('lib')
> Path('/usr/lib')
> >>> p = '/usr'
> >>> p.join('lib')
> 'l/usri/usrb'

I don't know about you, but I found that so horribly confusing I had to
check the output. I'm just not used to thinking of str.join(str) as
sensible, and I could not for the life of me see where the
output 'l/usri/usrb' came from. Where was "lib"?

I might just have been an idiot for a minute, but it'll just get harder in
real code. And I'm not the worst for stupid mistakes: we want newbies to
be able (and want) to use the built in path modules. When they come back
wondering why

> homepath.join("joshua").join(".config")

> '.j/homeo/homes/homeh/homeu/homeacj/homeo/homes/homeh/homeu/homeaoj/homeo/homes/homeh/homeu/homeanj/homeo/homes/homeh/homeu/homeafj/homeo/homes/homeh/homeu/homeaij/homeo/homes/homeh/homeu/homeag'

we are going to have a problem.
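
For anyone else puzzling over where those outputs come from: str.join
treats its argument as an iterable, and a string iterates character by
character. A small sketch that spells the loop out (join_like_str is a
made-up helper, and '/home' for homepath is inferred from the output
quoted above, not stated in the thread):

```python
def join_like_str(separator, iterable):
    """Re-implement str.join by hand to make the iteration explicit."""
    pieces = []
    for index, item in enumerate(iterable):
        if index:
            pieces.append(separator)  # separator goes between items only
        pieces.append(item)
    return ''.join(pieces)


# 'lib' is iterated as 'l', 'i', 'b', with '/usr' placed between them.
first = '/usr'.join('lib')
by_hand = join_like_str('/usr', 'lib')

# Chaining repeats the effect: the first result becomes the separator
# that is inserted between the characters of '.config'.
inner = '/home'.join('joshua')
chained = inner.join('.config')

print(first)
print(chained)
```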

So I agree with you [Georg Brandl] here. I would even rather
.pathjoin/.joinpath than .join despite the utterly painful name*.

* As others have stated, if you like it why not .strjoin and .dictupdate
and .listappend?

From mikegraham at  Fri Oct 12 22:46:39 2012
From: mikegraham at (Mike Graham)
Date: Fri, 12 Oct 2012 16:46:39 -0400
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 12, 2012 at 4:37 PM, Ram Rachum <ram.rachum at> wrote:
> On Fri, Oct 12, 2012 at 10:34 PM, Mike Graham <mikegraham at> wrote:
>> On Fri, Oct 12, 2012 at 4:27 PM, Ram Rachum <ram.rachum at> wrote:
>> > Hi everybody,
>> >
>> > Today a funny thought occurred to me. Ever since I've learned to program
>> > when I was a child, I've taken for granted that when programming, the
>> > sign
>> > used for multiplication is *. But now that I think about it, why? Now
>> > that
>> > we have Unicode, why not use ? ?
>> >
>> > Do you think that we can make Python support ? in addition to *?
>> >
>> > I can think of a couple of problems, but none of them seem like
>> > deal-breakers:
>> >
>> >  - Backward compatibility: Python already uses *, but I don't see a
>> > backward
>> > compatibility problem with supporting ? additionally. Let people use
>> > whichever they want, like spaces and tabs.
>> >  - Input methods: I personally use an IDE that could be easily set to
>> > automatically convert * to ? where appropriate and to allow manual input
>> > of
>> > ?. People on Linux can type Alt-. . Anyone else can set up a script
>> > that'll
>> > let them type ? using whichever keyboard combination they want. I admit
>> > this
>> > is pretty annoying, but since you can always use * if you want to, I
>> > figure
>> > that anyone who cares enough about using ? instead of * (I bet that
>> > people
>> > in scientific computing would like that) would be willing to take the
>> > time
>> > to set it up.
>> >
>> >
>> > What do you think?
>> >
>> >
>> > Ram
>> Python should not expect characters that are hard for most people to
>> type.
> No one will be forced to type it. If you can't type it, use *.
>> Python should not expect characters that are still hard to
>> display on many common platforms.
> We allow people to have unicode variable names, if they wish, don't we? So
> why not allow them to use unicode operator, if they wish, as a completely
> optional thing?

1. Non-ASCII unicode identifiers are heavily discouraged in most
contexts and are not present anywhere in the core language or stdlib
for a reason.

2. Having duplicative features where neither is encouraged is a bad
idea. "There should be one-- and preferably only one --obvious way to
do it." This is doubly true when one of the ways makes it harder for
people to read and edit code others wrote.


From mikegraham at  Fri Oct 12 22:49:18 2012
From: mikegraham at (Mike Graham)
Date: Fri, 12 Oct 2012 16:49:18 -0400
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 12, 2012 at 4:45 PM, Ram Rachum <ram.rachum at> wrote:
> On Fri, Oct 12, 2012 at 10:40 PM, Blake Hyde <syrion at> wrote:
>> Is anything gained from this addition?
> To give a practical answer, I could say that for newbies it's one small
> confusion that could be removed from the language. You and I have been
> programming for a long time so we take it for granted that * means
> multiplication, but for any other person that's just another weird
> idiosyncrasy that further alienates programming.
> Also, I think that using * for multiplication is ugly.

You're emphatically not getting rid of *, though, which means 1)
you're only making it harder for new people to learn and deal with,
and 2) you're at best not eliminating any perceived ugliness, in
reality probably compounding it.


From ethan at  Fri Oct 12 22:33:14 2012
From: ethan at (Ethan Furman)
Date: Fri, 12 Oct 2012 13:33:14 -0700
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>	<>	<>	<>	<k4pr17$i94$>
	<>	<>	<>
	<k4q3o6$m9q$>	<>
Message-ID: <>

Antoine Pitrou wrote:
> On Fri, 12 Oct 2012 12:23:46 -0700
> Ethan Furman <ethan at> wrote:
>> Which is why I would like to see Path based on str, despite Guido's 
>> misgivings.  (Yes, I know I'm probably tilting at windmills here...)
>> If Path is string based we get backwards compatibility with all the os 
>> and third-party tools that expect and use strings; this would allow a 
>> gentle migration to using them, as opposed to the all-or-nothing if Path 
>> is a completely new type.
> It is not all-or-nothing since you can just call str() and it will work
> fine with both strings and paths.

D'oh.  You're correct, of course.

What I was thinking was along the lines of:

--> some_table = Path('~/addresses.dbf')
--> some_table = os.path.expanduser(some_table)

vs

--> some_table = Path('~/addresses.dbf')
--> some_table = Path(os.path.expanduser(str(some_table)))

The Path/str sandwich is awkward, as well as verbose.


From solipsis at  Fri Oct 12 22:53:06 2012
From: solipsis at (Antoine Pitrou)
Date: Fri, 12 Oct 2012 22:53:06 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
References: <>
	<k4pr17$i94$> <>
	<> <k4q3o6$m9q$>
Message-ID: <>

On Fri, 12 Oct 2012 13:33:14 -0700
Ethan Furman <ethan at> wrote:
> Antoine Pitrou wrote:
> > On Fri, 12 Oct 2012 12:23:46 -0700
> > Ethan Furman <ethan at> wrote:
> >> Which is why I would like to see Path based on str, despite Guido's 
> >> misgivings.  (Yes, I know I'm probably tilting at windmills here...)
> >>
> >> If Path is string based we get backwards compatibility with all the os 
> >> and third-party tools that expect and use strings; this would allow a 
> >> gentle migration to using them, as opposed to the all-or-nothing if Path 
> >> is a completely new type.
> > 
> > It is not all-or-nothing since you can just call str() and it will work
> > fine with both strings and paths.
> D'oh.  You're correct, of course.
> What I was thinking was along the lines of:
> --> some_table = Path('~/addresses.dbf')
> --> some_table = os.path.expanduser(some_table)
> vs
> --> some_table = Path('~/addresses.dbf')
> --> some_table = Path(os.path.expanduser(str(some_table)))

Hey, nice catch, I need to add a expanduser()-alike to the Path API.

Thank you!



From at  Fri Oct 12 23:03:03 2012
From: at (Joshua Landau)
Date: Fri, 12 Oct 2012 22:03:03 +0100
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<k4pr17$i94$> <>
	<> <k4q3o6$m9q$>
	<> <>
Message-ID: <>

On 12 October 2012 21:33, Ethan Furman <ethan at> wrote:

> Antoine Pitrou wrote:
>> On Fri, 12 Oct 2012 12:23:46 -0700
>> Ethan Furman <ethan at> wrote:
>>> Which is why I would like to see Path based on str, despite Guido's
>>> misgivings.  (Yes, I know I'm probably tilting at windmills here...)
>>> If Path is string based we get backwards compatibility with all the os
>>> and third-party tools that expect and use strings; this would allow a
>>> gentle migration to using them, as opposed to the all-or-nothing if Path is
>>> a completely new type.
>> It is not all-or-nothing since you can just call str() and it will work
>> fine with both strings and paths.
> D'oh.  You're correct, of course.
> What I was thinking was along the lines of:
> --> some_table = Path('~/addresses.dbf')
> --> some_table = os.path.expanduser(some_table)
> vs
> --> some_table = Path('~/addresses.dbf')
> --> some_table = Path(os.path.expanduser(str(some_table)))
> The Path/str sandwich is awkward, as well as verbose.

A lot of them might end up inadvertently converting back to a pure string
as well, so a better comparison will in many places be:

some_table = Path('~/addresses.dbf')
some_table = Path(os.path.expanduser(some_table))

vs

some_table = Path('~/addresses.dbf')
some_table = Path(os.path.expanduser(str(some_table)))

which is only five characters different. I would also prefer:

some_table = Path('~/addresses.dbf')
some_table = Path(os.path.expanduser(some_table.raw()))

or some other method. It just looks nicer to me in this case. Maybe .str(),
.chars() or .text().

Additionally, if this is too painful and too often used, we can always make
an auxiliary function.

some_table = Path('~/addresses.dbf')
some_table = some_table.str_apply(os.path.expanduser)

Where .str_apply takes (func, *args, **kwargs) and you need to wrap the
function if it takes the path at a different position. I don't particularly
like this option, but it exists.
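
To make the .str_apply idea concrete, here is a minimal sketch. The Path
class below is a toy stand-in written only for this example; it is not the
PEP 428 API, and str_apply is the hypothetical method described above:

```python
import os.path


class Path:
    """Toy stand-in for the proposed type; only what the example needs."""

    def __init__(self, raw):
        self._raw = str(raw)

    def __str__(self):
        return self._raw

    def __repr__(self):
        return 'Path(%r)' % self._raw

    def str_apply(self, func, *args, **kwargs):
        """Call func on the string form and wrap the result in a Path again."""
        return type(self)(func(str(self), *args, **kwargs))


# The sandwich from the thread, written once inside str_apply.
some_table = Path('~/addresses.dbf')
some_table = some_table.str_apply(os.path.expanduser)

# A function that takes the path in another position needs a small wrapper.
backup = Path('addresses.dbf').str_apply(lambda p: os.path.join('backup', p))

print(repr(some_table))
print(repr(backup))
```

The wrapper in the second call is the cost mentioned above: str_apply can
only pass the path as the first argument.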

From python at  Fri Oct 12 23:12:27 2012
From: python at (MRAB)
Date: Fri, 12 Oct 2012 22:12:27 +0100
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<k4pr17$i94$> <>
	<> <k4q3o6$m9q$>
Message-ID: <>

On 2012-10-12 20:42, Antoine Pitrou wrote:
> On Fri, 12 Oct 2012 12:23:46 -0700
> Ethan Furman <ethan at> wrote:
>> Which is why I would like to see Path based on str, despite Guido's
>> misgivings.  (Yes, I know I'm probably tilting at windmills here...)
>> If Path is string based we get backwards compatibility with all the os
>> and third-party tools that expect and use strings; this would allow a
>> gentle migration to using them, as opposed to the all-or-nothing if Path
>> is a completely new type.
> It is not all-or-nothing since you can just call str() and it will work
> fine with both strings and paths.
The disadvantage of using str is that it will also convert non-path
objects to strings, possibly changing the result of the call:

 >>> os.path.isdir(1)
Traceback (most recent call last):
   File "<pyshell#6>", line 1, in <module>
TypeError: 'int' does not support the buffer interface
 >>> os.path.isdir(str(1))

From ethan at  Fri Oct 12 23:00:32 2012
From: ethan at (Ethan Furman)
Date: Fri, 12 Oct 2012 14:00:32 -0700
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>	<>	<>	<>	<k4pr17$i94$>
	<>	<>	<>
	<k4q3o6$m9q$>	<>	<>	<>
Message-ID: <>

Antoine Pitrou wrote:
> Hey, nice catch, I need to add a expanduser()-alike to the Path API.
> Thank you!

You're welcome.  :p

My point about the Path(...(str(...))) sandwich still applies, though, 
for every function that isn't built in to Path.  :)


From at  Fri Oct 12 23:16:13 2012
From: at (Joshua Landau)
Date: Fri, 12 Oct 2012 22:16:13 +0100
Subject: [Python-ideas] checking for identity before comparing built-in
In-Reply-To: <>
References: <>
	<> <20121009043236.GI27445@ando>
	<> <>
Message-ID: <>

Thank you all for being so thorough. I think I'm sated for tonight. ^^

With all due respect,

Joshua Landau

From victor.stinner at  Fri Oct 12 23:31:23 2012
From: victor.stinner at (Victor Stinner)
Date: Fri, 12 Oct 2012 23:31:23 +0200
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

2012/10/12 Ram Rachum <ram.rachum at>:
> What do you think?

It's maybe time to implement


From ethan at  Fri Oct 12 23:37:53 2012
From: ethan at (Ethan Furman)
Date: Fri, 12 Oct 2012 14:37:53 -0700
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

Ram Rachum wrote:
> Hi everybody,
> Today a funny thought occurred to me. Ever since I've learned to program 
> when I was a child, I've taken for granted that when programming, the 
> sign used for multiplication is *. But now that I think about it, why? 
> Now that we have Unicode, why not use ? ?

Because it is too easy to confuse ? with .

Because it is not solving a problem.

Because it would still take work, and then easily cause confusion.

In college we dropped the ? and just wrote stuff like:

(x + z)(x - y)

but we can't do that in Python because they are function calls.
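
A quick sketch of what happens if you try it anyway (the values are
arbitrary and chosen only for the example):

```python
x, y, z = 5, 2, 3

# Juxtaposition is parsed as a call: the int (x + z) is "called" with
# the argument (x - y), which fails at run time.
try:
    (x + z)(x - y)
except TypeError as exc:
    error_message = str(exc)

# The multiplication has to be written with an explicit operator.
product = (x + z) * (x - y)

print(error_message)
print(product)
```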

In short, I don't see it happening.


From tarek at  Fri Oct 12 23:57:06 2012
From: tarek at (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Fri, 12 Oct 2012 22:57:06 +0100
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On 10/12/12 10:31 PM, Victor Stinner wrote:
> 2012/10/12 Ram Rachum <ram.rachum at>:
>> What do you think?
> It's maybe time to implement
I'd use ? ?? to speed up a piece of code, not for exceptions...

> Victor

From guido at  Sat Oct 13 00:11:54 2012
From: guido at (Guido van Rossum)
Date: Fri, 12 Oct 2012 15:11:54 -0700
Subject: [Python-ideas] The async API of the future: Twisted and Deferreds
Message-ID: <>

[This is the third spin-off thread from "asyncore: included batteries
don't fit"]

On Thu, Oct 11, 2012 at 9:29 PM, Devin Jeanpierre
<jeanpierreda at> wrote:
> On Thu, Oct 11, 2012 at 7:37 PM, Guido van Rossum <guido at> wrote:
>> On Thu, Oct 11, 2012 at 3:42 PM, Devin Jeanpierre
>> <jeanpierreda at> wrote:
>>> Could you be more specific? I've never heard Deferreds in particular
>>> called "arcane". They're very popular in e.g. the JS world,
>> Really? Twisted is used in the JS world? Or do you just mean the
>> pervasiveness of callback style async programming?
> Ah, I mean Deferreds. I attended a talk earlier this year all about
> deferreds in JS, and not a single reference to Python or Twisted was
> made!
> These are the examples I remember mentioned in the talk:
> - (not very twistedish
> at all, ill-liked by the speaker)
> - (maybe
> not a good example, mochikit tries to be "python in JS")
> -
> - (also includes an explanation of why
> the author likes deferreds)
> There were a few more that the speaker mentioned, but didn't cover.
> One of his points was that the various systems of deferreds are subtly
> different, some very badly so, and that it was a mess, but that
> deferreds were still awesome. JS is a language where async programming
> is mainstream, so lots of people try to make it easier, and they all
> do it slightly differently.

Thanks for those links. I followed the kriskowal/q link and was
reminded of why Twisted's Deferreds are considered more awesome than
Futures: it's the chaining.

BUT... That's only important if callbacks are all the language lets
you do! If your baseline is this:

step1(function (value1) {
    step2(value1, function(value2) {
        step3(value2, function(value3) {
            step4(value3, function(value4) {
                // Do something with value4
            });
        });
    });
});

then of course the alternative using Deferred looks better:

Q.fcall(step1)
.then(step2)
.then(step3)
.then(step4)
.then(function (value4) {
    // Do something with value4
}, function (error) {
    // Handle any error from step1 through step4
})
.end();

(Both quoted literally from the kriskowal/q link.)

I also don't doubt that using classic Futures you can't do this -- the
chaining really matters for this style, and I presume this (modulo
unimportant API differences) is what typical Twisted code looks like.
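
To show what the chaining amounts to, here is a minimal sketch in Python.
MiniDeferred, then and fire are made-up names for a toy class written for
this example; it is not Twisted's Deferred and not the Q API, and it only
handles callbacks that return plain values:

```python
class MiniDeferred:
    """Toy callback chain: each stage sees the previous stage's result."""

    def __init__(self):
        self._stages = []      # list of (callback, errback) pairs
        self._fired = False
        self._failed = False
        self._value = None

    def then(self, callback, errback=None):
        """Append a stage; returning self is what allows chaining."""
        self._stages.append((callback, errback))
        if self._fired:
            self._run()
        return self

    def fire(self, value):
        """Deliver the initial value and run every stage added so far."""
        self._fired = True
        self._value = value
        self._run()

    def _run(self):
        while self._stages:
            callback, errback = self._stages.pop(0)
            handler = errback if self._failed else callback
            if handler is None:
                continue       # no handler for this state: pass it along
            try:
                self._value = handler(self._value)
                self._failed = False
            except Exception as exc:
                self._value = exc
                self._failed = True


results = []
ok = MiniDeferred()
ok.then(lambda v: v + 1).then(lambda v: v * 2).then(
    results.append, lambda err: results.append('error'))
ok.fire(1)

failures = []
bad = MiniDeferred()
bad.then(lambda v: 1 / v).then(lambda v: v * 2).then(
    failures.append, lambda err: failures.append(type(err).__name__))
bad.fire(0)

print(results, failures)
```

An error raised in any stage skips the remaining callbacks until a stage
with an error handler is reached, which is what lets one handler at the
end cover step1 through step4.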

However, Python has yield, and you can do much better (I'll write
plain yield for now, but it works the same with yield-from):

try:
  value1 = yield step1(<args>)
  value2 = yield step2(value1)
  value3 = yield step3(value2)
  value4 = yield step4(value3)
  # Do something with value4
except Exception:
  # Handle any error from step1 through step4

There's an outer function missing here, since you can't have a
toplevel yield; I think that's the same for the JS case, typically.
Also, strictly speaking the "Do something with value4" code should
probably be in an else: clause after the except handler. But that
actually leads nicely to the advantage:

This form is more flexible, since it is easier to catch different
exceptions at different points. It is also much easier to pass extra
information around. E.g. what if your flow ends up having to pass both
value1 and value2 into step3()? Sure, you can do that by making value2
a tuple (or a dict, or an object) incorporating value1 and the
original value2, but that's exactly where this style becomes
cumbersome, whereas in the yield-based form, such things can remain
simple local variables. All in all I find it more readable.
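
A runnable sketch of the yield-based style follows, under stated
assumptions: the Future class and the task decorator are minimal stand-ins
written for this example (not concurrent.futures, not NDB, not Twisted),
and the step functions return already-completed Futures so that no event
loop is needed:

```python
class Future:
    """Minimal stand-in for a Future; not the concurrent.futures class."""

    def __init__(self):
        self._callbacks = []
        self._done = False
        self._result = None
        self._exception = None

    def add_done_callback(self, callback):
        if self._done:
            callback(self)
        else:
            self._callbacks.append(callback)

    def _finish(self, result, exception):
        self._result, self._exception, self._done = result, exception, True
        for callback in self._callbacks:
            callback(self)

    def set_result(self, result):
        self._finish(result, None)

    def set_exception(self, exception):
        self._finish(None, exception)

    def result(self):
        if self._exception is not None:
            raise self._exception
        return self._result


def task(func):
    """Run a generator function, resuming it whenever a yielded Future is done."""
    def wrapper(*args, **kwargs):
        outcome = Future()
        gen = func(*args, **kwargs)

        def step(completed=None):
            try:
                if completed is None:
                    waiting_on = next(gen)            # start the generator
                elif completed._exception is not None:
                    waiting_on = gen.throw(completed._exception)
                else:
                    waiting_on = gen.send(completed._result)
            except StopIteration as stop:
                outcome.set_result(stop.value)        # the generator's return value
            except Exception as exc:
                outcome.set_exception(exc)
            else:
                waiting_on.add_done_callback(step)

        step()
        return outcome
    return wrapper


def finished(value=None, error=None):
    """Demo helper: a Future that is already complete."""
    future = Future()
    if error is not None:
        future.set_exception(error)
    else:
        future.set_result(value)
    return future


def step1(start):
    return finished(start + 1)


def step2(value1):
    if value1 < 0:
        return finished(error=ValueError('negative input'))
    return finished(value1 * 2)


def step3(value1, value2):
    return finished(value1 + value2)


@task
def flow(start):
    try:
        value1 = yield step1(start)
        value2 = yield step2(value1)
        value3 = yield step3(value1, value2)   # both locals are simply in scope
    except ValueError:
        return 'failed'
    return value3


print(flow(1).result(), flow(-5).result())
```

Passing value1 and value2 into step3 needs no tuple packing, and the
except clause can be as narrow as the situation calls for. Returning a
value from a generator requires Python 3.3 or later.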

In the past, when I pointed this out to Twisted aficionados, the
responses usually were a mix of "sure, if you like that style, we got
it covered, Twisted has inlineCallbacks," and "but that only works for
the simple cases, for the real stuff you still need Deferreds." But
that really sounds to me like Twisted people just liking what they've
got and not wanting to change. Which I understand -- I don't want to
change either. But I also observe that a lot of people find bare
Twisted-with-Deferreds too hard to grok, so they use Tornado instead,
or they build a layer on top of either (like Monocle), or they go a
completely different route and use greenlets/gevent instead -- and get
amazing performance and productivity that way too, even though they
know it's monkey-patching their asses off...

So, in the end, for Python 3.4 and beyond, I want to promote a style
that mixes simple callbacks (perhaps augmented with simple Futures)
and generator-based coroutines (either PEP 342, yield/send-based, or
PEP 380 yield-from-based). I'm looking to Twisted for the best
reactors (see other thread). But for transport/protocol
implementations I think that generator/coroutines offers a cleaner,
better interface than incorporating Deferred.

I hope that the path forward for Twisted will be simple enough: it
should be possible to hook Deferred into the simpler callback APIs
(perhaps a new implementation using some form of adaptation, but
keeping the interface the same). In a sense, the greenlet/gevent crowd
will be the biggest losers, since they currently write async code
without either callbacks or yield, using microthreads instead. I
wouldn't want to have to start putting yield back everywhere into that
code. But the stdlib will still support yield-free blocking calls
(even if under the hood some of these use yield/send-based or
yield-from-based coroutines) so the monkey-patchey tradition can
continue.

>> That's one of the
>> things I am desperately trying to keep out of Python, I find that
>> style unreadable and unmanageable (whenever I click on a button in a
>> website and nothing happens I know someone has a bug in their
>> callbacks). I understand you feel different; but I feel the general
>> sentiment is that callback-based async programming is even harder than
>> multi-threaded programming (and nobody is claiming that threads are
>> easy :-).
> :S
> There are (at least?) four different styles of asynchronous
> computation used in Twisted, and you seem to be confused as to which
> ones I'm talking about.
> 1. Explicit callbacks:
>     For example, reactor.callLater(t, lambda: print("woo hoo"))

I actually like this, as it's a lowest-common-denominator approach
which everyone can easily adapt to their purposes. See the thread I
started about reactors.

> 2. Method dispatch callbacks:
>     Similar to the above, the reactor or somebody has a handle on your
> object, and calls methods that you've defined when events happen
>     e.g. IProtocol's dataReceived method

While I'm sure it's expedient and captures certain common patterns
well, I like this the least of all -- calling fixed methods on an
object sounds like a step back; it smells of the old Java way (before
it had some equivalent of anonymous functions), and of asyncore, which
(nearly) everybody agrees is kind of bad due to its insistence that
you subclass its classes. (Notice how subclassing as the prevalent
approach to structuring your code has gotten into a lot of discredit
since 1996.)

> 3. Deferred callbacks:
>     When you ask for something to be done, it's set up, and you get an
> object back, which you can add a pipeline of callbacks to that will be
> called whenever whatever happens
>     e.g. twisted.internet.threads.deferToThread(print,
> "x").addCallback(print, "x was printed in some other thread!")

Discussed above.

> 4. Generator coroutines
>     These are a syntactic wrapper around deferreds. If you yield a
> deferred, you will be sent the result if the deferred succeeds, or an
> exception if the deferred fails.
>     e.g. examples from previous message

Seeing them as syntactic sugar for Deferreds is one way of looking at
it; no doubt this is how they're seen in the Twisted community because
Deferreds are older and more entrenched. But there's no requirement
that an architecture has to have Deferreds in order to use generator
coroutines -- simple Futures will do just fine, and Greg Ewing has
shown that using yield-from you can even do without those. (But he
does use simple, explicit callbacks at the lowest level of his
system.)

> I don't see a reason for the first to exist at all, the second one is
> kind of nice in some circumstances (see below), but perhaps overused.
> I feel like you're railing on the first and second when I'm talking
> about the third and fourth. I could be wrong.

I think you're wrong -- I was (and am) most concerned about the
perceived complexity of the API offered by, and the typical looks of
code using, Deferreds (i.e., #3).

>>> and possibly elsewhere. Moreover, they're extremely similar to futures, so
>>> if one is arcane so is the other.
>> I love Futures, they represent a nice simple programming model. But I
>> especially love that you can write async code using Futures and
>> yield-based coroutines (what you call inlineCallbacks) and never have
>> to write an explicit callback function. Ever.
> The reason explicit non-deferred callbacks are involved in Twisted is
> because of situations in which deferreds are not present, because of
> past history in Twisted. It is not at all a limitation of deferreds or
> something futures are better at, best as I'm aware.
> (In case that's what you're getting at.)

I don't think I was. It's clear to me (now) that Futures are simpler
than Deferreds -- and I like Futures better because of it, because for
the complex cases I would much rather use generator coroutines than
Deferreds.

> Anyway, one big issue is that generator coroutines can't really
> effectively replace callbacks everywhere. Consider the GUI button
> example you gave. How do you write that as a coroutine?
> I can see it being written like this:
>     def mycoroutine(gui):
>         while True:
>             clickevent = yield gui.mybutton1.on_click()
>             # handle clickevent
> But that's probably worse than using callbacks.

I touched on this briefly in the reactor thread. Basically, GUI
callbacks are often level-triggered rather than edge-triggered, and
IIUC Deferreds are not great for that either; and in a few cases where
edge-triggered coding makes sense I *would* like to use a generator
coroutine.

>>> Neither is clearly better or more obvious than the other. If anything
>>> I generally find deferred composition more useful than deferred
>>> tee-ing, so I feel like composition is the correct base operator, but
>>> you could pick another.
>> If you're writing long complicated chains of callbacks that benefit
>> from these features, IMO you are already doing it wrong. I understand
>> that this is a matter of style where I won't be able to convince you.
>> But style is important to me, so let's agree to disagree.

[In a follow-up to yourself, you quoted starting from this point and
appended "Nevermind that whole segment." I'm keeping it in here just
for context of the thread.]

> This is more than a matter of style, so at least for now I'd like to
> hold off on calling it even.
> In my day to day silly, synchronous, python code, I do lots of
> synchronous requests. For example, it's not unreasonable for me to
> want to load two different files from disk, or make several database
> interactions, etc. If I want to make this asynchronous, I have to find
> a way to execute multiple things that could hypothetically block, at
> the same time. If I can't do that easily, then the asynchronous
> solution has failed, because its entire purpose is to do everything
> that I do synchronously, except without blocking the main thread.
> Here's an example with lots of synchronous requests in Django:
> def view_paste(request, filekey):
>     try:
>         fileinfo= Pastes.objects.get(key=filekey)
>     except DoesNotExist:
>         t = loader.get_template('pastebin/error.html')
>         return HttpResponse(t.render(Context(dict(error='File does not exist'))))
>     f = open(fileinfo.filename)
>     fcontents = f.read()
>     t = loader.get_template('pastebin/paste.html')
>     return HttpResponse(t.render(Context(dict(file=fcontents))))
> How many blocking requests are there? Lots. This is, in a word, a
> long, complicated chain of synchronous requests. This is also very
> similar to what actual django code might look like in some
> circumstances. Even if we might think this is unreasonable, some
> subset of alteration of this is reasonable. Certainly we should be
> able to, say, load multiple (!) objects from the database, and open
> the template (possibly from disk), all potentially-blocking
> operations.
> This is inherently a long, complicated chain of requests, whether we
> implement it asynchronously or synchronously, or use Deferreds or
> Futures, or write it in Java or Python. Some parts can be done at any
> time before the end (loader.get_template(...)), some need to be done
> in a certain order, and there's branching depending on what happens in
> different cases. In order to even write this code _at all_, we need a
> way to chain these IO actions together. If we can't chain them
> together, we can't produce that final synthesis of results at the end.

[This is where you write "Ugh, just realized way after the fact that of
course you meant callbacks, not composition. I feel dumb. Nevermind
that whole segment."]

I'd like to come back to that Django example though. You are implying
that there are some opportunities for concurrency here, and I agree,
assuming we believe disk I/O is slow enough to bother making it
asynchronous. (In App Engine it's not, and we can't anyways, but in
other contexts I agree that it would be bad if a slow disk seek were
to hold up all processing -- not to mention that it might really be

The potentially async operations I see are:

(1) fileinfo = Pastes.objects.get(key=filekey)  # I assume this is
some kind of database query

(2) loader.get_template('pastebin/error.html')

(3) f = open(fileinfo.filename)  # depends on (1)

(4) fcontents =  # depends on (3)

(5) loader.get_template('pastebin/paste.html')

How would you code that using Twisted Deferreds?

Using Futures and generator coroutines, I would do it as follows. I'm
hypothesizing that for every blocking API foo() there is a
corresponding non-blocking API foo_async() with the same call
signature, and returning a Future whose result is what the synchronous
API returns (and raises what the synchronous call would raise, if
there's an error). These are the conventions I use in NDB. I'm also
inventing a @task decorator.

  @task
  def view_paste_async(request, filekey):
      # Create Futures -- no yields!
      f1 = Pastes.objects.get_async(key=filekey) # This won't raise
      f2 = loader.get_template_async('pastebin/error.html')
      f3 = loader.get_template_async('pastebin/paste.html')

      try:
          fileinfo = yield f1
      except DoesNotExist:
          t = yield f2
          return HttpResponse(t.render(Context(dict(error='File does not exist'))))

      f = yield open_async(fileinfo.filename)
      fcontents = yield f.read_async()
      t = yield f3
      return HttpResponse(t.render(Context(dict(file=fcontents))))

You could easily decide not to bother loading the error template
asynchronously (assuming most requests don't fail), and you could move
the creation of f3 below the try/except. But you get the idea. Even if
you do everything serially, inserting the yields and _async calls
would make this more parallelizable without the use of threads. (If
you were using threads, all this would be moot of course -- but then
your limit on requests being handled concurrently probably goes way
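
A minimal sketch of how a decorator of this kind could drive a
generator that yields Futures. This is not NDB's implementation: the
names task, get_async and lookup, and the use of a thread pool in place
of a reactor, are assumptions made for illustration. Blocking on each
Future is a simplification; a real event loop would resume the
generator from a callback, and the decorated function would itself
return a Future.

```python
import functools
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=4)  # hypothetical stand-in for a reactor


def task(func):
    """Run a generator coroutine; each yielded Future is resolved and sent back."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        gen = func(*args, **kwargs)
        to_send, to_throw = None, None
        while True:
            try:
                if to_throw is not None:
                    fut = gen.throw(to_throw)
                else:
                    fut = gen.send(to_send)
            except StopIteration as stop:
                return stop.value          # "return x" inside the generator
            to_send, to_throw = None, None
            try:
                to_send = fut.result()     # simplification: blocks this thread
            except Exception as exc:
                to_throw = exc             # re-raised inside the generator
    return wrapper


def get_async(table, key):
    """Hypothetical foo_async(): same arguments as a lookup, returns a Future."""
    return _pool.submit(table.__getitem__, key)


@task
def lookup(table, key):
    f1 = get_async(table, key)         # started now, no yield yet
    f2 = get_async(table, "default")   # runs concurrently with f1
    try:
        value = yield f1
    except KeyError:
        value = yield f2
    return value
```

Calling lookup({"a": 1, "default": 0}, "a") takes the normal path, and a
missing key takes the except branch, just as the synchronous version
would.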

> We _need_ a pipeline or something computationally equivalent or more
> powerful. Results from past "deferred computations" need to be passed
> forward into future "deferred computations", in order to implement
> this at all.

Yeah, and I think that a single generator using multiple yields is the
ideal pipeline to me (see my example near the top based on

> This is not a style issue, this is an issue of needing to be able to
> solve problems that involve more than one computation where the
> results of every computation matters somewhere. It's just that in this
> case, some of the computations are computed asynchronously.

And I think generators do this very well.

>> I am totally open to learning from Twisted's experience. I hope that
>> you are willing to share even if the end result might not look like
>> Twisted at all -- after all in Python 3.3 we have "yield from" and
>> return from a generator and many years of experience with different
>> styles of async APIs. In addition to Twisted, there's Tornado and
>> Monocle, and then there's the whole greenlets/gevent and
>> Stackless/microthreads community that we can't completely ignore. I
>> believe somewhere is an ideal async architecture, and I hope you can
>> help us discover it.
>> (For example, I am very interested in Twisted's experiences writing
>> real-world performant, robust reactors.)
> For that stuff, you'd have to speak to the main authors of Twisted.
> I'm just a twisted user. :(

They seem to be mostly ignoring this conversation, so your standing in
as a proxy for them is much appreciated!

> In the end it really doesn't matter what API you go with. The Twisted
> people will wrap it up so that they are compatible, as far as that is
> possible.

And I want to ensure that that is possible and preferably easy, if I
can do it without introducing too many warts in the API that
non-Twisted users see and use.

> I hope I haven't detracted too much from the main thrust of the
> surrounding discussion. Futures/deferreds are a pretty big tangent, so
> sorry. I justified it to myself by figuring that it'd probably come up
> anyway, somehow, since these are useful abstractions for asynchronous
> programming.

Not at all. This has been a valuable refresher for me!

--Guido van Rossum (

From nadeem.vawda at  Sat Oct 13 00:16:38 2012
From: nadeem.vawda at (Nadeem Vawda)
Date: Sat, 13 Oct 2012 00:16:38 +0200
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 8, 2012 at 8:47 PM, Antoine Pitrou <solipsis at> wrote:
> - `p[q]` joins path q to path p

-1. Much less intuitive than the other two proposed operators.

> - `p + q` joins path q to path p

-1. Silently does the wrong thing if p and q are both strings.

> - `p / q` joins path q to path p

+1. Reads naturally, and fails loudly if p is a string.

> - `p.join(q)` joins path q to path p

-1. Produces a nonsensical result if p and q are both strings. I'd be
+1 on `p.joinpath(q)`, since it doesn't have this problem.
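
The failure modes above can be sketched with plain strings. The path
type used for comparison is pathlib.PurePosixPath as it later shipped
in the standard library; the variable names are only illustrative.

```python
from pathlib import PurePosixPath

p, q = "usr", "lib"

concatenated = p + q        # no separator is inserted, and no error is raised
str_joined = p.join(q)      # str.join treats q as a sequence of characters

try:
    p / q
    slash_error = None
except TypeError as exc:    # str does not support the / operator
    slash_error = exc

base = PurePosixPath("usr")
with_slash = base / q
with_method = base.joinpath(q)

print(concatenated, str_joined, with_slash, with_method)
```

Only the `/` spelling fails loudly when both operands are accidentally
strings.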


From breamoreboy at  Sat Oct 13 00:38:59 2012
From: breamoreboy at (Mark Lawrence)
Date: Fri, 12 Oct 2012 23:38:59 +0100
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
Message-ID: <k5a63s$r0p$>

On 08/10/2012 19:47, Antoine Pitrou wrote:
> Hello,
> Since there has been some controversy about the joining syntax used in
> PEP 428 (filesystem path objects), I would like to run an informal poll
> about it. Please answer with +1/+0/-0/-1 for each proposal:
> - `p[q]` joins path q to path p
> - `p + q` joins path q to path p
> - `p / q` joins path q to path p
> - `p.join(q)` joins path q to path p
> (you can include a rationale if you want, but don't forget to vote :-))
> Thank you
> Antoine.

How about using the caret symbol to join so `p ^ q`?  Rationale, it 
looks like a miniature combination of the backslash and forwardslash so 
should keep Windows and *nix camps happy, plus it's only used in Python 
(I think?) for bitwise operations so shouldn't confuse anybody. 
Parachute is ready for the antiaircraft fire :)


Mark Lawrence.

From guido at  Sat Oct 13 00:49:36 2012
From: guido at (Guido van Rossum)
Date: Fri, 12 Oct 2012 15:49:36 -0700
Subject: [Python-ideas] The async API of the future: Reactors
In-Reply-To: <>
References: <>
Message-ID: <>

[Responding to yet another message in the original thread]

On Thu, Oct 11, 2012 at 9:45 PM, Trent Nelson <trent at> wrote:
> On Thu, Oct 11, 2012 at 07:40:43AM -0700, Antoine Pitrou wrote:
>> On Wed, 10 Oct 2012 20:55:23 -0400 Trent Nelson <trent at> wrote:
>> >     You could leverage this with kqueue and epoll; have similar threads
>> >     set up to simply process I/O independent of the GIL, using the same
>> >     facilities that would be used by IOCP-processing threads.

>> Would you really win anything by doing I/O in separate threads, while
>> doing normal request processing in the main thread?

>     If the I/O threads can run independent of the GIL, yes, definitely.
>     The whole premise of IOCP is that the kernel takes care of waking
>     one of your I/O handlers when data is ready.  IOCP allows that to
>     happen completely independent of your application's event loop.
>     It really is the best way to do I/O.  The Windows NT design team
>     got it right from the start.  The AIX and Solaris implementations
>     are semantically equivalent to Windows, without the benefit of
>     automatic thread pool management (and a few other optimisations).
>     On Linux and BSD, you could get similar functionality by spawning
>     I/O threads that could also run independent of the GIL.  They would
>     differ from the IOCP worker threads in the sense that they all have
>     their own little event loops around epoll/kqueue+timeout.  i.e. they
>     have to continually ask "is there anything to do with this set of
>     fds", then process the results, then manage set synchronisation.
>     IOCP threads, on the other hand, wait for completion of something
>     that has already been requested.  The thread body implementation is
>     significantly simpler, and no synchronisation primitives are needed.

>> That said, the idea of a common API architected around async I/O,
>> rather than non-blocking I/O, sounds interesting at least theoretically.

(Oh, what a nice distinction.)
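
A rough sketch of that distinction, under stated assumptions: the
readiness style uses the standard selectors module, and the completion
style is only faked with a worker thread (it does not use IOCP). The
function names are made up for this example.

```python
import selectors
import socket
from concurrent.futures import ThreadPoolExecutor


def read_when_ready(sock, nbytes):
    """Readiness style (select/epoll/kqueue): wait until readable, then read."""
    sel = selectors.DefaultSelector()
    sel.register(sock, selectors.EVENT_READ)
    try:
        sel.select(timeout=5)       # "is there anything to do with this fd?"
        return sock.recv(nbytes)    # the read itself happens afterwards
    finally:
        sel.unregister(sock)
        sel.close()


def read_completion(pool, sock, nbytes):
    """Completion style, faked with a thread: request the read, get a Future."""
    return pool.submit(sock.recv, nbytes)


a, b = socket.socketpair()
with ThreadPoolExecutor(max_workers=1) as pool:
    pending = read_completion(pool, b, 5)   # read requested before data exists
    a.sendall(b"hello")
    completed = pending.result(timeout=5)   # the finished read, data included

    a.sendall(b"world")
    ready = read_when_ready(b, 5)
a.close()
b.close()
```

In the first style the caller is told "you may read now"; in the second
it is told "your read has finished".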

>     It's the best way to do it.  There should really be a libevent-type
>     library (libiocp?) that leverages IOCP where possible, and fakes it
>     when not using a half-sync/half-async pattern with threads and epoll
>     or kqueue on Linux and FreeBSD, falling back to processes and poll
>     on everything else (NetBSD, OpenBSD and HP-UX (the former two not
>     having robust-enough pthread implementations, the latter not having
>     anything better than select or poll)).

In which category does OS X fall?

>     However, given that the best IOCP implementations are a) Windows by
>     a huge margin, and then b) Solaris and AIX in equal, distant second
>     place, I can't see that happening any time soon.
>     (Trying to use IOCP in the reactor fashion described above for epoll
>      and kqueue is far more limiting than having an IOCP-oriented API
>      and faking it for platforms where native support isn't available.)

How close would our abstracted reactor interface have to be exactly
like IOCP? The actual IOCP API calls have very little to recommend
them -- it's the implementation and the architecture that we're after.
But we want it to be able to use actual IOCP calls on all systems that
have them.

>> Maybe all those outdated Snakebite Operating Systems are useful for
>> something after all. ;-P

>     All the operating systems are the latest version available!
>     In addition, there's also a Solaris 9 and HP-UX 11iv2 box.
>     The hardware, on the other hand... not so new in some cases.

--Guido van Rossum (

From dreamingforward at  Sat Oct 13 00:56:06 2012
From: dreamingforward at (Mark Adam)
Date: Fri, 12 Oct 2012 17:56:06 -0500
Subject: [Python-ideas] Floating point contexts in Python core
In-Reply-To: <>
References: <>
Message-ID: <>

>>> I would gladly give up a small amount of speed for better control
>>> over floats, such as whether 1/0.0 raised an exception or
>>> returned infinity.
>> Umm, you would be giving up a *lot* of speed.  Native floating point
>> happens right in the processor, so if you want special behavior, you'd
>> have to take the floating point out of hardware and into "user space".
> Even in user-space, you're not giving up that much speed in practical
> terms, at least not for my needs. The new decimal module in Python 3.3 is
> less than a factor of 10 times slower than Python's floats, which makes it
> pretty much instantaneous to my mind :)

Hmm, well, if it's only that much slower, then we should implement
Rationals and get rid of the issue altogether.
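
For comparison, a short sketch of what the standard library already
offers here: decimal contexts let the caller choose between an
exception and infinity, and fractions.Fraction provides exact
rationals. The variable names are illustrative only.

```python
import decimal
from fractions import Fraction

# Binary floats: Python raises on division by zero.
try:
    1 / 0.0
    float_result = "returned"
except ZeroDivisionError:
    float_result = "raised"

# Decimal: the active context decides between an exception and Infinity.
with decimal.localcontext() as ctx:
    ctx.traps[decimal.DivisionByZero] = False
    quiet = decimal.Decimal(1) / decimal.Decimal(0)

with decimal.localcontext() as ctx:
    ctx.traps[decimal.DivisionByZero] = True
    try:
        decimal.Decimal(1) / decimal.Decimal(0)
        trapped = False
    except decimal.DivisionByZero:
        trapped = True

# Rationals are exact, at a cost in speed.
third = Fraction(1, 3)
exact = (third + third + third == 1)
```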


From python at  Sat Oct 13 00:57:30 2012
From: python at (MRAB)
Date: Fri, 12 Oct 2012 23:57:30 +0100
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-12 22:37, Ethan Furman wrote:
> Ram Rachum wrote:
>> Hi everybody,
>> Today a funny thought occurred to me. Ever since I've learned to program
>> when I was a child, I've taken for granted that when programming, the
>> sign used for multiplication is *. But now that I think about it, why?
>> Now that we have Unicode, why not use · ?
Why not use × ?

> Because it is too easy to confuse · with .
Because it is too easy to confuse × with x

> Because it is not solving a problem.

> Because it would still take work, and then easily cause confusion.
> <aside>
> In college we dropped the · and just wrote stuff like:
> (x + z)(x - y)
> but we can't do that in Python because they are function calls.
> </aside>
> In short, I don't see it happening.

From guido at  Sat Oct 13 01:22:39 2012
From: guido at (Guido van Rossum)
Date: Fri, 12 Oct 2012 16:22:39 -0700
Subject: [Python-ideas] The async API of the future: PEP 3153 (async-pep)
Message-ID: <>

[Hopefully this is the last spin-off thread from "asyncore: included
batteries don't fit"]

>> > If there's one take away idea from async-pep, it's reusable protocols.

>> Is there a newer version than what's on
>> ? It seems to be missing any
>> specific proposals, after spending a lot of time giving a rationale
>> and defining some terms. The version on
>> doesn't seem to be any more complete.

> Correct.

So it's totally unfinished?

> If I had to change it today, I'd throw out consumers and producers and just
> stick to a protocol API.
> Do you feel that there should be less talk about rationale?

No, but I feel that there should be some actual specification. I am
also looking forward to an actual meaty bit of example code -- ISTR
you mentioned you had something, but that it was incomplete, and I
can't find the link.

>> > The PEP should probably be a number of PEPs. At first sight, it seems
>> > that this number is at least four:
>> >
>> > 1. Protocol and transport abstractions, making no mention of
>> > asynchronous IO
>> > (this is what I want 3153 to be, because it's small, manageable, and
>> > virtually everyone appears to agree it's a fantastic idea)
>> But the devil is in the details. *What* specifically are you
>> proposing? How would you write a protocol handler/parser without any
>> reference to I/O? Most protocols are two-way streets -- you read some
>> stuff, and you write some stuff, then you read some more. (HTTP may be
>> the exception here, if you don't keep the connection open.)
> It's not that there's *no* reference to IO: it's just that that reference is
> abstracted away in data_received and the protocol's transport object, just
> like Twisted's IProtocol.

The words "data_received" don't even occur in the PEP.
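
To make the idea concrete, here is a minimal sketch of a protocol whose
only contact with I/O is data_received and a transport object. It is
not taken from the PEP or from Twisted; ListTransport and
EchoLineProtocol are hypothetical names, and the transport just records
writes.

```python
class ListTransport:
    """Hypothetical transport: records writes instead of touching a socket."""
    def __init__(self):
        self.written = []

    def write(self, data):
        self.written.append(data)


class EchoLineProtocol:
    """Parses lines from whatever bytes arrive and replies via its transport."""
    def __init__(self, transport):
        self.transport = transport
        self._buffer = b""

    def data_received(self, data):
        # Bytes may arrive in arbitrary chunks, so buffer until a full line.
        self._buffer += data
        while b"\n" in self._buffer:
            line, self._buffer = self._buffer.split(b"\n", 1)
            self.transport.write(b"echo: " + line + b"\n")


transport = ListTransport()
proto = EchoLineProtocol(transport)
for chunk in (b"hel", b"lo\nwor", b"ld\n"):   # split mid-line on purpose
    proto.data_received(chunk)
```

The same class could be driven by a blocking read loop or by an event
loop, since it never calls recv() or send() itself.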

>> > 2. A base reactor interface
>> I agree that this should be a separate PEP. But I do think that in
>> practice there will be dependencies between the different PEPs you are
>> proposing.
> Absolutely.
>> > 3. A way of structuring callbacks: probably deferreds with a built-in
>> > inlineCallbacks for people who want to write synchronous-looking code
>> > with
>> > explicit yields for asynchronous procedures
>> Your previous two ideas sound like you're not tied to backward
>> compatibility with Tornado and/or Twisted (not even via an adaptation
>> layer). Given that we're talking Python 3.4 here that's fine with me
>> (though I think we should be careful to offer a path forward for those
>> packages and their users, even if it means making changes to the
>> libraries).
> I'm assuming that by previous ideas you mean points 1, 2: protocol interface
> + reactor interface.


> I don't see why twisted's IProtocol couldn't grow an adapter for stdlib
> Protocols. Ditto for Tornado. Similarly, the reactor interface could be
> *provided* (through a fairly simple translation layer) by different
> implementations, including twisted.


>> But Twisted Deferred is pretty arcane, and I would much
>> rather not use it as the basis of a forward-looking design. I'd much
>> rather see what we can mooch off PEP 3148 (Futures).
> I think this needs to be addressed in a separate mail, since more stuff has
> been said about deferreds in this thread.

Yes, that's in the thread with subject "The async API of the future:
Twisted and Deferreds".

>> > 4+ adapting the stdlib tools to using these new things
>> We at least need to have an idea for how this could be done. We're
>> talking serious rewrites of many of our most fundamental existing
>> synchronous protocol libraries (e.g. httplib, email, possibly even
>> io.TextIOWrapper), most of which have had only scant updates even
>> through the Python 3 transition apart from complications to deal with
>> the bytes/str dichotomy.
> I certainly agree that this is a very large amount of work. However, it has
> obvious huge advantages in terms of code reuse. I'm not sure if I understand
> the technical barrier though. It should be quite easy to create a blocking
> API with a protocol implementation that doesn't care; just call
> data_received with all your data at once, and presto! (Since transports in
> general don't provide guarantees as to how bytes will arrive, existing
> Twisted IProtocols have to do this already anyway, and that seems to work
> fine.)

Hmm... I guess that depends on how your legacy code works. As Barry
mentioned somewhere, the email package's feedparser() is an attempt at
implementing this -- but he sounded like he has doubts that it works as-is
in an async environment.
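
For reference, the push-style interface in question is
email.feedparser.FeedParser. A small sketch of feeding it arbitrary
chunks (the message text and the chunk size are made up):

```python
from email.feedparser import FeedParser

raw = "Subject: reusable protocols\nFrom: someone@example.invalid\n\nbody text\n"

parser = FeedParser()
for i in range(0, len(raw), 7):     # feed arbitrary 7-character chunks
    parser.feed(raw[i:i + 7])
message = parser.close()            # returns the root message object

subject = message["Subject"]
payload = message.get_payload()
```

Nothing here reads from a file or socket; the caller decides where the
chunks come from, which is what makes it a candidate for async use.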

However I am more worried about pull-based APIs. Take (as an extreme
example) the standard stream API for reading, especially
TextIOWrapper. I could see how we could turn the *writing* APIs async
easily enough, but I don't see how to do it for the reading end -- you
can't seriously propose to read the entire file into the buffer and
then satisfy all reads from memory.

>> > Re: forward path for existing asyncore code. I don't remember this being
>> > raised as an issue. If anything, it was mentioned in passing, and I think
>> > the answer to it was something to the tune of "asyncore's API is broken,
>> > fixing it is more important than backwards compat". Essentially I agree with
>> > Guido that the important part is an upgrade path to a good third-party
>> > library, which is the part about asyncore that REALLY sucks right now.
>> I have the feeling that the main reason asyncore sucks is that it
>> requires you to subclass its Dispatcher class, which has a rather
>> treacherous interface.
> There's at least a few others, but sure, that's an obvious one. Many of the
> objections I can raise however don't matter if there's already an *existing
> working solution*. I mean, sure, it can't do SSL, but if you have code that
> does what you want right now, then obviously SSL isn't actually needed.

I think you mean this as an indication that providing the forward path
for existing asyncore apps shouldn't be rocket science, right? Sure, I
don't want to worry about that, I just want to make sure that we don't
*completely* paint ourselves into the wrong corner when it comes to

>> > Regardless, an API upgrade is probably a good idea. I'm not sure if it
>> > should go in the first PEP: given the separation I've outlined above (which
>> > may be too spread out...), there's no obvious place to put it besides it
>> > being a new PEP.
>> Aren't all your proposals API upgrades?
> Sorry, that was incredibly poor wording. I meant something more of an
> adapter: an upgrade path for existing asyncore code to new and shiny 3153
> code.

Yes, now it makes sense.

>> > Re base reactor interface: drawing maximally from the lessons learned in
>> > twisted, I think IReactorCore (start, stop, etc), IReactorTime (call later,
>> > etc), asynchronous-looking name lookup, fd handling are the important
>> > parts.
>> That actually sounds more concrete than I'd like a reactor interface
>> to be. In the App Engine world, there is a definite need for a
>> reactor, but it cannot talk about file descriptors at all -- all I/O
>> is defined in terms of RPC operations which have their own (several
>> layers of) async management but still need to be plugged in to user
>> code that might want to benefit from other reactor functionality such
>> as scheduling and placing a call at a certain moment in the future.
> I have a hard time understanding how that would work well outside of
> something like GAE. IIUC, that level of abstraction was chosen because it
> made sense for GAE (and I don't disagree), but I'm not sure it makes sense
> here.

I think I answered this in the reactors thread -- I propose an I/O
object abstraction that is not directly tied to a file descriptor, but
for which a concrete implementation can be made to support file
descriptors, and another to support App Engine RPC.

> In this example, where would eg the select/epoll/whatever calls happen? Is
> it something that calls the reactor that then in turn calls whatever?

App Engine doesn't have select/epoll/whatever, so it would have a
reactor implementation that doesn't use them. But the standard Unix
reactor would support file descriptors using select/etc.

Please respond in the reactors thread.

>> > call_every can be implemented in terms of call_later on a separate object,
>> > so I think it should be (eg twisted.internet.task.LoopingCall). One thing
>> > that is apparently forgotten about is event loop integration. The prime way
>> > of having two event loops cooperate is *NOT* "run both in parallel", it's
>> > "have one call the other". Even though not all loops support this, I think
>> > it's important to get this as part of the interface (raise an exception for
>> > all I care if it doesn't work).
>> This is definitely one of the things we ought to get right. My own
>> thoughts are slightly (perhaps only cosmetically) different again:
>> ideally each event loop would have a primitive operation to tell it to
>> run for a little while, and then some other code could tie several
>> event loops together.
> As an API, that's pretty close to Twisted's IReactorCore.iterate, I think.
> It'd work well enough. The issue is only with event loops that don't
> cooperate so well.

Again, a topic for the reactor thread.
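
A toy sketch of the two ideas above, namely call_every built on
call_later and a "run for a little while" primitive that lets outer
code tie loops together. MiniLoop, run_once and the fake clock are
inventions for this example, not a proposed interface.

```python
import heapq
import itertools


class MiniLoop:
    """Toy event loop with an injectable clock; only timers, no I/O."""
    def __init__(self, clock):
        self._clock = clock
        self._timers = []
        self._order = itertools.count()   # tie-breaker for equal deadlines

    def call_later(self, delay, callback, *args):
        deadline = self._clock() + delay
        heapq.heappush(self._timers, (deadline, next(self._order), callback, args))

    def run_once(self):
        """Run every callback that is due right now, then return."""
        now = self._clock()
        due = []
        while self._timers and self._timers[0][0] <= now:
            due.append(heapq.heappop(self._timers))
        for _deadline, _n, callback, args in due:
            callback(*args)


def call_every(loop, interval, callback):
    """Periodic calls built purely on call_later."""
    def tick():
        callback()
        loop.call_later(interval, tick)
    loop.call_later(interval, tick)


now = [0.0]
clock = lambda: now[0]
net_loop, gui_loop = MiniLoop(clock), MiniLoop(clock)
log = []
call_every(net_loop, 1.0, lambda: log.append("net"))
gui_loop.call_later(2.0, log.append, "gui")

for _ in range(3):                 # outer code ties the two loops together
    now[0] += 1.0
    net_loop.run_once()
    gui_loop.run_once()
```

Neither loop knows about the other; the for loop at the bottom plays
the role of the code that drives both.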

But I'm really hoping you'll make good on your promise of redoing
async-pep, giving some actual specifications and example code, so I
can play with it.

--Guido van Rossum (

From shibturn at  Sat Oct 13 01:39:21 2012
From: shibturn at (Richard Oudkerk)
Date: Sat, 13 Oct 2012 00:39:21 +0100
Subject: [Python-ideas] The async API of the future: Twisted and
In-Reply-To: <>
References: <>
Message-ID: <k5a9nb$l3j$>

On 12/10/2012 11:11pm, Guido van Rossum wrote:
> Using Futures and generator coroutines, I would do it as follows. I'm
> hypothesizing that for every blocking API foo() there is a
> corresponding non-blocking API foo_async() with the same call
> signature, and returning a Future whose result is what the synchronous
> API returns (and raises what the synchronous call would raise, if
> there's an error). These are the conventions I use in NDB. I'm also
> inventing a @task decorator.
>   @task
>   def view_paste_async(request, filekey):
>      # Create Futures -- no yields!
>      f1 = Pastes.objects.get_async(key=filekey) # This won't raise
>      f2 = loader.get_template_async('pastebin/error.html')
>      f3 = loader.get_template_async('pastebin/paste.html')
>      try:
>          fileinfo= yield f1
>      except DoesNotExist:
>          t = yield f2
>          return HttpResponse(t.render(Context(dict(error='File does not
> exist'))))
>      f = yield open_async(fileinfo.filename)
>      fcontents = yield f.read_async()
>      t = yield f3
>      return HttpResponse(t.render(Context(dict(file=fcontents))))

So would the futures be registered with the reactor as soon as they are 
created, or only when they are yielded?  I can't see how there can be 
any "concurrency" if they don't start till they are yielded.  It would 
be like doing

    t1 = Thread(target=f1)
    t2 = Thread(target=f2)
    t3 = Thread(target=f3)
    t1.start()
    t1.join()
    t2.start()
    t2.join()
    t3.start()
    t3.join()

But if the futures are registered immediately with the reactor then does 
that mean there is a singleton reactor?  That seems rather inflexible.


From greg.ewing at  Sat Oct 13 02:01:23 2012
From: greg.ewing at (Greg Ewing)
Date: Sat, 13 Oct 2012 13:01:23 +1300
Subject: [Python-ideas] The async API of the future: Reactors
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:

> - There's an abstract Reactor class and an abstract Async I/O object
> class.

Can we please use a better term than "reactor" for this?
Its meaning is only obvious to someone familiar with Twisted.

Not being such a person, it's taken me a while to figure out
from this discussion that it refers to the central object
implementing the event loop, and not one of the user-supplied
objects that could equally well be described as "reacting"
to events.

Something like "dispatcher" would be clearer, IMO.


From greg.ewing at  Sat Oct 13 02:17:28 2012
From: greg.ewing at (Greg Ewing)
Date: Sat, 13 Oct 2012 13:17:28 +1300
Subject: [Python-ideas] The async API of the future: Reactors
In-Reply-To: <>
References: <>
Message-ID: <>

Antoine Pitrou wrote:

> On Fri, 12 Oct 2012 11:13:23 -0700
> Guido van Rossum <guido at> wrote:
>>Note that the callback is *not* a
>>designated method on the I/O object!
> Why isn't it?

One reason might be that it more or less forces you to
subclass the I/O object, instead of just using one of
a few predefined ones for file, socket, etc.

Although this could be ameliorated by giving the standard
I/O objects the ability to have callbacks plugged into
them. Then you could use whichever style was most


From guido at  Sat Oct 13 02:22:07 2012
From: guido at (Guido van Rossum)
Date: Fri, 12 Oct 2012 17:22:07 -0700
Subject: [Python-ideas] The async API of the future: Twisted and
In-Reply-To: <k5a9nb$l3j$>
References: <>
Message-ID: <>

On Fri, Oct 12, 2012 at 4:39 PM, Richard Oudkerk <shibturn at> wrote:
> On 12/10/2012 11:11pm, Guido van Rossum wrote:
>> Using Futures and generator coroutines, I would do it as follows. I'm
>> hypothesizing that for every blocking API foo() there is a
>> corresponding non-blocking API foo_async() with the same call
>> signature, and returning a Future whose result is what the synchronous
>> API returns (and raises what the synchronous call would raise, if
>> there's an error). These are the conventions I use in NDB. I'm also
>> inventing a @task decorator.
>>   @task
>>   def view_paste_async(request, filekey):
>>      # Create Futures -- no yields!
>>      f1 = Pastes.objects.get_async(key=filekey) # This won't raise
>>      f2 = loader.get_template_async('pastebin/error.html')
>>      f3 = loader.get_template_async('pastebin/paste.html')
>>      try:
>>          fileinfo= yield f1
>>      except DoesNotExist:
>>          t = yield f2
>>          return HttpResponse(t.render(Context(dict(error='File does not exist'))))
>>      f = yield open_async(fileinfo.filename)
>>      fcontents = yield f.read_async()
>>      t = yield f3
>>      return HttpResponse(t.render(Context(dict(file=fcontents))))
> So would the futures be registered with the reactor as soon as they are
> created, or only when they are yielded?  I can't see how there can be any
> "concurrency" if they don't start till they are yielded.  It would be like
> doing
>    t1 = Thread(target=f1)
>    t2 = Thread(target=f2)
>    t3 = Thread(target=f3)
>    t1.start()
>    t1.join()
>    t2.start()
>    t2.join()
>    t3.start()
>    t3.join()
> But if the futures are registered immediately with the reactor then does
> that mean there is a singleton reactor?  That seems rather inflexible.

I don't think it follows that there can only be one reactor if they
are registered immediately. There could be a notion of "current
reactor" maintained in thread-local context; moreover it could depend
on the reactor that made the callback that caused the current task to
run. The reactor could also be chosen by the code that made the
Future. (Though I'm not immediately sure how that would work in the
yield-from scenario -- but I'm sure there's a way.)
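
A minimal sketch of the "current reactor in thread-local context" idea,
assuming hypothetical names (Loop, get_current_loop, make_future). The
Future here is reduced to the act of registering itself with whichever
loop belongs to the calling thread.

```python
import threading

_context = threading.local()


class Loop:
    """Hypothetical event loop that only records what was registered with it."""
    def __init__(self, name):
        self.name = name
        self.registered = []


def get_current_loop():
    # Each thread lazily gets its own loop; nothing is shared between threads.
    loop = getattr(_context, "loop", None)
    if loop is None:
        loop = _context.loop = Loop(threading.current_thread().name)
    return loop


def make_future(label):
    """Stand-in for a Future constructor: registers itself on creation."""
    loop = get_current_loop()
    loop.registered.append(label)
    return loop


seen = {}


def worker():
    seen[threading.current_thread().name] = make_future("f1")


threads = [threading.Thread(target=worker, name="req-%d" % i) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Futures register immediately, yet each request thread ends up with its
own loop rather than a process-wide singleton.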

FWIW, in NDB there is one event loop per thread; separate threads are
handling separate requests and are completely independent. Also, in
NDB there's some code that turns Futures into actual RPCs that runs
only once there are no more immediately runnable tasks. I think that
in general such behaviors are up to the reactor implementation for the
platform though, and should not directly be reflected in the reactor

--Guido van Rossum (

From mwm at  Sat Oct 13 02:26:20 2012
From: mwm at (Mike Meyer)
Date: Fri, 12 Oct 2012 19:26:20 -0500
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 12, 2012 at 4:37 PM, Ethan Furman <ethan at> wrote:
> Ram Rachum wrote:
>> Hi everybody,
>> Today a funny thought occurred to me. Ever since I've learned to program
>> when I was a child, I've taken for granted that when programming, the sign
>> used for multiplication is *. But now that I think about it, why? Now that
>> we have Unicode, why not use · ?
> Because it is too easy to confuse · with .
> Because it is not solving a problem.
> Because it would still take work, and then easily cause confusion.

Because, unlike *, it's a valid character in identifiers. Which means
allowing it either breaks backwards compatibility or makes for some
very confusing usage conventions.
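
This can be checked with str.isidentifier(). The sketch assumes the
character under discussion is the middle dot (U+00B7); the
multiplication sign (U+00D7) is included for comparison.

```python
middle_dot = "\u00b7"    # assumed to be the character being proposed
times_sign = "\u00d7"    # multiplication sign, for comparison

dot_ok = ("a" + middle_dot + "b").isidentifier()     # may continue a name
times_ok = ("a" + times_sign + "b").isidentifier()   # a symbol, not a letter
star_ok = "a*b".isidentifier()

print(dot_ok, times_ok, star_ok)
```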

From guido at  Sat Oct 13 02:26:27 2012
From: guido at (Guido van Rossum)
Date: Fri, 12 Oct 2012 17:26:27 -0700
Subject: [Python-ideas] The async API of the future: Reactors
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 12, 2012 at 5:01 PM, Greg Ewing <greg.ewing at> wrote:
> Guido van Rossum wrote:
>> - There's an abstract Reactor class and an abstract Async I/O object
>> class.
> Can we please use a better term than "reactor" for this?
> Its meaning is only obvious to someone familiar with Twisted.
> Not being such a person, it's taken me a while to figure out
> from this discussion that it refers to the central object
> implementing the event loop, and not one of the user-supplied
> objects that could equally well be described as "reacting"
> to events.
> Something like "dispatcher" would be clearer, IMO.

Sorry about that. I'm afraid it's too late for this thread's subject
line, but I will try to make sure that if and when this makes it into
the standard library it'll have a more appropriate name. I would
recommend event loop (which is the name I naturally would give it when
asked out of context) or I/O loop, which is what Tornado apparently
used. Dispatcher would not be my first choice.

FWIW, it's not a completely Twisted-specific term:

--Guido van Rossum (

From dreamingforward at  Sat Oct 13 02:58:29 2012
From: dreamingforward at (Mark Adam)
Date: Fri, 12 Oct 2012 19:58:29 -0500
Subject: [Python-ideas] The async API of the future: Reactors
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 12, 2012 at 3:43 PM, Guido van Rossum <guido at> wrote:
> On Thu, Oct 11, 2012 at 6:38 PM, Mark Adam <dreamingforward at> wrote:
>> Here's the thing:  the underlying O.S is always handling two major I/O
>> channels at any given time and it needs all its attention to do this:
>>  the GUI and one of the following (network, file) I/O.  You can
>> shuffle these around all you want, but somewhere the O.S. kernel is
>> going to have to be involved, which means either portability is
>> sacrificed or speed if one is going to pursue an abstract, unified
>> async API.
> I'm convinced that the OS has to get involved. I'm not convinced that
> it will get in the way of designing an abstract unified API -- however
> that API will have to be more complicated than the kind of event loop
> that *only* handles network I/O or the kind that *only* handles GUI
> events.

Yes, however, as suggested in my other message, there are three
desires: {"cross-platform (OS) portability", "speed", "unified API"},
but you can only pick two.

One of these has to be sacrificed because there are users for all of those.

I think such a decision must be "deferred()" to some
"Future(Python4000)" in order to succeed at making a "Grand Unified
Theory" for hardware/OS/python synchronization.

(For the record, I do think it is possible, and indeed that is exactly
what I'm working on.   To make it work will require a compelling,
unified object model, forwarding the art of Computer Science...)


From guido at  Sat Oct 13 03:00:42 2012
From: guido at (Guido van Rossum)
Date: Fri, 12 Oct 2012 18:00:42 -0700
Subject: [Python-ideas] The async API of the future: Reactors
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 12, 2012 at 5:58 PM, Mark Adam <dreamingforward at> wrote:
> On Fri, Oct 12, 2012 at 3:43 PM, Guido van Rossum <guido at> wrote:
>> On Thu, Oct 11, 2012 at 6:38 PM, Mark Adam <dreamingforward at> wrote:
>>> Here's the thing:  the underlying O.S is always handling two major I/O
>>> channels at any given time and it needs all its attention to do this:
>>>  the GUI and one of the following (network, file) I/O.  You can
>>> shuffle these around all you want, but somewhere the O.S. kernel is
>>> going to have to be involved, which means either portability is
>>> sacrificed or speed if one is going to pursue an abstract, unified
>>> async API.
>> I'm convinced that the OS has to get involved. I'm not convinced that
>> it will get in the way of designing an abstract unified API -- however
>> that API will have to be more complicated than the kind of event loop
>> that *only* handles network I/O or the kind that *only* handles GUI
>> events.
> Yes, however, as suggested in my other message, there are three
> desires: {"cross-platform (OS) portability", "speed", "unified API"},
> but you can only pick two.

Do you have any proof for that claim?

> One of these has to be sacrificed because there are users for all of those.
> I think such a decision must be "deferred()" to some
> "Future(Python4000)" in order to succeed at making a "Grand Unified
> Theory" for hardware/OS/python synchronization.
> (For the record, I do think it is possible, and indeed that is exactly
> what I'm working on.   To make it work will require a compelling,
> unified object model, forwarding the art of Computer Science...)

That would be the topic for a new thread, please.

--Guido van Rossum (

From mikegraham at  Sat Oct 13 03:06:01 2012
From: mikegraham at (Mike Graham)
Date: Fri, 12 Oct 2012 21:06:01 -0400
Subject: [Python-ideas] The async API of the future: Reactors
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 12, 2012 at 8:58 PM, Mark Adam <dreamingforward at> wrote:
>  there are three desires:
>  {"cross-platform (OS) portability", "speed", "unified API"},
> but you can only pick two.

There are many tradeoffs where this is the case, but this isn't one of
them. There are several systems that prove otherwise.


From trent at  Sat Oct 13 03:11:20 2012
From: trent at (Trent Nelson)
Date: Fri, 12 Oct 2012 21:11:20 -0400
Subject: [Python-ideas] The async API of the future: Reactors
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 12, 2012 at 03:49:36PM -0700, Guido van Rossum wrote:
> [Responding to yet another message in the original thread]
> On Thu, Oct 11, 2012 at 9:45 PM, Trent Nelson <trent at> wrote:
> >     It's the best way to do it.  There should really be a libevent-type
> >     library (libiocp?) that leverages IOCP where possible, and fakes it
> >     when not using a half-sync/half-async pattern with threads and epoll
> >     or kqueue on Linux and FreeBSD, falling back to processes and poll
> >     on everything else (NetBSD, OpenBSD and HP-UX (the former two not
> >     having robust-enough pthread implementations, the latter not having
> >     anything better than select or poll)).
> In which category does OS X fall?

    Oh, how'd I forget about OS X!  At the worst, it falls into the
    FreeBSD kqueue camp, having both a) kqueue and b) a performant
    pthread implementation.

    However, with the recent advent of Grand Central Dispatch, it's
    actually on par with Windows' IOCP+threadpool offerings, which is
    pretty cool.  (And apparently there are GCD ports in the works for
    Solaris, Linux and... Windows?!)

    Will reply to the other questions in a separate response.


From steve at  Sat Oct 13 04:41:18 2012
From: steve at (Steven D'Aprano)
Date: Sat, 13 Oct 2012 13:41:18 +1100
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On 13/10/12 07:27, Ram Rachum wrote:
> Hi everybody,
> Today a funny thought occurred to me. Ever since I've learned to program
> when I was a child, I've taken for granted that when programming, the sign
> used for multiplication is *. But now that I think about it, why? Now that
> we have Unicode, why not use · ?

25 or so years ago, I used to do some programming in Apple's Hypertalk
language, which accepted ÷ in place of / for division. The use of two
symbols for the same operation didn't cause any problem for users. But then
Apple had the advantage that there was a single, system-wide, highly
discoverable way of typing non-ASCII characters at the keyboard, and Apple
users tended to pride themselves for using them.

I'm not entirely sure about MIDDLE DOT though: especially in small font sizes,
it falls foul of the design principle:

"syntax should not look like a speck of dust on Tim's monitor"

(paraphrasing... can anyone locate the original quote?)

and may be too easily confused with FULL STOP. Another problem is that MIDDLE
DOT is currently valid in identifiers, so that a·b would count as a single
name. Fixing this would require some fairly heavy lifting (a period of
deprecation and warnings for any identifier using MIDDLE DOT) before
introducing it as an operator. So that's a lot of effort for very little gain.

If I were designing a language from scratch today, with full Unicode support
from the beginning, I would support a rich set of operators possibly even
including MIDDLE DOT and × MULTIPLICATION SIGN, and leave it up to the user
to use them wisely or not at all. But I don't think it would be appropriate
for Python to add them, at least not before Python 4: too much effort for too
little gain. Maybe in another ten years people will be less resistant to
Unicode operators.

> ·. People on Linux can type Alt-. .

For what it is worth, I'm using Linux and that does not work for me. I am
yet to find a decent method of entering non-ASCII characters.


From mwm at  Sat Oct 13 05:19:29 2012
From: mwm at (Mike Meyer)
Date: Fri, 12 Oct 2012 22:19:29 -0500
Subject: [Python-ideas] The async API of the future: Reactors
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, 12 Oct 2012 21:11:20 -0400
Trent Nelson <trent at> wrote:
>     However, with the recent advent of Grand Central Dispatch, it's
>     actually on par with Windows' IOCP+threadpool offerings, which is
>     pretty cool.  (And apparently there are GCD ports in the works for
>     Solaris, Linux and... Windows?!)

The port already exists for FreeBSD. As of 8.1, the kernel has
enhanced kqueue support for it, and devel/libdispatch installs the GCD
code. I'd be surprised if the other *BSD's haven't picked it up yet.

All of which makes me think that an async library based on GCD and
maybe IOCP for Windows if it's not available there would be reasonably

A standard Python library that made this as nice to use as it is from
MacRuby would be a good thing. You can find jkh (ex FreeBSD RE, now
running the OS X systems group for Apple) discussing Python and GCD here:

Mike Meyer <mwm at>
Independent Software developer/SCM consultant, email for more information.

O< ascii ribbon campaign - stop html mail -

From dreamingforward at  Sat Oct 13 05:29:46 2012
From: dreamingforward at (Mark Adam)
Date: Fri, 12 Oct 2012 22:29:46 -0500
Subject: [Python-ideas] The async API of the future: Reactors
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 12, 2012 at 8:06 PM, Mike Graham <mikegraham at> wrote:
> On Fri, Oct 12, 2012 at 8:58 PM, Mark Adam <dreamingforward at> wrote:
>>  there are three desires:
>>  {"cross-platform (OS) portability", "speed", "unified API"},
>> but you can only pick two.
> There are many tradeoffs where this is the case, but this isn't one of
> them. There are several systems that prove otherwise.

...several **systems**?  I mean, you can accomplish such a task on a
*particular* O.S. but I don't know where this is the case across
*several* systems (Unix, Mac, and Windows).  I would like to know of
an example, if you have one?


From bruce at  Sat Oct 13 06:20:30 2012
From: bruce at (Bruce Leban)
Date: Fri, 12 Oct 2012 21:20:30 -0700
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

Well, I learned x as a multiplication symbol long before I learned either ·
or *, and in many fonts you can barely see the middle dot. Is there a good
reason we can't just write foo x bar instead of foo * bar? If that's
confusing we could use × instead. No one would ever confuse × and x.

Or for that matter how about (~R∊R∘.×R)/R←1↓⍳R

Seriously: learning that * means multiplication is a very small thing. You
also need to learn what /, // and % do, and the difference between 'and'
and &, and between =, ==, != and /=.

--- Bruce

On Fri, Oct 12, 2012 at 7:41 PM, Steven D'Aprano <steve at>wrote:

> On 13/10/12 07:27, Ram Rachum wrote:
>> Hi everybody,
>> Today a funny thought occurred to me. Ever since I've learned to program
>> when I was a child, I've taken for granted that when programming, the sign
>> used for multiplication is *. But now that I think about it, why? Now that
>> we have Unicode, why not use · ?
> 25 or so years ago, I used to do some programming in Apple's Hypertalk
> language, which accepted ÷ in place of / for division. The use of two
> symbols for the same operation didn't cause any problem for users. But then
> Apple had the advantage that there was a single, system-wide, highly
> discoverable way of typing non-ASCII characters at the keyboard, and Apple
> users tended to pride themselves for using them.
> I'm not entirely sure about MIDDLE DOT though: especially in small font
> sizes,
> it falls foul of the design principle:
> "syntax should not look like a speck of dust on Tim's monitor"
> (paraphrasing... can anyone locate the original quote?)
> and may be too easily confused with FULL STOP. Another problem is that
> MIDDLE DOT is currently valid in identifiers, so that a·b would count as a single
> name. Fixing this would require some fairly heavy lifting (a period of
> deprecation and warnings for any identifier using MIDDLE DOT) before
> introducing it as an operator. So that's a lot of effort for very little
> gain.
> If I were designing a language from scratch today, with full Unicode
> support
> from the beginning, I would support a rich set of operators possibly even
> including MIDDLE DOT and × MULTIPLICATION SIGN, and leave it up to the user
> to use them wisely or not at all. But I don't think it would be appropriate
> for Python to add them, at least not before Python 4: too much effort for
> too
> little gain. Maybe in another ten years people will be less resistant to
> Unicode operators.
> [...]
>  ·. People on Linux can type Alt-. .
> For what it is worth, I'm using Linux and that does not work for me. I am
> yet to find a decent method of entering non-ASCII characters.
> --
> Steven

From tjreedy at  Sat Oct 13 06:22:43 2012
From: tjreedy at (Terry Reedy)
Date: Sat, 13 Oct 2012 00:22:43 -0400
Subject: [Python-ideas] The async API of the future: Reactors
In-Reply-To: <>
References: <>
Message-ID: <k5aqb7$vik$>

On 10/12/2012 8:26 PM, Guido van Rossum wrote:
> On Fri, Oct 12, 2012 at 5:01 PM, Greg Ewing <greg.ewing at> wrote:

>> Can we please use a better term than "reactor" for this?
>> Its meaning is only obvious to someone familiar with Twisted.
>> Not being such a person, it's taken me a while to figure out
>> from this discussion that it refers to the central object
>> implementing the event loop, and not one of the user-supplied
>> objects that could equally well be described as "reacting"
>> to events.
>> Something like "dispatcher" would be clearer, IMO.
> Sorry about that. I'm afraid it's too late for this thread's subject
> line, but I will try to make sure that if and when this makes it into
> the standard library it'll have a more appropriate name. I would
> recommend event loop (which is the name I naturally would give it when
> asked out of context) or I/O loop, which is what Tornado apparently
> used. Dispatcher would not be my first choice.
> FWIW, it's not a completely Twisted-specific term:

Thanks for the clarification. Reactors react to events within an event 
loop* by dispatching them to handlers. Correct?

*Iteration rather than recursion is required because they continue the 
cycle indefinitely.

I am still fuzzy on edge-triggered versus level triggered in this 
context, as opposed to electronics.

Terry Jan Reedy

From jeanpierreda at  Sat Oct 13 06:44:35 2012
From: jeanpierreda at (Devin Jeanpierre)
Date: Sat, 13 Oct 2012 00:44:35 -0400
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 12, 2012 at 10:41 PM, Steven D'Aprano <steve at> wrote:
> If I were designing a language from scratch today, with full Unicode support
> from the beginning, I would support a rich set of operators possibly even
> including MIDDLE DOT and × MULTIPLICATION SIGN, and leave it up to the user
> to use them wisely or not at all. But I don't think it would be appropriate
> for Python to add them, at least not before Python 4: too much effort for
> too
> little gain. Maybe in another ten years people will be less resistant to
> Unicode operators.

Python has cleverly left the $ symbol unused.

We can use it as a quasiquote to embed executable TeX.

    for x in xrange($b \cdot \sum_{i=1}^n \frac{x^n}{n!}$):

No need to wait for that new language, we can have a rich set of math
operators today!

-- Devin

From glyph at  Sat Oct 13 06:46:20 2012
From: glyph at (Glyph)
Date: Fri, 12 Oct 2012 21:46:20 -0700
Subject: [Python-ideas] re-implementing Twisted for fun and profit
Message-ID: <>

There has been a lot written on this list about asynchronous, microthreaded and event-driven I/O in the last couple of days.  There's too much for me to try to respond to all at once, but I would very much like to (possibly re-)introduce one very important point into the discussion.

Would everyone interested in this please please please read <> several times?  Especially this section: <>.  If it is not clear, please ask questions about it and I will try to needle someone qualified into improving the explanation.

I am bringing this up because I've seen a significant amount of discussion of level-triggering versus edge-triggering.  Once you have properly separated out transport logic from application implementation, triggering style is an irrelevant, private implementation detail of the networking layer.  Whether the operating system tells Python "you must call recv() once now" or "you must call recv() until I tell you to stop" should not matter to the application if the application is just getting passed the results of recv() which has already been called.  Since not all I/O libraries actually have a recv() to call, you shouldn't have the application have to call it.  This is perhaps the central design error of asyncore.

If it needs a name, I suppose I'd call my preferred style "event triggering".
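
A minimal sketch of the separation described above, using only the standard selectors and socket modules. The names Collector, data_received, connection_lost and pump_until_eof are illustrative assumptions, not Twisted's actual API; the point is only that the networking layer calls recv() and the application never does.

```python
import selectors
import socket


class Collector:
    """Application side (hypothetical): it only ever sees bytes."""

    def __init__(self):
        self.chunks = []
        self.closed = False

    def data_received(self, data):
        self.chunks.append(data)

    def connection_lost(self):
        self.closed = True


def pump_until_eof(sock, protocol):
    """Networking side: waits for readiness, calls recv(), passes results on."""
    sel = selectors.DefaultSelector()
    sel.register(sock, selectors.EVENT_READ)
    try:
        while True:
            sel.select()              # how readiness is reported stays private
            data = sock.recv(4096)
            if not data:              # an empty read means the peer closed
                protocol.connection_lost()
                return
            protocol.data_received(data)
    finally:
        sel.close()


def demo_transport_protocol_split():
    ours, theirs = socket.socketpair()
    try:
        theirs.sendall(b"ping")
        theirs.close()
        app = Collector()
        pump_until_eof(ours, app)
        return b"".join(app.chunks), app.closed
    finally:
        ours.close()
```

Swapping the selector for a different readiness mechanism would change pump_until_eof() only; Collector would not need to know.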

Also, I would like to remind all participants that microthreading, request/response abstraction (i.e. Deferreds, Futures), generator coroutines and a common API for network I/O are all very different tasks and do not need to be accomplished all at once.  If you try to build something that does all of this stuff, you get most of Twisted core plus half of Stackless all at once, which is a bit much for the stdlib to bite off in one chunk.


From ben at  Sat Oct 13 06:52:19 2012
From: ben at (Ben Darnell)
Date: Fri, 12 Oct 2012 21:52:19 -0700
Subject: [Python-ideas] The async API of the future: Reactors
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 12, 2012 at 11:13 AM, Guido van Rossum <guido at> wrote:
> [This is the first spin-off thread from "asyncore: included batteries
> don't fit"]
> On Thu, Oct 11, 2012 at 5:57 PM, Ben Darnell <ben at> wrote:
>> On Thu, Oct 11, 2012 at 2:18 PM, Guido van Rossum <guido at> wrote:
>>>> Re base reactor interface: drawing maximally from the lessons learned in
>>>> twisted, I think IReactorCore (start, stop, etc), IReactorTime (call later,
>>>> etc), asynchronous-looking name lookup, fd handling are the important parts.
>>> That actually sounds more concrete than I'd like a reactor interface
>>> to be. In the App Engine world, there is a definite need for a
>>> reactor, but it cannot talk about file descriptors at all -- all I/O
>>> is defined in terms of RPC operations which have their own (several
>>> layers of) async management but still need to be plugged in to user
>>> code that might want to benefit from other reactor functionality such
>>> as scheduling and placing a call at a certain moment in the future.
>> So are you thinking of something like
>> reactor.add_event_listener(event_type, event_params, func)?  One thing
>> to keep in mind is that file descriptors are somewhat special (at
>> least in a level-triggered event loop), because of the way the event
>> will keep firing until the socket buffer is drained or the event is
>> unregistered.  I'd be inclined to keep file descriptors in the
>> interface even if they just raise an error on app engine, since
>> they're fairly fundamental to the (unixy) event loop.  On the other
>> hand, I don't have any experience with event loops outside the
>> unix/network world so I don't know what other systems might need for
>> their event loops.
> Hmm... This is definitely an interesting issue. I'm tempted to believe
> that it is *possible* to change every level-triggered setup into an
> edge-triggered setup by using an explicit loop -- but I'm not saying
> it is a good idea. In practice I think we need to support both equally
> well, so that the *app* can decide which paradigm to use. E.g. if I
> were to implement an HTTP server, I might use level-triggered for the
> "accept" call on the listening socket, but edge-triggered for
> everything else. OTOH someone else might prefer a buffered stream
> abstraction that just keeps filling its read buffer (and draining its
> write buffer) using level-triggered callbacks, at least up to a
> certain buffer size -- we have to be robust here and make it
> impossible for an evil client to fill up all our memory without our
> approval!

First of all, to clear up the terminology, edge-triggered actually has
a specific meaning in this context that is separate from the question
of whether callbacks are used more than once. The edge- vs
level-triggered question is moot with one-shot callbacks, but when
you're reusing callbacks in edge-triggered mode you won't get a second
call until you've drained the socket buffer and then it becomes
readable again.  This turns out to be helpful for hybrid
event/threaded systems, since the network thread may go into the next
iteration of its loop while the worker thread is still consuming the
data from a previous event.

You can't always emulate edge-triggered behavior since it needs
knowledge of internal socket buffers (epoll has an edge-triggered mode
and I think kqueue does too, but you can't get edge-triggered behavior
if you're falling back to select()).  However, you can easily get
one-shot callbacks from an event loop with persistent callbacks just
by unregistering the callback once it has received an event.  This has
a performance cost, though - in tornado we try to avoid unnecessary
unregister/register pairs.
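
To make the level-triggered case and the one-shot emulation concrete, here is a small sketch using the standard selectors module (which did not exist when this thread was written, so treat it as a modern illustration rather than what tornado does). The function name is an assumption. True edge-triggered behavior needs epoll's EPOLLET flag or similar and is not shown.

```python
import selectors
import socket


def demo_level_triggered_and_one_shot():
    sel = selectors.DefaultSelector()
    reader, writer = socket.socketpair()
    try:
        writer.send(b"hello")
        sel.register(reader, selectors.EVENT_READ)
        # Also watch the other end, which never receives data, so the
        # selector is never asked to wait on an empty set.
        sel.register(writer, selectors.EVENT_READ)

        first = len(sel.select(timeout=1))
        # Level-triggered: nothing was read, so the event fires again.
        second = len(sel.select(timeout=0))

        # One-shot emulation: unregister once the event has been delivered.
        sel.unregister(reader)
        third = len(sel.select(timeout=0))
        return first, second, third
    finally:
        sel.close()
        reader.close()
        writer.close()
```

The unregister call is the extra work referred to above: a loop that re-registers after every event pays for an unregister/register pair each time.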

> I'm not at all familiar with the Twisted reactor interface. My own
> design would be along the following lines:
> - There's an abstract Reactor class and an abstract Async I/O object
> class. To get a reactor to call you back, you must give it an I/O
> object, a callback, and maybe some more stuff. (I have gone back and
> forth on this, but I like passing optional args for the callback, rather
> than requiring
> lambdas to create closures.) Note that the callback is *not* a
> designated method on the I/O object! In order to distinguish between
> edge-triggered and level-triggered, you just use a different reactor
> method. There could also be a reactor method to schedule a "bare"
> callback, either after some delay, or immediately (maybe with a given
> priority), although such functionality could also be implemented
> through magic I/O objects.

One reason to have a distinct method for running a bare callback is
that you need to have some thread-safe entry point, but you otherwise
don't really want locking on all the internal methods.  Tornado's
IOLoop.add_callback and Twisted's Reactor.callFromThread can be used
to run code in the IOLoop's thread (which can then call the other
IOLoop methods).
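
A rough sketch of why a single thread-safe entry point is enough. TinyLoop is a hypothetical toy, not Tornado's IOLoop or Twisted's reactor: only add_callback() touches shared state (a queue.Queue), and everything else runs in the loop's own thread without locks.

```python
import queue
import threading


class TinyLoop:
    """Toy loop (hypothetical): add_callback is the only thread-safe method."""

    def __init__(self):
        self._callbacks = queue.Queue()
        self._running = False

    def add_callback(self, func, *args):
        # Safe from any thread because queue.Queue does its own locking.
        self._callbacks.put((func, args))

    def stop(self):
        # Not thread-safe: must itself be scheduled via add_callback.
        self._running = False

    def run(self):
        self._running = True
        while self._running:
            func, args = self._callbacks.get()
            func(*args)


def demo_add_callback():
    loop = TinyLoop()
    loop_thread = threading.current_thread()
    seen = []

    def record(label):
        seen.append((label, threading.current_thread() is loop_thread))

    def worker():
        loop.add_callback(record, "from worker")
        loop.add_callback(loop.stop)

    thread = threading.Thread(target=worker)
    thread.start()
    loop.run()
    thread.join()
    return seen
```

The callback is submitted from the worker thread but executes in the thread that called run(), which is the property the text describes.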

We also have distinct methods for running a callback after a timeout,
although if you had a variant of add_handler that didn't require a
subsequent call to remove_handler you could probably do timeouts using
a magical IO object. (an additional subtlety for the time-based
methods is how time is computed.  I recently added support in tornado
to optionally use time.monotonic instead of time.time)
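
As an illustration of the time-computation subtlety, here is a sketch of a timer queue whose clock is injectable and defaults to time.monotonic. TimerQueue, call_later and run_due are assumed names for this example and are not tornado's API.

```python
import heapq
import itertools
import time


class TimerQueue:
    """Toy timeout scheduler (hypothetical) with a pluggable clock."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._heap = []
        # Tie-breaker so two entries with equal deadlines never compare callbacks.
        self._counter = itertools.count()

    def call_later(self, delay, callback, *args):
        deadline = self._clock() + delay
        heapq.heappush(self._heap, (deadline, next(self._counter), callback, args))

    def run_due(self):
        """Run every callback whose deadline has passed; return how many ran."""
        now = self._clock()
        ran = 0
        while self._heap and self._heap[0][0] <= now:
            _, _, callback, args = heapq.heappop(self._heap)
            callback(*args)
            ran += 1
        return ran


def demo_timers():
    fake_now = [100.0]
    timers = TimerQueue(clock=lambda: fake_now[0])
    fired = []
    timers.call_later(5, fired.append, "late")
    timers.call_later(1, fired.append, "early")
    timers.run_due()            # nothing is due yet
    fake_now[0] += 2
    timers.run_due()
    after_two = list(fired)
    fake_now[0] += 10
    timers.run_due()
    return after_two, fired
```

With time.time as the clock, a wall-clock adjustment could make deadlines fire early or late; a monotonic clock avoids that.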

> - In systems supporting file descriptors, there's a reactor
> implementation that knows how to use select/poll/etc., and there are
> concrete I/O object classes that wrap file descriptors. On Windows,
> those would only be socket file descriptors. On Unix, any file
> descriptor would do. To create such an I/O object you would use a
> platform-specific factory. There would be specialized factories to
> create e.g. listening sockets, connections, files, pipes, and so on.

Jython is another interesting case - it has a select() function that
doesn't take integer file descriptors, just the opaque objects
returned by socket.fileno().

While it's convenient to have higher-level constructors for various
specialized types, I'd like to emphasize that having the low-level
interface is important for interoperability.  Tornado doesn't know
whether the file descriptors are listening sockets, connected sockets,
or pipes, so we'd just have to pass in a file descriptor with no other

> - In systems like App Engine that don't support async I/O on file
> descriptors at all, the constructors for creating I/O objects for disk
> files and connection sockets would comply with the interface but fake
> out almost everything (just like today, using httplib or httplib2 on
> App Engine works by adapting them to a "urlfetch" RPC request).

Why would you be allowed to make IO objects for sockets that don't
work?  I would expect that to just raise an exception.  On app engine
RPCs would be the only supported async I/O objects (and timers, if
those are implemented as magic I/O objects), and they're not
implemented in terms of sockets or files.


From greg.ewing at  Sat Oct 13 07:05:53 2012
From: greg.ewing at (Greg Ewing)
Date: Sat, 13 Oct 2012 18:05:53 +1300
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:

> But the fact remains that you can't completely hide these yields --
> the best you can do is replace them with a single yield-from.

Yes, as things stand, a call to a sub-generator is always
going to look different from an ordinary call, all the way
up the call chain. I regard that as a wart remaining to be
fixed, although opinions seem to differ.

I do think it's a bit unfortunate that 'yield from' contains
the word 'yield', though, since in this context it's best
thought of as a kind of function call rather than a kind
of yield.

>>>This seems to be begging to be collapsed into a single line, e.g.
>>>      data = yield sock.recv_async(1024)
>>I'm not sure how you're imagining that would work, but whatever
>>it is, it's wrong -- that just doesn't make sense.
> It makes a lot of sense in a world using
> Futures and a Future-aware trampoline/scheduler, instead of yield-from
> and bare generators. I can see however that you don't like it in the
> yield-from world you're envisioning

I don't like it because, to my mind, Futures et al are kludgy
workarounds for not having something like yield-from. Now that
we do, we shouldn't need them any more.

I can see the desirability of being able to interoperate with
existing code that uses them, but I'm not convinced that building
awareness of them into the guts of the scheduler is the best
way to go about it.

Why Futures in particular? What if someone wants to use Deferreds
instead, or some other similar thing? At some point you need
to build adapters. I'd rather see Futures treated on an equal
footing with the others, and dealt with by building on the
primitive facilities provided by the scheduler.

> But the only use for send() on a generator is when using it as a
> coroutine for a concurrent tasks system... And you're claiming, it seems,
> that you prefer yield-from for concurrent tasks.

The particular technique of using send() to supply a return
value for a simulated sub-generator call is made obsolete
by yield-from.

I can't rule out the possibility that there may be other
uses for send() in a concurrent task system. I just haven't
found the need for it in any of the examples I've developed
so far.

> I feel that "value = yield <something that returns a
> Future>" is quite a good paradigm,

I feel that it shouldn't be *necessary* to yield any kind
of special object in order to suspend a task; just a simple
'yield' should be sufficient.

It might make sense to allow this as an *option* for the
purpose of interoperating with existing async code. But
I would much rather the public API for this was something like

    value = yield from wait_for_future(a_future)

leaving it up to the implementation whether this is achieved
by yielding the Future or by some other means. Then we can
also have wait_for_deferred(), etc., without giving any one
of them special status.
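
One possible shape for such an adapter, as a sketch only. It assumes concurrent.futures.Future and picks the "yield the Future" strategy, which the text explicitly leaves up to the implementation; the hand-driven demo stands in for a scheduler.

```python
from concurrent.futures import Future


def wait_for_future(future):
    """Hypothetical adapter: suspend until the Future is done, return its result."""
    while not future.done():
        yield future              # one of several ways an implementation could wait
    return future.result()


def task(a_future, log):
    value = yield from wait_for_future(a_future)
    log.append(value)
    return value * 2


def demo_wait_for_future():
    a_future = Future()
    log = []
    running = task(a_future, log)
    yielded = next(running)       # the task suspends on the pending Future
    assert yielded is a_future
    a_future.set_result(21)       # a real scheduler would do this from an event
    try:
        next(running)
    except StopIteration as stop:
        return stop.value, log
```

Because the task only ever writes "yield from wait_for_future(...)", a wait_for_deferred() with a different internal strategy could be added without changing task code.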

> One is what to do with operations directly
> implemented in C. It would be horrible to require C to create a fake
> generator. Fortunately an
> iterator whose final __next__() raises StopIteration(<value>) works in
> the latest Python 3.3

Well, such an iterator *is* a "fake generator" in all the
respects that the scheduler cares about. Especially if the
scheduler doesn't rely on send(), so your C object doesn't
have to implement a send() method. :-)
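
A small sketch of that point in pure Python: a plain iterator class (standing in for a C-implemented operation; the class name is made up) whose final __next__() raises StopIteration(value) can be the target of yield from, with no send() method at all.

```python
class StepsThenValue:
    """Plain iterator, not a generator: suspends `steps` times, then 'returns'."""

    def __init__(self, steps, value):
        self._steps = steps
        self._value = value

    def __iter__(self):
        return self

    def __next__(self):
        if self._steps == 0:
            # The value carried by StopIteration becomes the result of yield from.
            raise StopIteration(self._value)
        self._steps -= 1
        return None


def caller():
    result = yield from StepsThenValue(2, "payload")
    return result


def demo_iterator_result():
    gen = caller()
    suspensions = 0
    while True:
        try:
            next(gen)             # the driver uses next() only, never send()
        except StopIteration as stop:
            return suspensions, stop.value
        suspensions += 1
```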

> Well, I'm talking about a decorator that you *always* apply, and which
> does nothing (or very little) when wrapping a generator, but adds
> generator behavior when wrapping a non-generator function.

As long as it's optional, I wouldn't object to the existence
of such a decorator, although I would probably choose not to
use it most of the time.

I would object if it was *required* to make things work
properly, because I would worry that this was a symptom of
unnecessary complication and inefficiency in the underlying

> (6) Spawning off multiple async subtasks
> Futures:
>   f1 = subtask1(args1)  # Note: no yield!!!
>   f2 = subtask2(args2)
>   res1, res2 = yield f1, f2
> Yield-from:
>   ??????????
> *** Greg, can you come up with a good idiom to spell concurrency at
> this level? Your example only has concurrency in the philosophers
> example, but it appears to interact directly with the scheduler, and
> the philosophers don't return values. ***

I don't regard the need to interact directly with the scheduler
as a problem. That's because in the world I envisage, there would
only be *one* scheduler, for much the same reason that there can
really only be one async event handling loop in any given program.
It would be part of the standard library and have a well-known
API that everyone uses.

If you don't want things to be that way, then maybe this is a
good use for yielding things to the scheduler. Yielding a generator
could mean "spawn this as a concurrent task".

You could go further and say that yielding a tuple of generators
means to spawn them all concurrently, wait for them all to
complete and send back a tuple of the results. The yield-from
code would then look pretty much the same as the futures code.

However, I'm inclined to think that this is too much functionality
to build directly into the scheduler, and that it would be better
provided by a class or function that builds on more primitive
facilities. So it would look something like

    task1 = subtask1(args1)
    task2 = subtask2(args2)
    res1, res2 = yield from par(task1, task2)

where the implementation of par() is left as an exercise for
the reader.
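
One way the exercise could be attempted, as a sketch under a strong assumption: every subtask suspends with a bare yield, as in the model described earlier in this message. This par() simply steps its subtasks round-robin; a real version would cooperate with the scheduler instead of busy-stepping. The subtask() and run() helpers are illustrative only.

```python
def par(*tasks):
    """Run generator-based tasks concurrently; return a tuple of their results."""
    results = [None] * len(tasks)
    pending = dict(enumerate(tasks))
    while pending:
        for index, task in list(pending.items()):
            try:
                next(task)                   # give each task one step
            except StopIteration as stop:
                results[index] = stop.value
                del pending[index]
        if pending:
            yield                            # let the outer scheduler run others
    return tuple(results)


def subtask(name, steps, log):
    for i in range(steps):
        log.append((name, i))
        yield
    return name.upper()


def main(log):
    task1 = subtask("a", 2, log)
    task2 = subtask("b", 1, log)
    res1, res2 = yield from par(task1, task2)
    return res1, res2


def run(gen):
    """Trivial driver standing in for the scheduler."""
    while True:
        try:
            next(gen)
        except StopIteration as stop:
            return stop.value
```

The log shows the two subtasks interleaving, and the results come back in argument order regardless of which finished first.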

> (7) Checking whether an operation is already complete
> Futures:
>   if f.done(): ...

I'm inclined to think that this is not something the
scheduler needs to be directly concerned with. If it's
important for one task to know when another task is completed,
it's up to those tasks to agree on a way of communicating
that information between them.

Although... is there a way to non-destructively test whether
a generator is exhausted? If so, this could easily be provided
as a scheduler primitive.
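
For what it's worth, the standard library's inspect.getgeneratorstate() (available since Python 3.2) answers this without advancing the generator. A short sketch, where is_finished() is an assumed helper name:

```python
import inspect


def is_finished(gen):
    """Non-destructive check: True once the generator has run to completion."""
    return inspect.getgeneratorstate(gen) == inspect.GEN_CLOSED


def demo_generator_state():
    def task():
        yield
        return "done"

    t = task()
    states = [inspect.getgeneratorstate(t)]
    next(t)
    states.append(inspect.getgeneratorstate(t))
    try:
        next(t)
    except StopIteration:
        pass
    states.append(inspect.getgeneratorstate(t))
    return states, is_finished(t)
```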

> (8) Getting the result of an operation multiple times
> Futures:
>   f = async_op(args)
>   # squirrel away a reference to f somewhere else
>   r = yield f
>   # ... later, elsewhere
>   r = f.result()

Is this really a big deal? What's wrong with having to store
the return value away somewhere if you want to use it
multiple times?

> (9) Canceling an operation
> Futures:
>   f.cancel()

This would be another scheduler primitive.


This would remove the task from the ready list or whatever
queue it's blocked on, and probably throw an exception into
it to give it a chance to clean up.
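
A sketch of what such a primitive might do, assuming tasks are plain generators held in a ready list. TaskCancelled and cancel() are hypothetical names; the mechanism is the standard generator.throw().

```python
class TaskCancelled(Exception):
    """Hypothetical exception thrown into a task that is being cancelled."""


def worker(log):
    try:
        while True:
            log.append("working")
            yield
    except TaskCancelled:
        log.append("cleaning up")     # the task's chance to release resources


def cancel(task, ready):
    """Remove the task from the ready list and throw TaskCancelled into it."""
    if task in ready:
        ready.remove(task)
    try:
        task.throw(TaskCancelled())
    except (TaskCancelled, StopIteration):
        # Either the task did not catch the exception or it finished cleanly.
        pass


def demo_cancel():
    log = []
    task = worker(log)
    ready = [task]
    next(task)                        # the task has started and is suspended
    cancel(task, ready)
    return log, ready
```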

> (10) Registering additional callbacks
> Futures:
>   f.add_done_callback(callback)

Another candidate for a higher-level facility, I think.
The API might look something like

    cbt = task_with_callbacks(task)
    yield from

I may have a go at coming up with implementations for some of
these things and send them in later posts.


From ben at  Sat Oct 13 07:26:46 2012
From: ben at (Ben Darnell)
Date: Fri, 12 Oct 2012 22:26:46 -0700
Subject: [Python-ideas] The async API of the future: Twisted and
In-Reply-To: <k5a9nb$l3j$>
References: <>
Message-ID: <>

On Fri, Oct 12, 2012 at 4:39 PM, Richard Oudkerk <shibturn at> wrote:
>>   @task
>>   def view_paste_async(request, filekey):
>>      # Create Futures -- no yields!
>>      f1 = Pastes.objects.get_async(key=filekey) # This won't raise
>>      f2 = loader.get_template_async('pastebin/error.html')
>>      f3 = loader.get_template_async('pastebin/paste.html')
>>      try:
>>          fileinfo= yield f1
>>      except DoesNotExist:
>>          t = yield f2
>>          return HttpResponse(t.render(Context(dict(error='File does not exist'))))
>>      f = yield open_async(fileinfo.filename)
>>      fcontents = yield f.read_async()
>>      t = yield f3
>>      return HttpResponse(t.render(Context(dict(file=fcontents))))
> So would the futures be registered with the reactor as soon as they are
> created, or only when they are yielded?  I can't see how there can be any
> "concurrency" if they don't start till they are yielded.  It would be like
> doing

The Futures are not what is doing the work here, they just hold the
result.  In this example the get_async() functions register something
with the reactor when they are called.  When that "something" is done
(or perhaps after several "somethings" chained together), get_async
will set a result on its Future.
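
A toy illustration of that division of labour. ToyLoop and this get_async() are made up for the example and take the loop explicitly; the Future is the standard concurrent.futures.Future and does no work itself.

```python
from collections import deque
from concurrent.futures import Future


class ToyLoop:
    """Minimal stand-in for a reactor: a queue of callbacks."""

    def __init__(self):
        self._ready = deque()

    def call_soon(self, func, *args):
        self._ready.append((func, args))

    def run_once(self):
        for _ in range(len(self._ready)):
            func, args = self._ready.popleft()
            func(*args)


def get_async(loop, key):
    """Hypothetical async getter: registers work now, returns an empty Future."""
    future = Future()
    # The work is registered at call time, not when the Future is yielded.
    loop.call_soon(future.set_result, "value for " + key)
    return future


def demo_get_async():
    loop = ToyLoop()
    f1 = get_async(loop, "a")
    f2 = get_async(loop, "b")
    before = (f1.done(), f2.done())
    loop.run_once()                   # both operations progress in the same pass
    return before, (f1.result(), f2.result())
```

Both operations are already queued before either Future is waited on, which is where the concurrency in the quoted example comes from.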

> But if the futures are registered immediately with the reactor then does
> that mean there is a singleton reactor?  That seems rather inflexible.

In most event-driven systems there is a global (or thread-local) event
loop, but it's also possible to pass one in explicitly to get_async().


From greg.ewing at  Sat Oct 13 07:37:31 2012
From: greg.ewing at (Greg Ewing)
Date: Sat, 13 Oct 2012 18:37:31 +1300
Subject: [Python-ideas] Is there a good reason to use *
	for	multiplication?
In-Reply-To: <>
References: <>
Message-ID: <>

Ram Rachum wrote:
> I could say that for newbies it's one small 
> confusion that could be removed from the language. You and I have been 
> programming for a long time so we take it for granted that * means 
> multiplication, but for any other person that's just another 
> weird idiosyncrasy that further alienates programming.

Do you have any evidence that a substantial number of
beginners are confused by * for multiplication, or that
they have trouble remembering what it means once they've
been told?

If you do, is there further evidence that they would
find a dot to be any clearer?

The use of a raised dot to indicate multiplication of
numbers is actually quite rare even in mathematics, and I
would not expect anyone without a mathematical background
to even be aware of it.

In primary school we're taught that 'x' means multiplication.
Later when we come to algebra, we're taught not to use
any symbol at all, just write things next to each other.
A dot is only used in rare cases where there would
otherwise be ambiguity -- and even then it's often
preferred to parenthesise things instead.

And don't forget there's great potential for confusion
with the decimal point.


From solipsis at  Sat Oct 13 08:14:45 2012
From: solipsis at (Antoine Pitrou)
Date: Sat, 13 Oct 2012 08:14:45 +0200
Subject: [Python-ideas] The async API of the future: Twisted and
References: <>
Message-ID: <>

On Fri, 12 Oct 2012 15:11:54 -0700
Guido van Rossum <guido at> wrote:
> > 2. Method dispatch callbacks:
> >
> >     Similar to the above, the reactor or somebody has a handle on your
> > object, and calls methods that you've defined when events happen
> >     e.g. IProtocol's dataReceived method
> While I'm sure it's expedient and captures certain common patterns
> well, I like this the least of all -- calling fixed methods on an
> object sounds like a step back; it smells of the old Java way (before
> it had some equivalent of anonymous functions), and of asyncore, which
> (nearly) everybody agrees is kind of bad due to its insistence that
> you subclass its classes. (Notice how subclassing as the prevalent
> approach to structuring your code has gotten into a lot of discredit
> since 1996.)

But how would you write a dataReceived equivalent then? Would you have
a "task" looping on a read() call, e.g.

def my_protocol_main_loop(conn):
    while <some_condition>:
        try:
            data = yield
        except ConnectionError:
            ...

I'm not sure I understand the problem with subclassing. It works fine
in Twisted. Even in Python 3 we don't shy away from subclassing, for
example the IO stack is based on subclassing RawIOBase, BufferedIOBase,



Software development and contracting:

From greg.ewing at  Sat Oct 13 08:44:48 2012
From: greg.ewing at (Greg Ewing)
Date: Sat, 13 Oct 2012 19:44:48 +1300
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
	<k59ff3$i5i$> <>
Message-ID: <>

Joshua Landau wrote:

>     '.j/homeo/homes/homeh/homeu/homeacj/homeo/homes/homeh/homeu/homeaoj/homeo/homes/homeh/homeu/homeanj/homeo/homes/homeh/homeu/homeafj/homeo/homes/homeh/homeu/homeaij/homeo/homes/homeh/homeu/homeag'

Homeo, Homeo, wherefore path thou Homeo?


From ncoghlan at  Sat Oct 13 09:41:29 2012
From: ncoghlan at (Nick Coghlan)
Date: Sat, 13 Oct 2012 17:41:29 +1000
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<k4pr17$i94$> <>
	<> <k4q3o6$m9q$>
Message-ID: <>

On Sat, Oct 13, 2012 at 7:00 AM, Ethan Furman <ethan at> wrote:
> My point about the Path(...(str(...))) sandwich still applies, though, for
> every function that isn't built in to Path.  :)

It's the same situation we were in with the design of the new
ipaddress module, and the answer is the same: implicit coercion just
creates way too many opportunities for errors to pass silently. We had
to create a backwards incompatible version of the language to
eliminate the semantic confusion between binary data and text data,
we're not going to introduce a similar confusion between arbitrary
text strings and objects that actually behave like filesystem paths.

str has a *big* API, and much of it doesn't make any sense in the
particular case of path objects. In particular, path objects shouldn't
be iterable, because it isn't clear what iteration should mean: it
could be path segments, it could be parent paths, or it could be
directory contents. It definitely *shouldn't* be individual
characters, but that's what we would get if it inherited from strings.

I do like the idea of introducing a "filesystem path" protocol though
(and Antoine's already considering that), which would give us the
implicit interoperability without the inheritance of an overbroad API.
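
For illustration only, here is how these points look with the pathlib module that eventually shipped in the standard library (Python 3.4) and the `os.fspath()` protocol added in Python 3.6. Both postdate this thread and may differ from the draft under discussion. Each candidate meaning of "iteration" gets its own explicit attribute, the path object itself is neither a `str` nor iterable, and the protocol gives the string form on request.

```python
import os
from pathlib import PurePosixPath

p = PurePosixPath("/usr/lib/python3")

# Each possible meaning of "iterate over a path" has its own spelling.
segments = p.parts                        # path segments
parents = [str(a) for a in p.parents]     # parent paths, nearest first

# The path object itself is not iterable and is not a str subclass.
try:
    iter(p)
    iterable = True
except TypeError:
    iterable = False
is_string = isinstance(p, str)

# The filesystem path protocol returns the text form explicitly.
as_text = os.fspath(p)
```

Directory contents, the third candidate meaning, are spelled `iterdir()` on concrete `Path` objects; that call is left out here because it touches the real filesystem.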

Something else I've been thinking about is that it still feels wrong
to me to be making the Windows vs Posix behavioural decision at the
class level. It really feels more like a "decimal.Context" style API
would be more appropriate, where there was a PathContext that
determined how various operations on paths behaved. The default
context would then be determined by the current OS, but you could

    with pathlib.PosixContext:
        # "\" is not a directory separator
        # "/" is used in representations
        # Comparison is case sensitive
        # expanduser() uses posix rules

    with pathlib.WindowsContext:
        # "\" and "/" are directory separators
        # "\" is used in representations
        # Comparison is case insensitive

Contexts could be tweaked for desired behaviour (e.g. using "/" in
representations on Windows)


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Sat Oct 13 09:59:53 2012
From: ncoghlan at (Nick Coghlan)
Date: Sat, 13 Oct 2012 17:59:53 +1000
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 13, 2012 at 3:05 PM, Greg Ewing <greg.ewing at> wrote:
> Although... is there a way to non-destructively test whether
> a generator is exhausted? If so, this could easily be provided
> as a scheduler primitive.

Yes. Take a look at inspect.getgeneratorstate in 3.2+ (previously,
implementations weren't *required* to provide that introspection
capability, but now they do in order to support this function in the
inspect module).
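
A short sketch of that check. The generator `countdown` and the helper `is_exhausted` are made-up names for illustration; `inspect.getgeneratorstate` and the `GEN_*` constants are the real standard-library API.

```python
import inspect


def countdown(n):
    while n > 0:
        yield n
        n -= 1


def is_exhausted(gen):
    """Non-destructive test: does not advance the generator."""
    return inspect.getgeneratorstate(gen) == inspect.GEN_CLOSED


gen = countdown(1)
states = [inspect.getgeneratorstate(gen)]      # not started yet
first = next(gen)
states.append(inspect.getgeneratorstate(gen))  # paused at the yield
exhausted_midway = is_exhausted(gen)
for _ in gen:                                  # run it to completion
    pass
states.append(inspect.getgeneratorstate(gen))
exhausted_at_end = is_exhausted(gen)
```

A scheduler could call something like `is_exhausted` on a task's generator without consuming a value from it.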


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ubershmekel at  Sat Oct 13 10:05:34 2012
From: ubershmekel at (Yuval Greenfield)
Date: Sat, 13 Oct 2012 10:05:34 +0200
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On Oct 13, 2012 6:45 AM, "Devin Jeanpierre" <jeanpierreda at> wrote:
> On Fri, Oct 12, 2012 at 10:41 PM, Steven D'Aprano <steve at> wrote:
> > If I were designing a language from scratch today, with full Unicode
> > support from the beginning, I would support a rich set of operators
> > possibly even including MIDDLE DOT and × MULTIPLICATION SIGN, and leave
> > it up to the user to use them wisely or not at all. But I don't think it
> > would be for Python to add them, at least not before Python 4: too much
> > effort for too little gain. Maybe in another ten years people will be
> > less resistant to Unicode operators.
> Python has cleverly left the $ symbol unused.
> We can use it as a quasiquote to embed executable TeX.
>     for x in xrange($b \cdot \sum_{i=1}^n \frac{x^n}{n!}$):
>         ...

I hope this was in jest because that line of TeX for general programming
made my eyes bleed.

A PEP for defining operators sounds interesting for 4.0 indeed. Though it
might be messy to allow a module to meddle with the python syntax.

Perhaps instead I would like it if all operators were objects with e.g.
special __infix__ methods.


From steve at  Sat Oct 13 10:18:12 2012
From: steve at (Steven D'Aprano)
Date: Sat, 13 Oct 2012 19:18:12 +1100
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On 13/10/12 19:05, Yuval Greenfield wrote:

> A PEP for defining operators sounds interesting for 4.0 indeed. Though it
> might be messy to allow a module to meddle with the python syntax.

You mean more than classes already do? :)

> Perhaps instead I would like it if all operators were objects with e.g.
> special __infix__ methods.

I believe that Haskell treats operators as if they were function objects,
so you could do something like:

negative_values = map(-, values)

but I think that puts the emphasis on the wrong thing. If (and that's a big
if) we did something like this, it should be a pair of methods __op__ and
the right-hand version __rop__ which get called on the *operands*, not the
operator/function object:

def __op__(self, other, symbol)
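
To make the proposed dispatch concrete, here is a rough sketch. Nothing in it exists in Python: `__op__`, `__rop__` and the helper `apply_operator` are hypothetical, and the helper only imitates what the interpreter might do for `left <symbol> right`, modelled on how `__add__`/`__radd__` fall back today.

```python
def apply_operator(left, symbol, right):
    """Hypothetical dispatch for `left <symbol> right` (not real Python)."""
    method = getattr(type(left), "__op__", None)
    if method is not None:
        result = method(left, right, symbol)
        if result is not NotImplemented:
            return result
    # Fall back to the reflected method on the right-hand operand.
    method = getattr(type(right), "__rop__", None)
    if method is not None:
        result = method(right, left, symbol)
        if result is not NotImplemented:
            return result
    raise TypeError("unsupported operand types for %r" % (symbol,))


class Vector:
    def __init__(self, *components):
        self.components = components

    def __op__(self, other, symbol):
        # One method receives every symbol and must dispatch on it.
        if symbol == "dot" and isinstance(other, Vector):
            return sum(a * b for a, b in zip(self.components, other.components))
        return NotImplemented

    def __rop__(self, other, symbol):
        if symbol == "dot" and isinstance(other, (int, float)):
            return Vector(*(other * c for c in self.components))
        return NotImplemented


inner = apply_operator(Vector(1, 2, 3), "dot", Vector(6, 5, 4))
scaled = apply_operator(2, "dot", Vector(1, 2))
```

The sketch also shows the cost of the design: every class that implements `__op__` has to branch on `symbol` itself.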


From solipsis at  Sat Oct 13 10:22:04 2012
From: solipsis at (Antoine Pitrou)
Date: Sat, 13 Oct 2012 10:22:04 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
References: <>
	<k4pr17$i94$> <>
	<> <k4q3o6$m9q$>
Message-ID: <>

On Sat, 13 Oct 2012 17:41:29 +1000
Nick Coghlan <ncoghlan at> wrote:
> Something else I've been thinking about is that it still feels wrong
> to me to be making the Windows vs Posix behavioural decision at the
> class level. It really feels more like a "decimal.Context" style API
> would be more appropriate, where there was a PathContext that
> determined how various operations on paths behaved. The default
> context would then be determined by the current OS, but you could
> write:
>     with pathlib.PosixContext:
>         # "\" is not a directory separator
>         # "/" is used in representations
>         # Comparison is case sensitive
>         # expanduser() uses posix rules
>     with pathlib.WindowsContext:
>         # "\" and "/" are directory separators
>         # "\" is used in representations
>         # Comparison is case insensitive

You could make an argument that the Path classes could have their
behaviour tweaked with such a context system, but I really think
explicit classes for different path flavours are much better design
than some thread-local context hackery. Personally, I consider
thread-local contexts to be an anti-pattern.
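
As a point of comparison, this is how the explicit-classes approach behaves in the pathlib that later shipped in the standard library (Python 3.4); it postdates this message and may differ from the draft. The flavour is fixed by the class, so the semantics of a given path object never depend on where it is used.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Comparison rules come from the class, not from ambient context.
posix_equal = PurePosixPath("Docs/README") == PurePosixPath("docs/readme")
windows_equal = PureWindowsPath("Docs/README") == PureWindowsPath("docs\\readme")

# Separator handling is likewise per flavour.
windows_text = str(PureWindowsPath("docs/readme"))
windows_as_posix = PureWindowsPath("docs\\readme").as_posix()
posix_parts = PurePosixPath("docs\\readme").parts  # backslash is not a separator

# Paths of different flavours never compare equal.
cross_flavour_equal = PurePosixPath("a") == PureWindowsPath("a")
```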

(also, the idea that a POSIX path becomes a Windows path based on which
"with" statement it's used inside sounds scary)




From storchaka at  Sat Oct 13 10:33:13 2012
From: storchaka at (Serhiy Storchaka)
Date: Sat, 13 Oct 2012 11:33:13 +0300
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <k5b90b$tnp$>

On 13.10.12 05:41, Steven D'Aprano wrote:
> If I were designing a language from scratch today, with full Unicode
> support
> from the beginning, I would support a rich set of operators possibly even
> including MIDDLE DOT and × MULTIPLICATION SIGN, and leave it up to the user
> to use them wisely or not at all.

But they are different operators.

(1, 2, 3)·(6, 5, 4) = 28
(1, 2, 3)×(6, 5, 4) = (-7, 14, -7)
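
The two results can be checked with plain functions; `dot` and `cross` here are illustrative helper names, not proposed builtins.

```python
def dot(u, v):
    """Dot (scalar) product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    """Cross (vector) product of two 3-component sequences."""
    (u1, u2, u3), (v1, v2, v3) = u, v
    return (u2 * v3 - u3 * v2,
            u3 * v1 - u1 * v3,
            u1 * v2 - u2 * v1)


scalar = dot((1, 2, 3), (6, 5, 4))      # 6 + 10 + 12
vector = cross((1, 2, 3), (6, 5, 4))    # (8 - 15, 18 - 4, 5 - 12)
```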

From ubershmekel at  Sat Oct 13 11:15:10 2012
From: ubershmekel at (Yuval Greenfield)
Date: Sat, 13 Oct 2012 11:15:10 +0200
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 13, 2012 at 10:18 AM, Steven D'Aprano <steve at> wrote:

> [..]
> but I think that puts the emphasis on the wrong thing. If (and that's a big
> if) we did something like this, it should be a pair of methods __op__ and
> the right-hand version __rop__ which get called on the *operands*, not the
> operator/function object:
> def __op__(self, other, symbol)
I thought the operator should have a say in how it operates, e.g. the
operator `dot` could call __dot__ in its operands.

class Vector:
    def _dot(self, other):
        return sum([i * j for i, j in zip(self, other)])

class dot(operator):
    def __infix__(self, left, right):
        return left._dot(right)

>>>Vector([1,2,3]) dot Vector([3,4,5])

Making the declaration and import of operators more explicit than the `def
__op__(self, other, symbol)` version. We could put [/, *, ., //, etc...] in
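
Since `a dot b` is not valid syntax today, the idea can only be approximated. In this rough sketch the `Operator` base class and the `__infix__` hook are hypothetical, and calling the operator object stands in for the infix form.

```python
class Operator:
    """Hypothetical base class: calling the object stands in for `left op right`."""

    def __call__(self, left, right):
        return self.__infix__(left, right)


class Vector:
    def __init__(self, components):
        self.components = list(components)

    def __iter__(self):
        return iter(self.components)

    def _dot(self, other):
        return sum(i * j for i, j in zip(self, other))


class Dot(Operator):
    def __infix__(self, left, right):
        # The operator decides which method of its operands to call.
        return left._dot(right)


dot = Dot()
result = dot(Vector([1, 2, 3]), Vector([3, 4, 5]))  # 3 + 8 + 15
```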


From rosuav at  Sat Oct 13 11:18:26 2012
From: rosuav at (Chris Angelico)
Date: Sat, 13 Oct 2012 20:18:26 +1100
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 13, 2012 at 7:18 PM, Steven D'Aprano <steve at> wrote:
> On 13/10/12 19:05, Yuval Greenfield wrote:
>> A PEP for defining operators sounds interesting for 4.0 indeed. Though it
>> might be messy to allow a module to meddle with the python syntax.
> You mean more than classes already do? :)

Yes, more than classes already do. You could completely redefine
Python into another language.

Here, I wrote a program. It uses the letter d as an infix operator
that means "sum N random numbers up to M". You know the language, it's
Python same as you work with all the time! Oh, but I don't use + for
addition, I use $, and # is my "turn tuple into dictionary" operator,
and I use parentheses as a sort of C-style ternary operator.

But it's still Python, so you should be able to read and understand
the code, right?

I actually wrote up a language design spec to highlight what would
happen if this sort of thing were possible. And the writing of that
spec was what demonstrated to me how fundamentally BAD the idea was.

It could certainly be done. All you need to do is make abuttal of
three objects into second_object.__infix__(first_object, third_object)
and then handle the mess of prefix and postfix objects. I just don't
recommend ever doing it.


From cs at  Sat Oct 13 06:27:28 2012
From: cs at (Cameron Simpson)
Date: Sat, 13 Oct 2012 15:27:28 +1100
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On 12Oct2012 13:27, Ram Rachum <ram.rachum at> wrote:
| Today a funny thought occurred to me. Ever since I've learned to program 
| when I was a child, I've taken for granted that when programming, the sign 
| used for multiplication is *. But now that I think about it, why? Now that 
| we have Unicode, why not use · ?

Because it looks astonishingly like ".". Reason enough to avoid it
altogether, for any purpose, in a language that uses "." quite a lot,
as Python does.

A big -100 from me.

Besides, "*" works well and has a long history as multiplication in many
languages. This isn't broken.

As a child, I was taught "x" (that's intended as a small cross diagonally
oriented, not the letter I've used here) for multiplication. Let's
support that too! It also looks like another character (specifically, a
lot like the letter "x").

Seriously, I think this is a bad idea on a readability/usability basis,
and an unnecessary idea from a functional point of view - it adds nothing
not already there and mucks with the "one obvious way to do it" notion
into the bargain.

Cameron Simpson <cs at>

Climber: "I don't know, I can't see the next bolt."
Belayer: "Remember X, when in doubt, run it out."
This should be read with a good Birmingham accent, something like "Remember
'oids, win in dowt, roon it owt"

From shibturn at  Sat Oct 13 11:30:16 2012
From: shibturn at (Richard Oudkerk)
Date: Sat, 13 Oct 2012 10:30:16 +0100
Subject: [Python-ideas] The async API of the future: Twisted and
In-Reply-To: <>
References: <>
Message-ID: <k5bcbb$m8a$>

On 13/10/2012 1:22am, Guido van Rossum wrote:
> I don't think it follows that there can only be one reactor if they
> are registered immediately. There could be a notion of "current
> reactor" maintained in thread-local context; moreover it could depend
> on the reactor that made the callback that caused the current task to
> run. The reactor could also be chosen by the code that made the
> Future. (Though I'm not immediately sure how that would work in the
> yield-from scenario -- but I'm sure there's a way.)

Alternatively, yielding a future (or whatever one calls the objects 
returned by *_async()) could register *and* wait for the result.  To 
register without waiting one would yield a wrapper for the future.  So 
one could write

     result = yield foo_async(...)

or

     f = yield Register(foo_async())
     # do some other work
     result = yield f


From masklinn at  Sat Oct 13 11:32:24 2012
From: masklinn at (Masklinn)
Date: Sat, 13 Oct 2012 11:32:24 +0200
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-13, at 10:18 , Steven D'Aprano wrote:
>> Perhaps instead I would like it if all operators were objects with e.g.
>> special __infix__ methods.
> I believe that Haskell treats operators as if they were function objects

That is correct for binary operators. The unary minus is (currently) a
keyword and sugar for the negate function[0].

So `map (-) values` is not going to negate all values, it's going to
partially apply the binary `(-)` to all values.
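
A rough Python analogue of that distinction, using only the standard `operator` and `functools` modules: mapping the unary function negates each value, while partially applying the binary function yields one function per value.

```python
import functools
import operator

values = [1, -2, 3]

# Unary negation: the analogue of mapping `negate` over the list.
negated = list(map(operator.neg, values))

# Partial application of binary subtraction: the analogue of `map (-) values`.
# Each element becomes a function still waiting for its second argument.
subtract_from = [functools.partial(operator.sub, v) for v in values]
all_callable = all(callable(f) for f in subtract_from)
applied = [f(10) for f in subtract_from]   # v - 10 for each v
```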

> but I think that puts the emphasis on the wrong thing.

I'm not sure I understand that, what does it put the emphasis on? Note
that these operators, when generic, tend to live in typeclasses, so
the actual implementation of the behavior of the operator for the set
of its arguments is defined where and when the corresponding typeclass
instance is created. This is essentially how Python's own operators
(and some builtins e.g. ``divmod`` or ``pow``) work (except Haskell
doesn't have a reflected operands fallback)


From masklinn at  Sat Oct 13 11:34:11 2012
From: masklinn at (Masklinn)
Date: Sat, 13 Oct 2012 11:34:11 +0200
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-12, at 23:37 , Ethan Furman wrote:
> In college we dropped the multiplication sign and just wrote stuff like:
> (x + z)(x - y)
> but we can't do that in Python because they are function calls.

Numbers could be callable with __call__ aliasing to a

From breamoreboy at  Sat Oct 13 11:51:52 2012
From: breamoreboy at (Mark Lawrence)
Date: Sat, 13 Oct 2012 10:51:52 +0100
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <k5bdhj$up8$>

On 13/10/2012 05:27, Cameron Simpson wrote:
> As a child, I was taught "x" (that's intended as a small cross diagonally
> oriented, not the letter I've used here) for multiplication. Let's
> support that too! It also looks like another character (specifically, a
> lot like the letter "x").
> Cheers,

Another problem with "x" is actually writing it out correctly on your 
coding sheets for the data preparation team.  IIRC Hagar the Horrible 
had an issue with this as he couldn't get the lines to cross.


Mark Lawrence.

From solipsis at  Sat Oct 13 12:06:34 2012
From: solipsis at (Antoine Pitrou)
Date: Sat, 13 Oct 2012 12:06:34 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<k4pr17$i94$> <>
	<> <k4q3o6$m9q$>
	<> <>
	<> <>
Message-ID: <1350122794.3365.8.camel@localhost.localdomain>

On Saturday, 13 October 2012 at 19:47 +1000, Nick Coghlan wrote:
> The problem is that "Windows path" and "Posix path" aren't really
> accurate. There are a bunch of degrees of freedom, which is *exactly*
> the problem the context pattern is designed to deal with without a
> combinatorial explosion of different types or mixins.
> The "how is the string format determined?" aspect could be handled
> with separate methods, but how do you do case insensitive comparisons
> of paths on posix systems?

The question is: why do you want to do that?
I know there are a limited bunch of special cases where Posix filesystem
paths may be case-insensitive, but nobody really cares about them today,
and I don't expect many people to bother tomorrow. Playing with
individual parameters of path semantics sounds like a theoretical bother
more than a practical one.

A possibility would be to expose the Flavour classes, which until now
are an internal implementation detail. That would first imply better
defining their API, though. Then people could write e.g.:

class PosixCaseInsensitiveFlavour(pathlib.PosixFlavour):
    case_sensitive = False

class MyPath(pathlib.PosixPath):
    flavour = PosixCaseInsensitiveFlavour()

But I would consider it extra icing on the cake, not a requirement for a
Path API.




From itamar at  Sat Oct 13 12:52:54 2012
From: itamar at (Itamar Turner-Trauring)
Date: Sat, 13 Oct 2012 06:52:54 -0400
Subject: [Python-ideas] The async API of the future: Twisted and
Message-ID: <>

(Sorry if this doesn't end up in the right thread in mail clients; I've
been reading this through a web UI and only just formally subscribed so
can't reply directly to the correct email.)

Code that uses generators is indeed often easier to read... but the problem
is that this isn't just a difference in syntax, it has a significant
semantic impact. Specifically, requiring yield means that you're
re-introducing context switching. In inlineCallbacks, or coroutines, or any
system that uses yield as in your example above, arbitrary code may run
during the context switch, and who knows what happened to the state of the
world in the interim. True, it's an explicit context switch, unlike
threading where it can happen at any point, but it's still a context
switch, and it still increases the chance of race conditions and all the
other problems threading has. (If you're omitting yield it's even worse,
since you can't even tell anymore where the context switches are
happening.) Superficially such code is simpler (and in some cases I'm happy
to use inlineCallbacks, in particular in unit tests), but much the same way
threaded code is "simpler". If you're not very very careful, it'll work 99
times and break mysteriously the 100th.

For example, consider the following code; silly, but buggy due to the
context switch in yield allowing race conditions if any other code modifies
counter.value while getResult() is waiting for a result.

   def addToCounter():
        counter.value = counter.value + (yield getResult())

In a Deferred callback, on the other hand, you know the only things that
are going to run are functions you call. In so far as it's possible, what
happens is under control of one function only. Less pretty, but no
potential race conditions:

    def add(result):
        counter.value = counter.value + result
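
The difference can be reproduced without any framework. In this sketch the `Counter` class and the two driver functions are illustrative names; the "other code" is simulated by assigning to the counter while the generator is suspended. Python evaluates `counter.value` on the left of the `+` before reaching the `yield`, so the generator resumes with a stale value.

```python
class Counter:
    def __init__(self):
        self.value = 0


def add_to_counter(counter):
    # counter.value is read BEFORE the generator suspends at the yield.
    counter.value = counter.value + (yield "waiting for result")


def run_generator_version():
    counter = Counter()
    task = add_to_counter(counter)
    next(task)                # runs up to the yield, having read value == 0
    counter.value = 10        # unrelated code runs during the switch
    try:
        task.send(5)          # the result arrives; 0 + 5 is stored
    except StopIteration:
        pass
    return counter.value


def run_callback_version():
    counter = Counter()

    def add(result):
        counter.value = counter.value + result

    counter.value = 10        # the same unrelated update happens first
    add(5)                    # read and write happen together in the callback
    return counter.value


generator_total = run_generator_version()
callback_total = run_callback_version()
```

In the generator version the update to 10 is silently overwritten; in the callback version it is preserved.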

That being said, perhaps some changes to Python syntax could solve this;
Allen Short claims to have a proposal; hopefully he'll post it soon.

From _ at  Sat Oct 13 13:05:08 2012
From: _ at (Laurens Van Houtven)
Date: Sat, 13 Oct 2012 13:05:08 +0200
Subject: [Python-ideas] The async API of the future: Twisted and
In-Reply-To: <>
References: <>
Message-ID: <>

In addition to the issue mentioned by Itamar, there needs to be a clear way
to do two related things:

1) actually doing things asynchronously! A good example of where this
happens for me is stats logging. I log some stats, but I don't want to wait
for the request to be completed before I continue on with my work:

def callback():
    logSomeStats()  # returns a Deferred that is deliberately not waited on
    return actuallyDoWorkCustomerCaresAbout()

logSomeStats returns a deferred, and I probably would attach an errback to
that deferred, but I don't want to wait until I've finished logging some
stats to do the rest of the work, and I CERTAINLY don't want the work the
customer cares about to bomb out because my stats server is down.

In current inlineCallbacks, this is equally simple: I just run the
expression and *not* yield. If I understand the current alternative
suggestions correctly, the yielding part is important for actually hooking
up the IO (whereas in @inlineCallbacks, it *only* does callback
management). Perhaps I am mistaken in this belief?

2) doing multiple things concurrently. Let's say I want to download 10 web
pages and do something when all ten of them have completed. In twisted, I
can say:

gatherResults(map(getPage, urls)).addCallback(...)

With inlineCallbacks, you can do quite similar things (just yield the
result of gatherResults, since that's a deferred that'll fire once all of
them have fired):

for body in (yield gatherResults(map(getPage, urls))):
    ...


How would these two look in a world where the generator/inlineCallbacks
magic isn't generator backed?
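
One possible answer, offered only as a sketch: this is how the two cases can be written with the asyncio module that later entered the standard library, which postdates this thread. `log_some_stats`, `get_page` and `handler` are invented stand-ins that do no real I/O.

```python
import asyncio


async def log_some_stats(log):
    await asyncio.sleep(0)            # stands in for a network round trip
    log.append("stats logged")


async def get_page(url):
    await asyncio.sleep(0)            # stands in for an HTTP request
    return "body of " + url


async def handler(log, urls):
    # 1) Start the stats call without waiting for it.
    stats_task = asyncio.create_task(log_some_stats(log))

    # 2) Run several requests concurrently and wait for all of them.
    bodies = await asyncio.gather(*(get_page(u) for u in urls))

    await stats_task                  # only so this demo finishes cleanly
    return bodies


log = []
bodies = asyncio.run(handler(log, ["a", "b", "c"]))
```

Here creating the task is what hooks the work up to the loop, so not awaiting it plays the role of "not yielding" in inlineCallbacks. Keeping a stats failure away from the customer-facing work would still need explicit error handling on `stats_task`, which this sketch omits.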


From michelelacchia at  Sat Oct 13 14:04:56 2012
From: michelelacchia at (Michele Lacchia)
Date: Sat, 13 Oct 2012 05:04:56 -0700 (PDT)
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
	<k59ff3$i5i$> <>
Message-ID: <>

> >     
> '.j/homeo/homes/homeh/homeu/homeacj/homeo/homes/homeh/homeu/homeaoj/homeo/homes/homeh/homeu/homeanj/homeo/homes/homeh/homeu/homeafj/homeo/homes/homeh/homeu/homeaij/homeo/homes/homeh/homeu/homeag' 
> Homeo, Homeo, wherefore path thou Homeo? 
> -- 
> Greg

I just had to +1 on this one!! Congrats! 

From tismer at  Sat Oct 13 13:42:32 2012
From: tismer at (Christian Tismer)
Date: Sat, 13 Oct 2012 13:42:32 +0200
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

Hi Guido and folks,

On 07.10.12 17:04, Guido van Rossum wrote:
> On Sun, Oct 7, 2012 at 3:09 AM, Antoine Pitrou <solipsis at> wrote:
>> On Sat, 6 Oct 2012 17:23:48 -0700
>> Guido van Rossum <guido at> wrote:
>>> On Sat, Oct 6, 2012 at 3:24 PM, Antoine Pitrou <solipsis at> wrote:
>>>> greenlets/gevents only get you half the advantages of single-threaded
>>>> "async" programming: they get you scalability in the face of a high
>>>> number of concurrent connections, but they don't get you the robustness
>>>> of cooperative multithreading (because it's not obvious when reading
>>>> the code where the possible thread-switching points are).
>>> I used to think that too, long ago, until I discovered that as you add
>>> abstraction layers, cooperative multithreading is untenable -- sooner
>>> or later you will lose track of where the threads are switched.
>> Even with an explicit notation like "yield" / "yield from"?
> If you strictly adhere to using those you should be safe (though
> distinguishing between the two may prove challenging) -- but in
> practice it's hard to get everyone and every API to use this style. So
> you'll have some blocking API calls hidden deep inside what looks like
> a perfectly innocent call to some helper function.
> IIUC in Go this is solved by mixing threads and lighter-weight
> constructs (say, greenlets) -- if a greenlet gets blocked for I/O, the
> rest of the system continues to make progress by spawning another
> thread.
> My own experience with NDB is that it's just too hard to make everyone
> use the async APIs all the time -- so I gave up and made async APIs an
> optional feature, offering a blocking and an async version of every
> API. I didn't start out that way, but once I started writing
> documentation aimed at unsophisticated users, I realized that it was
> just too much of an uphill battle to bother.
> So I think it's better to accept this and deal with it, possibly
> adding locking primitives into the mix that work well with the rest of
> the framework. Building a lock out of a tasklet-based (i.e.
> non-threading) Future class is easy enough.

I'm digging in, a bit late.
Still trying to read the myriad of messages.

For now just a word:
Guido: How much I would love to use your time machine and invite
you to discuss Python's future in 1998.

Then we would have tossed greenlet/stackless and all that crap.
Entering a different context could have been folded deeply into Python,
by making it able to pickle program state in certain positions.

Just dreaming out loud :-)
It is great that this discussion is taking place, and I'll try to help.

cheers - Chris

Christian Tismer             :^)   <mailto:tismer at>
Software Consulting          :     Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121     :    *Starship*
14482 Potsdam                :     PGP key ->
phone +49 173 24 18 776  fax +49 (30) 700143-0023
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
       whom do you want to sponsor today?

From tismer at  Sat Oct 13 15:11:43 2012
From: tismer at (Christian Tismer)
Date: Sat, 13 Oct 2012 15:11:43 +0200
Subject: [Python-ideas] Cofunctions PEP - Revision 4
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

Hi Greg,

I dug this thing up while looking into the current async discussion.

On 14.08.10 03:22, Greg Ewing wrote:
> M.-A. Lemburg wrote:
>> Greg Ewing wrote:
>>> In an application that requires thousands of small, cooperating
>>> processes,
>> Sure, and those use Stackless to solve the problem, which IMHO
>> provides a much more Pythonic approach to these things.
> At the expense of using a non-standard Python installation,
> though. I'm trying to design something that can be incorporated
> into standard Python and work without requiring any deep
> black magic. Guido has so far rejected any idea of merging
> Stackless into CPython.
> Also I gather that Stackless works by copying pieces of
> C stack around, which is probably more lightweight than using
> an OS thread, but not as light as it could be.

So, here I need to correct a bit.
What you are describing is the behavior of stackless 2.0,
also what the greenlet does (and eventlet then too for now).

The main thing that makes stackless 3.x so difficult _is_ that
it is as efficient as can be, because no stack slicing is done,
for 90 % of all code.

Stackless uses operations to unwind the C stack in most cases.
If this were possible in _all_ cases, then all the stack copying
would go away, and we would have no machine code at all!

But the necessary change to Python would be quite heavy,
undoable for a small team.

I left these ideas a long time ago and did other projects.
But maybe things should be considered again, after the world
has changed so much. Maybe Python 4 could be decoupled
from the C stack.

cheers - Chris


From mwm at  Sat Oct 13 17:22:29 2012
From: mwm at (Mike Meyer)
Date: Sat, 13 Oct 2012 10:22:29 -0500
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, 13 Oct 2012 19:18:12 +1100
Steven D'Aprano <steve at> wrote:

> On 13/10/12 19:05, Yuval Greenfield wrote:
> I believe that Haskell treats operators as if they were function objects,
> so you could do something like:

For the record, Haskell allows operators to be used as functions by
quoting them in ()'s (to provide the functionality of operator) and to
turn functions into operators by quoting them in ``'s.

> negative_values = map(-, values)
> but I think that puts the emphasis on the wrong thing. If (and that's a big
> if) we did something like this, it should be a pair of methods __op__ and
> the right-hand version __rop__ which get called on the *operands*, not the
> operator/function object:
> def __op__(self, other, symbol)

Yeah, but then your function has to dispatch for *all*
operators. Depending on how we handle backwards compatibility with
__add__ et al.

I'd rather slice it the other way (leveraging $ being unused):

def __$<op>__(self, other, right): 

so it only has to dispatch on left/right invocation.

<op> must match a new grammar symbol "operator_symbol", with limits on
it for readability reasons: say at most three characters, all
coming from an appropriate unicode class or classes (you want to catch
the current operators and dollar sign).

Both of these leave both operator precedence and backwards
compatibility to be dealt with.

Mike Meyer <mwm at>
Independent Software developer/SCM consultant, email for more information.

O< ascii ribbon campaign - stop html mail -

From bauertomer at  Sat Oct 13 17:29:54 2012
From: bauertomer at (T.B.)
Date: Sat, 13 Oct 2012 17:29:54 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <1350122794.3365.8.camel@localhost.localdomain>
References: <>
	<k4pr17$i94$> <>
	<> <k4q3o6$m9q$>
Message-ID: <>

On 2012-10-13 12:06, Antoine Pitrou wrote:
> On Saturday 13 October 2012 at 19:47 +1000, Nick Coghlan wrote:
>> The problem is that "Windows path" and "Posix path" aren't really
>> accurate. There are a bunch of degrees of freedom, which is *exactly*
>> the problem the context pattern is designed to deal with without a
>> combinatorial explosion of different types or mixins.
>> The "how is the string format determined?" aspect could be handled
>> with separate methods, but how do you do case insensitive comparisons
>> of paths on posix systems?
> The question is: why do you want to do that?
> I know there are a limited bunch of special cases where Posix filesystem
> paths may be case-insensitive, but nobody really cares about them today,
> and I don't expect many people to bother tomorrow. Playing with
> individual parameters of path semantics sounds like a theoretical bother
> more than a practical one.

If you want to do that, and that is a big if, it might be better to give
keyword arguments to Path(), so that the class signature would look like:

class Path:
    def __init__(self, *args, sep=os.path.sep,
                 casesensitive=os.path.casesensitive, expanduser=False): ...

This would make PosixPath and WindowsPath partial classes with certain
keyword arguments filled in.

Notice that os.path.casesensitive is not (yet) present in Python.
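
A minimal sketch of the "partial class" idea, assuming functools.partial is used to pre-fill the keyword arguments. The toy Path class below is hypothetical and is not the PEP 428 API; since os.path.casesensitive does not exist, the default is hard-coded to True here, and expanduser is left out for brevity.

```python
import functools
import os.path


class Path:
    """Toy stand-in for the proposed class (hypothetical, not pathlib)."""

    def __init__(self, *args, sep=os.path.sep, casesensitive=True):
        self.parts = tuple(args)
        self.sep = sep
        self.casesensitive = casesensitive

    def __str__(self):
        return self.sep.join(self.parts)

    def __eq__(self, other):
        if not isinstance(other, Path):
            return NotImplemented
        if self.casesensitive and other.casesensitive:
            return self.parts == other.parts
        # At least one side ignores case, so compare lowercased parts.
        mine = [part.lower() for part in self.parts]
        theirs = [part.lower() for part in other.parts]
        return mine == theirs


# "Partial classes": the same constructor with some keywords filled in.
PosixPath = functools.partial(Path, sep="/", casesensitive=True)
WindowsPath = functools.partial(Path, sep="\\", casesensitive=False)
```

Note that PosixPath and WindowsPath are then callables rather than types, so they cannot be used with isinstance() or subclassed.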


From ncoghlan at  Sat Oct 13 17:46:09 2012
From: ncoghlan at (Nick Coghlan)
Date: Sun, 14 Oct 2012 01:46:09 +1000
Subject: [Python-ideas] The async API of the future: Twisted and
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 13, 2012 at 8:52 PM, Itamar Turner-Trauring
<itamar at> wrote:
>    def addToCounter():
>         counter.value = counter.value + (yield getResult())

This is buggy code for the reasons you state. However, only improperly
*embedded* yields have this problem, yields that are done in a
dedicated assignment statement are fine:

    def addToCounter():
        result = yield getResult()
        # No race condition here, as we only read the counter *after*
        # receiving the result
        counter.value = counter.value + result

(You can also make sure they're the first thing executed as part of a
larger expression, but a separate assignment statement will almost
always be clearer)

> In a Deferred callback, on the other hand, you know the only things that are
> going to run are functions you call. In so far as it's possible, what
> happens is under control of one function only. Less pretty, but no potential
> race conditions:
>     def add(result):
>         counter.value = counter.value + result
>     getResult().addCallback(add)

This is not the same code you wrote above in the generator version.
The callback equivalent of the code you wrote is this:

    bound_value = counter.value
    def add(result):
        counter.value = bound_value + result
    getResult().addCallback(add)

The generator version isn't magic, people still need to know what
they're doing to properly benefit from the cooperative multithreading.
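
The difference between the two generator forms can be reproduced without any event loop. The sketch below is only an illustration: Counter, embedded, separate and run are hypothetical names, a bare yield stands in for "yield getResult()", and the hand-written driver plays the role of the scheduler that lets another task modify the counter while the generator is suspended.

```python
class Counter:
    def __init__(self):
        self.value = 0


def embedded(counter):
    # counter.value is read *before* the generator suspends at the yield.
    counter.value = counter.value + (yield)


def separate(counter):
    result = yield
    # counter.value is read only *after* the result has arrived.
    counter.value = counter.value + result


def run(gen_func):
    counter = Counter()
    gen = gen_func(counter)
    next(gen)            # run up to the yield
    counter.value = 10   # "another task" updates the counter meanwhile
    try:
        gen.send(5)      # deliver the result
    except StopIteration:
        pass
    return counter.value


print(run(embedded))   # the concurrent update to 10 is lost
print(run(separate))   # the concurrent update is preserved
```

Python evaluates the left operand of + before the yield expression on the right, which is why the embedded form uses the stale value.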


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Sat Oct 13 17:50:57 2012
From: ncoghlan at (Nick Coghlan)
Date: Sun, 14 Oct 2012 01:50:57 +1000
Subject: [Python-ideas] The async API of the future: Twisted and
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 13, 2012 at 9:05 PM, Laurens Van Houtven <_ at> wrote:
> In addition to the issue mentioned by Itamar, there needs to be a clear way
> to do two related things:
> 1) actually doing things asynchronously! A good example of where this
> happens for me is stats logging. I log some stats, but I don't want to wait
> for the request to be completed before I continue on with my work:
> def callback():
>     logSomeStats()
>     return actuallyDoWorkCustomerCaresAbout()
> logSomeStats returns a deferred, and I probably would attach an errback to
> that deferred, but I don't want to wait until I've finished logging some
> stats to do the rest of the work, and I CERTAINLY don't want the work the
> customer cares about to bomb out because my stats server is down.
> In current inlineCallbacks, this is equally simple: I just run the
> expression and *not* yield. If I understand the current alternative
> suggestions correctly, the yielding part is important for actually hooking
> up the IO (whereas in @inlineCallbacks, it *only* does callback management).
> Perhaps I am mistaken in this belief?

Some have certainly suggested that, but not Guido. In Guido's API, the
*_async() calls actually kick off the operations, the "yield" calls
are the "I'm done for now, wake me when this Future I'm yielding is

This is the only way that makes sense, for the reasons you give here.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From itamar at  Sat Oct 13 18:00:24 2012
From: itamar at (Itamar Turner-Trauring)
Date: Sat, 13 Oct 2012 12:00:24 -0400
Subject: [Python-ideas] The async API of the future: Twisted and
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 13, 2012 at 11:46 AM, Nick Coghlan <ncoghlan at> wrote:

> > In a Deferred callback, on the other hand, you know the only things that
> are
> > going to run are functions you call. In so far as it's possible, what
> > happens is under control of one function only. Less pretty, but no
> potential
> > race conditions:
> >
> >     def add(result):
> >         counter.value = counter.value + result
> >     getResult().addCallback(add)
> This is not the same code you wrote above in the generator version.
> The callback equivalent of the code you wrote is this:
>     bound_value = counter.value
>     def add(result):
>         counter.value = bound_value + result
>     getResult().addCallback(add)

True, so, let's look at this version. First, notice that it's more
convoluted than the version I wrote above; i.e. you have to go out of your
way to write race conditiony code. Second, and much more important, when
reading it it's obvious that you're getting and setting counter.value at
different times! Whereas in the generator version you have to think about
it. The generator version has you naturally writing code where things you
thought are happening at the same time are actually happening very far
apart; the Deferred code makes it clear which pieces of code happen
separately, and so you're much more likely to notice these sort of bugs.

> The generator version isn't magic, people still need to know what
> they're doing to properly benefit from the cooperative multithreading.

I agree. And that's exactly the dimension in which Deferreds are superior
to cooperative multithreading; people don't have to think about race
conditions as much, which is hard enough in general. At least when you're
using Deferreds, you can tell by reading the code which chunks of code can
happen at different times, and the natural idioms of Python don't
*encourage* race conditions as they do with yield syntax.


From ncoghlan at  Sat Oct 13 17:37:18 2012
From: ncoghlan at (Nick Coghlan)
Date: Sun, 14 Oct 2012 01:37:18 +1000
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <1350122794.3365.8.camel@localhost.localdomain>
References: <>
	<k4pr17$i94$> <>
	<> <k4q3o6$m9q$>
Message-ID: <>

On Sat, Oct 13, 2012 at 8:06 PM, Antoine Pitrou <solipsis at> wrote:
> The question is: why do you want to do that?
> I know there are a limited bunch of special cases where Posix filesystem
> paths may be case-insensitive, but nobody really cares about them today,
> and I don't expect many people to bother tomorrow. Playing with
> individual parameters of path semantics sounds like a theoretical bother
> more than a practical one.

It's a useful trick for writing genuinely cross-platform code: when
I'm writing cross-platform code on *nix, I want my paths to behave
like posix paths in every respect *except* I want them to complain
somehow if any of my names only differ by case. I've been burnt in the
past by checking in conflicting names on a Linux machine and then
wondering why the Windows checkouts were broken. The only real way to
deal with that is to avoid relying on filesystem case sensitivity for
correct behaviour of your application, even when the underlying OS
*permits* case sensitivity.

This becomes even *more* important if NFS and CIFS filesystems are
being shared between *nix and Windows systems, but it applies any time
a file system may be shared (e.g. creating archive files, checking in
to a source control system, etc). I have the luxury right now of only
needing to care about Linux systems, but I've had to deal with the
mess in the past and "act case insensitive everywhere" is the only
sanity preserving option. Python itself deals with this mostly via the
stylistic rule of "always use lowercase module and package names", but
it would be nice if a new path abstraction allowed the problem to be
handled *properly*.
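
As a rough illustration of the "complain if names only differ by case" check, here is a minimal sketch. The function name find_case_conflicts is hypothetical, and str.casefold() is only an approximation of how a case-insensitive filesystem compares names.

```python
from collections import defaultdict


def find_case_conflicts(names):
    """Return groups of names that differ only by case."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [group for group in groups.values() if len(group) > 1]


print(find_case_conflicts(["README", "readme", "setup.py"]))
```

Running such a check before a commit would catch the conflicting names on Linux instead of on the first Windows checkout.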

On the Windows side, it would be nice to be able to request the use of
"/" as the directory separator when converting to a string. Using "\"
has the potential to cause interoperability problems (e.g. with
regular expressions).

If you don't like the implicit nature of contexts (a perfectly
reasonable complaint), then I suggest going for an explicit strategy
pattern with flavours rather than requiring classes.

With this approach, the flavour would be specified on a *per-instance*
basis (with the default behaviour being determined by the OS).

The main class hierarchy would just be PurePath <-- Path and there
would be a separate PathFlavor ABC with PosixFlavor and WindowsFlavor
subclasses (public Python stdlib APIs generally follow US spelling and
drop the 'u').

The main classes would then *delegate* the flavour dependent
operations like parsing, conversion to a string and equality
comparisons to the flavour objects.

It's really the public use of the strategy pattern that prevents the
combinatorial explosion - you can just have a single OS-based default
(as is already the case with PurePath.__new__ and Path.__new__ playing
type selection games), rather than allowing the default to be
configured per thread. The decimal-style thread-based dynamic contexts
are more useful when you want to change the behaviour *without* either
copying or mutating objects, which I agree is overkill for path
manipulation.

Since pathlib already uses the Flavor objects as strategies
internally, it should just be a matter of switching from the use of
inheritance to specify the flavour to using a keyword-only argument in
the constructor. The "case-insensitive posix path" example would then
look like:

class PosixCaseInsensitiveFlavor(pathlib.PosixFlavor):
    case_sensitive = False

def my_path(*args):
    return Path(*args, flavor=PosixCaseInsensitiveFlavor)

You can add as many new flavours as you want, and it's only one class
per flavour rather than up to 3 (the flavour itself, the pure variant
and the concrete variant).
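
To make the delegation concrete, here is a self-contained sketch of the strategy pattern described above. It does not use pathlib's internal flavour objects: PathFlavor, SketchPath and the normalize() method are hypothetical names, flavours are passed as instances, and the default is fixed to PosixFlavor rather than chosen from the OS so that the example behaves the same everywhere.

```python
class PathFlavor:
    """Base strategy: holds the OS-dependent bits."""
    sep = "/"
    case_sensitive = True

    def normalize(self, part):
        return part if self.case_sensitive else part.lower()


class PosixFlavor(PathFlavor):
    pass


class WindowsFlavor(PathFlavor):
    sep = "\\"
    case_sensitive = False


class PosixCaseInsensitiveFlavor(PosixFlavor):
    case_sensitive = False


class SketchPath:
    def __init__(self, *parts, flavor=None):
        self._parts = parts
        # Assumption: default to POSIX instead of inspecting the OS.
        self._flavor = flavor if flavor is not None else PosixFlavor()

    def __str__(self):
        # String conversion is delegated to the flavour's separator.
        return self._flavor.sep.join(self._parts)

    def _key(self):
        # Equality and hashing are delegated to the flavour's normalization.
        return tuple(self._flavor.normalize(p) for p in self._parts)

    def __eq__(self, other):
        if not isinstance(other, SketchPath):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())
```

With this layout a new behaviour costs one flavour class, and SketchPath itself never needs subclassing.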

This class hierarchy is also more amenable to the introduction of
MutablePath as a second subclass of PurePath - a path variant with
mutable properties still sounds potentially attractive to me (over a
wide variety of return-a-modified-copy methods for various cases).


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From at  Sat Oct 13 18:10:08 2012
From: at (Joshua Landau)
Date: Sat, 13 Oct 2012 17:10:08 +0100
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On 13 October 2012 16:22, Mike Meyer <mwm at> wrote:

> On Sat, 13 Oct 2012 19:18:12 +1100
> Steven D'Aprano <steve at> wrote:
> > On 13/10/12 19:05, Yuval Greenfield wrote:
> > I believe that Haskell treats operators as if they were function objects,
> > so you could do something like:
> For the record, Haskell allows operators to be used as functions by
> quoting them in ()'s (to provide the functionality of operator) and to
> turn functions into operators by quoting them in ``'s.
> > negative_values = map(-, values)
> >
> > but I think that puts the emphasis on the wrong thing. If (and that's a
> big
> > if) we did something like this, it should be a pair of methods __op__ and
> > the right-hand version __rop__ which get called on the *operands*, not
> the
> > operator/function object:
> >
> > def __op__(self, other, symbol)
> Yeah, but then your function has to dispatch for *all*
> operators. Depending on how we handle backwards compatibility with
> __add__ et al.
> I'd rather slice it the other way (leveraging $ being unused):
> def __$<op>__(self, other, right):
> so it only has to dispatch on left/right invocation.
> <op> must match a new grammar symbol "operator_symbol", with limits on
> it for readability reasons: say at most three characters, all
> coming from an appropriate unicode class or classes (you want to catch
> the current operators and dollar sign).
> Both of these leave both operator precedence and backwards
> compatibility to be dealt with.

If anyone is taking this as more than a bit of fun, *stop it*.

However, for all you wanting something a bit more concrete to play with,
I've got something that simulates infix based off something I'd found on
the netz sometime whose author I do not remember.

The code is on Codepad <> for brevity, and it
lets you do things like this:

    >>> (2 *dot* "__mul__" |mappedover| 10-to-25) >> tolist
    [20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48]

Note that this is a contrived example equivalent to:

    >>> list(map((2).__mul__, range(10, 25)))
    [20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48]

and mixing the styles you can get a quite nice:

    >>> map((2).__mul__, 10-to-25) >> tolist
    [20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48]

which would actually look readable if it was considered mainstream.

Note that making an in-line function is as simple as:

    >>> @Inline
    ... def multiply(x, y): return x*y
    ...
    >>> 3 ^multiply^ 3
    9

and that you can use any surrounding operators (other than comparisons) to
choose your operator priority or what reads well:

    >>> 1 |div| 3 |div| 3
    0.1111111111111111
    >>> 1 |div| 3 *div* 3
    1.0

and finally you also get "coercion" to functions à la Haskell:

    >>> 2 |(div|3)
    0.6666666666666666
    >>> (div|3)(2)
    0.6666666666666666

but I wouldn't even hope of calling it stable code or low on WTFs (if the
above wasn't enough):

    >>> (div|(div|3))(3) # Go on, guess why!
    1.0
    >>> 2 + (div|3) # 'Cause you can, yo
    0.6666666666666666

These could both be "fixed" by making an infix require the same operator on
both sides, which would make these both errors, but that wouldn't catch
cases like (or*(div|3))(3) anyway.
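
For readers without access to the Codepad paste, the core trick can be sketched as follows. This is not the code from the paste: it is a minimal version of the well-known operator-overloading recipe, it only supports the |name| spelling, and it leaves out the section-style "coercion" shown above. The names Infix and _Bound are illustrative.

```python
import operator


class _Bound:
    """Result of `left |op`: remembers the left operand."""

    def __init__(self, func, left):
        self.func = func
        self.left = left

    def __or__(self, right):
        # `bound | right` finishes the call.
        return self.func(self.left, right)


class Infix:
    def __init__(self, func):
        self.func = func

    def __ror__(self, left):
        # `left | op` ends up here because `left` does not know how to
        # "or" itself with an Infix object.
        return _Bound(self.func, left)


@Infix
def multiply(x, y):
    return x * y


div = Infix(operator.truediv)

print(3 |multiply| 3)
print(1 |div| 3 |div| 3)
```

Because | binds more loosely than comparison operators, an expression such as 3 |multiply| 3 == 9 needs parentheses around the infix part.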

So enjoy. Or not. Preferably not.

From guido at  Sat Oct 13 18:17:46 2012
From: guido at (Guido van Rossum)
Date: Sat, 13 Oct 2012 09:17:46 -0700
Subject: [Python-ideas] re-implementing Twisted for fun and profit
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 12, 2012 at 9:46 PM, Glyph <glyph at> wrote:
> There has been a lot written on this list about asynchronous, microthreaded and event-driven I/O in the last couple of days.  There's too much for me to try to respond to all at once, but I would very much like to (possibly re-)introduce one very important point into the discussion.
> Would everyone interested in this please please please read <> several times?  Especially this section: <>.  If it is not clear, please ask questions about it and I will try to needle someone qualified into improving the explanation.

I am well aware of that section. But, like the rest of PEP 3153, it is
sorely lacking in examples or specifications.

> I am bringing this up because I've seen a significant amount of discussion of level-triggering versus edge-triggering.  Once you have properly separated out transport logic from application implementation, triggering style is an irrelevant, private implementation detail of the networking layer.

This could mean several things: (a) only the networking layer needs to
use both trigger styles, the rest of your code should always use
trigger style X (and please let X be edge-triggered :-); (b) only in
the networking layer is it important to distinguish carefully between
the two, in the rest of the app you can use whatever you like best.

> Whether the operating system tells Python "you must call recv() once now" or "you must call recv() until I tell you to stop" should not matter to the application if the application is just getting passed the results of recv() which has already been called.  Since not all I/O libraries actually have a recv() to call, you shouldn't have the application have to call it.  This is perhaps the central design error of asyncore.

Is this about buffering? Because I think I understand buffering.
Filling up a buffer with data as it comes in (until a certain limit)
is a good job for level-triggered callbacks. Ditto for draining a
buffer. The rest of the app can then talk to the buffer and tell it
"give me between X and Y bytes, possibly blocking if you don't have at
least X available right now", or "here are N more bytes, please send
them out when you can". From the app's position these calls *may*
block, so they need to use whatever mechanism (callbacks, Futures,
Deferreds, yield, yield-from) to ensure that *if* they block, other
tasks can run. But the common case is that they don't actually need to
block because there is still data / space in the buffer. (You could
also have an exception for write() and make that never-blocking,
trusting the app not to overfill the buffer; this seems convenient but
it worries me a bit.)
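
A rough sketch of the read side of such a buffer, under stated assumptions: ReadBuffer, feed() and read() are hypothetical names, feed() stands in for whatever callback the networking layer invokes when data arrives, and "would block" is signalled by returning None instead of suspending a task.

```python
class ReadBuffer:
    """Toy byte buffer filled by the I/O layer and drained by the app."""

    def __init__(self, limit=65536):
        self._data = bytearray()
        self._limit = limit

    def feed(self, chunk):
        """Store incoming bytes up to the limit; return how many were accepted."""
        room = self._limit - len(self._data)
        accepted = chunk[:room]
        self._data += accepted
        return len(accepted)

    def read(self, at_least, at_most):
        """Return between at_least and at_most bytes, or None if the
        caller would have to wait for more data."""
        if len(self._data) < at_least:
            return None
        result = bytes(self._data[:at_most])
        del self._data[:at_most]
        return result
```

In a real design the None branch is where a callback, Future, Deferred or yield would take over; the common case returns immediately because the buffer already holds enough data.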

> If it needs a name, I suppose I'd call my preferred style "event triggering".

But how does it work? What would typical user code in this style look like?

> Also, I would like to remind all participants that microthreading, request/response abstraction (i.e. Deferreds, Futures), generator coroutines and a common API for network I/O are all very different tasks and do not need to be accomplished all at once.  If you try to build something that does all of this stuff, you get most of Twisted core plus half of Stackless all at once, which is a bit much for the stdlib to bite off in one chunk.

Well understood. (And I don't even want to get microthreading into the
mix, although others may disagree -- I see Christian Tismer has jumped
in...) But I also think that if we design these things in isolation
it's likely that we'll find later that the pieces don't fit, and I
don't want that to happen either. So I think we should consider these
separate, but loosely coordinated efforts.

--Guido van Rossum (

From at  Sat Oct 13 18:21:19 2012
From: at (Joshua Landau)
Date: Sat, 13 Oct 2012 17:21:19 +0100
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<k4pr17$i94$> <>
	<> <k4q3o6$m9q$>
	<> <>
	<> <>
Message-ID: <>

On 13 October 2012 16:37, Nick Coghlan <ncoghlan at> wrote:

> On Sat, Oct 13, 2012 at 8:06 PM, Antoine Pitrou <solipsis at>
> wrote:
> > The question is: why do you want to do that?
> > I know there are a limited bunch of special cases where Posix filesystem
> > paths may be case-insensitive, but nobody really cares about them today,
> > and I don't expect many people to bother tomorrow. Playing with
> > individual parameters of path semantics sounds like a theoretical bother
> > more than a practical one.
> It's a useful trick for writing genuinely cross-platform code: when
> I'm writing cross-platform code on *nix, I want my paths to behave
> like posix paths in every respect *except* I want them to complain
> somehow if any of my names only differ by case. I've been burnt in the
> past by checking in conflicting names on a Linux machine and then
> wondering why the Windows checkouts were broken. The only real way to
> deal with that is to avoid relying on filesystem case sensitivity for
> correct behaviour of your application, even when the underlying OS
> *permits* case sensitivity.
> This becomes even *more* important if NFS and CIFS filesystems are
> being shared between *nix and Windows systems, but it applies any time
> a file system may be shared (e.g. creating archive files, checking in
> to a source control system, etc). I have the luxury right now of only
> needing to care about Linux systems, but I've had to deal with the
> mess in the past and "act case insensitive everywhere" is the only
> sanity preserving option. Python itself deals with this mostly via the
> stylistic rule of "always use lowercase module and package names", but
> it would be nice if a new path abstraction allowed the problem to be
> handled *properly*.
> On the Windows side, it would be nice to be able to request the use of
> "/" as the directory separator when converting to a string. Using "\"
> has the potential to cause interoperability problems (e.g. with
> regular expressions).
> If you don't like the implicit nature of contexts (a perfectly
> reasonable complaint), then I suggest going for an explicit strategy
> pattern with flavours rather than requiring classes.
> With this approach, the flavour would be specified on a *per-instance*
> basis (with the default behaviour being determined by the OS).
> The main class hierarchy would just be PurePath <-- Path and there
> would be a separate PathFlavor ABC with PosixFlavor and WindowsFlavor
> subclasses (public Python stdlib APIs generally follow US spelling and
> drop the 'u').
> The main classes would then *delegate* the flavour dependent
> operations like parsing, conversion to a string and equality
> comparisons to the flavour objects.
> It's really the public use of the strategy pattern that prevents the
> combinatorial explosion - you can just have a single OS-based default
> (as is already the case with PurePath.__new__ and Path.__new__ playing
> type selection games), rather than allowing the default to be
> configured per thread. The decimal-style thread-based dynamic contexts
> are more useful when you want to change the behaviour *without* either
> copying or mutating objects, which I agree is overkill for path
> manipulation.
> Since pathlib already uses the Flavor objects as strategies
> internally, it should just be a matter of switching from the use of
> inheritance to specify the flavour to using a keyword-only argument in
> the constructor. The "case-insensitive posix path" example would then
> look like:
> class PosixCaseInsensitiveFlavor(pathlib.PosixFlavor):
>     case_sensitive = False
> def my_path(*args):
>     return Path(*args, flavor=PosixCaseInsensitiveFlavor)
> You can add as many new flavours as you want, and it's only one class
> per flavour rather than up to 3 (the flavour itself, the pure variant
> and the concrete variant).
> This class hierarchy is also more amenable to the introduction of
> MutablePath as a second subclass of PurePath - a path variant with
> mutable properties still sounds potentially attractive to me (over a
> wide variety of return-a-modified-copy methods for various cases).

I don't disagree with your points, but I want to point out that IO is
something Python has to make *really basic* because it's one of the first
things newbies use, and Python is a newbie-friendly language.

If you're recommending flavours and whatnot, I recommend you do it in a way
that makes it very much optional and not at all the direct focus of the
docs. The nice thing about the class idea for the uninitiated was that
there were only two options, and newbies only ever had one obvious choice.

Contexts using "with", I think, seem newbie-friendly too. So does having
default flavours and then an "expert"'s option to override default classes
in possibly a sub-module.

I'm no expert, but I think it's worth bearing in mind.

From solipsis at  Sat Oct 13 18:28:30 2012
From: solipsis at (Antoine Pitrou)
Date: Sat, 13 Oct 2012 18:28:30 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<k4pr17$i94$> <>
	<> <k4q3o6$m9q$>
	<> <>
	<> <>
Message-ID: <1350145710.3365.44.camel@localhost.localdomain>

On Sunday 14 October 2012 at 01:37 +1000, Nick Coghlan wrote:
> On Sat, Oct 13, 2012 at 8:06 PM, Antoine Pitrou <solipsis at> wrote:
> > The question is: why do you want to do that?
> > I know there are a limited bunch of special cases where Posix filesystem
> > paths may be case-insensitive, but nobody really cares about them today,
> > and I don't expect many people to bother tomorrow. Playing with
> > individual parameters of path semantics sounds like a theoretical bother
> > more than a practical one.
> It's a useful trick for writing genuinely cross-platform code: when
> I'm writing cross-platform code on *nix, I want my paths to behave
> like posix paths in every respect *except* I want them to complain
> somehow if any of my names only differ by case.

But that's not cross-platform. Under Windows you must also care about
reserved files (CON, NUL, etc.). Also, you can create Posix filenames
with backslashes in them, but under Windows they will be treated as
directory separators. Mercurial learnt this the hard way:

> On the Windows side, it would be nice to be able to request the use of
> "/" as the directory separator when converting to a string. Using "\"
> has the potential to cause interoperability problems (e.g. with
> regular expressions).

The PEP mentions the .as_posix() method, which does exactly that.
(use of regular expressions on whole paths sounds like a weird idea, but
hey :-))
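
Both points can be checked with a short example. It assumes the pathlib module as it later shipped in the standard library, where the pure Windows class is named PureWindowsPath rather than the PureNTPath of the draft discussed here.

```python
from pathlib import PurePosixPath, PureWindowsPath

# as_posix() renders a Windows path with forward slashes.
win = PureWindowsPath(r"C:\Users\me\notes.txt")
print(win.as_posix())

# A backslash is an ordinary filename character on POSIX,
# but a directory separator on Windows.
name = r"dir\file.txt"
print(PurePosixPath(name).parts)
print(PureWindowsPath(name).parts)
```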

> If you don't like the implicit nature of contexts (a perfectly
> reasonable complaint), then I suggest going for an explicit strategy
> pattern with flavours rather than requiring classes.
> With this approach, the flavour would be specified on a *per-instance*
> basis (with the default behaviour being determined by the OS).

If you s/would/could/, I have nothing against it, but I certainly don't
understand why you dislike the approach of providing dedicated classes
*by default*.

IMO, having separate classes is simpler to use, easier to type, more
discoverable (using pydoc or help() or tab-completion at the prompt),
and it has an educational value that a keyword-only "flavour" argument
doesn't have.

> The main classes would then *delegate* the flavour dependent
> operations like parsing, conversion to a string and equality
> comparisons to the flavour objects.

Which they already do :) Here is the code:

class PurePosixPath(PurePath):
    _flavour = _posix_flavour
    __slots__ = ()

class PureNTPath(PurePath):
    _flavour = _nt_flavour
    __slots__ = ()


> The decimal-style thread-based dynamic contexts
> are more useful when you want to change the behaviour *without* either
> copying or mutating objects, which I agree is overkill for path
> manipulation.

Not only overkill, but incorrect and dangerous!

> You can add as many new flavours as you want, and it's only one class
> per flavour rather than up to 3 (the flavour itself, the pure variant
> and the concrete variant).

Yes, you can. That doesn't preclude offering separate classes by
default, though :-)

> This class hierarchy is also more amenable to the introduction of
> MutablePath as a second subclass of PurePath - a path variant with
> mutable properties still sounds potentially attractive to me (over a
> wide variety of return-a-modified-copy methods for various cases).

I'm very cold on offering both mutable and non-mutable paths. That's just
complicated and confusing. Since an immutable type is very desirable
for use in associative containers, I think immutability is the right
choice.



Software development and contracting:

From ncoghlan at  Sat Oct 13 18:52:18 2012
From: ncoghlan at (Nick Coghlan)
Date: Sun, 14 Oct 2012 02:52:18 +1000
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <1350145710.3365.44.camel@localhost.localdomain>
References: <>
	<k4pr17$i94$> <>
	<> <k4q3o6$m9q$>
Message-ID: <>

On Sun, Oct 14, 2012 at 2:28 AM, Antoine Pitrou <solipsis at> wrote:
>> You can add as many new flavours as you want, and it's only one class
>> per flavour rather than up to 3 (the flavour itself, the pure variant
>> and the concrete variant).
> Yes, you can. That doesn't preclude offering separate classes by
> default, though :-)

Factory functions would make more sense to me than separate classes -
they're not really a different type, they're the same type using a
different strategy for the OS dependent bits.

>> This class hierarchy is also more amenable to the introduction of
>> MutablePath as a second subclass of PurePath - a path variant with
>> mutable properties still sounds potentially attractive to me (over a
>> wide variety of return-a-modified-copy methods for various cases).
> I'm very cold on offering both mutable and non-mutable paths. That's just
> complicated and confusing. Since an immutable type is very desirable
> for use in associative containers, I think immutability is the right
> choice.

Sure, if we're only offering one of them, then immutable is definitely
the right choice. However, I think this is analogous to the bytes vs
bytearray distinction - while bytes objects are more useful in
general, using the mutable bytearray when appropriate is vastly
superior to slicing and copying bytes objects.
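
For reference, the bytes versus bytearray contrast looks like this in a small example (the variable names are illustrative only):

```python
data = b"hello world"
# bytes are immutable: every change builds new objects from slices.
upper_copy = data[:5].upper() + data[5:]

buf = bytearray(b"hello world")
same_buf = buf
# bytearray supports slice assignment, so the object is updated in place.
buf[:5] = buf[:5].upper()

print(upper_copy)
print(buf)
```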


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From solipsis at  Sat Oct 13 19:04:21 2012
From: solipsis at (Antoine Pitrou)
Date: Sat, 13 Oct 2012 19:04:21 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
References: <>
	<k4pr17$i94$> <>
	<> <k4q3o6$m9q$>
Message-ID: <>

On Sun, 14 Oct 2012 02:52:18 +1000
Nick Coghlan <ncoghlan at> wrote:
> On Sun, Oct 14, 2012 at 2:28 AM, Antoine Pitrou <solipsis at> wrote:
> >> You can add as many new flavours as you want, and it's only one class
> >> per flavour rather than up to 3 (the flavour itself, the pure variant
> >> and the concrete variant).
> >
> > Yes, you can. That doesn't preclude offering separate classes by
> > default, though :-)
> Factory functions would make more sense to me than separate classes -
> they're not really a different type, they're the same type using a
> different strategy for the OS dependent bits.

I find them less helpful. isinstance() calls won't work. Deriving
won't work. It makes things a bit more opaque. However, we are
definitely talking about a secondary style issue.
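
A small sketch of the isinstance() and subclassing objection, using hypothetical names (a toy Path class, a posix_path factory function and a PosixPath subclass) rather than the real pathlib classes:

```python
class Path:
    def __init__(self, *parts, sep="/"):
        self.parts = parts
        self.sep = sep


def posix_path(*parts):
    """Factory function: returns a plain Path configured for POSIX."""
    return Path(*parts, sep="/")


class PosixPath(Path):
    """Dedicated class: a real type that can be tested and subclassed."""

    def __init__(self, *parts):
        super().__init__(*parts, sep="/")


p = posix_path("usr", "lib")

try:
    isinstance(p, posix_path)      # a function is not a type
except TypeError as exc:
    factory_error = exc

try:
    class MyPath(posix_path):      # a function cannot be a base class
        pass
except TypeError as exc:
    subclass_error = exc

print(isinstance(PosixPath("usr", "lib"), PosixPath))
```

With the factory, the only available type check is isinstance(p, Path), which cannot tell a POSIX-configured path from any other.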

(note how the threading module moved away from factory functions to
regular classes :-))

> >> This class hierarchy is also more amenable to the introduction of
> >> MutablePath as a second subclass of PurePath - a path variant with
> >> mutable properties still sounds potentially attractive to me (over a
> >> wide variety of return-a-modified-copy methods for various cases).
> >
> > I'm very cold on offering both mutable and non-mutable paths. That's just
> > complicated and confusing. Since an immutable type is very desirable
> > for use in associative containers, I think immutability is the right
> > choice.
> Sure, if we're only offering one of them, then immutable is definitely
> the right choice. However, I think this is analogous to the bytes vs
> bytearray distinction - while bytes objects are more useful in
> general, using the mutable bytearray when appropriate is vastly
> superior to slicing and copying bytes objects.

bytearray was only added after a lot of experience with the 2.x str
type. I don't think we should add a mutable path API before significant
experience has been gathered about the cost and performance-criticality
of path manipulation operations. Offering both mutable and immutable
types makes learning the API harder for beginners ("which type should
I use? what happens when I combine them?").
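
The trade-off between the two designs can be sketched as follows. FrozenPath and MutablePath are hypothetical names invented for this illustration, not proposed API; the sketch only shows why the immutable variant works in associative containers and the mutable one does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrozenPath:
    """Hypothetical immutable path: equality and hash come from its parts."""
    parts: tuple

    def with_name(self, name):
        # Return a modified copy instead of changing this object.
        return FrozenPath(self.parts[:-1] + (name,))


class MutablePath:
    """Hypothetical mutable path: it defines __eq__ without __hash__,
    so Python leaves it unhashable."""

    def __init__(self, parts):
        self.parts = list(parts)

    def __eq__(self, other):
        return isinstance(other, MutablePath) and self.parts == other.parts

    def set_name(self, name):
        self.parts[-1] = name  # changes the object in place


lib = FrozenPath(("usr", "lib"))
seen = {lib: "visited"}          # immutable paths can be dict keys
bin_dir = lib.with_name("bin")   # lib itself is unchanged

try:
    {MutablePath(["usr", "lib"]): "visited"}
    mutable_is_hashable = True
except TypeError:
    mutable_is_hashable = False
```

The beginner's question "what happens when I combine them?" shows up here as well: a dict keyed by FrozenPath cannot be looked up with a MutablePath without an explicit conversion.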




From ben at  Sat Oct 13 19:07:05 2012
From: ben at (Ben Darnell)
Date: Sat, 13 Oct 2012 10:07:05 -0700
Subject: [Python-ideas] The async API of the future: Twisted and
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 12, 2012 at 11:14 PM, Antoine Pitrou <solipsis at> wrote:
> On Fri, 12 Oct 2012 15:11:54 -0700
> Guido van Rossum <guido at> wrote:
>> > 2. Method dispatch callbacks:
>> >
>> >     Similar to the above, the reactor or somebody has a handle on your
>> > object, and calls methods that you've defined when events happen
>> >     e.g. IProtocol's dataReceived method
>> While I'm sure it's expedient and captures certain common patterns
>> well, I like this the least of all -- calling fixed methods on an
>> object sounds like a step back; it smells of the old Java way (before
>> it had some equivalent of anonymous functions), and of asyncore, which
>> (nearly) everybody agrees is kind of bad due to its insistence that
>> you subclass its classes. (Notice how subclassing as the prevalent
>> approach to structuring your code has gotten into a lot of discredit
>> since 1996.)
> But how would you write a dataReceived equivalent then? Would you have
> a "task" looping on a read() call, e.g.
> @task
> def my_protocol_main_loop(conn):
>     while <some_condition>:
>         try:
>             data = yield
>         except ConnectionError:
>             conn.close()
>             break
> I'm not sure I understand the problem with subclassing. It works fine
> in Twisted. Even in Python 3 we don't shy away from subclassing, for
> example the IO stack is based on subclassing RawIOBase, BufferedIOBase,
> etc.

Subclassing per se isn't a problem, but requiring a single
dataReceived method per class can be awkward.  Many protocols are
effectively state machines, and modeling each state as a function can
be cleaner than a big if/switch block in dataReceived.  For example,
here's a simplistic HTTP client using tornado's IOStream:

        from tornado import ioloop
        from tornado import iostream
        import socket

        # State 1: send the request, then wait for the end of the headers.
        def send_request():
            stream.write("GET / HTTP/1.0\r\nHost:\r\n\r\n")
            stream.read_until("\r\n\r\n", on_headers)

        # State 2: parse the headers, then wait for the announced body length.
        def on_headers(data):
            headers = {}
            for line in data.split("\r\n"):
                parts = line.split(":")
                if len(parts) == 2:
                    headers[parts[0].strip()] = parts[1].strip()
            stream.read_bytes(int(headers["Content-Length"]), on_body)

        # State 3: print the body and shut everything down.
        def on_body(data):
            print data
            stream.close()
            ioloop.IOLoop.instance().stop()

        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)
        stream = iostream.IOStream(s)
        stream.connect(("", 80), send_request)
        ioloop.IOLoop.instance().start()

Classes allow and encourage broader interfaces, which are sometimes a
good thing, but interact poorly with coroutines.  Both twisted and
tornado use separate callbacks for incoming data and for the
connection being closed, but for coroutines it's probably better to
just treat a closed connection as an error on the read.  Futures (and
yield from) give us a nice way to do that.
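
As a rough illustration of treating a closed connection as an error on the read, here is a sketch that uses a plain generator. Everything in it is an assumption made for the example: ConnectionClosed, parse_headers, http_client and drive are hypothetical names, and drive is only a stand-in for an event loop that serves reads from a string instead of a socket.

```python
class ConnectionClosed(Exception):
    """Thrown into the coroutine when the peer closes before a read completes."""


def parse_headers(block):
    """Split a header block into a dict; lines without a colon are skipped."""
    headers = {}
    for line in block.split("\r\n"):
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip()] = value.strip()
    return headers


def http_client(result):
    """Each yield is one read request; a closed connection is an exception."""
    try:
        head = yield ("until", "\r\n\r\n")
        headers = parse_headers(head)
        result["body"] = yield ("bytes", int(headers["Content-Length"]))
    except ConnectionClosed:
        result["error"] = "connection closed"


def drive(coro, data):
    """Hypothetical stand-in for an event loop: serve each read from `data`."""
    try:
        kind, arg = next(coro)
        while True:
            if kind == "until":
                pos = data.find(arg)
                end = pos + len(arg) if pos >= 0 else None
            else:
                end = arg if len(data) >= arg else None
            if end is None:
                # All data has arrived and the read cannot complete: report
                # the closed connection at the point of the read.
                kind, arg = coro.throw(ConnectionClosed())
            else:
                chunk, data = data[:end], data[end:]
                kind, arg = coro.send(chunk)
    except StopIteration:
        pass


complete = {}
drive(http_client(complete), "HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nhello")

truncated = {}
drive(http_client(truncated), "HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nhel")
```

The protocol code has a single try/except around its reads and needs no separate "connection closed" callback.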


From _ at  Sat Oct 13 19:18:20 2012
From: _ at (Laurens Van Houtven)
Date: Sat, 13 Oct 2012 19:18:20 +0200
Subject: [Python-ideas] The async API of the future: Twisted and
In-Reply-To: <>
References: <>
Message-ID: <>

What calls on_headers in this example? Coming from twisted, that seems like
dataReceived's responsibility, but given your introductory paragraph that's
not actually what goes on here?



From tismer at  Sat Oct 13 19:22:13 2012
From: tismer at (Christian Tismer)
Date: Sat, 13 Oct 2012 19:22:13 +0200
Subject: [Python-ideas] re-implementing Twisted for fun and profit
In-Reply-To: <>
References: <>
Message-ID: <>

On 13.10.12 18:17, Guido van Rossum wrote:
> ....
>> Also, I would like to remind all participants that microthreading, request/response abstraction (i.e. Deferreds, Futures), generator coroutines and a common API for network I/O are all very different tasks and do not need to be accomplished all at once.  If you try to build something that does all of this stuff, you get most of Twisted core plus half of Stackless all at once, which is a bit much for the stdlib to bite off in one chunk.
> Well understood. (And I don't even want to get microthreading into the
> mix, although others may disagree -- I see Christian Tismer has jumped
> in...) But I also think that if we design these things in isolation
> it's likely that we'll find later that the pieces don't fit, and I
> don't want that to happen either. So I think we should consider these
> separate, but loosely coordinated efforts.

I don't disagree but understand this, too.
As long as we are talking Python 3.x, the topic is good compromises,
usability and coordination. Pushing for microthreads would not be
constructive for these threads (email-threads, of course ;-) .

ciao - chris

Christian Tismer             :^)   <mailto:tismer at>
Software Consulting          :     Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121     :    *Starship*
14482 Potsdam                :     PGP key ->
phone +49 173 24 18 776  fax +49 (30) 700143-0023
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
       whom do you want to sponsor today?

From ben at  Sat Oct 13 19:27:55 2012
From: ben at (Ben Darnell)
Date: Sat, 13 Oct 2012 10:27:55 -0700
Subject: [Python-ideas] The async API of the future: Twisted and
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 13, 2012 at 10:18 AM, Laurens Van Houtven <_ at> wrote:
> What calls on_headers in this example? Coming from twisted, that seems like
> dataReceived's responsibility, but given your introductory paragraph that's
> not actually what goes on here?

The IOStream does, after send_request calls
stream.read_until("\r\n\r\n", on_headers).  Inside IOStream, there is
a _handle_read method that is registered with the IOLoop and fills up
a buffer.  When the read condition is satisfied the IOStream calls
back into application code.
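
The buffering described above can be sketched roughly like this. It is a toy model, not Tornado's implementation: BufferedStream, data_received and _try_deliver are hypothetical names, and data_received stands in for whatever the event loop calls when bytes arrive.

```python
class BufferedStream:
    """Toy model of a stream that buffers data until a read condition is met."""

    def __init__(self):
        self._buffer = ""
        self._pending = None  # (kind, argument, callback) of the current read

    def read_until(self, delimiter, callback):
        self._pending = ("until", delimiter, callback)
        self._try_deliver()

    def read_bytes(self, num_bytes, callback):
        self._pending = ("bytes", num_bytes, callback)
        self._try_deliver()

    def data_received(self, data):
        # Called by the (imaginary) event loop when the socket is readable.
        self._buffer += data
        self._try_deliver()

    def _try_deliver(self):
        while self._pending is not None:
            kind, arg, callback = self._pending
            if kind == "until":
                pos = self._buffer.find(arg)
                if pos < 0:
                    return  # delimiter not seen yet, keep buffering
                end = pos + len(arg)
            else:
                if len(self._buffer) < arg:
                    return  # not enough bytes yet, keep buffering
                end = arg
            chunk, self._buffer = self._buffer[:end], self._buffer[end:]
            self._pending = None
            callback(chunk)  # the callback may register the next read


events = []
stream = BufferedStream()


def on_headers(data):
    events.append(("headers", data))
    stream.read_bytes(5, on_body)


def on_body(data):
    events.append(("body", data))


stream.read_until("\r\n\r\n", on_headers)
stream.data_received("A: 1\r\n")      # no callback yet
stream.data_received("\r\nhel")       # headers complete, body still short
stream.data_received("lo")            # body complete
```

Application code only registers the next read condition; the stream decides when enough data has arrived to call back.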



From _ at  Sat Oct 13 19:49:59 2012
From: _ at (Laurens Van Houtven)
Date: Sat, 13 Oct 2012 19:49:59 +0200
Subject: [Python-ideas] The async API of the future: Twisted and
In-Reply-To: <>
References: <>
Message-ID: <>

Interesting. That's certainly a nice API, but that then again (read_until)
sounds like something I'd implement using dataReceived... You know,
read_until clears the buffer, logs the requested callback. data_received
adds something to the buffer, and checks if it triggered the (one of the?)
registered callbacks.

Of course, I may just be rusted in my ways and trying to implement
everything in terms of things I know (then again, that might be just what's
needed when you're trying to make a useful general API).

I guess it's time for me to go deep-diving into Tornado :)



From _ at  Sat Oct 13 19:54:34 2012
From: _ at (Laurens Van Houtven)
Date: Sat, 13 Oct 2012 19:54:34 +0200
Subject: [Python-ideas] The async API of the future: PEP 3153 (async-pep)
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 13, 2012 at 1:22 AM, Guido van Rossum <guido at> wrote:

> [Hopefully this is the last spin-off thread from "asyncore: included
> batteries don't fit"]
> So it's totally unfinished?

At the time, the people I talked to placed significantly more weight in
"explain why this is necessary" than "get me something I can play with".

> > Do you feel that there should be less talk about rationale?
> No, but I feel that there should be some actual specification. I am
> also looking forward to an actual meaty bit of example code -- ISTR
> you mentioned you had something, but that it was incomplete, and I
> can't find the link.

Just examples of how it would work, nothing hooked up to real code. My
memory of it is more of a drowning-in-politics-and-bikeshedding kind of
thing, unfortunately :) Either way, I'm okay with letting bygones be
bygones and focus on how we can get this show on the road.

> > It's not that there's *no* reference to IO: it's just that that reference
> > is abstracted away in data_received and the protocol's transport object,
> > just like Twisted's IProtocol.
> The words "data_received" don't even occur in the PEP.

See above.

What thread should I reply in about the pull APIs?

> I just want to make sure that we don't *completely* paint ourselves into
> the wrong corner when it comes to that.

I don't think we have to worry about it too much. Any reasonable API I can
think of makes this completely doable.

> But I'm really hoping you'll make good on your promise of redoing
> async-pep, giving some actual specifications and example code, so I
> can play with it.


- The async API of the future is very important, and too important to be
left to chance.
- It requires a lot of very experienced manpower.
- It requires a lot of effort to handle the hashing out of it (as we're
doing here) as well as it deserves to be.

I'll take as proactive a role as I can afford to take in this process, but
I don't think I can do it by myself. Furthermore, it's a risk nobody wants
to take: a repeat performance wouldn't be good for anyone, in particular
not for Python nor myself.

I've asked JP Calderone and Itamar Turner-Trauring if they would be
interested in carrying this forward professionally, and they have
tentatively said yes. JP's already familiar with a large part of the
problem space with the implementation of the ssl module. JP and Itamar have
worked together for years and have recently set up a consulting firm.

Given that this is emphatically important to Python, I intend to apply for
a PSF grant on their behalf to further this goal. Given their experience in
the field, I expect this to be a fairly low risk endeavor.

> --
> --Guido van Rossum (


From dreamingforward at  Sat Oct 13 20:20:02 2012
From: dreamingforward at (Mark Adam)
Date: Sat, 13 Oct 2012 13:20:02 -0500
Subject: [Python-ideas] Floating point contexts in Python core
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 12, 2012 at 5:54 PM, Mark Adam <dreamingforward at> wrote:
> On Thu, Oct 11, 2012 at 8:03 PM, Steven D'Aprano <steve at> wrote:
>>>> I would gladly give up a small amount of speed for better control
>>>> over floats, such as whether 1/0.0 raised an exception or
>>>> returned infinity.
>>> Umm, you would be giving up a *lot* of speed.  Native floating point
>>> happens right in the processor, so if you want special behavior, you'd
>>> have to take the floating point out of hardware and into "user space".
>> Even in user-space, you're not giving up that much speed in practical
>> terms, at least not for my needs. The new decimal module in Python 3.3 is
>> less than a factor of 10 times slower than Python's floats, which makes it
>> pretty much instantaneous to my mind :)
> Hmm, well, if it's only that much slower, then we should implement
> Rationals and get rid of the issue altogether.

Now that I think of it, this issue has a strange whiff of the argument
wherefrom came the "from __future__" directive and the split that
happened between the vpython folks who needed the direct support of
float division (rendering 3-d graphics for an interpreted environment)
and the regular python crowd.   Anyone else remember that?
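
For reference, the kind of control discussed in the quoted text can be sketched with the standard decimal and fractions modules. This only illustrates the behaviours being compared; it says nothing about how a float context in the core would look, and the variable names are made up for the example.

```python
import decimal
from decimal import Decimal
from fractions import Fraction

# Native floats: division by zero raises, with no switch to change that.
try:
    1 / 0.0
    float_raised = False
except ZeroDivisionError:
    float_raised = True

# Decimal with the DivisionByZero trap enabled: an exception is raised.
with decimal.localcontext() as ctx:
    ctx.traps[decimal.DivisionByZero] = True
    try:
        Decimal(1) / Decimal(0)
        trapped_raised = False
    except decimal.DivisionByZero:
        trapped_raised = True

# Decimal with the trap disabled: the same division returns infinity.
with decimal.localcontext() as ctx:
    ctx.traps[decimal.DivisionByZero] = False
    untrapped = Decimal(1) / Decimal(0)

# Rationals avoid binary rounding error entirely, at a cost in speed.
float_sum_exact = (0.1 + 0.2 == 0.3)
fraction_sum_exact = (Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))
```

The context is what makes the behaviour selectable per block of code, which hardware floats in Python do not offer.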


From sturla at  Sat Oct 13 20:29:57 2012
From: sturla at (Sturla Molden)
Date: Sat, 13 Oct 2012 20:29:57 +0200
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On 13 Oct 2012, at 06:44, Devin Jeanpierre <jeanpierreda at> wrote:

> Python has cleverly left the $ symbol unused.
> We can use it as a quasiquote to embed executable TeX.
>    for x in xrange($b \cdot \sum_{i=1}^n \frac{x^n}{n!}$):
>        ...
> No need to wait for that new language, we can have a rich set of math
> operators today!


But hey, this is valid Python :D :D

for x in texrange(r"$b \cdot \sum_{i=1}^n \frac{x^n}{n!}$"): pass


From ben at  Sat Oct 13 20:54:27 2012
From: ben at (Ben Darnell)
Date: Sat, 13 Oct 2012 11:54:27 -0700
Subject: [Python-ideas] The async API of the future: Twisted and
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 13, 2012 at 10:49 AM, Laurens Van Houtven <_ at> wrote:
> Interesting. That's certainly a nice API, but that then again (read_until)
> sounds like something I'd implement using dataReceived... You know,
> read_until clears the buffer, logs the requested callback. data_received
> adds something to the buffer, and checks if it triggered the (one of the?)
> registered callbacks.

Right, that's how IOStream is implemented internally.  The
transport/protocol split works a little differently in Tornado:
IOStream is implemented something like a Protocol subclass, but we
consider it a part of the transport layer.  The "protocols" are
arbitrary classes that don't share any particular interface, but
instead just call methods on the IOStream.



From _ at  Sat Oct 13 21:13:09 2012
From: _ at (Laurens Van Houtven)
Date: Sat, 13 Oct 2012 21:13:09 +0200
Subject: [Python-ideas] The async API of the future: Twisted and
In-Reply-To: <>
References: <>
Message-ID: <>

I quite like IOStream's interface, actually. If that's part of the
transport layer, how do you avoid duplicating its behavior
(read_until etc)? If there's just another separate object that would be the
ITransport in twisted, I think the difference is purely one of labeling.

On Sat, Oct 13, 2012 at 8:54 PM, Ben Darnell <ben at> wrote:

> On Sat, Oct 13, 2012 at 10:49 AM, Laurens Van Houtven <_ at> wrote:
> > Interesting. That's certainly a nice API, but that then again (read_until)
> > sounds like something I'd implement using dataReceived... You know,
> > read_until clears the buffer, logs the requested callback. data_received
> > adds something to the buffer, and checks if it triggered the (one of the?)
> > registered callbacks.
> Right, that's how IOStream is implemented internally.  The
> transport/protocol split works a little differently in Tornado:
> IOStream is implemented something like a Protocol subclass, but we
> consider it a part of the transport layer.  The "protocols" are
> arbitrary classes that don't share any particular interface, but
> instead just call methods on the IOStream.
> -Ben
> >
> > Of course, I may just be rusted in my ways and trying to implement
> > everything in terms of things I know (then again, that might be just what's
> > needed when you're trying to make a useful general API).
> >
> > I guess it's time for me to go deep-diving into Tornado :)
> >
> >
> > On Sat, Oct 13, 2012 at 7:27 PM, Ben Darnell <ben at> wrote:
> >>
> >> On Sat, Oct 13, 2012 at 10:18 AM, Laurens Van Houtven <_ at> wrote:
> >> > What calls on_headers in this example? Coming from twisted, that seems
> >> > like
> >> > dataReceived's responsibility, but given your introductory paragraph
> >> > that's
> >> > not actually what goes on here?
> >>
> >> The IOStream does, after send_request calls
> >> stream.read_until("\r\n\r\n", on_headers).  Inside IOStream, there is
> >> a _handle_read method that is registered with the IOLoop and fills up
> >> a buffer.  When the read condition is satisfied the IOStream calls
> >> back into application code.
> >>
> >> -Ben
> >>
> >> >
> >> >
> >> > On Sat, Oct 13, 2012 at 7:07 PM, Ben Darnell <ben at> wrote:
> >> >>
> >> >> On Fri, Oct 12, 2012 at 11:14 PM, Antoine Pitrou <solipsis at>
> >> >> wrote:
> >> >> > On Fri, 12 Oct 2012 15:11:54 -0700
> >> >> > Guido van Rossum <guido at> wrote:
> >> >> >>
> >> >> >> > 2. Method dispatch callbacks:
> >> >> >> >
> >> >> >> >     Similar to the above, the reactor or somebody has a handle
> on
> >> >> >> > your
> >> >> >> > object, and calls methods that you've defined when events happen
> >> >> >> >     e.g. IProtocol's dataReceived method
> >> >> >>
> >> >> >> While I'm sure it's expedient and captures certain common patterns
> >> >> >> well, I like this the least of all -- calling fixed methods on an
> >> >> >> object sounds like a step back; it smells of the old Java way
> >> >> >> (before
> >> >> >> it had some equivalent of anonymous functions), and of asyncore,
> >> >> >> which
> >> >> >> (nearly) everybody agrees is kind of bad due to its insistence
> that
> >> >> >> you subclass its classes. (Notice how subclassing as the prevalent
> >> >> >> approach to structuring your code has gotten into a lot of
> discredit
> >> >> >> since 1996.)
> >> >> >
> >> >> > But how would you write a dataReceived equivalent then? Would you
> >> >> > have
> >> >> > a "task" looping on a read() call, e.g.
> >> >> >
> >> >> > @task
> >> >> > def my_protocol_main_loop(conn):
> >> >> >     while <some_condition>:
> >> >> >         try:
> >> >> >             data = yield
> >> >> >         except ConnectionError:
> >> >> >             conn.close()
> >> >> >             break
> >> >> >
> >> >> > I'm not sure I understand the problem with subclassing. It works
> fine
> >> >> > in Twisted. Even in Python 3 we don't shy away from subclassing,
> for
> >> >> > example the IO stack is based on subclassing RawIOBase,
> >> >> > BufferedIOBase,
> >> >> > etc.
> >> >>
> >> >> Subclassing per se isn't a problem, but requiring a single
> >> >> dataReceived method per class can be awkward.  Many protocols are
> >> >> effectively state machines, and modeling each state as a function can
> >> >> be cleaner than a big if/switch block in dataReceived.  For example,
> >> >> here's a simplistic HTTP client using tornado's IOStream:
> >> >>
> >> >>         from tornado import ioloop
> >> >>         from tornado import iostream
> >> >>         import socket
> >> >>
> >> >>         def send_request():
> >> >>             stream.write("GET / HTTP/1.0\r\nHost:
> >> >>\r\n\r\n")
> >> >>             stream.read_until("\r\n\r\n", on_headers)
> >> >>
> >> >>         def on_headers(data):
> >> >>             headers = {}
> >> >>             for line in data.split("\r\n"):
> >> >>                parts = line.split(":")
> >> >>                if len(parts) == 2:
> >> >>                    headers[parts[0].strip()] = parts[1].strip()
> >> >>             stream.read_bytes(int(headers["Content-Length"]),
> on_body)
> >> >>
> >> >>         def on_body(data):
> >> >>             print data
> >> >>             stream.close()
> >> >>             ioloop.IOLoop.instance().stop()
> >> >>
> >> >>         s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)
> >> >>         stream = iostream.IOStream(s)
> >> >>         stream.connect(("", 80), send_request)
> >> >>         ioloop.IOLoop.instance().start()
> >> >>
> >> >>
> >> >> Classes allow and encourage broader interfaces, which are sometimes a
> >> >> good thing, but interact poorly with coroutines.  Both twisted and
> >> >> tornado use separate callbacks for incoming data and for the
> >> >> connection being closed, but for coroutines it's probably better to
> >> >> just treat a closed connection as an error on the read.  Futures (and
> >> >> yield from) give us a nice way to do that.
> >> >>
> >> >> -Ben
> >> >> _______________________________________________
> >> >> Python-ideas mailing list
> >> >> Python-ideas at
> >> >>
> >> >
> >> >
> >> >
> >> >
> >> > --
> >> > cheers
> >> > lvh
> >> >
> >
> >
> >
> >
> > --
> > cheers
> > lvh
> >


From at  Sat Oct 13 21:14:09 2012
From: at (Joshua Landau)
Date: Sat, 13 Oct 2012 20:14:09 +0100
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On 13 October 2012 19:29, Sturla Molden <sturla at> wrote:

> Den 13. okt. 2012 kl. 06:44 skrev Devin Jeanpierre <jeanpierreda at
> >:
> >
> > Python has cleverly left the $ symbol unused.
> >
> > We can use it as a quasiquote to embed executable TeX.
> >
> >    for x in xrange($b \cdot \sum_{i=1}^n \frac{x^n}{n!}$):
> >        ...
> >
> > No need to wait for that new language, we can have a rich set of math
> > operators today!
> >
> LOL :D
> But hey, this is valid Python :D :D
> for x in texrange(r"$b \cdot \sum_{i=1}^n \frac{x^n}{n!}$"): pass

I am glad someone else shares the same progressive attitude. I, personally,
wrap my whole code like so:

import texcode
texcode.texecute("""
\eq{y}{\range{1}{10}}
\for{x}{y}{
    \print{x}
}
""")  # Alas, the joy has to end

Which has tremendously improved the quality of my output.
Recently, rendering my code, too, has sped up to a remarkable 3
pages-per-minute!

From ben at  Sat Oct 13 21:25:38 2012
From: ben at (Ben Darnell)
Date: Sat, 13 Oct 2012 12:25:38 -0700
Subject: [Python-ideas] The async API of the future: Twisted and
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 13, 2012 at 12:13 PM, Laurens Van Houtven <_ at> wrote:
> I quite like IOStream's interface, actually. If that's part of the transport
> layer, how do you prevent from having duplicating its behavior (read_until
> etc)? If there's just another separate object that would be the ITransport
> in twisted, I think the difference is purely one of labeling.

So far we haven't actually needed much flexibility in the transport
layer - most of the functionality is in the BaseIOStream class, and
then there are subclasses IOStream (regular sockets), SSLIOStream, and
PipeIOStream that actually call recv(), read(), connect(), etc.  We
might need a little refactoring if we introduce dramatically different
types of transports, but the plan is that we'd represent transports as
classes in the IOStream hierarchy.
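
As a rough illustration of the split described above (a toy sketch, not
Tornado's actual code), the buffering and read_until logic can live in a
base class while subclasses only supply the raw read. The names
BaseStream, ListStream, read_from_source and handle_read are made up for
this example; a real event loop would be what calls handle_read.

```python
class BaseStream:
    """Toy stand-in for the buffering layer; NOT Tornado's implementation."""

    def __init__(self):
        self._buffer = b""
        self._delimiter = None
        self._callback = None

    def read_from_source(self):
        """Return newly available bytes; subclasses supply the real read."""
        raise NotImplementedError

    def read_until(self, delimiter, callback):
        # Record the pending read, then check data that is already buffered.
        self._delimiter = delimiter
        self._callback = callback
        self._try_deliver()

    def handle_read(self):
        # An event loop would call this when the source becomes readable.
        self._buffer += self.read_from_source()
        self._try_deliver()

    def _try_deliver(self):
        if self._callback is None:
            return
        index = self._buffer.find(self._delimiter)
        if index == -1:
            return  # read condition not satisfied yet
        end = index + len(self._delimiter)
        data, self._buffer = self._buffer[:end], self._buffer[end:]
        callback, self._callback = self._callback, None
        callback(data)


class ListStream(BaseStream):
    """Hypothetical transport that 'reads' from a list of byte chunks."""

    def __init__(self, chunks):
        super().__init__()
        self._chunks = list(chunks)

    def read_from_source(self):
        return self._chunks.pop(0) if self._chunks else b""


received = []
stream = ListStream([b"HTTP/1.0 200 OK\r\n", b"Content-Length: 2\r\n\r\nhi"])
stream.read_until(b"\r\n\r\n", received.append)
stream.handle_read()  # delimiter not seen yet, nothing is delivered
stream.handle_read()  # delimiter arrives, the callback fires once
```

A different kind of transport would then be another subclass that
overrides only read_from_source, which is the shape of the hierarchy
described in the paragraph above.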


> On Sat, Oct 13, 2012 at 8:54 PM, Ben Darnell <ben at> wrote:
>> On Sat, Oct 13, 2012 at 10:49 AM, Laurens Van Houtven <_ at> wrote:
>> > Interesting. That's certainly a nice API, but that then again
>> > (read_until)
>> > sounds like something I'd implement using dataReceived... You know,
>> > read_until clears the buffer, logs the requested callback. data_received
>> > adds something to the buffer, and checks if it triggered the (one of
>> > the?)
>> > registered callbacks.
>> Right, that's how IOStream is implemented internally.  The
>> transport/protocol split works a little differently in Tornado:
>> IOStream is implemented something like a Protocol subclass, but we
>> consider it a part of the transport layer.  The "protocols" are
>> arbitrary classes that don't share any particular interface, but
>> instead just call methods on the IOStream.
>> -Ben
>> >
>> > Of course, I may just be rusted in my ways and trying to implement
>> > everything in terms of things I know (then again, that might be just
>> > what's
>> > needed when you're trying to make a useful general API).
>> >
>> > I guess it's time for me to go deep-diving into Tornado :)
>> >
>> >
>> > On Sat, Oct 13, 2012 at 7:27 PM, Ben Darnell <ben at> wrote:
>> >>
>> >> On Sat, Oct 13, 2012 at 10:18 AM, Laurens Van Houtven <_ at> wrote:
>> >> > What calls on_headers in this example? Coming from twisted, that
>> >> > seems
>> >> > like
>> >> > dataReceived's responsibility, but given your introductory paragraph
>> >> > that's
>> >> > not actually what goes on here?
>> >>
>> >> The IOStream does, after send_request calls
>> >> stream.read_until("\r\n\r\n", on_headers).  Inside IOStream, there is
>> >> a _handle_read method that is registered with the IOLoop and fills up
>> >> a buffer.  When the read condition is satisfied the IOStream calls
>> >> back into application code.
>> >>
>> >> -Ben
>> >>
>> >> >
>> >> >
>> >> > On Sat, Oct 13, 2012 at 7:07 PM, Ben Darnell <ben at>
>> >> > wrote:
>> >> >>
>> >> >> On Fri, Oct 12, 2012 at 11:14 PM, Antoine Pitrou
>> >> >> <solipsis at>
>> >> >> wrote:
>> >> >> > On Fri, 12 Oct 2012 15:11:54 -0700
>> >> >> > Guido van Rossum <guido at> wrote:
>> >> >> >>
>> >> >> >> > 2. Method dispatch callbacks:
>> >> >> >> >
>> >> >> >> >     Similar to the above, the reactor or somebody has a handle
>> >> >> >> > on
>> >> >> >> > your
>> >> >> >> > object, and calls methods that you've defined when events
>> >> >> >> > happen
>> >> >> >> >     e.g. IProtocol's dataReceived method
>> >> >> >>
>> >> >> >> While I'm sure it's expedient and captures certain common
>> >> >> >> patterns
>> >> >> >> well, I like this the least of all -- calling fixed methods on an
>> >> >> >> object sounds like a step back; it smells of the old Java way
>> >> >> >> (before
>> >> >> >> it had some equivalent of anonymous functions), and of asyncore,
>> >> >> >> which
>> >> >> >> (nearly) everybody agrees is kind of bad due to its insistence
>> >> >> >> that
>> >> >> >> you subclass its classes. (Notice how subclassing as the
>> >> >> >> prevalent
>> >> >> >> approach to structuring your code has gotten into a lot of
>> >> >> >> discredit
>> >> >> >> since 1996.)
>> >> >> >
>> >> >> > But how would you write a dataReceived equivalent then? Would you
>> >> >> > have
>> >> >> > a "task" looping on a read() call, e.g.
>> >> >> >
>> >> >> > @task
>> >> >> > def my_protocol_main_loop(conn):
>> >> >> >     while <some_condition>:
>> >> >> >         try:
>> >> >> >             data = yield
>> >> >> >         except ConnectionError:
>> >> >> >             conn.close()
>> >> >> >             break
>> >> >> >
>> >> >> > I'm not sure I understand the problem with subclassing. It works
>> >> >> > fine
>> >> >> > in Twisted. Even in Python 3 we don't shy away from subclassing,
>> >> >> > for
>> >> >> > example the IO stack is based on subclassing RawIOBase,
>> >> >> > BufferedIOBase,
>> >> >> > etc.
>> >> >>
>> >> >> Subclassing per se isn't a problem, but requiring a single
>> >> >> dataReceived method per class can be awkward.  Many protocols are
>> >> >> effectively state machines, and modeling each state as a function
>> >> >> can
>> >> >> be cleaner than a big if/switch block in dataReceived.  For example,
>> >> >> here's a simplistic HTTP client using tornado's IOStream:
>> >> >>
>> >> >>         from tornado import ioloop
>> >> >>         from tornado import iostream
>> >> >>         import socket
>> >> >>
>> >> >>         def send_request():
>> >> >>             stream.write("GET / HTTP/1.0\r\nHost:
>> >> >>\r\n\r\n")
>> >> >>             stream.read_until("\r\n\r\n", on_headers)
>> >> >>
>> >> >>         def on_headers(data):
>> >> >>             headers = {}
>> >> >>             for line in data.split("\r\n"):
>> >> >>                parts = line.split(":")
>> >> >>                if len(parts) == 2:
>> >> >>                    headers[parts[0].strip()] = parts[1].strip()
>> >> >>             stream.read_bytes(int(headers["Content-Length"]),
>> >> >> on_body)
>> >> >>
>> >> >>         def on_body(data):
>> >> >>             print data
>> >> >>             stream.close()
>> >> >>             ioloop.IOLoop.instance().stop()
>> >> >>
>> >> >>         s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)
>> >> >>         stream = iostream.IOStream(s)
>> >> >>         stream.connect(("", 80), send_request)
>> >> >>         ioloop.IOLoop.instance().start()
>> >> >>
>> >> >>
>> >> >> Classes allow and encourage broader interfaces, which are sometimes
>> >> >> a
>> >> >> good thing, but interact poorly with coroutines.  Both twisted and
>> >> >> tornado use separate callbacks for incoming data and for the
>> >> >> connection being closed, but for coroutines it's probably better to
>> >> >> just treat a closed connection as an error on the read.  Futures
>> >> >> (and
>> >> >> yield from) give us a nice way to do that.
>> >> >>
>> >> >> -Ben
>> >> >> _______________________________________________
>> >> >> Python-ideas mailing list
>> >> >> Python-ideas at
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > cheers
>> >> > lvh
>> >> >
>> >
>> >
>> >
>> >
>> > --
>> > cheers
>> > lvh
>> >
> --
> cheers
> lvh

From sturla at  Sat Oct 13 22:13:28 2012
From: sturla at (Sturla Molden)
Date: Sat, 13 Oct 2012 22:13:28 +0200
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

Den 13. okt. 2012 kl. 21:14 skrev Joshua Landau < at>:

> I am glad someone else shares the same progressive attitude. I, personally, wrap my whole code like so:
>> import texcode 
>> texcode.texecute("""
>> \eq{y}{\range{1}{10}}
>> \for{x}{y}{
>>     \print{x}
>> }
>> """)  # Alas, the joy has to end
> Which has tremendously improved the quality of my output.
> Recently, rendering my code, too, has sped up to a remarkable 3 pages-per-minute!

Gee, I thought texecution was the Texas death penalty :-)



From grosser.meister.morti at  Sat Oct 13 22:20:12 2012
From: grosser.meister.morti at (Mathias Panzenböck)
Date: Sat, 13 Oct 2012 22:20:12 +0200
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On 10/12/2012 10:27 PM, Ram Rachum wrote:
> Hi everybody,
> Today a funny thought occurred to me. Ever since I've learned to program when I was a child, I've
> taken for granted that when programming, the sign used for multiplication is *. But now that I think
> about it, why? Now that we have Unicode, why not use · ?
> Do you think that we can make Python support · in addition to *?
> I can think of a couple of problems, but none of them seem like deal-breakers:
>   - Backward compatibility: Python already uses *, but I don't see a backward compatibility problem
> with supporting · additionally. Let people use whichever they want, like spaces and tabs.
>   - Input methods: I personally use an IDE that could be easily set to automatically convert * to ·
> where appropriate and to allow manual input of ·. People on Linux can type Alt-. .

I use Linux (KDE4). When I press Alt-. in kwrite I simply get . in gvim I get ? and here in 
Thunderbird I get nothing. So I don't think this is very practical.

> Anyone else can
> set up a script that'll let them type · using whichever keyboard combination they want. I admit this
> is pretty annoying, but since you can always use * if you want to, I figure that anyone who cares
> enough about using · instead of * (I bet that people in scientific computing would like that) would
> be willing to take the time to set it up.
> What do you think?
> Ram.

From grosser.meister.morti at  Sat Oct 13 22:25:09 2012
From: grosser.meister.morti at (Mathias Panzenböck)
Date: Sat, 13 Oct 2012 22:25:09 +0200
Subject: [Python-ideas] Is there a good reason to use *
	for	multiplication?
In-Reply-To: <>
References: <>
Message-ID: <>

On 10/13/2012 07:37 AM, Greg Ewing wrote:
> Ram Rachum wrote:
>> I could say that for newbies it's one small confusion that could removed from the language. You
>> and I have been programming for a long time so we take it for granted that * means multiplication,
>> but for any other person that's just another weird idiosyncrasy that further alienates programming.
> Do you have any evidence that a substantial number of
> beginners are confused by * for multiplication, or that
> they have trouble remembering what it means once they've
> been told?
> If you do, is there further evidence that they would
> find a dot to be any clearer?
> The use of a raised dot to indicate multiplication of
> numbers is actually quite rare even in mathematics, and I
> would not expect anyone without a mathematical background
> to even be aware of it.
> In primary school we're taught that 'x' means multiplication.
> Later when we come to algebra, we're taught not to use
> any symbol at all, just write things next to each other.
> A dot is only used in rare cases where there would
> otherwise be ambiguity -- and even then it's often
> preferred to parenthesise things instead.
> And don't forget there's great potential for confusion
> with the decimal point.

I'm -1 on the whole idea.

Also why use ? and not ?? I think unicode in source code is a bad idea.

From grosser.meister.morti at  Sat Oct 13 22:33:56 2012
From: grosser.meister.morti at (Mathias Panzenböck)
Date: Sat, 13 Oct 2012 22:33:56 +0200
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On 10/13/2012 06:20 AM, Bruce Leban wrote:
> Well, I learned x as a multiplication symbol long before I learned either ? or *, and in many fonts
> you can barely see the middle dot. Is there a good reason, we can't just write foo x bar instead of
> foo * bar? If that's confusing we could use ? instead. No one would ever confuse ? and x.
> Or for that matter how about (~R?R?.?R)/R?1??R

On related news: The source code of the APL complier (interpreter?) was released.

I'm still baffled that this programming language was ever in production use.

> Seriously: learning that * means multiplication is a very small thing. You also need to learn what
> /, // and % do, and the difference between 'and' and &, and between =, ==, != and /=.
> --- Bruce
> On Fri, Oct 12, 2012 at 7:41 PM, Steven D'Aprano <steve at <mailto:steve at>>
> wrote:
>     On 13/10/12 07:27, Ram Rachum wrote:
>         Hi everybody,
>         Today a funny thought occurred to me. Ever since I've learned to program
>         when I was a child, I've taken for granted that when programming, the sign
>         used for multiplication is *. But now that I think about it, why? Now that
>         we have Unicode, why not use · ?
>     t
>     25 or so years ago, I used to do some programming in Apple's Hypertalk
>     language, which accepted ÷ in place of / for division. The use of two
>     symbols for the same operation didn't cause any problem for users. But then
>     Apple had the advantage that there was a single, system-wide, highly
>     discoverable way of typing non-ASCII characters at the keyboard, and Apple
>     users tended to pride themselves for using them.
>     I'm not entirely sure about MIDDLE DOT though: especially in small font sizes,
>     it falls foul of the design principle:
>     "syntax should not look like a speck of dust on Tim's monitor"
>     (paraphrasing... can anyone locate the original quote?)
>     and may be too easily confused with FULL STOP. Another problem is that MIDDLE
>     DOT is currently valid in identifiers, so that a·b would count as a single
>     name. Fixing this would require some fairly heavy lifting (a period of
>     deprecation and warnings for any identifier using MIDDLE DOT) before
>     introducing it as an operator. So that's a lot of effort for very little gain.
>     If I were designing a language from scratch today, with full Unicode support
>     from the beginning, I would support a rich set of operators possibly even
>     including MIDDLE DOT and × MULTIPLICATION SIGN, and leave it up to the user
>     to use them wisely or not at all. But I don't think it would be appropriate
>     for Python to add them, at least not before Python 4: too much effort for too
>     little gain. Maybe in another ten years people will be less resistant to
>     Unicode operators.
>     [...]
>         ·. People on Linux can type Alt-. .
>     For what it is worth, I'm using Linux and that does not work for me. I am
>     yet to find a decent method of entering non-ASCII characters.
>     --
>     Steven
>     _________________________________________________
>     Python-ideas mailing list
>     Python-ideas at <mailto:Python-ideas at>
>     <>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at

From at  Sat Oct 13 22:45:32 2012
From: at (Joshua Landau)
Date: Sat, 13 Oct 2012 21:45:32 +0100
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On 13 October 2012 21:20, Mathias Panzenböck
<grosser.meister.morti at> wrote:

> On 10/12/2012 10:27 PM, Ram Rachum wrote:
>> Hi everybody,
>> Today a funny thought occurred to me. Ever since I've learned to program
>> when I was a child, I've
>> taken for granted that when programming, the sign used for multiplication
>> is *. But now that I think
>> about it, why? Now that we have Unicode, why not use · ?
>> Do you think that we can make Python support · in addition to *?
>> I can think of a couple of problems, but none of them seem like
>> deal-breakers:
>>   - Backward compatibility: Python already uses *, but I don't see a
>> backward compatibility problem
>> with supporting · additionally. Let people use whichever they want, like
>> spaces and tabs.
>>   - Input methods: I personally use an IDE that could be easily set to
>> automatically convert * to ·
>> where appropriate and to allow manual input of ·. People on Linux can
>> type Alt-. .
> I use Linux (KDE4). When I press Alt-. in kwrite I simply get . in gvim I
> get ? and here in Thunderbird I get nothing. So I don't think this is very
> practical.

Are y'all using your Alt Grill <>?
M?n? ?e?s m???????

From grosser.meister.morti at  Sat Oct 13 23:50:17 2012
From: grosser.meister.morti at (Mathias Panzenböck)
Date: Sat, 13 Oct 2012 23:50:17 +0200
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On 10/13/2012 10:45 PM, Joshua Landau wrote:
> On 13 October 2012 21:20, Mathias Panzenböck <grosser.meister.morti at
> <mailto:grosser.meister.morti at>> wrote:
>     On 10/12/2012 10:27 PM, Ram Rachum wrote:
>         Hi everybody,
>         Today a funny thought occurred to me. Ever since I've learned to program when I was a child,
>         I've
>         taken for granted that when programming, the sign used for multiplication is *. But now that
>         I think
>         about it, why? Now that we have Unicode, why not use · ?
>         Do you think that we can make Python support · in addition to *?
>         I can think of a couple of problems, but none of them seem like deal-breakers:
>            - Backward compatibility: Python already uses *, but I don't see a backward compatibility
>         problem
>         with supporting · additionally. Let people use whichever they want, like spaces and tabs.
>            - Input methods: I personally use an IDE that could be easily set to automatically
>         convert * to ·
>         where appropriate and to allow manual input of ·. People on Linux can type Alt-. .
>     I use Linux (KDE4). When I press Alt-. in kwrite I simply get . in gvim I get ? and here in
>     Thunderbird I get nothing. So I don't think this is very practical.
> Are y'all using your Alt Grill <>? M?n? ?e?s m???????

With Alt Gr I always get ?

Ah, Alt Gr-, produces ? (German keyboard here, of course.)

From daniel.mcdougall at  Sun Oct 14 00:27:22 2012
From: daniel.mcdougall at (Daniel McDougall)
Date: Sat, 13 Oct 2012 18:27:22 -0400
Subject: [Python-ideas] The async API of the future: Some thoughts from an
	ignorant Tornado user
Message-ID: <>

(This is a response to GVR's Google+ post asking for ideas; I
apologize in advance if I come off as an ignorant programming newbie)

I am the author of Gate One (
which makes extensive use of Tornado's asynchronous capabilities.  It
also uses multiprocessing and threading to a lesser extent.  The
biggest issue I've had trying to write asynchronous code for Gate One
is complexity.  Complexity creates problems with expressiveness which
results in code that, to me, feels un-Pythonic.  For evidence of this
I present the following example:  The retrieve_log_playback()
function: (link goes to Github)

All the function does is generate and return (to the client browser)
an HTML playback of their terminal session recording.  To do it
efficiently without blocking the event loop or slowing down all other
connected clients required loads of complexity (or maybe I'm just
ignorant of "a better way"--feel free to enlighten me).  In an ideal
world I could have just done something like this:

import async # The API of the future ;)
async.async_call(retrieve_log_playback, settings, tws,
# tws == instance of tornado.web.WebSocketHandler that holds the open connection

...but instead I had to create an entirely separate function to act as
the multiprocessing.Process(), create a multiprocessing.Queue() to
shuffle data back and forth, watch a special file descriptor for
updates (so I can tell when the task is complete), and also create a
closure because the connection instance (aka 'tws') isn't pickleable.

After reading through these threads I feel much of the discussion is
over my head but as someone who will ultimately become a *user* of the
"async API of the future" I would like to share my thoughts...

My opinion is that the goal of any async module that winds up in
Python's standard library should be simplicity and portability.  In
terms of features, here's my 'async wishlist':

* I should not have to worry about what is and isn't pickleable when I
decide that a task should be performed asynchronously.
* I should be able to choose the type of event loop/async mechanism
that is appropriate for the task:  For CPU-bound tasks I'll probably
want to use multiprocessing.  For IO-bound tasks I might want to use
threading.  For a multitude of tasks that "just need to be async" (by
nature) I'll want to use an event loop.
* Any async module should support 'basics' like calling functions at
an interval and calling functions after a timeout occurs (with the
ability to cancel).
* Asynchronous tasks should be able to access the same namespace as
everything else.  Maybe wishful thinking.
* It should support publish/subscribe-style events (i.e. an event
dispatcher).  For example, the ability to watch a file descriptor or
socket for changes in state and call a function when that happens.
Preferably with the flexibility to define custom events (i.e. don't
have it tied to kqueue/epoll-specific events).

Thanks for your consideration; and thanks for the awesome language.

Dan McDougall - Chief Executive Officer and Developer
Liftoff Software ? Your flight to the cloud is now boarding.

From greg.ewing at  Sun Oct 14 01:33:50 2012
From: greg.ewing at (Greg Ewing)
Date: Sun, 14 Oct 2012 12:33:50 +1300
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <>
References: <>
	<k59ff3$i5i$> <>
Message-ID: <>

Michele Lacchia wrote:
 > I wrote:
>     '.j/homeo/homes/homeh/homeu/homeacj/homeo/homes/homeh/homeu/homeaoj/homeo/homes/homeh/homeu/homeanj/homeo/homes/homeh/homeu/homeafj/homeo/homes/homeh/homeu/homeaij/homeo/homes/homeh/homeu/homeag'
>     Homeo, Homeo, wherefore path thou Homeo?
>     -- 
>     Greg
> I just had to +1 on this one!! Congrats! 

I also propose the term "julietted" to describe a string that
has had this misfortune happen to it.


From jeanpierreda at  Sun Oct 14 01:42:09 2012
From: jeanpierreda at (Devin Jeanpierre)
Date: Sat, 13 Oct 2012 19:42:09 -0400
Subject: [Python-ideas] The async API of the future: Twisted and
In-Reply-To: <>
References: <>
Message-ID: <>

There has to be some way to contract emails sent in discussions rather
than exploding them. I swear I'm trying to be concise, yet readable.
It's not working.

On Fri, Oct 12, 2012 at 6:11 PM, Guido van Rossum <guido at> wrote:
> I also don't doubt that using classic Futures you can't do this -- the
> chaining really matter for this style, and I presume this (modulo
> unimportant API differences) is what typical Twisted code looks like.

My experience has been unfortunately rather devoid of deferreds in
Twisted. I always feel like the odd one out when people discuss this
confusion. For me, it was all Protocol this and Protocol that, and
deferreds only came up when I used Twisted's great AMP (Asynchronous
Messaging Protocol) library.

> However, Python has yield, and you can do much better (I'll write
> plain yield for now, but it works the same with yield-from):
> try:
>   value1 = yield step1(<args>)
>   value2 = yield step2(value1)
>   value3 = yield step3(value2)
>   # Do something with value4
> except Exception:
>   # Handle any error from step1 through step4
> This form is more flexible, since it is easier to catch different
> exceptions at different points. It is also much easier to pass extra
> information around. E.g. what if your flow ends up having to pass both
> value1 and value2 into step3()? Sure, you can do that by making value2
> a tuple (or a dict, or an object) incorporating value1 and the
> original value2, but that's exactly where this style becomes
> cumbersome, whereas in the yield-based form, such things can remain
> simple local variables. All in all I find it more readable.

Well, first of all, deferreds have ways of joining values together. For example:

    from __future__ import print_function
    from twisted.internet import defer

    def example_joined():
        d1 = defer.Deferred()
        d2 = defer.Deferred()
        # consumeErrors looks scary, but it only means that
        # d1 and d2's errbacks aren't called. Instead, the error is sent to d's
        # errback.
        d = defer.gatherResults([d1, d2], consumeErrors=True)

        d.addErrback(lambda v: print("ERROR!"))

        d1.callback("The first deferred has succeeded")
        # now we're waiting on the second deferred to succeed,
        # which we'll let the caller handle
        return d2

    example_joined().callback("The second deferred has succeeded too!")
    example_joined().errback(Exception("The second deferred has failed..."))

I agree it's easier to use the generator style in many complicated
cases. That doesn't preclude manual deferreds from also being useful.

> So, in the end, for Python 3.4 and beyond, I want to promote a style
> that mixes simple callbacks (perhaps augmented with simple Futures)
> and generator-based coroutines (either PEP 342, yield/send-based, or
> PEP 380 yield-from-based). I'm looking to Twisted for the best
> reactors (see other thread). But for transport/protocol
> implementations I think that generator/coroutines offers a cleaner,
> better interface than incorporating Deferred.

Egh. I mean, sure, suppose we have those things. But what if you want
to send the result of a callback to a generator-coroutine? Presumably
generator coroutines work by yielding deferreds and being called back
when the future resolves (deferred fires). But if those
futures/deferreds aren't exposed, and instead only the generator
stuff is exposed, then bridging the gap between callbacks and
generator-coroutines is impossible. So every callback function has to
also be defined to use something else. And worse, other APIs using
callbacks are left in the dust.

Suppose, OTOH, futures/deferreds are exposed. Then we can easily
bridge between callbacks and generators, by returning a future whose
`set_result` is the callback to our callback function (deferred whose
`callback` is the callback).
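
A minimal sketch of that bridge, using the standard library's
concurrent.futures.Future rather than a Twisted Deferred. Both call_later
(a stand-in for some callback-style API, here backed by a timer thread)
and call_later_future are hypothetical names for this example. The
standard library docs describe set_result as intended for executor
implementations and tests, so treat this as an illustration only.

```python
import threading
from concurrent.futures import Future


def call_later(delay, callback, *args):
    """Stand-in for a callback-style API (hypothetical)."""
    timer = threading.Timer(delay, callback, args)
    timer.daemon = True
    timer.start()
    return timer


def call_later_future(delay, value):
    """Wrap the callback API: the future's set_result *is* the callback."""
    future = Future()
    call_later(delay, future.set_result, value)
    return future


future = call_later_future(0.05, "hello world")
future.add_done_callback(lambda f: print(f.result()))
future.result(timeout=5)  # block here only so that the demo finishes
```

Going the other way is the same trick in reverse: add_done_callback
turns a future back into a plain callback registration.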

But if we're exposing futures/deferreds, why have callbacks in the
first place? The difference between these two functions is that the
second can be used in generator-coroutines trivially and the first
cannot:

    # callbacks:
    reactor.timer(10, print, "hello world")

    # deferreds
    reactor.timer(10).addCallback(print, "hello world")

Now here's another thing: suppose we have a list of "deferred events",
but instead of handling all 10 at once, we want to handle them "as
they arrive", and then synthesize a result at the bottom. How do you
do this with pure generator coroutines?

For example, perhaps I am implementing a game server, where all the
players choose their characters and then the game begins. Whenever a
character is chosen, everyone else has to know about it so that they
can plan their strategy based on who has chosen a character. Character
selections are final, just so that I can use deferreds (hee hee).

I am imagining something like the following:

    # WRONG: handles players in a certain order, rather than as they come in
    def player_lobby(reactor, players):
        for player in players:
            player_character = yield player.wait_for_confirm(reactor)
            # tell all the other players what character the player has chosen
            notify_choice((player, player_character), players)


This is wrong, because it goes in a certain order and "blocks" the
coroutine until every character is chosen. Players will not know who
has chosen what characters in an appropriate order.

But hypothetically, maybe we could do the following:

    # Hypothetical magical code?
    def player_lobby(reactor, players):
        confirmation_events = UnorderedEventList(
            [player.wait_for_confirm(reactor) for player in players])
        while confirmation_events:
            player_character = yield confirmation_events.get_next()
            # tell all the other players what character the player has chosen
            notify_choice((player, player_character), players)


But then, how do we write UnorderedEventList? I don't really know. I
suspect I've made the problem harder, not easier! eek. Plus, it
doesn't even read very well. Especially not compared to the deferred
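
For what it's worth, one possible shape for such a helper, as a rough
sketch only. It assumes the events are concurrent.futures.Future
objects rather than Twisted Deferreds, that everything runs in one
thread, and that no event fails. The method name get_next comes from
the hypothetical code above; everything else is invented for
illustration:

```python
from collections import deque
from concurrent.futures import Future

class UnorderedEventList:
    """Hand out results in completion order, one future per get_next()."""

    def __init__(self, futures):
        self._remaining = len(futures)  # events not yet handed out
        self._ready = deque()           # results nobody has asked for yet
        self._waiters = deque()         # futures handed out, still unresolved
        for future in futures:
            future.add_done_callback(self._on_done)

    def __bool__(self):
        # True while get_next() may still be called.
        return self._remaining > 0

    def _on_done(self, future):
        # Assumes success: future.result() would raise if the event failed.
        if self._waiters:
            self._waiters.popleft().set_result(future.result())
        else:
            self._ready.append(future.result())

    def get_next(self):
        # Return a future for whichever event completes next.
        self._remaining -= 1
        out = Future()
        if self._ready:
            out.set_result(self._ready.popleft())
        else:
            self._waiters.append(out)
        return out
```

Note that this only delivers the result, so each event would have to
carry the player along with the chosen character for the loop above to
know who chose what.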

This is how I would personally do it in Twisted, without using
UnorderedEventList (no magic!):

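    def player_lobby(reactor, players):
        events = []
        for player in players:
            confirm_event = player.wait_for_confirm(reactor)
            # player=player binds the current player at definition time
            def on_confirmation(player_character, player=player):
                # tell all the other players what character the player has chosen
                notify_choice((player, player_character), players)
            # assumed: the two lines below are missing from the archived
            # copy; they register the callback and collect the deferred
            confirm_event.addCallback(on_confirmation)
            events.append(confirm_event)

        yield gatherResults(events)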

Notice how I dropped down into the level of manipulating deferreds so
that I could add this "as they come in" functionality, and then went
back. Actually it wouldn't've hurt much to just not bother with
inlineCallbacks at all.

I don't think this is particularly unreadable. More importantly, I
actually know how to do it. I have no idea how I would do this without
using addCallback, or without reimplementing addCallback using

And then, supposing we don't have these deferreds/futures exposed...
how do we implement delayed computation stuff from extension modules?
What if we want to do these kinds of compositions within said
extension modules? What if we want to write our own version of @tasks
or @inlineCallbacks with extra features, or generate callback chains
from XML files, and so on?

I don't really like the prospect of having just the "sugary syntax"
available, without a flexible underlying representation also exposed.
I don't know if you've ever shared that worry -- sometimes the pretty
syntax gets in the way of getting stuff done.

> I hope that the path forward for Twisted will be simple enough: it
> should be possible to hook Deferred into the simpler callback APIs
> (perhaps a new implementation using some form of adaptation, but
> keeping the interface the same). In a sense, the greenlet/gevent crowd
> will be the biggest losers, since they currently write async code
> without either callbacks or yield, using microthreads instead. I
> wouldn't want to have to start putting yield back everywhere into that
> code. But the stdlib will still support yield-free blocking calls
> (even if under the hood some of these use yield/send-based or
> yield-from-based coroutines) so the monkey-patchey tradition can
> continue.

Surely it's no harder to make yourself into a generator than to make
yourself into a low-level thread-like context switching function with
a saved callstack implemented by hand in assembler, and so on?

I'm sure they'll be fine.

>> 1. Explicit callbacks:
>>     For example, reactor.callLater(t, lambda: print("woo hoo"))
> I actually like this, as it's a lowest-common-denominator approach
> which everyone can easily adapt to their purposes. See the thread I
> started about reactors.

Will do (but also see my response above about why not "everyone" can).

>> 2. Method dispatch callbacks:
>>     Similar to the above, the reactor or somebody has a handle on your
>> object, and calls methods that you've defined when events happen
>>     e.g. IProtocol's dataReceived method
> While I'm sure it's expedient and captures certain common patterns
> well, I like this the least of all -- calling fixed methods on an
> object sounds like a step back; it smells of the old Java way (before
> it had some equivalent of anonymous functions), and of asyncore, which
> (nearly) everybody agrees is kind of bad due to its insistence that
> you subclass its classes. (Notice how subclassing as the prevalent
> approach to structuring your code has gotten into a lot of discredit
> since 1996.)

I only used asyncore once, indirectly, so I don't know anything about
it. I'm willing to dismiss it (and, in fact, various parts of twisted
(I'm looking at you twisted.words)) as not good examples of the

First of all, I'd like to separate the notion of subclassing and
method dispatch. They're entirely unrelated. If I pass my object to
you, and you call different methods depending on what happens
elsewhere, that's method dispatch. And my object doesn't have to be
subclassed or anything for it to happen.

Now here's the thing. Suppose we're writing, for example, an IRC bot.
(Everyone loves IRC bots.)  My IRC bot needs to handle several
different possible events, such as:

    private messages
    channel join event
    CTCP event

and so on. My event handlers for each of these events probably
manipulate some internal state (such as a log file, or a GUI). We'd
probably organize this as a class, or else as a bunch of functions
accessing global state. Or, perhaps a collection of closures. This
last one is pretty unlikely.

For the most part, these functions are all intrinsically related and
can't be sensibly treated separately. You can't take the private
message callback of Bot A, and the channel join callback of bot B, and
register these and expect a result that makes sense.

If we look at this, we're expecting to deal with a set of functions
that manage shared data. The abstraction for this is usually an
object, and we'd really probably write the callbacks in a class unless
we were being contrarian. And it's not too crazy for the dispatcher to
know this and expect you to write it as a class that supports a
certain interface (certain methods correspond to certain events).
Missing methods can be assumed to have the empty implementation (no
subclassing, just catching AttributeError).
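
A minimal sketch of that kind of dispatcher, assuming nothing beyond
plain Python. The names dispatch, LoggingBot, private_message and
channel_join are made up for illustration and are not part of any real
library:

```python
def dispatch(handler, event_name, *args):
    # Look the method up by event name. A missing method is treated as
    # the empty implementation, so no base class is required.
    try:
        method = getattr(handler, event_name)
    except AttributeError:
        return None
    # Called outside the try block so that an AttributeError raised
    # *inside* the handler is not silently swallowed.
    return method(*args)

class LoggingBot:
    # No subclassing: just an object with some of the expected methods.
    def __init__(self):
        self.log = []

    def private_message(self, sender, text):
        self.log.append((sender, text))

bot = LoggingBot()
dispatch(bot, "private_message", "alice", "hi")
dispatch(bot, "channel_join", "#python")  # no such method: ignored
```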

This isn't too much of an imposition on the user -- any collection of
functions (with shared state via globals or closure variables) can be
converted to an object with callable attributes very simply (thanks to
types.SimpleNamespace, especially). And I only really think this is OK
when writing it as an object -- as a collection of functions with
shared state -- is the eminently obvious primary use case, so that
that situation wouldn't come up very often.
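
As a small illustration of that conversion, here is a sketch that
bundles closures sharing state into an object with callable
attributes, using types.SimpleNamespace (available since Python 3.3).
The bot and its event names are hypothetical:

```python
from types import SimpleNamespace

def make_bot():
    seen = []  # shared state captured by both closures

    def private_message(sender, text):
        seen.append(("privmsg", sender, text))

    def channel_join(channel):
        seen.append(("join", channel))

    # The closures become attributes, so a method-dispatching reactor
    # can call bot.private_message(...) as if it were a method.
    return SimpleNamespace(private_message=private_message,
                           channel_join=channel_join,
                           seen=seen)

bot = make_bot()
bot.channel_join("#python")
bot.private_message("alice", "hi")
```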

So, as an example, a protocol that passes data on further down the
line needs to be notified when data is received, but also when the
connection begins and ends. So the twisted protocol interface has
"dataReceived", "connectionMade", and "connectionLost" callbacks.
These really do belong together, they manage a single connection
between computers and how it gets mapped to events usable by a twisted
application. So I like the convenience and suggestiveness of them all
being methods on an object.

>> 4. Generator coroutines
>>     These are a syntactic wrapper around deferreds. If you yield a
>> deferred, you will be sent the result if the deferred succeeds, or an
>> exception if the deferred fails.
>>     e.g. examples from previous message
> Seeing them as syntactic sugar for Deferreds is one way of looking at
> it; no doubt this is how they're seen in the Twisted community because
> Deferreds are older and more entrenched. But there's no requirement
> that an architecture has to have Deferreds in order to use generator
> coroutines -- simple Futures will do just fine, and Greg Ewing has
> shown that using yield-from you can even do without those. (But he
> does use simple, explicit callbacks at the lowest level of his
> system.)

I meant it as a factual explanation of what generator coroutines are
in Twisted, not what they are in general. Sorry for the confusion. We
are probably agreed here.

After a cursory examination, I don't really understand Greg Ewing's
thing. I'd have to dig deeper into the logs for when he first
introduced it.

> I'd like to come back to that Django example though. You are implying
> that there are some opportunities for concurrency here, and I agree,
> assuming we believe disk I/O is slow enough to bother making it
> asynchronously. (In App Engine it's not, and we can't anyways, but in
> other contexts I agree that it would be bad if a slow disk seek were
> to hold up all processing -- not to mention that it might really be
> NFS...)
> How would you code that using Twisted Deferreds?

Well. I'd replace the @task in your NDB thing with @inlineCallbacks
and call it a day. ;)

(I think there's enough deferred examples above, and I'm getting tired
and it's been a day since I started writing this damned email.)

>> For that stuff, you'd have to speak to the main authors of Twisted.
>> I'm just a twisted user. :(
> They seem to be mostly ignoring this conversation, so your standing in
> as a proxy for them is much appreciated!

Well. We are on Python-Ideas... :(

>> In the end it really doesn't matter what API you go with. The Twisted
>> people will wrap it up so that they are compatible, as far as that is
>> possible.
> And I want to ensure that that is possible and preferably easy, if I
> can do it without introducing too many warts in the API that
> non-Twisted users see and use.

I probably lack the expertise to help too much with this. I can point
out anything that sticks out, if/when an extended futures proposal is

-- Devin

From greg.ewing at  Sun Oct 14 01:42:49 2012
From: greg.ewing at (Greg Ewing)
Date: Sun, 14 Oct 2012 12:42:49 +1300
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

Mike Meyer wrote:

> def __$<op>__(self, other, right): 
> <op> must match a new grammar symbol "operator_symbol", with limits on
> it for readability reasons: say at most three characters, all
> coming from an appropriate unicode class or classes

If it's restricted to a single Unicode character, we could
use its Unicode name as the method name:

def __CIRCLE_PLUS__(x, y):
    ...


From greg.ewing at  Sun Oct 14 01:48:34 2012
From: greg.ewing at (Greg Ewing)
Date: Sun, 14 Oct 2012 12:48:34 +1300
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<k4pr17$i94$> <>
	<> <k4q3o6$m9q$>
	<> <>
	<> <>
Message-ID: <>

Nick Coghlan wrote:
> It's a useful trick for writing genuinely cross-platform code: when
> I'm writing cross-platform code on *nix, I want my paths to behave
> like posix paths in every respect *except* I want them to complain
> somehow if any of my names only differ by case.

I don't see how this problem can be solved purely by
adjusting path object behaviour. What you want is to get
a complaint whenever you try to create a file in a
directory that already contains another name that is
case-insensitively equal. That would have to be built
into the file system access functions.
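
A rough sketch of what such a check could look like at the file access
level, using only os.listdir and str.casefold. The helper names
find_case_collision and checked_open are invented for illustration,
and the check-then-create sequence is not atomic, so this is only an
approximation of what a real implementation would need:

```python
import os

def find_case_collision(directory, name):
    """Return an existing entry that differs from name only by case, or None."""
    folded = name.casefold()
    for existing in os.listdir(directory):
        if existing != name and existing.casefold() == folded:
            return existing
    return None

def checked_open(path, mode="w"):
    # Complain before creating a file whose name would clash on a
    # case-insensitive file system.
    directory, name = os.path.split(path)
    clash = find_case_collision(directory or ".", name)
    if clash is not None:
        raise FileExistsError("%r collides with existing %r" % (name, clash))
    return open(path, mode)
```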


From python at  Sun Oct 14 02:04:59 2012
From: python at (MRAB)
Date: Sun, 14 Oct 2012 01:04:59 +0100
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-14 00:42, Greg Ewing wrote:
> Mike Meyer wrote:
>> def __$<op>__(self, other, right):
>> <op> must match a new grammar symbol "operator_symbol", with limits on
>> it for readability reasons: say at most three characters, all
>> coming from an appropriate unicode class or classes
> If it's restricted to a single Unicode character, we could
> use its Unicode name as the method name:
> def __CIRCLE_PLUS__(x, y):
>      ...
If it's more than one codepoint, we could prefix with the length of the
codepoint's name:

def __12CIRCLED_PLUS__(x, y):
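
For illustration, the length-prefixed naming scheme can be computed
with the standard unicodedata module. The function name
operator_method_name is made up, and replacing spaces and hyphens with
underscores is an assumption about how names would be turned into
identifiers:

```python
import unicodedata

def operator_method_name(symbols):
    # Build "__<len><NAME>...__" from the Unicode name of each character.
    parts = []
    for ch in symbols:
        # unicodedata.name raises ValueError for characters without a name.
        name = unicodedata.name(ch).replace(" ", "_").replace("-", "_")
        parts.append("%d%s" % (len(name), name))
    return "__" + "".join(parts) + "__"

print(operator_method_name("\u2295"))  # U+2295 is named CIRCLED PLUS
```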

From greg.ewing at  Sun Oct 14 02:17:05 2012
From: greg.ewing at (Greg Ewing)
Date: Sun, 14 Oct 2012 13:17:05 +1300
Subject: [Python-ideas] The async API of the future: Twisted
	and	Deferreds
In-Reply-To: <>
References: <>
Message-ID: <>

Itamar Turner-Trauring wrote:

> For example, consider the following code; silly, but buggy due to the 
> context switch in yield allowing race conditions if any other code 
> modifies counter.value while getResult() is waiting for a result.
>    def addToCounter():
>         counter.value = counter.value + (yield getResult())

But at least you can *see* from the presence of the 'yield'
that suspension can occur.

PEP 380 preserves this, because anything that can yield has
to be called using 'yield from', so the potential suspension
points remain visible.
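
A small sketch of the race from the quoted example, written with
'yield from' (Python 3.3+). It assumes a dict stands in for the
counter object, and run_with_interference is a made-up driver that
modifies the counter while the coroutine is suspended:

```python
def get_result():
    # Explicit suspension point: control returns to the caller here.
    result = yield "need-result"
    return result

def add_racy(counter):
    # The left operand is read *before* the suspension, so any change
    # made while suspended is overwritten.
    counter["value"] = counter["value"] + (yield from get_result())

def add_safe(counter):
    # Suspend first, then read the counter.
    increment = yield from get_result()
    counter["value"] = counter["value"] + increment

def run_with_interference(coroutine_function):
    counter = {"value": 0}
    coroutine = coroutine_function(counter)
    next(coroutine)          # run up to the suspension point
    counter["value"] = 10    # other code runs while we are suspended
    try:
        coroutine.send(5)    # deliver the awaited result
    except StopIteration:
        pass
    return counter["value"]

print(run_with_interference(add_racy))  # 5: the update to 10 was lost
print(run_with_interference(add_safe))  # 15
```

In both versions the 'yield from' marks where the interference can
happen, which is the visibility property being discussed.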

> That being said, perhaps some changes to Python syntax could solve this; 
> Allen Short 
> ( 
> claims to have a proposal, hopefully he'll post it soon.

He argues there that greenlet-style coroutines are bad because
suspension can occur anywhere without warning. He likes
generators better, because the 'yield' warns you that suspension
might occur. Generators using 'yield from' have the same property.

If his proposal involves marking the suspension points somehow, then
syntactically it will probably be very similar to yield-from.


From itamar at  Sun Oct 14 02:59:56 2012
From: itamar at (Itamar Turner-Trauring)
Date: Sat, 13 Oct 2012 20:59:56 -0400
Subject: [Python-ideas] The async API of the future: Twisted and
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 13, 2012 at 8:17 PM, Greg Ewing <greg.ewing at> wrote:

> But at least you can *see* from the presence of the 'yield'
> that suspension can occur.


> He argues there that greenlet-style coroutines are bad because
> suspension can occur anywhere without warning. He likes
> generators better, because the 'yield' warns you that suspension
> might occur. Generators using 'yield from' have the same property.
> If his proposal involves marking the suspension points somehow, then
> syntactically it will probably be very similar to yield-from.

Explicit suspension is certainly better than hidden suspension, yes. But by
extension, no suspension at all is best.

From guido at  Sun Oct 14 03:35:00 2012
From: guido at (Guido van Rossum)
Date: Sat, 13 Oct 2012 18:35:00 -0700
Subject: [Python-ideas] The async API of the future: Twisted and
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 13, 2012 at 5:59 PM, Itamar Turner-Trauring
<itamar at> wrote:
> Explicit suspension is certainly better than hidden suspension, yes. But by
> extension, no suspension at all is best.

When using Deferreds, there are suspension points too. They just
happen whenever a Deferred is blocked. Each next callback has to
assume that the world may have changed.

You may like that better. But for me, 9 out of 10 times, yield-based
coroutines (whether using Futures or PEP 380's yield from) make the
code more readable than the Deferred style. I do appreciate that often
the Deferred style is an improvement over plain callbacks -- but I
believe that explicit-yielding coroutines are so much better than
Deferred that I'd rather base the future standard API on a combination
of plain callbacks and either Futures+yield or yield-from (the latter
without Futures).

I trust that Twisted invented the best possible interface given the
available machinery at the time (no yield-based coroutines at all, and
not using Stackless). But now that we have explicit-yielding
coroutines, I believe we should adopt a style based on those.

Twisted can of course implement Deferred easily in this world using
some form of adaptation, and we should ensure that this is indeed
reasonable before accepting a standard.

Whether it's better to use yield-from <generator> or yield <future>
remains to be seen; that debate is still ongoing in the thread

--Guido van Rossum (

From oscar.j.benjamin at  Sun Oct 14 04:16:57 2012
From: oscar.j.benjamin at (Oscar Benjamin)
Date: Sun, 14 Oct 2012 03:16:57 +0100
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<k4pr17$i94$> <>
	<> <k4q3o6$m9q$>
Message-ID: <>

On 12 October 2012 20:42, Antoine Pitrou <solipsis at> wrote:
> On Fri, 12 Oct 2012 12:23:46 -0700
> Ethan Furman <ethan at> wrote:
>> Which is why I would like to see Path based on str, despite Guido's
>> misgivings.  (Yes, I know I'm probably tilting at windmills here...)
>> If Path is string based we get backwards compatibility with all the os
>> and third-party tools that expect and use strings; this would allow a
>> gentle migration to using them, as opposed to the all-or-nothing if Path
>> is a completely new type.
> It is not all-or-nothing since you can just call str() and it will work
> fine with both strings and paths.

I assumed that part of the proposal for including a new Path class was
that it would (perhaps eventually rather than immediately) be directly
supported by all of the standard Python APIs that expect
strings-representing-paths. I apologise if I have missed something but
is there some reason why it would be bad for e.g. open() to accept
Path instances as they are? I think it's reasonable to require that
e.g. should only accept a str, but standard open()?


From ncoghlan at  Sun Oct 14 04:26:04 2012
From: ncoghlan at (Nick Coghlan)
Date: Sun, 14 Oct 2012 12:26:04 +1000
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

My general thought on this is that "yield from generator" is the coroutine
equivalent of a function call, while "yield future" would be the way the
lowest level of the generator talked to the standard event loop.
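
A toy sketch of that split, assuming concurrent.futures.Future as the
future type. The names wait_for, read_pair and run are invented, and
the "event loop" here is deliberately naive: it assumes every yielded
future is already resolved.

```python
from concurrent.futures import Future

def wait_for(future):
    # Lowest level: hand a future to the event loop, get its result back.
    result = yield future
    return result

def read_pair(first, second):
    # Higher levels call other coroutines with "yield from",
    # much like an ordinary function call.
    a = yield from wait_for(first)
    b = yield from wait_for(second)
    return a + b

def run(coroutine):
    # Toy loop: Future.result() would block on an unresolved future,
    # so this only works when results are already available.
    try:
        future = next(coroutine)
        while True:
            future = coroutine.send(future.result())
    except StopIteration as stop:
        return stop.value
```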

Sent from my phone, thus the relative brevity :)

From guido at  Sun Oct 14 04:39:06 2012
From: guido at (Guido van Rossum)
Date: Sat, 13 Oct 2012 19:39:06 -0700
Subject: [Python-ideas] The async API of the future: PEP 3153 (async-pep)
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 13, 2012 at 10:54 AM, Laurens Van Houtven <_ at> wrote:
> On Sat, Oct 13, 2012 at 1:22 AM, Guido van Rossum <guido at> wrote:
>> [Hopefully this is the last spin-off thread from "asyncore: included
>> batteries don't fit"]
>> So it's totally unfinished?
> At the time, the people I talked to placed significantly more weight in
> "explain why this is necessary" than "get me something I can play with".

Odd. Were those people experienced in writing / reviewing PEPs?

>> > Do you feel that there should be less talk about rationale?
>> No, but I feel that there should be some actual specification. I am
>> also looking forward to an actual meaty bit of example code -- ISTR
>> you mentioned you had something, but that it was incomplete, and I
>> can't find the link.
> Just examples of how it would work, nothing hooked up to real code. My
> memory of it is more of a drowning-in-politics-and-bikeshedding kind of
> thing, unfortunately :) Either way, I'm okay with letting bygones be bygones
> and focus on how we can get this show on the road.

Shall I just reject PEP 3153 so it doesn't distract people? Of course
we can still refer to it when people ask for a rationale for the
separation between transports and protocols, but it doesn't seem the
PEP itself is going to be finished (correct me if I'm wrong), and as
it stands it is not useful as a software specification.

>> > It's not that there's *no* reference to IO: it's just that that
>> > reference is
>> > abstracted away in data_received and the protocol's transport object,
>> > just
>> > like Twisted's IProtocol.
>> The words "data_received" don't even occur in the PEP.
> See above.
> What thread should I reply in about the pull APIs?

Probably the yield-from thread; or the Twisted/Deferred thread.

>> I just want to make sure that we don't *completely* paint ourselves into
>> the wrong corner when it comes to that.
> I don't think we have to worry about it too much. Any reasonable API I can
> think of makes this completely doable.

Agreed that we needn't constantly worry about it. It should be enough
to have some kind of reality check closer to PEP accept time.

>> But I'm really hoping you'll make good on your promise of redoing
>> async-pep, giving some actual specifications and example code, so I
>> can play with it.
> Takeaways:
> - The async API of the future is very important, and too important to be
> left to chance.

That's why we're discussing it here.

> - It requires a lot of very experienced manpower.

It also requires (a certain level of) *agreement* between people with
different preferences, since it's no good if the community fragments
or the standard solution gets ignored by Twisted and Tornado, for
example. Ideally those packages (that is, their Python 3.4 versions)
would build on and extend the standard API, and for "boring" stuff
(like possibly the event loop) they would just use the standard

> - It requires a lot of effort to handle the hashing out of it (as we're
> doing here) as well as it deserves to be.


> I'll take as proactive a role as I can afford to take in this process, but I
> don't think I can do it by myself.

I hope I didn't come across as asking you that! I am just hoping that
you can give some concrete, working example code showing how to do
protocols and transports.

>  Furthermore, it's a risk nobody wants to
> take: a repeat performance wouldn't be good for anyone, in particular not
> for Python nor myself.

A repeat of what? Of the failure of PEP 3153? Don't worry about that.
This time around I'm here, and since then I have got a lot of
experience implementing and using a solid async library (albeit of a
quite different nature than the typical socket-based stuff that most
people do).

> I've asked JP Calderone and Itamar Turner-Trauring if they would be
> interested in carrying this forward professionally, and they have
> tentatively said yes. JP's already familiar with a large part of the problem
> space with the implementation of the ssl module. JP and Itamar have worked
> together for years and have recently set up a consulting firm.

Insight in the right way to support SSL would be huge; it is an
excellent example of a popular transport that does *not* behave like
sockets, even though its abstract conceptual model is similar (a setup
phase, followed by two bidirectional byte streams).

> Given that this is emphatically important to Python, I intend to apply for a
> PSF grant on their behalf to further this goal. Given their experience in
> the field, I expect this to be a fairly low risk endeavor.

Famous last words. :-)

--Guido van Rossum (

From glyph at  Sun Oct 14 05:41:02 2012
From: glyph at (Glyph)
Date: Sat, 13 Oct 2012 20:41:02 -0700
Subject: [Python-ideas] re-implementing Twisted for fun and profit
In-Reply-To: <>
References: <>
Message-ID: <>

On Oct 13, 2012, at 9:17 AM, Guido van Rossum <guido at> wrote:

> On Fri, Oct 12, 2012 at 9:46 PM, Glyph <glyph at> wrote:
>> There has been a lot written on this list about asynchronous, microthreaded and event-driven I/O in the last couple of days.  There's too much for me to try to respond to all at once, but I would very much like to (possibly re-)introduce one very important point into the discussion.
>> Would everyone interested in this please please please read <> several times?  Especially this section: <>.  If it is not clear, please ask questions about it and I will try to needle someone qualified into improving the explanation.
> I am well aware of that section. But, like the rest of PEP 3153, it is
> sorely lacking in examples or specifications.

If that's what the problem is, I will do what I can to get those sections fleshed out ASAP.

>> I am bringing this up because I've seen a significant amount of discussion of level-triggering versus edge-triggering.  Once you have properly separated out transport logic from application implementation, triggering style is an irrelevant, private implementation detail of the networking layer.
> This could mean several things: (a) only the networking layer needs to use both trigger styles, the rest of your code should always use trigger style X (and please let X be edge-triggered :-); (b) only in the networking layer is it important to distinguish carefully between the two, in the rest of the app you can use whatever you like best.

Edge triggering and level triggering both have to do with changes in boolean state.  Edge triggering is "call me when this bit is changed"; level triggering is "call me (and keep calling me) when this bit is set".  The metaphor extends very well from the electrical-circuit definition, but the distinction is not very meaningful to applications that want to subscribe to a semantic event and not the state of a bit.
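
To make the two definitions concrete, here is a small simulation over a sampled boolean. It is only a sketch: the function names are made up, and a real event loop gets these notifications from the operating system rather than from a list of samples.

```python
def edge_triggered(samples, initial=False):
    # "Call me when this bit is changed": one notification per transition.
    previous = initial
    for tick, bit in enumerate(samples):
        if bit != previous:
            yield tick, bit
        previous = bit

def level_triggered(samples):
    # "Call me (and keep calling me) when this bit is set".
    for tick, bit in enumerate(samples):
        if bit:
            yield tick

samples = [False, True, True, False, True]
print(list(edge_triggered(samples)))   # [(1, True), (3, False), (4, True)]
print(list(level_triggered(samples)))  # [1, 2, 4]
```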

Applications want to know about particular bits of information, not state changes.  Obviously when data is available on the connection, it's the bytes that the application is interested in.  When a new connection is available to be accept()-ed, the application wants to know that as a distinct notification.  There's no way to deliver data or new connected sockets to the application as "edge-triggered"; if the bit is still set later, then there's more, different data to be delivered, which needs a separate notification.  But, even in more obscure cases like "the socket is writable", the transport layer needs to disambiguate between "the connection has closed unexpectedly" and "you should produce some more data for writing now".  (You might want to also know how much buffer space is available, although that is pretty fiddly.)

The low-level event loop needs to have both kinds of callbacks, but avoid exposing the distinction to application code.  However, this doesn't mean all styles need to be implemented.  If Python defines a core event loop interface specification, it doesn't have to provide every type of loop.  Twisted can continue using its reactors, Tornado can continue using its IOLoop, and each can have transforming adapters to work with standard-library protocols.

When the "you should read some data" bit is set, an edge-triggered transport receives that notification, reads the data, which immediately clears that bit, so it responds to the next down->up edge notification in the same way.  The level-triggered transport does the same thing: it receives the notification that the bit is set, then immediately clears it by reading the data; therefore, if it gets another notification that the bit is high, that means it's high again, and more data needs to be read.

>> Whether the operating system tells Python "you must call recv() once now" or "you must call recv() until I tell you to stop" should not matter to the application if the application is just getting passed the results of recv() which has already been called.  Since not all I/O libraries actually have a recv() to call, you shouldn't have the application have to call it.  This is perhaps the central design error of asyncore.
> Is this about buffering? Because I think I understand buffering.  Filling up a buffer with data as it comes in (until a certain limit) is a good job for level-triggered callbacks. Ditto for draining a buffer.

In the current Twisted implementation, you just get bytes objects delivered; when it was designed, 'str' was really the only game in town.  However, I think this still applies because the first thing you're going to do when parsing the contents of your buffer is to break it up into chunks by using some handy bytes method.

In modern Python, you might want to get a bytearray plus an offset delivered instead, because a bytearray can use recv_into, and a bytearray might be reusable, and could possibly let you implement some interesting zero-copy optimizations.  However, in order to facilitate that, bytearray would need to have zero-copy implementations of split() and such.

In my opinion, the prerequisite for using anything other than a bytes object in practical use would be a very sophisticated lazy-slicing data structure, with zero-copy implementations of everything, and a copy-on-write version of recv_into so that if the sliced-up version of the data structure is shared between loop iterations the copies passed off to other event handlers don't get stomped on.  (Although maybe somebody's implemented this while I wasn't looking?)

This kind of pay-only-for-what-you-need buffering is really cool, a lot of fun to implement, and it will give you very noticeable performance gains if you're trying to write a wire-speed proxy or router with almost no logic in it; however, I've never seen it really be worth the trouble in any other type of application.  I'd say that if we can all agree on the version that delivers bytes, the version that re-uses a fixed-sized bytearray buffer could be an optional feature in the 2.0 version of the spec.

> The rest of the app can then talk to the buffer and tell it "give me between X and Y bytes, possibly blocking if you don't have at least X available right now", or "here are N more bytes, please send them out when you can". From the app's position these calls *may* block, so they need to use whatever mechanism (callbacks, Futures, Deferreds, yield, yield-from) to ensure that *if* they block, other tasks can run.

This is not how the application should talk to the receive buffer.  Bytes should not necessarily be directly requested by the application: they simply arrive.  If you have to model everything in terms of a request-for-bytes/response-to-request idiom, there are several problems:

1. You have to heap-allocate an additional thing-to-track-the-request object every time you ask for bytes, which adds non-trivial additional overhead to the processing of simple streams.  (The C-level event object that e.g. IOCP uses to track the request is slightly different, because it's a single signaling event and you should only ever have one outstanding per connection, so you don't have to make a bunch of them.)

2. Multiple listeners might want to "read" from a socket at once; for example, if you have a symmetric protocol where the application is simultaneously waiting for a response message from its peer and also trying to handle new requests of its own.  (This is required in symmetric protocols, like websockets and XMPP, and HTTP/2.0 seems to be moving in this direction too.)

3. Even assuming you deal with parts 1 and 2 properly - they are possible to work around - error-handling becomes tricky and tedious.  You can't effectively determine in your coroutine scheduler which errors are in the code that is reading or writing to a given connection (because the error may have been triggered by code that was reading or writing to a different connection), so sometimes your sockets will just go off into la-la land with nothing reading from them or writing to them.  In Twisted, if a dataReceived handler causes an error, then we know it's time to shut down that connection and close that socket; there's no ambiguity.

Even if you want to write your protocol parsers in a yield-coroutine style, I don't think you want the core I/O layer to be written in that style; it should be possible to write everything as "raw" it's-just-a-method event handlers because that is really the lowest common denominator and therefore the lowest overhead; both in terms of performance and in terms of simplicity of debugging.  It's easy enough to write a thing with a .data_received(data) method that calls send() on the appropriate suspended generator.
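
A minimal sketch of such an adapter, assuming the data_received(data) method name used in this thread. GeneratorProtocol and line_parser are made-up names for illustration:

```python
def line_parser(on_line):
    # Protocol parser written in generator style; on_line is called
    # once for every complete "\r\n"-terminated line.
    buf = b""
    while True:
        chunk = yield  # suspended until data_received() sends more bytes
        *lines, buf = (buf + chunk).split(b"\r\n")
        for line in lines:
            on_line(line)

class GeneratorProtocol:
    # Plain event-handler object that feeds a suspended generator.
    def __init__(self, generator):
        self._generator = generator
        next(self._generator)  # advance to the first yield

    def data_received(self, data):
        self._generator.send(data)

received = []
protocol = GeneratorProtocol(line_parser(received.append))
protocol.data_received(b"NICK a\r\nUS")
protocol.data_received(b"ER b\r\n")
print(received)  # [b'NICK a', b'USER b']
```

The I/O layer only ever sees the data_received method; whether a generator sits behind it is a private detail of the protocol.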

> But the common case is that they don't actually need to block because there is still data / space in the buffer.

I don't think that this is necessarily the "common case".  Certainly in bulk-transfer protocols or in any protocol that supports pipelining, you usually fill up the buffer completely on every iteration.

> (You could also have an exception for write() and make that never-blocking, trusting the app not to overfill the buffer; this seems convenient but it worries me a bit.)

That's how Twisted works... sort of.  If you call write(), it always just does its thing.  That said, you can ask to be notified if you've written too much, so that you can slow down.

(Flow-control is sort of a sore spot for the current Twisted API; what we have works, and it satisfies the core requirements, but the shape of the API is definitely not very convenient.  <> outlines the next-generation streaming and flow-control primitives that we are currently working on.  I'm very excited about those but they haven't been battle-tested yet.)

If you're talking about "blocking" in a generator-coroutine style, then well-written code can do

yield write(x)
yield write(y)
yield write(z)

and "lazy" code, that doesn't care about over-filling its buffer, can just do

write(x)
write(y)
yield write(z)

there's no reason that the latter style ought to cause any sort of error.
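
A rough sketch of a write() that never blocks but can still tell the writer to slow down. This is not Twisted's actual implementation; BufferedTransport, bytes_sent, pause_producing and resume_producing are names assumed for illustration, and the high-water mark is an arbitrary number.

```python
class BufferedTransport:
    """Hypothetical transport: write() always accepts data and never blocks."""

    def __init__(self, producer, high_water=64 * 1024):
        self.producer = producer
        self.high_water = high_water
        self.buffer = bytearray()
        self.paused = False

    def write(self, data):
        # Always buffer; over-filling is allowed, it just triggers a notice.
        self.buffer += data
        if not self.paused and len(self.buffer) > self.high_water:
            self.paused = True
            self.producer.pause_producing()

    def bytes_sent(self, count):
        # Called by the event loop after `count` bytes reached the socket.
        del self.buffer[:count]
        if self.paused and not self.buffer:
            self.paused = False
            self.producer.resume_producing()


class RecordingProducer:
    """Stand-in for application code that wants flow-control notifications."""

    def __init__(self):
        self.events = []

    def pause_producing(self):
        self.events.append("pause")

    def resume_producing(self):
        self.events.append("resume")


producer = RecordingProducer()
transport = BufferedTransport(producer, high_water=10)
transport.write(b"x" * 8)   # under the mark: nothing happens
transport.write(b"y" * 8)   # over the mark: producer is asked to pause
transport.write(b"z" * 4)   # a "lazy" writer is still accepted, no error
transport.bytes_sent(20)    # the loop flushed everything: producer resumes
```

A well-behaved writer stops when asked; a lazy one keeps calling write() and only costs memory.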

>> If it needs a name, I suppose I'd call my preferred style "event triggering".
> But how does it work? What would typical user code in this style look like?

It really depends on the layer.  You have to promote what methods get called at each semantic layer; but, at the one that's most interesting for interoperability, the thing that delivers bytes to protocol parsers, it looks something like this:

def data_received(self, data):
    lines = (self.buf + data).split("\r\n")
    for line in lines[:-1]:
        self.line_received(line)
    self.buf = lines[-1]

At a higher level, you might have header_received, http_request_received, etc.
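
For illustration, here is one way the layering could stack up in runnable form: bytes arrive at data_received, complete lines go to line_received, and parsed headers go to header_received. The class names are invented for this sketch, and it uses Python 3 bytes rather than str.

```python
class LineProtocol:
    """Turns arbitrary chunks of bytes into complete CRLF-terminated lines."""

    def __init__(self):
        self.buf = b""

    def data_received(self, data):
        lines = (self.buf + data).split(b"\r\n")
        for line in lines[:-1]:
            self.line_received(line)
        self.buf = lines[-1]   # incomplete tail, kept for the next chunk

    def line_received(self, line):
        raise NotImplementedError


class HeaderProtocol(LineProtocol):
    """One semantic layer up: turns 'Name: value' lines into header events."""

    def __init__(self):
        super().__init__()
        self.headers = []

    def line_received(self, line):
        if b":" in line:
            name, _, value = line.partition(b":")
            self.header_received(name.strip().decode("ascii"),
                                 value.strip().decode("ascii"))

    def header_received(self, name, value):
        self.headers.append((name, value))


proto = HeaderProtocol()
for chunk in (b"Host: exam", b"ple.org\r\nAccept: */*\r\n"):
    proto.data_received(chunk)
```

Note that the chunk boundaries fall in the middle of a line; the parser never has to ask for bytes, it just copes with whatever shows up.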

The thing that calls data_received typically would look like this:

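def handle_read(self):
    try:
        data = self.socket.recv(self.buffer_size)
    except socket.error, se:
        if se.args[0] == EWOULDBLOCK:
            return
        else:
            return main.CONNECTION_LOST
    else:
        try:
            self.protocol.data_received(data)
        except:
            log_the_error()
            self.disconnect()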

although it obviously looks a little different in the case of IOCP.

>> Also, I would like to remind all participants that microthreading, request/response abstraction (i.e. Deferreds, Futures), generator coroutines and a common API for network I/O are all very different tasks and do not need to be accomplished all at once.  If you try to build something that does all of this stuff, you get most of Twisted core plus half of Stackless all at once, which is a bit much for the stdlib to bite off in one chunk.
> Well understood. (And I don't even want to get microthreading into the
> mix, although others may disagree -- I see Christian Tismer has jumped
> in...) But I also think that if we design these things in isolation
> it's likely that we'll find later that the pieces don't fit, and I
> don't want that to happen either. So I think we should consider these
> separate, but loosely coordinated efforts.

Great, glad to hear it.



From greg.ewing at  Sun Oct 14 06:30:04 2012
From: greg.ewing at (Greg Ewing)
Date: Sun, 14 Oct 2012 17:30:04 +1300
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<k4pr17$i94$> <>
	<> <k4q3o6$m9q$>
	<> <>
Message-ID: <>

Oscar Benjamin wrote:
> I think it's reasonable to require that
> e.g. should only accept a str, but standard open()?

Why shouldn't accept a path object?

Especially if we use a protocol such as __strpath__
so that the os module doesn't have to explicitly
know about the Path classes.


From guido at  Sun Oct 14 06:49:07 2012
From: guido at (Guido van Rossum)
Date: Sat, 13 Oct 2012 21:49:07 -0700
Subject: [Python-ideas] re-implementing Twisted for fun and profit
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 13, 2012 at 8:41 PM, Glyph <glyph at> wrote:

> On Oct 13, 2012, at 9:17 AM, Guido van Rossum <guido at> wrote:
> On Fri, Oct 12, 2012 at 9:46 PM, Glyph <glyph at> wrote:
> There has been a lot written on this list about asynchronous,
> microthreaded and event-driven I/O in the last couple of days.  There's too
> much for me to try to respond to all at once, but I would very much like to
> (possibly re-)introduce one very important point into the discussion.
> Would everyone interested in this please please please read <
>> several times?
>  Especially this section: <
>  If it is not clear, please ask questions about it and I will try to needle
> someone qualified into improving the explanation.
> I am well aware of that section. But, like the rest of PEP 3153, it is
> sorely lacking in examples or specifications.
> If that's what the problem is, I will do what I can to get those sections
> fleshed out ASAP.

I'd love that! Laurens seems burned-out from his previous attempts at
authoring that PEP and has not volunteered any examples.

> I am bringing this up because I've seen a significant amount of discussion
> of level-triggering versus edge-triggering.  Once you have properly
> separated out transport logic from application implementation, triggering
> style is an irrelevant, private implementation detail of the networking
> layer.
> This could mean several things: (a) only the networking layer needs to use
> both trigger styles, the rest of your code should always use trigger style
> X (and please let X be edge-triggered :-); (b) only in the networking layer
> is it important to distinguish carefully between the two, in the rest of
> the app you can use whatever you like best.
> Edge triggering and level triggering both have to do with changes in
> boolean state.  Edge triggering is "call me when this bit is changed";
> level triggering is "call me (and keep calling me) when this bit is set".
>  The metaphor extends very well from the electrical-circuit definition, but
> the distinction is not very meaningful to applications who want to
> subscribe to a semantic event and not the state of a bit.

I am well aware of the terms' meanings in electrical circuits. It seems
that, alas, I may have misunderstood how the terms are used in the world of
callbacks. In my naivete, when they were brought up, I thought that
edge-triggered meant "call this callback once, when this specific event
happens" (e.g. a specific async read or write call completing) whereas
level-triggered referred to "call this callback whenever a certain
condition is true" (e.g. a socket is ready for reading or writing).

But from your messages it seems that it's more a technical term for
different ways of dealing with the latter, so that in either case it is
about multiple-call callbacks. If this is indeed the case I profusely
apologize for the confusion I have probably caused. (Hopefully most people
glazed over anyway. :-)

> Applications want to know about particular bits of information, not state
> changes.  Obviously when data is available on the connection, it's the
> bytes that the application is interested in.  When a new connection is
> available to be accept()-ed, the application wants to know that as a
> distinct notification.  There's no way to deliver data or new connected
> sockets to the application as "edge-triggered"; if the bit is still set
> later, then there's more, different data to be delivered, which needs a
> separate notification.  But, even in more obscure cases like "the socket is
> writable", the transport layer needs to disambiguate between "the
> connection has closed unexpectedly" and "you should produce some more data
> for writing now".  (You might want to also know how much buffer space is
> available, although that is pretty fiddly.)

Excuse my ignorance, but are there ioctl() calls to get at this kind of
information, or do you just have to try to call send()/write() and
interpret the error you get back?

> The low-level event loop needs to have both kinds of callbacks, but avoid
> exposing the distinction to application code.  However, this doesn't mean
> all styles need to be implemented.  If Python defines a core event loop
> interface specification, it doesn't have to provide every type of loop.
>  Twisted can continue using its reactors, Tornado can continue using its
> IOLoop, and each can have transforming adapters to work with
> standard-library protocols.

I'm not 100% sure I follow this. I think you are saying that in some
systems the system level (the kernel, say) has an edge-triggered API and in
other systems it is level-triggered? And that it doesn't matter much since
it's easy to turn either into the other?

If I've got that right, do you have a preference for what style the
standard-library interface should use? And why?

> When the "you should read some data" bit is set, an edge-triggered
> transport receives that notification, reads the data, which immediately
> clears that bit, so it responds to the next down->up edge notification in
> the same way.  The level-triggered transport does the same thing: it
> receives the notification that the bit is set, then immediately clears it
> by reading the data; therefore, if it gets another notification that the
> bit is high, that means it's high again, and more data needs to be read.

Makes sense. So they both refer to multi-call callbacks (I don't know what
you call these). And it looks like a common application of either is
buffered streams, and another is incoming connections to listening sockets.
Both seem to belong to the world of transports. Right?
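
To check my understanding, here is a toy model of the two styles. Nothing in it is a real OS API: FakeSocket stands in for a descriptor with a "readable" bit, and both pollers are invented for illustration. The point is only that the application callback sees the same bytes either way.

```python
class FakeSocket:
    """Toy descriptor: 'readable' is the bit the pollers look at."""

    def __init__(self):
        self.pending = []

    def deliver(self, data):
        self.pending.append(data)

    @property
    def readable(self):
        return bool(self.pending)

    def recv(self):
        return self.pending.pop(0)


def level_triggered_poll(sock, on_data):
    """Reports the socket on every poll for as long as the bit is set."""
    if sock.readable:
        on_data(sock.recv())


class EdgeTriggeredPoller:
    """Reports only a low-to-high transition, so the transport must drain."""

    def __init__(self):
        self.was_readable = False

    def poll(self, sock, on_data):
        if sock.readable and not self.was_readable:
            while sock.readable:
                on_data(sock.recv())
        self.was_readable = sock.readable


level_sock, edge_sock = FakeSocket(), FakeSocket()
for s in (level_sock, edge_sock):
    s.deliver(b"a")
    s.deliver(b"b")

level_seen, edge_seen = [], []
for _ in range(3):                      # keeps being told while data remains
    level_triggered_poll(level_sock, level_seen.append)
EdgeTriggeredPoller().poll(edge_sock, edge_seen.append)   # told once
```

If that model is right, the triggering style really is private to the transport.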

> Whether the operating system tells Python "you must call recv() once now"
> or "you must call recv() until I tell you to stop" should not matter to the
> application if the application is just getting passed the results of recv()
> which has already been called.  Since not all I/O libraries actually have a
> recv() to call, you shouldn't have the application have to call it.  This
> is perhaps the central design error of asyncore.
> Is this about buffering? Because I think I understand buffering.  Filling
> up a buffer with data as it comes in (until a certain limit) is a good job
> for level-triggered callbacks. Ditto for draining a buffer.
> In the current Twisted implementation, you just get bytes objects
> delivered; when it was designed, 'str' was really the only game in town.
>  However, I think this still applies because the first thing you're going
> to do when parsing the contents of your buffer is to break it up into
> chunks by using some handy bytes method.
> In modern Python, you *might* want to get a bytearray plus an offset
> delivered instead, because a bytearray can use recv_into, and a bytearray
> might be reusable, and could possibly let you implement some interesting
> zero-copy optimizations.  However, in order to facilitate that, bytearray
> would need to have zero-copy implementations of split() and and
> such.

That sounds like a *very* low-level consideration to me, and as you suggest
unrealistic given the other limitations. I would rather just get bytes
objects and pay for the copying. I know some people care deeply about extra
copies, and in certain systems they are probably right, but I doubt that
those systems would be implemented in Python even if we *did* bend over
backwards to avoid copies. And it really would make the interface much more
painful to use. Possibly there could be a separate harder-to-use
lower-level API that deals in bytearrays for a few connoisseurs, but we
probably shouldn't promote it much, and since it's always possible to add
APIs later, I'd rather avoid defining it for version 1.

> In my opinion, the prerequisite for using anything other than a bytes
> object in practical use would be a very sophisticated lazy-slicing data
> structure, with zero-copy implementations of everything, and a
> copy-on-write version of recv_into so that if the sliced-up version of the
> data structure is shared between loop iterations the copies passed off to
> other event handlers don't get stomped on.  (Although maybe somebody's
> implemented this while I wasn't looking?)
> This kind of pay-only-for-what-you-need buffering is really cool, a lot of
> fun to implement, and it will give you very noticeable performance gains if
> you're trying to write a wire-speed proxy or router with almost no logic in
> it; however, I've never seen it really be worth the trouble in any other
> type of application.  I'd say that if we can all agree on the version that
> delivers bytes, the version that re-uses a fixed-sized bytearray buffer
> could be an optional feature in the 2.0 version of the spec.

Seems we are in perfect agreement (I wrote the above without reading this
far :-).

> The rest of the app can then talk to the buffer and tell it "give me
> between X and Y bytes, possibly blocking if you don't have at least X
> available right now, or "here are N more bytes, please send them out when
> you can". From the app's position these calls *may* block, so they need to
> use whatever mechanism (callbacks, Futures, Deferreds, yield, yield-from)
> to ensure that *if* they block, other tasks can run.
> This is not how the application should talk to the receive buffer.  Bytes
> should not necessarily be directly requested by the application: they
> simply arrive.  If you have to model everything in terms of a
> request-for-bytes/response-to-request idiom, there are several problems:

(Thanks for writing this; this is the kind of insight I am hoping to get
from you and others.)

> 1. You have to heap-allocate an additional thing-to-track-the-request
> object every time you ask for bytes, which adds non-trivial additional
> overhead to the processing of simple streams.  (The C-level event object
> that e.g. IOCP uses to track the request is slightly different, because
> it's a single signaling event and you should only ever have one outstanding
> per connection, so you don't have to make a bunch of them.)
> 2. Multiple listeners might want to "read" from a socket at once; for
> example, if you have a symmetric protocol where the application is
> simultaneously waiting for a response message from its peer and also trying
> to handle new requests of its own.  (This is required in symmetric
> protocols, like websockets and XMPP, and HTTP/2.0 seems to be moving in
> this direction too.)
> 3. Even assuming you deal with part 1 and 2 properly - they are possible
> to work around - error-handling becomes tricky and tedious.  You can't
> effectively determine in your coroutine scheduler which errors are in the
> code that is reading or writing to a given connection (because the error
> may have been triggered by code that was reading or writing to a different
> connection), so sometimes your sockets will just go off into la-la land
> with nothing reading from them or writing to them.  In Twisted, if a
> dataReceived handler causes an error, then we know it's time to shut down
> that connection and close that socket; there's no ambiguity.

I'll have to digest all this, but I'll be sure to think about this
carefully. My kneejerk reactions are that (1) heap allocations are
unavoidable anyway, (2) if there are multiple listeners there should be
some other layer demultiplexing, and (3) nobody gets error handling right
anyway; but I should be very suspicious of kneejerks, even my own.

> Even if you want to write your protocol parsers in a yield-coroutine
> style, I don't think you want the core I/O layer to be written in that
> style; it should be possible to write everything as "raw"
> it's-just-a-method event handlers because that is really the lowest common
> denominator and therefore the lowest overhead; both in terms of performance
> and in terms of simplicity of debugging.  It's easy enough to write a thing
> with a .data_received(data) method that calls send() on the appropriate
> suspended generator.

I agree. In fact, the lowest level in NDB (my own big foray into async,
albeit using App Engine's RPC instead of sockets) is written as an event
loop with no references to generators or Futures -- all it knows about are
RPCs and callback functions. (Given the way the RPC class is defined in App
Engine, calling a designated method on the RPC object is out of the
question, everything is callables plus *args plus **kwds.)

> But the common case is that they don't actually need to block because
> there is still data / space in the buffer.
> I don't think that this is necessarily the "common case".  Certainly in
> bulk-transfer protocols or in any protocol that supports pipelining, you
> usually fill up the buffer completely on every iteration.

Another pragmatic observation that I wouldn't have been able to make on my own.

> (You could also have an exception for write() and make that
> never-blocking, trusting the app not to overfill the buffer; this seems
> convenient but it worries me a bit.)
> That's how Twisted works... sort of.  If you call write(), it always just
> does its thing.  That said, you can ask to be notified if you've written
> too much, so that you can slow down.
> (Flow-control is sort of a sore spot for the current Twisted API; what we
> have works, and it satisfies the core requirements, but the shape of the
> API is definitely not very convenient.  <> outlines the
> next-generation streaming and flow-control primitives that we are currently
> working on.  I'm very excited about those but they haven't been
> battle-tested yet.)
> If you're talking about "blocking" in a generator-coroutine style, then
> well-written code can do
> yield write(x)
> yield write(y)
> yield write(z)
> and "lazy" code, that doesn't care about over-filling its buffer, can just
> do
> write(x)
> write(y)
> yield write(z)
> there's no reason that the latter style ought to cause any sort of error.

Good to know.

> If it needs a name, I suppose I'd call my preferred style "event
> triggering".
> But how does it work? What would typical user code in this style look like?
> It really depends on the layer.  You have to promote what methods get
> called at each semantic layer; but, at the one that's most interesting for
> interoperability, the thing that delivers bytes to protocol parsers, it
> looks something like this:
> def data_received(self, data):
>     lines = (self.buf + data).split("\r\n")
>     for line in lines[:-1]:
>         self.line_received(line)
>     self.buf = lines[-1]
I see, I've written code like this many times, with many variations.

> At a higher level, you might have header_received, http_request_received,
> etc.
> The thing that calls data_received typically would look like this:
> def handle_read(self):
>     try:
>         data = self.socket.recv(self.buffer_size)
>     except socket.error, se:
>         if se.args[0] == EWOULDBLOCK:
>             return
>         else:
>             return main.CONNECTION_LOST
>     else:
>         try:
>             self.protocol.data_received(data)
>         except:
>             log_the_error()
>             self.disconnect()
> although it obviously looks a little different in the case of IOCP.

It seems that perhaps the 'data_received' interface is the most important
one to standardize (for the event loop); I can imagine many variations on
the handle_read() implementation, and there would be different ones for
IOCP, SSL, and probably others. The stdlib should have good ones for the
common platforms but it should be designed to allow people who know better
to hook up their own implementation.

> Also, I would like to remind all participants that microthreading,
> request/response abstraction (i.e. Deferreds, Futures), generator
> coroutines and a common API for network I/O are all very different tasks
> and do not need to be accomplished all at once.  If you try to build
> something that does all of this stuff, you get most of Twisted core plus
> half of Stackless all at once, which is a bit much for the stdlib to bite
> off in one chunk.
> Well understood. (And I don't even want to get microthreading into the
> mix, although others may disagree -- I see Christian Tismer has jumped
> in...) But I also think that if we design these things in isolation
> it's likely that we'll find later that the pieces don't fit, and I
> don't want that to happen either. So I think we should consider these
> separate, but loosely coordinated efforts.
> Great, glad to hear it.

Thanks for taking the time to respond!

--Guido van Rossum (

From guido at  Sun Oct 14 07:03:17 2012
From: guido at (Guido van Rossum)
Date: Sat, 13 Oct 2012 22:03:17 -0700
Subject: [Python-ideas] The async API of the future: Twisted and
In-Reply-To: <>
References: <>
Message-ID: <>

[Quick, I know I'm way behind, especially on this thread; more tomorrow.]

On Fri, Oct 12, 2012 at 11:14 PM, Antoine Pitrou <solipsis at> wrote:
> On Fri, 12 Oct 2012 15:11:54 -0700
> Guido van Rossum <guido at> wrote:
> >
> > > 2. Method dispatch callbacks:
> > >
> > >     Similar to the above, the reactor or somebody has a handle on your
> > > object, and calls methods that you've defined when events happen
> > >     e.g. IProtocol's dataReceived method
> >
> > While I'm sure it's expedient and captures certain common patterns
> > well, I like this the least of all -- calling fixed methods on an
> > object sounds like a step back; it smells of the old Java way (before
> > it had some equivalent of anonymous functions), and of asyncore, which
> > (nearly) everybody agrees is kind of bad due to its insistence that
> > you subclass its classes. (Notice how subclassing as the prevalent
> > approach to structuring your code has gotten into a lot of discredit
> > since 1996.)
> But how would you write a dataReceived equivalent then? Would you have
> a "task" looping on a read() call, e.g.
> @task
> def my_protocol_main_loop(conn):
>     while <some_condition>:
>         try:
>             data = yield
>         except ConnectionError:
>             conn.close()
>             break

No, I would use plain callbacks. There would be some kind of IOObject
class defined by the stdlib that wraps a socket (it would make it
non-blocking, and possibly do other things), and the user would make a
registration call to the event loop giving it the IOObject and the
user's callback function plus *args and **kwds; the event loop would
call callback(*args, **kwds) each time the IOObject became readable.
(Oh, and there would be separate registration (and unregistration)
functions for reading and writing.)
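
A very rough sketch of what I have in mind, using a plain socket where the IOObject would go. EventLoop, add_reader, remove_reader and run_once are placeholder names, not a proposal for the final spelling, and select() is only used because it is the simplest thing to show.

```python
import select
import socket


class EventLoop:
    """Hypothetical loop: callbacks plus *args and **kwds, no subclassing."""

    def __init__(self):
        self.readers = {}   # socket -> (callback, args, kwds)

    def add_reader(self, sock, callback, *args, **kwds):
        sock.setblocking(False)
        self.readers[sock] = (callback, args, kwds)

    def remove_reader(self, sock):
        self.readers.pop(sock, None)

    def run_once(self, timeout=1.0):
        if not self.readers:
            return
        ready, _, _ = select.select(list(self.readers), [], [], timeout)
        for sock in ready:
            entry = self.readers.get(sock)
            if entry is not None:
                callback, args, kwds = entry
                callback(*args, **kwds)


loop = EventLoop()
seen = []
a, b = socket.socketpair()


def on_header(sock):
    seen.append(("header", sock.recv(1024)))
    # Next protocol phase: re-register the same socket with another callback.
    loop.add_reader(sock, on_body, sock, label="body")


def on_body(sock, label):
    seen.append((label, sock.recv(1024)))
    loop.remove_reader(sock)


loop.add_reader(b, on_header, b)
a.sendall(b"hello")
loop.run_once()
a.sendall(b"world")
loop.run_once()
a.close()
b.close()
```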

Apparently my rants about callbacks have made people assume that I
don't want to see them anywhere. In fact I am comfortable with
callbacks for a number of situations -- I just think we have several
other tools in our toolbox that are way underused, whereas callbacks
are way overused, in part because the alternative tools are relatively new.

This way the user could switch to a different callback when a
different phase of the protocol is reached. I realize there are other
shapes this API could take. But I really don't want the user to have
to subclass IOObject.

> I'm not sure I understand the problem with subclassing. It works fine
> in Twisted. Even in Python 3 we don't shy away from subclassing, for
> example the IO stack is based on subclassing RawIOBase, BufferedIOBase,
> etc.

I'm fine with using subclassing for the internal structure of a
library. (The IOObject I am postulating would almost certainly have a
bunch of subclasses used for different types of sockets, IOCP, SSL,
etc.) The thing that I've soured upon (and many others too) is to tell
users "and to use this fine feature, just subclass this handy base
class and override or extend the following three methods". Because in
practice (certainly in Python, where the compiler doesn't enforce
privacy) users always start overriding other methods, or using
internal state, or add state that clashes with the base class's state,
or forget to call mandatory super calls, or make incorrect assumptions
about thread-safety, or whatever else they can do to screw things up.
And duck typing isn't ideal either for this situation.

--Guido van Rossum (

From greg.ewing at  Sun Oct 14 07:16:10 2012
From: greg.ewing at (Greg Ewing)
Date: Sun, 14 Oct 2012 18:16:10 +1300
Subject: [Python-ideas] The async API of the future: Twisted
	and	Deferreds
In-Reply-To: <>
References: <>
Message-ID: <>

Devin Jeanpierre wrote:
> Presumably
> generator coroutines work by yielding deferreds and being called back
> when the future resolves (deferred fires).

That's one way to go about it, but it's not the only way.
See here for my take on how it might work:


From greg.ewing at  Sun Oct 14 07:29:05 2012
From: greg.ewing at (Greg Ewing)
Date: Sun, 14 Oct 2012 18:29:05 +1300
Subject: [Python-ideas] re-implementing Twisted for fun and profit
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:
> I thought that 
> edge-triggered meant "call this callback once, when this specific event 
> happens" (e.g. a specific async read or write call completing) whereas 
> level-triggered referred to "call this callback whenever a certain 
> condition is true" (e.g. a socket is ready for reading or writing).

Not sure if this is relevant, but I'd just like to point out
that the behaviour of select() in this respect is actually
*edge triggered* by this definition. Once it has reported that
a given file descriptor is ready, it *won't* report that file
descriptor again until you do something with it. This can be
a subtle source of bugs in select-based code if you're not
aware of it.


From ubershmekel at  Sun Oct 14 07:40:57 2012
From: ubershmekel at (Yuval Greenfield)
Date: Sun, 14 Oct 2012 07:40:57 +0200
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 14, 2012 at 2:04 AM, MRAB <python at> wrote:

> If it's more than one codepoint, we could prefix with the length of the
> codepoint's name:
> def __12CIRCLED_PLUS__(x, y):
>     ...
That's a bit impractical, and why reinvent the wheel? I'd much rather:

def \u2295(x, y):

So readable I want to read it twice. And that's not legal python today so
we don't break backwards compatibility!


From greg.ewing at  Sun Oct 14 08:24:55 2012
From: greg.ewing at (Greg Ewing)
Date: Sun, 14 Oct 2012 19:24:55 +1300
Subject: [Python-ideas] The async API of the future: Twisted
	and	Deferreds
In-Reply-To: <>
References: <>
Message-ID: <>

Devin Jeanpierre wrote (concerning callbacks):
> If we look at this, we're expecting to deal with a set of functions
> that manage shared data. The abstraction for this is usually an
> object, and we'd really probably write the callbacks in a class unless
> we were being contrarian. And it's not too crazy for the dispatcher to
> know this and expect you to write it as a class that supports a
> certain interface (certain methods correspond to certain events).

IIUC, what Guido objects to is callbacks that are methods *of the
I/O object*, so that you have to subclass the library-supplied
object and override them.

You seem to be talking about something slightly different -- an
object that's entirely supplied by the user, and simply bundles
a set of callbacks together. That doesn't seem so bad.


From greg.ewing at  Sun Oct 14 09:12:04 2012
From: greg.ewing at (Greg Ewing)
Date: Sun, 14 Oct 2012 20:12:04 +1300
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

I've had some thoughts on why I'm uncomfortable
about this kind of pattern:

    data = yield sock.async_read(1024)

The idea here is that sock.async_read() returns a
Future or similar object that performs the I/O and
waits for the result.

However, reading the data isn't necessarily the point
at which the suspension actually occurs. If you're
using a select-style event loop, the async read
operation breaks down into

    1. Wait for data to arrive on the socket
    2. Read the data

So the implementation of sock.async_read() is going
to have to create another Future to handle waiting
for the socket to become ready. But then the outer
Future is an unnecessary complication, because you
could get the same effect by defining

    def async_read(self, length):
       yield future_to_wait_for_fd(self.fd)
       return, length)

and calling it using

    data = yield from sock.async_read(1024)

If Futures are to appear anywhere, they should only
be at the very bottom layer, at the transition
between generator and non-generator code. And the
place where that transition occurs depends on how
the lower levels are implemented. If you're using
IOCP instead of select, for example, you need to
do things the other way around:

    1. Start the read operation
    2. Wait for it to complete

So I feel that all public APIs should be functions
called using yield-from, leaving it up to the
implementation to decide if and where Futures
become involved.
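
As a small illustration of the select-style case, here is a sketch in which no Future appears at all. The scheduler protocol is an assumption made for this example: the innermost generator yields the socket it wants to wait on, and a toy run() function resumes the task when select() reports that socket readable. The names wait_readable, async_read and run are hypothetical.

```python
import select
import socket


def wait_readable(sock):
    # Bottom layer: the only bare yield; hands the socket to the scheduler.
    yield sock


def async_read(sock, length):
    yield from wait_readable(sock)
    return sock.recv(length)


def shout(sock, results):
    # Application code: every public call is made with yield-from.
    data = yield from async_read(sock, 1024)
    results.append(data.upper())


def run(tasks):
    """Toy scheduler: maps each waited-on socket to its suspended task."""
    waiting = {}
    for task in tasks:
        try:
            waiting[next(task)] = task
        except StopIteration:
            pass
    while waiting:
        ready, _, _ = select.select(list(waiting), [], [])
        for sock in ready:
            task = waiting.pop(sock)
            try:
                waiting[task.send(None)] = task
            except StopIteration:
                pass   # task finished


a, b = socket.socketpair()
a.sendall(b"ping")
results = []
run([shout(b, results)])
a.close()
b.close()
```

An IOCP-based implementation could keep the same async_read signature and simply start the read before waiting, which is the point of hiding the decision behind yield-from.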


From _ at  Sun Oct 14 11:32:01 2012
From: _ at (Laurens Van Houtven)
Date: Sun, 14 Oct 2012 11:32:01 +0200
Subject: [Python-ideas] The async API of the future: Some thoughts from
 an ignorant Tornado user
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 14, 2012 at 12:27 AM, Daniel McDougall <
daniel.mcdougall at> wrote:

> (This is a response to GVR's Google+ post asking for ideas; I
> apologize in advance if I come off as an ignorant programming newbie)

-- snip snip snip --

> import async # The API of the future ;)
> async.async_call(retrieve_log_playback, settings, tws,
> mechanism=multiprocessing)
> # tws == instance of tornado.web.WebSocketHandler that holds the open
> connection

Is this a CPU-bound problem?

> My opinion is that the goal of any async module that winds up in
> Python's standard library should be simplicity and portability.  In
> terms of features, here's my 'async wishlist':
> * I should not have to worry about what is and isn't pickleable when I
> decide that a task should be performed asynchronously.

Certainly. My above question is important, because this should only matter
for IPC.

> * I should be able to choose the type of event loop/async mechanism
> that is appropriate for the task:  For CPU-bound tasks I'll probably
> want to use multiprocessing.  For IO-bound tasks I might want to use
> threading.  For a multitude of tasks that "just need to be async" (by
> nature) I'll want to use an event loop.

Ehhh, maybe. This sounds like it confounds the tools for different use
cases. You can quite easily have threads and processes on top of an event
loop; that works out particularly nicely for processes because you still
have to talk to your processes.


twisted.internet.reactor.spawnProcess (local processes)
twisted.internet.threads.deferToThread (local threads)
ampoule (remote processes)

It's quite easy to do blocking IO in a thread with deferToThread; in fact,
that's how twisted's adbapi, an async wrapper to dbapi, works.

> * Any async module should support 'basics' like calling functions at
> an interval and calling functions after a timeout occurs (with the
> ability to cancel).
> * Asynchronous tasks should be able to access the same namespace as
> everything else.  Maybe wishful thinking.

With twisted, this is already the case; general caveats for shared mutable
state across threads of course still apply. Fortunately in most Twisted
apps, that's a tiny fraction of the total code, and they tend to be
fractions that are well-isolated or at least easily isolatable.

> * It should support publish/subscribe-style events (i.e. an event
> dispatcher).  For example, the ability to watch a file descriptor or
> socket for changes in state and call a function when that happens.
> Preferably with the flexibility to define custom events (i.e don't
> have it tied to kqueue/epoll-specific events).

Like connectionMade, connectionLost, dataReceived etc?

> Thanks for your consideration; and thanks for the awesome language.
> --
> Dan McDougall - Chief Executive Officer and Developer
> Liftoff Software - Your flight to the cloud is now boarding.
> 904-446-8323

From solipsis at  Sun Oct 14 12:40:48 2012
From: solipsis at (Antoine Pitrou)
Date: Sun, 14 Oct 2012 12:40:48 +0200
Subject: [Python-ideas] The async API of the future: yield-from
References: <>
Message-ID: <>

On Sun, 14 Oct 2012 20:12:04 +1300
Greg Ewing <greg.ewing at> wrote:
> So the implementation of sock.async_read() is going
> to have to create another Future to handle waiting
> for the socket to become ready. But then the outer
> Future is an unnecessary complication, because you
> could get the same effect by defining
>     def async_read(self, length):
>        yield future_to_wait_for_fd(self.fd)
>        return os.read(self.fd, length)

read() may fail even if select() returned successfully.

What this means is that your select-style event loop should probably
also handle actually reading the data. Besides, this will make its API
more easily ported to something like IOCP.



Software development and contracting:

From steve at  Sun Oct 14 12:48:59 2012
From: steve at (Steven D'Aprano)
Date: Sun, 14 Oct 2012 21:48:59 +1100
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<k4pr17$i94$> <>
	<> <k4q3o6$m9q$>
Message-ID: <>

On 13/10/12 18:41, Nick Coghlan wrote:

> str has a *big* API, and much of it doesn't make any sense in the
> particular case of path objects. In particular, path objects shouldn't
> be iterable, because it isn't clear what iteration should mean: it
> could be path segments, it could be parent paths, or it could be
> directory contents. It definitely *shouldn't* be individual
> characters, but that's what we would get if it inherited from strings.

Ah, I wondered if anyone else had picked up on that. When I read the PEP,
I was concerned about the mental conflict between iteration and indexing
of Path objects: given a Path p the sequence p[0] p[1] p[2] ... does
something completely different from iterating over p directly.

Indexing gives path components; iteration gives children of the path
(like os.walk).

-1 on iteration over the children. Instead, use:

for child in p.walk():
    ...

which has the huge benefit that the walk method can take arguments as
needed, such as the args os.walk takes:

topdown=True, onerror=None, followlinks=False

plus I'd like to see a "filter" argument to filter which children
are (or aren't) seen.

+1 on indexing giving path components, although the side effect of
this is that you automatically get iteration via the sequence protocol.
So be it -- I don't think we should be scared to *choose* an iteration
model, just because there are other potential models. Using indexing
to get path components is useful, slicing gives you sub paths for free,
and if the cost of that is that you can iterate over the path, well,
I'm okay with that:

p = Path('/usr/local/lib/python3.3/')
list(p)
=> ['/', 'usr', 'local', 'lib', 'python3.3', '']

Works for me.


From solipsis at  Sun Oct 14 12:43:27 2012
From: solipsis at (Antoine Pitrou)
Date: Sun, 14 Oct 2012 12:43:27 +0200
Subject: [Python-ideas] The async API of the future: Twisted and
References: <>
Message-ID: <>

On Sat, 13 Oct 2012 22:03:17 -0700
Guido van Rossum <guido at> wrote:
> >
> > But how would you write a dataReceived equivalent then? Would you have
> > a "task" looping on a read() call, e.g.
> >
> > @task
> > def my_protocol_main_loop(conn):
> >     while <some_condition>:
> >         try:
> >             data = yield
> >         except ConnectionError:
> >             conn.close()
> >             break
> No, I would use plain callbacks. There would be some kind of IOObject
> class defined by the stdlib that wraps a socket (it would make it
> non-blocking, and possibly do other things), and the user would make a
> registration call to the event loop giving it the IOObject and the
> user's callback function plus *args and **kwds; the event loop would
> call callback(*args, **kwds) each time the IOObject became readable.
> (Oh, and there would be separate registration (and unregistration)
> functions for reading and writing.)
> Apparently my rants about callbacks have made people assume that I
> don't want to see them anywhere. In fact I am comfortable with
> callbacks for a number of situations -- I just think we have several
> other tools in our toolbox that are way underused, whereas callbacks
> are way overused, in part because the alternative tools are relatively
> new.
> This way the user could switch to a different callback when a
> different phase of the protocol is reached. I realize there are other
> shapes this API could take. But I really don't want the user to have
> to subclass IOObject.

Subclassing IOObject would be wrong, since the user isn't writing an IO
object in the first place. But subclassing a separate class, like
Twisted's Protocol (which is mostly an empty shell, really), would sound
reasonable to me.




From steve at  Sun Oct 14 13:02:19 2012
From: steve at (Steven D'Aprano)
Date: Sun, 14 Oct 2012 22:02:19 +1100
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On 14/10/12 02:22, Mike Meyer wrote:
> On Sat, 13 Oct 2012 19:18:12 +1100
> Steven D'Aprano<steve at>  wrote:
>> On 13/10/12 19:05, Yuval Greenfield wrote:
>> I believe that Haskell treats operators as if they were function objects,
>> so you could do something like:
> For the record, Haskell allows operators to be used as functions by
> quoting them in ()'s (to provide the functionality of operator) and to
> turn functions into operators by quoting them in ``'s.
>> negative_values = map(-, values)
>> but I think that puts the emphasis on the wrong thing. If (and that's a big
>> if) we did something like this, it should be a pair of methods __op__ and
>> the right-hand version __rop__ which get called on the *operands*, not the
>> operator/function object:
>> def __op__(self, other, symbol)
> Yeah, but then your function has to dispatch for *all*
> operators. Depending on how we handle backwards compatibility with
> __add__ et. al.

It looks like I didn't make myself clear. I didn't think it was necessary to
go into too much detail for an off-the-cuff comment about an idea that can't
go anywhere for at least another five years. I should have known better :)

What I meant was that standard Python operators like +, -, &, etc. would
continue to dispatch at the compiler level to dunder methods __add__, __sub__,
__and__ etc. But there could be a way to add new operators, in which case
Python could call a dedicated dunder method __op__ with two arguments, the
"other" operand and the operator itself. Your class needs to define the
__op__ method, but it only needs to dispatch on operators it cares about.
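
To make the dispatch idea above concrete, here is a rough sketch. Everything in it is hypothetical: the "<+>" operator, the Money class, and the apply_op() helper that stands in for whatever the interpreter would do on seeing `left <+> right`. It only illustrates that a class's __op__ needs to handle the symbols it cares about and can decline the rest.

```python
class Money:
    """Toy operand that understands one made-up operator, "<+>"."""

    def __init__(self, cents):
        self.cents = cents

    def __op__(self, other, symbol):
        # Dispatch only on the operators this class cares about.
        if symbol == "<+>" and isinstance(other, Money):
            return Money(self.cents + other.cents)
        return NotImplemented

    def __rop__(self, other, symbol):
        # Right-hand version, tried when the left operand declines.
        if symbol == "<+>" and isinstance(other, int):
            return Money(other + self.cents)
        return NotImplemented


def apply_op(left, symbol, right):
    """Hypothetical stand-in for the interpreter evaluating `left <symbol> right`."""
    forward = getattr(type(left), "__op__", None)
    if forward is not None:
        result = forward(left, right, symbol)
        if result is not NotImplemented:
            return result
    reflected = getattr(type(right), "__rop__", None)
    if reflected is not None:
        result = reflected(right, left, symbol)
        if result is not NotImplemented:
            return result
    raise TypeError("unsupported operand type(s) for %s" % symbol)


total = apply_op(Money(150), "<+>", Money(250))   # handled by __op__
mixed = apply_op(100, "<+>", Money(5))            # int has no __op__, so __rop__ runs
```

An operator that neither operand recognises falls through to TypeError at run time, which mirrors how the existing binary operators behave; the compile-time SyntaxError question is left untouched by this sketch.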

I have no idea how this would work out in practice, given that presumably
Python would still want to raise SyntaxError on illegal/unknown operators
at compile time.

As I said, this is Python 4 territory. Let's sleep on it for four or six
years, hey? :)


From solipsis at  Sun Oct 14 13:03:18 2012
From: solipsis at (Antoine Pitrou)
Date: Sun, 14 Oct 2012 13:03:18 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
References: <>
	<k4pr17$i94$> <>
	<> <k4q3o6$m9q$>
Message-ID: <>

On Sun, 14 Oct 2012 21:48:59 +1100
Steven D'Aprano <steve at> wrote:
> Ah, I wondered if anyone else had picked up on that. When I read the PEP,
> I was concerned about the mental conflict between iteration and indexing
> of Path objects: given a Path p the sequence p[0] p[1] p[2] ... does
> something completely different from iterating over p directly.

p[0] p[1] etc. are just TypeErrors:

>>> p = Path('.')
>>> p[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "", line 951, in __getitem__
    return self._make_child((key,))
  File "", line 1090, in _make_child
    return self._from_parts(parts)
  File "", line 719, in _from_parts
    drv, root, parts = self._parse_args(args)
  File "", line 711, in _parse_args
    % type(a))
TypeError: argument should be a path or str object, not <class 'int'>

So, yes, it's doing "something different", but there is little chance
of silent bugs :-)
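
For comparison, a small sketch against the pathlib module as it later shipped in the standard library. That is an assumption relative to the prototype in the traceback above, which may differ in details (for instance in which TypeError message you get). It shows the components living on .parts while indexing the path object itself fails loudly.

```python
from pathlib import PurePosixPath

p = PurePosixPath("/usr/local/lib/python3.3/")

# The components are reached through the .parts property...
components = p.parts
first = p.parts[0]

# ...while indexing the path object itself raises TypeError.
try:
    p[0]
except TypeError:
    indexable = False
else:
    indexable = True
```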

> -1 on iteration over the children. Instead, use:
> for child in p.walk():
>      ...
> which has the huge benefit that the walk method can take arguments as
> needed, such as the args os.walk takes:
> topdown=True, onerror=None, followlinks=False

Judging by its name and signature, walk() would be a recursive
operation, while iterating on a path isn't (it only gets you the
children).

> +1 on indexing giving path components, although the side effect of
> this is that you automatically get iteration via the sequence protocol.
> So be it -- I don't think we should be scared to *choose* an iteration
> model, just because there are other potential models.

There is already a .parts property which does exactly that:

The problem with enabling sequence access *on the path object* is that
you get confusion with str's own sequencing behaviour, if you happen to
pass a str instead of a Path, or the reverse. Which is explained
briefly here:




From steve at  Sun Oct 14 13:21:58 2012
From: steve at (Steven D'Aprano)
Date: Sun, 14 Oct 2012 22:21:58 +1100
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<k4pr17$i94$> <>
	<> <k4q3o6$m9q$>
Message-ID: <>

On 14/10/12 22:03, Antoine Pitrou wrote:
> On Sun, 14 Oct 2012 21:48:59 +1100
> Steven D'Aprano<steve at>  wrote:
>> Ah, I wondered if anyone else had picked up on that. When I read the PEP,
>> I was concerned about the mental conflict between iteration and indexing
>> of Path objects: given a Path p the sequence p[0] p[1] p[2] ... does
>> something completely different from iterating over p directly.
> p[0] p[1] etc. are just TypeErrors:

Ah, my mistake... I didn't register that you sequenced over the parts
attribute, not the path itself. Sorry for the noise.


From ubershmekel at  Sun Oct 14 14:04:52 2012
From: ubershmekel at (Yuval Greenfield)
Date: Sun, 14 Oct 2012 14:04:52 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<k4pr17$i94$> <>
	<> <k4q3o6$m9q$>
	<> <>
	<> <>
	<> <>
Message-ID: <>

On Sun, Oct 14, 2012 at 1:03 PM, Antoine Pitrou <solipsis at> wrote:

> On Sun, 14 Oct 2012 21:48:59 +1100
> Steven D'Aprano <steve at> wrote:
> > -1 on iteration over the children. Instead, use:
> >
> > for child in p.walk():
> >      ...
> >
> > which has the huge benefit that the walk method can take arguments as
> > needed, such as the args os.walk takes:
> >
> > topdown=True, onerror=None, followlinks=False
> Judging by its name and signature, walk() would be a recursive
> operation, while iterating on a path isn't (it only gets you the
> children).
Steven realized what currently happens and was suggesting doing it
differently.

Personally I really dislike the idea that

    [i for i in p][0] != p[0]

It makes no sense to have this huge surprise.

From solipsis at  Sun Oct 14 14:13:21 2012
From: solipsis at (Antoine Pitrou)
Date: Sun, 14 Oct 2012 14:13:21 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<k4pr17$i94$> <>
	<> <k4q3o6$m9q$>
	<> <>
	<> <>
	<> <>
Message-ID: <1350216801.3484.0.camel@localhost.localdomain>

On Sunday 14 October 2012 at 14:04 +0200, Yuval Greenfield wrote:
> Steven realized what currently happens and was suggesting doing it
> differently.
> Personally I really dislike the idea that
>     [i for i in p][0] != p[0]
> It makes no sense to have this huge surprise.

Again, p[0] just raises TypeError.



From steve at  Sun Oct 14 14:45:42 2012
From: steve at (Steven D'Aprano)
Date: Sun, 14 Oct 2012 23:45:42 +1100
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <1350216801.3484.0.camel@localhost.localdomain>
References: <>
	<k4pr17$i94$> <>
	<> <k4q3o6$m9q$>
Message-ID: <>

On 14/10/12 23:13, Antoine Pitrou wrote:
> On Sunday 14 October 2012 at 14:04 +0200, Yuval Greenfield wrote:
>> Steven realized what currently happens and was suggesting doing it
>> differently.
>> Personally I really dislike the idea that
>>      [i for i in p][0] != p[0]
>> It makes no sense to have this huge surprise.
> Again, p[0] just raises TypeError.

Well, that's two people so far who have conflated "p.parts" as just p.
Perhaps that's because "parts" is so similar to "path".

Since we already refer to the bits of a path as "path components",
perhaps this bike shed ought to be spelled "p.components". It's longer,
but I bet nobody will miss it.


From shibturn at  Sun Oct 14 14:48:30 2012
From: shibturn at (Richard Oudkerk)
Date: Sun, 14 Oct 2012 13:48:30 +0100
Subject: [Python-ideas] re-implementing Twisted for fun and profit
In-Reply-To: <>
References: <>
Message-ID: <k5ecb2$o9a$>

On 14/10/2012 6:29am, Greg Ewing wrote:
> Not sure if this is relevant, but I'd just like to point out
> that the behaviour of select() in this respect is actually
> *edge triggered* by this definition. Once it has reported that
> a given file descriptor is ready, it *won't* report that file
> descriptor again until you do something with it. This can be
> a subtle source of bugs in select-based code if you're not
> aware of it.

Unless I have misunderstood you, the following example contradicts that:

 >>> import os, select
 >>> r, w = os.pipe()
 >>> os.write(w, b"hello")
5
 >>> select.select([r], [], [])
([3], [], [])
 >>> select.select([r], [], [])
([3], [], [])
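
The same check as a self-contained script, for anyone who wants to run it outside the interactive prompt. This is only a sketch: the readable() helper is made up for the example, it polls with a zero timeout, and it assumes a Unix-like platform where select() accepts pipe descriptors.

```python
import os
import select


def readable(fd):
    """Poll once (zero timeout) and report whether fd is ready for reading."""
    ready, _, _ = select.select([fd], [], [], 0)
    return fd in ready


r, w = os.pipe()
try:
    os.write(w, b"hello")
    first = readable(r)    # data is pending
    second = readable(r)   # still pending: select() reports it again
    os.read(r, 5)          # drain the pipe
    third = readable(r)    # nothing left to read
finally:
    os.close(r)
    os.close(w)
```

The descriptor keeps being reported for as long as unread data remains and stops once the pipe has been drained, which is level-triggered behaviour.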


From ironfroggy at  Sun Oct 14 15:09:10 2012
From: ironfroggy at (Calvin Spealman)
Date: Sun, 14 Oct 2012 09:09:10 -0400
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
	<k4pr17$i94$> <>
	<> <k4q3o6$m9q$>
	<> <>
	<> <>
	<> <>
Message-ID: <>

On Sun, Oct 14, 2012 at 8:45 AM, Steven D'Aprano <steve at> wrote:
> On 14/10/12 23:13, Antoine Pitrou wrote:
>> On Sunday 14 October 2012 at 14:04 +0200, Yuval Greenfield wrote:
>>> Steven realized what currently happens and was suggesting doing it
>>> differently.
>>> Personally I really dislike the idea that
>>>      [i for i in p][0] != p[0]
>>> It makes no sense to have this huge surprise.
>> Again, p[0] just raises TypeError.
> Well, that's two people so far who have conflated "p.parts" as just p.
> Perhaps that's because "parts" is so similar to "path".
> Since we already refer to the bits of a path as "path components",
> perhaps this bike shed ought to be spelled "p.components". It's longer,
> but I bet nobody will miss it.

I would prefer to see p.split()

It matches the existing os.path.split() better and I like the idea of
a new library matching the old, to be an easier transition for brains.

That said, it also looks too much like str.split()

> --
> Steven

Read my blog! I depend on your acceptance of my opinion! I am interesting!
Follow me if you're into that sort of thing:

From shane at  Sun Oct 14 15:18:29 2012
From: shane at (Shane Green)
Date: Sun, 14 Oct 2012 06:18:29 -0700
Subject: [Python-ideas] re-implementing Twisted for fun and profit
In-Reply-To: <k5ecb2$o9a$>
References: <>
	<> <k5ecb2$o9a$>
Message-ID: <>

Not sure I follow, but yeah: select reports the state of the file-descriptor.  While the descriptor is readable, every call to select will indicate that it's readable, etc. 

Shane Green
805-452-9666 | shane at

On Oct 14, 2012, at 5:48 AM, Richard Oudkerk <shibturn at> wrote:

> On 14/10/2012 6:29am, Greg Ewing wrote:
>> Not sure if this is relevant, but I'd just like to point out
>> that the behaviour of select() in this respect is actually
>> *edge triggered* by this definition. Once it has reported that
>> a given file descriptor is ready, it *won't* report that file
>> descriptor again until you do something with it. This can be
>> a subtle source of bugs in select-based code if you're not
>> aware of it.
> Unless I have misunderstood you, the following example contradicts that:
> >>> import os, select
> >>> r, w = os.pipe()
> >>> os.write(w, b"hello")
> 5
> >>> select.select([r], [], [])
> ([3], [], [])
> >>> select.select([r], [], [])
> ([3], [], [])
> --
> Richard

From guido at  Sun Oct 14 16:36:38 2012
From: guido at (Guido van Rossum)
Date: Sun, 14 Oct 2012 07:36:38 -0700
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 12, 2012 at 10:05 PM, Greg Ewing
<greg.ewing at> wrote:
[Long sections snipped, all very clear]
> Guido van Rossum wrote:

>> (6) Spawning off multiple async subtasks
>> Futures:
>>   f1 = subtask1(args1)  # Note: no yield!!!
>>   f2 = subtask2(args2)
>>   res1, res2 = yield f1, f2
>> Yield-from:
>>   ??????????
>> *** Greg, can you come up with a good idiom to spell concurrency at
>> this level? Your example only has concurrency in the philosophers
>> example, but it appears to interact directly with the scheduler, and
>> the philosophers don't return values. ***
> I don't regard the need to interact directly with the scheduler
> as a problem. That's because in the world I envisage, there would
> only be *one* scheduler, for much the same reason that there can
> really only be one async event handling loop in any given program.
> It would be part of the standard library and have a well-known
> API that everyone uses.
> If you don't want things to be that way, then maybe this is a
> good use for yielding things to the scheduler. Yielding a generator
> could mean "spawn this as a concurrent task".
> You could go further and say that yielding a tuple of generators
> means to spawn them all concurrently, wait for them all to
> complete and send back a tuple of the results. The yield-from
> code would then look pretty much the same as the futures code.

Sadly it looks like

  r = yield from (f1(), f2())

ends up interpreting the tuple as the iterator, and you end up with

  r = (f1(), f2())

(i.e., a tuple of generators) rather than the desired

 r = ((yield from f1()), (yield from f2()))

> However, I'm inclined to think that this is too much functionality
> to build directly into the scheduler, and that it would be better
> provided by a class or function that builds on more primitive
> facilities.

Possibly. In NDB it is actually a very common operation which looks
quite elegant. But your solution below is fine (and helps by giving
people a specific entry in the documentation they can look up!)

> So it would look something like
> Yield-from:
>    task1 = subtask1(args1)
>    task2 = subtask2(args2)
>    res1, res2 = yield from par(task1, task2)
> where the implementation of par() is left as an exercise for
> the reader.

So, can par() be as simple as

def par(*args):
  results = []
  for task in args:
    result = yield from task
    results.append(result)
  return results


Or does it need to interact with the scheduler to ensure fairness?
(Not having built one of these, my intuition for how the primitives
fit together is still lacking, so excuse me for asking naive
questions.)
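
One way to see what the simple version does is to drive it by hand. The sketch below is not a scheduler: run() is a made-up helper that just resumes the generator until it finishes, and subtask() is a stand-in task that logs each step and yields where it would wait for I/O.

```python
def par(*tasks):
    """Sequential par(): delegate to each task in turn and collect the results."""
    results = []
    for task in tasks:
        result = yield from task
        results.append(result)
    return results


def subtask(name, steps, log):
    """Stand-in task: logs each step and yields where it would wait for I/O."""
    for i in range(steps):
        log.append((name, i))
        yield
    return name.upper()


def run(gen):
    """Made-up driver: resume the generator until it returns, then hand back its value."""
    try:
        while True:
            next(gen)
    except StopIteration as stop:
        return stop.value


log = []
results = run(par(subtask("a", 2, log), subtask("b", 2, log)))
```

The results come back in order, but the log shows task "a" running to completion before task "b" takes its first step. Under these assumptions the simple par() collects results without any concurrency, so interleaving the subtasks would need cooperation from the scheduler.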

Of course there's the question of what to do when one of the tasks
raises an error -- I haven't quite figured that out in NDB either, it
runs all the tasks to completion but the caller only sees the first
exception. I briefly considered having a "multi-exception" but it
felt too weird -- though I'm not married to that decision.

>> (7) Checking whether an operation is already complete
>> Futures:
>>   if f.done(): ...
> I'm inclined to think that this is not something the
> scheduler needs to be directly concerned with. If it's
> important for one task to know when another task is completed,
> it's up to those tasks to agree on a way of communicating
> that information between them.
> Although... is there a way to non-destructively test whether
> a generator is exhausted? If so, this could easily be provided
> as a scheduler primitive.

Nick answered this affirmatively.
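
Nick's answer is not quoted here, so the following is only one approach known to work with the standard library, not necessarily the one he had in mind: inspect.getgeneratorstate() reports a generator's state without resuming it. The is_exhausted() helper name is made up for the example.

```python
import inspect


def task():
    yield 1
    return "done"


def is_exhausted(gen):
    """Non-destructive check: True once the generator has finished."""
    return inspect.getgeneratorstate(gen) == inspect.GEN_CLOSED


t = task()
before = is_exhausted(t)      # not started yet
next(t)
suspended = is_exhausted(t)   # paused at the yield
try:
    next(t)
except StopIteration:
    pass
after = is_exhausted(t)       # ran to completion
```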

>> (8) Getting the result of an operation multiple times
>> Futures:
>>   f = async_op(args)
>>   # squirrel away a reference to f somewhere else
>>   r = yield f
>>   # ... later, elsewhere
>>   r = f.result()
> Is this really a big deal? What's wrong with having to store
> the return value away somewhere if you want to use it
> multiple times?

I suppose that's okay.

>> (9) Canceling an operation
>> Futures:
>>   f.cancel()
> This would be another scheduler primitive.
> Yield-from:
>    cancel(task)
> This would remove the task from the ready list or whatever
> queue it's blocked on, and probably throw an exception into
> it to give it a chance to clean up.

Ah, of course. (I said I was asking newbie questions. Consider me your
first newbie!)
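
A rough sketch of what such a cancel() primitive might look like for generator-based tasks. The Cancelled exception, the `ready` list and the function signature are all assumptions made up for illustration; a real scheduler would also have to look in whatever queue the task is blocked on.

```python
class Cancelled(Exception):
    """Thrown into a task to tell it that it is being cancelled."""


def cancel(task, ready):
    """Remove task from the ready list and give it a chance to clean up."""
    if task in ready:
        ready.remove(task)
    try:
        task.throw(Cancelled())
    except (Cancelled, StopIteration):
        # The task either re-raised Cancelled or finished while cleaning up.
        pass


cleaned = []


def worker():
    try:
        while True:
            yield              # wait to be rescheduled
    except Cancelled:
        cleaned.append("closed")   # task-specific cleanup goes here
        raise


t = worker()
next(t)            # run the task up to its first yield
ready = [t]
cancel(t, ready)
```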

>> (10) Registering additional callbacks
>> Futures:
>>   f.add_done_callback(callback)
> Another candidate for a higher-level facility, I think.
> The API might look something like
> Yield-from:
>    cbt = task_with_callbacks(task)
>    cbt.add_callback(callback)
>    yield from
> I may have a go at coming up with implementations for some of
> these things and send them in later posts.

Or better, add them to the tutorial. (Or an advanced tutorial, "common
async patterns". That would actually be a useful collection of use
cases for whatever we end up building.)

Here's another pattern that I can't quite figure out. It started when
Ben Darnell posted a link to Tornado's chat demo
I didn't understand it and asked him offline what it meant.
Essentially, it's a barrier pattern where multiple tasks (each
representing a different HTTP request, and thus not all starting at
the same time) render a partial web page and then block until a new
HTTP request comes in that provides the missing info. (For technical
reasons they only do this once, and then the browsers re-fetch the
URL.) When the missing info is available, it must wake up all blocked
tasks and give them the new info.

I wrote a Futures-based version of this -- not the whole thing, but
the block-until-more-info-and-wakeup part. Here it is (read 'info' for
'messages'):

Each waiter executes this code when it is ready to block:

f = Future()  # Explicitly create a future!
waiters.add(f)
messages = yield f
<process messages and quit>

I'd write a helper for the first two lines:

def register():
  f = Future()
  waiters.add(f)
  return f

Then the waiter's code becomes:

messages = yield register()
<process messages and quit>

When new messages become available, the code just sends the same
results to all those Futures:

def wakeup(messages):
  for waiter in waiters:
    waiter.set_result(messages)

(OO sauce left to the reader. :-)

If you wonder where the code is that hooks up the waiter.set_result()
call with the yield, that's done by the scheduler: when a task yields
a Future, it adds a callback to the Future that reschedules the task
when the Future's result is set.
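
Putting the pieces together, a runnable sketch of the pattern. It uses concurrent.futures.Future purely as a convenient stand-in for the Future class discussed here, a module-level set for the waiters, and a made-up start() helper playing the scheduler's role of hooking a yielded Future back up to the task. NDB's actual machinery is not shown and may work differently.

```python
from concurrent.futures import Future

waiters = set()   # assumption: a plain module-level set of pending Futures


def register():
    f = Future()
    waiters.add(f)
    return f


def wakeup(messages):
    for waiter in list(waiters):
        waiter.set_result(messages)
    waiters.clear()


def start(task):
    """Toy scheduler: when the task yields a Future, resume it once the result is set."""
    def step(value=None):
        try:
            future = task.send(value)
        except StopIteration:
            return
        future.add_done_callback(lambda f: step(f.result()))
    step()


received = []


def waiter(name):
    messages = yield register()
    received.append((name, messages))   # <process messages and quit>


start(waiter("a"))
start(waiter("b"))
wakeup(["new message"])
```

Both tasks are resumed with the same list of messages, and the set of waiters is empty afterwards.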

Edge cases:

- Were the waiter to lose interest, it could remove its Future from
the list of waiters, but no harm is done leaving it around either.
(NDB doesn't have this feature, but if you have a way to remove
callbacks, setting the result of a Future that nobody cares about has
no ill effect. You could even use a weak set...)

- It's possible to broadcast an exception to all waiters by using
waiter.set_exception().

--Guido van Rossum (

From guido at  Sun Oct 14 16:39:41 2012
From: guido at (Guido van Rossum)
Date: Sun, 14 Oct 2012 07:39:41 -0700
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 14, 2012 at 12:12 AM, Greg Ewing
<greg.ewing at> wrote:
> I've had some thoughts on why I'm uncomfortable
> about this kind of pattern:
>    data = yield sock.async_read(1024)
> The idea here is that sock.async_read() returns a
> Future or similar object that performs the I/O and
> waits for the result.
> However, reading the data isn't necessarily the point
> at which the suspension actually occurs. If you're
> using a select-style event loop, the async read
> operation breaks down into
>    1. Wait for data to arrive on the socket
>    2. Read the data
> So the implementation of sock.async_read() is going
> to have to create another Future to handle waiting
> for the socket to become ready. But then the outer
> Future is an unnecessary complication, because you
> could get the same effect by defining
>    def async_read(self, length):
>       yield future_to_wait_for_fd(self.fd)
>       return os.read(self.fd, length)
> and calling it using
>    data = yield from sock.async_read(1024)
> If Futures are to appear anywhere, they should only
> be at the very bottom layer, at the transition
> between generator and non-generator code. And the
> place where that transition occurs depend on how
> the lower levels are implemented. If you're using
> IOCP instead of select, for example, you need to
> do things the other way around:
>    1. Start the read operation
>    2. Wait for it to complete
> So I feel that all public APIs should be functions
> called using yield-from, leaving it up to the
> implementation to decide if and where Futures
> become involved.

A logical and consistent conclusion. I actually agree: in NDB, where
all I have is "yield <future>" I have a similar guideline: all public
async APIs return a Future and must be waited on using yield, and only
at the lowest level are other types of primitives involved (bare App
Engine RPCs, callbacks).
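
A sketch of the layering Greg describes, for a select-style loop. The WaitReadable token and the run() driver are made-up names: the only thing that reaches the driver is the bottom-level "wait for this fd" request, and everything above it is composed with yield from. It assumes a Unix-like platform where select() works on pipes, and it ignores the case where the read fails after select() has returned.

```python
import os
import select


class WaitReadable:
    """Bottom-level request: suspend until fd is readable."""

    def __init__(self, fd):
        self.fd = fd


def async_read(fd, length):
    yield WaitReadable(fd)       # 1. wait for data to arrive
    return os.read(fd, length)   # 2. read the data


def main(fd):
    # Public API is called with yield from; no Future appears at this level.
    data = yield from async_read(fd, 1024)
    return data.upper()


def run(task):
    """Minimal driver for a single task: block in select() on each request."""
    try:
        request = next(task)
        while True:
            select.select([request.fd], [], [])
            request = task.send(None)
    except StopIteration as stop:
        return stop.value


r, w = os.pipe()
try:
    os.write(w, b"hello")
    result = run(main(r))
finally:
    os.close(r)
    os.close(w)
```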

--Guido van Rossum (

From ethan at  Sun Oct 14 16:50:06 2012
From: ethan at (Ethan Furman)
Date: Sun, 14 Oct 2012 07:50:06 -0700
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>	<>	<>	<>	<k4pr17$i94$>
	<>	<>	<>
	<k4q3o6$m9q$>	<>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <>

Steven D'Aprano wrote:
> On 14/10/12 22:03, Antoine Pitrou wrote:
>> On Sun, 14 Oct 2012 21:48:59 +1100
>> Steven D'Aprano<steve at> wrote:
>>> Ah, I wondered if anyone else had picked up on that. When I read the 
>>> PEP,
>>> I was concerned about the mental conflict between iteration and 
>>> indexing
>>> of Path objects: given a Path p the sequence p[0] p[1] p[2] ... does
>>> something completely different from iterating over p directly.
>> p[0] p[1] etc. are just TypeErrors:
> Ah, my mistake... I didn't register that you sequenced over the parts
> attribute, not the path itself. Sorry for the noise.

I actually prefer Steven's interpretation. If we are going to iterate 
directly on a path object, we should be yielding the pieces of the path 
object. After all, a path can contain a file name (most of mine do) and 
what sense does it make to iterate over the children of 
/usr/home/ethanf/some_table.dbf?


From ironfroggy at  Sun Oct 14 17:01:15 2012
From: ironfroggy at (Calvin Spealman)
Date: Sun, 14 Oct 2012 11:01:15 -0400
Subject: [Python-ideas] The async API of the future: Reactors
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 12, 2012 at 3:32 PM, Guido van Rossum <guido at> wrote:
> On Fri, Oct 12, 2012 at 11:33 AM, Antoine Pitrou <solipsis at> wrote:
>> On Fri, 12 Oct 2012 11:13:23 -0700
>> Guido van Rossum <guido at> wrote:
>>> OTOH someone else might prefer a buffered stream
>>> abstraction that just keeps filling its read buffer (and draining its
>>> write buffer) using level-triggered callbacks, at least up to a
>>> certain buffer size -- we have to be robust here and make it
>>> impossible for an evil client to fill up all our memory without our
>>> approval!
>> I'd like to know what a sane buffered API for non-blocking I/O may look
>> like, because right now it doesn't seem to make a lot of sense. At
>> least this bug is tricky to resolve:
> Good question. It actually depends quite a bit on whether you have an
> event loop or not -- with the help of an event loop, you can have a
> level-triggered callback that fills the buffer behind your back (up to
> a given limit, at which point it should unregister the I/O object);
> that bug seems to be about a situation without an event loop, where
> you can't do that. Also the existing io module design never
> anticipated cooperation with an event loop.
>>> - There's an abstract Reactor class and an abstract Async I/O object
>>> class. To get a reactor to call you back, you must give it an I/O
>>> object, a callback, and maybe some more stuff. (I have gone back and
>>> like passing optional args for the callback, rather than requiring
>>> lambdas to create closures.) Note that the callback is *not* a
>>> designated method on the I/O object!
>> Why isn't it? In practice, you need several callbacks: in Twisted
>> parlance, you have dataReceived but also e.g. ConnectionLost
>> (depending on the transport, you may even imagine other callbacks, for
>> example for things happening on the TLS layer?).
> Yes, but I really want to separate the callbacks from the object, so
> that I don't have to inherit from an I/O object class -- asyncore
> requires this and IMO it's wrong. It also makes it harder to use the
> same callback code with different types of I/O objects.

Why is subclassing a problem? It can be overused, but seems the right
thing to do in this case. You want a protocol that responds to new data by
echoing and tells the user when the connection was terminated? It makes
sense that this is a subclass: a special case of some class that handles the
base behavior.

What if this was just an optional way and we could also provide a helper to
attach handlers to the base class instance without subclassing it? The function
registering it could take keyword arguments mapping additional event->callbacks
to the object.

>>> - In systems supporting file descriptors, there's a reactor
>>> implementation that knows how to use select/poll/etc., and there are
>>> concrete I/O object classes that wrap file descriptors. On Windows,
>>> those would only be socket file descriptors. On Unix, any file
>>> descriptor would do.
>> Windows *is* able to do async I/O on things other than sockets (see the
>> discussion about IOCP). It's just that the Windows implementation of
>> select() (the POSIX function call) is limited to sockets.
> I know, but IOCP is currently not supported in the stdlib. I expect
> that on Windows, to use IOCP, you'd need to use a different reactor
> implementation and a different I/O object than the vanilla fd-based
> ones. My design is actually *inspired* by the desire to support this
> cleanly.
> --
> --Guido van Rossum (

From guido at  Sun Oct 14 17:11:46 2012
From: guido at (Guido van Rossum)
Date: Sun, 14 Oct 2012 08:11:46 -0700
Subject: [Python-ideas] The async API of the future: Twisted and
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 14, 2012 at 3:43 AM, Antoine Pitrou <solipsis at> wrote:
> Subclassing IOObject would be wrong, since the user isn't writing an IO
> object in the first place. But subclassing a separate class, like
> Twisted's Protocol (which is mostly an empty shell, really), would sound
> reasonable to me.

It's a possible style. I'm inclined not to follow this example but I
could go either way. One thing that somewhat worries me is that the
names of these methods will be baked forever into all user code. As a
user I prefer to have control over the names of my methods; first,
there's the style issue (e.g. I'm always conflicted over what style to
use in unittest.TestCase subclasses, since its own style is setUp,
tearDown); second, in my app there may be a much better name for what
the method does than e.g. data_received(). (Not to mention that that's
another adjective used as a verb. ;-)

--Guido van Rossum (

From solipsis at  Sun Oct 14 17:16:40 2012
From: solipsis at (Antoine Pitrou)
Date: Sun, 14 Oct 2012 17:16:40 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
References: <>
	<k4pr17$i94$> <>
	<> <k4q3o6$m9q$>
	<> <>
Message-ID: <>

On Sun, 14 Oct 2012 07:50:06 -0700
Ethan Furman <ethan at> wrote:
> Steven D'Aprano wrote:
> > On 14/10/12 22:03, Antoine Pitrou wrote:
> >> On Sun, 14 Oct 2012 21:48:59 +1100
> >> Steven D'Aprano<steve at> wrote:
> >>>
> >>> Ah, I wondered if anyone else had picked up on that. When I read the 
> >>> PEP,
> >>> I was concerned about the mental conflict between iteration and 
> >>> indexing
> >>> of Path objects: given a Path p the sequence p[0] p[1] p[2] ... does
> >>> something completely different from iterating over p directly.
> >>
> >> p[0] p[1] etc. are just TypeErrors:
> >
> >
> > Ah, my mistake... I didn't register that you sequenced over the parts
> > attribute, not the path itself. Sorry for the noise.
> >
> >
> >
> I actually prefer Steven's interpretation. If we are going to iterate 
> directly on a path object, we should be yielding the pieces of the path
> object.
> After all, a path can contain a file name (most of mine do) and 
> what sense does it make to iterate over the children of 
> /usr/home/ethanf/some_table.dbf?

Well, given that:

1. sequence access (including the iterator protocol) to the path's
parts is already provided through the ".parts" property

2. it makes little sense to actually iterate over those parts (what
operations are you going to do sequentially over '/', then 'home', then
'ethanf', etc.?)

... I think yielding the directory contents is a much more useful
alternative when iterating over the path itself.
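
For reference, a small sketch of the two behaviours being contrasted,
written against the pathlib module as it later shipped in the standard
library (Python 3.4+), which is an assumption relative to the draft PEP
discussed here. In that module the parts are available through ".parts",
directory contents through an explicit iterdir() call, and the path
object itself is not iterable:

```python
import tempfile
from pathlib import Path, PurePosixPath

p = PurePosixPath("/usr/home/ethanf/some_table.dbf")

# Sequence access to the pieces of the path goes through .parts.
parts = p.parts

# The path object itself supports neither iteration nor indexing.
try:
    iter(p)
    iterable = True
except TypeError:
    iterable = False

# Directory contents are requested explicitly with iterdir().
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "some_table.dbf").touch()
    children = sorted(child.name for child in Path(tmp).iterdir())
```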



Software development and contracting:

From _ at  Sun Oct 14 17:29:27 2012
From: _ at (Laurens Van Houtven)
Date: Sun, 14 Oct 2012 17:29:27 +0200
Subject: [Python-ideas] The async API of the future: PEP 3153 (async-pep)
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 14, 2012 at 4:39 AM, Guido van Rossum <guido at> wrote:

> Odd. Were those people experienced in writing / reviewing PEPs?

There were a few. Some of them were. Unfortunately the prevalent reason was
politics: "make it clear that you're not just trying to get twisted in the
stdlib". Given that that's been suggested both on and off-list, both now
and then, I guess that wasn't entirely unreasonable (but not providing
things to play with was -- the experience was just so bad I pretty much
never got there).

> >> > Do you feel that there should be less talk about rationale?
> >>
> >> No, but I feel that there should be some actual specification. I am
> >> also looking forward to an actual meaty bit of example code -- ISTR
> >> you mentioned you had something, but that it was incomplete, and I
> >> can't find the link.
> >
> > Just examples of how it would work, nothing hooked up to real code. My
> > memory of it is more of a drowning-in-politics-and-bikeshedding kind of
> > thing, unfortunately :) Either way, I'm okay with letting bygones be
> > bygones and focus on how we can get this show on the road.
> Shall I just reject PEP 3153 so it doesn't distract people? Of course
> we can still refer to it when people ask for a rationale for the
> separation between transports and protocols, but it doesn't seem the
> PEP itself is going to be finished (correct me if I'm wrong), and as
> it stands it is not useful as a software specification.

I'm not sure that's necessary; these threads show a lot of willpower to get
it done (even though that's not enough), and it's pretty easy to edit.
You're certainly right that right now it's not a useful software spec; but
neither would an empty new PEP be ;)

--Guido van Rossum (


From guido at  Sun Oct 14 17:53:15 2012
From: guido at (Guido van Rossum)
Date: Sun, 14 Oct 2012 08:53:15 -0700
Subject: [Python-ideas] The async API of the future: Twisted and
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 13, 2012 at 4:42 PM, Devin Jeanpierre
<jeanpierreda at> wrote:
> There has to be some way to contract emails sent in discussions rather
> than exploding them. I swear I'm trying to be concise, yet readable.
> It's not working.

Don't worry too much. I took essentially all Friday starting those
four new threads. I am up at night thinking about the issues. I can't
expect everyone else to have this much time to devote to Python!

> On Fri, Oct 12, 2012 at 6:11 PM, Guido van Rossum <guido at> wrote:
>> I also don't doubt that using classic Futures you can't do this -- the
>> chaining really matters for this style, and I presume this (modulo
>> unimportant API differences) is what typical Twisted code looks like.
> My experience has been unfortunately rather devoid of deferreds in
> Twisted. I always feel like the odd one out when people discuss this
> confusion. For me, it was all Protocol this and Protocol that, and
> deferreds only came up when I used Twisted's great AMP (Asynchronous
> Messaging Protocol) library.

Especially odd since you jumped into the discussion when I called
Deferreds a bad name. :-)

>> However, Python has yield, and you can do much better (I'll write
>> plain yield for now, but it works the same with yield-from):
>> try:
>>   value1 = yield step1(<args>)
>>   value2 = yield step2(value1)
>>   value3 = yield step3(value2)
>>   # Do something with value3
>> except Exception:
>>   # Handle any error from step1 through step3
> --snip--
>> This form is more flexible, since it is easier to catch different
>> exceptions at different points. It is also much easier to pass extra
>> information around. E.g. what if your flow ends up having to pass both
>> value1 and value2 into step3()? Sure, you can do that by making value2
>> a tuple (or a dict, or an object) incorporating value1 and the
>> original value2, but that's exactly where this style becomes
>> cumbersome, whereas in the yield-based form, such things can remain
>> simple local variables. All in all I find it more readable.
> Well, first of all, deferreds have ways of joining values together. For example:
>     from __future__ import print_function
>     from twisted.internet import defer
>     def example_joined():
>         d1 = defer.Deferred()
>         d2 = defer.Deferred()
>         # consumeErrors looks scary, but it only means that
>         # d1 and d2's errbacks aren't called. Instead, the error is sent to d's
>         # errback.
>         d = defer.gatherResults([d1, d2], consumeErrors=True)
>         d.addCallback(print)
>         d.addErrback(lambda v: print("ERROR!"))
>         d1.callback("The first deferred has succeeded")
>         # now we're waiting on the second deferred to succeed,
>         # which we'll let the caller handle
>         return d2
>     example_joined().callback("The second deferred has succeeded too!")
>     print("==============")
>     example_joined().errback("The second deferred has failed...")

I'm sorry, but that's not very readable at all. You needed a lambda
(which if there was anything more would have to be expanded using
'def') and you're cheating by passing print as a callable (which saves
you a second lambda, but only in this simple case).

A readable version of this code should not have to use lambdas.

> I agree it's easier to use the generator style in many complicated
> cases. That doesn't preclude manual deferreds from also being useful.

Yeah, but things should be as simple as they can. If you can do
everything using plain callbacks, Futures and coroutines, why add
Deferreds even if you can? (Except for backward compatibility of
course. That's a totally different topic. But we're first defining the
API of the future.) If Greg Ewing had his way we'd even do without
Futures -- I'm still considering that bid. (In the yield-from thread
I'm asking for common patterns that the new API should be able to

>> So, in the end, for Python 3.4 and beyond, I want to promote a style
>> that mixes simple callbacks (perhaps augmented with simple Futures)
>> and generator-based coroutines (either PEP 342, yield/send-based, or
>> PEP 380 yield-from-based). I'm looking to Twisted for the best
>> reactors (see other thread). But for transport/protocol
>> implementations I think that generator/coroutines offers a cleaner,
>> better interface than incorporating Deferred.
> Egh. I mean, sure, suppose we have those things. But what if you want
> to send the result of a callback to a generator-coroutine? Presumably
> generator coroutines work by yielding deferreds and being called back
> when the future resolves (deferred fires).

No, they don't use deferreds. They use Futures. You've made it quite
clear that they are very different.

> But if those
> futures/deferreds aren't exposed, and instead only the generator
> stuff is exposed, then bridging the gap between callbacks and
> generator-coroutines is impossible. So every callback function has to
> also be defined to use something else. And worse, other APIs using
> callbacks are left in the dust.

My plan is that the Futures *will* be exposed -- this is what
worked well in NDB.

> Suppose, OTOH, futures/deferreds are exposed. Then we can easily
> bridge between callbacks and generators, by returning a future whose
> `set_result` is the callback to our callback function (deferred whose
> `callback` is the callback).

And that's how NDB does it. I've got a question to Greg Ewing on how he does it.

> But if we're exposing futures/deferreds, why have callbacks in the
> first place? The difference between these two functions, is that the
> second can be used in generator-coroutines trivially and the first
> cannot:
>     # callbacks:
>     reactor.timer(10, print, "hello world")
>     # deferreds
>     reactor.timer(10).addCallback(print, "hello world")

How about this:

  f = <some future>
  reactor.timer(10, f.set_result, None)

Then whoever waits for f gets woken up in 10 seconds, and the reactor
doesn't have to know what Futures are.
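
A minimal runnable sketch of that idea, under stated assumptions:
ToyReactor and its timer()/run() methods are hypothetical stand-ins for
whatever the real event loop would offer, and concurrent.futures.Future
stands in for "<some future>". The point is only that the reactor
stores and calls a plain callable:

```python
import heapq
import itertools
import time
from concurrent.futures import Future


class ToyReactor:
    """Hypothetical event loop that only knows about plain callbacks."""

    def __init__(self):
        self._timers = []              # heap of (deadline, seq, callback, args)
        self._seq = itertools.count()  # tie-breaker so callbacks are never compared

    def timer(self, delay, callback, *args):
        deadline = time.monotonic() + delay
        heapq.heappush(self._timers, (deadline, next(self._seq), callback, args))

    def run(self):
        # Run until no timers are left, sleeping until each deadline.
        while self._timers:
            deadline, _seq, callback, args = heapq.heappop(self._timers)
            time.sleep(max(0.0, deadline - time.monotonic()))
            callback(*args)


reactor = ToyReactor()
f = Future()
# The reactor never sees a Future, only a bound method and its argument.
reactor.timer(0.01, f.set_result, None)
reactor.run()
```

After run() returns, f is done and anything waiting on it can proceed.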

But I believe your whole argument may be based on a misreading of my
proposal. *I* want plain callbacks, Futures, and coroutines, and an
event loop that only knows about plain callbacks and IO objects (e.g.

> Now here's another thing: suppose we have a list of "deferred events",
> but instead of handling all 10 at once, we want to handle them "as
> they arrive", and then synthesize a result at the bottom. How do you
> do this with pure generator coroutines?

Let's ask Greg that.

In NDB, I have a wait_any() function that you give a set of Futures
and returns the first one that completes. It would be easy to build an
iterator on top of this that takes a set of Futures and iterates over
them in the order in which they are completed.
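
A rough sketch of such an iterator, assuming concurrent.futures as a
stand-in for NDB's Futures. This wait_any() is not NDB's implementation,
and iter_as_completed() is a made-up name. (The stdlib already offers
concurrent.futures.as_completed() for the same purpose.)

```python
from concurrent.futures import FIRST_COMPLETED, Future, wait


def wait_any(futures):
    """Block until at least one future is done and return one that is."""
    done, _pending = wait(futures, return_when=FIRST_COMPLETED)
    return next(iter(done))


def iter_as_completed(futures):
    """Yield the given futures in the order in which they complete."""
    pending = set(futures)
    while pending:
        fut = wait_any(pending)
        pending.remove(fut)
        yield fut


# Usage: the second future finishes first, so it comes out first.
f1, f2 = Future(), Future()
f2.set_result("second")
completed = iter_as_completed([f1, f2])
first_done = next(completed)
f1.set_result("first")
second_done = next(completed)
```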

> For example, perhaps I am implementing a game server, where all the
> players choose their characters and then the game begins. Whenever a
> character is chosen, everyone else has to know about it so that they
> can plan their strategy based on who has chosen a character. Character
> selections are final, just so that I can use deferreds (hee hee).
> I am imagining something like the following:
>     # WRONG: handles players in a certain order, rather than as they come in
>     def player_lobby(reactor, players):
>         for player in players:
>             player_character = yield player.wait_for_confirm(reactor)
>             player.set_character(player_character)
>             # tell all the other players what character the player has chosen
>             notify_choice((player, player_character), players)
>         start_game(players)
> This is wrong, because it goes in a certain order and "blocks" the
> coroutine until every character is chosen. Players will not know who
> has chosen what characters in an appropriate order.
> But hypothetically, maybe we could do the following:
>     # Hypothetical magical code?
>     def player_lobby(reactor, players):
>         confirmation_events = UnorderedEventList(
>             [player.wait_for_confirm(reactor) for player in players])
>         while confirmation_events:
>             player_character = yield confirmation_events.get_next()
>             player.set_character(player_character)
>             # tell all the other players what character the player has chosen
>             notify_choice((player, player_character), players)
>         start_game(players)
> But then, how do we write UnorderedEventList? I don't really know. I
> suspect I've made the problem harder, not easier! eek. Plus, it
> doesn't even read very well. Especially not compared to the deferred
> version:
> This is how I would personally do it in Twisted, without using
> UnorderedEventList (no magic!):
>     @inlineCallbacks
>     def player_lobby(reactor, players):
>         events = []
>         for player in players:
>             confirm_event = player.wait_for_confirm(reactor)
>             @confirm_event.addCallback
>             def on_confirmation(player_character, player=player):
>                 player.set_character(player_character)
>                 # tell all the other players what character the player has chosen
>                 notify_choice((player, player_character), players)
>             events.append(confirm_event)
>         yield gatherResults(events)
>         start_game(players)
> Notice how I dropped down into the level of manipulating deferreds so
> that I could add this "as they come in" functionality, and then went
> back. Actually it wouldn't've hurt much to just not bother with
> inlineCallbacks at all.
> I don't think this is particularly unreadable. More importantly, I
> actually know how to do it. I have no idea how I would do this without
> using addCallback, or without reimplementing addCallback using
> inlineCallbacks.

Clearly we have an educational issue on our hands! :-)

> And then, supposing we don't have these deferreds/futures exposed...
> how do we implement delayed computation stuff from extension modules?
> What if we want to do these kinds of compositions within said
> extension modules? What if we want to write our own version of @tasks
> or @inlineCallbacks with extra features, or generate callback chains
> from XML files, and so on?
> I don't really like the prospect of having just the "sugary syntax"
> available, without a flexible underlying representation also exposed.
> I don't know if you've ever shared that worry -- sometimes the pretty
> syntax gets in the way of getting stuff done.

You're barking up the wrong tree -- please badger Greg Ewing with use
cases in the yield-from thread. With my approach all of these can be
done. (See the yield-from thread for an example I just posted of a
barrier, where multiple tasks wait for a single event.)

>> I hope that the path forward for Twisted will be simple enough: it
>> should be possible to hook Deferred into the simpler callback APIs
>> (perhaps a new implementation using some form of adaptation, but
>> keeping the interface the same). In a sense, the greenlet/gevent crowd
>> will be the biggest losers, since they currently write async code
>> without either callbacks or yield, using microthreads instead. I
>> wouldn't want to have to start putting yield back everywhere into that
>> code. But the stdlib will still support yield-free blocking calls
>> (even if under the hood some of these use yield/send-based or
>> yield-from-based coroutines) so the monkey-patchey tradition can
>> continue.
> Surely it's no harder to make yourself into a generator than to make
> yourself into a low-level thread-like context switching function with
> a saved callstack implemented by hand in assembler, and so on?
> I'm sure they'll be fine.

The thing that worries me most is reimplementing httplib, urllib and
so on to use all this new machinery *and* keep the old synchronous
APIs working *even* if some code is written using the old style and
some other code wants to use the new style.

>>> 1. Explicit callbacks:
>>>     For example, reactor.callLater(t, lambda: print("woo hoo"))
>> I actually like this, as it's a lowest-common-denominator approach
>> which everyone can easily adapt to their purposes. See the thread I
>> started about reactors.
> Will do (but also see my response above about why not "everyone" can).
>>> 2. Method dispatch callbacks:
>>>     Similar to the above, the reactor or somebody has a handle on your
>>> object, and calls methods that you've defined when events happen
>>>     e.g. IProtocol's dataReceived method
>> While I'm sure it's expedient and captures certain common patterns
>> well, I like this the least of all -- calling fixed methods on an
>> object sounds like a step back; it smells of the old Java way (before
>> it had some equivalent of anonymous functions), and of asyncore, which
>> (nearly) everybody agrees is kind of bad due to its insistence that
>> you subclass its classes. (Notice how subclassing as the prevalent
>> approach to structuring your code has gotten into a lot of discredit
>> since 1996.)
> I only used asyncore once, indirectly, so I don't know anything about
> it. I'm willing to dismiss it (and, in fact, various parts of twisted
> (I'm looking at you twisted.words)) as not good examples of the
> pattern.
> First of all, I'd like to separate the notion of subclassing and
> method dispatch. They're entirely unrelated. If I pass my object to
> you, and you call different methods depending on what happens
> elsewhere, that's method dispatch. And my object doesn't have to be
> subclassed or anything for it to happen.

Agreed. Antoine made the same point elsewhere and I half conceded.

> Now here's the thing. Suppose we're writing, for example, an IRC bot.
> (Everyone loves IRC bots.)

(For the record, I hate IRC, the software, the culture, the
interaction style. But maybe I'm unusual that way. :-)

> My IRC bot needs to handle several
> different possible events, such as:
>     private messages
>     channel join event
>     CTCP event
> and so on. My event handlers for each of these events probably
> manipulate some internal state (such as a log file, or a GUI). We'd
> probably organize this as a class, or else as a bunch of functions
> accessing global state. Or, perhaps a collection of closures. This
> last one is pretty unlikely.

I certainly wouldn't recommend collections of closures for that!

> For the most part, these functions are all intrinsically related and
> can't be sensibly treated separately. You can't take the private
> message callback of Bot A, and the channel join callback of bot B, and
> register these and expect a result that makes sense.
> If we look at this, we're expecting to deal with a set of functions
> that manage shared data. The abstraction for this is usually an
> object, and we'd really probably write the callbacks in a class unless
> we were being contrarian. And it's not too crazy for the dispatcher to
> know this and expect you to write it as a class that supports a
> certain interface (certain methods correspond to certain events).
> Missing methods can be assumed to have the empty implementation (no
> subclassing, just catching AttributeError).
> This isn't too much of an imposition on the user -- any collection of
> functions (with shared state via globals or closure variables) can be
> converted to an object with callable attributes very simply (thanks to
> types.SimpleNamespace, especially). And I only really think this is OK
> when writing it as an object -- as a collection of functions with
> shared state -- is the eminently obvious primary use case, so that
> that situation wouldn't come up very often.
> So, as an example, a protocol that passes data on further down the
> line needs to be notified when data is received, but also when the
> connection begins and ends. So the twisted protocol interface has
> "dataReceived", "connectionMade", and "connectionLost" callbacks.
> These really do belong together, they manage a single connection
> between computers and how it gets mapped to events usable by a twisted
> application. So I like the convenience and suggestiveness of them all
> being methods on an object.
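
The dispatch rule described in the quoted text (call the method if the
object has it, treat a missing method as empty) can be sketched as
follows. The dispatch() helper and the event names are hypothetical.
getattr() with a default is used rather than catching AttributeError, so
that an AttributeError raised inside a handler is not silently swallowed:

```python
from types import SimpleNamespace


def dispatch(handler, event_name, *args):
    """Call handler.<event_name>(*args) if present; a missing method is a no-op."""
    method = getattr(handler, event_name, None)
    if method is None:
        return None
    return method(*args)


log = []

# Any object with callable attributes will do; no subclassing is involved.
bot = SimpleNamespace(
    private_message=lambda sender, text: log.append(("pm", sender, text)),
    channel_join=lambda nick: log.append(("join", nick)),
)

dispatch(bot, "private_message", "alice", "hi")
dispatch(bot, "channel_join", "bob")
dispatch(bot, "ctcp", "VERSION")  # no such attribute: ignored
```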

There's also a certain order to them, right? I'd think the state
transition diagram is something like

  connectionMade (1); dataReceived (*); connectionLost (1)

I wonder if there are any guarantees that they will only be called in
this order, and who is supposed to enforce this? It would be awkward
if the user code would have to guard itself against this; also if the
developer made an unwarranted assumption (e.g. dataReceived is called
at least once).
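
One way the framework side could enforce that order, so that user code
does not have to, is a thin wrapper around the user's object. This is
only a sketch: CallbackOrderGuard is a hypothetical name, and nothing in
this thread says Twisted provides or needs such a guard.

```python
class CallbackOrderGuard:
    """Enforce connectionMade (1); dataReceived (*); connectionLost (1)."""

    def __init__(self, protocol):
        self._protocol = protocol
        self._state = "new"  # "new" -> "connected" -> "closed"

    def _expect(self, state, name):
        if self._state != state:
            raise RuntimeError(
                "%s called in state %r, expected %r" % (name, self._state, state))

    def connectionMade(self):
        self._expect("new", "connectionMade")
        self._state = "connected"
        self._protocol.connectionMade()

    def dataReceived(self, data):
        # Allowed zero or more times, but only between made and lost.
        self._expect("connected", "dataReceived")
        self._protocol.dataReceived(data)

    def connectionLost(self, reason=None):
        self._expect("connected", "connectionLost")
        self._state = "closed"
        self._protocol.connectionLost(reason)


class Recorder:
    """Toy user protocol that records which callbacks it received."""

    def __init__(self):
        self.calls = []

    def connectionMade(self):
        self.calls.append("made")

    def dataReceived(self, data):
        self.calls.append(("data", data))

    def connectionLost(self, reason):
        self.calls.append("lost")
```

Note that the guard deliberately accepts a connection that is made and
lost with no dataReceived call in between, since the diagram allows
zero data events.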

>>> 4. Generator coroutines
>>>     These are a syntactic wrapper around deferreds. If you yield a
>>> deferred, you will be sent the result if the deferred succeeds, or an
>>> exception if the deferred fails.
>>>     e.g. examples from previous message
>> Seeing them as syntactic sugar for Deferreds is one way of looking at
>> it; no doubt this is how they're seen in the Twisted community because
>> Deferreds are older and more entrenched. But there's no requirement
>> that an architecture has to have Deferreds in order to use generator
>> coroutines -- simple Futures will do just fine, and Greg Ewing has
>> shown that using yield-from you can even do without those. (But he
>> does use simple, explicit callbacks at the lowest level of his
>> system.)
> I meant it as a factual explanation of what generator coroutines are
> in Twisted, not what they are in general. Sorry for the confusion. We
> are probably agreed here.
> After a cursory examination, I don't really understand Greg Ewing's
> thing. I'd have to dig deeper into the logs for when he first
> introduced it.

Please press him for explanations. Ask questions. He knows his dream
best of all. We need to learn.

>> I'd like to come back to that Django example though. You are implying
>> that there are some opportunities for concurrency here, and I agree,
>> assuming we believe disk I/O is slow enough to bother making it
>> asynchronous. (In App Engine it's not, and we can't anyways, but in
>> other contexts I agree that it would be bad if a slow disk seek were
>> to hold up all processing -- not to mention that it might really be
>> NFS...)
> --snip--
>> How would you code that using Twisted Deferreds?
> Well. I'd replace the @task in your NDB thing with @inlineCallbacks
> and call it a day. ;)
> (I think there's enough deferred examples above, and I'm getting tired
> and it's been a day since I started writing this damned email.)

No problem. Same here. :-)

>>> For that stuff, you'd have to speak to the main authors of Twisted.
>>> I'm just a twisted user. :(
>> They seem to be mostly ignoring this conversation, so your standing in
>> as a proxy for them is much appreciated!
> Well. We are on Python-Ideas... :(

Somehow we got Itamar and Glyph to join, so I think we're covered!

>>> In the end it really doesn't matter what API you go with. The Twisted
>>> people will wrap it up so that they are compatible, as far as that is
>>> possible.
>> And I want to ensure that that is possible and preferably easy, if I
>> can do it without introducing too many warts in the API that
>> non-Twisted users see and use.
> I probably lack the expertise to help too much with this. I can point
> out anything that sticks out, if/when an extended futures proposal is
> made.

You've done great in increasing my understanding of Twisted and
Deferred. Thank you very much!

--Guido van Rossum (

From ethan at  Sun Oct 14 17:48:53 2012
From: ethan at (Ethan Furman)
Date: Sun, 14 Oct 2012 08:48:53 -0700
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>	<>	<>	<>	<k4pr17$i94$>
	<>	<>	<>
	<k4q3o6$m9q$>	<>	<>	<>	<>	<>	<>	<>	<>	<>
Message-ID: <>

Antoine Pitrou wrote:
> On Sun, 14 Oct 2012 07:50:06 -0700
> Ethan Furman <ethan at> wrote:
>> Steven D'Aprano wrote:
>>> On 14/10/12 22:03, Antoine Pitrou wrote:
>>>> On Sun, 14 Oct 2012 21:48:59 +1100
>>>> Steven D'Aprano<steve at> wrote:
>>>>> Ah, I wondered if anyone else had picked up on that. When I read the 
>>>>> PEP,
>>>>> I was concerned about the mental conflict between iteration and 
>>>>> indexing
>>>>> of Path objects: given a Path p the sequence p[0] p[1] p[2] ... does
>>>>> something completely different from iterating over p directly.
>>>> p[0] p[1] etc. are just TypeErrors:
>>> Ah, my mistake... I didn't register that you sequenced over the parts
>>> attribute, not the path itself. Sorry for the noise.
>> I actually prefer Steven's interpretation. If we are going to iterate 
>> directly on a path object, we should be yielding the pieces of the path
>> object.
>> After all, a path can contain a file name (most of mine do) and 
>> what sense does it make to iterate over the children of 
>> /usr/home/ethanf/some_table.dbf?
> Well, given that:
> 1. sequence access (including the iterator protocol) to the path's
> parts is already provided through the ".parts" property
> 2. it makes little sense to actually iterate over those parts (what
> operations are you going to do sequentially over '/', then 'home', then
> 'ethanf', etc.?)
> ... I think yielding the directory contents is a much more useful
> alternative when iterating over the path itself.
> Regards
> Antoine.
Useful, sure.  Still potentially confusing.  I'm perfectly happy with 
not allowing any default iteration at all.

What behavior can I expect with your Path implementation when I try to 
iterate over




From daniel.mcdougall at  Sun Oct 14 18:03:27 2012
From: daniel.mcdougall at (Daniel McDougall)
Date: Sun, 14 Oct 2012 12:03:27 -0400
Subject: [Python-ideas] The async API of the future: Some thoughts from
 an ignorant Tornado user
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 14, 2012 at 5:32 AM, Laurens Van Houtven <_ at> wrote:
>> import async # The API of the future ;)
>> async.async_call(retrieve_log_playback, settings, tws,
>>     mechanism=multiprocessing)
>> # tws == instance of tornado.web.WebSocketHandler that holds the open
>> # connection
> Is this a CPU-bound problem?

It depends on the host.  On embedded platforms (e.g. the BeagleBone)
it is more IO-bound than CPU bound (fast CPU but slow disk and slow
memory).  On regular x86 systems it is mostly CPU-bound.

>> * I should be able to choose the type of event loop/async mechanism
>> that is appropriate for the task:  For CPU-bound tasks I'll probably
>> want to use multiprocessing.  For IO-bound tasks I might want to use
>> threading.  For a multitude of tasks that "just need to be async" (by
>> nature) I'll want to use an event loop.
> Ehhh, maybe. This sounds like it confounds the tools for different use
> cases. You can quite easily have threads and processes on top of an event
> loop; that works out particularly nicely for processes because you still
> have to talk to your processes.
> Examples:
> twisted.internet.reactor.spawnProcess (local processes)
> twisted.internet.threads.deferToThread (local threads)
> ampoule (remote processes)
> It's quite easy to do blocking IO in a thread with deferToThread; in fact,
> that's how twisted's adbapi, an async wrapper to dbapi, works.

As I understand it, twisted.internet.reactor.spawnProcess is all about
spawning subprocesses akin to subprocess.Popen().  Also, it requires
writing a sophisticated ProcessProtocol.  It seems to be completely
unrelated and wickedly complicated.  The complete opposite of what I
would consider ideal for an asynchronous library since it is anything
but simple.

I mean, I could write a separate program to generate HTML playback
files from logs, spawn a subprocess in an asynchronous fashion, then
watch it for completion but I could do that with termio.Multiplex

deferToThread() does what one would expect but in many situations I'd
prefer something like deferToMultiprocessing().
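
To make the wish concrete, here is a rough sketch of such a call built
on the stdlib's concurrent.futures. The async_call() name and its
"mechanism" argument are taken from my earlier pseudocode and are
hypothetical; the sketch shares one lazily created pool per mechanism,
which is just one possible design:

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

_EXECUTOR_TYPES = {
    "thread": ThreadPoolExecutor,
    "process": ProcessPoolExecutor,
}
_pools = {}  # one lazily created pool per mechanism


def async_call(func, *args, mechanism="thread", **kwargs):
    """Run func(*args, **kwargs) in a worker and return a Future for the result."""
    if mechanism not in _pools:
        _pools[mechanism] = _EXECUTOR_TYPES[mechanism]()
    return _pools[mechanism].submit(func, *args, **kwargs)


def word_count(text):
    return len(text.split())


# Same call either way; only the mechanism changes.
future = async_call(word_count, "your flight to the cloud", mechanism="thread")
count = future.result(timeout=5)
```

With mechanism="process", the function and its arguments have to be
picklable, and on some platforms the call has to happen under an
"if __name__ == '__main__':" guard.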

>> * It should support publish/subscribe-style events (i.e. an event
>> dispatcher).  For example, the ability to watch a file descriptor or
>> socket for changes in state and call a function when that happens.
>> Preferably with the flexibility to define custom events (i.e don't
>> have it tied to kqueue/epoll-specific events).
> Like connectionMade, connectionLost, dataReceived etc?

Oh there's a hundred different ways to fire and catch events.  I'll
let the low-level async experts decide which is best.  Having said
that, it would be nice if the interface didn't use such
network-specific naming conventions.  I would prefer something more
generic.  It is fine if it uses sockets and whatnot in the background.

Dan McDougall - Chief Executive Officer and Developer
Liftoff Software - Your flight to the cloud is now boarding.

From _ at  Sun Oct 14 18:11:38 2012
From: _ at (Laurens Van Houtven)
Date: Sun, 14 Oct 2012 18:11:38 +0200
Subject: [Python-ideas] The async API of the future: Some thoughts from
 an ignorant Tornado user
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 14, 2012 at 6:03 PM, Daniel McDougall <
daniel.mcdougall at> wrote:

> deferToThread() does what one would expect but in many situations I'd
> prefer something like deferToMultiprocessing().

Twisted sort of has that with ampoule. The main issue is that arbitrary
object serialization is pretty much impossible. Within threads, you
sidestep that issue completely; across processes, you have to deal with
serialization, leading to the issues with pickle you've mentioned.

> I would prefer something more generic.

So maybe something like what is popular in JS, where you subscribe to events by
some string identifier? I personally use and like AngularJS' $broadcast,
$emit and $on -- quite nice, but dependent on a hierarchical structure that
seems to be missing here.
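
A flat (non-hierarchical) version of that idea could look roughly like
the sketch below. EventHub and its on()/emit() methods are hypothetical
names loosely modelled on the JS style, not an existing Python API:

```python
from collections import defaultdict


class EventHub:
    """Subscribe to and fire events identified by plain strings."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def on(self, event_name, callback):
        self._subscribers[event_name].append(callback)

    def emit(self, event_name, *args, **kwargs):
        # Copy the list so a callback may subscribe while we iterate.
        for callback in list(self._subscribers.get(event_name, ())):
            callback(*args, **kwargs)


hub = EventHub()
seen = []
hub.on("log:ready", lambda path: seen.append(path))
hub.emit("log:ready", "/tmp/playback.html")
hub.emit("nobody:listening")  # no subscribers: nothing happens
```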

> --
> Dan McDougall - Chief Executive Officer and Developer
> Liftoff Software - Your flight to the cloud is now boarding.


From ericsnowcurrently at  Sun Oct 14 18:15:26 2012
From: ericsnowcurrently at (Eric Snow)
Date: Sun, 14 Oct 2012 10:15:26 -0600
Subject: [Python-ideas] yield from multiple iterables (was Re: The async API
 of the future: yield-from)
In-Reply-To: <>
References: <>
Message-ID: <>

On Oct 14, 2012 8:42 AM, "Guido van Rossum" <guido at> wrote:
> Sadly it looks like
>   r = yield from (f1(), f2())
> ends up interpreting the tuple as the iterator, and you end up with
>   r = (f1(), f2())
> (i.e., a tuple of generators) rather than the desired
>   r = ((yield from f1()), (yield from f2()))

Didn't want this tangent to get lost to the async discussion.  Would it be
too late to make a change along these lines?  Would it be enough of an
improvement to be warranted?
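
For anyone who wants to try the two spellings, here is a small
self-contained sketch (Python 3.3+). f1, f2 and the drive() helper are
made up for illustration. In plain CPython the tuple is simply iterated:
its elements, the unstarted generators, are yielded out to whoever
drives the coroutine, and what happens to them next is up to that
driver.

```python
def f1():
    yield "f1 working"
    return 1


def f2():
    yield "f2 working"
    return 2


def naive():
    # Delegates to the tuple's iterator, not to f1() and f2() themselves.
    r = yield from (f1(), f2())
    return r


def explicit():
    # Delegates to each generator in turn and collects both return values.
    r = ((yield from f1()), (yield from f2()))
    return r


def drive(gen):
    """Run a generator to completion; return (yielded values, return value)."""
    yielded = []
    while True:
        try:
            yielded.append(next(gen))
        except StopIteration as stop:
            return yielded, stop.value


naive_yields, naive_result = drive(naive())
explicit_yields, explicit_result = drive(explicit())
```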


From _ at  Sun Oct 14 18:18:52 2012
From: _ at (Laurens Van Houtven)
Date: Sun, 14 Oct 2012 18:18:52 +0200
Subject: [Python-ideas] The async API of the future: Twisted and
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 14, 2012 at 5:53 PM, Guido van Rossum <guido at> wrote:

> A readable version of this code should not have to use lambdas.

In a lot of Twisted code, it happens with methods as callback methods,
something like:

d = self._doRPC(....)
d.addCallbacks(self._formatResponse, self._formatException)

That doesn't talk about gatherResults, but hopefully it makes the idea
clear. A lot of the legibility is dependent on making those method names
sensible, though. Our in-house style guide asks for limiting functions to
about ten lines, preferably half that. Works for us.

Another pattern that's frowned upon since it's a bit of an abuse of
decorator syntax, but I still like because it tends to make things easier
to read for inline callback definitions where you do need more than a

d = somethingThatHappensLater()

def whenItsDone(result):

--Guido van Rossum (


From guido at  Sun Oct 14 18:54:54 2012
From: guido at (Guido van Rossum)
Date: Sun, 14 Oct 2012 09:54:54 -0700
Subject: [Python-ideas] The async API of the future: Reactors
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 14, 2012 at 8:01 AM, Calvin Spealman <ironfroggy at> wrote:
> Why is subclassing a problem? It can be overused, but seems the right
> thing to do in this case. You want a protocol that responds to new data by
> echoing and tells the user when the connection was terminated? It makes
> sense that this is a subclass: a special case of some class that handles the
> base behavior.

I replied to this in detail on the "Twisted and Deferreds" thread in
an exchange. Summary: I'm -0 when it comes to subclassing protocol
classes; -1 on subclassing objects that implement significant

> What if this was just an optional way and we could also provide a helper to
> attach handlers to the base class instance without subclassing it? The function
> registering it could take keyword arguments mapping additional event->callbacks
> to the object.

Yeah, there are many APIs that we could offer. We just have to offer
one that's general enough so that people who prefer other styles can
implement their preferred style in a library.

--Guido van Rossum (

From guido at  Sun Oct 14 19:15:27 2012
From: guido at (Guido van Rossum)
Date: Sun, 14 Oct 2012 10:15:27 -0700
Subject: [Python-ideas] The async API of the future: Reactors
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 12, 2012 at 9:52 PM, Ben Darnell <ben at> wrote:
> First of all, to clear up the terminology, edge-triggered actually has
> a specific meaning in this context that is separate from the question
> of whether callbacks are used more than once. The edge- vs
> level-triggered question is moot with one-shot callbacks, but when
> you're reusing callbacks in edge-triggered mode you won't get a second
> call until you've drained the socket buffer and then it becomes
> readable again.  This turns out to be helpful for hybrid
> event/threaded systems, since the network thread may go into the next
> iteration of its loop while the worker thread is still consuming the
> data from a previous event.

Yeah, sorry for contributing to the confusion here! Glyph cleared it up for me.

> You can't always emulate edge-triggered behavior since it needs
> knowledge of internal socket buffers (epoll has an edge-triggered mode
> and I think kqueue does too, but you can't get edge-triggered behavior
> if you're falling back to select()).  However, you can easily get
> one-shot callbacks from an event loop with persistent callbacks just
> by unregistering the callback once it has received an event.  This has
> a performance cost, though - in tornado we try to avoid unnecessary
> unregister/register pairs.

We should be careful to support all this in our event loop design,
without necessarily offering two ways of doing everything -- the event
loop should be at liberty to use the most efficient strategy for the
platform. (If that depends on what sort of I/O the user is interested
in, we should be sure that that information reaches the event loop
too.) I like the idea more and more of an IO object that encapsulates a
socket or other event source, using predefined subclasses for each
type that is relevant to the platform.
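
As a rough illustration of the one-shot emulation Ben describes
(unregistering a persistent callback once it has received an event), here
is a minimal sketch built on the stdlib selectors module (available from
Python 3.4). TinyLoop and its method names are hypothetical and are not
Tornado's or Twisted's API; the sketch is level-triggered only and does
not attempt edge-triggered behavior.

```python
import selectors
import socket


class TinyLoop:
    """Hypothetical minimal loop with persistent, level-triggered read callbacks."""

    def __init__(self):
        self._selector = selectors.DefaultSelector()

    def add_reader(self, sock, callback, *args):
        # Persistent: fires on every iteration while sock stays readable.
        self._selector.register(sock, selectors.EVENT_READ, (callback, args))

    def remove_reader(self, sock):
        self._selector.unregister(sock)

    def add_reader_once(self, sock, callback, *args):
        # One-shot emulation: unregister first, then run the user's callback.
        def fire_once(*cb_args):
            self.remove_reader(sock)
            callback(*cb_args)
        self.add_reader(sock, fire_once, *args)

    def run_once(self, timeout=0.1):
        for key, _events in self._selector.select(timeout):
            callback, args = key.data
            callback(*args)

    def close(self):
        self._selector.close()


# Two socket pairs; neither reader drains its buffer, so both stay readable.
a_send, a_recv = socket.socketpair()
b_send, b_recv = socket.socketpair()
persistent_calls, one_shot_calls = [], []

loop = TinyLoop()
loop.add_reader(a_recv, persistent_calls.append, "persistent")
loop.add_reader_once(b_recv, one_shot_calls.append, "one-shot")
a_send.send(b"x")
b_send.send(b"x")
loop.run_once()
loop.run_once()

loop.close()
for s in (a_send, a_recv, b_send, b_recv):
    s.close()
```

The persistent reader fires on both iterations because its socket is
never drained, while the one-shot reader fires only once. The extra
unregister call on every event is the cost Ben mentions wanting to avoid.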

>> I'm not at all familiar with the Twisted reactor interface. My own
>> design would be along the following lines:
>> - There's an abstract Reactor class and an abstract Async I/O object
>> class. To get a reactor to call you back, you must give it an I/O
>> object, a callback, and maybe some more stuff. (I have gone back and
>> like passing optional args for the callback, rather than requiring
>> lambdas to create closures.) Note that the callback is *not* a
>> designated method on the I/O object! In order to distinguish between
>> edge-triggered and level-triggered, you just use a different reactor
>> method. There could also be a reactor method to schedule a "bare"
>> callback, either after some delay, or immediately (maybe with a given
>> priority), although such functionality could also be implemented
>> through magic I/O objects.
> One reason to have a distinct method for running a bare callback is
> that you need to have some thread-safe entry point, but you otherwise
> don't really want locking on all the internal methods.  Tornado's
> IOLoop.add_callback and Twisted's Reactor.callFromThread can be used
> to run code in the IOLoop's thread (which can then call the other
> IOLoop methods).

That's an important use case to support.
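
A minimal sketch of that use case, assuming a hypothetical CallbackLoop
class (not Tornado's IOLoop or Twisted's reactor): add_callback() is the
only method that may be called from other threads, and everything else
runs unlocked in the loop's own thread.

```python
import queue
import threading


class CallbackLoop:
    """Hypothetical loop whose only thread-safe method is add_callback()."""

    def __init__(self):
        self._pending = queue.Queue()  # the single locked structure
        self._running = False

    def add_callback(self, callback, *args):
        # Safe from any thread: only touches the thread-safe queue.
        self._pending.put((callback, args))

    def stop(self):
        # Not thread-safe by itself; other threads reach it via add_callback.
        self._running = False

    def run(self):
        self._running = True
        while self._running:
            callback, args = self._pending.get()
            callback(*args)


loop = CallbackLoop()
seen_threads = []


def record_thread():
    seen_threads.append(threading.get_ident())


def worker():
    # Runs in another thread; hands work over to the loop thread.
    loop.add_callback(record_thread)
    loop.add_callback(loop.stop)


t = threading.Thread(target=worker)
t.start()
loop.run()  # runs in the main thread until the worker asks it to stop
t.join()
```

A real select()-based loop would also need some way to wake up the
blocked select call when a callback is queued, commonly a pipe or socket
pair that the loop itself listens on.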

> We also have distinct methods for running a callback after a timeout,
> although if you had a variant of add_handler that didn't require a
> subsequent call to remove_handler you could probably do timeouts using
> a magical IO object. (an additional subtlety for the time-based
> methods is how time is computed.  I recently added support in tornado
> to optionally use time.monotonic instead of time.time)

>> - In systems supporting file descriptors, there's a reactor
>> implementation that knows how to use select/poll/etc., and there are
>> concrete I/O object classes that wrap file descriptors. On Windows,
>> those would only be socket file descriptors. On Unix, any file
>> descriptor would do. To create such an I/O object you would use a
>> platform-specific factory. There would be specialized factories to
>> create e.g. listening sockets, connections, files, pipes, and so on.
> Jython is another interesting case - it has a select() function that
> doesn't take integer file descriptors, just the opaque objects
> returned by socket.fileno().


> While it's convenient to have higher-level constructors for various
> specialized types, I'd like to emphasize that having the low-level
> interface is important for interoperability.  Tornado doesn't know
> whether the file descriptors are listening sockets, connected sockets,
> or pipes, so we'd just have to pass in a file descriptor with no other
> information.

Yeah, the IO object will still need to have a fileno() method.

>> - In systems like App Engine that don't support async I/O on file
>> descriptors at all, the constructors for creating I/O objects for disk
>> files and connection sockets would comply with the interface but fake
>> out almost everything (just like today, using httplib or httplib2 on
>> App Engine works by adapting them to a "urlfetch" RPC request).
> Why would you be allowed to make IO objects for sockets that don't
> work?  I would expect that to just raise an exception.  On app engine
> RPCs would be the only supported async I/O objects (and timers, if
> those are implemented as magic I/O objects), and they're not
> implemented in terms of sockets or files.

Here's my use case. Suppose in general one can use async I/O for disk
files, and it is integrated with the standard (abstract) event loop.
So someone writes a handy templating library that wants to play nice
with async apps, so it uses the async I/O idiom to read e.g. the
template source code. Suppose I want to use that library on App
Engine. It would be a pain if I had to modify that template-reading
code to not use the async API. But (given the right async API!) it
would be pretty simple for the App Engine API to provide a mock
implementation of the async file reading API that was synchronous
under the hood. Yes, it would block while waiting for disk, but App
Engine uses threads anyway so it wouldn't be a problem.
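
To make the idea concrete, here is a sketch of such a mock: an
async-looking file read that is synchronous under the hood and hands back
an already-completed future. The function name read_file_async is
hypothetical, and concurrent.futures.Future is used only as a convenient
stand-in for whatever future type the eventual API would define.

```python
import os
import tempfile
from concurrent.futures import Future


def read_file_async(path):
    """Hypothetical async-style API that is synchronous under the hood."""
    future = Future()
    try:
        with open(path, "rb") as f:
            future.set_result(f.read())  # blocks here, before returning
    except OSError as exc:
        future.set_exception(exc)
    return future


# Caller code is written against the Future interface only.
with tempfile.NamedTemporaryFile("wb", suffix=".tmpl", delete=False) as tmp:
    tmp.write(b"Hello, {{ name }}!")
template_future = read_file_async(tmp.name)
missing_future = read_file_async(tmp.name + ".does-not-exist")
os.unlink(tmp.name)
```

The library calling read_file_async() cannot tell that the read already
happened; it only sees a future that is done by the time it looks.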

Another, current-day, use case is the httplib interface in the stdlib
(a fairly fancy HTTP/1.1 client, although it has its flaws). That's
based on sockets, which App Engine doesn't have; we have a "urlfetch"
RPC that you give a URL (and more optional stuff) and that returns a record
containing the contents and headers. But again, many useful 3rd party
libraries use httplib, and they won't work unless we somehow support
httplib. So we have had to go out of our way to cover most uses of
httplib. While the app believes it is opening the connection and
sending the request, we are actually just buffering everything; and
when the app starts reading from the connection, we make the urlfetch
RPC and buffer the response, which we then feed back to the app as it
believes it is reading from the socket. As long as the app doesn't try
to get the socket's file descriptor and call select() it will work

But some libraries *do* call select(), and here our emulation breaks
down. It would be nicer if the standard way to do async stuff was
higher level than select(), so that we could offer the emulation at a
level that would integrate with the event loop -- that way, ideally
when we have to send the urlfetch RPC we could actually return a
Future (or whatever), and the task would correctly be suspended, just
*thinking* it was waiting for the response on a socket, but actually
waiting for the RPC.

Hopefully SSL provides another use case.

--Guido van Rossum (

From jeanpierreda at  Sun Oct 14 19:26:16 2012
From: jeanpierreda at (Devin Jeanpierre)
Date: Sun, 14 Oct 2012 13:26:16 -0400
Subject: [Python-ideas] The async API of the future: Twisted and
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 14, 2012 at 11:53 AM, Guido van Rossum <guido at> wrote:
>> My experience has been unfortunately rather devoid of deferreds in
>> Twisted. I always feel like the odd one out when people discuss this
>> confusion. For me, it was all Protocol this and Protocol that, and
>> deferreds only came up when I used Twisted's great AMP (Asynchronous
>> Messaging Protocol) library.
> Especially odd since you jumped into the discussion when I called
> Deferreds a bad name. :-)

Did I mention how great AMP was? ;)

> I'm sorry, but that's not very readable at all. You needed a lambda
> (which if there was anything more would have to be expanded using
> 'def') and you're cheating by passing print as a callable (which saves
> you a second lambda, but only in this simple case).
> A readable version of this should not have to use lambdas.

Sure. I probably erred in not using the inlineCallbacks form; what I
wanted to do was highlight the gatherResults function (which, as it
happens, does something generators can't without invoking an external

My worry here was that generators are being praised for being more
readable, which is true and reasonable, but I don't know that they're
flexible enough to be the only way to do things. But you've stated now
that you'd want futures to be there too, so... those are probably
mostly flexible enough.

>> Egh. I mean, sure, suppose we have those things. But what if you want
>> to send the result of a callback to a generator-coroutine? Presumably
>> generator coroutines work by yielding deferreds and being called back
>> when the future resolves (deferred fires).
> No, they don't use deferreds. They use Futures. You've made it quite
> clear that they are very different.

Haha, different in API and what they can do, but they are meant to do
the same thing (represent delayed results). I meant to talk about
futures and deferreds equally, and ask the same questions of both of
them.

>> But if those
>> futures/deferreds aren't exposed, and instead only the generator
>> stuff is exposed, then bridging the gap between callbacks and
>> generator-coroutines is impossible. So every callback function has to
>> also be defined to use something else. And worse, other APIs using
>> callbacks are left in the dust.
> My plan is that the Futures *will* be exposed -- this is what
> worked well in NDB.

OK. I was confused when you said there would only be generators and
simple callbacks (and so I posed questions about what happens when you
have just generators, which you took to be questions aimed at Greg
Ewing's thing.)

> How about this:
>   f = <some future>
>   reactor.timer(10, f.set_result, None)
> Then whoever waits for f gets woken up in 10 seconds, and the reactor
> doesn't have to know what Futures are.

I know that Twisted has historically agreed with the idea that the
reactor shouldn't know about futures/deferreds. I'm not sure I agree
it's so important. If the universal way of writing asynchronous code
is generator-coroutines, then the reactor should work well with this
and not require extra effort.
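
A small sketch of the quoted idea, with threading.Timer standing in for
the hypothetical reactor.timer() and concurrent.futures.Future standing
in for the future type. Both substitutions are assumptions made only so
the example runs on its own; the waiter here blocks on result() instead
of being a suspended coroutine.

```python
import threading
import time
from concurrent.futures import Future

f = Future()

# Stand-in for reactor.timer(delay, callback, *args): the timer only sees
# a plain callable and its arguments, never the Future itself.
timer = threading.Timer(0.05, f.set_result, args=(None,))

start = time.monotonic()
timer.start()
result = f.result(timeout=5)  # whoever waits on f wakes up when the timer fires
elapsed = time.monotonic() - start
```

The timer machinery needs no knowledge of futures; passing the bound
method f.set_result as a plain callback is the whole integration.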

> But I believe your whole argument may be based on a misreading of my
> proposal. *I* want plain callbacks, Futures, and coroutines, and an
> event loop that only knows about plain callbacks and IO objects (e.g.
> sockets).

You're correct.

>> Now here's another thing: suppose we have a list of "deferred events",
>> but instead of handling all 10 at once, we want to handle them "as
>> they arrive", and then synthesize a result at the bottom. How do you
>> do this with pure generator coroutines?
> Let's ask Greg that.

I meant to be asking about the situation you were proposing. I thought
it was just callbacks and generators, now we've added futures. Futures
sans chaining can definitely implement this, just maybe not as nicely
as how I'd do it.

The issue is that it's a reasonable thing to want to escape the
generator system in order to implement things that aren't "linear" the
way generator coroutines are. And if we escape the system, it should
be possible and easy to do a large variety of things.

But, on the plus side, I'm convinced that it's possible, and that the
necessary things will be exposed (even if it's very unpleasant,
there's always helper functions...).

Unless you do Greg's thing, then I'm worried again. I will read his
stuff later today or tomorrow.

(Unrelated: I'm not sure why I was so sure UnorderedEventList had to
be that ugly. It can use a for loop... oops.)

> The thing that worries me most is reimplementing httplib, urllib and
> so on to use all this new machinery *and* keep the old synchronous
> APIs working *even* if some code is written using the old style and
> some other code wants to use the new style.

(We're now deviating from futures and deferreds, but I think the part
I was taking was drawing to a close anyway)

Code that wants to use the old style can be integrated by calling it
in a separate thread, and that's fine. If the results should be used
in the asynchronous code, then have a thing that integrates with
threading so that when the thread returns (or fails with an exception)
it can notify a future/deferred of the outcome. Twisted has
deferToThread for this. It also has blockingCallFromThread if the
synchronous code wants to talk back to the asynchronous code. And that
leads me to this:

Imagine if, instead of having two implementations (one synchronous,
one not), we had only one (asynchronous), and then had some wrappers
to make it work as a synchronous implementation as well?

Here is an example of a synchronous program written in Python+Twisted,
where I wrap deferLater to be a blocking function (so that it is
similar to a time.sleep() followed by a function call).

The reactor is started in a separate thread, and is left to die
whenever the main thread dies (because thread daemons yay.)

    from __future__ import print_function
    import threading
    from twisted.internet import task, reactor
    from twisted.internet.threads import blockingCallFromThread

    def my_deferlater(reactor, time, callback, *args, **kwargs):
        return blockingCallFromThread(reactor,
            task.deferLater, reactor, time, callback, *args, **kwargs)

    # In reality, a global reactor for all threads is a terrible idea.
    # We'd want to instantiate a new reactor for
    # the reactor thread, and have a global REACTOR as well.
    # We'll just use this reactor.
    # This code will not work with any other twisted
    # code because of the global reactor shenanigans.

    # (But it'd work if we were able to have a reactor per thread.)


    REACTOR_THREAD = None

    def start_reactor():
        global REACTOR_THREAD
        if REACTOR_THREAD is not None:
            # could be an error, or not, depending on how you feel
            # this should be.
            return

        REACTOR_THREAD = threading.Thread(target=reactor.run, kwargs=dict(
                # signal handlers don't work if not in main thread.
                installSignalHandlers=False))
        REACTOR_THREAD.daemon = True # Probably really evil.
        REACTOR_THREAD.start()

    start_reactor()


    my_deferlater(reactor, 1, print, "This will print after 1 second!")
    my_deferlater(reactor, 1, print, "This will print after 2 seconds!")
    my_deferlater(reactor, 1, print, "This will print after 3 seconds!")

So maybe this is an option? It's really important that there not be
just one global reactor, and that multiple reactors can run at the
same time, for this to really work. But if that were done, then you
could have a single global reactor responsible for being the back end
of the new implementations of old synchronous APIs. Maybe it'd be
started whenever the first call is made to a synchronous function. And
maybe, to interoperate with some actual asynchronous code, you could
have a way to change which reactor acts as the global reactor for
synchronous APIs?
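
A stdlib-only sketch of the "start it on the first synchronous call"
idea. Everything here is hypothetical: a plain worker thread with a queue
stands in for a reactor, and blocking_call() stands in for a synchronous
wrapper around an asynchronous implementation.

```python
import queue
import threading
from concurrent.futures import Future

_loop_thread = None
_loop_lock = threading.Lock()
_work = queue.Queue()


def _loop():
    # Stand-in for a reactor: run queued calls and report outcomes.
    while True:
        func, args, future = _work.get()
        try:
            future.set_result(func(*args))
        except Exception as exc:
            future.set_exception(exc)


def _ensure_loop_started():
    # Started lazily on first use, not at import time.
    global _loop_thread
    with _loop_lock:
        if _loop_thread is None:
            _loop_thread = threading.Thread(target=_loop, daemon=True)
            _loop_thread.start()


def blocking_call(func, *args):
    """Run func in the background thread and block until it finishes."""
    _ensure_loop_started()
    future = Future()
    _work.put((func, args, future))
    return future.result()  # re-raises the exception if func failed


started_before = _loop_thread is not None
answer = blocking_call(pow, 2, 5)
started_after = _loop_thread is not None

try:
    blocking_call(int, "not a number")
    error = None
except ValueError as exc:
    error = exc
```

The background thread is a daemon, so it dies with the main thread, which
matches the behavior described for the Twisted example above.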

I did this once, because I needed to rewrite a blocking API and wanted
to use Twisted, except that I made the mistake of starting the thread
when the module was created instead of on first call. This led to a
deadlock because of the global import lock... :(  In principle I don't
know why this would be a terrible awful idea, if it was done right,
but maybe people with more experience with threaded code can correct
me.

(The whole thread daemon thing, necessary to make it act like a
synchronous program, might be terribly insane and therefore an idea
killer. I'm not sure.)

My understanding is that the global import lock won't cause
this particular issue anymore as of Python 3.3, so perhaps starting a
reactor on import is reasonable.

> There's also a certain order to them, right? I'd think the state
> transition diagram is something like
>   connectionMade (1); dataReceived (*); connectionLost (1)
> I wonder if there are any guarantees that they will only be called in
> this order, and who is supposed to enforce this? It would be awkward
> if the user code would have to guard itself against this; also if the
> developer made an unwarranted assumption (e.g. dataReceived is called
> at least once).

The docs in Twisted don't spell it out, but they do say that
connectionMade should be considered to be the initializer for the
connection, and that upon connectionLost one should let the
protocol be garbage collected. So, that seems like a guarantee that
they are called in that order.

I don't think it can really be enforced in Python (unless you want to
do some jiggery pokery into model checking at runtime), but the
responsibility for this failing in Twisted would be on the transport,
as far as I understand it. If the transport calls back to the protocol
in some invalid combination, it's the transport's fault for being

This is something that should be clearly documented. (It's an issue,
also, regardless of whether or not a class is used to encapsulate the
callbacks, or whether they are registered individually.)
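
For what it's worth, a runtime check of that ordering is not much code.
The sketch below is a hypothetical wrapper (OrderCheckingProtocol is not
part of Twisted) that sits between a transport and a protocol and raises
if the callbacks arrive outside the connectionMade (1); dataReceived (*);
connectionLost (1) sequence.

```python
class OrderCheckingProtocol:
    """Hypothetical wrapper that rejects out-of-order callbacks."""

    def __init__(self, protocol):
        self._protocol = protocol
        self._state = "new"

    def _expect(self, event, allowed):
        if self._state not in allowed:
            raise RuntimeError("%s called in state %r" % (event, self._state))

    def connectionMade(self):
        self._expect("connectionMade", ("new",))
        self._state = "connected"
        self._protocol.connectionMade()

    def dataReceived(self, data):
        self._expect("dataReceived", ("connected",))
        self._protocol.dataReceived(data)

    def connectionLost(self, reason):
        self._expect("connectionLost", ("connected",))
        self._state = "closed"
        self._protocol.connectionLost(reason)


class Recorder:
    """Toy protocol that just records what it was told."""

    def __init__(self):
        self.events = []

    def connectionMade(self):
        self.events.append("made")

    def dataReceived(self, data):
        self.events.append(data)

    def connectionLost(self, reason):
        self.events.append("lost")


recorder = Recorder()
checked = OrderCheckingProtocol(recorder)
checked.connectionMade()
checked.dataReceived(b"ping")
checked.connectionLost(None)

try:
    checked.dataReceived(b"late")  # a buggy transport calling after close
    late_error = None
except RuntimeError as exc:
    late_error = exc
```

With a wrapper like this, user code does not have to guard itself; a
misbehaving transport fails loudly at the point of the bad call.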

-- Devin

From tjreedy at  Sun Oct 14 19:27:34 2012
From: tjreedy at (Terry Reedy)
Date: Sun, 14 Oct 2012 13:27:34 -0400
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <k5esm7$lkf$>

On 10/14/2012 10:36 AM, Guido van Rossum wrote:

> So, can par() be as simple as
> def par(*args):
>    results = []
>    for task in args:
>      result = yield from task
>      results.append(result)
>    return results
> ???
> Or does it need to interact with the scheduler to ensure fairness?
> (Not having built one of these, my intuition for how the primitives
> fit together is still lacking, so excuse me for asking naive
> questions.)
> Of course there's the question of what to do when one of the tasks
> raises an error -- I haven't quite figured that out in NDB either, it
> runs all the tasks to completion but the caller only sees the first
> exception. I briefly considered having a "multi-exception" but it
> felt too weird -- though I'm not married to that decision.

One answer is to append the exception object to results and let the 
requesting code sort out what to do.

def par(*args):
    results = []
    for task in args:
        try:
            result = yield from task
            results.append(result)
        except Exception as exc:
            results.append(exc)
    return results

Terry Jan Reedy

From guido at  Sun Oct 14 19:33:55 2012
From: guido at (Guido van Rossum)
Date: Sun, 14 Oct 2012 10:33:55 -0700
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 14, 2012 at 9:15 AM, Eric Snow <ericsnowcurrently at> wrote:
> On Oct 14, 2012 8:42 AM, "Guido van Rossum" <guido at> wrote:
>> Sadly it looks like
>>   r = yield from (f1(), f2())
>> ends up interpreting the tuple as the iterator, and you end up with
>>   r = (f1(), f2())
>> (i.e., a tuple of generators) rather than the desired
>>  r = ((yield from f1()), (yield from f2()))
> Didn't want this tangent to get lost to the async discussion.  Would it be
> too late to make a change along these lines?  Would it be enough of an
> improvement to be warranted?

3.3 has been released. It's too late. Also I'm not sure what change
*could* be made. Surely yield from <a function returning a tuple>
should just iterate over that tuple -- that's fundamental to yield
from. The only thing that could be done might be to change "yield from
x, y" to mean something different than "yield from (x, y)" -- but
that's questionable at best, and violates many other contexts (e.g.
"return x, y", "yield x, y", "for i in x, y:").
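
A small demonstration of the behavior under discussion, using throwaway
generators f1 and f2 and a hypothetical drive() helper that simply calls
next() until the generator finishes. When driven this way on Python 3.3
or later, the tuple's items (the generator objects) are what get yielded
outward, and the yield-from expression itself evaluates to None, because
a tuple iterator finishes without a return value.

```python
import types


def f1():
    yield "f1 step"
    return 1


def f2():
    yield "f2 step"
    return 2


def naive():
    # The tuple is the iterable: its items are yielded as-is, unstarted.
    r = yield from (f1(), f2())
    return r


def explicit():
    # Each generator is delegated to in turn; r collects their return values.
    r = ((yield from f1()), (yield from f2()))
    return r


def drive(gen):
    """Run gen with next() only; return (list of yielded values, return value)."""
    yielded = []
    while True:
        try:
            yielded.append(next(gen))
        except StopIteration as stop:
            return yielded, stop.value


naive_yielded, naive_value = drive(naive())
explicit_yielded, explicit_value = drive(explicit())
```

In the naive form f1 and f2 never run at all; only the explicit form
delegates to them and collects (1, 2).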

--Guido van Rossum (

From guido at  Sun Oct 14 19:39:59 2012
From: guido at (Guido van Rossum)
Date: Sun, 14 Oct 2012 10:39:59 -0700
Subject: [Python-ideas] The async API of the future: Twisted and
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 14, 2012 at 9:18 AM, Laurens Van Houtven <_ at> wrote:
> On Sun, Oct 14, 2012 at 5:53 PM, Guido van Rossum <guido at> wrote:
>> A readable version of this should not have to use lambdas.
> In a lot of Twisted code, it happens with methods as callback methods,
> something like:
> d = self._doRPC(....)
> d.addCallbacks(self._formatResponse, self._formatException)
> d.addCallback(self._finish)
> That doesn't talk about gatherResults, but hopefully it makes the idea
> clear. A lot of the legibility is dependent on making those method names
> sensible, though. Our in-house style guide asks for limiting functions to
> about ten lines, preferably half that. Works for us.

I quite understand that in your ecosystem you've found best practices
for every imaginable use case. And I understand that once you're part
of the community and have internalized the idioms and style, it's
quite readable. But you haven't shaken my belief that we can do better
with the current version of the language (3.3).

(FWIW, I think it would be a good idea to develop a "reference
implementation" of many of these ideas outside the standard library.
Depending on whether we end up adopting yield <future> or yield from
<generator> it might even support versions of Python 3 before 3.3. I
certainly don't want to have to wait for 3.4 -- although that's the
first opportunity for incorporating it into the stdlib.)

--Guido van Rossum (

From guido at  Sun Oct 14 19:42:25 2012
From: guido at (Guido van Rossum)
Date: Sun, 14 Oct 2012 10:42:25 -0700
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <k5esm7$lkf$>
References: <>
Message-ID: <>

On Sun, Oct 14, 2012 at 10:27 AM, Terry Reedy <tjreedy at> wrote:
> On 10/14/2012 10:36 AM, Guido van Rossum wrote:
>> So, can par() be as simple as
>> def par(*args):
>>    results = []
>>    for task in args:
>>      result = yield from task
>>      results.append(result)
>>    return results
>> ???
>> Or does it need to interact with the scheduler to ensure fairness?
>> (Not having built one of these, my intuition for how the primitives
>> fit together is still lacking, so excuse me for asking naive
>> questions.)
>> Of course there's the question of what to do when one of the tasks
>> raises an error -- I haven't quite figured that out in NDB either, it
>> runs all the tasks to completion but the caller only sees the first
>> exception. I briefly considered having a "multi-exception" but it
>> felt too weird -- though I'm not married to that decision.
> One answer is to append the exception object to results and let the
> requesting code sort out what to do.
> def par(*args):
>    results = []
>    for task in args:
>       try:
>          result = yield from task
>          results.append(result)
>       except Exception as exc:
>          results.append(exc)
>    return results

But then the caller would have to sort through the results and check
for exceptions. I want the caller to be able to use try/except as

So far the best I've come up with is to recommend that if you care
about distinguishing multiple exceptions, use separate yields
surrounded by separate try/except blocks. Note that the tasks can
still run concurrently, just create all the futures before doing the
first yield.
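
A sketch of that recommendation, assuming the "yield <future>" style.
The run() driver is a deliberately trivial, blocking stand-in for a real
scheduler, and ThreadPoolExecutor is used only as a convenient source of
futures; neither is meant as the proposed API.

```python
from concurrent.futures import ThreadPoolExecutor


def run(gen):
    """Trivial driver: wait on each yielded future, send result or throw error."""
    try:
        future = next(gen)
        while True:
            try:
                result = future.result()
            except Exception as exc:
                future = gen.throw(exc)
            else:
                future = gen.send(result)
    except StopIteration as stop:
        return stop.value


def task(pool):
    # Create all the futures before the first yield so they run concurrently.
    f1 = pool.submit(int, "42")
    f2 = pool.submit(int, "not a number")
    # Separate yields, each with its own try/except.
    try:
        a = yield f1
    except ValueError:
        a = None
    try:
        b = yield f2
    except ValueError:
        b = None
    return a, b


with ThreadPoolExecutor(max_workers=2) as pool:
    outcome = run(task(pool))
```

Each failure is handled where its result is awaited, so the caller never
has to sift through a mixed list of results and exception objects.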

--Guido van Rossum (

From solipsis at  Sun Oct 14 19:53:26 2012
From: solipsis at (Antoine Pitrou)
Date: Sun, 14 Oct 2012 19:53:26 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
References: <>
	<k4pr17$i94$> <>
	<> <k4q3o6$m9q$>
	<> <>
Message-ID: <>

On Sun, 14 Oct 2012 08:48:53 -0700
Ethan Furman <ethan at> wrote:
> What behavior can I expect with your Path implementation when I try to 
> iterate over
> /usr/home/ethanf/some_table.dbf

>>> p = Path('')
>>> list(p)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "./", line 1176, in __iter__
    for name in self._accessor.listdir(self):
  File "./", line 455, in wrapped
    return strfunc(str(pathobj), *args)
NotADirectoryError: [Errno 20] Not a directory: ''



From jstpierre at  Sun Oct 14 19:55:38 2012
From: jstpierre at (Jasper St. Pierre)
Date: Sun, 14 Oct 2012 13:55:38 -0400
Subject: [Python-ideas] The async API of the future: Twisted and
Message-ID: <>

(Sorry if this is in the wrong place, I'm joining the conversation and
I'm not sure where mailman will put it)

> Alternatively, yielding a future (or whatever one calls the objects
> returned by *_async()) could register *and* wait for the result.  To
> register without waiting one would yield a wrapper for the future.  So
> one could write

What would registering a Future do? As far as I understood it, the
plan here is that a Future was just a marker for an outstanding
request:

    def callback(result):
        print "The result was", result

    def say_hello(name):
        f = Future()
        f.resolve("Hello, %s!" % name)
        return f

    f = say_hello("Jeff")
    f.add_callback(callback)

The outstanding request doesn't have to care about socket connections;
it's just a way to pass around a result that hasn't arrived yet. This
is pretty much the same as Deferreds/Promises, with a different name.
There's no reactor to register with here, because there doesn't need
to be one.
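
To show that such an object really needs no event loop, here is a
minimal sketch of a Future with the resolve() and add_callback() methods
used above. The class is hypothetical (it is not concurrent.futures.Future
or a Twisted Deferred) and ignores errors, chaining and thread safety.

```python
class Future:
    """Hypothetical minimal future: a result slot plus callbacks, no event loop."""

    _PENDING = object()

    def __init__(self):
        self._result = self._PENDING
        self._callbacks = []

    def add_callback(self, callback):
        if self._result is self._PENDING:
            self._callbacks.append(callback)
        else:
            callback(self._result)  # already resolved: call immediately

    def resolve(self, result):
        if self._result is not self._PENDING:
            raise RuntimeError("future already resolved")
        self._result = result
        callbacks, self._callbacks = self._callbacks, []
        for callback in callbacks:
            callback(result)


def say_hello(name):
    f = Future()
    f.resolve("Hello, %s!" % name)
    return f


received = []
say_hello("Jeff").add_callback(received.append)  # resolved before registration

pending = Future()
pending.add_callback(received.append)            # registered before resolution
pending.resolve("later")
```

Callbacks fire whether they are attached before or after the result
arrives, which is all the "marker for an outstanding request" has to do.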


From guido at  Sun Oct 14 20:17:51 2012
From: guido at (Guido van Rossum)
Date: Sun, 14 Oct 2012 11:17:51 -0700
Subject: [Python-ideas] The async API of the future: Twisted and
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 14, 2012 at 10:55 AM, Jasper St. Pierre
<jstpierre at> wrote:
> (Sorry if this is in the wrong place, I'm joining the conversation and
> I'm not sure where mailman will put it)
>> Alternatively, yielding a future (or whatever one calls the objects
>> returned by *_async()) could register *and* wait for the result.  To
>> register without waiting one would yield a wrapper for the future.  So
>> one could write
> What would registering a Future do? As far as I understood it, the
> plan here is that a Future was just a marker for an outstanding
> request:
>     def callback(result):
>         print "The result was", result
>     def say_hello(name):
>         f = Future()
>         f.resolve("Hello, %s!" % name)
>         return f
>     f = say_hello("Jeff")
>     f.add_callback(callback)
> The outstanding request doesn't have to care about socket connections;
> it's just a way to pass around a result that hasn't arrived yet. This
> is pretty much the same as Deferreds/Promises, with a different name.
> There's no reactor to register with here, because there doesn't need
> to be one.

The Future class itself probably shouldn't interface with the event
loop. But an operation that creates and returns a Future certainly
can.

--Guido van Rossum (

From jstpierre at  Sun Oct 14 20:19:50 2012
From: jstpierre at (Jasper St. Pierre)
Date: Sun, 14 Oct 2012 14:19:50 -0400
Subject: [Python-ideas] The async API of the future: Twisted and
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 14, 2012 at 2:17 PM, Guido van Rossum <guido at> wrote:
> On Sun, Oct 14, 2012 at 10:55 AM, Jasper St. Pierre
> <jstpierre at> wrote:
>> (Sorry if this is in the wrong place, I'm joining the conversation and
>> I'm not sure where mailman will put it)
>>> Alternatively, yielding a future (or whatever one calls the objects
>>> returned by *_async()) could register *and* wait for the result.  To
>>> register without waiting one would yield a wrapper for the future.  So
>>> one could write
>> What would registering a Future do? As far as I understood it, the
>> plan here is that a Future was just a marker for an outstanding
>> request:
>>     def callback(result):
>>         print "The result was", result
>>     def say_hello(name):
>>         f = Future()
>>         f.resolve("Hello, %s!" % name)
>>         return f
>>     f = say_hello("Jeff")
>>     f.add_callback(callback)
>> The outstanding request doesn't have to care about socket connections;
>> it's just a way to pass around a result that hasn't arrived yet. This
>> is pretty much the same as Deferreds/Promises, with a different name.
>> There's no reactor to register with here, because there doesn't need
>> to be one.
> The Future class itself probably shouldn't interface with the event
> loop. But an operation that creates and returns a Future certainly
> can.

Of course, but that wouldn't be done at the Future level, but at the
fetch_async level. I just want to make sure that we're clear that the
Future itself isn't being registered with any event loop or reactor.

> --
> --Guido van Rossum (


From guido at  Sun Oct 14 20:21:01 2012
From: guido at (Guido van Rossum)
Date: Sun, 14 Oct 2012 11:21:01 -0700
Subject: [Python-ideas] The async API of the future: Twisted and
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 14, 2012 at 11:19 AM, Jasper St. Pierre
<jstpierre at> wrote:
> On Sun, Oct 14, 2012 at 2:17 PM, Guido van Rossum <guido at> wrote:
>> The Future class itself probably shouldn't interface with the event
>> loop. But an operation that creates and returns a Future certainly
>> can.
> Of course, but that wouldn't be done at the Future level, but at the
> fetch_async level. I just want to make sure that we're clear that the
> Future itself isn't being registered with any event loop or reactor.

Of course.

--Guido van Rossum (

From ironfroggy at  Sun Oct 14 20:46:49 2012
From: ironfroggy at (Calvin Spealman)
Date: Sun, 14 Oct 2012 14:46:49 -0400
Subject: [Python-ideas] The async API of the future: PEP 3153 (async-pep)
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 13, 2012 at 1:54 PM, Laurens Van Houtven <_ at> wrote:
> On Sat, Oct 13, 2012 at 1:22 AM, Guido van Rossum <guido at> wrote:
>> [Hopefully this is the last spin-off thread from "asyncore: included
>> batteries don't fit"]
>> So it's totally unfinished?
> At the time, the people I talked to placed significantly more weight in
> "explain why this is necessary" than "get me something I can play with".
>> > Do you feel that there should be less talk about rationale?
>> No, but I feel that there should be some actual specification. I am
>> also looking forward to an actual meaty bit of example code -- ISTR
>> you mentioned you had something, but that it was incomplete, and I
>> can't find the link.
> Just examples of how it would work, nothing hooked up to real code. My
> memory of it is more of a drowning-in-politics-and-bikeshedding kind of
> thing, unfortunately :) Either way, I'm okay with letting bygones be bygones
> and focus on how we can get this show on the road.
>> > It's not that there's *no* reference to IO: it's just that that
>> > reference is
>> > abstracted away in data_received and the protocol's transport object,
>> > just
>> > like Twisted's IProtocol.
>> The words "data_received" don't even occur in the PEP.
> See above.
> What thread should I reply in about the pull APIs?
>> I just want to make sure that we don't *completely* paint ourselves into
>> the wrong corner when it comes to that.
> I don't think we have to worry about it too much. Any reasonable API I can
> think of makes this completely doable.
>> But I'm really hoping you'll make good on your promise of redoing
>> async-pep, giving some actual specifications and example code, so I
>> can play with it.
> Takeaways:
> - The async API of the future is very important, and too important to be
> left to chance.

Could not agree more.

> - It requires a lot of very experienced manpower.

I'm sitting on the sidelines, wishing I had much of either, because of
point number 1.

> - It requires a lot of effort to handle the hashing out of it (as we're
> doing here) as well as it deserves to be.
> I'll take as proactive a role as I can afford to take in this process, but I
> don't think I can do it by myself. Furthermore, it's a risk nobody wants to
> take: a repeat performance wouldn't be good for anyone, in particular not
> for Python nor myself.
> I've asked JP Calderone and Itamar Turner-Trauring if they would be
> interested in carrying this forward professionally, and they have
> tentatively said yes. JP's already familiar with a large part of the problem
> space with the implementation of the ssl module. JP and Itamar have worked
> together for years and have recently set up a consulting firm.
> Given that this is emphatically important to Python, I intend to apply for a
> PSF grant on their behalf to further this goal. Given their experience in
> the field, I expect this to be a fairly low risk endeavor.

I like this idea. There are some problems spare time isn't enough to solve.

I can't think of many people as qualified for the task.

>> --
>> --Guido van Rossum (
> --
> cheers
> lvh

Read my blog! I depend on your acceptance of my opinion! I am interesting!
Follow me if you're into that sort of thing:

From steve at  Sun Oct 14 21:14:19 2012
From: steve at (Steven D'Aprano)
Date: Mon, 15 Oct 2012 06:14:19 +1100
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
Message-ID: <>

On 15/10/12 03:15, Eric Snow wrote:
> On Oct 14, 2012 8:42 AM, "Guido van Rossum"<guido at>  wrote:
>> Sadly it looks that
>>    r = yield from (f1(), f2())
>> ends up interpreting the tuple as the iterator, and you end up with
>>    r = (f1(), f2())
>> (i.e., a tuple of generators) rather than the desired
>>   r = ((yield from f1()), (yield from f2()))

How about this?

r = yield from *(f1(), f2())

which currently is a SyntaxError in 3.3.
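
For anyone following along, here is a small sketch of how the two existing
spellings behave on Python 3.3 or later. The generators f1 and f2 and the
run() driver are made up for illustration; they are not part of any proposal
in this thread. As far as I can tell, in the tuple form the generator objects
are simply yielded out to whoever drives the outer generator, and r itself
ends up as None because a tuple iterator has no return value.

```python
def f1():
    yield "f1 step"
    return 1

def f2():
    yield "f2 step"
    return 2

def tuple_form():
    # The tuple is the iterable being delegated to, so its items (the two
    # generator objects) are yielded out unchanged and never run.
    r = yield from (f1(), f2())
    return r

def desired_form():
    # Each generator is delegated to in turn; r collects their return values.
    r = ((yield from f1()), (yield from f2()))
    return r

def run(gen):
    """Hypothetical driver: exhaust gen, return (yielded items, return value)."""
    seen = []
    while True:
        try:
            seen.append(next(gen))
        except StopIteration as stop:
            return seen, stop.value
```

Calling run(desired_form()) gives the steps of both generators and the pair
of their return values, which is the behaviour being asked for.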


From guido at  Sun Oct 14 21:19:04 2012
From: guido at (Guido van Rossum)
Date: Sun, 14 Oct 2012 12:19:04 -0700
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 14, 2012 at 12:14 PM, Steven D'Aprano <steve at> wrote:
> On 15/10/12 03:15, Eric Snow wrote:
>> On Oct 14, 2012 8:42 AM, "Guido van Rossum"<guido at>  wrote:
>>> Sadly it looks that
>>>    r = yield from (f1(), f2())
>>> ends up interpreting the tuple as the iterator, and you end up with
>>>    r = (f1(), f2())
>>> (i.e., a tuple of generators) rather than the desired
>>>   r = ((yield from f1()), (yield from f2()))
> How about this?
> r = yield from *(f1(), f2())
> which currently is a SyntaxError in 3.3.

I think it's too early to start proposing new syntax for a problem we
don't even know is common at all.

Greg Ewing's proposal works for me:

  r = yield from par(f1(), f2())

--Guido van Rossum (

From ethan at  Sun Oct 14 21:16:28 2012
From: ethan at (Ethan Furman)
Date: Sun, 14 Oct 2012 12:16:28 -0700
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

Antoine Pitrou wrote:
> On Sun, 14 Oct 2012 08:48:53 -0700
> Ethan Furman <ethan at> wrote:
>> What behavior can I expect with your Path implementation when I try to 
>> iterate over
>> /usr/home/ethanf/some_table.dbf
>>>> p = Path('')
>>>> list(p)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "./", line 1176, in __iter__
>     for name in self._accessor.listdir(self):
>   File "./", line 455, in wrapped
>     return strfunc(str(pathobj), *args)
> NotADirectoryError: [Errno 20] Not a directory: ''

Certainly reasonable, and the same behavior I would expect of, e.g., 
p.children(). I guess it just feels too magical to me.

-1 for built-in iteration.

+1 for a .children() (or other) method.
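
As a point of reference, the pathlib module that later shipped in Python 3.4
spells this as an explicit method, Path.iterdir(). A minimal sketch of the
behaviour under discussion, using a temporary directory and a made-up file
name in place of the real table:

```python
import tempfile
from pathlib import Path

outcome = "no error"
with tempfile.TemporaryDirectory() as tmp:
    table = Path(tmp) / "some_table.dbf"   # hypothetical stand-in file
    table.write_bytes(b"")

    # Listing the directory is an explicit method call, not iteration.
    children = sorted(p.name for p in Path(tmp).iterdir())

    # Asking a regular file for its children fails, as in the traceback above.
    try:
        list(table.iterdir())
    except NotADirectoryError:
        outcome = "NotADirectoryError"
```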


From tjreedy at  Sun Oct 14 21:38:20 2012
From: tjreedy at (Terry Reedy)
Date: Sun, 14 Oct 2012 15:38:20 -0400
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <k5f4bc$f88$>

On 10/14/2012 1:42 PM, Guido van Rossum wrote:
 > On Sun, Oct 14, 2012 at 10:27 AM, Terry Reedy <tjreedy at> wrote:
 >> On 10/14/2012 10:36 AM, Guido van Rossum wrote:
 >>> Of course there's the question of what to do when one of the tasks
 >>> raises an error -- I haven't quite figured that out in NDB either, it
 >>> runs all the tasks to completion but the caller only sees the first
 >>> exception. I briefly considered having an "multi-exception" but it
 >>> felt too weird -- though I'm not married to that decision.

 >> One answer is to append the exception object to results and let the
 >> requesting code sort out what to do.
 >> def par(*args):
 >>     results = []
 >>     for task in args:
 >>        try:
 >>           result = yield from task
 >>           results.append(result)
 >>        except Exception as exc:
 >>           results.append(exc)
 >>     return results
 > But then the caller would have to sort through the results and check
 > for exceptions. I want the caller to be able to use try/except as
 > well.

OK. Then ...

   def par(*args):
      results = []
      exceptions = False
      for task in args:
         try:
            result = yield from task
            results.append(result)
         except Exception as exc:
            results.append(exc)
            exceptions = True
      if not exceptions:
         return results
      else:
         exc = MultiXException()
         exc.results = results
         raise exc

Is this what you meant by 'multi-exception'?


caller:
try:
    results = <whatever>
    <process results, perhaps by iterating thru them, knowing all
    represent successes>
except MultiXException as exc:
    errprocess(exc.results)
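
To make the idea concrete, here is a self-contained sketch that runs on
Python 3.3 or later. MultiXException is not an existing class; it is defined
here as a hypothetical exception carrying a results list. The ok(), boom()
and run() helpers are likewise invented for the demonstration, and this par()
still runs the tasks one after another rather than in parallel.

```python
class MultiXException(Exception):
    """Hypothetical exception holding every task's outcome in .results."""
    def __init__(self, results):
        super().__init__("one or more tasks failed")
        self.results = results

def par(*tasks):
    results = []
    failed = False
    for task in tasks:
        try:
            results.append((yield from task))
        except Exception as exc:
            # Keep going: remember the exception in place of a result.
            results.append(exc)
            failed = True
    if failed:
        raise MultiXException(results)
    return results

def ok(value):
    yield            # pretend to wait for something
    return value

def boom():
    yield
    raise ValueError("boom")

def run(gen):
    """Toy driver: step gen to completion and hand back its return value."""
    try:
        while True:
            next(gen)
    except StopIteration as stop:
        return stop.value
```

With these definitions, run(par(ok(1), ok(2))) returns the plain list, while
a failing task makes the caller land in "except MultiXException", where
exc.results holds a mix of values and exception objects.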

From mwm at  Sun Oct 14 21:57:38 2012
From: mwm at (Mike Meyer)
Date: Sun, 14 Oct 2012 14:57:38 -0500
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, 14 Oct 2012 07:40:57 +0200
Yuval Greenfield <ubershmekel at> wrote:

> On Sun, Oct 14, 2012 at 2:04 AM, MRAB <python at> wrote:
> > If it's more than one codepoint, we could prefix with the length of the
> > codepoint's name:
> >
> > def __12CIRCLED_PLUS__(x, y):
> >     ...
> >
> >
> That's a bit impractical, and why reinvent the wheel? I'd much rather:
> def \u2295(x, y):
>     ....
> So readable I want to read it twice. And that's not legal python today so
> we don't break backwards compatibility!

Yes, but we're defining an operator for instances of the class, so it
needs the 'special' method marking:

def __\u2295__(self, other):

Now *that's* pretty!

Mike Meyer <mwm at>
Independent Software developer/SCM consultant, email for more information.

O< ascii ribbon campaign - stop html mail -

From tismer at  Sun Oct 14 22:55:34 2012
From: tismer at (Christian Tismer)
Date: Sun, 14 Oct 2012 22:55:34 +0200
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
Message-ID: <>


On 14.10.12 21:19, Guido van Rossum wrote:
> On Sun, Oct 14, 2012 at 12:14 PM, Steven D'Aprano <steve at> wrote:
>> On 15/10/12 03:15, Eric Snow wrote:
>>> On Oct 14, 2012 8:42 AM, "Guido van Rossum"<guido at>  wrote:
>>>> Sadly it looks that
>>>>     r = yield from (f1(), f2())
>>>> ends up interpreting the tuple as the iterator, and you end up with
>>>>     r = (f1(), f2())
>>>> (i.e., a tuple of generators) rather than the desired
>>>>    r = ((yield from f1()), (yield from f2()))
>> How about this?
>> r = yield from *(f1(), f2())
>> which currently is a SyntaxError in 3.3.
> I think it's too early to start proposing new syntax for a problem we
> don't even know is common at all.
> Greg Ewing's proposal works for me:
>    r = yield from par(f1(), f2())

I'm not very positive about all I've read in the last 50 hours.

The concept of generators IMHO gets overly bent towards modelling
a sensible syntax for a problem that has not even had a convincing
solution in a dialect that already has full coroutines.
'par' and 'join' and friends should be considered without thinking
of generators in the first place. This is attacking too many problems
in one shot.

My approach would be to first find out how async operations should
best be modelled, under the assumption that we have a coroutine
concept that works without headaches about yielding in and out from
something to whatnot.

After that is settled and gets consensus, then I would think about
bearable patterns to implement that using generators. And when we
know what we really need, maybe consider more suitable syntax.

my 0.2 thousand yen - chris

Christian Tismer             :^)   <mailto:tismer at>
Software Consulting          :     Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121     :    *Starship*
14482 Potsdam                :     PGP key ->
phone +49 173 24 18 776  fax +49 (30) 700143-0023
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
       whom do you want to sponsor today?

From at  Sun Oct 14 23:06:44 2012
From: at (Joshua Landau)
Date: Sun, 14 Oct 2012 22:06:44 +0100
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On 14 October 2012 20:57, Mike Meyer <mwm at> wrote:

> On Sun, 14 Oct 2012 07:40:57 +0200
> Yuval Greenfield <ubershmekel at> wrote:
> > On Sun, Oct 14, 2012 at 2:04 AM, MRAB <python at>
> wrote:
> >
> > > If it's more than one codepoint, we could prefix with the length of the
> > > codepoint's name:
> > >
> > > def __12CIRCLED_PLUS__(x, y):
> > >     ...
> > >
> > >
> > That's a bit impractical, and why reinvent the wheel? I'd much rather:
> >
> > def \u2295(x, y):
> >     ....
> >
> > So readable I want to read it twice. And that's not legal python today so
> > we don't break backwards compatibility!
> Yes, but we're defining an operator for instances of the class, so it
> needs the 'special' method marking:
> def __\u2295__(self, other):
> Now *that's* pretty!
>     <mike

I much preferred your first choice:
def __$⊕__(self, other):

But to keep the "$" unused we can write:
def __op_⊕__(self, other):
(new methods will take precedence over the older __add__ and so forth)

What we can do then is use the "\u" syntax to let people without unicode
editors have accessibility:
def __op_\u2295__(self, other):
...later in the code...
new = first \u2295 second

Which adds consistency whereas before we could only use that in
specific circumstances (inside strings), reducing cognitive burden.

From guido at  Sun Oct 14 23:30:21 2012
From: guido at (Guido van Rossum)
Date: Sun, 14 Oct 2012 14:30:21 -0700
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <k5f4bc$f88$>
References: <>
Message-ID: <>

On Sun, Oct 14, 2012 at 12:38 PM, Terry Reedy <tjreedy at> wrote:
> On 10/14/2012 1:42 PM, Guido van Rossum wrote:
>> On Sun, Oct 14, 2012 at 10:27 AM, Terry Reedy <tjreedy at> wrote:
>>> On 10/14/2012 10:36 AM, Guido van Rossum wrote:
>>>> Of course there's the question of what to do when one of the tasks
>>>> raises an error -- I haven't quite figured that out in NDB either, it
>>>> runs all the tasks to completion but the caller only sees the first
>>>> exception. I briefly considered having an "multi-exception" but it
>>>> felt too weird -- though I'm not married to that decision.
>>> One answer is to append the exception object to results and let the
>>> requesting code sort out what to do.
>>> def par(*args):
>>>     results = []
>>>     for task in args:
>>>        try:
>>>           result = yield from task
>>>           results.append(result)
>>>        except Exception as exc:
>>>           results.append(exc)
>>>     return results
>> But then the caller would have to sort through the results and check
>> for exceptions. I want the caller to be able to use try/except as
>> well.
> OK. Then ...
>   def par(*args):
>      results = []
>      exceptions = False
>      for task in args:
>         try:
>            result = yield from task
>            results.append(result)
>         except Exception as exc:
>            results.append(exc)
>            exceptions = True
>      if not exceptions:
>         return results
>      else:
>         exc = MultiXException()
>         exc.results = results
>         raise exc
> Is this what you meant by 'multi-exception'?


> caller:
> try:
>    results = <whatever>
>    <process results, perhaps by iterating thru them, knowing all represent
> successes>
> except MultiXException as exc:
>    errprocess(exc.results)

In NDB I have yet to encounter a situation where I care.

--Guido van Rossum (

From rene at  Sun Oct 14 23:55:53 2012
From: rene at (Rene Nejsum)
Date: Sun, 14 Oct 2012 23:55:53 +0200
Subject: [Python-ideas] The async API of the future: Reactors
In-Reply-To: <>
References: <>
Message-ID: <>

On Oct 14, 2012, at 9:22 PM, Guido van Rossum <guido at> wrote:

> On Sun, Oct 14, 2012 at 10:51 AM, Rene Nejsum <rene at> wrote:
>> On the high level (Python) basically what you need is that the queue.get()
>> can handle:
>> 1) Python objects (as today)
>> 2) timeout (as today, maybe in mills instead of seconds)
>> 3) Network (socket input/state change)
>> 4) File desc input/state change
>> 5) Other I/O changes like serial comm, etc.
>> 6) Maybe also yield based coroutine support ?
>> This requires support from the underlying
>> OS. A support which is probably not there today ?
>> As far as I can see, having this one extended queue.get() would nicely enable
>> all high level concurrency issues in Python.
> [...]
>> I believe a "super" queue.get() would solve all use cases.
>> I have no idea on how difficult it would be to implement in
>> a cross platform manner.
> Hm. I know that a common (and often right!) recommendation for thread
> communication is to use the queue module. But that module is meant to
> work with threads. I think that the correct I/O primitives are more
> likely to come by looking at what Tornado and Twisted have done than
> by trying to "pimp up" the queue module -- it's good for what it does,
> but trying to add all that new functionality to it doesn't sound like
> a good fit.

You are probably right about the queue class. Maybe it should be a new class,
but I still believe it would be an excellent fit for doing concurrent stuff if Python
had a multiplexer message queue, Python is high-level enough to be able to 
hide thread/select/read etc.

A while ago I implemented pyworks ( which
is a kind of Erlang implementation for Python, making objects concurrent and return
values Futures, without adding much new code. Methods are sent asynchronous, simply
by doing standard obj.method(). obj is a proxy for the real object sending method() as a
message to the real object running in a separate thread. Return value is a Future. So 
you can do

	val = obj.method()
	... continue async with method()
	... and do some other stuff, until:
	print val

which will hang waiting for the Future to complete, if it's not.
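
A rough sketch of the proxy idea, for readers who have not seen it. This is
not the pyworks code; ActorProxy and Counter are names invented here, and the
sketch leans on concurrent.futures from the standard library. Unlike the
implicit futures described above, the caller has to ask for .result()
explicitly.

```python
from concurrent.futures import ThreadPoolExecutor

class ActorProxy:
    """Hypothetical proxy: method calls run on one worker thread, return Futures."""

    def __init__(self, target):
        self._target = target
        # A single worker means messages are handled one at a time, in order.
        self._executor = ThreadPoolExecutor(max_workers=1)

    def __getattr__(self, name):
        method = getattr(self._target, name)

        def send(*args, **kwargs):
            # Queue the call as a message and return immediately.
            return self._executor.submit(method, *args, **kwargs)

        return send

    def close(self):
        self._executor.shutdown()

class Counter:
    def __init__(self):
        self.n = 0

    def add(self, k):
        self.n += k
        return self.n

obj = ActorProxy(Counter())
val = obj.add(5)        # returns a Future right away
# ... do some other stuff here ...
total = val.result()    # blocks until the worker thread has finished
obj.close()
```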

It has been used in a couple of projects, making it much easier to do concurrent systems.
But, it would be great if the object/task could wait for more events than queue.get()


> -- 
> --Guido van Rossum (

From guido at  Mon Oct 15 00:05:26 2012
From: guido at (Guido van Rossum)
Date: Sun, 14 Oct 2012 15:05:26 -0700
Subject: [Python-ideas] The async API of the future: Reactors
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 14, 2012 at 2:55 PM, Rene Nejsum <rene at> wrote:
> On Oct 14, 2012, at 9:22 PM, Guido van Rossum <guido at> wrote:
>> On Sun, Oct 14, 2012 at 10:51 AM, Rene Nejsum <rene at> wrote:
>>> On the high level (Python) basically what you need is that the queue.get()
>>> can handle:
>>> 1) Python objects (as today)
>>> 2) timeout (as today, maybe in mills instead of seconds)
>>> 3) Network (socket input/state change)
>>> 4) File desc input/state change
>>> 5) Other I/O changes like serial comm, etc.
>>> 6) Maybe also yield based coroutine support ?
>>> This requires support from the underlying
>>> OS. A support which is probably not there today ?
>>> As far as I can see, having this one extended queue.get() would nicely enable
>>> all high level concurrency issues in Python.
>> [...]
>>> I believe a "super" queue.get() would solve all use cases.
>>> I have no idea on how difficult it would be to implement in
>>> a cross platform manner.
>> Hm. I know that a common (and often right!) recommendation for thread
>> communication is to use the queue module. But that module is meant to
>> work with threads. I think that the correct I/O primitives are more
>> likely to come by looking at what Tornado and Twisted have done than
>> by trying to "pimp up" the queue module -- it's good for what it does,
>> but trying to add all that new functionality to it doesn't sound like
>> a good fit.
> You are probably right about the queue class. Maybe it should be a new class,
> but I still believe it would be an excellent fit for doing concurrent stuff if Python
> had a multiplexer message queue, Python is high-level enough to be able to
> hide thread/select/read etc.

I believe that the Twisted and Tornado event loops have APIs to push
work into a thread and/or process, and it will be a requirement for
the new stdlib event loop. However the main focus of the current
effort is not making the distinction between process, threads and
tasks (or microthreads or coroutines) disappear -- it is simply to
have the most useful API for tasks.

> A while ago I implemented pyworks ( which
> is a kind of Erlang implementation for Python, making objects concurrent and return
> values Futures, without adding much new code. Methods are sent asynchronous, simply
> by doing standard obj.method(). obj is a proxy for the real object sending method() as a
> message to the real object running in a separate thread. Return value is a Future. So
> you can do
>         val = obj.method()
>         ... continue async with method()
>         ... and do some other stuff, until:
>         print val
> which will hang waiting for the Future to complete, if it's not.

That sounds like implicit futures (to use the Wikipedia article's
terminology). I'm not a big fan of that. In fact, I'm proposing an API
where all task switching is explicit, using the yield keyword (or
yield from), and accessing the value of a future is also explicit in
such a system.

> It has been used in a couple of projects, making it much easier to do concurrent systems.
> But, it would be great if the object/task could wait for more events than queue.get()

I still think you're focused more on concurrent CPU activity than
async I/O. These are quite different fields, even though they often
use similar terminology (like future, task/thread/process,
concurrent/parallel, spawn/join, queue). I think the keyword that most
distinguishes them is "event". If you hear people talk about events
they are probably multiplexing I/O, not CPU activities.

--Guido van Rossum (

From python at  Mon Oct 15 00:08:25 2012
From: python at (MRAB)
Date: Sun, 14 Oct 2012 23:08:25 +0100
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-14 22:06, Joshua Landau wrote:
> On 14 October 2012 20:57, Mike Meyer <mwm at
> <mailto:mwm at>> wrote:
>     On Sun, 14 Oct 2012 07:40:57 +0200
>     Yuval Greenfield <ubershmekel at
>     <mailto:ubershmekel at>> wrote:
>      > On Sun, Oct 14, 2012 at 2:04 AM, MRAB <python at
>     <mailto:python at>> wrote:
>      >
>      > > If it's more than one codepoint, we could prefix with the
>     length of the
>      > > codepoint's name:
>      > >
>      > > def __12CIRCLED_PLUS__(x, y):
>      > >     ...
>      > >
>      > >
>      > That's a bit impractical, and why reinvent the wheel? I'd much
>     rather:
>      >
>      > def \u2295(x, y):
>      >     ....
>      >
>      > So readable I want to read it twice. And that's not legal python
>     today so
>      > we don't break backwards compatibility!
>     Yes, but we're defining an operator for instances of the class, so it
>     needs the 'special' method marking:
>     def __\u2295__(self, other):
>     Now *that's* pretty!
>          <mike
> I much preferred your first choice:
> def __$⊕__(self, other):
> But to keep the "$" unused we can write:
> def __op_⊕__(self, other):
> (new methods will take precedence over the older __add__ and so forth)
> What we can do then is use the "\u" syntax to let people without unicode
> editors have accessibility:
> def __op_\u2295__(self, other):
> ...later in the code...
> new = first \u2295 second
> Which adds consistency whereas before we could only use that in
> specific circumstances (inside strings), reducing cognitive burden.
I don't think we should change what happens inside a string literal.

Consider what would happen if you wanted to write "\\u0190". It would
convert that into "\Ɛ".

IIRC, Java can suffer from that kind of problem because \uXXXX is
treated as that codepoint wherever it occurs.
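
A quick check of how today's string literals treat the two spellings (the
variable names are only for illustration):

```python
escaped = "\\u0190"   # an escaped backslash followed by the text u0190
decoded = "\u0190"    # a single character, U+0190

print(len(escaped), len(decoded))                  # 6 1
print(escaped[0] == "\\", ord(decoded) == 0x190)   # True True
```

If \uXXXX were translated before the literal was parsed, the first spelling
could no longer be relied on to keep its backslash.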

From ben at  Mon Oct 15 00:09:10 2012
From: ben at (Ben Darnell)
Date: Sun, 14 Oct 2012 15:09:10 -0700
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 14, 2012 at 7:36 AM, Guido van Rossum <guido at> wrote:
>> So it would look something like
>> Yield-from:
>>    task1 = subtask1(args1)
>>    task2 = subtask2(args2)
>>    res1, res2 = yield from par(task1, task2)
>> where the implementation of par() is left as an exercise for
>> the reader.
> So, can par() be as simple as
> def par(*args):
>   results = []
>   for task in args:
>     result = yield from task
>     results.append(result)
>   return results
> ???
> Or does it need to interact with the scheduler to ensure fairness?
> (Not having built one of these, my intuition for how the primitives
> fit together is still lacking, so excuse me for asking naive
> questions.)

It's not just fairness, it needs to interact with the scheduler to get
any parallelism at all if the sub-generators have more than one step.

Consider:

def task1():
  print("1A")
  yield
  print("1B")
  yield
  print("1C")
  # and so on...

def task2():
  print("2A")
  yield
  print("2B")
  yield
  print("2C")

def outer():
  yield from par(task1(), task2())

Both tasks are started immediately, but can't progress further until
they are yielded from to advance the iterator.  So with this version
of par() you get 1A, 2A, 1B, 1C..., 2B, 2C.  To get parallelism I
think you have to schedule each sub-generator separately instead of
just yielding from them (which negates some of the benefits of yield
from like easy error handling).

Even if there is a clever version of par() that works more like yield
from, you'd need to go back to explicit scheduling if you wanted
parallel execution without forcing everything to finish at the same
time (which is simple with Futures).

> Of course there's the question of what to do when one of the tasks
> raises an error -- I haven't quite figured that out in NDB either, it
> runs all the tasks to completion but the caller only sees the first
> exception. I briefly considered having an "multi-exception" but it
> felt too weird -- though I'm not married to that decision.

In general for this kind of parallel operation I think it's fine to
say that one (unspecified) exception is raised in the outer function
and the rest are hidden.  With futures, "(r1, r2) = yield (f1, f2)" is
just shorthand for "r1 = yield f1; r2 = yield f2", so separating the
yields to have separate try/except blocks is no problem.  With yield
from it's not as good because the second operation can't proceed while
the outer function is waiting for the first.
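
A sketch of the futures style described above, using concurrent.futures
rather than any real event loop. The drive() function is a toy stand-in for a
framework's scheduler (it simply blocks on each Future), and handler() is a
made-up coroutine. The point is only that both operations are started before
the first yield, so each yield can sit in its own try/except.

```python
from concurrent.futures import ThreadPoolExecutor

def drive(gen):
    """Toy driver: resolve each yielded Future, send back its result or error."""
    try:
        future = next(gen)
        while True:
            try:
                value = future.result()
            except Exception as exc:
                future = gen.throw(exc)
            else:
                future = gen.send(value)
    except StopIteration as stop:
        return stop.value

def handler(pool):
    # Both operations are submitted up front, so they can overlap.
    f1 = pool.submit(int, "1")
    f2 = pool.submit(int, "oops")
    r1 = yield f1
    try:
        r2 = yield f2
    except ValueError:
        r2 = None       # handle the second failure on its own
    return r1, r2

with ThreadPoolExecutor() as pool:
    result = drive(handler(pool))
```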


From ben at  Mon Oct 15 00:19:27 2012
From: ben at (Ben Darnell)
Date: Sun, 14 Oct 2012 15:19:27 -0700
Subject: [Python-ideas] The async API of the future: Some thoughts from
 an ignorant Tornado user
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 13, 2012 at 3:27 PM, Daniel McDougall
<daniel.mcdougall at> wrote:
> (This is a response to GVR's Google+ post asking for ideas; I
> apologize in advance if I come off as an ignorant programming newbie)
> I am the author of Gate One (
> which makes extensive use of Tornado's asynchronous capabilities.  It
> also uses multiprocessing and threading to a lesser extent.  The
> biggest issue I've had trying to write asynchronous code for Gate One
> is complexity.  Complexity creates problems with expressiveness which
> results in code that, to me, feels un-Pythonic.  For evidence of this
> I present the following example:  The retrieve_log_playback()
> function: (link goes to Github)
> All the function does is generate and return (to the client browser)
> an HTML playback of their terminal session recording.  To do it
> efficiently without blocking the event loop or slowing down all other
> connected clients required loads of complexity (or maybe I'm just
> ignorant of "a better way"--feel free to enlighten me).  In an ideal
> world I could have just done something like this:
> import async # The API of the future ;)
> async.async_call(retrieve_log_playback, settings, tws,
> mechanism=multiprocessing)
> # tws == instance of tornado.web.WebSocketHandler that holds the open connection

What you've described is very similar to the
concurrent.futures.Executor.submit() method.  ProcessPoolExecutor
still has multiprocessing's pickle-related limitations, but other than
that you're free to create ProcessPoolExecutors and/or
ThreadPoolExecutors and submit work to them.  Your
retrieve_log_playback function could become:

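  import concurrent.futures
  import tornado.ioloop
  # create a global/singleton ProcessPoolExecutor
  executor = concurrent.futures.ProcessPoolExecutor()
  def retrieve_log_playback(settings, tws=None):
    # set up settings dict just like the original
    io_loop = tornado.ioloop.IOLoop.instance()
    future = executor.submit(_retrieve_log_playback, settings)
    def send_message(future):
      # assumed body (missing in the archive): push the result to the client
      tws.write_message(future.result())
    # hop back onto the IOLoop thread before touching the websocket
    future.add_done_callback(
      lambda future: io_loop.add_callback(lambda: send_message(future)))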

In Tornado 3.0 there will be some native support for Futures - the
last line will probably become "io_loop.add_future(future,
send_message)".  In _retrieve_log_playback you no longer have a queue
argument, and instead just return the result normally.

It's also possible to do this just using multiprocessing instead of
concurrent.futures - see multiprocessing.Pool.apply_async.
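
Stripped of the Tornado parts, the submit-then-callback pattern looks roughly
like this. render_playback is a hypothetical stand-in for the real worker
function, and a ThreadPoolExecutor is used only so the sketch runs anywhere
without pickling concerns; a ProcessPoolExecutor has the same submit() and
add_done_callback() interface.

```python
import concurrent.futures

def render_playback(settings):
    # Hypothetical stand-in for the expensive HTML generation.
    return "<html>%s</html>" % settings["log"]

delivered = []

with concurrent.futures.ThreadPoolExecutor() as executor:
    future = executor.submit(render_playback, {"log": "session.log"})
    # The callback receives the finished Future; here it just records the result.
    future.add_done_callback(lambda f: delivered.append(f.result()))

print(delivered)
```

Note that the callback runs in whichever thread completes the Future, which
is why the Tornado version above hands the work back to the IOLoop.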


From guido at  Mon Oct 15 00:27:43 2012
From: guido at (Guido van Rossum)
Date: Sun, 14 Oct 2012 15:27:43 -0700
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 14, 2012 at 3:09 PM, Ben Darnell <ben at> wrote:
> On Sun, Oct 14, 2012 at 7:36 AM, Guido van Rossum <guido at> wrote:
>>> So it would look something like
>>> Yield-from:
>>>    task1 = subtask1(args1)
>>>    task2 = subtask2(args2)
>>>    res1, res2 = yield from par(task1, task2)
>>> where the implementation of par() is left as an exercise for
>>> the reader.
>> So, can par() be as simple as
>> def par(*args):
>>   results = []
>>   for task in args:
>>     result = yield from task
>>     results.append(result)
>>   return results
>> ???
>> Or does it need to interact with the scheduler to ensure fairness?
>> (Not having built one of these, my intuition for how the primitives
>> fit together is still lacking, so excuse me for asking naive
>> questions.)
> It's not just fairness, it needs to interact with the scheduler to get
> any parallelism at all if the sub-generators have more than one step.
> Consider:
> def task1():
>   print("1A")
>   yield
>   print("1B")
>   yield
>   print("1C")
>   # and so on...
> def task2():
>   print("2A")
>   yield
>   print("2B")
>   yield
>   print("2C")
> def outer():
>   yield from par(task1(), task2())

Hm, that's a little unrealistic -- in practice you'll rarely see code
that yields unless it is also blocking for I/O. I presume that if both
tasks immediately block for I/O, the one whose I/O completes first
gets the run next; and if it then blocks again, it'll again depend on
whose I/O finishes first.

(Admittedly this has little to do with fairness now.)

> Both tasks are started immediately, but can't progress further until
> they are yielded from to advance the iterator.  So with this version
> of par() you get 1A, 2A, 1B, 1C..., 2B, 2C.

Really? When you call a generator, it doesn't run until the first
yield; it gets suspended before the first bytecode of the body. So if
anything, you might get 1A, 1B, 1C, 2A, 2B, 2C. (Which would prove
your point just as much of course.)

Sadly I don't have a framework lying around where I can test this
easily -- I'm pretty sure that the equivalent code in NDB interacts
with the scheduler in a way that ensures round-robin scheduling.
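
For what it's worth, the ordering can be checked without any framework. The
sketch below uses the simple par() from earlier in the thread and drives the
outer generator with a plain for loop; the log list is an addition so the
order can be inspected.

```python
log = []

def task1():
    log.append("1A")
    yield
    log.append("1B")
    yield
    log.append("1C")

def task2():
    log.append("2A")
    yield
    log.append("2B")
    yield
    log.append("2C")

def par(*args):
    results = []
    for task in args:
        # Delegation does not return until this task is completely finished.
        result = yield from task
        results.append(result)
    return results

def outer():
    yield from par(task1(), task2())

for _ in outer():
    pass

print(log)   # ['1A', '1B', '1C', '2A', '2B', '2C']
```

So with this par() the second task does not start until the first has run to
completion.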

> To get parallelism I
> think you have to schedule each sub-generator separately instead of
> just yielding from them (which negates some of the benefits of yield
> from like easy error handling).

Honestly I don't mind if the scheduler has to be messy, as long as the
mess is hidden from the caller.

> Even if there is a clever version of par() that works more like yield
> from, you'd need to go back to explicit scheduling if you wanted
> parallel execution without forcing everything to finish at the same
> time (which is simple with Futures).

Why wouldn't all generators that aren't blocked for I/O just run until
their next yield, in a round-robin fashion? That's fair enough for me.
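
A minimal sketch of that round-robin idea, assuming nothing ever blocks for
I/O. The round_robin() function and the log list are invented for this
example and are not taken from NDB or any other framework.

```python
from collections import deque

log = []

def task(name):
    log.append(name + "A")
    yield
    log.append(name + "B")
    yield
    log.append(name + "C")

def round_robin(*tasks):
    """Toy scheduler: run each task until its next yield, then move on."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)
        except StopIteration:
            continue              # finished, do not reschedule
        ready.append(current)     # still has work left, go to the back

round_robin(task("1"), task("2"))
print(log)   # ['1A', '2A', '1B', '2B', '1C', '2C']
```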

But as I said, my intuition for how things work in Greg's world is not
very good.

>> Of course there's the question of what to do when one of the tasks
>> raises an error -- I haven't quite figured that out in NDB either, it
>> runs all the tasks to completion but the caller only sees the first
>> exception. I briefly considered having an "multi-exception" but it
>> felt too weird -- though I'm not married to that decision.
> In general for this kind of parallel operation I think it's fine to
> say that one (unspecified) exception is raised in the outer function
> and the rest are hidden.  With futures, "(r1, r2) = yield (f1, f2)" is
> just shorthand for "r1 = yield f1; r2 = yield f2", so separating the
> yields to have separate try/except blocks is no problem.  With yield
> from it's not as good because the second operation can't proceed while
> the outer function is waiting for the first.

Hmmm, I think I see your point. This seems to follow if (as Greg
insists) you don't have any decorators on the generators.

OTOH I am okay with only getting one of the exceptions. But I think
all of the remaining tasks should still be run to completion -- maybe
the caller just cared about their side effects. Or maybe this should
be an option to par().

--Guido van Rossum (

From ericsnowcurrently at  Mon Oct 15 00:34:08 2012
From: ericsnowcurrently at (Eric Snow)
Date: Sun, 14 Oct 2012 16:34:08 -0600
Subject: [Python-ideas] The async API of the future: Twisted and
In-Reply-To: <>
References: <>
Message-ID: <>

On Oct 14, 2012 11:27 AM, "Devin Jeanpierre" <jeanpierreda at> wrote:
> I did this once, because I needed to rewrite a blocking API and wanted
> to use Twisted, except that I made the mistake of starting the thread
> when the module was created instead of on first call. This led to a
> deadlock because of the global import lock... :(  In principle I don't
> know why this would be a terrible awful idea, if it was done right,
> but maybe people with more experience with threaded code can correct
> me.
> (The whole thread daemon thing necessary to make it act like a
> synchronous program, might be terribly insane and therefore an idea
> killer. I'm not sure.)
> I'm under the understanding that the global import lock won't cause
> this particular issue anymore as of Python 3.3, so perhaps starting a
> reactor on import is reasonable.

Yeah, while a global import lock still exists, it's used just long enough
to get a per-module lock.  On top of that, the import system now uses
importlib (read: pure Python) for most functionality, which has a bearing
on threading and should make it easier to accommodate async if needed.


From storchaka at  Mon Oct 15 00:41:01 2012
From: storchaka at (Serhiy Storchaka)
Date: Mon, 15 Oct 2012 01:41:01 +0300
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <k5f4bc$f88$>
References: <>
Message-ID: <k5ff20$1b7$>

On 14.10.12 22:38, Terry Reedy wrote:
> On 10/14/2012 1:42 PM, Guido van Rossum wrote:
>  > But then the caller would have to sort through the results and check
>  > for exceptions. I want the caller to be able to use try/except as
>  > well.

> OK. Then ...

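   def par(*args):
      results = []
      exceptions = False
      for task in args:
         try:
            result = yield from task
            if exceptions:
               results.append(StopIteration(result))
            else:
               results.append(result)
         except Exception as exc:
            if not exceptions:
               # First failure: wrap the plain results collected so far
               results = [StopIteration(result) for result in results]
               exceptions = True
            results.append(exc)
      if not exceptions:
         return results
      else:
         exc = MultiXException()
         exc.results = results
         raise exc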

From at  Mon Oct 15 00:42:09 2012
From: at (Joshua Landau)
Date: Sun, 14 Oct 2012 23:42:09 +0100
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On 14 October 2012 23:08, MRAB <python at> wrote:

> On 2012-10-14 22:06, Joshua Landau wrote:
>> On 14 October 2012 20:57, Mike Meyer <mwm at
>> <mailto:mwm at>> wrote:
>>     On Sun, 14 Oct 2012 07:40:57 +0200
>>     Yuval Greenfield <ubershmekel at
>>     <mailto:ubershmekel at>**> wrote:
>>      > On Sun, Oct 14, 2012 at 2:04 AM, MRAB <python at
>>     <mailto:python at mrabarnett.** <python at>>>
>> wrote:
>>      >
>>      > > If it's more than one codepoint, we could prefix with the
>>     length of the
>>      > > codepoint's name:
>>      > >
>>      > > def __12CIRCLED_PLUS__(x, y):
>>      > >     ...
>>      > >
>>      > >
>>      > That's a bit impractical, and why reinvent the wheel? I'd much
>>     rather:
>>      >
>>      > def \u2295(x, y):
>>      >     ....
>>      >
>>      > So readable I want to read it twice. And that's not legal python
>>     today so
>>      > we don't break backwards compatibility!
>>     Yes, but we're defining an operator for instances of the class, so it
>>     needs the 'special' method marking:
>>     def __\u2295__(self, other):
>>     Now *that's* pretty!
>>          <mike
>> I much preferred your first choice:
>> def __$⊕__(self, other):
>> But to keep the "$" unused we can write:
>> def __op_⊕__(self, other):
>> (new methods will take precedence over the older __add__ and so forth)
>> What we can do then is use the "\u" syntax to let people without unicode
>> editors have accessibility:
>> def __op_\u2295__(self, other):
>> ...later in the code...
>> new = first \u2295 second
>> Which adds consistency whereas before we could only use that in
>> specific circumstances (inside strings), reducing cognitive burden.
>>  I don't think we should change what happens inside a string literal.
> Consider what would happen if you wanted to write "\\u0190". It would
> convert that into "\Ɛ".
> IIRC, Java can suffer from that kind of problem because \uXXXX is
> treated as that codepoint wherever it occurs.

No, no. "\\" would have priority, still. "\\uXXXX" is invalid outside of a
string, anyway, so we're allowed to say that.

From greg.ewing at  Mon Oct 15 00:45:23 2012
From: greg.ewing at (Greg Ewing)
Date: Mon, 15 Oct 2012 11:45:23 +1300
Subject: [Python-ideas] re-implementing Twisted for fun and profit
In-Reply-To: <>
References: <>
	<> <k5ecb2$o9a$>
Message-ID: <>

Shane Green wrote:

>> On 14/10/2012 6:29am, Greg Ewing wrote:
>>> Once it has reported that
>>> a given file descriptor is ready, it *won't* report that file
>>> descriptor again until you do something with it.

>> Unless I have misunderstood you, the following example contradicts that:

It does indeed contradict me. It looks like this is
implementation-dependent, because I distinctly remember
encountering a bug once that I traced back to the fact
that I wasn't servicing *all* the fds reported as ready
before making another select call.

Since then I've always been careful to do that, so it's
possible that the behaviour has changed in the meantime
and I haven't noticed.
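
For what it's worth, select() as specified by POSIX is level-triggered:
a descriptor keeps being reported for as long as it stays ready, whether
or not it was serviced after the previous call. A small sketch using a
socket pair (the variable names are only for illustration):

```python
import select
import socket

# A connected pair of sockets; writing to one makes the other readable.
a, b = socket.socketpair()
try:
    a.send(b"x")

    # First poll: b is reported readable.
    first, _, _ = select.select([b], [], [], 1.0)
    # Second poll without reading anything: b is reported again.
    second, _, _ = select.select([b], [], [], 1.0)

    b.recv(1)  # service the descriptor
    # With the data consumed, a zero-timeout poll reports nothing.
    third, _, _ = select.select([b], [], [], 0)

    reported_twice = (first == [b] and second == [b])
    quiet_after_read = (third == [])
finally:
    a.close()
    b.close()

print(reported_twice, quiet_after_read)
```

Servicing every descriptor that was reported is still a sensible habit,
since it keeps any one of them from being neglected.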


From ben at  Mon Oct 15 00:55:46 2012
From: ben at (Ben Darnell)
Date: Sun, 14 Oct 2012 15:55:46 -0700
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 14, 2012 at 3:27 PM, Guido van Rossum <guido at> wrote:
> On Sun, Oct 14, 2012 at 3:09 PM, Ben Darnell <ben at> wrote:
>> On Sun, Oct 14, 2012 at 7:36 AM, Guido van Rossum <guido at> wrote:
>>>> So it would look something like
>>>> Yield-from:
>>>>    task1 = subtask1(args1)
>>>>    task2 = subtask2(args2)
>>>>    res1, res2 = yield from par(task1, task2)
>>>> where the implementation of par() is left as an exercise for
>>>> the reader.
>>> So, can par() be as simple as
>>> def par(*args):
>>>   results = []
>>>   for task in args:
>>>     result = yield from task
>>>     results.append(result)
>>>   return results
>>> ???
>>> Or does it need to interact with the scheduler to ensure fairness?
>>> (Not having built one of these, my intuition for how the primitives
>>> fit together is still lacking, so excuse me for asking naive
>>> questions.)
>> It's not just fairness, it needs to interact with the scheduler to get
>> any parallelism at all if the sub-generators have more than one step.
>> Consider:
>> def task1():
>>   print "1A"
>>   yield
>>   print "1B"
>>   yield
>>   print "1C"
>>   # and so on...
>> def task2():
>>   print "2A"
>>   yield
>>   print "2B"
>>   yield
>>   print "2C"
>> def outer():
>>   yield from par(task1(), task2())
> Hm, that's a little unrealistic -- in practice you'll rarely see code
> that yields unless it is also blocking for I/O. I presume that if both
> tasks immediately block for I/O, the one whose I/O completes first
> gets the run next; and if it then blocks again, it'll again depend on
> whose I/O finishes first.
> (Admittedly this has little to do with fairness now.)
>> Both tasks are started immediately, but can't progress further until
>> they are yielded from to advance the iterator.  So with this version
>> of par() you get 1A, 2A, 1B, 1C..., 2B, 2C.
> Really? When you call a generator, it doesn't run until the first
> yield; it gets suspended before the first bytecode of the body. So if
> anything, you might get 1A, 1B, 1C, 2A, 2B, 2C. (Which would prove
> your point just as much of course.)

Ah, OK.  I was mistaken about the "first yield" part, but the rest
stands.  The problem is that as soon as task1 blocks on IO, the entire
current task (which includes outer(), par(), and both children) gets
unscheduled.  No part of task2 gets scheduled until it gets yielded
from, because the scheduler can't see it until then.

> Sadly I don't have a framework lying around where I can test this
> easily -- I'm pretty sure that the equivalent code in NDB interacts
> with the scheduler in a way that ensures round-robin scheduling.
>> To get parallelism I
>> think you have to schedule each sub-generator separately instead of
>> just yielding from them (which negates some of the benefits of yield
>> from like easy error handling).
> Honestly I don't mind if the scheduler has to be messy, as long as the
> mess is hidden from the caller.


>> Even if there is a clever version of par() that works more like yield
>> from, you'd need to go back to explicit scheduling if you wanted
>> parallel execution without forcing everything to finish at the same
>> time (which is simple with Futures).
> Why wouldn't all generators that aren't blocked for I/O just run until
> their next yield, in a round-robin fashion? That's fair enough for me.
> But as I said, my intuition for how things work in Greg's world is not
> very good.

The good and bad parts of this proposal both stem from the fact that
yield from is very similar to just inlining everything together.  This
gives you the exception handling semantics that you expect from
synchronous code, but it means that the scheduler can't distinguish
between subtasks; you have to explicitly schedule them as top-level

>>> Of course there's the question of what to do when one of the tasks
>>> raises an error -- I haven't quite figured that out in NDB either, it
>>> runs all the tasks to completion but the caller only sees the first
>>> exception. I briefly considered having a "multi-exception" but it
>>> felt too weird -- though I'm not married to that decision.
>> In general for this kind of parallel operation I think it's fine to
>> say that one (unspecified) exception is raised in the outer function
>> and the rest are hidden.  With futures, "(r1, r2) = yield (f1, f2)" is
>> just shorthand for "r1 = yield f1; r2 = yield f2", so separating the
>> yields to have separate try/except blocks is no problem.  With yield
>> from it's not as good because the second operation can't proceed while
>> the outer function is waiting for the first.
> Hmmm, I think I see your point. This seems to follow if (as Greg
> insists) you don't have any decorators on the generators.
> OTOH I am okay with only getting one of the exceptions. But I think
> all of the remaining tasks should still be run to completion -- maybe
> the caller just cared about their side effects. Or maybe this should
> be an option to par().

That's probably a good idea.


> --
> --Guido van Rossum (

From rene at  Mon Oct 15 01:08:42 2012
From: rene at (Rene Nejsum)
Date: Mon, 15 Oct 2012 01:08:42 +0200
Subject: [Python-ideas] The async API of the future: Reactors
In-Reply-To: <>
References: <>
Message-ID: <>

On Oct 15, 2012, at 12:05 AM, Guido van Rossum <guido at> wrote:

> On Sun, Oct 14, 2012 at 2:55 PM, Rene Nejsum <rene at> wrote:
>> On Oct 14, 2012, at 9:22 PM, Guido van Rossum <guido at> wrote:
>>> On Sun, Oct 14, 2012 at 10:51 AM, Rene Nejsum <rene at> wrote:
>>>> On the high level (Python) basically what you need is that the queue.get()
>>>> can handle:
>>>> 1) Python objects (as today)
>>>> 2) timeout (as today, maybe in milliseconds instead of seconds)
>>>> 3) Network (socket input/state change)
>>>> 4) File desc input/state change
>>>> 5) Other I/O changes like serial comm, etc.
>>>> 6) Maybe also yield based coroutine support ?
>>>> This requires support from the underlying
>>>> OS, support which is probably not there today?
>>>> As far as I can see, having this one extended queue.get() would nicely enable
>>>> all high level concurrency issues in Python.
>>> [...]
>>>> I believe a "super" queue.get() would solve all use cases.
>>>> I have no idea on how difficult it would be to implement in
>>>> a cross platform manner.
>>> Hm. I know that a common (and often right!) recommendation for thread
>>> communication is to use the queue module. But that module is meant to
>>> work with threads. I think that the correct I/O primitives are more
>>> likely to come by looking at what Tornado and Twisted have done than
>>> by trying to "pimp up" the queue module -- it's good for what it does,
>>> but trying to add all that new functionality to it doesn't sound like
>>> a good fit.
>> You are probably right about the queue class. Maybe it should be a new class,
>> but I still believe it would be an excellent fit for doing concurrent stuff if Python
>> had a multiplexer message queue. Python is high-level enough to be able to
>> hide thread/select/read etc.
> I believe that the Twisted and Tornado event loops have APIs to push
> work into a thread and/or process, and it will be a requirement for
> the new stdlib event loop. However the main focus of the current
> effort is not making the distinction between process, threads and
> tasks (or microthreads or coroutines) disappear -- it is simply to
> have the most useful API for tasks.
>> A while ago I implemented pyworks ( which
>> is a kind of Erlang implementation for Python, making objects concurrent and return
>> values Futures, without adding much new code. Methods are sent asynchronous, simply
>> by doing standard obj.method(). obj is a proxy for the real object sending method() as a
>> message to the real object running in a separate thread. Return value is a Future. So
>> you can do
>>        val = obj.method()
>>        ... continue async with method()
>>        ... and do some other stuff, until:
>>        print val
>> which will hang waiting for the Future to complete, if it's not.
> That sounds like implicit futures (to use the Wikipedia article's
> terminology). I'm not a big fan of that. In fact, I'm proposing an API
> where all task switching is explicit, using the yield keyword (or
> yield from), and accessing the value of a future is also explicit in
> such a system.

You are right, it's implicit. And I think I understand your concern: how
much should be hidden/implicit and how much should be left to the
programmer. IMHO Python is such an excellent tool mainly
because it hides a lot of details. Things like memory management, GC,
threads and concurrency should be (and, I believe, can be) hidden from
the developer.

>> It has been used in a couple of projects, making it much easier to do concurrent systems.
>> But, it would be great if the object/task could wait for more events than queue.get()
> I still think you're focused more on concurrent CPU activity than
> async I/O. These are quite different fields, even though they often
> use similar terminology (like future, task/thread/process,
> concurrent/parallel, spawn/join, queue). I think the keyword that most
> distinguishes them is "event". If you hear people talk about events
> they are probably multiplexing I/O, not CPU activities.

Yes and No. My field of concurrency and IO is process control, like
controlling high speed sorting machines with a lot of IO from 24V inputs,
scanners, scales, OCR, serial ports, etc. So for me it's a combination of concurrent IO,
state and parallelism (concurrent CPU). When you have an async (I/O) event,
you need some kind of concurrency to handle it at the next level.
It is difficult to do concurrent CPU activity without events, even if 
they are only signal events on a semaphore. 

One difference from, e.g., web servers is that at design time we know
exactly how many tasks we need and what the maximum load is going
to be: typically between 50 and 100 tasks/threads sending messages to
each other.


> -- 
> --Guido van Rossum (

From guido at  Mon Oct 15 01:26:11 2012
From: guido at (Guido van Rossum)
Date: Sun, 14 Oct 2012 16:26:11 -0700
Subject: [Python-ideas] The async API of the future: Reactors
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 14, 2012 at 4:08 PM, Rene Nejsum <rene at> wrote:
> On Oct 15, 2012, at 12:05 AM, Guido van Rossum <guido at> wrote:
>> That sounds like implicit futures (to use the Wikipedia article's
>> terminology). I'm not a big fan of that. In fact, I'm proposing an API
>> where all task switching is explicit, using the yield keyword (or
>> yield from), and accessing the value of a future is also explicit in
>> such a system.
> You are right, it's implicit. And I think I understand your concern: how
> much should be hidden/implicit and how much should be left to the
> programmer. IMHO Python is such an excellent tool mainly
> because it hides a lot of details. Things like memory management, GC,
> threads and concurrency should be (and, I believe, can be) hidden from
> the developer.

I don't think you can hide threads or concurrency. You can offer
different APIs to work with them that have different advantages and
disadvantages, but I don't think you can *hide* them any more than you
can hide language constructs like classes or sequences.

>> I still think you're focused more on concurrent CPU activity than
>> async I/O. These are quite different fields, even though they often
>> use similar terminology (like future, task/thread/process,
>> concurrent/parallel, spawn/join, queue). I think the keyword that most
>> distinguishes them is "event". If you hear people talk about events
>> they are probably multiplexing I/O, not CPU activities.
> Yes and No. My field of concurrency and IO is process control, like
> controlling high speed sorting machines with a lot of IO from 24V inputs,
> scanners, scales, OCR, serial ports, etc. So for me it's a combination of concurrent IO,
> state and parallelism (concurrent CPU). When you have an async (I/O) event,
> you need some kind of concurrency to handle it at the next level.
> It is difficult to do concurrent CPU activity without events, even if
> they are only signal events on a semaphore.

Can you do it with threads? Because if threads serve your purpose,
they are probably easier to use than the async API we're considering
here, especially given your desire to hide unnecessary details. The
async APIs under consideration (Twisted, Tornado, coroutines) all
intentionally make task switching explicit. You may also consider
greenlets/gevent, which is a compromise that makes task-switching
semi-explicit -- only certain calls cause task switches, but those
calls may be hidden inside other calls (or even overloaded operations
like __getattr__).

> One difference from, e.g., web servers is that at design time we know
> exactly how many tasks we need and what the maximum load is going
> to be: typically between 50 and 100 tasks/threads sending messages to
> each other.

That does sound like threads are just fine for you. Of course you may
have to craft your own synchronization primitives out of the
lower-level locks and queues offered by the stdlib...
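
As a rough sketch of that style, assuming one worker thread per task
and a queue.Queue as each task's mailbox (worker() and the None
sentinel are illustrative choices, not an established API):

```python
import queue
import threading


def worker(inbox, outbox):
    # The worker owns its inbox and handles messages until told to stop.
    while True:
        msg = inbox.get()
        if msg is None:  # sentinel: shut down
            break
        outbox.put(msg * 2)


inbox = queue.Queue()
outbox = queue.Queue()
t = threading.Thread(target=worker, args=(inbox, outbox))
t.start()

for n in (1, 2, 3):
    inbox.put(n)
inbox.put(None)
t.join()

# The worker has finished, so all three replies are already queued.
replies = [outbox.get() for _ in range(3)]
print(replies)
```

With a fixed, known number of tasks, each thread can simply block on
its own inbox, and no event loop is involved.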

--Guido van Rossum (

From guido at  Mon Oct 15 01:28:56 2012
From: guido at (Guido van Rossum)
Date: Sun, 14 Oct 2012 16:28:56 -0700
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 14, 2012 at 3:55 PM, Ben Darnell <ben at> wrote:
> Ah, OK.  I was mistaken about the "first yield" part, but the rest
> stands.  The problem is that as soon as task1 blocks on IO, the entire
> current task (which includes outer(), par(), and both children) gets
> unscheduled.  No part of task2 gets scheduled until it gets yielded
> from, because the scheduler can't see it until then.

Ah, yes. I had forgotten that the whole stack (at least all frames
currently blocked in yield-from) is suspended.

I really hope that Greg has a working implementation of par().

> The good and bad parts of this proposal both stem from the fact that
> yield from is very similar to just inlining everything together.  This
> gives you the exception handling semantics that you expect from
> synchronous code, but it means that the scheduler can't distinguish
> between subtasks; you have to explicitly schedule them as top-level
> tasks.

I'm beginning to see that. Thanks for helping me form my intuition
about how this stuff works!

--Guido van Rossum (

From python at  Mon Oct 15 01:46:23 2012
From: python at (MRAB)
Date: Mon, 15 Oct 2012 00:46:23 +0100
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-14 23:42, Joshua Landau wrote:
> On 14 October 2012 23:08, MRAB <python at
> <mailto:python at>> wrote:
>     On 2012-10-14 22:06, Joshua Landau wrote:
>         On 14 October 2012 20:57, Mike Meyer <mwm at
>         <mailto:mwm at>
>         <mailto:mwm at <mailto:mwm at>>> wrote:
>              On Sun, 14 Oct 2012 07:40:57 +0200
>              Yuval Greenfield <ubershmekel at
>         <mailto:ubershmekel at>
>              <mailto:ubershmekel at
>         <mailto:ubershmekel at>>__> wrote:
>               > On Sun, Oct 14, 2012 at 2:04 AM, MRAB
>         <python at <mailto:python at>
>              <mailto:python at
>         <mailto:python at>>> wrote:
>               >
>               > > If it's more than one codepoint, we could prefix with the
>              length of the
>               > > codepoint's name:
>               > >
>               > > def __12CIRCLED_PLUS__(x, y):
>               > >     ...
>               > >
>               > >
>               > That's a bit impractical, and why reinvent the wheel?
>         I'd much
>              rather:
>               >
>               > def \u2295(x, y):
>               >     ....
>               >
>               > So readable I want to read it twice. And that's not
>         legal python
>              today so
>               > we don't break backwards compatibility!
>              Yes, but we're defining an operator for instances of the
>         class, so it
>              needs the 'special' method marking:
>              def __\u2295__(self, other):
>              Now *that's* pretty!
>                   <mike
>         I much preferred your first choice:
>         def __$⊕__(self, other):
>         But to keep the "$" unused we can write:
>         def __op_⊕__(self, other):
>         (new methods will take precedence over the older __add__ and so
>         forth)
>         What we can do then is use the "\u" syntax to let people without
>         unicode
>         editors have accessibility:
>         def __op_\u2295__(self, other):
>         ...later in the code...
>         new = first \u2295 second
>         Which adds consistency whereas before we could only use that in
>         specific circumstances (inside strings), reducing cognitive burden.
>     I don't think we should change what happens inside a string literal.
>     Consider what would happen if you wanted to write "\\u0190". It would
>     convert that into "\Ɛ".
>     IIRC, Java can suffer from that kind of problem because \uXXXX is
>     treated as that codepoint wherever it occurs.
> No, no. "\\" would have priority, still. "\\uXXXX" is invalid outside of
> a string, anyway, so we're allowed to say that.
OK, but what about raw string literals? Currently, "\\u0190" ==
r"\u0190", but "\\u0190" != r"Ɛ".

From greg.ewing at  Mon Oct 15 01:49:49 2012
From: greg.ewing at (Greg Ewing)
Date: Mon, 15 Oct 2012 12:49:49 +1300
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:
> On Fri, Oct 12, 2012 at 10:05 PM, Greg Ewing
> <greg.ewing at> wrote:

>>You could go further and say that yielding a tuple of generators
>>means to spawn them all concurrently, wait for them all to
>>complete and send back a tuple of the results. The yield-from
>>code would then look pretty much the same as the futures code.
> Sadly it looks like
>   r = yield from (f1(), f2())
> ends up interpreting the tuple as the iterator,

That's not yielding a tuple of generators. This is:

    r = yield (f1(), f2())

Note the absence of 'from'.
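
The difference is easy to see by listing what each form hands to
whoever drives the generator. A small sketch; f1() and f2() here are
placeholder generators:

```python
def f1():
    yield "step from f1"
    return 1


def f2():
    yield "step from f2"
    return 2


def with_from():
    # The tuple itself is the iterable being delegated to, so its two
    # elements (the generator objects) are yielded one at a time.
    r = yield from (f1(), f2())
    return r


def without_from():
    # The tuple is yielded as a single value; whoever drives this
    # generator decides what to send back as r.
    r = yield (f1(), f2())
    return r


from_items = list(with_from())
plain_items = list(without_from())
print(len(from_items), len(plain_items))
```

Only the second form gives a scheduler the whole tuple at once, which
is what it would need in order to spawn the generators concurrently.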

> So, can par() be as simple as
> def par(*args):
>   results = []
>   for task in args:
>     result = yield from task
>     results.append(result)
>   return results

No, it can't be as simple as that, because that will just
execute the tasks sequentially. It would have to be something
like this:

    def par(*tasks):
       n = len(tasks)
       results = [None] * n
       for i, task in enumerate(tasks):
          def thunk():
             nonlocal n
             results[i] = yield from task
             n -= 1
          scheduler.schedule(thunk)
       while n > 0:
          yield
       return results

Not exactly straightforward, but that's why we write it once
and put it in the library. :-)
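
To make this concrete, here is a runnable sketch with a toy round-robin
scheduler. The Scheduler class is an assumption made up for
illustration (the thread does not define one), and its schedule() takes
a generator object rather than a function. thunk() also takes i and
task as parameters: a closure defined inside the loop would look those
names up when it runs, by which time the loop has moved on.

```python
from collections import deque


class Scheduler:
    """Hypothetical round-robin scheduler, for illustration only."""

    def __init__(self):
        self.ready = deque()

    def schedule(self, gen):
        # Queue a generator to be resumed on a later turn.
        self.ready.append(gen)

    def run(self):
        # Resume each ready generator once per turn until all finish.
        while self.ready:
            gen = self.ready.popleft()
            try:
                next(gen)
            except StopIteration:
                continue
            self.ready.append(gen)


scheduler = Scheduler()


def par(*tasks):
    remaining = len(tasks)
    results = [None] * remaining

    def thunk(i, task):
        # i and task are parameters so each thunk keeps its own values.
        nonlocal remaining
        results[i] = yield from task
        remaining -= 1

    for i, task in enumerate(tasks):
        scheduler.schedule(thunk(i, task))
    while remaining > 0:
        yield  # give the scheduler a chance to run the thunks
    return results


log = []
final = []


def task(name):
    for step in "ABC":
        log.append(name + step)
        yield
    return name


def outer():
    results = yield from par(task("1"), task("2"))
    final.append(results)


scheduler.schedule(outer())
scheduler.run()
print(log)
print(final)
```

Because each thunk is a top-level task of its own, the steps interleave
as 1A, 2A, 1B, 2B, 1C, 2C, while outer() still receives the results in
argument order.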

> Of course there's the question of what to do when one of the tasks
> raises an error -- I haven't quite figured that out in NDB either, it
> runs all the tasks to completion but the caller only sees the first
> exception. I briefly considered having a "multi-exception" but it
> felt too weird -- though I'm not married to that decision.

Hmmm. Probably what should happen is that all the other tasks
get cancelled and then the exception gets propagated to the
caller of par(). If we assume another couple of primitives:

    scheduler.cancel(task) -- cancels the task

    scheduler.throw(task, exc) -- raises an exception in the task

then we could implement it this way:

    def par(*tasks):
       n = len(tasks)
       results = [None] * n
       this = scheduler.current_task
       for i, task in enumerate(tasks):
          def thunk():
             nonlocal n
             try:
                results[i] = yield from task
             except BaseException as e:
                for t in tasks:
                   scheduler.cancel(t)
                scheduler.throw(this, e)
             n -= 1
          scheduler.schedule(thunk)
       while n > 0:
          yield
       return results

>>>(10) Registering additional callbacks

While we're at it:

    class task_with_callbacks():

       def __init__(self, task):
          self.task = task
          self.callbacks = []

       def add_callback(self, cb):
          self.callbacks.append(cb)

       def run(self):
          result = yield from self.task
          for cb in self.callbacks:
             cb()
          return result
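
A short usage sketch, driving run() by hand; fetch() is a hypothetical
task with a single suspension point:

```python
class task_with_callbacks():

    def __init__(self, task):
        self.task = task
        self.callbacks = []

    def add_callback(self, cb):
        self.callbacks.append(cb)

    def run(self):
        result = yield from self.task
        for cb in self.callbacks:
            cb()
        return result


def fetch():
    # Hypothetical task: one suspension point, then a result.
    yield
    return "payload"


events = []
wrapped = task_with_callbacks(fetch())
wrapped.add_callback(lambda: events.append("done"))

runner = wrapped.run()
next(runner)  # runs fetch() up to its yield; no callback has fired yet
before = list(events)
try:
    next(runner)  # fetch() returns, the callbacks fire, run() returns
except StopIteration as stop:
    result = stop.value
print(before, events, result)
```

The callbacks run only after the wrapped task has returned, and the
task's result is passed through unchanged.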

> Here's another pattern that I can't quite figure out. ...
> Essentially, it's a barrier pattern where multiple tasks (each
> representing a different HTTP request, and thus not all starting at
> the same time) render a partial web page and then block until a new
> HTTP request comes in that provides the missing info.

This should be fairly straightforward.

    waiters = [] # Tasks waiting for the event

When a task wants to wait:

    scheduler.block(waiters)

When the event occurs:

    for t in waiters:
       scheduler.schedule(t)
    del waiters[:]

Incidentally, this is a commonly encountered pattern known as a
"condition queue" in IPC parlance. I envisage that the async
library would provide encapsulations of this and other standard
IPC mechanisms such as mutexes, semaphores, channels, etc.
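
A runnable sketch of the pattern, assuming a toy scheduler in which
block() parks the current task on a list and the task yields
immediately afterwards. The Scheduler class, waiter() and notifier()
are all made up for illustration:

```python
from collections import deque


class Scheduler:
    """Hypothetical scheduler with just enough to park and wake tasks."""

    def __init__(self):
        self.ready = deque()
        self.current = None

    def schedule(self, task):
        self.ready.append(task)

    def block(self, queue):
        # Park the running task on `queue`; it must yield right after.
        queue.append(self.current)
        self.current = None

    def run(self):
        while self.ready:
            task = self.current = self.ready.popleft()
            try:
                next(task)
            except StopIteration:
                continue
            if self.current is task:  # still runnable: block() was not called
                self.ready.append(task)


scheduler = Scheduler()
waiters = []  # tasks waiting for the event
log = []


def waiter(name):
    log.append(name + " waiting")
    scheduler.block(waiters)
    yield  # suspended here until the event wakes us
    log.append(name + " resumed")


def notifier():
    yield  # give the waiters one turn to park themselves
    log.append("event")
    for t in waiters:
        scheduler.schedule(t)
    del waiters[:]


for name in ("w1", "w2"):
    scheduler.schedule(waiter(name))
scheduler.schedule(notifier())
scheduler.run()
print(log)
```

Parked tasks are simply absent from the ready queue, so they cost
nothing while they wait.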


From shibturn at  Mon Oct 15 01:55:45 2012
From: shibturn at (Richard Oudkerk)
Date: Mon, 15 Oct 2012 00:55:45 +0100
Subject: [Python-ideas] re-implementing Twisted for fun and profit
In-Reply-To: <>
References: <>
	<> <k5ecb2$o9a$>
Message-ID: <k5fje4$uo7$>

On 14/10/2012 11:45pm, Greg Ewing wrote:
> It does indeed contradict me. It looks like this is
> implementation-dependent, because I distinctly remember
> encountering a bug once that I traced back to the fact
> that I wasn't servicing *all* the fds reported as ready
> before making another select call.

Could it have been that some fds were being starved because the earlier 
ones in the lists were getting priority?  Servicing all fds reported 
prevents such starvation problems.


From at  Mon Oct 15 02:12:45 2012
From: at (Joshua Landau)
Date: Mon, 15 Oct 2012 01:12:45 +0100
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On 15 October 2012 00:46, MRAB <python at> wrote:

> OK, but what about raw string literals? Currently, "\\u0190" ==
> r"\u0190", but "\\u0190" != r"Ɛ".

The r"" prefix escapes all escapes, so it will escape this escape too. Hence,
this behaviour is un...escaped ;).
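
For string literals, this matches what Python 3 does today, which can be
checked directly:

```python
escaped = "\\u0190"  # escaped backslash: six characters, no code point lookup
raw = r"\u0190"      # raw literal: the same six characters
decoded = "\u0190"   # ordinary literal: a single character, U+0190

print(escaped == raw, len(escaped), len(decoded))
```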

From greg.ewing at  Mon Oct 15 02:23:52 2012
From: greg.ewing at (Greg Ewing)
Date: Mon, 15 Oct 2012 13:23:52 +1300
Subject: [Python-ideas] The async API of the future: Twisted
	and	Deferreds
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:

> The thing that worries me most is reimplementing httplib, urllib and
> so on to use all this new machinery *and* keep the old synchronous
> APIs working *even* if some code is written using the old style and
> some other code wants to use the new style.

I think this could be handled the same way you alluded to
before when talking about the App Engine. The base implementation
is asynchronous, and you provide a synchronous API that sets
up an async operation and then runs a nested event loop until
it completes.
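
In sketch form, assuming generator-based operations; fetch_async() and
run_until_complete() are hypothetical names, and a real nested loop
would wait for I/O events rather than just resuming the generator:

```python
def fetch_async(url):
    # Hypothetical asynchronous operation: suspends twice, as if waiting
    # for I/O, then returns a result.
    yield "connecting"
    yield "reading"
    return "response from " + url


def run_until_complete(gen):
    # Stand-in for a nested event loop: keep resuming the operation
    # until it finishes, then hand back its return value.
    while True:
        try:
            next(gen)
        except StopIteration as stop:
            return stop.value


def fetch(url):
    # Synchronous API layered on the asynchronous implementation.
    return run_until_complete(fetch_async(url))


response = fetch("http://example.invalid/")
print(response)
```

Old-style callers keep calling fetch(), while new-style code can
delegate to fetch_async() with yield from.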


From guido at  Mon Oct 15 02:35:25 2012
From: guido at (Guido van Rossum)
Date: Sun, 14 Oct 2012 17:35:25 -0700
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 14, 2012 at 4:49 PM, Greg Ewing <greg.ewing at> wrote:
> Guido van Rossum wrote:
>> On Fri, Oct 12, 2012 at 10:05 PM, Greg Ewing
>> <greg.ewing at> wrote:
>>> You could go further and say that yielding a tuple of generators
>>> means to spawn them all concurrently, wait for them all to
>>> complete and send back a tuple of the results. The yield-from
>>> code would then look pretty much the same as the futures code.
>> Sadly it looks like
>>   r = yield from (f1(), f2())
>> ends up interpreting the tuple as the iterator,
> That's not yielding a tuple of generators. This is:
>    r = yield (f1(), f2())
> Note the absence of 'from'.

That's what I meant -- excuse me for not writing "yield-fromming". :-)

>> So, can par() be as simple as
>> def par(*args):
>>   results = []
>>   for task in args:
>>     result = yield from task
>>     results.append(result)
>>   return results
> No, it can't be as simple as that, because that will just
> execute the tasks sequentially.

Yeah, Ben just cleared that up for me.

> It would have to be something like this:
>    def par(*tasks):
>       n = len(tasks)
>       results = [None] * n
>       for i, task in enumerate(tasks):
>          def thunk():
>             nonlocal n
>             results[i] = yield from task
>             n -= 1
>          scheduler.schedule(thunk)
>       while n > 0:
>          yield
>       return results
> Not exactly straightforward, but that's why we write it once
> and put it in the library. :-)

But, as Christian Tismer wrote, we need to have some kind of idea of
what the primitives are that we want to support. Or should we just
have async equivalents for everything in and
(What about thread-local? Do we need task-local? Shudder.)

>> Of course there's the question of what to do when one of the tasks
>> raises an error -- I haven't quite figured that out in NDB either, it
>> runs all the tasks to completion but the caller only sees the first
>> exception. I briefly considered having a "multi-exception" but it
>> felt too weird -- though I'm not married to that decision.
> Hmmm. Probably what should happen is that all the other tasks
> get cancelled and then the exception gets propagated to the
> caller of par().

I think it ought to be at least an option to run them all to
completion -- I can easily imagine use cases for that. Also for
wanting to receive a list of exceptions. A practical par() may have to
grow a few options...
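
One possible shape for the "run them all to completion and hand back the exceptions" option, as a sketch only. The name par_all is made up, and for brevity it steps the subtasks itself with next(), which assumes they only ever yield None.

```python
def par_all(*tasks):
    """Run every task to completion; exceptions are returned, not raised."""
    results = [None] * len(tasks)
    pending = dict(enumerate(tasks))
    while pending:
        for i, task in list(pending.items()):
            try:
                next(task)
            except StopIteration as stop:
                results[i] = stop.value      # normal completion
                del pending[i]
            except Exception as exc:
                results[i] = exc             # failure recorded in place
                del pending[i]
        if pending:
            yield                            # give other tasks a turn
    return results

def ok(value):
    yield
    return value

def bad():
    yield
    raise ValueError("boom")

def drive(task):
    """Minimal driver standing in for a real scheduler."""
    try:
        while True:
            next(task)
    except StopIteration as stop:
        return stop.value

result = drive(par_all(ok(1), bad(), ok(3)))
```

The caller can then inspect the list and decide whether to raise the first exception, all of them, or none.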

> If we assume another couple of primitives:
>    scheduler.cancel(task) -- cancels the task
>    scheduler.throw(task, exc) -- raises an exception in the task
> then we could implement it this way:
>    def par(*tasks):
>       n = len(tasks)
>       results = [None] * n
>       this = scheduler.current_task
>       for i, task in enumerate(tasks):
>          def thunk(i=i, task=task):  # bind this iteration's i and task
>             nonlocal n
>             try:
>                results[i] = yield from task
>             except BaseException as e:
>                for t in tasks:
>                   scheduler.cancel(t)
>                scheduler.throw(this, e)
>             n -= 1
>          scheduler.schedule(thunk)
>       while n > 0:
>          yield
>       return results

I glazed over here but I trust you.

>>>> (10) Registering additional callbacks
> While we're at it:
>    class task_with_callbacks():
>       def __init__(self, task):
>          self.task = task
>          self.callbacks = []
>       def add_callback(self, cb):
>          self.callbacks.append(cb)
>       def run(self):
>          result = yield from self.task
>          for cb in self.callbacks:
>             cb()
>          return result

Nice. (In fact so simple that maybe users can craft this for themselves?)
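
A usage sketch for the wrapper above, with the class restated in PEP 8 naming so the block runs on its own. The fetch() task and the hand-rolled driving loop are stand-ins for a real task and scheduler.

```python
class TaskWithCallbacks:
    def __init__(self, task):
        self.task = task
        self.callbacks = []

    def add_callback(self, cb):
        self.callbacks.append(cb)

    def run(self):
        result = yield from self.task
        for cb in self.callbacks:
            cb()                   # callbacks fire only after a normal return
        return result

def fetch():
    yield                          # pretend to wait for something
    return "payload"

events = []
wrapped = TaskWithCallbacks(fetch())
wrapped.add_callback(lambda: events.append("done"))

runner = wrapped.run()
try:
    while True:
        next(runner)
except StopIteration as stop:
    final = stop.value
```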

>> Here's another pattern that I can't quite figure out. ...
>> Essentially, it's a barrier pattern where multiple tasks (each
>> representing a different HTTP request, and thus not all starting at
>> the same time) render a partial web page and then block until a new
>> HTTP request comes in that provides the missing info.
> This should be fairly straightforward.
>    waiters = [] # Tasks waiting for the event
> When a task wants to wait:
>    scheduler.block(waiters)
> When the event occurs:
>    for t in waiters:
>       scheduler.schedule(t)
>    del waiters[:]
> Incidentally, this is a commonly encountered pattern known as a
> "condition queue" in IPC parlance. I envisage that the async
> library would provide encapsulations of this and other standard
> IPC mechanisms such as mutexes, semaphores, channels, etc.

Maybe you meant condition variable? It looks like threading.Condition
with notify_all().
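
For comparison, the thread-based version of that barrier using threading.Condition and notify_all(). The render() function and the info dict are invented for the example.

```python
import threading

cond = threading.Condition()
info = {}            # filled in when the "missing info" request arrives
rendered = []

def render(page):
    with cond:
        # wait_for() re-checks the predicate, so a notification sent before
        # this thread started waiting is not lost.
        cond.wait_for(lambda: "user" in info)
        rendered.append(f"{page} for {info['user']}")

threads = [threading.Thread(target=render, args=(p,)) for p in ("home", "cart")]
for t in threads:
    t.start()

with cond:
    info["user"] = "alice"
    cond.notify_all()              # wake every waiter at once

for t in threads:
    t.join()
```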

Anyway, I agree we need some primitives like these, but I'm not sure
how to choose the set of essentials.

--Guido van Rossum (

From python at  Mon Oct 15 02:35:51 2012
From: python at (MRAB)
Date: Mon, 15 Oct 2012 01:35:51 +0100
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-15 01:12, Joshua Landau wrote:
> On 15 October 2012 00:46, MRAB <python at> wrote:
>     OK, but what about raw string literals? Currently, "\\u0190" ==
>     r"\u0190", but "\\u0190" != r"Ɛ".
> The "r" prefix escapes all escapes, so will escape this escape too.
> Hence, this behaviour is un...escaped ;).
If "\u0190" becomes "Ɛ", what happens to "\u000A"? Currently it's
legal. :-)

From shane at  Mon Oct 15 03:50:56 2012
From: shane at (Shane Green)
Date: Sun, 14 Oct 2012 18:50:56 -0700
Subject: [Python-ideas] The async API of the future: Twisted
	and	Deferreds
In-Reply-To: <>
References: <>
Message-ID: <>

Okay, I hate to do this, but is there any chance someone can provide a quick summary of the solution we're referring to here?  I just started watching python-ideas today, and have a lot of things going on, plus real bad ADD, so I'm having a hard time reassembling the solutions being referred to. (Maybe there's a web presentation that gives a better threaded presentation than my mail program?  Or maybe I'm daft.  Either way, this sounded interesting!)

In summary, then, the Q/A below is referring to which approach? 

Shane Green
805-452-9666 | shane at

On Oct 14, 2012, at 5:23 PM, Greg Ewing <greg.ewing at> wrote:

> Guido van Rossum wrote:
>> The thing that worries me most is reimplementing httplib, urllib and
>> so on to use all this new machinery *and* keep the old synchronous
>> APIs working *even* if some code is written using the old style and
>> some other code wants to use the new style.
> I think this could be handled the same way you alluded to
> before when talking about the App Engine. The base implementation
> is asynchronous, and you provide a synchronous API that sets
> up an async operation and then runs a nested event loop until
> it completes.
> -- 
> Greg
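
A sketch of that layering, under the assumption that the asynchronous base is a generator task. The names fetch_async, run_nested and fetch are all hypothetical, and the "nested event loop" is reduced to a loop that steps a single task.

```python
def fetch_async(url):
    """Hypothetical async base implementation: a generator task."""
    yield                          # stands in for "wait for the network"
    return f"<body of {url}>"

def run_nested(task):
    """Drive one task to completion, like a tiny nested event loop."""
    try:
        while True:
            next(task)             # a real loop would also poll for I/O here
    except StopIteration as stop:
        return stop.value

def fetch(url):
    """Synchronous facade over the asynchronous implementation."""
    return run_nested(fetch_async(url))

page = fetch("http://example.invalid/")
```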

From at  Mon Oct 15 04:05:30 2012
From: at (Joshua Landau)
Date: Mon, 15 Oct 2012 03:05:30 +0100
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On 15 October 2012 01:35, MRAB <python at> wrote:

> On 2012-10-15 01:12, Joshua Landau wrote:
>> On 15 October 2012 00:46, MRAB <python at> wrote:
>>     OK, but what about raw string literals? Currently, "\\u0190" ==
>>     r"\u0190", but "\\u0190" != r"Ɛ".
>> The "r" prefix escapes all escapes, so will escape this escape too.
>> Hence, this behaviour is un...escaped ;).
> If "\u0190" becomes "Ɛ", what happens to "\u000A"? Currently it's
> legal. :-)

The python interpreter could distinguish between its morphed Unicode
escapes and the originals - the escapes would never match against
already-syntactically-relevant constructs*. Hence "a \u0069s b"
is equivalent to "a i\u0073 b" but *not* "a is b": the first two are
defined by __op_is__ and the last is just the "is" keyword.

Hence, \u000A would just act like a character, and be definable as an
operator, and have little to do with the newline character.

Nice try, but the proposal stands firm.

* Except, of course, the old operators which will be phased into the new

From glyph at  Mon Oct 15 04:07:11 2012
From: glyph at (Glyph)
Date: Sun, 14 Oct 2012 19:07:11 -0700
Subject: [Python-ideas] re-implementing Twisted for fun and profit
In-Reply-To: <>
References: <>
Message-ID: <>

On Oct 13, 2012, at 9:49 PM, Guido van Rossum <guido at> wrote:

> It seems that perhaps the 'data_received' interface is the most important one to standardize (for the event loop); I can imagine many variations on the handle_read() implementation, and there would be different ones for IOCP, SSL[1], and probably others. The stdlib should have good ones for the common platforms but it should be designed to allow people who know better to hook up their own implementation.

Hopefully I'll have time to reply to some of the other stuff in this message, but:

Yes, absolutely.  This is the most important core issue, for me.  There's a little more to it than "data_received" (for example: listening for incoming connections, establishing outgoing connections, and scheduling timed calls) but this was the original motivation for the async PEP: to specify this interface.
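
To make the split concrete, here is a rough sketch of a protocol object that only knows about data_received(), fed by an interchangeable reader. Only the names data_received and handle_read come from the discussion; EchoCollector and FakeTransport are invented, and the transport reads from a list instead of a socket.

```python
class EchoCollector:
    """Hypothetical protocol object: it only ever sees data_received()."""
    def __init__(self):
        self.chunks = []

    def data_received(self, data):
        self.chunks.append(data)

class FakeTransport:
    """Stand-in for a select-, IOCP- or SSL-specific reader (assumed name)."""
    def __init__(self, protocol, incoming):
        self.protocol = protocol
        self.incoming = list(incoming)

    def handle_read(self):
        # A real implementation would call recv(), complete an overlapped
        # read, or decrypt TLS records here; the protocol cannot tell which.
        if self.incoming:
            self.protocol.data_received(self.incoming.pop(0))

proto = EchoCollector()
transport = FakeTransport(proto, [b"GET / ", b"HTTP/1.1\r\n"])
transport.handle_read()
transport.handle_read()
received = b"".join(proto.chunks)
```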

Again, I'll have to kick the appropriate people to try to get that started again.  (Already started, at <>.)  It's on github so anyone can contribute, so if other participants in this thread - especially those of you with connections to the Tornado community - would like to try fleshing some of it out, please go ahead.  Even if you just have a question, or an area you think the PEP should address, file an issue (or comment on one already filed).

> (Thanks for writing this; this is the kind of insight I am hoping to get from you and others.)

Thanks for the repeated prompts for Twisted representatives to participate.

I was somewhat reluctant to engage with this thread at first, because it looked like a lot of meandering discussion of how to implement stuff that Twisted already deals with quite effectively and I wasn't sure what the point of it all was - why not just go look at Twisted's implementation?  But, after writing that message, I'm glad I did, since I realize that many of these insights are not recorded anywhere and in many cases there's no reasonable way to back this information out of Twisted's implementation.

In my (ha ha) copious spare time, I'll try to do some blogging about these topics.


[1]: With one minor nitpick: IOCP and SSL should not be mutually exclusive.  This was a problem for Twisted for a while, given the absolutely stupid way that OpenSSL thinks "sockets" work; but, we now have <> which could probably be adapted to be framework-neutral if this transport/event-loop level were standardized.


From shane at  Mon Oct 15 04:11:23 2012
From: shane at (Shane Green)
Date: Sun, 14 Oct 2012 19:11:23 -0700
Subject: [Python-ideas] re-implementing Twisted for fun and profit
In-Reply-To: <k5fje4$uo7$>
References: <>
	<> <k5ecb2$o9a$>
	<> <k5fje4$uo7$>
Message-ID: <>

There are definitely bugs and system-dependent behaviour* in these areas.  Just a couple years ago I ran into one that led to my writing a "handle_expt()" method with this comment before it: 

>     #####
>     #   Semi-crazy method that is working around a sort-of bug within 
>     #   asyncore.  When using select-based I/O multiplexing, the POLLHUP 
>     #   socket state is indicated by the socket becoming readable, 
>     #   and not by indicating an exceptional event.
>     #   
>     #   When using POLL instead, the flag returned indicates precisely 
>     #   what the state is because "flags & select.POLLHUP" will be true.
>     #   
>     #   In the former case, when using select-based I/O multiplexing, 
>     #   select's indication that the descriptor has become readable 
>     #   leads to the channel's handle read event method being invoked.  
>     #   Invoking receive on the socket then returns an empty string, 
>     #   which is taken by the channel as an indication that the socket 
>     #   is no longer connected and the channel correctly shuts itself 
>     #   down.
>     #   
>     #   However, asyncore's current implementation of the poll-based 
>     #   I/O multiplex event handling invokes the channel's 
>     #   handle exceptional data event anytime "flags & POLLHUP" is true.  
>     #   While select-based multiplexing would only call this method when 
>     #   OOB or urgent data was detected, it can now be called for POLLHUP 
>     #   events too.
>     #   
>     #   Under most scenarios this is not problematic because poll-based 
>     #   multiplexing also indicates the descriptor is readable and 
>     #   so the handle read event is also called and therefore the 
>     #   channel is properly closed, with only an extraneous invocation 
>     #   to handle exceptional event being a side-effect.  Under certain 
>     #   situations, however, the socket is not indicated as being 
>     #   readable, only that it has had an exceptional data event.  I 
>     #   believe this occurs when the attempt to connect never succeeds, 
>     #   but a POLLHUP does.  Previously this led to a busy loop, which 
>     #   is what this method fixes.
>     ###
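
The select-based half of that description can be reproduced in a few lines: when the peer goes away, the socket is reported as readable and the read returns an empty bytes object. This sketch uses socket.socketpair() rather than asyncore, and says nothing about the poll-based POLLHUP path.

```python
import select
import socket

a, b = socket.socketpair()
b.close()                                   # the peer goes away

# select() reports the hang-up as plain readability.
readable, _, exceptional = select.select([a], [], [a], 1.0)

# Reading then yields an empty result, which callers treat as "closed".
data = a.recv(1024) if a in readable else None
a.close()
```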

Shane Green
805-452-9666 | shane at

On Oct 14, 2012, at 4:55 PM, Richard Oudkerk <shibturn at> wrote:

> On 14/10/2012 11:45pm, Greg Ewing wrote:
>> It does indeed contradict me. It looks like this is
>> implementation-dependent, because I distinctly remember
>> encountering a bug once that I traced back to the fact
>> that I wasn't servicing *all* the fds reported as ready
>> before making another select call.
> Could it have been that some fds were being starved because the earlier ones in the lists were getting priority?  Servicing all fds reported prevents such starvation problems.
> --
> Richard

From shane at  Mon Oct 15 04:47:28 2012
From: shane at (Shane Green)
Date: Sun, 14 Oct 2012 19:47:28 -0700
Subject: [Python-ideas] re-implementing Twisted for fun and profit
In-Reply-To: <>
References: <>
Message-ID: <>

Hm, just jumping in out of turn (async ;-)  here, but I prototyped pretty clean versions of asyncore.dispatcher and asynchat.async_chat type classes built on top of a promise-based asynchronous I/O socket-monitor.  Code ended up looking something like the following: 

# Server accepting incoming connections and spawning new HTTP/S channels...

# With a handle_connection() kind of like...
def handle_connection(conn): 
	# Create new channel and add to socket map, then...
	if (this.running()): 
		...

# And HTTP/S channels with code like this...

# And handle-request code that did stuff like...
if (this.chunked):
	get_content = this.read_until("\r\n").then(self.parse_chunk_size).then(this.read_bytes)
else:
	get_content = this.read_bytes(this.content_length)
return get_content.then(handle_content) 

I'll look around for the code, because it's been well over a year and wasn't complete even then, but that should convey some of how it was shaping up. 
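
For readers unfamiliar with the style, a minimal sketch of how such .then() chaining can work. This Promise class is an illustration written for this note, not the class from the prototype, and it ignores errors entirely.

```python
class Promise:
    """Tiny illustrative promise with chaining and no error handling."""
    def __init__(self):
        self._callbacks = []
        self._done = False
        self._value = None

    def resolve(self, value):
        self._done, self._value = True, value
        for cb in self._callbacks:
            cb(value)

    def then(self, func):
        nxt = Promise()

        def step(value):
            out = func(value)
            if isinstance(out, Promise):
                out.then(nxt.resolve)     # flatten a returned promise
            else:
                nxt.resolve(out)

        if self._done:
            step(self._value)
        else:
            self._callbacks.append(step)
        return nxt

# Mimic "read the chunk-size line, parse it, then read the body".
line = Promise()
body = Promise()
sizes, seen = [], []

def parse_chunk_size(raw):
    size = int(raw.strip(), 16)
    sizes.append(size)
    return size

chain = line.then(parse_chunk_size).then(lambda size: body).then(seen.append)
line.resolve("5\r\n")      # the size line arrives first
body.resolve(b"hello")     # the body arrives later
```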

Shane Green
805-452-9666 | shane at

On Oct 14, 2012, at 7:07 PM, Glyph <glyph at> wrote:

> On Oct 13, 2012, at 9:49 PM, Guido van Rossum <guido at> wrote:
>> It seems that perhaps the 'data_received' interface is the most important one to standardize (for the event loop); I can imagine many variations on the handle_read() implementation, and there would be different ones for IOCP, SSL[1], and probably others. The stdlib should have good ones for the common platforms but it should be designed to allow people who know better to hook up their own implementation.
> Hopefully I'll have time to reply to some of the other stuff in this message, but:
> Yes, absolutely.  This is the most important core issue, for me.  There's a little more to it than "data_received" (for example: listening for incoming connections, establishing outgoing connections, and scheduling timed calls) but this was the original motivation for the async PEP: to specify this interface.
> Again, I'll have to kick the appropriate people to try to get that started again.  (Already started, at <>.)  It's on github so anyone can contribute, so if other participants in this thread - especially those of you with connections to the Tornado community - would like to try fleshing some of it out, please go ahead.  Even if you just have a question, or an area you think the PEP should address, file an issue (or comment on one already filed).
>> (Thanks for writing this; this is the kind of insight I am hoping to get from you and others.)
> Thanks for the repeated prompts for Twisted representatives to participate.
> I was somewhat reluctant to engage with this thread at first, because it looked like a lot of meandering discussion of how to implement stuff that Twisted already deals with quite effectively and I wasn't sure what the point of it all was - why not just go look at Twisted's implementation?  But, after writing that message, I'm glad I did, since I realize that many of these insights are not recorded anywhere and in many cases there's no reasonable way to back this information out of Twisted's implementation.
> In my (ha ha) copious spare time, I'll try to do some blogging about these topics.
> -glyph
> [1]: With one minor nitpick: IOCP and SSL should not be mutually exclusive.  This was a problem for Twisted for a while, given the absolutely stupid way that OpenSSL thinks "sockets" work; but, we now have <> which could probably be adapted to be framework-neutral if this transport/event-loop level were standardized.

From ethan at  Mon Oct 15 03:45:49 2012
From: ethan at (Ethan Furman)
Date: Sun, 14 Oct 2012 18:45:49 -0700
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

I would like to see some backwards compatibility here. ;)

In other words, add method names where reasonable (such as .child or 
.children instead of or along with built-in iteration) so that this new 
Path beast can be backported to the 2.x line. I'm happy to take that 
task on if Antoine has better uses of his time. What this would allow is 
a nice shiny toy for the 2.x series, plus an easier migration to 3.x 
when the time comes.

While I am very excited about the 3.x branch, and will use it whenever I 
can, some projects still have to be 2.x because of other dependencies. 
If the new Path doesn't have conflicting method or dunder names it would 
be possible to have a str-based 2.x version that otherwise acted 
remarkably like the non-str based 3.x version -- especially if the 
__strpath__ concept takes hold and Path objects can be passed around the 
os and os.path modules the way strings are now.


From ben at  Mon Oct 15 05:20:33 2012
From: ben at (Ben Darnell)
Date: Sun, 14 Oct 2012 20:20:33 -0700
Subject: [Python-ideas] The async API of the future: Reactors
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 14, 2012 at 10:15 AM, Guido van Rossum <guido at> wrote:
>> While it's convenient to have higher-level constructors for various
>> specialized types, I'd like to emphasize that having the low-level
>> interface is important for interoperability.  Tornado doesn't know
>> whether the file descriptors are listening sockets, connected sockets,
>> or pipes, so we'd just have to pass in a file descriptor with no other
>> information.
> Yeah, the IO object will still need to have a fileno() method.

They also need to be constructible given nothing but a fileno (but
more on this later)

>>> - In systems like App Engine that don't support async I/O on file
>>> descriptors at all, the constructors for creating I/O objects for disk
>>> files and connection sockets would comply with the interface but fake
>>> out almost everything (just like today, using httplib or httplib2 on
>>> App Engine works by adapting them to a "urlfetch" RPC request).
>> Why would you be allowed to make IO objects for sockets that don't
>> work?  I would expect that to just raise an exception.  On app engine
>> RPCs would be the only supported async I/O objects (and timers, if
>> those are implemented as magic I/O objects), and they're not
>> implemented in terms of sockets or files.
> Here's my use case. Suppose in general one can use async I/O for disk
> files, and it is integrated with the standard (abstract) event loop.
> So someone writes a handy templating library that wants to play nice
> with async apps, so it uses the async I/O idiom to read e.g. the
> template source code. Suppose I want to use that library on App
> Engine. It would be a pain if I had to modify that template-reading
> code to not use the async API. But (given the right async API!) it
> would be pretty simple for the App Engine API to provide a mock
> implementation of the async file reading API that was synchronous
> under the hood. Yes, it would block while waiting for disk, but App
> Engine uses threads anyway so it wouldn't be a problem.
> Another, current-day, use case is the httplib interface in the stdlib
> (a fairly fancy HTTP/1.1 client, although it has its flaws). That's
> based on sockets, which App Engine doesn't have; we have a "urlfetch"
> RPC that you give a URL (and more optional stuff) and returns a record
> containing the contents and headers. But again, many useful 3rd party
> libraries use httplib, and they won't work unless we somehow support
> httplib. So we have had to go out of our way to cover most uses of
> httplib. While the app believes it is opening the connection and
> sending the request, we are actually just buffering everything; and
> when the app starts reading from the connection, we make the urlfetch
> RPC and buffer the response, which we then feed back to the app as it
> believes it is reading from the socket. As long as the app doesn't try
> to get the socket's file descriptor and call select() it will work
> fine.
> But some libraries *do* call select(), and here our emulation breaks
> down. It would be nicer if the standard way to do async stuff was
> higher level than select(), so that we could offer the emulation at a
> level that would integrate with the event loop -- that way, ideally
> when we have to send the urlfetch RPC we could actually return a
> Future (or whatever), and the task would correctly be suspended, just
> *thinking* it was waiting for the response on a socket, but actually
> waiting for the RPC.


> Hopefully SSL provides another use case.

In posix-land, SSL isn't that different from regular sockets (using
ssl.wrap_socket from the 2.6+ stdlib).  The connection process is a
little more complicated, and it gets hairy if you want to support
renegotiation, but once a connection is established you can select()
on its file descriptor and generally use it just like a regular
socket.  On IOCP it's another story, though.

I've finally gotten around to reading up on IOCP and see how it's so
different from everything I'm used to (a lot of Twisted's design
decisions at the reactor level make a lot more sense now).  Earlier
you had mentioned platform-specific constructors for IOObjects, but it
actually needs to be event-loop-specific:  On windows you can use
select() or IOCP, and the IOObjects would be completely different for
each of them (and I do think you need to support both - select() is
kind of a second-class citizen on windows but is useful due to its

This means that the event loop needs to be involved in the creation of
these objects, which is why twisted has connectTCP, listenTCP,
listenUDP, connectSSL, etc methods on the reactor interface.  I think
that in order to handle both IOCP and select-style event loops you'll
need a very broad interface (roughly the union of twisted's
IReactor{Core, Time, Thread, TCP, UDP, SSL} as a minimum, with
IReactorFDSet and maybe IReactorSocket on posix for compatibility with
existing posixy practices).  Basically, an event loop that supports
IOCP (or hopes to support it in the future) will end up looking a lot
like the bottom couple of layers of twisted (and assuming IOCP is a
requirement I wouldn't want to stray too far from twisted's designs


From pjdelport at  Mon Oct 15 05:36:59 2012
From: pjdelport at (Piet Delport)
Date: Mon, 15 Oct 2012 05:36:59 +0200
Subject: [Python-ideas] Proposal: A simple protocol for generator tasks
Message-ID: <>

[This is a lengthy mail; I apologize in advance!]


I've been following this discussion with great interest, and would like
to put forward a suggestion that might simplify some of the questions
that are up in the air.

There are several key points being considered: what exactly constitutes a
"coroutine" or "tasklet", what the precise semantics of "yield" and
"yield from" should be, how the stdlib can support different event loops
and reactors, and how exactly Futures, Deferreds, and other APIs fit
into the whole picture.

This mail is mostly about the first point: I think everyone agrees
roughly what a coroutine-style generator is, but there's enough
variation in how they are used, both historically and presently, that
the concept isn't as precise as it should be. This makes them hard to
think and reason about (failing the "BDFL gets headaches" test), and
makes it harder to define the behavior of all the parts that they
interact with, too.

This is a sketch of an attempt to define what constitutes a
generator-based task or coroutine more rigorously: I think that the
essential behavior can be captured in a small protocol, building on the
generator and iterator protocols. If anyone else thinks this is a good
idea, maybe something like this could work its way into a PEP?

(For the sake of this mail, I will use the term "generator task" or
"task" as a straw man term, but feel free to substitute "coroutine", or
whatever the preferred name ends up being.)


Very informally: A "generator task" is what you get if you take a normal
Python function and replace its blocking calls with "yield from" calls
to equivalent subtasks.

More formally, a "generator task" is a generator that implements an
incremental, multi-step computation, and is intended to be externally
driven to completion by a runner, or "scheduler", until it delivers a
final result.

This driving process happens as follows:

1. A generator task is iterated by its scheduler to yield a series of
   intermediate "step" values.

2. Each value yielded as a "step" represents a scheduling instruction,
   or primitive, to be interpreted by the task's scheduler.

   This scheduling instruction can be None ("just resume this task
   later"), or a variety of other primitives, such as Futures ("resume
   this task with the result of this Future"); see below for more.

3. The scheduler is responsible for interpreting each "step" instruction
   as appropriate, and sending the instruction's result, if any, back to
   the task using send() or throw().

   A scheduler may run a single task to completion, or may multiplex
   execution between many tasks: generator tasks should assume that
   other tasks may have executed while the task was yielding.

4. The generator task completes by successfully returning (raising
   StopIteration), or by raising an exception. The task's caller
   receives this result.

(For the sake of discussion, I use "the scheduler" to refer to whoever
calls the generator task's next/send/throw methods, and "the task's
caller" to refer to whoever receives the task's final result, but this
is not important to the protocol: a task should not care who drives it
or consumes its result, just like an iterator should not.)
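
The four steps above can be sketched as a single-task scheduler. This is an illustration under stated assumptions: it understands only None and concurrent.futures.Future as instructions, and it blocks on the Future, which a real scheduler would not do.

```python
from concurrent.futures import Future

def run(task):
    """Drive one generator task to completion (single-task sketch)."""
    to_send, to_throw = None, None
    while True:
        try:
            if to_throw is not None:
                step = task.throw(to_throw)
            else:
                step = task.send(to_send)       # steps 1 and 3
        except StopIteration as stop:
            return stop.value                   # step 4: the final result
        to_send, to_throw = None, None
        if step is None:
            continue                            # "just resume this task later"
        if isinstance(step, Future):
            try:
                to_send = step.result()         # blocks; fine for a sketch
            except Exception as exc:
                to_throw = exc
            continue
        to_throw = TypeError(f"unknown instruction: {step!r}")

def task():
    yield                          # None: plain reschedule
    fut = Future()
    fut.set_result(21)
    value = yield fut              # resumed with the Future's result
    return value * 2

answer = run(task())
```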

Scheduling instructions / primitives

(This could probably use a better name.)

The protocol is intentionally agnostic about the implementation of
schedulers, event loops, or reactors: as long as they implement the same
set of scheduling primitives, code should work across them.

There are multiple ways to accomplish this, but one possibility is to have a
set of common, generic instructions in a standard library module such as
"tasklib" (which could also contain things like default scheduler
implementations, helper functions, and so on).

A partial list of possible primitives (the names are all made up, not
serious suggestions):

1. None: The most basic "do nothing" instruction. This just instructs
   the scheduler to resume the yielding task later.

2. Futures: Instruct the scheduler to resume with the future's result.

   Similar types in third-party libraries, such as Deferreds, could
   potentially be implemented either natively by a scheduler that
   supports it, or using a wait_for_deferred(d) helper task, or using
   the idea of an "adapter" scheduler (see below).

3. Control primitives: spawn, sleep, etc.

   - Spawn a new (independent) task: yield tasklib.spawn(task())
   - Wait for multiple tasks: (x, y) = yield tasklib.par(foo(), bar())
   - Delay execution: yield tasklib.sleep(seconds)
   - etc.

   These could be simple marker objects, leaving it up to the underlying
   scheduler to actually recognize and implement them; some could also
   be implemented in terms of simpler operations (e.g.  sleep(), in
   terms of lower-level suspend and resume operations).

4. I/O operations

   This could be anything from low-level "yield fd_readable(sock)" style
   requests to any of the higher-level APIs being discussed elsewhere.

   Whatever the exact API ends up being, the scheduler should implement
   these primitives by waiting for the I/O (or condition), and resuming
   the task with the result, if any.

5. Cooperative concurrency primitives, for working with locks, condition
   variables, and so on. (If useful?)

6. Custom, scheduler-specific instructions: Since a generator task can
   potentially yield anything as a scheduler instruction, it's not
   inconceivable for specialized schedulers to support specialized
   instructions. (Code that relies on such special instructions won't
   work on other schedulers, but that would be the point.)

A question open to debate is what a scheduler should do when faced with
an unrecognized scheduling instruction.

Raising TypeError or NotImplementedError back into the task is probably
a reasonable action, and would allow code like:

    def task():
        try:
            yield fancy_magic_instruction()
        except NotImplementedError:
            yield from boring_fallback()
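
A sketch of how that could look end to end, with instructions as simple marker objects. Sleep and FancyMagic are made-up marker classes, and the run() function is a toy scheduler that records what it would do instead of doing it.

```python
class Sleep:
    """Marker object for a hypothetical 'sleep' primitive."""
    def __init__(self, seconds):
        self.seconds = seconds

class FancyMagic:
    """Marker for an instruction this toy scheduler does not know."""

def run(task):
    pending_exc = None
    trace = []
    while True:
        try:
            if pending_exc is not None:
                step = task.throw(pending_exc)
            else:
                step = task.send(None)
        except StopIteration as stop:
            return stop.value, trace
        pending_exc = None
        if step is None:
            trace.append("resume")
        elif isinstance(step, Sleep):
            trace.append(f"sleep {step.seconds}")   # a real one would set a timer
        else:
            # Unrecognized instruction: raise back into the task.
            pending_exc = NotImplementedError(type(step).__name__)

def boring_fallback():
    yield Sleep(0)
    return "fallback result"

def task():
    try:
        result = yield FancyMagic()
    except NotImplementedError:
        result = yield from boring_fallback()
    return result

value, trace = run(task())
```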

Generator tasks as schedulers, and vice versa

Note that there is a symmetry to the protocol when a generator task
calls another using "yield from":

    def task():
        spam = yield from subtask()

Here, task() is both a generator task, and the effective scheduler for
subtask(): it "implements" subtask()'s scheduling instructions by
delegating them to its own scheduler.
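
The delegation is easy to observe by driving task() by hand. The instruction string and the value sent back are arbitrary placeholders.

```python
def subtask():
    got = yield "instruction from subtask"
    return got + 1

def task():
    spam = yield from subtask()
    return spam * 10

t = task()
seen = next(t)              # the subtask's instruction surfaces unchanged
try:
    t.send(4)               # the reply goes straight down to subtask()
except StopIteration as stop:
    result = stop.value
```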

This is a plain observation on its own; however, it raises one or two
interesting possibilities for more sophisticated schedulers implemented as
generator tasks themselves, including:

- Specialized sub-schedulers that run as a normal task within their
  parent scheduler, but implement for example weighted or priority
  queuing of their subtasks, or similar features.

- "Adapter" schedulers that intercept special scheduler instructions
  (say, Deferreds or other library-specific objects), and implement them
  using more generic instructions to the underlying scheduler.
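
As an illustration of the adapter idea, here is a sketch of a generator that runs another task, intercepts one library-specific kind of instruction, and passes everything else to its own scheduler. Deferredish is a made-up stand-in that already holds its value, and forwarding of exceptions thrown by the parent is left out for brevity.

```python
class Deferredish:
    """Stand-in for a library-specific object (hypothetical)."""
    def __init__(self, value):
        self.value = value

def adapt(task):
    """Run `task`, translating Deferredish steps; pass the rest upward."""
    reply = None
    while True:
        try:
            step = task.send(reply)
        except StopIteration as stop:
            return stop.value
        if isinstance(step, Deferredish):
            yield                   # generic "resume me later" to the parent
            reply = step.value      # then answer the task ourselves
        else:
            reply = yield step      # unknown to the adapter: delegate upward

def legacy_task():
    x = yield Deferredish(3)
    y = yield "generic"
    return x + y

outer = adapt(legacy_task())
steps = [next(outer)]               # None, emitted in place of the Deferredish
steps.append(outer.send(None))      # "generic" passes through untouched
try:
    outer.send(4)
except StopIteration as stop:
    total = stop.value
```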

Piet Delport

From greg.ewing at  Mon Oct 15 07:10:56 2012
From: greg.ewing at (Greg Ewing)
Date: Mon, 15 Oct 2012 18:10:56 +1300
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:

> I think it's too early to start proposing new syntax for a problem we
> don't even know is common at all.
> Greg Ewing's proposal works for me:
>   r = yield from par(f1(), f2())

Also, whoever's proposing this needs to understand that even
if the suggested change to yield-from were made, it would NOT
automatically result in par() behaviour. It would just be
another way of sequentially calling two sub-generators.


From stephen at  Mon Oct 15 07:02:20 2012
From: stephen at (Stephen J. Turnbull)
Date: Mon, 15 Oct 2012 14:02:20 +0900
Subject: [Python-ideas] Is there a good reason to use *
	for	multiplication?
In-Reply-To: <>
References: <>
Message-ID: <>

Ram Rachum writes:
 > On Fri, Oct 12, 2012 at 10:40 PM, Blake Hyde <syrion at> wrote:
 > > Is anything gained from this addition?
 > To give a practical answer, I could say that for newbies it's one small
 > confusion that could be removed from the language.

Get Microsoft to agree and implement it in Excel and you might have a
point.  But as long as Excel uses * for multiplication, I don't think
anybody who uses computers is going to have trouble learning this.

Anyway, Python believes in TOOWTDI ("the one old way to do it").[1]

[1]  With apologies to Tim Peters.

From greg.ewing at  Mon Oct 15 07:34:45 2012
From: greg.ewing at (Greg Ewing)
Date: Mon, 15 Oct 2012 18:34:45 +1300
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
Message-ID: <>

Christian Tismer wrote:

> My approach would be to first find out how async operations should
> be modelled the best under the assumption that we have a coroutine
> concept that works without headaches about yielding in and out from
> something to whatnot.

I think we already know that. People like Dijkstra and Hoare
figured it all out decades ago.

That's what my generator-oriented approach is based on --
using standard techniques for managing concurrency.

> After that is settled and gets consensus, then I would think about
> bearable patterns to implement that using generators. And when we
> know what we really need, maybe considering more suitable Syntax.

Given that we don't want to use OS threads or greenlets,
but we're happy to use generators, all that's left is to
find bearable patterns for doing so.


From greg.ewing at  Mon Oct 15 07:58:35 2012
From: greg.ewing at (Greg Ewing)
Date: Mon, 15 Oct 2012 18:58:35 +1300
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:

> Why wouldn't all generators that aren't blocked for I/O just run until
> their next yield, in a round-robin fashion? That's fair enough for me.
> But as I said, my intuition for how things work in Greg's world is not
> very good.

That's exactly how my scheduler behaves.
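
A minimal sketch of that round-robin behaviour, assuming a hypothetical `run_round_robin()` helper and tasks that use a bare `yield` to give up control (blocking for I/O is left out entirely):

```python
from collections import deque

def run_round_robin(*tasks):
    """Run generator tasks in turn, each until its next yield (hypothetical helper)."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)            # run until the task's next yield
        except StopIteration:
            continue              # task finished; drop it
        ready.append(task)        # otherwise requeue it at the back

log = []

def worker(name, steps):
    for i in range(steps):
        log.append((name, i))
        yield                     # give the other tasks a turn

run_round_robin(worker("a", 2), worker("b", 3))
```

The log shows the two workers interleaving one step at a time until the shorter one runs out.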

> OTOH I am okay with only getting one of the exceptions. But I think
> all of the remaining tasks should still be run to completion -- maybe
> the caller just cared about their side effects. Or maybe this should
> be an option to par().

This is hard to answer without considering real use cases,
but my feeling is that if I care enough about the results of
the subtasks to wait until they've all completed before continuing,
then if anything goes wrong in any of them, I might as well abandon
the whole computation.

If that's not the case, I'd be happy to wrap each one in a
try-except that doesn't propagate the exception to the main
task, but just records the information that the subtask
failed somewhere, for the main task to check afterwards.

Another direction to approach this is to consider that par()
ought to be just an optimisation -- the result should be the same
as if you'd written sequential code to perform the subtasks
one after another. And in that case, an exception in one would
prevent any of the following ones from executing, so it's fine
if par() behaves like that, too.
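
The "wrap each one in a try-except" option could look roughly like this. It is only a sketch: `recording()` and `drive()` are hypothetical names, and `drive()` is a stand-in for whatever scheduler actually runs the tasks.

```python
def recording(task, failures):
    """Run `task` as a sub-generator; record any exception instead of propagating it."""
    try:
        return (yield from task)
    except Exception as exc:
        failures.append(exc)
        return None

def ok():
    yield
    return 42

def broken():
    yield
    raise ValueError("boom")

def drive(task):
    """Drive a generator to completion and return its result (stand-in for a scheduler)."""
    try:
        while True:
            next(task)
    except StopIteration as stop:
        return stop.value

failures = []
results = [drive(recording(t, failures)) for t in (ok(), broken())]
```

The main task can then inspect `failures` afterwards instead of being interrupted by the first error.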


From greg.ewing at  Mon Oct 15 08:04:16 2012
From: greg.ewing at (Greg Ewing)
Date: Mon, 15 Oct 2012 19:04:16 +1300
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

Ben Darnell wrote:

> The problem is that as soon as task1 blocks on IO, the entire
> current task (which includes outer(), par(), and both children) gets
> unscheduled.  No part of task2 gets scheduled until it gets yielded
> from, because the scheduler can't see it until then.

The suggested implementation of par() that I posted does
explicitly schedule the subtasks. Then it repeatedly
yields, giving them a chance to run, until they all
finish.


From pjdelport at  Mon Oct 15 08:59:12 2012
From: pjdelport at (Piet Delport)
Date: Mon, 15 Oct 2012 08:59:12 +0200
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 15, 2012 at 1:49 AM, Greg Ewing <greg.ewing at> wrote:
> No, it can't be as simple as that, because that will just
> execute the tasks sequentially. It would have to be something
> like this:
>    def par(*tasks):
>       n = len(tasks)
>       results = [None] * n
>       for i, task in enumerate(tasks):
>          def thunk():
>             nonlocal n
>             results[i] = yield from task
>             n -= 1
>          scheduler.schedule(thunk)
>       while n > 0:
>          yield
>       return results
> Not exactly straightforward, but that's why we write it once
> and put it in the library. :-)

There are two problems with this code. :)

The first is a scoping gotcha: every copy of thunk() will attempt to run
the same task, and assign it to the same index, due to them sharing the
"i" and "task" variables. (The closure captures a reference to the outer
variable cells, rather than a copy of their values at the time of
thunk's definition.)

This could be fixed by defining it as "def thunk(i=i, task=task)", to
capture copies.
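
The gotcha and the default-argument fix can be reproduced in isolation. In this sketch the two `make_thunks_*` helpers are hypothetical and the thunks only return what they captured:

```python
def make_thunks_shared(tasks):
    thunks = []
    for i, task in enumerate(tasks):
        def thunk():
            return i, task        # i and task are looked up when thunk() is called
        thunks.append(thunk)
    return thunks

def make_thunks_bound(tasks):
    thunks = []
    for i, task in enumerate(tasks):
        def thunk(i=i, task=task):   # defaults are evaluated at definition time
            return i, task
        thunks.append(thunk)
    return thunks

shared = [t() for t in make_thunks_shared(["a", "b", "c"])]
bound = [t() for t in make_thunks_bound(["a", "b", "c"])]
```

Every thunk in `shared` sees the values from the last loop iteration, while each thunk in `bound` keeps its own pair.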

The second problem is more serious: the final while loop busy-waits,
which will consume all spare CPU time waiting for the underlying tasks
to complete. For this to be practical, it must suspend and resume itself
more efficiently.

Here's my own attempt. I'll assume the following primitive scheduler
instructions (see my "generator task protocol" mail for context), but it
should be readily adaptable to other primitives:

1. yield tasklib.spawn(task()) instructs the scheduler to spawn a new,
   independent task.
2. yield tasklib.suspend() suspends the current task.
3. yield tasklib.get_resume() obtains a callable / primitive that can be
   used to resume the current task later.

I'll also expand it to keep track of success and failure by returning a
list of (flag, result) tuples, in the style of DeferredList[1].


    def par(*tasks):
        resume = yield tasklib.get_resume()

        # Prepare to hold the results, and keep track of progress.
        results = [None] * len(tasks)
        finished = 0

        # Gather the i'th task's result
        def gather(i, task):
            nonlocal finished
            try:
                r = yield from task
            except Exception as e:
                results[i] = (False, e)
            else:
                results[i] = (True, r)
            finished += 1

            # If we're the last to complete, resume par()
            if finished == len(tasks):
                yield resume()

        # Spawn subtasks, and wait for completion.
        for i, task in enumerate(tasks):
            yield tasklib.spawn(gather(i, task))
        yield tasklib.suspend()

        return results

Hopefully, this is easy enough to read: it should be easy to see how
to modify gather() to add support for resuming immediately on the first
result or first error.
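
To try par() out, something has to interpret the three instructions. The following is a toy sketch only: the `spawn`, `suspend` and `get_resume` classes are hypothetical stand-ins for the tasklib primitives, and `Scheduler` is an assumed minimal FIFO loop, not a proposed implementation.

```python
from collections import deque

# Hypothetical stand-ins for the tasklib primitives described above.
class spawn:
    def __init__(self, task):
        self.task = task

class suspend:
    pass

class get_resume:
    pass

class Scheduler:
    def __init__(self):
        self.ready = deque()      # (task, value to send) pairs
        self.suspended = set()
        self.woken = set()        # tasks resumed before they suspended

    def _resumer(self, task):
        def resume():
            if task in self.suspended:
                self.suspended.discard(task)
                self.ready.append((task, None))
            else:
                self.woken.add(task)
        return resume

    def run(self, main):
        result = []
        def root():
            result.append((yield from main))
        self.ready.append((root(), None))
        while self.ready:
            task, value = self.ready.popleft()
            try:
                step = task.send(value)
            except StopIteration:
                continue          # task finished
            if isinstance(step, spawn):
                self.ready.append((step.task, None))
                self.ready.append((task, None))
            elif isinstance(step, get_resume):
                self.ready.append((task, self._resumer(task)))
            elif isinstance(step, suspend) and task not in self.woken:
                self.suspended.add(task)
            else:                 # None, or a suspend that was already resumed
                self.woken.discard(task)
                self.ready.append((task, None))
        return result[0]

def par(*tasks):
    resume = yield get_resume()
    results = [None] * len(tasks)
    finished = 0

    def gather(i, task):
        nonlocal finished
        try:
            r = yield from task
        except Exception as e:
            results[i] = (False, e)
        else:
            results[i] = (True, r)
        finished += 1
        if finished == len(tasks):
            yield resume()        # last one done: wake par()

    for i, task in enumerate(tasks):
        yield spawn(gather(i, task))
    yield suspend()
    return results

def double(x):
    yield                         # pretend to wait for something
    return 2 * x

def fail():
    yield
    raise ValueError("boom")

results = Scheduler().run(par(double(1), fail(), double(3)))
```

The `woken` set covers the case where every subtask finishes before par() reaches its suspend instruction, so the wake-up is not lost.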


Piet Delport

From greg.ewing at  Mon Oct 15 09:37:55 2012
From: greg.ewing at (Greg Ewing)
Date: Mon, 15 Oct 2012 20:37:55 +1300
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:
> But, as Christian Tismer wrote, we need to have some kind of idea of
> what the primitives are that we want to support.

Well, I was just responding to your asking what the yield-from
equivalent would be to the corresponding thing using Futures.
I assumed from the fact that you asked that it was something
Futures-using people like to do a lot, so it would be worth
putting into a library.

There may be other ways to approach it, though. Suppose we
had a primitive that just waits for a single task to finish
and returns its value. Then we could do this:

    def par(*tasks):
      for task in tasks:
         scheduler.schedule(task)
      return [yield from scheduler.wait_for(task) for task in tasks]

That's straightforward enough that maybe it doesn't even need
to be a library function, just a well-known pattern.

> Maybe you meant condition variable? It looks like threading.Condition
> with notify_all().

Something like that -- the terminology probably varies a bit
from one library to another. The basic concept is "set of
tasks waiting for some condition to become true".

> Anyway, I agree we need some primitives like these, but I'm not sure
> how to choose the set of essentials.

I think that most, maybe all, of the standard synchronisation
mechanisms, like mutexes and semaphores, can be built out of the
primitives I've already introduced -- essentially just block()
and yield. So anything of this kind that we provide will be more
in the way of convenience features than essential primitives.


From glyph at  Mon Oct 15 09:45:14 2012
From: glyph at (Glyph)
Date: Mon, 15 Oct 2012 00:45:14 -0700
Subject: [Python-ideas] re-implementing Twisted for fun and profit
In-Reply-To: <>
References: <>
Message-ID: <>

On Oct 14, 2012, at 7:47 PM, Shane Green <shane at> wrote:

> Hm, just jumping in out of turn (async ;-)  here, but I prototyped pretty clean versions of asyncore.dispatcher and asynchat.async_chat type classes built on top of a promise-based asynchronous I/O socket-monitor.  Code ended up looking something like this following: 

> this.socket.accept().then(this.handle_connection)
> # With a handle_connection() kind of like?
> def handle_connection(conn): 
> 	# Create new channel and add to socket map, then?
> 	if (this.running()): 
> 		this.accept().then(this.handle_connection)

As I explained in a previous message, I think this is the wrong way to go, because:

1. It's error-prone.  It's very easy to forget to call this.accept().then(...).  What if you get an exception? How do you associate it with 'this'?  (Why do you have to constantly have application code check 'this.running'?)
2. It's inefficient.  You have to allocate a promise for every single operation. (Not a big deal for 'accept()' but kind of a big deal for 'recv()'.)
3. It's hard to share resources. What if multiple layers try to call .accept() or .read_until() from different promise contexts?
4. As a bonus fourth point, this uses some wacky new promise abstraction which isn't Deferreds, and therefore (most likely) forgets to implement some part of the callback-flow abstraction that you really need for layered, composable systems :).

We implemented something very like this in Twisted in a module called "", and it was a big problem and had very poor performance.  Just this week I fixed yet another instance of the 'oops I forgot to call .read() again in my exception handler' bug in a system where it's still in use.  Please don't repeat this mistake in the standard library.


From shane at  Mon Oct 15 10:03:33 2012
From: shane at (Shane Green)
Date: Mon, 15 Oct 2012 01:03:33 -0700
Subject: [Python-ideas] re-implementing Twisted for fun and profit
In-Reply-To: <>
References: <>
Message-ID: <>

Your points regarding performance are good ones.  My tests indicated it was slightly slower than asyncore.

The API I based it on is actually quite thorough, and addresses many of the shortcomings Deferreds (in Twisted) have.  Namely, all callbacks registered with a given Promise instance receive the output of the original operation.  Chaining is fully supported, but explicitly (this.then(that).then(that)...), rather than having a Deferred whose value automatically assumes that of each callback, which makes callbacks necessarily dependent on the handlers fired before them, with the default guaranteed behaviour being that only the first one actually receives the output of the originating operation.  I haven't come across many instances where one wants to chain callbacks by accident, but I have seen many examples where multiple parties were interested in the same operation's output.

Finally, I'm not sure your other points differ greatly from the gotchas of I/O programming in general.  Uncoordinated access by multiple threads tends to be problematic.  Again, though, your points about efficiency and the less than ideal "an instance for every" arrangement are good ones.  I'm just throwing it out there as a source of ideas, and hopefully to unseat Deferreds as the de facto callback standard for discussion, because the promise pattern is more flexible and robust.

Shane Green
805-452-9666 | shane at

On Oct 15, 2012, at 12:45 AM, Glyph <glyph at> wrote:

> On Oct 14, 2012, at 7:47 PM, Shane Green <shane at> wrote:
>> Hm, just jumping in out of turn (async ;-)  here, but I prototyped pretty clean versions of asyncore.dispatcher and asynchat.async_chat type classes built on top of a promise-based asynchronous I/O socket-monitor.  Code ended up looking something like this following: 
>> this.socket.accept().then(this.handle_connection)
>> # With a handle_connection() kind of like?
>> def handle_connection(conn): 
>> 	# Create new channel and add to socket map, then?
>> 	if (this.running()): 
>> 		this.accept().then(this.handle_connection)
> As I explained in a previous message, I think this is the wrong way to go, because:
> It's error-prone.  It's very easy to forget to call this.accept().then(...).  What if you get an exception? How do you associate it with 'this'?  (Why do you have to constantly have application code check 'this.running'?)
> It's inefficient.  You have to allocate a promise for every single operation. (Not a big deal for 'accept()' but kind of a big deal for 'recv()'.
> It's hard to share resources. What if multiple layers try to call .accept() or .read_until() from different promise contexts?
> As a bonus fourth point, this uses some wacky new promise abstraction which isn't Deferreds, and therefore (most likely) forgets to implement some part of the callback-flow abstraction that you really need for layered, composable systems :).
> We implemented something very like this in Twisted in a module called called "" and it was a big problem and had very poor performance and I just this week fixed yet another instance of the 'oops I forgot to call .read() again in my exception handler' bug in a system where it's still in use.  Please don't repeat this mistake in the standard library.
> -glyph


From ncoghlan at  Mon Oct 15 10:18:13 2012
From: ncoghlan at (Nick Coghlan)
Date: Mon, 15 Oct 2012 18:18:13 +1000
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 15, 2012 at 10:35 AM, Guido van Rossum <guido at> wrote:
> But, as Christian Tismer wrote, we need to have some kind of idea of
> what the primitives are that we want to support. Or should we just
> have async equivalents for everything in and
> (What about thread-local? Do we need task-local? Shudder.)

Task locals aren't so scary, since they're already the main reason why
generators are so handy - task locals are just the frame locals in the
generator :)

The main primitive I personally want out of an async API is a
task-based equivalent to concurrent.futures.as_completed() [1]. This
is what I meant about iteration being a bit of a mess: the way
as_completed() works, the suspend/resume channel of the iterator
protocol is being used to pass completed future objects back to the
calling iterator. That means that channel *can't* be used to talk
between the coroutine and the scheduler, so if you decide you need to
free it up for that purpose, you're either forced to wait for *all*
the futures to be triggered before any of them can be passed to the
caller (allowing you to use yield-from and return a container of
completed futures) or else you're forced to switch to callback-style
programming (this is where Ruby's blocks are a huge advantage -
because their for loops essentially *are* callbacks, you have a lot
more flexibility in calling back to different places from a single
piece of code).

However, I can see one way to make it work, which is to require the
*invoking* code to continue to manage the communication with the
scheduler. Using this concept, there would be an
"as_completed_async()" primitive that works something like:

    for get_next_result in as_completed_async(tasks):
        task, result = yield get_next_result
        # Process this result, wait for next one

The async equivalent of the concurrent.futures example would then look
something like:

    URLS = ['',

    def load_url_async(url, timeout):
        with (yield urlopen_async(url, timeout=timeout)) as handle:
            return url, handle.read()

    tasks = (load_url_async(url, 60) for url in URLS)
    with concurrent.futures.as_completed_async(tasks) as async_results:
        for get_next_result in async_results:
            try:
                url, data = yield get_next_result
            except Exception as exc:
                print('{!r} generated an exception: {}'.format(url, exc))
            else:
                print('{!r} page is {:d} bytes'.format(url, len(data)))

Key parts of this idea:

1. as_completed_async registers the supplied tasks with the main
scheduler so they can all start running in parallel
2. as_completed_async is a context manager that will *cancel* all
pending jobs on exit
3. as_completed_async is an iterator that produces a special future
that fires whenever *any* of the registered tasks has run to
completion
4. because there's a separate yield step for each result retrieval,
ordinary exception handling mechanisms can be used rather than needing
to introspect a future object
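
For comparison, this is the existing thread-based pattern that the proposal mirrors, using the real concurrent.futures API. The `square()` and `explode()` workers are made up for illustration and replace the URL fetching so the sketch runs without a network:

```python
import concurrent.futures

def square(n):
    return n * n

def explode(n):
    raise ValueError(n)

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    future_to_arg = {
        executor.submit(square, 2): 2,
        executor.submit(square, 3): 3,
        executor.submit(explode, 4): 4,
    }
    outcomes = {}
    # as_completed() yields each future as soon as it finishes, in any order.
    for future in concurrent.futures.as_completed(future_to_arg):
        arg = future_to_arg[future]
        try:
            outcomes[arg] = future.result()   # re-raises the worker's exception
        except Exception as exc:
            outcomes[arg] = exc
```

Here the exception only surfaces when the completed future is asked for its result, which is the introspection step that point 4 aims to avoid.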



Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From tismer at  Mon Oct 15 10:24:53 2012
From: tismer at (Christian Tismer)
Date: Mon, 15 Oct 2012 10:24:53 +0200
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On 15.10.12 07:34, Greg Ewing wrote:
> Christian Tismer wrote:
>> My approach would be to first find out how async operations should
>> be modelled the best under the assumption that we have a coroutine
>> concept that works without headaches about yielding in and out from
>> something to whatnot.
> I think we already know that. People like Dijkstra and Hoare
> figured it all out decades ago.
> That's what my generator-oriented approach is based on --
> using standard techniques for managing concurrency.

Sure, the theory is clear and well-known.
Not so clear is which of the concepts to implement to
what detail, and things like the C10K problem still are a challenge
to solve efficiently for Python.

I think it is necessary to take these considerations into account
at least and to think about updating large sets of waiting
channels efficiently, using appropriate data structures.

>> After that is settled and gets consensus, then I would think about
>> bearable patterns to implement that using generators. And when we
>> know what we really need, maybe considering more suitable Syntax.
> Given that we don't want to use OS threads or greenlets,
> but we're happy to use generators, all that's left is to
> find bearable patterns for doing so.

Question: Is it already given that something like greenlets is out
of consideration? I did not find a final say on that (but I'm bad at
searching...)

Is the whole discussion about what would be best, or just
"you can choose any implementation provided it's generators" ? :-)

cheers - Chris

Christian Tismer             :^)   <mailto:tismer at>
Software Consulting          :     Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121     :    *Starship*
14482 Potsdam                :     PGP key ->
phone +49 173 24 18 776  fax +49 (30) 700143-0023
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
       whom do you want to sponsor today?

From ncoghlan at  Mon Oct 15 10:33:44 2012
From: ncoghlan at (Nick Coghlan)
Date: Mon, 15 Oct 2012 18:33:44 +1000
Subject: [Python-ideas] The async API of the future: Reactors
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 15, 2012 at 2:54 AM, Guido van Rossum <guido at> wrote:
> On Sun, Oct 14, 2012 at 8:01 AM, Calvin Spealman <ironfroggy at> wrote:
>> Why is subclassing a problem? It can be overused, but seems the right
>> thing to do in this case. You want a protocol that responds to new data by
>> echoing and tells the user when the connection was terminated? It makes
>> sense that this is a subclass: a special case of some class that handles the
>> base behavior.
> I replied to this in detail on the "Twisted and Deferreds" thread in
> an exchange. Summary: I'm -0 when it comes to subclassing protocol
> classes; -1 on subclassing objects that implement significant
> functionality.

This problem does seem tailor-made for a Protocol ABC - you can
inherit from it if you want, or call register() if you don't.
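
A minimal sketch of both options using the standard abc module. `StreamProtocol` and its method names are hypothetical, chosen only for illustration:

```python
from abc import ABC, abstractmethod

class StreamProtocol(ABC):
    """Hypothetical protocol ABC; the method names are illustrative only."""

    @abstractmethod
    def data_received(self, data):
        ...

    @abstractmethod
    def connection_lost(self, reason):
        ...

class Echo(StreamProtocol):
    """Option 1: inherit, so the abstract methods are enforced at instantiation."""

    def __init__(self):
        self.sent = []

    def data_received(self, data):
        self.sent.append(data)    # echo back what arrived

    def connection_lost(self, reason):
        self.sent.append(None)

class Recorder:
    """Option 2: an unrelated class that happens to provide the same methods."""

    def __init__(self):
        self.seen = []

    def data_received(self, data):
        self.seen.append(data)

    def connection_lost(self, reason):
        pass

StreamProtocol.register(Recorder)   # virtual subclass: no inheritance involved
```

Both classes pass isinstance() checks against the ABC, but only `Echo` inherits any behaviour from it; register() does not check that the methods exist.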


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From greg.ewing at  Mon Oct 15 11:17:17 2012
From: greg.ewing at (Greg Ewing)
Date: Mon, 15 Oct 2012 22:17:17 +1300
Subject: [Python-ideas] Proposal: A simple protocol for generator tasks
In-Reply-To: <>
References: <>
Message-ID: <>

Piet Delport wrote:

> 2. Each value yielded as a "step" represents a scheduling instruction,
>    or primitive, to be interpreted by the task's scheduler.

I don't think this technique should be used to communicate
with the scheduler, other than *maybe* for a *very* small
set of operations that are truly primitive -- and even then
I'm not convinced.

To begin with, there are some operations that *can't* rely
on yielded instructions as the only way of invoking them.
Spawning a task, for example -- there must be some way for
non-task code to invoke that, otherwise you wouldn't be able
to get top-level tasks into the system.

Also, consider the operation of unblocking a task that's
waiting for some event to occur. Often you will want to
invoke this using a callback from an event loop, which is
not a generator and can't yield anything to anywhere.

Given that these operations must provide a way of invoking
them using a plain function call, there is little reason
to provide a second way using a yielded instruction.

In any case, I believe that the public interface for *any*
scheduler operation should not be a yielded instruction,
but either a plain function or something called using
yield-from, for reasons I explained to Guido earlier.

> - Specialized sub-schedulers that run as a normal task within their
>   parent scheduler, but implement for example weighted or priority
>   queuing of their subtasks, or similar features.

There are problems with allowing multiple schedulers to
coexist within the one system, especially if yielded
instructions are the only way to communicate with them.

It might work for instructions to a task's own scheduler
concerning itself, but some operations need to operate on
a *different* task, e.g. unblocking a task when the event
it was waiting for occurs. How do you know which scheduler
is managing it? And even if you can find out, if you have
to control it using yielded instructions, you have no
way of yielding something to a different task's scheduler.


From _ at  Mon Oct 15 11:24:44 2012
From: _ at (Laurens Van Houtven)
Date: Mon, 15 Oct 2012 11:24:44 +0200
Subject: [Python-ideas] The async API of the future: Reactors
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 15, 2012 at 1:26 AM, Guido van Rossum <guido at> wrote:

> I don't think you can hide threads or concurrency. You can offer
> different APIs to work with them that have different advantages and
> disadvantages, but I don't think you can *hide* them any more than you
> can hide language constructs like classes or sequences.

+1. Nice APIs to put padding on the sharp edges, yes. Hiding them? IMHO,
usually a mistake.


From _ at  Mon Oct 15 11:29:37 2012
From: _ at (Laurens Van Houtven)
Date: Mon, 15 Oct 2012 11:29:37 +0200
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Mon, Oct 15, 2012 at 10:24 AM, Christian Tismer <tismer at>wrote:

> Question: Is it already given that something like greenlets is out
> of consideration? I did not find a final say on that (but I'm bad at
> searching...)

I think a number of people have expressed a distaste for implicit task
switching. That doesn't mean "no", but I'm guessing what's going to happen
is having some kind of explicit, generator based thing, with an underlying
API that makes implementing greenlets pretty easy.

> Is the whole discussion about what would be best, or just
> "you can choose any implementation provided it's generators" ? :-)
> cheers - Chris

From ironfroggy at  Mon Oct 15 12:25:16 2012
From: ironfroggy at (Calvin Spealman)
Date: Mon, 15 Oct 2012 06:25:16 -0400
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 15, 2012 at 3:37 AM, Greg Ewing <greg.ewing at> wrote:
> Guido van Rossum wrote:
>> But, as Christian Tismer wrote, we need to have some kind of idea of
>> what the primitives are that we want to support.
> Well, I was just responding to your asking what the yield-from
> equivalent would be to the corresponding thing using Futures.
> I assumed from the fact that you asked that it was something
> Futures-using people like to do a lot, so it would be worth
> putting into a library.
> There may be other ways to approach it, though. Suppose we
> had a primitive that just waits for a single task to finish
> and returns its value. Then we could do this:
>    def par(*tasks):
>      for task in tasks:
>         scheduler.schedule(task)
>      return [yield from scheduler.wait_for(task) for task in tasks]
> That's straightforward enough that maybe it doesn't even need
> to be a library function, just a well-known pattern.

The more I follow this thread the less I understand the point of
introducing a new use for yield-from in this discussion.

All of this extra work trying to figure out how to make yield-from work
given its existing 3.3 semantics could be avoided if we just
allow yielding the tasks directly, and treat them like any other
async operation.

In the original message where yield-from seemed to be suggested, there
was no justification; it was just said "so you have to do this", but
I don't see that you do.

If you allow yielding tasks, then yielding multiple tasks to wait together
becomes trivial: just yield a tuple of them. In fact, I think we should say
that yielding any tuple of async operations, whatever those objects actually
end up being, should wait for all of them.

Maybe we also want to wait on both some http request operation,
implemented as a task (a generator), and also a cache hit.

def handle_or_cached(request):
    api_resp, cache_resp = yield request(API_ENDPOINT), cache.get(KEY)
    if cache_resp:
        return cache_resp
    return render(api_resp)

Or we could provide wrappers to control the behavior of multiple-wait:

def handle_or_cached(request):
    api_resp, cache_resp = yield first(request(API_ENDPOINT), cache.get(KEY))
    if cache_resp:
        return cache_resp
    return render(api_resp)
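
A minimal sketch of the calling convention only, under loud assumptions: `run()` and `resolve()` are hypothetical helpers, and this toy driver runs the members of a yielded tuple one after another rather than concurrently, which a real scheduler would not do.

```python
import types

def run(task):
    """Drive `task` to completion; a yielded tuple waits for every member (sketch)."""
    value = None
    while True:
        try:
            step = task.send(value)
        except StopIteration as stop:
            return stop.value
        value = resolve(step)

def resolve(step):
    if isinstance(step, tuple):
        return tuple(resolve(s) for s in step)   # wait for all members
    if isinstance(step, types.GeneratorType):
        return run(step)                         # a task: run it for its result
    return step                                  # plain values are passed straight back

def fetch(x):
    yield                         # pretend to wait for I/O
    return x * 10

def handler():
    a, b = yield fetch(1), fetch(2)
    return a + b

total = run(handler())
```

The task body reads the same whether the driver waits sequentially or in parallel, which is the appeal of yielding a tuple.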

>> Maybe you meant condition variable? It looks like threading.Condition
>> with notify_all().
> Something like that -- the terminology probably varies a bit
> from one library to another. The basic concept is "set of
> tasks waiting for some condition to become true".
>> Anyway, I agree we need some primitives like these, but I'm not sure
>> how to choose the set of essentials.
> I think that most, maybe all, of the standard synchronisation
> mechanisms, like mutexes and semaphores, can be built out of the
> primitives I've already introduced -- essentially just block()
> and yield. So anything of this kind that we provide will be more
> in the way of convenience features than essential primitives.
> --
> Greg

Read my blog! I depend on your acceptance of my opinion! I am interesting!
Follow me if you're into that sort of thing:

From tismer at  Mon Oct 15 12:38:21 2012
From: tismer at (Christian Tismer)
Date: Mon, 15 Oct 2012 12:38:21 +0200
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On 15.10.12 11:29, Laurens Van Houtven wrote:
> On Mon, Oct 15, 2012 at 10:24 AM, Christian Tismer 
> <tismer at <mailto:tismer at>> wrote:
>     Question: Is it already given that something like greenlets is out
>     of consideration? I did not find a final say on that (but I'm bad at
>     searching...)
> I think an number of people have expressed a distaste for implicit 
> task switching. That doesn't mean "no", but I'm guessing what's going 
> to happen is having some kind of explicit, generator based thing, with 
> an underlying API that makes implementing greenlets pretty easy.
>     Is the whole discussion about what would be best, or just
>     "you can choose any implementation provided it's generators" ? :-)

Thanks for your reply.

Just one thing that I don't get.
What do you mean by 'implicit taskswitching' ?
There is no such thing in greenlet, if you really meant that
Library from Armin Rigo.

greenlets do everything explicitly, no pre-emption at all.

So, is there a general understanding what a greenlet is and what not?
Just to make sure that the discussed terms are clearly defined.

cheers - Chris

Christian Tismer             :^)   <mailto:tismer at>
Software Consulting          :     Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121     :    *Starship*
14482 Potsdam                :     PGP key ->
phone +49 173 24 18 776  fax +49 (30) 700143-0023
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
       whom do you want to sponsor today?


From ironfroggy at  Mon Oct 15 12:48:20 2012
From: ironfroggy at (Calvin Spealman)
Date: Mon, 15 Oct 2012 06:48:20 -0400
Subject: [Python-ideas] Proposal: A simple protocol for generator tasks
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 14, 2012 at 11:36 PM, Piet Delport <pjdelport at> wrote:
> [This is a lengthy mail; I apologize in advance!]

This is what I get for deciding to check up on these threads at 6AM
after a late night.

> Hi,
> I've been following this discussion with great interest, and would like
> to put forward a suggestion that might simplify some of the questions
> that are up in the air.
> There are several key points being considered: what exactly constitutes a
> "coroutine" or "tasklet", what the precise semantics of "yield" and
> "yield from" should be, how the stdlib can support different event loops
> and reactors, and how exactly Futures, Deferreds, and other APIs fit
> into the whole picture.
> This mail is mostly about the first point: I think everyone agrees
> roughly what a coroutine-style generator is, but there's enough
> variation in how they are used, both historically and presently, that
> the concept isn't as precise as it should be. This makes them hard to
> think and reason about (failing the "BDFL gets headaches" test), and
> makes it harder to define the behavior of all the parts that they
> interact with, too.
> This is a sketch of an attempt to define what constitutes a
> generator-based task or coroutine more rigorously: I think that the
> essential behavior can be captured in a small protocol, building on the
> generator and iterator protocols. If anyone else thinks this is a good
> idea, maybe something like this could work its way into a PEP?
> (For the sake of this mail, I will use the term "generator task" or
> "task" as a straw man term, but feel free to substitute "coroutine", or
> whatever the preferred name ends up being.)

I like that "task" is more general and avoids complaints from some that
these are not "real" coroutines.

> Definition
> ==========
> Very informally: A "generator task" is what you get if you take a normal
> Python function and replace its blocking calls with "yield from" calls
> to equivalent subtasks.

"yield" and "yield from", although I'm really disliking the second
being included
at all. More on this later.

> More formally, a "generator task" is a generator that implements an
> incremental, multi-step computation, and is intended to be externally
> driven to completion by a runner, or "scheduler", until it delivers a
> final result.
> This driving process happens as follows:
> 1. A generator task is iterated by its scheduler to yield a series of
>    intermediate "step" values.
> 2. Each value yielded as a "step" represents a scheduling instruction,
>    or primitive, to be interpreted by the task's scheduler.
>    This scheduling instruction can be None ("just resume this task
>    later"), or a variety of other primitives, such as Futures ("resume
>    this task with the result of this Future"); see below for more.
> 3. The scheduler is responsible for interpreting each "step" instruction
>    as appropriate, and sending the instruction's result, if any, back to
>    the task using send() or throw().
>    A scheduler may run a single task to completion, or may multiplex
>    execution between many tasks: generator tasks should assume that
>    other tasks may have executed while the task was yielding.
> 4. The generator task completes by successfully returning (raising
>    StopIteration), or by raising an exception. The task's caller
>    receives this result.
> (For the sake of discussion, I use "the scheduler" to refer to whoever
> calls the generator task's next/send/throw methods, and "the task's
> caller" to refer to whoever receives the task's final result, but this
> is not important to the protocol: a task should not care who drives it
> or consumes its result, just like an iterator should not.)
> Scheduling instructions / primitives
> ====================================
> (This could probably use a better name.)
> The protocol is intentionally agnostic about the implementation of
> schedulers, event loops, or reactors: as long as they implement the same
> set of scheduling primitives, code should work across them.
> There are multiple ways to accomplish this, but one possibility is to have a
> set of common, generic instructions in a standard library module such as
> "tasklib" (which could also contain things like default scheduler
> implementations, helper functions, and so on).
> A partial list of possible primitives (the names are all made up, not
> serious suggestions):
> 1. None: The most basic "do nothing" instruction. This just instructs
>    the scheduler to resume the yielding task later.
> 2. Futures: Instruct the scheduler to resume with the future's result.
>    Similar types in third-party libraries, such as Deferreds, could
>    potentially be implemented either natively by a scheduler that
>    supports it, or using a wait_for_deferred(d) helper task, or using
>    the idea of an "adapter" scheduler (see below).
> 3. Control primitives: spawn, sleep, etc.
>    - Spawn a new (independent) task: yield tasklib.spawn(task())
>    - Wait for multiple tasks: (x, y) = yield tasklib.par(foo(), bar())
>    - Delay execution: yield tasklib.sleep(seconds)
>    - etc.
>    These could be simple marker objects, leaving it up to the underlying
>    scheduler to actually recognize and implement them; some could also
>    be implemented in terms of simpler operations (e.g.  sleep(), in
>    terms of lower-level suspend and resume operations).

What is the difference between the tossed-around "yield from task()"
and this "yield tasklib.spawn(task())"?

And why isn't it simply spelled "yield task()"? You have all these different
types that tasks can yield to the scheduler. Why isn't a task one of those
possible types? If the scheduler gets an iterator, it should schedule it
automatically.
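
A minimal sketch of what "the scheduler schedules a yielded task
automatically" could look like. The run() function and its rule (a yielded
generator is run as a subtask, and its return value is sent back to the
parent) are assumptions made for illustration, not part of any proposal in
this thread; every other yielded value is treated like None.

```python
import types


def run(task):
    """Drive `task`; a yielded generator is run as a subtask (hypothetical rule)."""
    stack = [task]              # innermost running generator is last
    value, error = None, None   # what to send or throw on the next resume
    while stack:
        gen = stack[-1]
        try:
            if error is not None:
                pending, error = error, None
                step = gen.throw(pending)
            else:
                step = gen.send(value)
        except StopIteration as stop:
            stack.pop()
            value = stop.value  # subtask result becomes the parent's yield value
            continue
        except Exception as exc:
            stack.pop()
            if not stack:
                raise           # nobody left to handle it
            error = exc         # re-raise inside the parent task
            continue
        value = None
        if isinstance(step, types.GeneratorType):
            stack.append(step)  # "yield subtask()": descend into the subtask
        # any other step is treated like None: just resume the task later
    return value


def subtask(n):
    yield                       # None: "resume me later"
    return n * 2


def task():
    a = yield subtask(1)        # plain yield, no "yield from"
    b = yield subtask(2)
    return a + b


result = run(task())
```

The explicit stack does by hand what "yield from" does inside the
interpreter, which is roughly the trade-off being argued about here.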

> 4. I/O operations
>    This could be anything from low-level "yield fd_readable(sock)" style
>    requests, or any of the higher-level APIs being discussed elsewhere.
>    Whatever the exact API ends up being, the scheduler should implement
>    these primitives by waiting for the I/O (or condition), and resuming
>    the task with the result, if any.
> 5. Cooperative concurrency primitives, for working with locks, condition
>    variables, and so on. (If useful?)

I am sure these will come about, but I think that is considered a library
that sits on top of whatever API comes out, not part of it.

> 6. Custom, scheduler-specific instructions: Since a generator task can
>    potentially yield anything as a scheduler instruction, it's not
>    inconceivable for specialized schedulers to support specialized
>    instructions. (Code that relies on such special instructions won't
>    work on other schedulers, but that would be the point.)
> A question open to debate is what a scheduler should do when faced with
> an unrecognized scheduling instruction.
> Raising TypeError or NotImplementedError back into the task is probably
> a reasonable action, and would allow code like:
>     def task():
>         try:
>             yield fancy_magic_instruction()
>         except NotImplementedError:
>             yield from boring_fallback()
>         ...

Interesting. Can anyone think of an example of this?
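
One hypothetical shape such an example could take, as a rough sketch: a
scheduler that only understands None throws NotImplementedError back into the
task for anything else, and the task falls back to a plain subtask. The names
run(), FancyMagic and boring_fallback() are made up for illustration.

```python
def run(task, known=(type(None),)):
    """Drive one task; throw NotImplementedError for unknown instructions."""
    trace = []                  # instructions this scheduler actually handled
    try:
        step = next(task)
        while True:
            if isinstance(step, known):
                trace.append(step)
                step = task.send(None)
            else:
                # Unrecognized instruction: report it back into the task.
                step = task.throw(NotImplementedError(repr(step)))
    except StopIteration as stop:
        return stop.value, trace


class FancyMagic:
    """Hypothetical scheduler-specific instruction that run() does not know."""


def boring_fallback():
    yield                       # plain "resume me later"
    return "fallback result"


def task():
    try:
        result = yield FancyMagic()
    except NotImplementedError:
        result = yield from boring_fallback()
    return result


value, trace = run(task())
```

Here the task ends with the fallback's result, and the only instruction the
scheduler ever handles is the None yielded by boring_fallback().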

> Generator tasks as schedulers, and vice versa
> =============================================
> Note that there is a symmetry to the protocol when a generator task
> calls another using "yield from":
>     def task():
>         spam = yield from subtask()
> Here, task() is both a generator task, and the effective scheduler for
> subtask(): it "implements" subtask()'s scheduling instructions by
> delegating them to its own scheduler.

As raised above, why not simply "yield subtask()"?

> This is a plain observation on its own, however, it raises one or two
> interesting possibilities for more interesting schedulers implemented as
> generator tasks themselves, including:
> - Specialized sub-schedulers that run as a normal task within their
>   parent scheduler, but implement for example weighted or priority
>   queuing of their subtasks, or similar features.

I think that is too messy: you could end up with so many different scheduler
semantics. Maybe this sort of thing is what your scheduler-specific
instructions should be for.

Or, attributes on tasks that schedulers can be known to look for.

> - "Adapter" schedulers that intercept special scheduler instructions
>   (say, Deferreds or other library-specific objects), and implement them
>   using more generic instructions to the underlying scheduler.

I think we can make yielding tasks a direct operation, and still implement
sub-schedulers. They should be more opaque, I think.

> --
> Piet Delport

Read my blog! I depend on your acceptance of my opinion! I am interesting!
Follow me if you're into that sort of thing:

From techtonik at  Mon Oct 15 13:13:56 2012
From: techtonik at (anatoly techtonik)
Date: Mon, 15 Oct 2012 14:13:56 +0300
Subject: [Python-ideas] Python as a tool to download stuff for
In-Reply-To: <jt7ebl$duo$>
References: <>
Message-ID: <>

On Fri, Jul 6, 2012 at 10:30 PM, Georg Brandl <g.brandl at> wrote:
> On 05.07.2012 22:24, Amaury Forgeot d'Arc wrote:
>> 2012/7/5 anatoly techtonik <techtonik at>:
>>> This makes me kind of sad. You have Python installed. Why can't you
>>> just crossplatformly do:
>>>   mkdir nacl
>>>   cd nacl
>>>   python -m urllib get
>>>   python
>> I'm sure there is already a way with standard python tools. Something
>> along these lines:
>> python -c "from urllib.request import urlretrieve; urlretrieve('URL',
>> '')"
>> python -m
>> The second command will work if the zip file has a
>> Do you think we need other tools?
> The "python -m urllib" (don't think "get" is required) interface certainly
> looks nice and is similar in style with many of the other __main__ stuff we
> add to stdlib modules.

Here is the implementation of a urllib.__main__ module for Python 3 with a
progress bar. I've left the 'get' argument in to make it extensible in the
future with other commands, such as `test`.

While working on this code I've also found a regression, which would
be nice to see fixed at the same time.
-------------- next part --------------
A non-text attachment was scrubbed...
Type: application/octet-stream
Size: 4783 bytes
Desc: not available
URL: <>

From Ronny.Pfannschmidt at  Mon Oct 15 13:39:16 2012
From: Ronny.Pfannschmidt at (Ronny Pfannschmidt)
Date: Mon, 15 Oct 2012 13:39:16 +0200
Subject: [Python-ideas] Proposal: A simple protocol for generator tasks
In-Reply-To: <>
References: <>
Message-ID: <>

Hi Piet,

i like that finally someone is pointing out
how to deal with the *concurrent* part

i have some further notes

* greenlet interaction wanted
   since interacting with greenlets is slightly different
   from generators

   * they don't get the function arguments at greenlet creation time,
     but on the first `switch`

     generator outer use:
       gn = f(*arg, **kwarg)

     greenlet outer use:
       gr = greenlet.greenlet(f)
       gr.switch(*args, **kw)

   * instead of send/next, they always use switch
   * `yield` is a function call
      -> there is need for a lib to manage the local part
         of greenlet operations in any case

         (so we should just ensure that the scheduler can
          handle their way of `yield`,
          but not actually have support/compat code in
          the stdlib for their yielding)

* considering regular classes for interaction
   since for some protocol implementations
   different means might make sense
   (this could also be used for the scheduler part of
    greenlet interaction)

   result -> a protocol for cooperative concurrency

* considering the upcoming pypy transaction module/stm
   since using that right could mean "free" parallelism in future
* alternatives for queues/channels are needed
* pools/rate-limiters and other exercises are needed as well
* some kind of default tools for servers are needed

* the stdlib could have a very simple default scheduler
   that's just doing something basic like run all work it can do,
   and if it can't, block on an io reactor

   we just need something that can run() after all has been created

   having an api like scheduler.add(gen) would be a plus
   (since it would be just like pypy's transaction module)
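
   a rough sketch of such a "very simple default scheduler", for illustration
   only: the Scheduler class, its add()/run() methods and the results list are
   assumptions, and the "block on an io reactor" part is left out entirely.

```python
from collections import deque


class Scheduler:
    """Toy round-robin scheduler: add() generators, then run() them all."""

    def __init__(self):
        self.ready = deque()    # tasks that can make progress right now
        self.results = []       # return values, in order of completion

    def add(self, gen):
        self.ready.append(gen)

    def run(self):
        # Run all the work there is; a real scheduler would block on an
        # I/O reactor here once nothing is ready (omitted in this sketch).
        while self.ready:
            gen = self.ready.popleft()
            try:
                next(gen)
            except StopIteration as stop:
                self.results.append(stop.value)
            else:
                self.ready.append(gen)


def worker(name, steps):
    for _ in range(steps):
        yield                   # give other tasks a turn
    return name


scheduler = Scheduler()
scheduler.add(worker("a", 2))
scheduler.add(worker("b", 1))
scheduler.run()
```

   with two steps for "a" and one for "b", the shorter task finishes first.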

   an example i have in mind is something like


If things go as I planned on my side,
starting in jan/feb 2013 i'll try a prototype implementation
for further comments/actual experimentation.

-- Ronny

On 10/15/2012 05:36 AM, Piet Delport wrote:
> [This is a lengthy mail; I apologize in advance!]
> Hi,
> I've been following this discussion with great interest, and would like
> to put forward a suggestion that might simplify some of the questions
> that are up in the air.
> There are several key point being considered: what exactly constitutes a
> "coroutine" or "tasklet", what the precise semantics of "yield" and
> "yield from" should be, how the stdlib can support different event loops
> and reactors, and how exactly Futures, Deferreds, and other APIs fit
> into the whole picture.
> This mail is mostly about the first point: I think everyone agrees
> roughly what a coroutine-style generator is, but there's enough
> variation in how they are used, both historically and presently, that
> the concept isn't as precise as it should be. This makes them hard to
> think and reason about (failing the "BDFL gets headaches" test), and
> makes it harder to define the behavior of all the parts that they
> interact with, too.
> This is a sketch of an attempt to define what constitutes a
> generator-based task or coroutine more rigorously: I think that the
> essential behavior can be captured in a small protocol, building on the
> generator and iterator protocols. If anyone else thinks this is a good
> idea, maybe something like this could work its way into a PEP?
> (For the sake of this mail, I will use the term "generator task" or
> "task" as a straw man term, but feel free to substitute "coroutine", or
> whatever the preferred name ends up being.)
> Definition
> ==========
> Very informally: A "generator task" is what you get if you take a normal
> Python function and replace its blocking calls with "yield from" calls
> to equivalent subtasks.
> More formally, a "generator task" is a generator that implements an
> incremental, multi-step computation, and is intended to be externally
> driven to completion by a runner, or "scheduler", until it delivers a
> final result.
> This driving process happens as follows:
> 1. A generator task is iterated by its scheduler to yield a series of
>     intermediate "step" values.
> 2. Each value yielded as a "step" represents a scheduling instruction,
>     or primitive, to be interpreted by the task's scheduler.
>     This scheduling instruction can be None ("just resume this task
>     later"), or a variety of other primitives, such as Futures ("resume
>     this task with the result of this Future"); see below for more.
> 3. The scheduler is responsible for interpreting each "step" instruction
>     as appropriate, and sending the instruction's result, if any, back to
>     the task using send() or throw().
>     A scheduler may run a single task to completion, or may multiplex
>     execution between many tasks: generator tasks should assume that
>     other tasks may have executed while the task was yielding.
> 4. The generator task completes by successfully returning (raising
>     StopIteration), or by raising an exception. The task's caller
>     receives this result.
> (For the sake of discussion, I use "the scheduler" to refer to whoever
> calls the generator task's next/send/throw methods, and "the task's
> caller" to refer to whoever receives the task's final result, but this
> is not important to the protocol: a task should not care who drives it
> or consumes its result, just like an iterator should not.)
> Scheduling instructions / primitives
> ====================================
> (This could probably use a better name.)
> The protocol is intentionally agnostic about the implementation of
> schedulers, event loops, or reactors: as long as they implement the same
> set of scheduling primitives, code should work across them.
> There are multiple ways to accomplish this, but one possibility is to have a
> set of common, generic instructions in a standard library module such as
> "tasklib" (which could also contain things like default scheduler
> implementations, helper functions, and so on).
> A partial list of possible primitives (the names are all made up, not
> serious suggestions):
> 1. None: The most basic "do nothing" instruction. This just instructs
>     the scheduler to resume the yielding task later.
> 2. Futures: Instruct the scheduler to resume with the future's result.
>     Similar types in third-party libraries, such as Deferreds, could
>     potentially be implemented either natively by a scheduler that
>     supports it, or using a wait_for_deferred(d) helper task, or using
>     the idea of an "adapter" scheduler (see below).
> 3. Control primitives: spawn, sleep, etc.
>     - Spawn a new (independent) task: yield tasklib.spawn(task())
>     - Wait for multiple tasks: (x, y) = yield tasklib.par(foo(), bar())
>     - Delay execution: yield tasklib.sleep(seconds)
>     - etc.
>     These could be simple marker objects, leaving it up to the underlying
>     scheduler to actually recognize and implement them; some could also
>     be implemented in terms of simpler operations (e.g.  sleep(), in
>     terms of lower-level suspend and resume operations).
> 4. I/O operations
>     This could be anything from low-level "yield fd_readable(sock)" style
>     requests, or any of the higher-level APIs being discussed elsewhere.
>     Whatever the exact API ends up being, the scheduler should implement
>     these primitives by waiting for the I/O (or condition), and resuming
>     the task with the result, if any.
> 5. Cooperative concurrency primitives, for working with locks, condition
>     variables, and so on. (If useful?)
> 6. Custom, scheduler-specific instructions: Since a generator task can
>     potentially yield anything as a scheduler instruction, it's not
>     inconceivable for specialized schedulers to support specialized
>     instructions. (Code that relies on such special instructions won't
>     work on other schedulers, but that would be the point.)
> A question open to debate is what a scheduler should do when faced with
> an unrecognized scheduling instruction.
> Raising TypeError or NotImplementedError back into the task is probably
> a reasonable action, and would allow code like:
>      def task():
>          try:
>              yield fancy_magic_instruction()
>          except NotImplementedError:
>              yield from boring_fallback()
>          ...
> Generator tasks as schedulers, and vice versa
> =============================================
> Note that there is a symmetry to the protocol when a generator task
> calls another using "yield from":
>      def task():
>          spam = yield from subtask()
> Here, task() is both a generator task, and the effective scheduler for
> subtask(): it "implements" subtask()'s scheduling instructions by
> delegating them to its own scheduler.
> This is a plain observation on its own, however, it raises one or two
> interesting possibilities for more interesting schedulers implemented as
> generator tasks themselves, including:
> - Specialized sub-schedulers that run as a normal task within their
>    parent scheduler, but implement for example weighted or priority
>    queuing of their subtasks, or similar features.
> - "Adapter" schedulers that intercept special scheduler instructions
>    (say, Deferreds or other library-specific objects), and implement them
>    using more generic instructions to the underlying scheduler.
> --
> Piet Delport

From ncoghlan at  Mon Oct 15 14:08:21 2012
From: ncoghlan at (Nick Coghlan)
Date: Mon, 15 Oct 2012 22:08:21 +1000
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 15, 2012 at 8:25 PM, Calvin Spealman <ironfroggy at> wrote:
> The more I follow this thread the less I understand the point of
> introducing a new use for yield-from in this discussion.

+1. To me, "yield from" is just a tool that brings generators back to
parity with functions when it comes to breaking up a larger algorithm
into smaller pieces. Where you would break a function out into
subfunctions and call them normally, with a generator you can break
out subgenerators and invoke them with yield from.

Any meaningful use of "yield from" in the coroutine context *has* to
ultimately devolve to an operation that:
1. Asks the scheduler to schedule another operation
2. Waits for that operation to complete

Guido's approach to that problem is that step 1 is handled by calling
functions that in turn call methods on a thread-local scheduler. These
methods return Future objects, which can subsequently be yielded to
the scheduler to say "I'm waiting for this future to be set".

I *thought* Greg's way combined step 1 and step 2 into a single
operation: the objects you yield *not only* say what you want to wait
for, but also what you want to do. However, his example par()
implementation killed that idea, since it turned out to need to
schedule tasks explicitly rather than there being an "execute this in
parallel" option.

So now I'm back to thinking that Greg and Guido are talking about
different levels. *Any* scheduling option will be able to be collapsed
into an async task invoked by "yield from" by writing:

    def simple_async_task():
        return (yield start_task())

The part that still needs to be figured out is how you turn that
suspend/resume communications channel between the lowest level of the
task stack and the scheduling loop into something usable, as well as
how you handle iteration in a sensible way (I described my preferred
approach when writing about the API I'd like to see for an async
version of as_completed). I haven't seen anything to suggest that
"yield from"'s role should change from what it is in 3.3: a way to
factor out generators into multiple pieces without breaking send()
and throw().
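
A small sketch of that 3.3 behaviour, for illustration: values passed to
send() and exceptions passed to throw() on the outer generator reach the
subgenerator unchanged, and the subgenerator's return value becomes the value
of the "yield from" expression. The names inner() and outer() are made up.

```python
def inner():
    received = []
    try:
        while True:
            # Yield a running count, and collect whatever is sent in.
            received.append((yield len(received)))
    except ValueError:
        return received         # throw() from outside ends the loop


def outer():
    # Delegates next(), send() and throw() to inner() transparently.
    result = yield from inner()
    return result


gen = outer()
first = next(gen)
second = gen.send("a")
third = gen.send("b")
try:
    gen.throw(ValueError)
except StopIteration as stop:
    final = stop.value
```

Writing outer() with a plain "for x in inner(): yield x" loop would lose both
the sent values and the thrown exception, which is the gap "yield from" closes.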


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Mon Oct 15 14:18:54 2012
From: ncoghlan at (Nick Coghlan)
Date: Mon, 15 Oct 2012 22:18:54 +1000
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Mon, Oct 15, 2012 at 8:38 PM, Christian Tismer <tismer at> wrote:
> Just one thing that I don't get.
> What do you mean by 'implicit taskswitching' ?
> There is no such thing in greenlet, if you really meant that
> Library from Armin Rigo.
> greenlets do everything explicitly, no pre-emption at all.
> So, is there a general understanding what a greenlet is and what not?
> Just to make sure that the discussed terms are clearly defined.

With greenlets, your potential switching points are every function
call (because you can call switch() from anywhere, and you can't
reliably know the name of *every* IO operation, or operation that
implicitly invokes an IO operation).

With generators, there is always an explicit *local* marker within the
generator body of the potential switching points: yield expressions
(including yield from). Ordinary function calls cannot cause the
function to be suspended.

So greenlets give you the scalability benefits of microthreading (as
almost any OS supports a couple of orders of magnitude more sockets
than it can threads), but without the same benefits of locally visible
suspension points that are provided by generators and explicit
callbacks.

That's the philosophical reason. As a *practical* matter, there's
still the problem you described in more detail elsewhere that CPython
relies too much on the C stack to support suspension of arbitrary call
chains without the stack switching assembly code in
Stackless/greenlets.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ironfroggy at  Mon Oct 15 14:31:22 2012
From: ironfroggy at (Calvin Spealman)
Date: Mon, 15 Oct 2012 08:31:22 -0400
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

A thought about more ways we could control groups of tasks, and avoid
yield-from, just came to me this morning.

def asset_packer(asset_urls):
    with yield until_all as results:
        for url in asset_urls:
            yield http.get(url)
    return pack(results)


def handle_or_cached(url):
    with yield first as result:
        yield http.get(url)
        yield cache.get(url)
    return result

Currently, "with yield expr:" is not valid syntax, surprisingly. This gives us
room to use it for something new. A generator-sensitive context manager.

One option is just to allow the syntax directly. The generator yields, and
the sent value is used as a context manager. This would let the generator
tell the scheduler "I'm going to give you a few different async ops, and I want
to wait for all of them before I continue." etc. However, it leaves open the
question of how the scheduler knows the context manager has ended. Could it
somehow indicate this to the correct scheduler in __exit__?
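
For what it's worth, the parenthesized spelling "with (yield expr):" already
parses, so the mechanics of this first option can be tried with existing
syntax. A rough sketch, where group(), the string instructions and the events
list are all made up for illustration and the "scheduler" is just a few
manual send() calls:

```python
from contextlib import contextmanager

events = []


@contextmanager
def group(name):
    # Hypothetical context manager handed back by the scheduler.
    events.append("enter " + name)
    try:
        yield name
    finally:
        events.append("exit " + name)   # __exit__ is where the block ends


def task():
    # The generator yields a request; whatever is sent back is used as
    # the context manager for the block.
    with (yield "want-group") as name:
        events.append("inside " + name)
        yield "op-1"
    return "done"


gen = task()
request = next(gen)                 # the task asks for a group
step = gen.send(group("all"))       # the scheduler answers with a context manager
try:
    gen.send(None)                  # resume; the with block exits
except StopIteration as stop:
    result = stop.value
```

In this sketch the scheduler created the context manager itself, so its
__exit__ is one place where it could learn that the block has ended.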

Another option, if we're adding a new syntax anyway, is to make
"with yield expr:" special and yield first the result of __enter__() and
then, after the block is done, yield the result of __exit__(), which lets
context blocks in the generator talk to the scheduler both before and after.

Maybe we don't need the second, nuttier idea. But, I like the general
idea. It feels
On Mon, Oct 15, 2012 at 8:08 AM, Nick Coghlan <ncoghlan at> wrote:
> On Mon, Oct 15, 2012 at 8:25 PM, Calvin Spealman <ironfroggy at> wrote:
>> The more I follow this thread the less I understand the point of
>> introducing a new use for yield-from in this discussion.
> +1. To me, "yield from" is just a tool that brings generators back to
> parity with functions when it comes to breaking up a larger algorithm
> into smaller pieces. Where you would break a function out into
> subfunctions and call them normally, with a generator you can break
> out subgenerators and invoke them with yield from.
> Any meaningful use of "yield from" in the coroutine context *has* to
> ultimately devolve to an operation that:
> 1. Asks the scheduler to schedule another operation
> 2. Waits for that operation to complete
> Guido's approach to that problem is that step 1 is handled by calling
> functions that in turn call methods on a thread-local scheduler. These
> methods return Future objects, which can subsequently be yielded to
> the scheduler to say "I'm waiting for this future to be set".
> I *thought* Greg's way combined step 1 and step 2 into a single
> operation: the objects you yield *not only* say what you want to wait
> for, but also what you want to do. However, his example par()
> implementation killed that idea, since it turned out to need to
> schedule tasks explicitly rather than there being an "execute this in
> parallel" option.
> So now I'm back to thinking that Greg and Guido are talking about
> different levels. *Any* scheduling option will be able to be collapsed
> into an async task invoked by "yield from" by writing:
>     def simple_async_task():
>         return (yield start_task())
> The part that still needs to be figured out is how you turn that
> suspend/resume communications channel between the lowest level of the
> task stack and the scheduling loop into something usable, as well as
> how you handle iteration in a sensible way (I described my preferred
> approach when writing about the API I'd like to see for an async
> version of as_completed). I haven't seen anything to suggest that
> "yield from"'s role should change from what it is in 3.3: a way to
> factor out generators into multiple pieces without breaking send()
> and throw().
> Cheers,
> Nick.
> --
> Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia


From shibturn at  Mon Oct 15 15:11:07 2012
From: shibturn at (Richard Oudkerk)
Date: Mon, 15 Oct 2012 14:11:07 +0100
Subject: [Python-ideas] The async API of the future: Reactors
In-Reply-To: <>
References: <>
Message-ID: <k5h21h$2dd$>

On 12/10/2012 11:49pm, Guido van Rossum wrote:
>>> >>That said, the idea of a common API architected around async I/O,
>>> >>rather than non-blocking I/O, sounds interesting at least theoretically.
> (Oh, what a nice distinction.)
> ...
> How close would our abstracted reactor interface have to be exactly
> like IOCP? The actual IOCP API calls have very little to recommend
> them -- it's the implementation and the architecture that we're after.
> But we want it to be able to use actual IOCP calls on all systems that
> have them.

One could use IOCP or select/poll/... to implement an API which looks like

class AsyncHub:
     def read(self, fd, nbytes):
         """Return future which is ready when read is complete"""

     def write(self, fd, buf):
         """Return future which is ready when write is complete"""

     def accept(self, fd):
         """Return future which is ready when connection is accepted"""

     def connect(self, fd, address):
         """Return future which is ready when connection has succeeded"""

     def wait(self, timeout=None):
         """Wait till a future is ready; return list of ready futures"""

A reactor could then be built on top of such a hub.
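
A rough sketch of the select/poll side of such a hub, for illustration only.
It implements just read() and wait(), takes socket objects rather than raw
fds, and uses concurrent.futures.Future as a stand-in for whatever future
type would actually be chosen; the class name SelectHub is made up.

```python
import selectors
import socket
from concurrent.futures import Future


class SelectHub:
    """Toy hub: read() returns a Future, wait() completes the ready ones."""

    def __init__(self):
        self._selector = selectors.DefaultSelector()

    def read(self, sock, nbytes):
        """Return future which is ready when read is complete"""
        future = Future()
        # Limitation of this sketch: one pending read per socket.
        self._selector.register(sock, selectors.EVENT_READ, (future, nbytes))
        return future

    def wait(self, timeout=None):
        """Wait till a future is ready; return list of ready futures"""
        ready = []
        for key, _events in self._selector.select(timeout):
            future, nbytes = key.data
            self._selector.unregister(key.fileobj)
            try:
                future.set_result(key.fileobj.recv(nbytes))
            except OSError as exc:
                future.set_exception(exc)
            ready.append(future)
        return ready


left, right = socket.socketpair()
hub = SelectHub()
pending = hub.read(right, 1024)     # nothing to read yet
left.sendall(b"ping")
done = hub.wait(timeout=1.0)        # readiness is turned into completion
data = pending.result(timeout=0)
left.close()
right.close()
```

With select/poll the hub has to perform the recv() itself once the socket is
readable; an IOCP-based hub would instead start the read up front and only
collect the completion in wait().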


From ncoghlan at  Mon Oct 15 15:48:46 2012
From: ncoghlan at (Nick Coghlan)
Date: Mon, 15 Oct 2012 23:48:46 +1000
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 15, 2012 at 10:31 PM, Calvin Spealman <ironfroggy at> wrote:
> Currently, "with yield expr:" is not valid syntax, surprisingly.

It's not that surprising, it's the general requirement that yield
expressions must be enclosed in parentheses except when used
standalone or in a simple assignment statement.

"with (yield expr):" is valid syntax though, so I'm reluctant to
endorse doing anything substantially different if the parentheses are

I think the combination of "yield from" to delegate control (including
exception handling) completely to a subgenerator and "context manager
+ for loop + explicit yield" when an operation needs to yield multiple
times and the exception handling behaviour should be left to the
caller (as in the "as_completed" case) should cover the necessary


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From tismer at  Mon Oct 15 15:57:53 2012
From: tismer at (Christian Tismer)
Date: Mon, 15 Oct 2012 15:57:53 +0200
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

Hey Nick,

On 15.10.12 14:18, Nick Coghlan wrote:
> On Mon, Oct 15, 2012 at 8:38 PM, Christian Tismer <tismer at> wrote:
>> Just one thing that I don't get.
>> What do you mean by 'implicit taskswitching' ?
>> There is no such thing in greenlet, if you really meant that
>> Library from Armin Rigo.
>> greenlets do everything explicitly, no pre-emption at all.
>> So, is there a general understanding what a greenlet is and what not?
>> Just to make sure that the discussed terms are clearly defined.
> With greenlets, your potential switching points are every function
> call (because you can call switch() from anywhere, and you can't
> reliably know the name of *every* IO operation, or operation that
> implicitly invokes an IO operation).

That's true, and you may be surprised: I never liked that!
See below (you'll be even more surprised).
> With generators, there is always an explicit *local* marker within the
> generator body of the potential switching points: yield expressions
> (including yield from). Ordinary function calls cannot cause the
> function to be suspended.
> So greenlets give you the scalability benefits of microthreading (as
> almost any OS supports a couple of orders of magnitude more sockets
> than it can threads), but without the same benefits of locally visible
> suspension points that are provided by generators and explicit
> callbacks.

Yes, I understand that a lot better now.
The nice trick of the (actually a bit ugly) explicit down-chaining
of the locally visible switching points is the one thing that makes
a huge difference, both for Stackless and Greenlets.
Because we could never know the exact switching points, things became
so difficult to handle.

> That's the philosophical reason. As a *practical* matter, there's
> still the problem you described in more detail elsewhere that CPython
> relies too much on the C stack to support suspension of arbitrary call
> chains without the stack switching assembly code in
> Stackless/greenlets.

Right, CPython still keeps unnecessary crap on the C stack.
But that's not the point right now, because on the other hand,
in the context of a possible yield (from or not), the C stack
is clean, and this enables switching.
And actually in such clean positions, Stackless Python (as opposed to
Greenlets) does soft-switching, which is very similar to what the generators
are doing - there is no assembly stuff involved at all.

So in the context of switching, CPython is presumably more efficient
than greenlet (because of stack slicing), and a bit less efficient than
stackless because of the generator chaining.

I have begun studying the code for YIELD_FROM. As it is written, every
next iteration elevates the chain of generators once up and down.
Maybe that can be avoided by changing the frame chain, so this can become
a cheaper O(1) operation.

Alternatively I could also imagine writing real generators or coroutines
as an extension module. It would use the same concept as generators,
internally. No big deal, not changing the interpreter, maybe adding a bit.

I think this would make Greenlet and even Stackless obsolete in most
cases which are of real use.

I would like to discuss this and maybe do a prototype.

cheers - chris

Christian Tismer             :^)   <mailto:tismer at>
Software Consulting          :     Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121     :    *Starship*
14482 Potsdam                :     PGP key ->
phone +49 173 24 18 776  fax +49 (30) 700143-0023
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
       whom do you want to sponsor today?

From rene at  Mon Oct 15 16:11:00 2012
From: rene at (Rene Nejsum)
Date: Mon, 15 Oct 2012 16:11:00 +0200
Subject: [Python-ideas] The async API of the future: Reactors
In-Reply-To: <k5h21h$2dd$>
References: <>
Message-ID: <>

On Oct 15, 2012, at 3:11 PM, Richard Oudkerk <shibturn at> wrote:

> On 12/10/2012 11:49pm, Guido van Rossum wrote:
>>>> >>That said, the idea of a common API architected around async I/O,
>>>> >>rather than non-blocking I/O, sounds interesting at least theoretically.
>> (Oh, what a nice distinction.)
>> ...
>> How close would our abstracted reactor interface have to be exactly
>> like IOCP? The actual IOCP API calls have very little to recommend
>> them -- it's the implementation and the architecture that we're after.
>> But we want it to be able to use actual IOCP calls on all systems that
>> have them.
> One could use IOCP or select/poll/... to implement an API which looks like
> class AsyncHub:
>    def read(self, fd, nbytes):
>        """Return future which is ready when read is complete"""
>    def write(self, fd, buf):
>        """Return future which is ready when write is complete"""
>    def accept(self, fd):
>        """Return future which is ready when connection is accepted"""
>    def connect(self, fd, address):
>        """Return future which is ready when connection has succeeded"""
>    def wait(self, timeout=None):
>        """Wait till a future is ready; return list of ready futures"""
> A reactor could then be built on top of such a hub.

So in general all methods are async; even wait() could be async if
it returned a Future, so that all methods would follow the same concept.

I like this as a general API for all types of connections and all underlying OSes.
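
As a rough illustration of the quoted interface, here is a minimal sketch built on the standard `selectors` module and `concurrent.futures.Future`. It is an assumption-laden toy, not the proposed design: only read() and write() are filled in, it takes socket objects rather than raw fds, it allows one pending operation per socket, and wait() stays blocking rather than returning a future itself.

```python
import selectors
import socket
from concurrent.futures import Future


class AsyncHub:
    """Toy readiness-based hub; accept() and connect() are left out."""

    def __init__(self):
        self._selector = selectors.DefaultSelector()

    def read(self, sock, nbytes):
        """Return a future which is ready when the read is complete."""
        future = Future()
        self._selector.register(
            sock, selectors.EVENT_READ, (future, lambda: sock.recv(nbytes)))
        return future

    def write(self, sock, buf):
        """Return a future which is ready when the write is complete."""
        future = Future()
        self._selector.register(
            sock, selectors.EVENT_WRITE, (future, lambda: sock.send(buf)))
        return future

    def wait(self, timeout=None):
        """Wait until a future is ready; return the list of ready futures."""
        ready = []
        for key, _events in self._selector.select(timeout):
            future, operation = key.data
            self._selector.unregister(key.fileobj)
            try:
                future.set_result(operation())
            except OSError as exc:
                future.set_exception(exc)
            ready.append(future)
        return ready

    def close(self):
        self._selector.close()


left, right = socket.socketpair()
hub = AsyncHub()
write_future = hub.write(left, b"ping")
read_future = hub.read(right, 4)

first_ready = hub.wait(timeout=1)    # only the write can complete at this point
second_ready = hub.wait(timeout=1)   # now the data is there to be read

hub.close()
left.close()
right.close()
```

An IOCP-backed version would start the operation immediately and complete the future on notification, but the calling code could stay the same.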


> --
> Richard

From ironfroggy at  Mon Oct 15 16:16:14 2012
From: ironfroggy at (Calvin Spealman)
Date: Mon, 15 Oct 2012 10:16:14 -0400
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 15, 2012 at 9:48 AM, Nick Coghlan <ncoghlan at> wrote:
> On Mon, Oct 15, 2012 at 10:31 PM, Calvin Spealman <ironfroggy at> wrote:
>> Currently, "with yield expr:" is not valid syntax, surprisingly.
> It's not that surprising, it's the general requirement that yield
> expressions must be enclosed in parentheses except when used
> standalone or in a simple assignment statement.
> "with (yield expr):" is valid syntax though, so I'm reluctant to
> endorse doing anything substantially different if the parentheses are
> omitted.

Silly oversight on my part, and I agree that the parens shouldn't make the
difference in meaning.
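
For reference, a small sketch of the parenthesised form that is already valid. The generator yields a request, the driver sends a context manager back in, and the `with` statement uses whatever the yield expression evaluated to. The names `collecting` and `coroutine` are invented for the example.

```python
from contextlib import contextmanager

events = []


@contextmanager
def collecting(name):
    events.append("enter " + name)
    try:
        yield name
    finally:
        events.append("exit " + name)


def coroutine():
    # The parentheses are required: `with yield expr:` is a syntax error.
    with (yield "need-collector") as tasks:
        events.append("using " + tasks)
    yield "done"


gen = coroutine()
request = next(gen)                      # the generator asks for a context manager
status = gen.send(collecting("tasks"))   # the driver supplies one
```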

> I think the combination of "yield from" to delegate control (including
> exception handling) completely to a subgenerator and "context manager
> + for loop + explicit yield" when an operation needs to yield multiple
> times and the exception handling behaviour should be left to the
> caller (as in the "as_completed" case) should cover the necessary
> behaviours.

I'm still -1 on delegating control to subgenerators with yield-from,
versus having the scheduler just deal with them directly.  I think it
is far less flexible.

I would still like to see a less confusing "with yield expr:" by
simply allowing it without parens, but no special meaning. I think it
would be really useful in coroutines.

with yield collect() as tasks:
  yield task1()
  yield task2()
results = yield tasks

> Cheers,
> Nick.
> --
> Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

Read my blog! I depend on your acceptance of my opinion! I am interesting!
Follow me if you're into that sort of thing:

From ncoghlan at  Mon Oct 15 16:46:00 2012
From: ncoghlan at (Nick Coghlan)
Date: Tue, 16 Oct 2012 00:46:00 +1000
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Mon, Oct 15, 2012 at 11:57 PM, Christian Tismer <tismer at> wrote:
> So in the context of switching, CPython is presumably more efficient
> than greenlet (because of stack slicing), and a bit less efficient than
> stackless because of the generator chaining.
> I have begun studying the code for YIELD_FROM. As it is written, every
> next iteration elevates the chain of generators once up and down.
> Maybe that can be avoided by changing the frame chain, so this can become
> a cheaper O(1) operation.

Yes, we certainly talked about that, but I don't believe anyone came
up with the code needed to make it behave itself properly when
unwinding the stack. (Either that or someone *did* try it, and then
undid it because it broke the test suite, which amounts to the same
thing. Mercurial could say for sure.)

> Alternatively, I could also imagine writing real generators or coroutines
> as an extension module. It would use the same concept as generators,
> internally. No big deal, not changing the interpreter, maybe adding a bit.

Tangentially related, there are some patches [1,2] on the tracker
looking to shuffle a few things related to generator state around to
get them out of the frame objects and into the generator objects where
they belong. There are definitely a few things that could do with
cleaning up in this space.


> I think this would make Greenlet and even Stackless obsolete in most
> cases which are of real use.

The "take this synchronous code and magically make it scale better"
aspect is still a nice feature of greenlets & gevent.

> I would like to discuss this and maybe do a prototype.

Sure, I think there's several things we can do better here, and I
think the test suite is comprehensive enough to keep us honest.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From ncoghlan at  Mon Oct 15 16:50:39 2012
From: ncoghlan at (Nick Coghlan)
Date: Tue, 16 Oct 2012 00:50:39 +1000
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 16, 2012 at 12:16 AM, Calvin Spealman <ironfroggy at> wrote:
> I'm still -1 on delegating control to subgenerators with yield-from,
> versus having the scheduler just deal with them directly.  I think it
> is far less flexible.

Um, yield from is to generators as calls are to functions...
delegating to subgenerators, regardless of context, is what it's
*for*. Without it, the scheduler will have to do quite a bit of extra
work to reconstruct sane stack traces.
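
A small sketch of the stack trace point, using invented names (`read_record`, `handle_request`). When the subgenerator raises, the traceback passes through the delegating generator's frame, much as it would for a nested function call.

```python
import traceback


def read_record():
    yield "waiting for data"
    raise ValueError("bad record")


def handle_request():
    # Delegation: the exception below surfaces through this frame.
    record = yield from read_record()
    return record


def frames_in_traceback(gen):
    """Exhaust gen and return the function names seen in any ValueError traceback."""
    try:
        for _ in gen:
            pass
    except ValueError as exc:
        return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]
    return []


names = frames_in_traceback(handle_request())
```

A scheduler that drives `read_record()` directly, without the yield from in `handle_request`, would see only the innermost frame and would have to stitch the rest together itself.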


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From daniel.mcdougall at  Mon Oct 15 17:00:57 2012
From: daniel.mcdougall at (Daniel McDougall)
Date: Mon, 15 Oct 2012 11:00:57 -0400
Subject: [Python-ideas] The async API of the future: Some thoughts from
 an ignorant Tornado user
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 14, 2012 at 5:32 AM, Laurens Van Houtven <_ at> wrote:
>> import async # The API of the future ;)
>> async.async_call(retrieve_log_playback, settings, tws,
>> mechanism=multiprocessing)
>> # tws == instance of tornado.web.WebSocketHandler that holds the open
>> connection
> Is this a CPU-bound problem?

It depends on the host.  On embedded platforms (e.g. the BeagleBone)
it is more IO-bound than CPU bound (fast CPU but slow disk and slow
memory).  On regular x86 systems it is mostly CPU-bound.

>> * I should be able to choose the type of event loop/async mechanism
>> that is appropriate for the task:  For CPU-bound tasks I'll probably
>> want to use multiprocessing.  For IO-bound tasks I might want to use
>> threading.  For a multitude of tasks that "just need to be async" (by
>> nature) I'll want to use an event loop.
> Ehhh, maybe. This sounds like it confounds the tools for different use
> cases. You can quite easily have threads and processes on top of an event
> loop; that works out particularly nicely for processes because you still
> have to talk to your processes.
> Examples:
> twisted.internet.reactor.spawnProcess (local processes)
> twisted.internet.threads.deferToThread (local threads)
> ampoule (remote processes)
> It's quite easy to do blocking IO in a thread with deferToThread; in fact,
> that's how twisted's adbapi, an async wrapper to dbapi, works.

As I understand it, twisted.internet.reactor.spawnProcess is all about
spawning subprocesses akin to subprocess.Popen().  Also, it requires
writing a sophisticated ProcessProtocol.  It seems to be completely
unrelated and wickedly complicated.  The complete opposite of what I
would consider ideal for an asynchronous library since it is anything
but simple.
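
For comparison, the standard library's `concurrent.futures` (available since Python 3.2) already offers something close to the `async_call(..., mechanism=...)` idea quoted above: the executor plays the role of the mechanism. The sketch below is only an illustration; `async_call` and `count_lines` are hypothetical names, and the job is a stand-in for the real log rendering.

```python
from concurrent.futures import ThreadPoolExecutor


def count_lines(text):
    """Stand-in for an expensive job such as rendering a log playback."""
    return len(text.splitlines())


def async_call(executor, func, *args, **kwargs):
    """Schedule func on the given executor and return a Future right away."""
    return executor.submit(func, *args, **kwargs)


# ThreadPoolExecutor suits blocking I/O. For CPU-bound work one could pass a
# ProcessPoolExecutor instead, provided func and its arguments are picklable.
with ThreadPoolExecutor(max_workers=2) as pool:
    future = async_call(pool, count_lines, "a\nb\nc")
    # ... the caller is free to do other work here ...
    line_count = future.result()
```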

I mean, I could write a separate program to generate HTML playback
files from logs, spawn a subprocess in an asynchronous fashion, then
watch it for completion but I could do that with termio.Multiplex

>> * Any async module should support 'basics' like calling functions at
>> an interval and calling functions after a timeout occurs (with the
>> ability to cancel).
>> * Asynchronous tasks should be able to access the same namespace as
>> everything else.  Maybe wishful thinking.
> With twisted, this is already the case; general caveats for shared mutable
> state across threads of course still apply. Fortunately in most Twisted
> apps, that's a tiny fraction of the total code, and they tend to be
> fractions that are well-isolated or at least easily isolatable.
>> * It should support publish/subscribe-style events (i.e. an event
>> dispatcher).  For example, the ability to watch a file descriptor or
>> socket for changes in state and call a function when that happens.
>> Preferably with the flexibility to define custom events (i.e don't
>> have it tied to kqueue/epoll-specific events).
> Like connectionMade, connectionLost, dataReceived etc?
>> Thanks for your consideration; and thanks for the awesome language.
>> --
>> Dan McDougall - Chief Executive Officer and Developer
>> Liftoff Software ? Your flight to the cloud is now boarding.
>> 904-446-8323
> --
> cheers
> lvh

Dan McDougall - Chief Executive Officer and Developer
Liftoff Software ? Your flight to the cloud is now boarding.

From ironfroggy at  Mon Oct 15 17:16:18 2012
From: ironfroggy at (Calvin Spealman)
Date: Mon, 15 Oct 2012 11:16:18 -0400
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 15, 2012 at 10:50 AM, Nick Coghlan <ncoghlan at> wrote:
> On Tue, Oct 16, 2012 at 12:16 AM, Calvin Spealman <ironfroggy at> wrote:
>> I'm still -1 on delegating control to subgenerators with yield-from,
>> versus having the scheduler just deal with them directly.  I think it
>> is far less flexible.
> Um, yield from is to generators as calls are to functions...
> delegating to subgenerators, regardless of context, is what it's
> *for*. Without it, the scheduler will have to do quite a bit of extra
> work to reconstruct sane stack traces.

I didn't consider the ease of sane stack traces; that is a good point.
I just see all the problems that seem to be harder to do right with yield-from
and I wish it could be made simpler by just bypassing them for coroutines.
I don't feel they are the same as the original intent of yield-from, but I see
the obvious way they match the need now.

But, I still want to make my case and will put another hypothetical on the
board. A "sane stack trace" only makes sense if we assume that tasks
"call" each other in the same kind of call tree that synchronous code flows
in, and I don't think that is necessarily the case. There are cases when one
task might want to end before tasks it has "called" are complete, and if we use
yield-from this is *impossible* but it is very useful.

An example of this is a task which makes multiple requests, but only needs to
wait for the results from less-than-all of them before returning. It might
still want the other tasks to complete, even if it won't do anything with the
results.

yield-from semantics won't allow a called task to continue, if needed, after the
calling task itself has completed.

Is there another way these semantics could be expressed?

> Cheers,
> Nick.
> --
> Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

Read my blog! I depend on your acceptance of my opinion! I am interesting!
Follow me if you're into that sort of thing:

From g.brandl at  Mon Oct 15 17:32:42 2012
From: g.brandl at (Georg Brandl)
Date: Mon, 15 Oct 2012 17:32:42 +0200
Subject: [Python-ideas] Python as a tool to download stuff for
In-Reply-To: <>
References: <>
Message-ID: <k5haak$ich$>

On 10/15/2012 01:13 PM, anatoly techtonik wrote:
> On Fri, Jul 6, 2012 at 10:30 PM, Georg Brandl <g.brandl at> wrote:
>> On 05.07.2012 22:24, Amaury Forgeot d'Arc wrote:
>>> 2012/7/5 anatoly techtonik <techtonik at>:
>>>> This makes me kind of sad. You have Python installed. Why can't you
>>>> just crossplatformly do:
>>>>   mkdir nacl
>>>>   cd nacl
>>>>   python -m urllib get
>>>>   python
>>> I'm sure there is already a way with standard python tools. Something
>>> along these lines:
>>> python -c "from urllib.request import urlretrieve; urlretrieve('URL',
>>> '')"
>>> python -m
>>> The second command will work if the zip file has a
>>> Do you think we need other tools?
>> The "python -m urllib" (don't think "get" is required) interface certainly
>> looks nice and is similar in style with many of the other __main__ stuff we
>> add to stdlib modules.
> Here is the implementation of urllib.__main__ module for Python 3 with
> progress bar. I've left 'get' argument to make it extensible in future with
> other commands, such as `test`.

Please don't send patches to the mailing list, open a new tracker issue instead.


From ncoghlan at  Mon Oct 15 17:32:46 2012
From: ncoghlan at (Nick Coghlan)
Date: Tue, 16 Oct 2012 01:32:46 +1000
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 16, 2012 at 1:16 AM, Calvin Spealman <ironfroggy at> wrote:
> An example of this is a task which makes multiple requests, but only needs to
> wait for the results from less-than-all of them before returning. It might
> still want the other tasks to complete, even if it won't do anything with the
> results.
> yield-from semantics won't allow a called task to continue, if needed, after the
> calling task itself has completed.
> Is there another way these semantics could be expressed?

Sure, did you see my as_completed example? You couldn't use "yield
from" for that, you'd need to use an ordinary iterator and an explicit
yield in the body of the loop (this is why I disagree with Greg that
"yield from" can serve as the one true API - it doesn't handle partial
iteration, and it doesn't handle pre- or post- processing around the
suspension points while iterating).
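
The partial iteration case can be illustrated with the thread-based `concurrent.futures.as_completed`, which is an analogy only and not the coroutine API being designed here. The futures are created and completed by hand so the example is deterministic; normally an executor would create them. `first_n` is a made-up helper.

```python
from concurrent.futures import Future, as_completed


def first_n(futures, n):
    """Return the results of the first n futures to finish, then stop waiting."""
    results = []
    for future in as_completed(futures):
        results.append(future.result())
        if len(results) == n:
            break   # partial iteration: the remaining futures are left running
    return results


slow, quick_b, quick_c = Future(), Future(), Future()
quick_b.set_result("b")
quick_c.set_result("c")

winners = sorted(first_n([slow, quick_b, quick_c], 2))
still_pending = not slow.done()
```

The loop body is where pre- or post-processing around each suspension point would go, which is the part that plain delegation cannot express.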

My preferred way of thinking of "yield from" is as a simple
refactoring tool: "Gee, this generator is getting kind of long and
unwieldy. I'll move this piece out into a separate generator, and use
yield from to invoke it" or "Hmm, I keep using this same sequence of 3
or 4 operations. I guess I'll move them out to a separate generator
and use yield from to invoke it in the appropriate places".

Compare that with the almost identical equivalents when refactoring a
function to call a helper function instead of doing everything inline:
"Gee, this function is getting kind of long and unwieldy. I'll move
this piece out into a separate function, and call it" or "Hmm, I keep
using this same sequence of 3 or 4 operations. I guess I'll move them
out to a separate function and call it in the appropriate places".

Just as some operations can't be factored out with simple function
calls, hence we have iterators and context managers, so not all
operations will be able to be factored out of a coroutine with "yield
from" (hence why I consider "yield" to be the more appropriate core
primitive, with "yield from" just correctly factoring out the task of
complete delegation, which is otherwise hard to do correctly)
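
A sketch of that refactoring, with an invented request/reply exchange. `fetch_inline` does everything in one generator; `fetch` moves the first two steps into `authenticate` and invokes it with yield from. The `drive` helper is a hypothetical stand-in for a scheduler that feeds replies back in.

```python
def fetch_inline(name):
    reply = yield "HELLO " + name
    if reply != "OK":
        raise RuntimeError("handshake failed")
    token = yield "AUTH"
    data = yield "GET " + token
    return data


def authenticate(name):
    # The piece that was factored out of fetch_inline().
    reply = yield "HELLO " + name
    if reply != "OK":
        raise RuntimeError("handshake failed")
    token = yield "AUTH"
    return token


def fetch(name):
    token = yield from authenticate(name)
    data = yield "GET " + token
    return data


def drive(gen, replies):
    """Send canned replies into gen; return (requests seen, return value)."""
    requests = [next(gen)]
    for reply in replies:
        try:
            requests.append(gen.send(reply))
        except StopIteration as stop:
            return requests, stop.value
    raise RuntimeError("generator wanted more replies")


replies = ["OK", "t0k", "payload"]
before = drive(fetch_inline("bob"), replies)
after = drive(fetch("bob"), replies)
```

From the driver's point of view the two versions are indistinguishable, which is what makes it a refactoring.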


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From guido at  Mon Oct 15 17:33:32 2012
From: guido at (Guido van Rossum)
Date: Mon, 15 Oct 2012 08:33:32 -0700
Subject: [Python-ideas] The async API of the future: Reactors
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 15, 2012 at 1:33 AM, Nick Coghlan <ncoghlan at> wrote:
> On Mon, Oct 15, 2012 at 2:54 AM, Guido van Rossum <guido at> wrote:
>> On Sun, Oct 14, 2012 at 8:01 AM, Calvin Spealman <ironfroggy at> wrote:
>>> Why is subclassing a problem? It can be overused, but seems the right
>>> thing to do in this case. You want a protocol that responds to new data by
>>> echoing and tells the user when the connection was terminated? It makes
>>> sense that this is a subclass: a special case of some class that handles the
>>> base behavior.
>> I replied to this in detail on the "Twisted and Deferreds" thread in
>> an exchange. Summary: I'm -0 when it comes to subclassing protocol
>> classes; -1 on subclassing objects that implement significant
>> functionality.
> This problem does seem tailor-made for a Protocol ABC - you can
> inherit from it if you want, or call register() if you don't.

But you're still stuck with implementing the names that someone else
decided upon a decade ago... :-)

--Guido van Rossum (

From jstpierre at  Mon Oct 15 17:39:57 2012
From: jstpierre at (Jasper St. Pierre)
Date: Mon, 15 Oct 2012 11:39:57 -0400
Subject: [Python-ideas] The async API of the future: Reactors
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 15, 2012 at 11:33 AM, Guido van Rossum <guido at> wrote:
> But you're still stuck with implementing the names that someone else
> decided upon a decade ago... :-)

And why is that a bad thing? I don't see the value in having something
like: thing.set_data_received_callback(self.bake_some_eggs)

We're going to have to give *something* a name, eventually. Why not
pick it at the most direct level?

> --
> --Guido van Rossum (


From _ at  Mon Oct 15 17:51:03 2012
From: _ at (Laurens Van Houtven)
Date: Mon, 15 Oct 2012 17:51:03 +0200
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 15, 2012 at 5:32 PM, Nick Coghlan <ncoghlan at> wrote:

> My preferred way of thinking of "yield from" is as a simple
> refactoring tool: "Gee, this generator is getting kind of long and
> unwieldy. I'll move this piece out into a separate generator, and use
> yield from to invoke it" or "Hmm, I keep using this same sequence of 3
> or 4 operations. I guess I'll move them out to a separate generator
> and use yield from to invoke it in the appropriate places".

I agree. That's how I've used it. Maybe that's just short-sightedness.


From glyph at  Mon Oct 15 17:51:17 2012
From: glyph at (Glyph)
Date: Mon, 15 Oct 2012 08:51:17 -0700
Subject: [Python-ideas] re-implementing Twisted for fun and profit
In-Reply-To: <>
References: <>
Message-ID: <>

On Oct 15, 2012, at 1:03 AM, Shane Green <shane at> wrote:

> Namely, all callbacks registered with a given Promise instance, receive the output of the original operation

This is somewhat tangential to the I/O loop discussion, and my hope for that discussion is that it won't involve Deferreds, or Futures, or Promises, or any other request/response callback management abstraction, because requests and responses are significantly higher level than accept() and recv() and do not belong within the same layer.  The event loop ought to provide tools to experiment with event-driven abstractions so that users can use Deferreds and Promises - which are, fundamentally, perfectly interoperable, and still use standard library network protocol implementations.

What I think you were trying to say was that callback addition on Deferreds is a destructive operation; whereas your promises are (from the caller's perspective, at least) immutable.  Sometimes I do think that the visibly mutable nature of Deferreds was a mistake.  If I read you properly though, what you're saying is that you can do this:

promise = ...

and in yield-coroutine style this is effectively:

value = yield promise
beta(yield alpha(value))
delta(yield gamma(value))

This deficiency is reasonably easy to work around with Deferreds.  You can just do:

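from twisted.internet.defer import Deferred

def fork(d):
    dprime = Deferred()
    def propagate(result):
        dprime.callback(result)  # pass the result (or Failure) on to the fork
        return result            # leave d's own result untouched
    d.addBoth(propagate)
    return dprime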

and then:


Perhaps this function should be in Twisted; it's certainly come up a few times.

But, the fact that the original result is immediately forgotten can also be handy, because it helps the unused result get garbage collected faster, even if multiple things are hanging on to the Deferred after the initial result has been processed.  And it is actually pretty unusual to want to share the same result among multiple callers (which is why this function hasn't been added to the core yet).


From guido at  Mon Oct 15 17:53:49 2012
From: guido at (Guido van Rossum)
Date: Mon, 15 Oct 2012 08:53:49 -0700
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 14, 2012 at 10:58 PM, Greg Ewing
<greg.ewing at> wrote:
> Guido van Rossum wrote:
>> Why wouldn't all generators that aren't blocked for I/O just run until
>> their next yield, in a round-robin fashion? That's fair enough for me.
>> But as I said, my intuition for how things work in Greg's world is not
>> very good.
> That's exactly how my scheduler behaves.
>> OTOH I am okay with only getting one of the exceptions. But I think
>> all of the remaining tasks should still be run to completion -- maybe
>> the caller just cared about their side effects. Or maybe this should
>> be an option to par().
> This is hard to answer without considering real use cases,
> but my feeling is that if I care enough about the results of
> the subtasks to wait until they've all completed before continuing,
> then if anything goes wrong in any of them, I might as well abandon
> the whole computation.
> If that's not the case, I'd be happy to wrap each one in a
> try-except that doesn't propagate the exception to the main
> task, but just records the information that the subtask
> failed somewhere, for the main task to check afterwards.
> Another direction to approach this is to consider that par()
> ought to be just an optimisation -- the result should be the same
> as if you'd written sequential code to perform the subtasks
> one after another. And in that case, an exception in one would
> prevent any of the following ones from executing, so it's fine
> if par() behaves like that, too.

I'd think of such a par() more as something that saves me typing than
as an optimization. Anyway, the key functionality I cannot live
without here is to start multiple tasks concurrently. It seems that
without par() or some other scheduling primitive, you cannot do that:
if I write

a = foo_task()  # Search google
b = bar_task()  # Search bing
ra = yield from a
rb = yield from b
# now compare search results

the tasks run sequentially. A good par() should run them concurrently.
But there needs to be another way to get a task running immediately
and concurrently; I believe that would be

a = spawn(foo_task())

right? One could then at any later point use

ra = yield from a

One could also combine these and do e.g.

a = spawn(foo_task())
b = spawn(bar_task())
<do more work locally>
ra, rb = yield from par(a, b)

Have I got the spelling for spawn() right? In many other systems (e.g.
threads, greenlets) this kind of operation takes a callable, not the
result of calling a function (albeit a generator). If it takes a
generator, would it return the same generator or a different one to
wait for?
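
One possible answer, sketched as a toy round-robin scheduler. Everything here is an assumption made for illustration, not Greg's actual scheduler: spawn() takes a generator and returns a different object (a `Task`) to wait on, waiting is done by polling, and exceptions in tasks are not handled.

```python
from collections import deque


class Task:
    """A generator driven by the scheduler, plus its eventual result."""

    def __init__(self, gen):
        self.gen = gen
        self.done = False
        self.result = None

    def __iter__(self):
        # Lets callers write `result = yield from task`; polls until finished.
        while not self.done:
            yield
        return self.result


_ready = deque()


def spawn(gen):
    """Start running gen concurrently; return a Task that can be waited on."""
    task = Task(gen)
    _ready.append(task)
    return task


def par(*tasks):
    """Wait for all tasks and return their results in argument order."""
    results = []
    for task in tasks:
        results.append((yield from task))
    return results


def run():
    """Round-robin: advance each runnable task to its next yield."""
    while _ready:
        task = _ready.popleft()
        try:
            next(task.gen)
        except StopIteration as stop:
            task.done, task.result = True, stop.value
        else:
            _ready.append(task)


log = []


def search(engine, steps):
    for step in range(steps):
        log.append((engine, step))
        yield                      # pretend to wait for I/O
    return engine + " results"


def main():
    a = spawn(search("google", 2))
    b = spawn(search("bing", 2))
    ra, rb = yield from par(a, b)
    return ra, rb


main_task = spawn(main())
run()
```

The log shows the two searches interleaving rather than running one after the other, which is the behaviour asked for above.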

--Guido van Rossum (

From solipsis at  Mon Oct 15 17:54:11 2012
From: solipsis at (Antoine Pitrou)
Date: Mon, 15 Oct 2012 17:54:11 +0200
Subject: [Python-ideas] The async API of the future: Reactors
References: <>
Message-ID: <>

On Mon, 15 Oct 2012 14:11:07 +0100
Richard Oudkerk <shibturn at>
> One could use IOCP or select/poll/... to implement an API which looks like
> class AsyncHub:
>      def read(self, fd, nbytes):
>          """Return future which is ready when read is complete"""
>      def write(self, fd, buf):
>          """Return future which is ready when write is complete"""
>      def accept(self, fd):
>          """Return future which is ready when connection is accepted"""
>      def connect(self, fd, address):
>          """Return future which is ready when connection has succeeded"""
>      def wait(self, timeout=None):
>          """Wait till a future is ready; return list of ready futures"""
> A reactor could then be built on top of such a hub.

I suppose the reactor would handle higher-level stuff such as TLS?



From ncoghlan at  Mon Oct 15 17:56:45 2012
From: ncoghlan at (Nick Coghlan)
Date: Tue, 16 Oct 2012 01:56:45 +1000
Subject: [Python-ideas] The async API of the future: Reactors
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 16, 2012 at 1:33 AM, Guido van Rossum <guido at> wrote:
> But you're still stuck with implementing the names that someone else
> decided upon a decade ago... :-)

There's a certain benefit to everyone using the same names and being
able to read each others code, even when there's a (small?) risk of
the names not aging well. Do we really want the first step in
deciphering someone else's async code to be "OK, what did they call
their connection and data processing callbacks?"?

Twisted's IProtocol API is pretty simple:
- makeConnection
- connectionMade
- dataReceived
- connectionLost

Everything else is up to the individual protocols (including whether
or not they offer a "write" method)

The transport and producer/consumer APIs aren't much more complicated
and make rather a lot of sense. The precise *shape* of those APIs is
likely to be different in a generator based system, and I assume we'd
want to lose the camel-case names, but standardising the terminology
seems like a good idea.
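
A sketch of what a standardised Protocol ABC might look like, with the camel-case names converted to snake case. The method names and signatures are assumptions made for this example (makeConnection is left out), not a proposal from the thread. It shows both routes mentioned earlier: inheriting from the ABC, or registering an unrelated class.

```python
from abc import ABC, abstractmethod


class Protocol(ABC):
    @abstractmethod
    def connection_made(self, transport):
        """Called once the transport is ready."""

    @abstractmethod
    def data_received(self, data):
        """Called with each chunk of incoming bytes."""

    @abstractmethod
    def connection_lost(self, reason):
        """Called when the connection goes away."""


class Echo(Protocol):
    """Inherits from the ABC and fills in every method."""

    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        self.transport.write(data)

    def connection_lost(self, reason):
        self.transport = None


class Recorder:
    """Provides the same methods without inheriting from Protocol."""

    def __init__(self):
        self.events = []

    def connection_made(self, transport):
        self.events.append("made")

    def data_received(self, data):
        self.events.append(data)

    def connection_lost(self, reason):
        self.events.append("lost")


Protocol.register(Recorder)   # declare conformance without subclassing


class ListTransport:
    """Minimal fake transport that records what was written."""

    def __init__(self):
        self.written = []

    def write(self, data):
        self.written.append(data)


transport = ListTransport()
echo = Echo()
echo.connection_made(transport)
echo.data_received(b"hello")
```

Note that register() only affects isinstance() and issubclass() checks; it does not verify that the registered class really implements the methods.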


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From _ at  Mon Oct 15 18:04:09 2012
From: _ at (Laurens Van Houtven)
Date: Mon, 15 Oct 2012 18:04:09 +0200
Subject: [Python-ideas] The async API of the future: Reactors
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 15, 2012 at 5:56 PM, Nick Coghlan <ncoghlan at> wrote:

> Twisted's IProtocol API is pretty simple:
> - makeConnection
> - connectionMade
> - dataReceived
> - connectionLost
> Everything else is up to the individual protocols (including whether
> or not they offer a "write" method)

While I agree with everything else you're saying, write may be a bad
example: it's generally something on the *transport*, and it's an interface
method (ie always available) there.

> Cheers,
> Nick.


From ironfroggy at  Mon Oct 15 18:06:44 2012
From: ironfroggy at (Calvin Spealman)
Date: Mon, 15 Oct 2012 12:06:44 -0400
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 15, 2012 at 11:53 AM, Guido van Rossum <guido at> wrote:
> On Sun, Oct 14, 2012 at 10:58 PM, Greg Ewing
> <greg.ewing at> wrote:
>> Guido van Rossum wrote:
>>> Why wouldn't all generators that aren't blocked for I/O just run until
>>> their next yield, in a round-robin fashion? That's fair enough for me.
>>> But as I said, my intuition for how things work in Greg's world is not
>>> very good.
>> That's exactly how my scheduler behaves.
>>> OTOH I am okay with only getting one of the exceptions. But I think
>>> all of the remaining tasks should still be run to completion -- maybe
>>> the caller just cared about their side effects. Or maybe this should
>>> be an option to par().
>> This is hard to answer without considering real use cases,
>> but my feeling is that if I care enough about the results of
>> the subtasks to wait until they've all completed before continuing,
>> then if anything goes wrong in any of them, I might as well abandon
>> the whole computation.
>> If that's not the case, I'd be happy to wrap each one in a
>> try-except that doesn't propagate the exception to the main
>> task, but just records the information that the subtask
>> failed somewhere, for the main task to check afterwards.
>> Another direction to approach this is to consider that par()
>> ought to be just an optimisation -- the result should be the same
>> as if you'd written sequential code to perform the subtasks
>> one after another. And in that case, an exception in one would
>> prevent any of the following ones from executing, so it's fine
>> if par() behaves like that, too.
> I'd think of such a par() more as something that saves me typing than
> as an optimization. Anyway, the key functionality I cannot live
> without here is to start multiple tasks concurrently. It seems that
> without par() or some other scheduling primitive, you cannot do that:
> if I write
> a = foo_task()  # Search google
> b = bar_task()  # Search bing
> ra = yield from a
> rb = yield from b
> # now compare search results
> the tasks run sequentially. A good par() should run them concurrently.
> But there needs to be another way to get a task running immediately
> and concurrently; I believe that would be
> a = spawn(foo_task())
> right? One could then at any later point use
> ra = yield from a
> One could also combine these and do e.g.
> a = spawn(foo_task())
> b = spawn(bar_task())
> <do more work locally>
> ra, rb = yield from par(a, b)
> Have I got the spelling for spawn() right? In many other systems (e.g.
> threads, greenlets) this kind of operation takes a callable, not the
> result of calling a function (albeit a generator). If it takes a
> generator, would it return the same generator or a different one to
> wait for?

I think "start this other async task, but let me continue now" (spawn) is
so common and basic an operation it needs to be first class. What if
we allow both yield and yield from of a task? If we allow spawn(task())
then we're not getting nice tracebacks anyway, so I think we should

  result1 = yield from task1() # wait for this other task
  result2 = yield from task2() # wait for this next


  future1 = yield task1() # spawn task
  future2 = yield task2() # spawn other task
  results = yield future1, future2

I was wrong to say we shouldn't do yield-from task scheduling, I see
the benefits now. But I don't think it has to be either/or. I think it makes
sense to allow both, and that the behavior differences between the two
ways to invoke another task would be sensible. Both are primitives we
need to support as first-class operations. That is, without some wrapper
like spawn().

> --
> --Guido van Rossum (

Read my blog! I depend on your acceptance of my opinion! I am interesting!
Follow me if you're into that sort of thing:

From guido at  Mon Oct 15 18:09:51 2012
From: guido at (Guido van Rossum)
Date: Mon, 15 Oct 2012 09:09:51 -0700
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 15, 2012 at 8:32 AM, Nick Coghlan <ncoghlan at> wrote:
> My preferred way of thinking of "yield from" is as a simple
> refactoring tool: "Gee, this generator is getting kind of long and
> unwieldy. I'll move this piece out into a separate generator, and use
> yield from to invoke it" or "Hmm, I keep using this same sequence of 3
> or 4 operations. I guess I'll move them out to a separate generator
> and use yield from to invoke it in the appropriate places".

In the NDB world you would say:

"Gee this _tasklet_ is getting kind of long and unwieldy. I'll move
this piece out into a separate _tasklet_, and use _yield_ to invoke
it."

Creating a tasklet is just writing a generator decorated with
@ndb.tasklet -- after using this a bit it becomes total second nature
(I've seen several coworkers pick it up effortlessly).

I'll have to digest your other points about yield vs. yield-from more
carefully -- on the one hand I think it would be cool if yield-from
could give us an even simpler paradigm to write async code than NDB's
version, and that expectation was one of my main reasons to push for
PEP 380's acceptance. On the other hand you bring up some good points
with the as_completed() example (though I have a feeling Greg will
easily sail around it :-).

PS. Unrelated, and please don't respond to this or at least change the
subject if you feel compelled: there seem to be a lot of bad names in
this field. Twisted uses adjectives as nouns (Twisted, Deferred, I
read about another one), "add_done_callback" is too longwinded,
"as_completed" brings absolutely no useful association with it..

--Guido van Rossum (

From guido at  Mon Oct 15 18:17:55 2012
From: guido at (Guido van Rossum)
Date: Mon, 15 Oct 2012 09:17:55 -0700
Subject: [Python-ideas] The async API of the future: Reactors
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 15, 2012 at 8:39 AM, Jasper St. Pierre
<jstpierre at> wrote:
> On Mon, Oct 15, 2012 at 11:33 AM, Guido van Rossum <guido at> wrote:
>> But you're still stuck with implementing the names that someone else
>> decided upon a decade ago... :-)
> And why is that a bad thing? I don't see the value in having something
> like: thing.set_data_received_callback(self.bake_some_eggs)

But I do, and you've pinpointed exactly my argument. My code is all
about baking an egg, and (from my POV) it's secondary that it's
invoked by the reactor when data is received.

> We're going to have to give *something* a name, eventually. Why not
> pick it at the most direct level?

Let the reactor pick *its* names (e.g. set_data_received_callback).
Then I can pick mine.
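
A minimal sketch of the two naming styles under discussion. The Transport class and its feed() method are hypothetical stand-ins for a reactor; only set_data_received_callback, bake_some_eggs and dataReceived come from the thread.

```python
class Transport:
    """Hypothetical reactor-side object; the reactor owns this method name."""

    def __init__(self):
        self._on_data = None

    def set_data_received_callback(self, callback):
        self._on_data = callback

    def feed(self, data):
        # Stand-in for the reactor noticing that bytes have arrived.
        if self._on_data is not None:
            self._on_data(data)


class Kitchen:
    """Application code; its method names describe what it does."""

    def __init__(self):
        self.eggs = []

    def bake_some_eggs(self, data):
        self.eggs.append(data.decode("ascii"))


class EggProtocol:
    """The fixed-name alternative: the framework dictates dataReceived."""

    def __init__(self):
        self.eggs = []

    def dataReceived(self, data):
        self.eggs.append(data.decode("ascii"))


# Callback-registration style: the application picks its own name.
kitchen = Kitchen()
transport = Transport()
transport.set_data_received_callback(kitchen.bake_some_eggs)
transport.feed(b"egg")

# Fixed-name style: a reactor would call the agreed-upon method directly.
protocol = EggProtocol()
protocol.dataReceived(b"egg")
```

Both objects end up holding the same data; the difference is only in who chooses the name of the method that receives it.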

--Guido van Rossum (

From guido at  Mon Oct 15 18:24:12 2012
From: guido at (Guido van Rossum)
Date: Mon, 15 Oct 2012 09:24:12 -0700
Subject: [Python-ideas] The async API of the future: Reactors
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 15, 2012 at 8:56 AM, Nick Coghlan <ncoghlan at> wrote:
> On Tue, Oct 16, 2012 at 1:33 AM, Guido van Rossum <guido at> wrote:
>> But you're still stuck with implementing the names that someone else
>> decided upon a decade ago... :-)
> There's a certain benefit to everyone using the same names and being
> able to read each others code, even when there's a (small?) risk of
> the names not aging well. Do we really want the first step in
> deciphering someone else's async code to be "OK, what did they call
> their connection and data processing callbacks?"?
> Twisted's IProtocol API is pretty simple:
> - makeConnection
> - connectionMade
> - dataReceived
> - connectionLost
> Everything else is up to the individual protocols (including whether
> or not they offer a "write" method)
> The transport and producer/consumer APIs aren't much more complicated
> (
> and make rather a lot of sense. The precise *shape* of those APIs are
> likely to be different in a generator based system, and I assume we'd
> want to lose the camel-case names, but standardising the terminology
> seems like a good idea.

I guess you see it as a template pattern, where everybody has to
implement the same state machine *somehow*. Like having to implement a
file-like object, or a mapping. I'm still convinced that the alternate
POV is just as valid in this case, but I'm going to let it rest
because it doesn't matter enough to me to keep arguing.

--Guido van Rossum (

From guido at  Mon Oct 15 18:25:53 2012
From: guido at (Guido van Rossum)
Date: Mon, 15 Oct 2012 09:25:53 -0700
Subject: [Python-ideas] Off-line most of the day
Message-ID: <>

I'm about to enter an intense all-day-long meeting at work, and won't
have time to keep up with email at all until late tonight. So have fun
discussing async APIs without me, and please stay on topic!

--Guido van Rossum (

From tismer at  Mon Oct 15 18:41:27 2012
From: tismer at (Christian Tismer)
Date: Mon, 15 Oct 2012 18:41:27 +0200
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On 15.10.12 16:46, Nick Coghlan wrote:
> On Mon, Oct 15, 2012 at 11:57 PM, Christian Tismer <tismer at> wrote:
>> Alternatively I could also imagine to write real generators or coroutines
>> as an extension module. It would use the same concept as generators,
>> internally. No big deal, not changing the interpreter, maybe adding a bit.
> Tangentially related, there are some patches [1,2] on the tracker
> looking to shuffle a few things related to generator state around to
> get them out of the frame objects and into the generator objects where
> they belong. There are definitely a few things that could do with
> cleaning up in this space.
> [1]
> [2]

Thanks for pointing me at that. I think Mark Shannon has quite similar
ideas. I need to talk to him.
>> I think this would make Greenlet and even Stackless obsolete in most
>> cases which are of real use.
> The "take this synchronous code and magically make it scale better"
> aspect is still a nice feature of greenlets & gevent.

I had a deeper look into gevent and how it uses greenlet and
does its monkey-patching. Indeed, cute!
My assumption was that I could write a surrogate greenlet
using the advanced generators.

But I overlooked that for this to work, everything must behave
like generators. Not only the surrogate greenlet, but also
the code that it wants to switch. Argh...

A work-around for gevent would be a rewrite of all supported
modules to patch. Not a cake walk.

Thanks, you gave me a lot of insight!
>> I would like to discuss this and maybe do a prototype.
> Sure, I think there's several things we can do better here, and I
> think the test suite is comprehensive enough to keep us honest.
> Cheers,
> Nick.

Cheers - Chris

Christian Tismer             :^)   <mailto:tismer at>
Software Consulting          :     Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121     :    *Starship*
14482 Potsdam                :     PGP key ->
phone +49 173 24 18 776  fax +49 (30) 700143-0023
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
       whom do you want to sponsor today?

From tismer at  Mon Oct 15 19:25:31 2012
From: tismer at (Christian Tismer)
Date: Mon, 15 Oct 2012 19:25:31 +0200
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On 15.10.12 15:57, Christian Tismer wrote:
> Right, CPython still keeps unnecessary crap on the C stack.
> But that's not the point right now, because on the other hand,
> in the context of a possible yield (from or not), the C stack
> is clean, and this enables switching.
> And actually in such clean positions, Stackless Python (as opposed to
> Greenlets) does soft-switching, which is very similar to what the 
> generators
> are doing - there is no assembly stuff involved at all.

I'm sorry about the expression "crap". Please read this as "stuff".

I was not aware of the unfriendliness of this word and will be more
careful next time.

cheers - Chris

Christian Tismer             :^)   <mailto:tismer at>
Software Consulting          :     Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121     :    *Starship*
14482 Potsdam                :     PGP key ->
phone +49 173 24 18 776  fax +49 (30) 700143-0023
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
       whom do you want to sponsor today?

From shibturn at  Mon Oct 15 19:25:16 2012
From: shibturn at (Richard Oudkerk)
Date: Mon, 15 Oct 2012 18:25:16 +0100
Subject: [Python-ideas] The async API of the future: Reactors
In-Reply-To: <>
References: <>
	<k5h21h$2dd$> <>
Message-ID: <k5hgu1$kqh$>

On 15/10/2012 4:54pm, Antoine Pitrou wrote:
> I suppose the reactor would handle higher-level stuff such as TLS?

Yes.  The hub would just cover the platform dependent IO stuff.


From dinov at  Mon Oct 15 19:24:16 2012
From: dinov at (Dino Viehland)
Date: Mon, 15 Oct 2012 17:24:16 +0000
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

I'm still catching up to this thread, but we've been investigating Win 8 support for Python and Win 8 has a very asynchronous API design and so we've been interested in much the same space.  We've actually come up with an example of the @task decorator (we called it @async) which is built around using yield + the ability to return from generators added in Python 3.3.  Our version of this is also based around futures so that an @async API will return a future.  The big difference here might be that we always return a future from a call rather than yielding it up the stack.  So our API works with just simple yields rather than yield froms.  This is what a simple usage of the API looks like:

        from concurrent.futures import ThreadPoolExecutor
        from urllib.request import urlopen
        executor = ThreadPoolExecutor(max_workers=5)

        def load_url(url):
            return urlopen(url).read()

        @async
        def get_image_async(url):
            buffer = yield executor.submit(load_url, url)
            return Image(buffer)

        def main(image_uri):
            img_future = get_image_async(image_uri)
            # perform other tasks while the image is downloading
            img = img_future.result()


This example is just using the existing thread pool to run the actual I/O but this will work with anything that will return a future.  So inside of an async method anything which is yielded should be a future.  The decorator will then attach a callback which will send the result of the future back into the generator, so the "buffer = " line gets the result of the future.  Finally the function completes and the future returned from calling get_image_async will have its value set to Image when the StopIteration exception is raised with the return value.
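
The mechanism in the paragraph above can be sketched roughly as follows. This is not the decorator from the linked source; it is a minimal illustration under stated assumptions. The name async_task is hypothetical (chosen because async is a reserved word in Python 3.7 and later), as are slow_square and sum_of_squares, and there is no notion of a context here, so the generator resumes on whichever thread completes the yielded future.

```python
import functools
from concurrent.futures import Future, ThreadPoolExecutor


def async_task(func):
    """Hypothetical stand-in for the @async decorator described above."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        outer = Future()               # returned to the caller immediately
        gen = func(*args, **kwargs)

        def step(send_value=None, exc=None):
            try:
                if exc is not None:
                    yielded = gen.throw(exc)
                else:
                    yielded = gen.send(send_value)
            except StopIteration as stop:
                outer.set_result(stop.value)    # "return x" inside the generator
            except BaseException as error:
                outer.set_exception(error)
            else:
                # Anything yielded is assumed to be a future.
                yielded.add_done_callback(resume)

        def resume(done_future):
            # Send the future's result (or its exception) back into the generator.
            error = done_future.exception()
            if error is not None:
                step(exc=error)
            else:
                step(done_future.result())

        step()
        return outer

    return wrapper


executor = ThreadPoolExecutor(max_workers=2)


def slow_square(x):
    return x * x


@async_task
def sum_of_squares(a, b):
    first = yield executor.submit(slow_square, a)
    second = yield executor.submit(slow_square, b)
    return first + second
```

A caller would write future = sum_of_squares(3, 4), carry on with other work, and call future.result() when the value is needed.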

Because we're interested in the GUI side of things here we've also wired this up into Tk so that we can experiment with an existing GUI framework, and I've included the source for the context there.  Our thinking here is that different contexts can be created depending upon the framework which you're running in and that the context makes sure the code is running in the right spot, in this case getting back to the GUI thread after an async operation has been completed.

The big outstanding item we're still working through is I/O, but we think the contexts help here too.  We're still not quite sure how polling I/O will work, but with the contexts if there's a single thread polling for I/O then the context will get us off the I/O thread and let the polling continue.  We are currently thinking that there will need to be a polling thread which handles all of the I/Os, and there could potentially be more than one of these if different libraries aren't cooperating on sharing a single thread.

Here's the code plus the demo Tk app (you'll need your own Holmes.txt file for the sample app to run):
Tk context:
Tk app:

-----Original Message-----
From: Python-ideas [ at] On Behalf Of Calvin Spealman
Sent: Monday, October 15, 2012 7:16 AM
To: Nick Coghlan
Cc: python-ideas at
Subject: Re: [Python-ideas] The async API of the future: yield-from

On Mon, Oct 15, 2012 at 9:48 AM, Nick Coghlan <ncoghlan at> wrote:
> On Mon, Oct 15, 2012 at 10:31 PM, Calvin Spealman <ironfroggy at> wrote:
>> Currently, "with yield expr:" is not valid syntax, surprisingly.
> It's not that surprising, it's the general requirement that yield 
> expressions must be enclosed in parentheses except when used 
> standalone or in a simple assignment statement.
> "with (yield expr):" is valid syntax though, so I'm reluctant to 
> endorse doing anything substantially different if the parentheses are 
> omitted.

Silly oversight on my part, and I agree that the parens shouldn't make the difference in meaning.

> I think the combination of "yield from" to delegate control (including 
> exception handling) completely to a subgenerator and "context manager
> + for loop + explicit yield" when an operation needs to yield multiple
> times and the exception handling behaviour should be left to the 
> caller (as in the "as_completed" case) should cover the necessary 
> behaviours.

I'm still -1 on delegating control to subgenerators with yield-from, versus having the scheduler just deal with them directly.  I think it is far less flexible.

I would still like to see a less confusing "with yield expr:" by simply allowing it without parens, but no special meaning. I think it would be really useful in coroutines.

with yield collect() as tasks:
  yield task1()
  yield task2()
results = yield tasks

> Cheers,
> Nick.
> --
> Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

Read my blog! I depend on your acceptance of my opinion! I am interesting!
Follow me if you're into that sort of thing: _______________________________________________
Python-ideas mailing list
Python-ideas at

From glyph at  Mon Oct 15 20:08:41 2012
From: glyph at (Glyph)
Date: Mon, 15 Oct 2012 11:08:41 -0700
Subject: [Python-ideas] Expressiveness of coroutines versus Deferred
	callbacks (or possibly promises, futures)
Message-ID: <>

Still working my way through zillions of messages on this thread, trying to find things worth responding to, I found this, from Guido:

> [Generators are] more flexible [than Deferreds], since it is easier to catch different exceptions at different points (...) In the past, when I pointed this out to Twisted aficionados, the responses usually were a mix of "sure, if you like that style, we got it covered, Twisted has inlineCallbacks," and "but that only works for the simple cases, for the real stuff you still need Deferreds." But that really sounds to me like Twisted people just liking what they've got and not wanting to change.

If you were actually paying attention, we did explain what "the real stuff" is, and why you can't do it with inlineCallbacks. ;-)

(Or perhaps I should say, why we prefer to do it with Deferreds explicitly.)

Managing parallelism is easy with the when-this-then-that idiom of Deferreds, but challenging with the sequential this-then-this-then-this idiom of generators.  The examples in the quoted message were all sequential workflows, which are roughly equivalent in both styles.  As soon as a for loop gets involved though, yield-based coroutines have a harder time expressing the kind of parallelism that a lot of applications should use, so it's easy to become accidentally sequential (and therefore less responsive) even if you don't need to be.  For example, using some hypothetical generator coroutine library, the idiomatic expression of a loop across several request/responses would be something like this:

def something_async():
    values = yield step1()
    results = set()
    for value in values:
        results.add(step3((yield step2(value))))

Since it's in a set, the order of 'results' doesn't actually matter; but this code needs to sit and wait for each result to come back in order; it can't perform any processing on the ones that are already ready while it's waiting.  You express this with Deferreds:

def something_deferred():
    return step1().addCallback(
        lambda values: gatherResults([step2(value).addCallback(step3)
                                      for value in values])).addCallback(set)

In addition to being a roughly equivalent amount of code (fewer lines, but denser), that will run step2() and step3() on demand, as results are ready from the set of Deferreds from step1.  That means that your program will automatically spread out its computation, which makes better use of time as results may be arriving in any order.
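
For readers without Twisted at hand, the same sequential-versus-as-results-arrive contrast can be sketched with the standard library's thread pool. This is an analogue, not Deferreds: step1, step2 and step3 below are placeholder functions invented for the example, and threads stand in for the event loop.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def step1():
    return [1, 2, 3]


def step2(value):
    return value * 10


def step3(value):
    return value + 1


def something_sequential():
    # Mirrors the generator version: each step2 call finishes
    # before the next one is even started.
    return {step3(step2(value)) for value in step1()}


def something_concurrent(executor):
    # All step2 calls are in flight at once; step3 is applied to
    # whichever result arrives first.
    futures = [executor.submit(step2, value) for value in step1()]
    return {step3(future.result()) for future in as_completed(futures)}
```

Both functions build the same set; the second one simply does not care in which order the step2 results come back.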

The problem is that it is difficult to express laziness with generator coroutines: you've already spent the generator-ness on the function on responding to events, so there's no longer any syntactic support for laziness.

(There's another problem where sometimes you can determine that work needs to be done as it arrives; that's an even trickier abstraction than Deferreds though and I'm still working on it. I think I've mentioned <> already in one of my previous posts.)

Also, this is not at all a hypothetical or academic example.  This pattern comes up all the time in e.g. web-spidering and chat applications.

To be fair, you could express this in a generator-coroutine library like this:

def something_async():
    values = yield step1()
    thunks = []
    def do_steps(value):
        return_(step3((yield step2(value))))
    for value in values:
        thunks.append(do_steps(value))
    return_(set((yield multi_wait(thunks))))

but that seems bizarre and not very idiomatic; to me, it looks like the confusing aspects of both styles.

David Reid also wrote up some examples of how Deferreds can express sequential workflows more nicely as well (also indirectly as a response to Guido!) on his blog, here: <>.

> Which I understand -- I don't want to change either. But I also observe that a lot of people find bare Twisted-with-Deferreds too hard to grok, so they use Tornado instead, or they build a layer on top of either (like Monocle),

inlineCallbacks (and the even-earlier deferredGenerator) predates Monocle.  That's not to say Monocle has no value; it is a portability layer between Twisted and Tornado that does the same thing inlineCallbacks does but allows you to do it even if you're not using Deferreds, which will surely be useful to some people.

I don't want to belabor this point, but it bugs me a little bit that we get so much feedback from the broader Python community along the lines of "Why doesn't Twisted do X?  I'd use it if it did X, but it's all weird and I don't understand Y that it forces me to do instead, that's why I use Z" when, in fact:

- Twisted does do X
- It's done X for years
- It actually invented X in the first place
- There are legitimate reasons why we (Twisted core developers) suggest and prefer Y for many cases, but you don't need to do it if you don't want to follow our advice
- Thing Z that is being cited as doing X actually explicitly mentions Twisted as an inspiration for its implementation of X

It's fair, of course, to complain that we haven't explained this very well, and I'll cop to that unless I can immediately respond with a pre-existing URL that explains things :).

One other comment that's probably worth responding to:

> I suppose on systems that support both networking and GUI events, in my design these would use different I/O objects (created using different platform-specific factories) and the shared reactor API would sort things out based on the type of I/O object passed in to it.

In my opinion, it is a mistake to try to harmonize or unify all GUI event systems, unless you are also harmonizing the GUI itself (i.e. writing a totally portable GUI toolkit that does everything).  And I think we can all agree that writing a totally portable GUI toolkit is an impossibly huge task that is out of scope for this (or, really, any other) discussion.  GUI systems can already dispatch their events to user code just fine - interposing a Python reactor API between the GUI and the event registration adds unnecessary work, and may not even be possible in some cases.  See, for example, the approach that Xcode (formerly Interface Builder) and the Glade interface designer use: the name of the event handler is registered inside a somewhat opaque blob, which is data and not code, and then hooked up automatically at runtime based on reflection.  The code itself never calls any event-registration APIs.

Also, modeling all GUI interaction as a request/response conversation is limiting and leads to bad UI conventions.  Consider: the UI element that most readily corresponds to a request/response is a modal dialog box.  Does anyone out there really like applications that consist mainly of popping up dialog after dialog to prompt you for the answers to questions?



From mikegraham at  Mon Oct 15 21:12:15 2012
From: mikegraham at (Mike Graham)
Date: Mon, 15 Oct 2012 15:12:15 -0400
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 14, 2012 at 3:57 PM, Mike Meyer <mwm at> wrote:
> On Sun, 14 Oct 2012 07:40:57 +0200
> Yuval Greenfield <ubershmekel at> wrote:
>> On Sun, Oct 14, 2012 at 2:04 AM, MRAB <python at> wrote:
>> > If it's more than one codepoint, we could prefix with the length of the
>> > codepoint's name:
>> >
>> > def __12CIRCLED_PLUS__(x, y):
>> >     ...
>> >
>> >
>> That's a bit impractical, and why reinvent the wheel? I'd much rather:
>> def \u2295(x, y):
>>     ....
>> So readable I want to read it twice. And that's not legal python today so
>> we don't break backwards compatibility!
> Yes, but we're defining an operator for instances of the class, so it
> needs the 'special' method marking:
> def __\u2295__(self, other):
> Now *that's* pretty!
>     <mike

IMO it's essential that we add source code escapes. Imagine the
one-liners this will allow!

    def f(xs):\n\ttry:\n\t\treturn x.pop()\n\texcept ValueError\n\t\treturn None

Can we get this fix applied in Python 2.2 and up?


From guido at  Mon Oct 15 21:33:41 2012
From: guido at (Guido van Rossum)
Date: Mon, 15 Oct 2012 12:33:41 -0700
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

Wow, sounds very similar to NDB's approach! Please do check out NDB's
tasklets and event loop:

On Mon, Oct 15, 2012 at 10:24 AM, Dino Viehland <dinov at> wrote:
> I'm still catching up to this thread, but we've been investigating Win 8 support for Python and Win 8 has a very asynchronous API design and so we've been interested in much the same space.  We've actually come up with an example of the @task decorator (we called it @async) which is built around using yield + the ability to return from generators added in Python 3.3.  Our version of this is also based around futures so that an @async API will return a future.  The big difference here might be that we always return a future from a call rather than yielding it up the stack.  So our API works with just simple yields rather than yield froms.  This is what a simple usage of the API looks like:
>         from concurrent.futures import ThreadPoolExecutor
>         from urllib.request import urlopen
>         executor = ThreadPoolExecutor(max_workers=5)
>         def load_url(url):
>             return urlopen(url).read()
>         @async
>         def get_image_async(url):
>             buffer = yield executor.submit(load_url, url)
>             return Image(buffer)
>         def main(image_uri):
>             img_future = get_image_async(image_uri)
>             # perform other tasks while the image is downloading
>             img = img_future.result()
>         main("")
> This example is just using the existing thread pool to run the actual I/O but this will work with anything that will return a future.  So inside of an async method anything which is yielded should be a future.  The decorator will then attach a callback which will send the result of the future back into the generator, so the "buffer = " line gets the result of the future.  Finally the function completes and the future returned from calling get_image_async will have its value set to Image when the StopIteration exception is raised with the return value.
> Because we're interested in the GUI side of things here we've also wired this up into Tk so that we can experiment with an existing GUI framework, and I've included the source for the context there.  Our thinking here is that different contexts can be created depending upon the framework which you're running in and that the context makes sure the code is running in the right spot, in this case getting back to the GUI thread after an async operation has been completed.
> The big outstanding item we're still working through is I/O, but we think the contexts help here too.  We're still not quite sure how polling I/O will work, but with the contexts if there's a single thread polling for I/O then the context will get us off the I/O thread and let the polling continue.  We are currently thinking that there will need to be a polling thread which handles all of the I/Os, and there could potentially be more than one of these if different libraries aren't cooperating on sharing a single thread.
> Here's the code plus the demo Tk app (you'll need your own Holmes.txt file for the sample app to run):
> Tk context:
> Tk app:
> -----Original Message-----
> From: Python-ideas [ at] On Behalf Of Calvin Spealman
> Sent: Monday, October 15, 2012 7:16 AM
> To: Nick Coghlan
> Cc: python-ideas at
> Subject: Re: [Python-ideas] The async API of the future: yield-from
> On Mon, Oct 15, 2012 at 9:48 AM, Nick Coghlan <ncoghlan at> wrote:
>> On Mon, Oct 15, 2012 at 10:31 PM, Calvin Spealman <ironfroggy at> wrote:
>>> Currently, "with yield expr:" is not valid syntax, surprisingly.
>> It's not that surprising, it's the general requirement that yield
>> expressions must be enclosed in parentheses except when used
>> standalone or in a simple assignment statement.
>> "with (yield expr):" is valid syntax though, so I'm reluctant to
>> endorse doing anything substantially different if the parentheses are
>> omitted.
> Silly oversight on my part, and I agree that the parens shouldn't make the difference in meaning.
>> I think the combination of "yield from" to delegate control (including
>> exception handling) completely to a subgenerator and "context manager
>> + for loop + explicit yield" when an operation needs to yield multiple
>> times and the exception handling behaviour should be left to the
>> caller (as in the "as_completed" case) should cover the necessary
>> behaviours.
> I'm still -1 on delegating control to subgenerators with yield-from, versus having the scheduler just deal with them directly.  I think it is far less flexible.
> I would still like to see a less confusing "with yield expr:" by simply allowing it without parens, but no special meaning. I think it would be really useful in coroutines.
> with yield collect() as tasks:
>   yield task1()
>   yield task2()
> results = yield tasks
>> Cheers,
>> Nick.
>> --
>> Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia
> --
> Read my blog! I depend on your acceptance of my opinion! I am interesting!
> Follow me if you're into that sort of thing: _______________________________________________
> Python-ideas mailing list
> Python-ideas at
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at

--Guido van Rossum (

From jstpierre at  Mon Oct 15 21:37:32 2012
From: jstpierre at (Jasper St. Pierre)
Date: Mon, 15 Oct 2012 15:37:32 -0400
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 15, 2012 at 3:12 PM, Mike Graham <mikegraham at> wrote:
>> def __\u2295__(self, other):
>> Now *that's* pretty!
>>     <mike
> IMO it's essential that we add source code escapes. Imagine the
> one-liners this will allow!
>     def f(xs):\n\ttry:\n\t\treturn x.pop()\n\texcept ValueError\n\t\treturn None
> Can we get this fix applied in Python 2.2 and up?

Yeah, this is how Java works, and it's one of the best features of the
language, because any valid program can be expressed using ASCII only.

Of course, it means that there are going to be some edge cases. Like, now:

    print "\n"

will be an invalid program, since the newline escape will be
translated before the source is tokenized. But who does that? It's
just a small price to pay for the big wins of having any program
expressed in simple ASCII.

> Mike
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at


From jsbueno at  Mon Oct 15 22:00:27 2012
From: jsbueno at (Joao S. O. Bueno)
Date: Mon, 15 Oct 2012 17:00:27 -0300
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On 15 October 2012 16:12, Mike Graham <mikegraham at> wrote:
> On Sun, Oct 14, 2012 at 3:57 PM, Mike Meyer <mwm at> wrote:
>> On Sun, 14 Oct 2012 07:40:57 +0200
>> Yuval Greenfield <ubershmekel at> wrote:
>>> On Sun, Oct 14, 2012 at 2:04 AM, MRAB <python at> wrote:
>>> > If it's more than one codepoint, we could prefix with the length of the
>>> > codepoint's name:
>>> >
>>> > def __12CIRCLED_PLUS__(x, y):
>>> >     ...
>>> >
>>> >
>>> That's a bit impractical, and why reinvent the wheel? I'd much rather:
>>> def \u2295(x, y):
>>>     ....
>>> So readable I want to read it twice. And that's not legal python today so
>>> we don't break backwards compatibility!
>> Yes, but we're defining an operator for instances of the class, so it
>> needs the 'special' method marking:
>> def __\u2295__(self, other):
>> Now *that's* pretty!
>>     <mike
> IMO it's essential that we add source code escapes. Imagine the
> one-liners this will allow!
>     def f(xs):\n\ttry:\n\t\treturn x.pop()\n\texcept ValueError\n\t\treturn None
> Can we get this fix applied in Python 2.2 and up?

" The time machine strikes again!"
What you want is _valid_ in Python, likely since 2.2 -
 You will need at least two lines in the file:
# coding:unicode_escape\n
def a():\n\tprint "Helo World"\n\na()

> Mike
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at

From python at  Mon Oct 15 22:14:17 2012
From: python at (MRAB)
Date: Mon, 15 Oct 2012 21:14:17 +0100
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-15 20:37, Jasper St. Pierre wrote:
> On Mon, Oct 15, 2012 at 3:12 PM, Mike Graham <mikegraham at> wrote:
>>> def __\u2295__(self, other):
>>> Now *that's* pretty!
>>>     <mike
>> IMO it's essential that we add source code escapes. Imagine the
>> one-liners this will allow!
>>     def f(xs):\n\ttry:\n\t\treturn x.pop()\n\texcept ValueError\n\t\treturn None
>> Can we get this fix applied in Python 2.2 and up?
> Yeah, this is how Java works, and it's one of the best features of the
> language, because any valid program can be expressed using ASCII only.
> Of course, it means that there are going to be some edge cases. Like, now:
>      print "\n"
> will be an invalid program, since the newline escape will be
> translated before the source is tokenized. But who does that? It's
> just a small price to pay for the big wins of having any program
> expressed in simple ASCII.

     print "\\n"
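
The escape-before-tokenizing behaviour joked about in this subthread can be tried out by decoding the source by hand. This sketch does not rely on the coding declaration mentioned earlier; it applies the unicode_escape codec explicitly and then compiles the result. The helper names are made up for the example.

```python
def decode_escaped_source(raw):
    """Turn backslash escapes in raw source bytes into real characters."""
    return raw.decode("unicode_escape")


def compiles(raw):
    """Report whether the decoded source is syntactically valid Python."""
    try:
        compile(decode_escaped_source(raw), "<escaped>", "exec")
    except SyntaxError:
        return False
    return True


# A "one-liner" whose newlines and tabs are written as escapes.
one_liner = rb'def a():\n\treturn "Hello World"\n'
namespace = {}
exec(decode_escaped_source(one_liner), namespace)

# The edge case: an escape inside a string literal becomes a real
# newline before the tokenizer sees it, which breaks the literal.
broken = rb'x = "\n"'

# Doubling the backslash survives the first round of decoding.
fixed = rb'x = "\\n"'
```

Here namespace["a"]() returns the greeting, compiles(broken) is False, and compiles(fixed) is True, which is exactly the price described above.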

From greg.ewing at  Mon Oct 15 22:14:47 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 16 Oct 2012 09:14:47 +1300
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

Nick Coghlan wrote:

> The main primitive I personally want out of an async API is a
> task-based equivalent to concurrent.futures.as_completed() [1]. This
> is what I meant about iteration being a bit of a mess: the way the
> as_completed() works, the suspend/resume channel of the iterator
> protocol is being used to pass completed future objects back to the
> calling iterator. That means that channel *can't* be used to talk
> between the coroutine and the scheduler,

I had to read this a couple of times before I figured out
what you're talking about, but I get it now.

This is an instance of a general problem that was noticed
back when I was discussing my cofunctions idea: using
generator-based coroutines, it's not possible to have a
"suspendable iterator", because that would require "yield"
to have two conflicting meanings: "suspend this coroutine"
on one hand, and "provide a value to my caller" on the other.

Unfortunately, I suspect that a truly elegant solution to this
problem will require yet another language addition -- something like

    yield for item in subtask():

which would run a slightly different version of the iterator
protocol in which values to be yielded are wrapped somehow
(I haven't figured out all the details yet).
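
To make the conflict concrete, here is a minimal sketch in current
Python. It is not the hypothetical "yield for" syntax above; the names
SUSPEND, Value, numbers and drive are invented purely for illustration.
Because a single yield channel has to carry both meanings, the items are
wrapped so that the driver can tell them apart from suspension requests:

```python
# Hypothetical markers, not part of any real library.
SUSPEND = object()          # means "suspend this coroutine"


class Value:
    """Wrapper meaning "provide this item to my caller"."""

    def __init__(self, item):
        self.item = item


def numbers(n):
    """A "suspendable iterator": it both suspends and produces items."""
    for i in range(n):
        yield SUSPEND       # pretend we are waiting for I/O here
        yield Value(i)      # hand an item to the consumer


def drive(gen):
    """Toy driver that separates the two kinds of yield."""
    items, suspensions = [], 0
    for signal in gen:
        if isinstance(signal, Value):
            items.append(signal.item)
        else:
            suspensions += 1    # a real scheduler would switch tasks here
    return items, suspensions
```

The wrapping is exactly the awkward part: every consumer has to know
about it, which is why dedicated syntax might be cleaner.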


From greg.ewing at  Mon Oct 15 22:19:58 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 16 Oct 2012 09:19:58 +1300
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

Christian Tismer wrote:

> Question: Is it already given that something like greenlets is out
> of consideration?

Greenlets will always be available to those who want and
are able to use them.

But there's a desire to have something in the standard library
that is completely portable and doesn't rely on any platform
dependent techniques or tricks. That's what we're talking
about here.


From jimjjewett at  Mon Oct 15 22:21:50 2012
From: jimjjewett at (Jim Jewett)
Date: Mon, 15 Oct 2012 16:21:50 -0400
Subject: [Python-ideas] filename comparison [was] Re: PEP 428 -
 object-oriented filesystem paths
Message-ID: <>

On 10/8/12, Greg Ewing <greg.ewing at> wrote:
> Ronald Oussoren wrote:
>> neither statvs, statvfs,  nor pathconf seem to be able to tell if a
>> filesystem is case insensitive.

> Even if they could, you wouldn't be entirely out of the woods,
> because different parts of the same path can be on different
> file systems...

> But how important is all this anyway? I'm trying to think of
> occasions when I've wanted to compare two entire paths for
> equality, and I can't think of *any*.

I can think of several, but when I thought a bit harder, they were
mostly bug attractors.

If I want my program (or a dict) to know that "CONFIG" and "config"
are the same, then I also want it to know that "My Documents" is the
same as "MYDOCU~1".*

Ideally, I would also have a way to find out that a pathname is likely
to be problematic for cross-platform uses, or at least whether two
specific pathnames are known to be collision-prone on existing
platforms other than mine.  (But I'm not sure that sort of test can be
reliable enough for the stdlib.  Would it just check for caseless
equality, reserved Windows names, and non-alphanumeric characters in
the filename?)

*(Well, assuming it is.  The short name depends on the history of the


From phd at  Mon Oct 15 21:23:29 2012
From: phd at (Oleg Broytman)
Date: Mon, 15 Oct 2012 23:23:29 +0400
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 15, 2012 at 03:12:15PM -0400, Mike Graham <mikegraham at> wrote:
> IMO it's essential that we add source code escapes. Imagine the
> one-liners this will allow!
>     def f(xs):\n\ttry:\n\t\treturn x.pop()\n\texcept ValueError\n\t\treturn None

   SyntaxError: a semicolon required after 'except ValueError'.

     Oleg Broytman              phd at
           Programmers don't die, they just GOSUB without RETURN.

From dinov at  Mon Oct 15 23:45:13 2012
From: dinov at (Dino Viehland)
Date: Mon, 15 Oct 2012 21:45:13 +0000
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

They look remarkably similar.  The biggest difference I see is that NDB appears to be using an event loop to keep the futures running while we're using add_done_callback (on the yielded futures) to continue stepping the generator function along.  So there's not necessarily an event loop in our case, and in fact the default context always just executes things synchronously.  But frameworks can replace the default context so that work is posted into an event loop of some form.
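
A rough sketch of the add_done_callback approach, for readers who want to see the stepping spelled out. This is not the actual decorator from the code linked in the quoted message; the name async_task is made up (async is a reserved word in current Python), contexts are left out entirely, and cancelled futures are not handled.

```python
import functools
from concurrent.futures import Future, ThreadPoolExecutor


def async_task(func):
    """Hypothetical stand-in for the @async decorator described here."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result_future = Future()      # what the caller gets back
        gen = func(*args, **kwargs)   # func must be a generator function

        def step(value=None, exc=None):
            # Advance the generator by one yield.
            try:
                if exc is not None:
                    yielded = gen.throw(exc)
                else:
                    yielded = gen.send(value)
            except StopIteration as stop:
                # "return x" inside the generator ends up in stop.value.
                result_future.set_result(stop.value)
                return
            except Exception as err:
                result_future.set_exception(err)
                return
            # Everything yielded is assumed to be a future.
            yielded.add_done_callback(resume)

        def resume(done):
            # Runs when the yielded future finishes (possibly on a worker thread).
            err = done.exception()
            if err is not None:
                step(exc=err)
            else:
                step(done.result())

        step()
        return result_future

    return wrapper


executor = ThreadPoolExecutor(max_workers=2)


@async_task
def add_async(a, b):
    x = yield executor.submit(lambda: a + 1)
    y = yield executor.submit(lambda: b + 1)
    return x + y


total = add_async(1, 2).result(timeout=5)
```

Note that no event loop appears anywhere: each completed future pushes the generator forward from whichever thread completed it, which is the gap a framework-specific context would fill.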

-----Original Message-----
From: gvanrossum at [mailto:gvanrossum at] On Behalf Of Guido van Rossum
Sent: Monday, October 15, 2012 12:34 PM
To: Dino Viehland
Cc: ironfroggy at; Nick Coghlan; python-ideas at
Subject: Re: [Python-ideas] The async API of the future: yield-from

Wow, sounds very similar to NDB's approach! Please do check out NDB's tasklets and event loop:

On Mon, Oct 15, 2012 at 10:24 AM, Dino Viehland <dinov at> wrote:
> I'm still catching up to this thread, but we've been investigating Win 8 support for Python and Win 8 has a very asynchronous API design and so we've been interested in much the same space.  We've actually come up with an example of the @task decorator (we called it @async) which is built around using yield + the ability to return from generators added in Python 3.3.  Our version of this is also based around futures so that an @async API will return a future.  The big difference here might be that we always return a future from a call rather than yielding it up the stack.  So our API works with just simple yields rather than yield froms.  This is what a simple usage of the API looks like:
>         from concurrent.futures import ThreadPoolExecutor
>         from urllib.request import urlopen
>         executor = ThreadPoolExecutor(max_workers=5)
>         def load_url(url):
>             return urlopen(url).read()
>         @async
>         def get_image_async(url):
>             buffer = yield executor.submit(load_url, url)
>             return Image(buffer)
>         def main(image_uri):
>             img_future = get_image_async(image_uri)
>             # perform other tasks while the image is downloading
>             img = img_future.result()
>         main("")
> This example is just using the existing thread pool to run the actual I/O but this will work with anything that will return a future.  So inside of an async method anything which is yielded should be a future.  The decorator will then attach a callback which will send the result of the future back into the generator, so the "buffer = " line gets the result of the future.  Finally the function completes and the future returned from calling get_image_async will have its value set to Image when the StopIteration exception is raised with the return value.
> Because we're interested in the GUI side of things here we've also wired this up into Tk so that we can experiment with an existing GUI framework, and I've included the source for the context there.  Our thinking here is that different contexts can be created depending upon the framework which you're running in and that the context makes sure the code is running in the right spot, in this case getting back to the GUI thread after an async operation has been completed.
> The big outstanding item we're still working through is I/O, but we think the contexts help here too.  We're still not quite sure how polling I/O will work, but with the contexts if there's a single thread polling for I/O then the context will get us off the I/O thread and let the polling continue.  We are currently thinking that there will need to be a polling thread which handles all of the I/Os, and there could potentially be more than one of these if different libraries aren't cooperating on sharing a single thread.
> Here's the code plus the demo Tk app (you'll need your own Holmes.txt file for the sample app to run):
> Tk context: 
> Tk app:
> -----Original Message-----
> From: Python-ideas 
> [ at] On Behalf 
> Of Calvin Spealman
> Sent: Monday, October 15, 2012 7:16 AM
> To: Nick Coghlan
> Cc: python-ideas at
> Subject: Re: [Python-ideas] The async API of the future: yield-from
> On Mon, Oct 15, 2012 at 9:48 AM, Nick Coghlan <ncoghlan at> wrote:
>> On Mon, Oct 15, 2012 at 10:31 PM, Calvin Spealman <ironfroggy at> wrote:
>>> Currently, "with yield expr:" is not valid syntax, surprisingly.
>> It's not that surprising, it's the general requirement that yield 
>> expressions must be enclosed in parentheses except when used 
>> standalone or in a simple assignment statement.
>> "with (yield expr):" is valid syntax though, so I'm reluctant to 
>> endorse doing anything substantially different if the parentheses are 
>> omitted.
> Silly oversight on my part, and I agree that the parens shouldn't make the difference in meaning.
>> I think the combination of "yield from" to delegate control
>> (including exception handling) completely to a subgenerator and
>> "context manager + for loop + explicit yield" when an operation needs
>> to yield multiple times and the exception handling behaviour should be
>> left to the caller (as in the "as_completed" case) should cover the
>> necessary behaviours.
> I'm still -1 on delegating control to subgenerators with yield-from, versus having the scheduler just deal with them directly.  I think it is far less flexible.
> I would still like to see a less confusing "with yield expr:" by simply allowing it without parens, but no special meaning. I think it would be really useful in coroutines.
> with yield collect() as tasks:
>   yield task1()
>   yield task2()
> results = yield tasks
>> Cheers,
>> Nick.
>> --
>> Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia
> --
> Read my blog! I depend on your acceptance of my opinion! I am interesting!
> Follow me if you're into that sort of thing: 
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at

--Guido van Rossum (

From mikegraham at  Tue Oct 16 00:06:47 2012
From: mikegraham at (Mike Graham)
Date: Mon, 15 Oct 2012 18:06:47 -0400
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 15, 2012 at 3:23 PM, Oleg Broytman <phd at> wrote:
> On Mon, Oct 15, 2012 at 03:12:15PM -0400, Mike Graham <mikegraham at> wrote:
>> IMO it's essential that we add source code escapes. Imagine the
>> one-liners this will allow!
>>     def f(xs):\n\ttry:\n\t\treturn x.pop()\n\texcept ValueError\n\t\treturn None
>    SyntaxError: a semicolon required after 'except ValueError'.
> Oleg.

Obviously we'd make those pesky semicolons optional in the process.


From anacrolix at  Tue Oct 16 01:37:58 2012
From: anacrolix at (Matt Joiner)
Date: Tue, 16 Oct 2012 10:37:58 +1100
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

I gave something like this a go a while ago:

"Coroutines" yield events or futures as Nick put them from the top, and the
scheduler at the bottom manages events and scheduling.

There are a few things I took away from this attempt:

1) Explicit yield a la PEP380 requires syntactical changes *everywhere*.

2) Python's dynamic typing means that neglecting to "yield from" gives you
broken code, and Python won't help you here. Add to this that you now have
a 380, and "normal synchronous" form of most interfaces and the caller must
know what kind is used at all times.

3) Concurrency is nice, but it requires language-level support, and proper
parallelism to really shine. The "C way" of doing things is already so
heavily ingrained in Python, an entirely new standard library and
interpreter that breaks C compatibility is really the only way to proceed,
and this certainly isn't worth it just to write code with "yield from"
littered on every line.

From greg.ewing at  Tue Oct 16 02:13:58 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 16 Oct 2012 13:13:58 +1300
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

Nick Coghlan wrote:
> To me, "yield from" is just a tool that brings generators back to
> parity with functions when it comes to breaking up a larger algorithm
> into smaller pieces. Where you would break a function out into
> subfunctions and call them normally, with a generator you can break
> out subgenerators and invoke them with yield from.

That's exactly correct. It's the way I intended "yield from"
to be thought of right from the beginning.

What I'm arguing is that the *public api* for any suspendable
operation should be in the form of something called using
yield-from, because you never know when the implementation
might want to break it down into sub-operations and use
yield-from to call *them*.

> Any meaningful use of "yield from" in the coroutine context *has* to
> ultimately devolve to an operation that:
> 1. Asks the scheduler to schedule another operation
> 2. Waits for that operation to complete

I don't think I would put it quite that way. In my view
of things at least, the scheduler doesn't schedule "operations"
(in the sense of "read some bytes from this socket" etc.)
Rather, it schedules the running of tasks.

So the breakdown is really:

1. Start an operation (this doesn't involve the scheduler)
2. Ask the scheduler to suspend this task until the
    operation is finished

Also, this breakdown is only necessary at the very lowest
level, where you want to do something that isn't provided
in the form of a generator.

Obviously it's *possible* to treat each level of the call
chain as its own subtask, that you spawn independently and
then wait for it to finish. That's what people have done
in the past with their trampoline schedulers that interpret
yielded "call" and "return" instructions.

But one of the purposes of yield-from is to relieve the
scheduler of the need to handle things at that level of
granularity. It can treat a generator together with all
the subgenerators it might call as a *single* task, the
same way that a greenlet is thought of as a single task,
however many levels of function calls it might make.
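
A minimal sketch of what that looks like, assuming a toy round-robin
scheduler (run, task and sub_operation are illustrative names, not my
actual scheduler). The scheduler only ever sees the top-level
generators; a bare yield anywhere in the call chain suspends the whole
task:

```python
from collections import deque


def sub_operation(log, name):
    log.append(f"{name}: sub start")
    yield                      # bare yield: "suspend this task"
    log.append(f"{name}: sub end")
    return name.upper()


def task(log, name):
    # The subgenerator is called, not scheduled; its yield passes
    # straight through to the scheduler.
    result = yield from sub_operation(log, name)
    log.append(f"{name}: got {result}")


def run(tasks):
    """Round-robin over top-level generators until all are finished."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)      # run until the next yield at any depth
        except StopIteration:
            continue           # task finished, drop it
        ready.append(current)


log = []
run([task(log, "a"), task(log, "b")])
```

The scheduler never learns that sub_operation exists, and the return
value travels back to task() without any help from it.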

> I *thought* Greg's way combined step 1 and step 2 into a single
> operation: the objects you yield *not only* say what you want to wait
> for, but also what you want to do.

I don't actually yield objects at all, but...

> However, his example par()
> implementation killed that idea, since it turned out to need to
> schedule tasks explicitly rather than there being an "execute this in
> parallel" option.

I don't see how that's a problem. Seems to me it's just as
easy for the user to call a par() function as it is to yield
a tuple of tasks. And providing this functionality using a
function means that different versions or options can be
made available for variations such as different ways of
handling exceptions. Using yield, you need to pick one of
the variations and bless it as being the one that you
invoke using special syntax.

If you're complaining that the implementation of par()
seems too complex, well, that complexity has to occur
*somewhere* -- if it's not in the par() function, then
it will turn up inside whatever part of the scheduler
handles the case that it's given a tuple of tasks.

> So now I'm back to thinking that Greg and Guido are talking about
> different levels. *Any* scheduling option will be able to be collapsed
> into an async task invoked by "yield from" by writing:
>     def simple_async_task():
>         return (yield start_task())

Yes... or another implementation that works some way
other than yielding instructions to the scheduler.

> I haven't seen anything to suggest that
> "yield from"'s role should change from what it is in 3.3: a way to
> factor out generators into multiple pieces without breaking send()
> and throw().

I don't think anyone is suggesting that. I'm certainly not.


From steve at  Tue Oct 16 02:17:52 2012
From: steve at (Steven D'Aprano)
Date: Tue, 16 Oct 2012 11:17:52 +1100
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

Deliberately not naming names, 'cos this isn't intended as a personal
attack on anyone...

Some people suggested as syntax:

>>>> def __12CIRCLED_PLUS__(x, y):
>>>>      ...

>>> def \u2295(x, y):
>>>      ....

>> def __\u2295__(self, other):

> IMO it's essential that we add source code escapes. Imagine the
> one-liners this will allow!
>      def f(xs):\n\ttry:\n\t\treturn x.pop()\n\texcept ValueError\n\t\treturn None
> Can we get this fix applied in Python 2.2 and up?

As much as you've all been wetting yourselves from all the hilarity, I'm
afraid that I have to ask you all to stop. Competing to see who can
come up with the worst possible joke syntax gets *real old* fast.

Sorry to be a wet blanket spoiling the fun, but this list does
have a serious purpose, and it seems to me that sarcastically[1]
inventing deliberately awful syntax is off-topic. Or at least

Now I enjoy reading the occasional piece of obfuscated code or
syntax as much as the next guy, but there are limits, and I think
this thread passed them about a dozen posts back.

Believe it or not, there are good, reasonable reasons for wanting
more operators, and at least one serious PEP driven by real-world

So can we please drop this thread unless you have a serious
suggestion that doesn't need to wait until Python 4?

[1] By all the gods, PLEASE don't tell me these proposals are meant


From greg.ewing at  Tue Oct 16 02:24:15 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 16 Oct 2012 13:24:15 +1300
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

I've just had another thought I'd like to mention
concerning the way we think about subtasks.

There's actually a subtle difference between invoking
a subgenerator using yield-from on the one hand, and
spawning it as a separate task and then waiting for
it on the other.

When you call a subgenerator using yield-from, a switch
to another task can't occur until that subgenerator or
something it calls reaches a yield.

But (at least the way my scheduler currently works),
if you spawn it as a separate task and then block
waiting for it to complete, other tasks can run
immediately, before the subtask has even started.

If you're relying on knowing where the yields can
occur, this difference could be important. So I think
the distinction between calling and spawning subtasks
needs to be maintained. This means that spawning must
be something you request explicitly in some way.
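
Here is a small sketch of the difference, using a toy round-robin
scheduler rather than my real one; all the names are made up, and the
spawning parent does not actually wait for its child, it only shows
when the child gets to start:

```python
from collections import deque


def run(make_tasks):
    """Toy scheduler; make_tasks receives the ready queue and a log."""
    ready, log = deque(), []
    ready.extend(make_tasks(ready, log))
    while ready:
        current = ready.popleft()
        try:
            next(current)
        except StopIteration:
            continue
        ready.append(current)
    return log


def child(log):
    log.append("child starts")
    yield
    log.append("child ends")


def other(log):
    log.append("other runs")
    yield


def calling(ready, log):
    def parent():
        log.append("parent starts")
        yield from child(log)       # child runs at once, up to its first yield
    return [parent(), other(log)]


def spawning(ready, log):
    def parent():
        log.append("parent starts")
        ready.append(child(log))    # child is only queued
        yield                       # other tasks get to run first
    return [parent(), other(log)]


called_log = run(calling)
spawned_log = run(spawning)
```

In called_log the child starts before the other task runs; in
spawned_log the other task gets in first.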


From greg.ewing at  Tue Oct 16 02:44:22 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 16 Oct 2012 13:44:22 +1300
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

Christian Tismer wrote:

> Right, CPython still keeps unnecessary crap on the C stack.

It's not just Python leaving stuff on the stack that's a
problem, it's external C code that calls back into Python.

> But that's not the point right now, because on the other hand,
> in the context of a possible yield (from or not), the C stack
> is clean, and this enables switching.

> And actually in such clean positions, Stackless Python (as opposed to
> Greenlets) does soft-switching, which is very similar to what the
> generators are doing - there is no assembly stuff involved at all.

But the assembly code still needs to be there to handle the
cases where you *can't* do soft switching. It's the presence
of the code that's the issue here, not how frequently it
gets called.

> I have begun studying the code for YIELD_FROM. As it is written, every
> next iteration elevates the chain of generators once up and down.
> Maybe that can be avoided by changing the frame chain, so this can become
> a cheaper O(1) operation.

My original implementation of yield-from actually *did* avoid
this, by keeping a C-level pointer chain of yielding-from frames.
But that part was ripped out at the last minute when someone
discovered that it had a detrimental effect on tracebacks.

There are probably other ways the traceback problem could be
fixed, so maybe we will get this optimisation back one day.


From greg.ewing at  Tue Oct 16 02:55:20 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 16 Oct 2012 13:55:20 +1300
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

Calvin Spealman wrote:

> I'm still -1 on delegating control to subgenerators with yield-from,
> versus having the scheduler just deal with them directly.

Do you mean to *disallow* using yield-from for this, or just
not to encourage it?

I don't see how you *could* disallow it; there's no way for
the scheduler to know whether one of the generators it's
handling is delegating using yield-from.

I also can't see any reason you would want to discourage it.
Given that yield-from exists, it's an obvious thing to do.


From guido at  Tue Oct 16 03:17:48 2012
From: guido at (Guido van Rossum)
Date: Mon, 15 Oct 2012 18:17:48 -0700
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 15, 2012 at 2:45 PM, Dino Viehland <dinov at> wrote:
> They look remarkably similar.  The biggest difference I see is that NDB appears to be using an event loop to keep the futures running while we're using add_done_callback (on the yielded futures) to continue stepping the generator function along.  So there's not necessarily an event loop in our case, and in fact the default context always just executes things synchronously.  But frameworks can replace the default context so that work is posted into an event loop of some form.

But do your Futures use threads? NDB doesn't. NDB's event loop doesn't
know about Futures; however the @ndb.tasklet decorator does, and the
Futures know about the event loop. When you wait for a Future, a
callback is added to the Future that will resume the generator when it
is done, and in order to run them, the Future passes its callbacks to
the event loop to be run.


> -----Original Message-----
> From: gvanrossum at [mailto:gvanrossum at] On Behalf Of Guido van Rossum
> Sent: Monday, October 15, 2012 12:34 PM
> To: Dino Viehland
> Cc: ironfroggy at; Nick Coghlan; python-ideas at
> Subject: Re: [Python-ideas] The async API of the future: yield-from
> Wow, sounds very similar to NDB's approach! Please do check out NDB's tasklets and event loop:
> On Mon, Oct 15, 2012 at 10:24 AM, Dino Viehland <dinov at> wrote:
>> I'm still catching up to this thread, but we've been investigating Win 8 support for Python and Win 8 has a very asynchronous API design and so we've been interested in much the same space.  We've actually come up with an example of the @task decorator (we called it @async) which is built around using yield + the ability to return from generators added in Python 3.3.  Our version of this is also based around futures so that an @async API will return a future.  The big difference here might be that we always return a future from a call rather than yielding it up the stack.  So our API works with just simple yields rather than yield froms.  This is what a simple usage of the API looks like:
>>         from concurrent.futures import ThreadPoolExecutor
>>         from urllib.request import urlopen
>>         executor = ThreadPoolExecutor(max_workers=5)
>>         def load_url(url):
>>             return urlopen(url).read()
>>         @async
>>         def get_image_async(url):
>>             buffer = yield executor.submit(load_url, url)
>>             return Image(buffer)
>>         def main(image_uri):
>>             img_future = get_image_async(image_uri)
>>             # perform other tasks while the image is downloading
>>             img = img_future.result()
>>         main("")
>> This example is just using the existing thread pool to run the actual I/O but this will work with anything that will return a future.  So inside of an async method anything which is yielded should be a future.  The decorator will then attach a callback which will send the result of the future back into the generator, so the "buffer = " line gets the result of the future.  Finally the function completes and the future returned from calling get_image_async will have its value set to Image when the StopIteration exception is raised with the return value.
>> Because we're interested in the GUI side of things here we've also wired this up into Tk so that we can experiment with an existing GUI framework, and I've included the source for the context there.  Our thinking here is that different contexts can be created depending upon the framework which you're running in and that the context makes sure the code is running in the right spot, in this case getting back to the GUI thread after an async operation has been completed.
>> The big outstanding item we're still working through is I/O, but we think the contexts help here too.  We're still not quite sure how polling I/O will work, but with the contexts if there's a single thread polling for I/O then the context will get us off the I/O thread and let the polling continue.  We are currently thinking that there will need to be a polling thread which handles all of the I/Os, and there could potentially be more than one of these if different libraries aren't cooperating on sharing a single thread.
>> Here's the code plus the demo Tk app (you'll need your own Holmes.txt file for the sample app to run):
>> Tk context:
>> Tk app:
>> -----Original Message-----
>> From: Python-ideas
>> [ at] On Behalf
>> Of Calvin Spealman
>> Sent: Monday, October 15, 2012 7:16 AM
>> To: Nick Coghlan
>> Cc: python-ideas at
>> Subject: Re: [Python-ideas] The async API of the future: yield-from
>> On Mon, Oct 15, 2012 at 9:48 AM, Nick Coghlan <ncoghlan at> wrote:
>>> On Mon, Oct 15, 2012 at 10:31 PM, Calvin Spealman <ironfroggy at> wrote:
>>>> Currently, "with yield expr:" is not valid syntax, surprisingly.
>>> It's not that surprising, it's the general requirement that yield
>>> expressions must be enclosed in parentheses except when used
>>> standalone or in a simple assignment statement.
>>> "with (yield expr):" is valid syntax though, so I'm reluctant to
>>> endorse doing anything substantially different if the parentheses are
>>> omitted.
>> Silly oversight on my part, and I agree that the parens shouldn't make the difference in meaning.
>>> I think the combination of "yield from" to delegate control
>>> (including exception handling) completely to a subgenerator and
>>> "context manager + for loop + explicit yield" when an operation needs
>>> to yield multiple times and the exception handling behaviour should be
>>> left to the caller (as in the "as_completed" case) should cover the
>>> necessary behaviours.
>> I'm still -1 on delegating control to subgenerators with yield-from, versus having the scheduler just deal with them directly.  I think it is far less flexible.
>> I would still like to see a less confusing "with yield expr:" by simply allowing it without parens, but no special meaning. I think it would be really useful in coroutines.
>> with yield collect() as tasks:
>>   yield task1()
>>   yield task2()
>> results = yield tasks
>>> Cheers,
>>> Nick.
>>> --
>>> Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia
>> --
>> Read my blog! I depend on your acceptance of my opinion! I am interesting!
>> Follow me if you're into that sort of thing:
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at
> --
> --Guido van Rossum (

--Guido van Rossum (

From ncoghlan at  Tue Oct 16 03:49:01 2012
From: ncoghlan at (Nick Coghlan)
Date: Tue, 16 Oct 2012 11:49:01 +1000
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Tue, Oct 16, 2012 at 10:44 AM, Greg Ewing
<greg.ewing at> wrote:
> My original implementation of yield-from actually *did* avoid
> this, by keeping a C-level pointer chain of yielding-from frames.
> But that part was ripped out at the last minute when someone
> discovered that it had a detrimental effect on tracebacks.
> There are probably other ways the traceback problem could be
> fixed, so maybe we will get this optimisation back one day.

Ah, I thought I remembered something along those lines. IIRC, it was a
bug report on one of the alphas that prompted us to change it.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From guido at  Tue Oct 16 03:51:26 2012
From: guido at (Guido van Rossum)
Date: Mon, 15 Oct 2012 18:51:26 -0700
Subject: [Python-ideas] Expressiveness of coroutines versus Deferred
 callbacks (or possibly promises, futures)
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 15, 2012 at 11:08 AM, Glyph <glyph at> wrote:

> Still working my way through zillions of messages on this thread, trying
> to find things worth responding to, I found this, from Guido:
> [Generators are] more flexible [than Deferreds], since it is easier to
> catch different exceptions at different points (...) In the past, when I
> pointed this out to Twisted aficionados, the responses usually were a mix
> of "sure, if you like that style, we got it covered, Twisted has
> inlineCallbacks," and "but that only works for the simple cases, for the
> real stuff you still need Deferreds." But that really sounds to me like
> Twisted people just liking what they've got and not wanting to change.
> If you were actually paying attention, we did explain what "the real
> stuff" is, and why you can't do it with inlineCallbacks. ;-)

And yet the rest of your email could be paraphrased by those two quoted
phrases. :-) But seriously, thanks for repeating the explanation for my

> (Or perhaps I should say, why we prefer to do it with Deferreds
> explicitly.)
> Managing parallelism is easy with the when-this-then-that idiom of
> Deferreds, but challenging with the sequential this-then-this-then-this
> idiom of generators.  The examples in the quoted message were all
> sequential workflows, which are roughly equivalent in both styles.  As soon
> as a for loop gets involved though, yield-based coroutines have a harder
> time expressing the kind of parallelism that a lot of applications
> *should* use, so it's easy to become accidentally sequential (and therefore less
> responsive) even if you don't need to be.  For example, using some
> hypothetical generator coroutine library, the idiomatic expression of a
> loop across several request/responses would be something like this:
> @yield_coroutine
> def something_async():
>     values = yield step1()
>     results = set()
>     for value in values:
>         results.add(step3((yield step2(value))))
>     return_(results)
> Since it's in a set, the order of 'results' doesn't actually matter; but
> this code needs to sit and wait for each result to come back in order; it
> can't perform any processing on the ones that are already ready while it's
> waiting.  You express this with Deferreds:
> def something_deferred():
>     return step1().addCallback(
>         lambda values: gatherResults([step2(value).addCallback(step3)
>                                       for value in values])).addCallback(set)
> In addition to being a roughly equivalent amount of code (fewer lines, but
> denser), that will run step2() and step3() on demand, as results are ready
> from the set of Deferreds from step1.  That means that your program will
> automatically spread out its computation, which makes better use of time as
> results may be arriving in any order.
> The problem is that it is difficult to express laziness with generator
> coroutines: you've already spent the generator-ness on the function on
> responding to events, so there's no longer any syntactic support for
> laziness.

I see your example as a perfect motivation for adding some kind of map()
primitive. In NDB there is one for the specific case of mapping over query
results (common in NDB because it's primarily a database client). That
map() primitive takes a callback that is either a plain function or a
tasklet (i.e. something returning a Future). map() itself is also async
(returning a Future) and all the tasklets' results are waited for and
collected only when you wait for the map(). It also handles the input
arriving in batches (as they do for App Engine Datastore queries). IOW it
exploits all available parallelism. While the public API is tailored for
queries, the underlying mechanism can support a few different ways of
collecting the results, supporting filter() and even reduce() (!) in
addition to map(); and most of the code is reusable for other (non-query)
contexts. I feel it would be possible to extend it to support "stop after
the first N results" and "stop when this predicate says so" too.

In general, whenever you want parallelism in Python, you have to introduce
a new function, unless you happen to have a suitable function lying around
already; so I don't feel I am contradicting myself by proposing a mechanism
using callbacks here. It's the callbacks for sequencing that I dislike.

> (There's another problem where sometimes you can determine that work needs
> to be done as it arrives; that's an even trickier abstraction than
> Deferreds though and I'm still working on it. I think I've mentioned <
>> already in one of my previous posts.)

NDB's map() does this.

> Also, this is not at all a hypothetical or academic example.  This pattern
> comes up all the time in e.g. web-spidering and chat applications.

Of course. In App Engine, fetching multiple URLs in parallel is the
hello-world of async operations.

> To be fair, you *could* express this in a generator-coroutine library
> like this:
> @yield_coroutine
> def something_async():
>     values = yield step1()
>     thunks = []
>     @yield_coroutine
>     def do_steps(value):
>         return_(step3((yield step2(value))))
>     for value in values:
>         thunks.append(do_steps(value))
>     return_(set((yield multi_wait(thunks))))
> but that seems bizarre and not very idiomatic; to me, it looks like the
> confusing aspects of both styles.

Yeah, you need a map() operation:

def something_async():
  values = yield step1()
  def do_steps(value):
    return step3((yield step2(value)))
  return set(yield map_async(do_steps, values))

Or maybe map_async()'s Future's result should be a set?

> David Reid also wrote up some examples of how Deferreds can express
> sequential workflows more nicely as well (also indirectly as a response to
> Guido!) on his blog, here: <
> Which I understand -- I don't want to change either. But I also observe
> that a lot of people find bare Twisted-with-Deferreds too hard to grok, so
> they use Tornado instead, or they build a layer on top of either (like
> Monocle),
> inlineCallbacks (and the even-earlier deferredGenerator) predates Monocle.
>  That's not to say Monocle has no value; it is a portability layer between
> Twisted and Tornado that does the same thing inlineCallbacks does but
> allows you to do it even if you're not using Deferreds, which will surely
> be useful to some people.
> I don't want to belabor this point, but it bugs me a little bit that we
> get so much feedback from the broader Python community along the lines of
> "Why doesn't Twisted do X?

I don't think I quite said that. But I suspect it happens because Twisted
is hard to get into. I suspect anything using higher-order functions this
much has that problem; I feel this way about Haskell's Monads. I wouldn't
be surprised if many Twisted lovers are also closet (or not) Haskell lovers.

> I'd use it if it did X, but it's all weird and I don't understand Y that
> it forces me to do instead, that's why I use Z" when, in fact:
>    1. Twisted does do X
>    2. It's done X for years
>    3. It actually invented X in the first place
>    4. There are legitimate reasons why we (Twisted core developers)
>    suggest and prefer Y for many cases, but you don't need to do it if you
>    don't want to follow our advice
>    5. Thing Z that is being cited as doing X actually explicitly mentions
>    Twisted as an inspiration for its implementation of X
> It's fair, of course, to complain that we haven't explained this very
> well, and I'll cop to that unless I can immediately respond with a
> pre-existing URL that explains things :).
> One other comment that's probably worth responding to:
> I suppose on systems that support both networking and GUI events, in my
> design these would use different I/O objects (created using different
> platform-specific factories) and the shared reactor API would sort things
> out based on the type of I/O object passed in to it.
> In my opinion, it is a mistake to try to harmonize or unify all GUI event
> systems, unless you are also harmonizing the GUI itself (i.e. writing a
> totally portable GUI toolkit that does everything).  And I think we can all
> agree that writing a totally portable GUI toolkit is an impossibly huge
> task that is out of scope for this (or, really, any other) discussion.  GUI
> systems can already dispatch their events to user code just fine - interposing
> a Python reactor API between the GUI and the event registration adds
> additional unnecessary work, and may not even be possible in some cases.
>  See, for example, the way that Xcode (formerly Interface Builder) and the
> Glade interface designer use: the name of the event handler is registered
> inside a somewhat opaque blob, which is data and not code, and then hooked
> up automatically at runtime based on reflection.  The code itself never
> calls any event-registration APIs.
> Also, modeling all GUI interaction as a request/response conversation is
> limiting and leads to bad UI conventions.  Consider: the UI element that
> most readily corresponds to a request/response is a modal dialog box.  Does
> anyone out there really like applications that consist mainly of popping up
> dialog after dialog to prompt you for the answers to questions?

I don't feel very strongly about integrating GUI systems. IIRC Twisted has
some way to integrate with certain GUI event loops. I don't think we should
desire any more (but neither, less).

--Guido van Rossum (

From guido at  Tue Oct 16 04:10:46 2012
From: guido at (Guido van Rossum)
Date: Mon, 15 Oct 2012 19:10:46 -0700
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 15, 2012 at 1:14 PM, Greg Ewing <greg.ewing at> wrote:
> Nick Coghlan wrote:
>> The main primitive I personally want out of an async API is a
>> task-based equivalent to concurrent.futures.as_completed() [1]. This
>> is what I meant about iteration being a bit of a mess: the way the
>> as_completed() works, the suspend/resume channel of the iterator
>> protocol is being used to pass completed future objects back to the
>> calling iterator. That means that channel *can't* be used to talk
>> between the coroutine and the scheduler,
> I had to read this a couple of times before I figured out
> what you're talking about, but I get it now.
> This is an instance of a general problem that was noticed
> back when I was discussing my cofunctions idea: using
> generator-based coroutines, it's not possible to have a
> "suspendable iterator", because that would require "yield"
> to have two conflicting meanings: "suspend this coroutine"
> on one hand, and "provide a value to my caller" on the
> other.
> Unfortunately, I suspect that a truly elegant solution to this
> problem will require yet another language addition -- something
> like
>    yield for item in subtask():
>       ...
> which would run a slightly different version of the iterator
> protocol in which values to be yielded are wrapped somehow
> (I haven't figured out all the details yet).
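
For reference, the thread-based concurrent.futures.as_completed() pattern
that Nick mentions can be sketched as follows. This is only an illustration
of the existing stdlib API, not of any proposed task-based equivalent;
square() and the input values are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def square(n):
    # Stand-in for a slow request/response step (hypothetical).
    return n * n


with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(square, n) for n in (1, 2, 3)]
    results = set()
    # as_completed() hands back each future as soon as it finishes,
    # in completion order rather than submission order.
    for future in as_completed(futures):
        results.add(future.result())
```

The for loop here consumes the iterator protocol to receive finished
futures, which is the channel a generator-based coroutine would also need
for talking to its scheduler.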

I think I ran into a similar issue with NDB when defining iteration
over an asynchronous query. My solution:

  q = <some query specification>
  it = q.iter()  # Fire off the query to the datastore
  while (yield it.has_next_async()):  # Block until one result
    emp =  # Get the result that was buffered on the iterator
    print, emp.age  # Use it

--Guido van Rossum (

From guido at  Tue Oct 16 04:19:29 2012
From: guido at (Guido van Rossum)
Date: Mon, 15 Oct 2012 19:19:29 -0700
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 15, 2012 at 4:37 PM, Matt Joiner <anacrolix at> wrote:
> I gave something like this a go a while ago:
> "Coroutines" yield events or futures as Nick put them from the top, and the
> scheduler at the bottom manages events and scheduling.
> There are a few things I took away from this attempt:
> 1) Explicit yield a la PEP380 requires syntactical changes *everywhere*.

So does using PEP 342 style coroutines (yield Future instead of yield from).

> 2) Python's dynamic typing means that neglecting to "yield from" gives you
> broken code, and Python won't help you here. Add to this that you now have a
> 380, and "normal synchronous" form of most interfaces and the caller must
> know what kind is used at all times.

In NDB this is alleviated by insisting that the only thing you are
allowed to yield is a Future. Anything else raises TypeError. But yes,
the first few days when getting used to this style, you end up
debugging this a few times.

> 3) Concurrency is nice, but it requires language-level support, and proper
> parallelism to really shine. The "C way" of doing things is already so
> heavily ingrained in Python, an entirely new standard library and
> interpreter that breaks C compatibility is really the only way to proceed,
> and this certainly isn't worth it just to write code with "yield from"
> littered on every line.

Here you're basically arguing for greenlets/gevent -- you're saying
you just don't want to put up with the yields everywhere. But the
popularity of Twisted and Tornado shows that at least some people are
willing to make even bigger sacrifices in order to be able to do async
I/O efficiently -- i.e., to solve the C10K problem that Christian
Tismer referred to (,

There happen to be several problems with greenlets (even Christian
Tismer said so, and included Stackless in the problem). The current
effort is striving to help people solve it with less effort than the
async style Twisted and Tornado promote, while avoiding the problems
with greenlets.

--Guido van Rossum (

From dinov at  Tue Oct 16 03:50:29 2012
From: dinov at (Dino Viehland)
Date: Tue, 16 Oct 2012 01:50:29 +0000
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

Guido wrote:
> But do your Futures use threads? NDB doesn't. NDB's event loop doesn't
> know about Futures; however the @ndb.tasklet decorator does, and the
> Futures know about the event loop. When you wait for a Future, a callback is
> added to the Future that will resume the generator when it is done, and in
> order to run them, the Future passes its callbacks to the event loop to be
> run.

The decorator and the default context don't do anything w/ threads by default,
but once you start combining it w/ other futures threads are likely to be used.
For example if you take:

         def get_image_async(url):
             buffer = yield executor.submit(load_url, url)
             return Image(buffer)

Then the "yield executor.submit(load_url, url)" line is going to yield a future which
is running on a thread pool thread.  When it completes, its done callback is also going
to be delivered on the same thread pool thread.  At that point we let the context which
was captured when the function was initially called handle resuming the generator.  The
default context is just going to synchronously continue to the function, so the
generator would then resume running on the thread pool thread.  But if you're running
in a GUI app which sets up its own context then the context will post an event into the UI 
event loop and execution will continue on the UI thread.

Likewise if there were a bunch of async I/O routines then this would combine with them in 
a similar way - async I/O would result in a future, the futures would signal that they're done 
on some worker thread, and then the async methods will get to continue running on that 
worker thread unless the current context wants to do something different.
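
A minimal sketch of the mechanism described above, under some loud
assumptions: the decorator is named async_task here (hypothetical; the post
calls it @async, which is a reserved word in current Python), there is no
context object at all, and so the generator simply resumes on whichever
thread completes the yielded future. It is not the actual implementation
referred to in this thread.

```python
import functools
from concurrent.futures import Future, ThreadPoolExecutor


def async_task(func):
    """Hypothetical stand-in for the @async decorator discussed above."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        gen = func(*args, **kwargs)
        result_future = Future()  # the single future handed to the caller

        def step(value=None, error=None):
            try:
                if error is not None:
                    yielded = gen.throw(error)
                else:
                    yielded = gen.send(value)
            except StopIteration as stop:
                # "return <expr>" in the generator arrives here.
                result_future.set_result(stop.value)
                return
            except Exception as exc:
                result_future.set_exception(exc)
                return
            # Assumption: everything yielded is a future.
            yielded.add_done_callback(resume)

        def resume(done_future):
            exc = done_future.exception()
            if exc is not None:
                step(error=exc)
            else:
                step(done_future.result())

        step()
        return result_future
    return wrapper


executor = ThreadPoolExecutor(max_workers=2)


def load(n):
    # Stand-in for blocking I/O such as load_url().
    return n * 2


@async_task
def double_plus_one(n):
    doubled = yield executor.submit(load, n)
    return doubled + 1


result = double_plus_one(20).result()
executor.shutdown()
```

A context with a post() method would slot in where resume() calls step()
directly: instead of stepping the generator on the pool thread, it would
hand the step to the context to run wherever it sees fit.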

From Steve.Dower at  Tue Oct 16 03:45:25 2012
From: Steve.Dower at (Steve Dower)
Date: Tue, 16 Oct 2012 01:45:25 +0000
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

That's basically exactly the same as ours (I worked on it with Dino). We assume that yielded objects are futures and wire up the callback.

I think the difference is that all the yielded futures are hidden within the decorator, which returns one main future to the caller. This may be slightly inefficient, but it also makes it far easier for end-users. An event loop (in our terminology, a 'context') is only necessary if you need to ensure that callbacks (in this case, the next step in the generator) are run in a certain context (such as a UI thread). Without one, calling an @async method simply gives you back a future that you can wait on.

The most important part of PEP 380 for this approach is not yield from, but allowing return <expr> inside a generator. It makes the methods that much more natural.

Probably the most important part is that we assume whatever context is available (through contexts.get_current()) has a post() method for scheduling a callback. Basically, we approached this as less of a "how do I run this asynchronously" problem and more of a "how do I run something after this finishes" problem.

We also have some ideas about associating properties with futures in a way that lets the caller decide how to run continuations, so you can opt-out of coming back to the calling thread or provide a cancellation token of some sort. These aren't written up yet (obviously), but we've certainly considered it.


From: Python-ideas [ at] on behalf of Guido van Rossum [guido at]
Sent: Monday, October 15, 2012 6:17 PM
To: Dino Viehland
Cc: python-ideas at
Subject: Re: [Python-ideas] The async API of the future: yield-from

On Mon, Oct 15, 2012 at 2:45 PM, Dino Viehland <dinov at> wrote:
> They look remarkably similar.  The biggest difference I see is that NDB appears to be using an event loop to keep the futures running while we're using add_done_callback (on the yielded futures) to continue stepping the generator function along.  So there's not necessarily an event loop in our case, and in fact the default context always just executes things synchronously.  But frameworks can replace the default context so that work is posted into an event loop of some form.

But do your Futures use threads? NDB doesn't. NDB's event loop doesn't
know about Futures; however the @ndb.tasklet decorator does, and the
Futures know about the event loop. When you wait for a Future, a
callback is added to the Future that will resume the generator when it
is done, and in order to run them, the Future passes its callbacks to
the event loop to be run.


> -----Original Message-----
> From: gvanrossum at [mailto:gvanrossum at] On Behalf Of Guido van Rossum
> Sent: Monday, October 15, 2012 12:34 PM
> To: Dino Viehland
> Cc: ironfroggy at; Nick Coghlan; python-ideas at
> Subject: Re: [Python-ideas] The async API of the future: yield-from
> Wow, sounds very similar to NDB's approach! Please do check out NDB's tasklets and event loop:
> On Mon, Oct 15, 2012 at 10:24 AM, Dino Viehland <dinov at> wrote:
>> I'm still catching up to this thread, but we've been investigating Win 8 support for Python and Win 8 has a very asynchronous API design and so we've been interested in much the same space.  We've actually come up with an example of the @task decorator (we called it @async) which is built around using yield + the ability to return from generators added in Python 3.3.  Our version of this is also based around futures so that an @async API will return a future.  The big difference here might be that we always return a future from a call rather than yielding it up the stack.  So our API works with just simple yields rather than yield froms.  This is what a simple usage of the API looks like:
>>         from concurrent.futures import ThreadPoolExecutor
>>         from urllib.request import urlopen
>>         executor = ThreadPoolExecutor(max_workers=5)
>>         def load_url(url):
>>             return urlopen(url).read()
>>         @async
>>         def get_image_async(url):
>>             buffer = yield executor.submit(load_url, url)
>>             return Image(buffer)
>>         def main(image_uri):
>>             img_future = get_image_async(image_uri)
>>             # perform other tasks while the image is downloading
>>             img = img_future.result()
>>         main("")
>> This example is just using the existing thread pool to run the actual I/O but this will work with anything that will return a future.  So inside of an async method anything which is yielded should be a future.  The decorator will then attach a callback which will send the result of the future back into the generator, so the "buffer = " line gets the result of the future.  Finally the function completes and the future returned from calling get_image_async will have its value set to Image when the StopIteration exception is raised with the return value.
>> Because we're interested in the GUI side of things here we've also wired this up into Tk so that we can experiment with an existing GUI framework, and I've included the source for the context there.  Our thinking here is that different contexts can be created depending upon the framework which you're running in and that the context makes sure the code is running in the right spot, in this case getting back to the GUI thread after an async operation has been completed.
>> The big outstanding item we're still working through is I/O, but we think the contexts help here too.  We're still not quite sure how polling I/O will work, but with the contexts if there's a single thread polling for I/O then the context will get us off the I/O thread and let the polling continue.  We are currently thinking that there will need to be a polling thread which handles all of the I/Os, and there could potentially be more than one of these if different libraries aren't cooperating on sharing a single thread.
>> Here's the code plus the demo Tk app (you'll need your own Holmes.txt file for the sample app to run):
>> Tk context:
>> Tk app:
>> -----Original Message-----
>> From: Python-ideas
>> [ at] On Behalf
>> Of Calvin Spealman
>> Sent: Monday, October 15, 2012 7:16 AM
>> To: Nick Coghlan
>> Cc: python-ideas at
>> Subject: Re: [Python-ideas] The async API of the future: yield-from
>> On Mon, Oct 15, 2012 at 9:48 AM, Nick Coghlan <ncoghlan at> wrote:
>>> On Mon, Oct 15, 2012 at 10:31 PM, Calvin Spealman <ironfroggy at> wrote:
>>>> Currently, "with yield expr:" is not valid syntax, surprisingly.
>>> It's not that surprising, it's the general requirement that yield
>>> expressions must be enclosed in parentheses except when used
>>> standalone or in a simple assignment statement.
>>> "with (yield expr):" is valid syntax though, so I'm reluctant to
>>> endorse doing anything substantially different if the parentheses are
>>> omitted.
>> Silly oversight on my part, and I agree that the parens shouldn't make the difference in meaning.
>>> I think the combination of "yield from" to delegate control
>>> (including exception handling) completely to a subgenerator and
>>> "context manager
>>> + for loop + explicit yield" when an operation needs to yield
>>> + multiple
>>> times and the exception handling behaviour should be left to the
>>> caller (as in the "as_completed" case) should cover the necessary
>>> behaviours.
>> I'm still -1 on delegating control to subgenerators with yield-from, versus having the scheduler just deal with them directly.  I think it is far less flexible.
>> I would still like to see a less confusing "with yield expr:" by simply allowing it without parens, but no special meaning. I think it would be really useful in coroutines.
>> with yield collect() as tasks:
>>   yield task1()
>>   yield task2()
>> results = yield tasks
>>> Cheers,
>>> Nick.
>>> --
>>> Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia
>> --
>> Read my blog! I depend on your acceptance of my opinion! I am interesting!
>> Follow me if you're into that sort of thing:
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at
> --
> --Guido van Rossum (

--Guido van Rossum (

From turnbull at  Tue Oct 16 06:07:42 2012
From: turnbull at (Stephen J. Turnbull)
Date: Tue, 16 Oct 2012 13:07:42 +0900
Subject: [Python-ideas] Is there a good reason to use *
	for	multiplication?
In-Reply-To: <>
References: <>
Message-ID: <>

Mike Graham writes:

 > IMO it's essential that we add source code escapes. Imagine the
 > one-liners this will allow!
 >     def f(xs):\n\ttry:\n\t\treturn x.pop()\n\texcept ValueError\n\t\treturn None
 > Can we get this fix applied in Python 2.2 and up?

Why not go all the way back to v1.5.2?  All it takes is a version bump
to v1j.5.2.

From greg.ewing at  Tue Oct 16 07:25:24 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 16 Oct 2012 18:25:24 +1300
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

Calvin Spealman wrote:
> A "sane stack trace" only makes sense if we assume that tasks
> "call" each other in the same kind of call tree that synchronous code flows
> in, and I don't think that is necessarily the case.

No, but often it *is* the case, and in those cases we
would like to get a traceback that correctly reflects
the chain of calls.

> There are cases when one
> task might want to end before tasks it has "called" are complete, and if we use
> yield-from this is *impossible* but it is very useful.

That depends on what you mean by "use yield-from". It's
true that yield-from *on its own* can't achieve the effect
of spawning concurrent subtasks; other mechanisms will need
to be brought to bear at some point.

But there's no reason a solution involving those other
mechanisms can't be encapsulated in a library function that
you invoke using yield-from. I've posted a couple of examples
of how a par() function which does that might be written.

> yield-from semantics won't allow a called task to continue, if needed, after the
> calling task itself has completed.

You seem to be imagining that more is being claimed about
the abilities of yield-from than is actually being claimed.
Yield-from is just a procedure call; the important thing
is what the called procedure does.

One of the things it can do is invoke a scheduler primitive
that spawns an independent task. In my example scheduler,
this is spelled scheduler.schedule(task). This is not a
yield-from call, it's just an ordinary call. It adds
the given generator to the list of ready tasks, so that it
will get run when its chance comes around. Meanwhile,
the calling task carries on.
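
A minimal sketch of that idea, assuming a bare round-robin scheduler (the
Scheduler class below is illustrative and is not the example scheduler
posted earlier in the thread). A procedure invoked with yield-from spawns
an independent task with an ordinary call, and that task keeps running
after its spawner has finished:

```python
from collections import deque


class Scheduler:
    """Illustrative round-robin scheduler for generator tasks."""

    def __init__(self):
        self.ready = deque()

    def schedule(self, task):
        # An ordinary call: add the generator to the list of ready tasks.
        self.ready.append(task)

    def run(self):
        while self.ready:
            task = self.ready.popleft()
            try:
                next(task)
            except StopIteration:
                continue  # task finished; drop it
            self.ready.append(task)


scheduler = Scheduler()
log = []


def child():
    for i in range(3):
        log.append("child step %d" % i)
        yield


def helper():
    scheduler.schedule(child())  # not a yield-from call
    log.append("helper spawned child")
    yield


def parent():
    yield from helper()
    log.append("parent done")


scheduler.schedule(parent())
scheduler.run()
```

After the run, log shows "parent done" recorded before the last two child
steps, so the spawned task outlives the task that created it.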


From greg.ewing at  Tue Oct 16 07:39:20 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 16 Oct 2012 18:39:20 +1300
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

Nick Coghlan wrote:
> (this is why I disagree with Greg that
> "yield from" can serve as the one true API - it doesn't handle partial
> iteration, and it doesn't handle pre- or post- processing around the
> suspension points while iterating).

I'm aware of the iteration problem, but I'm not convinced
that the convolutions necessary to make it possible to use
a for-loop for this are worth the bother, as opposed to
simply accepting that you can't use the for statement in
this situation, and using some other kind of loop.

In any case, even if we decide to provide a scheduler
instruction to enable using for-loops on suspendable
iterators somehow, it doesn't follow that we should use
scheduler instructions for anything *else*.

I would consider such a scheduler instruction to be a stopgap
measure until we can find a better solution -- just as
yield-from is a better solution than using "call" and "return"
scheduler instructions.


From greg.ewing at  Tue Oct 16 07:46:12 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 16 Oct 2012 18:46:12 +1300
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

Laurens Van Houtven wrote:
> On Mon, Oct 15, 2012 at 5:32 PM, Nick Coghlan <ncoghlan at 
> <mailto:ncoghlan at>> wrote:
>     My preferred way of thinking of "yield from" is as a simple
>     refactoring tool
> I agree. That's how I've used it. Maybe that's just short-sightedness.

And that's exactly how *I* see it as well! Which means
some people must be misinterpreting something I'm saying,
if they think I see it some other way.


From ncoghlan at  Tue Oct 16 08:01:51 2012
From: ncoghlan at (Nick Coghlan)
Date: Tue, 16 Oct 2012 16:01:51 +1000
Subject: [Python-ideas] filename comparison [was] Re: PEP 428 -
 object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 16, 2012 at 6:21 AM, Jim Jewett <jimjjewett at> wrote:
> Ideally, I would also have a way to find out that a pathname is likely
> to be problematic for cross-platform uses, or at least whether two
> specific pathnames are known to be collision-prone on existing
> platforms other than mine.  (But I'm not sure that sort of test can be
> reliable enough for the stdlib.  Would just check for caseless
> equality, reserved Windows names, and non-alphanumeric characters in
> the filename?)

I'd forgotten about it until reading this, but I think you can get
into trouble with Unicode normalisation as well - so, I think we can
safely dismiss this as an irrelevant tangent and just stick with
Antoine's basic Windows vs Posix distinction. If need be, the
strategies can be exposed at a later date (via keyword-only arguments)
if we come up with a more convincing use case.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From steve at  Tue Oct 16 08:43:52 2012
From: steve at (Steven D'Aprano)
Date: Tue, 16 Oct 2012 17:43:52 +1100
Subject: [Python-ideas] filename comparison [was] Re: PEP 428 -
	object-oriented filesystem paths
In-Reply-To: <>
References: <>
Message-ID: <20121016064351.GA20296@ando>

On Mon, Oct 15, 2012 at 04:21:50PM -0400, Jim Jewett wrote:

> If I want my program (or a dict) to know that "CONFIG" and "config"
> are the same, then I also want it to know that "My Documents" is the
> same as "MYDOCU~1".*

Well, perhaps you do, but those not using Windows are unlikely to care 
about DOS short names.

However, they may care about some other form of short name. E.g. on 
iso9660 file systems (CDs) long names are just truncated; if two 
truncated names clash, the second and subsequent file is given a three 
digit suffix:

My Documents

get renamed to:


although my Linux computer displays those names in lower case. The Rock 
Ridge and Joliet extensions can record the unmangled file names, but not 
all CDs use them.

It is not the case that all case-insensitive file systems necessarily 
support DOS short names. There are file systems that don't support long 
names at all, there are case-insensitive file systems that preserve 
case, and those that don't.

It's not even necessarily so that Windows is always case-insensitive:


From greg.ewing at  Tue Oct 16 09:20:08 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 16 Oct 2012 20:20:08 +1300
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:

> But there needs to be another way to get a task running immediately
> and concurrently; I believe that would be
> a = spawn(foo_task())
> right? One could then at any later point use
> ra = yield from a

Hmmm. I suppose it *could* be made to work that way, but I'm
not sure it's a good idea, because it blurs the distinction
between invoking a subtask synchronously and waiting for the
result of a previously spawned independent task.

Recently I've been thinking about an implementation where
it would look like this. First you do

    t = spawn(foo_task())

but what you get back is *not* a generator; rather it's
a Task object which wraps a generator and provides various
operations. One of them would be

    r = yield from t.wait()

which waits for the task to complete and then returns its
value (or if it raised an exception, propagates the exception).

Other operations that a Task object might support include

    t.unblock()        # wake up a blocked task
    t.cancel()         # unschedule and clean up the task
    t.throw(exception) # raise an exception in the task

(I haven't included t.block(), because I think that should
be a stand-alone function that operates on the current task.
Telling some other task to block feels like a dodgy thing
to do.)

> One could also combine these and do e.g.
> a = spawn(foo_task())
> b = spawn(bar_task())
> <do more work locally>
> ra, rb = yield from par(a, b)

If you're happy to bail out at the first exception, you
wouldn't strictly need a par() function for this, you could
just do

    a = spawn(foo_task())
    b = spawn(bar_task())
    ra = yield from a.wait()
    rb = yield from b.wait()
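
A rough sketch of how spawn() and Task.wait() might fit together, under
stated assumptions: the scheduler is a plain round-robin loop, and wait()
polls by yielding until the task is done rather than truly blocking, which
a real implementation would want to avoid. All names below are
illustrative only.

```python
from collections import deque

ready = deque()  # tasks that can be stepped


class Task:
    """Wraps a generator and records how it finished."""

    def __init__(self, gen):
        self.gen = gen
        self.done = False
        self.result = None
        self.error = None

    def wait(self):
        # Simplification: poll, giving other tasks a turn each time.
        while not self.done:
            yield
        if self.error is not None:
            raise self.error  # propagate the task's exception
        return self.result


def spawn(gen):
    task = Task(gen)
    ready.append(task)
    return task


def run():
    while ready:
        task = ready.popleft()
        try:
            next(task.gen)
        except StopIteration as stop:
            task.done, task.result = True, stop.value
        except Exception as exc:
            task.done, task.error = True, exc
        else:
            ready.append(task)


def foo_task():
    yield
    return "foo"


def bar_task():
    yield
    yield
    return "bar"


def main():
    a = spawn(foo_task())
    b = spawn(bar_task())
    ra = yield from a.wait()
    rb = yield from b.wait()
    return ra, rb


main_task = spawn(main())
run()
```

Both subtasks make progress while main() is waiting on the first one, so
waiting on them one after the other does not serialize their work.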

> Have I got the spelling for spawn() right? In many other systems (e.g.
> threads, greenlets) this kind of operation takes a callable, not the
> result of calling a function (albeit a generator).

That's a result of the fact that a generator doesn't start
running as soon as you call it. If you don't like that, the
spawn() operation could be defined to take an uncalled generator
and make the call for you. But I think it's useful to make the
call yourself, because it gives you an opportunity to pass
parameters to the task.

> If it takes a
> generator, would it return the same generator or a different one to
> wait for?

In your version above where you wait for the task simply
by calling it with yield-from, spawn() would have to return a
generator (or something with the same interface). But it
couldn't be the same generator -- it would have to be a wrapper
that takes care of blocking until the subtask is finished.


From pjdelport at  Tue Oct 16 09:27:01 2012
From: pjdelport at (Piet Delport)
Date: Tue, 16 Oct 2012 09:27:01 +0200
Subject: [Python-ideas] Proposal: A simple protocol for generator tasks
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 15, 2012 at 11:17 AM, Greg Ewing
<greg.ewing at> wrote:
> Piet Delport wrote:
>> 2. Each value yielded as a "step" represents a scheduling instruction,
>>    or primitive, to be interpreted by the task's scheduler.
> I don't think this technique should be used to communicate
> with the scheduler, other than *maybe* for a *very* small
> set of operations that are truly primitive -- and even then
> I'm not convinced.

But this is by necessity how the scheduler is *already* being
communicated with, at least for the de facto scheduler instructions like
None, Future, and the other primitives being discussed.

This concept of an "intermediate object yielded by a task to its
scheduler on each step, instructing it how to schedule" is already
unavoidably fundamental to how these tasks / coroutines work: this
proposal is just an attempt to name that concept, and define it more

> To begin with, there are some operations that *can't* rely
> on yielded instructions as the only way of invoking them.
> Spawning a task, for example -- there must be some way for
> non-task code to invoke that, otherwise you wouldn't be able
> to get top-level tasks into the system.

I'm definitely not suggesting that this be the *only* way of invoking
operations, or that all operations should be invoked this way.

Certainly, everything that is possible inside this protocol will also be
possible outside of it by directly calling methods on some global
scheduler, but that requires knowing who and what that global scheduler

It's important to note that a globally identifiable scheduler object
might not even exist: it's entirely reasonable, for example, to
implement this entire protocol in Twisted by writing a deferTask(task)
helper that handles generic scheduler instructions (None, Future-alike,
and things like spawn() and sleep()) by just arranging for the
appropriate Twisted callbacks and resumptions to happen under the hood.

(This is basically how Twisted's deferredGenerator works currently: the
main difference is that a deferTask() implementation would be able to
run any generic coroutine / generator task code that uses this protocol,
without that code having to know about Twisted.)
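
As a rough illustration of the idea above, here is a minimal sketch of a
driver that interprets yielded values as scheduler instructions. It is not
Twisted's deferredGenerator or any real deferTask(): the names run_all,
producer and consumer are made up for this example, and only two
instructions are understood (None and a concurrent.futures.Future).

```python
from collections import deque
from concurrent.futures import Future


def run_all(*tasks):
    """Drive generator tasks to completion; return their results in order."""
    ready = deque((task, None) for task in tasks)   # (task, value to send)
    results = {}
    while ready:
        task, value = ready.popleft()
        try:
            instruction = task.send(value)
        except StopIteration as stop:
            results[task] = stop.value
            continue
        if instruction is None:
            # Bare "yield": let other tasks run, then continue.
            ready.append((task, None))
        elif isinstance(instruction, Future):
            # Resume the task with the result once the Future is done.
            # (Errors and cancellation are ignored in this sketch.)
            instruction.add_done_callback(
                lambda done, task=task: ready.append((task, done.result())))
        else:
            raise TypeError("unknown instruction: {!r}".format(instruction))
    return [results.get(task) for task in tasks]


def producer(future):
    yield                      # give the consumer a chance to start waiting
    future.set_result(21)


def consumer(future):
    value = yield future       # suspended until the producer fills it in
    return value * 2


shared = Future()
outcome = run_all(consumer(shared), producer(shared))
print(outcome)
```

The task code never names a scheduler; it only yields values, which is the
property the protocol is meant to preserve.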

Regarding getting top-level tasks into the system, this can be done in a
variety of ways, depending on how particular applications are
structured. For example, if the stdlib grows a standardized default
event loop:



    result =

or with existing frameworks like Twisted:


In other words, only the top level of an application should need to
worry about how the initial scheduler, tasks, and everything else are

> Also, consider the operation of unblocking a task that's
> waiting for some event to occur. Often you will want to
> invoke this using a callback from an event loop, which is
> not a generator and can't yield anything to anywhere.

This can be done with a scheduler primitive that obtains a callable to
resume the current task, like the strawman:

    resume = yield tasklib.get_resume()

from the other thread.

However the exact API ends up looking, suspending and resuming tasks are
very fundamental operations, and probably the most worth having as
standardized instructions that any scheduler can implement: a variety of
more powerful abstractions can be generically built on top of them.

> Given that these operations must provide a way of invoking
> them using a plain function call, there is little reason
> to provide a second way using a yielded instruction.

I don't see the former as an argument to avoid supporting the same
operations as standard yielded instructions.

A task can arrange to wait for a Future using plain function calls, or
by yielding it as an instruction (i.e., "result = yield some_future()"):
the ability to do the former should not make the latter any less

The advantage of treating certain primitives as yielded scheduler
instructions is that:

- It's generic and scheduler-agnostic: for example, any task can simply
  yield a Future to its scheduler without caring exactly how the
  scheduler arranges for add_done_callback() to resume the task.

- It requires no global coordination: every generator task already has a
  direct line of communication to its immediate scheduler, without
  having to identify itself using handles, task ids, or other

In other words, it's the difference between saying:

    h = get_current_task_handle()
    current_scheduler.sleep(h, 10)

and saying:

    yield tasklib.sleep(10)
    yield tasklib.suspend()

where sleep(n) and suspend() are simple generic objects that any
scheduler can recognize and implement, just like how yielded None and
Future values are recognized and implemented.
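
To make "simple generic objects that any scheduler can recognize" a bit
more concrete, here is a hedged sketch. The Sleep class and the run()
driver are hypothetical stand-ins for tasklib.sleep() and a scheduler, and
the scheduler uses a simulated clock rather than real time so the example
finishes instantly.

```python
import heapq
import itertools


class Sleep:
    """Hypothetical instruction object: 'resume me after `seconds`'."""

    def __init__(self, seconds):
        self.seconds = seconds


def run(*tasks):
    """Run tasks on a simulated clock; return the time the last one woke."""
    seq = itertools.count()                       # tie-breaker for the heap
    queue = [(0.0, next(seq), task) for task in tasks]
    heapq.heapify(queue)
    now = 0.0
    while queue:
        now, _, task = heapq.heappop(queue)       # advance the clock
        try:
            instruction = next(task)
        except StopIteration:
            continue
        if instruction is None:
            heapq.heappush(queue, (now, next(seq), task))
        elif isinstance(instruction, Sleep):
            wake_at = now + instruction.seconds
            heapq.heappush(queue, (wake_at, next(seq), task))
        else:
            raise TypeError("unknown instruction: {!r}".format(instruction))
    return now


log = []


def ticker(name, delay, count):
    for i in range(count):
        yield Sleep(delay)     # the task never mentions a scheduler or handle
        log.append((name, i))


finished_at = run(ticker("a", 1, 2), ticker("b", 1.5, 1))
print(log, finished_at)
```

A different scheduler (a Twisted-based one, say) could interpret the same
Sleep object with its own timer mechanism without the task changing.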

> In any case, I believe that the public interface for *any*
> scheduler operation should not be a yielded instruction,
> but either a plain function or something called using
> yield-from, for reasons I explained to Guido earlier.

In other words, limiting the allowable set of yielded scheduler
instructions to None, and doing everything else via a separate API?

This is possible, but it seems like an awful waste of the perfectly good
and dedicated communication channel that already exists between tasks
and their schedulers, in favor of something more complex and indirect.

There's certainly a motivation for global APIs too, as with the
discussion about getting standardized event loops and schedulers in the
stdlib, but I think that is solving a somewhat different problem, and I
see no reason to tie coroutines / generator tasks to those APIs
when a simpler, more generic and universal protocol could be defined.

To me, defining locally how a scheduler should behave and respond to
certain yielded types and values is a much more tractable problem than
the question of designing a good global scheduler API that exposes all
the same operations in a way that's portable and usable across many
different application architectures and lifecycles.

> There are problems with allowing multiple schedulers to
> coexist within the one system, especially if yielded
> instructions are the only way to communicate with them.
> It might work for instructions to a task's own scheduler
> concerning itself, but some operations need to operate on
> a *different* task, e.g. unblocking a task when the event
> it was waiting for occurs. How do you know which scheduler
> is managing it?

The point of a protocol like this is that there would be no need for
tasks to know which schedulers are managing what: they can limit
themselves to using a generic protocol.

For example, the par() implementation I gave assumes the primitive:

    resume = yield tasklib.get_resume()

to get a callable to resume itself, and can simply pass that callable to
the tasks it spawns: the last child to complete just calls resume() to
resume the parent task in its own scheduler.

In this example, the resume callable contains all the necessary state to
resume that particular task. A particular scheduler could implement this
primitive by sending back a closure like:

    lambda: current_scheduler.schedule(the_task)

In the case of something like deferTask(), there need not even be any
particular long-lived scheduler aside from the transient calls arranged
by deferTask, and all the state would live in the Twisted reactor and
its queues:

    lambda: reactor.callLater(0, _defertask_iterate, the_task)

As far as the generic protocol is concerned, it does not matter whether
there's a single global scheduler, or multiple schedulers, or no single
scheduler at all: the scheduler side of the protocol is free to be
implemented in many ways, and manage its state however it's convenient.
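
The get_resume() idea can be sketched as follows. Everything here is an
assumption made for illustration: GET_RESUME, SUSPEND and Spawn stand in
for the tasklib primitives, Scheduler is one possible implementation of
the scheduler side, and par() is a simplified version that does not
collect results. The sketch also ignores the case where resume() is called
before the parent has actually suspended.

```python
from collections import deque

GET_RESUME = object()   # hypothetical: "send me a callable that resumes me"
SUSPEND = object()      # hypothetical: "park me until that callable is used"


class Spawn:
    """Hypothetical instruction: 'start this task alongside me'."""

    def __init__(self, task):
        self.task = task


class Scheduler:
    def __init__(self):
        self.ready = deque()

    def schedule(self, task, value=None):
        self.ready.append((task, value))

    def run(self):
        while self.ready:
            task, value = self.ready.popleft()
            try:
                instruction = task.send(value)
            except StopIteration:
                continue
            if instruction is GET_RESUME:
                # The closure carries all the state needed to resume `task`.
                self.schedule(task, lambda task=task: self.schedule(task))
            elif instruction is SUSPEND:
                pass                      # parked until resume() is called
            elif isinstance(instruction, Spawn):
                self.schedule(instruction.task)
                self.schedule(task)
            elif instruction is None:
                self.schedule(task)
            else:
                raise TypeError("unknown instruction: {!r}".format(instruction))


def child(name, state, resume, log):
    yield                                 # pretend to do some work
    log.append(name)
    state["pending"] -= 1
    if state["pending"] == 0:
        resume()                          # last child wakes the parent


def par(names, log):
    resume = yield GET_RESUME
    state = {"pending": len(names)}
    for name in names:
        yield Spawn(child(name, state, resume, log))
    yield SUSPEND
    log.append("parent resumed")


log = []
scheduler = Scheduler()
scheduler.schedule(par(["a", "b"], log))
scheduler.run()
print(log)
```

Neither par() nor child() refers to the Scheduler instance; the children
only hold the opaque resume callable they were given.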

> And even if you can find out, if you have to control it using yielded
> instructions, you have no way of yielding something to a different
> task's scheduler.

Generally speaking, this should not be necessary: inter-task
communication is a different question to how tasks should communicate
with their immediate scheduler.

Generically controlling the scheduling of different tasks can be done in
many ways:

- The way par() passes its resume callable to its spawned children.

- Using synchronization primitives: for example, an alternative way to
  implement something like par() without direct use of suspend/resume is
  with a cooperative condition variable or semaphore.

- Using queues, channels, or similar mechanisms to communicate
  information between tasks. (The communicated values can implicitly
  even be scheduler instructions themselves, like a queue of Futures.)

If something cannot be done inside this generator task protocol, you can
of course still step outside of it and use other mechanisms directly,
but that necessarily ties your code to those mechanisms, which may not
be as simple and universal as code that only relies on this protocol.

From greg.ewing at  Tue Oct 16 09:45:35 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 16 Oct 2012 20:45:35 +1300
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

Calvin Spealman wrote:
> If we allow spawn(task())
> then we're not getting nice tracebacks anyway, so I think we should
> allow
>   future1 = yield task1() # spawn task
>   future2 = yield task2() # spawn other task

I don't think it's necessary to allow 'yield task' as a
method of spawning in order to get nice tracebacks for
spawned tasks.

In the Task-object-based system I'm thinking about, if
an exception reaches the top level of a Task, it gets
stored in the Task object until another task wait()s
for it, and then it continues to propagate.

This makes sense, because the wait() establishes a
task-subtask relationship, so the traceback should
proceed from the subtask to the waiting task.

> Both are primitives we
> need to support as first-class operation. That is, without some wrapper
> like spawn().

In my system, spawn() isn't a wrapper -- it *is* the
primitive way to create an independent task. And I
think it's the only one we need.


From ncoghlan at  Tue Oct 16 09:48:24 2012
From: ncoghlan at (Nick Coghlan)
Date: Tue, 16 Oct 2012 17:48:24 +1000
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 16, 2012 at 3:39 PM, Greg Ewing <greg.ewing at> wrote:
> In any case, even if we decide to provide a scheduler
> instruction to enable using for-loops on suspendable
> iterators somehow, it doesn't follow that we should use
> scheduler instructions for anything *else*.

The only additional operation needed is an async equivalent to the
concurrent.futures.wait() API, which would allow you to provide a set
of Futures and say "let me know when one of these operations is done".

As it turns out, this shouldn't *need* a new scheduler primitive in
Guido's design, since it can be handled by hooking up an appropriate
callback to the supplied future objects. The following code isn't tested,
but given my understanding of how Guido wants things to work, it
should do what I want:

    def _wait_first(futures):
        # futures must be a set, items will be removed as they complete
        signal = Future()
        def chain_result(completed):
            futures.remove(completed)
            if completed.cancelled():
                signal.cancel()
                signal.set_running_or_notify_cancel()
            elif completed.done():
                signal.set_result(completed.result())
            else:
                signal.set_exception(completed.exception())
        for f in futures:
            f.add_done_callback(chain_result)
        return signal

    def wait_first(futures):
        return _wait_first(set(futures))

    def as_completed(futures):
        remaining = set(futures)
        while 1:
            if not remaining:
                break
            yield _wait_first(remaining)

    @task
    def load_url_async(url):
        return url, (yield urllib.urlopen_async(url)).read()

    @task
    def example(urls):
        for get_next_page in as_completed(load_url_async(url) for url in urls):
            try:
                url, data = yield get_next_page
            except Exception as exc:
                print("Something broke: {}".format(exc))
            else:
                print("Loaded {} bytes from {!r}".format(len(data), url))

There's no scheduler instruction, there's just Guido's core API
concept: the only thing a tasklet is allowed to yield is a Future
object, and the step of registering tasks to be run is *always* done
via an explicit call to the event loop rather than via the "yield"
channel. The yield channel is only used to say "wait for this
operation now".

What this approach means is that, to get sensible iteration, all you
need is an ordinary iterator that produces future objects instead of
reporting the results directly. You can then either call this iterator
with "yield from", in which case the individual results will be
ignored and the first failure will abort the iteration, *or* you can
invoke it with an explicit for loop, which will be enough to give you
control over how exceptions are handled by means of an ordinary
try/except block rather than a complicated exception chain.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From solipsis at  Tue Oct 16 11:30:01 2012
From: solipsis at (Antoine Pitrou)
Date: Tue, 16 Oct 2012 11:30:01 +0200
Subject: [Python-ideas] The async API of the future: yield-from
References: <>
Message-ID: <>

On Mon, 15 Oct 2012 19:19:29 -0700
Guido van Rossum <guido at> wrote:
> Here you're basically arguing for greenlets/gevent -- you're saying
> you just don't want to put up with the yields everywhere. But the
> popularity of Twisted and Tornado shows that at least some people are
> willing to make even bigger sacrifices in order to be able to do async
> I/O efficiently -- i.e., to solve the C10K problem that Christian
> Tismer referred to (,

To be honest, one of the selling points of Twisted is not that it
solves the C10k problem, it's that it's a comprehensive network
programming toolkit. I'd bet many users of Twisted don't care that much
about the single-thread event-loop approach, and don't have C10k-like




From solipsis at  Tue Oct 16 11:43:15 2012
From: solipsis at (Antoine Pitrou)
Date: Tue, 16 Oct 2012 11:43:15 +0200
Subject: [Python-ideas] The async API of the future: yield-from
References: <>
Message-ID: <>

On Tue, 16 Oct 2012 17:48:24 +1000
Nick Coghlan <ncoghlan at> wrote:
>     def _wait_first(futures):
>         # futures must be a set, items will be removed as they complete
>         signal = Future()
>         def chain_result(completed):
>             futures.remove(completed)
>             if completed.cancelled():
>                 signal.cancel()
>                 signal.set_running_or_notify_cancel()
>             elif completed.done():
>                 signal.set_result(completed.result())
>             else:
>                 signal.set_exception(completed.exception())
>         for f in futures:
>             f.add_done_callback(chain_result)
>         return signal
>     def wait_first(futures):
>         return _wait_first(set(futures))
>     def as_completed(futures):
>         remaining = set(futures)
>         while 1:
>             if not remaining:
>                 break
>             yield _wait_first(remaining)
>     @task
>     def load_url_async(url)
>         return url, (yield urllib.urlopen_async(url)).read()
>     @task
>     def example(urls):
>         for get_next_page in as_completed(load_url_async(url) for url in urls):
>             try:
>                 url, data = yield get_next_page
>             except Exception as exc:
>                 print("Something broke: {}".format(exc))
>             else:
>                 print("Loaded {} bytes from {!r}".format(len(data), url))

Your example looks rather confusing to me. There are a couple of things
I don't understand:

- why does load_url_async return something instead of yielding it?

- how does overlapping of reads happen? you seem to consider that a
  read() will be non-blocking once the server starts responding to your
  request, which is only true if the response is small (or you have a
  very fast connection to the server).

- if read() is really non-blocking, why do you yield get_next_page? What
  does that achieve? Actually, what is yielding a tuple supposed to
  achieve at all?

- where is control transferred over to the scheduler? it seems it's
  only in get_next_page, while I would expect it to be transferred in
  as_completed as well.



From shane at  Tue Oct 16 14:04:33 2012
From: shane at (Shane Green)
Date: Tue, 16 Oct 2012 05:04:33 -0700
Subject: [Python-ideas] re-implementing Twisted for fun and profit
In-Reply-To: <>
References: <>
Message-ID: <>

You make an excellent point about the different levels being discussed.  Yes, you understand my point well.  For some reason I've always hated thinking of the promise as immutable, but that's the normal terminology.  The reality is that a Promise represents the output of an operation, and will emit the output of that operation to all callers that register with it.  The promise doesn't pass itself as the value to the callbacks, so its immutability is somewhat immaterial. I'm not arguing with you on that point, just the general description of the pattern.  The more I think about it, the more I'm realizing how inappropriate something like a deferred or promise is to this discussion.  

Unfortunately my knowledge of coroutines is somewhat limited, and my schedule over the last couple of days, and the next couple, is preventing me from giving it a good think through.  I understand them well enough to know they're cool, and I'm pretty sure I like the idea of making them the event loop mechanism.  I think it would be good for us all to continuously revisit concrete examples during the discussion, because the set of core I/O operations is small enough to revisit multiple times.  If a much more general mechanism naturally falls out then great.

Shane Green
805-452-9666 | shane at

On Oct 15, 2012, at 8:51 AM, Glyph <glyph at> wrote:

> On Oct 15, 2012, at 1:03 AM, Shane Green <shane at> wrote:
>> Namely, all callbacks registered with a given Promise instance, receive the output of the original operation
> This is somewhat tangential to the I/O loop discussion, and my hope for that discussion is that it won't involve Deferreds, or Futures, or Promises, or any other request/response callback management abstraction, because requests and responses are significantly higher level than accept() and recv() and do not belong within the same layer.  The event loop ought to provide tools to experiment with event-driven abstractions so that users can use Deferreds and Promises - which are, fundamentally, perfectly interoperable, and still use standard library network protocol implementations.
> What I think you were trying to say was that callback addition on Deferreds is a destructive operation; whereas your promises are (from the caller's perspective, at least) immutable.  Sometimes I do think that the visibly mutable nature of Deferreds was a mistake.  If I read you properly though, what you're saying is that you can do this:
> promise = ...
> promise.then(alpha).then(beta)
> promise.then(gamma).then(delta)
> and in yield-coroutine style this is effectively:
> value = yield promise
> beta(yield alpha(value))
> delta(yield gamma(value))
> This deficiency is reasonably easy to work around with Deferreds.  You can just do:
> def fork(d):
>     dprime = Deferred()
>     def propagate(result):
>         dprime.callback(result)
>         return result
>     d.addBoth(propagate)
>     return dprime
> and then:
> fork(x).addCallback(alpha).addCallback(beta)
> fork(x).addCallback(gamma).addCallback(delta)
> Perhaps this function should be in Twisted; it's certainly come up a few times.
> But, the fact that the original result is immediately forgotten can also be handy, because it helps the unused result get garbage collected faster, even if multiple things are hanging on to the Deferred after the initial result has been processed.  And it is actually pretty unusual to want to share the same result among multiple callers (which is why this function hasn't been added to the core yet).
> -glyph


From ncoghlan at  Tue Oct 16 14:08:15 2012
From: ncoghlan at (Nick Coghlan)
Date: Tue, 16 Oct 2012 22:08:15 +1000
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 16, 2012 at 7:43 PM, Antoine Pitrou <solipsis at> wrote:
> Your example looks rather confusing to me. There are a couple of things
> I don't understand:
> - why does load_url_async return something instead of yielding it?

It yields *and* returns, that's the way Guido's API works (as I understand it).

However, some of the other stuff was just plain mistakes in my example.

Background (again, as I understand it, and I'm sure Guido will correct
me if I'm wrong. So, if you think this sounds crazy, *please wait
until Guido clarifies* before worrying too much about it):

- the "@task" decorator is the part that knows how to interface
generators with the event loop (just as @contextmanager adapts between
generators and with statements). I believe it handles these things:
    - when you call it, it creates the generator object and calls
next() to advance it to the first yield point
    - this initial call returns a Future that will fire only when the
entire *task* is complete
    - if a Future is yielded by the underlying generator, the task
wrapper adds the appropriate callback to ensure results are pushed
back into the underlying generator on completion of the associated
    - when one of these callbacks fires, the generator is advanced and
a yielded Future is processed in the same fashion
    - if at any point the generator finishes instead of yielding
another Future, then the callback will call the appropriate
notification method on the originally *returned* Future
    - yielding anything other than a Future from a tasklet is not permitted
    - it's the IO operations themselves that know how to kick off
operations and register the appropriate callbacks with the event loop
to get the Future to be triggered
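
The behaviour described in the list above can be sketched roughly like
this. It is only an illustration, not Guido's actual decorator: it uses
concurrent.futures.Future as the Future type, the helper names step and
resume and the example task add_later are invented, and cancellation is
not handled.

```python
import functools
from concurrent.futures import Future


def task(func):
    """Wrap a generator function so that calling it returns a Future."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        gen = func(*args, **kwargs)
        outer = Future()              # fires when the whole task is complete

        def step(advance, value):
            try:
                yielded = advance(value)
            except StopIteration as stop:
                outer.set_result(stop.value)
                return
            except Exception as exc:
                outer.set_exception(exc)
                return
            if not isinstance(yielded, Future):
                gen.close()
                outer.set_exception(TypeError("tasks may only yield Futures"))
                return
            # Push the outcome back into the generator when it is ready.
            yielded.add_done_callback(resume)

        def resume(done):
            exc = done.exception()
            if exc is not None:
                step(gen.throw, exc)
            else:
                step(gen.send, done.result())

        step(gen.send, None)          # advance to the first yield point
        return outer

    return wrapper


@task
def add_later(pending, extra):
    value = yield pending             # wait for the supplied Future
    return value + extra


source = Future()
total = add_later(source, 1)          # returns immediately with a Future
was_pending = not total.done()
source.set_result(41)                 # completing the input resumes the task
print(was_pending, total.result())
```

The caller only ever sees the outer Future, so it can keep yielding it from
another task or fall back to calling result() on it directly.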

- The Future object API is documented in concurrent.futures:

I've now posted this example as a gist
(, so it should be easier to read
over there. However, I've included it inline below as well.

- This first part in my example is a helper function to wait for any
one of a set of Futures to be signalled and help keep track of which
ones we're still waiting for

    def _wait_first(futures):
        # futures must be a set as items will be removed as they complete
        # we create a signalling future to return to our caller. We will copy
        # the result of the first future to complete to this signalling future
        signal = Future()
        def copy_result(completed):
            # We ignore every callback after the first one
            if signal.done():
                return
            # Keep track of which ones have been processed across multiple calls
            futures.remove(completed)
            # It would be nice if we could also remove our callback from all
            # the other futures at this point, but the Future API doesn't
            # currently allow that
            # Now we pass the result of this future through to our
            # signalling future
            if completed.cancelled():
                signal.cancel()
                signal.set_running_or_notify_cancel()
            else:
                try:
                    result = completed.result()
                except Exception as exc:
                    signal.set_exception(exc)
                else:
                    signal.set_result(result)

        # Here we hook our signalling future up to all our actual operations
        # If any of them are already complete, then the callback will
        # fire immediately and we're OK with that
        for f in futures:
            f.add_done_callback(copy_result)
        # And, for our signalling future to be useful, the caller must
        # be able to access it
        return signal

- This is just a public version of the above helper that works with
arbitrary iterables:

    def wait_first(futures):
        # Helper needs a real set, so we give it one
        # Also makes sure all operations start immediately when passed
        # a generator
        return _wait_first(set(futures))

- This is the API I'm most interested in, as it's the async equivalent
of concurrent.futures.as_completed(), which powers this URL retrieval
example:

    # Note that this is an *ordinary iterator*, not a tasklet
    def as_completed(futures):
        # We ensure all the operations have started, and get ourselves
        # a set to work with
        remaining = set(futures)
        while remaining:
            # The trick here is that we *don't yield the original
            # futures directly*
            # Instead, we yield a signalling future that fires when the
            # first of the remaining futures completes
            yield _wait_first(remaining)

And now a more complete, heavily commented, version of the example:

# First, a tasklet for loading a single page
@task
def load_url_async(url):
    # The async URL open operation does three things:
    # 1. kicks off the connection process
    # 2. registers a callback with the event handler that will signal
    #    a Future object when IO is complete
    # 3. returns the future object
    # We then *yield* the Future object, at which point the task
    # decorator takes over and registers a callback with the *Future*
    # object to resume this generator with the *result* that was passed
    # to the Future object
    conn = yield urllib.urlopen_async(url)
    # We assume "conn.read()" is defined in such a way that it allows
    # both "read everything at once" usage *and* a usage where you read
    # the individual bits of data as they arrive like this:
    #     for wait_for_chunk in conn.read():
    #         chunk = yield wait_for_chunk
    # The secret is that conn.read() would be an *ordinary generator*
    # in that case rather than a tasklet.
    # You could also do a version that *only* supported complete
    # reads, in which case the "from" wouldn't be needed
    data = yield from conn.read()
    # We return both the URL *and* the data, so our caller doesn't
    # have to keep track of which url the data is for
    return url, data

# And now the payoff: defining a tasklet to read a bunch of URLs in
# parallel, processing them in the order of loading rather than the
# order of requesting them or having to wait until the slowest load
# completes before doing anything

@task
def example(urls):
    # We define the tasks we want to run based on the given URLs
    # This results in an iterable of Future objects that will fire when
    # the associated page has been read completely
    tasks = (load_url_async(url) for url in urls)
    # And now we use our helper iterable to run things in parallel
    # and get access to the results as they complete
    for wait_for_page in as_completed(tasks):
        try:
            url, data = yield wait_for_page
        except Exception as exc:
            print("Something broke for {!r} ({}: {})".format(url,
                  type(exc), exc))
        else:
            print("Loaded {} bytes from {!r}".format(len(data), url))

# The real kicker here? Replace "yield wait_for_page" with
# "wait_for_page.result()" and you have the equivalent
# concurrent.futures code.
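
For comparison, here is a small self-contained sketch of that blocking
variant using plain concurrent.futures.Future objects. It is a variation
on the helpers in this message rather than a copy: _wait_first iterates
over a snapshot of the set, because a callback on an already finished
future fires immediately and removes that future from the set. The
make_done helper and the fake "pages" are invented so the example runs
without any network access or event loop.

```python
from concurrent.futures import Future


def _wait_first(futures):
    """Return a Future mirroring whichever member of `futures` is done first."""
    signal = Future()

    def copy_result(completed):
        if signal.done():
            return                        # only the first completion counts
        futures.remove(completed)
        if completed.cancelled():
            signal.cancel()
            signal.set_running_or_notify_cancel()
            return
        exc = completed.exception()
        if exc is not None:
            signal.set_exception(exc)
        else:
            signal.set_result(completed.result())

    for f in list(futures):               # snapshot: callbacks mutate the set
        f.add_done_callback(copy_result)
    return signal


def as_completed(futures):
    remaining = set(futures)
    while remaining:
        yield _wait_first(remaining)


def make_done(value=None, error=None):
    """Hypothetical stand-in for an I/O operation that has already finished."""
    f = Future()
    if error is not None:
        f.set_exception(error)
    else:
        f.set_result(value)
    return f


pages = [
    make_done(("http://a.example", b"12")),
    make_done(error=RuntimeError("down")),
    make_done(("http://b.example", b"3456")),
]

loaded, failed = [], []
for wait_for_page in as_completed(pages):
    try:
        url, data = wait_for_page.result()    # blocking call instead of yield
    except Exception as exc:
        failed.append(str(exc))
    else:
        loaded.append((url, len(data)))

print(sorted(loaded), failed)
```

The order in which pages are reported depends on set iteration order here,
since every future is already complete when the loop starts.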


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From Steve.Dower at  Tue Oct 16 16:21:10 2012
From: Steve.Dower at (Steve Dower)
Date: Tue, 16 Oct 2012 14:21:10 +0000
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

> It yields *and* returns, that's the way Guido's API works (as I understand it).

I can't speak for Guido obviously, but you've certainly described what we came up with perfectly (, the _Awaiter class starts on line 93).

> # The real kicker here? Replace "yield wait_for_page" with
> "wait_for_page.result()" and you have the equivalent
> concurrent.futures code.

Basically, the task/tasklet/async decorator aggregates the futures from inside the wrapped method and exposes a single future to the caller. Your example doesn't even need a scheduler or event loop, and we found that all the event loop was really doing was running the callbacks in a certain thread/context/equivalent. And because there is a future coming out of every call, the user can choose when to stop using tasklets and go back to using plain old futures (or whatever subclasses have been used).

From solipsis at  Tue Oct 16 17:10:37 2012
From: solipsis at (Antoine Pitrou)
Date: Tue, 16 Oct 2012 17:10:37 +0200
Subject: [Python-ideas] The async API of the future: yield-from
References: <>
Message-ID: <>

On Tue, 16 Oct 2012 14:21:10 +0000
Steve Dower <Steve.Dower at> wrote:

> > It yields *and* returns, that's the way Guido's API works (as I understand it).
> I can't speak for Guido obviously, but you've certainly described what we came up with perfectly (, the _Awaiter class starts on line 93).
> > # The real kicker here? Replace "yield wait_for_page" with
> > "wait_for_page.result()" and you have the equivalent
> > concurrent.futures code.
> Basically, the task/tasklet/async decorator aggregates the futures from inside the wrapped method and exposes a single future to the caller. Your example doesn't even need a scheduler or event loop, and we found that all the event loop was really doing was running the callbacks in a certain thread/context/equivalent.

I'm sure doing concurrent I/O will require an event loop, unless you
use threads under the hood...




From ironfroggy at  Tue Oct 16 17:18:22 2012
From: ironfroggy at (Calvin Spealman)
Date: Tue, 16 Oct 2012 11:18:22 -0400
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 15, 2012 at 8:55 PM, Greg Ewing <greg.ewing at> wrote:
> Calvin Spealman wrote:
>> I'm still -1 on delegating control to subgenerators with yield-from,
>> versus having the scheduler just deal with them directly.
> Do you mean to *disallow* using yield-from for this, or just
> not to encourage it?
> I don't see how you *could* disallow it; there's no way for
> the scheduler to know whether one of the generators it's
> handling is delegating using yield-from.
> I also can't see any reason you would want to discourage it.
> Given that yield-from exists, it's an obvious thing to do.

I have since changed my position slightly. I don't want to disallow it, no.
I don't want to discourage, no. But, I do think *both* are useful.

I think "yield from" is the obvious way to "call" between tasks, but that
there are other cases when we want to spawn a task to begin without
blocking our task, and that "yield" should be used here. We should be
able to simply yield a task to tell the scheduler to start it,
possibly getting a Future in return which we can use to get the
eventual result. This may make it easier to do multiple sub-tasks
together. We might yield N tasks, and then "yield from wait(futures)"
to wait for them all to complete.
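
A minimal sketch of what that could look like, under assumptions: run() is
a toy scheduler written for this example, it treats a yielded generator as
"spawn this and give me a Future", and wait() is a hypothetical helper
generator. Errors are not propagated, and the scheduler would simply stop
if every task were blocked on a Future nobody completes.

```python
import types
from collections import deque
from concurrent.futures import Future


def run(main):
    """Toy scheduler: yielding a generator spawns it and returns a Future."""
    ready = deque([(main, None)])
    done = {main: Future()}               # task -> Future for its result
    while ready:
        task, value = ready.popleft()
        try:
            yielded = task.send(value)
        except StopIteration as stop:
            done[task].set_result(stop.value)
            continue
        if isinstance(yielded, types.GeneratorType):
            # "yield task()": start it and hand its Future straight back.
            done[yielded] = Future()
            ready.append((yielded, None))
            ready.append((task, done[yielded]))
        elif isinstance(yielded, Future):
            # "yield future": resume once the result is available.
            yielded.add_done_callback(
                lambda f, task=task: ready.append((task, f.result())))
        elif yielded is None:
            ready.append((task, None))
        else:
            raise TypeError("cannot handle {!r}".format(yielded))
    return done[main].result(timeout=0)


def wait(futures):
    """Hypothetical helper: collect the results of several Futures."""
    results = []
    for future in futures:
        results.append((yield future))
    return results


def square(n):
    yield                                 # pretend to wait for something
    return n * n


def main():
    future1 = yield square(3)             # spawn task
    future2 = yield square(4)             # spawn other task
    results = yield from wait([future1, future2])
    return results


outcome = run(main())
print(outcome)
```

Here "yield" starts independent work and "yield from" calls into a helper
that runs as part of the current task, which is the split proposed above.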

> --
> Greg

Read my blog! I depend on your acceptance of my opinion! I am interesting!
Follow me if you're into that sort of thing:

From ironfroggy at  Tue Oct 16 17:20:53 2012
From: ironfroggy at (Calvin Spealman)
Date: Tue, 16 Oct 2012 11:20:53 -0400
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 15, 2012 at 10:10 PM, Guido van Rossum <guido at> wrote:
> On Mon, Oct 15, 2012 at 1:14 PM, Greg Ewing <greg.ewing at> wrote:
>> Nick Coghlan wrote:
>>> The main primitive I personally want out of an async API is a
>>> task-based equivalent to concurrent.futures.as_completed() [1]. This
>>> is what I meant about iteration being a bit of a mess: the way the
>>> as_completed() works, the suspend/resume channel of the iterator
>>> protocol is being used to pass completed future objects back to the
>>> calling iterator. That means that channel *can't* be used to talk
>>> between the coroutine and the scheduler,
>> I had to read this a couple of times before I figured out
>> what you're talking about, but I get it now.
>> This is an instance of a general problem that was noticed
>> back when I was discussing my cofunctions idea: using
>> generator-based coroutines, it's not possible to have a
>> "suspendable iterator", because that would require "yield"
>> to have two conflicting meanings: "suspend this coroutine"
>> on one hand, and "provide a value to my caller" on the
>> other.
>> Unfortunately, I suspect that a truly elegant solution to this
>> problem will require yet another language addition -- something
>> like
>>    yield for item in subtask():
>>       ...
>> which would run a slightly different version of the iterator
>> protocol in which values to be yielded are wrapped somehow
>> (I haven't figured out all the details yet).
> I think I ran into a similar issue with NDB when defining iteration
> over an asynchronous query. My solution:
>   q = <some query specification>
>   it = q.iter()  # Fire off the query to the datastore
>   while (yield it.has_next_async()):  # Block until one result
>     emp =  # Get the result that was buffered on the iterator
>     print, emp.age  # Use it

Crazy Idea I Probably Don't Actually Want:

for yield emp in q:
  print, emp.age

Turns into something like:

_it = iter(q)
for _emp in _it:
    emp = yield _emp
    print, emp.age

> --
> --Guido van Rossum (


From ironfroggy at  Tue Oct 16 17:27:54 2012
From: ironfroggy at (Calvin Spealman)
Date: Tue, 16 Oct 2012 11:27:54 -0400
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 16, 2012 at 3:45 AM, Greg Ewing <greg.ewing at> wrote:
> Calvin Spealman wrote:
>> If we allow spawn(task())
>> then we're not getting nice tracebacks anyway, so I think we should
>> allow
>>   future1 = yield task1() # spawn task
>>   future2 = yield task2() # spawn other task
> I don't think it's necessary to allow 'yield task' as a
> method of spawning in order to get nice tracebacks for
> spawned tasks.

Necessary, no. But I think it feels obvious that you yield things you are
waiting on, and so you want to start a task if you yield it. Also, it's
going to be a common primitive, so I think it should be very easy and
clear to write.

> In the Task-object-based system I'm thinking about, if
> an exception reaches the top level of a Task, it gets
> stored in the Task object until another task wait()s
> for it, and then it continues to propagate.
> This makes sense, because the wait() establishes a
> task-subtask relationship, so the traceback should
> proceed from the subtask to the waiting task.

What if two tasks call wait() on the same subtask which raises an
error? I think we should let errors propagate through yield-from,
primarily. That's what it exists for.
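
To make the two-waiters question concrete, here is a minimal sketch of a Task that stores its outcome until someone waits on it. The Task class and its step() and wait() methods are assumptions for illustration, not Greg's implementation. In this sketch every waiter sees the same stored exception, which is only one possible answer:

```python
class Task:
    """Hypothetical task wrapper that stores its outcome for later waiters."""
    def __init__(self, gen):
        self.gen = gen
        self.done = False
        self.result = None
        self.exception = None

    def step(self):
        """Advance the wrapped generator once, capturing its outcome."""
        try:
            next(self.gen)
        except StopIteration as stop:
            self.done, self.result = True, stop.value
        except Exception as exc:
            self.done, self.exception = True, exc

    def wait(self):
        """Generator: suspend until the task is done, then return or re-raise."""
        while not self.done:
            yield
        if self.exception is not None:
            raise self.exception
        return self.result


def failing():
    yield
    raise ValueError("boom")


def waiter(task, log):
    try:
        yield from task.wait()
    except ValueError as exc:
        log.append(str(exc))


log = []
sub = Task(failing())
tasks = [sub, Task(waiter(sub, log)), Task(waiter(sub, log))]
while not all(t.done for t in tasks):   # naive round-robin scheduler
    for t in tasks:
        if not t.done:
            t.step()
```

Both waiters end up handling the error, so the subtask's traceback gets attached to two unrelated callers.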

>> Both are primitives we
>> need to support as first-class operation. That is, without some wrapper
>> like spawn().
> In my system, spawn() isn't a wrapper -- it *is* the
> primitive way to create an independent task. And I
> think it's the only one we need.

It has to know what scheduler to talk to, right? We might want to
allow multiple schedulers, and tasks shouldn't know who their
scheduler is (right?) so that is another advantage of "yield task()"
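
A small sketch of that point, assuming a toy round-robin scheduler: the task only yields a generator, and whichever scheduler happens to be driving it does the spawning. Scheduler, child() and parent() are hypothetical names:

```python
import types
from collections import deque


class Scheduler:
    """Hypothetical scheduler that treats a yielded generator as 'spawn this'."""
    def __init__(self):
        self.ready = deque()

    def spawn(self, gen):
        self.ready.append((gen, None))
        return gen

    def run(self):
        while self.ready:
            gen, value = self.ready.popleft()
            try:
                yielded = gen.send(value)
            except StopIteration:
                continue
            if isinstance(yielded, types.GeneratorType):
                # The task never named a scheduler; we start the child here.
                self.spawn(yielded)
            self.ready.append((gen, yielded))


log = []

def child(name):
    log.append(name + " start")
    yield
    log.append(name + " end")

def parent():
    yield child("a")   # spawn, without knowing who schedules it
    yield child("b")
    log.append("parent done")

sched = Scheduler()
sched.spawn(parent())
sched.run()
```

The same parent() would run unchanged under a second Scheduler instance, which is the advantage being claimed over a spawn() that must locate its scheduler.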

> --
> Greg

From Steve.Dower at  Tue Oct 16 18:31:55 2012
From: Steve.Dower at (Steve Dower)
Date: Tue, 16 Oct 2012 16:31:55 +0000
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

> I'm sure doing concurrent I/O will require an event loop, unless you use threads under the hood...

Polling I/O will require some sort of loop, yes, but I/O that triggers a callback at the OS level (such as ReadFileEx and WriteFileEx on Windows) doesn't need it.

Of course, without an event loop you still need to wait on the future - for polling I/O you could return a subclassed future where waiting starts the polling loop if there isn't a better event loop available.

My view is that the most helpful thing to have in the standard is a way for any code to find and interact with an event loop - if we can discover a scheduler/context/loop/whatever and use its commands for "run this callable as soon as you can" and "run this callable when this condition is true" then we can have portable support for polling or event-based I/O (as well as being able to handle other thread-sensitive code such as in UIs).

For optimal support, you'll need to have very close coupling between the scheduler and the asynchronous operations. This can be built on top of the portable support, but aiming for optimal support initially is a good way to make this API painful to use and more likely to be ignored.
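
As a rough illustration of the portable layer described above, here is a sketch of a discoverable context offering the two commands. The names Context, call_soon, call_when, run_once and get_context are all assumptions invented for this example, not a proposed API:

```python
import threading
from collections import deque

_local = threading.local()


class Context:
    """Hypothetical minimal scheduler interface with two commands."""
    def __init__(self):
        self._soon = deque()
        self._waiting = []

    def call_soon(self, fn):
        """Run this callable as soon as you can."""
        self._soon.append(fn)

    def call_when(self, condition, fn):
        """Run this callable when condition() is true."""
        self._waiting.append((condition, fn))

    def run_once(self):
        """Run what is queued now, then any callable whose condition holds."""
        for _ in range(len(self._soon)):
            self._soon.popleft()()
        waiting, self._waiting = self._waiting, []
        for condition, fn in waiting:
            if condition():
                fn()
            else:
                self._waiting.append((condition, fn))


def get_context():
    """Return the context for the current thread, creating one if needed."""
    ctx = getattr(_local, "context", None)
    if ctx is None:
        ctx = _local.context = Context()
    return ctx


ctx = get_context()
log, ready = [], []
ctx.call_soon(lambda: log.append("soon"))
ctx.call_when(lambda: bool(ready), lambda: log.append("ready"))
ctx.run_once()
first_pass = list(log)
ready.append(1)
ctx.run_once()
```

Polling I/O fits the call_when() shape, and callback-based I/O fits call_soon(); neither caller needs to know which loop sits behind the context.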

From solipsis at  Tue Oct 16 18:39:59 2012
From: solipsis at (Antoine Pitrou)
Date: Tue, 16 Oct 2012 18:39:59 +0200
Subject: [Python-ideas] The async API of the future: yield-from
References: <>
Message-ID: <>

On Tue, 16 Oct 2012 16:31:55 +0000
Steve Dower <Steve.Dower at> wrote:
> > I'm sure doing concurrent I/O will require an event loop, unless you use threads under the hood...
> Polling I/O will require some sort of loop, yes, but I/O that triggers a callback at the OS level (such as ReadFileEx and WriteFileEx on Windows) doesn't need it.

Well, how do you plan for that callback to execute Python code?



From ironfroggy at  Tue Oct 16 18:54:37 2012
From: ironfroggy at (Calvin Spealman)
Date: Tue, 16 Oct 2012 12:54:37 -0400
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 16, 2012 at 12:31 PM, Steve Dower <Steve.Dower at> wrote:
>> I'm sure doing concurrent I/O will require an event loop, unless you use threads under the hood...
> Polling I/O will require some sort of loop, yes, but I/O that triggers a callback at the OS level (such as ReadFileEx and WriteFileEx on Windows) doesn't need it.
> Of course, without an event loop you still need to wait on the future - for polling I/O you could return a subclassed future where waiting starts the polling loop if there isn't a better event loop available.

What if the event poll was just inside a task, not requiring any loop
in the scheduler, or even knowledge by the scheduler, in any way?

An extremely rudimentary version:

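from itertools import chain
from select import select

class Selector(object):
    def __init__(self):
        self.r = []  # fds awaiting readability
        self.w = []  # fds awaiting writability
        self.x = []  # fds awaiting an exceptional condition
        self.futures = {}
    def add(self, t, fd, future):
        # t is 'r', 'w' or 'x'; fd maps to the future to complete
        self.futures[fd] = future
        getattr(self, t).append(fd)
    def __iter__(self): return self
    def __next__(self):
        # The lists hold bare fds, so they can go to select() as they are.
        r, w, x = select(self.r, self.w, self.x)
        for fd in chain(r, w, x):
            # The loop body is missing from the original post; assumed here
            # to complete the future registered for fd.
            future = self.futures.pop(fd, None)
            if future is not None:
                future.set_result(fd)
        for fd in r: self.r.remove(fd)
        for fd in w: self.w.remove(fd)
        for fd in x: self.x.remove(fd)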

This would handle polling completely outside the scheduler, which need
not even know about it, and that makes it easier to mix and match the
event loops you need to use in a single project.

I know I probably got details wrong.

> My view is that the most helpful thing to have in the standard is a way for any code to find and interact with an event loop - if we can discover a scheduler/context/loop/whatever and use its commands for "run this callable as soon as you can" and "run this callable when this condition is true" then we can have portable support for polling or event-based I/O (as well as being able to handle other thread-sensitive code such as in UIs).
> For optimal support, you'll need to have very close coupling between the scheduler and the asynchronous operations. This can be built on top of the portable support, but aiming for optimal support initially is a good way to make this API painful to use and more likely to be ignored.

From Steve.Dower at  Tue Oct 16 19:04:34 2012
From: Steve.Dower at (Steve Dower)
Date: Tue, 16 Oct 2012 17:04:34 +0000
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

> Well, how do you plan for that callback to execute Python code?

IMO, that is the most important question in all of this discussion. 

With any I/O some waiting is required - there must be a point where the application is not doing anything other than waiting for the I/O to complete, regardless of whether a loop is used or not. (Ideally the I/O is already complete by the time we start waiting.) The callbacks in the particular examples require a thread to be in an alertable wait state, which is basically equivalent to select(), though a little less discriminatory (as in, ANY I/O callback can interrupt an alertable wait).

In my view, these callbacks should be 'leaving a message' for the main program to run a particular function when it next has a chance. Like an interrupt handler, the aim is to do the minimum amount of work and then get out of the way.

Having a context (or event loop, message loop or whatever you want to call it) as I described in my last email lets us do the minimum amount of work. I posted our implementation of such a context earlier and Dino posted an example/recipe for using the concept with an existing event loop (Tcl).

So while I said we don't _need_ an event loop, that relies on the asynchronous operations being on a separate thread or otherwise not requiring the current thread to pay any attention to them, AND assumes that the continuations are agile and can be run on any thread (or in any process, or whatever granularity you are working at). I believe some way of getting code running back where it started from is essential, and this is most easily done with a loop.
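
A minimal sketch of the 'leaving a message' idea, assuming a plain thread stands in for the OS-level completion callback. The names messages, on_io_complete, handle and run_pending are hypothetical:

```python
import queue
import threading

messages = queue.Queue()   # thread-safe mailbox for the main program
handled = []


def handle(data):
    """The real work; runs on whichever thread drains the queue."""
    handled.append((data, threading.current_thread().name))


def on_io_complete(data):
    """Runs on the I/O thread: do the minimum, then get out of the way."""
    messages.put(lambda: handle(data))


def run_pending():
    """Called by the main program when it next has a chance."""
    while True:
        try:
            job = messages.get_nowait()
        except queue.Empty:
            return
        job()


io_thread = threading.Thread(target=on_io_complete, args=(b"payload",),
                             name="io-thread")
io_thread.start()
io_thread.join()
run_pending()
```

The callback only enqueues; handle() runs back on the thread that called run_pending(), which is the 'getting code running back where it started from' part.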

From ironfroggy at  Tue Oct 16 19:25:31 2012
From: ironfroggy at (Calvin Spealman)
Date: Tue, 16 Oct 2012 13:25:31 -0400
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 16, 2012 at 1:04 PM, Steve Dower <Steve.Dower at> wrote:
>> Well, how do you plan for that callback to execute Python code?
> IMO, that is the most important question in all of this discussion.
> With any I/O some waiting is required - there must be a point where the application is not doing anything other than waiting for the I/O to complete, regardless of whether a loop is used or not. (Ideally the I/O is already complete by the time we start waiting.) The callbacks in the particular examples require a thread to be in an alertable wait state, which is basically equivalent to select(), though a little less discriminatory (as in, ANY I/O callback can interrupt an alertable wait).
> In my view, these callbacks should be 'leaving a message' for the main program to run a particular function when it next has a chance. Like an interrupt handler, the aim is to do the minimum amount of work and then get out of the way.

I like this model as well.

However, I recognize some problems with it. If we don't kick whatever
handles the callback and result immediately, we are essentially
re-introducing pre-emptive scheduling. If TaskA is waiting on the
result of TaskB, and when TaskB finishes we say "OK, but we need to go
let TaskC do something before TaskA is given that result" then we
leave room for C to break things, modify state, and generally act in a
less-than-determinable way.

I really *like* this model better, I just don't know the best way to
reconcile this problem.

> Having a context (or event loop, message loop or whatever you want to call it) as I described in my last email lets us do the minimum amount of work. I posted our implementation of such a context earlier and Dino posted an example/recipe for using the concept with an existing event loop (Tcl).
> So while I said we don't _need_ an event loop, that relies on the asynchronous operations being on a separate thread or otherwise not requiring the current thread to pay any attention to them, AND assumes that the continuations are agile and can be run on any thread (or in any process, or whatever granularity you are working at). I believe some way of getting code running back where it started from is essential, and this is most easily done with a loop.

From Steve.Dower at  Tue Oct 16 19:17:34 2012
From: Steve.Dower at (Steve Dower)
Date: Tue, 16 Oct 2012 17:17:34 +0000
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

> What if the event poll was just inside a task, not requiring any loop in the scheduler, or even knowledge by the scheduler, in any way?

I agree, every task can handle all the asynchrony within it and just expose a single 'completed' notification (a Future or similar) to its caller. This is the portable solution - it is going to be less than optimal in some cases, but is much more composable and extensible. As a Python developer, I like the model of "I call this function normally and it gives me a Future to let me know when it's done but I don't really know how it's doing it." (Incidentally, I like it as a C# and C++ developer too.) 

From glyph at  Tue Oct 16 20:15:06 2012
From: glyph at (Glyph)
Date: Tue, 16 Oct 2012 11:15:06 -0700
Subject: [Python-ideas] Expressiveness of coroutines versus Deferred
	callbacks (or possibly promises, futures)
In-Reply-To: <>
References: <>
Message-ID: <>

On Oct 15, 2012, at 6:51 PM, Guido van Rossum <guido at> wrote:

> (...) But seriously, thanks for repeating the explanation for my benefit.

Glad it was useful.  To be fair, I think this is the first time I've actually written the whole thing down.  And I didn't even get the whole thing down, I missed the following important bit:

> I see your example as a perfect motivation for adding some kind of map() primitive. (...)

You're correct, of course; technically, a map() primitive resolves all the same issues.  It's possible to do everything with generator coroutines that it's possible to do with callbacks explicitly; I shouldn't have made the case for sequencing callbacks on the basis that the behavior can't be replicated.  And, modulo any of my other suggestions, a "map" primitive is a good idea - Twisted implements such a primitive with 'gatherResults' (although, of course, it works on any Deferred, not just those returned by inlineCallbacks).

The real problem with generator coroutines is that if you make them the primitive, you have an abstraction inversion if you want to have callbacks (which, IMHO, are simply more straightforward in many cases).

By using a generator scheduler, you're still using callbacks to implement the sequencing.  At some point, you have to have some code calling, x.send(...), x.close(), and raising StopIteration(), but they are obscured by syntactic sugar.  You still need a low-level callback-scheduling API to integrate with the heart of the event loop.
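
To illustrate, here is a sketch of a generator driver built on nothing but callbacks. Eventual is a hypothetical stand-in for a Deferred or Future, and drive() is not inlineCallbacks; error handling (gen.throw) is left out to keep it short:

```python
class Eventual:
    """Hypothetical eventual-result object: holds callbacks until fired."""
    def __init__(self):
        self._callbacks = []
        self._fired = False
        self._value = None

    def add_callback(self, fn):
        if self._fired:
            fn(self._value)
        else:
            self._callbacks.append(fn)

    def fire(self, value):
        self._fired, self._value = True, value
        callbacks, self._callbacks = self._callbacks, []
        for fn in callbacks:
            fn(value)


def drive(gen):
    """Run a generator coroutine; return an Eventual for its final result."""
    done = Eventual()

    def step(value):
        try:
            waiting_on = gen.send(value)
        except StopIteration as stop:
            done.fire(stop.value)
        else:
            # Under the sugar, sequencing is still a plain callback.
            waiting_on.add_callback(step)

    step(None)
    return done


a, b = Eventual(), Eventual()

def add():
    x = yield a
    y = yield b
    return x + y

results = []
drive(add()).add_callback(results.append)
a.fire(1)
after_first = list(results)
b.fire(2)
```

Every step of add() costs a send(), a callback registration and, at the end, a StopIteration, on top of whatever the callback API itself costs.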

One area where this abstraction inversion bites you is performance.  Now, my experience might be dated here; I haven't measured in a few years, but as nice as generators can be for structuring complex event flows, that abstraction comes with a non-trivial performance cost.  Exceptions in Python are much better than they used to be, but in CPython they're still not free.  Every return value being replaced with a callback trampoline is bad, but replacing it instead with a generator being advanced, an exception being raised and a callback trampoline is worse.

Of course, maybe inlineCallbacks is just badly implemented, but reviewing the implementation now it looks reasonably minimal.

I don't want to raise the specter of premature optimization here; I'm not claiming that the implementation of the scheduler needs to be squeezed for every ounce of performance before anyone implements anything.  But, by building in the requirement for these unnecessary gyrations to support syntax sugar for every request/response event-driven operation, one precludes the possibility of low-level optimizations for performance-sensitive event coordination later.

Now, if a PyPy developer wants to chime in and tell me I'm full of crap, and either now or in the future StopIteration Exceptions will be free, and will actually send your CPU back in time, as well as giving a pet kitten as a present to a unicorn every time you 'raise', I'll probably believe it and happily retire this argument forever.  But I doubt it.

I'll also grant that it's possible that I'm just the equivalent of a crotchety old assembler programmer here, claiming that we can't afford these fancy automatic register allocators and indirect function calls and run-time linking because they'll never be fast enough for real programs.  But I will note that, rarely as you need it, assembler does still exist at some layer of the C compiler stack, and you can write it yourself if you really want to; nothing will get in your way.

So that's mainly the point I'm trying to make about a Deferred-like abstraction.  Aside from matters of taste and performance, you need to implement your generator coroutines in terms of something, and it might as well be something clean and documented that can be used by people who feel they need it.  This will also help if some future version of Python modifies something about the way that generators work, similar to the way .send() opened the door for non-ugly coroutines in the first place.  Perhaps some optimized version of 'return' with a value?  If the coroutine scheduler is firmly in terms of some other eventual-result API (Deferreds, Futures, Promises), then adding support to that scheduler for @yield_coroutine_v2 should be easy; as would adding support for other things I don't like, like tasklets and greenlets ;).

> It also handles the input arriving in batches (as they do for App Engine Datastore queries). (...)
>> ... I think I've mentioned <> already in one of my previous posts. ...
> NDB's map() does this.

I'm curious as to how this works.  If you are getting a Future callback, don't you only get that once?  How do you re-sequence all of your generators to run the same step again when more data is available?

> In general, whenever you want parallelism in Python, you have to introduce a new function, unless you happen to have a suitable function lying around already;

I'm glad we agree there, at least :).

> so I don't feel I am contradicting myself by proposing a mechanism using callbacks here. It's the callbacks for sequencing that I dislike.

Earlier I was talking about implementing event sequencing as callbacks, which you kind of have to do either way.  Separately, there's the issue of presenting event sequencing as control flow.  While this is definitely useful for high-level applications - at my day job, about half the code I write is decorated with @inlineCallbacks - these high-level applications depend on a huge amount of low-level code (protocol parsers, database bindings, thread pools) being written and exhaustively tested, whose edge cases are much easier to exhaustively flesh out with explicit callbacks.  When you need to test a portion of the control flow, there's no need to fool a generator into executing down to a specific branch point; you just pull out the callback to a top-level name rather than a closure and call it directly.
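
For example, a callback kept at the top level can be exercised with no reactor, generator or event source at all. parse_header is a made-up callback for illustration:

```python
def parse_header(line):
    """Callback: turn b'Name: value' into a (name, value) pair."""
    name, sep, value = line.partition(b":")
    if not sep or not name.strip():
        raise ValueError("malformed header: %r" % (line,))
    return name.strip().lower(), value.strip()


# Edge cases are tested by calling the callback directly.
parsed = parse_header(b"Host:  example.org ")
try:
    parse_header(b"no separator here")
except ValueError:
    rejected = True
else:
    rejected = False
```

Reaching the same branch inside a generator would mean feeding it enough fake results to walk it down to that yield first.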

Also, correct usage of generator coroutines depends on a previous understanding of event-driven programming.  This is why Twisted core maintainers are not particularly sanguine about inlineCallbacks and generally consider it a power-tool for advanced users rather than an introductory facility to make things easier.

In our collective experience helping people understand both Deferreds and inlineCallbacks, there are different paths to enlightenment.

When learning Deferreds, someone with no previous event-driven experience will initially be disgusted; why does their code have to look like such a mess?  Then they'll come to terms with the problem being solved and accept it, but move on to being perplexed: what the heck are these Deferreds doing, anyway?  Eventually they start to understand what's happening and move on to depending on the reactor too much, and are somewhat baffled by callbacks never being called.  Finally they realize they should start testing their code by firing Deferreds synchronously and inspecting results, and everything starts to come together.

Keep in mind, as you read the following, that I probably couldn't do my job as effectively without inlineCallbacks and I am probably its biggest fan on the Twisted team, also :).

When learning with inlineCallbacks, someone with no previous event-driven experience will usually be excited.  The 'yield's are weird, but almost exciting - it makes the code feel more advanced somehow, and they sort of understand the concurrency implications, but not really.  It's also convenient!  They just sprinkle in a 'yield' any time they need to make a call that looks like maybe it'll block sometimes.

Everything works okay for a while, and then (inevitably, it seems) they happen across some ordering bug that they just absolutely cannot figure out, which causes state corruption (because they blithely stuck a 'yield' between two things that really needed to be in an effective critical section) or hangs (generators hanging around waiting on un-fired Deferreds so you don't even get the traceback out of GC closing them because something's keeping a reference to them; harder to debug even than "normal" unfired Deferreds because they're not familiar with how to inspect or trace the flow of event execution, since the code looked "normal").

Now, this is easier to back out of than a massive multithreaded (read) mess, because the code does at least have a finite number of visible task-switch points, and it's usually possible to track it down with some help.  But the experience is not pleasant, because by this point there are usually 10-deep call-stacks of generator-calling-a-generator-calling-a-generator and, especially in the problematic cases, it's not clear what got started from where.

inlineCallbacks is a great boon to promoting Twisted usage, because some people never make it out of the "everything works okay for a while" phase, and it's much easier to get started.  We certainly support it as best we can - optimize it, add debugging information to it - because we want people to have the best experience possible.  So it's not like it's unmaintained or anything.

But, without Deferreds to fall back down to in order to break down sequencing into super explicit, individual steps, without any potentially misleading syntactic sugar, I don't know how we'd help these folks.

I have a few thoughts on how our experiences have differed here, since I'm assuming you don't hear these sorts of complaints about NDB.

One is that Twisted users are typically dealing with a truly bewildering diversity of events, whereas NDB is, as you said, mostly a database client.  It's not entirely unusual for a Twisted application to be processing events from a serial port, a USB device, some PTYs, a couple of server connections, some timed events, some threads (usually database connections) and some HTTP client connections.

Another is that we only hear from users with problems.  Maybe there are millions of successful users of inlineCallbacks who have architected everything from tiny scripts to massive distributed systems without ever needing to say so much as a how-do-you-do to the Twisted mailing list or IRC channel.  (Somehow I doubt this is completely accurate but maybe it accounts for some of our perspective.)

Nevertheless I feel like the strategy of backing out a generator into lower-level discrete callback-sequenced operations is a very important tool in the debugging toolbox.

> Or maybe map_async()'s Future's result should be a set?

Well really it ought to be a dataflow of some kind so you can enumerate it as it's going :).  But I think if the results arrive in some order you ought to be able to see that order in application code, even if you usually don't care.

>> I don't want to belabor this point, but it bugs me a little bit that we get so much feedback from the broader Python community along the lines of "Why doesn't Twisted do X?
> I don't think I quite said that.

Sorry, I didn't mean to say that you did.  I raised the point because people who do say things like that tend to cite your opinions that e.g. Monocle is something new and different as reasons why they thought that Twisted didn't do what it did.  (I certainly sympathize with the pressure that comes along with everyone scrutinizing every word one says and trying to discover hidden meaning; I'm sure that in a message as long as this one, someone will draw at least five wrong conclusions from me, too.)

> But I suspect it happens because Twisted is hard to get into.

Part of it's a marketing issue.  Like, if we just converted all of our examples to inlineCallbacks and let people trip over the problems we've seen later on, I'm sure we would get more adoption, and possibly not even a backlash later; people with bugs in their programs tend to think that there's a bug in their programs.  They only blame the tools when the programs are hard to write in the first place.

Part of it is a background issue.  GUI programmers and people who have worked with multiplayer games instantly recognize what Deferreds are for and are usually up and running within minutes.  People primarily with experience with databases and web servers - a pretty big audience, in this day and age - are usually mystified.

But, there are intractable parts of it, too.  The Twisted culture is all about extreme reliability and getting a good reputation for systems built using it, and I guess we've made some compromises about expanding our audience in service of that goal.

> I suspect anything using higher-order functions this much has that problem; I feel this way about Haskell's Monads.

I've heard several people who do know Haskell say things like "Deferreds are just a trivial linearization of the I/O eigenfunctor over the monadic category of callbacks" and it does worry me.  I still think they're relatively straightforward - I invented them in one afternoon when I was about 20 and they have changed relatively little since then - but if they're actually a homomorphism of the lambda calculus over the event manifold as it approaches the monad limit (or whatever: does anyone else feel like Haskell people have great ideas, but they have sworn a solemn vow to only describe them in a language that can only be translated by using undiscovered stone tablets buried on the dark side of the moon?) then I can understand why some users have a hard time.

> I wouldn't be surprised if many Twisted lovers are also closet (or not) Haskell lovers.

There are definitely some appealing concepts there.  Their 'async' package, for example, does everything in the completely wrong, naive but apparently straightforward way that Java originally did (asynchronous exceptions? communication via shared mutable state? everything's a thread? no event-driven I/O?) but I/O is so limited and the VM is so high tech that it might actually be able to work.  I suppose I can best summarize my feelings as <>.

Anyway, back on topic...

> I don't feel very strongly about integrating GUI systems. IIRC Twisted has some way to integrate with certain GUI event loops. I don't think we should desire any more (but neither, less).

Yeah, all we do is dispatch Twisted events from the GUI's loop, usually using the GUI's built-in support for sockets.  So your GUI app runs as a normal app.  You can, of course, return a Deferred from a function that prompts the user for input, and fire it from a GUI callback, and that'll all work fine: Deferreds don't actually depend on the reactor at all, so you can use them from any callback (they are only in the 'internet' package where the event loop goes for unfortunate historical reasons).

From matthias at  Tue Oct 16 20:40:07 2012
From: matthias at (Matthias Urlichs)
Date: Tue, 16 Oct 2012 18:40:07 +0000 (UTC)
Subject: [Python-ideas] asyncore: included batteries don't fit
References: <>
Message-ID: <>

I'll have to put in my 0.02 here.

Guido van Rossum <guido at ...> writes:

> (2) We're at a fork in the road here. On the one hand, we could choose
> to deeply integrate greenlets/gevents into the standard library.


I have two and a half reasons for this.

(½) Ultimately I think that switching stacks around is always going to be faster
than unwinding and re-winding things with yield().

(1) It's a whole lot easier to debug a problem with gevent than with anything
which uses yield / Deferreds / asyncore / whatever. With gevent, you get a
standard stack trace. With anything else, the "where did this call come from"
information is not part of the call chain and thus is either unavailable, or
will have to be carried around preemptively (with associated overhead).

(2) Nothing against Twisted or any other async frameworks, but writing any
nontrivial program in it requires warping my brain into something that's *not*
second nature in Python, and never going to be.

Python is not Javascript; if you want to use the "loads of callbacks"
programming style, use node.js.

Personal experience: I have written an interpreter for an asynchronous and
vaguely Pythonic language which I use for home automation, my lawn sprinklers,
and related stuff (which I should probably release in some form). The code was
previously based on Twisted and was impossible to debug. It now uses gevent and
Just Works.

-- Matthias Urlichs

From guido at  Tue Oct 16 21:58:18 2012
From: guido at (Guido van Rossum)
Date: Tue, 16 Oct 2012 12:58:18 -0700
Subject: [Python-ideas] Expressiveness of coroutines versus Deferred
 callbacks (or possibly promises, futures)
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 16, 2012 at 11:15 AM, Glyph <glyph at> wrote:

It'll be days before I digest all of that. But thank you very much for
writing it all up. You bring up all sorts of interesting issues. I
think I would like to start discovering some of the issues by writing
an extensive prototype using Greg Ewing's model -- it is the most
radical but therefore most worthy of some serious prototyping before
either adopting or rejecting it.

--Guido van Rossum (

From guido at  Tue Oct 16 22:07:12 2012
From: guido at (Guido van Rossum)
Date: Tue, 16 Oct 2012 13:07:12 -0700
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 15, 2012 at 10:39 PM, Greg Ewing
<greg.ewing at> wrote:
> Nick Coghlan wrote:
>> (this is why I disagree with Greg that
>> "yield from" can serve as the one true API - it doesn't handle partial
>> iteration, and it doesn't handle pre- or post- processing around the
>> suspension points while iterating).
> I'm aware of the iteration problem, but I'm not convinced
> that the convolutions necessary to make it possible to use
> a for-loop for this are worth the bother, as opposed to
> simply accepting that you can't use the for statement in
> this situation, and using some other kind of loop.
> In any case, even if we decide to provide a scheduler
> instruction to enable using for-loops on suspendable
> iterators somehow, it doesn't follow that we should use
> scheduler instructions for anything *else*.

I don't see how we could ever have a for-loop that yields on every
iteration step. The for-loop never uses yield. Thus there can be no
direct equivalent to as_completed() in the PEP 380 or PEP 342
coroutine worlds.

> I would consider such a scheduler instruction to be a stopgap
> measure until we can find a better solution -- just as
> yield-from is a better solution than using "call" and "return"
> scheduler instructions.

I can already see the armchair language designers race to propose
syntax that puts a yield keyword in the for-loop syntax at a point
where it is currently not allowed. Let's nip that in the bud and focus
on something that can work with Python 3.3.

--Guido van Rossum (

From guido at  Tue Oct 16 22:18:02 2012
From: guido at (Guido van Rossum)
Date: Tue, 16 Oct 2012 13:18:02 -0700
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 16, 2012 at 12:20 AM, Greg Ewing
<greg.ewing at> wrote:
> Guido van Rossum wrote:
>> But there needs to be another way to get a task running immediately
>> and concurrently; I believe that would be
>> a = spawn(foo_task())
>> right? One could then at any later point use
>> ra = yield from a
> Hmmm. I suppose it *could* be made to work that way, but I'm
> not sure it's a good idea, because it blurs the distinction
> between invoking a subtask synchronously and waiting for the
> result of a previously spawned independent task.

Are you sure you really want to distinguish between those though? In
NDB they are intentionally the same -- invoking some API whose name
ends in _async() starts an async subtask and returns a Future; you
wait for the subtask by yielding the Future.

Starting multiple tasks is just a matter of calling several _async()
APIs; then you can wait for any or all of them using yield [future1,
future2, ...] *or* by yielding the futures one at a time. This gives
users a gentle introduction to concurrency (first they use the
synchronous APIs; then they learn to use yield foo_async(); then they
learn they can write:

f = foo_async()
<other work>
r = yield f

and finally they learn about spawning multiple tasks:

f1 = foo_async()
f2 = bar_async()
rfoo, rbar = yield f1, f2
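
To make the progression above concrete, here is a minimal sketch of a driver for generators that yield futures. It is not NDB's implementation: run(), foo_async() and bar_async() are hypothetical names, and concurrent.futures.Future stands in for whatever future type the real API returns.

```python
from concurrent.futures import Future


def run(gen):
    """Drive a generator that yields a Future or a tuple of Futures."""
    outcome = Future()

    def step(value=None, error=None):
        try:
            if error is not None:
                yielded = gen.throw(error)
            else:
                yielded = gen.send(value)
        except StopIteration as stop:
            outcome.set_result(stop.value)
            return
        except Exception as exc:
            outcome.set_exception(exc)
            return

        many = isinstance(yielded, tuple)
        futures = yielded if many else (yielded,)
        remaining = len(futures)

        def on_done(_future):
            nonlocal remaining
            remaining -= 1
            if remaining:
                return  # still waiting on other futures
            try:
                values = tuple(f.result() for f in futures)
            except Exception as exc:
                step(error=exc)
            else:
                step(values if many else values[0])

        for future in futures:
            future.add_done_callback(on_done)

    step()
    return outcome


def foo_async():
    # Stand-in for an _async() call; it completes immediately here.
    future = Future()
    future.set_result("foo")
    return future


def bar_async():
    future = Future()
    future.set_result("bar")
    return future


def main():
    f = foo_async()
    r = yield f                  # wait for a single future
    f1 = foo_async()
    f2 = bar_async()
    rfoo, rbar = yield f1, f2    # wait for both futures
    return r, rfoo, rbar


outcome = run(main())
```

In this sketch the result of the whole task is itself a future, so a caller can wait on it the same way it waits on anything else.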

> Recently I've been thinking about an implementation where
> it would look like this. First you do
>    t = spawn(foo_task())
> but what you get back is *not* a generator; rather it's
> a Task object which wraps a generator and provides various
> operations. One of them would be
>    r = yield from t.wait()
> which waits for the task to complete and then returns its
> value (or if it raised an exception, propagates the exception).
> Other operations that a Task object might support include
>    t.unblock()        # wake up a blocked task
>    t.cancel()         # unschedule and clean up the task
>    t.throw(exception) # raise an exception in the task
> (I haven't included t.block(), because I think that should
> be a stand-alone function that operates on the current task.
> Telling some other task to block feels like a dodgy thing
> to do.)

Right. I'm looking forward to a larger example.

>> One could also combine these and do e.g.
>> a = spawn(foo_task())
>> b = spawn(bar_task())
>> <do more work locally>
>> ra, rb = yield from par(a, b)
> If you're happy to bail out at the first exception, you
> wouldn't strictly need a par() function for this, you could
> just do
>    a = spawn(foo_task())
>    b = spawn(bar_task())
>    ra = yield from a.wait()
>    rb = yield from b.wait()
>> Have I got the spelling for spawn() right? In many other systems (e.g.
>> threads, greenlets) this kind of operation takes a callable, not the
>> result of calling a function (albeit a generator).
> That's a result of the fact that a generator doesn't start
> running as soon as you call it. If you don't like that, the
> spawn() operation could be defined to take an uncalled generator
> and make the call for you. But I think it's useful to make the
> call yourself, because it gives you an opportunity to pass
> parameters to the task.

Agreed, actually. I was just checking.

>> If it takes a
>> generator, would it return the same generator or a different one to
>> wait for?
> In your version above where you wait for the task simply
> by calling it with yield-from, spawn() would have to return a
> generator (or something with the same interface). But it
> couldn't be the same generator -- it would have to be a wrapper
> that takes care of blocking until the subtask is finished.

That's fine with me (though Glyph would worry about creating too many objects).

--Guido van Rossum

From greg.ewing at  Tue Oct 16 22:27:53 2012
From: greg.ewing at (Greg Ewing)
Date: Wed, 17 Oct 2012 09:27:53 +1300
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

Nick Coghlan wrote:

>     # Note that this is an *ordinary iterator*, not a tasklet
>     def as_completed(futures):
>         # We ensure all the operations have started, and get ourselves a set to work with
>         remaining = set(futures)
>         while remaining:
>             # The trick here is that we *don't yield the original futures directly*
>             # Instead, we yield
>             yield _wait_first(remaining)

I've just figured out how your as_completed() thing works,
and realised that it's *not* a general solution to the
suspendable-iterator problem. You're making use of the fact
that you know *how many* values there will be ahead of time,
even if you don't know what they are yet.

In general this won't be the case. I don't think there is
any trick that will allow a for-loop to be used in the general
case, because in order for an iterator to be suspendable, the
call to next() would need to be made using yield-from, and
it's hidden inside the for-loop implementation.

I know you probably weren't intending as_completed() to be
a solution to the general suspendable-iterator problem.
I just wanted to record my thoughts on this.


From greg.ewing at  Tue Oct 16 22:48:51 2012
From: greg.ewing at (Greg Ewing)
Date: Wed, 17 Oct 2012 09:48:51 +1300
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

Calvin Spealman wrote:

> I think "yield from" is the obvious way to "call" between tasks, but that
> there are other cases when we want to spawn a task to begin without
> blocking our task, and that "yield" should be used here.

I've thought of another problem with this. In my scheduler at
least, simply spawning a task doesn't immediately allow that
task, or any other, to run. Using "yield" to spell this operation
gives the impression that it could be a suspension point, when
it's actually not.

It also forces anything that uses it to be called with "yield
from", all the way up, so if you're relying on the presence of
yield-froms to warn you of potential suspension points, you'll
get false positives.


From greg.ewing at  Tue Oct 16 23:14:11 2012
From: greg.ewing at (Greg Ewing)
Date: Wed, 17 Oct 2012 10:14:11 +1300
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

Calvin Spealman wrote:

> What if two tasks call wait() on the same subtask which raises an
> error?

That would be disallowed. A given Task would only be allowed
to have its wait() method called once.

The reason for this restriction is because of the way tracebacks
are attached to exception objects in Python 3, which means that
exceptions are effectively single-use now. If it weren't for
that, the exception could simply be raised in *both* waiters.
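
A small sketch of why one exception object is awkward to raise in two places: in Python 3 the traceback lives on the exception instance, so raising the same instance a second time extends the traceback left by the first raise. The waiter() helper is a hypothetical stand-in for a task that receives the error.

```python
def waiter(exc):
    """Raise exc, catch it, and count the entries in its traceback."""
    try:
        raise exc
    except ValueError as caught:
        depth = 0
        tb = caught.__traceback__
        while tb is not None:
            depth += 1
            tb = tb.tb_next
        return depth


error = ValueError("task failed")
first = waiter(error)    # traceback from the first raise
second = waiter(error)   # same object: the earlier entries are still attached
```

The second waiter sees a longer traceback than the first, including frames that have nothing to do with it.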

> I think we should let errors propagate through yield-from,
> primarily. That's what it exists for.

Yes, and that's exactly what my wait() mechanism does. You call
the wait() method using yield-from.

The important idea is that just because you spawn a task, it
doesn't necessarily follow that you want to be regarded as the
*parent* of that task and receive its exceptions. That only
becomes clear when you wait() for it.

>>In my system, spawn() isn't a wrapper -- it *is* the
>>primitive way to create an independent task. And I
>>think it's the only one we need.
> It has to know what scheduler to talk to, right?

Yes, but in my world, there is only *one* scheduler.

I understand that not everyone thinks that's a good idea,
and I'm thinking about ways to remove that restriction. But
I'm not yet sure that it *should* be removed even if it can.
It seems to me that having multiple schedulers is inviting
many of the same problems as having multiple event loops,
and this whole discussion is centred on the idea that there
should only be one of those.

Just to be clear, I'm not saying there should only be one
scheduler *implementation* in existence -- only that there
should only be one *instance* of some scheduler implementation
in any given program (or thread, if you're using those). And
there should be a standard interface for it and an agreed
way of finding the instance.

What you're saying is that the standard interface should
consist of yielded instructions and the instance should be
found implicitly using dynamic scoping. This is *very*
different from the kind of interface used for everything
else in Python, and I'm not yet convinced that such a
large amount of weirdness is justified.


From guido at  Tue Oct 16 23:31:00 2012
From: guido at (Guido van Rossum)
Date: Tue, 16 Oct 2012 14:31:00 -0700
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 16, 2012 at 2:14 PM, Greg Ewing <greg.ewing at> wrote:
> The important idea is that just because you spawn a task, it
> doesn't necessarily follow that you want to be regarded as the
> *parent* of that task and receive its exceptions. That only
> becomes clear when you wait() for it.

Maybe. But the opposite doesn't follow either. It's a toss-up between
the spawner and the waiter.

--Guido van Rossum

From Steve.Dower at  Tue Oct 16 23:31:53 2012
From: Steve.Dower at (Steve Dower)
Date: Tue, 16 Oct 2012 21:31:53 +0000
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

> Yes, but in my world, there is only *one* scheduler.
> Just to be clear, I'm not saying there should only be one scheduler *implementation* in existence -- only that 
> there should only be one *instance* of some scheduler implementation in any given program (or thread, if
> you're using those). And there should be a standard interface for it and an agreed way of finding the instance.

I agree with this entirely. There are a lot of optimisations to be had with different scheduler implementations, but the only way this can be portable is with a minimum supported interface and a standard way to find it.

From ironfroggy at  Wed Oct 17 00:33:55 2012
From: ironfroggy at (Calvin Spealman)
Date: Tue, 16 Oct 2012 18:33:55 -0400
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 16, 2012 at 4:48 PM, Greg Ewing <greg.ewing at> wrote:
> Calvin Spealman wrote:
>> I think "yield from" is the obvious way to "call" between tasks, but that
>> there are other cases when we want to spawn a task to begin without
>> blocking our task, and that "yield" should be used here.
> I've thought of another problem with this. In my scheduler at
> least, simply spawning a task doesn't immediately allow that
> task, or any other, to run. Using "yield" to spell this operation
> gives the impression that it could be a suspension point, when
> it's actually not.

While I still like the feeling, I must concede this point. I could see
them being yielded and forgotten... assuming they would suspend.

> It also forces anything that uses it to be called with "yield
> from", all the way up, so if you're relying on the presence of
> yield-froms to warn you of potential suspension points, you'll
> get false positives.

Read my blog! I depend on your acceptance of my opinion! I am interesting!
Follow me if you're into that sort of thing:

From ironfroggy at  Wed Oct 17 00:37:33 2012
From: ironfroggy at (Calvin Spealman)
Date: Tue, 16 Oct 2012 18:37:33 -0400
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 16, 2012 at 5:14 PM, Greg Ewing <greg.ewing at> wrote:
> Calvin Spealman wrote:
>> What if two tasks call wait() on the same subtask which raises an
>> error?
> That would be disallowed. A given Task would only be allowed
> to have its wait() method called once.
> The reason for this restriction is because of the way tracebacks
> are attached to exception objects in Python 3, which means that
> exceptions are effectively single-use now. If it weren't for
> that, the exception could simply be raised in *both* waiters.
>> I think we should let errors propagate through yield-from,
>> primarily. That's what it exists for.
> Yes, and that's exactly what my wait() mechanism does. You call
> the wait() method using yield-from.
> The important idea is that just because you spawn a task, it
> doesn't necessarily follow that you want to be regarded as the
> *parent* of that task and receive its exceptions. That only
> becomes clear when you wait() for it.
>>> In my system, spawn() isn't a wrapper -- it *is* the
>>> primitive way to create an independent task. And I
>>> think it's the only one we need.
>> It has to know what scheduler to talk to, right?
> Yes, but in my world, there is only *one* scheduler.

Practically speaking, that is nice. But, are there use cases for
multiple schedulers we should support?

I also like the idea of the scheduler being an iterable, and thus
itself being something you can schedule. Turtles all the way down.

> I understand that not everyone thinks that's a good idea,
> and I'm thinking about ways to remove that restriction. But
> I'm not yet sure that it *should* be removed even if it can.
> It seems to me that having multiple schedulers is inviting
> many of the same problems as having multiple event loops,
> and this whole discussion is centred on the idea that there
> should only be one of those.
> Just to be clear, I'm not saying there should only be one
> scheduler *implementation* in existence -- only that there
> should only be one *instance* of some scheduler implementation
> in any given program (or thread, if you're using those). And
> there should be a standard interface for it and an agreed
> way of finding the instance.
> What you're saying is that the standard interface should
> consist of yielded instructions and the instance should be
> found implicitly using dynamic scoping. This is *very*
> different from the kind of interface used for everything
> else in Python, and I'm not yet convinced that such a
> large amount of weirdness is justified.

I don't follow the part about "found implicitly using dynamic
scoping". What do you mean?

In my model, the tasks never find the scheduler at all. They
don't directly access it at all.


From pjdelport at  Wed Oct 17 00:56:44 2012
From: pjdelport at (Piet Delport)
Date: Wed, 17 Oct 2012 00:56:44 +0200
Subject: [Python-ideas] Proposal: A simple protocol for generator tasks
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 15, 2012 at 12:48 PM, Calvin Spealman <ironfroggy at> wrote:
> What is the difference between the tossed around "yield from task()"
> and this "yield tasklib.spawn(task())"

"yield from task()" is simply the coroutine / task version of a function
call: it runs the task to completion, and returns its final result.

"yield tasklib.spawn(task())" (or however it ends up being spelled)
would be a scheduler primitive to start a task *without* waiting for its
result: in other words, it's a request that the scheduler start a new,
independent thread of control.

> And, why isn't it simply spelled "yield task()"? You have all these different
> types that can be yielded to the scheduler from tasks to the scheduler. Why
> isn't a task one of those possible types? If the scheduler gets an iterator, it
> should schedule it automatically.

This is a good question: I stopped short of discussing it in the
original message only to keep it short, and in the hope that the answer
is implied.

The short answer is that "yield task()" is the old, hacky, cumbersome,
"legacy"[1] way of calling subtasks, and that "yield from" should
entirely replace the need to have to support it.

Before "yield from", "yield task()" was the only way to call subtasks, but
this approach has some major disadvantages:

1. In order for it to work, schedulers must manually implement task
   trampolining, which is ugly at best, and prone to bugs if not all
   edge cases are handled correctly. (IOW, it effectively places the
   burden of implementing PEP 380 onto each scheduler.)

2. It obfuscates exception tracebacks by default, requiring schedulers
   that want readable stack traces to take additional pains to clean up
   their own non-task frames, while propagating exceptions.

3. It requires schedulers to reliably distinguish between tasks and
   other primitives in the first place.

   Simply treating all iterators as tasks is not sufficient: to run a
   task, you need send() and throw(), at least. (Type-checking for
   GeneratorType would be marginally better, but would unnecessarily
   preclude for example implementing tasks as classes or C extension
   types, which is otherwise entirely possible with this protocol.)

"yield from" simplifies and solves all these problems in one elegant swoop:

1. No more manual trampolining: a scheduler can treat any task as a
   single unit, and only needs to worry about the single, combined
   stream of instructions coming from it.

2. Tracebacks (and return values) take care of themselves.

3. By separating the concerns of direct scheduler communication
   ("yield") and subtask delegation ("yield from"), schedulers can limit
   themselves to just knowing about scheduler primitives when dealing
   with yielded values, which should be more easily and tightly defined than
   the full spectrum of tasks in general. (The set of officially-defined
   scheduler instructions could end up being as small as None and
   Future, say.)

In summary, it's entirely possible for schedulers to continue supporting
the old "yield task()" way of calling subtasks (and this has no problem
fitting into the proposed protocol[2]), but there should be no reason to
do so, and several good reasons not to: hopefully, it will become a
pre-3.3 historical footnote.
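
As a rough illustration of the "single, combined stream of instructions" point, the sketch below uses plain strings as stand-in scheduler instructions and a toy run() loop; both are assumptions made for the example, not part of the proposed protocol.

```python
def subtask():
    yield "instruction-1"
    yield "instruction-2"
    return 20


def task():
    # Subtask delegation: the scheduler never sees subtask() itself.
    partial = yield from subtask()
    yield "instruction-3"
    return partial + 1


def run(task_gen):
    """Toy scheduler: record each yielded instruction, resume with None."""
    seen = []
    try:
        instruction = next(task_gen)
        while True:
            seen.append(instruction)
            instruction = task_gen.send(None)
    except StopIteration as stop:
        return seen, stop.value


seen, result = run(task())
```

The scheduler only ever handles the outermost generator; the return value of subtask() reaches task() without any trampolining code in run().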

[1] For the purposes of this email, interpret "legacy" to mean "older
    than 17 days". :)

[2] Interpreted as a scheduler instruction, a task value would simply
    mean "resume the current task with the result of completing the
    yielded subtask" (modulo the practical question of reliably
    type-checking tasks, as mentioned).

>> Raising TypeError or NotImplementedError back into the task is probably
>> a reasonable action, and would allow code like:
>>     def task():
>>         try:
>>             yield fancy_magic_instruction()
>>         except NotImplementedError:
>>             yield from boring_fallback()
>>         ...
> Interesting. Can anyone think of an example of this?

I just want to note for the record that I'm not *encouraging* this kind
of thing: I'm just observing that it would be allowed.

(However, one imaginable use case would be for tasks to send
scheduler-specific hints, that can safely be ignored when those tasks
are running on other scheduler implementations.)
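
A sketch of how that could play out, assuming a toy scheduler that understands only one instruction ("sleep", a made-up name) and throws NotImplementedError back into the task for anything else:

```python
KNOWN_INSTRUCTIONS = {"sleep"}  # hypothetical: all this scheduler supports


def run(task_gen):
    """Toy scheduler: unknown instructions are raised back into the task."""
    handled = []
    try:
        instruction = next(task_gen)
        while True:
            if instruction in KNOWN_INSTRUCTIONS:
                handled.append(instruction)
                instruction = task_gen.send(None)
            else:
                instruction = task_gen.throw(NotImplementedError(instruction))
    except StopIteration as stop:
        return handled, stop.value


def boring_fallback():
    yield "sleep"
    return "fallback result"


def task():
    try:
        result = yield "fancy_magic_instruction"
    except NotImplementedError:
        result = yield from boring_fallback()
    return result


handled, result = run(task())
```

A task that does not catch the exception simply fails with it, which is probably the right default for an instruction the scheduler cannot honour.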

>> This is a plain observation on its own, however, it raises one or two
>> interesting possibilities for more interesting schedulers implemented as
>> generator tasks themselves, including:
>> - Specialized sub-schedulers that run as a normal task within their
>>   parent scheduler, but implement for example weighted or priority
>>   queuing of their subtasks, or similar features.
> I think that is too messy, you could have so many different scheduler
> semantics. Maybe this sort of thing is what your schedule-specific
> instructions should be for.

It shouldn't get messy: the core semantics of any scheduler should
always stay within the proposed protocol.

The above is not the best example of a custom scheduler, though.
Perhaps a better example would be a generic helper function like the
following, that implements throttling of I/O requests made
through it:

    def task():
        result = yield from io_throttled(subtask(), rate=foo)

io_throttled() would end up sitting between task() and subtask() in the
hierarchy, like so:

    ... -> task() -> io_throttled() -> subtask() -> ...

To recap, each task is implicitly driven by the scheduler above it, and
implicitly drives the task(s) below it: The outer scheduler drives
task(), which drives io_throttled(), which drives subtask(), and so on.

In this picture: "yield from" is the "most default" scheduler: it simply
delegates all yielded instructions to the outer scheduler.

However, instead of relying on "yield from", io_throttled() can dip down
into the task protocol itself, and drive subtask() directly. This would
allow it to inspect and manipulate the underlying instructions and
responses flowing back and forth, and, assuming that
there's a recognizable standard representation for I/O primitives, it
could keep track of the rate of I/O, and insert delay instructions as
necessary (or something similar).

The key observations I want to make:

* io_throttled() is not special: it is just a normal task, as far as the
  tasks above and below it are concerned, and assumes only a
  recognizable representation of the fundamental I/O and delay
  instructions used.

* To the extent that said underlying primitives are scheduler-agnostic,
  io_throttled() can be used or inserted anywhere, without caring how
  the underlying scheduler or event loop handles I/O, or how its global
  API looks. It just acts locally, in terms of the task protocol.

An example where this kind of thing might actually be useful is an
application or library that wishes to throttle, say, certain HTTP
requests: it could simply internally wrap the tasks that make those
requests in io_throttled(), without any special support from the
underlying scheduler.
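
Here is one way io_throttled() might be sketched. Everything concrete in it is an assumption for illustration: the "io" and "delay" strings stand in for whatever standard I/O and delay instructions get defined, and max_io_per_delay is a made-up parameter replacing the unspecified rate= argument.

```python
def io_throttled(subtask, max_io_per_delay=2):
    """Drive subtask directly, inserting a delay after every few I/O requests."""
    io_count = 0
    to_send = None
    to_throw = None
    while True:
        try:
            if to_throw is not None:
                instruction = subtask.throw(to_throw)
            else:
                instruction = subtask.send(to_send)
        except StopIteration as stop:
            return stop.value
        to_send = to_throw = None
        if instruction == "io":
            if io_count == max_io_per_delay:
                yield "delay"   # extra instruction added by the throttler
                io_count = 0
            io_count += 1
        try:
            # Pass the instruction up and relay the scheduler's response down.
            to_send = yield instruction
        except Exception as exc:
            to_throw = exc


def subtask():
    for _ in range(3):
        yield "io"
    return "done"


def task():
    result = yield from io_throttled(subtask(), max_io_per_delay=2)
    return result


def run(task_gen):
    """Toy outer scheduler: record instructions, resume with None."""
    seen = []
    try:
        instruction = next(task_gen)
        while True:
            seen.append(instruction)
            instruction = task_gen.send(None)
    except StopIteration as stop:
        return seen, stop.value


seen, result = run(task())
```

The outer scheduler sees one extra "delay" instruction in the stream, and neither task() nor subtask() needs to know the throttler is there.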

This is of course not the only way to solve this particular problem, but
it's an example of how thinking about generator tasks and their
schedulers as two sides of the same underlying protocol could be a
powerful abstraction, enabling a compositional approach to combining
implementations of the protocol that might not otherwise be obvious or
possible.

Piet Delport

From ncoghlan at  Wed Oct 17 06:21:27 2012
From: ncoghlan at (Nick Coghlan)
Date: Wed, 17 Oct 2012 14:21:27 +1000
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Oct 17, 2012 at 6:27 AM, Greg Ewing <greg.ewing at> wrote:
> Nick Coghlan wrote:
>>     # Note that this is an *ordinary iterator*, not a tasklet
>>     def as_completed(futures):
>>         # We ensure all the operations have started, and get ourselves a set to work with
>>         remaining = set(futures)
>>         while remaining:
>>             # The trick here is that we *don't yield the original futures directly*
>>             # Instead, we yield
>>             yield _wait_first(remaining)
> I've just figured out how your as_completed() thing works,
> and realised that it's *not* a general solution to the
> suspendable-iterator problem. You're making use of the fact
> that you know *how many* values there will be ahead of time,
> even if you don't know what they are yet.
> In general this won't be the case. I don't think there is
> any trick that will allow a for-loop to be used in the general
> case, because in order for an iterator to be suspendable, the
> call to next() would need to be made using yield-from, and
> it's hidden inside the for-loop implementation.

Yeah, that's what lets me get away with not passing the sent results
back down into the iterator (it can figure out from the original
arguments when it needs to stop). It gets trickier if you want to
terminate the iteration based on the result of an asynchronous
operation.

For example, here's a very simplistic way you could apply the concept
of "yield a future to be handled in the loop body" to the operation of
continuously reading binary data from a connection until EOF is
reached:

    import io

    def read(self):
        """This knows how to start an IO operation such that the future will fire on completion"""
        future = ...
        return future

    # Again, notice this is *not* a tasklet, it's an ordinary iterator that produces Future objects
    def readall(self):
        """This can be used in two modes - as an iterator or as a coroutine.

        As a coroutine:
            data = yield from conn.readall()

        As an iterator:
            for wait_for_chunk in conn.readall():
                try:
                    chunk = yield wait_for_chunk
                except EOFError:
                    break

        Obviously, the coroutine mode is far more convenient, but you *can* override
        the default accumulator behaviour if you want/need to by waiting on the individual
        futures explicitly. However, in this case, you lose the automatic loop termination
        behaviour, so, you may as well implement the underlying loop explicitly:

            while 1:
                try:
                    chunk = yield conn.read()
                except EOFError:
                    break
        """
        output = io.BytesIO()
        while 1:
            try:
                data = yield self.read()
            except EOFError:
                break
            if data: # This check makes iterator mode possible
                output.write(data)
        return output.getvalue()

Impedance matching in a way that allows the exception handling to be
factored out as well as the iteration step is a *lot* trickier, since
you need to bring context managers into play if termination is
signalled by an exception:

    from contextlib import contextmanager

    # This version produces context managers rather than producing futures directly,
    # and thus can't be used directly as a coroutine
    def read_chunks(self):
        finished = False
        @contextmanager
        def handle_chunk():
            nonlocal finished
            data = b''
            try:
                data = yield self.read()
            except EOFError:
                finished = True
            return data
        while not finished:
            yield handle_chunk()

    # Usage
    for handle_chunk in conn.read_chunks():
        with handle_chunk as wait_for_chunk:
            chunk = yield from wait_for_chunk
        # We end up doing a final "extra" iteration with chunk = b''
        # So we'd likely need to guard with an "if chunk:" or "if not chunk: continue"
        # which again means we're not getting much value out of using the iterator

Using an explicit "add_done_callback" doesn't help much, as you still
have to deal with the exception being thrown back in.

I know Guido doesn't want people racing off and designing new syntax
for asynchronous iteration, but I'm not sure it's going to be possible
to avoid it if we want a clean approach to "forking" the results of
asynchronous calls between passing them down into a coroutine (to
decide whether or not to terminate iteration) and binding them to a
local variable (to allow local processing in the loop body). Compare
the arcane incantations above to something like (similar to
suggestions previously made by Christian Heimes):

    def read_chunks(self):
        """Designed for use as an asynchronous iterator"""
        while 1:
            try:
                yield self.read()
            except EOFError:
                break

    # Usage
    for chunk in yield from conn.read_chunks():
        ...

The idea here would be that whereas "for chunk in (yield from
conn.read_chunks()):" runs the underlying coroutine to completion and
then iterates over the return value, the version without the
parentheses would effectively "tee" the values being sent back,
*first* sending them to the underlying coroutine (to decide whether or
not iteration should continue and to get the value to be yielded at
the start of the next iteration) and then, if that doesn't raise
StopIteration, binding them to the local variable and proceeding to
execution of the loop body.

All that said, I still like Guido's concept that the core asynchronous
API is *really* future objects, just as it already is in the
concurrent.futures module. The @task decorator and yielding future
objects to that decorator is then just nice syntactic sugar for
hooking generators up to the "add_done_callback" API of future
objects. It's completely independent of the underlying event loop
and/or asynchronous IO interfaces - those interfaces are about setting
things up to invoke the set_* methods of the returned future objects
correctly, just as they are with the Executor API in
concurrent.futures.
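
To illustrate the "syntactic sugar over add_done_callback" idea, here is a minimal sketch of what such a decorator might look like. The name task and its exact behaviour are assumptions for the example; only the Future methods used (add_done_callback, result, exception, set_result, set_exception) are the real concurrent.futures API.

```python
import functools
from concurrent.futures import Future


def task(func):
    """Turn a generator function that yields futures into one returning a Future."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        gen = func(*args, **kwargs)
        outcome = Future()

        def advance(completed=None):
            try:
                if completed is None:
                    waiting_on = next(gen)            # first step
                elif completed.exception() is not None:
                    waiting_on = gen.throw(completed.exception())
                else:
                    waiting_on = gen.send(completed.result())
            except StopIteration as stop:
                outcome.set_result(stop.value)
            except BaseException as exc:
                outcome.set_exception(exc)
            else:
                # Resume the generator whenever the yielded future completes.
                waiting_on.add_done_callback(advance)

        advance()
        return outcome
    return wrapper


@task
def add_async(future_a, future_b):
    a = yield future_a
    b = yield future_b
    return a + b


fa, fb = Future(), Future()
total = add_async(fa, fb)
fa.set_result(1)   # whoever owns the futures sets them, e.g. an event loop
fb.set_result(2)
```

Nothing here refers to an event loop: whatever completes the futures ends up driving the generator.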

> I know you probably weren't intending as_completed() to be
> a solution to the general suspendable-iterator problem.

Right, I just wanted to be sure that *that particular use case* of
waiting for a collection of futures and processing them in completion
order could be handled in terms of Guido's API *without* needing any
extra magic. The "iterate over data chunks until EOFError is raised"
is a better example for highlighting the "how do you write an
asynchronous iterator?" problem.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From pjdelport at  Wed Oct 17 07:31:10 2012
From: pjdelport at (Piet Delport)
Date: Wed, 17 Oct 2012 07:31:10 +0200
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 16, 2012 at 9:45 AM, Greg Ewing <greg.ewing at> wrote:
> In my system, spawn() isn't a wrapper -- it *is* the
> primitive way to create an independent task. And I
> think it's the only one we need.

I think you will at minimum need a way to suspend and resume tasks, in
addition to spawn(), as illustrated by the example of par() waiting for
not CPU-bound tasks.

This could be done either as actual suspend and resume primitives, or by
building on a related set of synchronization primitives, such as queues,
channels, or condition variables: there are a number of such sets that are
mutually co-expressible.

Suspending and resuming, in particular, is highly related to the
question of how you reify a task as a conventional callback, when the
need for that arises.

Here's one possible way of approaching this with a combined
suspend/resume primitive that might look familiar to people with an FP
background:

    result = yield suspend(lambda resume: ...)

(Here, "suspend" could be a scheduler-agnostic instruction object, a la
tasklib.suspend(), or a method on a global scheduler.)

suspend() would instruct the scheduler to stop running the current task,
and call its argument (the lambda in the above example) with a
"resume(value)" callable that will arrange to resume the task again with
the given value. The body of the lambda (or whatever is passed to
suspend()) would be responsible for doing something useful with the
resume() callable: e.g. in the par() example, it would arrange that the last
child task triggers it.

In particular, this suspend() could be used to integrate fairly directly
with callback-based APIs: for example, if you have a Twisted Deferred,
you could do:

    result = yield suspend(d.addCallback)

to suspend the current task and add a callback to d that will resume it
again, and receive the Deferred's result.

To add support for exceptions, a variation of suspend() could pass two
callables, mirroring pairs like send/throw, or callback/errback:

    result = yield suspend2(lambda resume, throw: ...)

    result = yield suspend2(d.addCallbacks)
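
A minimal sketch of the single-callable suspend() variant, with a toy scheduler to go with it. The Scheduler class and the callbacks list (standing in for a callback-based API such as a Deferred) are made up for the example.

```python
from collections import deque


class suspend:
    """Hypothetical instruction: park the current task, pass resume() to setup."""

    def __init__(self, setup):
        self.setup = setup


class Scheduler:
    def __init__(self):
        self.ready = deque()  # (task, value to send) pairs

    def spawn(self, task):
        self.ready.append((task, None))

    def _make_resume(self, task):
        def resume(value=None):
            self.ready.append((task, value))
        return resume

    def run_until_idle(self):
        while self.ready:
            task, value = self.ready.popleft()
            try:
                instruction = task.send(value)
            except StopIteration:
                continue
            if isinstance(instruction, suspend):
                # The task stays parked until someone calls resume().
                instruction.setup(self._make_resume(task))
            else:
                self.ready.append((task, None))  # plain yield: reschedule


callbacks = []   # stand-in for a callback-based API
results = []


def task():
    value = yield suspend(callbacks.append)
    results.append(value)


scheduler = Scheduler()
scheduler.spawn(task())
scheduler.run_until_idle()   # task is parked; callbacks[0] is its resume()
callbacks[0]("hello")        # the callback-based API fires
scheduler.run_until_idle()
```

The task never touches the scheduler directly; it only yields an instruction and later receives the value passed to resume().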

Piet Delport

From greg.ewing at  Wed Oct 17 08:04:31 2012
From: greg.ewing at (Greg Ewing)
Date: Wed, 17 Oct 2012 19:04:31 +1300
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

Calvin Spealman wrote:
> If we don't kick whatever
> handles the callback and result immediately, we are essentially
> re-introducing pre-emptive scheduling. If TaskA is waiting on the
> result of TaskB, and when TaskB finishes we say "OK, but we need to go
> let TaskC do something before TaskA is given that result" then we
> leave room for C to break things, modify state, and generally act in a
> less-than-determinable way.

I don't see how the risk of this is any higher than the risk
that some other task D gets run while task A is waiting and
messes something up.

Ultimately you have to trust your tasks to behave themselves.


From greg.ewing at  Wed Oct 17 09:26:44 2012
From: greg.ewing at (Greg Ewing)
Date: Wed, 17 Oct 2012 20:26:44 +1300
Subject: [Python-ideas] Expressiveness of coroutines versus Deferred
 callbacks (or possibly promises, futures)
In-Reply-To: <>
References: <>
Message-ID: <>

Glyph wrote:

> The real problem with generator coroutines is that if you make them the 
> primitive, you have an abstraction inversion if you want to have 
> callbacks

Has anyone suggested making generator coroutines "the primitive",
whatever that means?

Guido seems to have made it clear that he wants the interface
to the event loop layer to be based on plain callbacks. To plug
in a generator coroutine, you install a callback that wakes up
the coroutine. So using generators with the event loop will be
entirely optional.
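
As a rough illustration of that arrangement, here is a sketch in which
the loop knows only about plain callbacks and a generator is plugged in
by installing a callback that wakes it up. ToyLoop, call_soon and
run_task are hypothetical names, not part of any proposed interface:

```python
from collections import deque


class ToyLoop:
    """Hypothetical event loop whose only interface is plain callbacks."""
    def __init__(self):
        self._ready = deque()

    def call_soon(self, callback, *args):
        self._ready.append((callback, args))

    def run(self):
        while self._ready:
            callback, args = self._ready.popleft()
            callback(*args)


def run_task(loop, gen, log):
    """Plug a generator into the loop via a wake-up callback."""
    def wakeup():
        try:
            next(gen)
        except StopIteration as stop:
            log.append(("done", stop.value))
        else:
            loop.call_soon(wakeup)   # schedule the next wake-up
    loop.call_soon(wakeup)


def countdown(name, n, log):
    while n:
        log.append((name, n))
        n -= 1
        yield                        # give up the CPU until woken again
    return name


log = []
loop = ToyLoop()
run_task(loop, countdown("a", 2, log), log)
run_task(loop, countdown("b", 1, log), log)
loop.call_soon(log.append, ("plain callback", None))
loop.run()
```

Code that never uses generators can keep calling call_soon() directly,
as the last callback does.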

> I haven't measured in a few 
> years, but as nice as generators can be for structuring complex event 
> flows, that abstraction comes with a non-trivial performance cost. 
> ...  Every return value being replaced with 
> a callback trampoline is bad, but replacing it instead with a generator 
> being advanced, an exception being raised /and /a callback trampoline is 
> worse.

This is where we expect yield-from to help a *lot*, by removing
almost all of that overhead. A return to the trampoline is only
needed when a task wants to yield the CPU, instead of every time
it makes a function call to a subgenerator.

Returns are still a bit more expensive due to the StopIterations,
but raising and catching an exception in C code is still fairly
efficient compared to doing it in Python. (Although not quite as
super-efficient as it was in Python 2.x, unfortunately, due to
tracebacks being attached to exceptions, so that we can't
instantiate exceptions lazily any more.)
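
A small sketch of the effect, assuming a toy trampoline that simply
counts how often control comes back to it (nested and trampoline are
made-up names):

```python
def nested(depth):
    """Call a chain of subgenerators `depth` levels deep."""
    if depth:
        # A subgenerator call: handled by yield-from, no trampoline trip.
        return (yield from nested(depth - 1)) + 1
    yield "pause"          # the only point where the task yields the CPU
    return 0


def trampoline(task):
    """Run a task to completion, counting returns to the trampoline."""
    trips = 0
    try:
        while True:
            next(task)
            trips += 1
    except StopIteration as stop:
        return stop.value, trips


result, trips = trampoline(nested(10))
```

Ten nested subgenerator calls are made here, but the trampoline is
visited only once, for the single real yield.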


From greg.ewing at  Wed Oct 17 09:30:16 2012
From: greg.ewing at (Greg Ewing)
Date: Wed, 17 Oct 2012 20:30:16 +1300
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

Matthias Urlichs wrote:

> (1) It's a whole lot easier to debug a problem with gevent than with anything
> which uses yield / Deferreds / asyncore / whatever. With gevent, you get a
> standard stack trace. With anything else, the "where did this call come from"
> information is not part of the call chain

With yield-from this is no longer true -- you get exactly the same
traceback from a yield-from call chain that you would get from the
corresponding ordinary call chain, without having to do anything
special. This is one of the beauties of it.
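
A short sketch that can be used to check this on Python 3.3 or later.
The function names are invented for the example:

```python
import traceback


def inner():
    yield
    raise ValueError("boom")


def middle():
    yield from inner()


def outer():
    yield from middle()


def frames_in_traceback():
    task = outer()
    next(task)                 # run to the first yield inside inner()
    try:
        next(task)             # resume; inner() raises
    except ValueError as exc:
        return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]


names = frames_in_traceback()
```

The last three entries of names are outer, middle and inner, the same
chain an ordinary nested call would show.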


From tismer at  Wed Oct 17 10:25:03 2012
From: tismer at (Christian Tismer)
Date: Wed, 17 Oct 2012 10:25:03 +0200
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

Ok I'll add a buck...

On 16.10.12 20:40, Matthias Urlichs wrote:
> I'll have to put in my €.02 here ...
> Guido van Rossum <guido at ...> writes:
>> (2) We're at a fork in the road here. On the one hand, we could choose
>> to deeply integrate greenlets/gevents into the standard library.
> Yes.
> I have two and a half reasons for this.
> (½) Ultimately I think that switching stacks around is always going to be faster
> than unwinding and re-winding things with yield().

If you are emulating things in Python, that may be true.

Also if you are really only switching stacks, that may be true.

But neither assumption holds here; see below.
> (1) It's a whole lot easier to debug a problem with gevent than with anything
> which uses yield / Deferreds / asyncore / whatever. With gevent, you get a
> standard stack trace. With anything else, the "where did this call come from"
> information is not part of the call chain and thus is either unavailable, or
> will have to be carried around preemptively (with associated overhead).

I'm absolutely with you on the ease of straightforward coding.
But this new, efficient "yield from" is a big step in that direction;
see Greg's reply.

> (2) Nothing against Twisted or any other async frameworks, but writing any
> nontrivial program in it requires warping my brain into something that's *not*
> second nature in Python, and never going to be.

Same here.
> Python is not Javascript; if you want to use the "loads of callbacks"
> programming style, use node.js.
> Personal experience: I have written an interpreter for an asynchronous and
> vaguely Pythonic language which I use for home automation, my lawn sprinklers,
> and related stuff (which I should probably release in some form). The code was
> previously based on Twisted and was impossible to debug. It now uses gevent and
> Just Works.

You are using gevent, which uses greenlet!
That means no pure stack switching, but the stack is sliced and
moved onto the heap.
But that technique (originally from Stackless 2.0) is known to be
5-10 times slower, compared to a cooperative context switching
that is built into the interpreter.

This story is far from over.
Even PyPy with all its advanced technology still depends on stack slicing
when it emulates concurrency.

Python 3.3 has made a huge move, because this efficient nesting
of generators can deeply influence how people code,
maybe with the effect that stack tricks lose more of their
importance. I expect more like this to come.

Greenlets are great. Stack inversion is faster.

Christian Tismer             :^)   <mailto:tismer at>
Software Consulting          :     Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121     :    *Starship*
14482 Potsdam                :     PGP key ->
phone +49 173 24 18 776  fax +49 (30) 700143-0023
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
       whom do you want to sponsor today?

From greg.ewing at  Wed Oct 17 12:16:17 2012
From: greg.ewing at (Greg Ewing)
Date: Wed, 17 Oct 2012 23:16:17 +1300
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:
> On Tue, Oct 16, 2012 at 12:20 AM, Greg Ewing
> <greg.ewing at> wrote:
>>it blurs the distinction
>>between invoking a subtask synchronously and waiting for the
>>result of a previously spawned independent task.
> Are you sure you really want to distinguish between those though?

I think I do. Partly because I feel that not doing so would
make code harder to reason about. Async stuff is difficult
enough as it is without hiding the boundaries between one
thread of control and another.

There are technical reasons as well. If you use 'yield from'
to wait for completion of an independent task, then it would
seem like you should be able to do this:

    t1 = task1()
    t2 = task2()
    r1 = yield from t1
    r2 = yield from t2

But that can't work -- the object that you wait on has to be
different from the generator instance passed to spawn(). The
reason is that if the task finishes before anyone waits on it,
the return value needs to be stored somewhere.

Having spawn() return an object that deliberately does *not*
have the interface of a generator, and having to explicitly wait
for it, makes it much less likely that anyone will make that kind
of mistake. If you wrote

    t1 = task1()
    t2 = task2()
    r1 = yield from t1.wait()
    r2 = yield from t2.wait()

you would quickly get an exception, because generators don't
have a wait() method.
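
A minimal sketch of that design, under the assumption of a simple
round-robin scheduler. Task, spawn, run and the polling wait() are
illustrative names and choices, not the API from the tutorial:

```python
from collections import deque


class Task:
    """Handle returned by spawn(); deliberately not a generator."""
    def __init__(self, gen):
        self.gen = gen
        self.done = False
        self.result = None

    def wait(self):
        # Polls for simplicity; a real scheduler would block the waiter.
        while not self.done:
            yield
        return self.result


ready = deque()


def spawn(gen):
    task = Task(gen)
    ready.append(task)
    return task


def run():
    while ready:
        task = ready.popleft()
        try:
            next(task.gen)
        except StopIteration as stop:
            task.done = True
            task.result = stop.value   # stored even if nobody waits yet
        else:
            ready.append(task)


def task1():
    yield
    return "one"


def task2():
    return "two"
    yield  # unreachable; only marks this function as a generator


def main(out):
    t1 = spawn(task1())
    t2 = spawn(task2())
    out.append((yield from t1.wait()))
    out.append((yield from t2.wait()))   # t2 finished long ago


out = []
spawn(main(out))
run()
```

Because the Task object keeps the return value, it does not matter that
task2 finishes before main gets around to waiting for it.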


From greg.ewing at  Wed Oct 17 12:27:49 2012
From: greg.ewing at (Greg Ewing)
Date: Wed, 17 Oct 2012 23:27:49 +1300
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

I wrote:
> Just to be clear, I'm not saying there should only be one
> scheduler *implementation* in existence

But having said that, I can't see much reason why you would
need to have more than one scheduler implementation.

Multiple event loop implementations are necessary because
async I/O needs to be done in different ways on different
platforms.

But the scheduler we're talking about is all pure Python.
If the interface is well known and universally used, and
there's a good implementation of it in the standard
library, why would anyone want another one?


From greg.ewing at  Wed Oct 17 12:38:41 2012
From: greg.ewing at (Greg Ewing)
Date: Wed, 17 Oct 2012 23:38:41 +1300
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:
> On Tue, Oct 16, 2012 at 2:14 PM, Greg Ewing <greg.ewing at> wrote:
>>The important idea is that just because you spawn a task, it
>>doesn't necessarily follow that you want to be regarded as the
>>*parent* of that task and receive its exceptions.
> Maybe. But the opposite doesn't follow either. It's a toss-up between
> the spawner and the waiter.

So maybe spawn() should have an option indicating that the
spawning task is to receive exceptions occurring in the
spawned task.


From tismer at  Wed Oct 17 14:27:43 2012
From: tismer at (Christian Tismer)
Date: Wed, 17 Oct 2012 14:27:43 +0200
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On 17.10.12 12:38, Greg Ewing wrote:
> Guido van Rossum wrote:
>> On Tue, Oct 16, 2012 at 2:14 PM, Greg Ewing 
>> <greg.ewing at> wrote:
>>> The important idea is that just because you spawn a task, it
>>> doesn't necessarily follow that you want to be regarded as the
>>> *parent* of that task and receive its exceptions.
>> Maybe. But the opposite doesn't follow either. It's a toss-up between
>> the spawner and the waiter.
> So maybe spawn() should have an option indicating that the
> spawning task is to receive exceptions occurring in the
> spawned task.

No idea if that helps here, but the same problem occurred for us
as well. It is not always clear if an exception should be handled
in a certain context, or if it should be passed on and get raised
later in the context that is concerned.

For that, Stackless has introduced a _bomb_ object that encapsulates
an exception, in order to let it pass through the normal call/yield/return
interface. It is used to send an exception over a channel, which
will explode (raise that exception) when the receiver picks it up later.

I could think of something similar as a way to collect very many
results in a join construct that collects everything without the need
to handle each exception in the very moment it was raised.

That would make it possible to collect results efficiently using 'yield
from' and inspect the results later.

Probably nothing new, just mentioned an idea...
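
A rough sketch of the idea in plain Python. This is not Stackless's
implementation; Bomb, capture and collect are hypothetical names used
only to show an exception travelling as an ordinary value:

```python
class Bomb:
    """Carries an exception through the normal return path."""
    def __init__(self, exc):
        self.exc = exc

    def explode(self):
        # Re-raise the stored exception in the receiver's context.
        raise self.exc


def capture(func, *args):
    """Run func; return its result, or a Bomb wrapping what it raised."""
    try:
        return func(*args)
    except Exception as exc:
        return Bomb(exc)


def collect(jobs):
    """Join-like helper: gather every outcome without stopping early."""
    return [capture(func, *args) for func, args in jobs]


results = collect([(int, ("42",)), (int, ("oops",)), (len, ("abc",))])
values = [r for r in results if not isinstance(r, Bomb)]
bombs = [r for r in results if isinstance(r, Bomb)]
```

The failed job does not interrupt the others; its exception is raised
only if and when someone calls explode() on the corresponding entry.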


From Steve.Dower at  Wed Oct 17 15:54:22 2012
From: Steve.Dower at (Steve Dower)
Date: Wed, 17 Oct 2012 13:54:22 +0000
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
	<>, <>
Message-ID: <>

> But the scheduler we're talking about is all pure Python.
> If the interface is well known and universally used, and
> there's a good implementation of it in the standard
> library, why would anyone want another one?

Probably because they already have another one and can't get rid of it. Whether or not we are trying to include GUI development in this, I can guarantee that people will try and use it with a GUI message loop (to avoid blocking on IO, largely). In this case we'd almost certainly need a different implementation for Wx/Tcl/whatever.

"Universally used" is a nice idea, but it will take a long time to get there. A well known interface, especially one that doesn't require the loop itself (i.e. it doesn't have a blocking run() function), lets users write thin wrappers, like the one we did for Tcl: (CallableContext (the 'scheduler') base class is in There needs to be a way to change which one is used at runtime, but there only needs to be one per thread.


From guido at  Wed Oct 17 16:55:57 2012
From: guido at (Guido van Rossum)
Date: Wed, 17 Oct 2012 07:55:57 -0700
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Oct 17, 2012 at 3:16 AM, Greg Ewing <greg.ewing at> wrote:
> Guido van Rossum wrote:
>> On Tue, Oct 16, 2012 at 12:20 AM, Greg Ewing
>> <greg.ewing at> wrote:
>>> it blurs the distinction
>>> between invoking a subtask synchronously and waiting for the
>>> result of a previously spawned independent task.
>> Are you sure you really want to distinguish between those though?
> I think I do. Partly because I feel that not doing so would
> make code harder to reason about. Async stuff is difficult
> enough as it is without hiding the boundaries between one
> thread of control and another.
> There are technical reasons as well. If you use 'yield from'
> to wait for completion of an independent task, then it would
> seem like you should be able to do this:
>    t1 = task1()
>    t2 = task2()
>    spawn(t1)
>    spawn(t2)
>    r1 = yield from t1
>    r2 = yield from t2
> But that can't work -- the object that you wait on has to be
> different from the generator instance passed to spawn(). The
> reason is that if the task finishes before anyone waits on it,
> the return value needs to be stored somewhere.
> Having spawn() return an object that deliberately does *not*
> have the interface of a generator, and having to explicitly wait
> for it, makes it much less likely that anyone will make that kind
> of mistake. If you wrote
>    t1 = task1()
>    t2 = task2()
>    spawn(t1)
>    spawn(t2)
>    r1 = yield from t1.wait()
>    r2 = yield from t2.wait()
> you would quickly get an exception, because generators don't
> have a wait() method.

Ack. I get it. It's like the difference between calling a function vs.
running it in an OS thread.

--Guido van Rossum

From guido at  Wed Oct 17 16:58:52 2012
From: guido at (Guido van Rossum)
Date: Wed, 17 Oct 2012 07:58:52 -0700
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Wed, Oct 17, 2012 at 5:27 AM, Christian Tismer <tismer at> wrote:
> On 17.10.12 12:38, Greg Ewing wrote:
>> Guido van Rossum wrote:
>>> On Tue, Oct 16, 2012 at 2:14 PM, Greg Ewing <greg.ewing at>
>>> wrote:
>>>> The important idea is that just because you spawn a task, it
>>>> doesn't necessarily follow that you want to be regarded as the
>>>> *parent* of that task and receive its exceptions.
>>> Maybe. But the opposite doesn't follow either. It's a toss-up between
>>> the spawner and the waiter.
>> So maybe spawn() should have an option indicating that the
>> spawning task is to receive exceptions occurring in the
>> spawned task.
> No idea if that helps here, but the same problem occurred for us
> as well. It is not always clear if an exception should be handled
> in a certain context, or if it should be passed on and get raised
> later in the context that is concerned.
> For that, Stackless has introduced a _bomb_ object that encapsulates
> an exception, in order to let it pass through the normal call/yield/return
> interface. It is used to send an exception over a channel, which
> will explode (raise that exception) when the receiver picks it up later.

Hmm... That sounds a little as if your original design for the channel
only supported transferring values. At least for NDB, all channels
support exceptions and tracebacks as an explicit alternative to the

> I could think of something similar as a way to collect very many
> results in a join construct that collects everything without the need
> to handle each exception in the very moment it was raised.
> That would make it possible to collect results efficiently using 'yield
> from' and inspect the results later.
> Probably nothing new, just mentioned an idea...

I do think we're hashing out important ideas...

--Guido van Rossum

From greg.ewing at  Wed Oct 17 23:49:48 2012
From: greg.ewing at (Greg Ewing)
Date: Thu, 18 Oct 2012 10:49:48 +1300
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

Piet Delport wrote:

> In particular, this suspend() could be used to integrate fairly directly
> with callback-based APIs: for example, if you have a Twisted Deferred,
> you could do:
>     result = yield suspend(d.addCallback)

I've been thinking about how to express this using the
primitives provided by the scheduler in my tutorial.
I don't actually have a primitive that simply suspends
a task; instead, I have one that moves the current task
from the ready list to a specified list:


Similarly, I don't have a primitive that explicitly
resumes a particular task[1] -- only one that takes the first
task off a specified list and resumes it:


I think this is a good idea if we want to be able to
cancel tasks, because a cancelled task ought to cleanly
disappear from the system, without any risk that something
will try to schedule it again. This is achievable if we
maintain the invariant that a task always belongs to some
queue, and the scheduler knows about that queue.

Given these primitives, we can define

    def wakeup_callback(queue):
       return lambda: scheduler.unblock(queue)


    def wait_for_callback(add_callback):
       q = []

This is starting to look rather like a semaphore. If we
assume semaphores as a facility provided by the library,
then it becomes very straightforward:

    def wait_for_callback(add_callback):
       s = Semaphore()
       yield from s.wait()

That assumes the callback is single-use. But a semaphore
can also handle multi-use callbacks: you just keep the
semaphore around and repeatedly wait on it. You will get
woken up once for each time the callback is called.

    s = Semaphore()
    while we_are_still_interested():
       yield from s.wait()


[1] Actually I do, but I'm thinking it shouldn't be
exposed as part of the public API for reasons given here.
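
A self-contained sketch of the semaphore approach, assuming a toy
scheduler with block() and unblock() as described above. Everything
here (the Scheduler class, Semaphore.notify, the callbacks list) is an
assumption made for the example, not the code from the tutorial:

```python
from collections import deque


class Scheduler:
    def __init__(self):
        self.ready = deque()
        self.current = None

    def spawn(self, gen):
        self.ready.append(gen)

    def block(self, queue):
        # Move the current task from the ready list to `queue`.
        queue.append(self.current)
        self.current = None

    def unblock(self, queue):
        # Take the first task off `queue` and make it ready again.
        if queue:
            self.ready.append(queue.popleft())

    def run(self):
        while self.ready:
            self.current = self.ready.popleft()
            try:
                next(self.current)
            except StopIteration:
                continue
            if self.current is not None:     # not blocked: still runnable
                self.ready.append(self.current)


scheduler = Scheduler()


class Semaphore:
    def __init__(self):
        self.count = 0
        self.waiters = deque()

    def notify(self, *args):
        if self.waiters:
            scheduler.unblock(self.waiters)
        else:
            self.count += 1                  # remember the early call

    def wait(self):
        if self.count:
            self.count -= 1
        else:
            scheduler.block(self.waiters)
            yield


def wait_for_callback(add_callback):
    s = Semaphore()
    add_callback(s.notify)
    yield from s.wait()


callbacks, log = [], []


def waiter():
    log.append("waiting")
    yield from wait_for_callback(callbacks.append)
    log.append("woken")


def firer():
    log.append("firing")
    callbacks.pop()()                        # invoke the registered callback
    yield


scheduler.spawn(waiter())
scheduler.spawn(firer())
scheduler.run()
```

A blocked task sits on the semaphore's own queue and nowhere else, so
it always belongs to exactly one queue, as the invariant above requires.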


From greg.ewing at  Thu Oct 18 09:49:20 2012
From: greg.ewing at (Greg Ewing)
Date: Thu, 18 Oct 2012 20:49:20 +1300
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

I've converted my tutorial on generator-based tasks
for Python 3.3, tidied it up a bit and posted it here:


From _ at  Thu Oct 18 13:01:30 2012
From: _ at (Laurens Van Houtven)
Date: Thu, 18 Oct 2012 13:01:30 +0200
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <>
References: <>
Message-ID: <>

Do you use gevent's monkeypatch-the-stdlib feature?

On Tue, Oct 16, 2012 at 8:40 PM, Matthias Urlichs <matthias at> wrote:

> I'll have to put in my €.02 here ...
> Guido van Rossum <guido at ...> writes:
> > (2) We're at a fork in the road here. On the one hand, we could choose
> > to deeply integrate greenlets/gevents into the standard library.
> Yes.
> I have two and a half reasons for this.
> (½) Ultimately I think that switching stacks around is always going to be
> faster
> than unwinding and re-winding things with yield().

That seems like something that can be factually proven or counterproven.

> (1) It's a whole lot easier to debug a problem with gevent than with
> anything
> which uses yield / Deferreds / asyncore / whatever. With gevent, you get a
> standard stack trace. With anything else, the "where did this call come
> from"
> information is not part of the call chain and thus is either unavailable,
> or
> will have to be carried around preemptively (with associated overhead).

gevent uses stack slicing, which IIUC is pretty expensive. Why is it not
subject to the performance overhead you mention?

Can you give an example of such a crappy stack trace in twisted? I develop
in it all day, and get pretty decent stack traces. The closest thing I have
to a crappy stack trace is when doing functional tests with an RPC API --
obviously on the client side all I'm going to see is a fairly crappy
just-an-exception. That's okay, I also get the server side exception that
looks like a plain old Python traceback to me and tells me exactly where
the problem is from.

> (2) Nothing against Twisted or any other async frameworks, but writing any
> nontrivial program in it requires warping my brain into something that's
> *not*
> second nature in Python, and never going to be.

Which ones are you thinking about other than twisted? It seems that the
issue you are describing is one of semantics, not so much of whether or not
it actually does things asynchronously under the hood, as e.g. gevent does.

> Python is not Javascript; if you want to use the "loads of callbacks"
> programming style, use node.js.

None of the solutions on the table have node.js-style "loads of callbacks".
Everything has some way of structuring them. It's either implicit switches
(as in "can happen in the caller"), explicit switches (as in yield/yield
from) or something like deferreds, some options having both of the latter.

> Personal experience: I have written an interpreter for an asynchronous and
> vaguely Pythonic language which I use for home automation, my lawn
> sprinklers,
> and related stuff (which I should probably release in some form). The code
> was
> previously based on Twisted and was impossible to debug. It now uses
> gevent and
> Just Works.

If you have undebuggable code samples from that I'd love to take a look.

> --
> -- Matthias Urlichs

From carlopires at  Thu Oct 18 14:04:05 2012
From: carlopires at (Carlo Pires)
Date: Thu, 18 Oct 2012 09:04:05 -0300
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

2012/10/18 Greg Ewing <greg.ewing at>

> I've converted my tutorial on generator-based tasks
> for Python 3.3, tidied it up a bit and posted it here:

I liked it. I was kind of confused about use of yield/yield from in this
style of async. Now things seem to be pretty clear.
  Carlo Pires

From jimjjewett at  Fri Oct 19 00:10:32 2012
From: jimjjewett at (Jim Jewett)
Date: Thu, 18 Oct 2012 18:10:32 -0400
Subject: [Python-ideas] Is there a good reason to use * for
In-Reply-To: <>
References: <>
Message-ID: <>

On 10/12/12, Ethan Furman <ethan at> wrote:
> <aside>
> In college we dropped the × and just wrote stuff like:
> (x + z)(x - y)
> but we can't do that in Python because they are function calls.
> </aside>

I think your mailer must have stripped out unicode character 0x2062,

Some spelling mistakes are harder to see than others...


From tismer at  Fri Oct 19 03:12:58 2012
From: tismer at (Christian Tismer)
Date: Fri, 19 Oct 2012 03:12:58 +0200
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

Hi Greg,

coming back to this after quite a storm in my brain...

On 16.10.12 02:44, Greg Ewing wrote:
> Christian Tismer wrote:
>> Right, CPython still keeps unnecessary crap on the C stack.
> It's not just Python leaving stuff on the stack that's a
> problem, it's external C code that calls back into Python.

That is something I intend to ignore.
Of course there are quite some situations where callbacks
into Python are a problem, but I don't want to put this into Python.

There are ways to cope with this, for instance using greenlet as
an optional extension module that handles these cases.
Alternatively, Python itself can do it with strictly controlled threads.

But I think leaving that out will simplify matters a lot and keep Python
clean. In the end, I want to model something that is likely to be

>> But that's not the point right now, because on the other hand,
>> in the context of a possible yield (from or not), the C stack
>> is clean, and this enables switching.
>> And actually in such clean positions, Stackless Python (as opposed to
>> Greenlets) does soft-switching, which is very similar to what the 
>> generators
>> are doing - there is no assembly stuff involved at all.
> But the assembly code still needs to be there to handle the
> cases where you *can't* do soft switching. It's the presence
> of the code that's the issue here, not how frequently it
> gets called.

No, I'm intending to really rip that out.
Or better, I want to do a rewrite of a subset of Stackless,
actually the functionality that allows implementing greenlets
or multi-level generators, task scheduling and so on.

In effect, I want to find something that enables some extended
switching. Emulated, without hacking the kernel in the first place.

Generators are restricted to call/yield/return positions, and
I think that's fine. What can be switched is totally clear by
definition, and I like that. I'm talking of exactly that.

What I dislike is a different topic ;-)

>> I have begun studying the code for YIELD_FROM. As it is written, every
>> next iteration elevates the chain of generators once up and down.
>> Maybe that can be avoided by changing the frame chain, so this can 
>> become
>> a cheaper O(1) operation.
> My original implementation of yield-from actually *did* avoid
> this, by keeping a C-level pointer chain of yielding-from frames.
> But that part was ripped out at the last minute when someone
> discovered that it had a detrimental effect on tracebacks.
> There are probably other ways the traceback problem could be
> fixed, so maybe we will get this optimisation back one day.

Ok, let's ignore this O(n) problem for now. _yield from_ is anyway
probably faster by more than an order of magnitude, so it will
serve your purpose (nesting generators) pretty well.

My problem is different because I want a scaling building block
for building higher level structures, and I would love to build them
using _yield from_ .

There are a few things which completely contradict my thinking:

- _yield from_ works from the top. That is, if I have five nested iterators
    and I want to drive them, then I have to call the root generator?!
    I see that that works, but it is against everything I'm used to.

    How can I inject the info that I want to switch context?

- generators always yield to the caller, and also return values to the
    caller. What I'm looking for is to express a switch so something

- generators are able to free the stack, when they yield. But when they
    are active, they use the full stack. At least when I follow the pattern
    "generator is calling sub-generator".
    A deeply nested recursion is therefore something to avoid. :-(

Now I'm playing around with different approaches to model something
flexible that gives me more freedom. Right now I'm trying a slightly
perverse approach to give me an _unwindable_, merely a frame-like
object that can vanish on demand.

I'm also experimenting with emulating different kinds of _yield_.
Since there is only one kind of yield to one target, I have the problem
of distinguishing between different purposes.


I can take a set of nested functions in their native form.
Then replacing ordinary calls by _yield from_ and inserting proper
yields before actually returning, I now have the equivalent of a nested
function call, that I can drive with another _yield from_ .
This is now a function that permanently releases the stack.

Now I would like to give one of the nested functions the ability to
transfer execution somewhere else. The support is there insofar
as the stack is freed all the time. But this function that wants to
switch needs to pass the fact that it wants to switch, plus the target
somewhere. As I understood it, I would need to yield that to the
driver function.
In order to do that, I would need to yield a tuple or a bound object.
This is a different purpose than the simple driver functionality.

Do you see it? In my understanding, a switch would not be driven from
the top and then dispatched upon, but a called function below the
function to be switched would modify something that leads to a
switch as a result.
In this example, the generator-emulated function would be driven
by thousands of yields, kind of polled to catch the one event that
needs to be supported by a switching action.

This looks wrong to me, like doing things upside down.

To shorten this: I have the problem that I have your very efficient yield
collector, but I need to dispatch on what is intended by the yield,
instead of initiating a reaction from where I am.
All in all, I can't get rid of the thought "un-pythonic".
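
To make the pattern concrete, here is a small sketch of the arrangement
being described: a nested function yields a ("switch", target) tuple,
and a driver at the top dispatches on it. The tuple convention and the
names deep, task and drive are invented for this example:

```python
def deep(log, name, target):
    log.append(name + " wants " + target)
    yield ("switch", target)         # the request travels up to the driver
    log.append(name + " resumed")


def task(log, name, target):
    # Ordinary call replaced by yield-from; the request passes through.
    yield from deep(log, name, target)


def drive(tasks, start):
    """Top-level driver: resumes tasks and dispatches on what they yield."""
    current = start
    while tasks:
        try:
            request = next(tasks[current])
        except StopIteration:
            del tasks[current]
            current = next(iter(tasks), None)   # pick any remaining task
            continue
        if isinstance(request, tuple) and request[0] == "switch":
            if request[1] in tasks:
                current = request[1]


log = []
tasks = {"a": task(log, "a", "b"), "b": task(log, "b", "a")}
drive(tasks, "a")
```

The switch is decided by the driver inspecting each yielded value, not
by the function that asked for it, which is the "upside down" feeling
described above.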

So I'm still thinking of a frame-like object that allows me to control
its execution, let it vanish, and so on, and use it as a building block.
As always, I'm feeling bad when going this road, because I want to use
the efficient _yield from_ as much as possible.

But it may be my missing experience.

Do you understand, and maybe see where I have the wrong
brain shortcuts?
How do you write something composable that scales?

Cheers -- Chris


From jimjjewett at  Fri Oct 19 05:46:40 2012
From: jimjjewett at (Jim Jewett)
Date: Thu, 18 Oct 2012 23:46:40 -0400
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

Is the goal really to provide "The async API of the future", or just
to provide "a stdlib module which provides one adequate way to do

I think the yield and yield from solutions all need too much magical
scaffolding to be The One True Way, but I don't mind such conventions
as much when they're part of a particular example class, such as

To stretch an analogy, generators and context managers are different
concepts.  Allowing certain generators to be used as context managers
(by using the "with" keyword)  is fine.  But I wouldn't want to give
up all the other uses of generators.

If yield starts implying other magical properties that are only useful
when communicating with a scheduler, rather than a regular caller ...
I'm afraid that muddies the concept up too much for my taste.

More specific concerns below:

On 10/12/12, Guido van Rossum <guido at> wrote:

> But the only use for send() on a generator is when using it as a
> coroutine for a concurrent tasks system -- send() really makes no
> sense for generators used as iterators. And you're claiming, it seems,
> that you prefer yield-from for concurrent tasks.

But the data doesn't have to be scheduling information; it can be new
data, a seed for an algorithm, a command to switch or reset the state
... locking it to the scheduler is part of what worries me.
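
For instance, a sketch of send() feeding new data to a generator, with
no scheduler involved (running_average is just an illustrative name):

```python
def running_average():
    """Receive numbers via send() and yield the average so far."""
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average
        total += value
        count += 1
        average = total / count


avg = running_average()
next(avg)                  # prime the generator up to its first yield
first = avg.send(10)
second = avg.send(20)
```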

> On Thu, Oct 11, 2012 at 6:32 PM, Greg Ewing <greg.ewing at>

>> Keep in mind that a value yielded by a generator being used as
>> part of a coroutine is *not* seen by code calling it with
>> yield-from.

That is part of what bugs me about the yield-from examples.

Until this discussion, I had thought of yield-from as factoring out
some code that was still conceptually embedded within the parent
generator.  This (perhaps correctly) makes it seem more like a
temporary replacement, as if the parent were no longer there at all.

But with the yield channel reserved for scheduling overhead, the
"generator" can't really generate anything, except through side

> ... I feel that "value = yield <something that returns a Future>"
> is quite a good paradigm,

To me, it seems fine for a particular concrete scheduler, but too
strong an assumption for an abstract API.

I can mostly* understand:

    YieldScheduler assumes any yielded data is another Task; it will
    schedule that task, and cause the original (yielding) Task to wait
    until the new task is completed.

But I wonder what I'm missing with:

    Generators should only yield (expressions that create) Futures;
    the scheduler will automatically unwrap the future and send (or
    throw) the result back into the parent (or other ancestor)
    Generator, which will then be resumed.

* "mostly", because if my task is willing to wait for the subtask to
complete, then why not just use a blocking call in the first place?
Is it just because switching to another task is lighter weight than
letting a thread block?

What happens if a generator does yield something other than a Future?
Will the generator be rescheduled in an already-runnable (as opposed
to waiting) state?  Will it never be resumed?  Will that object be
auto-wrapped in a Future for the benefit of whichever other co-routine
originally made the request?

Are generators assumed to run to exhaustion, or is some sort of driver
needed to keep pumping them?

> ... It would be horrible to require C to create a fake generator.

Would it have to wrap results in a fake Future, so that the scheduler
could properly unwrap?

> ...Well, I'm talking about a decorator that you *always* apply, and which
> does nothing (or very little) when wrapping a generator, but adds
> generator behavior when wrapping a non-generator function.

Why is an always-applied decorator any less intrusive than a mandatory
(mixin) base class?

> (1) Calling an async operation and waiting for its result, using yield

> Futures:
>   result = yield some_async_op(args)

I was repeatedly confused over whether "result" would be a Future that
still needed resolution, and the example code wasn't always
consistent.  As I understand it now, the scheduler (not just the
particular implementation, but the API) has to automatically treat any
yielded data as a future, resolve that future to its result, and then
send (or throw) that result (as opposed to the future) back into
either the parent task or the least distant ancestor task not to be
using "yield from".

> Yield-from:
>   result = yield from some_async_op(args)

So the generator containing this code suspends itself entirely until
some_async_op is exhausted, at which point result will be the
StopIteration?  (Or None?)  Non-Exception results get passed straight
to the least-distant ancestor task not using "yield from", but
Exceptions propagate through one generation at a time.

> (2) Setting the result of an async operation

> Futures:
>   f.set_result(value)  # From any callback

PEP 3148 considers set_result private to the executor.  Can that
always be done from arbitrary callbacks?  Can it be done more than
once?

I think for the normal case, a task should just return its value, and
the Future or the Scheduler should be responsible for calling
set_result.

> Yield-from:
>   return value  # From the outermost generator

Why only the outermost?  I'm guessing it is because everything else is
suspended, and even if a mid-level generator is explicitly re-added to
the task queue, it can't actually continue because of re-entrancy.

> (3) Handling an exception
> Futures:
>   try:
>     result = yield some_async_op(args)
>   except MyException:
>     <handle exception>

So the scheduler does have to unpack the future, and throw rather than send.
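
A toy sketch of that unpack-then-send-or-throw step, as I understand it.
The run(), done() and failed() helpers are hypothetical; the only real
API used is concurrent.futures.Future.  It assumes every yielded Future
is already complete, where a real scheduler would suspend the task until
it is, and it treats a non-Future yield as an error, which is only one
possible answer to that question:

```python
from concurrent.futures import Future


def done(value):
    """Hypothetical helper: a Future that already holds a result."""
    future = Future()
    future.set_result(value)
    return future


def failed(exc):
    """Hypothetical helper: a Future that already holds an exception."""
    future = Future()
    future.set_exception(exc)
    return future


def run(task):
    """Drive one generator that yields completed Futures."""
    to_send, to_throw = None, None
    while True:
        try:
            if to_throw is not None:
                yielded = task.throw(to_throw)   # failure: throw it in
            else:
                yielded = task.send(to_send)     # success: send the result
        except StopIteration as stop:
            return stop.value
        if not isinstance(yielded, Future):
            raise TypeError("task yielded a non-Future: %r" % (yielded,))
        # Unwrap the Future so the task sees a result, not the Future.
        to_send, to_throw = None, yielded.exception()
        if to_throw is None:
            to_send = yielded.result()


def task():
    first = yield done(1)
    try:
        yield failed(ValueError("boom"))
    except ValueError as exc:
        return (first, str(exc))


outcome = run(task())
```

The task itself never touches a Future's result() or exception(); the
driver does all of the unboxing.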

> (4) Raising an exception as the outcome of an async operation

> Futures:
>   f.set_exception(<Exception instance>)

Again, shouldn't the task itself just raise, and let the future (or
the scheduler) call that?

> Yield-from:
>   raise <Exception instance or class>  # From any of the generators

So it doesn't need to be wrapped in a Future, until it needs to cross
back over a "schedule this  asynchronously" gulf?

> (5) Having one async operation invoke another async operation

> Futures:
>   @task
>   def outer(args):
>     res = yield inner(args)
>     return res

> Yield-from:
>   def outer(args):
>     res = yield from inner(args)
>     return res

Will it ever get to continue processing (under either model) before
inner exhausts itself and stops yielding?

> Note: I'm including this because in the Futures case, each level of
> yield requires the creation of a separate Future.

Only because of the auto-unboxing.  And if the generator suspends
itself to wait for the future, then the future will be resolved before
control returns to the generator's own parents, so those per-layer
Futures won't really add anything.

> (6) Spawning off multiple async subtasks
> Futures:
>   f1 = subtask1(args1)  # Note: no yield!!!
>   f2 = subtask2(args2)
>   res1, res2 = yield f1, f2

ah.  That makes a bit more sense, though the tuple of futures does
complicate the automagic unboxing.  (Which containers, to which
levels, have to be resolved?)

> Yield-from:
>   ??????????
> *** Greg, can you come up with a good idiom to spell concurrency at
> this level? Your example only has concurrency in the philosophers
> example, but it appears to interact directly with the scheduler, and
> the philosophers don't return values. ***

Why wouldn't this be the same as you already wrote without yield-from?
Two subtasks were submitted but not waited for.  I suppose you could
yield from a generator that submits new subtasks every time it
generates something, but that would be solving a more complicated
problem.  (So it wouldn't be a consequence of the "yield from".)

> (7) Checking whether an operation is already complete

> Futures:
>   if f.done(): ...

If f was yielded, it is done, or this code wouldn't be running again to check.

> Yield-from:
>   ?????????????

And again, if the futures were yielded (even through a yield from)
then they're already unboxed; otherwise, you can still check f.done

> (8) Getting the result of an operation multiple times
> Futures:
>   f = async_op(args)
>   # squirrel away a reference to f somewhere else
>   r = yield f
>   # ... later, elsewhere
>   r = f.result()

Why do you have to squirrel away the reference?  Are you assuming that
the async scheduler will mess with the locals so that f is no longer
valid?

> Yield-from:
>   ???????????????

This, you cannot reasonably do; the nature of yield-from means that
the unresolved futures were never visible within this generator; they
were resolved by the scheduler and the results handed straight to the
generator's ancestor.

> (9) Canceling an operation
> Futures:
>   f.cancel()
> Yield-from:
>   ???????????????
> Note: I haven't needed canceling yet, and I believe Devin said that
> Twisted just got rid of it. However some of the JS Deferred
> implementations seem to support it.

I think that once you've called "yield from", the generator making
that call is suspended until the child generator completes.  But a
different thread of control could cancel the active (most-descended)
generator.

> (10) Registering additional callbacks
> Futures:
>   f.add_done_callback(callback)
> Yield-from:
>   ???????
> Note: this is used in NDB to trigger "hooks" that should run e.g. when
> a database write completes. The user's code just writes yield
> ent.put_async(); the trigger is automatically called by the Future's
> machinery. This also uses (8).

I think you would have to add the callbacks within the subgenerator
that is spawning f.

That, or un-inline the yield from, and lose the automated send-throw forwarding.


From greg.ewing at  Fri Oct 19 07:15:33 2012
From: greg.ewing at (Greg Ewing)
Date: Fri, 19 Oct 2012 18:15:33 +1300
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

Christian Tismer wrote:

> - generators are able to free the stack, when they yield. But when they
>    are active, they use the full stack. At least when I follow the pattern
>    "generator is calling sub-generator".
>    A deeply nested recursion is therefore something to avoid. :-(

Only if yield-from chains aren't optimised the way they
used to be.

In any case, for the application we're talking about here,
the difference will probably not be noticeable.

> But this function that wants to
> switch needs to pass the fact that it wants to switch, plus the target
> somewhere. As I understood it, I would need to yield that to the
> driver function.

You understand incorrectly. In my scheduler, the yields
don't send or receive values at all. Communicating with the
scheduler, for example to tell it to allow another task to
run, is done by calling functions. A yield must be done to
actually allow a switch, but the yield itself doesn't send
any information.
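
A rough sketch of that style follows.  This is not the code from the
scheduler tutorial; spawn(), run() and the task names are invented for
illustration.  Tasks talk to the scheduler by calling spawn(), and the
bare yields carry no value; they only mark points where a switch may
happen:

```python
from collections import deque

ready = deque()  # tasks that are ready to run (hypothetical run queue)
log = []


def spawn(task):
    """Tell the scheduler about a new task with an ordinary function call."""
    ready.append(task)


def run():
    """Round-robin over ready tasks; whatever they yield is ignored."""
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished, drop it
        ready.append(task)


def child(name):
    for step in range(2):
        log.append((name, step))
        yield  # allow a switch, send nothing


def parent():
    log.append(("parent", "start"))
    spawn(child("b"))      # communicate by calling a function
    yield from child("a")  # run a sub-coroutine inline
    log.append(("parent", "end"))


spawn(parent())
run()
```

The log shows "a" and "b" interleaving even though "a" runs inside
parent() via yield from and "b" is a separately spawned task.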

> Do you see it? In my understanding, a switch would not be driven from
> the top and then dispatched upon, but a called function below the
> function to be switched would modify something that leads to a
> switch as a result.

That's pretty much what happens in my scheduler.

> Do you understand, and maybe see where I have the wrong
> brain shortcuts?
> How do you write something composable that scales?

I think you should study my scheduler tutorial. If you can
understand how that works, I think it will answer many of
your questions.


From tismer at  Fri Oct 19 14:05:20 2012
From: tismer at (Christian Tismer)
Date: Fri, 19 Oct 2012 14:05:20 +0200
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On 19.10.12 07:15, Greg Ewing wrote:
> Christian Tismer wrote:
>> - generators are able to free the stack, when they yield. But when they
>>    are active, they use the full stack. At least when I follow the 
>> pattern
>>    "generator is calling sub-generator".
>>    A deeply nested recursion is therefore something to avoid. :-(
> Only if yield-from chains aren't optimised the way they
> used to be.

Does that mean a very deep recursion would be efficient?

I'm trying to find that change in the hg history right now.

Can you give me a hint how your initial implementation
works, the initial patch source?
> ...
>> But this function that wants to
>> switch needs to pass the fact that it wants to switch, plus the target
>> somewhere. As I understood it, I would need to yield that to the
>> driver function.
> You understand incorrectly. In my scheduler, the yields
> don't send or receive values at all. Communicating with the
> scheduler, for example to tell it to allow another task to
> run, is done by calling functions. A yield must be done to
> actually allow a switch, but the yield itself doesn't send
> any information.

I already studied that in depth yesterday, and I like it quite a lot.
It is probably just the problem that I had with generators from their
beginning.
Christian Tismer             :^)   <mailto:tismer at>
Software Consulting          :     Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121     :    *Starship*
14482 Potsdam                :     PGP key ->
phone +49 173 24 18 776  fax +49 (30) 700143-0023
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
       whom do you want to sponsor today?

From ironfroggy at  Fri Oct 19 14:46:31 2012
From: ironfroggy at (Calvin Spealman)
Date: Fri, 19 Oct 2012 08:46:31 -0400
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Oct 18, 2012 at 11:46 PM, Jim Jewett <jimjjewett at> wrote:
> Is the goal really to provide "The async API of the future", or just
> to provide "a stdlib module which provides one adequate way to do
> async"?
> I think the yield and yield from solutions all need too much magical
> scaffolding to be The One True Way, but I don't mind such conventions
> as much when they're part of a particular example class, such as
> concurrent.schedulers.YieldScheduler.
> To stretch an analogy, generators and context managers are different
> concepts.  Allowing certain generators to be used as context managers
> (by using the "with" keyword)  is fine.  But I wouldn't want to give
> up all the other uses of generators.
> If yield starts implying other magical properties that are only useful
> when communicating with a scheduler, rather than a regular caller ...
> I'm afraid that muddies the concept up too much for my taste.

I think it is important that this is more than convention. I think that we
need our old friend TOOOWTDI (There's Only One Obvious Way To Do It)
here more than ever. This stuff is complicated, and following that
of what eventually is written on top of it is going to be complicated. Our
focus should be not on providing simple things like "async file read" but
crafting an environment where people can continue to write wonderfully
expressive and useful libraries that others can combine to their own needs.
If we don't provide the layer upon which these disparate pieces cooperate,
I fear the gain will be too small to justify the effort.

> More specific concerns below:
> On 10/12/12, Guido van Rossum <guido at> wrote:
>> But the only use for send() on a generator is when using it as a
>> coroutine for a concurrent tasks system -- send() really makes no
>> sense for generators used as iterators. And you're claiming, it seems,
>> that you prefer yield-from for concurrent tasks.
> But the data doesn't have to be scheduling information; it can be new
> data, a seed for an algorithm, a command to switch or reset the state
> ... locking it to the scheduler is part of what worries me.

When a coroutine yields, it yields *to the scheduler* so for whom else should
these values be?

>> On Thu, Oct 11, 2012 at 6:32 PM, Greg Ewing <greg.ewing at>
>>> Keep in mind that a value yielded by a generator being used as
>>> part of a coroutine is *not* seen by code calling it with
>>> yield-from.
> That is part of what bugs me about the yield-from examples.
> Until this discussion, I had thought of yield-from as factoring out
> some code that was still conceptually embedded within the parent
> generator.  This (perhaps correctly) makes it seem more like a
> temporary replacement, as if the parent were no longer there at all.
> But with the yield channel reserved for scheduling overhead, the
> "generator" can't really generate anything, except through side
> effects...

Don't forget that yield-from is an expression, not a statement. The
value eventually returned from the generator is the result of the yield-from,
so the generator still produces a final value.
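
A small sketch of both halves of that (inner() and outer() are
illustrative names only): values yielded by the subgenerator go straight
to whoever is driving the outer generator, while the subgenerator's
return value becomes the value of the yield-from expression:

```python
def inner():
    yield "from inner"  # seen by the driver of outer(), not by outer()
    return 42           # becomes the value of the yield-from expression


def outer():
    result = yield from inner()
    yield "outer got %r" % result


seen = list(outer())
```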

The fact that these are generators is for their ability to suspend, not to

>> ... I feel that "value = yield <something that returns a Future>"
>> is quite a good paradigm,
> To me, it seems fine for a particular concrete scheduler, but too
> strong an assumption for an abstract API.
> I can mostly* understand:
>     YieldScheduler assumes any yielded data is another Task; it will
>     schedule that task, and cause the original (yielding) Task to wait
>     until the new task is completed.
> But I wonder what I'm missing with:
>     Generators should only yield (expressions that create) Futures;
>     the scheduler will automatically unwrap the future and send (or
>     throw) the result back into the parent (or other ancestor)
>     Generator, which will then be resumed.
> * "mostly", because if my task is willing to wait for the subtask to
> complete, then why not just use a blocking call in the first place?
> Is it just because switching to another task is lighter weight than
> letting a thread block?

By blocking call do you mean "x = foo()" or "x = yield from foo()"?
Blocking call usually means the former, so if you mean that, then you neglect
to think of all the other tasks running which are not willing to wait.

> What happens if a generator does yield something other than a Future?
> Will the generator be rescheduled in an already-runnable (as opposed
> to waiting) state?  Will it never be resumed?  Will that object be
> auto-wrapped in a Future for the benefit of whichever other co-routine
> originally made the request?

I think if the scheduler doesn't know what to do with something, it should be an
error. That makes it easier to change things in the future.

> Are generators assumed to run to exhaustion, or is some sort of driver
> needed to keep pumping them?
>> ... It would be horrible to require C to create a fake generator.
> Would it have to wrap results in a fake Future, so that the scheduler
> could properly unwrap?
>> ...Well, I'm talking about a decorator that you *always* apply, and which
>> does nothing (or very little) when wrapping a generator, but adds
>> generator behavior when wrapping a non-generator function.
> Why is an always-applied decorator any less intrusive than a mandatory
> (mixin) base class?
>> (1) Calling an async operation and waiting for its result, using yield
>> Futures:
>>   result = yield some_async_op(args)
> I was repeatedly confused over whether "result" would be a Future that
> still needed resolution, and the example code wasn't always
> consistent.  As I understand it now, the scheduler (not just the
> particular implementation, but the API) has to automatically treat any
> yielded data as a future, resolve that future to its result, and then
> send (or throw) that result (as opposed to the future) back into
> either the parent task or the least distant ancestor task not to be
> using "yield from".
>> Yield-from:
>>   result = yield from some_async_op(args)
> So the generator containing this code suspends itself entirely until
> some_async_op is exhausted, at which point result will be the
> StopIteration?  (Or None?)  Non-Exception results get passed straight
> to the least-distant ancestor task not using "yield from", but
> Exceptions propagate through one generation at a time.

The result is not an exception, but the return of some_async_op(args)

>> (2) Setting the result of an async operation
>> Futures:
>>   f.set_result(value)  # From any callback
> PEP 3148 considers set_result private to the executor.  Can that
> always be done from arbitrary callbacks?  Can it be done more than
> once?
> I think for the normal case, a task should just return its value, and
> the Future or the Scheduler should be responsible for calling
> set_result.

I agree

>> Yield-from:
>>   return value  # From the outermost generator
> Why only the outermost?  I'm guessing it is because everything else is
> suspended, and even if a mid-level generator is explicitly re-added to
> the task queue, it can't actually continue because of re-entrancy.
>> (3) Handling an exception
>> Futures:
>>   try:
>>     result = yield some_async_op(args)
>>   except MyException:
>>     <handle exception>
> So the scheduler does have to unpack the future, and throw rather than send.
>> (4) Raising an exception as the outcome of an async operation
>> Futures:
>>   f.set_exception(<Exception instance>)
> Again, shouldn't the task itself just raise, and let the future (or
> the scheduler) call that?
>> Yield-from:
>>   raise <Exception instance or class>  # From any of the generators
> So it doesn't need to be wrapped in a Future, until it needs to cross
> back over a "schedule this  asynchronously" gulf?
>> (5) Having one async operation invoke another async operation
>> Futures:
>>   @task
>>   def outer(args):
>>     res = yield inner(args)
>>     return res
>> Yield-from:
>>   def outer(args):
>>     res = yield from inner(args)
>>     return res
> Will it ever get to continue processing (under either model) before
> inner exhausts itself and stops yielding?
>> Note: I'm including this because in the Futures case, each level of
>> yield requires the creation of a separate Future.
> Only because of the auto-unboxing.  And if the generator suspends
> itself to wait for the future, then the future will be resolved before
> control returns to the generator's own parents, so those per-layer
> Futures won't really add anything.
>> (6) Spawning off multiple async subtasks
>> Futures:
>>   f1 = subtask1(args1)  # Note: no yield!!!
>>   f2 = subtask2(args2)
>>   res1, res2 = yield f1, f2
> ah.  That makes a bit more sense, though the tuple of futures does
> complicate the automagic unboxing.  (Which containers, to which
> levels, have to be resolved?)
>> Yield-from:
>>   ??????????
>> *** Greg, can you come up with a good idiom to spell concurrency at
>> this level? Your example only has concurrency in the philosophers
>> example, but it appears to interact directly with the scheduler, and
>> the philosophers don't return values. ***
> Why wouldn't this be the same as you already wrote without yield-from?
> Two subtasks were submitted but not waited for.  I suppose you could
> yield from a generator that submits new subtasks every time it
> generates something, but that would be solving a more complicated
> problem.  (So it wouldn't be a consequence of the "yield from".)
>> (7) Checking whether an operation is already complete
>> Futures:
>>   if f.done(): ...
> If f was yielded, it is done, or this code wouldn't be running again to check.
>> Yield-from:
>>   ?????????????
> And again, if the futures were yielded (even through a yield from)
> then they're already unboxed; otherwise, you can still check f.done
>> (8) Getting the result of an operation multiple times
>> Futures:
>>   f = async_op(args)
>>   # squirrel away a reference to f somewhere else
>>   r = yield f
>>   # ... later, elsewhere
>>   r = f.result()
> Why do you have to squirrel away the reference?  Are you assuming that
> the async scheduler will mess with the locals so that f is no longer
> valid?
>> Yield-from:
>>   ???????????????
> This, you cannot reasonably do; the nature of yield-from means that
> the unresolved futures were never visible within this generator; they
> were resolved by the scheduler and the results handed straight to the
> generator's ancestor.
>> (9) Canceling an operation
>> Futures:
>>   f.cancel()
>> Yield-from:
>>   ???????????????
>> Note: I haven't needed canceling yet, and I believe Devin said that
>> Twisted just got rid of it. However some of the JS Deferred
>> implementations seem to support it.
> I think that once you've called "yield from", the generator making
> that call is suspended until the child generator completes.   But a
> different thread of control could cancel the active (most-descended)
> generator.
>> (10) Registering additional callbacks
>> Futures:
>>   f.add_done_callback(callback)
>> Yield-from:
>>   ???????
>> Note: this is used in NDB to trigger "hooks" that should run e.g. when
>> a database write completes. The user's code just writes yield
>> ent.put_async(); the trigger is automatically called by the Future's
>> machinery. This also uses (8).
> I think you would have to add the callbacks within the subgenerator
> that is spawning f.
> That, or un-inline the yield from, and lose the automated send-throw forwarding.
> -jJ

Read my blog! I depend on your acceptance of my opinion! I am interesting!
Follow me if you're into that sort of thing:

From tismer at  Fri Oct 19 14:55:57 2012
From: tismer at (Christian Tismer)
Date: Fri, 19 Oct 2012 14:55:57 +0200
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

Hi Nick,

On 16.10.12 03:49, Nick Coghlan wrote:
> On Tue, Oct 16, 2012 at 10:44 AM, Greg Ewing
> <greg.ewing at> wrote:
>> My original implementation of yield-from actually *did* avoid
>> this, by keeping a C-level pointer chain of yielding-from frames.
>> But that part was ripped out at the last minute when someone
>> discovered that it had a detrimental effect on tracebacks.
>> There are probably other ways the traceback problem could be
>> fixed, so maybe we will get this optimisation back one day.
> Ah, I thought I remembered something along those lines. IIRC, it was a
> bug report on one of the alphas that prompted us to change it.

I was curious and searched quite a lot.
It was v3.3.0a1 from 2012-03-15 as a reaction to #14230 and #14220
from Mark Shannon, patched by Benjamin.

Now I found the original implementation. That looks very much
as I'm thinking it should be.

Quite a dramatic change which works well, but really seems to remove
what I would call "now I can emulate most of Stackless" efficiently.

Maybe I should just assume it is implemented as before,
build an abstraction, and use it for now.

I will spend my time at PyCon de for sprinting on "yield from".

cheers - chris


From ncoghlan at  Fri Oct 19 15:56:04 2012
From: ncoghlan at (Nick Coghlan)
Date: Fri, 19 Oct 2012 23:56:04 +1000
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Fri, Oct 19, 2012 at 10:55 PM, Christian Tismer <tismer at> wrote:
> I was curious and searched quite a lot.
> It was v3.3.0a1 from 2012-03-15 as a reaction to #14230 and #14220
> from Mark Shannon, patched by Benjamin.
> Now I found the original implementation. That looks very much
> as I'm thinking it should be.
> Quite a dramatic change which works well, but really seems to remove
> what I would call "now I can emulate most of Stackless" efficiently.
> Maybe I should just try to think it would be implemented as before,
> build an abstraction and just use it for now.
> I will spend my time at PyCon de for sprinting on "yield from".

Yeah, if we can get Greg's original optimised behaviour while still
supporting introspection properly, that's really where we want to be.
That's the main reason I'm a fan of Mark's other patches moving more
of the generator state from the frame objects out into the generator
objects - my suspicion is that generator objects themselves need to be
maintaining a full "generator stack" independent of the frame stack in
the main eval loop in order to get the best of both worlds (i.e.
optimised suspend/resume with confusing debuggers).


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From breamoreboy at  Fri Oct 19 16:05:54 2012
From: breamoreboy at (Mark Lawrence)
Date: Fri, 19 Oct 2012 15:05:54 +0100
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <k5rmmg$b55$>

On 19/10/2012 14:56, Nick Coghlan wrote:
> On Fri, Oct 19, 2012 at 10:55 PM, Christian Tismer <tismer at> wrote:
>> I was curious and searched quite a lot.
>> It was v3.3.0a1 from 2012-03-15 as a reaction to #14230 and #14220
>> from Mark Shannon, patched by Benjamin.
>> Now I found the original implementation. That looks very much
>> as I'm thinking it should be.
>> Quite a dramatic change which works well, but really seems to remove
>> what I would call "now I can emulate most of Stackless" efficiently.
>> Maybe I should just try to think it would be implemented as before,
>> build an abstraction and just use it for now.
>> I will spend my time at PyCon de for sprinting on "yield from".
> Yeah, if we can get Greg's original optimised behaviour while still
> supporting introspection properly, that's really where we want to be.
> That's the main reason I'm a fan of Mark's other patches moving more
> of the generator state from the frame objects out into the generator
> objects - my suspicion is that generator objects themselves need to be
> maintaining a full "generator stack" independent of the frame stack in
> the main eval loop in order to get the best of both worlds (i.e.
> optimised suspend/resume with confusing debuggers).
> Cheers,
> Nick.

There's nothing like confusing debuggers or have I read that wrong? :)


Mark Lawrence.

From ncoghlan at  Fri Oct 19 16:16:01 2012
From: ncoghlan at (Nick Coghlan)
Date: Sat, 20 Oct 2012 00:16:01 +1000
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <k5rmmg$b55$>
References: <>
	<> <>
Message-ID: <>

On Sat, Oct 20, 2012 at 12:05 AM, Mark Lawrence <breamoreboy at> wrote:
> There's nothing like confusing debuggers or have I read that wrong? :)

Yeah, that was the main issue that resulted in the design change - the
optimised approach confused a lot of the introspection machinery. So
the challenge is to restore the optimisation while *also* adding in
mechanisms to preserve the introspection support.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From guido at  Fri Oct 19 18:05:05 2012
From: guido at (Guido van Rossum)
Date: Fri, 19 Oct 2012 09:05:05 -0700
Subject: [Python-ideas] The async API of the future
Message-ID: <>

Work priorities don't allow me to spend another day replying in detail
to the various emails on this topic, but I am still keeping up

I have read Greg's response to my comparison between
Future+yield-based coroutines and his yield-from-based, Future-free
coroutines, and after having written a small prototype, I am now
pretty much convinced that Greg's way is superior. This doesn't mean
you can't use generators or yield-from for other purposes! It's just
that *if* you are writing a coroutine for use with a certain scheduler,
you must use yield and yield-from in accordance with the scheduler's
rules. However, code you call can still use yield and yield-from for
iteration, and you can still use for-loops. In particular, if f is a
coroutine, it can still write "for x in g(): ..." where g is a
generator meant to be an iterator. However if g were instead a
coroutine, f should call it using "yield from g()", and f and g should
agree on the interface of their scheduler.
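
To make the distinction concrete, here is a small sketch under an
assumed scheduler rule ("a bare yield means: let something else run").
The rule, the run() driver and all of the names are hypothetical:

```python
def countdown(n):
    """Plain generator used as an iterator; its yields stay local."""
    while n > 0:
        yield n
        n -= 1


def pause():
    """Coroutine obeying the assumed scheduler rule: bare yield to pause."""
    yield


def worker(out):
    for x in countdown(3):  # ordinary iteration with a for-loop
        out.append(x)
        yield from pause()  # calling another coroutine
    return "done"


def run(coro):
    """Trivial driver: resume until finished, counting the suspensions."""
    switches = 0
    while True:
        try:
            next(coro)
        except StopIteration as stop:
            return stop.value, switches
        switches += 1


out = []
result, switches = run(worker(out))
```

The values from countdown() are consumed by the for-loop and never reach
the driver; only the yields reached through "yield from" suspend the
task.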

As to other topics, my current feeling is that we should try to
separately develop requirements and prototype implementations of the
I/O loop of the future, and to figure the loosest possible coupling
between that and a coroutine scheduler (or any other type of
scheduler). In particular, I think the I/O loop should not assume the
event handlers are implemented using coroutines -- but if someone
wants to write an awesome coroutine scheduler, they should be able to
delegate all their I/O waiting needs to the I/O loop with very little

To me, this means that the I/O loop probably should use "plain"
callback functions (i.e., not Futures, Deferreds or coroutines). We
should also standardize the interface to the I/O loop so that 3rd
parties can plug in their own I/O loop -- I don't see an end to the
debate whether the best C library for event handling is libevent,
libev or libuv.
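
A very rough sketch of what "plain callbacks" could look like for a
ready-based loop.  The CallbackLoop class and its method names are
invented here and are not a proposed interface.  It uses the stdlib
selectors module (added later, in Python 3.4) only as a convenient
stand-in for select/poll:

```python
import selectors
import socket


class CallbackLoop:
    """Hypothetical ready-based loop that only knows plain callbacks."""

    def __init__(self):
        self._selector = selectors.DefaultSelector()
        self._readers = 0

    def add_reader(self, sock, callback):
        # callback(sock) is called whenever sock becomes readable.
        self._selector.register(sock, selectors.EVENT_READ, callback)
        self._readers += 1

    def remove_reader(self, sock):
        self._selector.unregister(sock)
        self._readers -= 1

    def run(self):
        # Run until no readers remain registered.
        while self._readers:
            for key, _mask in self._selector.select():
                key.data(key.fileobj)
        self._selector.close()


received = []
loop = CallbackLoop()
left, right = socket.socketpair()


def on_readable(sock):
    received.append(sock.recv(1024))
    loop.remove_reader(sock)


loop.add_reader(right, on_readable)
left.sendall(b"ping")
loop.run()
left.close()
right.close()
```

A coroutine scheduler could then sit on top of this by registering
callbacks that mark a waiting task as runnable again.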

While the focus of the I/O loop should be on single-threaded event
handling, some standard interface should exist so that you can run
certain code in a separate thread and wait for its completion -- I've
found this handy when calling socket.getaddrinfo(), which may block.
(Apparently async DNS lookups are really hard -- I read some
complaints about libevent's DNS lookups, and IIUC many Firefox
lockups are due to this.) But there may be other uses for this too.
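
One way this could look with what the stdlib already offers
(concurrent.futures); slow_lookup() is a made-up stand-in for a blocking
call such as socket.getaddrinfo(), so the sketch does not depend on real
DNS:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_lookup(name):
    """Stand-in for a blocking call; the returned address is fake."""
    time.sleep(0.05)
    return (name, "192.0.2.1")


completed = []
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(slow_lookup, "example.invalid")
    # The callback runs when the work is done, normally in the worker
    # thread (or at once in this thread if the work has already finished).
    future.add_done_callback(lambda f: completed.append(f.result()))
# Leaving the with-block waits for the worker thread to finish.
```

In a real I/O loop the done-callback would presumably have to hand the
result back to the loop's own thread rather than act on it directly.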

An issue in the design of the I/O loop is the strain between a
ready-based and completion-based design. The typical Unix design
(whether based on select or any of the poll variants) is usually
ready-based; but on Windows, the only way to get high performance is
to base it on IOCP, which is completion-based (i.e. you start a
specific async operation, like writing N bytes, and the I/O loop tells
you when it is done). I would like people to be able to write fast
event handling programs on Windows too, and ideally the only change
would be the implementation of the I/O loop. But I don't know how
tenable that is given the dramatically different style used by IOCP
and the need to use native Windows API for all async I/O -- it sounds
like we could only do this if the library providing the I/O loop
implementation also wrapped all I/O operations, and that may be a bit

Finally, there should also be some minimal interface so that multiple
I/O loops can interact -- at least in the case where one I/O loop
belongs to a GUI library. It seems this is a solved problem (as well
solved as you can hope for) to Twisted, so we should just adopt their

--Guido van Rossum (

From guido at  Fri Oct 19 18:07:24 2012
From: guido at (Guido van Rossum)
Date: Fri, 19 Oct 2012 09:07:24 -0700
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<> <>
Message-ID: <>

On Fri, Oct 19, 2012 at 5:05 AM, Christian Tismer <tismer at> wrote:
> On 19.10.12 07:15, Greg Ewing wrote:
>> Christian Tismer wrote:
>>> - generators are able to free the stack, when they yield. But when they
>>>    are active, they use the full stack. At least when I follow the
>>> pattern
>>>    "generator is calling sub-generator".
>>>    A deeply nested recursion is therefore something to avoid. :-(
>> Only if yield-from chains aren't optimised the way they
>> used to be.
> Does that mean a very deep recursion would be efficient?

TBH, I am not interested in making very deep recursion work at all. If
you need that, you're doing it wrong in my opinion.

> I'm trying to find that change in the hg history right now.
> Can you give me a hint how your initial implementation
> works, the initial patch source?
>> ...
>>> But this function that wants to
>>> switch needs to pass the fact that it wants to switch, plus the target
>>> somewhere. As I understood it, I would need to yield that to the
>>> driver function.
>> You understand incorrectly. In my scheduler, the yields
>> don't send or receive values at all. Communicating with the
>> scheduler, for example to tell it to allow another task to
>> run, is done by calling functions. A yield must be done to
>> actually allow a switch, but the yield itself doesn't send
>> any information.
> I studied that in depth yesterday already and like it quite a lot.
> It is probably just the problem that I had with generators from their
> beginning.
> --
> Christian Tismer             :^)   <mailto:tismer at>
> Software Consulting          :     Have a break! Take a ride on Python's
> Karl-Liebknecht-Str. 121     :    *Starship*
> 14482 Potsdam                :     PGP key ->
> phone +49 173 24 18 776  fax +49 (30) 700143-0023
> PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
>       whom do you want to sponsor today?

--Guido van Rossum (

From tismer at  Fri Oct 19 18:18:42 2012
From: tismer at (Christian Tismer)
Date: Fri, 19 Oct 2012 18:18:42 +0200
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On 19.10.12 15:56, Nick Coghlan wrote:
> On Fri, Oct 19, 2012 at 10:55 PM, Christian Tismer <tismer at> wrote:
>> I was curious and searched quite a lot.
>> It was v3.3.0a1 from 2012-03-15 as a reaction to #14230 and #14220
>> from Mark Shannon, patched by Benjamin.
>> Now I found the original implementation. That looks very much
>> as I'm thinking it should be.
>> Quite a dramatic change which works well, but really seems to remove
>> what I would call "now I can emulate most of Stackless" efficiently.
>> Maybe I should just try to think it would be implemented as before,
>> build an abstraction and just use it for now.
>> I will spend my time at PyCon de for sprinting on "yield from".
> Yeah, if we can get Greg's original optimised behaviour while still
> supporting introspection properly, that's really where we want to be.
> That's the main reason I'm a fan of Mark's other patches moving more
> of the generator state from the frame objects out into the generator
> objects - my suspicion is that generator objects themselves need to be
> maintaining a full "generator stack" independent of the frame stack in
> the main eval loop in order to get the best of both worlds (i.e.
> optimised suspend/resume without confusing debuggers).

That may be very true in order to get real generators.
The storm in my brain has been quite intense these last few days...

Actually I would like to have a python context where it gets into
"async mode" and interprets all functions defined in that mode as 
In that mode, generators are not meant as generators, but async-enabled
I see "yield from" as a low-level construct that should not even be
exposed, but be applied automatically in async mode.
That way, we could write normal functions and could implement a real "Yield"
without the "yield from" helper visible everywhere.

Not sure how to do that right. I'm playing with AST a bit to get a feeling
for this.

To give you an idea where my thoughts are meandering around,
I would like to point you at

That is an implementation that comes close to what I'm thinking.
The drawback of the current PyPy implementation is that it
uses greenlet style for its underlying switching. That is what
I want to replace with some "yield from" construct.

cheers - chris


From tismer at  Fri Oct 19 18:50:39 2012
From: tismer at (Christian Tismer)
Date: Fri, 19 Oct 2012 18:50:39 +0200
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On 19.10.12 18:07, Guido van Rossum wrote:
> On Fri, Oct 19, 2012 at 5:05 AM, Christian Tismer <tismer at> wrote:
>> On 19.10.12 07:15, Greg Ewing wrote:
>>> Christian Tismer wrote:
>>>> - generators are able to free the stack, when they yield. But when they
>>>>     are active, they use the full stack. At least when I follow the
>>>> pattern
>>>>     "generator is calling sub-generator".
>>>>     A deeply nested recursion is therefore something to avoid. :-(
>>> Only if yield-from chains aren't optimised the way they
>>> used to be.
>> Does that mean a very deep recursion would be efficient?
> TBH, I am not interested in making very deep recursion work at all. If
> you need that, you're doing it wrong in my opinion.

Misunderstanding, I think. Of course I don't want to use deep recursion.
But people might write things that happen several levels deep and
then iterate over lots of stuff. A true generator would have no
problem with that.
Having just five layers of generators that must be re-invoked
for a tight yielding loop is quite some overhead that can be avoided.

The reason why I care is that existing implementations that use
greenlet style could be turned into pure python, given that I manage
to write the right support functions, and replace all functions by
generators that emulate functions with async behavior.

It would just be great if that worked at the same speed, regardless
of the stack level at which an iteration happens.

Agreed that new code like that would be bad style.

ciao - chris


From guido at  Fri Oct 19 19:18:38 2012
From: guido at (Guido van Rossum)
Date: Fri, 19 Oct 2012 10:18:38 -0700
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<> <>
Message-ID: <>

On Fri, Oct 19, 2012 at 9:50 AM, Christian Tismer <tismer at> wrote:
> On 19.10.12 18:07, Guido van Rossum wrote:
>> On Fri, Oct 19, 2012 at 5:05 AM, Christian Tismer <tismer at>
>> wrote:
>>> On 19.10.12 07:15, Greg Ewing wrote:
>>>> Christian Tismer wrote:
>>>>> - generators are able to free the stack, when they yield. But when they
>>>>>     are active, they use the full stack. At least when I follow the
>>>>> pattern
>>>>>     "generator is calling sub-generator".
>>>>>     A deeply nested recursion is therefore something to avoid. :-(
>>>> Only if yield-from chains aren't optimised the way they
>>>> used to be.
>>> Does that mean a very deep recursion would be efficient?
>> TBH, I am not interested in making very deep recursion work at all. If
>> you need that, you're doing it wrong in my opinion.
> Misunderstanding, I think. Of course I don't want to use deep recursion.
> But people might write things that happen several levels deep and
> then iterate over lots of stuff. A true generator would have no
> problem with that.

Okay, good. I agree that this use case should be as fast as possible
-- as long as we still see every frame involved when a traceback is

> Having just five layers of generators that must be re-invoked
> for a tight yielding loop is quite some overhead that can be avoided.
> The reason why I care is that existing implementations that use
> greenlet style could be turned into pure python, given that I manage
> to write the right support functions, and replace all functions by
> generators that emulate functions with async behavior.
> It would just be great if that worked at the same speed, regardless
> of the stack level at which an iteration happens.


> Agreed that new code like that would be bad style.

Like "what"?

--Guido van Rossum (

From tismer at  Fri Oct 19 19:36:39 2012
From: tismer at (Christian Tismer)
Date: Fri, 19 Oct 2012 19:36:39 +0200
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On 19.10.12 19:18, Guido van Rossum wrote:
> On Fri, Oct 19, 2012 at 9:50 AM, Christian Tismer <tismer at> wrote:
>> On 19.10.12 18:07, Guido van Rossum wrote:
>>> ...

>>> TBH, I am not interested in making very deep recursion work at all. If
>>> you need that, you're doing it wrong in my opinion.
>> ...
>> Agreed that new code like that would be bad style.
> Like "what"?

Like code that exercises deep recursion thoughtlessly ;-)
in contrast to code that happens to be quite nested because
of a systematic transformation.

So correctness first, big Oh later.


From jimjjewett at  Fri Oct 19 22:10:00 2012
From: jimjjewett at (Jim Jewett)
Date: Fri, 19 Oct 2012 16:10:00 -0400
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On 10/19/12, Calvin Spealman <ironfroggy at> wrote:
> On Thu, Oct 18, 2012 at 11:46 PM, Jim Jewett <jimjjewett at> wrote:

>> [I think the yield solutions are (too magic)/(prematurely lock too
>> much policy) to be "The" API, but work fine as "an example API"]

> I think it is important that this is more than convention. ... Our
> focus should be not on providing simple things like "async file read" but
> crafting an environment where people can continue to write wonderfully
> expressive and useful libraries that others can combine to their own needs.

And I think that adding (requirements for generator usage) / (implied
meaning of yield) prevents that.

>> On 10/12/12, Guido van Rossum <guido at> wrote:

>>> But the only use for send() on a generator is when using it as a
>>> coroutine for a concurrent tasks system -- send() really makes no
>>> sense for generators used as iterators.

>> But the data doesn't have to be scheduling information; it can be new
>> data, a seed for an algorithm, a command to switch or reset the state
>> ... locking it to the scheduler is part of what worries me.

> When a coroutine yields, it yields *to the scheduler* so for whom else
> should these values be?

Who says that there has to be a scheduler?  Or at least a single scheduler?

To me, the "obvious" solution is that each co-routine is "scheduled"
only by its own caller, and runs on its own micro-thread.  The caller
thread may or may not wait for a result to be yielded, but would not
normally wait for the entire generator to be exhausted forever (the

The next call to the co-routine may well be from an entirely different
caller, particularly if the co-routine is a generic source or sink.

There may well be several other co-routines (as opposed to a single
scheduler) that enforce policy, and may send messages about things
like "switch to that source of randomness", "start using this other
database instance as a backup", "stop listening on that port".  They
would certainly want to use throw, and perhaps send as well.
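
As a small illustration of a co-routine driven directly by its caller, with no scheduler involved: the sketch below uses send() to pass in new data and throw() to deliver a "reset" policy message. The names running_mean and ResetStats are made up for this example.

```python
class ResetStats(Exception):
    """Hypothetical policy message, delivered with throw()."""


def running_mean():
    """Co-routine fed by whoever calls it; yields the mean seen so far."""
    total = 0.0
    count = 0
    mean = None
    while True:
        try:
            value = yield mean
        except ResetStats:
            # Policy message from some caller: forget everything seen so far.
            total, count, mean = 0.0, 0, None
            continue
        total += value
        count += 1
        mean = total / count


def demo():
    gen = running_mean()
    next(gen)                               # prime the co-routine
    results = [gen.send(2), gen.send(4)]    # new data, from any caller
    results.append(gen.throw(ResetStats())) # reset, possibly a different caller
    results.append(gen.send(10))
    return results
```

Here the values passed through send() are ordinary data, not scheduling information, and each call may come from a different caller.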

In practice, creating a full thread for each such co-routine probably
won't work well under current threading systems, because an OS thread
(let alone an OS process) is too heavy-weight.  And without OS
support, python has to do some internal scheduling.  But I'm not
convinced that the current situation will last forever, so I don't
want to muddy up the *abstraction* just to coddle temporary

>> But with the yield channel reserved for scheduling overhead, the
>> "generator" can't really generate anything, except through side
>> effects...

> Don't forget that yield-from is an expression, not a statement. The
> value eventually returned from the generator is the result of the
> yield-from, so the generator still produces a final value.

Assuming it terminates, then yes.  But that isn't (conceptually) a
generator; it is an ordinary function call.

> The fact that these are generators is for their ability to suspend, not to
> iterate.

So "generator" is not really the right term.  Abusing that for one
module is acceptable, but I'm reluctant to bake that change into an
officially sanctioned API, let alone one important enough that it
might eventually be seen as the primary definition.

>> * "mostly", because if my task is willing to wait for the subtask to
>> complete, then why not just use a blocking call in the first place?
>> Is it just because switching to another task is lighter weight than
>> letting a thread block?

> By blocking call do you mean "x = foo()" or "x = yield from foo()"?
> Blocking call usually means the former, so if you mean that, then you
> neglect to think of all the other tasks running which are not willing to wait.

Exactly.  From my own code's perspective, is there any difference
between those two?  (Well, besides the fact that the second is
wordier, and puts more constraints on what I can use for foo.)

So why not just use the first spelling, let the (possibly OS-level)
scheduler notice that I'm blocked (if I happen to be), and let it
suspend my thread waiting on foo?

Is it just that *current* ways to suspend a thread of execution are
expensive, and we hope to do it more cheaply?  If so, that is a
perfectly sensible justification for conventions within a single
stdlib module.  But since the trade-offs may change with time, the
current costs shouldn't drive decisions about the async API, let alone
changes to the meaning of "yield" or "generator".

>> [Questions about generators that do not follow the new constraints]

> I think if the scheduler doesn't know what to do with something, it should
> be an error. That makes it easier to change things in the future.

Those were all things that could reasonably happen simply by reusing
correct existing code.

For a specific implementation, even a stdlib module, it is OK to treat
them as errors;  a specific module can always be viewed as incomplete.

But for "the asynchronous API of the future", undefined behavior just
guarantees warts.  We may eventually decide that the warts are in the
existing legacy code, but there would still be warts.


From guido at  Fri Oct 19 22:22:55 2012
From: guido at (Guido van Rossum)
Date: Fri, 19 Oct 2012 13:22:55 -0700
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

Jim, relax.

We're not changing the meaning of yield or generator.

We're just making it *possible* to use yield(-from) and generators as
coroutines; that's actually a long path that started with PEP 342. No
freedom is taken away by PEP 380; it just adds the possibility to do
it without managing an explicit stack of coroutine calls in the

If we believed that there was no advantage to spelling a blocking call
as "yield from foo()", we would just spell it as "foo()" and somehow
make it work.

But (and even Christian Tismer agrees) there is a problem with the
shorter spelling -- you lose track of which calls may cause a
task-switch. Using yield-from (or yield, for that matter) for this
purpose ensures that all callers in the call chain have to explicitly
mark the suspension points, and this serves as a useful reminder that
after resumption, the world may look different, because other tasks
may have run in the meantime.
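
A minimal sketch of what "explicitly marked suspension points" looks like in practice. The round-robin run_all() scheduler and the task names are hypothetical and exist only to show the interleaving; they are not part of any proposed API.

```python
from collections import deque


def sleep_once():
    # The only place that really suspends: a bare yield back to the scheduler.
    yield


def fetch(name, log):
    log.append(name + ": before switch")
    yield from sleep_once()        # visible marker: other tasks may run here
    log.append(name + ": after switch")
    return name.upper()


def task(name, log):
    # Every caller in the chain has to mark the possible switch as well.
    result = yield from fetch(name, log)
    log.append(name + ": got " + result)


def run_all(tasks):
    """Trivial round-robin scheduler, for illustration only."""
    queue = deque(tasks)
    while queue:
        current = queue.popleft()
        try:
            next(current)
        except StopIteration:
            continue
        queue.append(current)


def demo():
    log = []
    run_all([task("a", log), task("b", log)])
    return log
```

In the resulting log, task "b" runs between the two halves of fetch() for task "a", exactly at the line carrying the yield from.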

--Guido van Rossum (

From mark at  Fri Oct 19 23:15:38 2012
From: mark at (Mark Shannon)
Date: Fri, 19 Oct 2012 22:15:38 +0100
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On 19/10/12 13:55, Christian Tismer wrote:
> Hi Nick,
> On 16.10.12 03:49, Nick Coghlan wrote:
>> On Tue, Oct 16, 2012 at 10:44 AM, Greg Ewing
>> <greg.ewing at> wrote:
>>> My original implementation of yield-from actually *did* avoid
>>> this, by keeping a C-level pointer chain of yielding-from frames.
>>> But that part was ripped out at the last minute when someone
>>> discovered that it had a detrimental effect on tracebacks.
>>> There are probably other ways the traceback problem could be
>>> fixed, so maybe we will get this optimisation back one day.
>> Ah, I thought I remembered something along those lines. IIRC, it was a
>> bug report on one of the alphas that prompted us to change it.
> I was curious and searched quite a lot.
> It was v3.3.0a1 from 2012-03-15 as a reaction to #14230 and #14220
> from Mark Shannon, patched by Benjamin.
> Now I found the original implementation. That looks very much
> as I'm thinking it should be.
> Quite a dramatic change which works well, but really seems to remove
> what I would call "now I can emulate most of Stackless" efficiently.
> Maybe I should just try to think it would be implemented as before,
> build an abstraction and just use it for now.
> I will spend my time at PyCon de for sprinting on "yield from".

The current implementation may not be much slower than Greg's original 
version. One of the main costs of making a call is the creation of a new 
frame. But calling into a generator does not need a new frame, so the 
cost will be reduced.
Unless anyone has evidence to the contrary :)

Rather than increasing the performance of this special case, I would 
suggest that improving the performance of calls & returns in general 
would be a more worthwhile goal.
Calls and returns ought to be cheap.


From guido at  Fri Oct 19 23:31:17 2012
From: guido at (Guido van Rossum)
Date: Fri, 19 Oct 2012 14:31:17 -0700
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<> <>
Message-ID: <>

On Fri, Oct 19, 2012 at 2:15 PM, Mark Shannon <mark at> wrote:
> On 19/10/12 13:55, Christian Tismer wrote:
>> Hi Nick,
>> On 16.10.12 03:49, Nick Coghlan wrote:
>>> On Tue, Oct 16, 2012 at 10:44 AM, Greg Ewing
>>> <greg.ewing at> wrote:
>>>> My original implementation of yield-from actually *did* avoid
>>>> this, by keeping a C-level pointer chain of yielding-from frames.
>>>> But that part was ripped out at the last minute when someone
>>>> discovered that it had a detrimental effect on tracebacks.
>>>> There are probably other ways the traceback problem could be
>>>> fixed, so maybe we will get this optimisation back one day.
>>> Ah, I thought I remembered something along those lines. IIRC, it was a
>>> bug report on one of the alphas that prompted us to change it.
>> I was curious and searched quite a lot.
>> It was v3.3.0a1 from 2012-03-15 as a reaction to #14230 and #14220
>> from Mark Shannon, patched by Benjamin.
>> Now I found the original implementation. That looks very much
>> as I'm thinking it should be.
>> Quite a dramatic change which works well, but really seems to remove
>> what I would call "now I can emulate most of Stackless" efficiently.
>> Maybe I should just try to think it would be implemented as before,
>> build an abstraction and just use it for now.
>> I will spend my time at PyCon de for sprinting on "yield from".
> The current implementation may not be much slower than Greg's original
> version. One of the main costs of making a call is the creation of a new
> frame. But calling into a generator does not need a new frame, so the cost
> will be reduced.
> Unless anyone has evidence to the contrary :)
> Rather than increasing the performance of this special case, I would suggest
> that improving the performance of calls & returns in general would be a more
> worthwhile goal.
> Calls and returns ought to be cheap.

I did a basic timing test using a simple recursive function and a
recursive PEP-380 coroutine computing the same value (see attachment).
The coroutine version is a little over twice as slow as the function
version. I find that acceptable. This went 20 deep, making 2 recursive
calls at each level (except at the deepest level).

Output on my MacBook Pro:

plain 2097151 0.5880069732666016
coro. 2097151 1.2958409786224365

This was a Python 3.3 built a few days ago from the 3.3 branch.
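
The attachment itself is not preserved in the archive. The following is only a sketch of a test of the shape described above (a recursive function and a recursive PEP 380 coroutine, two recursive calls per level); the names plain, coro, run and main are assumptions, not the contents of the original file.

```python
import time


def plain(depth):
    """Ordinary recursive function: counts the nodes of a full binary tree."""
    if depth == 0:
        return 1
    return 1 + plain(depth - 1) + plain(depth - 1)


def coro(depth):
    """Same computation, with each recursive call spelled as 'yield from'."""
    if depth == 0:
        return 1
    left = yield from coro(depth - 1)
    right = yield from coro(depth - 1)
    return 1 + left + right


def run(gen):
    """Drive a coroutine that never suspends and return its final value."""
    try:
        next(gen)
    except StopIteration as stop:
        return stop.value
    raise RuntimeError("coroutine yielded unexpectedly")


def main(depth=20):
    for label, func in (("plain", lambda: plain(depth)),
                        ("coro.", lambda: run(coro(depth)))):
        start = time.time()
        value = func()
        print(label, value, time.time() - start)
```

With depth 20 both versions compute 2**21 - 1 = 2097151, which matches the value in the output quoted above; timings will of course differ by machine and Python build.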

--Guido van Rossum (
-------------- next part --------------
A non-text attachment was scrubbed...
Type: application/octet-stream
Size: 675 bytes
Desc: not available
URL: <>

From tismer at  Sat Oct 20 00:31:15 2012
From: tismer at (Christian Tismer)
Date: Sat, 20 Oct 2012 00:31:15 +0200
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
	API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

Hi Guido, Mark, all,

this is a very promising result, telling
me that the big Oh can in fact be
neglected in real applications. 20 for
two is good!

I will of course do an analysis and find
the parameters of the quadratic, but my
concern is pretty much tamed. 

For me that means there will soon be
a library that contains real generators
and more building blocks. 

I think using those would simplify the
design of the async API quite a lot. 

I suggest regarding current generator
constructs as low-level helpers for
implementing the real concurrency
building blocks.

Instead of using the existing re.compile("yield (from)?") pattern, I think we can abstract
from this now and think in terms of
higher level constructs. 

Let's assume generators and coroutines,
and model concurrency from that. I
believe this unwinds the brains and
clarifies things a lot. 

I will provide some classes for that at
the sprint, unless somebody
implements it earlier (please don't).

This email was written in non-linear
order, so please ignore logic inversions. 

Cheers - chris

Sent from my Ei4Steve


From tismer at  Sat Oct 20 00:45:15 2012
From: tismer at (Christian Tismer)
Date: Sat, 20 Oct 2012 00:45:15 +0200
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
	API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>


Sent from my Ei4Steve


From greg.ewing at  Sat Oct 20 01:02:17 2012
From: greg.ewing at (Greg Ewing)
Date: Sat, 20 Oct 2012 12:02:17 +1300
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<> <>
Message-ID: <>

Christian Tismer wrote:

> Can you give me a hint how your initial implementation
> works, the initial patch source?

You can find my initial patches here:

Essentially, an extra field f_yieldfrom is added to frame
objects. When a 'yield from' is started, the f_yieldfrom field
of the calling frame is set to point to the called frame.

The __next__ method of a generator first traverses the
f_yieldfrom chain to find the frame at the end, and then
resumes that frame. So most of the time, only the innermost
frame of a nested yield-from chain is actually entered in
response to a next() call.

(There are some complications due to the fact that you can
'yield from' something that's not a generator, but the above
is effectively what happens when all the objects in the
chain are generators.)
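
The chain traversal described above can be modelled in a few lines of pure Python. This is only a toy model: the Frame class here is a stand-in, not CPython's real frame object, and start_yield_from and innermost are hypothetical helper names.

```python
class Frame:
    """Toy stand-in for a generator frame."""

    def __init__(self, name):
        self.name = name
        self.f_yieldfrom = None    # frame this one is currently delegating to


def start_yield_from(caller, callee):
    # A 'yield from' begins: link the calling frame to the called frame.
    caller.f_yieldfrom = callee


def innermost(frame):
    """Follow the f_yieldfrom chain to its end, as __next__ would do first."""
    hops = 0
    while frame.f_yieldfrom is not None:
        frame = frame.f_yieldfrom
        hops += 1
    return frame, hops


def demo():
    outer, middle, inner = Frame("outer"), Frame("middle"), Frame("inner")
    start_yield_from(outer, middle)
    start_yield_from(middle, inner)
    target, hops = innermost(outer)
    return target.name, hops
```

In this model a next() on the outermost generator costs a couple of pointer hops and then resumes only the innermost frame, instead of re-entering every intermediate frame.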


From greg.ewing at  Sat Oct 20 01:29:25 2012
From: greg.ewing at (Greg Ewing)
Date: Sat, 20 Oct 2012 12:29:25 +1300
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

Calvin Spealman wrote:

> I think it is important that this is more than convention. I think that we
> need our old friend TOOOWTDI (There's Only One Obvious Way To Do It)
> here more than ever.

This is part of the reason that I don't like the idea of
controlling the scheduler by yielding instructions to it.
There are a great many ways that such a "scheduler instruction
set" could be designed, none of them any more obvious than
the others.

So rather than single out an arbitrarily chosen set of
operations to be regarded as primitives that the scheduler
knows about directly, I would rather have *no* such
primitives in the public API.


From guido at  Sat Oct 20 01:39:05 2012
From: guido at (Guido van Rossum)
Date: Fri, 19 Oct 2012 16:39:05 -0700
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 19, 2012 at 4:29 PM, Greg Ewing <greg.ewing at> wrote:
> Calvin Spealman wrote:
>> I think it is important that this is more than convention. I think that we
>> need our old friend TOOOWTDI (There's Only One Obvious Way To Do It)
>> here more than ever.
> This is part of the reason that I don't like the idea of
> controlling the scheduler by yielding instructions to it.
> There are a great many ways that such a "scheduler instruction
> set" could be designed, none of them any more obvious than
> the others.
> So rather than single out an arbitrarily chosen set of
> operations to be regarded as primitives that the scheduler
> knows about directly, I would rather have *no* such
> primitives in the public API.

But you have that problem anyway. In your current style you write
things like this:

        block(self.queue)
        yield

I don't see how this decouples the call site of the primitive from the
scheduler any more than if you were to write e.g. this:

        yield block(self.queue)

In fact, you can write it in your current framework and it would have
the exact same effect! That's because block() returns None, so it
comes down to calling block(self.queue) and then yielding None, which
is exactly what happens in the first form as well. And even if block()
were to return a value, since the scheduler ignores the return value
from next(), it still works the same way. Not that I recommend doing
this just because it works -- but if we liked the second form better,
we could easily implement block() in such a way that you'd *have* to
write it like that.

So, I don't see what we gain by writing it the first way.
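
A small sketch of the point about the two spellings, under stated
assumptions: the scheduler below (run, ready, waiting) and the bodies of
block() and unblock() are invented for illustration and are not taken from
Greg's framework. The only properties that matter are that block() returns
None and that the scheduler ignores whatever next() hands back.

```python
from collections import deque

ready = deque()    # tasks the scheduler may resume
waiting = set()    # tasks parked by block()
current = None     # task being run right now
log = []


def block(queue):
    # Park the running task on an application-owned queue; returns None.
    queue.append(current)
    waiting.add(current)


def unblock(queue):
    # Make the longest-waiting task on the queue runnable again.
    task = queue.popleft()
    waiting.discard(task)
    ready.append(task)


def run():
    global current
    while ready:
        current = ready.popleft()
        try:
            next(current)          # whatever was yielded is ignored
        except StopIteration:
            continue
        if current not in waiting:
            ready.append(current)


def waiter_two_statements(name, queue):
    block(queue)
    yield
    log.append(name + " resumed")


def waiter_one_expression(name, queue):
    yield block(queue)             # block() runs first, then None is yielded
    log.append(name + " resumed")


def waker(queue):
    yield                          # let both waiters park themselves first
    unblock(queue)
    unblock(queue)


q = deque()
ready.extend([waiter_two_statements("first form", q),
              waiter_one_expression("second form", q),
              waker(q)])
run()
print(log)
```

Both waiters are parked and later resumed in exactly the same way, which is
the sense in which the two forms are equivalent here.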

--Guido van Rossum (

From greg.ewing at  Sat Oct 20 01:50:20 2012
From: greg.ewing at (Greg Ewing)
Date: Sat, 20 Oct 2012 12:50:20 +1300
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

Nick Coghlan wrote:
> my suspicion is that generator objects themselves need to be
> maintaining a full "generator stack" independent of the frame stack in
> the main eval loop in order to get the best of both worlds (i.e.
> optimised suspend/resume without confusing debuggers).

The f_yieldfrom chain effectively *is* a generator stack, it's
just linked in the opposite direction to the way stacks normally
are. While you probably could move f_yieldfrom out of the frame
object and into the generator-iterator object, I don't see how
it would make any difference to the traceback issue.

I'm not even sure why my original implementation was getting
tracebacks wrong. What *should* happen is that if an exception
comes out of a generator being yielded from, the tail is
chopped off the f_yieldfrom chain and the exception is thrown
into the next frame up, thereby adding its frame to the
traceback.

It may simply be that there was a minor bug in my implementation
that could be fixed without ditching the whole f_yieldfrom
idea. I may look into this if I find time.


From greg.ewing at  Sat Oct 20 02:33:31 2012
From: greg.ewing at (Greg Ewing)
Date: Sat, 20 Oct 2012 13:33:31 +1300
Subject: [Python-ideas] The async API of the future
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:
> I would like people to be able to write fast
> event handling programs on Windows too, ... But I don't know how
> tenable that is given the dramatically different style used by IOCP
> and the need to use native Windows API for all async I/O -- it sounds
> like we could only do this if the library providing the I/O loop
> implementation also wrapped all I/O operations, and that may be a bit
> much.

That's been bothering me, too. It seems like an interface
accommodating the completion-based style will have to be
*extremely* fat.

That's not just a burden for anyone implementing the
interface, it's a problem for any library wanting to *wrap*
it as well.

For example, to maintain separation between the async
layer and the generator layer, we will probably want to
have an AsyncSocket object in the async layer, and a
separate GeneratorSocket in the generator layer that wraps
an AsyncSocket.

If the AsyncSocket needs to provide methods for all the
possible I/O operations that one might want to perform on
a socket, then GeneratorSocket needs to provide its own
versions of all those methods as well.

Multiply that by the number of different kinds of I/O
objects (files, sockets, message queues, etc. -- there
seem to be quite a lot of them on Windows) and that's
a *lot* of stuff to be wrapped.

> Finally, there should also be some minimal interface so that multiple
> I/O loops can interact -- at least in the case where one I/O loop
> belongs to a GUI library.

That's another thing that worries me. With a ready-based
event loop, this is fairly straightforward. If you can get
hold of the file descriptor or handle that the GUI is
ultimately reading its input from, all you need to do is
add it as an event source to your main loop, and when it's
ready, tell the GUI event loop to run itself once.

But you can't do that with a completion-based main loop,
because the actual reading of the input needs to be done
in a different way, and that's usually buried somewhere
deep in the GUI library where you can't easily change it.
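
For the ready-based case, the idea can be sketched with the standard
selectors module. Everything GUI-related here is an assumption made for the
example: FakeGui is a hypothetical toolkit, and a socketpair stands in for
the connection the GUI reads its events from.

```python
import selectors
import socket


class FakeGui:
    """Hypothetical GUI toolkit that reads its events from a socket."""

    def __init__(self):
        # The socketpair stands in for the connection to the display server.
        self.server_end, self.client_end = socket.socketpair()
        self.handled = []

    def fileno(self):
        # Lets the selector treat this object as an event source.
        return self.client_end.fileno()

    def run_once(self):
        # Read whatever is available and dispatch it.
        data = self.client_end.recv(4096)
        self.handled.extend(data.decode().split())


gui = FakeGui()
sel = selectors.DefaultSelector()
sel.register(gui, selectors.EVENT_READ, gui.run_once)

gui.server_end.sendall(b"click keypress")   # simulate incoming GUI events

for key, _events in sel.select(timeout=1):
    key.data()        # the source is ready, so let the GUI loop run once

sel.unregister(gui)
sel.close()
gui.server_end.close()
gui.client_end.close()
print(gui.handled)
```

The main loop never needs to know how the GUI reads its input; it only needs
a descriptor to wait on and a "run once" entry point to call.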

> It seems this is a solved problem (as well
> solved as you can hope for) to Twisted, so we should just adopt their
> approach.

Do they actually do it for an IOCP-based main loop on Windows?
If so, I'd be interested to know how.


From jstpierre at  Sat Oct 20 02:50:00 2012
From: jstpierre at (Jasper St. Pierre)
Date: Fri, 19 Oct 2012 20:50:00 -0400
Subject: [Python-ideas] The async API of the future
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 19, 2012 at 8:33 PM, Greg Ewing <greg.ewing at> wrote:

... snip ...

> That's another thing that worries me. With a ready-based
> event loop, this is fairly straightforward. If you can get
> hold of the file descriptor or handle that the GUI is
> ultimately reading its input from, all you need to do is
> add it as an event source to your main loop, and when it's
> ready, tell the GUI event loop to run itself once.

For most windowing systems, this isn't true. You need to call some
function to check if you have events pending. For X11, this is
"XPending". For Win32, this is "GetQueueStatus".

But overall, the thing is that most GUI libraries have their own event
loops. In GTK+, this is done with a "GSource", which can have support
for custom sources (which is how the calls to the above APIs are
made). What Twisted does in this case is swap out their own select
loop with another implementation built around GLib's GMainLoop, which
uses whatever internally.

I'd highly recommend taking Twisted's approach of having swappable event loops.

The question then becomes how you swap out the main loop: Twisted does
this with a global reactor which you "install", which the community
has found rather ugly, but there isn't really a better solution
they've come up with. They've had a few proposals over the years to
add better functionality, so I'd like to hear their experience on
this.



From greg.ewing at  Sat Oct 20 02:50:28 2012
From: greg.ewing at (Greg Ewing)
Date: Sat, 20 Oct 2012 13:50:28 +1300
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

Christian Tismer wrote:
> Actually I would like to have a python context where it gets into
> "async mode" and interprets all functions defined in that mode as 
> generators.

That sounds somewhat similar to another idea I proposed a while
ago:

There would be a special kind of function called a "cofunction",
that you define using "codef" instead of "def". A cofunction
is essentially a generator, but with a special property: when
one cofunction calls another, the call is implicitly made as
a "yield from" call.

This scheme wouldn't be completely transparent, since the
cofunctions have to be defined in a special way. But the calls
would look like ordinary calls.

There's a PEP describing a variation on the idea here:

In that version, calls to cofunctions are specially marked
using a "cocall" keyword. But since writing that, I've come to
believe that my original idea (where the cocalls are implicit)
was better.
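
To make the implicit-call idea concrete, here is a sketch of what it would
amount to in today's Python. The "codef" form is shown only as a comment
because it is not valid syntax; read_chunk, fetch and run_to_completion are
names invented for this example.

```python
# Hypothetical cofunction syntax (not valid Python):
#
#     codef fetch():
#         data = read_chunk()      # would be an implicit "yield from"
#         return data.upper()


def read_chunk():
    # A generator-based subtask: suspends once, then produces a result.
    yield "waiting for I/O"
    return "hello"


def fetch():
    # What the implicit call would amount to, spelled out by hand.
    data = yield from read_chunk()
    return data.upper()


def run_to_completion(task):
    # Drive a generator until it finishes; collect what it yielded on the way.
    suspensions = []
    try:
        while True:
            suspensions.append(next(task))
    except StopIteration as stop:
        return stop.value, suspensions


result, suspensions = run_to_completion(fetch())
print(result, suspensions)
```

The body of fetch() differs from the hypothetical cofunction only by the
explicit "yield from", which is exactly the marker the proposal would hide.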


From tismer at  Sat Oct 20 03:17:02 2012
From: tismer at (Christian Tismer)
Date: Sat, 20 Oct 2012 03:17:02 +0200
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>


On 20.10.12 00:31, Christian Tismer wrote:
> Hi Guido, Mark, all,
> this is a very promising result, telling
> me that the big Oh can in fact be
> neglected in real applications. 20 for
> twi is good!
> I will of course do an analysis and find
> the parameters of the quadratic, but my
> concern is pretty much tamed.
> For me that means there will soon be
> a library that contains real generators
> and more building blocks.
> I think using those would simplify the
> design of the async API quite a lot.
> I suggest regarding current generator
> constructs as low-level helpers for
> implementing the real concurrency
> building blocks.
> Instead of using the existing re.compile("yield (from)?") pattern, I think we can abstract
> from this now and think in terms of
> higher level constructs.
> Let's assume generators and coroutines,
> and model concurrency from that. I
> believe this unwinds the brains and
> clarifies things a lot.
> I will provide some classes for that at
> the sprint, unless somebody
> implements it earlier (please don't).
> This email was written in non-linear
> order, so please ignore logic inversions.
> Cheers - chris
> Sent from my Ei4Steve
> On Oct 19, 2012, at 23:31, Guido van Rossum <guido at> wrote:
>> On Fri, Oct 19, 2012 at 2:15 PM, Mark Shannon <mark at> wrote:
>>> On 19/10/12 13:55, Christian Tismer wrote:
>>>> Hi Nick,
>>>> On 16.10.12 03:49, Nick Coghlan wrote:
>>>>> On Tue, Oct 16, 2012 at 10:44 AM, Greg Ewing
>>>>> <greg.ewing at> wrote:
>>>>>> My original implementation of yield-from actually *did* avoid
>>>>>> this, by keeping a C-level pointer chain of yielding-from frames.
>>>>>> But that part was ripped out at the last minute when someone
>>>>>> discovered that it had a detrimental effect on tracebacks.
>>>>>> There are probably other ways the traceback problem could be
>>>>>> fixed, so maybe we will get this optimisation back one day.
>>>>> Ah, I thought I remembered something along those lines. IIRC, it was a
>>>>> bug report on one of the alphas that prompted us to change it.
>>>> I was curious and searched quite a lot.
>>>> It was v3.3.0a1 from 2012-03-15 as a reaction to #14230 and #14220
>>>> from Mark Shannon, patched by Benjamin.
>>>> Now I found the original implementation. That looks very much
>>>> as I'm thinking it should be.
>>>> Quite a dramatic change which works well, but really seems to remove
>>>> what I would call "now I can emulate most of Stackless" efficiently.
>>>> Maybe I should just try to think it would be implemented as before,
>>>> build an abstraction and just use it for now.
>>>> I will spend my time at PyCon de for sprinting on "yield from".
>>> The current implementation may not be much slower than Greg's original
>>> version. One of the main costs of making a call is the creation of a new
>>> frame. But calling into a generator does not need a new frame, so the cost
>>> will be reduced.
>>> Unless anyone has evidence to the contrary :)
>>> Rather than increasing the performance of this special case, I would suggest
>>> that improving the performance of calls & returns in general would be a more
>>> worthwhile goal.
>>> Calls and returns ought to be cheap.
>> I did a basic timing test using a simple recursive function and a
>> recursive PEP-380 coroutine computing the same value (see attachment).
>> The coroutine version is a little over twice as slow as the function
>> version. I find that acceptable. This went 20 deep, making 2 recursive
>> calls at each level (except at the deepest level).
>> Output on my MacBook Pro:
>> plain 2097151 0.5880069732666016
>> coro. 2097151 1.2958409786224365
>> This was a Python 3.3 built a few days ago from the 3.3 branch.

What you are comparing seems to have a constant factor of about 2.5.

minimax:py3 tismer$ python3
plain 0 1   0.00000
coro. 0 1   0.00001
relat 0 1   8.50000
plain 1 3   0.00000
coro. 1 3   0.00001
relat 1 3   2.77778
plain 2 7   0.00000
coro. 2 7   0.00001
relat 2 7   3.62500
plain 3 15   0.00000
coro. 3 15   0.00001
relat 3 15   2.87500
plain 4 31   0.00001
coro. 4 31   0.00002
relat 4 31   2.42424
plain 5 63   0.00002
coro. 5 63   0.00004
relat 5 63   2.46032
plain 6 127   0.00003
coro. 6 127   0.00007
relat 6 127   2.52542
plain 7 255   0.00006
coro. 7 255   0.00014
relat 7 255   2.38272
plain 8 511   0.00011
coro. 8 511   0.00028
relat 8 511   2.49356
plain 9 1023   0.00022
coro. 9 1023   0.00055
relat 9 1023   2.50327
plain 10 2047   0.00042
coro. 10 2047   0.00106
relat 10 2047   2.50956
plain 11 4095   0.00083
coro. 11 4095   0.00204
relat 11 4095   2.44699
plain 12 8191   0.00167
coro. 12 8191   0.00441
relat 12 8191   2.64792
plain 13 16383   0.00340
coro. 13 16383   0.00855
relat 13 16383   2.51881
plain 14 32767   0.00876
coro. 14 32767   0.01823
relat 14 32767   2.08106
plain 15 65535   0.01419
coro. 15 65535   0.03507
relat 15 65535   2.47131
plain 16 131071   0.02669
coro. 16 131071   0.06874
relat 16 131071   2.57515
plain 17 262143   0.05448
coro. 17 262143   0.13699
relat 17 262143   2.51467
plain 18 524287   0.10843
coro. 18 524287   0.27395
relat 18 524287   2.52660
plain 19 1048575   0.21310
coro. 19 1048575   0.54573
relat 19 1048575   2.56095
plain 20 2097151   0.42802
coro. 20 2097151   1.06199
relat 20 2097151   2.48114
plain 21 4194303   0.86531
coro. 21 4194303   2.19048
relat 21 4194303   2.53143
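
A comparison of this shape can be sketched as below. The attachment itself
is not reproduced in the archive, so the function bodies are assumptions:
in particular, where exactly the coroutine version suspends (here, once per
call) is a guess, chosen so that the number of next() calls equals the
number of calls made. Timings depend on the machine, so none are shown.

```python
import time


def plain(n):
    # Ordinary recursion: 2 recursive calls per level, 2**(n+1) - 1 calls total.
    if n <= 0:
        return 1
    return plain(n - 1) + 1 + plain(n - 1)


def coroutine(n):
    # Same computation as a PEP 380 coroutine.
    yield                          # assumed: one suspension per call
    if n <= 0:
        return 1
    left = yield from coroutine(n - 1)
    right = yield from coroutine(n - 1)
    return left + 1 + right


def run(gen):
    # Drive the coroutine from the top level and return its result.
    try:
        while True:
            next(gen)
    except StopIteration as stop:
        return stop.value


def compare(depth):
    t0 = time.perf_counter()
    a = plain(depth)
    t1 = time.perf_counter()
    b = run(coroutine(depth))
    t2 = time.perf_counter()
    return a, b, t1 - t0, t2 - t1


for depth in range(11):
    a, b, t_plain, t_coro = compare(depth)
    print('plain', depth, a, '%9.5f' % t_plain)
    print('coro.', depth, b, '%9.5f' % t_coro)
    if t_plain > 0:
        print('relat', depth, b, '%9.5f' % (t_coro / t_plain))
```

Raising the upper bound of the loop reproduces the deeper levels at the cost
of a longer run.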

ciao - chris

Christian Tismer             :^)   <mailto:tismer at>
Software Consulting          :     Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121     :    *Starship*
14482 Potsdam                :     PGP key ->
phone +49 173 24 18 776  fax +49 (30) 700143-0023
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
       whom do you want to sponsor today?

-------------- next part --------------
A non-text attachment was scrubbed...
Type: text/x-python
Size: 946 bytes
Desc: not available
URL: <>

From tjreedy at  Sat Oct 20 03:55:22 2012
From: tjreedy at (Terry Reedy)
Date: Fri, 19 Oct 2012 21:55:22 -0400
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <k5t0al$740$>

On 10/19/2012 5:31 PM, Guido van Rossum wrote:

> I did a basic timing test using a simple recursive function and a
> recursive PEP-380 coroutine computing the same value (see attachment).
> The coroutine version is a little over twice as slow as the function
> version. I find that acceptable. This went 20 deep, making 2 recursive
> calls at each level (except at the deepest level).
> Output on my MacBook Pro:
> plain 2097151 0.5880069732666016
> coro. 2097151 1.2958409786224365
> This was a Python 3.3 built a few days ago from the 3.3 branch.

At the top level, the coroutine version adds 2097151 next() calls. 
Suspecting that that, not the addition of 'yield from', was responsible
for most of the extra time, I added

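import time

def trivial():
    for i in range(2097151):
        yield
    # 'return value' sets StopIteration.value. On Python 3.7+ (PEP 479) an
    # explicit 'raise StopIteration(2097151)' here becomes a RuntimeError.
    return 2097151

def time_trivial():
    t0 = time.time()
    g = trivial()
    try:
        while True:
            next(g)
    except StopIteration as err:
        k = err.value
    t1 = time.time()
    print('triv.', k, t1-t0)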

The result supports the hypothesis.

plain 2097151 0.4590260982513428
coro. 2097151 0.9180529117584229
triv. 2097151 0.39902305603027344

I don't know what to make of this in the context of asynch operations, 
but in 'traditional' use, the generator would not replace a function 
returning a single number but one returning a list (of, in this case, 
2097151 numbers), so each next replaces a .append method call.

Terry Jan Reedy

From ncoghlan at  Sat Oct 20 03:56:45 2012
From: ncoghlan at (Nick Coghlan)
Date: Sat, 20 Oct 2012 11:56:45 +1000
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Sat, Oct 20, 2012 at 10:50 AM, Greg Ewing
<greg.ewing at> wrote:
> Christian Tismer wrote:
>> Actually I would like to have a python context where it gets into
>> "async mode" and interprets all functions defined in that mode as
>> generators.
> That sounds somewhat similar to another idea I proposed a while
> ago:
> There would be a special kind of function called a "cofunction",
> that you define using "codef" instead of "def". A cofunction
> is essentially a generator, but with a special property: when
> one cofunction calls another, the call is implicitly made as
> a "yield from" call.
> This scheme wouldn't be completely transparent, since the
> cofunctions have to be defined in a special way. But the calls
> would look like ordinary calls.

Please don't lose sight of the fact that yield-based suspension points
looking like something other than an ordinary function call is a
*feature*, not a bug.

The idea is that the flow control, especially the fact that "other
code may run here, so the world may have changed before we get to the
next expression", is visible *locally* in each function, rather than
relying on global knowledge of which calls may lead to a task switch.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From Steve.Dower at  Sat Oct 20 04:41:52 2012
From: Steve.Dower at (Steve Dower)
Date: Sat, 20 Oct 2012 02:41:52 +0000
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<>	<>
Message-ID: <>

I'm not entirely sure whether I'm hijacking the thread here... I have to admit I've somewhat lost track with all the renames. The discussion has been very interesting (I really like the 'codef' idea, and decorators can provide this without requiring syntax changes) regardless of which thread is active.

I have spent a bit of time writing up the approach that we (Dino, who posted it here originally, myself and with some advice from colleagues who are working on a similar API for C++) came up with and implemented.

I must apologise for the length - I got a little carried away with background information, but I do believe that it is important for us to understand exactly what problem we're trying to solve so that we aren't distracted by "new toys".

The write-up is here:

I included code, since there have been a few people asking for prototype implementations, so if you want to skip ahead to the code (which is quite heavily annotated) it is at or (I based my example on Greg's socket spam, so thanks for that!)

And no, I'm not collecting any ad revenue from the page, so feel free to visit as often as you like and use up my bandwidth.

Let the tearing apart of my approach commence! :)


From greg.ewing at  Sat Oct 20 04:44:51 2012
From: greg.ewing at (Greg Ewing)
Date: Sat, 20 Oct 2012 15:44:51 +1300
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:
> In your current style you write
> things like this:
>         block(self.queue)
>         yield
> I don't see how this decouples the call site of the primitive from the
> scheduler any more than if you were to write e.g. this:
>         yield block(self.queue)

If I wrote a library intended for serious use, the end user
probably wouldn't write either of those. Instead he would
write something like

    yield from block(self.queue)

and it would be an implementation detail of the library
whereabouts the 'yield' happened and whether it needed
to send a value or not.

When I say I don't like scheduler instructions, all I really
mean is that they shouldn't be part of the public API.
A scheduler can use them internally if it wants, I don't


From greg.ewing at  Sat Oct 20 05:11:08 2012
From: greg.ewing at (Greg Ewing)
Date: Sat, 20 Oct 2012 16:11:08 +1300
Subject: [Python-ideas] The async API of the future
In-Reply-To: <>
References: <>
Message-ID: <>

Jasper St. Pierre wrote:
> For most windowing systems, this isn't true. You need to call some
> function to check if you have events pending. For X11, this is
> "XPending". For Win32, this is "GetQueueStatus".

X11 is ultimately reading its events from the socket to
the display server. If you select() that socket, it will
tell you whenever the X11 event loop could possibly have
something to do.

On Windows, I imagine the equivalent would be to pass your
message queue handle to a WaitForMultipleObjects call.
I've never tried to do anything like that, though, so
I don't know if it would really work.

> What Twisted does in this case is swap out their own select
> loop with another implementation built around GLib's GMainLoop,

If it's truly impossible to incorporate GMainLoop as a
sub-loop of something else, then this is a bad situation.
What happens if you also want to use some other library
that insists on *its* main loop being in charge? This
cannot be a general solution.


From jstpierre at  Sat Oct 20 05:20:22 2012
From: jstpierre at (Jasper St. Pierre)
Date: Fri, 19 Oct 2012 23:20:22 -0400
Subject: [Python-ideas] The async API of the future
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 19, 2012 at 11:11 PM, Greg Ewing
<greg.ewing at> wrote:
> Jasper St. Pierre wrote:
>> For most windowing systems, this isn't true. You need to call some
>> function to check if you have events pending. For X11, this is
>> "XPending". For Win32, this is "GetQueueStatus".
> X11 is ultimately reading its events from the socket to
> the display server. If you select() that socket, it will
> tell you whenever the X11 event loop could possibly have
> something to do.

Nope. libX11/XCB keep their own queue of events and do their own
socket management, so it's not just "poll on this FD, thanks"

> On Windows, I imagine the equivalent would be to pass your
> message queue handle to a WaitForMultipleObjects call.
> I've never tried to do anything like that, though, so
> I don't know if it would really work.
>> What Twisted does in this case is swap out their own select
>> loop with another implementation built around GLib's GMainLoop,
> If it's truly impossible to incorporate GMainLoop as a
> sub-loop of something else, then this is a bad situation.
> What happens if you also want to use some other library
> that insists on *its* main loop being in charge? This
> cannot be a general solution.

GLib has a way of embedding its main loop in another, but it's not
easy or viable to use in a situation like this. It basically splits up
its event loop into multiple pieces (prepare, check, dispatch), which
you call at various times. Qt uses this for their GLib mainloop

It's clear there's never going to be one event loop solution (as Guido
already mentioned, there's wars about libuv/libevent/libev that we
can't possibly resolve), so why pretend like there is?



From greg.ewing at  Sat Oct 20 07:00:00 2012
From: greg.ewing at (Greg Ewing)
Date: Sat, 20 Oct 2012 18:00:00 +1300
Subject: [Python-ideas] The async API of the future
In-Reply-To: <>
References: <>
Message-ID: <>

Jasper St. Pierre wrote:
> Nope. libX11/XCB keep their own queue of events and do their own
> socket management, so it's not just "poll on this FD, thanks"

So you keep going until the internal buffer is empty.
"Run once" is probably a bit inaccurate; it's really
more like "run until you don't think there's anything
more to do".

> It's clear there's never going to be one event loop solution (as Guido
> already mentioned, there's wars about libuv/libevent/libev that we
> can't possibly resolve), so why pretend like there is?

This discussion seems to have got off track. I'm not
opposed to being able to choose whichever top-level
event loop works the best for your application.

All I set out to say is that a wait-for-ready style
event loop seems more amenable to having other event
loops plugged into it than a wait-for-completion
one.

But maybe that's not a problem if we provide an
IOCP-based event loop that can be plugged into the
wait-for-ready loop of your choice. Is that likely
to be feasible?


From greg.ewing at  Sat Oct 20 07:19:11 2012
From: greg.ewing at (Greg Ewing)
Date: Sat, 20 Oct 2012 18:19:11 +1300
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

Jim Jewett wrote:

> Who says that there has to be a scheduler?  Or at least a single scheduler?
> To me, the "obvious" solution is that each co-routine is "scheduled"
> only by its own caller, and runs on its own micro-thread.

I think you may be confused about what we mean by a "scheduler".
The scheduler is not something that you tell which task should
run next. Rather, the scheduler decides which task to run
next when the current task says "I'm waiting for something,
let someone else have a turn." The task that gets run will
very often be one that the suspending task knows nothing
about.

It's for that reason -- not all the tasks know about each
other -- that I think it's best to have only one scheduler in
any given system, so that it can make the best decision about
what to run next.
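
A minimal sketch of a scheduler in this sense, assuming plain generators as
tasks and a simple round-robin policy (task and run_all are names invented
for the example):

```python
from collections import deque


def task(name, steps, log):
    for i in range(steps):
        log.append((name, i))
        yield    # "I'm waiting for something, let someone else have a turn."


def run_all(tasks):
    ready = deque(tasks)
    while ready:
        current = ready.popleft()   # the scheduler, not the task, picks who runs
        try:
            next(current)
        except StopIteration:
            continue                # finished: do not requeue
        ready.append(current)


log = []
run_all([task("a", 2, log), task("b", 3, log)])
print(log)
```

Neither task refers to the other; the interleaving comes entirely from the
single loop that owns the ready queue.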


From jeanpierreda at  Sat Oct 20 07:27:52 2012
From: jeanpierreda at (Devin Jeanpierre)
Date: Sat, 20 Oct 2012 01:27:52 -0400
Subject: [Python-ideas] The async API of the future: yield-from
Message-ID: <>

On Fri, Oct 19, 2012 at 10:44 PM, Greg Ewing
<greg.ewing at> wrote:
> If I wrote a library intended for serious use, the end user
> probably wouldn't write either of those. Instead he would
> write something like
>    yield from block(self.queue)
> and it would be an implementation detail of the library
> whereabouts the 'yield' happened and whether it needed
> to send a value or not.

What's the benefit of having both "yield" and "yield from" as opposed
to just "yield"? It seems like an attractive nuisance if "yield" works
but doesn't let the function have implementation details and wait for
more than one thing or somesuch.

With the existing generator-coroutine decorators (monocle,
inlineCallbacks), there is no such trap. "yield foo()" will work no
matter how many things foo() will wait for.

My understanding is that the only benefit we get here is nicer
tracebacks. I hope there's more.

-- Devin

From greg.ewing at  Sat Oct 20 07:37:54 2012
From: greg.ewing at (Greg Ewing)
Date: Sat, 20 Oct 2012 18:37:54 +1300
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<> <>
Message-ID: <>

Nick Coghlan wrote:

> Please don't lose sight of the fact that yield-based suspension points
> looking like something other than an ordinary function call is a
> *feature*, not a bug.

People keep asserting that, but I don't think we have enough
experience with the explicit-switching-point-markers-all-the-
way-up style of coding to tell whether it's a good idea or not.

My gut feeling is that the explicit markers will help at the
lowest levels, where you're trying to protect a critical section,
but at the upper levels they will just be noise that causes
unnecessary worry.

In one of Guido's earlier posts (which I can't find now,
unfortunately), he said something that made it sound like
he was coming around to that point of view too, but he
seems to have gone back on that recently.


From greg.ewing at  Sat Oct 20 07:52:53 2012
From: greg.ewing at (Greg Ewing)
Date: Sat, 20 Oct 2012 18:52:53 +1300
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

Devin Jeanpierre wrote:
>>If I wrote a library intended for serious use, the end user
>>probably wouldn't write either of those. Instead he would
>>write something like
>>   yield from block(self.queue)

> What's the benefit of having both "yield" and "yield from" as opposed
> to just "yield"? It seems like an attractive nuisance if "yield" works
> but doesn't let the function have implementation details and wait for
> more than one thing or somesuch.

The documentation would say to use "yield from", and if
someone misreads that and just says "yield" instead, it's
their own fault.

I don't think it's worth the rather large increase in the
complexity of the scheduler implementation that would be
required to make "yield foo()" do the same thing as
"yield from foo()" in all circumstances, just to rescue
people who make this kind of mistake.

It's unfortunate that "yield" and "yield from" look so
similar. This is one way that cofunctions would help, by
making calls to subtasks look very different from yields.

> My understanding is that the only benefit we get here is nicer
> tracebacks. I hope there's more.

You also get a much simpler and much more efficient
scheduler implementation.
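
The difference in scheduler complexity can be sketched as follows. This is
an illustration under assumptions, not code from any of the libraries
mentioned: run_simple is all a driver needs for the "yield from" style,
while run_trampoline is roughly the least a driver must do so that
"yield foo()" runs foo() as a subtask. Exception propagation into callers is
left out and would add more code to the trampoline.

```python
import types


def leaf(value):
    yield                 # a genuine suspension point
    return value * 2


def with_yield_from():
    a = yield from leaf(1)
    b = yield from leaf(2)
    return a + b


def with_plain_yield():
    a = yield leaf(1)     # only works under the trampoline below
    b = yield leaf(2)
    return a + b


def run_simple(task):
    # Enough for the 'yield from' style: just keep calling next().
    try:
        while True:
            next(task)
    except StopIteration as stop:
        return stop.value


def run_trampoline(task):
    # For the 'yield foo()' style: the driver keeps its own call stack.
    stack = [task]
    to_send = None
    while True:
        try:
            yielded = stack[-1].send(to_send)
        except StopIteration as stop:
            stack.pop()
            if not stack:
                return stop.value
            to_send = stop.value     # deliver the result to the caller
            continue
        if isinstance(yielded, types.GeneratorType):
            stack.append(yielded)    # a "call": descend into the subtask
        to_send = None


print(run_simple(with_yield_from()), run_trampoline(with_plain_yield()))
```

With "yield from", the interpreter maintains the chain of suspended callers;
with a bare "yield", that bookkeeping moves into the scheduler.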


From ubershmekel at  Sat Oct 20 10:00:28 2012
From: ubershmekel at (Yuval Greenfield)
Date: Sat, 20 Oct 2012 10:00:28 +0200
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Thu, Oct 18, 2012 at 9:49 AM, Greg Ewing <greg.ewing at> wrote:

> I've converted my tutorial on generator-based tasks
> for Python 3.3, tidied it up a bit and posted it here:
> --
> Greg

Thanks for writing this. I've used threads all my life so this
coroutine/yield-from paradigm is hard for me to grok even after reading
this quite a few times.

I can't wrap my head around the block and unblock functions.

block() removes the current task from the ready_list, but is the current
task guaranteed to be my task? If so, then I'd never run again after the
yield in acquire(), that is unless a gracious other player unblocks me.

Is block() in acquire() the philosopher or the fork avoiding the scheduler?
Is the yield in acquire() the philosopher relinquishing control, or the fork?

I think I finally figured it out after staring at it for long enough. I'm
not sure it makes sense for scheduler functions to store waiting tasks in a
queue owned by the app and invisible to the scheduler. This can cause
*invisible deadlocks* such as:

schedule(philosopher("Socrates", 8, 3, 1, forks[0], forks[2]), "Socrates")
schedule(philosopher("Euclid", 5, 1, 4, forks[2], forks[0]), "Euclid")

Which may be really hard to debug.
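
Here is a self-contained sketch of that failure mode. It is a simplified
stand-in, not the tutorial's code: Fork, acquire, release, run and the
shortened philosopher signature are assumptions made for this example. Two
tasks take two forks in opposite order, and the scheduler simply runs out of
ready tasks.

```python
from collections import deque

ready = deque()
current = None
log = []


class Fork:
    def __init__(self, name):
        self.name = name
        self.holder = None
        self.waiters = deque()   # owned by the fork; run() never looks here


def acquire(fork):
    if fork.holder is None:
        fork.holder = current
    else:
        fork.waiters.append(current)
        ready.remove(current)    # block: leave the ready list
        yield                    # resumed only after release() hands us the fork


def release(fork):
    if fork.waiters:
        fork.holder = fork.waiters.popleft()
        ready.append(fork.holder)
    else:
        fork.holder = None


def philosopher(name, first, second):
    yield from acquire(first)
    log.append(name + " has " + first.name)
    yield                        # let the other philosopher run
    yield from acquire(second)
    log.append(name + " has " + second.name)
    release(second)
    release(first)


def run():
    global current
    while ready:
        current = ready[0]
        try:
            next(current)
        except StopIteration:
            ready.remove(current)
        else:
            if ready and ready[0] is current:
                ready.rotate(-1)  # time slice over: go to the back


forks = [Fork("A"), Fork("B")]
ready.append(philosopher("Socrates", forks[0], forks[1]))
ready.append(philosopher("Euclid", forks[1], forks[0]))
run()                            # returns normally although nobody finished
stuck = [(f.name, len(f.waiters)) for f in forks]
print(log, stuck)
```

run() returns without any error, and the only trace of the deadlock is in
the forks' own waiter queues, which is what makes it hard to spot from the
scheduler's side.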

Is there a coroutine strategy for tackling these challenges? Or will I just
get better at overcoming them with practice?

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <>

From glyph at  Sat Oct 20 10:33:07 2012
From: glyph at (Glyph)
Date: Sat, 20 Oct 2012 01:33:07 -0700
Subject: [Python-ideas] The async API of the future
In-Reply-To: <>
References: <>
Message-ID: <>

Greg Ewing <greg.ewing at> wrote:

> Guido van Rossum wrote:
>> I would like people to be able to write fast
>> event handling programs on Windows too, ... But I don't know how
>> tenable that is given the dramatically different style used by IOCP
>> and the need to use native Windows API for all async I/O -- it sounds
>> like we could only do this if the library providing the I/O loop
>> implementation also wrapped all I/O operations, and that may be a bit
>> much.
> That's been bothering me, too. It seems like an interface accommodating the completion-based style will have to be *extremely* fat.

No, not really.  Quite the opposite, in fact.  The way to make the interface thin is to abstract out all the details related to the particulars of the multiplexing I/O underneath everything and the transport functions necessary to read data out of it.

The main interfaces you need are here:


which have maybe a dozen methods between them, and could be cleaned up for a standardized version.

The interface required for unifying over completion-based and socket-based is actually much thinner than the interface you get if you start exposing sockets all over the place.

But focusing on I/O completion versus readiness-notification is, like the triggering modes discussion, missing the forest for the trees.  Some of IOCP's triggering modes are themselves interesting examples of a pattern, but IOCP by itself is a bit of a red herring.  Another thing you want to abstract over is pipes versus sockets versus files versus UNIX sockets versus UNIX sockets with CMSG extensions versus TLS over TCP versus SCTP versus bluetooth.  99% of applications do not care: a stream of bytes is a stream of bytes and you have to turn it into a stream of some other, higher-layer event protocol.

I would really, really encourage everyone interested in this area of design to go read all of twisted.internet.interfaces and familiarize yourselves with the contents there and make specific comments about those existing interfaces rather than some hypothetical ideal.  Also, the Twisted chapter <> in "the architecture of open source applications" explains some of Twisted's architectural decisions.  If you're going to re-invent the wheel, it behooves you to at least check whether the existing ones are round.  I'm happy to answer questions about specifics of how things are implemented and whether the Twisted APIs have certain limitations, and to fill in gaps in the documentation.  There is certainly an embarrassing profusion of those, especially in these decade-old, core APIs that haven't changed since we started requiring docstrings; if you find any, please file bugs and I will try to do what I can to get them fixed.  But I'd rather not have to keep re-describing the basics.

> That's not just a burden for anyone implementing the interface, it's a problem for any library wanting to *wrap* it as well.

I really have no idea what you mean by this.  Writing and wrapping ITransport and IProtocol is pretty straightforward.  With the enhanced interfaces I'm working on in <>, it's almost automatic.

<>, for example, is a complete bi-directional proxying of all interfaces related to transports (even TCP transport specific APIs, not just the core interfaces above), in addition to implementing all the glue necessary for TLS, with thorough docstrings and comments, all in just over 600 lines.  This also easily deals with the fact that, for example, sometimes in order to issue a read-ready notification, TLS needs to write some bytes; and in order to issue a write-ready notification, TLS sometimes needs to read some bytes.

> For example, to maintain separation between the async layer and the generator layer, we will probably want to have an AsyncSocket object in the async layer, and a separate GeneratorSocket in the generator layer that wraps an AsyncSocket.

Yes, generator scheduling and async I/O are really different layers, as I explained in a previous email.  This is a good thing as it provides a common basis for developing things in different styles as appropriate to different problem domains.  If you smash them together you're just conflating responsibilities and requiring abstraction inversions, not making it easier to implement anything.

> If the AsyncSocket needs to provide methods for all the possible I/O operations that one might want to perform on a socket, then GeneratorSocket needs to provide its own versions of all those methods as well.

GeneratorSocket does not even need to exist in the first implementation of this kind of a spec, let alone provide all possible operations.  Python managed to survive without "all the possible I/O operations that one might want to perform on a socket" for well over a decade; sendmsg and recvmsg didn't arrive until 3.3: <>.

Plus, GeneratorSocket isn't that hard to write.  You just need a method for each I/O operation that returns a Future (by which I mean Deferred, of course :)) and then fires that Future from the relevant I/O operation's callback.
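
A rough sketch of that wrapping, under stated assumptions: `FakeAsyncSocket` and its callback-style `recv(nbytes, callback)` are hypothetical stand-ins for the async layer (not a Twisted API), and `concurrent.futures.Future` stands in for a Deferred:

```python
from concurrent.futures import Future


class FakeAsyncSocket:
    """Hypothetical callback-style socket standing in for the async layer."""

    def __init__(self):
        self._pending = []   # queued (nbytes, callback) read requests

    def recv(self, nbytes, callback):
        self._pending.append((nbytes, callback))

    def deliver(self, data):
        """Pretend the event loop received `data` and complete the oldest recv."""
        nbytes, callback = self._pending.pop(0)
        callback(data[:nbytes])


class GeneratorSocket:
    """One method per I/O operation, each returning a Future."""

    def __init__(self, async_sock):
        self._sock = async_sock

    def recv(self, nbytes):
        fut = Future()
        # The I/O operation's callback fires the Future.
        self._sock.recv(nbytes, fut.set_result)
        return fut


sock = FakeAsyncSocket()
gsock = GeneratorSocket(sock)
fut = gsock.recv(4)
pending_before = fut.done()      # nothing has arrived yet
sock.deliver(b"hello")
received = fut.result()          # completed by the callback
print(pending_before, received)
```

Error paths would presumably use `set_exception` in the same way; they are left out here to keep the sketch short.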

> Multiply that by the number of different kinds of I/O objects (files, sockets, message queues, etc. -- there seem to be quite a lot of them on Windows) and that's a *lot* of stuff to be wrapped.

The common operations here are by far the most important.  But, yes, if you want to have support for all the wacky things that Windows provides, you have to write wrappers for all the wacky things you need to call.

>> Finally, there should also be some minimal interface so that multiple I/O loops can interact -- at least in the case where one I/O loop belongs to a GUI library.
> That's another thing that worries me. With a ready-based event loop, this is fairly straightforward. If you can get hold of the file descriptor or handle that the GUI is ultimately reading its input from, all you need to do is add it as an event source to your main loop, and when it's ready, tell the GUI event loop to run itself once.

No.  That is how X windows and ncurses work, not how GUIs in general work.

On Windows, the GUI is a message pump on a thread (and possibly a collection thereof); there's no discrete event which represents it and no completion port or event object that gets its I/O, but at the low level, you're still expected to write your own loop and call something that blocks waiting for GUI input.  (This actually causes some problems, see below.)

On Mac OS X, the GUI is an event loop of its own; you have to integrate with CFRunLoop via CFRunLoopRun (or something that eventually calls it, like NSApplicationMain), not write your own loop that calls a blocking function.  You don't get to invent your own thing with kqueue or select() and then explicitly observe "the GUI" as some individual discrete event; there's nothing to read, the GUI just calls directly into your application.  Underneath there's some mach messages and stuff, but I honestly couldn't tell you how that all works; it's not necessary to understand.  (And in fact "the GUI" is not actually just the GUI, but a whole host of notifications from other processes, the display, the sound device, and so on, that you can register for.  The documentation for NSNotificationCenter is illuminating.)

(I don't know anything about Android.  Can anyone comment authoritatively about that?)

This really doesn't have anything to do with the readiness-based-ness of the API, but rather that there is more in heaven and earth (and kernel interrupt handlers) than is dreamt of in your philosophy (and file descriptor dispatch functions).

Once again: the important thing is to separate out these fiddly low layers for each platform and get something that exposes the high layer that most python programmers care about - "incoming connection", "here are some bytes", "your connection was dropped" - in such a way that you can plug in an implementation that uses it to any one of these low-level things.

> But you can't do that with a completion-based main loop, because the actual reading of the input needs to be done in a different way, and that's usually buried somewhere deep in the GUI library where you can't easily change it.

Indeed not, but this phrasing makes it sound like "completion-based" main loops are some weird obscure thing.  This is not an edge-case problem you can sweep under the rug with the assumption that somebody will be able to wrestle a file descriptor out of the GUI somehow or emulate it eventually.  The GUI platforms that basically everyone in the world uses don't observe file descriptors for their input events.

>> It seems this is a solved problem (as well solved as you can hope for) to Twisted, so we should just adopt their
>> approach.
> Do they actually do it for an IOCP-based main loop on Windows?

No, but it's hypothetically possible.

For GUIs, we have win32eventreactor, which can't support as many sockets, but runs the message pump, which causes the GUI to run (for most GUI toolkits).  Several low-level Windows applications have used this to good effect.  (Although I don't know of any that are open source, unfortunately.)

There's also the fact that most people writing Python GUIs want to use a cross-platform library, so most of the demand for GUI sockets on Windows has been for integrating with Wx, Qt, or GTK, and we have support for all of those separately from the IOCP stuff.  It's usually possible to call the wrapped socket functions in those libraries, but more difficult to reach below the GUI library and dispatch to it one windows message pump message at a time.

> If so, I'd be interested to know how.

It's definitely possible to get a GUI to cooperate nicely with IOCP, but it's a bit challenging to figure out how.  I had a very long, unpleasant conversation with the IOCP reactor's maintainer while we refreshed our memories about the frankly sadistic IOCP API, and put together all of our various experiences working with it, trying to refresh our collective memory to the point where we remembered enough about the way IOCP actually works to be able to explain it, so I hope you enjoy this :-).

Right now Twisted's IOCP reactor uses the mode of IOCP where it passes NULL to both the lpCompletionRoutine and lpOverlapped->hEvent member of everything (i.e. WSARecv, WSASend, WSAAccept, etc).  Later, the reactor thread blocks on GetQueuedCompletionStatus, which only blocks on the associated completion port's scheduled I/O, which precludes noticing when messages arrive from the message pump.

As I mentioned above, the message pump is a discrete source of events and can't be waited upon as a C runtime "file descriptor", WSA socket, IOCP completion or thread event.  Also, you can't translate it into one of those sources, because the message pump is associated with a particular thread; you can't call a function in a different thread to call PostQueuedCompletionStatus.

There are two ways to fix this; there already is a lengthy and confusing digression in comments in the implementation explaining parts of this.

The first, and probably easiest option, is simply to create an event with CreateEvent(bManualReset=False) and fill out the hEvent structure of all queued Event objects with that same event, pass that event handle to MsgWaitForMultipleObjectsEx.  Then, if the message queue wakes up the thread, you dispatch messages the standard way (doing what win32eventreactor already does: see win32gui.PumpWaitingMessages).  If instead, the event signals, you call GetQueuedCompletionStatus as IOCP already does, and it will always return immediately.

The second (and probably higher performance) option is to fill out the lpCompletionRoutine parameter to all I/O functions, and effectively have the reactor's "loop" integrated into the implicit asynchronous procedure dispatch of any alertable function.  This would have to be MsgWaitForMultipleObjectsEx in order to wait on events added with addEvent(), in the reactor's core.  The reactor's core itself could actually just call WaitForSingleObjectEx() and it would be roughly the same except for those external events, as long as the thread is put into an alertable state.  This option is likely higher performance because it removes all the function call and iteration overhead because you effectively go straight from the kernel to the I/O handling function.  In addition to being slightly trickier though, there's also the fact that someone else might put the thread into an alertable state and the I/O completion might be done with a surprising stack frame.

If you want to integrate this with a modern .NET application (i.e. windows platform-specific stuff), I think this is the relevant document: <>; I am not sure how you'd integrate it with Wx/Tk/Qt/GTK+.



From shibturn at  Sat Oct 20 12:56:41 2012
From: shibturn at (Richard Oudkerk)
Date: Sat, 20 Oct 2012 11:56:41 +0100
Subject: [Python-ideas] The async API of the future
In-Reply-To: <>
References: <>
Message-ID: <k5u015$l7q$>

On 20/10/2012 1:33am, Greg Ewing wrote:
> That's been bothering me, too. It seems like an interface
> accommodating the completion-based style will have to be
> *extremely* fat.
> That's not just a burden for anyone implementing the
> interface, it's a problem for any library wanting to *wrap*
> it as well.
> For example, to maintain separation between the async
> layer and the generator layer, we will probably want to
> have an AsyncSocket object in the async layer, and a
> separate GeneratorSocket in the generator layer that wraps
> an AsyncSocket.
> If the AsyncSocket needs to provide methods for all the
> possible I/O operations that one might want to perform on
> a socket, then GeneratorSocket needs to provide its own
> versions of all those methods as well.
> Multiply that by the number of different kinds of I/O
> objects (files, sockets, message queues, etc. -- there
> seem to be quite a lot of them on Windows) and that's
> a *lot* of stuff to be wrapped.

I don't see why a completion api needs to create wrappers for sockets.  See

for an implementation of a completion api implemented for Unix (plus a 
stupid reactor class and some example server/client code).

The AsyncIO class is independent of reactors, futures etc.  The methods 
for starting an operation are

     recv(key, sock, nbytes, flags=0)
     send(key, sock, buf, flags=0)
     accept(key, sock)
     connect(key, sock, address)

The "key" argument is used as an identifier for the operation.  You wait 
for something to complete using


which returns a list of tuples of the form "(key, success, value)" 
representing completed operations.  "key" is the identifier used when 
starting the operation, "success" is a boolean indicating whether an 
error occurred, and "value" is the return/exception value.  To check 
whether there are any outstanding operations, use


(To make the AsyncIO class usable without a reactor one should probably 
implement a "filtered" wait so that you can restrict the keys you want 
to wait for.)
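
A toy sketch of the general shape of such an API, layered over select(). It
is not the linked implementation: the class name `ToyAsyncIO` and the method
names `wait` and `pending` are assumptions made for this example, and
"success" is taken here to mean that no error occurred:

```python
import select
import socket


class ToyAsyncIO:
    """Toy completion-style wrapper over select(); illustration only."""

    def __init__(self):
        self._reads = {}    # sock -> (key, nbytes, flags)
        self._writes = {}   # sock -> (key, buf, flags)

    def recv(self, key, sock, nbytes, flags=0):
        self._reads[sock] = (key, nbytes, flags)

    def send(self, key, sock, buf, flags=0):
        self._writes[sock] = (key, buf, flags)

    def pending(self):
        """True while any started operation has not completed (assumed name)."""
        return bool(self._reads or self._writes)

    def wait(self, timeout=None):
        """Return a list of (key, success, value) tuples (assumed name)."""
        readable, writable, _ = select.select(
            list(self._reads), list(self._writes), [], timeout)
        results = []
        for sock in readable:
            key, nbytes, flags = self._reads.pop(sock)
            results.append(self._attempt(key, sock.recv, nbytes, flags))
        for sock in writable:
            key, buf, flags = self._writes.pop(sock)
            results.append(self._attempt(key, sock.send, buf, flags))
        return results

    @staticmethod
    def _attempt(key, operation, *args):
        try:
            return (key, True, operation(*args))
        except OSError as exc:
            return (key, False, exc)


a, b = socket.socketpair()
aio = ToyAsyncIO()
aio.send("greeting", a, b"ping")
aio.recv("reply", b, 4)

completed = []
while aio.pending():
    completed.extend(aio.wait(timeout=1.0))
a.close()
b.close()
print(completed)
```

The caller never touches readiness: it starts operations, then collects
finished ones by key, which is the part a reactor or future layer would build
on.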


From tismer at  Sat Oct 20 13:52:19 2012
From: tismer at (Christian Tismer)
Date: Sat, 20 Oct 2012 13:52:19 +0200
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

Hi Greg,

On 20.10.12 02:50, Greg Ewing wrote:
> Christian Tismer wrote:
>> Actually I would like to have a python context where it gets into
>> "async mode" and interprets all functions defined in that mode as 
>> generators.
> That sounds somewhat similar to another idea I proposed a while
> ago:
> There would be a special kind of function called a "cofunction",
> that you define using "codef" instead of "def". A cofunction
> is essentially a generator, but with a special property: when
> one cofunction calls another, the call is implicitly made as
> a "yield from" call.

Wow, I had a look at the patch. Without talking about the syntax,
this is very close to what I'm trying without a patch.
No, it is almost identical.
> This scheme wouldn't be completely transparent, since the
> cofunctions have to be defined in a special way. But the calls
> would look like ordinary calls.
> There's a PEP describing a variation on the idea here:
> In that version, calls to cofunctions are specially marked
> using a "cocall" keyword. But since writing that, I've come to
> believe that my original idea (where the cocalls are implicit)
> was better.

Yes, without the keyword it looks better. Would you raise an
exception if something is called that is not a cofunction? Or
would that be an ordinary call?

The only difference is that I'm not aiming at coroutines in
the first place, but just having the concept of a *suspendable*
function.

What has happened to the PEP, was it rejected?

ciao - chris

Christian Tismer             :^)   <mailto:tismer at>
Software Consulting          :     Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121     :    *Starship*
14482 Potsdam                :     PGP key ->
phone +49 173 24 18 776  fax +49 (30) 700143-0023
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
       whom do you want to sponsor today?

From ncoghlan at  Sat Oct 20 14:16:46 2012
From: ncoghlan at (Nick Coghlan)
Date: Sat, 20 Oct 2012 22:16:46 +1000
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

On Sat, Oct 20, 2012 at 9:52 PM, Christian Tismer <tismer at> wrote:
> What has happened to the PEP, was it rejected?

No, it's still open. We just wanted to give the yield from PEP a
chance to see some use on its own before we started trying to take it
any further, and Greg was amenable to that approach.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From tismer at  Sat Oct 20 14:54:00 2012
From: tismer at (Christian Tismer)
Date: Sat, 20 Oct 2012 14:54:00 +0200
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

On 20.10.12 13:52, Christian Tismer wrote:
> Hi Greg,
> On 20.10.12 02:50, Greg Ewing wrote:
>> Christian Tismer wrote:
>>> Actually I would like to have a python context where it gets into
>>> "async mode" and interprets all functions defined in that mode as 
>>> generators.
>> That sounds somewhat similar to another idea I proposed a while
>> ago:
>> There would be a special kind of function called a "cofunction",
>> that you define using "codef" instead of "def". A cofunction
>> is essentially a generator, but with a special property: when
>> one cofunction calls another, the call is implicitly made as
>> a "yield from" call.
> Wow, I had a look at the patch. Without talking about the syntax,
> this is very close to what I'm trying without a patch.
> No, it is almost identical.
>> This scheme wouldn't be completely transparent, since the
>> cofunctions have to be defined in a special way. But the calls
>> would look like ordinary calls.
>> There's a PEP describing a variation on the idea here:
>> In that version, calls to cofunctions are specially marked
>> using a "cocall" keyword. But since writing that, I've come to
>> believe that my original idea (where the cocalls are implicit)
>> was better.
> Yes, without the keyword it looks better. Would you raise an
> exception if something is called that is not a cofunction? Or
> would that be an ordinary call?
> The only difference is that I'm not aiming at coroutines in
> the first place, but just having the concept of a *suspendable*
> function.
> What has happened to the PEP, was it rejected?

I just saw that it is in flux and did not please you either.

A rough idea would be to start the whole interpreter in suspendable
mode. Maybe that's too much. I'm seeking a way to tell a whole bunch
of functions that they should be suspendable.
What if we had a flag (unclear how) saying that function calls should now
behave like cofunctions? That flag would be initiated by a root call
and then propagated to the callees, without any syntax change.

In any case it should be possible to inquire/assert that a function
is running as cofunction.


From tismer at  Sat Oct 20 17:39:45 2012
From: tismer at (Christian Tismer)
Date: Sat, 20 Oct 2012 17:39:45 +0200
Subject: [Python-ideas] Cofunctions - Back to Basics
In-Reply-To: <>
References: <> <>
	<> <>
	<> <>
Message-ID: <>


I found a misunderstanding again.

On 29.10.11 04:10, Nick Coghlan wrote:
> On Sat, Oct 29, 2011 at 8:40 AM, Ethan Furman <ethan at> wrote:
>> Greg Ewing wrote:
>>> Mark Shannon wrote:
>>>> Stackless provides coroutines. Greenlets are also coroutines (I think).
>>>> Lua has them, and is implemented in ANSI C, so it can be done portably.
>>> These all have drawbacks. Greenlets are based on non-portable
>>> (and, I believe, slightly dangerous) C hackery, and I'm given
>>> to understand that Lua coroutines can't be suspended from
>>> within a C function.
>>> My proposal has limitations, but it has the advantage of
>>> being based on fully portable and well-understood techniques.
>> If Stackless has them, could we use that code?
> That's what the greenlets module *is* - the coroutine code from
> Stackless, lifted out and provided as an extension module instead of a
> forked version of the runtime.

No, the greenlet code is a subset of stackless.
Stackless could remove its greenlet part and become
an assembler-free implementation.
It would just not get over the C extension problem.

But that could then be handled by using the greenlet as an optional
external module.

(I think I said that before. Just wanted it to appear here)


From tismer at  Sat Oct 20 17:59:41 2012
From: tismer at (Christian Tismer)
Date: Sat, 20 Oct 2012 17:59:41 +0200
Subject: [Python-ideas] Cofunctions - Back to Basics
In-Reply-To: <>
References: <> <>
	<> <>
Message-ID: <>

Picking that up, too...

On 29.10.11 09:37, Greg Ewing wrote:
> Nick Coghlan wrote:
>> The limitation of Lua style coroutines is that they can't be suspended
>> from inside a function implemented in C. Without greenlets/Stackless
>> style assembly code, coroutines in Python would likely have the same
>> limitation.
>> PEP 3152 (and all generator based coroutines) have the limitation that
>> they can't suspend if there's a *Python* function on the stack. Can
>> you see why I now consider this approach categorically worse than one
>> that pursued the Lua approach?
> Ouch, yes, point taken. Fortunately, I think I may have an
> answer to this...
> Now that the cocall syntax is gone, the bytecode generated for
> a cofunction is actually identical to that of an ordinary
> function. The only difference is a flag in the code object.
> If the flag were moved into the stack frame instead, it would
> be possible to run any function in either "normal" or "coroutine"
> mode, depending on whether it was invoked via __call__ or
> __cocall__.
> So there would no longer be two kinds of function, no need for
> 'codef', and any pure-Python code could be used either way.
> This wouldn't automatically handle the problem of C code --
> existing C functions would run in "normal" mode and therefore
> wouldn't be able to yield. However, there is at least a clear
> way for C-implemented objects to participate, by providing
> a __cocall__ method that returns an iterator.

What about this idea?
I think I just wrote exactly the same thing in another thread ;-)

Is it still under consideration?

(I missed quite a lot when recovering from my strokes ...)


From guido at  Sat Oct 20 18:10:40 2012
From: guido at (Guido van Rossum)
Date: Sat, 20 Oct 2012 09:10:40 -0700
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 19, 2012 at 10:27 PM, Devin Jeanpierre
<jeanpierreda at> wrote:
> On Fri, Oct 19, 2012 at 10:44 PM, Greg Ewing
> <greg.ewing at> wrote:
>> If I wrote a library intended for serious use, the end user
>> probably wouldn't write either of those. Instead he would
>> write something like
>>    yield from block(self.queue)
>> and it would be an implementation detail of the library
>> whereabouts the 'yield' happened and whether it needed
>> to send a value or not.
> What's the benefit of having both "yield" and "yield from" as opposed
> to just "yield"? It seems like an attractive nuisance if "yield" works
> but doesn't let the function have implementation details and wait for
> more than one thing or somesuch.
> With the existing generator-coroutine decorators (monocle,
> inlineCallbacks), there is no such trap. "yield foo()" will work no
> matter how many things foo() will wait for.
> My understanding is that the only benefit we get here is nicer
> tracebacks. I hope there's more.

It is also *much* faster. In the "yield <future>" style (what I use in
NDB) every level that blocks involves the creation of a Future and a
bunch of code that sets its result. The scheduler has to do a lot of
work to make it work. In Greg's "yield from <generator>" style most of
those futures disappear, so adding extra layers of logic is much
cheaper. (And believe me, in a real system, like NDB is, you have to
add a lot of extra logic layers to make your API easy to use.) As a
result Greg's scheduler is much simpler. (In the last week I wrote one
to test this hypothesis, so I know.)

I do have one concern, but it can easily be addressed. Users have the
tendency to make mistakes. In NDB, a common mistake is leaving out the
yield keyword. Fortunately when you do that, nothing works, so you
typically find out quickly. The other mistake is found even more easily:
writing yield where you shouldn't. The NDB scheduler rejects values
that aren't Futures so this is diagnosed precisely and with a decent
stack trace.

In the PEP 380 world, there will be a new type of mistake possible:
writing yield instead of yield from. Fortunately the scheduler can
easily test for this -- if the result of its calling next() is not
None, the user yielded something. In Greg's strict design, you should
never yield a value from a coroutine, so that's always an error. Even
in a design where values yielded are used as scheduler instructions
(albeit only by the lowest levels of the I/O wrappers), we can assume
that a value yielded should never be a generator -- so the scheduler
can throw back an exception if it receives a generator, and it can
even hint to the user "did you mean yield from instead of yield?". The
exception thrown in will show exactly the point where the from keyword
is missing. (Making diagnosing cases like this more robust actually
pleads for adopting Greg's strict stance.)
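
A minimal sketch of that check, assuming a toy round-robin scheduler (the
`run`, `sleep_once`, `good` and `bad` names are made up for this example).
It follows the strict stance: any yielded value other than None is treated
as an error, and a yielded generator gets the extra hint:

```python
import types
from collections import deque


def run(tasks):
    """Toy round-robin scheduler that rejects any non-None yielded value."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            value = next(task)
        except StopIteration:
            continue                      # task finished normally
        if value is not None:
            hint = ""
            if isinstance(value, types.GeneratorType):
                value.close()             # it was never started; discard it
                hint = "; did you mean 'yield from' instead of 'yield'?"
            # Throw into the task so the traceback points at the bad yield.
            # A task that catches this error is not handled in this sketch.
            task.throw(TypeError(
                "task yielded %r%s" % (type(value).__name__, hint)))
        ready.append(task)


def sleep_once():
    yield                                 # bare yield: back to the scheduler


def good(log):
    yield from sleep_once()
    log.append("good done")


def bad(log):
    yield sleep_once()                    # bug: 'from' is missing
    log.append("bad done")


log = []
run([good(log)])
try:
    run([bad(log)])
except TypeError as exc:
    message = str(exc)
print(log)
print(message)
```

Because the exception is raised inside the offending task, the traceback
ends at the line where the from keyword is missing.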

--Guido van Rossum (

From guido at  Sat Oct 20 18:30:16 2012
From: guido at (Guido van Rossum)
Date: Sat, 20 Oct 2012 09:30:16 -0700
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<> <>
Message-ID: <>

On Fri, Oct 19, 2012 at 10:37 PM, Greg Ewing
<greg.ewing at> wrote:
> Nick Coghlan wrote:
>> Please don't lose sight of the fact that yield-based suspension points
>> looking like something other than an ordinary function call is a
>> *feature*, not a bug.

(Ironically, Christian just revived an old thread where Nick was of a
different opinion.)

> People keep asserting that, but I don't think we have enough
> experience with the explicit-switching-point-markers-all-the-
> way-up style of coding to tell whether it's a good idea or not.

Hm. I would say I have a lot of real world experience with this style:
App Engine's NDB. It's in use by 1000s of people. I've written a lot
of complicated database code in it (integrating several layers of
caching). This style really does hold up well.

Now, I think that if it could yield from, NDB would be even better,
but for most code it would be a very minimal change, and the issue
here (the requirement to mark suspension points) is the same.

In C# they also have a lot of experience with this style -- functions
declared as async must be waited for using await, and the type checker
enforces that it's await all the way up (I think a function using
await must be declared as async, too).

> My gut feeling is that the explicit markers will help at the
> lowest levels, where you're trying to protect a critical section,
> but at the upper levels they will just be noise that causes
> unnecessary worry.

Actually, an earlier experience (like 25 years earlier :-) suggests
that it's at the higher levels where you get in trouble without the
markers -- because you still have critical sections in end user code,
but it's impossible to remember which functions you call may cause a
task switch.
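
A small sketch of why the markers matter in end user code, assuming a toy
scheduler and a shared balance (every name here is invented for the
example). With explicit markers, the first version visibly has a switch
point inside its read-modify-write, and the second visibly does not:

```python
from collections import deque

balance = {"value": 100}


def pause():
    yield                            # the only place a task switch can happen


def withdraw_unsafe(amount):
    seen = balance["value"]
    yield from pause()               # marked switch inside the critical section
    balance["value"] = seen - amount


def withdraw_safe(amount):
    yield from pause()
    balance["value"] -= amount       # no marker between the read and the write


def run(tasks):
    """Step each task in turn until all have finished."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue
        ready.append(task)


run([withdraw_unsafe(30), withdraw_unsafe(50)])
unsafe_result = balance["value"]     # one update is lost

balance["value"] = 100
run([withdraw_safe(30), withdraw_safe(50)])
safe_result = balance["value"]
print(unsafe_result, safe_result)
```

If `pause()` looked like an ordinary call, nothing in `withdraw_unsafe`
would show that another task can run between the read and the write.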

> In one of Guido's earlier posts (which I can't find now,
> unfortunately), he said something that made it sound like
> he was coming around to that point of view too, but he
> seems to have gone back on that recently.

I was probably more waxing philosophical on the reasons why people
like greenlets/gevent (if they like it). I feel I am pretty
consistently in favor of marking switch points, at least in the
context we are currently discussing (where high-speed async event
handling is the thing to do).

For less-performant situations I'm fine with writing classic
synchronous-looking code, and running it in multiple OS threads for
concurrency reasons. But the purpose of designing a new async API is
to break the barrier of one thread per connection.

--Guido van Rossum (

From tismer at  Sat Oct 20 19:48:33 2012
From: tismer at (Christian Tismer)
Date: Sat, 20 Oct 2012 19:48:33 +0200
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On 20.10.12 18:30, Guido van Rossum wrote:
> On Fri, Oct 19, 2012 at 10:37 PM, Greg Ewing
> <greg.ewing at> wrote:
>> Nick Coghlan wrote:
>>> Please don't lose sight of the fact that yield-based suspension points
>>> looking like something other than an ordinary function call is a
>>> *feature*, not a bug.
> (Ironically, Christian just revived an old thread where Nick was of a
> different opinion.)
>> People keep asserting that, but I don't think we have enough
>> experience with the explicit-switching-point-markers-all-the-
>> way-up style of coding to tell whether it's a good idea or not.
> Hm. I would say I have a lot of real world experience with this style:
> App Engine's NDB. It's in use by 1000s of people. I've written a lot
> of complicated database code in it (integrating several layers of
> caching). This style really does hold up well.
> Now, I think that if it could yield from, NDB would be even better,
> but for most code it would be a very minimal change, and the issue
> here (the requirement to mark suspension points) is the same.
> In C# they also have a lot of experience with this style -- functions
> declared as async must be waited for using await, and the type checker
> enforces that it's await all the way up (I think a function using
> await must be declared as async, too).
>> My gut feeling is that the explicit markers will help at the
>> lowest levels, where you're trying to protect a critical section,
>> but at the upper levels they will just be noise that causes
>> unnecessary worry.
> Actually, an earlier experience (like 25 years earlier :-) suggests
> that it's at the higher levels where you get in trouble without the
> markers -- because you still have critical sections in end user code,
> but it's impossible to remember which functions you call may cause a
> task switch.
>> In one of Guido's earlier posts (which I can't find now,
>> unfortunately), he said something that made it sound like
>> he was coming around to that point of view too, but he
>> seems to have gone back on that recently.
> I was probably more waxing philosophically on the reasons why people
> like greenlets/gevent (if they like it). I feel I am pretty
> consistently in favor of marking switch points, at least in the
> context we are currently discussing (where high-speed async event
> handling is the thing to do).
> For less-performant situations I'm fine with writing classic
> synchronous-looking code, and running it in multiple OS threads for
> concurrency reasons. But the purpose of designing a new async API is
> to break the barrier of one thread per connection.

It is of course a bit confusing to find out who thought what and when ;-)

And yes, I see your point, but I have difficulty seeing how it is best
done. If I take Stackless as an example, well, everything there would
potentially be marked as some codef, because it is simply everywhere.

But just for the fact that something _supports_ suspension or switching
is IMHO not enough reason to clutter the code everywhere.
What I think is needed is a way to distinguish critical code paths.
I am not sure what this should look like.

Maybe it's my limited understanding.
The generator-based functions do not get switched on their own.
If they want to switch, they call some special function, and I would
mark them for doing that.
But all the way up the tree? I can't see much benefit in that.
Maybe it would be less verbose to have decorators that assert something
does _not_ switch, like guards? Or maybe add properties?

I agree that one needs certain information about the program that
can easily be extracted. Perhaps this could be done with an analysis
tool. This tool would only need additional hints if things are very dynamic,
like variables holding certain constructs which are only known at runtime.

Christian Tismer             :^)   <mailto:tismer at>
Software Consulting          :     Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121     :    *Starship*
14482 Potsdam                :     PGP key ->
phone +49 173 24 18 776  fax +49 (30) 700143-0023
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
       whom do you want to sponsor today?

From guido at  Sat Oct 20 20:12:37 2012
From: guido at (Guido van Rossum)
Date: Sat, 20 Oct 2012 11:12:37 -0700
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<> <>
Message-ID: <>

On Fri, Oct 19, 2012 at 7:41 PM, Steve Dower <Steve.Dower at> wrote:
> I'm not entirely sure whether I'm hijacking the thread here... I have to admit I've somewhat lost track with all the renames. The discussion has been very interesting (I really like the 'codef' idea, and decorators can provide this without requiring syntax changes) regardless of which thread is active.
> I have spent a bit of time writing up the approach that we (Dino, who posted it here originally, myself and with some advice from colleagues who are working on a similar API for C++) came up with and implemented.
> I must apologise for the length - I got a little carried away with background information, but I do believe that it is important for us to understand exactly what problem we're trying to solve so that we aren't distracted by "new toys".
> The write-up is here:
> I included code, since there have been a few people asking for prototype implementations, so if you want to skip ahead to the code (which is quite heavily annotated) it is at or (I based my example on Greg's socket spam, so thanks for that!)
> And no, I'm not collecting any ad revenue from the page, so feel free to visit as often as you like and use up my bandwidth.
> Let the tearing apart of my approach commence! :)

Couple of questions and comments.

- You mention a query interface a few times but there are no details
in your example code; can you elaborate? (Or was that a typo for
queue?)

- This is almost completely isomorphic with NDB's tasklets, except
that you borrow the Future class implementation from
concurrent.futures -- I think that's the wrong building block to start
with, because it is linked too closely to threads.

- There is a big speed difference between yield from <generator> and
yield <future>. With yield <future>, the scheduler has to do
significant work for each yield at an intermediate level, whereas with
yield from, the scheduler is only involved when actual blocking needs
to be performed. In my experience, real code has lots of intermediate
levels. Therefore I would like to use yield from. You can already do
most things with yield from that you can do with Futures; there are a
few operations that need a helper (in particular spawning truly
concurrent tasks), but the helper code can be much simpler than the
Future object, and isn't needed as often, so it's still a bare win.

- Nit: I don't like calling the event loop context; there are too many
things called context (e.g. context managers in Python), so I prefer
to call it what it is -- event loop or I/O loop.

- Nittier nit: you have a few stray colons, e.g. "it = iter(fn(*args,
**kwargs)):" .

--Guido van Rossum (

From guido at  Sat Oct 20 20:29:09 2012
From: guido at (Guido van Rossum)
Date: Sat, 20 Oct 2012 11:29:09 -0700
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<> <>
Message-ID: <>

Maybe it would help if I was more straightforward.

I do not want to solve this problem by introducing yet another
language mechanism. This rules out codef, greenlets, and Stackless.

I want to solve it using what we have in Python 3.3. And I want to
solve it for all platforms where Python 3.3 runs and for all Python
3.3 implementations (assuming Jython, IronPython and PyPy will
eventually get there).

Basically this leaves as options OS threads, callbacks (+ Deferred),
or yield [from].

Using OS threads the problem is solved without writing any code, but
does not scale, so it does not really solve the problem.

To scale we need either callbacks or yield [from], or both.

I accept that some people prefer to use callbacks and Deferred. I want
to respect this choice and I want to see integration with this style
at the event loop level.

But for myself, I know that I want to write *most* code without
callbacks (or Deferreds), and I am quite comfortable to use yield or
yield from instead. (I have written a lot of code using yield
<future>, and I am now practicing yield from <generator> -- the
transition is quite easy and I like what I see.)

If you are not happy with what we can do in (portable) Python 3.3, we
are not talking about solving the same problem.

If you are happy using OS threads, we are also not talking about
solving the same problem. (To be sure, there is a place for them in my
solution -- but it is only needed for things we cannot run
asynchronously, such as socket.getaddrinfo().)

If you are not happy using callbacks/Deferred nor using yield[from],
you're welcome to use greenlets or Stackless. But they will not be in
the standard library.

--Guido van Rossum (

From jstpierre at  Sat Oct 20 21:25:44 2012
From: jstpierre at (Jasper St. Pierre)
Date: Sat, 20 Oct 2012 15:25:44 -0400
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

I'm curious now... you keep mentioning Futures and Deferreds like
they're two separate entities. What distinction between the two do you
see?

On Sat, Oct 20, 2012 at 2:29 PM, Guido van Rossum <guido at> wrote:
> Maybe it would help if I was more straightforward.
> I do not want to solve this problem by introducing yet another
> language mechanism. This rules out codef, greenlets, and Stackless.
> I want to solve it using what we have in Python 3.3. And I want to
> solve it for all platforms where Python 3.3 runs and for all Python
> 3.3 implementations (assuming Jython, IronPython and PyPy will
> eventually get there).
> Basically this leaves as options OS threads, callbacks (+ Deferred),
> or yield [from].
> Using OS threads the problem is solved without writing any code, but
> does not scale, so it does not really solve the problem.
> To scale we need either callbacks or yield [from], or both.
> I accept that some people prefer to use callbacks and Deferred. I want
> to respect this choice and I want to see integration with this style
> at the event loop level.
> But for myself, I know that I want to write *most* code without
> callbacks (or Deferreds), and I am quite comfortable to use yield or
> yield from instead. (I have written a lot of code using yield
> <future>, and I am now practicing yield from <generator> -- the
> transition is quite easy and I like what I see.)
> If you are not happy with what we can do in (portable) Python 3.3, we
> are not talking about solving the same problem.
> If you are happy using OS threads, we are also not talking about
> solving the same problem. (To be sure, there is a place for them in my
> solution -- but it is only needed for things we cannot run
> asynchronously, such as socket.getaddrinfo().)
> If you are not happy using callbacks/Deferred nor using yield[from],
> you're welcome to use greenlets or Stackless. But they will not be in
> the standard library.
> --
> --Guido van Rossum (


From Steve.Dower at  Sat Oct 20 21:31:12 2012
From: Steve.Dower at (Steve Dower)
Date: Sat, 20 Oct 2012 19:31:12 +0000
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<> <>
Message-ID: <>

> - Nit: I don't like calling the event loop context; there are too many
> things called context (e.g. context managers in Python), so I prefer
> to call it what it is -- event loop or I/O loop.

The naming collision with context managers has been brought up before, so I'm okay with changing that. We used context mainly because it's close to the terminology used in .NET, where you schedule tasks/continuations in a particular SynchronizationContext. I believe "I/O loop" would be inaccurate, but "event loop" is probably appropriate.

> - You mention a query interface a few times but there are no details
> in your example code; can you elaborate? (Or was that a typo for
> queue?)

I think I just changed terminology while writing - this is the 'get_future_for' call, which is not guaranteed to provide a waitable/pollable object for any type. The intent is to allow an event loop to optionally provide support for (say) select(), but not to force that upon all implementations. If (when) someone implements a Windows GetMessage() based loop then requiring 'native' select() support is unfair. (Also, an implementation for Windows 8 would not directly involve an event loop, but would pass everything through to the underlying OS.)

> - This is almost completely isomorphic with NDB's tasklets, except
> that you borrow the Future class implementation from
> concurrent.futures -- I think that's the wrong building block to start
> with, because it is linked too closely to threads.

As far as I can see, the only link that futures have with threads is that the ThreadPoolExecutor class is in the same module. `Future` itself is merely an object that can be polled, waited on, or assigned a callback, which means it represents all asynchronous operations. Some uses are direct (e.g., polling a future that represents pollable I/O) while others require emulation (adding a callback for pollable I/O), which is partly why the 'get_future_for' function exists - to allow the event loop to use the object directly if it can.

> - There is a big speed difference between yield from <generator> and
> yield <future>. With yield <future>, the scheduler has to do
> significant work for each yield at an intermediate level, whereas with
> yield from, the scheduler is only involved when actual blocking needs
> to be performed. In my experience, real code has lots of intermediate
> levels. Therefore I would like to use yield from. You can already do
> most things with yield from that you can do with Futures; there are a
> few operations that need a helper (in particular spawning truly
> concurrent tasks), but the helper code can be much simpler than the
> Future object, and isn't needed as often, so it's still a bare win.

I don't believe the scheduler is involved that frequently, but it is true that more Futures than are strictly necessary are created. The first step (up to a yield) of any @async method is always run immediately - if there is no yield, then the returned future is already completed and has the result. The event loop as implemented could be optimised slightly for this case, but since Future calls new callbacks immediately if it has already completed then we never 'unschedule' the task.

yield from can of course be used for the intermediate levels in exactly the same way as it is used for refactoring generators. The difference is that the top level is an @async decorator, at which point a Future is created. So 'read_async' might have @async applied, but it can 'yield from' any other generators that yield futures. Then the person calling 'read_async' is free to use any Future compatible interface rather than being forced into continuing the 'yield from' chain all the way to the top. (In particular, I think this works much better in the interactive scenario - I can write "x = read_async().result()", but how do you implement a 'yield from' approach in a REPL?)
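
A minimal sketch of the arrangement described above, not the code from the write-up. It assumes a hypothetical decorator called async_ (with a trailing underscore, because async is a reserved word in current Python) that drives a generator yielding concurrent.futures.Future objects, runs the first step immediately, and hands the caller a Future. The intermediate level, read_chunk, is a plain generator reached through yield from, so no extra Future is created for it. All names here are illustrative.

```python
from concurrent.futures import Future
import functools


def async_(fn):
    """Hypothetical decorator: drive a generator function that yields Futures."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = Future()
        gen = fn(*args, **kwargs)   # fn is assumed to be a generator function

        def step(send_value=None, exc=None):
            try:
                if exc is not None:
                    awaited = gen.throw(exc)
                else:
                    awaited = gen.send(send_value)
            except StopIteration as stop:
                result.set_result(stop.value)
                return
            except Exception as e:
                result.set_exception(e)
                return
            # Resume when the yielded Future completes; if it is already
            # done, add_done_callback calls resume immediately.
            awaited.add_done_callback(resume)

        def resume(fut):
            e = fut.exception()
            if e is not None:
                step(exc=e)
            else:
                step(fut.result())

        step()  # the first step (up to the first yield) runs immediately
        return result
    return wrapper


def read_chunk(source):
    # Intermediate level: a plain generator, no Future is created for it.
    data = yield source
    return data.upper()


@async_
def read_async(source):
    text = yield from read_chunk(source)
    return text + "!"


source = Future()
pending = read_async(source)     # returns at once with a pending Future
source.set_result("hello")       # completing the source resumes the generator
print(pending.result())          # HELLO!
```

The caller of read_async only sees a Future, so it can block on result(), poll done(), or attach a callback, without continuing the yield from chain itself.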

From tismer at  Sat Oct 20 22:06:26 2012
From: tismer at (Christian Tismer)
Date: Sat, 20 Oct 2012 22:06:26 +0200
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
	API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

Then I have another short one...

Sent from my Ei4Steve

On Oct 20, 2012, at 14:16, Nick Coghlan <ncoghlan at> wrote:

> On Sat, Oct 20, 2012 at 9:52 PM, Christian Tismer <tismer at> wrote:
>> What has happened to the PEP, was it rejected?
> No, it's still open. We just wanted to give the yield from PEP a
> chance to see some use on its own before we started trying to take it
> any further, and Greg was amenable to that approach.

I often see the phrase "coroutine" but
without explicit mention if symmetric
(greenlet) or asymmetric (Lua). 

Then when I hear myself quibbling
about "full generators" then I mean
generators that are made up of a few
functions that all can yield to the
caller of this generator. 

Question: is "full generators" equivalent
to "asymmetric coroutine"?  

Question 2: when people talk about
coroutines, is "asymmetric" the default
here?

The reason I ask is my fear of
creating problems when replying to
messages with different default meanings
in mind.

Thanks - chris

From tismer at  Sat Oct 20 22:55:15 2012
From: tismer at (Christian Tismer)
Date: Sat, 20 Oct 2012 22:55:15 +0200
Subject: [Python-ideas] Language "macros" in discussions
Message-ID: <>


Clarification:

I have a tendency to mention constructs
from other threads in a discussion.
This might suggest that I'm proposing
to use this not-yet-included or not even
accepted feature as a solution. For instance,
Guido's reaction to my last message
might indicate such a misinterpretation,
although I'm not sure if I was
primarily addressed at all (even though the To/Cc
suggested it).

Anyway, I just want to make sure:

If I'm mentioning stackless or codef
or greenlet, this does not imply that I
propose to code the solution to async
by implementing such a thing, first. 
The opposite is true. 

I mean such mentions more like
a macro-like feature:
I'm implementing structures using the existing
things, but adhere to a coding style
that stays compatible with one of the mentioned
principles.

This is like a macro feature of my brain
- I talk about codef, but code it using
yield-from.

So please don't misunderstand me as
wanting to push for features to be
included. This is only virtual. I use yield
constructs, but obey the codef protocol,
for instance.

And as an addition: when I'm talking
of generators implemented by yield from,
then this is just a generator that can
yield from any of its sub-functions. 

I am not talking about tasks or schedulers.
These constructs do not belong there.
I'm strongly against using "yield from"
for this.
It is a building block for generators
or coroutines, and there it stops!

Higher level stuff should by no means
use those primitives at all. 

Sent from my Ei4Steve

From guido at  Sun Oct 21 00:37:17 2012
From: guido at (Guido van Rossum)
Date: Sat, 20 Oct 2012 15:37:17 -0700
Subject: [Python-ideas] Language "macros" in discussions
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 20, 2012 at 1:55 PM, Christian Tismer <tismer at> wrote:
> Clarification:
> I have a tendency to mention constructs
> from other threads in a discussion.
> This might suggest that I'm proposing
> to use this not-yet-included or even
> accepted feature as a solution. For instance
> Guido's reaction to my last message
> might be an indicator of misinterpreting
> this, although I'm not sure if I was
> primarily addressed at all (despite that the to/cc
> suggested it).
> Anyway, I just want to make sure:
> If I'm mentioning stackless or codef
> or greenlet, this does not imply that I
> propose to code the solution to async
> by implementing such a thing, first.
> The opposite is true.
> I mean such mentioning more like
> a macro-like feature:
> I'm implementing structures using the existing
> things, but adhere to a coding style
> that stays compatible to one of the mentioned
> principles.
> This is like a macro feature of my brain
> - I talk about codef, but code it using
> yield-from.
> So please don't take me wrong that I
> want to push for features to be
> included. This is only virtual. I use yield
> constructs, but obey the codef protocol,
> for instance.
> And as an addition: when I'm talking
> of generators implemented by yield from,
> then this is just a generator that can
> yield from any of its sub-functions.
> I am not talking about tasks or schedulers.
> These constructs do not belong there.
> I'm strongly against using "yield from"
> for this.
> It is a building block for generators
> resp. coroutines, and there it stops !
> Higher level stuff should by no means
> use those primitives at all.

Ok, understood, and sorry if I mistook your intention before. Here's
how I tend to use some terminology:

- generator function: any function containing yield
- generator object: the iterator returned by calling a generator function
- generator: either of the above, when the context makes it clear
which one I mean, or when it doesn't matter
- iterator generator: a generator used to produce values that one
would consume with an implicit or explicit for-loop
- coroutine: a generator function used to implement an async
computation instead of an iterator
- Future: something with roughly the interface but not necessarily the
implementation of PEP 3148
- Deferred: the Twisted Deferred class or something with similar
functionality (there are some in the JavaScript world)

Note that I use coroutine for both PEP-342-style and PEP-380-style
generators (i.e. "yield <future>" vs. "yield from <generator>").
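
To make the terminology above concrete, here is a small sketch. The function names are made up for illustration; only inspect and the generator protocol itself come from the standard library.

```python
import inspect


def countdown(n):
    # A generator function: any function containing yield.
    while n > 0:
        yield n
        n -= 1


gen = countdown(3)  # a generator object: the iterator returned by the call

print(inspect.isgeneratorfunction(countdown))  # True
print(inspect.isgenerator(gen))                # True
print(list(gen))                               # [3, 2, 1]


def inner():
    reply = yield "request"      # PEP 342 style: a value is sent back in
    return reply * 2


def outer():
    result = yield from inner()  # PEP 380 style: delegate to a sub-generator
    return result + 1


co = outer()
print(next(co))                  # request
try:
    co.send(20)
except StopIteration as stop:
    final = stop.value
print(final)                     # 41
```

Here countdown is an "iterator generator" in the sense above, while inner and outer are used as coroutines: the value that matters travels in through send() and out through the return value.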

The big difference between Futures and Deferreds is that Deferreds can
easily be chained together to create multiple stages, and each callback
is called with the value returned from the previous stage; also,
Deferreds have separate callback chains for regular values and errors.
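
A rough sketch of that difference. The Future half uses the real concurrent.futures.Future; MiniDeferred is a hypothetical, heavily simplified stand-in written only for this illustration and is not Twisted's implementation or API.

```python
from concurrent.futures import Future

# Future: every callback receives the Future itself and sees the same result.
f = Future()
seen = []
f.add_done_callback(lambda fut: seen.append(fut.result()))
f.add_done_callback(lambda fut: seen.append(fut.result()))
f.set_result("41")
print(seen)                 # ['41', '41']


class MiniDeferred:
    """Hypothetical stand-in for a Deferred-like object (not Twisted's API)."""

    def __init__(self):
        self._stages = []   # list of (callback, errback) pairs

    def add_callbacks(self, callback, errback=None):
        self._stages.append((callback, errback))
        return self         # allow chaining

    def fire(self, value):
        # Each stage receives what the previous stage returned. An exception
        # switches to the error chain until some errback handles it.
        failed = isinstance(value, Exception)
        for callback, errback in self._stages:
            handler = errback if failed else callback
            if handler is None:
                continue    # this stage has no handler for the active chain
            try:
                value = handler(value)
                failed = False
            except Exception as exc:
                value, failed = exc, True
        return value


d = MiniDeferred()
d.add_callbacks(int).add_callbacks(lambda n: n + 1, lambda e: -1)
print(d.fire("41"))         # 42
print(d.fire("oops"))       # -1
```

In the second call int("oops") raises, so the next stage runs its errback instead of its callback.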

> Sent from my Ei4Steve

Does it happen to have a 40-char wide screen? :-)

--Guido van Rossum (

From guido at  Sun Oct 21 00:38:52 2012
From: guido at (Guido van Rossum)
Date: Sat, 20 Oct 2012 15:38:52 -0700
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<> <>
Message-ID: <>

On Sat, Oct 20, 2012 at 12:25 PM, Jasper St. Pierre
<jstpierre at> wrote:
> I'm curious now... you keep mentioning Futures and Deferreds like
> they're two separate entities. What distinction between the two do you
> see?

They have different interfaces and you end up using them differently.
In particular, quoting myself from another thread, here is how I use
the terms:

- Future: something with roughly the interface but not necessarily the
implementation of PEP 3148.

- Deferred: the Twisted Deferred class or something with very similar
functionality (there are some in the JavaScript world).

The big difference between Futures and Deferreds is that Deferreds can
easily be chained together to create multiple stages, and each callback
is called with the value returned from the previous stage; also,
Deferreds have separate callback chains for regular values and errors.

--Guido van Rossum (

From guido at  Sun Oct 21 00:39:59 2012
From: guido at (Guido van Rossum)
Date: Sat, 20 Oct 2012 15:39:59 -0700
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<> <>
Message-ID: <>

On Sat, Oct 20, 2012 at 1:06 PM, Christian Tismer <tismer at> wrote:
> I often see the phrase "coroutine" but
> without explicit mention if symmetric
> (greenlet) or asymmetric (Lua).
> Then when I hear myself quibbling
> about "full generators" then I mean
> generators that are made up of a few
> functions that all can yield to the
> caller of this generator.
> Question: is "full generators" equivalent
> to "asymmetric coroutine"?
> Question 2: when people talk about
> coroutines, is "asymmetric" the default
> here?
> The reason that I ask is my fear to
> create problems when answering to
> messages with different default meanings
> in mind.

I believe I just answered this in another thread (that you also started).

--Guido van Rossum (

From guido at  Sun Oct 21 01:11:15 2012
From: guido at (Guido van Rossum)
Date: Sat, 20 Oct 2012 16:11:15 -0700
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<> <>
Message-ID: <>

On Sat, Oct 20, 2012 at 12:31 PM, Steve Dower <Steve.Dower at> wrote:
>> - Nit: I don't like calling the event loop context; there are too many
>> things called context (e.g. context managers in Python), so I prefer
>> to call it what it is -- event loop or I/O loop.
> The naming collision with context managers has been brought up before, so I'm okay with changing that. We used context mainly because it's close to the terminology used in .NET, where you schedule tasks/continuations in a particular SynchronizationContext. I believe "I/O loop" would be inaccurate, but "event loop" is probably appropriate.

I'm happy to settle on event loop. (Terminology in this area seems
fraught with conflicting conventions; Twisted calls it a reactor,
after the reactor pattern, but I've been chided by others for using
this term without explanation; Tornado calls it I/O loop.)

>> - You mention a query interface a few times but there are no details
>> in your example code; can you elaborate? (Or was that a typo for
>> queue?)
> I think I just changed terminology while writing - this is the 'get_future_for' call, which is not guaranteed to provide a waitable/pollable object for any type.

Then what is the use? What *is* its contract?

> The intent is to allow an event loop to optionally provide support for (say) select(), but not to force that upon all implementations. If (when) someone implements a Windows GetMessage() based loop then requiring 'native' select() support is unfair. (Also, an implementation for Windows 8 would not directly involve an event loop, but would pass everything through to the underlying OS.)

I'm all for declaring select() an implementation detail. It doesn't
scale on any platform; on Windows it only works for sockets; the
properly scaling alternative varies per platform. (It is IOCP on
Windows, right?)

This *probably* also means that the concept of file descriptor is out
the window (even though Tornado apparently cannot do anything without
it -- it's probably not used on Windows at all). And I suspect that it
means that the implementation of the socket abstraction will vary per
platform. The set of other available implementations of the same
abstraction, and even the set of other available abstractions, will
also vary per platform -- on Unix, there are pseudo ttys, pipes, named
pipes, and unix domain sockets; I don't recall the set available on
Windows, but I'm sure it is different. Then there is SSL/TLS, which
feels like it requires special handling but in the end implements an
abstraction similar to sockets.

I assume that in many cases it is easy to bridge from the various
platform-specific abstractions and implementations to more
cross-platform abstractions; this is where the notions of transports
and protocols seem most important. I haven't explored those enough,

One note inspired by my mention of SSL, but also by discussions about
GUI event loops in other threads: it is easy to think that everything
is reducible to a file descriptor, but often it is not that easy. E.g.
with something like SSL, you can't just select on the underlying
socket, and then when it's ready call the read() method of the SSL
layer -- it's possible that the read() will still block because the
socket didn't have enough bytes to be able to decrypt the next block
of data. Similar for sockets associated with e.g. GUI event management
(e.g. X).

>> - This is almost completely isomorphic with NDB's tasklets, except
>> that you borrow the Future class implementation from
>> concurrent.futures -- I think that's the wrong building block to start
>> with, because it is linked too closely to threads.
> As far as I can see, the only link that futures have with threads is that the ThreadPoolExecutor class is in the same module. `Future` itself is merely an object that can be polled, waited on, or assigned a callback, which means it represents all asynchronous operations. Some uses are direct (e.g., polling a future that represents pollable I/O) while others require emulation (adding a callback for pollable I/O), which is partly why the 'get_future_for' function exists - to allow the event loop to use the object directly if it can.

I wish it was true. But the Future class contains a condition
variable, and the Waiter class used by the implementation uses an
event. Both are directly imported from the threading module, and if
you block on either of these, it is a hard block (not even
interruptable by a signal).

Don't worry too much about this -- it's just the particular
implementation (concurrent.futures.Future). We can define a better
Future class for our purposes elsewhere, with the same interface (or a
subset -- I don't care much for the whole cancellation feature) but
without references to threading. For those Futures, we'll have to
decide what should happen if you call result() when the Future isn't
done yet -- raise an error (similar to EWOULDBLOCK), or somehow block,
possibly running a recursive event loop? (That's what NDB does, but
not everybody likes it.)

I think the normal approach would be to ask the scheduler to suspend
the current task until the Future is ready -- it can easily arrange
for that by adding a callback. In NDB this is spelled "yield
<future>". In the yield-from <generator> world we could spell it that
way too (i.e. yield, not yield from), or we could make it so that we
can write yield from <future>, or perhaps we need a helper call: yield
from wait(<future>) or maybe a method on the Future class (since it is
our own), yield from <future>.wait(). These are API design details.
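
As a sketch of the "suspend until the Future is ready" approach, assuming the plain "yield <future>" spelling: SimpleFuture and Scheduler below are hypothetical names, the Future has no reference to threading, and result() on an unfinished Future raises instead of blocking (one of the options mentioned above).

```python
from collections import deque


class SimpleFuture:
    """Hypothetical threading-free Future with a subset of the PEP 3148 interface."""

    _PENDING = object()

    def __init__(self):
        self._result = self._PENDING
        self._callbacks = []

    def done(self):
        return self._result is not self._PENDING

    def result(self):
        if not self.done():
            # Raise rather than block, similar in spirit to EWOULDBLOCK.
            raise RuntimeError("result is not ready")
        return self._result

    def add_done_callback(self, fn):
        if self.done():
            fn(self)
        else:
            self._callbacks.append(fn)

    def set_result(self, value):
        self._result = value
        for fn in self._callbacks:
            fn(self)


class Scheduler:
    def __init__(self):
        self.ready = deque()    # (generator, value to send) pairs

    def spawn(self, gen):
        self.ready.append((gen, None))

    def run(self):
        while self.ready:
            gen, value = self.ready.popleft()
            try:
                future = gen.send(value)
            except StopIteration:
                continue
            # Suspend the task: only the callback puts it back on the queue.
            future.add_done_callback(
                lambda fut, gen=gen: self.ready.append((gen, fut.result())))


log = []
fut = SimpleFuture()


def consumer():
    value = yield fut           # suspended until fut has a result
    log.append(("got", value))


def producer():
    fut.set_result(42)
    log.append("produced")
    yield from ()               # makes this a generator without suspending


sched = Scheduler()
sched.spawn(consumer())
sched.spawn(producer())
sched.run()
print(log)                      # ['produced', ('got', 42)]
```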

(I also have a need to block for the Futures returned by
ThreadPoolExecutor and ProcessPoolExecutor -- those are handy when you
really can't run something inline in the event loop -- the simplest
example being getaddrinfo(), which may block for DNS.)
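
For illustration, this is roughly what handing getaddrinfo() to a thread pool looks like with the existing concurrent.futures API. The numeric address is chosen only so that the sketch does not depend on DNS; a real event loop would attach a callback to the Future instead of calling result() right away.

```python
import socket
from concurrent.futures import ThreadPoolExecutor

# getaddrinfo() may block on DNS, so it is handed to a worker thread and
# the caller gets a Future back instead of waiting inline.
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(socket.getaddrinfo, "127.0.0.1", 80,
                         type=socket.SOCK_STREAM)
    addresses = future.result()   # blocking here only for the demonstration

family, socktype, proto, canonname, sockaddr = addresses[0]
print(sockaddr)
```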

>> - There is a big speed difference between yield from <generator> and
>> yield <future>. With yield <future>, the scheduler has to do
>> significant work for each yield at an intermediate level, whereas with
>> yield from, the scheduler is only involved when actual blocking needs
>> to be performed. In my experience, real code has lots of intermediate
>> levels. Therefore I would like to use yield from. You can already do
>> most things with yield from that you can do with Futures; there are a
>> few operations that need a helper (in particular spawning truly
>> concurrent tasks), but the helper code can be much simpler than the
>> Future object, and isn't needed as often, so it's still a bare win.
> I don't believe the scheduler is involved that frequently, but it is true that more Futures than are strictly necessary are created.

IIUC every yield must pass a Future, and every time that happens the
scheduler gets it and must arrange for a callback on that Future which
resumes the generator. I have code like that in NDB and you have very
similar code like that in your version (wrapper in @async, and later

> The first step (up to a yield) of any @async method is always run immediately - if there is no yield, then the returned future is already completed and has the result. The event loop as implemented could be optimised slightly for this case, but since Future calls new callbacks immediately if it has already completed then we never 'unschedule' the task.

Interesting that you always run the first step immediately. I don't do
this in NDB. Can you explain why you think you need it? (It may simply
be an optimization I've overlooked. :-)

> yield from can of course be used for the intermediate levels in exactly the same way as it is used for refactoring generators. The difference is that the top level is an @async decorator, at which point a Future is created. So 'read_async' might have @async applied, but it can 'yield from' any other generators that yield futures. Then the person calling 'read_async' is free to use any Future compatible interface rather than being forced into continuing the 'yield from' chain all the way to the top. (In particular, I think this works much better in the interactive scenario - I can write "x = read_async().result()", but how do you implement a 'yield from' approach in a REPL?)

Yeah, this is what I do in NDB, as I mentioned above (the recursive
event loop call).

But I suspect it would be very easy to write a helper function that
you give a generator and which runs it to completion. It would also
have to invoke the event loop, but that seems unavoidable, and
otherwise the event loop isn't running in interactive mode, right?
(Unless it runs in a separate thread, in which case the helper
function should just communicate with that thread.)
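
A sketch of such a helper, under heavy assumptions: ToyLoop is a hypothetical event loop that only knows about timers, and a bare yield from a coroutine means "nothing to do until the loop has turned once". None of these names refer to an existing library.

```python
import heapq
import itertools
import time


class ToyLoop:
    """Hypothetical event loop that only knows about timed callbacks."""

    def __init__(self):
        self._timers = []
        self._order = itertools.count()   # tie-breaker for equal deadlines

    def call_later(self, delay, callback):
        deadline = time.monotonic() + delay
        heapq.heappush(self._timers, (deadline, next(self._order), callback))

    def run_once(self):
        # Wait for the earliest timer and fire it.
        deadline, _, callback = heapq.heappop(self._timers)
        time.sleep(max(0.0, deadline - time.monotonic()))
        callback()


def sleep(loop, delay):
    # Coroutine that suspends until the loop fires its timer.
    fired = []
    loop.call_later(delay, lambda: fired.append(True))
    while not fired:
        yield


def run_until_complete(loop, gen):
    """Drive gen to completion, turning the loop whenever gen suspends."""
    try:
        while True:
            next(gen)
            loop.run_once()
    except StopIteration as stop:
        return stop.value


def fetch(loop):
    yield from sleep(loop, 0.01)
    return "done"


loop = ToyLoop()
outcome = run_until_complete(loop, fetch(loop))
print(outcome)   # done
```

At an interactive prompt one would then write something like x = run_until_complete(loop, fetch(loop)) in place of x = read_async().result().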

Final remark: I keep wondering if it's better to try and stay "pure"
in the public API and use only yield from, plus some helpers like
spawn(), join() and par(), or if a decent, pragmatic public API can
offer a combination. I worry that most users will have a hard time
remembering when to use yield and when yield from.

--Guido van Rossum (

From greg.ewing at  Sun Oct 21 01:21:05 2012
From: greg.ewing at (Greg Ewing)
Date: Sun, 21 Oct 2012 12:21:05 +1300
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

Yuval Greenfield wrote:
> block() removes the current task from the ready_list, but is the current 
> task guaranteed to be my task?

Yes, block() always operates on the currently running task.

> If so, then I'd never run again after the 
> yield in acquire(), that is unless a gracious other player unblocks me.

Yes, the unblocking needs to be done by another task, or by
something outside the task system such as an I/O callback.
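
A rough sketch of that behaviour, loosely modelled on the scheduler being
discussed here. Only the names block(), schedule() and ready_list come
from this thread; unblock(), run() and the two demo tasks are assumptions
made for illustration, not the actual implementation.

```python
from collections import deque

ready_list = deque()   # tasks that are able to run
current = None         # the task being run right now


def schedule(task):
    ready_list.append(task)


def block(queue):
    """Move the *current* task from the ready list to a wait queue."""
    queue.append(current)
    ready_list.remove(current)


def unblock(queue):
    """Make the first task waiting on the queue runnable again."""
    if queue:
        schedule(queue.popleft())


def run():
    """Round-robin over the ready list until no task is runnable."""
    global current
    while ready_list:
        current = ready_list[0]
        try:
            next(current)
        except StopIteration:
            if current in ready_list:
                ready_list.remove(current)
        else:
            # If the task is still at the head, its timeslice has expired.
            if ready_list and ready_list[0] is current:
                ready_list.rotate(-1)


def waiter(queue, log):
    log.append("waiter: blocking")
    block(queue)
    yield  # not resumed until some other task unblocks us
    log.append("waiter: resumed")


def waker(queue, log):
    log.append("waker: unblocking")
    unblock(queue)
    yield


def demo():
    queue, log = deque(), []
    schedule(waiter(queue, log))
    schedule(waker(queue, log))
    run()
    return log
```

If waker() were never scheduled, run() would simply return with the
waiter still parked on the queue, which is the silent deadlock being
asked about.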

> I'm not sure it makes sense for scheduler functions to store waiting 
> tasks in a queue owned by the app and invisible from the scheduler. This 
> can cause *invisible deadlocks* such as:
> schedule(philosopher("Socrates", 8, 3, 1, forks[0], forks[2]), "Socrates")
> schedule(philosopher("Euclid", 5, 1, 4, forks[2], forks[0]), "Euclid")

Deadlocks are a potential problem in any system involving
concurrency, and have to be dealt with on a case-by-case basis.

Simply having the scheduler know where all the tasks are
will not prevent deadlocks. It might make it possible for the
scheduler to *detect* deadlocks, but you still have to do
something about them.

Having said that, I'm thinking about writing a more elaborate
version of my scheduler that does keep track of which queue a
task is waiting on, mainly so that tasks can be cancelled
cleanly.

> Is there a coroutine strategy for tackling these challenges? Or will I 
> just get better at overcoming them with practice?

If you've been using threads all your life as you say, then
you're probably already pretty good at dealing with them. All
of the same techniques apply.


From greg.ewing at  Sun Oct 21 01:26:04 2012
From: greg.ewing at (Greg Ewing)
Date: Sun, 21 Oct 2012 12:26:04 +1300
Subject: [Python-ideas] The async API of the future
In-Reply-To: <>
References: <>
Message-ID: <>

Glyph wrote:

> The main interfaces you need are here:
> <>
> <>
> <>
> <>

These don't look anywhere near adequate to me. How do I make
a sendmsg() call on a unix-domain socket and pass access rights
over it? How do I do a readdir() on a file descriptor representing
a directory? Etc.


From greg.ewing at  Sun Oct 21 01:41:41 2012
From: greg.ewing at (Greg Ewing)
Date: Sun, 21 Oct 2012 12:41:41 +1300
Subject: [Python-ideas] The async API of the future
In-Reply-To: <k5u015$l7q$>
References: <>
	<> <k5u015$l7q$>
Message-ID: <>

Richard Oudkerk wrote:
> I don't see why a completion api needs to create wrappers for sockets.  See
> ...
> The AsyncIO class is independent of reactors, futures etc.  The methods 
> for starting an operation are
>     recv(key, sock, nbytes, flags=0)
>     send(key, sock, buf, flags=0)
>     accept(key, sock)
>     connect(key, sock, address)

That looks awfully like a wrapper for a socket to me. All of those
system calls are peculiar to sockets.

There doesn't necessarily have to be a wrapper class for each kind
of file descriptor. There could be one I/O class that handles everything,
or there could just be a collection of functions.

The point is that, with a completion-based model, you need a function
or method for every possible system call that you might want to perform
asynchronously.


From greg.ewing at  Sun Oct 21 01:52:40 2012
From: greg.ewing at (Greg Ewing)
Date: Sun, 21 Oct 2012 12:52:40 +1300
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<> <>
Message-ID: <>

Christian Tismer wrote:
> Would you raise an
> exception if something is called that is not a cofunction? Or
> would that be an ordinary call?

A cofunction calling a non-cofunction is fine, it just makes
an ordinary call.

But if a non-cofunction tries to call a cofunction using an
ordinary call, an exception is raised. Effectively, cofunctions
do *not* implement __call__ (instead they implement a new
protocol __cocall__).

> The only difference is that I'm not aiming at coroutines in
> the first place, but just having the concept of a *suspendable*
> function.

I'm not sure what the distinction is.

> What has happened to the PEP, was it rejected?

No, its status is still listed as "draft". It's probably too
soon to consider whether it should be accepted or rejected;
we need more experience with yield-from based task systems


From guido at  Sun Oct 21 01:53:06 2012
From: guido at (Guido van Rossum)
Date: Sat, 20 Oct 2012 16:53:06 -0700
Subject: [Python-ideas] The async API of the future
In-Reply-To: <>
References: <>
	<> <k5u015$l7q$>
Message-ID: <>

On Sat, Oct 20, 2012 at 4:41 PM, Greg Ewing <greg.ewing at> wrote:
> The point is that, with a completion-based model, you need a function
> or method for every possible system call that you might want to perform
> asynchronously.

TBH, I like APIs that wrap all system calls. System calls have too
many low-level details that you have to be aware of, and they too
often vary per platform. (I just wrote a simple event loop plus
scheduler along the lines of your essay, extending it to the point
where I could do basic, fully-async, HTTP exchanges. The number of
details I had to take care of was excruciating; and then there were
the subtle differences between OSX and Ubuntu.)

--Guido van Rossum (

From guido at  Sun Oct 21 02:02:53 2012
From: guido at (Guido van Rossum)
Date: Sat, 20 Oct 2012 17:02:53 -0700
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Sat, Oct 20, 2012 at 4:21 PM, Greg Ewing <greg.ewing at> wrote:
> Simply having the scheduler know where all the tasks are
> will not prevent deadlocks. It might make it possible for the
> scheduler to *detect* deadlocks, but you still have to do
> something about them.
> Having said that, I'm thinking about writing a more elaborate
> version of my scheduler that does keep track of which queue a
> task is waiting on, mainly so that tasks can be cancelled
> cleanly.

In NDB, I have a feature that detects most deadlocks -- the Future
class keeps track of all incomplete instances, and it can dump this
list at request. Futures also keep some information about where and
for what purpose they were created. Finally, to tie it all together,
there's code that detects that you're waiting for something to happen
but the event loop is out of things to do (i.e. no pending RPCs, no
"call later" callbacks left -- hence, no progress can possibly be
made).

This feature has caught mostly bugs in NDB itself -- because NDB is
primarily a database API, regular NDB users don't normally write code
that is likely to deadlock. But in the wider Python 3 world, where
regular users would be writing (presumably buggy) protocol
implementations and their own servers and clients, I suspect debugging
features can make or break a system like this.
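
A toy illustration of that kind of check. This is not NDB's code;
TrackedFuture, DeadlockError and wait_for() are hypothetical names, and
the "event loop" is just a list of callbacks.

```python
import weakref


class TrackedFuture:
    """Toy future that stays registered until it completes (hypothetical)."""

    _incomplete = weakref.WeakSet()

    def __init__(self, purpose):
        self.purpose = purpose   # why this future was created
        self.done = False
        self.result = None
        TrackedFuture._incomplete.add(self)

    def set_result(self, value):
        self.result = value
        self.done = True
        TrackedFuture._incomplete.discard(self)

    @classmethod
    def dump_incomplete(cls):
        """List the purposes of all futures that have not completed yet."""
        return sorted(f.purpose for f in cls._incomplete)


class DeadlockError(RuntimeError):
    pass


def wait_for(future, pending_callbacks):
    """Run queued callbacks until `future` completes.

    If the queue runs dry first, nothing can ever complete the future,
    so report which futures are still outstanding.
    """
    while not future.done:
        if not pending_callbacks:
            raise DeadlockError(
                "no progress possible; incomplete: %s"
                % TrackedFuture.dump_incomplete())
        pending_callbacks.pop(0)()
    return future.result


def demo():
    ok = TrackedFuture("fetch user")
    stuck = TrackedFuture("fetch orders")
    callbacks = [lambda: ok.set_result(42)]
    value = wait_for(ok, callbacks)
    try:
        wait_for(stuck, callbacks)
    except DeadlockError as exc:
        return value, str(exc)
```

The second wait_for() call fails right away instead of hanging, and the
message names the future that nobody is going to complete.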

--Guido van Rossum (

From greg.ewing at  Sun Oct 21 02:09:04 2012
From: greg.ewing at (Greg Ewing)
Date: Sun, 21 Oct 2012 13:09:04 +1300
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<> <>
	<> <>
Message-ID: <>

Christian Tismer wrote:

> A rough idea would be to start the whole interpreter in suspendable
> mode. Maybe that's too much. I'm seeking a way to tell a whole bunch
> of functions that they should be suspendable.

I'm not sure it's really feasible to do that. It seems easy
enough at first sight, but keep in mind that it would only work
for pure Python code called directly from other pure Python
code. There are many corners where it wouldn't work -- all the
operator methods, for example, and anything else called through
a type slot -- unless you went to a *lot* of work to provide
alternative suspendable versions of all the type slots.


From yaroslav at  Sun Oct 21 02:09:43 2012
From: yaroslav at (Yaroslav Fedevych)
Date: Sun, 21 Oct 2012 03:09:43 +0300
Subject: [Python-ideas] asyncore and stuff
Message-ID: <>

So Guido said it's better to discuss things here.

Mostly reiterating what I said in the G+ thread. I'm by no means a
greybeard in library/language design, and have no successful async
project behind me, so please take what I'm saying with a grain of
salt. I just want to highlight a point I feel is very important.

There should be standard library, but no standard framework. Please.

You see, there are a number of event-driven frameworks, and they all
suck. Sorry for being blunt, but each one of them is more or less
voluntarily described as being almost the ultimate silver bullet, or
even a silver grenade, the One Framework to rule them all. The truth
is that every framework that prospers today was started as a scratch
to a specific itch, and it might be a perfect scratch for that class
of itches. I know of no application framework designed as being the
ultimate scratch for every itch that is not dead and forgotten, or
described on a resource other than thedailywtf.

There is a reason for this state of things, mainly that the real world
is a rather complex pile of crap, and there is no nice uniform model
into which you can fit all of that crap and expect the model still to
be good for any practical use. Thus in the world of software, which is
a notoriously complex subset of the crap the real world is, we are going
to live with dozens of event models, asynchronous I/O models, gobs of
event loops. Every one of them (even WaitForMultipleObjects() kind of
loop) is a priceless tool for a specific class of problem it's
designed to solve, so it won't go away, ever.

The standard library, on the other hand, IS the ultimate tool. It is
the way things should work. The people look at it as the reference,
the code written the way it should be, encompassing the best of the
best practices out there. Everyone keeps telling, just look at how
this thing is implemented. Look, it's in the stdlib, don't reinvent
the wheel. It illustrates the Right Way to use the language and the
runtime, the ultimate argument to end doubts.

In my opinion, the reason a standard library can be regarded this high
is exactly because it provides high-quality examples (or at least it
should do that), materials, bits and tools, but does not limit you in
the way those tools can be used, and does not impose its rules on you
if you want to actually roll something of your own. No framework in
the world should have this power, as it would defeat the very reason
frameworks do exist.

And that's why I think, while asyncore and other expired batteries
need to be cleaned up and upgraded (so they are of any use), I expect
that no existing frameworks would enter the stdlib as de jure
standard. I would expect instead that there would be useful primitives
each of these frameworks implements anyway, and make the standard
networking modules aware of those.

But please, no bringing $MYFAVORITEFRAMEWORK into stdlib. You will
either end up with something horrendous to support every existing
mainloop implementation out there, unwieldy and buggy, or you will make
a clean, elegant framework that won't solve anyone's problem, will be
incompatible with the rest of the world and fall into disuse. Or you
can bring some gevent, or Tornado, you name it, into stdlib, and make
the users of the remaining dozens of frameworks feel like damned

I feel the same about web things. Picking the tools to parse HTTP
requests and forming the responses is okay, as HTTP is not a simple
thing; bringing into the standard library the templating engine,
routing engine, or, God forbid, an ORM, would be totally insane.

From solipsis at  Sun Oct 21 02:07:40 2012
From: solipsis at (Antoine Pitrou)
Date: Sun, 21 Oct 2012 02:07:40 +0200
Subject: [Python-ideas] The async API of the future
References: <>
	<> <k5u015$l7q$>
Message-ID: <>

On Sun, 21 Oct 2012 12:41:41 +1300
Greg Ewing <greg.ewing at> wrote:
> Richard Oudkerk wrote:
> > I don't see why a completion api needs to create wrappers for sockets.  See
> > 
> >
> > 
> > ...
> > 
> > The AsyncIO class is independent of reactors, futures etc.  The methods 
> > for starting an operation are
> > 
> >     recv(key, sock, nbytes, flags=0)
> >     send(key, sock, buf, flags=0)
> >     accept(key, sock)
> >     connect(key, sock, address)
> That looks awfully like a wrapper for a socket to me. All of those
> system calls are peculiar to sockets.
> There doesn't necessarily have to be a wrapper class for each kind
> of file descriptor. There could be one I/O class that handles everything,
> or there could just be a collection of functions.
> The point is that, with a completion-based model, you need a function
> or method for every possible system call that you might want to perform
> asynchronously.

There aren't that many of them, though: the four Richard listed should
already be enough for most network applications, AFAIK.

I really think Richard's proposal is a sane building block.



From solipsis at  Sun Oct 21 02:09:14 2012
From: solipsis at (Antoine Pitrou)
Date: Sun, 21 Oct 2012 02:09:14 +0200
Subject: [Python-ideas] The async API of the future
References: <>
Message-ID: <>

On Sun, 21 Oct 2012 12:26:04 +1300
Greg Ewing <greg.ewing at> wrote:
> Glyph wrote:
> > The main interfaces you need are here:
> > 
> > <>
> > <>
> > <>
> > <>
> These don't look anywhere near adequate to me. How do I make
> a sendmsg() call on a unix-domain socket and pass access rights
> over it?

Looks like your question is answered here:



From greg.ewing at  Sun Oct 21 02:18:55 2012
From: greg.ewing at (Greg Ewing)
Date: Sun, 21 Oct 2012 13:18:55 +1300
Subject: [Python-ideas] Cofunctions - Back to Basics
In-Reply-To: <>
References: <> <>
	<> <>
	<> <>
Message-ID: <>

Christian Tismer wrote:
> Picking that up, too...
> On 29.10.11 09:37, Greg Ewing wrote:
>> If the flag were moved into the stack frame instead, it would
>> be possible to run any function in either "normal" or "coroutine"
>> mode, depending on whether it was invoked via __call__ or
>> __cocall__.
> What about this idea?
> I think I just wrote exactly the same thing in another thread ;-)

Yes, it's the same idea. As you can see, I did consider it at
one point, but I had second thoughts when I realised that it
wouldn't work through type slots, meaning that there would be
some areas of what appear to be pure Python code, but are not
suspendable for non-obvious reasons.

Maybe this is not a fatal problem -- we just tell people that
__xxx__ methods are not suspendable. It's something to consider.


From andrew.robert.moffat at  Sun Oct 21 02:33:48 2012
From: andrew.robert.moffat at (Andrew Moffat)
Date: Sat, 20 Oct 2012 19:33:48 -0500
Subject: [Python-ideas] Interest in seeing in the stdlib
Message-ID: <>


I'm the author of, an intuitive interface for launching subprocesses
in Linux and OSX  It has been maintained on
github for about 10 months and currently has
about 25k installs, according to (,

Andy Grover maintains the Fedora rpm for  and Nick
Moffit has submitted an older version of (which was called pbs) to be
included in Debian distros

I'm interested in making more accessible to help bring Python forward
in the area of shell scripting, so I'm interested in seeing if sh would be
suitable for the standard library.  Is there any other interest in
something like this?


From mikegraham at  Sun Oct 21 03:02:58 2012
From: mikegraham at (Mike Graham)
Date: Sat, 20 Oct 2012 21:02:58 -0400
Subject: [Python-ideas] Interest in seeing in the stdlib
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 20, 2012 at 8:33 PM, Andrew Moffat
<andrew.robert.moffat at> wrote:
> Hi,
> I'm the author of, an intuitive interface for launching subprocesses
> in Linux and OSX  It has been maintained on
> github for about 10 months and currently has
> about 25k installs, according to
> (,
> Andy Grover maintains the Fedora rpm for
>  and Nick
> Moffit has submitted an older version of (which was called pbs) to be
> included in Debian distros
> I'm interested in making more accessible to help bring Python forward
> in the area of shell scripting, so I'm interested in seeing if sh would be
> suitable for the standard library.  Is there any other interest in something
> like this?
> Thanks

strikes me as on the clever side for the stdlib and the lack of
Windows support would be very unfortunate for a stdlib module (I don't
know if this is relatively easily fixed, though it seems possible)


From glyph at  Sun Oct 21 03:52:48 2012
From: glyph at (Glyph)
Date: Sat, 20 Oct 2012 18:52:48 -0700
Subject: [Python-ideas] The async API of the future
In-Reply-To: <>
References: <>
	<> <k5u015$l7q$>
Message-ID: <>

On Oct 20, 2012, at 4:53 PM, Guido van Rossum <guido at> wrote:

> On Sat, Oct 20, 2012 at 4:41 PM, Greg Ewing <greg.ewing at> wrote:
>> The point is that, with a completion-based model, you need a function
>> or method for every possible system call that you might want to perform
>> asynchronously.
> TBH, I like APIs that wrap all system calls. System calls have too
> many low-level details that you have to be aware of, and they too
> often vary per platform. (I just wrote a simple event loop plus
> scheduler along the lines of your essay, extending it to the point
> where I could do basic, fully-async, HTTP exchanges. The number of
> details I had to take care of was excruciating; and then there were
> the subtle differences between OSX and Ubuntu.)

The layer that wraps the system calls does not necessarily need to be visible to applications.  You absolutely need the syscalls to be exposed directly at some lower, non-standardized level, because it takes on average 15 years to shake out all the differences between platform behavior that you observed here :-).  If applications try to do this, they will always get it wrong, and besides, they want to be making different syscalls for different transports.

Much of Twisted's development has been about discovering exciting new behaviors on new platforms or new versions of supported platforms in the face of new levels of load, concurrency, or some other attribute.

(A minor nitpick: system calls aren't usually performed asynchronously; you execute the syscall non-blockingly, and then you complete the action asynchronously.  The whole idea of asynchronous I/O via non-blocking APIs implies some level of syscall wrapping.)



From ncoghlan at  Sun Oct 21 04:18:31 2012
From: ncoghlan at (Nick Coghlan)
Date: Sun, 21 Oct 2012 12:18:31 +1000
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Sun, Oct 21, 2012 at 2:30 AM, Guido van Rossum <guido at> wrote:
> On Fri, Oct 19, 2012 at 10:37 PM, Greg Ewing
> <greg.ewing at> wrote:
>> Nick Coghlan wrote:
>>> Please don't lose sight of the fact that yield-based suspension points
>>> looking like something other than an ordinary function call is a
>>> *feature*, not a bug.
> (Ironically, Christian just revived an old thread where Nick was of a
> different opinion.)

I like greenlets too, just for the ease of converting the scaling
constraints of existing concurrent code from
number-of-threads-per-process to number-of-open-sockets-per-process.

I've come to the conclusion that they're no substitute for explicitly
asynchronous code, though, and the assembler magic needed to make them
work with arbitrary C code (either in the language core or in C
extensions) makes them a poor fit for the standard library.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From greg.ewing at  Sun Oct 21 04:46:05 2012
From: greg.ewing at (Greg Ewing)
Date: Sun, 21 Oct 2012 15:46:05 +1300
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:
> In the PEP 380 world, there will be a new type of mistake possible:
> writing yield instead of yield from. Fortunately the scheduler can
> easily test for this -- if the result of its calling next() is not
> None, the user yielded something.

That will catch some mistakes of that kind, but not all --
it won't catch 'yield foo()' where foo() returns None.

One way to fix that would be to require yielding some
unique sentinel value. If the yields are all hidden inside
primitives called with 'yield from', that could be kept
an implementation detail.
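
A small sketch of the sentinel idea. The names _SUSPEND, sleep0() and
step() are invented for illustration and do not belong to any existing
scheduler.

```python
_SUSPEND = object()  # unique sentinel; only the primitives below yield it


def sleep0():
    """Primitive: give other tasks a turn. Meant to be used via 'yield from'."""
    yield _SUSPEND


def step(task):
    """Advance a task by one step and reject anything but the sentinel."""
    value = next(task)
    if value is not _SUSPEND:
        raise TypeError(
            "task yielded %r; did you mean 'yield from'?" % (value,))


def foo():
    return None


def good_task():
    yield from sleep0()


def bad_task():
    yield foo()  # the mistake: a plain 'yield' of a call returning None
```

A scheduler that only tested "is not None" would let bad_task() through,
whereas step(bad_task()) raises TypeError here.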


From tismer at  Sun Oct 21 06:40:51 2012
From: tismer at (Christian Tismer)
Date: Sun, 21 Oct 2012 06:40:51 +0200
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

On 21.10.12 01:52, Greg Ewing wrote:
> Christian Tismer wrote:
> ...
>> The only difference is that I'm not aiming at coroutines in
>> the first place, but just having the concept of a *suspendable*
>> function.
> I'm not sure what the distinction is.

This comes maybe from my use of 'coroutine', 'tasklet', 'generator'
etc. which differs from the meaning that others have in mind.
I'm mostly talking in the PyPy and Stackless community, which
creates confusion.
In that world, 'generator' for instance means a whole bunch of
functions that can play together and yield to the caller of _the_
generator. The same holds for coroutines in that world.

In python-world, things seem to be more often made of single functions.
Switching context to that:

'coroutine' implies thinking about coroutines, the intended action.
'suspendable' instead is neutral, without any a-priori intent to
switch or anything. It just states the ability to be suspended.
That sounds more like a property.

The 'suspendable' is meant as a building block for higher-level
things, which include for instance coroutines (in any flavor).

Technically the same, when we're talking about one single function
that implements it.

Christian Tismer             :^)   <mailto:tismer at>
Software Consulting          :     Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121     :    *Starship*
14482 Potsdam                :     PGP key ->
phone +49 173 24 18 776  fax +49 (30) 700143-0023
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
       whom do you want to sponsor today?

From tismer at  Sun Oct 21 06:58:27 2012
From: tismer at (Christian Tismer)
Date: Sun, 21 Oct 2012 06:58:27 +0200
Subject: [Python-ideas] Language "macros" in discussions
In-Reply-To: <>
References: <>
Message-ID: <>

On 21.10.12 00:37, Guido van Rossum wrote:
> On Sat, Oct 20, 2012 at 1:55 PM, Christian Tismer <tismer at> wrote:
> ...
>> Sent from my Ei4Steve
> Does it happen to have a 40-char wide screen? :-)

Ah, sometimes I write from my iPhone, and I don't know how to do proper
line breaks. Sometimes I break them line-by-line, sometimes I don't
and rely on the email reader's wrapping. None is perfect.

The iPhone 4S means 'iphone for Steve' for me, in memoriam ;-)

cheers - chris


From greg.ewing at  Sun Oct 21 07:19:46 2012
From: greg.ewing at (Greg Ewing)
Date: Sun, 21 Oct 2012 18:19:46 +1300
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

Guido van Rossum wrote:
> In the yield-from <generator> world we could spell it that
> way too (i.e. yield, not yield from), or we could make it so that we
> can write yield from <future>, or perhaps we need a helper call: yield
> from wait(<future>) or maybe a method on the Future class (since it is
> our own), yield from <future>.wait().

This will depend to some extent on whether Futures are considered
part of the tasks layer or part of the callbacks layer. If they're
considered part of the callbacks layer, they shouldn't have any
methods that must be called with yield-from.

> Final remark: I keep wondering if it's better to try and stay "pure"
> in the public API and use only yield from, plus some helpers like
> spawn(), join() and par(), or if a decent, pragmatic public API can
> offer a combination. I worry that most users will have a hard time
> remembering when to use yield and when yield from.

As I've said, I think it would be better to have only 'yield from'
calls in the public API, because it gives the implementation the
greatest freedom.


From solipsis at  Sun Oct 21 12:18:20 2012
From: solipsis at (Antoine Pitrou)
Date: Sun, 21 Oct 2012 12:18:20 +0200
Subject: [Python-ideas] Interest in seeing in the stdlib
References: <>
Message-ID: <>

On Sat, 20 Oct 2012 21:02:58 -0400
Mike Graham <mikegraham at> wrote:
> On Sat, Oct 20, 2012 at 8:33 PM, Andrew Moffat
> <andrew.robert.moffat at> wrote:
> > Hi,
> >
> > I'm the author of, an intuitive interface for launching subprocesses
> > in Linux and OSX  It has been maintained on
> > github for about 10 months and currently has
> > about 25k installs, according to
> > (,
> >
> >
> > Andy Grover maintains the Fedora rpm for
> >  and Nick
> > Moffit has submitted an older version of (which was called pbs) to be
> > included in Debian distros
> >
> >
> > I'm interested in making more accessible to help bring Python forward
> > in the area of shell scripting, so I'm interested in seeing if sh would be
> > suitable for the standard library.  Is there any other interest in something
> > like this?
> >
> > Thanks
> strikes me as on the clever side for the stdlib and the lack of
> Windows support would be very unfortunate for a stdlib module (I don't
> know if this is relatively easily fixed, though it seems possible)

Ditto for me. The basic concept of the sh module looks like some fancy
wrapper around subprocess.check_output:

The "easy chaining of subprocesses" part does not look that useful to
me, or at least the examples aren't very convincing. If I want to sort
the results of a shell command, it makes much more sense to me to do so
using Python's text processing and sorting capabilities, than trying to
find the right invocation of Unix "sort" and other utilities.
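
For instance, something along these lines. It is only a sketch: the
command is a stand-in that prints three words, chosen so that the example
does not depend on any particular Unix utility, and sorted_output() is a
made-up helper name.

```python
import subprocess
import sys


def sorted_output(argv):
    """Run a command and return its output lines, sorted in Python."""
    out = subprocess.check_output(argv, universal_newlines=True)
    return sorted(out.splitlines())


# Stand-in for a real shell command.
cmd = [sys.executable, "-c", "print('pear'); print('apple'); print('fig')"]

lines = sorted_output(cmd)
# Custom orderings are just a key function away, e.g. by line length.
by_length = sorted(lines, key=len)
```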

That said, I do find the "fancy wrapper" part somewhat pretty.



From rosuav at  Sun Oct 21 14:11:44 2012
From: rosuav at (Chris Angelico)
Date: Sun, 21 Oct 2012 23:11:44 +1100
Subject: [Python-ideas] Interest in seeing in the stdlib
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, Oct 21, 2012 at 11:33 AM, Andrew Moffat
<andrew.robert.moffat at> wrote:
> I'm the author of, an intuitive interface for launching subprocesses
> in Linux and OSX  It has been maintained on
> github for about 10 months and currently has
> about 25k installs, according to
> (,

Is this on PyPI? I tried a search, but 'sh' comes up with rather a lot of hits.


From christian at  Sun Oct 21 15:35:46 2012
From: christian at (Christian Heimes)
Date: Sun, 21 Oct 2012 15:35:46 +0200
Subject: [Python-ideas] Interest in seeing in the stdlib
In-Reply-To: <>
References: <>
Message-ID: <>

Am 21.10.2012 02:33, schrieb Andrew Moffat:
> I'm interested in making more accessible to help bring Python
> forward in the area of shell scripting, so I'm interested in seeing if
> sh would be suitable for the standard library.  Is there any other
> interest in something like this?

I'd like to ignore the technical issues for now and concentrate on the
legal and organizational problems.

In order to get into Python's stdlib you have to relicense and
donate the code under the PSF license. You and every contributor must
agree on the relicensing. At least you must submit a signed contributor
agreement, maybe every contributor. Are you able to get hold of everybody?

Are you willing to maintain your code for several years, at least five
years or more?


From benjamin at  Sun Oct 21 16:03:11 2012
From: benjamin at (Benjamin Peterson)
Date: Sun, 21 Oct 2012 14:03:11 +0000 (UTC)
Subject: [Python-ideas] Interest in seeing in the stdlib
References: <>
Message-ID: <>

Antoine Pitrou <solipsis at ...> writes:
> That said, I do find the "fancy wrapper" part somewhat pretty.

One thing that's not very pretty about it is the need to use "-" prefixed
parameters to get special behavior. It might be nicer if the command was an
object and you called special methods on it to get special behavior. Ex:

sudo.context("extra", "args")


From shibturn at  Sun Oct 21 16:56:35 2012
From: shibturn at (Richard Oudkerk)
Date: Sun, 21 Oct 2012 15:56:35 +0100
Subject: [Python-ideas] The async API of the future
In-Reply-To: <>
References: <>
Message-ID: <k612f1$imd$>

On 20/10/2012 9:33am, Glyph wrote:
> ... Also, you can't translate it into one
> of those sources, because the message pump is associated with a
> particular thread; you can't call a function in a different thread to
> call PostQueuedCompletionStatus.

I thought that the whole point of completion ports was inter-thread 
communication, and that PostQueuedCompletionStatus() is the equivalent 
of Queue.put().  Why does the message pump's thread matter?


From Steve.Dower at  Sun Oct 21 18:47:04 2012
From: Steve.Dower at (Steve Dower)
Date: Sun, 21 Oct 2012 16:47:04 +0000
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

Greg Ewing wrote:
> This will depend to some extent on whether Futures are considered
> part of the tasks layer or part of the callbacks layer. If they're
> considered part of the callbacks layer, they shouldn't have any
> methods that must be called with yield-from.

I put Futures very firmly in the callbacks layer (I guess the easiest reasoning for this is the complete lack of threading/async code in their implementation). Every time someone suggests "yielding a sentinel value" it seems that a Future is ideal for this - it even provides the other thread/tasklet/coroutine with a way to reactivate the original one, whether the two functions were written with knowledge of each other or not.

> As I've said, I think it would be better to have only 'yield from'
> calls in the public API, because it gives the implementation the
> greatest freedom.

I agree with this, though I still feel that we should be aiming for only 'yield' in the public API and leaving 'yield from' as a generalisation of this. For example, the two following pieces of code are basically equivalent:

@async
def task1():
    yield do_something_async_returning_a_future()

@async
def task2():
    yield task1()
    yield task1()

@async
def task3():
    yield task2()

task3().result()


And doing the same thing with yield from:

def task1():
    yield do_something_async_returning_a_future()

def task2():
    yield from task1()
    yield from task1()

@async
def task3():
    yield from task2()

task3().result()


This is also equivalent to this code:

def task3():
    yield do_something_async_returning_a_future()
    yield do_something_async_returning_a_future()


And this:

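def task():
    f = Future()
    # Chain the two operations by hand: start the second one when the
    # first completes, and complete f when the second one does.
    do_something_async_returning_a_future().add_done_callback(
        lambda _: do_something_async_returning_a_future().add_done_callback(
            lambda _: f.set_result(None)))
    return f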

My point is that once we are using yield, yield from automatically becomes an option for composing operations. Teaching and validating this style is also easier, because the rule can be 'always use @async/yield in public APIs and just yield from in private APIs', and the biggest problem with not using yield from is that more Future objects are created. (The upsides were in my essay, but include compatibility with other Future-based APIs and composability between code from different sources.)
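
To make the rule concrete, here is a minimal sketch of what such a
decorator could look like. It is an assumption-laden illustration, not
the implementation from the essay: it is spelled async_ because async is
a reserved word in current Python, it uses concurrent.futures.Future as
the Future type, and completed(), double(), add_one() and pipeline() are
invented names.

```python
import functools
from concurrent.futures import Future


def async_(func):
    """Wrap a generator function so that calling it returns a Future.

    The generator yields Futures; whenever one completes, the generator is
    resumed with its result (or has its exception thrown in).
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        outer = Future()
        gen = func(*args, **kwargs)

        def advance(send, arg):
            try:
                inner = send(arg)
            except StopIteration as stop:
                outer.set_result(stop.value)
            except Exception as exc:
                outer.set_exception(exc)
            else:
                inner.add_done_callback(resume)

        def resume(fut):
            exc = fut.exception()
            if exc is not None:
                advance(gen.throw, exc)
            else:
                advance(gen.send, fut.result())

        advance(gen.send, None)  # the first step runs immediately
        return outer
    return wrapper


def completed(value):
    """Stand-in for an async operation that has already finished."""
    fut = Future()
    fut.set_result(value)
    return fut


@async_
def double(x):               # "public" API: callers get a Future
    value = yield completed(x)
    return value * 2


def add_one(x):              # "private" helper: a plain generator
    value = yield completed(x)
    return value + 1


@async_
def pipeline(x):
    a = yield from add_one(x)   # composed without creating another Future
    b = yield double(a)         # composed through a Future
    return b
```

With these definitions pipeline(3).result() evaluates to 8, and the caller
never needs to know which of the two composition styles was used inside.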


From guido at  Sun Oct 21 19:08:42 2012
From: guido at (Guido van Rossum)
Date: Sun, 21 Oct 2012 10:08:42 -0700
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
Message-ID: <>

On Oct 21, 2012 9:48 AM, "Steve Dower" <Steve.Dower at> wrote:
> Greg Ewing wrote:
> > This will depend to some extent on whether Futures are considered
> > part of the tasks layer or part of the callbacks layer. If they're
> > considered part of the callbacks layer, they shouldn't have any
> > methods that must be called with yield-from.
> I put Futures very firmly in the callbacks layer (I guess the easiest
> reasoning for this is the complete lack of threading/async code in their
> implementation).

Did you check the source? That's simply incorrect. It uses locks, of the
threading variety.

( However one could write an implementation with the same interface that
doesn't.)

> Every time someone suggests "yielding a sentinel value" it seems that a
> Future is ideal for this - it even provides the other
> thread/tasklet/coroutine with a way to reactivate the original one, whether
> the two functions were written with knowledge of each other or not.

This I like.

> > As I've said, I think it would be better to have only 'yield from'
> > calls in the public API, because it gives the implementation the
> > greatest freedom.
> I agree with this, though I still feel that we should be aiming for only
> 'yield' in the public API and leaving 'yield from' as a generalisation of
> this. For example, the two following pieces of code are basically
> equivalent:
> @async
> def task1():
>     yield do_something_async_returning_a_future()
> @async
> def task2():
>     yield task1()
>     yield task1()
> @async
> def task3():
>     yield task2()
> task3().result()
> And doing the same thing with yield from:
> def task1():
>     yield do_something_async_returning_a_future()
> def task2():
>     yield from task1()
>     yield from task1()
> @async
> def task3():
>     yield from task2()
> task3().result()
> This is also equivalent to this code:
> @async
> def task3():
>     yield do_something_async_returning_a_future()
>     yield do_something_async_returning_a_future()
> task3().result()
> And this:
> def task():
>     f = Future()
>     do_something_async_returning_a_future().add_done_callback(
>         lambda _: do_something_async_returning_a_future().add_done_callback(
>             lambda _: f.set_result(None)
>         )
>     )
>     return f
> My point is that once we are using yield, yield from automatically
> becomes an option for composing operations. Teaching and validating this
> style is also easier, because the rule can be 'always use @async/yield in
> public APIs and just yield from in private APIs', and the biggest problem
> with not using yield from is that more Future objects are created. (The
> upsides were in my essay, but include compatibility with other Future-based
> APIs and composability between code from different sources.)

Hm. I think it'll be confusing. And the Futures-only-in-public-APIs rule
seems to encourage less efficient solutions.

--Guido van Rossum (sent from Android phone)

From vinay_sajip at  Sun Oct 21 20:41:50 2012
From: vinay_sajip at (Vinay Sajip)
Date: Sun, 21 Oct 2012 18:41:50 +0000 (UTC)
Subject: [Python-ideas] Interest in seeing in the stdlib
References: <>
Message-ID: <>

Andrew Moffat <andrew.robert.moffat at ...> writes:

> I'm interested in making more accessible to help bring Python forward in
> the area of shell scripting, so I'm interested in seeing if sh would be
> suitable for the standard library. Is there any other interest in something
> like this?

I would agree with others who have replied saying that the approach is cute,
but a little too magical. Disclosure: this is an area of interest for
me, and I maintain a project called sarge [1] which sort of fits in the same
space as pbs/sh. It doesn't have the cute shell-command-as-Python-function
idiom (which, in my view, buys very little readability), but it does aim to
offer some features which (AFAICT) sh doesn't have. I'll just list sarge's
features briefly below, if for no other reason than to show that there are
other contenders worth considering (should there be a consensus that the
stdlib needs batteries in this area).

Sarge improves upon subprocess when:

* You want to use command pipelines, but using subprocess out of the box often
  leads to deadlocks because pipe buffers get filled up.

* You want to use bash-style pipe syntax on Windows, but some Windows shells
  don't support some of the syntax you want to use, like &&, ||, |& and so on.

* You want to process output from commands in a flexible way, and communicate()
  is not flexible enough for your needs - for example, you need to process
  output a line at a time.

* You want to avoid shell injection problems by having the ability to quote
  your command arguments safely.

* subprocess allows you to let stderr be the same as stdout, but not the other
  way around - and you need to do that.

It offers:

* A simple run command which allows a rich subset of Bash-style shell command
  syntax, but parsed and run by sarge so that you can run identically on
  Windows without cygwin. This includes asynchronous calls (using "&" just as
  in bash).

* The ability to format shell commands with placeholders, such that variables
  are quoted to prevent shell injection attacks.

* The ability to capture output streams without requiring you to program your
  own threads. You just use a Capture object and then you can read from it as
  and when you want. A Capture object can capture the output from multiple
  chained commands.

* Delays in commands (e.g. "sleep") are honoured in asynchronous calls.
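The placeholder-quoting idea can be illustrated with the standard library alone. This is a rough sketch and not sarge's own API: `format_command` is a hypothetical helper, and shlex.quote() (available from Python 3.3) targets POSIX shells only.

```python
import shlex


def format_command(template, *args):
    """Fill str.format placeholders, quoting each value for a POSIX shell."""
    return template.format(*(shlex.quote(str(arg)) for arg in args))


# A hostile value stays a single, inert argument after quoting.
hostile = "my file; rm -rf ~"
command = format_command("ls -l {0}", hostile)
```

Splitting `command` again with shlex.split() gives back three words, with the hostile string intact as the last one.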

I would also concur with others who've pointed out that stdlib maintenance
is a long haul affair. I've been maintaining the logging package for around
10 years now :-)


Vinay Sajip


From Steve.Dower at  Sun Oct 21 22:07:49 2012
From: Steve.Dower at (Steve Dower)
Date: Sun, 21 Oct 2012 20:07:49 +0000
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<>	<>
	<>	<>
Message-ID: <>

> Did you check the source? That's simply incorrect. It uses locks, of the threading variety.

Yes, I've spent a lot of time in the source for Future while working on this. It has synchronisation which is _aware_ of threads, but it never creates, requires or uses them. It simply ensures thread-safe reentrancy, which will be required for any general solution unless it is completely banned from interacting across CPU threads.

> ( However one could write an implementation with the same interface that doesn't.)

And this is as simple as replacing threading.Condition() with no-op acquire() and release() functions. Regardless, the big advantage of requiring 'Future' as an interface* is that other implementations can be substituted. (Maybe making the implementation of future a property of the active event loop? I don't mind particular event loops banning CPU threads, but the entire API should allow their existence.)

(*I'm inclined to define this as 'result()', 'done()', 'add_done_callback()', 'exception()', 'set_result()' and 'set_exception()' functions. Maybe more, but I think that's sufficient. The current '_waiters' list is an optimisation for add_done_callback(),  and doesn't need to be part of the interface.)

> Hm. I think it'll be confusing.

I think the basic case ("just make it work") will be simpler, and the advanced case ("minimise memory/CPU usage") will be more complicated.

> And the Futures-only-in-public-APIs rule seems to encourage less efficient solutions.

Personally, I'd prefer developers to get a correct solution without having to understand how the whole thing works (the "pit of success"). I'm also sceptical of any other rule being as portable and composable - I don't think a standard library should have APIs where "you must only call this function with yield-from". ('await' in C# is not compulsory - you can take the Task returned from an async method and do whatever you like with it.)


From guido at  Mon Oct 22 02:23:52 2012
From: guido at (Guido van Rossum)
Date: Sun, 21 Oct 2012 17:23:52 -0700
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

On Sun, Oct 21, 2012 at 1:07 PM, Steve Dower <Steve.Dower at> wrote:
>> Did you check the source? That's simply incorrect. It uses locks, of the threading variety.
> Yes, I've spent a lot of time in the source for Future while working on this.

Sorry, I should have realized this, since your code example contained
monkey-patching that Future class...

> It has synchronisation which is _aware_ of threads, but it never creates, requires or uses them. It simply ensures thread-safe reentrancy, which will be required for any general solution unless it is completely banned from interacting across CPU threads.

I don't see it that way. Any time you acquire a lock, you may be
blocked for a long time. In a typical event loop that's an absolute
no-no. Typically, to wait for another thread, you give the other
thread a callback that adds a new event for *this* thread.
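A small sketch of that pattern, under stated assumptions: `MiniLoop` and `call_soon_threadsafe` are hypothetical names, and queue.Queue stands in for whatever thread-safe wakeup mechanism a real loop would use. The loop thread only ever takes events off its own queue; the worker thread never touches loop state directly.

```python
import queue
import threading


class MiniLoop:
    """Toy single-threaded event loop fed through a thread-safe queue."""

    def __init__(self):
        self._events = queue.Queue()
        self._running = False

    def call_soon_threadsafe(self, callback, *args):
        # Safe to call from any thread: it only enqueues an event.
        self._events.put((callback, args))

    def stop(self):
        self._running = False

    def run(self):
        self._running = True
        while self._running:
            callback, args = self._events.get()
            callback(*args)


results = []
loop = MiniLoop()


def on_done(value):
    # Runs on the loop thread, so no locking is needed around 'results'.
    results.append(value)
    loop.stop()


def worker():
    # Runs on another thread; hands its result back as a new loop event.
    loop.call_soon_threadsafe(on_done, sum(range(1000)))


threading.Thread(target=worker).start()
loop.run()
```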

Now, it's possible that in Windows, when using IOCP, the philosophy is
different -- I think I've read in that
there can be multiple threads reading events from a single queue.

But AFAIK, in Twisted and Tornado and similar systems, and probably
even in gevent and Stackless, there is a strong culture around having
only a single thread handling events (at least only one thread at a
time), since the assumption is that as long as you don't suspend, you
can trust that the world doesn't change, and that assumption becomes
invalid when other threads may also be handling events from the same
queue. It's possible to design a world where different threads have
their own event queues, and this assumption would only be valid for
events belonging to the same queue; however that seems complicated.
And you still don't want to ever attempt to acquire a *threading*
lock, because you end up blocking the entire event loop.

>> ( However one could write an implementation with the same interface that doesn't.)
> And this is as simple as replacing threading.Condition() with no-op acquire() and release() functions. Regardless, the big advantage of requiring 'Future' as an interface* is that other implementations can be substituted.

Yes, here I think we are in (possibly violent :-) agreement.

> (Maybe making the implementation of future a property of the active event loop? I don't mind particular event loops from banning CPU threads, but the entire API should allow their existence.)

Perhaps. Lots of possibilities in this design space.

> (*I'm inclined to define this as 'result()', 'done()', 'add_done_callback()', 'exception()', 'set_result()' and 'set_exception()' functions. Maybe more, but I think that's sufficient. The current '_waiters' list is an optimisation for add_done_callback(),  and doesn't need to be part of the interface.)

Agreed. I don't see much use for the cancellation stuff and all the
extra complexity that adds to the interface. BTW, I think
concurrent.futures.Future doesn't stop you from calling set_result()
or set_exception() more than once, which I think is a mistake -- I do
enforce that in NDB's Futures.
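For illustration, a lock-free Future with just that interface might look like the sketch below. `SimpleFuture` is a hypothetical name; this is neither concurrent.futures.Future nor NDB's Future, it must only be used from a single thread, and raising RuntimeError on a second completion is an assumption about how the set-once rule could be enforced.

```python
class SimpleFuture:
    """Single-threaded Future sketch: no locks, no cancellation."""

    _PENDING = object()

    def __init__(self):
        self._result = self._PENDING
        self._exception = None
        self._callbacks = []

    def done(self):
        return self._result is not self._PENDING or self._exception is not None

    def result(self):
        if not self.done():
            raise RuntimeError("result is not ready")
        if self._exception is not None:
            raise self._exception
        return self._result

    def exception(self):
        if not self.done():
            raise RuntimeError("result is not ready")
        return self._exception

    def add_done_callback(self, callback):
        if self.done():
            callback(self)
        else:
            self._callbacks.append(callback)

    def set_result(self, value):
        self._check_pending()
        self._result = value
        self._run_callbacks()

    def set_exception(self, exc):
        self._check_pending()
        self._exception = exc
        self._run_callbacks()

    def _check_pending(self):
        # Enforce set-once semantics.
        if self.done():
            raise RuntimeError("future already completed")

    def _run_callbacks(self):
        callbacks, self._callbacks = self._callbacks, []
        for callback in callbacks:
            callback(self)


seen = []
future = SimpleFuture()
future.add_done_callback(lambda f: seen.append(f.result()))
future.set_result(5)

try:
    future.set_result(6)
    second_set_rejected = False
except RuntimeError:
    second_set_rejected = True
```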

[Here you snipped some context. You proposed having public APIs that
use "yield <future>" and leaving "yield from <generator>" as something
the user can use in her own program. To which I replied:]

>> Hm. I think it'll be confusing.
> I think the basic case ("just make it work") will be simpler, and the advanced case ("minimise memory/CPU usage") will be more complicated.

Let's agree to disagree on this. I think they are both valid design
choices with different trade-offs. We should explore both directions
further so as to form a better opinion.

>> And the Futures-only-in-public-APIs rule seems to encourage less efficient solutions.
> Personally, I'd prefer developers to get a correct solution without having to understand how the whole thing works (the "pit of success"). I'm also sceptical of any other rule being as portable and composable - I don't think a standard library should have APIs where "you must only call this function with yield-from". ('await' in C# is not compulsory - you can take the Task returned from an async method and do whatever you like with it.)

Surely "whatever you like" is constrained by whatever the Task type
defines. Maybe it looks like a Future and has a blocking method to
wait for the result, like .result() on concurrent.futures.Future? If
you want that functionality for generators you just have to call some
function, passing it the generator as an argument. Remember, Python
doesn't consider that an inferior choice of API design compared to
making something a method of the object itself -- witness len(),
repr() and many others.
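As a sketch of what such a function could look like (the name `wait_for` is hypothetical, and it assumes the generator yields concurrent.futures.Future objects):

```python
from concurrent.futures import Future


def wait_for(gen):
    """Drive a generator to completion, blocking on each Future it yields."""
    try:
        future = next(gen)
        while True:
            try:
                value = future.result()      # blocks until that Future is done
            except Exception as error:
                future = gen.throw(error)    # let the generator handle failures
            else:
                future = gen.send(value)
    except StopIteration as stop:
        return stop.value


def completed(value):
    """Hypothetical stand-in for an operation that has already finished."""
    future = Future()
    future.set_result(value)
    return future


def add_two():
    a = yield completed(1)
    b = yield completed(2)
    return a + b


total = wait_for(add_two())
```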

FWIW, if I may sound antagonistic, I actually think that we're mostly
in violent agreement, and I think we're getting closer to coming up
with a sensible set of requirements and possibly even an API proposal.
Keep it coming!

--Guido van Rossum

From eric at  Mon Oct 22 03:18:35 2012
From: eric at (Eric V. Smith)
Date: Sun, 21 Oct 2012 21:18:35 -0400
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
Message-ID: <>

On 10/21/2012 8:23 PM, Guido van Rossum wrote:
> I don't see it that way. Any time you acquire a lock, you may be
> blocked for a long time. In a typical event loop that's an absolute
> no-no. Typically, to wait for another thread, you give the other
> thread a callback that adds a new event for *this* thread.
> Now, it's possible that in Windows, when using IOCP, the philosophy is
> different -- I think I've read in
> that
> there can be multiple threads reading events from a single queue.

Correct. The typical usage of an IOCP is that you create as many threads
as you have CPUs (or cores, or execution units, or whatever the kids
call them these days), then they can all wait on the same IOCP. So if
you have, say 4 CPUs so 4 threads, they can all be woken up to do useful
work if the IOCP has work items for them.
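The shape of that arrangement can be imitated in plain Python. This is only an analogy: queue.Queue stands in for the completion port, it does not use the Windows IOCP API, and `serve` is a hypothetical name.

```python
import queue
import threading


def serve(work_items, num_threads=4):
    """Let several threads wait on one shared queue and report who handled what."""
    port = queue.Queue()
    handled = []
    handled_lock = threading.Lock()

    def worker():
        while True:
            item = port.get()        # every worker blocks on the same queue
            if item is None:         # sentinel: shut this worker down
                return
            with handled_lock:
                handled.append((threading.current_thread().name, item * item))

    threads = [threading.Thread(target=worker, name="worker-%d" % i)
               for i in range(num_threads)]
    for thread in threads:
        thread.start()
    for item in work_items:
        port.put(item)
    for _ in threads:
        port.put(None)
    for thread in threads:
        thread.join()
    return handled


handled = serve(range(5))
```

Which worker picks up which item is not predictable, so any state the handlers share needs its own protection, as `handled_lock` shows.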


From andrew.robert.moffat at  Mon Oct 22 04:40:07 2012
From: andrew.robert.moffat at (Andrew Moffat)
Date: Sun, 21 Oct 2012 21:40:07 -0500
Subject: [Python-ideas] Interest in seeing in the stdlib
In-Reply-To: <>
References: <>
Message-ID: <>

I would be interested in relicensing and donating.  I am able to reach out
to the contributors, and I am pretty positive I could reach out and get the
signing off from them.  I would be more than willing to maintain the
package as well...I'm in it for the long haul, it seems to have resonated
well with the community throughout its development.

On Sun, Oct 21, 2012 at 8:35 AM, Christian Heimes <christian at> wrote:

> Am 21.10.2012 02:33, schrieb Andrew Moffat:
> > I'm interested in making more accessible to help bring Python
> > forward in the area of shell scripting, so I'm interested in seeing if
> > sh would be suitable for the standard library.  Is there any other
> > interest in something like this?
> I like to ignore the technical issues for now and concentrate on the
> legal and organizational problems.
> In order to get into Python's stdlib you have to relicense and
> donate the code under the PSF license. You and every contributor must
> agree on the relicensing. At least you must submit a signed contributor
> agreement, maybe every contributor. Are you able to get hold of everybody?
> Are you willing to maintain your code for several years, at least five
> years or more?
> Regards,
> Christian

From andrew.robert.moffat at  Mon Oct 22 04:40:53 2012
From: andrew.robert.moffat at (Andrew Moffat)
Date: Sun, 21 Oct 2012 21:40:53 -0500
Subject: [Python-ideas] Interest in seeing in the stdlib
In-Reply-To: <>
References: <>
Message-ID: <>

The main criticism has been the cleverness of the dynamic lookups.  There
is also the ability to use a Command object for more explicit calls:

cmd = sh.Command("/some/command")

So you have the best of both worlds.  If you like the idea of the programs
being attributes on the module, you can use the advertised way, if you
don't, you can use the more explicit way.
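To show the difference between the two styles without depending on the package itself, here is a rough sketch built on subprocess. It is not how sh is implemented: this `Command` class and the `Shell` class are hypothetical stand-ins.

```python
import subprocess
import sys


class Command:
    """Explicit style: wrap one program path and call it like a function."""

    def __init__(self, path):
        self.path = path

    def __call__(self, *args):
        argv = [self.path] + [str(arg) for arg in args]
        return subprocess.check_output(argv, universal_newlines=True)


class Shell:
    """Dynamic style: any attribute name is treated as a program name."""

    def __getattr__(self, name):
        return Command(name)


# Explicit construction from a known path (the running interpreter).
python = Command(sys.executable)
output = python("-c", "print('hello')")

# Dynamic lookup builds the same kind of object from the attribute name.
echo = Shell().echo
```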

Windows support would be a little more difficult.  It existed in an old
version of sh, when it was merely a wrapper around the subprocess module.
Now that it no longer relies on the subprocess module and does
fork-exec itself (in order to get more flexible access to the processes),
Windows is currently unsupported.  My current understanding is that most of
the value comes from the linux/OSX folks, but Windows support is scheduled
for the future.

On Sat, Oct 20, 2012 at 8:02 PM, Mike Graham <mikegraham at> wrote:

> On Sat, Oct 20, 2012 at 8:33 PM, Andrew Moffat
> <andrew.robert.moffat at> wrote:
> > Hi,
> >
> > I'm the author of, an intuitive interface for launching subprocesses
> > in Linux and OSX  It has been maintained on
> > github for about 10 months and currently has
> > about 25k installs, according to
> > (,
> >
> >
> > Andy Grover maintains the Fedora rpm for
> >  and Nick
> > Moffit has submitted an older version of (which was called pbs) to be
> > included in Debian distros
> >
> >
> > I'm interested in making more accessible to help bring Python forward
> > in the area of shell scripting, so I'm interested in seeing if sh would be
> > suitable for the standard library.  Is there any other interest in something
> > like this?
> >
> > Thanks
> strikes me as on the clever side for the stdlib and the lack of
> Windows support would be very unfortunate for a stdlib module (I don't
> know if this is relatively easily fixed, though it seems possible)
> Mike

From glyph at  Mon Oct 22 04:41:51 2012
From: glyph at (Glyph)
Date: Sun, 21 Oct 2012 19:41:51 -0700
Subject: [Python-ideas] The async API of the future
In-Reply-To: <>
References: <>
Message-ID: <>

On Oct 20, 2012, at 4:26 PM, Greg Ewing <greg.ewing at> wrote:

> Glyph wrote:
>> The main interfaces you need are here:
>> <>
>> <>
>> <>
>> <>
> These don't look anywhere near adequate to me. How do I make
> a sendmsg() call on a unix-domain socket and pass access rights
> over it? How do I do a readdir() on a file descriptor representing
> a directory? Etc.

You don't.  Notice I didn't even include basic datagram transports in those interfaces, and those are relatively straightforward compared to your questions.

Antoine already cited the answer to your first question - on POSIX-y operating systems, you add another interface, something like <>.

Except, the first question doesn't even make sense as posed on Windows.  Windows doesn't have any support for AF_UNIX/AF_LOCAL; if you want to send a "file descriptor" (read: object handle) to another process, you have to use either DuplicateHandle or WSADuplicateSocket.  Note that these work differently, so it depends which type of "file descriptor" you're trying to pass; if you are passing a socket you need a PID, if you're passing an anonymous pipe you need a process handle.

The second question doesn't make sense anywhere.  readdir() is blocking, always and forever; opendir() doesn't take flags and you can't effectively or portably set O_NONBLOCK on a directory descriptor with ioctls.  All filesystem operations also block, for all practical purposes.  So, really your question reverts to "how does one integrate a thread pool with an event loop", to which the answer is <>.
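As a generic illustration of handing a blocking filesystem call to a thread pool, here is a sketch that uses concurrent.futures rather than any Twisted API; `listdir_async` is a hypothetical helper.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=2)


def listdir_async(path):
    """Run the blocking os.listdir() on a pool thread and return a Future."""
    return _pool.submit(os.listdir, path)


# Set up a directory with one known entry to list.
directory = tempfile.mkdtemp()
open(os.path.join(directory, "a.txt"), "w").close()

names = []
future = listdir_async(directory)
# In a real loop this callback would post an event back to the loop thread
# instead of touching shared state from the pool thread.
future.add_done_callback(lambda f: names.extend(f.result()))
entries = future.result()
_pool.shutdown(wait=True)
```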

Of course, all of these operations _can_ be made 'really' non-blocking with sufficient terrifyingly deep platform-specific knowledge.  A DIR* is just a file descriptor, eventually, and readdir() is eventually some kind of syscall on it.  POSIX AIO operations might be used to read without blocking[1].

However, trying to get consensus about, standardize, and implement every possible I/O operation before implementing the core I/O loop interface for sockets and timed calls is pretty extreme cart-before-horse-putting.  People implemented gazillions of amazing networking applications in Python over the past two decades, despite the fact that it only even got sendmsg support recently.  Heck, even Twisted's support of sendmsg and file-descriptor sending is relatively recent.  I realize that this list is dedicated to the discussion of all proposals regardless of how radical and insane they might be, but that doesn't mean that *every* proposal must have its scope expanded until it is as radical and insane as possible.

The fact that Twisted has so many separate interfaces is not a coincidence.  We took an explicitly layered approach to building the main loop, so that anyone who needed an esoteric I/O operation could always write their own platform-specific handler.  Anyone building an application that requires that layer will probably need to write something specific to their operating system and their preferred framework (Twisted, Tornado, the stdlib loop whether it's based on asyncore or not, etc).

I believe that trying to cover every case in advance so they won't have to use an escape hatch and then not providing such an escape hatch is just going to make the API huge and confusing for newcomers and frustratingly limiting for people with really advanced use-cases.


[1]: Actually, no, they can't: <>.  But maybe one day this would be consistently implemented.

From guido at  Mon Oct 22 06:10:57 2012
From: guido at (Guido van Rossum)
Date: Sun, 21 Oct 2012 21:10:57 -0700
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Sun, Oct 21, 2012 at 6:18 PM, Eric V. Smith <eric at> wrote:
> On 10/21/2012 8:23 PM, Guido van Rossum wrote:
>> I don't see it that way. Any time you acquire a lock, you may be
>> blocked for a long time. In a typical event loop that's an absolute
>> no-no. Typically, to wait for another thread, you give the other
>> thread a callback that adds a new event for *this* thread.
>> Now, it's possible that in Windows, when using IOCP, the philosophy is
>> different -- I think I've read in
>> that
>> there can be multiple threads reading events from a single queue.
> Correct. The typical usage of an IOCP is that you create as many threads
> as you have CPUs (or cores, or execution units, or whatever the kids
> call them these days), then they can all wait on the same IOCP. So if
> you have, say 4 CPUs so 4 threads, they can all be woken up to do useful
> work if the IOCP has work items for them.

So what's the typical way to do locking in such a system? Waiting for
a lock seems bad; and you can't assume that no other callbacks may run
while you are running. What synchronization primitives are typically
used?

--Guido van Rossum

From Steve.Dower at  Mon Oct 22 07:30:24 2012
From: Steve.Dower at (Steve Dower)
Date: Mon, 22 Oct 2012 05:30:24 +0000
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

(Sorry about cutting context, I'll try not to do that again, but I also try to avoid reposting an entire email.)

> > It has synchronisation which is _aware_ of threads, but it never 
> > creates, requires or uses them. It simply ensures thread-safe 
> > reentrancy, which will be required for any general solution unless
> > it is completely banned from interacting across CPU threads.
> I don't see it that way. Any time you acquire a lock, you may be
> blocked for a long time. In a typical event loop that's an absolute
> no-no. Typically, to wait for another thread, you give the other
> thread a callback that adds a new event for *this* thread.

Agreed, but when you're waiting for another thread to stop reading its queue so you can add to it, how are you supposed to queue an event while you wait?

The lock in Future is only an issue in result() where we wait for another thread to complete the event, but that is the entire point of that function. FWIW I don't see any scheduler ever calling result(), but there are valid situations for a user to call it (REPL, already on a worker thread, unit tests). Everywhere else the lock is required for thread safety. It could be a different lock to the one in result, but I don't think anything is gained from that. Rewriting Future in C and using CPU CAS primitives might be possible, but probably only of limited value.

> Now, it's possible that in Windows, when using IOCP, the philosophy is
> different -- I think I've read in
> that
> there can be multiple threads reading events from a single queue.
> But AFAIK, in Twisted and Tornado and similar systems, and probably
> even in gevent and Stackless, there is a strong culture around having
> only a single thread handling events (at least only one thread at a
> time), since the assumption is that as long as you don't suspend, you
> can trust that the world doesn't change, and that assumption becomes
> invalid when other threads may also be handling events from the same
> queue. 

This is true, and my understanding is that IOCP is basically just a thread pool, and the 'single queue' means that all the threads are waiting on all the events and you can't guarantee which thread will get which. This is better than creating a new thread for each file, but I think that's all it is meant to be. We can easily write a single thread that can wait on all I/O, scheduling callbacks on the main thread, if necessary. I'm pretty sure that all platforms have better ways to do this though, but because they're all different it will need different implementations.

> It's possible to design a world where different threads have
> their own event queues, and this assumption would only be valid for
> events belonging to the same queue; however that seems complicated.
> And you still don't want to ever attempt to acquire a *threading*
> lock, because you end up blocking the entire event loop.

Multiple threads with independent queues should be okay, though definitely an advanced scenario. I'm sure this would be preferable to having multiple processes with one thread/queue each in some cases. In any case, this is easy enough to implement with TLS.
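A minimal sketch of per-thread queues using thread-local storage; `current_queue` is a hypothetical helper and collections.deque stands in for a real event queue.

```python
import collections
import threading

_per_thread = threading.local()


def current_queue():
    """Return the calling thread's own event queue, creating it on first use."""
    try:
        return _per_thread.queue
    except AttributeError:
        _per_thread.queue = collections.deque()
        return _per_thread.queue


queues = {}


def worker(name):
    pending = current_queue()
    pending.append("event from " + name)
    queues[name] = pending


threads = [threading.Thread(target=worker, args=(name,)) for name in ("a", "b")]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Each thread sees only the events queued on its own deque, so the "world doesn't change until I suspend" assumption holds per queue.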

> > (*I'm inclined to define [the required Future interface] as 'result()', 'done()', 
> > 'add_done_callback()', 'exception()', 'set_result()' and 'set_exception()'
> > functions. Maybe more, but I think that's sufficient. The current '_waiters'
> > list is an optimisation for add_done_callback(),  and doesn't need to be part
> > of the interface.)
> Agreed. I don't see much use for the cancellation stuff and all the
> extra complexity that adds to the interface. BTW, I think
> concurrent.futures.Future doesn't stop you from calling set_result()
> or set_exception() more than once, which I think is a mistake -- I do
> enforce that in NDB's Futures.

I agree, there should be no way to set the result or exception more than once.

On cancellation, while there is some complexity involved I do think we can make use of 'cancel' and 'cancelled' functions to pass a signal back into the worker:

op = do_something_async()   # not yielded
button.on_click += lambda: op.cancel()
try:
    result = yield op
except CancelledError:
    return False

def do_something_async():
    f = Future()

    def threadproc():
        total = 0
        for i in range(10000):
            if f.cancelled(): raise CancelledError
            total += i

    return f

I certainly would not want to see the CancelledError be raised automatically - this is no thread.abort() call - but it may be convenient to have an interface for "self._cancelled = True" and "return self._cancelled" that at least saves people from coming up with their own way of passing it in. The worker may completely ignore it, or complete anyway, but for long running operations it may be very handy.
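A runnable sketch of that cooperative style follows. It assumes a separate flag object rather than Future.cancel() itself; `CancellableOp`, `sum_async` and the `start_gate` parameter are hypothetical, and the gate exists only to make the demonstration deterministic.

```python
import threading
from concurrent.futures import CancelledError, Future


class CancellableOp:
    """Pairs a Future with a cancellation flag the worker may choose to poll."""

    def __init__(self):
        self.future = Future()
        self._cancel_requested = threading.Event()

    def cancel(self):
        # Only records the request; nothing is raised inside the worker.
        self._cancel_requested.set()

    def cancelled(self):
        return self._cancel_requested.is_set()


def sum_async(n, start_gate=None):
    op = CancellableOp()

    def threadproc():
        if start_gate is not None:
            start_gate.wait()
        total = 0
        for i in range(n):
            if op.cancelled():
                # The worker decides how to report that it stopped early.
                op.future.set_exception(CancelledError())
                return
            total += i
        op.future.set_result(total)

    threading.Thread(target=threadproc).start()
    return op


# Runs to completion when nobody asks it to stop.
finished = sum_async(100)
total = finished.future.result()

# Cancelled before the worker is allowed to start its loop.
gate = threading.Event()
stopped = sum_async(100, start_gate=gate)
stopped.cancel()
gate.set()
error = stopped.future.exception()
```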

(I'll stop before I start thinking about partial results... :) )

> [Here you snipped some context. You proposed having public APIs that
> use "yield <future>" and leaving "yield from <generator>" as something
> the user can use in her own program. To which I replied:]
> >> Hm. I think it'll be confusing.
> >
> > I think the basic case ("just make it work") will be simpler, and the advanced 
> > case ("minimise memory/CPU usage") will be more complicated.
> Let's agree to disagree on this. I think they are both valid design
> choices with different trade-offs. We should explore both directions
> further so as to form a better opinion.

Probably we need some working implementations to code against. 

> > > And the Futures-only-in-public-APIs rule seems to encourage less efficient solutions.
> >
> > Personally, I'd prefer developers to get a correct solution without having to
> > understand how the whole thing works (the "pit of success"). I'm also sceptical
> > of any other rule being as portable and composable - I don't think a standard
> > library should have APIs where "you must only call this function with yield-from".
> > ('await' in C# is not compulsory - you can take the Task returned from an async
> > method and do whatever you like with it.)
> Surely "whatever you like" is constrained by whatever the Task type
> defines. Maybe it looks like a Future and has a blocking method to
> wait for the result, like .result() on concurrent.futures.Future? If
> you want that functionality for generators you just have to call some
> function, passing it the generator as an argument. Remember, Python
> doesn't consider that an inferior choice of API design compared to
> making something a method of the object itself -- witness len(),
> repr() and many others.

I'm interested that you skipped my "portable and composable" claim and went straight for my aside about another language. I'd prefer to avoid introducing top-level names, especially since this is an API with plenty of predecessors... what sort of trouble would we be having if sched or asyncore had claimed 'wait()'? Even more so because it's Python, since it is so easy to overwrite the value.

(And as it happens, Task handles both the asynchrony and the callbacks, so it looks a bit like Thread and Future mixed together. Personally, I prefer to keep the concepts separate.)

> FWIW, if I may sound antagonistic, I actually think that we're mostly
> in violent agreement, and I think we're getting closer to coming up
> with a sensible set of requirements and possibly even an API proposal.
> Keep it coming!

I do my best work when someone is arguing with me :)


From mal at  Mon Oct 22 08:58:17 2012
From: mal at (M.-A. Lemburg)
Date: Mon, 22 Oct 2012 08:58:17 +0200
Subject: [Python-ideas] Interest in seeing in the stdlib
In-Reply-To: <>
References: <>
Message-ID: <>

On 22.10.2012 04:40, Andrew Moffat wrote:
> I would be interested in relicensing and donating.  I am able to reach out
> to the contributors, and I am pretty positive I could reach out and get the
> signing off from them.  I would be more than willing to maintain the
> package as well...I'm in it for the long haul, it seems to have resonated
> well with the community throughout its development.
> On Sun, Oct 21, 2012 at 8:35 AM, Christian Heimes <christian at> wrote:
>> Am 21.10.2012 02:33, schrieb Andrew Moffat:
>>> I'm interested in making more accessible to help bring Python
>>> forward in the area of shell scripting, so I'm interested in seeing if
>>> sh would be suitable for the standard library.  Is there any other
>>> interest in something like this?
>> I like to ignore the technical issues for now and concentrate on the
>> legal and organizational problems.
>> In order to get into Python's stdlib you have to relicense and
>> donate the code under the PSF license. You and every contributor must
>> agree on the relicensing. At least you must submit a signed contributor
>> agreement, maybe every contributor. Are you able to get hold of everybody?

Small correction:

The contributors would have to sign a contributor agreement
with the PSF to enable the PSF to distribute the code under
the PSF license:

This usually is much easier to have than a copyright sign-over,
since it's only a special license and the copyright remains with
the authors.

>> Are you willing to maintain your code for several years, at least five
>> years or more?

Marc-Andre Lemburg


From techtonik at  Mon Oct 22 12:51:52 2012
From: techtonik at (anatoly techtonik)
Date: Mon, 22 Oct 2012 13:51:52 +0300
Subject: [Python-ideas] Windows temporary file association for Python files
Message-ID: <>

I wonder if it will make the life easier if Python was installed with
.py association to "%PYTHON_HOME%\python.exe" "%1" %*
It will remove the need to run .py scripts in virtualenv with explicit
'python' prefix.

Example how it doesn't work right now

E:\virtenv32\Scripts>echo import sys; print(sys.version) >

3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600 32 bit (Intel)]

3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit (Intel)]

If Python file association was specified with
"%PYTHON_HOME%\python.exe" "%1" %* then virtualenv could override this
variable when setting the environment to set correct executable for
.py files.
anatoly t.

From p.f.moore at  Mon Oct 22 13:44:04 2012
From: p.f.moore at (Paul Moore)
Date: Mon, 22 Oct 2012 12:44:04 +0100
Subject: [Python-ideas] Windows temporary file association for Python
In-Reply-To: <>
References: <>
Message-ID: <>

On 22 October 2012 11:51, anatoly techtonik <techtonik at> wrote:
> I wonder if it will make the life easier if Python was installed with
> .py association to "%PYTHON_HOME%\python.exe" "%1" %*
> It will remove the need to run .py scripts in virtualenv with explicit
> 'python' prefix.

In Python 3.3 and later, the "py.exe" launcher is installed, and this
is the association for ".py" files by default. It looks at the #! line
of .py files, so you can run a specific Python interpreter by giving
its full path. You can also specify (for example) "python3" or
"python3.2" to run a specific Python version.

A less known fact is that you can define custom commands for py.exe in
a py.ini file. So you can have


in your py.ini, and then start your script with #!vpy to make it use
the currently active Python (whichever is on %PATH%).

Hope that helps,

From tismer at  Mon Oct 22 13:52:55 2012
From: tismer at (Christian Tismer)
Date: Mon, 22 Oct 2012 13:52:55 +0200
Subject: [Python-ideas] Interest in seeing in the stdlib
In-Reply-To: <>
References: <>
Message-ID: <>

On 22.10.12 04:40, Andrew Moffat wrote:
> The main criticism has been the cleverness of the dynamic lookups. 
>  There is also the ability to use a Command object for more explicit 
> calls:
> cmd = sh.Command("/some/command")
> cmd(arg)
> So you have the best of both worlds.  If you like the idea of the 
> programs being attributes on the module, you can use the advertised 
> way, if you don't, you can use the more explicit way.
> Windows support would be a little more difficult.  It existed in an 
> old version of sh, when it was merely a wrapper around the subprocess 
> module.  Now that no longer relies on the subprocess module and 
> does fork-exec itself (in order to get more flexible access to the 
> processes), Windows is currently unsupported.  My current 
> understanding is that most of the value comes from the linux/OSX 
> folks, but Windows support is scheduled for the future.

This is what I don't like:

subprocess is not used, but you implement stuff yourself.
Instead of bypassing subprocess I would improve subprocess
and not duplicate the windows problem, which is most of the
time _not_ easy to get right.

Can you explain why you went this path?
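
For illustration, here is a minimal sketch of the explicit Command style built on top of subprocess rather than a hand-written fork-exec. The Command class below is hypothetical and is not the API of the package under discussion; it also assumes a Python version that has subprocess.run.

```python
import subprocess
import sys


class Command:
    """Hypothetical explicit wrapper; not the API of any real package."""

    def __init__(self, path):
        self.path = path

    def __call__(self, *args):
        # subprocess takes care of process creation on POSIX and Windows.
        completed = subprocess.run(
            [self.path] + [str(arg) for arg in args],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            check=True,  # raise CalledProcessError on a non-zero exit status
        )
        return completed.stdout


python = Command(sys.executable)
output = python("-c", "print('hello')")
```

Because the argument list goes to subprocess unchanged, the same wrapper should behave the same way wherever subprocess itself works.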

cheers - chris

Christian Tismer             :^)   <mailto:tismer at>
Software Consulting          :     Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121     :    *Starship*
14482 Potsdam                :     PGP key ->
phone +49 173 24 18 776  fax +49 (30) 700143-0023
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
       whom do you want to sponsor today?


From tismer at  Mon Oct 22 14:01:38 2012
From: tismer at (Christian Tismer)
Date: Mon, 22 Oct 2012 14:01:38 +0200
Subject: [Python-ideas] Interest in seeing in the stdlib
In-Reply-To: <>
References: <>
Message-ID: <>

On 22.10.12 13:52, Christian Tismer wrote:
> On 22.10.12 04:40, Andrew Moffat wrote:
>> The main criticism has been the cleverness of the dynamic lookups. 
>>  There is also the ability to use a Command object for more explicit 
>> calls:
>> cmd = sh.Command("/some/command")
>> cmd(arg)
>> So you have the best of both worlds.  If you like the idea of the 
>> programs being attributes on the module, you can use the advertised 
>> way, if you don't, you can use the more explicit way.
>> Windows support would be a little more difficult.  It existed in an 
>> old version of sh, when it was merely a wrapper around the subprocess 
>> module.  Now that no longer relies on the subprocess module and 
>> does fork-exec itself (in order to get more flexible access to the 
>> processes), Windows is currently unsupported.  My current 
>> understanding is that most of the value comes from the linux/OSX 
>> folks, but Windows support is scheduled for the future.
> This is what I don't like:
> subprocess is not used, but you implement stuff yourself.
> Instead of bypassing subprocess I would improve subprocess
> and not duplicate the windows problem, which is most of the
> time _not_ easy to get right.
> Can you explain why you went this path?

Sorry, while we are at it:
The package name is a problem for me.
A two-character name for a package??
That is something that I would never do in the global package namespace.
It also is IMHO not nice to have such short names in PyPI.



From vinay_sajip at  Mon Oct 22 14:38:36 2012
From: vinay_sajip at (Vinay Sajip)
Date: Mon, 22 Oct 2012 12:38:36 +0000 (UTC)
Subject: [Python-ideas] Interest in seeing in the stdlib
References: <>
Message-ID: <>

Andrew Moffat <andrew.robert.moffat at ...> writes:

> The main criticism has been the cleverness of the dynamic lookups.

I would add:

* The plethora of special keyword arguments like _bg, _iter, _in, _piped etc.
  doesn't look good.

* Using callbacks for processing stream output makes it harder to do certain
  kinds of processing on that output.
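
To make the second point concrete, here is a rough sketch of the alternative: exposing the child's output as an iterator, so that ordinary iteration tools apply. The iter_lines helper is a hypothetical name, and the child process is simply the current interpreter.

```python
import subprocess
import sys


def iter_lines(argv):
    """Yield the child's stdout one line at a time."""
    with subprocess.Popen(
        argv, stdout=subprocess.PIPE, universal_newlines=True
    ) as proc:
        for line in proc.stdout:
            yield line.rstrip("\n")


child = [sys.executable, "-c", "print('a'); print('b'); print('c')"]

# With an iterator, enumerate(), zip(), comprehensions and so on work
# directly; with a callback the caller has to keep this state by hand.
numbered = list(enumerate(iter_lines(child), start=1))
```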

> Windows support would be a little more difficult.  It existed in an old
> version of sh, when it was merely a wrapper around the subprocess module.
> Now that no longer relies on the subprocess module and does fork-exec
> itself

This isn't good. You may have resorted to bypassing subprocess because it
didn't do what you needed, but it certainly wouldn't look good if a proposed
stdlib module wasn't eating its own dog food (by which I mean, using
subprocess). Though there have been precedents (optparse / argparse), a
determined effort was made there to work with the existing stdlib module before
giving up on it. From my own experience, subprocess has not been that
intractable, so I'm curious - what flexibility of access did you need that
subprocess couldn't offer? I would guess things that are essentially
non-portable, like tty access to provide pexpect-like behaviour. (I had to
eschew this for sarge, in the interests of cross-platform compatibility.)

> (in order to get more flexible access to the processes), Windows is currently
> unsupported.  My current understanding is that most of the value comes from
> the linux/OSX folks, but Windows support is scheduled for the future.

It seems to me premature to propose for inclusion in the stdlib before
offering Windows support. After all, those who need it can readily get hold of
it from PyPI, as the impressive download numbers show.

Just as its design has changed a fair bit going from pbs to, it may
change yet more when Windows support is added, and it can be looked at again


Vinay Sajip

From techtonik at  Mon Oct 22 14:42:26 2012
From: techtonik at (anatoly techtonik)
Date: Mon, 22 Oct 2012 15:42:26 +0300
Subject: [Python-ideas] Windows temporary file association for Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 22, 2012 at 2:44 PM, Paul Moore <p.f.moore at> wrote:
> On 22 October 2012 11:51, anatoly techtonik <techtonik at> wrote:
>> I wonder if it will make the life easier if Python was installed with
>> .py association to "%PYTHON_HOME%\python.exe" "%1" %*
>> It will remove the need to run .py scripts in virtualenv with explicit
>> 'python' prefix.
> In Python 3.3 and later, the "py.exe" launcher is installed, and this
> is the association for ".py" files by default. It looks at the #! line
> of .py files, so you can run a specific Python interpreter by giving
> its full path. You can also specify (for example) "python3" or
> "python3.2" to run a specific Python version.

Yes, I've noticed that this nasty launcher gets in the way. So, do you
propose to edit source files every time I need to test them with a new
version of Python? My original user story:

    I want to execute scripts in virtual environment (i.e. with Python
installed for this virtual environment) without 'python' prefix.

Here is another one. Currently Sphinx doesn't install with Python 3.2
and with Python 3.3 [1]. Normally I'd create 3 environments to
troubleshoot it and I can not modify all Sphinx files to point to the
correct interpreter to just execute ' install'.

A solution would be to teach launcher to honor PYTHON_PATH variable if
it is set (please don't confuse it with PYTHONPATH which purpose is
still unclear on Windows).


From breamoreboy at  Mon Oct 22 15:16:49 2012
From: breamoreboy at (Mark Lawrence)
Date: Mon, 22 Oct 2012 14:16:49 +0100
Subject: [Python-ideas] Windows temporary file association for Python
In-Reply-To: <>
References: <>
Message-ID: <k63gqe$643$>

On 22/10/2012 13:42, anatoly techtonik wrote:
> On Mon, Oct 22, 2012 at 2:44 PM, Paul Moore <p.f.moore at> wrote:
>> On 22 October 2012 11:51, anatoly techtonik <techtonik at> wrote:
>>> I wonder if it will make the life easier if Python was installed with
>>> .py association to "%PYTHON_HOME%\python.exe" "%1" %*
>>> It will remove the need to run .py scripts in virtualenv with explicit
>>> 'python' prefix.
>> In Python 3.3 and later, the "py.exe" launcher is installed, and this
>> is the association for ".py" files by default. It looks at the #! line
>> of .py files, so you can run a specific Python interpreter by giving
>> its full path. You can also specify (for example) "python3" or
>> "python3.2" to run a specific Python version.
> Yes, I've noticed that this nasty launcher gets in the way. So, do you
> propose to edit source files every time I need to test them with a new
> version of Python? My original user story:

I see nothing nasty in the launcher, rather it's extremely useful.  You 
don't have to edit your scripts.  Just use py -3.2, py -2 or whatever to 
run the script, the launcher will work out which version to run for you 
if you're not specific.

>      I want to execute scripts in virtual environment (i.e. with Python
> installed for this virtual environment) without 'python' prefix.
> Here is another one. Currently Sphinx doesn't install with Python 3.2
> and with Python 3.3 [1]. Normally I'd create 3 environments to
> troubleshoot it and I can not modify all Sphinx files to point to the
> correct interpreter to just execute ' install'.

Please try running your scripts with the mechanism I've given above and 
report back what happens, hopefully success :)

> A solution would be to teach launcher to honor PYTHON_PATH variable if
> it is set (please don't confuse it with PYTHONPATH which purpose is
> still unclear on Windows).

What is PYTHON_PATH?  IIRC I was told years ago *NOT* to use PYTHONPATH 
on Windows so its purpose to me isn't unclear, it's completely baffling.

> 1.


Mark Lawrence.

From jstpierre at  Mon Oct 22 16:52:31 2012
From: jstpierre at (Jasper St. Pierre)
Date: Mon, 22 Oct 2012 10:52:31 -0400
Subject: [Python-ideas] Interest in seeing in the stdlib
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 20, 2012 at 8:33 PM, Andrew Moffat
<andrew.robert.moffat at> wrote:
> Hi,
> I'm the author of, an intuitive interface for launching subprocesses
> in Linux and OSX  It has been maintained on
> github for about 10 months and currently has
> about 25k installs, according to
> (,
> Andy Grover maintains the Fedora rpm for
>  and Nick
> Moffit has submitted an older version of (which was called pbs) to be
> included in Debian distros
> I'm interested in making more accessible to help bring Python forward
> in the area of shell scripting, so I'm interested in seeing if sh would be
> suitable for the standard library.  Is there any other interest in something
> like this?

I'm not one for the sugar. Seems like you're stuffing the Python
syntax where it doesn't quite belong, as evidenced by the many escape
hatches. Basic query of things not covered in the documentation:

If I import a nonexistent program, will it give me back a function
that will fail or raise an ImportError?

How do I run a program with a - in the name? You say you replace -
with _, but that doesn't specify what happens in the edge case of "if I
have google-chrome and google_chrome, which one wins? What about
/usr/bin/google-chrome and /usr/local/bin/google_chrome"? That is,
will it exhaust the PATH before trying fallbacks replacements or will
it check all replacements at once?

If I have a program that's not on PATH, what do I do? I can manipulate
the PATH environment variable, but am I guaranteed that will work? Are
you going to double fork forever to guarantee that environment? Can I
build a custom prefix, like p =
sh.MagicPrefix(path="/opt/android_devtools/bin"), and have that work
like the regular sh module? p.gcc("whatever") ? Even with the
existence of a regular gcc in the path?

I wonder what happens if you do from sh import *.

Does it block execution before continuing? How can I do parallel
execution of four subprocesses, and get notified when all four are
done? (Seems like this might be a thing for a Future as well, even in
the absence of any scheduler or event loop).
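
As a point of comparison, this is roughly how the "four subprocesses in parallel" case looks with the existing stdlib pieces (subprocess plus concurrent.futures). It is only a sketch: the run helper is a made-up name, and the child processes are copies of the current interpreter.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor, wait


def run(argv):
    """Run one child process and return its stripped stdout."""
    return subprocess.check_output(argv, universal_newlines=True).strip()


jobs = [[sys.executable, "-c", "print({})".format(n * n)] for n in range(4)]

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(run, argv) for argv in jobs]
    # wait() blocks until every Future has finished (the default setting).
    done, not_done = wait(futures)

# Results are read in submission order, not completion order.
results = [future.result() for future in futures]
```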

Are newcomers going to be confused by this? What happens if I try and
do something like"-l -a")? Will you use the POSIX shell parsing
algorithm, pass it to bash, or pass it as one parameter? Will some
form of injection attack be mitigated by this design?
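
The three possibilities differ in what the child process actually receives. A small sketch using only the stdlib shlex module (nothing here is taken from the package being discussed):

```python
import shlex

# One parameter: the program sees a single argument containing a space.
single = ["ls", "-l -a"]

# POSIX shell word splitting: the program sees two separate arguments.
split = ["ls"] + shlex.split("-l -a")

# Quoting only matters if a string is handed to a real shell; an argument
# list passed to subprocess without shell=True is never parsed by a shell.
quoted = shlex.quote("x; rm -rf /")
```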

If you see this magic syntax as your one unique feature, I'd propose
that you add it to the subprocess module, and improve the standard
subprocess module's interface to cope with the new feature.

But I don't see this as a worthwhile thing to have. -1 on the thing.

> Thanks
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at


From guido at  Mon Oct 22 16:59:56 2012
From: guido at (Guido van Rossum)
Date: Mon, 22 Oct 2012 07:59:56 -0700
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

On Sun, Oct 21, 2012 at 10:30 PM, Steve Dower <Steve.Dower at> wrote:
[Stuff about Futures and threads]

Personally, I'm interested in designing a system, including an event
loop, where you can rely on the properties of cooperative scheduling
to avoid ever touching (OS) threading locks. I think such a system
should be "pure" and all interaction with threads should be mediated
by the event loop. (It's okay if this means that the implementation of
the event loop must at some point acquire a threading lock.) The
Futures used by the tasks to coordinate amongst themselves should not
require locking -- they should themselves be able to rely on the
guarantees of the event loop not to invoke multiple callbacks in
parallel.

IIUC you can do this on Windows with IOCP too, simply by only having a
single thread reading events.

>> > > And the Futures-only-in-public-APIs rule seems to encourage less efficient solutions.
>> >
>> > Personally, I'd prefer developers to get a correct solution without having to
>> > understand how the whole thing works (the "pit of success"). I'm also sceptical
>> > of any other rule being as portable and composable - I don't think a standard
>> > library should have APIs where "you must only call this function with yield-from".
>> > ('await' in C# is not compulsory - you can take the Task returned from an async
>> > method and do whatever you like with it.)
>> Surely "whatever you like" is constrained by whatever the Task type
>> defines. Maybe it looks like a Future and has a blocking method to
>> wait for the result, like .result() on concurrent.futures.Future? If
>> you want that functionality for generators you just have to call some
>> function, passing it the generator as an argument. Remember, Python
>> doesn't consider that an inferior choice of API design compared to
>> making something a method of the object itself -- witness len(),
>> repr() and many others.
> I'm interested that you skipped my "portable and composable" claim and went straight for my aside about another language. I'd prefer to avoid introducing top-level names, especially since this is an API with plenty of predecessors... what sort of trouble would we be having if sched or asyncore had claimed 'wait()'? Even more so because it's Python, since it is so easy to overwrite the value.

Sorry, probably just got distracted (I was reading on three different
devices while on a family outing :-).

But my answer is short: to me, the PEP 380 style is perfectly portable
and composable. If you think it isn't, please elaborate.

> (And as it happens, Task handles both the asynchrony and the callbacks, so it looks a bit like Thread and Future mixed together. Personally, I prefer to keep the concepts separate.)

Same here.

--Guido van Rossum (

From Steve.Dower at  Mon Oct 22 17:55:27 2012
From: Steve.Dower at (Steve Dower)
Date: Mon, 22 Oct 2012 15:55:27 +0000
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

> Personally, I'm interested in designing a system, including an event loop, 
> where you can rely on the properties of cooperative scheduling to avoid
> ever touching (OS) threading locks. I think such a system should be "pure"
> and all interaction with threads should be mediated by the event loop.
> (It's okay if this means that the implementation of the event loop must at
> some point acquire a threading lock.) The Futures used by the tasks to
> coordinate amongst themselves should not require locking -- they should
> themselves be able to rely on the guarantees of the event loop not to
> invoke multiple callbacks in parallel.

Unfortunately, a "pure" system means that no async operation can ever have an OS provided callback (or one that comes from outside the world of the scheduler). The purity in this case becomes infectious and limits what operations can be continued from(/waited on/blocked on/yielded/etc.). Only code invoked by the loop could schedule other code for that loop, whether by modifying a queue or setting a Future. This kind of system does not help with callback-based I/O.

That's not to say that I want big heavy locks everywhere, but as soon as you potentially have two interrupt-scheduled pieces of code queuing to the same loop you need to synchronise access to the data structure. As soon as you get the state and result of a future non-atomically, you need synchronization. I don't doubt there are ways around this (CAS goes a long way, also the GIL will probably help, assuming it's all Python code), and the current implementation of Future is a bit on the heavy side (but also suitable for much more arbitrary uses), but I really believe that avoiding all locks is a bad idea.
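
A minimal sketch of the kind of synchronization meant here, assuming a hypothetical MiniLoop class: the only locked structure is the queue that other threads post to, while the callbacks themselves still run one at a time on the loop's thread.

```python
import queue
import threading


class MiniLoop:
    """Hypothetical single-threaded loop; only the queue is lock-protected."""

    def __init__(self):
        self._ready = queue.Queue()  # queue.Queue does its own locking

    def call_soon_threadsafe(self, callback, *args):
        # Safe to call from any thread, e.g. an OS-provided callback.
        self._ready.put((callback, args))

    def run_until(self, count):
        # Callbacks always run here, on the loop's thread, one at a time.
        for _ in range(count):
            callback, args = self._ready.get()
            callback(*args)


loop = MiniLoop()
seen = []

workers = [
    threading.Thread(target=loop.call_soon_threadsafe, args=(seen.append, n))
    for n in range(3)
]
for worker in workers:
    worker.start()
loop.run_until(3)
for worker in workers:
    worker.join()
```

The order in which the three values arrive depends on thread scheduling, which is exactly why the queue needs to be synchronized.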

(Also, I don't consider cooperative multitasking to be "async" - async requires at least two simultaneous (or at least non-deterministically switching) tasks, whether these are CPU threads or hardware-controlled I/O.)

> IIUC you can do this on Windows with IOCP too, simply by only having a 
> single thread reading events.

Yes, but unless you run all subsequent code on the IOCP thread (thereby blocking any more completions) you need to schedule it back to another thread. This requires synchronization.

[ My claim that using "yield from" exclusively is less portable and composable than "yield" predominantly. ]
> To me, the PEP 380 style is perfectly portable and composable. If you think
> it isn't, please elaborate.

I think the abstract for PEP 380 sums it up pretty well: "A syntax is proposed for a generator to delegate part of its operations to another generator." Using 'yield from' (YF, for convenience) requires (a) that the caller is a generator and (b) that the callee is a generator. For the scheduling behavior to work correctly, it requires the event loop to be the one enumerating the generator, which means that if "open_async" must be called with YF then the entire user's call stack must be generators. Suddenly, wanting to use one async function has affected every single function.
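
A toy illustration of that propagation, under the assumption of a trivial scheduler that answers every request immediately. All of the names below (open_async's body, load_config, main, run) are made up for the sketch.

```python
def open_async(name):
    # Leaf operation: yield a request to the scheduler, receive the result.
    result = yield ("open", name)
    return result


def load_config(name):
    # Must itself be a generator in order to delegate with "yield from"...
    handle = yield from open_async(name)
    return handle.upper()


def main():
    # ...and so must every caller above it, all the way to the scheduler.
    return (yield from load_config("settings"))


def run(gen):
    """Toy scheduler: answers each request at once with the requested name."""
    try:
        request = next(gen)
        while True:
            kind, name = request
            request = gen.send(name)
    except StopIteration as stop:
        return stop.value


result = run(main())
```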

By contrast, with @async/yield, the "scheduler" is actually in @async, so as soon as the function is called the subsequent step can be scheduled. There is no need to yield all the way up to the event loop, since the Future that was yielded inside open_async will queue the continuation when it completes (possibly triggered from another thread). Here, the user still gets the benefits like:

def not_an_async_func():
    ops = list(map(get_url_async, list_of_urls))
    # all URLs are now downloading in parallel, let's do some other synchronous stuff
    results = list(map(Future.result, ops))

Where multiple tasks are running simultaneously, even though they eventually use a blocking wait (or a wait_all or as_completed). Doing this with YF based tasks will require the user to create the scheduler explicitly (unlike the implicit one with @async) and prevent any other asynchronous tasks from running.
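
For concreteness, a rough sketch of what such a decorator could look like. Since 'async' has become a reserved word in later Python versions, the sketch calls it async_task; get_length_async is a stand-in for get_url_async, and a thread pool plays the part of the I/O layer. This is one possible shape under those assumptions, not a reference implementation.

```python
from concurrent.futures import Future, ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=4)


def async_task(func):
    """Hypothetical stand-in for @async: drives a generator of Futures."""

    def wrapper(*args, **kwargs):
        gen = func(*args, **kwargs)
        outer = Future()

        def step(advance, value):
            try:
                inner = advance(value)  # run until the next yielded Future
            except StopIteration as stop:
                outer.set_result(stop.value)
            except Exception as exc:
                outer.set_exception(exc)
            else:
                # The continuation is queued on the yielded Future itself.
                inner.add_done_callback(resume)

        def resume(done):
            exc = done.exception()
            if exc is None:
                step(gen.send, done.result())
            else:
                step(gen.throw, exc)

        step(gen.send, None)  # start running as soon as the function is called
        return outer

    return wrapper


@async_task
def get_length_async(text):
    # The "download" here is just len() running in the thread pool.
    length = yield pool.submit(len, text)
    return length


def not_an_async_func():
    ops = list(map(get_length_async, ["a", "bb", "ccc"]))
    return list(map(Future.result, ops))


lengths = not_an_async_func()
pool.shutdown()
```

Note that the caller is an ordinary function: it never yields, and no event loop is created explicitly.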

(And as I mentioned in earlier emails, YF can be used for its stated purpose by delegating to subgenerators - an @async function is a generator yielding futures, so there is no problem with it YFing subgenerators that also yield futures. But the @async decorator is where they are collected, and not the very base of the stack.)

However, as you pointed out earlier, if all you are trying to achieve is "pure" coroutines, then YF is perfectly appropriate. But this is because of the high level of cooperation required between the involved tasklets. As I understand it, coroutines gain me nothing once I call into a long OpenCV operation, because OpenCV does not know that it is supposed to yield occasionally (or substitute any library for OpenCV). Coroutines are great for within a program, but they don't extend so well into libraries, and certainly provide no compatibility with existing ones (whereas, at worst, I can always write "yield thread_pool_executor.queue(cv.do_something, params)" with @async with any existing library [except maybe a threading library... don't take that "any" too literally]). 


From eric at  Mon Oct 22 13:21:07 2012
From: eric at (Eric V. Smith)
Date: Mon, 22 Oct 2012 07:21:07 -0400
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
Message-ID: <>

On 10/22/2012 12:10 AM, Guido van Rossum wrote:
> On Sun, Oct 21, 2012 at 6:18 PM, Eric V. Smith <eric at> wrote:
>> On 10/21/2012 8:23 PM, Guido van Rossum wrote:
>>> I don't see it that way. Any time you acquire a lock, you may be
>>> blocked for a long time. In a typical event loop that's an absolute
>>> no-no. Typically, to wait for another thread, you give the other
>>> thread a callback that adds a new event for *this* thread.
>>> Now, it's possible that in Windows, when using IOCP, the philosophy is
>>> different -- I think I've read in
>>> that
>>> there can be multiple threads reading events from a single queue.
>> Correct. The typical usage of an IOCP is that you create as many threads
>> as you have CPUs (or cores, or execution units, or whatever the kids
>> call them these days), then they can all wait on the same IOCP. So if
>> you have, say 4 CPUs so 4 threads, they can all be woken up to do useful
>> work if the IOCP has work items for them.
> So what's the typical way to do locking in such a system? Waiting for
> a lock seems bad; and you can't assume that no other callbacks may run
> while you are running. What synchronization primitives are typically
> used?

When I've done it (admittedly 10 years ago) we just used critical
sections, since we weren't blocking for long (mostly memory management).
I'm not sure if that's a best practice or not. The IOCP will actually
let you block, then it will release another thread. So if you know
you're going to block, you should create more threads than you have CPUs.

Here's the relevant paragraph from the IOCP link you posted above:

"The system also allows a thread waiting in GetQueuedCompletionStatus to
process a completion packet if another running thread associated with
the same I/O completion port enters a wait state for other reasons, for
example the SuspendThread function. When the thread in the wait state
begins running again, there may be a brief period when the number of
active threads exceeds the concurrency value. However, the system
quickly reduces this number by not allowing any new active threads until
the number of active threads falls below the concurrency value. This is
one reason to have your application create more threads in its thread
pool than the concurrency value. Thread pool management is beyond the
scope of this topic, but a good rule of thumb is to have a minimum of
twice as many threads in the thread pool as there are processors on the
system. For additional information about thread pooling, see Thread Pools."

From jstpierre at  Mon Oct 22 18:46:47 2012
From: jstpierre at (Jasper St. Pierre)
Date: Mon, 22 Oct 2012 12:46:47 -0400
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Sat, Oct 20, 2012 at 6:38 PM, Guido van Rossum <guido at> wrote:
> On Sat, Oct 20, 2012 at 12:25 PM, Jasper St. Pierre
> <jstpierre at> wrote:
>> I'm curious now... you keep mentioning Futures and Deferreds like
>> they're two separate entities. What distinction between the two do you
>> see?
> They have different interfaces and you end up using them differently.

Who is "you" supposed to refer to?

> In particular, quoting myself from another thread, here is how I use
> the terms:
> - Future: something with roughly the interface but not necessarily the
> implementation of PEP 3148.
> - Deferred: the Twisted Deferred class or something with very similar
> functionality (there are some in the JavaScript world).
> The big difference between Futures and Deferreds is that Deferreds can
> easily be chained together to create multiple stages, and each callback
> is called with the value returned from the previous stage; also,
> Deferreds have separate callback chains for regular values and errors.

Chaining is an add-on to the system and not necessarily required.
Dojo's Deferreds, modelled directly after Twisted's, don't have direct
chaining with multiple callbacks per Deferred, but instead addCallback
returns a new Deferred, which it may pass on to. This means that each
Deferred has one result, and chaining is done slightly differently.

The whole point of chaining is just convenience of mutating a value
before it's passed to the caller. It's possible to live without it.

Compare:

    from async_http_client import fetch_page
    from some_xml_library import parse_xml

    def fetch_xml(url):
        d = fetch_page(url)
        d.add_callback(parse_xml)
        return d

with:

    def fetch_xml(url):
        def parse_page(result):
            d.callback(parse_xml(result))

        d = Deferred()
        page = fetch_page(url)
        page.add_callback(parse_page)
        return d

The two functions, treated as a black box, are equivalent. The
distinction is convenience.
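
To check that claim, here is a self-contained toy. The Deferred class is a deliberately tiny model of the chaining behaviour (it is not Twisted's or Dojo's implementation), and fetch_page and parse_xml are fakes standing in for the imaginary libraries.

```python
class Deferred:
    """Toy Deferred with chained callbacks; not Twisted's implementation."""

    def __init__(self):
        self._callbacks = []
        self._fired = False
        self.result = None

    def add_callback(self, func):
        if self._fired:
            self.result = func(self.result)
        else:
            self._callbacks.append(func)
        return self

    def callback(self, value):
        self._fired = True
        self.result = value
        for func in self._callbacks:
            # Each callback receives the previous callback's return value.
            self.result = func(self.result)
        self._callbacks = []


pending = []


def fetch_page(url):
    # Fake fetch: hand back a Deferred that the test fires later.
    d = Deferred()
    pending.append(d)
    return d


def parse_xml(text):
    # Fake parser: strip the angle brackets.
    return text.strip("<>")


def fetch_xml_chained(url):
    d = fetch_page(url)
    d.add_callback(parse_xml)
    return d


def fetch_xml_manual(url):
    def parse_page(result):
        d.callback(parse_xml(result))

    d = Deferred()
    page = fetch_page(url)
    page.add_callback(parse_page)
    return d


a = fetch_xml_chained("u")
b = fetch_xml_manual("u")
for page in pending:
    page.callback("<doc>")
```

Both a and b end up holding the parsed value; the only difference is whether the intermediate Deferred is created by hand.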

> --
> --Guido van Rossum (


From guido at  Mon Oct 22 19:03:53 2012
From: guido at (Guido van Rossum)
Date: Mon, 22 Oct 2012 10:03:53 -0700
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<> <>
Message-ID: <>

On Mon, Oct 22, 2012 at 9:46 AM, Jasper St. Pierre
<jstpierre at> wrote:
> On Sat, Oct 20, 2012 at 6:38 PM, Guido van Rossum <guido at> wrote:
>> On Sat, Oct 20, 2012 at 12:25 PM, Jasper St. Pierre
>> <jstpierre at> wrote:
>>> I'm curious now... you keep mentioning Futures and Deferreds like
>>> they're two separate entities. What distinction between the two do you
>>> see?
>> They have different interfaces and you end up using them differently.
> Who is "you" supposed to refer to?
>> In particular, quoting myself from another thread, here is how I use
>> the terms:
>> - Future: something with roughly the interface but not necessarily the
>> implementation of PEP 3148.
>> - Deferred: the Twisted Deferred class or something with very similar
>> functionality (there are some in the JavaScript world).
>> The big difference between Futures and Deferreds is that Deferreds can
>> easily be chained together to create multiple stages, and each callback
>> is called with the value returned from the previous stage; also,
>> Deferreds have separate callback chains for regular values and errors.
> Chaining is an add-on to the system and not necessarily required.
> Dojo's Deferreds, modelled directly after Twisted's, don't have direct
> chaining with multiple callbacks per Deferred, but instead addCallback
> returns a new Deferred, which it may pass on to. This means that each
> Deferred has one result, and chaining is done slightly differently.
> The whole point of chaining is just convenience of mutating a value
> before it's passed to the caller. It's possible to live without it.
> Compare:
>     from async_http_client import fetch_page
>     from some_xml_library import parse_xml
>     def fetch_xml(url):
>         d = fetch_page(url)
>         d.add_callback(parse_xml)
>         return d
> with:
>     def fetch_xml(url):
>         def parse_page(result):
>             d.callback(parse_xml(result))
>         d = Deferred()
>         page = fetch_page(url)
>         page.add_callback(parse_page)
>         return d
> The two functions, treated as a black box, are equivalent. The
> distinction is convenience.
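
A minimal sketch of the comparison quoted above, assuming a toy Deferred
class (not Twisted's or Dojo's) and hypothetical stand-ins for fetch_page
and parse_xml. It only aims to show that both variants hand the caller
the same value.

```python
class Deferred:
    """Toy stand-in for a Deferred; not Twisted's or Dojo's class."""

    def __init__(self):
        self._callbacks = []
        self._fired = False
        self._result = None

    def add_callback(self, fn):
        # Each callback receives the value returned by the previous one.
        if self._fired:
            self._result = fn(self._result)
        else:
            self._callbacks.append(fn)
        return self

    def callback(self, result):
        # Fire the chain once with the initial result.
        self._fired = True
        self._result = result
        for fn in self._callbacks:
            self._result = fn(self._result)
        self._callbacks.clear()


pending = []  # Deferreds handed out by the fake fetch_page


def fetch_page(url):
    # Hypothetical async fetch: returns a Deferred that fires later.
    d = Deferred()
    pending.append((url, d))
    return d


def parse_xml(text):
    # Hypothetical parser: tags the text so its effect is visible.
    return ("parsed", text)


def fetch_xml_chained(url):
    d = fetch_page(url)
    d.add_callback(parse_xml)
    return d


def fetch_xml_manual(url):
    def parse_page(result):
        d.callback(parse_xml(result))
    d = Deferred()
    page = fetch_page(url)
    page.add_callback(parse_page)
    return d


def collect(make):
    seen = []
    d = make("http://example.invalid/feed")
    d.add_callback(seen.append)
    # Simulate the network completing for every outstanding fetch.
    for url, page in pending:
        page.callback("<xml from %s>" % url)
    pending.clear()
    return seen


chained_result = collect(fetch_xml_chained)
manual_result = collect(fetch_xml_manual)
```

In this sketch the chained version reuses the Deferred returned by
fetch_page, while the manual version creates a second one and fires it
by hand.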

Jasper, I don't know you. You may be a wizard-level Twisted user, or
maybe you once saw a Twisted tutorial. All I know is that when I
started this discussion I used the term Future thinking Deferreds were
just Futures, and then Twisted core developers started explaining to me
that Deferreds are so much more than Futures (I think it may have been
Glyph himself, in one of his longer posts). So please go argue the
distinction or similarity with the Twisted core developers, not with

--Guido van Rossum (

From guido at  Mon Oct 22 19:34:38 2012
From: guido at (Guido van Rossum)
Date: Mon, 22 Oct 2012 10:34:38 -0700
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

On Mon, Oct 22, 2012 at 8:55 AM, Steve Dower <Steve.Dower at> wrote:
>> Personally, I'm interested in designing a system, including an event loop,
>> where you can rely on the properties of cooperative scheduling to avoid
>> ever touching (OS) threading locks. I think such a system should be "pure"
>> and all interaction with threads should be mediated by the event loop.
>> (It's okay if this means that the implementation of the event loop must at
>> some point acquire a threading lock.) The Futures used by the tasks to
>> coordinate amongst themselves should not require locking -- they should
>> themselves be able to rely on the guarantees of the event loop not to
>> invoke multiple callbacks in parallel.
> Unfortunately, a "pure" system means that no async operation can ever have an OS provided callback (or one that comes from outside the world of the scheduler). The purity in this case becomes infectious and limits what operations can be continued from(/waited on/blocked on/yielded/etc.). Only code invoked by the loop could schedule other code for that loop, whether by modifying a queue or setting a Future. This kind of system does not help with callback-based I/O.

I'm curious what the Twisted folks have to say about this. Or the
folks using gevent.

I think your world view is colored by Windows; that's fine, we need
input from experienced Windows users. But I can certainly imagine
other ways of dealing with this.

For example, in CPython, at least, a callback that is called directly
by the OS cannot call straight into Python anyway -- you have to
acquire the GIL first. This pretty much means that an unconstrained
callback directly from the OS cannot call straight into Python -- it
has to put something into a queue, and the bytecode interpreter will
eventually call it (possibly in another thread). This is how signal
handlers are invoked too.

> That's not to say that I want big heavy locks everywhere, but as soon as you potentially have two interrupt-scheduled pieces of code

If interrupt-scheduled means what I think it means, this can only be C
code. For the Python callback, see above.

> queuing to the same loop you need to synchronise access to the data structure. As soon as you get the state and result of a future non-atomically, you need synchronization. I don't doubt there are ways around this (CAS goes a long way, also the GIL will probably help, assuming it's all Python code), and the current implementation of Future is a bit on the heavy side (but also suitable for much more arbitrary uses), but I really believe that avoiding all locks is a bad idea.

I don't actually believe we should avoid all locks. I do believe that
there should be a separate mechanism, likely OS-specific, whereby the
"pure" async world and the "messy" threading world can hand off data
to each other. It is probably unavoidable that the implementation of
this mechanism touches a threading lock. But this does not mean that
the rest of the "pure" world should need to use a Future class that
touches threading locks.
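
A rough sketch of such a hand-off, assuming a toy Loop class and a toy
lock-free Future (both hypothetical names, not an existing API). The
only lock lives in the loop's hand-off method; the Future itself is
only ever touched from the loop thread.

```python
import collections
import threading


class Loop:
    """Toy single-threaded event loop (hypothetical, for illustration)."""

    def __init__(self):
        self._ready = collections.deque()     # touched only by the loop thread
        self._incoming = collections.deque()  # hand-off area for other threads
        self._lock = threading.Lock()         # the one lock in the system
        self._wakeup = threading.Event()

    def call_soon(self, fn, *args):
        # "Pure" side: no locking, only valid from the loop thread.
        self._ready.append((fn, args))

    def call_soon_threadsafe(self, fn, *args):
        # "Messy" side: any thread may call this.
        with self._lock:
            self._incoming.append((fn, args))
        self._wakeup.set()

    def run_until(self, predicate):
        while not predicate():
            with self._lock:
                self._ready.extend(self._incoming)
                self._incoming.clear()
                self._wakeup.clear()
            if not self._ready:
                self._wakeup.wait()  # sleep until another thread hands off work
                continue
            fn, args = self._ready.popleft()
            fn(*args)


class Future:
    """Lock-free future: only ever touched from the loop thread."""

    def __init__(self):
        self.done = False
        self.value = None
        self._callbacks = []

    def add_done_callback(self, fn):
        if self.done:
            fn(self)
        else:
            self._callbacks.append(fn)

    def set_result(self, value):
        self.done = True
        self.value = value
        for fn in self._callbacks:
            fn(self)


loop = Loop()
fut = Future()
log = []
fut.add_done_callback(lambda f: log.append(f.value))


def worker():
    # Runs in an OS thread; it never touches fut directly.
    loop.call_soon_threadsafe(fut.set_result, 42)


t = threading.Thread(target=worker)
t.start()
loop.run_until(lambda: fut.done)
t.join()
```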

> (Also, I don't consider cooperative multitasking to be "async" - async requires at least two simultaneous (or at least non-deterministically switching) tasks, whether these are CPU threads or hardware-controlled I/O.)

This sounds like a potentially fatal clash in terminology. In the way
I use 'async', Twisted, Tornado and gevent certainly qualify, and all
those have huge parts of their API where there is no non-deterministic
switching in sight -- in fact, they all carefully fence off the part
that does interact with threads. For example, the Twisted folks have
argued that one of the big advantages of using Twisted's Deferred
class is that while a callback is running, the state of the world
remains constant (except for actions made by the callback itself,
obviously).

What other term should we use to encompass this world view (which IMO
is a perfectly valid abstraction for a lot of I/O-related
concurrency)?

>> IIUC you can do this on Windows with IOCP too, simply by only having a
>> single thread reading events.
> Yes, but unless you run all subsequent code on the IOCP thread (thereby blocking any more completions) you need to schedule it back to another thread. This requires synchronization.

It does sound like this may be unique to Windows, or at least not
shared with most of the UNIX world (UNIX ports of IOCP
notwithstanding).

> [ My claim that using "yield from" exclusively is less portable and composable than "yield" predominantly. ]
>> To me, the PEP 380 style is perfectly portable and composable. If you think
>> it isn't, please elaborate.
> I think the abstract for PEP 380 sums it up pretty well: "A syntax is proposed for a generator to delegate part of its operations to another generator." Using 'yield from' (YF, for convenience) requires (a) that the caller is a generator and (b) that the callee is a generator. For the scheduling behavior to work correctly, it requires the event loop to be the one enumerating the generator, which means that if "open_async" must be called with YF then the entire user's call stack must be generators. Suddenly, wanting to use one async function has affected every single function.

And that is by design -- Greg *wants* it to be that way, and so far I
haven't found a reason to disagree with him. It seems you just
fundamentally disagree with the design, but your arguments come from a
fundamentally different world view.

> By contrast, with @async/yield, the "scheduler" is actually in @async, so as soon as the function is called the subsequent step can be scheduled. There is no need to yield all the way up to the event loop, since the Future that was yielded inside open_async will queue the continuation when it completes (possibly triggered from another thread).

Note that in the YF world, there are also ways to stop the yield from
bubbling all the way to the top. You simply call the generator function,
which gives you a generator object, and the scheduler module or class
can offer a variety of APIs to do things with it -- e.g. run it
without waiting for it (yet), run several of these in parallel until
one of them (or all of them) completes, etc.
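
As a sketch of what that could look like, assuming a toy round-robin
Scheduler with a hypothetical spawn() method (none of these names come
from an existing library): main() calls the generator functions, hands
the resulting generator objects to the scheduler, and carries on
without waiting for them.

```python
import collections


class Scheduler:
    """Toy round-robin scheduler for generator-based tasks (illustrative)."""

    def __init__(self):
        self._ready = collections.deque()
        self.results = {}

    def spawn(self, name, gen):
        # Start a generator object as an independent task.
        self._ready.append((name, gen))

    def run(self):
        while self._ready:
            name, gen = self._ready.popleft()
            try:
                next(gen)  # run the task until its next bare yield
            except StopIteration as stop:
                self.results[name] = stop.value
            else:
                self._ready.append((name, gen))


def sleep_ticks(n):
    # Sub-coroutine: gives up control n times.
    for _ in range(n):
        yield


def fetch(url, ticks):
    # Delegation: the yields inside sleep_ticks reach the scheduler.
    yield from sleep_ticks(ticks)
    return "body of " + url


def main(sched):
    # Calling fetch() only creates generator objects; spawn() runs them
    # as separate tasks, so main() does not wait for either of them.
    sched.spawn("a", fetch("a.example", 3))
    sched.spawn("b", fetch("b.example", 1))
    yield
    return "main done"


sched = Scheduler()
sched.spawn("main", main(sched))
sched.run()
```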

> Here, the user still gets the benefits like:
> def not_an_async_func():
>     ops = list(map(get_url_async, list_of_urls))
>     # all URLs are now downloading in parallel, let's do some other synchronous stuff
>     results = list(map(Future.result, ops))

And in the YF world you can do that too.

> Where multiple tasks are running simultaneously, even though they eventually use a blocking wait (or a wait_all or as_completed). Doing this with YF based tasks will require the user to create the scheduler explicitly (unlike the implicit one with @async) and prevent any other asynchronous tasks from running.

I don't see that. The user just has to be able to get a reference to
the scheduler, which should be part of the scheduler's API (e.g. a
function in its module that returns the current scheduler instance).

> (And as I mentioned in earlier emails, YF can be used for its stated purpose by delegating to subgenerators - an @async function is a generator yielding futures, so there is no problem with it YFing subgenerators that also yield futures. But the @async decorator is where they are collected, and not the very base of the stack.)

With YF it doesn't have to be the base of the stack. It just usually is.

I feel we are going around in circles.

> However, as you pointed out earlier, if all you are trying to achieve is "pure" coroutines, then YF is perfectly appropriate. But this is because of the high level of cooperation required between the involved tasklets. As I understand it, coroutines gain me nothing once I call into a long OpenCV operation, because OpenCV does not know that it is supposed to yield occasionally (or substitute any library for OpenCV). Coroutines are great for within a program, but they don't extend so well into libraries, and certainly provide no compatibility with existing ones (whereas, at worst, I can always write "yield thread_pool_executor.queue(cv.do_something, params)" with @async with any existing library [except maybe a threading library... don't take that "any" too literally]).

I don't know what OpenCV is, but assuming it is something that doesn't
know about YF, then it needs to run in a thread of its own (or a
threadpool). It is perfectly possible to add a primitive operation to
the YF scheduler that says "run this in a threadpool and wake me up
when it produces a result". The public API for that primitive can
certainly use YF itself -- the messy interface with threads can be
completely hidden from view. IMO any YF scheduler worth using for real
work must provide such a primitive (it was one of the first things I
had to do in my own prototype, to be able to call
socket.getaddrinfo()).
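
One way such a primitive might look, as a sketch only: the Scheduler
class and run_in_thread() below are hypothetical names, time.sleep()
stands in for a blocking library call, and error handling is left out.
The task code only ever sees "yield from"; the thread pool and the
thread-safe queue stay inside the scheduler.

```python
import collections
import queue
import time
from concurrent.futures import ThreadPoolExecutor


class Scheduler:
    """Toy yield-from scheduler with a thread-pool primitive (illustrative)."""

    def __init__(self):
        self._ready = collections.deque()  # (generator, value to send in)
        self._done = queue.Queue()         # thread-safe hand-off from the pool
        self._waiting = 0                  # tasks parked on a pool job
        self._pool = ThreadPoolExecutor(max_workers=2)

    def spawn(self, gen):
        self._ready.append((gen, None))

    def run(self):
        while self._ready or self._waiting:
            # Collect finished pool jobs; block only if nothing is runnable.
            block = not self._ready
            try:
                while True:
                    self._ready.append(self._done.get(block=block))
                    self._waiting -= 1
                    block = False
            except queue.Empty:
                pass
            gen, value = self._ready.popleft()
            try:
                request = gen.send(value)
            except StopIteration:
                continue
            if request is None:
                self._ready.append((gen, None))  # plain yield: reschedule
            else:
                fn, args = request               # thread-pool request
                self._waiting += 1
                job = self._pool.submit(fn, *args)
                # Error handling is omitted: fn is assumed not to raise.
                job.add_done_callback(
                    lambda f, gen=gen: self._done.put((gen, f.result())))
        self._pool.shutdown()


def run_in_thread(fn, *args):
    # The public primitive; callers use it with "yield from".
    result = yield (fn, args)
    return result


def slow_square(x):
    time.sleep(0.05)  # stand-in for a blocking call
    return x * x


log = []


def task(x):
    value = yield from run_in_thread(slow_square, x)
    log.append(value)


sched = Scheduler()
sched.spawn(task(3))
sched.spawn(task(4))
sched.run()
```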

--Guido van Rossum (

From ned at  Mon Oct 22 20:59:55 2012
From: ned at (Ned Batchelder)
Date: Mon, 22 Oct 2012 14:59:55 -0400
Subject: [Python-ideas] Windows temporary file association for Python
In-Reply-To: <>
References: <>
Message-ID: <>

On 10/22/2012 8:42 AM, anatoly techtonik wrote:
> A solution would be to teach launcher to honor PYTHON_PATH variable if
> it is set (please don't confuse it with PYTHONPATH which purpose is
> still unclear on Windows).
What are you talking about?  PYTHON_PATH doesn't appear in the CPython 
sources at all. PYTHONPATH has the same purpose on Windows that it has 
anywhere: a list of directories to prefix to sys.path to find modules 
when importing.
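
A quick way to check this on any platform, as a sketch: start a child
interpreter with PYTHONPATH set to a temporary directory and look for
that directory in the child's sys.path. The helper name sys_path_with
is made up for this example.

```python
import os
import subprocess
import sys
import tempfile


def sys_path_with(pythonpath_dir):
    # Run a child interpreter with PYTHONPATH set and return its sys.path.
    env = dict(os.environ, PYTHONPATH=pythonpath_dir)
    out = subprocess.run(
        [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
        env=env, capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()


with tempfile.TemporaryDirectory() as d:
    child_path = sys_path_with(d)
    found = os.path.realpath(d) in [os.path.realpath(p) for p in child_path]
```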


From Steve.Dower at  Mon Oct 22 21:18:02 2012
From: Steve.Dower at (Steve Dower)
Date: Mon, 22 Oct 2012 19:18:02 +0000
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

>>> Personally, I'm interested in designing a system, including an event
>>> loop, where you can rely on the properties of cooperative scheduling
>>> to avoid ever touching (OS) threading locks. I think such a system should be "pure"
>>> and all interaction with threads should be mediated by the event loop.
>>> (It's okay if this means that the implementation of the event loop
>>> must at some point acquire a threading lock.) The Futures used by the
>>> tasks to coordinate amongst themselves should not require locking --
>>> they should themselves be able to rely on the guarantees of the event
>>> loop not to invoke multiple callbacks in parallel.
>> Unfortunately, a "pure" system means that no async operation can ever have an OS
>> provided callback (or one that comes from outside the world of the scheduler). The purity
>> in this case becomes infectious and limits what operations can be continued from(/waited
>> on/blocked on/yielded/etc.). Only code invoked by the loop could schedule other code for
>> that loop, whether by modifying a queue or setting a Future. This kind of system does not
>> help with callback-based I/O.
> I'm curious what the Twisted folks have to say about this. Or the folks using gevent.

So am I, but my guess would be that as long as you stay within their 'world' everything is fine (I haven't seen any Twisted code to make me believe otherwise, but happy to accept examples - I have no experience with it directly, though I believe I've used similar concepts before). This is fine for a library or framework, but I don't think it's appropriate for a standard library - maybe this is where our views differ?

> I think your world view is colored by Windows; that's fine, we need input from experienced
> Windows users. But I can certainly imagine other ways of dealing with this.

Coloured by threads is probably more accurate, but then again, throwing threads around wildly is definitely a Windows thing :). I also have a background in microcontrollers, including writing my own pre-emptive and cooperative schedulers that worked with external devices, so I'm trying to draw on that as much as my Windows experience.

> For example, in CPython, at least, a callback that is called directly by the OS cannot
> call straight into Python anyway -- you have to acquire the GIL first. This pretty much
> means that an unconstrained callback directly from the OS cannot call straight into Python
> -- it has to put something into a queue, and the bytecode interpreter will eventually call
> it (possibly in another thread). This is how signal handlers are invoked too.

I'm nervous about relying on the GIL like this, especially since many (most? all?) other interpreters often promote the fact that they don't have a GIL. In any case, it's an implementation detail - if the lock already exists, then we don't need to add another one, but it will need to be noted (in code comments) that we rely on keeping the GIL during the entire callback (which, as I'll go into more detail on later, I don't expect to be very long at all, ever).

>> That's not to say that I want big heavy locks everywhere, but as soon
>> as you potentially have two interrupt-scheduled pieces of code
> If interrupt-scheduled means what I think it means, this can only be C code. For the
> Python callback, see above.

I basically meant it to mean any code running that interrupts the current code, whether because of a callback or preemption. Because of the GIL, you are right, but since arbitrary Python code could release the GIL at any time I don't think we could rely on it. 

>> queuing to the same loop you need to synchronise access to the data structure. As soon
>> as you get the state and result of a future non-atomically, you need synchronization. I
>> don't doubt there are ways around this (CAS goes a long way, also the GIL will probably
>> help, assuming it's all Python code), and the current implementation of Future is a bit on
>> the heavy side (but also suitable for much more arbitrary uses), but I really believe that
>> avoiding all locks is a bad idea.
> I don't actually believe we should avoid all locks. I do believe that there should be a
> separate mechanism, likely OS-specific, whereby the "pure" async world and the "messy"
> threading world can hand off data to each other. It is probably unavoidable that the
> implementation of this mechanism touches a threading lock. But this does not mean that the
> rest of the "pure" world should need to use a Future class that touches threading locks.

We can achieve this by making the implementation of Future a property of the scheduler. So rather than using 'concurrent.futures.Future' to construct a new future, it could be 'concurrent.eventloop.get_current().Future()'. This way a user can choose a non-thread safe event loop if they know they don't need one (though I guess users/libraries could use a thread-safe Future deliberately when they know that a thread will be involved). This adds another level of optimization on top of the 'get_future_for' function I've already suggested, and does it without exposing any complexity to the user.
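
A sketch of the idea, with every name hypothetical (there is no
concurrent.eventloop module; get_current() and the two loop classes
below are made up for illustration): each loop class carries its own
Future implementation, and library code asks the current loop for one.

```python
import threading


class PlainFuture:
    """No locking: safe only when everything runs on the loop's thread."""

    def __init__(self):
        self._done = False
        self._value = None

    def set_result(self, value):
        self._value = value
        self._done = True

    def result(self):
        if not self._done:
            raise RuntimeError("result not ready")
        return self._value


class LockedFuture(PlainFuture):
    """Thread-safe variant: result() may block in another thread."""

    def __init__(self):
        super().__init__()
        self._cond = threading.Condition()

    def set_result(self, value):
        with self._cond:
            super().set_result(value)
            self._cond.notify_all()

    def result(self, timeout=None):
        with self._cond:
            if not self._cond.wait_for(lambda: self._done, timeout):
                raise TimeoutError("result not ready")
            return self._value


class SingleThreadedLoop:
    Future = PlainFuture


class ThreadAwareLoop:
    Future = LockedFuture


_current = threading.local()


def set_current(loop):
    _current.loop = loop


def get_current():
    return _current.loop


def make_future():
    # Library code never names a concrete Future class.
    return get_current().Future()


set_current(SingleThreadedLoop())
plain = make_future()
plain.set_result(1)

set_current(ThreadAwareLoop())
locked = make_future()
threading.Thread(target=locked.set_result, args=(2,)).start()
locked_value = locked.result(timeout=5)
```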

>> (Also, I don't consider cooperative multitasking to be "async" - async
>> requires at least two simultaneous (or at least non-deterministically
>> switching) tasks, whether these are CPU threads or hardware-controlled
>> I/O.)
> This sounds like a potentially fatal clash in terminology. In the way I use 'async',
> Twisted, Tornado and gevent certainly qualify, and all those have huge parts of their API
> where there is no non-deterministic switching in sight -- in fact, they all carefully
> fence off the part that does interact with threads. For example, the Twisted folks have
> argued that one of the big advantages of using Twisted's Deferred class is that while a
> callback is running, the state of the world remains constant (except for actions made by
> the callback itself, obviously).
> What other term should we use to encompass this world view (which IMO is a perfectly valid
> abstraction for a lot of I/O-related concurrency)?

It depends on the significance of the callback. In my world view, the callback only ever schedules a task (or I sometimes use the word 'continuation') in the main loop. Because the callback could run anywhere, it needs to synchronise the queue, but the continuation is going to run synchronously anyway, so it does not require any locks. (I included the with_options(f, callback_context=None) function to allow the continuation to run wherever the callback does, which _would_ require synchronization, but it also requires an explicit declaration by the developer that they know what they are doing.)

>>> IIUC you can do this on Windows with IOCP too, simply by only having
>>> a single thread reading events.
>> Yes, but unless you run all subsequent code on the IOCP thread (thereby blocking any
>> more completions) you need to schedule it back to another thread. This requires
>> synchronization.
> It does sound like this may be unique to Windows, or at least not shared with most of the
> UNIX world (UNIX ports of IOCP notwithstanding).

IOCP looks like a solution to a problem that was so common they shared it with everyone (I don't say it _IS_ a solution, because I know nothing about its history and I have to be careful of anything I say being taken as fact). You can create threads in any OS to wait for blocking I/O, so it's probably most accurate to say it's unique to IOCP or threadpools in general. Again, it's an implementation detail that doesn't change the public API, which is required to execute continuations within the event loop.

>> However, as you pointed out earlier, if all you are trying to achieve is "pure"
>> coroutines, then YF is perfectly appropriate. But this is because of the high level of
>> cooperation required between the involved tasklets. As I understand it, coroutines gain me
>> nothing once I call into a long OpenCV operation, because OpenCV does not know that it is
>> supposed to yield occasionally (or substitute any library for OpenCV). Coroutines are
>> great for within a program, but they don't extend so well into libraries, and certainly
>> provide no compatibility with existing ones (whereas, at worst, I can always write "yield
>> thread_pool_executor.queue(cv.do_something, params)" with @async with any existing library
>> [except maybe a threading library... don't take that "any" too literally]).
> I don't know what OpenCV is, but assuming it is something that doesn't know about YF, then
> it needs to run in a thread of its own (or a threadpool). It is perfectly possible to add
> a primitive operation to the YF scheduler that says "run this in a threadpool and wake me
> up when it produces a result". The public API for that primitive can certainly use YF
> itself -- the messy interface with threads can be completely hidden from view. IMO any YF
> scheduler worth using for real work must provide such a primitive (it was one of the first
> things I had to do in my own prototype, to be able to call socket.getaddrinfo()).

Here's that violent agreement again :) I think this may be a difference of opinion on API design: with @async the user never needs to touch the scheduler directly. All they need are tools that are already in the standard library - threads and futures - and presumably the new set of *_async() functions we will add. The only new thing to learn is @async (and for advanced users, with_options() and YF, but having taught Python to classes of undergraduates I can guarantee that not everyone needs these).


From guido at  Mon Oct 22 22:26:06 2012
From: guido at (Guido van Rossum)
Date: Mon, 22 Oct 2012 13:26:06 -0700
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

On Mon, Oct 22, 2012 at 12:18 PM, Steve Dower <Steve.Dower at> wrote:
[Quoting me]
>> For example, in CPython, at least, a callback that is called directly by the OS cannot
>> call straight into Python anyway -- you have to acquire the GIL first. This pretty much
>> means that an unconstrained callback directly from the OS cannot call straight into Python
>> -- it has to put something into a queue, and the bytecode interpreter will eventually call
>> it (possibly in another thread). This is how signal handlers are invoked too.
> I'm nervous about relying on the GIL like this, especially since many (most? all?) other interpreters often promote the fact that they don't have a GIL. In any case, it's an implementation detail - if the lock already exists, then we don't need to add another one, but it will need to be noted (in code comments) that we rely on keeping the GIL during the entire callback (which, as I'll go into more detail on later, I don't expect to be very long at all, ever).

Ok, forget the GIL (though PyPy has one). Anyway, the existing
mechanism I was referring to does *not* guarantee that the callback
keeps the GIL as long as it runs. The GIL is used to emulate
preemptive scheduling while still protecting CPython's internal data
structures from concurrent access. It makes no guarantees for user
data. Even "x = d[key]" may release the GIL if the dict contains keys
whose __eq__ is implemented in Python.
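
A small illustration of that last point, assuming a made-up Key class
whose hash always collides: a plain dict lookup ends up running
Python-level code, and the interpreter is free to switch threads while
that code runs.

```python
calls = []


class Key:
    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return 1  # force every key into the same hash bucket

    def __eq__(self, other):
        # Arbitrary Python code runs here, in the middle of d[key].
        calls.append((self.name, other.name))
        return self.name == other.name


d = {Key("a"): "A"}
d[Key("b")] = "B"

calls.clear()
x = d[Key("b")]  # the lookup has to call Key.__eq__ to find the entry
```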

But the crucial point of the mechanism is that you don't call straight
into Python from the OS-level callback (which is written in C or some
other low-level language). You arrange for the interpreter to call the
Python-level callback at some later time. So you might as well use
this to enforce single-threading, if that's the way of your world.

>>> That's not to say that I want big heavy locks everywhere, but as soon
>>> as you potentially have two interrupt-scheduled pieces of code
>> If interrupt-scheduled means what I think it means, this can only be C code. For the
>> Python callback, see above.
> I basically meant it to mean any code running that interrupts the current code, whether because of a callback or preemption. Because of the GIL, you are right, but since arbitrary Python code could release the GIL at any time I don't think we could rely on it.

At least in CPython, it's not just the GIL. The queue I'm talking
about above must exist even in a CPython version that has no threading
support (and hence no GIL). You still cannot call into Python from a
signal handler or other callback called directly by the OS kernel. You
must delay it until the bytecode interpreter is at a good stopping
point. Check out this code:
(AddPendingCall and friends).

>>> queuing to the same loop you need to synchronise access to the data structure. As soon
>>> as you get the state and result of a future non-atomically, you need synchronization. I
>>> don't doubt there are ways around this (CAS goes a long way, also the GIL will probably
>>> help, assuming it's all Python code), and the current implementation of Future is a bit on
>>> the heavy side (but also suitable for much more arbitrary uses), but I really believe that
>>> avoiding all locks is a bad idea.
>> I don't actually believe we should avoid all locks. I do believe that there should be a
>> separate mechanism, likely OS-specific, whereby the "pure" async world and the "messy"
>> threading world can hand off data to each other. It is probably unavoidable that the
>> implementation of this mechanism touches a threading lock. But this does not mean that the
>> rest of the "pure" world should need to use a Future class that touches threading locks.
> We can achieve this by making the implementation of Future a property of the scheduler. So rather than using 'concurrent.futures.Future' to construct a new future, it could be 'concurrent.eventloop.get_current().Future()'. This way a user can choose a non-thread safe event loop if they know they don't need one (though I guess users/libraries could use a thread-safe Future deliberately when they know that a thread will be involved). This adds another level of optimization on top of the 'get_future_for' function I've already suggested, and does it without exposing any complexity to the user.

Yes, this sounds fine. I note that the existing APIs already encourage
leaving the creation of the Future to library code -- you don't
construct a Future, typically, but call an executor's submit() method.
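
For reference, this is the existing concurrent.futures pattern: the
caller never constructs a Future, the executor's submit() returns one.
The work() function is a placeholder.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def work(x):
    return x + 1


with ThreadPoolExecutor(max_workers=2) as pool:
    # submit() creates and returns the Future; all jobs start right away.
    futures = [pool.submit(work, n) for n in range(3)]
    results = [f.result() for f in futures]
```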

>>> (Also, I don't consider cooperative multitasking to be "async" - async
>>> requires at least two simultaneous (or at least non-deterministically
>>> switching) tasks, whether these are CPU threads or hardware-controlled
>>> I/O.)
>> This sounds like a potentially fatal clash in terminology. In the way I use 'async',
>> Twisted, Tornado and gevent certainly qualify, and all those have huge parts of their API
>> where there is no non-deterministic switching in sight -- in fact, they all carefully
>> fence off the part that does interact with threads. For example, the Twisted folks have
>> argued that one of the big advantages of using Twisted's Deferred class is that while a
>> callback is running, the state of the world remains constant (except for actions made by
>> the callback itself, obviously).
>> What other term should we use to encompass this world view (which IMO is a perfectly valid
>> abstraction for a lot of I/O-related concurrency)?
> It depends on the significance of the callback. In my world view, the callback only ever schedules a task (or I sometimes use the word 'continuation') in the main loop. Because the callback could run anywhere, it needs to synchronise the queue, but the continuation is going to run synchronously anyway, so it does not require any locks. (I included the with_options(f, callback_context=None) function to allow the continuation to run wherever the callback does, which _would_ require synchronization, but it also requires an explicit declaration by the developer that they know what they are doing.)

Hm. I guess you are talking about the low-level (or should I say
OS-kernel-called) callback; most event frameworks for Python (except
perhaps gevent?) use user-level callbacks extensively -- in fact that's
where Twisted wants you to do all the work.

So, again a clash of terminology...

(Aside: please don't use 'continuation' for 'task'. The use of this
term in Scheme has forever tainted the word for me.)

>>>> IIUC you can do this on Windows with IOCP too, simply by only having
>>>> a single thread reading events.
>>> Yes, but unless you run all subsequent code on the IOCP thread (thereby blocking any
>>> more completions) you need to schedule it back to another thread. This requires
>>> synchronization.
>> It does sound like this may be unique to Windows, or at least not shared with most of the
>> UNIX world (UNIX ports of IOCP notwithstanding).
> IOCP looks like a solution to a problem that was so common they shared it with everyone (I don't say it _IS_ a solution, because I know nothing about its history and I have to be careful of anything I say being taken as fact). You can create threads in any OS to wait for blocking I/O, so it's probably most accurate to say it's unique to IOCP or threadpools in general. Again, it's an implementation detail that doesn't change the public API, which is required to execute continuations within the event loop.

So maybe IOCP is not all that relevant. Very early on in this
discussion, IOCP was brought up as an important example of a system
for async I/O that had a significantly *different* API than the
typical select/poll/etc.-based systems found on UNIX platforms. But
its relevance may well decompose into a few separable concerns:

- Don't assume everything is a file descriptor.

- On some systems, the natural way to do async I/O is *not* to wait
until the socket (or other event source) is ready, but to ask it to
perform a specific operation in "overlapping" (or async) mode, and you
will get an event back when it is done.

- Event queues are powerful.

- You cannot ignore threads everywhere.
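
To make the second point concrete, here is a sketch contrasting the two
styles on a local socket pair. The readiness version uses the real
selectors module. The completion version only emulates the "start the
operation, get an event when it is done" shape with a worker thread; it
is not IOCP, and read_overlapped is a made-up name.

```python
import selectors
import socket
from concurrent.futures import ThreadPoolExecutor


def read_when_ready(sock):
    # Readiness style (select/poll): wait until the socket is readable,
    # then perform the read yourself.
    sel = selectors.DefaultSelector()
    sel.register(sock, selectors.EVENT_READ)
    sel.select(timeout=5)
    sel.close()
    return sock.recv(1024)


def read_overlapped(sock, pool, on_done):
    # Completion style: start the read now and get called back with the
    # data once it has finished. Emulated here with a worker thread.
    job = pool.submit(sock.recv, 1024)
    job.add_done_callback(lambda f: on_done(f.result()))
    return job


a, b = socket.socketpair()

b.sendall(b"ping")
ready_data = read_when_ready(a)

events = []
with ThreadPoolExecutor(max_workers=1) as pool:
    job = read_overlapped(a, pool, events.append)
    b.sendall(b"pong")
    job.result(timeout=5)

a.close()
b.close()
```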

>>> However, as you pointed out earlier, if all you are trying to achieve is "pure"
>>> coroutines, then YF is perfectly appropriate. But this is because of the high level of
>>> cooperation required between the involved tasklets. As I understand it, coroutines gain me
>>> nothing once I call into a long OpenCV operation, because OpenCV does not know that it is
>>> supposed to yield occasionally (or substitute any library for OpenCV). Coroutines are
>>> great for within a program, but they don't extend so well into libraries, and certainly
>>> provide no compatibility with existing ones (whereas, at worst, I can always write "yield
>>> thread_pool_executor.queue(cv.do_something, params)" with @async with any existing library
>>> [except maybe a threading library... don't take that "any" too literally]).
>> I don't know what OpenCV is, but assuming it is something that doesn't know about YF, then
>> it needs to run in a thread of its own (or a threadpool). It is perfectly possible to add
>> a primitive operation to the YF scheduler that says "run this in a threadpool and wake me
>> up when it produces a result". The public API for that primitive can certainly use YF
>> itself -- the messy interface with threads can be completely hidden from view. IMO any YF
>> scheduler worth using for real work must provide such a primitive (it was one of the first
>> things I had to do in my own prototype, to be able to call socket.getaddrinfo()).
> Here's that violent agreement again :) I think this may be a difference of opinion on API design: with @async the user never needs to touch the scheduler directly. All they need are tools that are already in the standard library - threads and futures - and presumably the new set of *_async() functions we will add. The only new thing to learn is @async (and for advanced users, with_options() and YF, but having taught Python to classes of undergraduates I can guarantee that not everyone needs these).

But @async must be imported from *somewhere*, and that's where the
decisions are made on how the scheduler works. If you want to use a
different scheduler you still have to import a different @async.

(TBH I don't understand your with_options() thing. If that's how you
propose switching scheduler implementations, there's still a default
behavior that you'd have to change on a per-call basis.)

And about threads and futures: I am making a principled stance that
you shouldn't have to use threads, and you shouldn't have to use a
future implementation that's tied to threads. But maybe we should hear
from some Twisted folks...

--Guido van Rossum (

From drobinow at  Mon Oct 22 22:37:30 2012
From: drobinow at (David Robinow)
Date: Mon, 22 Oct 2012 16:37:30 -0400
Subject: [Python-ideas] Windows temporary file association for Python
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 22, 2012 at 6:51 AM, anatoly techtonik <techtonik at> wrote:
> I wonder if it will make the life easier if Python was installed with
> .py association to "%PYTHON_HOME%\python.exe" "%1" %*
> It will remove the need to run .py scripts in virtualenv with explicit
> 'python' prefix.
> Example how it doesn't work right now
> E:\virtenv32\Scripts>echo import sys; print(sys.version) >
> E:\virtenv32\Scripts>
> 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600 32 bit (Intel)]
> E:\virtenv32\Scripts>python
> 3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit (Intel)]
> If Python file association was specified with
> "%PYTHON_HOME%\python.exe" "%1" %* then virtualenv could override this
> variable when setting the environment to set correct executable for
> .py files.

I believe you can solve your problem with the PY_PYTHON environment
variable or the user's py.ini file.  See section 3.4.4 of the

From guido at  Mon Oct 22 22:58:14 2012
From: guido at (Guido van Rossum)
Date: Mon, 22 Oct 2012 13:58:14 -0700
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

Steve, I realize that continued point-by-point rebuttals probably are
getting pointless. Maybe your enthusiasm and energy would be better
spent trying to propose and implement (a prototype) of an API in the
style that you prefer? Maybe we can battle it out in code more
easily...

--Guido van Rossum (

From Steve.Dower at  Mon Oct 22 23:32:42 2012
From: Steve.Dower at (Steve Dower)
Date: Mon, 22 Oct 2012 21:32:42 +0000
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

Sounds good. I'll make some revisions to the code I posted earlier and come up with some comparable/benchmarkable examples.

Apart from the network server and client examples that have already been discussed, any particular problems I should be looking at solving with this? (Anyone?) I don't want to only come up with 'good' examples.

-----Original Message-----
From: gvanrossum at [mailto:gvanrossum at] On Behalf Of Guido van Rossum
Sent: Monday, October 22, 2012 1358
To: Steve Dower
Cc: python-ideas at
Subject: Re: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from)

Steve, I realize that continued point-by-point rebuttals probably are getting pointless. Maybe your enthusiasm and energy would be better spent trying to propose and implement (a prototype) of an API in the style that you prefer? Maybe we can battle it out in code more easily...

--Guido van Rossum (

From guido at  Mon Oct 22 23:41:55 2012
From: guido at (Guido van Rossum)
Date: Mon, 22 Oct 2012 14:41:55 -0700
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

On Mon, Oct 22, 2012 at 2:32 PM, Steve Dower <Steve.Dower at> wrote:
> Sounds good. I'll make some revisions to the code I posted earlier and come up with some comparable/benchmarkable examples.
> Apart from the network server and client examples that have already been discussed, any particular problems I should be looking at solving with this? (Anyone?) I don't want to only come up with 'good' examples.

I have a prototype implementing an async web client that fetches a
page given a URL. Primitives I have in mind include running several of
these concurrently and waiting for the first to come up with a result,
or waiting for all results, or getting the results as they are ready.
I have an event loop that can use select, poll, epoll, and kqueue
(though I've only lightly tested it, on Linux and OSX, so I'm sure
I've missed some corner cases and optimization possibilities). The
fetcher calls socket.getaddrinfo() in a threadpool.
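
As a rough illustration of the three primitives mentioned above (first
result, all results, results as they become ready), here is a sketch that
uses only concurrent.futures from the standard library. It is not the
prototype described in this message: fake_fetch, the URLs and the delays
are made-up stand-ins for a real page fetcher, and threads are used
instead of an event loop.

```python
import time
from concurrent.futures import (
    ALL_COMPLETED,
    FIRST_COMPLETED,
    ThreadPoolExecutor,
    as_completed,
    wait,
)


def fake_fetch(url, delay):
    """Hypothetical stand-in for a page fetch: sleep, then return a body."""
    time.sleep(delay)
    return "body of " + url


# Hypothetical URLs mapped to simulated latencies in seconds.
JOBS = {
    "http://a.example": 0.5,
    "http://b.example": 0.05,
    "http://c.example": 0.25,
}


def submit_all(pool):
    """Start every fetch and map each future back to its URL."""
    return {pool.submit(fake_fetch, url, delay): url
            for url, delay in JOBS.items()}


with ThreadPoolExecutor(max_workers=len(JOBS)) as pool:
    # 1. Wait until at least one fetch has produced a result.
    futures = submit_all(pool)
    done, pending = wait(futures, return_when=FIRST_COMPLETED)
    first_urls = {futures[f] for f in done}

    # 2. Wait for all fetches to finish.
    wait(futures, return_when=ALL_COMPLETED)
    all_results = sorted(f.result() for f in futures)

    # 3. Consume results in the order in which they complete.
    futures = submit_all(pool)
    completion_order = [futures[f] for f in as_completed(futures)]
```

The same three shapes (first, all, as ready) are what an event-loop
based API would need to offer for tasks instead of threads.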

--Guido van Rossum (

From greg.ewing at  Tue Oct 23 00:09:41 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 23 Oct 2012 11:09:41 +1300
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:
> On Mon, Oct 22, 2012 at 8:55 AM, Steve Dower <Steve.Dower at> wrote:

>> Yes, but unless you run all subsequent code on the IOCP thread (thereby
>> blocking any more completions) you need to schedule it back to another thread.
>> This requires synchronization.

I think there's an assumption behind this whole async tasks discussion
that the tasks being scheduled are I/O bound. We're trying to overlap
CPU activity with I/O, and different I/O activities with each other.
We're *not* trying to achieve concurrency of CPU-bound tasks -- the
GIL prevents that anyway for pure Python code.

The whole Windows IOCP thing, on the other hand, seems to be geared
towards having a pool of threads, any of which can handle any I/O
operation. That's not helpful for us; when one of our tasks blocks
waiting for I/O, the completion of that I/O must wake up *that particular
task*, and it must be run using the same OS thread that was running
it before.

I gather that Windows provides a way of making an async I/O request
and specifying a callback for that request. If that's the case, do
we need to bother with an IOCP at all? Just have the callback wake
up the associated task directly.


From guido at  Tue Oct 23 00:30:46 2012
From: guido at (Guido van Rossum)
Date: Mon, 22 Oct 2012 15:30:46 -0700
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 22, 2012 at 3:09 PM, Greg Ewing <greg.ewing at> wrote:
> I think there's an assumption behind this whole async tasks discussion
> that the tasks being scheduled are I/O bound. We're trying to overlap
> CPU activity with I/O, and different I/O activities with each other.
> We're *not* trying to achieve concurrency of CPU-bound tasks -- the
> GIL prevents that anyway for pure Python code.

Right. Of course.

> The whole Windows IOCP thing, on the other hand, seems to be geared
> towards having a pool of threads, any of which can handle any I/O
> operation. That's not helpful for us; when one of our tasks blocks
> waiting for I/O, the completion of that I/O must wake up *that particular
> task*, and it must be run using the same OS thread that was running
> it before.

The reason we can't ignore IOCP is that it is apparently the *only*
way to do async I/O in a scalable way. The only other polling
primitive available is select() which does not scale. (Or so it is
asserted by many folks; I haven't tested this, but I believe the
argument against select() scaling in general.)

> I gather that Windows provides a way of making an async I/O request
> and specifying a callback for that request. If that's the case, do
> we need to bother with an IOCP at all? Just have the callback wake
> up the associated task directly.

AFAICT the way to do that goes through IOCP...

--Guido van Rossum (

From greg.ewing at  Tue Oct 23 00:33:00 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 23 Oct 2012 11:33:00 +1300
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:

> (Aside: please don't use 'continuation' for 'task'. The use of this
> term in Scheme has forever tainted the word for me.)

It has a broader meaning than the one in Scheme; essentially
it's a synonym for "callback".

I agree it shouldn't be used as a synonym for "task", though.
In any of its forms, a continuation isn't an entire task, it's
something that you call to cause the resumption of a task
from a particular suspension point.


From Steve.Dower at  Tue Oct 23 00:31:10 2012
From: Steve.Dower at (Steve Dower)
Date: Mon, 22 Oct 2012 22:31:10 +0000
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
Message-ID: <>

>>> Yes, but unless you run all subsequent code on the IOCP thread
>>> (thereby blocking any more completions) you need to schedule it back to another thread.
>>> This requires synchronization.
> I think there's an assumption behind this whole async tasks discussion that the tasks
> being scheduled are I/O bound. We're trying to overlap CPU activity with I/O, and
> different I/O activities with each other.
> We're *not* trying to achieve concurrency of CPU-bound tasks -- the GIL prevents that
> anyway for pure Python code.

Sure, but it's easy enough to slip it in for (nearly) free. The only other option is complete exclusion of CPU-bound concurrency, which also rules out running C functions (outside the GIL) on a separate thread.

> The whole Windows IOCP thing, on the other hand, seems to be geared towards having a pool
> of threads, any of which can handle any I/O operation. That's not helpful for us; when one
> of our tasks blocks waiting for I/O, the completion of that I/O must wake up *that
> particular task*, and it must be run using the same OS thread that was running it before.
> I gather that Windows provides a way of making an async I/O request and specifying a
> callback for that request. If that's the case, do we need to bother with an IOCP at all?
> Just have the callback wake up the associated task directly.

IOCP is probably not useful at all, and as Guido said, it was brought up as an example of a non-select style of waiting. APIs like ReadFileEx/WriteFileEx let you provide the callback directly without using IOCP. In any case, even if we did use IOCP it would be an implementation detail and would not affect how the API is exposed.

(Also, love your work on PEP 380. Despite my hesitation about using yield from for this API, I do really like using it with generators.)


From guido at  Tue Oct 23 00:35:12 2012
From: guido at (Guido van Rossum)
Date: Mon, 22 Oct 2012 15:35:12 -0700
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 22, 2012 at 3:33 PM, Greg Ewing <greg.ewing at> wrote:
> Guido van Rossum wrote:
>> (Aside: please don't use 'continuation' for 'task'. The use of this
>> term in Scheme has forever tainted the word for me.)
> It has a broader meaning than the one in Scheme; essentially
> it's a synonym for "callback".

(Off-topic:) But does that meaning apply to Scheme? If so, I wish
someone would have told me 15 years ago...

> I agree it shouldn't be used as a synonym for "task", though.
> In any of its forms, a continuation isn't an entire task, it's
> something that you call to cause the resumption of a task
> from a particular suspension point.

I guess that was just Steve showing off. :-)

--Guido van Rossum (

From Steve.Dower at  Tue Oct 23 00:49:40 2012
From: Steve.Dower at (Steve Dower)
Date: Mon, 22 Oct 2012 22:49:40 +0000
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
Message-ID: <>

Alertable I/O (<>) and overlapped I/O are two alternatives to IOCP on Windows.

>> I agree [continuation] shouldn't be used as a synonym for "task", though.
>> In any of its forms, a continuation isn't an entire task, it's
>> something that you call to cause the resumption of a task
>> from a particular suspension point.
> I guess that was just Steve showing off. :-)

Not intentionally - the team here that did async/await in C# talks a lot about "continuation-passing style", which is where I picked the term up from. I don't use it as a synonym for "task" - it's always meant the "bit that runs after we come back from the yield" (hmm... I think that definition needs some work...).

From guido at  Tue Oct 23 00:56:48 2012
From: guido at (Guido van Rossum)
Date: Mon, 22 Oct 2012 15:56:48 -0700
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 22, 2012 at 3:49 PM, Steve Dower <Steve.Dower at> wrote:
> Alertable I/O (<>) and overlapped I/O are two alternatives to IOCP on Windows.
>>> I agree [continuation] shouldn't be used as a synonym for "task", though.
>>> In any of its forms, a continuation isn't an entire task, it's
>>> something that you call to cause the resumption of a task
>>> from a particular suspension point.
>> I guess that was just Steve showing off. :-)
> Not intentionally - the team here that did async/await in C# talks a lot about "continuation-passing style", which is where I picked the term up from. I don't use it as a synonym for "task" - it's always meant the "bit that runs after we come back from the yield" (hmm... I think that definition needs some work...).

Yeah, I have the same terminology hang-up with the term
"continuation-passing-style" for web callbacks.

Reading back what you wrote, you were indeed trying to distinguish
between the "callback" (which you consider the thing that's directly
invoked by the OS) and "the rest of the task" (e.g. the code that runs
when the yield is resumed), which you were referring to as
"continuation". I'd just use "the rest of the task" here.

--Guido van Rossum (

From greg.ewing at  Tue Oct 23 01:04:57 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 23 Oct 2012 12:04:57 +1300
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:

> The reason we can't ignore IOCP is that it is apparently the *only*
> way to do async I/O in a scalable way. The only other polling
> primitive available is select() which does not scale.

There seems to be an alternative to polling, though. There are
functions called ReadFileEx and WriteFileEx that allow you to
pass in a routine to be called when the operation completes:

Is there some reason that this doesn't scale either?


From guido at  Tue Oct 23 01:09:28 2012
From: guido at (Guido van Rossum)
Date: Mon, 22 Oct 2012 16:09:28 -0700
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 22, 2012 at 4:04 PM, Greg Ewing <greg.ewing at> wrote:
> Guido van Rossum wrote:
>> The reason we can't ignore IOCP is that it is apparently the *only*
>> way to do async I/O in a scalable way. The only other polling
>> primitive available is select() which does not scale.
> There seems to be an alternative to polling, though. There are
> functions called ReadFileEx and WriteFileEx that allow you to
> pass in a routine to be called when the operation completes:
> Is there some reason that this doesn't scale either?

I don't know, we've reached territory I don't know at all. Are there
also similar calls for Accept() and Connect() on sockets? Those seem
the other major blocking primitives that are frequently used.

FWIW, here is where I read about IOCP being the only scalable way on
Windows:
--Guido van Rossum (

From thoover at  Tue Oct 23 01:36:35 2012
From: thoover at (Tom Hoover)
Date: Mon, 22 Oct 2012 16:36:35 -0700
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 22, 2012 at 4:09 PM, Guido van Rossum <guido at> wrote:
> On Mon, Oct 22, 2012 at 4:04 PM, Greg Ewing <greg.ewing at> wrote:
> > Guido van Rossum wrote:
> >
> >> The reason we can't ignore IOCP is that it is apparently the *only*
> >> way to do async I/O in a scalable way. The only other polling
> >> primitive available is select() which does not scale.
> >
> >
> > There seems to be an alternative to polling, though. There are
> > functions called ReadFileEx and WriteFileEx that allow you to
> > pass in a routine to be called when the operation completes:
> >
> >
> >
> >
> > Is there some reason that this doesn't scale either?
> I don't know, we've reached territory I don't know at all. Are there
> also similar calls for Accept() and Connect() on sockets? Those seem
> the other major blocking primitives that are frequently used.
> FWIW, here is where I read about IOCP being the only scalable way on
> Windows:

It's been years since I've looked at this stuff, but I believe that
you want to use AcceptEx and ConnectEx in conjunction with IOCP.

event_iocp.c and listener.c in libevent 2.0.x could help shed some
light on the details.

From greg.ewing at  Tue Oct 23 01:48:39 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 23 Oct 2012 12:48:39 +1300
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:
> On Mon, Oct 22, 2012 at 3:33 PM, Greg Ewing <greg.ewing at> wrote:
>>It has a broader meaning than the one in Scheme; essentially
>>it's a synonym for "callback".
> (Off-topic:) But does that meaning apply to Scheme? If so, I wish
> someone would have told me 15 years ago...

It does, in the sense that a continuation appears to the
Scheme programmer as a callable object.

The connection goes deeper as well. There's a style of
programming called "continuation-passing style", in which
nothing ever returns -- every function is passed another
function to be called with its result. In a language such
as Scheme that supports tail calls, you can use this style
extensively without fear of overflowing the call stack.

You're using this style whenever you chain callbacks
together using Futures or Deferreds. The callbacks don't
return values; instead, each callback arranges for another
callback to be called, passing it the result.
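
To make the idea concrete, here is a small sketch of continuation-passing
style in Python. The function names are invented for illustration. None
of the functions returns a useful value; each one hands its result to the
callback k that it was given.

```python
def add_cps(a, b, k):
    """Compute a + b and pass the result to the continuation k."""
    k(a + b)


def square_cps(x, k):
    """Compute x * x and pass the result to the continuation k."""
    k(x * x)


def sum_of_squares_cps(a, b, k):
    """Compute a*a + b*b by chaining continuations instead of returning."""
    square_cps(a, lambda a2:
        square_cps(b, lambda b2:
            add_cps(a2, b2, k)))


results = []
# The final continuation simply records the value it is called with.
sum_of_squares_cps(3, 4, results.append)
print(results)  # [25]
```

Note that Python does not eliminate tail calls, so a long chain written
this way consumes stack, unlike the Scheme case described above.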

This is also the way monadic I/O works in Haskell. None
of the I/O functions ever return, they just call another
function and pass it the result. A combination of currying
and syntactic sugar is used to hide the fact that you're
passing callbacks -- aka continuations -- around all
over the place.

Now, it turns out that you can define all the semantics
of Scheme, including its continuations, by writing a Scheme
interpreter in Scheme that doesn't itself use Scheme
continuations. You do it by writing the whole interpreter
in continuation-passing style, and it becomes clear that
at that level, the "continuations" are just ordinary
functions, relying on lexical scoping to capture all of the
necessary state.

> I guess that was just Steve showing off. :-)

Not really -- to someone with a Scheme or FP background,
it's near-impossible to look at something like a chain
of Deferred callbacks without the word "continuation"
springing to mind. I agree that it's not helpful to
anyone without such a background, however.


From guido at  Tue Oct 23 01:54:41 2012
From: guido at (Guido van Rossum)
Date: Mon, 22 Oct 2012 16:54:41 -0700
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
Message-ID: <>

And, predictably, that gave me a headache... :-)

--Guido van Rossum (sent from Android phone)
On Oct 22, 2012 4:49 PM, "Greg Ewing" <greg.ewing at> wrote:

> Guido van Rossum wrote:
>> On Mon, Oct 22, 2012 at 3:33 PM, Greg Ewing <greg.ewing at>
>> wrote:
>>  It has a broader meaning than the one in Scheme; essentially
>>> it's a synonym for "callback".
>> (Off-topic:) But does that meaning apply to Scheme? If so, I wish
>> someone would have told me 15 years ago...
> It does, in the sense that a continuation appears to the
> Scheme programmer as a callable object.
> The connection goes deeper as well. There's a style of
> programming called "continuation-passing style", in which
> nothing ever returns -- every function is passed another
> function to be called with its result. In a language such
> as Scheme that supports tail calls, you can use this style
> extensively without fear of overflowing the call stack.
> You're using this style whenever you chain callbacks
> together using Futures or Deferreds. The callbacks don't
> return values; instead, each callback arranges for another
> callback to be called, passing it the result.
> This is also the way monadic I/O works in Haskell. None
> of the I/O functions ever return, they just call another
> function and pass it the result. A combination of currying
> and syntactic sugar is used to hide the fact that you're
> passing callbacks -- aka continuations -- around all
> over the place.
> Now, it turns out that you can define all the semantics
> of Scheme, including its continuations, by writing a Scheme
> interpreter in Scheme that doesn't itself use Scheme
> continuations. You do it by writing the whole interpreter
> in continuation-passing style, and it becomes clear that
> at that level, the "continuations" are just ordinary
> functions, relying on lexical scoping to capture all of the
> necessary state.
>  I guess that was just Steve showing off. :-)
> Not really -- to someone with a Scheme or FP background,
> it's near-impossible to look at something like a chain
> of Deferred callbacks without the word "continuation"
> springing to mind. I agree that it's not helpful to
> anyone without such a background, however.
> --
> Greg

From greg.ewing at  Tue Oct 23 02:07:01 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 23 Oct 2012 13:07:01 +1300
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:
> And, predictably, that gave me a headache... :-)

Oops, sorry, Guido -- I shouldn't have mentioned
the M-word. :-)


From andrew.robert.moffat at  Tue Oct 23 04:52:59 2012
From: andrew.robert.moffat at (Andrew Moffat)
Date: Mon, 22 Oct 2012 21:52:59 -0500
Subject: [Python-ideas] Interest in seeing in the stdlib
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 22, 2012 at 9:52 AM, Jasper St. Pierre <jstpierre at> wrote:

> On Sat, Oct 20, 2012 at 8:33 PM, Andrew Moffat
> <andrew.robert.moffat at> wrote:
> > Hi,
> >
> > I'm the author of, an intuitive interface for launching subprocesses
> > in Linux and OSX  It has been maintained on
> > github for about 10 months and currently has
> > about 25k installs, according to
> > (,
> >
> > Andy Grover maintains the Fedora rpm for
> >  and Nick
> > Moffit has submitted an older version of (which was called pbs) to be
> > included in Debian distros
> >
> > I'm interested in making more accessible to help bring Python forward
> > in the area of shell scripting, so I'm interested in seeing if sh would be
> > suitable for the standard library.  Is there any other interest in something
> > like this?
> I'm not one for the sugar. Seems like you're stuffing the Python
> syntax where it doesn't quite belong, as evidenced by the many escape
> hatches. Basic query of things not covered in the documentation:
> If I import a nonexistent program, will it give me back a function
> that will fail or raise an ImportError?
> How do I run a program with a - in the name? You say you replace -
> with _, but that doesn't specify what happens in the edge case of "if I
> have google-chrome and google_chrome, which one wins? What about
> /usr/bin/google-chrome and /usr/local/bin/google_chrome"? That is,
> will it exhaust the PATH before trying fallbacks replacements or will
> it check all replacements at once?
> If I have a program that's not on PATH, what do I do? I can manipulate
> the PATH environment variable, but am I guaranteed that will work? Are
> you going to double fork forever to guarantee that environment? Can I
> build a custom prefix, like p =
> sh.MagicPrefix(path="/opt/android_devtools/bin"), and have that work
> like the regular sh module? p.gcc("whatever") ? Even with the
> existence of a regular gcc in the path?
> I wonder what happens if you do from sh import *.
> Does it block execution before continuing? How can I do parallel
> execution of four subprocesses, and get notified when all four are
> done? (Seems like this might be a thing for a Future as well, even in
> the absence of any scheduler or event loop).
> Are newcomers going to be confused by this? What happens if I try and
> do something like"-l -a")? Will you use the POSIX shell parsing
> algorithm, pass it to bash, or pass it as one parameter? Will some
> form of injection attack be mitigated by this design?
> If you see this magic syntax as your one unique feature, I'd propose
> that you add it to the subprocess module, and improve the standard
> subprocess module's interface to cope with the new feature.
> But I don't see this as a worthwhile thing to have. -1 on the thing.
> > Thanks
> >
> --
>   Jasper

Hi Jasper, thanks for your questions

> If I import a nonexistent program, will it give me back a function
> that will fail or raise an ImportError?

Yes, an exception will be raised

> How do I run a program with a - in the name? You say you replace -
> with _, but that doesn't specify what happens in the edge case of "if I
> have google-chrome and google_chrome, which one wins? What about
> /usr/bin/google-chrome and /usr/local/bin/google_chrome"? That is,
> will it exhaust the PATH before trying fallbacks replacements or will
> it check all replacements at once?

The full PATH will be exhausted for the exact command, as typed, before any
kind of "-" replacement is exercised.  There hasn't been much concern about
this because most people who want to call commands with special characters
prefer to use the Command class (e.g. chrome =
Command("/usr/bin/google-chrome")), and the documentation makes a note of
this.
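
The lookup order described above (the exact name across the whole PATH
first, and only then the "-" substitution) can be sketched with
shutil.which. This only illustrates the described behaviour and is not
the module's actual code; resolve_program and make_executable are
hypothetical helpers, and the example assumes a POSIX-like system.

```python
import os
import shutil
import stat
import tempfile


def resolve_program(name, path=None):
    """Hypothetical helper: exact name on every PATH entry, then '_' -> '-'."""
    found = shutil.which(name, path=path)
    if found is None and "_" in name:
        found = shutil.which(name.replace("_", "-"), path=path)
    return found


def make_executable(directory, name):
    """Create a tiny executable file so that shutil.which can find it."""
    full = os.path.join(directory, name)
    with open(full, "w") as f:
        f.write("#!/bin/sh\n")
    os.chmod(full, os.stat(full).st_mode | stat.S_IXUSR)
    return full


with tempfile.TemporaryDirectory() as first, \
        tempfile.TemporaryDirectory() as second:
    make_executable(first, "google-chrome")
    make_executable(second, "google_chrome")
    search_path = os.pathsep.join([first, second])

    # The exact name wins even though the dashed variant comes earlier.
    exact_hit = resolve_program("google_chrome", path=search_path)
    # With only the dashed program available, the fallback is used.
    fallback_hit = resolve_program("google_chrome", path=first)
```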

> If I have a program that's not on PATH, what do I do? I can manipulate
> the PATH environment variable, but am I guaranteed that will work? Are
> you going to double fork forever to guarantee that environment? Can I
> build a custom prefix, like p =
> sh.MagicPrefix(path="/opt/android_devtools/bin"), and have that work
> like the regular sh module? p.gcc("whatever") ? Even with the
> existence of a regular gcc in the path?

You could manipulate the PATH, but a better way would be to use the Command
class, which can take a full path of a command.  The returned object can be
used just like other commands.

> I wonder what happens if you do from sh import *.

"ImportError: Cannot import * from sh. Please import sh or import programs
individually."  Commands are lazily resolved on sh anyway, so loading from
all would be undefined.

> Does it block execution before continuing? How can I do parallel
> execution of four subprocesses, and get notified when all four are
> done? (Seems like this might be a thing for a Future as well, even in
> the absence of any scheduler or event loop).

Commands may be run in the background, like this:

job1 = sh.tar("-zc", "-f", "archive-name.tar.gz", "/some/directory",
              _bg=True)
job2 = sh.tar(..., _bg=True)
job3 = sh.tar(..., _bg=True)
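
For comparison, the "start several jobs, then wait for all of them"
pattern can be sketched with the standard subprocess module alone. This
is a stand-in for the _bg=True calls above, not their implementation; the
child command is simply the Python interpreter sleeping briefly.

```python
import subprocess
import sys

# Start three child processes without waiting for any of them.
jobs = [
    subprocess.Popen([sys.executable, "-c", "import time; time.sleep(0.2)"])
    for _ in range(3)
]

# Block until every job has finished, collecting the exit codes.
exit_codes = [job.wait() for job in jobs]
```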


> Are newcomers going to be confused by this? What happens if I try and
> do something like"-l -a")? Will you use the POSIX shell parsing
> algorithm, pass it to bash, or pass it as one parameter? Will some
> form of injection attack be mitigated by this design?

Neither Bash nor any other shell is called into play. doesn't do any argument
parsing either.  Arguments are passed into commands exactly as they're sent
through  Newcomers have loved it so far, and after seeing some
examples, there's been minimal confusion about how to use it.
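
The same no-shell behaviour can be observed with the standard subprocess
module, used here as a stand-in because it also hands an argument list
straight to the program. In this sketch the child is the Python
interpreter itself, which just prints the arguments it received; run_args
is a name invented for the example.

```python
import subprocess
import sys

# Child program: print each received argument on its own line.
CHILD = "import sys; print('\\n'.join(sys.argv[1:]))"


def run_args(*args):
    """Run the child without a shell and return the arguments it saw."""
    out = subprocess.check_output([sys.executable, "-c", CHILD] + list(args))
    return out.decode().splitlines()


# "-l -a" arrives as ONE argument: nothing splits it on whitespace.
one_arg = run_args("-l -a")

# Shell metacharacters are not interpreted either.
no_injection = run_args("; echo pwned", "$HOME")
```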

My thoughts about the magical-ness of  I typically don't support
very magical or dynamically resolving modules.  I can be a little
apprehensive of ORMs for this reason... I like to know how my code behaves
explicitly.  I think clever/magic can be confusing for people and
inexplicit, and it's important to know more or less what's going on under
the hood.  But I also think that scratches an itch that a less
dynamic approach fails to reach.  My goal for has been to make
writing system scripts as easy for Python as it is for Bash.  People who
write Bash scripts do so for a few reasons, one being that a shell script
is pretty portable for *nix systems, and also because it's very easy to
call programs with arguments, feed them input, and parse their output.  But
the shortcoming of shell scripts is how obfuscated and unnecessarily
difficult they make generic programming tasks.  This is where Python
excels.  But unfortunately, until, I have not found any tool that lets you
call commands nearly as easily as Bash, that didn't rely on Bash.
Subprocess is painful.  Other
modules are extremely verbose., yes, uses a dynamic lookup mechanism
(but it should be noted that you don't have to rely on it *at all* if you
don't like it), but it does so with a very specific intention, and that is
to make Python more suited and commonplace for writing system shell scripts.

My push here to see if there is support is because I believe if could
enter the stdlib, and therefore become more ubiquitous on Linux and OSX,
that more shell-style scripts could be written in Python, more new users
would be comfortable in using Python, and Bash scripts could go the way of
the dodo :)


From dinov at  Tue Oct 23 06:46:43 2012
From: dinov at (Dino Viehland)
Date: Tue, 23 Oct 2012 04:46:43 +0000
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
Message-ID: <>

Greg wrote:
> Guido van Rossum wrote:
> > The reason we can't ignore IOCP is that it is apparently the *only*
> > way to do async I/O in a scalable way. The only other polling
> > primitive available is select() which does not scale.
> There seems to be an alternative to polling, though. There are functions called
> ReadFileEx and WriteFileEx that allow you to pass in a routine to be called when
> the operation completes:
> us/library/windows/desktop/aa365468%28v=vs.85%29.aspx
> us/library/windows/desktop/aa365748%28v=vs.85%29.aspx
> Is there some reason that this doesn't scale either?

I suspect it's because the completion routine is invoked on the same
thread that issued the I/O.  The thread has to first block in an alertable wait (e.g. 
WaitForMultipleObjectsEx or WSAWaitForMultipleEvents).  So you'll only get 1 
thread doing I/Os and CPU work vs IOCP's where many threads can share both 

From ericsnowcurrently at  Tue Oct 23 07:07:17 2012
From: ericsnowcurrently at (Eric Snow)
Date: Mon, 22 Oct 2012 23:07:17 -0600
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 22, 2012 at 9:55 AM, Steve Dower <Steve.Dower at> wrote:
> I think the abstract for PEP 380 sums it up pretty well: "A syntax is proposed for a generator to
> delegate part of its operations to another generator." Using 'yield from' (YF, for convenience)
> requires (a) that the caller is a generator and (b) that the callee is a generator.

Rather, the callee must be some iterable:

  def f():
      yield from [1, 2, 3]

  for x in f():
      print(x)


From jimjjewett at  Tue Oct 23 09:17:01 2012
From: jimjjewett at (Jim Jewett)
Date: Tue, 23 Oct 2012 03:17:01 -0400
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

On 10/19/12, Guido van Rossum <guido at> wrote:

> I did a basic timing test using a simple recursive function and a
> recursive PEP-380 coroutine computing the same value (see attachment).
> The coroutine version is a little over twice as slow as the function
> version. I find that acceptable. This went 20 deep, making 2 recursive
> calls at each level (except at the deepest level).

Note that the co-routine code (copied below) does not involve a
scheduler that unwraps futures; there is no scheduler, and nothing
runs concurrently.

    def coroutine(n):
        if n <= 0:
            return 1
        l = yield from coroutine(n-1)
        r = yield from coroutine(n-1)
        return l + 1 + r

I like the above code; my concern was that yield might get co-opted
for use with scheduler loops, which would have to track the parent
task explicitly, and prevent it from being rescheduled too early.
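
To make the "no scheduler" point concrete, here is a minimal sketch of driving that coroutine by hand. The run() helper is a hypothetical name, not part of the attachment, and it assumes Python 3.3+ where a generator's return value travels on StopIteration:

```python
def coroutine(n):
    if n <= 0:
        return 1
    l = yield from coroutine(n - 1)
    r = yield from coroutine(n - 1)
    return l + 1 + r


def run(gen):
    """Hypothetical driver: step the generator until it finishes."""
    try:
        while True:
            next(gen)  # this coroutine never actually yields a value
    except StopIteration as stop:
        # PEP 380: the generator's return value arrives on StopIteration
        return stop.value


print(run(coroutine(3)))
```

Since nothing in the call tree contains a bare yield, the very first next() runs the whole computation to completion; the caller is the only "scheduler" involved.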


From benoitc at  Tue Oct 23 09:19:59 2012
From: benoitc at (Benoit Chesneau)
Date: Tue, 23 Oct 2012 09:19:59 +0200
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
	API of the future: yield-from)
In-Reply-To: <>
References: <>
Message-ID: <>

On Oct 22, 2012, at 4:59 PM, Guido van Rossum <guido at> wrote:

> On Sun, Oct 21, 2012 at 10:30 PM, Steve Dower <Steve.Dower at> wrote:
> [Stuff about Futures and threads]
> Personally, I'm interested in designing a system, including an event
> loop, where you can rely on the properties of cooperative scheduling
> to avoid ever touching (OS) threading locks. I think such a system
> should be "pure" and all interaction with threads should be mediated
> by the event loop. (It's okay if this means that the implementation of
> the event loop must at some point acquire a threading lock.) The
> Futures used by the tasks to coordinate amongst themselves should not
> require locking -- they should themselves be able to rely on the
> guarantees of the event loop not to invoke multiple callbacks in
> parallel.
> IIUC you can do this on Windows with IOCP too, simply by only having a
> single thread reading events.

Maybe it is worth having a look at libuv and the way it mixes threads and an event loop [1]. Libuv is one of the good event loops around, able to use IOCP and other event systems on other platforms (kqueue, ...), and I was thinking while reading this whole exchange that it would fit our case perfectly. Or at least something like it:

- It provides a common API for IO watchers: read, write, writelines, readable, writeable, which can probably be extended to remote systems
- It has a job queue system for threads that works mostly like Futures but uses the event loop

In any case there is a pyuv binding [2] if anyone wants to test it. There is even a Twisted reactor [3].

I am myself toying with the idea of porting the Go concurrency model to Python [4] using greenlets and pyuv, both the scheduler and the way I/O is handled:

- In Go all coroutines are independent of each other and can only communicate via channels. This has the advantage of allowing them to run on different threads when one is blocking. In the normal case they mostly work like greenlets on a single thread and are simply scheduled in a round-robin way (mostly like in Stackless). The difference is that goroutines can be executed in parallel: when one is blocking, another thread will be created to handle the other goroutines in the runnable queue.

- For I/O there is a common API for all Connections and Listeners (Conn & Listen classes) that generally ask a poll server. This poll server's only task is to register FDs and wake up the goroutines that wait on read or fd events. Since this poll server runs in a blocking loop, it is automatically placed by the scheduler in its own thread. This poll server could likely be replaced by an event loop if someone wanted.

In my opinion the Go concurrency & memory model [5] could fit perfectly in the Python world and I'm surprised no one has already spoken about it.
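
As a rough illustration of the "communicate only via channels" idea, here is a minimal sketch that uses plain threads and queue.Queue as stand-ins for goroutines and a channel. This is an assumption made for the example only: it is not how Go or flower implement it, and a Queue with maxsize=1 is closer to a buffered channel of capacity one than to an unbuffered Go channel:

```python
import queue
import threading


def producer(channel, items):
    for item in items:
        channel.put(item)   # blocks while the channel is full
    channel.put(None)       # sentinel: nothing more to send


def consumer(channel, out):
    while True:
        item = channel.get()  # blocks until something is available
        if item is None:
            break
        out.append(item * item)


channel = queue.Queue(maxsize=1)  # the only object the two workers share
squares = []
workers = [
    threading.Thread(target=producer, args=(channel, [1, 2, 3])),
    threading.Thread(target=consumer, args=(channel, squares)),
]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(squares)  # [1, 4, 9]
```

Neither worker touches the other's state; all coordination goes through the channel.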

In flower, greenlets could probably be replaced by generators, but I like the API offered by any coroutine pattern. I wonder if continulets [6] couldn't be ported to CPython to handle that?

- benoît

[1] &

From jimjjewett at  Tue Oct 23 09:34:58 2012
From: jimjjewett at (Jim Jewett)
Date: Tue, 23 Oct 2012 03:34:58 -0400
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
Message-ID: <>

On 10/21/12, Guido van Rossum <guido at> wrote:
> On Sun, Oct 21, 2012 at 1:07 PM, Steve Dower <Steve.Dower at>
> wrote:

>> It has synchronisation which is _aware_ of threads, but it never creates,
>> requires or uses them. It simply ensures thread-safe reentrancy, which
>> will be required for any general solution unless it is completely banned
>> from interacting across CPU threads.

> I don't see it that way. Any time you acquire a lock, you may be
> blocked for a long time. In a typical event loop that's an absolute
> no-no. Typically, to wait for another thread, you give the other
> thread a callback that adds a new event for *this* thread.

That (with or without rescheduling this thread to actually process the
event) is a perfectly reasonable solution, but I'm not sure how
obvious it is.  People willing to deal with the conventions and
contortions of twisted are likely to just use twisted.  A general API
should have a straightforward way to wait for a result; even
explicitly calling wait() may be too much to ask if you want to keep
assuming that other events will cooperate.
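
For readers who have not seen the pattern, here is a minimal sketch of "give the other thread a callback that adds a new event for this thread". The names events, slow_computation and run_loop are hypothetical and do not come from any proposed API; queue.Queue stands in for the event loop's thread-safe entry point:

```python
import queue
import threading

events = queue.Queue()  # the only object shared between the threads


def slow_computation(on_done):
    # Runs in the worker thread. It never calls on_done itself; it only
    # posts an event that the loop thread will execute later.
    result = sum(range(1000))
    events.put(lambda: on_done(result))


def run_loop(expected_events):
    # Hypothetical single-threaded loop: callbacks run one at a time here.
    for _ in range(expected_events):
        callback = events.get()
        callback()


results = []
worker = threading.Thread(target=slow_computation, args=(results.append,))
worker.start()
run_loop(1)
worker.join()
print(results)  # [499500]
```

The queue acquires a lock internally, but the callbacks themselves need none, because only the loop thread ever runs them.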

> Perhaps. Lots of possibilities in this design space.
>> (*I'm inclined to define this [the Future interface] as 'result()', 'done()',
>> 'add_done_callback()', 'exception()', 'set_result()' and 'set_exception()'
>> functions. Maybe more, but I think that's sufficient. The current
>> '_waiters' list is an optimisation for add_done_callback(),  and doesn't
>> need to be part of the interface.)

> Agreed. I don't see much use for the cancellation stuff and all the
> extra complexity that adds to the interface.

wait_for_any may well be launching different strategies to solve the
same problem, and intending to ignore all but the fastest.  It makes
sense to go ahead and cancel the slower strategies.  (That said, I
agree that the API shouldn't guarantee that other tasks are actually
cancelled, let alone that they are cancelled before side effects
occur.)
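
A minimal sketch of that "race several strategies, keep the fastest" idea, using the existing concurrent.futures module as a stand-in for whatever wait_for_any ends up being. The strategy() and first_result() helpers are hypothetical names for this example:

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def strategy(name, delay):
    time.sleep(delay)  # pretend to work on the problem
    return name


def first_result(executor, jobs):
    futures = [executor.submit(strategy, name, delay) for name, delay in jobs]
    done, pending = wait(futures, return_when=FIRST_COMPLETED)
    for future in pending:
        # Best effort only: cancel() returns False if the call is
        # already running, and the side effects still happen.
        future.cancel()
    return next(iter(done)).result()


with ThreadPoolExecutor(max_workers=2) as pool:
    winner = first_result(pool, [("fast", 0.01), ("slow", 0.3)])
print(winner)
```

Here both strategies are already running when the fast one finishes, so the cancel() call on the slow one has no effect, which is exactly why the API should not promise more.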


From guido at  Tue Oct 23 16:44:31 2012
From: guido at (Guido van Rossum)
Date: Tue, 23 Oct 2012 07:44:31 -0700
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
	<> <>
Message-ID: <>

On Tue, Oct 23, 2012 at 12:17 AM, Jim Jewett <jimjjewett at> wrote:
> On 10/19/12, Guido van Rossum <guido at> wrote:
>> I did a basic timing test using a simple recursive function and a
>> recursive PEP-380 coroutine computing the same value (see attachment).
>> The coroutine version is a little over twice as slow as the function
>> version. I find that acceptable. This went 20 deep, making 2 recursive
>> calls at each level (except at the deepest level).
> Note that the co-routine code (copied below) does not involve a
> scheduler that unwraps futures; there is no scheduler, and nothing
> runs concurrently.
>     def coroutine(n):
>         if n <= 0:
>             return 1
>         l = yield from coroutine(n-1)
>         r = yield from coroutine(n-1)
>         return l + 1 + r
> I like the above code; my concern was that yield might get co-opted
> for use with scheduler loops, which would have to track the parent
> task explicitly, and prevent it from being rescheduled too early.

Don't worry. There is no way that a scheduler can change the meaning
of yield from. All its power stems from its ability to decide when to
call next(), and that is the same power that the app has itself.
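
A minimal sketch of that point: a toy round-robin scheduler whose only power is choosing which generator gets the next next() call. All names here (child, parent, run_round_robin) are made up for illustration, and Python 3.3+ is assumed:

```python
from collections import deque


def child(name, steps, log):
    for i in range(steps):
        log.append((name, i))
        yield  # hand control back to whoever called next()
    return steps


def parent(name, log):
    # 'yield from' means the same thing with or without a scheduler.
    total = yield from child(name, 2, log)
    total += yield from child(name, 1, log)
    return total


def run_round_robin(tasks):
    ready = deque(tasks.items())
    results = {}
    while ready:
        name, task = ready.popleft()
        try:
            next(task)  # the scheduler's only lever
        except StopIteration as stop:
            results[name] = stop.value
        else:
            ready.append((name, task))
    return results


log = []
results = run_round_robin({"a": parent("a", log), "b": parent("b", log)})
print(results)
```

The scheduler only ever sees the outermost generators; the delegation to child() happens entirely inside yield from.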

--Guido van Rossum (

From guido at  Tue Oct 23 16:48:36 2012
From: guido at (Guido van Rossum)
Date: Tue, 23 Oct 2012 07:48:36 -0700
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

Thanks for the pointer to and description of libuv; it had come up in
my research but so far I have not looked it up actively. Now I
will. Also thanks for your reminder of the Goroutine model -- this is
definitely something to look at for inspiration as well. (Though does
Go run on Windows? Or is it part of a secret anti-Microsoft plan? :-)



--Guido van Rossum (

From guido at  Tue Oct 23 16:54:46 2012
From: guido at (Guido van Rossum)
Date: Tue, 23 Oct 2012 07:54:46 -0700
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

On Tue, Oct 23, 2012 at 12:34 AM, Jim Jewett <jimjjewett at> wrote:
> On 10/21/12, Guido van Rossum <guido at> wrote:
>> On Sun, Oct 21, 2012 at 1:07 PM, Steve Dower <Steve.Dower at>
>> wrote:
>>> It has synchronisation which is _aware_ of threads, but it never creates,
>>> requires or uses them. It simply ensures thread-safe reentrancy, which
>>> will be required for any general solution unless it is completely banned
>>> from interacting across CPU threads.
>> I don't see it that way. Any time you acquire a lock, you may be
>> blocked for a long time. In a typical event loop that's an absolute
>> no-no. Typically, to wait for another thread, you give the other
>> thread a callback that adds a new event for *this* thread.
> That (with or without rescheduling this thread to actually process the
> event) is a perfectly reasonable solution, but I'm not sure how
> obvious it is.  People willing to deal with the conventions and
> contortions of twisted are likely to just use twisted.

I think part of my point is that we can package all this up in a way
that is a lot less scary than Twisted's reputation. And remember,
there are many other frameworks that use similar machinery. There's
Tornado, Monocle (which runs on top of Tornado *or* Twisted), and of
course the stdlib's asyncore, which is antiquated but still much used
-- AFAIK Zope is still built around it.

> A general API
> should have a straightforward way to wait for a result; even
> explicitly calling wait() may be too much to ask if you want to keep
> assuming that other events will cooperate.

Here I have some real world relevant experience: NDB, App Engine's new
Datastore API (which I wrote). It is async under the hood (yield + its
own flavor of Futures), and users who want the most performance from
their app are encouraged to use the async APIs directly -- but users
who don't care can ignore their existence completely. There are
thousands of users, and I've seen people explain the async stuff to
each other on StackOverflow, so I think it is quite accessible.

>> Agreed. I don't see much use for the cancellation stuff and all the
>> extra complexity that adds to the interface.
> wait_for_any may well be launching different strategies to solve the
> same problem, and intending to ignore all but the fastest.  It makes
> sense to go ahead and cancel the slower strategies.  (That said, I
> agree that the API shouldn't guarantee that other tasks are actually
> cancelled, let alone that they are cancelled before side effects
> occur.)

Agreed. And it's not hard to implement a custom cancellation mechanism either.

--Guido van Rossum (

From brett at  Tue Oct 23 18:09:56 2012
From: brett at (Brett Cannon)
Date: Tue, 23 Oct 2012 12:09:56 -0400
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

Go is available for Windows:


From andrewfr_ice at  Tue Oct 23 18:51:21 2012
From: andrewfr_ice at (Andrew Francis)
Date: Tue, 23 Oct 2012 09:51:21 -0700 (PDT)
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
	API of the future: yield-from)
In-Reply-To: <>
References: <>
Message-ID: <>

Hi Benoit and folks:

>Message: 3
>Date: Tue, 23 Oct 2012 09:19:59 +0200
>From: Benoit Chesneau <benoitc at>
>To: Guido van Rossum <guido at>
>Cc: Python-Ideas <python-ideas at>
>Subject: Re: [Python-ideas] yield from multiple iterables (was Re: The
>    async API of the future: yield-from)
>Message-ID: <BE74DBDE-1965-47F0-99B9-27F0C7CD574C at>
>Content-Type: text/plain; charset=windows-1252

(I learnt about this mailing list from Christian Tismer's post in the Stackless mailing list and I am catching up)

>I am myself toying with the idea of porting the Go concurrency model to Python [4] using greenlets and pyuv. Both the scheduler and the way IOs are handled:

>- In Go all coroutines are independent from each other and can only communicate via channel. Which has the advantage of allowing them to run on different threads when one is blocking. In the normal case they mostly work like greenlets on a single thread and are simply scheduled in a round-robin way (mostly like in Stackless). The difference is that goroutines can be executed in parallel. When one is blocking another thread will be created to handle other goroutines in the runnable queue.

What aspect of the Go concurrency model? Maybe you already know this, but Go and Stackless Python share a common ancestor: Limbo. More specifically, the way channels work.

This may be tangential to the discussion, but in the past I have used the module in conjunction with CPython and greenlets to rapidly prototype parts of Go's model that are not present in Stackless, i.e. the select (ALT) language feature.
Rob Pike and Russ Cox were really helpful in answering my questions. Newer implementations use
continulets, so look for an older PyPy implementation.

I have also prototyped a subset of Polyphonic C# join patterns. After I got the prototype running, I had an interesting discussion with the authors of "Scalable Join Patterns."

For networking support, I run Twisted as a tasklet. There are a few tricks to make Stackless and Twisted co-operate.


From guido at  Tue Oct 23 18:54:15 2012
From: guido at (Guido van Rossum)
Date: Tue, 23 Oct 2012 09:54:15 -0700
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

But does it let you use any Windows APIs?


--Guido van Rossum (

From brett at  Tue Oct 23 19:08:12 2012
From: brett at (Brett Cannon)
Date: Tue, 23 Oct 2012 13:08:12 -0400
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
	<> <>
Message-ID: <>

On Tue, Oct 23, 2012 at 12:54 PM, Guido van Rossum <guido at> wrote:

> But does it let you use any Windows APIs?
That I don't know.


From andrewfr_ice at  Tue Oct 23 19:18:56 2012
From: andrewfr_ice at (Andrew Francis)
Date: Tue, 23 Oct 2012 10:18:56 -0700 (PDT)
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
	API of the future: yield-from)
In-Reply-To: <>
References: <>
Message-ID: <>

Hi Greg:

Message: 2
Date: Tue, 23 Oct 2012 12:48:39 +1300
From: Greg Ewing <greg.ewing at>
To: "python-ideas at" <python-ideas at>
Subject: Re: [Python-ideas] yield from multiple iterables (was Re: The
    async API of the future: yield-from)
Message-ID: <5085DB57.4010504 at>
Content-Type: text/plain; charset=UTF-8; format=flowed

>It does, in the sense that a continuation appears to the
>Scheme programmer as a callable object.

>The connection goes deeper as well. There's a style of
>programming called "continuation-passing style", in which
>nothing ever returns -- every function is passed another
>function to be called with its result. In a language such
>as Scheme that supports tail calls, you can use this style
>extensively without fear of overflowing the call stack.

>You're using this style whenever you chain callbacks
>together using Futures or Deferreds. The callbacks don't
>return values; instead, each callback arranges for another
>callback to be called, passing it the result.

There is a really nice Microsoft Research paper called "Cooperative Task Management without Manual Stack Management" [1].
In this paper, the authors introduce the term "stack ripping" to describe how asynchronous events with callbacks handle memory.

I think this is a nice way to describe the fundamental differences between continuations and Twisted callbacks/Deferreds.
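
A small sketch of what continuation-passing style looks like in Python, and of why the term "stack ripping" fits. The function names are invented for this example and are not taken from the paper or from Twisted:

```python
def add_cps(a, b, k):
    k(a + b)  # never returns a value; passes the result to k instead


def square_cps(x, k):
    k(x * x)


def sum_of_squares_cps(a, b, k):
    # Each intermediate result (a2, b2) has to live in a closure,
    # because there is no stack frame to come back to: the locals of
    # one logical function are "ripped" across several callbacks.
    square_cps(a, lambda a2:
        square_cps(b, lambda b2:
            add_cps(a2, b2, k)))


results = []
sum_of_squares_cps(3, 4, results.append)
print(results)  # [25]
```

In direct style the same logic is a single expression, a * a + b * b, with the intermediate values kept on the ordinary call stack.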



From vinay_sajip at  Tue Oct 23 20:49:37 2012
From: vinay_sajip at (Vinay Sajip)
Date: Tue, 23 Oct 2012 18:49:37 +0000 (UTC)
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
	API of the future: yield-from)
References: <>
Message-ID: <>

Guido van Rossum <guido at ...> writes:

> But does it let you use any Windows APIs?

It seems you can:

Quote from that page:

"w32 is a wrapper of windows apis for the Go Programming Language. It wraps
win32 apis to "Go style" to make them easier to use."


Vinay Sajip

From at  Tue Oct 23 21:33:58 2012
From: at (Yury Selivanov)
Date: Tue, 23 Oct 2012 15:33:58 -0400
Subject: [Python-ideas] Async API
Message-ID: <>


First of all, sorry for the lengthy email.  I've really tried to make it concise
and I hope that I didn't fail entirely.  At the beginning I want to describe
the framework my company has been working on for several years, and on which we
successfully deployed several web applications that are exposed to 1000s of users 
today.  It survived multiple architecture reviews and redesigns, so I believe its 
design is worth introducing here.

Secondly, I'm really glad that many advanced python developers find that use of
"yield" is viable for async API, and that it even may be "The Right Way".  Because 
when we started working on our framework that sounded nuts (and still sounds...)

The framework

I'll describe here only the core functionality, not touching message bus & dispatch,
protocols design, IO layers, etc.  If someone gets interested - I can start another

The very core of the system is Scheduler.  I prefer it to be called "Scheduler",
and not "Reactor" or something else, because it's not just an event loop.  It loops 
over micro-threads, where a micro-thread is a primitive that holds a pointer to the
current running/suspended task.  A Task can be anything, from a coroutine to a Sleep
command.  A Task may be suspended because of IO waiting, a lock primitive, a timeout
or something else.  You can even write programs that are not IO-bound at all.

To the code.  So when you have::

    def foo():
        bar_value = yield bar()

defined, and then executed, 'foo' will send a Task object (wrapped around 'bar')
to the scheduler, so that it will be executed in foo's micro-thread.
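
A minimal sketch of the idea, not the framework's actual code.  It assumes
a toy round-robin scheduler in which each micro-thread is a stack of
generators; the names Scheduler, spawn and run are invented for
illustration, and the Task wrapper described above is skipped entirely.

```python
import types
from collections import deque


class Scheduler:
    """Toy round-robin scheduler; each micro-thread is a stack of generators."""

    def __init__(self):
        self.ready = deque()   # (name, generator stack, value to send) triples
        self.results = {}      # micro-thread name -> final return value

    def spawn(self, name, coro):
        self.ready.append((name, [coro], None))

    def run(self):
        while self.ready:
            name, stack, value = self.ready.popleft()
            try:
                yielded = stack[-1].send(value)
            except StopIteration as stop:
                stack.pop()
                if stack:
                    # Resume the caller with the callee's return value.
                    self.ready.append((name, stack, stop.value))
                else:
                    self.results[name] = stop.value
                continue
            if isinstance(yielded, types.GeneratorType):
                # 'yield bar()' runs bar inside the same micro-thread.
                stack.append(yielded)
            # Either way, requeue so other micro-threads get a turn.
            self.ready.append((name, stack, None))


def bar():
    yield            # give other micro-threads a turn
    return 42


def foo():
    bar_value = yield bar()
    return bar_value + 1


sched = Scheduler()
sched.spawn("foo", foo())
sched.run()
print(sched.results["foo"])
```

A real scheduler would also park micro-threads on IO, locks and timers
instead of requeueing them immediately.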

And because we return a Task, we can also do::

    yield bar().with_timeout(1)

or even (like coroutines with Futures)::

    bar_value_promise = yield bar().with_timeout(1).async()
    [some code]
    bar_value = yield bar_value_promise

So far there is nothing new.  The need for something "new" emerged when we started
to use it in "real world" applications.  Consider you have some ORM, and the 
following piece of code::

    topics =[
        (FE.creator, [
            (FE.creator.subject, [
                (gpi, [
    ]).filter(FE.publication_date <,
              FE.category == self.category)

and later::

    for topic in topics:

Everything is lazily-loaded, so a DB query here can be run at virtually any point.
When you iterate, it pre-fetches objects; when you access an attribute which wasn't
told to be loaded, it runs a query, etc.  The thing is that there is no way to express with 'yield'
all that semantics.  There is no 'for yield' statement, there is no pretty way of
resolving an attribute with 'yield'.

So even if you decide to write everything around you from scratch supporting
'yields', you still can't make a nice python API for some problems.

Another problem is that "writing everything from scratch" thing.  Nobody wants it.
We always want to reuse, nobody wants to write an SMTP client from scratch, when 
there is a decent one available right in the stdlib.

So the solution was simple.  Incorporate greenlets.

With greenlets we got a 'yield_()' function, that can be called from any coroutine,
and from framework user's point of view it is the same as 'yield' statement.

Now we were able to create a green-socket object that looks like a plain stdlib
socket, and fix our ORM.  With its help we were also able to wrap lots and lots
of existing libraries in a nice 'yield'-style design, without rewriting their

At the end - we have a hybrid approach.  For 95% we use explicit 'yields', and for
the rest 5% - well, we know that when we use ORM it may do some implicit 'yields',
but that's OK.

Now, with the adoption of greenlets, a whole new set of optimization strategies became
available.  For instance, we can substitute 'yield' statements with 'yield_'
command transparently by messing with opcodes, and by tweaking 'yield_' and
reusing 'Task' objects we can achieve near regular-python-call performance, but
with a tight control over our coroutines & micro-threads.  And when PyPy finally
adds support for Python 3, STM & JIT-able continulets, it would be very interesting
to see how we can improve performance even further.


The whole point of this text was to show that a pure 'yield' approach will not
work.  Moreover, I don't think it's time to pronounce "The Right Way" of 'yielding'
and 'yield-fromming'.  There are so many ways of doing that: with @coroutine
decorator, plain generators, futures and Tasks, and perhaps more.  And I honestly
don't know which one is the right one.

What we really need now (and I think Guido has already mentioned that) is a 
callback-based (Deferreds, Futures, plain callbacks) design that is easy to 
plug-and-play in any coroutine-framework.  It has to be low-level and simple.  
Sort of WSGI for async frameworks ;)

We also need to work on the stdlib, so that it is easy to inject a custom socket 
in any object.  Ideally, by passing it in the constructor (as everybody hates 
monkey-patching.)

With all that said, I'd be happy to dedicate a fair amount of my time to help
with the design and implementation.

Thank you!

From sam-pydeas at  Tue Oct 23 23:25:21 2012
From: sam-pydeas at (Sam Rushing)
Date: Tue, 23 Oct 2012 14:25:21 -0700
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On 10/23/12 12:33 PM, Yury Selivanov wrote:
> The whole point of this text was to show, that pure 'yield' approach will not
> work.  Moreover, I don't think it's time to pronounce "The Right Way" of 'yielding'
> and 'yield-fromming'.  There are so many ways of doing that: with @coroutine
> decorator, plain generators, futures and Tasks, and perhaps more.  And I honestly
> don't know which one is the right one.

[Thanks Yury for giving me a convenient place to jump in]

I abandoned the callback-driven approach in 1999, after pushing it as
far as I could handle.  IMHO you can build single pieces in a relatively
clean fashion, but you cannot easily combine those pieces together to
build real systems.

Over the past year I've played a little with some generator-based code
(tlslite & bluelets for example), and I don't think they're much of an
improvement.  Whether it's decorated callbacks, generators, whatever, it
all reminds me of impenetrable monad code in Haskell. 
Continuation-passing-style isn't something that humans should be
expected to do, it's a trick for compilers. 8^)

> What we really need now (and I think Guido has already mentioned that) is a 
> callback-based (Deferreds, Futures, plain callbacks) design that is easy to 
> plug-and-play in any coroutine-framework.  It has to be low-level and simple.  
> Sort of WSGI for async frameworks ;)

I've been trying to play catch-up since being told about this thread a
couple of days ago.  If I understand it correctly, 'yield-from' looks
like it can help make generator-based-concurrency a little more sane by
cutting back on endless chains of 'for x in ...: yield ...', right? 
That certainly sounds like an improvement, but does the generator nature
of the API bubble all the way up to the top?  Can you send an email with
a function call?
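
A small sketch of what 'yield from' (PEP 380, Python 3.3) replaces.  The
function names are made up for the example; the point is only that the
hand-written re-yield loop drops sent values and the sub-generator's
return value, while the delegating form passes both through.

```python
def inner():
    received = yield "ready"
    return (received or 0) * 2


def outer_manual():
    # Hand-written delegation: values sent in and the return value of
    # inner() are lost with this simple loop.
    for x in inner():
        yield x


def outer_delegating():
    # PEP 380: send() values and the return value pass straight through.
    result = yield from inner()
    return result


gen = outer_delegating()
first = next(gen)          # value yielded by inner()
try:
    gen.send(21)
except StopIteration as stop:
    final = stop.value     # return value of outer_delegating()
```

The caller at the very top still has to drive the generator, so the
generator nature does reach whatever code starts the task.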

> We also need to work on the stdlib, so that it is easy to inject a custom socket 
> in any object.  Ideally, by passing it in the constructor (as everybody hates 
> monkey-patching.)

I second this one.  Having a way to [optionally] pass in a factory for
sockets would help with portability, and would cut down on the
temptation to monkey-patch.
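
As a sketch of what passing in a factory could look like (LineClient and
RecordingSocket are hypothetical names, not stdlib classes): the client
never names socket.socket directly except as a default argument, so a
green or async socket class can be substituted without monkey-patching.

```python
import socket


class LineClient:
    """Hypothetical protocol client that accepts a socket factory."""

    def __init__(self, host, port, socket_factory=socket.socket):
        self.host = host
        self.port = port
        self.socket_factory = socket_factory
        self.sock = None

    def connect(self):
        self.sock = self.socket_factory(socket.AF_INET, socket.SOCK_STREAM)
        self.sock.connect((self.host, self.port))

    def send_line(self, text):
        self.sock.sendall(text.encode("ascii") + b"\r\n")


class RecordingSocket:
    """Stand-in for a green/async socket; it only records the calls it gets."""

    def __init__(self, family, kind):
        self.calls = [("create", family, kind)]

    def connect(self, address):
        self.calls.append(("connect", address))

    def sendall(self, data):
        self.calls.append(("sendall", data))


client = LineClient("example.invalid", 25, socket_factory=RecordingSocket)
client.connect()
client.send_line("NOOP")
```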

It'd be really great to use standard 'async' protocol implementations in
a performant way... although I'm not sure how/if I can wedge such code
into systems like shrapnel*, but it all starts with being able to pass
in a socket-like object (or factory).


(*) Since no one else has mentioned it yet, a tiny plug here for

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 194 bytes
Desc: OpenPGP digital signature
URL: <>

From benoitc at  Tue Oct 23 23:48:44 2012
From: benoitc at (Benoit Chesneau)
Date: Tue, 23 Oct 2012 23:48:44 +0200
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
	API of the future: yield-from)
In-Reply-To: <>
References: <>
Message-ID: <>

On Oct 23, 2012, at 6:51 PM, Andrew Francis <andrewfr_ice at> wrote:

> Hi Benoit and folks:
> >Message: 3
> >Date: Tue, 23 Oct 2012 09:19:59 +0200
> >From: Benoit Chesneau <benoitc at>
> >To: Guido van Rossum <guido at>
> >Cc: Python-Ideas <python-ideas at>
> >Subject: Re: [Python-ideas] yield from multiple iterables (was Re: The
>  >   async    API of the future: yield-from)
> >Message-ID: <BE74DBDE-1965-47F0-99B9-27F0C7CD574C at>
> >Content-Type: text/plain; charset=windows-1252
> (I learnt about this mailing list from Christian Tismer's post in the Stackless mailing list and I am catching up)
> >I am myself toying with the idea of porting the Go concurrency model to Python [4] using greenlets and pyuv. Both the scheduler and the way IOs are handled:
> >- In Go all coroutines are independent from each other and can only communicate via channels. This has the advantage of allowing them to run on different threads when one is blocking. In the normal case they mostly work like greenlets on a single thread and are simply scheduled in a round-robin way (mostly like in Stackless). The difference is that goroutines can be executed in parallel. When one is blocking, another thread will be created to handle the other goroutines in the runnable queue.
> What aspect of the Go concurrency model? Maybe you already know this but  Go and Stackless Python share a common ancestor: Limbo. More specifically the way channels work. 

Indeed :) I would have said Plan 9 and the tasks inside it, but right, channels are in Limbo too.

> This may be tangential to the discussion but in the past, I have used the module in conjunction with CPython and greenlets to rapidly prototype parts of Go's model that are not present in Stackless, i.e. the select (ALT) language feature. 
> Rob Pike and Russ Cox were really helpful in answering my questions. Newer implementations use 
> continulets so look for an older PyPy implementation. 
> I have also prototyped a subset of Polyphonic C# join patterns.  After I got the prototype running, I had an interesting discussion with the authors of "Scalable Join Patterns."

Yes saw that. And actually some part of the Task code is based on  but using greenlets, Channels have been slightly modified to be thread-safe and support buffering. Did you release your code somewhere ? It could be interesting to put the experience further.
> For networking support, I run Twisted as a tasklet. There are a few tricks to make Stackless and Twisted co-operate.

I plan to release a new version of flower this week. For now I am also running a libuv event loop in a tasklet, but since the tasklet needs to be blocking for performance, I am writing some new code to run the tasklet in its own thread when needed. Not sure how it will go.

The current implementation handles events when the scheduler comes around to the event loop, which isn't the most efficient way IMO.

Another thing to consider is Rust. Rust uses libuv and puts the event loop in its own task thread:

I find this idea quite elegant.


- benoît


From at  Wed Oct 24 00:05:52 2012
From: at (Yury Selivanov)
Date: Tue, 23 Oct 2012 18:05:52 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>


BTW, kudos for shrapnel!

On 2012-10-23, at 5:25 PM, Sam Rushing <sam-pydeas at> wrote:

> On 10/23/12 12:33 PM, Yury Selivanov wrote:
>> What we really need now (and I think Guido has already mentioned that) is a 
>> callback-based (Deferreds, Futures, plain callbacks) design that is easy to 
>> plug-and-play in any coroutine-framework.  It has to be low-level and simple.  
>> Sort of WSGI for async frameworks ;)
> I've been trying to play catch-up since being told about this thread a
> couple of days ago.  If I understand it correctly, 'yield-from' looks
> like it can help make generator-based-concurrency a little more sane by
> cutting back on endless chains of 'for x in ...: yield ...', right? 
> That certainly sounds like an improvement, but does the generator nature
> of the API bubble all the way up to the top?  Can you send an email with
> a function call?

Well, I guess so.  Let's say, urllib is rewritten internally in async-style,
exposing publicly its old API, like::

   def urlopen(*args, **kwargs):
       return run_coro(urlopen_async, args, kwargs)

where 'run_coro' takes care of setting up a Scheduler/event-loop and running
yield-style or callback-style private code.  So that 'urllib' is blocking, but
there is an option of using 'urlopen_async' for those who need it.
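
A rough sketch of such a 'run_coro', under heavy assumptions: the
coroutine is a plain generator that yields zero-argument callables
describing the work it is waiting for, and the driver simply evaluates
them synchronously where a real scheduler would wait for IO.  The
'urlopen_async' body is a stand-in, not anything from urllib.

```python
def run_coro(coro_func, args, kwargs):
    """Drive a generator-based coroutine to completion, blocking the caller."""
    coro = coro_func(*args, **kwargs)
    value = None
    while True:
        try:
            request = coro.send(value)
        except StopIteration as stop:
            return stop.value
        # A real scheduler would wait for IO here; this sketch just
        # evaluates the yielded callable synchronously.
        value = request()


def urlopen_async(url):
    # Stand-in for real async IO: yield a callable describing the work.
    body = yield (lambda: "<html>%s</html>" % url)
    return body.upper()


def urlopen(*args, **kwargs):
    # The old blocking API, implemented on top of the async version.
    return run_coro(urlopen_async, args, kwargs)
```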

For basic library functions that will work.  And that's already a huge win.
But developing a complicated library will become twice as hard, as you'll need
to maintain two versions of API - sync & async all the way through the code.

There is only one way to 'magically' make existing code both sync- & async-
friendly--greenlets, but I think there is no chance for them (or stackless) to 
land in cpython in the foreseeable future (although it would be awesome.)

BTW, why didn't you use greenlets in shrapnel and ended up with your own
implementation?

>> We also need to work on the stdlib, so that it is easy to inject a custom socket 
>> in any object.  Ideally, by passing it in the constructor (as everybody hates 
>> monkey-patching.)
> I second this one.  Having a way to [optionally] pass in a factory for
> sockets would help with portability, and would cut down on the
> temptation to monkey-patch.


Let's see - if nobody is opposed to this we can start with submitting patches :)

Or is there a need for a separate small PEP?


From sam-pydeas at  Wed Oct 24 01:00:30 2012
From: sam-pydeas at (Sam Rushing)
Date: Tue, 23 Oct 2012 16:00:30 -0700
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On 10/23/12 3:05 PM, Yury Selivanov wrote:
> Sam,
> BTW, kudos for shrapnel!
> For basic library functions that will work.  And that's already a huge win.
> But developing a complicated library will become twice as hard, as you'll need
> to maintain two versions of API - sync & async all the way through the code.

This is really difficult, if you want to see a great example of trying
to make all parties happy, look at Pika (an AMQP implementation).

Actually this reminds me, it would be really great if there was a
standardized with_timeout() API.  It's better than adding timeout args to
all the functions.  I'm sure that systems like Twisted & gevent could
also implement it (if they don't already have it):

In shrapnel it is simply:

    coro.with_timeout (<seconds>, <fun>, *args, **kwargs)

Timeouts are caught thus:

   try:
      coro.with_timeout (...)
   except coro.TimeoutError:
      ...

> There is only one way to 'magically' make existing code both sync- & async-
> friendly--greenlets, but I think there is no chance for them (or stackless) to 
> land in cpython in the foreseeable future (although it would be awesome.)
> BTW, why didn't you use greenlets in shrapnel and ended up with your own
> implementation?
I think shrapnel predates greenlets... some of the core asm code for
greenlets may have come from one of shrapnel's precursors at ironport...
Unfortunately it took many years to get shrapnel open-sourced - I
remember talking with Guido about it over lunch in ~2006.


-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 194 bytes
Desc: OpenPGP digital signature
URL: <>

From at  Wed Oct 24 01:30:35 2012
From: at (Yury Selivanov)
Date: Tue, 23 Oct 2012 19:30:35 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-23, at 7:00 PM, Sam Rushing <sam-pydeas at> wrote:

> On 10/23/12 3:05 PM, Yury Selivanov wrote:
>> Sam,
>> BTW, kudos for shrapnel!
> Thanks!
>> For basic library functions that will work.  And that's already a huge win.
>> But developing a complicated library will become twice as hard, as you'll need
>> to maintain two versions of API - sync & async all the way through the code.
> This is really difficult, if you want to see a great example of trying
> to make all parties happy, look at Pika (an AMQP implementation).

Thanks, will take a look!

> Actually this reminds me, it would be really great if there was a
> standardized with_timeout() API.  It's better than adding timeout args to
> all the functions.  I'm sure that systems like Twisted & gevent could
> also implement it (if they don't already have it):
> In shrapnel it is simply:
>    coro.with_timeout (<seconds>, <fun>, *args, **kwargs)
> Timeouts are caught thus:
>   try:
>      coro.with_timeout (...)
>   except coro.TimeoutError:
>      ...

You're right--if we want to ship some "standard" async API in python,
API for timeouts is a must.  We will at least need to handle timeouts
in async code in the stdlib, won't we...

A question:

How do you protect finally statements in shrapnel?  If we have the following 
coroutine (greenlet style):

   def foo():
       connection = open_connection()
       try:
           spam()
       finally:
           [some code]
           connection.close()

What happens if you run 'foo.with_timeout(1)' and timeout occurs at 
"[some code]" point?  Will you just abort 'foo', possibly preventing
'connection' from being closed?


From sam-pydeas at  Wed Oct 24 02:24:54 2012
From: sam-pydeas at (Sam Rushing)
Date: Tue, 23 Oct 2012 17:24:54 -0700
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On 10/23/12 4:30 PM, Yury Selivanov wrote:
> How do you protect finally statements in shrapnel?  If we have a following 
> coroutine (greenlet style):
>    def foo():
>        connection = open_connection()
>        try:
>            spam()
>        finally:
>            [some code]
>            connection.close()
> What happens if you run 'foo.with_timeout(1)' and timeout occurs at 
> "[some code]" point?  Will you just abort 'foo', possibly preventing
> 'connection' from being closed?
Timeouts are raised as normal exceptions - for exactly this reason.

The interesting part of the implementation is keeping each
with_timeout() call separate.

If you have nested with_timeout() calls and the outer timeout goes off,
it will skip the inner exception handler and fire only the outer one. 
In other words, the code for with_timeout() verifies that any timeouts
propagating through it belong to it.


From greg.ewing at  Wed Oct 24 02:24:49 2012
From: greg.ewing at (Greg Ewing)
Date: Wed, 24 Oct 2012 13:24:49 +1300
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

Yury Selivanov wrote:

>    def foo():
>        connection = open_connection()
>        try:
>            spam()
>        finally:
>            [some code]
>            connection.close()
> What happens if you run 'foo.with_timeout(1)' and timeout occurs at 
> "[some code]" point?

I would say that vital cleanup code probably shouldn't do
anything that could block. If you really need to do that,
it should be protected by a finally clause of its own:

    def foo():
        connection = open_connection()
        try:
            spam()
        finally:
            try:
                [some code]
            finally:
                connection.close()
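
A runnable sketch of the nested-finally idea, with stand-ins invented for
the example: Connection is a dummy class, and the interruption of the
cleanup code is simulated by raising the builtin TimeoutError directly.

```python
events = []


class Connection:
    def close(self):
        events.append("closed")


def cleanup_that_times_out():
    events.append("cleanup started")
    # Simulates a timeout arriving while the cleanup code is blocked.
    raise TimeoutError("interrupted during cleanup")


def foo():
    connection = Connection()
    try:
        events.append("spam")
    finally:
        try:
            cleanup_that_times_out()
        finally:
            # Runs even though the cleanup above was interrupted.
            connection.close()


try:
    foo()
except TimeoutError:
    events.append("timeout reached caller")
```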


From tismer at  Wed Oct 24 02:24:49 2012
From: tismer at (Christian Tismer)
Date: Wed, 24 Oct 2012 02:24:49 +0200
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
Message-ID: <>

On 23.10.12 00:35, Guido van Rossum wrote:
> On Mon, Oct 22, 2012 at 3:33 PM, Greg Ewing <greg.ewing at> wrote:
>> Guido van Rossum wrote:
>>> (Aside: please don't use 'continuation' for 'task'. The use of this
>>> term in Scheme has forever tainted the word for me.)
>> It has a broader meaning than the one in Scheme; essentially
>> it's a synonym for "callback".
> (Off-topic:) But does that meaning apply to Scheme? If so, I wish
> someone would have told me 15 years ago...

As used quite often, the definition is more like "half a coroutine",
that means the part that can resume it at some point.
Sticking two together, you get a coroutine (tasklet, greenlet etc).
These are one-shot continuations, they are gone after resuming.

The meaning in Scheme is much weirder, and you were right to be scared.
In Scheme, these beasts survive their reactivation as a constant.
My big design error in 1998 was to implement exactly those full
continuations for Python.

I'm scared myself when I recall that ... ;-)

ciao - chris

Christian Tismer             :^)   <mailto:tismer at>
Software Consulting          :     Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121     :    *Starship*
14482 Potsdam                :     PGP key ->
phone +49 173 24 18 776  fax +49 (30) 700143-0023
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
       whom do you want to sponsor today?

From tismer at  Wed Oct 24 02:43:38 2012
From: tismer at (Christian Tismer)
Date: Wed, 24 Oct 2012 02:43:38 +0200
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On 24.10.12 01:00, Sam Rushing wrote:
> On 10/23/12 3:05 PM, Yury Selivanov wrote:
> ...

>> There is only one way to 'magically' make existing code both sync- & async-
>> friendly--greenlets, but I think there is no chance for them (or stackless) to
>> land in cpython in the foreseeable future (although it would be awesome.)
>> BTW, why didn't you use greenlets in shrapnel and ended up with your own
>> implementation?
> I think shrapnel predates greenlets... some of the core asm code for
> greenlets may have come from one of shrapnel's precursors at ironport...
> Unfortunately it took many years to get shrapnel open-sourced - I
> remember talking with Guido about it over lunch in ~2006.

Hi Sam,

greenlets were developed in 2004 by Armin Rigo, on the first
(and maybe only) Stackless sprint here in Berlin.
The greenlet asm code was ripped out of Stackless and slightly
improved, but has the same old stack-slicing idea.

cheers - chris

Christian Tismer             :^)   <mailto:tismer at>
Software Consulting          :     Have a break! Take a ride on Python's
Karl-Liebknecht-Str. 121     :    *Starship*
14482 Potsdam                :     PGP key ->
phone +49 173 24 18 776  fax +49 (30) 700143-0023
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
       whom do you want to sponsor today?

From at  Wed Oct 24 02:52:52 2012
From: at (Yury Selivanov)
Date: Tue, 23 Oct 2012 20:52:52 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

Hi Greg,

On 2012-10-23, at 8:24 PM, Greg Ewing <greg.ewing at> wrote:

> Yury Selivanov wrote:
>>   def foo():
>>       connection = open_connection()
>>       try:
>>           spam()
>>       finally:
>>           [some code]
>>           connection.close()
>> What happens if you run 'foo.with_timeout(1)' and timeout occurs at "[some code]" point?
> I would say that vital cleanup code probably shouldn't do
> anything that could block. If you really need to do that,
> it should be protected by a finally clause of its own:
>   def foo():
>       connection = open_connection()
>       try:
>           spam()
>       finally:
>           try:
>               [some code]
>           finally:
>               connection.close()

Please take a look at the problem definition in PEP 419.

It's not about try..finally nesting, it's about Scheduler being aware
that a coroutine is in its 'finally' block and thus shouldn't be interrupted
at the moment (a problem that doesn't exist in a non-coroutine world).

Speaking about your solution, imagine if you have three connections to close,
what will you write?

   finally:
       try:
           c1.close() # coroutine call
       finally:
           try:
               c2.close() # coroutine call
           finally:
               c3.close() # coroutine call

But if you somehow make scheduler aware of 'finally' block, through PEP 419 
(which I don't like), or like in my framework where we inline special code in
finally statement by modifying coroutine opcodes (which I don't like either), 
you can simply write::


And the scheduler will gladly wait until finally is over.  And the code snippet
above is something that is familiar to every user of python--nobody expects
code in the finally section to be interrupted from the *outside* world.  If
we fail to guarantee 'finally' block safety, then coroutine-style programming
is going to be much tougher.  Or we have to abandon timeouts and coroutines

So eventually, we'll need to figure out the best mechanism/approach for this.

Now, I don't think it's the right moment to shift discussion into this 
particular problem, but I would rather like to bring up the point, that
implementing 'yield'-style coroutines is a very hard thing, and I'm not
sure that we should implement them in 3.4.

Setting guidelines and standard protocols, adding socket-factories support
where necessary in the stdlib is a better approach (in my humble opinion.)


From sam-pydeas at  Wed Oct 24 02:58:16 2012
From: sam-pydeas at (Sam Rushing)
Date: Tue, 23 Oct 2012 17:58:16 -0700
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
Message-ID: <>

On 10/23/12 5:24 PM, Christian Tismer wrote:
> As used quite often, the definition is more like "half a coroutine",
> that means the part that can resume it at some point.
> Sticking two together, you get a coroutine (tasklet, greenlet etc).
> These are one-shot continuations, they are gone after resuming.
> The meaning in Scheme is much weirder, and you were right to be scared.
> In Scheme, these beasts survive their reactivation as a constant.
> My big design error in 1998 was to implement exactly those full
> continuations for Python.
> I'm scared myself when I recall that ... ;-)
Come on Christian, take the red pill and see how far down the rabbit
hole goes... 8^)

I never noticed before, but there really are two different meanings to
'continuation':

1) in the phrase 'continuation-passing style', it means a 'callback' of
sorts.
2) as a separate term, it means an object that represents the future of
a computation.

Like Greg said, you can apply the CPS transformation to any bit of code
(or write it that way from the start), and when you do you might be
tempted to refer to your callbacks as 'continuations'.


From Steve.Dower at  Wed Oct 24 03:53:52 2012
From: Steve.Dower at (Steve Dower)
Date: Wed, 24 Oct 2012 01:53:52 +0000
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
Message-ID: <>

Since I was the one to first mention the term 'continuation' in this discussion, I'll clarify that I meant it as the "'callback' of sorts", and specifically in the situation where the person writing it does not realise that it is a callback.

For example:

def my_func():
    # part a
    x = yield y
    # part b

Part B is the continuation here - the piece of code that continues after 'y' completes. There are various other pieces involved (a callback and a generator, and possibly others, depending on the implementation) so rather than muddying the waters with adjectives I muddied the waters with a noun. "The rest of the task" is close enough (when used in context) that I'm happy to stick to that. "Callback" is an implementation detail IMO, and not one that is necessary to leak through our abstraction.
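
To make "part b is the continuation" concrete, here is a small sketch.
The 'start' helper and the 'log' argument are invented for the example;
the helper runs part a and hands back a plain function that, when called
with the result, runs the rest of the task.

```python
def my_func(log):
    log.append("part a")
    x = yield "y"                     # hand 'y' to whoever drives us
    log.append("part b got %r" % x)   # the continuation


def start(gen_func, log):
    """Run part a and return (request, continuation)."""
    gen = gen_func(log)
    request = next(gen)

    def continuation(result):
        # Resuming the generator runs "the rest of the task".
        try:
            gen.send(result)
        except StopIteration:
            pass

    return request, continuation


log = []
request, resume = start(my_func, log)
resume(request.upper())      # pretend 'y' completed with the result "Y"
```

The person writing my_func never sees 'continuation'; it only exists
inside the driver.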

(I also didn't realise people were so traumatised by the C-word, or I would have picked another one. Add this to the list of reasons to not learn functional programming... :) )

From sam-pydeas at  Wed Oct 24 06:21:47 2012
From: sam-pydeas at (Sam Rushing)
Date: Tue, 23 Oct 2012 21:21:47 -0700
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On 10/23/12 5:43 PM, Christian Tismer wrote:
> greenlets were developed in 2004 by Armin Rigo, on the first
> (and maybe only) Stackless sprint here in Berlin.
> The greenlet asm code was ripped out of Stackless and slightly
> improved, but has the same old stack-slicing idea.
Ah, ok.  I remember talking with you at the 2005 PyCon about my
two-stack solution*, but don't remember if anything came of it.
Do greenlets use a single stack?


(*) nothing to do with Israel

From greg.ewing at  Wed Oct 24 07:06:36 2012
From: greg.ewing at (Greg Ewing)
Date: Wed, 24 Oct 2012 18:06:36 +1300
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
 API of the future: yield-from)
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

Sam Rushing wrote:

> 1) in the phrase 'continuation-passing style', it means a 'callback' of
> sorts.
> 2) as a separate term, it means an object that represents the future of
> a computation.

They're not really different things. When you call a continuation
function in a continuation-passing style program, you're effectively
invoking *all* of the rest of the computation, not just the part
represented by that function.

This becomes particularly clear if you're able to make the
continuation calls using tail calls. Then it's not so much a
"callback" as a "callforward". Nothing ever returns, so forward
is the only way to go!
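
A tiny illustration of continuation-passing style in Python, using
made-up function names.  Each function takes an extra argument 'k' and
passes its result forward to it instead of returning.  Python does not
eliminate tail calls, so unlike in Scheme the stack still grows with
every step here.

```python
def add_cps(a, b, k):
    k(a + b)


def square_cps(x, k):
    k(x * x)


def hypotenuse_squared_cps(a, b, k):
    # Every step hands its result forward; nothing is returned.
    square_cps(a, lambda a2:
        square_cps(b, lambda b2:
            add_cps(a2, b2, k)))


results = []
hypotenuse_squared_cps(3, 4, results.append)
```

Calling 'k' at the end of square_cps runs everything that remains,
including the final results.append, which is the "rest of the
computation" point made above.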


From andrewfr_ice at  Wed Oct 24 19:03:09 2012
From: andrewfr_ice at (Andrew Francis)
Date: Wed, 24 Oct 2012 10:03:09 -0700 (PDT)
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
	API of the future: yield-from)
In-Reply-To: <>
References: <>
Message-ID: <>

Hi Benoit:

 From: Benoit Chesneau <benoitc at>
To: Andrew Francis <andrewfr_ice at> 
Cc: "python-ideas at" <python-ideas at> 
Sent: Tuesday, October 23, 2012 5:48 PM
Subject: Re: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from)

On Oct 23, 2012, at 6:51 PM, Andrew Francis <andrewfr_ice at> wrote:
AF>This may be tangential to the discussion but in the past, I have used the module in conjunction with CPython and greenlets to rapidly prototype parts of Go's model that are not present in Stackless, i.e. the select (ALT) language feature. Rob Pike and Russ Cox were really helpful in answering my questions. Newer implementations use continulets so look for an older PyPy implementation.
AF>I have also prototyped a subset of Polyphonic C# join patterns. After I got the prototype running, I had an interesting discussion with the authors of "Scalable Join Patterns."
>Yes saw that. And actually some part of the Task code is based on  but using greenlets, Channels have been slightly modified to be thread-safe and support buffering. Did you release your code somewhere? It could be interesting to put the experience further.

You may be mistaking my work for someone else's.

I didn't add buffering but that is relatively easy to do without altering Stackless Python's internals. However I believe that synchronous channels with buffering is a simple and powerful concurrency model. Go's implementers got it right. John Reppy (currently an NSF director) talks about synchronous channel's power in a Concurrent ML book.

If you go to the Stackless repository example page

you will find the code for a modified that implements Go's select statement.

Since I am giving a talk in Toronto soon, I will soon release a new version of the join pattern version with documentation and examples. The code is about a year old and I have learnt new things. I can mail you an archive and you are free to play with it and ask questions.

Since this is somewhat off-topic, the reason I mention all this is that if you want to experiment with a Go style system, I think it is easiest to work from something like and greenlets than to start from scratch.


From benoitc at  Wed Oct 24 23:54:17 2012
From: benoitc at (Benoit Chesneau)
Date: Wed, 24 Oct 2012 23:54:17 +0200
Subject: [Python-ideas] yield from multiple iterables (was Re: The async
	API of the future: yield-from)
In-Reply-To: <>
References: <>
Message-ID: <>

On Oct 24, 2012, at 7:03 PM, Andrew Francis <andrewfr_ice at> wrote:

> Hi Benoit:
> From: Benoit Chesneau <benoitc at>
> To: Andrew Francis <andrewfr_ice at> 
> Cc: "python-ideas at" <python-ideas at> 
> Sent: Tuesday, October 23, 2012 5:48 PM
> Subject: Re: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from)
> On Oct 23, 2012, at 6:51 PM, Andrew Francis <andrewfr_ice at> wrote:
>> AF>This may be tangential to the discussion but in the past, I have used the module in conjunction with AF>CPython and greenlets to rapidly prototype parts of Go's model that are not present in Stackless, i.e. the select (ALT) AF>language feature. Rob Pike and Russ Cox were really helpful in answering my questions. Newer AF>implementations use continuelets so look for an older PyPy implementation. 
>> AF>I have also prototyped a subset of Polyphonic C# join patterns.  After I got the prototype running, I had an interesting AF>discussion with the authors of "Scalable Join Patterns."
> >Yes saw that. And actually some part of the Task code is based on  but using greenlets, >Channels have been slightly modified to be thread-safe and support buffering. Did you release your code >somewhere ? It could be interesting to put the experience further.
> You may be mistaking my work with someone else.
Oh was just saying i made this change.

> I didn't add buffering but that t is relatively easy to do without altering Stackless Python's internals. However I believe that synchronous channels with buffering is a simple and powerful concurrency model. Go's implementers got it right. John Reppy (currently a NSF director) talks about synchronous channel's power in a Concurrent ML book. 
> If  you go to to the Stackless repository example page
> you will find the code for a modified that implements Go's select statement. 

Thanks for the link.
> Since I am giving a talk in Toronto soon, I will soon release a new version of the join pattern version with documentation and examples. The code is about a year old and I have learnt new things.  I can mail you an archive and you are free to play with it and ask questions. 
> Since this is somewhat off-topic, the reason I mention all this is that if you want to experiment with a Go style system, I think it easiest to work from something like and greenlets than start from scratch. 
> Cheers,
> Andrew

Right. Actually flower is working well for simple purposes. My goal is more about testing new ideas about concurrency and async handling in Python. Tomorrow I will push a new branch using Futures and libuv in its own thread.

- benoît


From greg.ewing at  Wed Oct 24 23:30:04 2012
From: greg.ewing at (Greg Ewing)
Date: Thu, 25 Oct 2012 10:30:04 +1300
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

Yury Selivanov wrote:

> It's not about try..finally nesting, it's about Scheduler being aware
> that a coroutine is in its 'finally' block and thus shouldn't be interrupted
> at the moment

It would be a bad idea to make a close() method, or anything else
that might be needed for cleanup purposes, be a 'yield from' call.
If it's an ordinary function, it can't be interrupted in the world
we're talking about, so the PEP 419 problem doesn't apply.

If I were feeling in a radical mood, I might go as far as suggesting
that 'yield' and 'yield from' be syntactically forbidden inside
a finally clause. That would force you to design your cleanup
code to be safe from interruptions.
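
A minimal sketch of the point about ordinary functions, using plain Python 3 generators in place of a real scheduler. The `Session` class, the `log` list and the yielded string are illustrative assumptions, not anything from an actual framework. Because `close()` is a plain call, the finally clause has no suspension point where an exception could be thrown in:

```python
log = []


class Session:
    def close(self):
        # Ordinary (non-yielding) function: it runs to completion once called.
        log.append("closed")


def fetch(session):
    try:
        yield "query"          # the only suspension point
    finally:
        session.close()        # no yield here, so nothing can interrupt it


coro = fetch(Session())
next(coro)                     # run up to the 'query' suspension point
try:
    coro.throw(TimeoutError("deadline passed"))
except TimeoutError:
    pass                       # the timeout propagates, but cleanup already ran
```

After this runs, `log` contains the single entry recorded by `close()`.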


From guido at  Thu Oct 25 00:43:27 2012
From: guido at (Guido van Rossum)
Date: Wed, 24 Oct 2012 15:43:27 -0700
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Oct 24, 2012 at 2:30 PM, Greg Ewing <greg.ewing at> wrote:
> Yury Selivanov wrote:
>> It's not about try..finally nesting, it's about Scheduler being aware
>> that a coroutine is in its 'finally' block and thus shouldn't be
>> interrupted
>> at the moment
> It would be a bad idea to make a close() method, or anything else
> that might be needed for cleanup purposes, be a 'yield from' call.
> If it's an ordinary function, it can't be interrupted in the world
> we're talking about, so the PEP 419 problem doesn't apply.
> If I were feeling in a radical mood, I might go as far as suggesting
> that 'yield' and 'yield from' be syntactically forbidden inside
> a finally clause. That would force you to design your cleanup
> code to be safe from interruptions.

What's the problem with just letting the cleanup take as long as it
wants to and do whatever it wants? That's how try/finally works in
regular Python code.

--Guido van Rossum

From at  Thu Oct 25 00:47:02 2012
From: at (Yury Selivanov)
Date: Wed, 24 Oct 2012 18:47:02 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>


On 2012-10-24, at 5:30 PM, Greg Ewing <greg.ewing at> wrote:

> Yury Selivanov wrote:
>> It's not about try..finally nesting, it's about Scheduler being aware
>> that a coroutine is in its 'finally' block and thus shouldn't be interrupted
>> at the moment
> It would be a bad idea to make a close() method, or anything else
> that might be needed for cleanup purposes, be a 'yield from' call.
> If it's an ordinary function, it can't be interrupted in the world
> we're talking about, so the PEP 419 problem doesn't apply.
> If I were feeling in a radical mood, I might go as far as suggesting
> that 'yield' and 'yield from' be syntactically forbidden inside
> a finally clause. That would force you to design your cleanup
> code to be safe from interruptions.

I'm not sure it would be a good idea... Cleanup code for a DB connection
*will* need to run queries to the database (at least in some circumstances).  
And we can't make them blocking.


From at  Thu Oct 25 01:03:17 2012
From: at (Yury Selivanov)
Date: Wed, 24 Oct 2012 19:03:17 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

Hi Guido,

On 2012-10-24, at 6:43 PM, Guido van Rossum <guido at> wrote:

> On Wed, Oct 24, 2012 at 2:30 PM, Greg Ewing <greg.ewing at> wrote:
>> Yury Selivanov wrote:
>>> It's not about try..finally nesting, it's about Scheduler being aware
>>> that a coroutine is in its 'finally' block and thus shouldn't be
>>> interrupted
>>> at the moment
>> It would be a bad idea to make a close() method, or anything else
>> that might be needed for cleanup purposes, be a 'yield from' call.
>> If it's an ordinary function, it can't be interrupted in the world
>> we're talking about, so the PEP 419 problem doesn't apply.
>> If I were feeling in a radical mood, I might go as far as suggesting
>> that 'yield' and 'yield from' be syntactically forbidden inside
>> a finally clause. That would force you to design your cleanup
>> code to be safe from interruptions.
> What's the problem with just letting the cleanup take as long as it
> wants to and do whatever it wants? That's how try/finally works in
> regular Python code.

The problem appears when you add timeout support.

Let me show you an abstract example (I won't use yield_froms, but I'm
sure that the problem is the same with them):

   @coroutine
   def fetch_comments(app):
       session = yield app.new_session()
       try:
            return (yield session.query(...))
       finally:
            yield session.close()

and now we execute that with:

   #: Get a list of comments; throw a TimeoutError if it
   #: takes more than 1 second
   comments = yield fetch_comments(app).with_timeout(1.0)

Now the scheduler starts with 'fetch_comments', then executes
'new_session', then executes 'session.query' in a round-robin fashion.

Imagine that the database query took a bit less than a second to execute,
the scheduler pushes the result into the coroutine, and then a timeout event
occurs.  So the scheduler throws a 'TimeoutError' into the coroutine, thus
preventing 'session.close' from being executed.  There is no way for the
scheduler to understand that there is no need to push the exception right
now, as the coroutine is in its finally block.

And this situation is pretty common when you have such a timeout
mechanism in place and widely used.
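
The race can be reproduced with plain Python 3 generators, driving the coroutine by hand instead of through a scheduler. This is only a sketch: the yielded strings stand in for the asynchronous operations, and `log` is an illustrative assumption used to record what actually ran.

```python
log = []


def fetch_comments():
    log.append("session opened")
    try:
        result = yield "query"      # wait for the database
        return result
    finally:
        yield "close"               # asynchronous cleanup: one more suspension point
        log.append("session closed")


coro = fetch_comments()
assert next(coro) == "query"             # scheduler starts the query
assert coro.send(["c1"]) == "close"      # result delivered; now suspended inside 'finally'
try:
    # The timeout fires while the coroutine is waiting for close() to finish.
    coro.throw(TimeoutError("took more than 1 second"))
except TimeoutError:
    pass

# "session closed" was never recorded: the cleanup was cut short.
```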


From guido at  Thu Oct 25 01:12:00 2012
From: guido at (Guido van Rossum)
Date: Wed, 24 Oct 2012 16:12:00 -0700
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Oct 24, 2012 at 4:03 PM, Yury Selivanov < at> wrote:
> Hi Guido,
> On 2012-10-24, at 6:43 PM, Guido van Rossum <guido at> wrote:
>> What's the problem with just letting the cleanup take as long as it
>> wants to and do whatever it wants? That's how try/finally works in
>> regular Python code.

> The problem appears when you add timeouts support.
> Let me show you an abstract example (I won't use yield_froms, but I'm
> sure that the problem is the same with them):
>    @coroutine
>    def fetch_comments(app):
>        session = yield app.new_session()
>        try:
>             return (yield session.query(...))
>        finally:
>             yield session.close()
> and now we execute that with:
>    #: Get a list of comments; throw a TimeoutError if it
>    #: takes more than 1 second
>    comments = yield fetch_comments(app).with_timeout(1.0)
> Now, scheduler starts with 'fetch_comments', then executes
> 'new_session', then executes 'session.query' in a round-robin fashion.
> Imagine, that database query took a bit less than a second to execute,
> scheduler pushes the result in coroutine, and then a timeout event occurs.
> So scheduler throws a 'TimeoutError' in the coroutine, thus preventing
> the 'session.close' to be executed.  There is no way for a scheduler to
> understand, that there is no need in pushing the exception right now,
> as the coroutine is in its finally block.
> And this situation is a pretty common when you have such timeouts
> mechanism in place and widely used.

Ok, I can understand. But still, this is a problem with timeouts in
general, not just with timeouts in a yield-based environment. How does
e.g. Twisted deal with this?

As a work-around, I could imagine some kind of with-statement that
tells the scheduler we're already in the finally clause (it could
still send you a timeout if your cleanup takes way too long):

  try:
    yield <regular code>
  finally:
    with protect_finally():
      yield <cleanup code>

Of course this could be abused, but at your own risk -- the scheduler
only gives you a fixed amount of extra time and then it's quits.
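
One way such a with-statement could cooperate with a scheduler, as a rough sketch only. The `Task` class, its `interrupt()` method and the module-level `_current` variable are all assumptions made for illustration; the "fixed amount of extra time" is left out, so a protected block here can delay the timeout indefinitely.

```python
import contextlib

_current = None   # task being stepped right now (toy stand-in for scheduler state)


class Task:
    def __init__(self, gen):
        self.gen = gen
        self.protected = 0     # nesting depth of protect_finally() blocks
        self.deferred = None   # exception held back while protected

    def step(self, value=None, exc=None):
        global _current
        _current = self
        try:
            if exc is not None:
                return self.gen.throw(exc)
            return self.gen.send(value)
        finally:
            _current = None

    def interrupt(self, exc):
        """Throw exc now, or hold it back if the task is in protected cleanup."""
        if self.protected:
            self.deferred = exc
            return None
        return self.step(exc=exc)


@contextlib.contextmanager
def protect_finally():
    task = _current            # entered while the task is being stepped
    task.protected += 1
    try:
        yield
    finally:
        task.protected -= 1


log = []


def fetch_comments():
    try:
        yield "query"
    finally:
        with protect_finally():
            yield "close"
            log.append("session closed")


task = Task(fetch_comments())
assert task.step() == "query"
assert task.step("rows") == "close"   # suspended inside the protected cleanup
task.interrupt(TimeoutError())        # timeout fires: held back, not thrown
try:
    task.step()                       # cleanup resumes and finishes
except StopIteration:
    pass
```

What the scheduler should do with `task.deferred` once the cleanup has finished (re-raise it to the caller, or drop it) is a design choice this sketch does not settle.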

--Guido van Rossum

From Steve.Dower at  Thu Oct 25 01:25:15 2012
From: Steve.Dower at (Steve Dower)
Date: Wed, 24 Oct 2012 23:25:15 +0000
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

>On Wed, Oct 24, 2012 at 4:03 PM, Yury Selivanov < at> wrote:
>> Hi Guido,
>> On 2012-10-24, at 6:43 PM, Guido van Rossum <guido at> wrote:
>>> What's the problem with just letting the cleanup take as long as it 
>>> wants to and do whatever it wants? That's how try/finally works in 
>>> regular Python code.
>> The problem appears when you add timeouts support.
>> Let me show you an abstract example (I won't use yield_froms, but I'm 
>> sure that the problem is the same with them):
>>    @coroutine
>>    def fetch_comments(app):
>>        session = yield app.new_session()
>>        try:
>>             return (yield session.query(...))
>>        finally:
>>             yield session.close()
>> and now we execute that with:
>>    #: Get a list of comments; throw a TimeoutError if it
>>    #: takes more than 1 second
>>    comments = yield fetch_comments(app).with_timeout(1.0)
>> Now, scheduler starts with 'fetch_comments', then executes 
>> 'new_session', then executes 'session.query' in a round-robin fashion.
>> Imagine, that database query took a bit less than a second to execute, 
>> scheduler pushes the result in coroutine, and then a timeout event occurs.
>> So scheduler throws a 'TimeoutError' in the coroutine, thus preventing 
>> the 'session.close' to be executed.  There is no way for a scheduler 
>> to understand, that there is no need in pushing the exception right 
>> now, as the coroutine is in its finally block.
>> And this situation is a pretty common when you have such timeouts 
>> mechanism in place and widely used.
>Ok, I can understand. But still, this is a problem with timeouts in general, not just with timeouts in a yield-based environment. How does e.g. Twisted deal with this?
>As a work-around, I could imagine some kind of with-statement that tells the scheduler we're already in the finally clause (it could still send you a timeout if your cleanup takes way too long):
> try:
>  yield <regular code>
> finally:
>  with protect_finally():
>    yield <cleanup code>
>Of course this could be abused, but at your own risk -- the scheduler only gives you a fixed amount of extra time and then it's quits.

Could another workaround be to spawn the cleanup code without yielding - in effect saying "go and do this, but don't come back"? Then there is nowhere for the scheduler to throw the exception.

I ask because this falls out naturally with my implementation (code is coming, but work is taking priority right now): "do_cleanup()" instead of "yield do_cleanup()". I haven't tried it in this context yet, so no idea whether it works, but I don't see why it wouldn't. In a system without the @async decorator you'd need a "scheduler.current.spawn(do_cleanup)" instead of yield [from]s, but it can still be done.
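
A toy illustration of the "spawn and don't come back" idea, using bare generators and a hand-rolled round-robin loop. The `spawn()` and `run()` helpers and the `ready` queue are assumptions for the sketch and are not the API of any implementation mentioned in this thread.

```python
from collections import deque

ready = deque()   # tasks waiting to be stepped
log = []


def spawn(gen):
    """Queue a generator as an independent task; the caller does not wait."""
    ready.append(gen)


def run():
    """Step every queued task round-robin until all have finished."""
    while ready:
        gen = ready.popleft()
        try:
            next(gen)
        except StopIteration:
            continue
        ready.append(gen)


def cleanup():
    yield                      # stands in for an asynchronous c1.close()
    log.append("c1 closed")
    yield                      # stands in for an asynchronous c2.close()
    log.append("c2 closed")


def do_stuff():
    try:
        yield                  # regular work
    finally:
        spawn(cleanup())       # no yield: the parent offers no point to interrupt


parent = do_stuff()
next(parent)
try:
    parent.throw(TimeoutError())   # the timeout hits the parent task
except TimeoutError:
    pass
run()                              # the spawned cleanup still runs to completion
```

The cost is that the parent can no longer observe errors raised by the cleanup, since nothing waits for it.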


From at  Thu Oct 25 01:26:32 2012
From: at (Yury Selivanov)
Date: Wed, 24 Oct 2012 19:26:32 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-24, at 7:12 PM, Guido van Rossum <guido at> wrote:

> On Wed, Oct 24, 2012 at 4:03 PM, Yury Selivanov < at> wrote:
>> Hi Guido,
>> On 2012-10-24, at 6:43 PM, Guido van Rossum <guido at> wrote:
>>> What's the problem with just letting the cleanup take as long as it
>>> wants to and do whatever it wants? That's how try/finally works in
>>> regular Python code.
>> The problem appears when you add timeouts support.
>> Let me show you an abstract example (I won't use yield_froms, but I'm
>> sure that the problem is the same with them):
>>   @coroutine
>>   def fetch_comments(app):
>>       session = yield app.new_session()
>>       try:
>>            return (yield session.query(...))
>>       finally:
>>            yield session.close()
>> and now we execute that with:
>>   #: Get a list of comments; throw a TimeoutError if it
>>   #: takes more than 1 second
>>   comments = yield fetch_comments(app).with_timeout(1.0)
>> Now, scheduler starts with 'fetch_comments', then executes
>> 'new_session', then executes 'session.query' in a round-robin fashion.
>> Imagine, that database query took a bit less than a second to execute,
>> scheduler pushes the result in coroutine, and then a timeout event occurs.
>> So scheduler throws a 'TimeoutError' in the coroutine, thus preventing
>> the 'session.close' to be executed.  There is no way for a scheduler to
>> understand, that there is no need in pushing the exception right now,
>> as the coroutine is in its finally block.
>> And this situation is a pretty common when you have such timeouts
>> mechanism in place and widely used.
> Ok, I can understand. But still, this is a problem with timeouts in
> general, not just with timeouts in a yield-based environment. How does
> e.g. Twisted deal with this?

I don't know, I hope someone with an expertise in Twisted can tell us.

But I would imagine that they don't have this particular problem, as it
should be related only to coroutines and schedulers that run them.  I.e.
it's a problem when you run some code and may interrupt it.  And you can't
interrupt a plain python code that uses callbacks without yields and
greenlets.

> As a work-around, I could imagine some kind of with-statement that
> tells the scheduler we're already in the finally clause (it could
> still send you a timeout if your cleanup takes way too long):
> try:
>  yield <regular code>
> finally:
>  with protect_finally():
>    yield <cleanup code>
> Of course this could be abused, but at your own risk -- the scheduler
> only gives you a fixed amount of extra time and then it's quits.

Right, that's the basic approach.  But it also gives you a feeling of
a "broken" language feature.  I.e. we have coroutines, but we can not
implement timeouts on top of them without making 'finally' blocks
look ugly.  And if we assume that you can run any coroutine with a
timeout - you'll need to use 'protect_finally' in virtually every
'finally' statement.

I solved the problem by dynamically inlining 'with protect_finally()'
code in @coroutine decorator (something that I would never suggest to
put in the stdlib, btw).  There is also PEP 419, but I don't like it
either, as it is tied to frames--too low level (and I'm not sure how it
will work with future CPython optimizations and PyPy's JIT.)

BUT, the concept is nice.  I've implemented a number of protocols with
yield-coroutines, and managing timeouts with a simple ".with_timeout()"
call is a very handy and readable feature.  So, I hope, that we can
all brainstorm this problem to make coroutines "complete", if we decide
to start using them widely.


From at  Thu Oct 25 01:37:57 2012
From: at (Yury Selivanov)
Date: Wed, 24 Oct 2012 19:37:57 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-24, at 7:25 PM, Steve Dower <Steve.Dower at> wrote:
> Could another workaround be to spawn the cleanup code without yielding - in effect saying "go and do this, but don't come back"? Then there is nowhere for the scheduler to throw the exception.
> I ask because this falls out naturally with my implementation (code is coming, but work is taking priority right now): "do_cleanup()" instead of "yield do_cleanup()". I haven't tried it in this context yet, so no idea whether it works, but I don't see why it wouldn't. In a system without the @async decorator you'd need a "scheduler.current.spawn(do_cleanup)" instead of yield [from]s, but it can still be done.

Well, yes, this will work.

If we have the following:

    # "async()" is a way to launch coroutines in my framework without
    # "coming back"; with it they just return a promise/future that needs
    # to be yielded again
    try:
        ...
    finally:
        yield c.close().async()

The solution is very limited though.  Imagine if you have lots of cleanup code:

    try:
        ...
    finally:
        yield c1.close().async() # go and do this, but don't come back
        yield c2.close().async()

The above won't work, as scheduler would have an opportunity to break
everything on the second 'yield'.

You may solve it by grouping cleanup code in a separate inner coroutine:

    def do_stuff():
        try:
            ...
        finally:
            def cleanup():
                yield c1.close()
                yield c2.close()

            yield cleanup().async() # go and do this, but don't come back

But that looks even worse than using 'with protect_finally()'.


From guido at  Thu Oct 25 01:43:13 2012
From: guido at (Guido van Rossum)
Date: Wed, 24 Oct 2012 16:43:13 -0700
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Oct 24, 2012 at 4:26 PM, Yury Selivanov < at> wrote:
> On 2012-10-24, at 7:12 PM, Guido van Rossum <guido at> wrote:
>> Ok, I can understand. But still, this is a problem with timeouts in
>> general, not just with timeouts in a yield-based environment. How does
>> e.g. Twisted deal with this?

> I don't know, I hope someone with an expertise in Twisted can tell us.
> But I would imagine that they don't have this particular problem, as it
> should be related only to coroutines and schedulers that run them.  I.e.
> it's a problem when you run some code and may interrupt it.  And you can't
> interrupt a plain python code that uses callbacks without yields and
> greenlets.

Well, but in the Twisted world, if a cleanup callback requires more
blocking calls, it has to spawn more deferred callbacks. So I think
they *do* have the problem, unless they don't have a way at all to
constrain the total running time of an action involving cascading
callbacks. Also, they have inlineCallbacks which does use yield.

>> As a work-around, I could imagine some kind of with-statement that
>> tells the scheduler we're already in the finally clause (it could
>> still send you a timeout if your cleanup takes way too long):
>> try:
>>  yield <regular code>
>> finally:
>>  with protect_finally():
>>    yield <cleanup code>
>> Of course this could be abused, but at your own risk -- the scheduler
>> only gives you a fixed amount of extra time and then it's quits.
> Right, that's the basic approach.  But it also gives you a feeling of
> a "broken" language feature.  I.e. we have coroutines, but we can not
> implement timeouts on top of them without making 'finally' blocks
> look ugly.  And if we assume that you can run any coroutine with a
> timeout - you'll need to use 'protect_finally' in virtually every
> 'finally' statement.

I think the problem may be with timeouts, or with doing blocking I/O
in cleanup clauses. I suspect that any system implementing timeouts
has subtle bugs.

> I solved the problem by dynamically inlining 'with protect_finally()'
> code in @coroutine decorator (something that I would never suggest to
> put in the stdlib, btw).  There is also PEP 419, but I don't like it as
> well, as it is tied to frames--too low level (and I'm not sure how it
> will work with future CPython optimizations and PyPy's JIT.)
> BUT, the concept is nice.  I've implemented a number of protocols with
> yield-coroutines, and managing timeouts with a simple ".with_timeout()"
> call is a very handy and readable feature.  So, I hope, that we can
> all brainstorm this problem to make coroutines "complete", if we decide
> to start using them widely.

I think the with-clause is the solution.

Note that in a world with only blocking calls this *can* be a problem
(despite your repeated claims that it's not a problem there) -- a
common approach to giving operations a timeout is sending it a SIGTERM
(which you can easily catch with a signal handler in Python) when the
deadline is over, then sending it more SIGTERM signals every few
seconds until it dies, and sending SIGKILL (which can't be caught) if
it takes too long to die.
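
For the blocking-world case, the escalation described above can be sketched with the standard `subprocess` module. The function name `run_with_deadline` and its `grace` and `retries` parameters are assumptions chosen for the example; on POSIX `Popen.terminate()` sends SIGTERM and `Popen.kill()` sends SIGKILL.

```python
import subprocess
import sys


def run_with_deadline(cmd, deadline, grace=1.0, retries=3):
    """Run cmd; once `deadline` seconds pass, send SIGTERM every `grace`
    seconds (up to `retries` times), then SIGKILL if it is still alive."""
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=deadline)
    except subprocess.TimeoutExpired:
        pass
    for _ in range(retries):
        proc.terminate()                  # polite request; the child may catch it
        try:
            return proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            continue
    proc.kill()                           # cannot be caught by the child
    return proc.wait()


# A child that would sleep far past the deadline.
code = run_with_deadline(
    [sys.executable, "-c", "import time; time.sleep(60)"],
    deadline=0.2,
)
```

A child that installs a SIGTERM handler gets `retries * grace` seconds to finish its own cleanup, which mirrors the "fixed amount of extra time" idea from earlier in the thread.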

--Guido van Rossum

From at  Thu Oct 25 02:00:50 2012
From: at (Yury Selivanov)
Date: Wed, 24 Oct 2012 20:00:50 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-24, at 7:43 PM, Guido van Rossum <guido at> wrote:
> On Wed, Oct 24, 2012 at 4:26 PM, Yury Selivanov < at> wrote:
>> On 2012-10-24, at 7:12 PM, Guido van Rossum <guido at> wrote:
>>> Ok, I can understand. But still, this is a problem with timeouts in
>>> general, not just with timeouts in a yield-based environment. How does
>>> e.g. Twisted deal with this?
>> I don't know, I hope someone with an expertise in Twisted can tell us.
>> But I would imagine that they don't have this particular problem, as it
>> should be related only to coroutines and schedulers that run them.  I.e.
>> it's a problem when you run some code and may interrupt it.  And you can't
>> interrupt a plain python code that uses callbacks without yields and
>> greenlets.
> Well, but in the Twisted world, if a cleanup callback requires more
> blocking calls, it has to spawn more deferred callbacks. So I think
> they *do* have the problem, unless they don't have a way at all to
> constrain the total running time of an action involving cascading
> callbacks. Also, they have inlineCallbacks which does use yield.


Right.

I was under the impression that you don't just use a 'finally' stmt but rather
set up a Deferred with a cleanup callback.  Anyways, I'm now curious enough
so I'll take a look...

> Note that in a world with only blocking calls this *can* be a problem
> (despite your repeated claims that it's not a problem there) -- a
> common approach to giving operations a timeout is sending it a SIGTERM
> (which you can easily catch with a signal handler in Python) when the
> deadline is over, then sending it more SIGTERM signals every few
> seconds until it dies, and sending SIGKILL (which can't be caught) if
> it takes too long to die.

Yes, you're right.  I guess I've just never seen anybody trying to protect
their 'finally' statements from being interrupted with a signal.

Whereas with coroutines we needed to protect lots of them, as otherwise
we had a great many bugs with unclosed database connections etc.

So 'protect_finally' is going to be a very common thing to use.


From at  Thu Oct 25 02:16:31 2012
From: at (Yury Selivanov)
Date: Wed, 24 Oct 2012 20:16:31 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-24, at 8:00 PM, Yury Selivanov < at> wrote:

> On 2012-10-24, at 7:43 PM, Guido van Rossum <guido at> wrote:
>> On Wed, Oct 24, 2012 at 4:26 PM, Yury Selivanov < at> wrote:
>>> On 2012-10-24, at 7:12 PM, Guido van Rossum <guido at> wrote:
>>>> Ok, I can understand. But still, this is a problem with timeouts in
>>>> general, not just with timeouts in a yield-based environment. How does
>>>> e.g. Twisted deal with this?
>>> I don't know, I hope someone with an expertise in Twisted can tell us.
>>> But I would imagine that they don't have this particular problem, as it
>>> should be related only to coroutines and schedulers that run them.  I.e.
>>> it's a problem when you run some code and may interrupt it.  And you can't
>>> interrupt a plain python code that uses callbacks without yields and
>>> greenlets.
>> Well, but in the Twisted world, if a cleanup callback requires more
>> blocking calls, it has to spawn more deferred callbacks. So I think
>> they *do* have the problem, unless they don't have a way at all to
>> constrain the total running time of an action involving cascading
>> callbacks. Also, they have inlineCallbacks which does use yield.
> Right.
> I was under impression that you don't just use 'finally' stmt but rather
> setup a Deferred with a cleanup callback.  Anyways, I'm now curious enough
> so I'll take a look...

Well, that wasn't too hard to find:



From Steve.Dower at  Thu Oct 25 02:24:11 2012
From: Steve.Dower at (Steve Dower)
Date: Thu, 25 Oct 2012 00:24:11 +0000
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

>On 2012-10-24, at 8:00 PM, Yury Selivanov < at> wrote:
>> On 2012-10-24, at 7:43 PM, Guido van Rossum <guido at> wrote:
>>> On Wed, Oct 24, 2012 at 4:26 PM, Yury Selivanov < at> wrote:
>>>> On 2012-10-24, at 7:12 PM, Guido van Rossum <guido at> wrote:
>>>>> Ok, I can understand. But still, this is a problem with timeouts in 
>>>>> general, not just with timeouts in a yield-based environment. How 
>>>>> does e.g. Twisted deal with this?
>>>> I don't know, I hope someone with an expertise in Twisted can tell us.
>>>> But I would imagine that they don't have this particular problem, as 
>>>> it should be related only to coroutines and schedulers that run them.  I.e.
>>>> it's a problem when you run some code and may interrupt it.  And you 
>>>> can't interrupt a plain python code that uses callbacks without 
>>>> yields and greenlets.
>>> Well, but in the Twisted world, if a cleanup callback requires more 
>>> blocking calls, it has to spawn more deferred callbacks. So I think 
>>> they *do* have the problem, unless they don't have a way at all to 
>>> constrain the total running time of an action involving cascading 
>>> callbacks. Also, they have inlineCallbacks which does use yield.
>> Right.
>> I was under impression that you don't just use 'finally' stmt but 
>> rather setup a Deferred with a cleanup callback.  Anyways, I'm now 
>> curious enough so I'll take a look...
>Well, that wasn't too hard to find:

Maybe our approach to timeouts should be based on running two tasks in parallel, where the second delays for the timeout period and then cancels the first (I believe this is what they're doing in Twisted). My vision for cancellation involves the worker task polling (or whatever is appropriate for low-level tasks), rather than an exception being forced in by the scheduler, so this avoids the finally issue - it's too late to cancel the task at that point.

It also strengthens the case for including a cancellation protocol, which I was keen on anyway.
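
A small sketch of that shape, with a worker that polls a shared flag and a second task that sets it after a delay. The `CancellationToken` class, the tick-based `timer` and the round-robin loop are illustrative assumptions, not a proposed protocol.

```python
from collections import deque


class CancellationToken:
    def __init__(self):
        self.cancelled = False

    def cancel(self):
        self.cancelled = True


log = []


def worker(token):
    try:
        for step in range(100):
            if token.cancelled:            # poll only at points that are safe to stop
                log.append("cancelled at step %d" % step)
                return
            yield                          # one unit of work done
    finally:
        # Nothing is ever thrown in from outside, so this always completes.
        log.append("cleanup done")


def timer(token, ticks):
    for _ in range(ticks):
        yield                              # stands in for waiting out the timeout
    token.cancel()


token = CancellationToken()
tasks = deque([worker(token), timer(token, 3)])
while tasks:                               # minimal round-robin scheduler
    task = tasks.popleft()
    try:
        next(task)
    except StopIteration:
        continue
    tasks.append(task)
```

The trade-off is latency: the worker only notices the cancellation the next time it polls.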


From greg.ewing at  Thu Oct 25 02:49:30 2012
From: greg.ewing at (Greg Ewing)
Date: Thu, 25 Oct 2012 13:49:30 +1300
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On 25/10/12 11:43, Guido van Rossum wrote:

> What's the problem with just letting the cleanup take as long as it
> wants to and do whatever it wants?

IIUC, the worry is not about time, it's that either

1) another task could run during the cleanup and mess
something up, or

2) an exception could be thrown into the task during
the cleanup and prevent it being completed.

 From a correctness standpoint, it doesn't matter if the
cleanup takes a long time, as long as it doesn't yield.


From greg.ewing at  Thu Oct 25 02:52:30 2012
From: greg.ewing at (Greg Ewing)
Date: Thu, 25 Oct 2012 13:52:30 +1300
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On 25/10/12 11:47, Yury Selivanov wrote:
> Cleanup code for a DB connection
> *will* need to run queries to the database (at least in some circumstances).

That smells like a design problem to me. If something goes wrong,
the most you should have to do is roll back any transactions
you were in the middle of. Trying to perform further queries
is just inviting more trouble.


From greg.ewing at  Thu Oct 25 02:56:52 2012
From: greg.ewing at (Greg Ewing)
Date: Thu, 25 Oct 2012 13:56:52 +1300
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On 25/10/12 12:12, Guido van Rossum wrote:
> Of course this could be abused, but at your own risk -- the scheduler
> only gives you a fixed amount of extra time and then it's quits.

Which is another good reason to design your cleanup code so that
it can't take an arbitrarily long time.


From at  Thu Oct 25 03:07:45 2012
From: at (Yury Selivanov)
Date: Wed, 24 Oct 2012 21:07:45 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-24, at 8:52 PM, Greg Ewing <greg.ewing at> wrote:

> On 25/10/12 11:47, Yury Selivanov wrote:
>> Cleanup code for a DB connection
>> *will* need to run queries to the database (at least in some circumstances).
> That smells like a design problem to me. If something goes wrong,
> the most you should have to do is roll back any transactions
> you were in the middle of. Trying to perform further queries
> is just inviting more trouble.

Right.  And that rolling back - a tiny db query "rollback" - is async
code, and wherever there is async code, no matter how tiny and fast, the
scheduler has an opportunity to screw it up.

Guido's 'with protect_finally()' should work, although it will probably
look weird to people unfamiliar with coroutines and this
particular problem.


From greg.ewing at  Thu Oct 25 03:29:32 2012
From: greg.ewing at (Greg Ewing)
Date: Thu, 25 Oct 2012 14:29:32 +1300
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On 25/10/12 14:07, Yury Selivanov wrote:
> Right.  And that rolling back - a tiny db query "rollback" - is an
> async code,

Only if we implement it as a blocking operation as far as our
task scheduler is concerned. I wouldn't do it that way -- I'd
perform it synchronously and assume it'll be fast enough for
that not to be a problem.

BTW, we seem to be using different definitions for the term
"query". To my way of thinking, a rollback is *not* a query,
even if it happens to be triggered by sending a "rollback"
command to the SQL interpreter. At the Python API level,
it should appear as a distinct operation with its own
method.


From greg.ewing at  Thu Oct 25 03:34:59 2012
From: greg.ewing at (Greg Ewing)
Date: Thu, 25 Oct 2012 14:34:59 +1300
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On 25/10/12 12:43, Guido van Rossum wrote:
> Note that in a world with only blocking calls this *can* be a problem...
 > a common approach to giving operations a timeout is sending it a SIGTERM

Well, yes, if you have preemptive interruptions of some kind, then
things are a lot trickier. But I'm assuming we're using cooperative
scheduling *instead* of things like that.

(Note that in the face of preemption, I don't think it's possible
to solve this problem completely without language support, because
there will always be a small window of opportunity between
entering the finally clause and getting into the with-statement
or whatever that you're using to block asynchronous signals.)


From at  Thu Oct 25 04:25:16 2012
From: at (Yury Selivanov)
Date: Wed, 24 Oct 2012 22:25:16 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>


On 2012-10-24, at 9:29 PM, Greg Ewing <greg.ewing at> wrote:

> On 25/10/12 14:07, Yury Selivanov wrote:
>> Right.  And that rolling back - a tiny db query "rollback" - is an
>> async code,
> Only if we implement it as a blocking operation as far as our
> task scheduler is concerned. I wouldn't do it that way -- I'd
> perform it synchronously and assume it'll be fast enough for
> that not to be a problem.

In a non-blocking application there is no way of running blocking code,
even if it's anticipated to block for a mere millisecond.  Because if something
gets out of control and it blocks for a longer period of time - everything
just stops, right?

Or did you mean something else with "synchronously" (perhaps Steve Dower's 

> BTW, we seem to be using different definitions for the term
> "query". To my way of thinking, a rollback is *not* a query,
> even if it happens to be triggered by sending a "rollback"
> command to the SQL interpreter. At the Python API level,
> it should appear as a distinct operation with its own
> method.

Right.  I meant the "sending a rollback command to the SQL interpreter"
part--this should be done through a non-blocking socket.  To invoke an
operation on a non-blocking socket we have to do it through 'yield' or
'yield from', and hence give the scheduler a chance to interrupt the coroutine.

Even though we know that the clean-up code should be simple and fast,
it still contains coroutine context switches in real-world code, be it due to
the need to send some information via a socket, or just to call some other
coroutine.  If you write a single 'yield' in your finally block, and that (or
the caller) coroutine is called with a timeout, there is a chance that its 'finally'
block execution will be aborted by the scheduler.  Writing this yield/non-blocking
type of code in finally blocks is a necessity, unfortunately.  And even if that
cleanup code is incredibly fast, if you have a webserver that runs for
days/weeks/months, bad things will happen.
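
The failure mode described above can be reproduced with plain generators.
The sketch below is only an illustration: the Timeout class and the
worker coroutine are hypothetical names, and the "scheduler" is simulated
by calling throw() by hand.

```python
class Timeout(Exception):
    """Hypothetical exception a scheduler throws to enforce a deadline."""


def worker(log):
    try:
        yield "query"
    finally:
        # The cleanup itself needs asynchronous steps, so it yields too.
        yield "rollback"
        log.append("rollback done")
        yield "release"
        log.append("connection released")


def run_demo():
    log = []
    gen = worker(log)
    steps = [next(gen)]                  # runs up to the first yield
    steps.append(gen.throw(Timeout))     # timeout: the finally block starts
    try:
        gen.throw(Timeout)               # second timeout lands inside finally
    except Timeout:
        steps.append("cleanup aborted")
    return steps, log
```

Here run_demo() shows the second throw() escaping from the finally block
before either cleanup step has been recorded, which is the situation a
long-running server would eventually hit.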

So if we decide to adopt Guido's approach of explicitly marking critical
finally blocks (well, they are all critical) with 'with protected_finally()'
- all right.  If we somehow invent a mechanism that would allow us to hide
all this from the user and protect finally blocks implicitly in the scheduler -
that's even better.

Or we should design a totally different approach to handling timeouts, and try
not to interrupt coroutines at all.


From at  Thu Oct 25 04:28:15 2012
From: at (Yury Selivanov)
Date: Wed, 24 Oct 2012 22:28:15 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-24, at 9:34 PM, Greg Ewing <greg.ewing at> wrote:
> (Note that in the face of preemption, I don't think it's possible
> to solve this problem completely without language support, because
> there will always be a small window of opportunity between
> entering the finally clause and getting into the with-statement
> or whatever that you're using to block asynchronous signals.)


In my experience, though, broken finally blocks due to interruption
by a signal are a very rare thing (again, that may be different for
someone else.)


From guido at  Thu Oct 25 04:51:04 2012
From: guido at (Guido van Rossum)
Date: Wed, 24 Oct 2012 19:51:04 -0700
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Oct 24, 2012 at 7:28 PM, Yury Selivanov < at> wrote:
> On 2012-10-24, at 9:34 PM, Greg Ewing <greg.ewing at> wrote:
> [...]
>> (Note that in the face of preemption, I don't think it's possible
>> to solve this problem completely without language support, because
>> there will always be a small window of opportunity between
>> entering the finally clause and getting into the with-statement
>> or whatever that you're using to block asynchronous signals.)
> Agree.
> In my experience, though, broken finally blocks due to interruption
> by a signal is a very rare thing (again, that maybe different for
> someone else.)

We're far from our starting point: in the yield-from (or yield)
world, there are no truly async interrupts, but anything that yields
may be interrupted, if we decide to implement timeouts by throwing an
exception into the generator (which seems the logical thing to do).
The with-statement can deal with this fine (there's no yield between
entering the finally and entering the with-block) but making the
cleanup into its own task (like Steve proposed) sounds fine too.

In any case this sounds like something that each framework should
decide for itself.
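
A minimal sketch of how a 'with protected_finally()' block might cooperate
with a toy scheduler. Nothing here is an existing API: Task, step,
interrupt and the module-level _current variable are assumptions made up
for the illustration, and a real framework could well do this differently.

```python
from contextlib import contextmanager


class Timeout(Exception):
    """Hypothetical exception used by the toy scheduler for timeouts."""


class Task:
    def __init__(self, gen):
        self.gen = gen
        self.protected = 0            # nesting depth of protected blocks
        self.timeout_pending = False  # a timeout arrived while protected


_current = None  # the task being stepped right now (assumption of this sketch)


@contextmanager
def protected_finally():
    task = _current
    task.protected += 1
    try:
        yield
    finally:
        task.protected -= 1


def step(task, exc=None):
    """Resume the task, optionally throwing exc into it."""
    global _current
    _current = task
    try:
        if exc is not None:
            return task.gen.throw(exc)
        return next(task.gen)
    except StopIteration:
        return None
    finally:
        _current = None


def interrupt(task):
    """Deliver a timeout unless the task is inside a protected block."""
    if task.protected:
        task.timeout_pending = True   # remember it, do not throw now
        return None
    return step(task, Timeout())


def worker(log):
    try:
        yield "query"
    finally:
        # No yield happens between entering finally and entering the with.
        with protected_finally():
            yield "rollback"
            log.append("rollback done")


def demo():
    log = []
    task = Task(worker(log))
    log.append(step(task))        # "query"
    log.append(interrupt(task))   # timeout delivered, cleanup starts
    log.append(interrupt(task))   # deferred: the task is protected
    try:
        step(task)                # cleanup finishes, first timeout propagates
    except Timeout:
        log.append("timeout propagated")
    return log, task.timeout_pending
```

The second interrupt is recorded instead of thrown, so the cleanup runs to
completion and only then does the original timeout leave the coroutine.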

--Guido van Rossum (

From Steve.Dower at  Thu Oct 25 05:28:47 2012
From: Steve.Dower at (Steve Dower)
Date: Thu, 25 Oct 2012 03:28:47 +0000
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

This could also be another application for extension options on futures:

    yield do_cleanup_1().set_options(never_raise=True)
    yield do_cleanup_2().set_options(never_raise=True)

The scheduler can then ignore exceptions (including CancelledError) instead of raising them. ('set_scheduler_hint' may be a better name than 'set_options', now I come to think of it. I like the extensibility of this, since I don't think anyone can predict what advanced options every scheduler may want - the function takes **params and updates a (lazily created) dict on the future.)

Of course, this will also work (and is pretty much equivalent):

    try: yield do_cleanup_1()
    except: pass
    try: yield do_cleanup_2()
    except: pass

We'll probably need/want some form of 'atomic' primitive anyway, which might work like this:

    yield atomically(do_cleanup_1, do_cleanup_2, ...)

Though the behaviour of this when exceptions are involved gets complicated - do we abort all of them? Pass the exception on? Continue anyway? Which exception gets reported?
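
A rough sketch of the set_options idea, assuming a bare-bones Future class
and a resume() helper standing in for the scheduler. Every name except
set_options and never_raise is invented for the illustration.

```python
class CancelledError(Exception):
    """Stand-in for the scheduler's cancellation exception."""


class Future:
    def __init__(self):
        self._options = None   # created lazily, as described above
        self.exception = None

    def set_options(self, **params):
        if self._options is None:
            self._options = {}
        self._options.update(params)
        return self            # allows 'yield op().set_options(...)'

    def option(self, name, default=None):
        if self._options is None:
            return default
        return self._options.get(name, default)


def resume(gen, future):
    """What a scheduler might do once 'future' has completed."""
    exc = future.exception
    if exc is not None and not future.option("never_raise", False):
        return gen.throw(exc)
    return gen.send(None)      # the error is ignored for this future


def cleanup(log, first, second):
    yield first.set_options(never_raise=True)
    log.append("after cleanup 1")
    yield second
    log.append("after cleanup 2")


def demo():
    log = []
    gen = cleanup(log, Future(), Future())
    fut = next(gen)
    fut.exception = CancelledError()
    fut = resume(gen, fut)               # suppressed because of never_raise
    fut.exception = CancelledError()
    try:
        resume(gen, fut)                 # no hint, so it is raised
    except CancelledError:
        log.append("cancelled")
    return log
```

Keeping the options in a plain dict means a scheduler can define its own
hints without any change to the future class.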



From at  Thu Oct 25 06:37:57 2012
From: at (Yury Selivanov)
Date: Thu, 25 Oct 2012 00:37:57 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-24, at 10:51 PM, Guido van Rossum <guido at> wrote:

> On Wed, Oct 24, 2012 at 7:28 PM, Yury Selivanov < at> wrote:
>> On 2012-10-24, at 9:34 PM, Greg Ewing <greg.ewing at> wrote:
>> [...]
>>> (Note that in the face of preemption, I don't think it's possible
>>> to solve this problem completely without language support, because
>>> there will always be a small window of opportunity between
>>> entering the finally clause and getting into the with-statement
>>> or whatever that you're using to block asynchronous signals.)
>> Agree.
>> In my experience, though, broken finally blocks due to interruption
>> by a signal is a very rare thing (again, that maybe different for
>> someone else.)
> We're far from our starting point: in a the yield-from (or yield)
> world, there are no truly async interrupts, but anything that yields
> may be interrupted, if we decide to implement timeouts by throwing an
> exception into the generator (which seems the logical thing to do).
> The with-statement can deal with this fine (there's no yield between
> entering the finally and entering the with-block) but making the
> cleanup into its own task (like Steve proposed) sounds fine too.
> In any case this sounds like something that each framework should
> decide for itself.

BTW, is there a way of adding a read-only property to generator objects - 
'in_finally'?  Will it actually slow down things?


From at  Thu Oct 25 08:18:51 2012
From: at (Yury Selivanov)
Date: Thu, 25 Oct 2012 02:18:51 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-25, at 12:37 AM, Yury Selivanov < at> wrote:

> On 2012-10-24, at 10:51 PM, Guido van Rossum <guido at> wrote:
>> On Wed, Oct 24, 2012 at 7:28 PM, Yury Selivanov < at> wrote:
>>> On 2012-10-24, at 9:34 PM, Greg Ewing <greg.ewing at> wrote:
>>> [...]
>>>> (Note that in the face of preemption, I don't think it's possible
>>>> to solve this problem completely without language support, because
>>>> there will always be a small window of opportunity between
>>>> entering the finally clause and getting into the with-statement
>>>> or whatever that you're using to block asynchronous signals.)
>>> Agree.
>>> In my experience, though, broken finally blocks due to interruption
>>> by a signal is a very rare thing (again, that maybe different for
>>> someone else.)
>> We're far from our starting point: in a the yield-from (or yield)
>> world, there are no truly async interrupts, but anything that yields
>> may be interrupted, if we decide to implement timeouts by throwing an
>> exception into the generator (which seems the logical thing to do).
>> The with-statement can deal with this fine (there's no yield between
>> entering the finally and entering the with-block) but making the
>> cleanup into its own task (like Steve proposed) sounds fine too.
>> In any case this sounds like something that each framework should
>> decide for itself.
> BTW, is there a way of adding a read-only property to generator objects - 
> 'in_finally'?  Will it actually slow down things?

Well, I couldn't resist and just implemented a *proof of concept* myself.
The patch is here:

The patch adds 'gi_in_finally' read-only property to generator objects.

There is no observable difference between patched & unpatched python
(latest master) in pybench.

Some small demo:

>>> def spam():
...     try:
...         yield 1
...     finally:
...         yield 2
...     yield 3
>>> gen = spam()
>>> gen.gi_in_finally, gen.send(None), gen.gi_in_finally
(0, 1, 0)
>>> gen.gi_in_finally, gen.send(None), gen.gi_in_finally
(0, 2, 1)
>>> gen.gi_in_finally, gen.send(None), gen.gi_in_finally
(1, 3, 0)
>>> gen.gi_in_finally, gen.send(None), gen.gi_in_finally
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

If we decide to merge this in cpython, then this whole problem with
'finally' statements can be solved (at least for generator-based

What do you think?


From paul at  Thu Oct 25 09:49:06 2012
From: paul at (Paul Colomiets)
Date: Thu, 25 Oct 2012 10:49:06 +0300
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

Hi Yury,

On Thu, Oct 25, 2012 at 9:18 AM, Yury Selivanov < at> wrote:
> Well, I couldn't resist and just implemented a *proof of concept* myself.
> The patch is here:
> The patch adds 'gi_in_finally' read-only property to generator objects.

Why haven't you used my implementation?


From paul at  Thu Oct 25 09:55:44 2012
From: paul at (Paul Colomiets)
Date: Thu, 25 Oct 2012 10:55:44 +0300
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

Hi Guido,

On Thu, Oct 25, 2012 at 2:43 AM, Guido van Rossum <guido at> wrote:
> On Wed, Oct 24, 2012 at 4:26 PM, Yury Selivanov < at> wrote:
>> On 2012-10-24, at 7:12 PM, Guido van Rossum <guido at> wrote:
>>> Ok, I can understand. But still, this is a problem with timeouts in
>>> general, not just with timeouts in a yield-based environment. How does
>>> e.g. Twisted deal with this?
>> I don't know, I hope someone with an expertise in Twisted can tell us.
>> But I would imagine that they don't have this particular problem, as it
>> should be related only to coroutines and schedulers that run them.  I.e.
>> it's a problem when you run some code and may interrupt it.  And you can't
>> interrupt a plain python code that uses callbacks without yields and
>> greenlets.
> Well, but in the Twisted world, if a cleanup callback requires more
> blocking calls, it has to spawn more deferred callbacks. So I think
> they *do* have the problem, unless they don't have a way at all to
> constrain the total running time of an action involving cascading
> callbacks. Also, they have inlineCallbacks which does use yield.

AFAIR, in twisted there is no timeout on coroutine, there is a timeout
on request, which is usually just a socket timeout. So there is no
problem of interrupting the code in arbitrary places.

Another twisted thing, is doing all writes asynchronously with respect
to user code, so if you want to write something and close a connection
for finalization you just call:


And they do not return deferreds, so it returns immediately even if
the socket is not writable at the moment. (IIRC, it never writes right
now, but rather from reactor callback)
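
The write-now, flush-later pattern can be sketched roughly as follows.
This is not Twisted's implementation: BufferedTransport, lose_connection,
on_writable and the sink callable are hypothetical names chosen for the
illustration.

```python
class BufferedTransport:
    """Toy transport: write() only buffers, the event loop flushes later."""

    def __init__(self, sink):
        self._sink = sink          # callable(bytes) -> number of bytes accepted
        self._buffer = bytearray()
        self._closing = False
        self.closed = False

    def write(self, data):
        if self._closing:
            raise RuntimeError("write after lose_connection")
        self._buffer += data       # returns immediately, never blocks

    def lose_connection(self):
        self._closing = True       # close only once the buffer has drained
        if not self._buffer:
            self.closed = True

    def on_writable(self):
        """Called by the event loop when the socket can accept data."""
        if self._buffer:
            sent = self._sink(bytes(self._buffer))
            del self._buffer[:sent]
        if self._closing and not self._buffer:
            self.closed = True


def demo():
    received = bytearray()

    def slow_sink(chunk):
        part = chunk[:4]           # pretend the socket takes 4 bytes at a time
        received.extend(part)
        return len(part)

    transport = BufferedTransport(slow_sink)
    transport.write(b"goodbye!")
    transport.lose_connection()
    states = [transport.closed]
    transport.on_writable()
    states.append(transport.closed)
    transport.on_writable()
    states.append(transport.closed)
    return bytes(received), states
```

Because neither call yields, cleanup code that only writes and closes has
no point at which a scheduler could interrupt it.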


From _ at  Thu Oct 25 13:46:29 2012
From: _ at (Laurens Van Houtven)
Date: Thu, 25 Oct 2012 13:46:29 +0200
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

Sorry, working really long hours these days; just wanted to chime in that
yes, you can call transport.write with large strings, and the reactor will
do the right thing under the hood: loseConnection is the polite way of
dropping a connection, which should wait for all pending writes to finish
etc.


From at  Thu Oct 25 15:37:17 2012
From: at (Yury Selivanov)
Date: Thu, 25 Oct 2012 09:37:17 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-25, at 3:49 AM, Paul Colomiets <paul at> wrote:

> Hi Yury,
> On Thu, Oct 25, 2012 at 9:18 AM, Yury Selivanov < at> wrote:
>> Well, I couldn't resist and just implemented a *proof of concept* myself.
>> The patch is here:
>> The patch adds 'gi_in_finally' read-only property to generator objects.
> Why haven't you used my implementation?

Because it's a different thing.  Yours is a PEP 419 implementation --
'sys.setcleanuphook'.  Mine is a quick hack to add 'gi_in_finally' property
to generators and see how good/bad it is.


From guido at  Thu Oct 25 16:43:20 2012
From: guido at (Guido van Rossum)
Date: Thu, 25 Oct 2012 07:43:20 -0700
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Oct 25, 2012 at 4:46 AM, Laurens Van Houtven <_ at> wrote:
> Sorry, working really long hours these days; just wanted to chime in that
> yes, you can call transport.write with large strings, and the reactor will
> do the right thing under the hood: loseConnection is the polite way of
> dropping a connection, which should wait for all pending writes to finish
> etc.

This seems a decent enough pattern. It also makes it possible to use
one of these things as a substitute for a writable file object, so you
can e.g. use it as sys.stdout or the stream for a

Still, I wonder what happens if the socket/pipe/whatever that is
written to is very slow and the program produces too much data. Does
memory just balloon up, or is there some kind of throttling of the
writer? Or a buffer overflow exception? For a totally general solution
I would at least like to have the *option* of doing synchronous

(I'm asking these questions because I'd like to copy this useful
pattern -- but I want to get the end cases right.)

--Guido van Rossum (

From guido at  Thu Oct 25 16:44:58 2012
From: guido at (Guido van Rossum)
Date: Thu, 25 Oct 2012 07:44:58 -0700
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Oct 25, 2012 at 6:37 AM, Yury Selivanov < at> wrote:
> ]On 2012-10-25, at 3:49 AM, Paul Colomiets <paul at> wrote:
>> Hi Yury,
>> On Thu, Oct 25, 2012 at 9:18 AM, Yury Selivanov < at> wrote:
>>> Well, I couldn't resist and just implemented a *proof of concept* myself.
>>> The patch is here:
>>> The patch adds 'gi_in_finally' read-only property to generator objects.
>> Why haven't you used my implementation?
> Because it's a different thing.  Yours is a PEP 419 implementation --
> 'sys.setcleanuphook'.  Mine is a quick hack to add 'gi_in_finally' property
> to generators and see how good/bad it is.

I feel it's a code smell if you need to use this feature a lot. If you
need it rarely, well, use one of the existing work-arounds.

--Guido van Rossum (

From mark.hackett at  Thu Oct 25 16:48:32 2012
From: mark.hackett at (Mark Hackett)
Date: Thu, 25 Oct 2012 15:48:32 +0100
Subject: [Python-ideas] Enabling man page structure for python
Message-ID: <>

In trying to reduce the repetition of option arguments to python scripts I 
basically needed to allow some structure to the program to be able to be 
automatically mangled so it could be used in

a) the getopt() call
b) the -h (give call usage) option in the program
c) Synopsis subheading in the man page
d) Options subheading in the man page

rather than having to keep all in synch just because someone wanted a "-j" 
option added.

Because it requires a programmed man page creation, Sphinx, pydoc et al 
haven't been really of any use, since they are YAML (Yet Another Markup 
Language) as far as I've been able to tell, not really able to allow runtime 
changes to reflect in document generation. I may have missed how, however...

So I used a dictionary and wrote a program to generate man pages based on that 
dictionary and included function calls to automate the four repetitions above 
into one structure, rather similar to what you need for ArgParse.

A dictionary allowed me to check the ordering, existence and allow optional 
and updatable sections to be used in man page writing. It also gave me a 
reason to use docstrings in public functions.
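
As a rough illustration of the single-structure idea (this is not the
attached program, which is not reproduced here), one dictionary can feed
the getopt() call, the usage line and the OPTIONS section of a man page.
The layout of OPTIONS and the helper names are assumptions made for the
sketch.

```python
import getopt

# One table describing every option; everything else is derived from it.
OPTIONS = {
    "h": {"long": "help", "arg": None, "help": "show usage and exit"},
    "j": {"long": "jobs", "arg": "N", "help": "run N jobs in parallel"},
}


def getopt_spec(options):
    """Build the short and long option specs that getopt.getopt() expects."""
    short = "".join(k + (":" if o["arg"] else "") for k, o in options.items())
    long_ = [o["long"] + ("=" if o["arg"] else "") for o in options.values()]
    return short, long_


def usage(prog, options):
    """Build the one-line usage text printed for -h."""
    parts = []
    for key, opt in options.items():
        if opt["arg"]:
            parts.append("[-%s %s]" % (key, opt["arg"]))
        else:
            parts.append("[-%s]" % key)
    return "usage: %s %s" % (prog, " ".join(parts))


def man_options(options):
    """Build the OPTIONS section of a man page in troff markup."""
    lines = [".SH OPTIONS"]
    for key, opt in options.items():
        flag = "\\-%s, \\-\\-%s" % (key, opt["long"])
        if opt["arg"]:
            flag += " " + opt["arg"]
        lines += [".TP", ".B " + flag, opt["help"]]
    return "\n".join(lines)
```

Adding a new option then means adding one dictionary entry, and the
parser, the usage text and the man page stay in step.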

I know man pages are passe and GUIs generally don't bother at all, but it 
still seems to me that adding some core python utility to express a man page 
and allow programmatic use of the construction both to define the program and 
its description is still a large gap.

Making man pages easier to write would be enough, but I also think that if 
newcomers could see some utility in writing documentation inside the programs, 
they would do so more readily. And this learnt behaviour is useful elsewhere.

The attached program (if it appears!) is my solution, basically baby python. 
It still has one redundant repetition because getopt() does it that way. And 
it has some possibly silly but useful markup based on the basic python data 
types (e.g. it displays a list differently from a scalar string).

It is meant to illustrate what I felt was not possible with python as-is to 
see if there is a way to make this work done redundant.

There are a few other people out there who have had to roll-their-own answer 
to the same problems. They solved it slightly differently and didn't include an 
ability to enforce "good practice" in man page creation which I think is 

So I do feel there is room for python to stop us flailing around trying to find 
our own solution.

Is there agreement from others?
-------------- next part --------------
A non-text attachment was scrubbed...
Type: text/x-python
Size: 13874 bytes
Desc: not available
URL: <>

From mikegraham at  Thu Oct 25 17:09:35 2012
From: mikegraham at (Mike Graham)
Date: Thu, 25 Oct 2012 11:09:35 -0400
Subject: [Python-ideas] Enabling man page structure for python
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Oct 25, 2012 at 10:48 AM, Mark Hackett
<mark.hackett at> wrote:
> Because it requires a programmed man page creation, Sphinx, pydoc et al
> haven't been really of any use, since they are YAML (Yet Another Markup
> Language) as far as I've been able to tell, not really able to allow runtime
> changes to reflect in document generation. I may have missed how, however...

Use to make a manpage with
sphinx. I wouldn't be shocked if other documentation systems had
something similar.

I wouldn't be opposed to having argparse have some builtin or
third-party capability for generating manpages. I wouldn't use getopt
myself for anything but mimicking old, established, getopt-based
interfaces. argparse has a lot more functionality already and it's
more reasonable to expand it since it's a Python thing, not a
pre-established thing.
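
For comparison, a crude sketch of what a man page derived from an argparse
parser could look like, using only format_usage() and format_help(). The
man_page helper and the demo parser are hypothetical, and the help text is
embedded verbatim without any troff escaping, so this is a starting point
rather than a finished generator.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        prog="demo", description="Frobnicate input files.")
    parser.add_argument("-j", "--jobs", type=int, default=1, metavar="N",
                        help="run N jobs in parallel")
    return parser


def man_page(parser, section=1):
    """Wrap argparse's own usage and help text in minimal troff markup."""
    usage = parser.format_usage().strip()
    if usage.startswith("usage: "):
        usage = usage[len("usage: "):]
    lines = [
        ".TH %s %d" % (parser.prog.upper(), section),
        ".SH NAME",
        "%s \\- %s" % (parser.prog, parser.description),
        ".SH SYNOPSIS",
        usage,
        ".SH DESCRIPTION",
        ".nf",                      # keep argparse's own line breaks
        parser.format_help().strip(),
        ".fi",
    ]
    return "\n".join(lines)
```

The option definitions live only in the add_argument() calls, so the
command-line parser and the generated page cannot drift apart.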


From at  Thu Oct 25 17:12:11 2012
From: at (Yury Selivanov)
Date: Thu, 25 Oct 2012 11:12:11 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-25, at 10:44 AM, Guido van Rossum <guido at> wrote:

> On Thu, Oct 25, 2012 at 6:37 AM, Yury Selivanov < at> wrote:
>> ]On 2012-10-25, at 3:49 AM, Paul Colomiets <paul at> wrote:
>>> Hi Yury,
>>> On Thu, Oct 25, 2012 at 9:18 AM, Yury Selivanov < at> wrote:
>>>> Well, I couldn't resist and just implemented a *proof of concept* myself.
>>>> The patch is here:
>>>> The patch adds 'gi_in_finally' read-only property to generator objects.
>>> Why haven't you used my implementation?
>> Because it's a different thing.  Yours is a PEP 419 implementation --
>> 'sys.setcleanuphook'.  Mine is a quick hack to add 'gi_in_finally' property
>> to generators and see how good/bad it is.
> I feel it's a code smell if you need to use this feature a lot. If you
> need it rarely, well, use one of the existing work-arounds.

But the feature isn't going to be used by users directly.  It will be used 
only in scheduler implementations.  Users will just write 'finally' blocks 
and they will work as expected. This just makes coroutines look and behave 
more like ordinary functions.  Isn't it one of our goals--to make it 
convenient and reliable?


From Steve.Dower at  Thu Oct 25 17:28:29 2012
From: Steve.Dower at (Steve Dower)
Date: Thu, 25 Oct 2012 15:28:29 +0000
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

>>> Mine is a quick hack to add 'gi_in_finally' property
>>> to generators and see how good/bad it is.
>> I feel it's a code smell if you need to use this feature a lot. If you
>> need it rarely, well, use one of the existing work-arounds.
>But the feature isn't going to be used by users directly.  It will be used
>only in scheduler implementations.  Users will just write 'finally' blocks
>and they will work as expected. This just makes coroutines look and behave
>more like ordinary functions.  Isn't it one of our goals--to make it
>convenient and reliable?

I agree with the intent, but I'm more worried about the broadness of this approach. What happens in this case?

try:
    try:
        yield some_op()
    finally:
        yield cleanup_that_raises_network_error()
except NetworkError:
    # will we ever see this?

Basically, I don't think we can handle the "don't raise" cases entirely automatically, though I'd like to be able to.


From at  Thu Oct 25 17:39:00 2012
From: at (Yury Selivanov)
Date: Thu, 25 Oct 2012 11:39:00 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-25, at 11:28 AM, Steve Dower <Steve.Dower at> wrote:

>>>> Mine is a quick hack to add 'gi_in_finally' property
>>>> to generators and see how good/bad it is.
>>> I feel it's a code smell if you need to use this feature a lot. If you
>>> need it rarely, well, use one of the existing work-arounds.
>> But the feature isn't going to be used by users directly.  It will be used
>> only in scheduler implementations.  Users will just write 'finally' blocks
>> and they will work as expected. This just makes coroutines look and behave
>> more like ordinary functions.  Isn't it one of our goals--to make it
>> convenient and reliable?
> I'm agree with the intent, but I'm more worried about the broadness of this approach. What happens in this case?
> try:
>    try:
>        yield some_op()
>    finally:
>        yield cleanup_that_raises_network_error()
> except NetworkError:
>    # will we ever see this?
> Basically, I don't think we can handle the "don't raise" cases entirely automatically, though I'd like to be able to.

We can.  You can experiment with the approach--I've implemented it
a bit differently and it proved to work.  Now we're just talking
about making this feature supported on the interpreter level.

As for your example - I'm not sure what the NetworkError is and how
it relates to TimeoutError...

But if you have something like this:

  try:
      try:
          yield some_op().with_timeout(0.1)
      finally:
          yield something_else()
  except TimeoutError:
      # Then everything would be just fine here.

Look, it's all the same as if you just dropped the yields.  Generators
already support the 'finally' clause perfectly.


From Steve.Dower at  Thu Oct 25 17:43:01 2012
From: Steve.Dower at (Steve Dower)
Date: Thu, 25 Oct 2012 15:43:01 +0000
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

>>>>> Mine is a quick hack to add 'gi_in_finally' property to generators 
>>>>> and see how good/bad it is.
>>>> I feel it's a code smell if you need to use this feature a lot. If 
>>>> you need it rarely, well, use one of the existing work-arounds.
>>> But the feature isn't going to be used by users directly.  It will be 
>>> used only in scheduler implementations.  Users will just write 
>>> 'finally' blocks and they will work as expected. This just makes 
>>> coroutines look and behave more like ordinary functions.  Isn't it 
>>> one of our goals--to make it convenient and reliable?
>> I'm agree with the intent, but I'm more worried about the broadness of this approach. What happens in this case?
>> try:
>>    try:
>>        yield some_op()
>>    finally:
>>        yield cleanup_that_raises_network_error()
>> except NetworkError:
>>    # will we ever see this?
>> Basically, I don't think we can handle the "don't raise" cases entirely automatically, though I'd like to be able to.
>We can.  You can experiment with the approach--I've implemented it a bit differently and it proved to work.  Now we're just talking about making this feature supported on the interpreter level.
>As for your example - I'm not sure what's the NetworkError is and how it relates to TimeoutError...
>But if you have something like this:
>  try:
>      try:
>          yield some_op().with_timeout(0.1)
>      finally:
>          yield something_else()
>  except TimeoutError:
>      # Then everything would be just fine here.
>Look, it all the same as if you just drop yields.  Generators already support 'finally' clause perfectly.

The type of the error is irrelevant - if something_else() might raise an exception that is expected, it won't be passed in because the scheduler is suppressing exceptions inside finally blocks. Or perhaps I've misunderstood the point of gi_in_finally?


From at  Thu Oct 25 17:47:57 2012
From: at (Yury Selivanov)
Date: Thu, 25 Oct 2012 11:47:57 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-25, at 11:43 AM, Steve Dower <Steve.Dower at> wrote:

>>>>>> Mine is a quick hack to add 'gi_in_finally' property to generators 
>>>>>> and see how good/bad it is.
>>>>> I feel it's a code smell if you need to use this feature a lot. If 
>>>>> you need it rarely, well, use one of the existing work-arounds.
>>>> But the feature isn't going to be used by users directly.  It will be 
>>>> used only in scheduler implementations.  Users will just write 
>>>> 'finally' blocks and they will work as expected. This just makes 
>>>> coroutines look and behave more like ordinary functions.  Isn't it 
>>>> one of our goals--to make it convenient and reliable?
>>> I'm agree with the intent, but I'm more worried about the broadness of this approach. What happens in this case?
>>> try:
>>>   try:
>>>       yield some_op()
>>>   finally:
>>>       yield cleanup_that_raises_network_error()
>>> except NetworkError:
>>>   # will we ever see this?
>>> Basically, I don't think we can handle the "don't raise" cases entirely automatically, though I'd like to be able to.
>> We can.  You can experiment with the approach--I've implemented it a bit differently and it proved to work.  Now we're just talking about making this feature supported on the interpreter level.
>> As for your example - I'm not sure what's the NetworkError is and how it relates to TimeoutError...
>> But if you have something like this:
>> try:
>>     try:
>>         yield some_op().with_timeout(0.1)
>>     finally:
>>         yield something_else()
>> except TimeoutError:
>>     # Then everything would be just fine here.
>> Look, it all the same as if you just drop yields.  Generators already support 'finally' clause perfectly.
> The type of the error is irrelevant - if something_else() might raise an exception that is expected, it won't be passed in because the scheduler is suppressing exceptions inside finally blocks. Or perhaps I've misunderstood the point of gi_in_finally?

The only thing the scheduler will ever suppress is its *own* intent to
*interrupt* something (until `gi_in_finally` gets back to 0).
Every other exception must be propagated as usual, without even
checking the `gi_in_finally` flag.
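
The "propagated as usual" half of this rule can be checked with plain
generators. In the sketch below the drive() helper and the string
operation names are invented for the illustration: the driver throws an
operation's own failure into the coroutine even though the coroutine is
inside a finally block at that moment.

```python
class NetworkError(Exception):
    """Stand-in for an error raised by an asynchronous operation."""


def coroutine(log):
    try:
        try:
            yield "some_op"
        finally:
            yield "cleanup"
    except NetworkError:
        log.append("NetworkError handled")
    yield "done"


def drive(gen, failures):
    """Toy driver: 'failures' maps an operation name to the error it raises."""
    seen = []
    try:
        op = next(gen)
        while True:
            seen.append(op)
            exc = failures.get(op)
            # An operation's own failure is always delivered, finally or not.
            op = gen.throw(exc) if exc is not None else gen.send(None)
    except StopIteration:
        return seen


def demo():
    log = []
    seen = drive(coroutine(log), {"cleanup": NetworkError()})
    return seen, log
```

The outer except clause does see the error raised by the cleanup step;
only a scheduler-initiated interrupt would be held back.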


From guido at  Thu Oct 25 17:58:54 2012
From: guido at (Guido van Rossum)
Date: Thu, 25 Oct 2012 08:58:54 -0700
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

Yury, please give up this particular issue (trying to patch CPython to
record whether a generator is in a finally clause). I have failed to
explain my reasons why I think it is a bad idea, but you haven't
convinced me it's a good idea, and we have at least two decent
work-arounds. So let me just use the release cycle as an argument:
your patch is a new feature, 3.3 just came out, so it cannot be
introduced until 3.4. I don't want to wait for that.

--Guido van Rossum (

From mark.hackett at  Thu Oct 25 18:08:14 2012
From: mark.hackett at (Mark Hackett)
Date: Thu, 25 Oct 2012 17:08:14 +0100
Subject: [Python-ideas] Enabling man page structure for python
In-Reply-To: <>
References: <>
Message-ID: <>

On Thursday 25 Oct 2012, Mike Graham wrote:
> On Thu, Oct 25, 2012 at 10:48 AM, Mark Hackett
> <mark.hackett at> wrote:
> > Because it requires a programmed man page creation, Sphinx, pydoc et al
> > haven't been really of any use, since they are YAML (Yet Another Markup
> > Language) as far as I've been able to tell, not really able to allow
> > runtime changes to reflect in document generation. I may have missed how,
> > however...
> Use to make a manpage with
> sphinx. I wouldn't be shocked if other documentation systems had
> something similar.
> I wouldn't be opposed to having argparse have some builtin or
> third-party capability for generating manpages. I wouldn't use getopt
> myself for anything but mimicking old, established, getopt-based
> interfaces. argparse has a lot more functionality already and it's
> more reasonable to expand it since it's a Python thing, not a
> pre-established thing.
> Mike

Sphinx allows better formatting control and then translation to troff macros,
but it doesn't help encourage writing those man page sections, nor does it
write them itself. Certainly much of the code would be rendered obsolete by
using Sphinx calls, but producing the man page and reducing the duplication
still won't happen.

For future inclusion, if it were to be included, argparse's method would be
usable for defining the options. I don't know that argparse benefits from having
information about man pages in it, however, so a utility/class/method/include
that can operate on what argparse requires to do the writing of the section(s)
is entirely sensible. This may push argparse to include items that it doesn't use
itself, solely for documentation purposes.

If some methodology for solving this duplication with man page content were
put in future python releases, that same methodology could be written into
home-built code by those who do not yet have access to the latest python at their
work, with at least the sop to their efforts that nobody using their suite will
have to relearn another way of doing it.

e.g. turning the argparse arguments into a getopt() call is pretty trivial if 
you don't have access to the argparse method.
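
A minimal sketch of that single-source idea, assuming a hand-written
option table rather than argparse itself: `OPTIONS`, `getopt_spec` and
`man_options` are illustrative names only, and the troff output is
deliberately bare-bones.

```python
import getopt

# Hypothetical option table: (short, long, takes_argument, help text).
OPTIONS = [
    ("v", "verbose", False, "print progress messages"),
    ("o", "output", True, "write results to FILE"),
]


def getopt_spec(options):
    """Build the short and long option specs that getopt.getopt() expects."""
    short = "".join(s + (":" if arg else "") for s, _, arg, _ in options)
    long_names = [name + ("=" if arg else "") for _, name, arg, _ in options]
    return short, long_names


def man_options(options):
    """Render the same table as a troff OPTIONS section."""
    lines = [".SH OPTIONS"]
    for short, long_name, _, help_text in options:
        lines.append(".TP")
        lines.append(r"\fB\-%s\fR, \fB\-\-%s\fR" % (short, long_name))
        lines.append(help_text)
    return "\n".join(lines)


short, long_names = getopt_spec(OPTIONS)
opts, args = getopt.getopt(["-v", "--output", "out.txt", "input"],
                           short, long_names)
print(opts, args)
print(man_options(OPTIONS))
```

The parser and the man page section are both derived from one table, so
an option added there shows up in both places.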

From at  Thu Oct 25 18:10:32 2012
From: at (Yury Selivanov)
Date: Thu, 25 Oct 2012 12:10:32 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-25, at 11:58 AM, Guido van Rossum <guido at> wrote:

> Yury, please give up this particular issue (trying to patch CPython to
> record whether a generator is in a finally clause). I have failed to
> explain my reasons why I think it is a bad idea, but you haven't
> convinced me it's a good idea, and we have at least two decent
> work-arounds. So let me just use the release cycle as an argument:
> your patch is a new feature, 3.3 just came out, so it cannot be
> introduced until 3.4. I don't want to wait for that.


One question: what do we actually want to get?  What're the goals?

- A specification (PEP?) of how to make stdlib more async-friendly?

- To develop a separate library that may be included in the stdlib
one day?

- And what's your opinion on writing a PEP about making it possible 
to pass a custom socket-factory to stdlib objects?

I'm (and I think it's not just me) a bit lost here, after reading 100s 
of emails on python-ideas.  And I just want to know where to channel my 
energy and expertise ;)


From storchaka at  Thu Oct 25 18:42:19 2012
From: storchaka at (Serhiy Storchaka)
Date: Thu, 25 Oct 2012 19:42:19 +0300
Subject: [Python-ideas] Enabling man page structure for python
In-Reply-To: <>
References: <>
Message-ID: <k6bq5j$h29$>

On 25.10.12 19:08, Mark Hackett wrote:
> But doesn't help encourage and self-write those man page sections.

Try help2man.

From merwok at  Thu Oct 25 19:25:29 2012
From: merwok at (=?UTF-8?B?w4lyaWMgQXJhdWpv?=)
Date: Thu, 25 Oct 2012 13:25:29 -0400
Subject: [Python-ideas] Enabling man page structure for python
In-Reply-To: <>
References: <>
Message-ID: <>


See "argparse: add ability to create a
man page"


From Steve.Dower at  Thu Oct 25 19:39:23 2012
From: Steve.Dower at (Steve Dower)
Date: Thu, 25 Oct 2012 17:39:23 +0000
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

> One question: what do we actually want to get?  What're the goals?
> - A specification (PEP?) of how to make stdlib more async-friendly?
> - To develop a separate library that may be included in the stdlib one day?
> - And what's your opinion on writing a PEP about making it possible to pass a custom socket-factory to stdlib objects?
> I'm (and I think it's not just me) a bit lost here, after reading 100s of emails on python-ideas.  And I just want to know where to channel my energy and expertise ;)

It's not just you, I'm not entirely clear on what we expect to end up with either.

My current view is that we'll get a PEP that defines a convention for user code and an interface for schedulers. Adding *_async() methods to the entire standard library could take a long time and should probably be divided up so we can have really experienced devs on particular areas (e.g. someone on Windows sockets, someone else on Linux sockets, etc.) and may need individual PEPs.

My hope is that the first PEP provides a protocol for users to defer the rest of a task until after some/any operation has completed - I don't really want sockets/networking/files/threads/etc. to leak through at all, though these are all important use cases that need to be tried.

This is the way I'm approaching it, so please let me know if I'm off the mark :)


From guido at  Thu Oct 25 19:58:08 2012
From: guido at (Guido van Rossum)
Date: Thu, 25 Oct 2012 10:58:08 -0700
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Oct 25, 2012 at 9:10 AM, Yury Selivanov < at> wrote:
> One question: what do we actually want to get?  What're the goals?

Good question. I'm still in the requirements gathering phase myself.

> - A specification (PEP?) of how to make stdlib more async-friendly?

That's one of the hopeful goals, but a lot of things need to be
decided before we can start adapting the stdlib. It is also likely
that this will be a process that will take several releases (and may
never finish completely).

> - To develop a separate library that may be included in the stdlib
> one day?

That's one way I am pursuing and I hope others will too.

> - And what's your opinion on writing a PEP about making it possible
> to pass a custom socket-factory to stdlib objects?

That sounds like it might be jumping to a specific solution. I agree
that the stdlib often, unfortunately, couples classes too tightly,
where a class that needs an instance of another class just
instantiates that other class rather than having an instance passed in
(at least as an option). We're doing better with files these days --
most APIs (that I can think of) that work with streams let you pass
one in. So maybe you're on to something. Perhaps, as a step towards
the exploration of this PEP, you could come up with a concrete list of
modules and classes (or other API elements) that you think would
benefit from being able to pass in a socket? Please start another
thread -- python-ideas is fine. I will read it.

> I'm (and I think it's not just me) a bit lost here, after reading 100s
> of emails on python-ideas.  And I just want to know where to channel my
> energy and expertise ;)

Totally understood. I'm overwhelmed myself by the vast array of
options. Still, I have been writing some experimental code myself, and
I am beginning to understand in which direction I'd like to move.

I am thinking of having a strict separation between an event loop, a
task scheduler, specific transports, and protocol implementations.

- The event loop in turn separates into a component that knows how to
poll for I/O (or other) events using the best mechanism available on
the platform, and a part that manages callback functions -- these are
closely tied together, but the idea is that the callback management
part does not have to vary by platform, so only the I/O polling needs
to be platform-specific. Details subject to bikeshedding (I've only
got something working on Linux and OSX so far). One of the
requirements for this event loop is that it should be possible to run
frameworks like Twisted or Tornado using an adapter to it, and it
should also be possible for Twisted/Tornado/etc. to provide their own
event loop (again via some kind of adaptation) to replace the default

- For the task scheduler I am piling all my hopes on PEP-380, i.e.
yield from. I have not found a single thing that is harder to do using
this style than using the PEP-342 yield <future> style, and I really
don't like mixing the two up (despite what Steve Dower says :-). But I
don't want the event loop interface to know about this at all --
however the scheduler has to know about the event loop (at least its
interface). I am currently refactoring my ideas in this area; I think
I'll end up with a Task object that smells a bit like a Future, but
represents a whole stack of generator invocations linked via
yield-from, and which allows suspension of the entire stack at once;
user code only needs to use Tasks when it wants to schedule multiple
activities concurrently, not when it just wants to be able to yield.
(This may be the core insight in favor of PEP 380.)

- Transports (e.g. TCP): I feel like a newbie here. I know sockets
pretty well, but the key is to introduce abstractions that let you
easily replace a transport with a different one -- e.g. TCP vs. pipes
vs. SSL. Twisted clearly has paved the way here -- even if we end up
slicing the abstractions somewhat differently, the road to the optimal
interface has to take the same road that Twisted took -- implement a
simple transport using sockets, then add another transport, refactor
the abstractions to share the commonalities and separate the
differences, then try adding yet another transport, rinse and repeat.
We should provide a bunch of common transports but also let people
build new ones; however, there will probably be way fewer transport
implementations than protocol implementations.

- Protocols (e.g. HTTP): A protocol should ideally be able to work
with any transport (though obviously some protocols require certain
transport extensions -- hopefully we'll have a small hierarchy of
abstract classes defining different transport styles and
capabilities). We should provide a bunch of common protocols (e.g. a
good HTTP client and server) but this is where users will most often
be writing their own -- so the APIs used by protocol implementations
must be documented especially well, the standard protocol
implementations must be examples of excellent coding style, and the
transport implementations should not let protocol implementations get
away with undefined behavior. It would be useful to have explicit
testing support too -- just like there's a WSGI validator, we could
have a protocol validator that acts like a particularly picky
transport. (I found this idea in a library written by Jim Fulton for
Zope, I think it's zope.ngi. It's a valuable idea.)
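
To make the transport/protocol split and the "picky transport" idea a
little more concrete, here is a toy sketch. None of these classes exist
anywhere; `MemoryTransport`, `PickyTransport` and `EchoProtocol` are
invented for illustration, and the real interfaces are still undecided.

```python
class MemoryTransport:
    """Hypothetical minimal transport: records what the protocol writes."""

    def __init__(self):
        self.sent = []

    def write(self, data):
        self.sent.append(data)


class PickyTransport(MemoryTransport):
    """Validator transport: refuses sloppy behaviour from a protocol."""

    def __init__(self):
        super().__init__()
        self.closed = False

    def write(self, data):
        if self.closed:
            raise RuntimeError("write after close")
        if not isinstance(data, bytes):
            raise TypeError("protocols must write bytes")
        super().write(data)

    def close(self):
        self.closed = True


class EchoProtocol:
    """Hypothetical protocol: works with any object that offers write()."""

    def __init__(self, transport):
        self.transport = transport

    def data_received(self, data):
        self.transport.write(data.upper())


protocol = EchoProtocol(PickyTransport())
protocol.data_received(b"hello")
print(protocol.transport.sent)
```

The protocol never learns which transport it got, so the same code can
run against a real transport in production and the picky one in tests.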

I think it's inevitable that the choice of using PEP-380 will be
reflected in the abstract classes defining transports and protocols.
Hopefully we will be able to bridge between the PEP-380 world and
Twisted's world of Deferred somehow -- the event loop is one interface
layer, but I think we can build adapters for the other levels as well
(at least for transports).

One final thought: async WSGI anyone?

--Guido van Rossum (

From at  Thu Oct 25 20:39:18 2012
From: at (Yury Selivanov)
Date: Thu, 25 Oct 2012 14:39:18 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>


Thank you for such a detailed and deep response.  Lots of good thoughts
to digest.

One idea: the scope of the problem is enormous.  It may take
months/years to synchronize all ideas and thoughts by just communicating
over a mailing list without a concrete thing and subject to discuss.
How about you/we create a repository with a draft implementation of
scheduler/io loop/coroutines engine and we simply start tweaking and
discussing that particular design?  That way people will see where
to start the discussion, what's done, and some will even participate?
The goal is not to write production-quality software, but rather to
have a common place to discuss/try things/benchmark etc.  I'm not sure,
but maybe places like bitbucket, where you can have a wiki, issues, and
the actual code, are a better place than a mailing list.

I also think that there's a need to move concurrency-related discussions
to a separate mail-list, as everything else on python-ideas is lost
now.

On 2012-10-25, at 1:58 PM, Guido van Rossum <guido at> wrote:
>> - And what's your opinion on writing a PEP about making it possible
>> to pass a custom socket-factory to stdlib objects?
> That sounds like it might be jumping to a specific solution. I agree
> that the stdlib often, unfortunately, couples classes too tightly,
> where a class that needs an instance of another class just
> instantiates that other class rather than having an instance passed in
> (at least as an option). We're doing better with files these days --
> most APIs (that I can think of) that work with streams let you pass
> one in. So maybe you're on to something. Perhaps, as a step towards
> the exploration of this PEP, you could come up with a concrete list of
> modules and classes (or other API elements) that you think would
> benefit from being able to pass in a socket? Please start another
> thread -- python-ideas is fine. I will read it.

OK, I will, in a week or two.  Need some time for research.

> - For the task scheduler I am piling all my hopes on PEP-380, i.e.
> yield from. I have not found a single thing that is harder to do using
> this style than using the PEP-342 yield <future> style, and I really
> don't like mixing the two up (despite what Steve Dower says :-). But I
> don't want the event loop interface to know about this at all --
> however the scheduler has to know about the event loop (at least its
> interface). I am currently refactoring my ideas in this area; I think
> I'll end up with a Task object that smells a bit like a Future, but
> represents a whole stack of generator invocations linked via
> yield-from, and which allows suspension of the entire stack at once;
> user code only needs to use Tasks when it wants to schedule multiple
> activities concurrently, not when it just wants to be able to yield.
> (This may be the core insight in favor of PEP 380.)

The only problem I have with PEP-380, is that to me it's not entirely 
clear when you should use 'yield' or 'yield from' (please correct me if 
I am wrong).  I'll try to demonstrate it by example:

class Socket:
    def sendall(self, payload):
        f = Future()
        IOLoop.sendall(payload, future=f)
        return f

class SMTP:
    def send(self, s):
        # yield the returned future to the scheduler
        yield self.sock.sendall(s)

# And later:
s = SMTP()
yield from s.send('spam')

Is this (roughly) how you want it all to look?  I.e. using 'yield' to
send a future/task to the scheduler, and 'yield from' to delegate?

If I guessed correctly, and that's how you envision it, I have a question:
What if you decide to refactor 'Socket.sendall' to be a coroutine?
In that case you'd want users to call it 'yield from Socket.sendall', and
not 'yield Socket.sendall'.

Thank you,

From guido at  Thu Oct 25 21:25:06 2012
From: guido at (Guido van Rossum)
Date: Thu, 25 Oct 2012 12:25:06 -0700
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Oct 25, 2012 at 11:39 AM, Yury Selivanov
< at> wrote:
> Thank you for such a detailed and deep response.  Lots of good thoughts
> to digest.

You're welcome.

> One idea: the scope of the problem is enormously big.  It may take
> months/years to synchronize all ideas and thoughts by just communicating
> ideas over mail list without a concrete thing and subject to discuss.
> How about you/we create a repository with a draft implementation of
> scheduler/io loop/coroutines engine and we simply start tweaking and
> discussing that particular design?  That way people will see where
> to start the discussion, what's done, and some will even participate?
> The goal is not to write a production-quality software, but rather to
> have a common place to discuss/try things/benchmark etc.  I'm not sure,
> but maybe places like bitbucket, where you can have a wiki, issues, and
> the actual code is a better place, than a mail-list.

I am currently working on code. Steve Dower has also said he's going
to write some code. I'm just not quite ready to show my code (I need
to do a few more iterations on each component). As long as I can use
Mercurial I'm happy; bitbucket or Google Code Hosting both work fine
for me.

> I also think that there's need to move concurrency-related discussions
> to a separate mail-list, as everything else on python-ideas is lost
> now.

I don't have that problem. You are the one who started a new thread. :-)

If you really want a new mailing list, you can set it up; I'd be happy
to join, but my preference would be to stick it out here; I've seen
too many specialized lists and SIGs dwindle after an initial burst of

> The only problem I have with PEP-380, is that to me it's not entirely
> clear when you should use 'yield' or 'yield from' (please correct me if
> I am wrong).  I'll try to demonstrate it by example:
> class Socket:
>     def sendall(self, payload):
>         f = Future()
>         IOLoop.sendall(payload, future=f)
>         return f
> class SMTP:
>     def send(self, s):
>         ...
>         # yield the returned future to the scheduler
>         yield self.sock.sendall(s)
>         ...
> # And later:
> s = SMTP()
> yield from s.send('spam')
> Is it (roughly) how you want it all to look like?  I.e. using 'yield' to
> send a future/task to the scheduler, and 'yield from' to delegate?

I think that's the style that Steve Dower prefers. Greg Ewing would
rather see all public APIs use yield from, and reserve plain yield
exclusively as an implementation detail of the scheduler. In my own
experimental code I am using Greg's style and it is working out great.
My main reason for taking a hard stance on this is that it would
otherwise be too confusing for users -- should they use yield, yield
from, or a plain call? I'd like to tell them "if it blocks, use yield
from".

BTW, if you haven't read Greg's introduction to this style, here it is
-- worth reading!

> If I guessed correctly, and that's how you envision it, I have a question:
> What if you decide to refactor 'Socket.sendall' to be a coroutine?
> In that case you'd want users to call it 'yield from Socket.sendall', and
> not 'yield Socket.sendall'.

That's why using yield from all the way is better!
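
A toy illustration of "yield from all the way" (needs Python 3.3+). The
function names and the trivial `run` scheduler are invented for this
sketch; the only plain yield sits at the lowest level, where a request
is handed to the scheduler.

```python
def wait_for(request):
    # Lowest level: the single place where a plain yield appears.
    result = yield request
    return result


def sendall(payload):
    # Could be refactored freely; callers still write "yield from".
    sent = yield from wait_for(("send", payload))
    return sent


def smtp_send(message):
    sent = yield from sendall(message.encode("ascii"))
    return "sent %d bytes" % sent


def run(coro):
    """Toy scheduler: answers each ('send', data) request with len(data)."""
    try:
        request = next(coro)
        while True:
            kind, data = request
            request = coro.send(len(data))
    except StopIteration as stop:
        return stop.value


print(run(smtp_send("spam")))
```

Whether `sendall` is one generator or a chain of several, `smtp_send`
does not change, which is the refactoring point made above.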

--Guido van Rossum (

From at  Thu Oct 25 21:36:28 2012
From: at (Yury Selivanov)
Date: Thu, 25 Oct 2012 15:36:28 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-25, at 3:25 PM, Guido van Rossum <guido at> wrote:
> On Thu, Oct 25, 2012 at 11:39 AM, Yury Selivanov
> < at> wrote:

>> One idea: the scope of the problem is enormously big.  It may take
>> months/years to synchronize all ideas and thoughts by just communicating
>> ideas over mail list without a concrete thing and subject to discuss.
>> How about you/we create a repository with a draft implementation of
>> scheduler/io loop/coroutines engine and we simply start tweaking and
>> discussing that particular design?  That way people will see where
>> to start the discussion, what's done, and some will even participate?
>> The goal is not to write a production-quality software, but rather to
>> have a common place to discuss/try things/benchmark etc.  I'm not sure,
>> but maybe places like bitbucket, where you can have a wiki, issues, and
>> the actual code is a better place, than a mail-list.
> I am currently working on code. Steve Dower has also said he's going
> to write some code. I'm just not quite ready to show my code (I need
> to do a few more iterations on each component). As long as I can use
> Mercurial I'm happy; bitbucket or Google Code Hosting both work fine
> for me.

OK.  Let's wait until we have a somewhat stable platform to work with.

>> Is it (roughly) how you want it all to look like?  I.e. using 'yield' to
>> send a future/task to the scheduler, and 'yield from' to delegate?
> I think that's the style that Steve Dower prefers. Greg Ewing would
> rather see all public APIs use yield from, and reserve plain yield
> exclusively as an implementation detail of the scheduler. In my own
> experimental code I am using Greg's style and it is working out great.
> My main reason for taking a hard stance on this is that it would
> otherwise be too confusing for users -- should they use yield, yield
> from, or a plain call? I'd like to tell them "if it blocks, use yield
> from".
> BTW, if you haven't read Greg's introduction to this style, here it is
> -- worth reading!
>> If I guessed correctly, and that's how you envision it, I have a question:
>> What if you decide to refactor 'Socket.sendall' to be a coroutine?
>> In that case you'd want users to call it 'yield from Socket.sendall', and
>> not 'yield Socket.sendall'.
> That's why using yield from all the way is better!

Yes, that now makes sense!  
I'll definitely take a look at Greg's article.


From tjreedy at  Thu Oct 25 22:39:40 2012
From: tjreedy at (Terry Reedy)
Date: Thu, 25 Oct 2012 16:39:40 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <k6c82h$hrn$>

On 10/25/2012 12:10 PM, Yury Selivanov wrote:

> - And what's your opinion on writing a PEP about making it possible
> to pass a custom socket-factory to stdlib objects?

I think this is probably a good idea quite aside from async issues. For 
one thing, it would make testing with a mock-socket class easier. Issues 
to decide: name of parameter (should be same for all socket using 
classes); keyword only? (ditto).
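
For the sake of discussion, a keyword-only parameter might look like the
sketch below. `LineClient`, `FakeSocket` and the parameter name
`socket_factory` are placeholders, not a proposal for any particular
stdlib class.

```python
import socket


class LineClient:
    """Hypothetical client that accepts a keyword-only socket factory."""

    def __init__(self, host, port, *, socket_factory=socket.socket):
        self.address = (host, port)
        self._socket_factory = socket_factory
        self.sock = None

    def connect(self):
        self.sock = self._socket_factory()
        self.sock.connect(self.address)

    def send_line(self, text):
        self.sock.sendall(text.encode("ascii") + b"\r\n")


class FakeSocket:
    """Stand-in used for testing; records instead of doing network I/O."""

    def __init__(self):
        self.connected_to = None
        self.sent = b""

    def connect(self, address):
        self.connected_to = address

    def sendall(self, data):
        self.sent += data


client = LineClient("example.invalid", 25, socket_factory=FakeSocket)
client.connect()
client.send_line("HELO")
print(client.sock.connected_to, client.sock.sent)
```

Existing callers that never pass the argument keep getting
`socket.socket`, so the addition is backwards compatible.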

I am not sure this needs a PEP. Most parameter additions are just 
tracker issues. But it would be worthwhile to decide on the details here
first.

Terry Jan Reedy

From at  Thu Oct 25 22:51:09 2012
From: at (Yury Selivanov)
Date: Thu, 25 Oct 2012 16:51:09 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <k6c82h$hrn$>
References: <>
Message-ID: <>

On 2012-10-25, at 4:39 PM, Terry Reedy <tjreedy at> wrote:

> On 10/25/2012 12:10 PM, Yury Selivanov wrote:
>> - And what's your opinion on writing a PEP about making it possible
>> to pass a custom socket-factory to stdlib objects?
> I think this is probably a good idea quite aside from async issues. For one thing, it would make testing with a mock-socket class easier. Issues to decide: name of parameter (should be same for all socket using classes); keyword only? (ditto).

Right, good catch on mocking sockets!

As for the issues: I think that the parameter name should be the 
same/very consistent, and surely keyword-only.

> I am not sure this needs a PEP. Most parameter additions are just tracker issues. But it would be worthwhile to decide on the details here first.

We'll see.  I'll start with a detailed post on python-ideas, and
if a PEP looks like overkill - I'd be glad to skip the PEP step.


From tjreedy at  Thu Oct 25 23:06:17 2012
From: tjreedy at (Terry Reedy)
Date: Thu, 25 Oct 2012 17:06:17 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <k6c9ke$ve6$>

On 10/25/2012 4:51 PM, Yury Selivanov wrote:
> On 2012-10-25, at 4:39 PM, Terry Reedy
> <tjreedy at> wrote:
>> On 10/25/2012 12:10 PM, Yury Selivanov wrote:
>>> - And what's your opinion on writing a PEP about making it
>>> possible to pass a custom socket-factory to stdlib objects?
>> I think this is probably a good idea quite aside from async issues.
>> For one thing, it would make testing with a mock-socket class
>> easier. Issues to decide: name of parameter (should be same for all
>> socket using classes); keyword only? (ditto).
> Right, good catch on mocking sockets!
> As for the issues: I think that the parameter name should be the
> same/very consistent, and surely keyword-only.

I left out the following issue: should the argument be a 
socket-returning callable (a 'socket-factory' as you called it above) or 
an opened socket?

For files, we variously pass file names to be used with the default 
opener, opened files, and file descriptors, but never an alternate 
opener (such as StringIO). One reason is that the user typically needs a
handle on the file object in order to later retrieve the contents.

I am not sure that the same applies to sockets. If I ask the ftp module 
to get or send a file, I should not ever need to see the socket used for 
the transport.

Terry Jan Reedy

From guido at  Thu Oct 25 23:12:57 2012
From: guido at (Guido van Rossum)
Date: Thu, 25 Oct 2012 14:12:57 -0700
Subject: [Python-ideas] Async API
In-Reply-To: <k6c9ke$ve6$>
References: <>
Message-ID: <>

Please start a new thread for this sub-topic.

Note that for mocking, you won't need to pass in a socket object; you
can just mock out socket.socket() directly using Michael Foord's
all-singing all-dancing unittest.mock module (now in the Python 3

On Thu, Oct 25, 2012 at 2:06 PM, Terry Reedy <tjreedy at> wrote:
> On 10/25/2012 4:51 PM, Yury Selivanov wrote:
>> On 2012-10-25, at 4:39 PM, Terry Reedy
>> <tjreedy at> wrote:
>>> On 10/25/2012 12:10 PM, Yury Selivanov wrote:
>>>> - And what's your opinion on writing a PEP about making it
>>>> possible to pass a custom socket-factory to stdlib objects?
>>> I think this is probably a good idea quite aside from async issues.
>>> For one thing, it would make testing with a mock-socket class
>>> easier. Issues to decide: name of parameter (should be same for all
>>> socket using classes); keyword only? (ditto).
>> Right, good catch on mocking sockets!
>> As for the issues: I think that the parameter name should be the
>> same/very consistent, and surely keyword-only.
> I left out the following issue: should the argument be a socket-returning
> callable (a 'socket-factory' as you called it above) or an opened socket?
> For files, we variously pass file names to be used with the default opener,
> opened files, and file descriptors, but never an alternate opener (such as
> StringIO). One reason is that the user typically needs a handle on the file
> object in order to later retrieve the contents.
> I am not sure that the same applies to sockets. If I ask the ftp module to
> get or send a file, I should not ever need to see the socket used for the
> transport.
> --
> Terry Jan Reedy

--Guido van Rossum (

From greg.ewing at  Thu Oct 25 23:22:40 2012
From: greg.ewing at (Greg Ewing)
Date: Fri, 26 Oct 2012 10:22:40 +1300
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

If the main concern in all of this is timeouts, it should be
possible to address that without adding any more interpreter

For example, when a timeout exception is thrown, whatever is
responsible for that can flag the task as being in the process
of handling a timeout, and refrain from initiating any more
timeouts until that flag is cleared.
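
Roughly, and only as a sketch: the `Task` wrapper, the
`handling_timeout` flag and the explicit `timeout_handled()` call are
all assumptions made for this example, since when exactly the flag gets
cleared is left open above.

```python
class Task:
    """Hypothetical task wrapper holding the 'handling a timeout' flag."""

    def __init__(self, gen):
        self.gen = gen
        self.handling_timeout = False

    def timeout(self):
        """Throw TimeoutError into the task unless one is being handled."""
        if self.handling_timeout:
            return False              # refrain from a second timeout
        self.handling_timeout = True
        self.gen.throw(TimeoutError())
        return True

    def timeout_handled(self):
        self.handling_timeout = False


def worker(log):
    try:
        yield "working"
    except TimeoutError:
        log.append("timeout seen")
        yield "cleaning up"           # the cleanup itself suspends
        log.append("cleanup done")
    yield "finished"


log = []
task = Task(worker(log))
next(task.gen)                        # run to the first yield
first = task.timeout()                # delivered, cleanup starts
second = task.timeout()               # held back, cleanup is in progress
next(task.gen)                        # let the cleanup finish
task.timeout_handled()
print(first, second, log)
```

The second timeout is dropped rather than thrown into the middle of the
cleanup, and no interpreter support is required.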


From greg.ewing at  Thu Oct 25 23:30:07 2012
From: greg.ewing at (Greg Ewing)
Date: Fri, 26 Oct 2012 10:30:07 +1300
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

Steve Dower wrote:
> The type of the error is irrelevant - if something_else() might raise an
> exception that is expected, it won't be passed in because the scheduler is
> suppressing exceptions inside finally blocks. Or perhaps I've misunderstood the
> point of gi_in_finally?

IIUC, it's only *asynchronous* exceptions that would be blocked --
i.e. ones thrown in from a different task, or arising from an
external event such as a timeout. An exception raised explicitly
by the task's own code would be unaffected.


From at  Fri Oct 26 01:50:52 2012
From: at (Yury Selivanov)
Date: Thu, 25 Oct 2012 19:50:52 -0400
Subject: [Python-ideas]
Message-ID: <>


I remember a discussion to make pointed to py3k docs by
default.

Are we still going to do that?


From andrew.svetlov at  Fri Oct 26 08:29:23 2012
From: andrew.svetlov at (Andrew Svetlov)
Date: Fri, 26 Oct 2012 09:29:23 +0300
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
Message-ID: <>

+1 for switching default

On Fri, Oct 26, 2012 at 2:50 AM, Yury Selivanov < at> wrote:
> Hi,
> I remember a discussion to make pointed to py3k docs by
> default.
> Are we still going to do that?
> -
> Yury

Andrew Svetlov

From ncoghlan at  Fri Oct 26 10:47:11 2012
From: ncoghlan at (Nick Coghlan)
Date: Fri, 26 Oct 2012 18:47:11 +1000
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
Message-ID: <>

Eventually, but not just yet :)

Definitely by 3.4, but maybe earlier if it seems appropriate.


Sent from my phone, thus the relative brevity :)
On Oct 26, 2012 9:51 AM, "Yury Selivanov" < at> wrote:

> Hi,
> I remember a discussion to make pointed to py3k docs by
> default.
> Are we still going to do that?
> -
> Yury

From fetchinson at  Fri Oct 26 11:22:30 2012
From: fetchinson at (Daniel Fetchinson)
Date: Fri, 26 Oct 2012 11:22:30 +0200
Subject: [Python-ideas] list of reserved identifiers in program?
Message-ID: <>

Hi folks,

Would it be a good idea to have a built-in list of strings containing
the reserved identifiers of python such as 'assert', 'import', etc?

The reason I think this would be useful is that whenever I write a
class with user defined methods I always have to exclude the reserved
keywords. So for instance myinstance.mymethod( ) is okay but
myinstance.assert( ) is not. In these cases I use the convention
myinstance._assert( ), etc. In order to test for these cases I hard
code the keywords in a list and test from there. I take the list of
keywords from
But what if these change in the future?

So if I would have a built-in list containing all the keywords of the
given interpreter version in question my life would be that much

What do you think?


Psss, psss, put it down! -

From christian at  Fri Oct 26 11:28:55 2012
From: christian at (Christian Heimes)
Date: Fri, 26 Oct 2012 11:28:55 +0200
Subject: [Python-ideas] list of reserved identifiers in program?
In-Reply-To: <>
References: <>
Message-ID: <k6dl4m$1m2$>

Am 26.10.2012 11:22, schrieb Daniel Fetchinson:
> Hi folks,
> Would it be a good idea to have a built-in list of strings containing
> the reserved identifiers of python such as 'assert', 'import', etc?

Something like ? :)


From fetchinson at  Fri Oct 26 11:34:44 2012
From: fetchinson at (Daniel Fetchinson)
Date: Fri, 26 Oct 2012 11:34:44 +0200
Subject: [Python-ideas] list of reserved identifiers in program?
In-Reply-To: <k6dl4m$1m2$>
References: <>
Message-ID: <>

>> Would it be a good idea to have a built-in list of strings containing
>> the reserved identifiers of python such as 'assert', 'import', etc?
> Something like
> ? :)

Exactly! Thanks a lot, I did not know about it before!


Psss, psss, put it down! -

From rob.cliffe at  Fri Oct 26 11:33:47 2012
From: rob.cliffe at (Rob Cliffe)
Date: Fri, 26 Oct 2012 10:33:47 +0100
Subject: [Python-ideas] list of reserved identifiers in program?
In-Reply-To: <>
References: <>
Message-ID: <>

On 26/10/2012 10:22, Daniel Fetchinson wrote:
> Hi folks,
> Would it be a good idea to have a built-in list of strings containing
> the reserved identifiers of python such as 'assert', 'import', etc?
> The reason I think this would be useful is that whenever I write a
> class with user defined methods I always have to exclude the reserved
> keywords. So for instance myinstance.mymethod( ) is okay but
> myinstance.assert( ) is not. In these cases I use the convention
> myinstance._assert( ), etc. In order to test for these cases I hard
> code the keywords in a list and test from there. I take the list of
> keywords from
> But what if these change in the future?
> So if I would have a built-in list containing all the keywords of the
> given interpreter version in question my life would be that much
> easier.
> What do you think?
> Cheers,
> Daniel
 >>> import keyword
 >>> keyword.kwlist
['and', 'as', 'assert', 'break', 'class', 'continue', 'def', 'del', 
'elif', 'else', 'except', 'exec', 'finally', 'for', 'from', 'global', 
'if', 'import', 'in', 'is', 'lambda', 'not', 'or', 'pass', 'print', 
'raise', 'return', 'try', 'while', 'with', 'yield']
Rob Cliffe

From steve at  Fri Oct 26 11:42:25 2012
From: steve at (Steven D'Aprano)
Date: Fri, 26 Oct 2012 20:42:25 +1100
Subject: [Python-ideas] list of reserved identifiers in program?
In-Reply-To: <>
References: <>
Message-ID: <>

On 26/10/12 20:22, Daniel Fetchinson wrote:
> Hi folks,
> Would it be a good idea to have a built-in list of strings containing
> the reserved identifiers of python such as 'assert', 'import', etc?
> The reason I think this would be useful is that whenever I write a
> class with user defined methods I always have to exclude the reserved
> keywords. So for instance myinstance.mymethod( ) is okay but
> myinstance.assert( ) is not. In these cases I use the convention
> myinstance._assert( ), etc.

The usual convention is that leading underscores are private, and
trailing underscores are used to avoid name clashes with reserved words.

So myinstance.assert_ rather than myinstance._assert, which would be
considered "private, do not use".
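
A minimal sketch of that convention, assuming only the standard keyword
module. safe_name is a hypothetical helper for illustration, not something
the stdlib provides:

```python
import keyword


def safe_name(name):
    """Return name, with a trailing underscore if it is a reserved word."""
    # keyword.iskeyword() consults the keyword list of the running
    # interpreter, so nothing has to be hard-coded per Python version.
    return name + "_" if keyword.iskeyword(name) else name


print(safe_name("assert"))    # assert_
print(safe_name("mymethod"))  # mymethod
```

Because the check is done against the running interpreter, the same helper
keeps working if the keyword list changes in a later release.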


From mark.hackett at  Fri Oct 26 11:58:51 2012
From: mark.hackett at (Mark Hackett)
Date: Fri, 26 Oct 2012 10:58:51 +0100
Subject: [Python-ideas] list of reserved identifiers in program?
In-Reply-To: <>
References: <>
Message-ID: <>

On Friday 26 Oct 2012, Steven D'Aprano wrote:
> On 26/10/12 20:22, Daniel Fetchinson wrote:
> > Hi folks,
> >
> > Would it be a good idea to have a built-in list of strings containing
> > the reserved identifiers of python such as 'assert', 'import', etc?
> >
> > The reason I think this would be useful is that whenever I write a
> > class with user defined methods I always have to exclude the reserved
> > keywords. So for instance myinstance.mymethod( ) is okay but
> > myinstance.assert( ) is not. In these cases I use the convention
> > myinstance._assert( ), etc.
> The usual convention is that leading underscores are private, and
> trailing underscores are used to avoid name clashes with reserved words.
> So myinstance.assert_ rather than myinstance._assert, which would be
> considered "private, do not use".

One story I heard about development was of a site whose code included, in an
early C++ header,

#define private public

If users REALLY want to use a function you thought was private, they will.

Convention works just as well without having people go to extreme lengths to 
avoid it (where their use case makes it beneficial).

From sturla at  Fri Oct 26 12:27:03 2012
From: sturla at (Sturla Molden)
Date: Fri, 26 Oct 2012 12:27:03 +0200
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On 23.10.2012 21:33, Yury Selivanov wrote:

 >      topics =[
 >          FE.publication_date,
 >          FE.body,
 >          FE.category,
 >          (FE.creator, [
 >              (FE.creator.subject, [
 >                  (gpi, [
 >                      gpi.avatar
 >                  ])
 >              ])
 >          ])
 >      ]).filter(FE.publication_date<,
 >                FE.category == self.category)

Why use Python when you clearly want Java?


From ned at  Fri Oct 26 13:55:06 2012
From: ned at (Ned Batchelder)
Date: Fri, 26 Oct 2012 07:55:06 -0400
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
Message-ID: <>

On 10/25/2012 7:50 PM, Yury Selivanov wrote:
> Hi,
> I remember a discussion to make 
> <> pointed to py3k docs by default.
> Are we still going to do that?

Before we do anything to make py3 the default, let's please provide a 
navigation bar that shows the version, and makes it easy to switch 
between versions?  Py2 is still vastly more used.


> -
> Yury

From kristjan at  Fri Oct 26 14:03:32 2012
From: kristjan at (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=)
Date: Fri, 26 Oct 2012 12:03:32 +0000
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

> -----Original Message-----
> From: Python-ideas [mailto:python-ideas-
> at] On Behalf Of Sam Rushing
> Sent: 23 October 2012 23:01
> To: Yury Selivanov
> Cc: python-ideas at
> Subject: Re: [Python-ideas] Async API
> In shrapnel it is simply:
>     coro.with_timeout (<seconds>, <fun>, *args, **kwargs)
> Timeouts are caught thus:
>    try:
>       coro.with_timeout (...)
>    except coro.TimeoutError:
>       ...

Hi Sam (I remember our talk about Shrapnel here at CCP some years back),
others:

Jumping in here with some random stuff, in case anyone cares:

A few years ago, I started trying to create a standard library for Stackless Python.
We use it internally at CCP and it is open source, at

What it provides is
1) some utility classes for stackless (context managers mostly) but also synchronization primitives.
2) a basic "main" functionality:  A main loop and an event scheduler
3) a set of replacement modules for threading/socket, etc
4) Monkeypatching tools, to monkeypatch in the replacements, and even run monkeypatched scripts.

On the basis of the event scheduler, I also implemented timeouts for socket.receive() functions.  These are also used to allow e.g. timeouts for locking operations.

Timeouts are indeed implemented as exceptions raised.  There are some minor race issues to think about but that's it.

Notice the need for a stacklesslib.main module.  The issue I have found with this sort of event driven model, is that composability suffers when everyone has their own idea about what a "main" loop should be.  In threaded programming, the OS provides the main loop and the event scheduler.  For something like Python, a whole application has to agree on what the main loop is, and how to schedule future events.  Hopefully this discussion is an attempt to settle that in a standard manner.
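
To make the "main loop plus event scheduler" idea concrete, here is a minimal
sketch. It is not stacklesslib's actual code: Scheduler, call_later and run
are hypothetical names, and a virtual clock stands in for real time so the
example finishes instantly.

```python
import heapq
import itertools


class Scheduler:
    """Toy event scheduler: callbacks ordered by a virtual due time."""

    def __init__(self):
        self._queue = []
        self._counter = itertools.count()  # tie-breaker for equal due times
        self.now = 0.0                     # virtual clock, advanced by run()

    def call_later(self, delay, func, *args):
        entry = (self.now + delay, next(self._counter), func, args)
        heapq.heappush(self._queue, entry)

    def run(self):
        # The "main loop": pop events in time order until none remain.
        while self._queue:
            due, _, func, args = heapq.heappop(self._queue)
            self.now = due
            func(*args)


log = []
sched = Scheduler()
sched.call_later(2.0, log.append, "second")
sched.call_later(1.0, log.append, "first")
sched.run()
print(log)  # ['first', 'second']
```

The composability problem shows up as soon as two libraries each bring their
own Scheduler: only one run() loop can own the thread at a time.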


p.s. stacklesslib is in a state of protracted and procrastinated development.  I promised that I would fix it up at last pycon.  Mostly I'm working on restructuring and making the main loop work more "out of the box."

From jstpierre at  Fri Oct 26 14:52:49 2012
From: jstpierre at (Jasper St. Pierre)
Date: Fri, 26 Oct 2012 08:52:49 -0400
Subject: [Python-ideas] list of reserved identifiers in program?
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 26, 2012 at 5:58 AM, Mark Hackett
<mark.hackett at> wrote:
> On Friday 26 Oct 2012, Steven D'Aprano wrote:
>> On 26/10/12 20:22, Daniel Fetchinson wrote:
>> > Hi folks,
>> >
>> > Would it be a good idea to have a built-in list of strings containing
>> > the reserved identifiers of python such as 'assert', 'import', etc?
>> >
>> > The reason I think this would be useful is that whenever I write a
>> > class with user defined methods I always have to exclude the reserved
>> > keywords. So for instance myinstance.mymethod( ) is okay but
>> > myinstance.assert( ) is not. In these cases I use the convention
>> > myinstance._assert( ), etc.
>> The usual convention is that leading underscores are private, and
>> trailing underscores are used to avoid name clashes with reserved words.
>> So myinstance.assert_ rather than myinstance._assert, which would be
>> considered "private, do not use".
> One story I heard about development was of a site whose code included, in an
> early C++ header,
> #define private public
> If users REALLY want to use a function you thought was private, they will.
> Convention works just as well without having people go to extreme lengths to
> avoid it (where their use case makes it beneficial).

I use it more as a guarantee. Any API that you mark as private can and
will break in the future, and is not covered by any stability promise.
If they really need to do some awfulness that my library can help out
with, sure, they can hack up the private API, but they're on their
own.



From at  Fri Oct 26 15:26:29 2012
From: at (Yury Selivanov)
Date: Fri, 26 Oct 2012 09:26:29 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-26, at 6:27 AM, Sturla Molden <sturla at> wrote:

> On 23.10.2012 21:33, Yury Selivanov wrote:
> >      topics =[
> >          FE.publication_date,
> >          FE.body,
> >          FE.category,
> >          (FE.creator, [
> >              (FE.creator.subject, [
> >                  (gpi, [
> >                      gpi.avatar
> >                  ])
> >              ])
> >          ])
> >      ]).filter(FE.publication_date<,
> >                FE.category == self.category)
> Why use Python when you clearly want Java?

And why do you think so? ;)


From itamar at  Fri Oct 26 16:03:56 2012
From: itamar at (Itamar Turner-Trauring)
Date: Fri, 26 Oct 2012 10:03:56 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On Thu, Oct 25, 2012 at 10:43 AM, Guido van Rossum <guido at> wrote:

> On Thu, Oct 25, 2012 at 4:46 AM, Laurens Van Houtven <_ at> wrote:
> > Sorry, working really long hours these days; just wanted to chime in that
> > yes, you can call transport.write with large strings, and the reactor will
> > do the right thing under the hood: loseConnection is the polite way of
> > dropping a connection, which should wait for all pending writes to finish
> > etc.
> This seems a decent enough pattern. It also makes it possible to use
> one of these things as a substitute for a writable file object, so you
> can e.g. use it as sys.stdout or the stream for a
> logging.StreamHandler.
> Still, I wonder what happens if the socket/pipe/whatever that is
> written to is very slow and the program produces too much data. Does
> memory just balloon up, or is there some kind of throttling of the
> writer? Or a buffer overflow exception? For a totally general solution
> I would at least like to have the *option* of doing synchronous
> writes.
> (I'm asking these questions because I'd like to copy this useful
> pattern -- but I want to get the end cases right.)

There's a callback that gets called saying "your buffer is too full". This
is the producer/consumer API people have referred to. It's not the best API
in the world, and Glyph is working on an improvement, but that's the basic
idea. The general move is towards a push API - push as much data as you can
until you're told to stop.

Tornado has a "tell me when this write is removed from the buffer and
actually written to the socket" callback. This is more of a pull approach;
you write some data, and get notified when you should write some more.
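
A rough sketch of the push style with a "buffer is too full" callback. This is
a toy model, not Twisted's producer/consumer API or Tornado's write callback:
BufferedTransport, Producer, pause, resume and drain are all hypothetical
names, and drain() stands in for the socket becoming writable.

```python
class Producer:
    """Toy producer that remembers whether it has been told to stop."""

    def __init__(self):
        self.paused = False

    def pause(self):
        self.paused = True

    def resume(self):
        self.paused = False


class BufferedTransport:
    """Toy transport: buffers writes and tells the producer when to pause."""

    def __init__(self, producer, high_water=3):
        self.producer = producer
        self.high_water = high_water
        self.buffer = []

    def write(self, chunk):
        self.buffer.append(chunk)
        if len(self.buffer) >= self.high_water:
            self.producer.pause()  # "your buffer is too full"

    def drain(self):
        # Pretend the socket accepted everything that was buffered.
        sent, self.buffer = self.buffer, []
        self.producer.resume()
        return sent


producer = Producer()
transport = BufferedTransport(producer)
written = 0
for chunk in range(10):
    transport.write(chunk)  # push until told to stop
    written += 1
    if producer.paused:
        break

print(written, producer.paused)  # 3 True
sent = transport.drain()
print(sent, producer.paused)     # [0, 1, 2] False
```

Memory stays bounded only if the producer actually honours the pause; a
writer that ignores it will still balloon the buffer.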


From itamar at  Fri Oct 26 16:12:15 2012
From: itamar at (Itamar Turner-Trauring)
Date: Fri, 26 Oct 2012 10:12:15 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Oct 24, 2012 at 7:43 PM, Guido van Rossum <guido at> wrote:

> > I don't know, I hope someone with an expertise in Twisted can tell us.
> >
> > But I would imagine that they don't have this particular problem, as it
> > should be related only to coroutines and schedulers that run them.  I.e.
> > it's a problem when you run some code and may interrupt it.  And you can't
> > interrupt a plain python code that uses callbacks without yields and
> > greenlets.
> Well, but in the Twisted world, if a cleanup callback requires more
> blocking calls, it has to spawn more deferred callbacks. So I think
> they *do* have the problem, unless they don't have a way at all to
> constrain the total running time of an action involving cascading
> callbacks. Also, they have inlineCallbacks which does use yield.

Deferreds don't do anything to prevent blocking. They're just a nice
abstraction for callbacks. And yes, if you call 1000 functions that do lots
of CPU in a row, that will keep other stuff from happening.

However, consider how a timeout works: the event loop notices enough time
has passed, and so calls some code that tells the Deferred to cancel its
operation. So you're *not* adding the cancellation operations to the stack
of the original operation, you're starting from the event loop. And so
timeouts are just normal event loop world, where you need to be careful not
to do too much CPU-intensive processing in any given call, and you can't
call blocking system calls (except using a thread).

Of course, you can't timeout a function that's just looping using CPU, or a
blocking system call, and so code needs to be structured to deal with this,
but that's a different issue.

From guido at  Fri Oct 26 17:25:06 2012
From: guido at (Guido van Rossum)
Date: Fri, 26 Oct 2012 08:25:06 -0700
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 26, 2012 at 7:12 AM, Itamar Turner-Trauring
<itamar at> wrote:
> On Wed, Oct 24, 2012 at 7:43 PM, Guido van Rossum <guido at> wrote:
>> > I don't know, I hope someone with an expertise in Twisted can tell us.
>> >
>> > But I would imagine that they don't have this particular problem, as it
>> > should be related only to coroutines and schedulers that run them.  I.e.
>> > it's a problem when you run some code and may interrupt it.  And you
>> > can't
>> > interrupt a plain python code that uses callbacks without yields and
>> > greenlets.
>> Well, but in the Twisted world, if a cleanup callback requires more
>> blocking calls, it has to spawn more deferred callbacks. So I think
>> they *do* have the problem, unless they don't have a way at all to
>> constrain the total running time of an action involving cascading
>> callbacks. Also, they have inlineCallbacks which does use yield.
> Deferreds don't do anything to prevent blocking. They're just a nice
> abstraction for callbacks. And yes, if you call 1000 functions that do lots
> of CPU in a row, that will keep other stuff from happening.
> However, consider how a timeout works: the event loop notices enough time
> has passed, and so calls some code that tells the Deferred to cancel its
> operation. So you're *not* adding the cancellation operations to the stack
> of the original operation, you're starting from the event loop. And so
> timeouts are just normal event loop world, where you need to be careful not
> to do too much CPU-intensive processing in any given call, and you can't call
> blocking system calls (except using a thread).
> Of course, you can't timeout a function that's just looping using CPU, or a
> blocking system call, and so code needs to be structured to deal with this,
> but that's a different issue.

So, basically, it's just "after T seconds you get this second callback
and it's up to you to deal with it"? I guess the timeout callback can
inspect the state of the operation, and cancel any pending operations?

Do you have a way to translate timeouts into exceptions in
inlineCallbacks? If so, how is that working out?

--Guido van Rossum (

From _ at  Fri Oct 26 17:40:41 2012
From: _ at (Laurens Van Houtven)
Date: Fri, 26 Oct 2012 17:40:41 +0200
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

There's an exception for "a deferred has been cancelled".  Cancelling a
deferred fires that down its errback chain just like any exception. Since
@inlineCallbacks works on top of deferreds, it magically works:

>>> from twisted.internet import defer
>>> d = defer.Deferred()
>>> @defer.inlineCallbacks
... def f():
...     yield d
>>> r = f()
>>> r
<Deferred at 0x1019df950>
>>> d.cancel()
>>> r
<Deferred at 0x1019df950 current result: <twisted.python.failure.Failure
<class 'twisted.internet.defer.CancelledError'>>>

On Fri, Oct 26, 2012 at 5:25 PM, Guido van Rossum <guido at> wrote:

> On Fri, Oct 26, 2012 at 7:12 AM, Itamar Turner-Trauring
> <itamar at> wrote:
> > On Wed, Oct 24, 2012 at 7:43 PM, Guido van Rossum <guido at> wrote:
> >> > I don't know, I hope someone with an expertise in Twisted can tell us.
> >> >
> >> > But I would imagine that they don't have this particular problem, as it
> >> > should be related only to coroutines and schedulers that run them.  I.e.
> >> > it's a problem when you run some code and may interrupt it.  And you can't
> >> > interrupt a plain python code that uses callbacks without yields and
> >> > greenlets.
> >> Well, but in the Twisted world, if a cleanup callback requires more
> >> blocking calls, it has to spawn more deferred callbacks. So I think
> >> they *do* have the problem, unless they don't have a way at all to
> >> constrain the total running time of an action involving cascading
> >> callbacks. Also, they have inlineCallbacks which does use yield.
> > Deferreds don't do anything to prevent blocking. They're just a nice
> > abstraction for callbacks. And yes, if you call 1000 functions that do lots
> > of CPU in a row, that will keep other stuff from happening.
> > However, consider how a timeout works: the event loop notices enough time
> > has passed, and so calls some code that tells the Deferred to cancel its
> > operation. So you're *not* adding the cancellation operations to the stack
> > of the original operation, you're starting from the event loop. And so
> > timeouts are just normal event loop world, where you need to be careful not
> > to do too much CPU-intensive processing in any given call, and you can't call
> > blocking system calls (except using a thread).
> > Of course, you can't timeout a function that's just looping using CPU, or a
> > blocking system call, and so code needs to be structured to deal with this,
> > but that's a different issue.
> So, basically, it's just "after T seconds you get this second callback
> and it's up to you to deal with it"? I guess the timeout callback can
> inspect the state of the operation, and cancel any pending operations?
> Do you have a way to translate timeouts into exceptions in
> inlineCallbacks? If so, how is that working out?
> --
> --Guido van Rossum (

From _ at  Fri Oct 26 17:52:49 2012
From: _ at (Laurens Van Houtven)
Date: Fri, 26 Oct 2012 17:52:49 +0200
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

err, I suppose the missing bit there is that you'll probably want to:

reactor.callLater(timeout, d.cancel)

As opposed to calling d.cancel() directly. (That snippet was in
bpython-urwid with the reactor running in the background, but I doubt it'd
work well anywhere else outside of manholes :))


From solipsis at  Fri Oct 26 17:52:53 2012
From: solipsis at (Antoine Pitrou)
Date: Fri, 26 Oct 2012 17:52:53 +0200
Subject: [Python-ideas] Async API
References: <>
Message-ID: <20121026175253.1361628a@cosmocat>

On Thu, 25 Oct 2012 16:39:40 -0400,
Terry Reedy <tjreedy at> wrote:
> On 10/25/2012 12:10 PM, Yury Selivanov wrote:
> > - And what's your opinion on writing a PEP about making it possible
> > to pass a custom socket-factory to stdlib objects?
> I think this is probably a good idea quite aside from async issues.

I think it's a rather bad idea. It does not correspond to any real use
case and will clutter the API with an additional parameter.



From ryan at  Fri Oct 26 18:31:52 2012
From: ryan at (Ryan D Hiebert)
Date: Fri, 26 Oct 2012 09:31:52 -0700
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
Message-ID: <>

On Oct 26, 2012, at 4:55 AM, Ned Batchelder <ned at> wrote:
> Before we do anything to make py3 the default, let's please provide a navigation bar that shows the version, and makes it easy to switch between versions?  Py2 is still vastly more used.

+1 I can't count how many times I've been on the right page, but the wrong version, and need to switch.

From guido at  Fri Oct 26 18:36:59 2012
From: guido at (Guido van Rossum)
Date: Fri, 26 Oct 2012 09:36:59 -0700
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 26, 2012 at 8:52 AM, Laurens Van Houtven <_ at> wrote:
> err, I suppose the missing bit there is that you'll probably want to:
> reactor.callLater(timeout, d.cancel)
> As opposed to calling d.cancel() directly. (That snippet was in
> bpython-urwid with the reactor running in the background, but I doubt it'd
> work well anywhere else outside of manholes :))

So I think that Yury's original problem statement, transformed to
Twisted+Deferred, might still apply, depending on how you implement
it. Yury essentially did this:

def foobar():  # a task
    try:
        yield <blocking action>
    finally:
        # must clean up regardless of whether action succeeded or failed:
        yield <blocking cleanup>

He then calls this with a timeout, with the semantics that if the
generator is blocked in a yield when the timeout arrives, that yield
raises a Timeout exception (and at no other time is Timeout raised).
The problem with this is that if the action succeeds within the
timeout, but barely, there's a chance that the cleanup of a
*successful* action receives the Timeout exception. Apparently this
bit Yury. I'm not sure how you'd model that using just Deferreds, but
using inlineCallbacks it seems the same thing might happen. Using
Deferreds, I assume there's a common pattern to implement this that
doesn't have this problem. Of course, using coroutines, there is too
-- spawn the cleanup as an independent task.

--Guido van Rossum (

From itamar at  Fri Oct 26 18:57:16 2012
From: itamar at (Itamar Turner-Trauring)
Date: Fri, 26 Oct 2012 12:57:16 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 26, 2012 at 12:36 PM, Guido van Rossum <guido at> wrote:

> On Fri, Oct 26, 2012 at 8:52 AM, Laurens Van Houtven <_ at> wrote:
> > err, I suppose the missing bit there is that you'll probably want to:
> >
> > reactor.callLater(timeout, d.cancel)
> >
> > As opposed to calling d.cancel() directly. (That snippet was in
> > bpython-urwid with the reactor running in the background, but I doubt it'd
> > work well anywhere else outside of manholes :))
> So I think that Yuri's original problem statement, transformed to
> Twisted+Deferred, might still apply, depending on how you implement
> it. Yuri essentially did this:
> def foobar():  # a task
>     try:
>         yield <blocking action>
>     finally:
>         # must clean up regardless of whether action succeeded or failed:
>         yield <blocking cleanup>
> He then calls this with a timeout, with the semantics that if the
> generator is blocked in a yield when the timeout arrives, that yield
> raises a Timeout exception (and at no other time is Timeout raised).
> The problem with this is that if the action succeeds within the
> timeout, but barely, there's a chance that the cleanup of a
> *successful* action receives the Timeout exception. Apparently this
> bit Yuri. I'm not sure how you'd model that using just Deferreds, but
> using inlineCallbacks it seems the same thing might happen. Using
> Deferreds, I assume there's a common pattern to implement this that
> doesn't have this problem. Of course, using coroutines, there is too
> -- spawn the cleanup as an independent task.

If you call cancel() on a Deferred that already has a result, nothing
happens. So you don't get a TimeoutError if the operation has succeeded (or
failed some other way). This would also be true when using inlineCallbacks,
so there's no issue.

In general I'm not clear why this is a problem: in a single-threaded
program only one thing happens at a time. Your code for triggering a
timeout always has the option to check if the operation has succeeded,
without worrying about race conditions.
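
A toy model of that rule, for readers who don't know Twisted. This is not
Twisted's Deferred (which, among other differences, raises if you fire it
twice); MiniDeferred and its CancelledError are hypothetical stand-ins that
only illustrate "cancel() after a result is a no-op":

```python
class CancelledError(Exception):
    """Stand-in for the error a cancelled operation ends up with."""


class MiniDeferred:
    """Toy stand-in for a Deferred: holds at most one result."""

    _UNSET = object()

    def __init__(self):
        self.result = self._UNSET

    @property
    def called(self):
        return self.result is not self._UNSET

    def callback(self, value):
        # Simplification: a second result is silently ignored here.
        if not self.called:
            self.result = value

    def cancel(self):
        # Only a pending operation can be cancelled.
        if not self.called:
            self.result = CancelledError()


finished = MiniDeferred()
finished.callback("done")
finished.cancel()   # too late: the existing result is kept

pending = MiniDeferred()
pending.cancel()    # nothing had arrived yet, so this one is cancelled

print(finished.result)                             # done
print(isinstance(pending.result, CancelledError))  # True
```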

From at  Fri Oct 26 19:06:14 2012
From: at (Yury Selivanov)
Date: Fri, 26 Oct 2012 13:06:14 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-26, at 12:57 PM, Itamar Turner-Trauring <itamar at> wrote:
> On Fri, Oct 26, 2012 at 12:36 PM, Guido van Rossum <guido at> wrote:
> On Fri, Oct 26, 2012 at 8:52 AM, Laurens Van Houtven <_ at> wrote:
> > err, I suppose the missing bit there is that you'll probably want to:
> >
> > reactor.callLater(timeout, d.cancel)
> >
> > As opposed to calling d.cancel() directly. (That snippet was in
> > bpython-urwid with the reactor running in the background, but I doubt it'd
> > work well anywhere else outside of manholes :))
> So I think that Yuri's original problem statement, transformed to
> Twisted+Deferred, might still apply, depending on how you implement
> it. Yuri essentially did this:
> def foobar():  # a task
>     try:
>         yield <blocking action>
>     finally:
>         # must clean up regardless of whether action succeeded or failed:
>         yield <blocking cleanup>
> He then calls this with a timeout, with the semantics that if the
> generator is blocked in a yield when the timeout arrives, that yield
> raises a Timeout exception (and at no other time is Timeout raised).
> The problem with this is that if the action succeeds within the
> timeout, but barely, there's a chance that the cleanup of a
> *successful* action receives the Timeout exception. Apparently this
> bit Yuri. I'm not sure how you'd model that using just Deferreds, but
> using inlineCallbacks it seems the same thing might happen. Using
> Deferreds, I assume there's a common pattern to implement this that
> doesn't have this problem. Of course, using coroutines, there is too
> -- spawn the cleanup as an independent task.
> If you call cancel() on a Deferred that already has a result, nothing happens. So you don't get a TimeoutError if the operation has succeeded (or failed some other way). This would also be true when using inlineCallbacks, so there's no issue.
> In general I'm not clear why this is a problem: in a single-threaded program only one thing happens at a time. Your code for triggering a timeout always has the option to check if the operation has succeeded, without worrying about race conditions.

Let me ask you a question that may help me and others to understand
how inlineCallbacks works.

If you write the following:

   def func():
       try:
           yield one_thing()
           yield and_another()
       finally:
           yield and_finally()

Then each of those yields will create a separate Deferred object, that
'inlineCallbacks' transparently dispatches via generator send/throw,
right?

And if you 'yield func()' the same will happen--'inlineCallbacks' will
return a Deferred, that will have a result of 'func' execution?


From guido at  Fri Oct 26 19:08:24 2012
From: guido at (Guido van Rossum)
Date: Fri, 26 Oct 2012 10:08:24 -0700
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 26, 2012 at 9:57 AM, Itamar Turner-Trauring
<itamar at> wrote:
> On Fri, Oct 26, 2012 at 12:36 PM, Guido van Rossum <guido at> wrote:
>> On Fri, Oct 26, 2012 at 8:52 AM, Laurens Van Houtven <_ at> wrote:
>> > err, I suppose the missing bit there is that you'll probably want to:
>> >
>> > reactor.callLater(timeout, d.cancel)
>> >
>> > As opposed to calling d.cancel() directly. (That snippet was in
>> > bpython-urwid with the reactor running in the background, but I doubt
>> > it'd
>> > work well anywhere else outside of manholes :))
>> So I think that Yuri's original problem statement, transformed to
>> Twisted+Deferred, might still apply, depending on how you implement
>> it. Yuri essentially did this:
>> def foobar():  # a task
>>     try:
>>         yield <blocking action>
>>     finally:
>>         # must clean up regardless of whether action succeeded or failed:
>>         yield <blocking cleanup>
>> He then calls this with a timeout, with the semantics that if the
>> generator is blocked in a yield when the timeout arrives, that yield
>> raises a Timeout exception (and at no other time is Timeout raised).
>> The problem with this is that if the action succeeds within the
>> timeout, but barely, there's a chance that the cleanup of a
>> *successful* action receives the Timeout exception. Apparently this
>> bit Yuri. I'm not sure how you'd model that using just Deferreds, but
>> using inlineCallbacks it seems the same thing might happen. Using
>> Deferreds, I assume there's a common pattern to implement this that
>> doesn't have this problem. Of course, using coroutines, there is too
>> -- spawn the cleanup as an independent task.
> If you call cancel() on a Deferred that already has a result, nothing
> happens. So you don't get a TimeoutError if the operation has succeeded (or
> failed some other way). This would also be true when using inlineCallbacks,
> so there's no issue.
> In general I'm not clear why this is a problem: in a single-threaded program
> only one thing happens at a time. Your code for triggering a timeout always
> has the option to check if the operation has succeeded, without worrying
> about race conditions.

But the example is not single-threaded (in the informal sense that you
use it here). Each yield is a suspension point where other things can
happen, and one of those things could be a cancellation of *this* task
(because of a timeout or otherwise).

The example would have to set some flag indicating it has a result
after the first yield (i.e. before entering the finally, or at least
before yielding in the finally clause). And the timeout callback would
have to check this flag. This makes it slightly awkward to design a
general-purpose timeout mechanism for tasks written in this style --
if you expect a timeout or cancellation you must protect your cleanup
code from it by using some API.
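
A sketch of the race and of the flag idea, using plain generators and no
framework. Timeout, task and simulate are hypothetical names, and the
"scheduler" is just a few explicit send/throw calls, so this only models the
ordering of events, not real I/O:

```python
class Timeout(Exception):
    pass


def task(state, log):
    try:
        yield "blocking action"
        state["has_result"] = True  # set before the cleanup can yield
    finally:
        # must clean up regardless of whether action succeeded or failed:
        yield "blocking cleanup"
        log.append("cleanup finished")


def simulate(honor_flag):
    state, log = {"has_result": False}, []
    gen = task(state, log)
    next(gen)       # task is suspended in the action
    gen.send(None)  # action succeeds; task is now suspended in the cleanup
    # The timeout callback fires here, just after the action completed.
    if honor_flag and state["has_result"]:
        log.append("timeout ignored")
        for _ in gen:  # let the cleanup run to completion
            pass
    else:
        try:
            gen.throw(Timeout())  # lands on the yield inside finally
        except Timeout:
            log.append("Timeout raised inside cleanup")
    return log


print(simulate(honor_flag=False))  # ['Timeout raised inside cleanup']
print(simulate(honor_flag=True))   # ['timeout ignored', 'cleanup finished']
```

Without the flag check, the cleanup of a successful action is the thing that
gets interrupted, which is exactly the awkward case described above.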

Anyway, no need to respond: I think I understand how Twisted deals
with this, and translating that into the world of PEP 380 is not your

--Guido van Rossum (

From chris.jerdonek at  Fri Oct 26 19:13:01 2012
From: chris.jerdonek at (Chris Jerdonek)
Date: Fri, 26 Oct 2012 10:13:01 -0700
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 26, 2012 at 9:31 AM, Ryan D Hiebert <ryan at> wrote:
> On Oct 26, 2012, at 4:55 AM, Ned Batchelder <ned at> wrote:
>> Before we do anything to make py3 the default, let's please provide a navigation bar that shows the version, and makes it easy to switch between versions?  Py2 is still vastly more used.
> +1 I can't count how many times I've been on the right page, but the wrong version, and need to switch.

I believe the primary issue filed for this is here:


From jstpierre at  Fri Oct 26 19:14:56 2012
From: jstpierre at (Jasper St. Pierre)
Date: Fri, 26 Oct 2012 13:14:56 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 26, 2012 at 1:06 PM, Yury Selivanov < at> wrote:

... snip ...

> Let me ask you a question that may help me and others to understand
> how inlineCallbacks works.
> If you write the following:
>    def func():
>        try:
>            yield one_thing()
>            yield and_another()
>        finally:
>            yield and_finally()
> Then each of those yields will create a separate Deferred object, that
> 'inlineCallbacks' transparently dispatches via generator send/throw,
> right?

one_thing() and and_another() and and_finally() should return
Deferreds. inlineCallbacks gets those Deferreds, adds callbacks for
completion/error, and resumes the generator at the appropriate time.
You don't use the results from either Deferreds, so the values will
just be thrown out. The yield/trampoline doesn't create any Deferreds
for those operations itself.

> And if you 'yield func()' the same will happen--'inlineCallbacks' will
> return a Deferred, that will have a result of 'func' execution?

You didn't decorate func with inlineCallbacks, but if you do, func()
will give you Deferred. Note that func itself doesn't return any
value. In Twisted land, this is done by defer.returnValue(), which
uses exceptions to return a value to the trampoline. This maps well to
the new sugar in 3.3.
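
A rough sketch of the trampoline idea, not Twisted's actual
implementation. It assumes each yielded object is a zero-argument
callable standing in for a Deferred (a hypothetical simplification), and
it uses the 3.3 sugar, where "return value" inside a generator arrives
as StopIteration.value:

```python
def run(gen):
    """Drive a generator to completion and return its final value.

    Each yielded object is a hypothetical stand-in for a Deferred: a
    zero-argument callable that either returns a result or raises.
    """
    result, error = None, None
    while True:
        try:
            if error is not None:
                pending = gen.throw(error)    # deliver a failure
            else:
                pending = gen.send(result)    # deliver a result
        except StopIteration as stop:
            # Python 3.3+: `return x` inside the generator arrives here.
            return stop.value
        result, error = None, None
        try:
            result = pending()
        except Exception as exc:
            error = exc


def func():
    try:
        a = yield (lambda: 1)
        b = yield (lambda: a + 1)
    finally:
        yield (lambda: "cleaned up")   # result is thrown away
    return a + b


def failing():
    try:
        yield (lambda: 1 / 0)
    except ZeroDivisionError:
        return "recovered"
```

Here run(func()) drives all three yields and hands back the value of
the return statement; the trampoline itself never creates the yielded
objects, it only resumes the generator with their outcomes.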

> Thanks,
> Yury
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at


From at  Fri Oct 26 19:36:54 2012
From: at (Yury Selivanov)
Date: Fri, 26 Oct 2012 13:36:54 -0400
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-26, at 7:55 AM, Ned Batchelder <ned at> wrote:

> On 10/25/2012 7:50 PM, Yury Selivanov wrote:
>> Hi,
>> I remember a discussion to make pointed to py3k docs by default.
>> Are we still going to do that?
> Before we do anything to make py3 the default, let's please provide a navigation bar that shows the version, and makes it easy to switch between versions?  Py2 is still vastly more used.


I've just created an issue
with a working patch attached to it.

Docs will look like this:

Please check it out!


From christian at  Fri Oct 26 19:42:32 2012
From: christian at (Christian Heimes)
Date: Fri, 26 Oct 2012 19:42:32 +0200
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
Message-ID: <k6ei26$8hf$>

Am 26.10.2012 01:50, schrieb Yury Selivanov:
> Hi,
> I remember a discussion to make <>
> pointed to py3k docs by default.
> Are we still going to do that?

How about for the latest stable version of
Python 2.x and for the latest stable of Python
3.x? The py3k docs traditionally point to the latest development version.


From at  Fri Oct 26 19:46:28 2012
From: at (Yury Selivanov)
Date: Fri, 26 Oct 2012 13:46:28 -0400
Subject: [Python-ideas]
In-Reply-To: <k6ei26$8hf$>
References: <>
Message-ID: <>


On 2012-10-26, at 1:42 PM, Christian Heimes <christian at> wrote:
> Am 26.10.2012 01:50, schrieb Yury Selivanov:
>> Hi,
>> I remember a discussion to make <>
>> pointed to py3k docs by default.
>> Are we still going to do that?
> How about for the latest stable version of
> Python 2.x and for the latest stable of Python
> 3.x? The py3k docs traditionally point to the latest development version.

As for me, I like simple ''.
The rest of UX is easy to ensure with a little JS ;)
Take a look at my patch attached to the issue 16331.


From at  Fri Oct 26 19:49:58 2012
From: at (Yury Selivanov)
Date: Fri, 26 Oct 2012 13:49:58 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-26, at 1:14 PM, Jasper St. Pierre <jstpierre at> wrote:
> On Fri, Oct 26, 2012 at 1:06 PM, Yury Selivanov < at> wrote:
> ... snip ...
>> Let me ask you a question that may help me and others to understand
>> how inlineCallbacks works.
>> If you write the following:
>>   def func():
>>       try:
>>           yield one_thing()
>>           yield and_another()
>>       finally:
>>           yield and_finally()
>> Then each of those yields will create a separate Deferred object, that
>> 'inlineCallbacks' transparently dispatches via generator send/throw,
>> right?
> one_thing() and and_another() and and_finally() should return
> Deferreds. inlineCallbacks gets those Deferreds, adds callbacks for
> completion/error, and resumes the generator at the appropriate time.
> You don't use the results from either Deferreds, so the values will
> just be thrown out. The yield/trampoline doesn't create any Deferreds
> for those operations itself.
>> And if you 'yield func()' the same will happen--'inlineCallbacks' will
>> return a Deferred, that will have a result of 'func' execution?
> You didn't decorate func with inlineCallbacks, but if you do, func()
> will give you Deferred. Note that func itself doesn't return any
> value. In Twisted land, this is done by defer.returnValue(), which
> uses exceptions to return a value to the trampoline. This maps well to
> the new sugar in 3.3.

Right, I forgot to decorate the 'func' with 'inlineCallbacks'.

If it is decorated, though, how can I invoke it with a timeout?


From bruce at  Fri Oct 26 19:56:25 2012
From: bruce at (Bruce Leban)
Date: Fri, 26 Oct 2012 10:56:25 -0700
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 26, 2012 at 10:46 AM, Yury Selivanov < at>wrote:

> On 2012-10-26, at 1:42 PM, Christian Heimes <christian at> wrote:
> >> Are we still going to do that?
> >
> > How about for the latest stable version of
> > Python 2.x and for the latest stable of Python
> > 3.x? The py3k docs traditionally point to the latest development version.
> As for me, I like simple ''.
> The rest of UX is easy to ensure with a little JS ;)
> Take a look at my patch attached to the issue 16331.
There are tons of links out there that would break if you switched to docs2
and docs3. JS is better. And it would accommodate a feature where a user
can set a preference of what version of python documentation they want to
see rather than defaulting to 2.7 or 3.x.

--- Bruce

From christian at  Fri Oct 26 20:04:09 2012
From: christian at (Christian Heimes)
Date: Fri, 26 Oct 2012 20:04:09 +0200
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
Message-ID: <>

Am 26.10.2012 19:56, schrieb Bruce Leban:
> There are tons of links out there that would break if you switched to
> docs2 and docs3. JS is better. And it would accommodate a feature where
> a user can set a preference of what version of python documentation they
> want to see rather than defaulting to 2.7 or 3.x.

We can have the FQDNs additionally to and have
them as mnemonic for the correct Python 2.x or 3.x docs. It's easy to
create an Apache rewrite rule that redirects the user to the proper
documents.

  RewriteCond %{HTTP_HOST} [NC]
    RewriteRule ^/(.*)$1 [R=301,L]

Yury, I'm not arguing against your JS UI -- I actually like it. I like
to have both.


From at  Fri Oct 26 20:09:43 2012
From: at (Yury Selivanov)
Date: Fri, 26 Oct 2012 14:09:43 -0400
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-26, at 2:04 PM, Christian Heimes <christian at> wrote:

> Am 26.10.2012 19:56, schrieb Bruce Leban:
>> There are tons of links out there that would break if you switched to
>> docs2 and docs3. JS is better. And it would accommodate a feature where
>> a user can set a preference of what version of python documentation they
>> want to see rather than defaulting to 2.7 or 3.x.
> We can have the FQDNs additionally to and have
> them as mnemonic for the correct Python 2.x or 3.x docs. It's easy to
> create an Apache rewrite rule that redirects the user to the proper
> documents.
>  RewriteCond %{HTTP_HOST} [NC]
>    RewriteRule ^/(.*)$1 [R=301,L]
> Yury, I'm not arguing against your JS UI -- I actually like it. I like
> to have both.

Thanks ;)

The thing about 'doc2' & 'doc3' urls I don't like is that sooner or later
users will use python 3.  There is no future for python 2.  That's why
I think that it's better to have just one main doc destination that
everybody knows, uses, and posts links to.  Just my 2 cents.


From breamoreboy at  Fri Oct 26 21:13:35 2012
From: breamoreboy at (Mark Lawrence)
Date: Fri, 26 Oct 2012 20:13:35 +0100
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
Message-ID: <k6en8j$p5u$>

On 26/10/2012 19:09, Yury Selivanov wrote:
> The thing about 'doc2' & 'doc3' urls I don't like is that sooner or later
> users will use python 3.  There is no future for python 2.  That's why
> I think that it's better to have just one main doc destination that
> everybody knows, uses, and posts links to.  Just my 2 cents.
> -
> Yury

I entirely agree with your sentiments.  Complaints along the lines of 
"but library xyz isn't compatible with Python 3" should be met with a 
response from the Python community "what can we do to fix this
situation".  A very personal preference, but I would like to see this
happening rather than having people playing with new toys, like the 
Async API. YMMV.


Mark Lawrence.

From chris.jerdonek at  Fri Oct 26 22:08:38 2012
From: chris.jerdonek at (Chris Jerdonek)
Date: Fri, 26 Oct 2012 13:08:38 -0700
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 26, 2012 at 10:36 AM, Yury Selivanov
< at> wrote:
> On 2012-10-26, at 7:55 AM, Ned Batchelder <ned at> wrote:
>> On 10/25/2012 7:50 PM, Yury Selivanov wrote:
>>> Hi,
>>> I remember a discussion to make pointed to py3k docs by default.
>>> Are we still going to do that?
>> Before we do anything to make py3 the default, let's please provide a navigation bar that shows the version, and makes it easy to switch between versions?  Py2 is still vastly more used.
> OK.
> I've just created an issue
> with a working patch attached to it.

Did you see my earlier response before this message that provides a
link to an already-existing issue on this topic?


From at  Fri Oct 26 22:11:35 2012
From: at (Yury Selivanov)
Date: Fri, 26 Oct 2012 16:11:35 -0400
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-26, at 4:08 PM, Chris Jerdonek <chris.jerdonek at> wrote:

> On Fri, Oct 26, 2012 at 10:36 AM, Yury Selivanov
> < at> wrote:
>> On 2012-10-26, at 7:55 AM, Ned Batchelder <ned at> wrote:
>>> On 10/25/2012 7:50 PM, Yury Selivanov wrote:
>>>> Hi,
>>>> I remember a discussion to make pointed to py3k docs by default.
>>>> Are we still going to do that?
>>> Before we do anything to make py3 the default, let's please provide a navigation bar that shows the version, and makes it easy to switch between versions?  Py2 is still vastly more used.
>> OK.
>> I've just created an issue
>> with a working patch attached to it.
> Did you see my earlier response before this message that provides a
> link to an already-existing issue on this topic?

Take a look at 16331.  There is an open question there--which issue
should be closed now ;)

I apologize that I didn't find your issue (but I honestly tried to.)


From albrecht.andi at  Fri Oct 26 22:22:22 2012
From: albrecht.andi at (Andi Albrecht)
Date: Fri, 26 Oct 2012 22:22:22 +0200
Subject: [Python-ideas] Enabling man page structure for python
In-Reply-To: <>
References: <>
Message-ID: <>


On Thu, Oct 25, 2012 at 7:25 PM, Éric Araujo <merwok at> wrote:
> Hi,
> See "argparse: add ability to create a
> man page"

I've started to work on this issue some time ago. The starting point
was a man page formatter based on optparse that I wrote earlier. But I
encountered some problems, since the output order of argparse
formatters differs from what one expects on a man page. IIRC I saw the
need to change the way argparse formatters work, but
unfortunately got interrupted by other work.

IMO adding an argparse formatter would probably be the right way to
add man page support. There would even be no need to add this to
stdlib then.
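
As a rough illustration of what is possible outside the stdlib, here is
a sketch that does not use a custom formatter class at all. It only
wraps the public format_usage() and format_help() output in roff macros.
The function names man_page and _roff_escape are hypothetical, and the
result keeps argparse's own section order, which is exactly the
limitation described above:

```python
import argparse


def _roff_escape(text):
    """Escape backslashes and leading control characters for roff."""
    lines = []
    for line in text.replace("\\", "\\e").splitlines():
        if line.startswith((".", "'")):
            line = "\\&" + line       # keep roff from reading it as a macro
        lines.append(line)
    return "\n".join(lines)


def man_page(parser, section=1):
    """Render a rough man page (roff source) for an argparse parser."""
    name = parser.prog
    lines = [
        '.TH "{0}" "{1}"'.format(name.upper(), section),
        ".SH NAME",
        _roff_escape(name),
        ".SH SYNOPSIS",
        ".nf",                        # no-fill: keep argparse's line breaks
        _roff_escape(parser.format_usage().strip()),
        ".fi",
    ]
    if parser.description:
        lines += [".SH DESCRIPTION", _roff_escape(parser.description)]
    lines += [
        ".SH OPTIONS",
        ".nf",
        _roff_escape(parser.format_help().strip()),
        ".fi",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = argparse.ArgumentParser(prog="demo", description="Demo tool.")
    demo.add_argument("--count", type=int, help="number of items")
    print(man_page(demo))
```

A real formatter would need to reorder and split the help text into
proper man page sections instead of pasting it in verbatim.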

Best regards,


> Cheers

From g.brandl at  Fri Oct 26 23:09:34 2012
From: g.brandl at (Georg Brandl)
Date: Fri, 26 Oct 2012 23:09:34 +0200
Subject: [Python-ideas]
In-Reply-To: <k6ei26$8hf$>
References: <>
Message-ID: <k6eu3l$ij7$>

Am 26.10.2012 19:42, schrieb Christian Heimes:
> Am 26.10.2012 01:50, schrieb Yury Selivanov:
>> Hi,
>> I remember a discussion to make <>
>> pointed to py3k docs by default.
>> Are we still going to do that?
> How about for the latest stable version of
> Python 2.x and for the latest stable of Python
> 3.x? The py3k docs traditionally point to the latest development version.

FWIW, docs3 already exists.  Nobody is using it.


From andrew.svetlov at  Fri Oct 26 23:13:29 2012
From: andrew.svetlov at (Andrew Svetlov)
Date: Sat, 27 Oct 2012 00:13:29 +0300
Subject: [Python-ideas]
In-Reply-To: <k6eu3l$ij7$>
References: <>
	<k6ei26$8hf$> <k6eu3l$ij7$>
Message-ID: <>

Maybe just because it is a simple redirect?
If python3 docs will be accessible as instead people will start to use this address?

On Sat, Oct 27, 2012 at 12:09 AM, Georg Brandl <g.brandl at> wrote:
> Am 26.10.2012 19:42, schrieb Christian Heimes:
>> Am 26.10.2012 01:50, schrieb Yury Selivanov:
>>> Hi,
>>> I remember a discussion to make <>
>>> pointed to py3k docs by default.
>>> Are we still going to do that?
>> How about for the latest stable version of
>> Python 2.x and for the latest stable of Python
>> 3.x? The py3k docs traditionally point to the latest development version.
> FWIW, docs3 already exists.  Nobody is using it.
> Georg

Andrew Svetlov

From g.brandl at  Fri Oct 26 23:21:36 2012
From: g.brandl at (Georg Brandl)
Date: Fri, 26 Oct 2012 23:21:36 +0200
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
	<k6ei26$8hf$> <k6eu3l$ij7$>
Message-ID: <k6euq7$pde$>

I don't know, and I'm not fond of docs3, so I wouldn't make it more
prominent.  It was requested some years ago, and since it doesn't
cause problems that way, I added it as a redirect.


Am 26.10.2012 23:13, schrieb Andrew Svetlov:
> Maybe just because it is a simple redirect?
> If python3 docs will be accessible as instead
> people will start to use this address?
> On Sat, Oct 27, 2012 at 12:09 AM, Georg Brandl <g.brandl at> wrote:
>> Am 26.10.2012 19:42, schrieb Christian Heimes:
>>> Am 26.10.2012 01:50, schrieb Yury Selivanov:
>>>> Hi,
>>>> I remember a discussion to make <>
>>>> pointed to py3k docs by default.
>>>> Are we still going to do that?
>>> How about for the latest stable version of
>>> Python 2.x and for the latest stable of Python
>>> 3.x? The py3k docs traditionally point to the latest development version.
>> FWIW, docs3 already exists.  Nobody is using it.
>> Georg

From tjreedy at  Sat Oct 27 00:15:32 2012
From: tjreedy at (Terry Reedy)
Date: Fri, 26 Oct 2012 18:15:32 -0400
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
Message-ID: <k6f22a$epu$>

On 10/26/2012 4:47 AM, Nick Coghlan wrote:
> Eventually, but not just yet :)

I think it should already have been done. To not feature our latest 
release on the page where the latest releases have always before been 
featured is to say that it is somehow not a full production-ready release.

Terry Jan Reedy

From tjreedy at  Sat Oct 27 00:18:52 2012
From: tjreedy at (Terry Reedy)
Date: Fri, 26 Oct 2012 18:18:52 -0400
Subject: [Python-ideas]
In-Reply-To: <k6eu3l$ij7$>
References: <>
	<k6ei26$8hf$> <k6eu3l$ij7$>
Message-ID: <k6f28i$epu$>

On 10/26/2012 5:09 PM, Georg Brandl wrote:

> FWIW, docs3 already exists.  Nobody is using it.

I do, when I want to see the updated version instead of the older
Windows help version.

Terry Jan Reedy

From jeanpierreda at  Sat Oct 27 00:22:19 2012
From: jeanpierreda at (Devin Jeanpierre)
Date: Fri, 26 Oct 2012 18:22:19 -0400
Subject: [Python-ideas]
In-Reply-To: <k6f22a$epu$>
References: <>
Message-ID: <>

On Fri, Oct 26, 2012 at 6:15 PM, Terry Reedy <tjreedy at> wrote:
> I think it should already have been done. To not feature our latest release
> on the page where the latest releases have always before been featured is to
> say that it is somehow not a full production-ready release.

There were times when 3.1 and 3.2 were the latest releases, and they
have never been featured there. They were also production ready.

-- Devin

From tjreedy at  Sat Oct 27 00:22:31 2012
From: tjreedy at (Terry Reedy)
Date: Fri, 26 Oct 2012 18:22:31 -0400
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
Message-ID: <k6f2fe$nqu$>

> On Oct 26, 2012, at 4:55 AM, Ned Batchelder
> <ned at> wrote:
>> Py2 is still vastly more used.

Every time we release a new version, the previous version is vastly more 
used. But we have previously put the new docs on anyway.

For beginners learning Python in classes, I suspect Python 3 is more 
used. (I certainly hope so ;-).

Terry Jan Reedy

From tjreedy at  Sat Oct 27 00:17:16 2012
From: tjreedy at (Terry Reedy)
Date: Fri, 26 Oct 2012 18:17:16 -0400
Subject: [Python-ideas]
In-Reply-To: <k6ei26$8hf$>
References: <>
Message-ID: <k6f25h$epu$>

On 10/26/2012 1:42 PM, Christian Heimes wrote:
> Am 26.10.2012 01:50, schrieb Yury Selivanov:
>> Hi,
>> I remember a discussion to make <>
>> pointed to py3k docs by default.
>> Are we still going to do that?
> How about for the latest stable version of
> Python 2.x and for the latest stable of Python
> 3.x? The py3k docs traditionally point to the latest development version.

I thought we had half-way already decided on that, with the possibility 
of  listing both.

Terry Jan Reedy

From cs at  Sat Oct 27 00:46:44 2012
From: cs at (Cameron Simpson)
Date: Sat, 27 Oct 2012 09:46:44 +1100
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
Message-ID: <>

On 26Oct2012 18:22, Devin Jeanpierre <jeanpierreda at> wrote:
| On Fri, Oct 26, 2012 at 6:15 PM, Terry Reedy <tjreedy at> wrote:
| > I think it should already have been done. To not feature our latest release
| > on the page where the latest releases have always before been featured is to
| > say that it is somehow not a full production-ready release.
| There were times when 3.1 and 3.2 were the latest releases, and they
| have never been featured there. They were also production ready.

That's Terry's point: by not featuring them there we're insinuating that they
were not production ready...
Cameron Simpson <cs at>

You can blip it twice to clear the bore,
But blip it thrice, and you've sinned once more.
        - Tom Warner <tom at>

From ned at  Sat Oct 27 00:58:53 2012
From: ned at (Ned Batchelder)
Date: Fri, 26 Oct 2012 18:58:53 -0400
Subject: [Python-ideas]
In-Reply-To: <k6f2fe$nqu$>
References: <>
Message-ID: <>

On 10/26/2012 6:22 PM, Terry Reedy wrote:
>> On Oct 26, 2012, at 4:55 AM, Ned Batchelder
>> <ned at> wrote:
>>> Py2 is still vastly more used.
> Every time we release a new version, the previous version is vastly 
> more used. But we have previously put the new docs on anyway.

I'm not suggesting having py2 as the default, just providing an easy way 
to get to them.  I can read 2.7 docs and figure out how 2.6 works from 
them much more easily than I can read 3.3 docs and figure out how 2.7 works.

> For beginners learning Python in classes, I suspect Python 3 is more 
> used. (I certainly hope so ;-).

Hmm, I don't think that's true just yet.


From jeanpierreda at  Sat Oct 27 01:36:03 2012
From: jeanpierreda at (Devin Jeanpierre)
Date: Fri, 26 Oct 2012 19:36:03 -0400
Subject: [Python-ideas]
In-Reply-To: <k6f2fe$nqu$>
References: <>
Message-ID: <>

On Fri, Oct 26, 2012 at 6:22 PM, Terry Reedy <tjreedy at> wrote:
> For beginners learning Python in classes, I suspect Python 3 is more used.
> (I certainly hope so ;-).

Instructors have their own kind of inertia. If they change major
versions, they no longer get to reuse old slides, they have to rewrite
old assignments, upgrade the automated test systems, and even just
plain learn Python 3, which is a challenge of its own (albeit a small
one.)  Remember also that most non-research instructors are vastly
overworked, and most research professors aren't exactly eager to burn
lots of time in course preparation either, since their job is not to
teach but to research.

Considering that the differences between Python 2 and 3 are irrelevant
for nearly any educational context, what's the payoff? The move is
just something they have to do eventually because of bug support
reasons, not something they are eager to do except out of some kind of
enthusiasm (which, admittedly, instructors often have -- shiny is

My university (the University of Toronto) has switched to Python 3 for
their new Coursera courses, because they involved writing material
from scratch anyway, so might as well make it futureproof. The regular
classes taught inside the university itself still use Python 2.7
(actually, they used Python 2.5 until the upgrade process a year and a
half ago, which I was a part of), and other than the coursera work, as
far as I am aware, no moves have been made to switch to Python 3.

They might also switch to another language entirely instead. They used
Racket in a couple of introductory courses last year, and I've heard
good things from faculty and students involved. It's a more viable
decision than it used to be, since a lot of work has to be done
regardless to switch to Python 3, so the inertial reason of staying
with Python is diminished. I don't think this will happen near-term,
because they're still investing in Python, but it was nice to see that
they were breaking out of their rut and trying new things.

-- Devin

From raymond.hettinger at  Sat Oct 27 04:09:29 2012
From: raymond.hettinger at (Raymond Hettinger)
Date: Fri, 26 Oct 2012 19:09:29 -0700
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
Message-ID: <>

On Oct 26, 2012, at 4:36 PM, Devin Jeanpierre <jeanpierreda at> wrote:

>> For beginners learning Python in classes, I suspect Python 3 is more used.
>> (I certainly hope so ;-).

I've been teaching quite a bit this year.  Python 3 isn't being used at all
(by any of my clients or by any of the other instructors I know who are
teaching Python).


From massimo.dipierro at  Sat Oct 27 04:34:27 2012
From: massimo.dipierro at (massimo.dipierro at
Date: Fri, 26 Oct 2012 19:34:27 -0700 (PDT)
Subject: [Python-ideas]
Message-ID: <>

An HTML attachment was scrubbed...
URL: <>

From tjreedy at  Sat Oct 27 04:55:33 2012
From: tjreedy at (Terry Reedy)
Date: Fri, 26 Oct 2012 22:55:33 -0400
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
Message-ID: <k6fifd$34v$>

On 10/26/2012 6:46 PM, Cameron Simpson wrote:
> On 26Oct2012 18:22, Devin Jeanpierre <jeanpierreda at> wrote:
> | On Fri, Oct 26, 2012 at 6:15 PM, Terry Reedy <tjreedy at> wrote:
> | > I think it should already have been done. To not feature our latest release
> | > on the page where the latest releases have always before been featured is to
> | > say that it is somehow not a full production-ready release.
> |
> | There were times when 3.1 and 3.2 were the latest releases, and they
> | have never been featured there. They were also production ready.

3.1 came out in between 2.6 and 2.7 and one could argue that it was 
still somewhat a trial version and that switching back and forth (2.6, 
3.1, 2.7) would not be a good idea.

3.2 came out 8 months after 2.7. I would have made the switch then, but 
I acknowledge that one could argue that 2.7 had not had its 18-24 months 
in the sun, and that 3.2 still lacked 3rd party library support.

> That's Terry's point: by not featuring them there we're insinuating that they
> were not production ready...

3.3 is now out 29 months after 2.7, library support is much improved, 
and the new unicode implementation fixes most to almost all the 
remaining problems with unicode. It is a release we can be proud of and 
should promote as the latest and greatest Python version.

Terry Jan Reedy

From pydsigner at  Sat Oct 27 05:06:51 2012
From: pydsigner at (Daniel Foerster)
Date: Fri, 26 Oct 2012 22:06:51 -0500
Subject: [Python-ideas] Async API
Message-ID: <7516094788412279153@unknownmsgid>

So, are threads still an option? I feel that many of these problems with
generators could be solved with threads.

From breamoreboy at  Sat Oct 27 05:13:57 2012
From: breamoreboy at (Mark Lawrence)
Date: Sat, 27 Oct 2012 04:13:57 +0100
Subject: [Python-ideas]
In-Reply-To: <k6fifd$34v$>
References: <>
Message-ID: <k6fjbr$74f$>

On 27/10/2012 03:55, Terry Reedy wrote:
> 3.3 is now out 29 months after 2.7, library support is much improved,
> and the new unicode implementation fixes most to almost all the
> remaining problems with unicode. It is a release we can be proud of and
> should promote as the latest and greatest Python version.



Mark Lawrence.

From at  Sat Oct 27 06:46:07 2012
From: at (Yury Selivanov)
Date: Sat, 27 Oct 2012 00:46:07 -0400
Subject: [Python-ideas]
In-Reply-To: <k6fifd$34v$>
References: <>
Message-ID: <>

On 2012-10-26, at 10:55 PM, Terry Reedy <tjreedy at> wrote:

> 3.3 is now out 29 months after 2.7, library support is much improved, and the new unicode implementation fixes most to almost all the remaining problems with unicode. It is a release we can be proud of and should promote as the latest and greatest Python version.

I feel the same.

On the one hand I understand the position of keeping 2.7 as the default here
and there,
as it's currently used more; but on the other, here is what we have:

- default documentation page - 2.7

- home page: New to Python or choosing between Python 2 and Python 3? 
Read Python 2 or Python 3

- downloads:
-- The current production versions are Python 2.7.3 and Python 3.3.0.
-- If you don't know which version to use, start with Python 2.7; more existing 
third party software is compatible with Python 2 than Python 3 right now.
-- First links to downloads - 2.7

Isn't it too much of python 2?  What is the impression after all of this?  
Python 2.7 is the current and recommended version.

I think that the message should be clear, and after 3 years it's time to say 
that python 3 is always the preferred way.  After all, people are not dumb, 
if they use python 2 they can go and download it, and they certainly can find 
docs for it as well.


From ncoghlan at  Sat Oct 27 06:54:45 2012
From: ncoghlan at (Nick Coghlan)
Date: Sat, 27 Oct 2012 14:54:45 +1000
Subject: [Python-ideas] Async API
In-Reply-To: <7516094788412279153@unknownmsgid>
References: <7516094788412279153@unknownmsgid>
Message-ID: <>

On Sat, Oct 27, 2012 at 1:06 PM, Daniel Foerster <pydsigner at> wrote:
> So, are threads still an option? I feel that many of these problems with
> generators could be solved with threads.

No, because available operating systems can handle a few orders of
magnitude more concurrent IO operations per process than they can
handle threads per process. The idea of asynchronous programming is to
only use additional threads when you really need them (i.e. for
blocking synchronous operations with no asynchronous equivalent), thus
providing support for a far greater number of concurrent operations
per process than if you rely entirely on threads for concurrency.
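
To make the contrast concrete, here is a small sketch in which one
thread services many connections through select() instead of dedicating
a thread to each. The function name echo_many and the use of local
socket pairs as stand-in connections are assumptions made for the
example:

```python
import select
import socket


def echo_many(count=20):
    """Service `count` connections from a single thread using select()."""
    pairs = [socket.socketpair() for _ in range(count)]
    try:
        for i, (client, _server) in enumerate(pairs):
            client.sendall(str(i).encode())   # every "client" sends a request
        pending = {server: i for i, (_client, server) in enumerate(pairs)}
        received = {}
        while pending:
            # One call waits on all remaining sockets at once.
            readable, _, _ = select.select(list(pending), [], [], 5.0)
            if not readable:
                break                         # timed out; give up
            for server in readable:
                received[pending.pop(server)] = server.recv(64).decode()
        return received
    finally:
        for client, server in pairs:
            client.close()
            server.close()
```

The number of sockets one select() loop can watch is bounded by file
descriptor limits rather than by per-thread stack and scheduling costs.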


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From at  Sat Oct 27 07:07:07 2012
From: at (Yury Selivanov)
Date: Sat, 27 Oct 2012 01:07:07 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <7516094788412279153@unknownmsgid>
Message-ID: <>


On 2012-10-27, at 12:54 AM, Nick Coghlan <ncoghlan at> wrote:

> The idea of asynchronous programming is to
> only use additional threads when you really need them (i.e. for
> blocking synchronous operations with no asynchronous equivalent)

BTW, you've touched a very interesting subject.  There are lots of
potentially blocking operations that are very hard to do asynchronously
without threads.  Such as working with directories or even reading
from files (there is aio on linux, but I haven't seen a library
that supports it.)

It would be great if we can address those problems with the new
async API.  I.e. we can use threadpools where necessary, but make 
the public API look fancy and yield-from-able.  Same approach that 
Joyent uses in their libuv.  And when OSes gain more advanced and
wide non-blocking support we can decrease use of threads.
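
A minimal sketch of that approach, assuming a toy blocking driver in
place of a real event loop. The names listdir_async, count_entries and
run are hypothetical. The blocking os.listdir call runs on a thread
pool, while callers only see something they can use with yield from:

```python
import concurrent.futures
import os

# Shared worker pool for operations with no non-blocking equivalent.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def listdir_async(path):
    """Generator-based operation: yields a Future, returns its result."""
    future = _pool.submit(os.listdir, path)
    result = yield future
    return result


def count_entries(path):
    """Caller code: no threads in sight, only `yield from`."""
    names = yield from listdir_async(path)
    return len(names)


def run(gen):
    """Toy driver: waits on each yielded Future in turn.

    A real scheduler would register a callback on the Future and go on
    to run other tasks instead of blocking here.
    """
    value = None
    while True:
        try:
            future = gen.send(value)
        except StopIteration as stop:
            return stop.value
        value = future.result()
```

If the operating system later offers a non-blocking way to list a
directory, only listdir_async has to change; count_entries stays as it
is.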


From ncoghlan at  Sat Oct 27 07:15:51 2012
From: ncoghlan at (Nick Coghlan)
Date: Sat, 27 Oct 2012 15:15:51 +1000
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 27, 2012 at 2:46 PM, Yury Selivanov < at> wrote:
> I think that the message should be clear, and after 3 years it's time to say
> that python 3 is always the preferred way.  After all, people are not dumb,
> if they use python 2 they can go and download it, and they certainly can find
> docs for it as well.

The message is clear, but some people just don't like the current
message: Python 2 is still the recommended default version for
production systems and applications.

- most hosting services (including Platform-as-a-Service providers
with a Python option) only offer Python 2
- Fedora, RHEL and derivatives still require Python 2 for all their
system utilities (Ubuntu at least has migrated their core system
tools, but I don't know about Debian upstream)
- Django does not yet have a released version that supports Python 3
(and even once 1.5 final is out the door, the Python 3 support is
technically classed as experimental until 1.6)
- graphics support in Python 3 is still a little sketchy in some
regards, but clearly improving (pygame and various GUI libraries like
pyside already work, pyglet has an alpha version, there's no
PIL/Pillow release, but there are working forks [1])

I don't think the ecosystem is to the point where it makes sense to
flip the switch just yet, but I do think it would be reasonable to
define the ecosystem state where we *will* flip the switch. The two
key missing pieces for me are:
- a Django release with non-experimental Python 3 support (i.e. likely
to happen with Django 1.6)
- an official release of PIL (or Pillow) that supports Python 3

(Why do I include those, and not Twisted? Because if you're a capable
enough developer to cope with Twisted, you're going to be able to cope
with the move from 3.3 back to 2.7)



Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From Steve.Dower at  Sat Oct 27 07:41:41 2012
From: Steve.Dower at (Steve Dower)
Date: Sat, 27 Oct 2012 05:41:41 +0000
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <7516094788412279153@unknownmsgid>
Message-ID: <>

Yury Selivanov wrote:
> It would be great if we can address those problems with the new
> async API.  I.e. we can use threadpools where necessary, but make
> the public API look fancy and yield-from-able.  Same approach that
> Joyent uses in their libuv.  And when OSes gain more advanced and
> wide non-blocking support we can decrease use of threads.

This certainly seems to be the plan, though I expect the details will be determined as libraries are updated to support the async API. As long as we ensure that the API itself can support event loops and operations using threads and other OS primitives, then we don't need to specify each and every one at this stage.

My design (which I'm writing up now) puts most of the responsibility on the active scheduler, which should make it much easier to have different default schedulers for each platform (and maybe specialised ones that are optimised for more limited situations) while the operations themselves can be built out of existing Python functions. I'll post more details soon, but it basically allows schedulers to optionally support some operations (such as select() or Condition.wait()) 'natively', with the operation only having to implement a fallback (presumably on a thread pool).
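
One way to picture the "native support with a fallback" split, purely as
an illustration and not the design being written up: every class and
method name below (Sleep, fallback, ThreadPoolScheduler, TimerScheduler,
native) is hypothetical.

```python
import concurrent.futures
import time


class Sleep:
    """Hypothetical operation: knows only how to run itself the slow way."""

    def __init__(self, seconds):
        self.seconds = seconds

    def fallback(self):
        time.sleep(self.seconds)      # blocking; meant for a worker thread
        return "fallback"


class ThreadPoolScheduler:
    """Supports nothing natively: every operation runs its fallback."""

    native = {}                       # maps operation type -> handler

    def __init__(self):
        self.pool = concurrent.futures.ThreadPoolExecutor(max_workers=2)

    def run(self, op):
        handler = self.native.get(type(op))
        if handler is not None:
            return handler(self, op)  # scheduler handles it directly
        return self.pool.submit(op.fallback).result()


class TimerScheduler(ThreadPoolScheduler):
    """Handles Sleep itself by recording a deadline, without a thread."""

    def __init__(self):
        super().__init__()
        self.deadlines = []

    def _sleep(self, op):
        # Stand-in for registering a timer with the scheduler's own loop.
        self.deadlines.append(time.monotonic() + op.seconds)
        return "native"

    native = {Sleep: _sleep}
```

The operation stays the same object in both cases; only the active
scheduler decides whether it has a cheaper way to carry it out.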


From at  Sat Oct 27 07:44:26 2012
From: at (Yury Selivanov)
Date: Sat, 27 Oct 2012 01:44:26 -0400
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-27, at 1:15 AM, Nick Coghlan <ncoghlan at> wrote:

> I don't think the ecosystem is to the point where it makes sense to
> flip the switch just yet, but I do think it would be reasonable to
> define the ecosystem state where we *will* flip the switch. The two
> key missing pieces for me are:
> - a Django release with non-experimental Python 3 support (i.e. likely
> to happen with Django 1.6)
> - an official release of PIL (or Pillow) that supports Python 3

One last thought (no need to reply if you disagree).

What if it's all a "chicken or the egg" problem?  Maybe the right strategy
is not to hide Python 2 everywhere and start actively promoting
py3k, but to push it gradually?

Start with docs switching to py3k by default.  That shouldn't be harmful
(and I hope that my docs theme patch will be accepted soon).

A bit later, when Django finally adds Python 3 support - change the
homepage to give more prominent advice to use py3k.



From ncoghlan at  Sat Oct 27 08:22:20 2012
From: ncoghlan at (Nick Coghlan)
Date: Sat, 27 Oct 2012 16:22:20 +1000
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 27, 2012 at 3:44 PM, Yury Selivanov < at> wrote:
> Start with docs switching to py3k by default.  That shouldn't be harmful
> (and I hope that my docs theme patch will be accepted soon).

Actually, there are at least a few very real harms that come from
switching the docs over:
1. Many third party Python 2 tutorials include links to our docs. We
can't magically reach out to those sites and update their links, so
they will end up linking to Python 3 resources from Python 2 ones
2. It breaks links on sites like Stack Overflow and in mailing list
archives and our own bug tracker, which currently link to the main
docs to explain Python 2 behaviour
3. It completely breaks direct hyperlinks to names that no longer
exist in Python 3 (even the ones that exist under new names).

I'm actually wondering if should be updated *now* with
a rewrite rule that redirects to a more explicit
URL. At the moment, there is no easy way to get hold of a stable URL
for the Python 2 docs, and nothing we can put in any advance
announcement of a migration to say something like:

" will switch to displaying the Python 3 documentation
by default in June 2013. Please update any direct links that are
intended to refer specifically to the Python 2 documentation by
including a leading '/2.x/' in the path component of the URL. For
example, '' would become
''. Between now and the migration
in June 2013, affected links will be automatically redirected to the
new stable Python 2.x URLs".

So that's my concrete proposal:
1. We pick a date (June next year sounds about right)
2. We pick a stable URL prefix for the Python 2 docs (I vote "/2.x/")
3. We start redirecting affected pages immediately
4. We add a notice like the one above to the home page of the 2.7
docs, announce it on the PSF blog, announce it far and wide
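
A minimal sketch of the redirect rule in step 3, written as a plain Python function rather than actual web server configuration. The function name, the regular expression and the set of prefixes treated as "already versioned" are assumptions made for illustration; only the "/2.x/" prefix and the example path come from this thread.

```python
import re

# Assumption: a path counts as versioned if it starts with a release
# number ("/2.7/", "/3/"), a series ("/2.x/"), "/py3k/" or "/dev/".
VERSIONED = re.compile(r"^/(?:\d+(?:\.(?:\d+|x))?|py3k|dev)(?:/|$)")


def redirect_target(path, prefix="/2.x"):
    """Return the stable Python 2 path to redirect to, or None if the
    path is already version-qualified and should be served as-is."""
    if VERSIONED.match(path):
        return None
    return prefix + path


unqualified = redirect_target("/library/os.html")
already_versioned = redirect_target("/py3k/library/os.html")
```

With a rule of this shape, existing unqualified deep links keep resolving to Python 2 content while the browser ends up on a URL that stays valid after the default changes.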


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From donald.stufft at  Sat Oct 27 09:11:02 2012
From: donald.stufft at (Donald Stufft)
Date: Sat, 27 Oct 2012 03:11:02 -0400
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
	<> <k6fifd$34v$>
Message-ID: <>

On Saturday, October 27, 2012 at 2:22 AM, Nick Coghlan wrote:
> So that's my concrete proposal:
> 1. We pick a date (June next year sounds about right)
> 2. We pick a stable URL prefix for the Python 2 docs (I vote "/2.x/")
> 3. We start redirecting affected pages immediately
> 4. We add a notice like the one above to the home page of the 2.7
> docs, announce it on the PSF blog, announce it far and wide

Can we change /py3k/ to /3.x/ and redirect the old one to match?

Another idea is similar, but instead of doing /2.x/ always redirect the
root of to the latest production release, so
right now /foo would redirect to /2.7/foo. This is even better for
maintaining links to the actual resource people meant to link
to. Could even include a header at the top of old versions saying that
"You are currently viewing the docs for 2.5. Click here to view the
docs for 2.7".


From at  Sat Oct 27 09:17:41 2012
From: at (Yury Selivanov)
Date: Sat, 27 Oct 2012 03:17:41 -0400
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
Message-ID: <>


On 2012-10-27, at 2:22 AM, Nick Coghlan <ncoghlan at> wrote:

> I'm actually wondering if should be updated *now* with
> a rewrite rule that redirects to a more explicit
> URL. At the moment, there is no easy way to get hold of a stable URL
> for the Python 2 docs, and nothing we can put in any advance
> announcement of a migration to say something like:
> " will switch to displaying the Python 3 documentation
> by default in June 2013. Please update any direct links that are
> intended to refer specifically to the Python 2 documentation by
> including a leading '/2.x/' in the path component of the URL. For
> example, '' would become
> ''. Between now and the migration
> in June 2013, affected links will be automatically redirected to the
> new stable Python 2.x URLs".
> So that's my concrete proposal:
> 1. We pick a date (June next year sounds about right)
> 2. We pick a stable URL prefix for the Python 2 docs (I vote "/2.x/")
> 3. We start redirecting affected pages immediately
> 4. We add a notice like the one above to the home page of the 2.7
> docs, announce it on the PSF blog, announce it far and wide

Now that's a great plan!

Big +1.

A few comments:

1. I'd still vote for an earlier date, like February/March 2013
2. How about simple and ?


From bruce at  Sat Oct 27 10:02:53 2012
From: bruce at (Bruce Leban)
Date: Sat, 27 Oct 2012 01:02:53 -0700
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
	<> <k6fifd$34v$>
Message-ID: <>

On Fri, Oct 26, 2012 at 11:22 PM, Nick Coghlan <ncoghlan at> wrote:

> On Sat, Oct 27, 2012 at 3:44 PM, Yury Selivanov < at>
> wrote:
> > Start with docs switching to py3k by default.  That shouldn't be harmful
> > (and I hope that my docs theme patch will be accepted soon).
> Actually, there are at least a few very real harms that come from
> switching the docs over:
> 1. Many third party Python 2 tutorials include links to our docs. We
> can't magically reach out to those sites and update their links, so
> they will end up linking to Python 3 resources from Python 2 ones

And many tutorials are not intentionally version specific.

> 2. It breaks links on sites like Stack Overflow and in mailing list
> archives and our own bug tracker, which currently link to the main
> docs to explain Python 2 behaviour

However, just because Stack Overflow and other sites link to 2.x docs
doesn't mean that the user wants to read the 2.x docs. Scenario: I'm using
3.x, I go to Stack Overflow to find out how to do something. It links to
the docs for the old version, which are inaccurate for me. What I want is to
be able to quickly get to the doc that's relevant to *my* version.

> 3. it completely breaks direct hyperlinks to names that no longer
> exist in Python 3 (even the ones that exist under new names)

Urls for things that have been renamed should redirect to the appropriate
pages (whether docs on the new thing or an explanation of why this feature
doesn't exist in that version). This should work both forwards (2.x feature
renamed in 3.x) and backwards (3.x feature doesn't exist in 2.x)

> So that's my concrete proposal:
> 1. We pick a date (June next year sounds about right)
> 2. We pick a stable URL prefix for the Python 2 docs (I vote "/2.x/")
> 3. We start redirecting affected pages immediately
> 4. We add a notice like the one above to the home page of the 2.7
> docs, announce it on the PSF blog, announce it far and wide

I think this following proposal provides a better user experience. If you
don't think this is better, why?

2. Pick a stable url for docs and a way for referrers to select the
referenced version when that matters

Examples:
    (a) -- displays user's preferred version (see below)
    (b) -- displays version 2.7 if the user has not set a preferred
        version
    (c) -- always displays version 2.7 (discouraged unless talking
        specifically about that version)

3. All the pages have a version picker (as previously discussed). The
dropdown to pick a version number could also have a way to pick the user's
preferred version and save it in a cookie.

4. Make the version number more prominent in case (c) so user will be aware
that they are not seeing their preferred version.

--- Bruce

From ncoghlan at  Sat Oct 27 10:52:16 2012
From: ncoghlan at (Nick Coghlan)
Date: Sat, 27 Oct 2012 18:52:16 +1000
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 27, 2012 at 6:02 PM, Bruce Leban <bruce at> wrote:
> 2. Pick a stable url for docs and a way for referrers to select the
> referenced version when that matters
> Examples:
>     (a) -- displays
> user's preferred version (see below)
>     (b) --
> displays version 2.7 if user does not have user's preferred version
>     (c)
> -- always displays version 2.7 (discouraged unless talking specifically
> about that version)

We can already reference exact versions:

For non-current releases, those will redirect to the appropriate
release-specific URL, for the two current releases, it will redirect
to the stable "latest release" URL.

The problem is the current stable URLs for "latest Python 2" and
"latest Python 3" are respectively:

(despite comments elsewhere in the thread, "py3k" does *not* resolve
to the dev docs - those use the "/dev/" prefix in the path component)

It was suggested previously (i.e. more than a year ago) that it would
be better if 2.x/3.x worked as expected so people could update their
links appropriately, and I thought we had agreement on making that
change, but I guess nobody with server access agreed that was the case
(there's no ticket tracker currently in place for the

Note that I am deliberately limiting my suggestions to those which
require nothing new in the docs theming, just updates to the URL
handling in the web server.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From kristjan at  Sat Oct 27 12:27:40 2012
From: kristjan at (Kristján Valur Jónsson)
Date: Sat, 27 Oct 2012 10:27:40 +0000
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <7516094788412279153@unknownmsgid>
Message-ID: <>

Yes, stacklesslib provides this functionality with the call_on_thread() api, which turns a blocking operation into a non-blocking one.  This is also useful for cpu bound operations, btw.  For example, in EVE, when we need to do file operations and zipping of local files, we do it using this api.

-----Original Message-----
From: Python-ideas [ at] On Behalf Of Yury Selivanov
Sent: 27. október 2012 05:07
To: Nick Coghlan
Cc: Python-ideas at
Subject: Re: [Python-ideas] Async API

It would be great if we can address those problems with the new async API.  I.e. we can use threadpools where necessary, but make the public API look fancy and yield-from-able.  Same approach that Joyent uses in their libuv.  And when OSes gain more advanced and wide non-blocking support we can decrease use of threads.

From mal at  Sat Oct 27 12:54:20 2012
From: mal at (M.-A. Lemburg)
Date: Sat, 27 Oct 2012 12:54:20 +0200
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
Message-ID: <>

On 27.10.2012 08:22, Nick Coghlan wrote:
> So that's my concrete proposal:
> 1. We pick a date (June next year sounds about right)
> 2. We pick a stable URL prefix for the Python 2 docs (I vote "/2.x/")

Why "/2.x/" and not just "/2/" ?

> 3. We start redirecting affected pages immediately

I think we should do the same for all Python 3 resources, i.e.
have "/library/os.html" redirect to "/3/library/os.html" so that
we don't run into the same problem again in the future.

> 4. We add a notice like the one above to the home page of the 2.7
> docs, announce it on the PSF blog, announce it far and wide

We also need a solution for URLs that exist for Python 2, but
not for Python 3. Those should be redirected to the Python 2
resource automatically, e.g. URLs pointing to the Python 2 modules
that were renamed in Python 3.

BTW: Will you write up a PEP for this ?

Marc-Andre Lemburg

Professional Python Services directly from the Source  (#1, Oct 27 2012)
>>> Python Projects, Consulting and Support ...
>>> mxODBC.Zope/Plone.Database.Adapter ...
>>> mxODBC, mxDateTime, mxTextTools ...
2012-10-29: PyCon DE 2012, Leipzig, Germany ...             2 days to go

::: Try our new mxODBC.Connect Python Database Interface for free ! :::: Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611

From p.f.moore at  Sat Oct 27 13:06:41 2012
From: p.f.moore at (Paul Moore)
Date: Sat, 27 Oct 2012 12:06:41 +0100
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
Message-ID: <>

On 27 October 2012 08:11, Donald Stufft <donald.stufft at> wrote:
> On Saturday, October 27, 2012 at 2:22 AM, Nick Coghlan wrote:
> So that's my concrete proposal:
> 1. We pick a date (June next year sounds about right)
> 2. We pick a stable URL prefix for the Python 2 docs (I vote "/2.x/")
> 3. We start redirecting affected pages immediately
> 4. We add a notice like the one above to the home page of the 2.7
> docs, announce it on the PSF blog, announce it far and wide
> +1

+1 also.

> Can we change /py3k/ to /3.x/ and redirect the old one to match?

+1. I'm sorry, but now that Python 3 is up to 3.3, and is a really
solid version, the "py3k" name doesn't feel "official" enough.

> Another idea is similar, but instead of doing /2.x/ always redirect the
> root of to the latest production release, so
> right now /foo would redirect to /2.7/foo. This is even better for
> maintaining links to the actual resource people meant to link
> to. Could even include a header at the top of old versions saying that
> "You are currently viewing the docs for 2.5. Click here to view the
> docs for 2.7".

-1. Certainly what I (and I suspect many others) usually care about is
getting at the "Python 2" or "Python 3" documentation, not a specific
version. Having the 2.7, 2.6 links is fine, but I don't *think* of
myself as going to the 2.7 docs, but rather to the 2.x docs (as
opposed to 3.x). The "New in x.y" annotations give me the history I
need. And I think that's true of links as well - they would be to
"python 2" or "python 3", not (normally) to a specific minor version.


From dickinsm at  Sat Oct 27 13:34:44 2012
From: dickinsm at (Mark Dickinson)
Date: Sat, 27 Oct 2012 12:34:44 +0100
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 27, 2012 at 7:22 AM, Nick Coghlan <ncoghlan at> wrote:
> On Sat, Oct 27, 2012 at 3:44 PM, Yury Selivanov < at> wrote:
>> Start with docs switching to py3k by default.  That shouldn't be harmful
>> (and I hope that my docs theme patch will be accepted soon).
> Actually, there are at least a few very real harms that come from
> switching the docs over:
> 1. Many third party Python 2 tutorials include links to our docs. We
> can't magically reach out to those sites and update their links, so
> they will end up linking to Python 3 resources from Python 2 ones

As a data point, MIT's '6.00x Introduction to Computer Science and
Programming' EdX online course contains many links of the form
"".  I don't have exact numbers, but
judging by the EPD download numbers we've been seeing there are
definitely thousands of students, and probably tens of thousands,
taking that course.  Switching without a generous
warning period would not be a good idea for those students.


From solipsis at  Sat Oct 27 13:43:48 2012
From: solipsis at (Antoine Pitrou)
Date: Sat, 27 Oct 2012 13:43:48 +0200
Subject: [Python-ideas]
References: <>
Message-ID: <>

On Sat, 27 Oct 2012 12:06:41 +0100
Paul Moore <p.f.moore at> wrote:
> > Another idea is similar, but instead of doing /2.x/ always redirect the
> > root of to the latest production release, so
> > right now /foo would redirect to /2.7/foo. This is even better for
> > maintaining links to the actual resource people meant to link
> > to. Could even include a header at the top of old versions saying that
> > "You are currently viewing the docs for 2.5. Click here to view the
> > docs for 2.7".
> -1. Certainly what I (and I suspect many others) usually care about is
> getting at the "Python 2" or "Python 3" documentation, not a specific
> version. Having the 2.7, 2.6 links is fine, but I don't *think* of
> myself as going to the 2.7 docs, but rather to the 2.x docs (as
> opposed to 3.x). The "New in x.y" annotations give me the history I
> need. And I think that's true of links as well - they would be to
> "python 2" or "python 3", not (normally) to a specific minor version.

I'm not sure why you're -1 about something which wouldn't affect you
negatively.  As you say yourself, the 2.7 docs have all the information
you need about previous releases as well (because of the versionadded
and versionchanged markers). *However*, the 2.6 and previous docs don't
have information about useful stuff added in 2.7.

And since 2.7 is the last in the 2.x line, I think it makes sense to
reflect that explicitly in the redirections.



From p.f.moore at  Sat Oct 27 14:21:59 2012
From: p.f.moore at (Paul Moore)
Date: Sat, 27 Oct 2012 13:21:59 +0100
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
Message-ID: <>

On 27 October 2012 12:43, Antoine Pitrou <solipsis at> wrote:
> On Sat, 27 Oct 2012 12:06:41 +0100
> Paul Moore <p.f.moore at> wrote:
>> > Another idea is similar, but instead of doing /2.x/ always redirect the
>> > root of to the latest production release, so
>> > right now /foo would redirect to /2.7/foo. This is even better for
>> > maintaining links to the actual resource people meant to link
>> > to. Could even include a header at the top of old versions saying that
>> > "You are currently viewing the docs for 2.5. Click here to view the
>> > docs for 2.7".
>> -1. Certainly what I (and I suspect many others) usually care about is
>> getting at the "Python 2" or "Python 3" documentation, not a specific
>> version. Having the 2.7, 2.6 links is fine, but I don't *think* of
>> myself as going to the 2.7 docs, but rather to the 2.x docs (as
>> opposed to 3.x). The "New in x.y" annotations give me the history I
>> need. And I think that's true of links as well - they would be to
>> "python 2" or "python 3", not (normally) to a specific minor version.
> I'm not sure why you're -1 about something which wouldn't affect you
> negatively.  As you say yourself, the 2.7 docs have all the information
> you need about previous releases as well (because of the versionadded
> and versionchanged markers). *However*, the 2.6 and previous docs don't
> have information about useful stuff added in 2.7.

Maybe I misunderstood. I was assuming that there would be no "2.x"
link, only "2.7". That's what I'm against - I would prefer to use a
generic 2.x link to get to the Python 2 docs if I needed them (just as
I use at the moment).

My -1 was too strong though, make that a -0 (and a "don't care" if
there will be a 2.x link as well as the explicit ones).

> And since 2.7 is the last in the 2.x line, I think it makes sense to
> reflect that explicitly in the redirections.

I'm not against an explicit 2.7 link - we have that already, don't we?


From mal at  Sat Oct 27 15:27:53 2012
From: mal at (M.-A. Lemburg)
Date: Sat, 27 Oct 2012 15:27:53 +0200
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
Message-ID: <>

On 27.10.2012 13:34, Mark Dickinson wrote:
> On Sat, Oct 27, 2012 at 7:22 AM, Nick Coghlan <ncoghlan at> wrote:
>> On Sat, Oct 27, 2012 at 3:44 PM, Yury Selivanov < at> wrote:
>>> Start with docs switching to py3k by default.  That shouldn't be harmful
>>> (and I hope that my docs theme patch will be accepted soon).
>> Actually, there are at least a few very real harms that come from
>> switching the docs over:
>> 1. Many third party Python 2 tutorials include links to our docs. We
>> can't magically reach out to those sites and update their links, so
>> they will end up linking to Python 3 resources from Python 2 ones
> As a data point, MIT's '6.00x Introduction to Computer Science and
> Programming' EdX online course contains many links of the form
> "".  I don't have exact numbers, but
> judging by the EPD download numbers we've been seeing there are
> definitely thousands of students, and probably tens of thousands,
> taking that course.  Switching without a generous
> warning period would not be a good idea for those students.

Wouldn't it be possible to leave the non-versioned URLs redirecting
to the Python 2 versions for say another 5 years and instead
have the base URL provide links to either
the Python 2 or 3 version (perhaps even listing the various available
minor versions) ?

That would avoid the issue of having existing course material on the
web fail to work after just one year.

At PyCon UK we discussed these issues with teachers and people
interested in getting Python on the UK teaching plan. Their main
concern was that text books and course material have a much longer
life period than just 18 months. For them it's very important to
have a stable release of both Python and its documentation that
remains valid for at least 5 years.

I hope that Python 3.x has stabilized enough now with the 3.3 release
that it can become the basis for such materials.

In any case, if we want Python 3 to be picked up in such environments,
we cannot easily go about breaking things like URLs to documentation
and will have to settle on a stable approach soon.

Marc-Andre Lemburg


From ncoghlan at  Sat Oct 27 16:40:46 2012
From: ncoghlan at (Nick Coghlan)
Date: Sun, 28 Oct 2012 00:40:46 +1000
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 27, 2012 at 8:54 PM, M.-A. Lemburg <mal at> wrote:
> On 27.10.2012 08:22, Nick Coghlan wrote:
>> So that's my concrete proposal:
>> 1. We pick a date (June next year sounds about right)
>> 2. We pick a stable URL prefix for the Python 2 docs (I vote "/2.x/")
> Why "/2.x/" and not just "/2/" ?

I find the /2/ vs /3/ too easy to miss in the middle of a full URL,
whereas I find the extra space to the right of the number in /2.x/ vs
/3.x/ makes them easier to separate.

However, in writing up the PEP, I discovered it was annoyingly
ambiguous whether "/2.x/" specifically meant that URL, or whether it
meant "/2.7/" and friends, so I switched to the shorter form.

>> 3. We start redirecting affected pages immediately
> I think we should do the same for all Python 3 resources, i.e.
> have "/library/os.html" redirect to "/3/library/os.html" so that
> we don't run into the same problem again in the future.

In writing up the PEP, I rediscovered an old proposal of mine to avoid
breaking deep links by simply doing a "documented deprecation" of
unqualified deep links, but otherwise leaving them pointing to Python
2. Only the default landing page would be switched to Python 3.

Since that approach avoids a *lot* of issues, that's what I ended up writing up.

>> 4. We add a notice like the one above to the home page of the 2.7
>> docs, announce it on the PSF blog, announce it far and wide
> We also need a solution for URLs that exist for Python 2, but
> not for Python 3. Those should be redirected to the Python 2
> resource automatically, e.g. URLs pointing to the Python 2 modules
> that were renamed in Python 3.
> BTW: Will you write up a PEP for this ?

Committed as PEP 430, should show up before too long.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From _ at  Sat Oct 27 17:11:33 2012
From: _ at (Laurens Van Houtven)
Date: Sat, 27 Oct 2012 17:11:33 +0200
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On Fri, Oct 26, 2012 at 7:49 PM, Yury Selivanov < at> wrote:

> If it is decorated, though, how can I invoke it with a timeout?

The important thing to remember is that the fundamental abstraction at play
here is the deferred. Calling such a decorated function gives you a
deferred. So, you call it with a timeout the same way you timeout (cancel)
any deferred:

d = deferred_returning_expression
reactor.callLater(timeout, d.cancel)

Where deferred_returning_expression can be anything, including calling your
@inlineCallbacks-decorated function.
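
The same "schedule a cancel call for later" shape can be tried without Twisted installed. The sketch below uses the standard-library asyncio module as a stand-in (asyncio postdates this thread, and its Task is only an analogue of a deferred, not the same abstraction); slow_operation and main are hypothetical names.

```python
import asyncio


async def slow_operation():
    # Stands in for any long-running asynchronous call.
    await asyncio.sleep(10)
    return "done"


async def main(timeout):
    loop = asyncio.get_running_loop()
    task = asyncio.ensure_future(slow_operation())
    # Same shape as reactor.callLater(timeout, d.cancel).
    handle = loop.call_later(timeout, task.cancel)
    try:
        return await task
    except asyncio.CancelledError:
        return "timed out"
    finally:
        # Drop the pending cancel call if the task finished in time.
        handle.cancel()


result = asyncio.run(main(0.05))
```

The caller does not need any special timeout support from the function being called; cancelling the returned object is enough.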

The way it fits in with all existing stuff, making it look an awful lot
like a lot of existing stuff, is probably why deferred cancellation is one
of the more recent features to make it into twisted: a lot of people did
similar things using the tools that were already there.

> -
> Yury

From _ at  Sat Oct 27 17:16:06 2012
From: _ at (Laurens Van Houtven)
Date: Sat, 27 Oct 2012 17:16:06 +0200
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <7516094788412279153@unknownmsgid>
Message-ID: <>

Yes, thread pools are unfortunately necessary evils.

Twisted comes with a few tools to handle the use cases we're discussing.
The 1:1 equivalent for call_on_thread would be
deferToThread/deferToThreadPool (deferToThread == deferToThreadPool except
with the default thread pool instead of a specific one).
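
For readers without Twisted or stacklesslib at hand, the general idea (hand a blocking call to a pool and get back an object representing the eventual result) can be sketched with the standard-library concurrent.futures module. This is only an analogue: call_on_pool and compress_blocking are hypothetical names, and a Future is not a Deferred.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=2)


def compress_blocking(data):
    # An ordinary blocking function, similar in spirit to the
    # "zipping of local files" use case mentioned below.
    return zlib.compress(data)


def call_on_pool(func, *args):
    """Hypothetical helper: run func on the pool and return a future."""
    return pool.submit(func, *args)


future = call_on_pool(compress_blocking, b"x" * 100000)
# The caller is free to do other work here before collecting the result.
compressed = future.result()
pool.shutdown()
```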

There are a few other tools:

- spawnProcess (equiv to subprocess module, except with async communication
with the subprocess)
- cooperative multitasking, such as (twisted.internet.task.) Cooperator and
coiterate: basically resumable tasks that are explicit about where they can
be paused/resumed
- third party tools such as corotwine, giving stackless-style coroutines,
or ampoule, giving remote subprocesses

The more I learn about other stuff the more I see that everything is the
same because everything is different :)

On Sat, Oct 27, 2012 at 12:27 PM, Kristján Valur Jónsson <
kristjan at> wrote:

> Yes, stacklesslib provides this functionality with the call_on_thread()
> api, which turns a blocking operation into a non-blocking one.  This is
> also useful for cpu bound operations, btw.  For example, in EVE, when we
> need to do file operations and zipping of local files, we do it using this
> api.
> K


From solipsis at  Sat Oct 27 17:46:30 2012
From: solipsis at (Antoine Pitrou)
Date: Sat, 27 Oct 2012 17:46:30 +0200
Subject: [Python-ideas]
References: <>
Message-ID: <>

On Sat, 27 Oct 2012 13:21:59 +0100
Paul Moore <p.f.moore at> wrote:
> > And since 2.7 is the last in the 2.x line, I think it makes sense to
> > reflect that explicitly in the redirections.
> I'm not against an explicit 2.7 link - we have that already, don't we?

Yes, but the proposal is about redirecting to



From at  Sat Oct 27 17:53:41 2012
From: at (Yury Selivanov)
Date: Sat, 27 Oct 2012 11:53:41 -0400
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-27, at 10:40 AM, Nick Coghlan <ncoghlan at> wrote:

> Committed as PEP 430, should show up
> before too long.

I like the PEP, Nick.


From pydsigner at  Sat Oct 27 18:02:45 2012
From: pydsigner at (Daniel Foerster)
Date: Sat, 27 Oct 2012 11:02:45 -0500
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <7516094788412279153@unknownmsgid>
Message-ID: <>

On 10/26/2012 11:54 PM, Nick Coghlan wrote:
> On Sat, Oct 27, 2012 at 1:06 PM, Daniel Foerster <pydsigner at> wrote:
>> So, are threads still an option? I feel that many of these problems with
>> generators could be solved with threads.
> No, because available operating systems can handle a few orders of
> magnitude more concurrent IO operations per process than they can
> handle threads per process. The idea of asynchronous programming is to
> only use additional threads when you really need them (i.e. for
> blocking synchronous operations with no asynchronous equivalent), thus
> providing support for a far greater number of concurrent operations
> per process than if you rely entirely on threads for concurrency.
> Cheers,
> Nick.
I'm realizing that I perhaps don't grasp the entirety of asynchronous
programming. However, the only results I have found are for .NET and C#.
Would you like to recommend some online sources I could read?

From tjreedy at  Sat Oct 27 22:12:38 2012
From: tjreedy at (Terry Reedy)
Date: Sat, 27 Oct 2012 16:12:38 -0400
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
Message-ID: <k6hf7t$2vs$>

On 10/27/2012 11:53 AM, Yury Selivanov wrote:
> On 2012-10-27, at 10:40 AM, Nick Coghlan <ncoghlan at> wrote:
>> Committed as PEP 430, should show up
>> before too long.
> I like the PEP, Nick.

It looks good to me also. I agree that breaking the existing 
non-specific deep links is a problem. As I understand the proposal, 
browser bars would only display version- or at least series-specific 
links so that future copy and paste of links would do the right thing 
for the indefinite future.

Terry Jan Reedy

From dickinsm at  Sat Oct 27 22:16:56 2012
From: dickinsm at (Mark Dickinson)
Date: Sat, 27 Oct 2012 21:16:56 +0100
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
Message-ID: <>

On Sat, Oct 27, 2012 at 3:40 PM, Nick Coghlan <ncoghlan at> wrote:
> In writing up the PEP, I rediscovered an old proposal of mine to avoid
> breaking deep links by simply doing a "documented deprecation" of
> unqualified deep links, but otherwise leaving them pointing to Python
> 2. Only the default landing page would be switched to Python 3.
> Since that approach avoids a *lot* of issues, that's what I ended up writing up.

This seems like a nice solution.


From greg.ewing at  Sun Oct 28 01:21:35 2012
From: greg.ewing at (Greg Ewing)
Date: Sun, 28 Oct 2012 12:21:35 +1300
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:

> The example would have to set some flag indicating it has a result
> after the first yield (i.e. before entering the finally, or at least
> before yielding in the finally clause). And the timeout callback would
> have to check this flag. This makes it slightly awkward to design a
> general-purpose timeout mechanism for tasks written in this style --
> if you expect a timeout or cancellation you must protect your cleanup
> code from it by using some API.

This is where having a way to find out whether a generator
is in a finally clause would help. It would allow the scheduler
to take care of this transparently.


From at  Sun Oct 28 01:45:13 2012
From: at (Yury Selivanov)
Date: Sat, 27 Oct 2012 19:45:13 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-27, at 7:21 PM, Greg Ewing <greg.ewing at> wrote:

> Guido van Rossum wrote:
>> The example would have to set some flag indicating it has a result
>> after the first yield (i.e. before entering the finally, or at least
>> before yielding in the finally clause). And the timeout callback would
>> have to check this flag. This makes it slightly awkward to design a
>> general-purpose timeout mechanism for tasks written in this style --
>> if you expect a timeout or cancellation you must protect your cleanup
>> code from it by using some API.
> This is where having a way to find out whether a generator
> is in a finally clause would help. It would allow the scheduler
> to take care of this transparently.

Right.  But now I'm not sure this approach will work with yield-froms.
When you are yield-from-ing, the scheduler knows nothing about the chain of
generators, as it's all hidden in the yield-from implementation.


From greg.ewing at  Sun Oct 28 01:52:40 2012
From: greg.ewing at (Greg Ewing)
Date: Sun, 28 Oct 2012 12:52:40 +1300
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

Yury Selivanov wrote:
> But now I'm not sure this approach will work with yield-froms.
> When you are yield-from-ing, the scheduler knows nothing about the chain of
> generators, as it's all hidden in the yield-from implementation.

I think this just means that the implementation would involve
more than looking at a single bit. Something like an in_finally()
method that looks along the yield-from chain and returns true if
any of the generators are in a finally section.


From at  Sun Oct 28 02:29:31 2012
From: at (Yury Selivanov)
Date: Sat, 27 Oct 2012 20:29:31 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-27, at 7:52 PM, Greg Ewing <greg.ewing at> wrote:

> Yury Selivanov wrote:
>> But now I'm not sure this approach will work with yield-froms.
>> When you are yield-from-ing, the scheduler knows nothing about the chain of
>> generators, as it's all hidden in the yield-from implementation.
> I think this just means that the implementation would involve
> more than looking at a single bit. Something like an in_finally()
> method that looks along the yield-from chain and returns true if
> any of the generators are in a finally section.

That would not be a solution either.

Imagine that we have two coroutines:

  def c1():
      try:
          yield c2().with_timeout(1.0)      # p1
      finally:
          try:
              yield c2().with_timeout(1.0)  # p2
          except TimeoutError:
              pass

  def c2():
      try:
          yield c3().with_timeout(2.0)      # p3
      finally:
          yield c4()                        # p4

In the above example the scheduler *can* safely interrupt "c2" when it
is invoked from "c1" at "p2".  I.e. the scheduler can't interrupt a
coroutine when it is itself in its finally statement, but it's fine
to interrupt it when it is not, even if it is invoked from another
coroutine's finally block.

If you translate this example into yield-from form, then checking the
'in_finally()' result on "c1" when it is at "p2" will prevent you
from raising TimeoutError, but you clearly should raise it.

In other words, we want coroutines' behaviour to be closer to that of
regular Python code.


From greg.ewing at  Sun Oct 28 06:55:52 2012
From: greg.ewing at (Greg Ewing)
Date: Sun, 28 Oct 2012 18:55:52 +1300
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

Yury Selivanov wrote:
> In the above example scheduler *can* safely interrupt "c2" when it
> is invoked from "c1" at "p2".  I.e. scheduler can't interrupt the
> coroutine when it is itself in its finally statement, but it's fine
> to interrupt it when it is not, even if it is invoked from other
> coroutine's finally block.

I'm confused about the relationship between c1 and c2 here, and
what you mean by one coroutine "invoking" another.

Can you post a version that uses yield-from instead of yielding
objects with unknown (to me) semantics?


From at  Sun Oct 28 08:03:34 2012
From: at (Yury Selivanov)
Date: Sun, 28 Oct 2012 03:03:34 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-28, at 1:55 AM, Greg Ewing <greg.ewing at> wrote:

> Yury Selivanov wrote:
>> In the above example scheduler *can* safely interrupt "c2" when it
>> is invoked from "c1" at "p2".  I.e. scheduler can't interrupt the
>> coroutine when it is itself in its finally statement, but it's fine
>> to interrupt it when it is not, even if it is invoked from other
>> coroutine's finally block.
> I'm confused about the relationship between c1 and c2 here, and
> what you mean by one coroutine "invoking" another.
> Can you post a version that uses yield-from instead of yielding
> objects with unknown (to me) semantics?

The reason I kept using my version is because I'm not sure how we will set 
timeouts for yield-from style coroutines.  But let's assume that we can do 
that with a context manager.

Let's also assume that a generator object has an 'in_finally()' method,
as you defined: "Something like an in_finally() method that looks along 
the yield-from chain and returns true if any of the generators are in a 
finally section."

    def coro1():
        try:
            with timeout(1.0):
                yield from coro2() # 1
        finally:
            try:
                with timeout(1.0):
                    yield from coro2() # 2
            except TimeoutError:
                pass

    def coro2():
        try:
            block()
            yield # 3
            action()
        finally:
            block()
            yield # 4
            another_action()

Now, if "coro2" is suspended at #4 -- it shouldn't be interrupted with
TimeoutError.

If, however, "coro2" is at #3 -- it can be, and it doesn't matter whether it
was called from #1 or #2.

IIUC, a scheduler that supports yield-from won't know about "coro2".  All it
will have is a generator for "coro1".  All dispatching will be handled
by the "yield from" statement automatically.  In this case, you can't rely
on "coro1.in_finally()", because it will return:

- True, when "coro1" is at #1 & "coro2" is at #4 (it's unsafe to interrupt)
- True, when "coro1" is at #2 & "coro2" is at #3 (safe to interrupt)

The fundamental problem here is that the scheduler knows nothing about
the coroutines' call chain.  It doesn't even know in which generator
'with timeout' was called.
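
To make the two cases above concrete, here is a small runnable sketch.
It is only an illustration: generators have no real in_finally() method,
so each generator below sets a local variable `_cleanup` by hand, and the
helper names (yield_from_chain, chain_in_finally, innermost_in_finally)
are made up for this example.  The timeouts and block() calls are left out.

```python
def yield_from_chain(gen):
    """Yield gen, then each generator it is delegating to via yield from."""
    while gen is not None:
        yield gen
        gen = getattr(gen, "gi_yieldfrom", None)


def _cleanup_flag(gen):
    # Assumption: a generator marks its own cleanup code by setting a
    # local named _cleanup; real generators expose no such state.
    frame = getattr(gen, "gi_frame", None)
    return frame is not None and bool(frame.f_locals.get("_cleanup", False))


def chain_in_finally(gen):
    """True if any generator along the yield-from chain is in cleanup."""
    return any(_cleanup_flag(g) for g in yield_from_chain(gen))


def innermost_in_finally(gen):
    """True only if the generator that is actually suspended is in cleanup."""
    return _cleanup_flag(list(yield_from_chain(gen))[-1])


def coro2():
    _cleanup = False
    try:
        yield 3
    finally:
        _cleanup = True
        yield 4


def coro1():
    _cleanup = False
    try:
        yield from coro2()  # 1
    finally:
        _cleanup = True
        yield from coro2()  # 2


def trace():
    """Step coro1 and record (suspension point, chain-wide, innermost)."""
    task = coro1()
    return [(point, chain_in_finally(task), innermost_in_finally(task))
            for point in task]
```

In this sketch the chain-wide check reports True both when "coro2" is at
#4 under #1 (unsafe) and when it is at #3 under #2 (safe); only the state
of the innermost suspended generator tells the two apart.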


From reingart at  Sun Oct 28 08:30:00 2012
From: reingart at (Mariano Reingart)
Date: Sun, 28 Oct 2012 04:30:00 -0300
Subject: [Python-ideas] i18n and Python tracebacks
In-Reply-To: <>
References: <>
	<> <>
Message-ID: <>

On Tue, May 18, 2010 at 12:14 AM, Stephen J. Turnbull
<stephen at> wrote:
> Nick Coghlan writes:
>  > It would actually be interesting to see just how far someone could get
>  > [on translating tracebacks] purely with sys.excepthook.
>  >
>  > It would be subject to some fairly significant limitations (particularly
>  > when it comes to reparsing strings with interpolated values), but the
>  > traceback parsing and comparison code in doctest may offer a good
>  > starting point.
> Actually, it shouldn't be too hard to handle the interpolations.  In
> fact the language to be parsed is probably mostly pretty simple, and
> can be automatically translated to BNF or whatever input your favorite
> parsing library wants from the .pot file.  The generated grammar
> probably would be on the order of the size of the .pot file, no?  It
> could be stored with the .mos as a "pseudo-translation".

Interpolation is not very hard (although it could be error prone).

I tried that with some regexes, but I hit some dead ends because some
messages are hard-coded at the interpreter level, so it cannot be
implemented purely with sys.excepthook.
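
For readers who want to picture the regex approach, here is a minimal
sketch.  It is not the code from the project mentioned in this message:
the TRANSLATIONS table, the Spanish strings and the function names are
assumptions made up for illustration, and only the message text is
translated, not the traceback header lines.

```python
import re
import sys
import traceback

# Hypothetical table: English message pattern -> translated template.
# Named groups carry the interpolated values across.
TRANSLATIONS = [
    (re.compile(r"^name '(?P<name>\w+)' is not defined$"),
     "el nombre '{name}' no está definido"),
    (re.compile(r"^division by zero$"),
     "división por cero"),
]


def translate(message):
    """Return the translated message, or the original if nothing matches."""
    for pattern, template in TRANSLATIONS:
        match = pattern.match(message)
        if match:
            return template.format(**match.groupdict())
    return message


def translating_excepthook(exc_type, exc, tb):
    """Print the traceback as usual, but translate the final message."""
    traceback.print_tb(tb)
    print("{}: {}".format(exc_type.__name__, translate(str(exc))),
          file=sys.stderr)


# To try it out, opt in explicitly:
# sys.excepthook = translating_excepthook
```

Messages that never pass through the hook, or whose text is produced
elsewhere in the interpreter, are out of reach for this kind of table.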

I created a parallel project, in case anyone is interested (it would be
the pure-Python version, but it would require too much work):

Maybe I missed something, but the gettext approach seems more
consistent and cleaner, and IMHO using gettext is easier than
rewriting an interpreter :-)

[sorry for the 2-year delay]

Mariano Reingart

From reingart at  Sun Oct 28 08:39:22 2012
From: reingart at (Mariano Reingart)
Date: Sun, 28 Oct 2012 04:39:22 -0300
Subject: [Python-ideas] i18n and Python tracebacks
In-Reply-To: <>
References: <>
Message-ID: <>

On Sun, May 16, 2010 at 1:19 AM, Nick Coghlan <ncoghlan at> wrote:
> Mariano Reingart wrote:
>> Sorry if there is any mistake, I hope the interested people (here in
>> Argentina at least),  with more experience in C and Python, would help
>> me to fix/enhance this and/or champion it.
>> Do you think this is the right way?
> The basic concept appears sound, but you'll want to work against the
> py3k branch rather than trunk.

Done (sorry for the 2-year delay), it implements Py_GETTEXT against py3.3+:

Updated proposal:

BTW, I've made a patch for a related issue too (utf-8):

If this Traceback Internationalization Proposal makes sense, I could
present it at the PyCon Argentina 2012 Core-Python Sprint to see if we
can advance it:

Best regards

Mariano Reingart

From g.brandl at  Sun Oct 28 08:59:11 2012
From: g.brandl at (Georg Brandl)
Date: Sun, 28 Oct 2012 08:59:11 +0100
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
Message-ID: <k6iohl$7pi$>

Am 27.10.2012 16:40, schrieb Nick Coghlan:

>>> 4. We add a notice like the one above to the home page of the 2.7
>>> docs, announce it on the PSF blog, announce it far and wide
>> We also need a solution for URLs that exist for Python 2, but
>> not for Python 3. Those should be redirected to the Python 2
>> resource automatically, e.g. URLs pointing to the Python 2 modules
>> that were renamed in Python 3.
>> BTW: Will you write up a PEP for this ?
> Committed as PEP 430, should show up
> before too long.

Well, with the approval I've seen here, I have absolutely no problem
with appointing myself PEP Czar and accepting the PEP :)

I'll work on fixing the Apache config.


From _ at  Sun Oct 28 09:44:19 2012
From: _ at (Laurens Van Houtven)
Date: Sun, 28 Oct 2012 09:44:19 +0100
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <7516094788412279153@unknownmsgid>
Message-ID: <>

On Sat, Oct 27, 2012 at 6:02 PM, Daniel Foerster <pydsigner at> wrote:

> I'm realizing that I perhaps don't grasp the entirety of Asynchronous
> programming. However, The only results I have found are for .NET and C#.
> Would you like to recommend some online sources I could read?

A simple question with a multitude of answers!

Presumably you are more interested in the gory details of async
programming, whereas most of this discussion has been about what the code
looks like.

The wikipedia articles, while not fantastic, aren't terrible:

Also, there's a great introduction at ;
which unfortunately comes wrapped as a twisted tutorial ;) (You can stop
reading a while in when it becomes twisted-specific).


From barry at  Sun Oct 28 11:45:01 2012
From: barry at (Barry Warsaw)
Date: Sun, 28 Oct 2012 06:45:01 -0400
Subject: [Python-ideas]
References: <>
Message-ID: <20121028064501.6b6c0203@resist>

On Oct 26, 2012, at 10:55 PM, Terry Reedy wrote:

>3.3 is now out 29 months after 2.7, library support is much improved, and the
>new unicode implementation fixes most to almost all the remaining problems
>with unicode. It is a release we can be proud of and should promote as the
>latest and greatest Python version.

Very definitely +1


From barry at  Sun Oct 28 11:52:01 2012
From: barry at (Barry Warsaw)
Date: Sun, 28 Oct 2012 06:52:01 -0400
Subject: [Python-ideas]
References: <>
Message-ID: <20121028065201.74bd83dc@resist>

On Oct 27, 2012, at 03:15 PM, Nick Coghlan wrote:

>The message is clear, but some people just don't like the current
>message: Python 2 is still the recommended default version for
>production systems and applications.

I would hedge that and say that for new work where you have your Python 3
dependencies available, Python 3 should be the recommended default.  In
Ubuntu, we are actively porting our core system utilities to Python 3, but
some dependencies stop us from getting all the way there.  Xapian and Twisted
come to mind, but the Twisted folks are making great progress, so I expect
that for our Twisted apps at least, that story will be better soon.

Python 3.3 has some very clear advantages, so we are pushing to make that the
default leading up to Ubuntu 14.04 LTS.

>- Fedora, RHEL and derivatives still require Python 2 for all their
>system utilities (Ubuntu at least has migrated their core system
>tools, but I don't know about Debian upstream)

Debian Wheezy is in freeze so I wouldn't expect a lot of adoption there until
after that's released.  Then I hope that we'll be able to push those things

>I don't think the ecosystem is to the point where it makes sense to
>flip the switch just yet, but I do think it would be reasonable to
>define the ecosystem state where we *will* flip the switch. The two
>key missing pieces for me are:
>- a Django release with non-experimental Python 3 support (i.e. likely
>to happen with Django 1.6)
>- an official release of PIL (or Pillow) that supports Python 3

One way to look at it is that there doesn't necessarily have to be just one big
switch.  There's a big bank of switches, many of which can be flipped now.
Yes, I'd love for the whole line of 'em to be Python 3 green, and eventually
they will be, but if you don't need Django or PIL (or whatever still isn't
ported yet), don't wait, port!


From ncoghlan at  Sun Oct 28 12:25:28 2012
From: ncoghlan at (Nick Coghlan)
Date: Sun, 28 Oct 2012 21:25:28 +1000
Subject: [Python-ideas]
In-Reply-To: <k6iohl$7pi$>
References: <>
Message-ID: <>

On Sun, Oct 28, 2012 at 5:59 PM, Georg Brandl <g.brandl at> wrote:
> Am 27.10.2012 16:40, schrieb Nick Coghlan:
>>>> 4. We add a notice like the one above to the home page of the 2.7
>>>> docs, announce it on the PSF blog, announce it far and wide
>>> We also need a solution for URLs that exist for Python 2, but
>>> not for Python 3. Those should be redirected to the Python 2
>>> resource automatically, e.g. URLs pointing to the Python 2 modules
>>> that were renamed in Python 3.
>>> BTW: Will you write up a PEP for this ?
>> Committed as PEP 430, should show up
>> before too long.
> Well, with the approval I've seen here, I have absolutely no problem
> with appointing myself PEP Czar and accepting the PEP :)

Heh, asking you to do that was next on my list, so thanks. Did Guido
hide a mind reading device in the time machine? :)

> I'll work on fixing the Apache config.

Huzzah \o/


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From at  Sun Oct 28 15:43:06 2012
From: at (Yury Selivanov)
Date: Sun, 28 Oct 2012 10:43:06 -0400
Subject: [Python-ideas]
In-Reply-To: <k6iohl$7pi$>
References: <>
Message-ID: <>

On 2012-10-28, at 3:59 AM, Georg Brandl <g.brandl at> wrote:

> Well, with the approval I've seen here, I have absolutely no problem
> with appointing myself PEP Czar and accepting the PEP :)

That's awesome!


From Steve.Dower at  Sun Oct 28 17:58:05 2012
From: Steve.Dower at (Steve Dower)
Date: Sun, 28 Oct 2012 16:58:05 +0000
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <7516094788412279153@unknownmsgid>
Message-ID: <>

Laurens Van Houtven [_ at] wrote:
> Also, there's a great introduction at ;
> which unfortunately comes wrapped as a twisted tutorial ;) (You can 
> stop reading a while in when it becomes twisted-specific).

That's a great description (even the parts about Twisted ;) ) - bookmarked. Thanks!

My one dislike about the general introduction is the sole focus on I/O. In its context (as a Twisted intro) this is entirely understandable, but I'm afraid some people may come away thinking that async never involves threads. (Not that waiting on a thread is any different to waiting on I/O. Thread parallelism is a completely different concept, of course.)


From guido at  Mon Oct 29 00:52:02 2012
From: guido at (Guido van Rossum)
Date: Sun, 28 Oct 2012 16:52:02 -0700
Subject: [Python-ideas] Async API: some code to review
Message-ID: <>

I am finally ready to show the code I worked on for the past two
weeks. This is definitely not ready for anything except as a quick
demo, but I learned enough while writing it to feel comfortable with
the PEP 380 paradigm.

I've set up a Hg repo on, and I picked a codename:
tulip. View the code here:

It runs on Linux and OSX; I have no easy access to Windows but I'd be
happy to take contributions.

Key files in the directory:

- the main program for testing, and a rough HTTP client
- transports for sockets and SSL, and a buffering layer
- a Task class and related stuff; this is where the PEP
380 scheduler is implemented
- an event loop and basic polling implementations for:
select(), poll(), epoll(), kqueue()

Other junk: .hgignore, Makefile, README, (benchmark yield
from vs. plain functions), (stupid style checker)

More detailed discussions per file follows; please read the code along
with my description (separately they may not make much sense):

I found it remarkably easy to come up with polling implementations
using all those different system calls. I ended up mixing in the
pollster class with the event loop class, although I'm not sure that's
the best design -- perhaps it's better if the event loop just
references the pollster as a separate object.

The pollster has a very simple API: add_reader(fd, callback, *args),
add_writer(<ditto>), remove_reader(fd), remove_writer(fd), and
poll(timeout) -> list of events. (fd means file descriptor.) There's
also pollable() which just checks if there are any fds registered. My
implementation requires fd to be an int, but that could easily be
extended to support other types of event sources. I'm not super happy
that I have parallel reader/writer APIs, but passing a separate
read/write flag didn't come out any more elegant, and I don't foresee
other operation types (though I may be wrong).

The event list started out as a tuple of (fd, flag, callback, args),
where flag is 'r' or 'w' (easily extensible); in practice neither the
fd nor the flag are used, and one of the last things I did was to wrap
callback and args into a simple object that allows cancelling the
callback; the add_*() methods return this object. (This could probably
use a little more abstraction.) Note that poll() doesn't call the
callbacks -- that's up to the event loop.
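
As a rough picture of the pollster API described above, here is a
minimal sketch built on select().  It is not the tulip code: the class
names Handler and SelectPollster are assumptions, and it keeps only one
handler per fd.

```python
import select
import socket


class Handler:
    """Hypothetical wrapper for a callback and its args; can be cancelled."""

    def __init__(self, callback, args):
        self.callback = callback
        self.args = args
        self.cancelled = False

    def cancel(self):
        self.cancelled = True


class SelectPollster:
    def __init__(self):
        self.readers = {}  # fd -> Handler (one per fd in this sketch)
        self.writers = {}

    def add_reader(self, fd, callback, *args):
        handler = self.readers[fd] = Handler(callback, args)
        return handler

    def add_writer(self, fd, callback, *args):
        handler = self.writers[fd] = Handler(callback, args)
        return handler

    def remove_reader(self, fd):
        self.readers.pop(fd, None)

    def remove_writer(self, fd):
        self.writers.pop(fd, None)

    def pollable(self):
        return bool(self.readers or self.writers)

    def poll(self, timeout=None):
        """Return the handlers whose fds are ready; do not call them."""
        readable, writable, _ = select.select(
            list(self.readers), list(self.writers), [], timeout)
        return ([self.readers[fd] for fd in readable] +
                [self.writers[fd] for fd in writable])


def demo():
    a, b = socket.socketpair()
    pollster = SelectPollster()
    seen = []
    pollster.add_reader(a.fileno(), seen.append, "a is readable")
    b.send(b"x")
    for handler in pollster.poll(1.0):
        if not handler.cancelled:
            handler.callback(*handler.args)  # the caller runs the callbacks
    a.close()
    b.close()
    return seen
```

As in the description, poll() only reports which handlers are ready;
running them (and honouring cancellation) is left to the caller.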

The event loop has two basic ways to register callbacks:
call_soon(callback, *args) causes callback(*args) to be called the
next time the event loop runs; call_later(delay, callback, *args)
schedules a callback at some time (relative or absolute) in the
future. It also inherits add_reader() and add_writer() from the
pollster. Then there is run(), which runs the event loop until there's
nothing left to do (no readers, no writers, no soon or later
callbacks), and run_once(), which goes through the entire list of
event sources once. (I think the order in which I do this isn't quite
right but it works for now.)
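
The call_soon()/call_later()/run()/run_once() part can be sketched as
follows.  This is an illustration under simplifying assumptions, not the
posted implementation: the class name MiniLoop is made up, delays are
relative only, and the reader/writer polling is left out entirely.

```python
import collections
import heapq
import time


class MiniLoop:
    def __init__(self):
        self.ready = collections.deque()  # callbacks to run on the next pass
        self.scheduled = []               # heap of (when, seq, callback, args)
        self._seq = 0                     # tie-breaker for equal deadlines

    def call_soon(self, callback, *args):
        self.ready.append((callback, args))

    def call_later(self, delay, callback, *args):
        self._seq += 1
        when = time.monotonic() + delay
        heapq.heappush(self.scheduled, (when, self._seq, callback, args))

    def run_once(self):
        """Move due timers to the ready queue, then run what is ready."""
        now = time.monotonic()
        while self.scheduled and self.scheduled[0][0] <= now:
            _, _, callback, args = heapq.heappop(self.scheduled)
            self.ready.append((callback, args))
        # Only run callbacks that were ready at the start of this pass.
        for _ in range(len(self.ready)):
            callback, args = self.ready.popleft()
            callback(*args)

    def run(self):
        """Run until there are no soon or later callbacks left."""
        while self.ready or self.scheduled:
            if not self.ready:
                delay = self.scheduled[0][0] - time.monotonic()
                time.sleep(max(0.0, delay))
            self.run_once()
```

A real loop would wait in poll(timeout) instead of time.sleep(), using
the next timer deadline as the timeout.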

Finally, there's a helper class (ThreadRunner) here which lets you run
something in a separate thread using the features of
concurrent.futures. It uses the "self-pipe trick" (Google it :-) to
ensure that the poll() call wakes up -- this is needed by
call_in_thread() at the next layer ( (There may be a
race condition here, but I think it can be fixed.)
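
A minimal sketch of the self-pipe trick combined with
concurrent.futures follows.  It is an assumption-laden illustration,
not ThreadRunner itself: the function name is made up, it uses
socket.socketpair() rather than os.pipe() so that select() also works
on Windows, and it blocks in select() directly instead of going through
an event loop.

```python
import concurrent.futures
import select
import socket


def call_in_thread_demo(func, *args, timeout=5.0):
    """Run func(*args) in a worker thread and wake up select() when done."""
    wake_r, wake_w = socket.socketpair()
    try:
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
            future = executor.submit(func, *args)
            # When the future completes, write one byte so that the
            # select() call below returns.
            future.add_done_callback(lambda f: wake_w.send(b"x"))
            readable, _, _ = select.select([wake_r], [], [], timeout)
            woke = bool(readable)
            if woke:
                wake_r.recv(1)  # drain the wake-up byte
        return woke, future.result()
    finally:
        wake_r.close()
        wake_w.close()
```

In an event loop the read end would simply be registered with
add_reader(), so that poll() returns as soon as the thread finishes.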

Note that there are no yields (or yield froms) here; that's for the next layer:

This is the scheduler for PEP-380 style coroutines. I started with a
Scheduler class and operations along the lines of Greg Ewing's design,
with a Scheduler instance as a global variable, but ended up ripping
it out in favor of a Task object that represents a single stack of
generators chained via yield-from. There is a Context object holding
the event loop and the current task in thread-local storage, so that
multiple threads can (and must) have independent event loops.

Most user (and much library) code in this system should be written as
generators invoking other generators directly using yield from.
However to run something as an independent task, you wrap the
generator call in a Task() constructor, possibly giving it a timeout,
and then calling its start() method. A Task also acts a little like a
future -- you can wait() for it, add done-callbacks, and it preserves
the return value of the generator call. This can be used to introduce
concurrency or to give something a separate timeout. (There are also
primitives to wait for the first N completed of a bunch of Tasks.)
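
To illustrate the idea of a Task as a stack of generators chained via
yield from, here is a small sketch.  It is not the posted Task class:
there are no timeouts, no start() or wait(), and the round-robin
run_all() helper is an assumption made for this example.

```python
import collections


class Task:
    """Drives one generator (and whatever it delegates to via yield from)."""

    def __init__(self, gen):
        self.gen = gen
        self.done = False
        self.result = None
        self.done_callbacks = []

    def add_done_callback(self, callback):
        self.done_callbacks.append(callback)

    def step(self):
        """Resume the generator until its next bare yield or its return."""
        try:
            next(self.gen)
        except StopIteration as exc:
            self.done = True
            self.result = exc.value  # the generator's return value
            for callback in self.done_callbacks:
                callback(self)


def run_all(tasks):
    """Hypothetical round-robin scheduler: step each task in turn."""
    queue = collections.deque(tasks)
    while queue:
        task = queue.popleft()
        task.step()
        if not task.done:
            queue.append(task)


def inner(n):
    yield            # give other tasks a chance to run
    return n * 2


def outer(n):
    value = yield from inner(n)  # plain generator-to-generator call
    yield
    return value + 1
```

Note that the scheduler only ever sees the outer generator; the
delegation to inner() is handled entirely by yield from.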

To invoke a primitive I/O operation, you call the current task's
block() method and then immediately yield (similar to Greg Ewing's
approach). There are helpers block_r() and block_w() that arrange for
a task to block until a file descriptor is ready for reading/writing.
Examples of their use are in

There is also call_in_thread() which integrates with
polling.ThreadRunner to run a function in a separate thread and wait
for it. Also used in

In the docstrings I use the prefix "COROUTINE:" to indicate public
APIs that should be invoked using yield from.
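
A simplified sketch of a blocking primitive in this style is shown
below.  It deliberately differs from the design described above: instead
of calling block_r() on the current task and then doing a bare yield,
the coroutine yields a ("r", fd) request and a made-up run() driver
waits with select().  All names here are assumptions.

```python
import select
import socket


def recv(sock, n):
    """COROUTINE: wait until sock is readable, then read up to n bytes."""
    yield ("r", sock.fileno())
    return sock.recv(n)


def reader(sock, out):
    """COROUTINE: read one chunk and store it in out."""
    data = yield from recv(sock, 100)
    out.append(data)


def run(gen, timeout=5.0):
    """Drive a single generator, waiting in select() whenever it blocks."""
    try:
        request = next(gen)
        while True:
            kind, fd = request
            if kind == "r":
                select.select([fd], [], [], timeout)
            else:
                select.select([], [fd], [], timeout)
            request = next(gen)
    except StopIteration as exc:
        return exc.value


def demo_recv():
    a, b = socket.socketpair()
    b.sendall(b"hello")
    out = []
    run(reader(a, out))
    a.close()
    b.close()
    return out
```

With more than one task, the driver would collect the requested fds from
all blocked tasks and resume whichever ones select() reports as ready.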

This implements some internet primitives using the APIs in (including block_r() and block_w()). I call them
transports but they are different from transports in Twisted; they are
closer to idealized sockets. SocketTransport wraps a plain socket,
offering recv() and send() methods that must be invoked using yield
from. SslTransport wraps an ssl socket (luckily in Python 2.6 and up,
stdlib ssl sockets have good async support!). Then there is a
BufferedReader class that implements more traditional read() and
readline() coroutines (i.e., to be invoked using yield from), the
latter handy for line-oriented transports. Finally there are some
functions for connecting sockets, the highest-level one
create_transport(). These use call_in_thread() to run
socket.getaddrinfo() in a thread (this provides IPv6 support).

I don't particularly care about the exact abstractions in this module;
they are convenient and I was surprised how easy it was to add SSL,
but still these mostly serve as somewhat realistic examples of how to
use (Afterthought: I think the SocketTransport's recv()
and send() methods could be made more similar to SslTransport.)

More examples in the final file:

There is a simplistic HTTP client here built on top of the
sockets.*Transport abstractions. And the main code exercises this by
spawning four tasks fetching a variety of URLs (more when you
uncomment a block of code) and waiting for their results. The code is
a bit of a mess because I used it as a place to try out various APIs.

I'm most interested in feedback on the design of and, and to a lesser extent on the design of; is just an example of how this style works out in practice.

Sorry for the brain-dump style; I would like to write it all up
better, but at the same time waiting longer doesn't necessarily make
it better, so here it is, for all to see. (I also have a list of
problems I had to debug during the development and what I learned from
that; but that's too raw to post right now.)

--Guido van Rossum (

From stephen at  Mon Oct 29 03:47:53 2012
From: stephen at (Stephen J. Turnbull)
Date: Mon, 29 Oct 2012 11:47:53 +0900
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
Message-ID: <>

Yury Selivanov writes:

 > The thing about 'doc2' & 'doc3' urls I don't like is that sooner or later
 > users will use python 3.  There is no future for python 2.

That's true for each user (assuming they don't die before switching).
It's not true for all applications, though.  There will undoubtedly be
systems based on Python 2 still in active, profitable use 10 years
from now.

It's just a yucky UI, let's stick to that for a reason.

From stephen at  Mon Oct 29 04:50:23 2012
From: stephen at (Stephen J. Turnbull)
Date: Mon, 29 Oct 2012 12:50:23 +0900
Subject: [Python-ideas]
In-Reply-To: <20121028064501.6b6c0203@resist>
References: <>
	<k6fifd$34v$> <20121028064501.6b6c0203@resist>
Message-ID: <>

Barry Warsaw writes:
 > On Oct 26, 2012, at 10:55 PM, Terry Reedy wrote:
 > >3.3 is now out 29 months after 2.7, library support is much improved, and the
 > >new unicode implementation fixes most to almost all the remaining problems
 > >with unicode. It is a release we can be proud of and should promote as the
 > >latest and greatest Python version.
 > Very definitely +1

As stated, yes, very much so.

I think it's unfortunate that some of this discussion has generated
more heat than light because there are three different goals here all
stemming from "promoting Python 3": (1) "... as a great language", (2)
"... as a great production-ready development environment" (for *some*
applications), and (3) "... as a great production-ready development
environment" (period, or to take a page from Linus's book, "World
Domination! Now!")

I think Nick's approach starts to phase in a change in promotion
effort appropriately.  But it's only a start.

From greg.ewing at  Mon Oct 29 06:05:24 2012
From: greg.ewing at (Greg Ewing)
Date: Mon, 29 Oct 2012 18:05:24 +1300
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

Yury Selivanov wrote:

>     def coro1():
>         try:
>             with timeout(1.0):
>                 yield from coro2() # 1
>         finally:
>             try:
>                 with timeout(1.0):
>                     yield from coro2() # 2
>             except TimeoutError:
>                 pass
>     def coro2():
>         try:
>             block()
>             yield # 3
>             action()
>         finally:
>             block()
>             yield # 4
>             another_action()
> Now, if "coro2" is suspended at #4 -- it shouldn't be interrupted with
> TimeoutError.
> If, however, "coro2" is at #3 -- it can be, and it doesn't matter was it 
> called from #1 or #2.

What is your reasoning behind asserting this? Because it's inside
a try block of its own? Because it's subject to a nested timeout?
Something else?


From mark.hackett at  Mon Oct 29 11:09:24 2012
From: mark.hackett at (Mark Hackett)
Date: Mon, 29 Oct 2012 10:09:24 +0000
Subject: [Python-ideas] Enabling man page structure for python
In-Reply-To: <>
References: <>
Message-ID: <>

On Friday 26 Oct 2012, Andi Albrecht wrote:
> Hi,
> On Thu, Oct 25, 2012 at 7:25 PM, Éric Araujo <merwok at> wrote:
> > Hi,
> >
> > See “argparse: add ability to create a
> > man page”
> I've started to work on this issue some time ago. The starting point
> was a man page formatter based on optparse I wrote earlier. But I've
> encountered some problems since the output order of argparse
> formatters differs from what one expects on a man page. IIRC I saw the
> need to make some changes to the way argparse formatters work but
> unfortunately got interrupted by other work.
> IMO adding an argparse formatter would probably be the right way to
> add man page support. There would even be no need to add this to
> stdlib then.
> Best regards,
> Andi
> > Cheers

I'd still like to see some of the functionality in the code I'd written to 
solve my problem in the parser, if not too much trouble, Andi.

I.e. at least a way to push more things to the man page (to be inserted in the 
page) so that you can add in more things (like external function calls).

It's not obvious to me whether argparse also gives you a synopsis (self-
written --help option).


From stefan at  Mon Oct 29 11:36:37 2012
From: stefan at (Stefan Krah)
Date: Mon, 29 Oct 2012 11:36:37 +0100
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
	<k6fifd$34v$> <20121028064501.6b6c0203@resist>
Message-ID: <>

Stephen J. Turnbull <stephen at> wrote:
> I think Nick's approach starts to phase in a change in promotion
> effort appropriately.  But it's only a start.

As for promotion, I just noticed that searching for "Python 3" gives this
as the first result:

Overall, the (Google) search results on the first page don't look very
inviting, so perhaps we could improve the situation by adding "nofollow"
to the older release pages.

Stefan Krah

From ncoghlan at  Mon Oct 29 11:51:33 2012
From: ncoghlan at (Nick Coghlan)
Date: Mon, 29 Oct 2012 20:51:33 +1000
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
	<k6fifd$34v$> <20121028064501.6b6c0203@resist>
Message-ID: <>

On Mon, Oct 29, 2012 at 8:36 PM, Stefan Krah <stefan at> wrote:
> As for promotion, I just noticed that searching for "Python 3" gives this
> as the first result:

The second result is the current docs at,
which is pretty useful, *except* that the docs have no pointer to the
corresponding release page. Perhaps the existing "Welcome" paragraph
should be extended with a reference to the appropriate release page?

(Also: very nice work to everyone that helped make the version
switcher a reality)


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From stefan at  Mon Oct 29 13:02:40 2012
From: stefan at (Stefan Krah)
Date: Mon, 29 Oct 2012 13:02:40 +0100
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
	<k6fifd$34v$> <20121028064501.6b6c0203@resist>
Message-ID: <>

Nick Coghlan <ncoghlan at> wrote:
> On Mon, Oct 29, 2012 at 8:36 PM, Stefan Krah <stefan at> wrote:
> > As for promotion, I just noticed that searching for "Python 3" gives this
> > as the first result:
> >
> >
> The second result is the current docs at,
> which is pretty useful, *except* that the docs have no pointer to the
> corresponding release page. Perhaps the existing "Welcome" paragraph
> should be extended with a reference to the appropriate release page?

I think that's probably not necessary. Someone who is really searching
for the newest version will of course find it.

Getting rid of 3.0 in the top search results is more of an image thing.
3.0 is associated with "this new experimental version with virtually
no packages that support it".

For the casual searcher who might be trying to decide between Python and
other languages it would be nice to have more 3.3 links, hopefully sending
the message "a better Python with many more features and Django/Twisted
support just around the corner".

> (Also: very nice work to everyone that helped make the version
> switcher a reality)

I agree, the changes are a great improvement. Thanks everyone.

Stefan Krah

From shibturn at  Mon Oct 29 14:13:15 2012
From: shibturn at (Richard Oudkerk)
Date: Mon, 29 Oct 2012 13:13:15 +0000
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <k6lvdf$i5n$>

On 28/10/2012 11:52pm, Guido van Rossum wrote:
> I'm most interested in feedback on the design of and
>, and to a lesser extent on the design of;
> is just an example of how this style works out in practice.

What happens if two tasks try to do a read op (or two tasks try to do a 
write op) on the same file descriptor?  It looks like the second one to 
do scheduling.block_r(fd) will cause the first task to be forgotten, 
causing the first task to block forever.

Shouldn't there be a list of pending readers and a list of pending 
writers for each fd?
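
One way to read this suggestion, as a sketch only (the class and method
names below are made up and are not part of the posted code), is a FIFO
queue of waiting tasks per fd:

```python
import collections


class PendingReaders:
    """Hypothetical registry keeping a FIFO queue of waiters per fd."""

    def __init__(self):
        self.waiting = collections.defaultdict(collections.deque)

    def block_r(self, fd, task):
        """Register task as waiting to read from fd; earlier waiters stay."""
        self.waiting[fd].append(task)

    def fd_readable(self, fd):
        """Return the task to wake for fd, oldest waiter first, or None."""
        queue = self.waiting.get(fd)
        if not queue:
            return None
        task = queue.popleft()
        if not queue:
            del self.waiting[fd]  # nobody left waiting on this fd
        return task
```

Waking one waiter per readiness event is just one possible policy; a
matching structure would be needed for writers.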


From Steve.Dower at  Mon Oct 29 15:00:35 2012
From: Steve.Dower at (Steve Dower)
Date: Mon, 29 Oct 2012 14:00:35 +0000
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <k6lvdf$i5n$>
References: <>,
Message-ID: <>

Richard Oudkerk wrote:
> On 28/10/2012 11:52pm, Guido van Rossum wrote:
>> I'm most interested in feedback on the design of and
>>, and to a lesser extent on the design of;
>> is just an example of how this style works out in practice.
> What happens if two tasks try to do a read op (or two tasks try to do a
> write op) on the same file descriptor?  It looks like the second one to
> do scheduling.block_r(fd) will cause the first task to be forgotten,
> causing the first task to block forever.

I know I haven't posted my own code yet (coming very soon), but I'd like to put out there that I don't think this is an important sort of question at this time. We both have sample schedulers that work well enough to demonstrate the API, but aren't meant to be production ready.

IMO, the important questions are:

 - how easy/difficult/flexible/restrictive is it to write a new scheduler as a core Python developer?
 - how easy/difficult/flexible/restrictive is it to write a new scheduler as an end user?
 - how easy/difficult/flexible/restrictive is it to write async operations as a core Python developer?
 - how easy/difficult/flexible/restrictive is it to write async operations as an end user?
 - how straightforward is it to consume async operations?
 - how easy is it to write async code that is correct?

Admittedly, I am writing this preemptively knowing that there are a lot of distractions like this in my code (some people are going to be horrified at what I did with file IO :-) Don't worry, it's only for trying the API). Once we know what interface we'll be coding against we can worry about getting the implementation perfect. Also, I imagine we'll find some more volunteers for coding (hopefully people who have done non-blocking stuff in C or similar before) who are currently avoiding the higher-level ideas discussion.


From guido at  Mon Oct 29 15:47:55 2012
From: guido at (Guido van Rossum)
Date: Mon, 29 Oct 2012 07:47:55 -0700
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 29, 2012 at 7:00 AM, Steve Dower <Steve.Dower at> wrote:
> Richard Oudkerk wrote:
>> On 28/10/2012 11:52pm, Guido van Rossum wrote:
>>> I'm most interested in feedback on the design of and
>>>, and to a lesser extent on the design of;
>>> is just an example of how this style works out in practice.
>> What happens if two tasks try to do a read op (or two tasks try to do a
>> write op) on the same file descriptor?  It looks like the second one to
>> do scheduling.block_r(fd) will cause the first task to be forgotten,
>> causing the first task to block forever.
> I know I haven't posted my own code yet (coming very soon), but I'd like to put out there that I don't think this is an important sort of question at this time.

Kind of. I think if it was an important use case it might affect the
shape of the API. However I can't think of a use case where it might
make sense for two tasks to read or write the same file descriptor
without some higher-level mediation. (Even at a higher level I find it
hard to imagine, except for writing to a common log file -- but even
there you want to be sure that individual lines aren't spliced into
each other, and the semantics of send() don't prevent that.)

> We both have sample schedulers that work well enough to demonstrate the API, but aren't meant to be production ready.
> IMO, the important questions are:
>  - how easy/difficult/flexible/restrictive is it to write a new scheduler as a core Python developer?
>  - how easy/difficult/flexible/restrictive is it to write a new scheduler as an end user?
>  - how easy/difficult/flexible/restrictive is it to write async operations as a core Python developer?
>  - how easy/difficult/flexible/restrictive is it to write async operations as an end user?
>  - how straightforward is it to consume async operations?
>  - how easy is it to write async code that is correct?

Yes, these are all important questions. I'm not sure that end users
would be writing new schedulers -- but 3rd party library developers
will be, and I suppose that's what you are referring to.

My own approach to answering these is to first try to figure out what
a typical application would be trying to accomplish. That's why I made
a point of implementing a 100% async HTTP client -- it's just quirky
enough that it exercises various issues (e.g. switching between
line-mode and blob mode, and the need to invoke getaddrinfo()).

> Admittedly, I am writing this preemptively knowing that there are a lot of distractions like this in my code (some people are going to be horrified at what I did with file IO :-) Don't worry, it's only for trying the API). Once we know what interface we'll be coding against we can worry about getting the implementation perfect. Also, I imagine we'll find some more volunteers for coding (hopefully people who have done non-blocking stuff in C or similar before) who are currently avoiding the higher-level ideas discussion.

I'm looking forward to it! I suspect we'll be merging our designs shortly...

--Guido van Rossum (

From shibturn at  Mon Oct 29 17:03:07 2012
From: shibturn at (Richard Oudkerk)
Date: Mon, 29 Oct 2012 16:03:07 +0000
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <k6m9bt$g3r$>

On 29/10/2012 2:47pm, Guido van Rossum wrote:
> Kind of. I think if it was an important use case it might affect the
> shape of the API. However I can't think of a use case where it might
> make sense for two tasks to read or write the same file descriptor
> without some higher-level mediation. (Even at a higher level I find it
> hard to imagine, except for writing to a common log file -- but even
> there you want to be sure that individual lines aren't spliced into
> each other, and the semantics of send() don't prevent that.)

It is a common pattern to have multiple threads/processes trying to 
accept connections on a single listening socket, so it would be 
unfortunate to disallow that.  Writing (short messages) to a pipe also 
has atomic guarantees that can make having multiple writers perfectly 
reasonable.


From solipsis at  Mon Oct 29 17:07:31 2012
From: solipsis at (Antoine Pitrou)
Date: Mon, 29 Oct 2012 17:07:31 +0100
Subject: [Python-ideas] Async API: some code to review
References: <>
Message-ID: <20121029170731.74bd3d37@cosmocat>

Hello Guido,

On Sun, 28 Oct 2012 16:52:02 -0700,
Guido van Rossum <guido at> wrote:
> The event list started out as a tuple of (fd, flag, callback, args),
> where flag is 'r' or 'w' (easily extensible); in practice neither the
> fd nor the flag are used, and one of the last things I did was to wrap
> callback and args into a simple object that allows cancelling the
> callback; the add_*() methods return this object. (This could probably
> use a little more abstraction.) Note that poll() doesn't call the
> callbacks -- that's up to the event loop.

I don't understand why the pollster takes callback objects if it never
calls them. Also the fact that it wraps them into DelayedCalls is more
mysterious to me. DelayedCalls represent one-time cancellable callbacks
with a given deadline, not callbacks which are called any number of
times on I/O events and that you can't cancel.

> This is the scheduler for PEP-380 style coroutines. I started with a
> Scheduler class and operations along the lines of Greg Ewing's design,
> with a Scheduler instance as a global variable, but ended up ripping
> it out in favor of a Task object that represents a single stack of
> generators chained via yield-from. There is a Context object holding
> the event loop and the current task in thread-local storage, so that
> multiple threads can (and must) have independent event loops.

YMMV, but I tend to be wary of implicit thread-local storage. What if
someone runs a function or method depending on that thread-local
storage from inside a thread pool? Weird bugs ensue.

I think explicit context is much less error-prone. Even a single global
instance (like Twisted's reactor) would be better :-)

As for the rest of the scheduling module, I can't say much since I have
a hard time reading and understanding it.

> To invoke a primitive I/O operation, you call the current task's
> block() method and then immediately yield (similar to Greg Ewing's
> approach). There are helpers block_r() and block_w() that arrange for
> a task to block until a file descriptor is ready for reading/writing.
> Examples of their use are in

That's weird and kind of ugly IMHO. Why would you write:

        scheduling.block_w(self.sock.fileno())
        yield

instead of say:

        yield scheduling.block_w(self.sock.fileno())

?

Also, the fact that each call to SocketTransport.{recv,send} explicitly
registers then removes the fd on the event loop looks wasteful.

By the way, even when a fd is signalled ready, you must still be
prepared for recv() to return EAGAIN (see

> In the docstrings I use the prefix "COROUTINE:" to indicate public
> APIs that should be invoked using yield from.

Hmm, should they? Your approach looks a bit weird: you have functions
that should use yield, and others that should use "yield from"? That
sounds confusing to me.

I'd much rather either have all functions use "yield", or have all
functions use "yield from".

(also, I wouldn't be shocked if coroutines had to wear a special
decorator; it's a better marker than having the word COROUTINE in the
docstring, anyway :-))

> This implements some internet primitives using the APIs in
> (including block_r() and block_w()). I call them
> transports but they are different from transports Twisted; they are
> closer to idealized sockets. SocketTransport wraps a plain socket,
> offering recv() and send() methods that must be invoked using yield
> from. SslTransport wraps an ssl socket (luckily in Python 2.6 and up,
> stdlib ssl sockets have good async support!).

SslTransport.{recv,send} need the same kind of logic as do_handshake():
catch both SSLWantReadError and SSLWantWriteError, and call block_r /
block_w accordingly.
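
A rough sketch of that retry logic, assuming a non-blocking SSL socket.
Here block_r() and block_w() are stand-in stubs that merely yield (the real
ones would register the fd with the event loop), and ssl_recv is a
hypothetical name:

```python
import ssl


def block_r(fd):
    # Stub: a real scheduler would suspend the task until fd is readable.
    yield


def block_w(fd):
    # Stub: a real scheduler would suspend the task until fd is writable.
    yield


def ssl_recv(sslsock, n):
    """Coroutine (use with yield from): receive up to n bytes over SSL."""
    while True:
        try:
            return sslsock.recv(n)
        except ssl.SSLWantReadError:
            # The SSL layer needs more incoming data before it can proceed.
            yield from block_r(sslsock.fileno())
        except ssl.SSLWantWriteError:
            # The SSL layer must send data (e.g. renegotiation) first.
            yield from block_w(sslsock.fileno())
```

A send() counterpart would have the same shape, with sslsock.send(data) in
the try block.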

> Then there is a
> BufferedReader class that implements more traditional read() and
> readline() coroutines (i.e., to be invoked using yield from), the
> latter handy for line-oriented transports.

Well... It would be nice if BufferedReader could re-use the actual
io.BufferedReader and its fast readline(), read(), readinto()
implementations.



From mark.hackett at  Mon Oct 29 17:09:51 2012
From: mark.hackett at (Mark Hackett)
Date: Mon, 29 Oct 2012 16:09:51 +0000
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <k6m9bt$g3r$>
References: <>
Message-ID: <>

On Monday 29 Oct 2012, Richard Oudkerk wrote:
> Writing (short messages) to a pipe also
> has atomic guarantees that can make having multiple writers perfectly
> reasonable.
> --
> Richard

Is that actually true? It may be guaranteed on Intel x86 compatibles and Linux 
(because of the string operations available in the x86 instruction set), but I 
don't think anything other than an IPC message has a "you can write a string 
atomically" guarantee. And I may be misremembering that.

And even if it's part of the SUS, how do we know this is true for non-UNIX 
compatible systems?

From jrwren at  Mon Oct 29 17:12:56 2012
From: jrwren at (Jay Wren)
Date: Mon, 29 Oct 2012 12:12:56 -0400
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
Message-ID: <>

On Oct 27, 2012, at 8:21 AM, Paul Moore <p.f.moore at> wrote:

> On 27 October 2012 12:43, Antoine Pitrou <solipsis at> wrote:
>> On Sat, 27 Oct 2012 12:06:41 +0100
>> Paul Moore <p.f.moore at> wrote:
>>>> Another idea is similar, but instead of doing /2.x/ always redirect
>>>> the root of to the latest production release, so
>>>> right now /foo would redirect to /2.7/foo. This is even better for
>>>> maintaining links to the actual resource people meant to link
>>>> to. Could even include a header at the top of old versions saying that
>>>> "You are currently viewing the docs for 2.5. Click here to view the
>>>> docs for 2.7".
>>> -1. Certainly what I (and I suspect many others) usually care about is
>>> getting at the "Python 2" or "Python 3" documentation, not a specific
>>> version. Having the 2.7, 2.6 links is fine, but I don't *think* of
>>> myself as going to the 2.7 docs, but rather to the 2.x docs (as
>>> opposed to 3.x). The "New in x.y" annotations give me the history I
>>> need. And I think that's true of links as well - they would be to
>>> "python 2" or "python 3", not (normally) to a specific minor version.
>> I'm not sure why you're -1 about something which wouldn't affect you
>> negatively.  As you say yourself, the 2.7 docs have all the information
>> you need about previous releases as well (because of the versionadded
>> and versionchanged markers). *However*, the 2.6 and previous docs don't
>> have information about useful stuff added in 2.7.
> Maybe I misunderstood. I was assuming that there would be no "2.x"
> link, only "2.7". That's what I'm against - I would prefer to use a
> generic 2.x link to get to the Python 2 docs if I needed them (just as
> I use at the moment).
> My -1 was too strong though, make that a -0 (and a "don't care" if
> there will be a 2.x link as well as the explicit ones).
>> And since 2.7 is the last in the 2.x line, I think it makes sense to
>> reflect that explicitly in the redirections.
> I'm not against an explicit 2.7 link - we have that already, don't we?

Did this change recently? I just noticed that from if I click "Browse Current Documentation" under the Python 2.x section, it links to which then redirects to which is NOT the 2.x current documentation for which I clicked.

From guido at  Mon Oct 29 17:35:16 2012
From: guido at (Guido van Rossum)
Date: Mon, 29 Oct 2012 09:35:16 -0700
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <k6m9bt$g3r$>
References: <>
Message-ID: <>

On Mon, Oct 29, 2012 at 9:03 AM, Richard Oudkerk <shibturn at> wrote:
> On 29/10/2012 2:47pm, Guido van Rossum wrote:
>> Kind of. I think if it was an important use case it might affect the
>> shape of the API. However I can't think of a use case where it might
>> make sense for two tasks to read or write the same file descriptor
>> without some higher-level mediation. (Even at a higher level I find it
>> hard to imagine, except for writing to a common log file -- but even
>> there you want to be sure that individual lines aren't spliced into
>> each other, and the semantics of send() don't prevent that.)
> It is a common pattern to have multiple threads/processes trying to accept
> connections on a single listening socket, so it would be unfortunate to
> disallow that.

Ah, but that will work -- each thread has its own pollster, event loop
and scheduler and collection of tasks. And listening on a socket is a
pretty special case anyway -- I imagine we'd build a special API just
for that purpose.

> Writing (short messages) to a pipe also has atomic
> guarantees that can make having multiple writers perfectly reasonable.

That's a good one. I'll keep that on the list of requirements.

--Guido van Rossum (

From shibturn at  Mon Oct 29 17:41:57 2012
From: shibturn at (Richard Oudkerk)
Date: Mon, 29 Oct 2012 16:41:57 +0000
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <k6mbko$8jh$>

On 29/10/2012 4:09pm, Mark Hackett wrote:
> Is that actually true? It may be guaranteed on Intel x86 compatibles and Linux
> (because of the string operations available in the x86 instruction set), but I
> don't think anything other than an IPC message has a "you can write a string
> atomically" guarantee. And I may be misremembering that.

The guarantee I was talking about is for pipes on Unix:

POSIX.1-2001 says that write(2)s of less than PIPE_BUF bytes must be 
atomic: the output data is written to the pipe as a contiguous sequence. 
Writes of more than PIPE_BUF bytes may be nonatomic: the kernel may 
interleave the data with data written by other processes. POSIX.1-2001 
requires PIPE_BUF to be at least 512 bytes. (On Linux, PIPE_BUF is 4096 
bytes.) ...

On Windows writes to pipes in message oriented mode are also atomic.
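
A small demonstration of the POSIX guarantee quoted above, as a sketch for
Unix-like systems only. Several threads write fixed-size records no larger
than PIPE_BUF to one pipe, and the reader then checks that no record was
interleaved with another. The names writer, run_demo and RECORD_SIZE are
made up for the example.

```python
import os
import select
import threading

# select.PIPE_BUF exists on Unix; 512 is the minimum POSIX requires.
PIPE_BUF = getattr(select, "PIPE_BUF", 512)
RECORD_SIZE = 64
RECORDS_PER_WRITER = 50


def writer(fd, tag):
    record = tag * RECORD_SIZE  # e.g. b"aaaa...": easy to check afterwards
    for _ in range(RECORDS_PER_WRITER):
        os.write(fd, record)  # one write(2) call per record


def run_demo(tags=(b"a", b"b", b"c", b"d")):
    assert RECORD_SIZE <= PIPE_BUF
    r, w = os.pipe()
    threads = [threading.Thread(target=writer, args=(w, tag)) for tag in tags]
    for t in threads:
        t.start()
    # Read concurrently so the writers can never fill the pipe and stall.
    expected = len(tags) * RECORDS_PER_WRITER * RECORD_SIZE
    data = b""
    while len(data) < expected:
        data += os.read(r, 65536)
    for t in threads:
        t.join()
    os.close(w)
    os.close(r)
    # Split the stream back into records.
    return [data[i:i + RECORD_SIZE] for i in range(0, len(data), RECORD_SIZE)]
```

If the writes are atomic, every returned record consists of a single
repeated byte, whatever order the threads happened to run in.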

> And even if it's part of the SUS, how do we know this is true for non-UNIX
> compatible systems?

We don't, but that isn't necessarily a reason to ban it as evil.


From at  Mon Oct 29 17:42:44 2012
From: at (Yury Selivanov)
Date: Mon, 29 Oct 2012 12:42:44 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-29, at 1:05 AM, Greg Ewing <greg.ewing at> wrote:

> Yury Selivanov wrote:
>>    def coro1():
>>        try:
>>            with timeout(1.0):
>>                yield from coro2() # 1
>>        finally:
>>            try:
>>                with timeout(1.0):
>>                    yield from coro2() # 2
>>            except TimeoutError:
>>                pass
>>    def coro2():
>>        try:
>>            block()
>>            yield # 3
>>            action()
>>        finally:
>>            block()
>>            yield # 4
>>            another_action()
>> Now, if "coro2" is suspended at #4 -- it shouldn't be interrupted with
>> TimeoutError.
>> If, however, "coro2" is at #3 -- it can be, and it doesn't matter whether it was called from #1 or #2.
> What is your reasoning behind asserting this? Because it's inside
> a try block of its own? Because it's subject to a nested timeout?
> Something else?

Because the scheduler, when deciding whether to interrupt a coroutine, 
should only ask whether that particular coroutine is in its finally block, 
and not whether the one which called it is.


From mark.hackett at  Mon Oct 29 17:46:13 2012
From: mark.hackett at (Mark Hackett)
Date: Mon, 29 Oct 2012 16:46:13 +0000
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <k6mbko$8jh$>
References: <>
Message-ID: <>

On Monday 29 Oct 2012, Richard Oudkerk wrote:
> On Windows writes to pipes in message oriented mode are also atomic.
> > And even if it's part of the SUS, how do we know this is true for
> > non-UNIX compatible systems?
> We don't, but that isn't necessarily a reason to ban it as evil.

Hey, good idea I didn't say ban it, then hey?

But if the OS cannot guarantee atomic writes (and enforce that size to ensure 
atomic writes for the system run under), then you cannot just say "Atomic 
writes mean we can have safely multiple threads accessing the pipe".

The multiple access requires atomic access.

If that cannot be guaranteed, then you cannot give multiple access.

From mark.hackett at  Mon Oct 29 17:47:46 2012
From: mark.hackett at (Mark Hackett)
Date: Mon, 29 Oct 2012 16:47:46 +0000
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <k6mbko$8jh$>
References: <>
Message-ID: <>

On Monday 29 Oct 2012, Richard Oudkerk wrote:
> On Windows writes to pipes in message oriented mode are also atomic.

PS this means, like I said maybe, that you have to be running an IPC message 
to get guaranteed atomic writes.

If someone has their Python program with multiple threads accessing the 
pipe, but that pipe is NOT running in message oriented mode, then you will get 

From at  Mon Oct 29 17:47:50 2012
From: at (Yury Selivanov)
Date: Mon, 29 Oct 2012 12:47:50 -0400
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <20121029170731.74bd3d37@cosmocat>
References: <>
Message-ID: <>

On 2012-10-29, at 12:07 PM, Antoine Pitrou <solipsis at> wrote:

>> To invoke a primitive I/O operation, you call the current task's
>> block() method and then immediately yield (similar to Greg Ewing's
>> approach). There are helpers block_r() and block_w() that arrange for
>> a task to block until a file descriptor is ready for reading/writing.
>> Examples of their use are in
> That's weird and kindof ugly IMHO. Why would you write:
>        scheduling.block_w(self.sock.fileno())
>        yield
> instead of say:
>        yield scheduling.block_w(self.sock.fileno())
> ?

I, personally, like and use the second approach.  But I believe the 
main incentive for Guido & Greg to use 'yield' like that is to make
one thing *very* clear: always use 'yield from' to call something.  
A bare 'yield' statement is just an explicit context switch point, and it 
should be used only for that purpose and only when you write 
low-level APIs.


From at  Mon Oct 29 17:59:12 2012
From: at (Yury Selivanov)
Date: Mon, 29 Oct 2012 12:59:12 -0400
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <20121029170731.74bd3d37@cosmocat>
References: <>
Message-ID: <>

On 2012-10-29, at 12:07 PM, Antoine Pitrou <solipsis at> wrote:

>> In the docstrings I use the prefix "COROUTINE:" to indicate public
>> APIs that should be invoked using yield from.
> Hmm, should they? Your approach looks a bit weird: you have functions
> that should use yield, and others that should use "yield from"? That
> sounds confusing to me.
> I'd much rather either have all functions use "yield", or have all
> functions use "yield from".
> (also, I wouldn't be shocked if coroutines had to wear a special
> decorator; it's a better marker than having the word COROUTINE in the
> docstring, anyway :-))

That's what bothers me as well.  'yield from' looks too long for a
simple thing it does (1); users will be confused whether they should
use 'yield' or 'yield from' (2); there is no visible difference between
a plain generator and a coroutine (3).

Personally, I like Greg's PEP 3152 (aside from 'cocall' keyword).
With that approach it's easy to distinguish coroutines, generators and
plain functions.  And it'd be easier to add some special 
methods/properties to codefs, like 'in_finally()' method etc.


From cesare.di.mauro at  Mon Oct 29 18:02:09 2012
From: cesare.di.mauro at (Cesare Di Mauro)
Date: Mon, 29 Oct 2012 18:02:09 +0100
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

2012/10/29 Mark Hackett <mark.hackett at>

> On Monday 29 Oct 2012, Richard Oudkerk wrote:
> > Writing (short messages) to a pipe also
> > has atomic guarantees that can make having multiple writers perfectly
> > reasonable.
> >
> > --
> > Richard
> Is that actually true? It may be guaranteed on Intel x86 compatibles and
> Linux (because of the string operations available in the x86 instruction
> set), but I don't think anything other than an IPC message has a "you can
> write a string atomically" guarantee. And I may be misremembering that.

x86 and x64 string operations aren't atomic. Only a few, selected,
instructions can be LOCK prefixed (XCHG is the only one that doesn't
require it, since it's always locked) to ensure an atomic RMW memory


From guido at  Mon Oct 29 18:03:00 2012
From: guido at (Guido van Rossum)
Date: Mon, 29 Oct 2012 10:03:00 -0700
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <20121029170731.74bd3d37@cosmocat>
References: <>
Message-ID: <>

On Mon, Oct 29, 2012 at 9:07 AM, Antoine Pitrou <solipsis at> wrote:
> On Sun, 28 Oct 2012 16:52:02 -0700,
> Guido van Rossum <guido at> wrote:
>> The event list started out as a tuple of (fd, flag, callback, args),
>> where flag is 'r' or 'w' (easily extensible); in practice neither the
>> fd nor the flag are used, and one of the last things I did was to wrap
>> callback and args into a simple object that allows cancelling the
>> callback; the add_*() methods return this object. (This could probably
>> use a little more abstraction.) Note that poll() doesn't call the
>> callbacks -- that's up to the event loop.
> I don't understand why the pollster takes callback objects if it never
> calls them. Also the fact that it wraps them into DelayedCalls is more
> mysterious to me. DelayedCalls represent one-time cancellable callbacks
> with a given deadline, not callbacks which are called any number of
> times on I/O events and that you can't cancel.

Yeah, this part definitely needs reworking. In the current design the
pollster is a base class of the eventloop, and the latter *does* call
them; but I want to refactor that anyway. I'll probably end up with a
pollster that registers (what are to it) opaque tokens and returns
just a list of tokens from poll(). (Unrelated: would it be useful if
poll() was an iterator?)
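
A sketch of what such a token-based pollster might look like, built on
select.select() for brevity. TokenPollster and its methods are hypothetical
names used only to illustrate the idea; the pollster stores whatever token
it is given and never looks inside it.

```python
import select
import socket


class TokenPollster:
    """Hypothetical pollster that maps fds to opaque tokens."""

    def __init__(self):
        self.readers = {}  # fd -> token
        self.writers = {}  # fd -> token

    def add_reader(self, fd, token):
        self.readers[fd] = token

    def add_writer(self, fd, token):
        self.writers[fd] = token

    def remove_reader(self, fd):
        self.readers.pop(fd, None)

    def remove_writer(self, fd):
        self.writers.pop(fd, None)

    def poll(self, timeout=None):
        """Return the tokens of all ready fds; never call anything."""
        if not self.readers and not self.writers:
            return []
        readable, writable, _ = select.select(
            list(self.readers), list(self.writers), [], timeout)
        return ([self.readers[fd] for fd in readable] +
                [self.writers[fd] for fd in writable])
```

The event loop would then decide what a token means, for example a callback
wrapper or a task to resume.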

>> This is the scheduler for PEP-380 style coroutines. I started with a
>> Scheduler class and operations along the lines of Greg Ewing's design,
>> with a Scheduler instance as a global variable, but ended up ripping
>> it out in favor of a Task object that represents a single stack of
>> generators chained via yield-from. There is a Context object holding
>> the event loop and the current task in thread-local storage, so that
>> multiple threads can (and must) have independent event loops.
> YMMV, but I tend to be wary of implicit thread-local storage. What if
> someone runs a function or method depending on that thread-local
> storage from inside a thread pool? Weird bugs ensue.

Agreed, I had to figure out one of these in the implementation of
call_in_thread() and it wasn't fun.

I don't know what else to do -- I think it's probably best if I base
my implementation on this for now so that I know it works correctly in
such an environment. In the end there will probably be an API to get
the current context and another to influence how that API gets it, so
people can plug in their own schemes, from TLS to a simple global to
something determined by an external library.
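
One possible shape for such a pair of APIs, as a sketch only. The names
get_context(), set_context_getter() and Context are hypothetical; the
default getter uses thread-local storage and can be swapped for a plain
global or anything else.

```python
import threading


class Context:
    """Hypothetical holder for the event loop and current task."""

    def __init__(self, name):
        self.name = name


_tls = threading.local()


def _tls_getter():
    # Default scheme: one lazily created Context per thread.
    ctx = getattr(_tls, "context", None)
    if ctx is None:
        ctx = _tls.context = Context(threading.current_thread().name)
    return ctx


_getter = _tls_getter


def set_context_getter(getter):
    """Plug in another scheme (a simple global, an external library, ...)."""
    global _getter
    _getter = getter


def get_context():
    """Return the current context using whichever scheme is installed."""
    return _getter()
```

A program that wants a single global instance could call
set_context_getter(lambda: shared_context) once at startup.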

> I think explicit context is much less error-prone. Even a single global
> instance (like Twisted's reactor) would be better :-)

I find that passing the context around everywhere makes for awkward APIs though.

> As for the rest of the scheduling module, I can't say much since I have
> a hard time reading and understanding it.

That's a problem, I need to write this up properly so that everyone
can understand it.

>> To invoke a primitive I/O operation, you call the current task's
>> block() method and then immediately yield (similar to Greg Ewing's
>> approach). There are helpers block_r() and block_w() that arrange for
>> a task to block until a file descriptor is ready for reading/writing.
>> Examples of their use are in
> That's weird and kindof ugly IMHO. Why would you write:
>         scheduling.block_w(self.sock.fileno())
>         yield
> instead of say:
>         yield scheduling.block_w(self.sock.fileno())
> ?

This has been debated ad nauseam already (be glad you missed it);
basically, there's not a whole lot of difference but if there are some
APIs that require "yield X(args)" and others that require "yield from
Y(args)" that's really confusing. The "bare yield only" makes it
possible (though I didn't implement it here) to put some strict checks
in the scheduler -- next() should never return anything except None.
But there are other ways to do that too.

Anyway, I probably will change the API so that e.g. doesn't
have to use this paradigm; I'll just wrap these low-level APIs in a
proper "coroutine" and then can just use "yield from
block_r(fd)". (This is one reason why I like the "bare generators with
yield from" approach that Greg Ewing and PEP 380 recommend: it's
really cheap to wrap an API in an extra layer of yield-from. See the
benchmark I added to the tulip directory.)

> Also, the fact that each call to SocketTransport.{recv,send} explicitly
> registers then removes the fd on the event loop looks wasteful.

I am hoping to add some optimization for this -- I am actually
planning a hackathon (or re-education session :-) with some Twisted
folks where I hope they'll explain to me how they do this.

> By the way, even when a fd is signalled ready, you must still be
> prepared for recv() to return EAGAIN (see

Yeah, I should know, I ran into this for a Google project too (there
was a kernel driver that was lying...). I had a cryptic remark in my
post above referring to this.
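
A minimal sketch of that defensive pattern for a non-blocking socket, using
plain select() rather than the event loop under discussion. try_recv and
recv_when_ready are hypothetical helper names.

```python
import select
import socket


def try_recv(sock, n):
    """Return received bytes, or None if the socket was not really ready."""
    try:
        return sock.recv(n)
    except (BlockingIOError, InterruptedError):
        # EAGAIN/EWOULDBLOCK or EINTR: a spurious readiness report, retry later.
        return None


def recv_when_ready(sock, n, timeout=1.0):
    """Wait for readability, but do not trust it blindly."""
    while True:
        readable, _, _ = select.select([sock], [], [], timeout)
        if not readable:
            return None  # timed out
        data = try_recv(sock, n)
        if data is not None:
            return data  # may be b"" if the peer closed the connection
```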

>> In the docstrings I use the prefix "COROUTINE:" to indicate public
>> APIs that should be invoked using yield from.
> Hmm, should they? Your approach looks a bit weird: you have functions
> that should use yield, and others that should use "yield from"? That
> sounds confusing to me.

Yeah, see above.

> I'd much rather either have all functions use "yield", or have all
> functions use "yield from".

Agreed, and I'm strongly in favor of "yield from". The block_r() +
yield is considered an *internal* API.

> (also, I wouldn't be shocked if coroutines had to wear a special
> decorator; it's a better marker than having the word COROUTINE in the
> docstring, anyway :-))

Agreed it would be useful as documentation, and maybe an API can use
this to enforce proper coding style. It would have to be purely
decoration though -- I don't want an extra layer of wrapping to occur
each time you call a coroutine. (I.e. the decorator should just return
"func".)

>> This implements some internet primitives using the APIs in
>> (including block_r() and block_w()). I call them
>> transports but they are different from transports Twisted; they are
>> closer to idealized sockets. SocketTransport wraps a plain socket,
>> offering recv() and send() methods that must be invoked using yield
>> from. SslTransport wraps an ssl socket (luckily in Python 2.6 and up,
>> stdlib ssl sockets have good async support!).
> SslTransport.{recv,send} need the same kind of logic as do_handshake():
> catch both SSLWantReadError and SSLWantWriteError, and call block_r /
> block_w accordingly.

Oh... Thanks for the tip. I didn't find this in the ssl module docs.

>> Then there is a
>> BufferedReader class that implements more traditional read() and
>> readline() coroutines (i.e., to be invoked using yield from), the
>> latter handy for line-oriented transports.
> Well... It would be nice if BufferedReader could re-use the actual
> io.BufferedReader and its fast readline(), read(), readinto()
> implementations.

Agreed, I would love that too, but the problem is, *this*
BufferedReader defines methods you have to invoke with yield from.
Maybe we can come up with a solution for sharing code by modifying the
_io module though; that would be great! (I've also been thinking of
layering TextIOWrapper on top of these.)

Thanks for the thorough review!

--Guido van Rossum (

From at  Mon Oct 29 18:08:14 2012
From: at (Yury Selivanov)
Date: Mon, 29 Oct 2012 13:08:14 -0400
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-29, at 1:03 PM, Guido van Rossum <guido at> wrote:

> Agreed it would be useful as documentation, and maybe an API can use
> this to enforce proper coding style. It would have to be purely
> decoration though -- I don't want an extra layer of wrapping to occur
> each time you call a coroutine. (I.e. the decorator should just return
> "func".)

I'd also set something like 'func.__coroutine__' to True.  That will allow 
to analyze, introspect, validate and do other useful things.
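
A sketch of such a marker decorator. The attribute name __coroutine__ is
the one suggested above; the names coroutine, is_coroutine and fetch are
made up for the example.

```python
def coroutine(func):
    """Mark func as a coroutine without wrapping it."""
    func.__coroutine__ = True
    return func  # the same object: no extra call layer


def is_coroutine(func):
    """Introspection helper: was func marked by @coroutine?"""
    return getattr(func, "__coroutine__", False) is True


@coroutine
def fetch():
    yield  # placeholder body; a real coroutine would use 'yield from'
```

Since the decorator returns the original function, calling fetch() costs
exactly what it did before decoration.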


From g.brandl at  Mon Oct 29 18:24:30 2012
From: g.brandl at (Georg Brandl)
Date: Mon, 29 Oct 2012 18:24:30 +0100
Subject: [Python-ideas]
In-Reply-To: <>
References: <>
Message-ID: <k6me4b$16i$>

On 29.10.2012 17:12, Jay Wren wrote:

>>> And since 2.7 is the last in the 2.x line, I think it makes sense to 
>>> reflect that explicitly in the redirections.
>> I'm not against an explicit 2.7 link - we have that already, don't we?
> Did this change recently? I just noticed that from
> if I click "Browse Current Documentation" under the Python 2.x section, it
> links to which then redirects to which is
> NOT the 2.x current documentation for which I clicked. -- Jay

Good point.  Should be fixed now.


From guido at  Mon Oct 29 18:43:09 2012
From: guido at (Guido van Rossum)
Date: Mon, 29 Oct 2012 10:43:09 -0700
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 29, 2012 at 10:08 AM, Yury Selivanov
< at> wrote:
> On 2012-10-29, at 1:03 PM, Guido van Rossum <guido at> wrote:
>> Agreed it would be useful as documentation, and maybe an API can use
>> this to enforce proper coding style. It would have to be purely
>> decoration though -- I don't want an extra layer of wrapping to occur
>> each time you call a coroutine. (I.e. the decorator should just return
>> "func".)
> I'd also set something like 'func.__coroutine__' to True.  That will allow
> to analyze, introspect, validate and do other useful things.

Yes, that sounds about right.

--Guido van Rossum (

From andrew.svetlov at  Mon Oct 29 19:02:09 2012
From: andrew.svetlov at (Andrew Svetlov)
Date: Mon, 29 Oct 2012 20:02:09 +0200
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

Pollster has to support any object as file descriptor.
The use case is ZeroMQ sockets: they are implemented at user level and
a socket is just some opaque structure wrapped by a Python object.
ZeroMQ has its own poll function to process zmq sockets as well as regular
sockets/pipes/files.

I would like to see add_{reader,writer} and call_{soon,later} accepting
**kwargs as well as *args. At least to respect functions with
keyword-only arguments.

+1 for explicit passing loop instance and clearing role of DelayedCall.

Decorating coroutines by setting some flag looks good to me, but I
expect some problems with setting an extra attribute on objects like
staticmethod/classmethod.

Thanks, Andrew.

From g.rodola at  Mon Oct 29 19:08:45 2012
From: g.rodola at (=?ISO-8859-1?Q?Giampaolo_Rodol=E0?=)
Date: Mon, 29 Oct 2012 19:08:45 +0100
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

2012/10/29 Guido van Rossum <guido at>
> I'm most interested in feedback on the design of and
>, and to a lesser extent on the design of;
> is just an example of how this style works out in practice.

My comments follow.

=== About ===

1 - I think DelayedCall should have a reset() method, in addition to cancel().

2 - EventLoopMixin should have a call_every() method, in addition to
call_later().

3 - call_later() and call_every() should also take **kwargs, in addition to
*args.

4 - I think PollsterBase should provide a method to modify() the
events registered for a certain fd (both poll() and epoll() have such
a method and it's faster compared to un/registering a fd).

Feel free to take a look at my scheduler implementation which looks
quite similar to what you've done in

=== About ===

1 - In SocketTransport it seems there's no error handling provisioned
for send() and recv().
You should expect these errors
signaling disconnection plus EWOULDBLOCK and EAGAIN for "retry"

2 - SslTransport's send() and recv() methods should suffer the same problem.

3 - I don't fully understand how data transfer works exactly but keep
in mind that the transport should interact with the pollster.
What I mean is that generally speaking a connected socket should
*always* be readable ("r"), even when it's idle, then switch to "rw"
events when sending data, then get back to "r" when all the data has
been sent.
This is *crucial* if you want to achieve high performance/scalability
and that is why PollsterBase should probably provide a modify()
method.
Please take a look at what I've done here:

===  Other considerations ===

This 'yield' / 'yield from' approach is new to me (I'm more of a
"callback guy") so I can't say I fully understand what's going on just
by reading the code.
What I would like to see instead of is a bunch of code samples
/ demos showing how this library is supposed to be used in different
circumstances.
In detail, I'd like to see at least:

1 - a client example (connect(), send() a string, recv() a response, close())
2 - an echo server example (accept(), recv() a string, send() it back, close())
3 - how to use a different transport (e.g. UDP)?
4 - how to run long running tasks in a thread?

Also:

5 - is it possible to use multiple "reactors" in different threads?
How?  (asyncore for example achieves this by providing a separate
'map' argument for both the 'reactor' and the dispatchers)

I understand you just started with this so I'm probably asking too
much at this point in time.
Feel free to consider this a kind of a "long term review".

--- Giampaolo

From guido at  Mon Oct 29 19:10:42 2012
From: guido at (Guido van Rossum)
Date: Mon, 29 Oct 2012 11:10:42 -0700
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 29, 2012 at 11:02 AM, Andrew Svetlov
<andrew.svetlov at> wrote:
> Pollster has to support any object as file descriptor.
> The use case is ZeroMQ sockets: they are implemented at user level and
> socket is just some opaque structure wrapped by Python object.
> ZeroMQ has own poll function to process zmq sockets as well as regular
> sockets/pipes/files.

Good call! This seems to be an excellent use case to validate the
pollster design. Are you saying that the approach I used for
SslTransport doesn't work here? (I can believe it, I've never looked
at 0MQ, but I can't tell from your message.) The insistence on
isinstance(fd, int) is mostly there so that I don't accidentally
register a socket object *and* its file descriptor at the same time --
but there are other ways to ensure that. I've added a TODO item for
now.

> I would to see add_{reader,writer} and call_{soon,later} accepting
> **kwargs as well as *args. At least to respect functions with
> keyword-only arguments.

Hmm... I intentionally ruled those out because I wanted to leave the
door open for keyword args that modify the registration function
(add_reader etc.); it is awkward to require conventions like "your
function cannot have a keyword arg named X because we use that for our
own API" and it is even more awkward to have to retrofit new values of
X into that rule. Maybe we can come up with a simple wrapper.

> +1 for explicit passing loop instance and clearing role of DelayedCall.

Will do. (I think you meant clarifying?)

> Decorating coroutines with setting some flag looks good to me, but I
> expect some problems with setting extra attribute to objects like
> staticmethod/classmethod.

Noted.


--Guido van Rossum (

From guido at  Mon Oct 29 19:43:57 2012
From: guido at (Guido van Rossum)
Date: Mon, 29 Oct 2012 11:43:57 -0700
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 29, 2012 at 11:08 AM, Giampaolo Rodolà <g.rodola at> wrote:
> 2012/10/29 Guido van Rossum <guido at>
>> I'm most interested in feedback on the design of and
>>, and to a lesser extent on the design of;
>> is just an example of how this style works out in practice.
> Follows my comments.
> === About ===
> 1 - I think DelayedCall should have a reset() method, other than just cancel().

So, essentially an uncancel()? Why not just re-register in that case?
Or what's your use case? (Right now there's no problem in calling one
of these many times -- it's just that cancellation is permanent.)

> 2 - EventLoopMixin should have a call_every() method other than just
> call_later()

Arguably you can emulate that with a simple loop:

def call_every(secs, func, *args):
    while True:
        yield from scheduler.sleep(secs)
        func(*args)

(Flavor to taste to log exceptions, handle cancellation, automatically
spawn a separate task, etc.)
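
The same effect can also be built from call_later() alone by re-arming a
timer after each call. The sketch below uses the standard library's asyncio
event loop as a stand-in for the loop under review (an assumption), and the
`Periodic` class is hypothetical, not part of the code being reviewed.

```python
import asyncio


class Periodic:
    """Call func(*args) every `secs` seconds until cancel() is called."""

    def __init__(self, loop, secs, func, *args):
        self._loop = loop
        self._secs = secs
        self._func = func
        self._args = args
        self._cancelled = False
        self._handle = loop.call_later(secs, self._run)

    def _run(self):
        self._func(*self._args)
        # Re-arm unless func cancelled us during the call.
        if not self._cancelled:
            self._handle = self._loop.call_later(self._secs, self._run)

    def cancel(self):
        self._cancelled = True
        self._handle.cancel()


loop = asyncio.new_event_loop()
ticks = []
timer = Periodic(loop, 0.01, ticks.append, "tick")
loop.run_until_complete(asyncio.sleep(0.1))   # let the timer fire a few times
timer.cancel()
seen = len(ticks)
loop.run_until_complete(asyncio.sleep(0.05))  # no further calls after cancel()
loop.close()
```

In this sketch an exception raised by `func` stops the repetition, since the
timer is only re-armed after a successful call; logging and retry policy are
left to taste, as with the generator version.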

I can build lots of other useful things out of call_soon() and
call_later() -- but I do need at least those two as "axioms".

> 3 - call_later() and call_every() should also take **kwargs other than
> just *args

I just replied to that in a previous message; there's also a comment
in the code. How important is this really? Are there lots of use cases
that require you to pass keyword args? If it's only on occasion you
can use a lambda. (The *args is a compromise so we don't need a lambda
to wrap every callback. But I want to reserve keyword args for future
extensions to the registration functions.)
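
A short sketch of the occasional-lambda approach, again using asyncio's
call_soon() as a stand-in for the registration functions under review (an
assumption). `greet` is a hypothetical callback with a keyword-only argument;
`functools.partial` does the same job as the lambda.

```python
import asyncio
import functools

calls = []


def greet(name, *, punctuation="!"):
    calls.append(name + punctuation)


loop = asyncio.new_event_loop()
# Positional arguments pass straight through *args.
loop.call_soon(greet, "args")
# Keyword arguments are bound up front, so the loop only sees a
# zero-argument callable and its own keyword namespace stays free.
loop.call_soon(functools.partial(greet, "partial", punctuation="?"))
loop.call_soon(lambda: greet("lambda", punctuation="."))
loop.call_soon(loop.stop)
loop.run_forever()
loop.close()
```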

> 4 - I think PollsterBase should provide a method to modify() the
> events registered for a certain fd (both poll() and epoll() have such
> a method and it's faster compared to un/registering a fd).

Did you see the concrete implementations? Those where this matters
implicitly use modify() if the required flags change. I can imagine
more optimizations of the implementations (e.g. delaying
register()/modify() calls until poll() is actually called, to avoid
unnecessary churn) without making the API more complex.
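
A minimal sketch of that idea, assuming the standard library's `selectors`
module as the backend. The `Pollster` class and its method names here are
illustrative only and are not the implementation under review. It keeps one
event mask per fd and chooses register(), modify() or unregister() itself, so
callers never see modify().

```python
import selectors
import socket


class Pollster:
    """Track one event mask per fd; pick register/modify/unregister."""

    def __init__(self):
        self._selector = selectors.DefaultSelector()
        self._masks = {}  # fd -> currently registered event mask

    def _set_mask(self, fd, mask):
        old = self._masks.get(fd, 0)
        if mask == old:
            return                      # nothing changed: skip the selector
        if old == 0:
            self._selector.register(fd, mask)
        elif mask == 0:
            self._selector.unregister(fd)
        else:
            self._selector.modify(fd, mask)
        if mask:
            self._masks[fd] = mask
        else:
            del self._masks[fd]

    def add_reader(self, fd):
        self._set_mask(fd, self._masks.get(fd, 0) | selectors.EVENT_READ)

    def remove_reader(self, fd):
        self._set_mask(fd, self._masks.get(fd, 0) & ~selectors.EVENT_READ)

    def add_writer(self, fd):
        self._set_mask(fd, self._masks.get(fd, 0) | selectors.EVENT_WRITE)

    def remove_writer(self, fd):
        self._set_mask(fd, self._masks.get(fd, 0) & ~selectors.EVENT_WRITE)

    def poll(self, timeout=None):
        return [(key.fd, mask) for key, mask in self._selector.select(timeout)]

    def close(self):
        self._selector.close()


a, b = socket.socketpair()
pollster = Pollster()
fd = a.fileno()
pollster.add_reader(fd)             # register(): idle connection, "r"
idle = pollster.poll(0)
pollster.add_writer(fd)             # modify(): data queued for sending, "rw"
sending = pollster.poll(0)
pollster.remove_writer(fd)          # modify(): everything sent, back to "r"
b.send(b"ping")
receiving = pollster.poll(1.0)
pollster.remove_reader(fd)          # unregister()
pollster.close()
a.close()
b.close()
```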

> Feel free to take a look at my scheduler implementation which looks
> quite similar to what you've done in

Thanks, I had seen it previously, I think this also proves that
there's nothing particularly earth-shattering about this design. :-)
I'd love to copy some more of your tricks, e.g. the occasional
re-heapifying. (What usage pattern is this dealing with exactly?) I
should also check that I've taken care of all the various flags and
other details (I recall being quite surprised that with poll(), on
some platforms I need to check for POLLHUP but not on others).

> === About ===
> 1 - In SocketTransport it seems there's no error handling provisioned
> for send() and recv().
> You should expect these errors
> signaling disconnection plus EWOULDBLOCK and EAGAIN for "retry"

Right, I know I have been naive about these and have already got a TODO note.

> 2 - SslTransport's send() and recv() methods should suffer the same problem.

Ditto, Antoine told me.
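
A hedged sketch of the kind of error handling being asked for, on a plain
non-blocking socket. The helper `try_recv` is hypothetical, and the choice of
which exceptions count as "disconnection" is an assumption, since the exact
error list is not spelled out in the message above.

```python
import select
import socket


def try_recv(sock, nbytes=4096):
    """Return data, b"" if the peer is gone, or None to retry later."""
    try:
        return sock.recv(nbytes)        # b"" here means orderly shutdown
    except (BlockingIOError, InterruptedError):
        return None                     # EAGAIN / EWOULDBLOCK / EINTR: retry
    except (ConnectionResetError, ConnectionAbortedError, BrokenPipeError):
        return b""                      # treat as disconnection


left, right = socket.socketpair()
left.setblocking(False)
nothing_yet = try_recv(left)            # no data queued: caller should retry

right.sendall(b"hello")
select.select([left], [], [], 1.0)      # wait until the data is readable
payload = try_recv(left)

right.close()
select.select([left], [], [], 1.0)      # wait until EOF is visible
after_close = try_recv(left)
left.close()
```

A transport would map `None` to "wait for the next readiness event" and `b""`
to "close the connection"; send() needs the same treatment.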

> 3 - I don't fully understand how data transfer works exactly but keep
> in mind that the transport should interact with the pollster.
> What I mean is that generally speaking a connected socket should
> *always* be readable ("r"), even when it's idle, then switch to "rw"
> events when sending data, then get back to "r" when all the data has
> been sent.
> This is *crucial* if you want to achieve high performances/scalability
> and that is why PollsterBase should probably provide a modify()
> method.
> Please take a look at what I've done here:

Hm. I am not convinced that managing this explicitly from the
transport is the right solution (note that my transports are quite
different from those in Twisted). But I'll keep this in mind -- I
would like to set up a benchmark suite at some point. I will probably
have to implement the server side of HTTP for that purpose, so I can
point e.g. ab at my app.

> ===  Other considerations ===
> This 'yield' / 'yield from' approach is new to me (I'm more of a
> "callback guy") so I can't say I fully understand what's going on just
> by reading the code.

Fair enough. You should probably start by reading Greg Ewing's
tutorial -- it's short and sweet:

> What I would like to see instead of is a bunch of code samples
> / demos showing how this library is supposed to be used in different
> circumstances.

Agreed, more examples are needed.

> In details I'd like to see at least:
> 1 - a client example (connect(), send() a string, recv() a response, close())

Hm, that's all in urlfetch().

> 2 - an echo server example (accept(), recv() string,  send() it back(), close()

Yes, that's missing.

> 3 - how to use a different transport (e.g. UDP)?

I haven't looked into this yet. I expect I'll have to write a
different SocketTransport for this (the existing transports are
implicitly stream-oriented) but I know that the scheduler and
eventloop implementation can handle this fine.

> 4 - how to run long running tasks in a thread?

That's implemented. Check out call_in_thread(). Note that you can pass
it an alternate threadpool (executor).
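
call_in_thread() itself is not shown in this thread, so the sketch below uses
asyncio's `loop.run_in_executor()` as a stand-in (an assumption) to show the
same pattern: a blocking function runs in a thread pool while the event loop
stays free, and an alternate executor can be supplied. It uses async/await
rather than 'yield from'; `blocking_task` is a hypothetical workload.

```python
import asyncio
import concurrent.futures
import time


def blocking_task(n):
    time.sleep(0.01)    # stands in for a long-running, blocking call
    return n * 2


async def main():
    loop = asyncio.get_running_loop()
    # Passing None selects the loop's default thread pool.
    first = await loop.run_in_executor(None, blocking_task, 1)
    # An explicit pool plays the role of the "alternate threadpool".
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        second = await loop.run_in_executor(pool, blocking_task, 21)
    return first, second


results = asyncio.run(main())
```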

> Also:
> 5 - is it possible to use multiple "reactors" in different threads?

Should be possible.

> How?  (asyncore for example achieves this by providing a separate
> 'map' argument for both the 'reactor' and the dispatchers)

It works by making the Context class use thread-local storage (TLS).
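
A minimal sketch of a thread-local context, assuming `threading.local` as
the TLS mechanism. The `EventLoop` class is a hypothetical placeholder and
this `Context` is not the class under review; it only shows how each thread
ends up with its own loop.

```python
import threading


class EventLoop:
    """Hypothetical stand-in for a real event loop."""


class Context(threading.local):
    """Each thread that touches the context gets its own event loop."""

    def __init__(self):
        # threading.local runs __init__ once per thread, on first access.
        self.eventloop = EventLoop()


context = Context()
loops = {}


def worker(name):
    loops[name] = context.eventloop


threads = [threading.Thread(target=worker, args=("t%d" % i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
loops["main"] = context.eventloop
```

All three entries in `loops` are distinct objects, while repeated access from
one thread keeps returning that thread's own loop.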

> I understand you just started with this so I'm probably asking too
> much at this point in time.
> Feel free to consider this a kind of a "long term review".

You have asked many useful questions already. Since you have
implemented a real-world I/O loop yourself, your input is extremely
valuable. Thanks, and keep at it!

--Guido van Rossum (

From at  Mon Oct 29 20:10:17 2012
From: at (Yury Selivanov)
Date: Mon, 29 Oct 2012 15:10:17 -0400
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-29, at 2:02 PM, Andrew Svetlov <andrew.svetlov at> wrote:

> Pollster has to support any object as file descriptor.
> The use case is ZeroMQ sockets: they are implemented at user level and
> socket is just some opaque structure wrapped by Python object.
> ZeroMQ has own poll function to process zmq sockets as well as regular
> sockets/pipes/files.

Well, you can use epoll/select/kqueue or whatever else with ZMQ sockets.
Just get the underlying file descriptor with 'getsockopt', as described
here:

For instance, here are the stripped-down zmq support classes I have in my
framework:

  class Socket(_zmq_Socket):
      def __init__(self, *args, **kwargs):
          super().__init__(*args, **kwargs)
          self.fileno = self.getsockopt(FD)

      ...

      #coroutine
      def send(self, data, *, flags=0, copy=True, track=False):
          flags |= NOBLOCK

          try:
              result = _zmq_Socket.send(self, data, flags, copy, track)
          except ZMQError as e:
              if e.errno != EAGAIN:
                  raise
              self._sending = (Promise(), data, flags, copy, track)
              self._scheduler.proactor._schedule_write(self)
              return self._sending[0]
          else:
              p = Promise()
              p.send(result)
              return p

      ...

  class Context(_zmq_Context):
      _socket_class = Socket

And '_schedule_write' accepts any object with 'fileno' property, and
uses an appropriate polling mechanism to poll.

So to use a non-blocking ZMQ sockets, you simply do:

    context = Context()
    socket = context.socket(zmq.REP)
    ...
    yield socket.send(message)

From andrew.svetlov at  Mon Oct 29 20:24:25 2012
From: andrew.svetlov at (Andrew Svetlov)
Date: Mon, 29 Oct 2012 21:24:25 +0200
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 29, 2012 at 8:10 PM, Guido van Rossum <guido at> wrote:
> On Mon, Oct 29, 2012 at 11:02 AM, Andrew Svetlov
> <andrew.svetlov at> wrote:
>> Pollster has to support any object as file descriptor.
>> The use case is ZeroMQ sockets: they are implemented at user level and
>> socket is just some opaque structure wrapped by Python object.
>> ZeroMQ has own poll function to process zmq sockets as well as regular
>> sockets/pipes/files.
> Good call! This seem to be an excellent use case to validate the
> pollster design. Are you saying that the approach I used for
> SslTransport doesn't work here? (I can believe it, I've never looked
> at 0MQ, but I can't tell from your message.) The insistence on
> isinstance(fd, int) is mostly there so that I don't accidentally
> register a socket object *and* its file descriptor at the same time --
> but there are other ways to ensure that. I've added a TODO item for
> now.
A 0MQ socket has no file descriptor at all; it's just a pointer to some
unspecified structure.
So 0MQ has its own *poll* function which can process those sockets as well
as file descriptors.
Its interface mimics the poll object from the Python stdlib.
You can see
as example.
For 0MQ support tulip has to have yet another reactor implementation
in line with select, epoll, kqueue etc.
Not a big deal, but it would be nice if PollsterBase did not assume the
registered object is always an int file descriptor.

>> I would to see add_{reader,writer} and call_{soon,later} accepting
>> **kwargs as well as *args. At least to respect functions with
>> keyword-only arguments.
> Hmm... I intentionally ruled those out because I wanted to leave the
> door open for keyword args that modify the registration function
> (add_reader etc.); it is awkward to require conventions like "your
> function cannot have a keyword arg named X because we use that for our
> own API" and it is even more awkward to have to retrofit new values of
> X into that rule. Maybe we can come up with a simple wrapper.

It can be solved easily by using names like __when, __callback etc.
Those names will never clash with user-provided kwargs, I believe.

>> +1 for explicit passing loop instance and clearing role of DelayedCall.
> Will do. (I think you meant clarifying?)
Exactly. Thanks.

>> Decorating coroutines with setting some flag looks good to me, but I
>> expect some problems with setting extra attribute to objects like
>> staticmethod/classmethod.
> Noted.
> --
> --Guido van Rossum (

Thank you, Andrew Svetlov

From andrew.svetlov at  Mon Oct 29 20:32:41 2012
From: andrew.svetlov at (Andrew Svetlov)
Date: Mon, 29 Oct 2012 21:32:41 +0200
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 29, 2012 at 9:10 PM, Yury Selivanov < at> wrote:
> On 2012-10-29, at 2:02 PM, Andrew Svetlov <andrew.svetlov at> wrote:
>> Pollster has to support any object as file descriptor.
>> The use case is ZeroMQ sockets: they are implemented at user level and
>> socket is just some opaque structure wrapped by Python object.
>> ZeroMQ has own poll function to process zmq sockets as well as regular
>> sockets/pipes/files.
> Well, you can use epoll/select/kqueue or whatever else with ZMQ sockets.
> Just get the underlying file descriptor with 'getsockopt', as described
> here:

Well, I will take a look. I have used zmq poll only.
It works for reading only, not for writing, right?
As far as I know, you use the proactor pattern.
Can a reactor have problems with this approach?
Might the embedded 0MQ poll be more efficient thanks to some internal
optimizations?

> For instance, here is a stripped out zmq support classes I have in my
> framework:
>   class Socket(_zmq_Socket):
>       def __init__(self, *args, **kwargs):
>           super().__init__(*args, **kwargs)
>           self.fileno = self.getsockopt(FD)
>       ...
>       #coroutine
>       def send(self, data, *, flags=0, copy=True, track=False):
>           flags |= NOBLOCK
>           try:
>               result = _zmq_Socket.send(self, data, flags, copy, track)
>           except ZMQError as e:
>               if e.errno != EAGAIN:
>                   raise
>               self._sending = (Promise(), data, flags, copy, track)
>               self._scheduler.proactor._schedule_write(self)
>               return self._sending[0]
>           else:
>               p = Promise()
>               p.send(result)
>               return p
>       ...
>   class Context(_zmq_Context):
>       _socket_class = Socket
> And '_schedule_write' accepts any object with 'fileno' property, and
> uses an appropriate polling mechanism to poll.
> So to use a non-blocking ZMQ sockets, you simply do:
>     context = Context()
>     socket = context.socket(zmq.REP)
>     ...
>     yield socket.send(message)

Andrew Svetlov

From guido at  Mon Oct 29 20:54:24 2012
From: guido at (Guido van Rossum)
Date: Mon, 29 Oct 2012 12:54:24 -0700
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 29, 2012 at 12:24 PM, Andrew Svetlov
<andrew.svetlov at> wrote:
> On Mon, Oct 29, 2012 at 8:10 PM, Guido van Rossum <guido at> wrote:
>>> I would to see add_{reader,writer} and call_{soon,later} accepting
>>> **kwargs as well as *args. At least to respect functions with
>>> keyword-only arguments.
>> Hmm... I intentionally ruled those out because I wanted to leave the
>> door open for keyword args that modify the registration function
>> (add_reader etc.); it is awkward to require conventions like "your
>> function cannot have a keyword arg named X because we use that for our
>> own API" and it is even more awkward to have to retrofit new values of
>> X into that rule. Maybe we can come up with a simple wrapper.
> It can be solved easy with using names like __when, __callback etc.
> That names will never clutter with user provided kwargs I believe.

No, those names have different meaning inside a class (they would be
transformed into _<class>__when, where <class> is the name of the
*current* class textually enclosing the use). I am not closing the
door on this one but I'd have to see a lot more evidence that this
issue is widespread.
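
The mangling is easy to observe. In the sketch below, `EventLoop` and both
`call_later` functions are hypothetical; the point is only how Python rewrites
double-underscore parameter names inside a class body.

```python
import inspect


class EventLoop:
    def call_later(self, __when, __callback, *args, **kwargs):
        # Inside a class body these parameter names are mangled.
        return __when, __callback, args, kwargs


def call_later(__when, __callback, *args, **kwargs):
    # At module level the names are left alone.
    return __when, __callback, args, kwargs


method_params = list(inspect.signature(EventLoop.call_later).parameters)
function_params = list(inspect.signature(call_later).parameters)

loop = EventLoop()
# A user kwarg spelled "__when" lands in **kwargs without a clash...
ok = loop.call_later(1.0, print, __when="user value")
# ...but the mangled spelling collides with the first parameter.
try:
    loop.call_later(1.0, print, _EventLoop__when="user value")
    clash = False
except TypeError:
    clash = True
```

So the method's real parameter names are `_EventLoop__when` and
`_EventLoop__callback`: a clash with user kwargs becomes unlikely but not
impossible, and the names depend on the class in which the method is written.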

--Guido van Rossum (

From at  Mon Oct 29 20:57:26 2012
From: at (Yury Selivanov)
Date: Mon, 29 Oct 2012 15:57:26 -0400
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-29, at 3:32 PM, Andrew Svetlov <andrew.svetlov at> wrote:

> On Mon, Oct 29, 2012 at 9:10 PM, Yury Selivanov < at> wrote:
>> On 2012-10-29, at 2:02 PM, Andrew Svetlov <andrew.svetlov at> wrote:
>>> Pollster has to support any object as file descriptor.
>>> The use case is ZeroMQ sockets: they are implemented at user level and
>>> socket is just some opaque structure wrapped by Python object.
>>> ZeroMQ has own poll function to process zmq sockets as well as regular
>>> sockets/pipes/files.
>> Well, you can use epoll/select/kqueue or whatever else with ZMQ sockets.
>> Just get the underlying file descriptor with 'getsockopt', as described
>> here:
> Well, will take a look. I used zmq poll only.
> It works for reading only, not for writing, right?
> As I know you use proactor pattern.
> Can reactor has some problems with this approach?
> May embedded 0MQ poll be more effective via some internal optimizations?

It's an officially documented and supported approach.  We haven't seen any
problem with it so far.

It works both for reading and writing; however, 99.9% of EAGAIN errors occur
on reading.  When you 'send', it just stores your data in an internal
buffer and sends it itself.  When you 'read', well, if there is no data
in the buffers you get EAGAIN.

As for the performance -- I haven't tested 'zmq.poll' vs (let's say) epoll,
but I doubt there is any significant difference.  And if I wanted to
write a benchmark, I'd first compare pure blocking ZMQ sockets vs
non-blocking ZMQ sockets with zmq.poll, as ZMQ uses threads heavily, and
blocking thread-driven IO is probably faster than non-blocking IO with
polling (when the FD count is relatively small), no matter whether you use
zmq.poll or epoll/etc.


From andrew.svetlov at  Mon Oct 29 20:58:35 2012
From: andrew.svetlov at (Andrew Svetlov)
Date: Mon, 29 Oct 2012 21:58:35 +0200
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 29, 2012 at 8:43 PM, Guido van Rossum <guido at> wrote:
> On Mon, Oct 29, 2012 at 11:08 AM, Giampaolo Rodolà <g.rodola at> wrote:
>> 2012/10/29 Guido van Rossum <guido at>
>>> I'm most interested in feedback on the design of and
>>>, and to a lesser extent on the design of;
>>> is just an example of how this style works out in practice.
>> Follows my comments.
>> === About ===
>> 1 - I think DelayedCall should have a reset() method, other than just cancel().
> So, essentially an uncancel()? Why not just re-register in that case?
> Or what's your use case? (Right now there's no problem in calling one
> of these many times -- it's just that cancellation is permanent.)
>> 2 - EventLoopMixin should have a call_every() method other than just
>> call_later()
> Arguably you can emulate that with a simple loop:
> def call_every(secs, func, *args):
>     while True:
>         yield from scheduler.sleep(secs)
>         func(*args)
> (Flavor to taste to log exceptions, handle cancellation, automatically
> spawn a separate task, etc.)
> I can build lots of other useful things out of call_soon() and
> call_later() -- but I do need at least those two as "axioms".
>> 3 - call_later() and call_every() should also take **kwargs other than
>> just *args
> I just replied to that in a previous message; there's also a comment
> in the code. How important is this really? Are there lots of use cases
> that require you to pass keyword args? If it's only on occasion you
> can use a lambda. (The *args is a compromise so we don't need a lambda
> to wrap every callback. But I want to reserve keyword args for future
> extensions to the registration functions.)

Well, using keyword-only arguments for passing flags is a good point.
I can live with *args only. Maybe using **kwargs for the call_later family
only is a good compromise?
I really don't care about add_reader/add_writer; those functions are
intended for library writers.
call_later and call_soon can be used in user code often enough that
passing keyword arguments would be convenient.

>> 4 - I think PollsterBase should provide a method to modify() the
>> events registered for a certain fd (both poll() and epoll() have such
>> a method and it's faster compared to un/registering a fd).
> Did you see the concrete implementations? Those where this matters
> implicitly uses modify() if the required flags change. I can imagine
> more optimizations of the implementations (e.g. delaying
> register()/modify() calls until poll() is actually called, to avoid
> unnecessary churn) without making the API more complex.
>> Feel free to take a look at my scheduler implementation which looks
>> quite similar to what you've done in
> Thanks, I had seen it previously, I think this also proves that
> there's nothing particularly earth-shattering about this design. :-)
> I'd love to copy some more of your tricks, e.g. the occasional
> re-heapifying. (What usage pattern is this dealing with exactly?) I
> should also check that I've taken care of all the various flags and
> other details (I recall being quite surprised that with poll(), on
> some platforms I need to check for POLLHUP but not on others).
>> === About ===
>> 1 - In SocketTransport it seems there's no error handling provisioned
>> for send() and recv().
>> You should expect these errors
>> signaling disconnection plus EWOULDBLOCK and EAGAIN for "retry"
> Right, I know have been naive about these and have already got a TODO note.
>> 2 - SslTransport's send() and recv() methods should suffer the same problem.
> Ditto, Antoine told me.
>> 3 - I don't fully understand how data transfer works exactly but keep
>> in mind that the transport should interact with the pollster.
>> What I mean is that generally speaking a connected socket should
>> *always* be readable ("r"), even when it's idle, then switch to "rw"
>> events when sending data, then get back to "r" when all the data has
>> been sent.
>> This is *crucial* if you want to achieve high performances/scalability
>> and that is why PollsterBase should probably provide a modify()
>> method.
>> Please take a look at what I've done here:
> Hm. I am not convinced that managing this explicitly from the
> transport is the right solution (note that my transports are quite
> different from those in Twisted). But I'll keep this in mind -- I
> would like to set up a benchmark suite at some point. I will probably
> have to implement the server side of HTTP for that purpose, so I can
> point e.g. ab at my app.
>> ===  Other considerations ===
>> This 'yield' / 'yield from' approach is new to me (I'm more of a
>> "callback guy") so I can't say I fully understand what's going on just
>> by reading the code.
> Fair enough. You should probably start by reading Greg Ewing's
> tutorial -- it's short and sweet:
>> What I would like to see instead of is a bunch of code samples
>> / demos showing how this library is supposed to be used in different
>> circumstances.
> Agreed, more examples are needed.
>> In details I'd like to see at least:
>> 1 - a client example (connect(), send() a string, recv() a response, close())
> Hm, that's all in urlfetch().
>> 2 - an echo server example (accept(), recv() string,  send() it back(), close()
> Yes, that's missing.
>> 3 - how to use a different transport (e.g. UDP)?
> I haven't looked into this yet. I expect I'll have to write a
> different SocketTransport for this (the existing transports are
> implicitly stream-oriented) but I know that the scheduler and
> eventloop implementation can handle this fine.
>> 4 - how to run long running tasks in a thread?
> That's implemented. Check out call_in_thread(). Note that you can pass
> it an alternate threadpool (executor).
>> Also:
>> 5 - is it possible to use multiple "reactors" in different threads?
> Should be possible.
>> How?  (asyncore for example achieves this by providing a separate
>> 'map' argument for both the 'reactor' and the dispatchers)
> It works by making the Context class use thread-local storage (TLS).
>> I understand you just started with this so I'm probably asking too
>> much at this point in time.
>> Feel free to consider this a kind of a "long term review".
> You have asked many useful questions already. Since you have
> implemented a real-world I/O loop yourself, your input is extremely
> valuable. Thanks, and keep at it!
> --
> --Guido van Rossum (

Andrew Svetlov

From andrew.svetlov at  Mon Oct 29 21:03:12 2012
From: andrew.svetlov at (Andrew Svetlov)
Date: Mon, 29 Oct 2012 22:03:12 +0200
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

I mean just something like:

    def call_soon(__self, __callback, *__args, **__kwargs):
        dcall = DelayedCall(None, __callback, __args, __kwargs)
        return dcall

Not a big deal, though. We can delay this discussion for later.

On Mon, Oct 29, 2012 at 9:54 PM, Guido van Rossum <guido at> wrote:
> On Mon, Oct 29, 2012 at 12:24 PM, Andrew Svetlov
> <andrew.svetlov at> wrote:
>> On Mon, Oct 29, 2012 at 8:10 PM, Guido van Rossum <guido at> wrote:
> [Andrew]
>>>> I would to see add_{reader,writer} and call_{soon,later} accepting
>>>> **kwargs as well as *args. At least to respect functions with
>>>> keyword-only arguments.
>>> Hmm... I intentionally ruled those out because I wanted to leave the
>>> door open for keyword args that modify the registration function
>>> (add_reader etc.); it is awkward to require conventions like "your
>>> function cannot have a keyword arg named X because we use that for our
>>> own API" and it is even more awkward to have to retrofit new values of
>>> X into that rule. Maybe we can come up with a simple wrapper.
>> It can be solved easy with using names like __when, __callback etc.
>> That names will never clutter with user provided kwargs I believe.
> No, those names have different meaning inside a class (they would be
> transformed into _<class>__when, where <class> is the name of the
> *current* class textually enclosing the use). I am not closing the
> door on this one but I'd have to see a lot more evidence that this
> issue is widespread.
> --
> --Guido van Rossum (

Andrew Svetlov

From g.rodola at  Mon Oct 29 22:20:44 2012
From: g.rodola at (=?ISO-8859-1?Q?Giampaolo_Rodol=E0?=)
Date: Mon, 29 Oct 2012 22:20:44 +0100
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

2012/10/29 Guido van Rossum <guido at>:
> On Mon, Oct 29, 2012 at 11:08 AM, Giampaolo Rodolà <g.rodola at> wrote:
>> 2012/10/29 Guido van Rossum <guido at>
>> === About ===
>> 1 - I think DelayedCall should have a reset() method, other than just cancel().
> So, essentially an uncancel()? Why not just re-register in that case?
> Or what's your use case? (Right now there's no problem in calling one
> of these many times -- it's just that cancellation is permanent.)

The most common use case is when you want to disconnect the other peer
after a certain time of inactivity.
Ideally what you would do is schedule() an idle/timeout function and
reset() it every time the other peer sends you some data.
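
A sketch of that idle-timeout pattern, built by cancelling and re-registering
a delayed call. It uses asyncio's `loop.call_later()` as a stand-in for the
loop under review (an assumption), and the `IdleTimer` class is hypothetical.

```python
import asyncio


class IdleTimer:
    """Run on_idle() once `timeout` seconds pass without a reset()."""

    def __init__(self, loop, timeout, on_idle):
        self._loop = loop
        self._timeout = timeout
        self._on_idle = on_idle
        self._handle = loop.call_later(timeout, on_idle)

    def reset(self):
        """Push the deadline back by cancelling and re-registering."""
        self._handle.cancel()
        self._handle = self._loop.call_later(self._timeout, self._on_idle)

    def cancel(self):
        self._handle.cancel()


loop = asyncio.new_event_loop()
fired = []
timer = IdleTimer(loop, 0.3, lambda: fired.append("idle"))
loop.call_later(0.15, timer.reset)            # peer "sends data" at t=0.15
loop.run_until_complete(asyncio.sleep(0.35))  # original deadline has passed
fired_early = list(fired)
loop.run_until_complete(asyncio.sleep(0.3))   # new deadline (t=0.45) passes
loop.close()
```

A reset() method on the delayed call itself would save the caller from
keeping the timeout and callback around, which is the convenience being
requested.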

>> 2 - EventLoopMixin should have a call_every() method other than just
>> call_later()
> Arguably you can emulate that with a simple loop:
> def call_every(secs, func, *args):
>     while True:
>         yield from scheduler.sleep(secs)
>         func(*args)
> (Flavor to taste to log exceptions, handle cancellation, automatically
> spawn a separate task, etc.)
> I can build lots of other useful things out of call_soon() and
> call_later() -- but I do need at least those two as "axioms".


>> 3 - call_later() and call_every() should also take **kwargs other than
>> just *args
> I just replied to that in a previous message; there's also a comment
> in the code. How important is this really? Are there lots of use cases
> that require you to pass keyword args? If it's only on occasion you
> can use a lambda. (The *args is a compromise so we don't need a lambda
> to wrap every callback. But I want to reserve keyword args for future
> extensions to the registration functions.)

It's not crucial to have kwargs, just nice, but I understand your
motives to rule them out, in fact I reserved two kwarg names
('_errback' and '_scheduler') for the same reason.
In my experience I learned that passing an extra error handler
function (what Twisted calls 'errback') can be desirable, so that's
another thing you might want to consider.
In my scheduler implementation I achieved that by passing an _errback
keyword parameter, like this:

>>> ioloop.call_later(30, callback, _errback=err_callback)

Not very nice to use a reserved keyword, I agree.
Perhaps you can keep ruling out kwargs for the callback function
and change the current call_later signature like this:

-    def call_later(self, when, callback, *args):
+    def call_later(self, when, callback, *args, errback=None):

...or maybe provide a DelayedCall.add_errback() method a-la Twisted.
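
For comparison, the same effect can be had without touching the call_later() signature by wrapping the callback before registering it. This is only a sketch; with_errback() is a hypothetical helper name, and it handles exceptions raised synchronously by the callback, nothing more:

```python
def with_errback(callback, errback):
    """Hypothetical wrapper: route exceptions from `callback` to `errback`."""

    def wrapper(*args):
        try:
            return callback(*args)
        except Exception as exc:
            return errback(exc)

    return wrapper


errors = []


def flaky(x):
    return 10 / x


safe = with_errback(flaky, errors.append)
ok = safe(2)      # normal path: the callback's result is returned
failed = safe(0)  # error path: the exception is handed to the errback
```

The wrapped function could then be passed to call_later() like any other callback.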

> Thanks, I had seen it previously, I think this also proves that
> there's nothing particularly earth-shattering about this design. :-)
> I'd love to copy some more of your tricks,

Sure, go on. It's MIT licensed code.

> e.g. the occasional re-heapifying. (What usage pattern is this
> dealing with exactly?)

It's intended to avoid making the list grow with too many cancelled functions.
Imagine this use case:

WEEK = 60 * 60 * 24 * 7
for x in range(1000000):
    f = call_later(WEEK, fun)
    f.cancel()

You'll end up having a heap with millions of cancelled items which will
be freed after a week.
Instead you can keep track of the number of cancelled functions every
time cancel() is called and re-heapify the list when that number gets
too high:
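
A minimal sketch of that bookkeeping, using heapq from the standard library. The Scheduler and DelayedCall classes here are hypothetical stand-ins, and the threshold of 512 together with the "more than half the heap" test are assumptions chosen for illustration, not values taken from any existing implementation:

```python
import heapq
import itertools


class DelayedCall:
    def __init__(self, when, func, scheduler):
        self.when = when
        self.func = func
        self.cancelled = False
        self._scheduler = scheduler

    def cancel(self):
        if not self.cancelled:
            self.cancelled = True
            self._scheduler._note_cancel()


class Scheduler:
    """Hypothetical timer heap that compacts itself when cancellations pile up."""

    def __init__(self, max_cancelled=512):
        self._heap = []                # entries are (when, seq, call)
        self._seq = itertools.count()  # tie-breaker so calls are never compared
        self._cancelled = 0
        self._max_cancelled = max_cancelled

    def call_later(self, when, func):
        call = DelayedCall(when, func, self)
        heapq.heappush(self._heap, (when, next(self._seq), call))
        return call

    def _note_cancel(self):
        self._cancelled += 1
        # Rebuild only when cancelled entries are numerous and dominate the heap.
        if (self._cancelled > self._max_cancelled
                and self._cancelled > len(self._heap) // 2):
            self._heap = [e for e in self._heap if not e[2].cancelled]
            heapq.heapify(self._heap)
            self._cancelled = 0

    def __len__(self):
        return len(self._heap)


WEEK = 60 * 60 * 24 * 7
sched = Scheduler(max_cancelled=512)
for _ in range(100000):
    sched.call_later(WEEK, print).cancel()
heap_size = len(sched)
```

A complete scheduler would also decrement the counter whenever it pops a cancelled entry off the heap; that part is left out here.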

> should also check that I've taken care of all the various flags and
> other details (I recall being quite surprised that with poll(), on
> some platforms I need to check for POLLHUP but not on others).

Yeah, that's a painful part.
Try to look here:
Instead of handle_close()ing you should add the fd to the list of
readable ones ("r").
The call to recv() which will be coming next will then cause the
socket to close (you have to add the error handling to recv() first
though).

>> 3 - I don't fully understand how data transfer works exactly but keep
>> in mind that the transport should interact with the pollster.
>> What I mean is that generally speaking a connected socket should
>> *always* be readable ("r"), even when it's idle, then switch to "rw"
>> events when sending data, then get back to "r" when all the data has
>> been sent.
>> This is *crucial* if you want to achieve high performances/scalability
>> and that is why PollsterBase should probably provide a modify()
>> method.
>> Please take a look at what I've done here:
> Hm. I am not convinced that managing this explicitly from the
> transport is the right solution (note that my transports are quite
> different from those in Twisted). But I'll keep this in mind -- I
> would like to set up a benchmark suite at some point. I will probably
> have to implement the server side of HTTP for that purpose, so I can
> point e.g. ab at my app.

I think you might want to apply that to something slightly higher
level than the mere transport.
Something like the equivalent of asynchat.push /
asynchat.push_with_producer, if you'll ever want to go that far in
terms of abstraction, or maybe avoid that at all but make it clear in
the doc that the user should take care of that.
My point is that having a socket registered for both "r" AND "w"
events when in fact you want only "r" OR "w" is an exponential waste
of CPU cycles and it should be avoided either by the lib or by the
user.

"old select() implementation" vs "new select() implementation"
benchmark shown here reflects exactly this problem which still affects
base asyncore module:
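
The "r" / "rw" / "r" switching described above can be sketched with the standard selectors module, whose modify() plays the role of the proposed PollsterBase.modify(). The socket pair and the payload are assumptions made up for the example:

```python
import selectors
import socket

sel = selectors.DefaultSelector()
a, b = socket.socketpair()
a.setblocking(False)

# Idle connection: interested in reads only.
sel.register(a, selectors.EVENT_READ)
idle_events = sel.select(timeout=0)  # nothing to read, so no wakeup

outgoing = b"ping"
# Data is queued: additionally ask to be told when the socket is writable.
sel.modify(a, selectors.EVENT_READ | selectors.EVENT_WRITE)
for key, mask in sel.select(timeout=1):
    if mask & selectors.EVENT_WRITE:
        sent = key.fileobj.send(outgoing)
        outgoing = outgoing[sent:]

if not outgoing:
    # Everything has been flushed: go back to read-only interest.
    sel.modify(a, selectors.EVENT_READ)

after_flush_events = sel.select(timeout=0)
received = b.recv(16)

sel.unregister(a)
a.close()
b.close()
sel.close()
```

Had the socket stayed registered for writing, every later select() call would return immediately because an idle socket is almost always writable.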

I'll keep following the progress on this and hopefully come up with
another set of questions and/or random thoughts.

--- Giampaolo

From solipsis at  Mon Oct 29 22:25:41 2012
From: solipsis at (Antoine Pitrou)
Date: Mon, 29 Oct 2012 22:25:41 +0100
Subject: [Python-ideas] non-blocking buffered I/O
References: <>
Message-ID: <>

On Mon, 29 Oct 2012 10:03:00 -0700
Guido van Rossum <guido at> wrote:
> >> Then there is a
> >> BufferedReader class that implements more traditional read() and
> >> readline() coroutines (i.e., to be invoked using yield from), the
> >> latter handy for line-oriented transports.
> >
> > Well... It would be nice if BufferedReader could re-use the actual
> > io.BufferedReader and its fast readline(), read(), readinto()
> > implementations.
> Agreed, I would love that too, but the problem is, *this*
> BufferedReader defines methods you have to invoke with yield from.
> Maybe we can come up with a solution for sharing code by modifying the
> _io module though; that would be great! (I've also been thinking of
> layering TextIOWrapper on top of these.)

There is a rather infamous issue about _io.BufferedReader and
non-blocking I/O at
It is a bit problematic because currently non-blocking readline()
returns '' instead of None when no data is available, meaning EOF can't
be easily detected :(

Once this issue is solved, you could use _io.BufferedReader, and
work around the "partial read/readline result" issue by iterating
(hopefully in most cases there is enough data in the buffer to 
return a complete read or readline, so the C optimizations are useful).
Here is how it may work:

def __init__(self, fd):
    self.fd = fd
    self.bufio = _io.BufferedReader(...)

def readline(self):
    chunks = []
    while True:
        line = self.bufio.readline()
        if line is not None:
            chunks.append(line)
            if line == b'' or line.endswith(b'\n'):
                # EOF or EOL
                return b''.join(chunks)
        yield from scheduler.block_r(self.fd)

def read(self, n):
    chunks = []
    bytes_read = 0
    while True:
        data = self.bufio.read(n - bytes_read)
        if data is not None:
            chunks.append(data)
            bytes_read += len(data)
            if data == b'' or bytes_read == n:
                # EOF or read satisfied
                break
        yield from scheduler.block_r(self.fd)
    return b''.join(chunks)

As for TextIOWrapper, AFAIR it doesn't handle non-blocking I/O at all
(but my memories are vague).

By the way I don't know how this whole approach (of mocking socket-like
or file-like objects with coroutine-y read() / readline() methods)
lends itself to plugging into Windows' IOCP. You may rely on some raw
I/O object that registers a callback when a read() is requested and
then yields a Future object that gets completed by the callback.
I'm sure Richard has some ideas about that :-)



From guido at  Mon Oct 29 23:03:07 2012
From: guido at (Guido van Rossum)
Date: Mon, 29 Oct 2012 15:03:07 -0700
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 29, 2012 at 2:20 PM, Giampaolo Rodolà <g.rodola at> wrote:
> 2012/10/29 Guido van Rossum <guido at>:
>> On Mon, Oct 29, 2012 at 11:08 AM, Giampaolo Rodolà <g.rodola at> wrote:
>>> 2012/10/29 Guido van Rossum <guido at>
>>> === About ===
>>> 1 - I think DelayedCall should have a reset() method, other than just cancel().
>> So, essentially an uncancel()? Why not just re-register in that case?
>> Or what's your use case? (Right now there's no problem in calling one
>> of these many times -- it's just that cancellation is permanent.)
> The most common use case is when you want to disconnect the other peer
> after a certain time of inactivity.
> Ideally what you would do is schedule() an idle/timeout function and
> reset() it every time the other peer sends you some data.

Um, ok, I think you are saying that you want to be able to set
timeouts and then "reset" that timeout. This is a much higher-level
thing than canceling the DelayedCall object. (I have no desire to make
DelayedCall have functionality like Twisted's Deferred. It is
something *much* simpler; it's just the API for cancelling a callback
passed to call_later(), and its other uses are similar to this.)

> Not very nice to use a reserved keyword, I agree.
> Perhaps you can keep ruling out kwargs referred to the callback
> function and change the current call_later signature as such:
> -    def call_later(self, when, callback, *args):
> +    def call_later(self, when, callback, *args, errback=None):
> ...or maybe provide a DelayedCall.add_errback() method a-la Twisted.

I really don't want that though! But I'm glad you're not too hell-bent
on supporting callbacks with keyword-only args.

>> should also check that I've taken care of all the various flags and
>> other details (I recall being quite surprised that with poll(), on
>> some platforms I need to check for POLLHUP but not on others).
> Yeah, that's a painful part.
> Try to look here:
> Instead of handle_close()ing you should add the fd to the list of
> readable ones ("r").
> The call to recv() which will be coming next will then cause the
> socket to close (you have to add the error handling to recv() first
> though).

Aha, are you suggesting that I close the socket when I detect that the
socket is closed? But what if the other side uses shutdown() to close
only one end? Depending on the protocol it might be useful to either
stop reading but keep sending, or vice versa. Maybe I could detect
that both ends are closed and then close the socket. Or are you
suggesting something else?

>>> 3 - I don't fully understand how data transfer works exactly but keep
>>> in mind that the transport should interact with the pollster.
>>> What I mean is that generally speaking a connected socket should
>>> *always* be readable ("r"), even when it's idle, then switch to "rw"
>>> events when sending data, then get back to "r" when all the data has
>>> been sent.
>>> This is *crucial* if you want to achieve high performances/scalability
>>> and that is why PollsterBase should probably provide a modify()
>>> method.
>>> Please take a look at what I've done here:
>> Hm. I am not convinced that managing this explicitly from the
>> transport is the right solution (note that my transports are quite
>> different from those in Twisted). But I'll keep this in mind -- I
>> would like to set up a benchmark suite at some point. I will probably
>> have to implement the server side of HTTP for that purpose, so I can
>> point e.g. ab at my app.
> I think you might want to apply that to something slightly higher
> level than the mere transport.

(Apply *what*?)

> Something like the equivalent of asynchat.push /
> asynchat.push_with_producer, if you'll ever want to go that far in
> terms of abstraction, or maybe avoid that at all but make it clear in
> the doc that the user should take care of that.

I'm actually not sufficiently familiar with asynchat to comment. I
think it's got quite a different model than what I am trying to set up

> My point is that having a socket registered for both "r" AND "w"
> events when in fact you want only "r" OR "w" is an exponential waste
> of CPU cycles and it should be avoided either by the lib or by the
> user.

One task can only be blocked for reading OR writing. The only way to
have a socket registered for both is if there are separate tasks for
reading and writing, and then presumably that is what you want. (I
have a feeling you haven't fully grokked my HTTP client code yet?)

> "old select() implementation" vs "new select() implementation"
> benchmark shown here reflects exactly this problem which still affects
> base asyncore module:

Hm, I am already using epoll or kqueue if available, otherwise poll,
falling back to select only if there's nothing else available (in
practice that's only Windows).

But I will diligently work towards a benchmarkable demo.

> I'll keep following the progress on this and hopefully come up with
> another set of questions and/or random thoughts.

Thanks!


--Guido van Rossum (

From guido at  Mon Oct 29 23:08:54 2012
From: guido at (Guido van Rossum)
Date: Mon, 29 Oct 2012 15:08:54 -0700
Subject: [Python-ideas] non-blocking buffered I/O
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 29, 2012 at 2:25 PM, Antoine Pitrou <solipsis at> wrote:
> On Mon, 29 Oct 2012 10:03:00 -0700
> Guido van Rossum <guido at> wrote:
>> >> Then there is a
>> >> BufferedReader class that implements more traditional read() and
>> >> readline() coroutines (i.e., to be invoked using yield from), the
>> >> latter handy for line-oriented transports.
>> >
>> > Well... It would be nice if BufferedReader could re-use the actual
>> > io.BufferedReader and its fast readline(), read(), readinto()
>> > implementations.
>> Agreed, I would love that too, but the problem is, *this*
>> BufferedReader defines methods you have to invoke with yield from.
>> Maybe we can come up with a solution for sharing code by modifying the
>> _io module though; that would be great! (I've also been thinking of
>> layering TextIOWrapper on top of these.)
> There is a rather infamous issue about _io.BufferedReader and
> non-blocking I/O at
> It is a bit problematic because currently non-blocking readline()
> returns '' instead of None when no data is available, meaning EOF can't
> be easily detected :(


> Once this issue is solved, you could use _io.BufferedReader, and
> workaround the "partial read/readline result" issue by iterating
> (hopefully in most cases there is enough data in the buffer to
> return a complete read or readline, so the C optimizations are useful).

Yes, that's what I'm hoping for.

> Here is how it may work:
> def __init__(self, fd):
>     self.fd = fd
>     self.bufio = _io.BufferedReader(...)
> def readline(self):
>     chunks = []
>     while True:
>         line = self.bufio.readline()
>         if line is not None:
>             chunks.append(line)
>             if line == b'' or line.endswith(b'\n'):
>                 # EOF or EOL
>                 return b''.join(chunks)
>         yield from scheduler.block_r(self.fd)
> def read(self, n):
>     chunks = []
>     bytes_read = 0
>     while True:
>         data = self.bufio.read(n - bytes_read)
>         if data is not None:
>             chunks.append(data)
>             bytes_read += len(data)
>             if data == b'' or bytes_read == n:
>                 # EOF or read satisfied
>                 break
>         yield from scheduler.block_r(self.fd)
>     return b''.join(chunks)

Hm... I wonder if it would make more sense if these standard APIs were
to return specific exceptions, like the ssl module does in
non-blocking mode? Look here (I updated since posting last night):

> As for TextIOWrapper, AFAIR it doesn't handle non-blocking I/O at all
> (but my memories are vague).

Same suggestion... (I only found out about ssl's approach to async I/O
a few days ago. It felt brilliant and right to me. But maybe I'm
missing something?)

> By the way I don't know how this whole approach (of mocking socket-like
> or file-like objects with coroutine-y read() / readline() methods)
> lends itself to plugging into Windows' IOCP.

Me neither. I hope Steve Dower can tell us.

> You may rely on some raw
> I/O object that registers a callback when a read() is requested and
> then yields a Future object that gets completed by the callback.
> I'm sure Richard has some ideas about that :-)

Which Richard?

--Guido van Rossum (

From andrew.svetlov at  Mon Oct 29 23:19:16 2012
From: andrew.svetlov at (Andrew Svetlov)
Date: Tue, 30 Oct 2012 00:19:16 +0200
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 30, 2012 at 12:03 AM, Guido van Rossum <guido at> wrote:
> On Mon, Oct 29, 2012 at 2:20 PM, Giampaolo Rodolà <g.rodola at> wrote:
>> 2012/10/29 Guido van Rossum <guido at>:
>>> On Mon, Oct 29, 2012 at 11:08 AM, Giampaolo Rodolà <g.rodola at> wrote:
>>>> 2012/10/29 Guido van Rossum <guido at>
>>>> === About ===
>>>> 1 - I think DelayedCall should have a reset() method, other than just cancel().
>>> So, essentially an uncancel()? Why not just re-register in that case?
>>> Or what's your use case? (Right now there's no problem in calling one
>>> of these many times -- it's just that cancellation is permanent.)
>> The most common use case is when you want to disconnect the other peer
>> after a certain time of inactivity.
>> Ideally what you would do is schedule() an idle/timeout function and
>> reset() it every time the other peer sends you some data.
> Um, ok, I think you are saying that you want to be able to set
> timeouts and then "reset" that timeout. This is a much higher-level
> thing than canceling the DelayedCall object. (I have no desire to make
> DelayedCall have functionality like Twisted's Deferred. It is
> something *much* simpler; it's just the API for cancelling a callback
> passed to call_later(), and its other uses are similar to this.)
Twisted's DelayedCall is different from Deferred; it is used for
reactor.callLater and is returned from that function (the same as
call_later from tulip).
Interface is:
Implementation is
DelayedCall from Twisted has nothing in common with Deferred; it's just
an interface for scheduled activity, which can be called once,
cancelled or rescheduled to another time.

I've found that concept very useful when I used twisted.

> [...]
>> Not very nice to use a reserved keyword, I agree.
>> Perhaps you can keep ruling out kwargs referred to the callback
>> function and change the current call_later signature as such:
>> -    def call_later(self, when, callback, *args):
>> +    def call_later(self, when, callback, *args, errback=None):
>> ...or maybe provide a DelayedCall.add_errback() method a-la Twisted.
> I really don't want that though! But I'm glad you're not too hell-bent
> on supporting callbacks with keyword-only args.
> [...]
>>> should also check that I've taken care of all the various flags and
>>> other details (I recall being quite surprised that with poll(), on
>>> some platforms I need to check for POLLHUP but not on others).
>> Yeah, that's a painful part.
>> Try to look here:
>> Instead of handle_close()ing you should add the fd to the list of
>> readable ones ("r").
>> The call to recv() which will be coming next will then cause the
>> socket to close (you have to add the error handling to recv() first
>> though).
> Aha, are you suggesting that I close the socket when I detect that the
> socket is closed? But what if the other side uses shutdown() to close
> only one end? Depending on the protocol it might be useful to either
> stop reading but keep sending, or vice versa. Maybe I could detect
> that both ends are closed and then close the socket. Or are you
> suggesting something else?
>>>> 3 - I don't fully understand how data transfer works exactly but keep
>>>> in mind that the transport should interact with the pollster.
>>>> What I mean is that generally speaking a connected socket should
>>>> *always* be readable ("r"), even when it's idle, then switch to "rw"
>>>> events when sending data, then get back to "r" when all the data has
>>>> been sent.
>>>> This is *crucial* if you want to achieve high performances/scalability
>>>> and that is why PollsterBase should probably provide a modify()
>>>> method.
>>>> Please take a look at what I've done here:
>>> Hm. I am not convinced that managing this explicitly from the
>>> transport is the right solution (note that my transports are quite
>>> different from those in Twisted). But I'll keep this in mind -- I
>>> would like to set up a benchmark suite at some point. I will probably
>>> have to implement the server side of HTTP for that purpose, so I can
>>> point e.g. ab at my app.
>> I think you might want to apply that to something slightly higher
>> level than the mere transport.
> (Apply *what*?)
>> Something like the equivalent of asynchat.push /
>> asynchat.push_with_producer, if you'll ever want to go that far in
>> terms of abstraction, or maybe avoid that at all but make it clear in
>> the doc that the user should take care of that.
> I'm actually not sufficiently familiar with asynchat to comment. I
> think it's got quite a different model than what I am trying to set up
> here.
>> My point is that having a socket registered for both "r" AND "w"
>> events when in fact you want only "r" OR "w" is an exponential waste
>> of CPU cycles and it should be avoided either by the lib or by the
>> user.
> One task can only be blocked for reading OR writing. The only way to
> have a socket registered for both is if there are separate tasks for
> reading and writing, and then presumably that is what you want. (I
> have a feeling you haven't fully grokked my HTTP client code yet?)
>> "old select() implementation" vs "new select() implementation"
>> benchmark shown here reflects exactly this problem which still affects
>> base asyncore module:
> Hm, I am already using epoll or kqueue if available, otherwise poll,
> falling back to select only if there's nothing else available (in
> practice that's only Windows).
> But I will diligently work towards a benchmarkable demo.
>> I'll keep following the progress on this and hopefully come up with
>> another set of questions and/or random thoughts.
> Thanks!
> --
> --Guido van Rossum (
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at

Andrew Svetlov

From rene at  Mon Oct 29 23:23:34 2012
From: rene at (Rene Nejsum)
Date: Mon, 29 Oct 2012 23:23:34 +0100
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

On Oct 29, 2012, at 5:59 PM, Yury Selivanov < at> wrote:

> On 2012-10-29, at 12:07 PM, Antoine Pitrou <solipsis at> wrote:
>>> In the docstrings I use the prefix "COROUTINE:" to indicate public
>>> APIs that should be invoked using yield from.
>> Hmm, should they? Your approach looks a bit weird: you have functions
>> that should use yield, and others that should use "yield from"? That
>> sounds confusing to me.
>> I'd much rather either have all functions use "yield", or have all
>> functions use "yield from".
>> (also, I wouldn't be shocked if coroutines had to wear a special
>> decorator; it's a better marker than having the word COROUTINE in the
>> docstring, anyway :-))
> That's what bothers me as well.  'yield from' looks too long for a
> simple thing it does (1); users will be confused whether they should
> use 'yield' or 'yield from' (2); there is no visible difference between
> a plain generator and a coroutine (3).

I agree. Was this ever commented on? I know it may be late in the discussion,
but just because you can use yield/yield from for concurrent stuff, should you?

It looks very implicit to me (breaking the second rule of the Zen of Python,
"Explicit is better than implicit").

Has the delegate/event model of C# been discussed?

As always, I recommend moving the concurrent stuff to the object level. It
would be so much easier to state that a message for an object is just that:
an async message sent from one object to another... :-)
A simple decorator like @task would be enough:

@task # explicit run instance in own thread/coroutine
class SomeTask(object):
  def async_add(self, x, y):
    return x + y # returns a Future() with result

task = SomeTask()
n = task.async_add(2,2)
# Do other stuff while waiting for answer
print( "result is %d" % n ) # Future will wait/hang until result is ready
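
To make the idea concrete, here is one way such a decorator could be sketched with concurrent.futures from the standard library. Everything about this task() decorator is an assumption made for illustration: it uses a single worker thread per decorated class rather than per instance, and the caller has to ask the Future for its result explicitly instead of having it unwrap transparently as in the pseudocode above:

```python
import functools
from concurrent.futures import ThreadPoolExecutor


def task(cls):
    """Hypothetical class decorator: public methods return Futures."""
    # One worker thread per decorated class, so calls run one at a time.
    executor = ThreadPoolExecutor(max_workers=1)

    def make_async(method):
        @functools.wraps(method)
        def wrapper(self, *args, **kwargs):
            return executor.submit(method, self, *args, **kwargs)
        return wrapper

    for name, attr in list(vars(cls).items()):
        if callable(attr) and not name.startswith("_"):
            setattr(cls, name, make_async(attr))
    return cls


@task
class SomeTask:
    def async_add(self, x, y):
        return x + y


worker = SomeTask()
future = worker.async_add(2, 2)
# ... do other stuff while waiting for the answer ...
result = future.result()  # blocks until the result is ready
```

Note that the instance is named worker here so that it does not shadow the task decorator.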


> Personally, I like Greg's PEP 3152 (aside from 'cocall' keyword).
> With that approach it's easy to distinguish coroutines, generators and
> plain functions.  And it'd be easier to add some special 
> methods/properties to codefs, like 'in_finally()' method etc.
> -
> Yury
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at

From guido at  Mon Oct 29 23:26:59 2012
From: guido at (Guido van Rossum)
Date: Mon, 29 Oct 2012 15:26:59 -0700
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 29, 2012 at 3:19 PM, Andrew Svetlov
<andrew.svetlov at> wrote:
> Twisted's DelayedCall is different from Deferred, it is used for
> reactor.callLater and returned from this function (the same as
> call_later from tulip)
> Interface is:
> Implementation is
> DelayedCall from twisted has nothing in common with Deferred, it's just
> an interface for scheduled activity, which can be called once,
> cancelled or rescheduled to another time.
> I've found that concept very useful when I used twisted.

Oh dear. I had no idea there was something named DelayedCall in
Twisted. There is no intention of similarity.

--Guido van Rossum (

From Steve.Dower at  Tue Oct 30 00:00:14 2012
From: Steve.Dower at (Steve Dower)
Date: Mon, 29 Oct 2012 23:00:14 +0000
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

Rene Nejsum wrote:
>> [SNIP]
>> That's what bothers me as well.  'yield from' looks too long for a
>> simple thing it does (1); users will be confused whether they should 
>> use 'yield' or 'yield from' (2); there is no visible difference 
>> between a plain generator and a coroutine (3).
> I agree, was this ever commented ? I know it maybe late in the discussion
> but just because you can use yield/yield from for concurrent stuff, should you?
> it looks very implicit to me (breaking the second rule)
> Have the delegate/event model of C# been discussed ?
> As always i recommend moving the concurrent stuff to the object level, it
> would be so much easier to state that a message for an object is just that:
> An async message sent from one object to another... :-) A simple decorator
> like @task would be enough:
> @task # explicit run instance in own thread/coroutine
> class SomeTask(object):
>   def async_add(self, x, y):
>     return x + y # returns a Future() with result
> task = SomeTask()
> n = task.async_add(2,2)
> # Do other stuff while waiting for answer
> print( "result is %d" % n ) # Future will wait/hang until result is ready

I think you'll like what I'll be sending out later tonight (US Pacific time), so hold on :) (In the meantime, feel free to read up on C#'s async/await model, which is very similar to what both Guido and I are proposing and has already been pretty well received.)


From greg.ewing at  Tue Oct 30 00:16:22 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 30 Oct 2012 12:16:22 +1300
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

Steve Dower wrote:

>  - how easy/difficult/flexible/restrictive is it to write a new scheduler as an end user?

I don't think that writing new schedulers is something an end user
will do very often. Or more precisely, it's not something they should
*have* to do except in extremely unusual circumstances.

I believe it will be possible to provide a scheduler in the stdlib
that will be satisfactory for the vast majority of applications.


From Steve.Dower at  Tue Oct 30 00:12:54 2012
From: Steve.Dower at (Steve Dower)
Date: Mon, 29 Oct 2012 23:12:54 +0000
Subject: [Python-ideas] non-blocking buffered I/O
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:
>> By the way I don't know how this whole approach (of mocking 
>> socket-like or file-like objects with coroutine-y read() / readline() 
>> methods) lends itself to plugging into Windows' IOCP.
> Me neither. I hope Steve Dower can tell us.

I suppose since my name has been invoked I ought to comment, though Richard (Oudkerk, I think?) seems to have more experience with IOCP than I do.

From my point of view, IOCP fits in very well provided the callbacks (which will run in the IOCP thread pool) are only used to unblock tasks. Yes, it then will not be a pure single-threaded model, but on the other hand it isn't going to use an unbounded number of threads. There are alternatives to IOCP, but they will require expert hands to make them efficient under scale - IOCP has already had the expert hands applied (I assume... maybe it was written by an intern? I really don't know).

The whole blocking coroutine model works really well with callback-based unblocks (whether they call Future.set_result or unblock_task), so I don't think there's anything to worry about here. Compatibility-wise, it should be easy to make programs portable, and since we can have completely separate implementations for Linux/Mac/Windows it will be possible to get good, if not excellent, performance out of each.

What will make a difference is the ready vs. complete notifications - most async Windows APIs will signal when they are complete (for example, the data has been read from the file) unlike many (most? All?) Linux APIs that signal when they are ready. It is possible to wrap this difference up by making all APIs notify on completion, and if we don't do this then user code may be less portable, which I'd hate to see. It doesn't directly relate to IOCP, but it is an important consideration for good cross-platform libraries.
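
One way to picture wrapping a readiness notification so that callers only ever see completion: wait until the socket is readable, perform the recv() on the caller's behalf, and hand back a Future that holds the data. The sketch below uses the standard selectors and concurrent.futures modules; recv_complete() and run_once() are hypothetical helper names, and the socket pair stands in for a real connection:

```python
import selectors
import socket
from concurrent.futures import Future


def recv_complete(sel, sock, nbytes):
    """Hypothetical helper: return a Future that will hold the received bytes."""
    future = Future()

    def on_readable():
        # Readiness arrived; do the read here so the caller sees completion.
        sel.unregister(sock)
        try:
            future.set_result(sock.recv(nbytes))
        except OSError as exc:
            future.set_exception(exc)

    sel.register(sock, selectors.EVENT_READ, on_readable)
    return future


def run_once(sel, timeout=1.0):
    # One pass of a toy event loop: run the callback stored with each ready key.
    for key, _mask in sel.select(timeout):
        key.data()


sel = selectors.DefaultSelector()
left, right = socket.socketpair()
left.setblocking(False)

pending = recv_complete(sel, left, 1024)
done_before_data = pending.done()
right.sendall(b"hello")
run_once(sel)
payload = pending.result(timeout=0)

left.close()
right.close()
sel.close()
```

On a completion-based platform the same recv_complete() signature could be backed by an overlapped read instead, which is what would keep user code portable.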


From guido at  Tue Oct 30 00:21:38 2012
From: guido at (Guido van Rossum)
Date: Mon, 29 Oct 2012 16:21:38 -0700
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 29, 2012 at 3:23 PM, Rene Nejsum <rene at> wrote:
> On Oct 29, 2012, at 5:59 PM, Yury Selivanov < at> wrote:
>> On 2012-10-29, at 12:07 PM, Antoine Pitrou <solipsis at> wrote:
>>>> In the docstrings I use the prefix "COROUTINE:" to indicate public
>>>> APIs that should be invoked using yield from.
>>> Hmm, should they? Your approach looks a bit weird: you have functions
>>> that should use yield, and others that should use "yield from"? That
>>> sounds confusing to me.
>>> I'd much rather either have all functions use "yield", or have all
>>> functions use "yield from".
>>> (also, I wouldn't be shocked if coroutines had to wear a special
>>> decorator; it's a better marker than having the word COROUTINE in the
>>> docstring, anyway :-))
>> That's what bothers me as well.  'yield from' looks too long for a
>> simple thing it does (1); users will be confused whether they should
>> use 'yield' or 'yield from' (2); there is no visible difference between
>> a plain generator and a coroutine (3).
> I agree, was this ever commented ? I know it maybe late in the discussion
> but just because you can use yield/yield from for concurrent stuff, should you?

I explained my position on yield vs. yield from twice already in this thread.

--Guido van Rossum (

From Steve.Dower at  Tue Oct 30 00:26:17 2012
From: Steve.Dower at (Steve Dower)
Date: Mon, 29 Oct 2012 23:26:17 +0000
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

Greg Ewing wrote:
> Steve Dower wrote:
>>  - how easy/difficult/flexible/restrictive is it to write a new scheduler as an end user?
> I don't think that writing new schedulers is something an end user will do very often. Or
> more precisely, it's not something they should *have* to do except in extremely
> unusual circumstances.
> I believe it will be possible to provide a scheduler in the stdlib that will be satisfactory
> for the vast majority of applications.

I agree, and I chose my words poorly for that point: "library/framework developers" is more accurate than "end user". And since I expect every GUI framework is going to need (or at least want) their own scheduler, not to mention all the cases of Python being embedded in other programs, there is some value in helping these developers to get it right by virtue of the design rather than relying on documentation.


From guido at  Tue Oct 30 00:29:00 2012
From: guido at (Guido van Rossum)
Date: Mon, 29 Oct 2012 16:29:00 -0700
Subject: [Python-ideas] non-blocking buffered I/O
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 29, 2012 at 4:12 PM, Steve Dower <Steve.Dower at> wrote:
> Guido van Rossum wrote:
>>> By the way I don't know how this whole approach (of mocking
>>> socket-like or file-like objects with coroutine-y read() / readline()
>>> methods) lends itself to plugging into Windows' IOCP.
>> Me neither. I hope Steve Dower can tell us.
> I suppose since my name has been invoked I ought to comment, though Richard (Oudkerk, I think?) seems to have more experience with IOCP than I do.

Aha, somehow I thought Richard was a Mac expert. :-(

> From my point of view, IOCP fits in very well provided the callbacks (which will run in the IOCP thread pool) are only used to unblock tasks. Yes, it then will not be a pure single-threaded model, but on the other hand it isn't going to use an unbounded number of threads. There are alternatives to IOCP, but they will require expert hands to make them efficient under scale - IOCP has already had the expert hands applied (I assume... maybe it was written by an intern? I really don't know).

Experts all point in its direction, so I believe IOCP is solid.

> The whole blocking coroutine model works really well with callback-based unblocks (whether they call Future.set_result or unblock_task), so I don't think there's anything to worry about here. Compatibility-wise, it should be easy to make programs portable, and since we can have completely separate implementations for Linux/Mac/Windows it will be possible to get good, if not excellent, performance out of each.

Right. Did you see my call_in_thread() yet?

> What will make a difference is the ready vs. complete notifications - most async Windows APIs will signal when they are complete (for example, the data has been read from the file) unlike many (most? All?) Linux APIs that signal when they are ready. It is possible to wrap this difference up by making all APIs notify on completion, and if we don't do this then user code may be less portable, which I'd hate to see. It doesn't directly relate to IOCP, but it is an important consideration for good cross-platform libraries.

I wonder if this could be done by varying the transports by platform?
Not too many people are going to write new transports -- there just
aren't that many options. And those that do might be doing something
platform-specific anyway. It shouldn't be that hard to come up with a
transport abstraction that lets protocol implementations work
regardless of whether it's a UNIX style transport or a Windows style
transport. UNIX systems with IOCP support could use those too.

--Guido van Rossum (

From guido at  Tue Oct 30 00:37:52 2012
From: guido at (Guido van Rossum)
Date: Mon, 29 Oct 2012 16:37:52 -0700
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 29, 2012 at 4:26 PM, Steve Dower <Steve.Dower at> wrote:
> Greg Ewing wrote:
>> Steve Dower wrote:
>>>  - how easy/difficult/flexible/restrictive is it to write a new scheduler as an end user?
>> I don't think that writing new schedulers is something an end user will do very often. Or
>> more precisely, it's not something they should *have* to do except in extremely
>> unusual circumstances.
>> I believe it will be possible to provide a scheduler in the stdlib that will be satisfactory
>> for the vast majority of applications.
> I agree, and I chose my words poorly for that point: "library/framework developers" is more accurate than "end user". And since I expect every GUI framework is going to need (or at least want) their own scheduler, not to mention all the cases of Python being embedded in other programs, there is some value in helping these developers to get it right by virtue of the design rather than relying on documentation.

BTW, would it be useful to separate this into pollster, eventloop, and
scheduler? At least in my world these are different; of these three,
only the pollster contains platform-specific code (and then again the
transports do too -- this is a nice match IMO).

--Guido van Rossum (

From Steve.Dower at  Tue Oct 30 00:47:51 2012
From: Steve.Dower at (Steve Dower)
Date: Mon, 29 Oct 2012 23:47:51 +0000
Subject: [Python-ideas] non-blocking buffered I/O
In-Reply-To: <>
References: <>
Message-ID: <>

> Guido van Rossum wrote:
> [SNIP]
> On Mon, Oct 29, 2012 at 4:12 PM, Steve Dower <Steve.Dower at> wrote:
>> The whole blocking coroutine model works really well with callback-based unblocks
>> (whether they call Future.set_result or unblock_task), so I don't think there's anything
>> to worry about here. Compatibility-wise, it should be easy to make programs portable,
>> and since we can have completely separate implementations for Linux/Mac/Windows it
>> will be possible to get good, if not excellent, performance out of each.
> Right. Did you see my call_in_thread() yet?

Yes, and it really stood out as one of the similarities between our work. I don't have an equivalent function, since writing "yield thread_pool.submit(...)" is sufficient (because it already returns a Future), but I haven't actually made the thread pool a property of the current scheduler. I think there's value in it.
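
To make the "yield thread_pool.submit(...)" idea concrete, here is a minimal sketch of a driver that runs a generator which yields concurrent.futures futures. The names run() and task() are made up for illustration, and this driver simply blocks on each future in turn; it is not the scheduler from either proposal.

```python
from concurrent.futures import ThreadPoolExecutor


def run(gen):
    """Drive a generator that yields Future objects until it returns."""
    try:
        future = next(gen)
        while True:
            try:
                value = future.result()       # block until the work is done
            except Exception as exc:
                future = gen.throw(exc)       # surface the failure at the yield
            else:
                future = gen.send(value)      # resume with the result
    except StopIteration as stop:
        return stop.value


def task(pool):
    # Each yield hands a Future to the driver and receives its result back.
    total = yield pool.submit(sum, [1, 2, 3])
    doubled = yield pool.submit(lambda x: x * 2, total)
    return doubled


with ThreadPoolExecutor(max_workers=2) as pool:
    result = run(task(pool))
```

A real scheduler would register a callback on the future and go on to run other tasks instead of blocking in result().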

>> What will make a difference is the ready vs. complete notifications - most async Windows 
>> APIs will signal when they are complete (for example, the data has been read from the file)
>> unlike many (most? All?) Linux APIs that signal when they are ready. It is possible to wrap this
>> difference up by making all APIs notify on completion, and if we don't do this then user code
>> may be less portable, which I'd hate to see. It doesn't directly relate to IOCP, but it is an important
>> consideration for good cross-platform libraries.
> I wonder if this could be done by varying the transports by platform?
> Not too many people are going to write new transports -- there just aren't that many options.
> And those that do might be doing something platform-specific anyway. It shouldn't be that hard
> to come up with a transport abstraction that lets protocol implementations work regardless of
> whether it's a UNIX style transport or a Windows style transport. UNIX systems with IOCP support
> could use those too.

I feel like a bit of a tease now, since I still haven't posted my code (it's coming, but I also have day work to do [also Python related]), but I've really left this side of things out of my definition completely in favour of allowing schedulers to "unblock" known functions. For example, (library) code that needs a socket to be ready can ask the current scheduler if it can do "select([sock], [], [])", and if the scheduler can then it will give the library code a Future. How the scheduler ends up implementing the asynchronous-select is entirely up to the scheduler, and if it can't do it, the caller can do it their own way (which probably means using a thread pool as a last resort).

What I would expect this to result in is a set of platform-specific default schedulers that do common operations well and other (3rd-party) schedulers that do particular things really well. So if you want high performance single-threaded sockets, you replace the default scheduler with another one - but if Windows doesn't support the optimized scheduler, you can use the default scheduler without your code breaking.
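
A rough sketch of the "ask the scheduler, else fall back" idea described above. The BasicScheduler class and its try_async() method are hypothetical names invented for this example; they are not taken from the posted code.

```python
import select
import socket
from concurrent.futures import ThreadPoolExecutor


class BasicScheduler:
    """Hypothetical scheduler that cannot unblock any operation itself."""

    def try_async(self, func, *args):
        # A smarter scheduler would return a Future for operations it
        # recognises (for example select.select); this one never does.
        return None


_pool = ThreadPoolExecutor(max_workers=1)


def wait_readable(scheduler, sock):
    """Return a Future that completes once sock is readable."""
    future = scheduler.try_async(select.select, [sock], [], [])
    if future is None:
        # Last resort: run the blocking call on a worker thread.
        future = _pool.submit(select.select, [sock], [], [])
    return future


a, b = socket.socketpair()
b.sendall(b"x")
readable, _, _ = wait_readable(BasicScheduler(), a).result(timeout=5)
is_ready = readable == [a]
a.close()
b.close()
_pool.shutdown()
```

The library code stays the same whichever scheduler is installed; only the cost of the wait changes.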

Writing this now it seems to be even clearer that we've approached the problem differently, which should mean there'll be room to share parts of the designs and come up with a really solid result. I'm looking forward to it.


From greg.ewing at  Tue Oct 30 00:53:56 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 30 Oct 2012 12:53:56 +1300
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

Mark Hackett wrote:

> Is that actually true? It may be guaranteed on Intel x86 compatibles and Linux 
> (because of the string operations available in the x86 instruction set), but I 
> don't think anything other than an IPC message has a "you can write a string 
> atomically" guarantee. And I may be misremembering that.

It seems to be a POSIX requirement:

        POSIX.1-2001 says that write(2)s of less than PIPE_BUF bytes must be
        atomic: the output data is written to the pipe as a contiguous
        sequence.


There's no corresponding guarantee for reading, though. The process
on the other end can't be sure of getting the data from one write()
call in a single read() call. In other words, the write does *not*
establish a record boundary.
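
A small sketch of the missing record boundary, using os.pipe() from the standard library. The payloads are arbitrary. How the bytes are split across read() calls is up to the OS, so the example only checks the concatenated stream.

```python
import os

r, w = os.pipe()

# Two separate writes; each is small enough to be written atomically.
os.write(w, b"first")
os.write(w, b"second")
os.close(w)

# The reader just sees a byte stream: nothing marks where one write
# ended and the next began.
chunks = []
while True:
    chunk = os.read(r, 1024)
    if not chunk:          # empty bytes means EOF (write end closed)
        break
    chunks.append(chunk)
os.close(r)

stream = b"".join(chunks)
```

If the reader needs records, it has to add its own framing (a length prefix or a delimiter) on top of the pipe.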


From shibturn at  Tue Oct 30 01:01:23 2012
From: shibturn at (Richard Oudkerk)
Date: Tue, 30 Oct 2012 00:01:23 +0000
Subject: [Python-ideas] non-blocking buffered I/O
In-Reply-To: <>
References: <>
Message-ID: <k6n5cn$477$>

On 29/10/2012 11:29pm, Guido van Rossum wrote:
> I wonder if this could be done by varying the transports by platform?
> Not too many people are going to write new transports -- there just
> aren't that many options. And those that do might be doing something
> platform-specific anyway. It shouldn't be that hard to come up with a
> transport abstraction that lets protocol implementations work
> regardless of whether it's a UNIX style transport or a Windows style
> transport. UNIX systems with IOCP support could use those too.

Yes, having separate implementations of the transport layer should work.

But I think it would be cleaner to put all the platform specific stuff 
in the pollster, and make the pollster poll-for-completion rather than 
poll-for-readiness.  (Is this the "proactor pattern"?)  That seems to be 
the direction libevent has moved in.
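
As a rough illustration of hiding readiness behind a completion-style interface, here is a sketch built on the standard selectors module. recv_on_completion() and run_once() are hypothetical helper names, and a real IOCP-based pollster would work quite differently underneath.

```python
import selectors
import socket


def recv_on_completion(sel, sock, nbytes, on_complete):
    """Call on_complete(data) once data has actually been read from sock."""
    def _ready():
        # Readiness was signalled; do the read here so the caller only
        # ever sees the completed operation.
        sel.unregister(sock)
        on_complete(sock.recv(nbytes))
    sel.register(sock, selectors.EVENT_READ, _ready)


def run_once(sel, timeout=1.0):
    """Poll once and run the callback stored with each ready socket."""
    for key, _events in sel.select(timeout):
        key.data()


a, b = socket.socketpair()
sel = selectors.DefaultSelector()
received = []

recv_on_completion(sel, a, 1024, received.append)
b.sendall(b"hello")
run_once(sel)

a.close()
b.close()
sel.close()
```

Code written against recv_on_completion() does not care whether the platform reports "ready" or "done", which is the portability point made earlier in the thread.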


From greg.ewing at  Tue Oct 30 01:06:29 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 30 Oct 2012 13:06:29 +1300
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

Yury Selivanov wrote:
> Because scheduler, when it is deciding to interrupt a coroutine or not, 
> should only question whether that particular coroutine is in its finally, 
> and not the one which called it.

So given this:

    def c1():
       try:
          something()
       finally:
          yield from c2()
          very_important_cleanup()

    def c2():
       yield from block() # 1

it should be okay to interrupt at point 1, even though
it will prevent very_important_cleanup() from being done?

That doesn't seem right to me.


From greg.ewing at  Tue Oct 30 01:19:08 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 30 Oct 2012 13:19:08 +1300
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:
>>I would like to see add_{reader,writer} and call_{soon,later} accepting
>>**kwargs as well as *args. At least to respect functions with
>>keyword-only arguments.
> Hmm... I intentionally ruled those out because I wanted to leave the
> door open for keyword args that modify the registration function

One way to accommodate that would be to make the
registration API look like this:

    call_later(my_func)(arg1, ..., kwd = value, ...)
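
A minimal sketch of how such a two-step registration could work. TinyLoop is a toy class invented for this example, with a fake clock instead of real time. Keyword arguments to call_later() configure the registration, while the second call supplies the callback's own arguments.

```python
import functools
import heapq
import itertools


class TinyLoop:
    """Toy event loop with a simulated clock."""

    def __init__(self):
        self._queue = []
        self._counter = itertools.count()   # tie-breaker for equal times
        self.now = 0.0

    def call_later(self, func, *, delay=0.0):
        # 'delay' belongs to the registration; everything passed to the
        # returned function belongs to the callback.
        def bind(*args, **kwargs):
            entry = (self.now + delay, next(self._counter),
                     functools.partial(func, *args, **kwargs))
            heapq.heappush(self._queue, entry)
        return bind

    def run(self):
        while self._queue:
            when, _, callback = heapq.heappop(self._queue)
            self.now = max(self.now, when)
            callback()


log = []


def greet(name, *, punctuation="!"):
    log.append("hello " + name + punctuation)


loop = TinyLoop()
loop.call_later(greet, delay=2.0)("world", punctuation="?")
loop.call_later(greet)("there")
loop.run()
```

The callback registered with no delay runs first, so the same mechanism also covers a call_soon()-style registration.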


From greg.ewing at  Tue Oct 30 01:24:18 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 30 Oct 2012 13:24:18 +1300
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:

> I can build lots of other useful things out of call_soon() and
> call_later() -- but I do need at least those two as "axioms".

Isn't call_soon() equivalent to call_later() with a
time delay of 0?

If so, then call_later() is really the only axiomatic one.


From greg.ewing at  Tue Oct 30 01:25:43 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 30 Oct 2012 13:25:43 +1300
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

Andrew Svetlov wrote:

> 0MQ socket has no file descriptor at all, it's just pointer to some
> unspecified structure.
> So 0MQ has own *poll* function which can process that sockets as well
> as file descriptors.

Aaargh... yet another event loop that wants to rule
the world. This is not good.


From ncoghlan at  Tue Oct 30 01:34:24 2012
From: ncoghlan at (Nick Coghlan)
Date: Tue, 30 Oct 2012 10:34:24 +1000
Subject: [Python-ideas] non-blocking buffered I/O
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 30, 2012 at 9:29 AM, Guido van Rossum <guido at> wrote:
> Aha, somehow I thought Richard was a Mac expert. :-(

Just in case anyone else confused the two names (I know I have in the past):

    Ronald Oussoren = Mac expert
    Richard Oudkerk = multiprocessing expert (including tools for
inter-process communication)


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From at  Tue Oct 30 01:44:03 2012
From: at (Yury Selivanov)
Date: Mon, 29 Oct 2012 20:44:03 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-29, at 8:06 PM, Greg Ewing <greg.ewing at> wrote:

> Yury Selivanov wrote:
>> Because scheduler, when it is deciding to interrupt a coroutine or not, should only question whether that particular coroutine is in its finally, and not the one which called it.
> So given this:
>   def c1():
>      try:
>         something()
>      finally:
>         yield from c2()
>         very_important_cleanup()
>   def c2():
>      yield from block() # 1
> it should be okay to interrupt at point 1, even though
> it will prevent very_important_cleanup() from being done?
> That doesn't seem right to me.

From at  Tue Oct 30 01:43:23 2012
From: at (Yury Selivanov)
Date: Mon, 29 Oct 2012 20:43:23 -0400
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>


Finally got some time to do a review & read what others posted.
Some comments are more general, some are more implementation-specific
(hopefully you want to hear latter ones as well)

And I'm still in the process of digesting your approach & code (as
I've spent too much time with my implementation)...

On 2012-10-28, at 7:52 PM, Guido van Rossum <guido at> wrote:

1. I'd make EventLoopMixin a separate entity from pollsters.  So that you'd
be able to add many different pollsters to one EventLoop.  This way
you can have specialized pollster for different types of IO, including
UI etc.

2. Sometimes, there is a need to run a coroutine in a threadpool.  I know it
sounds weird, but it's probably worth exploring.

3. In my framework each threadpool worker has its own local context, with
various information like what Task run the operation etc.

And a few small things:

4. epoll.poll and other syscalls need to be wrapped in try..except to catch
and ignore (and log?) EINTR type of exceptions.

5. For epoll you probably want to check/(log?) EPOLLHUP and EPOLLERR errors
too.


> In the docstrings I use the prefix "COROUTINE:" to indicate public
> APIs that should be invoked using yield from.

As others, I would definitely suggest adding a decorator to make
coroutines more distinguishable.  It would be even better if we can return
a tiny wrapper, that lets you simply write '', 
instead of:

    task = scheduling.Task(doit(), timeout=2.1)
    task.start()

And avoid manual Task instantiation at all.

I also liked the simplicity of the Task class.  I think it'd be easy
to mix greenlets in it by switching in a new greenlet on each 'step'.
That will give you 'yield_()' function, which you can use in the same
way you use 'yield' statement now (I'm not proposing to incorporate
greenlets in the lib itself, but rather to provide an option to do so)
Hence there should be a way to plug your own Task (sub-)class in.

Thank you,

From at  Tue Oct 30 02:00:46 2012
From: at (Yury Selivanov)
Date: Mon, 29 Oct 2012 21:00:46 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

Oh... I'm sorry for the empty reply.

On 2012-10-29, at 8:06 PM, Greg Ewing <greg.ewing at> wrote:
> Yury Selivanov wrote:
>> Because scheduler, when it is deciding to interrupt a coroutine or not, should only question whether that particular coroutine is in its finally, and not the one which called it.
> So given this:
>   def c1():
>      try:
>         something()
>      finally:
>         yield from c2()
>         very_important_cleanup()
>   def c2():
>      yield from block() # 1
> it should be okay to interrupt at point 1, even though
> it will prevent very_important_cleanup() from being done?
> That doesn't seem right to me.

But you don't just randomly interrupt coroutines.  You interrupt them
when you *explicitly stated*, for instance, that this very one coroutine
is executed with a timeout.  And it's your responsibility to handle
a TimeoutError when you call it with such restriction.

That's the *main* thing here.  Again, when you explicitly execute
something with a timeout, then that very something shouldn't be
interrupted uncontrollably by the scheduler.  It's that particular
something, whose 'finally' should be protected.

So in your example the scheduler would never ever have a question of
interrupting c2(), because it wasn't called with any restriction/timeout.
There's simply no reason to interrupt it ever.

But if you want to make c2() interruptible, you would write:

   def c1():
      try:
          something()
      finally:
          yield from with_timeout(2.0, c2())
          very_important_cleanup()

And that way, c2() actually may be (and at some point will be) interrupted
by scheduler.  And it's your responsibility to catch TimeoutError.

So you would write your code in the following way to protect c1's finally:

   def c1():
      try:
          something()
      finally:
          try:
             yield from with_timeout(2.0, c2())
          except TimeoutError:
             pass
          very_important_cleanup()
Now, the problem is that when you call c2() with a timeout, scheduler should
not interrupt c2's finally statement (if there is any).  And it has nothing 
to do with c1 entirely.

So if c2() code is like the following:

   def c2():
      try:
          something()
      finally:
          yield from someotherthing()

Then you need the scheduler to know if it is in its finally or not.  Because it's
c2() which was run with a timeout.  It's c2() code that may be subject to
aborting.  And it doesn't matter from where c2() was called, the only thing
that matters, is that if it was called with a timeout, its finally block
should be protected from interrupting.  That's all.
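
The hazard itself can be shown with a plain generator and no scheduler at all. In this sketch the Interrupt class and the worker() generator are illustrative only; the throw() call stands in for what a scheduler would do when a timeout fires at the wrong moment.

```python
class Interrupt(BaseException):
    """Stand-in for the exception a scheduler throws on timeout."""


def worker(log):
    try:
        yield "working"
    finally:
        log.append("cleanup started")
        yield "cleaning"              # suspended *inside* the finally block
        log.append("cleanup finished")


log = []
gen = worker(log)
next(gen)                 # runs up to the first yield
next(gen)                 # leaves the try body, enters finally, suspends there

try:
    gen.throw(Interrupt())    # the "timeout" arrives during cleanup
except Interrupt:
    pass
```

After this runs, log holds only "cleanup started": the second half of the cleanup never executes, which is exactly what protecting the finally block is meant to prevent.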


From guido at  Tue Oct 30 02:02:43 2012
From: guido at (Guido van Rossum)
Date: Mon, 29 Oct 2012 18:02:43 -0700
Subject: [Python-ideas] non-blocking buffered I/O
In-Reply-To: <k6n5cn$477$>
References: <>
Message-ID: <>

On Mon, Oct 29, 2012 at 5:01 PM, Richard Oudkerk <shibturn at> wrote:
> On 29/10/2012 11:29pm, Guido van Rossum wrote:
>> I wonder if this could be done by varying the transports by platform?
>> Not too many people are going to write new transports -- there just
>> aren't that many options. And those that do might be doing something
>> platform-specific anyway. It shouldn't be that hard to come up with a
>> transport abstraction that lets protocol implementations work
>> regardless of whether it's a UNIX style transport or a Windows style
>> transport. UNIX systems with IOCP support could use those too.
> Yes, having separate implementations of the transport layer should work.
> But I think it would be cleaner to put all the platform specific stuff in
> the pollster, and make the pollster poll-for-completion rather than
> poll-for-readiness.  (Is this the "proactor pattern"?)  That seems to be the
> direction libevent has moved in.

Interesting. I'd like to hear what Twisted thinks of this. (I will
find out next week. :-)

--Guido van Rossum (

From Steve.Dower at  Tue Oct 30 02:40:53 2012
From: Steve.Dower at (Steve Dower)
Date: Tue, 30 Oct 2012 01:40:53 +0000
Subject: [Python-ideas] Async API: some more code to review
In-Reply-To: <>
References: <>
Message-ID: <>

To save people scrolling to get to the interesting parts, I'll lead with the links:

Detailed write-up:

Source code:

(Yes, I renamed my repo after the code name was selected. That would have been far too much of a coincidence.)

Practically all of the details are in the write-up linked first, so anything that's not is either something I didn't think of or something I decided is unimportant right now (for example, the optimal way to wait for ten thousand sockets simultaneously on every different platform).

There's a reimplemented Future class in the code which is not essential, but it is drastically simplified from concurrent.futures.Future (CFF). It can't be directly replaced by CFF, but only because CFF requires more state management that the rest of the implementation does not perform ("set_running_or_notify_cancel"). CFF also includes cancellation, for which I've proposed a different mechanism.

For the sake of a quick example, I've modified Guido's main.doit function ( to how it could be written with my proposal (apologies if I've butchered it, but I think it should behave the same):


def doit():
    TIMEOUT = 2
    cs = CancellationSource()

    tasks = set()

    task1 = urlfetch('localhost', 8080, path='/', cancel_source=cs)
    tasks.add(task1)

    task2 = urlfetch('', 8080, path='/home', cancel_source=cs)
    tasks.add(task2)

    task3 = urlfetch('', 80, path='/', cancel_source=cs)
    tasks.add(task3)

    task4 = urlfetch('', ssl=True, path='/', af=socket.AF_INET, cancel_source=cs)
    tasks.add(task4)

    ## for t in tasks: t.start()    # tasks start as soon as they are called - this function does not exist

    yield delay(0.2)                # I believe this is equivalent to scheduling.with_timeout(0.2, ...)?

    winners = [t.result() for t in tasks if t.done()]
    print('And the winners are:', [w for w in winners])

    results = []                    # This 'wait all' loop could easily be a helper function
    for t in tasks:                 # Unfortunately, [(yield t) for t in tasks] does not work :(
        results.append((yield t))
    print('And the players were:', [r for r in results])
    return results

This is untested code, and has a few differences. I don't have task names, so it will print the returned value from urlfetch (a tuple of (host, port, path, status, len(data), time_taken)). The cancellation approach is quite different, but IMO far more likely to avoid the finally-related issues discussed in other threads.

However, I want to emphasise that unless you are already familiar with this exact style, it is near impossible to guess exactly what is going on from this little sample. Please read the write-up before assuming what is or is not possible with this approach.



From at  Tue Oct 30 02:59:56 2012
From: at (Yury Selivanov)
Date: Mon, 29 Oct 2012 21:59:56 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

Guido, Greg,

On 2012-10-27, at 7:45 PM, Yury Selivanov < at> wrote:

> Right.  But now I'm not sure this approach will work with yield-froms.
> As when you yield-fromming scheduler knows nothing about the chain of 
> generators, as it's all hidden in the yield-from implementation.

I think I've come up with a solution that should work for yield-froms too
(if we accept my in_finally idea in 3.4).  And there should be a way
of writing a 'protect_finally' context manager too.

I'll illustrate the approach on Guido's tulip micro-framework
(consider it a pseudo code to illustrate the idea):

    class Interrupt(BaseException):
        """Should penetrate all try..excepts"""

    def call_with_timeout(timeout, gen):
        context.current_task._add_timeout(timeout, gen)
        try:
            return (yield from gen)
        except Interrupt:
            raise TimeoutError() from None

    class Task:
        def _add_timeout(self, timeout, gen):
            self.eventloop.call_later(
                timeout,
                partial(self._interrupt, gen))

        def _interrupt(self, gen):
            if not gen.in_finally:
                gen.throw(Interrupt, Interrupt(), None)
            else:
                # So we set a flag to watch for gen's in_finally value
                # on each 'step' call.  And when it's 0 - Task.step
                # will call '_interrupt' again.
                pass

I defined a new function 'call_with_timeout', because tulip's 'with_timeout'
starts a new Task, whereas the former works in any generator inside the task.

So, after that you'd be able to do the following:

     yield from call_with_timeout(1.0, something())

And something's 'finally' won't ever be aborted.


From Steve.Dower at  Tue Oct 30 03:06:37 2012
From: Steve.Dower at (Steve Dower)
Date: Tue, 30 Oct 2012 02:06:37 +0000
Subject: [Python-ideas] Async API: some more code to review
In-Reply-To: <>
References: <>,
Message-ID: <>

Possibly I should have selected a different code name, now I come to think of it, but we came up with such similar code that I don't think it'll stay separate for too long.


From guido at  Tue Oct 30 03:07:21 2012
From: guido at (Guido van Rossum)
Date: Mon, 29 Oct 2012 19:07:21 -0700
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 29, 2012 at 5:43 PM, Yury Selivanov < at> wrote:
> Finally got some time to do a review & read what others posted.


> Some comments are more general, some are more implementation-specific
> (hopefully you want to hear latter ones as well)


> And I'm still in the process of digesting your approach & code (as
> I've spent too much time with my implementation)...

Heh. :-)

> On 2012-10-28, at 7:52 PM, Guido van Rossum <guido at> wrote:
> [...]
> [...]
> 1. I'd make EventLoopMixin a separate entity from pollsters.  So that you'd
> be able to add many different pollsters to one EventLoop.  This way
> you can have specialized pollster for different types of IO, including
> UI etc.

I came to the same conclusion, so I fixed this. See the latest version.

(BTW, I also renamed add_reader() etc. on the Pollster class to
register_reader() etc. -- I dislike similar APIs on different classes
to have the same name if there's not a strict super class override

> 2. Sometimes, there is a need to run a coroutine in a threadpool.  I know it
> sounds weird, but it's probably worth exploring.

I think that can be done quite simply. Since each thread has its own
eventloop (via the magic of TLS), it's as simple as writing a function
that creates a task, starts it, and then runs the eventloop. There's
nothing else running in that particular thread, and its eventloop will
terminate when there's nothing left to do there -- i.e. when the task
is done. Sketch:

def some_generator(arg):
    ...stuff using yield from...
    return 42

def run_it_in_the_threadpool(arg):
    t = Task(some_generator(arg))
    t.start()
    scheduling.run()
    return t.result

# And in your code:
result = yield from scheduling.call_in_thread(run_it_in_the_threadpool, arg)

# Now result == 42.
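
For comparison, the same shape can be sketched with the asyncio module that later shipped in the standard library. This is not the tulip code under review; some_coroutine() and the argument 21 are placeholders chosen for the example.

```python
import asyncio


async def some_coroutine(arg):
    await asyncio.sleep(0)        # stand-in for real asynchronous work
    return arg * 2


def run_it_in_the_threadpool(arg):
    # asyncio.run() creates an event loop private to this worker thread,
    # runs the coroutine to completion, then closes the loop.
    return asyncio.run(some_coroutine(arg))


async def main():
    loop = asyncio.get_running_loop()
    # None selects the loop's default thread pool executor.
    return await loop.run_in_executor(None, run_it_in_the_threadpool, 21)


result = asyncio.run(main())
```

The worker thread's loop ends as soon as its single coroutine finishes, matching the "nothing left to do there" behaviour described above.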

> 3. In my framework each threadpool worker has its own local context, with
> various information like what Task run the operation etc.

I think I have this too -- Thread-Local Storage!

> And few small things:
> 4. epoll.poll and other syscalls need to be wrapped in try..except to catch
> and ignore (and log?) EINTR type of exceptions.

Good point.
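
A sketch of the kind of wrapper being asked for. poll_ignoring_eintr() is a hypothetical helper, and the two fake poll functions exist only to exercise it. Note that since Python 3.5 (PEP 475) the interpreter retries most interrupted system calls on its own, so a wrapper like this mainly matters on older versions.

```python
import errno


def poll_ignoring_eintr(poll, timeout):
    """Call poll(timeout), treating EINTR as 'no events this time'."""
    try:
        return poll(timeout)
    except InterruptedError:
        # A signal interrupted the system call; the event loop can
        # simply poll again on its next iteration.
        return []


def interrupted_poll(timeout):
    # Simulates a poll call that was interrupted by a signal.
    raise InterruptedError(errno.EINTR, "Interrupted system call")


def quiet_poll(timeout):
    # Simulates a normal return value: a list of (fd, event mask) pairs.
    return [(3, 1)]


events_after_signal = poll_ignoring_eintr(interrupted_poll, 1.0)
events_normal = poll_ignoring_eintr(quiet_poll, 1.0)
```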

> 5. For epoll you probably want to check/(log?) EPOLLHUP and EPOLLERR errors
> too.

Do you have a code sample? I haven't found a need yet.

> [...]
>> In the docstrings I use the prefix "COROUTINE:" to indicate public
>> APIs that should be invoked using yield from.
> [...]
> As others, I would definitely suggest adding a decorator to make
> coroutines more distinguishable.

That's definitely on my TODO list.

> It would be even better if we can return
> a tiny wrapper, that lets you to simply write '',
> instead of:
>     task = scheduling.Task(doit(), timeout=2.1)
>     task.start()

The run() call shouldn't be necessary unless you are at the toplevel.

> And avoid manual Task instantiation at all.

Hm. I want the generator function to return just a generator object,
and I can't add methods to that. But we can come up with a decent API.

> I also liked the simplicity of the Task class.  I think it'd be easy
> to mix greenlets in it by switching in a new greenlet on each 'step'.
> That will give you 'yield_()' function, which you can use in the same
> way you use 'yield' statement now (I'm not proposing to incorporate
> greenlets in the lib itself, but rather to provide an option to do so)
> Hence there should be a way to plug your own Task (sub-)class in.

Hm. Someone else will have to give that a try.

Thanks for your feedback!!

--Guido van Rossum (

From at  Tue Oct 30 03:18:25 2012
From: at (Yury Selivanov)
Date: Mon, 29 Oct 2012 22:18:25 -0400
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-29, at 10:07 PM, Guido van Rossum <guido at> wrote:
>> 5. For epoll you probably want to check/(log?) EPOLLHUP and EPOLLERR errors
>> too.
> Do you have a code sample? I haven't found a need yet.

Just a code dump from my epoll proactor:

    if ev & EPOLLHUP:

    if ev & EPOLLERR:
        sock.close(_error_cls=ConnectionError, _error_msg='socket error in epoll proactor')

>> It would be even better if we can return
>> a tiny wrapper, that lets you to simply write '',
>> instead of:
>>    task = scheduling.Task(doit(), timeout=2.1)
>>    task.start()
> The run() call shouldn't be necessary unless you are at the toplevel.

Yes, that's just a sugar to make top-level runs more appealing.
You'll also get a nice way of setting timeouts,

    yield from coro().with_timeout(1.0)

>> I also liked the simplicity of the Task class.  I think it'd be easy
>> to mix greenlets in it by switching in a new greenlet on each 'step'.
>> That will give you 'yield_()' function, which you can use in the same
>> way you use 'yield' statement now (I'm not proposing to incorporate
>> greenlets in the lib itself, but rather to provide an option to do so)
>> Hence there should be a way to plug your own Task (sub-)class in.
> Hm. Someone else will have to give that a try.

I'll be that someone once we choose the direction ;)  IMO the greenlets
integration is a very important topic.


From at  Tue Oct 30 03:29:59 2012
From: at (Yury Selivanov)
Date: Mon, 29 Oct 2012 22:29:59 -0400
Subject: [Python-ideas] Async API: some more code to review
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-29, at 9:40 PM, Steve Dower <Steve.Dower at> wrote:

> To save people scrolling to get to the interesting parts, I'll lead with the links:
> Detailed write-up:
> Source code:

Your design looks very similar to the framework I developed.
I'll try to review your code in detail tomorrow.

Couple of things I like already:

1) Use of 'yield from' is completely optional

2) @async decorator.  That makes coroutines more visible and
allows adding extra methods to them.

3) Tight control over coroutines execution, something that
is completely missing when you use yield-from.

I dislike the choice of name for 'async', though.  Since 
@async-decorated functions are going to be yielded most of the 
time (yield makes them "sync" in that context), I'd stick to 
plain @coroutine.

P.S. If this approach is viable (optional yield-from, required
@async-or-something decorator), I can invest some time and
open source the core of my framework (one benefit is that it
has lots and lots of unit-tests).


From at  Tue Oct 30 05:08:25 2012
From: at (Yury Selivanov)
Date: Tue, 30 Oct 2012 00:08:25 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-29, at 9:59 PM, Yury Selivanov < at> wrote:

> Guido, Greg,
> On 2012-10-27, at 7:45 PM, Yury Selivanov < at> wrote:
>> Right.  But now I'm not sure this approach will work with yield-froms.
>> As when you yield-fromming scheduler knows nothing about the chain of 
>> generators, as it's all hidden in the yield-from implementation.
> I think I've come up with a solution that should work for yield-froms too
> (if we accept my in_finally idea in 3.4).  And there should be a way
> of writing a 'protect_finally' context manager too.
> I'll illustrate the approach on Guido's tulip micro-framework
> (consider it a pseudo code to illustrate the idea):
>    class Interrupt(BaseException):
>        """Should penetrate all try..excepts"""
>    def call_with_timeout(timeout, gen):
>        context.current_task._add_timeout(timeout, gen)
>        try:
>            return (yield from gen)
>        except Interrupt:
>            raise TimeoutError() from None
>    class Task:
>        def _add_timeout(timeout, gen):
>            self.eventloop.call_later(
>                timeout,
>                partial(self._interrupt, gen))
>        def _interrupt(self, gen):
>            if not gen.in_finally:
>                gen.throw(Interrupt, Interrupt(), None)
>            else:
>                # So we set a flag to watch for gen's in_finally value
>                # on each 'step' call.  And when it's 0 - Task.step
>                # will call '_interrupt' again.
>                self._watch_finally(gen)
> I defined a new function 'call_with_timeout', because tulip's 'with_timeout'
> starts a new Task, whereas the former works in any generator inside the task.
> So, after that you'd be able to do the following:
>     yield from call_with_timeout(1.0, something())
> And something's 'finally' won't ever be aborted.

Ah, the solution is wrong, I've tricked myself.

The right code would be something like this:

   class Interrupt(BaseException):
       """Should penetrate all try..excepts"""

   def call_with_timeout(timeout, gen):
       context.current_task._add_timeout(timeout, gen)
       try:
           return (yield from gen)
       except Interrupt:
           raise TimeoutError() from None

   class Task:
       def _add_timeout(self, timeout, gen):
           # XXX The following line is the key.  We need a reference
           # to the generator object that is yield-fromming our 'gen'
           # ('caller' for 'gen')
           current_yield_from = self.gen.yield_from
           self.eventloop.call_later(
               timeout,
               partial(self._interrupt, gen, current_yield_from))

       def _interrupt(self, gen, yf):
           if not yf.in_finally:
               # If gen's caller is not in its finally block - it's
               # safe for us to interrupt gen.
               gen.throw(Interrupt, Interrupt(), None)
           else:
               # So we set a flag to watch for yf's in_finally value
               # on each 'step' call.  And when it's 0 - Task.step
               # will call '_interrupt' again.
               self._watch_finally(yf, gen)

IOW, besides just 'in_finally', we also need to add 'yield_from' property
to generator object.  The latter will hold a reference to the sub-generator
that current generator is yielding from.
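
For what it's worth, CPython 3.5 and later expose part of this as the
read-only generator attribute gi_yieldfrom; there is no in_finally
counterpart. A minimal sketch of walking the delegation chain with it
(yield_from_chain and the three toy generators are made up for
illustration):

```python
import inspect


def yield_from_chain(gen):
    """Return [gen, sub-generator, sub-sub-generator, ...] via gi_yieldfrom."""
    chain = [gen]
    # gi_yieldfrom is None unless the generator is suspended in a 'yield from'.
    while inspect.isgenerator(chain[-1]) and chain[-1].gi_yieldfrom is not None:
        chain.append(chain[-1].gi_yieldfrom)
    return chain


def inner():
    yield "inner"


def middle():
    yield from inner()


def outer():
    yield from middle()


g = outer()
next(g)  # run until inner() suspends at its plain 'yield'
names = [c.__name__ for c in yield_from_chain(g)]
```

A scheduler holding only the outermost generator could use such a walk to
find the frame that is actually suspended.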

The logic is pretty twisted, but I'm sure that the problem is solvable.

P.S. I'm not proposing to add anything.  It's more about finding *any* way
to actually solve the problem correctly.  Once we find that way, we can *maybe*
start thinking about language support for it.


From guido at  Tue Oct 30 05:25:21 2012
From: guido at (Guido van Rossum)
Date: Mon, 29 Oct 2012 21:25:21 -0700
Subject: [Python-ideas] Async API: some more code to review
In-Reply-To: <>
References: <>
Message-ID: <>

On Monday, October 29, 2012, Steve Dower wrote:

>  Possibly I should have selected a different code name, now I come to
> think of it, but we came up with such similar code that I don't think it'll
> stay separate for too long.

Hm, yes, this felt weird. I figured the code names would be useful to
reference the proposals when comparing them, not as the ultimate eventual
project name once it's been PEP-ified and put in the stdlib.

 Maybe you can call yours "wattle"? That's a Pythonic plant name. :-)

(Sorry, still reading through your docs and code, it's too early for more
substantial feedback.)


--Guido van Rossum (

From Steve.Dower at  Tue Oct 30 05:38:28 2012
From: Steve.Dower at (Steve Dower)
Date: Tue, 30 Oct 2012 04:38:28 +0000
Subject: [Python-ideas] Async API: some more code to review
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:
> On Monday, October 29, 2012, Steve Dower wrote:
>> Possibly I should have selected a different code name,
>> now I come to think of it, but we came up with such
>> similar code that I don't think it'll stay separate for too long.
> Hm, yes, this felt weird. I figured the code names would be
> useful to reference the proposals when comparing them, not
> as the ultimate eventual project name once it's been PEP-ified
> and put in the stdlib.
> Maybe you can call yours "wattle"? That's a Pythonic plant name. :-)

Nice idea. I renamed it and (hopefully) made it so the original links still work.

I was never expecting the name to last, I just figured you had to make something up to create a project. Eventually it will all just become a boring PEP-xxx number...


From greg.ewing at  Tue Oct 30 06:10:28 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 30 Oct 2012 18:10:28 +1300
Subject: [Python-ideas] non-blocking buffered I/O
In-Reply-To: <>
References: <>
Message-ID: <>

Steve Dower wrote:

> From my point of view, IOCP fits in very well provided the callbacks (which will
> run in the IOCP thread pool) are only used to unblock tasks.

Is it really necessary to have a separate thread just to handle
unblocking tasks? That thread will have very little to do, so
it could just as well run the tasks too, couldn't it?


From greg.ewing at  Tue Oct 30 06:20:13 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 30 Oct 2012 18:20:13 +1300
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

Steve Dower wrote:

>>I believe it will be possible to provide a scheduler in the stdlib that will be satisfactory
>>for the vast majority of applications.
> I agree, and I chose my words poorly for that point: "library/framework
> developers" is more accurate than "end user".

I don't think that even library developers should need to write
their own scheduler very often.

> And since I expect every GUI
> framework is going to need (or at least want) their own scheduler,

I don't agree with that. They might need their own event loop,
but I haven't seen any reason so far to think they would need
their own coroutine scheduler.

Remember that Guido wants to keep the event loop stuff and the
scheduler stuff very clearly separated. The scheduler will all
be pure Python and should be usable with just about any
event loop.


From greg.ewing at  Tue Oct 30 06:27:34 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 30 Oct 2012 18:27:34 +1300
Subject: [Python-ideas] non-blocking buffered I/O
In-Reply-To: <>
References: <>
Message-ID: <>

Steve Dower wrote:
> For example, (library) code that needs
> a socket to be ready can ask the current scheduler if it can do "select([sock],
> [], [])", 

I think you're mixing up the scheduler and event loop layers
here. If the scheduler is involved in this at all, it would only
be to pass the request on to the event loop.


From greg.ewing at  Tue Oct 30 06:36:10 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 30 Oct 2012 18:36:10 +1300
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

Yury Selivanov wrote:
> It would be even better if we can return
> a tiny wrapper, that lets you to simply write '', 
> instead of:
>     task = scheduling.Task(doit(), timeout=2.1)
>     task.start()

I would prefer spelling this something like

    scheduling.spawn(doit(), timeout=2.1)

A newly spawned task should be scheduled automatically; if
you're not ready for it to run yet, then don't spawn it
until you are.

Also, it should almost *never* be necessary to call That should happen only in a very few
places, mostly buried deep inside the scheduling/event
loop system.


From greg.ewing at  Tue Oct 30 06:53:09 2012
From: greg.ewing at (Greg Ewing)
Date: Tue, 30 Oct 2012 18:53:09 +1300
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

Yury Selivanov wrote:

> So in your example the scheduler would never even face the question of
> interrupting c2(), because it wasn't called with any restriction/timeout.
> There is simply no reason to interrupt it ever.

But there's nothing to stop someone writing

    def c3():
       try:
          yield from with_timeout(10.0, c1())
       except TimeoutError:
          print("That's cool, I can cope with that")

Also, it's not just TimeoutErrors that are a potential
problem, it's any asynchronous exception. For example,
the task calling c1() might get cancelled by another
task while c2() is blocked. If cancelling is implemented
by throwing in an exception, you have the same problem.

> Then you need the scheduler to know if it is in its finally or not.  Because it's
> c2() which was run with a timeout.  It's c2() code that may be subject to
> aborting.

I'm really not following your reasoning here. You seem to
be speaking as if with_timeout() calls only have an effect
one level deep. But that's not the case -- the frame that a
TimeoutError gets thrown into by with_timeout() can be
nested any number of yield-from calls deep.
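
A small self-contained demonstration of that point, using plain generators
and gen.throw() in place of a real with_timeout(). The bodies of c1, c2 and
c3 here are stand-ins invented for the illustration:

```python
events = []


def c2():
    try:
        yield "waiting"          # the innermost frame blocks here
    finally:
        events.append("c2 finally")


def c1():
    try:
        yield from c2()
    finally:
        events.append("c1 finally")


def c3():
    try:
        yield from c1()
    except TimeoutError:
        events.append("c3 caught")
        return "coped"


g = c3()
next(g)                          # suspend two yield-from levels down, in c2()
try:
    g.throw(TimeoutError())      # thrown at the top, raised inside c2()
except StopIteration as stop:
    outcome = stop.value
```

The exception is raised at the yield inside c2(), unwinds through both
finally blocks, and is only then caught in c3().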


From _ at  Tue Oct 30 11:12:17 2012
From: _ at (Laurens Van Houtven)
Date: Tue, 30 Oct 2012 11:12:17 +0100
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>


I've been following the PEP380-related threads and I've reviewed this
stuff, while trying to do the protocols/transports PEP, and trying to glue
the two together.

The biggest difference I can see is that protocols as they've been
discussed are "pull": they get called when some data arrives. They don't
know how much data there is; they just get told "here's some data". The
obvious difference with the API in, eg:

... is that now I have to tell a socket to read n bytes, which "blocks" the
coroutine, then I get some data.

Now, there doesn't have to be an issue; you could simply say:

data = yield from s.recv(4096) # that's the magic number usually right

It seems a bit boilerplatey, but I suppose that eventually could be hidden
away.

But this style is pervasive, for example that's how reading by lines works:

While I'm not a big fan (I may be convinced if I see a protocol test that
looks nice); I'm just wondering if there's any point in trying to write the
pull-style protocols when this works quite differently.

Additionally, I'm not sure if readline belongs on the socket. I understand
the simile with files, though. With the coroutine style I could see how the
most obvious fit would be something like tornado's read_until, or an
as_lines that essentially calls read_until repeatedly. Can the delimiter
for this be modified?
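
As a sketch of what a configurable delimiter could look like on the
protocol side (LineSplitter is a hypothetical name, not part of any of the
proposals under review):

```python
class LineSplitter:
    """Callback-style protocol that buffers chunks and emits complete lines."""

    def __init__(self, delimiter=b"\n"):
        self.delimiter = delimiter
        self._buffer = b""
        self.lines = []

    def data_received(self, data):
        # Whatever follows the last delimiter stays buffered for the next chunk.
        self._buffer += data
        *complete, self._buffer = self._buffer.split(self.delimiter)
        self.lines.extend(complete)


proto = LineSplitter(delimiter=b"\r\n")
proto.data_received(b"GET / HT")
proto.data_received(b"TP/1.1\r\nHost: x\r\n\r")
```

The protocol never asks for n bytes; it just copes with whatever chunk
sizes it is handed.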

My main syntactic gripe is that when I write @inlineCallbacks code or
monocle code or whatever, when I say "yield" I'm yielding to the reactor.
That makes sense to me (I realize natural language arguments don't always
make sense in a programming language context). "yield from" less so (but
okay, that's what it has to look like). But this just seems weird to me:

yield from trans.send(line.upper())

Not only do I not understand why I'm yielding there in the first place (I
don't have to wait for anything, I just want to push some data out!), it
feels like all of my yields have been replaced with yield froms for no
obvious reason (well, there are reasons, I'm just trying to look at this
naively).

I guess Twisted gets away with this because of deferred chaining: that one
deferred might have tons of callbacks in the background, many of which are
also doing IO operations, resulting in a sequence of asynchronous operations
that only at the end cause the generator to be run some more.

I guess that belongs in a different thread, though. Even then, I'm not
sure if I'm uncomfortable because I'm seeing something different from what
I'm used to, or if my argument from English actually makes any sense
whatsoever.

Speaking of protocol tests, what would those look like? How do I yell, say,
"POST /blah HTTP/1.1\r\n" from a transport? Presumably I'd have a mock
transport, and call the handler with that? (I realize it's early days to be
thinking that far ahead; I'm just trying to figure out how I can contribute
a good protocol definition to all of this).
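
One possible shape for such a test, as a sketch only. MockTransport,
handler() and run() are hypothetical, and the transport's recv/send are
written as generators merely so that 'yield from' works on them:

```python
class MockTransport:
    """Test double: replays canned input and records everything sent."""

    def __init__(self, incoming):
        self.incoming = list(incoming)
        self.sent = []

    def recv(self, n):
        # n is ignored here; a real transport would honour it.
        return self.incoming.pop(0) if self.incoming else b""
        yield  # unreachable; only makes recv() a generator function

    def send(self, data):
        self.sent.append(data)
        return len(data)
        yield  # unreachable; only makes send() a generator function


def handler(trans):
    """Coroutine-style handler under test: upper-case echo."""
    while True:
        data = yield from trans.recv(4096)
        if not data:
            break
        yield from trans.send(data.upper())


def run(gen):
    """Drive a generator to completion and return its result."""
    try:
        while True:
            next(gen)
    except StopIteration as stop:
        return stop.value


trans = MockTransport([b"POST /blah HTTP/1.1\r\n"])
run(handler(trans))
```

The test then only needs to inspect trans.sent; no sockets or event loop
are involved.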


From solipsis at  Tue Oct 30 12:36:41 2012
From: solipsis at (Antoine Pitrou)
Date: Tue, 30 Oct 2012 12:36:41 +0100
Subject: [Python-ideas] non-blocking buffered I/O
References: <>
Message-ID: <20121030123641.23224db2@cosmocat>

On Tue, 30 Oct 2012 18:10:28 +1300,
Greg Ewing <greg.ewing at> wrote:
> Steve Dower wrote:
> > From my point of view, IOCP fits in very well provided the
> > callbacks (which will run in the IOCP thread pool) are only used to
> > unblock tasks.
> Is it really necessary to have a separate thread just to handle
> unblocking tasks? That thread will have very little to do, so
> it could just as well run the tasks too, couldn't it?

The IOCP thread pool is managed by Windows, not you.



From jkbbwr at  Tue Oct 30 14:10:53 2012
From: jkbbwr at (Jakob Bowyer)
Date: Tue, 30 Oct 2012 13:10:53 +0000
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

Sorry to chime in, but would this be a case where there could be the
syntax `yield to` ?

On Tue, Oct 30, 2012 at 10:12 AM, Laurens Van Houtven <_ at> wrote:
> Hi,
> I've been following the PEP380-related threads and I've reviewed this stuff,
> while trying to do the protocols/transports PEP, and trying to glue the two
> together.
> The biggest difference I can see is that protocols as they've been discussed
> are "pull": they get called when some data arrives. They don't know how much
> data there is; they just get told "here's some data". The obvious difference
> with the API in, eg:
> ... is that now I have to tell a socket to read n bytes, which "blocks" the
> coroutine, then I get some data.
> Now, there doesn't have to be an issue; you could simply say:
> data = yield from s.recv(4096) # that's the magic number usually right
> proto.data_received(4096)
> It seems a bit boilerplatey, but I suppose that eventually could be hidden
> away.
> But this style is pervasive, for example that's how reading by lines works:
> While I'm not a big fan (I may be convinced if I see a protocol test that
> looks nice); I'm just wondering if there's any point in trying to write the
> pull-style protocols when this works quite differently.
> Additionally, I'm not sure if readline belongs on the socket. I understand
> the simile with files, though. With the coroutine style I could see how the
> most obvious fit would be something like tornado's read_until, or an
> as_lines that essentially calls read_until repeatedly. Can the delimiter for
> this be modified?
> My main syntactic gripe is that when I write @inlineCallbacks code or
> monocle code or whatever, when I say "yield" I'm yielding to the reactor.
> That makes sense to me (I realize natural language arguments don't always
> make sense in a programming language context). "yield from" less so (but
> okay, that's what it has to look like). But this just seems weird to me:
> yield from trans.send(line.upper())
> Not only do I not understand why I'm yielding there in the first place (I
> don't have to wait for anything, I just want to push some data out!), it
> feels like all of my yields have been replaced with yield froms for no
> obvious reason (well, there are reasons, I'm just trying to look at this
> naively).
> I guess Twisted gets away with this because of deferred chaining: that one
> deferred might have tons of callbacks in the background, many of which also
> doing IO operations, resulting in a sequence of asynchronous operations that
> only at the end cause the generator to be run some more.
> I guess that belongs in a different thread, though. Even, then, I'm not sure
> if I'm uncomfortable because I'm seeing something different from what I'm
> used to, or if my argument from English actually makes any sense whatsoever.
> Speaking of protocol tests, what would those look like? How do I yell, say,
> "POST /blah HTTP/1.1\r\n" from a transport? Presumably I'd have a mock
> transport, and call the handler with that? (I realize it's early days to be
> thinking that far ahead; I'm just trying to figure out how I can contribute
> a good protocol definition to all of this).
> cheers
> lvh

From guido at  Tue Oct 30 15:02:54 2012
From: guido at (Guido van Rossum)
Date: Tue, 30 Oct 2012 07:02:54 -0700
Subject: [Python-ideas] Async API: some more code to review
In-Reply-To: <>
References: <>
Message-ID: <>


I don't want to beat around the bush, I think your approach is too
slow. In many situations I would be guilty of premature optimization
saying this, but (a) the whole *point* of async I/O is to be
blindingly fast (the C10K problem), and (b) the time difference is
rather marked.

I wrote a simple program for each version (attached) that times a
simple double-recursive function, where each recursive level uses

With a depth of 20, wattle takes about 24 seconds on my MacBook Pro.
And the same problem in tulip takes 0.7 seconds! That's close to two
orders of magnitude. Now, this demo is obviously geared towards
showing the pure overhead of the "one future per level" approach
compared to "pure yield from". But that's what you're proposing. And I
think allowing the user to mix yield and yield from is just too risky.
(I got rid of block_r/w() + bare yield as a public API from tulip --
that API is now wrapped up in a generator too. And I can do that
without feeling guilty knowing that an extra level of generators costs
me almost nothing.)
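
The attachments are not preserved in this archive. A minimal sketch of what
the "pure yield from" side of such a double-recursive test could look like,
assuming a trivial driver in place of tulip's scheduler (run() is
illustrative, and this is not the original benchmark):

```python
def binary(n):
    """Double recursion where every level delegates with 'yield from'."""
    if n <= 0:
        return 1
    l = yield from binary(n - 1)
    r = yield from binary(n - 1)
    return l + 1 + r


def run(gen):
    """Stand-in for a scheduler: resume the generator until it returns."""
    try:
        while True:
            next(gen)
    except StopIteration as stop:
        return stop.value


nodes = run(binary(3))  # counts the nodes of a full binary tree of depth 3
```

Wrapping the run(binary(depth)) call in time.time() measurements gives the
per-level overhead of delegation alone, since nothing here ever blocks.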

Debugging experience: I made the same mistake in each program (I guess
I copied it over before fixing the bug :-), which caused an
AttributeError to happen at the time.time() call. In both frameworks
this was baffling, because it caused the program to exit immediately
without any output. So on this count we're even. :-)

I have to think more about what I'd like to borrow from wattle -- I
agree that it's nice to mark up async functions with a decorator (it
just shouldn't affect call speed), I like being able to start a task
with a single call. Probably more, but my family is calling me to get
out of bed. :-)

--Guido van Rossum (
-------------- next part --------------
A non-text attachment was scrubbed...
Type: application/octet-stream
Size: 624 bytes
Desc: not available
URL: <>
-------------- next part --------------
A non-text attachment was scrubbed...
Type: application/octet-stream
Size: 572 bytes
Desc: not available
URL: <>

From guido at  Tue Oct 30 15:52:51 2012
From: guido at (Guido van Rossum)
Date: Tue, 30 Oct 2012 07:52:51 -0700
Subject: [Python-ideas] Async API: some more code to review
In-Reply-To: <>
References: <>
Message-ID: <>

On Mon, Oct 29, 2012 at 7:29 PM, Yury Selivanov < at> wrote:
> Couple of things I like already:
> 1) Use of 'yield from' is completely optional

That's actually my biggest gripe...

> 2) @async decorator.  That makes coroutines more visible and
> allows to add extra methods to them.

Yes on marking them more visibly. No on wrapping each call into an
object that slows down the invocation.

> 3) Tight control over coroutines execution, something that
> is completely missing when you use yield-from.

This I don't understand. What do you mean by "tight control"? And why
would you want it?

> I dislike the choice of name for 'async', though.  Since
> @async-decorated functions are going to be yielded most of the
> time (yield makes them "sync" in that context), I'd stick to
> plain @coroutine.

Hm. I think of it this way: the "async" (or whatever) function *is*
asynchronous, and just calling it does *not* block. However if you
then *yield* (or in my tulip proposal *yield from*) it, that suspends
the current task until the async function completes, giving the
*illusion* of synchronicity or blocking. (I have to admit I was
confused by a comment in Steve's example code saying "does not block"
on a line containing a yield, where I have been used to think of such
lines as blocking.)

> P.S. If this approach is viable (optional yield-from, required
> @async-or-something decorator), I can invest some time and
> open source the core of my framework (one benefit is that it
> has lots and lots of unit-tests).

Just open-sourcing the tests would already be useful!!

--Guido van Rossum (

From Steve.Dower at  Tue Oct 30 16:57:39 2012
From: Steve.Dower at (Steve Dower)
Date: Tue, 30 Oct 2012 15:57:39 +0000
Subject: [Python-ideas] Async API: some more code to review
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:
> I don't want to beat around the bush, I think your approach is too slow. In many
> situations I would be guilty of premature optimization saying this, but (a) the
> whole *point* of async I/O is to be blindingly fast (the C10K problem), and (b)
> the time difference is rather marked.
> I wrote a simple program for each version (attached) that times a simple
> double-recursive function, where each recursive level uses yield.
> With a depth of 20, wattle takes about 24 seconds on my MacBook Pro.
> And the same problem in tulip takes 0.7 seconds! That's close to two orders of
> magnitude. Now, this demo is obviously geared towards showing the pure overhead
> of the "one future per level" approach compared to "pure yield from". But that's
> what you're proposing.

I get similar results on my machine with those benchmarks, though the difference
was not so significant with my own (100 connections x 100 messages to - I included The only time there was more
than about 5% difference was when the 'yield from' case was behaving completely
differently (each connection's routine was not interleaving with the others - my
own bug, which I fixed).

Choice of scheduler makes a difference as well. Using my
UnthreadedSocketScheduler() instead of SingleThreadedScheduler() halves the time
taken, and just using "main(depth).result()" reduces that by about 10% again. It
still is not directly comparable to tulip, but there are ways to make them
equivalent (discussed below).

> And I think allowing the user to mix yield and yield from is just too risky.

The errors involved when you get yield and yield from confused are quite clear
in this case. However, if you use 'yield' instead of 'yield from' in tulip, you
simply don't ever run that function. Maybe this will give you an error further
down the track, but it won't be as immediate.
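
A small illustration of that failure mode with plain generators and a
deliberately naive driver. The names child, parent_wrong, parent_right and
drive are invented for the example, and the driver is not tulip's
scheduler:

```python
ran = []


def child():
    ran.append("child ran")
    return "done"
    yield  # unreachable; only here to make child() a generator function


def parent_wrong():
    result = yield child()       # hands the generator object to the driver
    return result


def parent_right():
    result = yield from child()  # actually runs child() to completion
    return result


def drive(gen):
    """Naive driver: resume with None until the generator finishes."""
    yielded = []
    try:
        while True:
            yielded.append(next(gen))
    except StopIteration as stop:
        return stop.value, yielded
```

With parent_wrong() the body of child() never executes and the caller
quietly gets None back; with parent_right() it runs and returns "done".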

On the other hand, if you're really after extreme performance (*cough*use
C*cough* :) ) we can easily add an "__unwrapped__" attribute to @async that
provides access to the internal generator, which you can then 'yield from' from:

@async
def binary(n):
    if n <= 0:
        return 1
    l = yield from binary.__unwrapped__(n-1)
    r = yield from binary.__unwrapped__(n-1)
    return l + 1 + r

With this change the performance is within 5% of tulip (most times are up to 5%
slower, but some are faster - I'd say margin of error), regardless of the
scheduler. (I've no doubt this could be improved further by modifying _Awaiter
and Future to reduce the amount of memory allocations, and a super optimized
library could use C implementations that still fit the API and work with
existing code.)
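
A rough sketch of how a decorator could expose such an attribute. The
decorator is spelled async_ here because 'async' is a reserved word in
current Python, and the wrapper simply runs the generator to completion
instead of returning a Future, so this is not wattle's implementation:

```python
def async_(func):
    """Toy decorator: calling the function drives its generator to the end."""

    def wrapper(*args, **kwargs):
        gen = func(*args, **kwargs)
        try:
            while True:
                next(gen)
        except StopIteration as stop:
            return stop.value

    # Expose the undecorated generator function for direct 'yield from'.
    wrapper.__unwrapped__ = func
    return wrapper


@async_
def binary(n):
    if n <= 0:
        return 1
    l = yield from binary.__unwrapped__(n - 1)
    r = yield from binary.__unwrapped__(n - 1)
    return l + 1 + r


total = binary(3)
```

Only the outermost call pays for the wrapper; every recursive level goes
straight to the underlying generator.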

I much prefer treating 'yield from __unwrapped__' as an advanced case, so I'm
all for providing ways to optimize async code where necessary, but when I think
about how I'd teach this to a class of undergraduates I'd much rather have the
simpler @async/yield rule (which doesn't even require an understanding of
generators). For me, "get it to work" and "get it to work, fast" come well
before "get it to work fast".

> (I got rid of block_r/w() + bare yield as a public API from tulip -- that API is
> now wrapped up in a generator too. And I can do that without feeling guilty
> knowing that an extra level of generators costs me almost nothing.)

I don't feel particularly guilty about the extra level... if the operations
you're blocking on are that much quicker than the overhead then you probably
don't need to block. I'm pretty certain that even with multiple network cards
you'll still suffer from bus contention before suffering from generator

> Debugging experience: I made the same mistake in each program (I guess I copied
> it over before fixing the bug :-), which caused an AttributeError to happen at
> the time.time() call. In both frameworks this was baffling, because it caused
> the program to exit immediately without any output. So on this count we're even.
> :-)

This is my traceback once I misspell time():

Traceback (most recent call last):
  File "", line 27, in <module>
    SingleThreadScheduler().run(main, depth=depth)
  File "", line 106, in run
    raise self._exit_exception
  File "", line 171, in _step
    next_future = self.generator.send(result)
  File "", line 22, in main
    t1 = time.tme()
AttributeError: 'module' object has no attribute 'tme'

Of course, if you do call an @async function and don't yield (or call result())
then you won't ever see an exception. I don't think there's any nice way to
propagate these automatically (except maybe through a finalizer... not so keen
on that). You can do 'op.add_done_callback(Future.result)' to force the error to
be raised somewhere (or better yet, pass it to a logger - this is why we allow
multiple callbacks, after all).
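
A sketch of the callback variant, using the standard library's
concurrent.futures.Future as a stand-in for wattle's Future class
(record_failure and the failures list are hypothetical; a real program
would hand the exception to a logger instead):

```python
from concurrent.futures import Future

failures = []


def record_failure(future):
    """Done-callback that surfaces exceptions nobody asked for via result()."""
    if future.cancelled():
        return
    exc = future.exception()
    if exc is not None:
        failures.append(exc)


op = Future()
op.add_done_callback(record_failure)
op.set_exception(ValueError("boom"))  # simulate the operation failing
```

Note that concurrent.futures logs and ignores exceptions raised inside a
done-callback, so with that class recording the error is more useful than
re-raising it.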

> I have to think more about what I'd like to borrow from wattle -- I agree that
> it's nice to mark up async functions with a decorator (it just shouldn't affect
> call speed), I like being able to start a task with a single call. 

You'll probably find (as I did in my early work) that starting the task in the
initial call doesn't work with yield from. Because it does the first next()
call, you can't send results/exceptions back in. If all the yields (at the
deepest level) are blank, this might be okay, but it caused me issues when I was
yielding objects to wait for.

I'm also interested in your thoughts on get_future_for(), since that seems to be
one of the more unorthodox ideas of wattle. I can clearly see how it works, but
I have no idea whether I've expressed it well in the description.


From at  Tue Oct 30 17:08:39 2012
From: at (Yury Selivanov)
Date: Tue, 30 Oct 2012 12:08:39 -0400
Subject: [Python-ideas] Async API: some more code to review
In-Reply-To: <>
References: <>
Message-ID: <>


Well, with such jaw-dropping benchmark results there is no point
in discussing whether it's better to use yield-froms or yields+promises.

But let me also share results of my framework:

- Plain coroutines - 24.4
- Coroutines + greenlets - 34.5
- Coroutines + greenlets + many cython optimizations: 4.79 (still too slow)

Now with dynamically replacing (opcodes magic) 'yield' with 'yield_' to 
entirely avoid generators and some other optimizations I believe it's 
possible to speed it up even further, probably to times below 1 second.

But, again, the price of not using yield-froms is too high (and I haven't even
mentioned the hard-to-fix tracebacks you get when you use just yields).

On 2012-10-30, at 10:52 AM, Guido van Rossum <guido at> wrote:

> On Mon, Oct 29, 2012 at 7:29 PM, Yury Selivanov < at> wrote:
>> Couple of things I like already:
>> 1) Use of 'yield from' is completely optional
> That's actually my biggest gripe...

Yes, let's use just one thing everywhere.

>> 2) @async decorator.  That makes coroutines more visible and
>> allows to add extra methods to them.
> Yes on marking them more visibly. No on wrapping each call into an
> object that slows down the invocation.
>> 3) Tight control over coroutines execution, something that
>> is completely missing when you use yield-from.
> This I don't understand. What do you mean by "tight control"? And why
> would you want it?

Actually, if we make decorating coroutines with @coro-like decorator
strongly recommended (or even required) I can get that tight-control

It gives you the following:

- Breakdown profiling results by individual coroutines
- Blocking code detection
- Hacks to protect finally statements, modify your coroutines'
internals, etc / probably I'm the only one in the world who needs this :(
- Better debugging (just logging individual coroutines sometimes helps)

And a decorator makes the code more future-proof as well.  Who knows what
kind of instrumentation you'll need later.

>> I dislike the choice of name for 'async', though.  Since
>> @async-decorated functions are going to be yielded most of the
>> time (yield makes them "sync" in that context), I'd stick to
>> plain @coroutine.
> Hm. I think of it this way: the "async" (or whatever) function *is*
> asynchronous, and just calling it does *not* block. However if you
> then *yield* (or in my tulip proposal *yield from*) it, that suspends
> the current task until the async function completes, giving the
> *illusion* of synchronicity or blocking. (I have to admit I was
> confused by a comment in Steve's example code saying "does not block"
> on a line containing a yield, where I have been used to think of such
> lines as blocking.)

"*illusion* of synchronicity or blocking" -- that's precisely the reason
I don't like '@async' used together with yields.

>> P.S. If this approach is viable (optional yield-from, required
>> @async-or-something decorator), I can invest some time and
>> open source the core of my framework (one benefit is that it
>> has lots and lots of unit-tests).
> Just open-sourcing the tests would already be useful!!

When tulip is ready I'll simply start integrating them.


From kristjan at  Tue Oct 30 17:05:45 2012
From: kristjan at (Kristján Valur Jónsson)
Date: Tue, 30 Oct 2012 16:05:45 +0000
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

> -----Original Message-----
> From: Python-ideas [mailto:python-ideas-
> at] On Behalf Of Guido van
> Rossum
> Sent: 29 October 2012 16:35
> To: Richard Oudkerk
> Cc: python-ideas at
> Subject: Re: [Python-ideas] Async API: some code to review
> > It is a common pattern to have multiple threads/processes trying to
> > accept connections on a single listening socket, so it would be
> > unfortunate to disallow that.
> Ah, but that will work -- each thread has its own pollster, event loop and
> scheduler and collection of tasks. And listening on a socket is a pretty special
> case anyway -- I imagine we'd build a special API just for that purpose.

I don't think he meant actual "threads" but rather threads in the context of coroutines.
In StacklessIO (our custom sockets lib for Stackless), multiple tasklets can have an "accept" pending on a socket, so that when multiple connections arrive, wakeup time is minimal.

We have also been careful to allow multiple operations on sockets, from different tasklets, although the same caveats apply as when multiple threads perform operations, i.e. no guarantees about it making any sense.  The important bit is that when such things happen, you get some defined result, rather than for example a tasklet being infinitely blocked.  Such errors are surprising and hard to debug.


From Steve.Dower at  Tue Oct 30 17:27:37 2012
From: Steve.Dower at (Steve Dower)
Date: Tue, 30 Oct 2012 16:27:37 +0000
Subject: [Python-ideas] non-blocking buffered I/O
In-Reply-To: <>
References: <>
Message-ID: <>

Greg Ewing wrote:
> Steve Dower wrote:
>> For example, (library) code that needs a socket to be ready can ask 
>> the current scheduler if it can do "select([sock], [], [])",
> I think you're mixing up the scheduler and event loop layers here. If the scheduler
> is involved in this at all, it would only be to pass the request on to the event loop.

Could you clarify for me what goes into each layer? I've been treating "scheduler" and "event loop" as more-or-less synonyms (I see an event loop as one possible implementation of a scheduler).

If you consider the scheduler to be the part that calls __next__() on the generator and sets up callbacks, that is implemented in my _Awaiter class, and should never need to be touched.
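
A minimal sketch of that split, under stated assumptions: the names Future and run_task below are hypothetical and are not wattle's actual _Awaiter code. The stepping function only advances the generator and registers a callback on whatever future it yields; anything (I/O or otherwise) resumes the task later by calling set_result().

```python
class Future:
    """Hypothetical stand-in for a future; not wattle's real class."""

    def __init__(self):
        self._callbacks = []
        self._done = False
        self.result = None

    def add_done_callback(self, fn):
        # Call immediately if the result is already known.
        if self._done:
            fn(self)
        else:
            self._callbacks.append(fn)

    def set_result(self, value):
        self.result = value
        self._done = True
        for fn in self._callbacks:
            fn(self)


def run_task(gen, value=None):
    """Step the generator once; resume it when the yielded future completes."""
    try:
        future = gen.send(value)
    except StopIteration:
        return
    future.add_done_callback(lambda f: run_task(gen, f.result))


log = []
pending = Future()


def task():
    data = yield pending      # suspend until someone calls set_result()
    log.append(data)


run_task(task())              # runs up to the yield, then returns
pending.set_result("payload") # e.g. called when an I/O operation finishes
```

Nothing in run_task knows whether the future is completed by a socket, a timer or another task, which is the point being made about not treating I/O specially.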

Possibly the difference in terminology comes out because I'm not treating I/O specially? As far as wattle is concerned, I/O is just another operation that will eventually call Future.set_result(). I've tried to capture this in my write-up:


From Steve.Dower at  Tue Oct 30 17:32:19 2012
From: Steve.Dower at (Steve Dower)
Date: Tue, 30 Oct 2012 16:32:19 +0000
Subject: [Python-ideas] non-blocking buffered I/O
In-Reply-To: <>
References: <>
Message-ID: <>

Greg Ewing wrote:
> Steve Dower wrote:
>> From my point of view, IOCP fits in very well provided the callbacks 
>> (which will run in the IOCP thread pool) are only used to unblock tasks.
> Is it really necessary to have a separate thread just to handle unblocking tasks?
> That thread will have very little to do, so it could just as well run the tasks too,
> couldn't it?

In the C10k problem (which seems to keep coming up as our "goal") that thread will have a lot to do.

I would expect that most actual users of this API could keep running on that thread without issue, but since it is OS managed and belongs to a pool, the chances of deadlocking are much higher than on a 'real' CPU thread. Limiting its work to unblocking at least prevents the end developer from having to worry about this.


From guido at  Tue Oct 30 17:40:18 2012
From: guido at (Guido van Rossum)
Date: Tue, 30 Oct 2012 09:40:18 -0700
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

[Richard Oudkerk (?)]
>> > It is a common pattern to have multiple threads/processes trying to
>> > accept connections on a single listening socket, so it would be
>> > unfortunate to disallow that.

>> Ah, but that will work -- each thread has its own pollster, event loop and
>> scheduler and collection of tasks. And listening on a socket is a pretty special
>> case anyway -- I imagine we'd build a special API just for that purpose.

On Tue, Oct 30, 2012 at 9:05 AM, Kristján Valur Jónsson
<kristjan at> wrote:
> I don't think he meant actual "threads" but rather thread in the context of coroutines.

(Yes, we figured that out already. :-)

> in StacklessIO (our custom sockets lib for stackless) multiple tasklets can have an "accept" pending on a socket, so that when multiple connections arrive, wakeup time is minimal.

What kind of time savings are we talking about? I imagine that the
accept() loop I put in tulip/ is fast enough in terms of
response time (latency) -- throughput would seem the more important
measure (and I have no idea of this yet).

> We have also been careful to allow multiple operations on sockets, from different tasklets, although the same caveats apply as when multiple threads perform operations, i.e. no guarantees about it making any sense.  The important bit is that when such things happen, you get some defined result, rather than for example a tasklet being infinitely blocked.  Such errors are surprising and hard to debug.

That's a good point. It should either cause an immediate, clear
exception, or interleave the data without compromising integrity of
the scheduler or the app.

--Guido van Rossum (

From kristjan at  Tue Oct 30 17:11:40 2012
From: kristjan at (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=)
Date: Tue, 30 Oct 2012 16:11:40 +0000
Subject: [Python-ideas] non-blocking buffered I/O
In-Reply-To: <>
References: <>
Message-ID: <>

> -----Original Message-----
> From: Python-ideas [mailto:python-ideas-
> at] On Behalf Of Greg Ewing
> Sent: 30. október 2012 05:10
> To: python-ideas at
> Subject: Re: [Python-ideas] non-blocking buffered I/O
> > From my point of view, IOCP fits in very well provided the callbacks
> > (which will run in the IOCP thread pool) are only used to unblock tasks.
> Is it really necessary to have a separate thread just to handle unblocking
> tasks? That thread will have very little to do, so it could just as well run the
> tasks too, couldn't it?

StacklessIO (which is an IOCP implementation for Stackless) uses callbacks on an arbitrary thread (in practice a worker thread from Windows' own thread pool that it keeps for such things) to unblock tasklets.  You don't want to do any significant work on such a thread because it is used for other stuff by the system.

By the way:  We found that acquiring the GIL by a random external thread in response to the IOCP to wake up tasklets was incredibly expensive.  I spent a lot of effort figuring out why that is and found no real answer.  The mechanism we now use is to let the external worker thread schedule a "pending call" which is serviced by the main thread at the earliest opportunity.  Also, the main thread is interrupted if it is doing a sleep.  This is much more efficient.
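
A rough Python-level analogy of that hand-off, assuming nothing about the real C implementation: the worker thread does no work itself, it only queues a callable, and the main thread's "sleep" is a blocking get() that ends as soon as something is queued. The names pending_calls and io_completion are made up for illustration.

```python
import queue
import threading

pending_calls = queue.Queue()   # hypothetical queue of calls for the main thread
woken = []


def io_completion(tasklet_name):
    # Runs on the external worker thread: schedule, don't execute.
    pending_calls.put(lambda: woken.append(tasklet_name))


worker = threading.Thread(target=io_completion, args=("tasklet-1",))
worker.start()

# Main thread: a "sleep" that is interrupted the moment a call is pending.
call = pending_calls.get(timeout=5.0)
call()                          # the wake-up itself runs on the main thread
worker.join()
```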


From guido at  Tue Oct 30 18:34:12 2012
From: guido at (Guido van Rossum)
Date: Tue, 30 Oct 2012 10:34:12 -0700
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 30, 2012 at 3:12 AM, Laurens Van Houtven <_ at> wrote:
> I've been following the PEP380-related threads and I've reviewed this stuff,
> while trying to do the protocols/transports PEP, and trying to glue the two
> together.

Thanks! I know it can't be easy to keep up with all the threads (and
now code repos).

> The biggest difference I can see is that protocols as they've been discussed
> are "pull": they get called when some data arrives. They don't know how much
> data there is; they just get told "here's some data". The obvious difference
> with the API in, eg:
> ... is that now I have to tell a socket to read n bytes, which "blocks" the
> coroutine, then I get some data.

Yes. But do note that is mostly a throw-away example
written to support the only style I am familiar with -- synchronous
reads and writes. My point in writing this particular set of
transports is that I want to take existing synchronous code (e.g. a
threaded server built using the stdlib's
socketserver.ThreadingTCPServer class) and make minimal changes to the
protocol logic to support async operation -- those minimal changes
should boil down to using a different way to set up a connection or a
listening socket or constructing a stream from a socket, and putting
"yield from" in front of the blocking operations (recv(), send(), and
the read/readline/write operations on the streams).

I'm still looking for guidance from Twisted and Tornado (and you!) to
come up with better abstractions for transports and protocols. The
underlying event loop *does* support a style where an object registers
a callback function once which is called repeatedly, as long as the
socket is readable (or writable, depending on the registration call).
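
That callback style can be sketched with the stdlib selectors module (added in Python 3.4) standing in for tulip's event loop, whose own API is not shown in this thread. The callback on_readable is registered once and is invoked every time the socket becomes readable.

```python
import selectors
import socket

sel = selectors.DefaultSelector()
a, b = socket.socketpair()
a.setblocking(False)
received = []


def on_readable(sock):
    received.append(sock.recv(4096))


# Register the callback once; it is stored as the key's data.
sel.register(a, selectors.EVENT_READ, on_readable)

for chunk in (b"first", b"second"):
    b.sendall(chunk)
    for key, _events in sel.select(timeout=1.0):
        key.data(key.fileobj)   # same callback, called again for each readiness event

sel.unregister(a)
a.close()
b.close()
sel.close()
```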

> Now, there doesn't have to be an issue; you could simply say:
> data = yield from s.recv(4096) # that's the magic number usually right
> proto.data_received(data)

(Off-topic: ages ago I determined that the optimal block size is
actually 8192. But for all I know it is 256K these days. :-)

> It seems a bit boilerplatey, but I suppose that eventually could be hidden
> away.
> But this style is pervasive, for example that's how reading by lines works:

Right -- again, this is all geared towards making it palatable for
people used to writing synchronous code (either single-threaded or
multi-threaded), not for people used to Twisted.

> While I'm not a big fan (I may be convinced if I see a protocol test that
> looks nice);

Check out urlfetch() in

For sure, this isn't "pretty" and it should be rewritten using more
abstraction -- I only wrote the entire thing as a single function
because I was focused on the scheduler and event loop. And it is
clearly missing a buffering layer for writing (it currently uses a
separate send() call for each line of the HTTP headers, blech). But it
implements a fairly complex (?) protocol and it performs well enough.

> I'm just wondering if there's any point in trying to write the
> pull-style protocols when this works quite differently.

Perhaps you could try to write some pull-style transports and
protocols for tulip to see if anything's missing from the scheduler
and eventloop APIs or implementations? I'd be happy to rename to so there's room for a competing, and then we can compare apples to apples.

(Unlike the yield vs. yield-from issue, where I am very biased, I am
not biased about push vs. pull style. I just coded up what I was most
familiar with first.)

> Additionally, I'm not sure if readline belongs on the socket.

It isn't -- it is on the BufferedReader, which wraps around the socket
(or other socket-like transport, like SSL). This is similar to the way
the stdlib socket.socket class has a makefile() method that returns a
stream wrapping the socket.
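
For comparison, the stdlib behaviour referred to here looks like this with ordinary blocking sockets (this is not tulip code; a socketpair stands in for a real connection):

```python
import socket

client, server = socket.socketpair()
client.sendall(b"GET / HTTP/1.0\r\nHost: example\r\n\r\n")
client.close()

# makefile() returns a buffered stream; readline() lives on the stream,
# not on the socket itself.
stream = server.makefile("rb")
request_line = stream.readline()
header_line = stream.readline()
stream.close()
server.close()
```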

> I understand the simile with files, though.

Right, that's where I've gotten most of my inspiration. I figure they
are a good model to lure unsuspecting regular Python users in. :-)

> With the coroutine style I could see how the
> most obvious fit would be something like tornado's read_until, or an
> as_lines that essentially calls read_until repeatedly. Can the delimiter for
> this be modified?

You can write your own BufferedReader, and if this is a common pattern
we can make it a standard API. Unlike the SocketTransport and
SslTransport classes, which contain various I/O hacks and integrate
tightly with the polling capability of the eventloop, I consider
BufferedReader plain user code. Antoine also hinted that with not too
many changes we could reuse the existing buffering classes in the
stdlib io module, which are implemented in C.
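
As a sketch of what such user-level code might look like with a configurable delimiter: the class DelimitedReader and the fake_recv helper below are hypothetical, and recv is assumed to be any generator function that yields while waiting and returns a bytes chunk (b"" at EOF).

```python
class DelimitedReader:
    """Hypothetical user-level reader; not part of tulip."""

    def __init__(self, recv):
        self._recv = recv
        self._buffer = b""

    def read_until(self, delimiter):
        while delimiter not in self._buffer:
            chunk = yield from self._recv(4096)
            if not chunk:                       # EOF: hand back what is left
                data, self._buffer = self._buffer, b""
                return data
            self._buffer += chunk
        end = self._buffer.index(delimiter) + len(delimiter)
        data, self._buffer = self._buffer[:end], self._buffer[end:]
        return data


chunks = [b"alpha;be", b"ta;gam", b"ma"]


def fake_recv(n):
    yield                                       # pretend to wait for the socket
    return chunks.pop(0) if chunks else b""


def consumer(reader, out):
    while True:
        record = yield from reader.read_until(b";")
        if not record:
            break
        out.append(record)


records = []
for _ in consumer(DelimitedReader(fake_recv), records):
    pass                                        # a real scheduler would wait for I/O here
```

An as_lines-style helper would then just be read_until(b"\n") in a loop.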

> My main syntactic gripe is that when I write @inlineCallbacks code or
> monocle code or whatever, when I say "yield" I'm yielding to the reactor.
> That makes sense to me (I realize natural language arguments don't always
> make sense in a programming language context). "yield from" less so (but
> okay, that's what it has to look like). But this just seems weird to me:
> yield from trans.send(line.upper())
> Not only do I not understand why I'm yielding there in the first place (I
> don't have to wait for anything, I just want to push some data out!), it
> feels like all of my yields have been replaced with yield froms for no
> obvious reason (well, there are reasons, I'm just trying to look at this
> naively).

Are you talking about yield vs. yield-from here, or about the need to
suspend every write? Regarding yield vs. yield-from, please squint and
get used to seeing yield-from everywhere -- the scheduler
implementation becomes *much* simpler and *much* faster using
yield-from, so much so that there really is no competition.

As to why you would have to suspend each time you call send(), that's
mostly just an artefact of the incomplete example -- I didn't
implement a BufferedWriter yet. I also have some worries about a task
producing data at a rate faster than the socket can drain it from the
buffer, but in practice I would probably relent and implement a
write() call that returns immediately and should *not* be used with
yield-from. (Unfortunately you can't have a call that works with or
without yield-from.) I think there's a throttling mechanism in Twisted
that can probably be copied here.

> I guess Twisted gets away with this because of deferred chaining: that one
> deferred might have tons of callbacks in the background, many of which also
> doing IO operations, resulting in a sequence of asynchronous operations that
> only at the end cause the generator to be run some more.
> I guess that belongs in a different thread, though. Even, then, I'm not sure
> if I'm uncomfortable because I'm seeing something different from what I'm
> used to, or if my argument from English actually makes any sense whatsoever.
> Speaking of protocol tests, what would those look like? How do I yell, say,
> "POST /blah HTTP/1.1\r\n" from a transport? Presumably I'd have a mock
> transport, and call the handler with that? (I realize it's early days to be
> thinking that far ahead; I'm just trying to figure out how I can contribute
> a good protocol definition to all of this).

Actually I think the ease of writing tests should definitely be taken
into account when designing the APIs here. In the Zope world, Jim
Fulton wrote a simple abstraction for networking code that explicitly
provides for testing: (it also
supports yield-style callbacks, similar to Twisted's inlineCallbacks).
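
One way such a protocol test could look, as a sketch only: MockTransport, handler and run are hypothetical names, and the transport is assumed to expose generator-based recv() and send() so that the handler can use yield from on them.

```python
class MockTransport:
    """Hypothetical test double: replays canned input, records output."""

    def __init__(self, incoming):
        self._incoming = list(incoming)
        self.sent = []

    def recv(self, n):
        return self._incoming.pop(0) if self._incoming else b""
        yield  # unreachable; makes recv a generator so `yield from` works

    def send(self, data):
        self.sent.append(data)
        return len(data)
        yield  # unreachable; same trick as above


def handler(trans):
    request_line = yield from trans.recv(4096)
    method, path, _version = request_line.decode("ascii").split()
    yield from trans.send(b"HTTP/1.1 200 OK\r\n\r\n")
    return method, path


def run(coro):
    """Drive a coroutine that never really blocks to completion."""
    try:
        while True:
            next(coro)
    except StopIteration as stop:
        return stop.value


trans = MockTransport([b"POST /blah HTTP/1.1\r\n"])
result = run(handler(trans))
```

The test then only has to inspect result and trans.sent; no sockets or event loop are involved.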

I currently don't have any tests, apart from manually running
and checking its output. I am a bit hesitant to add unit tests in this
early stage, because keeping the tests passing inevitably slows down
the process of ripping apart the API and rebuilding it in a different
way -- something I do at least once a day, whenever I get feedback or
a clever thought strikes me or something annoying reaches my trigger

But I should probably write at least *some* tests, I'm sure it will be
enlightening and I will end up changing the APIs to make testing
easier. It's in the TODO.

--Guido van Rossum (

From guido at  Tue Oct 30 18:47:24 2012
From: guido at (Guido van Rossum)
Date: Tue, 30 Oct 2012 10:47:24 -0700
Subject: [Python-ideas] non-blocking buffered I/O
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 30, 2012 at 9:11 AM, Kristján Valur Jónsson
<kristjan at> wrote:
> By the way:  We found that acquiring the GIL by a random external thread in response to the IOCP to wake up tasklets was incredibly expensive.  I spent a lot of effort figuring out why that is and found no real answer.  The mechanism we now use is to let the external worker thread schedule a "pending call" which is serviced by the main thread at the earliest opportunity.  Also, the main thread is interrupted if it is doing a sleep.  This is much more efficient.

In which Python version? The GIL has been redesigned at least once.
Also the latency (not necessarily cost) to acquire the GIL varies by
the sys.setswitchinterval setting. (Actually the more responsive you
make it, the more it will cost you in overall performance.)
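
For reference, a small sketch of inspecting and changing that setting in Python 3.2 or later (the value 0.001 is only an example, not a recommendation):

```python
import sys

previous = sys.getswitchinterval()   # current thread switch interval, in seconds
sys.setswitchinterval(0.001)         # more responsive switching, more overhead
shorter = sys.getswitchinterval()
sys.setswitchinterval(previous)      # restore the original setting
```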

I do think that using the pending call mechanism is the right solution here.

--Guido van Rossum (

From shibturn at  Tue Oct 30 18:50:53 2012
From: shibturn at (Richard Oudkerk)
Date: Tue, 30 Oct 2012 17:50:53 +0000
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <k6p41r$4m1$>

On 30/10/2012 4:40pm, Guido van Rossum wrote:
> What kind of time savings are we talking about? I imagine that the
> accept() loop I put in tulip/ is fast enough in terms of
> response time (latency) -- throughput would seem the more important
> measure (and I have no idea of this yet).

With Windows overlapped I/O I think you can get substantially better 
throughput by starting many AcceptEx() calls in parallel.  (For bonus 
points you can also recycle the accepted connections using DisconnectEx().)

Even so, Windows socket code always seems to be much slower than the 
equivalent on Linux.


From at  Tue Oct 30 18:52:54 2012
From: at (Yury Selivanov)
Date: Tue, 30 Oct 2012 13:52:54 -0400
Subject: [Python-ideas] Async API
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-30, at 1:53 AM, Greg Ewing <greg.ewing at> wrote:

> Yury Selivanov wrote:
>> So in your example the scheduler would never ever face the question of interrupting c2(), because it wasn't called with any restriction/timeout.
>> There is simply no reason to interrupt it ever.
> But there's nothing to stop someone writing
>   def c3():
>       try:
>           yield from with_timeout(10.0, c1())
>       except TimeoutError:
>           print("That's cool, I can cope with that")
> Also, it's not just TimeoutErrors that are a potential
> problem, it's any asynchronous exception. For example,
> the task calling c1() might get cancelled by another
> task while c2() is blocked. If cancelling is implemented
> by throwing in an exception, you have the same problem.
>> Then you need the scheduler to know if it is in its finally or not.  Because it's
>> c2() which was run with a timeout.  It's c2()'s code that may be subject to
>> aborting.
> I'm really not following your reasoning here. You seem to
> be speaking as if with_timeout() calls only have an effect
> one level deep. But that's not the case -- the frame that a
> TimeoutError gets thrown into by with_timeout() can be
> nested any number of yield-from calls deep.


Looks like I'm failing to explain my point of view (which is maybe
wrong).  The problem is tough, and without shared code to debug
and test ideas on, it's just hard to communicate.

Let's get back to this issue once we have a framework/library to 
work on.
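
In the meantime, the nesting behaviour Greg describes can be reproduced with plain generators. This is a sketch only: there is no with_timeout() here, and the scheduler's timeout is simulated by calling throw() by hand. The exception thrown into the outermost generator surfaces at the innermost suspended frame, however deep the yield-from chain is.

```python
events = []


def c2():
    try:
        yield "blocked in c2"            # innermost suspension point
    finally:
        events.append("c2 finally")      # runs when the exception arrives


def c1():
    yield from c2()


def c3():
    try:
        yield from c1()
    except TimeoutError:
        events.append("c3 handled timeout")


task = c3()
next(task)                               # run until c2 blocks
try:
    task.throw(TimeoutError())           # what a scheduler might do on timeout
except StopIteration:
    pass                                 # c3 caught the error and finished
```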


From guido at  Tue Oct 30 19:10:10 2012
From: guido at (Guido van Rossum)
Date: Tue, 30 Oct 2012 11:10:10 -0700
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <k6p41r$4m1$>
References: <>
Message-ID: <>

On Tue, Oct 30, 2012 at 10:50 AM, Richard Oudkerk <shibturn at> wrote:
> On 30/10/2012 4:40pm, Guido van Rossum wrote:
>> What kind of time savings are we talking about? I imagine that the
>> accept() loop I put in tulip/ is fast enough in terms of
>> response time (latency) -- throughput would seem the more important
>> measure (and I have no idea of this yet).

> With Windows overlapped I/O I think you can get substantially better
> throughput by starting many AcceptEx() calls in parallel.  (For bonus points
> you can also recycle the accepted connections using DisconnectEx().)

Hm... I already have on my list that the transports should probably be
platform dependent. So this would suggest that the standard accept
loop should be abstracted as a method on the transport object, right?

> Even so, Windows socket code always seems to be much slower than the
> equivalent on Linux.

Is this Python sockets code or are you also talking about other
languages, like C++?

--Guido van Rossum (

From solipsis at  Tue Oct 30 20:31:01 2012
From: solipsis at (Antoine Pitrou)
Date: Tue, 30 Oct 2012 20:31:01 +0100
Subject: [Python-ideas] Async API: some code to review
References: <>
Message-ID: <>

On Tue, 30 Oct 2012 10:34:12 -0700
Guido van Rossum <guido at> wrote:
> >
> > Speaking of protocol tests, what would those look like? How do I yell, say,
> > "POST /blah HTTP/1.1\r\n" from a transport? Presumably I'd have a mock
> > transport, and call the handler with that? (I realize it's early days to be
> > thinking that far ahead; I'm just trying to figure out how I can contribute
> > a good protocol definition to all of this).
> Actually I think the ease of writing tests should definitely be taken
> into account when designing the APIs here.

+11 !



From guido at  Tue Oct 30 21:24:23 2012
From: guido at (Guido van Rossum)
Date: Tue, 30 Oct 2012 13:24:23 -0700
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

On Tue, Oct 30, 2012 at 12:46 PM, Richard Oudkerk <shibturn at> wrote:
> The difference in speed between AF_INET sockets and pipes on Windows is much
> bigger than the difference between AF_INET sockets and pipes on Unix.
> (Who knows, maybe it is just my firewall which is causing the slowdown...)

Here's another unscientific benchmark: I wrote a stupid "http" server
(stupider than actually) that accepts HTTP requests and
responds with the shortest possible "200 Ok" response. This should
provide an adequate benchmark of how fast the event loop, scheduler,
and transport are at accepting and closing connections (and reading
and writing small amounts). On my linux box at work, over localhost,
it seems I can handle 10K requests (sent using 'ab' over localhost) in
1.6 seconds. Is that good or bad? The box has insane amounts of memory
and 12 cores (?) and rates at around 115K pystones.

(I tried to repro this on my Mac, but I am running into problems,
perhaps due to system limits.)

--Guido van Rossum (

From carlopires at  Tue Oct 30 21:33:12 2012
From: carlopires at (Carlo Pires)
Date: Tue, 30 Oct 2012 18:33:12 -0200
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

2012/10/30 Guido van Rossum <guido at>

> Here's another unscientific benchmark: I wrote a stupid "http" server
> (stupider than actually) that accepts HTTP requests and
> responds with the shortest possible "200 Ok" response. This should
> provide an adequate benchmark of how fast the event loop, scheduler,
> and transport are at accepting and closing connections (and reading
> and writing small amounts). On my linux box at work, over localhost,
> it seems I can handle 10K requests (sent using 'ab' over localhost) in
> 1.6 seconds. Is that good or bad? The box has insane amounts of memory
> and 12 cores (?) and rates at around 115K pystones.

Take a look at

It is a bit outdated but can be useful to get some insight.
  Carlo Pires

From greg.ewing at  Tue Oct 30 21:37:44 2012
From: greg.ewing at (Greg Ewing)
Date: Wed, 31 Oct 2012 09:37:44 +1300
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

Kristján Valur Jónsson wrote:
> in StacklessIO (our custom sockets lib for stackless) multiple tasklets can
> have an "accept" pending on a socket, so that when multiple connections arrive,
> wakeup time is minimal.

With sufficiently cheap tasks, there's another way to approach
this: one task is dedicated to accepting connections from the
socket, and it spawns a new task to handle each connection.
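
A toy sketch of that structure, using a made-up round-robin run queue instead of a real scheduler and a list of names instead of a listening socket (all names here are hypothetical):

```python
from collections import deque

ready = deque()          # hypothetical run queue of generator-based tasks
handled = []


def handle(conn):
    yield                            # pretend to wait for the client
    handled.append("served " + conn)


def acceptor(incoming):
    for conn in incoming:            # stands in for `yield from sock.accept()`
        ready.append(handle(conn))   # spawn one cheap task per connection
        yield                        # let other tasks run


ready.append(acceptor(["conn-1", "conn-2", "conn-3"]))
while ready:
    task = ready.popleft()
    try:
        next(task)
    except StopIteration:
        continue                     # task finished; drop it
    ready.append(task)               # still running: reschedule
```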


From solipsis at  Tue Oct 30 22:30:20 2012
From: solipsis at (Antoine Pitrou)
Date: Tue, 30 Oct 2012 22:30:20 +0100
Subject: [Python-ideas] Async API: some code to review
References: <>
Message-ID: <>

On Tue, 30 Oct 2012 13:24:23 -0700
Guido van Rossum <guido at> wrote:
> On Tue, Oct 30, 2012 at 12:46 PM, Richard Oudkerk <shibturn at> wrote:
> > The difference in speed between AF_INET sockets and pipes on Windows is much
> > bigger than the difference between AF_INET sockets and pipes on Unix.
> >
> > (Who knows, maybe it is just my firewall which is causing the slowdown...)
> Here's another unscientific benchmark: I wrote a stupid "http" server
> (stupider than actually) that accepts HTTP requests and
> responds with the shortest possible "200 Ok" response. This should
> provide an adequate benchmark of how fast the event loop, scheduler,
> and transport are at accepting and closing connections (and reading
> and writing small amounts). On my linux box at work, over localhost,
> it seems I can handle 10K requests (sent using 'ab' over localhost) in
> 1.6 seconds. Is that good or bad? The box has insane amounts of memory
> and 12 cores (?) and rates at around 115K pystones.

It sounds both good and useless to me :)



From paul at  Tue Oct 30 22:45:46 2012
From: paul at (Paul Colomiets)
Date: Tue, 30 Oct 2012 23:45:46 +0200
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <k6lvdf$i5n$>
References: <>
Message-ID: <>

Hi Richard,

On Mon, Oct 29, 2012 at 3:13 PM, Richard Oudkerk <shibturn at> wrote:
> On 28/10/2012 11:52pm, Guido van Rossum wrote:
>> I'm most interested in feedback on the design of and
>>, and to a lesser extent on the design of;
>> is just an example of how this style works out in practice.
> What happens if two tasks try to do a read op (or two tasks try to do a
> write op) on the same file descriptor?  It looks like the second one to do
> scheduling.block_r(fd) will cause the first task to be forgotten, causing
> the first task to block forever.
> Shouldn't there be a list of pending readers and a list of pending writers
> for each fd?

There is another approach to handle this. You create a dedicated
coroutine which does the writing (or reading). If another coroutine
needs to write, it puts data into a queue (or channel) and waits until
the writer coroutine picks it up. This way you don't care about atomicity
of writes, and a lot of other things.

This approach is similar to what Greg Ewing proposed for handling
accept() recently.
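
A minimal sketch of the idea with plain generators; outbox, wire and the tiny round-robin loop are stand-ins invented for the example, not tulip APIs. Only the writer task ever touches the "socket", so messages from different producers cannot interleave mid-write.

```python
from collections import deque

outbox = deque()       # hypothetical queue shared by all producers
wire = []              # stands in for the socket


def writer():
    while True:
        while not outbox:
            yield                    # nothing to write: let others run
        message = outbox.popleft()
        if message is None:          # sentinel: shut down
            return
        wire.append(message)         # the only place that writes


def producer(name, count):
    for i in range(count):
        outbox.append(("%s-%d" % (name, i)).encode())
        yield


tasks = deque([writer(), producer("a", 2), producer("b", 2)])
while tasks:
    task = tasks.popleft()
    try:
        next(task)
    except StopIteration:
        continue
    tasks.append(task)
    if len(tasks) == 1:              # only the writer is left
        outbox.append(None)
```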


From shibturn at  Tue Oct 30 23:01:19 2012
From: shibturn at (Richard Oudkerk)
Date: Tue, 30 Oct 2012 22:01:19 +0000
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <k6pinf$8ss$>

On 30/10/2012 8:24pm, Guido van Rossum wrote:
> Here's another unscientific benchmark: I wrote a stupid "http" server
> (stupider than actually) that accepts HTTP requests and
> responds with the shortest possible "200 Ok" response. This should
> provide an adequate benchmark of how fast the event loop, scheduler,
> and transport are at accepting and closing connections (and reading
> and writing small amounts). On my linux box at work, over localhost,
> it seems I can handle 10K requests (sent using 'ab' over localhost) in
> 1.6 seconds. Is that good or bad? The box has insane amounts of memory
> and 12 cores (?) and rates at around 115K pystones.

I tried the simple single threaded benchmark below on my laptop.

                                        | Connections/sec
Linux                                  |   6000-11000
Linux in a VM (with 1 cpu assigned)    |         4600
Windows                                |         1400

On Windows this sometimes failed with:

   OSError: [WinError 10055] An operation on a socket could not
   be performed because the system lacked sufficient buffer
   space or because a queue was full

import socket, time, argparse

N = 10000

def server():
    l = socket.socket()
    l.bind(('', 0))               # port 0: let the OS pick a free port
    l.listen(100)
    print('listening on port', l.getsockname()[1])
    while True:
        a, _ = l.accept()
        with a:
            a.sendall(b'FOO')     # the reply the client asserts on

def client(port):
    start = time.time()
    for i in range(N):
        with socket.socket() as c:
            # the host is missing in the archived copy; localhost is assumed
            c.connect(('localhost', port))
            res = c.recv(20)
            assert res == b'FOO'
    elapsed = time.time() - start
    print("elapsed=%s, connections/sec=%s" % (elapsed, N/elapsed))

parser = argparse.ArgumentParser()
parser.add_argument('--port', type=int, default=None,
                    help='port to connect to')
args = parser.parse_args()

# with --port, run the client; without it, run the server
if args.port is not None:
    client(args.port)
else:
    server()


From grosser.meister.morti at  Wed Oct 31 06:02:52 2012
From: grosser.meister.morti at (=?ISO-8859-1?Q?Mathias_Panzenb=F6ck?=)
Date: Wed, 31 Oct 2012 06:02:52 +0100
Subject: [Python-ideas] Support data: URLs in urllib
Message-ID: <>

Sometimes it would be handy to read data:-urls just like any other url. While it is pretty easy to 
parse a data: url yourself I think it would be nice if urllib could do this for you.

Example data url parser:

 >>> import base64
 >>> import urllib.parse
 >>> def read_data_url(url):
 ...     # split only once: the payload itself may contain ":" or ","
 ...     scheme, data = url.split(":", 1)
 ...     assert scheme == "data", "unsupported scheme: " + scheme
 ...     mimetype, data = data.split(",", 1)
 ...     if mimetype.endswith(";base64"):
 ...         return mimetype[:-7] or None, base64.b64decode(data.encode("UTF-8"))
 ...     else:
 ...         return mimetype or None, urllib.parse.unquote(data).encode("UTF-8")

See also:


From rene at  Wed Oct 31 07:16:47 2012
From: rene at (Rene Nejsum)
Date: Wed, 31 Oct 2012 07:16:47 +0100
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

> There is another approach to handle this. You create a dedicated
> coroutine which does the writing (or reading). If another coroutine
> needs to write, it puts data into a queue (or channel) and waits until
> the writer coroutine picks it up. This way you don't care about atomicity
> of writes, and a lot of other things.

I support this idea, IMHO it's by far the easiest (or least problematic)
way to handle the complexity of concurrency.

What's the general position on monkey patching existing libs ? This
might not be possible with the above ?


> This approach is similar to what Greg Ewing proposed for handling
> accept() recently.
> -- 
> Paul

From p.f.moore at  Wed Oct 31 08:54:29 2012
From: p.f.moore at (Paul Moore)
Date: Wed, 31 Oct 2012 07:54:29 +0000
Subject: [Python-ideas] Support data: URLs in urllib
In-Reply-To: <>
References: <>
Message-ID: <>

On Wednesday, 31 October 2012, Mathias Panzenböck wrote:

> Sometimes it would be handy to read data:-urls just like any other url.
> While it is pretty easy to parse a data: url yourself I think it would be
> nice if urllib could do this for you.
> Example data url parser:

IIUC, this should be possible with a custom opener. While it might be nice
to have this in the stdlib, it would also be a really useful recipe to have
in the docs, showing how to create and install a simple custom opener into
the default set of openers (so that urllib.request gains the ability to
handle data URLs automatically). Would you be willing to submit a doc
patch to cover this?
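
A sketch of what such a recipe might look like. The class name DataURLHandler is made up for the example; the mechanism it relies on is that OpenerDirector dispatches a URL scheme to a handler method named <scheme>_open. A bare OpenerDirector is used here so that only this handler is involved.

```python
import base64
import email
import io
import urllib.parse
import urllib.request
import urllib.response


class DataURLHandler(urllib.request.BaseHandler):
    """Hypothetical handler; `data:` URLs are routed to data_open()."""

    def data_open(self, req):
        url = req.full_url
        _scheme, rest = url.split(":", 1)
        mediatype, data = rest.split(",", 1)
        payload = urllib.parse.unquote_to_bytes(data)
        if mediatype.endswith(";base64"):
            payload = base64.b64decode(payload)
            mediatype = mediatype[:-len(";base64")]
        # Default media type for data: URLs when none is given.
        headers = email.message_from_string(
            "Content-Type: %s\n" % (mediatype or "text/plain;charset=US-ASCII"))
        return urllib.response.addinfourl(io.BytesIO(payload), headers, url)


opener = urllib.request.OpenerDirector()
opener.add_handler(DataURLHandler())
with opener.open("data:text/plain;base64,aGVsbG8=") as response:
    body = response.read()
    content_type = response.headers["Content-Type"]
```

To make urllib.request.urlopen() use an opener globally, pass it to urllib.request.install_opener(). Note that Python 3.4 and later ship a urllib.request.DataHandler of their own, so this recipe is mainly of interest for older versions or as an illustration of custom openers.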


From kristjan at  Wed Oct 31 10:29:43 2012
From: kristjan at (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=)
Date: Wed, 31 Oct 2012 09:29:43 +0000
Subject: [Python-ideas] non-blocking buffered I/O
In-Reply-To: <>
References: <>
Message-ID: <>

> -----Original Message-----
> From: gvanrossum at [mailto:gvanrossum at] On Behalf
> Of Guido van Rossum
> Sent: 30. október 2012 17:47
> To: Kristján Valur Jónsson
> Cc: python-ideas at
> Subject: Re: [Python-ideas] non-blocking buffered I/O
> On Tue, Oct 30, 2012 at 9:11 AM, Kristján Valur Jónsson
> <kristjan at> wrote:
> > By the way:  We found that acquiring the GIL by a random external thread
> in response to the IOCP to wake up tasklets was incredibly expensive.  I
> spent a lot of effort figuring out why that is and found no real answer.  The
> mechanism we now use is to let the external worker thread schedule a
> "pending call" which is serviced by the main thread at the earliest
> opportunity.  Also, the main thread is interrupted if it is doing a sleep.  This is
> much more efficient.
> In which Python version? The GIL has been redesigned at least once.
> Also the latency (not necessarily cost) to acquire the GIL varies by the
> sys.setswitchinterval setting. (Actually the more responsive you make it, the
> more it will cost you in overall performance.)
> I do think that using the pending call mechanism is the right solution here.

I am talking about 2.7, of course, the python of hard working lumberjacks everywhere :)

Anyway I don't think the issue is much affected by the particular GIL implementation.
Alternative a)
	Callback comes on arbitrary thread
	arbitrary thread calls PyGILState_Ensure
		(This causes a _dynamic thread state_ to be generated for the arbitrary thread, and the GIL to be subsequently acquired)
	arbitrary thread does whatever python gymnastics are required to complete the IO (wake up tasklet)
	arbitrary thread calls PyGILState_Release

For whatever reason, this approach _increased CPU usage_ on a loaded server.  Latency was fine, throughput the same, and the delay in actual GIL acquisition was ok.  I suspect that the problem lies with the dynamic acquisition of a thread state, and other initialization that may occur.  I did experiment with having a cache of unused threadstates on the ready for external threads, but it didn't get me anywhere.  This could also be the result of cache thrashing or something that doesn't show up immediately on a multicore cpu.

Alternative b)
	Callback comes on arbitrary thread
	external thread calls PyEval_SchedulePendingCall()
		This grabs a static lock, puts in a record, and signals to python that something needs to be done immediately. 
	external thread calls a custom function to interrupt the main thread in the IO bound application, currently most likely sleeping in a WaitForMultipleObjects() with a timeout.
	Main thread wakes up from its sleep (if it was sleeping).
	Main thread runs python code, causing it to immediately service the scheduled pending call, causing it to perform the wait.

In reality, StacklessIO uses a slight variation of the above:

StacklessIO dispatch system
	Callback comes on arbitrary thread
	external thread schedules a completion event in its own "dispatch" buffer to be serviced by the main thread.  This is protected by its own lock, and doesn't need the GIL.
	external thread calls PyEval_SchedulePendingCall() to "tick" the dispatch buffer
	external thread calls a custom function to interrupt the main thread in the IO bound application, currently most likely sleeping in a WaitForMultipleObjects() with a timeout.
	If main thread is sleeping: Main thread wakes up from its sleep
		Immediately after sleeping, the main thread will 'tick' the dispatch queue
		After ticking, tasklets may have been made runnable, so the main thread may continue out into the main loop of the application to do work.  If not, it may continue sleeping.
	Main thread runs python code, causing it to immediately service the scheduled pending call, which will tick the dispatch queue.  This may be a no-op if the main thread was sleeping and was already ticked.
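
The dispatch scheme above can be sketched in pure Python. This is only an analogy under stated assumptions: DispatchBuffer and its method names are hypothetical, a threading.Event stands in for both the pending-call notification and the interruptible WaitForMultipleObjects() sleep, and nothing here touches the C API or the GIL machinery.

```python
import collections
import threading


class DispatchBuffer:
    """Hypothetical completion queue protected by its own lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._events = collections.deque()
        self._wakeup = threading.Event()

    def post(self, callback, *args):
        # Called from any worker thread: queue the completion, then wake
        # the main thread if it is sleeping.
        with self._lock:
            self._events.append((callback, args))
            self._wakeup.set()

    def sleep(self, timeout):
        # Main thread only: an interruptible sleep.
        return self._wakeup.wait(timeout)

    def tick(self):
        # Main thread only: drain the queue and run the completions.
        with self._lock:
            self._wakeup.clear()
            pending = list(self._events)
            self._events.clear()
        for callback, args in pending:
            callback(*args)
        return len(pending)


def demo():
    buffer = DispatchBuffer()
    results = []
    worker = threading.Thread(
        target=buffer.post, args=(results.append, "io-complete")
    )
    worker.start()
    woke = buffer.sleep(timeout=5.0)
    ran = buffer.tick()
    worker.join()
    return woke, ran, results
```

Calling tick() a second time with nothing queued is a no-op, which mirrors the remark that the pending call may find the queue already ticked.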

The issue we were facing was not with latency (although grabbing the GIL when the main thread is busy is slower than notifying it of a pending call), but with unexplained increased cpu showing up.  A proxy node servicing 2000 clients or upwards would suddenly double or triple its cpu.

The reason I'm mentioning this here is that this is important.  We have spent quite some time and energy on trying to figure out the most efficient way to complete IOCP from an arbitrary thread and this is the end result.  Perhaps things can be done to improve this.  Also, it is really important to study these things under real load; experience has shown me that the most innocuous changes that work well in the lab suddenly start behaving strangely in the field.


From glyph at  Wed Oct 31 11:10:18 2012
From: glyph at (Glyph)
Date: Wed, 31 Oct 2012 03:10:18 -0700
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

Finally getting around to this one...

I am sorry if I'm repeating any criticism that has already been rehashed in this thread.  There is really a deluge of mail here and I can't keep up with it.  I've skimmed some of it and avoided or noted things that I did see mentioned, but I figured I should write up something before next week.

To make a long story short, my main points here are:

- I think tulip unfortunately has a lot of the problems I tried to describe in earlier messages,
- it would be really great if we could have a core I/O interface that we could use for interoperability with Twisted before bolting a requirement for coroutine trampolines on to everything,
- Twisted-style protocol/transport separation is really important and this should not neglect it.  As I've tried to illustrate in previous messages, an API where applications have to call send() or recv() is just not going to behave intuitively in edge cases or perform well,
- I know it's a prototype, but this isn't such an unexplored area that it should be developed without TDD: all this code should both have tests and provide testing support to show how applications that use it can be tested,
- the scheduler module needs some example implementation of something like Twisted's gatherResults for me to critique its expressiveness; it looks like it might be missing something in the area of one task coordinating multiple others but I can't tell.

On Oct 28, 2012, at 4:52 PM, Guido van Rossum <guido at> wrote:

> The pollster has a very simple API: add_reader(fd, callback, *args),
> add_writer(<ditto>), remove_reader(fd), remove_writer(fd), and
> poll(timeout) -> list of events. (fd means file descriptor.) There's
> also pollable() which just checks if there are any fds registered. My
> implementation requires fd to be an int, but that could easily be
> extended to support other types of event sources.

I don't see how that is.  All of the mechanisms I would leverage within Twisted to support other event sources are missing (e.g.: abstract interfaces for those event sources).  Are you saying that a totally different pollster could just accept a different type to add_reader, and not an integer?  If so, how would application code know how to construct something else?

> I'm not super happy that I have parallel reader/writer APIs, but passing a separate read/write flag didn't come out any more elegant, and I don't foresee other operation types (though I may be wrong).

add_reader and add_writer is an important internal layer of the API for UNIX-like operating systems, but the design here is fundamentally flawed in that application code (e.g. needs to import concrete socket-handling classes like SocketTransport and BufferedReader in order to synthesize a transport.  These classes might need to vary their behavior significantly between platforms, and application code should not be manipulating them unless there is a serious low-level need to.

It looks like you've already addressed the fact that some transports need to be platform-specific.  That's not quite accurate, unless you take a very broad definition of "platform".  In Twisted, the basic socket-based TCP transport is actually supported across all platforms; but some other *APIs* (well, let's be honest, right now, just IOCP, but there have been others, such as Java's native I/O APIs under Jython, in the past) need transports of their own.

You have to ask the "pollster" (by which I mean: reactor) for transport objects, because different multiplexing mechanisms can require different I/O APIs, even for basic socket I/O.  This is why I keep talking about IOCP.  It's not that Windows is particularly great, but that the IOCP API, if used correctly, is fairly alien, and is a good proxy for other use-cases which are less direct to explain, like interacting with GUI libraries where you need to interact with the GUI's notion of a socket to get notifications, rather than a raw FD.  (GUI libraries often do this because they have to support Windows and therefore IOCP.)  Others in this thread have already mentioned the fact that ZeroMQ requires the same sort of affordance.  This is really a design error on 0MQ's part, but, you have to deal with it anyway ;-).

More importantly, concretely tying everything to sockets is just bad design.  You want to be able to operate on pipes and PTYs (which need to call read(), or, a bunch of gross ioctl()s and then read(), not recv()).  You want to be able to operate on these things in unit tests without involving any actual file descriptors or syscalls.  The higher level of abstraction makes regular application code a lot shorter, too: I was able to compress down to 22 lines by removing all the comments and logging and such, but that is still more than twice as long as the (9 line) echo server example on the front page of <>.  It's closer in length to the (19 line) full line-based publish/subscribe protocol over on the third tab.

Also, what about testing? You want to be able to simulate the order of responses of multiple syscalls to coerce your event-driven program to receive its events in different orders.  One of the big advantages of event driven programming is that everything's just a method call, so your unit tests can just call the methods to deliver data to your program and see what it does, without needing to have a large, elaborate simulation edifice to pretend to be a socket.  But, once you mix in the magic of the generator trampoline, it's somewhat hard to assemble your own working environment without some kind of test event source; at least, it's not clear to me how to assemble a Task without having a pollster anywhere, or how to make my own basic pollster for testing.
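
The "everything's just a method call" style of testing can be sketched as follows. EchoProtocol and FakeTransport are hypothetical names, and the method names (connection_made, data_received, write) are assumed for illustration rather than taken from Twisted or tulip.

```python
class FakeTransport:
    """Hypothetical in-memory stand-in for a socket-backed transport."""

    def __init__(self):
        self.written = []

    def write(self, data):
        self.written.append(data)


class EchoProtocol:
    """Hypothetical protocol: no sockets, only callbacks."""

    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        # Echo whatever arrives straight back to the peer.
        self.transport.write(data)


def run_echo_test(chunks):
    # A unit test delivers events by calling the methods directly, in
    # whatever order and chunking it wants to exercise.
    protocol = EchoProtocol()
    transport = FakeTransport()
    protocol.connection_made(transport)
    for chunk in chunks:
        protocol.data_received(chunk)
    return b"".join(transport.written)
```

No file descriptor, pollster, or scheduler is involved, which is the property the paragraph above is asking for.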

> The event loop has two basic ways to register callbacks:
> call_soon(callback, *args) causes callback(*args) to be called the
> next time the event loop runs; call_later(delay, callback, *args)
> schedules a callback at some time (relative or absolute) in the
> future.

"relative or absolute" is hiding the whole monotonic-clocks discussion behind a simple phrase, but that probably does not need to be resolved here... I'll let you know if we ever figure it out :).

> This implements some internet primitives using the APIs in
> (including block_r() and block_w()). I call them
> transports but they are different from transports in Twisted; they are
> closer to idealized sockets. SocketTransport wraps a plain socket,
> offering recv() and send() methods that must be invoked using yield
> from.

I feel I should note that these methods behave inconsistently; send() behaves as sendall(), re-trying its writes until it receives a full buffer, but recv() may yield a short read.
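
The asymmetry is the same one that exists between sendall() and recv() on plain sockets. A minimal sketch, using blocking stdlib sockets rather than tulip's transports (recv_exactly is a hypothetical helper): a caller that needs a fixed number of bytes has to loop, because a single recv() may return fewer.

```python
import socket


def recv_exactly(sock, nbytes):
    """Loop over recv() until nbytes have arrived; one call may be short."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError(
                "peer closed with %d bytes outstanding" % remaining
            )
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def demo():
    left, right = socket.socketpair()
    try:
        # sendall() already retries internally until everything is written.
        left.sendall(b"hello ")
        left.sendall(b"world")
        return recv_exactly(right, 11)
    finally:
        left.close()
        right.close()
```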

(But most importantly, block_r and block_w are insufficient as primitives; you need a separate pollster that uses write_then_block(data) and read_then_block() too, which may need to dispatch to WSASend/WSARecv or WriteFile/ReadFile.)

> SslTransport wraps an ssl socket (luckily in Python 2.6 and up,
> stdlib ssl sockets have good async support!).

stdlib ssl sockets have async support that makes a number of UNIX-y assumptions.  The wrap_socket trick doesn't work with IOCP, because the I/O operations are initiated within the SSL layer, and therefore can't be associated with a completion port, so they won't cause a queued completion status trigger and therefore won't wake up the loop.  This plagued us for many years within Twisted and has only relatively recently been fixed: <>.

Since probably 99% of the people on this list don't actually give a crap about Windows, let me give a more practical example: you can't do SSL over a UNIX pipe.  Off the top of my head, this means you can't write a command-line tool to encrypt a connection via a shell pipeline, but there are many other cases where you'd expect to be able to get arbitrary I/O over stdout.

It's reasonable, of course, for lots of Python applications to not care about high-performance, high-concurrency SSL on Windows; select() works okay for many applications on Windows.  And most SSL happens on sockets, not pipes, hence the existence of the OpenSSL API that the stdlib ssl module exposes for wrapping sockets.  But, as I'll explain in a moment, this is one reason that it's important to be able to give your code a turbo boost with Twisted (or other third-party extensions) once you start encountering problems like this.

> I don't particularly care about the exact abstractions in this module;
> they are convenient and I was surprised how easy it was to add SSL,
> but still these mostly serve as somewhat realistic examples of how to
> use

This is where I think we really differ.

I think that the whole attempt to build a coroutine scheduler at the low level is somewhat misguided and will encourage people to write misleading, sloppy, incorrect programs that will be tricky to debug (although, to be fair, not quite as tricky as even more misleading/sloppy/incorrect multi-threaded ones).  However, I'm more than happy to agree to disagree on this point: clearly you think that forests of yielding coroutines are a big part of the future of Python.  Maybe you're even right to do so, since I have no interest in adding language features, whereas if you hit a rough edge in 'yield' syntax you can sand it off rather than living with it.  I will readily concede that 'yield from' and 'return' are nicer than the somewhat ad-hoc idioms we ended up having to contend with in the current iteration of @inlineCallbacks.  (Except for the exit-at-a-distance problem, which it doesn't seem that return->StopIteration addresses - does this happen, with PEP-380 generators?  <>)

What I'm not happy to disagree about is the importance of a good I/O abstraction and interoperation layer.

Twisted is not going away; there are oodles of good reasons that it's built the way it is, as I've tried to describe in this and other messages, and none of our plans for its future involve putting coroutine trampolines at the core of the event loop; those are just fine over on the side with inlineCallbacks.  However, lots of Python programmers are going to use what you come up with.  They'd use it even if it didn't really work, just because it's bundled in and it's convenient.  But I think it'll probably work fine for many tasks, and it will appeal to lots of people new to event-driven I/O because of the seductive deception of synchronous control flow and the superiority to scheduling I/O operations with threads.

What I think is really very important in the design of this new system is to present an API whereby:

- if someone wants to write a basic protocol or data-format parser for the stdlib, it should be easy to write it as a feed parser without needing generator coroutines (for example, if they're pushing data into a C library, they shouldn't have to write a while loop that calls recv; they should be able to just transform some data callback in Python into some data callback in C), and it should be able to leverage tulip without much more work,
- if users of tulip (read: the stdlib) need access to some functionality implemented within Twisted, like an event-driven DNS client that is more scalable than getaddrinfo, they can call into Twisted without re-writing their entire program,
- if users of Twisted need to invoke some functionality implemented on top of tulip, they can construct a task and weave in a scheduler, similarly without re-writing much,
- if users of tulip want to just use Twisted to get better performance or reliability than the built-in stdlib multiplexor, they ideally shouldn't have to change anything, just run it with a different import line or something, and
- if (when) users of tulip realize that their generators have devolved into a mess of spaghetti ;-) and they need to migrate to Twisted-style event-driven callbacks and maybe some formal state machines or generated parsers to deal with their inputs, that process can be done incrementally and not in one giant shoot-the-moon effort which will make them hate Twisted.
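
To make the first point concrete, here is a minimal sketch of a feed parser, assuming a newline-delimited protocol. LineFeedParser is a hypothetical name; the point is only that the parser is pushed data through feed() and reports results through a callback, so it needs neither a recv() loop nor a generator.

```python
class LineFeedParser:
    """Hypothetical feed parser: push bytes in, get complete lines out."""

    def __init__(self, line_received):
        self._line_received = line_received
        self._buffer = b""

    def feed(self, data):
        # Accumulate input and emit every complete line seen so far;
        # any trailing partial line stays buffered for the next call.
        self._buffer += data
        while True:
            line, newline, rest = self._buffer.partition(b"\n")
            if not newline:
                break
            self._buffer = rest
            self._line_received(line)


def parse_chunks(chunks):
    lines = []
    parser = LineFeedParser(lines.append)
    for chunk in chunks:
        parser.feed(chunk)
    return lines
```

Any event source that can call feed(), whether a callback-style protocol or a coroutine, can drive the same parser unchanged.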

As an added bonus, such an API would provide a great basis for Tornado and Twisted to interoperate.

It would also be nice to have a more discrete I/O layer to insulate application code from common foibles like the fact that, for example, if you call send() in tulip multiple times but forget to 'yield from ...send()', you may end up writing interleaved garbage on the connection, then raising an assertion error, but only if there's a sufficient quantity of data and it needs to block; it will otherwise appear to work, leading to bugs that only start happening when you are pushing large volumes of data through a system at rates exceeding wire speed.  In other words, "only in production, only during the holiday season, only during traffic spikes, only when it's really really important for the system to keep working".

This is why I think that step 1 here needs to be a common low-level API for event-triggered operations that does not have anything to do with generators.  I don't want to stop you from doing interesting things with generators, but I do really want to decouple the tasks so that their responsibilities are not unnecessarily conflated.

task.unblock() is a method; protocol.data_received is a method.  Both can be invoked at the same level by an event loop.  Once that low-level event loop is delivering data to that callback's satisfaction, the callbacks can happily drive a coroutine scheduler, and the coroutine scheduler can have much less of a deep integration with the I/O itself; it just needs some kind of sentinel object (a Future, a Deferred) to keep track of what exactly it's waiting for.
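
A toy sketch of that layering, with every name hypothetical (SimpleFuture, OneShotReadProtocol, run_coroutine): the event loop only ever calls data_received(), and a small driver resumes a generator whenever the sentinel object it is waiting on gets a result. It is not how tulip or Twisted implement this.

```python
class SimpleFuture:
    """Hypothetical sentinel: records a result and fires callbacks once."""

    def __init__(self):
        self._callbacks = []
        self.done = False
        self.result = None

    def add_callback(self, callback):
        if self.done:
            callback(self.result)
        else:
            self._callbacks.append(callback)

    def set_result(self, result):
        self.done = True
        self.result = result
        callbacks, self._callbacks = self._callbacks, []
        for callback in callbacks:
            callback(result)


class OneShotReadProtocol:
    """Callback-style protocol; data with no pending read() is dropped (toy)."""

    def __init__(self):
        self._waiting = None

    def read(self):
        self._waiting = SimpleFuture()
        return self._waiting

    def data_received(self, data):
        waiting, self._waiting = self._waiting, None
        if waiting is not None:
            waiting.set_result(data)


def run_coroutine(generator):
    # Resume the generator each time the future it yielded is resolved.
    def step(value=None):
        try:
            future = generator.send(value)
        except StopIteration:
            return
        future.add_callback(step)

    step()


def reader(protocol, out):
    first = yield protocol.read()
    second = yield protocol.read()
    out.append(first + second)
```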

> I'm most interested in feedback on the design of and
>, and to a lesser extent on the design of;
> is just an example of how this style works out in practice.

It looks to me like there's a design error in with respect to coordinating concurrent operations.  If you try to block on two operations at once, you'll get an assertion error ('assert not self.blocked', in block), so you can't coordinate two interesting I/O requests without spawning a bunch of new Tasks and then having them unblock their parent Task when they're done.  I may just be failing to imagine how one would implement something like Twisted's gatherResults, but this looks like it would be frustrating, tedious, and involve creating lots of extra objects and making the scheduler do a bunch more work.
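
For comparison, a gatherResults-like helper is short when it is built on callbacks. The sketch below uses concurrent.futures.Future from the stdlib purely as a convenient sentinel object; gather_results is a hypothetical name, the code assumes a single-threaded event loop, and it ignores cancellation.

```python
from concurrent.futures import Future


def gather_results(futures):
    """Return a Future resolved with the list of all results, in order."""
    futures = list(futures)
    combined = Future()
    results = [None] * len(futures)
    remaining = [len(futures)]
    if not futures:
        combined.set_result([])
        return combined

    def make_callback(index):
        def on_done(future):
            if combined.done():
                return
            error = future.exception()
            if error is not None:
                # The first failure wins; later results are ignored.
                combined.set_exception(error)
                return
            results[index] = future.result()
            remaining[0] -= 1
            if remaining[0] == 0:
                combined.set_result(results)

        return on_done

    for index, future in enumerate(futures):
        future.add_done_callback(make_callback(index))
    return combined
```

One waiter coordinates any number of operations here without spawning an intermediate task per operation.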

Also, shouldn't there be a lot more real exceptions and a lot fewer assertions in this code?

Relatedly, add_reader/writer will silently stomp on a previous FD registration, so if two tasks end up calling recv() on the same socket, it doesn't look like there's any way to find out that they both did that.  It looks like the first task to call it will just hang forever, and the second one will "win"?  What are the intended semantics?

Speaking from the perspective of I/O scheduling, it will also be thrashing any stateful multiplexor with a ton of unnecessary syscalls.  A Twisted protocol in normal operation just receiving data from a single connection, using, let's say, a kqueue-based multiplexor will call kevent() once to register interest, then kevent() again to block, and then just keep getting data-available notifications and processing them unless some downstream buffer fills up and the transport is told to pause producing data, at which point another kevent() gets issued.  tulip, by contrast, will call kevent() over and over again, removing and then re-adding its reader repeatedly for every packet, since it can never know if someone is about to call recv() again any time soon.  Once again, request/response is not the best model for retrieving data from a transport; active connections need to be prepared to receive more data at any time and not in response to any particular request.
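
The register-once pattern can be sketched with the stdlib selectors module (added in Python 3.4, so later than this thread). The function name is hypothetical and a socketpair stands in for a real connection; the point is that one registration keeps delivering readiness notifications for every message.

```python
import selectors
import socket


def receive_with_single_registration(messages):
    selector = selectors.DefaultSelector()
    sender, receiver = socket.socketpair()
    receiver.setblocking(False)
    registrations = 0
    received = []
    try:
        # Interest is registered exactly once for the connection's lifetime.
        selector.register(receiver, selectors.EVENT_READ)
        registrations += 1
        for message in messages:
            sender.sendall(message)
            # Every wait reuses the existing registration.
            for key, _mask in selector.select(timeout=5.0):
                received.append(key.fileobj.recv(4096))
    finally:
        selector.close()
        sender.close()
        receiver.close()
    return registrations, received
```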

Finally, apologies for spelling / grammar errors; I didn't have a lot of time to copy-edit.


From glyph at  Wed Oct 31 11:20:29 2012
From: glyph at (Glyph)
Date: Wed, 31 Oct 2012 03:20:29 -0700
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

On Oct 29, 2012, at 5:25 PM, Greg Ewing <greg.ewing at> wrote:

> Andrew Svetlov wrote:
>> 0MQ socket has no file descriptor at all, it's just pointer to some
>> unspecified structure.
>> So 0MQ has own *poll* function which can process that sockets as well
>> as file descriptors.
> Aaargh... yet another event loop that wants to rule
> the world. This is not good.

As a wise man once said, "everybody wants to rule the world".

All event loops have their own run() API, and expect to be on top of everything, driving the loop.  This is one of the central principles of Twisted's design; by not attempting to directly take control of any loop, and providing a high-level wrapper around run, and an API that would accommodate every wacky wrapper around poll and select and kqueue and GetQueuedCompletionStatus, we could be a single loop that everything can use as an API and get the advantages of whatever event driven thing is popular this week.

You can't accomplish this by trying to force other loops to play by your rules; rather, accommodate and pave over their peculiarities and it'll be your API that their users actually write to.

(In the land of Mordor, where the shadows lie.)


From kristjan at  Wed Oct 31 11:07:10 2012
From: kristjan at (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=)
Date: Wed, 31 Oct 2012 10:07:10 +0000
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

> -----Original Message-----
> From: gvanrossum at [mailto:gvanrossum at] On Behalf
> Of Guido van Rossum
> Sent: 30. október 2012 16:40
> To: Kristján Valur Jónsson
> Cc: Richard Oudkerk; python-ideas at
> Subject: Re: [Python-ideas] Async API: some code to review
> What kind of time savings are we talking about? I imagine that the
> accept() loop I put in tulip/ is fast enough in terms of response
> time (latency) -- throughput would seem the more important measure (and I
> have no idea of this yet).
To be honest, it isn't serious for applications that serve few connections, but for things like web servers, it becomes important.
Looking at your code:

a) will always "block", causing the main thread (using the term loosely here) to go once through the event loop, possibly doing other housekeeping, even if a connection was available.  I don't think there is any way to selectively do completion-based IO, i.e. do immediate mode if possible.  You either go for one or the other, on Windows at least.  In select-based mechanisms it could be possible to do a select here first and avoid that extra loop, but for the sake of the application it might be confusing.  It might be best to stick to one system.
b) will either switch to the new task immediately (possible in stackless) or cause the start of t to wait until the next round in the event loop.

In this case, t will not start executing until after going around the loop twice.  Only one new connection can be accepted per loop.  Imagine two http requests coming in simultaneously, at t=0.

The sequence of operations will then be this (assuming FIFO scheduling)
main loop runs
accept 1 returns. task 1 created.  accept 2 scheduled
main loop runs, making task 1 and accept 2 runnable
task 1 runs.  does processing. performs send, and blocks
accept2 returns, task2 created
main loop runs, making task2 runnable
task2 runs, does processing, performs send.

Contributing to latency in this scenario are all the "main loop" runs.  Note that I may misunderstand the way your architecture works, perhaps there is no main loop, perhaps everything is interleaved.

An alternative is something like this:
def loop():
    while True:
        conn, addr = yield from listener.accept()
        handler(conn, addr)

# Start several accept loops up front so each can pick up a connection.
for i in range(n_handlers):
    t = scheduling.Task(loop)

Here, events will be different:
main loop runs, accept 1 and accept 2 runnable
accept 1 returns, starting handler, processing and blocking on send
accept 2 returns, starting handler, processing, and blocking on send

As you see, there is only one initial housekeeping run needed to make both tasklets runnable and ready to run without interruption, giving the lowest possible total latency to the client.

In my experience with RPC systems based on this kind of asynchronous python IO, lowering the response time from when user space is made aware of the request to when python actually starts _processing_ it is critical to responsiveness.


From barry at  Wed Oct 31 11:38:53 2012
From: barry at (Barry Warsaw)
Date: Wed, 31 Oct 2012 11:38:53 +0100
Subject: [Python-ideas] with-statement syntactic quirk
Message-ID: <20121031113853.66fb0514@resist>

with-statements have a syntactic quirk, which I think would be useful to fix.
This is true in Python 2.7 through 3.3, but it's likely not fixable until 3.4,
unless of course it's a bug <wink>.

Legal:

>>> with open('/etc/passwd') as p1, open('/etc/passwd') as p2: pass

Not legal:

>>> with (open('/etc/passwd') as p1, open('/etc/passwd') as p2): pass

Why is this useful?  If you need to wrap this onto multiple lines, say to fit
it within line length limits.  IWBNI you could write it like this:

    with (open('/etc/passwd') as p1,
          open('/etc/passwd') as p2):
        pass

This seems analogous to using parens to wrap long if-statements, but maybe
there's some subtle corner of the grammar that makes this problematic (like
'with' treating the whole thing as a single context manager).

Of course, you can wrap with backslashes, but ick!
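
For reference, a sketch of two spellings that avoid the parenthesized form: the backslash continuation mentioned above, and contextlib.ExitStack (new in Python 3.3). The function names are hypothetical and temporary files stand in for /etc/passwd.

```python
import contextlib
import os
import tempfile


def read_pair_backslash(path1, path2):
    # Backslash continuation: legal, but fragile if anything follows the "\".
    with open(path1) as p1, \
         open(path2) as p2:
        return p1.read(), p2.read()


def read_pair_exitstack(path1, path2):
    # ExitStack enters each manager on its own line and unwinds them in
    # reverse order when the block exits.
    with contextlib.ExitStack() as stack:
        p1 = stack.enter_context(open(path1))
        p2 = stack.enter_context(open(path2))
        return p1.read(), p2.read()


def demo():
    # Hypothetical demo data written to a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "first.txt")
        second = os.path.join(tmp, "second.txt")
        for path, text in ((first, "one"), (second, "two")):
            with open(path, "w") as handle:
                handle.write(text)
        return (
            read_pair_backslash(first, second),
            read_pair_exitstack(first, second),
        )
```

Python 3.10 and later accept the parenthesized multi-line form directly, so these workarounds matter mainly for older versions.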


From ncoghlan at  Wed Oct 31 12:55:54 2012
From: ncoghlan at (Nick Coghlan)
Date: Wed, 31 Oct 2012 21:55:54 +1000
Subject: [Python-ideas] with-statement syntactic quirk
In-Reply-To: <20121031113853.66fb0514@resist>
References: <20121031113853.66fb0514@resist>
Message-ID: <>

On Wed, Oct 31, 2012 at 8:38 PM, Barry Warsaw <barry at> wrote:
> with-statements have a syntactic quirk, which I think would be useful to fix.
> This is true in Python 2.7 through 3.3, but it's likely not fixable until 3.4,
> unless of course it's a bug <wink>.
> Legal:
> >>> with open('/etc/passwd') as p1, open('/etc/passwd') as p2: pass
> Not legal:
> >>> with (open('/etc/passwd') as p1, open('/etc/passwd') as p2): pass
> Why is this useful?  If you need to wrap this onto multiple lines, say to fit
> it within line length limits.  IWBNI you could write it like this:
>     with (open('/etc/passwd') as p1,
>           open('/etc/passwd') as p2):
>           pass
> This seems analogous to using parens to wrap long if-statements, but maybe
> there's some subtle corner of the grammar that makes this problematic (like
> 'with' treating the whole thing as a single context manager).

It's not an especially subtle corner of the grammar, it's
tuples-as-context-managers (i.e. the case with no as clauses) that
causes hassles:

     with (cmA, cmB):

This is: a) useless (because tuples aren't context managers); but also
b) legal syntax (it blows up at runtime, complaining about a missing
__enter__ or __exit__ method rather than throwing SyntaxError at
compile time)
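
A small sketch of that behaviour. The helper names are hypothetical. The tuple is bound to a name first so that the runtime failure is shown regardless of how a given Python version parses the parenthesized form, and both AttributeError and TypeError are caught because the exception type differs between versions.

```python
import contextlib


@contextlib.contextmanager
def managed(name, log):
    # Trivial context manager that records when it is entered and exited.
    log.append("enter " + name)
    yield name
    log.append("exit " + name)


def try_tuple_as_context_manager():
    log = []
    pair = (managed("A", log), managed("B", log))
    try:
        with pair:  # a tuple has no __enter__/__exit__
            pass
    except (AttributeError, TypeError) as error:
        return type(error).__name__, log
    return None, log


# The statement itself is accepted by the compiler; nothing is executed here.
compiled = compile("with (cmA, cmB):\n    pass\n", "<example>", "exec")
error_name, log = try_tuple_as_context_manager()
```

Neither manager is ever entered, so the log stays empty: the failure happens when the with statement looks up the context manager protocol on the tuple.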

Adding support for line continuation with parentheses to import
statements was easier, because they don't accept arbitrary
subexpressions, so there was no confusion with tuples.

I do think it makes sense to change the semantics of this, but I ain't
volunteering to figure out the necessary Grammar changes :P


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From jeanpierreda at  Wed Oct 31 13:17:10 2012
From: jeanpierreda at (Devin Jeanpierre)
Date: Wed, 31 Oct 2012 08:17:10 -0400
Subject: [Python-ideas] with-statement syntactic quirk
In-Reply-To: <20121031113853.66fb0514@resist>
References: <20121031113853.66fb0514@resist>
Message-ID: <>

On Wed, Oct 31, 2012 at 6:38 AM, Barry Warsaw <barry at> wrote:
> This seems analogous to using parens to wrap long if-statements, but maybe
> there's some subtle corner of the grammar that makes this problematic (like
> 'with' treating the whole thing as a single context manager).

This seemed kind of icky when I read it, and I think Nick Coghlan
stated the reason best.

Is there a reason the tokenizer can't ignore newlines and
indentation/deindentation between with/etc. and the trailing colon?
This would solve the problem in general, without ambiguous syntax.

-- Devin

From eliben at  Wed Oct 31 13:33:04 2012
From: eliben at (Eli Bendersky)
Date: Wed, 31 Oct 2012 05:33:04 -0700
Subject: [Python-ideas] with-statement syntactic quirk
In-Reply-To: <>
References: <20121031113853.66fb0514@resist>
Message-ID: <>

On Wed, Oct 31, 2012 at 5:17 AM, Devin Jeanpierre <jeanpierreda at> wrote:

> On Wed, Oct 31, 2012 at 6:38 AM, Barry Warsaw <barry at> wrote:
> > This seems analogous to using parens to wrap long if-statements, but
> maybe
> > there's some subtle corner of the grammar that makes this problematic
> (like
> > 'with' treating the whole thing as a single context manager).
> This seemed kind of icky when I read it, and I think Nick Coghlan
> stated the reason best.
> Is there a reason the tokenizer can't ignore newlines and
> indentation/deindentation between with/etc. and the trailing colon?
> This would solve the problem in general, without ambiguous syntax.

At the expense of making the tokenizer context dependent?


From jeanpierreda at  Wed Oct 31 13:45:00 2012
From: jeanpierreda at (Devin Jeanpierre)
Date: Wed, 31 Oct 2012 08:45:00 -0400
Subject: [Python-ideas] with-statement syntactic quirk
In-Reply-To: <>
References: <20121031113853.66fb0514@resist>
Message-ID: <>

On Wed, Oct 31, 2012 at 8:33 AM, Eli Bendersky <eliben at> wrote:
>> Is there a reason the tokenizer can't ignore newlines and
>> indentation/deindentation between with/etc. and the trailing colon?
>> This would solve the problem in general, without ambiguous syntax.
> At the expense of making the tokenizer context dependent?

It's already context-dependent in some sense, but this wouldn't make
it any moreso. For example, the tokenizer already ignores
indents/dedents when inside parens/braces/brackets, and handling this is
only slightly more complex than that. In particular, the trailing
colon is the one not inside braces or brackets.

Also, I'd avoid the term "context-dependent". It sounds too similar to
"context-sensitive" !

Anyway, it looks like this isn't how the tokenizer treats
braces/brackets (it ignores indent/dedent, but not newlines (I guess
the grammar handles those)). What I meant to suggest was, treat "with
... :" similarly to how the OP suggests treating "with (...) :".
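
The existing behaviour inside brackets can be inspected with the stdlib tokenize module. This is a sketch that assumes the pure-Python interface reflects what the C tokenizer does here: inside parentheses the line breaks show up as NL (non-logical newline) tokens and no INDENT or DEDENT tokens are produced, whatever the indentation.

```python
import io
import tokenize

# Deliberately irregular indentation inside the parentheses.
SOURCE = (
    "x = (1,\n"
    "        2,\n"
    "    3)\n"
)


def token_names(source):
    # Return the token type names produced for a piece of source text.
    readline = io.StringIO(source).readline
    return [
        tokenize.tok_name[token.type]
        for token in tokenize.generate_tokens(readline)
    ]


names = token_names(SOURCE)
block_names = token_names("if a:\n    b\n")
```

Here names contains two NL tokens and a single NEWLINE at the end of the logical line, while block_names, for an ordinary indented block, does contain INDENT and DEDENT.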

-- Devin

From eliben at  Wed Oct 31 13:52:04 2012
From: eliben at (Eli Bendersky)
Date: Wed, 31 Oct 2012 05:52:04 -0700
Subject: [Python-ideas] with-statement syntactic quirk
In-Reply-To: <>
References: <20121031113853.66fb0514@resist>
Message-ID: <>

On Wed, Oct 31, 2012 at 5:45 AM, Devin Jeanpierre <jeanpierreda at> wrote:

> On Wed, Oct 31, 2012 at 8:33 AM, Eli Bendersky <eliben at> wrote:
> >> Is there a reason the tokenizer can't ignore newlines and
> >> indentation/deindentation between with/etc. and the trailing colon?
> >> This would solve the problem in general, without ambiguous syntax.
> >
> > At the expense of making the tokenizer context dependent?
> It's already context-dependent in some sense, but this wouldn't make
> it any moreso. For example, the tokenizer already ignores
> indents/dedents when inside parens/braces/brackets, and handling this
> only slightly more complex than that. In particular, the trailing
> colon is the one not inside braces or brackets.
> Also, I'd avoid the term "context-dependent". It sounds too similar to
> "context-sensitive" !

I use the two as rough synonyms. Shouldn't I?

> Anyway, it looks like this isn't how the tokenizer treats
> braces/brackets (it ignores indent/dedent, but not newlines (I guess
> the grammar handles those)). What I meant to suggest was, treat "with
> ... :" similarly to how the OP suggests treating "with (...) :".

If this gets accepted, then, is there a reason to stop at "with"? Why not
ignore newlines between "if" and its trailing ":" as well? [playing devil's
advocate here]


From jeanpierreda at  Wed Oct 31 14:14:07 2012
From: jeanpierreda at (Devin Jeanpierre)
Date: Wed, 31 Oct 2012 09:14:07 -0400
Subject: [Python-ideas] with-statement syntactic quirk
In-Reply-To: <>
References: <20121031113853.66fb0514@resist>
Message-ID: <>

On Wed, Oct 31, 2012 at 8:52 AM, Eli Bendersky <eliben at> wrote:
>> Also, I'd avoid the term "context-dependent". It sounds too similar to
>> "context-sensitive" !
> I use the two as rough synonyms. Shouldn't I?

"context sensitive" has a technical meaning, in the same way that
"regular" or "recursively enumerable" does. In this particular case,
the technical meaning doesn't align very well with the lay / intuitive
meaning, but gets used in the same place as where one might use the
phrase in the lay / intuitive sense -- if you'd said "context
sensitive" I would've assumed you meant it in the technical sense.

I guess I can't say that you should avoid the term unless I have a
replacement. Maybe just using more words would help, like saying "then
the actions of the tokenizer would depend on the context"?

>> Anyway, it looks like this isn't how the tokenizer treats
>> braces/brackets (it ignores indent/dedent, but not newlines (I guess
>> the grammar handles those)). What I meant to suggest was, treat "with
>> ... :" similarly to how the OP suggests treating "with (...) :".
> If this gets accepted, then, is there a reason to stop at "with"? Why not
> ignore newlines between "if" and its trailing ":" as well? [playing devil's
> advocate here]

I'd be very confused if newlines were acceptable inside `with` but not
`if` and those.

I'm not seeing a downside to changing them as well, except that it
makes the workload (maybe significantly?) larger. I'm not sure if it's
made that much larger. In the tokenizer it's easy, maybe in the
grammar it's not so easy, and I don't know if this has to be in the
grammar. The last time I ever tried editing python's parsing rules it
ended very very poorly.

-- Devin

From ncoghlan at  Wed Oct 31 14:22:42 2012
From: ncoghlan at (Nick Coghlan)
Date: Wed, 31 Oct 2012 23:22:42 +1000
Subject: [Python-ideas] with-statement syntactic quirk
In-Reply-To: <>
References: <20121031113853.66fb0514@resist>
Message-ID: <>

On Wed, Oct 31, 2012 at 10:52 PM, Eli Bendersky <eliben at> wrote:
> On Wed, Oct 31, 2012 at 5:45 AM, Devin Jeanpierre <jeanpierreda at>
> wrote:
>> Anyway, it looks like this isn't how the tokenizer treats
>> braces/brackets (it ignores indent/dedent, but not newlines (I guess
>> the grammar handles those)). What I meant to suggest was, treat "with
>> ... :" similarly to how the OP suggests treating "with (...) :".
> If this gets accepted, then, is there a reason to stop at "with"? Why not
> ignore newlines between "if" and its trailing ":" as well? [playing devil's
> advocate here]

Note that I agreed with Barry that we probably *should* change it from
a principle-of-least-surprise point of view. I just called "not it" on
actually figuring out how to make it work given the current Grammar
definition as a starting point :)

Between expression precedence control, singleton tuples, generator
expressions, function calls, function parameter declarations, base
class declarations, import statement grouping and probably a couple of
other cases that have slipped my mind, parentheses already have plenty
of different meanings in Python, and we also have plenty of places
where the syntactical rules aren't quite the same as those in an
ordinary expression.

The thing that makes Python's parser simple is the fact that we have
*prefixes* in the Grammar that make it clear when the parsing rules
should change, so you don't need much lookahead at parsing time (it's
deliberately limited to only 1 token, in fact). The challenge in this
particular case is to avoid a Grammar ambiguity relative to ordinary
expression syntax without duplicating large sections of the grammar
file.


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From guido at  Wed Oct 31 16:01:03 2012
From: guido at (Guido van Rossum)
Date: Wed, 31 Oct 2012 08:01:03 -0700
Subject: [Python-ideas] non-blocking buffered I/O
In-Reply-To: <>
References: <>
Message-ID: <>

Modern CPUs are black boxes full of magic. I'm not too surprised that
running Python code on multiple threads incurs some kind of overhead
that keeping the Python interpreter in one thread avoids.

On Wed, Oct 31, 2012 at 2:29 AM, Kristján Valur Jónsson
<kristjan at> wrote:
>> -----Original Message-----
>> From: gvanrossum at [mailto:gvanrossum at] On Behalf
>> Of Guido van Rossum
>> Sent: 30. október 2012 17:47
>> To: Kristján Valur Jónsson
>> Cc: python-ideas at
>> Subject: Re: [Python-ideas] non-blocking buffered I/O
>> On Tue, Oct 30, 2012 at 9:11 AM, Kristján Valur Jónsson
>> <kristjan at> wrote:
>> > By the way:  We found that acquiring the GIL by a random external thread
>> in response to the IOCP to wake up tasklets was incredibly expensive.  I
>> spent a lot of effort figuring out why that is and found no real answer.  The
>> mechanism we now use is to let the external worker thread schedule a
>> "pending call" which is serviced by the main thread at the earliest
>> opportunity.  Also, the main thread is interrupted if it is doing a sleep.  This is
>> much more efficient.
>> In which Python version? The GIL has been redesigned at least once.
>> Also the latency (not necessarily cost) to acquire the GIL varies by the
>> sys.setswitchinterval setting. (Actually the more responsive you make it, the
>> more it will cost you in overall performance.)
>> I do think that using the pending call mechanism is the right solution here.
> I am talking about 2.7, of course, the python of hard working lumberjacks everywhere :)
> Anyway I don't think the issue is much affected by the particular GIL implementation.
> Alternative a)
>         Callback comes on arbitrary thread
>         arbitrary thread calls PyGILState_Ensure
>                 (This causes a _dynamic thread state_ to be generated for the arbitrary thread, and the GIL to be subsequently acquired)
>         arbitrary thread does whatever python gymnastics are required to complete the IO (wake up tasklet)
>         arbitrary thread calls PyGILState_Release
> For whatever reason, this approach _increased CPU usage_ on a loaded server.  Latency was fine, throughput the same, and the delay in actual GIL acquisition was ok.  I suspect that the problem lies with the dynamic acquisition of a thread state, and other initialization that may occur.  I did experiment with having a cache of unused threadstates on the ready for external threads, but it didn't get me anywhere.  This could also be the result of cache thrashing or something that doesn't show up immediately on a multicore cpu.
> Alternative b)
>         Callback comes on arbitrary thread
>         external thread calls PyEval_SchedulePendingCall()
>                 This grabs a static lock, puts in a record, and signals to python that something needs to be done immediately.
>         external thread calls a custom function to interrupt the main thread in the IO bound application, currently most likely sleeping in a WaitForMultipleObjects() with a timeout.
>         Main thread wakes up from its sleep (if it was sleeping).
>         Main thread runs python code, causing it to immediately service the scheduled pending call, causing it to perform the wait.
> In reality, StacklessIO uses a slight variation of the above:
> StacklessIO dispatch system
>         Callback comes on arbitrary thread
>         external thread schedules a completion event in its own "dispatch" buffer to be serviced by the main thread.  This is protected by its own lock, and doesn't need the GIL.
>         external thread calls PyEval_SchedulePendingCall() to "tick" the dispatch buffer
>         external thread calls a custom function to interrupt the main thread in the IO bound application, currently most likely sleeping in a WaitForMultipleObjects() with a timeout.
>         If main thread is sleeping: Main thread wakes up from its sleep
>                 Immediately after sleeping, the main thread will 'tick' the dispatch queue
>                        After ticking, tasklets may have been made runnable, so the main thread may continue out into the main loop of the application to do work.  If not, it may continue sleeping.
>         Main thread runs python code, causing it to immediately service the scheduled pending call, which will tick the dispatch queue.  This may be a no-op if the main thread was sleeping and was already ticked.
> The issue we were facing was not with latency (although grabbing the GIL when the main thread is busy is slower than notifying it of a pending call), but with unexplained increased cpu showing up.  A proxy node servicing 2000 clients or upwards would suddenly double or triple its cpu.
> The reason I'm mentioning this here is that this is important.  We have spent quite some time and energy on trying to figure out the most efficient way to complete IOCP from an arbitrary thread and this is the end result.  Perhaps things can be done to improve this.  Also, it is really important to study these things under real load, experience has shown me that the most innocuous changes that work well in the lab suddenly start behaving strangely in the field.
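
For readers trying to picture the dispatch scheme in the quoted message, here is a rough pure-Python sketch. It is not StacklessIO's code: the Dispatcher class and its method names are invented for illustration, a threading.Event stands in for interrupting a sleep in WaitForMultipleObjects(), and queue.SimpleQueue plays the role of the dispatch buffer with its own lock.

```python
import queue
import threading


class Dispatcher:
    """Hypothetical stand-in for the dispatch buffer described above."""

    def __init__(self):
        # The queue has its own internal lock, so a worker thread can post
        # a completion without running any Python-level scheduling code.
        self._completions = queue.SimpleQueue()
        self._wakeup = threading.Event()

    def post(self, callback, *args):
        """Called from an arbitrary worker thread when an operation completes."""
        self._completions.put((callback, args))
        self._wakeup.set()  # interrupt the main thread if it is sleeping

    def tick(self):
        """Called from the main thread: run every queued completion."""
        ran = 0
        while True:
            try:
                callback, args = self._completions.get_nowait()
            except queue.Empty:
                return ran
            callback(*args)
            ran += 1

    def sleep(self, timeout):
        """Main-thread sleep that ends early when a completion is posted."""
        self._wakeup.wait(timeout)
        self._wakeup.clear()
        return self.tick()


dispatcher = Dispatcher()
results = []

# The "external" thread only posts; the callback itself runs on the main thread.
worker = threading.Thread(target=dispatcher.post, args=(results.append, "done"))
worker.start()
worker.join()

serviced = dispatcher.sleep(1.0)
leftover = dispatcher.tick()  # a second tick finds nothing to do
```

As in the description, a tick that finds the buffer already drained is simply a no-op.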

--Guido van Rossum (

From kristjan at  Wed Oct 31 16:10:10 2012
From: kristjan at (Kristján Valur Jónsson)
Date: Wed, 31 Oct 2012 15:10:10 +0000
Subject: [Python-ideas] non-blocking buffered I/O
In-Reply-To: <>
References: <>
Message-ID: <>

> -----Original Message-----
> From: gvanrossum at [mailto:gvanrossum at] On Behalf
> Of Guido van Rossum
> Sent: 31. október 2012 15:01
> To: Kristján Valur Jónsson
> Cc: python-ideas at
> Subject: Re: [Python-ideas] non-blocking buffered I/O
> Modern CPUs are black boxes full of magic. I'm not too surprised that running
> Python code on multiple threads incurs some kind of overhead that keeping
> the Python interpreter in one thread avoids.
Ah, but I forgot to mention one weird thing:
If we used a pool of threads for the callbacks, and pre-initialized those threads with python states, and then acquired the GIL using
PyEval_RestoreThread(), then this overhead went away.
It was only the dynamic thread state acquired using PyGILState_Ensure() that caused cpu overhead.
Using the fixed pool was not acceptable in the long run, in particular we didn't want to complicate things to another level by adding a thread pool manager to the whole thing when the OS is fully capable of providing an external callback thread.

I regret not spending more time on this so as to be able to provide an actual performance analysis and fix.  Instead I have to be that weird old man in the tavern uttering inscrutable warnings that no young adventurer pays any attention to :)


From guido at  Wed Oct 31 16:13:46 2012
From: guido at (Guido van Rossum)
Date: Wed, 31 Oct 2012 08:13:46 -0700
Subject: [Python-ideas] with-statement syntactic quirk
In-Reply-To: <20121031113853.66fb0514@resist>
References: <20121031113853.66fb0514@resist>
Message-ID: <>

Honestly, is a backslash going to kill you?

--Guido van Rossum (

From ncoghlan at  Wed Oct 31 16:28:01 2012
From: ncoghlan at (Nick Coghlan)
Date: Thu, 1 Nov 2012 01:28:01 +1000
Subject: [Python-ideas] with-statement syntactic quirk
In-Reply-To: <>
References: <20121031113853.66fb0514@resist>
Message-ID: <>

On Thu, Nov 1, 2012 at 1:13 AM, Guido van Rossum <guido at> wrote:
> Honestly, is a backslash going to kill you?

Aye, given the cost-benefit ratio on this one, I'll be rather
surprised if anyone ever actually fixes it. I just wanted to be clear
that I'm not *philosophically* opposed to fixing it (since I think
Barry's proposed behaviour makes more sense from a user perspective),
I'm just fairly sure it's likely to be hard to fix without making the
Grammar harder to maintain, which *would* be a difficult sell for
something that's a relatively trivial wart :)


Nick Coghlan   |   ncoghlan at   |   Brisbane, Australia

From guido at  Wed Oct 31 16:37:01 2012
From: guido at (Guido van Rossum)
Date: Wed, 31 Oct 2012 08:37:01 -0700
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

Ok, this is a good point: the more you can do without having to go
through the main loop again the better.

I already took this to heart in my recent rewrites of recv() and
send() -- they try to read/write the underlying socket first, and if
it works, the task isn't suspended; only if they receive EAGAIN or
something similar do they block the task and go back to the top.
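
The try-first pattern can be sketched roughly as follows. This is not the tulip code: try_recv() is a made-up helper, and returning None stands in for "suspend the task until the socket is readable". It only assumes a non-blocking socket that raises BlockingIOError (EAGAIN/EWOULDBLOCK) when nothing is ready.

```python
import socket


def try_recv(sock, nbytes):
    """Attempt an immediate read on a non-blocking socket.

    Returns the data if some was ready, or None to signal that the caller
    should suspend until the socket becomes readable.
    """
    try:
        return sock.recv(nbytes)
    except BlockingIOError:  # EAGAIN / EWOULDBLOCK
        return None


reader, writer = socket.socketpair()
reader.setblocking(False)

first = try_recv(reader, 100)   # nothing has been sent yet
writer.sendall(b"hello")
second = try_recv(reader, 100)  # data is already buffered locally

reader.close()
writer.close()
```

Only the first call would need a trip back to the scheduler; the second completes without suspending.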

In fact, Listener.accept() does the same thing -- meaning the loop can
go around many times without blocking a single time. (The listening
socket is in non-blocking mode so accept() will raise EAGAIN when
there *isn't* another client connection ready immediately.)

This is also one of the advantages of yield-from; you *never* go back
to the end of the ready queue just to invoke another layer of
abstraction. (Steve tries to approximate this by running the generator
immediately until the first yield, but the caller still ends up
suspending to the scheduler, because they are using yield which
doesn't avoid the suspension, unlike yield-from.)


On Wed, Oct 31, 2012 at 3:07 AM, Kristján Valur Jónsson
<kristjan at> wrote:
>> -----Original Message-----
>> From: gvanrossum at [mailto:gvanrossum at] On Behalf
>> Of Guido van Rossum
>> Sent: 30. október 2012 16:40
>> To: Kristján Valur Jónsson
>> Cc: Richard Oudkerk; python-ideas at
>> Subject: Re: [Python-ideas] Async API: some code to review
>> What kind of time savings are we talking about? I imagine that the
>> accept() loop I put in tulip/ is fast enough in terms of response
>> time (latency) -- throughput would seem the more important measure (and I
>> have no idea of this yet).
> To be honest, it isn't serious for applications that serve few connections, but for things like web servers, it becomes important.
> Looking at your code:
> c
> a) will always "block", causing the main thread (using the term loosely here) to go once through the event loop, possibly doing other housekeeping, even if a connection was available.  I don't think there is any way to selectively do completion-based IO, i.e. do immediate mode if possible.  You either go for one or the other, on Windows at least.  In select-based mechanisms it could be possible to do a select here first and avoid that extra loop, but for the sake of the application it might be confusing.  It might be best to stick to one system.
> b) will either switch to the new task immediately (possible in stackless) or cause the start of t to wait until the next round in the event loop.
> In this case, t will not start executing until after going around the loop twice.  A new connection can only be accepted each loop.  Imagine two http requests coming in simultaneously, at t=0
> The sequence of operations will then be this (assuming FIFO scheduling)
> main loop runs
> accept 1 returns. task 1 created.  accept 2 scheduled
> main loop runs making task 1 and accept 2 runnable
> task 1 runs.  does processing. performs send, and blocks
> accept2 returns, task2 created
> main loop runs, making task2 runnable
> task2 runs, does processing, performs send.
> Contributing to latency in this scenario are all the "main loop" runs.  Note that I may misunderstand the way your architecture works, perhaps there is no main loop, perhaps everything is interleaved.
> An alternative is something like this:
> def loop():
>         while True:
>                 conn, addr = yield from listener.accept()
>                 handler(conn, addr)
> for i in range(n_handlers):
>         t = scheduling.Task(loop)
>         t.start()
> Here, events will be different:
> main loop runs, accept 1 and accept 2 runnable
> accept 1 returns, starting handler, processing and blocking on send
> accept 2 returns, starting handler, processing, and blocking on send
> As you see, there is only one initial housekeeping run needed to make both tasklets runnable and ready to run without interruption, giving the lowest possible total latency to the client.
> In my experience with RPC systems based on this kind of asynchronous python IO, lowering the response time from when user space is made aware of the request to when python actually starts _processing_ it is critical to responsiveness.
> Cheers

--Guido van Rossum (

From guido at  Wed Oct 31 16:42:28 2012
From: guido at (Guido van Rossum)
Date: Wed, 31 Oct 2012 08:42:28 -0700
Subject: [Python-ideas] with-statement syntactic quirk
In-Reply-To: <>
References: <20121031113853.66fb0514@resist>
Message-ID: <>

On Wed, Oct 31, 2012 at 8:28 AM, Nick Coghlan <ncoghlan at> wrote:
> On Thu, Nov 1, 2012 at 1:13 AM, Guido van Rossum <guido at> wrote:
>> Honestly, is a backslash going to kill you?
> Aye, given the cost-benefit ratio on this one, I'll be rather
> surprised if anyone ever actually fixes it. I just wanted to be clear
> that I'm not *philosophically* opposed to fixing it (since I think
> Barry's proposed behaviour makes more sense from a user perspective),
> I'm just fairly sure it's likely to be hard to fix without making the
> Grammar harder to maintain, which *would* be a difficult sell for
> something that's a relatively trivial wart :)

Yeah, the problem is that when you see a '(' immediately after 'with',
you don't know whether that's just the start of a parenthesized
expression or the start of a (foo as bar, blah as blabla) syntactic
construct.

--Guido van Rossum (

From Steve.Dower at  Wed Oct 31 16:51:35 2012
From: Steve.Dower at (Steve Dower)
Date: Wed, 31 Oct 2012 15:51:35 +0000
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:
> This is also one of the advantages of yield-from; you *never* go back to the end
> of the ready queue just to invoke another layer of abstraction. (Steve tries to
> approximate this by running the generator immediately until the first yield, but
> the caller still ends up suspending to the scheduler, because they are using
> yield which doesn't avoid the suspension, unlike yield-from.)

This is easily changed by modifying lines 141 and 180 of to call _step() directly instead of requeuing it. The reason why it currently requeues the task is that there is no guarantee that the caller wanted the next step to occur in the same scheduler, whether because the completed operation or a previous one continued somewhere else. (I removed the option to attach this information to the Future itself, but it is certainly of value in some circumstances, though mostly involving threads and not necessarily sockets.)

The change I would probably make here is to test and only requeue if it is different to the current scheduler (alternatively, a scheduler could implement its submit() to do this). Yes, this adds a little more overhead, but I'm still convinced that in general the operations being blocked on will take long enough for it to be insignificant. (And of course using a mechanism to bypass the decorator and use 'yield from' also avoids this overhead, though it potentially changes the program's behaviour).


From kristjan at  Wed Oct 31 16:59:18 2012
From: kristjan at (Kristján Valur Jónsson)
Date: Wed, 31 Oct 2012 15:59:18 +0000
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

> -----Original Message-----
> From: gvanrossum at [mailto:gvanrossum at] On Behalf
> Of Guido van Rossum
> Sent: 31. október 2012 15:37
> To: Kristján Valur Jónsson
> Cc: Richard Oudkerk; python-ideas at
> Subject: Re: [Python-ideas] Async API: some code to review
> Ok, this is a good point: the more you can do without having to go through
> the main loop again the better.
> I already took this to heart in my recent rewrites of recv() and
> send() -- they try to read/write the underlying socket first, and if it works,
> the task isn't suspended; only if they receive EAGAIN or something similar do
> they block the task and go back to the top.

Yes, this is possible for non-blocking style IO.   However, for IO architectures that are based on completions, you can't always mix and match.
On Windows, for example, it is complicated to do because of how AcceptEx works.  I recall socket properties, the overlapped property and other things interfering.
I also recall testing the use of first trying non-blocking IO (for accept and send/recv) and then resorting to an IOCP call.  If I recall correctly, the added overhead of trying a non-blocking call for the usual case of it failing was detrimental to the whole exercise.  The non-blocking IO calls took non-trivial time to complete.

The approach of having multiple "threads" doing accept also avoids the delay required to dispatch the request from the accepting thread to the worker thread.
> In fact, Listener.accept() does the same thing -- meaning the loop can go
> around many times without blocking a single time.
> This is also one of the advantages of yield-from; you *never* go back to the
> end of the ready queue just to invoke another layer of abstraction.

My experience with this stuff is of course based on stackless/gevent style programming, so some of it may not apply :)
Personally, I feel that things should just magically work, from the programmer's point of view, rather than have to manually leave a trace of breadcrumbs through the stack using "yield" constructs.  But that's just me.


From him at  Wed Oct 31 17:37:58 2012
From: him at (Joachim König)
Date: Wed, 31 Oct 2012 17:37:58 +0100
Subject: [Python-ideas] with-statement syntactic quirk
In-Reply-To: <>
References: <20121031113853.66fb0514@resist>
Message-ID: <>

On 31/10/2012 16:42, Guido van Rossum wrote:
> Yeah, the problem is that when you see a '(' immediately after 'with',
> you don't know whether that's just the start of a parenthesized
> expression or the start of a (foo as bar, blah as blabla) syntactic
> construct.

But couldn't "with" be interpreted as an additional kind of opening
parenthesis (and "if", "for", "while",
"elif", "else" too) and the ":" as the closing one?

I'm sure this has been asked a number of times but I couldn't find an 


From barry at  Wed Oct 31 17:51:53 2012
From: barry at (Barry Warsaw)
Date: Wed, 31 Oct 2012 17:51:53 +0100
Subject: [Python-ideas] with-statement syntactic quirk
References: <20121031113853.66fb0514@resist>
Message-ID: <20121031175153.1d49db40@resist>

On Oct 31, 2012, at 09:55 PM, Nick Coghlan wrote:

>It's not an especially subtle corner of the grammar, it's
>tuples-as-context-managers (i.e. the case with no as clauses) that
>causes hassles:
>     with (cmA, cmB):
>           pass
>This is: a) useless (because tuples aren't context managers); but also
>b) legal syntax (it blows up at runtime, complaining about a missing
>__enter__ or __exit__ method rather than throwing SyntaxError at
>compile time)

So clearly we need to make tuples proper context managers <wink>.
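
For reference, the run-time failure Nick describes is easy to reproduce. A small sketch follows; the names cm_a, cm_b and pair are made up, and contextlib.nullcontext is used only as a convenient do-nothing context manager. The tuple is bound to a name first, which sidesteps the question of how a given Python version parses the parenthesized form, and both AttributeError and TypeError are caught because the exact exception type is an assumption that may vary between versions.

```python
import contextlib

cm_a = contextlib.nullcontext()
cm_b = contextlib.nullcontext()
pair = (cm_a, cm_b)

# A tuple has neither of the methods the with statement needs.
has_protocol = hasattr(pair, "__enter__") and hasattr(pair, "__exit__")

try:
    with pair:
        pass
except (AttributeError, TypeError) as exc:
    error = exc
else:
    error = None

print(has_protocol, type(error).__name__)
```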


From benjamin at  Wed Oct 31 18:04:40 2012
From: benjamin at (Benjamin Peterson)
Date: Wed, 31 Oct 2012 17:04:40 +0000 (UTC)
Subject: [Python-ideas] with-statement syntactic quirk
References: <20121031113853.66fb0514@resist>
Message-ID: <>

Nick Coghlan <ncoghlan at ...> writes:
> I do think it makes sense to change the semantics of this, but I ain't
> volunteering to figure out the necessary Grammar changes :P

It would not be difficult to special-case this in AST construction. We do this for some
other things already.

From python at  Wed Oct 31 18:25:26 2012
From: python at (MRAB)
Date: Wed, 31 Oct 2012 17:25:26 +0000
Subject: [Python-ideas] with-statement syntactic quirk
In-Reply-To: <>
References: <20121031113853.66fb0514@resist>
Message-ID: <>

On 2012-10-31 13:22, Nick Coghlan wrote:
> On Wed, Oct 31, 2012 at 10:52 PM, Eli Bendersky <eliben at> wrote:
>> On Wed, Oct 31, 2012 at 5:45 AM, Devin Jeanpierre <jeanpierreda at>
>> wrote:
>>> Anyway, it looks like this isn't how the tokenizer treats
>>> braces/brackets (it ignores indent/dedent, but not newlines (I guess
>>> the grammar handles those)). What I meant to suggest was, treat "with
>>> ... :" similarly to how the OP suggests treating "with (...) :".
>> If this gets accepted, then, is there a reason to stop at "with"? Why not
>> ignore newlines between "if" and its trailing ":" as well? [playing devil's
>> advocate here]
> Note that I agreed with Barry that we probably *should* change it from
> a principle-of-least-surprise point of view. I just called "not it" on
> actually figuring out how to make it work given the current Grammar
> definition as a starting point :)
> Between expression precedence control, singleton tuples, generator
> expressions, function calls, function parameter declarations, base
> class declarations, import statement grouping and probably a couple of
> other cases that have slipped my mind, parentheses already have plenty
> of different meanings in Python, and we also have plenty of places
> where the syntactical rules aren't quite the same as those in an
> ordinary expression.
> The thing that makes Python's parser simple is the fact that we have
> *prefixes* in the Grammar that make it clear when the parsing rules
> should change, so you don't need much lookahead at parsing time (it's
> deliberately limited to only 1 token, in fact). The challenge in this
> particular case is to avoid a Grammar ambiguity relative to ordinary
> expression syntax without duplicating large sections of the grammar
> file.
Another possibility could be to allow a tuple of context managers and a
tuple of names:

with (open('/etc/passwd'), open('/etc/passwd')) as (p1, p2):
     ...

would be equivalent to:

with open('/etc/passwd') as p1:
     with open('/etc/passwd') as p2:
         ...

or perhaps more correctly:

with open('/etc/passwd') as temp_1:
     with open('/etc/passwd') as temp_2:
         p1, p2 = temp_1, temp_2
         ...

It would also allow:

with (cmA, cmB):
     ...

as a shorthand for:

with cmA:
     with cmB:
         ...

From dholth at  Wed Oct 31 19:04:45 2012
From: dholth at (Daniel Holth)
Date: Wed, 31 Oct 2012 14:04:45 -0400
Subject: [Python-ideas] Allowing semver in packaging
Message-ID: <>

Or Changing the Version Comparison Module in Distutils (again)

We've discussed a bit on distutils-sig about allowing semver in Python packages. Ronald's suggestion to
replace - with ~ as a
filename parts separator made me think of it again, because semver also
uses the popular - character.

The gist of semver:

Major.Minor.Patch (always 3 numbers)


And the big feature: no non-lexicographical sorting.

Right now, setuptools replaces every run of non-alphanumeric characters in
versions (and in project names) with a single dash (-). This would have to
change to at least allow +, and the regexp for recognizing an installed
dist would have to allow the plus as well. Current setuptools handling:

import re

def safe_version(version):
    # Spaces become dots; any other run of disallowed characters becomes '-'.
    version = version.replace(' ', '.')
    return re.sub('[^A-Za-z0-9.]+', '-', version)
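
To illustrate the effect on a semver-style build suffix, here is a small sketch. safe_version() repeats the logic of the snippet above; safe_version_allowing_plus() is a hypothetical variant, not anything in setuptools, that simply adds '+' to the allowed characters. The version strings are made-up examples.

```python
import re


def safe_version(version):
    # Same logic as the snippet above.
    version = version.replace(' ', '.')
    return re.sub('[^A-Za-z0-9.]+', '-', version)


def safe_version_allowing_plus(version):
    # Hypothetical variant: '+' is added to the set of allowed characters.
    version = version.replace(' ', '.')
    return re.sub('[^A-Za-z0-9.+]+', '-', version)


mangled = safe_version('1.0.0+build.5')
kept = safe_version_allowing_plus('1.0.0+build.5')
prerelease = safe_version('1.0.0-rc.1')

print(mangled, kept, prerelease)
```

With the current rule the '+' is rewritten to a dash, so a build suffix becomes indistinguishable from a pre-release suffix; the dash in a pre-release version passes through unchanged.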

Semver would be an upgrade from the existing conventions because it is easy
to remember (no special-case sorting for forgettable words 'dev' and
'post'), because you can legally put Mercurial revision numbers in your
package's pre- or post- version, and because the meaning of Major, Minor,
Patch is defined. For compatibility I think it would be enough to say you
could not mix semver and PEP-386 in the same major.minor.patch release.

Vinay Sajip's distlib has some experimental support for semver.

From solipsis at  Wed Oct 31 20:29:50 2012
From: solipsis at (Antoine Pitrou)
Date: Wed, 31 Oct 2012 20:29:50 +0100
Subject: [Python-ideas] with-statement syntactic quirk
References: <20121031113853.66fb0514@resist>
Message-ID: <>

On Wed, 31 Oct 2012 11:38:53 +0100
Barry Warsaw <barry at> wrote:
> with-statements have a syntactic quirk, which I think would be useful to fix.
> This is true in Python 2.7 through 3.3, but it's likely not fixable until 3.4,
> unless of course it's a bug <wink>.
> Legal:
> >>> with open('/etc/passwd') as p1, open('/etc/passwd') as p2: pass
> Not legal:
> >>> with (open('/etc/passwd') as p1, open('/etc/passwd') as p2): pass
> Why is this useful?  If you need to wrap this onto multiple lines, say to fit
> it within line length limits.  IWBNI you could write it like this:
>     with (open('/etc/passwd') as p1,
>           open('/etc/passwd') as p2):
>           pass

This bit me a couple of days ago.  +1 for supporting it.



From arnodel at  Wed Oct 31 22:03:26 2012
From: arnodel at (Arnaud Delobelle)
Date: Wed, 31 Oct 2012 21:03:26 +0000
Subject: [Python-ideas] with-statement syntactic quirk
In-Reply-To: <20121031113853.66fb0514@resist>
References: <20121031113853.66fb0514@resist>
Message-ID: <>

On 31 October 2012 10:38, Barry Warsaw <barry at> wrote:
> with-statements have a syntactic quirk, which I think would be useful to fix.
> This is true in Python 2.7 through 3.3, but it's likely not fixable until 3.4,
> unless of course it's a bug <wink>.
> Legal:
>>>> with open('/etc/passwd') as p1, open('/etc/passwd') as p2: pass
> Not legal:
>>>> with (open('/etc/passwd') as p1, open('/etc/passwd') as p2): pass
> Why is this useful?  If you need to wrap this onto multiple lines, say to fit
> it within line length limits.  IWBNI you could write it like this:
>     with (open('/etc/passwd') as p1,
>           open('/etc/passwd') as p2):
>           pass
> This seems analogous to using parens to wrap long if-statements, but maybe
> there's some subtle corner of the grammar that makes this problematic (like
> 'with' treating the whole thing as a single context manager).
> Of course, you can wrap with backslashes, but ick!

No need for backslashes, just put the brackets in the right place:

    with (
            open('/etc/passwd')) as p1, (
            open('/etc/passwd')) as p2:
        pass
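
A self-contained variant of the same layout, for anyone who wants to try it without touching /etc/passwd. io.StringIO objects stand in for the open files, and the strings are made up:

```python
import io

# Each context manager expression is parenthesized on its own, so the
# line breaks fall inside brackets and no backslashes are needed.
with (
        io.StringIO("first")) as p1, (
        io.StringIO("second")) as p2:
    contents = (p1.read(), p2.read())

print(contents)
```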



From guido at  Wed Oct 31 22:18:28 2012
From: guido at (Guido van Rossum)
Date: Wed, 31 Oct 2012 14:18:28 -0700
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Oct 31, 2012 at 8:51 AM, Steve Dower <Steve.Dower at> wrote:
> Guido van Rossum wrote:
>> This is also one of the advantages of yield-from; you *never* go back to the end
>> of the ready queue just to invoke another layer of abstraction. (Steve tries to
>> approximate this by running the generator immediately until the first yield, but
>> the caller still ends up suspending to the scheduler, because they are using
>> yield which doesn't avoid the suspension, unlike yield-from.)
> This is easily changed by modifying lines 141 and 180 of to call _step() directly instead of requeuing it. The reason why it currently requeues the task is that there is no guarantee that the caller wanted the next step to occur in the same scheduler, whether because the completed operation or a previous one continued somewhere else. (I removed the option to attach this information to the Future itself, but it is certainly of value in some circumstances, though mostly involving threads and not necessarily sockets.)

I think you are missing the point. Even if you don't make a roundtrip
through the queue, *each* yield statement, if it is executed at all,
must transfer control to the scheduler. What you're proposing is just
making the scheduler immediately resume the generator.

So, if you have a trivial task, like this:

@async
def trivial(x):
    return x
    yield  # Unreachable, but makes it a generator

and a caller:

@async
def caller():
    foo = yield trivial(42)

then the call to trivial(42) returns a Future that already has the
result 42 set in it. But caller() still suspends to the scheduler,
yielding that Future. The scheduler can resume caller() immediately
but the damage (overhead) is done.
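
To make that overhead concrete, here is a minimal sketch. It assumes a
toy Future class, a decorator called async_ (a stand-in for @async,
renamed because async is a reserved word in current Python) and a run()
driver playing the scheduler. All three are hypothetical and are not the
code under review:

```python
class Future:
    """Hypothetical minimal future: just holds a result once it is set."""
    def __init__(self):
        self.done = False
        self.result = None

    def set_result(self, value):
        self.done = True
        self.result = value


def async_(func):
    """Hypothetical stand-in for @async: start the generator at once and
    return a Future, already completed if the generator returned."""
    def wrapper(*args, **kwargs):
        future = Future()
        gen = func(*args, **kwargs)
        try:
            next(gen)  # run until the first yield (a real decorator would schedule the rest)
        except StopIteration as exc:
            future.set_result(exc.value)
        return future
    return wrapper


@async_
def trivial(x):
    return x
    yield  # Unreachable, but makes it a generator


def caller():
    foo = yield trivial(42)  # yields a Future that is already complete
    return foo


def run(gen):
    """Hypothetical scheduler stand-in: count how often gen suspends."""
    suspensions = 0
    try:
        future = next(gen)
        while True:
            suspensions += 1
            future = gen.send(future.result)
    except StopIteration as exc:
        return suspensions, exc.value


suspensions, result = run(caller())
```

Even though the Future is complete before caller() yields it, the driver
still records one suspension before it can hand the result back.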

In contrast, in the yield-from world, we'd write this

def trivial(x):
    return x
    yield from ()  # Unreachable

def caller():
    foo = yield from trivial(42)

where the latter expands roughly to the following, without reference
to the scheduler at all:

def caller():
    _gen = trivial(42)
    try:
        while True:
            _val = next(_gen)
            yield _val
    except StopIteration as _exc:
        foo = _exc.value

The first next(_gen) call raises StopIteration so the yield is never
reached -- the scheduler doesn't know that any of this is going on.
And there's no need to do anything special to advance the generator to
the first yield manually either.
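
A small runnable check of this, assuming a hypothetical run() driver
that stands in for the scheduler and only counts suspensions (it needs
Python 3.3 or later for yield from):

```python
def trivial(x):
    return x
    yield from ()  # Unreachable, but makes this a generator


def caller():
    foo = yield from trivial(42)
    return foo


def run(gen):
    """Hypothetical scheduler stand-in: count how often gen suspends."""
    suspensions = 0
    try:
        while True:
            next(gen)
            suspensions += 1
    except StopIteration as exc:
        return suspensions, exc.value


suspensions, result = run(caller())
```

Here caller() finishes on the very first next() call, so the driver
never sees a yielded value.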

(It's different of course when a generator is wrapped in a Task()
constructor. But that should be relatively rare.)

> The change I would probably make here is to test and only requeue if it is different to the current scheduler (alternatively, a scheduler could implement its submit() to do this). Yes, this adds a little more overhead, but I'm still convinced that in general the operations being blocked on will take long enough for it to be insignificant. (And of course using a mechanism to bypass the decorator and use 'yield from' also avoids this overhead, though it potentially changes the program's behaviour).

Just get with the program and use yield-from exclusively.

--Guido van Rossum (

From at  Wed Oct 31 22:31:02 2012
From: at (Yury Selivanov)
Date: Wed, 31 Oct 2012 17:31:02 -0400
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-31, at 5:18 PM, Guido van Rossum <guido at> wrote:

> @async
> def trivial(x):
>    return x
>    yield  # Unreachable, but makes it a generator

FWIW, just a crazy comment: if we make the @async decorator clone
the code object of the passed function and set the generator flag
(co_flags | 0x0020), then any passed function becomes a generator,
even if it doesn't have yields/yield-froms ;)
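
For reference, 0x0020 is the CO_GENERATOR flag. The sketch below only
inspects that flag and does not patch any code object, since whether a
patched flag still yields a working generator depends on the interpreter
version. The helper name has_generator_flag is hypothetical:

```python
import inspect


def plain(x):
    return x


def trivial(x):
    return x
    yield  # Unreachable, but makes it a generator


def has_generator_flag(func):
    """Report whether the compiler marked func's code object as a generator."""
    return bool(func.__code__.co_flags & inspect.CO_GENERATOR)


flag_value = inspect.CO_GENERATOR
plain_is_generator = has_generator_flag(plain)
trivial_is_generator = has_generator_flag(trivial)
```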


From Steve.Dower at  Wed Oct 31 22:31:58 2012
From: Steve.Dower at (Steve Dower)
Date: Wed, 31 Oct 2012 21:31:58 +0000
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:
> Just get with the program and use yield-from exclusively.

I didn't realise there was a "program" here, just a discussion about an API design. I've already raised my concerns with using yield from exclusively, but since the performance argument trumps all of those, there is little more I can contribute.

When a final design begins to stabilise, I will see how I can make use of it in my own code. Until then, I'll continue using Futures, which are ideal for my current needs. I won't be forcing 'yield from' onto my users until its usage is clear and I can provide them with suitable guidance.


From andrew.svetlov at  Wed Oct 31 22:34:02 2012
From: andrew.svetlov at (Andrew Svetlov)
Date: Wed, 31 Oct 2012 23:34:02 +0200
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

Yury, you really are a crazy hacker. I'm not sure tricks like patching
bytecode are good for the standard library.

On Wed, Oct 31, 2012 at 11:31 PM, Yury Selivanov < at> wrote:

> On 2012-10-31, at 5:18 PM, Guido van Rossum <guido at> wrote:
> > @async
> > def trivial(x):
> >    return x
> >    yield  # Unreachable, but makes it a generator
> FWIW, just a crazy comment: if we make @async decorator to clone
> the code object of a passed function and set its (co_flags | 0x0020),
> then any passed function becomes a generator, even if it doesn't
> have yields/yield-froms ;)
> -
> Yury

Andrew Svetlov

From at  Wed Oct 31 22:41:51 2012
From: at (Yury Selivanov)
Date: Wed, 31 Oct 2012 17:41:51 -0400
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

On 2012-10-31, at 5:34 PM, Andrew Svetlov <andrew.svetlov at> wrote:

> Yury, you are really the crazy hacker. Not sure tricks with patching bytecode etc are good for standard library.

I know that I sort of created an image for myself of 
"a guy who solves any problem by patching opcodes on live code",
but don't worry, I'll never ever recommend such solutions for
stdlib/python :)

This is, however, a nice technique to rapidly prototype
and test interesting ideas.


From guido at  Wed Oct 31 22:51:47 2012
From: guido at (Guido van Rossum)
Date: Wed, 31 Oct 2012 14:51:47 -0700
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

On Wed, Oct 31, 2012 at 2:31 PM, Steve Dower <Steve.Dower at> wrote:
> Guido van Rossum wrote:
>> Just get with the program and use yield-from exclusively.
> I didn't realise there was a "program" here, just a discussion about an API design.

Sorry, I left off a smiley. :-)

> I've already raised my concerns with using yield from exclusively, but since the performance argument trumps all of those then there is little more I can contribute.

What about the usability argument? Don't you think users will be
confused by the need to use yield from sometimes and plain yield at other
times? Yes, they may be able to tell by looking up the definition and
checking how it is decorated, but that doesn't really help.

> When a final design begins to stabilise, I will see how I can make use of it in my own code. Until then, I'll continue using Futures, which are ideal for my current needs. I won't be forcing 'yield from' onto my users until its usage is clear and I can provide them with suitable guidance.

Understood. What exactly is it that makes Futures so ideal for your
current needs? Is it integration with threads?

Another tack: could you make use of tulip/ That doesn't use
generators of any form; it is meant as an integration point with other
styles of async programming (although I am not claiming that it is any
good in its current form -- this too is just a strawman to shoot

--Guido van Rossum (

From Steve.Dower at  Wed Oct 31 23:36:13 2012
From: Steve.Dower at (Steve Dower)
Date: Wed, 31 Oct 2012 22:36:13 +0000
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: <>
References: <>
Message-ID: <>

Guido van Rossum wrote:
> On Wed, Oct 31, 2012 at 2:31 PM, Steve Dower <Steve.Dower at> wrote:
>> Guido van Rossum wrote:
>>> Just get with the program and use yield-from exclusively.
>> I didn't realise there was a "program" here, just a discussion about an API
>> design.
> Sorry, I left off a smiley. :-)

Always a risk in email communication - no offence taken.

>> I've already raised my concerns with using yield from exclusively, but since
>> the performance argument trumps all of those then there is little more I can
>> contribute.
> What about the usability argument? Don't you think users will be confused by the
> need to use yield from some times and just yield other times? Yes, they may be
> able to tell by looking up the definition and checking how it is decorated, but
> that doesn't really help.

Users only ever _need_ to write yield. The only reason wattle does not work with Python 3.2 is that it relies on non-blank returns (return with a value) inside generators, which 3.2 rejects as a syntax error.
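
As a quick illustration of a non-blank return (the function name compute
is made up for this sketch): from Python 3.3 on, the returned value
travels on the StopIteration exception that ends the generator.

```python
def compute():
    yield "working"
    return 42  # non-blank return: allowed in generators from Python 3.3 on


gen = compute()
first = next(gen)  # runs up to the yield
try:
    next(gen)  # runs to the return statement
except StopIteration as exc:
    result = exc.value  # the returned value is carried by the exception
```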

There is only one reason to use 'yield from' and that is for the performance optimisation, which I do acknowledge and did observe in my own benchmarks.

>> When a final design begins to stabilise, I will see how I can make use of it
>> in my own code. Until then, I'll continue using Futures, which are ideal for my
>> current needs. I won't be forcing 'yield from' onto my users until its usage is
>> clear and I can provide them with suitable guidance.
> Understood. What exactly is it that makes Futures so ideal for your current
> needs? Is it integration with threads?
> Another tack: could you make use of tulip/ That doesn't use
> generators of any form; it is meant as an integration point with other styles of
> async programming (although I am not claiming that it is any good in its current
> form -- this too is just a strawman to shoot down).

I know I've been vague about our intended application (deliberately so, to try and keep the discussion neutral), but I'll lay out some details.

We're working on adding support for Windows 8 apps (formerly known as Metro) written in Python. These will use the new API (WinRT) which is highly asynchronous - even operations such as opening a file are only* available as an asynchronous function. The intention is to never block on the UI thread.

(* Some synchronous Win32 APIs are still available from C++, but these are actively discouraged and restricted in many ways. Most of Win32 is not usable.)

The model used for these async APIs is future-based: every *Async() function returns a future for a task that is already running. The caller is not allowed to wait on this future - the only option is to attach a callback. C# and VB use their async/await keywords (good 8 min intro video on those: ), while JavaScript and C++ have multi-line lambda support. For Python, we are aiming for closer to the async/await model (which is also how we chose the names).
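
The callback-only style can be sketched with the standard library's concurrent.futures, which is not WinRT and is used here only as an analogy. The names read_text and on_done are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor


def read_text(text):
    """Hypothetical stand-in for a slow operation such as opening a file."""
    return text.upper()


results = []


def on_done(future):
    # Called once the operation has finished; the caller never blocks on it.
    results.append(future.result())


with ThreadPoolExecutor(max_workers=1) as executor:
    # submit() hands back a future for work that has already been started.
    future = executor.submit(read_text, "hello")
    future.add_done_callback(on_done)
# Leaving the with-block waits for the worker thread, purely so that this
# script does not exit before the callback has run.
```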

Incidentally, our early designs used yield from exclusively. It was only when we started discovering edge-cases where things broke, as well as the impact on code 'cleanliness', that we switched to yield.

There are three aspects of this that work better and result in cleaner code with wattle than with tulip:

 - event handlers can be "async-void", such that when the event is raised by the OS/GUI/device/whatever the handler can use asynchronous tasks without blocking the main thread. In this case, the caller receives a future but ignores it because it does not care about the final result. (We could achieve this under 'yield from' by requiring a decorator, which would then probably prevent other Python code from calling the handler directly. There is very limited opportunity for us to reliably intercept this case.)

 - the event loop is implemented by the OS. Our Scheduler implementation does not need to provide an event loop, since we can submit() calls to the OS-level loop. This pattern also allows wattle to 'sit on top of' any other event loop, probably including Twisted and 0MQ, though I have not tried it (except with Tcl).

 - Future objects can be marshalled directly from Python into Windows, completing the interop story. Even with tulip, we would probably still require a decorator for this case so that we can marshal regular generators as iterables (for which there is a specific type). Without a decorator, we would probably have to ban both cases to prevent subtly misbehaving programs. At least with wattle, the user does not have to do anything different from any of their other @async functions.
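
The second point can be sketched as follows. This is only an illustration under stated assumptions: FakeHostLoop stands in for a loop owned by the OS or GUI toolkit, and HostLoopScheduler is a hypothetical scheduler, not wattle's actual class. Only the submit() name is taken from the discussion above.

```python
from collections import deque


class FakeHostLoop:
    """Hypothetical stand-in for an event loop owned by the OS or a GUI toolkit."""
    def __init__(self):
        self._pending = deque()

    def post(self, callback):
        self._pending.append(callback)

    def run_pending(self):
        while self._pending:
            self._pending.popleft()()


class HostLoopScheduler:
    """Hypothetical scheduler with no loop of its own: submit() simply
    forwards the call to the host loop."""
    def __init__(self, host_loop):
        self._host_loop = host_loop

    def submit(self, callback, *args):
        self._host_loop.post(lambda: callback(*args))


log = []
loop = FakeHostLoop()
scheduler = HostLoopScheduler(loop)
scheduler.submit(log.append, "step 1")
scheduler.submit(log.append, "step 2")
before = list(log)  # nothing has run yet; the host loop decides when
loop.run_pending()
```

The scheduler owns no queue and no loop; swapping FakeHostLoop for another host only requires something that accepts a posted callback.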

Despite this intended application, I have tried to approach this design task independently to produce an API that will work for many cases, especially given the narrow focus on sockets. If people decide to get hung up on "the Microsoft way" or similar rubbish then I will feel vindicated for not mentioning it earlier :-) - it has not had any more influence on wattle than any of my other past experience has.
