From benhoyt at gmail.com  Tue Jan  1 01:11:37 2013
From: benhoyt at gmail.com (Ben Hoyt)
Date: Tue, 1 Jan 2013 13:11:37 +1300
Subject: [Python-ideas] Preventing out of memory conditions
In-Reply-To: <CAOVPiMiNBwe_v95apXFVKwj5r5ipmz-ttGriVLV5YE-AXQAT0Q@mail.gmail.com>
References: <CAOVPiMiNBwe_v95apXFVKwj5r5ipmz-ttGriVLV5YE-AXQAT0Q@mail.gmail.com>
Message-ID: <CAL9jXCGOFArkn2E7Zx5sgsrqKAJXxMogU+t_gOdR5_G7fJKiSw@mail.gmail.com>

Interesting idea, though I don't think it's something that should be a
Python language extension. For instance, iOS (typically more
resource-constrained) sends the application signals when system memory is
getting low so it can free stuff -- this is done at the OS level. And I
think that's the right place, because this will almost certainly be setup-
and system-dependent. For instance, it would depend hugely on whether
there's a virtual memory manager, and how it's configured.

I'd say your best bet is to write a little library that does the
appropriate thing for your needs (your system or setup). Say starts a
thread that checks the system's free memory every so often and sends your
application a signal/callback saying "we're getting low, free some of your
caches". It could even send a "level flag" to your callback saying "fairly
low", "very low", or "critically low" -- I think iOS does this.
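Such a library could be quite small. Here is a minimal sketch of that polling
approach; note that the memory probe is left pluggable, and the
`get_available_bytes` parameter, the thresholds, and the level names are all
invented for illustration, not taken from any real API:

```python
import threading
import time

# Invented thresholds: fraction of the budget still available -> level name,
# ordered from least to most severe.
LEVELS = [(0.25, "fairly low"), (0.10, "very low"), (0.02, "critically low")]

def classify(available, budget):
    """Return the most severe level whose threshold covers `available`,
    or None if memory is not low."""
    level = None
    for threshold, name in LEVELS:
        if available <= threshold * budget:
            level = name  # later entries are more severe
    return level

def start_monitor(get_available_bytes, budget, callback, interval=1.0):
    """Poll the pluggable probe every `interval` seconds and invoke
    callback(level) whenever memory is getting low."""
    def run():
        while True:
            level = classify(get_available_bytes(), budget)
            if level is not None:
                callback(level)
            time.sleep(interval)
    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread
```

On a real system the probe would come from whatever your platform offers
(a third-party library such as psutil, or /proc on Linux).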

-Ben


On Tue, Jan 1, 2013 at 11:16 AM, Max Moroz <maxmoroz at gmail.com> wrote:

> Sometimes, I have the flexibility to reduce the memory used by my
> program (e.g., by destroying large cached objects, etc.). It would be
> great if I could ask the Python interpreter to notify me when memory is
> running out, so I can take such actions.
>
> Of course, it's nearly impossible for Python to know in advance if the
> OS would run out of memory with the next malloc call. Furthermore,
> Python shouldn't guess which memory (physical, virtual, etc.) is
> relevant in the particular situation (for instance, in my case, I only
> care about physical memory, since swapping to disk makes my
> application as good as frozen). So the problem as stated above is
> unsolvable.
>
> But let's say I am willing to do some work to estimate the maximum
> amount of memory my application can be allowed to use. If I provide
> that number to the Python interpreter, it may be possible for it to notify
> me when the next memory allocation would exceed this limit by calling
> a function I provide it (hopefully passing as arguments the amount of
> memory being requested, as well as the amount currently in use). My
> callback function could then destroy some objects, and return True to
> indicate that some objects were destroyed. At that point, the
> interpreter could run its standard garbage collection routines to
> release the memory that corresponded to those objects - before
> proceeding with whatever it was trying to do originally. (If I
> returned False, or if I didn't provide a callback function at all, the
> interpreter would simply behave as it does today.) Any memory
> allocations that happen while the callback function itself is
> executing, would not trigger further calls to it. The whole mechanism
> would be disabled for the rest of the session if the memory freed by
> the callback function was insufficient to prevent going over the
> memory limit.
>
> Would this be worth considering for a future language extension? How
> hard would it be to implement?
>
> Max
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>
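For what it's worth, the semantics Max describes can be modelled in a few
lines of pure Python. Everything below (the `MemoryGuard` name and its
methods) is invented for illustration; a real implementation would have to
hook CPython's allocator rather than count bytes by hand:

```python
class MemoryGuard:
    """Toy model of the proposed hook: track a byte count against a
    self-imposed limit and call back before exceeding it."""

    def __init__(self, limit, callback=None):
        self.limit = limit
        self.callback = callback
        self.used = 0
        self.enabled = callback is not None
        self._in_callback = False  # allocations inside the callback don't re-trigger

    def request(self, nbytes):
        """Record an allocation, invoking the callback if it would go over."""
        if self.enabled and not self._in_callback and self.used + nbytes > self.limit:
            self._in_callback = True
            try:
                freed_some = self.callback(nbytes, self.used)
            finally:
                self._in_callback = False
            # If the callback couldn't help, disable the mechanism for the session.
            if not freed_some or self.used + nbytes > self.limit:
                self.enabled = False
        self.used += nbytes

    def release(self, nbytes):
        """Record a deallocation."""
        self.used -= nbytes
```

A callback would typically drop caches and call `release` for whatever it
freed, then return True.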

From greg at krypto.org  Tue Jan  1 04:22:29 2013
From: greg at krypto.org (Gregory P. Smith)
Date: Mon, 31 Dec 2012 19:22:29 -0800
Subject: [Python-ideas] Preventing out of memory conditions
In-Reply-To: <CAL9jXCGOFArkn2E7Zx5sgsrqKAJXxMogU+t_gOdR5_G7fJKiSw@mail.gmail.com>
References: <CAOVPiMiNBwe_v95apXFVKwj5r5ipmz-ttGriVLV5YE-AXQAT0Q@mail.gmail.com>
	<CAL9jXCGOFArkn2E7Zx5sgsrqKAJXxMogU+t_gOdR5_G7fJKiSw@mail.gmail.com>
Message-ID: <CAGE7PNLC_a2LOsheqO+EE-G+fMNfiT0F9EzN6ZxGVWKDReM8Jw@mail.gmail.com>

Within CPython, given the way the C API works today, it is already too late
by the time the code that raises a MemoryError has been called, so capturing
all the places where that could occur is not easy. Implementing this at the
C malloc layer makes more sense: have it dip into a reserved low-memory pool
to satisfy the current request, and send the process a signal indicating that
it is running low. This approach would also work with C extension modules or
an embedded Python.

I'd expect something like this already exists, but I haven't looked for it.
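The reserved-pool idea can be prototyped in pure Python with a "ballast"
allocation that is dropped when an allocation fails. This is only a rough
sketch of the concept, not the C-level implementation described above, and
the names and the 16 MiB reserve size are arbitrary:

```python
_ballast = bytearray(16 * 1024 * 1024)  # rainy-day reserve; size is arbitrary

def alloc(nbytes, on_low_memory=None):
    """Try to allocate; on MemoryError, drop the ballast, warn, and retry."""
    global _ballast
    try:
        return bytearray(nbytes)
    except MemoryError:
        _ballast = None           # release the reserve so the retry can succeed
        if on_low_memory is not None:
            on_low_memory()       # tell the application it is running low
        return bytearray(nbytes)  # may still raise if truly out of memory
```

The point of the reserve is that the warning arrives while there is still
enough headroom to act on it.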

Having a thread poll memory use is not generally wise, as that is polling
rather than event driven: it could easily miss a low memory situation until
it is too late and a failure has already happened (allocation demand can come
in large spikes, depending on the application).

OSes running processes in constrained environments, or ones where the
available resources can later be reduced by the OS, may already send their
own warning signals prior to outright killing the process, but that should
not preclude an application being able to monitor and constrain itself on
its own, without needing the OS to do it.

-gps

On Mon, Dec 31, 2012 at 4:11 PM, Ben Hoyt <benhoyt at gmail.com> wrote:

> Interesting idea, though I don't think it's something that should be a
> Python language extension. For instance, iOS (typically more
> resource-constrained) sends the application signals when system memory is
> getting low so it can free stuff -- this is done at the OS level. And I
> think that's the right place, because this will almost certainly be setup-
> and system-dependent. For instance, it would depend hugely on whether
> there's a virtual memory manager, and how it's configured.
>
> I'd say your best bet is to write a little library that does the
> appropriate thing for your needs (your system or setup). Say starts a
> thread that checks the system's free memory every so often and sends your
> application a signal/callback saying "we're getting low, free some of your
> caches". It could even send a "level flag" to your callback saying "fairly
> low", "very low", or "critically low" -- I think iOS does this.
>
> -Ben
>
>
>
> On Tue, Jan 1, 2013 at 11:16 AM, Max Moroz <maxmoroz at gmail.com> wrote:
>
>> Sometimes, I have the flexibility to reduce the memory used by my
>> program (e.g., by destroying large cached objects, etc.). It would be
>> great if I could ask the Python interpreter to notify me when memory is
>> running out, so I can take such actions.
>>
>> Of course, it's nearly impossible for Python to know in advance if the
>> OS would run out of memory with the next malloc call. Furthermore,
>> Python shouldn't guess which memory (physical, virtual, etc.) is
>> relevant in the particular situation (for instance, in my case, I only
>> care about physical memory, since swapping to disk makes my
>> application as good as frozen). So the problem as stated above is
>> unsolvable.
>>
>> But let's say I am willing to do some work to estimate the maximum
>> amount of memory my application can be allowed to use. If I provide
>> that number to the Python interpreter, it may be possible for it to notify
>> me when the next memory allocation would exceed this limit by calling
>> a function I provide it (hopefully passing as arguments the amount of
>> memory being requested, as well as the amount currently in use). My
>> callback function could then destroy some objects, and return True to
>> indicate that some objects were destroyed. At that point, the
>> interpreter could run its standard garbage collection routines to
>> release the memory that corresponded to those objects - before
>> proceeding with whatever it was trying to do originally. (If I
>> returned False, or if I didn't provide a callback function at all, the
>> interpreter would simply behave as it does today.) Any memory
>> allocations that happen while the callback function itself is
>> executing, would not trigger further calls to it. The whole mechanism
>> would be disabled for the rest of the session if the memory freed by
>> the callback function was insufficient to prevent going over the
>> memory limit.
>>
>> Would this be worth considering for a future language extension? How
>> hard would it be to implement?
>>
>> Max

From random832 at fastmail.us  Tue Jan  1 04:28:43 2013
From: random832 at fastmail.us (Random832)
Date: Mon, 31 Dec 2012 22:28:43 -0500
Subject: [Python-ideas] Preventing out of memory conditions
In-Reply-To: <CAL9jXCGOFArkn2E7Zx5sgsrqKAJXxMogU+t_gOdR5_G7fJKiSw@mail.gmail.com>
References: <CAOVPiMiNBwe_v95apXFVKwj5r5ipmz-ttGriVLV5YE-AXQAT0Q@mail.gmail.com>
	<CAL9jXCGOFArkn2E7Zx5sgsrqKAJXxMogU+t_gOdR5_G7fJKiSw@mail.gmail.com>
Message-ID: <50E257EB.5070002@fastmail.us>

On 12/31/2012 7:11 PM, Ben Hoyt wrote:
> Interesting idea, though I don't think it's something that should be a 
> Python language extension. For instance, iOS (typically more 
> resource-constrained) sends the application signals when system memory 
> is getting low so it can free stuff -- this is done at the OS level. 
> And I think that's the right place, because this will almost certainly 
> be setup- and system-dependent. For instance, it would depend hugely 
> on whether there's a virtual memory manager, and how it's configured.
>
> I'd say your best bet is to write a little library that does the 
> appropriate thing for your needs (your system or setup). Say starts a 
> thread that checks the system's free memory every so often and sends 
> your application a signal/callback saying "we're getting low, free 
> some of your caches". It could even send a "level flag" to your 
> callback saying "fairly low", "very low", or "critically low" -- I 
> think iOS does this.

I'm concerned that a program that does this will end up as the loser in 
this scenario:

http://blogs.msdn.com/b/oldnewthing/archive/2012/01/18/10257834.aspx

(tl;dr, two programs each having a different idea of how much free 
memory the system should have results in an "unfair" total allocation of 
memory)

I think it's possibly important to avoid using the system's free memory 
as an input to any such system.


From greg at krypto.org  Tue Jan  1 04:33:01 2013
From: greg at krypto.org (Gregory P. Smith)
Date: Mon, 31 Dec 2012 19:33:01 -0800
Subject: [Python-ideas] Preventing out of memory conditions
In-Reply-To: <50E257EB.5070002@fastmail.us>
References: <CAOVPiMiNBwe_v95apXFVKwj5r5ipmz-ttGriVLV5YE-AXQAT0Q@mail.gmail.com>
	<CAL9jXCGOFArkn2E7Zx5sgsrqKAJXxMogU+t_gOdR5_G7fJKiSw@mail.gmail.com>
	<50E257EB.5070002@fastmail.us>
Message-ID: <CAGE7PN++xUpDqET2xH8eeaFiNHFnUgN2z=-nMFtvVOw9wB1TuQ@mail.gmail.com>

On Mon, Dec 31, 2012 at 7:28 PM, Random832 <random832 at fastmail.us> wrote:

> On 12/31/2012 7:11 PM, Ben Hoyt wrote:
>
>> Interesting idea, though I don't think it's something that should be a
>> Python language extension. For instance, iOS (typically more
>> resource-constrained) sends the application signals when system memory is
>> getting low so it can free stuff -- this is done at the OS level. And I
>> think that's the right place, because this will almost certainly be setup-
>> and system-dependent. For instance, it would depend hugely on whether
>> there's a virtual memory manager, and how it's configured.
>>
>> I'd say your best bet is to write a little library that does the
>> appropriate thing for your needs (your system or setup). Say starts a
>> thread that checks the system's free memory every so often and sends your
>> application a signal/callback saying "we're getting low, free some of your
>> caches". It could even send a "level flag" to your callback saying "fairly
>> low", "very low", or "critically low" -- I think iOS does this.
>>
>
> I'm concerned that a program that does this will end up as the loser in
> this scenario:
>
> http://blogs.msdn.com/b/oldnewthing/archive/2012/01/18/10257834.aspx
>
> (tl;dr, two programs each having a different idea of how much free memory
> the system should have results in an "unfair" total allocation of memory)
>
> I think it's possibly important to avoid using the system's free memory as
> an input to any such system.
>
>
Indeed. Only look at your own process's consumption vs. some numerical
limit you've chosen for yourself.  This also means you can adjust your own
limit up or down at runtime if desired.  (JVMs tend to force you to work
this way.)
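One stdlib way to watch only your own process's Python-level allocations
against a self-chosen, adjustable budget is sketched below. It assumes an
interpreter that ships the tracemalloc module (added to the stdlib after
this thread, in Python 3.4); the limit value is arbitrary:

```python
import tracemalloc

soft_limit = 8 * 1024 * 1024   # self-chosen budget in bytes; adjustable at runtime

tracemalloc.start()            # trace this process's Python-level allocations

def over_limit():
    """True when allocations made since start() exceed the soft limit."""
    current, _peak = tracemalloc.get_traced_memory()
    return current > soft_limit
```

Because the budget is a plain variable rather than a system-wide free-memory
reading, two such processes never fight over the same number.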

-gps

From solipsis at pitrou.net  Tue Jan  1 22:55:05 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 1 Jan 2013 22:55:05 +0100
Subject: [Python-ideas] Documenting Python warts on Stack Overflow
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
Message-ID: <20130101225505.757540fa@pitrou.net>

On Mon, 31 Dec 2012 04:00:12 +0400
Oleg Broytman <phd at phdru.name> wrote:
> Hello and happy New Year!
> 
> On Sun, Dec 30, 2012 at 11:20:34PM +0100, Victor Stinner <victor.stinner at gmail.com> wrote:
> > If I understood correctly, you would like to list some specific issues
> > like print() not flushing immediatly stdout if you ask to not write a
> > newline (print "a", in Python 2 or print("a", end=" ") in Python 3).
> > If I understood correctly, and if you want to improve Python, you
> > should help the documentation project. Or if you can build a website
> > listing such issues *and listing solutions* like calling
> > sys.stdout.flush() or using print(flush=True) (Python 3.3+) for the
> > print issue.
> > 
> > A list of such issue without solution doesn't help anyone.
> 
>    I cannot say for Anatoly but for me warts are:
> 
> -- things that don't exist where they should (but the core team object
>    or they are hard to implement or something);
> -- things that exist where they shouldn't; they are hard to fix because
>    removing them would break backward compatibility;
> -- things that are implemented in strange, inconsistent ways.
> 
>    A few examples:
> [snip]

The problem is you are listing examples which *in your opinion* are
issues with Python. Other people would have different ideas of what is
an issue and what is not. This can't be the right methodology if we
want to write a piece of Python docs. Only things which are
*well-known* annoyances can qualify.

I also disagree that missing features are "warts"; they are just
missing features, not something unpleasant that's difficult to get rid
of.

Regards

Antoine.




From rosuav at gmail.com  Tue Jan  1 23:16:39 2013
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 2 Jan 2013 09:16:39 +1100
Subject: [Python-ideas] Documenting Python warts on Stack Overflow
In-Reply-To: <20130101225505.757540fa@pitrou.net>
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
Message-ID: <CAPTjJmrP-DokxCtKBRnydy=xTQZHBX9iHY3nYjUJnJ1Ay3wEzQ@mail.gmail.com>

On Wed, Jan 2, 2013 at 8:55 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> The problem is you are listing examples which *in your opinion* are
> issues with Python. Other people would have different ideas of what is
> an issue and what is not. This can't be the right methodology if we
> want to write a piece of Python docs. Only things which are
> *well-known* annoyances can qualify.

My understanding of a "Python wart" is that it's something that cannot
be changed without readdressing some fundamental design. For example,
Python has decided that indentation and line-endings are significant -
that a logical statement ends at end-of-line. Python has further
decided that line continuation characters are unnecessary inside
parenthesized expressions. Resultant wart: Breaking a massive 'for'
loop between its assignment list and its iterable list doesn't work,
even though breaking it anywhere else does. (This question came up on
python-list a little while ago.) Why should it be an error to break it
here, but not there? Why can't I split it like this:

for x,y,z in
  start_point,
  continuation_point,
  end_point
:
  pass

when it's perfectly legal to split it like this:

for (
  x,y,z
) in (
  start_point,
  continuation_point,
  end_point
):
  pass

Well, because you can't. It's a little odd what you can and can't do,
until you understand the underlying system fairly well. It's something
that's highly unlikely to change; one of the premises would have to be
sacrificed (or at least modified) to achieve it.

Something that could be changed if the devs had enough time is a
tracker issue (or a "show me some code" issue - you want to complain,
you can do the work to fix it). Something that could be changed, but
would break backward compatibility is a prime candidate for __future__
and/or Python 4 (like the change of the division operator - that
change introduced its own oddities, some of which may be warts, eg
that int/int->float but sqrt(float) !-> complex). A wart is different
from both of the above.

ChrisA


From wuwei23 at gmail.com  Wed Jan  2 00:17:34 2013
From: wuwei23 at gmail.com (alex23)
Date: Tue, 1 Jan 2013 15:17:34 -0800 (PST)
Subject: [Python-ideas] Documenting Python warts on Stack Overflow
In-Reply-To: <CAPTjJmrP-DokxCtKBRnydy=xTQZHBX9iHY3nYjUJnJ1Ay3wEzQ@mail.gmail.com>
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<CAPTjJmrP-DokxCtKBRnydy=xTQZHBX9iHY3nYjUJnJ1Ay3wEzQ@mail.gmail.com>
Message-ID: <189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com>

On Jan 2, 8:16 am, Chris Angelico <ros... at gmail.com> wrote:
> It's a little odd what you can and can't do,
> until you understand the underlying system fairly well. It's something
> that's highly unlikely to change; one of the premises would have to be
> sacrificed (or at least modified) to achieve it.

By this definition, though, every feature of Python that someone
doesn't understand is a wart. For a new user, mutable default
parameters is a wart, but once you understand Python's execution &
object models, it's just the way the language is.

Generally, I find "wart" means "something the user doesn't like about
the language even if it makes internal sense".


From rosuav at gmail.com  Wed Jan  2 00:46:00 2013
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 2 Jan 2013 10:46:00 +1100
Subject: [Python-ideas] Documenting Python warts on Stack Overflow
In-Reply-To: <189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com>
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<CAPTjJmrP-DokxCtKBRnydy=xTQZHBX9iHY3nYjUJnJ1Ay3wEzQ@mail.gmail.com>
	<189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com>
Message-ID: <CAPTjJmqvZJO2+n4+DoqJ5xocDN7VEtir4xscZfB0LhOZZrG0tQ@mail.gmail.com>

On Wed, Jan 2, 2013 at 10:17 AM, alex23 <wuwei23 at gmail.com> wrote:
> On Jan 2, 8:16 am, Chris Angelico <ros... at gmail.com> wrote:
>> It's a little odd what you can and can't do,
>> until you understand the underlying system fairly well. It's something
>> that's highly unlikely to change; one of the premises would have to be
>> sacrificed (or at least modified) to achieve it.
>
> By this definition, though, every feature of Python that someone
> doesn't understand is a wart. For a new user, mutable default
> parameters is a wart, but once you understand Python's execution &
> object models, it's just the way the language is.
>
> Generally, I find "wart" means "something the user doesn't like about
> the language even if it makes internal sense".

That's pretty much it, yeah. The warts of Python are the gotchas that
need to be grokked before you can call yourself fluent in the
language. Might feel as though the designers "got it wrong", or were
making an arbitrary choice, but whatever it is, the language behaves
that way and you have to get to know it. I agree that mutable defaults
are a wart.

PHP's scoping rules are simpler than Python's. A variable inside a
function is local unless it's explicitly declared global; function
names are global. (That's not the whole set of rules, but close enough
for this argument.) Python, on the other hand, adds the oddity that a
name referenced inside a function is global unless, somewhere in that
function, it's assigned to. This is a Python wart that bites people
(see the first question in http://toykeeper.net/warts/python/ for
instance), but it's a consequence of putting "variables" and
"functions" into a single namespace called "name bindings", plus the
decision to not require variable declarations (C, for instance, has
the same notion of "everything's a name", but instead of declaring
globals, declares locals). Python's scoping rules are vastly superior
to PHP's, but a bit more complicated, and may need a bit of
explanation.
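The rule is easy to demonstrate with a standalone illustration (not taken
from the linked page):

```python
x = 10

def read_global():
    return x        # no assignment to x in this function, so x is the global

def shadow():
    x = 99          # assigning anywhere in the function makes x local
    return x

def broken():
    y = x           # UnboundLocalError: the assignment below makes x local
    x = 1           # for the whole function body, including the read above
    return y
```

Here `read_global()` returns 10, `shadow()` returns 99 without touching the
global, and `broken()` raises UnboundLocalError at the first line of its body.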

(Incidentally, of all the warts listed in the page I linked above, two
give a quick and easy error message, two have better ways of doing
things (don't use +=, use append/extend), and only one is really a
wart - mutable default values. Well, that and the behaviour of += on
something in a tuple, but .extend dodges that one too.)

Documenting these sorts of oddities is a good thing, as long as the
underlying goal is one of new programmer education and not "hey you
idiots who develop this language, here's all the things you did
wrong".

ChrisA


From steve at pearwood.info  Wed Jan  2 00:57:00 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 02 Jan 2013 10:57:00 +1100
Subject: [Python-ideas] Documenting Python warts on Stack Overflow
In-Reply-To: <CAPTjJmrP-DokxCtKBRnydy=xTQZHBX9iHY3nYjUJnJ1Ay3wEzQ@mail.gmail.com>
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<CAPTjJmrP-DokxCtKBRnydy=xTQZHBX9iHY3nYjUJnJ1Ay3wEzQ@mail.gmail.com>
Message-ID: <50E377CC.70108@pearwood.info>

On 02/01/13 09:16, Chris Angelico wrote:

> My understanding of a "Python wart" is that it's something that cannot
> be changed without readdressing some fundamental design. For example,
> Python has decided that indentation and line-endings are significant -
> that a logical statement ends at end-of-line. Python has further
> decided that line continuation characters are unnecessary inside
> parenthesized expressions. Resultant wart: Breaking a massive 'for'
> loop between its assignment list and its iterable list doesn't work,
> even though breaking it anywhere else does.

A truly poor example.

You can't break a for loop between its assignment and iterable for the
same reason you can't break any other statement at an arbitrary place.
That's not how Python does things: statements must be on a single logical
line.

> (This question came up on
> python-list a little while ago.) Why should it be an error to break it
> here, but not there? Why can't I split it like this:
>
> for x,y,z in
>    start_point,
>    continuation_point,
>    end_point
> :
>    pass

As you say above, logical statements end at end-of-line. There is an
end-of-line following "for x,y,z in". Why would anyone think that you
should be able to split the statement there?

- is there a line-continuation that would let you continue over multiple
   physical lines? no

- is there a parenthesized expression that would let you continue over
   multiple physical lines? no

None of the conditions for splitting statements over multiple physical
lines apply, and so the standard rule applies: statements end at
end-of-line. This is not a wart any more than the inability to write:


y =
  x +
   1;

is a wart. Maybe you're used to being able to do that in some (but not
all) semi-colon languages, but they are not Python, any more than
Python is Forth where you might be used to writing:

x 1 + y !


To some degree warts are in the eye of the beholder, but failure of
Python to be "just like language Foo" is not a wart.


> when it's perfectly legal to split it like this:
>
> for (
>    x,y,z
> ) in (
>    start_point,
>    continuation_point,
>    end_point
> ):
>    pass
>
> Well, because you can't. It's a little odd what you can and can't do,

"Why can't I drive straight through red lights, when I'm allowed to
drive through green lights? That's a little odd!"

No it is not. It is a fundamental aspect of Python's syntax.


> until you understand the underlying system fairly well. It's something
> that's highly unlikely to change; one of the premises would have to be
> sacrificed (or at least modified) to achieve it.
>
> Something that could be changed if the devs had enough time is a
> tracker issue (or a "show me some code" issue - you want to complain,
> you can do the work to fix it). Something that could be changed, but
> would break backward compatibility is a prime candidate for __future__
> and/or Python 4 (like the change of the division operator - that
> change introduced its own oddities, some of which may be warts, eg
> that int/int->float but sqrt(float) !->  complex).

Are you talking about math.sqrt or cmath.sqrt or some other sqrt?

In general, Python 3 now extends float to complex under regular arithmetic:

py> (-100.0)**0.5
(6.123031769111886e-16+10j)


math.sqrt(-100.0) on the other hand continues to raise, because the math
module is by design limited to producing real-values. cmath.sqrt(-100.0)
continues to give a complex result, again by design.



-- 
Steven


From phd at phdru.name  Wed Jan  2 01:01:13 2013
From: phd at phdru.name (Oleg Broytman)
Date: Wed, 2 Jan 2013 04:01:13 +0400
Subject: [Python-ideas] Documenting Python warts
In-Reply-To: <189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com>
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<CAPTjJmrP-DokxCtKBRnydy=xTQZHBX9iHY3nYjUJnJ1Ay3wEzQ@mail.gmail.com>
	<189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com>
Message-ID: <20130102000113.GB672@iskra.aviel.ru>

On Tue, Jan 01, 2013 at 03:17:34PM -0800, alex23 <wuwei23 at gmail.com> wrote:
> On Jan 2, 8:16 am, Chris Angelico <ros... at gmail.com> wrote:
> > It's a little odd what you can and can't do,
> > until you understand the underlying system fairly well. It's something
> > that's highly unlikely to change; one of the premises would have to be
> > sacrificed (or at least modified) to achieve it.
> 
> By this definition, though, every feature of Python that someone
> doesn't understand is a wart. For a new user, mutable default
> parameters is a wart, but once you understand Python's execution &
> object models, it's just the way the language is.
> 
> Generally, I find "wart" means "something the user doesn't like about
> the language even if it makes internal sense".

   What about warts that don't have internal sense? Mutable default
parameters are just artifacts of the implementation. What is their
"internal sense"?

   Paraphrasing Alan Cooper from "The Inmates are Running the Asylum":
The phrase "experienced Python programmer" really means the person has
been hurt so many times that the scar tissue is thick enough so he no
longer feels the pain.

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            phd at phdru.name
           Programmers don't die, they just GOSUB without RETURN.


From rosuav at gmail.com  Wed Jan  2 01:11:29 2013
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 2 Jan 2013 11:11:29 +1100
Subject: [Python-ideas] Documenting Python warts on Stack Overflow
In-Reply-To: <50E377CC.70108@pearwood.info>
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<CAPTjJmrP-DokxCtKBRnydy=xTQZHBX9iHY3nYjUJnJ1Ay3wEzQ@mail.gmail.com>
	<50E377CC.70108@pearwood.info>
Message-ID: <CAPTjJmostS_6_X7+PATf-j5=pW2EgpDhHtMpnh2zf5XGGmv-pw@mail.gmail.com>

On Wed, Jan 2, 2013 at 10:57 AM, Steven D'Aprano <steve at pearwood.info> wrote:
> You can't break a for loop between its assignment and iterable for the
> same reason you can't break any other statement at an arbitrary place.
> That's not how Python does things: statements must be on a single logical
> line.
>
> This is not a wart any more than the inability to write:
>
> y =
>  x +
>   1;
>
> is a wart. Maybe you're used to being able to do that in some (but not
> all) semi-colon languages
>
> To some degree warts are in the eye of the beholder, but failure of
> Python to be "just like language Foo" is not a wart.

Of course. I'm just trying to find examples that have actually come up
on python-list, rather than contriving my own. As per my definition of
wart as given above, these are NOT things that need to be fixed - just
things that need to be understood.

Rule: One Python statement must be on one line. (This is the bit where
Python differs from, say, C.)

Modifying rule: Python statements can be broken across multiple lines,
given certain conditions.

Wart: There are other conditions that, though they seem superficially
similar to the legal ones, don't make for valid split points. Even
though a human might say that it's obvious and unambiguous that the
statement continues, the rules don't allow it.

>> eg int/int->float but sqrt(float) !->  complex).
>
> Are you talking about math.sqrt or cmath.sqrt or some other sqrt?
>
> In general, Python 3 now extends float to complex under regular arithmetic:
>
> py> (-100.0)**0.5
> (6.123031769111886e-16+10j)
>
>
> math.sqrt(-100.0) on the other hand continues to raise, because the math
> module is by design limited to producing real-values. cmath.sqrt(-100.0)
> continues to give a complex result, again by design.

Hmm, I was doing that one from memory. Since the ** operator happily
returns complex, it was probably math.sqrt that was in question. I
withdraw this one; the operators are consistent amongst themselves,
all will extend to the "next type up" if necessary. (Or at least, this
pair do. There might be a wart elsewhere, but this ain't it.)

ChrisA


From rosuav at gmail.com  Wed Jan  2 01:34:47 2013
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 2 Jan 2013 11:34:47 +1100
Subject: [Python-ideas] Documenting Python warts
In-Reply-To: <20130102000113.GB672@iskra.aviel.ru>
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<CAPTjJmrP-DokxCtKBRnydy=xTQZHBX9iHY3nYjUJnJ1Ay3wEzQ@mail.gmail.com>
	<189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com>
	<20130102000113.GB672@iskra.aviel.ru>
Message-ID: <CAPTjJmpBGoXkgSV8fm2xFLL_6GL5CiwQWsSJYZDEoUc-sHAA0w@mail.gmail.com>

On Wed, Jan 2, 2013 at 11:01 AM, Oleg Broytman <phd at phdru.name> wrote:
>    What about warts that don't have internal sense? Mutable default
> parameters are just artifacts of the implementation. What is their
> "internal sense"?

They let you use a function for something where you'd otherwise need
to instantiate an object and play with it. Take caching, for instance:

def name_lookup(name, cache={}):
  if name not in cache:
    cache[name] = some_lengthy_operation(name)
    # prune the cache of old stuff to keep its size down
  return cache[name]

You can ignore the default argument and pretend it's all magic, or you
can explicitly run a separate cache:

name_lookup("foo",{})   # easy way to say "bypass the cache"

# Do a bunch of lookups that won't be in the main cache, and which
# would only pollute the main cache for later
local_name_cache = {}
[name_lookup(n,local_name_cache) for n in names]

The other consideration here is of side effects. It's all very well to
wave a magic wand and say that:

def foo(x,y=[]): pass

will create a unique list for each y, but what about:

def foo(x,y=open("main.log","w")): pass

or similar? Should it reopen the log every time? Should it reevaluate
the expression? There's an easy way to spell it if you want that
behaviour:

def foo(x,y=None):
  if y is None: y=whatever_expression_you_want

(or using object() if None is a legal arg).

Whichever way mutable objects in default args are handled, there are
going to be strangenesses. Therefore the best thing to do is (almost
certainly) the simplest.

>    Paraphrasing Alan Cooper from "The Inmates are Running the Asylum":
> The phrase "experienced Python programmer" really means the person has
> been hurt so many times that the scar tissue is thick enough so he no
> longer feels the pain.

That applies to PHP, and possibly to C (though if you treat C as "all
the power of assembly language, coupled with all the readability of
assembly language", then it doesn't hurt nearly as much as if you try
to treat it as a modern high level language). I'm not so sure it
applies to Python.

ChrisA


From wuwei23 at gmail.com  Wed Jan  2 01:54:34 2013
From: wuwei23 at gmail.com (alex23)
Date: Tue, 1 Jan 2013 16:54:34 -0800 (PST)
Subject: [Python-ideas] Documenting Python warts
In-Reply-To: <20130102000113.GB672@iskra.aviel.ru>
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<CAPTjJmrP-DokxCtKBRnydy=xTQZHBX9iHY3nYjUJnJ1Ay3wEzQ@mail.gmail.com>
	<189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com>
	<20130102000113.GB672@iskra.aviel.ru>
Message-ID: <c6f76c2f-635e-4ae9-a263-b7428e68a1e7@pe9g2000pbc.googlegroups.com>

On Jan 2, 10:01 am, Oleg Broytman <p... at phdru.name> wrote:
> Paraphrasing Alan Cooper from "The Inmates are Running the Asylum":
> The phrase "experienced Python programmer" really means the person has
> been hurt so many times that the scar tissue is thick enough so he no
> longer feels the pain.

To me, that's nonsense. The pain people are experiencing with "warts"
like mutable defaults is entirely from trying to force Python to fit
mental models they've constructed of other languages.

The "internal sense" of mutable defaults is that everything is an
object, and that function arguments are declared at definition time,
not run time. What you call an "implementation artifact" I see as
expected behaviour; any other implementation that didn't provide this
wouldn't be Python in a number of fundamental ways.


From steve at pearwood.info  Wed Jan  2 01:55:58 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 02 Jan 2013 11:55:58 +1100
Subject: [Python-ideas] Documenting Python warts
In-Reply-To: <20130102000113.GB672@iskra.aviel.ru>
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<CAPTjJmrP-DokxCtKBRnydy=xTQZHBX9iHY3nYjUJnJ1Ay3wEzQ@mail.gmail.com>
	<189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com>
	<20130102000113.GB672@iskra.aviel.ru>
Message-ID: <50E3859E.8030003@pearwood.info>

On 02/01/13 11:01, Oleg Broytman wrote:
> On Tue, Jan 01, 2013 at 03:17:34PM -0800, alex23 <wuwei23 at gmail.com> wrote:
>> On Jan 2, 8:16 am, Chris Angelico <ros... at gmail.com> wrote:
>>> It's a little odd what you can and can't do,
>>> until you understand the underlying system fairly well. It's something
>>> that's highly unlikely to change; one of the premises would have to be
>>> sacrificed (or at least modified) to achieve it.
>>
>> By this definition, though, every feature of Python that someone
>> doesn't understand is a wart. For a new user, mutable default
>> parameters is a wart, but once you understand Python's execution &
>> object models, it's just the way the language is.
>>
>> Generally, I find "wart" means "something the user doesn't like about
>> the language even if it makes internal sense".
>
>     What about warts that don't have internal sense? Mutable default
> parameters are just artifacts of the implementation. What is their
> "internal sense"?

They are not artifacts of the implementation, they are a consequence
of a deliberate design choice of Python.

Default values in function definitions are set *once*, when the function
object is created. Only the function body is run every time the function
is called, not the function definition. So whether you do this:

def ham(x=0):
     x += 1
     return x

or this:

def spam(x=[]):
     x.append(1)
     return x


the default value for both functions is a single object created once and
reused every time you call the function.

The consequences of this may be too subtle for beginners to predict,
and the fact that even experienced coders sometimes forget them makes
it a wart, but it makes perfect internal sense:

* in Python, bindings ALWAYS occur when the code is executed;

* in Python, "x=<whatever>" is a binding;

* even inside a function definition;

* def is a statement which is executed at run time, not something
   performed at compile time;

* therefore, inside the statement "def spam(x=[]): ..." the binding
   x=[] occurs ONCE ONLY. The same list object is always used for the
   default value, not a different one each time.
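Restated as a self-contained sketch (the same spam function as above), the once-only binding can be observed directly:

```python
def spam(x=[]):
    x.append(1)
    return x

first = spam()
second = spam()

# Both calls returned the very same list object, which now holds
# two items: the default was bound once, at definition time.
same_object = first is second
```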

Early binding of function defaults should, in my opinion, be preferred
over late binding because:

* given early binding, it is clean to get late binding semantics with
   just one extra line. Everything you need remains encapsulated inside
   the function:

   def spam(x=None):
       if x is None: x = []
       x.append(1)
       return x


* given late binding, it is ugly to get early binding semantics,
   since it requires you to create a separate global "constant"
   for every argument needing an early binding:

   _SPAM_DEFAULT_ARG = []  # Don't touch this!
   def spam(x=None):
       if x is None: x = _SPAM_DEFAULT_ARG
       x.append(1)
       return x




-- 
Steven


From ncoghlan at gmail.com  Wed Jan  2 02:07:58 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 2 Jan 2013 11:07:58 +1000
Subject: [Python-ideas] Documenting Python warts
In-Reply-To: <20130102000113.GB672@iskra.aviel.ru>
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<CAPTjJmrP-DokxCtKBRnydy=xTQZHBX9iHY3nYjUJnJ1Ay3wEzQ@mail.gmail.com>
	<189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com>
	<20130102000113.GB672@iskra.aviel.ru>
Message-ID: <CADiSq7dZ4fUTsg-FEp1_XFw7_Fg2kyWPFM=2A==-mq2smYGLXw@mail.gmail.com>

On Wed, Jan 2, 2013 at 10:01 AM, Oleg Broytman <phd at phdru.name> wrote:
> On Tue, Jan 01, 2013 at 03:17:34PM -0800, alex23 <wuwei23 at gmail.com> wrote:
>> On Jan 2, 8:16 am, Chris Angelico <ros... at gmail.com> wrote:
>> > It's a little odd what you can and can't do,
>> > until you understand the underlying system fairly well. It's something
>> > that's highly unlikely to change; one of the premises would have to be
>> > sacrificed (or at least modified) to achieve it.
>>
>> By this definition, though, every feature of Python that someone
>> doesn't understand is a wart. For a new user, mutable default
>> parameters is a wart, but once you understand Python's execution &
>> object models, it's just the way the language is.
>>
>> Generally, I find "wart" means "something the user doesn't like about
>> the language even if it makes internal sense".

FWIW, I prefer the term "traps for the unwary" over "warts", since
it's less judgmental and better describes issues that can cause
problems for people learning the language. I highlight some
of the examples related to the import system here:
http://python-notes.boredomandlaziness.org/en/latest/python_concepts/import_traps.html

>    What about warts that don't have internal sense? Mutable default
> parameters are just artifacts of the implementation. What is their
> "internal sense"?

Um, no. Mutable default arguments make perfect sense once you
understand the difference between compile time, definition time and
execution time for a function. Defaults are evaluated at definition
time, thus they are necessarily shared across all invocations of the
function. If you don't want them shared, you use a sentinel value like
None to postpone the creation to execution time. They're a trap for
the unwary, but not a wart.

Else clauses on loops are arguably closer to qualifying as a genuine
wart (see http://python-notes.boredomandlaziness.org/en/latest/python_concepts/break_else.html),
since they're not much shorter than the explicit sentinel value based
alternative, and significantly less intuitive. However, because they
exist, and people *will* encounter them in real world code, every
beginner will eventually have to learn what they mean.
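As a hedged sketch (function names invented for illustration), here is the loop-else construct next to the explicit sentinel-based alternative it barely shortens:

```python
def find_with_else(items, target):
    for i, x in enumerate(items):
        if x == target:
            break
    else:
        return -1          # loop finished without break: not found
    return i

def find_with_sentinel(items, target):
    found = -1             # explicit sentinel instead of loop-else
    for i, x in enumerate(items):
        if x == target:
            found = i
            break
    return found
```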

The other complaint discussed in the thread, regarding "Why don't
compound statement keywords and their trailing colon count as
parentheses for purposes of ignoring line breaks?" has to do with a
mix of implementation simplicity and error quality. Pairing up
"if"/":", "with"/":", "for"/":" etc would certainly be possible, but
may result in the infamous "missing semi-colon" style of C syntax
error (or missing paren style of Lisp error), where the fault may be
reported well away from the missing character, or with an error that
is extremely hard for a beginner to translate into "you left out a
character here". Given the likely detrimental effect on error quality,
and the ability to use actual parens or backslashes for line
continuation, the status quo seems the better trade-off.

The Design FAQ and Programming FAQ are intended to be the repository
for answers to this kind of question. Addition of new questions and
answers is handled like any other patch: via the tracker (and some of
the existing answers could likely do with updates as well).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From phd at phdru.name  Wed Jan  2 00:49:16 2013
From: phd at phdru.name (Oleg Broytman)
Date: Wed, 2 Jan 2013 03:49:16 +0400
Subject: [Python-ideas] Documenting Python warts
In-Reply-To: <20130101225505.757540fa@pitrou.net>
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
Message-ID: <20130101234916.GA672@iskra.aviel.ru>

Hi!

On Tue, Jan 01, 2013 at 10:55:05PM +0100, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Mon, 31 Dec 2012 04:00:12 +0400
> Oleg Broytman <phd at phdru.name> wrote:
> > On Sun, Dec 30, 2012 at 11:20:34PM +0100, Victor Stinner <victor.stinner at gmail.com> wrote:
> > > If I understood correctly, you would like to list some specific issues
> > > like print() not flushing stdout immediately if you ask to not write a
> > > newline (print "a", in Python 2 or print("a", end=" ") in Python 3).
> > > If I understood correctly, and if you want to improve Python, you
> > > should help the documentation project. Or if you can build a website
> > > listing such issues *and listing solutions*
> > 
> > -- things that don't exist where they should (but the core team object
> >    or they are hard to implement or something);
> > -- things that exist where they shouldn't; they are hard to fix because
> >    removing them would break backward compatibility;
> > -- things that are implemented in strange, inconsistent ways.
> > 
> >    A few examples:
> > [snip]
> 
> The problem is you are listing examples which *in your opinion* are
> issues with Python. Other people would have different ideas of what is
> an issue and what is not. This can't be the right methodology if we
> want to write a piece of Python docs.

   Absolutely not. I collected the list of examples in reply to a
question "what are warts and why one cannot just document solutions?" I
hope I managed to show that warts are built (or unbuilt, so to say) so
deeply into Python and the stdlib design that it's impossible to fix
them with code or documentation. Fixing them requires major design changes.

> Only things which are *well-known* annoyances can qualify.

   Well, some warts are quite well-known. My counter overflows when I
try to count how many times anonymous code blocks have been proposed
and rejected.
   IIRC Mr. van Rossum admitted that for/else was a design mistake.
   One wart is being worked on right now: async libs redesign.

> I also disagree that missing features are "warts"; they are just
> missing features, not something unpleasant that's difficult to get rid
> of.

   Some of those missing features are nearly impossible to get rid of.
The idea of anonymous code blocks is rejected constantly so no one would
dare to create a patch.
   As for their unpleasantness -- it's in the eye of the beholder, of
course. I'm not going to fight tooth and nail for my vision.

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            phd at phdru.name
           Programmers don't die, they just GOSUB without RETURN.


From steve at pearwood.info  Wed Jan  2 03:00:49 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 02 Jan 2013 13:00:49 +1100
Subject: [Python-ideas] Documenting Python warts on Stack Overflow
In-Reply-To: <CAPTjJmqvZJO2+n4+DoqJ5xocDN7VEtir4xscZfB0LhOZZrG0tQ@mail.gmail.com>
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<CAPTjJmrP-DokxCtKBRnydy=xTQZHBX9iHY3nYjUJnJ1Ay3wEzQ@mail.gmail.com>
	<189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com>
	<CAPTjJmqvZJO2+n4+DoqJ5xocDN7VEtir4xscZfB0LhOZZrG0tQ@mail.gmail.com>
Message-ID: <50E394D1.5060908@pearwood.info>

On 02/01/13 10:46, Chris Angelico wrote:

> PHP's scoping rules are simpler than Python's. A variable inside a
> function is local unless it's explicitly declared global; function
> names are global. (That's not the whole set of rules, but close enough
> for this argument.) Python, on the other hand, adds the oddity that a
> name referenced inside a function is global unless, somewhere in that
> function, it's assigned to.


As given, comparing only treatment of locals and globals, I don't agree
that this makes PHP's scoping rules simpler.


PHP:
   if the name refers to a function:
     - the name is always global;
   otherwise:
     - the name is local unless explicitly declared global.

Python:
   if the name is declared global:
     - the name is always global;
   otherwise:
     - the name is global unless implicitly declared local.

(Implicitly local means "the name is bound somewhere in the body of the
function".)
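The Python half of these rules can be made concrete with a small sketch (names invented for illustration):

```python
x = 10

def reader():
    # x is never assigned anywhere in this function, so it is global
    return x

def writer():
    # the assignment further down makes x local for the WHOLE body,
    # so reading it before that assignment raises UnboundLocalError
    try:
        x
    except UnboundLocalError:
        outcome = "local before assignment"
    else:
        outcome = "read the global"
    x = 20
    return outcome
```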

Of course, in reality Python includes further complexity: closures and
nonlocal, neither of which are available in PHP due to the lack of
local functions:

http://gadgetopia.com/post/4089


PHP is simpler because it does less.



-- 
Steven


From phd at phdru.name  Wed Jan  2 04:08:51 2013
From: phd at phdru.name (Oleg Broytman)
Date: Wed, 2 Jan 2013 07:08:51 +0400
Subject: [Python-ideas] Documenting Python warts
In-Reply-To: <c6f76c2f-635e-4ae9-a263-b7428e68a1e7@pe9g2000pbc.googlegroups.com>
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<CAPTjJmrP-DokxCtKBRnydy=xTQZHBX9iHY3nYjUJnJ1Ay3wEzQ@mail.gmail.com>
	<189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com>
	<20130102000113.GB672@iskra.aviel.ru>
	<c6f76c2f-635e-4ae9-a263-b7428e68a1e7@pe9g2000pbc.googlegroups.com>
Message-ID: <20130102030851.GA11279@iskra.aviel.ru>

On Tue, Jan 01, 2013 at 04:54:34PM -0800, alex23 <wuwei23 at gmail.com> wrote:
> The pain people are experiencing with "warts"
> like mutable defaults is entirely from trying to force Python to fit
> mental models they've constructed of other languages.

   Yes. And preserving this mental model is important. There is a common
mental model for similar imperative languages: a common set of built-in
types (chars, strings, integers, floats) and containers (arrays and
matrices); a common set of operations (addition is always spelled as an
infix 'plus' sign, logical AND as '&' or '&&'); there are functions with
parameters -- usually written inside round parentheses; in
object-oriented languages there are classes with inheritance...
   So it's perfectly natural that people using one language expect
features found in other languages, and expect those features to work in
similar ways.
   Sure, every particular language deviates from that common model. Often
people can tolerate the deviation; sometimes they even praise it for
various reasons. But when a deviation causes pain for too many developers --
there is certainly a problem.

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            phd at phdru.name
           Programmers don't die, they just GOSUB without RETURN.


From phd at phdru.name  Wed Jan  2 04:12:01 2013
From: phd at phdru.name (Oleg Broytman)
Date: Wed, 2 Jan 2013 07:12:01 +0400
Subject: [Python-ideas] Documenting Python warts
In-Reply-To: <CAPTjJmpBGoXkgSV8fm2xFLL_6GL5CiwQWsSJYZDEoUc-sHAA0w@mail.gmail.com>
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<CAPTjJmrP-DokxCtKBRnydy=xTQZHBX9iHY3nYjUJnJ1Ay3wEzQ@mail.gmail.com>
	<189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com>
	<20130102000113.GB672@iskra.aviel.ru>
	<CAPTjJmpBGoXkgSV8fm2xFLL_6GL5CiwQWsSJYZDEoUc-sHAA0w@mail.gmail.com>
Message-ID: <20130102031201.GB11279@iskra.aviel.ru>

On Wed, Jan 02, 2013 at 11:34:47AM +1100, Chris Angelico <rosuav at gmail.com> wrote:
> Whichever way mutable objects in default args are handled, there are
> going to be strangenesses. Therefore the best thing to do is (almost
> certainly) the simplest.

   And the simplest thing would be... let me think... forbid mutable
defaults altogether? Or maybe make them read-only?
   The current implementation is the simplest from the implementation point
of view, but it requires additional documentation, especially for novice
users. Is it really the simplest?

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            phd at phdru.name
           Programmers don't die, they just GOSUB without RETURN.


From phd at phdru.name  Wed Jan  2 04:16:16 2013
From: phd at phdru.name (Oleg Broytman)
Date: Wed, 2 Jan 2013 07:16:16 +0400
Subject: [Python-ideas] Documenting Python warts
In-Reply-To: <CADiSq7dZ4fUTsg-FEp1_XFw7_Fg2kyWPFM=2A==-mq2smYGLXw@mail.gmail.com>
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<CAPTjJmrP-DokxCtKBRnydy=xTQZHBX9iHY3nYjUJnJ1Ay3wEzQ@mail.gmail.com>
	<189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com>
	<20130102000113.GB672@iskra.aviel.ru>
	<CADiSq7dZ4fUTsg-FEp1_XFw7_Fg2kyWPFM=2A==-mq2smYGLXw@mail.gmail.com>
Message-ID: <20130102031616.GC11279@iskra.aviel.ru>

On Wed, Jan 02, 2013 at 11:07:58AM +1000, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Mutable default arguments make perfect sense once you
> understand the difference between compile time, definition time and
> execution time for a function. Defaults are evaluated at definition
> time, thus they are necessarily shared across all invocations of the
> function.

   I.e., users have to understand the current implementation. Mutable
defaults are not a language design choice, they are dictated by the
implementation, right?

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            phd at phdru.name
           Programmers don't die, they just GOSUB without RETURN.


From ncoghlan at gmail.com  Wed Jan  2 04:23:42 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 2 Jan 2013 13:23:42 +1000
Subject: [Python-ideas] Documenting Python warts
In-Reply-To: <20130101234916.GA672@iskra.aviel.ru>
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<20130101234916.GA672@iskra.aviel.ru>
Message-ID: <CADiSq7fKB7L1jCu4pyq_E2DdPmoh7bSCQtPsTRtiWuskHFMijw@mail.gmail.com>

On Wed, Jan 2, 2013 at 9:49 AM, Oleg Broytman <phd at phdru.name> wrote:
>> I also disagree that missing features are "warts"; they are just
>> missing features, not something unpleasant that's difficult to get rid
>> of.
>
>    Some of those missing features are near to impossible to get rid of.
> The idea of anonymous code blocks is rejected constantly so no one would
> dare to create a patch.
>    As for their unpleasantness -- it's in the eye of the beholder, of
> course. I'm not going to fight tooth and nail for my vision.

This is why the "wart" term is an inherently bad choice: it polarises
disputes, and creates arguments where none need exist. If you
instead split them into "hard problems" and "traps for the unwary",
it's easier to have a more rational discussion and come up with a
shared list.

Interoperable asynchronous IO is an inherently hard problem - Guido's
probably the only person in the world capable of gathering sufficient
interest from the right people to come up with a solution that the
existing async frameworks will be willing to support.

Packaging and software distribution is an inherently hard problem (all
current packaging systems suck, with even the best of them being
either language or platform specific), made even harder in the Python
case by the presence of an existing 90% solution in setuptools.

Anonymous blocks in a language with a strong statement/expression
dichotomy is an inherently hard problem (hence the existence of not
one but two deferred PEPs on the topic: PEP 403 and 3150)

Switch statements in a language without compile time named constants
are an inherently hard problem, and some of the demand for this
construct is reduced due to the availability of higher-order
programming features (i.e. dynamic dispatch to stored callables)
(hence the rejected PEPs 275 and 3103)

A do/until loop has the problem of coming up with an elegant syntax
that is demonstrably superior to while/if/break (hence the deferred
PEP 315)

The design space for things that Python *could* do is unimaginably
vast. The number of changes we can make that won't have the net effect
of making the language worse is vanishingly small by comparison.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From ncoghlan at gmail.com  Wed Jan  2 04:25:35 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 2 Jan 2013 13:25:35 +1000
Subject: [Python-ideas] Documenting Python warts
In-Reply-To: <20130102031616.GC11279@iskra.aviel.ru>
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<CAPTjJmrP-DokxCtKBRnydy=xTQZHBX9iHY3nYjUJnJ1Ay3wEzQ@mail.gmail.com>
	<189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com>
	<20130102000113.GB672@iskra.aviel.ru>
	<CADiSq7dZ4fUTsg-FEp1_XFw7_Fg2kyWPFM=2A==-mq2smYGLXw@mail.gmail.com>
	<20130102031616.GC11279@iskra.aviel.ru>
Message-ID: <CADiSq7e=dStDT61bXBetWP=Xu1uDPr_3AAaaE_FA5=70ksvGdQ@mail.gmail.com>

On Wed, Jan 2, 2013 at 1:16 PM, Oleg Broytman <phd at phdru.name> wrote:
> On Wed, Jan 02, 2013 at 11:07:58AM +1000, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> Mutable default arguments make perfect sense once you
>> understand the difference between compile time, definition time and
>> execution time for a function. Defaults are evaluated at definition
>> time, thus they are necessarily shared across all invocations of the
>> function.
>
>    I.e., users have to understand the current implementation. Mutable
> defaults are not a language design choice, they are dictated by the
> implementation, right?

No, they're not an implementation accident, they're part of the
language design. It's OK if you don't like them, but please stop
claiming they're a CPython implementation artifact.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From rosuav at gmail.com  Wed Jan  2 05:22:40 2013
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 2 Jan 2013 15:22:40 +1100
Subject: [Python-ideas] Documenting Python warts
In-Reply-To: <50E3859E.8030003@pearwood.info>
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<CAPTjJmrP-DokxCtKBRnydy=xTQZHBX9iHY3nYjUJnJ1Ay3wEzQ@mail.gmail.com>
	<189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com>
	<20130102000113.GB672@iskra.aviel.ru>
	<50E3859E.8030003@pearwood.info>
Message-ID: <CAPTjJmpzkpNumYP3rQrocXCaitkd6XRnineLF85-5DR5dVZ0kw@mail.gmail.com>

On Wed, Jan 2, 2013 at 11:55 AM, Steven D'Aprano <steve at pearwood.info> wrote:
> * in Python, bindings ALWAYS occur when the code is executed;
>
> * in Python, "x=<whatever>" is a binding;
>
> * even inside a function definition;

Hey, that's a cool way of looking at it! I never thought of it that
way. So default arguments are simply assigned to right back at
function definition time, even though they're locals. Neat!
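That view can be sketched with a default expression that has a visible side effect (names invented for illustration):

```python
calls = []

def make_default():
    calls.append("evaluated")
    return []

def f(x=make_default()):   # default expression runs NOW, at def time
    x.append(1)
    return x

first = f()
second = f()
# make_default ran exactly once; both calls mutated the same list
```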

ChrisA


From greg.ewing at canterbury.ac.nz  Wed Jan  2 05:25:13 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 02 Jan 2013 17:25:13 +1300
Subject: [Python-ideas] Documenting Python warts
In-Reply-To: <c6f76c2f-635e-4ae9-a263-b7428e68a1e7@pe9g2000pbc.googlegroups.com>
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<CAPTjJmrP-DokxCtKBRnydy=xTQZHBX9iHY3nYjUJnJ1Ay3wEzQ@mail.gmail.com>
	<189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com>
	<20130102000113.GB672@iskra.aviel.ru>
	<c6f76c2f-635e-4ae9-a263-b7428e68a1e7@pe9g2000pbc.googlegroups.com>
Message-ID: <50E3B6A9.70500@canterbury.ac.nz>

alex23 wrote:
> The "internal sense" of mutable defaults is that everything is an
> object, and that function arguments are declared at definition time,
> not run time. What you call an "implementation artifact" I see as
> expected behaviour; any other implementation that didn't provide this
> wouldn't be Python in a number of fundamental ways.

What the people who object to this behaviour are really
complaining about is not that the default value is mutable,
but that the default expression is not re-evaluated on
every call.

To me, the justification for this is clear: most of the
time, evaluation on every call is not necessary, so doing
it would be needlessly inefficient. For those cases where
you need a fresh value each time, there is a straightforward
way to get it.
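The straightforward way Greg refers to is the usual sentinel idiom; a minimal sketch (function name invented):

```python
def fresh_list(x=None):
    if x is None:
        x = []          # evaluated on every call: a genuinely fresh list
    x.append(1)
    return x
```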

-- 
Greg


From rosuav at gmail.com  Wed Jan  2 05:27:07 2013
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 2 Jan 2013 15:27:07 +1100
Subject: [Python-ideas] Documenting Python warts
In-Reply-To: <CADiSq7dZ4fUTsg-FEp1_XFw7_Fg2kyWPFM=2A==-mq2smYGLXw@mail.gmail.com>
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<CAPTjJmrP-DokxCtKBRnydy=xTQZHBX9iHY3nYjUJnJ1Ay3wEzQ@mail.gmail.com>
	<189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com>
	<20130102000113.GB672@iskra.aviel.ru>
	<CADiSq7dZ4fUTsg-FEp1_XFw7_Fg2kyWPFM=2A==-mq2smYGLXw@mail.gmail.com>
Message-ID: <CAPTjJmoBs=ZRXz4OjzzLo+bsmWmfo_8gp2i8ET8VLVUSdA-P6g@mail.gmail.com>

On Wed, Jan 2, 2013 at 12:07 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> FWIW, I prefer the term "traps for the unwary" over "warts", since
> it's less judgmental and better covers the goal of issues for people
> which can cause problems with learning the language.

Sure. I prefer a shorter keyword-like name, but I think we're talking
about the same thing here.

ChrisA


From rosuav at gmail.com  Wed Jan  2 05:35:39 2013
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 2 Jan 2013 15:35:39 +1100
Subject: [Python-ideas] Documenting Python warts on Stack Overflow
In-Reply-To: <50E394D1.5060908@pearwood.info>
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<CAPTjJmrP-DokxCtKBRnydy=xTQZHBX9iHY3nYjUJnJ1Ay3wEzQ@mail.gmail.com>
	<189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com>
	<CAPTjJmqvZJO2+n4+DoqJ5xocDN7VEtir4xscZfB0LhOZZrG0tQ@mail.gmail.com>
	<50E394D1.5060908@pearwood.info>
Message-ID: <CAPTjJmp5unjT_8gdjZ+Y=sdwyj96FNC4B8MPh78cSe3o1hij=w@mail.gmail.com>

On Wed, Jan 2, 2013 at 1:00 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> On 02/01/13 10:46, Chris Angelico wrote:
>
>> PHP's scoping rules are simpler than Python's. A variable inside a
>> function is local unless it's explicitly declared global; function
>> names are global. (That's not the whole set of rules, but close enough
>> for this argument.) Python, on the other hand, adds the oddity that a
>> name referenced inside a function is global unless, somewhere in that
>> function, it's assigned to.
>
>
>
> As given, comparing only treatment of locals and globals, I don't agree
> that this makes PHP's scoping rules simpler.
>
> PHP:
>   if the name refers to a function:
>     - the name is always global;

Not quite. Python has the concept of "names" which might be bound to
function objects, or might be bound to simple integers. PHP has two
completely separate namespaces.

<?php
function foo()
{
        echo "Function foo\n";
}
$foo = 1;
echo "foo: ".$foo."\n";
foo();
?>

The variable $foo and the function foo() don't collide, so this isn't
a rule that governs where the name "foo" is looked up. Python has no
such distinction, so code like this does exactly what you would
expect:

def foo():
  pass
bar = foo
def quux():
  bar() # No assignment in the function, so look for a global name 'bar'.
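That "local if assigned anywhere in the function" rule is the part that
surprises people; a minimal sketch (names are hypothetical):

```python
x = 10

def bump():
    # Because 'x' is assigned later in this function, the compiler
    # treats 'x' as local for the whole function body, so reading it
    # here raises UnboundLocalError instead of finding the global.
    try:
        print(x)
    except UnboundLocalError:
        print("x is local here, and not yet bound")
    x = 1  # this assignment is what makes 'x' local

bump()
```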


> PHP is simpler because it does less.

Right. And the rules of a Turing tarpit like Ook are even simpler.
Further proof that design warts are not, in and of themselves,
necessarily bad.

ChrisA


From solipsis at pitrou.net  Wed Jan  2 08:29:01 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 2 Jan 2013 08:29:01 +0100
Subject: [Python-ideas] Documenting Python warts
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<CAPTjJmrP-DokxCtKBRnydy=xTQZHBX9iHY3nYjUJnJ1Ay3wEzQ@mail.gmail.com>
	<189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com>
	<20130102000113.GB672@iskra.aviel.ru>
	<CADiSq7dZ4fUTsg-FEp1_XFw7_Fg2kyWPFM=2A==-mq2smYGLXw@mail.gmail.com>
	<20130102031616.GC11279@iskra.aviel.ru>
	<CADiSq7e=dStDT61bXBetWP=Xu1uDPr_3AAaaE_FA5=70ksvGdQ@mail.gmail.com>
Message-ID: <20130102082901.2d6a4a63@pitrou.net>

On Wed, 2 Jan 2013 13:25:35 +1000
Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Wed, Jan 2, 2013 at 1:16 PM, Oleg Broytman <phd at phdru.name> wrote:
> > On Wed, Jan 02, 2013 at 11:07:58AM +1000, Nick Coghlan <ncoghlan at gmail.com> wrote:
> >> Mutable default arguments make perfect sense once you
> >> understand the difference between compile time, definition time and
> >> execution time for a function. Defaults are evaluated at definition
> >> time, thus they are necessarily shared across all invocations of the
> >> function.
> >
> >    I.e., users have to understand the current implementation. Mutable
> > defaults are not a language design choice, they are dictated by the
> > implementation, right?
> 
> No, they're not an implementation accident, they're part of the
> language design. It's OK if you don't like them, but please stop
> claiming they're a CPython implementation artifact.

Let's call them a compromise then, but calling them a language feature
sounds delusional. I can't remember ever taking advantage of the fact
that mutable default arguments are shared across function invocations.

Regards

Antoine.




From solipsis at pitrou.net  Wed Jan  2 08:35:02 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 2 Jan 2013 08:35:02 +0100
Subject: [Python-ideas] Documenting Python warts
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<20130101234916.GA672@iskra.aviel.ru>
Message-ID: <20130102083502.7a62bf53@pitrou.net>

On Wed, 2 Jan 2013 03:49:16 +0400
Oleg Broytman <phd at phdru.name> wrote:
> > 
> > The problem is you are listing examples which *in your opinion* are
> > issues with Python. Other people would have different ideas of what is
> > an issue and what is not. This can't be the right methodology if we
> > want to write a piece of Python docs.
> 
>    Absolutely not. I collected the list of examples in reply to a
> question "what are warts and why one cannot just document solutions?" I
> hope I managed to show that warts are built (or unbuilt, so to say) so
> deep in Python and the stdlib design it's impossible to fix them with
> code or documentation.

Now please stop FUDding. It is outrageous to claim that missing
features are "impossible to fix with code or documentation".

If you come with a reasonable syntax for anonymous code blocks (and
have a patch to back that up), I'm sure they would be accepted. If you
can't or don't want to, then you can't accuse our community of being
biased against anonymous code blocks.

Regards

Antoine.




From mwm at mired.org  Wed Jan  2 07:04:24 2013
From: mwm at mired.org (Mike Meyer)
Date: Wed, 02 Jan 2013 00:04:24 -0600
Subject: [Python-ideas] Documenting Python warts
In-Reply-To: <20130101234916.GA672@iskra.aviel.ru>
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<20130101234916.GA672@iskra.aviel.ru>
Message-ID: <beb38889-bf97-4a5e-bae2-03e8df898dae@email.android.com>



Oleg Broytman <phd at phdru.name> wrote:
>   Well, some warts are quite well-known. My counter overflows when I
>try to count how many times anonymous code blocks have been proposed
>and rejected.
>   IIRC Mr. van Rossum admitted that for/else was a design mistake.

As I recall it, that wasn't because they were a bad idea per se, but
because the minor upside they provide isn't worth the confusion they
create for newcomers.

But since we're referencing the BDFL, IIRC he isn't against anonymous
code blocks (and I believe that is by far the most proposed/requested
feature) per se. The proposals all seem to fail in one of three ways:
1) embedding them in expressions when indentation denotes block
structure just invites unreadable code; 2) putting them in a separate
block requires a name, and we already have def if the programmer
provides it; or 3) providing an implicit name for a separate block
isn't enough of a win to violate "explicit is better than implicit".
There have been some let/where type suggestions, but those are more
about namespaces than anonymous code blocks.
-- 
Sent from my Android tablet with K-9 Mail. Please excuse my swyping.


From rosuav at gmail.com  Wed Jan  2 09:12:25 2013
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 2 Jan 2013 19:12:25 +1100
Subject: [Python-ideas] Documenting Python warts
In-Reply-To: <20130102082901.2d6a4a63@pitrou.net>
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<CAPTjJmrP-DokxCtKBRnydy=xTQZHBX9iHY3nYjUJnJ1Ay3wEzQ@mail.gmail.com>
	<189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com>
	<20130102000113.GB672@iskra.aviel.ru>
	<CADiSq7dZ4fUTsg-FEp1_XFw7_Fg2kyWPFM=2A==-mq2smYGLXw@mail.gmail.com>
	<20130102031616.GC11279@iskra.aviel.ru>
	<CADiSq7e=dStDT61bXBetWP=Xu1uDPr_3AAaaE_FA5=70ksvGdQ@mail.gmail.com>
	<20130102082901.2d6a4a63@pitrou.net>
Message-ID: <CAPTjJmqVbNVoRQtXeTRfJex2=vmjJC2W5VOOmkq+A3czepejLQ@mail.gmail.com>

On Wed, Jan 2, 2013 at 6:29 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Let's call them a compromise then, but calling them a language feature
> sounds delusional. I can't remember ever taking advantage of the fact
> that mutable default arguments are shared across function invocations.

One common use is caching, as I mentioned earlier (with a contrived
example). Another huge benefit is efficiency - construct a heavy
object once and keep using it. There are others.
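The caching pattern looks something like this (a sketch, not an example
from the thread):

```python
def fib(n, _cache={0: 0, 1: 1}):
    # _cache is evaluated once, at definition time, so memoized
    # results survive across calls -- the "shared default" behaviour
    # used deliberately.
    if n not in _cache:
        _cache[n] = fib(n - 1) + fib(n - 2)
    return _cache[n]

print(fib(10))  # 55
```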

It's a feature that can bite people, but no less a feature for that.

ChrisA


From tjreedy at udel.edu  Wed Jan  2 09:54:43 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 02 Jan 2013 03:54:43 -0500
Subject: [Python-ideas] Documenting Python warts on Stack Overflow
In-Reply-To: <20121231000012.GA10426@iskra.aviel.ru>
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
Message-ID: <kc0sl7$hvm$1@ger.gmane.org>

On 12/30/2012 7:00 PM, Oleg Broytman wrote:

>> A list of such issues without solutions doesn't help anyone.
>
>     I cannot say for Anatoly but for me warts are:

Another list that, to me, is off-topic for this mailing list. Go to 
python-list, which is meant for such things.

If you have a (one) specific idea for improving (c)python, that is not 
an energy-sucking rehash of rejected ideas, then post it.

-- 
Terry Jan Reedy



From tjreedy at udel.edu  Wed Jan  2 09:58:44 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 02 Jan 2013 03:58:44 -0500
Subject: [Python-ideas] Documenting Python warts
In-Reply-To: <20130102000113.GB672@iskra.aviel.ru>
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<CAPTjJmrP-DokxCtKBRnydy=xTQZHBX9iHY3nYjUJnJ1Ay3wEzQ@mail.gmail.com>
	<189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com>
	<20130102000113.GB672@iskra.aviel.ru>
Message-ID: <kc0ssp$hvm$2@ger.gmane.org>

On 1/1/2013 7:01 PM, Oleg Broytman wrote:

>     What about warts that don't have internal sense? Mutable default
> parameters are just artifacts of the implementation. What is their
> "internal sense"?

This has been discussed (asked and answered) several times on python-list.

-- 
Terry Jan Reedy



From steve at pearwood.info  Wed Jan  2 10:23:26 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 02 Jan 2013 20:23:26 +1100
Subject: [Python-ideas] Documenting Python warts
In-Reply-To: <CAPTjJmoBs=ZRXz4OjzzLo+bsmWmfo_8gp2i8ET8VLVUSdA-P6g@mail.gmail.com>
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<CAPTjJmrP-DokxCtKBRnydy=xTQZHBX9iHY3nYjUJnJ1Ay3wEzQ@mail.gmail.com>
	<189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com>
	<20130102000113.GB672@iskra.aviel.ru>
	<CADiSq7dZ4fUTsg-FEp1_XFw7_Fg2kyWPFM=2A==-mq2smYGLXw@mail.gmail.com>
	<CAPTjJmoBs=ZRXz4OjzzLo+bsmWmfo_8gp2i8ET8VLVUSdA-P6g@mail.gmail.com>
Message-ID: <50E3FC8E.4080803@pearwood.info>

On 02/01/13 15:27, Chris Angelico wrote:
> On Wed, Jan 2, 2013 at 12:07 PM, Nick Coghlan<ncoghlan at gmail.com>  wrote:
>> FWIW, I prefer the term "traps for the unwary" over "warts", since
>> it's less judgmental and better covers the goal of issues for people
>> which can cause problems with learning the language.
>
> Sure. I prefer a shorter keyword-like name, but I think we're talking
> about the same thing here.

"Gotcha".


Actually I prefer to distinguish between gotchas and warts. A gotcha is
something that makes sense and even has a use, but can still surprise
those who aren't expecting it. (E.g. mutable defaults.) A wart is
something that has no use, but can't (easily, or at all) be removed.
Example:

t = (None, [], None)
t[1] += [0]

Even though the list is successfully modified, the operation still fails
with an exception.
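Spelled out, the surprising part is that the exception is raised *after*
the mutation has already happened:

```python
t = (None, [], None)
try:
    t[1] += [0]  # list.__iadd__ mutates the list in place, then the
                 # tuple item assignment fails
except TypeError as e:
    print("raised:", e)
print(t[1])  # the list inside the tuple was modified anyway: [0]
```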



-- 
Steven


From steve at pearwood.info  Wed Jan  2 10:27:51 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 02 Jan 2013 20:27:51 +1100
Subject: [Python-ideas] Documenting Python warts
In-Reply-To: <beb38889-bf97-4a5e-bae2-03e8df898dae@email.android.com>
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<20130101234916.GA672@iskra.aviel.ru>
	<beb38889-bf97-4a5e-bae2-03e8df898dae@email.android.com>
Message-ID: <50E3FD97.5020905@pearwood.info>

On 02/01/13 17:04, Mike Meyer wrote:
>
>
> Oleg Broytman<phd at phdru.name>  wrote:
>>    Well, some warts are quite well-known. My counter overflows when I
>> try to count how many times anonymous code blocks have been proposed
>> and rejected.
>>    IIRC Mr. van Rossum admitted that for/else was a design mistake.
>
> As I recall it, that wasn't because they were a bad idea per se, but
>because the  minor upside they provide isn't worth the confusion they
>create for newcomers.

There would be a lot less confusion if it weren't called "else". Even
now, I have to explicitly remind myself that the else block doesn't run
only when the for loop is empty; it runs *after* the for block, whenever
the loop finishes without a break.

# Python 4000 proposal:
for x in seq:
     ...
then:
     # this is skipped by a break
else:
     # this runs only if seq is empty



-- 
Steven


From steve at pearwood.info  Wed Jan  2 10:31:54 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 02 Jan 2013 20:31:54 +1100
Subject: [Python-ideas] Documenting Python warts
In-Reply-To: <20130102082901.2d6a4a63@pitrou.net>
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<CAPTjJmrP-DokxCtKBRnydy=xTQZHBX9iHY3nYjUJnJ1Ay3wEzQ@mail.gmail.com>
	<189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com>
	<20130102000113.GB672@iskra.aviel.ru>
	<CADiSq7dZ4fUTsg-FEp1_XFw7_Fg2kyWPFM=2A==-mq2smYGLXw@mail.gmail.com>
	<20130102031616.GC11279@iskra.aviel.ru>
	<CADiSq7e=dStDT61bXBetWP=Xu1uDPr_3AAaaE_FA5=70ksvGdQ@mail.gmail.com>
	<20130102082901.2d6a4a63@pitrou.net>
Message-ID: <50E3FE8A.4020203@pearwood.info>

On 02/01/13 18:29, Antoine Pitrou wrote:
> On Wed, 2 Jan 2013 13:25:35 +1000
> Nick Coghlan<ncoghlan at gmail.com>  wrote:
>> On Wed, Jan 2, 2013 at 1:16 PM, Oleg Broytman<phd at phdru.name>  wrote:
>>> On Wed, Jan 02, 2013 at 11:07:58AM +1000, Nick Coghlan<ncoghlan at gmail.com>  wrote:
>>>> Mutable default arguments make perfect sense once you
>>>> understand the difference between compile time, definition time and
>>>> execution time for a function. Defaults are evaluated at definition
>>>> time, thus they are necessarily shared across all invocations of the
>>>> function.
>>>
>>>     I.e., users have to understand the current implementation. Mutable
>>> defaults are not a language design choice, they are dictated by the
>>> implementation, right?
>>
>> No, they're not an implementation accident, they're part of the
>> language design. It's OK if you don't like them, but please stop
>> claiming they're a CPython implementation artifact.
>
> Let's call them a compromise then, but calling them a language feature
> sounds delusional. I can't remember ever taking advantage of the fact
> > that mutable default arguments are shared across function invocations.

I've never taken advantage of multiprocessing. Does that mean that it is
"delusional" to call multiprocessing a feature?

On the other hand, I have made use of early binding of function defaults,
and consider it a good feature of the language. Early binding is not just
for mutable defaults.
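A common non-mutable use of early binding is snapshotting a loop
variable (a sketch):

```python
# Each default is evaluated when its lambda is defined, so every
# callback remembers its own i; without the 'i=i' default they
# would all see the final value of i.
callbacks = [lambda i=i: i * i for i in range(3)]
print([f() for f in callbacks])  # [0, 1, 4]
```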




-- 
Steven


From rosuav at gmail.com  Wed Jan  2 10:37:51 2013
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 2 Jan 2013 20:37:51 +1100
Subject: [Python-ideas] Documenting Python warts
In-Reply-To: <50E3FD97.5020905@pearwood.info>
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<20130101234916.GA672@iskra.aviel.ru>
	<beb38889-bf97-4a5e-bae2-03e8df898dae@email.android.com>
	<50E3FD97.5020905@pearwood.info>
Message-ID: <CAPTjJmqW28K-v5_5kT_4W-Gnt28VKbZbyYSaJGXNraNzHBCniQ@mail.gmail.com>

On Wed, Jan 2, 2013 at 8:27 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> There would be a lot less confusion if they weren't called "else". Even
> now, I have to explicitly remind myself that the else block doesn't
> run if the for loop is empty, but *after* the for block.
>
> # Python 4000 proposal:
> for x in seq:
>     ...
> then:
>     # this is skipped by a break
> else:
>     # this runs only if seq is empty

Calling it "else" makes perfect sense if you're searching for something.

for x in lst:
  if x.is_what_we_want(): break
else:
  x=thing()
  lst.append(x)
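Fleshed out into something runnable (Thing here is a stand-in class, not
from the thread):

```python
class Thing:
    def __init__(self, wanted=False):
        self.wanted = wanted

    def is_what_we_want(self):
        return self.wanted

lst = [Thing(), Thing()]
for x in lst:
    if x.is_what_we_want():
        break
else:
    # no break happened: nothing matched, so create and append one
    x = Thing(wanted=True)
    lst.append(x)
print(len(lst))  # 3: the new Thing was appended
```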

ChrisA


From steve at pearwood.info  Wed Jan  2 10:49:32 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 02 Jan 2013 20:49:32 +1100
Subject: [Python-ideas] Documenting Python warts
In-Reply-To: <CAPTjJmqW28K-v5_5kT_4W-Gnt28VKbZbyYSaJGXNraNzHBCniQ@mail.gmail.com>
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<20130101234916.GA672@iskra.aviel.ru>
	<beb38889-bf97-4a5e-bae2-03e8df898dae@email.android.com>
	<50E3FD97.5020905@pearwood.info>
	<CAPTjJmqW28K-v5_5kT_4W-Gnt28VKbZbyYSaJGXNraNzHBCniQ@mail.gmail.com>
Message-ID: <50E402AC.1090006@pearwood.info>

On 02/01/13 20:37, Chris Angelico wrote:
> On Wed, Jan 2, 2013 at 8:27 PM, Steven D'Aprano<steve at pearwood.info>  wrote:
>> There would be a lot less confusion if they weren't called "else". Even
>> now, I have to explicitly remind myself that the else block doesn't
>> run if the for loop is empty, but *after* the for block.
>>
>> # Python 4000 proposal:
>> for x in seq:
>>      ...
>> then:
>>      # this is skipped by a break
>> else:
>>      # this runs only if seq is empty
>
> Calling it "else" makes perfect sense if you're searching for something.
>
> for x in lst:
>    if x.is_what_we_want(): break
> else:
>    x=thing()
>    lst.append(x)

Not really. The "else" doesn't match the "if", it matches the "for".
That's the real problem. Besides, your example is insufficiently
general: you can't assume that the "else" immediately follows the "if",
let alone the correct if.


for x in lst:
     if x.is_what_we_want():
         break
     do_something()
     and_another_thing()
     if today is Tuesday:
         print("we must be in Belgium")
else:
     x = thing()
     lst.append(x)


So at best it makes *imperfect* sense, sometimes.


-- 
Steven


From ben+python at benfinney.id.au  Wed Jan  2 10:58:07 2013
From: ben+python at benfinney.id.au (Ben Finney)
Date: Wed, 02 Jan 2013 20:58:07 +1100
Subject: [Python-ideas] Documenting Python warts
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<CAPTjJmrP-DokxCtKBRnydy=xTQZHBX9iHY3nYjUJnJ1Ay3wEzQ@mail.gmail.com>
	<189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com>
	<20130102000113.GB672@iskra.aviel.ru>
	<CADiSq7dZ4fUTsg-FEp1_XFw7_Fg2kyWPFM=2A==-mq2smYGLXw@mail.gmail.com>
Message-ID: <7w4nizucsw.fsf@benfinney.id.au>

Nick Coghlan <ncoghlan at gmail.com> writes:

> FWIW, I prefer the term "traps for the unwary" over "warts", since
> it's less judgmental and better covers the goal of issues for people
> which can cause problems with learning the language.

I limit my use of "wart" to traps for the unwary which are acknowledged
by most core developers to have been a sub-optimal design decision.

They are things one needs to know about Python, the language, which if
the designers had their druthers would not have been such a trap -- but
now we're stuck with them for backward compatibility or lack of a
feasible better design, etc.

In other words, I don't call it a "wart" unless the core developers
agree with me that it's a wart :-)

-- 
 \           "There is no reason anyone would want a computer in their |
  `\     home." --Ken Olson, president, chairman and founder of Digital |
_o__)                                            Equipment Corp., 1977 |
Ben Finney



From rosuav at gmail.com  Wed Jan  2 11:01:57 2013
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 2 Jan 2013 21:01:57 +1100
Subject: [Python-ideas] Documenting Python warts
In-Reply-To: <50E402AC.1090006@pearwood.info>
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<20130101234916.GA672@iskra.aviel.ru>
	<beb38889-bf97-4a5e-bae2-03e8df898dae@email.android.com>
	<50E3FD97.5020905@pearwood.info>
	<CAPTjJmqW28K-v5_5kT_4W-Gnt28VKbZbyYSaJGXNraNzHBCniQ@mail.gmail.com>
	<50E402AC.1090006@pearwood.info>
Message-ID: <CAPTjJmq4OkgLzWwRjs7h+TKEjV2Qp6ULnTqzyYpVt7m=zA5xJA@mail.gmail.com>

On Wed, Jan 2, 2013 at 8:49 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> On 02/01/13 20:37, Chris Angelico wrote:
>> Calling it "else" makes perfect sense if you're searching for something.
>>
>> for x in lst:
>>    if x.is_what_we_want(): break
>> else:
>>    x=thing()
>>    lst.append(x)
>
>
> Not really. The "else" doesn't match the "if", it matches the "for". That's
> the problem really. Besides, your example is insufficiently general. You
> can't
> assume that the "else" immediately follows the "if", let alone the correct
> if.
>
>
>
> for x in lst:
>     if x.is_what_we_want():
>         break
>     do_something()
>     and_another_thing()
>     if today is Tuesday:
>         print("we must be in Belgium")
> else:
>     x = thing()
>     lst.append(x)
>
>
> So at best it makes *imperfect* sense, sometimes.

Thinking functionally, the for loop is searching for an element in the
list. It'll either find something (and break) or exhaust the list (the
iterator raises StopIteration, which the loop absorbs before running
the else: clause). If it finds something, do stuff and break, else do
other stuff. The "else" of the logic corresponds to the "else:" clause.

Not saying it's always right, but it does at least make some sense in
that particular application, which is a reasonably common one. I've
coded exactly that logic in C++, using a goto to do a "break and skip
the else clause" (with a comment to the effect that I'd rather be
writing Python...).

ChrisA


From wuwei23 at gmail.com  Wed Jan  2 11:59:39 2013
From: wuwei23 at gmail.com (alex23)
Date: Wed, 2 Jan 2013 02:59:39 -0800 (PST)
Subject: [Python-ideas] Documenting Python warts
In-Reply-To: <20130102030851.GA11279@iskra.aviel.ru>
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<CAPTjJmrP-DokxCtKBRnydy=xTQZHBX9iHY3nYjUJnJ1Ay3wEzQ@mail.gmail.com>
	<189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com>
	<20130102000113.GB672@iskra.aviel.ru>
	<c6f76c2f-635e-4ae9-a263-b7428e68a1e7@pe9g2000pbc.googlegroups.com>
	<20130102030851.GA11279@iskra.aviel.ru>
Message-ID: <673c2190-c28b-4b16-be9b-fe62ff95b799@r4g2000pbi.googlegroups.com>

On Jan 2, 1:08 pm, Oleg Broytman <p... at phdru.name> wrote:
> So it's perfectly natural when people using one language expect
> features found in other languages, and expect those features to work in
> similar ways.

I think anyone coming from one language to another expecting the
latter to be just like the first is either an inexperienced or a bad
programmer.

There is no way you can make Python fit either the call by reference
or call by value models, although people regularly try, and the
attempt is always painful & torturous to watch. So already Python has
"deviated" drastically from the base expectations of most (generally
static-type lang'd) programmers. Is this a problem, or is this one of
the fundamental design decisions of Python that makes it appealing?
(For me, not having to deal with either of the call by reference or
value models is one of the main reasons I prefer to work with Python.)
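The usual demonstration of why neither label quite fits (a sketch):

```python
def mutate(lst):
    lst.append(1)   # mutating the object is visible to the caller

def rebind(lst):
    lst = [99]      # rebinding the local name changes nothing outside

data = []
mutate(data)
rebind(data)
print(data)  # [1]: the append looks like "by reference", but the
             # ignored rebinding shows it isn't quite that either
```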

The Lisp/Scheme community might take exception over claims that
addition is "always" an infix operation as well.

> Often
> people can tolerate the deviation, sometimes they even praise it for
> some reasons.

I don't really follow what you're trying to say here. I'm not
"tolerating" any "deviations" in Python, I'm actively using it because
I prefer its entire design. If anything, I'm choosing it _because_ it
deviates from other language's approaches. What you seem to be
advocating is that all languages be nothing more than syntactic sugar
for the same underlying model. In that case, what advantage is there
in having any language other than some baseline accepted one, like C?


From wuwei23 at gmail.com  Wed Jan  2 12:05:39 2013
From: wuwei23 at gmail.com (alex23)
Date: Wed, 2 Jan 2013 03:05:39 -0800 (PST)
Subject: [Python-ideas] Documenting Python warts
In-Reply-To: <50E3B6A9.70500@canterbury.ac.nz>
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<CAPTjJmrP-DokxCtKBRnydy=xTQZHBX9iHY3nYjUJnJ1Ay3wEzQ@mail.gmail.com>
	<189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com>
	<20130102000113.GB672@iskra.aviel.ru>
	<c6f76c2f-635e-4ae9-a263-b7428e68a1e7@pe9g2000pbc.googlegroups.com>
	<50E3B6A9.70500@canterbury.ac.nz>
Message-ID: <9864e14d-84ac-42a6-aaf6-ff263daa3426@po6g2000pbb.googlegroups.com>

On Jan 2, 2:25 pm, Greg Ewing <greg.ew... at canterbury.ac.nz> wrote:
> What the people who object to this behaviour are really
> complaining about is not that the default value is mutable,
> but that the default expression is not re-evaluated on
> every call.

Sorry, I should have said "mutable arguments" over "defaults", because
the problem also bites people passing mutable objects to functions and
expecting them to be copied.

> To me, the justification for this is clear: most of the
> time, evaluation on every call is not necessary, so doing
> it would be needlessly inefficient. For those cases where
> you need a fresh value each time, there is a straightforward
> way to get it.

Absolutely agreed. I have deliberately used this behaviour on a number
of occasions in ways that I believe makes my code clearer, so it
always frustrates me to hear it described as a "wart".
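The "straightforward way" Greg mentions is presumably the usual sentinel
idiom (the function name here is illustrative):

```python
def append_item(item, target=None):
    # None is the sentinel; the fresh list is created per call,
    # inside the body, rather than once at definition time.
    if target is None:
        target = []
    target.append(item)
    return target

print(append_item(1))  # [1]
print(append_item(2))  # [2], not [1, 2]
```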


From wuwei23 at gmail.com  Wed Jan  2 12:08:07 2013
From: wuwei23 at gmail.com (alex23)
Date: Wed, 2 Jan 2013 03:08:07 -0800 (PST)
Subject: [Python-ideas] Documenting Python warts
In-Reply-To: <20130102082901.2d6a4a63@pitrou.net>
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<CAPTjJmrP-DokxCtKBRnydy=xTQZHBX9iHY3nYjUJnJ1Ay3wEzQ@mail.gmail.com>
	<189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com>
	<20130102000113.GB672@iskra.aviel.ru>
	<CADiSq7dZ4fUTsg-FEp1_XFw7_Fg2kyWPFM=2A==-mq2smYGLXw@mail.gmail.com>
	<20130102031616.GC11279@iskra.aviel.ru>
	<CADiSq7e=dStDT61bXBetWP=Xu1uDPr_3AAaaE_FA5=70ksvGdQ@mail.gmail.com>
	<20130102082901.2d6a4a63@pitrou.net>
Message-ID: <48801f96-3f54-4821-837c-5156e32169f2@ui9g2000pbc.googlegroups.com>

On Jan 2, 5:29 pm, Antoine Pitrou <solip... at pitrou.net> wrote:
> Let's call them a compromise then, but calling them a language feature
> sounds delusional. I can't remember ever taking advantage of the fact
> that mutable default arguments are shared across function invocations.

I'd say it's slightly more delusional to believe that, if _you_ haven't
used a language feature, it's not a "feature".


From hernan.grecco at gmail.com  Wed Jan  2 12:20:50 2013
From: hernan.grecco at gmail.com (Hernan Grecco)
Date: Wed, 2 Jan 2013 12:20:50 +0100
Subject: [Python-ideas] Order in the documentation search results
In-Reply-To: <50E142FF.3070101@drees.name>
References: <CAL6gwWXikjrYG+f+sqnm3k2mtNXCasTD7Uj_ABY=JNLi4eBNhQ@mail.gmail.com>
	<50E083BA.7000603@nedbatchelder.com> <kbq5l9$g8o$1@ger.gmane.org>
	<50E142FF.3070101@drees.name>
Message-ID: <CAL6gwWUWXXqv=1zMtL0WpYkhBjurc0jb4uQi7ff6rNnX=1TFsQ@mail.gmail.com>

Hi,

Thanks for all the feedback. I was hacking the sphinx indexer and the
javacript searchtool today. I think the search results can be improved
by patching sphinx upstream and adding a small project dependent (in
this case Python) javascript snippet. I have created a proposal in the
Sphinx Issue tracker [0]. Let's move the discussion there.

best,

Hernan

[0] https://bitbucket.org/birkenfeld/sphinx/issue/1067/better-search-results



On Mon, Dec 31, 2012 at 8:47 AM, Stefan Drees <stefan at drees.name> wrote:
> On 30.12.12 20:45, Georg Brandl wrote:
>>
>> On 12/30/2012 07:11 PM, Ned Batchelder wrote:
>>>
>>> On 12/30/2012 12:54 PM, Hernan Grecco wrote:
>>>>
>>>> ...
>>>>
>>>> I have seen many people new to Python stumbling while using the Python
>>>> docs due to the order of the search results.
>>>> ...
>>>>
>>>> So my suggestion is to put the builtins first, the rest of the
>>>> standard lib later including HowTos, FAQ, etc and finally the
>>>> c-modules. Additionally, a section with a title matching exactly the
>>>> search query should come first. (I am not sure if the last suggestion
>>>> belongs in python-ideas or in
>>>> the sphinx mailing list, please advice)
>>>
>>>
>>> While we're on the topic, why in this day and age do we have a custom
>>> search?  Using google site search would be faster for the user, and more
>>> accurate.
>>
>>
>> I agree.  Someone needs to propose a patch though.
>> ...
>
>
> a custom search in itself is a wonderful thing. To me it also shows more
> appreciation of visitor concerns than those sites that are just _offering_
> google site search (which is accessible anyway to every visitor capable of
> memorizing the google or bing or whatnot URL).
>
> I second Hernan's suggestion about ordering and also his question about
> where the request (and patches) should be directed.
>
> All the best,
> Stefan.
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas


From phd at phdru.name  Wed Jan  2 12:35:45 2013
From: phd at phdru.name (Oleg Broytman)
Date: Wed, 2 Jan 2013 15:35:45 +0400
Subject: [Python-ideas] Documenting Python warts
In-Reply-To: <673c2190-c28b-4b16-be9b-fe62ff95b799@r4g2000pbi.googlegroups.com>
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<CAPTjJmrP-DokxCtKBRnydy=xTQZHBX9iHY3nYjUJnJ1Ay3wEzQ@mail.gmail.com>
	<189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com>
	<20130102000113.GB672@iskra.aviel.ru>
	<c6f76c2f-635e-4ae9-a263-b7428e68a1e7@pe9g2000pbc.googlegroups.com>
	<20130102030851.GA11279@iskra.aviel.ru>
	<673c2190-c28b-4b16-be9b-fe62ff95b799@r4g2000pbi.googlegroups.com>
Message-ID: <20130102113545.GA23780@iskra.aviel.ru>

On Wed, Jan 02, 2013 at 02:59:39AM -0800, alex23 <wuwei23 at gmail.com> wrote:
> What you seem to be
> advocating is that all languages be nothing more than syntactic sugar
> for the same underlying model.

   Yes, von Neumann architecture.

> In that case, what advantage is there
> in having any language other than some baseline accepted one, like C?

   So now we know why C is still the most popular language.

   Other languages have their advantages, though. Their syntactic sugar
is sweeter or has a different taste.

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            phd at phdru.name
           Programmers don't die, they just GOSUB without RETURN.


From ncoghlan at gmail.com  Wed Jan  2 12:41:46 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 2 Jan 2013 21:41:46 +1000
Subject: [Python-ideas] Updated PEP 432: Simplifying the CPython update
	sequence
Message-ID: <CADiSq7dmibf7TKo2KYRTReZB68jrHQnnNhiWTXJAZdCHBgZdqA@mail.gmail.com>

Gah, the PEP number in the subject should, of course, be 432 (not 342).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From shane at umbrellacode.com  Wed Jan  2 12:52:27 2013
From: shane at umbrellacode.com (Shane Green)
Date: Wed, 2 Jan 2013 03:52:27 -0800
Subject: [Python-ideas] Documenting Python warts
In-Reply-To: <48801f96-3f54-4821-837c-5156e32169f2@ui9g2000pbc.googlegroups.com>
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<CAPTjJmrP-DokxCtKBRnydy=xTQZHBX9iHY3nYjUJnJ1Ay3wEzQ@mail.gmail.com>
	<189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com>
	<20130102000113.GB672@iskra.aviel.ru>
	<CADiSq7dZ4fUTsg-FEp1_XFw7_Fg2kyWPFM=2A==-mq2smYGLXw@mail.gmail.com>
	<20130102031616.GC11279@iskra.aviel.ru>
	<CADiSq7e=dStDT61bXBetWP=Xu1uDPr_3AAaaE_FA5=70ksvGdQ@mail.gmail.com>
	<20130102082901.2d6a4a63@pitrou.net>
	<48801f96-3f54-4821-837c-5156e32169f2@ui9g2000pbc.googlegroups.com>
Message-ID: <03A4CAE9-4FE5-45A3-8EC9-A72BD4915985@umbrellacode.com>

RE: 

> I can't remember ever taking advantage of the fact
> that mutable default arguments are shared accross function invocations.

Can you remember taking advantage of the fact Python is logical, consistent, and elegant?  I tend to think its lack of syntactic sugar and exceptions sets it apart.  Although some things can bite you, there's a lot of value in having them be perfectly predictable, like having default argument values evaluated once, when the function declaration is evaluated.  To do it any other way would introduce an unnecessary "except when" into the explanation of Python.
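
A quick runnable sketch of that one-time evaluation (a minimal example of my own, not from the earlier messages):

```python
# Default values are evaluated once, when the "def" statement executes,
# so every call without an argument shares the same list object.
def append_to(item, bucket=[]):
    bucket.append(item)
    return bucket

print(append_to(1))   # -> [1]
print(append_to(2))   # -> [1, 2]  (the same list as before)
```

Predictable, as noted above: the shared list is the one stored on the function at definition time, visible as `append_to.__defaults__[0]`.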



Shane Green 
www.umbrellacode.com
805-452-9666 | shane at umbrellacode.com

On Jan 2, 2013, at 3:08 AM, alex23 <wuwei23 at gmail.com> wrote:

> On Jan 2, 5:29 pm, Antoine Pitrou <solip... at pitrou.net> wrote:
>> Let's call them a compromise then, but calling them a language feature
>> sounds delusional. I can't remember ever taking advantage of the fact
>> that mutable default arguments are shared accross function invocations.
> 
> I'd say it's slightly more delusional to believe that if _you_ haven't
> used a language feature, that it's not a "feature".
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas


From maxmoroz at gmail.com  Wed Jan  2 13:06:14 2013
From: maxmoroz at gmail.com (Max Moroz)
Date: Wed, 2 Jan 2013 04:06:14 -0800
Subject: [Python-ideas] Preventing out of memory conditions
In-Reply-To: <CAGE7PNLC_a2LOsheqO+EE-G+fMNfiT0F9EzN6ZxGVWKDReM8Jw@mail.gmail.com>
References: <CAOVPiMiNBwe_v95apXFVKwj5r5ipmz-ttGriVLV5YE-AXQAT0Q@mail.gmail.com>
	<CAL9jXCGOFArkn2E7Zx5sgsrqKAJXxMogU+t_gOdR5_G7fJKiSw@mail.gmail.com>
	<CAGE7PNLC_a2LOsheqO+EE-G+fMNfiT0F9EzN6ZxGVWKDReM8Jw@mail.gmail.com>
Message-ID: <CAOVPiMhxPWqQFmQpV-c=szYFHtpPbUCh_RZHTYYUYTqUh54oEw@mail.gmail.com>

On Mon, Dec 31, 2012 at 7:22 PM, Gregory P. Smith <greg at krypto.org> wrote:

> Within CPython the way the C API is today it is too late by the time the
> code to raise a MemoryError has been called so capturing all places that
> could occur is not easy.
> Implementing this at the C level malloc later makes
> more sense. Have it dip into a reserved low memory pool to satisfy the
> current request and send the process a signal indicating it is running low.
> This approach would also work with C extension modules or an embedded
> Python.

Regarding the C malloc solution, wouldn't a callback be preferable to
a signal? If I understood you correctly, a signal implies that a
different thread will handle it. Whatever reasonable size the
emergency memory pool is, there will be situations when the next
memory allocation is greater than that size, leading to the very same
problem you described later in your message when you talked about the
disadvantage of polling. In addition, if the signal processing happens
a bit slowly (perhaps simply because the thread scheduler is slow to
switch), by the time enough memory is released, it may be too late -
the next memory allocation may have already come in. Unless I'm
missing something, the (synchronous) callback seems strictly
better than the (asynchronous) signal.

As to your main point that this functionality should be inside C
malloc rather than pymalloc, I agree, but only if the objective is to
provide an all-purpose, highly general "low memory condition"
handler. (I'm not sure malloc knows enough about the OS to define
"low memory condition" well; it's certain that pymalloc doesn't.)

But I was going for a more modest goal. Rather than be warned of a
pending MemoryError exception, a developer could simply be
notified via callback when the maximum absolute memory used by his app
exceeds a certain limit. pymalloc could very easily call back a
designated function when the next memory allocation exceeds this
threshold.

In many real-life situations, it's not that hard to estimate how much
RAM the application should be allowed to consume. Sure, the developer
would need to learn a little about the platforms his app runs on,
and use OS-specific rules to set the memory limit, but that effort
is modest, and the payoff is huge. Not to mention, a developer with
particularly tech-savvy end users could even skip this work
entirely by letting the end users set the memory limit per-session.

There is a huge advantage to the pymalloc solution (with a set
memory limit) vs. the C malloc solution (with the generic low memory
condition). On my system, I don't want the application to use (almost)
all the available memory before it starts to manage its cache. In
fact, by the time physical memory use approaches my total physical
RAM, the system slows down considerably as many other applications get
swapped to disk by the OS. With a set memory limit, I can exercise
much more granular control over the memory used by the application.

Of course, the set memory limit could also be implemented inside C
malloc rather than inside pymalloc. But that would require developers
to rewrite the C runtime's memory manager on every platform, and then
recompile their Python with it. The changes to pymalloc, on the other
hand, would be relatively small.
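
To be concrete, here is a toy sketch of the kind of hook I mean. MemoryWatcher and note_allocation are hypothetical names; in reality the accounting would live inside pymalloc's C allocator, not in Python code:

```python
class MemoryWatcher:
    """Toy model of the proposed pymalloc hook: fire a callback
    once total allocated bytes cross a developer-set limit."""

    def __init__(self, limit_bytes, callback):
        self.limit = limit_bytes
        self.callback = callback
        self.used = 0
        self.fired = False

    def note_allocation(self, nbytes):
        # pymalloc would invoke this on each allocation (synchronously,
        # in the allocating thread - no signal delivery delay).
        self.used += nbytes
        if self.used > self.limit and not self.fired:
            self.fired = True              # fire once per crossing
            self.callback(self.used)       # app frees caches here

freed = []
watcher = MemoryWatcher(1024, lambda used: freed.append(used))
for _ in range(3):
    watcher.note_allocation(512)
print(freed)   # -> [1536]: fired on the allocation that crossed the limit
```

Because the callback runs synchronously inside the allocation path, the application gets a chance to release caches before the allocation that would have pushed it past the limit completes - the property argued for above.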

> I'd expect this already exists but I haven't looked for one.

All I found is this comment in the XEmacs documentation about vm-limit.c:
http://www.xemacs.org/Documentation/21.5/html/internals_17.html, but
I'm not sure if it's an XEmacs feature or if malloc itself supports it.

> Having a thread polling memory use it not generally wise as that is polling
> rather than event driven and could easily miss low memory situations before
> it is too late and a failure has already happened (allocation demand can
> come in large spikes depending on the application).

Precisely. That's the problem with the best existing solutions (e.g.,
http://stackoverflow.com/a/7332782/336527).

> OSes running processes in constrained environments or ones where the
> resources available can be reduced by the OS later may already send their
> own warning signals prior to outright killing the process but that should
> not preclude an application being able to monitor and constrain itself on
> its own without needing the OS to do it.

I was thinking about regular desktop OSes, which certainly don't warn
the process sufficiently in advance. The MemoryError exception
basically tells the process that it's going to die soon, and there's
nothing it can do about it.

Max


From stefan at drees.name  Wed Jan  2 13:37:35 2013
From: stefan at drees.name (Stefan Drees)
Date: Wed, 02 Jan 2013 13:37:35 +0100
Subject: [Python-ideas] Order in the documentation search results
In-Reply-To: <CAL6gwWUWXXqv=1zMtL0WpYkhBjurc0jb4uQi7ff6rNnX=1TFsQ@mail.gmail.com>
References: <CAL6gwWXikjrYG+f+sqnm3k2mtNXCasTD7Uj_ABY=JNLi4eBNhQ@mail.gmail.com>
	<50E083BA.7000603@nedbatchelder.com>
	<kbq5l9$g8o$1@ger.gmane.org> <50E142FF.3070101@drees.name>
	<CAL6gwWUWXXqv=1zMtL0WpYkhBjurc0jb4uQi7ff6rNnX=1TFsQ@mail.gmail.com>
Message-ID: <50E42A0F.2040908@drees.name>

Hi hernan,
On 02.01.13 12:20, Hernan Grecco wrote:
> ... Thanks for all the feedback. I was hacking the sphinx indexer and the
> javascript searchtool today. I think the search results can be improved
> by patching sphinx upstream and adding a small project dependent (in
> this case Python) javascript snippet. I have created a proposal in the
> Sphinx Issue tracker [0]. Let's move the discussion there.
> ...
> [0] https://bitbucket.org/birkenfeld/sphinx/issue/1067/better-search-results

thanks a lot for turning the mail thread about improving the local
search facility into real code suggestions.

I commented on a first snippet from your suggested patch there.

All the best,
Stefan.

Further historic details:
>
> On Mon, Dec 31, 2012 at 8:47 AM, Stefan Drees <stefan at drees.name> wrote:
>> On 30.12.12 20:45, Georg Brandl wrote:
>>> On 12/30/2012 07:11 PM, Ned Batchelder wrote:
>>>> On 12/30/2012 12:54 PM, Hernan Grecco wrote:
>>>>> ...
>>>>>
>>>>> I have seen many people new to Python stumbling while using the Python
>>>>> docs due to the order of the search results.
>>>>> ...
>>>>>
>>>>> So my suggestion is to put the builtins first, the rest of the
>>>>> standard lib later including HowTos, FAQ, etc and finally the
>>>>> c-modules. Additionally, a section with a title matching exactly the
>>>>> search query should come first. (I am not sure if the last suggestion
>>>>> belongs in python-ideas or in
>>>>> the sphinx mailing list, please advice)
>>>>
>>>>
>>>> While we're on the topic, why in this day and age do we have a custom
>>>> search?  Using google site search would be faster for the user, and more
>>>> accurate.
>>>
>>>
>>> I agree.  Someone needs to propose a patch though.
>>> ...
>>
>>
>> a custom search in itself is a wonderful thing. To me it also shows more
>> appreciation of visitor concerns than thoses sites, that are just _offering_
>> google site search (which is accessible anyway to every visitor capable of
>> memorizing the google or bing or whatnot URL).
>>
>> I second Hernans suggestion about ordering and also his question where the
>> request (and patches) should be directed to.
>> ...



From phd at phdru.name  Wed Jan  2 14:39:21 2013
From: phd at phdru.name (Oleg Broytman)
Date: Wed, 2 Jan 2013 17:39:21 +0400
Subject: [Python-ideas] Documenting Python warts
In-Reply-To: <20130102072928.52867fd3@bhuda.mired.org>
References: <20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<CAPTjJmrP-DokxCtKBRnydy=xTQZHBX9iHY3nYjUJnJ1Ay3wEzQ@mail.gmail.com>
	<189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com>
	<20130102000113.GB672@iskra.aviel.ru>
	<c6f76c2f-635e-4ae9-a263-b7428e68a1e7@pe9g2000pbc.googlegroups.com>
	<20130102030851.GA11279@iskra.aviel.ru>
	<673c2190-c28b-4b16-be9b-fe62ff95b799@r4g2000pbi.googlegroups.com>
	<20130102113545.GA23780@iskra.aviel.ru>
	<20130102072928.52867fd3@bhuda.mired.org>
Message-ID: <20130102133921.GA25253@iskra.aviel.ru>

On Wed, Jan 02, 2013 at 07:29:28AM -0600, Mike Meyer <mwm at mired.org> wrote:
> On Wed, 2 Jan 2013 15:35:45 +0400
> Oleg Broytman <phd at phdru.name> wrote:
> > On Wed, Jan 02, 2013 at 02:59:39AM -0800, alex23 <wuwei23 at gmail.com> wrote:
> > > What you seem to be
> > > advocating is that all languages be nothing more than syntactic sugar
> > > for the same underlying model.
> >    Yes, von Neumann architecture.
> 
> So all the differences between FORTRAN II, PROLOG and Python are
> syntactic sugar? I guess that makes preference in programming
> languages just a matter of taste.

   In the original message I used the word "imperative".

   I am crawling off of the discussion to my cave.

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            phd at phdru.name
           Programmers don't die, they just GOSUB without RETURN.


From solipsis at pitrou.net  Wed Jan  2 14:47:16 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 2 Jan 2013 14:47:16 +0100
Subject: [Python-ideas] Documenting Python warts
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<CAPTjJmrP-DokxCtKBRnydy=xTQZHBX9iHY3nYjUJnJ1Ay3wEzQ@mail.gmail.com>
	<189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com>
	<20130102000113.GB672@iskra.aviel.ru>
	<CADiSq7dZ4fUTsg-FEp1_XFw7_Fg2kyWPFM=2A==-mq2smYGLXw@mail.gmail.com>
	<20130102031616.GC11279@iskra.aviel.ru>
	<CADiSq7e=dStDT61bXBetWP=Xu1uDPr_3AAaaE_FA5=70ksvGdQ@mail.gmail.com>
	<20130102082901.2d6a4a63@pitrou.net>
	<50E3FE8A.4020203@pearwood.info>
Message-ID: <20130102144716.0fffa7eb@pitrou.net>

On Wed, 02 Jan 2013 20:31:54 +1100,
Steven D'Aprano <steve at pearwood.info>
wrote:
> >>>
> >>>     I.e., users have to understand the current implementation.
> >>> Mutable defaults are not a language design choice, they are
> >>> dictated by the implementation, right?
> >>
> >> No, they're not an implementation accident, they're part of the
> >> language design. It's OK if you don't like them, but please stop
> >> claiming they're a CPython implementation artifact.
> >
> > Let's call them a compromise then, but calling them a language
> > feature sounds delusional. I can't remember ever taking advantage
> > of the fact that mutable default arguments are shared accross
> > function invocations.
> 
> I've never taken advantage of multiprocessing. Does that mean that it
> is "delusional" to call multiprocessing a feature?

multiprocessing fills a definite use case (and quite an important one).
Early binding of function arguments fills no use case that cannot also
be filled using a private global, a closure, or a class or function
attribute; at best it only saves one or two lines of typing.

Regards

Antoine.




From storchaka at gmail.com  Wed Jan  2 15:01:29 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Wed, 02 Jan 2013 16:01:29 +0200
Subject: [Python-ideas] Identity dicts and sets
Message-ID: <kc1eld$buk$1@ger.gmane.org>

I propose to add new standard collection types: IdentityDict and 
IdentitySet. They are almost the same as the ordinary dict and set, but 
use an identity check instead of an equality check (and id() or hash(id()) 
as the hash). They will be useful for pickling, for implementing 
__sizeof__() for compound types, and for other graph algorithms.

Of course, they can be implemented using ordinary dicts:

     IdentityDict: key -> value as a dict: id(key) -> (key, value)
     IdentitySet as a dict: id(value) -> value

However, implementing them directly in the core has advantages: it 
consumes less memory and time, and is more convenient to use from C. 
The IdentityDict and IdentitySet implementations would share almost all 
code with the implementations of the ordinary dict and set; only the 
lookup function and metainformation would be different, and dict and set 
already support lookup function overloading.
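
For illustration, the pure-dict fallback described above could be sketched like this (a toy IdentityDict of my own, not the proposed C implementation):

```python
class IdentityDict:
    """Maps keys by identity: stores id(key) -> (key, value).
    The key is kept in the value tuple so it stays alive and
    its id cannot be reused by another object."""

    def __init__(self):
        self._data = {}

    def __setitem__(self, key, value):
        self._data[id(key)] = (key, value)

    def __getitem__(self, key):
        return self._data[id(key)][1]

    def __contains__(self, key):
        return id(key) in self._data

    def __len__(self):
        return len(self._data)

a, b = [1, 2], [1, 2]        # equal (a == b) but distinct objects
d = IdentityDict()
d["first"] = None            # any object can be a key, hashable or not
d = IdentityDict()
d[a] = "first"
print(b in d)                # -> False: identity matters, equality does not
```

This shows why a core implementation is attractive: the fallback pays for an extra tuple per entry and an id() call per lookup, costs a shared C lookup function would avoid.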



From fuzzyman at gmail.com  Wed Jan  2 14:58:16 2013
From: fuzzyman at gmail.com (Michael Foord)
Date: Wed, 2 Jan 2013 13:58:16 +0000
Subject: [Python-ideas] Documenting Python warts
In-Reply-To: <CAPTjJmqVbNVoRQtXeTRfJex2=vmjJC2W5VOOmkq+A3czepejLQ@mail.gmail.com>
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<CAPTjJmrP-DokxCtKBRnydy=xTQZHBX9iHY3nYjUJnJ1Ay3wEzQ@mail.gmail.com>
	<189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com>
	<20130102000113.GB672@iskra.aviel.ru>
	<CADiSq7dZ4fUTsg-FEp1_XFw7_Fg2kyWPFM=2A==-mq2smYGLXw@mail.gmail.com>
	<20130102031616.GC11279@iskra.aviel.ru>
	<CADiSq7e=dStDT61bXBetWP=Xu1uDPr_3AAaaE_FA5=70ksvGdQ@mail.gmail.com>
	<20130102082901.2d6a4a63@pitrou.net>
	<CAPTjJmqVbNVoRQtXeTRfJex2=vmjJC2W5VOOmkq+A3czepejLQ@mail.gmail.com>
Message-ID: <CAKCKLWzsJKqUQ2XyO7NAHG2ygrp4DY3rmSu_=yS4aD+2BJktbQ@mail.gmail.com>

On 2 January 2013 08:12, Chris Angelico <rosuav at gmail.com> wrote:

> On Wed, Jan 2, 2013 at 6:29 PM, Antoine Pitrou <solipsis at pitrou.net>
> wrote:
> > Let's call them a compromise then, but calling them a language feature
> > sounds delusional. I can't remember ever taking advantage of the fact
> > that mutable default arguments are shared accross function invocations.
>
> One common use is caching, as I mentioned earlier (with a contrived
> example). Another huge benefit is efficiency - construct a heavy
> object once and keep using it. There are others.
>
> It's a feature that can bite people, but no less a feature for that.
>


A further (and important) use case is introspection. If default values were
only evaluated at call time (rather than definition time) then you couldn't
introspect the default values - so documentation tools (and other tools)
couldn't access them.

Added to which, "evaluation at call time" has its own unexpected and weird
behaviour. Consider:

x = 3
def fun(a=x):
  pass
del x

With evaluation at call time, calling fun() after the del fails with a
NameError - and indeed any *re-binding* of x in the definition scope (at
any subsequent time - possibly far removed from the function definition)
affects the function.

So default values being bound at definition time have advantages for
efficiency and introspection, support use cases such as caching, and
avoid some unexpected behaviour. It's definitely a language feature.
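
To make the introspection point concrete, the (real) __defaults__ attribute and the inspect module already expose definition-time defaults:

```python
import inspect

def greet(name, greeting="Hello"):
    return "%s, %s!" % (greeting, name)

# Because defaults are bound at definition time, tools can read them back:
print(greet.__defaults__)                    # -> ('Hello',)
print(inspect.getfullargspec(greet).defaults)  # -> ('Hello',)
```

With call-time evaluation there would be nothing to read here until the function was actually called, and documentation generators could not show the defaults.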

All the best,

Michael



>
> ChrisA



-- 

http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html

From stephen at xemacs.org  Wed Jan  2 15:59:29 2013
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 02 Jan 2013 23:59:29 +0900
Subject: [Python-ideas] Preventing out of memory conditions
In-Reply-To: <CAOVPiMhxPWqQFmQpV-c=szYFHtpPbUCh_RZHTYYUYTqUh54oEw@mail.gmail.com>
References: <CAOVPiMiNBwe_v95apXFVKwj5r5ipmz-ttGriVLV5YE-AXQAT0Q@mail.gmail.com>
	<CAL9jXCGOFArkn2E7Zx5sgsrqKAJXxMogU+t_gOdR5_G7fJKiSw@mail.gmail.com>
	<CAGE7PNLC_a2LOsheqO+EE-G+fMNfiT0F9EzN6ZxGVWKDReM8Jw@mail.gmail.com>
	<CAOVPiMhxPWqQFmQpV-c=szYFHtpPbUCh_RZHTYYUYTqUh54oEw@mail.gmail.com>
Message-ID: <87a9srd41a.fsf@uwakimon.sk.tsukuba.ac.jp>

Max Moroz writes:

 > All I found is this comment in XEmacs documentation about vm-limit.c:
 > http://www.xemacs.org/Documentation/21.5/html/internals_17.html, but
 > I'm not sure if it's XEmacs feature or if malloc itself supports
 > it.

It's an XEmacs feature.  Works for me (but then it would, wouldn't it ;-).

The implementation is just generic C, except for the macros that are
used to access the LISP arena's bounds.  It uses standard functions
like getrlimit where available, otherwise it just uses the end of the
address space to determine the available amount of memory.  I can't
vouch for accuracy or efficiency in determining usage (which is why I
didn't bring it up myself), but there hasn't been a complaint about
its functionality since I've been consistently reading the lists
(1997).

https://bitbucket.org/xemacs/xemacs-beta/src/c65b0329894b09c08423739508d277548a0b1a00/src/vm-limit.c?at=default
https://bitbucket.org/xemacs/xemacs-beta/src/c65b0329894b09c08423739508d277548a0b1a00/src/mem-limits.h?at=default



From guido at python.org  Wed Jan  2 17:16:52 2013
From: guido at python.org (Guido van Rossum)
Date: Wed, 2 Jan 2013 09:16:52 -0700
Subject: [Python-ideas] Please stop discussing warts here
Message-ID: <CAP7+vJJYJZZhQbaOdnDVdNzFr=U6-JejEyoxyA5e31kq9gs_oA@mail.gmail.com>

I have just had to mute two threads where people were trying to convince
each other that a certain language feature is/isn't a wart. This form of
educational debate belongs in python-list, not here, please.

--Guido van Rossum (sent from Android phone)

From solipsis at pitrou.net  Wed Jan  2 20:34:31 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 2 Jan 2013 20:34:31 +0100
Subject: [Python-ideas] Identity dicts and sets
References: <kc1eld$buk$1@ger.gmane.org>
Message-ID: <20130102203431.7b575019@pitrou.net>

On Wed, 02 Jan 2013 16:01:29 +0200
Serhiy Storchaka <storchaka at gmail.com>
wrote:
> I propose to add new standard collection types: IdentityDict and 
> IdentitySet. They are almost same as ordinal dict and set, but uses 
> identity check instead of equality check (and id() or hash(id()) as a 
> hash). They will be useful for pickling, for implementing __sizeof__() 
> for compound types, and for other graph algorithms.
> 
> Of course, they can be implemented using ordinal dicts:
> 
>      IdentityDict: key -> value as a dict: id(key) -> (key, value)
>      IdentitySet as a dict: id(value) -> value
> 
> However implementing them directly in the core has advantages, it 
> consumes less memory and time, and more comfortable for use from C. 
> IdentityDict and IdentitySet implementations will share almost all code 
> with implementations of ordinal dict and set, only lookup function and 
> metainformation will be different. However dict and set already use a 
> lookup function overloading.

I'm ok with this proposal.

Regards

Antoine.




From eliben at gmail.com  Wed Jan  2 20:43:47 2013
From: eliben at gmail.com (Eli Bendersky)
Date: Wed, 2 Jan 2013 11:43:47 -0800
Subject: [Python-ideas] Identity dicts and sets
In-Reply-To: <kc1eld$buk$1@ger.gmane.org>
References: <kc1eld$buk$1@ger.gmane.org>
Message-ID: <CAF-Rda9NPdCZmOfmt33XnConhVy50xAof2CS=Gahmxd0AqUunQ@mail.gmail.com>

On Wed, Jan 2, 2013 at 6:01 AM, Serhiy Storchaka <storchaka at gmail.com>wrote:

> I propose to add new standard collection types: IdentityDict and
> IdentitySet. They are almost same as ordinal dict and set, but uses
> identity check instead of equality check (and id() or hash(id()) as a
> hash). They will be useful for pickling, for implementing __sizeof__() for
> compound types, and for other graph algorithms.
>
> Of course, they can be implemented using ordinal dicts:
>
>     IdentityDict: key -> value as a dict: id(key) -> (key, value)
>     IdentitySet as a dict: id(value) -> value
>
> However implementing them directly in the core has advantages, it consumes
> less memory and time, and more comfortable for use from C. IdentityDict and
> IdentitySet implementations will share almost all code with implementations
> of ordinal dict and set, only lookup function and metainformation will be
> different. However dict and set already use a lookup function overloading.
>
>
I agree that the data structures may be useful, but is there no way to
somehow allow the customization of existing data structures instead, without
losing performance? It's a shame to have another kind of dict just for this
purpose.

Eli

From solipsis at pitrou.net  Wed Jan  2 21:03:48 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 2 Jan 2013 21:03:48 +0100
Subject: [Python-ideas] Identity dicts and sets
References: <kc1eld$buk$1@ger.gmane.org>
	<CAF-Rda9NPdCZmOfmt33XnConhVy50xAof2CS=Gahmxd0AqUunQ@mail.gmail.com>
Message-ID: <20130102210348.2ae0a985@pitrou.net>

On Wed, 2 Jan 2013 11:43:47 -0800
Eli Bendersky <eliben at gmail.com> wrote:
> On Wed, Jan 2, 2013 at 6:01 AM, Serhiy Storchaka <storchaka at gmail.com>wrote:
> 
> > I propose to add new standard collection types: IdentityDict and
> > IdentitySet. They are almost same as ordinal dict and set, but uses
> > identity check instead of equality check (and id() or hash(id()) as a
> > hash). They will be useful for pickling, for implementing __sizeof__() for
> > compound types, and for other graph algorithms.
> >
> > Of course, they can be implemented using ordinal dicts:
> >
> >     IdentityDict: key -> value as a dict: id(key) -> (key, value)
> >     IdentitySet as a dict: id(value) -> value
> >
> > However implementing them directly in the core has advantages, it consumes
> > less memory and time, and more comfortable for use from C. IdentityDict and
> > IdentitySet implementations will share almost all code with implementations
> > of ordinal dict and set, only lookup function and metainformation will be
> > different. However dict and set already use a lookup function overloading.
> >
> >
> I agree that the data structures may be useful, but is there no way to some
> allow the customization of existing data structures instead, without losing
> performance? It's a shame to have another kind of dict just for this
> purpose.

The implementation kind of already exists in _pickle.c, IIRC (it's used
for the memo dict).

Regards

Antoine.




From storchaka at gmail.com  Wed Jan  2 21:34:18 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Wed, 2 Jan 2013 22:34:18 +0200
Subject: [Python-ideas] Identity dicts and sets
Message-ID: <201301022234.18839.storchaka@gmail.com>

On Wednesday, 02 January 2013 at 21:43:47, Eli Bendersky wrote:
> I agree that the data structures may be useful, but is there no way to some
> allow the customization of existing data structures instead, without losing
> performance? It's a shame to have another kind of dict just for this
> purpose.

What interface for such customization would be possible? Obviously, a dict 
constructor can't have a special keyword argument.


From mwm at mired.org  Wed Jan  2 21:37:12 2013
From: mwm at mired.org (Mike Meyer)
Date: Wed, 2 Jan 2013 14:37:12 -0600
Subject: [Python-ideas] Identity dicts and sets
In-Reply-To: <201301022234.18839.storchaka@gmail.com>
References: <201301022234.18839.storchaka@gmail.com>
Message-ID: <CAD=7U2D7gsmEXCyYC-owHfnY34_GcPMUU4f_rOJcn-SBSxvarw@mail.gmail.com>

On Wed, Jan 2, 2013 at 2:34 PM, Serhiy Storchaka <storchaka at gmail.com> wrote:
> On Wednesday, 02 January 2013 at 21:43:47, Eli Bendersky wrote:
>> I agree that the data structures may be useful, but is there no way to some
>> allow the customization of existing data structures instead, without losing
>> performance? It's a shame to have another kind of dict just for this
>> purpose.
> What interface for the customization is possible? Obviously, a dict
> constructor can't have a special keyword argument.

How about a set_key method? It takes a single callable as an argument.
You'd get your behavior with dict.set_key(id). If called when the dict
is non-empty, it should throw an exception.

   <mike


From haoyi.sg at gmail.com  Wed Jan  2 21:45:17 2013
From: haoyi.sg at gmail.com (Haoyi Li)
Date: Thu, 3 Jan 2013 04:45:17 +0800
Subject: [Python-ideas] Identity dicts and sets
In-Reply-To: <CAD=7U2D7gsmEXCyYC-owHfnY34_GcPMUU4f_rOJcn-SBSxvarw@mail.gmail.com>
References: <201301022234.18839.storchaka@gmail.com>
	<CAD=7U2D7gsmEXCyYC-owHfnY34_GcPMUU4f_rOJcn-SBSxvarw@mail.gmail.com>
Message-ID: <CALruUQJsOJNuPKOECiMHU4xz+Lz6Yz+tUofMW06iPminWVOVUw@mail.gmail.com>

Something curried?

custom_dict(cfg=...)(key1=..., key2=...)


On Thu, Jan 3, 2013 at 4:37 AM, Mike Meyer <mwm at mired.org> wrote:

> On Wed, Jan 2, 2013 at 2:34 PM, Serhiy Storchaka <storchaka at gmail.com>
> wrote:
> > On Wednesday, 02 January 2013 at 21:43:47, Eli Bendersky wrote:
> >> I agree that the data structures may be useful, but is there no way to
> some
> >> allow the customization of existing data structures instead, without
> losing
> >> performance? It's a shame to have another kind of dict just for this
> >> purpose.
> > What interface for the customization is possible? Obviously, a dict
> > constructor can't have a special keyword argument.
>
> How about a set_key method? It takes a single callable as an argument.
> You'd get your behavior with dict.set_key(id). If called when the dict
> is non-empty, it should throw an exception.
>
>    <mike

From zuo at chopin.edu.pl  Wed Jan  2 22:04:52 2013
From: zuo at chopin.edu.pl (Jan Kaliszewski)
Date: Wed, 02 Jan 2013 22:04:52 +0100
Subject: [Python-ideas] Re: Identity dicts and sets
Message-ID: <20130102210457.041ED2F587@filifionka.chopin.edu.pl>

E.g.:
    custom_dict(set_key=..., missing=...)
    -> a new dict subclass


From masklinn at masklinn.net  Wed Jan  2 22:13:57 2013
From: masklinn at masklinn.net (Masklinn)
Date: Wed, 2 Jan 2013 22:13:57 +0100
Subject: [Python-ideas] Identity dicts and sets
In-Reply-To: <CAD=7U2D7gsmEXCyYC-owHfnY34_GcPMUU4f_rOJcn-SBSxvarw@mail.gmail.com>
References: <201301022234.18839.storchaka@gmail.com>
	<CAD=7U2D7gsmEXCyYC-owHfnY34_GcPMUU4f_rOJcn-SBSxvarw@mail.gmail.com>
Message-ID: <9A91AF30-F0B3-441E-996C-F502291C1F35@masklinn.net>

On 2013-01-02, at 21:37 , Mike Meyer wrote:
> On Wed, Jan 2, 2013 at 2:34 PM, Serhiy Storchaka <storchaka at gmail.com> wrote:
>> On Wednesday, 02 January 2013 at 21:43:47, Eli Bendersky wrote:
>>> I agree that the data structures may be useful, but is there no way to some
>>> allow the customization of existing data structures instead, without losing
>>> performance? It's a shame to have another kind of dict just for this
>>> purpose.
>> What interface for the customization is possible? Obviously, a dict
>> constructor can't have a special keyword argument.
> 
> How about a set_key method? It takes a single callable as an argument.
> You'd get your behavior with dict.set_key(id). If called when the dict
> is non-empty, it should throw an exception.

Wouldn't it make more sense to provide e.g.
collections.KeyedDictionary(key, seq, **kwargs)? It would be clear
and would allow implementations to provide dedicated code paths for
special cases (such as key=id) if desired or necessary.

defaultdict already follows this pattern, so there's a precedent.
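
A rough pure-Python approximation of such a key-transforming dict (KeyedDict is my name for this sketch; a real collections class would do this in C, and would also override get(), update(), etc.):

```python
class KeyedDict(dict):
    """dict subclass that passes every key through a key function,
    in the spirit of the collections.KeyedDictionary idea above."""

    def __init__(self, key, items=()):
        super().__init__()
        self._key = key
        for k, v in items:
            self[k] = v

    def __setitem__(self, k, v):
        super().__setitem__(self._key(k), v)

    def __getitem__(self, k):
        return super().__getitem__(self._key(k))

    def __contains__(self, k):
        return super().__contains__(self._key(k))

d = KeyedDict(key=str.lower)
d["Spam"] = 1
print(d["SPAM"])   # -> 1: case-insensitive lookup via key=str.lower
```

Note that key=id would need extra care in this sketch: the original key objects are not kept alive, and id values can be reused once an object is collected, which is exactly the detail a dedicated identity implementation would handle.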

From bruce at leapyear.org  Wed Jan  2 22:33:30 2013
From: bruce at leapyear.org (Bruce Leban)
Date: Wed, 2 Jan 2013 13:33:30 -0800
Subject: [Python-ideas] Identity dicts and sets
In-Reply-To: <9A91AF30-F0B3-441E-996C-F502291C1F35@masklinn.net>
References: <201301022234.18839.storchaka@gmail.com>
	<CAD=7U2D7gsmEXCyYC-owHfnY34_GcPMUU4f_rOJcn-SBSxvarw@mail.gmail.com>
	<9A91AF30-F0B3-441E-996C-F502291C1F35@masklinn.net>
Message-ID: <CAGu0AnvaWtCwcyf5MeoMn6G_jvJnoSTKZzR34XsyKkfsc6=Tow@mail.gmail.com>

On Wed, Jan 2, 2013 at 1:13 PM, Masklinn <masklinn at masklinn.net> wrote:

>
> Wouldn't it make more sense to provide e.g.
> collections.KeyedDictionary(key, seq, **kwargs)? It would be clear
> and would allow implementations to provide dedicated implementations for
> special cases (such as key=id) if desired or necessary.
>
> defaultdict already follows this pattern, so there's a precedent.


I agree collections is the place to put it, but that would give us three
specialized subclasses of dictionary which cannot be combined. That is, I
can have a dictionary with a default, one that is ordered, or one that uses
a key function, but not any combination of those. It would seem better to
have something like Haoyi Li suggested:

collections.Dictionary(default=None, ordered=False, key=None) --> a dict
subclass

of course collections.OrderedDictionary and collections.defaultdict would
continue to be available as appropriate aliases to collections.Dictionary.
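For illustration, a minimal sketch of how such a combining factory might
compose the three behaviours. The Dictionary name and keywords come from
the proposal above; treating default= as a defaultdict-style factory is an
assumption:

```python
from collections import OrderedDict

_MISSING = object()

def Dictionary(default=None, ordered=False, key=None):
    """Sketch: build a dict subclass combining default, ordering,
    and key-function behaviour."""
    base = OrderedDict if ordered else dict

    class CombinedDict(base):
        def __missing__(self, k):
            # defaultdict-style: 'default' is a factory, if given.
            if default is None:
                raise KeyError(k)
            self[k] = value = default()
            return value

        if key is not None:
            def __setitem__(self, k, v):
                base.__setitem__(self, key(k), v)

            def __getitem__(self, k):
                v = base.get(self, key(k), _MISSING)
                return self.__missing__(k) if v is _MISSING else v

    return CombinedDict

# All three behaviours at once:
d = Dictionary(default=list, ordered=True, key=str.lower)()
d["ABC"].append(1)          # default list created, stored under "abc"
assert d["abc"] == [1]
```

A real implementation would need to decide which combinations actually make
sense, which is exactly the composition problem discussed in this thread.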

--- Bruce
Check it out: http://kck.st/YeqGxQ

From shibturn at gmail.com  Wed Jan  2 22:35:30 2013
From: shibturn at gmail.com (Richard Oudkerk)
Date: Wed, 02 Jan 2013 21:35:30 +0000
Subject: [Python-ideas] Identity dicts and sets
In-Reply-To: <CAD=7U2D7gsmEXCyYC-owHfnY34_GcPMUU4f_rOJcn-SBSxvarw@mail.gmail.com>
References: <201301022234.18839.storchaka@gmail.com>
	<CAD=7U2D7gsmEXCyYC-owHfnY34_GcPMUU4f_rOJcn-SBSxvarw@mail.gmail.com>
Message-ID: <kc2972$2df$1@ger.gmane.org>

On 02/01/2013 8:37pm, Mike Meyer wrote:
> How about a set_key method? It takes a single callable as an argument.
> You'd get your behavior with dict.set_key(id). If called when the dict
> is non-empty, it should throw an exception.

Wouldn't you need to specify a hash function at the same time?

-- 
Richard



From greg.ewing at canterbury.ac.nz  Wed Jan  2 21:58:56 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 03 Jan 2013 09:58:56 +1300
Subject: [Python-ideas] Documenting Python warts
In-Reply-To: <673c2190-c28b-4b16-be9b-fe62ff95b799@r4g2000pbi.googlegroups.com>
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<CAPTjJmrP-DokxCtKBRnydy=xTQZHBX9iHY3nYjUJnJ1Ay3wEzQ@mail.gmail.com>
	<189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com>
	<20130102000113.GB672@iskra.aviel.ru>
	<c6f76c2f-635e-4ae9-a263-b7428e68a1e7@pe9g2000pbc.googlegroups.com>
	<20130102030851.GA11279@iskra.aviel.ru>
	<673c2190-c28b-4b16-be9b-fe62ff95b799@r4g2000pbi.googlegroups.com>
Message-ID: <50E49F90.9080005@canterbury.ac.nz>

alex23 wrote:
> There is no way you can make Python fit either the call by reference
> or call by value models, although people regularly try,

No, what happens is that different people have different ideas
about what those terms mean, and they talk past each other.
So they've become useless nowadays, and are best avoided
altogether unless you want to start a month-long argument.

-- 
Greg


From tjreedy at udel.edu  Wed Jan  2 23:48:26 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 02 Jan 2013 17:48:26 -0500
Subject: [Python-ideas] Identity dicts and sets
In-Reply-To: <kc1eld$buk$1@ger.gmane.org>
References: <kc1eld$buk$1@ger.gmane.org>
Message-ID: <kc2dgh$a67$1@ger.gmane.org>

On 1/2/2013 9:01 AM, Serhiy Storchaka wrote:
> I propose to add new standard collection types: IdentityDict and
> IdentitySet. They are almost same as ordinal dict and set, but uses

What do you mean by ordinal dict, as opposed to plain dict.

> identity check instead of equality check (and id() or hash(id()) as a

By default, equality check is identity check.

> hash). They will be useful for pickling, for implementing __sizeof__()
> for compound types, and for other graph algorithms.

I don't know anything about pickling or __sizeof__, but if one uses 
user-defined classes for nodes and edges, equality is identity, so I 
don't see what would be gained.

The disadvantage of multiple minor variations on dict is confusion among 
users as to specific properties and use cases.
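A quick sketch showing that, for plain user-defined classes, equality
already falls back to identity:

```python
class Node:
    """A plain class: no __eq__ or __hash__ defined."""

a, b = Node(), Node()
assert a == a                  # object.__eq__ falls back to identity
assert a != b                  # distinct instances compare unequal
d = {a: 1, b: 2}               # so they already act as distinct dict keys
assert d[a] == 1 and d[b] == 2
```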

-- 
Terry Jan Reedy



From mwm at mired.org  Wed Jan  2 14:29:28 2013
From: mwm at mired.org (Mike Meyer)
Date: Wed, 2 Jan 2013 07:29:28 -0600
Subject: [Python-ideas] Documenting Python warts
In-Reply-To: <20130102113545.GA23780@iskra.aviel.ru>
References: <CAPkN8x+TF6znvM-Kd2Qad_ExjYf-hv41S3_Uk0U0OqAGO8HteQ@mail.gmail.com>
	<CAMpsgwYJiF5JFEf1jBmDoh9mDwLbkq7B6efAvq+56zbud8qHPg@mail.gmail.com>
	<20121231000012.GA10426@iskra.aviel.ru>
	<20130101225505.757540fa@pitrou.net>
	<CAPTjJmrP-DokxCtKBRnydy=xTQZHBX9iHY3nYjUJnJ1Ay3wEzQ@mail.gmail.com>
	<189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com>
	<20130102000113.GB672@iskra.aviel.ru>
	<c6f76c2f-635e-4ae9-a263-b7428e68a1e7@pe9g2000pbc.googlegroups.com>
	<20130102030851.GA11279@iskra.aviel.ru>
	<673c2190-c28b-4b16-be9b-fe62ff95b799@r4g2000pbi.googlegroups.com>
	<20130102113545.GA23780@iskra.aviel.ru>
Message-ID: <20130102072928.52867fd3@bhuda.mired.org>

On Wed, 2 Jan 2013 15:35:45 +0400
Oleg Broytman <phd at phdru.name> wrote:
> On Wed, Jan 02, 2013 at 02:59:39AM -0800, alex23 <wuwei23 at gmail.com> wrote:
> > What you seem to be
> > advocating is that all languages be nothing more than syntactic sugar
> > for the same underlying model.
>    Yes, von Neumann architecture.

So all the differences between FORTRAN II, PROLOG and Python are
syntactic sugar? I guess that makes preference in programming
languages just a matter of taste.

	  <mike
-- 
Mike Meyer <mwm at mired.org>		http://www.mired.org/
Independent Software developer/SCM consultant, email for more information.

O< ascii ribbon campaign - stop html mail - www.asciiribbon.org


From jimjjewett at gmail.com  Thu Jan  3 03:13:47 2013
From: jimjjewett at gmail.com (Jim Jewett)
Date: Wed, 2 Jan 2013 21:13:47 -0500
Subject: [Python-ideas] Preventing out of memory conditions
In-Reply-To: <CAOVPiMiNBwe_v95apXFVKwj5r5ipmz-ttGriVLV5YE-AXQAT0Q@mail.gmail.com>
References: <CAOVPiMiNBwe_v95apXFVKwj5r5ipmz-ttGriVLV5YE-AXQAT0Q@mail.gmail.com>
Message-ID: <CA+OGgf6espty_YFKX0mtvyPFvGZmYjvc-psw6OU+UCmys-zzEw@mail.gmail.com>

On 12/31/12, Max Moroz <maxmoroz at gmail.com> wrote:
> Sometimes, I have the flexibility to reduce the memory used by my
> program (e.g., by destroying large cached objects, etc.). It would be
> great if I could ask Python interpreter to notify me when memory is
> running out, so I can take such actions.

Agreed, provided the overhead isn't too high.

Depending on how accurately and precisely you need to track the memory
usage, it might be enough to replace Objects/obmalloc.c new_arena with
a wrapper that calls your callback before (maybe) allocating a new
arena.

-jJ


From ncoghlan at gmail.com  Thu Jan  3 04:37:40 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 3 Jan 2013 13:37:40 +1000
Subject: [Python-ideas] Identity dicts and sets
In-Reply-To: <kc2dgh$a67$1@ger.gmane.org>
References: <kc1eld$buk$1@ger.gmane.org>
	<kc2dgh$a67$1@ger.gmane.org>
Message-ID: <CADiSq7c7DZya3DesiG1QiK-8Tyj=tt-YRfJQV48L-o-QUUMkdQ@mail.gmail.com>

On Thu, Jan 3, 2013 at 8:48 AM, Terry Reedy <tjreedy at udel.edu> wrote:
> On 1/2/2013 9:01 AM, Serhiy Storchaka wrote:
>>
>> I propose to add new standard collection types: IdentityDict and
>> IdentitySet. They are almost same as ordinal dict and set, but uses
>
>
> What do you mean by ordinal dict, as opposed to plain dict.

I assumed Serhiy meant OrderedDict.

>> identity check instead of equality check (and id() or hash(id()) as a
>
> By default, equality check is identity check.

The point of an IdentityDict/Set is for it to be keyed by id rather
than value for *all* objects, rather than just those with the default
equality comparison.

This can be important in some use cases:

1. It's more correct for caching. For example, "0 + 0" should give
"0", while "0.0 + 0.0" should give "0.0". An identity based cache will
get this right, a value based cache will get it wrong
(functools.lru_cache actually splits the difference and goes with a
type+value based cache rather than a simple value based cache)

2. It effectively allows you to add additional state to both mutable
and immutable objects (by storing the extra state in an identity keyed
dictionary).

However, one important problem with this kind of data structure is
that it is *very* easy to get into lifecycle problems if you don't
store at least a weak reference to a real key (since ids may be
recycled after an object is destroyed), as shown here:

>>> [] is [] # Both objects alive at the same time, forces different id
False
>>> id([]) == id([]) # First id is recycled for second object
True
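One way to avoid the stale-id problem is to pair each entry with a weak
reference whose callback evicts the entry when the key dies. A minimal
sketch (the WeakIdRegistry name is made up for illustration, and it only
works for weakly-referenceable objects, so plain lists and ints are
excluded):

```python
import weakref

class WeakIdRegistry:
    """Attach extra state to objects, keyed by id(), dropping each
    entry as soon as its object is garbage-collected so a recycled
    id can never alias a stale entry."""

    def __init__(self):
        self._data = {}

    def set(self, obj, value):
        key = id(obj)
        # The weakref callback fires when obj dies and evicts the entry.
        ref = weakref.ref(obj, lambda _, k=key, d=self._data: d.pop(k, None))
        self._data[key] = (ref, value)

    def get(self, obj, default=None):
        entry = self._data.get(id(obj))
        return default if entry is None else entry[1]

class Node:
    pass

n = Node()
reg = WeakIdRegistry()
reg.set(n, "extra state")
assert reg.get(n) == "extra state"
del n                  # entry evicted via the weakref callback
```

With this, id() reuse is harmless because the old entry is gone before the
id can be handed out again.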


> The disadvantage of multiple minor variations on dict is confusion among
> users as to specific properties and use cases.

Indeed. As noted elsewhere, we already have a nasty composition
problem between __missing__, order preservation and weak referencing.
Adding a key function override into that mix suggests that a hashmap
factory API might be a better option than continuing the proliferation
of slightly different mapping types. (Guido's fears of an explosion in
subtly different container types in the standard library once the
collections module was added have proved to be well founded).

So, -1 from me on making the composition problem worse, but tentative
+0 on an API that addresses the composition problem and also includes
"key=func" style support for using a decorated value in the lookup
step.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From hernan.grecco at gmail.com  Thu Jan  3 05:05:38 2013
From: hernan.grecco at gmail.com (Hernan Grecco)
Date: Thu, 3 Jan 2013 05:05:38 +0100
Subject: [Python-ideas] Order in the documentation search results
In-Reply-To: <50E42A0F.2040908@drees.name>
References: <CAL6gwWXikjrYG+f+sqnm3k2mtNXCasTD7Uj_ABY=JNLi4eBNhQ@mail.gmail.com>
	<50E083BA.7000603@nedbatchelder.com> <kbq5l9$g8o$1@ger.gmane.org>
	<50E142FF.3070101@drees.name>
	<CAL6gwWUWXXqv=1zMtL0WpYkhBjurc0jb4uQi7ff6rNnX=1TFsQ@mail.gmail.com>
	<50E42A0F.2040908@drees.name>
Message-ID: <CAL6gwWV4B1ZjBykLRqeA_UAxMRetoMQdQmumHj=utSRSNsKUVw@mail.gmail.com>

Hi,

I have done some work to improve the search results on the Python
Docs. You can compare the current [0] with the proposed [1], or both
at the same time [2]. It is basically a patch for sphinx [4], plus a
python specific javascript [3]. The ideas are briefly explained [4].

I have not optimized the scores in [4], just some educated guesses.

best,

Hernan

[0] http://hgrecco.github.com/searchpydocs/current/
[1] http://hgrecco.github.com/searchpydocs/proposed/
[2] http://hgrecco.github.com/searchpydocs/
[3] https://github.com/hgrecco/searchpydocs/blob/master/cpy_scorer.js
[4] https://bitbucket.org/birkenfeld/sphinx/issue/1067/better-search-results

On Wed, Jan 2, 2013 at 1:37 PM, Stefan Drees <stefan at drees.name> wrote:
> Hi hernan,
> On 02.01.13 12:20, Hernan Grecco wrote:
>>
>> ... Thanks for all the feedback. I was hacking the sphinx indexer and the
>>
>> javacript searchtool today. I think the search results can be improved
>> by patching sphinx upstream and adding a small project dependent (in
>> this case Python) javascript snippet. I have created a proposal in the
>> Sphinx Issue tracker [0]. Let's move the discussion there.
>> ...
>> [0]
>> https://bitbucket.org/birkenfeld/sphinx/issue/1067/better-search-results
>
>
> thanks a lot for transforming the mail thread to improve the local search
> facility into real code suggestions.
>
> I commented on a first snippet from your suggested patch there.
>
> All the best,
> Stefan.
>
> Further historic details:
>>
>>
>> On Mon, Dec 31, 2012 at 8:47 AM, Stefan Drees <stefan at drees.name> wrote:
>>>
>>> On 30.12.12 20:45, Georg Brandl wrote:
>>>>
>>>> On 12/30/2012 07:11 PM, Ned Batchelder wrote:
>>>>>
>>>>> On 12/30/2012 12:54 PM, Hernan Grecco wrote:
>>>>>>
>>>>>> ...
>>>>>>
>>>>>> I have seen many people new to Python stumbling while using the Python
>>>>>> docs due to the order of the search results.
>>>>>> ...
>>>>>>
>>>>>> So my suggestion is to put the builtins first, the rest of the
>>>>>> standard lib later including HowTos, FAQ, etc and finally the
>>>>>> c-modules. Additionally, a section with a title matching exactly the
>>>>>> search query should come first. (I am not sure if the last suggestion
>>>>>> belongs in python-ideas or in
>>>>>> the sphinx mailing list, please advice)
>>>>>
>>>>>
>>>>>
>>>>> While we're on the topic, why in this day and age do we have a custom
>>>>> search?  Using google site search would be faster for the user, and
>>>>> more
>>>>> accurate.
>>>>
>>>>
>>>>
>>>> I agree.  Someone needs to propose a patch though.
>>>> ...
>>>
>>>
>>>
>>> a custom search in itself is a wonderful thing. To me it also shows more
>>> appreciation of visitor concerns than thoses sites, that are just
>>> _offering_
>>> google site search (which is accessible anyway to every visitor capable
>>> of
>>> memorizing the google or bing or whatnot URL).
>>>
>>> I second Hernans suggestion about ordering and also his question where
>>> the
>>> request (and patches) should be directed to.
>>> ...
>
>


From solipsis at pitrou.net  Thu Jan  3 08:06:49 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 3 Jan 2013 08:06:49 +0100
Subject: [Python-ideas] Identity dicts and sets
References: <kc1eld$buk$1@ger.gmane.org> <kc2dgh$a67$1@ger.gmane.org>
	<CADiSq7c7DZya3DesiG1QiK-8Tyj=tt-YRfJQV48L-o-QUUMkdQ@mail.gmail.com>
Message-ID: <20130103080649.58dfe44b@pitrou.net>

On Thu, 3 Jan 2013 13:37:40 +1000
Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Thu, Jan 3, 2013 at 8:48 AM, Terry Reedy <tjreedy at udel.edu> wrote:
> > On 1/2/2013 9:01 AM, Serhiy Storchaka wrote:
> >>
> >> I propose to add new standard collection types: IdentityDict and
> >> IdentitySet. They are almost same as ordinal dict and set, but uses
> >
> >
> > What do you mean by ordinal dict, as opposed to plain dict.
> 
> I assumed Serhiy meant OrderedDict.

I'm quite sure Serhiy meant ordinary dict.

> As noted elsewhere, we already have a nasty composition
> problem between __missing__, order preservation and weak referencing.

Aren't you dramatizing a bit? I haven't seen anyone ask for an ordered
weak dict, or a weak dict with default values.

> So, -1 from me on making the composition problem worse, but tentative
> +0 on an API that addresses the composition problem and also includes
> "key=func" style support for using a decorated value in the lookup
> step.

Well, IdentityDict addresses an actual use case. I don't think a
defaultidentitydict addresses any use case.

Regards

Antoine.




From stefan at drees.name  Thu Jan  3 10:05:27 2013
From: stefan at drees.name (Stefan Drees)
Date: Thu, 03 Jan 2013 10:05:27 +0100
Subject: [Python-ideas] Order in the documentation search results
In-Reply-To: <CAL6gwWV4B1ZjBykLRqeA_UAxMRetoMQdQmumHj=utSRSNsKUVw@mail.gmail.com>
References: <CAL6gwWXikjrYG+f+sqnm3k2mtNXCasTD7Uj_ABY=JNLi4eBNhQ@mail.gmail.com>
	<50E083BA.7000603@nedbatchelder.com>
	<kbq5l9$g8o$1@ger.gmane.org> <50E142FF.3070101@drees.name>
	<CAL6gwWUWXXqv=1zMtL0WpYkhBjurc0jb4uQi7ff6rNnX=1TFsQ@mail.gmail.com>
	<50E42A0F.2040908@drees.name>
	<CAL6gwWV4B1ZjBykLRqeA_UAxMRetoMQdQmumHj=utSRSNsKUVw@mail.gmail.com>
Message-ID: <50E549D7.1000007@drees.name>

Hi Hernan,

On 03.01.13 05:05, Hernan Grecco wrote:
> ... I have done some work to improve the search results on the Python
> Docs. You can compare the current [0] with the proposed [1], or both
> at the same time [2]. It is basically a patch for sphinx [4], plus a
> python specific javascript [3]. The ideas are briefly explained [4].
>
> I have not optimized the scores in [4], just some educated guesses.
> ...
>
> [0] http://hgrecco.github.com/searchpydocs/current/
> [1] http://hgrecco.github.com/searchpydocs/proposed/
> [2] http://hgrecco.github.com/searchpydocs/
> [3] https://github.com/hgrecco/searchpydocs/blob/master/cpy_scorer.js
> [4] https://bitbucket.org/birkenfeld/sphinx/issue/1067/better-search-results
>

That looks good to me for e.g. file, dict, and dict.clear. Far better
than a google/bing/whatever-external search, by the way (as tested with
dict, using a google search for "dict site:http://docs.python.org/3/") :-))

As I read in the sphinx issue mail flow you opened, Georg asks for a 
pull request of the patches. I consider this very promising. Thanks 
again for the effort and these good first results Hernan!

All the best,
Stefan.

Further historic details:

> On Wed, Jan 2, 2013 at 1:37 PM, Stefan Drees <stefan at drees.name> wrote:
>> Hi hernan,
>> On 02.01.13 12:20, Hernan Grecco wrote:
>>>
>>> ... Thanks for all the feedback. I was hacking the sphinx indexer and the
>>>
>>> javacript searchtool today. I think the search results can be improved
>>> by patching sphinx upstream and adding a small project dependent (in
>>> this case Python) javascript snippet. I have created a proposal in the
>>> Sphinx Issue tracker [0]. Let's move the discussion there.
>>> ...
>>> [0]
>>> https://bitbucket.org/birkenfeld/sphinx/issue/1067/better-search-results
>>
>>
>> thanks a lot for transforming the mail thread to improve the local search
>> facility into real code suggestions.
>>
>> I commented on a first snippet from your suggested patch there.
>>
>> All the best,
>> Stefan.
>>
>> Further historic details:
>>>
>>>
>>> On Mon, Dec 31, 2012 at 8:47 AM, Stefan Drees <stefan at drees.name> wrote:
>>>>
>>>> On 30.12.12 20:45, Georg Brandl wrote:
>>>>>
>>>>> On 12/30/2012 07:11 PM, Ned Batchelder wrote:
>>>>>>
>>>>>> On 12/30/2012 12:54 PM, Hernan Grecco wrote:
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>> I have seen many people new to Python stumbling while using the Python
>>>>>>> docs due to the order of the search results.
>>>>>>> ...
>>>>>>>
>>>>>>> So my suggestion is to put the builtins first, the rest of the
>>>>>>> standard lib later including HowTos, FAQ, etc and finally the
>>>>>>> c-modules. Additionally, a section with a title matching exactly the
>>>>>>> search query should come first. (I am not sure if the last suggestion
>>>>>>> belongs in python-ideas or in
>>>>>>> the sphinx mailing list, please advice)
>>>>>>
>>>>>>
>>>>>>
>>>>>> While we're on the topic, why in this day and age do we have a custom
>>>>>> search?  Using google site search would be faster for the user, and
>>>>>> more
>>>>>> accurate.
>>>>>
>>>>>
>>>>>
>>>>> I agree.  Someone needs to propose a patch though.
>>>>> ...
>>>>
>>>>
>>>>
>>>> a custom search in itself is a wonderful thing. To me it also shows more
>>>> appreciation of visitor concerns than thoses sites, that are just
>>>> _offering_
>>>> google site search (which is accessible anyway to every visitor capable
>>>> of
>>>> memorizing the google or bing or whatnot URL).
>>>>
>>>> I second Hernans suggestion about ordering and also his question where
>>>> the
>>>> request (and patches) should be directed to.
>>>> ...



From storchaka at gmail.com  Thu Jan  3 12:42:57 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Thu, 03 Jan 2013 13:42:57 +0200
Subject: [Python-ideas] Identity dicts and sets
In-Reply-To: <CALruUQJsOJNuPKOECiMHU4xz+Lz6Yz+tUofMW06iPminWVOVUw@mail.gmail.com>
References: <201301022234.18839.storchaka@gmail.com>
	<CAD=7U2D7gsmEXCyYC-owHfnY34_GcPMUU4f_rOJcn-SBSxvarw@mail.gmail.com>
	<CALruUQJsOJNuPKOECiMHU4xz+Lz6Yz+tUofMW06iPminWVOVUw@mail.gmail.com>
Message-ID: <kc3qs2$33r$1@ger.gmane.org>

On 02.01.13 22:45, Haoyi Li wrote:
> Something curried?
>
> custom_dict(cfg=...)(key1=..., key2=...)

Yes, it looks good. In any case custom_dict() should return a new type, 
not dict, to allow serialization.




From storchaka at gmail.com  Thu Jan  3 12:50:29 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Thu, 03 Jan 2013 13:50:29 +0200
Subject: [Python-ideas] Identity dicts and sets
In-Reply-To: <CAGu0AnvaWtCwcyf5MeoMn6G_jvJnoSTKZzR34XsyKkfsc6=Tow@mail.gmail.com>
References: <201301022234.18839.storchaka@gmail.com>
	<CAD=7U2D7gsmEXCyYC-owHfnY34_GcPMUU4f_rOJcn-SBSxvarw@mail.gmail.com>
	<9A91AF30-F0B3-441E-996C-F502291C1F35@masklinn.net>
	<CAGu0AnvaWtCwcyf5MeoMn6G_jvJnoSTKZzR34XsyKkfsc6=Tow@mail.gmail.com>
Message-ID: <kc3rai$6u7$1@ger.gmane.org>

On 02.01.13 23:33, Bruce Leban wrote:
> I agree collections is the place to put it but that would give us three
> specialized subclasses of dictionary which cannot be combined. That is,
> I can have a dictionary with a default, one that is ordered or one that
> uses a key function but not any combination of those. It would seem
> better to have something like Haoyi Li suggested:
>
> collections.Dictionary(default=None, ordered=False, key=None) --> a dict
> subclass

I doubt that all such combinations make sense. At least, not all of the 
features can be combined.




From storchaka at gmail.com  Thu Jan  3 12:51:04 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Thu, 03 Jan 2013 13:51:04 +0200
Subject: [Python-ideas] Identity dicts and sets
In-Reply-To: <kc2dgh$a67$1@ger.gmane.org>
References: <kc1eld$buk$1@ger.gmane.org> <kc2dgh$a67$1@ger.gmane.org>
Message-ID: <kc3rb9$7cu$1@ger.gmane.org>

On 03.01.13 00:48, Terry Reedy wrote:
> What do you mean by ordinal dict, as opposed to plain dict.

Sorry to have confused you. I mean "ordinary dict", same as "plain dict".

> I don't know anything about pickling or __sizeof__, by if one uses
> user-defined classes for nodes and edges, equality is identity, so I
> don't see what would be gained.

If one uses a list, a dict, or a user-defined class that defines __eq__, 
equality is not identity. Yes, you can use an identity dict with mutable 
types!
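A quick sketch of why identity keying matters for mutable types (both
lists must stay alive for their ids to remain valid):

```python
a, b = [1, 2], [1, 2]
assert a == b and a is not b       # equal by value, distinct objects

# An ordinary dict cannot key on lists at all:
try:
    {a: "x"}
    raise AssertionError("lists should not be hashable")
except TypeError:
    pass

# ...but id() gives a usable, identity-based key while both lists
# stay alive:
m = {id(a): "first", id(b): "second"}
assert m[id(a)] == "first" and m[id(b)] == "second"
```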




From storchaka at gmail.com  Thu Jan  3 12:53:35 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Thu, 03 Jan 2013 13:53:35 +0200
Subject: [Python-ideas] Identity dicts and sets
In-Reply-To: <kc2972$2df$1@ger.gmane.org>
References: <201301022234.18839.storchaka@gmail.com>
	<CAD=7U2D7gsmEXCyYC-owHfnY34_GcPMUU4f_rOJcn-SBSxvarw@mail.gmail.com>
	<kc2972$2df$1@ger.gmane.org>
Message-ID: <kc3rft$6u7$2@ger.gmane.org>

On 02.01.13 23:35, Richard Oudkerk wrote:
> Wouldn't you need to specify a hash function at the same time?

The hash function would simply be hash(keyfunc(key)).



From storchaka at gmail.com  Thu Jan  3 13:09:21 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Thu, 03 Jan 2013 14:09:21 +0200
Subject: [Python-ideas] Identity dicts and sets
In-Reply-To: <CADiSq7c7DZya3DesiG1QiK-8Tyj=tt-YRfJQV48L-o-QUUMkdQ@mail.gmail.com>
References: <kc1eld$buk$1@ger.gmane.org> <kc2dgh$a67$1@ger.gmane.org>
	<CADiSq7c7DZya3DesiG1QiK-8Tyj=tt-YRfJQV48L-o-QUUMkdQ@mail.gmail.com>
Message-ID: <kc3sdg$fam$1@ger.gmane.org>

On 03.01.13 05:37, Nick Coghlan wrote:
> I assumed Serhiy meant OrderedDict.

Sorry to have confused you. I meant "ordinary dict".

> 1. It's more correct for caching. For example, "0 + 0" should give
> "0", while "0.0 + 0.0" should give "0.0". An identity based cache will
> get this right, a value based cache will get it wrong
> (functools.lru_cache actually splits the difference and goes with a
> type+value based cache rather than a simple value based cache)

This is not a use case. Two "0" literals are the same object in CPython, 
but two "1000" or two "0.0" literals are not. That is yet another "wart" 
(as in any other language which has identity maps).

> However, one important problem with this kind of data structure is
> that it is *very* easy to get into lifecycle problems if you don't
> store at least a weak reference to a real key (since id's may be
> recycled after an object is destroyed, as shown here:

Of course, an identity dict or set should take ownership of its keys and 
values, like all other non-weak collections. Apart from the lookup 
function, they don't differ from their ordinary counterparts.

> Indeed. As noted elsewhere, we already have a nasty composition
> problem between __missing__, order preservation and weak referencing.

I doubt that all combinations make sense.




From storchaka at gmail.com  Thu Jan  3 13:30:56 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Thu, 03 Jan 2013 14:30:56 +0200
Subject: [Python-ideas] Preventing out of memory conditions
In-Reply-To: <CAOVPiMiNBwe_v95apXFVKwj5r5ipmz-ttGriVLV5YE-AXQAT0Q@mail.gmail.com>
References: <CAOVPiMiNBwe_v95apXFVKwj5r5ipmz-ttGriVLV5YE-AXQAT0Q@mail.gmail.com>
Message-ID: <kc3tm4$rfm$1@ger.gmane.org>

On 01.01.13 00:16, Max Moroz wrote:
> But let's say I am willing to do some work to estimate the maximum
> amount of memory my application can be allowed to use. If I provide
> that number to Python interpreter, it may be possible for it to notify
> me when the next memory allocation would exceed this limit by calling
> a function I provide it (hopefully passing as arguments the amount of
> memory being requested, as well as the amount currently in use). My
> callback function could then destroy some objects, and return True to
> indicate that some objects were destroyed. At that point, the
> intepreter could run its standard garbage collection routines to
> release the memory that corresponded to those objects - before
> proceeding with whatever it was trying to do originally. (If I
> returned False, or if I didn't provide a callback function at all, the
> interpreter would simply behave as it does today.) Any memory
> allocations that happen while the callback function itself is
> executing, would not trigger further calls to it. The whole mechanism
> would be disabled for the rest of the session if the memory freed by
> the callback function was insufficient to prevent going over the
> memory limit.
>
> Would this be worth considering for a future language extension? How
> hard would it be to implement?

You can't call a callback function directly from the memory allocation 
functions. A lot of code in the core, and in standard and third-party 
extensions, relies on the fact that no Python code is executed inside 
certain C API functions.

Violating this rule would break everything. Even copying a list could 
be broken: you allocate memory of the necessary size for the new list 
and then copy the elements. If the callback function were invoked during 
that memory allocation, the size of the original list might change. That 
would violate the integrity of the copy and lead to a crash or a wrong 
result. There are thousands of such places; changing all of them is 
impossible, and it would reduce performance even when callbacks are not 
used.

You can call a callback function only at a safe point, at least when the 
GIL is released.
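At the Python level, a safe-point scheme can be sketched with a watcher
thread that polls the process's memory usage and runs the callback as
ordinary Python code, between bytecodes rather than inside an allocation.
This sketch uses the Unix-only resource module (ru_maxrss is reported in
kilobytes on Linux but in bytes on OS X), and the function name and API
here are made up for illustration:

```python
import resource
import threading
import time

def start_memory_watch(limit_kb, callback, interval=0.1):
    """Invoke callback(used_kb) from a daemon thread once the
    process's peak RSS crosses limit_kb, then stop watching."""
    def watch():
        while True:
            used_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
            if used_kb >= limit_kb:
                callback(used_kb)   # runs as normal Python code
                return
            time.sleep(interval)
    thread = threading.Thread(target=watch, daemon=True)
    thread.start()
    return thread
```

The callback can then drop caches or call gc.collect(); nothing runs
inside malloc, so the integrity problems described above don't arise.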



From oscar.j.benjamin at gmail.com  Thu Jan  3 16:03:02 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Thu, 3 Jan 2013 15:03:02 +0000
Subject: [Python-ideas] Identity dicts and sets
In-Reply-To: <kc3sdg$fam$1@ger.gmane.org>
References: <kc1eld$buk$1@ger.gmane.org> <kc2dgh$a67$1@ger.gmane.org>
	<CADiSq7c7DZya3DesiG1QiK-8Tyj=tt-YRfJQV48L-o-QUUMkdQ@mail.gmail.com>
	<kc3sdg$fam$1@ger.gmane.org>
Message-ID: <CAHVvXxSs0Ew0ut0_SuT1KMDRQhGX-z8D6vOM_zX00iarGYHn3g@mail.gmail.com>

On 3 January 2013 12:09, Serhiy Storchaka <storchaka at gmail.com> wrote:
> On 03.01.13 05:37, Nick Coghlan wrote:
>>
>> [SNIP]
>> However, one important problem with this kind of data structure is
>> that it is *very* easy to get into lifecycle problems if you don't
>> store at least a weak reference to a real key (since id's may be
>> recycled after an object is destroyed, as shown here:
>
> Of course, an identity dict or set should take ownership of its keys and
> values, like all other non-weak collections. Apart from the lookup function,
> they don't differ from their ordinary counterparts.

I think what Nick means is that if you implement this naively then you
don't hold references to the keys:

class IdentityDict(dict):
    def __setitem__(self, key, val):
        dict.__setitem__(self, id(key), val)
        # No reference to key held when this function ends ...

A way to fix this is to store both objects in the value, with
corresponding changes to __getitem__ etc.:

class IdentityDict(dict):
    def __setitem__(self, key, val):
        dict.__setitem__(self, id(key), (key, val))
    def __getitem__(self, key):
        return dict.__getitem__(self, id(key))[1]


Oscar


From christian at python.org  Thu Jan  3 16:10:20 2013
From: christian at python.org (Christian Heimes)
Date: Thu, 03 Jan 2013 16:10:20 +0100
Subject: [Python-ideas] Identity dicts and sets
In-Reply-To: <CADiSq7c7DZya3DesiG1QiK-8Tyj=tt-YRfJQV48L-o-QUUMkdQ@mail.gmail.com>
References: <kc1eld$buk$1@ger.gmane.org> <kc2dgh$a67$1@ger.gmane.org>
	<CADiSq7c7DZya3DesiG1QiK-8Tyj=tt-YRfJQV48L-o-QUUMkdQ@mail.gmail.com>
Message-ID: <50E59F5C.3010302@python.org>

Am 03.01.2013 04:37, schrieb Nick Coghlan:
> This can be important in some use cases:
> 
> 1. It's more correct for caching. For example, "0 + 0" should give
> "0", while "0.0 + 0.0" should give "0.0". An identity based cache will
> get this right, a value based cache will get it wrong
> (functools.lru_cache actually splits the difference and goes with a
> type+value based cache rather than a simple value based cache)

Do you mean +0.0 or -0.0? IEEE 754 zeros are always signed, although +0.0
is equal to -0.0. And a NaN is always unequal to every NaN, even to
itself. For floats we would need a type-specific dict that handles
special values correctly ... Can of worms?
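The signed-zero and NaN behaviour is easy to demonstrate with an ordinary
value-keyed dict (the NaN lookup succeeding is a CPython implementation
detail: dict lookups check identity before equality):

```python
import math

pz, nz = 0.0, -0.0
assert pz == nz                                      # +0.0 equals -0.0 ...
assert math.copysign(1, pz) != math.copysign(1, nz)  # ... yet they differ
assert {pz: "zero"}[nz] == "zero"    # a value-keyed dict conflates them

nan = float("nan")
assert nan != nan                    # a NaN is unequal even to itself
assert {nan: "nan"}[nan] == "nan"    # identity short-circuit finds it
```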




From ncoghlan at gmail.com  Wed Jan  2 12:40:26 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 2 Jan 2013 21:40:26 +1000
Subject: [Python-ideas] Updated PEP 432: Simplifying the CPython startup
	sequence
Message-ID: <CADiSq7fqt-H8Nd=d6aX+Tt3iBBHufOr6Fc8z4mg=LhAj8wtL3A@mail.gmail.com>

I've updated the PEP heavily based on the previous thread and
miscellaneous comments in response to checkins.

Latest version is at http://www.python.org/dev/peps/pep-0432/ and inline below.

The biggest change in the new version is moving from a Python
dictionary to a C struct as the storage for the full low level
interpreter configuration as Antoine suggested. The individual
settings are now either C integers for the various flag values
(defaulting to -1 to indicate "figure this out"), or pointers to the
appropriate specific Python type (defaulting to NULL to indicate
"figure this out").

I'm happy enough with the design now that I think it's worth starting
to implement it before I tinker with the PEP any further.

Cheers,
Nick.

================================
PEP: 432
Title: Simplifying the CPython startup sequence
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan at gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 28-Dec-2012
Python-Version: 3.4
Post-History: 28-Dec-2012, 2-Jan-2013


Abstract
========

This PEP proposes a mechanism for simplifying the startup sequence for
CPython, making it easier to modify the initialization behaviour of the
reference interpreter executable, as well as making it easier to control
CPython's startup behaviour when creating an alternate executable or
embedding it as a Python execution engine inside a larger application.

Note: TBC = To Be Confirmed, TBD = To Be Determined. The appropriate
resolution for most of these should become clearer as the reference
implementation is developed.


Proposal
========

This PEP proposes that CPython move to an explicit multi-phase initialization
process, where a preliminary interpreter is put in place with limited OS
interaction capabilities early in the startup sequence. This essential core
remains in place while all of the configuration settings are determined,
until a final configuration call takes those settings and finishes
bootstrapping the interpreter immediately before locating and executing
the main module.

In the new design, the interpreter will move through the following
well-defined phases during the startup sequence:

* Pre-Initialization - no interpreter available
* Initialization - interpreter partially available
* Initialized - full interpreter available, __main__ related metadata
  incomplete
* Main Execution - optional state, __main__ related metadata populated,
  bytecode executing in the __main__ module namespace

As a concrete use case to help guide any design changes, and to solve a known
problem where the appropriate defaults for system utilities differ from those
for running user scripts, this PEP also proposes the creation and
distribution of a separate system Python (``spython``) executable which, by
default, ignores user site directories and environment variables, and does
not implicitly set ``sys.path[0]`` based on the current directory or the
script being executed.

To keep the implementation complexity under control, this PEP does *not*
propose wholesale changes to the way the interpreter state is accessed at
runtime, nor does it propose changes to the way subinterpreters are
created after the main interpreter has already been initialized. Changing
the order in which the existing initialization steps occur in order to make
the startup sequence easier to maintain is already a substantial change, and
attempting to make those other changes at the same time will make the
change significantly more invasive and much harder to review. However, such
proposals may be suitable topics for follow-on PEPs or patches - one key
benefit of this PEP is decreasing the coupling between the internal storage
model and the configuration interface, so such changes should be easier
once this PEP has been implemented.


Background
==========

Over time, CPython's initialization sequence has become progressively more
complicated, offering more options, as well as performing more complex tasks
(such as configuring the Unicode settings for OS interfaces in Python 3 as
well as bootstrapping a pure Python implementation of the import system).

Much of this complexity is accessible only through the ``Py_Main`` and
``Py_Initialize`` APIs, offering embedding applications little opportunity
for customisation. This creeping complexity also makes life difficult for
maintainers, as much of the configuration needs to take place prior to the
``Py_Initialize`` call, meaning much of the Python C API cannot be used
safely.

A number of proposals are on the table for even *more* sophisticated
startup behaviour, such as better control over ``sys.path`` initialization
(easily adding additional directories on the command line in a cross-platform
fashion, as well as controlling the configuration of ``sys.path[0]``), easier
configuration of utilities like coverage tracing when launching Python
subprocesses, and easier control of the encoding used for the standard IO
streams when embedding CPython in a larger application.

Rather than attempting to bolt such behaviour onto an already complicated
system, this PEP proposes to instead simplify the status quo *first*, with
the aim of making these further feature requests easier to implement.


Key Concerns
============

There are a couple of key concerns that any change to the startup sequence
needs to take into account.


Maintainability
---------------

The current CPython startup sequence is difficult to understand, and even
more difficult to modify. It is not clear what state the interpreter is in
while much of the initialization code executes, leading to behaviour such
as lists, dictionaries and Unicode values being created prior to the call
to ``Py_Initialize`` when the ``-X`` or ``-W`` options are used [1_].

By moving to an explicitly multi-phase startup sequence, developers should
only need to understand which features are not available in the core
bootstrapping state, as the vast majority of the configuration process
will now take place in that state.

By basing the new design on a combination of C structures and Python
data types, it should also be easier to modify the system in the
future to add new configuration options.


Performance
-----------

CPython is used heavily to run short scripts where the runtime is dominated
by the interpreter initialization time. Any changes to the startup sequence
should minimise their impact on the startup overhead.

Experience with the importlib migration suggests that the startup time is
dominated by IO operations. However, to monitor the impact of any changes,
a simple benchmark can be used to check how long it takes to start and then
tear down the interpreter::

   python3 -m timeit -s "from subprocess import call" "call(['./python', '-c', 'pass'])"

Current numbers on my system for 2.7, 3.2 and 3.3 (using the 3.3
subprocess and timeit modules to execute the check, all with non-debug
builds)::

    # Python 2.7
    $ py33/python -m timeit -s "from subprocess import call" "call(['py27/python', '-c', 'pass'])"
    100 loops, best of 3: 17.8 msec per loop
    # Python 3.2
    $ py33/python -m timeit -s "from subprocess import call" "call(['py32/python', '-c', 'pass'])"
    10 loops, best of 3: 39 msec per loop
    # Python 3.3
    $ py33/python -m timeit -s "from subprocess import call" "call(['py33/python', '-c', 'pass'])"
    10 loops, best of 3: 25.3 msec per loop

Improvements in the import system and the Unicode support already resulted
in a more than 30% improvement in startup time in Python 3.3 relative to
3.2. Python 3.3 is still slightly slower to start than Python 2.7 due to the
additional infrastructure that needs to be put in place to support the
Unicode based text model.

This PEP is not expected to have any significant effect on the startup time,
as it is aimed primarily at *reordering* the existing initialization
sequence, without making substantial changes to the individual steps.

However, if this simple check suggests that the proposed changes to the
initialization sequence may pose a performance problem, then a more
sophisticated microbenchmark will be developed to assist in investigation.


Required Configuration Settings
===============================

A comprehensive configuration scheme requires that an embedding application
be able to control the following aspects of the final interpreter state:

* Whether or not to use randomised hashes (and if used, potentially specify
  a specific random seed)
* The "Where is Python located?" elements in the ``sys`` module:

  * ``sys.executable``
  * ``sys.base_exec_prefix``
  * ``sys.base_prefix``
  * ``sys.exec_prefix``
  * ``sys.prefix``

* The path searched for imports from the filesystem (and other path hooks):

  * ``sys.path``

* The command line arguments seen by the interpreter:

  * ``sys.argv``

* The filesystem encoding used by:

  * ``sys.getfilesystemencoding``
  * ``os.fsencode``
  * ``os.fsdecode``

* The IO encoding (if any) and the buffering used by:

  * ``sys.stdin``
  * ``sys.stdout``
  * ``sys.stderr``

* The initial warning system state:

  * ``sys.warnoptions``

* Arbitrary extended options (e.g. to automatically enable ``faulthandler``):

  * ``sys._xoptions``

* Whether or not to implicitly cache bytecode files:

  * ``sys.dont_write_bytecode``

* Whether or not to enforce correct case in filenames on case-insensitive
  platforms:

  * ``os.environ["PYTHONCASEOK"]``

* The other settings exposed to Python code in ``sys.flags``:

  * ``debug`` (Enable debugging output in the pgen parser)
  * ``inspect`` (Enter interactive interpreter after __main__ terminates)
  * ``interactive`` (Treat stdin as a tty)
  * ``optimize`` (__debug__ status, write .pyc or .pyo, strip doc strings)
  * ``no_user_site`` (don't add the user site directory to sys.path)
  * ``no_site`` (don't implicitly import site during startup)
  * ``ignore_environment`` (whether environment vars are used during config)
  * ``verbose`` (enable all sorts of random output)
  * ``bytes_warning`` (warnings/errors for implicit str/bytes interaction)
  * ``quiet`` (disable banner output even if verbose is also enabled or
    stdin is a tty and the interpreter is launched in interactive mode)

* Whether or not CPython's signal handlers should be installed
* What code (if any) should be executed as ``__main__``:

  * Nothing (just create an empty module)
  * A filesystem path referring to a Python script (source or bytecode)
  * A filesystem path referring to a valid ``sys.path`` entry (typically
    a directory or zipfile)
  * A given string (equivalent to the "-c" option)
  * A module or package (equivalent to the "-m" option)
  * Standard input as a script (i.e. a non-interactive stream)
  * Standard input as an interactive interpreter session

<TBD: Did I miss anything?>

Note that this just covers settings that are currently configurable in some
manner when using the main CPython executable. While this PEP aims to make
adding additional configuration settings easier in the future, it
deliberately avoids adding any new settings of its own.


The Status Quo
==============

The current mechanisms for configuring the interpreter have accumulated in
a fairly ad hoc fashion over the past 20+ years, leading to a rather
inconsistent interface with varying levels of documentation.

(Note: some of the info below could probably be cleaned up and added to the
C API documentation - it's all CPython specific, so it doesn't belong in
the language reference)


Ignoring Environment Variables
------------------------------

The ``-E`` command line option allows all environment variables to be
ignored when initializing the Python interpreter. An embedding application
can enable this behaviour by setting ``Py_IgnoreEnvironmentFlag`` before
calling ``Py_Initialize()``.

In the CPython source code, the ``Py_GETENV`` macro implicitly checks this
flag, and always produces ``NULL`` if it is set.

<TBD: I believe PYTHONCASEOK is checked regardless of this setting >
<TBD: Does -E also ignore Windows registry keys? >


Randomised Hashing
------------------

The randomised hashing is controlled via the ``-R`` command line option (in
releases prior to 3.3), as well as the ``PYTHONHASHSEED`` environment
variable.

In Python 3.3, only the environment variable remains relevant. It can be
used to disable randomised hashing (by using a seed value of 0) or else
to force a specific hash value (e.g. for repeatability of testing, or
to share hash values between processes)

However, embedding applications must use the ``Py_HashRandomizationFlag``
to explicitly request hash randomisation (CPython sets it in ``Py_Main()``
rather than in ``Py_Initialize()``).

The new configuration API should make it straightforward for an
embedding application to reuse the ``PYTHONHASHSEED`` processing with
a text based configuration setting provided by other means (e.g. a
config file or separate environment variable).


Locating Python and the standard library
----------------------------------------

The location of the Python binary and the standard library is influenced
by several elements. The algorithm used to perform the calculation is
not documented anywhere other than in the source code [3_,4_]. Even that
description is incomplete, as it failed to be updated for the virtual
environment support added in Python 3.3 (detailed in PEP 405).

These calculations are affected by the following function calls (made
prior to calling ``Py_Initialize()``) and environment variables:

* ``Py_SetProgramName()``
* ``Py_SetPythonHome()``
* ``PYTHONHOME``

The filesystem is also inspected for ``pyvenv.cfg`` files (see PEP 405) or,
failing that, a ``lib/os.py`` (Windows) or ``lib/python$VERSION/os.py``
file.

The build time settings for PREFIX and EXEC_PREFIX are also relevant,
as are some registry settings on Windows. The hardcoded fallbacks are
based on the layout of the CPython source tree and build output when
working in a source checkout.


Configuring ``sys.path``
------------------------

An embedding application may call ``Py_SetPath()`` prior to
``Py_Initialize()`` to completely override the calculation of
``sys.path``. It is not straightforward to only allow *some* of the
calculations, as modifying ``sys.path`` after initialization is
already complete means those modifications will not be in effect
when standard library modules are imported during the startup sequence.

If ``Py_SetPath()`` is not called prior to the first call to ``Py_GetPath()``
(implicit in ``Py_Initialize()``), then the latter builds on the location
calculations above, together with the ``PYTHONPATH`` environment variable,
to determine suitable path entries.

<TBD: On Windows, there's also a bunch of stuff to do with the registry>

The ``site`` module, which is implicitly imported at startup (unless
disabled via the ``-S`` option) adds additional paths to this initial
set of paths, as described in its documentation [5_].

The ``-s`` command line option can be used to exclude the user site
directory from the list of directories added. Embedding applications
can control this by setting the ``Py_NoUserSiteDirectory`` global variable.

The following commands can be used to check the default path configurations
for a given Python executable on a given system:

* ``./python -c "import sys, pprint; pprint.pprint(sys.path)"``
  - standard configuration
* ``./python -s -c "import sys, pprint; pprint.pprint(sys.path)"``
  - user site directory disabled
* ``./python -S -c "import sys, pprint; pprint.pprint(sys.path)"``
  - all site path modifications disabled

(Note: you can see similar information using ``-m site`` instead of ``-c``,
but this is slightly misleading as it calls ``os.path.abspath`` on all of the
path entries, making relative path entries look absolute. Using the ``site``
module also causes problems in the last case, as on Python versions prior to
3.3, explicitly importing site will carry out the path modifications ``-S``
avoids, while on 3.3+ combining ``-m site`` with ``-S`` currently fails)

The calculation of ``sys.path[0]`` is comparatively straightforward:

* For an ordinary script (Python source or compiled bytecode),
  ``sys.path[0]`` will be the directory containing the script.
* For a valid ``sys.path`` entry (typically a zipfile or directory),
  ``sys.path[0]`` will be that path
* For an interactive session, running from stdin or when using the ``-c`` or
  ``-m`` switches, ``sys.path[0]`` will be the empty string, which the import
  system interprets as allowing imports from the current directory


Configuring ``sys.argv``
------------------------

Unlike most other settings discussed in this PEP, ``sys.argv`` is not
set implicitly by ``Py_Initialize()``. Instead, it must be set via an
explicit call to ``Py_SetArgv()``.

CPython calls this in ``Py_Main()`` after calling ``Py_Initialize()``. The
calculation of ``sys.argv[1:]`` is straightforward: they're the command line
arguments passed after the script name or the argument to the ``-c`` or
``-m`` options.

The calculation of ``sys.argv[0]`` is a little more complicated:

* For an ordinary script (source or bytecode), it will be the script name
* For a ``sys.path`` entry (typically a zipfile or directory) it will
  initially be the zipfile or directory name, but will later be changed by
  the ``runpy`` module to the full path to the imported ``__main__`` module.
* For a module specified with the ``-m`` switch, it will initially be the
  string ``"-m"``, but will later be changed by the ``runpy`` module to the
  full path to the executed module.
* For a package specified with the ``-m`` switch, it will initially be the
  string ``"-m"``, but will later be changed by the ``runpy`` module to the
  full path to the executed ``__main__`` submodule of the package.
* For a command executed with ``-c``, it will be the string ``"-c"``
* For explicitly requested input from stdin, it will be the string ``"-"``
* Otherwise, it will be the empty string

Embedding applications must call ``Py_SetArgv()`` themselves. The CPython logic
for doing so is part of ``Py_Main()`` and is not exposed separately.
However, the ``runpy`` module does provide roughly equivalent logic in
``runpy.run_module`` and ``runpy.run_path``.



Other configuration settings
----------------------------

TBD: Cover the initialization of the following in more detail:

* The initial warning system state:

  * ``sys.warnoptions``
  * (-W option, PYTHONWARNINGS)

* Arbitrary extended options (e.g. to automatically enable ``faulthandler``):

  * ``sys._xoptions``
  * (-X option)

* The filesystem encoding used by:

  * ``sys.getfilesystemencoding``
  * ``os.fsencode``
  * ``os.fsdecode``

* The IO encoding and buffering used by:

  * ``sys.stdin``
  * ``sys.stdout``
  * ``sys.stderr``
  * (-u option, PYTHONIOENCODING, PYTHONUNBUFFEREDIO)

* Whether or not to implicitly cache bytecode files:

  * ``sys.dont_write_bytecode``
  * (-B option, PYTHONDONTWRITEBYTECODE)

* Whether or not to enforce correct case in filenames on case-insensitive
  platforms:

  * ``os.environ["PYTHONCASEOK"]``

* The other settings exposed to Python code in ``sys.flags``:

  * ``debug`` (Enable debugging output in the pgen parser)
  * ``inspect`` (Enter interactive interpreter after __main__ terminates)
  * ``interactive`` (Treat stdin as a tty)
  * ``optimize`` (__debug__ status, write .pyc or .pyo, strip doc strings)
  * ``no_user_site`` (don't add the user site directory to sys.path)
  * ``no_site`` (don't implicitly import site during startup)
  * ``ignore_environment`` (whether environment vars are used during config)
  * ``verbose`` (enable all sorts of random output)
  * ``bytes_warning`` (warnings/errors for implicit str/bytes interaction)
  * ``quiet`` (disable banner output even if verbose is also enabled or
    stdin is a tty and the interpreter is launched in interactive mode)

* Whether or not CPython's signal handlers should be installed

Much of the configuration of CPython is currently handled through C level
global variables::

    Py_BytesWarningFlag (-b)
    Py_DebugFlag (-d option)
    Py_InspectFlag (-i option, PYTHONINSPECT)
    Py_InteractiveFlag (property of stdin, cannot be overridden)
    Py_OptimizeFlag (-O option, PYTHONOPTIMIZE)
    Py_DontWriteBytecodeFlag (-B option, PYTHONDONTWRITEBYTECODE)
    Py_NoUserSiteDirectory (-s option, PYTHONNOUSERSITE)
    Py_NoSiteFlag (-S option)
    Py_UnbufferedStdioFlag (-u, PYTHONUNBUFFEREDIO)
    Py_VerboseFlag (-v option, PYTHONVERBOSE)

For the above variables, the conversion of command line options and
environment variables to C global variables is handled by ``Py_Main``,
so each embedding application must set those appropriately in order to
change them from their defaults.

Some configuration can only be provided as OS level environment variables::

    PYTHONSTARTUP
    PYTHONCASEOK
    PYTHONIOENCODING

The ``Py_InitializeEx()`` API also accepts a boolean flag to indicate
whether or not CPython's signal handlers should be installed.

Finally, some interactive behaviour (such as printing the introductory
banner) is triggered only when standard input is reported as a terminal
connection by the operating system.

TBD: Document how the "-x" option is handled (skips processing of the
first comment line in the main script)

Also see detailed sequence of operations notes at [1_]


Design Details
==============

(Note: details here are still very much in flux, but preliminary feedback
is appreciated anyway)

The main theme of this proposal is to create the interpreter state for
the main interpreter *much* earlier in the startup process. This will allow
most of the CPython API to be used during the remainder of the initialization
process, potentially simplifying a number of operations that currently need
to rely on basic C functionality rather than being able to use the richer
data structures provided by the CPython C API.

In the following, the term "embedding application" also covers the standard
CPython command line application.


Interpreter Initialization Phases
---------------------------------

Four distinct phases are proposed:

* Pre-Initialization:

  * no interpreter is available.
  * ``Py_IsInitializing()`` returns ``0``
  * ``Py_IsInitialized()`` returns ``0``
  * ``Py_IsRunningMain()`` returns ``0``
  * The embedding application determines the settings required to create the
    main interpreter and moves to the next phase by calling
    ``Py_BeginInitialization``.

* Initialization:

  * the main interpreter is available, but only partially configured.
  * ``Py_IsInitializing()`` returns ``1``
  * ``Py_IsInitialized()`` returns ``0``
  * ``Py_IsRunningMain()`` returns ``0``
  * The embedding application determines and applies the settings
    required to complete the initialization process by calling
    ``Py_ReadConfiguration`` and ``Py_EndInitialization``.

* Initialized:

  * the main interpreter is available and fully operational, but
    ``__main__`` related metadata is incomplete.
  * ``Py_IsInitializing()`` returns ``0``
  * ``Py_IsInitialized()`` returns ``1``
  * ``Py_IsRunningMain()`` returns ``0``
  * Optionally, the embedding application may identify and begin
    executing code in the ``__main__`` module namespace by calling
    ``Py_RunPathAsMain``, ``Py_RunModuleAsMain`` or ``Py_RunStreamAsMain``.

* Main Execution:

  * bytecode is being executed in the ``__main__`` namespace
  * ``Py_IsInitializing()`` returns ``0``
  * ``Py_IsInitialized()`` returns ``1``
  * ``Py_IsRunningMain()`` returns ``1``

As indicated by the phase reporting functions, main module execution is
an optional subphase of Initialized rather than a completely distinct phase.

All 4 phases will be used by the standard CPython interpreter and the
proposed System Python interpreter. Other embedding applications may
choose to skip the step of executing code in the ``__main__`` namespace.

An embedding application may still leave initialization almost entirely
under CPython's control by using the existing ``Py_Initialize`` API.
Alternatively, an embedding application that wants greater control over
CPython's initial state can use the new, finer grained API::

    /* Phase 1: Pre-Initialization */
    Py_CoreConfig core_config = Py_CoreConfig_INIT;
    Py_Config config = Py_Config_INIT;
    /* Easily control the core configuration */
    core_config.ignore_environment = 1; /* Ignore environment variables */
    core_config.use_hash_seed = 0;      /* Full hash randomisation */
    Py_BeginInitialization(&core_config);
    /* Phase 2: Initialization */
    /* Optionally preconfigure some settings here - they will then be
     * used to derive other settings */
    Py_ReadConfiguration(&config);
    /* Can completely override derived settings here */
    Py_EndInitialization(&config);
    /* Phase 3: Initialized */
    /* If an embedding application has no real concept of a main module
     * it can leave the interpreter in this state indefinitely.
     * Otherwise, it can launch __main__ via the Py_Run*AsMain functions.
     */


Pre-Initialization Phase
------------------------

The pre-initialization phase is where an embedding application determines
the settings which are absolutely required before the interpreter can be
initialized at all. Currently, the only configuration settings in this
category are those related to the randomised hash algorithm - the hash
algorithms must be consistent for the lifetime of the process, and so they
must be in place before the core interpreter is created.

The specific settings needed are a flag indicating whether or not to use a
specific seed value for the randomised hashes, and if so, the specific value
for the seed (a seed value of zero disables randomised hashing). In addition,
due to the possible use of ``PYTHONHASHSEED`` in configuring the hash
randomisation, the question of whether or not to consider environment
variables must also be addressed early.

The proposed API for this step in the startup sequence is::

    void Py_BeginInitialization(const Py_CoreConfig *config);

Like ``Py_Initialize()``, this part of the new API treats initialization failures
as fatal errors. While that's still not particularly embedding friendly,
the operations in this step *really* shouldn't be failing, and changing them
to return error codes instead of aborting would be an even larger task than
the one already being proposed.

The new ``Py_CoreConfig`` struct holds the settings required for preliminary
configuration::

    /* Note: if changing anything in Py_CoreConfig, also update
     * Py_CoreConfig_INIT */
    typedef struct {
        int ignore_environment;   /* -E switch */
        int use_hash_seed;        /* PYTHONHASHSEED */
        unsigned long hash_seed;  /* PYTHONHASHSEED */
    } Py_CoreConfig;

    #define Py_CoreConfig_INIT {0, -1, 0}

The core configuration settings pointer may be ``NULL``, in which case the
default values are ``ignore_environment = 0`` and ``use_hash_seed = -1``.

The ``Py_CoreConfig_INIT`` macro is designed to allow easy initialization
of a struct instance with sensible defaults::

    Py_CoreConfig core_config = Py_CoreConfig_INIT;

``ignore_environment`` controls the processing of all Python related
environment variables. If the flag is zero, then environment variables are
processed normally. Otherwise, all Python-specific environment variables
are considered undefined (exceptions may be made for some OS specific
environment variables, such as those used on Mac OS X to communicate
between the App bundle and the main Python binary).

``use_hash_seed`` controls the configuration of the randomised hash
algorithm. If it is zero, then randomised hashes with a random seed will
be used. If it is positive, then the value in ``hash_seed`` will be used
to seed the random number generator. If the ``hash_seed`` is zero in this
case, then the randomised hashing is disabled completely.

If ``use_hash_seed`` is negative (and ``ignore_environment`` is zero),
then CPython will inspect the ``PYTHONHASHSEED`` environment variable. If it
is not set, is set to the empty string, or to the value ``"random"``, then
randomised hashes with a random seed will be used. If it is set to the string
``"0"``, randomised hashing will be disabled. Otherwise, the hash seed is
expected to be a string representation of an integer in the range
``[0, 4294967295]``.

To make it easier for embedding applications to use the ``PYTHONHASHSEED``
processing with a different data source, the following helper function
will be added to the C API::

    int Py_ReadHashSeed(char *seed_text,
                        int *use_hash_seed,
                        unsigned long *hash_seed);

This function accepts a seed string in ``seed_text`` and converts it to
the appropriate flag and seed values. If ``seed_text`` is ``NULL``,
the empty string or the value ``"random"``, both ``use_hash_seed`` and
``hash_seed`` will be set to zero. Otherwise, ``use_hash_seed`` will be set to
``1`` and the seed text will be interpreted as an integer and reported as
``hash_seed``. On success the function will return zero. A non-zero return
value indicates an error (most likely in the conversion to an integer).

The aim is to keep this initial level of configuration as small as possible
in order to keep the bootstrapping environment consistent across
different embedding applications. If we can create a valid interpreter state
without the setting, then the setting should go in the config dict passed
to ``Py_EndInitialization()`` rather than in the core configuration.

A new query API will allow code to determine if the interpreter is in the
bootstrapping state between the creation of the interpreter state and the
completion of the bulk of the initialization process::

    int Py_IsInitializing();

Attempting to call ``Py_BeginInitialization()`` again when
``Py_IsInitializing()`` or ``Py_IsInitialized()`` is true is a fatal error.

While in the initializing state, the interpreter should be fully functional
except that:

* compilation is not allowed (as the parser and compiler are not yet
  configured properly)
* creation of subinterpreters is not allowed
* creation of additional thread states is not allowed
* The following attributes in the ``sys`` module are all either missing or
  ``None``:

  * ``sys.path``
  * ``sys.argv``
  * ``sys.executable``
  * ``sys.base_exec_prefix``
  * ``sys.base_prefix``
  * ``sys.exec_prefix``
  * ``sys.prefix``
  * ``sys.warnoptions``
  * ``sys.flags``
  * ``sys.dont_write_bytecode``
  * ``sys.stdin``
  * ``sys.stdout``

* The filesystem encoding is not yet defined
* The IO encoding is not yet defined
* CPython signal handlers are not yet installed
* only builtin and frozen modules may be imported (due to above limitations)
* ``sys.stderr`` is set to a temporary IO object using unbuffered binary
  mode
* The ``warnings`` module is not yet initialized
* The ``__main__`` module does not yet exist

<TBD: identify any other notable missing functionality>

The main things made available by this step will be the core Python
datatypes, in particular dictionaries, lists and strings. This allows them
to be used safely for all of the remaining configuration steps (unlike the
status quo).

In addition, the current thread will possess a valid Python thread state,
allowing any further configuration data to be stored on the interpreter
object rather than in C process globals.

Any call to ``Py_BeginInitialization()`` must have a matching call to
``Py_Finalize()``. It is acceptable to skip calling
``Py_EndInitialization()`` in between (e.g. if attempting to read the
configuration settings fails).


Determining the remaining configuration settings
------------------------------------------------

The next step in the initialization sequence is to determine the full
settings needed to complete the process. No changes are made to the
interpreter state at this point. The core API for this step is::

    int Py_ReadConfiguration(Py_Config *config);

The config argument should be a pointer to a ``Py_Config`` struct (described
in the next section). For any supported configuration setting already set in
the struct, CPython will sanity check the supplied value, but otherwise
accept it as correct.

Unlike ``Py_Initialize`` and ``Py_BeginInitialization``, this call will raise
an exception and report an error return rather than exhibiting fatal errors
if a problem is found with the config data.

Any supported configuration setting which is not already set will be
populated appropriately. The default configuration can be overridden
entirely by setting the value *before* calling ``Py_ReadConfiguration``. The
provided value will then also be used in calculating any settings derived
from that value.

Alternatively, settings may be overridden *after* the
``Py_ReadConfiguration`` call (this can be useful if an embedding
application wants to adjust a setting rather than replace it completely,
such as removing ``sys.path[0]``).


Supported configuration settings
--------------------------------

The new ``Py_Config`` struct holds the settings required to complete the
interpreter configuration. All fields are either pointers to Python
data types (not set == ``NULL``) or numeric flags (not set == ``-1``)::

    /* Note: if changing anything in Py_Config, also update Py_Config_INIT */
    typedef struct {
        /* Argument processing */
        PyList *raw_argv;
        PyList *argv;
        PyList *warnoptions; /* -W switch, PYTHONWARNINGS */
        PyDict *xoptions;    /* -X switch */

        /* Filesystem locations */
        PyUnicode *program_name;
        PyUnicode *executable;
        PyUnicode *prefix;           /* PYTHONHOME */
        PyUnicode *exec_prefix;      /* PYTHONHOME */
        PyUnicode *base_prefix;      /* pyvenv.cfg */
        PyUnicode *base_exec_prefix; /* pyvenv.cfg */

        /* Site module */
        int no_site;       /* -S switch */
        int no_user_site;  /* -s switch, PYTHONNOUSERSITE */

        /* Import configuration */
        int dont_write_bytecode;  /* -B switch, PYTHONDONTWRITEBYTECODE */
        int ignore_module_case;   /* PYTHONCASEOK */
        PyList    *import_path;   /* PYTHONPATH (etc) */

        /* Standard streams */
        int use_unbuffered_io;      /* -u switch, PYTHONUNBUFFEREDIO */
        PyUnicode *stdin_encoding;  /* PYTHONIOENCODING */
        PyUnicode *stdin_errors;    /* PYTHONIOENCODING */
        PyUnicode *stdout_encoding; /* PYTHONIOENCODING */
        PyUnicode *stdout_errors;   /* PYTHONIOENCODING */
        PyUnicode *stderr_encoding; /* PYTHONIOENCODING */
        PyUnicode *stderr_errors;   /* PYTHONIOENCODING */

        /* Filesystem access */
        PyUnicode *fs_encoding;

        /* Interactive interpreter */
        int stdin_is_interactive; /* Force interactive behaviour */
        int inspect_main;         /* -i switch, PYTHONINSPECT */
        PyUnicode *startup_file;  /* PYTHONSTARTUP */

        /* Debugging output */
        int debug_parser;    /* -d switch, PYTHONDEBUG */
        int verbosity;       /* -v switch */
        int suppress_banner; /* -q switch */

        /* Code generation */
        int bytes_warnings;  /* -b switch */
        int optimize;        /* -O switch */

        /* Signal handling */
        int install_sig_handlers;
    } Py_Config;


    /* Struct initialization is pretty ugly in C89. Avoiding this mess would
     * be the most attractive aspect of using a PyDict* instead... */
    #define _Py_ArgConfig_INIT  NULL, NULL, NULL, NULL
    #define _Py_LocationConfig_INIT  NULL, NULL, NULL, NULL, NULL, NULL
    #define _Py_SiteConfig_INIT  -1, -1
    #define _Py_ImportConfig_INIT  -1, -1, NULL
    #define _Py_StreamConfig_INIT  -1, NULL, NULL, NULL, NULL, NULL, NULL
    #define _Py_FilesystemConfig_INIT  NULL
    #define _Py_InteractiveConfig_INIT  -1, -1, NULL
    #define _Py_DebuggingConfig_INIT  -1, -1, -1
    #define _Py_CodeGenConfig_INIT  -1, -1
    #define _Py_SignalConfig_INIT  -1

    #define Py_Config_INIT {_Py_ArgConfig_INIT, _Py_LocationConfig_INIT, \
                            _Py_SiteConfig_INIT, _Py_ImportConfig_INIT, \
                            _Py_StreamConfig_INIT, _Py_FilesystemConfig_INIT, \
                            _Py_InteractiveConfig_INIT, \
                            _Py_DebuggingConfig_INIT, _Py_CodeGenConfig_INIT, \
                            _Py_SignalConfig_INIT}

<TBD: did I miss anything?>


Completing the interpreter initialization
-----------------------------------------

The final step in the initialization process is to actually put the
configuration settings into effect and finish bootstrapping the interpreter
up to full operation::

    int Py_EndInitialization(const Py_Config *config);

Like ``Py_ReadConfiguration``, this call will raise an exception and report
an error return rather than exhibiting a fatal error if a problem is found
with the config data.

All configuration settings are required - the configuration struct
should always be passed through ``Py_ReadConfiguration()`` to ensure it
is fully populated.

After a successful call, ``Py_IsInitializing()`` will be false, while
``Py_IsInitialized()`` will become true. The caveats described above for the
interpreter during the initialization phase will no longer hold.

However, some metadata related to the ``__main__`` module may still be
incomplete:

* ``sys.argv[0]`` may not yet have its final value

  * it will be ``-m`` when executing a module or package with CPython
  * it will be the same as ``sys.path[0]`` rather than the location of
    the ``__main__`` module when executing a valid ``sys.path`` entry
    (typically a zipfile or directory)

* the metadata in the ``__main__`` module will still indicate it is a
  builtin module


Executing the main module
-------------------------

<TBD>

Initial thought is that hiding the various options behind a single API
would make that API too complicated, so 3 separate APIs is more likely::

    Py_RunPathAsMain
    Py_RunModuleAsMain
    Py_RunStreamAsMain

Query API to indicate that ``sys.argv[0]`` is fully populated::

    Py_IsRunningMain()

Internal Storage of Configuration Data
--------------------------------------

The interpreter state will be updated to include details of the configuration
settings supplied during initialization by extending the interpreter state
object with an embedded copy of the ``Py_CoreConfig`` and ``Py_Config``
structs.

For debugging purposes, the configuration settings will be exposed as
a ``sys._configuration`` simple namespace (similar to ``sys.flags`` and
``sys.implementation``). Field names will match those in the configuration
structs, except for ``hash_seed``, which will be deliberately excluded.

These are *snapshots* of the initial configuration settings. They are not
consulted by the interpreter during runtime.
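As a rough illustration of what such a snapshot might look like from Python
code, the following sketch builds a ``sys.flags``-style namespace from a
settings mapping. The helper name, field names, and the exclusion logic are
all illustrative assumptions here, not part of the proposed API:

```python
import types

def make_config_snapshot(settings):
    """Build a read-only style snapshot namespace from a settings dict.

    Hypothetical sketch of the proposed ``sys._configuration`` attribute;
    the helper and field names are illustrative only.
    """
    # Deliberately drop security-sensitive fields such as hash_seed
    filtered = {k: v for k, v in settings.items() if k != "hash_seed"}
    return types.SimpleNamespace(**filtered)

snapshot = make_config_snapshot({
    "no_site": 0,
    "optimize": 1,
    "hash_seed": 12345,  # excluded from the snapshot
})
```

Like ``sys.flags``, the result supports plain attribute access
(``snapshot.optimize``), while the excluded field is simply absent.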


Stable ABI
----------

All of the APIs proposed in this PEP are excluded from the stable ABI, as
embedding a Python interpreter involves a much higher degree of coupling
than merely writing an extension.


Backwards Compatibility
-----------------------

Backwards compatibility will be preserved primarily by ensuring that
Py_ReadConfiguration() interrogates all the previously defined configuration
settings stored in global variables and environment variables, and that
Py_EndInitialization() writes affected settings back to the relevant
locations.

One acknowledged incompatibility is that some environment variables which
are currently read lazily may instead be read once during interpreter
initialization. As the PEP matures, these will be discussed in more detail
on a case by case basis. The environment variables which are currently
known to be looked up dynamically are:

* ``PYTHONCASEOK``: writing to ``os.environ['PYTHONCASEOK']`` will no longer
  dynamically alter the interpreter's handling of filename case differences
  on import (TBC)
* ``PYTHONINSPECT``: ``os.environ['PYTHONINSPECT']`` will still be checked
  after execution of the ``__main__`` module terminates

The ``Py_Initialize()`` style of initialization will continue to be
supported. It will use (at least some elements of) the new API
internally, but will continue to exhibit the same behaviour as it
does today, ensuring that ``sys.argv`` is not populated until a subsequent
``PySys_SetArgv`` call. All APIs that currently support being called
prior to ``Py_Initialize()`` will
continue to do so, and will also support being called prior to
``Py_BeginInitialization()``.

To minimise unnecessary code churn, and to ensure the backwards compatibility
is well tested, the main CPython executable may continue to use some elements
of the old style initialization API. (very much TBC)


Open Questions
==============

* Is ``Py_IsRunningMain()`` worth keeping?
* Should the answers to ``Py_IsInitialized()`` and ``Py_IsRunningMain()`` be
  exposed via the ``sys`` module?
* Is the ``Py_Config`` struct too unwieldy to be practical? Would a Python
  dictionary be a better choice?
* Would it be better to manage the flag variables in ``Py_Config`` as
  Python integers so the struct can be initialized with a simple
  ``memset(&config, 0, sizeof(*config))``?


A System Python Executable
==========================

When executing system utilities with administrative access to a system, many
of the default behaviours of CPython are undesirable, as they may allow
untrusted code to execute with elevated privileges. The most problematic
aspects are the fact that user site directories are enabled, that
environment variables are trusted, and that the directory containing the
executed file is placed at the beginning of the import path.

Currently, providing a separate executable with different default behaviour
would be prohibitively hard to maintain. One of the goals of this PEP is to
make it possible to replace much of the hard to maintain bootstrapping code
with more normal CPython code, as well as making it easier for a separate
application to make use of key components of ``Py_Main``. Including this
change in the PEP is designed to help avoid acceptance of a design that
sounds good in theory but proves to be problematic in practice.

Cleanly supporting this kind of "alternate CLI" is the main reason for the
proposed changes to better expose the core logic for deciding between the
different execution modes supported by CPython:

* script execution
* directory/zipfile execution
* command execution ("-c" switch)
* module or package execution ("-m" switch)
* execution from stdin (non-interactive)
* interactive stdin
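The dispatch between these modes can be sketched in a few lines of Python.
This is a simplified illustration only (the function name and mode labels are
invented here, and real CPython startup handles many more switches and
interactions):

```python
import os

def pick_execution_mode(args, stdin_is_tty=False):
    """Classify a command line into one of the execution modes listed above.

    Hypothetical sketch; not CPython's actual argument processing.
    """
    for arg in args:
        if arg == "-c":
            return "command"
        if arg == "-m":
            return "module-or-package"
        if not arg.startswith("-"):
            # A sys.path entry (directory or zipfile) vs an ordinary script
            if os.path.isdir(arg) or arg.endswith(".zip"):
                return "directory-or-zipfile"
            return "script"
    return "interactive-stdin" if stdin_is_tty else "stdin"
```

For example, ``pick_execution_mode(["-m", "json.tool"])`` classifies as
module execution, while an empty command line falls back to stdin handling.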


Implementation
==============

None as yet. Once I have a reasonably solid plan of attack, I intend to work
on a reference implementation as a feature branch in my BitBucket sandbox [2]_.


References
==========

.. [1] CPython interpreter initialization notes
   (http://wiki.python.org/moin/CPythonInterpreterInitialization)

.. [2] BitBucket Sandbox
   (https://bitbucket.org/ncoghlan/cpython_sandbox)

.. [3] \*nix getpath implementation
   (http://hg.python.org/cpython/file/default/Modules/getpath.c)

.. [4] Windows getpath implementation
   (http://hg.python.org/cpython/file/default/PC/getpathp.c)

.. [5] Site module documentation
   (http://docs.python.org/3/library/site.html)

Copyright
===========
This document has been placed in the public domain.


-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From tjreedy at udel.edu  Thu Jan  3 19:32:17 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 03 Jan 2013 13:32:17 -0500
Subject: [Python-ideas] Identity dicts and sets
In-Reply-To: <kc3rb9$7cu$1@ger.gmane.org>
References: <kc1eld$buk$1@ger.gmane.org> <kc2dgh$a67$1@ger.gmane.org>
	<kc3rb9$7cu$1@ger.gmane.org>
Message-ID: <kc4is9$7g0$1@ger.gmane.org>

On 1/3/2013 6:51 AM, Serhiy Storchaka wrote:
> On 03.01.13 00:48, Terry Reedy wrote:

In my original, the following quote is preceded by the following snipped 
line.

"By default, equality check is identity check."

>> I don't know anything about pickling or __sizeof__, by if one uses
>> user-defined classes for nodes and edges, equality is identity,

In that context, I pretty obviously meant user-defined class with the 
default equality as identity. The contrapositive is "If equality is not 
identity, one is not using a user-defined class with default identity."

> If one uses a list, a dict, or user-defined class with defined __eq__,
> equality is not identity.

Yes, these are examples of 'not a user-defined class with default 
identity' in which equality is not identity. I thought it was clear that 
I know about such things.

> Yes, you can use an identity dict with mutable types!

Yes, and my point was that we effectively already have such things.

class Node(): pass

Instances of Node wrap a dict as .__dict__, but are compared by wrapper 
identity rather than dict value. A set of such things is effectively an 
identity set. In 3.3+, if the instances all have the same attributes (if 
the .__dicts__ all have the same keys), there is only one (not too 
sparse) hashed list of keys for all instances and one corresponding (not 
too sparse) list of values for each instance.

Also, which I did not say before: if one instead represents nodes by a 
unique integer or string or by a list that starts with such a unique 
identifier, then equality is again effectively identity and a set (or 
sequence) of such things is effectively an identity set. This 
corresponds to a standard database table where records have keys, so 
that the identity of records is not lost when reordered or removed from 
the table.
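The point that instances with default equality already behave like
identity-set members can be shown directly with a minimal sketch (the
``Node`` class here is just the empty class from the discussion above, given
a convenience constructor):

```python
class Node:
    """Default __eq__ and __hash__ compare by object identity."""
    def __init__(self, **attrs):
        self.__dict__.update(attrs)

a = Node(label="x")
b = Node(label="x")  # same attribute values, but a distinct object

# An ordinary set of such instances already acts as an identity set:
nodes = {a, b}
```

Even though ``a`` and ``b`` carry identical attribute values, they remain
distinct members of the set, exactly as an identity set would treat them.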

 >> so I don't see what would be gained.

You are proposing (yet-another) dict variation for use in *python* code. 
That requires more justification than a few percent speedup in 
specialized usages. It should make python programming substantially 
easier in multiple use cases. I do not yet see this in regard to graph 
algorithm.

-- 
Terry Jan Reedy



From bruce at leapyear.org  Thu Jan  3 19:55:02 2013
From: bruce at leapyear.org (Bruce Leban)
Date: Thu, 3 Jan 2013 10:55:02 -0800
Subject: [Python-ideas] Identity dicts and sets
In-Reply-To: <kc3rai$6u7$1@ger.gmane.org>
References: <201301022234.18839.storchaka@gmail.com>
	<CAD=7U2D7gsmEXCyYC-owHfnY34_GcPMUU4f_rOJcn-SBSxvarw@mail.gmail.com>
	<9A91AF30-F0B3-441E-996C-F502291C1F35@masklinn.net>
	<CAGu0AnvaWtCwcyf5MeoMn6G_jvJnoSTKZzR34XsyKkfsc6=Tow@mail.gmail.com>
	<kc3rai$6u7$1@ger.gmane.org>
Message-ID: <CAGu0AnuJ9ZnU9z5J+92C3FchngSE7w51r=hQF8i+5O1nhZtjWA@mail.gmail.com>

On Thu, Jan 3, 2013 at 3:50 AM, Serhiy Storchaka <storchaka at gmail.com>wrote:

> On 02.01.13 23:33, Bruce Leban wrote:
>
>> I agree collections is the place to put it but that would give us three
>> specialized subclasses of dictionary which cannot be combined. That is,
>> I can have a dictionary with a default, one that is ordered or one that
>> uses a key function but not any combination of those. It would seem
>> better to have something like Haoyi Li suggested:
>>
>> collections.Dictionary(**default=None, ordered=False, key=None) --> a
>> dict
>> subclass
>>
>
> I doubt if such combinations have a sense. At least not all features can
> be combined.


I agree that all feature combinations may not make sense. I think a default
ordered dict would be useful and if other dict variations are created,
combinations of them may be useful too. I don't know if identity dicts are
useful enough to add, but I think that if another dict variation is added,
using a factory should be considered.

I have specifically wanted a sorted default dict in the past. (A sorted
dict is like an ordered dict but the order is sorted by key not by
insertion order. It is simulated by iterating over sorted(dict.keys()). I
doubt that sorted dict is common enough to be worth adding, but if it were
it would be unfortunate to not have a default variation of it.)
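The "iterate over sorted keys" simulation combines with ``defaultdict`` in
only a few lines. This is a sketch (the class name is invented, and
``items()`` is simplified to return a list rather than a proper view):

```python
from collections import defaultdict

class SortedDefaultDict(defaultdict):
    """defaultdict whose iteration order is sorted by key (a sketch)."""

    def __iter__(self):
        # Sort the underlying keys on each iteration
        return iter(sorted(super().__iter__()))

    def items(self):
        # Simplified: a list in key order, not a dict view
        return [(k, self[k]) for k in self]

d = SortedDefaultDict(list)
d["banana"].append(2)
d["apple"].append(1)
```

Iteration yields ``apple`` before ``banana`` regardless of insertion order,
while missing keys still get the default factory value.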

--- Bruce
Check this out: http://kck.st/YeqGxQ

From storchaka at gmail.com  Thu Jan  3 21:43:29 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Thu, 03 Jan 2013 22:43:29 +0200
Subject: [Python-ideas] Identity dicts and sets
In-Reply-To: <kc4is9$7g0$1@ger.gmane.org>
References: <kc1eld$buk$1@ger.gmane.org> <kc2dgh$a67$1@ger.gmane.org>
	<kc3rb9$7cu$1@ger.gmane.org> <kc4is9$7g0$1@ger.gmane.org>
Message-ID: <kc4qhi$ft6$1@ger.gmane.org>

On 03.01.13 20:32, Terry Reedy wrote:
> Yes, and my point was that we effectively already have such things.
>
> class Node(): pass
>
> Instances of Node wrap a dict as .__dict__, but are compared by wrapper
> identity rather than dict value. A set of such things is effectively an
> identity set. In 3.3+, if the instances all have the same attributes (if
> the .__dicts__ all have the same keys), there is only one (not too
> sparse) hashed list of keys for all instances and one corresponding (not
> too sparse) list of values for each instance.
>
> Also, which I did not say before: if one instead represents nodes by a
> unique integer or string or by a list that starts with such a unique
> identifier, then equality is again effectively identity and a set (or
> sequence) of such things is effectively an identity set. This
> corresponds to a standard database table where records have keys, so
> that the identity of records is not lost when reordered or removed from
> the table.

You cannot always choose the node type. Sometimes the nodes already exist 
and you just have to work with them.

>  >> so I don't see what would be gained.
>
> You are proposing (yet-another) dict variation for use in *python* code.

In fact I am thinking first of all about C code. Using the identity 
dict/set idiom is currently rather cumbersome in C code. With a standard 
IdentityDict it would be as simple as using an ordinary dict.

> That requires more justification than a few percent speedup in
> specialized usages. It should make python programming substantially
> easier in multiple use cases. I do not yet see this in regard to graph
> algorithm.

The identity dict/set idiom is used in at least the following stdlib 
modules: _threading_local, xmlrpc.client, json, lib2to3 (xrange fixer), 
copy, unittest.mock, idlelib (rpc, remote debugger and browser), ctypes, 
doctest, pickle, cProfile. It may also be implicitly used in some other 
places, or could be used there.
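The idiom in question keys a plain dict by ``id()`` while keeping the object
itself as the value, so the object stays alive and its id cannot be reused.
A small sketch of the pattern (this mirrors the memo-dict style used by
modules like ``copy`` and ``pickle``, but is not their actual code):

```python
def deduplicate_by_identity(objects):
    """Collect unique objects by identity using the id()-keyed dict idiom.

    The memo maps id(obj) -> obj; storing obj as the value keeps it
    alive for the lifetime of the memo, which makes the id stable.
    Sketch only, not stdlib code.
    """
    memo = {}
    for obj in objects:
        memo.setdefault(id(obj), obj)
    return list(memo.values())

x = [1, 2]
y = [1, 2]  # equal to x, but a distinct object
unique = deduplicate_by_identity([x, y, x])
```

A value-based dict would collapse ``x`` and ``y`` (they compare equal); the
identity idiom keeps both, which is exactly what a standard IdentityDict
would provide directly.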



From guido at python.org  Fri Jan  4 23:33:44 2013
From: guido at python.org (Guido van Rossum)
Date: Fri, 4 Jan 2013 14:33:44 -0800
Subject: [Python-ideas] PEP 3156 - Asynchronous IO Support Rebooted
In-Reply-To: <CACEGMv8FiCMCDRLjfPWJPv2jdoJpe9i_3uOYDoNUcVN97MXw1A@mail.gmail.com>
References: <CACEGMv8FiCMCDRLjfPWJPv2jdoJpe9i_3uOYDoNUcVN97MXw1A@mail.gmail.com>
Message-ID: <CAP7+vJJf1PAJzu7DbO5MB4CRi6bFOuwKijBPjznE1ArhRMd9bg@mail.gmail.com>

[Markus sent this to me off-list, but agreed to me responding on-list,
quoting his entire message.]

On Wed, Dec 26, 2012 at 2:38 PM, Markus <nepenthesdev at gmail.com> wrote:
> Hi,

Hi Markus,

I don't believe we've met before, have we? It would probably help if
you introduced yourself and your experience, since our past
experiences color our judgment.

> as I've been waiting for this to happen, I decided to speak up.
> While I really look forward to this, I disagree with the PEP.

Heh, we can't all agree on everything. :-)

> First shoot should be getting a well established event loop into python.

Perhaps. What is your definition of an event loop?

> libev is great, it takes care of operating system specialities, and
> only does a single job, providing an event loop.

It is also written for C, and I presume much of its API design was
influenced by the conventions and affordances of that language.

> This event loop can take care of timers, sockets and signals,

But sockets are not native on Windows, and I am making some effort
with PEP 3156 to efficiently support higher-level abstractions without
tying them to sockets. (The plan is to support IOCP on Windows. The
previous version of Tulip already had a branch that did support that,
as a demonstration of the power of this abstraction.)

> pyev, a
> great python wrapper for libev already provides this simple eventing
> facility in python.

But, being a libev wrapper, it is likely also strongly influenced by C.

> In case you embed python in a c program, the libev default loop of the
> python code and c code can even be shared, providing a great amount of
> flexibility.

Only if the C code also uses libev, of course. But C programs may use
other event mechanisms -- e.g. AFAIK there are alternatives to libev
(during the early stages of Tulip development I chatted a bit with one
of the original authors of libevent, Niels Provos, and I believe
there's also something called libuv), and GUI frameworks (e.g. X, Qt,
Gtk, Wx) tend to have their own event loop.

PEP 3156 is designed to let alternative *implementations* of the same
*interface* be selected at run time. Hopefully it is possible to
provide a conforming implementation using libev -- then your goal
(smooth interoperability with C code using libev) is obtained.

It's possible that in order to do that the PEP 3156 interface may have
to be refactored into separate pieces. The Tulip implementation
already has separate "pollster" implementations (which concern
themselves *only* with polling for I/O using select, poll, or other
alternatives). It probably makes sense to factor the part that
implements transports out as well. However, the whole point of
including transports and protocols (and futures) in the PEP is that
some platforms may want to implement the same high-level API (e.g.
create a transport that connects to a certain host/port) using a
different approach altogether, e.g. on Windows the transport might not
even use sockets. OTOH on UNIX it may be possible to add file
descriptors representing pipes and pseudo-ttys.

> libev is great as it is small - it provides exactly what's required,
> and nothing beyond.

Depending on your requirements. :-)

> getaddrinfo/getnameinfo/create_transport are out of scope from a event
> loop point of view.
> This functionality already exists in python, it just does not use a
> event loop and is blocking, as every other io related api.

It wasn't random to add these. The "event loop" in PEP 3156 provides
abstractions that leave the platform free to implement connections
using the appropriate native constructs without letting those
constructs "leak" into the application -- after all, whether you're on
UNIX or on Windows, a TCP connection represents the same abstraction,
but the network stack may have a very different interface.

> I'd propose not to replicate the functionality in the event loop
> namespace, but to extend the existing implementations - by allowing to
> provide an event loop/callback/ctx as optional args which get used.

That's an interface choice that I would regret (I really don't like
writing code using callbacks).

(It would also be harder to implement initially as a 3rd party
framework. At the lowest level, no changes to Python itself are needed
-- it already supports non-blocking sockets, for example. But adding
optional callbacks to existing low-level APIs would require changes
throughout the stdlib.)

> If you specify something like pyev as PEP, you can still come up with
> another PEP which defines the semantics for upper layer protocols like
> udp/tcp on IPv4/6, which can be used to take care of dns and
> 'echo-server-connections'.

I could split up the PEP, but that wouldn't really change anything,
since to me it is still a package deal. I am willing to put an effort
into specifying a low-level event loop because I know that I can still
write high-level code which is (mostly) free of callbacks, using
futures, tasks and the yield-from construct. And in order to do that I
need a minimum set of high-level abstractions such as getaddrinfo()
and transport creation (the exact names of the transport creation
methods are still under debate, as are the details of their
signatures, but the need for them is established without a doubt in my
mind).

I note that the stdlib socket module has roughly the same set of
abstractions bundled together:

- socket objects
- getaddrinfo(), getnameinfo()
- create_connection()
- the makefile() methods on socket objects, which create buffered streams

PEP 3156 offers alternatives for all of these, using higher-level
abstractions that have been developed and proven in practice by
Twisted, *and* offers a path to interop to frameworks that previously
couldn't very well interoperate -- Twisted, Tornado, and others have
traditionally been pretty segregated, but with PEP 3156 they can
interoperate both through the event loop and through Futures (which
are friendly both to a callback style and to yield-from).
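The "friendly to both styles" property can be illustrated with a toy future
that supports callbacks via ``add_done_callback`` and suspension via
``yield from``. Everything here (``MiniFuture``, the trivial ``run`` driver)
is a hypothetical sketch, far simpler than PEP 3156's real classes:

```python
class MiniFuture:
    """Toy future usable via callbacks *and* ``yield from`` (a sketch)."""

    def __init__(self):
        self._done = False
        self._result = None
        self._callbacks = []

    def add_done_callback(self, fn):
        if self._done:
            fn(self)
        else:
            self._callbacks.append(fn)

    def set_result(self, value):
        self._done = True
        self._result = value
        for fn in self._callbacks:
            fn(self)

    def result(self):
        return self._result

    def __iter__(self):
        if not self._done:
            yield self          # suspend until someone resolves us
        return self._result     # becomes the value of ``yield from``

def run(gen, io_result):
    """Trivial driver: resolve the first yielded future, then finish."""
    fut = next(gen)             # step to the first suspension point
    fut.set_result(io_result)   # pretend the I/O completed
    try:
        gen.send(None)
    except StopIteration as exc:
        return exc.value

results = []

def task():
    fut = MiniFuture()
    fut.add_done_callback(lambda f: results.append(f.result()))
    value = yield from fut      # callback style and yield-from coexist
    return value + 1

outcome = run(task(), 41)
```

The callback fires when the future resolves, and the same resolution value
flows back into the coroutine through ``yield from``.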

> Anyway, I really hope you'll have a look on libev and pyev, both is
> great and well tested software and may give you an idea what people
> who dedicate themselves to event loops came up with already in terms
> of names, subclassing, requirements, guarantees and workarounds for
> platform specific failures (kqueue, epoll ...).

I will certainly have a look! I am not so concerned about naming (it
seems inevitable that everyone uses somewhat different terminology
anyway, and it is probably better not to reuse terms when the meaning
is different), but I do like to look at guarantees (or the absence
thereof!) and best practices for dealing with the differences between
platforms.

> http://pod.tst.eu/http://cvs.schmorp.de/libev/ev.pod
> http://code.google.com/p/pyev/
>
> All together, I'd limit the scope of the PEP to the API of the event
> loop, just focussing on io/timers/signals and propose to extend
> existing API to  be usable with an event loop, instead of replicating
> it.

You haven't convinced me about this. However, you can help me by
comparing the event loop part of PEP 3156 (ignoring anything that
returns or takes a Future) to libev and pointing out things (either
specific APIs or certain guarantees or requirements) that would be
hard to implement using libev, as well as useful features in libev
that you think every event loop should have.

> For naming I'd prefer 'watcher' over 'Handler'.

Hm, 'watcher' to me sounds more active than the behavior I have in
mind for this class. It is just a reification of a specific function
and some arguments to pass to it, with the ability to cancel the call
altogether.

Thanks for writing!

-- 
--Guido van Rossum (python.org/~guido)


From djmitche at gmail.com  Fri Jan  4 23:50:49 2013
From: djmitche at gmail.com (Dustin J. Mitchell)
Date: Fri, 4 Jan 2013 17:50:49 -0500
Subject: [Python-ideas] PEP 3156 - Asynchronous IO Support Rebooted
In-Reply-To: <mailman.404.1357339728.2938.python-ideas@python.org>
References: <mailman.404.1357339728.2938.python-ideas@python.org>
Message-ID: <CAJtE5vT_N5g-cVuzo2JvjRAqE5xRcnn2-VDOdoBCqWdpG-EBbg@mail.gmail.com>

As the maintainer of a pretty large, complex app written in Twisted, I
think this is great.  I look forward to a future of being able to
select from a broad library of async tools, and being able to write
tools that can be used outside of Twisted.

Buildbot began, lo these many years ago, doing a lot of things in
memory on on local disk, neither of which require asynchronous IO.  So
a lot of API methods did not originally return Deferreds.  Those
methods are then used by other methods, many of which also do not
return Deferreds.  Now, we want to use a database backend, and
parallelize some of the operations, meaning that the methods need to
return a Deferred.  Unfortunately, that requires a complete tree
traversal of all of the methods and methods that call them, rewriting
them to take and return Deferreds.  There's no "halfway" solution.
This is a little easier with generators (@inlineCallbacks), since the
syntax doesn't change much, but it's a significant change to the API
(in fact, this is a large part of the reason for the big rewrite for
Buildbot-0.9.x).

I bring all this up to say, this PEP will introduce a new "kind" of
method signature into standard Python, one which the caller must know,
and the use of which changes the signature of the caller.  That can
cause sweeping changes, and debugging those changes can be tricky.
Two things can help:

First, `yield from somemeth()` should work fine even if `somemeth` is
not a coroutine function, and authors of async tools should be
encouraged to use this form to assist future-compatibility.  Second,
`somemeth()` without a yield should fail loudly if `somemeth` is a
coroutine function.  Otherwise, the effects can be pretty confusing.

In http://code.google.com/p/uthreads, I accomplished the latter by
taking advantage of garbage collection: if the generator is garbage
collected before it's begun, then it's probably not been yielded.
This is a bit gross, but good enough as a debugging technique.

On the topic of debugging, I also took pains to make sure that
tracebacks looked reasonable, filtering out scheduler code[1].  I
haven't looked closely at Tulip to see if that's a problem.  Most of
the "noise" in the tracebacks came from the lack of 'yield from', so
it may not be an issue at all.

Dustin

P.S. Apologies for the bad threading - I wasn't on the list when this
was last posted.

[1] http://code.google.com/p/uthreads/source/browse/trunk/uthreads/core.py#253


From guido at python.org  Fri Jan  4 23:59:40 2013
From: guido at python.org (Guido van Rossum)
Date: Fri, 4 Jan 2013 14:59:40 -0800
Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support
	Rebooted
In-Reply-To: <a6f61bd1-e62f-4415-a63d-634aa51c0094@googlegroups.com>
References: <CAP7+vJLrbi0jJkQe6f+MLWv2WatO4FmGJWs28TrkfcpXfSE4vQ@mail.gmail.com>
	<FADD4950E0EA483BA34DEE1C1BCFBB14@gmail.com>
	<a6f61bd1-e62f-4415-a63d-634aa51c0094@googlegroups.com>
Message-ID: <CAP7+vJJyxmeCcLxMPxzp_yy8yONJadOJfAkx3GYCy+oHEzidNw@mail.gmail.com>

On Fri, Jan 4, 2013 at 2:38 PM, Dustin Mitchell <djmitche at gmail.com> wrote:
> As the maintainer of a pretty large, complex app written in Twisted, I think
> this is great.  I look forward to a future of being able to select from a
> broad library of async tools, and being able to write tools that can be used
> outside of Twisted.

Thanks. Me too. :-)

> Buildbot began, lo these many years ago, doing a lot of things in memory on
> on local disk, neither of which require asynchronous IO.  So a lot of API
> methods did not originally return Deferreds.  Those methods are then used by
> other methods, many of which also do not return Deferreds.  Now, we want to
> use a database backend, and parallelize some of the operations, meaning that
> the methods need to return a Deferred.  Unfortunately, that requires a
> complete tree traversal of all of the methods and methods that call them,
> rewriting them to take and return Deferreds.  There's no "halfway" solution.
> This is a little easier with generators (@inlineCallbacks), since the syntax
> doesn't change much, but it's a significant change to the API (in fact, this
> is a large part of the reason for the big rewrite for Buildbot-0.9.x).
>
> I bring all this up to say, this PEP will introduce a new "kind" of method
> signature into standard Python, one which the caller must know, and the use
> of which changes the signature of the caller.  That can cause sweeping
> changes, and debugging those changes can be tricky.

Yes, and this is the biggest unproven point of the PEP. (The rest is
all backed by a decade or more of experience.)

> Two things can help:
>
> First, `yield from somemeth()` should work fine even if `somemeth` is not a
> coroutine function, and authors of async tools should be encouraged to use
> this form to assist future-compatibility.  Second, `somemeth()` without a
> yield should fail loudly if `somemeth` is a coroutine function.  Otherwise,
> the effects can be pretty confusing.

That would be nice. But the way yield from and generators work, that's
hard to accomplish without further changes to the language -- and I
don't want to have to change the language again (at least not
immediately -- maybe in a few releases, after we've learned what the
real issues are). The best I can do for the first requirement is to
define @coroutine in a way that if the decorated function isn't a
generator, it is wrapped in one. For the second requirement, if you
call somemeth() and ignore the result, nothing happens at all -- this
is indeed infuriating but I see no way to change this.(*) If you use
the result, well, Futures have different attributes than most other
objects so hopefully you'll get a loud AttributeError or TypeError
soon, but of course if you pass it into something else which uses it,
it may still be difficult to track. Hopefully these error messages
provide a hint:

>>> f.foo
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Future' object has no attribute 'foo'
>>> f()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'Future' object is not callable
>>>

(*) There's a heavy gun we might use, but I would make this optional,
as a heavy duty debugging mode only. @coroutine could wrap generators
in a lightweight object with a __del__ method and an __iter__ method.
If __del__ is called before __iter__ is ever called, it could raise an
exception or log a warning. But this probably adds too much overhead
to have it always enabled.
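That "heavy gun" can be sketched in a few lines: wrap each generator in an
object whose ``__del__`` complains if ``__iter__`` was never called. The
names below are invented for illustration; this is not Tulip's code, and it
relies on CPython's prompt refcount-based finalization:

```python
import warnings

class CoroWrapper:
    """Warn if a wrapped generator is discarded without being iterated.

    Hypothetical sketch of the debug-mode wrapper described above.
    """

    def __init__(self, gen):
        self._gen = gen
        self._iterated = False

    def __iter__(self):
        self._iterated = True
        return iter(self._gen)

    def __del__(self):
        # Fires when the wrapper is garbage collected
        if not self._iterated:
            warnings.warn("coroutine %r was never yielded from" % self._gen,
                          RuntimeWarning)

def coroutine(func):
    """Decorator wrapping each generator in the debug wrapper."""
    def wrapper(*args, **kwargs):
        return CoroWrapper(func(*args, **kwargs))
    return wrapper

@coroutine
def do_io():
    yield "fake future"
```

Calling ``do_io()`` and dropping the result without iterating triggers the
``RuntimeWarning``, while a properly driven coroutine stays silent, which is
also why this is best kept as an optional debugging mode.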

> In http://code.google.com/p/uthreads, I accomplished the latter by taking
> advantage of garbage collection: if the generator is garbage collected
> before it's begun, then it's probably not been yielded.  This is a bit
> gross, but good enough as a debugging technique.

Eh, yeah, what I said. :-)

> On the topic of debugging, I also took pains to make sure that tracebacks
> looked reasonable, filtering out scheduler code[1].  I haven't looked
> closely at Tulip to see if that's a problem.  Most of the "noise" in the
> tracebacks came from the lack of 'yield from', so it may not be an issue at
> all.

One of the great advantages of using yield from is that the tracebacks
automatically look nice.

> Dustin
>
> [1]
> http://code.google.com/p/uthreads/source/browse/trunk/uthreads/core.py#253



-- 
--Guido van Rossum (python.org/~guido)


From josh at bartletts.id.au  Sat Jan  5 09:52:11 2013
From: josh at bartletts.id.au (Joshua Bartlett)
Date: Sat, 5 Jan 2013 18:52:11 +1000
Subject: [Python-ideas] Yielding through context managers
In-Reply-To: <CABZk45x13SV4tt=dP0cf79UaM7+D7q2EYPyXpuYGVW1G98kMxQ@mail.gmail.com>
References: <CABZk45x13SV4tt=dP0cf79UaM7+D7q2EYPyXpuYGVW1G98kMxQ@mail.gmail.com>
Message-ID: <CABZk45xur6q-=V681QFXuWNunYqPun10+WJm5QX8QzruCur7oA@mail.gmail.com>

I've just read through PEP 3156 and I thought I'd resurrect this thread
from March. Giving context managers the ability to react to yield and send,
and especially to yield from, would allow the eventual introduction of
asynchronous locks using PEP 3156 futures. This is one of the open issues
listed in the PEP.

Cheers,

J. D. Bartlett.


On 30 March 2012 10:00, Joshua Bartlett <josh at bartletts.id.au> wrote:

> I'd like to propose adding the ability for context managers to catch and
> handle control passing into and out of them via yield and generator.send()
> / generator.next().
>
> For instance,
>
> class cd(object):
>     def __init__(self, path):
>         self.inner_path = path
>
>     def __enter__(self):
>         self.outer_path = os.getcwd()
>         os.chdir(self.inner_path)
>
>     def __exit__(self, exc_type, exc_val, exc_tb):
>         os.chdir(self.outer_path)
>
>     def __yield__(self):
>         self.inner_path = os.getcwd()
>         os.chdir(self.outer_path)
>
>     def __send__(self):
>         self.outer_path = os.getcwd()
>         os.chdir(self.inner_path)
>
> Here __yield__() would be called when control is yielded through the with
> block and __send__() would be called when control is returned via .send()
> or .next(). To maintain compatibility, it would not be an error to leave
> either __yield__ or __send__ undefined.
>
> The rationale for this is that it's sometimes useful for a context manager
> to set global or thread-global state as in the example above, but when the
> code is used in a generator, the author of the generator needs to make
> assumptions about what the calling code is doing. e.g.
>
> def my_generator(path):
>     with cd(path):
>         yield do_something()
>         do_something_else()
>
> Even if the author of this generator knows what effect do_something() and
> do_something_else() have on the current working directory, the author needs
> to assume that the caller of the generator isn't touching the working
> directory. For instance, if someone were to create two my_generator()
> generators with different paths and advance them alternately, the resulting
> behaviour could be most unexpected. With the proposed change, the context
> manager would be able to handle this so that the author of the generator
> doesn't need to make these assumptions.
>
> Naturally, nested with blocks would be handled by calling __yield__ from
> innermost to outermost and __send__ from outermost to innermost.
>
> I rather suspect that if this change were included, someone could come up
> with a variant of the contextlib.contextmanager decorator to simplify
> writing generators for this sort of situation.
>
> Cheers,
>
> J. D. Bartlett
>
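To make the hazard concrete, here is a self-contained demonstration under today's semantics (temporary directories stand in for real paths; the generator bodies just report the cwd):

```python
import os
import tempfile

class cd:
    """The plain context manager from above: __enter__/__exit__ only --
    no __yield__/__send__ hooks exist today."""
    def __init__(self, path):
        self.inner_path = path
    def __enter__(self):
        self.outer_path = os.getcwd()
        os.chdir(self.inner_path)
    def __exit__(self, exc_type, exc_val, exc_tb):
        os.chdir(self.outer_path)

def my_generator(path):
    with cd(path):
        yield os.getcwd()   # stand-in for do_something()
        yield os.getcwd()   # stand-in for do_something_else()

original = os.getcwd()
path_a, path_b = tempfile.mkdtemp(), tempfile.mkdtemp()
gen_a, gen_b = my_generator(path_a), my_generator(path_b)

first_a = next(gen_a)    # cwd is now path_a
first_b = next(gen_b)    # cwd is now path_b
second_a = next(gen_a)   # still path_b -- nothing switched back!

gen_a.close()
gen_b.close()
os.chdir(original)
```

Generator A's second step runs in generator B's directory, which is the "most unexpected" behaviour the proposal is about.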

From guido at python.org  Sat Jan  5 20:23:51 2013
From: guido at python.org (Guido van Rossum)
Date: Sat, 5 Jan 2013 11:23:51 -0800
Subject: [Python-ideas] Yielding through context managers
In-Reply-To: <CABZk45xur6q-=V681QFXuWNunYqPun10+WJm5QX8QzruCur7oA@mail.gmail.com>
References: <CABZk45x13SV4tt=dP0cf79UaM7+D7q2EYPyXpuYGVW1G98kMxQ@mail.gmail.com>
	<CABZk45xur6q-=V681QFXuWNunYqPun10+WJm5QX8QzruCur7oA@mail.gmail.com>
Message-ID: <CAP7+vJJXBmpqF+Su2=WFy5HjtRwPfegz4S=ZG7fqDoXuZM7Wpw@mail.gmail.com>

Possibly (though it will have to be a separate PEP -- PEP 3156 needs
to be able to run on unchanged Python 3.3). Does anyone on this thread
have enough understanding of the implementation of context managers
and generators to be able to figure out how this could be specified
and implemented (or to explain why it is a bad idea, or impossible)?

--Guido

On Sat, Jan 5, 2013 at 12:52 AM, Joshua Bartlett <josh at bartletts.id.au> wrote:
> I've just read through PEP 3156 and I thought I'd resurrect this thread from
> March. Giving context managers the ability to react to yield and send, and
> especially to yield from, would allow the eventual introduction of
> asynchronous locks using PEP 3156 futures. This is one of the open issues
> listed in the PEP.
>
> Cheers,
>
> J. D. Bartlett.
>
>
> On 30 March 2012 10:00, Joshua Bartlett <josh at bartletts.id.au> wrote:
>>
>> I'd like to propose adding the ability for context managers to catch and
>> handle control passing into and out of them via yield and generator.send() /
>> generator.next().
>>
>> For instance,
>>
>> class cd(object):
>>     def __init__(self, path):
>>         self.inner_path = path
>>
>>     def __enter__(self):
>>         self.outer_path = os.getcwd()
>>         os.chdir(self.inner_path)
>>
>>     def __exit__(self, exc_type, exc_val, exc_tb):
>>         os.chdir(self.outer_path)
>>
>>     def __yield__(self):
>>         self.inner_path = os.getcwd()
>>         os.chdir(self.outer_path)
>>
>>     def __send__(self):
>>         self.outer_path = os.getcwd()
>>         os.chdir(self.inner_path)
>>
>> Here __yield__() would be called when control is yielded through the with
>> block and __send__() would be called when control is returned via .send() or
>> .next(). To maintain compatibility, it would not be an error to leave either
>> __yield__ or __send__ undefined.
>>
>> The rationale for this is that it's sometimes useful for a context manager
>> to set global or thread-global state as in the example above, but when the
>> code is used in a generator, the author of the generator needs to make
>> assumptions about what the calling code is doing. e.g.
>>
>> def my_generator(path):
>>     with cd(path):
>>         yield do_something()
>>         do_something_else()
>>
>> Even if the author of this generator knows what effect do_something() and
>> do_something_else() have on the current working directory, the author needs
>> to assume that the caller of the generator isn't touching the working
>> directory. For instance, if someone were to create two my_generator()
>> generators with different paths and advance them alternately, the resulting
>> behaviour could be most unexpected. With the proposed change, the context
>> manager would be able to handle this so that the author of the generator
>> doesn't need to make these assumptions.
>>
>> Naturally, nested with blocks would be handled by calling __yield__ from
>> innermost to outermost and __send__ from outermost to innermost.
>>
>> I rather suspect that if this change were included, someone could come up
>> with a variant of the contextlib.contextmanager decorator to simplify
>> writing generators for this sort of situation.
>>
>> Cheers,
>>
>> J. D. Bartlett
>
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>



-- 
--Guido van Rossum (python.org/~guido)


From barry at python.org  Sat Jan  5 22:42:20 2013
From: barry at python.org (Barry Warsaw)
Date: Sat, 5 Jan 2013 16:42:20 -0500
Subject: [Python-ideas] Updated PEP 432: Simplifying the CPython startup
	sequence
References: <CADiSq7dmibf7TKo2KYRTReZB68jrHQnnNhiWTXJAZdCHBgZdqA@mail.gmail.com>
Message-ID: <20130105164220.09d654be@anarchist.wooz.org>

Hi Nick,

PEP 432 is looking very nice.  It'll be fun to watch the implementation come
together. :)

Some comments...

The startup sequence:

> * Pre-Initialization - no interpreter available
> * Initialization - interpreter partially available

What about "Initializing"?

> * Initialized - full interpreter available, __main__ related metadata
>   incomplete
> * Main Execution - optional state, __main__ related metadata populated,
>   bytecode executing in the __main__ module namespace 

What is "optional" about this state?  Maybe it should be called "Operational"?

> ... separate system Python (spython) executable ...

I love the idea, but I'm not crazy about the name.  What about
`python-minimal` (yes, it's deliberately longer.  Symlinks ftw. :)

> <TBD: Did I miss anything?>

What about sys.implementation?

> as it failed to be updated for the virtual environment support added in
> Python 3.3 (detailed in PEP 420).

venv is defined in PEP 405 (there are two cases of mis-referencing).

Note that there may be other important build time settings on some platforms.
An example is Debian/Ubuntu, where we define the multiarch triplet in the
configure script, and pass that through Makefile(.pre.in) to sysmodule.c for
exposure as sys.implementation._multiarch.

> For a command executed with -c, it will be the string "-c"
> For explicitly requested input from stdin, it will be the string "-"

Wow, I couldn't believe it but it's true!  That seems crazy useless. :)

> Embedding applications must call Py_SetArgv themselves. The CPython logic
> for doing so is part of Py_Main() and is not exposed separately. However,
> the runpy module does provide roughly equivalent logic in runpy.run_module
> and runpy.run_path.

As I've mentioned before on the python-porting mailing list, this is actually
more difficult than it seems because main() takes char*s but Py_SetArgv() and
Py_SetProgramName() take wchar_t*s.

Maybe Python's own conversion could be refactored to make this easier either
as part of this PEP or after the PEP is implemented.

> int Py_ReadConfiguration(PyConfig *config);

> The config argument should be a pointer to a Python dictionary. For any
> supported configuration setting already in the dictionary, CPython will
> sanity check the supplied value, but otherwise accept it as correct.

So why not define this to take a PyObject* or a PyDictObject* ?

(also: the Py_Config struct members need the correct concrete type pointers,
e.g. PyDictObject*)

> Alternatively, settings may be overridden after the Py_ReadConfiguration
> call (this can be useful if an embedding application wants to adjust a
> setting rather than replace it completely, such as removing sys.path[0]).

How will setting something after Py_ReadConfiguration() is called change a
value such as sys.path?  Or is this the reason why you pass a Py_Config to
Py_EndInitialization()?

(also, see the type typo <wink> in the definition of Py_EndInitialization())

Also, I suggest taking the opportunity to change the sense of flags such as
no_site and dont_write_bytecode.  I find it much more difficult to reason that
"dont_write_bytecode = 0" means *do* write bytecode, rather than
"write_bytecode = 1".  I.e. positives are better than double-negatives.

> sys.argv[0] may not yet have its final value
> it will be -m when executing a module or package with CPython

Gosh, wouldn't it be nice if this could have a more useful value?

> Initial thought is that hiding the various options behind a single API would
> make that API too complicated, so 3 separate APIs is more likely:

+1

> The interpreter state will be updated to include details of the
> configuration settings supplied during initialization by extending the
> interpreter state object with an embedded copy of the Py_CoreConfig and
> Py_Config structs.

Couldn't it just have a dict with all the values from both structs collapsed
into it?

> For debugging purposes, the configuration settings will be exposed as a
> sys._configuration simple namespace

I suggest un-underscoring the name and making it public.  It might be useful
for other than debugging purposes.

> Is Py_IsRunningMain() worth keeping?

Perhaps.  Does it provide any additional information above Py_IsInitialized()?

> Should the answers to Py_IsInitialized() and Py_RunningMain() be exposed via
> the sys module?

I can't think of a use case.

> Is the Py_Config struct too unwieldy to be practical? Would a Python
> dictionary be a better choice?

Although I see why you've spec'd it this way, I don't like having *two* config
structures (Py_CoreConfig and Py_Config).  Having a dictionary for the latter
would probably be fine, and in fact you could copy the Py_Config values into
it (when possible during the init sequence) and expose it in the sys module.

> Would it be better to manage the flag variables in Py_Config as Python
> integers so the struct can be initialized with a simple memset(&config, 0,
> sizeof(*config))?

Would we even notice the optimization?

> A System Python Executable

This should probably at least mention Christian's idea of the -I flag (which I
think hasn't been PEP'd yet).  We can bikeshed about the name of the
executable later. :)

Cheers,
-Barry

From tjreedy at udel.edu  Sat Jan  5 23:54:52 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 05 Jan 2013 17:54:52 -0500
Subject: [Python-ideas] Updated PEP 432: Simplifying the CPython startup
	sequence
In-Reply-To: <20130105164220.09d654be@anarchist.wooz.org>
References: <CADiSq7dmibf7TKo2KYRTReZB68jrHQnnNhiWTXJAZdCHBgZdqA@mail.gmail.com>
	<20130105164220.09d654be@anarchist.wooz.org>
Message-ID: <kcab0n$91d$1@ger.gmane.org>

On 1/5/2013 4:42 PM, Barry Warsaw wrote:

> Also, I suggest taking the opportunity to change the sense of flags such as
> no_site and dont_write_bytecode.  I find it much more difficult to reason that
> "dont_write_bytecode = 0" means *do* write bytecode, rather than
> "write_bytecode = 1".  I.e. positives are better than double-negatives.

IE, you prefer positive flags, with some on by default, over having all 
flags indicate a non-default condition. I would too, but I don't hack on 
the C code base. 'dont_write_bytecode' is especially ugly.

In any case, this seems orthogonal to Nick's PEP and should be a 
separate discussion (on pydev), tracker issue, and patch. Is the current 
tradition just happenstance or something that some of the major C 
developers strongly care about?

-- 
Terry Jan Reedy



From guido at python.org  Sun Jan  6 00:30:41 2013
From: guido at python.org (Guido van Rossum)
Date: Sat, 5 Jan 2013 15:30:41 -0800
Subject: [Python-ideas] PEP 3156 - Asynchronous IO Support Rebooted
In-Reply-To: <CACEGMv_gxUQ2E9pU3_UJp_QkH1VEzw9+tTgv5OCNANK0Qircng@mail.gmail.com>
References: <CACEGMv8FiCMCDRLjfPWJPv2jdoJpe9i_3uOYDoNUcVN97MXw1A@mail.gmail.com>
	<CAP7+vJJf1PAJzu7DbO5MB4CRi6bFOuwKijBPjznE1ArhRMd9bg@mail.gmail.com>
	<CACEGMv_gxUQ2E9pU3_UJp_QkH1VEzw9+tTgv5OCNANK0Qircng@mail.gmail.com>
Message-ID: <CAP7+vJJgrYHUmbo27dgEtReboESTbyRKTPVU+TdDcucT+_AQVw@mail.gmail.com>

On Fri, Jan 4, 2013 at 6:53 PM, Markus <nepenthesdev at gmail.com> wrote:
> On Fri, Jan 4, 2013 at 11:33 PM, Guido van Rossum <guido at python.org> wrote:
>>On Wed, Dec 26, 2012 at 2:38 PM, Markus <nepenthesdev at gmail.com> wrote:
>>> First shoot should be getting a well established event loop into python.
>>
>> Perhaps. What is your definition of an event loop?
>
> I ask the loop to notify me via callback if something I care about happens.

Heh. That's rather too general -- it depends on "something I care
about" which could be impossible to guess. :-)

> Usually that's fds and read/writeability.

Ok, although on some platforms it can't be a fd (UNIX-style small
integer) but some other abstraction, e.g. a socket *object* in Jython
or a "handle" on Windows (but I am already starting to repeat myself
:-).

> I create a data structure which has the fd, the event I care about,
> the callback and userdata, pass it to the loop, and the loop will take
> care.
>
> Next, timers, same story,
> I create a data structure which has the time I care about, the
> callback and userdata, pass it to the loop, and the loop will take
> care.

The "create data structure" part is a specific choice of interface
style, not necessarily the best for Python. Most event loop
implementations I've seen for Python (pyev excluded) just have various
methods that express everything through the argument list, not with a
separate data structure.

> Signals - sometimes having signals in the event loop is handy too.
> Same story.

Agreed, I've added this to the open issues section in the PEP.

Do you have a suggestion for a minimal interface for signal handling?
I could imagine the following:

- add_signal_handler(sig, callback, *args).  Whenever signal 'sig' is
received, arrange for callback(*args) to be called. Returns a Handler
which can be used to cancel the signal callback. Specifying another
callback for the same signal replaces the previous handler (only one
handler can be active per signal).

- remove_signal_handler(sig).  Removes the handler for signal 'sig',
if one is set.

Is anything else needed?

Note that Python only receives signals in the main thread, and the
effect may be undefined if the event loop is not running in the main
thread, or if more than one event loop sets a handler for the same
signal. It also can't work for signals directed to a specific thread
(I think POSIX defines a few of these, but I don't know of any support
for these in Python.)

>> But sockets are not native on Windows, and I am making some effort
>> with PEP 3156 to efficiently support higher-level abstractions without
>> tying them to sockets. (The plan is to support IOCP on Windows. The
>> previous version of Tulip already had a branch that did support that,
>> as a demonstration of the power of this abstraction.)
>
> Supporting IOCP on windows is absolutely required, as WSAPoll is
> broken and won't be fixed.
> http://social.msdn.microsoft.com/Forums/hu/wsk/thread/18769abd-fca0-4d3c-9884-1a38ce27ae90

Wow. Now I'm even more glad that we're planning to support IOCP.

>> Only if the C code also uses libev, of course. But C programs may use
>> other event mechanisms -- e.g. AFAIK there are alternatives to libev
>> (during the early stages of Tulip development I chatted a bit with one
>> of the original authors of libevent, Niels Provos, and I believe
>> there's also something called libuv), and GUI frameworks (e.g. X, Qt,
>> Gtk, Wx) tend to have their own event loop.
>
> libuv is a wrapper around libev -adding IOCP- which adds some other
> things besides an event loop and is developed for/used in node.js.

Ah, that's helpful. I did not realize this after briefly skimming the
libuv page. (And the GitHub logs suggest that it may no longer be the
case: https://github.com/joyent/libuv/commit/1282d64868b9c560c074b9c9630391f3b18ef633)

>> PEP 3156 is designed to let alternative *implementations* of the same
>> *interface* be selected at run time. Hopefully it is possible to
>> provide a conforming implementation using libev -- then your goal
>> (smooth interoperability with C code using libev) is obtained.
>
> Smooth interoperability is not a major goal here - it's great if you
> get it for free.
> I'm just looking forward an event loop in the stdlib I want to use.

Heh, so stop objecting. :-)

>> (It would also be harder to implement initially as a 3rd party
>> framework. At the lowest level, no changes to Python itself are needed
>> -- it already supports non-blocking sockets, for example. But adding
>> optional callbacks to existing low-level APIs would require changes
>> throughout the stdlib.)
>
> As a result - making the stdlib async io aware - the complete stdlib.
> Would be great.

No matter what API style is chosen, making the entire stdlib async
aware will be tough. No matter what you do, the async support will
have to be "pulled through" every abstraction layer -- e.g. making
sockets async-aware doesn't automatically make socketserver or urllib2
async-aware(*). With the strong requirements for backwards
compatibility, in many cases it may be easier to define a new API that
is suitable for async use instead of trying to augment existing APIs.

(*) Unless you use microthreads, like gevent, but this has its own set
of problems -- I don't want to get into that here, since we seem to at
least agree on the need for an event loop with callbacks.

>> I am not so concerned about naming (it
>> seems inevitable that everyone uses somewhat different terminology
>> anyway, and it is probably better not to reuse terms when the meaning
>> is different), but I do like to look at guarantees (or the absence
>> thereof!) and best practices for dealing with the differences between
>> platforms.
>
> Handler - the best example for not re-using terms.

??? (Can't tell if you're sarcastic or agreeing here.)

>> You haven't convinced me about this.
>
> Fine, if you include transports, I'll pick on the transports as well ;)

??? (Similar.)

>> However, you can help me by
>> comparing the event loop part of PEP 3156 (ignoring anything that
>> returns or takes a Future) to libev and pointing out things (either
>> specific APIs or certain guarantees or requirements) that would be
>> hard to implement using libev, as well as useful features in libev
>> that you think every event loop should have.
>
>
> Note: In libev only the "default event loop" can have timers.

Interesting. This seems an odd constraint.

> EventLoop

>  * run() - ev_run(struct ev_loop)
>  * stop() - ev_break(EV_UNLOOP_ALL)
>  * run_forever() - registering an idle watcher will keep the loop alive
>  * run_once(timeout=None) - registering a timer, have the timer stop() the loop
>  * call_later(delay, callback, *args) - ev_timer
>  * call_repeatedly(interval, callback, *args) - ev_timer (periodic)
>  * call_soon(callback, *args) -  Equivalent to call_later(0, callback, *args).
>  - call_soon_threadsafe(callback, *args) - it would be better to have
> the event loops taking care of signals too, else waking up an ev_async
> in the loop which checks a async queue which contains the required
> information to register the call_soon callback would be possible

Not sure I understand. PEP 3156/Tulip uses a self-pipe to prevent race
conditions when call_soon_threadsafe() is called from a signal handler
or other thread(*) -- but I don't know if that is relevant or not.

(*) http://code.google.com/p/tulip/source/browse/tulip/unix_events.py#448
and http://code.google.com/p/tulip/source/browse/tulip/unix_events.py#576
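For readers unfamiliar with the trick, here is a stripped-down sketch of the self-pipe idea (not Tulip's actual code; just the mechanism): another thread queues a callback, then writes a byte to a pipe so a blocking select() wakes up promptly.

```python
import os
import select
import threading

class SelfPipeLoop:
    """Minimal illustration of the self-pipe trick behind
    call_soon_threadsafe(): the wake-up byte unblocks select()."""
    def __init__(self):
        self._read_fd, self._write_fd = os.pipe()
        self._callbacks = []
        self._lock = threading.Lock()

    def call_soon_threadsafe(self, callback, *args):
        # Safe to call from another thread; the queued callback runs
        # later in the loop's own thread.
        with self._lock:
            self._callbacks.append((callback, args))
        os.write(self._write_fd, b"\0")   # wake the selector

    def run_once(self, timeout=None):
        readable, _, _ = select.select([self._read_fd], [], [], timeout)
        if self._read_fd in readable:
            os.read(self._read_fd, 4096)  # drain the wake-up bytes
        with self._lock:
            callbacks, self._callbacks = self._callbacks, []
        for callback, args in callbacks:
            callback(*args)

loop = SelfPipeLoop()
results = []
worker = threading.Thread(
    target=loop.call_soon_threadsafe, args=(results.append, 42))
worker.start()
worker.join()
loop.run_once(timeout=1)
```

Without the pipe write, select() would sleep out its full timeout before noticing the queued callback.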

>  - getaddrinfo(host, port, family=0, type=0, proto=0, flags=0) - libev
> does not do dns
>  - getnameinfo(sockaddr, flags=0) - libev does not do dns

Note that these exist at least in part so that an event loop
implementation may *choose* to implement its own DNS handling (IIUC
Twisted has this), whereas the default behavior is just to run
socket.getaddrinfo() -- but in a separate thread because it blocks.
(This is a useful test case for run_in_executor() too.)

>  - create_transport(protocol_factory, host, port, **kwargs) - libev
> does not do transports
>  - start_serving(protocol_factory, host, port, **kwds) - libev does
> not do transports
>  * add_reader(fd, callback, *args) - create a ev_io watcher with EV_READ
>  * add_writer(fd, callback, *args) - create ev_io watcher with EV_WRITE
>  * remove_reader(fd) - in libev you have to name the watcher you want
> to stop, you can not remove watchers/handlers by fd, workaround is
> maintaining a dict with fd:Handler in the EventLoop

Ok, this does not sound like a show-stopper for a conforming PEP 3156
implementation on top of libev then, right? Just a minor
inconvenience. I'm sure everyone has *some* impedance mismatches to
deal with.

>  * remove_writer(fd) - same
>  * add_connector(fd, callback, *args) - poll for writeability, getsockopt, done

TBH, I'm not 100% convinced of the need for add_connector(), but
Richard Oudkerk claims that it is needed for Windows. (OTOH if
WSAPoll() is too broken to bother, maybe we don't need it. It's a bit
of a nuisance because code that uses add_writer() instead works just
fine on UNIX but would be subtly broken on Windows, leading to
disappointments when porting apps to Windows. I'd rather have things
break on all platforms, or on none...)

>  * remove_connector(fd) - same as with all other remove-by-fd methods
>
> As Transport are part of the PEP - some more:
>
> EventLoop
>  * create_transport(protocol_factory, host, port, **kwargs)
>    kwargs requires "local" - local address as tuple like
> ('fe80::14ad:1680:54e1:6a91%eth0',0) - so you can bind when using ipv6
> link local scope.
>   or ('192.168.2.1',5060) - bind local port for udp

Not sure I understand. What socket.connect() (or other API) call
parameters does this correspond to? What can't be expressed through the
host and port parameters?

>  * start_serving(protocol_factory, host, port, **kwds)
>    what is the behaviour for SOCK_DGRAM - does this multiplex sessions
> based on src host/port / dst host/port - I'd love it.

TBH I haven't thought much about datagram transports. It's been years
since I used UDP. I guess the API may have to distinguish between
connected and unconnected UDP. I think the transport/protocol API will
be different than for SOCK_STREAM: for every received datagram, the
transport will call protocol.datagram_received(data, address), (the
address will be a dummy for connected use) and to send a datagram, the
protocol must call transport.write_datagram(data, [address]), which
returns immediately. Flow control (if supported) should work the same
as for streams: if the transport finds its buffers exceed a certain
limit it will tell the protocol to back off by calling
protocol.pause().
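A toy sketch of that shape (a blocking socket stands in for a real event loop; datagram_received()/write_datagram() are the hypothetical names from the paragraph above):

```python
import socket

class EchoProtocol:
    """Datagram protocol shape: connection_made() + datagram_received()."""
    def connection_made(self, transport):
        self.transport = transport
    def datagram_received(self, data, address):
        # Echo each datagram straight back to its sender, upper-cased.
        self.transport.write_datagram(data.upper(), address)

class UDPTransport:
    """Toy transport: one blocking recvfrom() per poll, illustration only."""
    def __init__(self, protocol):
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self._sock.bind(("127.0.0.1", 0))
        self.address = self._sock.getsockname()
        self._protocol = protocol
        protocol.connection_made(self)
    def write_datagram(self, data, address):
        self._sock.sendto(data, address)
    def poll_once(self):
        data, address = self._sock.recvfrom(4096)
        self._protocol.datagram_received(data, address)

server = UDPTransport(EchoProtocol())
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(1)
client.sendto(b"ping", server.address)
server.poll_once()   # server receives "ping", echoes "PING" back
```

For the connected-UDP case the address argument would just be the fixed peer, which is why a dummy value suffices there.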

> Handler:
> Requiring 2 handlers for every active connection r/w is highly ineffective.

How so? What is the concern? The actions of the read and write handler
are typically completely different, so the first thing the handler
would have to do is to decide whether to call the read or the write
code. Also, depending on flow control, only one of the two may be
active.

If you are after minimizing the number of records passed to [e]poll or
kqueue, you can always collapse the handlers at that level and
distinguish between read/write based on the mask and recover the
appropriate user-level handler from the readers/writers array (and
this is what Tulip's epoll pollster class does).
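A sketch of that collapsing, in the spirit of (but not copied from) Tulip's epoll pollster; Linux-only, since it uses select.epoll directly:

```python
import select
import socket

class EpollPollster:
    """Collapse a reader and a writer for the same fd into one epoll
    entry, then recover the user-level handler from the event mask."""
    def __init__(self):
        self._epoll = select.epoll()
        self._registered = set()
        self._readers = {}   # fd -> (callback, args)
        self._writers = {}

    def _update(self, fd):
        mask = 0
        if fd in self._readers:
            mask |= select.EPOLLIN
        if fd in self._writers:
            mask |= select.EPOLLOUT
        if fd in self._registered:
            self._epoll.modify(fd, mask)
        else:
            self._epoll.register(fd, mask)
            self._registered.add(fd)

    def add_reader(self, fd, callback, *args):
        self._readers[fd] = (callback, args)
        self._update(fd)

    def add_writer(self, fd, callback, *args):
        self._writers[fd] = (callback, args)
        self._update(fd)

    def poll(self, timeout):
        for fd, mask in self._epoll.poll(timeout):
            if mask & select.EPOLLIN and fd in self._readers:
                callback, args = self._readers[fd]
                callback(*args)
            if mask & select.EPOLLOUT and fd in self._writers:
                callback, args = self._writers[fd]
                callback(*args)

left, right = socket.socketpair()
pollster = EpollPollster()
events = []
pollster.add_writer(left.fileno(), events.append, "writable")
pollster.add_reader(left.fileno(), events.append, "readable")
right.send(b"x")          # makes 'left' readable; it is always writable
pollster.poll(1)
```

Only one record is handed to epoll per fd, yet the user still sees separate read and write callbacks.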

PS. Also check out this issue, where an implementation of *just*
Tulip's pollster class for the stdlib is being designed:
http://bugs.python.org/issue16853; also check out the code reviews
here; http://bugs.python.org/review/16853/

> I'd prefer to be able to create a Handler from a loop.
> Handler = EventLoop.create_handler(socket, callback, events)
> and have the callback called with the returned events, so I can
> multiplex read/write op in the callback.

Hm. See above.

> Additionally, I can .stop() the handler without having to know the fd,
> .stop() the handler, change the events the handler is looking for,
> restart the handler with .start().
> In your proposal, I'd create a new handler every time I want to sent
> something, poll for readability - discard the handler when I'm done,
> create a new one for the next sent.

The questions are, does it make any difference in efficiency (when
using Python -- the performance of the C API is hardly relevant here),
and how often does this pattern occur.

> Timers:
> Not in the PEP - re-arming a timer
> lets say I want to do something if nothing happens for 5 seconds.
> I create a timer call_later(5.,cb), if something happens, I need to
> cancel the timer and create a new one. If there was a Timer:
> Timer.stop()
> Timer.set(5)
> Timer.start()

Actually it's one less call using the PEP's proposed API:

timer.cancel()
timer = loop.call_later(5, callback)

Which of the two idioms is faster? Who knows? libev's pattern is
probably faster in C, but that has little to bear on the cost in
Python. My guess is that the amount of work is about the same -- the
real cost is that you have to make some changes to the heap used to keep
track of all timers in the order in which they will trigger, and those
changes are the same regardless of how you style the API.
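The cancel-and-reschedule idiom is easy to wrap in a tiny helper; here is a sketch against a deterministic fake loop (both classes invented for illustration -- the fake loop only mimics call_later() returning a handle with .cancel()):

```python
import heapq

class _Timer:
    """Toy timer handle with just cancel(), like call_later()'s result."""
    def __init__(self, when, callback):
        self.when, self.callback, self.cancelled = when, callback, False
    def cancel(self):
        self.cancelled = True
    def __lt__(self, other):
        return self.when < other.when

class FakeLoop:
    """Deterministic stand-in for an event loop, for illustration only."""
    def __init__(self):
        self.time = 0.0
        self._timers = []
    def call_later(self, delay, callback):
        timer = _Timer(self.time + delay, callback)
        heapq.heappush(self._timers, timer)
        return timer
    def advance(self, dt):
        self.time += dt
        while self._timers and self._timers[0].when <= self.time:
            timer = heapq.heappop(self._timers)
            if not timer.cancelled:
                timer.callback()

class IdleTimeout:
    """The cancel-and-reschedule idiom from above, wrapped in a helper."""
    def __init__(self, loop, delay, callback):
        self._loop, self._delay, self._callback = loop, delay, callback
        self._handle = loop.call_later(delay, callback)
    def reset(self):
        # "Something happened": drop the old deadline, arm a fresh one.
        self._handle.cancel()
        self._handle = self._loop.call_later(self._delay, self._callback)

loop = FakeLoop()
fired = []
idle = IdleTimeout(loop, 5.0, lambda: fired.append("idle"))
loop.advance(3.0)
idle.reset()          # activity at t=3 pushes the deadline to t=8
loop.advance(3.0)     # t=6: the original t=5 timer pops, but it's cancelled
loop.advance(3.0)     # t=9: the re-armed timer fires
```

The cancelled timer simply stays in the heap until it pops, which is the lazy-removal strategy most heap-based schedulers use.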

> Transports:
> I think SSL should be a Protocol not a transport - implemented using BIO pairs.
> If you can chain protocols, like Transport / ProtocolA / ProtocolB you can have
> TCP / SSL / HTTP as https or  TCP / SSL / SOCKS /  HTTP as https via
> ssl enabled socks proxy without having too many problems. Another
> example, shaping a connection TCP / RATELIMIT / HTTP.

Interesting idea. This may be up to the implementation -- not every
implementation may have BIO wrappers available (AFAIK the stdlib
doesn't), so the stackability may not be easy to implement everywhere.
In any case, when you stack things like this, the stack doesn't look
like transport<-->protocol<-->protocol<-->protocol; rather, it's
A<-->B<-->C<-->D where each object has a "left" and a "right" API.
Each arrow connects the "transport (right) half" of the object on its
left (e.g. A) to the "protocol (left) half" of the object on the
arrow's right (e.g. B). So maybe we can visualise this as T1 <-->
P2:T2 <--> P3:T3 <--> P4.
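A minimal sketch of one link in such a chain (all class names invented; only the duck-typed methods matter, per the "no base classes" stance below):

```python
class UpperLayer:
    """One stackable element: a protocol on its left edge and a transport
    on its right edge. This toy layer upper-cases bytes both ways."""
    def __init__(self, above):
        self._above = above          # the protocol stacked on top of us
    # protocol half -- called by the transport below us:
    def connection_made(self, transport):
        self._below = transport
        self._above.connection_made(self)  # present ourselves as a transport
    def data_received(self, data):
        self._above.data_received(data.upper())
    # transport half -- called by the protocol above us:
    def write(self, data):
        self._below.write(data.upper())

class MemoryTransport:
    """Bottom of the stack: records writes instead of touching a socket."""
    def __init__(self, protocol):
        self.written = []
        protocol.connection_made(self)
    def write(self, data):
        self.written.append(data)

class TopProtocol:
    """Top of the stack: remembers what it received."""
    def connection_made(self, transport):
        self.transport = transport
    def data_received(self, data):
        self.received = data

top = TopProtocol()
layer = UpperLayer(top)
bottom = MemoryTransport(layer)   # wires up the whole chain
top.transport.write(b"hello")     # travels down through the layer
layer.data_received(b"world")     # travels up through the layer
```

An SSL layer would do real crypto in place of upper(), but the left/right wiring would look the same.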

> Having SSL as a Protocol allows closing the SSL connection without
> closing the TCP connection, re-using the TCP connection, re-using a
> SSL session cookie during reconnect of the SSL Protocol.

That seems a pretty esoteric use case (though given your background in
honeypots maybe common for you :-). It also seems hard to get both
sides acting correctly when you do this (but I'm certainly no SSL
expert -- I just want it supported because half the web is
inaccessible these days if you don't speak SSL, regardless of whether
you do any actual verification).

All in all I think that stackable transports/protocols are mostly
something that is enabled by the interfaces defined here (the PEP
takes care not to specify any base classes from which you must inherit
-- you must just implement certain methods, and the rest is duck
typing) but otherwise does not concern the PEP much.

The only concern I have, really, is that the PEP currently hints that
both protocols and transports might have pause() and resume() methods
for flow control, where the protocol calls transport.pause() if
protocol.data_received() is called too frequently, and the transport
calls protocol.pause() if transport.write() has buffered more data
than sensible. But for an object that is both a protocol and a
transport, this would make it impossible to distinguish between
pause() calls by its left and right neighbors. So maybe the names must
differ. Given the tendency of transport method names to be shorter
(e.g. write()) vs. the longer protocol method names (data_received(),
connection_lost() etc.), perhaps it should be transport.pause() and
protocol.pause_writing() (and similar for resume()).

>  * reconnect() - I'd love to be able to reconnect a transport

But what does that mean in general? It depends on the protocol (e.g.
FTP, HTTP, IRC, SMTP) how much state must be restored/renegotiated
upon a reconnect, and how much data may have to be re-sent. This seems
a higher-level feature that transports and protocols will have to
implement themselves.

>  * timers - Transports need timers

I think you mean timeouts?

>    * dns-resolve-timeout - dns can be slow
>    * connecting-timeout - connecting can take too much time, more than
> we want to wait
>    * idle-timeout ( no action on the connection for a while ) - call
> protocol.timeout_idle()
>    * sustain-timeout ( max session time ) - close() transport
>    * ssl-handshake-timeout ( in case ssl is a Transport ) - close transport
>    * close-timeout (shutdown is async) - close transport hard
>    * reconnect-timeout - (wait some seconds before reconnecting) -
> reconnect connection

This is an interesting point. I think some of these really do need
APIs in the PEP, others may be implemented using existing machinery
(e.g. call_later() to schedule a callback that calls cancel() on a
task). I've added a bullet on this to the Open Issues section.

> Now, in case we connect to a host by name, and have multiple addresses
> resolved, and the first connection can not be established, there is no
> way to 'reconnect()' - as the protocol does not yet exist.

Twisted suggested something here which I haven't implemented yet but
which seems reasonable -- using a series of short timeouts try
connecting to the various addresses and keep the first one that
connects successfully. If multiple addresses connect after the first
timeout, too bad, just close the redundant sockets, little harm is
done (though the timeouts should be tuned so that this is relatively
rare, because a server may waste significant resources on such
redundant connects).
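
The cleanup step of that strategy can be sketched like this (the
socket objects are fakes standing in for the staggered attempts; the
function name is made up for illustration):

```python
class FakeSocket:
    # Stand-in for a connected (or failed) socket from one attempt.
    def __init__(self, addr, connected):
        self.addr = addr
        self.connected = connected
        self.closed = False

    def close(self):
        self.closed = True

def pick_first_connected(attempts):
    # Once the staggered attempts have run, keep the first successful
    # connection and close failed or redundant sockets.
    winner = None
    for sock in attempts:
        if not sock.connected:
            sock.close()           # attempt failed
        elif winner is None:
            winner = sock          # first successful connect wins
        else:
            sock.close()           # redundant successful connect
    return winner

attempts = [FakeSocket(("2001:db8::1", 80), False),
            FakeSocket(("192.0.2.1", 80), True),
            FakeSocket(("192.0.2.2", 80), True)]
winner = pick_first_connected(attempts)
```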

> For almost all the timeouts I mentioned - the protocol needs to take
> care - so the protocol has to exist before the connection is
> established in case of outbound connections.

I'm not sure I follow. Can you sketch out some code to help me here?
ISTM that e.g. the DNS, connect and handshake timeouts can be
implemented by the machinery that tries to set up the connection
behind the scenes, and the user's protocol won't know anything of
these shenanigans. The code that calls create_transport() (actually
it'll probably be renamed create_client()) will just get a Future that
either indicates success (and then the protocol and transport are
successfully hooked up) or an error (and then no protocol was created
-- whether or not a transport was created is an implementation
detail).

> In case a connection is lost and reconnecting is required -
> .reconnect() is handy, so the protocol can request reconnecting.

I'd need more details of how you would like to specify this.

> As this does not work with the current Protocols callbacks I propose
> Protocols.connection_established() therefore.

How does this differ from connection_made()?

(I'm trying to follow Twisted's guidance here, they seem to have the
longest experience doing these kinds of things. When I talked to Glyph
IIRC he was skeptical about reconnecting in general.)

> Protocols
> I'd outline protocol_factory can be an instance of a class, which can
> set specific parameters for 'things'
> class p:
>         def __init__(self, a=1,b=2,c=3):
>                 self.a = a
>                 self.b = b
>                 self.c = c
>         def __call__(self):
>                 return p(a=self.a, b=self.b, c=self.c)
>         def ... all protocol methods ...:
>                 pass
>
> EventLoop.start_serving(p(a=5,b=7), ...)
> EventLoop.start_serving(p(a=9,b=4), ...)
>
> Same Protocol, different parameters for it.

No such helper method (or class) is needed. You can use a lambda or
functools.partial for the same effect. I'll add a note to the PEP to
remind people of this.
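
Concretely, the quoted example reduces to this (the protocol class is
hypothetical, and the start_serving() calls are shown only in a
comment since the loop itself is out of scope here):

```python
import functools

class ConfiguredProtocol:
    # Hypothetical protocol taking per-server parameters.
    def __init__(self, a=1, b=2, c=3):
        self.a, self.b, self.c = a, b, c

# Instead of a helper class with __call__, bind the parameters with
# functools.partial or a lambda; either works as a protocol_factory:
factory1 = functools.partial(ConfiguredProtocol, a=5, b=7)
factory2 = lambda: ConfiguredProtocol(a=9, b=4)

# e.g. event_loop.start_serving(factory1, ...)
#      event_loop.start_serving(factory2, ...)
p1, p2 = factory1(), factory2()
```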

>  + connection_established()
>  + timeout_dns()
>  + timeout_idle()
>  + timeout_connecting()

Signatures please?

>  * data_received(data) - if it was possible to return the number of
> bytes consumed by the protocol, and have the Transport buffer the rest
> for the next io in call, one would avoid having to do this in every
> Protocol on its own - learned from experience.

Twisted has a whole slew of protocol implementation subclasses that
implement various strategies like line-buffering (including a really
complex version where you can turn the line buffering on and off) and
"netstrings". I am trying to limit the PEP's size by not including
these, but I fully expect that in practice a set of useful protocol
implementations will be created that handles common cases. I'm not
convinced that putting this in the transport/protocol interface will
make user code less buggy: it seems easy for the user code to miscount
the bytes or not return a count at all in a rarely taken code branch.
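
A sketch of the kind of line-buffering helper meant here (modelled
loosely on Twisted's LineReceiver; class and method names are
illustrative, not part of the PEP):

```python
class LineProtocol:
    # Buffer raw chunks in data_received() and deliver whole lines,
    # so individual protocols need not repeat the bookkeeping.
    def __init__(self):
        self._buffer = b""
        self.lines = []

    def data_received(self, data):
        self._buffer += data
        *complete, self._buffer = self._buffer.split(b"\n")
        for line in complete:
            self.line_received(line)

    def line_received(self, line):
        self.lines.append(line)

p = LineProtocol()
p.data_received(b"GET / HT")                    # partial line buffered
p.data_received(b"TP/1.0\nHost: example.com\n") # two lines completed
```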

>  * eof_received()/connection_lost(exc) - a connection can be closed
> clean recv()=0, unclean recv()=-1, errno, SIGPIPE when writing and in
> case of SSL even more, it is required to distinguish.

Well, this is why eof_received() exists -- to indicate a clean close.
We should never receive SIGPIPE (Python disables this signal, so you
always get the errno instead). According to Glyph, SSL doesn't support
sending eof, so you have to use Content-length or a chunked encoding.
What other conditions do you expect from SSL that wouldn't be
distinguished by the exception instance passed to connection_lost()?

>  + nextlayer_is_empty() - called if the Transport (or underlying
> Protocol in case of chaining) write buffer is empty - Imagine an http
> server sending a 1GB file, you do not want to sent 1GB at once - as
> you do not have that much memory, but get a callback if the transport
> done sending the chunk you've queued, so you can send the next chunk
> of data.

That's what the pause()/resume() flow control protocol is for. You
read the file (presumably it's a file) in e.g. 16K blocks and call
write() for each block; if the transport can't keep up and exceeds its
buffer space, it calls protocol.pause() (or perhaps
protocol.pause_writing(), see discussion above).
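
The pattern can be sketched as follows (the pause_writing()/
resume_writing() names follow the naming discussion earlier in the
thread and are not settled API; the transport is a toy that pretends
its buffer fills after every write):

```python
import io

CHUNK = 16 * 1024

class FileSender:
    # Write 16K blocks until the transport pushes back, then continue
    # when it reports that its buffer has drained.
    def __init__(self, transport, fileobj):
        self.transport = transport
        self.fileobj = fileobj
        self.paused = False

    def send_some(self):
        while not self.paused:
            block = self.fileobj.read(CHUNK)
            if not block:
                self.transport.close()
                return
            self.transport.write(block)

    def pause_writing(self):        # transport buffer above high water
        self.paused = True

    def resume_writing(self):       # transport buffer drained
        self.paused = False
        self.send_some()

class ToyTransport:
    # Pretends its buffer fills after every single write.
    def __init__(self):
        self.written = 0
        self.closed = False
        self.protocol = None
    def write(self, data):
        self.written += len(data)
        self.protocol.pause_writing()
    def close(self):
        self.closed = True

transport = ToyTransport()
sender = FileSender(transport, io.BytesIO(b"x" * (40 * 1024)))
transport.protocol = sender
sender.send_some()                  # first block, then paused
while not transport.closed:
    sender.resume_writing()         # transport drained; next block
```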

> Next, what happens if a dns can not be resolved, ssl handshake (in
> case ssl is transport) or connecting fails - in my opinion it's an
> error the protocol is supposed to take care of
>  + error_dns
>  + error_ssl
>  + error_connecting

The future returned by create_transport() (aka create_client()) will
raise the exception.

> I'm not that much into futures - so I may have got some things wrong.

No problem. You may want to read PEP 3148, it explains Futures and
much of that explanation remains valid; note that in PEP 3156, to
wait for a future you must use "yield from <future>".
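
A toy illustration of that difference (PEP 3148's Future reduced to
its essence, with __iter__ added so "yield from" works; this is a
sketch of the mechanism, not the real API):

```python
class ToyFuture:
    def __init__(self):
        self._result = None
        self._done = False

    def set_result(self, value):
        self._result, self._done = value, True

    def __iter__(self):
        while not self._done:
            yield self        # a real loop would suspend the task here
        return self._result

def coro(fut):
    value = yield from fut    # waits (cooperatively) for the result
    return value * 2

fut = ToyFuture()
task = coro(fut)
next(task)                    # run until the coroutine waits on fut
fut.set_result(21)
try:
    next(task)
except StopIteration as exc:
    result = exc.value        # the coroutine's return value
```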

-- 
--Guido van Rossum (python.org/~guido)


From shibturn at gmail.com  Sun Jan  6 00:55:55 2013
From: shibturn at gmail.com (Richard Oudkerk)
Date: Sat, 05 Jan 2013 23:55:55 +0000
Subject: [Python-ideas] PEP 3156 - Asynchronous IO Support Rebooted
In-Reply-To: <CAP7+vJJgrYHUmbo27dgEtReboESTbyRKTPVU+TdDcucT+_AQVw@mail.gmail.com>
References: <CACEGMv8FiCMCDRLjfPWJPv2jdoJpe9i_3uOYDoNUcVN97MXw1A@mail.gmail.com>
	<CAP7+vJJf1PAJzu7DbO5MB4CRi6bFOuwKijBPjznE1ArhRMd9bg@mail.gmail.com>
	<CACEGMv_gxUQ2E9pU3_UJp_QkH1VEzw9+tTgv5OCNANK0Qircng@mail.gmail.com>
	<CAP7+vJJgrYHUmbo27dgEtReboESTbyRKTPVU+TdDcucT+_AQVw@mail.gmail.com>
Message-ID: <kcaeif$36k$1@ger.gmane.org>

On 05/01/2013 11:30pm, Guido van Rossum wrote:
>> >Supporting IOCP on windows is absolutely required, as WSAPoll is
>> >broken and won't be fixed.
>> >http://social.msdn.microsoft.com/Forums/hu/wsk/thread/18769abd-fca0-4d3c-9884-1a38ce27ae90
> Wow. Now I'm even more glad that we're planning to support IOCP.
>

I took care to work around that bug when adding support for WSAPoll() in 
tulip.

-- 
Richard



From shibturn at gmail.com  Sun Jan  6 00:57:43 2013
From: shibturn at gmail.com (Richard Oudkerk)
Date: Sat, 05 Jan 2013 23:57:43 +0000
Subject: [Python-ideas] PEP 3156 - Asynchronous IO Support Rebooted
In-Reply-To: <CAP7+vJJgrYHUmbo27dgEtReboESTbyRKTPVU+TdDcucT+_AQVw@mail.gmail.com>
References: <CACEGMv8FiCMCDRLjfPWJPv2jdoJpe9i_3uOYDoNUcVN97MXw1A@mail.gmail.com>
	<CAP7+vJJf1PAJzu7DbO5MB4CRi6bFOuwKijBPjznE1ArhRMd9bg@mail.gmail.com>
	<CACEGMv_gxUQ2E9pU3_UJp_QkH1VEzw9+tTgv5OCNANK0Qircng@mail.gmail.com>
	<CAP7+vJJgrYHUmbo27dgEtReboESTbyRKTPVU+TdDcucT+_AQVw@mail.gmail.com>
Message-ID: <kcaelr$36k$2@ger.gmane.org>

On 05/01/2013 11:30pm, Guido van Rossum wrote:
> TBH, I'm not 100% convinced of the need for add_connector(), but
> Richard Oudkerk claims that it is needed for Windows. (OTOH if
> WSAPoll() is too broken to bother, maybe we don't need it. It's a bit
> of a nuisance because code that uses add_writer() instead works just
> fine on UNIX but would be subtly broken on Windows, leading to
> disappointments when porting apps to Windows. I'd rather have things
> break on all platforms, or on none...)

add_connector() is needed to work around the brokenness of WSAPoll().

-- 
Richard



From rosuav at gmail.com  Sun Jan  6 01:00:53 2013
From: rosuav at gmail.com (Chris Angelico)
Date: Sun, 6 Jan 2013 11:00:53 +1100
Subject: [Python-ideas] Updated PEP 432: Simplifying the CPython update
	sequence
In-Reply-To: <kcab0n$91d$1@ger.gmane.org>
References: <CADiSq7dmibf7TKo2KYRTReZB68jrHQnnNhiWTXJAZdCHBgZdqA@mail.gmail.com>
	<20130105164220.09d654be@anarchist.wooz.org>
	<kcab0n$91d$1@ger.gmane.org>
Message-ID: <CAPTjJmpY62o6L=Ez2E_931q37oR80q+Z5=tf7Rkv4ueJV5kEvA@mail.gmail.com>

On Sun, Jan 6, 2013 at 9:54 AM, Terry Reedy <tjreedy at udel.edu> wrote:
> On 1/5/2013 4:42 PM, Barry Warsaw wrote:
>
>> Also, I suggest taking the opportunity to change the sense of flags such
>> as
>> no_site and dont_write_bytecode.  I find it much more difficult to reason
>> that
>> "dont_write_bytecode = 0" means *do* write bytecode, rather than
>> "write_bytecode = 1".  I.e. positives are better than double-negatives.
>
> IE, you prefer positive flags, with some on by default, over having all
> flags indicate a non-default condition. I would too, but I don't hack on the
> C code base. 'dont_write_bytecode' is especially ugly.

Would it be less ugly if called 'suppress_bytecode'? It sounds less
negative, but does the same thing. Suppressing something is an active
and positive action (though the democratic decision to not publish is
quite different, as Yes Minister proved).

ChrisA


From ncoghlan at gmail.com  Sun Jan  6 08:26:14 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 6 Jan 2013 17:26:14 +1000
Subject: [Python-ideas] Updated PEP 432: Simplifying the CPython update
	sequence
In-Reply-To: <20130105164220.09d654be@anarchist.wooz.org>
References: <CADiSq7dmibf7TKo2KYRTReZB68jrHQnnNhiWTXJAZdCHBgZdqA@mail.gmail.com>
	<20130105164220.09d654be@anarchist.wooz.org>
Message-ID: <CADiSq7d7v58S-Cm2+KwO6Nci4BmS3FaFsYkqgWsOTUiWo321WA@mail.gmail.com>

On Sun, Jan 6, 2013 at 7:42 AM, Barry Warsaw <barry at python.org> wrote:
> Hi Nick,
>
> PEP 432 is looking very nice.  It'll be fun to watch the implementation come
> together. :)
>
> Some comments...
>
> The start up sequences:
>
>> * Pre-Initialization - no interpreter available
>> * Initialization - interpreter partially available
>
> What about "Initializing"?

Makes sense, changed.

>> * Initialized - full interpreter available, __main__ related metadata
>>   incomplete
>> * Main Execution - optional state, __main__ related metadata populated,
>>   bytecode executing in the __main__ module namespace
>
> What is "optional" about this state?  Maybe it should be called "Operational"?

Unlike the other phases which are sequential and distinct, "Main
Execution" is a subphase of Initialized. Embedding applications
without the concept of a "__main__" module (e.g. mod_wsgi) will never
use it.

>> ... separate system Python (spython) executable ...
>
> I love the idea, but I'm not crazy about the name.  What about
> `python-minimal` (yes, it's deliberately longer.  Symlinks ftw. :)

Yeah, I'll go with "python-minimal".

>> <TBD: Did I miss anything?>
>
> What about sys.implementation?

Unaffected, since that's all configured at build time. I've added an
explicit note that sys.implementation and sysconfig.get_config_vars()
are not affected by this initial proposal.

>> as it failed to be updated for the virtual environment support added in
>> Python 3.3 (detailed in PEP 420).
>
> venv is defined in PEP 405 (there are two cases of mis-referencing).

Oops, fixed.

> Note that there may be other important build time settings on some platforms.
> An example is Debian/Ubuntu, where we define the multiarch triplet in the
> configure script, and pass that through Makefile(.pre.in) to sysmodule.c for
> exposure as sys.implementation._multiarch.

Yeah, I don't want to mess with adding new runtime configuration
options at this point, beyond the features inherent in breaking up the
existing initialization phases.

>
>> For a command executed with -c, it will be the string "-c"
>> For explicitly requested input from stdin, it will be the string "-"
>
> Wow, I couldn't believe it but it's true!  That seems crazy useless. :)

Yup. While researching this PEP I had many moments where I was looking
at the screen going "WTF, we seriously do that?" (most notably when I
learned that using the -W and -X options means we create Python
objects in Py_Main() before the call to Py_Initialize(). This is why
there has to be an explicit call to _Py_Random_Init() before the
option processing code)

>> Embedding applications must call Py_SetArgv themselves. The CPython logic
>> for doing so is part of Py_Main() and is not exposed separately. However,
>> the runpy module does provide roughly equivalent logic in runpy.run_module
>> and runpy.run_path.
>
> As I've mentioned before on the python-porting mailing list, this is actually
> more difficult than it seems because main() takes char*s but Py_SetArgv() and
> Py_SetProgramName() takes wchar_t*s.
>
> Maybe Python's own conversion could be refactored to make this easier either
> as part of this PEP or after the PEP is implemented.

Yeah, one of the changes in the PEP is that you can pass program_name
and raw_argv as a Unicode object or a list of Unicode objects instead
of using wchar_t.

>
>> int Py_ReadConfiguration(PyConfig *config);
>
>> The config argument should be a pointer to a Python dictionary. For any
>> supported configuration setting already in the dictionary, CPython will
>> sanity check the supplied value, but otherwise accept it as correct.
>
> So why not define this to take a PyObject* or a PyDictObject* ?

That wording is a holdover from a previous version of the PEP where
this was indeed a dictionary pointer. I came around to Antoine's point
of view that since we have a fixed list of supported settings at any
given point in time, a struct would be easier to deal with on the C
side. However, I missed a few spots (including this one) when I made
the change to the PEP.

>
> (also: the Py_Config struct members need the correct concrete type pointers,
> e.g. PyDictObject*)

Fixed.

>> Alternatively, settings may be overridden after the Py_ReadConfiguration
>> call (this can be useful if an embedding application wants to adjust a
>> setting rather than replace it completely, such as removing sys.path[0]).
>
> How will setting something after Py_ReadConfiguration() is called change a
> value such as sys.path?  Or is this the reason why you pass a Py_Config to
> Py_EndInitialization()?

Correct - calling Py_ReadConfiguration has no effect on the
interpreter state; the interpreter state only changes in
Py_EndInitialization. I'll include a more explicit explanation of
that behaviour.

> (also, see the type typo <wink> in the definition of Py_EndInitialization())
>
> Also, I suggest taking the opportunity to change the sense of flags such as
> no_site and dont_write_bytecode.  I find it much more difficult to reason that
> "dont_write_bytecode = 0" means *do* write bytecode, rather than
> "write_bytecode = 1".  I.e. positives are better than double-negatives.

While I agree with this principle in general, I'm deliberately not
doing anything about most of these because these settings are already
exposed in their double-negative form as environment variables
(PYTHONDONTWRITEBYTECODE, PYTHONNOUSERSITE), as global variables that
can be set by an embedding application (Py_DontWriteBytecodeFlag,
Py_NoSiteFlag, Py_NoUserSiteDirectory) and as sys module attributes
(sys.dont_write_bytecode, sys.flags.no_site, sys.flags.no_user_site).

However, I *am* going to change the sense of the no_site setting to
"enable_site_config". The reason for this is that the meaning of the
setting actually changed in Python 3.3 to also mean "disable the side
effects that are currently implicit in importing the site module", in
addition to implicitly importing that module as part of the startup
sequence.

>> sys.argv[0] may not yet have its final value
>> it will be -m when executing a module or package with CPython
>
> Gosh, wouldn't it be nice if this could have a more useful value?

It does once runpy is done with it (it has the __file__ attribute
corresponding to whatever code is actually being run as __main__). At
this point in the initialisation sequence, though, __main__ is still
the builtin __main__ module, and there's no getting around the fact
that we need to be able to import and run arbitrary Python code (both
from the standard library and from package __init__ files) in order to
properly locate __main__.

>> Initial thought is that hiding the various options behind a single API would
>> make that API too complicated, so 3 separate APIs is more likely:
>
> +1
>
>> The interpreter state will be updated to include details of the
>> configuration settings supplied during initialization by extending the
>> interpreter state object with an embedded copy of the Py_CoreConfig and
>> Py_Config structs.
>
> Couldn't it just have a dict with all the values from both structs collapsed
> into it?

It could, but that's substantially less convenient from the C side of the API.
>
>> For debugging purposes, the configuration settings will be exposed as a
>> sys._configuration simple namespace
>
> I suggest un-underscoring the name and making it public.  It might be useful
> for other than debugging purposes.

The underscore is there because the specific fields are currently
CPython specific. Another implementation may not make these settings
configurable at all.

If there are particular settings that would be useful to modules like
importlib or site, then we may want to look at exposing them through
sys.implementation as required attributes, but that's a distinct PEP
from this one.

>> Is Py_IsRunningMain() worth keeping?
>
> Perhaps.  Does it provide any additional information above Py_IsInitialized()?

Yes - it indicates that sys.argv[0] and the metadata in __main__ are
fully updated (i.e. the placeholder info used while executing Python
code in order to locate __main__ in the first place has been replaced
with the real info).

>> Should the answers to Py_IsInitialized() and Py_RunningMain() be exposed via
>> the sys module?
>
> I can't think of a use case.

Neither can I. I'll leave them as "for embedding apps only" until
someone comes up with an actual reason to expose them.

>> Is the Py_Config struct too unwieldy to be practical? Would a Python
>> dictionary be a better choice?
>
> Although I see why you've spec'd it this way, I don't like having *two* config
> structures (Py_CoreConfig and Py_Config).  Having a dictionary for the latter
> would probably be fine, and in fact you could copy the Py_Config values into
> it (when possible during the init sequence) and expose it in the sys module.

Yeah, I originally had just Py_CoreConfig and then a Py_DictObject for
the rest of it. The first draft of Py_Config embedded a copy of
Py_CoreConfig as the first field. However, I eventually settled on the
current scheme as best aligning the model with the reality that we
really do have two kinds of configuration setting which need to be
handled differently:

- Py_CoreConfig holds the settings that are required to create a
Py_InterpreterState at all (passed to Py_BeginInitialization)
- Py_Config holds the settings that are required to get to a fully
functional interpreter (passed to Py_EndInitialization)

Using a struct for both of them is easier to work with from C, and
makes the number vs string vs list vs mapping distinction for the
various settings self-documenting.

>> Would it be better to manage the flag variables in Py_Config as Python
>> integers so the struct can be initialized with a simple memset(&config, 0,
>> sizeof(*config))?
>
> Would we even notice the optimization?

I'll clarify this a bit - it's a maintainability question, rather than
an optimization. (i.e. I think _Py_Config_INIT is ugly as hell, I just
don't have any better ideas)

>
>> A System Python Executable
>
> This should probably at least mention Christian's idea of the -I flag (which I
> think hasn't been PEP'd yet).  We can bikeshed about the name of the
> executable later. :)

Yeah, I've gone through and added a bunch of tracker links, including
that one. There's a significant number of things which this should make
easier in the future (e.g. I haven't linked to it, but the proposal to
support custom memory allocators could be handled by adding more
fields to Py_CoreConfig rather than more C level global variables)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From ncoghlan at gmail.com  Sun Jan  6 08:28:22 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 6 Jan 2013 17:28:22 +1000
Subject: [Python-ideas] Updated PEP 432: Simplifying the CPython update
	sequence
In-Reply-To: <CADiSq7d7v58S-Cm2+KwO6Nci4BmS3FaFsYkqgWsOTUiWo321WA@mail.gmail.com>
References: <CADiSq7dmibf7TKo2KYRTReZB68jrHQnnNhiWTXJAZdCHBgZdqA@mail.gmail.com>
	<20130105164220.09d654be@anarchist.wooz.org>
	<CADiSq7d7v58S-Cm2+KwO6Nci4BmS3FaFsYkqgWsOTUiWo321WA@mail.gmail.com>
Message-ID: <CADiSq7dTxah=T70eRVptAzCZKG7iOHFNngvqvHBuSKDqFPYtMg@mail.gmail.com>

On Sun, Jan 6, 2013 at 5:26 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> I love the idea, but I'm not crazy about the name.  What about
>> `python-minimal` (yes, it's deliberately longer.  Symlinks ftw. :)
>
> Yeah, I'll go with "python-minimal".

Oops, I was editing the PEP and the email at the same time, and
changed my mind about this without fixing the email. I actually went
with "pysystem" for now, but I also noted the need to paint this
bikeshed under Open Questions.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From ncoghlan at gmail.com  Sun Jan  6 10:06:31 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 6 Jan 2013 19:06:31 +1000
Subject: [Python-ideas] Yielding through context managers
In-Reply-To: <CAP7+vJJXBmpqF+Su2=WFy5HjtRwPfegz4S=ZG7fqDoXuZM7Wpw@mail.gmail.com>
References: <CABZk45x13SV4tt=dP0cf79UaM7+D7q2EYPyXpuYGVW1G98kMxQ@mail.gmail.com>
	<CABZk45xur6q-=V681QFXuWNunYqPun10+WJm5QX8QzruCur7oA@mail.gmail.com>
	<CAP7+vJJXBmpqF+Su2=WFy5HjtRwPfegz4S=ZG7fqDoXuZM7Wpw@mail.gmail.com>
Message-ID: <CADiSq7fRyzVVZLyvuU3FV2GL0oZ987UQgtDef41Z77KDgB1rKQ@mail.gmail.com>

On Sun, Jan 6, 2013 at 5:23 AM, Guido van Rossum <guido at python.org> wrote:
> Possibly (though it will have to be a separate PEP -- PEP 3156 needs
> to be able to run on unchanged Python 3.3). Does anyone on this thread
> have enough understanding of the implementation of context managers
> and generators to be able to figure out how this could be specified
> and implemented (or to explain why it is a bad idea, or impossible)?

There aren't any syntax changes needed to implement asynchronous
locks, since they're unlikely to experience high latency in __exit__.
For that and similar cases, it's enough to use an asynchronous
operation to retrieve the CM in the first place (i.e. acquire in
__iter__ rather than __enter__) or else have __enter__ produce a
Future that acquires the lock in __iter__ (see
http://python-notes.boredomandlaziness.org/en/latest/pep_ideas/async_programming.html#asynchronous-context-managers)

The real challenge is in handling something like an asynchronous
database transaction, which will need to yield on __exit__ as it
commits or rolls back the database transaction. At the moment, the
only solutions for that are to switch to a synchronous-to-asynchronous
adapter like gevent or else write out the try/except block and avoid
using the with statement.
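
Spelled out, the try/except form looks like this (the transaction
object is hypothetical; its commit()/rollback() return trivial
iterables standing in for the asynchronous operations a real
coroutine would wait on):

```python
class ToyTransaction:
    def __init__(self):
        self.state = "open"

    def commit(self):
        self.state = "committed"
        return iter(())       # stand-in for an async operation

    def rollback(self):
        self.state = "rolled back"
        return iter(())

def transact(txn, work):
    # The yield from on commit()/rollback() is exactly where a with
    # statement's __exit__ would need to block.
    try:
        work()
    except Exception:
        yield from txn.rollback()
        raise
    else:
        yield from txn.commit()

ok = ToyTransaction()
list(transact(ok, lambda: None))   # drive the coroutine to completion

failed = ToyTransaction()
def boom():
    raise ValueError("simulated failure")
try:
    list(transact(failed, boom))
except ValueError:
    pass
```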

It's not an impossible problem, just a tricky one to solve in a
readable fashion. Some possible constraints on the problem space:

- any syntactic solution should work for at least "for" statements and
"with" statements
- also working for comprehensions is highly desirable
- syntactic ambiguity with currently legal constructs should be
avoided. Even if the compiler can figure it out, large behavioural
changes due to a subtle difference in syntax should be avoided because
they're hard for *humans* to read

For example:

    # Synchronous
    for x in y:   # Invokes _iter = iter(y) and _iter.__next__()
        print(x)
    # Asynchronous:
    for x in yielding y:   # Invokes _iter = yield from iter(y)
                           # and yield from _iter.__next__()
        print(x)

    # Synchronous
    with x as y:   # Invokes _cm = x, y = _cm.__enter__()
                   # and _cm.__exit__(*args)
        print(y)
    # Asynchronous:
    with yielding x as y:   # Invokes _cm = x, y = yield from
                            # _cm.__enter__() and
                            # yield from _cm.__exit__(*args)
        print(y)

A new keyword like "yielding" would make it explicit that what is
going on differs from a (yield x) or (yield from x) in the
corresponding expression slot.

Approaches with function level granularity may also be of interest -
PEP 3152 is largely an exploration of that idea (but would need
adjustments in light of PEP 3156)

Somewhat related, there's also a case to be made that "yield from x"
should fall back to being equivalent to "x()" if x implements __call__
but not __iter__. That way, async ready code can be written using
"yield from", but passing in a pre-canned result via lambda or
functools.partial would no longer require a separate operation that
just adapts the asynchronous call API (i.e. __iter__) to the
synchronous call one (i.e. __call__):

    def async_call(f):
        @functools.wraps(f)
        def _sync(*args, **kwds):
            return f(*args, **kwds)
            yield # Force this to be a generator
        return _sync

The argument against, of course, is the ease with which this can lead
to a "wrong answer" problem where the exception gets thrown a long way
from the erroneous code which left out the parens for the function
call.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From _ at lvh.cc  Sun Jan  6 11:20:35 2013
From: _ at lvh.cc (Laurens Van Houtven)
Date: Sun, 6 Jan 2013 11:20:35 +0100
Subject: [Python-ideas] Yielding through context managers
In-Reply-To: <CADiSq7fRyzVVZLyvuU3FV2GL0oZ987UQgtDef41Z77KDgB1rKQ@mail.gmail.com>
References: <CABZk45x13SV4tt=dP0cf79UaM7+D7q2EYPyXpuYGVW1G98kMxQ@mail.gmail.com>
	<CABZk45xur6q-=V681QFXuWNunYqPun10+WJm5QX8QzruCur7oA@mail.gmail.com>
	<CAP7+vJJXBmpqF+Su2=WFy5HjtRwPfegz4S=ZG7fqDoXuZM7Wpw@mail.gmail.com>
	<CADiSq7fRyzVVZLyvuU3FV2GL0oZ987UQgtDef41Z77KDgB1rKQ@mail.gmail.com>
Message-ID: <CAE_Hg6bLHOn-L7ithXMXL5w8JF7E85gJKuwu0yPqVomo=-Msrg@mail.gmail.com>

Hi Nick,


When you say "high latency" (in __exit__), what does "high" mean? Is that
order of magnitude what __exit__ usually means now, or network IO included?

(Use case: distributed locking and remotely stored locks: it doesn't take a
long time on network scales, but it can take a long time on CPU scales.)



On Sun, Jan 6, 2013 at 10:06 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> On Sun, Jan 6, 2013 at 5:23 AM, Guido van Rossum <guido at python.org> wrote:
> > Possibly (though it will have to be a separate PEP -- PEP 3156 needs
> > to be able to run on unchanged Python 3.3). Does anyone on this thread
> > have enough understanding of the implementation of context managers
> > and generators to be able to figure out how this could be specified
> > and implemented (or to explain why it is a bad idea, or impossible)?
>
> There aren't any syntax changes needed to implement asynchronous
> locks, since they're unlikely to experience high latency in __exit__.
> For that and similar cases, it's enough to use an asynchronous
> operation to retrieve the CM in the first place (i.e. acquire in
> __iter__ rather than __enter__) or else have __enter__ produce a
> Future that acquires the lock in __iter__ (see
>
> http://python-notes.boredomandlaziness.org/en/latest/pep_ideas/async_programming.html#asynchronous-context-managers
> )
>
> The real challenge is in handling something like an asynchronous
> database transaction, which will need to yield on __exit__ as it
> commits or rolls back the database transaction. At the moment, the
> only solutions for that are to switch to a synchronous-to-asynchronous
> adapter like gevent or else write out the try/except block and avoid
> using the with statement.
>
> It's not an impossible problem, just a tricky one to solve in a
> readable fashion. Some possible constraints on the problem space:
>
> - any syntactic solution should work for at least "for" statements and
> "with" statements
> - also working for comprehensions is highly desirable
> - syntactic ambiguity with currently legal constructs should be
> avoided. Even if the compiler can figure it out, large behavioural
> changes due to a subtle difference in syntax should be avoided because
> they're hard for *humans* to read
>
> For example:
>
>     # Synchronous
>     for x in y:   # Invokes _iter = iter(y) and _iter.__next__()
>         print(x)
>     #Asynchronous:
>     for x in yielding y:   # Invokes _iter = yield from iter(y) and
> yield from _iter.__next__()
>         print(x)
>
>     # Synchronous
>     with x as y:   # Invokes _cm = x, y = _cm.__enter__() and
> _cm.__exit__(*args)
>         print(y)
>     #Asynchronous:
>     with yielding x as y:   # Invokes _cm = x, y = yield from
> _cm.__enter__() and yield from _cm.__exit__(*args)
>         print(y)
>
> A new keyword like "yielding" would make it explicit that what is
> going on differs from a (yield x) or (yield from x) in the
> corresponding expression slot.
>
> Approaches with function level granularity may also be of interest -
> PEP 3152 is largely an exploration of that idea (but would need
> adjustments in light of PEP 3156)
>
> Somewhat related, there's also a case to be made that "yield from x"
> should fall back to being equivalent to "x()" if x implements __call__
> but not __iter__. That way, async ready code can be written using
> "yield from", but passing in a pre-canned result via lambda or
> functools.partial would no longer require a separate operation that
> just adapts the asynchronous call API (i.e. __iter__) to the
> synchronous call one (i.e. __call__):
>
>     def async_call(f):
>         @functools.wraps(f)
>         def _sync(*args, **kwds):
>             return f(*args, **kwds)
>             yield # Force this to be a generator
>         return _sync
>
> The argument against, of course, is the ease with which this can lead
> to a "wrong answer" problem where the exception gets thrown a long way
> from the erroneous code which left out the parens for the function
> call.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>



-- 
cheers
lvh

From ncoghlan at gmail.com  Sun Jan  6 12:37:11 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 6 Jan 2013 21:37:11 +1000
Subject: [Python-ideas] Yielding through context managers
In-Reply-To: <CAE_Hg6bLHOn-L7ithXMXL5w8JF7E85gJKuwu0yPqVomo=-Msrg@mail.gmail.com>
References: <CABZk45x13SV4tt=dP0cf79UaM7+D7q2EYPyXpuYGVW1G98kMxQ@mail.gmail.com>
	<CABZk45xur6q-=V681QFXuWNunYqPun10+WJm5QX8QzruCur7oA@mail.gmail.com>
	<CAP7+vJJXBmpqF+Su2=WFy5HjtRwPfegz4S=ZG7fqDoXuZM7Wpw@mail.gmail.com>
	<CADiSq7fRyzVVZLyvuU3FV2GL0oZ987UQgtDef41Z77KDgB1rKQ@mail.gmail.com>
	<CAE_Hg6bLHOn-L7ithXMXL5w8JF7E85gJKuwu0yPqVomo=-Msrg@mail.gmail.com>
Message-ID: <CADiSq7ePm4dg=fj3CmUL195297-KNYzMHN76R9xkb2w+LRSeag@mail.gmail.com>

On Sun, Jan 6, 2013 at 8:20 PM, Laurens Van Houtven <_ at lvh.cc> wrote:
> Hi Nick,
>
>
> When you say "high latency" (in __exit__), what does "high" mean? Is that
> order of magnitude what __exit__ usually means now, or network IO included?
>
> (Use case: distributed locking and remotely stored locks: it doesn't take a
> long time on network scales, but it can take a long time on CPU scales.)

The status quo can only be made to work for in-memory locks. If the
release step involves network access, then it's closer to the
"database transaction" use case, because the __exit__ method may need
to block.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From nepenthesdev at gmail.com  Sun Jan  6 16:45:52 2013
From: nepenthesdev at gmail.com (Markus)
Date: Sun, 6 Jan 2013 16:45:52 +0100
Subject: [Python-ideas] PEP 3156 - Asynchronous IO Support Rebooted
In-Reply-To: <CAP7+vJJgrYHUmbo27dgEtReboESTbyRKTPVU+TdDcucT+_AQVw@mail.gmail.com>
References: <CACEGMv8FiCMCDRLjfPWJPv2jdoJpe9i_3uOYDoNUcVN97MXw1A@mail.gmail.com>
	<CAP7+vJJf1PAJzu7DbO5MB4CRi6bFOuwKijBPjznE1ArhRMd9bg@mail.gmail.com>
	<CACEGMv_gxUQ2E9pU3_UJp_QkH1VEzw9+tTgv5OCNANK0Qircng@mail.gmail.com>
	<CAP7+vJJgrYHUmbo27dgEtReboESTbyRKTPVU+TdDcucT+_AQVw@mail.gmail.com>
Message-ID: <CACEGMv8EaCzuZL_wu1ESFUjKMo-a15799SR=BfroUWB0vME1WQ@mail.gmail.com>

Hi,

> Do you have a suggestion for a minimal interface for signal handling?
> I could imagine the following:
>
> Note that Python only receives signals in the main thread, and the
> effect may be undefined if the event loop is not running in the main
> thread, or if more than one event loop sets a handler for the same
> signal. It also can't work for signals directed to a specific thread
> (I think POSIX defines a few of these, but I don't know of any support
> for these in Python.)

Exactly - signals are a mess, and threading makes them worse.
I'm no expert here, but I have experienced exactly the problems you
describe.
Creating the threads after installing the signal handlers (in the main
thread) works, and signals get delivered to the main thread;
installing the signal handlers (in the main thread) after creating the
threads, the signals ended up in *some* thread.
Additionally, it depended on whether you installed your signal handler
with signal() or sigaction(), and on the flags in effect when creating
the threads.
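
The ordering described above can be made concrete with a small sketch
(assumptions: CPython runs Python-level signal handlers in the main
thread; the handler and worker names here are illustrative only):

```python
import signal
import threading

received = []

def handle_term(signum, frame):
    # CPython executes Python-level handlers in the main thread,
    # regardless of which thread the OS delivered the signal to.
    received.append(signum)

# Install handlers in the main thread *before* spawning workers --
# the ordering that worked reliably in practice.
signal.signal(signal.SIGTERM, handle_term)

workers = [threading.Thread(target=lambda: None) for _ in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```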

>> Supporting IOCP on windows is absolutely required, as WSAPoll is
>> broken and won't be fixed.
>> http://social.msdn.microsoft.com/Forums/hu/wsk/thread/18769abd-fca0-4d3c-9884-1a38ce27ae90
>
> Wow. Now I'm even more glad that we're planning to support IOCP.

tulip already has a workaround:
http://code.google.com/p/tulip/source/browse/tulip/unix_events.py#244

>> libuv is a wrapper around libev -adding IOCP- which adds some other
>> things besides an event loop and is developed for/used in node.js.
>
> Ah, that's helpful. I did not realize this after briefly skimming the
> libuv page. (And the github logs suggest that it may no longer be the
> case: https://github.com/joyent/libuv/commit/1282d64868b9c560c074b9c9630391f3b18ef633

Okay, they moved to libngx - the nginx core library; obviously I missed this.

>> Handler - the best example for not re-using terms.
>
> ??? (Can't tell if you're sarcastic or agreeing here.)

sarcastic.

>> Fine, if you include transports, I'll pick on the transports as well ;)
>
> ??? (Similar.)

Not sarcastic.

>> Note: In libev only the "default event loop" can have timers.
>
> Interesting. This seems an odd constraint.

I'm wrong - discard. This limitation referred to watchers for child processes.

>> EventLoop
>>  - call_soon_threadsafe(callback, *args) - it would be better to have
> Not sure I understand. PEP 3156/Tulip uses a self-pipe to prevent race
> conditions when call_soon_threadsafe() is called from a signal handler
> or other thread(*) -- but I don't know if that is relevant or not.

ev_async is a self-pipe too.

> (*) http://code.google.com/p/tulip/source/browse/tulip/unix_events.py#448
> and http://code.google.com/p/tulip/source/browse/tulip/unix_events.py#576
>
>>  - getaddrinfo(host, port, family=0, type=0, proto=0, flags=0) - libev
>> does not do dns
>>  - getnameinfo(sockaddr, flags=0) - libev does not do dns
>
> Note that these exist at least in part so that an event loop
> implementation may *choose* to implement its own DNS handling (IIUC
> Twisted has this), whereas the default behavior is just to run
> socket.getaddrinfo() -- but in a separate thread because it blocks.
> (This is a useful test case for run_in_executor() too.)

I'd expect the EventLoop never to create threads on its own behalf;
it's just wrong.
If you can't provide some functionality without threads, don't provide
the functionality.

Besides, getaddrinfo() is a bad choice, as it relies on
distribution-specific flags.
For example, the IPv6 link-local scope exists on every current
platform, but when resolving a link-local address (not a domain name)
with getaddrinfo, the call fails on Debian/Ubuntu if no globally
routed IPv6 address is available.

>> As Transport are part of the PEP - some more:
>>
>> EventLoop
>>  * create_transport(protocol_factory, host, port, **kwargs)
>>    kwargs requires "local" - local address as tuple like
>> ('fe80::14ad:1680:54e1:6a91%eth0',0) - so you can bind when using ipv6
>> link local scope.
>>   or ('192.168.2.1',5060) - bind local port for udp
>
> Not sure I understand. What socket.connect() (or other API) call
> parameters does this correspond to? What can't expressed through the
> host and port parameters?

In case you have multiple interfaces and multiple gateways, you need
to assign the connection to an address, so the kernel knows which
interface to use for the connection - else it would default to "the
first" interface.
In the IPv6 link-local scope you can have multiple addresses in the
same subnet fe80:: - IIRC if you want to connect somewhere, you have
to either set the scope_id of the remote, or bind the "source" address
first. I don't know how to set the scope_id in Python; it's in
sockaddr_in6.

In terms of sockets, it is a bind before a connect.

import socket

s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM, 0)
s.bind(('fe80::1', 0))        # bind the source address first...
s.connect(('fe80::2', 4712))  # ...so the kernel picks the right interface

same for ipv4 in case you are multi homed and rely on source based routing.

>> Handler:
>> Requiring 2 handlers for every active connection r/w is highly ineffective.
>
> How so? What is the concern?

Of course you can fold the fdsets, but in case you need a separate
handler for write, you re-create it for every write - see below.

>> Additionally, I can .stop() the handler without having to know the fd,
>> .stop() the handler, change the events the handler is looking for,
>> restart the handler with .start().
>> In your proposal, I'd create a new handler every time I want to sent
>> something, poll for readability - discard the handler when I'm done,
>> create a new one for the next sent.
>
> The questions are, does it make any difference in efficiency (when
> using Python -- the performance of the C API is hardly relevant here),
> and how often does this pattern occur.

Every time you send, you poll for writability, you get the callback,
you write; when you have nothing left, you stop polling for
writability.
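
That poll-for-writability cycle looks roughly like this with the
stdlib selectors module (a sketch of the pattern under discussion, not
of the PEP 3156 API; a socketpair stands in for a real connection):

```python
import selectors
import socket

sel = selectors.DefaultSelector()
left, right = socket.socketpair()
left.setblocking(False)

outbuf = bytearray(b"hello")

def on_writable(sock):
    # Send what the kernel will take; once the buffer drains,
    # stop watching for writability -- this per-send register/
    # unregister churn is the handler cost being discussed.
    sent = sock.send(outbuf)
    del outbuf[:sent]
    if not outbuf:
        sel.unregister(sock)

# Register for EVENT_WRITE only while there is pending data.
sel.register(left, selectors.EVENT_WRITE, on_writable)
while outbuf:
    for key, _mask in sel.select(timeout=1):
        key.data(key.fileobj)

echoed = right.recv(16)
sel.close()
left.close()
right.close()
```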

>> Timers:
>> ...
>> Timer.stop()
>> Timer.set(5)
>> Timer.start()
>
> Actually it's one less call using the PEP's proposed API:
>
> timer.cancel()
> timer = loop.call_later(5, callback)

My example was ill-chosen - the problem exists for both of us: how do
we know it's 5 seconds?
timer.restart() or timer.again()

The timer could remember its interval; otherwise you have to store the
interval somewhere, next to the timer.
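
A sketch of the libev-style alternative, where the timer object
remembers its interval (RestartableTimer and again() are hypothetical
names; call_later is assumed to behave as in the PEP):

```python
class RestartableTimer:
    """Wraps a loop's call_later so the protocol can say
    timer.again() without storing the interval elsewhere."""

    def __init__(self, loop, interval, callback):
        self._loop = loop
        self.interval = interval      # remembered for rearming
        self._callback = callback
        self._handle = None

    def start(self):
        self._handle = self._loop.call_later(self.interval, self._callback)

    def again(self):
        # libev-style: cancel and rearm with the remembered interval.
        self.cancel()
        self.start()

    def cancel(self):
        if self._handle is not None:
            self._handle.cancel()
            self._handle = None
```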

> Which of the two idioms is faster? Who knows? libev's pattern is
> probably faster in C, but that has little to bear on the cost in
> Python. My guess is that the amount of work is about the same -- the
> real cost is that you have to make some changes the heap used to keep
> track of all timers in the order in which they will trigger, and those
> changes are the same regardless of how you style the API.

Speed: nothing is fast in all circumstances - for example, select is
faster than epoll for small numbers of sockets.
Let's look at usability instead.

>> Transports:
>> I think SSL should be a Protocol not a transport - implemented using BIO pairs.
>> If you can chain protocols, like Transport / ProtocolA / ProtocolB you can have
>> TCP / SSL / HTTP as https or  TCP / SSL / SOCKS /  HTTP as https via
>> ssl enabled socks proxy without having to much problems. Another
>> example, shaping a connection TCP / RATELIMIT / HTTP.
>
> Interesting idea. This may be up to the implementation -- not every
> implementation may have BIO wrappers available (AFAIK the stdlib
> doesn't),

Right, for SSL BIOs pyopenssl is required - or ctypes.

> So maybe we can visualise this as T1 <-->
> P2:T2 <--> P3:T3 <--> P4.

Yes, exactly.

>> Having SSL as a Protocol allows closing the SSL connection without
>> closing the TCP connection, re-using the TCP connection, re-using a
>> SSL session cookie during reconnect of the SSL Protocol.
>
> That seems a pretty esoteric use case (though given your background in
> honeypots maybe common for you :-). It also seems hard to get both
> sides acting correctly when you do this (but I'm certainly no SSL
> expert -- I just want it supported because half the web is
> inaccessible these days if you don't speak SSL, regardless of whether
> you do any actual verification).

Well, a proper shutdown is not an SSL protocol requirement; closing
the connection hard saves some cycles, so it pays off not to do it
right in large-scale deployments - such as Google's.
Nevertheless, doing SSL properly can help, as it allows distinguishing
connection-reset errors from a proper shutdown.

> The only concern I have, really, is that the PEP currently hints that
> both protocols and transports might have pause() and resume() methods
> for flow control, where the protocol calls transport.pause() if
> protocol.data_received() is called too frequently, and the transport
> calls protocol.pause() if transport.write() has buffered more data
> than sensible. But for an object that is both a protocol and a
> transport, this would make it impossible to distinguish between
> pause() calls by its left and right neighbors. So maybe the names must
> differ. Given the tendency of transport method names to be shorter
> (e.g. write()) vs. the longer protocol method names (data_received(),
> connection_lost() etc.), perhaps it should be transport.pause() and
> protocol.pause_writing() (and similar for resume()).

Protocol.data_received - rename to Protocol.io_in
Protocol.io_out - in case the transport's out buffer is empty
(instead of Protocol.next_layer_is_empty())
Protocol.pause_io_out - in case the transport wants to stop the
protocol from sending more, as the out buffer is already crowded
Protocol.resume_io_out - in case the transport wants to inform the
protocol that the out buffer can take some more bytes again

For the Protocol limiting the amount of data received:
Transport.pause -> Transport.pause_io_in
Transport.resume -> Transport.resume_io_in

or drop the "_io" from the names: "(pause|resume)_(in|out)"

>>  * reconnect() - I'd love to be able to reconnect a transport
>
> But what does that mean in general? It depends on the protocol (e.g.
> FTP, HTTP, IRC, SMTP) how much state must be restored/renegotiated
> upon a reconnect, and how much data may have to be re-sent. This seems
> a higher-level feature that transports and protocols will have to
> implement themselves.

I don't need the EventLoop to sync my state upon reconnect - just have
the Transport provide the ability.
Protocols are free to use it, but do not have to.

>> Now, in case we connect to a host by name, and have multiple addresses
>> resolved, and the first connection can not be established, there is no
>> way to 'reconnect()' - as the protocol does not yet exist.
>
> Twisted suggested something here which I haven't implemented yet but
> which seems reasonable -- using a series of short timeouts try
> connecting to the various addresses and keep the first one that
> connects successfully. If multiple addresses connect after the first
> timeout, too bad, just close the redundant sockets, little harm is
> done (though the timeouts should be tuned that this is relatively
> rare, because a server may waste significant resources on such
> redundant connects).

Fast, yes - reasonable, no.
How would you feel if web browsers behaved like this?
The domain name has to be resolved, the addresses ordered according to
RFC X (which says prefer IPv6, etc.), then tried in order.

>> For almost all the timeouts I mentioned - the protocol needs to take
>> care - so the protocol has to exist before the connection is
>> established in case of outbound connections.
>
> I'm not sure I follow. Can you sketch out some code to help me here?
> ISTM that e.g. the DNS, connect and handshake timeouts can be
> implemented by the machinery that tries to set up the connection
> behind the scenes, and the user's protocol won't know anything of
> these shenanigans. The code that calls create_transport() (actually
> it'll probably be renamed create_client()) will just get a Future that
> either indicates success (and then the protocol and transport are
> successfully hooked up) or an error (and then no protocol was created
> -- whether or not a transport was created is an implementation
> detail).

From my understanding, the Future does not provide any information
about which connection to which host, using which protocol and
credentials, failed?
I'd create the Protocol when trying to create a connection, so the
Protocol is informed when the Transport fails and can take action -
retry, whatever.

>> In case a connection is lost and reconnecting is required -
>> .reconnect() is handy, so the protocol can request reconnecting.
>
> I'd need more details of how you would like to specify this.

Transport
 * is closed by remote
 * connecting the remote failed
 * resolving the domain name failed

have to inform the protocol about the failure - and if the Protocol
changes the Transport's state to "reconnect", the Transport creates a
"reconnect timer of N seconds" and retries connecting then.

It is up to the protocol to log in, clean its state and start fresh,
or log in and regain its old state by issuing the commands required to
get there. For FTP, this would be changing the cwd.

>> As this does not work with the current Protocols callbacks I propose
>> Protocols.connection_established() therefore.
>
> How does this differ from connection_made()?

If you create the Protocol before the connection is established, you
may want to distinguish between _made() and _established().
You cannot distinguish using __init__, as it may lack the Transport arg.

> (I'm trying to follow Twisted's guidance here, they seem to have the
> longest experience doing these kinds of things. When I talked to Glyph
> IIRC he was skeptical about reconnecting in general.)

Point is - connections don't last forever, even if we want them to.
If the transport supports "reconnect", it is still up to the protocol
to support it or not.
If a Protocol gets disconnected and wants to reconnect - without the
Transport supporting .reconnect() - the protocol has to know its
factory.

>>  + connection_established()
>>  + timeout_dns()
>>  + timeout_idle()
>>  + timeout_connecting()
>
> Signatures please?

 + connection_established(self, transport)
The connection is established - in your proposal this is
connection_made, which I disagree with due to the lack of context in
the Futures. Returns None.

 + timeout_dns(self)
Resolving the domain name failed - the Protocol can .reconnect() for
another try. Returns None.

 + timeout_idle(self)
The connection was idle for some time - send a higher-layer keep-alive
or close the connection. Returns None.

 + timeout_connecting(self)
Connecting timed out - the Protocol can .reconnect() for another try.
Returns None.

>>  * data_received(data) - if it was possible to return the number of
>> bytes consumed by the protocol, and have the Transport buffer the rest
>> for the next io in call, one would avoid having to do this in every
>> Protocol on it's own - learned from experience.
>
> Twisted has a whole slew of protocol implementation subclasses that
> implement various strategies like line-buffering (including a really
> complex version where you can turn the line buffering on and off) and
> "netstrings". I am trying to limit the PEP's size by not including
> these, but I fully expect that in practice a set of useful protocol
> implementations will be created that handles common cases. I'm not
> convinced that putting this in the transport/protocol interface will
> make user code less buggy: it seems easy for the user code to miscount
> the bytes or not return a count at all in a rarely taken code branch.

Please don't drop this.

You never know how much data you'll receive, and you never know how
much data you need for a message, so the Protocol needs a buffer.
Having this input buffer in the Transport lets every Protocol benefit:
a Protocol tries to read a message from the data passed to
data_received(), and if the data received is not sufficient for a full
message, it has to buffer it and wait for more data.
If Protocol.data_received() instead returns the number of bytes the
Protocol could process, the Transport can do that job, saving it for
every Protocol.
Still, a protocol can have its own buffering strategy - e.g. an
incremental XML parser which does its own buffering - and always
return len(data), so the Transport does not buffer anything.
If the size returned by the Protocol is less than the size of the
buffer given to it, the Transport erases only the consumed bytes from
the buffer; if the returned length matches the size of the buffer, it
erases the whole buffer.
With nonblocking IO this buffering has to be done by every protocol;
if the Transport takes care of it, the data_received() method of the
Protocol does not need to bother.

A benefit for every protocol.

Otherwise, every Protocol.data_received method starts with self.buffer
+= data and ends with self.buffer = self.buffer[consumed:]

You can even default to use a return value of None like len(data).

If you want to be fancy, you could even keep passing the data to the
Protocol as long as the protocol can consume data and there is data
left. This way a protocol's data_received can focus on processing a
single message: if more than a single message is contained in the
data, it will get the remainder again (as it returned > 0); if there
is no complete message left in the data, it will return 0.

This really assists when writing protocols, and as every protocol
needs it, have it in Transport.
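
A minimal sketch of this proposed contract (Markus's suggestion, not
what the PEP specifies; the class and method names here are
hypothetical): data_received() returns the number of bytes it
consumed, and the transport keeps the remainder for the next call.

```python
class BufferingTransportMixin:
    """data_received() returns how many bytes it consumed (or None,
    meaning all of it); the transport buffers the unconsumed tail."""

    def __init__(self, protocol):
        self._protocol = protocol
        self._inbuf = b""

    def _feed(self, data):
        self._inbuf += data
        # Keep handing the buffer to the protocol while it makes
        # progress, so one read can yield several messages.
        while self._inbuf:
            consumed = self._protocol.data_received(self._inbuf)
            if consumed is None:          # default: took everything
                consumed = len(self._inbuf)
            if consumed == 0:             # incomplete message: wait
                break
            self._inbuf = self._inbuf[consumed:]
```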


>>  * eof_received()/connection_lost(exc) - a connection can be closed
>> clean recv()=0, unclean recv()=-1, errno, SIGPIPE when writing and in
>> case of SSL even more, it is required to distinguish.
>
> Well, this is why eof_received() exists -- to indicate a clean close.
> We should never receive SIGPIPE (Python disables this signal, so you
> always get the errno instead). According to Glyph, SSL doesn't support
> sending eof, so you have to use Content-length or a chunked encoding.
> What other conditions do you expect from SSL that wouldn't be
> distinguished by the exception instance passed to connection_lost()?

It depends on the implementation of SSL (bio/fd Transport/Protocol):
SSL_ERROR_SYSCALL and, less likely, SSL_ERROR_SSL.
In the case of stacking TCP / SSL / HTTP, an SSL service rejecting a
client certificate for login is - to me - a connection_lost too.

>>  + nextlayer_is_empty() - called if the Transport (or underlying
>> Protocol in case of chaining) write buffer is empty
>
> That's what the pause()/resume() flow control protocol is for. You
> read the file (presumably it's a file) in e.g. 16K blocks and call
> write() for each block; if the transport can't keep up and exceeds its
> buffer space, it calls protocol.pause() (or perhaps
> protocol.pause_writing(), see discussion above).

I'd still love a callback for "we are empty".
Protocol.io_out - maybe the name changes your mind?

>> Next, what happens if a dns can not be resolved, ssl handshake (in
>> case ssl is transport) or connecting fails - in my opinion it's an
>> error the protocol is supposed to take care of
>>  + error_dns
>>  + error_ssl
>>  + error_connecting
>
> The future returned by create_transport() (aka create_client()) will
> raise the exception.

When do I get this exception - EventLoop.run() raises it?
And does this exception carry all the information required to retry
connecting?
Say I want to reconnect after 20s in case of a DNS error: when the
Future raises, I inspect the exception and call_later() a callback
which calls create_transport() again? - compared to
Transport.reconnect() from the Protocol, that is not really easier.


MfG
Markus


From guido at python.org  Sun Jan  6 17:24:07 2013
From: guido at python.org (Guido van Rossum)
Date: Sun, 6 Jan 2013 08:24:07 -0800
Subject: [Python-ideas] Yielding through context managers
In-Reply-To: <CADiSq7ePm4dg=fj3CmUL195297-KNYzMHN76R9xkb2w+LRSeag@mail.gmail.com>
References: <CABZk45x13SV4tt=dP0cf79UaM7+D7q2EYPyXpuYGVW1G98kMxQ@mail.gmail.com>
	<CABZk45xur6q-=V681QFXuWNunYqPun10+WJm5QX8QzruCur7oA@mail.gmail.com>
	<CAP7+vJJXBmpqF+Su2=WFy5HjtRwPfegz4S=ZG7fqDoXuZM7Wpw@mail.gmail.com>
	<CADiSq7fRyzVVZLyvuU3FV2GL0oZ987UQgtDef41Z77KDgB1rKQ@mail.gmail.com>
	<CAE_Hg6bLHOn-L7ithXMXL5w8JF7E85gJKuwu0yPqVomo=-Msrg@mail.gmail.com>
	<CADiSq7ePm4dg=fj3CmUL195297-KNYzMHN76R9xkb2w+LRSeag@mail.gmail.com>
Message-ID: <CAP7+vJLOtuOd5qPWUFZAzZEYDddY5xFRKQjbck2y97DC3XXjrQ@mail.gmail.com>

On Sunday, January 6, 2013, Nick Coghlan wrote:

> On Sun, Jan 6, 2013 at 8:20 PM, Laurens Van Houtven <_ at lvh.cc> wrote:
> > Hi Nick,
> >
> >
> > When you say "high latency" (in __exit__), what does "high" mean? Is that
> > order of magnitude what __exit__ usually means now, or network IO
> included?
> >
> > (Use case: distributed locking and remotely stored locks: it doesn't
> take a
> > long time on network scales, but it can take a long time on CPU scales.)
>
> The status quo can only be made to work for in-memory locks. If the
> release step involves network access, then it's closer to the
> "database transaction" use case, because the __exit__ method may need
> to block.


But you don't need to wait for the release. You can do that asynchronously.

Also, have you given the implementation of your 'yielding' proposal any
thought yet?


>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com <javascript:;>   |   Brisbane,
> Australia
>


-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130106/f05b2d23/attachment.html>

From solipsis at pitrou.net  Sun Jan  6 17:25:38 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 6 Jan 2013 17:25:38 +0100
Subject: [Python-ideas] PEP 3156 - Asynchronous IO Support Rebooted
References: <CACEGMv8FiCMCDRLjfPWJPv2jdoJpe9i_3uOYDoNUcVN97MXw1A@mail.gmail.com>
	<CAP7+vJJf1PAJzu7DbO5MB4CRi6bFOuwKijBPjznE1ArhRMd9bg@mail.gmail.com>
	<CACEGMv_gxUQ2E9pU3_UJp_QkH1VEzw9+tTgv5OCNANK0Qircng@mail.gmail.com>
	<CAP7+vJJgrYHUmbo27dgEtReboESTbyRKTPVU+TdDcucT+_AQVw@mail.gmail.com>
	<CACEGMv8EaCzuZL_wu1ESFUjKMo-a15799SR=BfroUWB0vME1WQ@mail.gmail.com>
Message-ID: <20130106172538.1a0d563b@pitrou.net>

On Sun, 6 Jan 2013 16:45:52 +0100
Markus <nepenthesdev at gmail.com> wrote:
> >> Transports:
> >> I think SSL should be a Protocol not a transport - implemented using BIO pairs.
> >> If you can chain protocols, like Transport / ProtocolA / ProtocolB you can have
> >> TCP / SSL / HTTP as https or  TCP / SSL / SOCKS /  HTTP as https via
> >> ssl enabled socks proxy without having to much problems. Another
> >> example, shaping a connection TCP / RATELIMIT / HTTP.
> >
> > Interesting idea. This may be up to the implementation -- not every
> > implementation may have BIO wrappers available (AFAIK the stdlib
> > doesn't),
> 
> Right, for ssl bios pyopenssl is required - or ctypes.

Or a patch to Python 3.4.
See http://docs.python.org/devguide/

By the way, how does "SSL as a protocol" deal with SNI? How does the
HTTP layer tell the SSL layer which servername to indicate?
Or, on the server-side, how would the SSL layer invoke the HTTP layer's
servername callback?

> > (I'm trying to follow Twisted's guidance here, they seem to have the
> > longest experience doing these kinds of things. When I talked to Glyph
> > IIRC he was skeptical about reconnecting in general.)
> 
> Point is - connections don't last forever, even if we want them to.
> If the transport supports "reconnect" - it is still upto the protocol
> to either support it or not.
> If a Protocol gets disconnected and wants to reconnect -without the
> Transport supporting .reconnect()- the protocol has to know it's
> factory.

+1 to this.

>  + connection_established(self, transport)
> the connection is established - in your proposal it is connection_made
> which I disagree with due to the lack of context in the Futures,
> returns None
> 
>  + timeout_dns(self)
> Resolving the domain name failed - Protocol can .reconnect() for
> another try. returns None
> 
>  + timeout_idle(self)
> connection was idle for some time - send a high layer keep alive or
> close the connection - returns None
> 
>  + timeout_connecting(self)
> connection timed out connection - Protocol can .reconnect() for
> another try, returns None

I would rather have connection_failed(self, exc).
(where exc can be a OSError or a socket.timeout)

> >>  * data_received(data) - if it was possible to return the number of
> >> bytes consumed by the protocol, and have the Transport buffer the rest
> >> for the next io in call, one would avoid having to do this in every
> >> Protocol on it's own - learned from experience.
> >
> > Twisted has a whole slew of protocol implementation subclasses that
> > implement various strategies like line-buffering (including a really
> > complex version where you can turn the line buffering on and off) and
> > "netstrings". I am trying to limit the PEP's size by not including
> > these, but I fully expect that in practice a set of useful protocol
> > implementations will be created that handles common cases. I'm not
> > convinced that putting this in the transport/protocol interface will
> > make user code less buggy: it seems easy for the user code to miscount
> > the bytes or not return a count at all in a rarely taken code branch.
> 
> Please don't drop this.
> 
> You never know how much data you'll receive, you never know how much
> data you need for a message, so the Protocol needs a buffer.
> Having this io in buffer in the Transports allows every Protocol to
> benefit, they try to read a message from the data passed to
> data_received(), if the data received is not sufficient to create a
> full message, they need to buffer it and wait for more data.

Another solution for every Protocol to benefit is to provide a bunch of
base Protocol implementations, as Twisted does: LineReceiver, etc.

Your proposed solution (returning the number of consumed bytes) implies
a lot of slicing and concatenation of immutable bytes objects inside
the Transport, which may be quite inefficient.
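
One way to soften that cost (a sketch, not a stdlib API; all names
here are made up) is a mutable buffer that consumes by advancing an
offset and compacts only occasionally, instead of reslicing an
immutable bytes object on every data_received() round:

```python
class ByteBuffer:
    """Amortised input buffer: appends extend a bytearray,
    consumption just moves an offset, and the dead prefix is
    discarded lazily rather than on every consume."""

    def __init__(self):
        self._buf = bytearray()
        self._pos = 0

    def feed(self, data):
        self._buf += data

    def peek(self):
        # A memoryview avoids copying the pending bytes.
        return memoryview(self._buf)[self._pos:]

    def consume(self, n):
        self._pos += n
        # Compact only once the dead prefix dominates the buffer.
        if self._pos > 4096 and self._pos * 2 > len(self._buf):
            del self._buf[:self._pos]
            self._pos = 0
```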

Regards

Antoine.




From dreamingforward at gmail.com  Sun Jan  6 19:01:33 2013
From: dreamingforward at gmail.com (Mark Adam)
Date: Sun, 6 Jan 2013 12:01:33 -0600
Subject: [Python-ideas] Vigil
Message-ID: <CAMjeLr8CqHd_Kph2K-90J_uq1sPnpJp6zL8yib+KYyvcVtLLxA@mail.gmail.com>

There's an interesting Python "variant" (more of an overlay, actually)
that is rather intriguing on github -- Vigil: a truly safe programming
language.

From the readme:

"Infinitely more important than mere syntax and semantics are its
addition of supreme moral vigilance. This is similar to contracts, but
less legal and more medieval."

http://github.com/munificent/vigil

Mark


From ubershmekel at gmail.com  Sun Jan  6 21:08:40 2013
From: ubershmekel at gmail.com (Yuval Greenfield)
Date: Sun, 6 Jan 2013 22:08:40 +0200
Subject: [Python-ideas] Vigil
In-Reply-To: <CAMjeLr8CqHd_Kph2K-90J_uq1sPnpJp6zL8yib+KYyvcVtLLxA@mail.gmail.com>
References: <CAMjeLr8CqHd_Kph2K-90J_uq1sPnpJp6zL8yib+KYyvcVtLLxA@mail.gmail.com>
Message-ID: <CANSw7KwJKFcHjnC1vAH185Okr7cZ3uNvPMAv+kfc49eo-1tDug@mail.gmail.com>

On Sun, Jan 6, 2013 at 8:01 PM, Mark Adam <dreamingforward at gmail.com> wrote:

> There's an interesting python "variant" (more of an overlay actually)
> that is rather intriguing on github -- Vigil:  a truly safe programming
> language.
>
>
It's a joke language that deletes code when an assert fails. Python-ideas
really isn't the place to post this. Try out http://www.reddit.com/r/python
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130106/d285dbc4/attachment.html>

From dreamingforward at gmail.com  Sun Jan  6 21:44:09 2013
From: dreamingforward at gmail.com (Mark Adam)
Date: Sun, 6 Jan 2013 14:44:09 -0600
Subject: [Python-ideas] Vigil
In-Reply-To: <CANSw7KwJKFcHjnC1vAH185Okr7cZ3uNvPMAv+kfc49eo-1tDug@mail.gmail.com>
References: <CAMjeLr8CqHd_Kph2K-90J_uq1sPnpJp6zL8yib+KYyvcVtLLxA@mail.gmail.com>
	<CANSw7KwJKFcHjnC1vAH185Okr7cZ3uNvPMAv+kfc49eo-1tDug@mail.gmail.com>
Message-ID: <CAMjeLr-XRGxJUU-NHpdXUhuJyk1v5CDnwt-KAsoksnj_JDRa_g@mail.gmail.com>

On Sun, Jan 6, 2013 at 2:08 PM, Yuval Greenfield <ubershmekel at gmail.com> wrote:
> On Sun, Jan 6, 2013 at 8:01 PM, Mark Adam <dreamingforward at gmail.com> wrote:
>>
>> There's an interesting python "variant" (more of an overlay actually)
>> that is rather intriguing on github -- Vigil:  a truly safe programming
>> language.
>>
>
> It's a joke language that deletes code when an assert fails. Python-ideas
> really isn't the place to post this. Try out http://www.reddit.com/r/python

Yeah, I sort of got that, but imagine: in a multi-user p2p environment
(the internet "global brain"), it could be a way to enforce policy
across the network.   I know the list policy, but I rather like the
keywords it uses to expand on the language.  By making the programmer
encode expectations, the multiprocessing code doesn't have to work so
hard with exception handling.

mark


From nepenthesdev at gmail.com  Sun Jan  6 21:46:04 2013
From: nepenthesdev at gmail.com (Markus)
Date: Sun, 6 Jan 2013 21:46:04 +0100
Subject: [Python-ideas] PEP 3156 - Asynchronous IO Support Rebooted
In-Reply-To: <20130106172538.1a0d563b@pitrou.net>
References: <CACEGMv8FiCMCDRLjfPWJPv2jdoJpe9i_3uOYDoNUcVN97MXw1A@mail.gmail.com>
	<CAP7+vJJf1PAJzu7DbO5MB4CRi6bFOuwKijBPjznE1ArhRMd9bg@mail.gmail.com>
	<CACEGMv_gxUQ2E9pU3_UJp_QkH1VEzw9+tTgv5OCNANK0Qircng@mail.gmail.com>
	<CAP7+vJJgrYHUmbo27dgEtReboESTbyRKTPVU+TdDcucT+_AQVw@mail.gmail.com>
	<CACEGMv8EaCzuZL_wu1ESFUjKMo-a15799SR=BfroUWB0vME1WQ@mail.gmail.com>
	<20130106172538.1a0d563b@pitrou.net>
Message-ID: <CACEGMv8xU-nFypg3D=OAwW+YsFfKeHgwQxy1RRYHnmmFjJRLkg@mail.gmail.com>

Hi,

On Sun, Jan 6, 2013 at 5:25 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Sun, 6 Jan 2013 16:45:52 +0100
> Markus <nepenthesdev at gmail.com> wrote:
>>
>> Right, for ssl bios pyopenssl is required - or ctypes.
>
> Or a patch to Python 3.4.
> See http://docs.python.org/devguide/

Or discuss merging pyopenssl.

> By the way, how does "SSL as a protocol" deal with SNI? How does the
> HTTP layer tell the SSL layer which servername to indicate?
SSL_set_tlsext_host_name

> Or, on the server-side, how would the SSL layer invoke the HTTP layer's
> servername callback?

callback - set via
SSL_CTX_set_tlsext_servername_callback
SSL_CTX_set_tlsext_servername_arg


> I would rather have connection_failed(self, exc).
> (where exc can be a OSError or a socket.timeout)

I'd prefer a single callback per error, which allows preserving
defaults for certain cases when inheriting from Protocol.

>> You never know how much data you'll receive, you never know how much
>> data you need for a message, so the Protocol needs a buffer.
>> Having this io in buffer in the Transports allows every Protocol to
>> benefit, they try to read a message from the data passed to
>> data_received(), if the data received is not sufficient to create a
>> full message, they need to buffer it and wait for more data.
>
> Another solution for every Protocol to benefit is to provide a bunch of
> base Protocol implementations, as Twisted does: LineReceiver, etc.

If your Protocol.data_received gets called repeatedly until there is
nothing left or 0 is returned, then a LineReceiver simply looks for a
\0 or \n in the data, processes that line, and returns the length of
the line, or 0 in case there is no line terminator.

> Your proposed solution (returning the number of consumed bytes) implies
> a lot of slicing and concatenation of immutable bytes objects inside
> the Transport, which may be quite inefficient.

Yes - but it has to be done anyway, so it's just a matter of having
this problem in the stdlib, where it is easy to improve for everybody,
or having everybody else come up with their own implementation as part
of the Protocol.

I'd prefer to have this in Transport therefore - having everybody
benefit from any improvement for free.

Markus


From solipsis at pitrou.net  Sun Jan  6 22:05:39 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 6 Jan 2013 22:05:39 +0100
Subject: [Python-ideas] PEP 3156 - Asynchronous IO Support Rebooted
References: <CACEGMv8FiCMCDRLjfPWJPv2jdoJpe9i_3uOYDoNUcVN97MXw1A@mail.gmail.com>
	<CAP7+vJJf1PAJzu7DbO5MB4CRi6bFOuwKijBPjznE1ArhRMd9bg@mail.gmail.com>
	<CACEGMv_gxUQ2E9pU3_UJp_QkH1VEzw9+tTgv5OCNANK0Qircng@mail.gmail.com>
	<CAP7+vJJgrYHUmbo27dgEtReboESTbyRKTPVU+TdDcucT+_AQVw@mail.gmail.com>
	<CACEGMv8EaCzuZL_wu1ESFUjKMo-a15799SR=BfroUWB0vME1WQ@mail.gmail.com>
	<20130106172538.1a0d563b@pitrou.net>
	<CACEGMv8xU-nFypg3D=OAwW+YsFfKeHgwQxy1RRYHnmmFjJRLkg@mail.gmail.com>
Message-ID: <20130106220539.5d98f416@pitrou.net>

On Sun, 6 Jan 2013 21:46:04 +0100
Markus <nepenthesdev at gmail.com> wrote:
> > By the way, how does "SSL as a protocol" deal with SNI? How does the
> > HTTP layer tell the SSL layer which servername to indicate?
> SSL_set_tlsext_host_name
> 
> > Or, on the server-side, how would the SSL layer invoke the HTTP layer's
> > servername callback?
> 
> callback - set via
> SSL_CTX_set_tlsext_servername_callback
> SSL_CTX_set_tlsext_servername_arg

Right, these are the C OpenSSL APIs. My question was about the
Python protocol / transport level. How can they be exposed?

> > Your proposed solution (returning the number of consumed bytes) implies
> > a lot of slicing and concatenation of immutable bytes objects inside
> > the Transport, which may be quite inefficient.
> 
> Yes - but is has to be done anyway, so it's just a matter of having
> this problem in stdlib, where it is easy to improve for everybody, or
> everybody else has to come up with his own implementation as part of
> Protocol.

Actually, the point is that it doesn't have to be done.

An internal buffering mechanism in a protocol can avoid making many
copies and concatenations (e.g. by using a list or a deque to buffer the
incoming chunks). The transport cannot, since the Protocol API mandates
that data_received() be called with a bytes object representing the
available data.
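The chunk-list approach can be sketched like this — keep references to the incoming bytes objects and join only when a complete message is available, so there is no per-call slicing (the newline-delimited framing is just an example):

```python
from collections import deque

class BufferedProtocol:
    """Accumulate chunks; join only when a full newline-terminated message arrives."""

    def __init__(self):
        self.chunks = deque()
        self.messages = []

    def data_received(self, data):
        self.chunks.append(data)          # no copy: just keep the reference
        if b"\n" not in data:
            return                        # cheap check; nothing complete yet
        joined = b"".join(self.chunks)    # one concatenation per flush
        *complete, rest = joined.split(b"\n")
        self.messages.extend(complete)
        self.chunks.clear()
        if rest:
            self.chunks.append(rest)
```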

Regards

Antoine.




From jkbbwr at gmail.com  Sun Jan  6 22:14:15 2013
From: jkbbwr at gmail.com (Jakob Bowyer)
Date: Sun, 6 Jan 2013 21:14:15 +0000
Subject: [Python-ideas] Vigil
In-Reply-To: <CAMjeLr-XRGxJUU-NHpdXUhuJyk1v5CDnwt-KAsoksnj_JDRa_g@mail.gmail.com>
References: <CAMjeLr8CqHd_Kph2K-90J_uq1sPnpJp6zL8yib+KYyvcVtLLxA@mail.gmail.com>
	<CANSw7KwJKFcHjnC1vAH185Okr7cZ3uNvPMAv+kfc49eo-1tDug@mail.gmail.com>
	<CAMjeLr-XRGxJUU-NHpdXUhuJyk1v5CDnwt-KAsoksnj_JDRa_g@mail.gmail.com>
Message-ID: <CAA+RL7GYF3=ceLG-DiX-bZA-z3HfqTx065cnxqxFOGHTWy0CVA@mail.gmail.com>

But what about constraints on processor time, memory usage, recursion
limit, accept and return types? We are starting to get a bit verbose here.


On Sun, Jan 6, 2013 at 8:44 PM, Mark Adam <dreamingforward at gmail.com> wrote:

> On Sun, Jan 6, 2013 at 2:08 PM, Yuval Greenfield <ubershmekel at gmail.com>
> wrote:
> > On Sun, Jan 6, 2013 at 8:01 PM, Mark Adam <dreamingforward at gmail.com>
> wrote:
> >>
> >> There's an interesting python "variant" (more of an overlay actually)
> >> that is rather intriguing on github -- Vigil:  a truly safe progamming
> >> language.
> >>
> >
> > It's a joke language that deletes code when an assert fails. Python-ideas
> > really isn't the place to post this. Try out
> http://www.reddit.com/r/python
>
> Yeah, I sort of got that, but imagine in a multi-user p2p environment
> (the internet "global brain"), it could be a way to enforce policy
> across the network.   I know list policy, but I rather like the
> keywords it used to expand on the language.  By making the programmer
> encode expectations, the multiprocessing code doesn't have to work so
> hard with exception handling.
>
> mark
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>

From ncoghlan at gmail.com  Mon Jan  7 06:47:25 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 7 Jan 2013 15:47:25 +1000
Subject: [Python-ideas] Yielding through context managers
In-Reply-To: <CAP7+vJLOtuOd5qPWUFZAzZEYDddY5xFRKQjbck2y97DC3XXjrQ@mail.gmail.com>
References: <CABZk45x13SV4tt=dP0cf79UaM7+D7q2EYPyXpuYGVW1G98kMxQ@mail.gmail.com>
	<CABZk45xur6q-=V681QFXuWNunYqPun10+WJm5QX8QzruCur7oA@mail.gmail.com>
	<CAP7+vJJXBmpqF+Su2=WFy5HjtRwPfegz4S=ZG7fqDoXuZM7Wpw@mail.gmail.com>
	<CADiSq7fRyzVVZLyvuU3FV2GL0oZ987UQgtDef41Z77KDgB1rKQ@mail.gmail.com>
	<CAE_Hg6bLHOn-L7ithXMXL5w8JF7E85gJKuwu0yPqVomo=-Msrg@mail.gmail.com>
	<CADiSq7ePm4dg=fj3CmUL195297-KNYzMHN76R9xkb2w+LRSeag@mail.gmail.com>
	<CAP7+vJLOtuOd5qPWUFZAzZEYDddY5xFRKQjbck2y97DC3XXjrQ@mail.gmail.com>
Message-ID: <CADiSq7dmsPCmu7tXiJQWFVhG8Bm2ysvqregdsOn5AzjKVB6eBg@mail.gmail.com>

On Mon, Jan 7, 2013 at 2:24 AM, Guido van Rossum <guido at python.org> wrote:
> On Sunday, January 6, 2013, Nick Coghlan wrote:
>>
>> On Sun, Jan 6, 2013 at 8:20 PM, Laurens Van Houtven <_ at lvh.cc> wrote:
>> > Hi Nick,
>> >
>> >
>> > When you say "high latency" (in __exit__), what does "high" mean? Is
>> > that
>> > order of magnitude what __exit__ usually means now, or network IO
>> > included?
>> >
>> > (Use case: distributed locking and remotely stored locks: it doesn't
>> > take a
>> > long time on network scales, but it can take a long time on CPU scales.)
>>
>> The status quo can only be made to work for in-memory locks. If the
>> release step involves network access, then it's closer to the
>> "database transaction" use case, because the __exit__ method may need
>> to block.
>
> But you don't need to wait for the release. You can do that asynchronously.

Ah, true, I hadn't thought of that. So yes, any case where the
__exit__ method can be "fire-and-forget" is also straightforward to
implement with just PEP 3156. That takes us back to things like
database transactions being the only ones where

> Also, have you given the implementation of your 'yielding' proposal any
> thought yet?

Not in depth. Off the top of my head, I'd suggest:
  - make "yielding" a new kind of node in the grammar (so you can't
write "yielding expr" in arbitrary locations, but only in those that
are marked as allowing it)
  - flag for loops and with statements as accepting these nodes as
iterables and context managers respectively
  - create a new Yielding AST node (with a single Expr node as the child)
  - emit different bytecode in the affected compound statements based
on whether the relevant subnode is an ordinary expression (thus
invoking the special methods as "obj.__method__()") or a yielding one
(thus invoking the special methods as "yield from obj.__method__()").

I'm not seeing any obvious holes in that strategy, but I haven't
looked closely at the compiler code in a while, so there may be
limitations I haven't accounted for.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From nepenthesdev at gmail.com  Mon Jan  7 08:31:51 2013
From: nepenthesdev at gmail.com (Markus)
Date: Mon, 7 Jan 2013 08:31:51 +0100
Subject: [Python-ideas] PEP 3156 - Asynchronous IO Support Rebooted
In-Reply-To: <20130106220539.5d98f416@pitrou.net>
References: <CACEGMv8FiCMCDRLjfPWJPv2jdoJpe9i_3uOYDoNUcVN97MXw1A@mail.gmail.com>
	<CAP7+vJJf1PAJzu7DbO5MB4CRi6bFOuwKijBPjznE1ArhRMd9bg@mail.gmail.com>
	<CACEGMv_gxUQ2E9pU3_UJp_QkH1VEzw9+tTgv5OCNANK0Qircng@mail.gmail.com>
	<CAP7+vJJgrYHUmbo27dgEtReboESTbyRKTPVU+TdDcucT+_AQVw@mail.gmail.com>
	<CACEGMv8EaCzuZL_wu1ESFUjKMo-a15799SR=BfroUWB0vME1WQ@mail.gmail.com>
	<20130106172538.1a0d563b@pitrou.net>
	<CACEGMv8xU-nFypg3D=OAwW+YsFfKeHgwQxy1RRYHnmmFjJRLkg@mail.gmail.com>
	<20130106220539.5d98f416@pitrou.net>
Message-ID: <CACEGMv9-J6PjPti40=ZNLNfWXf0nE3okjCjFhLt+UbAasPm4Gw@mail.gmail.com>

Hi,

On Sun, Jan 6, 2013 at 10:05 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Sun, 6 Jan 2013 21:46:04 +0100
> Markus <nepenthesdev at gmail.com> wrote:
>> > By the way, how does "SSL as a protocol" deal with SNI? How does the
>> > HTTP layer tell the SSL layer which servername to indicate?

Transport.ctrl(name, **kwargs) - if the Transport lacks the queried
control, it has to ask its upper layer.

In case of chains like TCP / SSL / HTTP, SSL can query the hostname
from its Transport - or HTTP can query it.

>> > Or, on the server-side, how would the SSL layer invoke the HTTP layer's
>> > servername callback?

Transport.ctrl(name, **kwargs)

HTTP can query for the name, in case of TCP / SSL / HTTP, SSL may
provide an answer.

> Right, these are the C OpenSSL APIs. My question was about the
> Python protocol / transport level. How can they be exposed?

As queryable attributes of the Transport (or of the Transport-side of
a Protocol, in case of stacking).
For TCP, e.g., it would be handy to store connection-related things in
a defined data structure which keeps the domain, resolved addresses,
and the address used for the current connection together, like
TCP.{local,remote}.{address,addresses,domain,port}

For a client, SSL can query for "TCP.remote.domain" and, in case it is
not an IP address, use it for SNI.
For a server, HTTP can query SSL.server_name_indication.
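Transport.ctrl() and the dotted names are the proposal here, not an existing API; a minimal sketch of that delegation, with unanswered queries passed along the stack:

```python
class Layer:
    """One element of a TCP / SSL / HTTP stack; unanswered queries are delegated."""

    def __init__(self, lower=None, info=None):
        self.lower = lower          # the transport side of this layer
        self.info = info or {}

    def ctrl(self, name, **kwargs):
        if name in self.info:
            return self.info[name]
        if self.lower is not None:
            return self.lower.ctrl(name, **kwargs)   # pass the query along
        raise KeyError(name)

# Client stack: TCP knows the remote domain; SSL queries it for SNI,
# and HTTP can query SSL for the indicated server name.
tcp = Layer(info={"TCP.remote.domain": "example.com"})
ssl_layer = Layer(lower=tcp, info={"SSL.server_name_indication": "example.com"})
http = Layer(lower=ssl_layer)

sni = ssl_layer.ctrl("TCP.remote.domain")            # SSL -> TCP
indicated = http.ctrl("SSL.server_name_indication")  # HTTP -> SSL
```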

> An internal buffering mechanism in a protocol can avoid making many
> copies and concatenations (e.g. by using a list or a deque to buffer the
> incoming chunks). The transport cannot, since the Protocol API mandates
> that data_received() be called with a bytes object representing the
> available data.

A bytes-like object would be much better than bytes for the definition
of data_received then: same semantics, but a list of memoryviews with
an offset, or whatever is required internally.
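"Bytes-like" here could mean memoryview, which lets a transport hand out a window into its buffer without copying; a small illustration of that zero-copy slicing (the HTTP-ish framing is just an example):

```python
buf = bytearray(b"GET / HTTP/1.1\r\nHost: example.com\r\n")
view = memoryview(buf)

# Slicing a memoryview creates a new view, not a copy of the bytes.
header_end = buf.find(b"\r\n")
first_line = view[:header_end]      # zero-copy window onto the first line
remainder = view[header_end + 2:]   # transport keeps this for the next call
```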


MfG
Markus


From guido at python.org  Tue Jan  8 02:06:35 2013
From: guido at python.org (Guido van Rossum)
Date: Mon, 7 Jan 2013 17:06:35 -0800
Subject: [Python-ideas] Yielding through context managers
In-Reply-To: <CADiSq7dmsPCmu7tXiJQWFVhG8Bm2ysvqregdsOn5AzjKVB6eBg@mail.gmail.com>
References: <CABZk45x13SV4tt=dP0cf79UaM7+D7q2EYPyXpuYGVW1G98kMxQ@mail.gmail.com>
	<CABZk45xur6q-=V681QFXuWNunYqPun10+WJm5QX8QzruCur7oA@mail.gmail.com>
	<CAP7+vJJXBmpqF+Su2=WFy5HjtRwPfegz4S=ZG7fqDoXuZM7Wpw@mail.gmail.com>
	<CADiSq7fRyzVVZLyvuU3FV2GL0oZ987UQgtDef41Z77KDgB1rKQ@mail.gmail.com>
	<CAE_Hg6bLHOn-L7ithXMXL5w8JF7E85gJKuwu0yPqVomo=-Msrg@mail.gmail.com>
	<CADiSq7ePm4dg=fj3CmUL195297-KNYzMHN76R9xkb2w+LRSeag@mail.gmail.com>
	<CAP7+vJLOtuOd5qPWUFZAzZEYDddY5xFRKQjbck2y97DC3XXjrQ@mail.gmail.com>
	<CADiSq7dmsPCmu7tXiJQWFVhG8Bm2ysvqregdsOn5AzjKVB6eBg@mail.gmail.com>
Message-ID: <CAP7+vJKn26jNra+YsGW4OSA-OXq-WoGUjxtxMpzr_4P0C+O6rg@mail.gmail.com>

On Sun, Jan 6, 2013 at 9:47 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Mon, Jan 7, 2013 at 2:24 AM, Guido van Rossum <guido at python.org> wrote:
>> On Sunday, January 6, 2013, Nick Coghlan wrote:
>>>
>>> On Sun, Jan 6, 2013 at 8:20 PM, Laurens Van Houtven <_ at lvh.cc> wrote:
>>> > Hi Nick,
>>> >
>>> >
>>> > When you say "high latency" (in __exit__), what does "high" mean? Is
>>> > that
>>> > order of magnitude what __exit__ usually means now, or network IO
>>> > included?
>>> >
>>> > (Use case: distributed locking and remotely stored locks: it doesn't
>>> > take a
>>> > long time on network scales, but it can take a long time on CPU scales.)
>>>
>>> The status quo can only be made to work for in-memory locks. If the
>>> release step involves network access, then it's closer to the
>>> "database transaction" use case, because the __exit__ method may need
>>> to block.
>>
>> But you don't need to wait for the release. You can do that asynchronously.
>
> Ah, true, I hadn't thought of that. So yes, any case where the
> __exit__ method can be "fire-and-forget" is also straightforward to
> implement with just PEP 3156. That takes us back to things like
> database transactions being the only ones where

And 'yielding' wouldn't do anything about this, would it?

>> Also, have you given the implementation of your 'yielding' proposal any
>> thought yet?
>
> Not in depth. Off the top of my head, I'd suggest:
>   - make "yielding" a new kind of node in the grammar (so you can't
> write "yielding expr" in arbitrary locations, but only in those that
> are marked as allowing it)
>   - flag for loops and with statements as accepting these nodes as
> iterables and context managers respectively
>   - create a new Yielding AST node (with a single Expr node as the child)
>   - emit different bytecode in the affected compound statements based
> on whether the relevant subnode is an ordinary expression (thus
> invoking the special methods as "obj.__method__()") or a yielding one
> (thus invoking the special methods as "yield from obj.__method__()").
>
> I'm not seeing any obvious holes in that strategy, but I haven't
> looked closely at the compiler code in a while, so there may be
> limitations I haven't accounted for.

So would 'yielding' insert the equivalent of 'yield from' or the
equivalent of 'yield' in the code?

-- 
--Guido van Rossum (python.org/~guido)


From brian at python.org  Tue Jan  8 04:38:47 2013
From: brian at python.org (Brian Curtin)
Date: Mon, 7 Jan 2013 21:38:47 -0600
Subject: [Python-ideas] FYI - wiki.python.org compromised
Message-ID: <CAD+XWwo4KSsMC+OwKhYv5cJc8uFp5DPU_C1hONmqhdpBpEmx2Q@mail.gmail.com>

On December 28th, an unknown attacker used a previously unknown remote
code exploit on http://wiki.python.org/. The attacker was able to get
shell access as the "moin" user, but no other services were affected.

Some time later, the attacker deleted all files owned by the "moin"
user, including all instance data for both the Python and Jython
wikis. The attacker also had full access to all MoinMoin user data on
all wikis. In light of this, the Python Software Foundation encourages
all wiki users to change their password on other sites if the same one
is in use elsewhere. We apologize for the inconvenience and will post
further news as we bring the new and improved wiki.python.org online.

If you have any questions about this incident please contact
jnoller at python.org. Thank you for your patience.


From ncoghlan at gmail.com  Tue Jan  8 11:13:50 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 8 Jan 2013 20:13:50 +1000
Subject: [Python-ideas] Yielding through context managers
In-Reply-To: <CAP7+vJKn26jNra+YsGW4OSA-OXq-WoGUjxtxMpzr_4P0C+O6rg@mail.gmail.com>
References: <CABZk45x13SV4tt=dP0cf79UaM7+D7q2EYPyXpuYGVW1G98kMxQ@mail.gmail.com>
	<CABZk45xur6q-=V681QFXuWNunYqPun10+WJm5QX8QzruCur7oA@mail.gmail.com>
	<CAP7+vJJXBmpqF+Su2=WFy5HjtRwPfegz4S=ZG7fqDoXuZM7Wpw@mail.gmail.com>
	<CADiSq7fRyzVVZLyvuU3FV2GL0oZ987UQgtDef41Z77KDgB1rKQ@mail.gmail.com>
	<CAE_Hg6bLHOn-L7ithXMXL5w8JF7E85gJKuwu0yPqVomo=-Msrg@mail.gmail.com>
	<CADiSq7ePm4dg=fj3CmUL195297-KNYzMHN76R9xkb2w+LRSeag@mail.gmail.com>
	<CAP7+vJLOtuOd5qPWUFZAzZEYDddY5xFRKQjbck2y97DC3XXjrQ@mail.gmail.com>
	<CADiSq7dmsPCmu7tXiJQWFVhG8Bm2ysvqregdsOn5AzjKVB6eBg@mail.gmail.com>
	<CAP7+vJKn26jNra+YsGW4OSA-OXq-WoGUjxtxMpzr_4P0C+O6rg@mail.gmail.com>
Message-ID: <CADiSq7dJpc_M-H07ZGi0BJsbHh-M+=j7WUKhdV+0BHdC=xDkQw@mail.gmail.com>

On Tue, Jan 8, 2013 at 11:06 AM, Guido van Rossum <guido at python.org> wrote:
> On Sun, Jan 6, 2013 at 9:47 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> Ah, true, I hadn't thought of that. So yes, any case where the
>> __exit__ method can be "fire-and-forget" is also straightforward to
>> implement with just PEP 3156. That takes us back to things like
>> database transactions being the only ones where
>
> And 'yielding' wouldn't do anything about this, would it?

Any new syntax should properly handle the database transaction context
manager problem, otherwise what's the point? The workarounds for
asynchronous __next__ and __enter__ methods aren't too bad - it's
allowing asynchronous __exit__ methods that can only be solved with
new syntax.

>> I'm not seeing any obvious holes in that strategy, but I haven't
>> looked closely at the compiler code in a while, so there may be
>> limitations I haven't accounted for.
>
> So would 'yielding' insert the equivalent of 'yield from' or the
> equivalent of 'yield' in the code?

Given PEP 3156, the most logical would be for it to use "yield from",
since that is becoming the asynchronous equivalent of a normal
function call.

Something like:

    with yielding db.session() as conn:
        # Do stuff here

Could be made roughly equivalent to:

    _async_cm = db.session()
    conn = yield from _async_cm.__enter__()
    try:
        # Use session here
    except Exception as exc:
        # Rollback, then let the exception propagate
        yield from _async_cm.__exit__(type(exc), exc, exc.__traceback__)
        raise
    else:
        # Commit
        yield from _async_cm.__exit__(None, None, None)

Creating a contextlib.contextmanager style decorator for writing such
asynchronous context managers would be difficult, though, as the two
different meanings of "yield" would get in each other's way - you
would need something like "yield EnterResult(expr)" to indicate to
__enter__ in the wrapper object when to stop. It would probably be
easier to just write separate __enter__ and __exit__ methods as
coroutines.
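That "separate coroutine methods" idea can be sketched with plain generators and a trivial driver standing in for the event loop (Session, use_session, and run are all illustrative, not part of any proposed API):

```python
class Session:
    """Context manager whose enter/exit methods are 'yield from'-able coroutines."""

    def __init__(self):
        self.log = []

    def __enter__(self):
        self.log.append("enter")     # pretend this awaited a connection
        return self
        yield                        # unreachable; makes this a generator

    def __exit__(self, exc_type, exc, tb):
        self.log.append("rollback" if exc_type else "commit")
        return False
        yield                        # unreachable; makes this a generator

def use_session(sess):
    # Hand-expanded form of the hypothetical "with yielding sess as conn:".
    conn = yield from sess.__enter__()
    try:
        conn.log.append("body")
    except Exception as exc:
        yield from sess.__exit__(type(exc), exc, exc.__traceback__)
        raise
    else:
        yield from sess.__exit__(None, None, None)

def run(gen):
    """Minimal trampoline: drive a coroutine that never actually waits."""
    try:
        while True:
            next(gen)
    except StopIteration as stop:
        return stop.value
```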

However, note that I just wanted to be clear that I consider the idea
of a syntax for "asynchronous context managers" plausible, and
sketched out a possible design to explain *why* I thought it should be
possible. My focus will stay with PEP 432 until that's done.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From guido at python.org  Tue Jan  8 19:32:00 2013
From: guido at python.org (Guido van Rossum)
Date: Tue, 8 Jan 2013 10:32:00 -0800
Subject: [Python-ideas] Yielding through context managers
In-Reply-To: <CADiSq7dJpc_M-H07ZGi0BJsbHh-M+=j7WUKhdV+0BHdC=xDkQw@mail.gmail.com>
References: <CABZk45x13SV4tt=dP0cf79UaM7+D7q2EYPyXpuYGVW1G98kMxQ@mail.gmail.com>
	<CABZk45xur6q-=V681QFXuWNunYqPun10+WJm5QX8QzruCur7oA@mail.gmail.com>
	<CAP7+vJJXBmpqF+Su2=WFy5HjtRwPfegz4S=ZG7fqDoXuZM7Wpw@mail.gmail.com>
	<CADiSq7fRyzVVZLyvuU3FV2GL0oZ987UQgtDef41Z77KDgB1rKQ@mail.gmail.com>
	<CAE_Hg6bLHOn-L7ithXMXL5w8JF7E85gJKuwu0yPqVomo=-Msrg@mail.gmail.com>
	<CADiSq7ePm4dg=fj3CmUL195297-KNYzMHN76R9xkb2w+LRSeag@mail.gmail.com>
	<CAP7+vJLOtuOd5qPWUFZAzZEYDddY5xFRKQjbck2y97DC3XXjrQ@mail.gmail.com>
	<CADiSq7dmsPCmu7tXiJQWFVhG8Bm2ysvqregdsOn5AzjKVB6eBg@mail.gmail.com>
	<CAP7+vJKn26jNra+YsGW4OSA-OXq-WoGUjxtxMpzr_4P0C+O6rg@mail.gmail.com>
	<CADiSq7dJpc_M-H07ZGi0BJsbHh-M+=j7WUKhdV+0BHdC=xDkQw@mail.gmail.com>
Message-ID: <CAP7+vJKdxJCzKCnoHPVM5O1HqewyLpSG5LfXJGFQ6GGU+xKZnA@mail.gmail.com>

On Tue, Jan 8, 2013 at 2:13 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Tue, Jan 8, 2013 at 11:06 AM, Guido van Rossum <guido at python.org> wrote:
>> On Sun, Jan 6, 2013 at 9:47 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>> Ah, true, I hadn't thought of that. So yes, any case where the
>>> __exit__ method can be "fire-and-forget" is also straightforward to
>>> implement with just PEP 3156. That takes us back to things like
>>> database transactions being the only ones where
>>
>> And 'yielding' wouldn't do anything about this, would it?
>
> Any new syntax should properly handle the database transaction context
> manager problem, otherwise what's the point? The workarounds for
> asynchronous __next__ and __enter__ methods aren't too bad - it's
> allowing asynchronous __exit__ methods that can only be solved with
> new syntax.

Is your idea that if you write "with yielding x as y: blah" this
effectively replaces the calls to __enter__ and __exit__ with "yield
from x.__enter__()" and "yield from x.__exit__()"? (And assigns the
result of "yield from x.__enter__()" to y.)

>>> I'm not seeing any obvious holes in that strategy, but I haven't
>>> looked closely at the compiler code in a while, so there may be
>>> limitations I haven't accounted for.
>>
>> So would 'yielding' insert the equivalent of 'yield from' or the
>> equivalent of 'yield' in the code?
>
> Given PEP 3156, the most logical would be for it to use "yield from",
> since that is becoming the asynchronous equivalent of a normal
> function call.
>
> Something like:
>
>     with yielding db.session() as conn:
>         # Do stuff here
>
> Could be made roughly equivalent to:
>
>     _async_cm = db.session()
>     conn = yield from _async_cm.__enter__()
>     try:
>         # Use session here
>     except Exception as exc:
>         # Rollback, then let the exception propagate
>         yield from _async_cm.__exit__(type(exc), exc, exc.__traceback__)
>         raise
>     else:
>         # Commit
>         yield from _async_cm.__exit__(None, None, None)
>
> Creating a contextlib.contextmanager style decorator for writing such
> asynchronous context managers would be difficult, though, as the two
> different meanings of "yield" would get in each other's way - you
> would need something like "yield EnterResult(expr)" to indicate to
> __enter__ in the wrapper object when to stop. It would probably be
> easier to just write separate __enter__ and __exit__ methods as
> coroutines.
>
> However, note that I just wanted to be clear that I consider the idea
> of a syntax for "asynchronous context managers" plausible, and
> sketched out a possible design to explain *why* I thought it should be
> possible. My focus will stay with PEP 432 until that's done.

Sure, I didn't intend any time pressure. Others may take this up as
well -- or if nobody cares, we can put it off until the need has been
demonstrated more, possibly after Python 3.4 is released.

-- 
--Guido van Rossum (python.org/~guido)


From guido at python.org  Tue Jan  8 21:11:25 2013
From: guido at python.org (Guido van Rossum)
Date: Tue, 8 Jan 2013 12:11:25 -0800
Subject: [Python-ideas] PEP 3156 - Asynchronous IO Support Rebooted
In-Reply-To: <CACEGMv8EaCzuZL_wu1ESFUjKMo-a15799SR=BfroUWB0vME1WQ@mail.gmail.com>
References: <CACEGMv8FiCMCDRLjfPWJPv2jdoJpe9i_3uOYDoNUcVN97MXw1A@mail.gmail.com>
	<CAP7+vJJf1PAJzu7DbO5MB4CRi6bFOuwKijBPjznE1ArhRMd9bg@mail.gmail.com>
	<CACEGMv_gxUQ2E9pU3_UJp_QkH1VEzw9+tTgv5OCNANK0Qircng@mail.gmail.com>
	<CAP7+vJJgrYHUmbo27dgEtReboESTbyRKTPVU+TdDcucT+_AQVw@mail.gmail.com>
	<CACEGMv8EaCzuZL_wu1ESFUjKMo-a15799SR=BfroUWB0vME1WQ@mail.gmail.com>
Message-ID: <CAP7+vJKTXzqSv_zmVyBYr0MiDyE=Df=sVYKjkRyncnxhBRHNzA@mail.gmail.com>

(Trimming stuff that doesn't need a reply -- this doesn't mean I
agree, just that I don't see a need for more discussion.)

On Sun, Jan 6, 2013 at 7:45 AM, Markus <nepenthesdev at gmail.com> wrote:
> Exactly - signals are a mess, threading and signals make things worse
> - I'm no expert here, but I have experienced problems with
> signal handling and threads, basically the same problems you describe.
> Creating the threads after installing signal handlers (in the main
> thread) works, and signals get delivered to the main thread,
> installing the signal handlers (in the main thread) after creating the
> threads - and the signals ended up in *some thread*.
> Additionally, it depended on whether you installed your signal handler
> with signal() or with sigaction() (and its flags) when creating threads.

So I suppose you're okay with the signal handling API I proposed? I'll
add it to the PEP then, with a note that it may raise an exception if
not supported.

> I'd expect the EventLoop never to create threads on its own behalf -
> it's just wrong.

Here's the way it works. You can call run_in_executor(executor,
function, *args) where executor is an executor (a fancy thread pool)
that you create. You have full control. However you can pass
executor=None and then the event loop will create its own, default
executor -- or it will use a default executor that you have created
and given to it previously.

It needs the default executor so that it can implement getaddrinfo()
by calling the stdlib socket.getaddrinfo() in a thread -- and
getaddrinfo() is essential for creating transports. The user can take
full control over the executor though -- you could set the default to
something that always raises an exception.
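That control looks like this in what eventually shipped as asyncio (an assumption here is only that the final API matches modern asyncio; the blocking function is a stand-in):

```python
import asyncio
import concurrent.futures

def blocking_lookup():
    # Stand-in for socket.getaddrinfo() or any other blocking call.
    return sum(range(10))

async def main():
    loop = asyncio.get_running_loop()

    # Full control: pass an executor you created yourself.
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        explicit = await loop.run_in_executor(pool, blocking_lookup)

    # executor=None: the loop uses (or lazily creates) its default executor.
    default = await loop.run_in_executor(None, blocking_lookup)
    return explicit, default
```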

> If you can't provide some functionality without threads, don't provide
> the functionality.

I don't see this as an absolute requirement. The threads are an
implementation detail (other event loop implementations could
implement getaddrinfo() differently, talking directly to DNS using
tasklets or callbacks), and you can control its use of threads.

> Besides, getaddrinfo() is a bad choice, as it relies on
> distribution-specific flags.
> For example, IPv6 link-local scope exists on every current platform,
> but when resolving a link-local scope address (not a domain) with
> getaddrinfo, the call will fail on Debian/Ubuntu if no globally
> routed IPv6 address is available.

Nevertheless it is the only thing available in the stdlib. If you want
to improve it, that's fine, but just use the issue tracker.

>>> As Transport are part of the PEP - some more:
>>>
>>> EventLoop
>>>  * create_transport(protocol_factory, host, port, **kwargs)
>>>    kwargs requires "local" - local address as tuple like
>>> ('fe80::14ad:1680:54e1:6a91%eth0',0) - so you can bind when using ipv6
>>> link local scope.
>>>   or ('192.168.2.1',5060) - bind local port for udp
>>
>> Not sure I understand. What socket.connect() (or other API) call
>> parameters does this correspond to? What can't expressed through the
>> host and port parameters?
>
> In case you have multiple interfaces, and multiple gateways, you need
> to assign the connection to an address - so the kernel knows which
> interface to use for the connection - else it would default to "the first"
> interface.
> In IPv6 link-local scope you can have multiple addresses in the same
> subnet fe80:: - IIRC if you want to connect somewhere, you have to
> either set the scope_id of the remote, or bind the "source" address
> before - I don't know how to set the scope_id in python, it's in
> sockaddr_in6.
>
> In terms of socket. it is a bind before a connect.
>
> s = socket.socket(AF_INET6,SOCK_DGRAM,0)
> s.bind(('fe80::1',0))
> s.connect(('fe80::2',4712))
>
> same for ipv4 in case you are multi homed and rely on source based routing.

Ok, this seems a useful option to add to create_transport(). Your
example shows SOCK_DGRAM -- is it also relevant for SOCK_STREAM?
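For SOCK_STREAM, this bind-before-connect eventually surfaced as the local_addr argument (shown here with modern asyncio, an assumption about the final API; loopback addresses keep the sketch runnable):

```python
import asyncio

async def main():
    async def handle(reader, writer):
        writer.write(b"hello")
        await writer.drain()
        writer.close()

    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    # local_addr performs the bind() before connect(), picking the
    # source address (and optionally port) for the outgoing socket.
    reader, writer = await asyncio.open_connection(
        "127.0.0.1", port, local_addr=("127.0.0.1", 0))
    data = await reader.read(100)
    writer.close()
    server.close()
    await server.wait_closed()
    return data
```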

>>> Handler:
>>> Requiring 2 handlers for every active connection r/w is highly ineffective.
>>
>> How so? What is the concern?
>
> Of course you can fold the fdsets, but in case you need a separate
> handler for write, you re-create it for every write - see below.

That would seem to depend on the write rate.

>>> Additionally, I can .stop() the handler without having to know the fd,
>>> .stop() the handler, change the events the handler is looking for,
>>> restart the handler with .start().
>>> In your proposal, I'd create a new handler every time I want to sent
>>> something, poll for readability - discard the handler when I'm done,
>>> create a new one for the next sent.
>>
>> The questions are, does it make any difference in efficiency (when
>> using Python -- the performance of the C API is hardly relevant here),
>> and how often does this pattern occur.
>
> Every time you send, you poll for writability, you get the
> callback, you write; when you have nothing left, you stop polling
> for writability.

That's not quite how it's implemented. The code first tries to send
without polling. Since the socket is non-blocking, if this succeeds,
great -- only if it returns a partial send or EAGAIN we register a
callback. If the protocol keeps the buffer filled the callback doesn't
have to be recreated each time. If the protocol doesn't keep the
buffer full, we must unregister the callback to prevent
select/poll/etc. from calling it over and over again, there's nothing
you can do about that.
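A sketch of that optimistic-write strategy on a plain non-blocking socket (the pending list stands in for the transport's internal buffer; a real transport would register the writability callback at the marked point):

```python
import socket

def optimistic_write(sock, pending, data):
    """Try to send immediately; queue only what the kernel refused."""
    if not pending:                  # nothing queued yet: try a direct send
        try:
            sent = sock.send(data)
        except (BlockingIOError, InterruptedError):
            sent = 0
        data = data[sent:]
    if data:
        # Only here would a real transport register the write callback
        # (loop.add_writer), removing it again once the queue drains.
        pending.append(data)
    return pending

a, b = socket.socketpair()
a.setblocking(False)
queue = optimistic_write(a, [], b"ping")   # small write: succeeds directly
received = b.recv(4)
a.close()
b.close()
```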

>>>  * reconnect() - I'd love to be able to reconnect a transport
>>
>> But what does that mean in general? It depends on the protocol (e.g.
>> FTP, HTTP, IRC, SMTP) how much state must be restored/renegotiated
>> upon a reconnect, and how much data may have to be re-sent. This seems
>> a higher-level feature that transports and protocols will have to
>> implement themselves.
>
> I don't need the EventLoop to sync my state upon reconnect - just have
> the Transport providing the ability.
> Protocols are free to use this, but do not have to.

Aha, I get it. You want to be able to call transport.reconnect() from
connection_lost() and it should respond by eventually calling
protocol.connection_made(transport) again. Of course, this only
applies to clients -- for a server to reconnect to a client makes no
sense (it would be up to the client).

That seems simple enough to implement, but Glyph recommended strongly
against this, because reusing the protocol object often means that
some private state of the protocol may not be properly reinitialized.

It would also be difficult to decide where errors from the reconnect
attempt should go -- reconnect() itself must return immediately (since
connection_lost() cannot wait for I/O, it can only schedule async I/O
events).

But at a higher level in your app it would be easy to set this up: you
just call eventloop.create_transport(lambda: protocol, ...) where
protocol is a protocol instance you've created earlier.

>> Twisted suggested something here which I haven't implemented yet but
>> which seems reasonable -- using a series of short timeouts try
>> connecting to the various addresses and keep the first one that
>> connects successfully. If multiple addresses connect after the first
>> timeout, too bad, just close the redundant sockets, little harm is
>> done (though the timeouts should be tuned that this is relatively
>> rare, because a server may waste significant resources on such
>> redundant connects).
>
> Fast, yes - reasonable? - no.
> How would you feel if web browsers behaved like this?

I have no idea -- who says they aren't doing this? Browsers do tons of
stuff that I am not aware of.

> domain name has to be resolved, addresses ordered according to rfc X
> which says prefer IPv6 etc., then try connecting linearly.

Sure. It was just an idea. I'll see what Twisted actually does.

>>> For almost all the timeouts I mentioned - the protocol needs to take
>>> care - so the protocol has to exist before the connection is
>>> established in case of outbound connections.
>>
>> I'm not sure I follow. Can you sketch out some code to help me here?
>> ISTM that e.g. the DNS, connect and handshake timeouts can be
>> implemented by the machinery that tries to set up the connection
>> behind the scenes, and the user's protocol won't know anything of
>> these shenanigans. The code that calls create_transport() (actually
>> it'll probably be renamed create_client()) will just get a Future that
>> either indicates success (and then the protocol and transport are
>> successfully hooked up) or an error (and then no protocol was created
>> -- whether or not a transport was created is an implementation
>> detail).
>
> From my understanding the Future does not provide any information
> about which connection - to which host, using which protocol and
> credentials - failed?

That's not up to the Future -- it just passes an exception object
along. We could make this info available as attributes on the
exception object, if there is a need.

> I'd create the Protocol when trying to create a connection, so the
> Protocol is informed when the Transport fails and can take action -
> retry, whatever.

I had this in an earlier version, but Glyph convinced me that this is
the wrong design -- and it doesn't work for servers anyway, you must
have a protocol factory there.

>>>  * data_received(data) - if it was possible to return the number of
>>> bytes consumed by the protocol, and have the Transport buffer the rest
>>> for the next io-in call, one would avoid having to do this in every
>>> Protocol on its own - learned from experience.
>>
>> Twisted has a whole slew of protocol implementation subclasses that
>> implement various strategies like line-buffering (including a really
>> complex version where you can turn the line buffering on and off) and
>> "netstrings". I am trying to limit the PEP's size by not including
>> these, but I fully expect that in practice a set of useful protocol
>> implementations will be created that handles common cases. I'm not
>> convinced that putting this in the transport/protocol interface will
>> make user code less buggy: it seems easy for the user code to miscount
>> the bytes or not return a count at all in a rarely taken code branch.
>
> Please don't drop this.
>
> You never know how much data you'll receive, you never know how much
> data you need for a message, so the Protocol needs a buffer.

That all depends on what the protocol is trying to do. (The ECHO
protocol certainly doesn't need a buffer. :-)

> Having this input buffer in the Transport allows every Protocol to
> benefit, they try to read a message from the data passed to
> data_received(), if the data received is not sufficient to create a
> full message, they need to buffer it and wait for more data.

Having it in a Protocol base class also allows every protocol that
wants it to benefit, without complicating the transport. I can also
see problems where the transport needs to keep calling data_received()
until either all data is consumed or it returns 0 (no data consumed).
It just doesn't seem right to make the transport responsible for this
logic, since it doesn't know enough about the needs of the protocol.

> So having the Protocol.data_received return the number of bytes the
> Protocol could process, the Transport can do the job, saving it for
> every Protocol.
> Still - a protocol can have its own buffering strategy, i.e. in case
> of an incremental XML parser which does its own buffering, and always
> return len(data), so the Transport does not buffer anything.

Right, data_received() is closely related to the concept of a "feed
parser" which is used in a few places in the stdlib
(http://docs.python.org/3/search.html?q=feed&check_keywords=yes&area=default)
and even has a 3rd party implementation
(http://pypi.python.org/pypi/feedparser/), and there the parser (i.e.
the protocol equivalent) is always responsible for buffering data it
cannot immediately process.
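A buffering base class along these lines is easy to write on top of data_received() as specified; a hypothetical sketch of a line-oriented protocol that keeps its own buffer (the class and method names below are illustrative, not part of the PEP):

```python
class LineProtocol:
    """Hypothetical protocol base: buffers partial data, dispatches full lines."""

    def __init__(self):
        self._buffer = b""
        self.lines = []

    def data_received(self, data):
        # The protocol, not the transport, keeps what it can't yet parse.
        self._buffer += data
        while b"\n" in self._buffer:
            line, self._buffer = self._buffer.split(b"\n", 1)
            self.line_received(line)

    def line_received(self, line):
        self.lines.append(line)

p = LineProtocol()
p.data_received(b"hel")          # partial line: buffered, nothing dispatched
p.data_received(b"lo\nwor")      # completes one line, buffers the rest
p.data_received(b"ld\n")         # completes another line
```

An XML-ish protocol that does its own buffering would simply override data_received() and never touch this buffer, matching the "always consume everything" case discussed above.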

>>> Next, what happens if a dns can not be resolved, ssl handshake (in
>>> case ssl is transport) or connecting fails - in my opinion it's an
>>> error the protocol is supposed to take care of
>>>  + error_dns
>>>  + error_ssl
>>>  + error_connecting
>>
>> The future returned by create_transport() (aka create_client()) will
>> raise the exception.
>
> When do I get this exception - the EventLoop.run() raises?

No, the eventloop doesn't normally raise, just whichever task is
waiting for that future using 'yield from' will get the exception. Or
you can use eventloop.run_until_complete(<future>) and then that call
will raise.
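In the asyncio library that PEP 3156 became, this works as described: run_until_complete() re-raises whatever exception the task produced. A minimal sketch, with a failing coroutine standing in for a connection attempt that hits a DNS error:

```python
import asyncio

async def failing_connect():
    # Stand-in for a create_connection() that fails, e.g. on DNS resolution.
    raise OSError("name resolution failed")

loop = asyncio.new_event_loop()
try:
    # The loop itself doesn't raise; run_until_complete() propagates
    # the exception held by the finished task.
    loop.run_until_complete(failing_connect())
except OSError as exc:
    caught = exc
finally:
    loop.close()
```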

-- 
--Guido van Rossum (python.org/~guido)


From ncoghlan at gmail.com  Wed Jan  9 02:04:30 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 9 Jan 2013 11:04:30 +1000
Subject: [Python-ideas] Yielding through context managers
In-Reply-To: <CAP7+vJKdxJCzKCnoHPVM5O1HqewyLpSG5LfXJGFQ6GGU+xKZnA@mail.gmail.com>
References: <CABZk45x13SV4tt=dP0cf79UaM7+D7q2EYPyXpuYGVW1G98kMxQ@mail.gmail.com>
	<CABZk45xur6q-=V681QFXuWNunYqPun10+WJm5QX8QzruCur7oA@mail.gmail.com>
	<CAP7+vJJXBmpqF+Su2=WFy5HjtRwPfegz4S=ZG7fqDoXuZM7Wpw@mail.gmail.com>
	<CADiSq7fRyzVVZLyvuU3FV2GL0oZ987UQgtDef41Z77KDgB1rKQ@mail.gmail.com>
	<CAE_Hg6bLHOn-L7ithXMXL5w8JF7E85gJKuwu0yPqVomo=-Msrg@mail.gmail.com>
	<CADiSq7ePm4dg=fj3CmUL195297-KNYzMHN76R9xkb2w+LRSeag@mail.gmail.com>
	<CAP7+vJLOtuOd5qPWUFZAzZEYDddY5xFRKQjbck2y97DC3XXjrQ@mail.gmail.com>
	<CADiSq7dmsPCmu7tXiJQWFVhG8Bm2ysvqregdsOn5AzjKVB6eBg@mail.gmail.com>
	<CAP7+vJKn26jNra+YsGW4OSA-OXq-WoGUjxtxMpzr_4P0C+O6rg@mail.gmail.com>
	<CADiSq7dJpc_M-H07ZGi0BJsbHh-M+=j7WUKhdV+0BHdC=xDkQw@mail.gmail.com>
	<CAP7+vJKdxJCzKCnoHPVM5O1HqewyLpSG5LfXJGFQ6GGU+xKZnA@mail.gmail.com>
Message-ID: <CADiSq7emM27o=0qe2ZaMfvJR7vPGDj+pSbKB+AwRHHHcRe_52Q@mail.gmail.com>

On Wed, Jan 9, 2013 at 4:32 AM, Guido van Rossum <guido at python.org> wrote:
> On Tue, Jan 8, 2013 at 2:13 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> On Tue, Jan 8, 2013 at 11:06 AM, Guido van Rossum <guido at python.org> wrote:
>>> On Sun, Jan 6, 2013 at 9:47 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>>> Ah, true, I hadn't thought of that. So yes, any case where the
>>>> __exit__ method can be "fire-and-forget" is also straightforward to
>>>> implement with just PEP 3156. That takes us back to things like
>>>> database transactions being the only ones where
>>>
>>> And 'yielding' wouldn't do anything about this, would it?
>>
>> Any new syntax should properly handle the database transaction context
>> manager problem, otherwise what's the point? The workarounds for
>> asynchronous __next__ and __enter__ methods aren't too bad - it's
>> allowing asynchronous __exit__ methods that can only be solved with
>> new syntax.
>
> Is your idea that if you write "with yielding x as y: blah" this
> effectively replaces the calls to __enter__ and __exit__ with "yield
> from x.__enter__()" and "yield from x.__exit__()"? (And assigning the
> result of "yield from x.__enter__()" to y.)

Yep - that's why it would need a new keyword, as the subexpression
itself would be evaluated normally, while the later special method
invocations would be wrapped in yield from expressions.
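For readers finding this thread later: this idea eventually landed in Python 3.5 as `async with` (PEP 492), where `__aenter__` and `__aexit__` are awaited much like the "yield from" expansion sketched above. A toy async context manager illustrating the database-transaction case (the Transaction class is made up):

```python
import asyncio

class Transaction:
    """Toy async context manager: both enter and exit can suspend."""

    def __init__(self):
        self.log = []

    async def __aenter__(self):
        await asyncio.sleep(0)       # e.g. sending BEGIN to the database
        self.log.append("begin")
        return self

    async def __aexit__(self, exc_type, exc, tb):
        await asyncio.sleep(0)       # e.g. the COMMIT/ROLLBACK round trip
        self.log.append("rollback" if exc_type else "commit")
        return False                 # don't swallow exceptions

async def main():
    tx = Transaction()
    async with tx:
        tx.log.append("work")
    return tx.log

result = asyncio.run(main())
```

The asynchronous __aexit__ is exactly the part that, as Nick notes, cannot be expressed with an ordinary `with` plus try-finally without new syntax.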

>> However, note that I just wanted to be clear that I consider the idea
>> of a syntax for "asynchronous context managers" plausible, and
>> sketched out a possible design to explain *why* I thought it should be
>> possible. My focus will stay with PEP 432 until that's done.
>
> Sure, I didn't intend any time pressure. Others may take this up as
> well -- or if nobody cares, we can put it off until the need has been
> demonstrated more, possibly after Python 3.4 is released.

Yep - the fact you can fall back to an explicit try-finally if needed,
or else use something like gevent to suspend implicitly if you want to
use such idioms a lot makes it easy to justify postponing doing
anything about it.

I'll at least mention the idea in my python-notes essay, though.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From yorik.sar at gmail.com  Wed Jan  9 02:14:02 2013
From: yorik.sar at gmail.com (Yuriy Taraday)
Date: Wed, 9 Jan 2013 05:14:02 +0400
Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support
	Rebooted
Message-ID: <CABocrW7UdCRiXCS24erNN68ScSdRQKoJtONaB6sYZLNJP5prvA@mail.gmail.com>

Hello.

I've read the PEP, and a few things raised questions for me. Here they
are.

1. The series of sock_ methods could be organized into a wrapper around
the socket object. This wrapper could then be saved and used later in
async-aware code. That way, code like:

    sock = socket(...)
    # later, e.g. in connect()
    yield from tulip.get_event_loop().sock_connect(sock, ...)
    # later, e.g. in read()
    data = yield from tulip.get_event_loop().sock_recv(sock, ...)

will look like:

    sock = socket(...)
    async_sock = tulip.get_event_loop().wrap_socket(sock)
    # later, e.g. in connect()
    yield from async_sock.connect(...)
    # later, e.g. in read()
    data = yield from async_sock.recv(...)

The interface looks cleaner, while plain calls (if they are ever needed)
will be only five characters longer.
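The wrapper Yuriy describes is easy to build on top of the loop's sock_* methods; a hypothetical sketch using the method names asyncio eventually shipped (sock_connect(), sock_recv(), sock_sendall()), exercised over a local socketpair rather than a real connection. The AsyncSocket name is made up:

```python
import asyncio
import socket

class AsyncSocket:
    """Hypothetical wrapper bundling a socket with the loop's sock_* methods."""

    def __init__(self, loop, sock):
        self._loop = loop
        self._sock = sock
        sock.setblocking(False)   # the sock_* methods require non-blocking sockets

    def connect(self, address):
        return self._loop.sock_connect(self._sock, address)

    def recv(self, nbytes):
        return self._loop.sock_recv(self._sock, nbytes)

    def sendall(self, data):
        return self._loop.sock_sendall(self._sock, data)

async def demo():
    loop = asyncio.get_running_loop()
    a, b = socket.socketpair()
    left, right = AsyncSocket(loop, a), AsyncSocket(loop, b)
    await left.sendall(b"ping")
    data = await right.recv(100)
    a.close(); b.close()
    return data

received = asyncio.run(demo())
```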

2. Not as significant, but it is also possible to wrap an fd in a
similar way to make the interface simpler. Instead of:

    add_reader(fd, callback, *args)
    remove_reader(fd)

We can do:

    wrap_fd(fd).reader = functools.partial(callback, *args)
    wrap_fd(fd).reader = None  # or
    del wrap_fd(fd).reader
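A hypothetical sketch of this property-based interface (the FdWrapper name and demo are made up; asyncio's real API kept add_reader()/remove_reader() on the loop):

```python
import asyncio
import socket

class FdWrapper:
    """Hypothetical fd wrapper exposing the reader callback as a property."""

    def __init__(self, loop, fd):
        self._loop = loop
        self._fd = fd
        self._reader = None

    @property
    def reader(self):
        return self._reader

    @reader.setter
    def reader(self, callback):
        if callback is None:
            self._loop.remove_reader(self._fd)   # like `del wrap_fd(fd).reader`
        else:
            self._loop.add_reader(self._fd, callback)
        self._reader = callback

async def demo():
    loop = asyncio.get_running_loop()
    a, b = socket.socketpair()
    got = asyncio.Event()
    w = FdWrapper(loop, b.fileno())
    w.reader = got.set               # register: fd readable -> callback fires
    a.sendall(b"x")
    await asyncio.wait_for(got.wait(), 1)
    w.reader = None                  # unregister, like remove_reader(fd)
    a.close(); b.close()
    return True

ok = asyncio.run(demo())
```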

3. Why not use properties (or plain attributes) instead of methods for
cancelled, running and done in the Future class? I think they would be
easier to use, since I expect such attributes to be accessed as
properties. The current form looks like a Java-ism: in Java, Future has
getters for these fields, but at least they are prefixed with 'is'.

4. Why separate exception() from result() in the Future class? It does
the same as result() but with a different interface (returning instead
of raising). Doesn't this violate the rule "There should be one obvious
way to do it"?

5. I think some of the protocol and transport method names are not clear
enough:
- write_eof() does not write anything but closes something; it should be
close_writing or something similar;
- in the same way, eof_received() should become something like
receive_closed;
- pause() and resume() affect reading only, so they should be suffixed
(or prefixed) with read(ing), like pause_reading(), resume_reading().


Kind regards, Yuriy.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130109/26409f9a/attachment.html>

From ncoghlan at gmail.com  Wed Jan  9 03:31:04 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 9 Jan 2013 12:31:04 +1000
Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support
	Rebooted
In-Reply-To: <CABocrW7UdCRiXCS24erNN68ScSdRQKoJtONaB6sYZLNJP5prvA@mail.gmail.com>
References: <CABocrW7UdCRiXCS24erNN68ScSdRQKoJtONaB6sYZLNJP5prvA@mail.gmail.com>
Message-ID: <CADiSq7fURiDnU_j0LEUxJpz1D5Dv6EzNUwRw1WO1p=1fEeY8qQ@mail.gmail.com>

On Wed, Jan 9, 2013 at 11:14 AM, Yuriy Taraday <yorik.sar at gmail.com> wrote:
> 4. Why separate exception() from result() for Future class? It does the same
> as result() but with different interface (return instead of raise). Doesn't
> this violate the rule "There should be one obvious way to do it"?

The exception() method exists for the same reason that we support both
"key in mapping" and raising KeyError from "mapping[key]": sometimes
you want "Look Before You Leap", other times you want to let the
exception fly. If you want the latter, just call .result() directly,
if you want the former, check .exception() first.

Regardless, the Future API isn't really being defined in PEP 3156, as
it is mostly inherited from the previously implemented PEP 3148
(http://www.python.org/dev/peps/pep-3148/#future-objects).
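With the stdlib concurrent.futures.Future (available since Python 3.2), the two styles Nick describes look like this:

```python
from concurrent.futures import Future

f = Future()
f.set_exception(ValueError("bad input"))

# EAFP: call result() and let the exception fly.
try:
    f.result()
except ValueError as exc:
    eafp = exc

# LBYL: check exception() first, no try/except needed.
lbyl = f.exception()
```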

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From guido at python.org  Wed Jan  9 03:49:50 2013
From: guido at python.org (Guido van Rossum)
Date: Tue, 8 Jan 2013 18:49:50 -0800
Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support
	Rebooted
In-Reply-To: <CAPZV6o--P2e_ENp7NnBAkNhQXP_Sf+f-A-J6+RRj3_xh6WxCzw@mail.gmail.com>
References: <CABocrW7UdCRiXCS24erNN68ScSdRQKoJtONaB6sYZLNJP5prvA@mail.gmail.com>
	<CAPZV6o--P2e_ENp7NnBAkNhQXP_Sf+f-A-J6+RRj3_xh6WxCzw@mail.gmail.com>
Message-ID: <CAP7+vJLtvo6ou--erkL0Bon3X=ct1zAzfZzoy=hsxeN1PvVQLw@mail.gmail.com>

On Tue, Jan 8, 2013 at 6:07 PM, Benjamin Peterson <benjamin at python.org> wrote:
> 2013/1/8 Yuriy Taraday <yorik.sar at gmail.com>:
>> 4. Why separate exception() from result() for Future class? It does the same
>> as result() but with different interface (return instead of raise). Doesn't
>> this violate the rule "There should be one obvious way to do it"?
>
> I expect that's a copy-and-paste error. exception() will return the
> exception if one occurred.

I don't see the typo. It is as Nick explained.

-- 
--Guido van Rossum (python.org/~guido)


From guido at python.org  Wed Jan  9 04:06:19 2013
From: guido at python.org (Guido van Rossum)
Date: Tue, 8 Jan 2013 19:06:19 -0800
Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support
	Rebooted
In-Reply-To: <CAPZV6o_ngzqigoCRqjQn2_o1Z-Gg=Y5oCoYbTRRP9qUdRcNQbQ@mail.gmail.com>
References: <CABocrW7UdCRiXCS24erNN68ScSdRQKoJtONaB6sYZLNJP5prvA@mail.gmail.com>
	<CAPZV6o--P2e_ENp7NnBAkNhQXP_Sf+f-A-J6+RRj3_xh6WxCzw@mail.gmail.com>
	<CAP7+vJLtvo6ou--erkL0Bon3X=ct1zAzfZzoy=hsxeN1PvVQLw@mail.gmail.com>
	<CAPZV6o_ngzqigoCRqjQn2_o1Z-Gg=Y5oCoYbTRRP9qUdRcNQbQ@mail.gmail.com>
Message-ID: <CAP7+vJ+E46r=NLjjEWWcMpTRikbGMypNMRdTyeyk+MGTZ-n+UA@mail.gmail.com>

On Tue, Jan 8, 2013 at 6:53 PM, Benjamin Peterson <benjamin at python.org> wrote:
> 2013/1/8 Guido van Rossum <guido at python.org>:
>> On Tue, Jan 8, 2013 at 6:07 PM, Benjamin Peterson <benjamin at python.org> wrote:
>>> 2013/1/8 Yuriy Taraday <yorik.sar at gmail.com>:
>>>> 4. Why separate exception() from result() for Future class? It does the same
>>>> as result() but with different interface (return instead of raise). Doesn't
>>>> this violate the rule "There should be one obvious way to do it"?
>>>
>>> I expect that's a copy-and-paste error. exception() will return the
>>> exception if one occurred.
>>
>> I don't see the typo. It is as Nick explained.
>
> PEP 3156 says "exception(). Difference with PEP 3148: This has no
> timeout argument and does not wait; if the future is not yet done, it
> raises an exception." I assume it's not supposed to raise.

No, actually, in that case it *does* raise an exception, because it
means that the caller didn't understand the interface. It *returns* an
exception object when the Future is done but the "result" is
exceptional. But it *raises* when the Future is not done yet.
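This distinction is observable with asyncio's Future (the asyncio names postdate this thread; the not-done case raises InvalidStateError):

```python
import asyncio

loop = asyncio.new_event_loop()
fut = loop.create_future()

try:
    fut.exception()              # not done yet: the caller misused the interface
except asyncio.InvalidStateError:
    raised_while_pending = True

fut.set_exception(KeyError("boom"))
returned_when_done = fut.exception()   # done with an exceptional "result": returns it
loop.close()
```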

-- 
--Guido van Rossum (python.org/~guido)


From yorik.sar at gmail.com  Wed Jan  9 04:56:17 2013
From: yorik.sar at gmail.com (Yuriy Taraday)
Date: Wed, 9 Jan 2013 07:56:17 +0400
Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support
	Rebooted
In-Reply-To: <CADiSq7fURiDnU_j0LEUxJpz1D5Dv6EzNUwRw1WO1p=1fEeY8qQ@mail.gmail.com>
References: <CABocrW7UdCRiXCS24erNN68ScSdRQKoJtONaB6sYZLNJP5prvA@mail.gmail.com>
	<CADiSq7fURiDnU_j0LEUxJpz1D5Dv6EzNUwRw1WO1p=1fEeY8qQ@mail.gmail.com>
Message-ID: <CABocrW6kPeSarBDjSi8wM95D+KO6M9p1Tsn-+Vg2OnjmPaRbJg@mail.gmail.com>

On Wed, Jan 9, 2013 at 6:31 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> On Wed, Jan 9, 2013 at 11:14 AM, Yuriy Taraday <yorik.sar at gmail.com>
> wrote:
> > 4. Why separate exception() from result() for Future class? It does the
> same
> > as result() but with different interface (return instead of raise).
> Doesn't
> > this violate the rule "There should be one obvious way to do it"?
>
> The exception() method exists for the same reason that we support both
> "key in mapping" and raising KeyError from "mapping[key]": sometimes
> you want "Look Before You Leap", other times you want to let the
> exception fly. If you want the latter, just call .result() directly,
> if you want the former, check .exception() first.
>

Ok, I get it now. Thank you for clarifying.


> Regardless, the Future API isn't really being defined in PEP 3156, as
> it is mostly inheritied from the previously implemented PEP 3148
> (http://www.python.org/dev/peps/pep-3148/#future-objects)
>

Then #3 and #4 are about PEP 3148. Why was it done this way?

Kind regards, Yuriy.

From guido at python.org  Wed Jan  9 05:31:58 2013
From: guido at python.org (Guido van Rossum)
Date: Tue, 8 Jan 2013 20:31:58 -0800
Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support
	Rebooted
In-Reply-To: <CABocrW7UdCRiXCS24erNN68ScSdRQKoJtONaB6sYZLNJP5prvA@mail.gmail.com>
References: <CABocrW7UdCRiXCS24erNN68ScSdRQKoJtONaB6sYZLNJP5prvA@mail.gmail.com>
Message-ID: <CAP7+vJ+Oq5GZ4Kw8Ysjac4H8zxUA-ic2vtq3dS2QZG8i-=cnig@mail.gmail.com>

On Tue, Jan 8, 2013 at 5:14 PM, Yuriy Taraday <yorik.sar at gmail.com> wrote:
> I've read the PEP and some things raise questions in my consciousness. Here
> they are.

Thanks!

> 1. Series of sock_ methods can be organized into a wrapper around sock
> object. This wrappers can then be saved and used later in async-aware code.
> This way code like:
>
>     sock = socket(...)
>     # later, e.g. in connect()
>     yield from tulip.get_event_loop().sock_connect(sock, ...)
>     # later, e.g. in read()
>     data = yield from tulip.get_event_loop().sock_recv(sock, ...)
>
> will look like:
>
>     sock = socket(...)
>     async_sock = tulip.get_event_loop().wrap_socket(sock)
>     # later, e.g. in connect()
>     yield from async_sock.connect(...)
>     # later, e.g. in read()
>     data = yield from async_sock.recv(...)
>
> Interface looks cleaner while plain calls (if they ever needed) will be only
> 5 chars longer.

This is a semi-internal API that is mostly useful to Transport
implementers, and there won't be many of those. So I prefer the API
that has the fewest classes.

> 2. Not as great, but still possible to wrap fd in similar way to make
> interface simpler. Instead of:
>
>     add_reader(fd, callback, *args)
>     remove_reader(fd)
>
> We can do:
>
>     wrap_fd(fd).reader = functools.partial(callback, *args)
>     wrap_fd(fd).reader = None  # or
>     del wrap_fd(fd).reader

Ditto.

> 3. Why not use properties (or fields) instead of methods for cancelled,
> running and done in Future class? I think, it'll be easier to use since I
> expect such attributes to be accessed as properties. I see it as some
> javaism since in Java Future have getters for this fields but they are
> prefixed with 'is'.

Too late, this is how PEP 3148 defined it. It was indeed inspired by
Java Futures. However I would defend using methods here, since these
are not all that cheap -- they have to acquire and release a lock.

> 4. Why separate exception() from result() for Future class? It does the same
> as result() but with different interface (return instead of raise). Doesn't
> this violate the rule "There should be one obvious way to do it"?

Because it is quite awkward to check for an exception if you have to
catch it (4 lines instead of 1).

> 5. I think, protocol and transport methods' names are not easy or
> understanding enough:
> - write_eof() does not write anything but closes smth, should be
> close_writing or smth alike;
> - the same way eof_received() should become smth like receive_closed;

I am indeed struggling a bit with these names, but "writing an EOF" is
actually how I think of this (maybe I am dating myself to the time of
mag tapes though :-).

> - pause() and resume() work with reading only, so they should be suffixed
> (prefixed) with read(ing), like pause_reading(), resume_reading().

Agreed.

-- 
--Guido van Rossum (python.org/~guido)


From guido at python.org  Wed Jan  9 05:50:32 2013
From: guido at python.org (Guido van Rossum)
Date: Tue, 8 Jan 2013 20:50:32 -0800
Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support
	Rebooted
In-Reply-To: <CAP7+vJ+Oq5GZ4Kw8Ysjac4H8zxUA-ic2vtq3dS2QZG8i-=cnig@mail.gmail.com>
References: <CABocrW7UdCRiXCS24erNN68ScSdRQKoJtONaB6sYZLNJP5prvA@mail.gmail.com>
	<CAP7+vJ+Oq5GZ4Kw8Ysjac4H8zxUA-ic2vtq3dS2QZG8i-=cnig@mail.gmail.com>
Message-ID: <CAP7+vJKMpFuNzb-ahULSQbwTRDyUntAHCcPdcv-pqF10MyfJtA@mail.gmail.com>

On Tue, Jan 8, 2013 at 8:31 PM, Guido van Rossum <guido at python.org> wrote:
> On Tue, Jan 8, 2013 at 5:14 PM, Yuriy Taraday <yorik.sar at gmail.com> wrote:
>> - pause() and resume() work with reading only, so they should be suffixed
>> (prefixed) with read(ing), like pause_reading(), resume_reading().
>
> Agreed.

I think I want to take that back. I think it is more common for a
protocol to want to pause the transport (i.e. hold back
data_received() calls) than it is for a transport to want to pause the
protocol (i.e. hold back write() calls). So the more common method can
have a shorter name. Also, pause_reading() is almost confusing, since
the protocol's method is named data_received(), not read_data(). Also,
there's no reason for the protocol to want to pause the *write* (send)
actions of the transport -- if wanted to write less it should not have
called write(). The reason to distinguish between the two modes of
pausing is because it is sometimes useful to "stack" multiple
protocols, and then a protocol in the middle of the stack acts as a
transport to the protocol next to it (and vice versa). See the
discussion on this list previously, e.g.
http://mail.python.org/pipermail/python-ideas/2013-January/018522.html
(search for the keyword "stack" in this long message to find the
relevant section).
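This asymmetry is how asyncio ended up: the transport calls pause_writing()/resume_writing() on the protocol when its write buffer crosses the high/low water marks, while the protocol calls pause_reading()/resume_reading() on the transport to hold back data_received(). A toy sketch of the protocol side, driven by hand rather than by a real transport:

```python
class FlowControlProtocol:
    """Toy protocol tracking the transport's write-buffer back-pressure."""

    def __init__(self):
        self.can_write = True
        self.events = []

    def pause_writing(self):
        # Called by the transport: its buffer is above the high-water mark.
        self.can_write = False
        self.events.append("paused")

    def resume_writing(self):
        # Called by the transport: its buffer drained below the low-water mark.
        self.can_write = True
        self.events.append("resumed")

p = FlowControlProtocol()
p.pause_writing()     # a real transport would call this, not user code
p.resume_writing()
```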

-- 
--Guido van Rossum (python.org/~guido)


From yorik.sar at gmail.com  Wed Jan  9 06:02:23 2013
From: yorik.sar at gmail.com (Yuriy Taraday)
Date: Wed, 9 Jan 2013 09:02:23 +0400
Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support
	Rebooted
In-Reply-To: <CAP7+vJ+Oq5GZ4Kw8Ysjac4H8zxUA-ic2vtq3dS2QZG8i-=cnig@mail.gmail.com>
References: <CABocrW7UdCRiXCS24erNN68ScSdRQKoJtONaB6sYZLNJP5prvA@mail.gmail.com>
	<CAP7+vJ+Oq5GZ4Kw8Ysjac4H8zxUA-ic2vtq3dS2QZG8i-=cnig@mail.gmail.com>
Message-ID: <CABocrW7Pf85tHDFDXt7DGfeGhYf7RTH8Wv-XjovePjubEQo2bA@mail.gmail.com>

On Wed, Jan 9, 2013 at 8:31 AM, Guido van Rossum <guido at python.org> wrote:

> On Tue, Jan 8, 2013 at 5:14 PM, Yuriy Taraday <yorik.sar at gmail.com> wrote:
> > I've read the PEP and some things raise questions in my consciousness.
> Here
> > they are.
>
> Thanks!
>
> > 1. Series of sock_ methods can be organized into a wrapper around sock
> > object. This wrappers can then be saved and used later in async-aware
> code.
>
> This is a semi-internal API that is mostly useful to Transport
> implementers, and there won't be many of those. So I prefer the API
> that has the fewest classes.
>
> > 2. Not as great, but still possible to wrap fd in similar way to make
> > interface simpler.
>
> Ditto.
>

Ok, I see.
Should transports be bound to the event loop on creation? I wonder what
would happen if someone changed the current event loop between these calls.


>
> > 3. Why not use properties (or fields) instead of methods for cancelled,
> > running and done in Future class? I think, it'll be easier to use since I
> > expect such attributes to be accessed as properties. I see it as some
> > javaism since in Java Future have getters for this fields but they are
> > prefixed with 'is'.
>
> Too late, this is how PEP 3148 defined it. It was indeed inspired by
> Java Futures. However I would defend using methods here, since these
> are not all that cheap -- they have to acquire and release a lock.
>
>
I understand why it should be a method, but still, if it's a getter, it
should have either a get_ or is_ prefix.
Is there any way to change this in the final PEP?


> > 4. Why separate exception() from result() for Future class? It does the
> same
> > as result() but with different interface (return instead of raise).
> Doesn't
> > this violate the rule "There should be one obvious way to do it"?
>
> Because it is quite awkward to check for an exception if you have to
> catch it (4 lines instead of 1).
>
>
> 5. I think, protocol and transport methods' names are not easy or
> > understanding enough:
> > - write_eof() does not write anything but closes smth, should be
> > close_writing or smth alike;
> > - the same way eof_received() should become smth like receive_closed;
>
> I am indeed struggling a bit with these names, but "writing an EOF" is
> actually how I think of this (maybe I am dating myself to the time of
> mag tapes though :-).
>
I have never seen a computer working with a tape, but it's clear to me
what they do.
I've just imagined the number of words I'll have to say to students about
EOFs instead of a simple "it closes our end of one half of a socket".

> - pause() and resume() work with reading only, so they should be suffixed
> > (prefixed) with read(ing), like pause_reading(), resume_reading().
>
> Agreed.
>
> --
> --Guido van Rossum (python.org/~guido)
>



-- 

Kind regards, Yuriy.

From guido at python.org  Wed Jan  9 06:14:05 2013
From: guido at python.org (Guido van Rossum)
Date: Tue, 8 Jan 2013 21:14:05 -0800
Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support
	Rebooted
In-Reply-To: <CABocrW7Pf85tHDFDXt7DGfeGhYf7RTH8Wv-XjovePjubEQo2bA@mail.gmail.com>
References: <CABocrW7UdCRiXCS24erNN68ScSdRQKoJtONaB6sYZLNJP5prvA@mail.gmail.com>
	<CAP7+vJ+Oq5GZ4Kw8Ysjac4H8zxUA-ic2vtq3dS2QZG8i-=cnig@mail.gmail.com>
	<CABocrW7Pf85tHDFDXt7DGfeGhYf7RTH8Wv-XjovePjubEQo2bA@mail.gmail.com>
Message-ID: <CAP7+vJ+QSiZCQpA7DyJ=TogoQsACELGcXQLQesJ-dSaU1NvWCA@mail.gmail.com>

On Tue, Jan 8, 2013 at 9:02 PM, Yuriy Taraday <yorik.sar at gmail.com> wrote:
> On Wed, Jan 9, 2013 at 8:31 AM, Guido van Rossum <guido at python.org> wrote:
>> On Tue, Jan 8, 2013 at 5:14 PM, Yuriy Taraday <yorik.sar at gmail.com> wrote:
>> > 1. Series of sock_ methods can be organized into a wrapper around sock
>> > object. This wrappers can then be saved and used later in async-aware
>> > code.
>>
>> This is a semi-internal API that is mostly useful to Transport
>> implementers, and there won't be many of those. So I prefer the API
>> that has the fewest classes.
>>
>> > 2. Not as great, but still possible to wrap fd in similar way to make
>> > interface simpler.
>>
>> Ditto.
>
>
> Ok, I see.
> Should transports be bound to event loop on creation? I wonder, what would
> happen if someone changes current event loop between these calls.

Yes, this is what the transport implementation does.

>> > 3. Why not use properties (or fields) instead of methods for cancelled,
>> > running and done in Future class? I think, it'll be easier to use since
>> > I
>> > expect such attributes to be accessed as properties. I see it as some
>> > javaism since in Java Future have getters for this fields but they are
>> > prefixed with 'is'.
>>
>> Too late, this is how PEP 3148 defined it. It was indeed inspired by
>> Java Futures. However I would defend using methods here, since these
>> are not all that cheap -- they have to acquire and release a lock.
>>
>
> I understand why it should be a method, but still if it's a getter, it
> should have either get_ or is_ prefix.

Why? That's not a universal coding standard. The names seem clear enough to me.

> Are there any way to change this with 'Final' PEP?

No, the concurrent.futures package has been released (I forget if it
was Python 3.2 or 3.3) and we're bound to backwards compatibility.
Also I really don't think it's a big deal at all.

>> > 4. Why separate exception() from result() for Future class? It does the
>> > same
>> > as result() but with different interface (return instead of raise).
>> > Doesn't
>> > this violate the rule "There should be one obvious way to do it"?
>>
>> Because it is quite awkward to check for an exception if you have to
>> catch it (4 lines instead of 1).
>>
>>
>> > 5. I think, protocol and transport methods' names are not easy or
>> > understanding enough:
>> > - write_eof() does not write anything but closes smth, should be
>> > close_writing or smth alike;
>> > - the same way eof_received() should become smth like receive_closed;
>>
>> I am indeed struggling a bit with these names, but "writing an EOF" is
>> actually how I think of this (maybe I am dating myself to the time of
>> mag tapes though :-).
>>
> I never saw a computer working with a tape, but it's clear to me what does
> they do.
> I've just imagined the amount of words I'll have to say to students about
> EOFs instead of simple "it closes our end of one half of a socket".

But which half? A socket is two independent streams, one in each
direction. Twisted uses half_close() for this concept but unless you
already know what this is for you are left wondering which half. Which
is why I like using 'write' in the name.
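At the socket level, "writing an EOF" corresponds to a half-close via shutdown(SHUT_WR): our writing direction is closed while reading still works, and the peer sees EOF (an empty read) after consuming any buffered data. A small demonstration over a socketpair (POSIX):

```python
import socket

a, b = socket.socketpair()
a.sendall(b"last words")
a.shutdown(socket.SHUT_WR)    # half-close: no more writes from this end

data = b.recv(100)            # the data sent before the half-close
eof = b.recv(100)             # b"" signals EOF: the peer closed its write half

b.sendall(b"still open")      # the other direction still works
reply = a.recv(100)
a.close(); b.close()
```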

-- 
--Guido van Rossum (python.org/~guido)


From yorik.sar at gmail.com  Wed Jan  9 06:26:09 2013
From: yorik.sar at gmail.com (Yuriy Taraday)
Date: Wed, 9 Jan 2013 09:26:09 +0400
Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support
	Rebooted
In-Reply-To: <CAP7+vJ+QSiZCQpA7DyJ=TogoQsACELGcXQLQesJ-dSaU1NvWCA@mail.gmail.com>
References: <CABocrW7UdCRiXCS24erNN68ScSdRQKoJtONaB6sYZLNJP5prvA@mail.gmail.com>
	<CAP7+vJ+Oq5GZ4Kw8Ysjac4H8zxUA-ic2vtq3dS2QZG8i-=cnig@mail.gmail.com>
	<CABocrW7Pf85tHDFDXt7DGfeGhYf7RTH8Wv-XjovePjubEQo2bA@mail.gmail.com>
	<CAP7+vJ+QSiZCQpA7DyJ=TogoQsACELGcXQLQesJ-dSaU1NvWCA@mail.gmail.com>
Message-ID: <CABocrW7qDbb4Nk7gFFNoPxbihFhV062JVKuK66=21AFCoczwug@mail.gmail.com>

On Wed, Jan 9, 2013 at 9:14 AM, Guido van Rossum <guido at python.org> wrote:

> On Tue, Jan 8, 2013 at 9:02 PM, Yuriy Taraday <yorik.sar at gmail.com> wrote:
>  > Should transports be bound to event loop on creation? I wonder, what
> would
> > happen if someone changes current event loop between these calls.
>
> Yes, this is what the transport implementation does.
>

But in theory every sock_ call is independent and returns a Future bound
to the current event loop.
So if one changes the event loop while a transport is active, nothing bad
should happen. Or am I missing something?


> > I understand why it should be a method, but still if it's a getter, it
> > should have either get_ or is_ prefix.
>
> Why? That's not a universal coding standard. The names seem clear enough
> to me.
>

When I see (in autocompletion, for example) or remember a name like
"running", it suggests a field. When I remember something like
is_running, it definitely reads as a method.


> > Are there any way to change this with 'Final' PEP?
>
> No, the concurrent.futures package has been released (I forget if it
> was Python 3.2 or 3.3) and we're bound to backwards compatibility.
> Also I really don't think it's a big deal at all.
>

Yes, not a big deal.

>
> >> > 5. I think, protocol and transport methods' names are not easy or
> >> > understanding enough:
> >> > - write_eof() does not write anything but closes smth, should be
> >> > close_writing or smth alike;
> >> > - the same way eof_received() should become smth like receive_closed;
> >>
> >> I am indeed struggling a bit with these names, but "writing an EOF" is
> >> actually how I think of this (maybe I am dating myself to the time of
> >> mag tapes though :-).
> >>
> > I never saw a computer working with a tape, but it's clear to me what
> does
> > they do.
> > I've just imagined the amount of words I'll have to say to students about
> > EOFs instead of simple "it closes our end of one half of a socket".
>
> But which half? A socket is two independent streams, one in each
> direction. Twisted uses half_close() for this concept but unless you
> already know what this is for you are left wondering which half. Which
> is why I like using 'write' in the name.


Yes, the 'write' part is good, I should have mentioned it. I meant to say
that I won't need to explain that there were days when we had to handle a
special marker at the end of a file.

-- 

Kind regards, Yuriy.

From stephen at xemacs.org  Wed Jan  9 06:42:30 2013
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 09 Jan 2013 14:42:30 +0900
Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO
	Support	Rebooted
In-Reply-To: <CABocrW7qDbb4Nk7gFFNoPxbihFhV062JVKuK66=21AFCoczwug@mail.gmail.com>
References: <CABocrW7UdCRiXCS24erNN68ScSdRQKoJtONaB6sYZLNJP5prvA@mail.gmail.com>
	<CAP7+vJ+Oq5GZ4Kw8Ysjac4H8zxUA-ic2vtq3dS2QZG8i-=cnig@mail.gmail.com>
	<CABocrW7Pf85tHDFDXt7DGfeGhYf7RTH8Wv-XjovePjubEQo2bA@mail.gmail.com>
	<CAP7+vJ+QSiZCQpA7DyJ=TogoQsACELGcXQLQesJ-dSaU1NvWCA@mail.gmail.com>
	<CABocrW7qDbb4Nk7gFFNoPxbihFhV062JVKuK66=21AFCoczwug@mail.gmail.com>
Message-ID: <87k3rmap4p.fsf@uwakimon.sk.tsukuba.ac.jp>

Is this thread really ready to migrate to python-dev when we're still
bikeshedding method names?

Yuriy Taraday writes:

 > > But which half? A socket is two independent streams, one in each
 > > direction. Twisted uses half_close() for this concept but unless you
 > > already know what this is for you are left wondering which half. Which
 > > is why I like using 'write' in the name.
 > 
 > Yes, 'write' part is good, I should mention it. I meant to say that I won't
 > need to explain that there were days when we had to handle a special marker
 > at the end of file.

Mystery is good for students.<wink/>

Getting serious, "close_writer" occurred to me as a possibility.


From jstpierre at mecheye.net  Wed Jan  9 06:59:39 2013
From: jstpierre at mecheye.net (Jasper St. Pierre)
Date: Wed, 9 Jan 2013 00:59:39 -0500
Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support
	Rebooted
In-Reply-To: <87k3rmap4p.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <CABocrW7UdCRiXCS24erNN68ScSdRQKoJtONaB6sYZLNJP5prvA@mail.gmail.com>
	<CAP7+vJ+Oq5GZ4Kw8Ysjac4H8zxUA-ic2vtq3dS2QZG8i-=cnig@mail.gmail.com>
	<CABocrW7Pf85tHDFDXt7DGfeGhYf7RTH8Wv-XjovePjubEQo2bA@mail.gmail.com>
	<CAP7+vJ+QSiZCQpA7DyJ=TogoQsACELGcXQLQesJ-dSaU1NvWCA@mail.gmail.com>
	<CABocrW7qDbb4Nk7gFFNoPxbihFhV062JVKuK66=21AFCoczwug@mail.gmail.com>
	<87k3rmap4p.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <CAA0H+QRKTJqXS9Mk2dsx-fyBoKtb8biJmj95gQSYvAg6TR+Wvg@mail.gmail.com>

Well, if we're at the "bikeshedding about names" stage, that means that no
serious issues with the proposal are left. So it's a sign of progress.


On Wed, Jan 9, 2013 at 12:42 AM, Stephen J. Turnbull <stephen at xemacs.org>wrote:

> Is this thread really ready to migrate to python-dev when we're still
> bikeshedding method names?
>
> Yuriy Taraday writes:
>
>  > > But which half? A socket is two independent streams, one in each
>  > > direction. Twisted uses half_close() for this concept but unless you
>  > > already know what this is for you are left wondering which half. Which
>  > > is why I like using 'write' in the name.
>  >
>  > Yes, 'write' part is good, I should mention it. I meant to say that I
> won't
>  > need to explain that there were days when we had to handle a special
> marker
>  > at the end of file.
>
> Mystery is good for students.<wink/>
>
> Getting serious, "close_writer" occurred to me as a possibility.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>



-- 
  Jasper

From guido at python.org  Wed Jan  9 07:02:58 2013
From: guido at python.org (Guido van Rossum)
Date: Tue, 8 Jan 2013 22:02:58 -0800
Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support
	Rebooted
In-Reply-To: <CABocrW7qDbb4Nk7gFFNoPxbihFhV062JVKuK66=21AFCoczwug@mail.gmail.com>
References: <CABocrW7UdCRiXCS24erNN68ScSdRQKoJtONaB6sYZLNJP5prvA@mail.gmail.com>
	<CAP7+vJ+Oq5GZ4Kw8Ysjac4H8zxUA-ic2vtq3dS2QZG8i-=cnig@mail.gmail.com>
	<CABocrW7Pf85tHDFDXt7DGfeGhYf7RTH8Wv-XjovePjubEQo2bA@mail.gmail.com>
	<CAP7+vJ+QSiZCQpA7DyJ=TogoQsACELGcXQLQesJ-dSaU1NvWCA@mail.gmail.com>
	<CABocrW7qDbb4Nk7gFFNoPxbihFhV062JVKuK66=21AFCoczwug@mail.gmail.com>
Message-ID: <CAP7+vJJdkKaL3jmWgWzL1ojn9XOmHUyryi+dPds7ynTCLG2S3w@mail.gmail.com>

On Tue, Jan 8, 2013 at 9:26 PM, Yuriy Taraday <yorik.sar at gmail.com> wrote:
>
>
>
> On Wed, Jan 9, 2013 at 9:14 AM, Guido van Rossum <guido at python.org> wrote:
>>
>> On Tue, Jan 8, 2013 at 9:02 PM, Yuriy Taraday <yorik.sar at gmail.com> wrote:
>> > Should transports be bound to event loop on creation? I wonder, what
>> > would
>> > happen if someone changes current event loop between these calls.
>>
>> Yes, this is what the transport implementation does.
>
>
> But in theory every sock_ call is independent and returns a Future bound to
> the current event loop.

It is bound to the event loop whose sock_<call>() method you called.

> So if one changes the event loop with an active transport, nothing bad should
> happen. Or am I missing something?

Changing event loops in the middle of event processing is not a common
(or even useful) pattern. You start the event loop and then leave it
alone.

>> > I understand why it should be a method, but still if it's a getter, it
>> > should have either get_ or is_ prefix.
>>
>> Why? That's not a universal coding standard. The names seem clear enough
>> to me.
>
>
> When I see (in autocompletion, for example) or remember a name like "running",
> it triggers the thought that it's a field. When I remember something like
> is_running, it definitely reads as a method.

That must be pretty specific to your personal experience.

>> > Are there any way to change this with 'Final' PEP?
>>
>> No, the concurrent.futures package has been released (I forget if it
>> was Python 3.2 or 3.3) and we're bound to backwards compatibility.
>> Also I really don't think it's a big deal at all.
>
>
> Yes, not a big deal.
>>
>>
>> >> > 5. I think, protocol and transport methods' names are not easy or
>> >> > understanding enough:
>> >> > - write_eof() does not write anything but closes smth, should be
>> >> > close_writing or smth alike;
>> >> > - the same way eof_received() should become smth like receive_closed;
>> >>
>> >> I am indeed struggling a bit with these names, but "writing an EOF" is
>> >> actually how I think of this (maybe I am dating myself to the time of
>> >> mag tapes though :-).
>> >>
>> > I never saw a computer working with a tape, but it's clear to me what
>> > does
>> > they do.
>> > I've just imagined the amount of words I'll have to say to students
>> > about
>> > EOFs instead of simple "it closes our end of one half of a socket".
>>
>> But which half? A socket is two independent streams, one in each
>> direction. Twisted uses half_close() for this concept but unless you
>> already know what this is for you are left wondering which half. Which
>> is why I like using 'write' in the name.
>
>
> Yes, 'write' part is good, I should mention it. I meant to say that I won't
> need to explain that there were days when we had to handle a special marker
> at the end of file.

But even today you have to mark the end somehow, to distinguish it
from "not done yet, more could be coming". The equivalent is typing ^D
into a UNIX terminal (or ^Z on Windows).
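To make the half-close concrete: at the socket level, "writing an EOF" is just shutting down the write direction, which is presumably what a transport's write_eof() would boil down to for TCP. A minimal sketch with a socket pair:

```python
import socket

# "Writing an EOF": shut down only the write direction, so the peer
# sees end-of-file while the read direction of the same socket keeps
# working (this is the "which half?" in the naming discussion).
a, b = socket.socketpair()
a.sendall(b"last bytes")
a.shutdown(socket.SHUT_WR)   # our -> peer stream is finished

data = b.recv(1024)          # pending data still arrives
eof = b.recv(1024)           # then EOF: recv() returns b""

b.sendall(b"reply")          # the other half is still open
reply = a.recv(1024)
print(data, eof, reply)
a.close()
b.close()
```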

-- 
--Guido van Rossum (python.org/~guido)


From glyph at twistedmatrix.com  Wed Jan  9 10:30:43 2013
From: glyph at twistedmatrix.com (Glyph)
Date: Wed, 9 Jan 2013 01:30:43 -0800
Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support
	Rebooted
In-Reply-To: <CAP7+vJ+QSiZCQpA7DyJ=TogoQsACELGcXQLQesJ-dSaU1NvWCA@mail.gmail.com>
References: <CABocrW7UdCRiXCS24erNN68ScSdRQKoJtONaB6sYZLNJP5prvA@mail.gmail.com>
	<CAP7+vJ+Oq5GZ4Kw8Ysjac4H8zxUA-ic2vtq3dS2QZG8i-=cnig@mail.gmail.com>
	<CABocrW7Pf85tHDFDXt7DGfeGhYf7RTH8Wv-XjovePjubEQo2bA@mail.gmail.com>
	<CAP7+vJ+QSiZCQpA7DyJ=TogoQsACELGcXQLQesJ-dSaU1NvWCA@mail.gmail.com>
Message-ID: <69E9D1F0-50C3-4F49-998A-3EEB79611C43@twistedmatrix.com>

On Jan 8, 2013, at 9:14 PM, Guido van Rossum <guido at python.org> wrote:

> But which half? A socket is two independent streams, one in each
> direction. Twisted uses half_close() for this concept but unless you
> already know what this is for you are left wondering which half. Which
> is why I like using 'write' in the name.

I should add, if you don't already know what this means you really shouldn't be trying to do it ;-).

-glyph


From shibturn at gmail.com  Wed Jan  9 11:28:42 2013
From: shibturn at gmail.com (Richard Oudkerk)
Date: Wed, 09 Jan 2013 10:28:42 +0000
Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support
	Rebooted
In-Reply-To: <CADiSq7fURiDnU_j0LEUxJpz1D5Dv6EzNUwRw1WO1p=1fEeY8qQ@mail.gmail.com>
References: <CABocrW7UdCRiXCS24erNN68ScSdRQKoJtONaB6sYZLNJP5prvA@mail.gmail.com>
	<CADiSq7fURiDnU_j0LEUxJpz1D5Dv6EzNUwRw1WO1p=1fEeY8qQ@mail.gmail.com>
Message-ID: <kcjgp4$r08$1@ger.gmane.org>

On 09/01/2013 2:31am, Nick Coghlan wrote:
> The exception() method exists for the same reason that we support both
> "key in mapping" and raising KeyError from "mapping[key]": sometimes
> you want "Look Before You Leap", other times you want to let the
> exception fly. If you want the latter, just call .result() directly,
> if you want the former, check .exception() first.

But how can you do LBYL?  I can't see a way to check that an exception
has occurred without seeing whether result() raises an error: done() tells
you that the operation is finished, but not whether it succeeded.

-- 
Richard



From yorik.sar at gmail.com  Wed Jan  9 11:45:55 2013
From: yorik.sar at gmail.com (Yuriy Taraday)
Date: Wed, 9 Jan 2013 14:45:55 +0400
Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support
	Rebooted
In-Reply-To: <CAP7+vJKMpFuNzb-ahULSQbwTRDyUntAHCcPdcv-pqF10MyfJtA@mail.gmail.com>
References: <CABocrW7UdCRiXCS24erNN68ScSdRQKoJtONaB6sYZLNJP5prvA@mail.gmail.com>
	<CAP7+vJ+Oq5GZ4Kw8Ysjac4H8zxUA-ic2vtq3dS2QZG8i-=cnig@mail.gmail.com>
	<CAP7+vJKMpFuNzb-ahULSQbwTRDyUntAHCcPdcv-pqF10MyfJtA@mail.gmail.com>
Message-ID: <CABocrW4gFvGhgmTsHqsMz3GJna8_0xsNVcKHbyx8O+35ztiJKg@mail.gmail.com>

On Wed, Jan 9, 2013 at 8:50 AM, Guido van Rossum <guido at python.org> wrote:

> On Tue, Jan 8, 2013 at 8:31 PM, Guido van Rossum <guido at python.org> wrote:
> > On Tue, Jan 8, 2013 at 5:14 PM, Yuriy Taraday <yorik.sar at gmail.com>
> wrote:
> >> - pause() and resume() work with reading only, so they should be
> suffixed
> >> (prefixed) with read(ing), like pause_reading(), resume_reading().
> >
> > Agreed.
>
> I think I want to take that back. I think it is more common for a
> protocol to want to pause the transport (i.e. hold back
> data_received() calls) than it is for a transport to want to pause the
> protocol (i.e. hold back write() calls). So the more common method can
> have a shorter name. Also, pause_reading() is almost confusing, since
> the protocol's method is named data_received(), not read_data(). Also,
> there's no reason for the protocol to want to pause the *write* (send)
> actions of the transport -- if wanted to write less it should not have
> called write(). The reason to distinguish between the two modes of
> pausing is because it is sometimes useful to "stack" multiple
> protocols, and then a protocol in the middle of the stack acts as a
> transport to the protocol next to it (and vice versa). See the
> discussion on this list previously, e.g.
> http://mail.python.org/pipermail/python-ideas/2013-January/018522.html
> (search for the keyword "stack" in this long message to find the
> relevant section).


I totally agree with protocol/transport stacking; anyone should be able to
do some ugly thing like FTP over SSL over SOCKS over SSL over HTTP (j/k).
Just take a look at what you can do with netgraph in *BSD (anything over
anything, with any number of layers).
But still, we shouldn't sacrifice ease of understanding (of both docs and
code) for a couple of extra chars (10, actually).
Yes, 'reading' is misleading; pause_receiving and resume_receiving are
better.

-- 

Kind regards, Yuriy.

From ncoghlan at gmail.com  Wed Jan  9 11:54:42 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 9 Jan 2013 20:54:42 +1000
Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support
	Rebooted
In-Reply-To: <kcjgp4$r08$1@ger.gmane.org>
References: <CABocrW7UdCRiXCS24erNN68ScSdRQKoJtONaB6sYZLNJP5prvA@mail.gmail.com>
	<CADiSq7fURiDnU_j0LEUxJpz1D5Dv6EzNUwRw1WO1p=1fEeY8qQ@mail.gmail.com>
	<kcjgp4$r08$1@ger.gmane.org>
Message-ID: <CADiSq7e4fo7Cf8V_b087gY42SVBYLz-BeneE59a7CVT3e+e_Mg@mail.gmail.com>

On Wed, Jan 9, 2013 at 8:28 PM, Richard Oudkerk <shibturn at gmail.com> wrote:
> On 09/01/2013 2:31am, Nick Coghlan wrote:
>>
>> The exception() method exists for the same reason that we support both
>> "key in mapping" and raising KeyError from "mapping[key]": sometimes
>> you want "Look Before You Leap", other times you want to let the
>> exception fly. If you want the latter, just call .result() directly,
>> if you want the former, check .exception() first.
>
>
> But how can you do LBYL?  I can't see a way to check that an exception has
> occurred without seeing whether result() raises an error: done() tells you
> that the operation is finished, but not whether it succeeded.

You need to combine it with the other LBYL checks (f.done() and
f.cancelled()) to be sure it won't throw an exception.

    if f.done() and not f.cancelled():
        # Since we now know neither TimeoutError nor CancelledError can happen,
        # we can check for exceptions either by calling f.exception() or
        # by calling f.result() inside a try/except block
        # The latter will usually be the better option

Just calling f.result() is by far the most common, but the other can
be convenient in some cases (e.g. if you're writing a scheduler that
needs to check if it should be calling send() or throw() on a
generator).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From yorik.sar at gmail.com  Wed Jan  9 11:55:27 2013
From: yorik.sar at gmail.com (Yuriy Taraday)
Date: Wed, 9 Jan 2013 14:55:27 +0400
Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support
	Rebooted
In-Reply-To: <CAP7+vJJdkKaL3jmWgWzL1ojn9XOmHUyryi+dPds7ynTCLG2S3w@mail.gmail.com>
References: <CABocrW7UdCRiXCS24erNN68ScSdRQKoJtONaB6sYZLNJP5prvA@mail.gmail.com>
	<CAP7+vJ+Oq5GZ4Kw8Ysjac4H8zxUA-ic2vtq3dS2QZG8i-=cnig@mail.gmail.com>
	<CABocrW7Pf85tHDFDXt7DGfeGhYf7RTH8Wv-XjovePjubEQo2bA@mail.gmail.com>
	<CAP7+vJ+QSiZCQpA7DyJ=TogoQsACELGcXQLQesJ-dSaU1NvWCA@mail.gmail.com>
	<CABocrW7qDbb4Nk7gFFNoPxbihFhV062JVKuK66=21AFCoczwug@mail.gmail.com>
	<CAP7+vJJdkKaL3jmWgWzL1ojn9XOmHUyryi+dPds7ynTCLG2S3w@mail.gmail.com>
Message-ID: <CABocrW5TeDnvSPUd-BKYmhw4GeFriY3PAwHj_ne06jeJDk3UKg@mail.gmail.com>

On Wed, Jan 9, 2013 at 10:02 AM, Guido van Rossum <guido at python.org> wrote:

> Changing event loops in the middle of event processing is not a common
>  (or even useful) pattern. You start the event loop and then leave it
> alone.
>

Yes. It was a not-so-great morning idea.

> Yes, 'write' part is good, I should mention it. I meant to say that I
> won't
> > need to explain that there were days when we had to handle a special
> marker
> > at the end of file.
>
> But even today you have to mark the end somehow, to distinguish it
> from "not done yet, more could be coming". The equivalent is typing ^D
> into a UNIX terminal (or ^Z on Windows).


My interns told me that they remember EOF as a special object only from high
school, when they had to study Pascal. I guess in 5 years students won't
understand how one can write an EOF (and schools will finally replace
Pascal with Python).

-- 

Kind regards, Yuriy.

From yorik.sar at gmail.com  Wed Jan  9 12:00:00 2013
From: yorik.sar at gmail.com (Yuriy Taraday)
Date: Wed, 9 Jan 2013 15:00:00 +0400
Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support
	Rebooted
In-Reply-To: <20130109103911.4f599709@pitrou.net>
References: <CABocrW7UdCRiXCS24erNN68ScSdRQKoJtONaB6sYZLNJP5prvA@mail.gmail.com>
	<20130109103911.4f599709@pitrou.net>
Message-ID: <CABocrW7SQGK+P4C24X9zB4iN2jA75QYyRmFt2_7Sq7UF1gCtXw@mail.gmail.com>

On Wed, Jan 9, 2013 at 1:39 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:

>
> Hi Yuriy,
>
> For the record, it isn't necessary to cross-post. python-ideas is
> the place for discussing this, and most interested people will be
> subscribed to both python-ideas and python-dev, and therefore they get
> duplicate messages.
>

Oh, sorry. I just found this thread in both MLs, so I decided to send my
reply to both. This will be my last email (for now) on this topic at python-dev.

-- 

Kind regards, Yuriy.

From ncoghlan at gmail.com  Wed Jan  9 12:06:45 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 9 Jan 2013 21:06:45 +1000
Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support
	Rebooted
In-Reply-To: <CABocrW5TeDnvSPUd-BKYmhw4GeFriY3PAwHj_ne06jeJDk3UKg@mail.gmail.com>
References: <CABocrW7UdCRiXCS24erNN68ScSdRQKoJtONaB6sYZLNJP5prvA@mail.gmail.com>
	<CAP7+vJ+Oq5GZ4Kw8Ysjac4H8zxUA-ic2vtq3dS2QZG8i-=cnig@mail.gmail.com>
	<CABocrW7Pf85tHDFDXt7DGfeGhYf7RTH8Wv-XjovePjubEQo2bA@mail.gmail.com>
	<CAP7+vJ+QSiZCQpA7DyJ=TogoQsACELGcXQLQesJ-dSaU1NvWCA@mail.gmail.com>
	<CABocrW7qDbb4Nk7gFFNoPxbihFhV062JVKuK66=21AFCoczwug@mail.gmail.com>
	<CAP7+vJJdkKaL3jmWgWzL1ojn9XOmHUyryi+dPds7ynTCLG2S3w@mail.gmail.com>
	<CABocrW5TeDnvSPUd-BKYmhw4GeFriY3PAwHj_ne06jeJDk3UKg@mail.gmail.com>
Message-ID: <CADiSq7dfa2No2tQ-C8cDZEdnC-18x4fUvXex6B72mPKn13edCQ@mail.gmail.com>

On Wed, Jan 9, 2013 at 8:55 PM, Yuriy Taraday <yorik.sar at gmail.com> wrote:
> My interns told me that they remember EOF as special object only from high
> school when they had to study Pascal. I guess, in 5 years students won't
> understand how one can write an EOF. (and schools will finally replace
> Pascal with Python)

Python really doesn't try to avoid the concept of an End-of-file marker.

================
$ python3
Python 3.2.3 (default, Jun  8 2012, 05:36:09)
[GCC 4.7.0 20120507 (Red Hat 4.7.0-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> quit
Use quit() or Ctrl-D (i.e. EOF) to exit
>>> import io
>>> print(io.FileIO.read.__doc__)
read(size: int) -> bytes.  read at most size bytes, returned as bytes.

Only makes one system call, so less data may be returned than requested
In non-blocking mode, returns None if no data is available.
On end-of-file, returns ''.
================

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From shibturn at gmail.com  Wed Jan  9 13:13:19 2013
From: shibturn at gmail.com (Richard Oudkerk)
Date: Wed, 09 Jan 2013 12:13:19 +0000
Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support
	Rebooted
In-Reply-To: <CADiSq7e4fo7Cf8V_b087gY42SVBYLz-BeneE59a7CVT3e+e_Mg@mail.gmail.com>
References: <CABocrW7UdCRiXCS24erNN68ScSdRQKoJtONaB6sYZLNJP5prvA@mail.gmail.com>
	<CADiSq7fURiDnU_j0LEUxJpz1D5Dv6EzNUwRw1WO1p=1fEeY8qQ@mail.gmail.com>
	<kcjgp4$r08$1@ger.gmane.org>
	<CADiSq7e4fo7Cf8V_b087gY42SVBYLz-BeneE59a7CVT3e+e_Mg@mail.gmail.com>
Message-ID: <kcjmt7$lns$1@ger.gmane.org>

On 09/01/2013 10:54am, Nick Coghlan wrote:
> You need to combine it with the other LBYL checks (f.done() and
> f.cancelled()) to be sure it won't throw an exception.
>
>      if f.done() and not f.cancelled():
>          # Since we now know neither TimeoutError nor CancelledError can happen,
>          # we can check for exceptions either by calling f.exception() or
>          # by calling f.result() inside a try/except block
>          # The latter will usually be the better option
>
> Just calling f.result() is by far the most common, but the other can
> be convenient in some cases (e.g. if you're writing a scheduler that
> needs to check if it should be calling send() or throw() on a
> generator).

Which goes to show that it cannot be used with LBYL.

For exception() to be usable with LBYL one would need to be able to 
check that exception() returns a value without having to catch any 
exceptions -- either from exception() or from result().

But you can only check that exception() doesn't raise an error by 
calling result() to ensure that it does raise an error.  But then you 
might as well catch the exception from result().

And the idea of calling exception() first and then result() if it fails 
is just crazy.

As things stand, exception() is pointless.

-- 
Richard



From yorik.sar at gmail.com  Wed Jan  9 13:51:09 2013
From: yorik.sar at gmail.com (Yuriy Taraday)
Date: Wed, 9 Jan 2013 16:51:09 +0400
Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support
	Rebooted
In-Reply-To: <kcjmt7$lns$1@ger.gmane.org>
References: <CABocrW7UdCRiXCS24erNN68ScSdRQKoJtONaB6sYZLNJP5prvA@mail.gmail.com>
	<CADiSq7fURiDnU_j0LEUxJpz1D5Dv6EzNUwRw1WO1p=1fEeY8qQ@mail.gmail.com>
	<kcjgp4$r08$1@ger.gmane.org>
	<CADiSq7e4fo7Cf8V_b087gY42SVBYLz-BeneE59a7CVT3e+e_Mg@mail.gmail.com>
	<kcjmt7$lns$1@ger.gmane.org>
Message-ID: <CABocrW4V2R4c1Qg2qNZFYx64i890AhGi9GOiMfCMD0WqLXOJNA@mail.gmail.com>

On Wed, Jan 9, 2013 at 4:13 PM, Richard Oudkerk <shibturn at gmail.com> wrote:

> On 09/01/2013 10:54am, Nick Coghlan wrote:
>
>> You need to combine it with the other LBYL checks (f.done() and
>> f.cancelled()) to be sure it won't throw an exception.
>>
>>      if f.done() and not f.cancelled():
>>          # Since we now know neither TimeoutError nor CancelledError can
>> happen,
>>          # we can check for exceptions either by calling f.exception() or
>>          # by calling f.result() inside a try/except block
>>          # The latter will usually be the better option
>>
>> Just calling f.result() is by far the most common, but the other can
>> be convenient in some cases (e.g. if you're writing a scheduler that
>> needs to check if it should be calling send() or throw() on a
>> generator).
>>
>
> Which goes to show that it cannot be used with LBYL.
>
> For exception() to be usable with LBYL one would need to be able to check
> that exception() returns a value without having to catch any exceptions --
> either from exception() or from result().
>
> But you can only check that exception() doesn't raise an error by calling
> result() to ensure that it does raise an error.  But then you might as well
> catch the exception from result().
>
> And the idea of calling exception() first and then result() if it fails is
> just crazy.
>
> As things stand, exception() is pointless.


exception() will raise only TimeoutError or CancelledError; exceptions from
the Future computation are not raised, they are returned.
So to verify that a Future is properly computed, you should write:

    f.done() and not f.cancelled() and f.exception() is None

and you won't have to catch any exceptions.
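For what it's worth, this check is already observable with today's concurrent.futures.Future (assuming the PEP 3156 futures keep the same interface); a small runnable sketch:

```python
from concurrent.futures import Future

# Success: exception() returns None, result() returns the value.
ok = Future()
ok.set_result(42)
assert ok.done() and not ok.cancelled() and ok.exception() is None
print(ok.result())  # 42

# Failure: exception() *returns* the stored exception rather than
# raising it, so the LBYL check needs no try/except at all.
bad = Future()
bad.set_exception(ValueError("boom"))
if bad.done() and not bad.cancelled() and bad.exception() is not None:
    print(type(bad.exception()).__name__)  # ValueError
```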

-- 

Kind regards, Yuriy.

From shibturn at gmail.com  Wed Jan  9 13:59:49 2013
From: shibturn at gmail.com (Richard Oudkerk)
Date: Wed, 09 Jan 2013 12:59:49 +0000
Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support
	Rebooted
In-Reply-To: <CABocrW4V2R4c1Qg2qNZFYx64i890AhGi9GOiMfCMD0WqLXOJNA@mail.gmail.com>
References: <CABocrW7UdCRiXCS24erNN68ScSdRQKoJtONaB6sYZLNJP5prvA@mail.gmail.com>
	<CADiSq7fURiDnU_j0LEUxJpz1D5Dv6EzNUwRw1WO1p=1fEeY8qQ@mail.gmail.com>
	<kcjgp4$r08$1@ger.gmane.org>
	<CADiSq7e4fo7Cf8V_b087gY42SVBYLz-BeneE59a7CVT3e+e_Mg@mail.gmail.com>
	<kcjmt7$lns$1@ger.gmane.org>
	<CABocrW4V2R4c1Qg2qNZFYx64i890AhGi9GOiMfCMD0WqLXOJNA@mail.gmail.com>
Message-ID: <kcjpkd$f4h$1@ger.gmane.org>

On 09/01/2013 12:51pm, Yuriy Taraday wrote:
> exception() will raise only TimeoutError or CancelledError, exceptions
> from the Future computation are not raised, they are returned.
> So to verify that a Future is properly computed, you should write:
>
>      f.done() and not f.cancelled() and f.exception() is None
>
> and you won't have to catch any exceptions.

Ah.  I missed the point that exception() returns None (rather than 
raising) if there was no exception.

-- 
Richard



From guido at python.org  Wed Jan  9 16:58:10 2013
From: guido at python.org (Guido van Rossum)
Date: Wed, 9 Jan 2013 07:58:10 -0800
Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support
	Rebooted
In-Reply-To: <kcjmt7$lns$1@ger.gmane.org>
References: <CABocrW7UdCRiXCS24erNN68ScSdRQKoJtONaB6sYZLNJP5prvA@mail.gmail.com>
	<CADiSq7fURiDnU_j0LEUxJpz1D5Dv6EzNUwRw1WO1p=1fEeY8qQ@mail.gmail.com>
	<kcjgp4$r08$1@ger.gmane.org>
	<CADiSq7e4fo7Cf8V_b087gY42SVBYLz-BeneE59a7CVT3e+e_Mg@mail.gmail.com>
	<kcjmt7$lns$1@ger.gmane.org>
Message-ID: <CAP7+vJK20Saw4aMxickgKbBSPFk9dKpOy-1PMEW9+opLbgqFxQ@mail.gmail.com>

On Wed, Jan 9, 2013 at 4:13 AM, Richard Oudkerk <shibturn at gmail.com> wrote:
> On 09/01/2013 10:54am, Nick Coghlan wrote:
>>
>> You need to combine it with the other LBYL checks (f.done() and
>> f.cancelled()) to be sure it won't throw an exception.
>>
>>      if f.done() and not f.cancelled():
>>          # Since we now know neither TimeoutError nor CancelledError can
>> happen,
>>          # we can check for exceptions either by calling f.exception() or
>>          # by calling f.result() inside a try/except block
>>          # The latter will usually be the better option
>>
>> Just calling f.result() is by far the most common, but the other can
>> be convenient in some cases (e.g. if you're writing a scheduler that
>> needs to check if it should be calling send() or throw() on a
>> generator).
>
>
> Which goes to show that it cannot be used with LBYL.
>
> For exception() to be usable with LBYL one would need to be able to check
> that exception() returns a value without having to catch any exceptions --
> either from exception() or from result().
>
> But you can only check that exception() doesn't raise an error by calling
> result() to ensure that it does raise an error.  But then you might as well
> catch the exception from result().
>
> And the idea of calling exception() first and then result() if it fails is
> just crazy.
>
> As things stand, exception() is pointless.

Not true -- if the future has a callback associated with it, the
callback (or callbacks) is called when it becomes "done", and if the
callback wants to check for an exception it can use exception(). The
callback is guaranteed that the future is done so it doesn't have to
worry about the exception that is raised if the future isn't done. (Of
course a callback can also just call result() and catch the exception,
or let it bubble out -- in that case it will be logged by the event
loop and then dropped.)
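As a concrete illustration with today's concurrent.futures (assuming the PEP 3156 callback semantics end up similar), a done-callback can safely branch on exception() because the future is guaranteed to be finished by the time it runs:

```python
from concurrent.futures import ThreadPoolExecutor

results = []

def on_done(fut):
    # The callback only fires once the future is done, so exception()
    # cannot raise a "not finished yet" error here.
    if fut.cancelled():
        results.append("cancelled")
    elif fut.exception() is not None:
        results.append(type(fut.exception()).__name__)
    else:
        results.append(fut.result())

# A single worker keeps completion order deterministic for the demo.
with ThreadPoolExecutor(max_workers=1) as ex:
    ex.submit(lambda: 42).add_done_callback(on_done)
    ex.submit(lambda: 1 / 0).add_done_callback(on_done)

print(results)  # [42, 'ZeroDivisionError']
```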

-- 
--Guido van Rossum (python.org/~guido)


From federico.dev at reghe.net  Thu Jan 10 21:44:54 2013
From: federico.dev at reghe.net (Federico Reghenzani)
Date: Thu, 10 Jan 2013 21:44:54 +0100
Subject: [Python-ideas] TCP Fast Open protocol
Message-ID: <CADf4hJKMnczvSW1nFivOX_aR-sL340qbpcJLOSVoFM-SvbWDXQ@mail.gmail.com>

Hi all,

I'm new to Python development. I'm interested in the new TCP Fast Open
protocol (http://research.google.com/pubs/pub37517.html). This protocol is
implemented in Linux kernel 3.6 for the client side and 3.7 for the server
side, and the related constants are defined in Python changeset
5435a9278028. This TCP change is an important optimization, in particular
for HTTP, and it is completely backward compatible: even if a client or a
server doesn't support TFO, the connection proceeds with the normal
procedure.

I think an implementation in the socketserver module could be useful: an
attribute "allow_tcp_fast_open" that automatically sets the correct socket
option before listening (another attribute is necessary to choose the queue
size). A similar implementation can be done in the http modules.

The default value of this attribute may be "True" (given its backward
compatibility), but new versions of glibc might expose the TCP_FASTOPEN
constant even if the kernel does not support it (so using hasattr to check
whether the constant exists doesn't guarantee that TFO is supported by the
kernel). Maybe more complex code can solve this problem, but I don't know
how to do that (maybe catching an exception or checking the kernel
version?)

I attached a simple patch for socketserver (and its docs); let me know what
you think!
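A hedged sketch of the guard being described (the names here are illustrative, not taken from the attached patch): attempt the setsockopt() call and fall back on OSError, since hasattr() only proves that libc exported the constant, not that the running kernel honours it:

```python
import socket

def enable_tfo(sock, qlen=5):
    """Best-effort: enable TCP Fast Open on a not-yet-listening socket.

    hasattr() only shows the constant was available at build time; the
    setsockopt() call can still fail on a kernel without TFO support,
    so catch OSError and fall back to a normal listen.
    """
    if not hasattr(socket, "TCP_FASTOPEN"):
        return False
    try:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_FASTOPEN, qlen)
        return True
    except OSError:
        return False

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
tfo = enable_tfo(srv)   # True only if both libc and kernel support it
srv.listen(5)           # the server works either way
srv.close()
print(tfo)
```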


Federico
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tfo.patch
Type: application/octet-stream
Size: 2079 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130110/4147933e/attachment.obj>

From phd at phdru.name  Thu Jan 10 21:55:41 2013
From: phd at phdru.name (Oleg Broytman)
Date: Fri, 11 Jan 2013 00:55:41 +0400
Subject: [Python-ideas] TCP Fast Open protocol
In-Reply-To: <CADf4hJKMnczvSW1nFivOX_aR-sL340qbpcJLOSVoFM-SvbWDXQ@mail.gmail.com>
References: <CADf4hJKMnczvSW1nFivOX_aR-sL340qbpcJLOSVoFM-SvbWDXQ@mail.gmail.com>
Message-ID: <20130110205541.GA1640@iskra.aviel.ru>

Hi!

On Thu, Jan 10, 2013 at 09:44:54PM +0100, Federico Reghenzani <federico.dev at reghe.net> wrote:
> I attached the simple patch for socketserver (and doc), let me know what
> you think!

   The patch looks good at first glance, thank you for the work! The
better place for patches is the issue tracker at http://bugs.python.org
-- patches posted to the mailing list tend to get lost.

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            phd at phdru.name
           Programmers don't die, they just GOSUB without RETURN.


From federico.dev at reghe.net  Thu Jan 10 22:06:21 2013
From: federico.dev at reghe.net (Federico Reghenzani)
Date: Thu, 10 Jan 2013 22:06:21 +0100
Subject: [Python-ideas] TCP Fast Open protocol
In-Reply-To: <20130110205541.GA1640@iskra.aviel.ru>
References: <CADf4hJKMnczvSW1nFivOX_aR-sL340qbpcJLOSVoFM-SvbWDXQ@mail.gmail.com>
	<20130110205541.GA1640@iskra.aviel.ru>
Message-ID: <CADf4hJ+V=VFSPhgpnNSEb8SKwGAYSOg=PLdGyryBr5rpbUH2Fw@mail.gmail.com>

Hi Oleg,

I posted here because I'm asking whether it might be a good idea to make
some changes in the http module as well, maybe setting that option to
'True' by default (but first we need to fix the kernel-glibc problem).

Thanks,
Federico

From phd at phdru.name  Thu Jan 10 22:19:38 2013
From: phd at phdru.name (Oleg Broytman)
Date: Fri, 11 Jan 2013 01:19:38 +0400
Subject: [Python-ideas] TCP Fast Open protocol
In-Reply-To: <CADf4hJ+V=VFSPhgpnNSEb8SKwGAYSOg=PLdGyryBr5rpbUH2Fw@mail.gmail.com>
References: <CADf4hJKMnczvSW1nFivOX_aR-sL340qbpcJLOSVoFM-SvbWDXQ@mail.gmail.com>
	<20130110205541.GA1640@iskra.aviel.ru>
	<CADf4hJ+V=VFSPhgpnNSEb8SKwGAYSOg=PLdGyryBr5rpbUH2Fw@mail.gmail.com>
Message-ID: <20130110211938.GB1640@iskra.aviel.ru>

On Thu, Jan 10, 2013 at 10:06:21PM +0100, Federico Reghenzani <federico.dev at reghe.net> wrote:
> I've posted here because I'm asking if it may be an idea make some changes
> also in http module, maybe setting that option on 'True' as default (but
> first we need to fix the kernel-glibc problem).

   I think IWBN to patch as many network modules as possible (ftplib,
urllib, urllib2, xmlrpclib). Having tests also helps.

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            phd at phdru.name
           Programmers don't die, they just GOSUB without RETURN.


From guido at python.org  Thu Jan 10 22:24:56 2013
From: guido at python.org (Guido van Rossum)
Date: Thu, 10 Jan 2013 13:24:56 -0800
Subject: [Python-ideas] TCP Fast Open protocol
In-Reply-To: <20130110211938.GB1640@iskra.aviel.ru>
References: <CADf4hJKMnczvSW1nFivOX_aR-sL340qbpcJLOSVoFM-SvbWDXQ@mail.gmail.com>
	<20130110205541.GA1640@iskra.aviel.ru>
	<CADf4hJ+V=VFSPhgpnNSEb8SKwGAYSOg=PLdGyryBr5rpbUH2Fw@mail.gmail.com>
	<20130110211938.GB1640@iskra.aviel.ru>
Message-ID: <CAP7+vJJKu+fQ08_qkMvk8yPs+OOvsO7aiRKJyUZNCCsUpQ-UsA@mail.gmail.com>

Is there sample code for an HTTP client? What if the server doesn't
yet support the feature?

On Thu, Jan 10, 2013 at 1:19 PM, Oleg Broytman <phd at phdru.name> wrote:
> On Thu, Jan 10, 2013 at 10:06:21PM +0100, Federico Reghenzani <federico.dev at reghe.net> wrote:
>> I've posted here because I'm asking if it may be an idea make some changes
>> also in http module, maybe setting that option on 'True' as default (but
>> first we need to fix the kernel-glibc problem).
>
>    I think IWBN to patch as many network modules as (ftplib, urllib,
> urllib2, xmlrpclib). Having tests also helps.
>
> Oleg.
> --
>      Oleg Broytman            http://phdru.name/            phd at phdru.name
>            Programmers don't die, they just GOSUB without RETURN.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas



-- 
--Guido van Rossum (python.org/~guido)


From phd at phdru.name  Thu Jan 10 22:32:38 2013
From: phd at phdru.name (Oleg Broytman)
Date: Fri, 11 Jan 2013 01:32:38 +0400
Subject: [Python-ideas] TCP Fast Open protocol
In-Reply-To: <CAP7+vJJKu+fQ08_qkMvk8yPs+OOvsO7aiRKJyUZNCCsUpQ-UsA@mail.gmail.com>
References: <CADf4hJKMnczvSW1nFivOX_aR-sL340qbpcJLOSVoFM-SvbWDXQ@mail.gmail.com>
	<20130110205541.GA1640@iskra.aviel.ru>
	<CADf4hJ+V=VFSPhgpnNSEb8SKwGAYSOg=PLdGyryBr5rpbUH2Fw@mail.gmail.com>
	<20130110211938.GB1640@iskra.aviel.ru>
	<CAP7+vJJKu+fQ08_qkMvk8yPs+OOvsO7aiRKJyUZNCCsUpQ-UsA@mail.gmail.com>
Message-ID: <20130110213238.GC1640@iskra.aviel.ru>

On Thu, Jan 10, 2013 at 01:24:56PM -0800, Guido van Rossum <guido at python.org> wrote:
> Is there sample code for an HTTP client? What if the server doesn't
> yet support the feature?

   AFAIU the feature is implemented at the kernel level and doesn't
require any change at the user level, only a socket option. If the
server doesn't implement the feature the kernel on the client side
transparently (to the client) reverts to normal 3-way TCP handshaking.

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            phd at phdru.name
           Programmers don't die, they just GOSUB without RETURN.


From benoitc at gunicorn.org  Thu Jan 10 22:29:02 2013
From: benoitc at gunicorn.org (Benoit Chesneau)
Date: Thu, 10 Jan 2013 22:29:02 +0100
Subject: [Python-ideas] TCP Fast Open protocol
In-Reply-To: <CAP7+vJJKu+fQ08_qkMvk8yPs+OOvsO7aiRKJyUZNCCsUpQ-UsA@mail.gmail.com>
References: <CADf4hJKMnczvSW1nFivOX_aR-sL340qbpcJLOSVoFM-SvbWDXQ@mail.gmail.com>
	<20130110205541.GA1640@iskra.aviel.ru>
	<CADf4hJ+V=VFSPhgpnNSEb8SKwGAYSOg=PLdGyryBr5rpbUH2Fw@mail.gmail.com>
	<20130110211938.GB1640@iskra.aviel.ru>
	<CAP7+vJJKu+fQ08_qkMvk8yPs+OOvsO7aiRKJyUZNCCsUpQ-UsA@mail.gmail.com>
Message-ID: <AF4E7D19-5B84-434E-8451-843859D9ACE7@gunicorn.org>


On Jan 10, 2013, at 10:24 PM, Guido van Rossum <guido at python.org> wrote:

> Is there sample code for an HTTP client? What if the server doesn't
> yet support the feature?

As I read it, this is transparent for the application if the server doesn't support it.

https://lwn.net/Articles/508865/

- benoît
> 
> On Thu, Jan 10, 2013 at 1:19 PM, Oleg Broytman <phd at phdru.name> wrote:
>> On Thu, Jan 10, 2013 at 10:06:21PM +0100, Federico Reghenzani <federico.dev at reghe.net> wrote:
>>> I've posted here because I'm asking if it may be an idea make some changes
>>> also in http module, maybe setting that option on 'True' as default (but
>>> first we need to fix the kernel-glibc problem).
>> 
>>   I think IWBN to patch as many network modules as (ftplib, urllib,
>> urllib2, xmlrpclib). Having tests also helps.
>> 
>> Oleg.
>> --
>>     Oleg Broytman            http://phdru.name/            phd at phdru.name
>>           Programmers don't die, they just GOSUB without RETURN.
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> http://mail.python.org/mailman/listinfo/python-ideas
> 
> 
> 
> -- 
> --Guido van Rossum (python.org/~guido)
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130110/ee2cc919/attachment.html>

From phd at phdru.name  Thu Jan 10 22:34:54 2013
From: phd at phdru.name (Oleg Broytman)
Date: Fri, 11 Jan 2013 01:34:54 +0400
Subject: [Python-ideas] TCP Fast Open protocol
In-Reply-To: <20130110213238.GC1640@iskra.aviel.ru>
References: <CADf4hJKMnczvSW1nFivOX_aR-sL340qbpcJLOSVoFM-SvbWDXQ@mail.gmail.com>
	<20130110205541.GA1640@iskra.aviel.ru>
	<CADf4hJ+V=VFSPhgpnNSEb8SKwGAYSOg=PLdGyryBr5rpbUH2Fw@mail.gmail.com>
	<20130110211938.GB1640@iskra.aviel.ru>
	<CAP7+vJJKu+fQ08_qkMvk8yPs+OOvsO7aiRKJyUZNCCsUpQ-UsA@mail.gmail.com>
	<20130110213238.GC1640@iskra.aviel.ru>
Message-ID: <20130110213454.GD1640@iskra.aviel.ru>

On Fri, Jan 11, 2013 at 01:32:38AM +0400, Oleg Broytman <phd at phdru.name> wrote:
> On Thu, Jan 10, 2013 at 01:24:56PM -0800, Guido van Rossum <guido at python.org> wrote:
> > Is there sample code for an HTTP client? What if the server doesn't
> > yet support the feature?
> 
>    AFAIU the feature is implemented at the kernel level and doesn't
> require any change at the user level, only a socket option. If the
> server doesn't implement the feature the kernel on the client side
> transparently (to the client) reverts to normal 3-way TCP handshaking.

   Sorry, I was completely confused. Yes, clients need different calls:
https://lwn.net/Articles/508865/

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            phd at phdru.name
           Programmers don't die, they just GOSUB without RETURN.


From guido at python.org  Thu Jan 10 22:46:11 2013
From: guido at python.org (Guido van Rossum)
Date: Thu, 10 Jan 2013 13:46:11 -0800
Subject: [Python-ideas] TCP Fast Open protocol
In-Reply-To: <20130110213454.GD1640@iskra.aviel.ru>
References: <CADf4hJKMnczvSW1nFivOX_aR-sL340qbpcJLOSVoFM-SvbWDXQ@mail.gmail.com>
	<20130110205541.GA1640@iskra.aviel.ru>
	<CADf4hJ+V=VFSPhgpnNSEb8SKwGAYSOg=PLdGyryBr5rpbUH2Fw@mail.gmail.com>
	<20130110211938.GB1640@iskra.aviel.ru>
	<CAP7+vJJKu+fQ08_qkMvk8yPs+OOvsO7aiRKJyUZNCCsUpQ-UsA@mail.gmail.com>
	<20130110213238.GC1640@iskra.aviel.ru>
	<20130110213454.GD1640@iskra.aviel.ru>
Message-ID: <CAP7+vJ+offT_vTjAoTKeE3gz8PQCjjf+ReCdRSFu-77mSRg5AA@mail.gmail.com>

On Thu, Jan 10, 2013 at 1:34 PM, Oleg Broytman <phd at phdru.name> wrote:
> On Fri, Jan 11, 2013 at 01:32:38AM +0400, Oleg Broytman <phd at phdru.name> wrote:
>> On Thu, Jan 10, 2013 at 01:24:56PM -0800, Guido van Rossum <guido at python.org> wrote:
>> > Is there sample code for an HTTP client? What if the server doesn't
>> > yet support the feature?
>>
>>    AFAIU the feature is implemented at the kernel level and doesn't
>> require any change at the user level, only a socket option. If the
>> server doesn't implement the feature the kernel on the client side
>> transparently (to the client) reverts to normal 3-way TCP handshaking.
>
>    Sorry, I was completely confused. Yes, clients need different calls:
> https://lwn.net/Articles/508865/

Right, that's what I gleaned from skimming the referenced paper. But
that and the lwn article you link only show C code. Let's see some
Python! (I would try it, but no machine I have access to supports this
yet.)

Hopefully the OP has some sample Python code? Otherwise I think it's a
little too early to adopt this...

-- 
--Guido van Rossum (python.org/~guido)


From geertj at gmail.com  Fri Jan 11 00:10:10 2013
From: geertj at gmail.com (Geert Jansen)
Date: Fri, 11 Jan 2013 00:10:10 +0100
Subject: [Python-ideas] TestMill - Python system testing
Message-ID: <CADbA=FXemt3SDnz8Ov3ywHrDj3bOFamtb7jsdAHJJF0wEbWyHw@mail.gmail.com>

Hi,

my apologies if this is slightly off-topic but I believe this could be
useful. As a side project, I've been working on a tool to use my
company's cloud service to offer system testing for Python. The tool
is called TestMill, and it allows you to test your Python project for
free, remotely and in parallel on a range of different OSs (currently:
Fedora, CentOS and Ubuntu).

Feedback very much appreciated. Code can be found on Github here
https://github.com/ravello/testmill.

Regards,
Geert


From tjreedy at udel.edu  Fri Jan 11 03:45:46 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 10 Jan 2013 21:45:46 -0500
Subject: [Python-ideas] TCP Fast Open protocol
In-Reply-To: <AF4E7D19-5B84-434E-8451-843859D9ACE7@gunicorn.org>
References: <CADf4hJKMnczvSW1nFivOX_aR-sL340qbpcJLOSVoFM-SvbWDXQ@mail.gmail.com>
	<20130110205541.GA1640@iskra.aviel.ru>
	<CADf4hJ+V=VFSPhgpnNSEb8SKwGAYSOg=PLdGyryBr5rpbUH2Fw@mail.gmail.com>
	<20130110211938.GB1640@iskra.aviel.ru>
	<CAP7+vJJKu+fQ08_qkMvk8yPs+OOvsO7aiRKJyUZNCCsUpQ-UsA@mail.gmail.com>
	<AF4E7D19-5B84-434E-8451-843859D9ACE7@gunicorn.org>
Message-ID: <kcnue0$3vn$1@ger.gmane.org>

On 1/10/2013 4:29 PM, Benoit Chesneau wrote:
>
> On Jan 10, 2013, at 10:24 PM, Guido van Rossum
> <guido at python.org
> <mailto:guido at python.org>> wrote:
>
>> Is there sample code for an HTTP client? What if the server doesn't
>> yet support the feature?
>
> Like I read it, this is transparent for the application if it doesn't
> support it.
>
> https://lwn.net/Articles/508865/

I read both the post (Aug 1, 2012, before the Linux 3.7 with the server 
code) and comments. FastOpen appears to still be an experimental 
proposal: "Currently, TFO is an Internet Draft with the IETF. ... (The 
current implementation employs the TCP Experimental Option Number 
facility as a placeholder for a real TCP Option Number.)". From the 
comments, I would say that its success outside of Google is not certain.

It appears that its main use case is repeated requests to webservers 
from browsers. This is because the latter often make *multiple* 
requests, often short, to the same site in order to construct a 
displayed web page. There is no time saving on the first request of a 
series. I suspect that after Google updates Chrome to use the new 
feature, one of the other 'independent' browsers is likely to be the 
next user.

To be active, the feature must be compiled into the socket code of both 
server and client machines AND must be explicitly requested by both 
client and server applications.

On the server side, it must be requested because the request makes a 
promise that syn+data requests will be handled idempotently. (So the 
default should be 'off'.) This is trivial for static web pages but may 
require app-specific overhead for anything else. So, in general, the app 
should not bother being able to handle FastOpen unless it will be run on 
servers with FastOpen, and for efficiency, it should not add the 
overhead unless it is needed because a particular request is from a 
FastOpen client.

This is not a problem for Google, with thousands of duplicate apps 
running on duplicate server configurations. But it was not clear in the 
OPs post how a Python app would know for sure whether a particular 
machine is FastOpen capable. I did not see the question of how a server 
app would know about the client connection type even addressed.

On the client side, .connect and at least the first .send must be 
combined into either .sendto or .sendmsg (which?, still to be decided, 
apparently;-) with a new MSG_FASTOPEN argument. So programs need a 
non-trivial rewrite. If a particular server is not fastopen capable, 
then new fastopen client kernel socket code can potentially handle the 
fallback to the old way. But if the client is not fastopen capable, then 
the fallback must be handled in the Python .sendto code or else in the 
client code. (So one of those layers must *know* the client system 
capability.)

Again, dealing with this, on multiple OSes, should be a lot easier for a 
monolithic browser like Chrome or Firefox (which might, on some systems, 
even use their own socket layer code), than for general purpose Python 
socket and app code.

So my conclusion is that this is (mostly) premature for Python at this 
time. This is a slight performance enhancement of limited use that will 
make code at least slightly more complex in a core module that must be 
kept at least as rock solid as it is now. Let Google get it working on 
both their servers and Chrome browser. And wait for Mozilla, say, to add 
it to Firefox. Things might change before the first 3.4 beta, but I 
think 3.5 is more likely. Of course, testing will require all 4 
combinations of client and server.

-- 
Terry Jan Reedy



From federico.dev at reghe.net  Fri Jan 11 08:30:08 2013
From: federico.dev at reghe.net (Federico Reghenzani)
Date: Fri, 11 Jan 2013 08:30:08 +0100
Subject: [Python-ideas] TCP Fast Open protocol
In-Reply-To: <kcnue0$3vn$1@ger.gmane.org>
References: <CADf4hJKMnczvSW1nFivOX_aR-sL340qbpcJLOSVoFM-SvbWDXQ@mail.gmail.com>
	<20130110205541.GA1640@iskra.aviel.ru>
	<CADf4hJ+V=VFSPhgpnNSEb8SKwGAYSOg=PLdGyryBr5rpbUH2Fw@mail.gmail.com>
	<20130110211938.GB1640@iskra.aviel.ru>
	<CAP7+vJJKu+fQ08_qkMvk8yPs+OOvsO7aiRKJyUZNCCsUpQ-UsA@mail.gmail.com>
	<AF4E7D19-5B84-434E-8451-843859D9ACE7@gunicorn.org>
	<kcnue0$3vn$1@ger.gmane.org>
Message-ID: <CADf4hJJibi037S0WxbWq-4-0LpOjMhvB4WBr-xvMnnUhWt+YYg@mail.gmail.com>

On Fri, Jan 11, 2013 at 3:45 AM, Terry Reedy <tjreedy at udel.edu> wrote:

>
> I read both the post (Aug 1, 2012, before the Linux 3.7 with the server
> code) and comments. FastOpen appears to still be an experimental proposal:
> "Currently, TFO is an Internet Draft with the IETF. ... (The current
> implementation employs the TCP Experimental Option Number facility as a
> placeholder for a real TCP Option Number.)". From the comments, I would say
> that its success outside of Google is not certain.
>
> It appears that its main use case is repeated requests to webservers from
> browswers. This is because the latter often make *multiple* requests, often
> short, to the same site in order to construct a displayed web page. There
> is no time saving on the first request of a series. I suspect that after
> Google updates Chrome to use the new feature, one of the other
> 'independent' browsers is likely to be the next user.
>

Yes, the protocol has been designed for situations where there are multiple
requests, such as HTTP or FTP. Probably a default of 'True' is appropriate
only in these cases.


>
> To be active, the feature must be compiled into the socket code of both
> server and client machines AND must be explicitly requested by both client
> and server applications.
>
> On the server side, it must be requested because the request makes a
> promise that syn+data requests will be handled idempotently. (So the
> default should be 'off'.) This is trivial for static web pages but may
> require app-specific overhead for anything else. So, in general, the app
> should not bother being able to handle FastOpen unless it will be run on
> servers with FastOpen, and for efficiency, it should not add the overhead
> unless it is needed because a particular request is from a FastOpen client.
>

If the server doesn't support FastOpen and receives a FastOpen request from
a capable client, it simply ignores the TFO cookie and replies with a normal
SYN+ACK. In this case the first packet (SYN+TFO from the client) is only 4
bytes larger than in a normal connection; no other packet is bigger than
normal. So for a server app that does not support FastOpen, this is
completely transparent and does not cause any overhead.



>
> This is not a problem for Google, with thousands of duplicate apps running
> on duplicate server configurations. But it was not clear in the OPs post
> how a Python app would know for sure whether a particular machine is
> FastOpen capable. I did not see the question of how a server app would know
> about the client connection type even addressed.
>

The server knows the client connection type from the first packet it
receives: if the first packet coming from the client is a SYN + TFO cookie,
the server proceeds to generate a cookie and continues with a FastOpen
connection; if the first packet is a plain SYN, the server proceeds with the
normal 3-way handshake. In any case these operations are transparent both to
Python and the application, because they're done by the kernel.


>
> On the client side, .connect and at least the first .send must be combined
> into either .sendto or .sendmsg (which?, still to be decided, apparently;-)
> with a new MSG_FASTOPEN argument. So programs need a non-trivial rewrite.
> If a particular server is not fastopen capable, then new fastopen client
> kernal socket code can potentially handle the fallback to the old way. But
> if the client is not fastopen capable, the the fallback must be handled in
> the Python .sendto code or else in the client code. (So one of those layers
> must *know* the client system capability.)
>

As I said, if a client uses .sendto or .sendmsg with MSG_FASTOPEN against a
server that is not TFO-capable, the Linux kernel falls back to the old way,
so it is as if the client had done a normal .connect and .send. The
application doesn't know whether the connection was made in TFO mode or in
normal mode, and does not need to care.


>
> Again, dealing with this, on multiple OSes, should be a lot easier for a
> monolithic browser like Chrome or Firefox (which might, on some systems,
> even use their own socket layer code), than for general purpose Python
> socket and app code.
>
> So my conclusion is that this is (mostly) premature for Python at this
> time. This is a slight performance enhancement of limited use that will
> make code at least slightly more complex in a core module that must be keep
> at least as rock solid as it is now. Let Google get it working on both
> their servers and Chrome browser. And wait for Mozilla, say, to add it to
> Firefox. Things might change before the first 3.4 beta, but I think 3.5 is
> more likely. Of course, testing will require all 4 combinations of client
> and server.


We could introduce TFO in only some modules, such as http or ftplib. The
code is not really complex: for the server it is only a .setsockopt before
.listen, and for the client we would replace the .connect and the first
.send with a single .sendto or .sendmsg.
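As a minimal sketch of both sides (the constant values here are assumptions
taken from the Linux kernel headers, hard-coded because older glibc/Python
builds don't expose them; this only really works on a TFO-capable Linux
kernel, and falls back transparently otherwise):

```python
import socket

# Assumed values from the Linux kernel headers; hard-coded as a fallback
# because older glibc / Python builds do not expose them yet.
TCP_FASTOPEN = getattr(socket, "TCP_FASTOPEN", 23)
MSG_FASTOPEN = getattr(socket, "MSG_FASTOPEN", 0x20000000)

def tfo_server(host, port, qlen=5):
    # Server side: one extra setsockopt before listen(); qlen bounds the
    # number of pending TFO (SYN+data) requests the kernel will queue.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.setsockopt(socket.IPPROTO_TCP, TCP_FASTOPEN, qlen)
    srv.listen(5)
    return srv

def tfo_client_send(host, port, data):
    # Client side: .connect() plus the first .send() collapse into a single
    # .sendto() with MSG_FASTOPEN; against a non-TFO server the kernel
    # transparently reverts to the normal 3-way handshake.
    c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    c.sendto(data, MSG_FASTOPEN, (host, port))
    return c
```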


On Jan 10, 2013, at 10:46 PM, Guido van Rossum:

>

Hopefully the OP has some sample Python code?


Yes, it is practically the same as in C; I attached examples (I needed to
declare the TCP and MSG constants manually because my glibc doesn't have
them yet).


Federico Reghenzani
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130111/818545e9/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tfo.tar.gz
Type: application/x-gzip
Size: 10240 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130111/818545e9/attachment.bin>

From benoitc at gunicorn.org  Fri Jan 11 15:00:47 2013
From: benoitc at gunicorn.org (Benoit Chesneau)
Date: Fri, 11 Jan 2013 15:00:47 +0100
Subject: [Python-ideas] TCP Fast Open protocol
In-Reply-To: <CADf4hJJibi037S0WxbWq-4-0LpOjMhvB4WBr-xvMnnUhWt+YYg@mail.gmail.com>
References: <CADf4hJKMnczvSW1nFivOX_aR-sL340qbpcJLOSVoFM-SvbWDXQ@mail.gmail.com>
	<20130110205541.GA1640@iskra.aviel.ru>
	<CADf4hJ+V=VFSPhgpnNSEb8SKwGAYSOg=PLdGyryBr5rpbUH2Fw@mail.gmail.com>
	<20130110211938.GB1640@iskra.aviel.ru>
	<CAP7+vJJKu+fQ08_qkMvk8yPs+OOvsO7aiRKJyUZNCCsUpQ-UsA@mail.gmail.com>
	<AF4E7D19-5B84-434E-8451-843859D9ACE7@gunicorn.org>
	<kcnue0$3vn$1@ger.gmane.org>
	<CADf4hJJibi037S0WxbWq-4-0LpOjMhvB4WBr-xvMnnUhWt+YYg@mail.gmail.com>
Message-ID: <59B626EC-5C9C-480E-AC8C-2299CAB139A9@gunicorn.org>


On Jan 11, 2013, at 8:30 AM, Federico Reghenzani <federico.dev at reghe.net> wrote:

> 
> On Fri, Jan 11, 2013 at 3:45 AM, Terry Reedy <tjreedy at udel.edu> wrote:
> 
> I read both the post (Aug 1, 2012, before the Linux 3.7 with the server code) and comments. FastOpen appears to still be an experimental proposal: "Currently, TFO is an Internet Draft with the IETF. ... (The current implementation employs the TCP Experimental Option Number facility as a placeholder for a real TCP Option Number.)". From the comments, I would say that its success outside of Google is not certain.
> 
> It appears that its main use case is repeated requests to webservers from browswers. This is because the latter often make *multiple* requests, often short, to the same site in order to construct a displayed web page. There is no time saving on the first request of a series. I suspect that after Google updates Chrome to use the new feature, one of the other 'independent' browsers is likely to be the next user.
>  
> Yes, the protocol has been designed for situations where there are multiple requests such as HTTP or FTP. Probably only in these cases default 'True' option is appropriate.
>  
> 
> To be active, the feature must be compiled into the socket code of both server and client machines AND must be explicitly requested by both client and server applications.
> 
> On the server side, it must be requested because the request makes a promise that syn+data requests will be handled idempotently. (So the default should be 'off'.) This is trivial for static web pages but may require app-specific overhead for anything else. So, in general, the app should not bother being able to handle FastOpen unless it will be run on servers with FastOpen, and for efficiency, it should not add the overhead unless it is needed because a particular request is from a FastOpen client.
> 
> If the server doesn't support FastOpen and receive a FastOpen request from a client capable, it simply ignores the TFO cookie and reply with a normal SYN+ACK. In this case the first packet (SYN+TFO from client) is only 4 byte larger than normal connection; no other packet is bigger than normal. So for an server app that does not support FastOpen, is completely transparent and does not cause any overhead.
> 
>  
> 
> This is not a problem for Google, with thousands of duplicate apps running on duplicate server configurations. But it was not clear in the OPs post how a Python app would know for sure whether a particular machine is FastOpen capable. I did not see the question of how a server app would know about the client connection type even addressed.
> 
> The server know the client connection type by the first packet that it sends: if the first packet coming by client is a SYN+TFO cookie the server proceed to generate cookie and continue with a FastOpen connection, if the first packet is a SYN, the server proceed with normal 3-handshake connection. In any case these operations are transparent both to Python that application because they're made by kernel. 
>  
> 
> On the client side, .connect and at least the first .send must be combined into either .sendto or .sendmsg (which?, still to be decided, apparently;-) with a new MSG_FASTOPEN argument. So programs need a non-trivial rewrite. If a particular server is not fastopen capable, then new fastopen client kernal socket code can potentially handle the fallback to the old way. But if the client is not fastopen capable, the the fallback must be handled in the Python .sendto code or else in the client code. (So one of those layers must *know* the client system capability.)
> 
> As I said, if a client uses a .sendto or a .sendmsg with MSG_FASTOPEN on a server no-tfo capable, the linux kernel fallback to the old way, therefore it is as if it has done normal .connect and .send. The application don't know if the connection has been made in TFO-mode or normal mode and does not care to know.
>  
> 
> Again, dealing with this, on multiple OSes, should be a lot easier for a monolithic browser like Chrome or Firefox (which might, on some systems, even use their own socket layer code), than for general purpose Python socket and app code.
> 
> So my conclusion is that this is (mostly) premature for Python at this time. This is a slight performance enhancement of limited use that will make code at least slightly more complex in a core module that must be keep at least as rock solid as it is now. Let Google get it working on both their servers and Chrome browser. And wait for Mozilla, say, to add it to Firefox. Things might change before the first 3.4 beta, but I think 3.5 is more likely. Of course, testing will require all 4 combinations of client and server.
> 
> We can introduce TFO only in some modules such as HTTP or FTP. The code is not really complex: for the server is only a .setsockopt before .listen and for the client we should replace the .connect and the first .send with a single .sendto or .sendmsg.
> 
> 
> On Jan 10, 2013, at 10:46 PM, Guido van Rossum:
>  
> Hopefully the OP has some sample Python code? 
>  
> Yes, it is pratically same as C, I attached examples (I needed to declare manually TCP and MSG constants because my glibc hasn't them yet).
>  
> 
> Federico Reghenzani
> <tfo.tar.gz>_______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas


For experimentation I added a patch to gunicorn in the `featire/tcp_fast` branch:

https://github.com/benoitc/gunicorn/pull/471

I expect to do the same in my restkit (HTTP client lib) so I can test it all together. This API could also be interesting for internal purposes.

- benoît


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130111/032816a2/attachment.html>

From mal at egenix.com  Fri Jan 11 15:02:07 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 11 Jan 2013 15:02:07 +0100
Subject: [Python-ideas] TCP Fast Open protocol
In-Reply-To: <kcnue0$3vn$1@ger.gmane.org>
References: <CADf4hJKMnczvSW1nFivOX_aR-sL340qbpcJLOSVoFM-SvbWDXQ@mail.gmail.com>
	<20130110205541.GA1640@iskra.aviel.ru>
	<CADf4hJ+V=VFSPhgpnNSEb8SKwGAYSOg=PLdGyryBr5rpbUH2Fw@mail.gmail.com>
	<20130110211938.GB1640@iskra.aviel.ru>
	<CAP7+vJJKu+fQ08_qkMvk8yPs+OOvsO7aiRKJyUZNCCsUpQ-UsA@mail.gmail.com>
	<AF4E7D19-5B84-434E-8451-843859D9ACE7@gunicorn.org>
	<kcnue0$3vn$1@ger.gmane.org>
Message-ID: <50F01B5F.5060807@egenix.com>

On 11.01.2013 03:45, Terry Reedy wrote:
> So my conclusion is that this is (mostly) premature for Python at this time. This is a slight
> performance enhancement of limited use that will make code at least slightly more complex in a core
> module that must be keep at least as rock solid as it is now. Let Google get it working on both
> their servers and Chrome browser. And wait for Mozilla, say, to add it to Firefox. Things might
> change before the first 3.4 beta, but I think 3.5 is more likely. Of course, testing will require
> all 4 combinations of client and server.

Agreed.

I also wonder how this relates to HTTP pipelining, a feature
to improve the same multiple-requests-to-one-server situation.

Pipelining has been implemented for years both on clients and servers,
yet it is still turned off per default in e.g. Firefox:

http://en.wikipedia.org/wiki/HTTP_pipelining

There's also HTTP 2.0 on the horizon, so it may be better to wait and see
which of those technologies actually gets enough use in practice, before
adding support to the Python library.

That said, it may be useful to have a PyPI package which implements
the FastOpen protocol in a separate socket implementation (which can
then monkey itself into the stdlib, if the application developer
wants this).
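Such a package might look roughly like this (names, the class, and the
fallback MSG_FASTOPEN value are all hypothetical, Linux-only assumptions;
this is only a sketch of the monkey-patching idea):

```python
import socket

# Assumed Linux value, used only if this Python build doesn't expose it.
MSG_FASTOPEN = getattr(socket, "MSG_FASTOPEN", 0x20000000)

class FastOpenSocket(socket.socket):
    """socket subclass whose first send() piggybacks data on the SYN."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._tfo_addr = None

    def connect(self, addr):
        # Defer the handshake: just remember the address so the first
        # payload can ride along with the SYN in send().
        self._tfo_addr = addr

    def send(self, data, flags=0):
        if self._tfo_addr is not None:
            addr, self._tfo_addr = self._tfo_addr, None
            # Combined connect + send; the kernel falls back to a normal
            # 3-way handshake if the server is not TFO-capable.
            return self.sendto(data, flags | MSG_FASTOPEN, addr)
        return super().send(data, flags)

def install():
    # "Monkey itself into the stdlib", as suggested above.
    socket.socket = FastOpenSocket
```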

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jan 11 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2013-01-22: Python Meeting Duesseldorf ...                 11 days to go

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/


From rurpy at yahoo.com  Fri Jan 11 18:16:05 2013
From: rurpy at yahoo.com (rurpy at yahoo.com)
Date: Fri, 11 Jan 2013 09:16:05 -0800 (PST)
Subject: [Python-ideas] csv dialect enhancement
Message-ID: <0ee19141-547b-4182-880c-0d1b2a574af7@googlegroups.com>

There is a common dialect of CSV, often used in database
applications [*1], that distinguishes between an empty
(quoted) string,
 
  e.g., the second field in  "abc","",3
 
and an empty field,
 
  e.g., the second field in "abc",,3
 
This distinction is needed to tell the difference between zero-length
strings and NULLs when sending csv data to or receiving it from a database
application.

AFAICT, Python's csv module does not distinguish between
empty fields and empty quoted strings.  Both of the examples 
above, when parsed by csv.Reader, will return ['abc', '', 3] 
(or possibly '3' for the last item depending on options).  
Similarly, csv.Writer produces the same output csv text 
(nothing or a quoted empty string depending on Dialect.quoting) 
for row items '' or None.

csv.Reader could distinguish between the above cases by
using an empty string ('') to report an empty (quoted) string 
field, and None to report an empty field.  Thus the second 
example would produce ['abc', None, 3] (or ...,'3').  Similarly,
csv.Writer could produce alternate text (nothing or a quoted 
empty string) depending on whether a row item was None or 
an empty string.
 
I propose that a new dialect attribute be added, "nulls" [*2],
which when false (default) will cause csv to behave as it currently
does.  When true it will have the following effect:
 
Reader:
  When two adjacent delimiters occur, or two white-space
  separated delimiters when Dialect.skipinitialspace is true,
  a value of None will be returned for that field.
 
Writer:
  When a None is present in the list of items being
  formatted, it will result in an empty output field
  (two adjacent delimiters) regardless of other options
  (eg a QUOTE_ALL setting.)

Sniffer:
  Will set "nulls" to True when both adjacent delimiters and
  quoted empty strings are seen in the input text. 
  (Perhaps this behaviour needs to be optional for backward
  compatibility reasons?)
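As a rough sketch of the proposed semantics (standalone helper functions for
illustration only, not the actual csv module implementation; names are made
up, and embedded newlines are not handled):

```python
def format_row(row, delimiter=","):
    # Writer side: None -> empty field (NULL), '' -> quoted empty string.
    out = []
    for item in row:
        if item is None:
            out.append("")            # NULL: two adjacent delimiters
        elif item == "":
            out.append('""')          # empty string: quoted
        else:
            out.append('"%s"' % str(item).replace('"', '""'))
    return delimiter.join(out)

def parse_line(line, delimiter=","):
    # Reader side: an unquoted empty field yields None, a quoted empty
    # string yields ''.  Doubled quotes inside quoted fields are unescaped.
    fields, field, quoted, in_quotes = [], [], False, False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == '"':
                if i + 1 < len(line) and line[i + 1] == '"':
                    field.append('"')
                    i += 1
                else:
                    in_quotes = False
            else:
                field.append(ch)
        elif ch == '"':
            in_quotes = quoted = True
        elif ch == delimiter:
            fields.append("".join(field) if (field or quoted) else None)
            field, quoted = [], False
        else:
            field.append(ch)
        i += 1
    fields.append("".join(field) if (field or quoted) else None)
    return fields
```

So the two examples from the top parse differently, and the distinction
survives a round trip through the writer.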

I think this will allow the csv module to generate the csv
dialect(s) commonly used by database applications.

A specific use case:

I am migrating data from a MS Access database to Postgresql.
I run a tool that extracts table data from Access and correctly
produces CSV files in the dialect used by Postgresql with some 
(nullable) column values having empty fields and other non-
nullable column values having empty string fields.

But I need to modify some values before import.  So I write a
Python program that parses the csv data, modifies some of it
and writes it back out, using the csv module.  But the result 
is that all empty fields and empty strings are written out 
identically as one or the other (the distinction is not preserved).  
Result is that information is lost and the output cannot be 
used.  I would be able to do this if the csv module provide a
"nulls" option as proposed above.


----
[*1] One of the two most important open-source databases,
Postgresql, uses this dialect.  See:
  http://www.postgresql.org/docs/9.2/interactive/sql-copy.html#AEN66692
I don't know about the other.

[*2] I don't really care what the attribute name is; I chose 
"nulls" as a trial balloon because I wanted to avoid something 
with "none" in it to avoid confusion with QUOTE_NONE.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130111/a9ddf75f/attachment.html>

From jimjjewett at gmail.com  Fri Jan 11 18:50:25 2013
From: jimjjewett at gmail.com (Jim Jewett)
Date: Fri, 11 Jan 2013 12:50:25 -0500
Subject: [Python-ideas] TCP Fast Open protocol
In-Reply-To: <CADf4hJJibi037S0WxbWq-4-0LpOjMhvB4WBr-xvMnnUhWt+YYg@mail.gmail.com>
References: <CADf4hJKMnczvSW1nFivOX_aR-sL340qbpcJLOSVoFM-SvbWDXQ@mail.gmail.com>
	<20130110205541.GA1640@iskra.aviel.ru>
	<CADf4hJ+V=VFSPhgpnNSEb8SKwGAYSOg=PLdGyryBr5rpbUH2Fw@mail.gmail.com>
	<20130110211938.GB1640@iskra.aviel.ru>
	<CAP7+vJJKu+fQ08_qkMvk8yPs+OOvsO7aiRKJyUZNCCsUpQ-UsA@mail.gmail.com>
	<AF4E7D19-5B84-434E-8451-843859D9ACE7@gunicorn.org>
	<kcnue0$3vn$1@ger.gmane.org>
	<CADf4hJJibi037S0WxbWq-4-0LpOjMhvB4WBr-xvMnnUhWt+YYg@mail.gmail.com>
Message-ID: <CA+OGgf5fDbJxE0PpMQedGV0V5BNxmYb-S5wSLNVyB_GoFto2Fw@mail.gmail.com>

On 1/11/13, Federico Reghenzani <federico.dev at reghe.net> wrote:
> On Fri, Jan 11, 2013 at 3:45 AM, Terry Reedy <tjreedy at udel.edu> wrote:

> Yes, the protocol has been designed for situations where there are multiple
> requests, such as HTTP or FTP. Probably a default of 'True' is appropriate
> only in these cases.

What is the harm of using it in other situations?  If the answer were truly
just "4 bytes per host", then it might still be a good tradeoff.

>> To be active, the feature must be compiled into the socket code of both
>> server and client machines AND must be explicitly requested by both
>> client and server applications.

This, however, is a problem.

Based on (most of) the rest of your descriptions, it sounds like a
seamless drop-in replacement; it should be an implementation detail
that applications never ever notice, like having a security patch
applied to the operating system when python isn't even running.

But if that were true, an explicit request would be overly cautious,
unless this were truly still so experimental that production servers
(and, thus, the python distribution in a default build) should not yet
use it.

Also note that if it isn't available on Windows (and probably even
on Windows XP without additional dependencies), Python can't
yet rely on it.

Below, you also say that it is not appropriate for servers unless
syn+data is idempotent -- but I don't know even what that means
without looking it up, let alone whether it is true of my app -- so it
sounds like a bug magnet.

> The server knows the client connection type from the first packet the
> client sends: if that packet is a SYN with a TFO cookie, the server
> generates a cookie and continues with a Fast Open connection; if it is a
> plain SYN, the server proceeds with the normal 3-way handshake. In either
> case these operations are transparent to both Python and the application,
> because they are done by the kernel.

So how is this a python issue at all?  Because of the explicit request?
Because of the need to keep something idempotent?

I see no harm in letting open accept and pass through additional optional
arguments, or in a generic way to query the kernel for its extensions, but
if you need something specific to this particular extension, then please
do it as an external package first.

>> On the client side, .connect and at least the first .send must be
>> combined
>> into either .sendto or .sendmsg (which?, still to be decided,
>> apparently;-)
>> with a new MSG_FASTOPEN argument. So programs need a non-trivial rewrite.

Application programs, or just the plumbing in the httplib?

-jJ


From rurpy at yahoo.com  Fri Jan 11 18:51:09 2013
From: rurpy at yahoo.com (rurpy at yahoo.com)
Date: Fri, 11 Jan 2013 09:51:09 -0800 (PST)
Subject: [Python-ideas] csv dialect enhancement (repost)
In-Reply-To: <0ee19141-547b-4182-880c-0d1b2a574af7@googlegroups.com>
References: <0ee19141-547b-4182-880c-0d1b2a574af7@googlegroups.com>
Message-ID: <d5f10739-c6b0-4228-95ba-414d8bb449d7@googlegroups.com>

[Sorry for the duplicated text in the previous post, please ignore 
that one in favor of this one]

There is a common dialect of CSV, often used in database
applications [*1], that distinguishes between an empty
(quoted) string,
 
  e.g., the second field in  "abc","",3
 
and an empty field,
 
  e.g., the second field in "abc",,3
 
This distinction is needed to tell the difference between
0-length strings and NULLs when sending csv data to, or
receiving it from, a database application.

AFAICT, Python's csv module does not distinguish between
empty fields and empty quoted strings.  Both of the examples
above, when parsed by csv.Reader, will return ['abc', '', 3]
(or possibly '3' for the last item depending on options). 
Similarly, csv.Writer produces the same output csv text
(nothing or a quoted empty string depending on Dialect.quoting)
for row items '' or None.
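
The loss of the distinction is easy to demonstrate with the current
stdlib (a minimal sketch using the csv module's default dialect):

```python
import csv
import io

# Reading: an empty field and a quoted empty string parse identically.
data = 'abc,,3\r\nabc,"",3\r\n'
rows = list(csv.reader(io.StringIO(data)))
print(rows)  # both rows come back as ['abc', '', '3']

# Writing: None and '' produce the same output text.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(['abc', None, 3])
writer.writerow(['abc', '', 3])
print(repr(buf.getvalue()))  # 'abc,,3\r\nabc,,3\r\n' -- distinction gone
```

Round-tripping a file through reader and writer therefore collapses
empty fields and empty strings into one or the other.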

csv.Reader could distinguish between the above cases by
using an empty string ('') to report an empty (quoted) string
field, and None to report an empty field.  Thus the second
example would produce ['abc', None, 3] (or ...,'3').  Similarly,
csv.Writer could produce alternate text (nothing or a quoted
empty string) depending on whether a row item was None or
an empty string.
 
I propose that a new dialect attribute be added, "nulls" [*2],
which when false (default) will cause csv to behave as it currently
does.  When true it will have the following effect:
 
Reader:
  When two adjacent delimiters occur, or two white-space
  separated delimiters when Dialect.skipinitialspace is true,
  a value of None will be returned for that field.
 
Writer:
  When a None is present in the list of items being
  formatted, it will result in an empty output field
  (two adjacent delimiters) regardless of other options
  (e.g., a QUOTE_ALL setting).

Sniffer:
  Will set "nulls" to True when both adjacent delimiters and
  quoted empty strings are seen in the input text.
  (Perhaps this behaviour needs to be optional for backward
  compatibility reasons?)

I think this will allow the csv module to generate the csv
dialect(s) commonly used by database applications.
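
The intended writer semantics can be sketched with a small stand-alone
helper (purely illustrative; `write_row_with_nulls` is a hypothetical
function, not part of the csv module, emulating nulls=True together
with QUOTE_ALL):

```python
import io

def write_row_with_nulls(f, row, delimiter=','):
    # Hypothetical sketch of the proposed writer semantics with
    # nulls=True and QUOTE_ALL: None becomes a bare empty field,
    # everything else (including '') is quoted.
    fields = []
    for item in row:
        if item is None:
            fields.append('')  # empty field, no quotes
        else:
            text = str(item).replace('"', '""')  # escape embedded quotes
            fields.append('"%s"' % text)
    f.write(delimiter.join(fields) + '\r\n')

buf = io.StringIO()
write_row_with_nulls(buf, ['abc', None, '', 3])
print(repr(buf.getvalue()))  # '"abc",,"","3"\r\n'
```

None and '' now survive a round trip as distinct values, which is
exactly what the database COPY dialect needs.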

A specific use case:

I am migrating data from a MS Access database to Postgresql.
I run a tool that extracts table data from Access and correctly
produces CSV files in the dialect used by Postgresql with some
(nullable) column values having empty fields and other non-
nullable column values having empty string fields.

But I need to modify some values before import.  So I write a
Python program that parses the csv data, modifies some of it
and writes it back out, using the csv module.  But the result
is that all empty fields and empty strings are written out
identically as one or the other (the distinction is not preserved). 
Result is that information is lost and the output cannot be
used.  I would be able to do this if the csv module provided a
"nulls" option as proposed above.

----
[*1] One of the two most important open-source databases,
Postgresql, uses this dialect.  See:
  http://www.postgresql.org/docs/9.2/interactive/sql-copy.html#AEN66692
I don't know about the other.

[*2] I don't really care what the attribute name is; I chose
"nulls" as a trial balloon because I wanted to avoid something
with "none" in it to avoid confusion with QUOTE_NONE.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130111/7c83e73e/attachment.html>

From ethan at stoneleaf.us  Fri Jan 11 18:49:27 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Fri, 11 Jan 2013 09:49:27 -0800
Subject: [Python-ideas] csv dialect enhancement
In-Reply-To: <0ee19141-547b-4182-880c-0d1b2a574af7@googlegroups.com>
References: <0ee19141-547b-4182-880c-0d1b2a574af7@googlegroups.com>
Message-ID: <50F050A7.1010409@stoneleaf.us>

On 01/11/2013 09:16 AM, rurpy at yahoo.com wrote:
> I propose that a new dialect attribute be added, "nulls",
> which when false (default) will cause csv to behave as it currently
> does.  When true it will have the following effect:
>
> Reader:
>    When two adjacent delimiters occur, or two white-space
>    separated delimiters when Dialect.skipinitialspace is true,
>    a value of None will be returned for that field.
>
> Writer:
>    When a None is present in the list of items being
>    formatted, it will result in an empty output field
>    (two adjacent delimiters) regardless of other options
>    (e.g., a QUOTE_ALL setting).
>
> Sniffer:
>    Will set "nulls" to True when both adjacent delimiters and
>    quoted empty strings are seen in the input text.
>    (Perhaps this behaviour needs to be optional for backward
>    compatibility reasons?)

+1


From yorik.sar at gmail.com  Fri Jan 11 21:03:51 2013
From: yorik.sar at gmail.com (Yuriy Taraday)
Date: Sat, 12 Jan 2013 00:03:51 +0400
Subject: [Python-ideas] TCP Fast Open protocol
In-Reply-To: <50F01B5F.5060807@egenix.com>
References: <CADf4hJKMnczvSW1nFivOX_aR-sL340qbpcJLOSVoFM-SvbWDXQ@mail.gmail.com>
	<20130110205541.GA1640@iskra.aviel.ru>
	<CADf4hJ+V=VFSPhgpnNSEb8SKwGAYSOg=PLdGyryBr5rpbUH2Fw@mail.gmail.com>
	<20130110211938.GB1640@iskra.aviel.ru>
	<CAP7+vJJKu+fQ08_qkMvk8yPs+OOvsO7aiRKJyUZNCCsUpQ-UsA@mail.gmail.com>
	<AF4E7D19-5B84-434E-8451-843859D9ACE7@gunicorn.org>
	<kcnue0$3vn$1@ger.gmane.org> <50F01B5F.5060807@egenix.com>
Message-ID: <CABocrW6xraudAMbut=d8_EMEtyMWM1CpGvKZtLtz3bub5GGqyA@mail.gmail.com>

On Fri, Jan 11, 2013 at 6:02 PM, M.-A. Lemburg <mal at egenix.com> wrote:

> That said, it may be useful to have a PyPI package which implements
> the FastOpen protocol in a separate socket implementation (which can
> then monkey itself into the stdlib, if the application developer
> wants this).
>

TCP Fast Open should be supported in client code directly; it's not enough
to have socket() support it, and it's not just up to the socket() implementation.

The server side is pretty simple, so to say "Python supports TCP_FASTOPEN"
there should be support implemented for each (or most) of the client libraries
in the stdlib, such as almost every module in
http://docs.python.org/3/library/internet.html

Monkey-patching all these modules (or their connect() parts) is not a very
clean way to do it, I think.
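
For concreteness, the socket-level calls each client library would need look
roughly like this (a sketch only: the hard-coded constant values are
assumptions copied from the Linux kernel headers, since Python's socket
module does not expose them here, and nothing below is tested against a
TFO-enabled kernel):

```python
import socket

# Linux constants; the fallback values are assumptions taken from the
# kernel headers (linux/tcp.h and linux/socket.h).
TCP_FASTOPEN = getattr(socket, "TCP_FASTOPEN", 23)
MSG_FASTOPEN = getattr(socket, "MSG_FASTOPEN", 0x20000000)

def tfo_connect_and_send(host, port, payload):
    # Client side: connect() and the first send() are fused into a single
    # sendto() carrying MSG_FASTOPEN; the kernel puts the payload in the
    # SYN when it holds a valid TFO cookie for this server.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.sendto(payload, MSG_FASTOPEN, (host, port))
    return s

def tfo_listen(host, port, qlen=5):
    # Server side: opt in with a setsockopt before listen(); qlen bounds
    # the queue of not-yet-accepted Fast Open connections.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((host, port))
    s.setsockopt(socket.IPPROTO_TCP, TCP_FASTOPEN, qlen)
    s.listen(5)
    return s
```

Routing every stdlib client's first write through something like
tfo_connect_and_send() is exactly the intrusive change under discussion.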

-- 

Kind regards, Yuriy.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130112/4d88329d/attachment.html>

From mal at egenix.com  Fri Jan 11 23:12:21 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 11 Jan 2013 23:12:21 +0100
Subject: [Python-ideas] TCP Fast Open protocol
In-Reply-To: <CABocrW6xraudAMbut=d8_EMEtyMWM1CpGvKZtLtz3bub5GGqyA@mail.gmail.com>
References: <CADf4hJKMnczvSW1nFivOX_aR-sL340qbpcJLOSVoFM-SvbWDXQ@mail.gmail.com>
	<20130110205541.GA1640@iskra.aviel.ru>
	<CADf4hJ+V=VFSPhgpnNSEb8SKwGAYSOg=PLdGyryBr5rpbUH2Fw@mail.gmail.com>
	<20130110211938.GB1640@iskra.aviel.ru>
	<CAP7+vJJKu+fQ08_qkMvk8yPs+OOvsO7aiRKJyUZNCCsUpQ-UsA@mail.gmail.com>
	<AF4E7D19-5B84-434E-8451-843859D9ACE7@gunicorn.org>
	<kcnue0$3vn$1@ger.gmane.org> <50F01B5F.5060807@egenix.com>
	<CABocrW6xraudAMbut=d8_EMEtyMWM1CpGvKZtLtz3bub5GGqyA@mail.gmail.com>
Message-ID: <50F08E45.5050602@egenix.com>

On 11.01.2013 21:03, Yuriy Taraday wrote:
> On Fri, Jan 11, 2013 at 6:02 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> 
>> That said, it may be useful to have a PyPI package which implements
>> the FastOpen protocol in a separate socket implementation (which can
>> then monkey itself into the stdlib, if the application developer
>> wants this).
>>
> 
> TCP Fast Open should be supported in client code directly; it's not enough
> to have socket() support it, and it's not just up to the socket() implementation.

Right, the new methods would have to be used by the application.

> The server side is pretty simple, so to say "Python supports TCP_FASTOPEN"
> there should be support implemented for each (or most) of the client libraries
> in the stdlib, such as almost every module in
> http://docs.python.org/3/library/internet.html
> 
> Monkey-patching all these modules (or their connect() parts) is not a very
> clean way to do it, I think.

Of course not, but it's a viable way to test-drive such an implementation
before putting the code directly into the stdlib modules.

gevent uses the same approach, BTW.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jan 11 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2013-01-22: Python Meeting Duesseldorf ...                 11 days to go

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/


From guido at python.org  Fri Jan 11 23:23:00 2013
From: guido at python.org (Guido van Rossum)
Date: Fri, 11 Jan 2013 14:23:00 -0800
Subject: [Python-ideas] TCP Fast Open protocol
In-Reply-To: <50F08E45.5050602@egenix.com>
References: <CADf4hJKMnczvSW1nFivOX_aR-sL340qbpcJLOSVoFM-SvbWDXQ@mail.gmail.com>
	<20130110205541.GA1640@iskra.aviel.ru>
	<CADf4hJ+V=VFSPhgpnNSEb8SKwGAYSOg=PLdGyryBr5rpbUH2Fw@mail.gmail.com>
	<20130110211938.GB1640@iskra.aviel.ru>
	<CAP7+vJJKu+fQ08_qkMvk8yPs+OOvsO7aiRKJyUZNCCsUpQ-UsA@mail.gmail.com>
	<AF4E7D19-5B84-434E-8451-843859D9ACE7@gunicorn.org>
	<kcnue0$3vn$1@ger.gmane.org> <50F01B5F.5060807@egenix.com>
	<CABocrW6xraudAMbut=d8_EMEtyMWM1CpGvKZtLtz3bub5GGqyA@mail.gmail.com>
	<50F08E45.5050602@egenix.com>
Message-ID: <CAP7+vJJCPP_SMLh3d=VwT38Nu7BRWCpa+tqE4nkM42zUVBoETw@mail.gmail.com>

So, again. Has *anyone* actually written *any* working Python code for this?

-- 
--Guido van Rossum (python.org/~guido)


From guido at python.org  Sat Jan 12 00:41:05 2013
From: guido at python.org (Guido van Rossum)
Date: Fri, 11 Jan 2013 15:41:05 -0800
Subject: [Python-ideas] Tulip / PEP 3156 event loop implementation question:
	CPU vs. I/O starvation
Message-ID: <CAP7+vJLK5VhuvqXGF5FEiqa7Gwb4a=hsfnaPG8yXHb02UNjxMQ@mail.gmail.com>

Here's an interesting puzzle. Check out the core of Tulip's event
loop: http://code.google.com/p/tulip/source/browse/tulip/unix_events.py#672

Specifically this does something like this:

1. poll for I/O, appending any ready handlers to the _ready queue

2. append any handlers scheduled for a time <= now to the _ready queue

3. while _ready:
       handler = _ready.popleft()
       call handler

It is the latter loop that causes me some concern. In theory it is
possible for a bad callback to make this loop never finish, as
follows:

def hogger():
    tulip.get_event_loop().call_soon(hogger)

Because call_soon() appends the handler to the _ready queue, the while
loop will never finish.

There is a simple enough solution (Tornado uses this AFAIK):

now_ready = list(_ready)
_ready.clear()
for handler in now_ready:
    call handler

However this implies that we go back to the I/O polling code more
frequently. While the I/O polling code sets the timeout to zero when
there's anything in the _ready queue, so it won't block, it still
isn't free; it's an expensive system call that we'd like to put off
until we have nothing better to do.

I can imagine various patterns where handlers append other handlers to
the _ready queue for immediate execution, and I'd make such patterns
efficient (i.e. the user shouldn't have to worry about the cost of the
I/O poll compared to the amount of work appended to the _ready queue).
It is also convenient to say that a hogger that really wants to hog
the CPU can do so anyway, e.g.:

def hogger():
    while True:
        pass

However this would pretty much assume malice; real-life versions of
the former hogger pattern may be spread across many callbacks and
could be hard to recognize or anticipate.

So what's more important? Avoid I/O starvation at all cost or make the
callbacks-posting-callbacks pattern efficient? I can see several
outcomes of this discussion: we could end up deciding that one or the
other strategy is always best; we could also leave it up to the
implementation (but then I still would want guidance for what to do in
Tulip); we could even decide this is so important that the user needs
to be able to control the policy here (though I hate having many
configuration options, since in practice few people bother to take
control, and you might as well have hard-coded the default...).

Thoughts? Do I need to explain it better?

-- 
--Guido van Rossum (python.org/~guido)


From ronaldoussoren at mac.com  Sat Jan 12 01:03:33 2013
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Fri, 11 Jan 2013 16:03:33 -0800
Subject: [Python-ideas] TCP Fast Open protocol
In-Reply-To: <CA+OGgf5fDbJxE0PpMQedGV0V5BNxmYb-S5wSLNVyB_GoFto2Fw@mail.gmail.com>
References: <CADf4hJKMnczvSW1nFivOX_aR-sL340qbpcJLOSVoFM-SvbWDXQ@mail.gmail.com>
	<20130110205541.GA1640@iskra.aviel.ru>
	<CADf4hJ+V=VFSPhgpnNSEb8SKwGAYSOg=PLdGyryBr5rpbUH2Fw@mail.gmail.com>
	<20130110211938.GB1640@iskra.aviel.ru>
	<CAP7+vJJKu+fQ08_qkMvk8yPs+OOvsO7aiRKJyUZNCCsUpQ-UsA@mail.gmail.com>
	<AF4E7D19-5B84-434E-8451-843859D9ACE7@gunicorn.org>
	<kcnue0$3vn$1@ger.gmane.org>
	<CADf4hJJibi037S0WxbWq-4-0LpOjMhvB4WBr-xvMnnUhWt+YYg@mail.gmail.com>
	<CA+OGgf5fDbJxE0PpMQedGV0V5BNxmYb-S5wSLNVyB_GoFto2Fw@mail.gmail.com>
Message-ID: <8EEEA6D5-17A7-4FAF-9EEC-A0E6E15E3BBB@mac.com>


On 11 Jan, 2013, at 9:50, Jim Jewett <jimjjewett at gmail.com> wrote:

> On 1/11/13, Federico Reghenzani <federico.dev at reghe.net> wrote:
>> On Fri, Jan 11, 2013 at 3:45 AM, Terry Reedy <tjreedy at udel.edu> wrote:
> 
>> Yes, the protocol has been designed for situations where there are multiple
>> requests, such as HTTP or FTP. Probably a default of 'True' is appropriate
>> only in these cases.
> 
> What is the harm of using it in other situations?  If the answer were truly
> just "4 bytes per host", then it might still be a good tradeoff.
> 
>>> To be active, the feature must be compiled into the socket code of both
>>> server and client machines AND must be explicitly requested by both
>>> client and server applications.
> 
> This, however, is a problem.
> 
> Based on (most of) the rest of your descriptions, it sounds like a
> seamless drop-in replacement; it should be an implementation detail
> that applications never ever notice, like having a security patch
> applied to the operating system when python isn't even running.
> 
> But if that were true, an explicit request would be overly cautious,
> unless this were truly still so experimental that production servers
> (and, thus, the python distribution in a default build) should not yet
> use it.

It must be explicitly requested by the server because the behavior might change;
in particular, the lwn.net page about this feature mentions that duplicate SYN
messages are not detected, and if I parse that page correctly that might mean
that the server gets two or more requests when the connection is unreliable (or
slow) and retransmission happens. That is fine for static webpages, but not if
the client request has side effects (e.g. the server starts updating a database).

BTW, this (Linux-only) feature is very new; it would IMHO be useful to exercise it
in real life with a package on PyPI that monkeypatches the stdlib before adding
the feature to the stdlib.  It is currently not clear if the option will be
useful in the long run.

Ronald


From zuo at chopin.edu.pl  Sat Jan 12 02:28:25 2013
From: zuo at chopin.edu.pl (Jan Kaliszewski)
Date: Sat, 12 Jan 2013 02:28:25 +0100
Subject: [Python-ideas] Tulip / PEP 3156 event loop implementation
 question: CPU vs. I/O starvation
In-Reply-To: <CAP7+vJLK5VhuvqXGF5FEiqa7Gwb4a=hsfnaPG8yXHb02UNjxMQ@mail.gmail.com>
References: <CAP7+vJLK5VhuvqXGF5FEiqa7Gwb4a=hsfnaPG8yXHb02UNjxMQ@mail.gmail.com>
Message-ID: <cae95346cbbccf7cee1f1a8f56fa3469@chopin.edu.pl>

12.01.2013 00:41, Guido van Rossum wrote:

> Here's an interesting puzzle. Check out the core of Tulip's event
> loop: 
> http://code.google.com/p/tulip/source/browse/tulip/unix_events.py#672
>
> Specifically this does something like this:
>
> 1. poll for I/O, appending any ready handlers to the _ready queue
>
> 2. append any handlers scheduled for a time <= now to the _ready 
> queue
>
> 3. while _ready:
>        handler = _ready.popleft()
>        call handler
>
> It is the latter loop that causes me some concern. In theory it is
> possible for a bad callback to make this loop never finish, as
> follows:
>
> def hogger():
>     tulip.get_event_loop().call_soon(hogger)
>
> Because call_soon() appends the handler to the _ready queue, the 
> while
> loop will never finish.
>
> There is a simple enough solution (Tornado uses this AFAIK):
>
> now_ready = list(_ready)
> _ready.clear()
> for handler in now_ready:
>     call handler
>
> However this implies that we go back to the I/O polling code more
> frequently. While the I/O polling code sets the timeout to zero when
> there's anything in the _ready queue, so it won't block, it still
> isn't free; it's an expensive system call that we'd like to put off
> until we have nothing better to do.
[...]
> So what's more important? Avoid I/O starvation at all cost or make 
> the
> callbacks-posting-callbacks pattern efficient? I can see several
> outcomes of this discussion: we could end up deciding that one or the
> other strategy is always best; we could also leave it up to the
> implementation (but then I still would want guidance for what to do 
> in
> Tulip); we could even decide this is so important that the user needs
> to be able to control the policy here
[...]

Maybe it could be, at least for the standard Tulip implementation,
parameterizable with a simple integer value -- the suggested max number
of loop iterations?

E.g. something like the following:

     # `suggested_iter_limit` is the parameter
     actual_limit = max(len(_ready), suggested_iter_limit)
     for i in range(actual_limit):
         if not _ready:
             break
         handler = _ready.popleft()
         call handler...

Regards.
*j



From ncoghlan at gmail.com  Sat Jan 12 04:08:14 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 12 Jan 2013 13:08:14 +1000
Subject: [Python-ideas] Tulip / PEP 3156 event loop implementation
 question: CPU vs. I/O starvation
In-Reply-To: <CAP7+vJLK5VhuvqXGF5FEiqa7Gwb4a=hsfnaPG8yXHb02UNjxMQ@mail.gmail.com>
References: <CAP7+vJLK5VhuvqXGF5FEiqa7Gwb4a=hsfnaPG8yXHb02UNjxMQ@mail.gmail.com>
Message-ID: <CADiSq7ctUZUY0CaSoWhj+TG=b_an1E1T5xFc4QS142uj9MJaRw@mail.gmail.com>

On Sat, Jan 12, 2013 at 9:41 AM, Guido van Rossum <guido at python.org> wrote:
> So what's more important? Avoid I/O starvation at all cost or make the
> callbacks-posting-callbacks pattern efficient? I can see several
> outcomes of this discussion: we could end up deciding that one or the
> other strategy is always best; we could also leave it up to the
> implementation (but then I still would want guidance for what to do in
> Tulip); we could even decide this is so important that the user needs
> to be able to control the policy here (though I hate having many
> configuration options, since in practice few people bother to take
> control, and you might as well have hard-coded the default...).
>
> Thoughts? Do I need to explain it better?

Given the availability of "yield from" as a tool for efficiently
invoking other asynchronous operations without hitting the event loop
at all, it seems to me that it is more appropriate to avoid IO
starvation by interleaving IO event processing and ready callback
processing.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From ncoghlan at gmail.com  Sat Jan 12 04:20:40 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 12 Jan 2013 13:20:40 +1000
Subject: [Python-ideas] Tulip / PEP 3156 event loop implementation
 question: CPU vs. I/O starvation
In-Reply-To: <CADiSq7ctUZUY0CaSoWhj+TG=b_an1E1T5xFc4QS142uj9MJaRw@mail.gmail.com>
References: <CAP7+vJLK5VhuvqXGF5FEiqa7Gwb4a=hsfnaPG8yXHb02UNjxMQ@mail.gmail.com>
	<CADiSq7ctUZUY0CaSoWhj+TG=b_an1E1T5xFc4QS142uj9MJaRw@mail.gmail.com>
Message-ID: <CADiSq7fGF7_Pv7oEAyTVwrFXsaKyY5NVeb27a2YMoLQeZ_Mv=A@mail.gmail.com>

On Sat, Jan 12, 2013 at 1:08 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Sat, Jan 12, 2013 at 9:41 AM, Guido van Rossum <guido at python.org> wrote:
>> So what's more important? Avoid I/O starvation at all cost or make the
>> callbacks-posting-callbacks pattern efficient? I can see several
>> outcomes of this discussion: we could end up deciding that one or the
>> other strategy is always best; we could also leave it up to the
>> implementation (but then I still would want guidance for what to do in
>> Tulip); we could even decide this is so important that the user needs
>> to be able to control the policy here (though I hate having many
>> configuration options, since in practice few people bother to take
>> control, and you might as well have hard-coded the default...).
>>
>> Thoughts? Do I need to explain it better?
>
> Given the availability of "yield from" as a tool for efficiently
> invoking other asynchronous operations without hitting the event loop
> at all, it seems to me that it is more appropriate to avoid IO
> starvation by interleaving IO event processing and ready callback
> processing.

Oops, I meant to include a link to http://bugs.python.org/issue7946,
which is about the convoy effect created by the GIL implementation
when I/O bound threads are processed in the presence of a CPU bound
thread (essentially, the I/O latency increases to the GIL check
interval). (The thing that changed in 3.2 is that the magnitude of the
convoy effect is now independent of the work-per-bytecode in the CPU
bound thread)

That's what makes me think always alternating between processing ready
callbacks and checking for IO events is the right thing to do.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From robertc at robertcollins.net  Sat Jan 12 06:06:22 2013
From: robertc at robertcollins.net (Robert Collins)
Date: Sat, 12 Jan 2013 18:06:22 +1300
Subject: [Python-ideas] Tulip / PEP 3156 event loop implementation
 question: CPU vs. I/O starvation
In-Reply-To: <CAP7+vJLK5VhuvqXGF5FEiqa7Gwb4a=hsfnaPG8yXHb02UNjxMQ@mail.gmail.com>
References: <CAP7+vJLK5VhuvqXGF5FEiqa7Gwb4a=hsfnaPG8yXHb02UNjxMQ@mail.gmail.com>
Message-ID: <CAJ3HoZ1rBgrUDPsoS6irCLaB+dFCLKJeHwAXu1aL04ofj40dWA@mail.gmail.com>

On 12 January 2013 12:41, Guido van Rossum <guido at python.org> wrote:
> Here's an interesting puzzle. Check out the core of Tulip's event
> loop: http://code.google.com/p/tulip/source/browse/tulip/unix_events.py#672
> now_ready = list(_ready)
> _ready.clear()
> for handler in now_ready:
>     call handler
>
> However this implies that we go back to the I/O polling code more
> frequently. While the I/O polling code sets the timeout to zero when
> there's anything in the _ready queue, so it won't block, it still
> isn't free; it's an expensive system call that we'd like to put off
> until we have nothing better to do.

How expensive is it really? If it's select, it's terrible, but we
shouldn't be using that anywhere.
If it's poll(), it is moderately expensive, and it doesn't scale:
it's linear in the number of fds.

If it's I/O completion ports on Windows, it is approximately free: the
OS calls back into us every time we tell it we're ready for more
events.
And if it's epoll, it is also basically free, reading off an event
queue rather than checking every entry in the array.
kqueue has similar efficiency on BSD systems.

I'd want to see some actual numbers before assuming that the call into
epoll or completion is actually a driving factor in latency here.
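
Getting rough numbers takes only a few lines (a micro-benchmark
sketch; results are machine- and kernel-dependent, and neither epoll
nor poll exists on Windows, hence the fallbacks):

```python
import select
import time

def zero_timeout_poll_cost(n=100000):
    # Measure the per-call cost of a zero-timeout poll with no fds
    # registered, preferring epoll where available (Linux), falling
    # back to poll (other POSIX), returning None where neither exists.
    make = getattr(select, "epoll", None) or getattr(select, "poll", None)
    if make is None:
        return None
    poller = make()
    t0 = time.perf_counter()
    for _ in range(n):
        poller.poll(0)        # returns immediately: timeout is zero
    return (time.perf_counter() - t0) / n * 1e6  # microseconds per call

cost = zero_timeout_poll_cost()
if cost is not None:
    print("zero-timeout poll: %.2f us/call" % cost)
```

Comparing that figure against the cost of a typical ready callback is
what would settle whether the extra poll per iteration matters.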

-Rob


From stefan_ml at behnel.de  Sat Jan 12 07:39:42 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 12 Jan 2013 07:39:42 +0100
Subject: [Python-ideas] Tulip / PEP 3156 event loop implementation
 question: CPU vs. I/O starvation
In-Reply-To: <cae95346cbbccf7cee1f1a8f56fa3469@chopin.edu.pl>
References: <CAP7+vJLK5VhuvqXGF5FEiqa7Gwb4a=hsfnaPG8yXHb02UNjxMQ@mail.gmail.com>
	<cae95346cbbccf7cee1f1a8f56fa3469@chopin.edu.pl>
Message-ID: <kcr0fb$uqj$1@ger.gmane.org>

Jan Kaliszewski, 12.01.2013 02:28:
> 12.01.2013 00:41, Guido van Rossum wrote:
> 
>> Here's an interesting puzzle. Check out the core of Tulip's event
>> loop: http://code.google.com/p/tulip/source/browse/tulip/unix_events.py#672
>>
>> Specifically this does something like this:
>>
>> 1. poll for I/O, appending any ready handlers to the _ready queue
>>
>> 2. append any handlers scheduled for a time <= now to the _ready queue
>>
>> 3. while _ready:
>>        handler = _ready.popleft()
>>        call handler
>>
>> It is the latter loop that causes me some concern. In theory it is
>> possible for a bad callback to make this loop never finish, as
>> follows:
>>
>> def hogger():
>>     tulip.get_event_loop().call_soon(hogger)
>>
>> Because call_soon() appends the handler to the _ready queue, the while
>> loop will never finish.
>>
>> There is a simple enough solution (Tornado uses this AFAIK):
>>
>> now_ready = list(_ready)
>> _ready.clear()
>> for handler in now_ready:
>>     call handler
>>
>> However this implies that we go back to the I/O polling code more
>> frequently. While the I/O polling code sets the timeout to zero when
>> there's anything in the _ready queue, so it won't block, it still
>> isn't free; it's an expensive system call that we'd like to put off
>> until we have nothing better to do.
> [...]
>> So what's more important? Avoid I/O starvation at all cost or make the
>> callbacks-posting-callbacks pattern efficient? I can see several
>> outcomes of this discussion: we could end up deciding that one or the
>> other strategy is always best; we could also leave it up to the
>> implementation (but then I still would want guidance for what to do in
>> Tulip); we could even decide this is so important that the user needs
>> to be able to control the policy here
> [...]
> 
> Maybe it could be, at least for the standard Tulip implementation,
> parameterizable with a simple integer value -- the suggested max number
> of loop iterations?
> 
> E.g. something like the following:
> 
>     # `suggested_iter_limit` is the parameter
>     actual_limit = max(len(_ready), suggested_iter_limit)
>     for i in range(actual_limit):
>         if not _ready:
>             break
>         handler = _ready.popleft()
>         call handler...

Yep, it could simply use itertools.islice() when iterating over _ready,
with an appropriate upper bound relative to the actual length, and then
cut the list down after the loop. So it would never go, say, 50% over the
initially anticipated workload. Or rather a fixed number, I guess, to make
it more predictable for users. That could be a user-configurable parameter
of the I/O loop.

    actual_limit = len(_ready) + max_additional_load_per_loop
    for handler in itertools.islice(_ready, None, actual_limit):
        call handler...
    del _ready[:actual_limit]
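A runnable variant of this idea (a sketch, not actual Tulip code): Tulip's _ready is a collections.deque, which supports neither slice deletion nor mutation while iterating, so the sketch below enforces the same budget with popleft() instead of islice() plus del:

```python
# Bounded drain: run the handlers that were ready at the start of the
# pass, plus at most `max_extra` handlers scheduled while draining.
from collections import deque

def run_ready(ready: deque, max_extra: int = 100) -> int:
    """Run at most len(ready) + max_extra handlers, then return.

    Handlers appended by call_soon() during the drain run in this pass
    only while the budget lasts; the rest wait for the next iteration,
    so a self-rescheduling "hogger" cannot starve I/O forever.
    """
    budget = len(ready) + max_extra
    ran = 0
    while ready and ran < budget:
        handler = ready.popleft()
        handler()  # may append more handlers to `ready`
        ran += 1
    return ran
```

With max_extra=0 this degenerates to the Tornado-style snapshot; with a large max_extra it approaches the original drain-until-empty loop.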

Stefan




From ncoghlan at gmail.com  Sat Jan 12 10:53:27 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 12 Jan 2013 19:53:27 +1000
Subject: [Python-ideas] Tulip / PEP 3156 event loop implementation
 question: CPU vs. I/O starvation
In-Reply-To: <kcr0fb$uqj$1@ger.gmane.org>
References: <CAP7+vJLK5VhuvqXGF5FEiqa7Gwb4a=hsfnaPG8yXHb02UNjxMQ@mail.gmail.com>
	<cae95346cbbccf7cee1f1a8f56fa3469@chopin.edu.pl>
	<kcr0fb$uqj$1@ger.gmane.org>
Message-ID: <CADiSq7f8AxBg9t1wmWjuTj09EECHZMTZQySy0KkP1gFWKPJaUw@mail.gmail.com>

On Sat, Jan 12, 2013 at 4:39 PM, Stefan Behnel <stefan_ml at behnel.de> wrote:
> Yep, it could simply use itertools.islice() when iterating over _ready with
> an appropriate upper bound factor relative to the actual length, and then
> cut down the list after the loop. So it would never go, say, 50% over the
> initially anticipated workload. Or rather a fixed number, I guess, to make
> it more predictable for users. That would be a user configurable parameter
> to the I/O loop.
>
>     actual_limit = len(_ready) + max_additional_load_per_loop
>     for handler in itertools.islice(_ready, None, actual_limit):
>         call handler...
>     del _ready[:actual_limit]

But do we need that in the reference loop? It seems like additional
complexity when it has yet to be demonstrated that the simple solution
of alternating processing of call_soon registrations with IO callbacks
is inadequate.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From solipsis at pitrou.net  Sat Jan 12 11:38:30 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 12 Jan 2013 11:38:30 +0100
Subject: [Python-ideas] Tulip / PEP 3156 event loop implementation
 question: CPU vs. I/O starvation
References: <CAP7+vJLK5VhuvqXGF5FEiqa7Gwb4a=hsfnaPG8yXHb02UNjxMQ@mail.gmail.com>
	<cae95346cbbccf7cee1f1a8f56fa3469@chopin.edu.pl>
	<kcr0fb$uqj$1@ger.gmane.org>
	<CADiSq7f8AxBg9t1wmWjuTj09EECHZMTZQySy0KkP1gFWKPJaUw@mail.gmail.com>
Message-ID: <20130112113830.5b374f1b@pitrou.net>

On Sat, 12 Jan 2013 19:53:27 +1000
Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Sat, Jan 12, 2013 at 4:39 PM, Stefan Behnel <stefan_ml at behnel.de> wrote:
> > Yep, it could simply use itertools.islice() when iterating over _ready with
> > an appropriate upper bound factor relative to the actual length, and then
> > cut down the list after the loop. So it would never go, say, 50% over the
> > initially anticipated workload. Or rather a fixed number, I guess, to make
> > it more predictable for users. That would be a user configurable parameter
> > to the I/O loop.
> >
> >     actual_limit = len(_ready) + max_additional_load_per_loop
> >     for handler in itertools.islice(_ready, None, actual_limit):
> >         call handler...
> >     del _ready[:actual_limit]
> 
> But do we need that in the reference loop? It seems like additional
> complexity when it has yet to be demonstrated that the simple solution
> of alternating processing of call_soon registrations with IO callbacks
> is inadequate.

Why do you talk about "reference loop"? It should be usable in
production, not some kind of demonstration system that people will
have to replace with a third-party library to get decent results.

Regards

Antoine.




From ubershmekel at gmail.com  Sat Jan 12 12:03:26 2013
From: ubershmekel at gmail.com (Yuval Greenfield)
Date: Sat, 12 Jan 2013 13:03:26 +0200
Subject: [Python-ideas] Tulip / PEP 3156 event loop implementation
 question: CPU vs. I/O starvation
In-Reply-To: <CAP7+vJLK5VhuvqXGF5FEiqa7Gwb4a=hsfnaPG8yXHb02UNjxMQ@mail.gmail.com>
References: <CAP7+vJLK5VhuvqXGF5FEiqa7Gwb4a=hsfnaPG8yXHb02UNjxMQ@mail.gmail.com>
Message-ID: <CANSw7Kya989pEiA3=Bj+iHOwtKEMaGJ2MgK+yvj36dGrAF27Mw@mail.gmail.com>

On Sat, Jan 12, 2013 at 1:41 AM, Guido van Rossum <guido at python.org> wrote:

> [...]

def hogger():
>     tulip.get_event_loop().call_soon(hogger)
>
> Because call_soon() appends the handler to the _ready queue, the while
> loop will never finish.
>
> [...]
> However this implies that we go back to the I/O polling code more
> frequently. While the I/O polling code sets the timeout to zero when
> there's anything in the _ready queue, so it won't block, it still
> isn't free; it's an expensive system call that we'd like to put off
> until we have nothing better to do.
>
> I can imagine various patterns where handlers append other handlers to
> the _ready queue for immediate execution, and I'd make such patterns
> efficient (i.e. the user shouldn't have to worry about the cost of the
> I/O poll compared to the amount of work appended to the _ready queue).
>
>
I read your statements as:
* I don't want the user to cause IO starvation
* I want the user to cause IO starvation

Which means you have two options:
* Make an opinionated decision that won't be perfect for everyone (not as
bad as it sounds)
* Allow configurability

IMO core event loops need this configurability but not on a daily basis,
e.g. Windows XP's event loop gave priority to the foreground process (i.e.
UI events) and Windows Server 2003 gave priority to background processes.

e.g. (warning unoptimized pseudocode follows)

while True:
    for i in range(io_weight):
        pop_io()
    for i in range(event_weight):
        pop_ready()

# note one of the weights can be zero
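A runnable toy version of the weighted loop above (the names pop_io/pop_ready are stand-ins for real event sources; here they are modeled as queues):

```python
# Weighted interleaving of I/O events and ready callbacks, as in the
# pseudocode above: io_weight and event_weight control the ratio.
from collections import deque

def weighted_loop(io_events: deque, ready: deque,
                  io_weight: int = 1, event_weight: int = 4,
                  max_iterations: int = 100) -> list:
    """Drain both queues, taking io_weight then event_weight items
    per iteration; returns the order in which items were processed."""
    order = []
    for _ in range(max_iterations):
        if not io_events and not ready:
            break
        for _ in range(io_weight):      # pop_io()
            if io_events:
                order.append(io_events.popleft())
        for _ in range(event_weight):   # pop_ready()
            if ready:
                order.append(ready.popleft())
    return order
```

Setting event_weight=0 gives pure I/O priority and io_weight=0 the reverse, matching the note that one of the weights can be zero.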

From ncoghlan at gmail.com  Sat Jan 12 12:44:36 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 12 Jan 2013 21:44:36 +1000
Subject: [Python-ideas] Tulip / PEP 3156 event loop implementation
 question: CPU vs. I/O starvation
In-Reply-To: <20130112113830.5b374f1b@pitrou.net>
References: <CAP7+vJLK5VhuvqXGF5FEiqa7Gwb4a=hsfnaPG8yXHb02UNjxMQ@mail.gmail.com>
	<cae95346cbbccf7cee1f1a8f56fa3469@chopin.edu.pl>
	<kcr0fb$uqj$1@ger.gmane.org>
	<CADiSq7f8AxBg9t1wmWjuTj09EECHZMTZQySy0KkP1gFWKPJaUw@mail.gmail.com>
	<20130112113830.5b374f1b@pitrou.net>
Message-ID: <CADiSq7fxZ6xQM=z5ik075=iRaDdTSd_1BT++RmAvocTsdt0uQg@mail.gmail.com>

On Sat, Jan 12, 2013 at 8:38 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>> But do we need that in the reference loop? It seems like additional
>> complexity when it has yet to be demonstrated that the simple solution
>> of alternating processing of call_soon registrations with IO callbacks
>> is inadequate.
>
> Why do you talk about "reference loop"? It should be usable in
> production, not some kind of demonstration system that people will
> have to replace with a third-party library to get decent results.

I mean "reference loop" in the same sense that CPython is the
"reference interpreter". You can get a lot more cool stuff by
upgrading to, e.g. IPython, as your interactive interpreter, or using
an interactive debugger other than pdb, but that doesn't mean all of
those enhancements should be folded back into the core.

In this case, we have a feature where there is a reasonable default
behaviour (i.e. alternating between processing ready calls and
checking for triggering of IO callbacks as Guido suggested), and no
compelling evidence to justify a more complex solution. Ergo, the
reference loop should use the simple approach, until such evidence is
provided.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From nepenthesdev at gmail.com  Sat Jan 12 14:01:40 2013
From: nepenthesdev at gmail.com (Markus)
Date: Sat, 12 Jan 2013 14:01:40 +0100
Subject: [Python-ideas] Tulip / PEP 3156 event loop implementation
 question: CPU vs. I/O starvation
In-Reply-To: <CAP7+vJLK5VhuvqXGF5FEiqa7Gwb4a=hsfnaPG8yXHb02UNjxMQ@mail.gmail.com>
References: <CAP7+vJLK5VhuvqXGF5FEiqa7Gwb4a=hsfnaPG8yXHb02UNjxMQ@mail.gmail.com>
Message-ID: <CACEGMv-RH25bJ3Sdp2dLRbTeUCWzomMaWjNKkHkK2r9B+H4z1A@mail.gmail.com>

Hi,

On Sat, Jan 12, 2013 at 12:41 AM, Guido van Rossum <guido at python.org> wrote:
> def hogger():
>     tulip.get_event_loop().call_soon(hogger)
>
> Because call_soon() appends the handler to the _ready queue, the while
> loop will never finish.

Adding a pollable descriptor to the loop means it is evaluated in the
next iteration of the loop, so why treat timers differently?
Define call_soon to be called in the next iteration - not in the same one.

Basically, every modification of the event loop should be evaluated in
the next iteration, not the same one.
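One way to realize "evaluate in the next iteration" (a sketch with invented names, not Tulip code) is to keep two queues and swap them at the start of each pass:

```python
# Two-phase scheduling: call_soon() always lands in the *next* pass,
# so the current pass always terminates.
from collections import deque

class TwoPhaseLoop:
    def __init__(self):
        self._next = deque()

    def call_soon(self, handler):
        self._next.append(handler)  # deferred to the next iteration

    def run_one_pass(self) -> int:
        """Swap queues, run everything scheduled before this pass."""
        current, self._next = self._next, deque()
        for handler in current:
            handler()  # re-scheduling lands in self._next, not current
        return len(current)
```

The swap makes the per-pass workload fixed up front, which is exactly what defeats the hogger pattern.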


MfG
Markus


From ncoghlan at gmail.com  Sat Jan 12 15:55:53 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 13 Jan 2013 00:55:53 +1000
Subject: [Python-ideas] Windows assistance for PEP 432 (CPython startup
	sequence)
Message-ID: <CADiSq7d2HLkEWdJ=TiGJgmahY25x8N6utpo2OGUGW=CWfs_w4A@mail.gmail.com>

I've started work on the PEP 432 implementation at
https://bitbucket.org/ncoghlan/cpython_sandbox/compare/pep432_modular_bootstrap..default#commits

As part of that work, I'm also cleaning up some of the crazier things
in the source tree layout, like "pythonrun" being this gigantic
monolith covering interpreter initialisation, code execution and
interpreter shutdown all in one file, as well as the source files for
the application binaries being mixed in with the source files for
standard library builtin and extension modules.

This means I know I'm breaking the Windows builds. Rather than leaving
that until the end, I'm looking for someone that's willing to take the
changes from the "pep432_modular_bootstrap" in my sandbox repo, check
what is needed to get them building on Windows, and then send me pull
requests on BitBucket to fix them.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From dustin at v.igoro.us  Sat Jan 12 16:03:40 2013
From: dustin at v.igoro.us (Dustin J. Mitchell)
Date: Sat, 12 Jan 2013 10:03:40 -0500
Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support
	Rebooted
In-Reply-To: <CAP7+vJ+QSiZCQpA7DyJ=TogoQsACELGcXQLQesJ-dSaU1NvWCA@mail.gmail.com>
References: <CABocrW7UdCRiXCS24erNN68ScSdRQKoJtONaB6sYZLNJP5prvA@mail.gmail.com>
	<CAP7+vJ+Oq5GZ4Kw8Ysjac4H8zxUA-ic2vtq3dS2QZG8i-=cnig@mail.gmail.com>
	<CABocrW7Pf85tHDFDXt7DGfeGhYf7RTH8Wv-XjovePjubEQo2bA@mail.gmail.com>
	<CAP7+vJ+QSiZCQpA7DyJ=TogoQsACELGcXQLQesJ-dSaU1NvWCA@mail.gmail.com>
Message-ID: <CAJtE5vRWDoCjqbRY1zUMtp3T6+WC_RoQFYUtB+WjJNLuJX3iLQ@mail.gmail.com>

On Wed, Jan 9, 2013 at 12:14 AM, Guido van Rossum <guido at python.org> wrote:
> But which half? A socket is two independent streams, one in each
> direction. Twisted uses half_close() for this concept but unless you
> already know what this is for you are left wondering which half. Which
> is why I like using 'write' in the name.

FWIW, "half-closed" is, IMHO, a well-known term.  It's not just a Twisted thing.

Either name is better than "shutdown"!

Dustin


From dustin at v.igoro.us  Sat Jan 12 18:08:13 2013
From: dustin at v.igoro.us (Dustin J. Mitchell)
Date: Sat, 12 Jan 2013 12:08:13 -0500
Subject: [Python-ideas] Tulip / PEP 3156 event loop implementation
 question: CPU vs. I/O starvation
In-Reply-To: <CACEGMv-RH25bJ3Sdp2dLRbTeUCWzomMaWjNKkHkK2r9B+H4z1A@mail.gmail.com>
References: <CAP7+vJLK5VhuvqXGF5FEiqa7Gwb4a=hsfnaPG8yXHb02UNjxMQ@mail.gmail.com>
	<CACEGMv-RH25bJ3Sdp2dLRbTeUCWzomMaWjNKkHkK2r9B+H4z1A@mail.gmail.com>
Message-ID: <CAJtE5vSuf3DUFjnLYoVs5HFak7zvmDCt3hqdN23ZDo-H7U+Bzg@mail.gmail.com>

On Sat, Jan 12, 2013 at 8:01 AM, Markus <nepenthesdev at gmail.com> wrote:
> Adding a poll-able descriptor to the loop will eval it in the next
> iteration of the loop, so why make a difference with timers?
> Define call_soon to be called in the next iteration - not in the same.
>
> Basically every modification of the event loop should be evaluated in
> the next iteration, not the same.

We're looking for a "fair" scheduling algorithm here, and I think this
describes it.  Everything else should be an optimization from here.

For example, if the event loop "detects" that it is CPU-bound, perhaps
it skips some fraction of the relatively expensive calls to the IO
check.  I have no idea how to do such "detection" efficiently.  Maybe
just count the number of consecutive IO checks with timeout=0 that
returned no actionable IO, and skip that number of checks.

run_ready_queue()
check_io(timeout=0) -> no IO, counter becomes 1
run_ready_queue()
(skip 1)
run_ready_queue()
check_io(timeout=0) -> no IO, counter becomes 2
run_ready_queue()
(skip 1)
run_ready_queue()
(skip 2)
run_ready_queue()
check_io(timeout=0) ...

There would be some limits, of course, and this should probably be
based on time, not cycles.  It's not clear what to do when IO *does*
occur -- divide the counter by 2?  At any rate, this is an O(1) change
to the event loop that would get some interesting adaptive
behavior, while still maintaining its fairness.
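The skip counter described above could be sketched like this (an interpretation of the trace, not code from the thread; names are made up):

```python
# Adaptive I/O check: every fruitless zero-timeout check earns one more
# skipped check before polling again; when I/O does occur, halve the
# counter (one of the options Dustin floats above).
class AdaptiveIOChecker:
    def __init__(self, check_io):
        self.check_io = check_io   # callable: True if actionable I/O found
        self.skip_budget = 0       # how many checks to skip between polls
        self.pending_skips = 0

    def maybe_check_io(self):
        """Poll for I/O unless still skipping; returns None when skipped."""
        if self.pending_skips > 0:
            self.pending_skips -= 1
            return None
        had_io = self.check_io()
        if had_io:
            self.skip_budget //= 2   # I/O occurred: poll more eagerly
        else:
            self.skip_budget += 1    # fruitless: skip one more next time
        self.pending_skips = self.skip_budget
        return had_io
```

As noted, a production version should cap the budget and probably base it on elapsed time rather than call counts.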

IMHO the PEP should leave this unspecified, perhaps suggesting only
that event loops have clear documentations regarding their fairness.
Then users can select event loops based on their needs.

Dustin


From ben at bendarnell.com  Sat Jan 12 18:18:45 2013
From: ben at bendarnell.com (Ben Darnell)
Date: Sat, 12 Jan 2013 12:18:45 -0500
Subject: [Python-ideas] Tulip / PEP 3156 event loop implementation
 question: CPU vs. I/O starvation
In-Reply-To: <CAP7+vJLK5VhuvqXGF5FEiqa7Gwb4a=hsfnaPG8yXHb02UNjxMQ@mail.gmail.com>
References: <CAP7+vJLK5VhuvqXGF5FEiqa7Gwb4a=hsfnaPG8yXHb02UNjxMQ@mail.gmail.com>
Message-ID: <CAFkYKJ7qD1WTLuybDcwY_J9QcqtNpK8MHjpWoJEQhX2uVhPiYA@mail.gmail.com>

On Fri, Jan 11, 2013 at 6:41 PM, Guido van Rossum <guido at python.org> wrote:

> Here's an interesting puzzle. Check out the core of Tulip's event
> loop:
> http://code.google.com/p/tulip/source/browse/tulip/unix_events.py#672
>
> Specifically this does something like this:
>
> 1. poll for I/O, appending any ready handlers to the _ready queue
>
> 2. append any handlers scheduled for a time <= now to the _ready queue
>
> 3. while _ready:
>        handler = _ready.popleft()
>        call handler
>
> It is the latter loop that causes me some concern. In theory it is
> possible for a bad callback to make this loop never finish, as
> follows:
>
> def hogger():
>     tulip.get_event_loop().call_soon(hogger)
>

This is actually a useful pattern, not just a pathological "bad callback".
If the function does some work before re-adding itself, it allows for
better multitasking, much like doing the work in another thread (given a
starvation-free event loop).  If the event loop starves I/O in this case,
it's difficult to get this kind of non-blocking background execution (you
have to use call_later with a non-zero timeout, slowing the work down
unnecessarily).



>
> Because call_soon() appends the handler to the _ready queue, the while
> loop will never finish.
>
> There is a simple enough solution (Tornado uses this AFAIK):
>
> now_ready = list(_ready)
> _ready.clear()
> for handler in now_ready:
>     call handler
>
> However this implies that we go back to the I/O polling code more
> frequently.


In isolation, yes.  Under real-world load, it's less clear.  A zero-timeout
poll that has nothing to return is in some sense wasted work, but if
there's other stuff going on then we may just change the timing of the poll
calls rather than inserting additional ones.



>
> So what's more important? Avoid I/O starvation at all cost or make the
> callbacks-posting-callbacks pattern efficient? I can see several
> outcomes of this discussion: we could end up deciding that one or the
> other strategy is always best; we could also leave it up to the
> implementation (but then I still would want guidance for what to do in
> Tulip); we could even decide this is so important that the user needs
> to be able to control the policy here (though I hate having many
> configuration options, since in practice few people bother to take
> control, and you might as well have hard-coded the default...).
>

I'm not sure it's worth the complexity to offer both, so I'd be inclined to
just have the starvation-free version.

-Ben


>
> Thoughts? Do I need to explain it better?
>
> --
> --Guido van Rossum (python.org/~guido)
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>

From federico.dev at reghe.net  Sat Jan 12 18:20:32 2013
From: federico.dev at reghe.net (Federico Reghenzani)
Date: Sat, 12 Jan 2013 18:20:32 +0100
Subject: [Python-ideas] csv dialect enhancement (repost)
In-Reply-To: <d5f10739-c6b0-4228-95ba-414d8bb449d7@googlegroups.com>
References: <0ee19141-547b-4182-880c-0d1b2a574af7@googlegroups.com>
	<d5f10739-c6b0-4228-95ba-414d8bb449d7@googlegroups.com>
Message-ID: <CADf4hJJE0uYsWsKYN7QA6zcT+2bMLNwkVmsdAdpjB=ZmZKPf3g@mail.gmail.com>

On Fri, Jan 11, 2013 at 6:51 PM, rurpy at yahoo.com <rurpy at yahoo.com> wrote:

> [Sorry for the duplicated text in the previous post, please ignore
> that one in favor of this one]
>
> There is a common dialect of CSV, often used in database
> applications [*1], that distinguishes between an empty
> (quoted) string,


How many DBMSes use this dialect? E.g. MySQL wants \N for null values, and
in other databases this is not even possible. Anyway, I think it should be
implemented, because it may have other uses.


>
> [*2] I don't really care what the attribute name is; I chose
> "nulls" as a trial balloon because I wanted to avoid something
> with "none" in it to avoid confusion with QUOTE_NONE.
>

+1

From guido at python.org  Sat Jan 12 19:29:30 2013
From: guido at python.org (Guido van Rossum)
Date: Sat, 12 Jan 2013 10:29:30 -0800
Subject: [Python-ideas] Tulip / PEP 3156 event loop implementation
	question: CPU vs. I/O starvation
In-Reply-To: <CAFkYKJ7qD1WTLuybDcwY_J9QcqtNpK8MHjpWoJEQhX2uVhPiYA@mail.gmail.com>
References: <CAP7+vJLK5VhuvqXGF5FEiqa7Gwb4a=hsfnaPG8yXHb02UNjxMQ@mail.gmail.com>
	<CAFkYKJ7qD1WTLuybDcwY_J9QcqtNpK8MHjpWoJEQhX2uVhPiYA@mail.gmail.com>
Message-ID: <CAP7+vJ+9w90bnTJQzrqGp6wEzjtPMJnALccT=ewChaN729TN+w@mail.gmail.com>

Thanks all! It is clear what to do now. Run all those handlers that are
currently ready but not those added during this run.
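A minimal sketch of this decision (assumed names; not the actual Tulip implementation): snapshot the queue length and run only that many handlers, so anything call_soon() adds during the pass waits for the next one.

```python
# Snapshot drain: run only the handlers that were ready when the pass
# started; handlers added during the pass run in the next iteration.
from collections import deque

def run_once(ready: deque) -> int:
    """Run the currently-ready handlers and return how many ran."""
    n = len(ready)  # fixed before any handler runs
    for _ in range(n):
        handler = ready.popleft()
        handler()   # may append more handlers; they wait for next pass
    return n
```

This guarantees each pass terminates even against the hogger pattern, at the cost of one (zero-timeout) trip through the I/O poll per pass.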

-- 
--Guido van Rossum (python.org/~guido)

From d.s at daniel.shahaf.name  Sun Jan 13 01:25:04 2013
From: d.s at daniel.shahaf.name (Daniel Shahaf)
Date: Sun, 13 Jan 2013 02:25:04 +0200
Subject: [Python-ideas] Updated PEP 432: Simplifying the CPython
	update	sequence
In-Reply-To: <CADiSq7fqt-H8Nd=d6aX+Tt3iBBHufOr6Fc8z4mg=LhAj8wtL3A@mail.gmail.com>
References: <CADiSq7fqt-H8Nd=d6aX+Tt3iBBHufOr6Fc8z4mg=LhAj8wtL3A@mail.gmail.com>
Message-ID: <20130113002504.GT2956@lp-shahaf.local>

Quick question, do you plan to expose the C argv values as part of this
work?

Issue #14208 asks for the full C argv array; my use-case today required
only the C argv[0].  Both of the use-cases had to do with having
a script reexecute itself (e.g., 'os.execl(sys.executable, *args)').

I see a 'raw_argv' in the config struct, but I'm not sure if it'll be
accessible to Python code.

Cheers,

Daniel

Nick Coghlan wrote on Wed, Jan 02, 2013 at 21:40:26 +1000:
> Configuring ``sys.argv``
> ------------------------
> 
> Unlike most other settings discussed in this PEP, ``sys.argv`` is not
> set implicitly by ``Py_Initialize()``. Instead, it must be set via an
> explicitly call to ``Py_SetArgv()``.
> 
> CPython calls this in ``Py_Main()`` after calling ``Py_Initialize()``. The
> calculation of ``sys.argv[1:]`` is straightforward: they're the command line
> arguments passed after the script name or the argument to the ``-c`` or
> ``-m`` options.
> 
> The calculation of ``sys.argv[0]`` is a little more complicated:
> 
> * For an ordinary script (source or bytecode), it will be the script name
> * For a ``sys.path`` entry (typically a zipfile or directory) it will
>   initially be the zipfile or directory name, but will later be changed by
>   the ``runpy`` module to the full path to the imported ``__main__`` module.
> * For a module specified with the ``-m`` switch, it will initially be the
>   string ``"-m"``, but will later be changed by the ``runpy`` module to the
>   full path to the executed module.
> * For a package specified with the ``-m`` switch, it will initially be the
>   string ``"-m"``, but will later be changed by the ``runpy`` module to the
>   full path to the executed ``__main__`` submodule of the package.
> * For a command executed with ``-c``, it will be the string ``"-c"``
> * For explicitly requested input from stdin, it will be the string ``"-"``
> * Otherwise, it will be the empty string
> 
> Embedding applications must call Py_SetArgv themselves. The CPython logic
> for doing so is part of ``Py_Main()`` and is not exposed separately.
> However, the ``runpy`` module does provide roughly equivalent logic in
> ``runpy.run_module`` and ``runpy.run_path``.
> 
> 
> 
> Supported configuration settings
> --------------------------------
> 
> The new ``Py_Config`` struct holds the settings required to complete the
> interpreter configuration. All fields are either pointers to Python
> data types (not set == ``NULL``) or numeric flags (not set == ``-1``)::
> 
>     /* Note: if changing anything in Py_Config, also update Py_Config_INIT */
>     typedef struct {
>         /* Argument processing */
>         PyList *raw_argv;
>         PyList *argv;
>         PyList *warnoptions; /* -W switch, PYTHONWARNINGS */
>         PyDict *xoptions;    /* -X switch */
> 
>         /* Filesystem locations */
>         PyUnicode *program_name;
>         PyUnicode *executable;
>         PyUnicode *prefix;           /* PYTHONHOME */
>         PyUnicode *exec_prefix;      /* PYTHONHOME */
>         PyUnicode *base_prefix;      /* pyvenv.cfg */
>         PyUnicode *base_exec_prefix; /* pyvenv.cfg */
> 
>         /* Site module */
>         int no_site;       /* -S switch */
>         int no_user_site;  /* -s switch, PYTHONNOUSERSITE */
> 
>         /* Import configuration */
>         int dont_write_bytecode;  /* -B switch, PYTHONDONTWRITEBYTECODE */
>         int ignore_module_case;   /* PYTHONCASEOK */
>         PyList    *import_path;   /* PYTHONPATH (etc) */
> 
>         /* Standard streams */
>         int use_unbuffered_io;      /* -u switch, PYTHONUNBUFFEREDIO */
>         PyUnicode *stdin_encoding;  /* PYTHONIOENCODING */
>         PyUnicode *stdin_errors;    /* PYTHONIOENCODING */
>         PyUnicode *stdout_encoding; /* PYTHONIOENCODING */
>         PyUnicode *stdout_errors;   /* PYTHONIOENCODING */
>         PyUnicode *stderr_encoding; /* PYTHONIOENCODING */
>         PyUnicode *stderr_errors;   /* PYTHONIOENCODING */
> 
>         /* Filesystem access */
>         PyUnicode *fs_encoding;
> 
>         /* Interactive interpreter */
>         int stdin_is_interactive; /* Force interactive behaviour */
>         int inspect_main;         /* -i switch, PYTHONINSPECT */
>         PyUnicode *startup_file;  /* PYTHONSTARTUP */
> 
>         /* Debugging output */
>         int debug_parser;    /* -d switch, PYTHONDEBUG */
>         int verbosity;       /* -v switch */
>         int suppress_banner; /* -q switch */
> 
>         /* Code generation */
>         int bytes_warnings;  /* -b switch */
>         int optimize;        /* -O switch */
> 
>         /* Signal handling */
>         int install_sig_handlers;
>     } Py_Config;


From victor.stinner at gmail.com  Sun Jan 13 01:43:55 2013
From: victor.stinner at gmail.com (Victor Stinner)
Date: Sun, 13 Jan 2013 01:43:55 +0100
Subject: [Python-ideas] Updated PEP 432: Simplifying the CPython update
	sequence
In-Reply-To: <20130113002504.GT2956@lp-shahaf.local>
References: <CADiSq7fqt-H8Nd=d6aX+Tt3iBBHufOr6Fc8z4mg=LhAj8wtL3A@mail.gmail.com>
	<20130113002504.GT2956@lp-shahaf.local>
Message-ID: <CAMpsgwZ4OJgQAyPs1K3N50VY-vVBcc8iUUi5kNjUA6gtTSAXtA@mail.gmail.com>

2013/1/13 Daniel Shahaf <d.s at daniel.shahaf.name>:
> Quick question, do you plan to expose the C argv values as part of this
> work?
>
> Issue #14208 asks for the full C argv array; my use-case today required
> only the C argv[0].  Both of the use-cases had to do with having
> a script reexecute itself (eg, 'os.execl(sys.executable, *args)').

I don't remember where, but someone already asked how to recreate the
Python command line in order to create a subprocess with the same
Python command line options.

For example:

$ python -O -c "import sys; print(sys.argv)"
['-c']

How do you get ['python', '-O']? I guess the question came up for
multiprocessing on Windows (which does not support fork).
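Today the best available approximation is to rebuild a few options from sys.flags. A hedged sketch (not from the thread) - note it is lossy, since sys.flags does not cover every option, which is exactly why access to the raw C argv is being requested:

```python
# Best-effort reconstruction of interpreter options from sys.flags.
# Incomplete by design: -W, -X, encodings, etc. are not recoverable here.
import sys

def guess_interpreter_argv():
    """Return [executable, *options] reconstructed from sys.flags."""
    args = [sys.executable]
    if sys.flags.optimize == 1:
        args.append("-O")
    elif sys.flags.optimize >= 2:
        args.append("-OO")
    if sys.flags.dont_write_bytecode:
        args.append("-B")
    if sys.flags.no_site:
        args.append("-S")
    if sys.flags.no_user_site:
        args.append("-s")
    if sys.flags.ignore_environment:
        args.append("-E")
    return args
```

Such a list could then be passed to subprocess to relaunch the interpreter, e.g. `subprocess.Popen([*guess_interpreter_argv(), "-c", code])`.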

Victor


From ncoghlan at gmail.com  Sun Jan 13 02:56:53 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 13 Jan 2013 11:56:53 +1000
Subject: [Python-ideas] Updated PEP 432: Simplifying the CPython update
	sequence
In-Reply-To: <20130113002504.GT2956@lp-shahaf.local>
References: <CADiSq7fqt-H8Nd=d6aX+Tt3iBBHufOr6Fc8z4mg=LhAj8wtL3A@mail.gmail.com>
	<20130113002504.GT2956@lp-shahaf.local>
Message-ID: <CADiSq7d1CiprNzfAKbs4RD4NfAkzrLk+xVg8KSPpGmgNmja8CQ@mail.gmail.com>

It will be accessible. Currently planned spelling:
sys._configuration.raw_argv

Cheers,
Nick.

--
Sent from my phone, thus the relative brevity :)

From brian at python.org  Sun Jan 13 02:59:14 2013
From: brian at python.org (Brian Curtin)
Date: Sat, 12 Jan 2013 19:59:14 -0600
Subject: [Python-ideas] Windows assistance for PEP 432 (CPython startup
	sequence)
In-Reply-To: <CADiSq7d2HLkEWdJ=TiGJgmahY25x8N6utpo2OGUGW=CWfs_w4A@mail.gmail.com>
References: <CADiSq7d2HLkEWdJ=TiGJgmahY25x8N6utpo2OGUGW=CWfs_w4A@mail.gmail.com>
Message-ID: <CAD+XWwp6Ba0Vof6oDuj0oC3vhpvmuq2F1=NMPjWWZNYCuD5DSQ@mail.gmail.com>

On Sat, Jan 12, 2013 at 8:55 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> I've started work on the PEP 432 implementation at
> https://bitbucket.org/ncoghlan/cpython_sandbox/compare/pep432_modular_bootstrap..default#commits
>
> As part of that work, I'm also cleaning up some of the crazier things
> in the source tree layout, like "pythonrun" being this gigantic
> monolith covering interpreter initialisation, code execution and
> interpreter shutdown all in one file, as well as the source files for
> the application binaries being mixed in with the source files for
> standard library builtin and extension modules.
>
> This means I know I'm breaking the Windows builds. Rather than leaving
> that until the end, I'm looking for someone that's willing to take the
> changes from the "pep432_modular_bootstrap" in my sandbox repo, check
> what is needed to get them building on Windows, and then send me pull
> requests on BitBucket to fix them.

I'll try to take a look within the next few days.


From ncoghlan at gmail.com  Sun Jan 13 03:15:35 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 13 Jan 2013 12:15:35 +1000
Subject: [Python-ideas] Windows assistance for PEP 432 (CPython startup
	sequence)
In-Reply-To: <CAD+XWwp6Ba0Vof6oDuj0oC3vhpvmuq2F1=NMPjWWZNYCuD5DSQ@mail.gmail.com>
References: <CADiSq7d2HLkEWdJ=TiGJgmahY25x8N6utpo2OGUGW=CWfs_w4A@mail.gmail.com>
	<CAD+XWwp6Ba0Vof6oDuj0oC3vhpvmuq2F1=NMPjWWZNYCuD5DSQ@mail.gmail.com>
Message-ID: <CADiSq7ei8bAGCK3+zKVkin6giA27xW+hk_+1bD=TvMx0aGmVWg@mail.gmail.com>

On Sun, Jan 13, 2013 at 11:59 AM, Brian Curtin <brian at python.org> wrote:
> On Sat, Jan 12, 2013 at 8:55 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> I've started work on the PEP 432 implementation at
>> https://bitbucket.org/ncoghlan/cpython_sandbox/compare/pep432_modular_bootstrap..default#commits
>>
>> As part of that work, I'm also cleaning up some of the crazier things
>> in the source tree layout, like "pythonrun" being this gigantic
>> monolith covering interpreter initialisation, code execution and
>> interpreter shutdown all in one file, as well as the source files for
>> the application binaries being mixed in with the source files for
>> standard library builtin and extension modules.
>>
>> This means I know I'm breaking the Windows builds. Rather than leaving
>> that until the end, I'm looking for someone that's willing to take the
>> changes from the "pep432_modular_bootstrap" in my sandbox repo, check
>> what is needed to get them building on Windows, and then send me pull
>> requests on BitBucket to fix them.
>
> I'll try to take a look within the next few days.

Richard Oudkerk has given me a patch at least for the VS 2010 files.
(We discovered in the process that bitbucket only allows pull requests
for forked repos back to their parent - no pull requests between
sibling repos).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From g.brandl at gmx.net  Sun Jan 13 11:31:54 2013
From: g.brandl at gmx.net (Georg Brandl)
Date: Sun, 13 Jan 2013 11:31:54 +0100
Subject: [Python-ideas] Windows assistance for PEP 432 (CPython startup
	sequence)
In-Reply-To: <CADiSq7ei8bAGCK3+zKVkin6giA27xW+hk_+1bD=TvMx0aGmVWg@mail.gmail.com>
References: <CADiSq7d2HLkEWdJ=TiGJgmahY25x8N6utpo2OGUGW=CWfs_w4A@mail.gmail.com>
	<CAD+XWwp6Ba0Vof6oDuj0oC3vhpvmuq2F1=NMPjWWZNYCuD5DSQ@mail.gmail.com>
	<CADiSq7ei8bAGCK3+zKVkin6giA27xW+hk_+1bD=TvMx0aGmVWg@mail.gmail.com>
Message-ID: <kcu2ck$dm4$1@ger.gmane.org>

Am 13.01.2013 03:15, schrieb Nick Coghlan:
> On Sun, Jan 13, 2013 at 11:59 AM, Brian Curtin <brian at python.org> wrote:
>> On Sat, Jan 12, 2013 at 8:55 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>> I've started work on the PEP 432 implementation at
>>> https://bitbucket.org/ncoghlan/cpython_sandbox/compare/pep432_modular_bootstrap..default#commits
>>>
>>> As part of that work, I'm also cleaning up some of the crazier things
>>> in the source tree layout, like "pythonrun" being this gigantic
>>> monolith covering interpreter initialisation, code execution and
>>> interpreter shutdown all in one file, as well as the source files for
>>> the application binaries being mixed in with the source files for
>>> standard library builtin and extension modules.
>>>
>>> This means I know I'm breaking the Windows builds. Rather than leaving
>>> that until the end, I'm looking for someone that's willing to take the
>>> changes from the "pep432_modular_bootstrap" in my sandbox repo, check
>>> what is needed to get them building on Windows, and then send me pull
>>> requests on BitBucket to fix them.
>>
>> I'll try to take a look within the next few days.
> 
> Richard Oudkerk has given me a patch at least for the VS 2010 files.
> (We discovered in the process that bitbucket only allows pull requests
> for forked repos back to their parent - no pull requests between
> sibling repos).

That sounds unfortunate -- did you open a report/feature request in their
tracker?

Georg



From ncoghlan at gmail.com  Sun Jan 13 11:48:05 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 13 Jan 2013 20:48:05 +1000
Subject: [Python-ideas] Windows assistance for PEP 432 (CPython startup
	sequence)
In-Reply-To: <kcu2ck$dm4$1@ger.gmane.org>
References: <CADiSq7d2HLkEWdJ=TiGJgmahY25x8N6utpo2OGUGW=CWfs_w4A@mail.gmail.com>
	<CAD+XWwp6Ba0Vof6oDuj0oC3vhpvmuq2F1=NMPjWWZNYCuD5DSQ@mail.gmail.com>
	<CADiSq7ei8bAGCK3+zKVkin6giA27xW+hk_+1bD=TvMx0aGmVWg@mail.gmail.com>
	<kcu2ck$dm4$1@ger.gmane.org>
Message-ID: <CADiSq7cz5CeS4gDEgWyAsuE1nsGtJs-tohUMQagDTQhUSpSM+g@mail.gmail.com>

On Sun, Jan 13, 2013 at 8:31 PM, Georg Brandl <g.brandl at gmx.net> wrote:
> On 13.01.2013 03:15, Nick Coghlan wrote:
>> Richard Oudkerk has given me a patch at least for the VS 2010 files.
>> (We discovered in the process that bitbucket only allows pull requests
>> for forked repos back to their parent - no pull requests between
>> sibling repos).
>
> That sounds unfortunate -- did you open a report/feature request in their
> tracker?

I hadn't, but I have now:
https://bitbucket.org/site/master/issue/5968/allow-creation-of-pull-requests-between

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From p.f.moore at gmail.com  Sun Jan 13 19:53:51 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Sun, 13 Jan 2013 18:53:51 +0000
Subject: [Python-ideas] Windows assistance for PEP 432 (CPython startup
	sequence)
In-Reply-To: <CADiSq7ei8bAGCK3+zKVkin6giA27xW+hk_+1bD=TvMx0aGmVWg@mail.gmail.com>
References: <CADiSq7d2HLkEWdJ=TiGJgmahY25x8N6utpo2OGUGW=CWfs_w4A@mail.gmail.com>
	<CAD+XWwp6Ba0Vof6oDuj0oC3vhpvmuq2F1=NMPjWWZNYCuD5DSQ@mail.gmail.com>
	<CADiSq7ei8bAGCK3+zKVkin6giA27xW+hk_+1bD=TvMx0aGmVWg@mail.gmail.com>
Message-ID: <CACac1F_wKSrU+cZhjoELMiYuafckVRCb7D_WVMcO6T5wiuMX8Q@mail.gmail.com>

On 13 January 2013 02:15, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Richard Oudkerk has given me a patch at least for the VS 2010 files.
> (We discovered in the process that bitbucket only allows pull requests
> for forked repos back to their parent - no pull requests between
> sibling repos).

Looks like it's OK now - I just pulled your latest version and it
built and ran all the tests fine. I didn't build the various extension
modules that need external libraries, I assume they won't have
changed. I can do if it would help, though.

Couple of crashes in test_capi and test_faulthandler. I suspect those
are expected, though. And one in test_urllib2 which I haven't
investigated yet but I doubt is related to these changes.

Paul


From brian at python.org  Sun Jan 13 19:57:16 2013
From: brian at python.org (Brian Curtin)
Date: Sun, 13 Jan 2013 12:57:16 -0600
Subject: [Python-ideas] Windows assistance for PEP 432 (CPython startup
	sequence)
In-Reply-To: <CACac1F_wKSrU+cZhjoELMiYuafckVRCb7D_WVMcO6T5wiuMX8Q@mail.gmail.com>
References: <CADiSq7d2HLkEWdJ=TiGJgmahY25x8N6utpo2OGUGW=CWfs_w4A@mail.gmail.com>
	<CAD+XWwp6Ba0Vof6oDuj0oC3vhpvmuq2F1=NMPjWWZNYCuD5DSQ@mail.gmail.com>
	<CADiSq7ei8bAGCK3+zKVkin6giA27xW+hk_+1bD=TvMx0aGmVWg@mail.gmail.com>
	<CACac1F_wKSrU+cZhjoELMiYuafckVRCb7D_WVMcO6T5wiuMX8Q@mail.gmail.com>
Message-ID: <CAD+XWwqNEBQ_A8GaUro8S5CV2bRtDKGV504U1FY-3acUAy6gCw@mail.gmail.com>

On Sun, Jan 13, 2013 at 12:53 PM, Paul Moore <p.f.moore at gmail.com> wrote:
> On 13 January 2013 02:15, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> Richard Oudkerk has given me a patch at least for the VS 2010 files.
>> (We discovered in the process that bitbucket only allows pull requests
>> for forked repos back to their parent - no pull requests between
>> sibling repos).
>
> Looks like it's OK now - I just pulled your latest version and it
> built and ran all the tests fine. I didn't build the various extension
> modules that need external libraries, I assume they won't have
> changed. I can do if it would help, though.
>
> Couple of crashes in test_capi and test_faulthandler. I suspect those
> are expected, though. And one in test_urllib2 which I haven't
> investigated yet but I doubt is related to these changes.

The test_capi and test_faulthandler ones are expected, but they're kind
of a hassle on the desktop. There's an issue somewhere and I have a patch
to make those behave more nicely, but there are a few ways we can go
about it.

On the buildbots those aren't a problem because we already suppress the
crash dialogs and/or have a script that closes them.


From rosuav at gmail.com  Sun Jan 13 21:55:21 2013
From: rosuav at gmail.com (Chris Angelico)
Date: Mon, 14 Jan 2013 07:55:21 +1100
Subject: [Python-ideas] csv dialect enhancement
In-Reply-To: <0ee19141-547b-4182-880c-0d1b2a574af7@googlegroups.com>
References: <0ee19141-547b-4182-880c-0d1b2a574af7@googlegroups.com>
Message-ID: <CAPTjJmpC-NTQ9YMandsgmOojfWkg-E1OBsvWe2xcFZYT_4Etww@mail.gmail.com>

On Sat, Jan 12, 2013 at 4:16 AM, rurpy at yahoo.com <rurpy at yahoo.com> wrote:
> There is a common dialect of CSV, often used in database
> applications [*1], that distinguishes between an empty
> (quoted) string,
>
>   e.g., the second field in  "abc","",3
>
> and an empty field,
>
>   e.g., the second field in "abc",,3
>
> This distinction is needed to specify or tell the
> difference between 0-length strings and NULLs, when sending
> csv data to or receiving it from a database application.

Ugh, this is exactly the sort of thing that my boss didn't believe
happened. He thinks that CSV is the same the world over, except for a
few really old or arcane programs that can be completely ignored. Took
a lot of arguing before we agreed to disagree on that one...

As an explicitly-requestable dialect, looks good.

> Sniffer:
>  Will set "nulls" to True when both adjacent delimiters and
>  quoted empty strings are seen in the input text.
>  (Perhaps this behaviour needs to be optional for backward
>  compatibility reasons?)

Yes, and make it optional. I think interpreting ,,,, as empty strings
is the more common case, since CSV is often used in contexts that have
no concept of NULL (mainly spreadsheets); that ought to be the default,
with a quick option to add recognition of NULLs.
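To make the ambiguity concrete, here is a small sketch showing that the
stdlib csv reader currently collapses both forms into the same empty
string, which is exactly the information the proposed dialect option
would preserve:

```python
import csv
import io

# Both a quoted empty string and a bare empty field come back as '',
# so the "empty string vs NULL" distinction is lost on read.
quoted = next(csv.reader(io.StringIO('"abc","",3\n')))
bare = next(csv.reader(io.StringIO('"abc",,3\n')))
print(quoted)  # ['abc', '', '3']
print(bare)    # ['abc', '', '3']
```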

So, +1 on the whole idea.

ChrisA


From p.f.moore at gmail.com  Mon Jan 14 14:22:27 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Mon, 14 Jan 2013 13:22:27 +0000
Subject: [Python-ideas] Adding '**' recursive search to glob.glob
Message-ID: <CACac1F84uKY7KbQqvJRh4_ZunXU1p+Y37mEufFhjTiSF7xF9Tg@mail.gmail.com>

This may be simple enough to just be a feature request on the tracker,
but I thought I'd post it here first to see if people thought it was a
good idea.

I'd like it if the glob module supported the (relatively common)
facility to use ** to mean recursively search subdirectories. It's a
reasonably straightforward patch, and offers a feature that is fairly
difficult to implement in user code on top of the existing
functionality. The syntax is supported in a number of places (for
example the bash shell and things like Java Ant) so it will be
relatively familiar to users.

For people who don't know the syntax, "a/**/b" is equivalent to "a/*/b
or a/*/*/b or a/*/*/*/b or ..." (for as many levels as needed).
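For the common "match a name at any depth" case, the kind of user code
involved might look like this sketch built on os.walk and fnmatch
(walk_glob is a made-up name, not part of any proposal):

```python
import fnmatch
import os

def walk_glob(root, name_pattern):
    # Rough user-level equivalent of glob('root/**/name_pattern') with
    # recursion enabled: yield matching filenames at any depth.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if fnmatch.fnmatch(name, name_pattern):
                yield os.path.join(dirpath, name)
```

Unlike real ** support, this only handles a trailing basename pattern;
combining ** with pattern components before and after it is what makes a
proper implementation awkward in user code.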

One obvious downside is that if used carelessly, it can make globbing
pretty slow. So I'd propose that it be added as an optional extension
enabled using a flag argument (glob(pat, allow_recursive=True)) which
is false by default. That would also mean that backward compatibility
should not be an issue.

Any comments? Is this worth submitting a patch to the tracker?

Paul.


From ubershmekel at gmail.com  Mon Jan 14 15:52:13 2013
From: ubershmekel at gmail.com (Yuval Greenfield)
Date: Mon, 14 Jan 2013 16:52:13 +0200
Subject: [Python-ideas] Adding '**' recursive search to glob.glob
In-Reply-To: <CANSw7KwBtoPffEr+rTczX341ZANuYpOxzWnbCNtnzR=OXGQHFQ@mail.gmail.com>
References: <CACac1F84uKY7KbQqvJRh4_ZunXU1p+Y37mEufFhjTiSF7xF9Tg@mail.gmail.com>
	<CANSw7KwBtoPffEr+rTczX341ZANuYpOxzWnbCNtnzR=OXGQHFQ@mail.gmail.com>
Message-ID: <CANSw7KxoK5iCuR3BQa3c3G97VAfd3qVEs0hvV4kbN8gAbfND4w@mail.gmail.com>

http://bugs.python.org/issue13968

"Support recursive globs"

Yuval
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130114/8b6f5cae/attachment.html>

From vinay_sajip at yahoo.co.uk  Mon Jan 14 16:46:29 2013
From: vinay_sajip at yahoo.co.uk (Vinay Sajip)
Date: Mon, 14 Jan 2013 15:46:29 +0000 (UTC)
Subject: [Python-ideas] Adding '**' recursive search to glob.glob
References: <CACac1F84uKY7KbQqvJRh4_ZunXU1p+Y37mEufFhjTiSF7xF9Tg@mail.gmail.com>
Message-ID: <loom.20130114T163330-125@post.gmane.org>

Paul Moore <p.f.moore at ...> writes:

> I'd like it if the glob module supported the (relatively common)
> facility to use ** to mean recursively search subdirectories. It's a
> reasonably straightforward patch, and offers a feature that is fairly
> difficult to implement in user code on top of the existing
> functionality. The syntax is supported in a number of places (for
> example the bash shell and things like Java Ant) so it will be
> relatively familiar to users.

Agreed. This was in packaging/distutils2 and I have now got it in distlib [1];
it supports both recursive globs and variants using the {opt1,opt2,opt3} syntax.

> One obvious downside is that if used carelessly, it can make globbing
> pretty slow. So I'd propose that it be added as an optional extension
> enabled using a flag argument (glob(pat, allow_recursive=True)) which
> is false by default. That would also mean that backward compatibility
> should not be an issue.

Isn't the requirement to recurse implied by the presence of '**' in the
pattern? What's to be gained by specifying it using allow_recursive as well?
Will having allow_recursive=True have any effect if '**' is not in the
pattern? If you specify a pattern with '**' and allow_recursive=False, does
that mean that '**' effectively acts as '*' would (i.e. one directory level
only)?

Regards,

Vinay Sajip

[1]
https://bitbucket.org/vinay.sajip/distlib/src/29666/distlib/glob.py?at=default



From p.f.moore at gmail.com  Mon Jan 14 16:58:36 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Mon, 14 Jan 2013 15:58:36 +0000
Subject: [Python-ideas] Adding '**' recursive search to glob.glob
In-Reply-To: <CANSw7KxoK5iCuR3BQa3c3G97VAfd3qVEs0hvV4kbN8gAbfND4w@mail.gmail.com>
References: <CACac1F84uKY7KbQqvJRh4_ZunXU1p+Y37mEufFhjTiSF7xF9Tg@mail.gmail.com>
	<CANSw7KwBtoPffEr+rTczX341ZANuYpOxzWnbCNtnzR=OXGQHFQ@mail.gmail.com>
	<CANSw7KxoK5iCuR3BQa3c3G97VAfd3qVEs0hvV4kbN8gAbfND4w@mail.gmail.com>
Message-ID: <CACac1F_8Z19UUF3Vb+BiiVUpYXwvc3=EowXSCZYBvWG0yhuuPQ@mail.gmail.com>

On 14 January 2013 14:52, Yuval Greenfield <ubershmekel at gmail.com> wrote:
> http://bugs.python.org/issue13968
>
> "Support recursive globs"

The time machine strikes again :-)

I'll take a look at that tracker item.
Paul


From storchaka at gmail.com  Mon Jan 14 17:14:55 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Mon, 14 Jan 2013 18:14:55 +0200
Subject: [Python-ideas] Adding '**' recursive search to glob.glob
In-Reply-To: <CACac1F84uKY7KbQqvJRh4_ZunXU1p+Y37mEufFhjTiSF7xF9Tg@mail.gmail.com>
References: <CACac1F84uKY7KbQqvJRh4_ZunXU1p+Y37mEufFhjTiSF7xF9Tg@mail.gmail.com>
Message-ID: <kd1au1$mp3$1@ger.gmane.org>

On 14.01.13 15:22, Paul Moore wrote:
> This may be simple enough to just be a feature request on the tracker,
> but I thought I'd post it here first to see if people thought it was a
> good idea.

There have been several tracker issues for this feature. Issue 13968 has
an almost-ready patch (I still need to protect the recursive glob from
infinite symlink loops). Apart from the symlink loops, the patch appears
to work; you can try it out and review it. I'm going to finish the work
this week.

> For people who don't know the syntax, "a/**/b" is equivalent to "a/*/b
> or a/*/*/b or a/*/*/*/b or ..." (for as many levels as needed).

Or a/b.

> One obvious downside is that if used carelessly, it can make globbing
> pretty slow. So I'd propose that it be added as an optional extension
> enabled using a flag argument (glob(pat, allow_recursive=True)) which
> is false by default. That would also mean that backward compatibility
> should not be an issue.

Indeed. That's why I added the "recursive" parameter and disabled it by
default.




From solipsis at pitrou.net  Mon Jan 14 17:21:20 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 14 Jan 2013 17:21:20 +0100
Subject: [Python-ideas] Adding '**' recursive search to glob.glob
References: <CACac1F84uKY7KbQqvJRh4_ZunXU1p+Y37mEufFhjTiSF7xF9Tg@mail.gmail.com>
	<kd1au1$mp3$1@ger.gmane.org>
Message-ID: <20130114172120.43c77a0d@pitrou.net>

On Mon, 14 Jan 2013 18:14:55 +0200,
Serhiy Storchaka <storchaka at gmail.com>
wrote:
> 
> > One obvious downside is that if used carelessly, it can make
> > globbing pretty slow. So I'd propose that it be added as an
> > optional extension enabled using a flag argument (glob(pat,
> > allow_recursive=True)) which is false by default. That would also
> > mean that backward compatibility should not be an issue.
> 
> Indeed. That's why I added the "recursive" parameter and disabled it
> by default.

Using APIs carelessly is the user's problem, not ours. It should be
sufficient to add a small warning in the docs, as I did in pathlib:

https://pathlib.readthedocs.org/en/latest/#pathlib.Path.glob

Regards

Antoine.




From storchaka at gmail.com  Mon Jan 14 17:21:40 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Mon, 14 Jan 2013 18:21:40 +0200
Subject: [Python-ideas] Adding '**' recursive search to glob.glob
In-Reply-To: <loom.20130114T163330-125@post.gmane.org>
References: <CACac1F84uKY7KbQqvJRh4_ZunXU1p+Y37mEufFhjTiSF7xF9Tg@mail.gmail.com>
	<loom.20130114T163330-125@post.gmane.org>
Message-ID: <kd1bam$r76$1@ger.gmane.org>

On 14.01.13 17:46, Vinay Sajip wrote:
> Isn't the requirement to recurse implied by the presence of '**' in the
> pattern? What's to be gained by specifying it using allow_recursive as well?

I'd be glad to make it enabled by default, but I feel that's too
dangerous: glob('**') on the filesystem root takes far too long. Perhaps
that's why (and for backward compatibility) this option (called
"globstar") is disabled by default in Bash.

> Will having allow_recursive=True have any effect if '**' is not in the
> pattern? If you specify a pattern with '**' and allow_recursive=False, does
> that mean that '**' effectively acts as '*' would (i.e. one directory level
> only)?

Yes, it would act just as it does now.




From p.f.moore at gmail.com  Mon Jan 14 17:25:59 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Mon, 14 Jan 2013 16:25:59 +0000
Subject: [Python-ideas] Adding '**' recursive search to glob.glob
In-Reply-To: <kd1au1$mp3$1@ger.gmane.org>
References: <CACac1F84uKY7KbQqvJRh4_ZunXU1p+Y37mEufFhjTiSF7xF9Tg@mail.gmail.com>
	<kd1au1$mp3$1@ger.gmane.org>
Message-ID: <CACac1F__=LUuXmr_YCazB6mG_jLmq5BecLBotdXarQeHm9CTyQ@mail.gmail.com>

On 14 January 2013 16:14, Serhiy Storchaka <storchaka at gmail.com> wrote:
>
>> For people who don't know the syntax, "a/**/b" is equivalent to "a/*/b
>> or a/*/*/b or a/*/*/*/b or ..." (for as many levels as needed).
>
>
> Or a/b.

Hmm, from my experiments, bash doesn't show a/b as matching the
pattern a/**/b ...

>> One obvious downside is that if used carelessly, it can make globbing
>> pretty slow. So I'd propose that it be added as an optional extension
>> enabled using a flag argument (glob(pat, allow_recursive=True)) which
>> is false by default. That would also mean that backward compatibility
>> should not be an issue.
>
>
> Indeed. That's why I added the "recursive" parameter and disabled it by
> default.

That said, I can see Vinay's point: ** is not useful syntax currently,
so there's no compatibility problem, and careless use resulting in long
glob times is more of a user issue.

Having said that, this debate is *precisely* why I suggested making it
a parameter in the first place, so people can choose for themselves.
So I guess I agree with your decision :-)

Paul.


From storchaka at gmail.com  Mon Jan 14 17:27:24 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Mon, 14 Jan 2013 18:27:24 +0200
Subject: [Python-ideas] Adding '**' recursive search to glob.glob
In-Reply-To: <20130114172120.43c77a0d@pitrou.net>
References: <CACac1F84uKY7KbQqvJRh4_ZunXU1p+Y37mEufFhjTiSF7xF9Tg@mail.gmail.com>
	<kd1au1$mp3$1@ger.gmane.org> <20130114172120.43c77a0d@pitrou.net>
Message-ID: <kd1blc$uhk$1@ger.gmane.org>

On 14.01.13 18:21, Antoine Pitrou wrote:
> Using APIs carelessly is the user's problem, not ours. It should be
> sufficient to add a small warning in the docs, as I did in pathlib:

We would need a time machine to publish this warning in 1994, before
anyone started using glob in their programs.

Pathlib has an advantage here.




From steve at pearwood.info  Mon Jan 14 17:24:20 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 15 Jan 2013 03:24:20 +1100
Subject: [Python-ideas] Adding '**' recursive search to glob.glob
In-Reply-To: <loom.20130114T163330-125@post.gmane.org>
References: <CACac1F84uKY7KbQqvJRh4_ZunXU1p+Y37mEufFhjTiSF7xF9Tg@mail.gmail.com>
	<loom.20130114T163330-125@post.gmane.org>
Message-ID: <50F43134.6030902@pearwood.info>

On 15/01/13 02:46, Vinay Sajip wrote:
> Paul Moore<p.f.moore at ...>  writes:
>
>> I'd like it if the glob module supported the (relatively common)
>> facility to use ** to mean recursively search subdirectories.

+1

>> One obvious downside is that if used carelessly, it can make globbing
>> pretty slow. So I'd propose that it be added as an optional extension
>> enabled using a flag argument (glob(pat, allow_recursive=True)) which
>> is false by default. That would also mean that backward compatibility
>> should not be an issue.
>
> Isn't the requirement to recurse implied by the presence of '**' in the
> pattern? What's to be gained by specifying it using allow_recursive as well?

Not necessarily. At the moment, a glob like "/**/spam" is equivalent to
"/*/spam":


[steve at ando /]$ touch /tmp/spam
[steve at ando /]$ mkdir /tmp/ham
[steve at ando /]$ touch /tmp/ham/spam
[steve at ando /]$ python3.3 -c "import glob; print(glob.glob('/**/spam'))"
['/tmp/spam']


With the suggested new functionality, the meaning of the glob will change.

From a backwards-compatibility point of view, one might not want to enable
the new semantics by default. But, from a *future*-compatibility point of
view, I don't know that it is a good idea to require a flag every time a
new globbing feature is added.

glob.glob(pattern, allow_recurse=True, allow_spam=True, allow_ham=True, allow_eggs=True, ...)


Rather than a flag, I suggest a version number:

glob.glob(pattern, version=1)  # current behaviour, as of 3.3
glob.glob(pattern, version=2)  # adds ** recursion in Python 3.4

Then in Python 3.5 or 3.6 support for version 1 globs could be dropped.



> Will having allow_recursive=True have any effect if '**' is not in the
> pattern?

I would expect that it will not have any effect unless ** is present.
After all, it simply allows ** to recurse, and no other glob
metacharacter can recurse.


>If you specify a pattern with '**' and allow_recursive=False, does
> that mean that '**' effectively acts as '*' would (i.e. one directory level
> only)?

I expect that without allow_recursive=True, ** would behave identically to
a single *.


-- 
Steven


From storchaka at gmail.com  Mon Jan 14 17:33:42 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Mon, 14 Jan 2013 18:33:42 +0200
Subject: [Python-ideas] Adding '**' recursive search to glob.glob
In-Reply-To: <CACac1F__=LUuXmr_YCazB6mG_jLmq5BecLBotdXarQeHm9CTyQ@mail.gmail.com>
References: <CACac1F84uKY7KbQqvJRh4_ZunXU1p+Y37mEufFhjTiSF7xF9Tg@mail.gmail.com>
	<kd1au1$mp3$1@ger.gmane.org>
	<CACac1F__=LUuXmr_YCazB6mG_jLmq5BecLBotdXarQeHm9CTyQ@mail.gmail.com>
Message-ID: <kd1c17$2g5$1@ger.gmane.org>

On 14.01.13 18:25, Paul Moore wrote:
> Hmm, from my experiments, bash doesn't show a/b as matching the
> pattern a/**/b ...

$ shopt -s globstar
$ echo Lib/**/test
Lib/ctypes/test Lib/sqlite3/test Lib/test Lib/tkinter/test Lib/unittest/test




From ubershmekel at gmail.com  Mon Jan 14 17:39:12 2013
From: ubershmekel at gmail.com (Yuval Greenfield)
Date: Mon, 14 Jan 2013 18:39:12 +0200
Subject: [Python-ideas] Adding '**' recursive search to glob.glob
In-Reply-To: <kd1blc$uhk$1@ger.gmane.org>
References: <CACac1F84uKY7KbQqvJRh4_ZunXU1p+Y37mEufFhjTiSF7xF9Tg@mail.gmail.com>
	<kd1au1$mp3$1@ger.gmane.org> <20130114172120.43c77a0d@pitrou.net>
	<kd1blc$uhk$1@ger.gmane.org>
Message-ID: <CANSw7Kx_7ymtmysxu+q9Mg6roeHczr_Ac2vrUspr=rAdqrD4Pg@mail.gmail.com>

On Mon, Jan 14, 2013 at 6:27 PM, Serhiy Storchaka <storchaka at gmail.com>wrote:

> We need a time machine to publish this warning in 1994, before anyone used
> the glob in his program.
>
> Pathlib has an advantage in this.
>
>
The following have been discussed already:

 - deprecate the 'glob' module, moving its functionality to shutil
 - "globstar" or "use_recursive" option
 - have a separate "rglob" or "tree" function do this for you

http://mail.python.org/pipermail/python-bugs-list/2012-February/thread.html#159056

And more at
https://www.google.com/search?q=site%3Amail.python.org+recursive+glob


The patch has been discussed to death already. Not to say that it's too
late to speak your mind, but I think if it passes the proper tests and
review, it should go in.


Yuval Greenfield

From p.f.moore at gmail.com  Mon Jan 14 17:41:46 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Mon, 14 Jan 2013 16:41:46 +0000
Subject: [Python-ideas] Adding '**' recursive search to glob.glob
In-Reply-To: <kd1c17$2g5$1@ger.gmane.org>
References: <CACac1F84uKY7KbQqvJRh4_ZunXU1p+Y37mEufFhjTiSF7xF9Tg@mail.gmail.com>
	<kd1au1$mp3$1@ger.gmane.org>
	<CACac1F__=LUuXmr_YCazB6mG_jLmq5BecLBotdXarQeHm9CTyQ@mail.gmail.com>
	<kd1c17$2g5$1@ger.gmane.org>
Message-ID: <CACac1F81EFkvcEerBig-i3P04RxgJ24qrmdrfg-=r+Zw79w7HQ@mail.gmail.com>

On 14 January 2013 16:33, Serhiy Storchaka <storchaka at gmail.com> wrote:
> On 14.01.13 18:25, Paul Moore wrote:
>> Hmm, from my experiments, bash doesn't show a/b as matching the
>> pattern a/**/b ...
>
> $ shopt -s globstar
> $ echo Lib/**/test
> Lib/ctypes/test Lib/sqlite3/test Lib/test Lib/tkinter/test Lib/unittest/test

Ah, thanks. I hadn't enabled globstar. See what happens when you let a
Windows user near a Unix shell? :-) And the fact that globstar is an
option gives some weight to having a globstar-like flag in the
function signature.

Sorry for the noise.
Paul.


From solipsis at pitrou.net  Mon Jan 14 18:06:34 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 14 Jan 2013 18:06:34 +0100
Subject: [Python-ideas] Adding '**' recursive search to glob.glob
References: <CACac1F84uKY7KbQqvJRh4_ZunXU1p+Y37mEufFhjTiSF7xF9Tg@mail.gmail.com>
	<loom.20130114T163330-125@post.gmane.org>
	<kd1bam$r76$1@ger.gmane.org>
Message-ID: <20130114180634.0da954c5@pitrou.net>

On Mon, 14 Jan 2013 18:21:40 +0200,
Serhiy Storchaka <storchaka at gmail.com>
wrote:
> On 14.01.13 17:46, Vinay Sajip wrote:
> > Isn't the requirement to recurse implied by the presence of '**' in
> > the pattern? What's to be gained by specifying it using
> > allow_recursive as well?
> 
> I'll be glad to make it enabled by default, however I'm feeling this
> is too dangerous. glob('**') on FS root takes too long time. Perhaps
> that's why (and for backward compatibility) this option (called
> "starglob") is disabled by default in Bash.

But there's no reason to write glob('**') with the current API.

Regards

Antoine.




From storchaka at gmail.com  Mon Jan 14 21:26:49 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Mon, 14 Jan 2013 22:26:49 +0200
Subject: [Python-ideas] Adding '**' recursive search to glob.glob
In-Reply-To: <20130114180634.0da954c5@pitrou.net>
References: <CACac1F84uKY7KbQqvJRh4_ZunXU1p+Y37mEufFhjTiSF7xF9Tg@mail.gmail.com>
	<loom.20130114T163330-125@post.gmane.org>
	<kd1bam$r76$1@ger.gmane.org> <20130114180634.0da954c5@pitrou.net>
Message-ID: <kd1pmb$b6i$1@ger.gmane.org>

On 14.01.13 19:06, Antoine Pitrou wrote:
> But there's no reason to write glob('**') with the current API.

There is a reason to write glob('*%s*' % escaped_substring).
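glob offered no escaping helper at the time, so the escaping in that
snippet is left to the caller; a minimal sketch of such a helper (the
name escape_glob and the approach are mine, not from the thread) is:

```python
import fnmatch
import re

def escape_glob(s):
    # Wrap each glob metacharacter in a character class so it is
    # matched literally: 'v1.*' becomes 'v1.[*]'.
    return re.sub(r"([*?[])", r"[\1]", s)

# Find names containing the literal substring 'v1.*':
pattern = "*%s*" % escape_glob("v1.*")
print(fnmatch.fnmatch("notes-v1.*-final.txt", pattern))  # True
print(fnmatch.fnmatch("notes-v1.x-final.txt", pattern))  # False
```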




From greg.ewing at canterbury.ac.nz  Mon Jan 14 22:38:25 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 15 Jan 2013 10:38:25 +1300
Subject: [Python-ideas] Adding '**' recursive search to glob.glob
In-Reply-To: <50F43134.6030902@pearwood.info>
References: <CACac1F84uKY7KbQqvJRh4_ZunXU1p+Y37mEufFhjTiSF7xF9Tg@mail.gmail.com>
	<loom.20130114T163330-125@post.gmane.org>
	<50F43134.6030902@pearwood.info>
Message-ID: <50F47AD1.2090904@canterbury.ac.nz>

Steven D'Aprano wrote:

> Rather than a flag, I suggest a version number:
> 
> glob.glob(pattern, version=1)  # current behaviour, as of 3.3
> glob.glob(pattern, version=2)  # adds ** recursion in Python 3.4

Yuck, then the reader has to know what features are
enabled by which version numbers -- not something that's
easy to keep in one's head.

-- 
Greg


From bruce at leapyear.org  Mon Jan 14 23:17:17 2013
From: bruce at leapyear.org (Bruce Leban)
Date: Mon, 14 Jan 2013 14:17:17 -0800
Subject: [Python-ideas] Adding '**' recursive search to glob.glob
In-Reply-To: <50F47AD1.2090904@canterbury.ac.nz>
References: <CACac1F84uKY7KbQqvJRh4_ZunXU1p+Y37mEufFhjTiSF7xF9Tg@mail.gmail.com>
	<loom.20130114T163330-125@post.gmane.org>
	<50F43134.6030902@pearwood.info>
	<50F47AD1.2090904@canterbury.ac.nz>
Message-ID: <CAGu0Ant5QLeEiUz_EpxMgswyL0xoN43PQ03Q4aiLz-edrZQ+Ug@mail.gmail.com>

On Mon, Jan 14, 2013 at 1:38 PM, Greg Ewing <greg.ewing at canterbury.ac.nz>wrote:

> Steven D'Aprano wrote:
>
>  Rather than a flag, I suggest a version number:
>>
>> glob.glob(pattern, version=1)  # current behaviour, as of 3.3
>> glob.glob(pattern, version=2)  # adds ** recursion in Python 3.4
>>
>
> Yuck, then the reader has to know what features are
> enabled by which version numbers -- not something that's
> easy to keep in one's head.


And if you write glob.glob(..., foofeature=True) it will automatically
raise an exception if you use it in a version that doesn't support the
feature rather than silently ignoring the error.
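That behaviour falls out of Python's normal keyword handling, as this
sketch shows (glob_v1 is a hypothetical stand-in for a glob() that
predates the new flag, not a real API):

```python
def glob_v1(pattern):
    # Pretend this is an older glob() without the new keyword argument.
    return []

try:
    glob_v1("*.py", allow_recursive=True)
except TypeError as exc:
    # The unsupported flag fails loudly instead of being ignored.
    print("rejected:", exc)
```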

--- Bruce
Check this out: http://bit.ly/yearofpuzzles

From steve at pearwood.info  Tue Jan 15 02:31:56 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 15 Jan 2013 12:31:56 +1100
Subject: [Python-ideas] Adding '**' recursive search to glob.glob
In-Reply-To: <50F47AD1.2090904@canterbury.ac.nz>
References: <CACac1F84uKY7KbQqvJRh4_ZunXU1p+Y37mEufFhjTiSF7xF9Tg@mail.gmail.com>
	<loom.20130114T163330-125@post.gmane.org>
	<50F43134.6030902@pearwood.info>
	<50F47AD1.2090904@canterbury.ac.nz>
Message-ID: <50F4B18C.2020602@pearwood.info>

On 15/01/13 08:38, Greg Ewing wrote:
> Steven D'Aprano wrote:
>
>> Rather than a flag, I suggest a version number:
>>
>> glob.glob(pattern, version=1) # current behaviour, as of 3.3
>> glob.glob(pattern, version=2) # adds ** recursion in Python 3.4
>
> Yuck, then the reader has to know what features are
> enabled by which version numbers -- not something that's
> easy to keep in one's head.


True. But neither are a plethora of enable_feature flags. Is it
allow_recursion or allow_recursive or enable_double_star? Globbing
is not likely to be something that most people use often enough that
the name of the arguments will stick in their head. People will
likely need to look it up one way or the other.

All this assumes that we need to care about backward compatibility
of ** in existing globs. It does seem to be an unlikely thing for
people to write. If we don't, then no need for a flag at all.
Instead, we could raise a warning for globs with ** in 3.4, and
then drop the warning in 3.5.

Another option is a new function. Bool parameters that do nothing
but change the behaviour of a function are somewhat of a mild
anti-pattern. Perhaps it is better to just keep glob.glob as is,
and add glob.recglob or rglob to support **.



-- 
Steven


From python at mrabarnett.plus.com  Tue Jan 15 04:20:00 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Tue, 15 Jan 2013 03:20:00 +0000
Subject: [Python-ideas] Adding '**' recursive search to glob.glob
In-Reply-To: <50F4B18C.2020602@pearwood.info>
References: <CACac1F84uKY7KbQqvJRh4_ZunXU1p+Y37mEufFhjTiSF7xF9Tg@mail.gmail.com>
	<loom.20130114T163330-125@post.gmane.org>
	<50F43134.6030902@pearwood.info>
	<50F47AD1.2090904@canterbury.ac.nz>
	<50F4B18C.2020602@pearwood.info>
Message-ID: <50F4CAE0.2000803@mrabarnett.plus.com>

On 2013-01-15 01:31, Steven D'Aprano wrote:
> On 15/01/13 08:38, Greg Ewing wrote:
>> Steven D'Aprano wrote:
>>
>>> Rather than a flag, I suggest a version number:
>>>
>>> glob.glob(pattern, version=1) # current behaviour, as of 3.3
>>> glob.glob(pattern, version=2) # adds ** recursion in Python 3.4
>>
>> Yuck, then the reader has to know what features are
>> enabled by which version numbers -- not something that's
>> easy to keep in one's head.
>
>
> True. But neither are a plethora of enable_feature flags. Is it
> allow_recursion or allow_recursive or enable_double_star? Globbing
> is not likely to be something that most people use often enough that
> the name of the arguments will stick in their head. People will
> likely need to look it up one way or the other.
>
> All this assumes that we need to care about backward compatibility
> of ** in existing globs. It does seem to be an unlikely thing for
> people to write. If we don't, then no need for a flag at all.
> Instead, we could raise a warning for globs with ** in 3.4, and
> then drop the warning in 3.5.
>
> Another option, is a new function. Bool parameters that do nothing
> but change the behaviour of a function are somewhat of a mild
> anti-pattern. Perhaps it is better to just keep glob.glob as is,
> and add glob.recglob or rglob to support **.
>
If there's rglob, then shouldn't there also be riglob or irglob?

If so, then which one? :-)


From bruce at leapyear.org  Tue Jan 15 05:03:08 2013
From: bruce at leapyear.org (Bruce Leban)
Date: Mon, 14 Jan 2013 20:03:08 -0800
Subject: [Python-ideas] Adding '**' recursive search to glob.glob
In-Reply-To: <50F4B18C.2020602@pearwood.info>
References: <CACac1F84uKY7KbQqvJRh4_ZunXU1p+Y37mEufFhjTiSF7xF9Tg@mail.gmail.com>
	<loom.20130114T163330-125@post.gmane.org>
	<50F43134.6030902@pearwood.info>
	<50F47AD1.2090904@canterbury.ac.nz> <50F4B18C.2020602@pearwood.info>
Message-ID: <CAGu0AnvemKXOh3V3Fp31kuCCBi9qN6GPdf+98v3pe4s=E=RB7g@mail.gmail.com>

On Mon, Jan 14, 2013 at 5:31 PM, Steven D'Aprano <steve at pearwood.info>wrote:

> Yuck, then the reader has to know what features are
>> enabled by which version numbers -- not something that's
>> easy to keep in one's head.
>>
>
>
> True. But neither are a plethora of enable_feature flags. Is it
> allow_recursion or allow_recursive or enable_double_star? Globbing
> is not likely to be something that most people use often enough that
> the name of the arguments will stick in their head. People will
> likely need to look it up one way or the other.
>

I see nothing wrong with asking people to consult the documentation for
features they don't use that frequently. Better to check the docs than get
it wrong. But the reader of the code is more likely to notice something
special is going on when they see glob(..., allow_recursive=True) than
rglob(...).

And I'd rather have flags than rglob to allow recursion and iglob to ignore
case and then either riglob or irglob to do both. Yuck.

--- Bruce
Check this out: http://bit.ly/yearofpuzzles
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130114/85a25bee/attachment.html>

From ubershmekel at gmail.com  Tue Jan 15 06:15:20 2013
From: ubershmekel at gmail.com (Yuval Greenfield)
Date: Tue, 15 Jan 2013 07:15:20 +0200
Subject: [Python-ideas] Adding '**' recursive search to glob.glob
In-Reply-To: <CAGu0AnvemKXOh3V3Fp31kuCCBi9qN6GPdf+98v3pe4s=E=RB7g@mail.gmail.com>
References: <CACac1F84uKY7KbQqvJRh4_ZunXU1p+Y37mEufFhjTiSF7xF9Tg@mail.gmail.com>
	<loom.20130114T163330-125@post.gmane.org>
	<50F43134.6030902@pearwood.info>
	<50F47AD1.2090904@canterbury.ac.nz> <50F4B18C.2020602@pearwood.info>
	<CAGu0AnvemKXOh3V3Fp31kuCCBi9qN6GPdf+98v3pe4s=E=RB7g@mail.gmail.com>
Message-ID: <CANSw7KyOfFCPg7bZocGmj=ZJdz_DQH-onddOzKZUmez4=zDJUQ@mail.gmail.com>

On Tue, Jan 15, 2013 at 6:03 AM, Bruce Leban <bruce at leapyear.org> wrote:

>
> [...] and iglob to ignore case and [....]
>
>
>
OT - iglob is the iterator version of glob. Perhaps in Python 2 it should
have been called "xglob". In Python 3 it should have been just "glob".

>>> rglob('**.py')

or

>>> glob('**.py', True)

I don't mind either, though I think the first one is a bit clearer because
"r" is more telling than "True". Don't mention glob('**.py',
allow_recursive=True) because that's probably not going to be the norm.
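For concreteness, a hypothetical rglob could be sketched on top of os.walk
and fnmatch -- the name, signature, and exact matching rules here are
illustrative only, not a proposal:

```python
import fnmatch
import os

def rglob(pattern, root="."):
    # Match the pattern against file names at any depth below root --
    # the behaviour that '**' is meant to enable.
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if fnmatch.fnmatch(name, pattern):
                matches.append(os.path.relpath(os.path.join(dirpath, name), root))
    return matches
```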

Yuval
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130115/2c1cc073/attachment.html>

From ncoghlan at gmail.com  Tue Jan 15 07:33:17 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 15 Jan 2013 16:33:17 +1000
Subject: [Python-ideas] Adding '**' recursive search to glob.glob
In-Reply-To: <50F4B18C.2020602@pearwood.info>
References: <CACac1F84uKY7KbQqvJRh4_ZunXU1p+Y37mEufFhjTiSF7xF9Tg@mail.gmail.com>
	<loom.20130114T163330-125@post.gmane.org>
	<50F43134.6030902@pearwood.info>
	<50F47AD1.2090904@canterbury.ac.nz>
	<50F4B18C.2020602@pearwood.info>
Message-ID: <CADiSq7fZv375euwhwNBNgLHStFCFKMypY6dC9aF8T4N0WryB+Q@mail.gmail.com>

On Tue, Jan 15, 2013 at 11:31 AM, Steven D'Aprano <steve at pearwood.info> wrote:
> All this assumes that we need to care about backward compatibility
> of ** in existing globs. It does seem to be an unlikely thing for
> people to write. If we don't, then no need for a flag at all.
> Instead, we could raise a warning for globs with ** in 3.4, and
> then drop the warning in 3.5.
>
> Another option, is a new function. Bool parameters that do nothing
> but change the behaviour of a function are somewhat of a mild
> anti-pattern. Perhaps it is better to just keep glob.glob as is,
> and add glob.recglob or rglob to support **.

Making boolean parameters less awful from a readability perspective is
part of the rationale for keyword-only arguments: they force you to
include the parameter name, thus making the call self-documenting.

In this case, the conservative backwards compatible migration path would be:

In 3.4:
- add the recursive globbing capability
- add "allow_recursive=None" as a keyword-only argument
- emit a DeprecationWarning if the double-star pattern is seen when
allow_recursive is None (but not when it is explicitly False)

In 3.5:
- switch the allow_recursive default value to True
- drop the deprecation warning
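That 3.4 step could look roughly like this -- a sketch of the transition
logic only, not the real glob.glob (the recursive branch itself is
elided):

```python
import glob as _glob
import warnings

def glob(pattern, *, allow_recursive=None):
    # Sketch of the proposed 3.4 transition; not the stdlib implementation.
    if allow_recursive is None:
        if "**" in pattern:
            # Unspecified default plus a '**' pattern: warn about the
            # upcoming change; an explicit True/False stays silent.
            warnings.warn("'**' will match recursively by default in 3.5",
                          DeprecationWarning, stacklevel=2)
        allow_recursive = False  # keep the 3.3 behaviour for now
    if allow_recursive:
        raise NotImplementedError("recursive matching elided in this sketch")
    return _glob.glob(pattern)
```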

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From jeremy at jeremysanders.net  Tue Jan 15 09:36:53 2013
From: jeremy at jeremysanders.net (Jeremy Sanders)
Date: Tue, 15 Jan 2013 08:36:53 +0000
Subject: [Python-ideas] Adding '**' recursive search to glob.glob
References: <CACac1F84uKY7KbQqvJRh4_ZunXU1p+Y37mEufFhjTiSF7xF9Tg@mail.gmail.com>
	<loom.20130114T163330-125@post.gmane.org>
Message-ID: <kd34f3$2he$1@ger.gmane.org>

Vinay Sajip wrote:
 
> Isn't the requirement to recurse implied by the presence of '**' in the
> pattern? What's to be gained by specifying it using allow_recursive as
> well? Will having allow_recursive=True have any effect if '**' is not in
> the pattern? If you specify a pattern with '**' and allow_recursive=False,
> does that mean that '**' effectively acts as '*' would (i.e. one directory
> level only)?

The glob string may come from the user or a remote source. It is possible
that a developer using glob has never considered that "**" might be added,
leading to an attacker accessing files in directories they are not allowed
to see, or to DoS attacks because glob becomes very slow.
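For patterns that really do come from untrusted input, a guard along
these lines (the helper name is hypothetical) is one way to keep '**'
from silently turning into a recursive walk:

```python
def check_untrusted_pattern(pattern):
    # Hypothetical guard for user- or remotely-supplied glob patterns:
    # reject the recursive wildcard outright rather than risk a slow or
    # over-broad directory walk.
    if "**" in pattern:
        raise ValueError("recursive wildcard '**' not allowed here")
    return pattern
```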

Jeremy




From ethan at stoneleaf.us  Tue Jan 15 18:00:30 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Tue, 15 Jan 2013 09:00:30 -0800
Subject: [Python-ideas] Adding '**' recursive search to glob.glob
In-Reply-To: <CANSw7KyOfFCPg7bZocGmj=ZJdz_DQH-onddOzKZUmez4=zDJUQ@mail.gmail.com>
References: <CACac1F84uKY7KbQqvJRh4_ZunXU1p+Y37mEufFhjTiSF7xF9Tg@mail.gmail.com>
	<loom.20130114T163330-125@post.gmane.org>
	<50F43134.6030902@pearwood.info>
	<50F47AD1.2090904@canterbury.ac.nz>
	<50F4B18C.2020602@pearwood.info>
	<CAGu0AnvemKXOh3V3Fp31kuCCBi9qN6GPdf+98v3pe4s=E=RB7g@mail.gmail.com>
	<CANSw7KyOfFCPg7bZocGmj=ZJdz_DQH-onddOzKZUmez4=zDJUQ@mail.gmail.com>
Message-ID: <50F58B2E.402@stoneleaf.us>

On 01/14/2013 09:15 PM, Yuval Greenfield wrote:
>  >>> rglob('**.py')
>
> or
>
>  >>> glob('**.py', True)
>
> I don't mind either, though I think the first one is a bit clearer
> because "r" is more telling than "True". Don't mention glob('**.py',
> allow_recursive=True) because that's probably not going to be the norm.

If `allow_recursive` is a keyword-only parameter it will be the norm.  :)
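Keyword-only here just means a bare '*' in the signature, so the
positional spelling is a TypeError (stub body, for illustration only):

```python
def glob(pattern, *, allow_recursive=False):
    # The '*' makes allow_recursive keyword-only: glob('**.py', True)
    # raises TypeError, so call sites must spell the name out.
    return (pattern, allow_recursive)  # stub body, for illustration
```

With that signature, glob('**.py', allow_recursive=True) is the only way
to turn recursion on, which makes the explicit spelling the norm.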

~Ethan~



From tarek at ziade.org  Wed Jan 16 11:30:22 2013
From: tarek at ziade.org (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Wed, 16 Jan 2013 11:30:22 +0100
Subject: [Python-ideas] Parametrized any() and all() ?
Message-ID: <50F6813E.60503@ziade.org>

Hello

any() and all() are very useful small functions, and I am wondering if it
would be interesting to have them work with different predicates, by
passing a callable.

e.g. something like:

import operator

def any(iterable, filter=operator.truth):
    for element in iterable:
        if filter(element):
            return True
    return False


For instance I could then use any() to find out if there's a None in the
sequence:

if any(iterable, filter=lambda x: x is None):
    raise SomeError("There's a None in that list")


Granted, it's easy to do it myself in a small util function - but since 
any() and all() are in Python...


Cheers
Tarek

-- 
Tarek Ziadé • http://ziade.org • @tarek_ziade

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130116/1f62ee59/attachment.html>

From _ at lvh.cc  Wed Jan 16 11:33:41 2013
From: _ at lvh.cc (Laurens Van Houtven)
Date: Wed, 16 Jan 2013 11:33:41 +0100
Subject: [Python-ideas] Parametrized any() and all() ?
In-Reply-To: <50F6813E.60503@ziade.org>
References: <50F6813E.60503@ziade.org>
Message-ID: <CAE_Hg6Y26Veq6eJrzsOZDL_Je2N03s5N3Hxfr8QHN-1Lhf23VQ@mail.gmail.com>

Hey Tarek,

I would write that as any(x is None for x in it) -- the example you gave
doesn't really strike me as an improvement over that, although I could see
how there are many cases where it's nicer...


On Wed, Jan 16, 2013 at 11:30 AM, Tarek Ziadé <tarek at ziade.org> wrote:

>  Hello
>
> any() and all() are very useful small functions, and I am wondering if it
> could be interesting to have them work
> with different operators, by using a callable.
>
> e.g. something like:
>
> import operator
>
> def any(iterable, filter=operator.truth):
>     for element in iterable:
>         if filter(element):
>             return True
>     return False
>
>
> For instance I could then us any() to find out if there's a None in the
> sequence:
>
> if any(iterable, op=lambda x: x is None):
>     raise SomeError("There's a none in that list")
>
>
> Granted, it's easy to do it myself in a small util function - but since
> any() and all() are in Python...
>
>
> Cheers
> Tarek
>
> --
> Tarek Ziadé • http://ziade.org • @tarek_ziade
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>
>


-- 
cheers
lvh
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130116/1373215e/attachment.html>

From songofacandy at gmail.com  Wed Jan 16 11:37:21 2013
From: songofacandy at gmail.com (INADA Naoki)
Date: Wed, 16 Jan 2013 19:37:21 +0900
Subject: [Python-ideas] Parametrized any() and all() ?
In-Reply-To: <CAE_Hg6Y26Veq6eJrzsOZDL_Je2N03s5N3Hxfr8QHN-1Lhf23VQ@mail.gmail.com>
References: <50F6813E.60503@ziade.org>
	<CAE_Hg6Y26Veq6eJrzsOZDL_Je2N03s5N3Hxfr8QHN-1Lhf23VQ@mail.gmail.com>
Message-ID: <CAEfz+TxiFm0-i6+Egx7FRmcpnDJt_rWD_QNKzHZG0fqpFeB3MQ@mail.gmail.com>

I think adding this example to the docstring and documentation may help many people.


On Wed, Jan 16, 2013 at 7:33 PM, Laurens Van Houtven <_ at lvh.cc> wrote:

> Hey Tarek,
>
> I would write that as any(x is None for x in it) -- the example you gave
> doesn't really strike me as an improvement over that, although I could see
> how many there are cases where it's nicer...
>
>
> On Wed, Jan 16, 2013 at 11:30 AM, Tarek Ziadé <tarek at ziade.org> wrote:
>
>>  Hello
>>
>> any() and all() are very useful small functions, and I am wondering if it
>> could be interesting to have them work
>> with different operators, by using a callable.
>>
>> e.g. something like:
>>
>> import operator
>>
>> def any(iterable, filter=operator.truth):
>>     for element in iterable:
>>         if filter(element):
>>             return True
>>     return False
>>
>>
>> For instance I could then us any() to find out if there's a None in the
>> sequence:
>>
>> if any(iterable, op=lambda x: x is None):
>>     raise SomeError("There's a none in that list")
>>
>>
>> Granted, it's easy to do it myself in a small util function - but since
>> any() and all() are in Python...
>>
>>
>> Cheers
>> Tarek
>>
>> --
>> Tarek Ziadé • http://ziade.org • @tarek_ziade
>>
>>
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> http://mail.python.org/mailman/listinfo/python-ideas
>>
>>
>
>
> --
> cheers
> lvh
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>
>


-- 
INADA Naoki  <songofacandy at gmail.com>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130116/21d787f0/attachment.html>

From tarek at ziade.org  Wed Jan 16 11:44:13 2013
From: tarek at ziade.org (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Wed, 16 Jan 2013 11:44:13 +0100
Subject: [Python-ideas] Parametrized any() and all() ?
In-Reply-To: <CAE_Hg6Y26Veq6eJrzsOZDL_Je2N03s5N3Hxfr8QHN-1Lhf23VQ@mail.gmail.com>
References: <50F6813E.60503@ziade.org>
	<CAE_Hg6Y26Veq6eJrzsOZDL_Je2N03s5N3Hxfr8QHN-1Lhf23VQ@mail.gmail.com>
Message-ID: <50F6847D.2020404@ziade.org>

On 1/16/13 11:33 AM, Laurens Van Houtven wrote:
> Hey Tarek,
>
> I would write that as any(x is None for x in it)

But here you're building yet another iterable to adapt it to any(),
which seems to me overkill if we can just parametrize the loop in any()


Cheers
Tarek

-- 
Tarek Ziadé • http://ziade.org • @tarek_ziade



From masklinn at masklinn.net  Wed Jan 16 12:08:55 2013
From: masklinn at masklinn.net (Masklinn)
Date: Wed, 16 Jan 2013 12:08:55 +0100
Subject: [Python-ideas] Parametrized any() and all() ?
In-Reply-To: <50F6847D.2020404@ziade.org>
References: <50F6813E.60503@ziade.org>
	<CAE_Hg6Y26Veq6eJrzsOZDL_Je2N03s5N3Hxfr8QHN-1Lhf23VQ@mail.gmail.com>
	<50F6847D.2020404@ziade.org>
Message-ID: <93F3FFBC-F145-4956-9512-04DF46A0E14C@masklinn.net>

On 2013-01-16, at 11:44 , Tarek Ziadé wrote:
> On 1/16/13 11:33 AM, Laurens Van Houtven wrote:
>> Hey Tarek,
>> 
>> I would write that as any(x is None for x in it)
> 
> But here you're building yet another iterable to adapt it to any(), which seems to me overkill if we can just parametrized the loop in any()

It's just a generator, and will be terminated early if possible.

I'm pretty sure adding a key function to any() and all() has already been
proposed several times, and from what I remember it was struck down every
time because the use case is covered by Laurens's suggestion: key
functions are necessary when you'd otherwise need DSU
(decorate-sort-undecorate), because the result is the original input
rather than the key function's output, but that's not the case for any()
and all()

Here's the previous/latest instance: http://mail.python.org/pipermail/python-ideas/2012-July/015837.html
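The distinction in a nutshell: a key function earns its keep when the
result must be an original element, while any()/all() only ever return a
bool:

```python
words = ["banana", "fig", "apple"]

# min() needs key= because the result is an ORIGINAL element:
shortest = min(words, key=len)

# any() only returns a bool, so a generator expression already suffices:
has_long = any(len(w) > 5 for w in words)
```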

From ncoghlan at gmail.com  Wed Jan 16 12:10:00 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 16 Jan 2013 21:10:00 +1000
Subject: [Python-ideas] Parametrized any() and all() ?
In-Reply-To: <50F6847D.2020404@ziade.org>
References: <50F6813E.60503@ziade.org>
	<CAE_Hg6Y26Veq6eJrzsOZDL_Je2N03s5N3Hxfr8QHN-1Lhf23VQ@mail.gmail.com>
	<50F6847D.2020404@ziade.org>
Message-ID: <CADiSq7cesP5yi+skMjQXk65bXP4rO+tpU9W4W7tqacCkToPM4w@mail.gmail.com>

On Wed, Jan 16, 2013 at 8:44 PM, Tarek Ziadé <tarek at ziade.org> wrote:
> On 1/16/13 11:33 AM, Laurens Van Houtven wrote:
>>
>> Hey Tarek,
>>
>> I would write that as any(x is None for x in it)
>
>
> But here you're building yet another iterable to adapt it to any(), which
> seems to me overkill if we can just parametrized the loop in any()

Such a micro-optimization isn't worth the cost of adding a second way
to do it that everyone will then need to learn.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From oscar.j.benjamin at gmail.com  Wed Jan 16 12:20:54 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Wed, 16 Jan 2013 11:20:54 +0000
Subject: [Python-ideas] Parametrized any() and all() ?
In-Reply-To: <50F6813E.60503@ziade.org>
References: <50F6813E.60503@ziade.org>
Message-ID: <CAHVvXxQP+MUcR1+mrEae6V_YPA9d2t_JsVAETog1q3vhKKNLfA@mail.gmail.com>

On 16 January 2013 10:30, Tarek Ziadé <tarek at ziade.org> wrote:
> Hello
>
> any() and all() are very useful small functions, and I am wondering if it
> could be interesting to have them work
> with different operators, by using a callable.
>
> e.g. something like:
>
> import operator
>
> def any(iterable, filter=operator.truth):
>     for element in iterable:
>         if filter(element):
>             return True
>     return False
>
>
> For instance I could then us any() to find out if there's a None in the
> sequence:
>
> if any(iterable, op=lambda x: x is None):
>     raise SomeError("There's a none in that list")
>
>
> Granted, it's easy to do it myself in a small util function - but since
> any() and all() are in Python...

I wouldn't write a util function for this. The resulting code
    any(iterable, op=func)
is not really shorter, easier or clearer than the current methods
    any(map(func, iterable))
    any(func(x) for x in iterable)


Oscar


From steve at pearwood.info  Wed Jan 16 15:10:32 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 17 Jan 2013 01:10:32 +1100
Subject: [Python-ideas] Parametrized any() and all() ?
In-Reply-To: <CADiSq7cesP5yi+skMjQXk65bXP4rO+tpU9W4W7tqacCkToPM4w@mail.gmail.com>
References: <50F6813E.60503@ziade.org>
	<CAE_Hg6Y26Veq6eJrzsOZDL_Je2N03s5N3Hxfr8QHN-1Lhf23VQ@mail.gmail.com>
	<50F6847D.2020404@ziade.org>
	<CADiSq7cesP5yi+skMjQXk65bXP4rO+tpU9W4W7tqacCkToPM4w@mail.gmail.com>
Message-ID: <50F6B4D8.6070002@pearwood.info>

On 16/01/13 22:10, Nick Coghlan wrote:
> On Wed, Jan 16, 2013 at 8:44 PM, Tarek Ziadé<tarek at ziade.org>  wrote:
>> On 1/16/13 11:33 AM, Laurens Van Houtven wrote:
>>>
>>> Hey Tarek,
>>>
>>> I would write that as any(x is None for x in it)
>>
>>
>> But here you're building yet another iterable to adapt it to any(), which
>> seems to me overkill if we can just parametrized the loop in any()
>
> Such a micro-optimization isn't worth the cost of adding a second way
> to do it that everyone will then need to learn.


For all we know, adding a filter function will be a pessimization, not an
optimization, using more memory and/or being slower than using a generator
expression. It certainly isn't clear to me that creating a generator
expression like (x is None for x in it) is more expensive than creating a
filter function like (lambda x: x is None).

-1 on adding a filter function.



-- 
Steven


From tarek at ziade.org  Wed Jan 16 15:52:19 2013
From: tarek at ziade.org (=?UTF-8?B?VGFyZWsgWmlhZMOp?=)
Date: Wed, 16 Jan 2013 15:52:19 +0100
Subject: [Python-ideas] Parametrized any() and all() ?
In-Reply-To: <50F6B4D8.6070002@pearwood.info>
References: <50F6813E.60503@ziade.org>
	<CAE_Hg6Y26Veq6eJrzsOZDL_Je2N03s5N3Hxfr8QHN-1Lhf23VQ@mail.gmail.com>
	<50F6847D.2020404@ziade.org>
	<CADiSq7cesP5yi+skMjQXk65bXP4rO+tpU9W4W7tqacCkToPM4w@mail.gmail.com>
	<50F6B4D8.6070002@pearwood.info>
Message-ID: <50F6BEA3.7090807@ziade.org>

On 1/16/13 3:10 PM, Steven D'Aprano wrote:
> On 16/01/13 22:10, Nick Coghlan wrote:
>> On Wed, Jan 16, 2013 at 8:44 PM, Tarek Ziadé<tarek at ziade.org>  wrote:
>>> On 1/16/13 11:33 AM, Laurens Van Houtven wrote:
>>>>
>>>> Hey Tarek,
>>>>
>>>> I would write that as any(x is None for x in it)
>>>
>>>
>>> But here you're building yet another iterable to adapt it to any(), 
>>> which
>>> seems to me overkill if we can just parametrized the loop in any()
>>
>> Such a micro-optimization isn't worth the cost of adding a second way
>> to do it that everyone will then need to learn.
>
>
> For all
> we know, adding a filter function will be a pessimization, not an 
> optimization,
> using more memory and/or being slower than using a generator 
> expression. It
> certainly isn't clear to me that creating a generator expression like
> (x is None for x in it) is more expensive than creating a filter 
> function like
> (lambda x: x is None).
>
> -1 on adding a filter function.

I abandoned the idea, but I'd be curious to understand how creating
several iterables, one of which has an 'if', can be more efficient than
having a single iterable with an 'if'...


>
>
>


-- 
Tarek Ziadé • http://ziade.org • @tarek_ziade



From p.f.moore at gmail.com  Wed Jan 16 16:12:16 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 16 Jan 2013 15:12:16 +0000
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
Message-ID: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>

I've so far been lurking on the tulip/async discussions, as although
I'm interested, I have no specific need for writing high-performance
network code.

However, I hit a use case today which seems to me to be ideal for an
async-style approach, and yet I don't think it's covered by the
current PEP. Specifically, I am looking at monitoring a
subprocess.Popen object. This is basically an IO loop, but monitoring
the 3 pipes to the subprocess (well, only stdout and stderr in my
case...). Something like add_reader/add_writer would be fine, except
for the fact that (a) they are documented as low-level not for the
user, and (b) they don't work in all cases (e.g. in a select-based
loop on Windows).

I'd like PEP 3156 to include some support for waiting on IO from (one
or more) subprocesses like this in a cross-platform way. If there's
something in there to do this at the moment, that's great, but it
wasn't obvious to me when I looked...

Paul.


From eliben at gmail.com  Wed Jan 16 16:12:05 2013
From: eliben at gmail.com (Eli Bendersky)
Date: Wed, 16 Jan 2013 07:12:05 -0800
Subject: [Python-ideas] question about the Tulip effort
Message-ID: <CAF-Rda94Buwv5dQs5d+f0LVZu+Q46uXaL+uc1EowvJwSh2A87Q@mail.gmail.com>

Hi,

I've been reading PEP 3156 and looking at the reference implementation (
http://code.google.com/p/tulip/). I'll be happy to contribute to the
effort, and following are a couple of questions on how to do that.

1. Questions and clarifications should be sent to this list (python-ideas),
correct?
2. Is there a list of tasks help would be needed with? Is it the TODO
file in tulip's root dir?
3. How/where to contribute patches?

Eli
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130116/84c8cde4/attachment.html>

From ronaldoussoren at mac.com  Wed Jan 16 16:13:15 2013
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Wed, 16 Jan 2013 16:13:15 +0100
Subject: [Python-ideas] Parametrized any() and all() ?
In-Reply-To: <50F6BEA3.7090807@ziade.org>
References: <50F6813E.60503@ziade.org>
	<CAE_Hg6Y26Veq6eJrzsOZDL_Je2N03s5N3Hxfr8QHN-1Lhf23VQ@mail.gmail.com>
	<50F6847D.2020404@ziade.org>
	<CADiSq7cesP5yi+skMjQXk65bXP4rO+tpU9W4W7tqacCkToPM4w@mail.gmail.com>
	<50F6B4D8.6070002@pearwood.info> <50F6BEA3.7090807@ziade.org>
Message-ID: <D746FB22-9866-4FDC-B460-966A51E0136E@mac.com>


On 16 Jan, 2013, at 15:52, Tarek Ziadé <tarek at ziade.org> wrote:

> On 1/16/13 3:10 PM, Steven D'Aprano wrote:
>> On 16/01/13 22:10, Nick Coghlan wrote:
>>> On Wed, Jan 16, 2013 at 8:44 PM, Tarek Ziadé<tarek at ziade.org>  wrote:
>>>> On 1/16/13 11:33 AM, Laurens Van Houtven wrote:
>>>>> 
>>>>> Hey Tarek,
>>>>> 
>>>>> I would write that as any(x is None for x in it)
>>>> 
>>>> 
>>>> But here you're building yet another iterable to adapt it to any(), which
>>>> seems to me overkill if we can just parametrized the loop in any()
>>> 
>>> Such a micro-optimization isn't worth the cost of adding a second way
>>> to do it that everyone will then need to learn.
>> 
>> 
>> For all
>> we know, adding a filter function will be a pessimization, not an optimization,
>> using more memory and/or being slower than using a generator expression. It
>> certainly isn't clear to me that creating a generator expression like
>> (x is None for x in it) is more expensive than creating a filter function like
>> (lambda x: x is None).
>> 
>> -1 on adding a filter function.
> 
> I abandoned the idea,
> 
> but I'd be curious to understand how creating several
> iterables with one that has an 'if', can be more efficient than having a single
> iterable with an 'if'...

Have you any reason to assume that "any(x is None for x in it)" is slow?

I wouldn't be surprised if a key argument for any/all would have a higher overhead than the generator expression (if there is any difference).  The key function would have to be called after all, with the overhead of normal function calls.
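A quick, machine-dependent way to check is timeit; map() with a lambda
stands in here for the per-element Python-level call a key function would
cost:

```python
import timeit

data = list(range(1000))

# Generator expression: the test is evaluated inline per element.
genexp = timeit.timeit("any(x is None for x in data)",
                       globals={"data": data}, number=200)

# Per-element lambda call, approximating a key-function variant.
keycall = timeit.timeit("any(map(lambda x: x is None, data))",
                        globals={"data": data}, number=200)
```

No winner is claimed here; the numbers vary by interpreter and machine.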

Ronald

> 
> 
>> 
>> 
>> 
> 
> 
> -- 
> Tarek Ziadé • http://ziade.org • @tarek_ziade
> 
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas



From guido at python.org  Wed Jan 16 18:52:57 2013
From: guido at python.org (Guido van Rossum)
Date: Wed, 16 Jan 2013 09:52:57 -0800
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
Message-ID: <CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>

On Wed, Jan 16, 2013 at 7:12 AM, Paul Moore <p.f.moore at gmail.com> wrote:
> I've so far been lurking on the tulip/async discussions, as although
> I'm interested, I have no specific need for writing high-performance
> network code.
>
> However, I hit a use case today which seems to me to be ideal for an
> async-style approach, and yet I don't think it's covered by the
> current PEP. Specifically, I am looking at monitoring a
> subprocess.Popen object. This is basically an IO loop, but monitoring
> the 3 pipes to the subprocess (well, only stdout and stderr in my
> case...). Something like add_reader/add_writer would be fine, except
> for the fact that (a) they are documented as low-level not for the
> user, and (b) they don't work in all cases (e.g. in a select-based
> loop on Windows).
>
> I'd like PEP 3156 to include some support for waiting on IO from (one
> or more) subprocesses like this in a cross-platform way. If there's
> something in there to do this at the moment, that's great, but it
> wasn't obvious to me when I looked...

This is a great use case. The right approach would probably be to
define a new Transport (and an event loop method to create one) that
wraps pipes going into and out of a subprocess. The new method would
have a standard API (probably similar to that of subprocess), whereas
there would be different implementations of the Transport based on
platform and event loop implementation (similar to the way the
subprocess module has quite different implementations).

Can you check out the Tulip source code (code.google.com/p/tulip) and
come up with a patch to do this? I'll gladly review it. It's fine to
only cover the UNIX case for now.

-- 
--Guido van Rossum (python.org/~guido)


From p.f.moore at gmail.com  Wed Jan 16 18:59:55 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 16 Jan 2013 17:59:55 +0000
Subject: [Python-ideas] The async API of the future
In-Reply-To: <k741qk$hat$1@ger.gmane.org>
References: <CAP7+vJLzct4p_SHyMHPc6C0aDE=-zbHw-L6F9502xi8zfGpj9w@mail.gmail.com>
	<2CEFACA8-FB96-4C17-9D14-CADEE217F662@molden.no>
	<CAP7+vJKXgmTXA7JnHw0=uGst5P=mxv3HhFMxh71GDGOn4ZFQDQ@mail.gmail.com>
	<k741qk$hat$1@ger.gmane.org>
Message-ID: <CACac1F9Grgwvz7-6RS902BEEwuP6mYROjmnxYoOKf3aoKQEDfA@mail.gmail.com>

On 3 November 2012 21:20, Richard Oudkerk <shibturn at gmail.com> wrote:
> The IOCP proactor does not support ssl (or ipv6) so main.py does not succeed
> in downloading from xkcd.com using ssl.  Using the other proactors it works
> correctly.
>
> The basic interface for the proactor looks like
>
>     class Proactor:
>         def recv(self, sock, n): ...
>         def send(self, sock, buf): ...
>         def connect(self, sock, address): ...
>         def accept(self, sock): ...
>
>         def poll(self, timeout=None): ...
>         def pollable(self): ...

I've just been looking at this, and from what I can see, am I right in
thinking that the IOCP support is *only* for sockets? (I'm not very
familiar with socket programming, so I had a bit of difficulty
following the code). In particular, it can't be used to register
non-socket file objects? From my understanding of the IOCP
documentation on MSDN, this is fundamental - IOCP can only be used on
HANDLE objects that have been opened with the FILE_FLAG_OVERLAPPED
flag, which is not used by "normal" Python IO objects like file
handles and pipes, so it will never be possible to poll these objects
using IOCP.

Just trying to make sure I understand the scope of this work...

Paul


From guido at python.org  Wed Jan 16 19:07:15 2013
From: guido at python.org (Guido van Rossum)
Date: Wed, 16 Jan 2013 10:07:15 -0800
Subject: [Python-ideas] question about the Tulip effort
In-Reply-To: <CAF-Rda94Buwv5dQs5d+f0LVZu+Q46uXaL+uc1EowvJwSh2A87Q@mail.gmail.com>
References: <CAF-Rda94Buwv5dQs5d+f0LVZu+Q46uXaL+uc1EowvJwSh2A87Q@mail.gmail.com>
Message-ID: <CAP7+vJL928zc975Kf8Jix_h3zgjBuN-cJbGF8qN5DEz2bO1PSA@mail.gmail.com>

On Wed, Jan 16, 2013 at 7:12 AM, Eli Bendersky <eliben at gmail.com> wrote:
> I've been reading PEP 3156 and looking at the reference implementation
> (http://code.google.com/p/tulip/). I'll be happy to contribute to the
> effort, and following are a couple of questions on how to do that.

And I'd be happy to have your help!

> 1. Questions and clarifications should be sent to this list (python-ideas),
> correct?

Yes, though if you think something is of little public value you can
always mail me directly (Tulip is my top priority until the PEP is
accepted and Tulip lands in the 3.4 stdlib).

> 2. Is there a list of tasks help would be needed with? Is it the the TODO
> file in tulip's root dir?

Hm, that's mostly reminders for myself, and I don't always update it.
There are also lots of TODOs and XXXs in the source code (the XXXs
mark things that are *definitely* in need of fixing, like missing
docstrings; TODOs are often just for pondering). You can certainly
read through it, and if you see a task you would like to do, ping me
for details.

Some tasks that I don't think are represented well but where I would
love to get help:

- Write a somewhat significant server app. I have a somewhat
significant client app (crawl.py) but nothing that exercises the
server API at all. I suspect that there are some awkward things in the
server API that will need fixing.

- Try writing a significant app for a protocol other than HTTP.

- Move the StreamReader class out of http_client.py and design an API
to make it easy to hook it up to any protocol.

- Datagram support (read the section in the PEP on this topic first).

> 3. How/where to contribute patches?

I like to get code review requests using codereview.appspot.com (send
them to gvanrossum at gmail.com). Please use the upload.py utility to
upload your patch, don't bother with defining a repository. If I like
your patch I'll probably ask you to submit it yourself, I'll give you
repo access once you've signed a PSF contributor form.

-- 
--Guido van Rossum (python.org/~guido)


From p.f.moore at gmail.com  Wed Jan 16 19:10:34 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 16 Jan 2013 18:10:34 +0000
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
Message-ID: <CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>

On 16 January 2013 17:52, Guido van Rossum <guido at python.org> wrote:
> On Wed, Jan 16, 2013 at 7:12 AM, Paul Moore <p.f.moore at gmail.com> wrote:
>> I've so far been lurking on the tulip/async discussions, as although
>> I'm interested, I have no specific need for writing high-performance
>> network code.
>>
>> However, I hit a use case today which seems to me to be ideal for an
>> async-style approach, and yet I don't think it's covered by the
>> current PEP. Specifically, I am looking at monitoring a
>> subprocess.Popen object. This is basically an IO loop, but monitoring
>> the 3 pipes to the subprocess (well, only stdout and stderr in my
>> case...). Something like add_reader/add_writer would be fine, except
>> for the fact that (a) they are documented as low-level not for the
>> user, and (b) they don't work in all cases (e.g. in a select-based
>> loop on Windows).
>>
>> I'd like PEP 3156 to include some support for waiting on IO from (one
>> or more) subprocesses like this in a cross-platform way. If there's
>> something in there to do this at the moment, that's great, but it
>> wasn't obvious to me when I looked...
>
> This is a great use case. The right approach would probably be to
> define a new Transport (and an event loop method to create one) that
> wraps pipes going into and out of a subprocess. The new method would
> have a standard API (probably similar to that of subprocess), whereas
> there would be different implementations of the Transport based on
> platform and event loop implementation (similar to the way the
> subprocess module has quite different implementations).
>
> Can you check out the Tulip source code (code.google.com/p/tulip) and
> come up with a patch to do this? I'll gladly review it. It's fine to
> only cover the UNIX case for now.

I'll have a look. There *is* one problem, though - I imagine it will
be relatively easy to put something together that works on Unix, as
waiting on pipes is covered by the existing select/poll mechanisms.
But I'm on Windows, so I won't be able to test it. And on Windows,
there's no mechanism in place to wait on arbitrary filehandles, so the
process wait mechanism is a much harder nut to crack. Chicken and egg
problem...

Maybe I'll start by looking at waiting on arbitrary filehandles, and
use that to build the process approach. Unfortunately, I don't think
IOCP is any more able to wait on arbitrary files than select - see my
followup to an older thread on Richard's work there. Or maybe I'll set
up a hacking environment in a Linux VM or something. That'd be a fun
experience in any case.

I'll have to get my brain round the existing spec as well. I'm finding
it hard to understand why there are so many methods on the event loop
that are specific to particular use cases (for this example, your
suggested method to create the new type of Transport). My instinct
says that this should *also* be a good test case for a user coming up
with a new type of "event source" and wanting to plug it into the
event loop. Having to add a new method to the event loop seems to
imply this isn't possible.

OK, off to do a lot of spec reading and then some coding. With luck,
you'll be patient with dumb questions from me on the way :-)

Thanks,
Paul


From amauryfa at gmail.com  Wed Jan 16 19:15:05 2013
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Wed, 16 Jan 2013 19:15:05 +0100
Subject: [Python-ideas] The async API of the future
In-Reply-To: <CACac1F9Grgwvz7-6RS902BEEwuP6mYROjmnxYoOKf3aoKQEDfA@mail.gmail.com>
References: <CAP7+vJLzct4p_SHyMHPc6C0aDE=-zbHw-L6F9502xi8zfGpj9w@mail.gmail.com>
	<2CEFACA8-FB96-4C17-9D14-CADEE217F662@molden.no>
	<CAP7+vJKXgmTXA7JnHw0=uGst5P=mxv3HhFMxh71GDGOn4ZFQDQ@mail.gmail.com>
	<k741qk$hat$1@ger.gmane.org>
	<CACac1F9Grgwvz7-6RS902BEEwuP6mYROjmnxYoOKf3aoKQEDfA@mail.gmail.com>
Message-ID: <CAGmFidaVkC=tQXq2NpYy_QoDOpoVgRiuo=Jr2Ei1-_Q6d_DoOg@mail.gmail.com>

2013/1/16 Paul Moore <p.f.moore at gmail.com>

> On 3 November 2012 21:20, Richard Oudkerk <shibturn at gmail.com> wrote:
> > The IOCP proactor does not support ssl (or ipv6) so main.py does not
> succeed
> > in downloading from xkcd.com using ssl.  Using the other proactors it
> works
> > correctly.
> >
> > The basic interface for the proactor looks like
> >
> >     class Proactor:
> >         def recv(self, sock, n): ...
> >         def send(self, sock, buf): ...
> >         def connect(self, sock, address): ...
> >         def accept(self, sock): ...
> >
> >         def poll(self, timeout=None): ...
> >         def pollable(self): ...
>
> I've just been looking at this, and from what I can see, am I right in
> thinking that the IOCP support is *only* for sockets? (I'm not very
> familiar with socket programming, so I had a bit of difficulty
> following the code). In particular, it can't be used to register
> non-socket file objects? From my understanding of the IOCP
> documentation on MSDN, this is fundamental - IOCP can only be used on
> HANDLE objects that have been opened with the FILE_FLAG_OVERLAPPED
> flag, which is not used by "normal" Python IO objects like file
> handles and pipes, so it will never be possible to poll these objects
> using IOCP.
>

It works for disk files as well, but you indeed have to pass
FILE_FLAG_OVERLAPPED when opening the file.

This is similar to sockets: s.setblocking(False) is required for
asynchronous writes to work.
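The socket-side analogue is easy to demonstrate: after setblocking(False),
operations that would block raise BlockingIOError instead, which is what lets
an event loop multiplex many descriptors.

```python
import select
import socket

# After setblocking(False), recv() raises instead of blocking when
# no data is available -- the readiness model an event loop relies on.
a, b = socket.socketpair()
a.setblocking(False)
try:
    a.recv(1)                     # nothing has been sent yet
except BlockingIOError:
    print("recv would have blocked")

b.send(b'x')
select.select([a], [], [], 1.0)   # wait until readable, as a loop would
print(a.recv(1))                  # b'x'
a.close()
b.close()
```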

-- 
Amaury Forgeot d'Arc
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130116/6403bb9b/attachment.html>

From geertj at gmail.com  Wed Jan 16 19:20:23 2013
From: geertj at gmail.com (Geert Jansen)
Date: Wed, 16 Jan 2013 20:20:23 +0200
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
Message-ID: <CADbA=FXwajg8=EHyFzdccCZP5yR047u1YvOXxBvrhNZ32+JZwA@mail.gmail.com>

On Wed, Jan 16, 2013 at 8:10 PM, Paul Moore <p.f.moore at gmail.com> wrote:

> I'll have a look. There *is* one problem, though - I imagine it will
> be relatively easy to put something together that works on Unix, as
> waiting on pipes is covered by the existing select/poll mechanisms.
> But I'm on Windows, so I won't be able to test it. And on Windows,
> there's no mechanism in place to wait on arbitrary filehandles, so the
> process wait mechanism is a much harder nut to crack. Chicken and egg
> problem...
>
> Maybe I'll start by looking at waiting on arbitrary filehandles, and
> use that to build the process approach. Unfortunately, I don't think
> IOCP is any more able to wait on arbitrary files than select - see my
> followup to an older thread on Richard's work there. Or maybe I'll set
> up a hacking environment in a Linux VM or something. That'd be a fun
> experience in any case.

Dealing with subprocesses on Windows in a non-blocking way is a royal
pain. As far as I know, the only option is to use named pipes and
block on them using a thread pool.  A few years back I wrote something
that did this, see the link below. However it ain't pretty..

https://bitbucket.org/geertj/winpexpect/src/tip/lib/winpexpect.py
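The thread-pool idea generalizes beyond Windows named pipes. As a
cross-platform sketch (this is not winpexpect itself, just the same pattern):
one blocking reader thread per pipe forwards lines to a queue, so the main
loop never blocks on either pipe.

```python
import queue
import subprocess
import sys
import threading

def drain(argv):
    """Run argv, reading stdout and stderr concurrently.

    One thread per pipe blocks on readline() and forwards lines to a
    shared queue -- the same idea winpexpect applies to named pipes on
    Windows.  Returns {'stdout': [...], 'stderr': [...]} of raw lines."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    q = queue.Queue()

    def reader(pipe, tag):
        for line in iter(pipe.readline, b''):
            q.put((tag, line))
        q.put((tag, None))                  # EOF marker

    for tag, pipe in (('stdout', proc.stdout), ('stderr', proc.stderr)):
        threading.Thread(target=reader, args=(pipe, tag),
                         daemon=True).start()

    out = {'stdout': [], 'stderr': []}
    eofs = 0
    while eofs < 2:                         # main loop only blocks on the queue
        tag, line = q.get()
        if line is None:
            eofs += 1
        else:
            out[tag].append(line)
    proc.wait()
    return out

if __name__ == '__main__':
    print(drain([sys.executable, '-c',
                 'import sys; print("o"); print("e", file=sys.stderr)']))
```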

Regards,
Geert


From guido at python.org  Wed Jan 16 19:21:40 2013
From: guido at python.org (Guido van Rossum)
Date: Wed, 16 Jan 2013 10:21:40 -0800
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
Message-ID: <CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>

On Wed, Jan 16, 2013 at 10:10 AM, Paul Moore <p.f.moore at gmail.com> wrote:

> On 16 January 2013 17:52, Guido van Rossum <guido at python.org> wrote:
> > On Wed, Jan 16, 2013 at 7:12 AM, Paul Moore <p.f.moore at gmail.com> wrote:
> >> I've so far been lurking on the tulip/async discussions, as although
> >> I'm interested, I have no specific need for writing high-performance
> >> network code.
> >>
> >> However, I hit a use case today which seems to me to be ideal for an
> >> async-style approach, and yet I don't think it's covered by the
> >> current PEP. Specifically, I am looking at monitoring a
> >> subprocess.Popen object. This is basically an IO loop, but monitoring
> >> the 3 pipes to the subprocess (well, only stdout and stderr in my
> >> case...). Something like add_reader/add_writer would be fine, except
> >> for the fact that (a) they are documented as low-level not for the
> >> user, and (b) they don't work in all cases (e.g. in a select-based
> >> loop on Windows).
> >>
> >> I'd like PEP 3156 to include some support for waiting on IO from (one
> >> or more) subprocesses like this in a cross-platform way. If there's
> >> something in there to do this at the moment, that's great, but it
> >> wasn't obvious to me when I looked...
> >
> > This is a great use case. The right approach would probably be to
> > define a new Transport (and an event loop method to create one) that
> > wraps pipes going into and out of a subprocess. The new method would
> > have a standard API (probably similar to that of subprocess), whereas
> > there would be different implementations of the Transport based on
> > platform and event loop implementation (similar to the way the
> > subprocess module has quite different implementations).
> >
> > Can you check out the Tulip source code (code.google.com/p/tulip) and
> > come up with a patch to do this? I'll gladly review it. It's fine to
> > only cover the UNIX case for now.
>
> I'll have a look. There *is* one problem, though - I imagine it will
> be relatively easy to put something together that works on Unix, as
> waiting on pipes is covered by the existing select/poll mechanisms.
> But I'm on Windows, so I won't be able to test it. And on Windows,
> there's no mechanism in place to wait on arbitrary filehandles, so the
> process wait mechanism is a much harder nut to crack. Chicken and egg
> problem...
>

What does the subprocess module do on Windows? (I'm in the reverse
position, although I have asked the kind IT folks at Dropbox to provide me
with a Windows machine.)


> Maybe I'll start by looking at waiting on arbitrary filehandles, and
> use that to build the process approach. Unfortunately, I don't think
> IOCP is any more able to wait on arbitrary files than select - see my
> followup to an older thread on Richard's work there. Or maybe I'll set
> up a hacking environment in a Linux VM or something. That'd be a fun
> experience in any case.
>

I'm eagerly awaiting Richard's response. AFAIK handles on Windows *are*
more general than sockets...


> I'll have to get my brain round the existing spec as well. I'm finding
> it hard to understand why there are so many methods on the event loop
> that are specific to particular use cases (for this example, your
> suggested method to create the new type of Transport).


This is mainly so that the event loop implementation can control the
Transport class. Note that it isn't enough to define different Transport
classes per platform -- on a single platform there may be multiple event
loop implementations (e.g. on Windows you can use Select or IOCP) and these
may need different Transport implementations. So this must really be under
control of the event loop object.
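The point can be illustrated with a toy factory (all class names here are
hypothetical, not PEP 3156 API): the same user-facing call must yield a
different transport depending on which event loop is running, so the loop
object owns construction.

```python
class SelectPipeTransport:
    """Would implement pipe I/O via select()/poll() readiness."""
    kind = 'select'

class IOCPPipeTransport:
    """Would implement pipe I/O via overlapped I/O completions."""
    kind = 'iocp'

class SelectEventLoop:
    def create_pipe_transport(self, protocol):
        return SelectPipeTransport()

class IOCPEventLoop:
    def create_pipe_transport(self, protocol):
        return IOCPPipeTransport()

# Identical user code; only the loop object decides the transport class:
for loop in (SelectEventLoop(), IOCPEventLoop()):
    transport = loop.create_pipe_transport(protocol=None)
    print(type(transport).__name__, transport.kind)
```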


> My instinct
> says that this should *also* be a good test case for a user coming up
> with a new type of "event source" and wanting to plug it into the
> event loop. Having to add a new method to the event loop seems to
> imply this isn't possible.
>

If the user is okay with solving the problem only for their particular
platform and event loop implementation they don't need to add anything to
the event loop. But for transports that make it into the PEP, it is
essential that alternate implementations (e.g. one that proxies a Twisted
Reactor) be in control of the Transport construction.

>
> OK, off to do a lot of spec reading and then some coding. With luck,
> you'll be patient with dumb questions from me on the way :-)
>

I will be!

-- 
--Guido van Rossum (python.org/~guido)

From guido at python.org  Wed Jan 16 19:22:21 2013
From: guido at python.org (Guido van Rossum)
Date: Wed, 16 Jan 2013 10:22:21 -0800
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CADbA=FXwajg8=EHyFzdccCZP5yR047u1YvOXxBvrhNZ32+JZwA@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CADbA=FXwajg8=EHyFzdccCZP5yR047u1YvOXxBvrhNZ32+JZwA@mail.gmail.com>
Message-ID: <CAP7+vJ+nfmW7QoMQ4WYD2A56ncuvPtKhsVD3qMqjmzq=XjXawA@mail.gmail.com>

On Wed, Jan 16, 2013 at 10:20 AM, Geert Jansen <geertj at gmail.com> wrote:

> On Wed, Jan 16, 2013 at 8:10 PM, Paul Moore <p.f.moore at gmail.com> wrote:
>
> > I'll have a look. There *is* one problem, though - I imagine it will
> > be relatively easy to put something together that works on Unix, as
> > waiting on pipes is covered by the existing select/poll mechanisms.
> > But I'm on Windows, so I won't be able to test it. And on Windows,
> > there's no mechanism in place to wait on arbitrary filehandles, so the
> > process wait mechanism is a much harder nut to crack. Chicken and egg
> > problem...
> >
> > Maybe I'll start by looking at waiting on arbitrary filehandles, and
> > use that to build the process approach. Unfortunately, I don't think
> > IOCP is any more able to wait on arbitrary files than select - see my
> > followup to an older thread on Richard's work there. Or maybe I'll set
> > up a hacking environment in a Linux VM or something. That'd be a fun
> > experience in any case.
>
> Dealing with subprocesses on Windows in a non-blocking way is a royal
> pain. As far as I know, the only option is to use named pipes and
> block on them using a thread pool.  A few years back I wrote something
> that did this, see the link below. However it ain't pretty..
>
> https://bitbucket.org/geertj/winpexpect/src/tip/lib/winpexpect.py
>

Hm, doesn't IOCP support named pipes?

-- 
--Guido van Rossum (python.org/~guido)

From eliben at gmail.com  Wed Jan 16 19:26:44 2013
From: eliben at gmail.com (Eli Bendersky)
Date: Wed, 16 Jan 2013 10:26:44 -0800
Subject: [Python-ideas] question about the Tulip effort
In-Reply-To: <CAP7+vJL928zc975Kf8Jix_h3zgjBuN-cJbGF8qN5DEz2bO1PSA@mail.gmail.com>
References: <CAF-Rda94Buwv5dQs5d+f0LVZu+Q46uXaL+uc1EowvJwSh2A87Q@mail.gmail.com>
	<CAP7+vJL928zc975Kf8Jix_h3zgjBuN-cJbGF8qN5DEz2bO1PSA@mail.gmail.com>
Message-ID: <CAF-Rda9NqHnAkgz5uKCpEHpE7xzgo0c4ib5cgvCzx0bzFn7dhQ@mail.gmail.com>

>
> > 1. Questions and clarifications should be sent to this list
> (python-ideas),
> > correct?
>
> Yes, unless you think it's of little public value, you can always mail
> me directly (Tulip is my top priority until the PEP is accepted and
> Tulip lands in the 3.4 stdlib).
>
> > 2. Is there a list of tasks help would be needed with? Is it the the TODO
> > file in tulip's root dir?
>
> Hm, that's mostly reminders for myself, and I don't always update it.
> There are also lots of TODOs and XXXs in the source code (the XXXs
> mark things that are *definitely* in need of fixing, like missing
> docstrings; TODOs are often just for pondering). You can certainly
> read through it, and if you see a task you would like to do, ping me
> for details.
>
> Some tasks that I don't think are represented well but where I would
> love to get help:
>
> - Write a somewhat significant server app. I have a somewhat
> significant client app (crawl.py) but nothing that exercises the
> server API at all. I suspect that there are some awkward things in the
> server API that will need fixing.
>
> - Try writing a significant app for a protocol other than HTTP.
>
> - Move the StreamReader class out of http_client.py and design an API
> to make it easy to hook it up to any protocol.
>
> - Datagram support (read the section in the PEP on this topic first).
>
>
Great, I'll start looking around.


>  > 3. How/where to contribute patches?
>
> I like to get code review requests using codereview.appspot.com (send
> them to gvanrossum at gmail.com). Please use the upload.py utility to
> upload your patch, don't bother with defining a repository. If I like
> your patch I'll probably ask you to submit it yourself, I'll give you
> repo access once you've signed a PSF contributor form.
>

Is that the same contributor form I had to sign for CPython a while ago (I
have the asterisk near my name in the issue tracker)? Anyway, sending
patches through Rietveld SGTM.

Eli

From geertj at gmail.com  Wed Jan 16 19:27:10 2013
From: geertj at gmail.com (Geert Jansen)
Date: Wed, 16 Jan 2013 20:27:10 +0200
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CAP7+vJ+nfmW7QoMQ4WYD2A56ncuvPtKhsVD3qMqjmzq=XjXawA@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CADbA=FXwajg8=EHyFzdccCZP5yR047u1YvOXxBvrhNZ32+JZwA@mail.gmail.com>
	<CAP7+vJ+nfmW7QoMQ4WYD2A56ncuvPtKhsVD3qMqjmzq=XjXawA@mail.gmail.com>
Message-ID: <CADbA=FW8Ao8esxyovDnHOTuuZ2mZ--tD-6AUHFz=xXK15H=bTA@mail.gmail.com>

On Wed, Jan 16, 2013 at 8:22 PM, Guido van Rossum <guido at python.org> wrote:
>> Dealing with subprocesses on Windows in a non-blocking way is a royal
>> pain. As far as I know, the only option is to use named pipes and
>> block on them using a thread pool.  A few years back I wrote something
>> that did this, see the link below. However it ain't pretty..
>>
>> https://bitbucket.org/geertj/winpexpect/src/tip/lib/winpexpect.py
>
>
> Hm, doesn't IOCP support named pipes?

Oops, yes, I stand corrected. I got confused between select and IOCP. Sorry.

Regards,
Geert


From solipsis at pitrou.net  Wed Jan 16 19:47:56 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 16 Jan 2013 19:47:56 +0100
Subject: [Python-ideas] Parametrized any() and all() ?
References: <50F6813E.60503@ziade.org>
	<CAE_Hg6Y26Veq6eJrzsOZDL_Je2N03s5N3Hxfr8QHN-1Lhf23VQ@mail.gmail.com>
	<50F6847D.2020404@ziade.org>
	<CADiSq7cesP5yi+skMjQXk65bXP4rO+tpU9W4W7tqacCkToPM4w@mail.gmail.com>
	<50F6B4D8.6070002@pearwood.info> <50F6BEA3.7090807@ziade.org>
Message-ID: <20130116194756.2efe9afe@pitrou.net>

On Wed, 16 Jan 2013 15:52:19 +0100
Tarek Ziadé <tarek at ziade.org> wrote:
> >
> > For all
> > we know, adding a filter function will be a pessimization, not an 
> > optimization,
> > using more memory and/or being slower than using a generator 
> > expression. It
> > certainly isn't clear to me that creating a generator expression like
> > (x is None for x in it) is more expensive than creating a filter 
> > function like
> > (lambda x: x is None).
> >
> > -1 on adding a filter function.
> 
> I abandoned the idea,
> 
> but I'd be curious to understand how creating several
> iterables with one that has an 'if', can be more efficient than having a 
> single
> iterable with an 'if'...

You know, discussing performance without posting benchmark numbers is
generally pointless.
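Such numbers are cheap to get. For instance, the two spellings can be timed
directly (results will vary by machine; map plus a lambda is the closest
stand-in available today for the proposed filter-function argument):

```python
import timeit

data = list(range(10000))

# Existing spelling: a generator expression feeding any().
t_genexp = timeit.timeit("any(x is None for x in data)",
                         globals={"data": data}, number=200)

# Closest current equivalent of a "filter function" argument.
t_map = timeit.timeit("any(map(lambda x: x is None, data))",
                      globals={"data": data}, number=200)

print(f"genexp: {t_genexp:.4f}s  map+lambda: {t_map:.4f}s")
```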

Regards

Antoine.




From shibturn at gmail.com  Wed Jan 16 19:54:50 2013
From: shibturn at gmail.com (Richard Oudkerk)
Date: Wed, 16 Jan 2013 18:54:50 +0000
Subject: [Python-ideas] The async API of the future
In-Reply-To: <CACac1F9Grgwvz7-6RS902BEEwuP6mYROjmnxYoOKf3aoKQEDfA@mail.gmail.com>
References: <CAP7+vJLzct4p_SHyMHPc6C0aDE=-zbHw-L6F9502xi8zfGpj9w@mail.gmail.com>
	<2CEFACA8-FB96-4C17-9D14-CADEE217F662@molden.no>
	<CAP7+vJKXgmTXA7JnHw0=uGst5P=mxv3HhFMxh71GDGOn4ZFQDQ@mail.gmail.com>
	<k741qk$hat$1@ger.gmane.org>
	<CACac1F9Grgwvz7-6RS902BEEwuP6mYROjmnxYoOKf3aoKQEDfA@mail.gmail.com>
Message-ID: <kd6t1r$vt0$1@ger.gmane.org>

On 16/01/2013 5:59pm, Paul Moore wrote:
 > I've just been looking at this, and from what I can see, am I right in
 > thinking that the IOCP support is *only* for sockets? (I'm not very
 > familiar with socket programming, so I had a bit of difficulty
 > following the code). In particular, it can't be used to register
 > non-socket file objects? From my understanding of the IOCP
 > documentation on MSDN, this is fundamental - IOCP can only be used on
 > HANDLE objects that have been opened with the FILE_FLAG_OVERLAPPED
 > flag, which is not used by "normal" Python IO objects like file
 > handles and pipes, so it will never be possible to poll these objects
 > using IOCP.

Only sockets are supported because it uses WSARecv()/WSASend(), but it 
could very easily be made to use ReadFile()/WriteFile().  Then it would 
work with overlapped pipes (as currently used by multiprocessing) or 
other files opened with FILE_FLAG_OVERLAPPED.

IOCP cannot be used with normal python file objects.  But see

     http://bugs.python.org/issue12939

-- 
Richard



From shibturn at gmail.com  Wed Jan 16 20:18:22 2013
From: shibturn at gmail.com (Richard Oudkerk)
Date: Wed, 16 Jan 2013 19:18:22 +0000
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
Message-ID: <kd6udv$fp2$1@ger.gmane.org>

On 16/01/2013 6:21pm, Guido van Rossum wrote:
> I'm eagerly awaiting Richard's response. AFAIK handles on Windows *are*
> more general than sockets...

I would like to modify subprocess on Windows to use file-like objects 
which wrap overlapped pipe handles.  Then doing async work with 
subprocess would become relatively straightforward, and does not really 
require tulip or IOCP.

-- 
Richard



From p.f.moore at gmail.com  Wed Jan 16 20:35:15 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 16 Jan 2013 19:35:15 +0000
Subject: [Python-ideas] The async API of the future
In-Reply-To: <kd6t1r$vt0$1@ger.gmane.org>
References: <CAP7+vJLzct4p_SHyMHPc6C0aDE=-zbHw-L6F9502xi8zfGpj9w@mail.gmail.com>
	<2CEFACA8-FB96-4C17-9D14-CADEE217F662@molden.no>
	<CAP7+vJKXgmTXA7JnHw0=uGst5P=mxv3HhFMxh71GDGOn4ZFQDQ@mail.gmail.com>
	<k741qk$hat$1@ger.gmane.org>
	<CACac1F9Grgwvz7-6RS902BEEwuP6mYROjmnxYoOKf3aoKQEDfA@mail.gmail.com>
	<kd6t1r$vt0$1@ger.gmane.org>
Message-ID: <CACac1F8LPSkbfYgZwrjsLN2W8vCFP+0-k1wQB8sRi3aMcmm4yg@mail.gmail.com>

On 16 January 2013 18:54, Richard Oudkerk <shibturn at gmail.com> wrote:
> Only sockets are supported because it uses WSARecv()/WSASend(), but it could
> very easily be made to use ReadFile()/WriteFile().  Then it would work with
> overlapped pipes (as currently used by multiprocessing) or other files
> opened with FILE_FLAG_OVERLAPPED.

Oh, cool. I hadn't checked the source to see if multiprocessing opened
its pipes with FILE_FLAG_OVERLAPPED. Good to know it does. And yes, if
normal file objects were opened that way, that would allow those to be
used as well.

Paul


From guido at python.org  Wed Jan 16 21:16:09 2013
From: guido at python.org (Guido van Rossum)
Date: Wed, 16 Jan 2013 12:16:09 -0800
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <kd6udv$fp2$1@ger.gmane.org>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<kd6udv$fp2$1@ger.gmane.org>
Message-ID: <CAP7+vJ+BQBsUTph6hCkZ-uY+-U+Kzo2CCUwi7i+e0NANpo+VgQ@mail.gmail.com>

On Wed, Jan 16, 2013 at 11:18 AM, Richard Oudkerk <shibturn at gmail.com>wrote:

> On 16/01/2013 6:21pm, Guido van Rossum wrote:
>
>> I'm eagerly awaiting Richard's response. AFAIK handles on Windows *are*
>> more general than sockets...
>>
>
> I would like to modify subprocess on Windows to use file-like objects
> which wrap overlapped pipe handles.  Then doing async work with subprocess
> would become relatively straightforward, and does not really require tulip
> or IOCP.


But when you want to use it in the context of an event loop would it still
be *possible* to hook it up to that using a transport or
add_reader/add_writer?

-- 
--Guido van Rossum (python.org/~guido)

From guido at python.org  Wed Jan 16 21:16:58 2013
From: guido at python.org (Guido van Rossum)
Date: Wed, 16 Jan 2013 12:16:58 -0800
Subject: [Python-ideas] question about the Tulip effort
In-Reply-To: <CAF-Rda9NqHnAkgz5uKCpEHpE7xzgo0c4ib5cgvCzx0bzFn7dhQ@mail.gmail.com>
References: <CAF-Rda94Buwv5dQs5d+f0LVZu+Q46uXaL+uc1EowvJwSh2A87Q@mail.gmail.com>
	<CAP7+vJL928zc975Kf8Jix_h3zgjBuN-cJbGF8qN5DEz2bO1PSA@mail.gmail.com>
	<CAF-Rda9NqHnAkgz5uKCpEHpE7xzgo0c4ib5cgvCzx0bzFn7dhQ@mail.gmail.com>
Message-ID: <CAP7+vJ+7Ty7kj8aCpdQUMJU1AXg1T_vUj3LxY1Oy4xBY5+P-8Q@mail.gmail.com>

On Wed, Jan 16, 2013 at 10:26 AM, Eli Bendersky <eliben at gmail.com> wrote:

> Is that the same contributor form I had to sign for CPython a while ago (I
> have the asterisk near my name in the issue tracker)?
>

The same. So you're all set.


>  Anyway, sending patches through Rietveld SGTM.
>

Looking forward to them!

-- 
--Guido van Rossum (python.org/~guido)

From shibturn at gmail.com  Wed Jan 16 21:21:05 2013
From: shibturn at gmail.com (Richard Oudkerk)
Date: Wed, 16 Jan 2013 20:21:05 +0000
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CAP7+vJ+BQBsUTph6hCkZ-uY+-U+Kzo2CCUwi7i+e0NANpo+VgQ@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<kd6udv$fp2$1@ger.gmane.org>
	<CAP7+vJ+BQBsUTph6hCkZ-uY+-U+Kzo2CCUwi7i+e0NANpo+VgQ@mail.gmail.com>
Message-ID: <kd723j$im0$1@ger.gmane.org>

On 16/01/2013 8:16pm, Guido van Rossum wrote:
> But when you want to use it in the context of an event loop would it
> still be *possible* to hook it up to that using a transport or
> add_reader/add_writer?

Assuming you use an IOCP reactor, yes.

-- 
Richard



From greg.ewing at canterbury.ac.nz  Wed Jan 16 21:45:40 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 17 Jan 2013 09:45:40 +1300
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
Message-ID: <50F71174.7070803@canterbury.ac.nz>

Guido van Rossum wrote:
> If the user is okay with solving the problem only for their particular 
> platform and event loop implementation they don't need to add anything 
> to the event loop.

In this case, shouldn't it be sufficient for tulip to provide
a way of wrapping pipes, whatever they may look like on the
platform? I don't see why a Transport specific to subprocesses
should be required.

-- 
Greg


From guido at python.org  Wed Jan 16 22:16:42 2013
From: guido at python.org (Guido van Rossum)
Date: Wed, 16 Jan 2013 13:16:42 -0800
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <50F71174.7070803@canterbury.ac.nz>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<50F71174.7070803@canterbury.ac.nz>
Message-ID: <CAP7+vJKOZpPuW8OqsFNiA5v47_HzF2hHg5VmCwsH7zaHrgmf+w@mail.gmail.com>

On Wed, Jan 16, 2013 at 12:45 PM, Greg Ewing <greg.ewing at canterbury.ac.nz>wrote:

> Guido van Rossum wrote:
>
>> If the user is okay with solving the problem only for their particular
>> platform and event loop implementation they don't need to add anything to
>> the event loop.
>>
>
> In this case, shouldn't it be sufficient for tulip to provide
> a way of wrapping pipes, whatever they may look like on the
> platform? I don't see why a Transport specific to subprocesses
> should be required.


Tulip on UNIX already wraps pipes (and ptys, and certain other things),
since the add_reader() API takes any file descriptor (though it makes no
sense for disk files because those are always considered readable).

The issue is that on other platforms (read: Windows) you have to do
something completely different, and hook it up to the native (IOCP) async
event loop differently. The Transport/Protocol abstraction, however, would be
completely appropriate in both cases (or a slightly modified version
that handles stdout/stderr separately).

So, just like the subprocess module contains two completely disjoint
implementations for UNIX and Windows, implementing the same API, PEP 3156
could also have a standard API for running a subprocess connected with
async streams connected to stdin, stdout, stderr, backed by different
implementations.
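On UNIX, the pipe-wrapping half of this is easy to demonstrate with the
stdlib: the selectors module stands in for the event loop's poller below. This
is an illustration, not Tulip's code, and a select-based loop cannot do the
same with pipes on Windows, which is exactly the gap under discussion.

```python
import os
import selectors

sel = selectors.DefaultSelector()
r, w = os.pipe()                        # any file descriptor, not just sockets
received = []

def on_readable(fd):
    received.append(os.read(fd, 1024))

# The add_reader() idea: register a callback against the fd...
sel.register(r, selectors.EVENT_READ, on_readable)
os.write(w, b'hello from the pipe')

# ...and invoke it when the fd becomes readable.
for key, _events in sel.select(timeout=1.0):
    key.data(key.fileobj)

print(received)                         # [b'hello from the pipe']
os.close(r)
os.close(w)
```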

-- 
--Guido van Rossum (python.org/~guido)

From p.f.moore at gmail.com  Thu Jan 17 13:23:10 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 17 Jan 2013 12:23:10 +0000
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
Message-ID: <CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>

On 16 January 2013 18:21, Guido van Rossum <guido at python.org> wrote:
>> OK, off to do a lot of spec reading and then some coding. With luck,
>> you'll be patient with dumb questions from me on the way :-)
>
> I will be!

OK, I'm reading the PEP through now. I'm happy with the basics of the
event loop, and it seems fine to me. When I reached create_transport,
I had to skip ahead to the definitions of transport and protocol, as
create_transport makes no sense if you don't know about those. Once
I've read that, though, the whole transport/protocol mechanism seems
to make reasonable sense to me, although the host and port arguments
to create_transport are clearly irrelevant to the case of a transport
managing a process as a data source. So (a) I see why you say I'd need
a new transport creation method, but (b) it strikes me that something
more general that covered both cases (and any others that may come up
later) would be better.

On the other hand, given the existence of create_transport, I'm now
struggling to understand why a user would ever use
add_reader/add_writer rather than using a transport/protocol. And if
they do have a reason to do so, why does a similar reason not apply to
having an add_pipe type of method for waiting on (subprocess) pipes?

In general, it still feels to me like the socket use case is being
treated as "special", and other data sources and sinks (subprocesses
being my use case, but I'm sure others exist) are either second-class
or require a whole set of their own specialised methods, which isn't
practical.

As a strawman type of argument in favour of extensibility, consider a
very specialist user with a hardware device that sends input via (say)
a serial port. I can easily imagine that user wanting to plug his
device data into the Python event loop. As this is a very specialised
area, I wouldn't expect the core code to be able to help, but I would
expect him to be able to write code that plugs into the standard event
loop seamlessly. Ideally, I'd like to use the subprocess case as a
proof that this is practical.

Does that make sense?
Paul.


From ica at iki.fi  Thu Jan 17 13:44:03 2013
From: ica at iki.fi (Ilkka Pelkonen)
Date: Thu, 17 Jan 2013 14:44:03 +0200
Subject: [Python-ideas] Fwd: Boolean behavior of None
In-Reply-To: <CADRrsLq=h+DWveRJpNsurPfNPdrPbY1+6gyuiMRKEbq4YMjd-Q@mail.gmail.com>
References: <CADRrsLq=h+DWveRJpNsurPfNPdrPbY1+6gyuiMRKEbq4YMjd-Q@mail.gmail.com>
Message-ID: <CADRrsLpq6Op_teEGaDvXjhTdbBfLDQZGyk9rH_9aio=sPYgA5w@mail.gmail.com>

Hi all,
I ran into an issue in expression evaluation with Python for Windows 2.7.3.
Consider the following code:

expected_result = (expected_string != 'TRUE') # Boolean
element = find_element() # Can return None or an instance of Element
flag = (element and element.is_visible())
if flag == expected_result:
..# Ok
..return
# Otherwise perform some failure related stuff.

This code does not work as intended. What happens on the 'flag' assignment
row is that if 'element' is None, the expression returns None, not False.
This makes the if comparison fail when expected_result is False, since
boolean False is not None.

To me as a primarily C++ programmer it seems there could be two different
changes here: either change the behavior of the 'and' expression, forcing
it to return a Boolean even if the latter part is not evaluated, and/or make
the comparison "False == None" return True. Although potentially complex,
I'd go for the first approach myself. It seems more logical to me that
False != None than that an 'and' expression returns a non-Boolean. Also, the
latter change might require people to change their code, while the former
should not require any modifications.

This behavior probably results in lots of errors when people like me, used
to more traditional languages, take on Python in a serious manner. I like
the concept 'pythonic', and am trying to apply it to practice like above.

Hoping to hear your thoughts,
Regards,

Ilkka Pelkonen

From phd at phdru.name  Thu Jan 17 13:51:05 2013
From: phd at phdru.name (Oleg Broytman)
Date: Thu, 17 Jan 2013 16:51:05 +0400
Subject: [Python-ideas] Fwd: Boolean behavior of None
In-Reply-To: <CADRrsLpq6Op_teEGaDvXjhTdbBfLDQZGyk9rH_9aio=sPYgA5w@mail.gmail.com>
References: <CADRrsLq=h+DWveRJpNsurPfNPdrPbY1+6gyuiMRKEbq4YMjd-Q@mail.gmail.com>
	<CADRrsLpq6Op_teEGaDvXjhTdbBfLDQZGyk9rH_9aio=sPYgA5w@mail.gmail.com>
Message-ID: <20130117125105.GA2609@iskra.aviel.ru>

On Thu, Jan 17, 2013 at 02:44:03PM +0200, Ilkka Pelkonen <ica at iki.fi> wrote:
> expected_result = (expected_string != 'TRUE') # Boolean
> element = find_element() # Can return None or an instance of Element
> flag = (element and element.is_visible())
> if flag == expected_result:
> ..# Ok
> ..return
> # Otherwise perform some failure related stuff.
> 
> This code does not work. What happens on the 'flag' assignment row, is that
> if 'element' is None, the expression returns None, not False. This makes
> the if comparison to fail if expected_result is False, since boolean False
> is not None.

   No need to change the language. Just do

flag = bool(element and element.is_visible())

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            phd at phdru.name
           Programmers don't die, they just GOSUB without RETURN.


From jsbueno at python.org.br  Thu Jan 17 13:58:11 2013
From: jsbueno at python.org.br (Joao S. O. Bueno)
Date: Thu, 17 Jan 2013 10:58:11 -0200
Subject: [Python-ideas] Fwd: Boolean behavior of None
In-Reply-To: <CADRrsLpq6Op_teEGaDvXjhTdbBfLDQZGyk9rH_9aio=sPYgA5w@mail.gmail.com>
References: <CADRrsLq=h+DWveRJpNsurPfNPdrPbY1+6gyuiMRKEbq4YMjd-Q@mail.gmail.com>
	<CADRrsLpq6Op_teEGaDvXjhTdbBfLDQZGyk9rH_9aio=sPYgA5w@mail.gmail.com>
Message-ID: <CAH0mxTQA_7DLPLRjAczYQbYZ1_fEkx+=TtmVXvpyi9DqpzY--A@mail.gmail.com>

On 17 January 2013 10:44, Ilkka Pelkonen <ica at iki.fi> wrote:
> Hi all,
> I ran into an issue in expression evaluation with Python for Windows 2.7.3.
> Consider the following code:
>
> expected_result = (expected_string != 'TRUE') # Boolean
> element = find_element() # Can return None or an instance of Element
> flag = (element and element.is_visible())
> if flag == expected_result:
> ..# Ok
> ..return
> # Otherwise perform some failure related stuff.
>
> This code does not work. What happens on the 'flag' assignment row, is that
> if 'element' is None, the expression returns None, not False. This makes the
> if comparison to fail if expected_result is False, since boolean False is
> not None.
>
> To me as a primarily C++ programmer it seems there could be two different
> changes here, either change the behavior of the 'and' expression, forcing it
> to return Boolean even if the latter part is not evaluated,
> and/or make the
> comparison "False == None" return True.




Hi Ilkka,

My personal suggestion - rewrite your code to read:

flag = bool(element and element.is_visible())

instead.

That way you don't have to argue for changing a 20-year-old behavior in
a language with billions of lines of code in the wild that must be kept
compatible, at the expense of rethinking your expressions.

Nor do you have to wait for the next major "4.0" release of Python to
be able to write your code.
  js
 -><-


From ilkka.pelkonen at iki.fi  Thu Jan 17 14:10:45 2013
From: ilkka.pelkonen at iki.fi (Ilkka Pelkonen)
Date: Thu, 17 Jan 2013 15:10:45 +0200
Subject: [Python-ideas] Fwd: Boolean behavior of None
In-Reply-To: <20130117125105.GA2609@iskra.aviel.ru>
References: <CADRrsLq=h+DWveRJpNsurPfNPdrPbY1+6gyuiMRKEbq4YMjd-Q@mail.gmail.com>
	<CADRrsLpq6Op_teEGaDvXjhTdbBfLDQZGyk9rH_9aio=sPYgA5w@mail.gmail.com>
	<20130117125105.GA2609@iskra.aviel.ru>
Message-ID: <CADRrsLp0DyQkdgy7Z6HWd-MzU1yjF4KorHnAoO-tPqzzV_YdpA@mail.gmail.com>

Hi Oleg, others,
It's not that it can't be done, just that it does something you don't
expect. I've been professionally working with C++ for nine years in
large-scale Windows systems, and I do expect a boolean expression to
return a boolean value.

Or, can you show me an example of how a developer would benefit from the
current behavior? Any operator traditionally considered boolean will do.

Regards,
Ilkka


On Thu, Jan 17, 2013 at 2:51 PM, Oleg Broytman <phd at phdru.name> wrote:

> On Thu, Jan 17, 2013 at 02:44:03PM +0200, Ilkka Pelkonen <ica at iki.fi>
> wrote:
> > expected_result = (expected_string != 'TRUE') # Boolean
> > element = find_element() # Can return None or an instance of Element
> > flag = (element and element.is_visible())
> > if flag == expected_result:
> > ..# Ok
> > ..return
> > # Otherwise perform some failure related stuff.
> >
> > This code does not work. What happens on the 'flag' assignment row, is
> that
> > if 'element' is None, the expression returns None, not False. This makes
> > the if comparison to fail if expected_result is False, since boolean
> False
> > is not None.
>
>    No need to change the language. Just do
>
> flag = bool(element and element.is_visible())
>
> Oleg.
> --
>      Oleg Broytman            http://phdru.name/            phd at phdru.name
>            Programmers don't die, they just GOSUB without RETURN.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>

From phd at phdru.name  Thu Jan 17 14:23:11 2013
From: phd at phdru.name (Oleg Broytman)
Date: Thu, 17 Jan 2013 17:23:11 +0400
Subject: [Python-ideas] Fwd: Boolean behavior of None
In-Reply-To: <CADRrsLp0DyQkdgy7Z6HWd-MzU1yjF4KorHnAoO-tPqzzV_YdpA@mail.gmail.com>
References: <CADRrsLq=h+DWveRJpNsurPfNPdrPbY1+6gyuiMRKEbq4YMjd-Q@mail.gmail.com>
	<CADRrsLpq6Op_teEGaDvXjhTdbBfLDQZGyk9rH_9aio=sPYgA5w@mail.gmail.com>
	<20130117125105.GA2609@iskra.aviel.ru>
	<CADRrsLp0DyQkdgy7Z6HWd-MzU1yjF4KorHnAoO-tPqzzV_YdpA@mail.gmail.com>
Message-ID: <20130117132311.GA5971@iskra.aviel.ru>

On Thu, Jan 17, 2013 at 03:10:45PM +0200, Ilkka Pelkonen <ilkka.pelkonen at iki.fi> wrote:
> It's not that it can't be done, just that it does something you don't
> expect. I've been professionally working with C++ for nine years in
> large-scale Windows systems, and I do expect a boolean expression return a
> boolean value.

   It does something Python developers expect. It's a well-known
behaviour and there are many programs that rely on that behaviour.

> Or, can you show me an example how the developer would benefit of the
> current behavior? Any operator traditionally considered as boolean will do.

address = user and user.address
if address is None:
    raise ValueError("Unknown address")

   In this example, neither user nor user.address is allowed to be None.
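For the record, a quick interactive check makes the semantics concrete:
'and' and 'or' return one of their operands, never a coerced bool.

```python
# 'and' returns the first falsy operand, or else the last operand.
assert (None and "x") is None
assert (1 and "x") == "x"

# 'or' returns the first truthy operand, or else the last operand.
assert ("" or "default") == "default"
assert (0 or None) is None
```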

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            phd at phdru.name
           Programmers don't die, they just GOSUB without RETURN.


From jsbueno at python.org.br  Thu Jan 17 14:23:47 2013
From: jsbueno at python.org.br (Joao S. O. Bueno)
Date: Thu, 17 Jan 2013 11:23:47 -0200
Subject: [Python-ideas] Fwd: Boolean behavior of None
In-Reply-To: <CADRrsLp0DyQkdgy7Z6HWd-MzU1yjF4KorHnAoO-tPqzzV_YdpA@mail.gmail.com>
References: <CADRrsLq=h+DWveRJpNsurPfNPdrPbY1+6gyuiMRKEbq4YMjd-Q@mail.gmail.com>
	<CADRrsLpq6Op_teEGaDvXjhTdbBfLDQZGyk9rH_9aio=sPYgA5w@mail.gmail.com>
	<20130117125105.GA2609@iskra.aviel.ru>
	<CADRrsLp0DyQkdgy7Z6HWd-MzU1yjF4KorHnAoO-tPqzzV_YdpA@mail.gmail.com>
Message-ID: <CAH0mxTQ7oy9E+Tn8-2ofCuECLt=dgGH7saYTS+=T7yBky7v10g@mail.gmail.com>

On 17 January 2013 11:10, Ilkka Pelkonen <ilkka.pelkonen at iki.fi> wrote:
> Hi Oleg, others,
> It's not that it can't be done, just that it does something you don't
> expect. I've been professionally working with C++ for nine years in
> large-scale Windows systems, and I do expect a boolean expression return a
> boolean value.
>
> Or, can you show me an example how the developer would benefit of the
> current behavior? Any operator traditionally considered as boolean will do.


Ilkka,

Python is a dynamically typed language. As such, there is no strict type
checking for most operations.

The behavior of boolean operations in Python 2.x is well defined and
described here:

http://docs.python.org/2/reference/expressions.html#boolean-operations

If you are testing for the "truthfulness" of a given object, using "=="
for that, as in your "if flag == expected_result:", is definitely a
non-recommended practice. Which objects have a False or True value has
always been well defined in Python, and that definition follows common
sense closely on what should be False.

No one would expect "None" to be True. The behavior of yielding the
first part of the expression in a failed "and" operation is not unique
to Python and, AFAIK, was inspired by C -- and there are tons of code
that rely directly on this behavior. (Even though I'd agree that a lot
of this code, emulating the ternary operator in the time before it
became available in Python 2.5, is very poorly written.)


> Regards,
> Ilkka
>


From solipsis at pitrou.net  Thu Jan 17 15:32:43 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 17 Jan 2013 15:32:43 +0100
Subject: [Python-ideas] Boolean behavior of None
References: <CADRrsLq=h+DWveRJpNsurPfNPdrPbY1+6gyuiMRKEbq4YMjd-Q@mail.gmail.com>
	<CADRrsLpq6Op_teEGaDvXjhTdbBfLDQZGyk9rH_9aio=sPYgA5w@mail.gmail.com>
	<20130117125105.GA2609@iskra.aviel.ru>
	<CADRrsLp0DyQkdgy7Z6HWd-MzU1yjF4KorHnAoO-tPqzzV_YdpA@mail.gmail.com>
Message-ID: <20130117153243.72fd7508@pitrou.net>

On Thu, 17 Jan 2013 15:10:45 +0200,
Ilkka Pelkonen <ilkka.pelkonen at iki.fi> wrote:
> Hi Oleg, others,
> It's not that it can't be done, just that it does something you don't
> expect. I've been professionally working with C++ for nine years in
> large-scale Windows systems, and I do expect a boolean expression
> return a boolean value.

"and" is not a boolean operator, it is a shortcutting control flow
operator.
Basically, what you are looking for is:

    flag = element.is_visible() if element else False

(or, more explicitly:
    flag = element.is_visible() if element is not None else False
)
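A minimal sketch (with a stand-in Element class, purely illustrative)
shows the difference between the two spellings:

```python
class Element:
    def is_visible(self):
        return True

element = None  # find_element() returned nothing

# 'and' propagates the falsy left operand unchanged:
flag_and = element and element.is_visible()
assert flag_and is None

# the conditional expression yields an actual bool here:
flag = element.is_visible() if element is not None else False
assert flag is False
```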

Regards

Antoine.




From p.f.moore at gmail.com  Thu Jan 17 15:35:13 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 17 Jan 2013 14:35:13 +0000
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
Message-ID: <CACac1F9-W6VdTU=+=u488YbopkSrG_dE3Y-uNo9hzqS72pesHQ@mail.gmail.com>

On 17 January 2013 12:23, Paul Moore <p.f.moore at gmail.com> wrote:
> In general, it still feels to me like the socket use case is being
> treated as "special", and other data sources and sinks (subprocesses
> being my use case, but I'm sure others exist) are either second-class
> or require a whole set of their own specialised methods, which isn't
> practical.

Thinking about this some more. The key point is that for any event
loop there can only be one "source of events" in terms of the thing
that the event loop checks when there are no pending tasks. So the
event loop is roughly:

while True:
    process_ready_queue()
    new_events = block_on_event_source(src, timeout=N)
    add_to_ready_queue(new_events)
    add_timed_events_to_ready_queue()

The source has to be a unique object, as there's an OS-level wait in
there, and you can't do two of them at once.
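To make that concrete, here is a toy sketch of such a loop with a single
pluggable event source (all names are mine, not Tulip's):

```python
import heapq
import itertools
import time
from collections import deque

class ToyLoop:
    def __init__(self, source):
        self.source = source      # the single OS-level event source
        self.ready = deque()      # callbacks ready to run now
        self.timers = []          # heap of (deadline, seq, callback)
        self._seq = itertools.count()  # tie-breaker for equal deadlines

    def call_soon(self, cb):
        self.ready.append(cb)

    def call_later(self, delay, cb):
        heapq.heappush(self.timers,
                       (time.monotonic() + delay, next(self._seq), cb))

    def run_once(self, timeout=0.0):
        while self.ready:                      # process_ready_queue()
            self.ready.popleft()()
        for cb in self.source.poll(timeout):   # block_on_event_source()
            self.ready.append(cb)              # add_to_ready_queue()
        now = time.monotonic()                 # add_timed_events_to_ready_queue()
        while self.timers and self.timers[0][0] <= now:
            self.ready.append(heapq.heappop(self.timers)[2])
```

Here source.poll() stands in for the OS-level select()/poll()/IOCP wait;
only one such object can block at a time, which is the point above.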

As things stand, methods like add_reader on the event loop object
should really be methods on the event source object (and indeed,
that's more or less what Tulip does internally). Would it not make
more sense to explicitly expose the event source? This is (I guess)
what the section "Choosing an Event Loop Implementation" in the PEP is
about. But if the event source is a user-visible object, methods like
add_reader would no longer be optional event loop methods, but rather
they would be methods of the event source (but only for those event
sources for which they make sense).

The point here is that there's a lot of event loop machinery (ready
queue, timed events, run methods) that is independent of the precise
means by which you poll the OS to ask "has anything interesting
happened?" Abstracting that machinery out would seem to me to make the
design cleaner and more understandable.

Other benefits - our hypothetical person with a serial port device can
build his own event source and plug it into the event loop directly.
Or someone could offer a multiplexer that combines two separate
sources by running them in different threads and merging the output on
a queue (that may be YAGNI, though).

This is really just something to think about while I'm trying to build
a Linux development environment so that I can do a Unix proof of
concept. Once I get started on that, I'll think about the
protocol/transport stuff.

Paul


From ned at nedbatchelder.com  Thu Jan 17 15:54:28 2013
From: ned at nedbatchelder.com (Ned Batchelder)
Date: Thu, 17 Jan 2013 09:54:28 -0500
Subject: [Python-ideas] Fwd: Boolean behavior of None
In-Reply-To: <CADRrsLp0DyQkdgy7Z6HWd-MzU1yjF4KorHnAoO-tPqzzV_YdpA@mail.gmail.com>
References: <CADRrsLq=h+DWveRJpNsurPfNPdrPbY1+6gyuiMRKEbq4YMjd-Q@mail.gmail.com>
	<CADRrsLpq6Op_teEGaDvXjhTdbBfLDQZGyk9rH_9aio=sPYgA5w@mail.gmail.com>
	<20130117125105.GA2609@iskra.aviel.ru>
	<CADRrsLp0DyQkdgy7Z6HWd-MzU1yjF4KorHnAoO-tPqzzV_YdpA@mail.gmail.com>
Message-ID: <50F810A4.8040802@nedbatchelder.com>

On 1/17/2013 8:10 AM, Ilkka Pelkonen wrote:
> Hi Oleg, others,
> It's not that it can't be done, just that it does something you don't 
> expect. I've been professionally working with C++ for nine years in 
> large-scale Windows systems, and I do expect a boolean expression 
> return a boolean value.
>
> Or, can you show me an example how the developer would benefit of the 
> current behavior? Any operator traditionally considered as boolean 
> will do.
>
> Regards,
> Ilkka

Ilkka, welcome to the Python community.  Python is a wonderfully 
expressive language once you learn its subtleties.

Python and C++ are different.  If they weren't, we'd only have one 
language, not two.  The short-circuiting operations "and" and "or" 
behave as they do for a reason.  As an example, a common way to deal 
with default values:

     def accumulate(value, to=None):
         to = to or []
         to.append(value)
         # Forget whether this is a good function or not..
         return to

If "or" always returned a boolean, as I'm assuming you'd prefer, then 
we'd have a much clumsier time defaulting values like this.
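Spelled out (re-typed from the example above, with the idiom's usual
caveat noted in a comment):

```python
def accumulate(value, to=None):
    # 'or' hands back a fresh list whenever 'to' is falsy.
    # Caveat: a caller-supplied *empty* list is falsy too and would
    # also be replaced; 'to if to is not None else []' avoids that
    # when it matters.
    to = to or []
    to.append(value)
    return to

assert accumulate(1) == [1]
assert accumulate(2) == [2]            # no shared default list
assert accumulate(3, [1, 2]) == [1, 2, 3]
```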

--Ned.

From grosser.meister.morti at gmx.net  Thu Jan 17 17:07:13 2013
From: grosser.meister.morti at gmx.net (Mathias Panzenböck)
Date: Thu, 17 Jan 2013 17:07:13 +0100
Subject: [Python-ideas] Fwd: Boolean behavior of None
In-Reply-To: <CADRrsLpq6Op_teEGaDvXjhTdbBfLDQZGyk9rH_9aio=sPYgA5w@mail.gmail.com>
References: <CADRrsLq=h+DWveRJpNsurPfNPdrPbY1+6gyuiMRKEbq4YMjd-Q@mail.gmail.com>
	<CADRrsLpq6Op_teEGaDvXjhTdbBfLDQZGyk9rH_9aio=sPYgA5w@mail.gmail.com>
Message-ID: <50F821B1.9070905@gmx.net>

This change would break a lot of existing code and would make Python awkwardly stand out from all
other modern dynamically typed languages (e.g. Ruby and JavaScript). You often write things like
this:

	def foo(bar=None):
		bar = bar or []
		...

Or:

	obj = obj and obj.property

The proposed change would needlessly complicate these things and break
existing code. Forcing a bool type really doesn't require much code
(bool(expr)) and is good practice anyway.

On 01/17/2013 01:44 PM, Ilkka Pelkonen wrote:
> Hi all,
> I ran into an issue in expression evaluation with Python for Windows 2.7.3. Consider the following code:
>
> expected_result = (expected_string != 'TRUE') # Boolean
> element = find_element() # Can return None or an instance of Element
> flag = (element and element.is_visible())
> if flag == expected_result:
> ..# Ok
> ..return
> # Otherwise perform some failure related stuff.
>
> This code does not work. What happens on the 'flag' assignment row, is that if 'element' is None, the expression returns None, not False. This makes the if comparison to fail if expected_result is False, since boolean False is not None.
>
> To me as a primarily C++ programmer it seems there could be two different changes here, either change the behavior of the 'and' expression, forcing it to return Boolean even if the latter part is not evaluated, and/or make the comparison "False == None" return True. Although potentially complex, I'd
> myself go for the first approach. It seems to me more logical that False != None than an 'and' expression returning non-boolean. Also the latter change might require people change their code, while the former should not require any modifications.
>
> This behavior probably results in lots of errors when people like me, used to more traditional languages, take on Python in a serious manner. I like the concept 'pythonic', and am trying to apply it to practice like above.
>
> Hoping to hear your thoughts,
> Regards,
>
> Ilkka Pelkonen
>
>
>


From ilkka.pelkonen at iki.fi  Thu Jan 17 18:03:12 2013
From: ilkka.pelkonen at iki.fi (Ilkka Pelkonen)
Date: Thu, 17 Jan 2013 19:03:12 +0200
Subject: [Python-ideas] Fwd: Boolean behavior of None
In-Reply-To: <50F821B1.9070905@gmx.net>
References: <CADRrsLq=h+DWveRJpNsurPfNPdrPbY1+6gyuiMRKEbq4YMjd-Q@mail.gmail.com>
	<CADRrsLpq6Op_teEGaDvXjhTdbBfLDQZGyk9rH_9aio=sPYgA5w@mail.gmail.com>
	<50F821B1.9070905@gmx.net>
Message-ID: <CADRrsLpacJYS10Fj9S7v1sgFb1iRZ-+24GFFs-jwFzy=UE7+Kw@mail.gmail.com>

Thank you all. It was just that when I started with Python, everything
worked just as I expected, and I had found ways to do everything I needed
until today, so when I came across this, it appeared to me to be a clear
bug in the language/interpreter. Casting to bool is indeed a good solution
and practice, and I now agree that there's no point in changing the
language -- as Antoine said, we're talking about control flow operators
here, not exactly boolean ones. (This might be a good addition to the
documentation.)

Thank you Ned for the warm welcome and everyone for your input. I hope to
be able to contribute in the future. :)

Best Regards,
Ilkka


On Thu, Jan 17, 2013 at 6:07 PM, Mathias Panzenböck <
grosser.meister.morti at gmx.net> wrote:

> This change would break a lot of existing code and would make Python
> awkwardly stand out from all
> other modern dynamically typed languages (e.g. Ruby and JavaScript). You
> often write things like
> this:
>
>         def foo(bar=None):
>                 bar = bar or []
>                 ...
>
> Or:
>
>         obj = obj and obj.property
>
> The proposed change would needlessly complicate these things and break
> existing code. Forcing a
> bool type really doesn't require that much code (bool(expr)) and is good
> practise anyway.
>
>
> On 01/17/2013 01:44 PM, Ilkka Pelkonen wrote:
>
>> Hi all,
>> I ran into an issue in expression evaluation with Python for Windows
>> 2.7.3. Consider the following code:
>>
>> expected_result = (expected_string != 'TRUE') # Boolean
>> element = find_element() # Can return None or an instance of Element
>> flag = (element and element.is_visible())
>> if flag == expected_result:
>> ..# Ok
>> ..return
>> # Otherwise perform some failure related stuff.
>>
>> This code does not work. What happens on the 'flag' assignment row, is
>> that if 'element' is None, the expression returns None, not False. This
>> makes the if comparison to fail if expected_result is False, since boolean
>> False is not None.
>>
>> To me as a primarily C++ programmer it seems there could be two different
>> changes here, either change the behavior of the 'and' expression, forcing
>> it to return Boolean even if the latter part is not evaluated, and/or make
>> the comparison "False == None" return True. Although potentially complex,
>> I'd
>> myself go for the first approach. It seems to me more logical that False
>> != None than an 'and' expression returning non-boolean. Also the latter
>> change might require people change their code, while the former should not
>> require any modifications.
>>
>> This behavior probably results in lots of errors when people like me,
>> used to more traditional languages, take on Python in a serious manner. I
>> like the concept 'pythonic', and am trying to apply it to practice like
>> above.
>>
>> Hoping to hear your thoughts,
>> Regards,
>>
>> Ilkka Pelkonen
>>
>>
>>
>>  _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>

From ilkka.pelkonen at iki.fi  Thu Jan 17 19:43:01 2013
From: ilkka.pelkonen at iki.fi (Ilkka Pelkonen)
Date: Thu, 17 Jan 2013 20:43:01 +0200
Subject: [Python-ideas] Fwd: Boolean behavior of None
In-Reply-To: <CADRrsLpacJYS10Fj9S7v1sgFb1iRZ-+24GFFs-jwFzy=UE7+Kw@mail.gmail.com>
References: <CADRrsLq=h+DWveRJpNsurPfNPdrPbY1+6gyuiMRKEbq4YMjd-Q@mail.gmail.com>
	<CADRrsLpq6Op_teEGaDvXjhTdbBfLDQZGyk9rH_9aio=sPYgA5w@mail.gmail.com>
	<50F821B1.9070905@gmx.net>
	<CADRrsLpacJYS10Fj9S7v1sgFb1iRZ-+24GFFs-jwFzy=UE7+Kw@mail.gmail.com>
Message-ID: <CADRrsLp1En==Zbv+Ybt1w97SR4EDMpAOriRiLTPCuxmVQg2B6Q@mail.gmail.com>

To sum this up: I also thought some more about the None vs. False issue and
found a use case where the separation brings a benefit: a function could
normally return True or False, and None in a special case. Because of the
separation, the developer can handle all three cases appropriately.

Too little thinking, too much action. I wonder why I didn't google it this
time; the issue is all over the net, like almost any question you can
imagine asking, at least in the case of Python. :)

Sorry for the trouble. I'll stay around, keeping a somewhat lower profile. :)

Regards again,
Ilkka



On Thu, Jan 17, 2013 at 7:03 PM, Ilkka Pelkonen <ilkka.pelkonen at iki.fi>wrote:

> Thank you all. It was just that when I started with Python, everything
> worked right like I expected, and I found the ways to do anything I've
> needed all the way until today, so when I came across with this, it
> appeared to me a clear bug in the language/interpreter. Casting to bool is
> indeed a good solution and practice, and I do now agree that there's no
> point in changing the language - like Antoine said, we're talking control
> flow operators here, not exactly boolean. (This might be a good addition to
> the documentation.)
>
> Thank you Ned for the warm welcome and everyone for your input. I hope to
> be able to contribute in the future. :)
>
> Best Regards,
> Ilkka
>
>
> On Thu, Jan 17, 2013 at 6:07 PM, Mathias Panzenböck <
> grosser.meister.morti at gmx.net> wrote:
>
>> This change would break a lot of existing code and would make Python
>> awkwardly stand out from all
>> other modern dynamically typed languages (e.g. Ruby and JavaScript). You
>> often write things like
>> this:
>>
>>         def foo(bar=None):
>>                 bar = bar or []
>>                 ...
>>
>> Or:
>>
>>         obj = obj and obj.property
>>
>> The proposed change would needlessly complicate these things and break
>> existing code. Forcing a
>> bool type really doesn't require that much code (bool(expr)) and is good
>> practise anyway.
>>
>>
>> On 01/17/2013 01:44 PM, Ilkka Pelkonen wrote:
>>
>>> Hi all,
>>> I ran into an issue in expression evaluation with Python for Windows
>>> 2.7.3. Consider the following code:
>>>
>>> expected_result = (expected_string != 'TRUE') # Boolean
>>> element = find_element() # Can return None or an instance of Element
>>> flag = (element and element.is_visible())
>>> if flag == expected_result:
>>> ..# Ok
>>> ..return
>>> # Otherwise perform some failure related stuff.
>>>
>>> This code does not work. What happens on the 'flag' assignment row, is
>>> that if 'element' is None, the expression returns None, not False. This
>>> makes the if comparison to fail if expected_result is False, since boolean
>>> False is not None.
>>>
>>> To me as a primarily C++ programmer it seems there could be two
>>> different changes here, either change the behavior of the 'and' expression,
>>> forcing it to return Boolean even if the latter part is not evaluated,
>>> and/or make the comparison "False == None" return True. Although
>>> potentially complex, I'd
>>> myself go for the first approach. It seems to me more logical that False
>>> != None than an 'and' expression returning non-boolean. Also the latter
>>> change might require people change their code, while the former should not
>>> require any modifications.
>>>
>>> This behavior probably results in lots of errors when people like me,
>>> used to more traditional languages, take on Python in a serious manner. I
>>> like the concept 'pythonic', and am trying to apply it to practice like
>>> above.
>>>
>>> Hoping to hear your thoughts,
>>> Regards,
>>>
>>> Ilkka Pelkonen
>>>
>>>
>>>
>>>  _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> http://mail.python.org/mailman/listinfo/python-ideas
>>
>
>

From guido at python.org  Thu Jan 17 20:10:57 2013
From: guido at python.org (Guido van Rossum)
Date: Thu, 17 Jan 2013 11:10:57 -0800
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CACac1F9-W6VdTU=+=u488YbopkSrG_dE3Y-uNo9hzqS72pesHQ@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<CACac1F9-W6VdTU=+=u488YbopkSrG_dE3Y-uNo9hzqS72pesHQ@mail.gmail.com>
Message-ID: <CAP7+vJJE3AQNaGmphemk9TtaVLM+DgeMbHqbA4wt5Ma_qSQgWA@mail.gmail.com>

(I'm responding to two separate messages in one response.)

On Thu, Jan 17, 2013 at 4:23 AM, Paul Moore <p.f.moore at gmail.com> wrote:
> OK, I'm reading the PEP through now. I'm happy with the basics of the
> event loop, and it seems fine to me. When I reached create_transport,
> I had to skip ahead to the definitions of transport and protocol, as
> create_transport makes no sense if you don't know about those.

Whoops, I should fix the order in the PEP, or at least insert forward
references.

> Once
> I've read that, though, the whole transport/protocol mechanism seems
> to make reasonable sense to me. Although the host and port arguments
> to create_transport are clearly irrelevant to the case of a transport
> managing a process as a data source. So (a) I see why you say I'd need
> a new transport creation method, but (b) it strikes me that something
> more general that covered both cases (and any others that may come up
> later) would be better.

This is why there is a TBD item suggesting to rename
create_transport() to create_connection() -- this method is for
creating the most common type of transport only, i.e. one that
connects a client to a server given by host and port.

> On the other hand, given the existence of create_transport, I'm now
> struggling to understand why a user would ever use
> add_reader/add_writer rather than using a transport/protocol. And if
> they do have a reason to do so, why does a similar reason not apply to
> having an add_pipe type of method for waiting on (subprocess) pipes?

add_reader and friends exist for the benefit of Transport
implementations. The PEP even says that not all event loops need to
implement these (though on UNIXy systems it is better if they do), and
I am considering removing or weakening this language.

Because on UNIX pipes are just file descriptors, and work fine with
select()/poll()/etc., there is no need for add_pipe() (assuming that
API would take an existing pipe file descriptor and a callback), since
add_reader() will do the right thing. (Or add_writer() for the other
end.)
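A POSIX-only sketch of that, using Python's selectors module (which grew
out of this work); the register() call is roughly what an add_reader()
implementation does internally:

```python
import os
import selectors

# A pipe read end is just a file descriptor on UNIX, so the same
# readiness machinery used for sockets applies to it.
sel = selectors.DefaultSelector()
r, w = os.pipe()
os.set_blocking(r, False)

def on_readable(fd):
    return os.read(fd, 1024)

sel.register(r, selectors.EVENT_READ, data=on_readable)

os.write(w, b"hello")
for key, mask in sel.select(timeout=1.0):
    chunk = key.data(key.fd)      # invoke the registered callback
    assert chunk == b"hello"

sel.close()
os.close(r)
os.close(w)
```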

> In general, it still feels to me like the socket use case is being
> treated as "special", and other data sources and sinks (subprocesses
> being my use case, but I'm sure others exist) are either second-class
> or require a whole set of their own specialised methods, which isn't
> practical.

Well, sockets are treated special because on Windows they *are*
special. At least the select() system call only works for sockets.
IOCP supports other types of unusual handles, but the ways to create
handles you can use with it are mostly custom.

Basically, if you want to write code that works both on Windows and on
UNIX, you have to limit yourself to sockets. (And you shouldn't use
add_reader and friends either, because that limits you to the
SelectSelector, whereas if you use the transport/protocol API you will
be compatible with either that or IOCPSelector.)

> As a strawman type of argument in favour of extensibility, consider a
> very specialist user with a hardware device that sends input via (say)
> a serial port. I can easily imagine that user wanting to plug his
> device data into the Python event loop. As this is a very specialised
> area, I wouldn't expect the core code to be able to help, but I would
> expect him to be able to write code that plugs into the standard event
> loop seamlessly. Ideally, I'd like to use the subprocess case as a
> proof that this is practical.
>
> Does that make sense?

Yes, it does make sense, but you have to choose whether to do it on
Windows or on UNIX. If you use UNIX, presumably your serial port is
accessible via a file descriptor that works with select/poll/etc. --
if it doesn't, you are going to have a really hard time integrating it
with the event loop; you may have to use a separate thread that talks
to the device and sends the data to the event loop over a pipe or
something. On Windows, I have no idea how it would work, but I presume
that serial port drivers are somehow hooked up to "handles" and
"waitable events" (or whatever the Microsoft terminology is -- I am
about to get educated about this) and then presumably it will
integrate nicely with IOCP (but not with Select).

I think that for UNIX, hooking a subprocess up to a transport should
be easy enough (except perhaps for the stdout/stderr distinction), and
your transport should use add_reader/writer. For Windows I am not sure
but you can probably crib the details from the Windows-specific code
in subprocess.py in the stdlib.

On Thu, Jan 17, 2013 at 6:35 AM, Paul Moore <p.f.moore at gmail.com> wrote:
> On 17 January 2013 12:23, Paul Moore <p.f.moore at gmail.com> wrote:
>> In general, it still feels to me like the socket use case is being
>> treated as "special", and other data sources and sinks (subprocesses
>> being my use case, but I'm sure others exist) are either second-class
>> or require a whole set of their own specialised methods, which isn't
>> practical.
>
> Thinking about this some more. The key point is that for any event
> loop there can only be one "source of events" in terms of the thing
> that the event loop checks when there are no pending tasks. So the
> event loop is roughly:
>
> while True:
>     process_ready_queue()
>     new_events = block_on_event_source(src, timeout=N)
>     add_to_ready_queue(new_events)
>     add_timed_events_to_ready_queue()
>
> The source has to be a unique object, as there's an OS-level wait in
> there, and you can't do two of them at once.

Right, that's the idea.
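[Editor's note: Paul's pseudocode can be made concrete with the stdlib
selectors module. This toy loop is only a sketch of the structure under
discussion -- the class and method names are invented, not part of the
PEP.]

```python
import selectors
import time

class MiniLoop:
    """Toy event loop mirroring the quoted pseudocode."""
    def __init__(self):
        self.selector = selectors.DefaultSelector()  # the single OS-level event source
        self.ready = []        # callbacks runnable right now
        self.scheduled = []    # (when, callback) timed events

    def call_soon(self, callback):
        self.ready.append(callback)

    def call_later(self, delay, callback):
        self.scheduled.append((time.monotonic() + delay, callback))

    def add_reader(self, fd, callback):
        self.selector.register(fd, selectors.EVENT_READ, callback)

    def run_once(self, timeout=0.05):
        # process_ready_queue()
        pending, self.ready = self.ready, []
        for callback in pending:
            callback()
        # new_events = block_on_event_source(src, timeout=N)
        # add_to_ready_queue(new_events)
        if self.selector.get_map():
            for key, _events in self.selector.select(timeout):
                self.ready.append(key.data)
        # add_timed_events_to_ready_queue()
        now = time.monotonic()
        due = [item for item in self.scheduled if item[0] <= now]
        for item in due:
            self.scheduled.remove(item)
            self.ready.append(item[1])
```

The one OS-level wait lives in run_once(); everything else (ready queue,
timers) is bookkeeping around it, which is Paul's point.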

> As things stand, methods like add_reader on the event loop object
> should really be methods on the event source object (and indeed,
> that's more or less what Tulip does internally). Would it not make
> more sense to explicitly expose the event source? This is (I guess)
> what the section "Choosing an Event Loop Implementation" in the PEP is
> about. But if the event source is a user-visible object, methods like
> add_reader would no longer be optional event loop methods, but rather
> they would be methods of the event source (but only for those event
> sources for which they make sense).

The problem with this idea is (you may have guessed it by now :-) ...
Windows. On Windows, at least when using a (at this moment purely
hypothetical) IOCP-based implementation of the event loop, there will
*not* be an underlying Selector object. Please track down discussion
of IOCP in older posts on this list. IOCP requires you to use a
different paradigm, which is supported by the separate methods
sock_recv(), sock_sendall() and so on. For I/O objects that are not
sockets, different methods are needed, but the idea is the same: you
specify the I/O, and you get a callback when it is done. This in
contrast with the UNIX selector, where you specify the file descriptor
and I/O direction, and you get a callback when you can read/write
without blocking.

This is why the event loop has the higher-level
transport/protocol-based APIs: an IOCP implementation of these creates
instances of a completely different transport implementation, which
however have the same interface and *meaning* as the UNIX transports
(e.g. the transport created by create_connection() connects to a host
and port over TCP/IP and calls the protocol's connection_made(),
data_received(), connection_lost() methods).
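[Editor's note: a sketch of the protocol interface named above. The
protocol code is the same whether the transport behind it is
selector-based (UNIX) or IOCP-based (Windows); RecordingTransport is an
invented stand-in used only to drive the demonstration.]

```python
class EchoProtocol:
    """One protocol; any conforming transport can sit underneath it."""
    def connection_made(self, transport):
        self.transport = transport
    def data_received(self, data):
        self.transport.write(data)   # echo the bytes straight back
    def eof_received(self):
        pass
    def connection_lost(self, exc):
        self.transport = None

class RecordingTransport:
    """Stand-in for demonstration; real transports come from the loop."""
    def __init__(self):
        self.written = []
    def write(self, data):
        self.written.append(data)

proto = EchoProtocol()
transport = RecordingTransport()
proto.connection_made(transport)
proto.data_received(b"ping")
print(transport.written)   # [b'ping']
```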

So if you want a transport that encapsulates a subprocess (instead of
a TCP/IP connection), and you want to support both UNIX and Windows,
you have to provide (at least) two separate implementations: one on
UNIX that uses add_reader() and friends, and one on Windows that uses
(I don't know what, but something). Each of these implementations by
itself is dependent on the platform (and the specific event loop
implementation); but together they cover all supported platforms.

If you develop this as 3rd party code, and you want your users not to
have to write platform-specific code, you have to write a "start
subprocess" function that inspects the platform (and the event loop
implementation) and then imports and instantiates the right transport
implementation for the platform. If we want to add this to the PEP,
the right thing is to add a "start subprocess" method to the event
loop API (which can be identical to the start subprocess function in
your 3rd party package :-).

> The point here is that there's a lot of event loop machinery (ready
> queue, timed events, run methods) that are independent of the precise
> means by which you poll the OS to ask "has anything interesting
> happened?" Abstracting out that machinery would seem to me to make the
> design cleaner and more understandable.

It is abstracted out in the implementation, but I hope I have
explained with sufficient clarity why it should not be abstracted out
in the PEP: the Selector abstraction only works on UNIX (or with
sockets on Windows).

Also note a subtlety in the PEP: while it describes a
platform-independent API, it doesn't preclude that some parts of that
API may have platform-specific behaviors -- for example, add_reader()
may only take sockets on Windows (and in Jython, I suspect, where
select() only works with sockets), but takes other file descriptors on
UNIX, so you can implement your own subprocess transport for UNIX.
Similarly, the PEP describes the interface between transports and
protocols, but does not give you a way to construct a transport except
for TCP/IP connections. But the abstraction is usable for other
purposes too, and this is intentional! (E.g. you may be able to create
a transport that uses a subprocess running ssh to talk to a remote
server, which might be used to "tunnel" HTTP, so it would make sense
to connect this custom transport with a standard HTTP protocol
implementation.)

> Other benefits - our hypothetical person with a serial port device can
> build his own event source and plug it into the event loop directly.

I think I've answered that above.

> Or someone could offer a multiplexer that combines two separate
> sources by running them in different threads and merging the output on
> a queue (that may be YAGNI, though).

I think there are Twisted reactor implementations that do things like
this. My hope is that a proxy between the Twisted reactor and the PEP
3156 interface will enable this too -- and the event loop APIs for
working with transports and protocols are essential for this purpose.
(Twisted has a working IOCP reactor, FWIW.)

> This is really just something to think about while I'm trying to build
> a Linux development environment so that I can do a Unix proof of
> concept. Once I get started on that, I'll think about the
> protocol/transport stuff.

I think it would be tremendously helpful if you tried to implement the
UNIX version of the subprocess transport. (Note that AFAIK Twisted has
one of these too, maybe you can get some implementation ideas from
them.)

-- 
--Guido van Rossum (python.org/~guido)


From ncoghlan at gmail.com  Thu Jan 17 22:08:09 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 18 Jan 2013 07:08:09 +1000
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CAP7+vJJE3AQNaGmphemk9TtaVLM+DgeMbHqbA4wt5Ma_qSQgWA@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<CACac1F9-W6VdTU=+=u488YbopkSrG_dE3Y-uNo9hzqS72pesHQ@mail.gmail.com>
	<CAP7+vJJE3AQNaGmphemk9TtaVLM+DgeMbHqbA4wt5Ma_qSQgWA@mail.gmail.com>
Message-ID: <CADiSq7ciJ-VRZo15zMW5v8_8cvg71xgD1xnHSBP8-Zyua8SpxA@mail.gmail.com>

Hmm, there may still be something to the idea of clearly separating out
"for everyone" and "for transports" methods. Even if that's just a split in
the documentation, similar to the "for everyone" vs "for the executor"
split in the concurrent.futures implementation.

--
Sent from my phone, thus the relative brevity :)

From greg.ewing at canterbury.ac.nz  Thu Jan 17 23:40:08 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 18 Jan 2013 11:40:08 +1300
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
Message-ID: <50F87DC8.1060000@canterbury.ac.nz>

Paul Moore wrote:
> Although the host and port arguments
> to create_transport are clearly irrelevant to the case of a transport
> managing a process as a data source.

Shouldn't this be called create_internet_transport or something
like that?

-- 
Greg


From guido at python.org  Fri Jan 18 00:39:49 2013
From: guido at python.org (Guido van Rossum)
Date: Thu, 17 Jan 2013 15:39:49 -0800
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CADiSq7ciJ-VRZo15zMW5v8_8cvg71xgD1xnHSBP8-Zyua8SpxA@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<CACac1F9-W6VdTU=+=u488YbopkSrG_dE3Y-uNo9hzqS72pesHQ@mail.gmail.com>
	<CAP7+vJJE3AQNaGmphemk9TtaVLM+DgeMbHqbA4wt5Ma_qSQgWA@mail.gmail.com>
	<CADiSq7ciJ-VRZo15zMW5v8_8cvg71xgD1xnHSBP8-Zyua8SpxA@mail.gmail.com>
Message-ID: <CAP7+vJ+SMJ2Lz1kAO7OhKRSyvPYk5B9oAGvZ1H15cj+vc4By3A@mail.gmail.com>

On Thu, Jan 17, 2013 at 1:08 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Hmm, there may still be something to the idea of clearly separating out "for
> everyone" and "for transports" methods. Even if that's just a split in the
> documentation, similar to the "for everyone" vs "for the executor" split in
> the concurrent.futures implementation.

Good idea, I like it.

-- 
--Guido van Rossum (python.org/~guido)


From guido at python.org  Fri Jan 18 00:40:30 2013
From: guido at python.org (Guido van Rossum)
Date: Thu, 17 Jan 2013 15:40:30 -0800
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <50F87DC8.1060000@canterbury.ac.nz>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
Message-ID: <CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>

On Thu, Jan 17, 2013 at 2:40 PM, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Paul Moore wrote:
>>
>> Although the host and port arguments
>> to create_transport are clearly irrelevant to the case of a transport
>> managing a process as a data source.

> Shouldn't this be called create_internet_transport or something
> like that?

I just renamed it to create_connection(), like I've been promising for
a long time.

-- 
--Guido van Rossum (python.org/~guido)


From p.f.moore at gmail.com  Fri Jan 18 00:44:18 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 17 Jan 2013 23:44:18 +0000
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CAP7+vJJE3AQNaGmphemk9TtaVLM+DgeMbHqbA4wt5Ma_qSQgWA@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<CACac1F9-W6VdTU=+=u488YbopkSrG_dE3Y-uNo9hzqS72pesHQ@mail.gmail.com>
	<CAP7+vJJE3AQNaGmphemk9TtaVLM+DgeMbHqbA4wt5Ma_qSQgWA@mail.gmail.com>
Message-ID: <CACac1F99MGhWT-D_vSX+d8Os+a0CZQv8r7dh4v6zRrWw8Nmp-Q@mail.gmail.com>

On 17 January 2013 19:10, Guido van Rossum <guido at python.org> wrote:
> I think it would be tremendously helpful if you tried to implement the
> UNIX version of the subprocess transport. (Note that AFAIK Twisted has
> one of these too, maybe you can get some implementation ideas from
> them.)

You were right. In starting to do so, I found out that my thinking has
been solely based on a callback style of programming (users implement
protocol classes and code the relevant "data received" methods
themselves). From looking at some of the sample code, I see that this
is not really the intended usage style. At this point my head
exploded. Coroutines, what fun! I am now reading the sample code, the
section of the PEP on coroutines, and the mailing list threads on the
matter. I may be some time :-)

(The technicalities of the implementation aren't hard - it's just a
data_received type of protocol wrapper round a couple of pipes. It's
the usability and design issues that matter, and they are strongly
affected by "intended usage").

Paul

PS From the PEP, it seems that a protocol must implement the 4 methods
connection_made, data_received, eof_received and connection_lost. For
a process, which has 2 output streams involved, a single data_received
method isn't enough. I see two options - having 2 separate protocol
classes involved, or having a process protocol with a different
interface. Neither option seems obviously best, although Twisted
appears to use different protocol types for different types of
transport. How critical is the principle that there is a single type
of protocol to the PEP?


From guido at python.org  Fri Jan 18 01:19:35 2013
From: guido at python.org (Guido van Rossum)
Date: Thu, 17 Jan 2013 16:19:35 -0800
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CACac1F99MGhWT-D_vSX+d8Os+a0CZQv8r7dh4v6zRrWw8Nmp-Q@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<CACac1F9-W6VdTU=+=u488YbopkSrG_dE3Y-uNo9hzqS72pesHQ@mail.gmail.com>
	<CAP7+vJJE3AQNaGmphemk9TtaVLM+DgeMbHqbA4wt5Ma_qSQgWA@mail.gmail.com>
	<CACac1F99MGhWT-D_vSX+d8Os+a0CZQv8r7dh4v6zRrWw8Nmp-Q@mail.gmail.com>
Message-ID: <CAP7+vJJQYX=MkTjfuV2B4=-t00F2aA5-TfGG7Fy=a3fK_bmTZA@mail.gmail.com>

On Thu, Jan 17, 2013 at 3:44 PM, Paul Moore <p.f.moore at gmail.com> wrote:
> On 17 January 2013 19:10, Guido van Rossum <guido at python.org> wrote:
>> I think it would be tremendously helpful if you tried to implement the
>> UNIX version of the subprocess transport. (Note that AFAIK Twisted has
>> one of these too, maybe you can get some implementation ideas from
>> them.)
>
> You were right. In starting to do so, I found out that my thinking has
> been solely based on a callback style of programming (users implement
> protocol classes and code the relevant "data received" methods
> themselves). From looking at some of the sample code, I see that this
> is not really the intended usage style. At this point my head
> exploded. Coroutines, what fun! I am now reading the sample code, the
> section of the PEP on coroutines, and the mailing list threads on the
> matter. I may be some time :-)
>
> (The technicalities of the implementation aren't hard - it's just a
> data_received type of protocol wrapper round a couple of pipes. It's
> the usability and design issues that matter, and they are strongly
> affected by "intended usage").

Right, this is a very good observation.

> Paul
>
> PS From the PEP, it seems that a protocol must implement the 4 methods
> connection_made, data_received, eof_received and connection_lost. For
> a process, which has 2 output streams involved, a single data_received
> method isn't enough. I see two options - having 2 separate protocol
> classes involved, or having a process protocol with a different
> interface. Neither option seems obviously best, although Twisted
> appears to use different protocol types for different types of
> transport. How critical is the principle that there is a single type
> of protocol to the PEP?

Not critical at all. The plan for UDP (datagrams in general) is to
have different protocol methods as well.

TBH I would be happy with a first cut that only deals with stdout,
like os.popen(). :-)

Note that I am intrigued by this problem as well and may be hacking up
a version for myself in my spare time.
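[Editor's note: a sketch of that "first cut" -- stdout only, UNIX only,
hooking the child's pipe into the loop with add_reader() as discussed
earlier in the thread. Class and function names are invented; this uses
today's asyncio API rather than 2013-era Tulip.]

```python
import asyncio
import os
import subprocess
import sys

class StdoutProtocol:
    """Minimal callback-style protocol; method names follow the PEP."""
    def __init__(self, loop):
        self.chunks = []
        self.done = loop.create_future()
    def data_received(self, data):
        self.chunks.append(data)
    def eof_received(self):
        self.done.set_result(b"".join(self.chunks))

async def run_popen(args):
    loop = asyncio.get_running_loop()
    protocol = StdoutProtocol(loop)
    proc = subprocess.Popen(args, stdout=subprocess.PIPE)
    fd = proc.stdout.fileno()
    os.set_blocking(fd, False)

    def on_readable():
        data = os.read(fd, 4096)
        if data:
            protocol.data_received(data)
        else:                        # empty read: the child closed its end
            loop.remove_reader(fd)
            proc.wait()
            proc.stdout.close()
            protocol.eof_received()

    loop.add_reader(fd, on_readable)
    return await protocol.done

output = asyncio.run(run_popen([sys.executable, "-c", "print('hi')"]))
print(output)   # b'hi\n'
```

Extending this to stderr is where Paul's two-streams question bites: you
need either a second fd/callback pair on the same protocol or a richer
protocol interface.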

-- 
--Guido van Rossum (python.org/~guido)


From james.d.harding at siemens.com  Fri Jan 18 02:52:28 2013
From: james.d.harding at siemens.com (Harding, James)
Date: Fri, 18 Jan 2013 01:52:28 +0000
Subject: [Python-ideas] 'const' and 'require' statements
Message-ID: <D88D6CBD04B9E647B778917814A70F9A92407967@USSLMMBX003.net.plm.eds.com>

Hello,

I am new here but am itching with an idea. Here are two separate ideas but they are related so they shall both be presented at the same time.

The first idea is for a 'const' statement for declaring constant names. Its syntax would be:

    'const' identifier '=' expression

The expression would be restricted to result in an immutable object such as 17, "green", or (1,2,3). The compiler would effectively replace any use of the identifier with this expression when seen. Some examples of constants might include:

    const ST_MODE = 0
    const FontName = "Ariel"
    const Monday = 1
    const Tuesday = Monday + 1     # may use previously defined const in expression. Compiler will fold constants (hopefully)

Constant names would be limited in scope. A constant defined in a function would only have a life to the end of the function, for instance.

Now why should there be such a syntax if the existing language already has a mechanism for effectively declaring constants, which it does? First, it opens possibilities for the compiler to do things like more constant folding and generally producing more efficient code. Second, since the compiler substitutes for the name at compile time, there is no chance for the name to be stepped on at run-time. Third, ideas such as PEP 3103 could be re-visited. One of the problems in PEP 3103 was that so often constants are represented by names and those names may be changed and/or the constant values in those names are not known until run-time.

Constant names are fine but of limited use if they may only be used within the module they are declared in. This brings up the second idea of a 'require' statement.

The import statement of Python is executed at run-time. This creates a disconnect between modules at compile time (which is a good thing) but gives the compiler no hint as to how to produce better code. What I propose is a 'require' statement with almost exactly the same syntax as the import and from statements, but with the keyword 'require' substituted for 'import'. The word 'require' was chosen because the require declaration from the BLISS language helped inspire this idea. C-minded people might prefer that a word such as 'include' be used instead.

What the require statement would do is cause the module to be read in by the compiler and compiled when the statement is parsed. The contents of a required module could be restricted to only be const statements in order to avoid the many headaches this would produce. Examples:

    require font_data
    from stat_constants require ST_MODE
    from weekdays require *

In the first example, the name 'font_data' would be a constant module to the compiler. An expression such as font_data.FontName would at compile-time reference the constant name FontName from the font_data module and substitute for it. In the second example, the constant name ST_MODE is added to the current scope. In the third example, all constant names defined in the module (except those with a '_' prefix) are added to the current scope. Since the names added are constant names and not variable names, it is OK to use require * at the function scope level.

In order to help compatibility with existing uses and to avoid declaring constants twice, a require statement could use an 'as *' to both include constant names and assign them to a module's dictionary. For example, the file stat.py might do something like:

    require stat_constants as *

This would add all the constant names defined in the stat_constants module and place them in the stat module's dictionary. For instance, if there is the line in stat_constants.py:

    const ST_MODE = 0

Then for stat.py the compiler will act as if it saw:

    ST_MODE = 0

Well, those are my two bits of ideas.

Thank you,

James Harding

From ben+python at benfinney.id.au  Fri Jan 18 04:37:21 2013
From: ben+python at benfinney.id.au (Ben Finney)
Date: Fri, 18 Jan 2013 14:37:21 +1100
Subject: [Python-ideas] 'const' statement
References: <D88D6CBD04B9E647B778917814A70F9A92407967@USSLMMBX003.net.plm.eds.com>
Message-ID: <7wfw1zupou.fsf@benfinney.id.au>

"Harding, James"
<james.d.harding at siemens.com> writes:

> The first idea is for a 'const' statement for declaring constant
> names.

Do you have some concrete Python code which would clearly be improved by
this proposal?

> Its syntax would be:
>
>     'const' identifier '=' expression
>
> The expression would be restricted to result in an immutable object
> such as 17, "green", or (1,2,3). The compiler would effectively replace
> any use of the identifier with this expression when seen. Some examples
> of constants might include:
>
>     const ST_MODE = 0
>     const FontName = "Ariel"
>     const Monday = 1
>     const Tuesday = Monday + 1     # may use previously defined const
>     in expression. Compiler will fold constants (hopefully)

So, the compiler will "replace any use of the identifier with" the
constant value.

    const ST_MODE = 0
    const ST_FILENAME = "foo"
    const ST_RECURSIVE = True

    name_prefix = "ST_"
    foo = globals().get(name_prefix + "MODE")
    bar = globals().get(name_prefix + "FILENAME")
    baz = globals().get(name_prefix + "RECURSIVE")

What do you expect the compiler to do in the above code?

-- 
 \     "Airports are ugly. Some are very ugly. Some attain a degree of |
  `\        ugliness that can only be the result of a special effort." |
_o__)       --Douglas Adams, _The Long Dark Tea-Time of the Soul_, 1988 |
Ben Finney



From cs at zip.com.au  Fri Jan 18 05:28:53 2013
From: cs at zip.com.au (Cameron Simpson)
Date: Fri, 18 Jan 2013 15:28:53 +1100
Subject: [Python-ideas] 'const' statement
In-Reply-To: <7wfw1zupou.fsf@benfinney.id.au>
References: <7wfw1zupou.fsf@benfinney.id.au>
Message-ID: <20130118042853.GA27650@cskk.homeip.net>

On 18Jan2013 14:37, Ben Finney <ben+python at benfinney.id.au> wrote:
| "Harding, James"
| <james.d.harding at siemens.com> writes:
| > Its syntax would be:
| >     'const' identifier '=' expression
| >
| > The expression would be restricted to result in an immutable object
| > such as 17, "green", or (1,2,3). The compiler would effectively replace
| > any use of the identifier with this expression when seen. Some examples
| > of constants might include:
| >
| >     const ST_MODE = 0
| >     const FontName = "Ariel"
| >     const Monday = 1
| >     const Tuesday = Monday + 1     # may use previously defined const
| >     in expression. Compiler will fold constants (hopefully)
| 
| So, the compiler will "replace any use of the identifier with" the
| constant value.
| 
|     const ST_MODE = 0
|     const ST_FILENAME = "foo"
|     const ST_RECURSIVE = True
| 
|     name_prefix = "ST_"
|     foo = globals().get(name_prefix + "MODE")
|     bar = globals().get(name_prefix + "FILENAME")
|     baz = globals().get(name_prefix + "RECURSIVE")
| 
| What do you expect the compiler to do in the above code?

Personally I'd expect the compiler to produce essentially the same code
it does now with stock Python. After all, name_prefix isn't a const.

But under his proposal I'd expect the compiler to be _able_ to produce
inlined constant results for bare, direct uses of ST_MODE etc.

If I'd written his proposal I'd have probably termed these things
"bind-once", generating names that may not be rebound. They would
still need to be carefully placed if the compiler were to have the
option of constant folding, i.e. they'd need to be outside function and
class definitions, determinable from static analysis.

Just comments, not endorsement:-)
-- 
Cameron Simpson <cs at zip.com.au>


From steve at pearwood.info  Fri Jan 18 05:52:09 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 18 Jan 2013 15:52:09 +1100
Subject: [Python-ideas] 'const' and 'require' statements
In-Reply-To: <D88D6CBD04B9E647B778917814A70F9A92407967@USSLMMBX003.net.plm.eds.com>
References: <D88D6CBD04B9E647B778917814A70F9A92407967@USSLMMBX003.net.plm.eds.com>
Message-ID: <50F8D4F9.9020308@pearwood.info>

On 18/01/13 12:52, Harding, James wrote:
> Hello,
>
> I am new here but am itching with an idea. Here are two separate ideas
>but they are related so they shall both be presented at the same time.
>
> The first idea is for a 'const' statement for declaring constant names.
>Its syntax would be:
>
>      'const' identifier '=' expression
>
> The expression would be restricted to result in an immutable object


What is the purpose of this restriction?

I would like to see the ability to prevent rebinding or unbinding of
names, with no restriction on the value. If that is useful (and I think
it is), then it is useful for mutable objects as well as immutable.



> such as 17, "green", or (1,2,3). The compiler would effectively replace
>any use of the identifier with this expression when seen.

Is that the driving use-case for your suggestion? Compile-time efficiency?
If so, then I suspect that you're on the wrong track. As I understand it,
the sort of optimizations that PyPy can perform at runtime are far more
valuable than this sort of constant substitution.

There are also complications that need to be carefully thought about.
For example, in Python today, you can be sure that this assertion will
always pass:


k = ("Some value", "Another value")  # for example
x = k
y = k
assert x is y  # this always passes, no matter the value of k


But if k is a const, it will fail, because the lines "x = k" and "y = k"
will be expanded at compile time:

x = ("Some value", "Another value")
y = ("Some value", "Another value")
assert x is y  # not guaranteed to pass


So Python would have to intern every const, not just do a compile-time
substitution. And that will have runtime consequences.


Another question: what happens if the constant expression can't be
evaluated until runtime?

x = random.random()
const k = x + 1

y = k - 1

What value should the compiler substitute for y?



> Constant names would be limited in scope. A constant defined in a function
>would only have a life to the end of the function, for instance.

I don't think that makes sense. Since you're talking about something known
to the compiler, it is meaningless to talk about the life of the constant
*at runtime*. Consider:


def f(n):
     const k = ("something", "or", "other")
     if n == 0:
         return k
     else:
         return k[n:]


This will compile to the byte-code equivalent of:

def f(n):
     if n == 0:
         return ("something", "or", "other")
     else:
         return ("something", "or", "other")[n:]


I recommend you run that function through dis.dis to see what it will
be compiled to. In the compiled code, there are two calls to the
LOAD_CONST byte-code. The literal ("something", "or", "other") needs
to be compiled into the byte-code, and so it will exist for as long
as the function exists, not just until the function exits.
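[Editor's note: Steven's point is easy to verify directly. CPython
already folds all-constant tuple literals into the code object's
constants, so the expanded function carries the tuple for its whole
lifetime -- no 'const' keyword needed to observe this.]

```python
import dis

def expanded(n):
    # byte-code equivalent of the 'const'-substituted function above
    if n == 0:
        return ("something", "or", "other")
    else:
        return ("something", "or", "other")[n:]

# The tuple literal is baked into co_consts, so it lives as long as
# the function object itself:
print(("something", "or", "other") in expanded.__code__.co_consts)   # True

dis.dis(expanded)   # note the LOAD_CONST instructions loading the tuple
```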



> Now why should there be such a syntax if the existing language already
>has a mechanism for effectively declaring constants, which it does?

I dispute that Python has a mechanism for effectively declaring constants.
It has a *convention* for declaring constants, and hoping that neither
you, the developer, nor the caller, accidentally (or deliberately) rebind
that pseudo-constant.



-- 
Steven


From greg.ewing at canterbury.ac.nz  Fri Jan 18 05:59:01 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 18 Jan 2013 17:59:01 +1300
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
Message-ID: <50F8D695.3050002@canterbury.ac.nz>

Guido van Rossum wrote:
> I just renamed it to create_connection(), like I've been promising for
> a long time.

That still doesn't spell out that it's about the internet
in particular. Or is the assumption that internet connections
are the only kind that matter these days?

-- 
Greg


From guido at python.org  Fri Jan 18 06:08:06 2013
From: guido at python.org (Guido van Rossum)
Date: Thu, 17 Jan 2013 21:08:06 -0800
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <50F8D695.3050002@canterbury.ac.nz>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
Message-ID: <CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>

On Thu, Jan 17, 2013 at 8:59 PM, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Guido van Rossum wrote:
>>
>> I just renamed it to create_connection(), like I've been promising for
>> a long time.

> That still doesn't spell out that it's about the internet
> in particular. Or is the assumption that internet connections
> are the only kind that matter these days?

Basically yes, in this context. The same assumption underlies
socket.getaddrinfo() in the stdlib. If you have a CORBA system lying
around and you want to support it, you're welcome to create the
transport connection function create_corba_connection(). :-)

-- 
--Guido van Rossum (python.org/~guido)


From bruce at leapyear.org  Fri Jan 18 07:04:53 2013
From: bruce at leapyear.org (Bruce Leban)
Date: Thu, 17 Jan 2013 22:04:53 -0800
Subject: [Python-ideas] 'const' and 'require' statements
In-Reply-To: <50F8D4F9.9020308@pearwood.info>
References: <D88D6CBD04B9E647B778917814A70F9A92407967@USSLMMBX003.net.plm.eds.com>
	<50F8D4F9.9020308@pearwood.info>
Message-ID: <CAGu0Anti0CKjro+9agqyePu9zcwRfTTx1=GoKn1iR+0Yo-Bqig@mail.gmail.com>

On Thu, Jan 17, 2013 at 8:52 PM, Steven D'Aprano <steve at pearwood.info>wrote:

> On 18/01/13 12:52, Harding, James wrote:
>
>> The first idea is for a 'const' statement for declaring constant names.
>> Its syntax would be:
>>
>>      'const' identifier '=' expression
>>
>> The expression would be restricted to result in an immutable object
>>
>
> What is the purpose of this restriction?
>
> I would like to see the ability to prevent rebinding or unbinding of
> names, with no restriction on the value. If that is useful (and I think
> it is), then it is useful for mutable objects as well as immutable.
>
>
Java has a keyword 'final' which means a variable must be bound exactly
once. It is an error if it is bound more than once, not bound at all, or
read before it is initialized. For example, if a class has a final
non-static field foo, then the constructor *must* set foo. A final value
may still be mutable, though.

http://en.wikipedia.org/wiki/Final_(Java)

This catches double initialization errors among other things.
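For comparison, a rough Python sketch of that bind-exactly-once
behaviour (a hypothetical `Final` descriptor, not anything Java or
Python actually provides) might look like:

```python
class Final:
    """Descriptor that allows an attribute to be bound exactly once."""
    def __init__(self, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        try:
            return obj.__dict__[self.name]
        except KeyError:
            raise AttributeError(f"{self.name} was never initialized") from None

    def __set__(self, obj, value):
        if self.name in obj.__dict__:
            raise AttributeError(f"{self.name} is final and already bound")
        obj.__dict__[self.name] = value

class Point:
    x = Final("x")

p = Point()
p.x = 1          # first binding: allowed
print(p.x)       # 1
try:
    p.x = 2      # double initialization is caught
except AttributeError as e:
    print(e)
```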

I don't know if final belongs in Python, but I'd find that more useful than
const.

--- Bruce
http://bit.ly/yearofpuzzles
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130117/7b5bd520/attachment.html>

From rosuav at gmail.com  Fri Jan 18 07:31:52 2013
From: rosuav at gmail.com (Chris Angelico)
Date: Fri, 18 Jan 2013 17:31:52 +1100
Subject: [Python-ideas] 'const' and 'require' statements
In-Reply-To: <50F8D4F9.9020308@pearwood.info>
References: <D88D6CBD04B9E647B778917814A70F9A92407967@USSLMMBX003.net.plm.eds.com>
	<50F8D4F9.9020308@pearwood.info>
Message-ID: <CAPTjJmoSV1oyfY8yegywBrurdbHs7kAeWsyq6vKQfZj3-cZnOw@mail.gmail.com>

On Fri, Jan 18, 2013 at 3:52 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> Another question: what happens if the constant expression can't be
> evaluated until runtime?
>
> x = random.random()
> const k = x + 1
>
> y = k - 1
>
> What value should the compiler substitute for y?

That should be disallowed. In the declaration of a constant, you have
to use only what can be handled by the constants evaluator. As a rule
of thumb, it'd make sense to be able to use const with anything that
could safely be evaluated by ast.literal_eval.
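As a concrete illustration of that rule of thumb, ast.literal_eval
accepts literal displays but rejects anything involving a name lookup:

```python
import ast

# Literal containers of literals are fine.
print(ast.literal_eval("('something', 'or', 'other')"))

# Anything involving a name lookup is not a literal and is rejected.
try:
    ast.literal_eval("x + 1")
except ValueError:
    print("rejected: not a literal")
```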

As to the issues of rebinding, I'd just state that all uses of a
particular named constant evaluate to the same object, just as would
happen if you used any other form of name binding.

I don't have the post to hand, but wasn't there a project being
discussed recently that would do a lot of that work automatically?

ChrisA


From haoyi.sg at gmail.com  Fri Jan 18 08:06:41 2013
From: haoyi.sg at gmail.com (Haoyi Li)
Date: Thu, 17 Jan 2013 23:06:41 -0800
Subject: [Python-ideas] 'const' and 'require' statements
In-Reply-To: <CAPTjJmoSV1oyfY8yegywBrurdbHs7kAeWsyq6vKQfZj3-cZnOw@mail.gmail.com>
References: <D88D6CBD04B9E647B778917814A70F9A92407967@USSLMMBX003.net.plm.eds.com>
	<50F8D4F9.9020308@pearwood.info>
	<CAPTjJmoSV1oyfY8yegywBrurdbHs7kAeWsyq6vKQfZj3-cZnOw@mail.gmail.com>
Message-ID: <CALruUQK1DbE+6TsuLwDt3-j0Db55wtyGWuhU=5KK1oKqLjx9Sw@mail.gmail.com>

Compiler-enforced immutability is one of those really hard problems which,
if you manage to do flexibly and correctly, would be an academically
publishable result, not something you hack into the interpreter over a
weekend.

If you go the dumb-and-easy route, you end up with a simple "sub this
variable with constant" thing, which isn't very useful (what about
calculated constants?)

If you go the slightly-less-dumb route, you end up with some mini-language
to work with these `const` values, which has some operations but not the
full power of python. This basically describes C Macros, which I don't
think you'd want to include in python!

If you go the "full python" route, you basically branch into two
possibilities.

- enforcement of `const` as part of the main program. If you do it hackily,
you end up with C++'s `const` or Java's `final` declaration. Neither of
these really make the object (and all of its contents!) immutable. If you
want to do it properly, this would involve some sort of
effect-tracking-system. This is really hard.

- multi-stage computations, so the program is partially-evaluated at
"compile" time and the `const` sections computed. This is also really hard.
Furthermore, if you want to be able to use bits of the standard library in
the early stages (you probably do, e.g. for things like min, max, len,
etc.) either you'd need to manually start annotating huge chunks of the
standard library to be available at "compile" time (a huge undertaking) or
you'll need an effect-tracking-system to do it for you.


In any case, either you get a crappy implementation that nobody wants (C
Macros), something that doesn't really give the guarantees you'd hope for
(Java final/C++ const), or you would have a publishable result w.r.t. either
effect-tracking (!) or multi-stage computations (!!!).

Even though it is very easy to describe the idea (it just stops it from
changing, duh!) and how it would work in a few trivial cases, doing it
properly will likely require some substantial theoretical breakthroughs
before it can actually happen.



On Thu, Jan 17, 2013 at 10:31 PM, Chris Angelico <rosuav at gmail.com> wrote:

> On Fri, Jan 18, 2013 at 3:52 PM, Steven D'Aprano <steve at pearwood.info>
> wrote:
> > Another question: what happens if the constant expression can't be
> > evaluated until runtime?
> >
> > x = random.random()
> > const k = x + 1
> >
> > y = k - 1
> >
> > What value should the compiler substitute for y?
>
> That should be disallowed. In the declaration of a constant, you have
> to use only what can be handled by the constants evaluator. As a rule
> of thumb, it'd make sense to be able to use const with anything that
> could safely be evaluated by ast.literal_eval.
>
> As to the issues of rebinding, I'd just state that all uses of a
> particular named constant evaluate to the same object, just as would
> happen if you used any other form of name binding.
>
> I don't have the post to hand, but wasn't there a project being
> discussed recently that would do a lot of that work automatically?
>
> ChrisA
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130117/3b8cf182/attachment.html>

From greg.ewing at canterbury.ac.nz  Fri Jan 18 08:17:57 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 18 Jan 2013 20:17:57 +1300
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CACac1F99MGhWT-D_vSX+d8Os+a0CZQv8r7dh4v6zRrWw8Nmp-Q@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<CACac1F9-W6VdTU=+=u488YbopkSrG_dE3Y-uNo9hzqS72pesHQ@mail.gmail.com>
	<CAP7+vJJE3AQNaGmphemk9TtaVLM+DgeMbHqbA4wt5Ma_qSQgWA@mail.gmail.com>
	<CACac1F99MGhWT-D_vSX+d8Os+a0CZQv8r7dh4v6zRrWw8Nmp-Q@mail.gmail.com>
Message-ID: <50F8F725.20505@canterbury.ac.nz>

Paul Moore wrote:
> PS From the PEP, it seems that a protocol must implement the 4 methods
> connection_made, data_received, eof_received and connection_lost. For
> a process, which has 2 output streams involved, a single data_received
> method isn't enough.

It looks like there would have to be at least two Transport instances
involved, one for stdin/stdout and one for stderr.

Connecting them both to a single Protocol object doesn't seem to be
possible with the framework as defined. You would have to use a
couple of adapter objects to translate the data_received calls into
calls on different methods of another object.

This sort of thing would be easier if, instead of the Transport calling
a predefined method of the Protocol, the Protocol installed a callback
into the Transport. Then a Protocol designed for dealing with subprocesses
could hook different methods of itself into a pair of Transports.

Stepping back a bit, I must say that from the coroutine viewpoint,
the Protocol/Transport stuff just seems to get in the way. If I were
writing coroutine-based code to deal with a subprocess, I would want
to be able to write coroutines like

    def handle_output(stdout):
        while 1:
            line = yield from stdout.readline()
            if not line:
                break
            mungulate_line(line)

    def handle_errors(stderr):
        while 1:
            line = yield from stderr.readline()
            if not line:
                break
            complain_to_user(line)

In other words, I don't want Transports or Protocols or any of that
cruft, I just want a simple pair of async stream objects that I can
read and write using yield-from calls. There doesn't seem to be
anything like that specified in PEP 3156.

It does mention something about implementing a streaming buffer on
top of a Transport, but in a way that makes it sound like a suggested
recipe rather than something to be provided by the library. Also it
seems like a lot of layers of overhead to go through.

On the whole, in PEP 3156 the idea of providing callback-based
interfaces with yield-from-based ones built on top has been
pushed way further up the stack than I imagined it would. I don't
want to be *forced* to write my coroutine code at the level of
Protocols; I want to be able to work at a lower level than that.

-- 
Greg


From aquavitae69 at gmail.com  Fri Jan 18 08:22:01 2013
From: aquavitae69 at gmail.com (David Townshend)
Date: Fri, 18 Jan 2013 09:22:01 +0200
Subject: [Python-ideas] 'const' and 'require' statements
In-Reply-To: <CALruUQK1DbE+6TsuLwDt3-j0Db55wtyGWuhU=5KK1oKqLjx9Sw@mail.gmail.com>
References: <D88D6CBD04B9E647B778917814A70F9A92407967@USSLMMBX003.net.plm.eds.com>
	<50F8D4F9.9020308@pearwood.info>
	<CAPTjJmoSV1oyfY8yegywBrurdbHs7kAeWsyq6vKQfZj3-cZnOw@mail.gmail.com>
	<CALruUQK1DbE+6TsuLwDt3-j0Db55wtyGWuhU=5KK1oKqLjx9Sw@mail.gmail.com>
Message-ID: <CAEgL-fd+SWT9ONmhJ03MQU=ua4fkCztyd9PXM4ZsDYY56B86HA@mail.gmail.com>

On Fri, Jan 18, 2013 at 9:06 AM, Haoyi Li <haoyi.sg at gmail.com> wrote:

> Compiler-enforced immutability is one of those really hard problems which,
> if you manage to do flexibly and correctly, would be an academically
> publishable result, not something you hack into the interpreter over a
> weekend.
>
> If you go the dumb-and-easy route, you end up with a simple "sub this
> variable with constant" thing, which isn't very useful (what about
> calculated constants?)
>
> If you go the slightly-less-dumb route, you end up with some mini-language
> to work with these `const` values, which has some operations but not the
> full power of python. This basically describes C Macros, which I don't
> think you'd want to include in python!
>
> If you go the "full python" route, you basically branch into two
> possibilities.
>
> - enforcement of `const` as part of the main program. If you do it
> hackily, you end up with C++'s `const` or Java's `final` declaration.
> Neither of these really make the object (and all of its contents!)
> immutable. If you want to do it properly, this would involve some sort of
> effect-tracking-system. This is really hard.
>
> - multi-stage computations, so the program is partially-evaluated at
> "compile" time and the `const` sections computed. This is also really hard.
> Furthermore, if you want to be able to use bits of the standard library in
> the early stages (you probably do, e.g. for things like min, max, len,
> etc.) either you'd need to manually start annotating huge chunks of the
> standard library to be available at "compile" time (a huge undertaking) or
> you'll need an effect-tracking-system to do it for you.
>
>
> In any case, either you get a crappy implementation that nobody wants (C
> Macros) something that doesn't really give the guarantees you'd hope for
> (java final/c++ const) or you would have a publishable result w.r.t. either
> effect-tracking (!) or multi-stage computations (!!!).
>
> Even though it is very easy to describe the idea (it just stops it from
> changing, duh!) and how it would work in a few trivial cases, doing it
> properly will likely require some substantial theoretical breakthroughs
> before it can actually happen.
>
>
>
> On Thu, Jan 17, 2013 at 10:31 PM, Chris Angelico <rosuav at gmail.com> wrote:
>
>> On Fri, Jan 18, 2013 at 3:52 PM, Steven D'Aprano <steve at pearwood.info>
>> wrote:
>> > Another question: what happens if the constant expression can't be
>> > evaluated until runtime?
>> >
>> > x = random.random()
>> > const k = x + 1
>> >
>> > y = k - 1
>> >
>> > What value should the compiler substitute for y?
>>
>> That should be disallowed. In the declaration of a constant, you have
>> to use only what can be handled by the constants evaluator. As a rule
>> of thumb, it'd make sense to be able to use const with anything that
>> could safely be evaluated by ast.literal_eval.
>>
>> As to the issues of rebinding, I'd just state that all uses of a
>> particular named constant evaluate to the same object, just as would
>> happen if you used any other form of name binding.
>>
>> I don't have the post to hand, but wasn't there a project being
>> discussed recently that would do a lot of that work automatically?
>>
>> ChrisA
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> http://mail.python.org/mailman/listinfo/python-ideas
>>
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>
>
As has already been pointed out, syntax to allow compile-time optimisations
doesn't really make much sense in Python, especially considering the
optimisations PyPy already carries out.  Some sort of "finalise" option may
be somewhat useful (although I can't say I've ever needed it).  To avoid
adding a new keyword it could be implemented as a function, e.g.
finalise("varname") or finalise(varname="value").  In a class, this would
actually be quite easy to implement by simply replacing the class dict with
a custom dict designed to restrict writing to finalised names.  I haven't
ever tried changing the globals dict type, but I imagine it would be
possible, or at least possible to provide a method to change it.  I
haven't thought through all the implications of doing it this way, but I'd
rather see something like this than a new "const" keyword.

David
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130118/056ee946/attachment.html>

From ncoghlan at gmail.com  Fri Jan 18 09:02:14 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 18 Jan 2013 18:02:14 +1000
Subject: [Python-ideas] 'const' and 'require' statements
In-Reply-To: <CALruUQK1DbE+6TsuLwDt3-j0Db55wtyGWuhU=5KK1oKqLjx9Sw@mail.gmail.com>
References: <D88D6CBD04B9E647B778917814A70F9A92407967@USSLMMBX003.net.plm.eds.com>
	<50F8D4F9.9020308@pearwood.info>
	<CAPTjJmoSV1oyfY8yegywBrurdbHs7kAeWsyq6vKQfZj3-cZnOw@mail.gmail.com>
	<CALruUQK1DbE+6TsuLwDt3-j0Db55wtyGWuhU=5KK1oKqLjx9Sw@mail.gmail.com>
Message-ID: <CADiSq7ekk0ArcbHuHSzuQAhnkJAw9HqOFe-PHTHY2dFNXserzA@mail.gmail.com>

On Fri, Jan 18, 2013 at 5:06 PM, Haoyi Li <haoyi.sg at gmail.com> wrote:
> Compiler-enforced immutability is one of those really hard problems which,
> if you manage to do flexibly and correctly, would be an academically
> publishable result, not something you hack into the interpreter over a
> weekend.
>
> If you go the dumb-and-easy route, you end up with a simple "sub this
> variable with constant" thing, which isn't very useful (what about
> calculated constants?)
>
> If you go the slightly-less-dumb route, you end up with some mini-language
> to work with these `const` values, which has some operations but not the
> full power of python. This basically describes C Macros, which I don't think
> you'd want to include in python!
>
> If you go the "full python" route, you basically branch into two
> possibilities.
>
> - enforcement of `const` as part of the main program. If you do it hackily,
> you end up with C++'s `const` or Java's `final` declaration. Neither of
> these really make the object (and all of its contents!) immutable. If you
> want to do it properly, this would involve some sort of
> effect-tracking-system. This is really hard.
>
> - multi-stage computations, so the program is partially-evaluated at
> "compile" time and the `const` sections computed. This is also really hard.
> Furthermore, if you want to be able to use bits of the standard library in
> the early stages (you probably do, e.g. for things like min, max, len, etc.)
> either you'd need to manually start annotating huge chunks of the standard
> library to be available at "compile" time (a huge undertaking) or you'll
> need an effect-tracking-system to do it for you.
>
>
> In any case, either you get a crappy implementation that nobody wants (C
> Macros) something that doesn't really give the guarantees you'd hope for
> (java final/c++ const) or you would have a publishable result w.r.t. either
> effect-tracking (!) or multi-stage computations (!!!).
>
> Even though it is very easy to describe the idea (it just stops it from
> changing, duh!) and how it would work in a few trivial cases, doing it
> properly will likely require some substantial theoretical breakthroughs
> before it can actually happen.

As James noted, lack of a good answer to this problem is part of the
reason Python doesn't have a switch/case statement [1,2] (only part,
though).

We already have three interesting points in time where evaluation can
happen in Python code:

- compile time (evaluation of literals, including tuples of literals)
- function definition time (evaluation of decorator expressions,
annotations and default arguments, along with decorator invocation)
- execution time (normal execution time - in the case of functions,
function definition time occurs during the execution time of the
containing scope)

We know from experience with default arguments that people find
evaluation at function definition time *incredibly* confusing, because
it means a data value is shared across functions. You can try to limit
this by saying "immutable values only", but then you run into the
problem where dynamic name lookups mean only literals can be
considered truly constant, and those are *already* evaluated (and
sometimes folded together) at compile time:

>>> def f():
...     return 2 * 3
...
>>> dis.dis(f)
  2           0 LOAD_CONST               3 (6)
              3 RETURN_VALUE

(The constant folding in CPython isn't especially clever, but that's
an implementation issue - the language spec already *allows* such
folding, we just don't always detect when it's possible).

So, once you allow name lookups, the question then becomes what
namespace they run in. If you say "the containing namespace" then you
get a few interesting consequences:

1. We're in the same, already known to be confusing, territory as
function default arguments
2. The behaviour of the new construct at module and class level will
necessarily be different to that at function level
3. Quality of error messages and tracebacks will be a potential issue
for debugging
4. When two of these constructs exist in the same scope, is the later
one allowed to refer to the earlier one?

Now we get to the meat of James's suggestion, and while I think it's a
pretty decent take on the "multi-stage evaluation" proposal, it still
runs afoul of many of the same problems past proposals [3] have
struggled with:

1. Name binding operations other than assignment (e.g. import,
function and class definitions)
2. Handling of name binding in nested functions
3. Handling of references to previous early evaluation operations
4. Breaking expectations regarding dynamic modification of module globals
5. Finding a good keyword is hard - suitable terms are either widely
used as variable names, or have too much misleading baggage from other
languages

I can alleviate the concerns about making other components available
at compile time though - if this construct was defined appropriately,
Python would be able to happily import, compile and execute other
modules during a suitable "pre-execution" phase.

The real kicker though, is that, after all that work, you'll have to
ask two questions:
1. Does this change help Python users write more readable code?
2. Does this change help JIT-compiled Python code (e.g. in PyPy) run
faster? (PyPy's JIT can often identify near-constants and move their
calculation out of any frequently executed code paths)

If the answer to that turns out to be "No to both, but it will help
CPython, which has no JIT, run some manually annotated code faster",
then it's a bad idea (it's not an *obviously* bad idea - just one that
is a lot trickier than it may first appear).

Cheers,
Nick.

[1] http://www.python.org/dev/peps/pep-0275/
[2] http://www.python.org/dev/peps/pep-3103/
[3] https://encrypted.google.com/search?q=site%3Amail.python.org%20inurl%3Apython-ideas%20atdef

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From ncoghlan at gmail.com  Fri Jan 18 09:08:01 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 18 Jan 2013 18:08:01 +1000
Subject: [Python-ideas] 'const' and 'require' statements
In-Reply-To: <CAEgL-fd+SWT9ONmhJ03MQU=ua4fkCztyd9PXM4ZsDYY56B86HA@mail.gmail.com>
References: <D88D6CBD04B9E647B778917814A70F9A92407967@USSLMMBX003.net.plm.eds.com>
	<50F8D4F9.9020308@pearwood.info>
	<CAPTjJmoSV1oyfY8yegywBrurdbHs7kAeWsyq6vKQfZj3-cZnOw@mail.gmail.com>
	<CALruUQK1DbE+6TsuLwDt3-j0Db55wtyGWuhU=5KK1oKqLjx9Sw@mail.gmail.com>
	<CAEgL-fd+SWT9ONmhJ03MQU=ua4fkCztyd9PXM4ZsDYY56B86HA@mail.gmail.com>
Message-ID: <CADiSq7fZEkG7HFe0FsoT_em=ArcGc+nKAw3W-MpVoZsdRxvCjg@mail.gmail.com>

On Fri, Jan 18, 2013 at 5:22 PM, David Townshend <aquavitae69 at gmail.com> wrote:
> As has already been pointed out, syntax to allow compile-time optimisations
> doesn't really make much sense in python, especially considering the
> optimisations Pypy already carries out.  Some sort of "finalise" option may
> be somewhat useful (although I can't say I've ever needed it).  To avoid
> adding a new keyword it could be implemented as a function, e.g.
> finalise("varname") or finalise(varname="value").  In a class, this would
> actually be quite easy to implement by simply replacing the class dict with
> a custom dict designed to restrict writing to finalised names.  I haven't
> ever tried changing the globals dict type, but I imagine it would be
> possible, or at least possible to provide a method to change it.  I
> haven't thought through all the implications of doing it this way, but I'd
> rather see something like this than a new "const" keyword.

While you won't see module level support (beyond the ability to place
arbitrary classes in sys.modules), this is already completely possible
through the descriptor protocol (e.g. by creating read-only
properties).
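For example, a read-only property already gives per-class "constants"
that cannot be rebound through the instance (a minimal sketch, with
made-up names):

```python
class Settings:
    @property
    def max_connections(self):
        # No setter is defined, so the name cannot be rebound
        # on instances.
        return 100

s = Settings()
print(s.max_connections)  # 100
try:
    s.max_connections = 5
except AttributeError:
    print("rebinding blocked")
```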

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From p.f.moore at gmail.com  Fri Jan 18 09:08:33 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 18 Jan 2013 08:08:33 +0000
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
Message-ID: <CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>

On 18 January 2013 05:08, Guido van Rossum <guido at python.org> wrote:
>> That still doesn't spell out that it's about the internet
>> in particular. Or is the assumption that internet connections
>> are the only kind that matter these days?
>
> Basically yes, in this context. The same assumption underlies
> socket.getaddrinfo() in the stdlib. If you have a CORBA system lying
> around and you want to support it, you're welcome to create the
> transport connection function create_corba_connection(). :-)

To create that create_corba_connection() function, you'd be expected
to subclass the standard event loop, is that right?

Paul


From ncoghlan at gmail.com  Fri Jan 18 09:38:53 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 18 Jan 2013 18:38:53 +1000
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
Message-ID: <CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>

On Fri, Jan 18, 2013 at 6:08 PM, Paul Moore <p.f.moore at gmail.com> wrote:
> On 18 January 2013 05:08, Guido van Rossum <guido at python.org> wrote:
>>> That still doesn't spell out that it's about the internet
>>> in particular. Or is the assumption that internet connections
>>> are the only kind that matter these days?
>>
>> Basically yes, in this context. The same assumption underlies
>> socket.getaddrinfo() in the stdlib. If you have a CORBA system lying
>> around and you want to support it, you're welcome to create the
>> transport connection function create_corba_connection(). :-)
>
> To create that create_corba_connection() function, you'd be expected
> to subclass the standard event loop, is that right?

I'm not sure why CORBA would be a transport in its own right rather
than a protocol running over a standard socket transport.

Transports are about the communications channel
- network sockets
- OS pipes
- shared memory
- CANbus
- protocol tunneling

Transports should only be platform specific at the base layer where
they actually need to interact with the OS through the event loop.
Higher level transports should be connected to lower level protocols
based on APIs provided by those transports and protocols themselves.

The *whole point* of the protocol vs transport model is to allow you
to write adaptive stacks. To use the example from PEP 3153, to
implement full JSON-RPC support over both sockets and an HTTP tunnel
you need the following implemented:

- TCP socket transport
- HTTP protocol
- HTTP-based transport
- JSON-RPC protocol

Because the transport API is standardised, the JSON-RPC protocol can
be written once and run over HTTP using the full stack as shown, *or*
directly over TCP by stripping out the two middle layers.

The *only* layer that the event loop needs to concern itself with is
the base transport layer - it doesn't care how many layers of
protocols or protocol-as-transport adapters you stack on top.

The other thing that may not have been emphasised sufficiently is that
the *protocol* API is completely dependent on the protocol involved.
The API of a pipe protocol is not that of HTTP or CORBA or JSON-RPC or
XML-RPC. That's why tunneling, as in the example above, requires a
protocol-specific adapter to translate from the protocol API back to
the standard transport API.

So, for example, Greg's request for the ability to pass callbacks
rather than needing particular method names can be satisfied by
writing a simple callback protocol:

    class CallbackProtocol:
        """Invoke arbitrary callbacks in response to transport events"""
        def __init__(self, on_data, on_conn, on_loss, on_eof):
            self.on_data = on_data
            self.on_conn = on_conn
            self.on_loss = on_loss
            self.on_eof = on_eof

        def connection_made(self, transport):
            self.on_conn(transport)

        def data_received(self, data):
            self.on_data(data)

        def eof_received(self):
            self.on_eof()

        def connection_lost(self, exc):
            self.on_loss(exc)

Similarly, his request for a IOStreamProtocol would likely look a lot
like an asynchronous version of the existing IO stack API (to handle
encoding, buffering, etc), with the lowest layer being built on the
transport API rather than the file API (as it is in the io module).

You would then be able to treat *any* transport, whether it's an SSH
tunnel, an ordinary socket connection or a pipe to a subprocess as a
non-seekable stream.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From p.f.moore at gmail.com  Fri Jan 18 10:01:23 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 18 Jan 2013 09:01:23 +0000
Subject: [Python-ideas] 'const' statement
In-Reply-To: <20130118042853.GA27650@cskk.homeip.net>
References: <7wfw1zupou.fsf@benfinney.id.au>
	<20130118042853.GA27650@cskk.homeip.net>
Message-ID: <CACac1F-czfNYpAYPEJimDxnZ7W3fseZJcHHUZt7sLakTfigvQQ@mail.gmail.com>

On 18 January 2013 04:28, Cameron Simpson <cs at zip.com.au> wrote:
> If I'd written his proposal I'd have probably termed these things
> "bind-once", generating names that may not be rebound. They would
> still need to be carefully placed if the compiler were to have the
> option of constant folding, i.e. they'd need to be outside function and
> class definitions, determinable from static analysis.

A few thoughts along the same lines:

1. Global lookups are not likely to be the performance bottleneck in
any real code, so constant folding is not going to be a particular
benefit.
2. The idea of names that can't be rebound isn't particularly Pythonic
(given that things like private class variables aren't part of the
language)
3. Constants that can't be imported from another module aren't much
use, and yet if they can be imported you have real problems enforcing
the non-rebindability. Consider:

    import my_consts
    print(my_consts.A_VALUE)  # Presumably a constant value, but
                              # obviously the compiler can't inline it...
    my_consts.A_VALUE = 12    # The language has no chance to prevent this
                              # without completely changing module semantics

Named values are obviously a good thing, but I see little benefit, and
a lot of practical difficulty, with the idea of "enforced const-ness"
in Python.

Paul.


From aquavitae69 at gmail.com  Fri Jan 18 10:38:25 2013
From: aquavitae69 at gmail.com (David Townshend)
Date: Fri, 18 Jan 2013 11:38:25 +0200
Subject: [Python-ideas] 'const' and 'require' statements
In-Reply-To: <CADiSq7fZEkG7HFe0FsoT_em=ArcGc+nKAw3W-MpVoZsdRxvCjg@mail.gmail.com>
References: <D88D6CBD04B9E647B778917814A70F9A92407967@USSLMMBX003.net.plm.eds.com>
	<50F8D4F9.9020308@pearwood.info>
	<CAPTjJmoSV1oyfY8yegywBrurdbHs7kAeWsyq6vKQfZj3-cZnOw@mail.gmail.com>
	<CALruUQK1DbE+6TsuLwDt3-j0Db55wtyGWuhU=5KK1oKqLjx9Sw@mail.gmail.com>
	<CAEgL-fd+SWT9ONmhJ03MQU=ua4fkCztyd9PXM4ZsDYY56B86HA@mail.gmail.com>
	<CADiSq7fZEkG7HFe0FsoT_em=ArcGc+nKAw3W-MpVoZsdRxvCjg@mail.gmail.com>
Message-ID: <CAEgL-fcPdL-1L160NMSbGVnhOkp_9_iLfS62ME69RvmzZm2BtA@mail.gmail.com>

On Fri, Jan 18, 2013 at 10:08 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> On Fri, Jan 18, 2013 at 5:22 PM, David Townshend <aquavitae69 at gmail.com>
> wrote:
> > As has already been pointed out, syntax to allow compile-time
> optimisations
> > doesn't really make much sense in python, especially considering the
> > optimisations Pypy already carries out.  Some sort of "finalise" option
> may
> > be somewhat useful (although I can't say I've ever needed it).  To avoid
> > adding a new keyword it could be implemented as a function, e.g.
> > finalise("varname") or finalise(varname="value").  In a class, this would
> > actually be quite easy to implement by simply replacing the class dict
> with
> > a custom dict designed to restrict writing to finalised names.  I haven't
> > ever tried changing the globals dict type, but I imagine it would be
> > possible, or at least possible to provide a method to change it.  I
> > haven't thought through all the implications of doing it this way, but
> I'd
> > rather see something like this than a new "const" keyword.
>
> While you won't see module level support (beyond the ability to place
> arbitrary classes in sys.modules), this is already completely possible
> through the descriptor protocol (e.g. by creating read-only
> properties).
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
>

True.  I was going for something which might work in modules too, but
module-level descriptors would probably be a more consistent approach
anyway.  This is actually something I have needed in the past, and got
around it by putting a class in sys.modules.  Maybe finding a neat way to
write module-level descriptors would be more useful, and cover the same use
case as consts?
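For illustration, the simple cases are already covered by a read-only
property on a ModuleType subclass dropped into sys.modules (the module and
attribute names here are made up):

```python
import sys
import types

class _SettingsModule(types.ModuleType):
    # A property with no setter is effectively a module-level constant
    @property
    def MAX_RETRIES(self):
        return 5

sys.modules["fake_settings"] = _SettingsModule("fake_settings")

import fake_settings  # resolves to the object already in sys.modules
print(fake_settings.MAX_RETRIES)  # 5
try:
    fake_settings.MAX_RETRIES = 10
except AttributeError as exc:
    print("rejected:", exc)  # the property has no setter
```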

David
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130118/dbc246d9/attachment.html>

From p.f.moore at gmail.com  Fri Jan 18 10:33:09 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 18 Jan 2013 09:33:09 +0000
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
Message-ID: <CACac1F9KiWmVW=ZCriVXjcb-a+74UvvznO+8GyT79nZOE5uKHg@mail.gmail.com>

On 18 January 2013 08:38, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Transports are about the communications channel
> - network sockets
> - OS pipes
> - shared memory
> - CANbus
> - protocol tunneling
>
> Transports should only be platform specific at the base layer where
> they actually need to interact with the OS through the event loop.
> Higher level transports should be connected to lower level protocols
> based on APIs provided by those transports and protocols themselves.
>
> The *whole point* of the protocol vs transport model is to allow you
> to write adaptive stacks.

Interesting. On that basis, the whole subprocess interaction scenario
is not a low level transport at all (contrary to what I understood
from Guido's suggestion of an event loop method) and so should be
built in user code (OK, probably as a standard library helper, but
definitely not as specialist methods on the event loop) layered on the
low-level pipe transport.

That was my original instinct, but it fell afoul of

1. The Windows implementation of a low level pipe transport doesn't
exist (yet) and I don't know enough about IOCP to write it [1].
2. I don't understand the programming model well enough to understand
how to write a transport/protocol layer (coroutine head explosion
issue).

I have now (finally!) got Guido's point that implementing a process
protocol will give me a good insight into how this stuff is meant to
work. I'm still struggling to understand why he thinks it needs a
dedicated method on the event loop, rather than being a higher-level
layer like you're suggesting, but I'm at least starting to understand
what questions to ask.

Paul

[1] There is some stuff in the IOCP documentation about handles having
to be opened in OVERLAPPED mode, which worries me here as it may imply
that arbitrary pipes (such as the ones subprocess.Popen uses) can't be
plugged in. It's a bit like setting a filehandle to nonblocking in
Unix, but it has to be done at open time, IIUC. I think I saw an email
about this that I need to hunt out.


From ncoghlan at gmail.com  Fri Jan 18 11:37:23 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 18 Jan 2013 20:37:23 +1000
Subject: [Python-ideas] 'const' statement
In-Reply-To: <CACac1F-czfNYpAYPEJimDxnZ7W3fseZJcHHUZt7sLakTfigvQQ@mail.gmail.com>
References: <7wfw1zupou.fsf@benfinney.id.au>
	<20130118042853.GA27650@cskk.homeip.net>
	<CACac1F-czfNYpAYPEJimDxnZ7W3fseZJcHHUZt7sLakTfigvQQ@mail.gmail.com>
Message-ID: <CADiSq7d0Aftix_QXdi7qSffzLxP5y=hKQ0cmBBndrfsk_gtL1A@mail.gmail.com>

On Fri, Jan 18, 2013 at 7:01 PM, Paul Moore <p.f.moore at gmail.com> wrote:
> Named values are obviously a good thing, but I see little benefit, and
> a lot of practical difficulty, with the idea of "enforced const-ness"
> in Python.

FWIW, people can play whatever games they like by injecting arbitrary
objects into sys.modules.

>>> import sys
>>> class Locked:
...     def __setattr__(self, attr, value):
...         raise AttributeError("Rebinding not permitted")
...     def __delattr__(self, attr):
...         raise AttributeError("Deletion not permitted")
...     attr1 = "Hello"
...     attr2 = "World"
...
>>> sys.modules["example"] = Locked
>>> import example
>>> example.attr1
'Hello'
>>> example.attr2
'World'
>>> example.attr2 = "Change"
>>> example.attr2 = "World"
>>> sys.modules["example"] = Locked()
>>> import example
>>> example.attr1
'Hello'
>>> example.attr2
'World'
>>> example.attr2 = "Change"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in __setattr__
AttributeError: Rebinding not permitted

The import system is even defined to expressly permit doing this in a
*module's own code* by replacing "sys.module[__name__]" with a
different object.
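A module can pull that trick on itself in a handful of lines. The snippet
below simulates importing such a module (the module name, class and constant
are invented; a real module would just run the body of SOURCE as its own
top-level code):

```python
import sys
import types

# Body of a hypothetical module that swaps itself out on import
SOURCE = """
import sys

class _ConstModule:
    PI = 3.14159
    def __setattr__(self, name, value):
        raise AttributeError("constants are read-only")

sys.modules[__name__] = _ConstModule()
"""

# Simulate the import machinery running that module's code
mod = types.ModuleType("example_consts")
sys.modules["example_consts"] = mod
exec(compile(SOURCE, "<example_consts>", "exec"), mod.__dict__)

import example_consts  # resolves to the replacement object
print(example_consts.PI)  # 3.14159
try:
    example_consts.PI = 3
except AttributeError as exc:
    print("rejected:", exc)
```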

So, any such proposal needs to be made with the awareness that anyone
that *really* wants to do this kind of thing already can, but they
don't.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From jsbueno at python.org.br  Fri Jan 18 12:28:56 2013
From: jsbueno at python.org.br (Joao S. O. Bueno)
Date: Fri, 18 Jan 2013 09:28:56 -0200
Subject: [Python-ideas] 'const' and 'require' statements
In-Reply-To: <CAEgL-fd+SWT9ONmhJ03MQU=ua4fkCztyd9PXM4ZsDYY56B86HA@mail.gmail.com>
References: <D88D6CBD04B9E647B778917814A70F9A92407967@USSLMMBX003.net.plm.eds.com>
	<50F8D4F9.9020308@pearwood.info>
	<CAPTjJmoSV1oyfY8yegywBrurdbHs7kAeWsyq6vKQfZj3-cZnOw@mail.gmail.com>
	<CALruUQK1DbE+6TsuLwDt3-j0Db55wtyGWuhU=5KK1oKqLjx9Sw@mail.gmail.com>
	<CAEgL-fd+SWT9ONmhJ03MQU=ua4fkCztyd9PXM4ZsDYY56B86HA@mail.gmail.com>
Message-ID: <CAH0mxTQ0d=H-LTOj9=zCa6R7kDPvuguucbxHspKtR9VChPd19g@mail.gmail.com>

On 18 January 2013 05:22, David Townshend <aquavitae69 at gmail.com> wrote:

>
> As has already been pointed out, syntax to allow compile-time optimisations
> doesn't really make much sense in python, especially considering the
> optimisations Pypy already carries out.  Some sort of "finalise" option may
> be somewhat useful (although I can't say I've ever needed it).  To avoid
> adding a new keyword it could be implemented as a function, e.g.
> finalise("varname") or finalise(varname="value").  In a class, this would
> actually be quite easy to implement by simply replacing the class dict with
> a custom dict designed to restrict writing to finalised names.  I haven't
> ever tried changing the globals dict type, but I imagine it would be
> possible, or at least possible to provide a method to change it.  I
> haven't thought through all the implications of doing it this way, but I'd
> rather see something like this than a new "const" keyword.
>
>

Yes - changing the dict type of a module (or of an object that stands for a
module :-) ) does work [1] - which would allow a "module decorator" to
change it. So, the functionality of Java's "final" and the like can be had
in Python today, with a small set of "module decorator" utilities.

Now, do I think such a thing should go in the standard library?  -0 for that.

[1] - http://stackoverflow.com/questions/13274916/python-imported-module-is-none/13278043#13278043


> David


From ncoghlan at gmail.com  Fri Jan 18 12:59:24 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 18 Jan 2013 21:59:24 +1000
Subject: [Python-ideas] 'const' and 'require' statements
In-Reply-To: <CAEgL-fcPdL-1L160NMSbGVnhOkp_9_iLfS62ME69RvmzZm2BtA@mail.gmail.com>
References: <D88D6CBD04B9E647B778917814A70F9A92407967@USSLMMBX003.net.plm.eds.com>
	<50F8D4F9.9020308@pearwood.info>
	<CAPTjJmoSV1oyfY8yegywBrurdbHs7kAeWsyq6vKQfZj3-cZnOw@mail.gmail.com>
	<CALruUQK1DbE+6TsuLwDt3-j0Db55wtyGWuhU=5KK1oKqLjx9Sw@mail.gmail.com>
	<CAEgL-fd+SWT9ONmhJ03MQU=ua4fkCztyd9PXM4ZsDYY56B86HA@mail.gmail.com>
	<CADiSq7fZEkG7HFe0FsoT_em=ArcGc+nKAw3W-MpVoZsdRxvCjg@mail.gmail.com>
	<CAEgL-fcPdL-1L160NMSbGVnhOkp_9_iLfS62ME69RvmzZm2BtA@mail.gmail.com>
Message-ID: <CADiSq7dJ0N5seB3JwPMPYWQaE8qPncCjh2gtHZc=igAP2Lsa0g@mail.gmail.com>

On Fri, Jan 18, 2013 at 7:38 PM, David Townshend <aquavitae69 at gmail.com> wrote:
> True.  I was going for something which might work in modules too, but
> module-level descriptors would probably be a more consistent approach
> anyway.  This is actually something I have needed in the past, and got
> around it by putting a class in sys.modules.  Maybe finding a neat way to
> write module-level descriptors would be more useful, and cover the same use
> case as consts?

I think putting class objects in sys.modules *is* the way to get
"module level" descriptors. The fact it feels like a hack is a
positive in my book - techniques that are "always dubious, but
sometimes necessary" *should* feel like hacks, so people stay away
from them until they run out of other options :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From ncoghlan at gmail.com  Fri Jan 18 12:55:19 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 18 Jan 2013 21:55:19 +1000
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CACac1F9KiWmVW=ZCriVXjcb-a+74UvvznO+8GyT79nZOE5uKHg@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CACac1F9KiWmVW=ZCriVXjcb-a+74UvvznO+8GyT79nZOE5uKHg@mail.gmail.com>
Message-ID: <CADiSq7fxVexvi4enU=AA=gEtUydkgwFTtu=TX9yFrDe3tnkb8w@mail.gmail.com>

On Fri, Jan 18, 2013 at 7:33 PM, Paul Moore <p.f.moore at gmail.com> wrote:
> On 18 January 2013 08:38, Nick Coghlan <ncoghlan at gmail.com> wrote:
> I have now (finally!) got Guido's point that implementing a process
> protocol will give me a good insight into how this stuff is meant to
> work. I'm still struggling to understand why he thinks it needs a
> dedicated method on the event loop, rather than being a higher-level
> layer like you're suggesting, but I'm at least starting to understand
> what questions to ask.

The creation of the pipe transport needs to be on the event loop,
precisely because of cross-platform differences when it comes to
Windows. On *nix, on the other hand, the pipe transport should look an
awful lot like the socket transport and thus be able to use the
existing file descriptor based interfaces on the event loop.

The protocol part is then about adapting the transport API to
coroutine friendly readlines/writelines API (the part that Guido
points out needs more detail in
http://www.python.org/dev/peps/pep-3156/#coroutines-and-protocols)

As a rough untested sketch (the buffering here could likely be a lot smarter):

    # Remember we're not using preemptive threading, so we don't need
    # locking for thread safety.
    # Note that the protocol isn't designed to support reconnection -
    # a new connection means a new protocol instance. The create_* APIs
    # on the event loop accept a protocol factory specifically in order
    # to encourage this approach.
    class SimpleStreamingProtocol:
        def __init__(self):
            self._transport = None
            self._data = bytearray()
            self._pending = None

        def connection_made(self, transport):
            self._transport = transport
        def connection_lost(self, exc):
            self._transport = None
            # Could also store the exc directly on the protocol and raise
            # it in subsequent write calls
            if self._pending is not None:
                self._pending.set_exception(exc)
        def eof_received(self):
            self._transport = None
            if self._pending is not None:
                self._pending.set_result(False)
        def data_received(self, data):
            self._data.extend(data)
            if self._pending is not None:
                self._pending.set_result(True)

        # The writing side is fairly easy, as we just pass it through
        # to the transport
        # These are all defined by PEP 3156 as non-blocking calls
        def write(self, data):
            if self._transport is None:
                raise RuntimeError("Connection not open")
            self._transport.write(data)
        def writelines(self, iterable):
            if self._transport is None:
                raise RuntimeError("Connection not open")
            self._transport.writelines(iterable)
        def close(self):
            if self._transport is not None:
                self._transport.close()
                self._transport = None

        def _read_from_buffer(self):
            data = bytes(self._data)
            self._data.clear()
            return data

        # The reading side has to adapt between coroutines and callbacks
        @coroutine
        def read(self):
            if self._transport is None:
                raise RuntimeError("Connection not open")
            if self._pending is not None:
                raise RuntimeError("Concurrent reads not permitted")
            # First check if we already have data waiting
            data = self._read_from_buffer()
            if data:
                return data
            # Otherwise wait for data
            # This method can easily be updated to use a loop and multiple
            # futures in order to support a "minimum read" parameter
            f = self._pending = tulip.Future()
            have_data = yield from f
            self._pending = None  # allow subsequent reads
            data = self._read_from_buffer() if have_data else b''
            return data

        # This uses async iteration as described at [1]
        # We yield coroutines, which must then be invoked with yield from
        def readlines(self):
            finished = False
            cached_lines = self._data.split(b'\n')
            self._data.clear()
            if cached_lines[-1]: # Last line is incomplete
                self._data.extend(cached_lines[-1])
            del cached_lines[-1]
            while not finished:
                # When we already have the data, a simple future will do
                for line in cached_lines:
                    f = tulip.Future()
                    f.set_result(line)
                    yield f
                del cached_lines[:] # Don't re-yield these next iteration
                # Otherwise, we hand control to the event loop
                @coroutine
                def wait_for_line():
                    nonlocal finished
                    data = yield from self.read()
                    if not data:
                        finished = True
                        return b''
                    lines = data.split(b'\n')
                    if lines[-1]: # Last line is incomplete
                        self._data.extend(lines[-1])
                    cached_lines.extend(lines[1:-1])
                    return lines[0]
                yield wait_for_line()

    # Used as:
    pipe, stream = event_loop.create_pipe(SimpleStreamingProtocol)
    # Or even as:
    conn, stream = event_loop.create_connection(SimpleStreamingProtocol,
                                                ... # connection details)

    # Reading from the stream in a coroutine
    for f in stream.readlines():
        line = yield from f

[1] http://python-notes.boredomandlaziness.org/en/latest/pep_ideas/async_programming.html#asynchronous-iterators

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From tarek at ziade.org  Fri Jan 18 13:30:15 2013
From: tarek at ziade.org (=?ISO-8859-1?Q?Tarek_Ziad=E9?=)
Date: Fri, 18 Jan 2013 13:30:15 +0100
Subject: [Python-ideas] Parametrized any() and all() ?
In-Reply-To: <20130116194756.2efe9afe@pitrou.net>
References: <50F6813E.60503@ziade.org>
	<CAE_Hg6Y26Veq6eJrzsOZDL_Je2N03s5N3Hxfr8QHN-1Lhf23VQ@mail.gmail.com>
	<50F6847D.2020404@ziade.org>
	<CADiSq7cesP5yi+skMjQXk65bXP4rO+tpU9W4W7tqacCkToPM4w@mail.gmail.com>
	<50F6B4D8.6070002@pearwood.info> <50F6BEA3.7090807@ziade.org>
	<20130116194756.2efe9afe@pitrou.net>
Message-ID: <50F94057.9080005@ziade.org>

On 1/16/13 7:47 PM, Antoine Pitrou wrote:
> You know, discussing performance without posting benchmark numbers is 
> generally pointless.

Sure, yes, so I tried to implement it by adapting the current any():

http://tarek.pastebin.mozilla.org/2068630

but it is 20% slower in my benchmark. However, I have no idea if my
implementation is the right way to do things.

Cheers
Tarek

-- 
Tarek Ziadé - http://ziade.org - @tarek_ziade



From ncoghlan at gmail.com  Fri Jan 18 13:52:58 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 18 Jan 2013 22:52:58 +1000
Subject: [Python-ideas] Parametrized any() and all() ?
In-Reply-To: <50F94057.9080005@ziade.org>
References: <50F6813E.60503@ziade.org>
	<CAE_Hg6Y26Veq6eJrzsOZDL_Je2N03s5N3Hxfr8QHN-1Lhf23VQ@mail.gmail.com>
	<50F6847D.2020404@ziade.org>
	<CADiSq7cesP5yi+skMjQXk65bXP4rO+tpU9W4W7tqacCkToPM4w@mail.gmail.com>
	<50F6B4D8.6070002@pearwood.info> <50F6BEA3.7090807@ziade.org>
	<20130116194756.2efe9afe@pitrou.net> <50F94057.9080005@ziade.org>
Message-ID: <CADiSq7c9=ro6bNAAa=oJX0idRLd3W9jmXQM13hE1WDkAyUKkFg@mail.gmail.com>

On Fri, Jan 18, 2013 at 10:30 PM, Tarek Ziadé <tarek at ziade.org> wrote:
> On 1/16/13 7:47 PM, Antoine Pitrou wrote:
>>
>> You know, discussing performance without posting benchmark numbers is
>> generally pointless.
>
>
> Sure, yes, so I tried to implement it by adapting the current any() :
>
> http://tarek.pastebin.mozilla.org/2068630
>
> but it is 20% slower in my benchmark. However, I have no idea  if my
> implementation is the right way to do things.

Resuming an existing frame (i.e. using a generator expression) is
almost always going to be faster than going through the argument
passing machinery and initialising a *new* frame. Chaining C level
iterators together (e.g. map, itertools) is even faster.

DSU is great for cases where you need it, but a transformation
pipeline is otherwise likely to be faster (or at least not
substantially slower).
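A rough way to see the comparison yourself (absolute numbers will vary by
machine and interpreter; this only shows the shape of the measurement):

```python
import timeit

data = list(range(10000))

def pred(x):
    return x > 9990

# Inline test in a generator expression: no Python function call per item,
# just resuming the genexp frame
t_inline = timeit.timeit(lambda: any(x > 9990 for x in data), number=300)

# Predicate function invoked per item: argument passing plus a new frame
# for every call
t_func = timeit.timeit(lambda: any(pred(x) for x in data), number=300)

# Chaining C-level iterators: map() drives the predicate from C
t_map = timeit.timeit(lambda: any(map(pred, data)), number=300)

print("inline: %.4fs  func: %.4fs  map: %.4fs" % (t_inline, t_func, t_map))
```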

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From eliben at gmail.com  Fri Jan 18 15:56:55 2013
From: eliben at gmail.com (Eli Bendersky)
Date: Fri, 18 Jan 2013 06:56:55 -0800
Subject: [Python-ideas] PEP 3156 / Tulip question: write/send callback/future
Message-ID: <CAF-Rda9SPFtS6PeASzOy5e=i__aJE2rMwnquDtBBrdk=rUYFaw@mail.gmail.com>

Hi,

I'm looking through PEP 3156 and the Tulip code, and either something is
missing or I'm not looking in the right places.

I can't find any sort of callback / future return for asynchronous writes,
e.g. in transport.

Should there be no "data_sent" parallel to "data_received" somewhere? Or,
alternatively, "write" returning some sort of future that can be checked
later for status? For connections that aren't infinitely fast it's useful
to know when the data was actually sent/written, or alternatively if an
error has occurred. This is also important for when writing would actually
block because of full buffers. boost::asio has such a handler for
async_write.

Eli
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130118/6b97c9a5/attachment.html>

From ericsnowcurrently at gmail.com  Fri Jan 18 16:54:42 2013
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Fri, 18 Jan 2013 08:54:42 -0700
Subject: [Python-ideas] Parametrized any() and all() ?
In-Reply-To: <CADiSq7c9=ro6bNAAa=oJX0idRLd3W9jmXQM13hE1WDkAyUKkFg@mail.gmail.com>
References: <50F6813E.60503@ziade.org>
	<CAE_Hg6Y26Veq6eJrzsOZDL_Je2N03s5N3Hxfr8QHN-1Lhf23VQ@mail.gmail.com>
	<50F6847D.2020404@ziade.org>
	<CADiSq7cesP5yi+skMjQXk65bXP4rO+tpU9W4W7tqacCkToPM4w@mail.gmail.com>
	<50F6B4D8.6070002@pearwood.info> <50F6BEA3.7090807@ziade.org>
	<20130116194756.2efe9afe@pitrou.net> <50F94057.9080005@ziade.org>
	<CADiSq7c9=ro6bNAAa=oJX0idRLd3W9jmXQM13hE1WDkAyUKkFg@mail.gmail.com>
Message-ID: <CALFfu7D1gGF9dxvmH91cfhpB-HMELXvk0xOdseXWNTKjd=2rwg@mail.gmail.com>

On Fri, Jan 18, 2013 at 5:52 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> DSU is great for cases where you need it, but a transformation
> pipeline is otherwise likely to be faster (or at least not
> substantially slower).

It took me a sec.  :)  DSU == "Decorate-Sort-Undecorate". [1]

-eric


[1] http://en.wikipedia.org/wiki/Decorate-sort-undecorate


From tjreedy at udel.edu  Fri Jan 18 19:36:07 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 18 Jan 2013 13:36:07 -0500
Subject: [Python-ideas] Parametrized any() and all() ?
In-Reply-To: <CALFfu7D1gGF9dxvmH91cfhpB-HMELXvk0xOdseXWNTKjd=2rwg@mail.gmail.com>
References: <50F6813E.60503@ziade.org>
	<CAE_Hg6Y26Veq6eJrzsOZDL_Je2N03s5N3Hxfr8QHN-1Lhf23VQ@mail.gmail.com>
	<50F6847D.2020404@ziade.org>
	<CADiSq7cesP5yi+skMjQXk65bXP4rO+tpU9W4W7tqacCkToPM4w@mail.gmail.com>
	<50F6B4D8.6070002@pearwood.info> <50F6BEA3.7090807@ziade.org>
	<20130116194756.2efe9afe@pitrou.net> <50F94057.9080005@ziade.org>
	<CADiSq7c9=ro6bNAAa=oJX0idRLd3W9jmXQM13hE1WDkAyUKkFg@mail.gmail.com>
	<CALFfu7D1gGF9dxvmH91cfhpB-HMELXvk0xOdseXWNTKjd=2rwg@mail.gmail.com>
Message-ID: <kdc4n3$c6o$1@ger.gmane.org>

On 1/18/2013 10:54 AM, Eric Snow wrote:
> On Fri, Jan 18, 2013 at 5:52 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> DSU is great for cases where you need it, but a transformation
>> pipeline is otherwise likely to be faster (or at least not
>> substantially slower).
>
> It took me a sec.  :)  DSU == "Decorate-Sort-Undecorate". [1]

No, no, no. It's Delaware State University in Dover, as opposed to
University of Delaware (UD) in Newark ;-).

In other words, it depends on the universe you live in.
-- 
Terry Jan Reedy



From p.f.moore at gmail.com  Fri Jan 18 22:01:32 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 18 Jan 2013 21:01:32 +0000
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CACac1F9KiWmVW=ZCriVXjcb-a+74UvvznO+8GyT79nZOE5uKHg@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CACac1F9KiWmVW=ZCriVXjcb-a+74UvvznO+8GyT79nZOE5uKHg@mail.gmail.com>
Message-ID: <CACac1F8591LHqLogm3KE5ePi0Zr7zVG2rKfV06b7Na9ze_uu7w@mail.gmail.com>

On 18 January 2013 09:33, Paul Moore <p.f.moore at gmail.com> wrote:
> [1] There is some stuff in the IOCP documentation about handles having
> to be opened in OVERLAPPED mode, which worries me here as it may imply
> that arbitrary pipes (such as the ones subprocess.Popen uses) can't be
> plugged in. It's a bit like setting a filehandle to nonblocking in
> Unix, but it has to be done at open time, IIUC. I think I saw an email
> about this that I need to hunt out.

Hmm, I'm looking at a pipe transport on Unix, and I find I don't know
enough about programming Unix. How do I set a file descriptor
(specifically a pipe) in Unix to be nonblocking? For a socket,
sock.setblocking(False) does the job. But for a pipe/file, the only
thing I can see is the O_NONBLOCK flag to os.open/os.pipe2. Is it not
possible to set an already open file descriptor to be nonblocking? If
that's the case, it means that Unix has the same problem as I suspect
exists for Windows - existing pipes and filehandles can't be used in
async code as they won't necessarily be in nonblocking mode.
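(For what it's worth, on Unix it does appear possible to flip O_NONBLOCK on
an already open descriptor after the fact via fcntl - an untested sketch,
with an invented helper name:)

```python
import fcntl
import os

def set_nonblocking(fd):
    """Set O_NONBLOCK on an existing file descriptor via fcntl."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

r, w = os.pipe()        # an ordinary blocking pipe
set_nonblocking(r)
flags = fcntl.fcntl(r, fcntl.F_GETFL)
print(bool(flags & os.O_NONBLOCK))  # True
os.close(r)
os.close(w)
```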

Is there a way round this on Unix that I'm not aware of? Otherwise, it
seems that there's going to have to be a whole load of duplication in
the "async world" (an async version of subprocess.Popen, for a start,
as well as any other "open" type of calls that might need to produce
handles that can be used asynchronously). Either that or everything
that returns a pipe/handle that you might want to use in async code
will have to grow some sort of "async" flag.

Paul


From guido at python.org  Fri Jan 18 22:02:16 2013
From: guido at python.org (Guido van Rossum)
Date: Fri, 18 Jan 2013 13:02:16 -0800
Subject: [Python-ideas] PEP 3156 / Tulip question: write/send
	callback/future
In-Reply-To: <CAF-Rda9SPFtS6PeASzOy5e=i__aJE2rMwnquDtBBrdk=rUYFaw@mail.gmail.com>
References: <CAF-Rda9SPFtS6PeASzOy5e=i__aJE2rMwnquDtBBrdk=rUYFaw@mail.gmail.com>
Message-ID: <CAP7+vJJPU2rd=iMzNX+MOkU00UJj-AD-c-R=A9Lo6Y4FBhpo6A@mail.gmail.com>

On Fri, Jan 18, 2013 at 6:56 AM, Eli Bendersky <eliben at gmail.com> wrote:
> I'm looking through PEP 3156 and the Tulip code, and either something is
> missing or I'm not looking in the right places.
>
> I can't find any sort of callback / future return for asynchronous writes,
> e.g. in transport.

I guess you should read some Twisted tutorial. :-)

> Should there be no "data_sent" parallel to "data_received" somewhere? Or,
> alternatively, "write" returning some sort of future that can be checked
> later for status? For connections that aren't infinitely fast it's useful to
> know when the data was actually sent/written, or alternatively if an error
> has occurred. This is also important for when writing would actually block
> because of full buffers. boost::asio has such a handler for async_write.

The model is a little different. Glyph has convinced me that it works
well in practice. We just buffer what is written (when it can't all be
sent immediately). This is enough for most apps that don't serve 100MB
files. If the buffer becomes too large, the transport will call
.pause() on the protocol until it is drained, then it calls .resume().
(The names of these are TBD, maybe they will end up .pause_writing()
and .resume_writing().) There are some default behaviors that we can
add here too, e.g. suspending the task.
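A toy version of that flow-control policy, just to make the mechanics
concrete (the class names, water marks and the explicit drain() call are all
invented here; the real transport drains via the event loop):

```python
class BufferingTransport:
    """Buffers writes; pauses the protocol past a high-water mark."""
    def __init__(self, protocol, high_water=64, low_water=16):
        self.protocol = protocol
        self.high_water = high_water
        self.low_water = low_water
        self.buffer = bytearray()
        self.paused = False

    def write(self, data):
        # Non-blocking: data that can't be sent yet is buffered
        self.buffer.extend(data)
        if not self.paused and len(self.buffer) > self.high_water:
            self.paused = True
            self.protocol.pause()

    def drain(self, nbytes):
        # Called when the OS accepts nbytes from the buffer
        del self.buffer[:nbytes]
        if self.paused and len(self.buffer) <= self.low_water:
            self.paused = False
            self.protocol.resume()

class RecordingProtocol:
    """Records the pause/resume calls it receives."""
    def __init__(self):
        self.events = []
    def pause(self):
        self.events.append('pause')
    def resume(self):
        self.events.append('resume')

proto = RecordingProtocol()
transport = BufferingTransport(proto)
transport.write(b'x' * 100)   # exceeds high-water mark -> pause()
transport.drain(90)           # drops below low-water mark -> resume()
print(proto.events)  # ['pause', 'resume']
```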

-- 
--Guido van Rossum (python.org/~guido)


From guido at python.org  Fri Jan 18 22:24:15 2013
From: guido at python.org (Guido van Rossum)
Date: Fri, 18 Jan 2013 13:24:15 -0800
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
Message-ID: <CAP7+vJ+caQp+qzsBtPGHv0QO9aLWQiU3RCba6nWKj9JOMf_nig@mail.gmail.com>

On Fri, Jan 18, 2013 at 12:08 AM, Paul Moore <p.f.moore at gmail.com> wrote:
> On 18 January 2013 05:08, Guido van Rossum <guido at python.org> wrote:
>>> That still doesn't spell out that it's about the internet
>>> in particular. Or is the assumption that internet connections
>>> are the only kind that matter these days?
>>
>> Basically yes, in this context. The same assumption underlies
>> socket.getaddrinfo() in the stdlib. If you have a CORBA system lying
>> around and you want to support it, you're welcome to create the
>> transport connection function create_corba_connection(). :-)
>
> To create that create_corba_connection() function, you'd be expected
> to subclass the standard event loop, is that right?

No, it doesn't need to be a method on the event loop at all. It can
just be a function in a different package; it can use
events.get_current_event_loop() to reference the event loop.

-- 
--Guido van Rossum (python.org/~guido)


From phd at phdru.name  Fri Jan 18 22:25:31 2013
From: phd at phdru.name (Oleg Broytman)
Date: Sat, 19 Jan 2013 01:25:31 +0400
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CACac1F8591LHqLogm3KE5ePi0Zr7zVG2rKfV06b7Na9ze_uu7w@mail.gmail.com>
References: <CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CACac1F9KiWmVW=ZCriVXjcb-a+74UvvznO+8GyT79nZOE5uKHg@mail.gmail.com>
	<CACac1F8591LHqLogm3KE5ePi0Zr7zVG2rKfV06b7Na9ze_uu7w@mail.gmail.com>
Message-ID: <20130118212531.GA19497@iskra.aviel.ru>

On Fri, Jan 18, 2013 at 09:01:32PM +0000, Paul Moore <p.f.moore at gmail.com> wrote:
> Hmm, I'm looking at a pipe transport on Unix, and I find I don't know
> enough about programming Unix. How do I set a file descriptor
> (specifically a pipe) in Unix to be nonblocking? For a socket,
> sock.setblocking(False) does the job. But for a pipe/file, the only
> thing I can see is the O_NONBLOCK flag to os.open/os.pipe2. Is it not
> possible to set an already open file descriptor to be nonblocking?

http://linuxmanpages.com/man2/fcntl.2.php

The file status flags
    A file descriptor has certain associated flags, initialized by open(2)
    and possibly modified by fcntl(2). The flags are shared between copies
    (made with dup(2), fork(2), etc.) of the same file descriptor.

    The flags and their semantics are described in open(2).

    F_GETFL
        Read the file descriptor's flags.
    F_SETFL
        Set the file status flags part of the descriptor's flags to the
        value specified by arg. Remaining bits (access mode, file creation
        flags) in arg are ignored. On Linux this command can only change the
        O_APPEND, O_NONBLOCK, O_ASYNC, and O_DIRECT flags.

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            phd at phdru.name
           Programmers don't die, they just GOSUB without RETURN.


From james.d.harding at siemens.com  Fri Jan 18 22:28:43 2013
From: james.d.harding at siemens.com (Harding, James)
Date: Fri, 18 Jan 2013 21:28:43 +0000
Subject: [Python-ideas] Regarding 'const' and 'require' statements
Message-ID: <D88D6CBD04B9E647B778917814A70F9A9240A9CC@USSLMMBX003.net.plm.eds.com>

Hello,

There are so many replies that I am going to try and summarize responses with a lot of cut and paste in one post. Sorry if this is the wrong way to do it.



> Do you have some concrete Python code which would clearly be improved by this proposal?



Let me explain myself. I am a low-level programmer fascinated by Python's elegant syntax and how it is executed. We actually do little Python programming here but we do allow interaction between Python and our product and so I am not able to show any concrete code. I guess that makes me a crank but I am fine with that. At a low level, I look at what Python has to go through to execute statements and thoughts swirl through my mind as to how it could be improved. I finally cracked and made a post here with one of those improvements.




>>     const ST_MODE = 0



>> So, the compiler will ?replace any use of the identifier with? the constant value.



Yes, the compiler will replace any use of the name with its value. A statement like:



    if c == ST_MODE:



Would be treated by the compiler at compile-time as if it had seen:



    if c == 0:



The name ST_MODE in this example is not a bindable name. The name only lives during compilation and is not accessible at run-time. It would not be stored in a dictionary (unless the magic syntax 'require module as *' were used, but that only confuses what I am trying to say).



>    name_prefix = "ST_"

>    foo = globals().get(name_prefix + "MODE")

>

> What do you expect the compiler to do in the above code?



Since the name is not accessible at run-time, the above would produce an exception. Const names are only available at compile-time.



> If I'd written his proposal I'd have probably termed these things "bind-once", generating names that may not be rebound. They would still

> need to be carefully placed if the compiler were to have the option of constant folding i.e. they'd need to be outside function and class

> definitions, determinable from static analysis.



These are "bind-never" names. The compiler would have to be able to see the definition when a module that uses them is being compiled. That is the reason for the require statement. The compiler does not normally look at the contents of other modules when parsing a source file. The require statement tells it to do so.



>> The expression would be restricted to result in an immutable object



>What is the purpose of this restriction?



My thought is that a constant name should have the same value regardless of context. If I were to say something like "const A = B" then A is no longer truly constant: what it substitutes to depends on how B is interpreted within the current context (is it a global? A local? A nonlocal?). If I were to say "const A = [1,2,3]" then you need to worry about side effects. You would have to entirely clone the value at compile-time for each use rather than simply incrementing the reference count.
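To illustrate the side-effect concern with a hypothetical expansion (no `const` statement exists, so this simulates the textual substitution by hand): if "const A = [1,2,3]" were expanded at each use site, every use would get a distinct list and mutations would silently diverge.

```python
# Simulated textual expansion of a hypothetical `const A = [1, 2, 3]`:
# each use of A becomes a fresh list literal.
x = [1, 2, 3]   # what `x = A` would expand to
y = [1, 2, 3]   # what `y = A` would expand to
x.append(4)     # mutate one copy

print(x is y)   # False -- two distinct objects
print(x == y)   # False -- the "constant" now has two different values
```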



>Is that the driving use-case for your suggestion? Compile-time efficiency?

>If so, then I suspect that you're on the wrong track. As I understand it,

>the sort of optimizations that PyPy can perform at runtime are far more

>valuable than this sort of constant substitution.



That is my basic track. If the existing tools handle this better, then my idea should be discarded as not providing any significant improvement and adding additional baggage to the language.
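For reference, CPython's compiler already folds literal constant expressions at compile time, which covers part of this use case. A quick check (the exact disassembly varies by CPython version, but the folded result lands in the code object's constants):

```python
import dis

def seconds_per_day():
    return 60 * 60 * 24  # folded to 86400 at compile time

# The folded result appears directly in the code object's constant pool:
print(86400 in seconds_per_day.__code__.co_consts)  # True on CPython

# The bytecode loads one constant instead of doing two multiplications:
dis.dis(seconds_per_day)
```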





>k = ("Some value", "Another value")  # for example

>x = k

>y = k

>assert x is y  # this always passes, no matter the value of k

>

>But if k is a const, it will fail, because the lines "x = k" and "y = k"

>will be expanded at compile time:



The restriction that constant names be immutable objects would allow their values to be placed in the constant pool for the function. In the above, if 'k' were a constant name then it would (hopefully) reside in a single location in the constant pool and the assignments to 'x' and 'y' would access the same constant pool location.



>Another question: what happens if the constant expression can't be

>evaluated until runtime?



Constant expressions would be restricted to be compile-time constants. They would not be evaluated at run-time.



>Compiler-enforced immutability is one of those really hard problems which,

>if you manage to do flexibly and correctly, would be an academically

>publishable result, not something you hack into the interpreter over a

>weekend.



I have to plead guilty here. I am not an academic and do not know all the implications of things. I do not follow research either and so am basically proposing this as a crank/hacker sort of person.



>- multi-stage computations, so the program is partially-evaluated at

>"compile" time and the `const` sections computed. This is also really hard.

>Furthermore, if you want to be able to use bits of the standard library in

>the early stages (you probably do, e.g. for things like min, max, len,

>etc.) either you'd need to manually start annotating huge chunks of the

>standard library to be available at "compile" time (a huge undertaking) or

>you'll need an effect-tracking-system to do it for you.



This is indeed a big worry. I would have had it such that a module could (though would never be required to) be split into two parts. One part that is referenced at run-time using the existing import mechanism. This would not change. The second part of a module would be constants (and only constants) that are referenced at compile-time. There would be no requirement that modules change over to this new method. It would just mean that constants defined in the module are available to the compiler. That last statement is apparently not exactly true if I understand the comments about what PyPy optimizations do.



The idea of the compiler accessing the source files for other modules does give me pause. Currently, compiling one module is fairly disjoint from other modules in that a change to one module does not require a re-compile of modules that use it even if 'constants' are changed. This is a good feature of Python and maybe something to boast about but I would worry if these ideas introduced bad practices. I don't think that there have been many cases of Python programmers saying: "I made a change to my module - you need to recompile your module to get the changes".





I would like to thank you all for critiquing my ideas and pointing out its flaws with patience and respect. In many ways this was just an exercise in getting things off my chest because in the end I am just a crank.



Thank you,



James Harding











-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130118/3f8c12e5/attachment.html>

From phd at phdru.name  Fri Jan 18 22:37:34 2013
From: phd at phdru.name (Oleg Broytman)
Date: Sat, 19 Jan 2013 01:37:34 +0400
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <20130118212531.GA19497@iskra.aviel.ru>
References: <CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CACac1F9KiWmVW=ZCriVXjcb-a+74UvvznO+8GyT79nZOE5uKHg@mail.gmail.com>
	<CACac1F8591LHqLogm3KE5ePi0Zr7zVG2rKfV06b7Na9ze_uu7w@mail.gmail.com>
	<20130118212531.GA19497@iskra.aviel.ru>
Message-ID: <20130118213734.GB19497@iskra.aviel.ru>

On Sat, Jan 19, 2013 at 01:25:31AM +0400, Oleg Broytman <phd at phdru.name> wrote:
> On Fri, Jan 18, 2013 at 09:01:32PM +0000, Paul Moore <p.f.moore at gmail.com> wrote:
> > Hmm, I'm looking at a pipe transport on Unix, and I find I don't know
> > enough about programming Unix. How do I set a file descriptor
> > (specifically a pipe) in Unix to be nonblocking? For a socket,
> > sock.setblocking(False) does the job. But for a pipe/file, the only
> > thing I can see is the O_NONBLOCK flag to os.open/os.pipe2. Is it not
> > possible to set an already open file descriptor to be nonblocking?
> 
>     F_GETFL
>         Read the file descriptor's flags.
>     F_SETFL
>         Set the file status flags part of the descriptor's flags to the
>         value specified by arg. Remaining bits (access mode, file creation
>         flags) in arg are ignored. On Linux this command can only change the
>         O_APPEND, O_NONBLOCK, O_ASYNC, and O_DIRECT flags.

   So you have to call fcntl() on the pipe's descriptor to F_GETFL the
flags, set O_NONBLOCK and call fcntl() to F_SETFL the new flags back.

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            phd at phdru.name
           Programmers don't die, they just GOSUB without RETURN.


From eliben at gmail.com  Fri Jan 18 22:40:44 2013
From: eliben at gmail.com (Eli Bendersky)
Date: Fri, 18 Jan 2013 13:40:44 -0800
Subject: [Python-ideas] PEP 3156 / Tulip question: write/send
	callback/future
In-Reply-To: <CAP7+vJJPU2rd=iMzNX+MOkU00UJj-AD-c-R=A9Lo6Y4FBhpo6A@mail.gmail.com>
References: <CAF-Rda9SPFtS6PeASzOy5e=i__aJE2rMwnquDtBBrdk=rUYFaw@mail.gmail.com>
	<CAP7+vJJPU2rd=iMzNX+MOkU00UJj-AD-c-R=A9Lo6Y4FBhpo6A@mail.gmail.com>
Message-ID: <CAF-Rda_ftvkJOzrHnJgECpKec1zTh2V7OT=HBH686uPA-FC4Rg@mail.gmail.com>

On Fri, Jan 18, 2013 at 1:02 PM, Guido van Rossum <guido at python.org> wrote:

> On Fri, Jan 18, 2013 at 6:56 AM, Eli Bendersky <eliben at gmail.com> wrote:
> > I'm looking through PEP 3156 and the Tulip code, and either something is
> > missing or I'm not looking in the right places.
> >
> > I can't find any sort of callback / future return for asynchronous
> writes,
> > e.g. in transport.
>
> I guess you should read some Twisted tutorial. :-)
>

Yes, I noticed that Twisted also doesn't have it, so I suspected that
influence.


>
> > Should there be no "data_sent" parallel to "data_received" somewhere? Or,
> > alternatively, "write" returning some sort of future that can be checked
> > later for status? For connections that aren't infinitely fast it's
> useful to
> > know when the data was actually sent/written, or alternatively if an
> error
> > has occurred. This is also important for when writing would actually
> block
> > because of full buffers. boost::asio has such a handler for async_write.
>
> The model is a little different. Glyph has convinced me that it works
> well in practice. We just buffer what is written (when it can't all be
> sent immediately). This is enough for most apps that don't serve 100MB
> files. If the buffer becomes too large, the transport will call
> .pause() on the protocol until it is drained, then it calls .resume().
> (The names of these are TBD, maybe they will end up .pause_writing()
> and .resume_writing().) There are some default behaviors that we can
> add here too, e.g. suspending the task.
>
>
I agree it can be made to work, but how would even a simple "done sending"
notification work? Or a "send error", for that matter? AFAIR, low-level
async socket APIs do provide this information. Are we confident enough that
it will never be needed to simply hide it away?

Eli
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130118/c1772aac/attachment.html>

From guido at python.org  Fri Jan 18 22:48:24 2013
From: guido at python.org (Guido van Rossum)
Date: Fri, 18 Jan 2013 13:48:24 -0800
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
Message-ID: <CAP7+vJLsnGCJMe0P=DC5OjMgkcMSP1By4fu+Ft0q-y8Jn3X=Aw@mail.gmail.com>

On Fri, Jan 18, 2013 at 12:38 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Fri, Jan 18, 2013 at 6:08 PM, Paul Moore <p.f.moore at gmail.com> wrote:
>> On 18 January 2013 05:08, Guido van Rossum <guido at python.org> wrote:
>>>> That still doesn't spell out that it's about the internet
>>>> in particular. Or is the assumption that internet connections
>>>> are the only kind that matter these days?
>>>
>>> Basically yes, in this context. The same assumption underlies
>>> socket.getaddrinfo() in the stdlib. If you have a CORBA system lying
>>> around and you want to support it, you're welcome to create the
>>> transport connection function create_corba_connection(). :-)
>>
>> To create that create_corba_connection() function, you'd be expected
>> to subclass the standard event loop, is that right?
>
> I'm not sure why CORBA would be a transport in its own right rather
> than a protocol running over a standard socket transport.

I don't know -- but I could imagine that a particular CORBA
implementation might be provided as a set of API function calls rather
than something that hooks into sockets. I don't care about CORBA, but
that was the use case I intended to highlight -- something that (for
whatever reason, no matter how misguided) doesn't use sockets and
doesn't have an underlying file descriptor you can wait on. (IIRC most
GUI frameworks also fall into that category.)

> Transports are about the communications channel
> - network sockets
> - OS pipes
> - shared memory
> - CANbus
> - protocol tunneling

Hm. I think of transports more as an abstraction of a specific set of
semantics for a communication channel -- bidirectional streams, in
particular, presumably with error correction/detection so that you can
assume that you either see what the other end sent you, in the order
in which it sent it (but not preserving buffer/packet/record
boundaries!), or you get a "broken connection" error.

Now, we may be in violent agreement here -- the transports I am
thinking of can certainly use any of the mechanisms you list as
underlying abstraction. But I wouldn't call it a transport unless it
had standardized semantics and a standardized interface with the
protocol.

(For datagrams, we need slightly different abstractions, with
different guarantees and semantics. But, again, all datagram
transports should be more or less interchangeable.)

> Transports should only be platform specific at the base layer where
> they actually need to interact with the OS through the event loop.
> Higher level transports should be connected to lower level protocols
> based on APIs provided by those transports and protocols themselves.

Yeah, well, but in practice I expect that layering transports on top
of each other is rare, and using platform specific transport
implementations is by far the common case. (Note that in theory you
could layer SSL over any unencrypted transport; but in practice (a)
few people need that, and (b) the ssl module doesn't support this --
hence I am comfortable with treating SSL as another platform-specific
transport.)

> The *whole point* of the protocol vs transport model is to allow you
> to write adaptive stacks. To use the example from PEP 3153, to
> implement full JSON-RPC support over both sockets and a HTTP-tunnel
> you need the following implemented:
>
> - TCP socket transport
> - HTTP protocol
> - HTTP-based transport
> - JSON-RPC protocol
>
> Because the transport API is standardised, the JSON-RPC protocol can
> be written once and run over HTTP using the full stack as shown, *or*
> directly over TCP by stripping out the two middle layers.

I don't know enough about JSON-RPC (shame on me!) but this sounds very
reasonable.

> The *only* layer that the event loop needs to concern itself with is
> the base transport layer - it doesn't care how many layers of
> protocols or protocol-as-transport adapters you stack on top.

True.
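Concretely, a protocol-as-transport adapter of the kind described above might look like this. The class is an illustrative sketch (the PEP defines no such class): it presents as a *protocol* to the layer below and as a *transport* to the protocol stacked on top of it.

```python
class ProtocolAsTransport:
    """Sketch of an adapter layer: protocol downward, transport upward.
    A real adapter (e.g. HTTP tunneling) would add/strip framing."""

    def __init__(self, upper_protocol_factory):
        self._upper = upper_protocol_factory()
        self._lower_transport = None

    # --- protocol side: called by the transport below ---
    def connection_made(self, transport):
        self._lower_transport = transport
        self._upper.connection_made(self)  # we are the upper layer's transport

    def data_received(self, data):
        # A real adapter would decode its framing here before passing up.
        self._upper.data_received(data)

    def connection_lost(self, exc):
        self._upper.connection_lost(exc)

    # --- transport side: called by the protocol above ---
    def write(self, data):
        # A real adapter would add its framing here before passing down.
        self._lower_transport.write(data)
```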

There's one important issue here: *constructing* the stack is not up
to the event loop. It is totally fine if the HTTP-based transport is a
3rd party package that exports a function to set up the stack, given
an event loop and a protocol to run on top (JSON-RPC in this example).
This function can have a custom signature that is not compatible with
any other transport-creating APIs in existence. (In fact this is why I
renamed create_transport() to create_connection() -- the standardized
API just has methods for creating internet connections.)

> The other thing that may not have been emphasised sufficiently is that
> the *protocol* APIs is completely dependent on the protocol involved.
> The API of a pipe protocol is not that of HTTP or CORBA or JSON-RPC or
> XML-RPC. That's why tunneling, as in the example above, requires a
> protocol-specific adapter to translate from the protocol API back to
> the standard transport API.

I'm not even sure what you mean by the protocol API. From the PEP's
POV, the "protocol API" is just the methods that the transport calls
(connection_made(), data_received(), etc.) and those certainly *are*
supposed to be standardized.

> So, for example, Greg's request for the ability to pass callbacks
> rather than needing particular method names

Hm, I have yet to respond to Greg's message, but I'm not sure that's a
reasonable request.

> can be satisfied by writing a simple callback protocol:
>
>     class CallbackProtocol:
>         """Invoke arbitrary callbacks in response to transport events"""
>         def __init__(self, on_data, on_conn, on_loss, on_eof):
>             self.on_data = on_data
>             self.on_conn = on_conn
>             self.on_loss = on_loss
>             self.on_eof = on_eof
>
>         def connection_made(self, transport):
>             self.on_conn(transport)
>
>         def data_received(self, data):
>             self.on_data(data)
>
>         def eof_received(self):
>             self.on_eof()
>
>         def connection_lost(self, exc):
>             self.on_loss(exc)

Well, except that you can't just pass CallbackProtocol where a
protocol factory is required by the PEP -- you'll have to pass a
lambda or partial function without arguments that calls
CallbackProtocol with some arguments taken from elsewhere. No big deal
though.
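A minimal sketch of that wrapping (the callback names are placeholders for whatever the user supplies; `make_factory` is a hypothetical helper, not a PEP API):

```python
import functools

class CallbackProtocol:
    """Invoke arbitrary callbacks in response to transport events."""
    def __init__(self, on_data, on_conn, on_loss, on_eof):
        self.on_data, self.on_conn = on_data, on_conn
        self.on_loss, self.on_eof = on_loss, on_eof
    def connection_made(self, transport):
        self.on_conn(transport)
    def data_received(self, data):
        self.on_data(data)
    def eof_received(self):
        self.on_eof()
    def connection_lost(self, exc):
        self.on_loss(exc)

# The PEP's APIs expect a zero-argument protocol factory, so bind the
# callbacks up front with functools.partial (or an equivalent lambda):
def make_factory(on_data, on_conn, on_loss, on_eof):
    return functools.partial(CallbackProtocol, on_data, on_conn, on_loss, on_eof)
```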

> Similarly, his request for a IOStreamProtocol would likely look a lot
> like an asynchronous version of the existing IO stack API (to handle
> encoding, buffering, etc), with the lowest layer being built on the
> transport API rather than the file API (as it is in the io module).

That sounds like an intriguing idea which I'd like to explore in the
distant future. One point of light: a transport probably already is
acceptable as a binary *output* stream, because its write() method is
not a coroutine. (This is intentional.) But doing the same for input
is harder.

> You would then be able to treat *any* transport, whether it's an SSH
> tunnel, an ordinary socket connection or a pipe to a subprocess as a
> non-seekable stream.

Right.

(TBH, I'm often not sure whether you are just explaining the PEP's
philosophy or trying to propose changes... Sorry for the confusion
this may cause.)

-- 
--Guido van Rossum (python.org/~guido)


From guido at python.org  Fri Jan 18 22:52:59 2013
From: guido at python.org (Guido van Rossum)
Date: Fri, 18 Jan 2013 13:52:59 -0800
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <20130118213734.GB19497@iskra.aviel.ru>
References: <CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CACac1F9KiWmVW=ZCriVXjcb-a+74UvvznO+8GyT79nZOE5uKHg@mail.gmail.com>
	<CACac1F8591LHqLogm3KE5ePi0Zr7zVG2rKfV06b7Na9ze_uu7w@mail.gmail.com>
	<20130118212531.GA19497@iskra.aviel.ru>
	<20130118213734.GB19497@iskra.aviel.ru>
Message-ID: <CAP7+vJLMC7DREPVKFYpYY5cW9TA_o+Db-BtTM8o-_V=24jVs=Q@mail.gmail.com>

On Fri, Jan 18, 2013 at 1:37 PM, Oleg Broytman <phd at phdru.name> wrote:
> On Sat, Jan 19, 2013 at 01:25:31AM +0400, Oleg Broytman <phd at phdru.name> wrote:
>> On Fri, Jan 18, 2013 at 09:01:32PM +0000, Paul Moore <p.f.moore at gmail.com> wrote:
>> > Hmm, I'm looking at a pipe transport on Unix, and I find I don't know
>> > enough about programming Unix. How do I set a file descriptor
>> > (specifically a pipe) in Unix to be nonblocking? For a socket,
>> > sock.setblocking(False) does the job. But for a pipe/file, the only
>> > thing I can see is the O_NONBLOCK flag to os.open/os.pipe2. Is it not
>> > possible to set an already open file descriptor to be nonblocking?
>>
>>     F_GETFL
>>         Read the file descriptor's flags.
>>     F_SETFL
>>         Set the file status flags part of the descriptor's flags to the
>>         value specified by arg. Remaining bits (access mode, file creation
>>         flags) in arg are ignored. On Linux this command can only change the
>>         O_APPEND, O_NONBLOCK, O_ASYNC, and O_DIRECT flags.
>
>    So you have to call fcntl() on the pipe's descriptor to F_GETFL
> flags, set O_NONBLOCK and call fcntl() to F_SETFL the new flags back.

Here's my code for this:

import fcntl
import os

def _setnonblocking(fd):
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

-- 
--Guido van Rossum (python.org/~guido)


From p.f.moore at gmail.com  Fri Jan 18 23:07:23 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 18 Jan 2013 22:07:23 +0000
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <20130118213734.GB19497@iskra.aviel.ru>
References: <CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CACac1F9KiWmVW=ZCriVXjcb-a+74UvvznO+8GyT79nZOE5uKHg@mail.gmail.com>
	<CACac1F8591LHqLogm3KE5ePi0Zr7zVG2rKfV06b7Na9ze_uu7w@mail.gmail.com>
	<20130118212531.GA19497@iskra.aviel.ru>
	<20130118213734.GB19497@iskra.aviel.ru>
Message-ID: <CACac1F8cz7syqjZSRaWVRc2-cjx5H9x9NOhHH0JD_EonFT=A3w@mail.gmail.com>

On 18 January 2013 21:37, Oleg Broytman <phd at phdru.name> wrote:
>    So you have to call fcntl() on the pipe's descriptor to F_GETFL
> flags, set O_NONBLOCK and call fcntl() to F_SETFL the new flags back.

Ah, excellent. Thanks for the information - I'll use that in my code.
Paul


From guido at python.org  Fri Jan 18 23:15:07 2013
From: guido at python.org (Guido van Rossum)
Date: Fri, 18 Jan 2013 14:15:07 -0800
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <50F8F725.20505@canterbury.ac.nz>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<CACac1F9-W6VdTU=+=u488YbopkSrG_dE3Y-uNo9hzqS72pesHQ@mail.gmail.com>
	<CAP7+vJJE3AQNaGmphemk9TtaVLM+DgeMbHqbA4wt5Ma_qSQgWA@mail.gmail.com>
	<CACac1F99MGhWT-D_vSX+d8Os+a0CZQv8r7dh4v6zRrWw8Nmp-Q@mail.gmail.com>
	<50F8F725.20505@canterbury.ac.nz>
Message-ID: <CAP7+vJ+jUqK6JBoLbYKWQDTpntwWWXpJa8qTZp3GYkyKv_ArkQ@mail.gmail.com>

On Thu, Jan 17, 2013 at 11:17 PM, Greg Ewing
<greg.ewing at canterbury.ac.nz> wrote:
> Paul Moore wrote:
>>
>> PS From the PEP, it seems that a protocol must implement the 4 methods
>> connection_made, data_received, eof_received and connection_lost. For
>> a process, which has 2 output streams involved, a single data_received
>> method isn't enough.

> It looks like there would have to be at least two Transport instances
> involved, one for stdin/stdout and one for stderr.
>
> Connecting them both to a single Protocol object doesn't seem to be
> possible with the framework as defined. You would have to use a
> couple of adapter objects to translate the data_received calls into
> calls on different methods of another object.

So far this makes sense.

But for this specific case there's a simpler solution -- require the
protocol to support a few extra methods, in particular,
err_data_received() and err_eof_received(), which are to stderr what
data_received() and eof_received() are for stdout. (After all, the
point of a subprocess is that "normal" data goes to stdout.) There's
only one input stream to the subprocess, so there's no ambiguity for
write(), and neither is there a need for multiple
connection_made()/lost() methods. (However, we could argue endlessly
over whether connection_lost() should be called when the subprocess
exits, or when the other side of all three pipes is closed. :-)
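A sketch of what such a subprocess protocol interface might look like. The err_* method names are the provisional ones floated above, not a settled API, and the chunk-accumulating bodies are purely illustrative:

```python
class SubprocessProtocol:
    """Provisional interface sketch; method names are not final."""

    def __init__(self):
        self.out_chunks = []
        self.err_chunks = []

    def connection_made(self, transport):
        self.transport = transport       # one transport; write() feeds stdin

    def data_received(self, data):
        self.out_chunks.append(data)     # "normal" output: stdout

    def err_data_received(self, data):
        self.err_chunks.append(data)     # proposed stderr counterpart

    def eof_received(self):
        pass                             # stdout closed

    def err_eof_received(self):
        pass                             # stderr closed

    def connection_lost(self, exc):
        pass                             # process exited (or pipes closed)
```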

> This sort of thing would be easier if, instead of the Transport calling
> a predefined method of the Protocol, the Protocol installed a callback
> into the Transport. Then a Protocol designed for dealing with subprocesses
> could hook different methods of itself into a pair of Transports.

Hm. Not excited. I like everyone using the same names for these
callback methods, so that a reader (who is familiar with the
transport/protocol API) can instantly know what kind of callback it is
and what its arguments are. (But see Nick's simple solution for having
your cake and eating it, too.)

> Stepping back a bit, I must say that from the coroutine viewpoint,
> the Protocol/Transport stuff just seems to get in the way. If I were
> writing coroutine-based code to deal with a subprocess, I would want
> to be able to write coroutines like
>
>    def handle_output(stdout):
>       while 1:
>          line = yield from stdout.readline()
>          if not line:
>             break
>          mungulate_line(line)
>
>    def handle_errors(stderr):
>       while 1:
>          line = yield from stderr.readline()
>          if not line:
>             break
>          complain_to_user(line)
>
> In other words, I don't want Transports or Protocols or any of that
> cruft, I just want a simple pair of async stream objects that I can
> read and write using yield-from calls. There doesn't seem to be
> anything like that specified in PEP 3156.

This is a good observation -- one that I've made myself as well. I
also have a plan for dealing with it -- but I haven't coded it up
properly yet and consequently I haven't written it up for the PEP yet
either.

The idea is that there will be some even-higher-level functions for
tasks to call to open connections (etc.) which just give you two
unidirectional streams (one for reading, one for writing). The
write-stream can just be the transport (its write() and writelines()
methods are familiar from regular I/O streams) and the read-stream can
be a StreamReader -- a class I've written but which needs to be moved
into a better place:
http://code.google.com/p/tulip/source/browse/tulip/http_client.py#37

Anyway, the reason for having the transport/protocol abstractions in
the middle is so that other frameworks can ignore coroutines if they
want to -- all they have to do is work with Futures, which can be
fully controlled through callbacks (which are native at the lowest
level of almost all frameworks, including Tulip / PEP 3156).

> It does mention something about implementing a streaming buffer on
> top of a Transport, but in a way that makes it sound like a suggested
> recipe rather than something to be provided by the library. Also it
> seems like a lot of layers of overhead to go through.

It'll be in the stdlib, no worries. I don't expect the overhead to be a problem.

> On the whole, in PEP 3156 the idea of providing callback-based
> interfaces with yield-from-based ones built on top has been
> pushed way further up the stack than I imagined it would. I don't
> want to be *forced* to write my coroutine code at the level of
> Protocols; I want to be able to work at a lower level than that.

You can write an alternative framework using coroutines and callbacks,
bypassing transports and protocols. (You'll still need Futures.)
However you'd be missing the interoperability offered by the
protocol/transport abstractions: in an IOCP world you'd have to
interact with the event loop's callbacks differently than in a
select/poll/etc. world.

PEP 3156 is trying to make different groups happy: people who like
callbacks, people who like coroutines; people who like UNIX, people
who like Windows. Everybody may have to compromise a little bit, but
the reward will (hopefully) be better portability and better
interoperability.

-- 
--Guido van Rossum (python.org/~guido)


From guido at python.org  Fri Jan 18 23:22:34 2013
From: guido at python.org (Guido van Rossum)
Date: Fri, 18 Jan 2013 14:22:34 -0800
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CADiSq7fxVexvi4enU=AA=gEtUydkgwFTtu=TX9yFrDe3tnkb8w@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CACac1F9KiWmVW=ZCriVXjcb-a+74UvvznO+8GyT79nZOE5uKHg@mail.gmail.com>
	<CADiSq7fxVexvi4enU=AA=gEtUydkgwFTtu=TX9yFrDe3tnkb8w@mail.gmail.com>
Message-ID: <CAP7+vJJx9b5tK9NE6kJx8AQHJPDGEdR9J9tJb-DikqdjJ4hrAA@mail.gmail.com>

On Fri, Jan 18, 2013 at 3:55 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Fri, Jan 18, 2013 at 7:33 PM, Paul Moore <p.f.moore at gmail.com> wrote:
>> On 18 January 2013 08:38, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> I have now (finally!) got Guido's point that implementing a process
>> protocol will give me a good insight into how this stuff is meant to
>> work. I'm still struggling to understand why he thinks it needs a
>> dedicated method on the event loop, rather than being a higher-level
>> layer like you're suggesting, but I'm at least starting to understand
>> what questions to ask.
>
> The creation of the pipe transport needs to be on the event loop,
> precisely because of cross-platform differences when it comes to
> Windows. On *nix, on the other hand, the pipe transport should look an
> awful lot like the socket transport and thus be able to use the
> existing file descriptor based interfaces on the event loop.

Thanks for clarifying that -- I'm behind on this thread!

> The protocol part is then about adapting the transport API to
> coroutine friendly readlines/writelines API (the part that Guido
> points out needs more detail in
> http://www.python.org/dev/peps/pep-3156/#coroutines-and-protocols)
>
> As a rough untested sketch (the buffering here could likely be a lot smarter):

I have a more-or-less working but probably incomplete version checked
into the tulip repo:
http://code.google.com/p/tulip/source/browse/tulip/subprocess_transport.py

Note that this completely ignores stderr -- this makes the code
simpler while still useful (there's plenty of useful stuff you can do
without reading stderr), and avoids the questions Greg Ewing brought
up about needing two transports (one for stdout, another for stderr).

-- 
--Guido van Rossum (python.org/~guido)


From guido at python.org  Fri Jan 18 23:25:10 2013
From: guido at python.org (Guido van Rossum)
Date: Fri, 18 Jan 2013 14:25:10 -0800
Subject: [Python-ideas] PEP 3156 / Tulip question: write/send
	callback/future
In-Reply-To: <CAF-Rda_ftvkJOzrHnJgECpKec1zTh2V7OT=HBH686uPA-FC4Rg@mail.gmail.com>
References: <CAF-Rda9SPFtS6PeASzOy5e=i__aJE2rMwnquDtBBrdk=rUYFaw@mail.gmail.com>
	<CAP7+vJJPU2rd=iMzNX+MOkU00UJj-AD-c-R=A9Lo6Y4FBhpo6A@mail.gmail.com>
	<CAF-Rda_ftvkJOzrHnJgECpKec1zTh2V7OT=HBH686uPA-FC4Rg@mail.gmail.com>
Message-ID: <CAP7+vJ+Sd4u4gP0vqeiT6GYn9CdSqn5zWmi0aReoy+y2pnb4Pg@mail.gmail.com>

On Fri, Jan 18, 2013 at 1:40 PM, Eli Bendersky <eliben at gmail.com> wrote:
> On Fri, Jan 18, 2013 at 1:02 PM, Guido van Rossum <guido at python.org> wrote:
>>
>> On Fri, Jan 18, 2013 at 6:56 AM, Eli Bendersky <eliben at gmail.com> wrote:
>> > I'm looking through PEP 3156 and the Tulip code, and either something is
>> > missing or I'm not looking in the right places.
>> >
>> > I can't find any sort of callback / future return for asynchronous
>> > writes,
>> > e.g. in transport.
>>
>> I guess you should read some Twisted tutorial. :-)
>
>
> Yes, I noticed that Twisted also doesn't have it, so I suspected that
> influence.
>
>>
>>
>> > Should there be no "data_sent" parallel to "data_received" somewhere?
>> > Or,
>> > alternatively, "write" returning some sort of future that can be checked
>> > later for status? For connections that aren't infinitely fast it's
>> > useful to
>> > know when the data was actually sent/written, or alternatively if an
>> > error
>> > has occurred. This is also important for when writing would actually
>> > block
>> > because of full buffers. boost::asio has such a handler for async_write.
>>
>> The model is a little different. Glyph has convinced me that it works
>> well in practice. We just buffer what is written (when it can't all be
>> sent immediately). This is enough for most apps that don't serve 100MB
>> files. If the buffer becomes too large, the transport will call
>> .pause() on the protocol until it is drained, then it calls .resume().
>> (The names of these are TBD, maybe they will end up .pause_writing()
>> and .resume_writing().) There are some default behaviors that we can
>> add here too, e.g. suspending the task.
>>
>
> I agree it can be made to work, but how would even simple "done sending"
> notification work? Or "send error" for that matter? AFAIR, low-level socket
> async API do provide this information. Are we confident enough it will never
> be needed to simply hide it away?

AFAIK the Twisted folks have found that most of the time (basically
all of the time) you don't need a positive "done sending"
notification; when the send eventually *fails*, the transport calls
the protocol's connection_lost() method with an exception indicating
what failed.
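The buffer-and-flow-control model described above can be sketched like this (all names are provisional, as noted in the thread -- .pause()/.resume() may yet become .pause_writing()/.resume_writing(); the real transport would also try the socket before buffering):

```python
# Sketch of buffered writes with flow control: the transport buffers
# data it cannot send immediately, pauses the protocol past a
# high-water mark, and resumes it once the buffer drains.

class BufferedTransport:
    HIGH_WATER = 64 * 1024

    def __init__(self, protocol):
        self._protocol = protocol
        self._buffer = b""
        self._paused = False

    def write(self, data):
        self._buffer += data
        if len(self._buffer) > self.HIGH_WATER and not self._paused:
            self._paused = True
            self._protocol.pause()

    def _drain(self, nbytes):
        # Called by the event loop when the fd becomes writable and
        # nbytes were actually sent.
        self._buffer = self._buffer[nbytes:]
        if not self._buffer and self._paused:
            self._paused = False
            self._protocol.resume()

class Protocol:
    def __init__(self):
        self.paused = False
    def pause(self):
        self.paused = True
    def resume(self):
        self.paused = False

p = Protocol()
t = BufferedTransport(p)
t.write(b"x" * (65 * 1024))
print(p.paused)   # True: buffer exceeded the high-water mark
t._drain(65 * 1024)
print(p.paused)   # False: buffer drained, writing resumed
```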

-- 
--Guido van Rossum (python.org/~guido)


From p.f.moore at gmail.com  Fri Jan 18 23:32:17 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 18 Jan 2013 22:32:17 +0000
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CAP7+vJ+caQp+qzsBtPGHv0QO9aLWQiU3RCba6nWKj9JOMf_nig@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CAP7+vJ+caQp+qzsBtPGHv0QO9aLWQiU3RCba6nWKj9JOMf_nig@mail.gmail.com>
Message-ID: <CACac1F-WMaqQdWb14uA9M89X9+6cTWz-L5UyS61WkjX82UVqNQ@mail.gmail.com>

On 18 January 2013 21:24, Guido van Rossum <guido at python.org> wrote:
>> To create that create_corba_connection() function, you'd be expected
>> to subclass the standard event loop, is that right?
>
> No, it doesn't need to be a method on the event loop at all. It can
> just be a function in a different package; it can use
> events.get_current_event_loop() to reference the event loop.

Aargh. I'm confused again! (I did warn you about dumb questions, didn't I? :-))

The event loop implementation contains the code that does the OS-level
poll for events to process. (In tulip, that is handled by the selector
object, but that's not mentioned in the PEP so I assume it should be
considered an implementation detail). So, the event loop has to define
what types of (OS-level) objects can be registered. At the moment,
event loops only handle sockets (via select/poll/etc) and even the raw
add_reader methods are not for end user use.

So a standalone create_corba_connection function can certainly get the
event loop using get_current_event_loop(), but it has no means of
asking the event loop to poll the CORBA connection it creates for new
messages. Without direct access to the selector (or equivalent) it
can't add the extra event source. (Unless that source is a pollable
file descriptor and it's willing to play with the optional add_reader
methods, but that's not a "new event source" then...) The same problem
will likely occur if you try to integrate Windows GUI events (you
check for a GUI message by calling a Windows API).

I don't think this matters except in obscure cases (it's likely a huge
case of YAGNI) but I genuinely don't understand how you can say that
create_corba_connection() could be written as a standalone function,
and yet that create_connection() has to be a method of the event loop.
That's what I'm getting at when I keep saying that I see you treating
sockets as "special". There's clearly something I'm missing in your
thinking, and it keeps tripping me up.

Paul.


From p.f.moore at gmail.com  Fri Jan 18 23:48:36 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 18 Jan 2013 22:48:36 +0000
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CAP7+vJJx9b5tK9NE6kJx8AQHJPDGEdR9J9tJb-DikqdjJ4hrAA@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CACac1F9KiWmVW=ZCriVXjcb-a+74UvvznO+8GyT79nZOE5uKHg@mail.gmail.com>
	<CADiSq7fxVexvi4enU=AA=gEtUydkgwFTtu=TX9yFrDe3tnkb8w@mail.gmail.com>
	<CAP7+vJJx9b5tK9NE6kJx8AQHJPDGEdR9J9tJb-DikqdjJ4hrAA@mail.gmail.com>
Message-ID: <CACac1F-voxCSgR67jWOdCKNQQdq1XKZD6xXO28kiZ7_RgTxhxQ@mail.gmail.com>

On 18 January 2013 22:22, Guido van Rossum <guido at python.org> wrote:
>> The protocol part is then about adapting the transport API to
>> coroutine friendly readlines/writelines API (the part that Guido
>> points out needs more detail in
>> http://www.python.org/dev/peps/pep-3156/#coroutines-and-protocols)
>>
>> As a rough untested sketch (the buffering here could likely be a lot smarter):
>
> I have a more-or-less working but probably incomplete version checked
> into the tulip repo:
> http://code.google.com/p/tulip/source/browse/tulip/subprocess_transport.py

Ha! You beat me to it.

OK, looking at your code, I see that you freely used the
add_reader/add_writer functions and friends, and the fact that the
Unix selectors handle pipes as well as sockets. With the freedom to do
that, your code looks both reasonable and pretty straightforward. I
was having trouble getting past the fact that this approach wouldn't
work on Windows, and confusing "nonportable" with "not allowed". My
apologies. You kept telling me that writing the code for Unix would be
helpful, but I kept thinking in terms of writing code that worked on
Unix but with portability to Windows in mind, which completely misses
the point. I knew that the transport/protocol code I'd end up writing
would look something like this, but TBH I'd not seen that as the
interesting part of the problem...

BTW, to avoid duplication of the fork/exec stuff, I would probably
have written the transport to take a subprocess.Popen object as its
only argument, then hooked up self._wstdin to popen.stdin and
self._rstdout to popen.stdout. That requires the user to have created
the Popen object with those file descriptors as pipes (I don't know if
it's possible to introspect a Popen object to check that) but avoids
duplicating the subprocess logic. I can probably fairly quickly modify
your code to demonstrate, but it's late and I don't want to start
booting my Unix environment now, so it'll have to wait till tomorrow
:-)
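(For the introspection question: a rough sketch along these lines is possible, since Popen exposes .stdin/.stdout as None unless the process was created with stdin=PIPE/stdout=PIPE. This is only a sketch of the wrapping idea, not a real transport -- a real one would register the pipe fds with the event loop.)

```python
import subprocess
import sys

class PopenTransport:
    """Sketch: wrap an existing subprocess.Popen instead of
    duplicating the fork/exec logic inside the transport."""

    def __init__(self, popen):
        # Popen.stdin/.stdout are None unless created with PIPE,
        # so we can introspect and reject unsuitable objects.
        if popen.stdin is None or popen.stdout is None:
            raise ValueError("Popen object must use PIPE for stdin/stdout")
        self._popen = popen
        self._wstdin = popen.stdin
        self._rstdout = popen.stdout

    def write(self, data):
        self._wstdin.write(data)
        self._wstdin.flush()

proc = subprocess.Popen(
    [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE)
t = PopenTransport(proc)
t.write(b"hello\n")
proc.stdin.close()
print(proc.stdout.read())  # b'hello\n'
proc.wait()
```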

Paul.


From guido at python.org  Fri Jan 18 23:51:41 2013
From: guido at python.org (Guido van Rossum)
Date: Fri, 18 Jan 2013 14:51:41 -0800
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CACac1F-WMaqQdWb14uA9M89X9+6cTWz-L5UyS61WkjX82UVqNQ@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CAP7+vJ+caQp+qzsBtPGHv0QO9aLWQiU3RCba6nWKj9JOMf_nig@mail.gmail.com>
	<CACac1F-WMaqQdWb14uA9M89X9+6cTWz-L5UyS61WkjX82UVqNQ@mail.gmail.com>
Message-ID: <CAP7+vJLsCRwt0QyuFnZtb_UO8dWToKZme=cxBXxELuu-9-8QPA@mail.gmail.com>

On Fri, Jan 18, 2013 at 2:32 PM, Paul Moore <p.f.moore at gmail.com> wrote:
> On 18 January 2013 21:24, Guido van Rossum <guido at python.org> wrote:
>>> To create that create_corba_connection() function, you'd be expected
>>> to subclass the standard event loop, is that right?
>>
>> No, it doesn't need to be a method on the event loop at all. It can
>> just be a function in a different package; it can use
>> events.get_current_event_loop() to reference the event loop.
>
> Aargh. I'm confused again! (I did warn you about dumb questions, didn't I? :-))
>
> The event loop implementation contains the code that does the OS-level
> poll for events to process. (In tulip, that is handled by the selector
> object, but that's not mentioned in the PEP so I assume it should be
> considered an implementation detail). So, the event loop has to define
> what types of (OS-level) objects can be registered. At the moment,
> event loops only handle sockets (via select/poll/etc) and even the raw
> add_reader methods are not for end user use.

Well, *on UNIX* the event loop also handles other file descriptors,
and there's nothing to actually *prevent* an end user using
add_reader. It just may not work when their code is run on Windows,
but then it probably won't run on Windows anyway. :-)

> So a standalone create_corba_connection function can certainly get the
> event loop using get_current_event_loop(), but it has no means of
> asking the event loop to poll the CORBA connection it creates for new
> messages.

Right, unless it is in on the conspiracy between the event loop and
the selector (IOW if it is effectively aware and/or part of the event
loop implementation for the specific platform).

> Without direct access to the selector (or equivalent) it
> can't add the extra event source. (Unless that source is a pollable
> file descriptor and it's willing to play with the optional add_reader
> methods, but that's not a "new event source" then...) The same problem
> will likely occur if you try to integrate Windows GUI events (you
> check for a GUI message by calling a Windows API).

Let's say that you are thinking through the example much farther than
I had intended... :-)

> I don't think this matters except in obscure cases (it's likely a huge
> case of YAGNI) but I genuinely don't understand how you can say that
> create_corba_connection() could be written as a standalone function,
> and yet that create_connection() has to be a method of the event loop.
> That's what I'm getting at when I keep saying that I see you treating
> sockets as "special". There's clearly something I'm missing in your
> thinking, and it keeps tripping me up.

Let's assume that create_corba_connection() actually *can* be written
using add_reader(), but only on UNIX. So the app is limited to UNIX,
and in that context create_corba_connection() can be a function in
another package.

It's not so much that create_connection() *must* be a method on the
event loop. It's just that I *want* it to be a method on the event
loop so you will be able to write user code that is portable between
UNIX and Windows. It will call create_connection(), which is a
portable API with two platform-specific implementations; on Windows
(when using IOCP) it will return an instance of, say,
_IocpSocketTransport(), while on UNIX it returns a
_UnixSocketTransport() instance.

But we have no hope of making create_corba_connection() work on Windows (in
my example -- please just play along) and hence there is no need to
make it a method of the event loop.
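As a toy illustration of that "portable method, platform-specific implementation" point (the transport class names come straight from the example above and are purely illustrative):

```python
import sys

# User code calls one portable entry point; which transport class it
# gets back depends on the platform underneath, and it never needs to
# know which one it got.

class _UnixSocketTransport:
    pass

class _IocpSocketTransport:
    pass

def create_connection(protocol_factory, host, port):
    cls = (_IocpSocketTransport if sys.platform == "win32"
           else _UnixSocketTransport)
    transport = cls()
    protocol = protocol_factory()
    return transport, protocol

transport, protocol = create_connection(object, "example.com", 80)
print(type(transport).__name__)
```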

-- 
--Guido van Rossum (python.org/~guido)


From guido at python.org  Fri Jan 18 23:53:15 2013
From: guido at python.org (Guido van Rossum)
Date: Fri, 18 Jan 2013 14:53:15 -0800
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CACac1F-voxCSgR67jWOdCKNQQdq1XKZD6xXO28kiZ7_RgTxhxQ@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CACac1F9KiWmVW=ZCriVXjcb-a+74UvvznO+8GyT79nZOE5uKHg@mail.gmail.com>
	<CADiSq7fxVexvi4enU=AA=gEtUydkgwFTtu=TX9yFrDe3tnkb8w@mail.gmail.com>
	<CAP7+vJJx9b5tK9NE6kJx8AQHJPDGEdR9J9tJb-DikqdjJ4hrAA@mail.gmail.com>
	<CACac1F-voxCSgR67jWOdCKNQQdq1XKZD6xXO28kiZ7_RgTxhxQ@mail.gmail.com>
Message-ID: <CAP7+vJJ49DMH8v7odgBVU2QQkMytMD2dsbyU62DtMjOyYw0nxw@mail.gmail.com>

On Fri, Jan 18, 2013 at 2:48 PM, Paul Moore <p.f.moore at gmail.com> wrote:
> On 18 January 2013 22:22, Guido van Rossum <guido at python.org> wrote:
>>> The protocol part is then about adapting the transport API to
>>> coroutine friendly readlines/writelines API (the part that Guido
>>> points out needs more detail in
>>> http://www.python.org/dev/peps/pep-3156/#coroutines-and-protocols)
>>>
>>> As a rough untested sketch (the buffering here could likely be a lot smarter):
>>
>> I have a more-or-less working but probably incomplete version checked
>> into the tulip repo:
>> http://code.google.com/p/tulip/source/browse/tulip/subprocess_transport.py
>
> Ha! You beat me to it.
>
> OK, looking at your code, I see that you freely used the
> add_reader/add_writer functions and friends, and the fact that the
> Unix selectors handle pipes as well as sockets. With the freedom to do
> that, your code looks both reasonable and pretty straightforward. I
> was having trouble getting past the fact that this approach wouldn't
> work on Windows, and confusing "nonportable" with "not allowed". My
> apologies. You kept telling me that writing the code for Unix would be
> helpful, but I kept thinking in terms of writing code that worked on
> Unix but with portability to Windows in mind, which completely misses
> the point. I knew that the transport/protocol code I'd end up writing
> would look something like this, but TBH I'd not seen that as the
> interesting part of the problem...

Glad you've got it now!

> BTW, to avoid duplication of the fork/exec stuff, I would probably
> have written the transport to take a subprocess.Popen object as its
> only argument, then hooked up self._wstdin to popen.stdin and
> self._rstdout to popen.stdout. That requires the user to have created
> the Popen object with those file descriptors as pipes (I don't know if
> it's possible to introspect a Popen object to check that) but avoids
> duplicating the subprocess logic. I can probably fairly quickly modify
> your code to demonstrate, but it's late and I don't want to start
> booting my Unix environment now, so it'll have to wait till tomorrow
> :-)

I would love for you to create that version. I only checked it in so I
could point to it -- I am not happy with either the implementation,
the API spec, or the unit test...

-- 
--Guido van Rossum (python.org/~guido)


From p.f.moore at gmail.com  Fri Jan 18 23:53:54 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 18 Jan 2013 22:53:54 +0000
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CAP7+vJLsCRwt0QyuFnZtb_UO8dWToKZme=cxBXxELuu-9-8QPA@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CAP7+vJ+caQp+qzsBtPGHv0QO9aLWQiU3RCba6nWKj9JOMf_nig@mail.gmail.com>
	<CACac1F-WMaqQdWb14uA9M89X9+6cTWz-L5UyS61WkjX82UVqNQ@mail.gmail.com>
	<CAP7+vJLsCRwt0QyuFnZtb_UO8dWToKZme=cxBXxELuu-9-8QPA@mail.gmail.com>
Message-ID: <CACac1F_2pCHSB0zZo+mW-Zgchj6zLGrk6iDWWF2S2MhSSL5mjg@mail.gmail.com>

On 18 January 2013 22:51, Guido van Rossum <guido at python.org> wrote:
> It's not so much that create_connection() *must* be a method on the
> event loop. It's just that I *want* it to be a method on the event
> loop so you will be able to write user code that is portable between
> UNIX and Windows. It will call create_connection(), which is a
> portable API with two platform-specific implementations; on Windows
> (when using IOCP) it will return an instance of, say,
> _IocpSocketTransport(), while on UNIX it returns a
> _UnixSocketTransport() instance.
>
> But we have no hope of making create_corba_connection() on Windows (in
> my example -- please just play along) and hence there is no need to
> make it a method of the event loop.

Ah, OK. I've got it now, thanks!
Paul


From p.f.moore at gmail.com  Fri Jan 18 23:57:45 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Fri, 18 Jan 2013 22:57:45 +0000
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CAP7+vJ+jUqK6JBoLbYKWQDTpntwWWXpJa8qTZp3GYkyKv_ArkQ@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<CACac1F9-W6VdTU=+=u488YbopkSrG_dE3Y-uNo9hzqS72pesHQ@mail.gmail.com>
	<CAP7+vJJE3AQNaGmphemk9TtaVLM+DgeMbHqbA4wt5Ma_qSQgWA@mail.gmail.com>
	<CACac1F99MGhWT-D_vSX+d8Os+a0CZQv8r7dh4v6zRrWw8Nmp-Q@mail.gmail.com>
	<50F8F725.20505@canterbury.ac.nz>
	<CAP7+vJ+jUqK6JBoLbYKWQDTpntwWWXpJa8qTZp3GYkyKv_ArkQ@mail.gmail.com>
Message-ID: <CACac1F8FD5pBxE5Lv+s04qi=CzaCVM2svn3OcF4G4dG90p51Fg@mail.gmail.com>

On 18 January 2013 22:15, Guido van Rossum <guido at python.org> wrote:
> But for this specific case there's a simpler solution -- require the
> protocol to support a few extra methods, in particular,
> err_data_received() and err_eof_received(), which are to stderr what
> data_received() and eof_received() are for stdout. (After all, the
> point of a subprocess is that "normal" data goes to stdout.) There's
> only one input stream to the subprocess, so there's no ambiguity for
> write(), and neither is there a need for multiple
> connection_made()/lost() methods. (However, we could argue endlessly
> over whether connection_lost() should be called when the subprocess
> exits, or when the other side of all three pipes is closed. :-)

While I don't really care about arguing over *when* connection_lost
should be called, it *is* relevant to my thinking that getting
notified when the process exits doesn't seem to me to be possible -
again it's the issue that the transport can't ask the event loop to
poll for anything that the event loop isn't already coded to check. So
(once again, unless I've missed something) the only viable option for
a standalone transport is to call connection_lost when all the pipes
are closed.

Am I still missing something?
Paul


From guido at python.org  Sat Jan 19 00:01:54 2013
From: guido at python.org (Guido van Rossum)
Date: Fri, 18 Jan 2013 15:01:54 -0800
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CACac1F8FD5pBxE5Lv+s04qi=CzaCVM2svn3OcF4G4dG90p51Fg@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<CACac1F9-W6VdTU=+=u488YbopkSrG_dE3Y-uNo9hzqS72pesHQ@mail.gmail.com>
	<CAP7+vJJE3AQNaGmphemk9TtaVLM+DgeMbHqbA4wt5Ma_qSQgWA@mail.gmail.com>
	<CACac1F99MGhWT-D_vSX+d8Os+a0CZQv8r7dh4v6zRrWw8Nmp-Q@mail.gmail.com>
	<50F8F725.20505@canterbury.ac.nz>
	<CAP7+vJ+jUqK6JBoLbYKWQDTpntwWWXpJa8qTZp3GYkyKv_ArkQ@mail.gmail.com>
	<CACac1F8FD5pBxE5Lv+s04qi=CzaCVM2svn3OcF4G4dG90p51Fg@mail.gmail.com>
Message-ID: <CAP7+vJ+rb3OENt+Oa89=OOpY97OMky2Eufy+iVnrk4c6A2AHWQ@mail.gmail.com>

On Fri, Jan 18, 2013 at 2:57 PM, Paul Moore <p.f.moore at gmail.com> wrote:
> On 18 January 2013 22:15, Guido van Rossum <guido at python.org> wrote:
>> But for this specific case there's a simpler solution -- require the
>> protocol to support a few extra methods, in particular,
>> err_data_received() and err_eof_received(), which are to stderr what
>> data_received() and eof_received() are for stdout. (After all, the
>> point of a subprocess is that "normal" data goes to stdout.) There's
>> only one input stream to the subprocess, so there's no ambiguity for
>> write(), and neither is there a need for multiple
>> connection_made()/lost() methods. (However, we could argue endlessly
>> over whether connection_lost() should be called when the subprocess
>> exits, or when the other side of all three pipes is closed. :-)
>
> While I don't really care about arguing over *when* connection_lost
> should be called, it *is* relevant to my thinking that getting
> notified when the process exits doesn't seem to me to be possible -
> again it's the issue that the transport can't ask the event loop to
> poll for anything that the event loop isn't already coded to check. So
> (once again, unless I've missed something) the only viable option for
> a standalone transport is to call connection_lost when all the pipes
> are closed.

That is typically how these things are done (e.g. popen and subprocess
work this way). It is also probably the most useful, since it is
*possible* that the parent process forks a child and then exits
itself, where the child does all the work of the pipeline.

> Am I still missing something?

I believe it is, at least in theory, possible to implement waiting for
the process to exit, using signals. The event loop can add signal
handlers, and there is a signal that gets sent upon child process
exit. There are lots of problems here (what if some other piece of
code forked that process) but we could come up with reasonable
solutions for these.

However waiting for the pipes closing makes the most sense, so no need
to bother. :-)
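(For what it's worth, the signal-based approach could be sketched like this on Unix. It has exactly the races mentioned above -- the handler reaps *any* exited child, not just ours -- and it bypasses the event loop entirely, so it is only meant to show the mechanism.)

```python
import os
import signal
import subprocess
import sys
import time

# Unix-only sketch: get notified of child process exit via SIGCHLD,
# reaping exited children with a non-blocking waitpid in the handler.

exited = {}

def on_sigchld(signum, frame):
    # Reap all exited children without blocking.
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break
        if pid == 0:
            break
        exited[pid] = status

signal.signal(signal.SIGCHLD, on_sigchld)

child = subprocess.Popen([sys.executable, "-c", "pass"])
deadline = time.time() + 5
while child.pid not in exited and time.time() < deadline:
    time.sleep(0.01)
print(child.pid in exited)  # True once the child has been reaped
```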

-- 
--Guido van Rossum (python.org/~guido)


From greg.ewing at canterbury.ac.nz  Sat Jan 19 00:12:34 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 19 Jan 2013 12:12:34 +1300
Subject: [Python-ideas] Regarding 'const' and 'require' statements
In-Reply-To: <D88D6CBD04B9E647B778917814A70F9A9240A9CC@USSLMMBX003.net.plm.eds.com>
References: <D88D6CBD04B9E647B778917814A70F9A9240A9CC@USSLMMBX003.net.plm.eds.com>
Message-ID: <50F9D6E2.9020703@canterbury.ac.nz>

Harding, James wrote:
> The name ST_MODE in this example is not a bindable name. The name only 
> lives during compilation and is not accessible at run-time.

I don't think that's a good idea. It would be better for it
to be available at run time like a normal module-level name,
but protected from rebinding.

There may be cases where the compiler can't work out the
value, such as when the module is imported dynamically. Such
code would then continue to work, it just wouldn't be
optimised.

Not having the name present at run time could also lead to
unexpected results. If something tries to rebind the name,
it will succeed, but it won't affect compiler-optimised
code using the name. It would be better if attempting to
rebind a const name raised an exception.

-- 
Greg


From greg.ewing at canterbury.ac.nz  Sat Jan 19 00:21:02 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 19 Jan 2013 12:21:02 +1300
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CACac1F8591LHqLogm3KE5ePi0Zr7zVG2rKfV06b7Na9ze_uu7w@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CACac1F9KiWmVW=ZCriVXjcb-a+74UvvznO+8GyT79nZOE5uKHg@mail.gmail.com>
	<CACac1F8591LHqLogm3KE5ePi0Zr7zVG2rKfV06b7Na9ze_uu7w@mail.gmail.com>
Message-ID: <50F9D8DE.9040003@canterbury.ac.nz>

Paul Moore wrote:
> Is it not
> possible to set an already open file descriptor to be nonblocking? If
> that's the case, it means that Unix has the same problem as I suspect
> exists for Windows - existing pipes and filehandles can't be used in
> async code as they won't necessarily be in nonblocking mode.

No, it doesn't -- a fd doesn't *have* to be non-blocking in
order to use it with select/poll/whatever.

Sometimes people do, but only to allow a performance optimisation
by attempting another read before going back to the event loop,
just in case more data came in while you were processing the
first lot. But doing that is entirely optional.

Having said that, fcntl() is usually the way to change the
O_NONBLOCK flag of an already-opened fd, although the details
may vary from one unix to another.
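As a concrete sketch (CPython on Unix; the usual read-modify-write of the status flags, as described above):

```python
import fcntl
import os

def set_nonblocking(fd, nonblocking=True):
    """Set or clear O_NONBLOCK on an already-opened file descriptor."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    if nonblocking:
        flags |= os.O_NONBLOCK
    else:
        flags &= ~os.O_NONBLOCK
    fcntl.fcntl(fd, fcntl.F_SETFL, flags)

# Demonstrate on a fresh pipe, which starts out blocking.
r, w = os.pipe()
set_nonblocking(r)
assert fcntl.fcntl(r, fcntl.F_GETFL) & os.O_NONBLOCK
```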

-- 
Greg


From greg.ewing at canterbury.ac.nz  Sat Jan 19 00:59:38 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 19 Jan 2013 12:59:38 +1300
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CAP7+vJLsnGCJMe0P=DC5OjMgkcMSP1By4fu+Ft0q-y8Jn3X=Aw@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CAP7+vJLsnGCJMe0P=DC5OjMgkcMSP1By4fu+Ft0q-y8Jn3X=Aw@mail.gmail.com>
Message-ID: <50F9E1EA.4010305@canterbury.ac.nz>

Guido van Rossum wrote:

> Well, except that you can't just pass CallbackProtocol where a
> protocol factory is required by the PEP -- you'll have to pass a
> lambda or partial function without arguments that calls
> CallbackProtocol with some arguments taken from elsewhere.

Something smells wrong to me about APIs that require protocol
factories. I don't see what advantage there is in writing

    create_connection(HTTPProtocol, "some.where.net", 80)

as opposed to just writing something like

    HTTPProtocol(TCPTransport("some.where.net", 80))

You're going to have to use the latter style anyway to set up
anything other than the very simplest configurations, e.g.
your earlier 4-layer protocol stack example.

So create_connection() can't be anything more than a convenience
function, and unless I'm missing something, it hardly seems to
add enough convenience to be worth the bother.

-- 
Greg


From guido at python.org  Sat Jan 19 01:12:29 2013
From: guido at python.org (Guido van Rossum)
Date: Fri, 18 Jan 2013 16:12:29 -0800
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <50F9E1EA.4010305@canterbury.ac.nz>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CAP7+vJLsnGCJMe0P=DC5OjMgkcMSP1By4fu+Ft0q-y8Jn3X=Aw@mail.gmail.com>
	<50F9E1EA.4010305@canterbury.ac.nz>
Message-ID: <CAP7+vJLL0M=zfEDtR5s-7t5hUVPaZKNfjNDjcFJ=yWCJv6j6gg@mail.gmail.com>

On Fri, Jan 18, 2013 at 3:59 PM, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Guido van Rossum wrote:
>
>> Well, except that you can't just pass CallbackProtocol where a
>> protocol factory is required by the PEP -- you'll have to pass a
>> lambda or partial function without arguments that calls
>> CallbackProtocol with some arguments taken from elsewhere.
>
>
> Something smells wrong to me about APIs that require protocol
> factories. I don't see what advantage there is in writing
>
>    create_connection(HTTPProtocol, "some.where.net", 80)
>
> as opposed to just writing something like
>
>    HTTPProtocol(TCPTransport("some.where.net", 80))
>
> You're going to have to use the latter style anyway to set up
> anything other than the very simplest configurations, e.g.
> your earlier 4-layer protocol stack example.
>
> So create_connection() can't be anything more than a convenience
> function, and unless I'm missing something, it hardly seems to
> add enough convenience to be worth the bother.

Glyph should really answer this one. Personally I don't feel strongly
either way for this case. There may be an advantage to not calling the
protocol factory if the connection can't be made (in which case the
Future returned by create_connection() has the exception).
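The lambda/partial adaptation Guido describes is just argument binding; a minimal sketch (CallbackProtocol here is a stand-in for the class discussed in the thread, not a real tulip API):

```python
from functools import partial

class CallbackProtocol:
    """Stand-in protocol that dispatches events to user-supplied callbacks."""
    def __init__(self, on_data=None, on_close=None):
        self.on_data = on_data
        self.on_close = on_close

    def data_received(self, data):
        if self.on_data is not None:
            self.on_data(data)

# Where the PEP requires a zero-argument protocol factory, bind the
# extra arguments up front; create_connection() would then just call it.
received = []
factory = partial(CallbackProtocol, on_data=received.append)
proto = factory()
proto.data_received(b"hello")
assert received == [b"hello"]
```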

-- 
--Guido van Rossum (python.org/~guido)


From solipsis at pitrou.net  Sat Jan 19 01:19:55 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 19 Jan 2013 01:19:55 +0100
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CAP7+vJLsnGCJMe0P=DC5OjMgkcMSP1By4fu+Ft0q-y8Jn3X=Aw@mail.gmail.com>
	<50F9E1EA.4010305@canterbury.ac.nz>
	<CAP7+vJLL0M=zfEDtR5s-7t5hUVPaZKNfjNDjcFJ=yWCJv6j6gg@mail.gmail.com>
Message-ID: <20130119011955.644003f3@pitrou.net>

On Fri, 18 Jan 2013 16:12:29 -0800
Guido van Rossum <guido at python.org> wrote:
> On Fri, Jan 18, 2013 at 3:59 PM, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> > Guido van Rossum wrote:
> >
> >> Well, except that you can't just pass CallbackProtocol where a
> >> protocol factory is required by the PEP -- you'll have to pass a
> >> lambda or partial function without arguments that calls
> >> CallbackProtocol with some arguments taken from elsewhere.
> >
> >
> > Something smells wrong to me about APIs that require protocol
> > factories. I don't see what advantage there is in writing
> >
> >    create_connection(HTTPProtocol, "some.where.net", 80)
> >
> > as opposed to just writing something like
> >
> >    HTTPProtocol(TCPTransport("some.where.net", 80))

Except that you probably want the protocol to outlive the transport if
you want to deal with reconnections or connection failures, and
therefore:

    TCPClient(HTTPProtocol(), ("some.where.net", 80))

Regards

Antoine.




From cs at zip.com.au  Sat Jan 19 01:30:38 2013
From: cs at zip.com.au (Cameron Simpson)
Date: Sat, 19 Jan 2013 11:30:38 +1100
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CAP7+vJ+rb3OENt+Oa89=OOpY97OMky2Eufy+iVnrk4c6A2AHWQ@mail.gmail.com>
References: <CAP7+vJ+rb3OENt+Oa89=OOpY97OMky2Eufy+iVnrk4c6A2AHWQ@mail.gmail.com>
Message-ID: <20130119003038.GA15133@cskk.homeip.net>

On 18Jan2013 15:01, Guido van Rossum <guido at python.org> wrote:
|  It is also probably the most useful, since it is
| *possible* that the parent process forks a child and then exits
| itself, where the child does all the work of the pipeline.

For me, even common. I often make grandchildren instead of children when
only the I/O matters so that I don't leave zombies around, nor spurious
processes to interfere with wait calls.
-- 
Cameron Simpson <cs at zip.com.au>

To have no errors
Would be life without meaning
No struggle, no joy
- Haiku Error Messages http://www.salonmagazine.com/21st/chal/1998/02/10chal2.html


From james.d.harding at siemens.com  Sat Jan 19 01:35:46 2013
From: james.d.harding at siemens.com (Harding, James)
Date: Sat, 19 Jan 2013 00:35:46 +0000
Subject: [Python-ideas] Regarding 'const' and 'require' statements
Message-ID: <D88D6CBD04B9E647B778917814A70F9A9240CC82@USSLMMBX003.net.plm.eds.com>


Thank you everyone for your comments. I wish to retract my idea due to some killer issues.

First, my scheme could not handle a module whose constants depend on the platform. The idea fails because many common situations like that would not work, and Python should work for all of them.

Second, this scheme would require some sort of time stamping of files where constants were taken from in order to see if a re-compile is necessary. The time needed for time-stamp checks would likely be more than any time saved by using constant names.

Now, back to the shadows.

Thank you,

James Harding
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130119/db5595d1/attachment.html>

From greg.ewing at canterbury.ac.nz  Sat Jan 19 01:42:20 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 19 Jan 2013 13:42:20 +1300
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CAP7+vJ+jUqK6JBoLbYKWQDTpntwWWXpJa8qTZp3GYkyKv_ArkQ@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<CACac1F9-W6VdTU=+=u488YbopkSrG_dE3Y-uNo9hzqS72pesHQ@mail.gmail.com>
	<CAP7+vJJE3AQNaGmphemk9TtaVLM+DgeMbHqbA4wt5Ma_qSQgWA@mail.gmail.com>
	<CACac1F99MGhWT-D_vSX+d8Os+a0CZQv8r7dh4v6zRrWw8Nmp-Q@mail.gmail.com>
	<50F8F725.20505@canterbury.ac.nz>
	<CAP7+vJ+jUqK6JBoLbYKWQDTpntwWWXpJa8qTZp3GYkyKv_ArkQ@mail.gmail.com>
Message-ID: <50F9EBEC.2090106@canterbury.ac.nz>

Guido van Rossum wrote:
> I like everyone using the same names for these
> callback methods, so that a reader (who is familiar with the
> transport/protocol API) can instantly know what kind of callback it is
> and what its arguments are.

You don't seem to follow this philosophy anywhere else in
the PEP, though. In all the other places a callback is
specified, you get to pass in an arbitrary function.
The PEP offers no rationale as to why transports should
be the odd one out.

> You can write an alternative framework using coroutines and callbacks,
> bypassing transports and protocols. (You'll still need Futures.)
> However you'd be missing the interoperability offered by the
> protocol/transport abstractions: in an IOCP world you'd have to
> interact with the event loop's callbacks differently than in a
> select/poll/etc. world.

I was hoping there would be a slightly higher-level layer,
that provides a coroutine interface but hides the platform
differences.

What would you think of the idea of making the Transport
objects themselves fill both roles, by having read_async
and write_async methods? They wouldn't have to do any
buffering, I'd be happy to wrap another object around it
if I wanted that.

-- 
Greg


From guido at python.org  Sat Jan 19 02:16:54 2013
From: guido at python.org (Guido van Rossum)
Date: Fri, 18 Jan 2013 17:16:54 -0800
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <50F9EBEC.2090106@canterbury.ac.nz>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<CACac1F9-W6VdTU=+=u488YbopkSrG_dE3Y-uNo9hzqS72pesHQ@mail.gmail.com>
	<CAP7+vJJE3AQNaGmphemk9TtaVLM+DgeMbHqbA4wt5Ma_qSQgWA@mail.gmail.com>
	<CACac1F99MGhWT-D_vSX+d8Os+a0CZQv8r7dh4v6zRrWw8Nmp-Q@mail.gmail.com>
	<50F8F725.20505@canterbury.ac.nz>
	<CAP7+vJ+jUqK6JBoLbYKWQDTpntwWWXpJa8qTZp3GYkyKv_ArkQ@mail.gmail.com>
	<50F9EBEC.2090106@canterbury.ac.nz>
Message-ID: <CAP7+vJKo7k4YwUtZ=auUeF8K1xBTzv0bFkXOc4oY5SkHgp0Oow@mail.gmail.com>

On Fri, Jan 18, 2013 at 4:42 PM, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Guido van Rossum wrote:
>> I like everyone using the same names for these
>> callback methods, so that a reader (who is familiar with the
>> transport/protocol API) can instantly know what kind of callback it is
>> and what its arguments are.

> You don't seem to follow this philosophy anywhere else in
> the PEP, though. In all the other places a callback is
> specified, you get to pass in an arbitrary function.
> The PEP offers no rationale as to why transports should
> be the odd one out.

Well, yes, it *is* the odd one (or two, counting start_serving()) out.
That's because it is the high-level API.

>> You can write an alternative framework using coroutines and callbacks,
>> bypassing transports and protocols. (You'll still need Futures.)
>> However you'd be missing the interoperability offered by the
>> protocol/transport abstractions: in an IOCP world you'd have to
>> interact with the event loop's callbacks differently than in a
>> select/poll/etc. world.

> I was hoping there would be a slightly higher-level layer,
> that provides a coroutine interface but hides the platform
> differences.

Hm, Transports+Protocols *is* the higher level layer.

> What would you think of the idea of making the Transport
> objects themselves fill both roles, by having read_async
> and write_async methods? They wouldn't have to do any
> buffering, I'd be happy to wrap another object around it
> if I wanted that.

You could code that up very simply using sock_recv() and
sock_sendall(). But everyone who's thought about performance of
select/poll/etc., seems to think that that is not a good model because
it will cause many extra calls to add/remove reader/writer.
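For illustration, the sock_recv()/sock_sendall() coroutine style looks roughly like this (written against the modern asyncio spelling of those event-loop methods; the Tulip draft used equivalent names):

```python
import asyncio
import socket

async def echo_once(loop, sock):
    # Await up to 1024 bytes, then write them straight back.
    data = await loop.sock_recv(sock, 1024)
    await loop.sock_sendall(sock, data)

async def demo():
    loop = asyncio.get_running_loop()
    a, b = socket.socketpair()
    for s in (a, b):
        s.setblocking(False)  # the sock_* methods require non-blocking sockets
    await loop.sock_sendall(b, b"ping")
    await echo_once(loop, a)
    reply = await loop.sock_recv(b, 1024)
    a.close()
    b.close()
    return reply

assert asyncio.run(demo()) == b"ping"
```

Each awaited call registers and then removes a reader or writer with the loop, which is exactly the per-call overhead Guido mentions.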

-- 
--Guido van Rossum (python.org/~guido)


From glyph at twistedmatrix.com  Sat Jan 19 02:23:50 2013
From: glyph at twistedmatrix.com (Glyph)
Date: Fri, 18 Jan 2013 17:23:50 -0800
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CAP7+vJLL0M=zfEDtR5s-7t5hUVPaZKNfjNDjcFJ=yWCJv6j6gg@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CAP7+vJLsnGCJMe0P=DC5OjMgkcMSP1By4fu+Ft0q-y8Jn3X=Aw@mail.gmail.com>
	<50F9E1EA.4010305@canterbury.ac.nz>
	<CAP7+vJLL0M=zfEDtR5s-7t5hUVPaZKNfjNDjcFJ=yWCJv6j6gg@mail.gmail.com>
Message-ID: <1D762043-961F-4F1E-A0A6-0C8C78AFA59B@twistedmatrix.com>


On Jan 18, 2013, at 4:12 PM, Guido van Rossum <guido at python.org> wrote:


> Glyph should really answer this one.

Thanks for pointing it out to me, keeping up with python-ideas is always a challenge :).

> Personally I don't feel strongly
> either way for this case. There may be an advantage to not calling the
> protocol factory if the connection can't be made (in which case the
> Future returned by create_connection() has the exception).



> On Fri, Jan 18, 2013 at 3:59 PM, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:

>> Guido van Rossum wrote:
>> 
>>> Well, except that you can't just pass CallbackProtocol where a
>>> protocol factory is required by the PEP -- you'll have to pass a
>>> lambda or partial function without arguments that calls
>>> CallbackProtocol with some arguments taken from elsewhere.
>> 
>> Something smells wrong to me about APIs that require protocol
>> factories.

For starters, nothing "smells wrong" to me about protocol factories.  Responding to this kind of criticism is difficult, because it's not substantive - what's the actual problem?  I think that some Python programmers have an aversion to factories because a common path to Python is flight from Java environments that over- or mis-use the factory pattern.

>> I don't see what advantage there is in writing
>> 
>>   create_connection(HTTPProtocol, "some.where.net", 80)
>> 
>> as opposed to just writing something like
>> 
>>   HTTPProtocol(TCPTransport("some.where.net", 80))

Guido mentioned one advantage already; you don't have to create the protocol object if the connection fails, so your protocol objects are real honest-to-goodness connections, not "well, maybe there's a connection or maybe there'll be a connection later".

To be fair, this is rarely of practical utility, but in edge cases where you are doing something like, "simultaneously try to connect to these 1000 hosts, and give up on all outstanding connections when the first 3 connections succeed", being able to avoid all the construction overhead for your protocols if they're not going to be used is nice.

There's a more pressing issue of correctness though: even if you create the protocol in advance, you really don't want to tell it about the transport until the transport truly exists.  The connection to some.where.net (by which I mean, ahem, "somewhere.example.com"; "where.net" will not thank you if you ignore BCP 32 in the documentation or examples) might fail, and if the client wants to issue a client greeting, it should not have access to its half-formed transport before that failure.  Of course, it's possible to present an API that works around this by buffering writes issued before the connection is established, and by the protocol waiting for the connection_made callback before actually doing its work.

Finally, using a factory also makes client-creating and server-creating code more symmetrical, since you clearly need a protocol factory in the listening-socket case.  If your main example protocol is HTTP, this doesn't make sense*, but once you start trying to do things like SIP or XMPP, where the participants in a connection are really peers, having the structure be similar is handy.  In the implementation, it's nice to have things set up this way so that the order of the protocol<->transport symmetric setup is less important and by the time the appropriate methods are being invoked, everybody knows about everybody else.  The transport can't really have a reference to the protocol in the protocol's constructor.

*: Unless you're doing this, of course <http://wiki.secondlife.com/wiki/Reverse_HTTP>.

However, aside from the factory-or-not issue, the fact that TCPTransport's name implies that it is both (1) a class and (2) the actual transport implementation, is more problematic.

TCPTransport will need multiple backends for different multiplexing and I/O mechanisms.  This is why I keep bringing up IOCP; this is a major API where the transport implementation is actually quite different.  In Twisted, they're entirely different classes.  They could probably share a bit more implementation than they do and reduce a little duplication, but it's nice that they don't have to.  You don't want to burden application code with picking the right one, and it's ugly to smash the socket-implementation-selection into a class.  (create_connection really ought to be a method on an event-loop object of some kind, which produces the appropriate implementation.  I think right now it implicitly looks it up in thread-local storage for the "current" main loop, and I'd rather it were more explicit, but the idea is the same.)

Your example is misleadingly named; surely you mean TCPClient, because a TCPTransport would implicitly support both clients and servers - and a server would start with a socket returned from accept(), not a host and port.  (Certainly not a DNS host name.)

create_connection will actually need to create multiple sockets internally.  <http://tools.ietf.org/html/rfc3493> covers this, in part (for a more condensed discussion, see <https://twistedmatrix.com/trac/ticket/4859>).

>> You're going to have to use the latter style anyway to set up
>> anything other than the very simplest configurations, e.g.
>> your earlier 4-layer protocol stack example.

I don't see how this is true.  I've written layered protocols over and over again in Twisted and never wanted to manually construct the bottom transport for that reason.*  In fact, the more elaborate multi-layered structures you have to construct when a protocol finishes connecting, the more you want to avoid being required to do it in advance of actually needing the protocols to exist.

*: I _have_ had to manually construct transports to deal with some fiddly performance-tuning issues, but those are just deficiencies in the existing transport implementation that ought to be remedied.

>> So create_connection() can't be anything more than a convenience
>> function, and unless I'm missing something, it hardly seems to
>> add enough convenience to be worth the bother.

*Just* implementing the multiple-parallel-connection-attempts algorithm required to deal with the IPv6 transition period would be enough convenience to be worth having a function, even if none of the other stuff I just wrote applied :).

-glyph

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130118/d28e46e4/attachment.html>

From greg.ewing at canterbury.ac.nz  Sat Jan 19 05:16:17 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 19 Jan 2013 17:16:17 +1300
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <20130119011955.644003f3@pitrou.net>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CAP7+vJLsnGCJMe0P=DC5OjMgkcMSP1By4fu+Ft0q-y8Jn3X=Aw@mail.gmail.com>
	<50F9E1EA.4010305@canterbury.ac.nz>
	<CAP7+vJLL0M=zfEDtR5s-7t5hUVPaZKNfjNDjcFJ=yWCJv6j6gg@mail.gmail.com>
	<20130119011955.644003f3@pitrou.net>
Message-ID: <50FA1E11.1060107@canterbury.ac.nz>

Antoine Pitrou wrote:
> Except that you probably want the protocol to outlive the transport if
> you want to deal with reconnections or connection failures, and
> therefore:
> 
>     TCPClient(HTTPProtocol(), ("some.where.net", 80))

I don't see how to generalise that to more complicated
protocol stacks, though.

For dealing with re-connections, it seems like both the
protocol *and* the transport need to outlive the connection
failure, and the transport needs a reconnect() method that
is called by a protocol that can deal with that situation.
Reconnection can then propagate along the whole chain.

-- 
Greg


From greg.ewing at canterbury.ac.nz  Sat Jan 19 07:05:35 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 19 Jan 2013 19:05:35 +1300
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <1D762043-961F-4F1E-A0A6-0C8C78AFA59B@twistedmatrix.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CAP7+vJLsnGCJMe0P=DC5OjMgkcMSP1By4fu+Ft0q-y8Jn3X=Aw@mail.gmail.com>
	<50F9E1EA.4010305@canterbury.ac.nz>
	<CAP7+vJLL0M=zfEDtR5s-7t5hUVPaZKNfjNDjcFJ=yWCJv6j6gg@mail.gmail.com>
	<1D762043-961F-4F1E-A0A6-0C8C78AFA59B@twistedmatrix.com>
Message-ID: <50FA37AF.40306@canterbury.ac.nz>

Glyph wrote:
> I think that some Python 
> programmers have an aversion to factories because a common path to 
> Python is flight from Java environments that over- or mis-use the 
> factory pattern.

I'm not averse to using the factory pattern when it genuinely
helps. I'm questioning whether it helps enough in this case
to be worth using.

> Guido mentioned one advantage already; you don't have to create the 
> protocol object if the connection fails, so your protocol objects are 
> real honest-to-goodness connections, not "well, maybe there's a 
> connection or maybe there'll be a connection later".

I would suggest that merely instantiating a protocol object
should be cheap enough that you don't normally care. Any
substantive setup work should be done in the connection_made()
method, not in __init__().

Transports are already a "maybe there's a connection" kind of
deal; otherwise, why does connection_made() exist at all?

> if the client wants to issue 
> a client greeting, it should not have access to its half-formed 
> transport before that failure.  Of course, it's possible to present an 
> API that works around this by buffering writes issued before the 
> connection is established, and by the protocol waiting for the 
> connection_made callback before actually doing its work.

Which it seems to me is the way *all* protocols should be
written. If necessary, you could "encourage" people to write
them this way by having a transport refuse to accept any
writes until the connection_made() call has occurred.
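A minimal sketch of such a guarded transport (all names here are hypothetical, chosen only to illustrate the suggestion):

```python
class GuardedTransport:
    """Sketch: refuse writes until connection_made() has been called."""

    def __init__(self):
        self._connected = False
        self._protocol = None

    def _wire_up(self, protocol):
        # Called by the event loop once the socket is actually connected.
        self._connected = True
        self._protocol = protocol
        protocol.connection_made(self)

    def write(self, data):
        if not self._connected:
            raise RuntimeError("write() before connection_made()")
        # ...hand data off to the real socket machinery here...
```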

> However, aside from the factory-or-not issue, the fact that 
> TCPTransport's name implies that it is both (1) a class and (2) the 
> actual transport implementation, is more problematic.

They don't have to be classes, they could be functions:

    create_http_protocol(create_tcp_transport("hammerme.seeificare.com", 80))

The important thing is that each function concerns itself
with just one step of the chain, and chains of any length
can be constructed by composing them in the obvious way.

> Your example is misleadingly named; surely you mean TCP*Client*, because 
> a TCP*Transport* would implicitly support both clients and servers - and 
> a server would start with a socket returned from accept(), not a host 
> and port.

Maybe. Or maybe the constructor could be called in more than one
way -- create_tcp_transport(host, port) on the client side and
create_tcp_transport(socket) on the server side.

> 
> create_connection will actually need to create multiple sockets 
> internally.  <http://tools.ietf.org/html/rfc3493> covers this, in 
> part (for a more condensed discussion, see 
> <https://twistedmatrix.com/trac/ticket/4859>).

Couldn't all that be handled inside the transport?

> I've written layered protocols over and 
> over again in Twisted and never wanted to manually construct the bottom 
> transport for that reason.

So what does the code for setting up a multi-layer stack look
like? How does it make use of create_connection()?

Also, what does an implementation of create_connection() look
like that avoids creating the protocol until the connection is
made? It seems tricky, because the way you know the connection
is made is that it calls connection_made() on the protocol.

But there's no protocol yet. So you would have to install a
temporary protocol whose connection_made() creates the real
protocol. That sounds like it could be even more overhead than
just creating the real protocol in the first place, as long
as the protocol doesn't do any work until its connection_made()
is called.
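The temporary-protocol idea Greg describes could be sketched like this (hypothetical names; just a thin placeholder that defers construction until the connection succeeds):

```python
class LazyProtocol:
    """Placeholder protocol: create the real protocol only once the
    connection is actually made, then forward events to it."""

    def __init__(self, real_factory):
        self._real_factory = real_factory
        self._real = None

    def connection_made(self, transport):
        self._real = self._real_factory()
        self._real.connection_made(transport)

    def data_received(self, data):
        self._real.data_received(data)
```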

-- 
Greg


From shane at umbrellacode.com  Sat Jan 19 07:20:29 2013
From: shane at umbrellacode.com (Shane Green)
Date: Fri, 18 Jan 2013 22:20:29 -0800
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <50FA1E11.1060107@canterbury.ac.nz>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CAP7+vJLsnGCJMe0P=DC5OjMgkcMSP1By4fu+Ft0q-y8Jn3X=Aw@mail.gmail.com>
	<50F9E1EA.4010305@canterbury.ac.nz>
	<CAP7+vJLL0M=zfEDtR5s-7t5hUVPaZKNfjNDjcFJ=yWCJv6j6gg@mail.gmail.com>
	<20130119011955.644003f3@pitrou.net>
	<50FA1E11.1060107@canterbury.ac.nz>
Message-ID: <C0007D3D-5ADC-402A-A960-5F8D63E0171F@umbrellacode.com>

Just as there's no reason to have a protocol without a transport, there seems to be no reason to have a transport without a connection; separating the two might further normalize the differences between client and server channels.




Shane Green 
www.umbrellacode.com
805-452-9666 | shane at umbrellacode.com

On Jan 18, 2013, at 8:16 PM, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:

> Antoine Pitrou wrote:
>> Except that you probably want the protocol to outlive the transport if
>> you want to deal with reconnections or connection failures, and
>> therefore:
>>    TCPClient(HTTPProtocol(), ("some.where.net", 80))
> 
> I don't see how to generalise that to more complicated
> protocol stacks, though.
> 
> For dealing with re-connections, it seems like both the
> protocol *and* the transport need to outlive the connection
> failure, and the transport needs a reconnect() method that
> is called by a protocol that can deal with that situation.
> Reconnection can then propagate along the whole chain.
> 
> -- 
> Greg
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130118/e79f0bfd/attachment.html>

From d.s at daniel.shahaf.name  Sat Jan 19 11:10:24 2013
From: d.s at daniel.shahaf.name (Daniel Shahaf)
Date: Sat, 19 Jan 2013 12:10:24 +0200
Subject: [Python-ideas] chdir context manager
Message-ID: <20130119101024.GB2969@lp-shahaf.local>

The following is a common pattern (used by, for example,
shutil.make_archive):

    save_cwd = os.getcwd()
    try:
        foo()
    finally:
        os.chdir(save_cwd)

I suggest this deserves a context manager:

    with saved_cwd():
        foo()
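One possible implementation along the lines of the attached patch, using contextlib and os.fchdir (a sketch, not necessarily the patch's exact code):

```python
import contextlib
import os

@contextlib.contextmanager
def saved_cwd():
    """Restore the current working directory on exit, via a directory fd.

    Using an fd rather than the path string means the restore still works
    if the original directory is renamed while the block runs.
    """
    fd = os.open(".", os.O_RDONLY)  # add os.O_DIRECTORY where available
    try:
        yield fd
    finally:
        os.fchdir(fd)
        os.close(fd)

# Usage:
# with saved_cwd():
#     os.chdir("/tmp")
#     ...  # cwd is restored when the block exits
```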

Initial feedback on IRC suggests shutil as where this functionality
should live (other suggestions were made, such as pathlib).  Hence, the
attached patch implements this as shutil.saved_cwd, based on os.fchdir.

The patch also adds os.chdir to os.supports_dir_fd and documents the
context manager abilities of builtins.open() in its reference.

Thoughts?

Thanks,

Daniel


diff -r 74b0461346f0 Doc/library/functions.rst
--- a/Doc/library/functions.rst	Fri Jan 18 17:53:18 2013 -0800
+++ b/Doc/library/functions.rst	Sat Jan 19 09:39:27 2013 +0000
@@ -828,6 +828,9 @@ are always available.  They are listed h
    Open *file* and return a corresponding :term:`file object`.  If the file
    cannot be opened, an :exc:`OSError` is raised.
 
+   This function can be used as a :term:`context manager` that closes the
+   file when it exits.
+
    *file* is either a string or bytes object giving the pathname (absolute or
    relative to the current working directory) of the file to be opened or
    an integer file descriptor of the file to be wrapped.  (If a file descriptor
diff -r 74b0461346f0 Doc/library/os.rst
--- a/Doc/library/os.rst	Fri Jan 18 17:53:18 2013 -0800
+++ b/Doc/library/os.rst	Sat Jan 19 09:39:27 2013 +0000
@@ -1315,6 +1315,9 @@ features:
    This function can support :ref:`specifying a file descriptor <path_fd>`.  The
    descriptor must refer to an opened directory, not an open file.
 
+   See also :func:`shutil.saved_cwd` for a context manager that restores the
+   current working directory.
+
    Availability: Unix, Windows.
 
    .. versionadded:: 3.3
diff -r 74b0461346f0 Doc/library/shutil.rst
--- a/Doc/library/shutil.rst	Fri Jan 18 17:53:18 2013 -0800
+++ b/Doc/library/shutil.rst	Sat Jan 19 09:39:27 2013 +0000
@@ -36,6 +36,19 @@ copying and removal. For operations on i
 Directory and files operations
 ------------------------------
 
+.. function:: saved_cwd()
+
+   Return a :term:`context manager` that restores the current working directory
+   when it exits.  See :func:`os.chdir` for changing the current working
+   directory.
+
+   The context manager returns an open file descriptor for the saved directory.
+
+   Only available when :func:`os.chdir` supports file descriptor arguments.
+
+   .. versionadded:: 3.4
+
+
 .. function:: copyfileobj(fsrc, fdst[, length])
 
    Copy the contents of the file-like object *fsrc* to the file-like object *fdst*.
diff -r 74b0461346f0 Lib/os.py
--- a/Lib/os.py	Fri Jan 18 17:53:18 2013 -0800
+++ b/Lib/os.py	Sat Jan 19 09:39:27 2013 +0000
@@ -120,6 +120,7 @@ if _exists("_have_functions"):
 
     _set = set()
     _add("HAVE_FACCESSAT",  "access")
+    _add("HAVE_FCHDIR",     "chdir")
     _add("HAVE_FCHMODAT",   "chmod")
     _add("HAVE_FCHOWNAT",   "chown")
     _add("HAVE_FSTATAT",    "stat")
diff -r 74b0461346f0 Lib/shutil.py
--- a/Lib/shutil.py	Fri Jan 18 17:53:18 2013 -0800
+++ b/Lib/shutil.py	Sat Jan 19 09:39:27 2013 +0000
@@ -38,6 +38,7 @@ __all__ = ["copyfileobj", "copyfile", "c
            "unregister_unpack_format", "unpack_archive",
            "ignore_patterns", "chown", "which"]
            # disk_usage is added later, if available on the platform
+           # saved_cwd is added later, if available on the platform
 
 class Error(OSError):
     pass
@@ -1111,3 +1112,20 @@ def which(cmd, mode=os.F_OK | os.X_OK, p
                 if _access_check(name, mode):
                     return name
     return None
+
+# Define the chdir context manager.
+if os.chdir in os.supports_dir_fd:
+    class saved_cwd:
+        def __init__(self):
+            pass
+        def __enter__(self):
+            self.dh = os.open(os.curdir,
+                              os.O_RDONLY | getattr(os, 'O_DIRECTORY', 0))
+            return self.dh
+        def __exit__(self, exc_type, exc_value, traceback):
+            try:
+                os.chdir(self.dh)
+            finally:
+                os.close(self.dh)
+            return False
+    __all__.append('saved_cwd')
diff -r 74b0461346f0 Lib/test/test_shutil.py
--- a/Lib/test/test_shutil.py	Fri Jan 18 17:53:18 2013 -0800
+++ b/Lib/test/test_shutil.py	Sat Jan 19 09:39:27 2013 +0000
@@ -1276,6 +1276,20 @@ class TestShutil(unittest.TestCase):
         rv = shutil.copytree(src_dir, dst_dir)
         self.assertEqual(['foo'], os.listdir(rv))
 
+    def test_saved_cwd(self):
+        if hasattr(os, 'fchdir'):
+            temp_dir = self.mkdtemp()
+            orig_dir = os.getcwd()
+            with shutil.saved_cwd() as dir_fd:
+                os.chdir(temp_dir)
+                new_dir = os.getcwd()
+                self.assertIsInstance(dir_fd, int)
+            final_dir = os.getcwd()
+            self.assertEqual(orig_dir, final_dir)
+            self.assertEqual(temp_dir, new_dir)
+        else:
+            self.assertFalse(hasattr(shutil, 'saved_cwd'))
+
 
 class TestWhich(unittest.TestCase):
 


From _ at lvh.cc  Sat Jan 19 11:19:41 2013
From: _ at lvh.cc (Laurens Van Houtven)
Date: Sat, 19 Jan 2013 11:19:41 +0100
Subject: [Python-ideas] chdir context manager
In-Reply-To: <20130119101024.GB2969@lp-shahaf.local>
References: <20130119101024.GB2969@lp-shahaf.local>
Message-ID: <CAE_Hg6bJedgK+1EHEFip6+1JNVkjxuLS2WWo_S5tw2erKa5Y-w@mail.gmail.com>

+1


On Sat, Jan 19, 2013 at 11:10 AM, Daniel Shahaf <d.s at daniel.shahaf.name> wrote:

> The following is a common pattern (used by, for example,
> shutil.make_archive):
>
>     save_cwd = os.getcwd()
>     try:
>         foo()
>     finally:
>         os.chdir(save_cwd)
>
> I suggest this deserves a context manager:
>
>     with saved_cwd():
>         foo()
>
> Initial feedback on IRC suggests shutil as where this functionality
> should live (other suggestions were made, such as pathlib).  Hence,
> attached patch implements this as shutil.saved_cwd, based on os.fchdir.
>
> The patch also adds os.chdir to os.supports_dir_fd and documents the
> context manager abilities of builtins.open() in its reference.
>
> Thoughts?
>
> Thanks,
>
> Daniel
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>



-- 
cheers
lvh
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130119/fc28ac41/attachment.html>

From _ at lvh.cc  Sat Jan 19 11:28:07 2013
From: _ at lvh.cc (Laurens Van Houtven)
Date: Sat, 19 Jan 2013 11:28:07 +0100
Subject: [Python-ideas] PEP 3156 / Tulip question: write/send
	callback/future
In-Reply-To: <CAP7+vJ+Sd4u4gP0vqeiT6GYn9CdSqn5zWmi0aReoy+y2pnb4Pg@mail.gmail.com>
References: <CAF-Rda9SPFtS6PeASzOy5e=i__aJE2rMwnquDtBBrdk=rUYFaw@mail.gmail.com>
	<CAP7+vJJPU2rd=iMzNX+MOkU00UJj-AD-c-R=A9Lo6Y4FBhpo6A@mail.gmail.com>
	<CAF-Rda_ftvkJOzrHnJgECpKec1zTh2V7OT=HBH686uPA-FC4Rg@mail.gmail.com>
	<CAP7+vJ+Sd4u4gP0vqeiT6GYn9CdSqn5zWmi0aReoy+y2pnb4Pg@mail.gmail.com>
Message-ID: <CAE_Hg6Zs8W38bzMoVwqVQXRdO8uu56OkC-rOGcDy9Cp=qOnHBg@mail.gmail.com>

Also, ISTR that it's not always possible to consistently have that behavior
everywhere (i.e. have it in the first place or fake it where it's not
directly available), so it's of somewhat limited utility, since a protocol
can't actually rely on it existing. Most behavior that requires it is
generally implemented using IPushProducer/IPullProducer (i.e. the
pause/resume API Guido mentioned earlier).

There have been some attempts at work towards a better producer/consumer API
(e.g. supporting things like buffer changes, and generally just simplifying
things that seem duplicated amongst transports and consumers/producers)
called 'tubes', but I don't think any of that is ready enough to be a
template for tulip :)


On Fri, Jan 18, 2013 at 11:25 PM, Guido van Rossum <guido at python.org> wrote:

> On Fri, Jan 18, 2013 at 1:40 PM, Eli Bendersky <eliben at gmail.com> wrote:
> > On Fri, Jan 18, 2013 at 1:02 PM, Guido van Rossum <guido at python.org>
> wrote:
> >>
> >> On Fri, Jan 18, 2013 at 6:56 AM, Eli Bendersky <eliben at gmail.com>
> wrote:
> >> > I'm looking through PEP 3156 and the Tulip code, and either something
> is
> >> > missing or I'm not looking in the right places.
> >> >
> >> > I can't find any sort of callback / future return for asynchronous
> >> > writes,
> >> > e.g. in transport.
> >>
> >> I guess you should read some Twisted tutorial. :-)
> >
> >
> > Yes, I noticed that Twisted also doesn't have it, so I suspected that
> > influence.
> >
> >>
> >>
> >> > Should there be no "data_sent" parallel to "data_received" somewhere?
> >> > Or,
> >> > alternatively, "write" returning some sort of future that can be
> checked
> >> > later for status? For connections that aren't infinitely fast it's
> >> > useful to
> >> > know when the data was actually sent/written, or alternatively if an
> >> > error
> >> > has occurred. This is also important for when writing would actually
> >> > block
> >> > because of full buffers. boost::asio has such a handler for
> async_write.
> >>
> >> The model is a little different. Glyph has convinced me that it works
> >> well in practice. We just buffer what is written (when it can't all be
> >> sent immediately). This is enough for most apps that don't serve 100MB
> >> files. If the buffer becomes too large, the transport will call
> >> .pause() on the protocol until it is drained, then it calls .resume().
> >> (The names of these are TBD, maybe they will end up .pause_writing()
> >> and .resume_writing().) There are some default behaviors that we can
> >> add here too, e.g. suspending the task.
> >>
> >
> > I agree it can be made to work, but how would even simple "done sending"
> > notification work? Or "send error" for that matter? AFAIR, low-level
> socket
> > async API do provide this information. Are we confident enough it will
> never
> > be needed to simply hide it away?
>
> AFAIK the Twisted folks have found that most of the time (basically
> all of the time) you don't need a positive "done sending"
> notification; when the send eventually *fails*, the transport calls
> the protocol's connection_lost() method with an exception indicating
> what failed.
>
> --
> --Guido van Rossum (python.org/~guido)
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>



-- 
cheers
lvh
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130119/e8aba79f/attachment.html>

From p.f.moore at gmail.com  Sat Jan 19 12:15:14 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Sat, 19 Jan 2013 11:15:14 +0000
Subject: [Python-ideas] chdir context manager
In-Reply-To: <20130119101024.GB2969@lp-shahaf.local>
References: <20130119101024.GB2969@lp-shahaf.local>
Message-ID: <CACac1F9pU0nFpK2kD9kpJF9-iJ0ZH=hm_+=bH+FHka92V=A5Ww@mail.gmail.com>

On 19 January 2013 10:10, Daniel Shahaf <d.s at daniel.shahaf.name> wrote:
> The following is a common pattern (used by, for example,
> shutil.make_archive):
>
>     save_cwd = os.getcwd()
>     try:
>         foo()
>     finally:
>         os.chdir(save_cwd)
>
> I suggest this deserves a context manager:
>
>     with saved_cwd():
>         foo()

+1. I've written this myself many times...

Paul


From p.f.moore at gmail.com  Sat Jan 19 13:12:52 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Sat, 19 Jan 2013 12:12:52 +0000
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CAP7+vJJ49DMH8v7odgBVU2QQkMytMD2dsbyU62DtMjOyYw0nxw@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CACac1F9KiWmVW=ZCriVXjcb-a+74UvvznO+8GyT79nZOE5uKHg@mail.gmail.com>
	<CADiSq7fxVexvi4enU=AA=gEtUydkgwFTtu=TX9yFrDe3tnkb8w@mail.gmail.com>
	<CAP7+vJJx9b5tK9NE6kJx8AQHJPDGEdR9J9tJb-DikqdjJ4hrAA@mail.gmail.com>
	<CACac1F-voxCSgR67jWOdCKNQQdq1XKZD6xXO28kiZ7_RgTxhxQ@mail.gmail.com>
	<CAP7+vJJ49DMH8v7odgBVU2QQkMytMD2dsbyU62DtMjOyYw0nxw@mail.gmail.com>
Message-ID: <CACac1F_ALpTMMWqeJo4_QEiNMyRtVTOwdt-2fRk0AHqkhWfhdw@mail.gmail.com>

On 18 January 2013 22:53, Guido van Rossum <guido at python.org> wrote:
> I can probably fairly quickly modify
>> your code to demonstrate, but it's late and I don't want to start
>> booting my Unix environment now, so it'll have to wait till tomorrow
>> :-)
>
> I would love for you to create that version. I only checked it in so I
> could point to it -- I am not happy with either the implementation,
> the API spec, or the unit test...

May be a few days before I can get to it. Apparently when Ubuntu
installs an automatic upgrade, it feels that it's OK to break the
wireless drivers. I now have the choice of scouring the internet on
another PC to find possible solutions (so far that approach is a waste
of time...), or reinstalling the OS. How do you Linux users put up
with this sort of thing? :-)

Seriously, I'm probably going to have to build a VM so I don't get
this sort of unnecessary hardware issue holding me up.
Paul


From jsbueno at python.org.br  Sat Jan 19 13:17:57 2013
From: jsbueno at python.org.br (Joao S. O. Bueno)
Date: Sat, 19 Jan 2013 10:17:57 -0200
Subject: [Python-ideas] chdir context manager
In-Reply-To: <CACac1F9pU0nFpK2kD9kpJF9-iJ0ZH=hm_+=bH+FHka92V=A5Ww@mail.gmail.com>
References: <20130119101024.GB2969@lp-shahaf.local>
	<CACac1F9pU0nFpK2kD9kpJF9-iJ0ZH=hm_+=bH+FHka92V=A5Ww@mail.gmail.com>
Message-ID: <CAH0mxTQfJkWzQGGUxwbzganGMoB+tJ82cpohgu-28pNnGtTm7w@mail.gmail.com>

> On 19 January 2013 10:10, Daniel Shahaf <d.s at daniel.shahaf.name> wrote:

>> I suggest this deserves a context manager:
>>
>>     with saved_cwd():
>>         foo()
>

But if doing that, why does "foo" have to implement the directory
changing itself?


Why not something along:

with temp_dir("/tmp"):
    # things that perform in  /tmp

# directory is restored.

Of course that one function could do both things,
depending on whether a <dir> parameter is passed.

  js
 -><-


From d.s at daniel.shahaf.name  Sat Jan 19 13:33:29 2013
From: d.s at daniel.shahaf.name (Daniel Shahaf)
Date: Sat, 19 Jan 2013 14:33:29 +0200
Subject: [Python-ideas] chdir context manager
In-Reply-To: <CAH0mxTQfJkWzQGGUxwbzganGMoB+tJ82cpohgu-28pNnGtTm7w@mail.gmail.com>
References: <20130119101024.GB2969@lp-shahaf.local>
	<CACac1F9pU0nFpK2kD9kpJF9-iJ0ZH=hm_+=bH+FHka92V=A5Ww@mail.gmail.com>
	<CAH0mxTQfJkWzQGGUxwbzganGMoB+tJ82cpohgu-28pNnGtTm7w@mail.gmail.com>
Message-ID: <20130119123329.GD2969@lp-shahaf.local>

Joao S. O. Bueno wrote on Sat, Jan 19, 2013 at 10:17:57 -0200:
> Why not something along:
> 
> with temp_dir("/tmp"):
>     # things that perform in  /tmp
> 
> # directory is restored.
> 
> Of course that one function could do both things,
> depending on whether a <dir> parameter is passed.
> 

+1



From tjreedy at udel.edu  Sat Jan 19 14:37:17 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 19 Jan 2013 08:37:17 -0500
Subject: [Python-ideas] chdir context manager
In-Reply-To: <20130119101024.GB2969@lp-shahaf.local>
References: <20130119101024.GB2969@lp-shahaf.local>
Message-ID: <kde7ip$t5a$1@ger.gmane.org>

On 1/19/2013 5:10 AM, Daniel Shahaf wrote:
> The following is a common pattern (used by, for example,
> shutil.make_archive):
>
>      save_cwd = os.getcwd()
>      try:
>          foo()
>      finally:
>          os.chdir(save_cwd)
>
> I suggest this deserves a context manager:
>
>      with saved_cwd():
>          foo()

This strikes me as not a proper context manager. A context manager 
should create a temporary, altered context. One way is to add something 
that is deleted on exit. Files as context managers are the typical 
example. Another way is to alter something after saving the restore 
info, and restoring on exit. An example would be a context manager to 
temporarily change stdout. (Do we have one? If not, it would be at least 
as generally useful as this proposal.)

So to me, your proposal is only 1/2 or 2/3 of a context manager. (And 
'returns an open file descriptor for the saved directory' seems backward 
or wrong for a context manager.) It does not actually make a new 
context. A proper temp_cwd context manager should have one parameter, 
the new working directory, with chdir(new_cwd) in the enter method. To 
allow for conditional switching, the two chdir system calls could be 
conditional on new_cwd (either None or '' would mean no chdir calls).
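
A minimal sketch of such a conditional temp_cwd (hypothetical name and
semantics, following the suggestion above) might look like this:

```python
import contextlib
import os

@contextlib.contextmanager
def temp_cwd(new_cwd=None):
    """Temporarily change the working directory.

    None or '' means: make no change and skip both chdir() calls.
    """
    if not new_cwd:
        # Conditional switching: no system calls at all.
        yield
        return
    save_cwd = os.getcwd()
    os.chdir(new_cwd)
    try:
        yield
    finally:
        os.chdir(save_cwd)
```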

Looking at your pattern, if foo() does not change cwd, the save and 
restore are pointless, even if harmless. If foo does change cwd, it 
should also restore it, whether explicitly or with a context manager 
temp_cwd.

Looking at your actual example, shutil.make_archive, the change and 
restore are conditional and asymmetrical.

   save_cwd = os.getcwd()
   if root_dir is not None:
     ...
     if not dry_run:
       os.chdir(root_dir)
   ...
   finally:
     if root_dir is not None:
     ...
       os.chdir(save_cwd)

The initial chdir is conditional on dry_run (undocumented, but passed on 
to the archive function), the restore is not. Since I believe not 
switching on dry_runs is just a minor optimization, I believe that that 
condition could be dropped and the code re-written as

   with new_cwd(root_dir):
     ...

I am aware that this would require a change in the finally logging, but 
that would be true of the original proposal also.

-- 
Terry Jan Reedy



From p.f.moore at gmail.com  Sat Jan 19 14:37:37 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Sat, 19 Jan 2013 13:37:37 +0000
Subject: [Python-ideas] chdir context manager
In-Reply-To: <20130119123329.GD2969@lp-shahaf.local>
References: <20130119101024.GB2969@lp-shahaf.local>
	<CACac1F9pU0nFpK2kD9kpJF9-iJ0ZH=hm_+=bH+FHka92V=A5Ww@mail.gmail.com>
	<CAH0mxTQfJkWzQGGUxwbzganGMoB+tJ82cpohgu-28pNnGtTm7w@mail.gmail.com>
	<20130119123329.GD2969@lp-shahaf.local>
Message-ID: <CACac1F9D0qGiOr=f3dLGRZp-OB0HqDc90k4vM_+eRT7cU-T5ww@mail.gmail.com>

On 19 January 2013 12:33, Daniel Shahaf <d.s at daniel.shahaf.name> wrote:
>> Of course that one function could do both things,
>> depending ob wether a <dir> parameter is passed.
>>
>
> +1

Yes, that's a better idea.
Paul


From vinay_sajip at yahoo.co.uk  Sat Jan 19 14:46:26 2013
From: vinay_sajip at yahoo.co.uk (Vinay Sajip)
Date: Sat, 19 Jan 2013 13:46:26 +0000 (UTC)
Subject: [Python-ideas] chdir context manager
References: <20130119101024.GB2969@lp-shahaf.local>
	<CACac1F9pU0nFpK2kD9kpJF9-iJ0ZH=hm_+=bH+FHka92V=A5Ww@mail.gmail.com>
	<CAH0mxTQfJkWzQGGUxwbzganGMoB+tJ82cpohgu-28pNnGtTm7w@mail.gmail.com>
	<20130119123329.GD2969@lp-shahaf.local>
Message-ID: <loom.20130119T144311-843@post.gmane.org>

Daniel Shahaf <d.s at ...> writes:

> Joao S. O. Bueno wrote on Sat, Jan 19, 2013 at 10:17:57 -0200:
> > with temp_dir("/tmp"):
> >     # things that perform in  /tmp
> > # directory is restored.
> 
> +1

I implemented this in distlib as:

@contextlib.contextmanager
def chdir(d):
    cwd = os.getcwd()
    try:
        os.chdir(d)
        yield
    finally:
        os.chdir(cwd)

which could perhaps be placed in shutil, so usage would be:

with shutil.chdir('new_dir'):
    # work with new_dir as current dir
# directory restored when you get here.
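
One nice property of this shape (repeating the distlib helper above for
completeness) is that the restore also runs when the body raises:

```python
import contextlib
import os
import tempfile

@contextlib.contextmanager
def chdir(d):
    # Same shape as the distlib helper quoted above.
    cwd = os.getcwd()
    try:
        os.chdir(d)
        yield
    finally:
        os.chdir(cwd)

before = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    try:
        with chdir(tmp):
            raise RuntimeError("simulated failure inside the block")
    except RuntimeError:
        pass
# cwd is restored even though the body raised
assert os.getcwd() == before
```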

Regards,

Vinay Sajip




From phd at phdru.name  Sat Jan 19 15:02:19 2013
From: phd at phdru.name (Oleg Broytman)
Date: Sat, 19 Jan 2013 18:02:19 +0400
Subject: [Python-ideas] chdir context manager
In-Reply-To: <loom.20130119T144311-843@post.gmane.org>
References: <20130119101024.GB2969@lp-shahaf.local>
	<CACac1F9pU0nFpK2kD9kpJF9-iJ0ZH=hm_+=bH+FHka92V=A5Ww@mail.gmail.com>
	<CAH0mxTQfJkWzQGGUxwbzganGMoB+tJ82cpohgu-28pNnGtTm7w@mail.gmail.com>
	<20130119123329.GD2969@lp-shahaf.local>
	<loom.20130119T144311-843@post.gmane.org>
Message-ID: <20130119140219.GA10303@iskra.aviel.ru>

On Sat, Jan 19, 2013 at 01:46:26PM +0000, Vinay Sajip <vinay_sajip at yahoo.co.uk> wrote:
> Daniel Shahaf <d.s at ...> writes:
> 
> > Joao S. O. Bueno wrote on Sat, Jan 19, 2013 at 10:17:57 -0200:
> > > with temp_dir("/tmp"):
> > >     # things that perform in  /tmp
> > > # directory is restored.
> > 
> > +1
> 
> I implemented this in distlib as:
> 
> @contextlib.contextmanager
> def chdir(d):
>     cwd = os.getcwd()
>     try:
>         os.chdir(d)
>         yield
>     finally:
>         os.chdir(cwd)
> 
> which could perhaps be placed in shutil, so usage would be:
> 
> with shutil.chdir('new_dir'):
>     # work with new_dir as current dir
> # directory restored when you get here.

   Pushd or pushdir would be a better name, IMHO.

https://en.wikipedia.org/wiki/Pushd_and_popd

   Quite a known pair of names.

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            phd at phdru.name
           Programmers don't die, they just GOSUB without RETURN.


From ncoghlan at gmail.com  Sat Jan 19 15:57:47 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 20 Jan 2013 00:57:47 +1000
Subject: [Python-ideas] chdir context manager
In-Reply-To: <20130119101024.GB2969@lp-shahaf.local>
References: <20130119101024.GB2969@lp-shahaf.local>
Message-ID: <CADiSq7eqBwWa8Gv__2S-r85SYDgjwAVe7qB=VL5SS_mwkFAK-Q@mail.gmail.com>

-1 from me

I consider caring about the current directory to be an anti-pattern -
paths should be converted to absolute ASAP, and for invocation of
other tools that care about the current directory, that's why the
subprocess APIs accept a "cwd" argument. I certainly don't want to
encourage people to unnecessarily rely on global state by providing a
standard library context manager that makes it easier to do so.
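
For the subprocess case mentioned, the "cwd" argument scopes the directory
change to the child process and leaves the parent's global state untouched;
a small sketch:

```python
import os
import subprocess
import sys
import tempfile

parent_before = os.getcwd()
with tempfile.TemporaryDirectory() as d:
    # The child runs with d as its working directory; ours never changes.
    out = subprocess.check_output(
        [sys.executable, "-c", "import os; print(os.getcwd())"],
        cwd=d)
    child_saw_d = os.path.samefile(out.decode().strip(), d)
parent_unchanged = os.getcwd() == parent_before
```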

Regards,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From d.s at daniel.shahaf.name  Sat Jan 19 16:06:31 2013
From: d.s at daniel.shahaf.name (Daniel Shahaf)
Date: Sat, 19 Jan 2013 17:06:31 +0200
Subject: [Python-ideas] chdir context manager
In-Reply-To: <kde7ip$t5a$1@ger.gmane.org>
References: <20130119101024.GB2969@lp-shahaf.local>
	<kde7ip$t5a$1@ger.gmane.org>
Message-ID: <20130119150631.GF2969@lp-shahaf.local>

Terry Reedy wrote on Sat, Jan 19, 2013 at 08:37:17 -0500:
> On 1/19/2013 5:10 AM, Daniel Shahaf wrote:
>> The following is a common pattern (used by, for example,
>> shutil.make_archive):
>>
>>      save_cwd = os.getcwd()
>>      try:
>>          foo()
>>      finally:
>>          os.chdir(save_cwd)
>>
>> I suggest this deserves a context manager:
>>
>>      with saved_cwd():
>>          foo()
>
> So to me, your proposal is only 1/2 or 2/3 of a context manager. (And  
> 'returns an open file descriptor for the saved directory' seems backward  
> or wrong for a context manager.) It does not actually make a new  

What should __enter__ return, then?

It could return None, the to-be-restored directory's file descriptor, or
the newly-changed-to directory (once a "directory to chdir to" optional
argument is added).  The latter could be either a pathname (string) or
a file descriptor (since it's just passed through to os.chdir).

It seems to me returning the old dir's fd would be the most useful of
the three options, since the other two are things callers already have
--- None, which is global, and the argument to the context manager.

> context. A proper temp_cwd context manager should have one parameter,  
> the new working directory, with chdir(new_cwd) in the enter method. To  
> allow for conditional switching, the two chdir system calls could be  
> conditional on new_cwd (either None or '' would mean no chdir calls).
>

I think making the new_cwd argument optional would be useful if the
context manager body does multiple chdir() calls:

    with saved_cwd():
        os.chdir('/foo')
        do_something()
        os.chdir('/bar')
        do_something()

I'm not sure if that's exactly what you suggest --- you seem to be
suggesting that saved_cwd(None) will avoid calling fchdir() from
__exit__()?

> Looking at your pattern, if foo() does not change cwd, the save and  
> restore is pointless, even if harmless.

Do you have a better suggestion?  Determining whether the fchdir() call
can be avoided, if possible, presumably requires a system call, so
I figure you might as well call fchdir() without trying to make that
determination.

> If foo does change cwd, it  should also restore it, whether explicitly
> or with a context manager  temp_cwd.
>
> Looking at your actual example, shutil.make_archive, the change and  
> restore are conditional and asymmetrical.
>

shutil.make_archive is just the first place in stdlib which uses the
pattern, or something close to it.  It's not exactly a canonical
example.  There are some canonical examples of the "pattern" in
test_os.py.

> -- 
> Terry Jan Reedy

Cheers

Daniel


From ironfroggy at gmail.com  Sat Jan 19 16:27:56 2013
From: ironfroggy at gmail.com (Calvin Spealman)
Date: Sat, 19 Jan 2013 10:27:56 -0500
Subject: [Python-ideas] chdir context manager
In-Reply-To: <CADiSq7eqBwWa8Gv__2S-r85SYDgjwAVe7qB=VL5SS_mwkFAK-Q@mail.gmail.com>
References: <20130119101024.GB2969@lp-shahaf.local>
	<CADiSq7eqBwWa8Gv__2S-r85SYDgjwAVe7qB=VL5SS_mwkFAK-Q@mail.gmail.com>
Message-ID: <CAGaVwhQQ_fjfSuw3kPiVffqvWvDvLH8U=zt8_+7Q4ekoejp5pA@mail.gmail.com>

-1 from me, as well. Encouraging a bad habit.


On Sat, Jan 19, 2013 at 9:57 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> -1 from me
>
> I consider caring about the current directory to be an anti-pattern -
> paths should be converted to absolute ASAP, and for invocation of
> other tools that care about the current directory, that's why the
> subprocess APIs accept a "cwd" argument. I certainly don't want to
> encourage people to unnecessarily rely on global state by providing a
> standard library context manager that makes it easier to do so.
>
> Regards,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>



-- 
Read my blog! I depend on your acceptance of my opinion! I am interesting!
http://techblog.ironfroggy.com/
Follow me if you're into that sort of thing:
http://www.twitter.com/ironfroggy
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130119/1d5622d1/attachment.html>

From christian at python.org  Sat Jan 19 16:33:44 2013
From: christian at python.org (Christian Heimes)
Date: Sat, 19 Jan 2013 16:33:44 +0100
Subject: [Python-ideas] chdir context manager
In-Reply-To: <20130119101024.GB2969@lp-shahaf.local>
References: <20130119101024.GB2969@lp-shahaf.local>
Message-ID: <50FABCD8.9080709@python.org>

Am 19.01.2013 11:10, schrieb Daniel Shahaf:
> The following is a common pattern (used by, for example,
> shutil.make_archive):
> 
>     save_cwd = os.getcwd()
>     try:
>         foo()
>     finally:
>         os.chdir(save_cwd)
> 
> I suggest this deserves a context manager:
> 
>     with saved_cwd():
>         foo()

-1 from me, too.

chdir() is not a safe operation because it affects the whole process.
You can NOT make it work properly and safely in a multi-threaded
environment or from code like signal handlers.

The Open Group has acknowledged the issue and added a new set of
functions to POSIX.1-2008 to address it. The *at() variants of
functions like open() take an additional file descriptor as their
first argument. The fd must refer to a directory and is used as the
base for relative paths. Python 3.3 supports the new *at() feature.
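
For illustration, a minimal sketch of the dir_fd approach (assuming
a POSIX platform where os.chdir is in os.supports_dir_fd; the
directory and file names here are made up):

```python
import os
import tempfile

base = tempfile.mkdtemp()

# Open a descriptor on the directory and use it as the base for
# relative paths via dir_fd -- the openat()-style API -- instead of
# touching the process-wide current directory.
dir_fd = os.open(base, os.O_RDONLY)
try:
    fd = os.open("example.txt", os.O_WRONLY | os.O_CREAT, dir_fd=dir_fd)
    with open(fd, "w") as f:   # builtin open() adopts and closes fd
        f.write("hello\n")
finally:
    os.close(dir_fd)
```

Because no thread ever changes the cwd, this is safe even when other
threads are doing path operations concurrently.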

Christian


From christian at python.org  Sat Jan 19 16:40:32 2013
From: christian at python.org (Christian Heimes)
Date: Sat, 19 Jan 2013 16:40:32 +0100
Subject: [Python-ideas] chdir context manager
In-Reply-To: <CAGaVwhQQ_fjfSuw3kPiVffqvWvDvLH8U=zt8_+7Q4ekoejp5pA@mail.gmail.com>
References: <20130119101024.GB2969@lp-shahaf.local>
	<CADiSq7eqBwWa8Gv__2S-r85SYDgjwAVe7qB=VL5SS_mwkFAK-Q@mail.gmail.com>
	<CAGaVwhQQ_fjfSuw3kPiVffqvWvDvLH8U=zt8_+7Q4ekoejp5pA@mail.gmail.com>
Message-ID: <50FABE70.7040902@python.org>

Am 19.01.2013 16:27, schrieb Calvin Spealman:
> -1 from me, as well. Encouraging a bad habit.

It's not just a bad habit. It's broken by design because it's a major
race condition.



From eliben at gmail.com  Sat Jan 19 16:53:12 2013
From: eliben at gmail.com (Eli Bendersky)
Date: Sat, 19 Jan 2013 07:53:12 -0800
Subject: [Python-ideas] chdir context manager
In-Reply-To: <20130119101024.GB2969@lp-shahaf.local>
References: <20130119101024.GB2969@lp-shahaf.local>
Message-ID: <CAF-Rda844Fj=9HE21hsA=eSS3FPz+yE=0R4061gpVarRctwfxA@mail.gmail.com>

On Sat, Jan 19, 2013 at 2:10 AM, Daniel Shahaf <d.s at daniel.shahaf.name>wrote:

> The following is a common pattern (used by, for example,
> shutil.make_archive):
>
>     save_cwd = os.getcwd()
>     try:
>         foo()
>     finally:
>         os.chdir(save_cwd)
>
> I suggest this deserves a context manager:
>
>     with saved_cwd():
>         foo()
>
> Initial feedback on IRC suggests shutil as where this functionality
> should live (other suggestions were made, such as pathlib).  Hence,
> attached patch implements this as shutil.saved_cwd, based on os.fchdir.
>
> The patch also adds os.chdir to os.supports_dir_fd and documents the
> context manager abilities of builtins.open() in its reference.
>
> Thoughts?
>
>
I don't think that every trivial convenience context manager should be
added to the standard library. It's just "yet another thing to look up". As
the discussion shows, the semantics of such a context manager are unclear
(does it do the change-dir itself, or does the user code do it?), which
makes it even more important to look up once you see it.

Moreover, this kind of pattern is too general, and specializing it for
each use case is burdensome. I've frequently written similar context
managers for other uses. The pattern is:

saved = save_call()
yield
restore_call(saved)

You can have it for chdir, for sys.path, for the seek position in a
stream, for anything, really, where it may be useful to do some
operation with a temporary state.
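
The general pattern Eli describes can be written once with
contextlib; here is a sketch for sys.path (the helper name and the
inserted path are purely illustrative):

```python
import sys
from contextlib import contextmanager

@contextmanager
def saved_sys_path():
    # save_call(): take a snapshot of the state to restore.
    saved = list(sys.path)
    try:
        yield
    finally:
        # restore_call(saved): put the snapshot back, even on error.
        sys.path[:] = saved

with saved_sys_path():
    sys.path.insert(0, "/nonexistent/plugins")
    # ... import things from the temporary location ...
# sys.path is restored here
```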

Eli

From guido at python.org  Sat Jan 19 18:18:18 2013
From: guido at python.org (Guido van Rossum)
Date: Sat, 19 Jan 2013 09:18:18 -0800
Subject: [Python-ideas] chdir context manager
In-Reply-To: <CADiSq7eqBwWa8Gv__2S-r85SYDgjwAVe7qB=VL5SS_mwkFAK-Q@mail.gmail.com>
References: <20130119101024.GB2969@lp-shahaf.local>
	<CADiSq7eqBwWa8Gv__2S-r85SYDgjwAVe7qB=VL5SS_mwkFAK-Q@mail.gmail.com>
Message-ID: <CAP7+vJK4Jsg_6hD704fc5Re-yM0ACB3JwaYhSfCfVMwSWzjY8A@mail.gmail.com>

On Sat, Jan 19, 2013 at 6:57 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> -1 from me
>
> I consider caring about the current directory to be an anti-pattern -
> paths should be converted to absolute ASAP, and for invocation of
> other tools that care about the current directory, that's why the
> subprocess APIs accept a "cwd" argument. I certainly don't want to
> encourage people to unnecessarily rely on global state by providing a
> standard library context manager that makes it easier to do so.

Also it's not thread-safe.

TBH I think if people are doing this today it's probably a good idea
to suggest that they make their code more reliable by turning it into
a context manager; but I think having that context manager in the
stdlib is encouraging dubious practices. (The recommendation to use
absolute filenames is a good one but not always easy to implement
given a large codebase relying on the current directory.)
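
The kind of user-level helper Guido suggests people write for
themselves might look like this (a sketch only; the name temp_cwd is
made up, and as noted above it is not thread-safe because the cwd is
process-global):

```python
import os
from contextlib import contextmanager

@contextmanager
def temp_cwd(path=None):
    # Save the current directory, optionally chdir to `path`, and
    # always restore on exit -- a CM form of the try/finally pattern.
    saved = os.getcwd()
    if path is not None:
        os.chdir(path)
    try:
        yield saved
    finally:
        os.chdir(saved)
```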

-- 
--Guido van Rossum (python.org/~guido)


From vinay_sajip at yahoo.co.uk  Sat Jan 19 18:29:33 2013
From: vinay_sajip at yahoo.co.uk (Vinay Sajip)
Date: Sat, 19 Jan 2013 17:29:33 +0000 (UTC)
Subject: [Python-ideas] chdir context manager
References: <20130119101024.GB2969@lp-shahaf.local>
	<CADiSq7eqBwWa8Gv@mail.gmail.com>
Message-ID: <loom.20130119T182004-956@post.gmane.org>

Nick Coghlan <ncoghlan at ...> writes:

> 
> -1 from me
> 
> I consider caring about the current directory to be an anti-pattern

I would agree, but in some places we unfortunately have to care about this,
because of stdlib history - for example, distutils. Wherever you have to run
"python setup.py ..." there is an implicit assumption that anything setup.py
looks at will be relative to wherever setup.py is - it's seldom invoked
as "python /path/to/setup.py", and from what I've seen, very few projects
do the right thing in setup.py and the code called from it: compute an
absolute path for the directory setup.py is in, then use it in subsequent
operations.

I agree that we shouldn't encourage this kind of behaviour :-)

Regards,

Vinay Sajip

From ben at bendarnell.com  Sat Jan 19 18:32:55 2013
From: ben at bendarnell.com (Ben Darnell)
Date: Sat, 19 Jan 2013 12:32:55 -0500
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <1D762043-961F-4F1E-A0A6-0C8C78AFA59B@twistedmatrix.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CAP7+vJLsnGCJMe0P=DC5OjMgkcMSP1By4fu+Ft0q-y8Jn3X=Aw@mail.gmail.com>
	<50F9E1EA.4010305@canterbury.ac.nz>
	<CAP7+vJLL0M=zfEDtR5s-7t5hUVPaZKNfjNDjcFJ=yWCJv6j6gg@mail.gmail.com>
	<1D762043-961F-4F1E-A0A6-0C8C78AFA59B@twistedmatrix.com>
Message-ID: <CAFkYKJ4PtvoGBdwXFtrbMJRW-sYMxxSEC-7vxNy4g6GQRsF5Wg@mail.gmail.com>

On Fri, Jan 18, 2013 at 8:23 PM, Glyph <glyph at twistedmatrix.com> wrote:

>
> On Jan 18, 2013, at 4:12 PM, Guido van Rossum <guido at python.org> wrote:
>
> On Fri, Jan 18, 2013 at 3:59 PM, Greg Ewing <greg.ewing at canterbury.ac.nz>
> wrote:
>
>
> Guido van Rossum wrote:
>
> Well, except that you can't just pass CallbackProtocol where a
> protocol factory is required by the PEP -- you'll have to pass a
> lambda or partial function without arguments that calls
> CallbackProtocol with some arguments taken from elsewhere.
>
>
> Something smells wrong to me about APIs that require protocol
> factories.
>
>
> For starters, nothing "smells wrong" to me about protocol factories.
>  Responding to this kind of criticism is difficult, because it's not
> substantive - what's the actual problem?  I think that some Python
> programmers have an aversion to factories because a common path to Python
> is flight from Java environments that over- or mis-use the factory pattern.
>
>
I think the smell is that the factory is A) only used once and B) invoked
without adding any additional arguments that weren't available when the
factory was passed in, so there's no clear reason to defer creation of the
protocol.  I think it would make more sense if the transport were passed as
an argument to the factory (and then we could get rid of connection_made as
a required method on Protocol, although libraries or applications that want
to separate protocol creation from connection_made could still do so in
their own factories).

-Ben

From tjreedy at udel.edu  Sat Jan 19 18:40:50 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 19 Jan 2013 12:40:50 -0500
Subject: [Python-ideas] chdir context manager
In-Reply-To: <CADiSq7eqBwWa8Gv__2S-r85SYDgjwAVe7qB=VL5SS_mwkFAK-Q@mail.gmail.com>
References: <20130119101024.GB2969@lp-shahaf.local>
	<CADiSq7eqBwWa8Gv__2S-r85SYDgjwAVe7qB=VL5SS_mwkFAK-Q@mail.gmail.com>
Message-ID: <kdelre$kgh$1@ger.gmane.org>

On 1/19/2013 9:57 AM, Nick Coghlan wrote:
> -1 from me
>
> I consider caring about the current directory to be an anti-pattern -
> paths should be converted to absolute ASAP, and for invocation of
> other tools that care about the current directory, that's why the
> subprocess APIs accept a "cwd" argument. I certainly don't want to
> encourage people to unnecessarily rely on global state by providing a
> standard library context manager that makes it easier to do so.

Are you suggesting then that stdlib functions, such as archive makers,
should 1) not require any particular setting of cwd but should have
parameters that allow all paths to be passed as absolute paths, and 2) not
change cwd? If so, then shutil.make_archive should be able to pass
absolute source and target paths to the archive makers, rather than
having to set cwd before calling them.

-- 
Terry Jan Reedy



From guido at python.org  Sat Jan 19 19:06:41 2013
From: guido at python.org (Guido van Rossum)
Date: Sat, 19 Jan 2013 10:06:41 -0800
Subject: [Python-ideas] chdir context manager
In-Reply-To: <kdelre$kgh$1@ger.gmane.org>
References: <20130119101024.GB2969@lp-shahaf.local>
	<CADiSq7eqBwWa8Gv__2S-r85SYDgjwAVe7qB=VL5SS_mwkFAK-Q@mail.gmail.com>
	<kdelre$kgh$1@ger.gmane.org>
Message-ID: <CAP7+vJKE8fAgDge6TGygJikWGt1AdeAEqkym5ZtdLus-ciPziw@mail.gmail.com>

AFAICT shutil.make_archive() already has all the information it needs
to be able to do its job without using chdir -- it's just being lazy.
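
The chdir-free approach can be sketched with zipfile: walk the tree
with absolute paths and compute each member's archive name relative
to the source directory. (This is an illustration, not the actual
make_archive implementation; the helper name is made up.)

```python
import os
import zipfile

def zip_tree(src_dir, archive_path):
    # No os.chdir() needed: every path handed to the OS is absolute,
    # and the *archive-internal* name is derived with relpath().
    src_dir = os.path.abspath(src_dir)
    with zipfile.ZipFile(archive_path, "w") as zf:
        for root, dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                zf.write(full, arcname=os.path.relpath(full, src_dir))
```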

On Sat, Jan 19, 2013 at 9:40 AM, Terry Reedy <tjreedy at udel.edu> wrote:
> On 1/19/2013 9:57 AM, Nick Coghlan wrote:
>>
>> -1 from me
>>
>> I consider caring about the current directory to be an anti-pattern -
>> paths should be converted to absolute ASAP, and for invocation of
>> other tools that care about the current directory, that's why the
>> subprocess APIs accept a "cwd" argument. I certainly don't want to
>> encourage people to unnecessarily rely on global state by providing a
>> standard library context manager that makes it easier to do so.
>
>
> Are you suggesting then that stdlib functions, such as archive makers,
> should 1) not require any particular setting of cwd but should have
> parameters that allow all paths to be passed as absolute paths, and 2) not
> change cwd? If so, then shutil.make_archive should be able to pass absolute
> source and target paths to the archive makers, rather than having to set cwd
> before calling them.
>
> --
> Terry Jan Reedy
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas



-- 
--Guido van Rossum (python.org/~guido)


From tjreedy at udel.edu  Sat Jan 19 19:07:09 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 19 Jan 2013 13:07:09 -0500
Subject: [Python-ideas] chdir context manager
In-Reply-To: <20130119150631.GF2969@lp-shahaf.local>
References: <20130119101024.GB2969@lp-shahaf.local>
	<kde7ip$t5a$1@ger.gmane.org>
	<20130119150631.GF2969@lp-shahaf.local>
Message-ID: <kdencp$1b4$1@ger.gmane.org>

On 1/19/2013 10:06 AM, Daniel Shahaf wrote:
> Terry Reedy wrote on Sat, Jan 19, 2013 at 08:37:17 -0500:
>> On 1/19/2013 5:10 AM, Daniel Shahaf wrote:
>>> The following is a common pattern (used by, for example,
>>> shutil.make_archive):
>>>
>>>       save_cwd = os.getcwd()
>>>       try:
>>>           foo()
>>>       finally:
>>>           os.chdir(save_cwd)
>>>
>>> I suggest this deserves a context manager:
>>>
>>>       with saved_cwd():
>>>           foo()
>>
>> So to me, your proposal is only 1/2 or 2/3 of a context manager. (And
>> 'returns an open file descriptor for the saved directory' seems backward
>> or wrong for a context manager.) It does not actually make a new
>
> What should __enter__ return, then?
>
> It could return None, the to-be-restored directory's file descriptor, or
> the newly-changed-to directory (once a "directory to chdir to" optional
> argument is added).  The latter could be either a pathname (string) or
> a file descriptor (since it's just passed through to os.chdir).
>
> It seems to me returning the old dir's fd would be the most useful of
> the three options, since the other two are things callers already have
> --- None, which is global, and the argument to the context manager.

make_archive would prefer the old dir pathname, as it wants that for the 
logging call. But I do not think that that should drive design.

>> context. A proper temp_cwd context manager should have one parameter,
>> the new working directory, with chdir(new_cwd) in the enter method. To
>> allow for conditional switching, the two chdir system calls could be
>> conditional on new_cwd (either None or '' would mean no chdir calls).
>>
>
> I think making the new_cwd argument optional would be useful if the
> context manager body does multiple chdir() calls:
>
>      with saved_cwd():
>          os.chdir('/foo')
>          do_something()
>          os.chdir('/bar')
>          do_something()
>
> I'm not sure if that's exactly what you suggest --- you seem to be
> suggesting that saved_cwd(None) will avoid calling fchdir() from
> __exit__()?

I was, but that is a non-essential optimization. My idea is basically 
similar to Bueno's except for parameter absent versus None (and the two 
cases could be handled differently).

I think this proposal suffers a bit from being both too specific and too 
general. Eli explained the 'too specific' part: there are many things 
that might be changed and changed back. The 'too general' part is that 
specific applications need different specific details. There are various 
possibilities of what to do in and return from __enter__.

However, given the strong -1 from at least three core developers and
one other person, the details seem moot.

-- 
Terry Jan Reedy



From python at mrabarnett.plus.com  Sat Jan 19 19:32:21 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Sat, 19 Jan 2013 18:32:21 +0000
Subject: [Python-ideas] chdir context manager
In-Reply-To: <kdencp$1b4$1@ger.gmane.org>
References: <20130119101024.GB2969@lp-shahaf.local>
	<kde7ip$t5a$1@ger.gmane.org>
	<20130119150631.GF2969@lp-shahaf.local>
	<kdencp$1b4$1@ger.gmane.org>
Message-ID: <50FAE6B5.3030507@mrabarnett.plus.com>

On 2013-01-19 18:07, Terry Reedy wrote:
> On 1/19/2013 10:06 AM, Daniel Shahaf wrote:
>> Terry Reedy wrote on Sat, Jan 19, 2013 at 08:37:17 -0500:
>>> On 1/19/2013 5:10 AM, Daniel Shahaf wrote:
>>>> The following is a common pattern (used by, for example,
>>>> shutil.make_archive):
>>>>
>>>>       save_cwd = os.getcwd()
>>>>       try:
>>>>           foo()
>>>>       finally:
>>>>           os.chdir(save_cwd)
>>>>
>>>> I suggest this deserves a context manager:
>>>>
>>>>       with saved_cwd():
>>>>           foo()
>>>
>>> So to me, your proposal is only 1/2 or 2/3 of a context manager. (And
>>> 'returns an open file descriptor for the saved directory' seems backward
>>> or wrong for a context manager.) It does not actually make a new
>>
>> What should __enter__ return, then?
>>
>> It could return None, the to-be-restored directory's file descriptor, or
>> the newly-changed-to directory (once a "directory to chdir to" optional
>> argument is added).  The latter could be either a pathname (string) or
>> a file descriptor (since it's just passed through to os.chdir).
>>
>> It seems to me returning the old dir's fd would be the most useful of
>> the three options, since the other two are things callers already have
>> --- None, which is global, and the argument to the context manager.
>
> make_archive would prefer the old dir pathname, as it wants that for the
> logging call. But I do not think that that should drive design.
>
>>> context. A proper temp_cwd context manager should have one parameter,
>>> the new working directory, with chdir(new_cwd) in the enter method. To
>>> allow for conditional switching, the two chdir system calls could be
>>> conditional on new_cwd (either None or '' would mean no chdir calls).
>>>
>>
>> I think making the new_cwd argument optional would be useful if the
>> context manager body does multiple chdir() calls:
>>
>>      with saved_cwd():
>>          os.chdir('/foo')
>>          do_something()
>>          os.chdir('/bar')
>>          do_something()
>>
>> I'm not sure if that's exactly what you suggest --- you seem to be
>> suggesting that saved_cwd(None) will avoid calling fchdir() from
>> __exit__()?
>
> I was, but that is a non-essential optimization. My idea is basically
> similar to Bueno's except for parameter absent versus None (and the two
> cases could be handled differently).
>
> I think this proposal suffers a bit from being both too specific and too
> general. Eli explained the 'too specific' part: there are many things
> that might be changed and changed back. The 'too general' part is that
> specific applications need different specific details. There are various
> possibilities of what to do in and return from __enter__.
>
> However, given the strong -1 from at least three core developers and
> one other person, the details seem moot.
>
FWIW, -1 from me too because, as has been said already, you shouldn't
really be using os.chdir; use absolute paths instead.


From chris.jerdonek at gmail.com  Sat Jan 19 19:52:54 2013
From: chris.jerdonek at gmail.com (Chris Jerdonek)
Date: Sat, 19 Jan 2013 10:52:54 -0800
Subject: [Python-ideas] chdir context manager
In-Reply-To: <20130119101024.GB2969@lp-shahaf.local>
References: <20130119101024.GB2969@lp-shahaf.local>
Message-ID: <CAOTb1weP+imgTr0nP2O-0T-9Q9Zq+ZFYqNkVQ6xXGYpbVsGTow@mail.gmail.com>

On Sat, Jan 19, 2013 at 2:10 AM, Daniel Shahaf <d.s at daniel.shahaf.name> wrote:
> The following is a common pattern (used by, for example,
> shutil.make_archive):
>
>     save_cwd = os.getcwd()
>     try:
>         foo()
>     finally:
>         os.chdir(save_cwd)

FWIW, test.support has such a context manager (though test.support is
not for public consumption, and its implementation does more than one
thing; see issue 15415):

http://hg.python.org/cpython/file/48cddcb9c841/Lib/test/support.py#l738

--Chris


>
> I suggest this deserves a context manager:
>
>     with saved_cwd():
>         foo()
>
> Initial feedback on IRC suggests shutil as where this functionality
> should live (other suggestions were made, such as pathlib).  Hence,
> attached patch implements this as shutil.saved_cwd, based on os.fchdir.
>
> The patch also adds os.chdir to os.supports_dir_fd and documents the
> context manager abilities of builtins.open() in its reference.
>
> Thoughts?
>
> Thanks,
>
> Daniel
>
>
> diff -r 74b0461346f0 Doc/library/functions.rst
> --- a/Doc/library/functions.rst Fri Jan 18 17:53:18 2013 -0800
> +++ b/Doc/library/functions.rst Sat Jan 19 09:39:27 2013 +0000
> @@ -828,6 +828,9 @@ are always available.  They are listed h
>     Open *file* and return a corresponding :term:`file object`.  If the file
>     cannot be opened, an :exc:`OSError` is raised.
>
> +   This function can be used as a :term:`context manager` that closes the
> +   file when it exits.
> +
>     *file* is either a string or bytes object giving the pathname (absolute or
>     relative to the current working directory) of the file to be opened or
>     an integer file descriptor of the file to be wrapped.  (If a file descriptor
> diff -r 74b0461346f0 Doc/library/os.rst
> --- a/Doc/library/os.rst        Fri Jan 18 17:53:18 2013 -0800
> +++ b/Doc/library/os.rst        Sat Jan 19 09:39:27 2013 +0000
> @@ -1315,6 +1315,9 @@ features:
>     This function can support :ref:`specifying a file descriptor <path_fd>`.  The
>     descriptor must refer to an opened directory, not an open file.
>
> +   See also :func:`shutil.saved_cwd` for a context manager that restores the
> +   current working directory.
> +
>     Availability: Unix, Windows.
>
>     .. versionadded:: 3.3
> diff -r 74b0461346f0 Doc/library/shutil.rst
> --- a/Doc/library/shutil.rst    Fri Jan 18 17:53:18 2013 -0800
> +++ b/Doc/library/shutil.rst    Sat Jan 19 09:39:27 2013 +0000
> @@ -36,6 +36,19 @@ copying and removal. For operations on i
>  Directory and files operations
>  ------------------------------
>
> +.. function:: saved_cwd()
> +
> +   Return a :term:`context manager` that restores the current working directory
> +   when it exits.  See :func:`os.chdir` for changing the current working
> +   directory.
> +
> +   The context manager returns an open file descriptor for the saved directory.
> +
> +   Only available when :func:`os.chdir` supports file descriptor arguments.
> +
> +   .. versionadded:: 3.4
> +
> +
>  .. function:: copyfileobj(fsrc, fdst[, length])
>
>     Copy the contents of the file-like object *fsrc* to the file-like object *fdst*.
> diff -r 74b0461346f0 Lib/os.py
> --- a/Lib/os.py Fri Jan 18 17:53:18 2013 -0800
> +++ b/Lib/os.py Sat Jan 19 09:39:27 2013 +0000
> @@ -120,6 +120,7 @@ if _exists("_have_functions"):
>
>      _set = set()
>      _add("HAVE_FACCESSAT",  "access")
> +    _add("HAVE_FCHDIR",     "chdir")
>      _add("HAVE_FCHMODAT",   "chmod")
>      _add("HAVE_FCHOWNAT",   "chown")
>      _add("HAVE_FSTATAT",    "stat")
> diff -r 74b0461346f0 Lib/shutil.py
> --- a/Lib/shutil.py     Fri Jan 18 17:53:18 2013 -0800
> +++ b/Lib/shutil.py     Sat Jan 19 09:39:27 2013 +0000
> @@ -38,6 +38,7 @@ __all__ = ["copyfileobj", "copyfile", "c
>             "unregister_unpack_format", "unpack_archive",
>             "ignore_patterns", "chown", "which"]
>             # disk_usage is added later, if available on the platform
> +           # saved_cwd is added later, if available on the platform
>
>  class Error(OSError):
>      pass
> @@ -1111,3 +1112,20 @@ def which(cmd, mode=os.F_OK | os.X_OK, p
>                  if _access_check(name, mode):
>                      return name
>      return None
> +
> +# Define the chdir context manager.
> +if os.chdir in os.supports_dir_fd:
> +    class saved_cwd:
> +        def __init__(self):
> +            pass
> +        def __enter__(self):
> +            self.dh = os.open(os.curdir,
> +                              os.O_RDONLY | getattr(os, 'O_DIRECTORY', 0))
> +            return self.dh
> +        def __exit__(self, exc_type, exc_value, traceback):
> +            try:
> +                os.chdir(self.dh)
> +            finally:
> +                os.close(self.dh)
> +            return False
> +    __all__.append('saved_cwd')
> diff -r 74b0461346f0 Lib/test/test_shutil.py
> --- a/Lib/test/test_shutil.py   Fri Jan 18 17:53:18 2013 -0800
> +++ b/Lib/test/test_shutil.py   Sat Jan 19 09:39:27 2013 +0000
> @@ -1276,6 +1276,20 @@ class TestShutil(unittest.TestCase):
>          rv = shutil.copytree(src_dir, dst_dir)
>          self.assertEqual(['foo'], os.listdir(rv))
>
> +    def test_saved_cwd(self):
> +        if hasattr(os, 'fchdir'):
> +            temp_dir = self.mkdtemp()
> +            orig_dir = os.getcwd()
> +            with shutil.saved_cwd() as dir_fd:
> +                os.chdir(temp_dir)
> +                new_dir = os.getcwd()
> +                self.assertIsInstance(dir_fd, int)
> +            final_dir = os.getcwd()
> +            self.assertEqual(orig_dir, final_dir)
> +            self.assertEqual(temp_dir, new_dir)
> +        else:
> +            self.assertFalse(hasattr(shutil, 'saved_cwd'))
> +
>
>  class TestWhich(unittest.TestCase):
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas


From glyph at twistedmatrix.com  Sat Jan 19 23:53:43 2013
From: glyph at twistedmatrix.com (Glyph)
Date: Sat, 19 Jan 2013 14:53:43 -0800
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CAFkYKJ4PtvoGBdwXFtrbMJRW-sYMxxSEC-7vxNy4g6GQRsF5Wg@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CAP7+vJLsnGCJMe0P=DC5OjMgkcMSP1By4fu+Ft0q-y8Jn3X=Aw@mail.gmail.com>
	<50F9E1EA.4010305@canterbury.ac.nz>
	<CAP7+vJLL0M=zfEDtR5s-7t5hUVPaZKNfjNDjcFJ=yWCJv6j6gg@mail.gmail.com>
	<1D762043-961F-4F1E-A0A6-0C8C78AFA59B@twistedmatrix.com>
	<CAFkYKJ4PtvoGBdwXFtrbMJRW-sYMxxSEC-7vxNy4g6GQRsF5Wg@mail.gmail.com>
Message-ID: <BEE3B68A-94DE-440A-942B-40C0A2033166@twistedmatrix.com>


On Jan 19, 2013, at 9:32 AM, Ben Darnell <ben at bendarnell.com> wrote:

> On Fri, Jan 18, 2013 at 8:23 PM, Glyph <glyph at twistedmatrix.com> wrote:
> 
> On Jan 18, 2013, at 4:12 PM, Guido van Rossum <guido at python.org> wrote:
> 
>> On Fri, Jan 18, 2013 at 3:59 PM, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> 
>>> Guido van Rossum wrote:
>>> 
>>>> Well, except that you can't just pass CallbackProtocol where a
>>>> protocol factory is required by the PEP -- you'll have to pass a
>>>> lambda or partial function without arguments that calls
>>>> CallbackProtocol with some arguments taken from elsewhere.
>>> 
>>> Something smells wrong to me about APIs that require protocol
>>> factories.
> 
> For starters, nothing "smells wrong" to me about protocol factories.  Responding to this kind of criticism is difficult, because it's not substantive - what's the actual problem?  I think that some Python programmers have an aversion to factories because a common path to Python is flight from Java environments that over- or mis-use the factory pattern.
> 
> 
> I think the smell is that the factory is A) only used once and B) invoked without adding any additional arguments that weren't available when the factory was passed in, so there's no clear reason to defer creation of the protocol.  I think it would make more sense if the transport were passed as an argument to the factory (and then we could get rid of connection_made as a required method on Protocol, although libraries or applications that want to separate protocol creation from connection_made could still do so in their own factories).

The problem with creating the protocol with the transport as an argument to its constructor is that, in order to behave correctly, the transport needs to know about the protocol as well; so it also wants a reference to the protocol passed to *its* constructor.  Adding a no-protocol-yet case therefore adds more edge cases to every transport's implementation.

All these solutions are roughly isomorphic to each other, so I don't care deeply about it.  However, my proposed architecture has been in use for a decade in Twisted without any major problems I can see.  I'm not saying that Twisted programs are perfect, but it would *really* be useful to discuss this in terms of problems you can identify with the humungous existing corpus of Twisted-using code, and say "here's a problem that develops in some programs due to the sub-optimal shape of this API".  Unnecessary class definitions, for example, or a particular type of bug; something like that.  For example, I can identify several difficulties with Twisted's current flow-control setup code and would not recommend that it be copied exactly.  Talking about how the code smells or what might hypothetically make more sense is just bikeshedding.

-glyph


From ncoghlan at gmail.com  Sun Jan 20 02:51:14 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 20 Jan 2013 11:51:14 +1000
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <BEE3B68A-94DE-440A-942B-40C0A2033166@twistedmatrix.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CAP7+vJLsnGCJMe0P=DC5OjMgkcMSP1By4fu+Ft0q-y8Jn3X=Aw@mail.gmail.com>
	<50F9E1EA.4010305@canterbury.ac.nz>
	<CAP7+vJLL0M=zfEDtR5s-7t5hUVPaZKNfjNDjcFJ=yWCJv6j6gg@mail.gmail.com>
	<1D762043-961F-4F1E-A0A6-0C8C78AFA59B@twistedmatrix.com>
	<CAFkYKJ4PtvoGBdwXFtrbMJRW-sYMxxSEC-7vxNy4g6GQRsF5Wg@mail.gmail.com>
	<BEE3B68A-94DE-440A-942B-40C0A2033166@twistedmatrix.com>
Message-ID: <CADiSq7f1wTe_OB6_Rd11D2nqMcTK9VVV=1VXUWcTLZQ2Y6qa5Q@mail.gmail.com>

On Sun, Jan 20, 2013 at 8:53 AM, Glyph <glyph at twistedmatrix.com> wrote:
>
>> On Jan 19, 2013, at 9:32 AM, Ben Darnell <ben at bendarnell.com> wrote:
>>
>> On Fri, Jan 18, 2013 at 8:23 PM, Glyph <glyph at twistedmatrix.com> wrote:
>>> For starters, nothing "smells wrong" to me about protocol factories.
>>> Responding to this kind of criticism is difficult, because it's not
>>> substantive - what's the actual problem?  I think that some Python
>>> programmers have an aversion to factories because a common path to Python is
>>> flight from Java environments that over- or mis-use the factory pattern.
>>>
>>
>> I think the smell is that the factory is A) only used once and B) invoked
>> without adding any additional arguments that weren't available when the
>> factory was passed in, so there's no clear reason to defer creation of the
>> protocol.  I think it would make more sense if the transport were passed as
>> an argument to the factory (and then we could get rid of connection_made as
>> a required method on Protocol, although libraries or applications that want
>> to separate protocol creation from connection_made could still do so in
>> their own factories).
>
> The problem with creating the protocol with the transport as an argument to
> its constructor is that in order to behave correctly, the transport needs to
> know about the protocol as well; so it also wants to be constructed with a
> reference to the protocol to *its* constructor.  So adding a no-protocol-yet
> case adds more edge-cases to every transport's implementation.

But the trade-off in separating protocol creation from notification of
the connection is that it means every *protocol* has to be written to
handle the "no connection yet" gap between __init__ and the call to
connection_made.

However, if we instead delay the call to the protocol factory until
*after the connection is made*, then most protocols can be written
assuming they always have a connection (at least until connection_lost
is called). A persistent protocol that spanned multiple
connect/reconnect cycles could be written such that you passed
"my_protocol.connection_made" as the protocol factory, while normal
protocols (that last only the length of a single connection) would
pass "MyProtocol" directly.
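A minimal sketch of the two styles (the "connect" helper and class names
here are hypothetical stand-ins for illustration, not PEP 3156 API):

```python
class OneShot:
    # A normal protocol: constructed fresh for each connection, so under
    # the delayed-factory proposal its constructor receives the transport.
    def __init__(self, transport):
        self.transport = transport

class Persistent:
    # Spans multiple connect/reconnect cycles.  connection_made() returns
    # self so that the bound method can itself serve as the factory.
    def __init__(self):
        self.transport = None

    def connection_made(self, transport):
        self.transport = transport
        return self

def connect(protocol_factory):
    # Stand-in for the event loop: under the proposal the factory is
    # invoked only once the connection (here, a dummy transport) exists.
    transport = object()
    return protocol_factory(transport)

proto = connect(OneShot)                 # one-shot: pass the class directly
keeper = Persistent()
again = connect(keeper.connection_made)  # persistent: pass the bound method
assert again is keeper and proto.transport is not None
```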

At the transport layer, the two states "has a protocol" and "has a
connection" could then be collapsed into one - if there is a
connection, then there will be a protocol, and vice-versa. This
differs from the current status in PEP 3156, where it's possible for a
transport to have a protocol without a connection if it calls the
protocol factory well before calling connection_made.

Now, it may be that *there's a good reason* why conflating "has a
protocol" and "has a connection" at the transport layer is a bad idea,
and thus we actually *need* the "protocol creation" and "protocol
association with a connection" events to be distinct. However, the PEP
currently doesn't explain *why* it's necessary to separate the two,
hence the confusion for at least Greg, Ben and myself.

Given that new protocol implementations should be significantly more
common than new transport implementations, there's a strong case to be
made for pushing any required complexity into the transports.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From wuwei23 at gmail.com  Sun Jan 20 03:02:32 2013
From: wuwei23 at gmail.com (alex23)
Date: Sat, 19 Jan 2013 18:02:32 -0800 (PST)
Subject: [Python-ideas] Parametrized any() and all() ?
In-Reply-To: <kdc4n3$c6o$1@ger.gmane.org>
References: <50F6813E.60503@ziade.org>
	<CAE_Hg6Y26Veq6eJrzsOZDL_Je2N03s5N3Hxfr8QHN-1Lhf23VQ@mail.gmail.com>
	<50F6847D.2020404@ziade.org>
	<CADiSq7cesP5yi+skMjQXk65bXP4rO+tpU9W4W7tqacCkToPM4w@mail.gmail.com>
	<50F6B4D8.6070002@pearwood.info> <50F6BEA3.7090807@ziade.org>
	<20130116194756.2efe9afe@pitrou.net> <50F94057.9080005@ziade.org>
	<CADiSq7c9=ro6bNAAa=oJX0idRLd3W9jmXQM13hE1WDkAyUKkFg@mail.gmail.com>
	<CALFfu7D1gGF9dxvmH91cfhpB-HMELXvk0xOdseXWNTKjd=2rwg@mail.gmail.com>
	<kdc4n3$c6o$1@ger.gmane.org>
Message-ID: <2e59f105-83fb-46b0-8e6f-e854a71ab08f@th3g2000pbc.googlegroups.com>

On Jan 19, 4:36 am, Terry Reedy <tjre... at udel.edu> wrote:
> On 1/18/2013 10:54 AM, Eric Snow wrote:
> > It took me a sec. :) DSU == "Decorate-Sort-Undecorate". [1]
>
> No, no, no. It's Delaware State University in Dover, as opposed to
> University of Delaware (UD) in Newark ;-).
>
> In other words, it depends on the universe you live in.

"Namespaces are one honking great idea" :)


From ncoghlan at gmail.com  Sun Jan 20 03:34:24 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 20 Jan 2013 12:34:24 +1000
Subject: [Python-ideas] PEP 3156: Clarifying the different components of the
	event loop API
Message-ID: <CADiSq7cT8BVi022sC5+dJ3Nom7z_bhCdbP6Gnb1YPTFQnBLafA@mail.gmail.com>

PEP 3156 currently lists *29* proposed methods for the event loop API.
These methods serve quite different purposes and I think a bit more
structure in the overall API could help clarify that.

First proposal: clearly split the abstract EventLoop API from concrete
DescriptorEventLoop and IOCPEventLoop subclasses.

The main benefit here is to help clarify that:
1. the additional methods defined on DescriptorEventLoop and
IOCPEventLoop are not available on all event loop implementations, so
any code using them is necessarily event loop specific
2. the goal of the transport abstraction is to mask the differences
between these low level platform specific APIs
3. other event loops are free to use a completely different API
between their low level transports and the event loop

Second proposal: better separate the "event loop management", "event
monitoring" and "do things" methods

I don't have a clear idea of how to do this yet (beyond restructuring
the documentation of the event loop API in the PEP), but I can at
least describe the split I see (along with a few name changes that may
be worth considering).

Event loop management:
- run_once()
- run() # Perhaps "run_until_idle()"?
- run_forever() # Perhaps "run_until_stop()"?
- run_until_complete()
- stop()
- close()
- set_default_executor()

Event monitoring:
- add_signal_handler()
- remove_signal_handler()
- start_serving() # (The "stop serving" API is TBD in the PEP)

Do things (fire and forget):
- call_soon()
- call_soon_threadsafe()
- call_later()
- call_repeatedly()

Do things (and get the result with "yield from"):
- wrap_future() # Perhaps "wrap_executor_future"?
- run_in_executor()
- getaddrinfo()
- getnameinfo()

Low level transport creation:
- create_connection()
- create_pipe() # Once it exists in the PEP

Cheers,
Nick.

P.S. Off-topic for the thread, but I think the existence of run_once
vs run (or run_until_idle) validates the decision to stick with only
running one generation of ready callbacks per iteration. I forgot
about it when we were discussing that question.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From d.s at daniel.shahaf.name  Sun Jan 20 05:23:15 2013
From: d.s at daniel.shahaf.name (Daniel Shahaf)
Date: Sun, 20 Jan 2013 06:23:15 +0200
Subject: [Python-ideas] chdir context manager
In-Reply-To: <50FABE70.7040902@python.org>
References: <20130119101024.GB2969@lp-shahaf.local>
	<CADiSq7eqBwWa8Gv__2S-r85SYDgjwAVe7qB=VL5SS_mwkFAK-Q@mail.gmail.com>
	<CAGaVwhQQ_fjfSuw3kPiVffqvWvDvLH8U=zt8_+7Q4ekoejp5pA@mail.gmail.com>
	<50FABE70.7040902@python.org>
Message-ID: <20130120042315.GB2950@lp-shahaf.local>

Christian Heimes wrote on Sat, Jan 19, 2013 at 16:40:32 +0100:
> Am 19.01.2013 16:27, schrieb Calvin Spealman:
> > -1 from me, as well. Encouraging a bad habit.
> 
> It's not just bad habit. It's broken by design because it's a major race
> condition.

In other words, single-threaded processes will need to implement their
own chdir context manager because using it in multi-threaded
applications would be a bug.  I note the same reasoning applies to
a hypothetical context manager that changes the nice(2) level of the
current process.

This reasoning reduces to "make all of stdlib thread-safe" --- at the
expense of people who write single-threaded code, know full well that
chdir and nice are global state and should normally not be used, and
want to use them anyway.

I know enough programming to never call renice or chdir in library code.
But when I write some __main__ code, I might want to use them.  I should
be able to.


From d.s at daniel.shahaf.name  Sun Jan 20 05:25:55 2013
From: d.s at daniel.shahaf.name (Daniel Shahaf)
Date: Sun, 20 Jan 2013 06:25:55 +0200
Subject: [Python-ideas] chdir context manager
In-Reply-To: <50FABE70.7040902@python.org>
References: <20130119101024.GB2969@lp-shahaf.local>
	<CADiSq7eqBwWa8Gv__2S-r85SYDgjwAVe7qB=VL5SS_mwkFAK-Q@mail.gmail.com>
	<CAGaVwhQQ_fjfSuw3kPiVffqvWvDvLH8U=zt8_+7Q4ekoejp5pA@mail.gmail.com>
	<50FABE70.7040902@python.org>
Message-ID: <20130120042555.GC2950@lp-shahaf.local>

Christian Heimes wrote on Sat, Jan 19, 2013 at 16:40:32 +0100:
> Am 19.01.2013 16:27, schrieb Calvin Spealman:
> > -1 from me, as well. Encouraging a bad habit.
> 
> It's not just bad habit. It's broken by design because it's a major race
> condition.

A couple of other clarifications: Christian clarified on IRC that this
refers to the use of chdir() (which modifies process-global state) in
multithreaded applications.  The code uses fchdir so it's not vulnerable
to race conditions whereby another process removes the original cwd
before it chdir's back to it.  (In that respect it's better than the
common "try: getcwd() finally: chdir()" pattern.)
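For concreteness, a minimal POSIX-only sketch of such a context manager
(the name saved_cwd matches the thread; the implementation details here
are my own illustration, not the code under discussion):

```python
import contextlib
import os

@contextlib.contextmanager
def saved_cwd():
    # Keep an fd on the current directory and fchdir() back to it on
    # exit, so the restore still works if the original path is renamed
    # or removed in the meantime (unlike the getcwd()/chdir() pattern).
    fd = os.open(".", os.O_RDONLY)
    try:
        yield fd
    finally:
        os.fchdir(fd)
        os.close(fd)

# Usage: the body may chdir() freely; the cwd is restored afterwards.
start = os.getcwd()
with saved_cwd():
    os.chdir("/")
assert os.getcwd() == start
```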

Someone suggested adding a mutex to the saved_cwd context manager.  That
would solve the race condition, but I don't have a use-case for it ---
precisely because I can't imagine multithreaded code where threads
depend on their cwd.


From guido at python.org  Sun Jan 20 05:35:04 2013
From: guido at python.org (Guido van Rossum)
Date: Sat, 19 Jan 2013 20:35:04 -0800
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CADiSq7f1wTe_OB6_Rd11D2nqMcTK9VVV=1VXUWcTLZQ2Y6qa5Q@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CAP7+vJLsnGCJMe0P=DC5OjMgkcMSP1By4fu+Ft0q-y8Jn3X=Aw@mail.gmail.com>
	<50F9E1EA.4010305@canterbury.ac.nz>
	<CAP7+vJLL0M=zfEDtR5s-7t5hUVPaZKNfjNDjcFJ=yWCJv6j6gg@mail.gmail.com>
	<1D762043-961F-4F1E-A0A6-0C8C78AFA59B@twistedmatrix.com>
	<CAFkYKJ4PtvoGBdwXFtrbMJRW-sYMxxSEC-7vxNy4g6GQRsF5Wg@mail.gmail.com>
	<BEE3B68A-94DE-440A-942B-40C0A2033166@twistedmatrix.com>
	<CADiSq7f1wTe_OB6_Rd11D2nqMcTK9VVV=1VXUWcTLZQ2Y6qa5Q@mail.gmail.com>
Message-ID: <CAP7+vJKvKrmmvi6o7jD60nas9Jx5HsCJbjm_WOPfrVmaS6EPJQ@mail.gmail.com>

On Sat, Jan 19, 2013 at 5:51 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> But the trade-off in separating protocol creation from notification of
> the connection is that it means every *protocol* has to be written to
> handle the "no connection yet" gap between __init__ and the call to
> connection_made.

That doesn't strike me as a problematic design. I've seen it plenty of times.

> However, if we instead delay the call to the protocol factory until
> *after the connection is made*, then most protocols can be written
> assuming they always have a connection (at least until connection_lost
> is called). A persistent protocol that spanned multiple
> connect/reconnect cycles could be written such that you passed
> "my_protocol.connection_made" as the protocol factory, while normal
> protocols (that last only the length of a single connection) would
> pass "MyProtocol" directly.

Well, almost. connection_made() would have to return self to make this
work. But we could certainly add some other method that did that.

(At first I thought it would be harder to pass other parameters to the
constructor for the non-reconnecting case, but the solution is about
the same as before -- use a partial function or a lambda that takes a
transport and calls the constructor with that and whatever other
parameters it wants to pass.)
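A small sketch of that partial/lambda trick (ChatProtocol and the
connect helper are hypothetical stand-ins for illustration):

```python
from functools import partial

class ChatProtocol:
    # Hypothetical protocol with an extra constructor parameter.
    def __init__(self, transport, encoding="utf-8"):
        self.transport = transport
        self.encoding = encoding

def connect(protocol_factory):
    # Stand-in for the event loop: calls the factory with the transport.
    transport = object()
    return protocol_factory(transport)

# Either spelling binds the extra argument ahead of time; the event
# loop still invokes the factory with just the transport.
p1 = connect(partial(ChatProtocol, encoding="latin-1"))
p2 = connect(lambda t: ChatProtocol(t, encoding="latin-1"))
assert p1.encoding == p2.encoding == "latin-1"
```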

> At the transport layer, the two states "has a protocol" and "has a
> connection" could then be collapsed into one - if there is a
> connection, then there will be a protocol, and vice-versa. This
> differs from the current status in PEP 3156, where it's possible for a
> transport to have a protocol without a connection if it calls the
> protocol factory well before calling connection_made.

This doesn't strike me as important. The code I've written for Tulip
puts most of the connection-making code outside the transport, and the
transport constructor is completely private. Every transport
implementation is completely free in how it works, and every event
loop implementation is free to put as much or as little of the
connection set-up in the transport as it wants to. The same is true
for transports written by users (and there will be some of these). The
*only* things we care about for transports is that the thing passed to
the protocol's connection_made() has the methods specified by the PEP
(write(), writelines(), pause(), resume(), and a few more). Also, it
does not matter one iota whether it is the transport or some other
entity that calls the protocol's methods (connection_made(),
data_received(), etc.) -- the only thing that matters is the order in
which they are called.

IOW, even though a transport may "have" a protocol without a
connection, nobody should care about that state, and nobody should be
calling its methods (again, write() etc.) in that state. In fact,
nobody except event loop internal code should ever have a reference to
a transport in that state. (The transport that is returned by
create_connection() is fully connected to the socket (or whatever
might take its place) as well as to the protocol.)

I think we can make the same assumptions for transports implemented by
user code.

> Now, it may be that *there's a good reason* why conflating "has a
> protocol" and "has a connection" at the transport layer is a bad idea,
> and thus we actually *need* the "protocol creation" and "protocol
> association with a connection" events to be distinct. However, the PEP
> currently doesn't explain *why* it's necessary to separate the two,
> hence the confusion for at least Greg, Ben and myself.

So, your whole point here seems to be that you'd rather see the PEP
specify that the sequence when a connection is made is

  protocol = protocol_factory(transport)

rather than

  protocol = protocol_factory()
  protocol.connection_made(transport)

I looked in the Tulip code to see whether this would cause any
problems. I think it could be done, but the solution would feel a
little awkward to me, because currently the protocol's
connection_made() method is not called directly by the transport: it
is called indirectly via the event loop's call_soon() method. So using
your approach the transport wouldn't have a protocol attribute until
this callback is called -- or we'd have to change things to call it
directly rather than via call_soon(). Now I'm pretty sure I can prove
that nothing will be referencing the protocol *before* the
connection_made() call is actually made, and also that directly
calling it instead of using call_soon() is fine. But nevertheless the
transport code would feel a little harder to reason about.

> Given that new protocol implementations should be significantly more
> common than new transport implementations, there's a strong case to be
> made for pushing any required complexity into the transports.

TBH I don't see the protocol implementation getting any simpler
because of this. There is some protocol initialization code that
doesn't depend on the transport, and some that does. Using your
approach, these all go in __init__(). Using the PEP's current
proposal, the latter go in a separate method, connection_made(). But
using your approach, writing the lambda or partial function that calls
the constructor with the right arguments (to be passed as
protocol_factory) becomes a tad more complex, since now it must take a
transport argument. On the third hand, rigging things so that a
pre-existing protocol instance can be reused becomes a little harder
to figure out, since you have to write a helper method that takes a
transport and returns the protocol (i.e., self).

All in all I see it as six of one, half a dozen of the other, and I am
happy with Glyph's testimony that the Twisted design works well in
practice.

-- 
--Guido van Rossum (python.org/~guido)


From d.s at daniel.shahaf.name  Sun Jan 20 05:43:08 2013
From: d.s at daniel.shahaf.name (Daniel Shahaf)
Date: Sun, 20 Jan 2013 06:43:08 +0200
Subject: [Python-ideas] chdir context manager
In-Reply-To: <50FAE6B5.3030507@mrabarnett.plus.com>
References: <20130119101024.GB2969@lp-shahaf.local>
	<kde7ip$t5a$1@ger.gmane.org>
	<20130119150631.GF2969@lp-shahaf.local>
	<kdencp$1b4$1@ger.gmane.org> <50FAE6B5.3030507@mrabarnett.plus.com>
Message-ID: <20130120044308.GD2950@lp-shahaf.local>

MRAB wrote on Sat, Jan 19, 2013 at 18:32:21 +0000:
> On 2013-01-19 18:07, Terry Reedy wrote:
>> On 1/19/2013 10:06 AM, Daniel Shahaf wrote:
>>> Terry Reedy wrote on Sat, Jan 19, 2013 at 08:37:17 -0500:
>>>> On 1/19/2013 5:10 AM, Daniel Shahaf wrote:
>>>>> The following is a common pattern (used by, for example,
>>>>> shutil.make_archive):
>>>>>
>>>>>       save_cwd = os.getcwd()
>>>>>       try:
>>>>>           foo()
>>>>>       finally:
>>>>>           os.chdir(save_cwd)
>>>>>
>>>>> I suggest this deserves a context manager:
>>>>>
>>>>>       with saved_cwd():
>>>>>           foo()
>>>>
>>>> So to me, your proposal is only 1/2 or 2/3 of a context manager. (And
>>>> 'returns an open file descriptor for the saved directory' seems backward
>>>> or wrong for a context manager.) It does not actually make a new
>>>
>>> What should __enter__ return, then?
>>>
>>> It could return None, the to-be-restored directory's file descriptor, or
>>> the newly-changed-to directory (once a "directory to chdir to" optional
>>> argument is added).  The latter could be either a pathname (string) or
>>> a file descriptor (since it's just passed through to os.chdir).
>>>
>>> It seems to me returning the old dir's fd would be the most useful of
>>> the three options, since the other two are things callers already have
>>> --- None, which is global, and the argument to the context manager.
>>
>> make_archive would prefer the old dir pathname, as it wants that for the
>> logging call. But I do not think that that should drive design.
>>
>>>> context. A proper temp_cwd context manager should have one parameter,
>>>> the new working directory, with chdir(new_cwd) in the enter method. To
>>>> allow for conditional switching, the two chdir system calls could be
>>>> conditional on new_cwd (either None or '' would mean no chdir calls).
>>>>
>>>
>>> I think making the new_cwd argument optional would be useful if the
>>> context manager body does multiple chdir() calls:
>>>
>>>      with saved_cwd():
>>>          os.chdir('/foo')
>>>          do_something()
>>>          os.chdir('/bar')
>>>          do_something()
>>>
>>> I'm not sure if that's exactly what you suggest --- you seem to be
>>> suggesting that saved_cwd(None) will avoid calling fchdir() from
>>> __exit__()?
>>
>> I was, but that is a non-essential optimization. My idea is basically
>> similar to Bueno's except for parameter absent versus None (and the two
>> cases could be handled differently).
>>
>> I think this proposal suffers a bit from being both too specific and too
>> general. Eli explained the 'too specific' part: there are many things
>> that might be changed and changed back. The 'too general' part is that
>> specific applications need different specific details. There are various
>> possibilities of what to do in and return from __enter__.
>>

OK.

>> However, given the strong -1 from at least three core developers and
>> one other person, the detail seem moot.
>>

*nod*, I see.

> FWIW, -1 from me too because, as has been said already, you shouldn't  
> really be using os.chdir; use absolute paths instead.

I don't use chdir in library code or multithreaded code, but I do use it
in __main__ of short scripts, where there's no "caller" or "other
thread" to consider.

Consider sys.argv.  The language and stdlib don't prevent library code
from accessing (or modifying) sys.argv, but well-behaved libraries
neither read sys.argv nor modify it.  The same is true of the cwd.


From guido at python.org  Sun Jan 20 05:37:34 2013
From: guido at python.org (Guido van Rossum)
Date: Sat, 19 Jan 2013 20:37:34 -0800
Subject: [Python-ideas] PEP 3156: Clarifying the different components of
 the event loop API
In-Reply-To: <CADiSq7cT8BVi022sC5+dJ3Nom7z_bhCdbP6Gnb1YPTFQnBLafA@mail.gmail.com>
References: <CADiSq7cT8BVi022sC5+dJ3Nom7z_bhCdbP6Gnb1YPTFQnBLafA@mail.gmail.com>
Message-ID: <CAP7+vJK9v3Pqg69XtHYOo3imrTagx9=YF6K6ySBqvEZ5hdfBiQ@mail.gmail.com>

(I'm out of time to respond at length, but I think you have a good
point here and I expect I will heed it. It may be a while before I
have time for another sprint with the PEP and Tulip though.)

On Sat, Jan 19, 2013 at 6:34 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> PEP 3156 currently lists *29* proposed methods for the event loop API.
> These methods serve quite different purposes and I think a bit more
> structure in the overall API could help clarify that.
>
> First proposal: clearly split the abstract EventLoop API from concrete
> DescriptorEventLoop and IOCPEventLoop subclasses.
>
> The main benefit here is to help clarify that:
> 1. the additional methods defined on DescriptorEventLoop and
> IOCPEventLoop are not available on all event loop implementations, so
> any code using them is necessarily event loop specific
> 2. the goal of the transport abstraction is to mask the differences
> between these low level platform specific APIs
> 3. other event loops are free to use a completely different API
> between their low level transports and the event loop
>
> Second proposal: better separate the "event loop management", "event
> monitoring" and "do things" methods
>
> I don't have a clear idea of how to do this yet (beyond restructuring
> the documentation of the event loop API in the PEP), but I can at
> least describe the split I see (along with a few name changes that may
> be worth considering).
>
> Event loop management:
> - run_once()
> - run() # Perhaps "run_until_idle()"?
> - run_forever() # Perhaps "run_until_stop()"?
> - run_until_complete()
> - stop()
> - close()
> - set_default_executor()
>
> Event monitoring:
> - add_signal_handler()
> - remove_signal_handler()
> - start_serving() # (The "stop serving" API is TBD in the PEP)
>
> Do things (fire and forget):
> - call_soon()
> - call_soon_threadsafe()
> - call_later()
> - call_repeatedly()
>
> Do things (and get the result with "yield from"):
> - wrap_future() # Perhaps "wrap_executor_future"?
> - run_in_executor()
> - getaddrinfo()
> - getnameinfo()
>
> Low level transport creation:
> - create_connection()
> - create_pipe() # Once it exists in the PEP
>
> Cheers,
> Nick.
>
> P.S. Off-topic for the thread, but I think the existence of run_once
> vs run (or run_until_idle) validates the decision to stick with only
> running one generation of ready callbacks per iteration. I forgot
> about it when we were discussing that question.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas



-- 
--Guido van Rossum (python.org/~guido)


From ncoghlan at gmail.com  Sun Jan 20 07:13:39 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 20 Jan 2013 16:13:39 +1000
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CAP7+vJKvKrmmvi6o7jD60nas9Jx5HsCJbjm_WOPfrVmaS6EPJQ@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CAP7+vJLsnGCJMe0P=DC5OjMgkcMSP1By4fu+Ft0q-y8Jn3X=Aw@mail.gmail.com>
	<50F9E1EA.4010305@canterbury.ac.nz>
	<CAP7+vJLL0M=zfEDtR5s-7t5hUVPaZKNfjNDjcFJ=yWCJv6j6gg@mail.gmail.com>
	<1D762043-961F-4F1E-A0A6-0C8C78AFA59B@twistedmatrix.com>
	<CAFkYKJ4PtvoGBdwXFtrbMJRW-sYMxxSEC-7vxNy4g6GQRsF5Wg@mail.gmail.com>
	<BEE3B68A-94DE-440A-942B-40C0A2033166@twistedmatrix.com>
	<CADiSq7f1wTe_OB6_Rd11D2nqMcTK9VVV=1VXUWcTLZQ2Y6qa5Q@mail.gmail.com>
	<CAP7+vJKvKrmmvi6o7jD60nas9Jx5HsCJbjm_WOPfrVmaS6EPJQ@mail.gmail.com>
Message-ID: <CADiSq7faSakaJ7QA66yv7hxOA+_2R-n503zJb65x8=hsCYWHew@mail.gmail.com>

On Sun, Jan 20, 2013 at 2:35 PM, Guido van Rossum <guido at python.org> wrote:
> TBH I don't see the protocol implementation getting any simpler
> because of this. There is some protocol initialization code that
> doesn't depend on the transport, and some that does. Using your
> approach, these all go in __init__(). Using the PEP's current
> proposal, the latter go in a separate method, connection_made().

When the two are separated without a clear definition of what else can
happen in between, *every other method on the protocol* needs to cope
with the fact that other calls to protocol methods may happen in
between the call to __init__ and the call to connection_made - you
simply can't write a protocol without dealing with that problem.

As you correctly figured out, my specific proposal was to move from:

    protocol = protocol_factory()
    protocol.connection_made(transport)

To a single event:

    protocol = protocol_factory(transport)

The *reason* I wanted to do this is that I *don't understand* what may
happen to my protocol implementation between construction and the call
to connection_made.

Your description of the current implementation actually worries me, as
it suggests to me that when I get a (transport, protocol) pair back
from a call to "create_connection", "connection_made" may *not* have
been called yet - the protocol may be in exactly the state I am
worried about, because the event loop is sending the notification in a
fire-and-forget fashion, instead of waiting until the call is
complete:

    protocol = protocol_factory()
    loop.call_soon(protocol.connection_made, transport)
    # The protocol isn't actually fully initialized here...


However, that description also made me realise why two distinct
operations are needed, so I'd like to change my suggestion to the
following:

    protocol = factory()
    yield from protocol.connection_made(transport) # Or callback equivalent

The protocol factory would still be used to create the protocol
object. However, the PEP would be updated to make it clear that
immediately after creation the *only* permitted method invocation on
the result is "connection_made", which will complete the protocol
initialization process.

The connection_made event handler would be redefined to return a
*Future* (or equivalent object) rather than completing synchronously.
create_connection would then call connection_made and *wait for it to
finish*, rather than using call_soon in a fire-and-forget fashion.

The advantage of this is that the rationale for the various possible
states becomes clear:

- the protocol factory is invoked synchronously, and is thus not
allowed to perform any blocking actions (but may trigger
"fire-and-forget" operations)
- connection_made is invoked asynchronously, and is thus able to wait
for various operations
- a protocol returned from create_connection is certain to have had
connection_made already called, thus a protocol implementation may
safely assume in other methods that both __init__ and connection_made
will have been called during the initialization process.
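A sketch of the revised sequence (using modern async/await syntax in
place of the PEP's yield-from coroutines; Greeter, demo and the dummy
transport are illustrative stand-ins, not proposed API):

```python
import asyncio

class Greeter:
    # Synchronous construction; no blocking allowed here.
    def __init__(self):
        self.transport = None

    # Asynchronous completion of initialization; create_connection
    # waits for this before handing the protocol back to the caller.
    async def connection_made(self, transport):
        self.transport = transport
        await asyncio.sleep(0)  # stand-in for e.g. a handshake

async def create_connection(protocol_factory):
    transport = object()           # stand-in for a real transport
    protocol = protocol_factory()  # the only call allowed at this point
    await protocol.connection_made(transport)  # wait until fully set up
    return transport, protocol

async def demo():
    transport, protocol = await create_connection(Greeter)
    # By the time create_connection returns, connection_made has run.
    return protocol.transport is transport

assert asyncio.run(demo())
```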

Cheers,
Nick.


--
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From ncoghlan at gmail.com  Sun Jan 20 07:31:32 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 20 Jan 2013 16:31:32 +1000
Subject: [Python-ideas] PEP 3156: Clarifying the different components of
	the event loop API
In-Reply-To: <CADiSq7cT8BVi022sC5+dJ3Nom7z_bhCdbP6Gnb1YPTFQnBLafA@mail.gmail.com>
References: <CADiSq7cT8BVi022sC5+dJ3Nom7z_bhCdbP6Gnb1YPTFQnBLafA@mail.gmail.com>
Message-ID: <CADiSq7f46f55SN3SgVjG3CcwA3Cz6wqTDZ9q4VMEEX+QdpKXUg@mail.gmail.com>

On Sun, Jan 20, 2013 at 12:34 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Do things (and get the result with "yield from"):
> - wrap_future() # Perhaps "wrap_executor_future"?
> - run_in_executor()
> - getaddrinfo()
> - getnameinfo()
>
> Low level transport creation:
> - create_connection()
> - create_pipe() # Once it exists in the PEP

Somewhere early in the PEP, there may need to be a concise description
of the two APIs for waiting for an asynchronous Future:

1. "f.add_done_callback()"
2. "yield from f" in a coroutine (resumes the coroutine when the
future completes, with either the result or exception as appropriate)

At the moment, these are buried in amongst much larger APIs, yet
they're key to understanding the way everything above the core event
loop layer interacts.
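The two styles side by side, in a minimal sketch (modern asyncio names
stand in for the PEP's proposed API, and "await f" stands in for the
PEP's "yield from f"):

```python
import asyncio

async def wait_both():
    loop = asyncio.get_running_loop()
    f = loop.create_future()
    seen = []

    # Style 1: a callback fires once the future completes.
    f.add_done_callback(lambda fut: seen.append(fut.result()))

    loop.call_soon(f.set_result, 42)

    # Style 2: suspend this coroutine until the future completes,
    # resuming with the result (or raising the exception).
    seen.append(await f)
    return seen

assert asyncio.run(wait_both()) == [42, 42]
```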

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From glyph at twistedmatrix.com  Sun Jan 20 07:51:31 2013
From: glyph at twistedmatrix.com (Glyph)
Date: Sat, 19 Jan 2013 22:51:31 -0800
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CADiSq7faSakaJ7QA66yv7hxOA+_2R-n503zJb65x8=hsCYWHew@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CAP7+vJLsnGCJMe0P=DC5OjMgkcMSP1By4fu+Ft0q-y8Jn3X=Aw@mail.gmail.com>
	<50F9E1EA.4010305@canterbury.ac.nz>
	<CAP7+vJLL0M=zfEDtR5s-7t5hUVPaZKNfjNDjcFJ=yWCJv6j6gg@mail.gmail.com>
	<1D762043-961F-4F1E-A0A6-0C8C78AFA59B@twistedmatrix.com>
	<CAFkYKJ4PtvoGBdwXFtrbMJRW-sYMxxSEC-7vxNy4g6GQRsF5Wg@mail.gmail.com>
	<BEE3B68A-94DE-440A-942B-40C0A2033166@twistedmatrix.com>
	<CADiSq7f1wTe_OB6_Rd11D2nqMcTK9VVV=1VXUWcTLZQ2Y6qa5Q@mail.gmail.com>
	<CAP7+vJKvKrmmvi6o7jD60nas9Jx5HsCJbjm_WOPfrVmaS6EPJQ@mail.gmail.com>
	<CADiSq7faSakaJ7QA66yv7hxOA+_2R-n503zJb65x8=hsCYWHew@mail.gmail.com>
Message-ID: <96E6B3B4-FC23-4AEE-AE8E-E16A5AA54B55@twistedmatrix.com>


On Jan 19, 2013, at 10:13 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> When the two are separated without a clear definition of what else can
> happen in between, *every other method on the protocol* needs to cope
> with the fact that other calls to protocol methods may happen in
> between the call to __init__ and the call to connection_made - you
> simply can't write a protocol without dealing with that problem.


Nope. You only have to deal with the methods that the transport will call on the protocol in that state, since nothing else has a reference to it yet.

Except the transport won't call them in that state, so... still nope.

Again: there's an enormous corpus of Twisted code out there that is written this way; you can go look at that code to see how it deals with the problem you've imagined, which is to say: it doesn't.  It doesn't need to.

Now, if you make the change you're proposing, and tie together the protocol's construction with the transport's construction, so that you end up with protocol(transport(...)), this means that the protocol will immediately begin interacting with the transport in this vague, undefined, not quite connected state, because, since the protocol didn't even exist at the time of the transport's construction, the transport can't possibly have a reference to a protocol yet.  And the side of the protocol that issues a greeting will necessarily need to do transport.write(), which may want to trigger a notification to the protocol of some kind (flow control?), and where will that notification go?  It needs to be solved less often, but it's a much trickier problem to solve.

There are also some potential edge-cases where the existing Twisted-style design might be nicer, like delivering explicit TLS handshake notifications to protocols which support them in the vague state between protocol construction and connection_made, but seeing as how I still haven't gotten around to implementing that properly in Twisted, I imagine it will be another 10 years before Tulip is practically concerned with it :).

Finally, I should say that Guido's point about the transport constructor being private is actually somewhat important.  We've been saying 'transport(...)' thus far, but in fact it's more like 'SocketTransport(loop, socket)'.  Or perhaps in the case of a pipe, 'PipeTransport(loop, readfd, writefd)'.  In the case of an actual outbound TCP connection with name resolution, it's 'yield from make_outgoing_tcp_transport(loop, hostname, port)'.  Making these all methods that hide the details and timing of the transport's construction is a definite plus.
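[Editorial note: to make the ordering concrete, here is a rough sketch. All class names are made up for illustration; this is not the real Twisted or tulip code. The point is that the transport is fully constructed first, learns about its protocol, and only then calls connection_made -- so a greeting written from connection_made always has a live transport that can deliver notifications (e.g. flow control) back to the protocol.]

```python
# Illustrative sketch of "transport first, then protocol notification".
# None of these names come from Twisted or tulip.

class GreetingProtocol:
    def __init__(self):
        self.transport = None          # no transport yet; nothing to do here

    def connection_made(self, transport):
        self.transport = transport     # transport is fully built before this call
        self.transport.write(b"HELLO\r\n")  # safe: the transport knows us already


class LoopbackTransport:
    def __init__(self):
        self.buffer = b""
        self.protocol = None

    def attach(self, protocol):
        self.protocol = protocol       # transport gets its protocol reference...
        protocol.connection_made(self) # ...and only then notifies it

    def write(self, data):
        self.buffer += data            # collect writes for inspection


t = LoopbackTransport()
t.attach(GreetingProtocol())
print(t.buffer)  # b'HELLO\r\n'
```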

-glyph


From ncoghlan at gmail.com  Sun Jan 20 08:18:33 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 20 Jan 2013 17:18:33 +1000
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <96E6B3B4-FC23-4AEE-AE8E-E16A5AA54B55@twistedmatrix.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CAP7+vJLsnGCJMe0P=DC5OjMgkcMSP1By4fu+Ft0q-y8Jn3X=Aw@mail.gmail.com>
	<50F9E1EA.4010305@canterbury.ac.nz>
	<CAP7+vJLL0M=zfEDtR5s-7t5hUVPaZKNfjNDjcFJ=yWCJv6j6gg@mail.gmail.com>
	<1D762043-961F-4F1E-A0A6-0C8C78AFA59B@twistedmatrix.com>
	<CAFkYKJ4PtvoGBdwXFtrbMJRW-sYMxxSEC-7vxNy4g6GQRsF5Wg@mail.gmail.com>
	<BEE3B68A-94DE-440A-942B-40C0A2033166@twistedmatrix.com>
	<CADiSq7f1wTe_OB6_Rd11D2nqMcTK9VVV=1VXUWcTLZQ2Y6qa5Q@mail.gmail.com>
	<CAP7+vJKvKrmmvi6o7jD60nas9Jx5HsCJbjm_WOPfrVmaS6EPJQ@mail.gmail.com>
	<CADiSq7faSakaJ7QA66yv7hxOA+_2R-n503zJb65x8=hsCYWHew@mail.gmail.com>
	<96E6B3B4-FC23-4AEE-AE8E-E16A5AA54B55@twistedmatrix.com>
Message-ID: <CADiSq7eDSmD8ga3m51zp-WSoCbRdgARc3C-MepbLV8n8tvJdEQ@mail.gmail.com>

On Sun, Jan 20, 2013 at 4:51 PM, Glyph <glyph at twistedmatrix.com> wrote:
>
> On Jan 19, 2013, at 10:13 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>
> When the two are separated without a clear definition of what else can
> happen in between, *every other method on the protocol* needs to cope
> with the fact that other calls to protocol methods may happen in
> between the call to __init__ and the call to connection_made - you
> simply can't write a protocol without dealing with that problem.
>
>
> Nope. You only have to deal with the methods that the transport will call on
> the protocol in that state, since nothing else has a reference to it yet.
>
> Except the transport won't call them in that state, so... still nope.

Yes, after Guido explained how tulip was currently handling this, I
realised that the problem was mostly one of documentation. However, I
think there is one key bug in the current implementation, which is
that create_connection is returning *before* the call to
"connection_made" is completed, thus exposing the protocol in an
incompletely initialised state.

> Finally, I should say that Guido's point about the transport constructor
> being private is actually somewhat important.  We've been saying
> 'transport(...)' thus far, but in fact it's more like 'SocketTransport(loop,
> socket)'.  Or perhaps in the case of a pipe, 'PipeTransport(loop, readfd,
> writefd)'.  In the case of an actual outbound TCP connection with name
> resolution, it's 'yield from make_outgoing_tcp_transport(loop, hostname,
> port)'.  Making these all methods that hide the details and timing of the
> transport's construction is a definite plus.

Yes, I didn't have a problem with that part - it was just the lack of
clear explanation of the different roles of the protocol constructor
and the connection_made callback that I found problematic.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From glyph at twistedmatrix.com  Sun Jan 20 09:22:11 2013
From: glyph at twistedmatrix.com (Glyph)
Date: Sun, 20 Jan 2013 00:22:11 -0800
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CADiSq7eDSmD8ga3m51zp-WSoCbRdgARc3C-MepbLV8n8tvJdEQ@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CAP7+vJLsnGCJMe0P=DC5OjMgkcMSP1By4fu+Ft0q-y8Jn3X=Aw@mail.gmail.com>
	<50F9E1EA.4010305@canterbury.ac.nz>
	<CAP7+vJLL0M=zfEDtR5s-7t5hUVPaZKNfjNDjcFJ=yWCJv6j6gg@mail.gmail.com>
	<1D762043-961F-4F1E-A0A6-0C8C78AFA59B@twistedmatrix.com>
	<CAFkYKJ4PtvoGBdwXFtrbMJRW-sYMxxSEC-7vxNy4g6GQRsF5Wg@mail.gmail.com>
	<BEE3B68A-94DE-440A-942B-40C0A2033166@twistedmatrix.com>
	<CADiSq7f1wTe_OB6_Rd11D2nqMcTK9VVV=1VXUWcTLZQ2Y6qa5Q@mail.gmail.com>
	<CAP7+vJKvKrmmvi6o7jD60nas9Jx5HsCJbjm_WOPfrVmaS6EPJQ@mail.gmail.com>
	<CADiSq7faSakaJ7QA66yv7hxOA+_2R-n503zJb65x8=hsCYWHew@mail.gmail.com>
	<96E6B3B4-FC23-4AEE-AE8E-E16A5AA54B55@twistedmatrix.com>
	<CADiSq7eDSmD8ga3m51zp-WSoCbRdgARc3C-MepbLV8n8tvJdEQ@mail.gmail.com>
Message-ID: <BE8E789B-03F1-4885-9281-04662EEA43E5@twistedmatrix.com>


On Jan 19, 2013, at 11:18 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> On Sun, Jan 20, 2013 at 4:51 PM, Glyph <glyph at twistedmatrix.com> wrote:
>> 
>> On Jan 19, 2013, at 10:13 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> 
>> When the two are separated without a clear definition of what else can
>> happen in between, *every other method on the protocol* needs to cope
>> with the fact that other calls to protocol methods may happen in
>> between the call to __init__ and the call to connection_made - you
>> simply can't write a protocol without dealing with that problem.
>> 
>> 
>> Nope. You only have to deal with the methods that the transport will call on
>> the protocol in that state, since nothing else has a reference to it yet.
>> 
>> Except the transport won't call them in that state, so... still nope.
> 
> Yes, after Guido explained how tulip was currently handling this, I
> realised that the problem was mostly one of documentation. However, I
> think there is one key bug in the current implementation, which is
> that create_connection is returning *before* the call to
> "connection_made" is completed, thus exposing the protocol in an
> incompletely initialised state.

Aah.  Yes, I think you're right about that being a bug.  There are probably some docs in Twisted that could be improved to explain that this ordering is part of our analogous interface's contract...

>> Finally, I should say that Guido's point about the transport constructor
>> being private is actually somewhat important.  We've been saying
>> 'transport(...)' thus far, but in fact it's more like 'SocketTransport(loop,
>> socket)'.  Or perhaps in the case of a pipe, 'PipeTransport(loop, readfd,
>> writefd)'.  In the case of an actual outbound TCP connection with name
>> resolution, it's 'yield from make_outgoing_tcp_transport(loop, hostname,
>> port)'.  Making these all methods that hide the details and timing of the
>> transport's construction is a definite plus.
> 
> Yes, I didn't have a problem with that part - it was just the lack of
> clear explanation of the different roles of the protocol constructor
> and the connection_made callback that I found problematic.

I wasn't clear if you were arguing against it; I just wanted to make it clear :).

-glyph



From phd at phdru.name  Sun Jan 20 11:13:39 2013
From: phd at phdru.name (Oleg Broytman)
Date: Sun, 20 Jan 2013 14:13:39 +0400
Subject: [Python-ideas] Parametrized any() and all() ?
In-Reply-To: <2e59f105-83fb-46b0-8e6f-e854a71ab08f@th3g2000pbc.googlegroups.com>
References: <50F6847D.2020404@ziade.org>
	<CADiSq7cesP5yi+skMjQXk65bXP4rO+tpU9W4W7tqacCkToPM4w@mail.gmail.com>
	<50F6B4D8.6070002@pearwood.info> <50F6BEA3.7090807@ziade.org>
	<20130116194756.2efe9afe@pitrou.net> <50F94057.9080005@ziade.org>
	<CADiSq7c9=ro6bNAAa=oJX0idRLd3W9jmXQM13hE1WDkAyUKkFg@mail.gmail.com>
	<CALFfu7D1gGF9dxvmH91cfhpB-HMELXvk0xOdseXWNTKjd=2rwg@mail.gmail.com>
	<kdc4n3$c6o$1@ger.gmane.org>
	<2e59f105-83fb-46b0-8e6f-e854a71ab08f@th3g2000pbc.googlegroups.com>
Message-ID: <20130120101339.GA7617@iskra.aviel.ru>

On Sat, Jan 19, 2013 at 06:02:32PM -0800, alex23 <wuwei23 at gmail.com> wrote:
> On Jan 19, 4:36 am, Terry Reedy <tjre... at udel.edu> wrote:
> > On 1/18/2013 10:54 AM, Eric Snow wrote:
> > > It took me a sec. :) DSU == "Decorate-Sort-Undecorate". [1]
> >
> > No, no, no. It's Delaware State University in Dover, as opposed to
> > University of Delaware (UD) in Newark ;-).
> >
> > In other words, it depends on the universe you live in.
> 
> "Namespaces are one honking great idea" :)

   "In 1989, a random of the journalistic persuasion asked hacker Paul
Boutin "What do you think will be the biggest problem in computing in
the 90s?" Paul's straight-faced response: "There are only 17,000
three-letter acronyms." (To be exact, there are 26^3 = 17,576.)"

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            phd at phdru.name
           Programmers don't die, they just GOSUB without RETURN.


From solipsis at pitrou.net  Sun Jan 20 13:36:54 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 20 Jan 2013 13:36:54 +0100
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CAP7+vJLsnGCJMe0P=DC5OjMgkcMSP1By4fu+Ft0q-y8Jn3X=Aw@mail.gmail.com>
	<50F9E1EA.4010305@canterbury.ac.nz>
	<CAP7+vJLL0M=zfEDtR5s-7t5hUVPaZKNfjNDjcFJ=yWCJv6j6gg@mail.gmail.com>
	<1D762043-961F-4F1E-A0A6-0C8C78AFA59B@twistedmatrix.com>
	<CAFkYKJ4PtvoGBdwXFtrbMJRW-sYMxxSEC-7vxNy4g6GQRsF5Wg@mail.gmail.com>
	<BEE3B68A-94DE-440A-942B-40C0A2033166@twistedmatrix.com>
	<CADiSq7f1wTe_OB6_Rd11D2nqMcTK9VVV=1VXUWcTLZQ2Y6qa5Q@mail.gmail.com>
	<CAP7+vJKvKrmmvi6o7jD60nas9Jx5HsCJbjm_WOPfrVmaS6EPJQ@mail.gmail.com>
Message-ID: <20130120133654.60dbfdb2@pitrou.net>

On Sat, 19 Jan 2013 20:35:04 -0800
Guido van Rossum <guido at python.org> wrote:
> IOW, even though a transport may "have" a protocol without a
> connection, nobody should care about that state, and nobody should be
> calling its methods (again, write() etc.) in that state. In fact,
> nobody except event loop internal code should ever have a reference to
> a transport in that state.

This is just not true. When the connection breaks, the protocol still
has a reference to the transport and may still be trying to do things
with the transport (because connection_lost() has not been called yet).

Regards

Antoine.




From solipsis at pitrou.net  Sun Jan 20 13:38:35 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 20 Jan 2013 13:38:35 +0100
Subject: [Python-ideas] PEP 3156: Clarifying the different components of
 the event loop API
References: <CADiSq7cT8BVi022sC5+dJ3Nom7z_bhCdbP6Gnb1YPTFQnBLafA@mail.gmail.com>
Message-ID: <20130120133835.458f0c9e@pitrou.net>

On Sun, 20 Jan 2013 12:34:24 +1000
Nick Coghlan <ncoghlan at gmail.com> wrote:
> 
> Low level transport creation:
> - create_connection()
> - create_pipe() # Once it exists in the PEP

You need some kind of create_listener() too.

Regards

Antoine.




From ncoghlan at gmail.com  Sun Jan 20 14:10:56 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 20 Jan 2013 23:10:56 +1000
Subject: [Python-ideas] PEP 3156: Clarifying the different components of
 the event loop API
In-Reply-To: <20130120133835.458f0c9e@pitrou.net>
References: <CADiSq7cT8BVi022sC5+dJ3Nom7z_bhCdbP6Gnb1YPTFQnBLafA@mail.gmail.com>
	<20130120133835.458f0c9e@pitrou.net>
Message-ID: <CADiSq7d=ap7T83G4=v=eQefcRW1Fcg1nq1WF-n8k=7iE1o3fjg@mail.gmail.com>

On Jan 20, 2013 10:46 PM, "Antoine Pitrou" <solipsis at pitrou.net> wrote:
>
> On Sun, 20 Jan 2013 12:34:24 +1000
> Nick Coghlan <ncoghlan at gmail.com> wrote:
> >
> > Low level transport creation:
> > - create_connection()
> > - create_pipe() # Once it exists in the PEP
>
> You need some kind of create_listener() too.

That's actually the "start_serving" method up in the event monitoring
section. While it does end up creating transports, the overall flow is
rather different from the client side one.

Cheers,
Nick.

--
Sent from my phone, thus the relative brevity :)

From solipsis at pitrou.net  Sun Jan 20 14:18:10 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 20 Jan 2013 14:18:10 +0100
Subject: [Python-ideas] PEP 3156: Clarifying the different components of
 the event loop API
References: <CADiSq7cT8BVi022sC5+dJ3Nom7z_bhCdbP6Gnb1YPTFQnBLafA@mail.gmail.com>
	<20130120133835.458f0c9e@pitrou.net>
	<CADiSq7d=ap7T83G4=v=eQefcRW1Fcg1nq1WF-n8k=7iE1o3fjg@mail.gmail.com>
Message-ID: <20130120141810.3a162eb7@pitrou.net>

On Sun, 20 Jan 2013 23:10:56 +1000
Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Jan 20, 2013 10:46 PM, "Antoine Pitrou" <solipsis at pitrou.net> wrote:
> >
> > On Sun, 20 Jan 2013 12:34:24 +1000
> > Nick Coghlan <ncoghlan at gmail.com> wrote:
> > >
> > > Low level transport creation:
> > > - create_connection()
> > > - create_pipe() # Once it exists in the PEP
> >
> > You need some kind of create_listener() too.
> 
> That's actually the "start_serving" method up in the event monitoring
> section. While it does end up creating transports, the overall flow is
> rather different from the client side one.

Ah, right. Well, in any case, the API is much too limited. It doesn't
support SSL, and it doesn't support UDP.

"TBD: Support SSL? I don't even know how to do that synchronously, and
I suppose it needs a certificate."

See http://docs.python.org/dev/library/ssl.html#server-side-operation
(and the non-blocking handshake part also applies)

Regards

Antoine.




From eliben at gmail.com  Sun Jan 20 15:18:02 2013
From: eliben at gmail.com (Eli Bendersky)
Date: Sun, 20 Jan 2013 06:18:02 -0800
Subject: [Python-ideas] PEP 3156: Clarifying the different components of
 the event loop API
In-Reply-To: <CADiSq7cT8BVi022sC5+dJ3Nom7z_bhCdbP6Gnb1YPTFQnBLafA@mail.gmail.com>
References: <CADiSq7cT8BVi022sC5+dJ3Nom7z_bhCdbP6Gnb1YPTFQnBLafA@mail.gmail.com>
Message-ID: <CAF-Rda-u8Gq2wVcJC5NRw+s4=UiG+bg4iTVu_HVuAx1MGzC2Qg@mail.gmail.com>

On Sat, Jan 19, 2013 at 6:34 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> PEP 3156 currently lists *29* proposed methods for the event loop API.
> These methods serve quite different purposes and I think a bit more
> structure in the overall API could help clarify that.
>
> First proposal: clearly split the abstract EventLoop API from concrete
> DescriptorEventLoop and IOCPEventLoop subclasses.


> The main benefit here is to help clarify that:
> 1. the additional methods defined on DescriptorEventLoop and
> IOCPEventLoop are not available on all event loop implementations, so
> any code using them is necessarily event loop specific
> 2. the goal of the transport abstraction is to mask the differences
> between these low level platform specific APIs
> 3. other event loops are free to use a completely different API
> between their low level transports and the event loop
>
>
I like the idea of splitting up the big interface, but could you clarify
what would go into such subclasses? I.e. isn't the current EventLoop
interface supposed to represent an interface all event loops will adhere to?

And sorry if this was discussed before and I'm missing the context, but
what kinds of EventLoop implementations are we expecting to see eventually?
Is it only a matter of implementing the API per platform (akin to the
current tulip.unix_events.UnixEventLoop), or is there a broader expectation
that frameworks like Twisted will plug into the API by providing their own
implementations (PEP 3156 mentions this somewhere)?


> Second proposal: better separate the "event loop management", "event
> monitoring" and "do things" methods
>

<snip>

>
> Do things (and get the result with "yield from"):
> - wrap_future() # Perhaps "wrap_executor_future"?
> - run_in_executor()
> - getaddrinfo()
> - getnameinfo()
>
> Low level transport creation:
> - create_connection()
> - create_pipe() # Once it exists in the PEP
>
>
+1 These certainly look somewhat out of place in the generic EventLoop API,
but concretely - how do you propose to structure the split?

Eli

From ncoghlan at gmail.com  Sun Jan 20 15:49:59 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 21 Jan 2013 00:49:59 +1000
Subject: [Python-ideas] PEP 3156: Clarifying the different components of
 the event loop API
In-Reply-To: <CAF-Rda-u8Gq2wVcJC5NRw+s4=UiG+bg4iTVu_HVuAx1MGzC2Qg@mail.gmail.com>
References: <CADiSq7cT8BVi022sC5+dJ3Nom7z_bhCdbP6Gnb1YPTFQnBLafA@mail.gmail.com>
	<CAF-Rda-u8Gq2wVcJC5NRw+s4=UiG+bg4iTVu_HVuAx1MGzC2Qg@mail.gmail.com>
Message-ID: <CADiSq7c0O14_+_wVSwBd2LcELC0-O55GiU-5ta3Y9jx6TWG2AQ@mail.gmail.com>

The concrete event loop methods are already separated in the PEP - they're
just flagged as optional methods rather than distinct subclasses.

The rest I think actually do belong on the event loop, hence the suggestion
to start just by rearranging them into those categories, without making the
class hierarchy any more complicated.
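[Editorial note: one possible shape for the split being discussed, purely illustrative -- none of these classes or method sets are in the PEP. The portable surface lives on an abstract base, and descriptor- or IOCP-specific methods move into concrete subclasses.]

```python
# Hypothetical sketch of the abstract/concrete event loop split.
from abc import ABC, abstractmethod


class EventLoop(ABC):
    # "Do things" methods: portable across all implementations.
    @abstractmethod
    def run_in_executor(self, executor, callback, *args): ...

    # Low-level transport creation: also portable.
    @abstractmethod
    def create_connection(self, protocol_factory, host, port): ...


class DescriptorEventLoop(EventLoop):
    # Only meaningful where select()-style file descriptors exist.
    def add_reader(self, fd, callback, *args): ...
    def add_writer(self, fd, callback, *args): ...


class IOCPEventLoop(EventLoop):
    # Only meaningful on Windows I/O completion ports.
    def register_overlapped(self, handle, callback): ...
```

Code that touches add_reader() is then visibly tied to DescriptorEventLoop, which is exactly the clarification the first proposal is after.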

--
Sent from my phone, thus the relative brevity :)
On Jan 21, 2013 12:18 AM, "Eli Bendersky" <eliben at gmail.com> wrote:

> On Sat, Jan 19, 2013 at 6:34 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>
>> PEP 3156 currently lists *29* proposed methods for the event loop API.
>> These methods serve quite different purposes and I think a bit more
>> structure in the overall API could help clarify that.
>>
>> First proposal: clearly split the abstract EventLoop API from concrete
>> DescriptorEventLoop and IOCPEventLoop subclasses.
>
>
>> The main benefit here is to help clarify that:
>> 1. the additional methods defined on DescriptorEventLoop and
>> IOCPEventLoop are not available on all event loop implementations, so
>> any code using them is necessarily event loop specific
>> 2. the goal of the transport abstraction is to mask the differences
>> between these low level platform specific APIs
>> 3. other event loops are free to use a completely different API
>> between their low level transports and the event loop
>>
>>
> I like the idea of splitting up the big interface, but could you clarify
> what would go into such subclasses? I.e. isn't the current EventLoop
> interface supposed to represent an interface all event loops will adhere to?
>
> And sorry if this was discussed before and I'm missing the context, but
> what kinds of EventLoop implementations are we expecting to see eventually?
> Is it only a matter of implementing the API per platform (akin to the
> current tulip.unix_events.UnixEventLoop) or a broader expectation of
> frameworks like Twisted to plug into the API by providing their own
> implementation (PEP 3156 mentions this somewhere).
>
>
>> Second proposal: better separate the "event loop management", "event
>> monitoring" and "do things" methods
>>
>
> <snip>
>
>>
>> Do things (and get the result with "yield from"):
>> - wrap_future() # Perhaps "wrap_executor_future"?
>> - run_in_executor()
>> - getaddrinfo()
>> - getnameinfo()
>>
>> Low level transport creation:
>> - create_connection()
>> - create_pipe() # Once it exists in the PEP
>>
>>
> +1 These certainly look somewhat out of place in the generic EventLoop
> API, but concretely - how do you propose to structure the split?
>
> Eli
>
>
>

From guido at python.org  Sun Jan 20 20:03:22 2013
From: guido at python.org (Guido van Rossum)
Date: Sun, 20 Jan 2013 11:03:22 -0800
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <20130120133654.60dbfdb2@pitrou.net>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CAP7+vJLsnGCJMe0P=DC5OjMgkcMSP1By4fu+Ft0q-y8Jn3X=Aw@mail.gmail.com>
	<50F9E1EA.4010305@canterbury.ac.nz>
	<CAP7+vJLL0M=zfEDtR5s-7t5hUVPaZKNfjNDjcFJ=yWCJv6j6gg@mail.gmail.com>
	<1D762043-961F-4F1E-A0A6-0C8C78AFA59B@twistedmatrix.com>
	<CAFkYKJ4PtvoGBdwXFtrbMJRW-sYMxxSEC-7vxNy4g6GQRsF5Wg@mail.gmail.com>
	<BEE3B68A-94DE-440A-942B-40C0A2033166@twistedmatrix.com>
	<CADiSq7f1wTe_OB6_Rd11D2nqMcTK9VVV=1VXUWcTLZQ2Y6qa5Q@mail.gmail.com>
	<CAP7+vJKvKrmmvi6o7jD60nas9Jx5HsCJbjm_WOPfrVmaS6EPJQ@mail.gmail.com>
	<20130120133654.60dbfdb2@pitrou.net>
Message-ID: <CAP7+vJKhEJT3oa7NxF4L5Dvg=RCWsp1B1dWNx7KiE2XobXGR7g@mail.gmail.com>

On Sun, Jan 20, 2013 at 4:36 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Sat, 19 Jan 2013 20:35:04 -0800
> Guido van Rossum <guido at python.org> wrote:
>> IOW, even though a transport may "have" a protocol without a
>> connection, nobody should care about that state, and nobody should be
>> calling its methods (again, write() etc.) in that state. In fact,
>> nobody except event loop internal code should ever have a reference to
>> a transport in that state.
>
> This is just not true. When the connection breaks, the protocol still
> has a reference to the transport and may still be trying to do things
> with the transport (because connection_lost() has not been called yet).

That's a different case though. There once *was* a connection. You are
right that the transport needs to protect itself against the protocol
making further calls to the transport API in this case. Anyway, I
think Nick is okay with the separation between the protocol_factory()
call and the connection_made() call, as long as the future returned by
create_connection() isn't marked done until the connection_made() call
returns. That's an easy fix in the current Tulip code. It's a little
harder though to fix up the PEP to clarify all this...
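[Editorial note: a minimal sketch of the invariant being agreed on here, with made-up names rather than the real tulip code: the result of create_connection is only exposed after connection_made() has returned, so callers never see a half-initialised protocol.]

```python
# Illustrative ordering: build protocol, finish initialisation, then return.

class DummyTransport:
    def write(self, data):
        pass


class Protocol:
    def __init__(self):
        self.connected = False          # constructed, but not yet usable

    def connection_made(self, transport):
        self.transport = transport
        self.connected = True           # initialisation completes here


def create_connection(protocol_factory, transport):
    protocol = protocol_factory()        # step 1: build the protocol
    protocol.connection_made(transport)  # step 2: complete its initialisation
    return protocol                      # step 3: only now expose it


p = create_connection(Protocol, DummyTransport())
assert p.connected  # the invariant the returned future should guarantee
```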

-- 
--Guido van Rossum (python.org/~guido)


From barry at python.org  Sun Jan 20 21:38:10 2013
From: barry at python.org (Barry Warsaw)
Date: Sun, 20 Jan 2013 15:38:10 -0500
Subject: [Python-ideas] chdir context manager
References: <20130119101024.GB2969@lp-shahaf.local>
	<50FABCD8.9080709@python.org>
Message-ID: <20130120153810.331f685b@anarchist.wooz.org>

On Jan 19, 2013, at 04:33 PM, Christian Heimes wrote:

>chdir() is not a safe operation because it affects the whole process.
>You can NOT make it work properly and safe in a multi-threaded
>environment or from code like signal handlers.

I've used a homebrewed chdir() context manager, but only in certain limited
and specific test cases.  It's easy enough to write that it doesn't bother
me to redo it once in a while.
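[Editorial note: the homebrewed helper in question typically looks something like this -- and, per Christian's warning, it is only safe in single-threaded code, since chdir() affects the whole process.]

```python
# A minimal chdir() context manager: change directory, always restore.
import os
from contextlib import contextmanager


@contextmanager
def chdir(path):
    old = os.getcwd()      # remember where we were
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(old)      # restore even if the body raised
```

Typical use: `with chdir(tmpdir): run_test()`.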

-Barry

From andrew at bemusement.org  Sun Jan 20 23:53:43 2013
From: andrew at bemusement.org (Andrew Bennetts)
Date: Mon, 21 Jan 2013 09:53:43 +1100
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CAP7+vJJx9b5tK9NE6kJx8AQHJPDGEdR9J9tJb-DikqdjJ4hrAA@mail.gmail.com>
References: <CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CACac1F9KiWmVW=ZCriVXjcb-a+74UvvznO+8GyT79nZOE5uKHg@mail.gmail.com>
	<CADiSq7fxVexvi4enU=AA=gEtUydkgwFTtu=TX9yFrDe3tnkb8w@mail.gmail.com>
	<CAP7+vJJx9b5tK9NE6kJx8AQHJPDGEdR9J9tJb-DikqdjJ4hrAA@mail.gmail.com>
Message-ID: <20130120225343.GA26816@flay.puzzling.org>

Guido van Rossum wrote:
[...]
> I have a more-or-less working but probably incomplete version checked
> into the tulip repo:
> http://code.google.com/p/tulip/source/browse/tulip/subprocess_transport.py
> 
> Note that this completely ignores stderr -- this makes the code
> simpler while still useful (there's plenty of useful stuff you can do
> without reading stderr), and avoids the questions Greg Ewing brought
> up about needing two transports (one for stdout, another for stderr).

Although 3 pipes to a subprocess (stdin, stdout, stderr) is the usual
convention, it's not the only possibility, so that configuration
shouldn't be hard-coded.  On POSIX some programs can and do make use of
the ability to have more pipes to a subprocess; e.g. the various *fd
options of gnupg (--status-fd, --logger-fd, --command-fd, and so on).
And some programs give the child process file descriptors that aren't
pipes, like sockets (e.g. an inetd-like server that accepts a socket
then spawns a subprocess to serve it).

So I hope tulip will support these possibilities (although obviously the
stdin/out/err style should be the convenient default).  You will be
unsurprised to hear that Twisted does :)

(Please forgive me if this was already pointed out.  It's hard keeping
up with python-ideas.)

-Andrew.



From p.f.moore at gmail.com  Mon Jan 21 00:25:12 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Sun, 20 Jan 2013 23:25:12 +0000
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <20130120225343.GA26816@flay.puzzling.org>
References: <CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CACac1F9KiWmVW=ZCriVXjcb-a+74UvvznO+8GyT79nZOE5uKHg@mail.gmail.com>
	<CADiSq7fxVexvi4enU=AA=gEtUydkgwFTtu=TX9yFrDe3tnkb8w@mail.gmail.com>
	<CAP7+vJJx9b5tK9NE6kJx8AQHJPDGEdR9J9tJb-DikqdjJ4hrAA@mail.gmail.com>
	<20130120225343.GA26816@flay.puzzling.org>
Message-ID: <CACac1F-yH4M1z-Q1f0X3LtQFkTaztFd8e3b=5gV7+GnHDT7bhg@mail.gmail.com>

On 20 January 2013 22:53, Andrew Bennetts <andrew at bemusement.org> wrote:
> Guido van Rossum wrote:
> [...]
>> I have a more-or-less working but probably incomplete version checked
>> into the tulip repo:
>> http://code.google.com/p/tulip/source/browse/tulip/subprocess_transport.py
>>
>> Note that this completely ignores stderr -- this makes the code
>> simpler while still useful (there's plenty of useful stuff you can do
>> without reading stderr), and avoids the questions Greg Ewing brought
>> up about needing two transports (one for stdout, another for stderr).
>
> Although 3 pipes to a subprocess (stdin, stdout, stderr) is the usual
> convention, it's not the only possibility, so that configuration
> shouldn't be hard-coded.  On POSIX some programs can and do make use of
> the ability to have more pipes to a subprocess; e.g. the various *fd
> options of gnupg (--status-fd, --logger-fd, --command-fd, and so on).
> And some programs give the child process file descriptors that aren't
> pipes, like sockets (e.g. an inetd-like server that accepts a socket
> then spawns a subprocess to serve it).
>
> So I hope tulip will support these possibilities (although obviously the
> stdin/out/err style should be the convenient default).  You will be
> unsurprised to hear that Twisted does :)

My plan is to modify Guido's current code to take a subprocess.Popen
object when creating a connection to a subprocess. So you'd use the
existing API to start the process, and then tulip to interact with it.
Having said that, I have no idea if or how subprocess.Popen would
support the extra fds you are talking about. If you can show me some
sample code, I can see what would be needed to handle it. But as far
as I know, subprocess.Popen objects only have the 3 standard handles
exposed as attributes - stdin, stdout and stderr.

If you have to create your own pipes and manage them yourself in
"normal" code, then I would expect that you'd have to do the same with
tulip. That may indicate a need for (yet another) event loop API to
create a pipe which can then be used with subprocess. Or you could use
the add_reader/add_writer interfaces, at the expense of portability.
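[Editorial note: on the "extra fds" point, subprocess.Popen does support this on POSIX via its pass_fds argument, so the gnupg-style --status-fd pattern can be sketched with only the stdlib; the child program here is just a stand-in.]

```python
# POSIX-only sketch: hand a child process an extra pipe beyond
# stdin/stdout/stderr, using subprocess.Popen(pass_fds=...).
import os
import subprocess
import sys

r, w = os.pipe()  # an extra channel beyond the standard three
child_code = "import os, sys; os.write(int(sys.argv[1]), b'status: ok')"
proc = subprocess.Popen(
    [sys.executable, "-c", child_code, str(w)],
    pass_fds=(w,),   # keep fd w open (and inheritable) in the child
)
os.close(w)          # parent drops its copy of the write end
status = os.read(r, 1024)  # blocks until the child reports
proc.wait()
os.close(r)
print(status)  # b'status: ok'
```

An event loop would add the read end via add_reader() (or a pipe transport) instead of the blocking os.read() used here.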

Paul

PS The above is still my plan. But at the moment, every PC in my house
seems to have decided to stop working, so I'm rebuilding PCs rather
than doing anything useful :-( Normal service will be resumed in due
course...


From eliben at gmail.com  Mon Jan 21 00:29:50 2013
From: eliben at gmail.com (Eli Bendersky)
Date: Sun, 20 Jan 2013 15:29:50 -0800
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CACac1F-yH4M1z-Q1f0X3LtQFkTaztFd8e3b=5gV7+GnHDT7bhg@mail.gmail.com>
References: <CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CACac1F9KiWmVW=ZCriVXjcb-a+74UvvznO+8GyT79nZOE5uKHg@mail.gmail.com>
	<CADiSq7fxVexvi4enU=AA=gEtUydkgwFTtu=TX9yFrDe3tnkb8w@mail.gmail.com>
	<CAP7+vJJx9b5tK9NE6kJx8AQHJPDGEdR9J9tJb-DikqdjJ4hrAA@mail.gmail.com>
	<20130120225343.GA26816@flay.puzzling.org>
	<CACac1F-yH4M1z-Q1f0X3LtQFkTaztFd8e3b=5gV7+GnHDT7bhg@mail.gmail.com>
Message-ID: <CAF-Rda84UMV6L76wYhJUQ+ifKETCrMfs3rsPK70QGoVJmHHiQA@mail.gmail.com>

On Sun, Jan 20, 2013 at 3:25 PM, Paul Moore <p.f.moore at gmail.com> wrote:

> On 20 January 2013 22:53, Andrew Bennetts <andrew at bemusement.org> wrote:
> > Guido van Rossum wrote:
> > [...]
> >> I have a more-or-less working but probably incomplete version checked
> >> into the tulip repo:
> >>
> http://code.google.com/p/tulip/source/browse/tulip/subprocess_transport.py
> >>
> >> Note that this completely ignores stderr -- this makes the code
> >> simpler while still useful (there's plenty of useful stuff you can do
> >> without reading stderr), and avoids the questions Greg Ewing brought
> >> up about needing two transports (one for stdout, another for stderr).
> >
> > Although 3 pipes to a subprocess (stdin, stdout, stderr) is the usual
> > convention, it's not the only possibility, so that configuration
> > shouldn't be hard-coded.  On POSIX some programs can and do make use of
> > the ability to have more pipes to a subprocess; e.g. the various *fd
> > options of gnupg (--status-fd, --logger-fd, --command-fd, and so on).
> > And some programs give the child process file descriptors that aren't
> > pipes, like sockets (e.g. an inetd-like server that accepts a socket
> > then spawns a subprocess to serve it).
> >
> > So I hope tulip will support these possibilities (although obviously the
> > stdin/out/err style should be the convenient default).  You will be
> > unsurprised to hear that Twisted does :)
>
> My plan is to modify Guido's current code to take a subprocess.Popen
> object when creating a connection to a subprocess. So you'd use the
> existing API to start the process, and then tulip to interact with it.
> Having said that, I have no idea if or how subprocess.Popen would
> support the extra fds you are talking about. If you can show me some
> sample code, I can see what would be needed to handle it. But as far
> as I know, subprocess.Popen objects only have the 3 standard handles
> exposed as attributes - stdin, stdout and stderr.
>
>
subprocess.Popen has the pass_fds argument, documented as follows:

    *pass_fds* is an optional sequence of file descriptors to keep open
between the parent and child. Providing any *pass_fds* forces *close_fds* to
be True. (Unix only)
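To make that concrete, here is a minimal sketch of passing an extra, non-standard
file descriptor to a child via pass_fds (Unix only; the child program is just
inline Python for illustration):

```python
import os
import subprocess
import sys

# Create a pipe beyond the standard stdin/stdout/stderr trio; listing the
# write end in pass_fds keeps it open (and inheritable) in the child.
r, w = os.pipe()

child_code = "import os, sys; os.write(int(sys.argv[1]), b'hello')"
proc = subprocess.Popen(
    [sys.executable, "-c", child_code, str(w)],
    pass_fds=(w,),
)
proc.wait()
os.close(w)              # close the parent's copy so the pipe can reach EOF
data = os.read(r, 1024)  # whatever the child wrote on the extra fd
os.close(r)
```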

Eli

From p.f.moore at gmail.com  Mon Jan 21 00:40:34 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Sun, 20 Jan 2013 23:40:34 +0000
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CAF-Rda84UMV6L76wYhJUQ+ifKETCrMfs3rsPK70QGoVJmHHiQA@mail.gmail.com>
References: <CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CACac1F9KiWmVW=ZCriVXjcb-a+74UvvznO+8GyT79nZOE5uKHg@mail.gmail.com>
	<CADiSq7fxVexvi4enU=AA=gEtUydkgwFTtu=TX9yFrDe3tnkb8w@mail.gmail.com>
	<CAP7+vJJx9b5tK9NE6kJx8AQHJPDGEdR9J9tJb-DikqdjJ4hrAA@mail.gmail.com>
	<20130120225343.GA26816@flay.puzzling.org>
	<CACac1F-yH4M1z-Q1f0X3LtQFkTaztFd8e3b=5gV7+GnHDT7bhg@mail.gmail.com>
	<CAF-Rda84UMV6L76wYhJUQ+ifKETCrMfs3rsPK70QGoVJmHHiQA@mail.gmail.com>
Message-ID: <CACac1F-WCK2iD0jGb_uwL3VUm=woGgucTRvnxZvnSUJgjupW2Q@mail.gmail.com>

On 20 January 2013 23:29, Eli Bendersky <eliben at gmail.com> wrote:
> subprocess.Popen has the pass_fds argument, documented as follows:
>
>     pass_fds is an optional sequence of file descriptors to keep open
> between the parent and child. Providing any pass_fds forces close_fds to be
> True. (Unix only)

I thought that was the case, but it seems like this is only really
enabling you to manually manage the extra pipes as I was suggesting in
my comment.

My current expectation is that the API would be something like
eventloop.connect_process(protocol_factory, popen_obj) and the
protocol would have data_received and err_received methods called when
the stdout or stderr fds have data, and the transport would have a
write method to write to stdin.
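For concreteness, a sketch of the protocol shape described above. Note that
connect_process, err_received and the exact method split are assumptions about
the proposed API, not anything that exists in tulip today:

```python
# Hypothetical protocol for eventloop.connect_process(protocol_factory,
# popen_obj); err_received is the assumed extra hook for stderr data.
class SubprocessProtocol:
    def connection_made(self, transport):
        self.transport = transport   # transport.write() would feed stdin
        self.out, self.err = [], []

    def data_received(self, data):   # called with stdout chunks
        self.out.append(data)

    def err_received(self, data):    # called with stderr chunks
        self.err.append(data)

    def connection_lost(self, exc):
        self.exc = exc

# Driving the callbacks by hand, just to show the calling convention:
proto = SubprocessProtocol()
proto.connection_made(transport=None)
proto.data_received(b"result\n")
proto.err_received(b"warning\n")
proto.connection_lost(None)
```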

If anyone has a suggestion for an API that could be used for arbitrary
FDs (which I presume could be either input or output) on top of this,
I'd be happy to incorporate it - but personally, I can't think of
anything that wouldn't be unusably complex :-(

Paul


From ncoghlan at gmail.com  Mon Jan 21 08:52:38 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 21 Jan 2013 17:52:38 +1000
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CAP7+vJKhEJT3oa7NxF4L5Dvg=RCWsp1B1dWNx7KiE2XobXGR7g@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CAP7+vJLsnGCJMe0P=DC5OjMgkcMSP1By4fu+Ft0q-y8Jn3X=Aw@mail.gmail.com>
	<50F9E1EA.4010305@canterbury.ac.nz>
	<CAP7+vJLL0M=zfEDtR5s-7t5hUVPaZKNfjNDjcFJ=yWCJv6j6gg@mail.gmail.com>
	<1D762043-961F-4F1E-A0A6-0C8C78AFA59B@twistedmatrix.com>
	<CAFkYKJ4PtvoGBdwXFtrbMJRW-sYMxxSEC-7vxNy4g6GQRsF5Wg@mail.gmail.com>
	<BEE3B68A-94DE-440A-942B-40C0A2033166@twistedmatrix.com>
	<CADiSq7f1wTe_OB6_Rd11D2nqMcTK9VVV=1VXUWcTLZQ2Y6qa5Q@mail.gmail.com>
	<CAP7+vJKvKrmmvi6o7jD60nas9Jx5HsCJbjm_WOPfrVmaS6EPJQ@mail.gmail.com>
	<20130120133654.60dbfdb2@pitrou.net>
	<CAP7+vJKhEJT3oa7NxF4L5Dvg=RCWsp1B1dWNx7KiE2XobXGR7g@mail.gmail.com>
Message-ID: <CADiSq7eLQLTuoirLF0nMXPRSbMzDrd3Mv-HpQZpS5AmUm=OQzg@mail.gmail.com>

On Mon, Jan 21, 2013 at 5:03 AM, Guido van Rossum <guido at python.org> wrote:
> Anyway, I
> think Nick is okay with the separation between the protocol_factory()
> call and the connection_made() call, as long as the future returned by
> create_connection() isn't marked done until the connection_made() call
> returns. That's an easy fix in the current Tulip code. It's a little
> harder though to fix up the PEP to clarify all this...

Right, I understand what the separate method enables now. I think one
way to make it clearer in the PEP is to require that "connection_made"
return a Future or coroutine, rather than being an ordinary method
returning None.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From guido at python.org  Mon Jan 21 17:13:45 2013
From: guido at python.org (Guido van Rossum)
Date: Mon, 21 Jan 2013 08:13:45 -0800
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <20130120225343.GA26816@flay.puzzling.org>
References: <CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CACac1F9KiWmVW=ZCriVXjcb-a+74UvvznO+8GyT79nZOE5uKHg@mail.gmail.com>
	<CADiSq7fxVexvi4enU=AA=gEtUydkgwFTtu=TX9yFrDe3tnkb8w@mail.gmail.com>
	<CAP7+vJJx9b5tK9NE6kJx8AQHJPDGEdR9J9tJb-DikqdjJ4hrAA@mail.gmail.com>
	<20130120225343.GA26816@flay.puzzling.org>
Message-ID: <CAP7+vJ+sTQWDTu_t-jJLCxixLD5MmGwEbSSC-m6Wc3d9BP0GNg@mail.gmail.com>

On Sun, Jan 20, 2013 at 2:53 PM, Andrew Bennetts <andrew at bemusement.org> wrote:
> Guido van Rossum wrote:
> [...]
>> I have a more-or-less working but probably incomplete version checked
>> into the tulip repo:
>> http://code.google.com/p/tulip/source/browse/tulip/subprocess_transport.py
>>
>> Note that this completely ignores stderr -- this makes the code
>> simpler while still useful (there's plenty of useful stuff you can do
>> without reading stderr), and avoids the questions Greg Ewing brought
>> up about needing two transports (one for stdout, another for stderr).
>
> Although 3 pipes to a subprocess (stdin, stdout, stderr) is the usual
> convention, it's not the only possibility, so that configuration
> shouldn't be hard-coded.  On POSIX some programs can and do make use of
> the ability to have more pipes to a subprocess; e.g. the various *fd
> options of gnupg (--status-fd, --logger-fd, --command-fd, and so on).
> And some programs give the child process file descriptors that aren't
> pipes, like sockets (e.g. an inetd-like server that accepts a socket
> then spawns a subprocess to serve it).

Hm. I agree that something to represent an arbitrary pipe or pair of
pipes may be useful occasionally, and we need to have an
implementation that can deal with stdout and stderr separately anyway,
but I don't think such extended configurations are common enough that
we need to completely generalize the API. I think it is fine to follow
the example of subprocess.py, which allows but does not encourage
extra pipes and treats stdin, stdout and stderr differently.

> So I hope tulip will support these possibilities (although obviously the
> stdin/out/err style should be the convenient default).  You will be
> unsurprised to hear that Twisted does :)
>
> (Please forgive me if this was already pointed out.  It's hard keeping
> up with python-ideas.)

-- 
--Guido van Rossum (python.org/~guido)


From guido at python.org  Mon Jan 21 17:22:19 2013
From: guido at python.org (Guido van Rossum)
Date: Mon, 21 Jan 2013 08:22:19 -0800
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CADiSq7eLQLTuoirLF0nMXPRSbMzDrd3Mv-HpQZpS5AmUm=OQzg@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CAP7+vJLsnGCJMe0P=DC5OjMgkcMSP1By4fu+Ft0q-y8Jn3X=Aw@mail.gmail.com>
	<50F9E1EA.4010305@canterbury.ac.nz>
	<CAP7+vJLL0M=zfEDtR5s-7t5hUVPaZKNfjNDjcFJ=yWCJv6j6gg@mail.gmail.com>
	<1D762043-961F-4F1E-A0A6-0C8C78AFA59B@twistedmatrix.com>
	<CAFkYKJ4PtvoGBdwXFtrbMJRW-sYMxxSEC-7vxNy4g6GQRsF5Wg@mail.gmail.com>
	<BEE3B68A-94DE-440A-942B-40C0A2033166@twistedmatrix.com>
	<CADiSq7f1wTe_OB6_Rd11D2nqMcTK9VVV=1VXUWcTLZQ2Y6qa5Q@mail.gmail.com>
	<CAP7+vJKvKrmmvi6o7jD60nas9Jx5HsCJbjm_WOPfrVmaS6EPJQ@mail.gmail.com>
	<20130120133654.60dbfdb2@pitrou.net>
	<CAP7+vJKhEJT3oa7NxF4L5Dvg=RCWsp1B1dWNx7KiE2XobXGR7g@mail.gmail.com>
	<CADiSq7eLQLTuoirLF0nMXPRSbMzDrd3Mv-HpQZpS5AmUm=OQzg@mail.gmail.com>
Message-ID: <CAP7+vJKtNk2EbRn+rUj5bontKe1YjPwyfibsj2KtcLV9As1jkw@mail.gmail.com>

On Sun, Jan 20, 2013 at 11:52 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Mon, Jan 21, 2013 at 5:03 AM, Guido van Rossum <guido at python.org> wrote:
>> Anyway, I
>> think Nick is okay with the separation between the protocol_factory()
>> call and the connection_made() call, as long as the future returned by
>> create_connection() isn't marked done until the connection_made() call
>> returns. That's an easy fix in the current Tulip code. It's a little
>> harder though to fix up the PEP to clarify all this...
>
> Right, I understand what the separate method enables now. I think one
> way to make it clearer in the PEP is to require that "connection_made"
> return a Future or coroutine, rather than being an ordinary method
> returning None.

Hm. This would seem to introduce Futures / coroutines at the wrong
level (I want to allow protocol implementers to use them, but not
require them). If connection_made() wants to initiate some blocking
I/O, it is free to do so, but it ought to wrap that in a Task. If the
class needs completion of this task to be a prerequisite for handling
data passed to a subsequent data_received() call, it will need to
devise some buffering and/or locking scheme that's outside the scope
of the PEP.
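In modern asyncio terms (tulip's names differed slightly), the buffering
scheme suggested above might look like the following sketch: connection_made
starts a setup Task, and data_received buffers until that task completes.

```python
import asyncio

class BufferingProtocol(asyncio.Protocol):
    """Buffer incoming data until a setup task kicked off by
    connection_made() has finished (a sketch, not part of the PEP)."""

    def __init__(self):
        self._buffer = []           # data arriving before setup finishes
        self.handled = []

    def connection_made(self, transport):
        self.transport = transport
        self._ready = asyncio.ensure_future(self._setup())

    async def _setup(self):
        await asyncio.sleep(0)      # stand-in for real setup work
        for chunk in self._buffer:  # drain anything that arrived meanwhile
            self.handled.append(chunk)
        self._buffer = None         # setup done: stop buffering

    def data_received(self, data):
        if self._buffer is not None:
            self._buffer.append(data)
        else:
            self.handled.append(data)

async def demo():
    proto = BufferingProtocol()
    proto.connection_made(transport=None)
    proto.data_received(b"early")   # arrives while setup is still pending
    await proto._ready
    proto.data_received(b"late")
    return proto.handled

handled = asyncio.run(demo())
```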

Note that I am also hoping to produce a more coroutine-oriented style
for writing protocols. The main piece of code for this already exists,
the StreamReader class
(http://code.google.com/p/tulip/source/browse/tulip/http_client.py?r=b1028ab02dc0f722d790aac4768663a972d9d555#37),
but I need to think about how to hook it all together nicely (for
writing, the transport's API is ready to be used by coroutines).

-- 
--Guido van Rossum (python.org/~guido)


From Steve.Dower at microsoft.com  Mon Jan 21 17:50:12 2013
From: Steve.Dower at microsoft.com (Steve Dower)
Date: Mon, 21 Jan 2013 16:50:12 +0000
Subject: [Python-ideas] chdir context manager
In-Reply-To: <20130120153810.331f685b@anarchist.wooz.org>
References: <20130119101024.GB2969@lp-shahaf.local>
	<50FABCD8.9080709@python.org>
	<20130120153810.331f685b@anarchist.wooz.org>
Message-ID: <f8597e6e71b646969534c7d737718036@BLUPR03MB035.namprd03.prod.outlook.com>

FWIW, when Windows revised their API set, GetCurrentDirectory and SetCurrentDirectory were completely removed. This seems a pretty strong move away from these APIs. (This only applies to new-style Windows 8 apps; desktop apps can still call them, but the intent is clear.)

Cheers,
Steve

> -----Original Message-----
> From: Python-ideas [mailto:python-ideas-
> bounces+steve.dower=microsoft.com at python.org] On Behalf Of Barry
> Warsaw
> Sent: Sunday, January 20, 2013 12:38
> To: python-ideas at python.org
> Subject: Re: [Python-ideas] chdir context manager
> 
> On Jan 19, 2013, at 04:33 PM, Christian Heimes wrote:
> 
> >chdir() is not a safe operation because it affects the whole process.
> >You can NOT make it work properly and safe in a multi-threaded
> >environment or from code like signal handlers.
> 
> I've used a homebrewed chdir() context manager but only in certain limited
> and specific test cases.  It's easy enough to write so it doesn't bother me that
> once in a while I have to.
> 
> -Barry
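For reference, the homebrewed context manager Barry describes is only a few
lines (with the caveat Christian raised: os.chdir affects the whole process,
so this is not thread-safe):

```python
import os
from contextlib import contextmanager

@contextmanager
def chdir(path):
    # NOT thread-safe: the working directory is process-global state.
    old = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(old)  # always restore, even if the body raises
```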



From storchaka at gmail.com  Mon Jan 21 20:20:08 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Mon, 21 Jan 2013 21:20:08 +0200
Subject: [Python-ideas] More details in MemoryError
Message-ID: <kdk4db$k5a$1@ger.gmane.org>

I propose to add new optional attributes to MemoryError, which show how 
much memory was requested by the failed allocation and how much memory 
was in use at that moment.
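To make the proposal concrete, here is a sketch of how such an exception
might be used; the attribute names (requested, in_use) are invented here
purely for illustration, not an accepted design:

```python
class DetailedMemoryError(MemoryError):
    """Sketch of a richer MemoryError; attribute names are assumptions."""

    def __init__(self, requested, in_use):
        super().__init__("failed to allocate %d bytes (%d bytes in use)"
                         % (requested, in_use))
        self.requested = requested   # bytes asked of the allocator
        self.in_use = in_use         # bytes already allocated at failure

# A caller could then decide how aggressively to free caches:
try:
    raise DetailedMemoryError(requested=1 << 30, in_use=3 << 30)
except MemoryError as e:
    details = (e.requested, e.in_use)
```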



From ben at bendarnell.com  Mon Jan 21 22:23:06 2013
From: ben at bendarnell.com (Ben Darnell)
Date: Mon, 21 Jan 2013 16:23:06 -0500
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CAP7+vJ+jUqK6JBoLbYKWQDTpntwWWXpJa8qTZp3GYkyKv_ArkQ@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<CACac1F9-W6VdTU=+=u488YbopkSrG_dE3Y-uNo9hzqS72pesHQ@mail.gmail.com>
	<CAP7+vJJE3AQNaGmphemk9TtaVLM+DgeMbHqbA4wt5Ma_qSQgWA@mail.gmail.com>
	<CACac1F99MGhWT-D_vSX+d8Os+a0CZQv8r7dh4v6zRrWw8Nmp-Q@mail.gmail.com>
	<50F8F725.20505@canterbury.ac.nz>
	<CAP7+vJ+jUqK6JBoLbYKWQDTpntwWWXpJa8qTZp3GYkyKv_ArkQ@mail.gmail.com>
Message-ID: <CAFkYKJ4eDigruA4pd5izCDf8GGFqmWxvxxr79JWtSug5LoJ=DA@mail.gmail.com>

On Fri, Jan 18, 2013 at 5:15 PM, Guido van Rossum <guido at python.org> wrote:

> On Thu, Jan 17, 2013 at 11:17 PM, Greg Ewing
> <greg.ewing at canterbury.ac.nz> wrote:
> > Paul Moore wrote:
> >>
> >> PS From the PEP, it seems that a protocol must implement the 4 methods
> >> connection_made, data_received, eof_received and connection_lost. For
> >> a process, which has 2 output streams involved, a single data_received
> >> method isn't enough.
>
> > It looks like there would have to be at least two Transport instances
> > involved, one for stdin/stdout and one for stderr.
> >
> > Connecting them both to a single Protocol object doesn't seem to be
> > possible with the framework as defined. You would have to use a
> > couple of adapter objects to translate the data_received calls into
> > calls on different methods of another object.
>
> So far this makes sense.
>
> But for this specific case there's a simpler solution -- require the
> protocol to support a few extra methods, in particular,
> err_data_received() and err_eof_received(), which are to stderr what
> data_received() and eof_received() are for stdout. (After all, the
> point of a subprocess is that "normal" data goes to stdout.) There's
> only one input stream to the subprocess, so there's no ambiguity for
> write(), and neither is there a need for multiple
> connection_made()/lost() methods. (However, we could argue endlessly
> over whether connection_lost() should be called when the subprocess
> exits, or when the other side of all three pipes is closed. :-)
>
>
Using separate methods for stderr breaks compatibility with existing
Protocols for no good reason (UDP needs a different protocol interface
because individual datagrams can't be concatenated; that doesn't apply here
since pipes are stream-oriented).  We'll have intermediate Protocol classes
like LineReceiver that work with sockets; why should they be reimplemented
for stderr?  It's also likely that if I do care about both stdout and
stderr, I'm going to take stdout as a blob and redirect it to a file, but
I'll want to read stderr with a line-oriented protocol to get error
messages, so I don't think we want to favor stdout over stderr in the
interface.

I think we should have a pipe-based Transport and the subprocess should
just contain several of these transports (depending on which fds the caller
cares about; in my experience I rarely have more than one pipe per
subprocess, but whether that pipe is stdout or stderr varies).  The process
object itself should also be able to run a callback when the child exits;
waiting for the standard streams to close is sufficient in most cases but
not always.

-Ben

From benjamin at python.org  Mon Jan 21 23:12:10 2013
From: benjamin at python.org (Benjamin Peterson)
Date: Mon, 21 Jan 2013 22:12:10 +0000 (UTC)
Subject: [Python-ideas] More details in MemoryError
References: <kdk4db$k5a$1@ger.gmane.org>
Message-ID: <loom.20130121T231157-263@post.gmane.org>

Serhiy Storchaka <storchaka at ...> writes:

> 
> I propose to add new optional attributes to MemoryError, which show how 
> much memory was requested by the failed allocation and how much memory 
> was in use at that moment.

What is this useful for?






From ben at bendarnell.com  Mon Jan 21 23:13:37 2013
From: ben at bendarnell.com (Ben Darnell)
Date: Mon, 21 Jan 2013 17:13:37 -0500
Subject: [Python-ideas] PEP 3156 EventLoop: hide details of iterations and
	idleness?
Message-ID: <CAFkYKJ4BEgqQ5padUqoOg5-+wv62FmarPn0fON63Cdr=FhiTrA@mail.gmail.com>

While working on proof-of-concept tornado/tulip integration (
https://gist.github.com/4582282), I found a few methods that could not
easily be implemented on top of the tornado IOLoop because they rely on
details that Tornado does not expose.  While it wouldn't be hard to add
support for these methods to Tornado, I would argue that they are
unnecessary and expose implementation details, and so they are good
candidates for removal from this already very broad interface.

First, run_once and call_every_iteration both expose the event loop's
underlying iterations to the application.  The trouble is that the duration
of one iteration is so widely variable that it's not a very useful concept
(and when implementing the EventLoop interface on top of some existing
event loop these methods may not be available).  When is it better to use
run_once instead of just using call_later to schedule a stop after a short
timeout, or call_every_iteration instead of call_repeatedly?

Second, while run_until_idle is convenient (especially for tests), it's
kind of fragile and exposes you to implementation details in the libraries
you use.  If anyone uses call_repeatedly, run_until_idle won't work unless
that callback is cancelled.  As an example, I once had to introduce
Tornado's equivalent of call_repeatedly in a library to work around a bug
in libcurl.  If I had been using run_until_idle in my tests, they'd have all
broken.  I think we should either remove run_until_idle or add a "daemon"
flag to call_repeatedly (and call_later, and possibly others).
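The call_later alternative mentioned above is straightforward; in today's
asyncio terms (tulip's method names differed), stopping the loop after a
short timeout takes the place of a run_once() primitive:

```python
import asyncio

loop = asyncio.new_event_loop()
ticks = []

loop.call_soon(ticks.append, "tick")
# Rather than relying on a run_once() primitive, schedule an explicit
# stop after a short timeout and run normally until then:
loop.call_later(0.05, loop.stop)
loop.run_forever()
loop.close()
```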

-Ben

From guido at python.org  Tue Jan 22 03:31:41 2013
From: guido at python.org (Guido van Rossum)
Date: Mon, 21 Jan 2013 18:31:41 -0800
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CAFkYKJ4eDigruA4pd5izCDf8GGFqmWxvxxr79JWtSug5LoJ=DA@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<CACac1F9-W6VdTU=+=u488YbopkSrG_dE3Y-uNo9hzqS72pesHQ@mail.gmail.com>
	<CAP7+vJJE3AQNaGmphemk9TtaVLM+DgeMbHqbA4wt5Ma_qSQgWA@mail.gmail.com>
	<CACac1F99MGhWT-D_vSX+d8Os+a0CZQv8r7dh4v6zRrWw8Nmp-Q@mail.gmail.com>
	<50F8F725.20505@canterbury.ac.nz>
	<CAP7+vJ+jUqK6JBoLbYKWQDTpntwWWXpJa8qTZp3GYkyKv_ArkQ@mail.gmail.com>
	<CAFkYKJ4eDigruA4pd5izCDf8GGFqmWxvxxr79JWtSug5LoJ=DA@mail.gmail.com>
Message-ID: <CAP7+vJ+r4Ys-iJBnD4dk+mH9qni9M7sc56ACS+o8kGn+peNYcw@mail.gmail.com>

On Mon, Jan 21, 2013 at 1:23 PM, Ben Darnell <ben at bendarnell.com> wrote:
> On Fri, Jan 18, 2013 at 5:15 PM, Guido van Rossum <guido at python.org> wrote:
>>
>> On Thu, Jan 17, 2013 at 11:17 PM, Greg Ewing
>> <greg.ewing at canterbury.ac.nz> wrote:
>> > Paul Moore wrote:
>> >>
>> >> PS From the PEP, it seems that a protocol must implement the 4 methods
>> >> connection_made, data_received, eof_received and connection_lost. For
>> >> a process, which has 2 output streams involved, a single data_received
>> >> method isn't enough.
>>
>> > It looks like there would have to be at least two Transport instances
>> > involved, one for stdin/stdout and one for stderr.
>> >
>> > Connecting them both to a single Protocol object doesn't seem to be
>> > possible with the framework as defined. You would have to use a
>> > couple of adapter objects to translate the data_received calls into
>> > calls on different methods of another object.
>>
>> So far this makes sense.
>>
>> But for this specific case there's a simpler solution -- require the
>> protocol to support a few extra methods, in particular,
>> err_data_received() and err_eof_received(), which are to stderr what
>> data_received() and eof_received() are for stdout. (After all, the
>> point of a subprocess is that "normal" data goes to stdout.) There's
>> only one input stream to the subprocess, so there's no ambiguity for
>> write(), and neither is there a need for multiple
>> connection_made()/lost() methods. (However, we could argue endlessly
>> over whether connection_lost() should be called when the subprocess
>> exits, or when the other side of all three pipes is closed. :-)

> Using separate methods for stderr breaks compatibility with existing
> Protocols for no good reason (UDP needs a different protocol interface
> because individual datagrams can't be concatenated; that doesn't apply here
> since pipes are stream-oriented).  We'll have intermediate Protocol classes
> like LineReceiver that work with sockets; why should they be reimplemented
> for stderr?

This is a good point.

> It's also likely that if I do care about both stdout and
> stderr, I'm going to take stdout as a blob and redirect it to a file, but
> I'll want to read stderr with a line-oriented protocol to get error
> messages, so I don't think we want to favor stdout over stderr in the
> interface.

That all depends rather on the application.

> I think we should have a pipe-based Transport and the subprocess should just
> contain several of these transports (depending on which fds the caller cares
> about; in my experience I rarely have more than one pipe per subprocess, but
> whether that pipe is stdout or stderr varies).  The process object itself
> should also be able to run a callback when the child exits; waiting for the
> standard streams to close is sufficient in most cases but not always.

Unfortunately you'll also need a separate protocol for each transport,
since the transport calls methods with fixed names on the protocol
(and you've just argued that that we should stick to that -- and I
agree :-). Note that since there's (normally) only one input file to
the subprocess, only one of these transports should have a write()
method -- but both of them have to call data_received() and
potentially eof_received() on different objects.

And in this case it doesn't seem easy to use the StreamReader class,
since you can't know which of the two (stdout or stderr) will have
data available first, and guessing wrong might cause a deadlock. (So,
yes, this is a case where coroutines are less convenient than
callbacks.)

-- 
--Guido van Rossum (python.org/~guido)


From ben at bendarnell.com  Tue Jan 22 04:29:57 2013
From: ben at bendarnell.com (Ben Darnell)
Date: Mon, 21 Jan 2013 22:29:57 -0500
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CAP7+vJ+r4Ys-iJBnD4dk+mH9qni9M7sc56ACS+o8kGn+peNYcw@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<CACac1F9-W6VdTU=+=u488YbopkSrG_dE3Y-uNo9hzqS72pesHQ@mail.gmail.com>
	<CAP7+vJJE3AQNaGmphemk9TtaVLM+DgeMbHqbA4wt5Ma_qSQgWA@mail.gmail.com>
	<CACac1F99MGhWT-D_vSX+d8Os+a0CZQv8r7dh4v6zRrWw8Nmp-Q@mail.gmail.com>
	<50F8F725.20505@canterbury.ac.nz>
	<CAP7+vJ+jUqK6JBoLbYKWQDTpntwWWXpJa8qTZp3GYkyKv_ArkQ@mail.gmail.com>
	<CAFkYKJ4eDigruA4pd5izCDf8GGFqmWxvxxr79JWtSug5LoJ=DA@mail.gmail.com>
	<CAP7+vJ+r4Ys-iJBnD4dk+mH9qni9M7sc56ACS+o8kGn+peNYcw@mail.gmail.com>
Message-ID: <CAFkYKJ7MG8N-gZzMqB-NNgjCwMU8o-qL9v3g+J8wuWG6DaKPBw@mail.gmail.com>

On Mon, Jan 21, 2013 at 9:31 PM, Guido van Rossum <guido at python.org> wrote:

> On Mon, Jan 21, 2013 at 1:23 PM, Ben Darnell <ben at bendarnell.com> wrote:
> > On Fri, Jan 18, 2013 at 5:15 PM, Guido van Rossum <guido at python.org>
> wrote:
> >>
> >> On Thu, Jan 17, 2013 at 11:17 PM, Greg Ewing
> >> <greg.ewing at canterbury.ac.nz> wrote:
> >> > Paul Moore wrote:
> >> >>
> >> >> PS From the PEP, it seems that a protocol must implement the 4
> methods
> >> >> connection_made, data_received, eof_received and connection_lost. For
> >> >> a process, which has 2 output streams involved, a single
> data_received
> >> >> method isn't enough.
> >>
> >> > It looks like there would have to be at least two Transport instances
> >> > involved, one for stdin/stdout and one for stderr.
> >> >
> >> > Connecting them both to a single Protocol object doesn't seem to be
> >> > possible with the framework as defined. You would have to use a
> >> > couple of adapter objects to translate the data_received calls into
> >> > calls on different methods of another object.
> >>
> >> So far this makes sense.
> >>
> >> But for this specific case there's a simpler solution -- require the
> >> protocol to support a few extra methods, in particular,
> >> err_data_received() and err_eof_received(), which are to stderr what
> >> data_received() and eof_received() are for stdout. (After all, the
> >> point of a subprocess is that "normal" data goes to stdout.) There's
> >> only one input stream to the subprocess, so there's no ambiguity for
> >> write(), and neither is there a need for multiple
> >> connection_made()/lost() methods. (However, we could argue endlessly
> >> over whether connection_lost() should be called when the subprocess
> >> exits, or when the other side of all three pipes is closed. :-)
>
> > Using separate methods for stderr breaks compatibility with existing
> > Protocols for no good reason (UDP needs a different protocol interface
> > because individual datagrams can't be concatenated; that doesn't apply
> here
> > since pipes are stream-oriented).  We'll have intermediate Protocol
> classes
> > like LineReceiver that work with sockets; why should they be
> reimplemented
> > for stderr?
>
> This is a good point.
>
> > It's also likely that if I do care about both stdout and
> > stderr, I'm going to take stdout as a blob and redirect it to a file, but
> > I'll want to read stderr with a line-oriented protocol to get error
> > messages, so I don't think we want to favor stdout over stderr in the
> > interface.
>
> That all depends rather on the application.
>

Exactly.


>
> > I think we should have a pipe-based Transport and the subprocess should
> just
> > contain several of these transports (depending on which fds the caller
> cares
> > about; in my experience I rarely have more than one pipe per subprocess,
> but
> > whether that pipe is stdout or stderr varies).  The process object itself
> > should also be able to run a callback when the child exits; waiting for
> the
> > standard streams to close is sufficient in most cases but not always.
>
> Unfortunately you'll also need a separate protocol for each transport,
> since the transport calls methods with fixed names on the protocol
> (and you've just argued that we should stick to that -- and I
> agree :-).


Well, to be precise I was arguing that pipe transports should work the same
way as socket transports.  I'm still not a fan of the use of fixed method
names.  (As an alternative, what if protocols were just callables that took
a Future argument?  for data_received future.result() would return the data
and for eof_received and connection_lost it would raise an appropriate
exception type.  That just leaves connection_made, which I was arguing in
the other thread should be on the protocol factory instead of the protocol.)
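A toy sketch of that alternative (purely illustrative; no such API exists):
the protocol is a single callable taking a Future, where a successful result
carries data and an exception signals eof or a lost connection.

```python
from concurrent.futures import Future

class EOFReceived(Exception):
    """Stand-in for an exception type that would signal eof_received."""

def make_protocol(received):
    def protocol(fut):
        try:
            received.append(fut.result())   # plays the role of data_received
        except EOFReceived:
            received.append(None)           # plays the role of eof_received
    return protocol

received = []
proto = make_protocol(received)

fut = Future()
fut.set_result(b"some data")
proto(fut)                                  # delivers data

fut = Future()
fut.set_exception(EOFReceived())
proto(fut)                                  # delivers eof
```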


> Note that since there's (normally) only one input file to
> the subprocess, only one of these transports should have a write()
> method -- but both of them have to call data_received() and
> potentially eof_received() on different objects.
>

I'd actually give stdin its own transport and protocol, distinct from
stdout and stderr (remember that using all three pipes on the same process
is relatively uncommon).  It's a degenerate case since it will never call
data_received, but it's analogous to the way that subprocess uses three
read-only and write-only file objects instead of trying to glue stdin and
stdout together.

This is fairly new and little-tested, but it shows the interface I have in
mind:
http://tornado.readthedocs.org/en/latest/process.html#tornado.process.Subprocess


>
> And in this case it doesn't seem easy to use the StreamReader class,
> since you can't know which of the two (stdout or stderr) will have
> data available first, and guessing wrong might cause a deadlock. (So,
> yes, this is a case where coroutines are less convenient than
> callbacks.)
>

I'm not sure I follow.  Couldn't you just attach a StreamReader to each
stream and use as_completed to read from them both in parallel?  You'd get
in trouble if one of the streams has a line longer than the StreamReader's
buffer size, but that sort of peril is everywhere if you're using both
stdout and stderr, no matter what the interface is (unless you just use a
large or unlimited buffer and hope you won't run out of memory, like
subprocess.communicate).  At least with "yield from
stderr_stream.readline()" you're better off than with a synchronous
subprocess since the StreamReader's buffer size is adjustable, unlike the
pipe buffer size.
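The parallel-read pattern described above can be sketched with today's asyncio (tulip's descendant, where the stream API eventually landed); `gather` stands in for `as_completed` since we want both results anyway. This is a sketch, not tulip's actual API, and it assumes a POSIX shell for the child process:

```python
import asyncio

async def drain(stream):
    # Read one stream to EOF line by line, so its pipe buffer never
    # fills up while we are blocked on the other stream.
    chunks = []
    while True:
        line = await stream.readline()
        if not line:
            return b"".join(chunks)
        chunks.append(line)

async def main():
    proc = await asyncio.create_subprocess_shell(
        "echo out; echo err 1>&2",
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE)
    # Read stdout and stderr concurrently: whichever stream has data
    # first is consumed first, so neither can deadlock the other.
    out, err = await asyncio.gather(drain(proc.stdout),
                                    drain(proc.stderr))
    await proc.wait()
    return out, err

out, err = asyncio.run(main())
```

Because each stream has its own coroutine, there is no need to guess which of stdout or stderr produces data first.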

-Ben


>
> --
> --Guido van Rossum (python.org/~guido)
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130121/cb53d265/attachment.html>

From greg.ewing at canterbury.ac.nz  Tue Jan 22 05:13:50 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 22 Jan 2013 17:13:50 +1300
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CAP7+vJ+r4Ys-iJBnD4dk+mH9qni9M7sc56ACS+o8kGn+peNYcw@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<CACac1F9-W6VdTU=+=u488YbopkSrG_dE3Y-uNo9hzqS72pesHQ@mail.gmail.com>
	<CAP7+vJJE3AQNaGmphemk9TtaVLM+DgeMbHqbA4wt5Ma_qSQgWA@mail.gmail.com>
	<CACac1F99MGhWT-D_vSX+d8Os+a0CZQv8r7dh4v6zRrWw8Nmp-Q@mail.gmail.com>
	<50F8F725.20505@canterbury.ac.nz>
	<CAP7+vJ+jUqK6JBoLbYKWQDTpntwWWXpJa8qTZp3GYkyKv_ArkQ@mail.gmail.com>
	<CAFkYKJ4eDigruA4pd5izCDf8GGFqmWxvxxr79JWtSug5LoJ=DA@mail.gmail.com>
	<CAP7+vJ+r4Ys-iJBnD4dk+mH9qni9M7sc56ACS+o8kGn+peNYcw@mail.gmail.com>
Message-ID: <50FE11FE.80905@canterbury.ac.nz>

Guido van Rossum wrote:
> And in this case it doesn't seem easy to use the StreamReader class,
> since you can't know which of the two (stdout or stderr) will have
> data available first, and guessing wrong might cause a deadlock.

I don't see the problem. You run two Tasks, one handling
stdin/stdout and one handling stderr. (Or three tasks if
stdin and stdout are not synchronised.) Seems like an ideal
use case for coroutines to me.

-- 
Greg


From greg.ewing at canterbury.ac.nz  Tue Jan 22 05:17:57 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 22 Jan 2013 17:17:57 +1300
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <96E6B3B4-FC23-4AEE-AE8E-E16A5AA54B55@twistedmatrix.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CAP7+vJLsnGCJMe0P=DC5OjMgkcMSP1By4fu+Ft0q-y8Jn3X=Aw@mail.gmail.com>
	<50F9E1EA.4010305@canterbury.ac.nz>
	<CAP7+vJLL0M=zfEDtR5s-7t5hUVPaZKNfjNDjcFJ=yWCJv6j6gg@mail.gmail.com>
	<1D762043-961F-4F1E-A0A6-0C8C78AFA59B@twistedmatrix.com>
	<CAFkYKJ4PtvoGBdwXFtrbMJRW-sYMxxSEC-7vxNy4g6GQRsF5Wg@mail.gmail.com>
	<BEE3B68A-94DE-440A-942B-40C0A2033166@twistedmatrix.com>
	<CADiSq7f1wTe_OB6_Rd11D2nqMcTK9VVV=1VXUWcTLZQ2Y6qa5Q@mail.gmail.com>
	<CAP7+vJKvKrmmvi6o7jD60nas9Jx5HsCJbjm_WOPfrVmaS6EPJQ@mail.gmail.com>
	<CADiSq7faSakaJ7QA66yv7hxOA+_2R-n503zJb65x8=hsCYWHew@mail.gmail.com>
	<96E6B3B4-FC23-4AEE-AE8E-E16A5AA54B55@twistedmatrix.com>
Message-ID: <50FE12F5.80108@canterbury.ac.nz>

Glyph wrote:
> 
> this means that the protocol will 
> immediately begin interacting with the transport in this vague, 
> undefined, not quite connected state,

You still haven't explained why the protocol can't simply
refrain from doing anything with the transport until its
connection_made() is called.

If a transport is always to be assumed ready-to-go as soon
as it's exposed to the outside world, what is the point
of having connection_made() at all?

-- 
Greg


From phd at phdru.name  Mon Jan 21 20:35:48 2013
From: phd at phdru.name (Oleg Broytman)
Date: Mon, 21 Jan 2013 23:35:48 +0400
Subject: [Python-ideas] More details in MemoryError
In-Reply-To: <kdk4db$k5a$1@ger.gmane.org>
References: <kdk4db$k5a$1@ger.gmane.org>
Message-ID: <20130121193548.GA20342@iskra.aviel.ru>

On Mon, Jan 21, 2013 at 09:20:08PM +0200, Serhiy Storchaka <storchaka at gmail.com> wrote:
> I propose to add new optional attributes to MemoryError, which show
> how much memory was required by the failed allocation and how much
> memory was in use at that moment.

   I'd very much like to see a situation in which a program can survive
MemoryError.

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            phd at phdru.name
           Programmers don't die, they just GOSUB without RETURN.


From cf.natali at gmail.com  Tue Jan 22 08:13:23 2013
From: cf.natali at gmail.com (=?ISO-8859-1?Q?Charles=2DFran=E7ois_Natali?=)
Date: Tue, 22 Jan 2013 08:13:23 +0100
Subject: [Python-ideas] More details in MemoryError
In-Reply-To: <20130121193548.GA20342@iskra.aviel.ru>
References: <kdk4db$k5a$1@ger.gmane.org>
	<20130121193548.GA20342@iskra.aviel.ru>
Message-ID: <CAH_1eM154R35VRepLBjQRKDBQAX-wmB4NpZH-vONEA4RT8Dg=w@mail.gmail.com>

2013/1/21 Oleg Broytman <phd at phdru.name>:
>    I'd very much like to see a situation when a program can survive
> MemoryError.

Let's say you're using an image processing program.
You have several images open on which you've been working for a couple
of minutes/hours.
You open a new one, and it's so large that it results in MemoryError:
instead of just losing all your current work (yeah, the program should
support auto-save anyway, but let's pretend it doesn't), the program
catches MemoryError and displays a popup saying "Not enough memory to
process this image".
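The recovery scenario above is just an ordinary try/except around the allocating call. A minimal sketch (the loader is a stand-in for real image decoding, and the sizes assume a 64-bit build):

```python
def load_bytes(n):
    # Stand-in for decoding a large image into memory.
    return bytearray(n)

def open_image_safely(n):
    # Catch MemoryError so one oversized file doesn't take the whole
    # session (and its unsaved work) down with it.
    try:
        return load_bytes(n)
    except MemoryError:
        print("Not enough memory to process this image")
        return None

rejected = False
small = open_image_safely(1024)    # succeeds
huge = open_image_safely(1 << 50)  # ~1 PiB: fails cleanly instead of crashing
```

As the rest of the message notes, this only works when the failure actually surfaces as a MemoryError rather than as overcommit followed by the OOM killer.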

Now, sure, there are cases where an OOM condition will result in
thrashing to death, or simply because of overcommit malloc() will
never return NULL and you'll get nuked by the OOM killer, but
depending on your operating system and allocation pattern, there are
times when you can reasonably recover from a MemoryError.
Also, a memory allocation failure doesn't necessarily mean you're OOM;
it could be that you've exhausted your address space (on 32-bit), or
hit RLIMIT_VM/RLIMIT_DATA.

2013/1/21 Benjamin Peterson <benjamin at python.org>:
> What is this useful for?

Even if the exception isn't caught, if the extra information gets
dumped in the traceback, it can be used for post-mortem debugging (to
help distinguish between OOM, address space exhaustion, heap
fragmentation, overflow in computation of malloc() argument, etc).


So I think it could probably be useful, but I see two problems:
- right now, the amount of memory isn't tracked. IIRC, Antoine recently
added a counter for allocated blocks, not bytes
- the exception is raised at the calling site where the allocation
routine failed (this comes from Modules/_pickle.c):
"""
    PyMemoTable *memo = PyMem_MALLOC(sizeof(PyMemoTable));
    if (memo == NULL) {
        PyErr_NoMemory();
        return NULL;
    }
"""

So we can't easily capture the current allocated memory and the
requested memory (the former could probably be retrieved in
PyErr_NoMemory(), but the latter would require modifying every call
site and repeating it).


From geertj at gmail.com  Tue Jan 22 09:04:30 2013
From: geertj at gmail.com (Geert Jansen)
Date: Tue, 22 Jan 2013 09:04:30 +0100
Subject: [Python-ideas] PEP 3156 EventLoop: hide details of iterations
	and idleness?
In-Reply-To: <CAFkYKJ4BEgqQ5padUqoOg5-+wv62FmarPn0fON63Cdr=FhiTrA@mail.gmail.com>
References: <CAFkYKJ4BEgqQ5padUqoOg5-+wv62FmarPn0fON63Cdr=FhiTrA@mail.gmail.com>
Message-ID: <CADbA=FWC92A9_u8-sC4f8wSfWWEu0RF0acS6_8OxNUx6kAg2fg@mail.gmail.com>

On Mon, Jan 21, 2013 at 11:13 PM, Ben Darnell <ben at bendarnell.com> wrote:
> While working on proof-of-concept tornado/tulip integration
> (https://gist.github.com/4582282), I found a few methods that could not
> easily be implemented on top of the tornado IOLoop because they rely on
> details that Tornado does not expose.  While it wouldn't be hard to add
> support for these methods to Tornado, I would argue that they are
> unnecessary and expose implementation details, and so they are good
> candidates for removal from this already very broad interface.
>
> First, run_once and call_every_iteration both expose the event loop's
> underlying iterations to the application.  The trouble is that the duration
> of one iteration is so widely variable that it's not a very useful concept
> (and when implementing the EventLoop interface on top of some existing event
> loop these methods may not be available).  When is it better to use run_once
> instead of just using call_later to schedule a stop after a short timeout,
> or call_every_iteration instead of call_repeatedly?

- run_once() vs call_later(0) is probably the same thing and just a
matter of API design. If Tornado has call_later() it might be able to
emulate call_soon() as call_later(0), depending on how call_soon
works. In Guido's latest code, for example, call_soon() callbacks
that are added inside a callback will run in the *next* iteration.
This makes call_soon() and call_later(0) the same.

- call_every_iteration() vs call_repeatedly(): you really need both. I
did a small proof of concept to integrate libdbus with the tulip event
loop. I use call_every_iteration() to dispatch events every time after
IO has happened. The idea is that events will always originate from
IO, and therefore having a callback on every iteration is a convenient
way to check for events that need to be dispatched. Using
call_repeatedly() here is not right, because there may be times when
there are hundreds of events per second and times when there are none.
There is no sensible fixed polling frequency.

If Tornado doesn't have infrastructure for call_every_iteration() you
could emulate it with a function that reschedules itself using
call_soon() just before calling the callback. (See my first point
about when call_soon() callbacks are scheduled.)
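The self-rescheduling emulation suggested above can be sketched as follows. The class and attribute names are mine, not tulip's, and `asyncio` (tulip's descendant) stands in for the tulip event loop; note that, as Ben's reply points out, a permanently pending call_soon() keeps the poll timeout at zero, so this spins while active:

```python
import asyncio

class EveryIteration:
    """Sketch: re-arm with call_soon() before invoking the callback,
    so the callback fires once per event loop iteration."""

    def __init__(self, loop, callback):
        self._loop = loop
        self._callback = callback
        self._cancelled = False
        loop.call_soon(self._run)

    def _run(self):
        if self._cancelled:
            return
        # Re-arm first, so the callback still runs next iteration even
        # if it raises this time.
        self._loop.call_soon(self._run)
        self._callback()

    def cancel(self):
        self._cancelled = True

loop = asyncio.new_event_loop()
calls = []
every = EveryIteration(loop, lambda: calls.append(None))
loop.call_later(0.05, every.cancel)   # stop re-arming after 50 ms
loop.call_later(0.06, loop.stop)
loop.run_forever()
loop.close()
```

Running this briefly shows the callback firing many times in 50 ms, which is exactly the busy-loop behaviour Ben objects to below.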

If you want to see how event loop adapters for libev and libuv look
like, you can check out my project here:
https://github.com/geertj/looping

Regards,
Geert


From ncoghlan at gmail.com  Tue Jan 22 09:53:49 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 22 Jan 2013 18:53:49 +1000
Subject: [Python-ideas] More details in MemoryError
In-Reply-To: <CAH_1eM154R35VRepLBjQRKDBQAX-wmB4NpZH-vONEA4RT8Dg=w@mail.gmail.com>
References: <kdk4db$k5a$1@ger.gmane.org>
	<20130121193548.GA20342@iskra.aviel.ru>
	<CAH_1eM154R35VRepLBjQRKDBQAX-wmB4NpZH-vONEA4RT8Dg=w@mail.gmail.com>
Message-ID: <CADiSq7f8Qnkmo6vGR+0tXNwWukR1SiWUsf-3iGy6xAQsNHPd+g@mail.gmail.com>

There's a bigger reason memory error must be stateless: we preallocate and
reuse it.

--
Sent from my phone, thus the relative brevity :)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130122/4c79511a/attachment.html>

From storchaka at gmail.com  Tue Jan 22 10:00:48 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Tue, 22 Jan 2013 11:00:48 +0200
Subject: [Python-ideas] More details in MemoryError
In-Reply-To: <loom.20130121T231157-263@post.gmane.org>
References: <kdk4db$k5a$1@ger.gmane.org>
	<loom.20130121T231157-263@post.gmane.org>
Message-ID: <kdlkfu$2ij$1@ger.gmane.org>

On 22.01.13 00:12, Benjamin Peterson wrote:
> Serhiy Storchaka <storchaka at ...> writes:
>> I propose to add new optional attributes to MemoryError, which show how
>> much memory was required by the failed allocation and how much memory
>> was in use at that moment.
>
> What is this useful for?

Bigmem testing.



From solipsis at pitrou.net  Tue Jan 22 10:12:21 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 22 Jan 2013 10:12:21 +0100
Subject: [Python-ideas] More details in MemoryError
References: <kdk4db$k5a$1@ger.gmane.org>
	<20130121193548.GA20342@iskra.aviel.ru>
	<CAH_1eM154R35VRepLBjQRKDBQAX-wmB4NpZH-vONEA4RT8Dg=w@mail.gmail.com>
	<CADiSq7f8Qnkmo6vGR+0tXNwWukR1SiWUsf-3iGy6xAQsNHPd+g@mail.gmail.com>
Message-ID: <20130122101221.3e7b44a6@pitrou.net>

Le Tue, 22 Jan 2013 18:53:49 +1000,
Nick Coghlan <ncoghlan at gmail.com> a
écrit :
> There's a bigger reason memory error must be stateless: we
> preallocate and reuse it.

Not anymore, it's a freelist now:
http://hg.python.org/cpython/file/e8f40d4f497c/Objects/exceptions.c#l2123

The "stateless" part was bogus in Python 3, because of the embedded
traceback and context.

Regards

Antoine.




From solipsis at pitrou.net  Tue Jan 22 10:14:38 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 22 Jan 2013 10:14:38 +0100
Subject: [Python-ideas] More details in MemoryError
References: <kdk4db$k5a$1@ger.gmane.org>
Message-ID: <20130122101438.42a58bc0@pitrou.net>

Le Mon, 21 Jan 2013 21:20:08 +0200,
Serhiy Storchaka <storchaka at gmail.com> a
écrit :
> I propose to add new optional attributes to MemoryError, which show
> how much memory was required by the failed allocation and how much
> memory was in use at that moment.

+1 on the principle. I hope you can devise an implementation :-)

Regards

Antoine.




From ncoghlan at gmail.com  Tue Jan 22 11:42:38 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 22 Jan 2013 20:42:38 +1000
Subject: [Python-ideas] More details in MemoryError
In-Reply-To: <20130122101221.3e7b44a6@pitrou.net>
References: <kdk4db$k5a$1@ger.gmane.org>
	<20130121193548.GA20342@iskra.aviel.ru>
	<CAH_1eM154R35VRepLBjQRKDBQAX-wmB4NpZH-vONEA4RT8Dg=w@mail.gmail.com>
	<CADiSq7f8Qnkmo6vGR+0tXNwWukR1SiWUsf-3iGy6xAQsNHPd+g@mail.gmail.com>
	<20130122101221.3e7b44a6@pitrou.net>
Message-ID: <CADiSq7e9PQcaN0Oq266Ca9ZJ802t=GnMb1we6axTo2r2P-h83w@mail.gmail.com>

On Tue, Jan 22, 2013 at 7:12 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Le Tue, 22 Jan 2013 18:53:49 +1000,
> Nick Coghlan <ncoghlan at gmail.com> a
> écrit :
>> There's a bigger reason memory error must be stateless: we
>> preallocate and reuse it.
>
> Not anymore, it's a freelist now:
> http://hg.python.org/cpython/file/e8f40d4f497c/Objects/exceptions.c#l2123
>
> The "stateless" part was bogus in Python 3, because of the embedded
> traceback and context.

Oh cool, I forgot about that change.

In that case, +0 for at least reporting how much memory was being
requested for the call that failed, even if that only turns out to be
useful in our own test suite. -0 for the "currently allocated"
suggestion though, as I don't see how we can provide a meaningful
value for that (too much memory usage can be outside of the controller
of the Python memory allocator, and we don't even track our own usage
all that closely in non-debug builds).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From solipsis at pitrou.net  Tue Jan 22 11:50:28 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 22 Jan 2013 11:50:28 +0100
Subject: [Python-ideas] More details in MemoryError
References: <kdk4db$k5a$1@ger.gmane.org>
	<20130121193548.GA20342@iskra.aviel.ru>
	<CAH_1eM154R35VRepLBjQRKDBQAX-wmB4NpZH-vONEA4RT8Dg=w@mail.gmail.com>
	<CADiSq7f8Qnkmo6vGR+0tXNwWukR1SiWUsf-3iGy6xAQsNHPd+g@mail.gmail.com>
	<20130122101221.3e7b44a6@pitrou.net>
	<CADiSq7e9PQcaN0Oq266Ca9ZJ802t=GnMb1we6axTo2r2P-h83w@mail.gmail.com>
Message-ID: <20130122115028.537fb2fc@pitrou.net>

Le Tue, 22 Jan 2013 20:42:38 +1000,
Nick Coghlan <ncoghlan at gmail.com> a
écrit :
> 
> In that case, +0 for at least reporting how much memory was being
> requested for the call that failed, even if that only turns out to be
> useful in our own test suite. -0 for the "currently allocated"
> suggestion though, as I don't see how we can provide a meaningful
> value for that (too much memory usage can be outside of the controller
> of the Python memory allocator, and we don't even track our own usage
> all that closely in non-debug builds).

Windows makes it easy to retrieve the current process' memory statistics:
http://hg.python.org/benchmarks/file/43f8a0f5edd3/perf.py#l240

As usual, though, POSIX platforms are stupidly painful to work with:
http://hg.python.org/benchmarks/file/43f8a0f5edd3/perf.py#l202

Regards

Antoine.




From p.f.moore at gmail.com  Tue Jan 22 13:03:43 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 22 Jan 2013 12:03:43 +0000
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CACac1F_ALpTMMWqeJo4_QEiNMyRtVTOwdt-2fRk0AHqkhWfhdw@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CACac1F9KiWmVW=ZCriVXjcb-a+74UvvznO+8GyT79nZOE5uKHg@mail.gmail.com>
	<CADiSq7fxVexvi4enU=AA=gEtUydkgwFTtu=TX9yFrDe3tnkb8w@mail.gmail.com>
	<CAP7+vJJx9b5tK9NE6kJx8AQHJPDGEdR9J9tJb-DikqdjJ4hrAA@mail.gmail.com>
	<CACac1F-voxCSgR67jWOdCKNQQdq1XKZD6xXO28kiZ7_RgTxhxQ@mail.gmail.com>
	<CAP7+vJJ49DMH8v7odgBVU2QQkMytMD2dsbyU62DtMjOyYw0nxw@mail.gmail.com>
	<CACac1F_ALpTMMWqeJo4_QEiNMyRtVTOwdt-2fRk0AHqkhWfhdw@mail.gmail.com>
Message-ID: <CACac1F82S3O5VgAb03gcxULYz9xppnC0S-qqNTFfOH5AquTW4w@mail.gmail.com>

On 19 January 2013 12:12, Paul Moore <p.f.moore at gmail.com> wrote:
>> I would love for you to create that version. I only checked it in so I
>> could point to it -- I am not happy with either the implementation,
>> the API spec, or the unit test...
>
> May be a few days before I can get to it.

OK, I finally have a working VM.

The subprocess test code assumes that it can call
transport.write_eof() in the protocol connection_made() function. I'm
not sure if that is fundamental, or just an artifact of the current
implementation. Certainly if you have a stdin pipe open, you likely
want to close it to avoid deadlocks, but with the subprocess.Popen
approach, it's entirely possible to not open a pipe to stdin. In that
case, writing to stdin is neither possible nor necessary.

Clearly, writing data to stdin if you didn't open a pipe should be
flagged as an error. And my immediate thought is that write_eof should
also be an error. But I can imagine people wanting to write reusable
protocols that pre-emptively write EOF to the stdin pipe to avoid
deadlocks.

So, a question: If the user passed a popen object without a stdin
pipe, should write_eof be an error or should it just silently do
nothing?

Paul


From steve at pearwood.info  Tue Jan 22 13:42:05 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 22 Jan 2013 23:42:05 +1100
Subject: [Python-ideas] More details in MemoryError
In-Reply-To: <loom.20130121T231157-263@post.gmane.org>
References: <kdk4db$k5a$1@ger.gmane.org>
	<loom.20130121T231157-263@post.gmane.org>
Message-ID: <50FE891D.2080603@pearwood.info>

On 22/01/13 09:12, Benjamin Peterson wrote:
> Serhiy Storchaka<storchaka at ...>  writes:
>
>>
>> I propose to add new optional attributes to MemoryError, which show how
>> many memory was required in failed allocation and how many memory was
>> used at this moment.
>
> What is this useful for?


After locking up a production machine with a foolishly large list
multiplication (I left it thrashing overnight, and 16+ hours later gave
up and power-cycled the machine), I have come to appreciate ulimit on
Linux systems. That means I often see MemoryErrors while testing.


[steve at ando ~]$ ulimit -v 20000
[steve at ando ~]$ python3.3
Python 3.3.0rc3 (default, Sep 27 2012, 18:44:58)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-52)] on linux
Type "help", "copyright", "credits" or "license" for more information.
=== startup script executed ===
py> x = [0]*1000000
py> x = [0]*123456789012  # oops what was I thinking?
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
MemoryError


For interactive use, it would be really useful in such a situation to
see how much memory was requested and how much was available. That
would allow me to roughly estimate (say) how big a list I could make
in the available memory, instead of tediously trying larger and smaller
lists.

Something like this could be used to decide whether or not to flush
unimportant in-memory caches, compact data structures, etc., or just
give up and exit.
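The trial-and-error estimation described above can only be automated as a bisection over candidate sizes, repeatedly catching MemoryError, which is exactly the tedium that size information on the exception would remove. A rough sketch (the function name and upper bound are illustrative):

```python
def largest_list(limit):
    # Bisect for the largest [0] * n allocation that succeeds,
    # n in the range 0..limit.
    lo, hi = 0, limit
    while lo < hi:
        mid = (lo + hi + 1) // 2
        try:
            trial = [0] * mid
        except MemoryError:
            hi = mid - 1       # too big: search lower
        else:
            del trial          # free it before the next probe
            lo = mid           # fits: search higher
    return lo
```

With a requested-size attribute on the exception, a single failed allocation would give roughly the same answer in one step.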



-- 
Steven


From rosuav at gmail.com  Tue Jan 22 14:04:15 2013
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 23 Jan 2013 00:04:15 +1100
Subject: [Python-ideas] More details in MemoryError
In-Reply-To: <50FE891D.2080603@pearwood.info>
References: <kdk4db$k5a$1@ger.gmane.org>
	<loom.20130121T231157-263@post.gmane.org>
	<50FE891D.2080603@pearwood.info>
Message-ID: <CAPTjJmrkSFMQ6Am+WcRd7YFjCaWBexgufMc+e4O5XthujPJU-Q@mail.gmail.com>

On Tue, Jan 22, 2013 at 11:42 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> Something like this could be used to decide whether or not to flush
> unimportant in-memory caches, compact data structures, etc., or just
> give up and exit.

That's a nice idea, but unless the requested allocation was fairly
large, there's a good chance you don't have room to allocate anything
more. That may make it a bit tricky to do a compaction operation. But
if there's some sort of "automatically freeable memory" (simple
example: exception-triggered stack unwinding results in a whole bunch
of locals disappearing), and you can stay within that, then you might
be able to recover. Would require some tightrope-walking in the
exception handler, but ought to be possible.

ChrisA


From ncoghlan at gmail.com  Tue Jan 22 14:34:48 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 22 Jan 2013 23:34:48 +1000
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CACac1F82S3O5VgAb03gcxULYz9xppnC0S-qqNTFfOH5AquTW4w@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CACac1F9KiWmVW=ZCriVXjcb-a+74UvvznO+8GyT79nZOE5uKHg@mail.gmail.com>
	<CADiSq7fxVexvi4enU=AA=gEtUydkgwFTtu=TX9yFrDe3tnkb8w@mail.gmail.com>
	<CAP7+vJJx9b5tK9NE6kJx8AQHJPDGEdR9J9tJb-DikqdjJ4hrAA@mail.gmail.com>
	<CACac1F-voxCSgR67jWOdCKNQQdq1XKZD6xXO28kiZ7_RgTxhxQ@mail.gmail.com>
	<CAP7+vJJ49DMH8v7odgBVU2QQkMytMD2dsbyU62DtMjOyYw0nxw@mail.gmail.com>
	<CACac1F_ALpTMMWqeJo4_QEiNMyRtVTOwdt-2fRk0AHqkhWfhdw@mail.gmail.com>
	<CACac1F82S3O5VgAb03gcxULYz9xppnC0S-qqNTFfOH5AquTW4w@mail.gmail.com>
Message-ID: <CADiSq7c4EKrrYGYs0w_Za22FLjTTcNFBdTg3jPWRRrr=2iQ64g@mail.gmail.com>

On Tue, Jan 22, 2013 at 10:03 PM, Paul Moore <p.f.moore at gmail.com> wrote:
> So, a question: If the user passed a popen object without a stdin
> pipe, should write_eof be an error or should it just silently do
> nothing?

It should be an error. The analogy is similar to calling flush() vs
close(). Calling flush() on an already closed file is an error, while
you can call close() as many times as you like.

If you want to ensure a pipe is closed gracefully, call close(), not
write_eof(). (abort() is the method for abrupt closure).

Also, I agree with the comment someone else made that attempting to
pair stdin with either stderr or stdout is a bad idea - better to
treat them as three independent transports (as the subprocess module
does), so that the close() semantics and error handling are clear.

sockets are different, as those actually *are* bidirectional data
streams, whereas pipes are unidirectional.

I don't know whether it's worth defining separate SimplexTransmit
(e.g. stdin pipe in parent process, stdout, stderr pipes in child
process), SimplexReceive (stdout, stderr pipes in parent process,
stdin pipe in child process), HalfDuplex (e.g. some radio transmitters)
and FullDuplex (e.g. sockets) transport abstractions - I guess if
Twisted hasn't needed them, it probably isn't worth bothering. It's
also fairly obvious how to implement the first three based on the full
duplex API currently described in the PEP just by raising the
appropriate exceptions.
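The "raise on the unsupported half" idea can be sketched in a few lines. The class name, the wrapped pipe object, and the choice of RuntimeError are all illustrative assumptions, not anything specified by the PEP:

```python
import io

class SimplexReceiveTransport:
    """Sketch of a receive-only transport built on the full-duplex
    interface: the transmit-side methods simply raise."""

    def __init__(self, pipe):
        self._pipe = pipe

    def close(self):
        # Receive side: supported.
        self._pipe.close()

    def write(self, data):
        # Transmit side: meaningless on a read-only pipe.
        raise RuntimeError("write() called on a receive-only transport")

    def write_eof(self):
        raise RuntimeError("write_eof() called on a receive-only transport")

t = SimplexReceiveTransport(io.BytesIO(b"data"))
rejected = False
try:
    t.write(b"x")
except RuntimeError:
    rejected = True
t.close()
```

This is one answer to Paul's earlier question: with this shape, write_eof() on a transport without a stdin pipe is a loud error rather than a silent no-op.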

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From solipsis at pitrou.net  Tue Jan 22 14:49:36 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 22 Jan 2013 14:49:36 +0100
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CACac1F9KiWmVW=ZCriVXjcb-a+74UvvznO+8GyT79nZOE5uKHg@mail.gmail.com>
	<CADiSq7fxVexvi4enU=AA=gEtUydkgwFTtu=TX9yFrDe3tnkb8w@mail.gmail.com>
	<CAP7+vJJx9b5tK9NE6kJx8AQHJPDGEdR9J9tJb-DikqdjJ4hrAA@mail.gmail.com>
	<CACac1F-voxCSgR67jWOdCKNQQdq1XKZD6xXO28kiZ7_RgTxhxQ@mail.gmail.com>
	<CAP7+vJJ49DMH8v7odgBVU2QQkMytMD2dsbyU62DtMjOyYw0nxw@mail.gmail.com>
	<CACac1F_ALpTMMWqeJo4_QEiNMyRtVTOwdt-2fRk0AHqkhWfhdw@mail.gmail.com>
	<CACac1F82S3O5VgAb03gcxULYz9xppnC0S-qqNTFfOH5AquTW4w@mail.gmail.com>
	<CADiSq7c4EKrrYGYs0w_Za22FLjTTcNFBdTg3jPWRRrr=2iQ64g@mail.gmail.com>
Message-ID: <20130122144936.3ead5006@pitrou.net>

Le Tue, 22 Jan 2013 23:34:48 +1000,
Nick Coghlan <ncoghlan at gmail.com> a
écrit :
> 
> Also, I agree with the comment someone else made that attempting to
> pair stdin with either stderr or stdout is a bad idea - better to
> treat them as three independent transports (as the subprocess module
> does), so that the close() semantics and error handling are clear.
> 
> sockets are different, as those actually *are* bidirectional data
> streams, whereas pipes are unidirectional.

+1

> I don't know whether it's worth defining separate SimplexTransmit
> (e.g. stdin pipe in parent process, stdout, stderr pipes in child
> process), SimplexReceive (stdout, stderr pipes in parent process,
> stdin pipe in child process), HalfDuplex (e.g. some radio transmitters)
> and FullDuplex (e.g. sockets) transport abstractions - I guess if
> Twisted haven't needed them, it probably isn't worth bothering.

It's an implementation detail, since the user should only see transport
instances, not transport classes.
(until the user tries to write their own transport, that is)

Regards

Antoine.




From ben at bendarnell.com  Tue Jan 22 16:31:22 2013
From: ben at bendarnell.com (Ben Darnell)
Date: Tue, 22 Jan 2013 10:31:22 -0500
Subject: [Python-ideas] PEP 3156 EventLoop: hide details of iterations
	and idleness?
In-Reply-To: <CADbA=FWC92A9_u8-sC4f8wSfWWEu0RF0acS6_8OxNUx6kAg2fg@mail.gmail.com>
References: <CAFkYKJ4BEgqQ5padUqoOg5-+wv62FmarPn0fON63Cdr=FhiTrA@mail.gmail.com>
	<CADbA=FWC92A9_u8-sC4f8wSfWWEu0RF0acS6_8OxNUx6kAg2fg@mail.gmail.com>
Message-ID: <CAFkYKJ5Be27ZR4ChvEcXsOZ87rAcVZMiDkpgp7v1xAdJH4kTMA@mail.gmail.com>

On Tue, Jan 22, 2013 at 3:04 AM, Geert Jansen <geertj at gmail.com> wrote:

> - call_every_iteration() vs call_repeatedly(): you really need both. I
>  did a small proof of concept to integrate libdbus with the tulip event
> loop. I use call_every_iteration() to dispatch events every time after
> IO has happened. The idea is that events will always originate from
> IO, and therefore having a callback on every iteration is a convenient
> way to check for events that need to be dispatched. Using
> call_repeatedly() here is not right, because there may be times that
> there are 100s of events per second, and times there are none. There
> is no sensible fixed polling frequency.
>

I don't understand what you mean by "events will always originate from IO"
(I don't know anything about libdbus).  If the events are coming from IO
that causes an event loop iteration, it must be from some tulip callback.
 Why can't that callback be responsible for scheduling any further
dispatching that may be needed?


>
> If Tornado doesn't have infrastructure for call_every_iteration() you
> could emulate it with a function that reschedules itself using
> call_soon() just before calling the callback. (See my first point
> about when call_soon() callbacks are scheduled.)
>

No, because call_soon (and call_later(0)) cause the event loop to use a
timeout of zero on its next poll call, so a function that reschedules
itself with call_soon will be a busy loop.  There is no good way to emulate
call_every_iteration from the other methods; you'll either busy loop with
call_soon or use a fixed timeout.  If you need it it's an easy thing to
offer, but since neither tornado nor twisted have such a method I'm
questioning the need.

run_once() will run for an unpredictable amount of time (until the next IO
or timeout); run_forever() with call_soon(stop) will handle events that are
ready at that moment and then stop.
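The "handle what's ready, then stop" idiom can be shown with today's asyncio names (a stand-in for tulip; the behaviour sketched here is an assumption about FIFO call_soon ordering that both libraries share):

```python
import asyncio

loop = asyncio.new_event_loop()
order = []
# Callbacks queued with call_soon run in FIFO order, so everything
# already ready runs before stop() takes effect.
loop.call_soon(order.append, "ready-callback")
loop.call_soon(loop.stop)
loop.run_forever()
loop.close()
```

The loop processes the pending callback, then exits instead of blocking in poll, which is the predictable alternative to run_once().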

-Ben


>
> If you want to see how event loop adapters for libev and libuv look
> like, you can check out my project here:
> https://github.com/geertj/looping
>
> Regards,
> Geert
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130122/54b0d66c/attachment.html>

From p.f.moore at gmail.com  Tue Jan 22 16:43:51 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 22 Jan 2013 15:43:51 +0000
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CADiSq7c4EKrrYGYs0w_Za22FLjTTcNFBdTg3jPWRRrr=2iQ64g@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CACac1F9KiWmVW=ZCriVXjcb-a+74UvvznO+8GyT79nZOE5uKHg@mail.gmail.com>
	<CADiSq7fxVexvi4enU=AA=gEtUydkgwFTtu=TX9yFrDe3tnkb8w@mail.gmail.com>
	<CAP7+vJJx9b5tK9NE6kJx8AQHJPDGEdR9J9tJb-DikqdjJ4hrAA@mail.gmail.com>
	<CACac1F-voxCSgR67jWOdCKNQQdq1XKZD6xXO28kiZ7_RgTxhxQ@mail.gmail.com>
	<CAP7+vJJ49DMH8v7odgBVU2QQkMytMD2dsbyU62DtMjOyYw0nxw@mail.gmail.com>
	<CACac1F_ALpTMMWqeJo4_QEiNMyRtVTOwdt-2fRk0AHqkhWfhdw@mail.gmail.com>
	<CACac1F82S3O5VgAb03gcxULYz9xppnC0S-qqNTFfOH5AquTW4w@mail.gmail.com>
	<CADiSq7c4EKrrYGYs0w_Za22FLjTTcNFBdTg3jPWRRrr=2iQ64g@mail.gmail.com>
Message-ID: <CACac1F-=e1phQULJz6hMbfXAVrUik1mqiEj_wC2pi45cdbMe-w@mail.gmail.com>

On 22 January 2013 13:34, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Also, I agree with the comment someone else made that attempting to
> pair stdin with either stderr or stdout is a bad idea - better to
> treat them as three independent transports (as the subprocess module
> does), so that the close() semantics and error handling are clear.

That was my original feeling - although I made my case badly by
arguing in terms of portability rather than clearer design. But Guido
argued for a higher-level portable subprocess transport that was
implemented "under the hood" using the existing nonportable
add_reader/add_writer methods on Unix, and an as-yet-unimplemented
IOCP-based alternative on Windows.

I still feel that a more general approach would be to have two methods
on the event loop connect_input_pipe(protocol_factory, readable_pipe)
and connect_output_pipe(protocol_factory, writeable_pipe) which use
the standard transport/protocol methods as defined in the PEP. Then
the subprocess transport can be layered on top of that as one possible
example of a "higher layer" convenience transport.

I know that twisted has a create_process event loop (reactor) method,
but I suspect part of the reason for that is that it predates the
subprocess module's unified interface.

I'll try implementing the pipe transport approach and see how it looks
in contrast.

Paul.


From p.f.moore at gmail.com  Tue Jan 22 17:10:18 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 22 Jan 2013 16:10:18 +0000
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CACac1F-=e1phQULJz6hMbfXAVrUik1mqiEj_wC2pi45cdbMe-w@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CACac1F9KiWmVW=ZCriVXjcb-a+74UvvznO+8GyT79nZOE5uKHg@mail.gmail.com>
	<CADiSq7fxVexvi4enU=AA=gEtUydkgwFTtu=TX9yFrDe3tnkb8w@mail.gmail.com>
	<CAP7+vJJx9b5tK9NE6kJx8AQHJPDGEdR9J9tJb-DikqdjJ4hrAA@mail.gmail.com>
	<CACac1F-voxCSgR67jWOdCKNQQdq1XKZD6xXO28kiZ7_RgTxhxQ@mail.gmail.com>
	<CAP7+vJJ49DMH8v7odgBVU2QQkMytMD2dsbyU62DtMjOyYw0nxw@mail.gmail.com>
	<CACac1F_ALpTMMWqeJo4_QEiNMyRtVTOwdt-2fRk0AHqkhWfhdw@mail.gmail.com>
	<CACac1F82S3O5VgAb03gcxULYz9xppnC0S-qqNTFfOH5AquTW4w@mail.gmail.com>
	<CADiSq7c4EKrrYGYs0w_Za22FLjTTcNFBdTg3jPWRRrr=2iQ64g@mail.gmail.com>
	<CACac1F-=e1phQULJz6hMbfXAVrUik1mqiEj_wC2pi45cdbMe-w@mail.gmail.com>
Message-ID: <CACac1F_P4Ge1D9aGxiHY+=BuVSoPe13=v09UzZMWGhieZyAUfg@mail.gmail.com>

On 22 January 2013 15:43, Paul Moore <p.f.moore at gmail.com> wrote:
> I'll try implementing the pipe transport approach and see how it looks
> in contrast.

Here's a quick proof of concept (for a read pipe):

class UnixEventLoop(events.EventLoop):
    ...
    @tasks.task
    def connect_read_pipe(self, protocol_factory, rpipe):
        protocol = protocol_factory()
        waiter = futures.Future()
        transport = _UnixReadPipeTransport(self, rpipe, protocol, waiter)
        yield from waiter
        return transport, protocol

class _UnixReadPipeTransport(transports.Transport):

    def __init__(self, event_loop, rpipe, protocol, waiter=None):
        self._event_loop = event_loop
        self._pipe = rpipe.fileno()
        self._protocol = protocol
        self._buffer = []
        self._event_loop.add_reader(self._pipe, self._read_ready)
        self._event_loop.call_soon(self._protocol.connection_made, self)
        if waiter is not None:
            self._event_loop.call_soon(waiter.set_result, None)

    def _read_ready(self):
        try:
            data = os.read(self._pipe, 16*1024)
        except BlockingIOError:
            return
        if data:
            self._event_loop.call_soon(self._protocol.data_received, data)
        else:
            self._event_loop.remove_reader(self._pipe)
            self._event_loop.call_soon(self._protocol.eof_received)


Using this to re-implement the subprocess test looks something like
this (the protocol is unchanged from the existing test):

def testUnixSubprocessWithPipe(self):
    proc = subprocess.Popen(['/bin/ls', '-lR'], stdout=subprocess.PIPE)
    t, p = yield from self.event_loop.connect_read_pipe(MyProto, proc.stdout)
    self.event_loop.run()

To be honest, this looks sufficiently straightforward that I don't see
the benefit in a less-general high-level transport type...

Paul


From geertj at gmail.com  Tue Jan 22 17:16:15 2013
From: geertj at gmail.com (Geert Jansen)
Date: Tue, 22 Jan 2013 17:16:15 +0100
Subject: [Python-ideas] PEP 3156 EventLoop: hide details of iterations
	and idleness?
In-Reply-To: <CAFkYKJ5Be27ZR4ChvEcXsOZ87rAcVZMiDkpgp7v1xAdJH4kTMA@mail.gmail.com>
References: <CAFkYKJ4BEgqQ5padUqoOg5-+wv62FmarPn0fON63Cdr=FhiTrA@mail.gmail.com>
	<CADbA=FWC92A9_u8-sC4f8wSfWWEu0RF0acS6_8OxNUx6kAg2fg@mail.gmail.com>
	<CAFkYKJ5Be27ZR4ChvEcXsOZ87rAcVZMiDkpgp7v1xAdJH4kTMA@mail.gmail.com>
Message-ID: <CADbA=FWPxaAB_CKgGT_OmsWM85X9-A5NsOJ5d+ipBSDO9O3x0Q@mail.gmail.com>

On Tue, Jan 22, 2013 at 4:31 PM, Ben Darnell <ben at bendarnell.com> wrote:

> I don't understand what you mean by "events will always originate from IO"
> (I don't know anything about libdbus).

What I meant is that if there is something to dispatch, then this is
due to an inbound IO event (or a timeout, for that matter). Due to
either event, the loop will advance by one tick and hit my
call_every_iteration() handler, where I dispatch.

> If the events are coming from IO
> that causes an event loop iteration, it must be from some tulip callback.
> Why can't that callback be responsible for scheduling any further
> dispatching that may be needed?

Well your original question was why not call_repeatedly() instead of
call_every_iteration(). I tried to answer that for my use case.

Indeed, call_soon() could be used to schedule a dispatch every time
an IO event is received. However, I preferred to have a fixed callback
that I do not need to allocate and register every time, for
efficiency.

>> If Tornado doesn't have infrastructure for call_every_iteration() you
>> could emulate it with a function that re-reschedules itself using
>> call_soon() just before calling the callback. (See my first point
>> about when call_soon() callbacks are scheduled.)
>
>
> No, because call_soon (and call_later(0)) cause the event loop to use a
> timeout of zero on its next poll call, so a function that reschedules itself
> with call_soon will be a busy loop.  There is no good way to emulate
> call_every_iteration from the other methods; you'll either busy loop with
> call_soon or use a fixed timeout.  If you need it it's an easy thing to
> offer, but since neither tornado nor twisted have such a method I'm
> questioning the need.

Yes, you're right. I was confusing things with libuv and libev. I may
have actually implemented call_soon() the wrong way there :)

Maybe I am abusing call_every_iteration() when I use it for
dispatching. The libuv and libev documentation says that their
call_every_iteration() equivalents (Prepare and Check) are for
integrating with external event loops, so maybe that is the intended
use case. However, I've not looked into this in any detail.

If Tornado and Twisted cannot implement call_every_iteration(), then I
think that is a good reason to remove it.

Regards,
Geert


From guido at python.org  Tue Jan 22 17:33:28 2013
From: guido at python.org (Guido van Rossum)
Date: Tue, 22 Jan 2013 08:33:28 -0800
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CACac1F_P4Ge1D9aGxiHY+=BuVSoPe13=v09UzZMWGhieZyAUfg@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CACac1F9KiWmVW=ZCriVXjcb-a+74UvvznO+8GyT79nZOE5uKHg@mail.gmail.com>
	<CADiSq7fxVexvi4enU=AA=gEtUydkgwFTtu=TX9yFrDe3tnkb8w@mail.gmail.com>
	<CAP7+vJJx9b5tK9NE6kJx8AQHJPDGEdR9J9tJb-DikqdjJ4hrAA@mail.gmail.com>
	<CACac1F-voxCSgR67jWOdCKNQQdq1XKZD6xXO28kiZ7_RgTxhxQ@mail.gmail.com>
	<CAP7+vJJ49DMH8v7odgBVU2QQkMytMD2dsbyU62DtMjOyYw0nxw@mail.gmail.com>
	<CACac1F_ALpTMMWqeJo4_QEiNMyRtVTOwdt-2fRk0AHqkhWfhdw@mail.gmail.com>
	<CACac1F82S3O5VgAb03gcxULYz9xppnC0S-qqNTFfOH5AquTW4w@mail.gmail.com>
	<CADiSq7c4EKrrYGYs0w_Za22FLjTTcNFBdTg3jPWRRrr=2iQ64g@mail.gmail.com>
	<CACac1F-=e1phQULJz6hMbfXAVrUik1mqiEj_wC2pi45cdbMe-w@mail.gmail.com>
	<CACac1F_P4Ge1D9aGxiHY+=BuVSoPe13=v09UzZMWGhieZyAUfg@mail.gmail.com>
Message-ID: <CAP7+vJLKUXy6-rNjJzjE5vQZ3_LkeZemGN3V3H0S-KaJwvaJKg@mail.gmail.com>

I am not actually very committed to a particular design for a subprocess
transport. I'll happily leave it up to others to come up with a design and
make it work on multiple platforms.

--Guido van Rossum (sent from Android phone)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130122/e7b8953d/attachment.html>

From solipsis at pitrou.net  Tue Jan 22 19:27:10 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 22 Jan 2013 19:27:10 +0100
Subject: [Python-ideas] More details in MemoryError
References: <kdk4db$k5a$1@ger.gmane.org>
	<loom.20130121T231157-263@post.gmane.org>
	<50FE891D.2080603@pearwood.info>
	<CAPTjJmrkSFMQ6Am+WcRd7YFjCaWBexgufMc+e4O5XthujPJU-Q@mail.gmail.com>
Message-ID: <20130122192710.6f94d16f@pitrou.net>

On Wed, 23 Jan 2013 00:04:15 +1100
Chris Angelico <rosuav at gmail.com> wrote:
> On Tue, Jan 22, 2013 at 11:42 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> > Something like this could be used to decide whether or not to flush
> > unimportant in-memory caches, compact data structures, etc., or just
> > give up and exit.
> 
> That's a nice idea, but unless the requested allocation was fairly
> large, there's a good chance you don't have room to allocate anything
> more.

I wouldn't be surprised if most cases of MemoryErrors were on fairly
large allocation requests ;-)

Regards

Antoine.




From python at mrabarnett.plus.com  Tue Jan 22 19:36:07 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Tue, 22 Jan 2013 18:36:07 +0000
Subject: [Python-ideas] More details in MemoryError
In-Reply-To: <CAPTjJmrkSFMQ6Am+WcRd7YFjCaWBexgufMc+e4O5XthujPJU-Q@mail.gmail.com>
References: <kdk4db$k5a$1@ger.gmane.org>
	<loom.20130121T231157-263@post.gmane.org>
	<50FE891D.2080603@pearwood.info>
	<CAPTjJmrkSFMQ6Am+WcRd7YFjCaWBexgufMc+e4O5XthujPJU-Q@mail.gmail.com>
Message-ID: <50FEDC17.7010608@mrabarnett.plus.com>

On 2013-01-22 13:04, Chris Angelico wrote:
> On Tue, Jan 22, 2013 at 11:42 PM, Steven D'Aprano <steve at pearwood.info> wrote:
>> Something like this could be used to decide whether or not to flush
>> unimportant in-memory caches, compact data structures, etc., or just
>> give up and exit.
>
> That's a nice idea, but unless the requested allocation was fairly
> large, there's a good chance you don't have room to allocate anything
> more. That may make it a bit tricky to do a compaction operation. But
> if there's some sort of "automatically freeable memory" (simple
> example: exception-triggered stack unwinding results in a whole bunch
> of locals disappearing), and you can stay within that, then you might
> be able to recover. Would require some tightrope-walking in the
> exception handler, but ought to be possible.
>
FYI, allocating memory specially for such cases is sometimes called a
"memory parachute".

I wonder whether you could have a subclass of MemoryError called
LowMemoryError.

If allocation fails and there's a parachute, it would free the
parachute and raise LowMemoryError. That would give you a chance to
tidy up before quitting or even, perhaps, free enough stuff to make a
new parachute and continue working.

If allocation fails and there's no parachute, it would raise
MemoryError as at present.

With LowMemoryError as a subclass of MemoryError, existing code would
still work the same.
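
A rough sketch of how that could look at the Python level. To be clear,
LowMemoryError and Parachute are hypothetical names for illustration, not an
existing API, and a real parachute would have to live below the allocator:

```python
class LowMemoryError(MemoryError):
    """Allocation failed, but the parachute was released to buy headroom."""

class Parachute:
    def __init__(self, size):
        self._reserve = bytearray(size)   # pre-allocated emergency block

    def call(self, func, *args):
        try:
            return func(*args)
        except MemoryError:
            if self._reserve is None:
                raise                     # no parachute left: plain MemoryError
            self._reserve = None          # free the reserve so the handler has room
            raise LowMemoryError("parachute released") from None

def exhaust():
    raise MemoryError                     # stand-in for a failed allocation

chute = Parachute(16 * 1024)
try:
    chute.call(exhaust)
except MemoryError as exc:                # existing handlers still catch it
    caught = type(exc).__name__
```

Since LowMemoryError subclasses MemoryError, the existing `except MemoryError:`
handler above catches it unchanged, which is exactly the compatibility property
being proposed.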


From guido at python.org  Tue Jan 22 20:19:04 2013
From: guido at python.org (Guido van Rossum)
Date: Tue, 22 Jan 2013 11:19:04 -0800
Subject: [Python-ideas] PEP 3156 EventLoop: hide details of iterations
	and idleness?
In-Reply-To: <CADbA=FWPxaAB_CKgGT_OmsWM85X9-A5NsOJ5d+ipBSDO9O3x0Q@mail.gmail.com>
References: <CAFkYKJ4BEgqQ5padUqoOg5-+wv62FmarPn0fON63Cdr=FhiTrA@mail.gmail.com>
	<CADbA=FWC92A9_u8-sC4f8wSfWWEu0RF0acS6_8OxNUx6kAg2fg@mail.gmail.com>
	<CAFkYKJ5Be27ZR4ChvEcXsOZ87rAcVZMiDkpgp7v1xAdJH4kTMA@mail.gmail.com>
	<CADbA=FWPxaAB_CKgGT_OmsWM85X9-A5NsOJ5d+ipBSDO9O3x0Q@mail.gmail.com>
Message-ID: <CAP7+vJKvQNVNk1n7Q8B6Oaio16Aai-hqbaqGn57AE-u7J0AFSA@mail.gmail.com>

On Tue, Jan 22, 2013 at 8:16 AM, Geert Jansen <geertj at gmail.com> wrote:
> On Tue, Jan 22, 2013 at 4:31 PM, Ben Darnell <ben at bendarnell.com> wrote:
>
>> I don't understand what you mean by "events will always originate from IO"
>> (I don't know anything about libdbus).
>
> What I meant is that if there is something to dispatch, then this is
> due to an inbound IO event (or a timeout, for that matter). Due to
> either event, the loop will advance by one tick and hit my
> call_every_iteration() handler, where I dispatch.
>
>> If the events are coming from IO
>> that causes an event loop iteration, it must be from some tulip callback.
>> Why can't that callback be responsible for scheduling any further
>> dispatching that may be needed?
>
> Well your original question was why not call_repeatedly() instead of
> call_every_iteration(). I tried to answer that for my use case.
>
> Indeed, call_soon() could be used to schedule a dispatch every time
> an IO event is received. However, I preferred to have a fixed callback
> that I do not need to allocate and register every time, for
> efficiency.
>
>>> If Tornado doesn't have infrastructure for call_every_iteration() you
>>> could emulate it with a function that re-reschedules itself using
>>> call_soon() just before calling the callback. (See my first point
>>> about when call_soon() callbacks are scheduled.)
>>
>>
>> No, because call_soon (and call_later(0)) cause the event loop to use a
>> timeout of zero on its next poll call, so a function that reschedules itself
>> with call_soon will be a busy loop.  There is no good way to emulate
>> call_every_iteration from the other methods; you'll either busy loop with
>> call_soon or use a fixed timeout.  If you need it it's an easy thing to
>> offer, but since neither tornado nor twisted have such a method I'm
>> questioning the need.
>
> Yes, you're right. I was confusing things with libuv and libev. I may
> have actually implemented call_soon() the wrong way there :)
>
> Maybe I am abusing call_every_iteration() when I use it for
> dispatching. If you look at the libuv and libev documentation, then
> they say that their call_every_iteration() equivalents (Prepare and
> Check) are for integrating with external event loops. So maybe that is
> the use case. However, I've not looked into this in any detail.
>
> If Tornado and Twisted cannot implement call_every_iteration(), then I
> think that is a good reason to remove it.

Ok, I'll kill call_every_iteration(). I'll wait for more discussion on
run_once() and run()'s until-idle behavior.

-- 
--Guido van Rossum (python.org/~guido)


From p.f.moore at gmail.com  Tue Jan 22 21:36:47 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 22 Jan 2013 20:36:47 +0000
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <CAP7+vJLKUXy6-rNjJzjE5vQZ3_LkeZemGN3V3H0S-KaJwvaJKg@mail.gmail.com>
References: <CACac1F-MJTi6DUAfc4S-a_G9dgqBEf89U_cTmpwTHFAmcsHzvg@mail.gmail.com>
	<CAP7+vJLPzngdwatNu7MPzFYyc56TwO3x+0CQ1EQnqmQ0q0V2Tw@mail.gmail.com>
	<CACac1F8C67+Aqo9=EFyf53Wft0rhXDFNmw-NXLgnmJFkd6Jx-Q@mail.gmail.com>
	<CAP7+vJJQBE1-EFsYVmE+Lt+hGFA+T3ST624wQU6fbRGLb2x2bg@mail.gmail.com>
	<CACac1F-TBuqSjGeyKM9Ae3G-+7JjB2WWxfg+9EpSdDqkwKZ=-w@mail.gmail.com>
	<50F87DC8.1060000@canterbury.ac.nz>
	<CAP7+vJKdV0cLZ=G_yF6nFSuEH+ET-2tZw+aursJnCGQUY10o+g@mail.gmail.com>
	<50F8D695.3050002@canterbury.ac.nz>
	<CAP7+vJ+m64Bd70iggrV+nafnkvk7Qzhu9E6xjRAR7LwsJvXa4w@mail.gmail.com>
	<CACac1F-bUNGCo8u2eXhLcBpzZradLtX7YMe1m6OHeyPSVANNaw@mail.gmail.com>
	<CADiSq7ftggQW+7d__1yzE+45i9Xi7WTDAvEkEJrDJ8OLDk8czw@mail.gmail.com>
	<CACac1F9KiWmVW=ZCriVXjcb-a+74UvvznO+8GyT79nZOE5uKHg@mail.gmail.com>
	<CADiSq7fxVexvi4enU=AA=gEtUydkgwFTtu=TX9yFrDe3tnkb8w@mail.gmail.com>
	<CAP7+vJJx9b5tK9NE6kJx8AQHJPDGEdR9J9tJb-DikqdjJ4hrAA@mail.gmail.com>
	<CACac1F-voxCSgR67jWOdCKNQQdq1XKZD6xXO28kiZ7_RgTxhxQ@mail.gmail.com>
	<CAP7+vJJ49DMH8v7odgBVU2QQkMytMD2dsbyU62DtMjOyYw0nxw@mail.gmail.com>
	<CACac1F_ALpTMMWqeJo4_QEiNMyRtVTOwdt-2fRk0AHqkhWfhdw@mail.gmail.com>
	<CACac1F82S3O5VgAb03gcxULYz9xppnC0S-qqNTFfOH5AquTW4w@mail.gmail.com>
	<CADiSq7c4EKrrYGYs0w_Za22FLjTTcNFBdTg3jPWRRrr=2iQ64g@mail.gmail.com>
	<CACac1F-=e1phQULJz6hMbfXAVrUik1mqiEj_wC2pi45cdbMe-w@mail.gmail.com>
	<CACac1F_P4Ge1D9aGxiHY+=BuVSoPe13=v09UzZMWGhieZyAUfg@mail.gmail.com>
	<CAP7+vJLKUXy6-rNjJzjE5vQZ3_LkeZemGN3V3H0S-KaJwvaJKg@mail.gmail.com>
Message-ID: <CACac1F_2NPZo6yBDmdmBVY9=k3K_7qUvqJZRE0A=fPbMXjqBAw@mail.gmail.com>

On 22 January 2013 16:33, Guido van Rossum <guido at python.org> wrote:
> I am not actually very committed to a particular design for a subprocess
> transport. I'll happily leave it up to others to come up with a design and
> make it work on multiple platforms.

OK. I've written a pipe transport (event_loop.connect_read_pipe and
event_loop.connect_write_pipe) and modified the existing subprocess
test to use it. I've also added a small read/write test.

The code is in my bitbucket repository at https://bitbucket.org/pmoore/tulip.

I'm not very happy with the call-back based style of the read/write
test. I'm sure it would be much better written in an async style, but
I don't know how to do so. If anyone who understands the async style
better than I do can offer a translation, I'd be very grateful - I'd
like to see if the resulting code looks sufficiently clear. Here's the
relevant code. The biggest ugliness is the need for the two protocol
classes, which basically do nothing but (1) collect data received and
(2) ignore unwanted callbacks.

class DummyProto(protocols.Protocol):
    def __init__(self):
        pass
    def connection_made(self):
        pass
    def data_received(self, data):
        pass
    def eof_received(self):
        pass
    def connection_lost(self, exc):
        pass

class MyCollector(protocols.Protocol):
    def __init__(self):
        self.data = []
    def connection_made(self):
        pass
    def data_received(self, data):
        self.data.append(data)
    def eof_received(self):
        pass
    def connection_lost(self, exc):
        pass
    def get_data(self):
        return b''.join(self.data)

def testReadWrite(self):
    proc = Popen(['/bin/tr', 'a-z', 'A-Z'], stdin=PIPE, stdout=PIPE)
    rt, rp = yield from self.event_loop.connect_read_pipe(MyCollector, proc.stdout)
    wt, wp = yield from self.event_loop.connect_write_pipe(DummyProto, proc.stdin)
    def send_data():
        wt.write(b"hello, world")
        wt.write_eof()
    self.event_loop.call_soon(send_data)
    self.event_loop.run()
    self.assertEqual(rp.get_data(), b'HELLO, WORLD')

Paul


From rosuav at gmail.com  Tue Jan 22 22:32:20 2013
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 23 Jan 2013 08:32:20 +1100
Subject: [Python-ideas] More details in MemoryError
In-Reply-To: <20130122192710.6f94d16f@pitrou.net>
References: <kdk4db$k5a$1@ger.gmane.org>
	<loom.20130121T231157-263@post.gmane.org>
	<50FE891D.2080603@pearwood.info>
	<CAPTjJmrkSFMQ6Am+WcRd7YFjCaWBexgufMc+e4O5XthujPJU-Q@mail.gmail.com>
	<20130122192710.6f94d16f@pitrou.net>
Message-ID: <CAPTjJmriaVhq8+d2X-qUk9U6MtfoJf3R+5s3ymqFiQAmXhxXAQ@mail.gmail.com>

On Wed, Jan 23, 2013 at 5:27 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Wed, 23 Jan 2013 00:04:15 +1100
> Chris Angelico <rosuav at gmail.com> wrote:
>> On Tue, Jan 22, 2013 at 11:42 PM, Steven D'Aprano <steve at pearwood.info> wrote:
>> > Something like this could be used to decide whether or not to flush
>> > unimportant in-memory caches, compact data structures, etc., or just
>> > give up and exit.
>>
>> That's a nice idea, but unless the requested allocation was fairly
>> large, there's a good chance you don't have room to allocate anything
>> more.
>
> I wouldn't be surprised if most cases of MemoryErrors were on fairly
> large allocation requests ;-)

Depends on the workflow. Something that allocates an immediate block
of memory, yes, but if you're progressively building a complex
structure, individual allocations mightn't themselves be significant.

ChrisA


From jcd at sdf.lonestar.org  Wed Jan 23 02:06:08 2013
From: jcd at sdf.lonestar.org (J. Cliff Dyer)
Date: Tue, 22 Jan 2013 20:06:08 -0500
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
Message-ID: <1358903168.4767.4.camel@webb>

Idea folks,

I'm working with some poorly-formed CSV files, and I noticed that
DictReader always and only pulls headers off of the first row.  But many
of the files I see have blank lines before the row of headers, sometimes
with commas to the appropriate field count, sometimes without.  The
current implementation's behavior in this case is likely never correct,
and certainly always annoying.  Given the following file:

---Start File 1---
,,
A,B,C
1,2,3
2,4,6
---End File 1---

csv.DictReader yields the rows:

    {'': 'C'}
    {'': '3'}
    {'': '6'}


And given a file starting with a zero-length line, like the following:

---Start File 2---

A,B,C
1,2,3
2,4,6
---End File 2---

It yields the following:

{None: ['A', 'B', 'C']}
{None: ['1', '2', '3']}
{None: ['2', '4', '6']}

I think that in both cases, the proper response would be treat the A,B,C
line as the header line.  The change that makes this work is pretty
simple.  In the fieldnames getter property, the "if not
self._fieldnames:" conditional becomes "while not self._fieldnames or
not any(self._fieldnames):"  As a subclass:

import csv


class DictReader(csv.DictReader):
    @property
    def fieldnames(self):
        while self._fieldnames is None or not any(self._fieldnames):
            try:
                self._fieldnames = next(self.reader)
            except StopIteration:
                break
        self.line_num = self.reader.line_num
        return self._fieldnames

    # Same as the original setter, just rewritten to associate with the
    # new getter property
    @fieldnames.setter
    def fieldnames(self, value):
        self._fieldnames = value

There might be some issues with existing code that depends on the {None:
['1','2','3']} construction, but I can't imagine a time when programmers
would want to see {'': '3'} with the 1 and 2 values getting lost.

Thoughts? Do folks think this is worth adding to the csv library, or
should I just keep using my subclass?

Cheers,
Cliff




From wuwei23 at gmail.com  Wed Jan 23 02:51:38 2013
From: wuwei23 at gmail.com (alex23)
Date: Tue, 22 Jan 2013 17:51:38 -0800 (PST)
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <1358903168.4767.4.camel@webb>
References: <1358903168.4767.4.camel@webb>
Message-ID: <5d57225f-436a-49db-b882-68bbf80b9e71@t6g2000pba.googlegroups.com>

On Jan 23, 11:06 am, "J. Cliff Dyer" <j... at sdf.lonestar.org> wrote:
> I'm working with some poorly-formed CSV files, and I noticed that
> DictReader always and only pulls headers off of the first row.  But many
> of the files I see have blank lines before the row of headers, sometimes
> with commas to the appropriate field count, sometimes without.  The
> current implementation's behavior in this case is likely never correct,
> and certainly always annoying.

I don't think we should start adding support for every malformed type
of csv file that exists. It's easy enough to remove the unnecessary
lines yourself before passing them to DictReader:

    from csv import DictReader

    with open('malformed.csv','rb') as csvfile:
        csvlines = list(l for l in csvfile if l.strip())
        csvreader = DictReader(csvlines)

Personally, if I was dealing with this as often as you are, I'd
probably make a custom context manager instead. The problem lies in
the files themselves, not in csv's response to them.
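
One hedged sketch of that (CleanDictReader is a made-up name, it also drops
comma-only junk rows like the ",," line above, and it's written for Python 3
text-mode files rather than the 'rb' mode in the snippet):

```python
import csv
import io

class CleanDictReader:
    """Context manager: skip blank / commas-only rows, then wrap in DictReader."""

    def __init__(self, f):
        self._f = f

    def __enter__(self):
        # A row that is empty after stripping whitespace and commas is junk.
        lines = [line for line in self._f if line.strip().strip(',')]
        return csv.DictReader(lines)

    def __exit__(self, *exc_info):
        self._f.close()
        return False

sample = io.StringIO(",,\nA,B,C\n1,2,3\n2,4,6\n")
with CleanDictReader(sample) as reader:
    rows = list(reader)
```

With the malformed File 1 from the original post, A,B,C is now correctly
treated as the header row.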


From cf.natali at gmail.com  Wed Jan 23 12:16:14 2013
From: cf.natali at gmail.com (Charles-François Natali)
Date: Wed, 23 Jan 2013 12:16:14 +0100
Subject: [Python-ideas] reducing multiprocessing.Queue contention
Message-ID: <CAH_1eM35_aeOupBx09Pe8dAj-bUX40fs1Z3uj96-JiBY2G2S3w@mail.gmail.com>

Hello,

Currently, multiprocessing.Queue put() and get() methods hold locks
for the entire duration of the writing/reading to the backing
Connection (which can be a pipe, unix domain socket, or whatever it's
called on Windows).

For example, here's what the feeder thread does:
"""
           else:
               wacquire()
               try:
                   send(obj)
                   # Delete references to object. See issue16284
                   del obj
               finally:
                   wrelease()
"""

Connection.send() and Connection.recv() have to serialize the data
using pickle before writing them to the underlying file descriptor.
While the locking is necessary to guarantee atomic read/write (well,
it's not necessary if you're writing to a pipe less than PIPE_BUF, and
writes seem atomic on Windows), the locks don't have to be held while
the data is serialized.

Although I didn't make any measurement, my gut feeling is that this
serializing can take a non negligible part of the overall
sending/receiving time, for large data items. If that's the case, then
simply holding the lock for the duration of the read()/write() syscall
(and not during serialization) could reduce contention in case of
large data sending/receiving.

One way to do that would be to refactor the code a bit to provide
maybe a (private) AtomicConnection, which would encapsulate the
necessary locking: another advantage is that this would hide the
platform-dependent code inside Connection (right now, Queue only uses
a lock for sending on Unix platforms, since write is apparently atomic
on Windows).
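
A toy sketch of the core idea (SketchQueue and send_bytes are illustrative
stand-ins, not the real multiprocessing internals):

```python
import pickle
import threading

class SketchQueue:
    def __init__(self, send_bytes):
        self._send_bytes = send_bytes
        self._wlock = threading.Lock()

    def put(self, obj):
        data = pickle.dumps(obj)    # serialization happens with no lock held
        with self._wlock:           # lock held only around the write itself
            self._send_bytes(data)

chunks = []
q = SketchQueue(chunks.append)
q.put({"answer": 42})
round_tripped = pickle.loads(chunks[0])
```

The write is still atomic with respect to other senders, but a slow pickle of
a large object no longer blocks them.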

Thoughts?


From shibturn at gmail.com  Wed Jan 23 13:09:42 2013
From: shibturn at gmail.com (Richard Oudkerk)
Date: Wed, 23 Jan 2013 12:09:42 +0000
Subject: [Python-ideas] reducing multiprocessing.Queue contention
In-Reply-To: <CAH_1eM35_aeOupBx09Pe8dAj-bUX40fs1Z3uj96-JiBY2G2S3w@mail.gmail.com>
References: <CAH_1eM35_aeOupBx09Pe8dAj-bUX40fs1Z3uj96-JiBY2G2S3w@mail.gmail.com>
Message-ID: <kdojud$o4a$1@ger.gmane.org>

On 23/01/2013 11:16am, Charles-François Natali wrote:
> Connection.send() and Connection.recv() have to serialize the data
> using pickle before writing them to the underlying file descriptor.
> While the locking is necessary to guarantee atomic read/write (well,
> it's not necessary if you're writing to a pipe less than PIPE_BUF, and
> writes seem atomic on Windows), the locks don't have to be held while
> the data is serialized.

But you can only rely on the atomicity of writing less than PIPE_BUF 
bytes if you know that no other process is currently trying to send a 
message longer than PIPE_BUF.  Otherwise the short message could be 
embedded in the long message (even if the process sending the long 
message is holding the lock).

-- 
Richard



From cf.natali at gmail.com  Wed Jan 23 13:27:39 2013
From: cf.natali at gmail.com (Charles-François Natali)
Date: Wed, 23 Jan 2013 13:27:39 +0100
Subject: [Python-ideas] reducing multiprocessing.Queue contention
In-Reply-To: <kdojud$o4a$1@ger.gmane.org>
References: <CAH_1eM35_aeOupBx09Pe8dAj-bUX40fs1Z3uj96-JiBY2G2S3w@mail.gmail.com>
	<kdojud$o4a$1@ger.gmane.org>
Message-ID: <CAH_1eM0uWyXZb1-f2DXfj1NiouC2RTbGqRRDv6dbcwj_YSAk-Q@mail.gmail.com>

> But you can only rely on the atomicity of writing less than PIPE_BUF bytes
> if you know that no other process is currently trying to send a message
> longer than PIPE_BUF.  Otherwise the short message could be embedded in the
> long message (even if the process sending the long message is holding the
> lock).

Maybe I wasn't clear.
I'm not suggesting to not hold the lock when sending less than
PIPE_BUF, since it wouldn't work in the case you describe above.
I'm suggesting to serialize the data prior to acquiring the writer
lock, to reduce contention (and unserialize after releasing the
reading lock).

(I only mentioned PIPE_BUF because I was sad to see that Windows
supported atomic messages, and this comforted me a bit :-)


From solipsis at pitrou.net  Wed Jan 23 13:37:17 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 23 Jan 2013 13:37:17 +0100
Subject: [Python-ideas] reducing multiprocessing.Queue contention
References: <CAH_1eM35_aeOupBx09Pe8dAj-bUX40fs1Z3uj96-JiBY2G2S3w@mail.gmail.com>
Message-ID: <20130123133717.4c3a7357@pitrou.net>

On Wed, 23 Jan 2013 12:16:14 +0100,
Charles-François Natali
<cf.natali at gmail.com> wrote:
> 
> One way to do that would be to refactor the code a bit to provide
> maybe a (private) AtomicConnection, which would encapsulate the
> necessary locking: another advantage is that this would hide the
> platform-dependent code inside Connection

Or perhaps simply some _send_with_lock and _recv_with_lock methods?
(it may also skip the lock for the Windows PipeConnection
implementation)

Regards

Antoine.




From shibturn at gmail.com  Wed Jan 23 14:13:30 2013
From: shibturn at gmail.com (Richard Oudkerk)
Date: Wed, 23 Jan 2013 13:13:30 +0000
Subject: [Python-ideas] reducing multiprocessing.Queue contention
In-Reply-To: <CAH_1eM0uWyXZb1-f2DXfj1NiouC2RTbGqRRDv6dbcwj_YSAk-Q@mail.gmail.com>
References: <CAH_1eM35_aeOupBx09Pe8dAj-bUX40fs1Z3uj96-JiBY2G2S3w@mail.gmail.com>
	<kdojud$o4a$1@ger.gmane.org>
	<CAH_1eM0uWyXZb1-f2DXfj1NiouC2RTbGqRRDv6dbcwj_YSAk-Q@mail.gmail.com>
Message-ID: <kdonm2$rku$1@ger.gmane.org>

On 23/01/2013 12:27pm, Charles-François Natali wrote:
> Maybe I wasn't clear.
> I'm not suggesting to not hold the lock when sending less than
> PIPE_BUF, since it wouldn't work in the case you describe above.
> I'm suggesting to serialize the data prior to acquiring the writer
> lock, to reduce contention (and unserialize after releasing the
> reading lock).

That is reasonable.  In fact we should probably serialize when put()
is called, to catch any pickling errors early.

-- 
Richard



From eliben at gmail.com  Wed Jan 23 16:00:14 2013
From: eliben at gmail.com (Eli Bendersky)
Date: Wed, 23 Jan 2013 07:00:14 -0800
Subject: [Python-ideas] reducing multiprocessing.Queue contention
In-Reply-To: <CAH_1eM35_aeOupBx09Pe8dAj-bUX40fs1Z3uj96-JiBY2G2S3w@mail.gmail.com>
References: <CAH_1eM35_aeOupBx09Pe8dAj-bUX40fs1Z3uj96-JiBY2G2S3w@mail.gmail.com>
Message-ID: <CAF-Rda9QWxPZ8bZ7r1_dRpzbzcz9n+7fobB8H8Awe_CRMidwpQ@mail.gmail.com>

On Wed, Jan 23, 2013 at 3:16 AM, Charles-François Natali <
cf.natali at gmail.com> wrote:

> Hello,
>
> Currently, multiprocessing.Queue put() and get() methods hold locks
> for the entire duration of the writing/reading to the backing
> Connection (which can be a pipe, unix domain socket, or whatever it's
> called on Windows).
>
> For example, here's what the feeder thread does:
> """
>            else:
>                wacquire()
>                try:
>                    send(obj)
>                    # Delete references to object. See issue16284
>                    del obj
>                finally:
>                    wrelease()
> """
>
> Connection.send() and Connection.recv() have to serialize the data
> using pickle before writing them to the underlying file descriptor.
> While the locking is necessary to guarantee atomic read/write (well,
> it's not necessary if you're writing to a pipe less than PIPE_BUF, and
> writes seem atomic on Windows), the locks don't have to be held while
> the data is serialized.
>
> Although I didn't make any measurement, my gut feeling is that this
> serializing can take a non negligible part of the overall
> sending/receiving time, for large data items. If that's the case, then
> simply holding the lock for the duration of the read()/write() syscall
> (and not during serialization) could reduce contention in case of
> large data sending/receiving.
>
> One way to do that would be to refactor the code a bit to provide
> maybe a (private) AtomicConnection, which would encapsulate the
> necessary locking: another advantage is that this would hide the
> platform-dependent code inside Connection (right now, Queue only uses
> a lock for sending on Unix platforms, since write is apparently atomic
> on Windows).
>
>
In general, this sounds good. There's indeed no reason to perform the
serialization under a lock.

It would be great to have some measurements to see just how much it takes,
though.

Eli
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130123/4f003538/attachment.html>

From jcd at sdf.lonestar.org  Wed Jan 23 17:51:05 2013
From: jcd at sdf.lonestar.org (J. Cliff Dyer)
Date: Wed, 23 Jan 2013 11:51:05 -0500
Subject: [Python-ideas] csv.DictReader could handle headers more
 intelligently.
In-Reply-To: <5d57225f-436a-49db-b882-68bbf80b9e71@t6g2000pba.googlegroups.com>
References: <1358903168.4767.4.camel@webb>
	<5d57225f-436a-49db-b882-68bbf80b9e71@t6g2000pba.googlegroups.com>
Message-ID: <1358959865.5194.8.camel@gdoba.domain.local>

On Tue, 2013-01-22 at 17:51 -0800, alex23 wrote:
> I don't think we should start adding support for every malformed type
> of csv file that exists. It's easy enough to remove the unnecessary
> lines yourself before passing them to DictReader:
> 
>     from csv import DictReader
> 
>     with open('malformed.csv','rb') as csvfile:
>         csvlines = list(l for l in csvfile if l.strip())
>         csvreader = DictReader(csvlines)
> 
> Personally, if I was dealing with this as often as you are, I'd
> probably make a custom context manager instead. The problem lies in
> the files themselves, not in csv's response to them.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
> 

With all due respect, while you make a good point that we don't want to
start special casing every malformed type of CSV, there is absolutely
something wrong with DictReader's response to files that have duplicate
headers. It throws away data silently.

If you (and others on this list) aren't in favor of trying to find the
right header row (which I can understand: "In the face of ambiguity,
refuse the temptation to guess."), maybe a better solution would be to
raise a (suppressible) exception if the headers aren't uniquely named.
("Errors should never pass silently.  Unless explicitly silenced.")

Cheers,
Cliff





From amauryfa at gmail.com  Wed Jan 23 18:08:32 2013
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Wed, 23 Jan 2013 18:08:32 +0100
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <1358959865.5194.8.camel@gdoba.domain.local>
References: <1358903168.4767.4.camel@webb>
	<5d57225f-436a-49db-b882-68bbf80b9e71@t6g2000pba.googlegroups.com>
	<1358959865.5194.8.camel@gdoba.domain.local>
Message-ID: <CAGmFidZU0rZ+jeuuTFH+2DOY3Tpdh_OjFyrOQ+-MXvDEsbbQ=A@mail.gmail.com>

Hi,

2013/1/23 J. Cliff Dyer <jcd at sdf.lonestar.org>

> On Tue, 2013-01-22 at 17:51 -0800, alex23 wrote:
> > I don't think we should start adding support for every malformed type
> > of csv file that exists. It's easy enough to remove the unnecessary
> > lines yourself before passing them to DictReader:
> >
> >     from csv import DictReader
> >
> >     with open('malformed.csv','rb') as csvfile:
> >         csvlines = list(l for l in csvfile if l.strip())
> >         csvreader = DictReader(csvlines)
> >
> > Personally, if I was dealing with this as often as you are, I'd
> > probably make a custom context manager instead. The problem lies in
> > the files themselves, not in csv's response to them.
> > _______________________________________________
> > Python-ideas mailing list
> > Python-ideas at python.org
> > http://mail.python.org/mailman/listinfo/python-ideas
> >
>
> With all due respect, while you make a good point that we don't want to
> start special casing every malformed type of CSV, there is absolutely
> something wrong with DictReader's response to files that have duplicate
> headers. It throws away data silently.
>

That's how Python dictionaries work, by design:
    d = {'a': 1, 'a': 2}
"silently" discards the first value.

If you (and others on this list) aren't in favor of trying to find the
> right header row (which I can understand: "In the face of ambiguity,
> refuse the temptation to guess."), maybe a better solution would be to
> raise a (suppressible) exception if the headers aren't uniquely named.
> ("Errors should never pass silently.  Unless explicitly silenced.")
>

What about a subclass then:

class CarefulDictReader(csv.DictReader):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        fieldnames = self.fieldnames
        if len(fieldnames) != len(set(fieldnames)):
            raise ValueError("Duplicate field names", fieldnames)
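
Fleshed out with a guard for empty input (self.fieldnames is None when
there are no rows at all, which would otherwise trip the len() call),
and a quick demonstration:

```python
import csv
import io

class CarefulDictReader(csv.DictReader):
    # Refuses up front to silently clobber columns with duplicate names.
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        fieldnames = self.fieldnames
        if fieldnames and len(fieldnames) != len(set(fieldnames)):
            raise ValueError("Duplicate field names", fieldnames)

# A duplicated "a" column is rejected as soon as the reader is built:
try:
    CarefulDictReader(io.StringIO("a,b,a\n1,2,3\n"))
except ValueError as exc:
    print(exc.args)
```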


-- 
Amaury Forgeot d'Arc
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130123/616cefb7/attachment.html>

From solipsis at pitrou.net  Wed Jan 23 18:15:51 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 23 Jan 2013 18:15:51 +0100
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
References: <1358903168.4767.4.camel@webb>
	<5d57225f-436a-49db-b882-68bbf80b9e71@t6g2000pba.googlegroups.com>
	<1358959865.5194.8.camel@gdoba.domain.local>
	<CAGmFidZU0rZ+jeuuTFH+2DOY3Tpdh_OjFyrOQ+-MXvDEsbbQ=A@mail.gmail.com>
Message-ID: <20130123181551.44a6e0cb@pitrou.net>

Le Wed, 23 Jan 2013 18:08:32 +0100,
"Amaury Forgeot d'Arc"
<amauryfa at gmail.com> a écrit :
> Hi,
> 
> 2013/1/23 J. Cliff Dyer <jcd at sdf.lonestar.org>
> 
> > On Tue, 2013-01-22 at 17:51 -0800, alex23 wrote:
> > > I don't think we should start adding support for every malformed
> > > type of csv file that exists. It's easy enough to remove the
> > > unnecessary lines yourself before passing them to DictReader:
> > >
> > >     from csv import DictReader
> > >
> > >     with open('malformed.csv','rb') as csvfile:
> > >         csvlines = list(l for l in csvfile if l.strip())
> > >         csvreader = DictReader(csvlines)
> > >
> > > Personally, if I was dealing with this as often as you are, I'd
> > > probably make a custom context manager instead. The problem lies
> > > in the files themselves, not in csv's response to them.
> > > _______________________________________________
> > > Python-ideas mailing list
> > > Python-ideas at python.org
> > > http://mail.python.org/mailman/listinfo/python-ideas
> > >
> >
> > With all due respect, while you make a good point that we don't
> > want to start special casing every malformed type of CSV, there is
> > absolutely something wrong with DictReader's response to files that
> > have duplicate headers. It throws away data silently.
> >
> 
> That's how Python dictionaries work, by design:
>     d = {'a': 1, 'a': 2}
> "silently" discards the first value.

It's still rather surprising (and, in many cases, undesired). I would
suggest adding a parameter to DictReader to raise an exception when
there are duplicate column headers.

Regards

Antoine.




From ubershmekel at gmail.com  Wed Jan 23 18:26:07 2013
From: ubershmekel at gmail.com (Yuval Greenfield)
Date: Wed, 23 Jan 2013 19:26:07 +0200
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <20130123181551.44a6e0cb@pitrou.net>
References: <1358903168.4767.4.camel@webb>
	<5d57225f-436a-49db-b882-68bbf80b9e71@t6g2000pba.googlegroups.com>
	<1358959865.5194.8.camel@gdoba.domain.local>
	<CAGmFidZU0rZ+jeuuTFH+2DOY3Tpdh_OjFyrOQ+-MXvDEsbbQ=A@mail.gmail.com>
	<20130123181551.44a6e0cb@pitrou.net>
Message-ID: <CANSw7KznDEuwB8G3aLTaKE5N8Z6q=ohn7_nFcvW_QPypGJCpqA@mail.gmail.com>

On Wed, Jan 23, 2013 at 7:15 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:

> It's still rather surprising (and, in many cases, undesired). I would
>  suggest adding a parameter to DictReader to raise an exception when
> there are duplicate column headers.
>
> Regards
>
> Antoine.
>
>
Completely agree, it's a big surprise and a quiet bug.

This is one of those changes we should remember for Python 4.0. Until then,
give an option to raise an exception upon duplicates; after 4.0, throw an
exception on duplicate headers by default, with an option to ignore them.

Yuval
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130123/f2df8de9/attachment.html>

From jcd at sdf.lonestar.org  Wed Jan 23 18:37:01 2013
From: jcd at sdf.lonestar.org (J. Cliff Dyer)
Date: Wed, 23 Jan 2013 12:37:01 -0500
Subject: [Python-ideas] csv.DictReader could handle headers more
 intelligently.
In-Reply-To: <CAGmFidZU0rZ+jeuuTFH+2DOY3Tpdh_OjFyrOQ+-MXvDEsbbQ=A@mail.gmail.com>
References: <1358903168.4767.4.camel@webb>
	<5d57225f-436a-49db-b882-68bbf80b9e71@t6g2000pba.googlegroups.com>
	<1358959865.5194.8.camel@gdoba.domain.local>
	<CAGmFidZU0rZ+jeuuTFH+2DOY3Tpdh_OjFyrOQ+-MXvDEsbbQ=A@mail.gmail.com>
Message-ID: <1358962621.5194.18.camel@gdoba.domain.local>

On Wed, 2013-01-23 at 18:08 +0100, Amaury Forgeot d'Arc wrote:
> Hi,
> 
> 2013/1/23 J. Cliff Dyer <jcd at sdf.lonestar.org>
>         On Tue, 2013-01-22 at 17:51 -0800, alex23 wrote:
>         > I don't think we should start adding support for every
>         malformed type
>         > of csv file that exists. It's easy enough to remove the
>         unnecessary
>         > lines yourself before passing them to DictReader:
>         >
>         >     from csv import DictReader
>         >
>         >     with open('malformed.csv','rb') as csvfile:
>         >         csvlines = list(l for l in csvfile if l.strip())
>         >         csvreader = DictReader(csvlines)
>         >
>         > Personally, if I was dealing with this as often as you are,
>         I'd
>         > probably make a custom context manager instead. The problem
>         lies in
>         > the files themselves, not in csv's response to them.
>         > _______________________________________________
>         > Python-ideas mailing list
>         > Python-ideas at python.org
>         > http://mail.python.org/mailman/listinfo/python-ideas
>         >
>         
>         
>         With all due respect, while you make a good point that we
>         don't want to
>         start special casing every malformed type of CSV, there is
>         absolutely
>         something wrong with DictReader's response to files that have
>         duplicate
>         headers. It throws away data silently.
> 
> 
> That's how Python dictionaries work, by design:
>     d = {'a': 1, 'a': 2}
> "silently" discards the first value.
> 
> 
>         If you (and others on this list) aren't in favor of trying to
>         find the
>         right header row (which I can understand: "In the face of
>         ambiguity,
>         refuse the temptation to guess."), maybe a better solution
>         would be to
>         raise a (suppressible) exception if the headers aren't
>         uniquely named.
>         ("Errors should never pass silently.  Unless explicitly
>         silenced.")
> 
> 
> What about a subclass then:
> 
> 
> class CarefulDictReader(csv.DictReader):
>     def __init__(self, *args, **kwargs):
>         super().__init__(*args, **kwargs)
>         fieldnames = self.fieldnames
>         if len(fieldnames) != len(set(fieldnames)):
>             raise ValueError("Duplicate field names", fieldnames)
> 
> 
> 
> 
> -- 
> Amaury Forgeot d'Arc

Whether it's a subclass or a change to the existing class is worth
having a discussion about.  Obviously, the change could be made in a
subclass.  Currently, that's what I do.  The question at issue is
whether it should be made in the original.  My position is that
something should change in the standard library, whether that is
modifying the code in some way to handle edge cases more robustly, or
updating the documentation to advise programmers on how to handle files
that aren't perfectly formed.

This might include documenting that self.reader is an available
attribute (where the programmer could iterate to find the header row
they're looking for, if needed, and then assign it to self.fieldnames).

I do like the idea of assigning the fieldnames variable and then raising
the ValueError, so if the user silences the exception, they still have
access to the field names found.  However, I think the behavior should
be overridden on the fieldnames property, so as not to change the
semantics of the DictReader.







From bruce at leapyear.org  Wed Jan 23 19:20:21 2013
From: bruce at leapyear.org (Bruce Leban)
Date: Wed, 23 Jan 2013 10:20:21 -0800
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <20130123181551.44a6e0cb@pitrou.net>
References: <1358903168.4767.4.camel@webb>
	<5d57225f-436a-49db-b882-68bbf80b9e71@t6g2000pba.googlegroups.com>
	<1358959865.5194.8.camel@gdoba.domain.local>
	<CAGmFidZU0rZ+jeuuTFH+2DOY3Tpdh_OjFyrOQ+-MXvDEsbbQ=A@mail.gmail.com>
	<20130123181551.44a6e0cb@pitrou.net>
Message-ID: <CAGu0AntUv+F_UwYx35-dwEV-gp9CpDkcU1Rk6Oo_vG-oAQCVdg@mail.gmail.com>

On Wed, Jan 23, 2013 at 9:15 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:

> > That's how Python dictionaries work, by design:
> >     d = {'a': 1, 'a': 2}
> > "silently" discards the first value.
>
> It's still rather surprising (and, in many cases, undesired). I would
> suggest adding a parameter to DictReader to raise an exception when
> there are duplicate column headers.
>

If there are duplicate column headers, they are probably there for a
reason. I can't imagine a case where the desired result is to discard one
of the columns. If DictReader is going to recognize this case, perhaps:

A,B,A
1,2,3
4,5,6

would be better as

{'A': [1,3], 'B': 2}
{'A': [4,6], 'B': 5}

I realize that sometimes getting a single value and sometimes an array is
potentially messy, but bear in mind that in most cases the reader of the
csv file has some idea of what they are reading. There could be an optional
parameter multivalue="A" that lists the columns that are allowed to have
multiple values and if not present it raises an exception. To allow any
column to be multivalued, you could use multivalue=True.

As to skipping over a leading blank line, this happened to me just
yesterday. I was saving some data in csv files and all the files ended up
with an extra blank line at the top. I'd be +1 for skipping over a blank
line at the top, +0 for skipping over more than one blank line.

--- Bruce
Only 5 hours left!
http://www.kickstarter.com/projects/royleban/unique-puzzles-for-a-yankee-echo-alfa-romeo
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130123/e7733630/attachment.html>

From mark.hackett at metoffice.gov.uk  Wed Jan 23 19:24:07 2013
From: mark.hackett at metoffice.gov.uk (Mark Hackett)
Date: Wed, 23 Jan 2013 18:24:07 +0000
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <1358962621.5194.18.camel@gdoba.domain.local>
References: <1358903168.4767.4.camel@webb>
	<CAGmFidZU0rZ+jeuuTFH+2DOY3Tpdh_OjFyrOQ+-MXvDEsbbQ=A@mail.gmail.com>
	<1358962621.5194.18.camel@gdoba.domain.local>
Message-ID: <201301231824.07965.mark.hackett@metoffice.gov.uk>

On Wednesday 23 Jan 2013, J. Cliff Dyer wrote:
> 
> Whether it's a subclass or a change to the existing class is worth
> having a discussion about.  Obviously, the change could be made in a
> subclass.  Currently, that's what I do.  The question at issue is
> whether it should be made in the original.  My position is that
> something should change in the standard library, whether that is
> modifying the code in some way to handle edge cases more robustly, or
> updating the documentation to advise programmers on how to handle files
> that aren't perfectly formed.
> 

It looks entirely like format checking on something that doesn't necessarily 
have a format.

It therefore belongs in something else. I.e. you define your "csv schema", pass 
it to something that creates a "lint check" on the entire bytestream and/or 
checks each input as read, and it is applied like any decorator on a base 
function in Python.

CSV format checking isn't, IMO, any different from the socket service decorators 
that embed policy on the base function.


From mark.hackett at metoffice.gov.uk  Wed Jan 23 19:32:20 2013
From: mark.hackett at metoffice.gov.uk (Mark Hackett)
Date: Wed, 23 Jan 2013 18:32:20 +0000
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <CAGu0AntUv+F_UwYx35-dwEV-gp9CpDkcU1Rk6Oo_vG-oAQCVdg@mail.gmail.com>
References: <1358903168.4767.4.camel@webb> <20130123181551.44a6e0cb@pitrou.net>
	<CAGu0AntUv+F_UwYx35-dwEV-gp9CpDkcU1Rk6Oo_vG-oAQCVdg@mail.gmail.com>
Message-ID: <201301231832.20087.mark.hackett@metoffice.gov.uk>

On Wednesday 23 Jan 2013, Bruce Leban wrote:
> If there are duplicate column headers, they are probably there for a
> reason. I can't imagine a case where the desired result is to discard one
> of the columns. If DictReader is going to recognize this case, perhaps:
> 

I can't see why there would be duplicate column headers for valid reason.

Someone may have written their CSV export incorrectly, but that's not actually 
valid.

It would therefore be arguable for the program to give at least a WARNING that 
it's throwing data away.

However, since python is mechanising this as a dictionary and since in python 
setting A to 1 then setting A to 3 would throw away the earlier value for A 
and the import function working AS EXPECTED in Python.

Hence a decorator to insist on particular formatting (e.g. turning A into a 
list of values 1,3 rather than throwing away the 1 or the 3). To do otherwise 
would mean someone in the official library having to write their own format 
conversion, shove it in the middle, and tell people what they should be 
doing.


From malaclypse2 at gmail.com  Wed Jan 23 20:59:42 2013
From: malaclypse2 at gmail.com (Jerry Hill)
Date: Wed, 23 Jan 2013 14:59:42 -0500
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <201301231832.20087.mark.hackett@metoffice.gov.uk>
References: <1358903168.4767.4.camel@webb> <20130123181551.44a6e0cb@pitrou.net>
	<CAGu0AntUv+F_UwYx35-dwEV-gp9CpDkcU1Rk6Oo_vG-oAQCVdg@mail.gmail.com>
	<201301231832.20087.mark.hackett@metoffice.gov.uk>
Message-ID: <CADwdpyZ5THwJFo+huVMXGK3+aCaWYmYtCh3-2ucUp+sf4eEjgA@mail.gmail.com>

On Wed, Jan 23, 2013 at 1:32 PM, Mark Hackett
<mark.hackett at metoffice.gov.uk> wrote:
> I can't see why there would be duplicate column headers for valid reason.
>
> Someone may have written their CSV export incorrectly, but that's not actually
> valid.

Sure it is.  Since there is no formal spec for .csv files, having
multiple columns with the same text in the header makes a perfectly valid
.csv file.  For what it's worth, the informal spec for csv files seems
to be "whatever Excel does" and Excel (and every other
spreadsheet-oriented program) is happy to let you have duplicated
headers too.

> It would therefore be arguable for the program to give at least a WARNING that
> it's throwing data away.

I think the library should give the programmer some sort of indication
that they are losing data.  Personally, I'd prefer an exception which
can either be caught or not, depending on whether the program is
designed to handle the situation or not.

> However, since python is mechanising this as a dictionary and since in python
> setting A to 1 then setting A to 3 would throw away the earlier value for A
> and the import function working AS EXPECTED in Python.

I'm not sure this behavior merits the all-caps "AS EXPECTED" label.
It's not terribly surprising once you sit down and think about it, but
it's certainly at least a little unexpected to me that data is being
thrown away with no notice.  It's unusual for errors to pass silently
in python.

-- 
Jerry


From cf.natali at gmail.com  Wed Jan 23 21:03:46 2013
From: cf.natali at gmail.com (=?ISO-8859-1?Q?Charles=2DFran=E7ois_Natali?=)
Date: Wed, 23 Jan 2013 21:03:46 +0100
Subject: [Python-ideas] reducing multiprocessing.Queue contention
In-Reply-To: <CAF-Rda9QWxPZ8bZ7r1_dRpzbzcz9n+7fobB8H8Awe_CRMidwpQ@mail.gmail.com>
References: <CAH_1eM35_aeOupBx09Pe8dAj-bUX40fs1Z3uj96-JiBY2G2S3w@mail.gmail.com>
	<CAF-Rda9QWxPZ8bZ7r1_dRpzbzcz9n+7fobB8H8Awe_CRMidwpQ@mail.gmail.com>
Message-ID: <CAH_1eM1Ws1MZ_raVP0JMocHz4XE9+MPObd30GH0q+6M4xyHAZw@mail.gmail.com>

> In general, this sounds good. There's indeed no reason to perform the
> serialization under a lock.
>
> It would be great to have some measurements to see just how much it takes,
> though.

I was curious, so I wrote a quick and dirty patch (it doesn't
support timed get()/put(), so I won't post it here).

I used the attached script as benchmark: basically, it just spawns a
bunch of processes that put()/get() to a queue some data repeatedly
(10000 times a list of 1024 ints), and returns when everything has
been sent and received.

The following tests were made on an 8-core box, from 1 reader/1
writer up to 4 readers/4 writers (it would be interesting to see with
only 1 writer and multiple readers, but readers would keep waiting for
input, so it requires another benchmark):

Without patch:
"""
$ ./python /tmp/multi_queue.py
took 0.7993290424346924 seconds with 1 workers
took 1.8892168998718262 seconds with 2 workers
took 3.075777053833008 seconds with 3 workers
took 4.050479888916016 seconds with 4 workers
"""

With patch:
"""
$ ./python /tmp/multi_queue.py
took 0.7730131149291992 seconds with 1 workers
took 0.7471320629119873 seconds with 2 workers
took 0.752316951751709 seconds with 3 workers
took 0.8303961753845215 seconds with 4 workers
"""
-------------- next part --------------
A non-text attachment was scrubbed...
Name: multi_queue.py
Type: application/octet-stream
Size: 1138 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130123/f629a381/attachment.obj>

From jcd at sdf.lonestar.org  Wed Jan 23 22:13:54 2013
From: jcd at sdf.lonestar.org (J. Cliff Dyer)
Date: Wed, 23 Jan 2013 16:13:54 -0500
Subject: [Python-ideas] csv.DictReader could handle headers more
 intelligently.
In-Reply-To: <CADwdpyZ5THwJFo+huVMXGK3+aCaWYmYtCh3-2ucUp+sf4eEjgA@mail.gmail.com>
References: <1358903168.4767.4.camel@webb> <20130123181551.44a6e0cb@pitrou.net>
	<CAGu0AntUv+F_UwYx35-dwEV-gp9CpDkcU1Rk6Oo_vG-oAQCVdg@mail.gmail.com>
	<201301231832.20087.mark.hackett@metoffice.gov.uk>
	<CADwdpyZ5THwJFo+huVMXGK3+aCaWYmYtCh3-2ucUp+sf4eEjgA@mail.gmail.com>
Message-ID: <1358975634.4866.0.camel@gdoba.domain.local>

On Wed, 2013-01-23 at 14:59 -0500, Jerry Hill wrote:
> > However, since python is mechanising this as a dictionary and since
> in python
> > setting A to 1 then setting A to 3 would throw away the earlier
> value for A
> > and the import function working AS EXPECTED in Python.
> 
> I'm not sure this behavior merits the all-caps "AS EXPECTED" label.
> It's not terribly surprising once you sit down and think about it, but
> it's certainly at least a little unexpected to me that data is being
> thrown away with no notice.  It's unusual for errors to pass silently
> in python. 

Moreover, I think while it might be expected for a dict to do this, it
does not follow that a DictReader should be expected to silently throw
away the user's data.  Just because it uses the dict format for storage
does not mean that it's okay to throw away user's data silently.  Dicts
need to be blazingly fast for a host of reasons.  DictReaders do not.
They're usually dealing with file input, so any slowness in the
DictReader itself is going to be dwarfed by the file access.  As such we
can afford to be more programmer-friendly here.

Cheers,
Cliff





From ubershmekel at gmail.com  Wed Jan 23 22:54:05 2013
From: ubershmekel at gmail.com (Yuval Greenfield)
Date: Wed, 23 Jan 2013 23:54:05 +0200
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <1358975634.4866.0.camel@gdoba.domain.local>
References: <1358903168.4767.4.camel@webb> <20130123181551.44a6e0cb@pitrou.net>
	<CAGu0AntUv+F_UwYx35-dwEV-gp9CpDkcU1Rk6Oo_vG-oAQCVdg@mail.gmail.com>
	<201301231832.20087.mark.hackett@metoffice.gov.uk>
	<CADwdpyZ5THwJFo+huVMXGK3+aCaWYmYtCh3-2ucUp+sf4eEjgA@mail.gmail.com>
	<1358975634.4866.0.camel@gdoba.domain.local>
Message-ID: <CANSw7KxVQ6ozar-XHow--oBkoFh90GH16kaM17qRqieXKdXaiQ@mail.gmail.com>

On Wed, Jan 23, 2013 at 11:13 PM, J. Cliff Dyer <jcd at sdf.lonestar.org>wrote:

>
> Moreover, I think while it might be expected for a dict to do this, it
> does not follow that a DictReader should be expected to silently throw
> away the user's data.  Just because it uses the dict format for storage
> does not mean that it's okay to throw away user's data silently.  Dicts
> need to be blazingly fast for a host of reasons.  DictReaders do not.
> They're usually dealing with file input, so any slowness in the
> DictReader itself is going to be dwarfed by the file access.  As such we
> can afford to be more programmer-friendly here.
>

If it were a NamedTupleReader, this wouldn't be an issue.

>>> from collections import namedtuple
>>> namedtuple('x', 'a b a c')
Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    namedtuple('x', 'a b a c')
  File "C:\Python27\lib\collections.py", line 288, in namedtuple
    raise ValueError('Encountered duplicate field name: %r' % name)
ValueError: Encountered duplicate field name: 'a'
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130123/c0c880bc/attachment.html>

From steve at pearwood.info  Thu Jan 24 01:19:34 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 24 Jan 2013 11:19:34 +1100
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <CAGu0AntUv+F_UwYx35-dwEV-gp9CpDkcU1Rk6Oo_vG-oAQCVdg@mail.gmail.com>
References: <1358903168.4767.4.camel@webb>
	<5d57225f-436a-49db-b882-68bbf80b9e71@t6g2000pba.googlegroups.com>
	<1358959865.5194.8.camel@gdoba.domain.local>
	<CAGmFidZU0rZ+jeuuTFH+2DOY3Tpdh_OjFyrOQ+-MXvDEsbbQ=A@mail.gmail.com>
	<20130123181551.44a6e0cb@pitrou.net>
	<CAGu0AntUv+F_UwYx35-dwEV-gp9CpDkcU1Rk6Oo_vG-oAQCVdg@mail.gmail.com>
Message-ID: <51007E16.30805@pearwood.info>

On 24/01/13 05:20, Bruce Leban wrote:

> I realize that sometimes getting a single value and sometimes an array is
> potentially messy, but bear in mind that in most cases the reader of the
> csv file has some idea of what they are reading. There could be an optional
> parameter multivalue="A" that lists the columns that are allowed to have
> multiple values and if not present it raises an exception. To allow any
> column to be multivalued, you could use multivalue=True.

-1 to adding optional parameters that change the behaviour of a class.

To deal with cases where you expect multiple columns with the same name,
add a new reader class that treats all columns to be multi-valued. The
standard DictReader class should continue to behave like a dict.

Don't over-engineer this MultiDictReader -- it should stay simple and treat
all column names as potentially multivalued. If the caller has some
requirements for which names can have how many columns -- "there should be
exactly three columns named X, and only one Y, and at least four Z" -- they
can check the result and decide for themselves if there is a problem.
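
A minimal sketch of such a reader, reusing DictReader's machinery
(MultiDictReader is a hypothetical name; values stay strings, as usual
with csv):

```python
import csv

class MultiDictReader(csv.DictReader):
    # Every column maps to a *list* of values, so duplicate header
    # names accumulate instead of silently clobbering each other.
    def __next__(self):
        if self.line_num == 0:
            self.fieldnames  # reading the property consumes the header row
        row = next(self.reader)
        self.line_num = self.reader.line_num
        while row == []:  # plain DictReader also skips blank rows
            row = next(self.reader)
        d = {}
        for name, value in zip(self.fieldnames, row):
            d.setdefault(name, []).append(value)
        return d
```

So a file with header A,B,A and row 1,2,3 comes back as
{'A': ['1', '3'], 'B': ['2']}, and callers impose whatever cardinality
checks they need afterwards.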


> As to skipping over a leading blank line, this happened to me just
> yesterday. I was saving some data in csv files and all the files ended up
> with an extra blank line at the top. I'd be +1 for skipping over a blank
> line at the top, +0 for skipping over more than one blank line.


I don't see any reason not to skip blank lines at the top of the file.



-- 
Steven


From steve at pearwood.info  Thu Jan 24 01:26:52 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 24 Jan 2013 11:26:52 +1100
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <CADwdpyZ5THwJFo+huVMXGK3+aCaWYmYtCh3-2ucUp+sf4eEjgA@mail.gmail.com>
References: <1358903168.4767.4.camel@webb> <20130123181551.44a6e0cb@pitrou.net>
	<CAGu0AntUv+F_UwYx35-dwEV-gp9CpDkcU1Rk6Oo_vG-oAQCVdg@mail.gmail.com>
	<201301231832.20087.mark.hackett@metoffice.gov.uk>
	<CADwdpyZ5THwJFo+huVMXGK3+aCaWYmYtCh3-2ucUp+sf4eEjgA@mail.gmail.com>
Message-ID: <51007FCC.5090400@pearwood.info>

On 24/01/13 06:59, Jerry Hill wrote:
> On Wed, Jan 23, 2013 at 1:32 PM, Mark Hackett
> <mark.hackett at metoffice.gov.uk>  wrote:
>> I can't see why there would be duplicate column headers for valid reason.
>>
>> Someone may have written their CSV export incorrectly, but that's not actually
>> valid.
>
> Sure it is.  Since there is no formal spec for .csv files, having
> multiple columns with the same text in the header makes a perfectly valid
> .csv file.  For what it's worth, the informal spec for csv files seems
> to be "whatever Excel does" and Excel (and every other
> spreadsheet-oriented program) is happy to let you have duplicated
> headers too.

+1

I think keeping DictReader as it is now is fine for backward compatibility.
Or better, simply have DictReader raise an exception rather than silently
eat data. I don't expect that anyone is relying on that behaviour, nor is
it behaviour promised by the class.

But we should add a MultiDictReader that supports the multiple columns with
the same name.


>> It would therefore be arguable for the program to give at least a WARNING that
>> it's throwing data away.
>
> I think the library should give the programmer some sort of indication
> that they are losing data.  Personally, I'd prefer an exception which
> can either be caught or not, depending on whether the program is
> designed to handle the situation or not.
>
>> However, since python is mechanising this as a dictionary and since in python
>> setting A to 1 then setting A to 3 would throw away the earlier value for A
>> and the import function working AS EXPECTED in Python.
>
> I'm not sure this behavior merits the all-caps "AS EXPECTED" label.
> It's not terribly surprising once you sit down and think about it, but
> it's certainly at least a little unexpected to me that data is being
> thrown away with no notice.  It's unusual for errors to pass silently
> in python.

Yes, we should not forget that a CSV file is not a dict. Just because DictReader
is implemented with a dict as the storage, doesn't mean that it should behave
exactly like a dict in all things. Multiple columns with the same name are legal
in CSV, so there should be a reader for that situation.



-- 
Steven


From dustin at v.igoro.us  Thu Jan 24 02:57:17 2013
From: dustin at v.igoro.us (Dustin J. Mitchell)
Date: Wed, 23 Jan 2013 20:57:17 -0500
Subject: [Python-ideas] PEP 3156 EventLoop: hide details of iterations
	and idleness?
In-Reply-To: <CAP7+vJKvQNVNk1n7Q8B6Oaio16Aai-hqbaqGn57AE-u7J0AFSA@mail.gmail.com>
References: <CAFkYKJ4BEgqQ5padUqoOg5-+wv62FmarPn0fON63Cdr=FhiTrA@mail.gmail.com>
	<CADbA=FWC92A9_u8-sC4f8wSfWWEu0RF0acS6_8OxNUx6kAg2fg@mail.gmail.com>
	<CAFkYKJ5Be27ZR4ChvEcXsOZ87rAcVZMiDkpgp7v1xAdJH4kTMA@mail.gmail.com>
	<CADbA=FWPxaAB_CKgGT_OmsWM85X9-A5NsOJ5d+ipBSDO9O3x0Q@mail.gmail.com>
	<CAP7+vJKvQNVNk1n7Q8B6Oaio16Aai-hqbaqGn57AE-u7J0AFSA@mail.gmail.com>
Message-ID: <CAJtE5vRueHa7P7Bui=B-+d=3dw_ZXUOSMcvUUKrw5bYMwD3g1A@mail.gmail.com>

On Tue, Jan 22, 2013 at 2:19 PM, Guido van Rossum <guido at python.org> wrote:
> Ok, I'll kill call_every_iteration(). I'll wait for more discussion on
> run_once() and run()'s until-idle behavior.

One of the things that's been difficult for some time in Twisted is
writing clients in such a way that they reliably finish.  It's easy
for a simple client, but when the client involves several levels of
libraries doing mysterious, asynchronous things, it can be hard to
know when everything's really done.  Add error conditions in, and you
end up spending a lot of time thinking about something that, in a
synchronous program, is pretty simple.

One option, recently introduced to Twisted, is "react" -
http://twistedmatrix.com/documents/12.3.0/api/twisted.internet.task.html#react
 The idea is to encapsulate the lifetime of a client in a single
asynchronous operation; the synchronous parallel is libc calling
`exit` for you when `main` returns.  If all of your library code
cooperates and reliably indicates when it's done with any background
operations, then this is a good choice.

In cases where your libraries are less than perfect (perhaps they sync
to the cloud "in the background"), the run-until-idle behavior is
useful.  The client calls a function that triggers a cascade of
events.  When that cascade has exhausted itself, the process exits.
Synchronous, threaded programs do this with non-daemon threads.

I think that this option should be supported, if only for the
parallelism with synchronous code.


As for run-until-idle - I've used this sort of behavior occasionally
in tests, where I want to carefully control the sequence of
operations.  For example, I may want to reliably test handling of race
conditions:

op = start_operation()
while not in_critical_section():
    run_once()
generate_conflict()
while in_critical_section():
    run_once()
assert something()

Such a case would rely heavily on the details of the event loop.
Depending on how closely I want to tie my tests to that
implementation, that may or may not be OK.  If a particular event loop
implementation doesn't even *have* this model (as, it appears, Tornado
does not), then I think it would be fine to simply not implement this
operation.  So perhaps run_once() should be described as optional in
the PEP?

Dustin


From greg.ewing at canterbury.ac.nz  Thu Jan 24 03:14:31 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 24 Jan 2013 15:14:31 +1300
Subject: [Python-ideas] PEP 3156 - Coroutines are more better
In-Reply-To: <CAJtE5vRueHa7P7Bui=B-+d=3dw_ZXUOSMcvUUKrw5bYMwD3g1A@mail.gmail.com>
References: <CAFkYKJ4BEgqQ5padUqoOg5-+wv62FmarPn0fON63Cdr=FhiTrA@mail.gmail.com>
	<CADbA=FWC92A9_u8-sC4f8wSfWWEu0RF0acS6_8OxNUx6kAg2fg@mail.gmail.com>
	<CAFkYKJ5Be27ZR4ChvEcXsOZ87rAcVZMiDkpgp7v1xAdJH4kTMA@mail.gmail.com>
	<CADbA=FWPxaAB_CKgGT_OmsWM85X9-A5NsOJ5d+ipBSDO9O3x0Q@mail.gmail.com>
	<CAP7+vJKvQNVNk1n7Q8B6Oaio16Aai-hqbaqGn57AE-u7J0AFSA@mail.gmail.com>
	<CAJtE5vRueHa7P7Bui=B-+d=3dw_ZXUOSMcvUUKrw5bYMwD3g1A@mail.gmail.com>
Message-ID: <51009907.8030404@canterbury.ac.nz>

On 24/01/13 14:57, Dustin J. Mitchell wrote:
> One of the things that's been difficult for some time in Twisted is
> writing clients in such a way that they reliably finish.

I think I'm going to wait and see what the coroutine-level features
of tulip turn out to be like before saying much more.

It seems to me that many of the problems we're arguing about here
simply don't exist in coroutine-land. For example, if you can write
something like

    yield from create_http(yield from create_tcp(host, port))

and creation of the transport fails and raises an exception, then
create_http never gets called, so you won't waste any effort creating
an unused protocol object.

Likewise, if the main loop of your protocol consists of a Task
that reads asynchronously from the transport, then (as long as
you haven't done anything blatantly stupid) you know it will
eventually return when the connection gets closed.

If I were designing all this, I think I would have made coroutines
the default way of dealing with everything above the event loop
layer, and provide callback wrappers for those that like to do
things that way. Building an entire callback-based protocol stack
seems like going about it the hard way.

-- 
Greg


From guido at python.org  Thu Jan 24 03:29:38 2013
From: guido at python.org (Guido van Rossum)
Date: Wed, 23 Jan 2013 18:29:38 -0800
Subject: [Python-ideas] PEP 3156 EventLoop: hide details of iterations
	and idleness?
In-Reply-To: <CAJtE5vRueHa7P7Bui=B-+d=3dw_ZXUOSMcvUUKrw5bYMwD3g1A@mail.gmail.com>
References: <CAFkYKJ4BEgqQ5padUqoOg5-+wv62FmarPn0fON63Cdr=FhiTrA@mail.gmail.com>
	<CADbA=FWC92A9_u8-sC4f8wSfWWEu0RF0acS6_8OxNUx6kAg2fg@mail.gmail.com>
	<CAFkYKJ5Be27ZR4ChvEcXsOZ87rAcVZMiDkpgp7v1xAdJH4kTMA@mail.gmail.com>
	<CADbA=FWPxaAB_CKgGT_OmsWM85X9-A5NsOJ5d+ipBSDO9O3x0Q@mail.gmail.com>
	<CAP7+vJKvQNVNk1n7Q8B6Oaio16Aai-hqbaqGn57AE-u7J0AFSA@mail.gmail.com>
	<CAJtE5vRueHa7P7Bui=B-+d=3dw_ZXUOSMcvUUKrw5bYMwD3g1A@mail.gmail.com>
Message-ID: <CAP7+vJKKKVu4feqG1j0LJG=3b0Rt7sAU=Q5cQ7VT1+ixmvnvNA@mail.gmail.com>

On Wed, Jan 23, 2013 at 5:57 PM, Dustin J. Mitchell <dustin at v.igoro.us> wrote:
> On Tue, Jan 22, 2013 at 2:19 PM, Guido van Rossum <guido at python.org> wrote:
>> Ok, I'll kill call_every_iteration(). I'll wait for more discussion on
>> run_once() and run()'s until-idle behavior.
>
> One of the things that's been difficult for some time in Twisted is
> writing clients in such a way that they reliably finish.  It's easy
> for a simple client, but when the client involves several levels of
> libraries doing mysterious, asynchronous things, it can be hard to
> know when everything's really done.  Add error conditions in, and you
> end up spending a lot of time thinking about something that, in a
> synchronous program, is pretty simple.
>
> One option, recently introduced to Twisted, is "react" -
> http://twistedmatrix.com/documents/12.3.0/api/twisted.internet.task.html#react
>  The idea is to encapsulate the lifetime of a client in a single
> asynchronous operation; the synchronous parallel is libc calling
> `exit` for you when `main` returns.  If all of your library code
> cooperates and reliably indicates when it's done with any background
> operations, then this is a good choice.
>
> In cases where your libraries are less than perfect (perhaps they sync
> to the cloud "in the background"), the run-until-idle behavior is
> useful.  The client calls a function that triggers a cascade of
> events.  When that cascade has exhausted itself, the process exits.
> Synchronous, threaded programs do this with non-daemon threads.
>
> I think that this option should be supported, if only for the
> parallelism with synchronous code.
>
>
> As for run-until-idle - I've used this sort of behavior occasionally
> in tests, where I want to carefully control the sequence of
> operations.  For example, I may want to reliably test handling of race
> conditions:
>
> op = start_operation()
> while not in_critical_section():
>     run_once()
> generate_conflict()
> while in_critical_section():
>     run_once()
> assert something()
>
> Such a case would rely heavily on the details of the event loop.
> Depending on how closely I want to tie my tests to that
> implementation, that may or may not be OK.  If a particular event loop
> implementation doesn't even *have* this model (as, it appears, Tornado
> does not), then I think it would be fine to simply not implement this
> operation.  So perhaps run_once() should be described as optional in
> the PEP?

Despite some earlier moves in that direction I am not actually a fan
of having optional parts in a spec. That way it's too easy for an app
to claim compliance without actually running anywhere except on its
"home" framework.

I think that run_until_idle() can be safely replaced by
run_until_complete(some_future). For run_once(), I expect that I will
be able to concoct alternatives just fine as well.

And, to Greg (who somehow replied in a separate thread), I am certainly
not planning to write the entire stack with only callbacks! Much of
the code will have Futures on the outside and coroutines on the
inside.

-- 
--Guido van Rossum (python.org/~guido)


From fafhrd91 at gmail.com  Thu Jan 24 04:50:45 2013
From: fafhrd91 at gmail.com (Nikolay Kim)
Date: Wed, 23 Jan 2013 19:50:45 -0800
Subject: [Python-ideas] PEP 3156 - gunicorn worker
Message-ID: <36B67E59-ED7C-46E8-84DD-08E13C8CB5E0@gmail.com>

Hello,

To get a feel for tulip I wrote a gunicorn worker and a websocket server; it is
possible to run a wsgi app on top of it. Maybe someone will be interested.

gunicorn worker - https://github.com/fafhrd91/gtulip
websocket server - https://github.com/fafhrd91/pyramid_sockjs2



From amauryfa at gmail.com  Thu Jan 24 09:37:55 2013
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Thu, 24 Jan 2013 09:37:55 +0100
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <51007E16.30805@pearwood.info>
References: <1358903168.4767.4.camel@webb>
	<5d57225f-436a-49db-b882-68bbf80b9e71@t6g2000pba.googlegroups.com>
	<1358959865.5194.8.camel@gdoba.domain.local>
	<CAGmFidZU0rZ+jeuuTFH+2DOY3Tpdh_OjFyrOQ+-MXvDEsbbQ=A@mail.gmail.com>
	<20130123181551.44a6e0cb@pitrou.net>
	<CAGu0AntUv+F_UwYx35-dwEV-gp9CpDkcU1Rk6Oo_vG-oAQCVdg@mail.gmail.com>
	<51007E16.30805@pearwood.info>
Message-ID: <CAGmFidadt1J4w0HUqeuKS3Y1XUhA6rddY=QxkY9Sv1BY8zNgZw@mail.gmail.com>

2013/1/24 Steven D'Aprano <steve at pearwood.info>

> -1 to adding optional parameters that change the behaviour of a class.


Unfortunately there is a precedent with csv.DictWriter:
extrasaction='raise' or 'ignore'.
And the feature is close to the one proposed here: how to deal with
"invalid" data.
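
For reference, the existing DictWriter knob behaves like this:

```python
import csv
import io

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["a"], extrasaction="ignore")
writer.writeheader()
writer.writerow({"a": 1, "b": 2})   # the extra key "b" is silently dropped
# buf.getvalue() == "a\r\n1\r\n"
# with the default extrasaction='raise', the same writerow() raises ValueError
```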

-- 
Amaury Forgeot d'Arc
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130124/7dcd2798/attachment.html>

From mark.hackett at metoffice.gov.uk  Thu Jan 24 11:33:01 2013
From: mark.hackett at metoffice.gov.uk (Mark Hackett)
Date: Thu, 24 Jan 2013 10:33:01 +0000
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <CAGmFidadt1J4w0HUqeuKS3Y1XUhA6rddY=QxkY9Sv1BY8zNgZw@mail.gmail.com>
References: <1358903168.4767.4.camel@webb> <51007E16.30805@pearwood.info>
	<CAGmFidadt1J4w0HUqeuKS3Y1XUhA6rddY=QxkY9Sv1BY8zNgZw@mail.gmail.com>
Message-ID: <201301241033.01555.mark.hackett@metoffice.gov.uk>

On Thursday 24 Jan 2013, Amaury Forgeot d'Arc wrote:
> 2013/1/24 Steven D'Aprano <steve at pearwood.info>
> 
> > -1 to adding optional parameters that change the behaviour of a class.
> 
> Unfortunately there is a precedent with csv.DictWriter:
> extrasaction='raise' or 'ignore'.
> And the feature is close to the one proposed here: how to deal with
> "invalid" data.
> 

Just because you did wrong before doesn't mean you need to do it wrong again!


From mark.hackett at metoffice.gov.uk  Thu Jan 24 11:37:57 2013
From: mark.hackett at metoffice.gov.uk (Mark Hackett)
Date: Thu, 24 Jan 2013 10:37:57 +0000
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <CADwdpyZ5THwJFo+huVMXGK3+aCaWYmYtCh3-2ucUp+sf4eEjgA@mail.gmail.com>
References: <1358903168.4767.4.camel@webb>
	<201301231832.20087.mark.hackett@metoffice.gov.uk>
	<CADwdpyZ5THwJFo+huVMXGK3+aCaWYmYtCh3-2ucUp+sf4eEjgA@mail.gmail.com>
Message-ID: <201301241037.57101.mark.hackett@metoffice.gov.uk>

On Wednesday 23 Jan 2013, Jerry Hill wrote:
> On Wed, Jan 23, 2013 at 1:32 PM, Mark Hackett
> 
> <mark.hackett at metoffice.gov.uk> wrote:
> > I can't see why there would be duplicate column headers for a valid reason.
> >
> > Someone may have written their CSV export incorrectly, but that's not
> > actually valid.
> 
> Sure it is.  Since there is no formal spec for .csv files, having
> multiple columns with the same text in the header is a perfectly valid
> .csv file.  For what it's worth, the informal spec for csv files seems

Then you don't want it put in a dictionary, since a dictionary doesn't allow 
duplicate fields.

> to be "whatever Excel does" and Excel (and every other
> spreadsheet-oriented program) is happy to let you have duplicated
> headers too.

You don't, in Excel, use the name of the column in your calculation, you use 
the unique column ID (A, B, C..AA, AB, ...).

> 
> > It would therefore be arguable for the program to give at least a WARNING
> > that it's throwing data away.
> 
> I think the library should give the programmer some sort of indication
> that they are losing data.  Personally, I'd prefer an exception which
> can either be caught or not, depending on whether the program is
> designed to handle the situation or not.
> 
> > However, since python is mechanising this as a dictionary and since in
> > python setting A to 1 then setting A to 3 would throw away the earlier
> > value for A and the import function working AS EXPECTED in Python.
> 
> I'm not sure this behavior merits the all-caps "AS EXPECTED" label.
> It's not terribly surprising once you sit down and think about it, but
> it's certainly at least a little unexpected to me that data is being
> thrown away with no notice.  It's unusual for errors to pass silently
> in python.
> 

Python doesn't warn about assigning to an existing key, so, as expected, it isn't 
warning about them now.

Programming languages are hard enough to understand (why does everyone use a 
different way of stopping a loop???), so it's not a good idea to have little 
codas to the way things are done "oh, unless you're putting it into a 
dictionary via this call...".

I can understand the library call doing so, mind, but I can also see the 
writer of the library going "You're putting it into a dictionary. Well, you 
know what happens when you put duplicate entries in them, right, else you 
wouldn't be using this routine that puts csv entries into a dictionary".


From mark.hackett at metoffice.gov.uk  Thu Jan 24 11:41:41 2013
From: mark.hackett at metoffice.gov.uk (Mark Hackett)
Date: Thu, 24 Jan 2013 10:41:41 +0000
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <1358975634.4866.0.camel@gdoba.domain.local>
References: <1358903168.4767.4.camel@webb>
	<CADwdpyZ5THwJFo+huVMXGK3+aCaWYmYtCh3-2ucUp+sf4eEjgA@mail.gmail.com>
	<1358975634.4866.0.camel@gdoba.domain.local>
Message-ID: <201301241041.41301.mark.hackett@metoffice.gov.uk>

On Wednesday 23 Jan 2013, J. Cliff Dyer wrote:
> 
> Moreover, I think while it might be expected for a dict to do this, it
> does not follow that a DictReader should be expected to silently throw
> away the user's data.
> Cheers,
> Cliff
> 
> 

Cliff, the name of the routine is "DictReader".

It is a very big hint.

Like I said, the situation here is putting formatting expectations on the file 
being read in.

It's pretty identical with sockets or threading libraries in python. If you 
want a specific action done that isn't "normal" for just "make one of them", 
you put policy on it as a decoration. But if you wanted some specific action 
and don't use the decorator to do so, you don't get an error, you get what you 
get without the decorator.

> 
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
> 



From mark.hackett at metoffice.gov.uk  Thu Jan 24 11:47:17 2013
From: mark.hackett at metoffice.gov.uk (Mark Hackett)
Date: Thu, 24 Jan 2013 10:47:17 +0000
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <51007FCC.5090400@pearwood.info>
References: <1358903168.4767.4.camel@webb>
	<CADwdpyZ5THwJFo+huVMXGK3+aCaWYmYtCh3-2ucUp+sf4eEjgA@mail.gmail.com>
	<51007FCC.5090400@pearwood.info>
Message-ID: <201301241047.17391.mark.hackett@metoffice.gov.uk>

On Thursday 24 Jan 2013, Steven D'Aprano wrote:

> > I'm not sure this behavior merits the all-caps "AS EXPECTED" label.
> > It's not terribly surprising once you sit down and think about it, but
> > it's certainly at least a little unexpected to me that data is being
> > thrown away with no notice.  It's unusual for errors to pass silently
> > in python.
> 
> Yes, we should not forget that a CSV file is not a dict. Just because
>  DictReader is implemented with a dict as the storage, doesn't mean that it
>  should behave exactly like a dict in all things. Multiple columns with the
>  same name are legal in CSV, so there should be a reader for that
>  situation.
> 

But just because it's reading a csv file, we shouldn't change how a dictionary 
works if you add the same key again.

Duplicate headings in a csv file are as legal as using the same name for 
something else in a programming language.

e.g.

endvalue=a+b+c/5
...code using that result...
endvalue = os.printerr(file_descriptor)
...print out an error string...

this is "legal" but really REALLY smelly.

Similarly a multivalued csv file.

Excel uses the column ID not the name on the first row, to identify the columns 
in its macro language. Because otherwise which "endvalue" column did you mean?


From shane at umbrellacode.com  Thu Jan 24 12:55:05 2013
From: shane at umbrellacode.com (Shane Green)
Date: Thu, 24 Jan 2013 03:55:05 -0800
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <201301241047.17391.mark.hackett@metoffice.gov.uk>
References: <1358903168.4767.4.camel@webb>
	<CADwdpyZ5THwJFo+huVMXGK3+aCaWYmYtCh3-2ucUp+sf4eEjgA@mail.gmail.com>
	<51007FCC.5090400@pearwood.info>
	<201301241047.17391.mark.hackett@metoffice.gov.uk>
Message-ID: <F160A33B-D06E-44C9-B499-597EB6688416@umbrellacode.com>

Not sure if I'm reading the discussion correctly, but it sounds like there's discussion about whether to swallow CSV values when confronted with multiple columns with the same name, which seems very incorrect if so.  CSV doesn't even mandate column headers exist at all, as far as I know.  If anything I would think mapping column positions to header values would make sense, such that header.items() -> [(0, header1), (1, header2), (2, header3), etc.], and header1 and header2 could be equal.  To work with rows as dictionaries they can follow the FieldStorage model and have lists of values (either when there's a collision, or always) so all column values are contained.




Shane Green 
www.umbrellacode.com
805-452-9666 | shane at umbrellacode.com

On Jan 24, 2013, at 2:47 AM, Mark Hackett <mark.hackett at metoffice.gov.uk> wrote:

> On Thursday 24 Jan 2013, Steven D'Aprano wrote:
> 
>>> I'm not sure this behavior merits the all-caps "AS EXPECTED" label.
>>> It's not terribly surprising once you sit down and think about it, but
>>> it's certainly at least a little unexpected to me that data is being
>>> thrown away with no notice.  It's unusual for errors to pass silently
>>> in python.
>> 
>> Yes, we should not forget that a CSV file is not a dict. Just because
>> DictReader is implemented with a dict as the storage, doesn't mean that it
>> should behave exactly like a dict in all things. Multiple columns with the
>> same name are legal in CSV, so there should be a reader for that
>> situation.
>> 
> 
> But just because it's reading a csv file, we shouldn't change how a dictionary 
> works if you add the same key again.
> 
> Duplicate headings in a csv file are as legal as using the same name for 
> something else in a programming language.
> 
> e.g.
> 
> endvalue=a+b+c/5
> ...code using that result...
> endvalue = os.printerr(file_descriptor)
> ...print out an error string...
> 
> this is "legal" but really REALLY smelly.
> 
> Similarly a multivalued csv file.
> 
> Excel uses the column ID not the name on the first row, to identify the columns 
> in its macro language. Because otherwise which "endvalue" column did you mean?

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130124/c092d6af/attachment.html>

From ncoghlan at gmail.com  Thu Jan 24 13:33:07 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 24 Jan 2013 22:33:07 +1000
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <F160A33B-D06E-44C9-B499-597EB6688416@umbrellacode.com>
References: <1358903168.4767.4.camel@webb>
	<CADwdpyZ5THwJFo+huVMXGK3+aCaWYmYtCh3-2ucUp+sf4eEjgA@mail.gmail.com>
	<51007FCC.5090400@pearwood.info>
	<201301241047.17391.mark.hackett@metoffice.gov.uk>
	<F160A33B-D06E-44C9-B499-597EB6688416@umbrellacode.com>
Message-ID: <CADiSq7ftqeVFVYfwWBXs40GX87Yu5E9xOrrja8B+uE-xFhnzKg@mail.gmail.com>

On Thu, Jan 24, 2013 at 9:55 PM, Shane Green <shane at umbrellacode.com> wrote:
> Not sure if I'm reading the discussion correctly, but it sounds like there's
> discussion about whether to swallow CSV values when confronted with multiple
> columns with the same name, which seems very incorrect if so.  CSV doesn't
> even mandate column headers exist at all, as far as I know.  If anything I
> would think mapping column positions to header values would make sense, such
> that header.items() -> [(0, header1), (1, header2), (2, header3), etc.], and
> header1 and header2 could be equal.  To work with rows as dictionaries they
> can follow the FieldStorage model and have lists of values (either when
> there's a collision, or always) so all column values are contained.

That's not quite the discussion. The discussion is specifically about
*DictReader*, and whether it should:

1. Do any data conditioning by ignoring empty lines and lines of just
field delimiters before the header row (consensus seems to be "no")
2. Give an error when encountering a duplicate field name (which will
lead to data loss when reading from the file) (consensus seems to be
"yes")

The problem with the latter suggestion is that it's a backwards
incompatible change - code where "use the last column with that name"
is the correct behaviour currently works, but would be broken if that
situation was declared an error.

Rather than messing with DictReader, it seems more fruitful to further
investigate the idea of a namedtuple based reader
(http://bugs.python.org/issue1818). The "multiple columns with the
same name" use case seems specialised enough that the standard readers
can continue to ignore it (although, as noted earlier in this thread,
a namedtuple based reader will correctly reject duplicate column
names).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From solipsis at pitrou.net  Thu Jan 24 13:38:58 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 24 Jan 2013 13:38:58 +0100
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
References: <1358903168.4767.4.camel@webb>
	<CADwdpyZ5THwJFo+huVMXGK3+aCaWYmYtCh3-2ucUp+sf4eEjgA@mail.gmail.com>
	<51007FCC.5090400@pearwood.info>
	<201301241047.17391.mark.hackett@metoffice.gov.uk>
	<F160A33B-D06E-44C9-B499-597EB6688416@umbrellacode.com>
	<CADiSq7ftqeVFVYfwWBXs40GX87Yu5E9xOrrja8B+uE-xFhnzKg@mail.gmail.com>
Message-ID: <20130124133858.32622f6e@pitrou.net>

Le Thu, 24 Jan 2013 22:33:07 +1000,
Nick Coghlan <ncoghlan at gmail.com> a
écrit :
> On Thu, Jan 24, 2013 at 9:55 PM, Shane Green
> <shane at umbrellacode.com> wrote:
> > Not sure if I'm reading the discussion correctly, but it sounds
> > like there's discussion about whether to swallow CSV values when
> > confronted with multiple columns with the same name, which seems very
> > incorrect if so.  CSV doesn't even mandate column headers exist at
> > all, as far as I know.  If anything I would think mapping column
> > positions to header values would make sense, such that
> > header.items() -> [(0, header1), (1, header2), (2, header3), etc.],
> > and header1 and header2 could be equal.  To work with rows as
> > dictionaries they can follow the FieldStorage model and have lists
> > of values (either when there's a collision, or always) so all column
> > values are contained.
> 
> That's not quite the discussion. The discussion is specifically about
> *DictReader*, and whether it should:
> 
> 1. Do any data conditioning by ignoring empty lines and lines of just
> field delimiters before the header row (consensus seems to be "no")
> 2. Give an error when encountering a duplicate field name (which will
> lead to data loss when reading from the file) (consensus seems to be
> "yes")
> 
> The problem with the latter suggestion is that it's a backwards
> incompatible change - code where "use the last column with that name"
> is the correct behaviour currently works, but would be broken if that
> situation was declared an error.

It's not really a problem if the new behaviour is conditioned by a
constructor argument.

Regards

Antoine.




From eliben at gmail.com  Thu Jan 24 14:25:08 2013
From: eliben at gmail.com (Eli Bendersky)
Date: Thu, 24 Jan 2013 05:25:08 -0800
Subject: [Python-ideas] reducing multiprocessing.Queue contention
In-Reply-To: <CAH_1eM1Ws1MZ_raVP0JMocHz4XE9+MPObd30GH0q+6M4xyHAZw@mail.gmail.com>
References: <CAH_1eM35_aeOupBx09Pe8dAj-bUX40fs1Z3uj96-JiBY2G2S3w@mail.gmail.com>
	<CAF-Rda9QWxPZ8bZ7r1_dRpzbzcz9n+7fobB8H8Awe_CRMidwpQ@mail.gmail.com>
	<CAH_1eM1Ws1MZ_raVP0JMocHz4XE9+MPObd30GH0q+6M4xyHAZw@mail.gmail.com>
Message-ID: <CAF-Rda9yxN1WN2hnZ+tGioU9A1iK0LOWXf+Hg4sSjYUjHAqCWQ@mail.gmail.com>

On Wed, Jan 23, 2013 at 12:03 PM, Charles-François Natali <
cf.natali at gmail.com> wrote:

> > In general, this sounds good. There's indeed no reason to perform the
> > serialization under a lock.
> >
> > It would be great to have some measurements to see just how much it
> takes,
> > though.
>
> I was curious, so I wrote a quick and dirty patch (it's doesn't
> support timed get()/put(), so I won't post it here).
>
> I used the attached script as benchmark: basically, it just spawns a
> bunch of processes that put()/get() to a queue some data repeatedly
> (10000 times a list of 1024 ints), and returns when everything has
> been sent and received.
>
> The following tests have been made on an 8-cores box, from 1 reader/1
> writer up to 4 readers/4 writers (it would be interesting to see with
> only 1 writer and multiple readers, but readers would keep waiting for
> input so it requires another benchmark):
>
> Without patch:
> """
> $ ./python /tmp/multi_queue.py
> took 0.7993290424346924 seconds with 1 workers
> took 1.8892168998718262 seconds with 2 workers
> took 3.075777053833008 seconds with 3 workers
> took 4.050479888916016 seconds with 4 workers
> """
>
> With patch:
> """
> $ ./python /tmp/multi_queue.py
> took 0.7730131149291992 seconds with 1 workers
> took 0.7471320629119873 seconds with 2 workers
> took 0.752316951751709 seconds with 3 workers
> took 0.8303961753845215 seconds with 4 workers
> """
>
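
(For context, the idea being measured here, moving serialization out of the
critical section, can be sketched roughly as follows. This is not the actual
patch or the real multiprocessing.Queue, just an illustration of the principle
with an in-process queue:)

```python
import pickle
import threading

class SketchQueue:
    """Illustration only: serialize before taking the lock, so the
    expensive pickling step no longer runs inside the critical section."""

    def __init__(self):
        self._lock = threading.Lock()
        self._buffer = []

    def put(self, obj):
        data = pickle.dumps(obj)      # pickling happens outside the lock
        with self._lock:              # lock is held only for a cheap append
            self._buffer.append(data)

    def get(self):
        with self._lock:
            data = self._buffer.pop(0)
        return pickle.loads(data)     # unpickling outside the lock too
```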

Looks great, what are you waiting for :-)

Eli
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130124/e188a7c0/attachment.html>

From benoitc at gunicorn.org  Thu Jan 24 14:50:21 2013
From: benoitc at gunicorn.org (Benoit Chesneau)
Date: Thu, 24 Jan 2013 14:50:21 +0100
Subject: [Python-ideas] PEP 3156 - gunicorn worker
In-Reply-To: <36B67E59-ED7C-46E8-84DD-08E13C8CB5E0@gmail.com>
References: <36B67E59-ED7C-46E8-84DD-08E13C8CB5E0@gmail.com>
Message-ID: <B2B77F19-7883-4230-86A2-8FEA8C7C383F@gunicorn.org>


On Jan 24, 2013, at 4:50 AM, Nikolay Kim <fafhrd91 at gmail.com> wrote:

> Hello,
> 
> To get a feel for tulip I wrote a gunicorn worker and a websocket server; it is
> possible to run a wsgi app on top of it. Maybe someone will be interested.
> 
> gunicorn worker - https://github.com/fafhrd91/gtulip
> websocket server - https://github.com/fafhrd91/pyramid_sockjs2
> 
> 


Just tested also against a pure wsgi app:

$ gunicorn -w 3 -k gtulip.worker.TulipWorker test:app
2013-01-24 14:45:24 [55771] [INFO] Starting gunicorn 0.17.2
2013-01-24 14:45:24 [55771] [INFO] Listening at: http://127.0.0.1:8000 (55771)
2013-01-24 14:45:24 [55771] [INFO] Using worker: gtulip.worker.TulipWorker
2013-01-24 14:45:24 [55774] [INFO] Booting worker with pid: 55774
2013-01-24 14:45:24 [55775] [INFO] Booting worker with pid: 55775
2013-01-24 14:45:24 [55776] [INFO] Booting worker with pid: 55776


and it works great. Will do more test asap :)

- benoît



From jcd at sdf.lonestar.org  Thu Jan 24 16:11:34 2013
From: jcd at sdf.lonestar.org (J. Cliff Dyer)
Date: Thu, 24 Jan 2013 10:11:34 -0500
Subject: [Python-ideas] csv.DictReader could handle headers more
 intelligently.
In-Reply-To: <20130124133858.32622f6e@pitrou.net>
References: <1358903168.4767.4.camel@webb>
	<CADwdpyZ5THwJFo+huVMXGK3+aCaWYmYtCh3-2ucUp+sf4eEjgA@mail.gmail.com>
	<51007FCC.5090400@pearwood.info>
	<201301241047.17391.mark.hackett@metoffice.gov.uk>
	<F160A33B-D06E-44C9-B499-597EB6688416@umbrellacode.com>
	<CADiSq7ftqeVFVYfwWBXs40GX87Yu5E9xOrrja8B+uE-xFhnzKg@mail.gmail.com>
	<20130124133858.32622f6e@pitrou.net>
Message-ID: <1359040294.4802.29.camel@gdoba.domain.local>

On Thu, 2013-01-24 at 13:38 +0100, Antoine Pitrou wrote:
> > 1. Do any data conditioning by ignoring empty lines and lines of
> > just field delimiters before the header row (consensus seems to be
> > "no")

Well, I wouldn't necessarily say we have a consensus on this one.  This
idea received a +1 from Bruce Leban and an "I don't see any reason not
to" from Steven D'Aprano.

Objections are:

1. It's a backwards-incompatible change.  (This could be mitigated in a
couple ways, as with the duplicate header problem, below). I don't think
anyone has argued that programmers would ever actually want to use the
blank line as the headers, only that they may be doing it now as a
workaround, and breaking the workarounds is undesirable.

2. You should pre-process the CSV instead of adapting the reader to
malformations. (In which case, I think the DictReader.reader attribute
should be better documented, so programmers have some guidance how to do
the pre-processing, as the current DictReader can cause data loss which
would make it difficult to recover the real headers without using the
underlying reader).


> > 2. Give an error when encountering a duplicate field name (which 
> > will lead to data loss when reading from the file) (consensus seems
> > to be "yes") 

Mostly, but with a strong objection from Mark Hackett, and hesitation
about altering current behavior from Amaury Forgeot d'Arc.

Proposals to solve this problem:

1. Raise an exception (After setting the fieldnames, I think, so if you
wanted to catch and continue or catch and edit the conflicting
fieldnames, you could do so).

2. Combine multiple fields with the same header into a list under the
same key.

2a. Make lists when there are multiple fields, but otherwise, key to
strings as is currently done

2b. For consistency, make all values lists, regardless of the number of
columns.

Proposals for implementation:

1. Create a new Reader class.  Suggestions include
"CarefulDictReader" (for the version that raises an exception) and
"MultiDictReader" (for the versions that make lists of values).

2. Add an option to DictReader.  The idea to add an option for a
MultiDictReader-like behavior was objected to, but there were multiple
suggestions to add an option for raising an exception, in one case with
the idea that in the future ("Python 4") the option would be standard
behavior.


Note: If we were to implement a CarefulDictReader, it could, without
backward incompatibility, implement both skipping of blank header lines,
and exception raising on duplicate headers.
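For concreteness, a minimal sketch of what such a CarefulDictReader might
look like. It leans on the private _fieldnames attribute of the current
DictReader implementation, so this is illustrative only, not proposed
code:

```python
import csv
import io

class CarefulDictReader(csv.DictReader):
    """Sketch: treat the first non-empty, non-delimiter-only line as
    the header row, and raise if any field name appears twice."""

    @property
    def fieldnames(self):
        if self._fieldnames is None:
            # Skip rows that are empty or all-empty-cells (a line of
            # just delimiters parses as a row of empty strings).
            for row in self.reader:
                if any(cell.strip() for cell in row):
                    self._fieldnames = row
                    break
        if self._fieldnames is not None:
            dupes = sorted({name for name in self._fieldnames
                            if self._fieldnames.count(name) > 1})
            if dupes:
                raise ValueError("duplicate field names: %r" % dupes)
        self.line_num = self.reader.line_num
        return self._fieldnames
```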

Cheers,
Cliff




From jcd at sdf.lonestar.org  Thu Jan 24 16:23:24 2013
From: jcd at sdf.lonestar.org (J. Cliff Dyer)
Date: Thu, 24 Jan 2013 10:23:24 -0500
Subject: [Python-ideas] csv.DictReader could handle headers more
 intelligently.
In-Reply-To: <CADiSq7ftqeVFVYfwWBXs40GX87Yu5E9xOrrja8B+uE-xFhnzKg@mail.gmail.com>
References: <1358903168.4767.4.camel@webb>
	<CADwdpyZ5THwJFo+huVMXGK3+aCaWYmYtCh3-2ucUp+sf4eEjgA@mail.gmail.com>
	<51007FCC.5090400@pearwood.info>
	<201301241047.17391.mark.hackett@metoffice.gov.uk>
	<F160A33B-D06E-44C9-B499-597EB6688416@umbrellacode.com>
	<CADiSq7ftqeVFVYfwWBXs40GX87Yu5E9xOrrja8B+uE-xFhnzKg@mail.gmail.com>
Message-ID: <1359041004.4802.32.camel@gdoba.domain.local>

On Thu, 2013-01-24 at 22:33 +1000, Nick Coghlan wrote:
> The problem with the latter suggestion is that it's a backwards
> incompatible change - code where "use the last column with that name"
> is the correct behaviour currently works, but would be broken if that
> situation was declared an error. 

One example where a programmer would legitimately want to ignore errors
of this kind: A CSV file has a number of named columns, and a few
unnamed ones, and the programmer doesn't care about data from the
unnamed columns.  The unnamed columns all have the same name (''), and
would raise this exception.  Hence the need to be able to suppress it
somehow (e.g., by instantiation argument or by catching the exception)
without losing the fieldnames.
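The data loss in question is easy to demonstrate with today's
DictReader: columns that share a name (here the empty string) silently
collapse to the last value:

```python
import csv
import io

# Three columns, two of them unnamed: both map to the key ''.
data = "name,,\nAlice,x,y\n"
row = next(csv.DictReader(io.StringIO(data)))
# dict(zip(fieldnames, row)) keeps only the last duplicate,
# so the 'x' column is silently lost: row[''] == 'y'
```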

Cheers,
Cliff






From rosuav at gmail.com  Thu Jan 24 16:24:23 2013
From: rosuav at gmail.com (Chris Angelico)
Date: Fri, 25 Jan 2013 02:24:23 +1100
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <1359040294.4802.29.camel@gdoba.domain.local>
References: <1358903168.4767.4.camel@webb>
	<CADwdpyZ5THwJFo+huVMXGK3+aCaWYmYtCh3-2ucUp+sf4eEjgA@mail.gmail.com>
	<51007FCC.5090400@pearwood.info>
	<201301241047.17391.mark.hackett@metoffice.gov.uk>
	<F160A33B-D06E-44C9-B499-597EB6688416@umbrellacode.com>
	<CADiSq7ftqeVFVYfwWBXs40GX87Yu5E9xOrrja8B+uE-xFhnzKg@mail.gmail.com>
	<20130124133858.32622f6e@pitrou.net>
	<1359040294.4802.29.camel@gdoba.domain.local>
Message-ID: <CAPTjJmqrDiuGwON9Qr-dcZp=cf9Qs635h-PwVZRYFkj++yzaPA@mail.gmail.com>

On Fri, Jan 25, 2013 at 2:11 AM, J. Cliff Dyer <jcd at sdf.lonestar.org> wrote:
> On Thu, 2013-01-24 at 13:38 +0100, Antoine Pitrou wrote:
>> > 1. Do any data conditioning by ignoring empty lines and lines of
>> > just field delimiters before the header row (consensus seems to be
>> > "no")
>
> Well, I wouldn't necessarily say we have a consensus on this one.  This
> idea received a +1 from Bruce Leban and an "I don't see any reason not
> to" from Steven D'Aprano.

I've been lurking this thread, but fwiw, I'd put +1 on ignoring empty
lines/just delimiter lines. For a row of column headers, a completely
blank line makes no sense. It's a backward-incompatible change, yes,
but I can't imagine any code actively relying on this. ISTM this would
probably be safe for a minor release (Python 3.4), though of course
not for Python 3.3.1.

ChrisA


From shane at umbrellacode.com  Thu Jan 24 16:28:49 2013
From: shane at umbrellacode.com (Shane Green)
Date: Thu, 24 Jan 2013 07:28:49 -0800
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <1359040294.4802.29.camel@gdoba.domain.local>
References: <1358903168.4767.4.camel@webb>
	<CADwdpyZ5THwJFo+huVMXGK3+aCaWYmYtCh3-2ucUp+sf4eEjgA@mail.gmail.com>
	<51007FCC.5090400@pearwood.info>
	<201301241047.17391.mark.hackett@metoffice.gov.uk>
	<F160A33B-D06E-44C9-B499-597EB6688416@umbrellacode.com>
	<CADiSq7ftqeVFVYfwWBXs40GX87Yu5E9xOrrja8B+uE-xFhnzKg@mail.gmail.com>
	<20130124133858.32622f6e@pitrou.net>
	<1359040294.4802.29.camel@gdoba.domain.local>
Message-ID: <2E02A587-DC04-4385-8A36-C41160E33E98@umbrellacode.com>

Since every form of CSV file counts EOL as a line terminator, I think discarding empty lines preceding the headers is arguably acceptable, but do not think discarding lines of just delimiters would be.  What about extending the DictReader API so it was easy to perform these actions explicitly, such as being able to discard() the field names to be re-evaluated on the next line?
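To make the idea concrete, here is an entirely hypothetical sketch of
such a discard() method (no such API exists; like any quick subclass it
pokes at DictReader's private _fieldnames attribute):

```python
import csv
import io

class RediscoverableDictReader(csv.DictReader):
    """Hypothetical sketch of the discard() idea: forget the current
    field names so they are re-evaluated from the next line read."""
    def discard(self):
        self._fieldnames = None  # relies on DictReader internals

data = "junk,junk\nname,age\nEve,5\n"
reader = RediscoverableDictReader(io.StringIO(data))
reader.fieldnames       # evaluates 'junk,junk' as the header
reader.discard()        # throw those field names away...
reader.fieldnames       # ...and re-evaluate on the next line
```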





Shane Green 
www.umbrellacode.com
805-452-9666 | shane at umbrellacode.com

On Jan 24, 2013, at 7:11 AM, "J. Cliff Dyer" <jcd at sdf.lonestar.org> wrote:

> On Thu, 2013-01-24 at 13:38 +0100, Antoine Pitrou wrote:
>>> 1. Do any data conditioning by ignoring empty lines and lines of
>>> just field delimiters before the header row (consensus seems to be
>>> "no")
> 
> Well, I wouldn't necessarily say we have a consensus on this one.  This
> idea received a +1 from Bruce Leban and an "I don't see any reason not
> to" from Steven D'Aprano.
> 
> Objections are:
> 
> 1. It's a backwards-incompatible change.  (This could be mitigated in a
> couple ways, as with the duplicate header problem, below). I don't think
> anyone has argued that programmers would ever actually want to use the
> blank line as the headers, only that they may be doing it now as a
> workaround, and breaking the workarounds is undesirable.
> 
> 2. You should pre-process the CSV instead of adapting the reader to
> malformations. (In which case, I think the DictReader.reader attribute
> should be better documented, so programmers have some guidance how to do
> the pre-processing, as the current DictReader can cause data loss which
> would make it difficult to recover the real headers without using the
> underlying reader).
> 
> 
>>> 2. Give an error when encountering a duplicate field name (which 
>>> will lead to data loss when reading from the file) (consensus seems
>>> to be "yes") 
> 
> Mostly, but with a strong objection from Mark Hackett, and hesitation
> about altering current behavior from Amaury Forgeot d'Arc.
> 
> Proposals to solve this problem:
> 
> 1. Raise an exception (After setting the fieldnames, I think, so if you
> wanted to catch and continue or catch and edit the conflicting
> fieldnames, you could do so).
> 
> 2. Combine multiple fields with the same header into a list under the
> same key.
> 
> 2a. Make lists when there are multiple fields, but otherwise, key to
> strings as is currently done
> 
> 2b. For consistency, make all values lists, regardless of the number of
> columns.
> 
> Proposals for implementation:
> 
> 1. Create a new Reader class.  Suggestions include
> "CarefulDictReader" (for the version that raises an exception) and
> "MultiDictReader" (for the versions that make lists of values).
> 
> 2. Add an option to DictReader.  The idea to add an option for a
> MultiDictReader-like behavior was objected to, but there were multiple
> suggestions to add an option for raising an exception, in one case with
> the idea that in the future ("Python 4") the option would be standard
> behavior.
> 
> 
> Note: If we were to implement a CarefulDictReader, it could, without
> backward incompatibility, implement both skipping of blank header lines,
> and exception raising on duplicate headers.
> 
> Cheers,
> Cliff
> 
> 
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130124/4dc25083/attachment.html>

From mark.hackett at metoffice.gov.uk  Thu Jan 24 16:29:19 2013
From: mark.hackett at metoffice.gov.uk (Mark Hackett)
Date: Thu, 24 Jan 2013 15:29:19 +0000
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <1359040294.4802.29.camel@gdoba.domain.local>
References: <1358903168.4767.4.camel@webb> <20130124133858.32622f6e@pitrou.net>
	<1359040294.4802.29.camel@gdoba.domain.local>
Message-ID: <201301241529.19304.mark.hackett@metoffice.gov.uk>

On Thursday 24 Jan 2013, J. Cliff Dyer wrote:
> > > 2. Give an error when encountering a duplicate field name (which 
> > > will lead to data loss when reading from the file) (consensus seems
> > > to be "yes") 
> 
> Mostly, but with a strong objection from Mark Hackett, and hesitation
> about altering current behavior from Amaury Forgeot d'Arc.
> 


More along the lines of your earlier:

> 1. It's a backwards-incompatible change.

strong objection. :-)

Programs that had been working will stop. Programs that won't work because it 
doesn't throw an exception yet are no worse off.

When you change something, you'll hear almost entirely from those for whom the 
change will be useful. From those who will find it an obstacle, you don't hear 
until it's implemented.

Requiring catching an exception means that until the code is changed, your 
working program no longer works.

And as you later point out Cliff, empty and uninteresting field names may 
legitimately exist and WANT to be ignored.

So although I CAN see a reasoning for an exception, I do not see it as enough 
to put it in this version of the library. It's a learning process, and for the 
next version, which will need code changes to adopt anyway, that knowledge can 
be used to make things better *next time*.


From jcd at sdf.lonestar.org  Thu Jan 24 16:55:17 2013
From: jcd at sdf.lonestar.org (J. Cliff Dyer)
Date: Thu, 24 Jan 2013 10:55:17 -0500
Subject: [Python-ideas] csv.DictReader could handle headers more
 intelligently.
In-Reply-To: <201301241529.19304.mark.hackett@metoffice.gov.uk>
References: <1358903168.4767.4.camel@webb> <20130124133858.32622f6e@pitrou.net>
	<1359040294.4802.29.camel@gdoba.domain.local>
	<201301241529.19304.mark.hackett@metoffice.gov.uk>
Message-ID: <1359042917.4802.39.camel@gdoba.domain.local>

On Thu, 2013-01-24 at 15:29 +0000, Mark Hackett wrote:
> On Thursday 24 Jan 2013, J. Cliff Dyer wrote:
> > > > 2. Give an error when encountering a duplicate field name (which 
> > > > will lead to data loss when reading from the file) (consensus seems
> > > > to be "yes") 
> > 
> > Mostly, but with a strong objection from Mark Hackett, and hesitation
> > about altering current behavior from Amaury Forgeot d'Arc.
> > 
> 
> 
> More along the lines of your earlier:
> 
> > 1. It's a backwards-incompatible change.
> 
> strong objection. :-)
> 
> Programs that had been working will stop. Programs that won't work because it 
> doesn't throw an exception yet are no worse off.
> 

Noted.  I will say that this doesn't seem any worse than other
backwards-incompatible changes, which are sometimes allowed, so it should
probably be judged by the same standard.

That said, what are your feelings on adding a CarefulDictReader?  



From jcd at sdf.lonestar.org  Thu Jan 24 17:08:16 2013
From: jcd at sdf.lonestar.org (J. Cliff Dyer)
Date: Thu, 24 Jan 2013 11:08:16 -0500
Subject: [Python-ideas] csv.DictReader could handle headers more
 intelligently.
In-Reply-To: <2E02A587-DC04-4385-8A36-C41160E33E98@umbrellacode.com>
References: <1358903168.4767.4.camel@webb>
	<CADwdpyZ5THwJFo+huVMXGK3+aCaWYmYtCh3-2ucUp+sf4eEjgA@mail.gmail.com>
	<51007FCC.5090400@pearwood.info>
	<201301241047.17391.mark.hackett@metoffice.gov.uk>
	<F160A33B-D06E-44C9-B499-597EB6688416@umbrellacode.com>
	<CADiSq7ftqeVFVYfwWBXs40GX87Yu5E9xOrrja8B+uE-xFhnzKg@mail.gmail.com>
	<20130124133858.32622f6e@pitrou.net>
	<1359040294.4802.29.camel@gdoba.domain.local>
	<2E02A587-DC04-4385-8A36-C41160E33E98@umbrellacode.com>
Message-ID: <1359043696.4802.42.camel@gdoba.domain.local>

On Thu, 2013-01-24 at 07:28 -0800, Shane Green wrote:
> Since every form of CSV file counts EOL as a line terminator, I think
> discarding empty lines preceding the headers is arguably acceptable,
> but do not think discarding lines of just delimiters would be.  What
> about extending the DictReader API so it was easy to perform these
> actions explicitly, such as being able to discard() the field names to
> be re-evaluated on the next line?

I think I like this idea.  There's something a little distasteful about
making the user manually delve into the underlying reader, but this
makes it more user-friendly and more obvious how to proceed.

For clarity's sake, what is your objection to discarding lines of
delimiters?  The reason I suggest doing it is that it is a common output
situation when exporting Excel files or LibreCalc files that have a
blank row at the top.

Cheers,
Cliff




From ubershmekel at gmail.com  Thu Jan 24 17:08:34 2013
From: ubershmekel at gmail.com (Yuval Greenfield)
Date: Thu, 24 Jan 2013 18:08:34 +0200
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <1359040294.4802.29.camel@gdoba.domain.local>
References: <1358903168.4767.4.camel@webb>
	<CADwdpyZ5THwJFo+huVMXGK3+aCaWYmYtCh3-2ucUp+sf4eEjgA@mail.gmail.com>
	<51007FCC.5090400@pearwood.info>
	<201301241047.17391.mark.hackett@metoffice.gov.uk>
	<F160A33B-D06E-44C9-B499-597EB6688416@umbrellacode.com>
	<CADiSq7ftqeVFVYfwWBXs40GX87Yu5E9xOrrja8B+uE-xFhnzKg@mail.gmail.com>
	<20130124133858.32622f6e@pitrou.net>
	<1359040294.4802.29.camel@gdoba.domain.local>
Message-ID: <CANSw7KyKj80_DjjzW56HqxrVQyHkpPfGxqiVZcmaL5bf8DtX9w@mail.gmail.com>

On Thu, Jan 24, 2013 at 5:11 PM, J. Cliff Dyer <jcd at sdf.lonestar.org> wrote:

> On Thu, 2013-01-24 at 13:38 +0100, Antoine Pitrou wrote:
> > > 1. Do any data conditioning by ignoring empty lines and lines of
> > > just field delimiters before the header row (consensus seems to be
> > > "no")
>
> Well, I wouldn't necessarily say we have a consensus on this one.  This
> idea received a +1 from Bruce Leban and an "I don't see any reason not
> to" from Steven D'Aprano.
>
>
Count me in that list as well.

If it were urllib handling a special case for a server you don't control,
then fine. But it's a valid CSV file that you can process yourself if you need
more control. We should keep DictReader simple. This is also a reason
against "CarefulDictReader". If you need to be more specific then use
csv.Reader.


>
> > > 2. Give an error when encountering a duplicate field name (which
> > > will lead to data loss when reading from the file) (consensus seems
> > > to be "yes")
>
> Mostly, but with a strong objection from Mark Hackett, and hesitation
> about altering current behavior from Amaury Forgeot d'Arc.
>

In that one too.

Maybe we should ask the people on this list
http://hg.python.org/cpython/log/5b02d622d625/Lib/csv.py

Yuval
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130124/d4b74000/attachment.html>

From ubershmekel at gmail.com  Thu Jan 24 17:09:28 2013
From: ubershmekel at gmail.com (Yuval Greenfield)
Date: Thu, 24 Jan 2013 18:09:28 +0200
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <CANSw7KyKj80_DjjzW56HqxrVQyHkpPfGxqiVZcmaL5bf8DtX9w@mail.gmail.com>
References: <1358903168.4767.4.camel@webb>
	<CADwdpyZ5THwJFo+huVMXGK3+aCaWYmYtCh3-2ucUp+sf4eEjgA@mail.gmail.com>
	<51007FCC.5090400@pearwood.info>
	<201301241047.17391.mark.hackett@metoffice.gov.uk>
	<F160A33B-D06E-44C9-B499-597EB6688416@umbrellacode.com>
	<CADiSq7ftqeVFVYfwWBXs40GX87Yu5E9xOrrja8B+uE-xFhnzKg@mail.gmail.com>
	<20130124133858.32622f6e@pitrou.net>
	<1359040294.4802.29.camel@gdoba.domain.local>
	<CANSw7KyKj80_DjjzW56HqxrVQyHkpPfGxqiVZcmaL5bf8DtX9w@mail.gmail.com>
Message-ID: <CANSw7Kyeb4nS45vTePLZHG1r+-8LHECC9yJ1BbvkkdswyddQqQ@mail.gmail.com>

To clarify - I agree with the aforementioned "consensus".
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130124/2983f07b/attachment.html>

From python at mrabarnett.plus.com  Thu Jan 24 17:12:09 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Thu, 24 Jan 2013 16:12:09 +0000
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <CAPTjJmqrDiuGwON9Qr-dcZp=cf9Qs635h-PwVZRYFkj++yzaPA@mail.gmail.com>
References: <1358903168.4767.4.camel@webb>
	<CADwdpyZ5THwJFo+huVMXGK3+aCaWYmYtCh3-2ucUp+sf4eEjgA@mail.gmail.com>
	<51007FCC.5090400@pearwood.info>
	<201301241047.17391.mark.hackett@metoffice.gov.uk>
	<F160A33B-D06E-44C9-B499-597EB6688416@umbrellacode.com>
	<CADiSq7ftqeVFVYfwWBXs40GX87Yu5E9xOrrja8B+uE-xFhnzKg@mail.gmail.com>
	<20130124133858.32622f6e@pitrou.net>
	<1359040294.4802.29.camel@gdoba.domain.local>
	<CAPTjJmqrDiuGwON9Qr-dcZp=cf9Qs635h-PwVZRYFkj++yzaPA@mail.gmail.com>
Message-ID: <51015D59.6030409@mrabarnett.plus.com>

On 2013-01-24 15:24, Chris Angelico wrote:
> On Fri, Jan 25, 2013 at 2:11 AM, J. Cliff Dyer <jcd at sdf.lonestar.org> wrote:
>> On Thu, 2013-01-24 at 13:38 +0100, Antoine Pitrou wrote:
>>> > 1. Do any data conditioning by ignoring empty lines and lines of
>>> > just field delimiters before the header row (consensus seems to be
>>> > "no")
>>
>> Well, I wouldn't necessarily say we have a consensus on this one.  This
>> idea received a +1 from Bruce Leban and an "I don't see any reason not
>> to" from Steven D'Aprano.
>
> I've been lurking this thread, but fwiw, I'd put +1 on ignoring empty
> lines/just delimiter lines. For a row of column headers, a completely
> blank line makes no sense. It's a backward-incompatible change, yes,
> but I can't imagine any code actively relying on this. ISTM this would
> probably be safe for a minor release (Python 3.4), though of course
> not for Python 3.3.1.
>
Ignoring empty lines before a header seems OK to me, but ignoring
just-delimiter lines doesn't.

To me, a just-delimiter line where it's expecting a header would mean
that all of the columns are unnamed, unless we insist that it's not a
header unless at least one column is named, and I don't think that that
should be the default behaviour.

As for duplicated columns names, I think that it should probably raise
an exception unless you've specified that duplicates should be put into
a list.



From mark.hackett at metoffice.gov.uk  Thu Jan 24 17:23:50 2013
From: mark.hackett at metoffice.gov.uk (Mark Hackett)
Date: Thu, 24 Jan 2013 16:23:50 +0000
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <1359043696.4802.42.camel@gdoba.domain.local>
References: <1358903168.4767.4.camel@webb>
	<2E02A587-DC04-4385-8A36-C41160E33E98@umbrellacode.com>
	<1359043696.4802.42.camel@gdoba.domain.local>
Message-ID: <201301241623.50279.mark.hackett@metoffice.gov.uk>

On Thursday 24 Jan 2013, J. Cliff Dyer wrote:
> For clarity's sake, what is your objection to discarding lines of
> delimiters?  The reason I suggest doing it is that it is a common output
> situation when exporting Excel files or LibreCalc files that have a
> blank row at the top.
> 
> Cheers,
> Cliff
> 

I'm putting too many pennies in this pot, I feel, but...

What was the purpose of those blank lines? Like duplicate column names at the 
first row, what you need to do with them depends on why they are there and what 
the program using the output wants to do.

If someone took the repository of macros from the spreadsheet which used 
column numbers and this was used to recreate EXACTLY whatever calculations 
were done without having to keep two copies of the same algorithm to account 
for the dropping of rows in the script, then dropping the rows would break 
this.

This really is policy (wrt the source of the CSV and the consumer of the 
dictionary).

Make it a pre process of the CSV to be used and configured to fit what the 
meaning of the CSV file output was to the producing program and what bits of it 
make a difference to the consumer of the dictionary's contents.


From jcd at sdf.lonestar.org  Thu Jan 24 17:40:09 2013
From: jcd at sdf.lonestar.org (J. Cliff Dyer)
Date: Thu, 24 Jan 2013 11:40:09 -0500
Subject: [Python-ideas] csv.DictReader could handle headers more
 intelligently.
In-Reply-To: <201301241617.58727.mark.hackett@metoffice.gov.uk>
References: <1358903168.4767.4.camel@webb>
	<201301241529.19304.mark.hackett@metoffice.gov.uk>
	<1359042917.4802.39.camel@gdoba.domain.local>
	<201301241617.58727.mark.hackett@metoffice.gov.uk>
Message-ID: <1359045609.4802.57.camel@gdoba.domain.local>

On Thu, 2013-01-24 at 16:17 +0000, Mark Hackett wrote:
> > 
> > That said, what are your feelings on adding a CarefulDictReader?
> > 
> 
> It's as good a solution to me as any.
> 
> However, I'm not that good a programmer, and therefore what *I'd* do
> isn't 
> necessarily a good idea, it's just one of the better ones out of the
> limited 
> toolbox I have available.
> 
> I'd prefer (for aesthetic reasons) some sort of stream converter. Much
> like 
> freeze/thaw serialisation of data, it'd be a step between the raw csv
> and the 
> reader that reads it.
> 
> 

I think my reason for wanting to have a CarefulDictReader (or a careful
DictReader), and why I think a stream converter isn't the best solution,
is that CSVs are very commonly used by people just starting to get their
feet wet with programming.  Consider the use case: I've got my excel
file, and I'm just getting to the point where excel isn't cutting it
anymore.  I want to start manipulating my data with python, and everyone
is telling me to use the csv library.  DictReader sounds cool, because I
don't want to have to remember column numbers, and this is going to make my
code much more readable.  But I can't make it read my headers simply
because I put some blank space at the top of my excel file, above my
headers.  

A stream converter is another layer of complexity that keeps this
potential new programmer from having a good experience with programming,
for what gain?  So that the csv library can "properly" (?) treat a line
without data as a header?  I think it would be fully reasonable (and add
little to no complexity to the code) to have a DictReader that treats
the first non-empty line as the header row.

The csv module is one of the big gateways into python programming for a
lot of people.  That's also one of the reasons I think the sockets
library is a poor analogue here.  A new programmer is unlikely to reach
the sockets library until they've been through a few of the urllibs, the
httplibs, requests, some part of http or an external web framework,
smtplib, or other higher-level networking-related libraries.

For the same reason, I think if the solution isn't something handled
automatically by the library, it needs to be accompanied by improvements
to the documentation.  If we're going to provide a DictReader that is
this easy to break, we need to answer the question: How do I fix it?  


Cheers,
Cliff




From jcd at sdf.lonestar.org  Thu Jan 24 17:41:07 2013
From: jcd at sdf.lonestar.org (J. Cliff Dyer)
Date: Thu, 24 Jan 2013 11:41:07 -0500
Subject: [Python-ideas] csv.DictReader could handle headers more
 intelligently.
In-Reply-To: <201301241623.50279.mark.hackett@metoffice.gov.uk>
References: <1358903168.4767.4.camel@webb>
	<2E02A587-DC04-4385-8A36-C41160E33E98@umbrellacode.com>
	<1359043696.4802.42.camel@gdoba.domain.local>
	<201301241623.50279.mark.hackett@metoffice.gov.uk>
Message-ID: <1359045667.4802.58.camel@gdoba.domain.local>

On Thu, 2013-01-24 at 16:23 +0000, Mark Hackett wrote:

> If someone took the repository of macros from the spreadsheet which used 
> column numbers and this was used to recreate EXACTLY whatever calculations 
> were done without having to keep two copies of the same algorithm to account 
> for the dropping of rows in the script, then dropping the rows would break 
> this.
> 

If that's the case, then why are you using a DictReader instead of a raw
csv.reader?  You're already losing the first row.



From shane at umbrellacode.com  Thu Jan 24 17:41:40 2013
From: shane at umbrellacode.com (Shane Green)
Date: Thu, 24 Jan 2013 08:41:40 -0800
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <1359043696.4802.42.camel@gdoba.domain.local>
References: <1358903168.4767.4.camel@webb>
	<CADwdpyZ5THwJFo+huVMXGK3+aCaWYmYtCh3-2ucUp+sf4eEjgA@mail.gmail.com>
	<51007FCC.5090400@pearwood.info>
	<201301241047.17391.mark.hackett@metoffice.gov.uk>
	<F160A33B-D06E-44C9-B499-597EB6688416@umbrellacode.com>
	<CADiSq7ftqeVFVYfwWBXs40GX87Yu5E9xOrrja8B+uE-xFhnzKg@mail.gmail.com>
	<20130124133858.32622f6e@pitrou.net>
	<1359040294.4802.29.camel@gdoba.domain.local>
	<2E02A587-DC04-4385-8A36-C41160E33E98@umbrellacode.com>
	<1359043696.4802.42.camel@gdoba.domain.local>
Message-ID: <DB88DF77-A07B-4EE0-ABD1-319E5FBF2C1A@umbrellacode.com>

Well, my objection to doing it automatically was based in part on not being familiar with the common scenarios you've brought up, but the other reasons I had in mind were that it seemed like the kind of thing that might also be indicative of an error: something wrong with the data someone might want to know was happening rather than have masked; and also because discarding such rows leaves a question about the delimiter: it's now known, but knowing it based on rows we've discarded seems unclean.





Shane Green 
www.umbrellacode.com
805-452-9666 | shane at umbrellacode.com

On Jan 24, 2013, at 8:08 AM, "J. Cliff Dyer" <jcd at sdf.lonestar.org> wrote:

> On Thu, 2013-01-24 at 07:28 -0800, Shane Green wrote:
>> Since every form of CSV file counts EOL as a line terminator, I think
>> discarding empty lines preceding the headers is arguably acceptable,
>> but do not think discarding lines of just delimiters would be.  What
>> about extending the DictReader API so it was easy to perform these
>> actions explicitly, such as being able to discard() the field names to
>> be re-evaluated on the next line?
> 
> I think I like this idea.  There's something a little distasteful about
> making the user manually delve into the underlying reader, but this
> makes it more user-friendly and more obvious how to proceed.
> 
> For clarity's sake, what is your objection to discarding lines of
> delimiters?  The reason I suggest doing it is that it is a common output
> situation when exporting Excel files or LibreCalc files that have a
> blank row at the top.
> 
> Cheers,
> Cliff
> 
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130124/9829b159/attachment.html>

From guido at python.org  Thu Jan 24 19:23:40 2013
From: guido at python.org (Guido van Rossum)
Date: Thu, 24 Jan 2013 10:23:40 -0800
Subject: [Python-ideas] PEP 3156: getting the socket or peer name from the
	transport
Message-ID: <CAP7+vJ+_QwQW13b1mE1NSg8Z89MyYwHkyYmLwhumGo0iO-V+wA@mail.gmail.com>

A pragmatic question popped up: sometimes the protocol would like to
know the name of the socket or its peer, i.e. call getsockname() or
getpeername() on the underlying socket. (I can imagine wanting to log
this, or do some kind of IP address blocking.)

What should the interface for this look like? I can think of several ways:

A) An API to return the underlying socket, if there is one. (In the
case of a stack of transports and protocols there may not be one, so
it may return None.) Downside is that it requires the transport to use
sockets -- if it were to use some native Windows API there might not
be a socket object even though there might be an IP connection with
easily-accessible address and peer.

B) An API to get the address and peer address; e.g.
transport.getsockname() and transport.getpeername(). These would call
the corresponding call on the underlying socket, if there is one, or
return None otherwise; IP transports that don't use sockets would be
free to retrieve and return the requested information in a
platform-specific way. Note that the address may take different forms;
e.g. for AF_UNIX sockets it is a filename, so the protocol must be
prepared for different formats.

C) Similar to (A) or (B), but putting the API in an abstract subclass
of Transport (e.g. SocketTransport) so that a transport that doesn't
have this doesn't need to implement dummy methods returning None -- it
is now the protocol's responsibility to check for
isinstance(transport, SocketTransport) before calling the method. I'm
not so keen on this, Twisted has shown (IMO) that a deep hierarchy of
interfaces or ABCs does not necessarily provide clarity.
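To make (B) concrete, a rough sketch (illustrative only; the class names
and the None fallbacks are assumptions drawn from the description above,
not a settled API):

```python
import socket

class Transport:
    """Sketch of option (B): address accessors with None fallbacks
    for transports that have no underlying socket."""
    def getsockname(self):
        return None
    def getpeername(self):
        return None

class SocketTransport(Transport):
    """A transport backed by a real socket delegates to it."""
    def __init__(self, sock):
        self._sock = sock
    def getsockname(self):
        return self._sock.getsockname()
    def getpeername(self):
        return self._sock.getpeername()

# A protocol can log the peer without caring how the transport is built:
def log_peer(transport):
    peer = transport.getpeername()
    return "peer unknown" if peer is None else "peer %s" % (peer,)
```

Note that the returned address form still varies by family (a filename
for AF_UNIX, a (host, port) tuple for AF_INET), so the protocol has to
be prepared for that either way.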

Discussion?

-- 
--Guido van Rossum (python.org/~guido)


From fafhrd91 at gmail.com  Thu Jan 24 19:41:51 2013
From: fafhrd91 at gmail.com (Nikolay Kim)
Date: Thu, 24 Jan 2013 10:41:51 -0800
Subject: [Python-ideas] PEP 3156: getting the socket or peer name from
	the transport
In-Reply-To: <CAP7+vJ+_QwQW13b1mE1NSg8Z89MyYwHkyYmLwhumGo0iO-V+wA@mail.gmail.com>
References: <CAP7+vJ+_QwQW13b1mE1NSg8Z89MyYwHkyYmLwhumGo0iO-V+wA@mail.gmail.com>
Message-ID: <40C339D1-1B7B-4D4B-978C-96D4571E2DFF@gmail.com>


On Jan 24, 2013, at 10:23 AM, Guido van Rossum <guido at python.org> wrote:


> C) Similar to (A) or (B), but putting the API in an abstract subclass
> of Transport (e.g. SocketTransport) so that a transport that doesn't
> have this doesn't need to implement dummy methods returning None -- it
> is now the protocol's responsibility to check for
> isinstance(transport, SocketTransport) before calling the method. I'm
> not so keen on this, Twisted has shown (IMO) that a deep hierarchy of
> interfaces or ABCs does not necessarily provide clarity.
> 

SocketTransport could be abstract just like Transport class, just for description purpose.

Another question, should we expect ability to use protocols on top of different 
transports (i.e. HTTPProtocol and UnixSubprocessTransport) ?



From guido at python.org  Thu Jan 24 19:44:48 2013
From: guido at python.org (Guido van Rossum)
Date: Thu, 24 Jan 2013 10:44:48 -0800
Subject: [Python-ideas] PEP 3156: getting the socket or peer name from
	the transport
In-Reply-To: <40C339D1-1B7B-4D4B-978C-96D4571E2DFF@gmail.com>
References: <CAP7+vJ+_QwQW13b1mE1NSg8Z89MyYwHkyYmLwhumGo0iO-V+wA@mail.gmail.com>
	<40C339D1-1B7B-4D4B-978C-96D4571E2DFF@gmail.com>
Message-ID: <CAP7+vJLUmtE2rDMobw78t8cwGp5hvm0v1hi7OWMvVpY1EdDDdg@mail.gmail.com>

On Thu, Jan 24, 2013 at 10:41 AM, Nikolay Kim <fafhrd91 at gmail.com> wrote:
>
> On Jan 24, 2013, at 10:23 AM, Guido van Rossum <guido at python.org> wrote:
>
>
>> C) Similar to (A) or (B), but putting the API in an abstract subclass
>> of Transport (e.g. SocketTransport) so that a transport that doesn't
>> have this doesn't need to implement dummy methods returning None -- it
>> is now the protocol's responsibility to check for
>> isinstance(transport, SocketTransport) before calling the method. I'm
>> not so keen on this, Twisted has shown (IMO) that a deep hierarchy of
>> interfaces or ABCs does not necessarily provide clarity.

> SocketTransport could be abstract just like Transport class, just for description purpose.

Yes, but I'm arguing against this. :-)

> Another question, should we expect ability to use protocols on top of different
> transports (i.e. HTTPProtocol and UnixSubprocessTransport) ?

Yes, it should be possible, for example the subprocess might implement
some kind of custom tunnel. If in this case there's no way to get the
socket or peer name, or if the names aren't very useful, that's okay.

-- 
--Guido van Rossum (python.org/~guido)


From ubershmekel at gmail.com  Thu Jan 24 19:45:17 2013
From: ubershmekel at gmail.com (Yuval Greenfield)
Date: Thu, 24 Jan 2013 20:45:17 +0200
Subject: [Python-ideas] PEP 3156: getting the socket or peer name from
	the transport
In-Reply-To: <CAP7+vJ+_QwQW13b1mE1NSg8Z89MyYwHkyYmLwhumGo0iO-V+wA@mail.gmail.com>
References: <CAP7+vJ+_QwQW13b1mE1NSg8Z89MyYwHkyYmLwhumGo0iO-V+wA@mail.gmail.com>
Message-ID: <CANSw7Kyi7d-Q5O=MqZTx5a-bOA3CQ4Me5RVafxJ0e7Kj1w693w@mail.gmail.com>

On Thu, Jan 24, 2013 at 8:23 PM, Guido van Rossum <guido at python.org> wrote:

> A pragmatic question popped up: sometimes the protocol would like to
> know the name of the socket or its peer, i.e. call getsockname() or
> getpeername() on the underlying socket. (I can imagine wanting to log
> this, or do some kind of IP address blocking.)
>
> What should the interface for this look like? I can think of several ways:
>
> A) An API to return the underlying socket, if there is one. (In the
> case of a stack of transports and protocols there may not be one, so
> it may return None.) Downside is that it requires the transport to use
> sockets -- if it were to use some native Windows API there might not
> be a socket object even though there might be an IP connection with
> easily-accessible address and peer.
>

I feel (A) is the best option as it's the most flexible - underlying
transports can have many different special methods. No?

Yuval Greenfield

From guido at python.org  Thu Jan 24 19:50:03 2013
From: guido at python.org (Guido van Rossum)
Date: Thu, 24 Jan 2013 10:50:03 -0800
Subject: [Python-ideas] PEP 3156: getting the socket or peer name from
	the transport
In-Reply-To: <CANSw7Kyi7d-Q5O=MqZTx5a-bOA3CQ4Me5RVafxJ0e7Kj1w693w@mail.gmail.com>
References: <CAP7+vJ+_QwQW13b1mE1NSg8Z89MyYwHkyYmLwhumGo0iO-V+wA@mail.gmail.com>
	<CANSw7Kyi7d-Q5O=MqZTx5a-bOA3CQ4Me5RVafxJ0e7Kj1w693w@mail.gmail.com>
Message-ID: <CAP7+vJ+BxgVPv0jBP=9GAdnXGdLRqgjCcBhVzVnJ4K7NDAxrjQ@mail.gmail.com>

On Thu, Jan 24, 2013 at 10:45 AM, Yuval Greenfield
<ubershmekel at gmail.com> wrote:
> On Thu, Jan 24, 2013 at 8:23 PM, Guido van Rossum <guido at python.org> wrote:
>>
>> A pragmatic question popped up: sometimes the protocol would like to
>> know the name of the socket or its peer, i.e. call getsockname() or
>> getpeername() on the underlying socket. (I can imagine wanting to log
>> this, or do some kind of IP address blocking.)
>>
>> What should the interface for this look like? I can think of several ways:
>>
>> A) An API to return the underlying socket, if there is one. (In the
>> case of a stack of transports and protocols there may not be one, so
>> it may return None.) Downside is that it requires the transport to use
>> sockets -- if it were to use some native Windows API there might not
>> be a socket object even though there might be an IP connection with
>> easily-accessible address and peer.
>
>
> I feel (A) is the best option as it's the most flexible - underlying
> transports can have many different special methods. No?

The whole idea of defining a transport API is that the protocol
shouldn't care about what type of transport it is being used with. The
example of using an http client protocol with a subprocess transport
that invokes some kind of tunneling process might clarify this. So I
would like the transport API to be both small and fixed, rather than
having different transports have different extensions to the standard
transport API.

What other things might you want to do with the socket besides calling
getpeername() or getsockname()? Would that be reasonable to expect
from a protocol written to be independent of the specific transport
type?

-- 
--Guido van Rossum (python.org/~guido)


From fafhrd91 at gmail.com  Thu Jan 24 20:05:40 2013
From: fafhrd91 at gmail.com (Nikolay Kim)
Date: Thu, 24 Jan 2013 11:05:40 -0800
Subject: [Python-ideas] PEP 3156: getting the socket or peer name from
	the transport
In-Reply-To: <CAP7+vJ+BxgVPv0jBP=9GAdnXGdLRqgjCcBhVzVnJ4K7NDAxrjQ@mail.gmail.com>
References: <CAP7+vJ+_QwQW13b1mE1NSg8Z89MyYwHkyYmLwhumGo0iO-V+wA@mail.gmail.com>
	<CANSw7Kyi7d-Q5O=MqZTx5a-bOA3CQ4Me5RVafxJ0e7Kj1w693w@mail.gmail.com>
	<CAP7+vJ+BxgVPv0jBP=9GAdnXGdLRqgjCcBhVzVnJ4K7NDAxrjQ@mail.gmail.com>
Message-ID: <2BC319B1-026C-4255-B9D9-991A163CED7A@gmail.com>


On Jan 24, 2013, at 10:50 AM, Guido van Rossum <guido at python.org> wrote:

> On Thu, Jan 24, 2013 at 10:45 AM, Yuval Greenfield
> <ubershmekel at gmail.com> wrote:
>> On Thu, Jan 24, 2013 at 8:23 PM, Guido van Rossum <guido at python.org> wrote:
>>> 
>>> A pragmatic question popped up: sometimes the protocol would like to
>>> know the name of the socket or its peer, i.e. call getsockname() or
>>> getpeername() on the underlying socket. (I can imagine wanting to log
>>> this, or do some kind of IP address blocking.)
>>> 
>>> What should the interface for this look like? I can think of several ways:
>>> 
>>> A) An API to return the underlying socket, if there is one. (In the
>>> case of a stack of transports and protocols there may not be one, so
>>> it may return None.) Downside is that it requires the transport to use
>>> sockets -- if it were to use some native Windows API there might not
>>> be a socket object even though there might be an IP connection with
>>> easily-accessible address and peer.
>> 
>> 
>> I feel (A) is the best option as it's the most flexible - underlying
>> transports can have many different special methods. No?
> 
> The whole idea of defining a transport API is that the protocol
> shouldn't care about what type of transport it is being used with. The
> example of using an http client protocol with a subprocess transport
> that invokes some kind of tunneling process might clarify this. So I
> would like the transport API to be both small and fixed, rather than
> having different transports have different extensions to the standard
> transport API.
> 
> What other things might you want to do with the socket besides calling
> getpeername() or getsockname()? Would that be reasonable to expect
> from a protocol written to be independent of the specific transport
> type?

The transport could have a dictionary attribute where it stores optional information
such as the socket name, peer name, file path, etc.



From guido at python.org  Thu Jan 24 20:12:00 2013
From: guido at python.org (Guido van Rossum)
Date: Thu, 24 Jan 2013 11:12:00 -0800
Subject: [Python-ideas] PEP 3156: getting the socket or peer name from
	the transport
In-Reply-To: <2BC319B1-026C-4255-B9D9-991A163CED7A@gmail.com>
References: <CAP7+vJ+_QwQW13b1mE1NSg8Z89MyYwHkyYmLwhumGo0iO-V+wA@mail.gmail.com>
	<CANSw7Kyi7d-Q5O=MqZTx5a-bOA3CQ4Me5RVafxJ0e7Kj1w693w@mail.gmail.com>
	<CAP7+vJ+BxgVPv0jBP=9GAdnXGdLRqgjCcBhVzVnJ4K7NDAxrjQ@mail.gmail.com>
	<2BC319B1-026C-4255-B9D9-991A163CED7A@gmail.com>
Message-ID: <CAP7+vJKCGRAQkt628Nt_7JUoBWDWXXNcxxdDSyyYUBjQ+j_yxA@mail.gmail.com>

On Thu, Jan 24, 2013 at 11:05 AM, Nikolay Kim <fafhrd91 at gmail.com> wrote:
>
> On Jan 24, 2013, at 10:50 AM, Guido van Rossum <guido at python.org> wrote:
>
>> On Thu, Jan 24, 2013 at 10:45 AM, Yuval Greenfield
>> <ubershmekel at gmail.com> wrote:
>>> On Thu, Jan 24, 2013 at 8:23 PM, Guido van Rossum <guido at python.org> wrote:
>>>>
>>>> A pragmatic question popped up: sometimes the protocol would like to
>>>> know the name of the socket or its peer, i.e. call getsockname() or
>>>> getpeername() on the underlying socket. (I can imagine wanting to log
>>>> this, or do some kind of IP address blocking.)
>>>>
>>>> What should the interface for this look like? I can think of several ways:
>>>>
>>>> A) An API to return the underlying socket, if there is one. (In the
>>>> case of a stack of transports and protocols there may not be one, so
>>>> it may return None.) Downside is that it requires the transport to use
>>>> sockets -- if it were to use some native Windows API there might not
>>>> be a socket object even though there might be an IP connection with
>>>> easily-accessible address and peer.
>>>
>>>
>>> I feel (A) is the best option as it's the most flexible - underlying
>>> transports can have many different special methods. No?
>>
>> The whole idea of defining a transport API is that the protocol
>> shouldn't care about what type of transport it is being used with. The
>> example of using an http client protocol with a subprocess transport
>> that invokes some kind of tunneling process might clarify this. So I
>> would like the transport API to be both small and fixed, rather than
>> having different transports have different extensions to the standard
>> transport API.
>>
>> What other things might you want to do with the socket besides calling
>> getpeername() or getsockname()? Would that be reasonable to expect
>> from a protocol written to be independent of the specific transport
>> type?
>
> transport could have dictionary attribute where it can store optional information
> like socket name, peer name or file path, etc.

Aha, that makes some sense. Though maybe it shouldn't be a dict -- it
may be expensive to populate some values in some cases, so maybe there
should just be a method transport.get_extra_info('key') which computes
and returns (and possibly caches) certain values but returns None if
the info is not supported. E.g. get_extra_info('name'),
get_extra_info('peer'). This API makes it pretty clear that the caller
should check the value for None before using it.
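A minimal sketch of what such a method could look like (the class name and the
keys "name"/"peer" are illustrative only, not part of any proposal):

```python
class SketchTransport:
    """Illustrative transport exposing lazily computed, cached extra info."""

    def __init__(self, sock=None):
        self._sock = sock          # may be None for non-socket transports
        self._extra_cache = {}     # computed values are cached here

    def get_extra_info(self, key):
        # Return a cached value, compute it if possible, else None.
        if key in self._extra_cache:
            return self._extra_cache[key]
        value = None
        if self._sock is not None:
            if key == "name":
                value = self._sock.getsockname()
            elif key == "peer":
                value = self._sock.getpeername()
        self._extra_cache[key] = value
        return value
```

A transport backed by something other than a socket simply returns None for
these keys, which is exactly the signal the caller is told to check for.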

-- 
--Guido van Rossum (python.org/~guido)


From ben at bendarnell.com  Thu Jan 24 20:14:38 2013
From: ben at bendarnell.com (Ben Darnell)
Date: Thu, 24 Jan 2013 14:14:38 -0500
Subject: [Python-ideas] PEP 3156: getting the socket or peer name from
	the transport
In-Reply-To: <CAP7+vJ+_QwQW13b1mE1NSg8Z89MyYwHkyYmLwhumGo0iO-V+wA@mail.gmail.com>
References: <CAP7+vJ+_QwQW13b1mE1NSg8Z89MyYwHkyYmLwhumGo0iO-V+wA@mail.gmail.com>
Message-ID: <CAFkYKJ7xFaFfWvmD-CQ2SY16nA7DDJqPUzn-dN4XtAbZtDP3tw@mail.gmail.com>

In Tornado we basically do (A) (the IOStream's socket attribute was never
really documented for public consumption but has become the de facto
standard way to get this kind of information).  As food for thought,
consider extending this to include not just peer address but also SSL
certificates.  Tornado's SSL support uses the stdlib's ssl.SSLSocket, so
the certificate is available from the socket object, but Twisted (I
believe) uses pyOpenSSL and things work differently there.  To expose SSL
certificates (and NPN, and other information that may or may not be there
depending on SSL implementation) across both tornado- and twisted-based
transports you'd need something like B or C.

-Ben

On Thu, Jan 24, 2013 at 1:23 PM, Guido van Rossum <guido at python.org> wrote:

> A pragmatic question popped up: sometimes the protocol would like to
> know the name of the socket or its peer, i.e. call getsockname() or
> getpeername() on the underlying socket. (I can imagine wanting to log
> this, or do some kind of IP address blocking.)
>
> What should the interface for this look like? I can think of several ways:
>
> A) An API to return the underlying socket, if there is one. (In the
> case of a stack of transports and protocols there may not be one, so
> it may return None.) Downside is that it requires the transport to use
> sockets -- if it were to use some native Windows API there might not
> be a socket object even though there might be an IP connection with
> easily-accessible address and peer.
>
> B) An API to get the address and peer address; e.g.
> transport.getsockname() and transport.getpeername(). These would call
> the corresponding call on the underlying socket, if there is one, or
> return None otherwise; IP transports that don't use sockets would be
> free to retrieve and return the requested information in a
> platform-specific way. Note that the address may take different forms;
> e.g. for AF_UNIX sockets it is a filename, so the protocol must be
> prepared for different formats.
>
> C) Similar to (A) or (B), but putting the API in an abstract subclass
> of Transport (e.g. SocketTransport) so that a transport that doesn't
> have this doesn't need to implement dummy methods returning None -- it
> is now the protocol's responsibility to check for
> isinstance(transport, SocketTransport) before calling the method. I'm
> not so keen on this, Twisted has shown (IMO) that a deep hierarchy of
> interfaces or ABCs does not necessarily provide clarity.
>
> Discussion?
>
> --
> --Guido van Rossum (python.org/~guido)
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>

From guido at python.org  Thu Jan 24 20:32:22 2013
From: guido at python.org (Guido van Rossum)
Date: Thu, 24 Jan 2013 11:32:22 -0800
Subject: [Python-ideas] PEP 3156: getting the socket or peer name from
	the transport
In-Reply-To: <CAFkYKJ7xFaFfWvmD-CQ2SY16nA7DDJqPUzn-dN4XtAbZtDP3tw@mail.gmail.com>
References: <CAP7+vJ+_QwQW13b1mE1NSg8Z89MyYwHkyYmLwhumGo0iO-V+wA@mail.gmail.com>
	<CAFkYKJ7xFaFfWvmD-CQ2SY16nA7DDJqPUzn-dN4XtAbZtDP3tw@mail.gmail.com>
Message-ID: <CAP7+vJJNWO4L5X5PtKsW-8fBtEPBDzLmfSzcHKD+VQPvsQuppA@mail.gmail.com>

On Thu, Jan 24, 2013 at 11:14 AM, Ben Darnell <ben at bendarnell.com> wrote:
> On Tornado we basically do A (the IOStream's socket attribute was never
> really documented for public consumption but has become the de facto
> standard way to get this kind of information).  As food for thought,
> consider extending this to include not just peer address but also SSL
> certificates.  Tornado's SSL support uses the stdlib's ssl.SSLSocket, so the
> certificate is available from the socket object, but Twisted (I believe)
> uses pycrypto and things work differently there.  To expose SSL certificates
> (and NPN, and other information that may or may not be there depending on
> SSL implementation) across both tornado- and twisted-based transports you'd
> need something like B or C.

Excellent points all. I'll mull this over -- it's unfortunate that (A)
is so easy to do and handles future needs as well, but may shut out
alternate transport implementations...

> -Ben
>
> On Thu, Jan 24, 2013 at 1:23 PM, Guido van Rossum <guido at python.org> wrote:
>>
>> A pragmatic question popped up: sometimes the protocol would like to
>> know the name of the socket or its peer, i.e. call getsockname() or
>> getpeername() on the underlying socket. (I can imagine wanting to log
>> this, or do some kind of IP address blocking.)
>>
>> What should the interface for this look like? I can think of several ways:
>>
>> A) An API to return the underlying socket, if there is one. (In the
>> case of a stack of transports and protocols there may not be one, so
>> it may return None.) Downside is that it requires the transport to use
>> sockets -- if it were to use some native Windows API there might not
>> be a socket object even though there might be an IP connection with
>> easily-accessible address and peer.
>>
>> B) An API to get the address and peer address; e.g.
>> transport.getsockname() and transport.getpeername(). These would call
>> the corresponding call on the underlying socket, if there is one, or
>> return None otherwise; IP transports that don't use sockets would be
>> free to retrieve and return the requested information in a
>> platform-specific way. Note that the address may take different forms;
>> e.g. for AF_UNIX sockets it is a filename, so the protocol must be
>> prepared for different formats.
>>
>> C) Similar to (A) or (B), but putting the API in an abstract subclass
>> of Transport (e.g. SocketTransport) so that a transport that doesn't
>> have this doesn't need to implement dummy methods returning None -- it
>> is now the protocol's responsibility to check for
>> isinstance(transport, SocketTransport) before calling the method. I'm
>> not so keen on this, Twisted has shown (IMO) that a deep hierarchy of
>> interfaces or ABCs does not necessarily provide clarity.
>>
>> Discussion?
>>
>> --
>> --Guido van Rossum (python.org/~guido)
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> http://mail.python.org/mailman/listinfo/python-ideas
>
>



-- 
--Guido van Rossum (python.org/~guido)


From solipsis at pitrou.net  Thu Jan 24 20:34:06 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 24 Jan 2013 20:34:06 +0100
Subject: [Python-ideas] PEP 3156: getting the socket or peer name from
	the transport
References: <CAP7+vJ+_QwQW13b1mE1NSg8Z89MyYwHkyYmLwhumGo0iO-V+wA@mail.gmail.com>
Message-ID: <20130124203406.0952fb00@pitrou.net>

On Thu, 24 Jan 2013 10:23:40 -0800
Guido van Rossum <guido at python.org> wrote:
> A pragmatic question popped up: sometimes the protocol would like to
> know the name of the socket or its peer, i.e. call getsockname() or
> getpeername() on the underlying socket. (I can imagine wanting to log
> this, or do some kind of IP address blocking.)
> 
> What should the interface for this look like? I can think of several ways:
> 
> A) An API to return the underlying socket, if there is one. (In the
> case of a stack of transports and protocols there may not be one, so
> it may return None.) Downside is that it requires the transport to use
> sockets -- if it were to use some native Windows API there might not
> be a socket object even though there might be an IP connection with
> easily-accessible address and peer.

I don't understand why you say Windows doesn't use sockets for IP
connections. AFAIK, sockets are the *only* way to do networking with
the Windows API. See e.g. WSARecv, which supports synchronous and
asynchronous operation:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms741688%28v=vs.85%29.aspx

(I also suppose you meant "TCP connection", not "IP connection" ;-))

That said, the problem with returning a socket is that it's quite
low-level, and might return sockets with different characteristics
depending on the backend. So, while it can be there, I think the
preferred APIs for most uses should be B or C.

> C) Similar to (A) or (B), but putting the API in an abstract subclass
> of Transport (e.g. SocketTransport) so that a transport that doesn't
> have this doesn't need to implement dummy methods returning None -- it
> is now the protocol's responsibility to check for
> isinstance(transport, SocketTransport) before calling the method. I'm
> not so keen on this, Twisted has shown (IMO) that a deep hierarchy of
> interfaces or ABCs does not necessarily provide clarity.

IMO, Twisted mostly shows that zope.interface doesn't combine very well
with automated doc generators such as epydoc (you have to look up the
interface every time you want the documentation of one of the concrete
classes).

And as Ben says, I don't think you want to enumerate all possible
introspection APIs (such as the various pieces of SSL-related
information) on the base Transport class.

Regards

Antoine.




From shane at umbrellacode.com  Thu Jan 24 20:37:25 2013
From: shane at umbrellacode.com (Shane Green)
Date: Thu, 24 Jan 2013 11:37:25 -0800
Subject: [Python-ideas] PEP 3156: getting the socket or peer name from
	the transport
In-Reply-To: <CAFkYKJ7xFaFfWvmD-CQ2SY16nA7DDJqPUzn-dN4XtAbZtDP3tw@mail.gmail.com>
References: <CAP7+vJ+_QwQW13b1mE1NSg8Z89MyYwHkyYmLwhumGo0iO-V+wA@mail.gmail.com>
	<CAFkYKJ7xFaFfWvmD-CQ2SY16nA7DDJqPUzn-dN4XtAbZtDP3tw@mail.gmail.com>
Message-ID: <DA842F91-70A9-45EE-B2B7-41A26CAD8546@umbrellacode.com>

It's starting to seem like the transport could almost be an entry in the dictionary rather than owning it, kind of like environ['wsgi.input'] in the WSGI spec.  Not that I'm necessarily recommending this, but the details may outlive the transport, could include information the transport itself considered input, and the dictionary may be a useful place to store shared items such as SSL details.  Many of these details could be initialized when the transport is created, based on whatever spawned it.  For example, a transport spawned by an HTTPS server that accepted an incoming connection would inherit the server's SSL configuration, etc.





Shane Green 
www.umbrellacode.com
805-452-9666 | shane at umbrellacode.com

On Jan 24, 2013, at 11:14 AM, Ben Darnell <ben at bendarnell.com> wrote:

> On Tornado we basically do A (the IOStream's socket attribute was never really documented for public consumption but has become the de facto standard way to get this kind of information).  As food for thought, consider extending this to include not just peer address but also SSL certificates.  Tornado's SSL support uses the stdlib's ssl.SSLSocket, so the certificate is available from the socket object, but Twisted (I believe) uses pycrypto and things work differently there.  To expose SSL certificates (and NPN, and other information that may or may not be there depending on SSL implementation) across both tornado- and twisted-based transports you'd need something like B or C.  
> 
> -Ben
> 
> On Thu, Jan 24, 2013 at 1:23 PM, Guido van Rossum <guido at python.org> wrote:
> A pragmatic question popped up: sometimes the protocol would like to
> know the name of the socket or its peer, i.e. call getsockname() or
> getpeername() on the underlying socket. (I can imagine wanting to log
> this, or do some kind of IP address blocking.)
> 
> What should the interface for this look like? I can think of several ways:
> 
> A) An API to return the underlying socket, if there is one. (In the
> case of a stack of transports and protocols there may not be one, so
> it may return None.) Downside is that it requires the transport to use
> sockets -- if it were to use some native Windows API there might not
> be a socket object even though there might be an IP connection with
> easily-accessible address and peer.
> 
> B) An API to get the address and peer address; e.g.
> transport.getsockname() and transport.getpeername(). These would call
> the corresponding call on the underlying socket, if there is one, or
> return None otherwise; IP transports that don't use sockets would be
> free to retrieve and return the requested information in a
> platform-specific way. Note that the address may take different forms;
> e.g. for AF_UNIX sockets it is a filename, so the protocol must be
> prepared for different formats.
> 
> C) Similar to (A) or (B), but putting the API in an abstract subclass
> of Transport (e.g. SocketTransport) so that a transport that doesn't
> have this doesn't need to implement dummy methods returning None -- it
> is now the protocol's responsibility to check for
> isinstance(transport, SocketTransport) before calling the method. I'm
> not so keen on this, Twisted has shown (IMO) that a deep hierarchy of
> interfaces or ABCs does not necessarily provide clarity.
> 
> Discussion?
> 
> --
> --Guido van Rossum (python.org/~guido)
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
> 
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas


From Steve.Dower at microsoft.com  Thu Jan 24 21:16:58 2013
From: Steve.Dower at microsoft.com (Steve Dower)
Date: Thu, 24 Jan 2013 20:16:58 +0000
Subject: [Python-ideas] PEP 3156: getting the socket or peer name
	from	the transport
In-Reply-To: <20130124203406.0952fb00@pitrou.net>
References: <CAP7+vJ+_QwQW13b1mE1NSg8Z89MyYwHkyYmLwhumGo0iO-V+wA@mail.gmail.com>
	<20130124203406.0952fb00@pitrou.net>
Message-ID: <ad7566ef61014aad8e659e0907d166ad@BLUPR03MB035.namprd03.prod.outlook.com>

Antoine Pitrou wrote:
> On Thu, 24 Jan 2013 10:23:40 -0800
> Guido van Rossum <guido at python.org> wrote:
> > A) An API to return the underlying socket, if there is one. (In the
> > case of a stack of transports and protocols there may not be one, so
> > it may return None.) Downside is that it requires the transport to use
> > sockets -- if it were to use some native Windows API there might not
> > be a socket object even though there might be an IP connection with
> > easily-accessible address and peer.
> 
> I don't understand why you say Windows doesn't use sockets for IP
> connections. AFAIK, sockets are the *only* way to do networking with the
> Windows API. See e.g. WSARecv, which supports synchronous and
> asynchronous operation:
> http://msdn.microsoft.com/en-
> us/library/windows/desktop/ms741688%28v=vs.85%29.aspx

There's also a whole selection of "Internet" APIs that could be used (http://msdn.microsoft.com/en-us/library/hh309468.aspx), and plenty (probably too many) of other high-level APIs. There's no expectation that every application has to deal solely in sockets.

Cheers,
Steve




From storchaka at gmail.com  Thu Jan 24 21:35:14 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Thu, 24 Jan 2013 22:35:14 +0200
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <5d57225f-436a-49db-b882-68bbf80b9e71@t6g2000pba.googlegroups.com>
References: <1358903168.4767.4.camel@webb>
	<5d57225f-436a-49db-b882-68bbf80b9e71@t6g2000pba.googlegroups.com>
Message-ID: <kds5u5$b4u$1@ger.gmane.org>

On 23.01.13 03:51, alex23 wrote:
>      with open('malformed.csv','rb') as csvfile:
>          csvlines = list(l for l in csvfile if l.strip())
>          csvreader = DictReader(csvlines)

csvreader = DictReader(l for l in csvfile if l.strip())
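Fleshed out into a self-contained snippet (the sample data is invented for
illustration), the generator lets DictReader find the header even when blank
lines precede it, since blank lines never reach the csv machinery:

```python
import csv
import io

# Exported files often start with blank lines before the header row.
raw = "\n\nname,age\nalice,30\nbob,25\n"

csvfile = io.StringIO(raw)
reader = csv.DictReader(l for l in csvfile if l.strip())
rows = list(reader)
# The first non-blank line ("name,age") becomes the header.
```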




From ncoghlan at gmail.com  Thu Jan 24 21:51:40 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 25 Jan 2013 06:51:40 +1000
Subject: [Python-ideas] PEP 3156: getting the socket or peer name from
	the transport
In-Reply-To: <CAP7+vJKCGRAQkt628Nt_7JUoBWDWXXNcxxdDSyyYUBjQ+j_yxA@mail.gmail.com>
References: <CAP7+vJ+_QwQW13b1mE1NSg8Z89MyYwHkyYmLwhumGo0iO-V+wA@mail.gmail.com>
	<CANSw7Kyi7d-Q5O=MqZTx5a-bOA3CQ4Me5RVafxJ0e7Kj1w693w@mail.gmail.com>
	<CAP7+vJ+BxgVPv0jBP=9GAdnXGdLRqgjCcBhVzVnJ4K7NDAxrjQ@mail.gmail.com>
	<2BC319B1-026C-4255-B9D9-991A163CED7A@gmail.com>
	<CAP7+vJKCGRAQkt628Nt_7JUoBWDWXXNcxxdDSyyYUBjQ+j_yxA@mail.gmail.com>
Message-ID: <CADiSq7e1oDXyQL+ohRe0b=bQ6rGg7ycKtZB+TWHt_4VQ7igwug@mail.gmail.com>

On Fri, Jan 25, 2013 at 5:12 AM, Guido van Rossum <guido at python.org> wrote:
> On Thu, Jan 24, 2013 at 11:05 AM, Nikolay Kim <fafhrd91 at gmail.com> wrote:
>> transport could have dictionary attribute where it can store optional information
>> like socket name, peer name or file path, etc.
>
> Aha, that makes some sense. Though maybe it shouldn't be a dict -- it
> may be expensive to populate some values in some cases, so maybe there
> should just be a method transport.get_extra_info('key') which computes
> and returns (and possibly caches) certain values but returns None if
> the info is not supported. E.g. get_extra_info('name'),
> get_extra_info('peer'). This API makes it pretty clear that the caller
> should check the value for None before using it.

A "get_extra_info" API like that is also amenable to providing an
explicit default for the "key not present" case, and makes it clearer
that the calculations involved may not be cheap. You could even go so
far as to have it return a Future, allowing it to be used for info
that requires network activity.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From guido at python.org  Thu Jan 24 22:50:35 2013
From: guido at python.org (Guido van Rossum)
Date: Thu, 24 Jan 2013 13:50:35 -0800
Subject: [Python-ideas] PEP 3156: getting the socket or peer name from
	the transport
In-Reply-To: <CADiSq7e1oDXyQL+ohRe0b=bQ6rGg7ycKtZB+TWHt_4VQ7igwug@mail.gmail.com>
References: <CAP7+vJ+_QwQW13b1mE1NSg8Z89MyYwHkyYmLwhumGo0iO-V+wA@mail.gmail.com>
	<CANSw7Kyi7d-Q5O=MqZTx5a-bOA3CQ4Me5RVafxJ0e7Kj1w693w@mail.gmail.com>
	<CAP7+vJ+BxgVPv0jBP=9GAdnXGdLRqgjCcBhVzVnJ4K7NDAxrjQ@mail.gmail.com>
	<2BC319B1-026C-4255-B9D9-991A163CED7A@gmail.com>
	<CAP7+vJKCGRAQkt628Nt_7JUoBWDWXXNcxxdDSyyYUBjQ+j_yxA@mail.gmail.com>
	<CADiSq7e1oDXyQL+ohRe0b=bQ6rGg7ycKtZB+TWHt_4VQ7igwug@mail.gmail.com>
Message-ID: <CAP7+vJKE9wTrPaZVXEGWRusshdz61fa0rMYshbpOczNK99d0bQ@mail.gmail.com>

On Thu, Jan 24, 2013 at 12:51 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Fri, Jan 25, 2013 at 5:12 AM, Guido van Rossum <guido at python.org> wrote:
>> On Thu, Jan 24, 2013 at 11:05 AM, Nikolay Kim <fafhrd91 at gmail.com> wrote:
>>> transport could have dictionary attribute where it can store optional information
>>> like socket name, peer name or file path, etc.
>>
>> Aha, that makes some sense. Though maybe it shouldn't be a dict -- it
>> may be expensive to populate some values in some cases, so maybe there
>> should just be a method transport.get_extra_info('key') which computes
>> and returns (and possibly caches) certain values but returns None if
>> the info is not supported. E.g. get_extra_info('name'),
>> get_extra_info('peer'). This API makes it pretty clear that the caller
>> should check the value for None before using it.
>
> A "get_extra_info" API like that is also amenable to providing an
> explicit default for the "key not present" case, and makes it clearer
> that the calculations involved may not be cheap.

Yeah, the signature could be get_extra_info(key, default=None).
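With that signature, the simplest form is just a dict lookup with a fallback
(a sketch; the attribute name is made up):

```python
class SketchTransport:
    """Illustrative transport whose optional info lives in a plain dict."""

    def __init__(self, extra=None):
        self._extra = dict(extra or {})

    def get_extra_info(self, key, default=None):
        # None (or the caller-supplied default) signals "not supported here".
        return self._extra.get(key, default)
```

A caller can then distinguish "unsupported" from a real value without any
isinstance checks, e.g. transport.get_extra_info('peer', ('unknown', 0)).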

> You could even go so
> far as to have it return a Future, allowing it to be used for info
> that requires network activity.

I think that goes too far. It doesn't look like getpeername() goes out
to the network -- what other use case did you have in mind? (I suppose
it could use a Future for some keys only -- but then the caller would
still need to be aware that it could return None instead of a Future,
so it would be somewhat awkward to use.) You couldn't write

  remote_user = yield from self.transport.get_extra_info("remote_user")

you'd have to write

  f = self.transport.get_extra_info("remote_user")
  remote_user = (yield from f) if f else None

-- 
--Guido van Rossum (python.org/~guido)


From steve at pearwood.info  Fri Jan 25 00:15:14 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 25 Jan 2013 10:15:14 +1100
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <1359043696.4802.42.camel@gdoba.domain.local>
References: <1358903168.4767.4.camel@webb>
	<CADwdpyZ5THwJFo+huVMXGK3+aCaWYmYtCh3-2ucUp+sf4eEjgA@mail.gmail.com>
	<51007FCC.5090400@pearwood.info>
	<201301241047.17391.mark.hackett@metoffice.gov.uk>
	<F160A33B-D06E-44C9-B499-597EB6688416@umbrellacode.com>
	<CADiSq7ftqeVFVYfwWBXs40GX87Yu5E9xOrrja8B+uE-xFhnzKg@mail.gmail.com>
	<20130124133858.32622f6e@pitrou.net>
	<1359040294.4802.29.camel@gdoba.domain.local>
	<2E02A587-DC04-4385-8A36-C41160E33E98@umbrellacode.com>
	<1359043696.4802.42.camel@gdoba.domain.local>
Message-ID: <5101C082.9070702@pearwood.info>

On 25/01/13 03:08, J. Cliff Dyer wrote:
> On Thu, 2013-01-24 at 07:28 -0800, Shane Green wrote:
>> Since every form of CSV file counts EOL as a line terminator, I think
>> discarding empty lines preceding the headers is arguably acceptable,
>> but do not think discarding lines of just delimiters would be.  What
>> about extending the DictReader API so it was easy to perform these
>> actions explicitly, such as being able to discard() the field names to
>> be re-evaluated on the next line?
>
> I think I like this idea.  There's something a little distasteful about
> making the user manually delve into the underlying reader, but this
> makes it more user-friendly and more obvious how to proceed.

I couldn't disagree more. I think:

- it adds burden to the caller, since the caller is now expected to manually
   inspect the field names and decide whether some should be discarded;

- it is less obvious: *how* does the caller decide that there are too many
   field names?

- incomplete: if there is a discard(), where is the add()?

- completely irrelevant for the topic being discussed ("DictReader should
   ignore leading blank lines... I know, let's give the caller the ability
   to *discard* field names" -- but auto-detecting *too many* field names is
   not the problem);

- and being able to change the field names on the fly is so far beyond
   anything required for ordinary CSV that it doesn't belong in the CSV
   module.


> For clarity's sake, what is your objection to discarding lines of
> delimiters?  The reason I suggest doing it is that it is a common output
> situation when exporting Excel files or LibreCalc files that have a
> blank row at the top.


A row of delimiters should be treated by the reader object as a row with
explicitly empty fields. If the caller wishes to discard them, they can.
But the reader object shouldn't make that decision.

An empty row, on the other hand, should be just ignored. DictReader *already*
ignores empty rows, provided that they are not in the first row.



-- 
Steven


From steve at pearwood.info  Fri Jan 25 00:53:51 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 25 Jan 2013 10:53:51 +1100
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <1359040294.4802.29.camel@gdoba.domain.local>
References: <1358903168.4767.4.camel@webb>
	<CADwdpyZ5THwJFo+huVMXGK3+aCaWYmYtCh3-2ucUp+sf4eEjgA@mail.gmail.com>
	<51007FCC.5090400@pearwood.info>
	<201301241047.17391.mark.hackett@metoffice.gov.uk>
	<F160A33B-D06E-44C9-B499-597EB6688416@umbrellacode.com>
	<CADiSq7ftqeVFVYfwWBXs40GX87Yu5E9xOrrja8B+uE-xFhnzKg@mail.gmail.com>
	<20130124133858.32622f6e@pitrou.net>
	<1359040294.4802.29.camel@gdoba.domain.local>
Message-ID: <5101C98F.1000704@pearwood.info>

On 25/01/13 02:11, J. Cliff Dyer wrote:
> On Thu, 2013-01-24 at 13:38 +0100, Antoine Pitrou wrote:
>>> 1. Do any data conditioning by ignoring empty lines and lines of
>>> just field delimiters before the header row (consensus seems to be
>>> "no")
>
> Well, I wouldn't necessarily say we have a consensus on this one.  This
> idea received a +1 from Bruce Leban and an "I don't see any reason not
> to" from Steven D'Aprano.
>
> Objections are:
>
> 1. It's a backwards-incompatible change.

All bug fixes are backwards-incompatible changes. The question is, is
there anyone relying on this behaviour?

DictReader already ignores blank lines, *except for the very first line*.
Using Python 3.3:

py> from io import StringIO
py> from csv import DictReader
py> data = StringIO('spam,ham,eggs\n\n\n\n1,2,3\n\n\n\n\n4,5,6\n')
py> x = DictReader(data)
py> next(x)
{'eggs': '3', 'ham': '2', 'spam': '1'}
py> next(x)
{'eggs': '6', 'ham': '5', 'spam': '4'}


I don't expect that there is anyone relying on a CSV file with a leading
blank line to be treated as one having no columns at all:

py> data = StringIO('\n\n\n\nspam,ham,eggs\n1,2,3\n4,5,6\n')
py> x = DictReader(data)
py> next(x)
{None: ['spam', 'ham', 'eggs']}
py> x.fieldnames
[]


I expect that there is probably code that works around this issue, by
skipping blank lines somehow, e.g.

DictReader(row for row in data if row.strip())

These work-arounds may (or not) be fragile or buggy, but they ought
to continue working even if DictReader changes its header detection.
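
For illustration, one such work-around wrapped in a small helper (a sketch using only the public csv API; the helper name is mine):

```python
import csv
from io import StringIO

def dict_reader_skip_blanks(lines, **kwds):
    """Build a DictReader that ignores blank lines everywhere,
    including before the header row."""
    return csv.DictReader((line for line in lines if line.strip()), **kwds)

# Leading blank lines no longer swallow the header row:
data = StringIO('\n\n\n\nspam,ham,eggs\n1,2,3\n4,5,6\n')
reader = dict_reader_skip_blanks(data)
rows = list(reader)
```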



-- 
Steven


From shane at umbrellacode.com  Fri Jan 25 01:05:43 2013
From: shane at umbrellacode.com (Shane Green)
Date: Thu, 24 Jan 2013 16:05:43 -0800
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <5101C082.9070702@pearwood.info>
References: <1358903168.4767.4.camel@webb>
	<CADwdpyZ5THwJFo+huVMXGK3+aCaWYmYtCh3-2ucUp+sf4eEjgA@mail.gmail.com>
	<51007FCC.5090400@pearwood.info>
	<201301241047.17391.mark.hackett@metoffice.gov.uk>
	<F160A33B-D06E-44C9-B499-597EB6688416@umbrellacode.com>
	<CADiSq7ftqeVFVYfwWBXs40GX87Yu5E9xOrrja8B+uE-xFhnzKg@mail.gmail.com>
	<20130124133858.32622f6e@pitrou.net>
	<1359040294.4802.29.camel@gdoba.domain.local>
	<2E02A587-DC04-4385-8A36-C41160E33E98@umbrellacode.com>
	<1359043696.4802.42.camel@gdoba.domain.local>
	<5101C082.9070702@pearwood.info>
Message-ID: <A4308D81-77B5-43B2-82E0-4629F1CB1180@umbrellacode.com>

If this is part of the same response:

> A row of delimiters should be treated by the reader object as a row with
> explicitly empty fields. If the caller wishes to discard them, they can.
> But the reader object shouldn't make that decision.
> 
> An empty row, on the other hand, should be just ignored. DictReader *already*
> ignores empty rows, provided that they are not in the first row.

Then I think my description was unclear.  I wasn't suggesting we add methods for manipulating individual headers, only a way of telling the DictReader to drop its existing headers and re-evaluate them on the next row, to make it easy to do something like

while not any(records.fieldnames):
	records.discard_fieldnames() # or something to that effect?

without changing any existing behaviour.
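
A sketch of such a method (discard_fieldnames is hypothetical, and this relies on the CPython implementation detail that DictReader caches its headers in _fieldnames):

```python
import csv
from io import StringIO

class DiscardingDictReader(csv.DictReader):
    """Sketch of the proposed API: forget the current field names so
    that they are re-read from the next row of input."""

    def discard_fieldnames(self):
        # The fieldnames property re-reads a header row on next access
        # whenever the cached value is None (a CPython detail).
        self._fieldnames = None

# A delimiter-only first row is replaced by the real header row:
records = DiscardingDictReader(StringIO(',,\nspam,ham,eggs\n1,2,3\n'))
while not any(records.fieldnames):
    records.discard_fieldnames()
```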






Shane Green 
www.umbrellacode.com
805-452-9666 | shane at umbrellacode.com

On Jan 24, 2013, at 3:15 PM, Steven D'Aprano <steve at pearwood.info> wrote:

> On 25/01/13 03:08, J. Cliff Dyer wrote:
>> On Thu, 2013-01-24 at 07:28 -0800, Shane Green wrote:
>>> Since every form of CSV file counts EOL as a line terminator, I think
>>> discarding empty lines preceding the headers is arguably acceptable,
>>> but do not think discarding lines of just delimiters would be.  What
>>> about extending the DictReader API so it was easy to perform these
>>> actions explicitly, such as being able to discard() the field names to
>>> be re-evaluated on the next line?
>> 
>> I think I like this idea.  There's something a little distasteful about
>> making the user manually delve into the underlying reader, but this
>> makes it more user-friendly and more obvious how to proceed.
> 
> I couldn't disagree more. I think:
> 
> - it adds burden to the caller, since the caller is now expected to manually
>  inspect the field names and decide whether some should be discarded;
> 
> - it is less obvious: *how* does the caller decide that there are too many
>  field names?
> 
> - incomplete: if there is a discard(), where is the add()?
> 
> - completely irrelevant for the topic being discussed ("DictReader should
>  ignore leading blank lines... I know, let's give the caller the ability
>  to *discard* field names" -- but auto-detecting *too many* field names is
>  not the problem);
> 
> - and being able to change the field names on the fly is so far beyond
>  anything required for ordinary CSV that it doesn't belong in the CSV
>  module.
> 
> 
>> For clarity's sake, what is your objection to discarding lines of
>> delimiters?  The reason I suggest doing it is that it is a common output
>> situation when exporting Excel files or LibreCalc files that have a
>> blank row at the top.
> 
> 
> A row of delimiters should be treated by the reader object as a row with
> explicitly empty fields. If the caller wishes to discard them, they can.
> But the reader object shouldn't make that decision.
> 
> An empty row, on the other hand, should be just ignored. DictReader *already*
> ignores empty rows, provided that they are not in the first row.
> 
> 
> 
> -- 
> Steven


From wuwei23 at gmail.com  Fri Jan 25 02:49:53 2013
From: wuwei23 at gmail.com (alex23)
Date: Thu, 24 Jan 2013 17:49:53 -0800 (PST)
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <kds5u5$b4u$1@ger.gmane.org>
References: <1358903168.4767.4.camel@webb>
	<5d57225f-436a-49db-b882-68bbf80b9e71@t6g2000pba.googlegroups.com>
	<kds5u5$b4u$1@ger.gmane.org>
Message-ID: <25963d3e-a6fa-4737-bbd1-04ba0454f793@ro7g2000pbb.googlegroups.com>

On 25 Jan, 06:35, Serhiy Storchaka <storch... at gmail.com> wrote:
> On 23.01.13 03:51, alex23 wrote:
>
> >      with open('malformed.csv','rb') as csvfile:
> >          csvlines = list(l for l in csvfile if l.strip())
> >          csvreader = DictReader(csvlines)
>
> csvreader = DictReader(l for l in csvfile if l.strip())

Uh, thanks, although I'm not sure what you think you're showing me
that I'm not already aware of. I spelled it out as two separate
expressions for clarity, I didn't realise we were playing code golf in
our examples.


From stephen at xemacs.org  Fri Jan 25 03:38:30 2013
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 25 Jan 2013 11:38:30 +0900
Subject: [Python-ideas] csv.DictReader could handle headers
	more	intelligently.
In-Reply-To: <5101C082.9070702@pearwood.info>
References: <1358903168.4767.4.camel@webb>
	<CADwdpyZ5THwJFo+huVMXGK3+aCaWYmYtCh3-2ucUp+sf4eEjgA@mail.gmail.com>
	<51007FCC.5090400@pearwood.info>
	<201301241047.17391.mark.hackett@metoffice.gov.uk>
	<F160A33B-D06E-44C9-B499-597EB6688416@umbrellacode.com>
	<CADiSq7ftqeVFVYfwWBXs40GX87Yu5E9xOrrja8B+uE-xFhnzKg@mail.gmail.com>
	<20130124133858.32622f6e@pitrou.net>
	<1359040294.4802.29.camel@gdoba.domain.local>
	<2E02A587-DC04-4385-8A36-C41160E33E98@umbrellacode.com>
	<1359043696.4802.42.camel@gdoba.domain.local>
	<5101C082.9070702@pearwood.info>
Message-ID: <877gn23s2h.fsf@uwakimon.sk.tsukuba.ac.jp>

Steven D'Aprano writes:

 > - it adds burden to the caller, since the caller is now expected to
 >   manually inspect the field names and decide whether some should
 >   be discarded;

It's a dirty job but somebody has to do it.

And that ultimately has to be the *writer* of the CSV file, not the
reader.  Both csv.DictReader and the caller are merely guessing unless
there's a private agreement with the writer.  csv.DictReader, as a
stdlib module, can't know about that agreement.  The caller can
(although one obvious use case for csv.DictReader is that the caller
doesn't and is hoping csv.DictReader can guess better, oops).

Unless somebody has figured out how to give stdlib code "channeling"
capability?


From ethan at stoneleaf.us  Fri Jan 25 04:20:23 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 24 Jan 2013 19:20:23 -0800
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <201301241047.17391.mark.hackett@metoffice.gov.uk>
References: <1358903168.4767.4.camel@webb>
	<CADwdpyZ5THwJFo+huVMXGK3+aCaWYmYtCh3-2ucUp+sf4eEjgA@mail.gmail.com>
	<51007FCC.5090400@pearwood.info>
	<201301241047.17391.mark.hackett@metoffice.gov.uk>
Message-ID: <5101F9F7.1070301@stoneleaf.us>

On 01/24/2013 02:47 AM, Mark Hackett wrote:
> On Thursday 24 Jan 2013, Steven D'Aprano wrote:
>
>>> I'm not sure this behavior merits the all-caps "AS EXPECTED" label.
>>> It's not terribly surprising once you sit down and think about it, but
>>> it's certainly at least a little unexpected to me that data is being
>>> thrown away with no notice.  It's unusual for errors to pass silently
>>> in python.
>>
>> Yes, we should not forget that a CSV file is not a dict. Just because
>>   DictReader is implemented with a dict as the storage, doesn't mean that it
>>   should behave exactly like a dict in all things. Multiple columns with the
>>   same name are legal in CSV, so there should be a reader for that
>>   situation.
>>
>
> But just because it's reading a csv file, we shouldn't change how a dictionary
> works if you add the same key again.

The proposal is not to change how a dict works, but what the proper 
response is for DictReader when a duplicate key is found.

~Ethan~



From ethan at stoneleaf.us  Fri Jan 25 04:25:38 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 24 Jan 2013 19:25:38 -0800
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <1358903168.4767.4.camel@webb>
References: <1358903168.4767.4.camel@webb>
Message-ID: <5101FB32.7030306@stoneleaf.us>

On 01/22/2013 05:06 PM, J. Cliff Dyer wrote:

> Thoughts? Do folks think this is worth adding to the csv library, or
> should I just keep using my subclass?

+1 for ignoring blank lines (including delimiter-only lines)

+1 for raising an exception on duplicate headers

+1 for a flag to not raise on duplicate empty headers (but a completely 
empty header line is still ignored)
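
The duplicate-header check could be prototyped with a small subclass (names illustrative, not a proposed csv API; it overrides the fieldnames property and delegates to the parent getter):

```python
import csv
from io import StringIO

class StrictDictReader(csv.DictReader):
    """Sketch: raise on duplicate non-empty header names instead of
    letting later columns silently overwrite earlier ones."""

    @property
    def fieldnames(self):
        names = csv.DictReader.fieldnames.fget(self)  # parent getter
        if names is not None:
            non_empty = [name for name in names if name]
            if len(non_empty) != len(set(non_empty)):
                raise ValueError('duplicate header names: %r' % (names,))
        return names

# StrictDictReader(StringIO('a,b,a\n1,2,3\n')).fieldnames raises ValueError
```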

~Ethan~



From tjreedy at udel.edu  Fri Jan 25 05:26:19 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 24 Jan 2013 23:26:19 -0500
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <5101C98F.1000704@pearwood.info>
References: <1358903168.4767.4.camel@webb>
	<CADwdpyZ5THwJFo+huVMXGK3+aCaWYmYtCh3-2ucUp+sf4eEjgA@mail.gmail.com>
	<51007FCC.5090400@pearwood.info>
	<201301241047.17391.mark.hackett@metoffice.gov.uk>
	<F160A33B-D06E-44C9-B499-597EB6688416@umbrellacode.com>
	<CADiSq7ftqeVFVYfwWBXs40GX87Yu5E9xOrrja8B+uE-xFhnzKg@mail.gmail.com>
	<20130124133858.32622f6e@pitrou.net>
	<1359040294.4802.29.camel@gdoba.domain.local>
	<5101C98F.1000704@pearwood.info>
Message-ID: <kdt1i2$odt$1@ger.gmane.org>

On 1/24/2013 6:53 PM, Steven D'Aprano wrote:

> DictReader already ignores blank lines, *except for the very first line*.

Interesting. A proper csv file does not contain blank lines. The csv doc 
is silent on what it does when they are present. (The word 'blank' does not 
appear.) Ignoring them seems reasonable, but then all of them should be 
ignored. And the doc should say so.

> Using Python 3.3:
>
> py> from io import StringIO
> py> from csv import DictReader
> py> data = StringIO('spam,ham,eggs\n\n\n\n1,2,3\n\n\n\n\n4,5,6\n')
> py> x = DictReader(data)
> py> next(x)
> {'eggs': '3', 'ham': '2', 'spam': '1'}
> py> next(x)
> {'eggs': '6', 'ham': '5', 'spam': '4'}
>
>
> I don't expect that there is anyone relying on a CSV file with a leading
> blank line to be treated as one having no columns at all:
>
> py> data = StringIO('\n\n\n\nspam,ham,eggs\n1,2,3\n4,5,6\n')
> py> x = DictReader(data)
> py> next(x)
> {None: ['spam', 'ham', 'eggs']}
> py> x.fieldnames
> []
>
>
> I expect that there is probably code that works around this issue, by
> skipping blank lines somehow, e.g.
>
> DictReader(row for row in data if row.strip())
>
> These work-arounds may (or not) be fragile or buggy, but they ought
> to continue working even if DictReader changes its header detection.

-- 
Terry Jan Reedy



From storchaka at gmail.com  Fri Jan 25 11:01:08 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Fri, 25 Jan 2013 12:01:08 +0200
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <25963d3e-a6fa-4737-bbd1-04ba0454f793@ro7g2000pbb.googlegroups.com>
References: <1358903168.4767.4.camel@webb>
	<5d57225f-436a-49db-b882-68bbf80b9e71@t6g2000pba.googlegroups.com>
	<kds5u5$b4u$1@ger.gmane.org>
	<25963d3e-a6fa-4737-bbd1-04ba0454f793@ro7g2000pbb.googlegroups.com>
Message-ID: <kdtl56$lbk$1@ger.gmane.org>

On 25.01.13 03:49, alex23 wrote:
> On 25 Jan, 06:35, Serhiy Storchaka <storch... at gmail.com> wrote:
>> csvreader = DictReader(l for l in csvfile if l.strip())
>
> Uh, thanks, although I'm not sure what you think you're showing me
> that I'm not already aware of. I spelled it out as two separate
> expressions for clarity, I didn't realise we were playing code golf in
> our examples.

My point is that you don't need to read the whole file into memory. You can 
use an iterator and process it line by line.




From mark.hackett at metoffice.gov.uk  Fri Jan 25 11:58:28 2013
From: mark.hackett at metoffice.gov.uk (Mark Hackett)
Date: Fri, 25 Jan 2013 10:58:28 +0000
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <5101F9F7.1070301@stoneleaf.us>
References: <1358903168.4767.4.camel@webb>
	<201301241047.17391.mark.hackett@metoffice.gov.uk>
	<5101F9F7.1070301@stoneleaf.us>
Message-ID: <201301251058.28531.mark.hackett@metoffice.gov.uk>

On Friday 25 Jan 2013, Ethan Furman wrote:
> On 01/24/2013 02:47 AM, Mark Hackett wrote:
> > On Thursday 24 Jan 2013, Steven D'Aprano wrote:
> >>> I'm not sure this behavior merits the all-caps "AS EXPECTED" label.
> >>> It's not terribly surprising once you sit down and think about it, but
> >>> it's certainly at least a little unexpected to me that data is being
> >>> thrown away with no notice.  It's unusual for errors to pass silently
> >>> in python.
> >>
> >> Yes, we should not forget that a CSV file is not a dict. Just because
> >>   DictReader is implemented with a dict as the storage, doesn't mean
> >> that it should behave exactly like a dict in all things. Multiple
> >> columns with the same name are legal in CSV, so there should be a reader
> >> for that situation.
> >
> > But just because it's reading a csv file, we shouldn't change how a
> > dictionary works if you add the same key again.
> 
> The proposal is not to change how a dict works, but what the proper
> response is for DictReader when a duplicate key is found.
> 

Ethan, the proposal is predicated on the "silent abandonment" being 
unexpected (which isn't actually the case, any more than doing:

a = 4
a = 9

silently abandons the 4).

Except, just like the assignment in the aside above, this is entirely what IS 
expected if you're putting a CSV line into a dictionary with duplicate key 
names.

If you don't want it to do what a dictionary does, then don't use DictReader, 
as Chris proposes.

My only niggle with that idea is that you'd be writing a "SumptyReader" 
for each case, which is redundant. But that may, in practice, be no problem 
at all.

If you didn't want it to do what a dict does, don't use a dict.


From mark.hackett at metoffice.gov.uk  Fri Jan 25 12:00:31 2013
From: mark.hackett at metoffice.gov.uk (Mark Hackett)
Date: Fri, 25 Jan 2013 11:00:31 +0000
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <5101C082.9070702@pearwood.info>
References: <1358903168.4767.4.camel@webb>
	<1359043696.4802.42.camel@gdoba.domain.local>
	<5101C082.9070702@pearwood.info>
Message-ID: <201301251100.31153.mark.hackett@metoffice.gov.uk>

On Thursday 24 Jan 2013, Steven D'Aprano wrote:
> - it is less obvious: how does the caller decide that there are too many
>    field names?
> 

Additionally, the user of the library now has to read much more about the 
library (either code or documentation, which has to track the code too), to 
decide what it is going to do.

If you have to read the code, then it's not really OO, is it. It's light grey, 
not black box.


From ncoghlan at gmail.com  Fri Jan 25 12:09:43 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 25 Jan 2013 21:09:43 +1000
Subject: [Python-ideas] PEP 3156: getting the socket or peer name from
	the transport
In-Reply-To: <CAP7+vJKE9wTrPaZVXEGWRusshdz61fa0rMYshbpOczNK99d0bQ@mail.gmail.com>
References: <CAP7+vJ+_QwQW13b1mE1NSg8Z89MyYwHkyYmLwhumGo0iO-V+wA@mail.gmail.com>
	<CANSw7Kyi7d-Q5O=MqZTx5a-bOA3CQ4Me5RVafxJ0e7Kj1w693w@mail.gmail.com>
	<CAP7+vJ+BxgVPv0jBP=9GAdnXGdLRqgjCcBhVzVnJ4K7NDAxrjQ@mail.gmail.com>
	<2BC319B1-026C-4255-B9D9-991A163CED7A@gmail.com>
	<CAP7+vJKCGRAQkt628Nt_7JUoBWDWXXNcxxdDSyyYUBjQ+j_yxA@mail.gmail.com>
	<CADiSq7e1oDXyQL+ohRe0b=bQ6rGg7ycKtZB+TWHt_4VQ7igwug@mail.gmail.com>
	<CAP7+vJKE9wTrPaZVXEGWRusshdz61fa0rMYshbpOczNK99d0bQ@mail.gmail.com>
Message-ID: <CADiSq7eH3YosFV5F3iaoK1MSh=y8-_rwwpymPZ1KscYyk-4npA@mail.gmail.com>

On Fri, Jan 25, 2013 at 7:50 AM, Guido van Rossum <guido at python.org> wrote:
> I think that goes too far. It doesn't look like getpeername() goes out
> to the network -- what other use case did you have in mind?

I don't have one, so YAGNI sounds like a good answer to me.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From ronaldoussoren at mac.com  Fri Jan 25 12:24:58 2013
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Fri, 25 Jan 2013 12:24:58 +0100
Subject: [Python-ideas] PEP 3156: getting the socket or peer name
	from	the transport
In-Reply-To: <CAP7+vJKE9wTrPaZVXEGWRusshdz61fa0rMYshbpOczNK99d0bQ@mail.gmail.com>
References: <CAP7+vJ+_QwQW13b1mE1NSg8Z89MyYwHkyYmLwhumGo0iO-V+wA@mail.gmail.com>
	<CANSw7Kyi7d-Q5O=MqZTx5a-bOA3CQ4Me5RVafxJ0e7Kj1w693w@mail.gmail.com>
	<CAP7+vJ+BxgVPv0jBP=9GAdnXGdLRqgjCcBhVzVnJ4K7NDAxrjQ@mail.gmail.com>
	<2BC319B1-026C-4255-B9D9-991A163CED7A@gmail.com>
	<CAP7+vJKCGRAQkt628Nt_7JUoBWDWXXNcxxdDSyyYUBjQ+j_yxA@mail.gmail.com>
	<CADiSq7e1oDXyQL+ohRe0b=bQ6rGg7ycKtZB+TWHt_4VQ7igwug@mail.gmail.com>
	<CAP7+vJKE9wTrPaZVXEGWRusshdz61fa0rMYshbpOczNK99d0bQ@mail.gmail.com>
Message-ID: <2E80A011-E5E2-4954-9FAC-B2B0E6CBA509@mac.com>


On 24 Jan, 2013, at 22:50, Guido van Rossum <guido at python.org> wrote:

> On Thu, Jan 24, 2013 at 12:51 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> On Fri, Jan 25, 2013 at 5:12 AM, Guido van Rossum <guido at python.org> wrote:
>>> On Thu, Jan 24, 2013 at 11:05 AM, Nikolay Kim <fafhrd91 at gmail.com> wrote:
>>>> transport could have dictionary attribute where it can store optional information
>>>> like socket name, peer name or file path, etc.
>>> 
>>> Aha, that makes some sense. Though maybe it shouldn't be a dict -- it
>>> may be expensive to populate some values in some cases, so maybe there
>>> should just be a method transport.get_extra_info('key') which computes
>>> and returns (and possibly caches) certain values but returns None if
>>> the info is not supported. E.g. get_extra_info('name'),
>>> get_extra_info('peer'). This API makes it pretty clear that the caller
>>> should check the value for None before using it.
>> 
>> A "get_extra_info" API like that is also amenable to providing an
>> explicit default for the "key not present" case, and makes it clearer
>> that the calculations involved may not be cheap.
> 
> Yeah, the signature could be get_extra_info(key, default=None).
> 
>> You could even go so
>> far as to have it return a Future, allowing it to be used for info
>> that requires network activity.
> 
> I think that goes too far. It doesn't look like getpeername() goes out
> to the network -- what other use case did you have in mind? (I suppose
> it could use a Future for some keys only -- but then the caller would
> still need to be aware that it could return None instead of a Future,
> so it would be somewhat awkward to use -- you couldn't write

A transport that tunnels traffic over a SOCKS or SSH tunnel might require network access to get the sockname or peername of the proxied connection. I don't know enough about either protocol to know for sure, and the information could also be fetched during connection setup and then cached.

Ronald

From ethan at stoneleaf.us  Fri Jan 25 17:30:25 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Fri, 25 Jan 2013 08:30:25 -0800
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <201301251100.31153.mark.hackett@metoffice.gov.uk>
References: <1358903168.4767.4.camel@webb>
	<1359043696.4802.42.camel@gdoba.domain.local>
	<5101C082.9070702@pearwood.info>
	<201301251100.31153.mark.hackett@metoffice.gov.uk>
Message-ID: <5102B321.8070706@stoneleaf.us>

On 01/25/2013 03:00 AM, Mark Hackett wrote:
> On Thursday 24 Jan 2013, Steven D'Aprano wrote:
>> - it is less obvious: how does the caller decide that there are too many
>>     field names?
>>
>
> Additionally, the user of the library now has to read much more about the
> library (either code or documentation, which has to track the code too), to
> decide what it is going to do.
>
> If you have to read the code, then it's not really OO, is it. It's light grey,
> not black box.

If you have to read the code, the documentation needs improvement.

~Ethan~


From mark.hackett at metoffice.gov.uk  Fri Jan 25 17:53:46 2013
From: mark.hackett at metoffice.gov.uk (Mark Hackett)
Date: Fri, 25 Jan 2013 16:53:46 +0000
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <5102B321.8070706@stoneleaf.us>
References: <1358903168.4767.4.camel@webb>
	<201301251100.31153.mark.hackett@metoffice.gov.uk>
	<5102B321.8070706@stoneleaf.us>
Message-ID: <201301251653.46558.mark.hackett@metoffice.gov.uk>

On Friday 25 Jan 2013, Ethan Furman wrote:
> On 01/25/2013 03:00 AM, Mark Hackett wrote:
> > On Thursday 24 Jan 2013, Steven D'Aprano wrote:
> >> - it is less obvious: how does the caller decide that there are too many
> >>     field names?
> >
> > Additionally, the user of the library now has to read much more about the
> > library (either code or documentation, which has to track the code too),
> > to decide what it is going to do.
> >
> > If you have to read the code, then it's not really OO, is it. It's light
> > grey, not black box.
> 
> If you have to read the code, the documentation needs improvement.
> 

And if you put your feet too close to the fire, your feet will burn.

Neither have anything to do with the subject at hand, however.

Which is: if a dictionary acts a certain way, and a routine that creates a 
dictionary WORKS DIFFERENTLY, then why did you use a routine that creates a 
dictionary?

You see, the option here is to leave it operating as a dictionary operates. 
And in that case, you do not need to document anything. The documentation of 
how it works is already covered by the python basics: "How does a dictionary 
work in Python?".

So don't change it, and you don't have to improve the documentation.


From guido at python.org  Fri Jan 25 18:47:35 2013
From: guido at python.org (Guido van Rossum)
Date: Fri, 25 Jan 2013 09:47:35 -0800
Subject: [Python-ideas] PEP 3156: getting the socket or peer name from
	the transport
In-Reply-To: <2E80A011-E5E2-4954-9FAC-B2B0E6CBA509@mac.com>
References: <CAP7+vJ+_QwQW13b1mE1NSg8Z89MyYwHkyYmLwhumGo0iO-V+wA@mail.gmail.com>
	<CANSw7Kyi7d-Q5O=MqZTx5a-bOA3CQ4Me5RVafxJ0e7Kj1w693w@mail.gmail.com>
	<CAP7+vJ+BxgVPv0jBP=9GAdnXGdLRqgjCcBhVzVnJ4K7NDAxrjQ@mail.gmail.com>
	<2BC319B1-026C-4255-B9D9-991A163CED7A@gmail.com>
	<CAP7+vJKCGRAQkt628Nt_7JUoBWDWXXNcxxdDSyyYUBjQ+j_yxA@mail.gmail.com>
	<CADiSq7e1oDXyQL+ohRe0b=bQ6rGg7ycKtZB+TWHt_4VQ7igwug@mail.gmail.com>
	<CAP7+vJKE9wTrPaZVXEGWRusshdz61fa0rMYshbpOczNK99d0bQ@mail.gmail.com>
	<2E80A011-E5E2-4954-9FAC-B2B0E6CBA509@mac.com>
Message-ID: <CAP7+vJKWfq+q0s_=s5rPorOntzXZ-J-aApXrKdxGWdL5wEZ1uw@mail.gmail.com>

On Fri, Jan 25, 2013 at 3:24 AM, Ronald Oussoren <ronaldoussoren at mac.com>wrote:

>
> On 24 Jan, 2013, at 22:50, Guido van Rossum <guido at python.org> wrote:
>
> > On Thu, Jan 24, 2013 at 12:51 PM, Nick Coghlan <ncoghlan at gmail.com>
> wrote:
> >> On Fri, Jan 25, 2013 at 5:12 AM, Guido van Rossum <guido at python.org>
> wrote:
> >>> On Thu, Jan 24, 2013 at 11:05 AM, Nikolay Kim <fafhrd91 at gmail.com>
> wrote:
> >>>> transport could have dictionary attribute where it can store optional
> information
> >>>> like socket name, peer name or file path, etc.
> >>>
> >>> Aha, that makes some sense. Though maybe it shouldn't be a dict -- it
> >>> may be expensive to populate some values in some cases, so maybe there
> >>> should just be a method transport.get_extra_info('key') which computes
> >>> and returns (and possibly caches) certain values but returns None if
> >>> the info is not supported. E.g. get_extra_info('name'),
> >>> get_extra_info('peer'). This API makes it pretty clear that the caller
> >>> should check the value for None before using it.
> >>
> >> A "get_extra_info" API like that is also amenable to providing an
> >> explicit default for the "key not present" case, and makes it clearer
> >> that the calculations involved may not be cheap.
> >
> > Yeah, the signature could be get_extra_info(key, default=None).
> >
> >> You could even go so
> >> far as to have it return a Future, allowing it to be used for info
> >> that requires network activity.
> >
> > I think that goes too far. It doesn't look like getpeername() goes out
> > to the network -- what other use case did you have in mind? (I suppose
> > it could use a Future for some keys only -- but then the caller would
> > still need to be aware that it could return None instead of a Future,
> > so it would be somewhat awkward to use -- you couldn't write
>
> A transport that tunnels traffic over a SOCKS or SSH tunnel might require
> network access to get the sockname or peername of the proxied connection. I
> don't know enough about either protocol to know for sure, and the
> information could also be fetched during connection setup and then cached.


Sounds good (to fetch it proactively ahead of time, rather than inject a
Future into the API).

-- 
--Guido van Rossum (python.org/~guido)

From ethan at stoneleaf.us  Fri Jan 25 17:48:43 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Fri, 25 Jan 2013 08:48:43 -0800
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <201301251058.28531.mark.hackett@metoffice.gov.uk>
References: <1358903168.4767.4.camel@webb>
	<201301241047.17391.mark.hackett@metoffice.gov.uk>
	<5101F9F7.1070301@stoneleaf.us>
	<201301251058.28531.mark.hackett@metoffice.gov.uk>
Message-ID: <5102B76B.2080106@stoneleaf.us>

On 01/25/2013 02:58 AM, Mark Hackett wrote:
> On Friday 25 Jan 2013, Ethan Furman wrote:
>> On 01/24/2013 02:47 AM, Mark Hackett wrote:
>>> On Thursday 24 Jan 2013, Steven D'Aprano wrote:
>>>>> I'm not sure this behavior merits the all-caps "AS EXPECTED" label.
>>>>> It's not terribly surprising once you sit down and think about it, but
>>>>> it's certainly at least a little unexpected to me that data is being
>>>>> thrown away with no notice.  It's unusual for errors to pass silently
>>>>> in python.
>>>>
>>>> Yes, we should not forget that a CSV file is not a dict. Just because
>>>>    DictReader is implemented with a dict as the storage, doesn't mean
>>>> that it should behave exactly like a dict in all things. Multiple
>>>> columns with the same name are legal in CSV, so there should be a reader
>>>> for that situation.
>>>
>>> But just because it's reading a csv file, we shouldn't change how a
>>> dictionary works if you add the same key again.
>>
>> The proposal is not to change how a dict works, but what the proper
>> response is for DictReader when a duplicate key is found.
>
> Ethan, the proposal is predicated on the "silent abandonment" being
> unexpected (which isn't actually the case, any more than doing:
>
> a = 4
> a = 9
>
> silently abandons the 4).

We're going to have to agree to disagree on this point -- I think there 
is a huge difference between reassigning a variable which is completely 
under your control from losing entire columns of data from a file which 
you may have never seen before.


> Except, just like the assignment in the aside above, this is entirely what IS
> expected if you're putting a CSV line into a dictionary with duplicate key
> names.

Expected by whom?  The library writer?  Sure.  The application writer? 
Maybe.  The person creating the spreadsheet that's going to be dumped to 
csv to be imported into the program that thought, "This field also needs 
an item number... I'll call it 'item_no', just like that other column" 
-- Nope.


> If you don't want it to do what a dictionary does, then don't use DictReader,
> as Chris proposes.

DictReader puts a name on a column -- that's its primary use;  I don't 
think the designers had the goal of dropping data when they implemented 
it -- I suspect it was just missed as a possibility (not being the 
"normal" type of csv file) or putting a warning in the docs was missed.
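The silent loss is easy to demonstrate with the stdlib as it stands (a minimal sketch; the column names are made up for illustration):

```python
import csv
import io

# Hypothetical CSV whose header repeats "item_no", as in the
# spreadsheet scenario above.
data = "item_no,name,item_no\n1,widget,99\n"

rows = list(csv.DictReader(io.StringIO(data)))

# The dict keeps only one "item_no" key; the later column's value
# silently overwrites the earlier one, so column 1's "1" is gone.
print(rows[0])       # e.g. {'item_no': '99', 'name': 'widget'}
print(len(rows[0]))  # 2 keys survive from 3 columns
```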

~Ethan~


From rurpy at yahoo.com  Fri Jan 25 19:03:03 2013
From: rurpy at yahoo.com (rurpy at yahoo.com)
Date: Fri, 25 Jan 2013 10:03:03 -0800 (PST)
Subject: [Python-ideas] csv.DictReader could handle headers more
 intelligently.
In-Reply-To: <201301251653.46558.mark.hackett@metoffice.gov.uk>
References: <1358903168.4767.4.camel@webb>
	<201301251100.31153.mark.hackett@metoffice.gov.uk>
	<5102B321.8070706@stoneleaf.us>
	<201301251653.46558.mark.hackett@metoffice.gov.uk>
Message-ID: <17bba319-ff53-41a6-8ada-3cd3ad036076@googlegroups.com>

On 01/25/2013 09:53 AM, Mark Hackett wrote:
> On Friday 25 Jan 2013, Ethan Furman wrote:
>> On 01/25/2013 03:00 AM, Mark Hackett wrote:
>> > On Thursday 24 Jan 2013, Steven D'Aprano wrote:
>> >> - it is less obvious: how does the caller decide that there are too
>> >>   many field names?
>> >
>> > Additionally, the user of the library now has to read much more about
>> > the library (either code or documentation, which has to track the code
>> > too), to decide what it is going to do.
>> >
>> > If you have to read the code, then it's not really OO, is it. It's
>> > light grey, not black box.
>> 
>> If you have to read the code, the documentation needs improvement.
>> 
> 
> And if you put your feet too close to the fire, your feet will burn.
> 
> Neither have anything to do with the subject at hand, however.
> 
> Which is if a dictionary acts a certain way and calling a routine that
> creates a dictionary AND WORKS DIFFERENTLY, then why did you use a
> routine that creates a dictionary?
> 
> You see, the option here is to leave it operating as a dictionary
> operates. And in that case, you do not need to document anything. The
> documentation of how it works is already covered by the python basics:
> "How does a dictionary work in Python?".

The csv DictReader *uses* a dictionary for its output. That 
it does so imposes no requirements on how it should parse or 
otherwise handle the input that eventually goes into that 
dict.

I can understand the appeal of keeping things simple and
simply cramming whatever comes out of a simple parse of 
the header into the dict keys.  Simplicity is good and
that is a valid opinion.  However, it is not a priori the
obviously best one no matter how much hand-waving and 
foot stomping comes with it.

I would prefer to see a suppressible exception when header
keys are duplicated on the grounds that such a csv file 
is not in general an appropriate input for the DictReader.

> So don't change it, and you don't have to improve the documentation.

If it's not changed then the documentation definitely should
be fixed.  The very fact that when the behaviour was pointed
out here, the result was a long discussion rather than one
or two responses that said, "of course it behaves that way"
is the strongest evidence that the current description
is inadequate.
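One way to realise that suppressible-exception idea (a hypothetical `StrictDictReader` subclass with a made-up `strict` flag; not part of the stdlib csv module):

```python
import csv
import io

class StrictDictReader(csv.DictReader):
    """Hypothetical sketch: raise on duplicate header names unless the
    caller suppresses the check with strict=False."""

    def __init__(self, f, *args, strict=True, **kwargs):
        super().__init__(f, *args, **kwargs)
        if strict:
            names = self.fieldnames or []  # reading this parses the header row
            dupes = sorted({n for n in names if names.count(n) > 1})
            if dupes:
                raise ValueError("duplicate CSV header(s): " + ", ".join(dupes))

# Unique headers behave exactly like csv.DictReader:
print(next(StrictDictReader(io.StringIO("a,b\n1,2\n"))))

# Duplicate headers raise instead of silently dropping a column:
try:
    StrictDictReader(io.StringIO("a,a\n1,2\n"))
except ValueError as exc:
    print(exc)
```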

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130125/9256c04f/attachment.html>

From fafhrd91 at gmail.com  Fri Jan 25 19:03:49 2013
From: fafhrd91 at gmail.com (Nikolay Kim)
Date: Fri, 25 Jan 2013 10:03:49 -0800
Subject: [Python-ideas] PEP 3156: Transport.sendfile
In-Reply-To: <CAP7+vJKWfq+q0s_=s5rPorOntzXZ-J-aApXrKdxGWdL5wEZ1uw@mail.gmail.com>
References: <CAP7+vJ+_QwQW13b1mE1NSg8Z89MyYwHkyYmLwhumGo0iO-V+wA@mail.gmail.com>
	<CANSw7Kyi7d-Q5O=MqZTx5a-bOA3CQ4Me5RVafxJ0e7Kj1w693w@mail.gmail.com>
	<CAP7+vJ+BxgVPv0jBP=9GAdnXGdLRqgjCcBhVzVnJ4K7NDAxrjQ@mail.gmail.com>
	<2BC319B1-026C-4255-B9D9-991A163CED7A@gmail.com>
	<CAP7+vJKCGRAQkt628Nt_7JUoBWDWXXNcxxdDSyyYUBjQ+j_yxA@mail.gmail.com>
	<CADiSq7e1oDXyQL+ohRe0b=bQ6rGg7ycKtZB+TWHt_4VQ7igwug@mail.gmail.com>
	<CAP7+vJKE9wTrPaZVXEGWRusshdz61fa0rMYshbpOczNK99d0bQ@mail.gmail.com>
	<2E80A011-E5E2-4954-9FAC-B2B0E6CBA509@mac.com>
	<CAP7+vJKWfq+q0s_=s5rPorOntzXZ-J-aApXrKdxGWdL5wEZ1uw@mail.gmail.com>
Message-ID: <D13F5E9F-97EE-4EED-9DC0-504169C8DE3C@gmail.com>


I think Transport needs 'sendfile' api, something like:

   @tasks.coroutine
   def sendfile(self, fd, offset, nbytes):
      ...

otherwise it is impossible to implement sendfile without breaking transport encapsulation.



From guido at python.org  Fri Jan 25 19:08:46 2013
From: guido at python.org (Guido van Rossum)
Date: Fri, 25 Jan 2013 10:08:46 -0800
Subject: [Python-ideas] PEP 3156: Transport.sendfile
In-Reply-To: <D13F5E9F-97EE-4EED-9DC0-504169C8DE3C@gmail.com>
References: <CAP7+vJ+_QwQW13b1mE1NSg8Z89MyYwHkyYmLwhumGo0iO-V+wA@mail.gmail.com>
	<CANSw7Kyi7d-Q5O=MqZTx5a-bOA3CQ4Me5RVafxJ0e7Kj1w693w@mail.gmail.com>
	<CAP7+vJ+BxgVPv0jBP=9GAdnXGdLRqgjCcBhVzVnJ4K7NDAxrjQ@mail.gmail.com>
	<2BC319B1-026C-4255-B9D9-991A163CED7A@gmail.com>
	<CAP7+vJKCGRAQkt628Nt_7JUoBWDWXXNcxxdDSyyYUBjQ+j_yxA@mail.gmail.com>
	<CADiSq7e1oDXyQL+ohRe0b=bQ6rGg7ycKtZB+TWHt_4VQ7igwug@mail.gmail.com>
	<CAP7+vJKE9wTrPaZVXEGWRusshdz61fa0rMYshbpOczNK99d0bQ@mail.gmail.com>
	<2E80A011-E5E2-4954-9FAC-B2B0E6CBA509@mac.com>
	<CAP7+vJKWfq+q0s_=s5rPorOntzXZ-J-aApXrKdxGWdL5wEZ1uw@mail.gmail.com>
	<D13F5E9F-97EE-4EED-9DC0-504169C8DE3C@gmail.com>
Message-ID: <CAP7+vJLZD3EOV6dqh_Bt+ti1KHocpwBEJ59-myoCLJxOEvxPVQ@mail.gmail.com>

On Fri, Jan 25, 2013 at 10:03 AM, Nikolay Kim <fafhrd91 at gmail.com> wrote:

>
> I think Transport needs 'sendfile' api, something like:
>
>    @tasks.coroutine
>    def sendfile(self, fd, offset, nbytes):
>       ...
>
> otherwise it is impossible to implement sendfile without breaking
> transport encapsulation


Really? Can't the user write this themselves? What's wrong with this:

while True:
  data = os.read(fd, 16*1024)
  if not data: break
  transport.write(data)

(Perhaps augmented with a way to respond to pause() requests.)

-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130125/45ecc5ad/attachment.html>

From fafhrd91 at gmail.com  Fri Jan 25 19:11:37 2013
From: fafhrd91 at gmail.com (Nikolay Kim)
Date: Fri, 25 Jan 2013 10:11:37 -0800
Subject: [Python-ideas] PEP 3156: Transport.sendfile
In-Reply-To: <CAP7+vJLZD3EOV6dqh_Bt+ti1KHocpwBEJ59-myoCLJxOEvxPVQ@mail.gmail.com>
References: <CAP7+vJ+_QwQW13b1mE1NSg8Z89MyYwHkyYmLwhumGo0iO-V+wA@mail.gmail.com>
	<CANSw7Kyi7d-Q5O=MqZTx5a-bOA3CQ4Me5RVafxJ0e7Kj1w693w@mail.gmail.com>
	<CAP7+vJ+BxgVPv0jBP=9GAdnXGdLRqgjCcBhVzVnJ4K7NDAxrjQ@mail.gmail.com>
	<2BC319B1-026C-4255-B9D9-991A163CED7A@gmail.com>
	<CAP7+vJKCGRAQkt628Nt_7JUoBWDWXXNcxxdDSyyYUBjQ+j_yxA@mail.gmail.com>
	<CADiSq7e1oDXyQL+ohRe0b=bQ6rGg7ycKtZB+TWHt_4VQ7igwug@mail.gmail.com>
	<CAP7+vJKE9wTrPaZVXEGWRusshdz61fa0rMYshbpOczNK99d0bQ@mail.gmail.com>
	<2E80A011-E5E2-4954-9FAC-B2B0E6CBA509@mac.com>
	<CAP7+vJKWfq+q0s_=s5rPorOntzXZ-J-aApXrKdxGWdL5wEZ1uw@mail.gmail.com>
	<D13F5E9F-97EE-4EED-9DC0-504169C8DE3C@gmail.com>
	<CAP7+vJLZD3EOV6dqh_Bt+ti1KHocpwBEJ59-myoCLJxOEvxPVQ@mail.gmail.com>
Message-ID: <F0322C94-28C5-4368-8150-9F45D224C4BF@gmail.com>


On Jan 25, 2013, at 10:08 AM, Guido van Rossum <guido at python.org> wrote:

> On Fri, Jan 25, 2013 at 10:03 AM, Nikolay Kim <fafhrd91 at gmail.com> wrote:
> 
> I think Transport needs 'sendfile' api, something like:
> 
>    @tasks.coroutine
>    def sendfile(self, fd, offset, nbytes):
>       ...
> 
> otherwise it is impossible to implement sendfile without breaking transport encapsulation
>  
> Really? Can't the user write this themselves? What's wrong with this:
> 
> while True:
>   data = os.read(fd, 16*1024)
>   if not data: break
>   transport.write(data)
> 
> (Perhaps augmented with a way to respond to pause() requests.)
> 

i mean 'os.sendfile()', zero-copy sendfile.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130125/acd685e9/attachment.html>

From guido at python.org  Fri Jan 25 21:04:24 2013
From: guido at python.org (Guido van Rossum)
Date: Fri, 25 Jan 2013 12:04:24 -0800
Subject: [Python-ideas] PEP 3156: Transport.sendfile
In-Reply-To: <F0322C94-28C5-4368-8150-9F45D224C4BF@gmail.com>
References: <CAP7+vJ+_QwQW13b1mE1NSg8Z89MyYwHkyYmLwhumGo0iO-V+wA@mail.gmail.com>
	<CANSw7Kyi7d-Q5O=MqZTx5a-bOA3CQ4Me5RVafxJ0e7Kj1w693w@mail.gmail.com>
	<CAP7+vJ+BxgVPv0jBP=9GAdnXGdLRqgjCcBhVzVnJ4K7NDAxrjQ@mail.gmail.com>
	<2BC319B1-026C-4255-B9D9-991A163CED7A@gmail.com>
	<CAP7+vJKCGRAQkt628Nt_7JUoBWDWXXNcxxdDSyyYUBjQ+j_yxA@mail.gmail.com>
	<CADiSq7e1oDXyQL+ohRe0b=bQ6rGg7ycKtZB+TWHt_4VQ7igwug@mail.gmail.com>
	<CAP7+vJKE9wTrPaZVXEGWRusshdz61fa0rMYshbpOczNK99d0bQ@mail.gmail.com>
	<2E80A011-E5E2-4954-9FAC-B2B0E6CBA509@mac.com>
	<CAP7+vJKWfq+q0s_=s5rPorOntzXZ-J-aApXrKdxGWdL5wEZ1uw@mail.gmail.com>
	<D13F5E9F-97EE-4EED-9DC0-504169C8DE3C@gmail.com>
	<CAP7+vJLZD3EOV6dqh_Bt+ti1KHocpwBEJ59-myoCLJxOEvxPVQ@mail.gmail.com>
	<F0322C94-28C5-4368-8150-9F45D224C4BF@gmail.com>
Message-ID: <CAP7+vJL1Z0Q+QbdwgN8ECr5b1Us6ep1mAgPLg-2XTzom5TnOpg@mail.gmail.com>

On Fri, Jan 25, 2013 at 10:11 AM, Nikolay Kim <fafhrd91 at gmail.com> wrote:

>
> On Jan 25, 2013, at 10:08 AM, Guido van Rossum <guido at python.org> wrote:
>
> On Fri, Jan 25, 2013 at 10:03 AM, Nikolay Kim <fafhrd91 at gmail.com> wrote:
>
>>
>> I think Transport needs 'sendfile' api, something like:
>>
>>    @tasks.coroutine
>>    def sendfile(self, fd, offset, nbytes):
>>       ...
>>
>> otherwise it is impossible to implement sendfile without breaking
>> transport encapsulation
>
>
> Really? Can't the user write this themselves? What's wrong with this:
>
> while True:
>   data = os.read(fd, 16*1024)
>   if not data: break
>   transport.write(data)
>
> (Perhaps augmented with a way to respond to pause() requests.)
>
>
> i mean 'os.sendfile()', zero-copy sendfile.
>

I see (http://docs.python.org/dev/library/os.html#os.sendfile).

Hm, that function is so platform-specific that we might as well force users
to do it this way:

sock = transport.get_extra_info("socket")
if sock is not None:
  os.sendfile(sock.fileno(), ......)
else:
  <use write() like I suggested above>

-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130125/4a40fd30/attachment.html>

From fafhrd91 at gmail.com  Fri Jan 25 21:25:50 2013
From: fafhrd91 at gmail.com (Nikolay Kim)
Date: Fri, 25 Jan 2013 12:25:50 -0800
Subject: [Python-ideas] PEP 3156: Transport.sendfile
In-Reply-To: <CAP7+vJL1Z0Q+QbdwgN8ECr5b1Us6ep1mAgPLg-2XTzom5TnOpg@mail.gmail.com>
References: <CAP7+vJ+_QwQW13b1mE1NSg8Z89MyYwHkyYmLwhumGo0iO-V+wA@mail.gmail.com>
	<CANSw7Kyi7d-Q5O=MqZTx5a-bOA3CQ4Me5RVafxJ0e7Kj1w693w@mail.gmail.com>
	<CAP7+vJ+BxgVPv0jBP=9GAdnXGdLRqgjCcBhVzVnJ4K7NDAxrjQ@mail.gmail.com>
	<2BC319B1-026C-4255-B9D9-991A163CED7A@gmail.com>
	<CAP7+vJKCGRAQkt628Nt_7JUoBWDWXXNcxxdDSyyYUBjQ+j_yxA@mail.gmail.com>
	<CADiSq7e1oDXyQL+ohRe0b=bQ6rGg7ycKtZB+TWHt_4VQ7igwug@mail.gmail.com>
	<CAP7+vJKE9wTrPaZVXEGWRusshdz61fa0rMYshbpOczNK99d0bQ@mail.gmail.com>
	<2E80A011-E5E2-4954-9FAC-B2B0E6CBA509@mac.com>
	<CAP7+vJKWfq+q0s_=s5rPorOntzXZ-J-aApXrKdxGWdL5wEZ1uw@mail.gmail.com>
	<D13F5E9F-97EE-4EED-9DC0-504169C8DE3C@gmail.com>
	<CAP7+vJLZD3EOV6dqh_Bt+ti1KHocpwBEJ59-myoCLJxOEvxPVQ@mail.gmail.com>
	<F0322C94-28C5-4368-8150-9F45D224C4BF@gmail.com>
	<CAP7+vJL1Z0Q+QbdwgN8ECr5b1Us6ep1mAgPLg-2XTzom5TnOpg@mail.gmail.com>
Message-ID: <F48F9B5E-890C-4694-86A9-697100EB357D@gmail.com>


On Jan 25, 2013, at 12:04 PM, Guido van Rossum <guido at python.org> wrote:

> On Fri, Jan 25, 2013 at 10:11 AM, Nikolay Kim <fafhrd91 at gmail.com> wrote:
> 
> On Jan 25, 2013, at 10:08 AM, Guido van Rossum <guido at python.org> wrote:
> 
>> On Fri, Jan 25, 2013 at 10:03 AM, Nikolay Kim <fafhrd91 at gmail.com> wrote:
>> 
>> I think Transport needs 'sendfile' api, something like:
>> 
>>    @tasks.coroutine
>>    def sendfile(self, fd, offset, nbytes):
>>       ...
>> 
>> otherwise it is impossible to implement sendfile without breaking transport encapsulation
>>  
>> Really? Can't the user write this themselves? What's wrong with this:
>> 
>> while True:
>>   data = os.read(fd, 16*1024)
>>   if not data: break
>>   transport.write(data)
>> 
>> (Perhaps augmented with a way to respond to pause() requests.)
> 
> i mean 'os.sendfile()', zero-copy sendfile.
> 
> I see (http://docs.python.org/dev/library/os.html#os.sendfile).
> 
> Hm, that function is so platform-specific that we might as well force users to do it this way:
> 
> sock = transport.get_extra_info("socket")
> if sock is not None:
>   os.sendfile(sock.fileno(), ......)
> else:
>   <use write() like I suggested above>

there should be some kind of way to flush the write buffer, or write callbacks.

sock = transport.get_extra_info("socket")
if sock is not None:
   os.sendfile(sock.fileno(), ......)
else:
   yield from transport.write_buffer_flush()
   <use of write() method>



-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130125/b20e1203/attachment.html>

From guido at python.org  Fri Jan 25 21:28:28 2013
From: guido at python.org (Guido van Rossum)
Date: Fri, 25 Jan 2013 12:28:28 -0800
Subject: [Python-ideas] PEP 3156: Transport.sendfile
In-Reply-To: <F48F9B5E-890C-4694-86A9-697100EB357D@gmail.com>
References: <CAP7+vJ+_QwQW13b1mE1NSg8Z89MyYwHkyYmLwhumGo0iO-V+wA@mail.gmail.com>
	<CANSw7Kyi7d-Q5O=MqZTx5a-bOA3CQ4Me5RVafxJ0e7Kj1w693w@mail.gmail.com>
	<CAP7+vJ+BxgVPv0jBP=9GAdnXGdLRqgjCcBhVzVnJ4K7NDAxrjQ@mail.gmail.com>
	<2BC319B1-026C-4255-B9D9-991A163CED7A@gmail.com>
	<CAP7+vJKCGRAQkt628Nt_7JUoBWDWXXNcxxdDSyyYUBjQ+j_yxA@mail.gmail.com>
	<CADiSq7e1oDXyQL+ohRe0b=bQ6rGg7ycKtZB+TWHt_4VQ7igwug@mail.gmail.com>
	<CAP7+vJKE9wTrPaZVXEGWRusshdz61fa0rMYshbpOczNK99d0bQ@mail.gmail.com>
	<2E80A011-E5E2-4954-9FAC-B2B0E6CBA509@mac.com>
	<CAP7+vJKWfq+q0s_=s5rPorOntzXZ-J-aApXrKdxGWdL5wEZ1uw@mail.gmail.com>
	<D13F5E9F-97EE-4EED-9DC0-504169C8DE3C@gmail.com>
	<CAP7+vJLZD3EOV6dqh_Bt+ti1KHocpwBEJ59-myoCLJxOEvxPVQ@mail.gmail.com>
	<F0322C94-28C5-4368-8150-9F45D224C4BF@gmail.com>
	<CAP7+vJL1Z0Q+QbdwgN8ECr5b1Us6ep1mAgPLg-2XTzom5TnOpg@mail.gmail.com>
	<F48F9B5E-890C-4694-86A9-697100EB357D@gmail.com>
Message-ID: <CAP7+vJJNeS7wSi+TmN9W1Tbv=uQ3ovcVGXj_zw7YfAUg+4jCBg@mail.gmail.com>

On Fri, Jan 25, 2013 at 12:25 PM, Nikolay Kim <fafhrd91 at gmail.com> wrote:

>
> On Jan 25, 2013, at 12:04 PM, Guido van Rossum <guido at python.org> wrote:
>
> On Fri, Jan 25, 2013 at 10:11 AM, Nikolay Kim <fafhrd91 at gmail.com> wrote:
>
>>
>> On Jan 25, 2013, at 10:08 AM, Guido van Rossum <guido at python.org> wrote:
>>
>> On Fri, Jan 25, 2013 at 10:03 AM, Nikolay Kim <fafhrd91 at gmail.com> wrote:
>>
>>>
>>> I think Transport needs 'sendfile' api, something like:
>>>
>>>    @tasks.coroutine
>>>    def sendfile(self, fd, offset, nbytes):
>>>       ...
>>>
>>> otherwise it is impossible to implement sendfile without breaking
>>> transport encapsulation
>>
>>
>> Really? Can't the user write this themselves? What's wrong with this:
>>
>> while True:
>>   data = os.read(fd, 16*1024)
>>   if not data: break
>>   transport.write(data)
>>
>> (Perhaps augmented with a way to respond to pause() requests.)
>>
>>
>> i mean 'os.sendfile()', zero-copy sendfile.
>>
>
> I see (http://docs.python.org/dev/library/os.html#os.sendfile).
>
> Hm, that function is so platform-specific that we might as well force
> users to do it this way:
>
> sock = transport.get_extra_info("socket")
> if sock is not None:
>   os.sendfile(sock.fileno(), ......)
> else:
>   <use write() like I suggested above>
>
>
> there should be some kind of way to flush the write buffer, or write callbacks.
>
> sock = transport.get_extra_info("socket")
> if sock is not None:
>    os.sendfile(sock.fileno(), ......)
> else:
>    yield from transport.write_buffer_flush()
>    <use of write() method>
>

Oh, that's an interesting idea in its own right. But I'm not sure Twisted
could implement this given that their flow control works differently.

However, I think you've convinced me that offering sendfile() is actually
better. But should it take a file descriptor or a stream (file) object?

-- 
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130125/32cbd835/attachment.html>

From g.rodola at gmail.com  Fri Jan 25 21:44:14 2013
From: g.rodola at gmail.com (=?ISO-8859-1?Q?Giampaolo_Rodol=E0?=)
Date: Fri, 25 Jan 2013 21:44:14 +0100
Subject: [Python-ideas] PEP 3156: Transport.sendfile
In-Reply-To: <F48F9B5E-890C-4694-86A9-697100EB357D@gmail.com>
References: <CAP7+vJ+_QwQW13b1mE1NSg8Z89MyYwHkyYmLwhumGo0iO-V+wA@mail.gmail.com>
	<CANSw7Kyi7d-Q5O=MqZTx5a-bOA3CQ4Me5RVafxJ0e7Kj1w693w@mail.gmail.com>
	<CAP7+vJ+BxgVPv0jBP=9GAdnXGdLRqgjCcBhVzVnJ4K7NDAxrjQ@mail.gmail.com>
	<2BC319B1-026C-4255-B9D9-991A163CED7A@gmail.com>
	<CAP7+vJKCGRAQkt628Nt_7JUoBWDWXXNcxxdDSyyYUBjQ+j_yxA@mail.gmail.com>
	<CADiSq7e1oDXyQL+ohRe0b=bQ6rGg7ycKtZB+TWHt_4VQ7igwug@mail.gmail.com>
	<CAP7+vJKE9wTrPaZVXEGWRusshdz61fa0rMYshbpOczNK99d0bQ@mail.gmail.com>
	<2E80A011-E5E2-4954-9FAC-B2B0E6CBA509@mac.com>
	<CAP7+vJKWfq+q0s_=s5rPorOntzXZ-J-aApXrKdxGWdL5wEZ1uw@mail.gmail.com>
	<D13F5E9F-97EE-4EED-9DC0-504169C8DE3C@gmail.com>
	<CAP7+vJLZD3EOV6dqh_Bt+ti1KHocpwBEJ59-myoCLJxOEvxPVQ@mail.gmail.com>
	<F0322C94-28C5-4368-8150-9F45D224C4BF@gmail.com>
	<CAP7+vJL1Z0Q+QbdwgN8ECr5b1Us6ep1mAgPLg-2XTzom5TnOpg@mail.gmail.com>
	<F48F9B5E-890C-4694-86A9-697100EB357D@gmail.com>
Message-ID: <CAFYqXL_zpy=_vjMkzjRycDCUU65MYmPwYmHpz75o7MxefDgrLQ@mail.gmail.com>

In principle os.sendfile() is not too different from socket.send():
they share the same return value (no. of bytes sent) and errors, hence
it's pretty straightforward to implement (the user could even just
override Transport.write() him/herself).
Nonetheless there are other subtle differences (e.g. it works with
regular (mmap-like) files only), so deciding whether to use send()
or sendfile() behind the scenes is not a good idea.
The Transport class should probably provide a separate method (other than write()).
Also, I think that *at this point* thinking about adding sendfile()
into Tulip is probably premature.

--- Giampaolo

http://code.google.com/p/pyftpdlib/
http://code.google.com/p/psutil/
http://code.google.com/p/pysendfile/


2013/1/25 Nikolay Kim <fafhrd91 at gmail.com>:
>
> On Jan 25, 2013, at 12:04 PM, Guido van Rossum <guido at python.org> wrote:
>
> On Fri, Jan 25, 2013 at 10:11 AM, Nikolay Kim <fafhrd91 at gmail.com> wrote:
>>
>>
>> On Jan 25, 2013, at 10:08 AM, Guido van Rossum <guido at python.org> wrote:
>>
>> On Fri, Jan 25, 2013 at 10:03 AM, Nikolay Kim <fafhrd91 at gmail.com> wrote:
>>>
>>>
>>> I think Transport needs 'sendfile' api, something like:
>>>
>>>    @tasks.coroutine
>>>    def sendfile(self, fd, offset, nbytes):
>>>       ...
>>>
>>> otherwise it is impossible to implement sendfile without breaking
>>> transport encapsulation
>>
>>
>> Really? Can't the user write this themselves? What's wrong with this:
>>
>> while True:
>>   data = os.read(fd, 16*1024)
>>   if not data: break
>>   transport.write(data)
>>
>> (Perhaps augmented with a way to respond to pause() requests.)
>>
>>
>> i mean 'os.sendfile()', zero-copy sendfile.
>
>
> I see (http://docs.python.org/dev/library/os.html#os.sendfile).
>
> Hm, that function is so platform-specific that we might as well force users
> to do it this way:
>
> sock = transport.get_extra_info("socket")
> if sock is not None:
>   os.sendfile(sock.fileno(), ......)
> else:
>   <use write() like I suggested above>
>
>
> there should be some kind of way to flush the write buffer, or write callbacks.
>
> sock = transport.get_extra_info("socket")
> if sock is not None:
>    os.sendfile(sock.fileno(), ......)
> else:
>    yield from transport.write_buffer_flush()
>    <use of write() method>
>
>
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>


From shane at umbrellacode.com  Sat Jan 26 12:55:48 2013
From: shane at umbrellacode.com (Shane Green)
Date: Sat, 26 Jan 2013 03:55:48 -0800
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <201301251653.46558.mark.hackett@metoffice.gov.uk>
References: <1358903168.4767.4.camel@webb>
	<201301251100.31153.mark.hackett@metoffice.gov.uk>
	<5102B321.8070706@stoneleaf.us>
	<201301251653.46558.mark.hackett@metoffice.gov.uk>
Message-ID: <F4553DB5-3A86-4E8C-A8A5-55A60FF8DE7B@umbrellacode.com>

Sorry if this is a dupe -- it went to the google groups address the first time around, and I think that's different...


> I've been trying to avoid the wrath, but can't any longer.  Let me start by clarifying that I know what a dictionary is, how it works, and what Python is, so we can bypass calling that into question.  I also know what CSV is, and I've dealt with a lot of real-life examples of CSV data: not just exports from excel, log data from the energy management space, sensor values, etc.; critical electrical fault data generated by very legacy, stupid equipment.  And while it's true that a dictionary is a dictionary and it works the way it works, the real point that drives home is that it's an inappropriate mechanism for dealing with ordered rows of sequential values.  Regardless of what choices were made for the implementation, if the module's name is csv, it should be able to do the things it says it does with any legal CSV content without losing information.  Just because it's how a dictionary works doesn't mean column 3's value replacing column 1's value is something other than the loss of data.  One CSV file I worked with had headers for five columns of information, then the header "VALUE" for every 5-minute period in an hour.  Using this CSV parser would leave the client with one sample an hour: how dictionaries work isn't going to bring back 10 values, so information was lost.
> 
> The final point is a simple one: while that CSV file format was stupid, it was perfectly legal.  Something that deals with CSV content should not be losing any of its content.  It also should [not] be barfing or throwing exceptions, by the way.  


> And what about fixing it by replacing it with a class that does it correctly: maps values to column numbers, keeps values as lists modeled after FieldStorage.  Make iterating it work just like it does now by replacing the values with the last value in each list before returning it, and provide iterator methods for getting at the new functionality, which includes iterating items with repeating header names in order, etc.; and also iter records, or something like that, to iterate the header: [value, ...] maps.



Shane Green 
www.umbrellacode.com
408-692-4666 | shane at umbrellacode.com

On Jan 25, 2013, at 8:53 AM, Mark Hackett <mark.hackett at metoffice.gov.uk> wrote:

> On Friday 25 Jan 2013, Ethan Furman wrote:
>> On 01/25/2013 03:00 AM, Mark Hackett wrote:
>>> On Thursday 24 Jan 2013, Steven D'Aprano wrote:
>>>> - it is less obvious: how does the caller decide that there are too many
>>>>    field names?
>>> 
>>> Additionally, the user of the library now has to read much more about the
>>> library (either code or documentation, which has to track the code too),
>>> to decide what it is going to do.
>>> 
>>> If you have to read the code, then it's not really OO, is it. It's light
>>> grey, not black box.
>> 
>> If you have to read the code, the documentation needs improvement.
>> 
> 
> And if you put your feet too close to the fire, your feet will burn.
> 
> Neither have anything to do with the subject at hand, however.
> 
> Which is if a dictionary acts a certain way and calling a routine that creates 
> a dictionary AND WORKS DIFFERENTLY, then why did you use a routine that 
> creates a dictionary?
> 
> You see, the option here is to leave it operating as a dictionary operates. 
> And in that case, you do not need to document anything. The documentation of 
> how it works is already covered by the python basics: "How does a dictionary 
> work in Python?".
> 
> So don't change it, and you don't have to improve the documentation.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130126/5eefaa63/attachment.html>

From stephen at xemacs.org  Sat Jan 26 14:53:53 2013
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 26 Jan 2013 22:53:53 +0900
Subject: [Python-ideas] csv.DictReader could handle headers
	more	intelligently.
In-Reply-To: <F4553DB5-3A86-4E8C-A8A5-55A60FF8DE7B@umbrellacode.com>
References: <1358903168.4767.4.camel@webb>
	<201301251100.31153.mark.hackett@metoffice.gov.uk>
	<5102B321.8070706@stoneleaf.us>
	<201301251653.46558.mark.hackett@metoffice.gov.uk>
	<F4553DB5-3A86-4E8C-A8A5-55A60FF8DE7B@umbrellacode.com>
Message-ID: <87pq0s2gpa.fsf@uwakimon.sk.tsukuba.ac.jp>

Shane Green writes:

 > And while it's true that a dictionary is a dictionary and it works
 > the way it works, the real point that drives home is that it's an
 > inappropriate mechanism for dealing with ordered rows of sequential
 > values.

Right!  So use csv.reader, or csv.DictReader with an explicit
fieldnames argument.

The point of csv.DictReader with default fieldnames is to take a
"well-behaved" table and turn it into a sequence of "poor-man's"
objects.

 > The final point is a simple one: while that CSV file format was
 > stupid, it was perfectly legal.  Something that deals with CSV
 > content should not be losing any of its content.

That's a reasonable requirement.

 > It also should [not] be barfing or throwing exceptions, by the way.

That's not.  As long as the module provides classes capable of
handling any CSV format (it does), it may also provide convenience
classes for special purposes with restricted formats.  Those classes
may throw exceptions on input that doesn't satisfy the restrictions.

 > And what about fixing it by replacing it with a class that
 > does it correctly, [...]?

Doesn't help users who want automatically detected access-by-name.
They must have unique field names.  (I don't have a use case.  I
assume the implementer of csv.DictReader did.<wink/>)



From vito.detullio at gmail.com  Sat Jan 26 13:01:11 2013
From: vito.detullio at gmail.com (Vito De Tullio)
Date: Sat, 26 Jan 2013 13:01:11 +0100
Subject: [Python-ideas] complex number and fractional exponent
Message-ID: <ke0gi7$61s$1@ger.gmane.org>

Hi.

With Python 3000 there was a lot of fuss about PEP 238 (integer division 
with a float result).

Today I stumbled upon a similar behaviour, but I did not find a clear 
reference on the net regarding an "extension" of the PEP to other 
mathematical operations / data types:

    >>> (-1)**.5
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: negative number cannot be raised to a fractional power
    >>> (-1+0j)**.5
    (6.123031769111886e-17+1j)

(which is "really close" to 1j)


Are there any ideas about extending the pow() / ** operator to return a 
complex number when necessary?

ATM I don't need to work with complex numbers, nor do I have a strong 
opinion on the choice; I'm more curious why such a big language change 
was introduced for division but not extended to exponentiation.

thanks 

note: at the moment I don't have a python3 executable, but I guess this is 
applicable to it.


-- 
ZeD



From shane at umbrellacode.com  Sat Jan 26 15:39:11 2013
From: shane at umbrellacode.com (Shane Green)
Date: Sat, 26 Jan 2013 06:39:11 -0800
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <87pq0s2gpa.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <1358903168.4767.4.camel@webb>
	<201301251100.31153.mark.hackett@metoffice.gov.uk>
	<5102B321.8070706@stoneleaf.us>
	<201301251653.46558.mark.hackett@metoffice.gov.uk>
	<F4553DB5-3A86-4E8C-A8A5-55A60FF8DE7B@umbrellacode.com>
	<87pq0s2gpa.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <D88C8B2C-DB03-46F4-888A-FAC5C0F25E83@umbrellacode.com>

Okay, I like your point about DictReader having a place with a subset of CSV tables, and agree that, given that definition, it should throw an exception when it's fed something that doesn't conform to this definition.  I like that.

One thing, though, the new version would let you access column data by name as well: 

Instead of
	row["timestamp"] == 1359210019.299478

It would be
	row["timestamp"] == [1359210019.299478]

And potentially 
	row["timestamp"] == [1359210019.299478,1359210019.299478]

It could also be accessed as: 
	row.headers[0] == "timestamp"
	row.headers[1] == "timestamp"
	row.values[0] == 1359210019.299478
	row.values[1] == 1359210019.299478

Could still provide: 
	for name,value in records.iterfirstitems(): # get the first value for each column with a given name.
	 	- or - 
	for name,value in records.iterlastitems(): # get the last value for each column with a given name.

And the exact functionality you have now: 
	records.itervaluemaps() # or something? just a map(dict(records.iterlastitems()))
		
Overkill, but really simple things to add? 

The only thing this really adds to the "convenience" of the current DictReader for well-behaved tables is the ability to access values sequentially or by name; other than that, the only difference would be iterating on a generator method's output instead of the instance itself.
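A sketch of that proposal (hypothetical `read_multirows` helper; the name and API are illustrative only, not an existing interface):

```python
import csv
import io

def read_multirows(f):
    """Hypothetical helper: each row maps a header name to the *list* of
    values from every column bearing that name, loosely modeled on
    cgi.FieldStorage."""
    reader = csv.reader(f)
    headers = next(reader)
    for raw in reader:
        row = {}
        for name, value in zip(headers, raw):
            row.setdefault(name, []).append(value)
        yield row

# A repeated "VALUE" column keeps both samples instead of one:
data = "timestamp,VALUE,VALUE\n1359210019,3,7\n"
for row in read_multirows(io.StringIO(data)):
    print(row)  # {'timestamp': ['1359210019'], 'VALUE': ['3', '7']}
```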




Shane Green 
www.umbrellacode.com
408-692-4666 | shane at umbrellacode.com

On Jan 26, 2013, at 5:53 AM, "Stephen J. Turnbull" <stephen at xemacs.org> wrote:

> Shane Green writes:
> 
>> And while it's true that a dictionary is a dictionary and it works
>> the way it works, the real point that drives home is that it's an
>> inappropriate mechanism for dealing with ordered rows of sequential
>> values.
> 
> Right!  So use csv.reader, or csv.DictReader with an explicit
> fieldnames argument.
> 
> The point of csv.DictReader with default fieldnames is to take a
> "well-behaved" table and turn it into a sequence of "poor-man's"
> objects.
> 
>> The final point is a simple one: while that CSV file format was
>> stupid, it was perfectly legal.  Something that deals with CSV
>> content should not be losing any of its content.
> 
> That's a reasonable requirement.
> 
>> It also should [not] be barfing or throwing exceptions, by the way.
> 
> That's not.  As long as the module provides classes capable of
> handling any CSV format (it does), it may also provide convenience
> classes for special purposes with restricted formats.  Those classes
> may throw exceptions on input that doesn't satisfy the restrictions.
> 
>> And what about fixing it by replacing it with a class that
>> does it correctly, [...]?
> 
> Doesn't help users who want automatically detected access-by-name.
> They must have unique field names.  (I don't have a use case.  I
> assume the implementer of csv.DictReader did.<wink/>)
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130126/6964040a/attachment.html>

From ncoghlan at gmail.com  Sat Jan 26 16:01:31 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 27 Jan 2013 01:01:31 +1000
Subject: [Python-ideas] complex number and fractional exponent
In-Reply-To: <ke0gi7$61s$1@ger.gmane.org>
References: <ke0gi7$61s$1@ger.gmane.org>
Message-ID: <CADiSq7c9rYQLYP0SAVgThuEQohHgk9GB-KAyv6FMmdJ2Gy53+A@mail.gmail.com>

On Sat, Jan 26, 2013 at 10:01 PM, Vito De Tullio
<vito.detullio at gmail.com> wrote:
> There is some ideas about extending the pow() / ** operator to return
> complex number when necessary?
>
> ATM I don't need to work with complex numbers, nor I have strong opinion on
> the choice, it's more that I'm curious on why was introduced a so big
> language difference on division and not extended to power exponentiation.

Python 3.2.3 (default, Jun  8 2012, 05:36:09)
[GCC 4.7.0 20120507 (Red Hat 4.7.0-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> (-1) ** 0.5
(6.123031769111886e-17+1j)
>>> pow(-1, 0.5)
(6.123031769111886e-17+1j)

The math module is still deliberately restricted to float results,
though, as that module is intended to be a reasonably thin wrapper
around the platform floating point support. The cmath module and the
builtin pow are available if support for complex results is needed.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From storchaka at gmail.com  Sat Jan 26 17:01:14 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Sat, 26 Jan 2013 18:01:14 +0200
Subject: [Python-ideas] complex number and fractional exponent
In-Reply-To: <ke0gi7$61s$1@ger.gmane.org>
References: <ke0gi7$61s$1@ger.gmane.org>
Message-ID: <ke0uk8$ms8$1@ger.gmane.org>

On 26.01.13 14:01, Vito De Tullio wrote:
> note: at the moment I don't have a python3 executable, but I guess this is
> applicable to it.

No, it isn't.

 >>> (-1)**0.5
(6.123031769111886e-17+1j)




From dustin at v.igoro.us  Sat Jan 26 19:37:22 2013
From: dustin at v.igoro.us (Dustin J. Mitchell)
Date: Sat, 26 Jan 2013 13:37:22 -0500
Subject: [Python-ideas] PEP 3156 - Coroutines are more better
In-Reply-To: <51009907.8030404@canterbury.ac.nz>
References: <CAFkYKJ4BEgqQ5padUqoOg5-+wv62FmarPn0fON63Cdr=FhiTrA@mail.gmail.com>
	<CADbA=FWC92A9_u8-sC4f8wSfWWEu0RF0acS6_8OxNUx6kAg2fg@mail.gmail.com>
	<CAFkYKJ5Be27ZR4ChvEcXsOZ87rAcVZMiDkpgp7v1xAdJH4kTMA@mail.gmail.com>
	<CADbA=FWPxaAB_CKgGT_OmsWM85X9-A5NsOJ5d+ipBSDO9O3x0Q@mail.gmail.com>
	<CAP7+vJKvQNVNk1n7Q8B6Oaio16Aai-hqbaqGn57AE-u7J0AFSA@mail.gmail.com>
	<CAJtE5vRueHa7P7Bui=B-+d=3dw_ZXUOSMcvUUKrw5bYMwD3g1A@mail.gmail.com>
	<51009907.8030404@canterbury.ac.nz>
Message-ID: <CAJtE5vQNsu6UHgPYhScm6jBxDDVy0Rd5NTcruPQpZ2BnCwhWdQ@mail.gmail.com>

On Wed, Jan 23, 2013 at 9:14 PM, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> I think I'm going to wait and see what the coroutine-level features
> of tulip turn out to be like before saying much more.

I think this is pretty smart, actually.  Deferreds, futures, promises,
etc. give the programmer a lot of rope.  They don't require classical
models of control flow, in particular.  That's cool, but tends to lead
to code with subtle bugs.

Coroutines re-introduce just enough structure to put programmers back
in comfortable territory for verifying correctness.  This ends up
looking a bit like threads, but with less concern for synchronization
primitives, and virtually-free cloning.

Dustin


From tjreedy at udel.edu  Sat Jan 26 20:09:44 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 26 Jan 2013 14:09:44 -0500
Subject: [Python-ideas] complex number and fractional exponent
In-Reply-To: <ke0gi7$61s$1@ger.gmane.org>
References: <ke0gi7$61s$1@ger.gmane.org>
Message-ID: <ke19mg$nrg$1@ger.gmane.org>

On 1/26/2013 7:01 AM, Vito De Tullio wrote:

> There is some ideas about extending the pow() / ** operator to return
> complex number when necessary?

As others have noted, it was. But the analogy with division is not exact. 
The new // operator was added for when one wants floor(a/b).

For sqrt, one also has a choice and always has since complexes were added.
 >>> import math, cmath
 >>> math.sqrt(-1)
Traceback (most recent call last):
   File "<pyshell#1>", line 1, in <module>
     math.sqrt(-1)
ValueError: math domain error
 >>> cmath.sqrt(-1)
1j

For many purposes, such as computing standard deviation in statistics, 
one wants the exception, as negative variance indicates a calculation error.

-- 
Terry Jan Reedy



From oscar.j.benjamin at gmail.com  Sat Jan 26 22:22:22 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Sat, 26 Jan 2013 21:22:22 +0000
Subject: [Python-ideas] complex number and fractional exponent
In-Reply-To: <ke19mg$nrg$1@ger.gmane.org>
References: <ke0gi7$61s$1@ger.gmane.org>
	<ke19mg$nrg$1@ger.gmane.org>
Message-ID: <CAHVvXxTj9aOFfK8fLqh94aFT80_M=Y4maxeVYWxNmnD6JS62bQ@mail.gmail.com>

On 26 January 2013 19:09, Terry Reedy <tjreedy at udel.edu> wrote:
>
> For sqrt, one also has a choice and always has since complexes were added.
>>>> import math, cmath
>>>> math.sqrt(-1)
>
> Traceback (most recent call last):
>   File "<pyshell#1>", line 1, in <module>
>     math.sqrt(-1)
> ValueError: math domain error
>>>> cmath.sqrt(-1)
> 1j

Why does cmath.sqrt give a different value from the __pow__ version?

~$ python3
Python 3.2.3 (default, Oct 19 2012, 19:53:16)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import cmath
>>> cmath.sqrt(-1)
1j
>>> (-1) ** .5
(6.123031769111886e-17+1j)


Oscar


From nbvfour at gmail.com  Sun Jan 27 00:27:39 2013
From: nbvfour at gmail.com (nbv4)
Date: Sat, 26 Jan 2013 15:27:39 -0800 (PST)
Subject: [Python-ideas] built-in argspec for function objects
Message-ID: <ea23e5e8-25b7-4566-b11d-4abc20b1d3ab@googlegroups.com>

def my_function(a, b=c):
    pass

>>> my_function.args
['a']
>>> my_function.kwargs
{'b': c}
>>> my_function.all_args
['a', 'b']

What do you all think? Argspec is kind of unwieldy; this, I think, looks a 
lot nicer.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130126/5726efc7/attachment.html>

From oscar.j.benjamin at gmail.com  Sun Jan 27 01:09:15 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Sun, 27 Jan 2013 00:09:15 +0000
Subject: [Python-ideas] built-in argspec for function objects
In-Reply-To: <ea23e5e8-25b7-4566-b11d-4abc20b1d3ab@googlegroups.com>
References: <ea23e5e8-25b7-4566-b11d-4abc20b1d3ab@googlegroups.com>
Message-ID: <CAHVvXxRiwPzakPxR=w_PfBO08ZRbTpnYFROJcpzcoHEQ=C6fLw@mail.gmail.com>

On 26 January 2013 23:27, nbv4 <nbvfour at gmail.com> wrote:
> def my_function(a, b=c):
>     pass
>
>>>> my_function.args
> ['a']

I would have expected 'b' to be in that list.

>>>> my_function.kwargs
> {'b': c}

I would have expected this to be a boolean indicating whether or not
the function accepts **kwargs.

>>>> my_function.all_args
> ['a', 'b']

Rather than args and all_args, I would perhaps have used something
more specific like required_positional_args and positional_args

> What do you all think? Argspec is kind of unwieldy, this I think looks a lot
> nicer.

Are you aware of PEP-362?
http://www.python.org/dev/peps/pep-0362/

I think that inspecting the arguments of a function is a relatively
uncommon thing to do, so I don't really have any problem with the fact
that all the code to do it is located in a special module (inspect).
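For what it's worth, the attributes proposed above can already be derived from the existing machinery; here is a minimal sketch using the PEP 362 Signature API (Python 3.3+, or the funcsigs backport), with illustrative variable names:

```python
import inspect

def my_function(a, b=10):
    pass

sig = inspect.signature(my_function)

# Every parameter name, in declaration order.
all_args = list(sig.parameters)

# Split into required parameters and those carrying defaults.
required = [name for name, p in sig.parameters.items()
            if p.default is inspect.Parameter.empty]
defaults = {name: p.default
            for name, p in sig.parameters.items()
            if p.default is not inspect.Parameter.empty}

print(all_args)   # ['a', 'b']
print(required)   # ['a']
print(defaults)   # {'b': 10}
```

So the proposed `.args` / `.kwargs` / `.all_args` are one short helper away from what inspect already exposes.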


Oscar


From tjreedy at udel.edu  Sun Jan 27 02:51:39 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 26 Jan 2013 20:51:39 -0500
Subject: [Python-ideas] built-in argspec for function objects
In-Reply-To: <ea23e5e8-25b7-4566-b11d-4abc20b1d3ab@googlegroups.com>
References: <ea23e5e8-25b7-4566-b11d-4abc20b1d3ab@googlegroups.com>
Message-ID: <ke2185$de9$1@ger.gmane.org>

On 1/26/2013 6:27 PM, nbv4 wrote:
> def my_function(a, b=c):
>      pass
>
>  >>> my_function.args
> ['a']
>  >>> my_function.kwargs
> {'b': c}
>  >>> my_function.all_args
> ['a', 'b']
>
> What do you all think? Argspec is kind of unwieldy, this I think looks a
> lot nicer.

The argument specifications are attributes of code objects. The inspect 
functions, including the new signature function, are one way to pull 
them out. You can write you own if you wish. I do not think they should 
be duplicated on function objects.
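As a sketch of "writing your own" from the raw code-object attributes (positional parameters only; *args, **kwargs and keyword-only parameters would need extra handling):

```python
def my_function(a, b=10):
    pass

code = my_function.__code__

# Positional parameter names occupy the first co_argcount slots
# of co_varnames.
param_names = list(code.co_varnames[:code.co_argcount])

# __defaults__ lines up with the trailing parameters.
defaults = my_function.__defaults__ or ()
n_required = code.co_argcount - len(defaults)

required = param_names[:n_required]
with_defaults = dict(zip(param_names[n_required:], defaults))

print(param_names)    # ['a', 'b']
print(required)       # ['a']
print(with_defaults)  # {'b': 10}
```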

-- 
Terry Jan Reedy



From python at mrabarnett.plus.com  Sun Jan 27 04:03:49 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Sun, 27 Jan 2013 03:03:49 +0000
Subject: [Python-ideas] Interrupting threads
Message-ID: <51049915.3060808@mrabarnett.plus.com>

I know that this topic has been discussed before, but I've added a new
twist...

It's possible to interrupt the main thread using KeyboardInterrupt, so
why shouldn't it be possible to do something similar to a thread?

What I'm suggesting is that the Thread class could support an
'interrupt' method, which would raise a ThreadInterrupt exception
similar to KeyboardInterrupt.

Actually, there's more to it than that because sometimes you don't want
a section of code to be interrupted.

So here's what I'd like to suggest:

1. There's a private thread-specific flag called 'interrupt_occurred'.

2. There's a private thread-specific flag called 'heeding_interrupt'.

3. There's a context manager called 'heed_interrupt'.


About the context manager:

1. It accepts a bool argument.

2. On entry, it saves 'heeding_interrupt' and sets it to the argument.

3. It catches ThreadInterrupt and sets 'interrupt_occurred' to True.

4. On exit, it restores 'heeding_interrupt'.

5. On restoring 'heeding_interrupt', if it's now True and
'interrupt_occurred' is also True, it raises (or re-raises)
ThreadInterrupt.


Here are some examples which I hope will make things a bit clearer
(although it's still somewhat involved!):


Example 1:

with heed_interrupt(False):
     # some code

Behaviour:

On entry, the context manager saves heeding_interrupt and sets it to
False.

If an interrupt occurs in "some code", interrupt_occurred will be set
to True (because heeding_interrupt is currently False).

On exit, the context manager restores heeding_interrupt. If
heeding_interrupt is now True and interrupt_occurred is also True, it
raises ThreadInterrupt.


Example 2:

with heed_interrupt(False):
     # some code 1
     with heed_interrupt(True):
         # some code 2
     # some code 3

Behaviour:

On entry, the outer context manager saves heeding_interrupt and sets it
to False.

If an interrupt occurs in "some code 1", interrupt_occurred will be set
to True (because heeding_interrupt is currently False).

On entry, the inner context manager saves heeding_interrupt and sets it
to True.

If an interrupt occurs in "some code 2" , ThreadInterrupt will be
raised (because heeding_interrupt is currently True). The exception
will be propagated until it's caught by the inner context manager,
which will then set interrupt_occurred to True.

(It's also possible that interrupt_occurred will already be True on
entry because an interrupt occurred in "some code 1"; in that case, it
may short-circuit, skipping "some code 2" completely.)

On exit, the inner context manager restores heeding_interrupt to False.

If an interrupt occurs in "some code 3", interrupt_occurred will be set
to True (because heeding_interrupt is currently False).

On exit, the outer context manager restores heeding_interrupt. If
heeding_interrupt is now True and interrupt_occurred is also True, it
will raise ThreadInterrupt.


From andre.roberge at gmail.com  Sun Jan 27 04:12:39 2013
From: andre.roberge at gmail.com (Andre Roberge)
Date: Sat, 26 Jan 2013 23:12:39 -0400
Subject: [Python-ideas] Interrupting threads
In-Reply-To: <51049915.3060808@mrabarnett.plus.com>
References: <51049915.3060808@mrabarnett.plus.com>
Message-ID: <CAGMu_=pwRoNs86j=qGN6wcVTASNM34uQ+hx80iRTE=_HYZ=vqg@mail.gmail.com>

On Sat, Jan 26, 2013 at 11:03 PM, MRAB <python at mrabarnett.plus.com> wrote:

> I know that this topic has been discussed before, but I've added a new
> twist...
>
> It's possible to interrupt the main thread using KeyboardInterrupt, so
> why shouldn't it be possible to do something similar to a thread?
>

How about simply using http://code.activestate.com/recipes/496960/ ?

André



>
> What I'm suggesting is that the Thread class could support an
> 'interrupt' method, which would raise a ThreadInterrupt exception
> similar to KeyboardInterrupt.
>
> Actually, there's more to it than that because sometimes you don't want
> a section of code to be interrupted.
>
> So here's what I'd like to suggest:
>
> 1. There's a private thread-specific flag called 'interrupt_occurred'.
>
> 2. There's a private thread-specific flag called 'heeding_interrupt'.
>
> 3. There's a context manager called 'heed_interrupt'.
>
>
> About the context manager:
>
> 1. It accepts a bool argument.
>
> 2. On entry, it saves 'heeding_interrupt' and sets it to the argument.
>
> 3. It catches ThreadInterrupt and sets 'interrupt_occurred' to True.
>
> 4. On exit, it restores 'heeding_interrupt'.
>
> 5. On restoring 'heeding_interrupt', if it's now True and
> 'interrupt_occurred' is also True, it raises (or re-raises)
> ThreadInterrupt.
>
>
> Here are some examples which I hope will make things bit clearer
> (although it's still somewhat involved!):
>
>
> Example 1:
>
> with heed_interrupt(False):
>     # some code
>
> Behaviour:
>
> On entry, the context manager saves heeding_interrupt and sets it to
> False.
>
> If an interrupt occurs in "some code", interrupt_occurred will be set
> to True (because heeding_interrupt is currently False).
>
> On exit, the context manager restores heeding_interrupt. If
> heeding_interrupt is now True and interrupt_occurred is also True, it
> raises ThreadInterrupt.
>
>
> Example 2:
>
> with heed_interrupt(False):
>     # some code 1
>     with heed_interrupt(True):
>         # some code 2
>     # some code 3
>
> Behaviour:
>
> On entry, the outer context manager saves heeding_interrupt and sets it
> to False.
>
> If an interrupt occurs in "some code 1", interrupt_occurred will be set
> to True (because heeding_interrupt is currently False).
>
> On entry, the inner context manager saves heeding_interrupt and sets it
> to True.
>
> If an interrupt occurs in "some code 2" , ThreadInterrupt will be
> raised (because heeding_interrupt is currently True). The exception
> will be propagated until it's caught by the inner context manager,
> which will then set interrupt_occurred to True.
>
> (It's also possible that interrupt_occurred will already be True on
> entry because an interrupt occurred in "some code 1"; in that case, it
> may short-circuit, skipping "some code 2" completely.)
>
> On exit, the inner context manager restores heeding_interrupt and
> restores it to False.
>
> If an interrupt occurs in "some code 3", interrupt_occurred will be set
> to True (because heeding_interrupt is currently False).
>
> On exit, the outer context manager restores heeding_interrupt and
> restores it. If heeding_interrupt is now True, it will raise
> ThreadInterrupt.
> ______________________________**_________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/**mailman/listinfo/python-ideas<http://mail.python.org/mailman/listinfo/python-ideas>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130126/f9255911/attachment.html>

From guido at python.org  Sun Jan 27 04:33:14 2013
From: guido at python.org (Guido van Rossum)
Date: Sat, 26 Jan 2013 19:33:14 -0800
Subject: [Python-ideas] Interrupting threads
In-Reply-To: <51049915.3060808@mrabarnett.plus.com>
References: <51049915.3060808@mrabarnett.plus.com>
Message-ID: <CAP7+vJ+pJkDD2wH6u70W8yZuJv8B3EgJiwES2yyQxj-rJCW0dA@mail.gmail.com>

This won't be very useful unless you also find a way to immediately
stop threads that are blocked, (1) for I/O (which may never happen),
or (2) for a lock. This most likely will mean mucking with signals.

On Sat, Jan 26, 2013 at 7:03 PM, MRAB <python at mrabarnett.plus.com> wrote:
> I know that this topic has been discussed before, but I've added a new
> twist...
>
> It's possible to interrupt the main thread using KeyboardInterrupt, so
> why shouldn't it be possible to do something similar to a thread?
>
> What I'm suggesting is that the Thread class could support an
> 'interrupt' method, which would raise a ThreadInterrupt exception
> similar to KeyboardInterrupt.
>
> Actually, there's more to it than that because sometimes you don't want
> a section of code to be interrupted.
>
> So here's what I'd like to suggest:
>
> 1. There's a private thread-specific flag called 'interrupt_occurred'.
>
> 2. There's a private thread-specific flag called 'heeding_interrupt'.
>
> 3. There's a context manager called 'heed_interrupt'.
>
>
> About the context manager:
>
> 1. It accepts a bool argument.
>
> 2. On entry, it saves 'heeding_interrupt' and sets it to the argument.
>
> 3. It catches ThreadInterrupt and sets 'interrupt_occurred' to True.
>
> 4. On exit, it restores 'heeding_interrupt'.
>
> 5. On restoring 'heeding_interrupt', if it's now True and
> 'interrupt_occurred' is also True, it raises (or re-raises)
> ThreadInterrupt.
>
>
> Here are some examples which I hope will make things bit clearer
> (although it's still somewhat involved!):
>
>
> Example 1:
>
> with heed_interrupt(False):
>     # some code
>
> Behaviour:
>
> On entry, the context manager saves heeding_interrupt and sets it to
> False.
>
> If an interrupt occurs in "some code", interrupt_occurred will be set
> to True (because heeding_interrupt is currently False).
>
> On exit, the context manager restores heeding_interrupt. If
> heeding_interrupt is now True and interrupt_occurred is also True, it
> raises ThreadInterrupt.
>
>
> Example 2:
>
> with heed_interrupt(False):
>     # some code 1
>     with heed_interrupt(True):
>         # some code 2
>     # some code 3
>
> Behaviour:
>
> On entry, the outer context manager saves heeding_interrupt and sets it
> to False.
>
> If an interrupt occurs in "some code 1", interrupt_occurred will be set
> to True (because heeding_interrupt is currently False).
>
> On entry, the inner context manager saves heeding_interrupt and sets it
> to True.
>
> If an interrupt occurs in "some code 2" , ThreadInterrupt will be
> raised (because heeding_interrupt is currently True). The exception
> will be propagated until it's caught by the inner context manager,
> which will then set interrupt_occurred to True.
>
> (It's also possible that interrupt_occurred will already be True on
> entry because an interrupt occurred in "some code 1"; in that case, it
> may short-circuit, skipping "some code 2" completely.)
>
> On exit, the inner context manager restores heeding_interrupt and
> restores it to False.
>
> If an interrupt occurs in "some code 3", interrupt_occurred will be set
> to True (because heeding_interrupt is currently False).
>
> On exit, the outer context manager restores heeding_interrupt and
> restores it. If heeding_interrupt is now True, it will raise
> ThreadInterrupt.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas



-- 
--Guido van Rossum (python.org/~guido)


From ncoghlan at gmail.com  Sun Jan 27 06:26:30 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 27 Jan 2013 15:26:30 +1000
Subject: [Python-ideas] built-in argspec for function objects
In-Reply-To: <CADiSq7cCE=KXsBSuz9eBR0xTT63rhEOUMNGS+5byuSW2N-K+YQ@mail.gmail.com>
References: <ea23e5e8-25b7-4566-b11d-4abc20b1d3ab@googlegroups.com>
	<CADiSq7cCE=KXsBSuz9eBR0xTT63rhEOUMNGS+5byuSW2N-K+YQ@mail.gmail.com>
Message-ID: <CADiSq7dw3LkyWCsQ5AH=Tsf8f2dUqYoevixS+DFdegN6T-7sCw@mail.gmail.com>

<I wish the Google Groups bridge would quit corrupting the reply-to fields>

On Sun, Jan 27, 2013 at 3:24 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Sun, Jan 27, 2013 at 9:27 AM, nbv4 <nbvfour at gmail.com> wrote:
>> def my_function(a, b=c):
>>     pass
>>
>>>>> my_function.args
>> ['a']
>>>>> my_function.kwargs
>> {'b': c}
>>>>> my_function.all_args
>> ['a', 'b']
>
> Those are parameters, not arguments (see
> http://docs.python.org/3/faq/programming.html#faq-argument-vs-parameter)
>
>> What do you all think? Argspec is kind of unwieldy, this I think looks a lot
>> nicer.
>
> Indeed, argspec is unwieldy, which is why Python 3.3 includes the new
> inspect.signature function to calculate a richer signature
> representation that is easier to process:
> http://docs.python.org/3/library/inspect#introspecting-callables-with-the-signature-object
>
> Aaron Illes has backported this feature to earlier Python versions as
> the "funcsigs" PyPI package: http://pypi.python.org/pypi/funcsigs
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia



-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From ubershmekel at gmail.com  Sun Jan 27 09:16:14 2013
From: ubershmekel at gmail.com (Yuval Greenfield)
Date: Sun, 27 Jan 2013 10:16:14 +0200
Subject: [Python-ideas] PEP 3156: getting the socket or peer name from
	the transport
In-Reply-To: <CAP7+vJ+BxgVPv0jBP=9GAdnXGdLRqgjCcBhVzVnJ4K7NDAxrjQ@mail.gmail.com>
References: <CAP7+vJ+_QwQW13b1mE1NSg8Z89MyYwHkyYmLwhumGo0iO-V+wA@mail.gmail.com>
	<CANSw7Kyi7d-Q5O=MqZTx5a-bOA3CQ4Me5RVafxJ0e7Kj1w693w@mail.gmail.com>
	<CAP7+vJ+BxgVPv0jBP=9GAdnXGdLRqgjCcBhVzVnJ4K7NDAxrjQ@mail.gmail.com>
Message-ID: <CANSw7KwafX+qGiMszK7LVMs=xacmyEUjGLDPdqGzfW95GxbXfg@mail.gmail.com>

Sorry for the delay, it took me a while to read
http://code.google.com/p/tulip/source/browse/ and wrap my head around it.

On Thu, Jan 24, 2013 at 8:50 PM, Guido van Rossum <guido at python.org> wrote:

> What other things might you want to do with the socket besides calling
>  getpeername() or getsockname()?



From http://en.wikipedia.org/wiki/Berkeley_sockets#Options_for_sockets

> Options for sockets
>
> After creating a socket, it is possible to set options on it. Some of the
> more common options are:
>
> TCP_NODELAY disables the Nagle algorithm.
> SO_KEEPALIVE enables periodic 'liveness' pings, if supported by the OS.

Though these may not be the concern of a protocol as defined by PEP 3156.
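For concreteness, both of those options are plain setsockopt calls on the underlying socket, which is why a transport that hides the socket has to expose them some other way. A sketch on a bare socket (SO_KEEPALIVE support can vary by platform):

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Disable the Nagle algorithm so small writes go out immediately.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Ask the OS to send periodic keepalive probes on the connection.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
keepalive = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
sock.close()
```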

Would that be reasonable to expect
> from a protocol written to be independent of the specific transport
> type?
>
>
Most protocols should be written independent of transport. But it seems to
me that a user might write an entire app as a "protocol".

Yuval Greenfield
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130127/9aedcbff/attachment.html>

From cf.natali at gmail.com  Sun Jan 27 09:58:03 2013
From: cf.natali at gmail.com (=?ISO-8859-1?Q?Charles=2DFran=E7ois_Natali?=)
Date: Sun, 27 Jan 2013 09:58:03 +0100
Subject: [Python-ideas] Interrupting threads
In-Reply-To: <51049915.3060808@mrabarnett.plus.com>
References: <51049915.3060808@mrabarnett.plus.com>
Message-ID: <CAH_1eM2B+4614=80RCpAUzoSOOsdQ4jpWsKrcsPjQw5_zGXK+Q@mail.gmail.com>

> It's possible to interrupt the main thread using KeyboardInterrupt, so
> why shouldn't it be possible to do something similar to a thread?

Because it's unsafe.
Allowing asynchronous interruptions at any point in the code is
calling for trouble: in a multi-threaded program, if you interrupt a
thread in the middle of a critical section, there's a high chance that
the invariants protected in this critical section won't hold. So
basically, the object/structure will be in an unusable state, which
will lead to random failures at some point in the future.

> Actually, there's more to it than that because sometimes you don't want
> a section of code to be interrupted.

Actually it's exactly the opposite: you only want to handle
interruption at very specific points in the code, so that the rollback
and interruption logic is tractable.

Also, as noted by Guido, it's basically useless because neither
sleep() nor lock acquisition can be interrupted - at least in the
current implementation -  and those are likely the calls you'd like to
interrupt.

FWIW, Java has a Thread.Stop() method that more or less does what
you're suggesting. It was quickly deprecated because it's inherently
unsafe: the right way to do it is through a cooperative form of
interruption, with an interruption exception that can be thrown at
specific points in the code (and a per-thread interrupt status flag
that can be checked explicitly, and which is checked implicitly when
entering an interruptible method).
See the rationale here:
http://docs.oracle.com/javase/1.5.0/docs/guide/misc/threadPrimitiveDeprecation.html
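That cooperative shape is already expressible with today's threading module. A sketch using threading.Event as the per-thread interrupt flag (Worker and its method names are illustrative, not a proposed API):

```python
import threading

class Worker(threading.Thread):
    """A thread that checks an interrupt flag at safe points."""

    def __init__(self):
        super().__init__()
        self._interrupted = threading.Event()
        self.steps_done = 0

    def interrupt(self):
        self._interrupted.set()

    def run(self):
        for _ in range(1000):
            if self._interrupted.is_set():
                # Safe point: invariants hold here, so stopping is clean.
                return
            self.steps_done += 1
            # Interruptible sleep: wakes early once interrupt() is called.
            if self._interrupted.wait(0.01):
                return

w = Worker()
w.start()
w.interrupt()
w.join()
```

The thread only ever stops at points it chose itself, so it never abandons a critical section halfway through, which is exactly the property the asynchronous Thread.stop()-style approach lacks.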

> So here's what I'd like to suggest:
>
> 1. There's a private thread-specific flag called 'interrupt_occurred'.
>
> 2. There's a private thread-specific flag called 'heeding_interrupt'.
>
> 3. There's a context manager called 'heed_interrupt'.

I'm not a native speaker, and I had never heard about the 'heed' verb
before, had to look it up in the dictionary :-)


From dickinsm at gmail.com  Sun Jan 27 12:04:17 2013
From: dickinsm at gmail.com (Mark Dickinson)
Date: Sun, 27 Jan 2013 11:04:17 +0000
Subject: [Python-ideas] complex number and fractional exponent
In-Reply-To: <CAHVvXxTj9aOFfK8fLqh94aFT80_M=Y4maxeVYWxNmnD6JS62bQ@mail.gmail.com>
References: <ke0gi7$61s$1@ger.gmane.org> <ke19mg$nrg$1@ger.gmane.org>
	<CAHVvXxTj9aOFfK8fLqh94aFT80_M=Y4maxeVYWxNmnD6JS62bQ@mail.gmail.com>
Message-ID: <CAAu3qLXdLXUYcXLWUFR1bd5J3pHzEEYmz1jiw5fY3SKHGUmJKA@mail.gmail.com>

On Sat, Jan 26, 2013 at 9:22 PM, Oscar Benjamin
<oscar.j.benjamin at gmail.com> wrote:
> Why does cmath.sqrt give a different value from the __pow__ version?
>
> ~$ python3
> Python 3.2.3 (default, Oct 19 2012, 19:53:16)
> [GCC 4.7.2] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>>>> import cmath
>>>> cmath.sqrt(-1)
> 1j
>>>> (-1) ** .5
> (6.123031769111886e-17+1j)

Because they use different algorithms.  pow(x, y) essentially computes
exp(y * log(x)).  That involves a number of steps, any of which can
introduce small errors.  cmath.sqrt can use a more specific (and
usually more accurate) algorithm.  Moral: use cmath.sqrt and math.sqrt
for computing square roots, rather than x ** 0.5.
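The size of the discrepancy is easy to quantify in a few lines (values as printed in the sessions above):

```python
import cmath

exact = cmath.sqrt(-1)   # 1j, via a dedicated square-root algorithm
via_pow = (-1) ** 0.5    # roughly exp(0.5 * log(-1))

# The spurious real part is cos(pi/2) computed with a rounded pi:
# about 6.1e-17, i.e. on the order of one ulp at 1.0.
err = abs(via_pow - exact)
print(err < 1e-15)  # True
```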

Mark


From solipsis at pitrou.net  Sun Jan 27 12:21:21 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 27 Jan 2013 12:21:21 +0100
Subject: [Python-ideas] PEP 3156: getting the socket or peer name from
	the transport
References: <CAP7+vJ+_QwQW13b1mE1NSg8Z89MyYwHkyYmLwhumGo0iO-V+wA@mail.gmail.com>
	<CANSw7Kyi7d-Q5O=MqZTx5a-bOA3CQ4Me5RVafxJ0e7Kj1w693w@mail.gmail.com>
	<CAP7+vJ+BxgVPv0jBP=9GAdnXGdLRqgjCcBhVzVnJ4K7NDAxrjQ@mail.gmail.com>
	<CANSw7KwafX+qGiMszK7LVMs=xacmyEUjGLDPdqGzfW95GxbXfg@mail.gmail.com>
Message-ID: <20130127122121.6b779ada@pitrou.net>

On Sun, 27 Jan 2013 10:16:14 +0200
Yuval Greenfield <ubershmekel at gmail.com>
wrote:
> From http://en.wikipedia.org/wiki/Berkeley_sockets#Options_for_sockets
> 
> > Options for sockets
> >
> > After creating a socket, it is possible to set options on it. Some of the
> more common options are:
> >
> > TCP_NODELAY disables the Nagle algorithm.
> > SO_KEEPALIVE enables periodic 'liveness' pings, if supported by the OS.
> 
> Though these may not be the concern of a protocol as defined by PEP 3156.

How about e.g. TCP_CORK?

> > Would that be reasonable to expect
> > from a protocol written to be independent of the specific transport
> > type?
> >
> >
> Most protocols should be written independent of transport. But it seems to
> me that a user might write an entire app as a "protocol".

Well, such an assumption can fall flat. For example, certificate
checking in HTTPS expects that the transport is some version of TLS or
SSL: http://tools.ietf.org/html/rfc2818.html#section-3.1
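The certificate step described here is tied to the stdlib ssl machinery. A sketch of the relevant defaults (the getpeercert() call appears only in a comment, since it needs a completed handshake, and `tls_sock` is a hypothetical name):

```python
import ssl

# Certificate checking is transport-level state: the HTTPS protocol
# has to reach through to the TLS layer to get at it.
context = ssl.create_default_context()

# The default client context already requires and verifies the peer
# certificate, including hostname matching.
verified = (context.verify_mode == ssl.CERT_REQUIRED)
checks_hostname = context.check_hostname

# After a completed handshake on an ssl.SSLSocket, the application
# (or a protocol callback) would fetch the certificate with:
#     cert = tls_sock.getpeercert()  # dict: 'subject', 'subjectAltName', ...
print(verified, checks_hostname)  # True True
```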

Regards

Antoine.




From solipsis at pitrou.net  Sun Jan 27 13:16:37 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 27 Jan 2013 13:16:37 +0100
Subject: [Python-ideas] PEP 3156: getting the socket or peer name from
 the transport
In-Reply-To: <CANSw7KyuPCg9Ot6tY3ML_WBquO0PfFVqpzWPRpa3o8gmTyVS_A@mail.gmail.com>
References: <CAP7+vJ+_QwQW13b1mE1NSg8Z89MyYwHkyYmLwhumGo0iO-V+wA@mail.gmail.com>
	<CANSw7Kyi7d-Q5O=MqZTx5a-bOA3CQ4Me5RVafxJ0e7Kj1w693w@mail.gmail.com>
	<CAP7+vJ+BxgVPv0jBP=9GAdnXGdLRqgjCcBhVzVnJ4K7NDAxrjQ@mail.gmail.com>
	<CANSw7KwafX+qGiMszK7LVMs=xacmyEUjGLDPdqGzfW95GxbXfg@mail.gmail.com>
	<20130127122121.6b779ada@pitrou.net>
	<CANSw7KyuPCg9Ot6tY3ML_WBquO0PfFVqpzWPRpa3o8gmTyVS_A@mail.gmail.com>
Message-ID: <1359288997.3488.2.camel@localhost.localdomain>

On Sunday, 27 January 2013 at 14:12 +0200, Yuval Greenfield wrote:
> On Sun, Jan 27, 2013 at 1:21 PM, Antoine Pitrou <solipsis at pitrou.net>
> wrote:
>         > Most protocols should be written independent of transport.
>         But it seems to
>         > me that a user might write an entire app as a "protocol".
>         
>         
>         Well, such an assumption can fall flat. For example,
>         certificate
>         checking in HTTPS expects that the transport is some version
>         of TLS or
>         SSL: http://tools.ietf.org/html/rfc2818.html#section-3.1
> 
> 
> I'm not sure I understood your reply. You'd be for an api that exposes
> the underlying transport? I meant to say that "an entire app" entails
> control over the subtleties of the underlying transport. 

What I meant is that the HTTP protocol needs to know that it is running
over a secure transport, and it needs to fetch the server certificate
from that transport (or, alternatively, it needs to have one of its
callbacks called by the transport when the certificate is known). That's
not entirely transport-agnostic.

Regards

Antoine.




From shane at umbrellacode.com  Sun Jan 27 15:10:49 2013
From: shane at umbrellacode.com (Shane Green)
Date: Sun, 27 Jan 2013 06:10:49 -0800
Subject: [Python-ideas] Fwd: csv.DictReader could handle headers more
	intelligently.
References: <D88C8B2C-DB03-46F4-888A-FAC5C0F25E83@umbrellacode.com>
Message-ID: <6867B23C-4C94-4B64-B5C3-CC7AACF25A79@umbrellacode.com>

Something as simple as this (straw man) demonstrates what I mean: 

> class Record(defaultdict):
>     def __init__(self, headers, fields):
>         super(Record, self).__init__(list)
>         self.headers = headers
>         self.fields = fields
>         map(self.enter, self.headers, self.fields)
>     def valuemap(self, first=False):
>         index = 0 if first else -1
>         return dict([(key,values[index]) for key,values in self.items()])
>     def enter(self, header, *values):
>         if isinstance(header, int):
>             header = self.headers[header]
>         self[header].extend(values)
>     def itemseq(self):
>         return zip(self.headers,self.fields)
>     def __getitem__(self, spec):
>         if isinstance(spec, int):
>             return self.fields[spec]
>         return super(Record, self).__getitem__(spec)
>     def __getslice__(self, *args):
>         return self.fields.__getslice__(*args)
> 

This would let you access column values using header names, just like before.  Each column's values are now kept in a list, which will contain multiple entries whenever a column name is repeated in the header.  
Values can also be accessed sequentially using integer indexes, and the valuemap() returns a standard dictionary that conforms to the previous behaviour exactly: there is a one-to-one mapping between column headers and values, with the last value associated with a given column name being the one used. 

While I think the changes should be added without changing what exists for backward compatibility reasons, I've started to think the existing version should also be deprecated, rather than maintained as a special case.  Even when the format is perfect for the existing code, I don't see any big advantages to using it over this approach. 

Keep in mind the example is just a quick straw man: performance is a big difference (and there are plenty of bugs), but that doesn't seem like the right thing to base the decision on, as performance can easily be enhanced later.

In summary, given headers: A, B, C, D, E, B, G

record.headers == ["A", "B", "C", "D", "E", "B", "G"]
record.fields == [0, 1, 2, 3, 4, 5, 6]

record["A"] == [0]
record["B"] == [1, 5]

# Note sequential-access values are not in lists, and the second "B" column's value 5 is in its original position (index 5).
record[0] == 0
record[1] == 1
record[2] == 2
record[3] == 3
record[4] == 4
record[5] == 5

record.items() == [("A", [0]), ("B", [1, 5]), ...]
record.valuemap() == {"A": 0, "B": 5, ...} # This returns exactly what DictReader does today: a single value per named column, with the last value being the one used.
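For contrast, a short stdlib-only snippet (hypothetical data, but the real csv module) shows the loss this proposal addresses: with a repeated header, today's DictReader silently keeps only the last value.

```python
import csv
import io

# Today's DictReader behavior with the duplicate header above:
# the first "B" column's value is silently overwritten by the last one.
data = "A,B,C,D,E,B,G\n0,1,2,3,4,5,6\n"
row = next(csv.DictReader(io.StringIO(data)))
print(row["B"])  # "5" -- the value "1" from the first B column is gone
```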


Shane Green 
www.umbrellacode.com
408-692-4666 | shane at umbrellacode.com

Begin forwarded message:

> From: Shane Green <shane at umbrellacode.com>
> Subject: Re: [Python-ideas] csv.DictReader could handle headers more intelligently.
> Date: January 26, 2013 6:39:11 AM PST
> To: "Stephen J. Turnbull" <stephen at xemacs.org>
> Cc: python-ideas at python.org
> 
> Okay, I like your point about DictReader having a place with a subset of CSV tables, and agree that, given that definition, it should throw an exception when it's fed something that doesn't conform to this definition.  I like that.
> 
> One thing, though, the new version would let you access column data by name as well: 
> 
> Instead of
> 	row["timestamp"] == 1359210019.299478
> 
> It would be
> 	row["timestamp"] == [1359210019.299478]
> 
> And potentially 
> 	row["timestamp"] == [1359210019.299478,1359210019.299478]
> 
> It could also be accessed as: 
> 	row.headers[0] == "timestamp"
> 	row.headers[1] == "timestamp"
> 	row.values[0] == 1359210019.299478
> 	row.values[1] == 1359210019.299478
> 
> Could still provide: 
> 	for name,value in records.iterfirstitems(): # get the first value for each column with a given name.
> 	 	- or - 
> 	for name,value in records.iterlastitems(): # get the last value for each column with a given name.
> 
> And the exact functionality you have now: 
> 	records.itervaluemaps() # or something? just a map(dict(records.iterlastitems()))
> 		
> Overkill, but really simple things to add? 
> 
> The only thing this really adds to the "convenience" of the current DictReader for well-behaved tables is the ability to access values sequentially or by name; other than that, the only difference would be iterating over a generator method's output instead of the instance itself.
> 
> 
> 
> 
> Shane Green 
> www.umbrellacode.com
> 408-692-4666 | shane at umbrellacode.com
> 
> On Jan 26, 2013, at 5:53 AM, "Stephen J. Turnbull" <stephen at xemacs.org> wrote:
> 
>> Shane Green writes:
>> 
>>> And while it's true that a dictionary is a dictionary and it works
>>> the way it works, the real point that drives home is that it's an
>>> inappropriate mechanism for dealing ordered rows of sequential
>>> values.
>> 
>> Right!  So use csv.reader, or csv.DictReader with an explicit
>> fieldnames argument.
>> 
>> The point of csv.DictReader with default fieldnames is to take a
>> "well-behaved" table and turn it into a sequence of "poor-man's"
>> objects.
>> 
>>> The final point is a simple one: while that CSV file format was
>>> stupid, it was perfectly legal.  Something that deals with CSV
>>> content should not be losing any of its content.
>> 
>> That's a reasonable requirement.
>> 
>>> It also should [not] be barfing or throwing exceptions, by the way.
>> 
>> That's not.  As long as the module provides classes capable of
>> handling any CSV format (it does), it may also provide convenience
>> classes for special purposes with restricted formats.  Those classes
>> may throw exceptions on input that doesn't satisfy the restrictions.
>> 
>>> And what about fixing it by replacing implementing a class that
>>> does it correctly, [...]?
>> 
>> Doesn't help users who want automatically detected access-by-name.
>> They must have unique field names.  (I don't have a use case.  I
>> assume the implementer of csv.DictReader did.<wink/>)
>> 
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130127/2be9300b/attachment.html>

From guido at python.org  Sun Jan 27 17:28:50 2013
From: guido at python.org (Guido van Rossum)
Date: Sun, 27 Jan 2013 08:28:50 -0800
Subject: [Python-ideas] PEP 3156: getting the socket or peer name from
	the transport
In-Reply-To: <1359288997.3488.2.camel@localhost.localdomain>
References: <CAP7+vJ+_QwQW13b1mE1NSg8Z89MyYwHkyYmLwhumGo0iO-V+wA@mail.gmail.com>
	<CANSw7Kyi7d-Q5O=MqZTx5a-bOA3CQ4Me5RVafxJ0e7Kj1w693w@mail.gmail.com>
	<CAP7+vJ+BxgVPv0jBP=9GAdnXGdLRqgjCcBhVzVnJ4K7NDAxrjQ@mail.gmail.com>
	<CANSw7KwafX+qGiMszK7LVMs=xacmyEUjGLDPdqGzfW95GxbXfg@mail.gmail.com>
	<20130127122121.6b779ada@pitrou.net>
	<CANSw7KyuPCg9Ot6tY3ML_WBquO0PfFVqpzWPRpa3o8gmTyVS_A@mail.gmail.com>
	<1359288997.3488.2.camel@localhost.localdomain>
Message-ID: <CAP7+vJ+av5zRDDLThgo7CCn_xLJ7rY3u-myP6Hihf50j0z7pYQ@mail.gmail.com>

On Sun, Jan 27, 2013 at 4:16 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Le dimanche 27 janvier 2013 à 14:12 +0200, Yuval Greenfield a écrit :
>> On Sun, Jan 27, 2013 at 1:21 PM, Antoine Pitrou <solipsis at pitrou.net>
>> wrote:
>>         > Most protocols should be written independent of transport.
>>         But it seems to
>>         > me that a user might write an entire app as a "protocol".
>>
>>
>>         Well, such an assumption can fall flat. For example,
>>         certificate
>>         checking in HTTPS expects that the transport is some version
>>         of TLS or
>>         SSL: http://tools.ietf.org/html/rfc2818.html#section-3.1
>>
>>
>> I'm not sure I understood your reply. You'd be for an api that exposes
>> the underlying transport? I meant to say that "an entire app" entails
>> control over the subtleties of the underlying transport.
>
> What I meant is that the HTTP protocol needs to know that it is running
> over a secure transport, and it needs to fetch the server certificate
> from that transport (or, alternatively, it needs to have one of its
> callbacks called by the transport when the certificate is known). That's
> not entirely transport-agnostic.

Yeah, it sounds like in the end having access to the socket itself (if
there is one) may be necessary. I suppose there are a number of
different ways to handle that specific use case, but it seems clear
that we can't anticipate all use cases. I'd rather have a simpler
abstraction with an escape hatch than attempting to codify more use
cases into the abstraction. We can always iterate on the design after
Python 3.4, if there's a useful generalization we didn't anticipate.
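For reference, the asyncio implementation that eventually shipped for PEP 3156 took this escape-hatch route via Transport.get_extra_info(), which exposes the peer address and (when there is one) the raw socket on request. A self-contained sketch:

```python
import asyncio

# A protocol can reach the underlying socket and peer address through the
# transport's get_extra_info() escape hatch, without the abstraction
# codifying every use case.
async def main():
    info = {}

    class Probe(asyncio.Protocol):
        def connection_made(self, transport):
            info["peername"] = transport.get_extra_info("peername")
            info["socket"] = transport.get_extra_info("socket")
            transport.close()

    loop = asyncio.get_running_loop()
    server = await loop.create_server(Probe, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    _, writer = await asyncio.open_connection("127.0.0.1", port)
    await asyncio.sleep(0.1)   # let the server-side callback run
    writer.close()
    server.close()
    await server.wait_closed()
    return info

info = asyncio.run(main())
```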

-- 
--Guido van Rossum (python.org/~guido)


From shane at umbrellacode.com  Sun Jan 27 18:11:05 2013
From: shane at umbrellacode.com (Umbrella Code)
Date: Sun, 27 Jan 2013 09:11:05 -0800
Subject: [Python-ideas] Fwd: PEP 3156: getting the socket or peer name from
	the transport
References: <DFB910EA-1AD0-454F-902D-13B18F4499A9@umbrellacode.com>
Message-ID: <39BC9611-EB71-4749-AB8C-C6B64F7928D5@umbrellacode.com>

It's been a few years so my memory must be rusty, but where is the https protocol dependent on the transport/SSL setup in that way?

Sent from my iPad

Begin forwarded message:

> From: Umbrella Code <shane at umbrellacode.com>
> Date: January 27, 2013, 9:06:48 AM PST
> To: Guido van Rossum <guido at python.org>
> Subject: Re: [Python-ideas] PEP 3156: getting the socket or peer name from the transport
> 
> It's been a few years so my memory must be rusty, but where is the https protocol dependent on the transport/SSL setup in that way?
> 
> 
> Sent from my iPad
> 
> On Jan 27, 2013, at 8:28 AM, Guido van Rossum <guido at python.org> wrote:
> 
>> On Sun, Jan 27, 2013 at 4:16 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>>> Le dimanche 27 janvier 2013 à 14:12 +0200, Yuval Greenfield a écrit :
>>>> On Sun, Jan 27, 2013 at 1:21 PM, Antoine Pitrou <solipsis at pitrou.net>
>>>> wrote:
>>>>> Most protocols should be written independent of transport.
>>>>       But it seems to
>>>>> me that a user might write an entire app as a "protocol".
>>>> 
>>>> 
>>>>       Well, such an assumption can fall flat. For example,
>>>>       certificate
>>>>       checking in HTTPS expects that the transport is some version
>>>>       of TLS or
>>>>       SSL: http://tools.ietf.org/html/rfc2818.html#section-3.1
>>>> 
>>>> 
>>>> I'm not sure I understood your reply. You'd be for an api that exposes
>>>> the underlying transport? I meant to say that "an entire app" entails
>>>> control over the subtleties of the underlying transport.
>>> What I meant is that the HTTP protocol needs to know that it is running
>>> over a secure transport, and it needs to fetch the server certificate
>>> from that transport (or, alternatively, it needs to have one of its
>>> callbacks called by the transport when the certificate is known). That's
>>> not entirely transport-agnostic.
>> 
>> Yeah, it sounds like in the end having access to the socket itself (if
>> there is one) may be necessary. I suppose there are a number of
>> different ways to handle that specific use case, but it seems clear
>> that we can't anticipate all use cases. I'd rather have a simpler
>> abstraction with an escape hatch than attempting to codify more use
>> cases into the abstraction. We can always iterate on the design after
>> Python 3.4, if there's a useful generalization we didn't anticipate.
>> 
>> -- 
>> --Guido van Rossum (python.org/~guido)
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> http://mail.python.org/mailman/listinfo/python-ideas
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130127/93013124/attachment.html>

From ubershmekel at gmail.com  Sun Jan 27 18:41:28 2013
From: ubershmekel at gmail.com (Yuval Greenfield)
Date: Sun, 27 Jan 2013 19:41:28 +0200
Subject: [Python-ideas] Fwd: PEP 3156: getting the socket or peer name
 from the transport
In-Reply-To: <39BC9611-EB71-4749-AB8C-C6B64F7928D5@umbrellacode.com>
References: <DFB910EA-1AD0-454F-902D-13B18F4499A9@umbrellacode.com>
	<39BC9611-EB71-4749-AB8C-C6B64F7928D5@umbrellacode.com>
Message-ID: <CANSw7KzRQ_3mz9qFhzeJZqAwhX3eyo7tFq19x4TBFy+TcVQU_A@mail.gmail.com>

On Sun, Jan 27, 2013 at 7:11 PM, Umbrella Code <shane at umbrellacode.com>wrote:

> It's been a few years so my memory must be rusty, but where is the https
> protocol dependent on the transport/SSL setup in that way?
>
> Sent from my iPad
>
> Begin forwarded message:
>
>
I can't speak for Antoine but I'm guessing he's talking about SNI:

* a VPS server hosts 2 sites with 2 certificates, for "mysite.com" and "yoursite.com"
* the original TCP server has no idea which cert to use, as both sites share the same IP address and port.
* the solution is that the client sends the hostname in the TLS handshake.

So the DNS or HTTP line "host: mysite.com" is also used in the TLS layer.
This example agrees with Antoine but it's in the reverse direction, so
maybe he has another one in mind.

http://en.wikipedia.org/wiki/Transport_Layer_Security#Support_for_name-based_virtual_servers
http://en.wikipedia.org/wiki/HTTP_Secure#Limitations
http://en.wikipedia.org/wiki/Server_Name_Indication
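The client side of this is visible in the stdlib ssl module ("mysite.com" below is a placeholder, and no connection is actually made): the server_hostname argument is what travels in the TLS ClientHello, letting one IP:port serve many certificates.

```python
import socket
import ssl

# SNI from the client's point of view: the hostname is attached to the
# (not yet performed) handshake via server_hostname.
ctx = ssl.create_default_context()
tls = ctx.wrap_socket(socket.socket(), server_hostname="mysite.com",
                      do_handshake_on_connect=False)
name = tls.server_hostname  # the name that would ride in the ClientHello
tls.close()
```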

Yuval
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130127/d5bc578f/attachment.html>

From shane at umbrellacode.com  Sun Jan 27 18:51:32 2013
From: shane at umbrellacode.com (Umbrella Code)
Date: Sun, 27 Jan 2013 09:51:32 -0800
Subject: [Python-ideas] Fwd: PEP 3156: getting the socket or peer name
	from the transport
In-Reply-To: <CANSw7KzRQ_3mz9qFhzeJZqAwhX3eyo7tFq19x4TBFy+TcVQU_A@mail.gmail.com>
References: <DFB910EA-1AD0-454F-902D-13B18F4499A9@umbrellacode.com>
	<39BC9611-EB71-4749-AB8C-C6B64F7928D5@umbrellacode.com>
	<CANSw7KzRQ_3mz9qFhzeJZqAwhX3eyo7tFq19x4TBFy+TcVQU_A@mail.gmail.com>
Message-ID: <FD9C2129-3410-4EB9-8915-641D41473B29@umbrellacode.com>

Thanks Yuval, that's a good example and explanation.

Sent from my iPad

On Jan 27, 2013, at 9:41 AM, Yuval Greenfield <ubershmekel at gmail.com> wrote:

> On Sun, Jan 27, 2013 at 7:11 PM, Umbrella Code <shane at umbrellacode.com> wrote:
>> It's been a few years so my memory must be rusty, but where is the https protocol dependent on the transport/SSL setup in that way?
>> 
>> Sent from my iPad
>> 
>> Begin forwarded message:
> 
> I can't speak for Antoine but I'm guessing he's talking about SNI:
> 
> * a VPS server hosts 2 sites with 2 certificates for "mysite.com" and "yoursite.com"
> * the original TCP server has no idea which cert to use as both sites share the same IP address and port.
> * the solution is the client sends the hostname in the TLS handshake. 
> 
> So the DNS or HTTP line "host: mysite.com" is also used in the TLS layer. This example agrees with Antoine but it's in the reverse direction, so maybe he has another one in mind.
> 
> http://en.wikipedia.org/wiki/Transport_Layer_Security#Support_for_name-based_virtual_servers
> http://en.wikipedia.org/wiki/HTTP_Secure#Limitations
> http://en.wikipedia.org/wiki/Server_Name_Indication
> 
> Yuval
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130127/55dcb23b/attachment.html>

From shane at umbrellacode.com  Sun Jan 27 19:15:31 2013
From: shane at umbrellacode.com (Umbrella Code)
Date: Sun, 27 Jan 2013 10:15:31 -0800
Subject: [Python-ideas] Fwd: PEP 3156: getting the socket or peer name
	from the transport
In-Reply-To: <CANSw7KzRQ_3mz9qFhzeJZqAwhX3eyo7tFq19x4TBFy+TcVQU_A@mail.gmail.com>
References: <DFB910EA-1AD0-454F-902D-13B18F4499A9@umbrellacode.com>
	<39BC9611-EB71-4749-AB8C-C6B64F7928D5@umbrellacode.com>
	<CANSw7KzRQ_3mz9qFhzeJZqAwhX3eyo7tFq19x4TBFy+TcVQU_A@mail.gmail.com>
Message-ID: <36DF0DF7-1F91-47EB-8C93-2F7A7DFD8EE0@umbrellacode.com>

Could it be handled as a context given to the protocol, one that could also accommodate the other information we'd been discussing?  Ultimately the socket could be part of that context information, available as the escape hatch, but the context would generally be pre-populated so protocols stay buffered from the hardware.  It could include address information, SSL data assigned by the server, etc.  Populating it at the right places could also be more efficient.

Sent from my iPad

On Jan 27, 2013, at 9:41 AM, Yuval Greenfield <ubershmekel at gmail.com> wrote:

> On Sun, Jan 27, 2013 at 7:11 PM, Umbrella Code <shane at umbrellacode.com> wrote:
>> It's been a few years so my memory must be rusty, but where is the https protocol dependent on the transport/SSL setup in that way?
>> 
>> Sent from my iPad
>> 
>> Begin forwarded message:
> 
> I can't speak for Antoine but I'm guessing he's talking about SNI:
> 
> * a VPS server hosts 2 sites with 2 certificates for "mysite.com" and "yoursite.com"
> * the original TCP server has no idea which cert to use as both sites share the same IP address and port.
> * the solution is the client sends the hostname in the TLS handshake. 
> 
> So the DNS or HTTP line "host: mysite.com" is also used in the TLS layer. This example agrees with Antoine but it's in the reverse direction, so maybe he has another one in mind.
> 
> http://en.wikipedia.org/wiki/Transport_Layer_Security#Support_for_name-based_virtual_servers
> http://en.wikipedia.org/wiki/HTTP_Secure#Limitations
> http://en.wikipedia.org/wiki/Server_Name_Indication
> 
> Yuval
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130127/a72c0ac5/attachment.html>

From cs at zip.com.au  Sun Jan 27 22:04:15 2013
From: cs at zip.com.au (Cameron Simpson)
Date: Mon, 28 Jan 2013 08:04:15 +1100
Subject: [Python-ideas] Interrupting threads
In-Reply-To: <CAH_1eM2B+4614=80RCpAUzoSOOsdQ4jpWsKrcsPjQw5_zGXK+Q@mail.gmail.com>
References: <CAH_1eM2B+4614=80RCpAUzoSOOsdQ4jpWsKrcsPjQw5_zGXK+Q@mail.gmail.com>
Message-ID: <20130127210415.GA14691@cskk.homeip.net>

On 27Jan2013 09:58, Charles-François Natali <cf.natali at gmail.com> wrote:
| > It's possible to interrupt the main thread using KeyboardInterrupt, so
| > why shouldn't it be possible to do something similar to a thread?
| 
| Because it's unsafe.

But the same can easily be true of a KeyboardInterrupt in the main
thread in any multithreaded program.

| Allowing asynchronous interruptions at any point in the code is
| calling for trouble: in a multi-threaded program, if you interrupt a
| thread in the middle of a critical section, there's a high chance that
| the invariants protected in this critical section won't hold. So
| basically, the object/structure will be in an unusable state, which
| will lead to random failures at some point in the future.

This is true if any other exception is raised also. MRAB's suggestion
turns a thread interrupt into an exception, with some control for
ignoring-but-detecting the exception around some places.

| > Actually, there's more to it than that because sometimes you don't want
| > a section of code to be interrupted.
| 
| Actually it's exactly the opposite: you only want to handle
| interruption at very specific points in the code, so that the rollback
| and interruption logic is tractable.

That would amount to running the whole thread inside his context manager
and polling the interrupt_occurred flag regularly.

| Also, as noted by Guido, it's basically useless because neither
| sleep() nor lock acquisition can be interrupted - at least in the
| current implementation -  and those are likely the calls you'd like to
| interrupt.

Sure.

| FWIW, Java has a Thread.Stop() method that more or less does what
| you're suggesting. It was quickly deprecated because it's inherently
| unsafe: the right way to do it is through a cooperative form of
| interruption, with an interruption exception that can be thrown at
| specific points in the code (and a per-thread interrupt status flag
| that can be checked explicitly, and which is checked implicitly when
| entering an interruptible method).
| See the rationale here:
| http://docs.oracle.com/javase/1.5.0/docs/guide/misc/threadPrimitiveDeprecation.html
| 
| > So here's what I'd like to suggest:
| > 1. There's a private thread-specific flag called 'interrupt_occurred'.
| > 2. There's a private thread-specific flag called 'heeding_interrupt'.
| > 3. There's a context manager called 'heed_interrupt'.

On this basis, I'd be inclined to cast MRAB's suggestion as giving every
Thread object a cancel() method. When heeding_interrupt is True, raise
ThreadInterrupt. When heeding_interrupt is False, set
interrupt_occurred.

In fact, I'd change the word "Interrupt" to "Cancellation", and name
the flags thread_cancelled, thread_heed_cancel, and name the exception
"ThreadCancelled".

This only slightly changes the semantics and makes more clear the notion
that the cancellation may be deferred (eg when I/O blocked, etc). That
lets threads poll the thread_cancelled flag for cooperative behaviour and
still provides an exception based method for situations where it is
suitable.
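A minimal sketch of that cooperative scheme (the cancel() method and the thread_cancelled flag are illustrations from this thread, not an existing threading API): the worker polls the flag at safe points instead of being interrupted mid-critical-section.

```python
import threading
import time

# Cooperative cancellation: cancel() only sets a per-thread flag, and the
# worker decides where it is safe to notice it and clean up.
class CancellableThread(threading.Thread):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._cancelled = threading.Event()

    def cancel(self):
        self._cancelled.set()

    @property
    def thread_cancelled(self):
        return self._cancelled.is_set()

done = []

def worker():
    me = threading.current_thread()
    while not me.thread_cancelled:   # poll at a safe point
        time.sleep(0.01)
    done.append("cleaned up")        # rollback/cleanup runs normally

t = CancellableThread(target=worker)
t.start()
t.cancel()
t.join()
```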

| I'm not a native speaker, and I had never heard about the 'heed' verb
| before, had to look it up in the dictionary :-)

It's in common use, and not obscure. I am a native speaker, and see no
problem with it. Long standing word with a well known and defined
meaning.

Cheers,
-- 
Cameron Simpson <cs at zip.com.au>

Rule #1: Never sell a Ducati.  Rule #2: Always obey Rule #1.
        - Godfrey DiGiorgi - ramarren at apple.com - DoD #0493


From scott+python-ideas at scottdial.com  Sun Jan 27 23:17:14 2013
From: scott+python-ideas at scottdial.com (Scott Dial)
Date: Sun, 27 Jan 2013 17:17:14 -0500
Subject: [Python-ideas] Interrupting threads
In-Reply-To: <20130127210415.GA14691@cskk.homeip.net>
References: <CAH_1eM2B+4614=80RCpAUzoSOOsdQ4jpWsKrcsPjQw5_zGXK+Q@mail.gmail.com>
	<20130127210415.GA14691@cskk.homeip.net>
Message-ID: <5105A76A.5020703@scottdial.com>

On 1/27/2013 4:04 PM, Cameron Simpson wrote:
> | I'm not a native speaker, and I had never heard about the 'heed' verb
> | before, had to look it up in the dictionary :-)
> 
> It's in common use, and not obscure. I am a native speaker, and see no
> problem with it. Long standing word with a well known and defined
> meaning.

I disagree. I am a native speaker and am familiar with the word, but
that word definitely falls into the category of words that I don't use
except in idiomatic expressions (e.g., "pay heed ..." or "take heed ...").

Beyond that, why choose such an obscure word when simple words will do?

'interrupt_occurred' => 'interrupted'
'heeding_interrupt' => 'interruptible'

with interruptible(False):
   ...
   with interruptible(True):
       ...

-- 
Scott Dial
scott at scottdial.com


From cf.natali at gmail.com  Mon Jan 28 00:59:12 2013
From: cf.natali at gmail.com (=?ISO-8859-1?Q?Charles=2DFran=E7ois_Natali?=)
Date: Mon, 28 Jan 2013 00:59:12 +0100
Subject: [Python-ideas] Interrupting threads
In-Reply-To: <20130127210415.GA14691@cskk.homeip.net>
References: <CAH_1eM2B+4614=80RCpAUzoSOOsdQ4jpWsKrcsPjQw5_zGXK+Q@mail.gmail.com>
	<20130127210415.GA14691@cskk.homeip.net>
Message-ID: <CAH_1eM1c+aSkhUGSrtoy_RgOH6gxaWsn5rrMAtRpYO4+qoA9Mg@mail.gmail.com>

> | Because it's unsafe.
>
> But the same can easily be true of a KeyboardInterrupt in the main
> thread in any multithreaded program.

Yes, that's why I don't catch KeyboardInterrupt, and only use it to
interrupt the execution of the program and leave it exit, unless it's
raised at specific places like reading from stdin...

> This is true if any other exception is raised also. MRAB's suggestion
> turns a thread interrupt into an exception, with some control for
> ignoring-but-detecting the exception around some places.

No, because properly written code is prepared to deal with exceptions
that the code is susceptible to throw.
This change would make it possible for an unrelated exception to be
thrown *at any point in the code*. Try writing safe and readable code
with this in mind: it's impossible (especially since the interruption
might be raised in the middle of the exception handling routine).

> That would amount to running the whole thread inside his context manager
> and polling the interrupt_occurred flag regularly.

This should be the default behavior: you only want to support
interruption at specific points in the code.

> On this basis, I'd be inclined to cast MRAB's suggestion as giving every
> Thread object a cancel() method. When heeding_interrupt is True, raise
> ThreadInterrupt. When heeding_interrupt is False, set
> interrupt_occurred.

That's still wrong: you want to test explicitly for interruption, or
throw interrupt exception at specific blocking calls (like sleep() or
acquire()).

Please have a look at the Java rationale and way of dealing with
interruption, you'll see why you want cooperative and specific
interruption support.

> It's in common use, and not obscure. I am a native speaker, and see no
> problem with it. Long standing word with a well known and defined
> meaning.

Really?
I know about sigprocmask(), pthread_sigmask(), SIG_IGN and SIG_BLOCK,
interrupt masking...
I couldn't find a single occurrence of "heed" in the POSIX specification.


From shane at umbrellacode.com  Mon Jan 28 09:57:48 2013
From: shane at umbrellacode.com (Shane Green)
Date: Mon, 28 Jan 2013 00:57:48 -0800
Subject: [Python-ideas] PEP 3156: getting the socket or peer name from
	the transport
In-Reply-To: <CAP7+vJ+av5zRDDLThgo7CCn_xLJ7rY3u-myP6Hihf50j0z7pYQ@mail.gmail.com>
References: <CAP7+vJ+_QwQW13b1mE1NSg8Z89MyYwHkyYmLwhumGo0iO-V+wA@mail.gmail.com>
	<CANSw7Kyi7d-Q5O=MqZTx5a-bOA3CQ4Me5RVafxJ0e7Kj1w693w@mail.gmail.com>
	<CAP7+vJ+BxgVPv0jBP=9GAdnXGdLRqgjCcBhVzVnJ4K7NDAxrjQ@mail.gmail.com>
	<CANSw7KwafX+qGiMszK7LVMs=xacmyEUjGLDPdqGzfW95GxbXfg@mail.gmail.com>
	<20130127122121.6b779ada@pitrou.net>
	<CANSw7KyuPCg9Ot6tY3ML_WBquO0PfFVqpzWPRpa3o8gmTyVS_A@mail.gmail.com>
	<1359288997.3488.2.camel@localhost.localdomain>
	<CAP7+vJ+av5zRDDLThgo7CCn_xLJ7rY3u-myP6Hihf50j0z7pYQ@mail.gmail.com>
Message-ID: <EDC83381-4C64-4215-A90B-C72F2327BCA7@umbrellacode.com>

On Jan 27, 2013, at 8:28 AM, Guido van Rossum <guido at python.org> wrote:

> On Sun, Jan 27, 2013 at 4:16 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>> Le dimanche 27 janvier 2013 à 14:12 +0200, Yuval Greenfield a écrit :
>>> On Sun, Jan 27, 2013 at 1:21 PM, Antoine Pitrou <solipsis at pitrou.net>
>>> wrote:
>>>> Most protocols should be written independent of transport.
>>>        But it seems to
>>>> me that a user might write an entire app as a "protocol".
>>> 
>>> 
>>>        Well, such an assumption can fall flat. For example,
>>>        certificate
>>>        checking in HTTPS expects that the transport is some version
>>>        of TLS or
>>>        SSL: http://tools.ietf.org/html/rfc2818.html#section-3.1
>>> 
>>> 
>>> I'm not sure I understood your reply. You'd be for an api that exposes
>>> the underlying transport? I meant to say that "an entire app" entails
>>> control over the subtleties of the underlying transport.
>> 
>> What I meant is that the HTTP protocol needs to know that it is running
>> over a secure transport, and it needs to fetch the server certificate
>> from that transport (or, alternatively, it needs to have one of its
>> callbacks called by the transport when the certificate is known). That's
>> not entirely transport-agnostic.
> 
> Yeah, it sounds like in the end having access to the socket itself (if
> there is one) may be necessary. I suppose there are a number of
> different ways to handle that specific use case, but it seems clear
> that we can't anticipate all use cases. I'd rather have a simpler
> abstraction with an escape hatch than attempting to codify more use
> cases into the abstraction. We can always iterate on the design after
> Python 3.4, if there's a useful generalization we didn't anticipate.
> 
> -- 
> --Guido van Rossum (python.org/~guido)
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas



What about giving the protocol an environ/info object that already has all the information it needs?  It could (and probably should) include things like the SSL certificate information, and would probably also be where additional info that happened to be looked up, like host-name details, was stored and accessed.  Assuming the transports, etc., can define all the state information a protocol needs, protocols can operate without hardware dependencies; in case that doesn't hold, though, the state dict would also carry a reference to the socket, so the protocol could get to it directly if needed.



-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130128/8cf35984/attachment.html>

From mark.hackett at metoffice.gov.uk  Mon Jan 28 13:06:39 2013
From: mark.hackett at metoffice.gov.uk (Mark Hackett)
Date: Mon, 28 Jan 2013 12:06:39 +0000
Subject: [Python-ideas] Fwd: csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <6867B23C-4C94-4B64-B5C3-CC7AACF25A79@umbrellacode.com>
References: <D88C8B2C-DB03-46F4-888A-FAC5C0F25E83@umbrellacode.com>
	<6867B23C-4C94-4B64-B5C3-CC7AACF25A79@umbrellacode.com>
Message-ID: <201301281206.40149.mark.hackett@metoffice.gov.uk>

On Sunday 27 Jan 2013, Shane Green wrote:
> While I think the changes should be added without changing what exists for
>  backward compatibility reasons, I've started to think the existing version
>  should also be deprecated, rather than maintained as a special case
> 

That sounds effective.


From mark.hackett at metoffice.gov.uk  Mon Jan 28 13:13:45 2013
From: mark.hackett at metoffice.gov.uk (Mark Hackett)
Date: Mon, 28 Jan 2013 12:13:45 +0000
Subject: [Python-ideas]
 =?iso-8859-1?q?csv=2EDictReader_could_handle_heade?=
 =?iso-8859-1?q?rs_more=09intelligently=2E?=
In-Reply-To: <87pq0s2gpa.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <1358903168.4767.4.camel@webb>
	<F4553DB5-3A86-4E8C-A8A5-55A60FF8DE7B@umbrellacode.com>
	<87pq0s2gpa.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <201301281213.45810.mark.hackett@metoffice.gov.uk>

On Saturday 26 Jan 2013, Stephen J. Turnbull wrote:
> Shane Green writes:
>  > And while it's true that a dictionary is a dictionary and it works
>  > the way it works, the real point that drives home is that it's an
>  > inappropriate mechanism for dealing ordered rows of sequential
>  > values.
> 
> Right!  So use csv.reader, or csv.DictReader with an explicit
> fieldnames argument.
> 
> The point of csv.DictReader with default fieldnames is to take a
> "well-behaved" table and turn it into a sequence of "poor-man's"
> objects.
> 

Well, though there's another example out there of what to do next, I was
thinking of being able to define the csv file format so that you could write it
out correctly too.

And to that end, some form of description of the csv file is needed. I was 
thinking something like this:

A,B,C,A,D,E
{(A:2,A:1),B,C,D,E}

which would put columns 4 and 1 in the first entry (under the name A) as a 
list, in that order, followed by B, C, D and E all expected to be single 
unique names.

This also allows the same definition to be used to write it out.

Blank headers are denoted with:

A,,,,,,B,C

And headers not used in the dictionary (discarded) are handled by not being 
put in the "where do we put this" line:
A,B,C,D
{A,D}

When writing out, you cannot have empty headers (since those values get
dropped and the output format spec is no longer suitable), and you must
assign each header a dictionary entry (else, again, the dictionary doesn't
contain all the data that was in the input).

To write out these two types of input file, you need to create a new csv format 
spec which CAN be written out.

Therefore you will have to deliberately define an output that loses data.
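A rough stdlib-only sketch of that round-trip requirement (the function names here are made up): duplicate headers are grouped into lists on read, and flattened back in original column order on write, so nothing is lost.

```python
import csv
import io

def read_grouped(text):
    # Group values for duplicate header names into lists, preserving order.
    rows = list(csv.reader(io.StringIO(text)))
    headers, data = rows[0], rows[1:]
    records = []
    for row in data:
        rec = {}
        for h, v in zip(headers, row):
            rec.setdefault(h, []).append(v)
        records.append(rec)
    return headers, records

def write_grouped(headers, records):
    # Flatten each record back out, consuming list entries per header name
    # in the same left-to-right order as the original columns.
    buf = io.StringIO()
    w = csv.writer(buf, lineterminator="\n")
    w.writerow(headers)
    for rec in records:
        counters = dict.fromkeys(rec, 0)
        row = []
        for h in headers:
            row.append(rec[h][counters[h]])
            counters[h] += 1
        w.writerow(row)
    return buf.getvalue()

text = "A,B,C,A,D\n0,1,2,3,4\n"
headers, recs = read_grouped(text)
# recs[0]["A"] == ["0", "3"], and writing back reproduces the input.
```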


From mark.hackett at metoffice.gov.uk  Mon Jan 28 13:21:19 2013
From: mark.hackett at metoffice.gov.uk (Mark Hackett)
Date: Mon, 28 Jan 2013 12:21:19 +0000
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <17bba319-ff53-41a6-8ada-3cd3ad036076@googlegroups.com>
References: <1358903168.4767.4.camel@webb>
	<201301251653.46558.mark.hackett@metoffice.gov.uk>
	<17bba319-ff53-41a6-8ada-3cd3ad036076@googlegroups.com>
Message-ID: <201301281221.19978.mark.hackett@metoffice.gov.uk>

On Friday 25 Jan 2013, rurpy at yahoo.com wrote:
> 
> The csv DictReader *uses* a dictionary for its output. That
> it does so imposes no requirements on how it should parse or
> otherwise handle the input that eventually goes into that
> dict.

And that doesn't mean that writing

dict[A]=1
dict[A]=9

results in dict[A] being a list containing 1 and 9.

A program using a dictionary entry has to know whether the input has duplicate 
headers because in the case where only the first line is done, writing out the 
value of dict[A] gives you "1". Writing out dict[A] if it's a list gives you 
"[1,9]" which must be parsed differently.


From mark.hackett at metoffice.gov.uk  Mon Jan 28 13:21:58 2013
From: mark.hackett at metoffice.gov.uk (Mark Hackett)
Date: Mon, 28 Jan 2013 12:21:58 +0000
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <5102B76B.2080106@stoneleaf.us>
References: <1358903168.4767.4.camel@webb>
	<201301251058.28531.mark.hackett@metoffice.gov.uk>
	<5102B76B.2080106@stoneleaf.us>
Message-ID: <201301281221.58842.mark.hackett@metoffice.gov.uk>

On Friday 25 Jan 2013, Ethan Furman wrote:
> We're going to have to agree to disagree on this point -- I think there 
> is a huge difference between reassigning a variable which is completely 
> under your control from losing entire columns of data from a file which 
> you may have never seen before.
> 

But if you've never seen it before, how do you know that you're going to get a 
LIST in one column?


From wolfgang.maier at biologie.uni-freiburg.de  Mon Jan 28 14:33:45 2013
From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier)
Date: Mon, 28 Jan 2013 14:33:45 +0100
Subject: [Python-ideas] while conditional in list comprehension ??
Message-ID: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>

Dear all,
I guess this is so obvious that someone must have suggested it before:
in list comprehensions you can currently exclude items based on the if
conditional, e.g.:

[n for n in range(1,1000) if n % 4 == 0]

Why not extend this filtering by allowing a while statement in addition to
if, as in:

[n for n in range(1,1000) while n < 400]

Trivial effect, I agree, in this example since you could achieve the same by
using range(1,400), but I hope you get the point.
This intuitively understandable extension would provide a big speed-up for
sorted lists where processing all the input is unnecessary.

Consider this:

some_names=["Adam", "Andrew", "Arthur", "Bob", "Caroline","Lancelot"]     #
a sorted list of names
[n for n in some_names if n.startswith("A")]
# certainly gives a list of all names starting with A, but ...
[n for n in some_names while n.startswith("A")]
# would have saved two comparisons

Best,
Wolfgang 





From rosuav at gmail.com  Mon Jan 28 14:56:39 2013
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 29 Jan 2013 00:56:39 +1100
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
Message-ID: <CAPTjJmrgx=mLyzJ8x4VcTeOOBPjZztuB9aktcZ1wSFAcB2vzwg@mail.gmail.com>

On Tue, Jan 29, 2013 at 12:33 AM, Wolfgang Maier
<wolfgang.maier at biologie.uni-freiburg.de> wrote:
> Why not extend this filtering by allowing a while statement in addition to
> if, as in:
>
> [n for n in range(1,1000) while n < 400]

The time machine strikes again! Check out itertools.takewhile - it can
do pretty much that:

import itertools
[n for n in itertools.takewhile(lambda n: n<400, range(1,1000))]

It's not quite list comp notation, but it works.

>>> [n for n in itertools.takewhile(lambda n: n<40, range(1,100))]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
37, 38, 39]
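(Minor aside: the comprehension wrapper adds nothing here; list() can consume the takewhile iterator directly. A quick check using only itertools:)

```python
from itertools import takewhile

# Both spellings build the same list; list() is the shorter one.
via_comp = [n for n in takewhile(lambda n: n < 40, range(1, 100))]
via_list = list(takewhile(lambda n: n < 40, range(1, 100)))
print(via_comp == via_list)  # True
print(via_list[:5], via_list[-1])  # [1, 2, 3, 4, 5] 39
```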

ChrisA


From oscar.j.benjamin at gmail.com  Mon Jan 28 14:59:40 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Mon, 28 Jan 2013 13:59:40 +0000
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <CAPTjJmrgx=mLyzJ8x4VcTeOOBPjZztuB9aktcZ1wSFAcB2vzwg@mail.gmail.com>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<CAPTjJmrgx=mLyzJ8x4VcTeOOBPjZztuB9aktcZ1wSFAcB2vzwg@mail.gmail.com>
Message-ID: <CAHVvXxTwnu1SZF=XzWe5iOtu5KhCvbA366xG8Sdm1kDnS5-R_Q@mail.gmail.com>

On 28 January 2013 13:56, Chris Angelico <rosuav at gmail.com> wrote:
> On Tue, Jan 29, 2013 at 12:33 AM, Wolfgang Maier
> <wolfgang.maier at biologie.uni-freiburg.de> wrote:
>> Why not extend this filtering by allowing a while statement in addition to
>> if, as in:
>>
>> [n for n in range(1,1000) while n < 400]
>
> The time machine strikes again! Check out itertools.takewhile - it can
> do pretty much that:
>
> import itertools
> [n for n in itertools.takewhile(lambda n: n<400, range(1,1000))]
>
> It's not quite list comp notation, but it works.
>
>>>> [n for n in itertools.takewhile(lambda n: n<40, range(1,100))]
> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
> 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
> 37, 38, 39]

The while clause is a lot clearer/nicer than takewhile/lambda.
Presumably it would be more efficient as well.
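(One way to probe that guess, sketched with the standard timeit module; absolute numbers vary by machine, so none are shown. The 'if' version scans the whole range, while takewhile stops early but pays a Python-level lambda call per element:)

```python
import timeit

# Filter the whole range with 'if' vs. stop early with takewhile + lambda.
t_if = timeit.timeit("[n for n in range(1000) if n < 400]", number=2000)
t_tw = timeit.timeit(
    "list(takewhile(lambda n: n < 400, range(1000)))",
    setup="from itertools import takewhile",
    number=2000,
)
print("if-comprehension:", t_if)
print("takewhile+lambda:", t_tw)
```

Both expressions produce the same list for this monotonically increasing input, so the comparison is apples to apples.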


Oscar


From masklinn at masklinn.net  Mon Jan 28 15:28:52 2013
From: masklinn at masklinn.net (Masklinn)
Date: Mon, 28 Jan 2013 15:28:52 +0100
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <CAHVvXxTwnu1SZF=XzWe5iOtu5KhCvbA366xG8Sdm1kDnS5-R_Q@mail.gmail.com>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<CAPTjJmrgx=mLyzJ8x4VcTeOOBPjZztuB9aktcZ1wSFAcB2vzwg@mail.gmail.com>
	<CAHVvXxTwnu1SZF=XzWe5iOtu5KhCvbA366xG8Sdm1kDnS5-R_Q@mail.gmail.com>
Message-ID: <D19EF489-3A18-4883-9EAD-F7D66C461D12@masklinn.net>


On 2013-01-28, at 14:59 , Oscar Benjamin wrote:

> On 28 January 2013 13:56, Chris Angelico <rosuav at gmail.com> wrote:
>> On Tue, Jan 29, 2013 at 12:33 AM, Wolfgang Maier
>> <wolfgang.maier at biologie.uni-freiburg.de> wrote:
>>> Why not extend this filtering by allowing a while statement in addition to
>>> if, as in:
>>> 
>>> [n for n in range(1,1000) while n < 400]
>> 
>> The time machine strikes again! Check out itertools.takewhile - it can
>> do pretty much that:
>> 
>> import itertools
>> [n for n in itertools.takewhile(lambda n: n<400, range(1,1000))]
>> 
>> It's not quite list comp notation, but it works.
>> 
>>>>> [n for n in itertools.takewhile(lambda n: n<40, range(1,100))]
>> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
>> 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
>> 37, 38, 39]
> 
> The while clause is a lot clearer/nicer than takewhile/lambda.
> Presumably it would be more efficient as well.

Maybe, but it's a rather uncommon need and that way lies Common Lisp's
`loop`.

From shane at umbrellacode.com  Mon Jan 28 15:32:21 2013
From: shane at umbrellacode.com (Shane Green)
Date: Mon, 28 Jan 2013 06:32:21 -0800
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <CAHVvXxTwnu1SZF=XzWe5iOtu5KhCvbA366xG8Sdm1kDnS5-R_Q@mail.gmail.com>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<CAPTjJmrgx=mLyzJ8x4VcTeOOBPjZztuB9aktcZ1wSFAcB2vzwg@mail.gmail.com>
	<CAHVvXxTwnu1SZF=XzWe5iOtu5KhCvbA366xG8Sdm1kDnS5-R_Q@mail.gmail.com>
Message-ID: <1CF441FF-8774-4687-A27E-2E563FCB7CA5@umbrellacode.com>

Isn't "while" kind of just the "if" of a looping construct?

Would [n for n in range(1,1000) while n < 400] == [n for n in range(1,1000) if n < 400]?

I guess you're kind of looking for an "else break" feature to exit the list comprehension before evaluating all the input values.  Wouldn't that complete the "while()" functionality? 





Shane Green 
www.umbrellacode.com
408-692-4666 | shane at umbrellacode.com

On Jan 28, 2013, at 5:59 AM, Oscar Benjamin <oscar.j.benjamin at gmail.com> wrote:

> On 28 January 2013 13:56, Chris Angelico <rosuav at gmail.com> wrote:
>> On Tue, Jan 29, 2013 at 12:33 AM, Wolfgang Maier
>> <wolfgang.maier at biologie.uni-freiburg.de> wrote:
>>> Why not extend this filtering by allowing a while statement in addition to
>>> if, as in:
>>> 
>>> [n for n in range(1,1000) while n < 400]
>> 
>> The time machine strikes again! Check out itertools.takewhile - it can
>> do pretty much that:
>> 
>> import itertools
>> [n for n in itertools.takewhile(lambda n: n<400, range(1,1000))]
>> 
>> It's not quite list comp notation, but it works.
>> 
>>>>> [n for n in itertools.takewhile(lambda n: n<40, range(1,100))]
>> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
>> 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
>> 37, 38, 39]
> 
> The while clause is a lot clearer/nicer than takewhile/lambda.
> Presumably it would be more efficient as well.
> 
> 
> Oscar
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas


From graffatcolmingov at gmail.com  Mon Jan 28 15:38:58 2013
From: graffatcolmingov at gmail.com (Ian Cordasco)
Date: Mon, 28 Jan 2013 09:38:58 -0500
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <CAHVvXxTwnu1SZF=XzWe5iOtu5KhCvbA366xG8Sdm1kDnS5-R_Q@mail.gmail.com>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<CAPTjJmrgx=mLyzJ8x4VcTeOOBPjZztuB9aktcZ1wSFAcB2vzwg@mail.gmail.com>
	<CAHVvXxTwnu1SZF=XzWe5iOtu5KhCvbA366xG8Sdm1kDnS5-R_Q@mail.gmail.com>
Message-ID: <CAN-Kwu1HNNgx2Wz7ZBuy5MPiGbx1=cNe6+mO+-skhVFZ48dQ3Q@mail.gmail.com>

On Mon, Jan 28, 2013 at 8:59 AM, Oscar Benjamin
<oscar.j.benjamin at gmail.com> wrote:
> On 28 January 2013 13:56, Chris Angelico <rosuav at gmail.com> wrote:
>> On Tue, Jan 29, 2013 at 12:33 AM, Wolfgang Maier
>> <wolfgang.maier at biologie.uni-freiburg.de> wrote:
>>> Why not extend this filtering by allowing a while statement in addition to
>>> if, as in:
>>>
>>> [n for n in range(1,1000) while n < 400]
>>
>> The time machine strikes again! Check out itertools.takewhile - it can
>> do pretty much that:
>>
>> import itertools
>> [n for n in itertools.takewhile(lambda n: n<400, range(1,1000))]
>>
>> It's not quite list comp notation, but it works.
>>
>>>>> [n for n in itertools.takewhile(lambda n: n<40, range(1,100))]
>> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
>> 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
>> 37, 38, 39]
>
> The while clause is a lot clearer/nicer than takewhile/lambda.
> Presumably it would be more efficient as well.

The while syntax definitely reads better, and I would guess that dis
could clarify how much more efficient using `if n < 400` would be
compared to the lambda. Then again, this is a rather uncommon situation
and it could be handled with the if syntax. Also, recall the Zen of
Python: "There should be one-- and preferably only one --obvious way
to do it." That alone is argument enough against the `while` syntax.


From rosuav at gmail.com  Mon Jan 28 15:43:39 2013
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 29 Jan 2013 01:43:39 +1100
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <1CF441FF-8774-4687-A27E-2E563FCB7CA5@umbrellacode.com>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<CAPTjJmrgx=mLyzJ8x4VcTeOOBPjZztuB9aktcZ1wSFAcB2vzwg@mail.gmail.com>
	<CAHVvXxTwnu1SZF=XzWe5iOtu5KhCvbA366xG8Sdm1kDnS5-R_Q@mail.gmail.com>
	<1CF441FF-8774-4687-A27E-2E563FCB7CA5@umbrellacode.com>
Message-ID: <CAPTjJmr_7vgXn8Wg3QwAYsXkADwQabQHH8ETovPj45YwCkuwtw@mail.gmail.com>

On Tue, Jan 29, 2013 at 1:32 AM, Shane Green <shane at umbrellacode.com> wrote:
> Isn't "while" kind of just the "if" of a looping construct?
>
> Would [n for n in range(1,1000) while n < 400] == [n for n in range(1,1000)
> if n < 400]?
>
> I guess you're kind of looking for an "else break" feature to exit the list
> comprehension before evaluating all the input values.  Wouldn't that
> complete the "while()" functionality?

In the specific case given, they'll produce the same result, but there
are two key differences:

1) If the condition becomes true again later in the original iterable,
the 'if' will pick up those entries, but the 'while' won't; and
2) The 'while' version will not consume more than the one result that
failed to pass the condition.
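Both differences are easy to see with a sequence whose values dip back under the threshold; a small sketch using only the standard library:

```python
from itertools import takewhile

seq = [1, 2, 5, 1, 2]

# 'if' semantics: filter the whole iterable, picking values back up
# after the condition first fails.
print([n for n in seq if n < 3])              # [1, 2, 1, 2]

# 'while' semantics (takewhile): stop for good at the first failure.
print(list(takewhile(lambda n: n < 3, seq)))  # [1, 2]
```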

I daresay it would be faster and maybe cleaner to implement this with
a language feature rather than itertools.takewhile, but list
comprehensions can get unwieldy too; is there sufficient call for this
to justify the syntax?

ChrisA


From shane at umbrellacode.com  Mon Jan 28 15:51:23 2013
From: shane at umbrellacode.com (Shane Green)
Date: Mon, 28 Jan 2013 06:51:23 -0800
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <CAPTjJmr_7vgXn8Wg3QwAYsXkADwQabQHH8ETovPj45YwCkuwtw@mail.gmail.com>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<CAPTjJmrgx=mLyzJ8x4VcTeOOBPjZztuB9aktcZ1wSFAcB2vzwg@mail.gmail.com>
	<CAHVvXxTwnu1SZF=XzWe5iOtu5KhCvbA366xG8Sdm1kDnS5-R_Q@mail.gmail.com>
	<1CF441FF-8774-4687-A27E-2E563FCB7CA5@umbrellacode.com>
	<CAPTjJmr_7vgXn8Wg3QwAYsXkADwQabQHH8ETovPj45YwCkuwtw@mail.gmail.com>
Message-ID: <AC35797F-69DE-4E35-AAA4-3E32A16F2A0F@umbrellacode.com>

Yeah, I realized (1) after a minute and came up with "else break": if n < 400 else break.  Could that be made functionally equivalent without being based on a loop construct within an iterator?





Shane Green 
www.umbrellacode.com
408-692-4666 | shane at umbrellacode.com

On Jan 28, 2013, at 6:43 AM, Chris Angelico <rosuav at gmail.com> wrote:

> On Tue, Jan 29, 2013 at 1:32 AM, Shane Green <shane at umbrellacode.com> wrote:
>> Isn't "while" kind of just the "if" of a looping construct?
>> 
>> Would [n for n in range(1,1000) while n < 400] == [n for n in range(1,1000)
>> if n < 400]?
>> 
>> I guess you're kind of looking for an "else break" feature to exit the list
>> comprehension before evaluating all the input values.  Wouldn't that
>> complete the "while()" functionality?
> 
> In the specific case given, they'll produce the same result, but there
> are two key differences:
> 
> 1) If the condition becomes true again later in the original iterable,
> the 'if' will pick up those entries, but the 'while' won't; and
> 2) The 'while' version will not consume more than the one result that
> failed to pass the condition.
> 
> I daresay it would be faster and maybe cleaner to implement this with
> a language feature rather than itertools.takewhile, but list
> comprehensions can get unwieldy too; is there sufficient call for this
> to justify the syntax?
> 
> ChrisA
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas


From graffatcolmingov at gmail.com  Mon Jan 28 16:17:49 2013
From: graffatcolmingov at gmail.com (Ian Cordasco)
Date: Mon, 28 Jan 2013 10:17:49 -0500
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <AC35797F-69DE-4E35-AAA4-3E32A16F2A0F@umbrellacode.com>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<CAPTjJmrgx=mLyzJ8x4VcTeOOBPjZztuB9aktcZ1wSFAcB2vzwg@mail.gmail.com>
	<CAHVvXxTwnu1SZF=XzWe5iOtu5KhCvbA366xG8Sdm1kDnS5-R_Q@mail.gmail.com>
	<1CF441FF-8774-4687-A27E-2E563FCB7CA5@umbrellacode.com>
	<CAPTjJmr_7vgXn8Wg3QwAYsXkADwQabQHH8ETovPj45YwCkuwtw@mail.gmail.com>
	<AC35797F-69DE-4E35-AAA4-3E32A16F2A0F@umbrellacode.com>
Message-ID: <CAN-Kwu3XoCddP2AEb28u8PSebVr3ULCygm018v1NJu6WG8gM-Q@mail.gmail.com>

On Mon, Jan 28, 2013 at 9:51 AM, Shane Green <shane at umbrellacode.com> wrote:
> Yeah, I realized (1) after a minute and came up with "else break": if n <
> 400 else break.  Could that be functionally equivalent, not based on a loop
> construct within an iterator?
>

You mean: `[n for n in range(0, 400) if n < 100 else break]`? That is
definitely more obvious (in my opinion) than using the while syntax,
but what does `break` mean in the context of a list comprehension? I
understand the point, but I dislike the execution. I guess coming from
a background in pure mathematics, this just seems wrong for a list (or
set) comprehension.


From rosuav at gmail.com  Mon Jan 28 16:24:56 2013
From: rosuav at gmail.com (Chris Angelico)
Date: Tue, 29 Jan 2013 02:24:56 +1100
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <CAN-Kwu3XoCddP2AEb28u8PSebVr3ULCygm018v1NJu6WG8gM-Q@mail.gmail.com>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<CAPTjJmrgx=mLyzJ8x4VcTeOOBPjZztuB9aktcZ1wSFAcB2vzwg@mail.gmail.com>
	<CAHVvXxTwnu1SZF=XzWe5iOtu5KhCvbA366xG8Sdm1kDnS5-R_Q@mail.gmail.com>
	<1CF441FF-8774-4687-A27E-2E563FCB7CA5@umbrellacode.com>
	<CAPTjJmr_7vgXn8Wg3QwAYsXkADwQabQHH8ETovPj45YwCkuwtw@mail.gmail.com>
	<AC35797F-69DE-4E35-AAA4-3E32A16F2A0F@umbrellacode.com>
	<CAN-Kwu3XoCddP2AEb28u8PSebVr3ULCygm018v1NJu6WG8gM-Q@mail.gmail.com>
Message-ID: <CAPTjJmpR-9NG_KCHQbwRDsH9CXEm6D8pJ0NY-Y96ztxQ-jrYiA@mail.gmail.com>

On Tue, Jan 29, 2013 at 2:17 AM, Ian Cordasco
<graffatcolmingov at gmail.com> wrote:
> You mean: `[n for n in range(0, 400) if n < 100 else break]`? That is
> definitely more obvious (in my opinion) than using the while syntax,
> but what does `break` mean in the context of a list comprehension?

It's easy enough in the simple case. What would happen if you added an
"else break" to this:

[(x,y) for x in range(10) for y in range(2) if x<3]

Of course, this would be better written with the if between the two
fors, but the clarity isn't that big a problem when it's not going to
change the result. Would it be obvious that the "else break" would
only halt the "for y" loop?
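For reference, the nested comprehension above evaluates as follows; where exactly an "else break" would cut in is not obvious from the syntax alone:

```python
# Nested loops in a comprehension: 'for y' is the inner loop, so an
# "else break" could plausibly mean "stop the y loop" or "stop everything".
pairs = [(x, y) for x in range(10) for y in range(2) if x < 3]
print(pairs)  # [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
```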

ChrisA


From guido at python.org  Mon Jan 28 16:45:55 2013
From: guido at python.org (Guido van Rossum)
Date: Mon, 28 Jan 2013 07:45:55 -0800
Subject: [Python-ideas] PEP 3156: getting the socket or peer name from
	the transport
In-Reply-To: <EDC83381-4C64-4215-A90B-C72F2327BCA7@umbrellacode.com>
References: <CAP7+vJ+_QwQW13b1mE1NSg8Z89MyYwHkyYmLwhumGo0iO-V+wA@mail.gmail.com>
	<CANSw7Kyi7d-Q5O=MqZTx5a-bOA3CQ4Me5RVafxJ0e7Kj1w693w@mail.gmail.com>
	<CAP7+vJ+BxgVPv0jBP=9GAdnXGdLRqgjCcBhVzVnJ4K7NDAxrjQ@mail.gmail.com>
	<CANSw7KwafX+qGiMszK7LVMs=xacmyEUjGLDPdqGzfW95GxbXfg@mail.gmail.com>
	<20130127122121.6b779ada@pitrou.net>
	<CANSw7KyuPCg9Ot6tY3ML_WBquO0PfFVqpzWPRpa3o8gmTyVS_A@mail.gmail.com>
	<1359288997.3488.2.camel@localhost.localdomain>
	<CAP7+vJ+av5zRDDLThgo7CCn_xLJ7rY3u-myP6Hihf50j0z7pYQ@mail.gmail.com>
	<EDC83381-4C64-4215-A90B-C72F2327BCA7@umbrellacode.com>
Message-ID: <CAP7+vJKn6hE1zWujnDi=5dUtRsdovM7741G9bK0e4vQJvmbDPA@mail.gmail.com>

On Mon, Jan 28, 2013 at 12:57 AM, Shane Green <shane at umbrellacode.com> wrote:
> What about giving the protocol an environ info object that should have all
> information it needs already, which could (and probably should) include
> things like the SSL certificate information, and would probably also be
> where additional info that happened to be looked up, like host name details,
> was stored and accessed.  Assuming the transports, etc., can define all the
> state information a protocol needs, can operate without hardware
> dependencies; in case that doesn't happen, though, the state dict will also
> have references to the socket, so the protocol could get to directly if
> needed.

Hm. I'm not keen on precomputing all of that, since most protocols
won't need it, and the cost add up. This is not WSGI. The protocol has
the transport object and can ask it specific questions -- if through a
general API, like get_extra_info(key, [default]).

-- 
--Guido van Rossum (python.org/~guido)


From ethan at stoneleaf.us  Mon Jan 28 16:50:09 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 28 Jan 2013 07:50:09 -0800
Subject: [Python-ideas] Interrupting threads
In-Reply-To: <CAH_1eM1c+aSkhUGSrtoy_RgOH6gxaWsn5rrMAtRpYO4+qoA9Mg@mail.gmail.com>
References: <CAH_1eM2B+4614=80RCpAUzoSOOsdQ4jpWsKrcsPjQw5_zGXK+Q@mail.gmail.com>
	<20130127210415.GA14691@cskk.homeip.net>
	<CAH_1eM1c+aSkhUGSrtoy_RgOH6gxaWsn5rrMAtRpYO4+qoA9Mg@mail.gmail.com>
Message-ID: <51069E31.6010909@stoneleaf.us>

On 01/27/2013 03:59 PM, Charles-François Natali wrote:
 > On 01/27/2013 01:04 PM, Cameron Simpson wrote:
>> It's in common use, and not obscure. I am a native speaker, and see no
>> problem with it. Long standing word with a well known and defined
>> meaning.
>
> Really?
> I know about sigprocmask(), pthread_sigmask(), SIG_IGN and SIG_BLOCK,
> interrupt masking...
> I couldn't find a single occurrence of "heed" in the POSIX specification.

Common use is not the same as common technical use.  I don't recall ever 
seeing 'heed' or any of its derivatives in technical literature, but I 
am very familiar with the word and its meaning.  It's a good choice.

Having said that, I also agree with Cameron that 'canceled' would be a 
better word than 'interrupted'.

~Ethan~


From eliben at gmail.com  Mon Jan 28 16:55:57 2013
From: eliben at gmail.com (Eli Bendersky)
Date: Mon, 28 Jan 2013 07:55:57 -0800
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
Message-ID: <CAF-Rda9vQDbzr-p4CxZQAUqWGB6X16osYJmc_O1jJru9Gf3bnw@mail.gmail.com>

On Mon, Jan 28, 2013 at 5:33 AM, Wolfgang Maier <
wolfgang.maier at biologie.uni-freiburg.de> wrote:

> Dear all,
> I guess this is so obvious that someone must have suggested it before:
> in list comprehensions you can currently exclude items based on the if
> conditional, e.g.:
>
> [n for n in range(1,1000) if n % 4 == 0]
>
> Why not extend this filtering by allowing a while statement in addition to
> if, as in:
>
> [n for n in range(1,1000) while n < 400]
>
> Trivial effect, I agree, in this example since you could achieve the same
> by
> using range(1,400), but I hope you get the point.
> This intuitively understandable extension would provide a big speed-up for
> sorted lists where processing all the input is unnecessary.
>
> Consider this:
>
> some_names=["Adam", "Andrew", "Arthur", "Bob", "Caroline","Lancelot"]     #
> a sorted list of names
> [n for n in some_names if n.startswith("A")]
> # certainly gives a list of all names starting with A, but ...
> [n for n in some_names while n.startswith("A")]
> # would have saved two comparisons
>

-1

This isn't adding a feature that the language can't currently perform. It
can, with itertools, with an explicit 'for' loop and probably other
methods. List comprehensions are a useful shortcut that should be kept as
simple as possible. The semantics of the proposed 'while' aren't
immediately obvious, which makes it out of place in list comprehensions,
IMO.

Eli

From wolfgang.maier at biologie.uni-freiburg.de  Mon Jan 28 17:19:23 2013
From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier)
Date: Mon, 28 Jan 2013 17:19:23 +0100
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <CAF-Rda9vQDbzr-p4CxZQAUqWGB6X16osYJmc_O1jJru9Gf3bnw@mail.gmail.com>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<CAF-Rda9vQDbzr-p4CxZQAUqWGB6X16osYJmc_O1jJru9Gf3bnw@mail.gmail.com>
Message-ID: <00d901cdfd73$3c67e2c0$b537a840$@biologie.uni-freiburg.de>

> -1
> This isn't adding a feature that the language can't currently perform. It
can, with itertools, with an explicit 'for' loop and probably other methods.
> List comprehensions are a useful shortcut that should be kept as simple as
possible. The semantics of the proposed 'while' aren't immediately
> obvious, which makes it out of place in list comprehensions, IMO.
> 
> Eli

I thought everything that can be done with a list comprehension can also be
done with an explicit 'for' loop! So following your logic, one would have to
remove comprehensions from the language altogether. In terms of semantics I
do not really see what isn't immediately obvious about my proposal.

Since the question of use cases was brought up: I am working as a scientist,
and one of the uses I thought of when proposing this was that it could be
used in combination with any kind of iterator that can yield an infinite
number of elements, but you only want the first few elements up to a certain
value (note: this is related to, but not the same as saying I want a certain
number of elements from the iterator).

Let's take the often used example of the Fibonacci iterator and assume you
have an instance 'fibo' of its iterable class implementation, then:

[n for n in fibo while n <10000]

would return a list with all Fibonacci numbers that are smaller than 10000
(without having to know in advance how many such numbers there are).
Likewise, with prime numbers and a 'prime' iterator:

[n for n in prime while n<10000]

and many other scientifically useful numeric sequences.
I would appreciate such a feature, and, even though everything can be solved
with itertools, I think it's too much typing and thinking for generating a
list quickly.
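For comparison, here is what the Fibonacci case looks like today with takewhile; since the 'fibo' iterable's implementation isn't given above, it is sketched here as a plain generator:

```python
from itertools import takewhile

def fibo():
    """Infinite Fibonacci generator standing in for the 'fibo' iterable."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# All Fibonacci numbers below 10000, without knowing how many in advance.
fibs = list(takewhile(lambda n: n < 10000, fibo()))
print(fibs[-1])  # 6765
```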

Best,
Wolfgang







From graffatcolmingov at gmail.com  Mon Jan 28 17:33:43 2013
From: graffatcolmingov at gmail.com (Ian Cordasco)
Date: Mon, 28 Jan 2013 11:33:43 -0500
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <00d901cdfd73$3c67e2c0$b537a840$@biologie.uni-freiburg.de>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<CAF-Rda9vQDbzr-p4CxZQAUqWGB6X16osYJmc_O1jJru9Gf3bnw@mail.gmail.com>
	<00d901cdfd73$3c67e2c0$b537a840$@biologie.uni-freiburg.de>
Message-ID: <CAN-Kwu1N6JNYZXB2KD2Kxg+6AksAFzaYogGTQOT9pBtAg_cmpw@mail.gmail.com>

On Mon, Jan 28, 2013 at 11:19 AM, Wolfgang Maier
<wolfgang.maier at biologie.uni-freiburg.de> wrote:
>> -1
>> This isn't adding a feature that the language can't currently perform. It
> can, with itertools, with an explicit 'for' loop and probably other methods.
>> List comprehensions are a useful shortcut that should be kept as simple as
> possible. The semantics of the proposed 'while' aren't immediately
>> obvious, which makes it out of place in list comprehensions, IMO.
>>
>> Eli
>
> I thought everything that can be done with a list comprehension can also be
> done with an explicit 'for' loop! So following your logic, one would have to
> remove comprehensions from the language altogether. In terms of semantics I
> do not really see what isn't immediately obvious about my proposal.
>

Sarcasm will not help your argument. The difference (as I would expect
you to know) between the performance of a list comprehension and an
explict `for` loop is significant and the comprehension is already a
feature of the language. Removing it would be nonsensical.

> Since the question of use cases was brought up: I am working as a scientist,
> and one of the uses I thought of when proposing this was that it could be
> used in combination with any kind of iterator that can yield an infinite
> number of elements, but you only want the first few elements up to a certain
> value (note: this is related to, but not the same as saying I want a certain
> number of elements from the iterator).
>
> Let's take the often used example of the Fibonacci iterator and assume you
> have an instance 'fibo' of its iterable class implementation, then:
>
> [n for n in fibo while n <10000]
>
> would return a list with all Fibonacci numbers that are smaller than 10000
> (without having to know in advance how many such numbers there are).
> Likewise, with prime numbers and a 'prime' iterator:
>
> [n for n in prime while n<10000]
>
> and many other scientifically useful numeric sequences.
> I would appreciate such a feature, and, even though everything can be solved
> with itertools, I think it's too much typing and thinking for generating a
> list quickly.
>

This is definitely a problematic use case for a simple list
comprehension, but the takewhile solution works exactly as expected
and even resembles your solution. It is in the standard library and
its performance seems to be fast enough (to me at least, on a 10 year
old laptop). And the key phrase here is "simple list comprehension".
Yours is in theory a simple list comprehension but is rather a
slightly more complex case that can be handled in a barely more
complex way. itertools is a part of the standard library that needs
more affection, in my opinion, and it really does its best to accommodate
these more complex cases in sensible ways.

I am still -1 on this.


Cheers,
Ian


From wolfgang.maier at biologie.uni-freiburg.de  Mon Jan 28 17:48:07 2013
From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier)
Date: Mon, 28 Jan 2013 17:48:07 +0100
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <CAN-Kwu1N6JNYZXB2KD2Kxg+6AksAFzaYogGTQOT9pBtAg_cmpw@mail.gmail.com>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>	<CAF-Rda9vQDbzr-p4CxZQAUqWGB6X16osYJmc_O1jJru9Gf3bnw@mail.gmail.com>	<00d901cdfd73$3c67e2c0$b537a840$@biologie.uni-freiburg.de>
	<CAN-Kwu1N6JNYZXB2KD2Kxg+6AksAFzaYogGTQOT9pBtAg_cmpw@mail.gmail.com>
Message-ID: <00db01cdfd77$3fc1e660$bf45b320$@biologie.uni-freiburg.de>


> Sarcasm will not help your argument. The difference (as I would expect you
to know) between the performance of a list comprehension and an
>  explict `for` loop is significant and the comprehension is already a
feature of the language. Removing it would be nonsensical.

Ok, I am sorry for the sarcasm. Essentially this is exactly what I wanted to
say with it. Because comprehensions are faster than for loops, I am using
them, and this is why I'd like the while feature in them. I fully agree with
everybody here that itertools provides a solution for it, but imagine for a
moment that the if clause didn't exist and people pointed you to a similar
itertools solution for it, e.g.:

[n for n in itertools.takeif(lambda n: n % 4 == 0, range(1,1000))]

What would you prefer?
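(For what it's worth, there is no itertools.takeif; the hypothetical above corresponds to the built-in filter, which really is the functional twin of the if clause:)

```python
# The 'if' clause and filter() are interchangeable here; 'takeif' is
# hypothetical, filter() is the real spelling of it.
via_if = [n for n in range(1, 1000) if n % 4 == 0]
via_filter = list(filter(lambda n: n % 4 == 0, range(1, 1000)))
print(via_if == via_filter)  # True
print(via_if[:4])            # [4, 8, 12, 16]
```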

I think it is true that this is mostly about how often people would make use
of the feature.
And, yes, it was a mistake to disturb the ongoing voting with sarcasm.

Best,
Wolfgang



> Since the question of use cases was brought up: I am working as a 
> scientist, and one of the uses I thought of when proposing this was 
> that it could be used in combination with any kind of iterator that 
> can yield an infinite number of elements, but you only want the first 
> few elements up to a certain value (note: this is related to, but not 
> the same as saying I want a certain number of elements from the iterator).
>
> Let's take the often used example of the Fibonacci iterator and assume 
> you have an instance 'fibo' of its iterable class implementation, then:
>
> [n for n in fibo while n <10000]
>
> would return a list with all Fibonacci numbers that are smaller than 
> 10000 (without having to know in advance how many such numbers there are).
> Likewise, with prime numbers and a 'prime' iterator:
>
> [n for n in prime while n<10000]
>
> and many other scientifically useful numeric sequences.
> I would appreciate such a feature, and, even though everything can be 
> solved with itertools, I think it's too much typing and thinking for 
> generating a list quickly.
>

This is definitely a problematic use case for a simple list comprehension,
but the takewhile solution works exactly as expected and even resembles your
solution. It is in the standard library and its performance seems to be
fast enough (to me at least, on a 10 year old laptop). And the key phrase
here is "simple list comprehension".
Yours is in theory a simple list comprehension but is rather a slightly more
complex case that can be handled in a barely more complex way. itertools is
a part of the standard library that needs more affection, in my opinion, and
really does its best to accommodate these more complex cases in sensible
ways.

I am still -1 on this.


Cheers,
Ian



From ethan at stoneleaf.us  Mon Jan 28 16:53:44 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 28 Jan 2013 07:53:44 -0800
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <201301281221.58842.mark.hackett@metoffice.gov.uk>
References: <1358903168.4767.4.camel@webb>
	<201301251058.28531.mark.hackett@metoffice.gov.uk>
	<5102B76B.2080106@stoneleaf.us>
	<201301281221.58842.mark.hackett@metoffice.gov.uk>
Message-ID: <51069F08.8070000@stoneleaf.us>

On 01/28/2013 04:21 AM, Mark Hackett wrote:
> On Friday 25 Jan 2013, Ethan Furman wrote:
>> We're going to have to agree to disagree on this point -- I think there
>> is a huge difference between reassigning a variable which is completely
>> under your control from losing entire columns of data from a file which
>> you may have never seen before.
>>
>
> But if you've never seen it before, how do you know that you're going to get a
> LIST in one column?

I don't, which is why an exception should be raised.

~Ethan~


From mark.hackett at metoffice.gov.uk  Mon Jan 28 18:13:52 2013
From: mark.hackett at metoffice.gov.uk (Mark Hackett)
Date: Mon, 28 Jan 2013 17:13:52 +0000
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <51069F08.8070000@stoneleaf.us>
References: <1358903168.4767.4.camel@webb>
	<201301281221.58842.mark.hackett@metoffice.gov.uk>
	<51069F08.8070000@stoneleaf.us>
Message-ID: <201301281713.52322.mark.hackett@metoffice.gov.uk>

On Monday 28 Jan 2013, Ethan Furman wrote:
> On 01/28/2013 04:21 AM, Mark Hackett wrote:
> > On Friday 25 Jan 2013, Ethan Furman wrote:
> >> We're going to have to agree to disagree on this point -- I think there
> >> is a huge difference between reassigning a variable which is completely
> >> under your control from losing entire columns of data from a file which
> >> you may have never seen before.
> >
> > But if you've never seen it before, how do you know that you're going to
> > get a LIST in one column?
> 
> I don't, which is why an exception should be raised.
> 
> ~Ethan~

And there's an argument for that that I've agreed to before.

There's a counter that this will cause programs that used to work to fail.

Whether the pro is higher than the con or the other way round is what I 
question.

You, however, seem to believe this is a foregone conclusion.

And that's where I disagree.


From python at mrabarnett.plus.com  Mon Jan 28 18:20:50 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Mon, 28 Jan 2013 17:20:50 +0000
Subject: [Python-ideas] Interrupting threads
In-Reply-To: <51049915.3060808@mrabarnett.plus.com>
References: <51049915.3060808@mrabarnett.plus.com>
Message-ID: <5106B372.5040803@mrabarnett.plus.com>

The point has been made that you don't want an interruption in the
middle of an exception handling routine. That's true. You also don't
want an interruption in the middle of a 'finally' block.

I think the problem here is that most of what I've been talking about
regarding the context manager actually belongs to the 'try' statement;
context managers are, after all, built on the 'try' statement.

In the following the flags, the exception and the thread's method have
been renamed.

On entry to a 'try' statement, heed_thread_cancel is saved.

When an exception is raised, heed_thread_cancel is set to True. This
ensures that normal exceptions take priority.

On entry to a 'finally' block, heed_thread_cancel is set to True. This
ensures that the block will not be interrupted.

Execution will leave the 'try' statement in one of two ways:

1. Normal exit (i.e. the next statement to be executed will be the one
after the 'try' statement). heed_thread_cancel is restored. If
heed_thread_cancel is True and thread_cancelled is also True, then
heed_thread_cancel is set to False and ThreadCancelled is raised.

2. Exception propagation (i.e. the 'try' statement has not handled the
exception). The saved heed_thread_cancel is discarded
(heed_thread_cancel remains False) and the propagation continues.

If the same logic applies to the keyboard interrupt (and the only real
difference between ThreadCancelled and KeyboardInterrupt is that the
former is triggered by the thread's 'cancel' method while the latter is
triggered by the user pressing ^C or the equivalent), then the user
pressing ^C will no longer interrupt the code in 'finally' blocks,
breaking the clean-up code in context managers.


From python at mrabarnett.plus.com  Mon Jan 28 18:26:31 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Mon, 28 Jan 2013 17:26:31 +0000
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <51069F08.8070000@stoneleaf.us>
References: <1358903168.4767.4.camel@webb>
	<201301251058.28531.mark.hackett@metoffice.gov.uk>
	<5102B76B.2080106@stoneleaf.us>
	<201301281221.58842.mark.hackett@metoffice.gov.uk>
	<51069F08.8070000@stoneleaf.us>
Message-ID: <5106B4C7.3090803@mrabarnett.plus.com>

On 2013-01-28 15:53, Ethan Furman wrote:
> On 01/28/2013 04:21 AM, Mark Hackett wrote:
>> On Friday 25 Jan 2013, Ethan Furman wrote:
>>> We're going to have to agree to disagree on this point -- I think there
>>> is a huge difference between reassigning a variable which is completely
>>> under your control from losing entire columns of data from a file which
>>> you may have never seen before.
>>>
>>
>> But if you've never seen it before, how do you know that you're going to get a
>> LIST in one column?
>
> I don't, which is why an exception should be raised.
>
+1

It shouldn't silently drop the columns, nor should it silently merge
the columns into a list. It should complain, unless you state that it
should merge if necessary because, presumably, you're prepared for such
an eventuality.
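For reference, this is what DictReader actually does today with a duplicate
header: the later column silently wins. A self-contained sketch against the
current stdlib behaviour:

```python
import csv
import io

# Two columns share the header "a"; DictReader keeps only the last value.
data = io.StringIO("a,a,b\n1,2,3\n")
rows = list(csv.DictReader(data))
# rows[0] maps 'a' to '2' and 'b' to '3'; the value '1' is silently dropped.
```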



From mark.hackett at metoffice.gov.uk  Mon Jan 28 18:45:16 2013
From: mark.hackett at metoffice.gov.uk (Mark Hackett)
Date: Mon, 28 Jan 2013 17:45:16 +0000
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <5106B4C7.3090803@mrabarnett.plus.com>
References: <1358903168.4767.4.camel@webb> <51069F08.8070000@stoneleaf.us>
	<5106B4C7.3090803@mrabarnett.plus.com>
Message-ID: <201301281745.16485.mark.hackett@metoffice.gov.uk>

On Monday 28 Jan 2013, MRAB wrote:
> It shouldn't silently drop the columns
> 

Why not?

It's adding to a dictionary and adding a duplicate key replaces the earlier 
one.

If it dropped the columns and shouldn't have, then the results will be seen to 
be wrong anyway, so there's not a huge amount of need for this.

If it WANTED to keep both columns with the duplicate names, it won't work and 
needs abandoning. So no different from now.

If it WANTED duplicate keys (e.g. blanks which aren't imported and aren't 
wanted), then you've just broken it. They can't necessarily change the csv file 
to put headers in. So now you've made the call useless for this case.

And why, really, are there duplicate column names in there anyway? You can 
come up with the assertion that this might be wanted, but they're not normally 
what you see in a csv file.

I've never seen nor used a csv file that duplicated column names other than 
being blank.

If it had been such a problem, the call would already have been abandoned.


From ethan at stoneleaf.us  Mon Jan 28 21:39:54 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 28 Jan 2013 12:39:54 -0800
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
Message-ID: <5106E21A.1000507@stoneleaf.us>

On 01/28/2013 05:33 AM, Wolfgang Maier wrote:
> Consider this:
>
> some_names=["Adam", "Andrew", "Arthur", "Bob", "Caroline","Lancelot"]     #
> a sorted list of names
> [n for n in some_names if n.startswith("A")]
> # certainly gives a list of all names starting with A, but .
> [n for n in some_names while n.startswith("A")]
> # would have saved two comparisons

What happens when you want the names that start with 'B'?  The advantage 
of 'if' is it processes the entire list so grabs all items that match, 
and the list does not have to be ordered.  The disadvantage (can be) 
that it processes the entire list.

Given that 'while' would only work on sorted lists, and could only start 
from the beginning, I think it may be too specialized.

But I wouldn't groan if someone wanted to code it up.  :)

+0

~Ethan~


From wolfgang.maier at biologie.uni-freiburg.de  Mon Jan 28 22:22:09 2013
From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier)
Date: Mon, 28 Jan 2013 21:22:09 +0000 (UTC)
Subject: [Python-ideas] while conditional in list comprehension ??
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<5106E21A.1000507@stoneleaf.us>
Message-ID: <loom.20130128T215214-714@post.gmane.org>

>Ethan Furman <ethan at ...> writes:

> 
> On 01/28/2013 05:33 AM, Wolfgang Maier wrote:
> > Consider this:
> >
> > some_names=["Adam", "Andrew", "Arthur", "Bob", "Caroline","Lancelot"]     #
> > a sorted list of names
> > [n for n in some_names if n.startswith("A")]
> > # certainly gives a list of all names starting with A, but .
> > [n for n in some_names while n.startswith("A")]
> > # would have saved two comparisons
> 
> What happens when you want the names that start with 'B'?  The advantage 
> of 'if' is it processes the entire list so grabs all items that match, 
> and the list does not have to be ordered.  The disadvantage (can be) 
> that it processes the entire list.
> 
> Given that 'while' would only work on sorted lists, and could only start 
> from the beginning, I think it may be too specialized.
> 
> But I wouldn't groan if someone wanted to code it up.  :)
> 
> +0
> 
> ~Ethan~
> 


I thought about this question, and I agree this is not what the while clause
would be best for.
However, currently you could solve tasks like this with itertools.takewhile in
the following (almost perl-like) way (I illustrate things with numbers to keep
it simpler):

l=[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23]
# now retrieve all numbers from 10 to 19 (combining takewhile and slicing)
[n for n in itertools.takewhile(lambda n:n<20,l[len([x for x in
itertools.takewhile(lambda x:x<10,l)]):])]

Nice, isn't it?

If I am not mistaken, then with my suggestion this would at least simplify to:

[n for n in l[len([x for x in l while x<10]):] while n<20]

Not great either, I admit, but at least it's fun to play this mindgame.
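For what it's worth, the existing itertools combination dropwhile + takewhile
already expresses this "skip until, then keep until" pattern without the
len()-and-slice trick. A sketch of the same task (not a replacement for the
proposed syntax, just the current idiom):

```python
from itertools import dropwhile, takewhile

l = list(range(1, 24))
# Skip while n < 10, then keep while n < 20: all numbers from 10 to 19.
middle = list(takewhile(lambda n: n < 20, dropwhile(lambda n: n < 10, l)))
```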
Best,
Wolfgang




From ethan at stoneleaf.us  Mon Jan 28 22:43:33 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 28 Jan 2013 13:43:33 -0800
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <loom.20130128T215214-714@post.gmane.org>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<5106E21A.1000507@stoneleaf.us>
	<loom.20130128T215214-714@post.gmane.org>
Message-ID: <5106F105.3060001@stoneleaf.us>

On 01/28/2013 01:22 PM, Wolfgang Maier wrote:
> However, currently you could solve tasks like this with itertools.takewhile in
> the following (almost perl-like) way (I illustrate things with numbers to keep
> it simpler):
>
> l=[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23]
> # now retrieve all numbers from 10 to 19 (combining takewhile and slicing)
> [n for n in itertools.takewhile(lambda n:n<20,l[len([x for x in
> itertools.takewhile(lambda x:x<10,l)]):])]
>
> Nice, isn't it?
>
> If I am not mistaken, then with my suggestion this would at least simplify to:
>
> [n for n in l[len([x for x in l while x<10]):] while n<20]
>
> Not great either, I admit, but at least it's fun to play this mindgame.

Well, as long as we're dreaming, how about

[n for n in l while 10 <= n < 20]

and somebody (else!) can code to skip until the first condition is met, 
then keep until the second condition is met, and then stop.

:)

~Ethan~



From wolfgang.maier at biologie.uni-freiburg.de  Mon Jan 28 23:01:40 2013
From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier)
Date: Mon, 28 Jan 2013 22:01:40 +0000 (UTC)
Subject: [Python-ideas] while conditional in list comprehension ??
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<5106E21A.1000507@stoneleaf.us>
	<loom.20130128T215214-714@post.gmane.org>
	<5106F105.3060001@stoneleaf.us>
Message-ID: <loom.20130128T225553-12@post.gmane.org>

>Ethan Furman <ethan at ...> writes:
>
> Well, as long as we're dreaming, how about
> 
> [n for n in l while 10 <= n < 20]
> 
> and somebody (else!) can code to skip until the first condition is met, 
> then keep until the second condition is met, and then stop.
> 
> :)


Sounds great ;)
Here it's 11 p.m. so dreaming sounds like a reasonable thing to do,
Wolfgang




From saghul at gmail.com  Mon Jan 28 23:48:54 2013
From: saghul at gmail.com (=?ISO-8859-1?Q?Sa=FAl_Ibarra_Corretg=E9?=)
Date: Mon, 28 Jan 2013 23:48:54 +0100
Subject: [Python-ideas] libuv based eventloop for tulip experiment
Message-ID: <51070056.8020006@gmail.com>

Hi all!

I haven't been able to keep up with all the tulip development on the 
mailing list (hopefully I will now!) so please excuse me if something I 
mention has already been discussed.

For those who may not know it, libuv is the platform layer library for 
nodejs, which implements a uniform interface on top of epoll, kqueue, 
event ports and iocp. I wrote Python bindings [1] for it a while ago, 
and I was very excited to see Tulip, so I thought I'd give this a try.

Here [2] is the source code, along with some notes I took during the 
implementation.

I know that the idea is not to re-implement the PEP itself but for 
people to create different EventLoop implementations. In rose I bundled 
tulip just to make a single package I could play with easily; once tulip 
makes it into the stdlib, only the EventLoop will remain.

Here are some thoughts (in no particular order):

- add_connector / remove_connector seem to be related to Windows, but 
being exposed like that feels a bit like leaking an implementation 
detail. I guess there was no way around it.

- libuv implements a type of handle (Poll) which provides 
level-triggered file descriptor polling that also works on Windows, 
while being highly performant. It apparently uses something called AFD 
Polling, which is only available on Windows >= Vista, and a select 
thread on XP. I'm no Windows expert, but thanks to this the API is 
consistent across all platforms, which is nice. Maybe it's worth 
investigating? [3]

- The transport abstraction seems quite tied to socket objects. pyuv 
provides TCP and UDP handles, which offer a completion-style API and 
use a better approach than Poll handles. They should give better 
performance, since EINTR is handled internally and there are fewer 
round trips between Python-land and C-land. Was it ever considered to 
provide some sort of abstraction so that transports can be used on top 
of something other than regular sockets? For example I see no way to get 
the remote party from the transport, without checking the underlying socket.

Thanks for reading this far and keep up the good work.


Regards,

[1]: https://github.com/saghul/pyuv
[2]: https://github.com/saghul/rose
[3]: https://github.com/joyent/libuv/blob/master/src/win/poll.c

-- 
Saúl Ibarra Corretgé
http://saghul.net/blog | http://about.me/saghul


From tjreedy at udel.edu  Tue Jan 29 00:27:08 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 28 Jan 2013 18:27:08 -0500
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
Message-ID: <ke71ge$u1m$1@ger.gmane.org>

On 1/28/2013 8:33 AM, Wolfgang Maier wrote:
> Dear all,
> I guess this is so obvious that someone must have suggested it before:

No one who understands comprehensions would suggest this.

> in list comprehensions you can currently exclude items based on the if
> conditional, e.g.:
>
> [n for n in range(1,1000) if n % 4 == 0]
>
> Why not extend this filtering by allowing a while statement in addition to
> if, as in:

Why not? Because it is flat-out wrong. Or if you prefer, nonsensical. 
You want to break, not filter; and you are depending on the order of the 
items from the iterator. Comprehensions are a math-logic idea invented 
for (unordered) sets and borrowed by computer science and extended to 
sequences. However, sequences do not replace sets.

https://en.wikipedia.org/wiki/Set-builder_notation
https://en.wikipedia.org/wiki/List_comprehension

Python has also extended the idea to dicts and iterators and uses almost 
exactly the same syntax for all 4 variations.

> [n for n in range(1,1000) while n < 400]

This would translate as

def _temp():
   res = []
   for n in range(1, 1000):
     while n < 400:
       res.append(n)
   return res
_temp()

which makes an infinite loop, not a truncated loop.
What you actually want is

   res = []
   for n in range(1, 1000):
     if n >= 400: break
     res.append(n)

which is not the form of a comprehension.

-- 
Terry Jan Reedy



From bruce at leapyear.org  Tue Jan 29 01:01:56 2013
From: bruce at leapyear.org (Bruce Leban)
Date: Mon, 28 Jan 2013 16:01:56 -0800
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <201301281745.16485.mark.hackett@metoffice.gov.uk>
References: <1358903168.4767.4.camel@webb> <51069F08.8070000@stoneleaf.us>
	<5106B4C7.3090803@mrabarnett.plus.com>
	<201301281745.16485.mark.hackett@metoffice.gov.uk>
Message-ID: <CAGu0Anu-2+TqQyswqyM0PC1-kxfyLs++TfXnB75DwecYW=hyrw@mail.gmail.com>

The reader could return a multidict. If you know it's a multidict, you can
access the 'discarded' values. Otherwise, it appears just like the dict
that we have today. A middle ground between people who don't want the
interface changed and those who want to get the multiple values.
Personally, I prefer code that raises exceptions when it gets unreasonable
input, and I think duplicate field names qualify. But if that's the
general sentiment, then a multidict is a potential compromise.

--- Bruce
Follow me: http://www.twitter.com/Vroo http://www.vroospeak.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130128/353bfaf2/attachment.html>

From oscar.j.benjamin at gmail.com  Tue Jan 29 02:02:31 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Tue, 29 Jan 2013 01:02:31 +0000
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <ke71ge$u1m$1@ger.gmane.org>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org>
Message-ID: <CAHVvXxSQkXtUf_kdUEFxncjY_B+pfzbuLokN09pYTkXnGxe+MA@mail.gmail.com>

On 28 January 2013 23:27, Terry Reedy <tjreedy at udel.edu> wrote:
> On 1/28/2013 8:33 AM, Wolfgang Maier wrote:
>>
>> Dear all,
>> I guess this is so obvious that someone must have suggested it before:
>
> No one who understands comprehensions would suggest this.

That's a little strong.

>
>> in list comprehensions you can currently exclude items based on the if
>> conditional, e.g.:
>>
>> [n for n in range(1,1000) if n % 4 == 0]
>>
>> Why not extend this filtering by allowing a while statement in addition to
>> if, as in:
>
>
> Why not? Because it is flat-out wrong. Or if you prefer, nonsensical. You
> want to break, not filter; and you are depending on the order of the items
> from the iterator. Comprehensions are a math-logic idea invented for
> (unordered) sets and borrowed by computer science and extended to sequences.
> However, sequences do not replace sets.

Python's comprehensions are based on iterators that are inherently
ordered (although in some cases the order is arbitrary). In the most
common cases the comprehensions produce lists or generators that
preserve the order of the underlying iterable. I find that the cases
where the order of an iterable is relevant are very common in my own
usage of iterables and of comprehensions.

>
> https://en.wikipedia.org/wiki/Set-builder_notation
> https://en.wikipedia.org/wiki/List_comprehension
>
> Python has also extended the idea to dicts and iterators and uses almost
> exactly the same syntax for all 4 variations.

Although dicts and sets should be considered unordered they may still
be constructed from a naturally ordered iterable. There are still
cases where it makes sense to define the construction of such an
object in terms of an order-dependent rule on the underlying iterator.

>
>> [n for n in range(1,1000) while n < 400]
>
> This would translate as
>
> def _temp():
>   res = []
>   for n in range(1, 1000):
> >     while n < 400:
>       res.append(n)
>   return res
> _temp()

I guess this is what you mean by "No one who understands
comprehensions would suggest this." Of course those are not the
suggested semantics but I guess from this that you would object to a
while clause that had a different meaning.

> which makes an infinite loop, not a truncated loop.
> What you actually want is
>
>   res = []
>   for n in range(1, 1000):
> >     if n >= 400: break
>     res.append(n)
>
> which is not the form of a comprehension.

The form of a comprehension is not unchangeable.


Oscar


From graffatcolmingov at gmail.com  Tue Jan 29 02:12:08 2013
From: graffatcolmingov at gmail.com (Ian Cordasco)
Date: Mon, 28 Jan 2013 20:12:08 -0500
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <CAHVvXxSQkXtUf_kdUEFxncjY_B+pfzbuLokN09pYTkXnGxe+MA@mail.gmail.com>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org>
	<CAHVvXxSQkXtUf_kdUEFxncjY_B+pfzbuLokN09pYTkXnGxe+MA@mail.gmail.com>
Message-ID: <CAN-Kwu1Suod=LraFqvscaNqW5pOUr8vEXJ4wijoG7z0se8AaLw@mail.gmail.com>

On Mon, Jan 28, 2013 at 8:02 PM, Oscar Benjamin
<oscar.j.benjamin at gmail.com> wrote:
> On 28 January 2013 23:27, Terry Reedy <tjreedy at udel.edu> wrote:
>> On 1/28/2013 8:33 AM, Wolfgang Maier wrote:
>>>
>>> Dear all,
>>> I guess this is so obvious that someone must have suggested it before:
>>
>> No one who understands comprehensions would suggest this.
>
> That's a little strong.
>
>>
>>> in list comprehensions you can currently exclude items based on the if
>>> conditional, e.g.:
>>>
>>> [n for n in range(1,1000) if n % 4 == 0]
>>>
>>> Why not extend this filtering by allowing a while statement in addition to
>>> if, as in:
>>
>>
>> Why not? Because it is flat-out wrong. Or if you prefer, nonsensical. You
>> want to break, not filter; and you are depending on the order of the items
>> from the iterator. Comprehensions are a math-logic idea invented for
>> (unordered) sets and borrowed by computer science and extended to sequences.
>> However, sequences do not replace sets.
>
> Python's comprehensions are based on iterators that are inherently
> ordered (although in some cases the order is arbitrary). In the most
> common cases the comprehensions produce lists or generators that
> preserve the order of the underlying iterable. I find that the cases
> where the order of an iterable is relevant are very common in my own
> usage of iterables and of comprehensions.
>

Technically they are not inherently ordered. You give the perfect example below.

>>
>> https://en.wikipedia.org/wiki/Set-builder_notation
>> https://en.wikipedia.org/wiki/List_comprehension
>>
>> Python has also extended the idea to dicts and iterators and uses almost
>> exactly the same syntax for all 4 variations.
>
> Although dicts and sets should be considered unordered they may still
> be constructed from a naturally ordered iterable. There are still
> cases where it makes sense to define the construction of such an
> object in terms of an order-dependent rule on the underlying iterator.
>

They may be, but they may also be constructed from an unordered
iterable. How so?
Let `d` be a non-empty dictionary, and `f` a function that defines
some mutation of its input such that there is no x for which
x = f(x).

e = {k: f(v) for k, v in d.items()}

You're taking an unordered object (a dictionary) and making a new one
from it. An order dependent rule here would not make sense. Likewise,
if we were to do:

e = [(k, f(v)) for k, v in d.items()]

We're creating order from an object in which there is none. How could
the while statement be useful there? An if statement works fine. A
`while` statement as suggested wouldn't.
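The stipulated setup can be made runnable with a trivial fixed-point-free f
(the function and the sample dict here are my own illustration, not from the
original mail):

```python
# A mutation with no fixed point, as stipulated: f(x) != x for every x.
def f(x):
    return x + 1

d = {"a": 1, "b": 2, "c": 3}

# New dict built from an unordered source; the iteration order of d.items()
# is irrelevant to the result, which is why an order-dependent 'while'
# clause would be meaningless here while 'if' remains perfectly usable.
e = {k: f(v) for k, v in d.items()}
```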

>>
>>> [n for n in range(1,1000) while n < 400]
>>
>> This would translate as
>>
>> def _temp():
>>   res = []
>>   for n in range(1, 1000):
>>     while n < 400:
>>       res.append(n)
>>   return res
>> _temp()
>
> I guess this is what you mean by "No one who understands
> comprehensions would suggest this." Of course those are not the
> suggested semantics but I guess from this that you would object to a
> while clause that had a different meaning.
>

They are not the suggested semantics. You are correct. But based upon
how list comprehensions are currently explained, it would be
reasonable to expect a list comprehension with `while` to operate like
this.

>> which makes an infinite loop, not a truncated loop.
>> What you actually want is
>>
>>   res = []
>>   for n in range(1, 1000):
>>     if n >= 400: break
>>     res.append(n)
>>
>> which is not the form of a comprehension.
>
> The form of a comprehension is not unchangeable.
>

Agreed it is definitely mutable. I am just of the opinion that this is
one of those instances where it shouldn't be changed.


From alexandre.zani at gmail.com  Tue Jan 29 02:15:22 2013
From: alexandre.zani at gmail.com (Alexandre Zani)
Date: Mon, 28 Jan 2013 17:15:22 -0800
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <CAGu0Anu-2+TqQyswqyM0PC1-kxfyLs++TfXnB75DwecYW=hyrw@mail.gmail.com>
References: <1358903168.4767.4.camel@webb> <51069F08.8070000@stoneleaf.us>
	<5106B4C7.3090803@mrabarnett.plus.com>
	<201301281745.16485.mark.hackett@metoffice.gov.uk>
	<CAGu0Anu-2+TqQyswqyM0PC1-kxfyLs++TfXnB75DwecYW=hyrw@mail.gmail.com>
Message-ID: <CAJVMporKt7WsBzjHB3TFABOevXBnO3T_BNXtUfcR4UY6pzRW9Q@mail.gmail.com>

I think raising an exception on duplicate headers is actually very likely
to cause working code to break. Consider that all you need for that to
happen is an extra couple of empty separators on the first line creating
two "" headers. That seems like the sort of behavior that can easily occur
in spreadsheet programs. (Empty cells are usually not very well
differentiated from non-existent cells in spreadsheet UIs IME) A
StrictDictReader is better, but I think this is overkill.

As for a MultiDictReader, I don't think this is superior to csv.reader. In
both cases, you need to keep track of the column orders. And if you already
know the column order, you might as well just manually specify the field
names in DictReader.


On Mon, Jan 28, 2013 at 4:01 PM, Bruce Leban <bruce at leapyear.org> wrote:

> The reader could return a multidict. If you know it's a multidict you can
> access the 'discarded' values. Otherwise, it appears just like the dict
> that we have today. A middle ground between people that don't want the
> interface changed and those who want to get the multiple values.
> Personally, I prefer code that raises exceptions when it gets unreasonable
> input, and I think duplicate field names qualify. But if that's the
> general sentiment, then a multidict is a potential compromise.
>
> --- Bruce
> Follow me: http://www.twitter.com/Vroo http://www.vroospeak.com
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130128/3c757462/attachment.html>

From steve at pearwood.info  Tue Jan 29 02:30:56 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 29 Jan 2013 12:30:56 +1100
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <ke71ge$u1m$1@ger.gmane.org>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org>
Message-ID: <51072650.5090808@pearwood.info>

On 29/01/13 10:27, Terry Reedy wrote:

>> [n for n in range(1,1000) while n < 400]
>
> This would translate as
>
> def _temp():
>   res = []
>   for n in range(1, 1000):
>     while n < 400:
>       res.append(n)
>   return res
> _temp()
>
> which makes an infinite loop, not a truncated loop.

Why would it translate that way? That would be a silly decision to make. Python can decide on the semantics of a while clause in a comprehension in whatever way makes the most sense, not necessarily according to some mechanical, nonsensical translation.

We could easily decide that although [n for n in range(1,1000) if n < 400] has the semantics of:

res = []
for n in range(1, 1000):
     if n < 400:
         res.append(n)


[n for n in range(1,1000) while n < 400] could instead have the semantics of:

res = []
for n in range(1, 1000):
     if not (n < 400):
         break
     res.append(n)


If it were decided that reusing the while keyword in this way was too confusing (which doesn't seem likely, since it is a request that keeps coming up over and over again), we could use a different keyword:

[n for n in range(1,1000) until n >= 400]



> What you actually want is
>
> res = []
> for n in range(1, 1000):
>     if n >= 400: break
>     res.append(n)
>
> which is not the form of a comprehension.


Why not? Was the idea of a comprehension handed down from above by a deity, never to be changed? Or is it a tool to be changed if the change makes it more useful?

Mathematical set builder notation has no notion of "break" because it is an abstraction. It takes exactly as much effort (time, memory, resources, whatever) to generate these two mathematical sets:

{1}
{x for all x in Reals if x == 1}

(using a hybrid maths/python notation which I hope is clear enough).

To put it another way, mathematically the list comp [p+1 for p in primes()] is expected to run infinitely fast. But clearly Python code is not a mathematical abstraction. So the fact that mathematical set builder notation does not include any way to break out of the loop is neither here nor there. Comprehensions are code, and need to be judged as code, not abstract mathematical identities.



-- 
Steven


From oscar.j.benjamin at gmail.com  Tue Jan 29 02:34:46 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Tue, 29 Jan 2013 01:34:46 +0000
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <CAN-Kwu1Suod=LraFqvscaNqW5pOUr8vEXJ4wijoG7z0se8AaLw@mail.gmail.com>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org>
	<CAHVvXxSQkXtUf_kdUEFxncjY_B+pfzbuLokN09pYTkXnGxe+MA@mail.gmail.com>
	<CAN-Kwu1Suod=LraFqvscaNqW5pOUr8vEXJ4wijoG7z0se8AaLw@mail.gmail.com>
Message-ID: <CAHVvXxQb2N+5bfreidNc5xi5bS9=oxOeD-C-ZTwAS2RWWgQdVA@mail.gmail.com>

On 29 January 2013 01:12, Ian Cordasco <graffatcolmingov at gmail.com> wrote:
> On Mon, Jan 28, 2013 at 8:02 PM, Oscar Benjamin
> <oscar.j.benjamin at gmail.com> wrote:
>>
>> Although dicts and sets should be considered unordered they may still
>> be constructed from a naturally ordered iterable. There are still
>> cases where it makes sense to define the construction of such an
>> object in terms of an order-dependent rule on the underlying iterator.
>
> They may be, but they may also be constructed from an unordered
> iterable. How so?
> Let `d` be a non-empty dictionary, and `f` a function that defines
> some mutation of it's input such that there doesn't exist x such that
> x = f(x).
>
> e = {k: f(v) for k, v in d.items()}
>
> You're taking an unordered object (a dictionary) and making a new one
> from it. An order dependent rule here would not make sense. Likewise,
> if we were to do:
>
> e = [(k, f(v)) for k, v in d.items()]
>
> We're creating order from an object in which there is none. How could
> the while statement be useful there? An if statement works fine. A
> `while` statement as suggested wouldn't.

I was referring to the case of constructing an object that does not
preserve order by iterating over an object that does. Clearly a while
clause would be a lot less useful if you were iterating over an object
whose order was arbitrary: so don't use it in that case.

A (contrived) example - caching Fibonacci numbers:

# Fibonacci number generator
def fib():
    a = b = 1
    while True:
        yield a
        a, b = b, a+b

# Cache the first N fibonacci numbers
fib_cache = {n: x for n, x in zip(range(N), fib())}
# Alternative
fib_cache = {n: x for n, x in enumerate(fib()) while n < N}

# Cache the Fibonacci numbers less than X
fib_cache = {}
for n, x in enumerate(fib()):
    if x > X:
        break
    fib_cache[n] = x
# Alternative 1 (requires: from itertools import takewhile)
fib_cache = {n: x for n, x in enumerate(takewhile(lambda x: x < X, fib()))}
# Alternative 2
fib_cache = {n: x for n, x in enumerate(fib()) while x < X}
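
For anyone who wants to try it, "Alternative 1" above already works in
current Python once takewhile is imported; a self-contained version
(with an arbitrarily chosen X) is:

```python
# Cache the Fibonacci numbers less than X using itertools.takewhile.
from itertools import takewhile

def fib():
    a = b = 1
    while True:
        yield a
        a, b = b, a + b

X = 100  # arbitrary cutoff for this example
fib_cache = {n: x for n, x in enumerate(takewhile(lambda x: x < X, fib()))}
print(fib_cache)  # 11 entries: the Fibonacci numbers below 100
```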


Oscar


From graffatcolmingov at gmail.com  Tue Jan 29 02:43:22 2013
From: graffatcolmingov at gmail.com (Ian Cordasco)
Date: Mon, 28 Jan 2013 20:43:22 -0500
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <CAHVvXxQb2N+5bfreidNc5xi5bS9=oxOeD-C-ZTwAS2RWWgQdVA@mail.gmail.com>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org>
	<CAHVvXxSQkXtUf_kdUEFxncjY_B+pfzbuLokN09pYTkXnGxe+MA@mail.gmail.com>
	<CAN-Kwu1Suod=LraFqvscaNqW5pOUr8vEXJ4wijoG7z0se8AaLw@mail.gmail.com>
	<CAHVvXxQb2N+5bfreidNc5xi5bS9=oxOeD-C-ZTwAS2RWWgQdVA@mail.gmail.com>
Message-ID: <CAN-Kwu3pXqjodDW7nvB5pYokmX_XVwXm3uuE=PiT04Cb+4JTfw@mail.gmail.com>

On Mon, Jan 28, 2013 at 8:34 PM, Oscar Benjamin
<oscar.j.benjamin at gmail.com> wrote:
> I was referring to the case of constructing an object that does not
> preserve order by iterating over an object that does. Clearly a while
> clause would be a lot less useful if you were iterating over an object
> whose order was arbitrary: so don't use it in that case.
>

Yeah, I'm not sure how well telling someone not to use a construct of
the language will go over.

> A (contrived) example - caching Fibonacci numbers:
>
> # Fibonacci number generator
> def fib():
>     a = b = 1
>     while True:
>         yield a
>         a, b = b, a+b
>
> # Cache the first N fibonacci numbers
> fib_cache = {n: x for n, x in zip(range(N), fib())}
> # Alternative
> fib_cache = {n: x for n, x in enumerate(fib()) while n < N}
>
> # Cache the Fibonacci numbers less than X
> fib_cache = {}
> for n, x in enumerate(fib()):
>     if x > X:
>         break
>     fib_cache[n] = x
> # Alternative 1 (requires: from itertools import takewhile)
> fib_cache = {n: x for n, x in enumerate(takewhile(lambda x: x < X, fib()))}
> # Alternative 2
> fib_cache = {n: x for n, x in enumerate(fib()) while x < X}
>

As contrived as it may be, it is a good example. Still, I dislike the
use of `while` and would prefer Steven's suggestion of `until`, were
this to be included. That would make `until` a super special case, but
then again, this construct seems special enough that only a few
examples of its usefulness can be constructed. I guess I'm more -0
with `until` than -1.

Thanks for the extra example Oscar. It was helpful.

Cheers,
Ian


From jsbueno at python.org.br  Tue Jan 29 02:50:45 2013
From: jsbueno at python.org.br (Joao S. O. Bueno)
Date: Mon, 28 Jan 2013 23:50:45 -0200
Subject: [Python-ideas] constant/enum type in stdlib
Message-ID: <CAH0mxTQ9wfcR_R5SbXV0J63iVaB1PvYvSF06tRy=AibO2Qajfg@mail.gmail.com>

This idea is not new, but it has stalled.
Last I remember, it came around on python-dev in 2010, in this thread:
http://mail.python.org/pipermail/python-dev/2010-November/thread.html#105967

There is an even older PEP (PEP 354) that was rejected just for there
not being enough interest at the time.

And the idea was not dismissed at all - on the contrary, the last e-mail in
the thread is a message from the BDFL in favor of it. The discussion happened
at a bad moment, as Python was mostly feature-frozen for 3.2, and it did not
show up again for Python 3.3.

The reasoning for wanting enums/constants has been debated already, but
one of the main reasons that emerges from that thread is the ability to have
named constants (just like we have "True" and "False").

Why do I think this is needed in the stdlib, and having it in a 3rd-party
module is not enough? Because they are an interesting thing to have not
only in the stdlib, but in several widely used Python projects that
don't have other dependencies.
Having a feature like this in the stdlib allows those projects to
make use of it without needing other dependencies. Moreover, the users
who would benefit the most from such constants would have a well-known
"constant" type that won't come as a surprise in each package they use
interactively or while debugging.

Most of the discussion on the 2010 thread was summed up in a message by
Michael Foord in this link
http://mail.python.org/pipermail/python-dev/2010-November/106063.html
with some follow up here:
http://mail.python.org/pipermail/python-dev/2010-November/106065.html
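
To make the idea concrete, here is a minimal sketch of the kind of
named-constant type being asked for. The class and all names in it are
invented for illustration only; this is not a proposed stdlib API:

```python
# Illustration only: a minimal named-constant type, invented for this
# sketch rather than taken from any existing proposal.
class NamedConstant:
    """A value that keeps a readable name, much as True/False do."""

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def __repr__(self):
        return self.name  # repr shows the name, not the raw value

    def __eq__(self, other):
        if isinstance(other, NamedConstant):
            return self.value == other.value
        return self.value == other

    def __hash__(self):
        return hash(self.value)

# Hypothetical usage: named weekdays that still compare like ints.
MONDAY = NamedConstant('MONDAY', 0)
TUESDAY = NamedConstant('TUESDAY', 1)
print(MONDAY)       # prints: MONDAY
print(MONDAY == 0)  # prints: True
```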



  js
 -><-
----------

--


From shane at umbrellacode.com  Tue Jan 29 06:24:06 2013
From: shane at umbrellacode.com (Shane Green)
Date: Mon, 28 Jan 2013 21:24:06 -0800
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <201301281745.16485.mark.hackett@metoffice.gov.uk>
References: <1358903168.4767.4.camel@webb> <51069F08.8070000@stoneleaf.us>
	<5106B4C7.3090803@mrabarnett.plus.com>
	<201301281745.16485.mark.hackett@metoffice.gov.uk>
Message-ID: <CEB512C5-AD4A-4DA8-A28D-9664B0352016@umbrellacode.com>


> On Monday 28 Jan 2013, MRAB wrote:
>> It shouldn't silently drop the columns
>> 
> 
> Why not?
> 
> It's adding to a dictionary and adding a duplicate key replaces the earlier 
> one.
> 
> If it dropped the columns and shouldn't have, then the results will be seen to 
> be wrong anyway, so there's not a huge amount of need for this.
> 
> If it WANTED to keep both columns with the duplicate names, it won't work and 
> needs abandoning. So no different from now.
> 
> If it WANTED duplicate keys (e.g. blanks which aren't imported and aren't 
> wanted), then you've just broken it. They can't necessarily change the csv file 
> to put headers in. So now you've made the call useless for this case.
> 
> And why, really, are there duplicate column names in there anyway? You can 
> come up with the assertion that this might be wanted, but they're not normally 
> what you see in a csv file.
> 
> I've never seen nor used a csv file that duplicated column names other than 
> being blank.
> 
> If it had been such a problem, the call would already have been abandoned.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas


Actually I've seen many real-life examples of CSV files with repeated column names, working with log data in the energy-management space.  CSV has been around for a very long time and is used for a lot more than spreadsheets; there are a lot of funky formats out there, such as files where every "VALUE" column is a 15-minute reading.  It seems like we're getting too hung up on dicts: all the information about a record is precisely stored by two sequences of values, the headers and the field values.  Those entries and their order can both be useful to a consumer of CSV records, and should be made available.  The record also maps headers to corresponding value sequences for mapped access.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130128/83f08f6a/attachment.html>

From cf.natali at gmail.com  Tue Jan 29 08:23:33 2013
From: cf.natali at gmail.com (Charles-François Natali)
Date: Tue, 29 Jan 2013 08:23:33 +0100
Subject: [Python-ideas] Interrupting threads
In-Reply-To: <5106B372.5040803@mrabarnett.plus.com>
References: <51049915.3060808@mrabarnett.plus.com>
	<5106B372.5040803@mrabarnett.plus.com>
Message-ID: <CAH_1eM0WbyBhMyyhcZP90TXdSWpL3oyXMa7c1GR_H8MBMfJezA@mail.gmail.com>

> The point has been made that you don't want an interruption in the
> middle of an exception handling routine. That's true. You also don't
> want an interruption in the middle of a 'finally' block.

That's a good start :-)

> I think the problem here is that most of what I've been talking about
> regarding the context manager actually belongs to the 'try' statement;
> context managers are, after all, built on the 'try' statement.
> [...]

Several points:
- I prefer the original "interruption" word to "cancellation":
interruption is the mechanism by which a thread is notified of an
asynchronous interruption/cancellation/whatever request. Cancellation
is one of the potential outcomes of a thread interruption: the thread
could ignore it, handle it in some specific way and continue its merry
life, *or* cancel its activity and bail out. Also, "interruption" is
already familiar to anybody knowing about hardware interrupts, and has
precedent in other languages (e.g. Java, C#; pthread has cancellation
points, but those really are cancellation rather than interruption).
- I don't understand what would happen by default, i.e. outside of any
try/context manager: could an interruption exception be raised at any
point?
- I still don't see what this brings over a simple, explicit static
Thread.interrupted() method. Interrupting a thread (through
thread.interrupt()) would just set this flag (which would probably be
an event/atomic read/write variable to assure memory visibility), and
then a thread could just call Thread.interrupted() to check for
pending interruption, and react accordingly. You don't have to mess
with the 'heed' flag when an exception is raised, you're sure an
asynchronous exception won't pop up at an arbitrary point in the code,
it's simpler, and well, "explicit is better than implicit".
The only usage I can see of an interruption exception is, as in Java,
to interrupt a blocking call (which is currently not supported).
- Really, "heed"? I've never had to look up a word in a dictionary
while reading a technical book/presentation/discussion before. I may
not be particularly good in English, but I'm positive this term will
puzzle many non-native speakers...
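
A minimal sketch of the explicit model described above, built on
threading.Event; the interrupt()/interrupted() names mirror this
proposal (and Java's API) and are assumptions, not an existing Python
API:

```python
# Sketch: interruption is just a flag the target thread polls between
# units of work. interrupt()/interrupted() are invented names.
import threading

class InterruptibleThread(threading.Thread):
    def __init__(self):
        super().__init__()
        # An Event gives the atomic set/check and memory visibility needed.
        self._interrupt = threading.Event()
        self.outcome = None

    def interrupt(self):
        self._interrupt.set()

    def interrupted(self):
        return self._interrupt.is_set()

    def run(self):
        while not self.interrupted():  # explicit check; no async exception
            pass                       # ... do one unit of work ...
        self.outcome = 'cancelled cleanly'

t = InterruptibleThread()
t.start()
t.interrupt()
t.join()
print(t.outcome)  # prints: cancelled cleanly
```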


From stephen at xemacs.org  Tue Jan 29 09:17:46 2013
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 29 Jan 2013 17:17:46 +0900
Subject: [Python-ideas] csv.DictReader could handle headers
	more	intelligently.
In-Reply-To: <CEB512C5-AD4A-4DA8-A28D-9664B0352016@umbrellacode.com>
References: <1358903168.4767.4.camel@webb> <51069F08.8070000@stoneleaf.us>
	<5106B4C7.3090803@mrabarnett.plus.com>
	<201301281745.16485.mark.hackett@metoffice.gov.uk>
	<CEB512C5-AD4A-4DA8-A28D-9664B0352016@umbrellacode.com>
Message-ID: <87ip6g1jyt.fsf@uwakimon.sk.tsukuba.ac.jp>

Shane Green writes:

 > Actually I've seen many real-life examples of CSV files with
 > repeated column names,

Sure, but this really isn't the issue.  If it were, "csv.reader is
your friend" would be all the answer that the issue deserves IMHO.

 > It seems like we're getting too hung up on dicts:

Not at all.  (For reasons I don't understand) Somebody has a use case
where it's useful to have the field names stored in each record,
rather than stored once and have both field names and field values
accessed by position as needed.  The point is to return a name-value
*mapping object* for *each* row, and that may as well be a dict.

The people who suggest a multidict or a list-valued dict are missing
that point, AFAICS.  Eg, in your "BLABLA", "VALUE", ..., "VALUE"
example, position really is what matters, so a dict of any kind is
inappropriate IMO.  Again, it's arbitrary whether the list-valued dict
does d["VALUE"].append(x) or d["VALUE"].insert(0,x), and it's hard for
me to guess which it would do in practice: .append is easier to write,
but .insert seems closer to the behavior of csv.reader (which is what
we really want in your example IMO).



From amauryfa at gmail.com  Tue Jan 29 09:52:45 2013
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Tue, 29 Jan 2013 09:52:45 +0100
Subject: [Python-ideas] Interrupting threads
In-Reply-To: <CAH_1eM0WbyBhMyyhcZP90TXdSWpL3oyXMa7c1GR_H8MBMfJezA@mail.gmail.com>
References: <51049915.3060808@mrabarnett.plus.com>
	<5106B372.5040803@mrabarnett.plus.com>
	<CAH_1eM0WbyBhMyyhcZP90TXdSWpL3oyXMa7c1GR_H8MBMfJezA@mail.gmail.com>
Message-ID: <CAGmFidZKQkUA46i=UEEDem8rexTy6-sbKhU=Zs4hi+p5C4Q5kA@mail.gmail.com>

2013/1/29 Charles-François Natali <cf.natali at gmail.com>

> > The point has been made that you don't want an interruption in the
> > middle of an exception handling routine. That's true. You also don't
> > want an interruption in the middle of a 'finally' block.
>
> That's a good start :-)


But is it feasible?
Is it possible to handle the case where a finally block calls another
Python function?

-- 
Amaury Forgeot d'Arc
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130129/f2805c2a/attachment.html>

From shane at umbrellacode.com  Tue Jan 29 10:21:17 2013
From: shane at umbrellacode.com (Shane Green)
Date: Tue, 29 Jan 2013 01:21:17 -0800
Subject: [Python-ideas] PEP 3156: getting the socket or peer name from
	the transport
In-Reply-To: <CAP7+vJKn6hE1zWujnDi=5dUtRsdovM7741G9bK0e4vQJvmbDPA@mail.gmail.com>
References: <CAP7+vJ+_QwQW13b1mE1NSg8Z89MyYwHkyYmLwhumGo0iO-V+wA@mail.gmail.com>
	<CANSw7Kyi7d-Q5O=MqZTx5a-bOA3CQ4Me5RVafxJ0e7Kj1w693w@mail.gmail.com>
	<CAP7+vJ+BxgVPv0jBP=9GAdnXGdLRqgjCcBhVzVnJ4K7NDAxrjQ@mail.gmail.com>
	<CANSw7KwafX+qGiMszK7LVMs=xacmyEUjGLDPdqGzfW95GxbXfg@mail.gmail.com>
	<20130127122121.6b779ada@pitrou.net>
	<CANSw7KyuPCg9Ot6tY3ML_WBquO0PfFVqpzWPRpa3o8gmTyVS_A@mail.gmail.com>
	<1359288997.3488.2.camel@localhost.localdomain>
	<CAP7+vJ+av5zRDDLThgo7CCn_xLJ7rY3u-myP6Hihf50j0z7pYQ@mail.gmail.com>
	<EDC83381-4C64-4215-A90B-C72F2327BCA7@umbrellacode.com>
	<CAP7+vJKn6hE1zWujnDi=5dUtRsdovM7741G9bK0e4vQJvmbDPA@mail.gmail.com>
Message-ID: <11D4B601-0234-41B0-8EA4-7078EFD5E30D@umbrellacode.com>

Right.  I was thinking about it from too high a level, I think, and focused too much on a single example, HTTPS.  To clarify a couple of things, though, I actually didn't mean for transports to populate the state with superfluous information or things they didn't already know.  Again, based on the single example I was considering, I was thinking they could intelligently populate it with state they know will be needed and already have.  For example, the HTTPS server spawning a new HTTPS transport channel knows the channel will need its SSL information, and the transport can add its own socket connection to the state in case the protocol needs it.  I had also thought the state might somehow end up participating in get_extra_info(), so the expensive information returned was stored there; more importantly, though, I didn't mean for any such calls to be made preemptively in an attempt to populate state just in case a protocol did need it.

HTTPS is a single, atypical example that's too high-level, and something similar to WSGI seemed like a reasonable approach ;-)




Shane Green 
www.umbrellacode.com
408-692-4666 | shane at umbrellacode.com

On Jan 28, 2013, at 7:45 AM, Guido van Rossum <guido at python.org> wrote:

> On Mon, Jan 28, 2013 at 12:57 AM, Shane Green <shane at umbrellacode.com> wrote:
>> What about giving the protocol an environ info object that should have all
>> information it needs already, which could (and probably should) include
>> things like the SSL certificate information, and would probably also be
>> where additional info that happened to be looked up, like host name details,
>> was stored and accessed.  Assuming the transports, etc., can define all the
>> state information a protocol needs, can operate without hardware
>> dependencies; in case that doesn't happen, though, the state dict will also
>> have references to the socket, so the protocol could get to directly if
>> needed.
> 
> Hm. I'm not keen on precomputing all of that, since most protocols
> won't need it, and the cost add up. This is not WSGI. The protocol has
> the transport object and can ask it specific questions -- if through a
> general API, like get_extra_info(key, [default]).
> 
> -- 
> --Guido van Rossum (python.org/~guido)

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130129/fccbc496/attachment.html>

From solipsis at pitrou.net  Tue Jan 29 10:54:43 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 29 Jan 2013 10:54:43 +0100
Subject: [Python-ideas] Interrupting threads
References: <51049915.3060808@mrabarnett.plus.com>
	<5106B372.5040803@mrabarnett.plus.com>
	<CAH_1eM0WbyBhMyyhcZP90TXdSWpL3oyXMa7c1GR_H8MBMfJezA@mail.gmail.com>
Message-ID: <20130129105443.2804520b@pitrou.net>

Le Tue, 29 Jan 2013 08:23:33 +0100,
Charles-François Natali
<cf.natali at gmail.com> a écrit :
> - Really, "heed"? I've never had to look up a word in a dictionary
> while reading a technical book/presentation/discussion before. I may
> not be particularly good in English, but I'm positive this term will
> puzzle many non native speakers...

Ditto here. Now it's not unusual to have to learn new vocabulary, but
"heed" is obscure and makes an API difficult to understand for me.

Of course, I sympathize with native English speakers who are annoyed
by the prevalence of Globish over real English. That said, Python
already mandates American English instead of British English.

Regards

Antoine.




From shane at umbrellacode.com  Tue Jan 29 11:18:21 2013
From: shane at umbrellacode.com (Shane Green)
Date: Tue, 29 Jan 2013 02:18:21 -0800
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <87ip6g1jyt.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <1358903168.4767.4.camel@webb> <51069F08.8070000@stoneleaf.us>
	<5106B4C7.3090803@mrabarnett.plus.com>
	<201301281745.16485.mark.hackett@metoffice.gov.uk>
	<CEB512C5-AD4A-4DA8-A28D-9664B0352016@umbrellacode.com>
	<87ip6g1jyt.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <FB9C766B-AF6B-45F7-8777-FA8F00523BD7@umbrellacode.com>

So I wasn't really questioning the usefulness of the dictionary representation, but couldn't the returned object also let you access the header and value sequences, etc.?  I was also thinking the conversion to a simple dict with single (non-list) values per column could be part of the API.

Appending duplicate field values as they're read reflects the order the duplicate entries appear in the source (when I've encountered CSV that purposely used duplicate column headers, the sequence in which they appeared was critical).  The output from the current implementation reflects the last duplicate value, as that always replaces previous ones in the dict, so my conversions returned the last value (-1), which should do the same, I think.  It was a straw man ;-).

I see your point about the point.  I think it would be good to have an implementation that kept all the information but still put the most usable API possible on it, rather than saying you can't have dictionary access unless you're willing to lose duplicate values, for example.  I mean, I've needed to consume CSV a lot, and that's what would have made the module useful to me; an implementation that keeps all the information and lets it easily be trimmed when not needed seems better than one that wipes it out to start.







Shane Green 
www.umbrellacode.com
408-692-4666 | shane at umbrellacode.com

On Jan 29, 2013, at 12:17 AM, "Stephen J. Turnbull" <stephen at xemacs.org> wrote:

> Shane Green writes:
> 
>> Actually I've seen a many real life examples of CSV files with
>> repeated column names,
> 
> Sure, but this really isn't the issue.  If it were, "csv.reader is
> your friend" would be all the answer that the issue deserves IMHO.
> 
>> It seems like we're getting too hung up on dicts:
> 
> Not at all.  (For reasons I don't understand) Somebody has a use case
> where it's useful to have the field names stored in each record,
> rather than stored once and have both field names and field values
> accessed by position as needed.  The point is to return a name-value
> *mapping object* for *each* row, and that may as well be a dict.
> 
> The people who suggest a multidict or a list-valued dict are missing
> that point, AFAICS.  Eg, in your "BLABLA", "VALUE", ..., "VALUE"
> example, position really is what matters, so a dict of any kind is
> inappropriate IMO.  Again, it's arbitrary whether the list-valued dict
> does d["VALUE"].append(x) or d["VALUE"].insert(0,x), and it's hard for
> me to guess which it would do in practice: .append is easier to write,
> but .insert seems closer to the behavior of csv.reader (which is what
> we really want in your example IMO).
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130129/89d98e30/attachment.html>

From ncoghlan at gmail.com  Tue Jan 29 11:44:54 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 29 Jan 2013 20:44:54 +1000
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <51072650.5090808@pearwood.info>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org> <51072650.5090808@pearwood.info>
Message-ID: <CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>

On Tue, Jan 29, 2013 at 11:30 AM, Steven D'Aprano <steve at pearwood.info> wrote:
> Why would it translate that way? That would be a silly decision to make.
> Python can decide on the semantics of a while clause in a comprehension in
> whatever way makes the most sense, not necessarily according to some
> mechanical, nonsensical translation.

Terry is correct: comprehensions are deliberately designed to have the
exact same looping semantics as the equivalent statements flattened
out into a single line, with the innermost expression lifted out of
the loop body and placed in front. This then works to arbitrarily deep
nesting levels. The surrounding syntax (parentheses, brackets, braces,
and whether or not there is a colon present in the main expression)
then governs what kind of result you get (generator-iterator, list,
set, dict).

For example in:

   ((x, y, z) for x in a if x for y in b if y for z in c if z)
   [(x, y, z) for x in a if x for y in b if y for z in c if z]
   {(x, y, z) for x in a if x for y in b if y for z in c if z}
   {x: (y, z) for x in a if x for y in b if y for z in c if z}

The looping semantics of these expressions are all completely defined
by the equivalent statements:

    for x in a:
        if x:
            for y in b:
                if y:
                    for z in c:
                        if z:

(modulo a few name lookup quirks if you're playing with class scopes)

Any attempt to change that fundamental equivalence between
comprehensions and the corresponding statements has basically zero
chance of getting accepted through the PEP process.

The only remotely plausible proposal I've seen in this thread is the
"else break" on the filter conditions, because that *can* be mapped
directly to the statement form in order to accurately describe the
intended semantics. However, it would fail the "just use
itertools.takewhile or a custom iterator, that use case isn't common
enough to justify dedicated syntax" test. The conceptual basis of Python's
comprehensions in mathematical set notation would likely also play a
part in rejecting an addition that requires an inherently procedural
interpretation.
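
That equivalence is easy to verify directly; with some made-up input
lists (and tuple parentheses added so the comprehension is valid
syntax):

```python
# A direct check that a comprehension equals its flattened statement form.
a, b, c = [0, 1, 2], [0, 3], [0, 4]

# Comprehension form:
result = [(x, y, z) for x in a if x for y in b if y for z in c if z]

# Flattened statement form, with the expression moved into the loop body:
expected = []
for x in a:
    if x:
        for y in b:
            if y:
                for z in c:
                    if z:
                        expected.append((x, y, z))

assert result == expected  # both give [(1, 3, 4), (2, 3, 4)]
```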

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From shane at umbrellacode.com  Tue Jan 29 11:59:14 2013
From: shane at umbrellacode.com (Shane Green)
Date: Tue, 29 Jan 2013 02:59:14 -0800
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org> <51072650.5090808@pearwood.info>
	<CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>
Message-ID: <97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com>

Unfortunately "else break" also kind of falls flat on its face when you consider it's being used in context of an expression.





Shane Green 
www.umbrellacode.com
408-692-4666 | shane at umbrellacode.com

On Jan 29, 2013, at 2:44 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> On Tue, Jan 29, 2013 at 11:30 AM, Steven D'Aprano <steve at pearwood.info> wrote:
>> Why would it translate that way? That would be a silly decision to make.
>> Python can decide on the semantics of a while clause in a comprehension in
>> whatever way makes the most sense, not necessarily according to some
>> mechanical, nonsensical translation.
> 
> Terry is correct: comprehensions are deliberately designed to have the
> exact same looping semantics as the equivalent statements flattened
> out into a single line, with the innermost expression lifted out of
> the loop body and placed in front. This then works to arbitrarily deep
> nesting levels. The surrounding syntax (parentheses, brackets, braces,
> and whether or not there is a colon present in the main expression)
> then governs what kind of result you get (generator-iterator, list,
> set, dict).
> 
> For example in:
> 
>   ((x, y, z) for x in a if x for y in b if y for z in c if z)
>   [(x, y, z) for x in a if x for y in b if y for z in c if z]
>   {(x, y, z) for x in a if x for y in b if y for z in c if z}
>   {x: (y, z) for x in a if x for y in b if y for z in c if z}
> 
> The looping semantics of these expressions are all completely defined
> by the equivalent statements:
> 
>    for x in a:
>        if x:
>            for y in b:
>                if y:
>                    for z in c:
>                        if z:
> 
> (modulo a few name lookup quirks if you're playing with class scopes)
> 
> Any attempt to change that fundamental equivalence between
> comprehensions and the corresponding statements has basically zero
> chance of getting accepted through the PEP process.
> 
> The only remotely plausible proposal I've seen in this thread is the
> "else break" on the filter conditions, because that *can* be mapped
> directly to the statement form in order to accurately describe the
> intended semantics. However, it would fail the "just use
> itertools.takewhile or a custom iterator, that use case isn't common
> enough to justify dedicated syntax" test. The conceptual basis of Python's
> comprehensions in mathematical set notation would likely also play a
> part in rejecting an addition that requires an inherently procedural
> interpretation.
> 
> Cheers,
> Nick.
> 
> -- 
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130129/efb0518e/attachment.html>

From oscar.j.benjamin at gmail.com  Tue Jan 29 12:16:02 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Tue, 29 Jan 2013 11:16:02 +0000
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <FB9C766B-AF6B-45F7-8777-FA8F00523BD7@umbrellacode.com>
References: <1358903168.4767.4.camel@webb> <51069F08.8070000@stoneleaf.us>
	<5106B4C7.3090803@mrabarnett.plus.com>
	<201301281745.16485.mark.hackett@metoffice.gov.uk>
	<CEB512C5-AD4A-4DA8-A28D-9664B0352016@umbrellacode.com>
	<87ip6g1jyt.fsf@uwakimon.sk.tsukuba.ac.jp>
	<FB9C766B-AF6B-45F7-8777-FA8F00523BD7@umbrellacode.com>
Message-ID: <CAHVvXxR5ZqP9-x-M3nS8zWAgViQ6vQfWt3gHSk1zayzgGq736g@mail.gmail.com>

On 29 January 2013 10:18, Shane Green <shane at umbrellacode.com> wrote:
> So I wasn't really questioning the usefulness of the dictionary
> representation, but couldn't the returned object also let you access the
> header and value sequences, etc?  I was also thinking the conversion to
> simple dict with single (non-list) values per column could be part of the
> API.
>
> Appending duplicate field values as they're read reflects the order the
> duplicate entries appear in the source (when I've encountered CSV that
> purposely used duplicate column headers, the sequence they appear was
> critical).  The output from the current implementation should reflect the
> last duplicate value, as that always replaces previous ones in the dict, so
> my conversions returned the last value (-1), which should do the same, I
> think.  It was a straw man ;-).
>
> I see your point about the point.  I think it would be good to have an
> implementation that kept all the information but still put the most usable
> API on it possible, rather than saying you can't have dictionary access
> unless you want to lose duplicate values, for example.  I mean, I've needed
> to consume CSV a lot, and that's what would have made the module useful to
> me, and the implementation that keeps all the information and lets it easily
> to trimmed as-not-needed seems better than one that just wipes it out to
> start.

This is exactly what the csv.reader objects do.

While it is a problem that csv.DictReader silently discards data when
that is very likely an error, there's no need to try and guess how
people want to deal with duplicate column headers and invent a new
class for it. It's easy enough to write your own wrapper that exactly
performs whatever processing you happen to want:

from collections import defaultdict

def multireader(csvreader):
    try:
        headers = next(csvreader)
    except StopIteration:
        raise ValueError('No header')
    for row in csvreader:
        d = defaultdict(list)
        for h, v in zip(headers, row):
            d[h].append(v)
        yield d
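
For example, the wrapper above (repeated here so the snippet is
self-contained) applied to a small made-up CSV with a duplicated
"VALUE" column:

```python
# Usage example for the multireader wrapper, on invented sample data.
import csv
import io
from collections import defaultdict

def multireader(csvreader):
    try:
        headers = next(csvreader)
    except StopIteration:
        raise ValueError('No header')
    for row in csvreader:
        d = defaultdict(list)
        for h, v in zip(headers, row):
            d[h].append(v)
        yield d

data = io.StringIO("name,VALUE,VALUE\nmeter1,10,20\n")
rows = list(multireader(csv.reader(data)))
print(rows[0]['VALUE'])  # prints: ['10', '20']
```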


Oscar


From shane at umbrellacode.com  Tue Jan 29 12:33:05 2013
From: shane at umbrellacode.com (Shane Green)
Date: Tue, 29 Jan 2013 03:33:05 -0800
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <CAHVvXxR5ZqP9-x-M3nS8zWAgViQ6vQfWt3gHSk1zayzgGq736g@mail.gmail.com>
References: <1358903168.4767.4.camel@webb> <51069F08.8070000@stoneleaf.us>
	<5106B4C7.3090803@mrabarnett.plus.com>
	<201301281745.16485.mark.hackett@metoffice.gov.uk>
	<CEB512C5-AD4A-4DA8-A28D-9664B0352016@umbrellacode.com>
	<87ip6g1jyt.fsf@uwakimon.sk.tsukuba.ac.jp>
	<FB9C766B-AF6B-45F7-8777-FA8F00523BD7@umbrellacode.com>
	<CAHVvXxR5ZqP9-x-M3nS8zWAgViQ6vQfWt3gHSk1zayzgGq736g@mail.gmail.com>
Message-ID: <A75BC14B-26AF-45C6-A1F9-869B3F69DD79@umbrellacode.com>

Okay, sure.  I guess the starting point of my argument is that DictReader is nice, so why not make one that supports duplicate columns and easily implements the other behaviors, whether that's discarding values from duplicate columns so there's a one-to-one mapping, or simply raising an exception when a duplicate column is encountered, in terms of something that handles this superset of legal CSV formats, which do in fact specify exactly which header names each of their values should be mapped to?





Shane Green 
www.umbrellacode.com
408-692-4666 | shane at umbrellacode.com

On Jan 29, 2013, at 3:16 AM, Oscar Benjamin <oscar.j.benjamin at gmail.com> wrote:

> On 29 January 2013 10:18, Shane Green <shane at umbrellacode.com> wrote:
>> So I wasn't really questioning the usefulness of the dictionary
>> representation, but couldn't the returned object also let you access the
>> header and value sequences, etc?  I was also thinking the conversion to
>> simple dict with single (non-list) values per column could be part of the
>> API.
>> 
>> Appending duplicate field values as they're read reflects the order the
>> duplicate entries appear in the source (when I've encountered CSV that
>> purposely used duplicate column headers, the sequence they appear was
>> critical).  The output from the current implementation should reflect the
>> last duplicate value, as that always replaces previous ones in the dict, so
>> my conversions returned the last value (-1), which should do the same?I
>> think.  It was a straw man ;-).
>> 
>> I see your point about the point.  I think it would be good to have an
>> implementation that kept all the information but still put the most usable
>> API on it possible, rather than saying you can't have dictionary access
>> unless you want to lose duplicate values, for example.  I mean, I've needed
>> to consume CSV a lot, and that's what would have made the module useful to
>> me, and the implementation that keeps all the information and lets it easily
>> be trimmed as not needed seems better than one that just wipes it out to
>> start.
> 
> This is exactly what the csv.reader objects do.
> 
> While it is a problem that csv.DictReader silently discards data when
> that is very likely an error, there's no need to try and guess how
> people want to deal with duplicate column headers and invent a new
> class for it. It's easy enough to write your own wrapper that exactly
> performs whatever processing you happen to want:
> 
> from collections import defaultdict
> 
> def multireader(csvreader):
>     try:
>         headers = next(csvreader)
>     except StopIteration:
>         raise ValueError('No header')
>     for row in csvreader:
>         d = defaultdict(list)
>         for h, v in zip(headers, row):
>             d[h].append(v)
>         yield d
> 
> 
> Oscar


From mark.hackett at metoffice.gov.uk  Tue Jan 29 12:39:28 2013
From: mark.hackett at metoffice.gov.uk (Mark Hackett)
Date: Tue, 29 Jan 2013 11:39:28 +0000
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <CAJVMporKt7WsBzjHB3TFABOevXBnO3T_BNXtUfcR4UY6pzRW9Q@mail.gmail.com>
References: <1358903168.4767.4.camel@webb>
	<CAGu0Anu-2+TqQyswqyM0PC1-kxfyLs++TfXnB75DwecYW=hyrw@mail.gmail.com>
	<CAJVMporKt7WsBzjHB3TFABOevXBnO3T_BNXtUfcR4UY6pzRW9Q@mail.gmail.com>
Message-ID: <201301291139.28128.mark.hackett@metoffice.gov.uk>

On Tuesday 29 Jan 2013, Alexandre Zani wrote:
> 
> As for a MultiDictReader, I don't think this is superior to csv.reader. In
> both cases, you need to keep track of the column orders. And if you already
> know the column order, you might as well just manually specify the field
> names in DictReader.
> 

But it would allow you to access the index by name.

value = csv_array[indices["Total Cost"]]

A little more verbose than

value = csv_dict["Total Cost"]

But it's easier to read what it's doing than

value = csv_array[3]
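
A minimal sketch of that pattern, using csv.reader plus an index-by-name mapping (the sample data and variable names here are made up for illustration):

```python
import csv
import io

# Made-up sample data, just to illustrate the pattern.
data = "Item,Quantity,Unit Price,Total Cost\nWidget,2,1.50,3.00\n"

reader = csv.reader(io.StringIO(data))
headers = next(reader)
# Map each header name to its column position.
indices = {name: i for i, name in enumerate(headers)}

for row in reader:
    value = row[indices["Total Cost"]]  # readable name, plain list row
```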


From ncoghlan at gmail.com  Tue Jan 29 12:50:07 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 29 Jan 2013 21:50:07 +1000
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <CAH0mxTQ9wfcR_R5SbXV0J63iVaB1PvYvSF06tRy=AibO2Qajfg@mail.gmail.com>
References: <CAH0mxTQ9wfcR_R5SbXV0J63iVaB1PvYvSF06tRy=AibO2Qajfg@mail.gmail.com>
Message-ID: <CADiSq7cKrdgzBgENPYYXA5S-oCtT4Jxs5xnMwr750LUyKg+v6Q@mail.gmail.com>

On Tue, Jan 29, 2013 at 11:50 AM, Joao S. O. Bueno
<jsbueno at python.org.br> wrote:
> This idea is not new - but it is stalled  -
> Last I remember it came around in Python-devel in 2010, in this thread:
> http://mail.python.org/pipermail/python-dev/2010-November/thread.html#105967

FWIW, since that last discussion, I've switched to using strings for
my special constants, dumping them in a container if I need some kind
of easy validity checking or iteration.

That said, an enum type may still be useful for interoperability with
other systems (databases, C APIs, etc).
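
A minimal sketch of the strings-in-a-container approach described above (the names are illustrative, not from any stdlib module):

```python
# Plain string constants, with a frozenset for validity checking and iteration.
RED, GREEN, BLUE = "red", "green", "blue"
COLOURS = frozenset({RED, GREEN, BLUE})

def set_colour(name):
    # Validity check comes almost for free from the container.
    if name not in COLOURS:
        raise ValueError("unknown colour: %r" % (name,))
    return name
```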

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From yoavglazner at gmail.com  Tue Jan 29 12:51:17 2013
From: yoavglazner at gmail.com (yoav glazner)
Date: Tue, 29 Jan 2013 13:51:17 +0200
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org> <51072650.5090808@pearwood.info>
	<CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>
	<97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com>
Message-ID: <CAJ78kjOOrcebp=Gkp++bf=k_tCKiCw36ibMw7j5PKCLPH=Xqwg@mail.gmail.com>

Here is very similar version that works (tested on python27)
>>> def stop():
...     next(iter([]))

>>> list((i if i<50 else stop()) for i in range(100))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
40, 41, 42, 43, 44, 45, 46, 47, 48, 49]


On Tue, Jan 29, 2013 at 12:59 PM, Shane Green <shane at umbrellacode.com>wrote:

> Unfortunately "else break" also kind of falls flat on its face when you
> consider it's being used in context of an expression.
>
>
>
>
>
> Shane Green
> www.umbrellacode.com
> 408-692-4666 | shane at umbrellacode.com
>
> On Jan 29, 2013, at 2:44 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>
> On Tue, Jan 29, 2013 at 11:30 AM, Steven D'Aprano <steve at pearwood.info>
> wrote:
>
> Why would it translate that way? That would be a silly decision to make.
> Python can decide on the semantics of a while clause in a comprehension in
> whatever way makes the most sense, not necessarily according to some
> mechanical, nonsensical translation.
>
>
> Terry is correct: comprehensions are deliberately designed to have the
> exact same looping semantics as the equivalent statements flattened
> out into a single line, with the innermost expression lifted out of
> the loop body and placed in front. This then works to arbitrarily deep
> nesting levels. The surrounding syntax (parentheses, brackets, braces,
> and whether or not there is a colon present in the main expression)
> then governs what kind of result you get (generator-iterator, list,
> set, dict).
>
> For example in:
>
>   ((x, y, z) for x in a if x for y in b if y for z in c if z)
>   [(x, y, z) for x in a if x for y in b if y for z in c if z]
>   {(x, y, z) for x in a if x for y in b if y for z in c if z}
>   {x: (y, z) for x in a if x for y in b if y for z in c if z}
>
> The looping semantics of these expressions are all completely defined
> by the equivalent statements:
>
>    for x in a:
>        if x:
>            for y in b:
>                if y:
>                    for z in c:
>                        if z:
>
> (modulo a few name lookup quirks if you're playing with class scopes)
>
> Any attempt to change that fundamental equivalence between
> comprehensions and the corresponding statements has basically zero
> chance of getting accepted through the PEP process.
>
> The only remotely plausible proposal I've seen in this thread is the
> "else break" on the filter conditions, because that *can* be mapped
> directly to the statement form in order to accurately describe the
> intended semantics. However, it would fail the "just use
> itertools.takewhile or a custom iterator, that use case isn't common
> enough to justify dedicated syntax". The conceptual basis of Python's
> comprehensions in mathematical set notation would likely also play a
> part in rejecting an addition that requires an inherently procedural
> interpretation.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>
>

From ncoghlan at gmail.com  Tue Jan 29 12:53:00 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 29 Jan 2013 21:53:00 +1000
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org> <51072650.5090808@pearwood.info>
	<CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>
	<97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com>
Message-ID: <CADiSq7emZNb6BA1V56BwAZKbDoPtt923-MViY+MGD6vQRHFGtg@mail.gmail.com>

On Tue, Jan 29, 2013 at 8:59 PM, Shane Green <shane at umbrellacode.com> wrote:
> Unfortunately "else break" also kind of falls flat on its face when you
> consider it's being used in context of an expression.

Not really, since comprehensions are all about providing expression
forms of the equivalent statements. I'm not saying "else break" would
get approved (I actually don't think that's likely for other reasons),
just that it isn't clearly dead in the water due to the inconsistency
with the statement semantics (which is the core problem with the
"while" suggestion).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From shane at umbrellacode.com  Tue Jan 29 12:54:09 2013
From: shane at umbrellacode.com (Shane Green)
Date: Tue, 29 Jan 2013 03:54:09 -0800
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <201301291139.28128.mark.hackett@metoffice.gov.uk>
References: <1358903168.4767.4.camel@webb>
	<CAGu0Anu-2+TqQyswqyM0PC1-kxfyLs++TfXnB75DwecYW=hyrw@mail.gmail.com>
	<CAJVMporKt7WsBzjHB3TFABOevXBnO3T_BNXtUfcR4UY6pzRW9Q@mail.gmail.com>
	<201301291139.28128.mark.hackett@metoffice.gov.uk>
Message-ID: <4FE11280-A0C2-4485-82A5-C8057145B61B@umbrellacode.com>

> And funky CSV formats don't make the current version not work for anyone. It 
> works for the people it's been working for all along. Why stop that?

Agreed: I'm actually not for changing the existing stuff. I don't think something that used to return single values should start returning lists, and if it's going to start raising exceptions, I think that should be an option you enable explicitly. Maybe DictReader should be deprecated in favor of something that implements what we're discussing. I'm also realizing that this way of thinking means it's slightly off topic, and I apologize for that ;-)



Shane Green 
www.umbrellacode.com
408-692-4666 | shane at umbrellacode.com

On Jan 29, 2013, at 3:39 AM, Mark Hackett <mark.hackett at metoffice.gov.uk> wrote:

> On Tuesday 29 Jan 2013, Alexandre Zani wrote:
>> 
>> As for a MultiDictReader, I don't think this is superior to csv.reader. In
>> both cases, you need to keep track of the column orders. And if you already
>> know the column order, you might as well just manually specify the field
>> names in DictReader.
>> 
> 
> But it would allow you to access the index by name.
> 
> value = csv_array[indices["Total Cost"]]
> 
> A little more verbose than
> 
> value = csv_dict["Total Cost"]
> 
> But it's easier to read what it's doing than
> 
> value = csv_array[3]
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas


From wolfgang.maier at biologie.uni-freiburg.de  Tue Jan 29 13:03:49 2013
From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier)
Date: Tue, 29 Jan 2013 12:03:49 +0000 (UTC)
Subject: [Python-ideas] while conditional in list comprehension ??
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org> <51072650.5090808@pearwood.info>
	<CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>
Message-ID: <loom.20130129T124016-570@post.gmane.org>

> Nick Coghlan <ncoghlan at ...> writes:
>
> 
> On Tue, Jan 29, 2013 at 11:30 AM, Steven D'Aprano <steve at ...> wrote:
> > Why would it translate that way? That would be a silly decision to make.
> > Python can decide on the semantics of a while clause in a comprehension in
> > whatever way makes the most sense, not necessarily according to some
> > mechanical, nonsensical translation.
> 
> Terry is correct: comprehensions are deliberately designed to have the
> exact same looping semantics as the equivalent statements flattened
> out into a single line, with the innermost expression lifted out of
> the loop body and placed in front. This then works to arbitrarily deep
> nesting levels. The surrounding syntax (parentheses, brackets, braces,
> and whether or not there is a colon present in the main expression)
> then governs what kind of result you get (generator-iterator, list,
> set, dict).
> 
> For example in:
> 
>    ((x, y, z) for x in a if x for y in b if y for z in c if z)
>    [(x, y, z) for x in a if x for y in b if y for z in c if z]
>    {(x, y, z) for x in a if x for y in b if y for z in c if z}
>    {x: (y, z) for x in a if x for y in b if y for z in c if z}
> 
> The looping semantics of these expressions are all completely defined
> by the equivalent statements:
> 
>     for x in a:
>         if x:
>             for y in b:
>                 if y:
>                     for z in c:
>                         if z:
> 
> (modulo a few name lookup quirks if you're playing with class scopes)
> 
> Any attempt to change that fundamental equivalence between
> comprehensions and the corresponding statements has basically zero
> chance of getting accepted through the PEP process.
> 
> The only remotely plausible proposal I've seen in this thread is the
> "else break" on the filter conditions, because that *can* be mapped
> directly to the statement form in order to accurately describe the
> intended semantics. However, it would fail the "just use
> itertools.takewhile or a custom iterator, that use case isn't common
> enough to justify dedicated syntax". The conceptual basis of Python's
> comprehensions in mathematical set notation would likely also play a
> part in rejecting an addition that requires an inherently procedural
> interpretation.
> 
> Cheers,
> Nick.
> 

Thanks Nick, that is really helpful, as I can now see where the problem really
lies for the developer team.
I agree that under these circumstances my suggestion is unacceptable. You know,
I am just a Python user, and I don't know about your development paradigms.

Knowing about them, let me make a wild suggestion (and I am sure it has no
chance of getting accepted either, it's more of a test to see if I understood
the problem):
You could introduce a new 'breakif <condition>' statement, which would be
equivalent to 'if <condition>: break'.
Its use as a standalone statement could be allowed (but since its equivalent is
already very simple it would be a very minor change). In addition, however, the
'breakif' could be integrated into comprehensions just like 'if', and could be
translated directly into loops of any nesting level without ambiguities.

Another note: in light of your explanation, it looks like the earlier suggestion
of 'else break' would also work without ambiguities since with the rigid logic
applied, there would be no doubt which of several 'for' loops gets broken by the
'break'.
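
For comparison, the effect that both 'breakif' and 'else break' aim at is already expressible with itertools.takewhile, without any new syntax:

```python
from itertools import takewhile

# Stdlib equivalent of the hypothetical [n for n in range(1, 1000) while n < 400]
result = list(takewhile(lambda n: n < 400, range(1, 1000)))
```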

Thanks for any comments on this (and please :), don't yell at me for asking for
a new keyword to achieve something minor, I already understood that part).

Best,
Wolfgang




From mark.hackett at metoffice.gov.uk  Tue Jan 29 13:07:16 2013
From: mark.hackett at metoffice.gov.uk (Mark Hackett)
Date: Tue, 29 Jan 2013 12:07:16 +0000
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <loom.20130129T124016-570@post.gmane.org>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>
	<loom.20130129T124016-570@post.gmane.org>
Message-ID: <201301291207.16626.mark.hackett@metoffice.gov.uk>

On Tuesday 29 Jan 2013, Wolfgang Maier wrote:
> 
> another note: in light of your explanation, it looks like the earlier
>  suggestion of 'else break' would also work without ambiguities since with
>  the rigid logic applied, there would be no doubt which of several 'for'
>  loops gets broken by the 'break'.
> 

Deeply nested loops that you want to break out of, beyond just the current
loop, are why goto is still warranted in a language.

Rules are there to make you THINK before you break them!


From steve at pearwood.info  Tue Jan 29 13:09:47 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 29 Jan 2013 23:09:47 +1100
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
Message-ID: <5107BC0B.8090808@pearwood.info>

On 29/01/13 00:33, Wolfgang Maier wrote:
> Dear all,
> I guess this is so obvious that someone must have suggested it before:
> in list comprehensions you can currently exclude items based on the if
> conditional, e.g.:
>
> [n for n in range(1,1000) if n % 4 == 0]
>
> Why not extend this filtering by allowing a while statement in addition to
> if, as in:
>
> [n for n in range(1,1000) while n<  400]


Comprehensions in Clojure have this feature.

http://clojuredocs.org/clojure_core/clojure.core/for

;; :when continues through the collection even if some have the
;; condition evaluate to false, like filter
user=> (for [x (range 3 33 2) :when (prime? x)]
          x)
(3 5 7 11 13 17 19 23 29 31)

;; :while stops at the first collection element that evaluates to
;; false, like take-while
user=> (for [x (range 3 33 2) :while (prime? x)]
          x)
(3 5 7)
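
The Python analogues of Clojure's :when and :while would be an if clause and itertools.takewhile; a sketch (is_prime here is a naive helper written just for this example):

```python
from itertools import takewhile

def is_prime(n):
    # Naive trial division, fine for small n.
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

# :when -> filter clause; keeps scanning the whole range
when_style = [x for x in range(3, 33, 2) if is_prime(x)]

# :while -> truncate at the first failing element
while_style = list(takewhile(is_prime, range(3, 33, 2)))
```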



So there is precedent in at least one other language for this
obvious and useful feature.




-- 
Steven


From ncoghlan at gmail.com  Tue Jan 29 13:15:30 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 29 Jan 2013 22:15:30 +1000
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <loom.20130129T124016-570@post.gmane.org>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org> <51072650.5090808@pearwood.info>
	<CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>
	<loom.20130129T124016-570@post.gmane.org>
Message-ID: <CADiSq7erio3J=igybgH=h2ABmV2sS6WW1KDqAqyiqcAUM5-b+A@mail.gmail.com>

On Tue, Jan 29, 2013 at 10:03 PM, Wolfgang Maier
<wolfgang.maier at biologie.uni-freiburg.de> wrote:
> Thanks for any comments on this (and please :), don't yell at me for asking for
> a new keyword to achieve something minor, I already understood that part).

I try not to do that - the judgement calls we have to make in
designing the language don't always have obvious solutions, and part
of the reason python-ideas exists is as a place for people to share
ideas that turn out to be questionable, for the sake of uncovering
those ideas that turn out to be worthwhile. I've had several proposals
make their way into Python over the years, but they're still
outnumbered by the ones which didn't make it (many because I decided
not to propose them in the first place, but quite a few others because
people on python-ideas and python-dev pointed out flaws, drawbacks and
inconsistencies that I had missed).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From steve at pearwood.info  Tue Jan 29 13:26:13 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 29 Jan 2013 23:26:13 +1100
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <201301281745.16485.mark.hackett@metoffice.gov.uk>
References: <1358903168.4767.4.camel@webb> <51069F08.8070000@stoneleaf.us>
	<5106B4C7.3090803@mrabarnett.plus.com>
	<201301281745.16485.mark.hackett@metoffice.gov.uk>
Message-ID: <5107BFE5.6010800@pearwood.info>

On 29/01/13 04:45, Mark Hackett wrote:
> On Monday 28 Jan 2013, MRAB wrote:
>> It shouldn't silently drop the columns
>>
>
> Why not?
>
> It's adding to a dictionary and adding a duplicate key replaces the earlier
> one.

Then adding to a dictionary was a mistake.

The choice of a dict is *implementation*, not *interface*. The interface needed
is to return a mapping of column names to values. The nature of that mapping is
an implementation detail, and dict is only the simplest solution, not necessarily
the correct solution.

There is nothing about CSV files that implies that the right behaviour is to drop
columns. The nature of CSV files is to allow duplicate column names, and so CSV
readers should too. That implies that using a dict, which silently drops duplicate
keys, was the wrong choice.

We might argue that using duplicate column names is stupid, but CSV supports it,
and so should CSV readers.


> If it dropped the columns and shouldn't have, then the results will be seen to
> be wrong anyway, so there's not a huge amount of need for this.

You cannot assume that the caller knows that there are duplicated column names.
That's why dropping columns is problematic: it *silently* drops them, giving the
caller no idea that it has happened.

Given that DictReader already exists, and that there probably is someone out
there who is relying on it silently eating columns, I think that the only
reasonable way forward is to add a new reader that supports multiple columns
with the same name. The caller can then use whichever reader suits their
use-case:


* I don't care about duplicate-name columns, just give me some arbitrary one;
   - use DictReader

* I want all of the duplicate-name columns;
   - use MultiDictReader

* I want some of the duplicate-name columns;
   - use MultiDictReader, and then filter the results as you get them
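
A sketch of what such a MultiDictReader could look like; the name and behavior here are hypothetical, built on top of csv.reader:

```python
import csv
import io
from collections import defaultdict

def multi_dict_reader(f):
    """Yield one dict per row, mapping each header name to a list of values."""
    reader = csv.reader(f)
    try:
        headers = next(reader)
    except StopIteration:
        raise ValueError("no header row")
    for row in reader:
        d = defaultdict(list)
        for name, value in zip(headers, row):
            d[name].append(value)
        yield dict(d)

# Duplicate column 'a': both values are retained, in source order.
rows = list(multi_dict_reader(io.StringIO("a,b,a\n1,2,3\n")))
```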


(When I put it like that, DictReader sounds even less useful. But as I said,
I daresay *somebody* is relying on it right now, so we can't change it.)


> And why, really, are there duplicate column names in there anyway? You can
> come up with the assertion that this might be wanted, but they're not normally
> what you see in a csv file.
>
> I've never seen nor used a csv file that duplicated column names other than
> being blank.

Well there you go. That is exactly one such example of duplicate column names.




-- 
Steven


From mark.hackett at metoffice.gov.uk  Tue Jan 29 13:30:49 2013
From: mark.hackett at metoffice.gov.uk (Mark Hackett)
Date: Tue, 29 Jan 2013 12:30:49 +0000
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <5107BFE5.6010800@pearwood.info>
References: <1358903168.4767.4.camel@webb>
	<201301281745.16485.mark.hackett@metoffice.gov.uk>
	<5107BFE5.6010800@pearwood.info>
Message-ID: <201301291230.49247.mark.hackett@metoffice.gov.uk>

On Tuesday 29 Jan 2013, Steven D'Aprano wrote:
> On 29/01/13 04:45, Mark Hackett wrote:
> > On Monday 28 Jan 2013, MRAB wrote:
> >> It shouldn't silently drop the columns
> >
> > Why not?
> >
> > It's adding to a dictionary and adding a duplicate key replaces the
> > earlier one.
> 
> Then adding to a dictionary was a mistake.
> 

I agree.

So don't use DictReader in that case.

We have Oscar with the method to do your own (and looked fairly simple and 
straightforward).
Chris with carefuldictreader.
Shane with his dual-retention object.


From mark.hackett at metoffice.gov.uk  Tue Jan 29 13:35:01 2013
From: mark.hackett at metoffice.gov.uk (Mark Hackett)
Date: Tue, 29 Jan 2013 12:35:01 +0000
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <5107BFE5.6010800@pearwood.info>
References: <1358903168.4767.4.camel@webb>
	<201301281745.16485.mark.hackett@metoffice.gov.uk>
	<5107BFE5.6010800@pearwood.info>
Message-ID: <201301291235.01513.mark.hackett@metoffice.gov.uk>

On Tuesday 29 Jan 2013, Steven D'Aprano wrote:
> > If it dropped the columns and shouldn't have, then the results will be
> > seen to be wrong anyway, so there's not a huge amount of need for this.
> 
> You cannot assume that the caller knows that there are duplicated column
>  names
> 

You cannot assume they wanted them as a list.

You cannot assume that duplicate replacement is what they want.

If someone is using a csv file with header names they have never read, how are 
they going to use the data? They won't even know the name to access the value 
in the dictionary! So I dismiss the claim that the caller may not know the 
column names are duplicated. They have to know what the headers are to use 
DictReader.


From jsbueno at python.org.br  Tue Jan 29 13:35:22 2013
From: jsbueno at python.org.br (Joao S. O. Bueno)
Date: Tue, 29 Jan 2013 10:35:22 -0200
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <CAJ78kjOOrcebp=Gkp++bf=k_tCKiCw36ibMw7j5PKCLPH=Xqwg@mail.gmail.com>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org> <51072650.5090808@pearwood.info>
	<CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>
	<97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com>
	<CAJ78kjOOrcebp=Gkp++bf=k_tCKiCw36ibMw7j5PKCLPH=Xqwg@mail.gmail.com>
Message-ID: <CAH0mxTSoAtG1FP6UMDEN964TEvjdRVPL9g3BOnFrJ61wSf8ehw@mail.gmail.com>

On 29 January 2013 09:51, yoav glazner <yoavglazner at gmail.com> wrote:
> Here is very similar version that works (tested on python27)
>>>> def stop():
> ...     next(iter([]))
>
>>>> list((i if i<50 else stop()) for i in range(100))
> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
> 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
> 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]

Great. I think this nails it. It is exactly the intended behavior,
and very readable under current language capabilities.

One does not have to stop and go read what "itertools.takewhile" does,
and mentally unfold the lambda guard expression - that is what makes
this (and the O.P. request)  more readable than using takewhile.

Note: stop can also just explicitly raise StopIteration -
or your next(iter([])) expression can be inlined within the generator.

It works in Python 3 as well - though for those who did not test:
it won't work for list, dict or set comprehensions - just for
generator expressions.


 js
-><-


From steve at pearwood.info  Tue Jan 29 14:08:03 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 30 Jan 2013 00:08:03 +1100
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org> <51072650.5090808@pearwood.info>
	<CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>
Message-ID: <5107C9B3.2010608@pearwood.info>

On 29/01/13 21:44, Nick Coghlan wrote:
> On Tue, Jan 29, 2013 at 11:30 AM, Steven D'Aprano<steve at pearwood.info>  wrote:
>> Why would it translate that way? That would be a silly decision to make.
>> Python can decide on the semantics of a while clause in a comprehension in
>> whatever way makes the most sense, not necessarily according to some
>> mechanical, nonsensical translation.
>
> Terry is correct: comprehensions are deliberately designed to have the
> exact same looping semantics as the equivalent statements flattened
> out into a single line, with the innermost expression lifted out of
> the loop body and placed in front.


You have inadvertently supported the point I am trying to make: what is
*deliberately designed* by people one way can be deliberately designed
another way instead. List comps have the form, and limitations, they
have because of people's decisions. People could decide differently.

A while clause in a comprehension can map to the same statement form as
currently used. Just because the parser sees "while" inside a
comprehension doesn't mean that the underlying implementation has to
literally insert a while loop inside a for-loop. Terry is right about
one thing: that would lead to an entirely pointless infinite loop.

Where Terry gets it wrong is to suppose that the only *conceivable* way
to handle syntax that looks like [x for x in seq while condition] is to
insert a while loop inside a for loop. But "while" is just a convenient
keyword that looks good, is readable, and has a natural interpretation
as executable pseudo-code. We could invent a new keyword if we wished,
say "jabberwock", and treat "jabberwock cond" inside a comprehension as
equivalent to "if cond else break":

    (x, y for x in a jabberwock x for y in b jabberwock y)

    for x in a:
        if x:
            for y in b:
                if y:
                    yield (x, y)
                else:
                    break
        else:
            break


If you, as a core developer, tell me that in practice this would be
exceedingly hard for the CPython implementation to do, I can only trust
your opinion since I am not qualified to argue.

But since you've already allowed that permitting "if cond else break"
in comprehensions would be possible, I find it rather difficult to
believe that spelling it "jabberwock cond" is not.



> The only remotely plausible proposal I've seen in this thread is the
> "else break" on the filter conditions,

Which just begs for confusion and misunderstanding. Just wait until people
start asking why they can't write "else some_expression", and we have to
explain that inside a comprehension, the only thing allowed to follow "else"
is "break".




-- 
Steven


From shane at umbrellacode.com  Tue Jan 29 14:08:25 2013
From: shane at umbrellacode.com (Shane Green)
Date: Tue, 29 Jan 2013 05:08:25 -0800
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <201301291235.01513.mark.hackett@metoffice.gov.uk>
References: <1358903168.4767.4.camel@webb>
	<201301281745.16485.mark.hackett@metoffice.gov.uk>
	<5107BFE5.6010800@pearwood.info>
	<201301291235.01513.mark.hackett@metoffice.gov.uk>
Message-ID: <ED261157-CBED-459E-ABB7-E40BB8B121CE@umbrellacode.com>

Let's remove the assumptions about their information by retaining all of it, and make an assumption that everyone is capable of dealing with lists. 





Shane Green 
www.umbrellacode.com
408-692-4666 | shane at umbrellacode.com

On Jan 29, 2013, at 4:35 AM, Mark Hackett <mark.hackett at metoffice.gov.uk> wrote:

> On Tuesday 29 Jan 2013, Steven D'Aprano wrote:
>>> If it dropped the columns and shouldn't have, then the results will be
>>> seen to be wrong anyway, so there's not a huge amount of need for this.
>> 
>> You cannot assume that the caller knows that there are duplicated column
>> names
>> 
> 
> You cannot assume they wanted them as a list.
> 
> You cannot assume that duplicate replacement is what they want.
> 
> If someone is using a csv file with header names they have never read, how are 
> they going to use the data? They won't even know the name to access the value 
> in the dictionary! So I discard the claim that the caller may not know the 
> column names are duplicated. They have to know what the headers are to use 
> DictReader.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas


From shibturn at gmail.com  Tue Jan 29 14:18:44 2013
From: shibturn at gmail.com (Richard Oudkerk)
Date: Tue, 29 Jan 2013 13:18:44 +0000
Subject: [Python-ideas] Interrupting threads
In-Reply-To: <20130129105443.2804520b@pitrou.net>
References: <51049915.3060808@mrabarnett.plus.com>
	<5106B372.5040803@mrabarnett.plus.com>
	<CAH_1eM0WbyBhMyyhcZP90TXdSWpL3oyXMa7c1GR_H8MBMfJezA@mail.gmail.com>
	<20130129105443.2804520b@pitrou.net>
Message-ID: <ke8i82$rj6$1@ger.gmane.org>

On 29/01/2013 9:54am, Antoine Pitrou wrote:
> Of course, I sympathize with native English speakers who are annoyed
> by the prevalence of Globish over real English. That said, Python
> already mandates American English instead of British English.

Is Future.cancelled() an acceptable American spelling?

-- 
Richard



From solipsis at pitrou.net  Tue Jan 29 14:25:05 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 29 Jan 2013 14:25:05 +0100
Subject: [Python-ideas] Interrupting threads
References: <51049915.3060808@mrabarnett.plus.com>
	<5106B372.5040803@mrabarnett.plus.com>
	<CAH_1eM0WbyBhMyyhcZP90TXdSWpL3oyXMa7c1GR_H8MBMfJezA@mail.gmail.com>
	<20130129105443.2804520b@pitrou.net> <ke8i82$rj6$1@ger.gmane.org>
Message-ID: <20130129142505.285bdc23@pitrou.net>

Le Tue, 29 Jan 2013 13:18:44 +0000,
Richard Oudkerk <shibturn at gmail.com> a
écrit :

> On 29/01/2013 9:54am, Antoine Pitrou wrote:
> > Of course, I sympathize with native English speakers who are annoyed
> > by the prevalence of Globish over real English. That said, Python
> > already mandates American English instead of British English.
> 
> Is Future.cancelled() an acceptable American spelling?

You shouldn't ask me. The only thing I can tell you is that it's not
acceptable French :-)

Regards

Antoine.




From steve at pearwood.info  Tue Jan 29 14:28:19 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 30 Jan 2013 00:28:19 +1100
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <201301291235.01513.mark.hackett@metoffice.gov.uk>
References: <1358903168.4767.4.camel@webb>
	<201301281745.16485.mark.hackett@metoffice.gov.uk>
	<5107BFE5.6010800@pearwood.info>
	<201301291235.01513.mark.hackett@metoffice.gov.uk>
Message-ID: <5107CE73.8070209@pearwood.info>

On 29/01/13 23:35, Mark Hackett wrote:
> On Tuesday 29 Jan 2013, Steven D'Aprano wrote:
>>> If it dropped the columns and shouldn't have, then the results will be
>>> seen to be wrong anyway, so there's not a huge amount of need for this.
>>
>> You cannot assume that the caller knows that there are duplicated column
>>   names
>>
>
> You cannot assume they wanted them as a list.

I don't need to assume that. They can take the list and post-process it into
any data type they want.

A list is a natural fit for associating multiple values to a single key,
because it doesn't lose data: it is variable-sized, so it can handle "no
values" or "1000 values" equally easily; it is ordered, and it is iterable.
If the caller wants something else, they can convert it.
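A minimal sketch of that list-per-key approach (the `multi_reader` helper is hypothetical, not part of the csv module): values under a duplicated header accumulate into a list instead of the last one silently winning.

```python
import csv
import io

# Hypothetical helper, not part of the csv module: like DictReader,
# but every header maps to a list of the values found under it.
def multi_reader(f):
    rows = csv.reader(f)
    headers = next(rows)
    for row in rows:
        record = {}
        for key, value in zip(headers, row):
            record.setdefault(key, []).append(value)
        yield record

sample = io.StringIO("name,tag,tag\nwidget,red,blue\n")
print(list(multi_reader(sample)))
# [{'name': ['widget'], 'tag': ['red', 'blue']}]
```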

> You cannot assume that duplicate replacement is what they want.

I don't think I ever suggested that it was.


> If someone is using a csv file with header names they have never read, how are
> they going to use the data?

reader = csv.DictReader(whatever)
for mapping in reader:
    for key, value in mapping.items():
        process(key, value)


Or perhaps you only care about one column, and don't care about the other, unknown,
columns:

for mapping in reader:
    value = mapping.get('spam', 'some default')
    process(value)



> They won't even know the name to access the value in the dictionary!

Dealing with arbitrary field names in data you read from a file is not hard.



-- 
Steven


From ncoghlan at gmail.com  Tue Jan 29 14:35:56 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 29 Jan 2013 23:35:56 +1000
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <5107C9B3.2010608@pearwood.info>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org> <51072650.5090808@pearwood.info>
	<CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>
	<5107C9B3.2010608@pearwood.info>
Message-ID: <CADiSq7eJF1+v0+bCMqK8evByK5ib0VhW7LNf4K3XxhoJ7xO==g@mail.gmail.com>

On Tue, Jan 29, 2013 at 11:08 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> On 29/01/13 21:44, Nick Coghlan wrote:
>>
>> On Tue, Jan 29, 2013 at 11:30 AM, Steven D'Aprano<steve at pearwood.info>
>> wrote:
>>>
>>> Why would it translate that way? That would be a silly decision to make.
>>> Python can decide on the semantics of a while clause in a comprehension
>>> in
>>> whatever way makes the most sense, not necessarily according to some
>>> mechanical, nonsensical translation.
>>
>>
>> Terry is correct: comprehensions are deliberately designed to have the
>> exact same looping semantics as the equivalent statements flattened
>> out into a single line, with the innermost expression lifted out of
>> the loop body and placed in front.
>
>
>
> You have inadvertently supported the point I am trying to make: what is
> *deliberately designed* by people one way can be deliberately designed
> another way instead. List comps have the form, and limitations, they
> have because of people's decisions. People could decide differently.

"People" could. I'm telling you *we* (as in python-dev) won't.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From mark.hackett at metoffice.gov.uk  Tue Jan 29 14:44:35 2013
From: mark.hackett at metoffice.gov.uk (Mark Hackett)
Date: Tue, 29 Jan 2013 13:44:35 +0000
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <5107CE73.8070209@pearwood.info>
References: <1358903168.4767.4.camel@webb>
	<201301291235.01513.mark.hackett@metoffice.gov.uk>
	<5107CE73.8070209@pearwood.info>
Message-ID: <201301291344.35342.mark.hackett@metoffice.gov.uk>

On Tuesday 29 Jan 2013, Steven D'Aprano wrote:
> On 29/01/13 23:35, Mark Hackett wrote:
> > On Tuesday 29 Jan 2013, Steven D'Aprano wrote:
> >>> If it dropped the columns and shouldn't have, then the results will be
> >>> seen to be wrong anyway, so there's not a huge amount of need for this.
> >>
> >> You cannot assume that the caller knows that there are duplicated column
> >>   names
> >
> > You cannot assume they wanted them as a list.
> 
> I don't need to assume that. They can take the list and post-process it
>  into any data type they want.

Yes, you ARE assuming it. You want them to post-process it. But if they don't 
know there are duplicates there, and have found their script works for their 
needs and therefore never looked, they will now get the wrong answer.

As Oscar says, they could process the csv file themselves by hand and code in 
EXACTLY what they want. They don't have to put it in a dictionary then.

And you've already said

> Then adding to a dictionary was a mistake.

So they shouldn't be using DictReader.


From shane at umbrellacode.com  Tue Jan 29 14:45:25 2013
From: shane at umbrellacode.com (Shane Green)
Date: Tue, 29 Jan 2013 05:45:25 -0800
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <201301291310.48404.mark.hackett@metoffice.gov.uk>
References: <1358903168.4767.4.camel@webb>
	<201301291235.01513.mark.hackett@metoffice.gov.uk>
	<ED261157-CBED-459E-ABB7-E40BB8B121CE@umbrellacode.com>
	<201301291310.48404.mark.hackett@metoffice.gov.uk>
Message-ID: <48511CC7-69B5-4FDE-98C9-07765FCEBAAE@umbrellacode.com>

On Jan 29, 2013, at 5:10 AM, Mark Hackett <mark.hackett at metoffice.gov.uk> wrote:

> On Tuesday 29 Jan 2013, you wrote:
>> Let's remove the assumptions about their information by retaining all of
>> it, and make an assumption that everyone is capable of dealing with lists.
>> 
> 
> Then lets not use a dictionary. And leave the DictReader alone.
> 

Yes, I think a more useful CSV construct would map header names to lists of values, provide access to the original header and value sequences, and offer methods for iterating sequential (header, value) items (with possibly repeating header names, and which could be fed to dict() to produce exactly what DictReader produces). As such, it would not be a DictReader, because it would produce something that merely extends the dictionary API. I would think something like CSVRecord, or just Record, would be a more accurate name.
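A rough sketch of such a type (the name CSVRecord and its API are hypothetical, following the description above): it retains the full header and value sequences, extends the dict API, and its sequential pairs collapse to exactly the DictReader result when fed to dict().

```python
class CSVRecord(dict):
    """Hypothetical sketch of the proposed construct: a dict-like CSV
    record that retains the original header and value sequences."""

    def __init__(self, headers, values):
        self.headers = list(headers)
        self.values = list(values)
        # Building the dict from the pairs means later duplicates win,
        # matching what DictReader produces today.
        super().__init__(zip(self.headers, self.values))

    def iterpairs(self):
        """Iterate sequential (header, value) items, repeats included."""
        return zip(self.headers, self.values)


rec = CSVRecord(["a", "b", "a"], ["1", "2", "3"])
print(rec["a"])               # '3' -- DictReader-style lookup
print(list(rec.iterpairs()))  # [('a', '1'), ('b', '2'), ('a', '3')]
```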



From bborcic at gmail.com  Tue Jan 29 14:53:29 2013
From: bborcic at gmail.com (Boris Borcic)
Date: Tue, 29 Jan 2013 14:53:29 +0100
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
Message-ID: <ke8k8m$akm$1@ger.gmane.org>

|>>> def notyet(cond) :
	if cond :
		raise StopIteration
	return True

|>>> list(x for x in range(100) if notyet(x>10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]




From shane at umbrellacode.com  Tue Jan 29 15:09:12 2013
From: shane at umbrellacode.com (Shane Green)
Date: Tue, 29 Jan 2013 06:09:12 -0800
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <201301291344.35342.mark.hackett@metoffice.gov.uk>
References: <1358903168.4767.4.camel@webb>
	<201301291235.01513.mark.hackett@metoffice.gov.uk>
	<5107CE73.8070209@pearwood.info>
	<201301291344.35342.mark.hackett@metoffice.gov.uk>
Message-ID: <BF090B90-F430-484B-BD7E-B7B032AE285A@umbrellacode.com>

I'm not sure this is constructive.

I think it's safe to assume that changing something in an API that used to return single values into something that now returns lists of those values will be a problem for folks.

I also think it's safe to assume folks can design their applications for an API that returns lists of values.  In support of this assumption, I will point out that's precisely what CGI's FieldStorage does to represent HTML form values, because some fields (radio buttons, checkboxes, etc.) can have more than one value associated with their name on submission.

Finally, I would assert that the more legally formatted content your reader accurately reads and handles, the better.
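The same many-values-per-name convention appears elsewhere in the stdlib; urllib.parse.parse_qs (used here instead of FieldStorage itself, as a stand-in illustrating the same design) maps every query field to a list precisely because fields can repeat.

```python
from urllib.parse import parse_qs

# Repeated form fields accumulate into lists rather than overwriting.
form = parse_qs("color=red&color=blue&size=L")
print(form)  # {'color': ['red', 'blue'], 'size': ['L']}
```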







Shane Green 
www.umbrellacode.com
408-692-4666 | shane at umbrellacode.com

On Jan 29, 2013, at 5:44 AM, Mark Hackett <mark.hackett at metoffice.gov.uk> wrote:

> On Tuesday 29 Jan 2013, Steven D'Aprano wrote:
>> On 29/01/13 23:35, Mark Hackett wrote:
>>> On Tuesday 29 Jan 2013, Steven D'Aprano wrote:
>>>>> If it dropped the columns and shouldn't have, then the results will be
>>>>> seen to be wrong anyway, so there's not a huge amount of need for this.
>>>> 
>>>> You cannot assume that the caller knows that there are duplicated column
>>>>  names
>>> 
>>> You cannot assume they wanted them as a list.
>> 
>> I don't need to assume that. They can take the list and post-process it
>> into any data type they want.
> 
> Yes you ARE assuming it. You want them to post process it. But if they don't 
> know there are duplicates there and have found their script works for their 
> needs and therefore never looked, they will now get the wrong answer.
> 
> As Oscar says, they could process the csv file themselves by hand and code in 
> EXACTLY what they want. They don't have to put it in a dictionary then.
> 
> And you've already said
> 
>> Then adding to a dictionary was a mistake.
> 
> So they shouldn't be using DictReader.


From shane at umbrellacode.com  Tue Jan 29 15:13:22 2013
From: shane at umbrellacode.com (Shane Green)
Date: Tue, 29 Jan 2013 06:13:22 -0800
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <ke8k8m$akm$1@ger.gmane.org>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke8k8m$akm$1@ger.gmane.org>
Message-ID: <6B3751EA-339A-4A7F-88D9-545B56AE675A@umbrellacode.com>

How funny… I tried a variation of that, because one of my original thoughts had been that "[… if x else raise StopIteration()]" might also have made some sense.  But I tried it based on the example from earlier, and hadn't even considered it was even closer…





Shane Green 
www.umbrellacode.com
408-692-4666 | shane at umbrellacode.com

On Jan 29, 2013, at 5:53 AM, Boris Borcic <bborcic at gmail.com> wrote:

> |>>> def notyet(cond) :
> 	if cond :
> 		raise StopIteration
> 	return True
> 
> |>>> list(x for x in range(100) if notyet(x>10))
> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
> 
> 


From shane at umbrellacode.com  Tue Jan 29 15:16:01 2013
From: shane at umbrellacode.com (Shane Green)
Date: Tue, 29 Jan 2013 06:16:01 -0800
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <6B3751EA-339A-4A7F-88D9-545B56AE675A@umbrellacode.com>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke8k8m$akm$1@ger.gmane.org>
	<6B3751EA-339A-4A7F-88D9-545B56AE675A@umbrellacode.com>
Message-ID: <368FE5D3-4628-406F-9AA1-F93238AF1FF4@umbrellacode.com>

And, stupidly, I didn't put it in a generator… doh!





Shane Green 
www.umbrellacode.com
408-692-4666 | shane at umbrellacode.com

On Jan 29, 2013, at 6:13 AM, Shane Green <shane at umbrellacode.com> wrote:

> How funny… I tried a variation of that, because one of my original thoughts had been that "[… if x else raise StopIteration()]" might also have made some sense.  But I tried it based on the example from earlier, and hadn't even considered it was even closer…
> 
> 
> 
> 
> 
> Shane Green 
> www.umbrellacode.com
> 408-692-4666 | shane at umbrellacode.com
> 
> On Jan 29, 2013, at 5:53 AM, Boris Borcic <bborcic at gmail.com> wrote:
> 
>> |>>> def notyet(cond) :
>> 	if cond :
>> 		raise StopIteration
>> 	return True
>> 
>> |>>> list(x for x in range(100) if notyet(x>10))
>> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>> 
>> 
> 


From wolfgang.maier at biologie.uni-freiburg.de  Tue Jan 29 15:32:39 2013
From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier)
Date: Tue, 29 Jan 2013 14:32:39 +0000 (UTC)
Subject: [Python-ideas] while conditional in list comprehension ??
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke8k8m$akm$1@ger.gmane.org>
Message-ID: <loom.20130129T152832-59@post.gmane.org>

> Boris Borcic <bborcic at ...> writes:
>
> 
> |>>> def notyet(cond) :
> 	if cond :
> 		raise StopIteration
> 	return True
> 
> |>>> list(x for x in range(100) if notyet(x>10))
> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
> 

Are you trying to say you entered that code and it ran?
I would be very surprised: if you could simply 'raise StopIteration' within the
'if' clause then there would be no point to the discussion.
But as it is, your StopIteration should not be caught by the 'for', but will be
raised directly. Did you try running it?





From wolfgang.maier at biologie.uni-freiburg.de  Tue Jan 29 15:36:40 2013
From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier)
Date: Tue, 29 Jan 2013 14:36:40 +0000 (UTC)
Subject: [Python-ideas] while conditional in list comprehension ??
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke8k8m$akm$1@ger.gmane.org>
	<loom.20130129T152832-59@post.gmane.org>
Message-ID: <loom.20130129T153505-509@post.gmane.org>

> Are you trying to say you entered that code and it ran?
> I would be very surprised: if you could simply 'raise StopIteration' within the
> 'if' clause then there would be no point to the discussion.
> But as it is, your StopIteration should not be caught by the 'for', but will be
> raised directly. Did you try running it?

Sorry, I missed your enclosing list(), which explains things of course.
Cheers,
Wolfgang





From shane at umbrellacode.com  Tue Jan 29 15:45:14 2013
From: shane at umbrellacode.com (Shane Green)
Date: Tue, 29 Jan 2013 06:45:14 -0800
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <loom.20130129T153505-509@post.gmane.org>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke8k8m$akm$1@ger.gmane.org>
	<loom.20130129T152832-59@post.gmane.org>
	<loom.20130129T153505-509@post.gmane.org>
Message-ID: <FAAAD7DF-65E2-46DE-B451-7F55756D92FC@umbrellacode.com>

Here's what I was doing; it worked when I switched to the generator:

>>> def stop():
...     raise StopIteration()

>>> list((x if x < 5 else stop()) for x in range(10))
[0, 1, 2, 3, 4]





Shane Green 
www.umbrellacode.com
408-692-4666 | shane at umbrellacode.com

On Jan 29, 2013, at 6:36 AM, Wolfgang Maier <wolfgang.maier at biologie.uni-freiburg.de> wrote:

>> Are you trying to say you entered that code and it ran?
>> I would be very surprised: if you could simply 'raise StopIteration' within the
>> 'if' clause then there would be no point to the discussion.
>> But as it is, your StopIteration should not be caught by the 'for', but will be
>> raised directly. Did you try running it?
> 
> Sorry, I missed your enclosing list(), which explains things of course.
> Cheers,
> Wolfgang
> 
> 
> 


From rosuav at gmail.com  Tue Jan 29 15:55:09 2013
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 30 Jan 2013 01:55:09 +1100
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <BF090B90-F430-484B-BD7E-B7B032AE285A@umbrellacode.com>
References: <1358903168.4767.4.camel@webb>
	<201301291235.01513.mark.hackett@metoffice.gov.uk>
	<5107CE73.8070209@pearwood.info>
	<201301291344.35342.mark.hackett@metoffice.gov.uk>
	<BF090B90-F430-484B-BD7E-B7B032AE285A@umbrellacode.com>
Message-ID: <CAPTjJmp0MCGOUqVRakGkTh39FJNPK8KTDGVqZHzg9j7BXLzo7Q@mail.gmail.com>

On Wed, Jan 30, 2013 at 1:09 AM, Shane Green <shane at umbrellacode.com> wrote:
> I think it's safe to assume changing something in an API that used to return
> single values, into something that now returns lists of those values, will
> be a problem for folks.
>
> I also think it's safe to assume folks can design their applications for an
> API that returns lists of values.

Agreed on both points. A new API that returns lists of everything
would be a lot safer than fiddling with the current one.

ChrisA


From rob.cliffe at btinternet.com  Tue Jan 29 16:02:40 2013
From: rob.cliffe at btinternet.com (Rob Cliffe)
Date: Tue, 29 Jan 2013 15:02:40 +0000
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org> <51072650.5090808@pearwood.info>
	<CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>
Message-ID: <5107E490.9070501@btinternet.com>


On 29/01/2013 10:44, Nick Coghlan wrote:
> Terry is correct: comprehensions are deliberately designed to have the
> exact same looping semantics as the equivalent statements flattened
> out into a single line, with the innermost expression lifted out of
> the loop body and placed in front. This then works to arbitrarily deep
> nesting levels. The surrounding syntax (parentheses, brackets, braces,
> and whether or not there is a colon present in the main expression)
> then governs what kind of result you get (generator-iterator, list,
> set, dict).
>
> For example in:
>
>     ((x, y, z) for x in a if x for y in b if y for z in c if z)
>     [(x, y, z) for x in a if x for y in b if y for z in c if z]
>     {(x, y, z) for x in a if x for y in b if y for z in c if z}
>     {x: (y, z) for x in a if x for y in b if y for z in c if z}
>
> The looping semantics of these expressions are all completely defined
> by the equivalent statements:
>
>      for x in a:
>          if x:
>              for y in b:
>                  if y:
>                      for z in c:
>                          if z:
>
> (modulo a few name lookup quirks if you're playing with class scopes)
>
Thanks for spelling this out so clearly.  It helps me remember which 
order to place nested "for"s inside a list comprehension! :-)
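A quick illustration of that ordering rule is the common flattening idiom: the leftmost "for" in the comprehension is the outermost loop in the statement form.

```python
matrix = [[1, 2], [3, 4]]

# Comprehension form: leftmost "for" is the outermost loop.
flat = [x for row in matrix for x in row]

# Equivalent statement form, per the flattening rule quoted above.
expected = []
for row in matrix:
    for x in row:
        expected.append(x)

assert flat == expected == [1, 2, 3, 4]
```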



From wolfgang.maier at biologie.uni-freiburg.de  Tue Jan 29 16:24:55 2013
From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier)
Date: Tue, 29 Jan 2013 15:24:55 +0000 (UTC)
Subject: [Python-ideas] while conditional in list comprehension ??
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke8k8m$akm$1@ger.gmane.org>
	<loom.20130129T152832-59@post.gmane.org>
	<loom.20130129T153505-509@post.gmane.org>
	<FAAAD7DF-65E2-46DE-B451-7F55756D92FC@umbrellacode.com>
Message-ID: <loom.20130129T161011-496@post.gmane.org>

yoav glazner <yoavglazner at ...> writes:
> > Here is very similar version that works (tested on python27)
> > >>> def stop():
> > ...     next(iter([]))
> >
> > >>> list((i if i<50 else stop()) for i in range(100))
> > [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
> > 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
> > 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]

Joao S. O. Bueno <jsbueno at ...> writes:
> Great. I think this nails it. It is exactly the intended behavior,
> and very readable under current language capabilities.
> 
> One does not have to stop and go read what "itertools.takewhile" does,
> and mentally unfold the lambda guard expression - that is what makes
> this (and the O.P. request)  more readable than using takewhile.
> 
> Note: stop can also just explictly raise StopIteration -
> or your next(iter([])) expression can be inlined within the generator.
> 
> It works in Python 3 as well - though for those who did not test:
> it won't work for list, dict or set comprehensions - just for
> generator expressions.
>

Shane Green <shane at ...> writes:
> 
> Here's what I was doing, and worked when i switched to the generator: 
> 
> >>> def stop(): 
> ...     raise StopIteration()
> 
> 
> >>> list(((x if x < 5 else stop()) for x in range(10)))
> [0, 1, 2, 3, 4]

Wow, thanks to the three of you!
I think it's still not as clear what the code does as it would be with my
'while' suggestion. Particularly, the fact that this is not a simple 'if'-or-not
decision for individual elements of the list, but in fact terminates the list
with the first non-matching element (the while-like property) can easily be
overlooked.
However, I find it much more appealing to use built-in python semantics than to
resort to the also hard to read itertools.takewhile().
In addition, this is also the fastest solution that was brought up so far. In my
hands, it runs about 2x as fast as the equivalent takewhile construct, which in
turn is just marginally faster than Boris Borcic's suggestion:
|>>> def notyet(cond) :
	if cond :
		raise StopIteration
	return True

|>>> list(x for x in range(100) if notyet(x>10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

I guess, I'll use your solution in my code from now on.
Best,
Wolfgang
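For reference, the itertools.takewhile construct these snippets are being benchmarked against produces the same output with no StopIteration trickery:

```python
from itertools import takewhile

# Keep elements while the predicate holds; stop at the first failure.
result = list(takewhile(lambda x: not x > 10, range(100)))
print(result)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```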





From oscar.j.benjamin at gmail.com  Tue Jan 29 16:25:40 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Tue, 29 Jan 2013 15:25:40 +0000
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <CAJ78kjOOrcebp=Gkp++bf=k_tCKiCw36ibMw7j5PKCLPH=Xqwg@mail.gmail.com>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org> <51072650.5090808@pearwood.info>
	<CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>
	<97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com>
	<CAJ78kjOOrcebp=Gkp++bf=k_tCKiCw36ibMw7j5PKCLPH=Xqwg@mail.gmail.com>
Message-ID: <CAHVvXxT8wPPtXHk9kwqH1Acb_fkAx-uCvY9UMnzsH==CZwLcOQ@mail.gmail.com>

On 29 January 2013 11:51, yoav glazner <yoavglazner at gmail.com> wrote:
> Here is very similar version that works (tested on python27)
> >>> def stop():
> ...     next(iter([]))
>
> >>> list((i if i<50 else stop()) for i in range(100))
> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
> 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
> 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]

That's a great idea. You could also do:
>>> list(i for i in range(100) if i<50 or stop())

It's a shame it doesn't work for list/set/dict comprehensions, though.


Oscar


From zachary.ware+pyideas at gmail.com  Tue Jan 29 16:34:13 2013
From: zachary.ware+pyideas at gmail.com (Zachary Ware)
Date: Tue, 29 Jan 2013 09:34:13 -0600
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <CAHVvXxT8wPPtXHk9kwqH1Acb_fkAx-uCvY9UMnzsH==CZwLcOQ@mail.gmail.com>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org> <51072650.5090808@pearwood.info>
	<CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>
	<97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com>
	<CAJ78kjOOrcebp=Gkp++bf=k_tCKiCw36ibMw7j5PKCLPH=Xqwg@mail.gmail.com>
	<CAHVvXxT8wPPtXHk9kwqH1Acb_fkAx-uCvY9UMnzsH==CZwLcOQ@mail.gmail.com>
Message-ID: <CAKJDb-MDm6FwRd-anZ28U6EOF5+1LddojTstzOnMrxWZFJpbiA@mail.gmail.com>

On Jan 29, 2013 9:26 AM, "Oscar Benjamin" <oscar.j.benjamin at gmail.com>
wrote:
>
> On 29 January 2013 11:51, yoav glazner <yoavglazner at gmail.com> wrote:
> > Here is very similar version that works (tested on python27)
> > >>> def stop():
> > ...     next(iter([]))
> >
> > >>> list((i if i<50 else stop()) for i in range(100))
> > [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
> > 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
> > 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
>
> That's a great idea. You could also do:
> >>> list(i for i in range(100) if i<50 or stop())
>
> It's a shame it doesn't work for list/set/dict comprehensions, though.
>

I know I'm showing my ignorance here, but how are list/dict/set
comprehensions and generator expressions implemented differently, such that
the latter's for loop will catch a StopIteration and the former's won't?
Would it make sense to reimplement list/dict/set comprehensions as an
equivalent generator expression passed to the appropriate constructor, and
thereby allow the StopIteration trick to work for each of them as well?

Regards,

Zach Ware

From wolfgang.maier at biologie.uni-freiburg.de  Tue Jan 29 16:44:01 2013
From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier)
Date: Tue, 29 Jan 2013 15:44:01 +0000 (UTC)
Subject: [Python-ideas] while conditional in list comprehension ??
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org> <51072650.5090808@pearwood.info>
	<CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>
	<97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com>
	<CAJ78kjOOrcebp=Gkp++bf=k_tCKiCw36ibMw7j5PKCLPH=Xqwg@mail.gmail.com>
	<CAHVvXxT8wPPtXHk9kwqH1Acb_fkAx-uCvY9UMnzsH==CZwLcOQ@mail.gmail.com>
Message-ID: <loom.20130129T163910-565@post.gmane.org>

Oscar Benjamin <oscar.j.benjamin at ...> writes:

> 
> On 29 January 2013 11:51, yoav glazner <yoavglazner at ...> wrote:
> > Here is very similar version that works (tested on python27)
> > >>> def stop():
> > ...     next(iter([]))
> >
> > >>> list((i if i<50 else stop()) for i in range(100))
> > [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
> > 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
> > 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
> 
> That's a great idea. You could also do:
> >>> list(i for i in range(100) if i<50 or stop())
> 
> It's a shame it doesn't work for list/set/dict comprehensions, though.
> 
> Oscar
>

list(i for i in range(100) if i<50 or stop())
Really (!) nice (and 2x as fast as using itertools.takewhile())!

With the somewhat simpler (suggested earlier by Shane)

def stop():
   raise StopIteration

this should become part of the python cookbook!!

Thanks a real lot for working this out,
Wolfgang







From eliben at gmail.com  Tue Jan 29 17:00:07 2013
From: eliben at gmail.com (Eli Bendersky)
Date: Tue, 29 Jan 2013 08:00:07 -0800
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <CADiSq7cKrdgzBgENPYYXA5S-oCtT4Jxs5xnMwr750LUyKg+v6Q@mail.gmail.com>
References: <CAH0mxTQ9wfcR_R5SbXV0J63iVaB1PvYvSF06tRy=AibO2Qajfg@mail.gmail.com>
	<CADiSq7cKrdgzBgENPYYXA5S-oCtT4Jxs5xnMwr750LUyKg+v6Q@mail.gmail.com>
Message-ID: <CAF-Rda8B1Cw1vNZxmwuCP0AxHxhjdNQvoCWGqJsKmZsj4g0C3g@mail.gmail.com>

On Tue, Jan 29, 2013 at 3:50 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> On Tue, Jan 29, 2013 at 11:50 AM, Joao S. O. Bueno
> <jsbueno at python.org.br> wrote:
> > This idea is not new - but it is stalled  -
> > Last I remember it came around in Python-devel in 2010, in this thread:
> >
> http://mail.python.org/pipermail/python-dev/2010-November/thread.html#105967
>
> FWIW, since that last discussion, I've switched to using strings for
> my special constants, dumping them in a container if I need some kind
> of easy validity checking or iteration.
>
> That said, an enum type may still be useful for interoperability with
> other systems (databases, C APIs, etc).
>

I really wish there were an enum type in Python that made sense.
ISTM this has been raised numerous times, but no one has submitted a
good-enough proposal.

Eli
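The string-plus-container pattern Nick describes above can be sketched like this (the names are purely illustrative):

```python
# Plain strings as constants; a frozenset gives cheap validity
# checking and iteration when those are needed.
PENDING, RUNNING, DONE = "pending", "running", "done"
STATES = frozenset({PENDING, RUNNING, DONE})

def set_state(state):
    if state not in STATES:
        raise ValueError("unknown state: %r" % (state,))
    return state

print(set_state(RUNNING))
```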

From jsbueno at python.org.br  Tue Jan 29 17:01:28 2013
From: jsbueno at python.org.br (Joao S. O. Bueno)
Date: Tue, 29 Jan 2013 14:01:28 -0200
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <CAKJDb-MDm6FwRd-anZ28U6EOF5+1LddojTstzOnMrxWZFJpbiA@mail.gmail.com>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org> <51072650.5090808@pearwood.info>
	<CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>
	<97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com>
	<CAJ78kjOOrcebp=Gkp++bf=k_tCKiCw36ibMw7j5PKCLPH=Xqwg@mail.gmail.com>
	<CAHVvXxT8wPPtXHk9kwqH1Acb_fkAx-uCvY9UMnzsH==CZwLcOQ@mail.gmail.com>
	<CAKJDb-MDm6FwRd-anZ28U6EOF5+1LddojTstzOnMrxWZFJpbiA@mail.gmail.com>
Message-ID: <CAH0mxTQC-Lw_pLuKZ+hrmLCxcP8tB9DtdZRRvmGZTrkkrffpwQ@mail.gmail.com>

On 29 January 2013 13:34, Zachary Ware <zachary.ware+pyideas at gmail.com> wrote:
>
> On Jan 29, 2013 9:26 AM, "Oscar Benjamin" <oscar.j.benjamin at gmail.com>
> wrote:
>>
>> On 29 January 2013 11:51, yoav glazner <yoavglazner at gmail.com> wrote:
>> > Here is very similar version that works (tested on python27)
>> > >>> def stop():
>> > ...     next(iter([]))
>> >
>> > >>> list((i if i<50 else stop()) for i in range(100))
>> > [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
>> > 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
>> > 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
>>
>> That's a great idea. You could also do:
>> >>> list(i for i in range(100) if i<50 or stop())
>>
>> It's a shame it doesn't work for list/set/dict comprehensions, though.
>>
>
> I know I'm showing my ignorance here, but how are list/dict/set
> comprehensions and generator expressions implemented differently that one's
> for loop will catch a StopIteration and the others won't? Would it make
> sense to reimplement list/dict/set comprehensions as an equivalent generator
> expression passed to the appropriate constructor, and thereby allow the
> StopIteration trick to work for each of them as well?
>

That is because list/set/dict comprehensions are sort of "self-contained":
they expect the StopIteration to be raised only by the iterator in the
"for" part of the expression.

The generator expression, on the other hand, is an iterator in itself, and it is
expected to raise a StopIteration at some point.

The code put around it to actually execute it will catch the StopIteration -
and it won't care whether it was raised by the for iterator or by any other
expression in the generator.

I mean - when you do list(bla for bla in blargh) the generator is
exhausted inside the "list" call - and this generator exhaustion is
signaled by the StopIteration exception in both cases.


> Regards,
>
> Zach Ware
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>


From oscar.j.benjamin at gmail.com  Tue Jan 29 17:02:35 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Tue, 29 Jan 2013 16:02:35 +0000
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <CAKJDb-MDm6FwRd-anZ28U6EOF5+1LddojTstzOnMrxWZFJpbiA@mail.gmail.com>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org> <51072650.5090808@pearwood.info>
	<CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>
	<97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com>
	<CAJ78kjOOrcebp=Gkp++bf=k_tCKiCw36ibMw7j5PKCLPH=Xqwg@mail.gmail.com>
	<CAHVvXxT8wPPtXHk9kwqH1Acb_fkAx-uCvY9UMnzsH==CZwLcOQ@mail.gmail.com>
	<CAKJDb-MDm6FwRd-anZ28U6EOF5+1LddojTstzOnMrxWZFJpbiA@mail.gmail.com>
Message-ID: <CAHVvXxSXNG0D+OmEsy4Pwp0nynW=ED7Ut042hzW7AjDUdAv72A@mail.gmail.com>

On 29 January 2013 15:34, Zachary Ware <zachary.ware+pyideas at gmail.com> wrote:
>
> On Jan 29, 2013 9:26 AM, "Oscar Benjamin" <oscar.j.benjamin at gmail.com>
> wrote:
>>
>> On 29 January 2013 11:51, yoav glazner <yoavglazner at gmail.com> wrote:
>> > Here is very similar version that works (tested on python27)
>> >>>> def stop():
>> > next(iter([]))
>> >
>> >>>> list((i if i<50 else stop()) for i in range(100))
>> > [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
>> > 20,
>> > 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
>> > 39,
>> > 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
>>
>> That's a great idea. You could also do:
>> >>> list(i for i in range(100) if i<50 or stop())
>>
>> It's a shame it doesn't work for list/set/dict comprehensions, though.
>>
>
> I know I'm showing my ignorance here, but how are list/dict/set
> comprehensions and generator expressions implemented differently that one's
> for loop will catch a StopIteration and the others won't? Would it make
> sense to reimplement list/dict/set comprehensions as an equivalent generator
> expression passed to the appropriate constructor, and thereby allow the
> StopIteration trick to work for each of them as well?

A for loop is like a while loop with a try/except handler for
StopIteration. So the following are roughly equivalent:

# For loop
for x in iterable:
    func1(x)
else:
    func2()

# Equivalent loop
it = iter(iterable)
while True:
    try:
        x = next(it)
    except StopIteration:
        func2()
        break
    func1(x)
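A quick runnable check of that equivalence (here func1/func2 just record their calls, so the two versions can be compared directly):

```python
def run_for(iterable):
    calls = []
    for x in iterable:
        calls.append(("func1", x))
    else:
        # else runs when the loop finishes without break
        calls.append(("func2",))
    return calls

def run_while(iterable):
    calls = []
    it = iter(iterable)
    while True:
        try:
            x = next(it)
        except StopIteration:
            calls.append(("func2",))
            break
        calls.append(("func1", x))
    return calls

# Same call sequence whether the iterable is empty or not
assert run_for([1, 2]) == run_while([1, 2])
assert run_for([]) == run_while([])
```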

A list comprehension is just like an implicit for loop with limited
functionality so it looks like:

# List comp
results = [func1(x) for x in iterable if func2(x)]

# Equivalent loop
results = []
it = iter(iterable)
while True:
    try:
        x = next(it)
    except StopIteration:
        break
    # This part is outside the try/except
    if func2(x):
        results.append(func1(x))

The problem in the above is that we only catch StopIteration around
the call to next(). So if either of func1 or func2 raises
StopIteration the exception will propagate rather than terminate the
loop. (This may mean that it terminates a for loop higher in the call
stack - which can lead to confusing bugs - so it's important to always
catch StopIteration anywhere it might get raised.)

The difference with the list(generator) version is that func1() and
func2() are both called inside the call to next() from the perspective
of the list() function. This means that if they raise StopIteration
then the try/except handler in the enclosing list function will catch
it and terminate its loop.

# list(generator)
results = list(func1(x) for x in iterable if func2(x))

# Equivalent loop:
def list(iterable):
    it = iter(iterable)
    results = []
    while True:
        try:
            # Now func1 and func2 are both called in next() here
            x = next(it)
        except StopIteration:
            break
        results.append(x)
    return results

results_gen = (func1(x) for x in iterable if func2(x))
results = list(results_gen)
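For reference, the truncation the stop() trick achieves can be written portably with itertools.takewhile (a sketch; note that in later Python versions, PEP 479 converts a StopIteration raised inside a generator body into RuntimeError, which breaks the stop() trick but not this form):

```python
from itertools import takewhile

# Take elements while the predicate holds, then stop cleanly --
# no reliance on StopIteration escaping user code.
result = list(takewhile(lambda i: i < 50, range(100)))
```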


Oscar


From jsbueno at python.org.br  Tue Jan 29 17:09:20 2013
From: jsbueno at python.org.br (Joao S. O. Bueno)
Date: Tue, 29 Jan 2013 14:09:20 -0200
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <CAF-Rda8B1Cw1vNZxmwuCP0AxHxhjdNQvoCWGqJsKmZsj4g0C3g@mail.gmail.com>
References: <CAH0mxTQ9wfcR_R5SbXV0J63iVaB1PvYvSF06tRy=AibO2Qajfg@mail.gmail.com>
	<CADiSq7cKrdgzBgENPYYXA5S-oCtT4Jxs5xnMwr750LUyKg+v6Q@mail.gmail.com>
	<CAF-Rda8B1Cw1vNZxmwuCP0AxHxhjdNQvoCWGqJsKmZsj4g0C3g@mail.gmail.com>
Message-ID: <CAH0mxTQXW+ep2n0Fdz4WBLTNaGRnP4RD=CpqeFT5PJPzBF=oOw@mail.gmail.com>

On 29 January 2013 14:00, Eli Bendersky <eliben at gmail.com> wrote:
>
> On Tue, Jan 29, 2013 at 3:50 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>
>> On Tue, Jan 29, 2013 at 11:50 AM, Joao S. O. Bueno
>> <jsbueno at python.org.br> wrote:
>> > This idea is not new - but it is stalled  -
>> > Last I remember it came around in Python-devel in 2010, in this thread:
>> >
>> > http://mail.python.org/pipermail/python-dev/2010-November/thread.html#105967
>>
>> FWIW, since that last discussion, I've switched to using strings for
>> my special constants, dumping them in a container if I need some kind
>> of easy validity checking or iteration.
>>
>> That said, an enum type may still be useful for interoperability with
>> other systems (databases, C APIs, etc).
>
>
> I really wish there would be an enum type in Python that would make sense.
> ISTM this has been raised numerous times, but no one submitted a
> good-enough proposal.

As I pointed out above, that last discussion was converging on a good outcome.
Bad timing, and the fact that no one said in so many words "Michael Foord,
please make this into a PEP", made it fade away, I think.

  js
 -><-


>
> Eli
>
>


From zachary.ware+pyideas at gmail.com  Tue Jan 29 17:23:33 2013
From: zachary.ware+pyideas at gmail.com (Zachary Ware)
Date: Tue, 29 Jan 2013 10:23:33 -0600
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <CAHVvXxSXNG0D+OmEsy4Pwp0nynW=ED7Ut042hzW7AjDUdAv72A@mail.gmail.com>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org> <51072650.5090808@pearwood.info>
	<CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>
	<97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com>
	<CAJ78kjOOrcebp=Gkp++bf=k_tCKiCw36ibMw7j5PKCLPH=Xqwg@mail.gmail.com>
	<CAHVvXxT8wPPtXHk9kwqH1Acb_fkAx-uCvY9UMnzsH==CZwLcOQ@mail.gmail.com>
	<CAKJDb-MDm6FwRd-anZ28U6EOF5+1LddojTstzOnMrxWZFJpbiA@mail.gmail.com>
	<CAHVvXxSXNG0D+OmEsy4Pwp0nynW=ED7Ut042hzW7AjDUdAv72A@mail.gmail.com>
Message-ID: <CAKJDb-MtEuNVYu3yr+OUjiynbOUXm_+aZn3QRq+pcR7LbTHK8Q@mail.gmail.com>

On Jan 29, 2013 10:02 AM, "Oscar Benjamin" <oscar.j.benjamin at gmail.com>
wrote:
>
> On 29 January 2013 15:34, Zachary Ware <zachary.ware+pyideas at gmail.com>
wrote:
> >
> > On Jan 29, 2013 9:26 AM, "Oscar Benjamin" <oscar.j.benjamin at gmail.com>
> > wrote:
> >>
> >> On 29 January 2013 11:51, yoav glazner <yoavglazner at gmail.com> wrote:
> >> > Here is very similar version that works (tested on python27)
> >> >>>> def stop():
> >> > next(iter([]))
> >> >
> >> >>>> list((i if i<50 else stop()) for i in range(100))
> >> > [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
19,
> >> > 20,
> >> > 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37,
38,
> >> > 39,
> >> > 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
> >>
> >> That's a great idea. You could also do:
> >> >>> list(i for i in range(100) if i<50 or stop())
> >>
> >> It's a shame it doesn't work for list/set/dict comprehensions, though.
> >>
> >
> > I know I'm showing my ignorance here, but how are list/dict/set
> > comprehensions and generator expressions implemented differently that
one's
> > for loop will catch a StopIteration and the others won't? Would it make
> > sense to reimplement list/dict/set comprehensions as an equivalent
generator
> > expression passed to the appropriate constructor, and thereby allow the
> > StopIteration trick to work for each of them as well?
>
> A for loop is like a while loop with a try/except handler for
> StopIteration. So the following are roughly equivalent:
>
> # For loop
> for x in iterable:
>     func1(x)
> else:
>     func2()
>
> # Equivalent loop
> it = iter(iterable)
> while True:
>     try:
>         x = next(it)
>     except StopIteration:
>         func2()
>         break
>     func1(x)
>
> A list comprehension is just like an implicit for loop with limited
> functionality so it looks like:
>
> # List comp
> results = [func1(x) for x in iterable if func2(x)]
>
> # Equivalent loop
> results = []
> it = iter(iterable)
> while True:
>     try:
>         x = next(it)
>     except StopIteration:
>         break
>     # This part is outside the try/except
>     if func2(x):
>         results.append(func1(x))
>
> The problem in the above is that we only catch StopIteration around
> the call to next(). So if either of func1 or func2 raises
> StopIteration the exception will propagate rather than terminate the
> loop. (This may mean that it terminates a for loop higher in the call
> stack - which can lead to confusing bugs - so it's important to always
> catch StopIteration anywhere it might get raised.)
>
> The difference with the list(generator) version is that func1() and
> func2() are both called inside the call to next() from the perspective
> of the list() function. This means that if they raise StopIteration
> then the try/except handler in the enclosing list function will catch
> it and terminate its loop.
>
> # list(generator)
> results = list(func1(x) for x in iterable if func2(c))
>
> # Equivalent loop:
> def list(iterable):
>     it = iter(iterable)
>     results = []
>     while True:
>         try:
>             # Now func1 and func2 are both called in next() here
>             x = next(it)
>         except StopIteration:
>             break
>         results.append(x)
>     return results
>
> results_gen = (func1(x) for x in iterable if func2(x))
> results = list(results_gen)
>

That makes a lot of sense. Thank you, Oscar and Joao, for the explanations.
I wasn't thinking in enough scopes :)

Regards,

Zach Ware

From tjreedy at udel.edu  Tue Jan 29 17:41:09 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 29 Jan 2013 11:41:09 -0500
Subject: [Python-ideas] Interrupting threads
In-Reply-To: <20130129105443.2804520b@pitrou.net>
References: <51049915.3060808@mrabarnett.plus.com>
	<5106B372.5040803@mrabarnett.plus.com>
	<CAH_1eM0WbyBhMyyhcZP90TXdSWpL3oyXMa7c1GR_H8MBMfJezA@mail.gmail.com>
	<20130129105443.2804520b@pitrou.net>
Message-ID: <ke8u39$nvi$1@ger.gmane.org>

On 1/29/2013 4:54 AM, Antoine Pitrou wrote:
> Le Tue, 29 Jan 2013 08:23:33 +0100,
> Charles-François Natali
> <cf.natali at gmail.com> wrote:
>> - Really, "heed"? I've never had to look up a word in a dictionary
>> while reading a technical book/presentation/discussion before. I may
>> not be particularly good in English, but I'm positive this term will
>> puzzle many non native speakers...
>
> Ditto here. Now it's not unusual to have to learn new vocabulary, but
> "heed" is obscure and makes an API difficult to understand for me.

As a native American English speaker, 'heed' is not obscure. Heeding 
(paying close attention to) things such as the warnings in our fine 
manual may be out of style among gung-ho programmers, but I hope it is 
not archaic;-). That said, I can imagine that 'heed' falls below the 
threshold of usage frequency for words taught abroad.

> Of course, I sympathize with native English speakers who are annoyed
> by the prevalence of Globish over real English. That said, Python
> already mandates American English instead of British English.

Heeding warnings may be 'old-fashioned', but that does not make it British.

-- 
Terry Jan Reedy




From tjreedy at udel.edu  Tue Jan 29 18:28:25 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 29 Jan 2013 12:28:25 -0500
Subject: [Python-ideas] Canceled versus cancelled (was Re: Interrupting
	threads)
In-Reply-To: <ke8i82$rj6$1@ger.gmane.org>
References: <51049915.3060808@mrabarnett.plus.com>
	<5106B372.5040803@mrabarnett.plus.com>
	<CAH_1eM0WbyBhMyyhcZP90TXdSWpL3oyXMa7c1GR_H8MBMfJezA@mail.gmail.com>
	<20130129105443.2804520b@pitrou.net> <ke8i82$rj6$1@ger.gmane.org>
Message-ID: <ke90rt$khg$1@ger.gmane.org>

On 1/29/2013 8:18 AM, Richard Oudkerk wrote:
> On 29/01/2013 9:54am, Antoine Pitrou wrote:
>> Of course, I sympathize with native English speakers who are annoyed
>> by the prevalence of Globish over real English. That said, Python
>> already mandates American English instead of British English.
>
> Is Future.cancelled() an acceptable American spelling?

Slightly controversial, but 'Yes'. My 1960s Dictionary of the American 
language gives 'canceled' and 'cancelled'. Ditto for travel.  I see the 
same at modern web sites:
http://www.merriam-webster.com/dictionary/cancel
http://www.thefreedictionary.com/cancel

Both give the one el version first, and that might indicate a 
preference. But I was actually taught in school (some decades ago) to 
double the els of travel and cancel, and have read the rule in various
places. I suspect that is not done now. More discussion:

http://www.reference.com/motif/language/cancelled-vs-canceled
http://grammarist.com/spelling/cancel/

The latter has a Google ngram that shows 'canceled' has become more 
common in the U.S., but only in the last 30 years. It has even crept 
into British usage.

http://books.google.com/ngrams/graph?content=canceled%2Ccancelled&year_start=1800&year_end=2000&corpus=6&smoothing=3&share=

On the other hand, just about no one, even in the U.S., currently spells 
'cancellation' as 'cancelation'. That was tried by a few writers 1910 to 
1940, but never caught on.

http://books.google.com/ngrams/graph?content=cancelation%2Ccancellation&year_start=1800&year_end=2000&corpus=17&smoothing=3&share=

-- 
Terry Jan Reedy



From breamoreboy at yahoo.co.uk  Tue Jan 29 18:38:50 2013
From: breamoreboy at yahoo.co.uk (Mark Lawrence)
Date: Tue, 29 Jan 2013 17:38:50 +0000
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <201301291230.49247.mark.hackett@metoffice.gov.uk>
References: <1358903168.4767.4.camel@webb>
	<201301281745.16485.mark.hackett@metoffice.gov.uk>
	<5107BFE5.6010800@pearwood.info>
	<201301291230.49247.mark.hackett@metoffice.gov.uk>
Message-ID: <ke91d6$pur$1@ger.gmane.org>

On 29/01/2013 12:30, Mark Hackett wrote:
> On Tuesday 29 Jan 2013, Steven D'Aprano wrote:
>> On 29/01/13 04:45, Mark Hackett wrote:
>>> On Monday 28 Jan 2013, MRAB wrote:
>>>> It shouldn't silently drop the columns
>>>
>>> Why not?
>>>
>>> It's adding to a dictionary and adding a duplicate key replaces the
>>> earlier one.
>>
>> Then adding to a dictionary was a mistake.
>>
>
> I agree.
>
> So don't use DictReader in that case.
>
> We have Oscar with the method to do your own (and looked fairly simple and
> straightforward).
> Chris with carefuldictreader.
> Shane with his dual-retention object.
>

Please can we also have a 
RemoveTheNullByteThatsPutAtTheEndOfTheFileByBrainDeadMicrosoftMoney? :)

-- 
Cheers.

Mark Lawrence



From guido at python.org  Tue Jan 29 18:40:07 2013
From: guido at python.org (Guido van Rossum)
Date: Tue, 29 Jan 2013 09:40:07 -0800
Subject: [Python-ideas] Canceled versus cancelled (was Re: Interrupting
	threads)
In-Reply-To: <ke90rt$khg$1@ger.gmane.org>
References: <51049915.3060808@mrabarnett.plus.com>
	<5106B372.5040803@mrabarnett.plus.com>
	<CAH_1eM0WbyBhMyyhcZP90TXdSWpL3oyXMa7c1GR_H8MBMfJezA@mail.gmail.com>
	<20130129105443.2804520b@pitrou.net> <ke8i82$rj6$1@ger.gmane.org>
	<ke90rt$khg$1@ger.gmane.org>
Message-ID: <CAP7+vJLOsyC+PmdQtF6SjJxkPOguLe8SH+-Qttt-dVxhStXCvg@mail.gmail.com>

This is all pretty pointless given that PEP 3148 uses cancelled() and
concurrent.futures.Future has been released since Python 3.2.
Introducing single-ell aliases is just going to confuse things more.

On Tue, Jan 29, 2013 at 9:28 AM, Terry Reedy <tjreedy at udel.edu> wrote:
> On 1/29/2013 8:18 AM, Richard Oudkerk wrote:
>>
>> On 29/01/2013 9:54am, Antoine Pitrou wrote:
>>>
>>> Of course, I sympathize with native English speakers who are annoyed
>>> by the prevalence of Globish over real English. That said, Python
>>> already mandates American English instead of British English.
>>
>>
>> Is Future.cancelled() an acceptable American spelling?
>
>
> Slightly controversial, but 'Yes'. My 1960s Dictionary of the American
> language gives 'canceled' and 'cancelled'. Ditto for travel.  I see the same
> at modern web sites:
> http://www.merriam-webster.com/dictionary/cancel
> http://www.thefreedictionary.com/cancel
>
> Both give the one el version first, and that might indicate a preference.
> But I was actually taught in school (some decades ago) to double the els of
> travel and cancel, and have read the rule in various places. I suspect that is
> not done now. More discussion:
>
> http://www.reference.com/motif/language/cancelled-vs-canceled
> http://grammarist.com/spelling/cancel/
>
> The latter has a Google ngram that shows 'canceled' has become more common
> in the U.S., but only in the last 30 years. It has even crept into British
> usage.
>
> http://books.google.com/ngrams/graph?content=canceled%2Ccancelled&year_start=1800&year_end=2000&corpus=6&smoothing=3&share=
>
> On the other hand, just about no one, even in the U.S., currently spells
> 'cancellation' as 'cancelation'. That was tried by a few writers 1910 to
> 1940, but never caught on.
>
> http://books.google.com/ngrams/graph?content=cancelation%2Ccancellation&year_start=1800&year_end=2000&corpus=17&smoothing=3&share=
>
> --
> Terry Jan Reedy
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas



-- 
--Guido van Rossum (python.org/~guido)


From tjreedy at udel.edu  Tue Jan 29 19:07:48 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 29 Jan 2013 13:07:48 -0500
Subject: [Python-ideas] Interrupting threads
In-Reply-To: <ke8u39$nvi$1@ger.gmane.org>
References: <51049915.3060808@mrabarnett.plus.com>
	<5106B372.5040803@mrabarnett.plus.com>
	<CAH_1eM0WbyBhMyyhcZP90TXdSWpL3oyXMa7c1GR_H8MBMfJezA@mail.gmail.com>
	<20130129105443.2804520b@pitrou.net> <ke8u39$nvi$1@ger.gmane.org>
Message-ID: <ke935o$bqd$1@ger.gmane.org>

On 1/29/2013 11:41 AM, Terry Reedy wrote:
> On 1/29/2013 4:54 AM, Antoine Pitrou wrote:

>> Ditto here. Now it's not unusual to have to learn new vocabulary, but
>> "heed" is obscure and makes an API difficult to understand for me.
>
> As a native American English speaker, 'heed' is not obscure.

As I believe you have often said, we need some benchmark numbers. 
According to Google's Ngram, 'heed' is still about 5 times more common 
in American books than 'annoy' and 'sympathize', which you use in your 
next sentence.

>> Of course, I sympathize with native English speakers who are annoyed
>> by the prevalence of Globish over real English.

http://books.google.com/ngrams/graph?content=heed%2Ccancel%2Cannoy%2Csympathize&year_start=1800&year_end=2000&corpus=17&smoothing=3&share=

Talking about obscure words, I have not seen 'Globish' before and I had 
to search to discover that it was not your idiosyncratic coinage. I was 
really surprised to find that there is even a Wikipedia entry.

https://en.wikipedia.org/wiki/Globish

--
Terry Jan Reedy



From python at mrabarnett.plus.com  Tue Jan 29 19:14:25 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Tue, 29 Jan 2013 18:14:25 +0000
Subject: [Python-ideas] Interrupting threads
In-Reply-To: <CAGmFidZKQkUA46i=UEEDem8rexTy6-sbKhU=Zs4hi+p5C4Q5kA@mail.gmail.com>
References: <51049915.3060808@mrabarnett.plus.com>
	<5106B372.5040803@mrabarnett.plus.com>
	<CAH_1eM0WbyBhMyyhcZP90TXdSWpL3oyXMa7c1GR_H8MBMfJezA@mail.gmail.com>
	<CAGmFidZKQkUA46i=UEEDem8rexTy6-sbKhU=Zs4hi+p5C4Q5kA@mail.gmail.com>
Message-ID: <51081181.5010001@mrabarnett.plus.com>

On 2013-01-29 08:52, Amaury Forgeot d'Arc wrote:
> 2013/1/29 Charles-François Natali <cf.natali at gmail.com
> <mailto:cf.natali at gmail.com>>
>
>      > The point has been made that you don't want an interruption in the
>      > middle of an exception handling routine. That's true. You also don't
>      > want an interruption in the middle of a 'finally' block.
>
>     That's a good start :-)
>
>
> But is it feasible?
> Is it possible to handle the case where a finally block calls another
> Python function?
>
On entry to a finally block, interruption/cancellation is
disabled/suppressed, and it remains disabled until the try statement is
exited normally, at which point the original state is restored.

However, the code in the finally block or in a function called from the
finally block could re-enable it for a section of code with the context
manager, but the context manager would save the current (disabled)
state on entry and restore it on exit.
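A minimal sketch of that save/restore discipline, using a hypothetical per-thread flag and a context manager (the names InterruptState and allow_interrupts are illustrative, not an actual Python API):

```python
import threading

class InterruptState:
    """Hypothetical per-thread 'interruption enabled' flag (illustrative)."""
    def __init__(self):
        self._local = threading.local()

    @property
    def enabled(self):
        # Interruption is enabled by default
        return getattr(self._local, "enabled", True)

    def set_enabled(self, value):
        self._local.enabled = value

class allow_interrupts:
    """Set the flag on entry, restore the prior state on exit --
    the behavior described for finally blocks above."""
    def __init__(self, state, value=True):
        self._state, self._value = state, value

    def __enter__(self):
        self._prev = self._state.enabled
        self._state.set_enabled(self._value)
        return self._state

    def __exit__(self, *exc_info):
        self._state.set_enabled(self._prev)  # restore the saved state
        return False

state = InterruptState()
with allow_interrupts(state, False):      # e.g. entering a finally block
    with allow_interrupts(state, True):   # code may opt back in locally
        pass
    # back to disabled here
# original (enabled) state restored on exit
```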


From guido at python.org  Tue Jan 29 19:48:49 2013
From: guido at python.org (Guido van Rossum)
Date: Tue, 29 Jan 2013 10:48:49 -0800
Subject: [Python-ideas] libuv based eventloop for tulip experiment
In-Reply-To: <51070056.8020006@gmail.com>
References: <51070056.8020006@gmail.com>
Message-ID: <CAP7+vJJd0oZxzGqu_L0juxDwWvaPhd7cmC8azJSmn=brSnTv_w@mail.gmail.com>

On Mon, Jan 28, 2013 at 2:48 PM, Saúl Ibarra Corretgé <saghul at gmail.com> wrote:
> Hi all!
>
> I haven't been able to keep up with all the tulip development on the mailing
> list (hopefully I will now!) so please excuse me if something I mention has
> already been discussed.

Me neither! :-) Libuv has been brought up before, though I haven't
looked at it in detail. I think you're bringing up good stuff.

> For those who may not know it, libuv is the platform layer library for
> nodejs, which implements a uniform interface on top of epoll, kqueue, event
> ports and iocp. I wrote Python bindings [1] for it a while ago, and I was
> very excited to see Tulip, so I thought I'd give this a try.

Great to hear!

> Here [2] is the source code, along with some notes I took during the
> implementation.

Hm... I see you just copied all of tulip and then hacked on it for a
while. :-) I wonder if you could refactor things so that an app would
be able to dynamically choose between tulip's and rose's event loop
using tulip's EventLoopPolicy machinery? The app could just
instantiate tulip.unix_eventloop._UnixEventLoop() (yes, this should
really be renamed!) or rose.uv.EventLoop, but all its imports should
come from tulip.

Also, there's a refactoring of the event loop classes underway in
tulip's iocp branch -- this adds IOCP support on Windows.

> I know that the idea is not to re-implement the PEP itself but for people to
> create different EventLoop implementations. On rose I bundled tulip just to
> make a single package I could play with easily, once tulip makes it to the
> stdlib only the EventLoop will remain.

It will be a long time before tulip makes it into the stdlib -- but
for easy experimentation it should be possible for apps to choose
between tulip and rose without having to change all their tulip
imports to rose imports.

> Here are some thoughts (in no particular order):
>
> - add_connector / remove_connector seem to be related to Windows, but being
> exposed like that feels a bit like leaking an implementation detail. I guess
> there was no way around it.

They would only be needed if we ever were to support WSAPoll() on
Windows, but I'm pretty much decided against that (need to check with
Richard Oudkerk once more). Then we can kill add_connector and
remove_connector.

> - libuv implements a type of handle (Poll) which provides level-triggered
> file descriptor polling which also works on Windows, while being highly
> performant. It uses something called AFD Polling apparently, which is only
> available on Windows >= Vista, and a select thread on XP. I'm no Windows
> expert, but thanks to this the API is consistent across all platforms, which
> is nice. Maybe it's worth investigating? [3]

Again that's probably for Richard to look into. I have no idea how it
relates to IOCP.

> - The transport abstraction seems quite tied to socket objects.

I'm confused to hear you say this, since the APIs for transports and
protocols are one of the few places of PEP 3156 where sockets are
*not* explicitly mentioned. (Though they are used in the
implementations, but I am envisioning alternate implementations that
don't use sockets.)

> pyuv
> provides TCP and UDP handles, which provide a completion-style API and use
> a better approach than Poll handles.

So it implements TCP and UDP without socket objects? I actually like
this, because it validates my decision to keep socket objects out of
the transport/protocol APIs. (Note that PEP 3156 and Tulip currently
don't support UDP; it will require a somewhat different API between
transports and protocols.)

> They should give better performance
> since EINTR is handled internally and there are fewer roundtrips between
> Python-land and C-land.

Why would EINTR handling be important? That should occur almost never.
Or did you mean EAGAIN?

> Was it ever considered to provide some sort of
> abstraction so that transports can be used on top of something other than
> regular sockets? For example I see no way to get the remote party from the
> transport, without checking the underlying socket.

This we are considering in another thread -- there are in fact two
proposals on the table, one to add transport methods get_name() and
get_peer(), which should return (host, port) pairs if possible, or
None if the transport is not talking to an IP connection (or there are
too many layers in between to dig out that information). The other
proposal is a more generic API to get info out of the transport, e.g.
get_extra_info("name") and get_extra_info("peer"), which can be more
easily extended (without changing the PEP) to support other things,
e.g. certificate info if the transport implements SSL.
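The second proposal can be sketched in a few lines (get_extra_info and the "peer"/"name" keys follow the description above; the class and sample values are made up for illustration):

```python
class SketchTransport:
    """Illustrative transport carrying optional metadata in a dict."""
    def __init__(self, **extra):
        self._extra = extra

    def get_extra_info(self, name, default=None):
        # Unknown keys fall back to a default instead of raising, so new
        # kinds of info can be added without changing the API surface.
        return self._extra.get(name, default)

t = SketchTransport(peer=("192.0.2.1", 8080), name=("192.0.2.7", 54321))
t.get_extra_info("peer")         # the (host, port) of the remote party
t.get_extra_info("certificate")  # None: this transport has no SSL info
```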

> Thanks for reading this far and keep up the good work.

Thanks for looking at this and reimplementing PEP 3156 on top of
libuv! This is exactly the kind of thing I am hoping for.

> Regards,
>
> [1]: https://github.com/saghul/pyuv
> [2]: https://github.com/saghul/rose
> [3]: https://github.com/joyent/libuv/blob/master/src/win/poll.c
>
> --
> Saúl Ibarra Corretgé
> http://saghul.net/blog | http://about.me/saghul

-- 
--Guido van Rossum (python.org/~guido)


From tjreedy at udel.edu  Tue Jan 29 19:53:59 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 29 Jan 2013 13:53:59 -0500
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <5107E490.9070501@btinternet.com>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org> <51072650.5090808@pearwood.info>
	<CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>
	<5107E490.9070501@btinternet.com>
Message-ID: <ke95sc$7h1$1@ger.gmane.org>

On 1/29/2013 10:02 AM, Rob Cliffe wrote:
>
> On 29/01/2013 10:44, Nick Coghlan wrote:
>> Terry is correct: comprehensions are deliberately designed to have the
>> exact same looping semantics as the equivalent statements flattened
>> out into a single line, with the innermost expression lifted out of
>> the loop body and placed in front. This then works to arbitrarily deep
>> nesting levels. The surrounding syntax (parentheses, brackets, braces,
>> and whether or not there is a colon present in the main expression)
>> then governs what kind of result you get (generator-iterator, list,
>> set, dict).
>>
>> For example in:
>>
>>     ((x, y, z) for x in a if x for y in b if y for z in c if z)
>>     [(x, y, z) for x in a if x for y in b if y for z in c if z]
>>     {(x, y, z) for x in a if x for y in b if y for z in c if z}
>>     {x: (y, z) for x in a if x for y in b if y for z in c if z}
>>
>> The looping semantics of these expressions are all completely defined
>> by the equivalent statements:
>>
>>      for x in a:
>>          if x:
>>              for y in b:
>>                  if y:
>>                      for z in c:
>>                          if z:
>>
>> (modulo a few name lookup quirks if you're playing with class scopes)
>>
> Thanks for spelling this out so clearly.  It helps me remember which
> order to place nested "for"s inside a list comprehension! :-)

The reference manual does spell it out: "In this case, the elements of 
the new container are those that would be produced by considering each 
of the for or if clauses a block, nesting from left to right, and 
evaluating the expression to produce an element each time the innermost 
block is reached." Perhaps a non-trivial concrete example (say 4 levels 
deep) would help people understand that better.
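For instance, a concrete, runnable check of that left-to-right nesting rule (sample lists chosen so the filters actually drop something):

```python
a, b, c = [0, 1, 2], [0, 10], [0, 100]

comp = [(x, y, z) for x in a if x for y in b if y for z in c if z]

# Clauses nest left to right; the innermost expression produces an
# element each time the innermost block is reached.
result = []
for x in a:
    if x:
        for y in b:
            if y:
                for z in c:
                    if z:
                        result.append((x, y, z))

assert comp == result  # [(1, 10, 100), (2, 10, 100)]
```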

-- 
Terry Jan Reedy



From eric at trueblade.com  Tue Jan 29 19:49:01 2013
From: eric at trueblade.com (Eric V. Smith)
Date: Tue, 29 Jan 2013 13:49:01 -0500
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <201301291235.01513.mark.hackett@metoffice.gov.uk>
References: <1358903168.4767.4.camel@webb>
	<201301281745.16485.mark.hackett@metoffice.gov.uk>
	<5107BFE5.6010800@pearwood.info>
	<201301291235.01513.mark.hackett@metoffice.gov.uk>
Message-ID: <5108199D.2000601@trueblade.com>

On 01/29/2013 07:35 AM, Mark Hackett wrote:
> On Tuesday 29 Jan 2013, Steven D'Aprano wrote:
>>> If it dropped the columns and shouldn't have, then the results will be
>>> seen to be wrong anyway, so there's not a huge amount of need for this.
>>
>> You cannot assume that the caller knows that there are duplicated column
>>  names
>>
> 
> You cannot assume they wanted them as a list.
> 
> You cannot assume that duplicate replacement is what they want.
> 
> If someone is using a csv file with header names they have never read, how are 
> they going to use the data? They won't even know the name to access the value 
> in the dictionary! So I discard the claim that the caller may not know the 
> column names are duplicated. They have to know what the headers are to use 
> DictReader.

Not true: I process some csv files just to translate them into another
format, say tab delimited. I don't care about the column names, but
dropping columns would sure bother me. I don't think any of the files
I've processed have duplicate columns, but I wouldn't swear to it. And
if they did, that would be an error I'd like to know about.
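Detecting that error up front takes only a few lines on top of csv.reader (a Python 3 sketch; strict_dict_rows is a made-up name, not part of the csv module):

```python
import csv
import io

def strict_dict_rows(f):
    """Like csv.DictReader, but raise on duplicate header names
    instead of silently keeping only the last duplicate column."""
    reader = csv.reader(f)
    header = next(reader)
    duplicates = sorted({name for name in header if header.count(name) > 1})
    if duplicates:
        raise ValueError("duplicate column names: %s" % duplicates)
    for row in reader:
        yield dict(zip(header, row))

rows = strict_dict_rows(io.StringIO("a,b\n1,2\n"))
next(rows)  # {'a': '1', 'b': '2'}

# A file with a repeated "a" column raises ValueError instead of
# silently dropping data:
try:
    next(strict_dict_rows(io.StringIO("a,b,a\n1,2,3\n")))
except ValueError as e:
    pass  # duplicate column names: ['a']
```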

Eric.




From tjreedy at udel.edu  Tue Jan 29 20:11:22 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 29 Jan 2013 14:11:22 -0500
Subject: [Python-ideas] Canceled versus cancelled (was Re: Interrupting
	threads)
In-Reply-To: <CAP7+vJLOsyC+PmdQtF6SjJxkPOguLe8SH+-Qttt-dVxhStXCvg@mail.gmail.com>
References: <51049915.3060808@mrabarnett.plus.com>
	<5106B372.5040803@mrabarnett.plus.com>
	<CAH_1eM0WbyBhMyyhcZP90TXdSWpL3oyXMa7c1GR_H8MBMfJezA@mail.gmail.com>
	<20130129105443.2804520b@pitrou.net>
	<ke8i82$rj6$1@ger.gmane.org> <ke90rt$khg$1@ger.gmane.org>
	<CAP7+vJLOsyC+PmdQtF6SjJxkPOguLe8SH+-Qttt-dVxhStXCvg@mail.gmail.com>
Message-ID: <ke96sv$fp8$1@ger.gmane.org>

On 1/29/2013 12:40 PM, Guido van Rossum wrote:
> This is all pretty pointless given that PEP 3148 uses cancelled() and
> concurrent.futures.Future has been released since Python 3.2.

I should have added that I considered 'cancelled' the right choice now 
for a global language. The case is quite different from 'color' versus 
'colour'.

> Introducing single-ell aliases is just going to confuse things more.

So this need not be considered for perhaps 50 to 100 years ;-).

-- 
Terry Jan Reedy



From turnbull at sk.tsukuba.ac.jp  Tue Jan 29 20:19:30 2013
From: turnbull at sk.tsukuba.ac.jp (Stephen J. Turnbull)
Date: Wed, 30 Jan 2013 04:19:30 +0900
Subject: [Python-ideas] csv.DictReader could handle headers
	more	intelligently.
In-Reply-To: <5108199D.2000601@trueblade.com>
References: <1358903168.4767.4.camel@webb>
	<201301281745.16485.mark.hackett@metoffice.gov.uk>
	<5107BFE5.6010800@pearwood.info>
	<201301291235.01513.mark.hackett@metoffice.gov.uk>
	<5108199D.2000601@trueblade.com>
Message-ID: <87boc723wd.fsf@uwakimon.sk.tsukuba.ac.jp>

Eric V. Smith writes:

 > Not true: I process some csv files just to translate them into another
 > format, say tab delimited. I don't care about the column names,

Then you'd be nuts to use csv.DictReader!  csv.reader does exactly
what you want.

DictReader is about transforming a data format from a sequence of rows
of values accessed by position, one of which might be a header, to a
headerless sequence of objects with values accessed by name.  If your
use case doesn't involve access by name, it is irrelevant.
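For what it's worth, the "dropped columns" under discussion are plain
dict-key collisions. A quick sketch with Python 3's csv module and invented
data:

```python
import csv
import io

# Two columns share the header "id"; DictReader zips field names with
# row values, so the later duplicate silently overwrites the earlier one.
data = "id,name,id\n1,Alice,99\n"
rows = list(csv.DictReader(io.StringIO(data)))
print(rows[0]["id"])    # the first "id" column is gone
print(rows[0]["name"])
```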



From rob.cliffe at btinternet.com  Tue Jan 29 20:16:57 2013
From: rob.cliffe at btinternet.com (Rob Cliffe)
Date: Tue, 29 Jan 2013 19:16:57 +0000
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <ke95sc$7h1$1@ger.gmane.org>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org> <51072650.5090808@pearwood.info>
	<CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>
	<5107E490.9070501@btinternet.com> <ke95sc$7h1$1@ger.gmane.org>
Message-ID: <51082029.5000103@btinternet.com>


On 29/01/2013 18:53, Terry Reedy wrote:
> On 1/29/2013 10:02 AM, Rob Cliffe wrote:
>>
>> On 29/01/2013 10:44, Nick Coghlan wrote:
>>> Terry is correct: comprehensions are deliberately designed to have the
>>> exact same looping semantics as the equivalent statements flattened
>>> out into a single line, with the innermost expression lifted out of
>>> the loop body and placed in front. This then works to arbitrarily deep
>>> nesting levels. The surrounding syntax (parentheses, brackets, braces,
>>> and whether or not there is a colon present in the main expression)
>>> then governs what kind of result you get (generator-iterator, list,
>>> set, dict).
>>>
>>> For example in:
>>>
>>>     ((x, y, z) for x in a if x for y in b if y for z in c if z)
>>>     [(x, y, z) for x in a if x for y in b if y for z in c if z]
>>>     {(x, y, z) for x in a if x for y in b if y for z in c if z}
>>>     {x: (y, z) for x in a if x for y in b if y for z in c if z}
>>>
>>> The looping semantics of these expressions are all completely defined
>>> by the equivalent statements:
>>>
>>>      for x in a:
>>>          if x:
>>>              for y in b:
>>>                  if y:
>>>                      for z in c:
>>>                          if z:
>>>
>>> (modulo a few name lookup quirks if you're playing with class scopes)
>>>
>> Thanks for spelling this out so clearly.  It helps me remember which
>> order to place nested "for"s inside a list comprehension! :-)
>
> The reference manual does spell it out: "In this case, the elements of 
> the new container are those that would be produced by considering each 
> of the for or if clauses a block, nesting from left to right, and 
> evaluating the expression to produce an element each time the 
> innermost block is reached." Perhaps a non-trivial concrete example 
> (say 4 levels deep) would help people understand that better.
>
Definitely.  +1.  Though I think 3 levels is enough.
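As a runnable illustration of that equivalence (three levels deep, with
made-up inputs):

```python
a, b, c = [0, 1], [0, 2], [0, 3]

# Comprehension form: clauses read left to right...
flat = [(x, y, z) for x in a if x for y in b if y for z in c if z]

# ...and nest in exactly that order when spelled out as statements.
nested = []
for x in a:
    if x:
        for y in b:
            if y:
                for z in c:
                    if z:
                        nested.append((x, y, z))

print(flat == nested)
print(flat)
```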


From ethan at stoneleaf.us  Tue Jan 29 20:14:48 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Tue, 29 Jan 2013 11:14:48 -0800
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <ke95sc$7h1$1@ger.gmane.org>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org> <51072650.5090808@pearwood.info>
	<CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>
	<5107E490.9070501@btinternet.com> <ke95sc$7h1$1@ger.gmane.org>
Message-ID: <51081FA8.4010002@stoneleaf.us>

On 01/29/2013 10:53 AM, Terry Reedy wrote:
> On 1/29/2013 10:02 AM, Rob Cliffe wrote:
>>
>> On 29/01/2013 10:44, Nick Coghlan wrote:
>>> Terry is correct: comprehensions are deliberately designed to have the
>>> exact same looping semantics as the equivalent statements flattened
>>> out into a single line, with the innermost expression lifted out of
>>> the loop body and placed in front. This then works to arbitrarily deep
>>> nesting levels. The surrounding syntax (parentheses, brackets, braces,
>>> and whether or not there is a colon present in the main expression)
>>> then governs what kind of result you get (generator-iterator, list,
>>> set, dict).
>>>
>>> For example in:
>>>
>>>     ((x, y, z) for x in a if x for y in b if y for z in c if z)
>>>     [(x, y, z) for x in a if x for y in b if y for z in c if z]
>>>     {(x, y, z) for x in a if x for y in b if y for z in c if z}
>>>     {x: (y, z) for x in a if x for y in b if y for z in c if z}
>>>
>>> The looping semantics of these expressions are all completely defined
>>> by the equivalent statements:
>>>
>>>      for x in a:
>>>          if x:
>>>              for y in b:
>>>                  if y:
>>>                      for z in c:
>>>                          if z:
>>>
>>> (modulo a few name lookup quirks if you're playing with class scopes)
>>>
>> Thanks for spelling this out so clearly.  It helps me remember which
>> order to place nested "for"s inside a list comprehension! :-)
>
> The reference manual does spell it out: "In this case, the elements of
> the new container are those that would be produced by considering each
> of the for or if clauses a block, nesting from left to right, and
> evaluating the expression to produce an element each time the innermost
> block is reached." Perhaps a non-trivial concrete example (say 4 levels
> deep) would help people understand that better.

+1

The picture is much more enlightening (to me, anyway) than the words!

~Ethan~



From eric at trueblade.com  Tue Jan 29 20:21:58 2013
From: eric at trueblade.com (Eric V. Smith)
Date: Tue, 29 Jan 2013 14:21:58 -0500
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <87boc723wd.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <1358903168.4767.4.camel@webb>
	<201301281745.16485.mark.hackett@metoffice.gov.uk>
	<5107BFE5.6010800@pearwood.info>
	<201301291235.01513.mark.hackett@metoffice.gov.uk>
	<5108199D.2000601@trueblade.com>
	<87boc723wd.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <51082156.40702@trueblade.com>

On 01/29/2013 02:19 PM, Stephen J. Turnbull wrote:
> Eric V. Smith writes:
> 
>  > Not true: I process some csv files just to translate them into another
>  > format, say tab delimited. I don't care about the column names,
> 
> Then you'd be nuts to use csv.DictReader!  csv.reader does exactly
> what you want.
> 
> DictReader is about transforming a data format from a sequence of rows
> of values accessed by position, one of which might be a header, to a
> headerless sequence of objects with values accessed by name.  If your
> use case doesn't involve access by name, it is irrelevant.

True. But my point stands: it's possible to read the data (even with a
DictReader), do something with the data, and not know the column names
in advance. It's not an impossible use case.

Eric.



From saghul at gmail.com  Tue Jan 29 21:08:33 2013
From: saghul at gmail.com (=?ISO-8859-1?Q?Sa=FAl_Ibarra_Corretg=E9?=)
Date: Tue, 29 Jan 2013 21:08:33 +0100
Subject: [Python-ideas] libuv based eventloop for tulip experiment
In-Reply-To: <CAP7+vJJd0oZxzGqu_L0juxDwWvaPhd7cmC8azJSmn=brSnTv_w@mail.gmail.com>
References: <51070056.8020006@gmail.com>
	<CAP7+vJJd0oZxzGqu_L0juxDwWvaPhd7cmC8azJSmn=brSnTv_w@mail.gmail.com>
Message-ID: <51082C41.2030508@gmail.com>

Hi!

[snip]

>
>> Here [2] is the source code, along with some notes I took during the
>> implementation.
>
> Hm... I see you just copied all of tulip and then hacked on it for a
> while. :-) I wonder if you could refactor things so that an app would
> be able to dynamically choose between tulip's and rose's event loop
> using tulip's EventLoopPolicy machinery? The app could just
> instantiate tulip.unix_eventloop._UnixEventLoop() (yes, this should
> really be renamed!) or rose.uv.EventLoop, but all its imports should
> come from tulip.
>
> Also, there's a refactoring of the event loop classes underway in
> tulip's iocp branch -- this adds IOCP support on Windows.
>

Sure, that's the idea; I just put everything together so that it would 
still run even if some API changes :-) Anyway, since I plan to follow 
this more closely, I'll definitely go for that, and rose will just create 
a new EventLoopPolicy which uses the uv event loop.

>> I know that the idea is not to re-implement the PEP itself but for people to
>> create different EventLoop implementations. On rose I bundled tulip just to
>> make a single package I could play with easily, once tulip makes it to the
>> stdlib only the EventLoop will remain.
>
> It will be a long time before tulip makes it into the stdlib -- but
> for easy experimentation it should be possible for apps to choose
> between tulip and rose without having to change all their tulip
> imports to rose imports.
>

Agreed.

>> Here are some thoughts (in no particular order):
>>
>> - add_connector / remove_connector seem to be related to Windows, but being
>> exposed like that feels a bit like leaking an implementation detail. I guess
>> there was no way around it.
>
> They would only be needed if we ever were to support WSAPoll() on
> Windows, but I'm pretty much decided against that (need to check with
> Richard Oudkerk once more). Then we can kill add_connector and
> remove_connector.
>

Ok, good to hear :-)

>> - libuv implements a type of handle (Poll) which provides level-triggered
>> file descriptor polling which also works on Windows, while being highly
>> performant. It uses something called AFD Polling apparently, which is only
>> available on Windows >= Vista, and a select thread on XP. I'm no Windows
>> expert, but thanks to this the API is consistent across all platforms, which
>> is nice. Maybe it's worth investigating? [3]
>
> Again that's probably for Richard to look into. I have no idea how it
> relates to IOCP.

I'm no windows expert either :-) AFAIS, IOCP provides a completion-based 
interface, but many people/libraries are used to level-triggered 
readiness notifications. It's apparently not easy to have unix style 
file descriptor polling in Windows, but that AFD Poll stuff (fairy dust 
to me, to be honest) does the trick. It only works for sockets, but I 
guess that's ok.

>
>> - The transport abstraction seems quite tight to socket objects.
>
> I'm confused to hear you say this, since the APIs for transports and
> protocols are one of the few places of PEP 3156 where sockets are
> *not* explicitly mentioned. (Though they are used in the
> implementations, but I am envisioning alternate implementations that
> don't use sockets.)
>

Indeed I meant the implementation. For example, right now start_serving 
returns a Python socket object; maybe some sort of ServerHandler class 
could hide that and provide some convenience methods such as 
getsockname. If the event loop implementation uses Python sockets it 
could just call the function on the underlying socket, but other 
implementations may have other means to gather that information.

>> pyuv
>> provides a TCP and UDP handles, which provide a completion-style API and use
>> a better approach than Poll handles.
>
> So it implements TCP and UDP without socket objects? I actually like
> this, because it validates my decision to keep socket objects out of
> the transport/protocol APIs. (Note that PEP 3156 and Tulip currently
> don't support UDP; it will require a somewhat different API between
> transports and protocols.)
>

Yes, the TCP and UDP handles from pyuv are wrappers around their 
corresponding types in libuv. They exist because JS doesn't have sockets, 
so they had to create them for nodejs. The API, however, is completion 
style; here is a simple example of how data is read from a TCP handle:

def on_data_received(handle, data, error):
     if error == pyuv.error.UV_EOF:
         # Remote closed the connection
         handle.close()
         return
     print(data)

tcp_handle.start_read(on_data_received)

This model actually fits pretty well in tulip's transport/protocol 
mechanism.

>> They should give better performance
>> since EINTR is handled internally and there are fewer roundtrips between
>> Python-land and C-land.
>
> Why would EINTR handling be important? That should occur almost never.
> Or did you mean EAGAIN?
>

Actually, both. If the process receives a signal, epoll_wait would be 
interrupted, and libuv takes care of rearming the file descriptor, which 
happens in C without the GIL. Same goes for EAGAIN: basically libuv 
tries to read 64k chunks when start_read is called, and it automatically 
retries on EAGAIN. I don't have numbers to back this up (yet), but 
conceptually it sounds pretty plausible.
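As a Python-level illustration of what libuv does in C, a retry-on-EINTR
wrapper might look like the sketch below. The function name is made up, and
note that Python 3.5+ (PEP 475) retries EINTR automatically, so this matters
mainly for older interpreters:

```python
import errno
import select

def eintr_safe_select(rlist, wlist, xlist, timeout):
    # Retry select() if a signal interrupts the syscall (EINTR),
    # mimicking what libuv does internally in C without re-entering
    # Python on every interruption.
    while True:
        try:
            return select.select(rlist, wlist, xlist, timeout)
        except OSError as e:
            if e.errno != errno.EINTR:
                raise
```

On pre-3.5 interpreters every interrupted poll surfaces as an exception like
this; libuv simply loops in C instead, GIL released.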

>> Was it ever considered to provide some sort of
>> abstraction so that transports can be used on top of something other than
>> regular sockets? For example I see no way to get the remote party from the
>> transport, without checking the underlying socket.
>
> This we are considering in another thread -- there are in fact two
> proposals on the table, one to add transport methods get_name() and
> get_peer(), which should return (host, port) pairs if possible, or
> None if the transport is not talking to an IP connection (or there are
> too many layers in between to dig out that information). The other
> proposal is a more generic API to get info out of the transport, e.g.
> get_extra_info("name") and get_extra_info("peer"), which can be more
> easily extended (without changing the PEP) to support other things,
> e.g. certificate info if the transport implements SSL.
>

The second model seems more flexible indeed. I guess the SSL transport 
could be tricky: while Tulip currently uses the ssl module, I have no 
TLS handle in pyuv, so I'd have to build one on top of a TCP handle with 
pyOpenSSL (I have a prototype here [1]). Object types / APIs wouldn't 
match then, unless Tulip provides some wrappers for SSL-related objects 
such as certificates...

>> Thanks for reading this far and keep up the good work.
>
> Thanks for looking at this and reimplementing PEP 3156 on top of
> libuv! This is exactly the kind of thing I am hoping for.
>

I'll follow up the discussion closer now :-)


[1]: https://gist.github.com/4599801#file-uvtls-py

Regards,

-- 
Saúl Ibarra Corretgé
http://saghul.net/blog | http://about.me/saghul


From guido at python.org  Tue Jan 29 21:26:52 2013
From: guido at python.org (Guido van Rossum)
Date: Tue, 29 Jan 2013 12:26:52 -0800
Subject: [Python-ideas] libuv based eventloop for tulip experiment
In-Reply-To: <51082C41.2030508@gmail.com>
References: <51070056.8020006@gmail.com>
	<CAP7+vJJd0oZxzGqu_L0juxDwWvaPhd7cmC8azJSmn=brSnTv_w@mail.gmail.com>
	<51082C41.2030508@gmail.com>
Message-ID: <CAP7+vJLVyps8UgB+Yx6dkmshn1gqEXxAWoOkbs-m96WQtpzvhA@mail.gmail.com>

On Tue, Jan 29, 2013 at 12:08 PM, Saúl Ibarra Corretgé <saghul at gmail.com> wrote:
> [snip]
[snip*2]
> I'm no windows expert either :-) AFAIS, IOCP provides a completion-based
> interface, but many people/libraries are used to level-triggered readiness
> notifications. It's apparently not easy to have unix style file descriptor
> polling in Windows, but that AFD Poll stuff (fairy dust to me, to be honest)
> does the trick. It only works for sockets, but I guess that's ok.

Yeah, so do the other polling things on Windows. (Well, mostly
sockets. There are some other things supported like named pipes.)

I guess in order to support this we'd need some kind of abstraction
away from socket objects and file descriptors, at least for event loop
methods like sock_recv() and add_reader(). But those are mostly meant
for transports to build upon, so I think that would be fine.

>>> - The transport abstraction seems quite tight to socket objects.

>> I'm confused to hear you say this, since the APIs for transports and
>> protocols are one of the few places of PEP 3156 where sockets are
>> *not* explicitly mentioned. (Though they are used in the
>> implementations, but I am envisioning alternate implementations that
>> don't use sockets.)

> Indeed I meant the implementation. For example, right now start_serving
> returns a Python socket object; maybe some sort of ServerHandler class could
> hide that and provide some convenience methods such as getsockname. If the
> event loop implementation uses Python sockets it could just call the
> function on the underlying socket, but other implementations may have
> other means to gather that information.

Ah, yes, the start_serving() API. It is far from ready. :-(

>>> pyuv
>>> provides a TCP and UDP handles, which provide a completion-style API and
>>> use
>>> a better approach than Poll handles.

>> So it implements TCP and UDP without socket objects? I actually like
>> this, because it validates my decision to keep socket objects out of
>> the transport/protocol APIs. (Note that PEP 3156 and Tulip currently
>> don't support UDP; it will require a somewhat different API between
>> transports and protocols.)

> Yes, the TCP and UDP handles from pyuv are wrappers around their corresponding
> types in libuv. They exist because JS doesn't have sockets, so they had to
> create them for nodejs. The API, however, is completion style; here is a
> simple example of how data is read from a TCP handle:
>
> def on_data_received(handle, data, error):
>     if error == pyuv.error.UV_EOF:
>         # Remote closed the connection
>         handle.close()
>         return
>     print(data)
>
> tcp_handle.start_read(on_data_received)
>
> This model actually fits pretty well in tulip's transport/protocol
> mechanism.

Yeah, I see. If we squint and read "handle" instead of "socket" we
could even make it so that loop.sock_recv() takes one of these -- it
would return a Future and your callback would set the Future's result,
or its exception if an error was set.
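
That squint can be sketched with stdlib pieces; `start_read` here stands in
for a pyuv-style callback API and is an assumption, not a real signature:

```python
import concurrent.futures

def read_into_future(start_read):
    # Adapt a completion-callback read API to a Future: the callback
    # fills in the result, or the exception if an error was reported.
    fut = concurrent.futures.Future()

    def on_data(handle, data, error):
        if error is not None:
            fut.set_exception(OSError(error))
        else:
            fut.set_result(data)

    start_read(on_data)
    return fut

# A fake handle whose start_read invokes the callback immediately.
def fake_start_read(callback):
    callback(None, b"hello", None)

fut = read_into_future(fake_start_read)
print(fut.result())
```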

>>> They should give better performance
>>> since EINTR is handled internally and there are fewer roundtrips between
>>> Python-land and C-land.

>> Why would EINTR handling be important? That should occur almost never.
>> Or did you mean EAGAIN?

> Actually, both. If the process receives a signal, epoll_wait would be
> interrupted, and libuv takes care of rearming the file descriptor, which
> happens in C without the GIL. Same goes for EAGAIN: basically libuv tries to
> read 64k chunks when start_read is called, and it automatically retries on
> EAGAIN. I don't have numbers to back this up (yet), but conceptually it
> sounds pretty plausible.

Hm. Anything that uses signals for its normal operation sounds highly
suspect to me. But it probably doesn't matter either way.

>>> Was it ever considered to provide some sort of
>>> abstraction so that transports can be used on top of something other than
>>> regular sockets? For example I see no way to get the remote party from
>>> the transport, without checking the underlying socket.

>> This we are considering in another thread -- there are in fact two
>> proposals on the table, one to add transport methods get_name() and
>> get_peer(), which should return (host, port) pairs if possible, or
>> None if the transport is not talking to an IP connection (or there are
>> too many layers in between to dig out that information). The other
>> proposal is a more generic API to get info out of the transport, e.g.
>> get_extra_info("name") and get_extra_info("peer"), which can be more
>> easily extended (without changing the PEP) to support other things,
>> e.g. certificate info if the transport implements SSL.

> The second model seems more flexible indeed. I guess the SSL transport could
> be tricky, because while currently Tulip uses the ssl module I have no TLS
> handle on pyuv so I'd have to build one on top of a TCP handle with
> pyOpenSSL (I have a prototype here [1]), so object types / APIs wouldn't
> match, unless Tulip provides some wrappers for SSL related objects such as
> certificates...

Hm, I thought certificates were just blobs of data? We should probably
come up with a standard way to represent these that isn't tied to the
stdlib's ssl module. But I don't think this should be part of PEP 3156
-- it's too big already.
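
A minimal sketch of that second, generic proposal (the key names here are
illustrative, not part of any spec yet):

```python
class Transport:
    # Dict-backed "extra info" lookup: new keys (peer address, SSL
    # certificate, ...) can be added later without changing the
    # method's signature or the PEP-level API.
    def __init__(self, **extra):
        self._extra = extra

    def get_extra_info(self, name, default=None):
        return self._extra.get(name, default)

t = Transport(peername=("127.0.0.1", 8080))
print(t.get_extra_info("peername"))
print(t.get_extra_info("sslcontext"))  # not set -> None
```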

> [1]: https://gist.github.com/4599801#file-uvtls-py

> Saúl Ibarra Corretgé
> http://saghul.net/blog | http://about.me/saghul

-- 
--Guido van Rossum (python.org/~guido)


From stephen at xemacs.org  Tue Jan 29 21:37:38 2013
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 30 Jan 2013 05:37:38 +0900
Subject: [Python-ideas] csv.DictReader could handle headers
	more	intelligently.
In-Reply-To: <51082156.40702@trueblade.com>
References: <1358903168.4767.4.camel@webb>
	<201301281745.16485.mark.hackett@metoffice.gov.uk>
	<5107BFE5.6010800@pearwood.info>
	<201301291235.01513.mark.hackett@metoffice.gov.uk>
	<5108199D.2000601@trueblade.com>
	<87boc723wd.fsf@uwakimon.sk.tsukuba.ac.jp>
	<51082156.40702@trueblade.com>
Message-ID: <87a9rr20a5.fsf@uwakimon.sk.tsukuba.ac.jp>

Eric V. Smith writes:

 > True. But my point stands: it's possible to read the data (even with a
 > DictReader), do something with the data, and not know the column names
 > in advance. It's not an impossible use case.

But it is.  Dicts don't guarantee iteration order, so you will most
likely get an output file that not only has a different delimiter, but
a different order of fields.

The right use case here is duck-typing.  Something like "I have a
bunch of tables of data about car models from different manufacturers
which have different sets of columns, and I know that all of them have
a column labeled 'MSRP', but which column might vary across tables."

Of course, I don't actually believe you'd get that lucky.


From eric at trueblade.com  Tue Jan 29 21:59:42 2013
From: eric at trueblade.com (Eric V. Smith)
Date: Tue, 29 Jan 2013 15:59:42 -0500
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <87a9rr20a5.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <1358903168.4767.4.camel@webb>
	<201301281745.16485.mark.hackett@metoffice.gov.uk>
	<5107BFE5.6010800@pearwood.info>
	<201301291235.01513.mark.hackett@metoffice.gov.uk>
	<5108199D.2000601@trueblade.com>
	<87boc723wd.fsf@uwakimon.sk.tsukuba.ac.jp>
	<51082156.40702@trueblade.com>
	<87a9rr20a5.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <5108383E.3020501@trueblade.com>

On 1/29/2013 3:37 PM, Stephen J. Turnbull wrote:
> Eric V. Smith writes:
> 
>  > True. But my point stands: it's possible to read the data (even with a
>  > DictReader), do something with the data, and not know the column names
>  > in advance. It's not an impossible use case.
> 
> But it is.  Dicts don't guarantee iteration order, so you will most
> likely get an output file that not only has a different delimiter, but
> a different order of fields.

We're going to have to agree to disagree. Order is not always important.

-- 
Eric.


From yorik.sar at gmail.com  Wed Jan 30 00:37:00 2013
From: yorik.sar at gmail.com (Yuriy Taraday)
Date: Wed, 30 Jan 2013 03:37:00 +0400
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <loom.20130129T163910-565@post.gmane.org>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org> <51072650.5090808@pearwood.info>
	<CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>
	<97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com>
	<CAJ78kjOOrcebp=Gkp++bf=k_tCKiCw36ibMw7j5PKCLPH=Xqwg@mail.gmail.com>
	<CAHVvXxT8wPPtXHk9kwqH1Acb_fkAx-uCvY9UMnzsH==CZwLcOQ@mail.gmail.com>
	<loom.20130129T163910-565@post.gmane.org>
Message-ID: <CABocrW4w48YJdP+gKrRrL8rT4n5XkchzOfFkVqPqe9TeyGPj5A@mail.gmail.com>

On Tue, Jan 29, 2013 at 7:44 PM, Wolfgang Maier <
wolfgang.maier at biologie.uni-freiburg.de> wrote:

> list(i for i in range(100) if i<50 or stop())
> Really (!) nice (and 2x as fast as using itertools.takewhile())!
>

I couldn't believe it so I had to check it:

from __future__ import print_function
import functools, itertools, operator, timeit

def var1():
    def _gen():
        for i in range(100):
            if i > 50: break
            yield i
    return list(_gen())

def var2():
    def stop():
        raise StopIteration
    return list(i for i in range(100) if i <= 50 or stop())

def var3():
    return [i for i in itertools.takewhile(lambda n: n <= 50, range(100))]

def var4():
    return [i for i in itertools.takewhile(
        functools.partial(operator.lt, 50), range(100))]

if __name__ == '__main__':
    for f in (var1, var2, var3, var4):
        print(f.__name__, end=' ')
        print(timeit.timeit(f))

Results on my machine:

var1 20.4974410534
var2 23.6218020916
var3 32.1543409824
var4 4.90913701057

var1 might have become the fastest of the first 3 because it's a special
and very simple case. Why should explicit loops be slower than generator
expressions?
var3 is the slowest. I guess, because it has a lambda in it.
But switching to Python and back cannot be faster than the last option -
sitting in the C code as much as we can.
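
One caveat worth flagging about var4: `functools.partial(operator.lt, 50)`
binds the *first* argument, so the predicate is `50 < n`, not `n < 50`. It is
false from the very first element, takewhile yields nothing, and the timing
mostly measures setup. A sketch of the difference:

```python
import functools
import itertools
import operator

# partial(operator.lt, 50) tests 50 < n, which is false for n = 0,
# so takewhile stops immediately and produces an empty list.
buggy = list(itertools.takewhile(
    functools.partial(operator.lt, 50), range(100)))

# Binding with operator.ge gives 50 >= n, i.e. n <= 50: the intended
# predicate, yielding 0..50 before stopping.
fixed = list(itertools.takewhile(
    functools.partial(operator.ge, 50), range(100)))

print(buggy)
print(fixed[0], fixed[-1], len(fixed))
```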

-- 

Kind regards, Yuriy.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130130/abdd9c6f/attachment.html>

From shane at umbrellacode.com  Wed Jan 30 01:17:51 2013
From: shane at umbrellacode.com (Shane Green)
Date: Tue, 29 Jan 2013 16:17:51 -0800
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <CAKJDb-MtEuNVYu3yr+OUjiynbOUXm_+aZn3QRq+pcR7LbTHK8Q@mail.gmail.com>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org> <51072650.5090808@pearwood.info>
	<CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>
	<97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com>
	<CAJ78kjOOrcebp=Gkp++bf=k_tCKiCw36ibMw7j5PKCLPH=Xqwg@mail.gmail.com>
	<CAHVvXxT8wPPtXHk9kwqH1Acb_fkAx-uCvY9UMnzsH==CZwLcOQ@mail.gmail.com>
	<CAKJDb-MDm6FwRd-anZ28U6EOF5+1LddojTstzOnMrxWZFJpbiA@mail.gmail.com>
	<CAHVvXxSXNG0D+OmEsy4Pwp0nynW=ED7Ut042hzW7AjDUdAv72A@mail.gmail.com>
	<CAKJDb-MtEuNVYu3yr+OUjiynbOUXm_+aZn3QRq+pcR7LbTHK8Q@mail.gmail.com>
Message-ID: <36B85EEB-E336-4E68-BC82-763F4AA582F1@umbrellacode.com>

Haven't read back far enough to know whether this is as interesting as it looks to me, but..

>>> def until(items):
...     stop = None
...     counter = 0
...     items = iter(items)
...     while not stop:
...             stop = yield next(items)
...             if stop: 
...                     yield
...             counter += 1
...             print(counter)
... 
>>> gen = until(range(15))
>>> stop = lambda: gen.send(True)
>>> [x for x in gen if x < 3 or stop()]
1
2
3
4
[0, 1, 2]
>>> 






Shane Green 
www.umbrellacode.com
408-692-4666 | shane at umbrellacode.com

On Jan 29, 2013, at 8:23 AM, Zachary Ware <zachary.ware+pyideas at gmail.com> wrote:

> 
> On Jan 29, 2013 10:02 AM, "Oscar Benjamin" <oscar.j.benjamin at gmail.com> wrote:
> >
> > On 29 January 2013 15:34, Zachary Ware <zachary.ware+pyideas at gmail.com> wrote:
> > >
> > > On Jan 29, 2013 9:26 AM, "Oscar Benjamin" <oscar.j.benjamin at gmail.com>
> > > wrote:
> > >>
> > >> On 29 January 2013 11:51, yoav glazner <yoavglazner at gmail.com> wrote:
> > >> > Here is very similar version that works (tested on python27)
> > >> >>>> def stop():
> > >> > next(iter([]))
> > >> >
> > >> >>>> list((i if i<50 else stop()) for i in range(100))
> > >> > [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
> > >> > 20,
> > >> > 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
> > >> > 39,
> > >> > 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
> > >>
> > >> That's a great idea. You could also do:
> > >> >>> list(i for i in range(100) if i<50 or stop())
> > >>
> > >> It's a shame it doesn't work for list/set/dict comprehensions, though.
> > >>
> > >
> > > I know I'm showing my ignorance here, but how are list/dict/set
> > > comprehensions and generator expressions implemented differently that one's
> > > for loop will catch a StopIteration and the others won't? Would it make
> > > sense to reimplement list/dict/set comprehensions as an equivalent generator
> > > expression passed to the appropriate constructor, and thereby allow the
> > > StopIteration trick to work for each of them as well?
> >
> > A for loop is like a while loop with a try/except handler for
> > StopIteration. So the following are roughly equivalent:
> >
> > # For loop
> > for x in iterable:
> >     func1(x)
> > else:
> >     func2()
> >
> > # Equivalent loop
> > it = iter(iterable)
> > while True:
> >     try:
> >         x = next(it)
> >     except StopIteration:
> >         func2()
> >         break
> >     func1(x)
> >
> > A list comprehension is just like an implicit for loop with limited
> > functionality so it looks like:
> >
> > # List comp
> > results = [func1(x) for x in iterable if func2(x)]
> >
> > # Equivalent loop
> > results = []
> > it = iter(iterable)
> > while True:
> >     try:
> >         x = next(it)
> >     except StopIteration:
> >         break
> >     # This part is outside the try/except
> >     if func2(x):
> >         results.append(func1(x))
> >
> > The problem in the above is that we only catch StopIteration around
> > the call to next(). So if either of func1 or func2 raises
> > StopIteration the exception will propagate rather than terminate the
> > loop. (This may mean that it terminates a for loop higher in the call
> > stack - which can lead to confusing bugs - so it's important to always
> > catch StopIteration anywhere it might get raised.)
> >
> > The difference with the list(generator) version is that func1() and
> > func2() are both called inside the call to next() from the perspective
> > of the list() function. This means that if they raise StopIteration
> > then the try/except handler in the enclosing list function will catch
> > it and terminate its loop.
> >
> > # list(generator)
> > results = list(func1(x) for x in iterable if func2(x))
> >
> > # Equivalent loop:
> > def list(iterable):
> >     it = iter(iterable)
> >     results = []
> >     while True:
> >         try:
> >             # Now func1 and func2 are both called in next() here
> >             x = next(it)
> >         except StopIteration:
> >             break
> >         results.append(x)
> >     return results
> >
> > results_gen = (func1(x) for x in iterable if func2(x))
> > results = list(results_gen)
> >
> 
> That makes a lot of sense. Thank you, Oscar and Joao, for the explanations. I wasn't thinking in enough scopes :)
> 
> Regards,
> 
> Zach Ware
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
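
The difference explained above is easy to check on a current interpreter.
Note that since PEP 479 (Python 3.7+) a stray StopIteration raised inside
a generator surfaces as RuntimeError instead of silently ending the loop,
so the generator-expression case now fails loudly rather than truncating:

```python
def stop():
    raise StopIteration

# In a list comprehension the exception simply escapes:
try:
    [x for x in range(10) if x < 3 or stop()]
    comp_outcome = "completed"
except StopIteration:
    comp_outcome = "escaped"

# With list(genexp), Python <= 3.6 silently ended the generator
# (giving [0, 1, 2]); since PEP 479 the interpreter converts the
# stray StopIteration into RuntimeError instead of swallowing it.
try:
    gen_outcome = list(x for x in range(10) if x < 3 or stop())
except RuntimeError:
    gen_outcome = "RuntimeError (PEP 479)"

print(comp_outcome, "|", gen_outcome)
```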

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130129/c1596f0c/attachment.html>

From steve at pearwood.info  Wed Jan 30 01:34:30 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 30 Jan 2013 11:34:30 +1100
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <loom.20130129T163910-565@post.gmane.org>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org> <51072650.5090808@pearwood.info>
	<CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>
	<97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com>
	<CAJ78kjOOrcebp=Gkp++bf=k_tCKiCw36ibMw7j5PKCLPH=Xqwg@mail.gmail.com>
	<CAHVvXxT8wPPtXHk9kwqH1Acb_fkAx-uCvY9UMnzsH==CZwLcOQ@mail.gmail.com>
	<loom.20130129T163910-565@post.gmane.org>
Message-ID: <51086A96.9020300@pearwood.info>

On 30/01/13 02:44, Wolfgang Maier wrote:

> list(i for i in range(100) if i<50 or stop())
> Really (!) nice (and 2x as fast as using itertools.takewhile())!

I think you are mistaken about the speed. The itertools iterators are highly
optimized and do all their work in fast C code. If you are seeing takewhile
as slow, you are probably doing something wrong: untrustworthy timing code,
misinterpreting what you are seeing, or some other error.


Here's a comparison done the naive or obvious way. Copy and paste it into an
interactive Python session:


from itertools import takewhile
from timeit import Timer

def stop(): raise StopIteration

setup = 'from __main__ import stop, takewhile'

t1 = Timer('list(i for i in xrange(1000) if i < 50 or stop())', setup)
t2 = Timer('[i for i in takewhile(lambda x: x < 50, xrange(1000))]', setup)

min(t1.repeat(number=100000, repeat=5))
min(t2.repeat(number=100000, repeat=5))


On my computer, t1 is about 1.5 times faster than t2. But this is misleading,
because it's not takewhile that is slow. I am feeding something slow into
takewhile. If I really need to run as fast as possible, I can optimize the
function call inside takewhile:


from operator import lt
from functools import partial

small_enough = partial(lt, 50)
setup2 = 'from __main__ import takewhile, small_enough'

t3 = Timer('[i for i in takewhile(small_enough, xrange(1000))]', setup2)

min(t3.repeat(number=100000, repeat=5))


On my computer, t3 is nearly 13 times faster than t1, and 19 times faster
than t2. Here are the actual times I get, using Python 2.7:


py> min(t1.repeat(number=100000, repeat=5))  # using the StopIteration hack
1.2609241008758545
py> min(t2.repeat(number=100000, repeat=5))  # takewhile and lambda
1.85182785987854
py> min(t3.repeat(number=100000, repeat=5))  # optimized version
0.09847092628479004
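
One caveat about the optimized version: argument order matters with
partial. partial(lt, 50) freezes 50 as lt's *first* argument, so it
tests 50 < x, which is False at x = 0 - takewhile stops immediately and
t3 is largely timing the construction of an empty list. The predicate
that actually matches x < 50 binds 50 on the left of gt:

```python
from functools import partial
from itertools import takewhile
from operator import gt, lt

# partial(lt, 50) tests 50 < x, false for x = 0, so takewhile
# yields nothing at all:
wrong = list(takewhile(partial(lt, 50), range(1000)))

# Binding 50 as gt's first argument gives 50 > x, i.e. x < 50:
small_enough = partial(gt, 50)
right = list(takewhile(small_enough, range(1000)))

print(len(wrong), len(right))   # 0 50
```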



-- 
Steven


From larry at hastings.org  Wed Jan 30 02:06:45 2013
From: larry at hastings.org (Larry Hastings)
Date: Tue, 29 Jan 2013 17:06:45 -0800
Subject: [Python-ideas] Extend module objects to support properties
Message-ID: <51087225.3040801@hastings.org>



Properties are a wonderful facility.  But they only work on conventional 
objects.  Specifically, they *don't* work on module objects.  It would 
be nice to extend module objects so properties worked there too.

For example, Victor Stinner's currently proposed PEP 433 adds two new 
methods to the sys module: sys.getdefaultcloexc() and 
sys.setdefaultcloexc().  What are we, Java?  Surely this would be much 
nicer as a property, sys.defaultcloexc.


//arry/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130129/bbe2b2ab/attachment.html>

From barry at python.org  Wed Jan 30 02:27:30 2013
From: barry at python.org (Barry Warsaw)
Date: Tue, 29 Jan 2013 20:27:30 -0500
Subject: [Python-ideas] constant/enum type in stdlib
References: <CAH0mxTQ9wfcR_R5SbXV0J63iVaB1PvYvSF06tRy=AibO2Qajfg@mail.gmail.com>
Message-ID: <20130129202730.6ea6d0d5@anarchist.wooz.org>

On Jan 28, 2013, at 11:50 PM, Joao S. O. Bueno wrote:

>And it was not dismissed at all - to the contrary the last e-mail in the
>thread is a message from the BDFL for it to **be** ! The discussion happened
>in a bad moment as Python was mostly feature frozen for 3.2 - and it did
>not show up again for Python 3.3;

I still offer up my own enum implementation, which I've used and has been
available for years on PyPI, and hasn't had a new release in months because it
hasn't needed one. :)  It should be compatible with Pythons from 2.6 to 3.3.

http://pypi.python.org/pypi/flufl.enum

The one hang-up about it the last time this came up was that my enum items are
not ints and Guido thought they should be.  I actually tried at one point to
make that so, but had some troublesome test failures that I didn't have time
or motivation to fix, mostly because I don't particularly like those
semantics.  I don't remember the details.

However, if someone *else* wanted to submit a branch/patch to have enum items
inherit from ints, and that was all it took to have these adopted into the
stdlib, I would be happy to take a look.

Cheers,
-Barry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130129/9b552107/attachment.pgp>

From shane at umbrellacode.com  Wed Jan 30 02:27:52 2013
From: shane at umbrellacode.com (Shane Green)
Date: Tue, 29 Jan 2013 17:27:52 -0800
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <CABocrW4w48YJdP+gKrRrL8rT4n5XkchzOfFkVqPqe9TeyGPj5A@mail.gmail.com>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org> <51072650.5090808@pearwood.info>
	<CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>
	<97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com>
	<CAJ78kjOOrcebp=Gkp++bf=k_tCKiCw36ibMw7j5PKCLPH=Xqwg@mail.gmail.com>
	<CAHVvXxT8wPPtXHk9kwqH1Acb_fkAx-uCvY9UMnzsH==CZwLcOQ@mail.gmail.com>
	<loom.20130129T163910-565@post.gmane.org>
	<CABocrW4w48YJdP+gKrRrL8rT4n5XkchzOfFkVqPqe9TeyGPj5A@mail.gmail.com>
Message-ID: <F2D2D366-664B-4E30-9FA9-3EF5E24C7EFC@umbrellacode.com>

Wait, it was much simpler than that?

>>> def until(items):
...     stops = []
...     def stop():
...             stops.append(1) 
...     yield stop
...     items = iter(items)
...     counter = 0
...     while not stops:
...             yield next(items)
...             print(counter)
...             counter += 1
... 
>>> 
>>> gen = until(range(15))
>>> stop = next(gen)
>>> [x for x in gen if x < 3 or stop()]
0
1
2
3
[0, 1, 2]
>>> 


I must have been up too long if this looks like something new to me.



Shane Green 
www.umbrellacode.com
408-692-4666 | shane at umbrellacode.com

On Jan 29, 2013, at 3:37 PM, Yuriy Taraday <yorik.sar at gmail.com> wrote:

> On Tue, Jan 29, 2013 at 7:44 PM, Wolfgang Maier <wolfgang.maier at biologie.uni-freiburg.de> wrote:
> list(i for i in range(100) if i<50 or stop())
> Really (!) nice (and 2x as fast as using itertools.takewhile())!
> 
> I couldn't believe it so I had to check it:
> 
> from __future__ import print_function
> import functools, itertools, operator, timeit
> 
> def var1():
>     def _gen():
>         for i in range(100):
>             if i > 50: break
>             yield i
>     return list(_gen())
> 
> def var2():
>     def stop():
>         raise StopIteration
>     return list(i for i in range(100) if i <= 50 or stop())
> 
> def var3():
>     return [i for i in itertools.takewhile(lambda n: n <= 50, range(100))]
> 
> def var4():
>     return [i for i in itertools.takewhile(functools.partial(operator.lt, 50), range(100))]
> 
> if __name__ == '__main__':
>     for f in (var1, var2, var3, var4):
>         print(f.__name__, end=' ')
>         print(timeit.timeit(f))
> 
> Results on my machine:
> 
> var1 20.4974410534
> var2 23.6218020916
> var3 32.1543409824
> var4 4.90913701057
> 
> var1 might have become the fastest of the first 3 because it's a special and very simple case. Why should explicit loops be slower than generator expressions?
> var3 is the slowest. I guess, because it has lambda in it.
> But switching to Python and back cannot be faster than the last option - sitting in the C code as much as we can.
> 
> -- 
> 
> Kind regards, Yuriy.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130129/db65d1e3/attachment.html>

From shane at umbrellacode.com  Wed Jan 30 02:56:36 2013
From: shane at umbrellacode.com (Shane Green)
Date: Tue, 29 Jan 2013 17:56:36 -0800
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <F2D2D366-664B-4E30-9FA9-3EF5E24C7EFC@umbrellacode.com>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org> <51072650.5090808@pearwood.info>
	<CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>
	<97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com>
	<CAJ78kjOOrcebp=Gkp++bf=k_tCKiCw36ibMw7j5PKCLPH=Xqwg@mail.gmail.com>
	<CAHVvXxT8wPPtXHk9kwqH1Acb_fkAx-uCvY9UMnzsH==CZwLcOQ@mail.gmail.com>
	<loom.20130129T163910-565@post.gmane.org>
	<CABocrW4w48YJdP+gKrRrL8rT4n5XkchzOfFkVqPqe9TeyGPj5A@mail.gmail.com>
	<F2D2D366-664B-4E30-9FA9-3EF5E24C7EFC@umbrellacode.com>
Message-ID: <3B15C735-3030-4E1B-900E-BF2C7B1A2A92@umbrellacode.com>

Ah, right, feeding it through an iterator gives you full control?





Shane Green 
www.umbrellacode.com
408-692-4666 | shane at umbrellacode.com

On Jan 29, 2013, at 5:27 PM, Shane Green <shane at umbrellacode.com> wrote:

> Wait, it was much simpler than that?
> 
> >>> def until(items):
> ...     stops = []
> ...     def stop():
> ...             stops.append(1) 
> ...     yield stop
> ...     items = iter(items)
> ...     counter = 0
> ...     while not stops:
> ...             yield next(items)
> ...             print(counter)
> ...             counter += 1
> ... 
> >>> 
> >>> gen = until(range(15))
> >>> stop = next(gen)
> >>> [x for x in gen if x < 3 or stop()]
> 0
> 1
> 2
> 3
> [0, 1, 2]
> >>> 
> 
> 
> I must have been up too long if this looks like something new to me.
> 
> 
> 
> Shane Green 
> www.umbrellacode.com
> 408-692-4666 | shane at umbrellacode.com
> 
> On Jan 29, 2013, at 3:37 PM, Yuriy Taraday <yorik.sar at gmail.com> wrote:
> 
>> On Tue, Jan 29, 2013 at 7:44 PM, Wolfgang Maier <wolfgang.maier at biologie.uni-freiburg.de> wrote:
>> list(i for i in range(100) if i<50 or stop())
>> Really (!) nice (and 2x as fast as using itertools.takewhile())!
>> 
>> I couldn't believe it so I had to check it:
>> 
>> from __future__ import print_function
>> import functools, itertools, operator, timeit
>> 
>> def var1():
>>     def _gen():
>>         for i in range(100):
>>             if i > 50: break
>>             yield i
>>     return list(_gen())
>> 
>> def var2():
>>     def stop():
>>         raise StopIteration
>>     return list(i for i in range(100) if i <= 50 or stop())
>> 
>> def var3():
>>     return [i for i in itertools.takewhile(lambda n: n <= 50, range(100))]
>> 
>> def var4():
>>     return [i for i in itertools.takewhile(functools.partial(operator.lt, 50), range(100))]
>> 
>> if __name__ == '__main__':
>>     for f in (var1, var2, var3, var4):
>>         print(f.__name__, end=' ')
>>         print(timeit.timeit(f))
>> 
>> Results on my machine:
>> 
>> var1 20.4974410534
>> var2 23.6218020916
>> var3 32.1543409824
>> var4 4.90913701057
>> 
>> var1 might have become the fastest of the first 3 because it's a special and very simple case. Why should explicit loops be slower than generator expressions?
>> var3 is the slowest. I guess, because it has lambda in it.
>> But switching to Python and back cannot be faster than the last option - sitting in the C code as much as we can.
>> 
>> -- 
>> 
>> Kind regards, Yuriy.
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> http://mail.python.org/mailman/listinfo/python-ideas
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130129/c7eb03a8/attachment.html>

From shane at umbrellacode.com  Wed Jan 30 03:31:37 2013
From: shane at umbrellacode.com (Shane Green)
Date: Tue, 29 Jan 2013 18:31:37 -0800
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <3B15C735-3030-4E1B-900E-BF2C7B1A2A92@umbrellacode.com>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org> <51072650.5090808@pearwood.info>
	<CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>
	<97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com>
	<CAJ78kjOOrcebp=Gkp++bf=k_tCKiCw36ibMw7j5PKCLPH=Xqwg@mail.gmail.com>
	<CAHVvXxT8wPPtXHk9kwqH1Acb_fkAx-uCvY9UMnzsH==CZwLcOQ@mail.gmail.com>
	<loom.20130129T163910-565@post.gmane.org>
	<CABocrW4w48YJdP+gKrRrL8rT4n5XkchzOfFkVqPqe9TeyGPj5A@mail.gmail.com>
	<F2D2D366-664B-4E30-9FA9-3EF5E24C7EFC@umbrellacode.com>
	<3B15C735-3030-4E1B-900E-BF2C7B1A2A92@umbrellacode.com>
Message-ID: <864D6A71-6663-478A-B342-83F5634DF15C@umbrellacode.com>

Although it's not always viable, given how easy it is to wrap an iterator, it seems like it might come in handy for comprehensions.

	[x for x in items if x < 50 or items.close()]
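
A sketch of the close() trick: it only works when items is an actual
generator (so it has a close() method, and closing it is what ends the
iteration). close() returns None, which is falsy, so the element that
triggers it is excluded and the comprehension's next call raises
StopIteration:

```python
# items must be a generator, not a plain iterable:
items = (i for i in range(100))

# x = 50 fails the test, items.close() shuts the generator down and
# returns None (falsy), so 50 is excluded; the following next(items)
# raises StopIteration and the comprehension ends cleanly.
result = [x for x in items if x < 50 or items.close()]
print(result)
```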



Shane Green 
www.umbrellacode.com
408-692-4666 | shane at umbrellacode.com


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130129/6a7864f8/attachment.html>

From shane at umbrellacode.com  Wed Jan 30 03:34:24 2013
From: shane at umbrellacode.com (Shane Green)
Date: Tue, 29 Jan 2013 18:34:24 -0800
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <864D6A71-6663-478A-B342-83F5634DF15C@umbrellacode.com>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org> <51072650.5090808@pearwood.info>
	<CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>
	<97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com>
	<CAJ78kjOOrcebp=Gkp++bf=k_tCKiCw36ibMw7j5PKCLPH=Xqwg@mail.gmail.com>
	<CAHVvXxT8wPPtXHk9kwqH1Acb_fkAx-uCvY9UMnzsH==CZwLcOQ@mail.gmail.com>
	<loom.20130129T163910-565@post.gmane.org>
	<CABocrW4w48YJdP+gKrRrL8rT4n5XkchzOfFkVqPqe9TeyGPj5A@mail.gmail.com>
	<F2D2D366-664B-4E30-9FA9-3EF5E24C7EFC@umbrellacode.com>
	<3B15C735-3030-4E1B-900E-BF2C7B1A2A92@umbrellacode.com>
	<864D6A71-6663-478A-B342-83F5634DF15C@umbrellacode.com>
Message-ID: <644AED9E-D6A3-45AC-B07B-57EF7A2B6442@umbrellacode.com>

Sorry, that was phrased backwards: the ease of wrapping iterators increases the viability?





Shane Green 
www.umbrellacode.com
408-692-4666 | shane at umbrellacode.com

On Jan 29, 2013, at 6:31 PM, Shane Green <shane at umbrellacode.com> wrote:

> Although it's not always viable, given how easy it is to wrap an iterator, it seems like it might come in handy for comprehensions.
> 
> 	[x for x in items if x < 50 or items.close()]
> 
> 
> 
> Shane Green 
> www.umbrellacode.com
> 408-692-4666 | shane at umbrellacode.com
> 
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130129/f639361a/attachment.html>

From greg.ewing at canterbury.ac.nz  Wed Jan 30 00:26:35 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 30 Jan 2013 12:26:35 +1300
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <CAF-Rda8B1Cw1vNZxmwuCP0AxHxhjdNQvoCWGqJsKmZsj4g0C3g@mail.gmail.com>
References: <CAH0mxTQ9wfcR_R5SbXV0J63iVaB1PvYvSF06tRy=AibO2Qajfg@mail.gmail.com>
	<CADiSq7cKrdgzBgENPYYXA5S-oCtT4Jxs5xnMwr750LUyKg+v6Q@mail.gmail.com>
	<CAF-Rda8B1Cw1vNZxmwuCP0AxHxhjdNQvoCWGqJsKmZsj4g0C3g@mail.gmail.com>
Message-ID: <51085AAB.6090303@canterbury.ac.nz>

Eli Bendersky wrote:
> I really wish there would be an enum type in Python that would make 
> sense. ISTM this has been raised numerous times, but not one submitted a 
> good-enough proposal.

I think the reason the discussion petered out last time
is that everyone has a slightly different idea on what
an enum type should be like. A number of proposals were
made, but none of them stood out as being the obviously
right one to put in the std lib.

Also, so far nobody has come up with a really elegant
solution to the DRY problem that inevitably arises in
connection with enums. Ideally you want to be able to
specify the names of the enums as identifiers, and not
have to write them again as strings or otherwise provide
explicit values for them. That seems to be very difficult
to achieve cleanly with Python syntax as it stands.

-- 
Greg


From eliben at gmail.com  Wed Jan 30 03:45:07 2013
From: eliben at gmail.com (Eli Bendersky)
Date: Tue, 29 Jan 2013 18:45:07 -0800
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <51085AAB.6090303@canterbury.ac.nz>
References: <CAH0mxTQ9wfcR_R5SbXV0J63iVaB1PvYvSF06tRy=AibO2Qajfg@mail.gmail.com>
	<CADiSq7cKrdgzBgENPYYXA5S-oCtT4Jxs5xnMwr750LUyKg+v6Q@mail.gmail.com>
	<CAF-Rda8B1Cw1vNZxmwuCP0AxHxhjdNQvoCWGqJsKmZsj4g0C3g@mail.gmail.com>
	<51085AAB.6090303@canterbury.ac.nz>
Message-ID: <CAF-Rda9Oup0fmJapmAg9kFv=YaGwjph2ZyMXrMB3Q0bEhWz2eg@mail.gmail.com>

On Tue, Jan 29, 2013 at 3:26 PM, Greg Ewing <greg.ewing at canterbury.ac.nz>wrote:

> Eli Bendersky wrote:
>
>> I really wish there would be an enum type in Python that would make
>> sense. ISTM this has been raised numerous times, but not one submitted a
>> good-enough proposal.
>>
>
> I think the reason the discussion petered out last time
> is that everyone has a slightly different idea on what
> an enum type should be like. A number of proposals were
> made, but none of them stood out as being the obviously
> right one to put in the std lib.
>
> Also, so far nobody has come up with a really elegant
> solution to the DRY problem that inevitably arises in
> connection with enums. Ideally you want to be able to
> specify the names of the enums as identifiers, and not
> have to write them again as strings or otherwise provide
> explicit values for them. That seems to be very difficult
> to achieve cleanly with Python syntax as it stands.


Since we're discussing a new language feature, why do we have to be
restricted by the existing Python syntax? We have plenty of time before 3.4
at this point.

Eli
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130129/bdd492c7/attachment.html>

From guido at python.org  Wed Jan 30 05:01:34 2013
From: guido at python.org (Guido van Rossum)
Date: Tue, 29 Jan 2013 20:01:34 -0800
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <CAF-Rda9Oup0fmJapmAg9kFv=YaGwjph2ZyMXrMB3Q0bEhWz2eg@mail.gmail.com>
References: <CAH0mxTQ9wfcR_R5SbXV0J63iVaB1PvYvSF06tRy=AibO2Qajfg@mail.gmail.com>
	<CADiSq7cKrdgzBgENPYYXA5S-oCtT4Jxs5xnMwr750LUyKg+v6Q@mail.gmail.com>
	<CAF-Rda8B1Cw1vNZxmwuCP0AxHxhjdNQvoCWGqJsKmZsj4g0C3g@mail.gmail.com>
	<51085AAB.6090303@canterbury.ac.nz>
	<CAF-Rda9Oup0fmJapmAg9kFv=YaGwjph2ZyMXrMB3Q0bEhWz2eg@mail.gmail.com>
Message-ID: <CAP7+vJLT+KxjqNT6=c6ZYmBWw55U_88W8MFqW+_3OH7AqTVw4Q@mail.gmail.com>

On Tue, Jan 29, 2013 at 6:45 PM, Eli Bendersky <eliben at gmail.com> wrote:
> On Tue, Jan 29, 2013 at 3:26 PM, Greg Ewing <greg.ewing at canterbury.ac.nz>
> wrote:
>>
>> Eli Bendersky wrote:
>>>
>>> I really wish there would be an enum type in Python that would make
>>> sense. ISTM this has been raised numerous times, but not one submitted a
>>> good-enough proposal.
>>
>> I think the reason the discussion petered out last time
>> is that everyone has a slightly different idea on what
>> an enum type should be like. A number of proposals were
>> made, but none of them stood out as being the obviously
>> right one to put in the std lib.
>>
>> Also, so far nobody has come up with a really elegant
>> solution to the DRY problem that inevitably arises in
>> connection with enums. Ideally you want to be able to
>> specify the names of the enums as identifiers, and not
>> have to write them again as strings or otherwise provide
>> explicit values for them. That seems to be very difficult
>> to achieve cleanly with Python syntax as it stands.

Hm, if people really want to write something like

color = enum(RED, WHITE, BLUE)

that might still be true, but given that it's likely going to look a
little more like a class definition, this doesn't look so bad, and
certainly doesn't violate DRY (though it's somewhat verbose):

class color(enum):
  RED = value()
  WHITE = value()
  BLUE = value()

The Python 3 metaclass can easily observe the order in which the values
are defined by setting the class dict to an OrderedDict.

> Since we're discussing a new language feature, why do we have to be
> restricted by the existing Python syntax? We have plenty of time before 3.4
> at this point.

Introducing new syntax requires orders of magnitude more convincing
than a new library module or even a new builtin.
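
A minimal sketch of that metaclass trick - value() here is just a
placeholder marker, not a real API, and since Python 3.6 class bodies
preserve definition order even with a plain dict, so __prepare__
returning an OrderedDict is only needed on older versions:

```python
from collections import OrderedDict

class EnumMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases):
        # class-body assignments land in this mapping, in source order
        return OrderedDict()

    def __new__(mcls, name, bases, ns):
        cls = super().__new__(mcls, name, bases, dict(ns))
        # record the member names, skipping dunders and privates
        cls._members = [k for k in ns if not k.startswith('_')]
        return cls

def value():
    # placeholder; a real implementation would allocate ordinals
    return object()

class enum(metaclass=EnumMeta):
    pass

class color(enum):
    RED = value()
    WHITE = value()
    BLUE = value()

print(color._members)   # ['RED', 'WHITE', 'BLUE']
```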

-- 
--Guido van Rossum (python.org/~guido)


From greg.ewing at canterbury.ac.nz  Wed Jan 30 05:34:57 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 30 Jan 2013 17:34:57 +1300
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <CAP7+vJLT+KxjqNT6=c6ZYmBWw55U_88W8MFqW+_3OH7AqTVw4Q@mail.gmail.com>
References: <CAH0mxTQ9wfcR_R5SbXV0J63iVaB1PvYvSF06tRy=AibO2Qajfg@mail.gmail.com>
	<CADiSq7cKrdgzBgENPYYXA5S-oCtT4Jxs5xnMwr750LUyKg+v6Q@mail.gmail.com>
	<CAF-Rda8B1Cw1vNZxmwuCP0AxHxhjdNQvoCWGqJsKmZsj4g0C3g@mail.gmail.com>
	<51085AAB.6090303@canterbury.ac.nz>
	<CAF-Rda9Oup0fmJapmAg9kFv=YaGwjph2ZyMXrMB3Q0bEhWz2eg@mail.gmail.com>
	<CAP7+vJLT+KxjqNT6=c6ZYmBWw55U_88W8MFqW+_3OH7AqTVw4Q@mail.gmail.com>
Message-ID: <5108A2F1.5010006@canterbury.ac.nz>

Guido van Rossum wrote:
> this doesn't look so bad, and
> certainly doesn't violate DRY (though it's somewhat verbose):
> 
> class color(enum):
>   RED = value()
>   WHITE = value()
>   BLUE = value()

The verbosity is what makes it fail the "truly elegant"
test for me. And I would say that it does violate DRY
in the sense that you have to write value() repeatedly
for no good reason.

Sure, it's not bad enough to make it unusable, but like
all the other solutions, it leaves me feeling vaguely
annoyed that there isn't a better way.

And it *is* bad enough to make writing an enum definition
into a dreary chore, rather than the pleasure it should be.

-- 
Greg


From greg.ewing at canterbury.ac.nz  Wed Jan 30 05:58:37 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 30 Jan 2013 17:58:37 +1300
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <CAP7+vJLT+KxjqNT6=c6ZYmBWw55U_88W8MFqW+_3OH7AqTVw4Q@mail.gmail.com>
References: <CAH0mxTQ9wfcR_R5SbXV0J63iVaB1PvYvSF06tRy=AibO2Qajfg@mail.gmail.com>
	<CADiSq7cKrdgzBgENPYYXA5S-oCtT4Jxs5xnMwr750LUyKg+v6Q@mail.gmail.com>
	<CAF-Rda8B1Cw1vNZxmwuCP0AxHxhjdNQvoCWGqJsKmZsj4g0C3g@mail.gmail.com>
	<51085AAB.6090303@canterbury.ac.nz>
	<CAF-Rda9Oup0fmJapmAg9kFv=YaGwjph2ZyMXrMB3Q0bEhWz2eg@mail.gmail.com>
	<CAP7+vJLT+KxjqNT6=c6ZYmBWw55U_88W8MFqW+_3OH7AqTVw4Q@mail.gmail.com>
Message-ID: <5108A87D.9000207@canterbury.ac.nz>

Guido van Rossum wrote:

> class color(enum):
>   RED = value()
>   WHITE = value()
>   BLUE = value()

We could do somewhat better than that:

    class Color(Enum):
       RED, WHITE, BLUE = range(3)

However, it's still slightly annoying that you have to
specify how many values there are in the range() call.
It would be even nicer if we could just use an infinite
iterator, such as

    class Color(Enum):
       RED, WHITE, BLUE = values()

However, the problem here is that the unpacking bytecode
anally insists on the iterator providing *exactly* the
right number of items, and there is no way for values()
to know when to stop producing items.

So, suppose we use a slightly extended version of the
iterator protocol for unpacking purposes. If the object
being unpacked has an __endunpack__ method, we call it
after unpacking the last value, and it is responsible
for doing appropriate checking and raising an exception
if necessary. Otherwise we do as we do now.

The values() object can then have an __endunpack__
method that does nothing, allowing you to unpack any
number of items from it.
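
The exact-length behaviour described above is easy to demonstrate with
today's unpacking: an endless iterator makes the assignment consume one
item too many and raise ValueError, so the iterator has to be bounded
explicitly:

```python
import itertools

# Unpacking demands the right-hand side yield exactly three items;
# with count() it pulls a fourth item and raises ValueError.
try:
    RED, WHITE, BLUE = itertools.count()
    error = None
except ValueError as exc:
    error = str(exc)

# Today the iterator must be bounded by hand:
RED, WHITE, BLUE = itertools.islice(itertools.count(), 3)
print(error)
print(RED, WHITE, BLUE)   # 0 1 2
```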

-- 
Greg


From eliben at gmail.com  Wed Jan 30 06:26:11 2013
From: eliben at gmail.com (Eli Bendersky)
Date: Tue, 29 Jan 2013 21:26:11 -0800
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <CAP7+vJLT+KxjqNT6=c6ZYmBWw55U_88W8MFqW+_3OH7AqTVw4Q@mail.gmail.com>
References: <CAH0mxTQ9wfcR_R5SbXV0J63iVaB1PvYvSF06tRy=AibO2Qajfg@mail.gmail.com>
	<CADiSq7cKrdgzBgENPYYXA5S-oCtT4Jxs5xnMwr750LUyKg+v6Q@mail.gmail.com>
	<CAF-Rda8B1Cw1vNZxmwuCP0AxHxhjdNQvoCWGqJsKmZsj4g0C3g@mail.gmail.com>
	<51085AAB.6090303@canterbury.ac.nz>
	<CAF-Rda9Oup0fmJapmAg9kFv=YaGwjph2ZyMXrMB3Q0bEhWz2eg@mail.gmail.com>
	<CAP7+vJLT+KxjqNT6=c6ZYmBWw55U_88W8MFqW+_3OH7AqTVw4Q@mail.gmail.com>
Message-ID: <CAF-Rda-e9ezNVqKx2GBngOJb6wT165Z1do9=boMioxBcJejfeA@mail.gmail.com>

> Hm, if people really want to write something like

>
> color = enum(RED, WHITE, BLUE)
>
> that might still be true, but given that it's likely going to look a
> little more like a class definition, this doesn't look so bad, and
> certainly doesn't violate DRY (though it's somewhat verbose):
>
> class color(enum):
>   RED = value()
>   WHITE = value()
>   BLUE = value()
>
> The Python 3 metaclass can easily observe the order in which the values
> are defined by setting the class dict to an OrderedDict.
>
>
Even though I agree that enums lend themselves nicely to "class"-y syntax,
the example you provide shows exactly why sticking to existing syntax makes
us bend over backwards. Because 'color' is really not a class. And I don't
want to explicitly say it's both a class and it subclasses something called
'enum'. And I don't want to specify values when I don't need values. All I
really want is:

enum color:
  RED
  WHITE
  BLUE

Or shorter:

enum color:
  RED, WHITE, BLUE

Would adding a new "enum" keyword in Python 3.4 *really* meet that much
resistance? ISTM built-in, standard, enums have been on the wishlist of
Python developers for a long time.

Eli
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130129/9c703ac8/attachment.html>

From solipsis at pitrou.net  Wed Jan 30 08:26:39 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 30 Jan 2013 08:26:39 +0100
Subject: [Python-ideas] constant/enum type in stdlib
References: <CAH0mxTQ9wfcR_R5SbXV0J63iVaB1PvYvSF06tRy=AibO2Qajfg@mail.gmail.com>
	<CADiSq7cKrdgzBgENPYYXA5S-oCtT4Jxs5xnMwr750LUyKg+v6Q@mail.gmail.com>
	<CAF-Rda8B1Cw1vNZxmwuCP0AxHxhjdNQvoCWGqJsKmZsj4g0C3g@mail.gmail.com>
	<51085AAB.6090303@canterbury.ac.nz>
	<CAF-Rda9Oup0fmJapmAg9kFv=YaGwjph2ZyMXrMB3Q0bEhWz2eg@mail.gmail.com>
	<CAP7+vJLT+KxjqNT6=c6ZYmBWw55U_88W8MFqW+_3OH7AqTVw4Q@mail.gmail.com>
	<5108A87D.9000207@canterbury.ac.nz>
Message-ID: <20130130082639.0b28d7eb@pitrou.net>

On Wed, 30 Jan 2013 17:58:37 +1300
Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Guido van Rossum wrote:
> 
> > class color(enum):
> >   RED = value()
> >   WHITE = value()
> >   BLUE = value()
> 
> We could do somewhat better than that:
> 
>     class Color(Enum):
>        RED, WHITE, BLUE = range(3)
> 
> However, it's still slightly annoying that you have to
> specify how many values there are in the range() call.
> It would be even nicer if we could just use an infinite
> iterator, such as
> 
>     class Color(Enum):
>        RED, WHITE, BLUE = values()

Well, how about:

class Color(Enum):
    values = ('RED', 'WHITE', 'BLUE')

?

(replace values with __values__ if you prefer)
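
A sketch of how a metaclass could expand such a values tuple into class
attributes (the names and the int-ordinal choice are illustrative, not a
real API):

```python
class EnumMeta(type):
    def __new__(mcls, name, bases, ns):
        cls = super().__new__(mcls, name, bases, ns)
        # turn each name string into an int attribute on the class
        for ordinal, member in enumerate(ns.get('values', ())):
            setattr(cls, member, ordinal)
        return cls

class Enum(metaclass=EnumMeta):
    pass

class Color(Enum):
    values = ('RED', 'WHITE', 'BLUE')

print(Color.RED, Color.WHITE, Color.BLUE)   # 0 1 2
```

Each name is spelled exactly once, as a string, which is the DRY
trade-off this spelling buys.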

Regards

Antoine.




From storchaka at gmail.com  Wed Jan 30 08:49:43 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Wed, 30 Jan 2013 09:49:43 +0200
Subject: [Python-ideas] Interrupting threads
In-Reply-To: <51049915.3060808@mrabarnett.plus.com>
References: <51049915.3060808@mrabarnett.plus.com>
Message-ID: <keajao$6ld$1@ger.gmane.org>

On 27.01.13 05:03, MRAB wrote:
> I know that this topic has been discussed before, but I've added a new
> twist...

For previous discussion see topics "Thread stopping" [1] and "Protecting 
finally clauses of interruptions" [2]. See also PEP 419 [3] created on 
the results of the last discussion.

[1] http://comments.gmane.org/gmane.comp.python.ideas/14647
[2] http://comments.gmane.org/gmane.comp.python.ideas/14689
[3] http://www.python.org/dev/peps/pep-0419/




From mal at egenix.com  Wed Jan 30 09:37:02 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Wed, 30 Jan 2013 09:37:02 +0100
Subject: [Python-ideas] Extend module objects to support properties
In-Reply-To: <51087225.3040801@hastings.org>
References: <51087225.3040801@hastings.org>
Message-ID: <5108DBAE.8030601@egenix.com>

On 30.01.2013 02:06, Larry Hastings wrote:
> 
> 
> Properties are a wonderful facility.  But they only work on conventional objects.  Specifically,
> they *don't* work on module objects.  It would be nice to extend module objects so properties worked
> there too.
> 
> For example, Victor Stinner's currently proposed PEP 433 adds two new methods to the sys module:
> sys.getdefaultcloexc() and sys.setdefaultcloexc().  What are we, Java?  Surely this would be much
> nicer as a property, sys.defaultcloexc.

Would be nice, but I'm not sure how you'd implement it: module contents
are accessed directly via the module dictionary, so the attribute lookup
hook needed to add the property magic is missing.

Overall, it would be great to have modules behave more like classes.
This idea has been floating around for years, but hasn't gone far
because of the direct content-dict access approach described above.
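
For what it's worth, one workable sketch subclasses types.ModuleType, so
attribute lookup on the module instance goes through the type and hits
the property (the setting name here is hypothetical; later Pythons also
allow assigning an existing module's __class__ from 3.5 on, and PEP 562
added module-level __getattr__ in 3.7):

```python
import types

class PropModule(types.ModuleType):
    @property
    def default_cloexec(self):
        # computed on every access, like any ordinary property
        return self._flag

mod = PropModule('demo')
mod._flag = True
print(mod.default_cloexec)   # True -- the property intercepts the lookup
```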

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jan 30 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/


From wolfgang.maier at biologie.uni-freiburg.de  Wed Jan 30 10:46:31 2013
From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier)
Date: Wed, 30 Jan 2013 09:46:31 +0000 (UTC)
Subject: [Python-ideas] while conditional in list comprehension ??
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org> <51072650.5090808@pearwood.info>
	<CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>
	<97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com>
	<CAJ78kjOOrcebp=Gkp++bf=k_tCKiCw36ibMw7j5PKCLPH=Xqwg@mail.gmail.com>
	<CAHVvXxT8wPPtXHk9kwqH1Acb_fkAx-uCvY9UMnzsH==CZwLcOQ@mail.gmail.com>
	<loom.20130129T163910-565@post.gmane.org>
	<51086A96.9020300@pearwood.info>
Message-ID: <loom.20130130T094306-124@post.gmane.org>

Yuriy Taraday <yorik.sar at ...> writes:

> 
> 
> On Tue, Jan 29, 2013 at 7:44 PM, Wolfgang Maier
<wolfgang.maier at biologie.uni-freiburg.de> wrote:
> list(i for i in range(100) if i<50 or stop())
> Really (!) nice (and 2x as fast as using itertools.takewhile())!
> 
> 
> 
> I couldn't believe it so I had to check it:
> 
> 
> from __future__ import print_function
> 
> import functools, itertools, operator, timeit
> 
> def var1():
>     def _gen():
>         for i in range(100):
>             if i > 50: break
>             yield i
> 
>     return list(_gen())
> 
> def var2():
>     def stop():
>         raise StopIteration
>     return list(i for i in range(100) if i <= 50 or stop())
> 
> 
> def var3():
>     return [i for i in itertools.takewhile(lambda n: n <= 50, range(100))]
> 
> def var4():
>     return [i for i in itertools.takewhile(functools.partial(operator.lt, 50),
range(100))]
> 
> 
> if __name__ == '__main__':
>     for f in (var1, var2, var3, var4):
>         print(f.__name__, end=' ')
>         print(timeit.timeit(f))
> 
> 
> 
> Results on my machine:
> 
> 
> var1 20.4974410534
> var2 23.6218020916
> var3 32.1543409824
> var4 4.90913701057
> 
> var1 might have become the fastest of the first 3 because it's a special and
very simple case. Why should explicit loops be slower than generator expressions?
> 
> var3 is the slowest. I guess because it has a lambda in it.
> But switching to Python and back cannot be faster than the last option -
sitting in the C code as much as we can.
> 
> 
> -- Kind regards, Yuriy.

Steven D'Aprano <steve at ...> writes:

> 
> On 30/01/13 02:44, Wolfgang Maier wrote:
> 
> > list(i for i in range(100) if i<50 or stop())
> > Really (!) nice (and 2x as fast as using itertools.takewhile())!
> 
> I think you are mistaken about the speed. The itertools iterators are highly
> optimized and do all their work in fast C code. If you are seeing takewhile
> as slow, you are probably doing something wrong: untrustworthy timing code,
> misinterpreting what you are seeing, or some other error.
> 
> Here's a comparison done the naive or obvious way. Copy and paste it into an
> interactive Python session:
> 
> from itertools import takewhile
> from timeit import Timer
> 
> def stop(): raise StopIteration
> 
> setup = 'from __main__ import stop, takewhile'
> 
> t1 = Timer('list(i for i in xrange(1000) if i < 50 or stop())', setup)
> t2 = Timer('[i for i in takewhile(lambda x: x < 50, xrange(1000))]', setup)
> 
> min(t1.repeat(number=100000, repeat=5))
> min(t2.repeat(number=100000, repeat=5))
> 
> On my computer, t1 is about 1.5 times faster than t2. But this is misleading,
> because it's not takewhile that is slow. I am feeding something slow into
> takewhile. If I really need to run as fast as possible, I can optimize the
> function call inside takewhile:
> 
> from operator import lt
> from functools import partial
> 
> small_enough = partial(lt, 50)
> setup2 = 'from __main__ import takewhile, small_enough'
> 
> t3 = Timer('[i for i in takewhile(small_enough, xrange(1000))]', setup2)
> 
> min(t3.repeat(number=100000, repeat=5))
> 
> On my computer, t3 is nearly 13 times faster than t1, and 19 times faster
> than t2. Here are the actual times I get, using Python 2.7:
> 
> py> min(t1.repeat(number=100000, repeat=5))  # using the StopIteration hack
> 1.2609241008758545
> py> min(t2.repeat(number=100000, repeat=5))  # takewhile and lambda
> 1.85182785987854
> py> min(t3.repeat(number=100000, repeat=5))  # optimized version
> 0.09847092628479004
> 

Hi Yuriy and Steven,
a) I had compared the originally proposed 'takewhile with lambda' version to the
'if cond or stop()' solution using 'timeit' just like you did. In principle, you
found the same as I did, although I am a bit surprised that our measured
differences differ. To be exact, 'if cond or stop()' was 1.84x faster in my
hands than 'takewhile with lambda'.

b) I have to say I was very impressed by the speed gains you report through the
use of 'partial', which I had not thought of at all. However, I tested your
suggestions and I think they both suffer from the same mistake:
your condition is 'partial(lt,50)', but this is never met to begin with and
results in an empty list, at least for me. Have you two actually checked the
output of the code, or have you just timed it? I found that in order to make it
work the comparison has to be made via 'partial(gt,50)'. With this modification
the resulting list in your example would be [0,..,49], as it should be.
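The argument-order pitfall is easy to demonstrate, since partial() binds 50 as
the *first* argument of the operator:

```python
from functools import partial
from itertools import takewhile
from operator import gt, lt

# partial(lt, 50) computes lt(50, x), i.e. 50 < x: False for x = 0..49,
# so takewhile stops at the very first element and returns nothing.
wrong = list(takewhile(partial(lt, 50), range(100)))
assert wrong == []

# partial(gt, 50) computes gt(50, x), i.e. 50 > x: True for x = 0..49,
# giving the intended [0, ..., 49].
right = list(takewhile(partial(gt, 50), range(100)))
assert right == list(range(50))
```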

And now the big surprise in terms of runtimes:
partial(lt,50) variant:     1.17  (but incorrect results)
partial(gt,50) variant:    13.95
if cond or stop() variant:  9.86

I guess Python is just smart enough to recognize that it compares against a
constant value all the time, and optimizes the code accordingly (after all, the
if clause is a pretty standard thing to use in a comprehension).

So the reason for your reported speed-gain is that you actually broke out of the
comprehension at the very first element instead of going through the first 50!
Please comment, if you get different results.
Best,
Wolfgang







From ncoghlan at gmail.com  Wed Jan 30 10:54:18 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 30 Jan 2013 19:54:18 +1000
Subject: [Python-ideas] Extend module objects to support properties
In-Reply-To: <51087225.3040801@hastings.org>
References: <51087225.3040801@hastings.org>
Message-ID: <CADiSq7fRZhD8kWoHfpxg7HKriVr6qET6QRvCLGirXYdsUJHEfA@mail.gmail.com>

On Wed, Jan 30, 2013 at 11:06 AM, Larry Hastings <larry at hastings.org> wrote:
>
>
> Properties are a wonderful facility.  But they only work on conventional
> objects.  Specifically, they *don't* work on module objects.  It would be
> nice to extend module objects so properties worked there too.

As MAL notes, the issues with such an approach are:

- code executed at module scope
- code in inner scopes that uses "global"
- code that uses globals()
- code that directly modifies a module's __dict__

There is too much code that expects to be able to modify a module's
namespace directly without going through the attribute access
machinery.

However, a slightly more practical suggestion might be:

1. Officially bless the practice of placing class instances in
sys.modules (currently this is tolerated, since it's the only way to
manage things like lazy module loading, but not officially recommended
as the way to achieve "module properties")
2. Change sys from a module object to an ordinary class instance
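
Point 1 can be sketched in a few lines; the module name "fake_propmod" is
invented for illustration, and real lazy loaders do the same thing from inside
the module itself via sys.modules[__name__]:

```python
import sys

class _PropModule(object):
    @property
    def version(self):
        # Computed on every attribute access, like any class property.
        return "1.0"

# Placing the instance in sys.modules means "import fake_propmod" hands
# back this object, so attribute access goes through the normal
# descriptor machinery and the property works.
sys.modules["fake_propmod"] = _PropModule()

import fake_propmod
assert fake_propmod.version == "1.0"
```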

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From chris.jerdonek at gmail.com  Wed Jan 30 10:54:24 2013
From: chris.jerdonek at gmail.com (Chris Jerdonek)
Date: Wed, 30 Jan 2013 01:54:24 -0800
Subject: [Python-ideas] Canceled versus cancelled (was Re: Interrupting
	threads)
In-Reply-To: <ke90rt$khg$1@ger.gmane.org>
References: <51049915.3060808@mrabarnett.plus.com>
	<5106B372.5040803@mrabarnett.plus.com>
	<CAH_1eM0WbyBhMyyhcZP90TXdSWpL3oyXMa7c1GR_H8MBMfJezA@mail.gmail.com>
	<20130129105443.2804520b@pitrou.net> <ke8i82$rj6$1@ger.gmane.org>
	<ke90rt$khg$1@ger.gmane.org>
Message-ID: <CAOTb1wfzFnQhMjUo1PRy+pn9bCK-pjwqhYtrvetvPi2HtiCeCQ@mail.gmail.com>

On Tue, Jan 29, 2013 at 9:28 AM, Terry Reedy <tjreedy at udel.edu> wrote:
> On 1/29/2013 8:18 AM, Richard Oudkerk wrote:
>>
>> On 29/01/2013 9:54am, Antoine Pitrou wrote:
>>>
>>> Of course, I sympathize with native English speakers who are annoyed
>>> by the prevalence of Globish over real English. That said, Python
>>> already mandates American English instead of British English.
>>
>>
>> Is Future.cancelled() an acceptable American spelling?
>
>
> Slightly controversial, but 'Yes'. My 1960s Dictionary of the American
> language gives 'canceled' and 'cancelled'. Ditto for travel.  I see the same
> at modern web sites:
> http://www.merriam-webster.com/dictionary/cancel
> http://www.thefreedictionary.com/cancel
>
> Both give the one el version first, and that might indicate a preference.
> But I was actually taught in school (some decades ago) to double the els of
> travel and cancel, and have read the rule in various places. I suspect that is
> not done now. More discussion:

FWIW, my high school grammar teacher (who himself wrote a grammar
book) taught us a rule about this.  I can't remember the rule in its
entirety, but part of it involved the location of the accent.  If the
accent is on the last syllable, then the final consonant is doubled --
modulo the rest of the rule. :)  For example, "referring" and
"fathering."

Of course, there are exceptions.

--Chris




>
> http://www.reference.com/motif/language/cancelled-vs-canceled
> http://grammarist.com/spelling/cancel/
>
> The latter has a Google ngram that shows 'canceled' has become more common
> in the U.S., but only in the last 30 years. It has even crept into British
> usage.
>
> http://books.google.com/ngrams/graph?content=canceled%2Ccancelled&year_start=1800&year_end=2000&corpus=6&smoothing=3&share=
>
> On the other hand, just about no one, even in the U.S., currently spells
> 'cancellation' as 'cancelation'. That was tried by a few writers 1910 to
> 1940, but never caught on.
>
> http://books.google.com/ngrams/graph?content=cancelation%2Ccancellation&year_start=1800&year_end=2000&corpus=17&smoothing=3&share=
>
> --
> Terry Jan Reedy
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas


From saghul at gmail.com  Wed Jan 30 10:55:51 2013
From: saghul at gmail.com (=?ISO-8859-1?Q?Sa=FAl_Ibarra_Corretg=E9?=)
Date: Wed, 30 Jan 2013 10:55:51 +0100
Subject: [Python-ideas] libuv based eventloop for tulip experiment
In-Reply-To: <CAP7+vJLVyps8UgB+Yx6dkmshn1gqEXxAWoOkbs-m96WQtpzvhA@mail.gmail.com>
References: <51070056.8020006@gmail.com>
	<CAP7+vJJd0oZxzGqu_L0juxDwWvaPhd7cmC8azJSmn=brSnTv_w@mail.gmail.com>
	<51082C41.2030508@gmail.com>
	<CAP7+vJLVyps8UgB+Yx6dkmshn1gqEXxAWoOkbs-m96WQtpzvhA@mail.gmail.com>
Message-ID: <5108EE27.1000102@gmail.com>


> Yeah, so do the other polling things on Windows. (Well, mostly
> sockets. There are some other things supported like named pipes.)
>

In pyuv there is a special handle for those (Pipe) which works on both 
Unix and Windows with the same interface.

> I guess in order to support this we'd need some kind of abstraction
> away from socket objects and file descriptors, at least for event loop
> methods like sock_recv() and add_reader(). But those are mostly meant
> for transports to build upon, so I think that would be fine.
>

I see, great!

[snip]

>
> Yeah, I see. If we squint and read "handle" instead of "socket" we
> could even make it so that loop.sock_recv() takes one of these -- it
> would return a Future and your callback would set the Future's result,
> or its exception if an error was set.
>

Yeah, sounds like it could work :-) Anyway, I wouldn't be opposed to 
leaving those APIs just for Python sockets (which I can interact with using 
a Poll handle) if transports can be built on top of other entities such as 
TCP handles.

[snip]

>
>> The second model seems more flexible indeed. I guess the SSL transport could
>> be tricky, because while currently Tulip uses the ssl module I have no TLS
>> handle on pyuv so I'd have to build one on top of a TCP handle with
>> pyOpenSSL (I have a prototype here [1]), so object types / APIs wouldn't
>> match, unless Tulip provides some wrappers for SSL related objects such as
>> certificates...
>
> Hm, I thought certificates were just blobs of data? We should probably
> come up with a standard way to represent these that isn't tied to the
> stdlib's ssl module. But I don't think this should be part of PEP 3156
> -- it's too big already.
>

Yes, they are blobs; I meant the objects that wrap those blobs and 
provide verification functions and such. But that can indeed be left out, 
letting implementations deal with it and having Tulip just hand over the blobs.


Regards,

-- 
Saúl Ibarra Corretgé
http://saghul.net/blog | http://about.me/saghul


From mark.hackett at metoffice.gov.uk  Wed Jan 30 11:32:54 2013
From: mark.hackett at metoffice.gov.uk (Mark Hackett)
Date: Wed, 30 Jan 2013 10:32:54 +0000
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <5108383E.3020501@trueblade.com>
References: <1358903168.4767.4.camel@webb>
	<87a9rr20a5.fsf@uwakimon.sk.tsukuba.ac.jp>
	<5108383E.3020501@trueblade.com>
Message-ID: <201301301032.54211.mark.hackett@metoffice.gov.uk>

On Tuesday 29 Jan 2013, Eric V. Smith wrote:
> On 1/29/2013 3:37 PM, Stephen J. Turnbull wrote:
> > Eric V. Smith writes:
> >  > True. But my point stands: it's possible to read the data (even with a
> >  > DictReader), do something with the data, and not know the column names
> >  > in advance. It's not an impossible use case.
> >
> > But it is.  Dicts don't guarantee iteration order, so you will most
> > likely get an output file that not only has a different delimiter, but
> > a different order of fields.
> 
> We're going to have to agree to disagree. Order is not always important.
> 

It's not impossible that we're living in a simulated world.

If you don't know what's in the csv file at all, then how do you know what 
you're supposed to do with it?

Reading into a list will ensure order, so that is usable if order is 
important. If the names aren't important at all, then you should drop the first 
line and read it into a list again. If the names are important, you'd better 
know what names the headers are using.


From stefan_ml at behnel.de  Wed Jan 30 11:34:31 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 30 Jan 2013 11:34:31 +0100
Subject: [Python-ideas] Extend module objects to support properties
In-Reply-To: <CADiSq7fRZhD8kWoHfpxg7HKriVr6qET6QRvCLGirXYdsUJHEfA@mail.gmail.com>
References: <51087225.3040801@hastings.org>
	<CADiSq7fRZhD8kWoHfpxg7HKriVr6qET6QRvCLGirXYdsUJHEfA@mail.gmail.com>
Message-ID: <keasvl$s5k$1@ger.gmane.org>

Nick Coghlan, 30.01.2013 10:54:
> On Wed, Jan 30, 2013 at 11:06 AM, Larry Hastings wrote:
>> Properties are a wonderful facility.  But they only work on conventional
>> objects.  Specifically, they *don't* work on module objects.  It would be
>> nice to extend module objects so properties worked there too.
> 
> As MAL notes, the issues with such an approach are:
> 
> - code executed at module scope
> - code in inner scopes that uses "global"
> - code that uses globals()
> - code that directly modifies a module's __dict__
> 
> There is too much code that expects to be able to modify a module's
> namespace directly without going through the attribute access
> machinery.

The Cython project has been wanting this feature for years. We even
considered writing our own Module (sub-)type for this, but didn't get
ourselves convinced that all of the involved hassle was really worth it.
The main drive behind it is full control over setters to allow for safe and
fast internal C level access to module globals (which usually don't change
from the outside but may...). Currently, users help themselves by
explicitly declaring globals as static internal names that are invisible to
external Python code.

Allowing regular objects in sys.modules would be one way to do it, but
these things are a lot more involved at the C level than at the Python
level due to the C level module setup procedure.

I wouldn't mind letting such a feature appear at the C level first, even
though the Python syntax would be pretty obvious anyway. It's not like
people would commonly mess around with sys.__dict__. (Although, many C
modules have a Python module wrapper these days, not sure if and how this
should get passed through.)

Stefan




From steve at pearwood.info  Wed Jan 30 13:09:20 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 30 Jan 2013 23:09:20 +1100
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <201301301032.54211.mark.hackett@metoffice.gov.uk>
References: <1358903168.4767.4.camel@webb>
	<87a9rr20a5.fsf@uwakimon.sk.tsukuba.ac.jp>
	<5108383E.3020501@trueblade.com>
	<201301301032.54211.mark.hackett@metoffice.gov.uk>
Message-ID: <51090D70.2050102@pearwood.info>

On 30/01/13 21:32, Mark Hackett wrote:

> If you don't know what's in the csv file at all, then how do you know what
> you're supposed to do with it.

Maybe you're processing the file without caring what the column names are,
but you still need to map column name to column contents. This is no more
unusual than processing a dict where you don't know the keys: you just iterate
over them.

Or maybe you're scanning the file for one specific column name, and you don't
care what the other names are.

Or, most likely, you know what you are *expecting* in the CSV file, but because
data files don't always contain what you expect, you want to be notified if
there is something unexpected rather than just have it silently do the wrong
thing.
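
That check is only a few lines with DictReader; the expected column names
here are invented:

```python
import csv
import io

EXPECTED = {"name", "age", "city"}  # invented expected column names

data = io.StringIO("name,age,city\nalice,30,berlin\n")
reader = csv.DictReader(data)
actual = set(reader.fieldnames)
if actual != EXPECTED:
    # Fail loudly instead of silently mis-mapping columns.
    raise ValueError("unexpected CSV columns: %s" % sorted(actual ^ EXPECTED))
rows = list(reader)
assert rows[0]["city"] == "berlin"
```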



-- 
Steven




From mark.hackett at metoffice.gov.uk  Wed Jan 30 13:14:09 2013
From: mark.hackett at metoffice.gov.uk (Mark Hackett)
Date: Wed, 30 Jan 2013 12:14:09 +0000
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <51090D70.2050102@pearwood.info>
References: <1358903168.4767.4.camel@webb>
	<201301301032.54211.mark.hackett@metoffice.gov.uk>
	<51090D70.2050102@pearwood.info>
Message-ID: <201301301214.09203.mark.hackett@metoffice.gov.uk>

On Wednesday 30 Jan 2013, Steven D'Aprano wrote:
> On 30/01/13 21:32, Mark Hackett wrote:
> > If you don't know what's in the csv file at all, then how do you know
> > what you're supposed to do with it.
> 
> Maybe you're processing the file without caring what the column names are,

If you don't care, then you shouldn't be using a dictionary, because you have 
to know which one to ask for.

> but you still need to map column name to column contents.

Why? You said this hypothetical reckless person doesn't care.

> This is no more
> unusual than processing a dict where you don't know the keys: you just
>  iterate over them.
> 

Which is only used for printing the info out.

There's a much easier way to do that:

"cat file.csv"

> Or maybe you're scanning the file for one specific column name, and you
>  don't care what the other names are.
> 

Then you'll know if it's duplicated or not.

> Or, most likely, you know what you are *expecting* in the CSV file, but
>  because data files don't always contain what you expect, you want to be
>  notified if there is something unexpected rather than just have it
>  silently do the wrong thing.
> 

There's a way to do that:

"head -n1 file.csv".

You know, have a look.


From shane at umbrellacode.com  Wed Jan 30 13:24:53 2013
From: shane at umbrellacode.com (Shane Green)
Date: Wed, 30 Jan 2013 04:24:53 -0800
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <201301301032.54211.mark.hackett@metoffice.gov.uk>
References: <1358903168.4767.4.camel@webb>
	<87a9rr20a5.fsf@uwakimon.sk.tsukuba.ac.jp>
	<5108383E.3020501@trueblade.com>
	<201301301032.54211.mark.hackett@metoffice.gov.uk>
Message-ID: <0DE28815-D265-44D5-AC17-0A7524C6DF5D@umbrellacode.com>

So I've done some thinking on it, a bit of research, etc., and have worked with a lot of different CSV content.  There are a lot of parallels between the name/value pairs of an HTML form submission and our use case.  

Namely:
	- There's typically only one value per name, but it's perfectly legal to have multiple values assigned to a name.
	- When there are multiple values assigned to a name, order can be very important. 
	- They made the mistake of mapping field names to singular values when there was only one value, and to lists of values when there were multiple values.  
	- Those accessors have been deprecated, and their FieldStorage now always maps field names to lists of values.  

I've implemented a Record class I'm going to pitch for feedback.  Although I followed the FieldStorage API for a couple of methods, it didn't translate very well because their values are complex objects.  This Record class is a dictionary type that maps header names to lists of the values from columns labeled by that same header.  Most lists have a single field because usually headers aren't duplicated.  When multiple values are in a field, they are listed in the order they were read from the CSV file.  The API provides convenience methods for getting the first or last value listed for a given column name, making it very easy to work with singular values when desired.  The dictionary API will likely be the primary mechanism for interacting with it; however, a record also knows the header and row sequences it was built from, and provides sequential access to them as well.  In addition to helping with non-standard CSV, transformations, etc., this information makes it possible to reproduce correctly ordered CSV.

I don't really know yet whether it would make sense to support any kind of manipulation of values on the record instances themselves, versus a copy()/update() approach to defining modified records, but I did decide to wrap the row values in a tuple, making them read-only.  This was for several reasons: one was to address a potential inconsistency that might arise should we decide to support editing, and the other is that the record is the representation of the row read from the source file, and so it should always accurately reflect that content.

About the code: I wrote it tonight and tested it for an hour, so it's not meant to be perfect or final, but it should stir up a very concrete discussion about the API, if nothing else ;-)  I included a generator that seemed to work on some test files.  It is most definitely not meant to be critiqued or a distraction, but I've included it in case anyone ends up wanting to investigate things further.  Although the iterator function provides a slightly different signature than DictReader, that's not because I'm trying to change anything; please keep in mind the generator was just a test.  Also, I'd like to mention one last time that I don't think we should change what exists to reflect any of these changes: I was thinking it would be a new set of classes and functions that would become the preferred implementation in the future.  




> class Record(dict):
>     def __init__(self, headers, fields):
>         if len(headers) != len(fields):
>             # I don't make decisions about how gaps should be filled. 
>             raise ValueError("header/field size mismatch")
>         self._headers = headers
>         self._fields = tuple(fields)
>         [self.setdefault(h,[]).append(v) for h,v in self.fielditems()]
>         super(Record, self).__init__()
>     def fielditems(self):
>         """
>             Get header,value sequence that reflects CSV source.  
>         """
>         return zip(self.headers(),self.fields())
>     def headers(self):
>         """
>             Get ordered sequence of headers reflecting CSV source. 
>         """
>         return self._headers
>     def fields(self):
>         """
>             Get ordered sequence of values reflecting CSV row source. 
>         """
>         return self._fields
>     def getfirst(self, name, default=None):
>         """
>             Get value of first field associated with header named  
>             'name'; return 'default' if no such value exists. 
>         """
>         return self[name][0] if name in self else default
>     def getlast(self, name, default=None):
>         """
>             Get value of last field associated with header named  
>             'name'; return 'default' if no such value exists. 
>         """
>         return self[name][-1] if name in self else default
>     def getlist(self, name): 
>         """
>             Get values of all fields associated with header named 'name'.
>         """
>         return self.get(name, [])
>     def pretty(self, header=True):
>         lines = []
>         if header:
>             lines.append(
>                 [("%s" % h).rjust(20) for h in self.headers()])
>         lines.append(
>             [("%s" % v).rjust(20) for v in self.fields()])
>         return "\n\n".join(["|".join(line).strip() for line in lines])
>     def __getslice__(self, start=0, stop=None):
>         return self.fields()[start: stop]
> 
> 
> import itertools
> from csv import reader
> 
> Undefined = object()
> def iterrecords(f, headers=None, bucketheader=Undefined, 
>     missingfieldsok=False, dialect="excel", *args, **kw):
>     rows = reader(f, dialect, *args, **kw)
>     headcount = len(headers) if headers else 0
>     for row in itertools.ifilter(None, rows):
>         if not headers:
>             headers = row
>             headcount = len(headers)
>             continue
>         rowcount = len(row)
>         rowheaders = headers
>         if rowcount < headcount:
>             if not missingfieldsok:
>                 raise KeyError("row has fewer values than headers")
>             # Truncate headers so Record's length check passes.
>             rowheaders = headers[:rowcount]
>         elif rowcount > headcount: 
>             if bucketheader is Undefined:
>                 raise KeyError("row has more values than headers")
>             # Copy before extending so the shared headers list isn't mutated.
>             rowheaders = headers + [bucketheader] * (rowcount - headcount)
>         record = Record(rowheaders, row)
>         yield record


# Note: as written, iterrecords expects csv.reader to be in scope as 'reader'.


Shane Green 
www.umbrellacode.com
408-692-4666 | shane at umbrellacode.com

On Jan 30, 2013, at 2:32 AM, Mark Hackett <mark.hackett at metoffice.gov.uk> wrote:

> On Tuesday 29 Jan 2013, Eric V. Smith wrote:
>> On 1/29/2013 3:37 PM, Stephen J. Turnbull wrote:
>>> Eric V. Smith writes:
>>>> True. But my point stands: it's possible to read the data (even with a
>>>> DictReader), do something with the data, and not know the column names
>>>> in advance. It's not an impossible use case.
>>> 
>>> But it is.  Dicts don't guarantee iteration order, so you will most
>>> likely get an output file that not only has a different delimiter, but
>>> a different order of fields.
>> 
>> We're going to have to agree to disagree. Order is not always important.
>> 
> 
> It's not impossible that we're living in a simulated world.
> 
> If you don't know what's in the csv file at all, then how do you know what 
> you're supposed to do with it.
> 
> Reading into a list will ensure order, so that is usable if order is 
> important. If the names aren't important at all, then you should drop the first 
> line and read it into a list again. If the names are important, you'd better 
> know what names the headers are using.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas


From shane at umbrellacode.com  Wed Jan 30 13:59:17 2013
From: shane at umbrellacode.com (Shane Green)
Date: Wed, 30 Jan 2013 04:59:17 -0800
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <0DE28815-D265-44D5-AC17-0A7524C6DF5D@umbrellacode.com>
References: <1358903168.4767.4.camel@webb>
	<87a9rr20a5.fsf@uwakimon.sk.tsukuba.ac.jp>
	<5108383E.3020501@trueblade.com>
	<201301301032.54211.mark.hackett@metoffice.gov.uk>
	<0DE28815-D265-44D5-AC17-0A7524C6DF5D@umbrellacode.com>
Message-ID: <C05BFAE8-748B-4FB3-BC91-E3880DC9E2A2@umbrellacode.com>

[snip]

I should probably also have noted the dictionary API behaviour since it's not explicit: 
	keys() -> list of unique header names.
	values() -> list of field-value lists.
	items() -> list of (header, field-list) pairs.

And then of course there's dictionary lookup.  One thing that comes to mind is that there's really no value in the unordered sequence of value lists; there could be some value in extending an OrderedDict, making all the iteration methods consistent and therefore usable for things like writing values back out, etc.
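To make the duplicate-header behaviour concrete, here is a condensed, self-contained miniature of the proposed class (an illustrative sketch of the idea, not the full Record posted above — the class name and sample data are mine):

```python
class MiniRecord(dict):
    """Condensed sketch of the proposed Record: header -> list of values."""
    def __init__(self, headers, fields):
        if len(headers) != len(fields):
            raise ValueError("header/field size mismatch")
        super(MiniRecord, self).__init__()
        self._headers = list(headers)
        self._fields = tuple(fields)
        # Values accumulate per header, preserving source order.
        for h, v in zip(self._headers, self._fields):
            self.setdefault(h, []).append(v)
    def getfirst(self, name, default=None):
        return self[name][0] if name in self else default
    def getlast(self, name, default=None):
        return self[name][-1] if name in self else default
    def getlist(self, name):
        return self.get(name, [])

# A row whose "phone" header is duplicated:
rec = MiniRecord(["name", "phone", "phone"], ["Ann", "555-1", "555-2"])
assert rec.getfirst("phone") == "555-1"
assert rec.getlast("phone") == "555-2"
assert rec.getlist("phone") == ["555-1", "555-2"]
assert rec.getlist("fax") == []
```

Nothing is lost when headers collide, which is the whole point of the proposal.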




-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130130/ef4af81d/attachment.html>

From jeff at jeffreyjenkins.ca  Wed Jan 30 15:04:47 2013
From: jeff at jeffreyjenkins.ca (Jeff Jenkins)
Date: Wed, 30 Jan 2013 09:04:47 -0500
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <C05BFAE8-748B-4FB3-BC91-E3880DC9E2A2@umbrellacode.com>
References: <1358903168.4767.4.camel@webb>
	<87a9rr20a5.fsf@uwakimon.sk.tsukuba.ac.jp>
	<5108383E.3020501@trueblade.com>
	<201301301032.54211.mark.hackett@metoffice.gov.uk>
	<0DE28815-D265-44D5-AC17-0A7524C6DF5D@umbrellacode.com>
	<C05BFAE8-748B-4FB3-BC91-E3880DC9E2A2@umbrellacode.com>
Message-ID: <CAK6S7j+VbLDAikcO07wy+EmQoEdYATdNgb6dq=6i-D1YPzPc7w@mail.gmail.com>

I think this may have been lost somewhere in the last 90 messages, but
adding a warning to DictReader in the docs seems like it solves almost the
entire problem.  New csv.DictReader users are informed, no one's old code
breaks, and a separate discussion can be had about whether it's worth
adding a csv.MultiDictReader which uses lists.
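A MultiDictReader along those lines could be a thin generator over csv.reader — the name and signature here are purely hypothetical, sketched for discussion, not an existing csv class:

```python
import csv
import io

def multi_dict_reader(f, dialect="excel", **kw):
    """Hypothetical reader: yields one dict per row, header -> list of values."""
    rows = csv.reader(f, dialect, **kw)
    headers = next(rows)  # first row supplies the header names
    for row in rows:
        record = {}
        for h, v in zip(headers, row):
            record.setdefault(h, []).append(v)
        yield record

# Duplicate "a" column: both values survive, in order.
data = io.StringIO("a,b,a\r\n1,2,3\r\n")
rows = list(multi_dict_reader(data))
assert rows == [{"a": ["1", "3"], "b": ["2"]}]
```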


On Wed, Jan 30, 2013 at 7:59 AM, Shane Green <shane at umbrellacode.com> wrote:

> So I've done some thinking on it, a bit of research, etc., and have worked
> with a lot of different CSV content.  There are a lot of parallels between
> the name/value pairs of an HTML form submission, and our use case.
>
> Namely:
> - There's typically only one value per name, but it's perfectly legal to
> have multiple values assigned to a name.
> - When there are multiple values assigned to a name, order can
> be very important.
> - They made the mistake of mapping field names to singular values when
> there was only one value, and to lists of values when there were multiple.
> - Each of these has been deprecated, and their FieldStorage now always maps
> field names to lists of values.
>
> I've implemented a Record class I'm going to pitch for feedback.  Although
> I followed the FieldStorage API for a couple of methods, it didn't
> translate very well because their values are complex objects.  This Record
> class is a dictionary type that maps header names to the values from
> columns labeled by that same header.  Most lists have a single value,
> because usually headers aren't duplicated.  When multiple values are in a
> field, they are listed in the order they were read from the CSV file.  The
> API provides convenience methods for getting the first or last value listed
> for a given column name, making it very easy to work with singular
> values when desired.  The dictionary API will likely be the primary
> mechanism for interacting with it; however, the record also knows the
> header and row sequences it was built from, and provides sequential access
> to them as well.  In addition to supporting non-standard CSV,
> transformations, etc., this information makes it possible to reproduce
> correctly ordered CSV.
>
> While I don't really know yet whether it would make sense to support any
> kind of manipulation of values on the record instances themselves, versus
> using more of a copy()/update() approach to defining modified records, I
> did decide to wrap the row values in a tuple, making them read-only.  This
> was for several reasons.  One was to address a potential inconsistency that
> might arise should we decide to support editing, and the other is that the
> record is the representation of that row read from the source file, and so
> it should always accurately reflect that content.
>
> About the code: I wrote it tonight and tested it for an hour, so it's not
> meant to be perfect or final, but it should stir up a very concrete
> discussion about the API, if nothing else ;-)  I included a generator that
> seemed to work on some test files.  It is definitely not meant to
> be critiqued or a distraction, but I've included it in case anyone ends up
> wanting to investigate things further.  Although the iterator function
> provides a slightly different signature than DictReader, that's not because
> I'm trying to change anything; please keep in mind the generator was just
> a test.  Also, I'd like to mention one last time that I don't think we
> should change what exists to reflect any of these changes: I was thinking
> it would be a new set of classes and functions that would become the
> preferred implementation in the future.
>
>
>
>
> class Record(dict):
>     def __init__(self, headers, fields):
>         if len(headers) != len(fields):
>             # I don't make decisions about how gaps should be filled.
>             raise ValueError("header/field size mismatch")
>         super(Record, self).__init__()
>         self._headers = headers
>         self._fields = tuple(fields)
>         for h, v in self.fielditems():
>             self.setdefault(h, []).append(v)
>     def fielditems(self):
>         """
>             Get header,value sequence that reflects CSV source.
>         """
>         return zip(self.headers(), self.fields())
>     def headers(self):
>         """
>             Get ordered sequence of headers reflecting CSV source.
>         """
>         return self._headers
>     def fields(self):
>         """
>             Get ordered sequence of values reflecting CSV row source.
>         """
>         return self._fields
>     def getfirst(self, name, default=None):
>         """
>             Get value of first field associated with header named
>             'name'; return 'default' if no such value exists.
>         """
>         return self[name][0] if name in self else default
>     def getlast(self, name, default=None):
>         """
>             Get value of last field associated with header named
>             'name'; return 'default' if no such value exists.
>         """
>         return self[name][-1] if name in self else default
>     def getlist(self, name):
>         """
>             Get values of all fields associated with header named 'name'.
>         """
>         return self.get(name, [])
>     def pretty(self, header=True):
>         lines = []
>         if header:
>             lines.append(
>                 [("%s" % h).ljust(10).rjust(20) for h in self.headers()])
>         lines.append(
>             [("%s" % v).ljust(10).rjust(20) for v in self.fields()])
>         return "\n\n".join(["|".join(line).strip() for line in lines])
>     def __getslice__(self, start=0, stop=None):
>         return self.fields()[start: stop]
>
>
> import itertools
> from csv import reader
>
> Undefined = object()
> def iterrecords(f, headers=None, bucketheader=Undefined,
>     missingfieldsok=False, dialect="excel", *args, **kw):
>     rows = reader(f, dialect, *args, **kw)
>     for row in itertools.ifilter(None, rows):
>         if not headers:
>             headers = row
>             headcount = len(headers)
>             continue
>         rowcount = len(row)
>         rowheaders = headers
>         if rowcount < headcount:
>             if not missingfieldsok:
>                 raise KeyError("row has fewer values than headers")
>         elif rowcount > headcount:
>             if bucketheader is Undefined:
>                 raise KeyError("row has more values than headers")
>             # Copy; += would mutate the shared headers list in place.
>             rowheaders = headers + [bucketheader] * (rowcount - headcount)
>         record = Record(rowheaders, row)
>         yield record
>
>
>
>
> I should probably also have noted the dictionary API behaviour since it's
> not explicit:
> keys() -> list of unique header names.
> values() -> list of field-value lists.
> items() -> list of (header, field-list) pairs.
>
> And then of course there's dictionary lookup.  One thing that comes to mind
> is that there's really no value in the unordered sequence of value lists;
> there could be some value in extending an OrderedDict, making all the
> iteration methods consistent and therefore usable for things like writing
> values back out, etc.
>
>
>
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130130/42df6b2f/attachment.html>

From dreis.pt at hotmail.com  Wed Jan 30 15:22:37 2013
From: dreis.pt at hotmail.com (Daniel Reis)
Date: Wed, 30 Jan 2013 14:22:37 +0000
Subject: [Python-ideas] Standard library high level support for email
	messages
In-Reply-To: <mailman.5650.1359553300.2938.python-ideas@python.org>
References: <mailman.5650.1359553300.2938.python-ideas@python.org>
Message-ID: <COL002-W356F457A37666B1F46F7458C1E0@phx.gbl>

Hello all,

Python, as a "batteries included" language, strives to provide out-of-the-box solutions for most common programming tasks.
Composing and sending email messages is a common task, supported by `email` and `smtplib` modules.

However, a programmer not familiar with MIME won't be able to create non-trivial email messages.
Actually, this proposal idea comes from the frustration of having to fast-learn MIME to get the job done, and later learning that some people's email clients couldn't properly display the messages because I tripped on some details of multipart messages with Text+HTML and attachments.

You can call me a bad programmer, but couldn't / shouldn't this be easier?
Should a programmer be required to know about MIME in order to send a decently composed email with images or attachments?

The hardest part is already built in. Why not go one step further and add to the email standard library a wrapper to handle common email composition without exposing the MIME details?

Something similar to http://code.activestate.com/recipes/576858-send-html-or-text-email-with-or-without-attachment, or perhaps including a lib such as pyzlib.
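For reference, the boilerplate being complained about looks roughly like this with the stdlib as it stands (a text+HTML alternative message; attachments would add yet more MIME plumbing — the addresses and bodies here are placeholders):

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# "alternative" lets the client pick the richest part it can display.
msg = MIMEMultipart("alternative")
msg["Subject"] = "Hello"
msg["From"] = "me@example.com"       # placeholder address
msg["To"] = "you@example.com"        # placeholder address

# Plain-text part first, HTML last (clients prefer the last alternative).
msg.attach(MIMEText("plain text body", "plain"))
msg.attach(MIMEText("<p>HTML body</p>", "html"))

raw = msg.as_string()  # ready to hand to smtplib.SMTP.sendmail()
```

Even this minimal case requires knowing that part order matters and which multipart subtype to use, which is exactly the knowledge a wrapper could encapsulate.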

Regards
Daniel Reis

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130130/9faeb8ed/attachment.html>

From shane at umbrellacode.com  Wed Jan 30 15:44:26 2013
From: shane at umbrellacode.com (Shane Green)
Date: Wed, 30 Jan 2013 06:44:26 -0800
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <CAK6S7j+VbLDAikcO07wy+EmQoEdYATdNgb6dq=6i-D1YPzPc7w@mail.gmail.com>
References: <1358903168.4767.4.camel@webb>
	<87a9rr20a5.fsf@uwakimon.sk.tsukuba.ac.jp>
	<5108383E.3020501@trueblade.com>
	<201301301032.54211.mark.hackett@metoffice.gov.uk>
	<0DE28815-D265-44D5-AC17-0A7524C6DF5D@umbrellacode.com>
	<C05BFAE8-748B-4FB3-BC91-E3880DC9E2A2@umbrellacode.com>
	<CAK6S7j+VbLDAikcO07wy+EmQoEdYATdNgb6dq=6i-D1YPzPc7w@mail.gmail.com>
Message-ID: <8A15CA39-99E1-4E57-8541-FE39B53323DD@umbrellacode.com>



	"""Also, I'd like to mention one last time that I don't think we should change what exists to reflect any of these changes: I was thinking it would be a new set of classes and functions that would become the preferred implementation in the future."""


This is kind of that new discussion.  I agree.




Shane Green 
www.umbrellacode.com
408-692-4666 | shane at umbrellacode.com

On Jan 30, 2013, at 6:04 AM, Jeff Jenkins <jeff at jeffreyjenkins.ca> wrote:

> I think this may have been lost somewhere in the last 90 messages, but adding a warning to DictReader in the docs seems like it solves almost the entire problem.  New csv.DictReader users are informed, no one's old code breaks, and a separate discussion can be had about whether it's worth adding a csv.MultiDictReader which uses lists.
> 
> 
> On Wed, Jan 30, 2013 at 7:59 AM, Shane Green <shane at umbrellacode.com> wrote:
>> So I've done some thinking on it, a bit of research, etc., and have worked with a lot of different CSV content.  There are a lot of parallels between the name/value pairs of an HTML form submission, and our use case.  
>> 
>> Namely:
>> 	- There's typically only one value per name, but it's perfectly legal to have multiple values assigned to a name.
>> 	- When there are multiple values assigned to a name, order can be very important. 
>> 	- They made the mistake of mapping field names to singular values when there was only one value, and to lists of values when there were multiple.  
>> 	- Each of these has been deprecated, and their FieldStorage now always maps field names to lists of values.  
>> 
>> I've implemented a Record class I'm going to pitch for feedback.  Although I followed the FieldStorage API for a couple of methods, it didn't translate very well because their values are complex objects.  This Record class is a dictionary type that maps header names to the values from columns labeled by that same header.  Most lists have a single value, because usually headers aren't duplicated.  When multiple values are in a field, they are listed in the order they were read from the CSV file.  The API provides convenience methods for getting the first or last value listed for a given column name, making it very easy to work with singular values when desired.  The dictionary API will likely be the primary mechanism for interacting with it; however, the record also knows the header and row sequences it was built from, and provides sequential access to them as well.  In addition to supporting non-standard CSV, transformations, etc., this information makes it possible to reproduce correctly ordered CSV.
>> 
>> While I don't really know yet whether it would make sense to support any kind of manipulation of values on the record instances themselves, versus using more of a copy()/update() approach to defining modified records, I did decide to wrap the row values in a tuple, making them read-only.  This was for several reasons.  One was to address a potential inconsistency that might arise should we decide to support editing, and the other is that the record is the representation of that row read from the source file, and so it should always accurately reflect that content.
>> 
>> About the code: I wrote it tonight and tested it for an hour, so it's not meant to be perfect or final, but it should stir up a very concrete discussion about the API, if nothing else ;-)  I included a generator that seemed to work on some test files.  It is definitely not meant to be critiqued or a distraction, but I've included it in case anyone ends up wanting to investigate things further.  Although the iterator function provides a slightly different signature than DictReader, that's not because I'm trying to change anything; please keep in mind the generator was just a test.  Also, I'd like to mention one last time that I don't think we should change what exists to reflect any of these changes: I was thinking it would be a new set of classes and functions that would become the preferred implementation in the future.  
>> 
>> 
>> 
>> 
>>> class Record(dict):
>>>     def __init__(self, headers, fields):
>>>         if len(headers) != len(fields):
>>>             # I don't make decisions about how gaps should be filled. 
>>>             raise ValueError("header/field size mismatch")
>>>         super(Record, self).__init__()
>>>         self._headers = headers
>>>         self._fields = tuple(fields)
>>>         for h, v in self.fielditems():
>>>             self.setdefault(h, []).append(v)
>>>     def fielditems(self):
>>>         """
>>>             Get header,value sequence that reflects CSV source.  
>>>         """
>>>         return zip(self.headers(), self.fields())
>>>     def headers(self):
>>>         """
>>>             Get ordered sequence of headers reflecting CSV source. 
>>>         """
>>>         return self._headers
>>>     def fields(self):
>>>         """
>>>             Get ordered sequence of values reflecting CSV row source. 
>>>         """
>>>         return self._fields
>>>     def getfirst(self, name, default=None):
>>>         """
>>>             Get value of first field associated with header named  
>>>             'name'; return 'default' if no such value exists. 
>>>         """
>>>         return self[name][0] if name in self else default
>>>     def getlast(self, name, default=None):
>>>         """
>>>             Get value of last field associated with header named  
>>>             'name'; return 'default' if no such value exists. 
>>>         """
>>>         return self[name][-1] if name in self else default
>>>     def getlist(self, name): 
>>>         """
>>>             Get values of all fields associated with header named 'name'.
>>>         """
>>>         return self.get(name, [])
>>>     def pretty(self, header=True):
>>>         lines = []
>>>         if header:
>>>             lines.append(
>>>                 [("%s" % h).ljust(10).rjust(20) for h in self.headers()])
>>>         lines.append(
>>>             [("%s" % v).ljust(10).rjust(20) for v in self.fields()])
>>>         return "\n\n".join(["|".join(line).strip() for line in lines])
>>>     def __getslice__(self, start=0, stop=None):
>>>         return self.fields()[start: stop]
>>> 
>>> 
>>> import itertools
>>> from csv import reader
>>> 
>>> Undefined = object()
>>> def iterrecords(f, headers=None, bucketheader=Undefined, 
>>>     missingfieldsok=False, dialect="excel", *args, **kw):
>>>     rows = reader(f, dialect, *args, **kw)
>>>     for row in itertools.ifilter(None, rows):
>>>         if not headers:
>>>             headers = row
>>>             headcount = len(headers)
>>>             continue
>>>         rowcount = len(row)
>>>         rowheaders = headers
>>>         if rowcount < headcount:
>>>             if not missingfieldsok:
>>>                 raise KeyError("row has fewer values than headers")
>>>         elif rowcount > headcount: 
>>>             if bucketheader is Undefined:
>>>                 raise KeyError("row has more values than headers")
>>>             # Copy; += would mutate the shared headers list in place.
>>>             rowheaders = headers + [bucketheader] * (rowcount - headcount)
>>>         record = Record(rowheaders, row)
>>>         yield record
>> 
> 
> 
> I should probably also have noted the dictionary API behaviour since it's not explicit: 
> 	keys() -> list of unique header names.
> 	values() -> list of field-value lists.
> 	items() -> list of (header, field-list) pairs.
> 
> And then of course there's dictionary lookup.  One thing that comes to mind is that there's really no value in the unordered sequence of value lists; there could be some value in extending an OrderedDict, making all the iteration methods consistent and therefore usable for things like writing values back out, etc.
> 
> 
> 
> 
> 
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
> 
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130130/0cb3c70b/attachment.html>

From phd at phdru.name  Wed Jan 30 15:54:10 2013
From: phd at phdru.name (Oleg Broytman)
Date: Wed, 30 Jan 2013 18:54:10 +0400
Subject: [Python-ideas] Standard library high level support for email
 messages
In-Reply-To: <COL002-W356F457A37666B1F46F7458C1E0@phx.gbl>
References: <mailman.5650.1359553300.2938.python-ideas@python.org>
	<COL002-W356F457A37666B1F46F7458C1E0@phx.gbl>
Message-ID: <20130130145410.GA30635@iskra.aviel.ru>

Hi!

On Wed, Jan 30, 2013 at 02:22:37PM +0000, Daniel Reis <dreis.pt at hotmail.com> wrote:
> Python, as a "batteries included" language, strives to provide out-of-the-box solutions for most common programming tasks.
> Composing and sending email messages is a common task, supported by `email` and `smtplib` modules.
> 
> However, a programmer not familiar with MIME won't be able to create non-trivial email messages.
> Actually, this proposal idea comes from the frustration of having to fast-learn MIME to get the job done, and later learning that some people's email clients couldn't properly display the messages because I tripped on some details of multipart messages with Text+HTML and attachments.
> 
> You can call me a bad programmer, but couldn't / shouldn't this be easier?
> Should a programmer be required to know about MIME in order to send a decently composed email with images or attachments?
> 
> The hardest part is already built in. Why not go one step further and add to the email standard library a wrapper to handle common email composition without exposing the MIME details?
> 
> Something similar to http://code.activestate.com/recipes/576858-send-html-or-text-email-with-or-without-attachment, or perhaps including a lib such as pyzlib.

   The Law of Leaky Abstractions: if you are going to use a protocol or
a data format, you have to learn all the basic details and deep
internals. Yes, it's inevitable, because when something goes wrong (sooner
or later), how do you debug your code without a deep understanding of
what's going on?
   One of the most painful experiences with email in Russia is when some
server (forum software, e.g.) running on Linux and using the koi8-r charset
sends mail messages with unencoded headers to Windows users who use the
cp1251 encoding. This is because server software is often written by
people who never use anything besides pure ASCII, so they write code
like:
    print "Subject: " + subject
How do you debug such bug reports without understanding why and how you
have to encode mail headers?
   It was just an example, but I think it shows an important point.
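The header-encoding step skipped in that naive `print` is what `email.header` exists for; a non-ASCII subject has to be RFC 2047-encoded before it goes on the wire (the sample subject below is mine, just an illustration):

```python
from email.header import Header

subject = u"\u0442\u0435\u043c\u0430"  # a Cyrillic subject, non-ASCII
encoded = Header(subject, "utf-8").encode()
# 'encoded' is now an RFC 2047 encoded-word like '=?utf-8?b?...?='
# that any conforming client can decode, regardless of its local charset;
# concatenating the raw string into "Subject: " would emit bare non-ASCII.
```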

   On the other hand, actually writing software shouldn't be hard, I
agree. The way to extend the standard library is: write a module,
publish it on PyPI, make it popular, then apply for inclusion of the
module.

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            phd at phdru.name
           Programmers don't die, they just GOSUB without RETURN.


From mark.hackett at metoffice.gov.uk  Wed Jan 30 16:16:37 2013
From: mark.hackett at metoffice.gov.uk (Mark Hackett)
Date: Wed, 30 Jan 2013 15:16:37 +0000
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <CAK6S7j+VbLDAikcO07wy+EmQoEdYATdNgb6dq=6i-D1YPzPc7w@mail.gmail.com>
References: <1358903168.4767.4.camel@webb>
	<C05BFAE8-748B-4FB3-BC91-E3880DC9E2A2@umbrellacode.com>
	<CAK6S7j+VbLDAikcO07wy+EmQoEdYATdNgb6dq=6i-D1YPzPc7w@mail.gmail.com>
Message-ID: <201301301516.37499.mark.hackett@metoffice.gov.uk>

On Wednesday 30 Jan 2013, Jeff Jenkins wrote:
> I think this may have been lost somewhere in the last 90 messages, but
> adding a warning to DictReader in the docs seems like it solves almost the
> entire problem. 

Jeff, it breaks code that works now because duplicates aren't cared about.

Shane is putting code up for a NEW call that you can use if you're worried 
about how the current one works, and consideration for this issue is being 
included in the derivation of a new library for the next (and therefore 
allowed to be incompatible) Python library version.


From fuzzyman at gmail.com  Wed Jan 30 16:16:49 2013
From: fuzzyman at gmail.com (Michael Foord)
Date: Wed, 30 Jan 2013 15:16:49 +0000
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <20130129202730.6ea6d0d5@anarchist.wooz.org>
References: <CAH0mxTQ9wfcR_R5SbXV0J63iVaB1PvYvSF06tRy=AibO2Qajfg@mail.gmail.com>
	<20130129202730.6ea6d0d5@anarchist.wooz.org>
Message-ID: <CAKCKLWxddOp-0k3b9Gdeg+WfNhJOnTPtJh50D62sVSDpf-zcaw@mail.gmail.com>

On 30 January 2013 01:27, Barry Warsaw <barry at python.org> wrote:

> On Jan 28, 2013, at 11:50 PM, Joao S. O. Bueno wrote:
>
> >And it was not dismissed at all - to the contrary, the last e-mail in the
> >thread is a message from the BDFL for it to **be**! The discussion
> >happened at a bad moment, as Python was mostly feature-frozen for 3.2 -
> >and it did not show up again for Python 3.3;
>
> I still offer up my own enum implementation, which I've used and has been
> available for years on PyPI, and hasn't had a new release in months
> because it
> hasn't needed one. :)  It should be compatible with Pythons from 2.6 to
> 3.3.
>
> http://pypi.python.org/pypi/flufl.enum
>
> The one hang-up about it the last time this came up was that my enum
> items are not ints and Guido thought they should be.  I actually tried at
> one point to make that so, but had some troublesome test failures that I
> didn't have time or motivation to fix, mostly because I don't particularly
> like those semantics.  I don't remember the details.
>
> However, if someone *else* wanted to submit a branch/patch to have enum
> items
> inherit from ints, and that was all it took to have these adopted into the
> stdlib, I would be happy to take a look.
>
>

Being an int subclass (and possibly, optionally, a str subclass) is a
requirement if any adopted Enum is to be used *within* the standard library
in places where integers are currently used as "poor man's enums". I also
don't *think* flufl.enum supports flag enums (ones that can be OR'd
together), right?

Michael


> Cheers,
> -Barry
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>
>


-- 

http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130130/a4776990/attachment.html>

From stefan_ml at behnel.de  Wed Jan 30 16:25:36 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 30 Jan 2013 16:25:36 +0100
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <CAF-Rda-e9ezNVqKx2GBngOJb6wT165Z1do9=boMioxBcJejfeA@mail.gmail.com>
References: <CAH0mxTQ9wfcR_R5SbXV0J63iVaB1PvYvSF06tRy=AibO2Qajfg@mail.gmail.com>
	<CADiSq7cKrdgzBgENPYYXA5S-oCtT4Jxs5xnMwr750LUyKg+v6Q@mail.gmail.com>
	<CAF-Rda8B1Cw1vNZxmwuCP0AxHxhjdNQvoCWGqJsKmZsj4g0C3g@mail.gmail.com>
	<51085AAB.6090303@canterbury.ac.nz>
	<CAF-Rda9Oup0fmJapmAg9kFv=YaGwjph2ZyMXrMB3Q0bEhWz2eg@mail.gmail.com>
	<CAP7+vJLT+KxjqNT6=c6ZYmBWw55U_88W8MFqW+_3OH7AqTVw4Q@mail.gmail.com>
	<CAF-Rda-e9ezNVqKx2GBngOJb6wT165Z1do9=boMioxBcJejfeA@mail.gmail.com>
Message-ID: <kebe1d$3jk$1@ger.gmane.org>

Eli Bendersky, 30.01.2013 06:26:
> enum color:
>   RED, WHITE, BLUE
> 
> Would adding a new "enum" keyword in Python 3.4 *really* meet that much
> resistance? ISTM built-in, standard, enums have been on the wishlist of
> Python developers for a long time.

Special cases aren't special enough to break the rules (or even existing
code!).

Stefan




From fuzzyman at gmail.com  Wed Jan 30 16:22:06 2013
From: fuzzyman at gmail.com (Michael Foord)
Date: Wed, 30 Jan 2013 15:22:06 +0000
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <20130130082639.0b28d7eb@pitrou.net>
References: <CAH0mxTQ9wfcR_R5SbXV0J63iVaB1PvYvSF06tRy=AibO2Qajfg@mail.gmail.com>
	<CADiSq7cKrdgzBgENPYYXA5S-oCtT4Jxs5xnMwr750LUyKg+v6Q@mail.gmail.com>
	<CAF-Rda8B1Cw1vNZxmwuCP0AxHxhjdNQvoCWGqJsKmZsj4g0C3g@mail.gmail.com>
	<51085AAB.6090303@canterbury.ac.nz>
	<CAF-Rda9Oup0fmJapmAg9kFv=YaGwjph2ZyMXrMB3Q0bEhWz2eg@mail.gmail.com>
	<CAP7+vJLT+KxjqNT6=c6ZYmBWw55U_88W8MFqW+_3OH7AqTVw4Q@mail.gmail.com>
	<5108A87D.9000207@canterbury.ac.nz>
	<20130130082639.0b28d7eb@pitrou.net>
Message-ID: <CAKCKLWyegkHo15dcVvhC23WaY=3Mnr6P-JT5M_Lm6UKCMahUhg@mail.gmail.com>

On 30 January 2013 07:26, Antoine Pitrou <solipsis at pitrou.net> wrote:

> On Wed, 30 Jan 2013 17:58:37 +1300
> Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> > Guido van Rossum wrote:
> >
> > > class color(enum):
> > >   RED = value()
> > >   WHITE = value()
> > >   BLUE = value()
> >
> > We could do somewhat better than that:
> >
> >     class Color(Enum):
> >        RED, WHITE, BLUE = range(3)
>



With a Python 3 metaclass that provides default values for *looked up*
entries you could have this:

class Color(Enum):
    RED, WHITE, BLUE

The lookup would create the member - with the appropriate value.

Michael





> >
> > However, it's still slightly annoying that you have to
> > specify how many values there are in the range() call.
> > It would be even nicer if we could just use an infinite
> > iterator, such as
> >
> >     class Color(Enum):
> >        RED, WHITE, BLUE = values()
>
> Well, how about:
>
> class Color(Enum):
>     values = ('RED', 'WHITE', 'BLUE')
>
> ?
>
> (replace values with __values__ if you prefer)
>
> Regards
>
> Antoine.
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>



-- 

http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130130/1b40d3b8/attachment.html>

From fuzzyman at gmail.com  Wed Jan 30 16:30:48 2013
From: fuzzyman at gmail.com (Michael Foord)
Date: Wed, 30 Jan 2013 15:30:48 +0000
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <CAKCKLWyegkHo15dcVvhC23WaY=3Mnr6P-JT5M_Lm6UKCMahUhg@mail.gmail.com>
References: <CAH0mxTQ9wfcR_R5SbXV0J63iVaB1PvYvSF06tRy=AibO2Qajfg@mail.gmail.com>
	<CADiSq7cKrdgzBgENPYYXA5S-oCtT4Jxs5xnMwr750LUyKg+v6Q@mail.gmail.com>
	<CAF-Rda8B1Cw1vNZxmwuCP0AxHxhjdNQvoCWGqJsKmZsj4g0C3g@mail.gmail.com>
	<51085AAB.6090303@canterbury.ac.nz>
	<CAF-Rda9Oup0fmJapmAg9kFv=YaGwjph2ZyMXrMB3Q0bEhWz2eg@mail.gmail.com>
	<CAP7+vJLT+KxjqNT6=c6ZYmBWw55U_88W8MFqW+_3OH7AqTVw4Q@mail.gmail.com>
	<5108A87D.9000207@canterbury.ac.nz>
	<20130130082639.0b28d7eb@pitrou.net>
	<CAKCKLWyegkHo15dcVvhC23WaY=3Mnr6P-JT5M_Lm6UKCMahUhg@mail.gmail.com>
Message-ID: <CAKCKLWxTt0kwjc4dpRJUz27_0bt+u+AHkEqzTJEoZNC7rshKkQ@mail.gmail.com>

On 30 January 2013 15:22, Michael Foord <fuzzyman at gmail.com> wrote:

>
>
> On 30 January 2013 07:26, Antoine Pitrou <solipsis at pitrou.net> wrote:
>
>> On Wed, 30 Jan 2013 17:58:37 +1300
>> Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
>> > Guido van Rossum wrote:
>> >
>> > > class color(enum):
>> > >   RED = value()
>> > >   WHITE = value()
>> > >   BLUE = value()
>> >
>> > We could do somewhat better than that:
>> >
>> >     class Color(Enum):
>> >        RED, WHITE, BLUE = range(3)
>>
>
>
>
> With a Python 3 metaclass that provides default values for *looked up*
> entries you could have this:
>
> class Color(Enum):
>     RED, WHITE, BLUE
>
> The lookup would create the member - with the appropriate value.
>
>


class values(dict):
    def __init__(self):
        self.value = 0
    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            value = self[key] = self.value
            self.value += 1
            return value

class EnumMeta(type):

    @classmethod
    def __prepare__(metacls, name, bases):
        return values()

    def __new__(cls, name, bases, classdict):
        result = type.__new__(cls, name, bases, dict(classdict))
        return result


class Enum(metaclass=EnumMeta):
    pass

class Color(Enum):
    RED, WHITE, BLUE




> Michael
>
>
>
>
>
>> >
>> > However, it's still slightly annoying that you have to
>> > specify how many values there are in the range() call.
> > > It would be even nicer if we could just use an infinite
>> > iterator, such as
>> >
>> >     class Color(Enum):
>> >        RED, WHITE, BLUE = values()
>>
>> Well, how about:
>>
>> class Color(Enum):
>>     values = ('RED', 'WHITE', 'BLUE')
>>
>> ?
>>
>> (replace values with __values__ if you prefer)
>>
>> Regards
>>
>> Antoine.
>>
>>
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> http://mail.python.org/mailman/listinfo/python-ideas
>>
>
>
>
> --
>
> http://www.voidspace.org.uk/
>
> May you do good and not evil
> May you find forgiveness for yourself and forgive others
>
> May you share freely, never taking more than you give.
> -- the sqlite blessing http://www.sqlite.org/different.html
>
>


-- 

http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html

From barry at python.org  Wed Jan 30 16:35:48 2013
From: barry at python.org (Barry Warsaw)
Date: Wed, 30 Jan 2013 10:35:48 -0500
Subject: [Python-ideas] constant/enum type in stdlib
References: <CAH0mxTQ9wfcR_R5SbXV0J63iVaB1PvYvSF06tRy=AibO2Qajfg@mail.gmail.com>
	<20130129202730.6ea6d0d5@anarchist.wooz.org>
	<CAKCKLWxddOp-0k3b9Gdeg+WfNhJOnTPtJh50D62sVSDpf-zcaw@mail.gmail.com>
Message-ID: <20130130103548.12bce67d@anarchist.wooz.org>

On Jan 30, 2013, at 03:16 PM, Michael Foord wrote:

>Being an int subclass (and possibly optionally a str subclass) is a
>requirement if any adopted Enum is to be used *within* the standard library
>in places where integers are currently used as "poor man's enums". I also
>don't *think* flufl.enum supports flag enums (ones that can be OR'd
>together), right?

Sure, it does, because you have to be explicit about the enum int value
assigned to each item.  This doesn't bother me because the syntax is clear,
I almost always want an explicit int value anyway, inheritance is supported,
and, as you comment, flag values are (mostly) easy to support.

class Colors(Enum):
    red = 1
    green = 2
    blue = 3

class MoreColors(Colors):
    cyan = 4
    magenta = 5
    # chartreuse = 2 would be an error

class Flags(Enum):
    beautiful = 1
    fast = 2
    elegant = 4
    wonderful = 8


Now, it's true that because Flags.fast is not an int, it must be explicitly
converted to an int, e.g. `int(Flags.fast)`.  That doesn't bother me.

What does bother me is that Enum doesn't support automatic conversion to int
for OR and AND, so you have to do this:

>>> int(Flags.fast) | int(Flags.elegant)
6

That should be easy enough to fix by adding the appropriate operators so that
you could do:

>>> Flags.fast | Flags.elegant
6

Returning an int from such operations is the only sensible interpretation.

https://bugs.launchpad.net/flufl.enum/+bug/1110501
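The fix described above can be sketched with a minimal stand-in class
(hypothetical, not flufl.enum's real member type): members are not ints, but
OR and AND convert both operands to int and return a plain int, matching the
proposed semantics:

```python
class EnumValue:
    """Hypothetical stand-in for a flufl.enum-style member (not an int)."""
    def __init__(self, name, value):
        self.name = name
        self.value = value
    def __int__(self):
        return self.value
    def __repr__(self):
        return 'Flags.%s' % self.name
    def __or__(self, other):
        # returning a plain int from bitwise ops, as suggested above
        return int(self) | int(other)
    def __and__(self, other):
        return int(self) & int(other)

class Flags:
    beautiful = EnumValue('beautiful', 1)
    fast = EnumValue('fast', 2)
    elegant = EnumValue('elegant', 4)
    wonderful = EnumValue('wonderful', 8)
```

With this, `Flags.fast | Flags.elegant` gives 6 without any explicit int()
calls at the use site.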

As far as autonumbering goes, I think we could support that in Python 3.3+,
though I don't have any brilliant ideas on syntax.  A couple of suggestions
are in this bug:

https://bugs.launchpad.net/flufl.enum/+bug/1110507

e.g.

class Colors(Enum):
    red = None
    green = None
    blue = None

or

from flufl.enum import Enum, auto
class Colors(Enum):
    red = auto
    green = auto
    blue = auto

I'm definitely open to suggestions here!
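One hedged sketch of the `auto` idea (invented names throughout; this is not
flufl.enum's actual API): a metaclass that keeps the class body in definition
order and replaces an `auto` sentinel with the next integer, numbering from 1
as flufl.enum does:

```python
from collections import OrderedDict

auto = object()  # hypothetical sentinel marking an auto-numbered member

class AutoEnumMeta(type):
    @classmethod
    def __prepare__(metacls, name, bases):
        # preserve definition order so members number top to bottom (3.3+)
        return OrderedDict()
    def __new__(metacls, name, bases, classdict):
        next_value = 1  # flufl.enum numbers members from 1
        for key, value in list(classdict.items()):
            if value is auto:
                classdict[key] = next_value
                next_value += 1
        return type.__new__(metacls, name, bases, dict(classdict))

class Enum(metaclass=AutoEnumMeta):
    pass

class Colors(Enum):
    red = auto
    green = auto
    blue = auto   # red, green, blue become 1, 2, 3
```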

Cheers,
-Barry

From guido at python.org  Wed Jan 30 16:45:12 2013
From: guido at python.org (Guido van Rossum)
Date: Wed, 30 Jan 2013 07:45:12 -0800
Subject: [Python-ideas] libuv based eventloop for tulip experiment
In-Reply-To: <5108EE27.1000102@gmail.com>
References: <51070056.8020006@gmail.com>
	<CAP7+vJJd0oZxzGqu_L0juxDwWvaPhd7cmC8azJSmn=brSnTv_w@mail.gmail.com>
	<51082C41.2030508@gmail.com>
	<CAP7+vJLVyps8UgB+Yx6dkmshn1gqEXxAWoOkbs-m96WQtpzvhA@mail.gmail.com>
	<5108EE27.1000102@gmail.com>
Message-ID: <CAP7+vJJ6Q6pMcgAxD=bV37qj52P-jD7eZ3wdvcbvV7SxbvcsgA@mail.gmail.com>

On Wed, Jan 30, 2013 at 1:55 AM, Saúl Ibarra Corretgé <saghul at gmail.com> wrote:
>
>> Yeah, so do the other polling things on Windows. (Well, mostly
>> sockets. There are some other things supported like named pipes.)
>>
>
> In pyuv there is a special handle for those (Pipe) which works on both Unix
> and Windows with the same interface.

PEP 3156 should add a new API for adding a pipe (either the read or
write end). Someone worked on that for a bit, search last week's
python-ideas archives.

>> I guess in order to support this we'd need some kind of abstraction
>> away from socket objects and file descriptors, at least for event loop
>> methods like sock_recv() and add_reader(). But those are mostly meant
>> for transports to build upon, so I think that would be fine.
>>
>
> I see, great!

The iocp branch now has all these refactorings.

>> Hm, I thought certificates were just blobs of data? We should probably
>> come up with a standard way to represent these that isn't tied to the
>> stdlib's ssl module. But I don't think this should be part of PEP 3156
>> -- it's too big already.
>>
>
> Yes, they are blobs; I meant the objects that wrap those blobs and provide
> verification functions and such. But that can indeed be left out, having the
> implementation deal with it and tulip just hand over the blobs.

Do you know how to write code like that? It would be illustrative to
take the curl.py and crawl.py examples and adjust them so that if the
protocol is https, the server's authenticity is checked and reported.
I've never dealt with this myself so I would probably do it wrong...
:-(

-- 
--Guido van Rossum (python.org/~guido)


From barry at python.org  Wed Jan 30 16:46:23 2013
From: barry at python.org (Barry Warsaw)
Date: Wed, 30 Jan 2013 10:46:23 -0500
Subject: [Python-ideas] Standard library high level support for email
	messages
References: <mailman.5650.1359553300.2938.python-ideas@python.org>
	<COL002-W356F457A37666B1F46F7458C1E0@phx.gbl>
Message-ID: <20130130104623.4fb79da2@anarchist.wooz.org>

On Jan 30, 2013, at 02:22 PM, Daniel Reis wrote:

>The hardest part is already built in. Why not go that one step further and
>add to the email standard library a wrapper to handle common email
>composition without exposing the MIME details.

Please discuss this on the email-sig mailing list.

Cheers,
-Barry

From eliben at gmail.com  Wed Jan 30 17:17:10 2013
From: eliben at gmail.com (Eli Bendersky)
Date: Wed, 30 Jan 2013 08:17:10 -0800
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <20130130103548.12bce67d@anarchist.wooz.org>
References: <CAH0mxTQ9wfcR_R5SbXV0J63iVaB1PvYvSF06tRy=AibO2Qajfg@mail.gmail.com>
	<20130129202730.6ea6d0d5@anarchist.wooz.org>
	<CAKCKLWxddOp-0k3b9Gdeg+WfNhJOnTPtJh50D62sVSDpf-zcaw@mail.gmail.com>
	<20130130103548.12bce67d@anarchist.wooz.org>
Message-ID: <CAF-Rda-6vDo4tG5+ByASSDUm9Z=em4KjLL6CWsfwgO2hF9X+fg@mail.gmail.com>

On Wed, Jan 30, 2013 at 7:35 AM, Barry Warsaw <barry at python.org> wrote:

> On Jan 30, 2013, at 03:16 PM, Michael Foord wrote:
>
> >Being an int subclass (and possibly optionally a str subclass) is a
> >requirement if any adopted Enum is to be used *within* the standard
> library
> >in places where integers are currently used as "poor man's enums". I also
> >don't *think* flufl.enum supports flag enums (ones that can be OR'd
> >together), right?
>
> Sure, it does because you have to be explicit about the enum int value to
> assign the item.  This doesn't bother me because the syntax is clear, I
> almost
> always want an explicit int value anyway, inheritance is supported, and as
> you
> comment, flag values are (mostly) easy to support.
>
> class Colors(Enum):
>     red = 1
>     green = 2
>     blue = 3
>
> class MoreColors(Colors):
>     cyan = 4
>     magenta = 5
>     # chartreuse = 2 would be an error
>
> class Flags(Enum):
>     beautiful = 1
>     fast = 2
>     elegant = 4
>     wonderful = 8
>
>
> Now, it's true that because Flags.fast is not an int, it must be explicitly
> converted to an int, e.g. `int(Flags.fast)`.  That doesn't bother me.
>
> What does bother me is that Enum doesn't support automatic conversion to
> int
> for OR and AND, so you have to do this:
>
> >>> int(Flags.fast) | int(Flags.elegant)
> 6
>
> That should be easy enough to fix by adding the appropriate operators so
> that
> you could do:
>
> >>> Flags.fast | Flags.elegant
> 6
>
> Returning an int from such operations is the only sensible interpretation.
>
> https://bugs.launchpad.net/flufl.enum/+bug/1110501
>
> As far as autonumbering goes, I think we could support that in Python 3.3+,
> though I don't have any brilliant ideas on syntax.  A couple of suggestions
> are in this bug:
>
> https://bugs.launchpad.net/flufl.enum/+bug/1110507
>
> e.g.
>
> class Colors(Enum):
>     red = None
>     green = None
>     blue = None
>
> or
>
> from flufl.enum import Enum, auto
> class Colors(Enum):
>     red = auto
>     green = auto
>     blue = auto
>
> I'm definitely open to suggestions here!
>

Barry, since you've obviously given this issue a lot of thought, maybe you
could summarize it in a PEP so we have a clear way of moving forward for
3.4?

Eli

From p.f.moore at gmail.com  Wed Jan 30 17:21:41 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 30 Jan 2013 16:21:41 +0000
Subject: [Python-ideas] libuv based eventloop for tulip experiment
In-Reply-To: <CAP7+vJJ6Q6pMcgAxD=bV37qj52P-jD7eZ3wdvcbvV7SxbvcsgA@mail.gmail.com>
References: <51070056.8020006@gmail.com>
	<CAP7+vJJd0oZxzGqu_L0juxDwWvaPhd7cmC8azJSmn=brSnTv_w@mail.gmail.com>
	<51082C41.2030508@gmail.com>
	<CAP7+vJLVyps8UgB+Yx6dkmshn1gqEXxAWoOkbs-m96WQtpzvhA@mail.gmail.com>
	<5108EE27.1000102@gmail.com>
	<CAP7+vJJ6Q6pMcgAxD=bV37qj52P-jD7eZ3wdvcbvV7SxbvcsgA@mail.gmail.com>
Message-ID: <CACac1F-tb3wt0XNpd3FZFNrkHV5x-HoAdM3Y7p-Moc0nuftF9A@mail.gmail.com>

On 30 January 2013 15:45, Guido van Rossum <guido at python.org> wrote:
>> In pyuv there is a special handle for those (Pipe) which works on both Unix
>> and Windows with the same interface.
>
> PEP 3156 should add a new API for adding a pipe (either the read or
> write end). Someone worked on that for a bit, search last week's
> python-ideas archives.

That was me.

There's a patched version of tulip with pipe connector methods and a
subprocess transport using them in my bitbucket repository:
https://bitbucket.org/pmoore/tulip

Paul


From solipsis at pitrou.net  Wed Jan 30 17:23:10 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 30 Jan 2013 17:23:10 +0100
Subject: [Python-ideas] constant/enum type in stdlib
References: <CAH0mxTQ9wfcR_R5SbXV0J63iVaB1PvYvSF06tRy=AibO2Qajfg@mail.gmail.com>
	<20130129202730.6ea6d0d5@anarchist.wooz.org>
	<CAKCKLWxddOp-0k3b9Gdeg+WfNhJOnTPtJh50D62sVSDpf-zcaw@mail.gmail.com>
Message-ID: <20130130172310.60b49ef4@pitrou.net>

Le Wed, 30 Jan 2013 15:16:49 +0000,
Michael Foord <fuzzyman at gmail.com> a
écrit :
> 
> Being an int subclass (and possibly optionally a str subclass) is a
> requirement if any adopted Enum is to be used *within* the standard
> library in places where integers are currently used as "poor man's
> enums". I also don't *think* flufl.enum supports flag enums (ones
> that can be OR'd together), right?

If a flexible solution is desired (with either int or str subclassing,
various numbering schemes), may I suggest another kind of syntax:

class ErrorFlag(Enum):
    type = 'symbolic'
    names = ('strict', 'ignore', 'replace')

class SeekFlag(Enum):
    type = 'sequential'
    names = ('SET', 'CUR', 'END')

class TypeFlag(Enum):
    type = 'bitmask'
    names = ('HEAPTYPE', 'HAS_GC', 'INT_SUBCLASS')


>>> ErrorFlag.ignore
ErrorFlag.ignore
>>> ErrorFlag.ignore == 'ignore'
True
>>> ErrorFlag('ignore')
ErrorFlag.ignore
>>> isinstance(ErrorFlag.ignore, str)
True
>>> isinstance(ErrorFlag.ignore, int)
False
>>> ErrorFlag(0)
[...]
ValueError: invalid value for <class 'enum.ErrorFlag'>: 0

>>> SeekFlag('SET')
SeekFlag.SET
>>> SeekFlag('SET') + 0
0
>>> SeekFlag(0)        
SeekFlag.SET
>>> isinstance(SeekFlag.CUR, int)
True
>>> isinstance(SeekFlag.CUR, str)
False

>>> TypeFlag(1)
TypeFlag.HEAPTYPE
>>> TypeFlag(2)
TypeFlag.HAS_GC
>>> TypeFlag.HAS_GC | TypeFlag.INT_SUBCLASS
6
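One way the proposal above could look under the hood - a hedged sketch of
just the 'sequential' flavor, with invented names (SeqEnumMeta, SeqEnum,
_by_value) and no claim to match any real implementation. Members are true
int subclass instances, and calling the class resolves either a name or a
value back to the member:

```python
class SeqEnumMeta(type):
    """Metaclass sketch: turns a 'names' tuple into int-valued members."""
    def __new__(mcls, clsname, bases, ns):
        cls = super().__new__(mcls, clsname, bases, ns)
        cls._by_value = {}
        for i, membername in enumerate(ns.get('names', ())):
            member = int.__new__(cls, i)   # members really are ints
            member._name = membername
            setattr(cls, membername, member)
            cls._by_value[i] = member
        return cls

    def __call__(cls, value):
        # SeekFlag('SET') and SeekFlag(0) both resolve to the member
        if isinstance(value, str):
            member = getattr(cls, value, None)
            if isinstance(member, cls):
                return member
        elif value in cls._by_value:
            return cls._by_value[value]
        raise ValueError('invalid value for %r: %r' % (cls, value))

class SeqEnum(int, metaclass=SeqEnumMeta):
    names = ()
    def __repr__(self):
        return '%s.%s' % (type(self).__name__, self._name)

class SeekFlag(SeqEnum):
    names = ('SET', 'CUR', 'END')
```

The 'symbolic' and 'bitmask' flavors would differ only in the base type and
the value assigned to each position.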


Regards

Antoine.




From barry at python.org  Wed Jan 30 17:27:07 2013
From: barry at python.org (Barry Warsaw)
Date: Wed, 30 Jan 2013 11:27:07 -0500
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <CAF-Rda-6vDo4tG5+ByASSDUm9Z=em4KjLL6CWsfwgO2hF9X+fg@mail.gmail.com>
References: <CAH0mxTQ9wfcR_R5SbXV0J63iVaB1PvYvSF06tRy=AibO2Qajfg@mail.gmail.com>
	<20130129202730.6ea6d0d5@anarchist.wooz.org>
	<CAKCKLWxddOp-0k3b9Gdeg+WfNhJOnTPtJh50D62sVSDpf-zcaw@mail.gmail.com>
	<20130130103548.12bce67d@anarchist.wooz.org>
	<CAF-Rda-6vDo4tG5+ByASSDUm9Z=em4KjLL6CWsfwgO2hF9X+fg@mail.gmail.com>
Message-ID: <20130130112707.5cf60dfc@anarchist.wooz.org>

On Jan 30, 2013, at 08:17 AM, Eli Bendersky wrote:

>Barry, since you've obviously given this issue a lot of thought, maybe you
>could summarize it in a PEP so we have a clear way of moving forward for
>3.4?

I'm happy to do so if there's a realistic chance of it being accepted.  We
already have one rejected enum PEP (354) and we probably don't need two. ;)

Cheers,
-Barry

From solipsis at pitrou.net  Wed Jan 30 17:26:27 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 30 Jan 2013 17:26:27 +0100
Subject: [Python-ideas] constant/enum type in stdlib
References: <CAH0mxTQ9wfcR_R5SbXV0J63iVaB1PvYvSF06tRy=AibO2Qajfg@mail.gmail.com>
	<CADiSq7cKrdgzBgENPYYXA5S-oCtT4Jxs5xnMwr750LUyKg+v6Q@mail.gmail.com>
	<CAF-Rda8B1Cw1vNZxmwuCP0AxHxhjdNQvoCWGqJsKmZsj4g0C3g@mail.gmail.com>
	<51085AAB.6090303@canterbury.ac.nz>
	<CAF-Rda9Oup0fmJapmAg9kFv=YaGwjph2ZyMXrMB3Q0bEhWz2eg@mail.gmail.com>
	<CAP7+vJLT+KxjqNT6=c6ZYmBWw55U_88W8MFqW+_3OH7AqTVw4Q@mail.gmail.com>
	<5108A87D.9000207@canterbury.ac.nz>
	<20130130082639.0b28d7eb@pitrou.net>
	<CAKCKLWyegkHo15dcVvhC23WaY=3Mnr6P-JT5M_Lm6UKCMahUhg@mail.gmail.com>
Message-ID: <20130130172627.32f64e71@pitrou.net>

Le Wed, 30 Jan 2013 15:22:06 +0000,
Michael Foord <fuzzyman at gmail.com> a
écrit :
> On 30 January 2013 07:26, Antoine Pitrou
> <solipsis at pitrou.net> wrote:
> 
> > On Wed, 30 Jan 2013 17:58:37 +1300
> > Greg Ewing <greg.ewing at canterbury.ac.nz>
> > wrote:
> > > Guido van Rossum wrote:
> > >
> > > > class color(enum):
> > > >   RED = value()
> > > >   WHITE = value()
> > > >   BLUE = value()
> > >
> > > We could do somewhat better than that:
> > >
> > >     class Color(Enum):
> > >        RED, WHITE, BLUE = range(3)
> >
> 
> 
> 
> With a Python 3 metaclass that provides default values for *looked up*
> entries you could have this:
> 
> class Color(Enum):
>     RED, WHITE, BLUE

This relies on tuple evaluation order, and would also evaluate any
other symbol looked up from inside the class body (which means I
cannot add anything other than enum symbols to the class).

In other words, I'm afraid it would be somewhat fragile ;)

Regards

Antoine.




From eliben at gmail.com  Wed Jan 30 17:33:35 2013
From: eliben at gmail.com (Eli Bendersky)
Date: Wed, 30 Jan 2013 08:33:35 -0800
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <20130130112707.5cf60dfc@anarchist.wooz.org>
References: <CAH0mxTQ9wfcR_R5SbXV0J63iVaB1PvYvSF06tRy=AibO2Qajfg@mail.gmail.com>
	<20130129202730.6ea6d0d5@anarchist.wooz.org>
	<CAKCKLWxddOp-0k3b9Gdeg+WfNhJOnTPtJh50D62sVSDpf-zcaw@mail.gmail.com>
	<20130130103548.12bce67d@anarchist.wooz.org>
	<CAF-Rda-6vDo4tG5+ByASSDUm9Z=em4KjLL6CWsfwgO2hF9X+fg@mail.gmail.com>
	<20130130112707.5cf60dfc@anarchist.wooz.org>
Message-ID: <CAF-Rda9V1mv0UWuFCS17eKW+mL6tOHpycwtR=foGJ2Dzujy=pQ@mail.gmail.com>

On Wed, Jan 30, 2013 at 8:27 AM, Barry Warsaw <barry at python.org> wrote:

> On Jan 30, 2013, at 08:17 AM, Eli Bendersky wrote:
>
> >Barry, since you've obviously given this issue a lot of thought, maybe you
> >could summarize it in a PEP so we have a clear way of moving forward for
> >3.4?
>
> I'm happy to do so if there's a realistic chance of it being accepted.  We
> already have one rejected enum PEP (354) and we probably don't need two. ;)
>
>
Reading this thread, it seems that many core devs are interested in the
feature and the discussion is mainly about deciding on the exact semantics
and implementation. Even Guido didn't really speak against it (only somewhat
against adding new syntax).

Eli

From fuzzyman at gmail.com  Wed Jan 30 17:35:25 2013
From: fuzzyman at gmail.com (Michael Foord)
Date: Wed, 30 Jan 2013 16:35:25 +0000
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <20130130172627.32f64e71@pitrou.net>
References: <CAH0mxTQ9wfcR_R5SbXV0J63iVaB1PvYvSF06tRy=AibO2Qajfg@mail.gmail.com>
	<CADiSq7cKrdgzBgENPYYXA5S-oCtT4Jxs5xnMwr750LUyKg+v6Q@mail.gmail.com>
	<CAF-Rda8B1Cw1vNZxmwuCP0AxHxhjdNQvoCWGqJsKmZsj4g0C3g@mail.gmail.com>
	<51085AAB.6090303@canterbury.ac.nz>
	<CAF-Rda9Oup0fmJapmAg9kFv=YaGwjph2ZyMXrMB3Q0bEhWz2eg@mail.gmail.com>
	<CAP7+vJLT+KxjqNT6=c6ZYmBWw55U_88W8MFqW+_3OH7AqTVw4Q@mail.gmail.com>
	<5108A87D.9000207@canterbury.ac.nz>
	<20130130082639.0b28d7eb@pitrou.net>
	<CAKCKLWyegkHo15dcVvhC23WaY=3Mnr6P-JT5M_Lm6UKCMahUhg@mail.gmail.com>
	<20130130172627.32f64e71@pitrou.net>
Message-ID: <CAKCKLWwJWv5qdQryY8VUh02fjo+EM4=1kfL4Bxh9_9d+9n4-7Q@mail.gmail.com>

On 30 January 2013 16:26, Antoine Pitrou <solipsis at pitrou.net> wrote:

> Le Wed, 30 Jan 2013 15:22:06 +0000,
> Michael Foord <fuzzyman at gmail.com> a
> ?crit :
> > On 30 January 2013 07:26, Antoine Pitrou
> > <solipsis at pitrou.net> wrote:
> >
> > > On Wed, 30 Jan 2013 17:58:37 +1300
> > > Greg Ewing <greg.ewing at canterbury.ac.nz>
> > > wrote:
> > > > Guido van Rossum wrote:
> > > >
> > > > > class color(enum):
> > > > >   RED = value()
> > > > >   WHITE = value()
> > > > >   BLUE = value()
> > > >
> > > > We could do somewhat better than that:
> > > >
> > > >     class Color(Enum):
> > > >        RED, WHITE, BLUE = range(3)
> > >
> >
> >
> >
> > With a Python 3 metaclass that provides default values for *looked up*
> > entries you could have this:
> >
> > class Color(Enum):
> >     RED, WHITE, BLUE
>
> This relies on tuple evaluation order,



It does if you do them as a tuple.


> and would also evaluate any
> other symbol looked up from inside the class body


Only if they aren't actually defined.


> (which means I
> cannot add anything else than enum symbols to the class).
>
>
So not true - it is only *undefined* symbols that are added as enum values.


> In other words, I'm afraid it would be somewhat fragile ;)
>

Well, within specific parameters...

Michael


>
> Regards
>
> Antoine.
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>



-- 

http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html

From larry at hastings.org  Wed Jan 30 17:42:53 2013
From: larry at hastings.org (Larry Hastings)
Date: Wed, 30 Jan 2013 08:42:53 -0800
Subject: [Python-ideas] Extend module objects to support properties
In-Reply-To: <CADiSq7fRZhD8kWoHfpxg7HKriVr6qET6QRvCLGirXYdsUJHEfA@mail.gmail.com>
References: <51087225.3040801@hastings.org>
	<CADiSq7fRZhD8kWoHfpxg7HKriVr6qET6QRvCLGirXYdsUJHEfA@mail.gmail.com>
Message-ID: <51094D8D.606@hastings.org>

On 01/30/2013 01:54 AM, Nick Coghlan wrote:
> On Wed, Jan 30, 2013 at 11:06 AM, Larry Hastings <larry at hastings.org> wrote:
>> Properties are a wonderful facility.  But they only work on conventional
>> objects.  Specifically, they *don't* work on module objects.  It would be
>> nice to extend module objects so properties worked there too.
> As MAL notes, the issues with such an approach are:
>
> - code executed at module scope
> - code in inner scopes that uses "global"
> - code that uses globals()
> - code that directly modifies a module's __dict__
>
> There is too much code that expects to be able to modify a module's
> namespace directly without going through the attribute access
> machinery.

Of those four issues, the latter two are wontfix.  Code that futzes with 
an object's __dict__ bypasses the property machinery, but this is already 
viewed as acceptable.

Obviously the point of the proposal is to change the behavior of the 
first two.  Whether this is manageable additional complexity, or fast 
enough, remains to be seen--which is why this is in -ideas, not -dev.

Also, I'm not sure there are any existing globals that we'd want to 
convert into properties.  Assuming this is only used for new globals, 
this change hopefully wouldn't break existing code. (Fingers crossed.)


//arry/

From storchaka at gmail.com  Wed Jan 30 17:56:56 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Wed, 30 Jan 2013 18:56:56 +0200
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <20130130172310.60b49ef4@pitrou.net>
References: <CAH0mxTQ9wfcR_R5SbXV0J63iVaB1PvYvSF06tRy=AibO2Qajfg@mail.gmail.com>
	<20130129202730.6ea6d0d5@anarchist.wooz.org>
	<CAKCKLWxddOp-0k3b9Gdeg+WfNhJOnTPtJh50D62sVSDpf-zcaw@mail.gmail.com>
	<20130130172310.60b49ef4@pitrou.net>
Message-ID: <kebjcs$vou$1@ger.gmane.org>

On 30.01.13 18:23, Antoine Pitrou wrote:
>>>> TypeFlag.HAS_GC | TypeFlag.INT_SUBCLASS
> 6

I prefer something like

>>> TypeFlag.HAS_GC | TypeFlag.INT_SUBCLASS
TypeFlag.HAS_GC|INT_SUBCLASS
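That behaviour can be sketched with a small hypothetical class (Flag and
TypeFlag below are invented for illustration, not a concrete proposal) whose
| result remembers the names it combines:

```python
class Flag:
    """Hypothetical named bit flag whose | result keeps a symbolic repr."""
    def __init__(self, owner, names, value):
        self._owner = owner    # name of the enclosing enum-like class
        self._names = names    # tuple of member names this value combines
        self._value = value
    def __or__(self, other):
        # combining flags combines both the bits and the names
        return Flag(self._owner, self._names + other._names,
                    self._value | other._value)
    def __int__(self):
        return self._value
    def __repr__(self):
        return '%s.%s' % (self._owner, '|'.join(self._names))

class TypeFlag:
    HEAPTYPE = Flag('TypeFlag', ('HEAPTYPE',), 1)
    HAS_GC = Flag('TypeFlag', ('HAS_GC',), 2)
    INT_SUBCLASS = Flag('TypeFlag', ('INT_SUBCLASS',), 4)
```

Here `TypeFlag.HAS_GC | TypeFlag.INT_SUBCLASS` still carries the value 6 via
int(), but reprs as `TypeFlag.HAS_GC|INT_SUBCLASS`.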




From fuzzyman at gmail.com  Wed Jan 30 18:08:36 2013
From: fuzzyman at gmail.com (Michael Foord)
Date: Wed, 30 Jan 2013 17:08:36 +0000
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <kebjcs$vou$1@ger.gmane.org>
References: <CAH0mxTQ9wfcR_R5SbXV0J63iVaB1PvYvSF06tRy=AibO2Qajfg@mail.gmail.com>
	<20130129202730.6ea6d0d5@anarchist.wooz.org>
	<CAKCKLWxddOp-0k3b9Gdeg+WfNhJOnTPtJh50D62sVSDpf-zcaw@mail.gmail.com>
	<20130130172310.60b49ef4@pitrou.net> <kebjcs$vou$1@ger.gmane.org>
Message-ID: <CAKCKLWwEWEAanyRxxvX2KBk1DWkgjOgrZpf4OV+AuG+hg+az7w@mail.gmail.com>

On 30 January 2013 16:56, Serhiy Storchaka <storchaka at gmail.com> wrote:

> On 30.01.13 18:23, Antoine Pitrou wrote:
> >>>> TypeFlag.HAS_GC | TypeFlag.INT_SUBCLASS
> > 6
>
> I prefer something like
>
> >>> TypeFlag.HAS_GC | TypeFlag.INT_SUBCLASS
> TypeFlag.HAS_GC|INT_SUBCLASS
>
>

Indeed - the whole benefit (pretty much) of using an Enum class is that
you're no longer dealing with raw ints.

Michael


>
>



-- 

http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html

From ethan at stoneleaf.us  Wed Jan 30 18:19:36 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 30 Jan 2013 09:19:36 -0800
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <20130130172627.32f64e71@pitrou.net>
References: <CAH0mxTQ9wfcR_R5SbXV0J63iVaB1PvYvSF06tRy=AibO2Qajfg@mail.gmail.com>
	<CADiSq7cKrdgzBgENPYYXA5S-oCtT4Jxs5xnMwr750LUyKg+v6Q@mail.gmail.com>
	<CAF-Rda8B1Cw1vNZxmwuCP0AxHxhjdNQvoCWGqJsKmZsj4g0C3g@mail.gmail.com>
	<51085AAB.6090303@canterbury.ac.nz>
	<CAF-Rda9Oup0fmJapmAg9kFv=YaGwjph2ZyMXrMB3Q0bEhWz2eg@mail.gmail.com>
	<CAP7+vJLT+KxjqNT6=c6ZYmBWw55U_88W8MFqW+_3OH7AqTVw4Q@mail.gmail.com>
	<5108A87D.9000207@canterbury.ac.nz>
	<20130130082639.0b28d7eb@pitrou.net>
	<CAKCKLWyegkHo15dcVvhC23WaY=3Mnr6P-JT5M_Lm6UKCMahUhg@mail.gmail.com>
	<20130130172627.32f64e71@pitrou.net>
Message-ID: <51095628.1080406@stoneleaf.us>

On 01/30/2013 08:26 AM, Antoine Pitrou wrote:
> Le Wed, Michael Foord a ?crit :
>> With a Python 3 metaclass that provides default values for *looked up*
>> entries you could have this:
>>
>> class Color(Enum):
>>      RED, WHITE, BLUE
>
> This relies on tuple evaluation order, and would also evaluate any
> other symbol looked up from inside the class body (which means I
> cannot add anything else than enum symbols to the class).

Probably a dumb question, but why would you want to add a non-enum 
attribute to an enum class?

~Ethan~



From ethan at stoneleaf.us  Wed Jan 30 18:28:35 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 30 Jan 2013 09:28:35 -0800
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <20130130172310.60b49ef4@pitrou.net>
References: <CAH0mxTQ9wfcR_R5SbXV0J63iVaB1PvYvSF06tRy=AibO2Qajfg@mail.gmail.com>
	<20130129202730.6ea6d0d5@anarchist.wooz.org>
	<CAKCKLWxddOp-0k3b9Gdeg+WfNhJOnTPtJh50D62sVSDpf-zcaw@mail.gmail.com>
	<20130130172310.60b49ef4@pitrou.net>
Message-ID: <51095843.3070503@stoneleaf.us>

On 01/30/2013 08:23 AM, Antoine Pitrou wrote:
> If a flexible solution is desired (with either int or str subclassing,
> various numbering schemes), may I suggest another kind of syntax:
>
> class ErrorFlag(Enum):
>      type = 'symbolic'
>      names = ('strict', 'ignore', 'replace')
>
> class SeekFlag(Enum):
>      type = 'sequential'
>      names = ('SET', 'CUR', 'END')
>
> class TypeFlag(Enum):
>      type = 'bitmask'
>      names = ('HEAPTYPE', 'HAS_GC', 'INT_SUBCLASS')

This I like.


>>>> ErrorFlag.ignore
> ErrorFlag.ignore
>>>> ErrorFlag.ignore == 'ignore'
> True
>>>> ErrorFlag('ignore')
> ErrorFlag.ignore
>>>> isinstance(ErrorFlag.ignore, str)
> True
>>>> isinstance(ErrorFlag.ignore, int)
> False
>>>> ErrorFlag(0)
> [...]
> ValueError: invalid value for <class 'enum.ErrorFlag'>: 0
>
>>>> SeekFlag('SET')
> SeekFlag.SET
>>>> SeekFlag('SET') + 0
> 0
>>>> SeekFlag(0)
> SeekFlag.SET
>>>> isinstance(SeekFlag.CUR, int)
> True
>>>> isinstance(SeekFlag.CUR, str)
> False
>
>>>> TypeFlag(1)
> TypeFlag.HEAPTYPE
>>>> TypeFlag(2)
> TypeFlag.HAS_GC
>>>> TypeFlag.HAS_GC | TypeFlag.INT_SUBCLASS
> 6

This should be `TypeFlag.HAS_GC|INT_SUBCLASS`

+1

~Ethan~



From bruce at leapyear.org  Wed Jan 30 18:38:10 2013
From: bruce at leapyear.org (Bruce Leban)
Date: Wed, 30 Jan 2013 09:38:10 -0800
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <51095628.1080406@stoneleaf.us>
References: <CAH0mxTQ9wfcR_R5SbXV0J63iVaB1PvYvSF06tRy=AibO2Qajfg@mail.gmail.com>
	<CADiSq7cKrdgzBgENPYYXA5S-oCtT4Jxs5xnMwr750LUyKg+v6Q@mail.gmail.com>
	<CAF-Rda8B1Cw1vNZxmwuCP0AxHxhjdNQvoCWGqJsKmZsj4g0C3g@mail.gmail.com>
	<51085AAB.6090303@canterbury.ac.nz>
	<CAF-Rda9Oup0fmJapmAg9kFv=YaGwjph2ZyMXrMB3Q0bEhWz2eg@mail.gmail.com>
	<CAP7+vJLT+KxjqNT6=c6ZYmBWw55U_88W8MFqW+_3OH7AqTVw4Q@mail.gmail.com>
	<5108A87D.9000207@canterbury.ac.nz>
	<20130130082639.0b28d7eb@pitrou.net>
	<CAKCKLWyegkHo15dcVvhC23WaY=3Mnr6P-JT5M_Lm6UKCMahUhg@mail.gmail.com>
	<20130130172627.32f64e71@pitrou.net> <51095628.1080406@stoneleaf.us>
Message-ID: <CAGu0AnsRfS3EWWzLAY6TLReyZ4TPdSx5Rkq-knw3=BJtD51U1w@mail.gmail.com>

On Wed, Jan 30, 2013 at 9:19 AM, Ethan Furman <ethan at stoneleaf.us> wrote:

> Probably a dumb question, but why would you want to add non-enum to an
> enum class?
>

class Color(Enum):
    RED, WHITE, BLUE

    def translate(self, language):
        """Get the name of this enum member in the specified language."""
        pass

--- Bruce
Follow me: http://www.twitter.com/Vroo http://www.vroospeak.com

From yorik.sar at gmail.com  Wed Jan 30 18:56:51 2013
From: yorik.sar at gmail.com (Yuriy Taraday)
Date: Wed, 30 Jan 2013 21:56:51 +0400
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <loom.20130130T094306-124@post.gmane.org>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org> <51072650.5090808@pearwood.info>
	<CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>
	<97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com>
	<CAJ78kjOOrcebp=Gkp++bf=k_tCKiCw36ibMw7j5PKCLPH=Xqwg@mail.gmail.com>
	<CAHVvXxT8wPPtXHk9kwqH1Acb_fkAx-uCvY9UMnzsH==CZwLcOQ@mail.gmail.com>
	<loom.20130129T163910-565@post.gmane.org>
	<51086A96.9020300@pearwood.info>
	<loom.20130130T094306-124@post.gmane.org>
Message-ID: <CABocrW7wjNb+eWnmJ+5acYzbC1t0COMxgDHpYxTnQeMM20CzTw@mail.gmail.com>

On Wed, Jan 30, 2013 at 1:46 PM, Wolfgang Maier <
wolfgang.maier at biologie.uni-freiburg.de> wrote:

> your condition is 'partial(lt,50)', but this is not met to begin with and
> results in an empty list at least for me. Have you two actually checked the
> output of the code or have you just timed it?
>

Yeah. Shame on me. You're right. My belief in partial and operator module
has been shaken.

-- 

Kind regards, Yuriy.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130130/d5d9fbd0/attachment.html>

From oscar.j.benjamin at gmail.com  Wed Jan 30 19:05:44 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Wed, 30 Jan 2013 18:05:44 +0000
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <CABocrW7wjNb+eWnmJ+5acYzbC1t0COMxgDHpYxTnQeMM20CzTw@mail.gmail.com>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org> <51072650.5090808@pearwood.info>
	<CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>
	<97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com>
	<CAJ78kjOOrcebp=Gkp++bf=k_tCKiCw36ibMw7j5PKCLPH=Xqwg@mail.gmail.com>
	<CAHVvXxT8wPPtXHk9kwqH1Acb_fkAx-uCvY9UMnzsH==CZwLcOQ@mail.gmail.com>
	<loom.20130129T163910-565@post.gmane.org>
	<51086A96.9020300@pearwood.info>
	<loom.20130130T094306-124@post.gmane.org>
	<CABocrW7wjNb+eWnmJ+5acYzbC1t0COMxgDHpYxTnQeMM20CzTw@mail.gmail.com>
Message-ID: <CAHVvXxTbH_XkLVh2kzZ=_J9bPKxh0t5L9DKSM6G_1Zj0e_fXmg@mail.gmail.com>

On 30 January 2013 17:56, Yuriy Taraday <yorik.sar at gmail.com> wrote:
>
> On Wed, Jan 30, 2013 at 1:46 PM, Wolfgang Maier
> <wolfgang.maier at biologie.uni-freiburg.de> wrote:
>>
>> your condition is 'partial(lt,50)', but this is not met to begin with and
>> results in an empty list at least for me. Have you two actually checked
>> the
>> output of the code or have you just timed it?
>
> Yeah. Shame on me. You're right. My belief in partial and operator module
> has been shaken.
>

This is why I prefer this stop() idea to any of the takewhile()
versions: regardless of performance it leads to clearer code, that can
be understood more easily.


Oscar


From shane at umbrellacode.com  Wed Jan 30 20:02:51 2013
From: shane at umbrellacode.com (Shane Green)
Date: Wed, 30 Jan 2013 11:02:51 -0800
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <CAHVvXxTbH_XkLVh2kzZ=_J9bPKxh0t5L9DKSM6G_1Zj0e_fXmg@mail.gmail.com>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org> <51072650.5090808@pearwood.info>
	<CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>
	<97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com>
	<CAJ78kjOOrcebp=Gkp++bf=k_tCKiCw36ibMw7j5PKCLPH=Xqwg@mail.gmail.com>
	<CAHVvXxT8wPPtXHk9kwqH1Acb_fkAx-uCvY9UMnzsH==CZwLcOQ@mail.gmail.com>
	<loom.20130129T163910-565@post.gmane.org>
	<51086A96.9020300@pearwood.info>
	<loom.20130130T094306-124@post.gmane.org>
	<CABocrW7wjNb+eWnmJ+5acYzbC1t0COMxgDHpYxTnQeMM20CzTw@mail.gmail.com>
	<CAHVvXxTbH_XkLVh2kzZ=_J9bPKxh0t5L9DKSM6G_1Zj0e_fXmg@mail.gmail.com>
Message-ID: <0D930FE7-D150-4DA5-90AB-F3EDAFB00E63@umbrellacode.com>

Although it's a bit of a cheat, if you create a wrapper of the thing you're iterating, or don't mind closing it (it's probably best to wrap it unless you know what it is), both generators and list comprehensions can be "while iterated" using this approach: 

[item for item in items if condition or items.close()]
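Spelled out as a runnable (Python 3) sketch, with names chosen for illustration, which also shows why close() works inside the condition: it returns None, so the first failing item is excluded, and the next pass of the comprehension's loop simply finds the generator exhausted:

```python
def take_below(limit=50, count=1000):
    # Wrap the iterable in a generator so it has a close() method.
    seq = (i for i in range(count))
    # close() returns None (falsy): the failing item is excluded, and the
    # now-closed generator ends the comprehension's loop on the next step.
    return [i for i in seq if i < limit or seq.close()]

print(take_below())  # [0, 1, ..., 49]
```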


When I tested it earlier with 1000 entries 5 times and had forgotten the parens on close(), it made it really obvious there would be times when the wrapping overhead wasn't a problem: 

On Jan 30, 2013, at 9:02 AM, Shane Green <shane.green at me.com> wrote:

> Nice catch.  New times, 
> 
> >>> timeit.timeit(var1)
> 8.533167123794556
> >>> timeit.timeit(var2)
> 9.067211151123047
> >>> timeit.timeit(var3)
> 12.966150999069214
> >>> timeit.timeit(var4)
> 
> And I accidentally ran this (without parens), so it was a regular comprehension: 
> def var5(count=1000):
>     seq = (i for i in xrange(count))
>     return [i for i in seq if i < 50 or seq.close]
> 
> >>> timeit.timeit(var5)
> 212.26763486862183
> 
> Then fixed it: 
> >>> timeit.timeit(var5)
> 10.280441045761108
> >>> 
> 
> 
> 
> 
> 
> 
> Shane Green
> 805-452-9666 | shane.green at me.com
> 
> Begin forwarded message:
> 
>> From: Wolfgang Maier <wolfgang.maier at biologie.uni-freiburg.de>
>> Subject: RE: [Python-ideas] while conditional in list comprehension ??
>> Date: January 30, 2013 8:40:51 AM PST
>> To: 'Shane Green' <shane.green at me.com>
>> 
>> Careful! You're using range() in the slow ones, but xrange() in the fast ones.
>> With the input seq being much longer than the output, differences in the time it takes to produce the range object may be important.
>>  
>> From: Shane Green [mailto:shane.green at me.com] 
>> Sent: Wednesday, January 30, 2013 5:37 PM
>> To: Wolfgang Maier
>> Subject: Re: [Python-ideas] while conditional in list comprehension ??
>>  
>> >>> def var1(count=1000):
>> ...     def _gen():
>> ...          for i in range(count):
>> ...               if i > 50: break
>> ...               yield i
>> ...     return list(_gen())
>> ... 
>> >>> def var2(count=1000):
>> ...     def stop():
>> ...          raise StopIteration
>> ...     return list(i for i in range(count) if i <= 50 or stop())
>> ... 
>> >>> def var3(count=1000):
>> ...     return [i for i in itertools.takewhile(lambda n: n <= 50, range(count))]
>> ... 
>> >>> def var4(count=1000):
>> ...     return [i for i in itertools.takewhile(functools.partial(operator.lt, 50), range(count))]
>> ... 
>> >>> def var5(count=1000):
>> ...     seq = (i for i in xrange(count))
>> ...     return [i for i in seq if i < 50 or seq.close()]
>>  
>> >>> timeit.timeit(var1)
>> 19.118155002593994
>> >>> timeit.timeit(var2)
>>  
>> 19.217869997024536
>>  
>> >>> timeit.timeit(var5)
>> 10.251838207244873
>>  
>> >>> 
>>  
>>  
>>  
>>  
>> Shane Green
>> 805-452-9666 | shane.green at me.com
>>  
>> On Jan 30, 2013, at 8:17 AM, Wolfgang Maier <wolfgang.maier at biologie.uni-freiburg.de> wrote:
>> 
>> 
>> list(i for i in a if i < 5000 or a.close())
>>  
> 



Shane Green 
www.umbrellacode.com
408-692-4666 | shane at umbrellacode.com

On Jan 30, 2013, at 10:05 AM, Oscar Benjamin <oscar.j.benjamin at gmail.com> wrote:

> On 30 January 2013 17:56, Yuriy Taraday <yorik.sar at gmail.com> wrote:
>> 
>> On Wed, Jan 30, 2013 at 1:46 PM, Wolfgang Maier
>> <wolfgang.maier at biologie.uni-freiburg.de> wrote:
>>> 
>>> your condition is 'partial(lt,50)', but this is not met to begin with and
>>> results in an empty list at least for me. Have you two actually checked
>>> the
>>> output of the code or have you just timed it?
>> 
>> Yeah. Shame on me. You're right. My belief in partial and operator module
>> has been shaken.
>> 
> 
> This is why I prefer this stop() idea to any of the takewhile()
> versions: regardless of performance it leads to clearer code, that can
> be understood more easily.
> 
> 
> Oscar
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130130/4a38f800/attachment.html>

From g.rodola at gmail.com  Wed Jan 30 21:13:58 2013
From: g.rodola at gmail.com (=?ISO-8859-1?Q?Giampaolo_Rodol=E0?=)
Date: Wed, 30 Jan 2013 21:13:58 +0100
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <CAF-Rda9V1mv0UWuFCS17eKW+mL6tOHpycwtR=foGJ2Dzujy=pQ@mail.gmail.com>
References: <CAH0mxTQ9wfcR_R5SbXV0J63iVaB1PvYvSF06tRy=AibO2Qajfg@mail.gmail.com>
	<20130129202730.6ea6d0d5@anarchist.wooz.org>
	<CAKCKLWxddOp-0k3b9Gdeg+WfNhJOnTPtJh50D62sVSDpf-zcaw@mail.gmail.com>
	<20130130103548.12bce67d@anarchist.wooz.org>
	<CAF-Rda-6vDo4tG5+ByASSDUm9Z=em4KjLL6CWsfwgO2hF9X+fg@mail.gmail.com>
	<20130130112707.5cf60dfc@anarchist.wooz.org>
	<CAF-Rda9V1mv0UWuFCS17eKW+mL6tOHpycwtR=foGJ2Dzujy=pQ@mail.gmail.com>
Message-ID: <CAFYqXL9vvGZA+f+WCVxEoFO5BviGS8dSybRkoger_+0jYcv2OQ@mail.gmail.com>

2013/1/30 Eli Bendersky <eliben at gmail.com>:
>
>
>
> On Wed, Jan 30, 2013 at 8:27 AM, Barry Warsaw <barry at python.org> wrote:
>>
>> On Jan 30, 2013, at 08:17 AM, Eli Bendersky wrote:
>>
>> >Barry, since you've obviously given this issue a lot of thought, maybe
>> > you
>> >could summarize it in a PEP so we have a clear way of moving forward for
>> >3.4 ?
>>
>> I'm happy to do so if there's a realistic chance of it being accepted.  We
>> already have one rejected enum PEP (354) and we probably don't need two.
>> ;)
>>
>
> Reading this thread it seems that many core devs are interested in the
> feature and the discussion is mainly deciding on the exact semantics and
> implementation. Even Guido didn't really speak against it (only somewhat
> against adding new syntax).
>
> Eli


Personally I'm -1 for a variety of reasons.

1) a const/enum type looks like something which is subject to personal
taste to me. I personally don't like, for example, how flufl requires
to define constants by using a class.
It's just a matter of taste but to me module.FOO looks more "right"
than module.Bar.FOO.
Also "Colors.red < Colors.blue" raising an exception is something
subject to personal taste.

2) introducing something like that (class-based) wouldn't help
migrating the existent module-level constants we have in the stdlib.
Only new projects or new stdlib modules would benefit from it.

3) other than being subject to personal taste, a const/enum type is
also pretty easy to implement.
For example, I came up with this:
http://code.google.com/p/psutil/source/browse/trunk/psutil/_common.py?spec=svn1562&r=1524#33
...which is sufficient for my needs.
Users having different needs can do a similar thing pretty easily.

4) I'm getting the impression that the language is growing too big. To
me, this looks like yet another thing that infrequent users have to
learn before being able to read and understand Python code.
Also consider that people lived without const/enum for 2 decades now.


--- Giampaolo
http://code.google.com/p/pyftpdlib/
http://code.google.com/p/psutil/
http://code.google.com/p/pysendfile/


From steve at pearwood.info  Wed Jan 30 21:52:15 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 31 Jan 2013 07:52:15 +1100
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <loom.20130130T094306-124@post.gmane.org>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org> <51072650.5090808@pearwood.info>
	<CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>
	<97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com>
	<CAJ78kjOOrcebp=Gkp++bf=k_tCKiCw36ibMw7j5PKCLPH=Xqwg@mail.gmail.com>
	<CAHVvXxT8wPPtXHk9kwqH1Acb_fkAx-uCvY9UMnzsH==CZwLcOQ@mail.gmail.com>
	<loom.20130129T163910-565@post.gmane.org>
	<51086A96.9020300@pearwood.info>
	<loom.20130130T094306-124@post.gmane.org>
Message-ID: <510987FF.9010808@pearwood.info>

On 30/01/13 20:46, Wolfgang Maier wrote:

> b) I have to say I was very impressed by the speed gains you report through the
> use of 'partial', which I had not thought of at all, I have to admit.
> However, I tested your suggestions and I think they both suffer from the same
> mistake:
> your condition is 'partial(lt,50)', but this is not met to begin with and
> results in an empty list at least for me. Have you two actually checked the
> output of the code or have you just timed it? I found that in order to make it
> work the comparison has to be made via 'partial(gt,50)'.

Yes, you are absolutely correct. I screwed that up badly. I can only take comfort
that apparently so did Yuriy.

I don't often paste code in public without testing it, but when I do, it
invariably turns out to be wrong.



> With this modification
> the resulting list in your example would be [0,..,49] as it should be.
>
> And now the big surprise in terms of runtimes:
> partial(lt,50) variant:     1.17  (but incorrect results)
> partial(gt,50) variant:    13.95
> if cond or stop() variant:  9.86

I do not get such large differences. I get these:

py> min(t1.repeat(number=100000, repeat=5))  # cond or stop()
1.2582030296325684
py> min(t2.repeat(number=100000, repeat=5))  # takewhile and lambda
1.9907748699188232
py> min(t3.repeat(number=100000, repeat=5))  # takewhile and partial
1.8741891384124756

with the timers t1, t2, t3 as per my previous email.


> I guess python is just smart enough to recognize that it compares against a
> constant value all the time, and optimizes the code accordingly (after all the
> if clause is a pretty standard thing to use in a comprehension).


No, it is much simpler than that. partial(lt, 50) is equivalent to:

lambda x: lt(50, x)

which is equivalent to 50 < x, *not* x < 50 like I expected.

So the function tests 50 < 0 on the first iteration, which is False, and
takewhile immediately returns, giving you an empty list.

I was surprised that partial was *so much faster* than a regular function. But
it showed me what I expected/wanted to see, and so I didn't question it. A lesson
for us all.
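The binding order is easy to check at the interpreter; a minimal demonstration:

```python
from functools import partial
from operator import gt, lt

f = partial(lt, 50)   # binds 50 as the FIRST argument: f(x) == lt(50, x) == (50 < x)
g = partial(gt, 50)   # g(x) == gt(50, x) == (50 > x), i.e. x < 50

print(f(0))   # False: 50 < 0 does not hold, so takewhile stops immediately
print(g(0))   # True:  50 > 0 holds, the intended condition
```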


> So the reason for your reported speed-gain is that you actually broke out of the
> comprehension at the very first element instead of going through the first 50!

Correct.



-- 
Steven


From eliben at gmail.com  Wed Jan 30 21:59:27 2013
From: eliben at gmail.com (Eli Bendersky)
Date: Wed, 30 Jan 2013 12:59:27 -0800
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <CAFYqXL9vvGZA+f+WCVxEoFO5BviGS8dSybRkoger_+0jYcv2OQ@mail.gmail.com>
References: <CAH0mxTQ9wfcR_R5SbXV0J63iVaB1PvYvSF06tRy=AibO2Qajfg@mail.gmail.com>
	<20130129202730.6ea6d0d5@anarchist.wooz.org>
	<CAKCKLWxddOp-0k3b9Gdeg+WfNhJOnTPtJh50D62sVSDpf-zcaw@mail.gmail.com>
	<20130130103548.12bce67d@anarchist.wooz.org>
	<CAF-Rda-6vDo4tG5+ByASSDUm9Z=em4KjLL6CWsfwgO2hF9X+fg@mail.gmail.com>
	<20130130112707.5cf60dfc@anarchist.wooz.org>
	<CAF-Rda9V1mv0UWuFCS17eKW+mL6tOHpycwtR=foGJ2Dzujy=pQ@mail.gmail.com>
	<CAFYqXL9vvGZA+f+WCVxEoFO5BviGS8dSybRkoger_+0jYcv2OQ@mail.gmail.com>
Message-ID: <CAF-Rda94WZTepMJ8+HrOxaAi_44t2PyztzwSiP+R2KaAp872Dw@mail.gmail.com>

> Reading this thread it seems that many core devs are interested in the

>  > feature and the discussion is mainly deciding on the exact semantics and
> > implementation. Even Guido didn't really speak against it (only somewhat
> > against adding new syntax).
> >
> > Eli
>
>
> Personally I'm -1 for a variety of reasons.
>
> 1) a const/enum type looks like something which is subject to personal
> taste to me. I personally don't like, for example, how flufl requires
> to define constants by using a class.
> It's just a matter of taste but to me module.FOO looks more "right"
> than module.Bar.FOO.
> Also "Colors.red < Colors.blue" raising an exception is something
> subject to personal taste.
>
> 2) introducing something like that (class-based) wouldn't help
> migrating the existent module-level constants we have in the stdlib.
> Only new projects or new stdlib modules would benefit from it.
>

These are more in the domain of implementation details, though, not
criticizing the concept.


>
> 3) other than being subject to personal taste, a const/enum type is
> also pretty easy to implement.
> For example, I came up with this:
>
> http://code.google.com/p/psutil/source/browse/trunk/psutil/_common.py?spec=svn1562&r=1524#33
> ...which is sufficient for my needs.
> Users having different needs can do a similar thing pretty easily.
>

It is precisely *because* every library defines its own way to create enums
that IMHO we should have them in the language (or in the standard library,
at the least).


> 4) I'm getting the impression that the language is growing too big. To
> me, this looks like yet another thing that infrequent users have to
> learn before being able to read and understand Python code.
> Also consider that people lived without const/enum for 2 decades now.
>

I respectfully disagree. Most folks seem to favor a library solution (i.e.
no new syntax, just a new metaclass+class to use). The stdlib has tools for
very obscure things. In comparison, enum is something almost every
non-trivial program needs to use at some stage or another.

Eli
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130130/4d9903b6/attachment.html>

From tjreedy at udel.edu  Wed Jan 30 22:09:31 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 30 Jan 2013 16:09:31 -0500
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <20130130082639.0b28d7eb@pitrou.net>
References: <CAH0mxTQ9wfcR_R5SbXV0J63iVaB1PvYvSF06tRy=AibO2Qajfg@mail.gmail.com>
	<CADiSq7cKrdgzBgENPYYXA5S-oCtT4Jxs5xnMwr750LUyKg+v6Q@mail.gmail.com>
	<CAF-Rda8B1Cw1vNZxmwuCP0AxHxhjdNQvoCWGqJsKmZsj4g0C3g@mail.gmail.com>
	<51085AAB.6090303@canterbury.ac.nz>
	<CAF-Rda9Oup0fmJapmAg9kFv=YaGwjph2ZyMXrMB3Q0bEhWz2eg@mail.gmail.com>
	<CAP7+vJLT+KxjqNT6=c6ZYmBWw55U_88W8MFqW+_3OH7AqTVw4Q@mail.gmail.com>
	<5108A87D.9000207@canterbury.ac.nz>
	<20130130082639.0b28d7eb@pitrou.net>
Message-ID: <kec26g$mp2$1@ger.gmane.org>

On 1/30/2013 2:26 AM, Antoine Pitrou wrote:
> On Wed, 30 Jan 2013 17:58:37 +1300
> Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
>> Guido van Rossum wrote:
>>
>>> class color(enum):
>>>    RED = value()
>>>    WHITE = value()
>>>    BLUE = value()
>>
>> We could do somewhat better than that:
>>
>>      class Color(Enum):
>>         RED, WHITE, BLUE = range(3)
>>
>> However, it's still slightly annoying that you have to
>> specify how many values there are in the range() call.

For small enumerations, not much of a problem. Or, if one does not want 
to take the time to count, allow

RED, WHITE, BLUE, *_extras = range(12)  # starred, so any number >= n works

and have a metaclass delete _extras.
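A minimal sketch of such a metaclass (names are illustrative; starred unpacking makes the slack length irrelevant):

```python
class EnumMeta(type):
    def __new__(mcls, clsname, bases, dic):
        # Drop the throwaway slack values before the class is built,
        # so _extras never becomes an attribute.
        dic.pop('_extras', None)
        return type.__new__(mcls, clsname, bases, dic)

class Enum(metaclass=EnumMeta):
    pass

class Color(Enum):
    RED, WHITE, BLUE, *_extras = range(12)

print(Color.RED, Color.WHITE, Color.BLUE)  # 0 1 2
print(hasattr(Color, '_extras'))           # False
```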

> Well, how about:
>
> class Color(Enum):
>      values = ('RED', 'WHITE', 'BLUE')
> ?
> (replace values with __values__ if you prefer)

I had the same idea, and having never written a metaclass that I can 
remember, decided to try it.

class EnumMeta(type):
     def __new__(cls, name, bases, dic):
         for i, key in enumerate(dic['_values']):  # 'key', not 'name': reusing 'name' would shadow the class name
             dic[key] = i
         del dic['_values']
         return type.__new__(cls, name, bases, dic)

class Enum(metaclass=EnumMeta):
     _values = ()

class Color(Enum):
     _values = 'RED', 'GREEN', 'BLUE'

print(Color.RED, Color.GREEN, Color.BLUE)
 >>>
0 1 2

So this syntax is at least feasible -- today.

-- 
Terry Jan Reedy



From tjreedy at udel.edu  Wed Jan 30 22:32:12 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 30 Jan 2013 16:32:12 -0500
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <CAKCKLWxTt0kwjc4dpRJUz27_0bt+u+AHkEqzTJEoZNC7rshKkQ@mail.gmail.com>
References: <CAH0mxTQ9wfcR_R5SbXV0J63iVaB1PvYvSF06tRy=AibO2Qajfg@mail.gmail.com>
	<CADiSq7cKrdgzBgENPYYXA5S-oCtT4Jxs5xnMwr750LUyKg+v6Q@mail.gmail.com>
	<CAF-Rda8B1Cw1vNZxmwuCP0AxHxhjdNQvoCWGqJsKmZsj4g0C3g@mail.gmail.com>
	<51085AAB.6090303@canterbury.ac.nz>
	<CAF-Rda9Oup0fmJapmAg9kFv=YaGwjph2ZyMXrMB3Q0bEhWz2eg@mail.gmail.com>
	<CAP7+vJLT+KxjqNT6=c6ZYmBWw55U_88W8MFqW+_3OH7AqTVw4Q@mail.gmail.com>
	<5108A87D.9000207@canterbury.ac.nz>
	<20130130082639.0b28d7eb@pitrou.net>
	<CAKCKLWyegkHo15dcVvhC23WaY=3Mnr6P-JT5M_Lm6UKCMahUhg@mail.gmail.com>
	<CAKCKLWxTt0kwjc4dpRJUz27_0bt+u+AHkEqzTJEoZNC7rshKkQ@mail.gmail.com>
Message-ID: <kec3h1$3rv$1@ger.gmane.org>

On 1/30/2013 10:30 AM, Michael Foord wrote:
> On 30 January 2013 15:22, Michael Foord

>     With a Python 3 metaclass that provides default values for *looked
>     up* entries you could have this:
>
>     class Color(Enum):
>          RED, WHITE, BLUE
>
>     The lookup would create the member - with the appropriate value.
>
> class values(dict):
>      def __init__(self):
>          self.value = 0
>      def __getitem__(self, key):

Adding  'print(self.value, key)' here prints

0 __name__
0 __name__
1 RED
2 WHITE
3 BLUE

(I do not understand why it is the second and not first lookup of 
__name__ that increments the counter, but...)

>          try:
>              return dict.__getitem__(self, key)
>          except KeyError:
>              value = self[key] = self.value
>              self.value += 1
>              return value
>
> class EnumMeta(type):
>
>       @classmethod
>       def __prepare__(metacls, name, bases):
>          return values()
>
>       def __new__(cls, name, bases, classdict):
>          result = type.__new__(cls, name, bases, dict(classdict))
>          return result
>
>
> class Enum(metaclass=EnumMeta):
>      pass
> class Color(Enum):
>      RED, WHITE, BLUE

So RED, WHITE, BLUE are 1, 2, 3; not 0, 1, 2 as I and many readers might 
expect. That aside (which can be fixed), this is very nice.
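A plausible explanation for the two "0 __name__" lines is that each class body (Enum's and Color's) gets its own freshly __prepare__'d namespace and looks __name__ up once while setting __module__. Either way, skipping dunder lookups fixes both the puzzle and the off-by-one; a hedged sketch:

```python
class values(dict):
    def __init__(self):
        self.value = 0
    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            if key.startswith('__') and key.endswith('__'):
                raise  # let __name__ etc. fall through to globals, uncounted
            value = self[key] = self.value
            self.value += 1
            return value

class EnumMeta(type):
    @classmethod
    def __prepare__(metacls, name, bases):
        return values()
    def __new__(cls, name, bases, classdict):
        return type.__new__(cls, name, bases, dict(classdict))

class Enum(metaclass=EnumMeta):
    pass

class Color(Enum):
    RED, WHITE, BLUE

print(Color.RED, Color.WHITE, Color.BLUE)  # 0 1 2
```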

-- 
Terry Jan Reedy



From g.rodola at gmail.com  Wed Jan 30 22:52:53 2013
From: g.rodola at gmail.com (=?ISO-8859-1?Q?Giampaolo_Rodol=E0?=)
Date: Wed, 30 Jan 2013 22:52:53 +0100
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <CAF-Rda94WZTepMJ8+HrOxaAi_44t2PyztzwSiP+R2KaAp872Dw@mail.gmail.com>
References: <CAH0mxTQ9wfcR_R5SbXV0J63iVaB1PvYvSF06tRy=AibO2Qajfg@mail.gmail.com>
	<20130129202730.6ea6d0d5@anarchist.wooz.org>
	<CAKCKLWxddOp-0k3b9Gdeg+WfNhJOnTPtJh50D62sVSDpf-zcaw@mail.gmail.com>
	<20130130103548.12bce67d@anarchist.wooz.org>
	<CAF-Rda-6vDo4tG5+ByASSDUm9Z=em4KjLL6CWsfwgO2hF9X+fg@mail.gmail.com>
	<20130130112707.5cf60dfc@anarchist.wooz.org>
	<CAF-Rda9V1mv0UWuFCS17eKW+mL6tOHpycwtR=foGJ2Dzujy=pQ@mail.gmail.com>
	<CAFYqXL9vvGZA+f+WCVxEoFO5BviGS8dSybRkoger_+0jYcv2OQ@mail.gmail.com>
	<CAF-Rda94WZTepMJ8+HrOxaAi_44t2PyztzwSiP+R2KaAp872Dw@mail.gmail.com>
Message-ID: <CAFYqXL8bNw2SC_omMHLrjvnhj3ryEQb5CcuyvfY=WEitH=Bnog@mail.gmail.com>

2013/1/30 Eli Bendersky <eliben at gmail.com>:
>> Reading this thread it seems that many core devs are interested in the
>>
>> > feature and the discussion is mainly deciding on the exact semantics and
>> > implementation. Even Guido didn't really speak against it (only somewhat
>> > against adding new syntax).
>> >
>> > Eli
>>
>>
>> Personally I'm -1 for a variety of reasons.
>>
>> 1) a const/enum type looks like something which is subject to personal
>> taste to me. I personally don't like, for example, how flufl requires
>> to define constants by using a class.
>> It's just a matter of taste but to me module.FOO looks more "right"
>> than module.Bar.FOO.
>> Also "Colors.red < Colors.blue" raising an exception is something
>> subject to personal taste.
>>
>> 2) introducing something like that (class-based) wouldn't help
>> migrating the existent module-level constants we have in the stdlib.
>> Only new projects or new stdlib modules would benefit from it.
>
>
> These are more in the domain of implementation details, though, not
> criticizing the concept.

Personally I'd be +0 for a constant type and -1 for an enum type,
which I consider just useless.
If a 'constant' type has to be added though, I'd prefer it to be as
simple as possible and close to what we've been used thus far, meaning
accessing it as "foo.BAR".
In everybody's mind it is clear that "foo.BAR" is a constant, and that
should be preserved.
Something along these lines:

>>> from collections import constant
>>> STATUS_IDLE = constant(0, 'idle', doc='refers to the idle state')
>>> STATUS_IDLE
0
>>> str(STATUS_IDLE)
'idle'
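There is no collections.constant today; a hedged sketch of how such a type could behave (an int subclass that carries a name and docstring, roughly matching the session above):

```python
class constant(int):
    """Hypothetical sketch of the proposed constant type."""
    def __new__(cls, value, name, doc=None):
        obj = int.__new__(cls, value)
        obj._name = name
        obj.__doc__ = doc
        return obj
    def __str__(self):
        # str() gives the symbolic name; arithmetic and repr stay int-like
        return self._name

STATUS_IDLE = constant(0, 'idle', doc='refers to the idle state')
print(repr(STATUS_IDLE))  # 0
print(str(STATUS_IDLE))   # idle
```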


---- Giampaolo
http://code.google.com/p/pyftpdlib/
http://code.google.com/p/psutil/
http://code.google.com/p/pysendfile/


From cs at zip.com.au  Wed Jan 30 23:19:26 2013
From: cs at zip.com.au (Cameron Simpson)
Date: Thu, 31 Jan 2013 09:19:26 +1100
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <5108A2F1.5010006@canterbury.ac.nz>
References: <5108A2F1.5010006@canterbury.ac.nz>
Message-ID: <20130130221926.GA20372@cskk.homeip.net>

On 30Jan2013 17:34, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
| Guido van Rossum wrote:
| > this doesn't look so bad, and
| > certainly doesn't violate DRY (though it's somewhat verbose):
| > 
| > class color(enum):
| >   RED = value()
| >   WHITE = value()
| >   BLUE = value()
| 
| The verbosity is what makes it fail the "truly elegant"
| test for me. And I would say that it does violate DRY
| in the sense that you have to write value() repeatedly
| for no good reason.
| 
| Sure, it's not bad enough to make it unusable, but like
| all the other solutions, it leaves me feeling vaguely
| annoyed that there isn't a better way.

How about this:

  Color = enum(RED=None, WHITE=None, BLUE=None, yellow=9)

where None means "pick the next natural choice".
The __init__ method goes something like this:

  def __init__(self, style=None, **kw):
    self._names = {}
    self._taken = set()
    seq = 0
    for name, value in kw.items():
      if name in self._names:
        raise ValueError("name already taken: " + name)
      if value is None:
        while seq in self._taken:
          seq += 1
        value = seq
      elif value in self._taken:
        raise ValueError("\"%s\": value already taken: %s" % (name, value))
      self._names[name] = value
      self._taken.add(value)

Obviously this needs a little work:

  - you'd allocate the explicit values first and go after the Nones
    later so that you don't accidentally take an explicit value

  - you'd support (pluggable?) styles, starting with sequential,
    allocating 0, 1, 2, ... and bitmask allocating 1, 2, 4, ...

but it lets you enumerate the names without quoting and specify explicit
values and let the class pick default values.

Cheers,
-- 
Cameron Simpson <cs at zip.com.au>

ERROR 155 - You can't do that.  - Data General S200 Fortran error code list


From ethan at stoneleaf.us  Wed Jan 30 23:26:40 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 30 Jan 2013 14:26:40 -0800
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <CAFYqXL8bNw2SC_omMHLrjvnhj3ryEQb5CcuyvfY=WEitH=Bnog@mail.gmail.com>
References: <CAH0mxTQ9wfcR_R5SbXV0J63iVaB1PvYvSF06tRy=AibO2Qajfg@mail.gmail.com>
	<20130129202730.6ea6d0d5@anarchist.wooz.org>
	<CAKCKLWxddOp-0k3b9Gdeg+WfNhJOnTPtJh50D62sVSDpf-zcaw@mail.gmail.com>
	<20130130103548.12bce67d@anarchist.wooz.org>
	<CAF-Rda-6vDo4tG5+ByASSDUm9Z=em4KjLL6CWsfwgO2hF9X+fg@mail.gmail.com>
	<20130130112707.5cf60dfc@anarchist.wooz.org>
	<CAF-Rda9V1mv0UWuFCS17eKW+mL6tOHpycwtR=foGJ2Dzujy=pQ@mail.gmail.com>
	<CAFYqXL9vvGZA+f+WCVxEoFO5BviGS8dSybRkoger_+0jYcv2OQ@mail.gmail.com>
	<CAF-Rda94WZTepMJ8+HrOxaAi_44t2PyztzwSiP+R2KaAp872Dw@mail.gmail.com>
	<CAFYqXL8bNw2SC_omMHLrjvnhj3ryEQb5CcuyvfY=WEitH=Bnog@mail.gmail.com>
Message-ID: <51099E20.8060200@stoneleaf.us>

On 01/30/2013 01:52 PM, Giampaolo Rodolà wrote:
> 2013/1/30 Eli Bendersky <eliben at gmail.com>:
>> These are more in the domain of implementation details, though, not
>> criticizing the concept.
>
> Personally I'd be +0 for a constant type and -1 for an enum type,
> which I consider just useless.
> If a 'constant' type has to be added though, I'd prefer it to be as
> simple as possible and close to what we've been used thus far, meaning
> accessing it as "foo.BAR".
> In everybody's mind it is clear that "foo.BAR" is a constant, and that
> should be preserved.
> Something along these lines:
>
>>>> from collections import constant
>>>> STATUS_IDLE = constant(0, 'idle', doc='refers to the idle state')
>>>> STATUS_IDLE
> 0
>>>> str(STATUS_IDLE)
> 'idle'

So you'd have something like:

--> from collections import constant
--> STATUS_IDLE = constant(0, 'idle', doc='refers to the idle state')
--> STATUS_PAUSE = constant(1, 'pause', doc='refers to the pause state')
--> STATUS_RUN = constant(2, 'run', doc='refers to the run state')

?

Absolutely -1 on this.  (Although you can certainly implement it now.)

~Ethan~


From steve at pearwood.info  Wed Jan 30 23:56:14 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 31 Jan 2013 09:56:14 +1100
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <CAHVvXxTbH_XkLVh2kzZ=_J9bPKxh0t5L9DKSM6G_1Zj0e_fXmg@mail.gmail.com>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org> <51072650.5090808@pearwood.info>
	<CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>
	<97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com>
	<CAJ78kjOOrcebp=Gkp++bf=k_tCKiCw36ibMw7j5PKCLPH=Xqwg@mail.gmail.com>
	<CAHVvXxT8wPPtXHk9kwqH1Acb_fkAx-uCvY9UMnzsH==CZwLcOQ@mail.gmail.com>
	<loom.20130129T163910-565@post.gmane.org>
	<51086A96.9020300@pearwood.info>
	<loom.20130130T094306-124@post.gmane.org>
	<CABocrW7wjNb+eWnmJ+5acYzbC1t0COMxgDHpYxTnQeMM20CzTw@mail.gmail.com>
	<CAHVvXxTbH_XkLVh2kzZ=_J9bPKxh0t5L9DKSM6G_1Zj0e_fXmg@mail.gmail.com>
Message-ID: <5109A50E.8070308@pearwood.info>

On 31/01/13 05:05, Oscar Benjamin wrote:
> On 30 January 2013 17:56, Yuriy Taraday<yorik.sar at gmail.com>  wrote:
>>
>> On Wed, Jan 30, 2013 at 1:46 PM, Wolfgang Maier
>> <wolfgang.maier at biologie.uni-freiburg.de>  wrote:
>>>
>>> your condition is 'partial(lt,50)', but this is not met to begin with and
>>> results in an empty list at least for me. Have you two actually checked
>>> the
>>> output of the code or have you just timed it?
>>
>> Yeah. Shame on me. You're right. My belief in partial and operator module
>> has been shaken.
>>
>
> This is why I prefer this stop() idea to any of the takewhile()
> versions: regardless of performance it leads to clearer code, that can
> be understood more easily.


Funny you say that, clarity of code and ease of understanding is exactly why
I dislike this stop() idea.


1) It does not work with list, dict or set comprehensions, only with generator
    expressions. So if you need a list, dict or set, you have to avoid the
    obvious list/dict/set comprehension.


2) It is fragile: it is easy enough to come up with examples of the above
    that *appear* to work:

    [i for i in range(20) if i < 50 or stop()]  # appears to work fine
    [i for i in range(20) if i < 10 or stop()]  # breaks


3) It reads wrong for a Python boolean expression. Given an if clause:

        if cond1() or cond2()

     you should expect that an element is generated if either cond1 or cond2
     are true. When I see "if cond1() or stop()" I don't read it as "stop if
     not cond1()" but as a Python bool expression, "generate an element if
     cond1() gives a truthy value or if stop() gives a truthy value".


This "if cond or stop()" is a neat hack, but it's still a hack, and less
readable and understandable than I expect from Python code.


-- 
Steven


From oscar.j.benjamin at gmail.com  Thu Jan 31 01:37:23 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Thu, 31 Jan 2013 00:37:23 +0000
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <5109A50E.8070308@pearwood.info>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org> <51072650.5090808@pearwood.info>
	<CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>
	<97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com>
	<CAJ78kjOOrcebp=Gkp++bf=k_tCKiCw36ibMw7j5PKCLPH=Xqwg@mail.gmail.com>
	<CAHVvXxT8wPPtXHk9kwqH1Acb_fkAx-uCvY9UMnzsH==CZwLcOQ@mail.gmail.com>
	<loom.20130129T163910-565@post.gmane.org>
	<51086A96.9020300@pearwood.info>
	<loom.20130130T094306-124@post.gmane.org>
	<CABocrW7wjNb+eWnmJ+5acYzbC1t0COMxgDHpYxTnQeMM20CzTw@mail.gmail.com>
	<CAHVvXxTbH_XkLVh2kzZ=_J9bPKxh0t5L9DKSM6G_1Zj0e_fXmg@mail.gmail.com>
	<5109A50E.8070308@pearwood.info>
Message-ID: <CAHVvXxS+CzQw99BM5MAPnfwV8soh3eq49o3J8H5Fiw4nSctucw@mail.gmail.com>

On 30 January 2013 22:56, Steven D'Aprano <steve at pearwood.info> wrote:
> On 31/01/13 05:05, Oscar Benjamin wrote:
>>
>> On 30 January 2013 17:56, Yuriy Taraday<yorik.sar at gmail.com>  wrote:
>>>
>>>
>>> On Wed, Jan 30, 2013 at 1:46 PM, Wolfgang Maier
>>> <wolfgang.maier at biologie.uni-freiburg.de>  wrote:
>>>>
>>>>
>>>> your condition is 'partial(lt,50)', but this is not met to begin with
>>>> and
>>>> results in an empty list at least for me. Have you two actually checked
>>>> the
>>>> output of the code or have you just timed it?
>>>
>>>
>>> Yeah. Shame on me. You're right. My belief in partial and operator module
>>> has been shaken.
>>>
>>
>> This is why I prefer this stop() idea to any of the takewhile()
>> versions: regardless of performance it leads to clearer code, that can
>> be understood more easily.
>
> Funny you say that, clarity of code and ease of understanding is exactly why
> I dislike this stop() idea.
>
>
> 1) It does not work with list, dict or set comprehensions, only with
> generator
>    expressions. So if you need a list, dict or set, you have to avoid the
>    obvious list/dict/set comprehension.

That's true. I would prefer it if a similar effect were achievable in
these cases.

>
> 2) It is fragile: it is easy enough to come up with examples of the above
>    that *appear* to work:
>
>    [i for i in range(20) if i < 50 or stop()]  # appears to work fine
>    [i for i in range(20) if i < 10 or stop()]  # breaks

As I said I would prefer a solution that would work for list
comprehensions but there isn't one so the stop() method has to come
with the caveat that it can only be used in that way. That said, I
have become used to using a generator inside a call to dict() or set()
(since the comprehensions for those cases were only recently added) so
it doesn't seem a big problem to rewrite the above with calls to
list().

You are right, though, that a bug like this would be problematic. If
the StopIteration leaks up the call stack into a generator that is
being for-looped then it creates a confusing debug problem (at least
it did the first time I encountered it).
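A minimal sketch of the leak Oscar describes. The behavior shown is for Python 3.7+, where PEP 479 turns what used to be silent truncation into a visible RuntimeError; on the interpreters current at the time of this thread, the loop simply stopped early:

```python
def first(seq):
    # Raises StopIteration when seq is empty.
    return next(iter(seq))

def firsts(groups):
    for g in groups:
        # An empty group raises StopIteration inside this generator frame.
        yield first(g)

out = []
try:
    for value in firsts([[1], [], [3]]):
        out.append(value)
except RuntimeError:
    # PEP 479 (Python 3.7+) converts the escaping StopIteration.
    out.append('error')

print(out)  # pre-3.7: [1] (silent truncation); 3.7+: [1, 'error']
```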

>
> 3) It reads wrong for a Python boolean expression. Given an if clause:
>
>        if cond1() or cond2()
>
>     you should expect that an element is generated if either cond1 or cond2
>     are true. When I see "if cond1() or stop()" I don't read it as "stop if
>     not cond1()" but as a Python bool expression, "generate an element if
>     cond1() gives a truthy value or if stop() gives a truthy value".

Again I would have preferred 'else break' or something clearer but
this seems the best available (I'm open to suggestions).

>
> This "if cond or stop()" is a neat hack, but it's still a hack, and less
> readable and understandable than I expect from Python code.

It is a hack (and I would prefer a supported method) but my point was
that both you and Yuriy wrote the wrong code without noticing it. You
both posted it to a mailing list where no one else noticed until
someone actually tried running the code. In other words it wasn't
obvious that the code was incorrect just from looking at it.

This one looks strange but if you knew what stop() was then you would
understand it:
    list(x for x in range(100) if x < 50 or stop())

This one is difficult to mentally parse even if you understand all of
the constituent parts:
    [x for x in takewhile(partial(lt, 50), range(100))]
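For the record, the subtle bug in the takewhile version is the argument order of partial; a quick sketch:

```python
from functools import partial
from itertools import takewhile
from operator import lt

# partial(lt, 50) binds 50 as lt's *first* argument, so the predicate
# tests 50 < x rather than the intended x < 50.
pred = partial(lt, 50)
print(pred(10), pred(60))  # False True

# takewhile stops at the first falsey result, and 50 < 0 is False:
print(list(takewhile(partial(lt, 50), range(100))))  # []

# The intended predicate needs x on the left-hand side:
print(list(takewhile(lambda x: x < 50, range(100))) == list(range(50)))  # True
```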


Oscar


From steve at pearwood.info  Thu Jan 31 01:45:56 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 31 Jan 2013 11:45:56 +1100
Subject: [Python-ideas] Extend module objects to support properties
In-Reply-To: <51094D8D.606@hastings.org>
References: <51087225.3040801@hastings.org>
	<CADiSq7fRZhD8kWoHfpxg7HKriVr6qET6QRvCLGirXYdsUJHEfA@mail.gmail.com>
	<51094D8D.606@hastings.org>
Message-ID: <5109BEC4.4050604@pearwood.info>

On 31/01/13 03:42, Larry Hastings wrote:

> Also, I'm not sure there are any existing globals that we'd want to convert into properties.

How about this?

math.pi = 3

which really should give an exception.

(I'm sure there are many others.)


-- 
Steven


From timothy.c.delaney at gmail.com  Thu Jan 31 02:27:24 2013
From: timothy.c.delaney at gmail.com (Tim Delaney)
Date: Thu, 31 Jan 2013 12:27:24 +1100
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <kec3h1$3rv$1@ger.gmane.org>
References: <CAH0mxTQ9wfcR_R5SbXV0J63iVaB1PvYvSF06tRy=AibO2Qajfg@mail.gmail.com>
	<CADiSq7cKrdgzBgENPYYXA5S-oCtT4Jxs5xnMwr750LUyKg+v6Q@mail.gmail.com>
	<CAF-Rda8B1Cw1vNZxmwuCP0AxHxhjdNQvoCWGqJsKmZsj4g0C3g@mail.gmail.com>
	<51085AAB.6090303@canterbury.ac.nz>
	<CAF-Rda9Oup0fmJapmAg9kFv=YaGwjph2ZyMXrMB3Q0bEhWz2eg@mail.gmail.com>
	<CAP7+vJLT+KxjqNT6=c6ZYmBWw55U_88W8MFqW+_3OH7AqTVw4Q@mail.gmail.com>
	<5108A87D.9000207@canterbury.ac.nz>
	<20130130082639.0b28d7eb@pitrou.net>
	<CAKCKLWyegkHo15dcVvhC23WaY=3Mnr6P-JT5M_Lm6UKCMahUhg@mail.gmail.com>
	<CAKCKLWxTt0kwjc4dpRJUz27_0bt+u+AHkEqzTJEoZNC7rshKkQ@mail.gmail.com>
	<kec3h1$3rv$1@ger.gmane.org>
Message-ID: <CAN8CLg=e9DAqdpgd4=0chU=dubVEr66LtnWUwYdJb6O2O4Vbag@mail.gmail.com>

On 31 January 2013 08:32, Terry Reedy <tjreedy at udel.edu> wrote:

> On 1/30/2013 10:30 AM, Michael Foord wrote:
>
>> On 30 January 2013 15:22, Michael Foord
>>
>
>      With a Python 3 metaclass that provides default values for *looked
>>     up* entries you could have this:
>>
>>     class Color(Enum):
>>          RED, WHITE, BLUE
>>
>>     The lookup would create the member - with the appropriate value.
>>
>> class values(dict):
>>      def __init__(self):
>>          self.value = 0
>>      def __getitem__(self, key):
>>
>
>
> So RED, WHITE, BLUE are 1, 2, 3; not 0, 1, 2 as I and many readers might
> expect. That aside (which can be fixed), this is very nice.
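The quoted __prepare__ snippet above is truncated; here is a minimal, self-contained reconstruction of the trick (all names are illustrative, and numbering starts at zero as Terry expects):

```python
class AutoNumber(dict):
    """Class-body namespace that auto-creates missing names as 0, 1, 2, ..."""
    def __init__(self):
        super().__init__()
        self.next_value = 0

    def __missing__(self, key):
        # Dunder lookups (__name__, etc.) must fail normally so the
        # interpreter falls back to globals/builtins.
        if key.startswith('__') and key.endswith('__'):
            raise KeyError(key)
        value = self.next_value
        self[key] = value
        self.next_value += 1
        return value


class EnumMeta(type):
    @classmethod
    def __prepare__(mcs, name, bases, **kwds):
        return AutoNumber()

    def __new__(mcs, name, bases, namespace, **kwds):
        # Convert back to a plain dict for the real class namespace.
        return super().__new__(mcs, name, bases, dict(namespace))


class Enum(metaclass=EnumMeta):
    pass


class Color(Enum):
    RED, WHITE, BLUE   # bare name lookups trigger __missing__

print(Color.RED, Color.WHITE, Color.BLUE)  # 0 1 2
```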


Here is a version that I think creates an enum with most of the features of
traditional and modern enums.

- Enum values are subclasses of int;

- Only need to declare the enum key name;

- Starts at zero by default;

- Can change the start value;

- Can have discontiguous values (e.g. 0, 1, 5, 6);

- Can have other types of class attributes;

- Ensures that there is a 1:1 mapping between key:value (throws an
exception if either of these is violated);

- Able to obtain the keys, values and items as per the mapping interface
(sorted by value);

- Lookup an enum by key or value;

One thing to note is that *any* class attribute assigned a value which
implements __index__ will be considered an enum value assignment.

I've done some funky stuff to ensure that you can access all the above
either via the enum class, or by an instance of the enum class. Most of the
time you would just use the Enum subclass directly (i.e. it's a namespace)
but there may be use cases for having instances of the Enum classes.

import collections
import operator

class EnumValue(int):
    def __new__(cls, key, value):
        e = super().__new__(cls, value)
        super().__setattr__(e, 'key', key)
        return e

    def __setattr__(self, key, value):
        raise TypeError("Cannot set attribute of type %r" % (type(self),))

    def __repr__(self):
        return "<%s '%s': %d>" % (type(self).__qualname__, self.key, self)

class EnumValues(collections.OrderedDict):
    def __init__(self):
        super().__init__()
        self.value = 0
        self.sealed = False

    def __getitem__(self, key):
        try:
            obj = super().__getitem__(key)

            if not self.sealed and isinstance(obj, EnumValue):
                raise TypeError("Duplicate enum key '%s' with values: %d and %d" % (obj.key, obj, self.value))

            return obj

        except KeyError:
            if key[:2] == '__' and key[-2:] == '__':
                raise

            value = self.value
            super().__setitem__(key, EnumValue(key, value))
            self.value += 1
            return value

    def __setitem__(self, key, value):
        if key[:2] == '__' and key[-2:] == '__':
            return super().__setitem__(key, value)

        try:
            if isinstance(value, EnumValue):
                assert value.key == key
            else:
                value = operator.index(value)
        except TypeError:
            return super().__setitem__(key, value)

        try:
            o = super().__getitem__(key)

            if isinstance(o, EnumValue):
                raise TypeError("Duplicate enum key '%s' with values: %d and %d" % (o.key, o, value))

        except KeyError:
            self.value = value + 1

            if not isinstance(value, EnumValue):
                value = EnumValue(key, value)

            super().__setitem__(value.key, value)

class EnumMeta(type):

    @classmethod
    def __prepare__(metacls, name, bases):
        return EnumValues()

    def __new__(cls, name, bases, classdict):
        classdict.sealed = True
        result = type.__new__(cls, name, bases, dict(classdict))
        enum = []

        for v in classdict.values():
            if isinstance(v, EnumValue):
                enum.append(v)

        enum.sort()
        result._key_to_enum = collections.OrderedDict()
        result._value_to_enum = collections.OrderedDict()

        for e in enum:
            if e in result._value_to_enum:
                raise TypeError("Duplicate enum value %d for keys: '%s' and '%s'" % (e, result._value_to_enum[e].key, e.key))

            if e.key in result._key_to_enum:
                raise TypeError("Duplicate enum key '%s' with values: %d and %d" % (e.key, result._key_to_enum[e.key], e))

            result._key_to_enum[e.key] = e
            result._value_to_enum[e] = e

        return result

    def __getitem__(self, key):
        try:
            key = operator.index(key)
        except TypeError:
            return self._key_to_enum[key]
        else:
            return self._value_to_enum[key]

    def _items(self):
        return self._key_to_enum.items()

    def _keys(self):
        return self._key_to_enum.keys()

    def _values(self):
        return self._key_to_enum.values()

    def items(self):
        return self._items()

    def keys(self):
        return self._keys()

    def values(self):
        return self._values()

class Enum(metaclass=EnumMeta):
    def __getitem__(self, key):
        cls = type(self)
        return type(cls).__getitem__(cls, key)

    def items(cls):
        return cls._items()

    def keys(cls):
        return cls._keys()

    def values(cls):
        return cls._values()

Enum.items = classmethod(Enum.items)
Enum.keys = classmethod(Enum.keys)
Enum.values = classmethod(Enum.values)

class Color(Enum):
    RED, WHITE, BLUE
    GREEN = 4
    YELLOW
    ORANGE = 'orange'
    BLACK

    def dump(self):
        print(self.RED, self.WHITE, self.BLUE, self.GREEN, self.YELLOW,
self.BLACK, self.ORANGE, self.dump)

print(Color.RED, Color.WHITE, Color.BLUE, Color.GREEN, Color.YELLOW,
Color.BLACK, Color.ORANGE, Color.dump)
Color().dump()
print(repr(Color.RED))
print(repr(Color['RED']))
print(repr(Color().RED))
print(repr(Color()['RED']))
print(repr(Color[0]))
print(repr(Color()[0]))
print(*Color.items())
print(*Color().items())
print(*Color.keys())
print(*Color().keys())
print(*Color.values())
print(*Color().values())

Tim Delaney
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130131/f2315725/attachment.html>

From larry at hastings.org  Thu Jan 31 02:53:25 2013
From: larry at hastings.org (Larry Hastings)
Date: Wed, 30 Jan 2013 17:53:25 -0800
Subject: [Python-ideas] Extend module objects to support properties
In-Reply-To: <5109BEC4.4050604@pearwood.info>
References: <51087225.3040801@hastings.org>
	<CADiSq7fRZhD8kWoHfpxg7HKriVr6qET6QRvCLGirXYdsUJHEfA@mail.gmail.com>
	<51094D8D.606@hastings.org> <5109BEC4.4050604@pearwood.info>
Message-ID: <5109CE95.7060104@hastings.org>


On 01/30/2013 04:45 PM, Steven D'Aprano wrote:
> On 31/01/13 03:42, Larry Hastings wrote:
>
>> Also, I'm not sure there are any existing globals that we'd want to 
>> convert into properties.
>
> How about this?
>
> math.pi = 3
>
> which really should give an exception.
>
> (I'm sure there are many others.)

Well, hmm.  The thing is, properties--at least the existing 
implementation with classes--doesn't mesh well with direct access via 
the dict.  So, right now,

 >>> math.__dict__['pi']
3.141592653589793

If we change math.pi to be a property it wouldn't be in the dict 
anymore.  So that has the possibility of breaking code.

We could ameliorate it with

 >>> math.__dict__['pi'] = math.pi

But if the user assigns a different value to math.__dict__['pi'], 
math.pi will diverge, which again could break code.  (Who might try to 
assign a different value to pi?  The 1897 House Of Representatives of 
Indiana for one!)


More generally, it's often useful to monkeypatch "constants" at runtime, 
for testing purposes (and for less justifiable purposes). Why prevent 
that?  I cite the Consenting Adults rule.


//arry/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130130/594d640f/attachment.html>

From ethan at stoneleaf.us  Thu Jan 31 05:22:50 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 30 Jan 2013 20:22:50 -0800
Subject: [Python-ideas] Extend module objects to support properties
In-Reply-To: <5109CE95.7060104@hastings.org>
References: <51087225.3040801@hastings.org>
	<CADiSq7fRZhD8kWoHfpxg7HKriVr6qET6QRvCLGirXYdsUJHEfA@mail.gmail.com>
	<51094D8D.606@hastings.org> <5109BEC4.4050604@pearwood.info>
	<5109CE95.7060104@hastings.org>
Message-ID: <5109F19A.3060902@stoneleaf.us>

On 01/30/2013 05:53 PM, Larry Hastings wrote:
>
> On 01/30/2013 04:45 PM, Steven D'Aprano wrote:
>> On 31/01/13 03:42, Larry Hastings wrote:
>>
>>> Also, I'm not sure there are any existing globals that we'd want to
>>> convert into properties.
>>
>> How about this?
>>
>> math.pi = 3
>>
>> which really should give an exception.
>>
>> (I'm sure there are many others.)
>
> Well, hmm.  The thing is, properties--at least the existing
> implementation with classes--doesn't mesh well with direct access via
> the dict.  So, right now,
>
>  >>> math.__dict__['pi']
> 3.141592653589793
>
> If we change math.pi to be a property it wouldn't be in the dict
> anymore.  So that has the possibility of breaking code.


So make the property access the __dict__:

--> class Test(object):
...   @property
...   def pi(self):
...     return self.__dict__['pi']
...   @pi.setter
...   def pi(self, new_value):
...     self.__dict__['pi'] = new_value
...
--> t = Test()
--> t
<__main__.Test object at 0x7f165d689850>
--> t.pi = 3.141596
--> t.pi
3.141596
--> t.__dict__['pi'] = 3
--> t.pi
3

~Ethan~


From larry at hastings.org  Thu Jan 31 06:04:29 2013
From: larry at hastings.org (Larry Hastings)
Date: Wed, 30 Jan 2013 21:04:29 -0800
Subject: [Python-ideas] Extend module objects to support properties
In-Reply-To: <5109F19A.3060902@stoneleaf.us>
References: <51087225.3040801@hastings.org>
	<CADiSq7fRZhD8kWoHfpxg7HKriVr6qET6QRvCLGirXYdsUJHEfA@mail.gmail.com>
	<51094D8D.606@hastings.org> <5109BEC4.4050604@pearwood.info>
	<5109CE95.7060104@hastings.org> <5109F19A.3060902@stoneleaf.us>
Message-ID: <5109FB5D.2090109@hastings.org>

On 01/30/2013 08:22 PM, Ethan Furman wrote:
> On 01/30/2013 05:53 PM, Larry Hastings wrote:
>> If we change math.pi to be a property it wouldn't be in the dict
>> anymore.  So that has the possibility of breaking code.
> So make the property access the __dict__:

In which case, it behaves exactly like it does today without a 
property.  Okay... so why bother?  If your answer is "so it can have 
code behind it", maybe you can find a better example than math.pi, which 
will never need code behind it.

In general, I was proposing we add property support to modules mostly so 
that new globals could be properties, saving us from adding more 
accessors to the language.  Otherwise I'm gonna have to switch to Eclipse.


//arry/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130130/643dc525/attachment.html>

From ubershmekel at gmail.com  Thu Jan 31 08:28:15 2013
From: ubershmekel at gmail.com (Yuval Greenfield)
Date: Thu, 31 Jan 2013 09:28:15 +0200
Subject: [Python-ideas] Extend module objects to support properties
In-Reply-To: <5109FB5D.2090109@hastings.org>
References: <51087225.3040801@hastings.org>
	<CADiSq7fRZhD8kWoHfpxg7HKriVr6qET6QRvCLGirXYdsUJHEfA@mail.gmail.com>
	<51094D8D.606@hastings.org> <5109BEC4.4050604@pearwood.info>
	<5109CE95.7060104@hastings.org> <5109F19A.3060902@stoneleaf.us>
	<5109FB5D.2090109@hastings.org>
Message-ID: <CANSw7KyBFk7Rw5x7ERACUFwBGjKyed9r0m95ZfS=cinbEwshTg@mail.gmail.com>

On Thu, Jan 31, 2013 at 7:04 AM, Larry Hastings <larry at hastings.org> wrote:

>  On 01/30/2013 08:22 PM, Ethan Furman wrote:
>
> On 01/30/2013 05:53 PM, Larry Hastings wrote:
>
> If we change math.pi to be a property it wouldn't be in the dict
> anymore.  So that has the possibility of breaking code.
>
> So make the property access the __dict__:
>
>
> In which case, it behaves exactly like it does today without a property.
> Okay... so why bother?  If your answer is "so it can have code behind it",
> maybe you find a better example than math.pi, which will never need code
> behind it.
>
> In general, I was proposing we add property support to modules mostly so
> that new globals could be properties, saving us from adding more accessors
> to the language.  Otherwise I'm gonna have to switch to Eclipse.
>
>
> */arry*
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>
>

I'm just gonna write "Python 4" for searching later.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130131/2f84af91/attachment.html>

From greg.ewing at canterbury.ac.nz  Thu Jan 31 09:17:35 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 31 Jan 2013 21:17:35 +1300
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <20130130221926.GA20372@cskk.homeip.net>
References: <5108A2F1.5010006@canterbury.ac.nz>
	<20130130221926.GA20372@cskk.homeip.net>
Message-ID: <510A289F.4090904@canterbury.ac.nz>

Cameron Simpson wrote:
> How about this:
> 
>   Color = enum(RED=None, WHITE=None, BLUE=None, yellow=9)

You see, this is the problem -- there are quite a number
of these solutions, all about as good as each other, with
none of them standing out as obviously the right choice
for stdlib inclusion.

Michael Foord's solution has promise, though, as it manages
to eliminate *all* of the extraneous cruft and look almost
like it's built into the language.

Plus it has the bonus of making you go "...??? How the
blazes does *that* work?" the first time you see it. :-)

-- 
Greg


From ncoghlan at gmail.com  Thu Jan 31 09:32:39 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 31 Jan 2013 18:32:39 +1000
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <CAH0mxTSoAtG1FP6UMDEN964TEvjdRVPL9g3BOnFrJ61wSf8ehw@mail.gmail.com>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org> <51072650.5090808@pearwood.info>
	<CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>
	<97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com>
	<CAJ78kjOOrcebp=Gkp++bf=k_tCKiCw36ibMw7j5PKCLPH=Xqwg@mail.gmail.com>
	<CAH0mxTSoAtG1FP6UMDEN964TEvjdRVPL9g3BOnFrJ61wSf8ehw@mail.gmail.com>
Message-ID: <CADiSq7daMv2in+ODV8kMs2A7VBNh+MvvmksiXDMZWx4jRhfSWA@mail.gmail.com>

On Tue, Jan 29, 2013 at 10:35 PM, Joao S. O. Bueno
<jsbueno at python.org.br> wrote:
> On 29 January 2013 09:51, yoav glazner <yoavglazner at gmail.com> wrote:
>> Here is very similar version that works (tested on python27)
>>>>> def stop():
>> next(iter([]))
>>
>>>>> list((i if i<50 else stop()) for i in range(100))
>> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
>> 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
>> 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
>
> Great. I think this nails it. It is exactly the intended behavior,
> and very readable under current language capabilities.
>
> One does not have to stop and go read what "itertools.takewhile" does,
> and mentally unfold the lambda guard expression - that is what makes
> this (and the O.P. request)  more readable than using takewhile.
>
> Note: stop can also just explicitly raise StopIteration -
> or your next(iter([])) expression can be inlined within the generator.
>
> It works in Python 3 as well - though for those who did not test:
> it won't work for list, dict or set comprehensions - just for
> generator expressions.

This actually prompted an interesting thought for me. The
statement-as-expression syntactic equivalent of the "else stop()"
construct would actually be "else return", rather than "else break",
since the goal is to say "we're done", regardless of the level of loop
nesting.

It just so happens that, inside a generator (or generator expression)
raising StopIteration and returning from the generator are very close
to being equivalent operations, which is why the "else stop()" trick
works. In a 3.x container comprehension, the inner scope is an
ordinary function, so the equivalence between returning from the
function and raising StopIteration is lost.
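With hindsight, this equivalence was later removed on purpose: PEP 479 (the default from Python 3.7) converts a StopIteration escaping a generator frame into RuntimeError. A sketch of how the two spellings behave on a modern interpreter:

```python
def stop():
    raise StopIteration

# Generator expression: PEP 479 turns the escaping StopIteration into
# RuntimeError, so the "or stop()" trick itself now fails loudly.
try:
    result = list(i for i in range(100) if i < 5 or stop())
except RuntimeError:
    result = 'RuntimeError'

# List comprehension: the body is an ordinary (non-generator) scope, so
# StopIteration propagates unchanged, as it always did.
try:
    result2 = [i for i in range(100) if i < 5 or stop()]
except StopIteration:
    result2 = 'StopIteration'

print(result, result2)
```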

Cheers,
Nick.


-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From ubershmekel at gmail.com  Thu Jan 31 09:51:14 2013
From: ubershmekel at gmail.com (Yuval Greenfield)
Date: Thu, 31 Jan 2013 10:51:14 +0200
Subject: [Python-ideas] PEP 3156: getting the socket or peer name from
	the transport
In-Reply-To: <CAP7+vJKn6hE1zWujnDi=5dUtRsdovM7741G9bK0e4vQJvmbDPA@mail.gmail.com>
References: <CAP7+vJ+_QwQW13b1mE1NSg8Z89MyYwHkyYmLwhumGo0iO-V+wA@mail.gmail.com>
	<CANSw7Kyi7d-Q5O=MqZTx5a-bOA3CQ4Me5RVafxJ0e7Kj1w693w@mail.gmail.com>
	<CAP7+vJ+BxgVPv0jBP=9GAdnXGdLRqgjCcBhVzVnJ4K7NDAxrjQ@mail.gmail.com>
	<CANSw7KwafX+qGiMszK7LVMs=xacmyEUjGLDPdqGzfW95GxbXfg@mail.gmail.com>
	<20130127122121.6b779ada@pitrou.net>
	<CANSw7KyuPCg9Ot6tY3ML_WBquO0PfFVqpzWPRpa3o8gmTyVS_A@mail.gmail.com>
	<1359288997.3488.2.camel@localhost.localdomain>
	<CAP7+vJ+av5zRDDLThgo7CCn_xLJ7rY3u-myP6Hihf50j0z7pYQ@mail.gmail.com>
	<EDC83381-4C64-4215-A90B-C72F2327BCA7@umbrellacode.com>
	<CAP7+vJKn6hE1zWujnDi=5dUtRsdovM7741G9bK0e4vQJvmbDPA@mail.gmail.com>
Message-ID: <CANSw7KzG-ePHkuA4tv2LQ-+Fio2nWO0qv38WsH5QBY-JzDq3Eg@mail.gmail.com>

On Mon, Jan 28, 2013 at 5:45 PM, Guido van Rossum <guido at python.org> wrote:

> Hm. I'm not keen on precomputing all of that, since most protocols
>  won't need it, and the cost add up. This is not WSGI. The protocol has
> the transport object and can ask it specific questions -- if through a
> general API, like get_extra_info(key, [default]).
>
>
I forgot to ask before, but why is get_extra_info better than normal
attributes and methods?

val = transport.get_extra_info(key, None)
if val is None:
    pass

# vs

if hasattr(transport, key):
    val = transport.key
else:
    pass



Yuval
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130131/2ed499f0/attachment.html>

From ubershmekel at gmail.com  Thu Jan 31 09:52:09 2013
From: ubershmekel at gmail.com (Yuval Greenfield)
Date: Thu, 31 Jan 2013 10:52:09 +0200
Subject: [Python-ideas] PEP 3156: getting the socket or peer name from
	the transport
In-Reply-To: <CANSw7KzG-ePHkuA4tv2LQ-+Fio2nWO0qv38WsH5QBY-JzDq3Eg@mail.gmail.com>
References: <CAP7+vJ+_QwQW13b1mE1NSg8Z89MyYwHkyYmLwhumGo0iO-V+wA@mail.gmail.com>
	<CANSw7Kyi7d-Q5O=MqZTx5a-bOA3CQ4Me5RVafxJ0e7Kj1w693w@mail.gmail.com>
	<CAP7+vJ+BxgVPv0jBP=9GAdnXGdLRqgjCcBhVzVnJ4K7NDAxrjQ@mail.gmail.com>
	<CANSw7KwafX+qGiMszK7LVMs=xacmyEUjGLDPdqGzfW95GxbXfg@mail.gmail.com>
	<20130127122121.6b779ada@pitrou.net>
	<CANSw7KyuPCg9Ot6tY3ML_WBquO0PfFVqpzWPRpa3o8gmTyVS_A@mail.gmail.com>
	<1359288997.3488.2.camel@localhost.localdomain>
	<CAP7+vJ+av5zRDDLThgo7CCn_xLJ7rY3u-myP6Hihf50j0z7pYQ@mail.gmail.com>
	<EDC83381-4C64-4215-A90B-C72F2327BCA7@umbrellacode.com>
	<CAP7+vJKn6hE1zWujnDi=5dUtRsdovM7741G9bK0e4vQJvmbDPA@mail.gmail.com>
	<CANSw7KzG-ePHkuA4tv2LQ-+Fio2nWO0qv38WsH5QBY-JzDq3Eg@mail.gmail.com>
Message-ID: <CANSw7KywHzL==_Z-gGG89O6a5i1ebpX1AGF5gL-X-iv6YxOCbg@mail.gmail.com>

>
>     val = getattr(transport, key)
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130131/576f5be5/attachment.html>

From ncoghlan at gmail.com  Thu Jan 31 09:56:16 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 31 Jan 2013 18:56:16 +1000
Subject: [Python-ideas] Extend module objects to support properties
In-Reply-To: <51094D8D.606@hastings.org>
References: <51087225.3040801@hastings.org>
	<CADiSq7fRZhD8kWoHfpxg7HKriVr6qET6QRvCLGirXYdsUJHEfA@mail.gmail.com>
	<51094D8D.606@hastings.org>
Message-ID: <CADiSq7cj8xLPXa-YjE-H9DEzXRbgjhm_B40ELQW8QjfGT3+htA@mail.gmail.com>

On Thu, Jan 31, 2013 at 2:42 AM, Larry Hastings <larry at hastings.org> wrote:
> Of those four issues, the latter two are wontfix.  Code that futzes with an
> object's __dict__ bypasses the property machinery but this is already viewed
> as acceptable.
>
> Obviously the point of the proposal is to change the behavior of the first
> two.  Whether this is manageable additional complexity, or fast enough,
> remains to be seen--which is why this is in ideas not dev.

Looking at the problem from a different direction:

Currently, modules are *instances* of a normal type
(types.ModuleType). Thus, anything stored in their global namespace is
like anything else stored in a normal instance dictionary: no
descriptor behaviour.

The request in this thread is basically for a way to:

1. Define a custom type
2. Put an instance of that type in sys.modules instead of the ordinary
module object

Now here's the thing: we already support this, because the import
system is designed to cope with modules replacing
"sys.modules[__name__]" while they're being loaded. The way this
happens is that, after we finish loading a module, we usually don't
trust what the loader gave us. Instead, we go look at what's in
sys.modules under the name being loaded.

So if, in your module code, you do this:

    import sys, types
    class MyPropertyUsingModule(types.ModuleType):
        def __init__(self, original):
            super().__init__(original.__name__)
            # Keep a reference to the original module to avoid the
            # destructive cleanup of the global namespace
            self._original = original

        @property
        def myglobal(self):
            return theglobal

        @myglobal.setter
        def myglobal(self, value):
            global theglobal
            theglobal = value

    sys.modules[__name__] = MyPropertyUsingModule(sys.modules[__name__])

Then what you end up with in sys.modules is a module with a global
property, "myglobal".

I'd prefer to upgrade this from "begrudged backwards compatibility
hack" to "supported feature", rather than doing anything more
complicated.
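A self-contained sketch of the same hack; the module name 'demo' and the attribute names here are invented for illustration:

```python
import sys
import types

class PropModule(types.ModuleType):
    """Module subclass whose 'answer' attribute is a property."""
    @property
    def answer(self):
        return self._answer * 2

    @answer.setter
    def answer(self, value):
        self._answer = value

# Build the module by hand instead of replacing sys.modules[__name__]
# from inside a real module file.
mod = PropModule('demo')
mod.answer = 21          # goes through the setter
sys.modules['demo'] = mod

import demo              # import finds our instance in sys.modules
print(demo.answer)       # 42 -- attribute access routed through the property
```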

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


From shane at umbrellacode.com  Thu Jan 31 11:05:53 2013
From: shane at umbrellacode.com (Shane Green)
Date: Thu, 31 Jan 2013 02:05:53 -0800
Subject: [Python-ideas] csv.DictReader could handle headers more
	intelligently.
In-Reply-To: <201301301516.37499.mark.hackett@metoffice.gov.uk>
References: <1358903168.4767.4.camel@webb>
	<C05BFAE8-748B-4FB3-BC91-E3880DC9E2A2@umbrellacode.com>
	<CAK6S7j+VbLDAikcO07wy+EmQoEdYATdNgb6dq=6i-D1YPzPc7w@mail.gmail.com>
	<201301301516.37499.mark.hackett@metoffice.gov.uk>
Message-ID: <510A4201.60504@umbrellacode.com>

It's important to note, though, that I'm not proposing a change for 
DictReader.  We defined the DictReader API a long time ago, and that API 
returns a single value for each column header; if a DictReader began 
returninig dicts with lists of values instead of single values, it would 
be a bug that violated  the API we've defined.

As fun as it would be to explain to people that the reason there's now 
what they consider a bug in an application that's run "for like 10 
years" is that, had we not fixed it for them, the old version might not 
have read a newer file format properly, the truth is I do not want to 
replace DictReader behaviour with what's described below.  I would like 
thumbs +/-, and feedback on the idea of adding CsvRecordReader() (or 
something that mirrors DictReader but produces...) CSVRecord instances, 
for which I've suggested the API below as the starting point.

It might be good to change the subject or something, but I'll leave that 
to someone else because I'm infamous for doing the wrong thing in 
mailing lists...

> Mark Hackett <mailto:mark.hackett at metoffice.gov.uk>
> January 30, 2013 7:16 AM
>
> Jeff, it breaks code that works now because duplicates aren't cared about.
>
> Shane is putting code up for a NEW call that you can use if you're 
> worried
> about how the current one works and consideration for this issue is being
> included in the derivation of a new library for the next (and therefore
> allowed to be incompatible) python library version.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
> Jeff Jenkins <mailto:jeff at jeffreyjenkins.ca>
> January 30, 2013 6:04 AM
> I think this may have been lost somewhere in the last 90 messages, but 
> adding a warning to DictReader in the docs seems like it solves almost 
> the entire problem.  New csv.DictReader users are informed, no one's 
> old code breaks, and a separate discussion can be had about whether 
> it's worth adding a csv.MultiDictReader which uses lists.
>
>
>
> Shane Green <mailto:shane at umbrellacode.com>
> January 30, 2013 4:59 AM
>
>
> I should probably also have noted the dictionary API behaviour since 
> it's not explicitly:
> keys() -> list of unique() header names.
> values() -> list of field values lists.
> items() -> [(header, field-list),] pairs.
>
> And then of course dictionary lookup.  One thing that comes to mind is 
> that there's really no value to the unordered sequence of value lists; 
> there could be some value in extending an OrderedDict, making all the 
> iteration methods consistent and therefore something that could be 
> used to do something like write values, etc....
>
>
>
>
> J. Cliff Dyer <mailto:jcd at sdf.lonestar.org>
> January 22, 2013 5:06 PM
> Idea folks,
>
> I'm working with some poorly-formed CSV files, and I noticed that
> DictReader always and only pulls headers off of the first row. But many
> of the files I see have blank lines before the row of headers, sometimes
> with commas to the appropriate field count, sometimes without. The
> current implementation's behavior in this case is likely never correct,
> and certainly always annoying. Given the following file:
>
> ---Start File 1---
> ,,
> A,B,C
> 1,2,3
> 2,4,6
> ---End File 1---
>
> csv.DictReader yields the rows:
>
> {'': 'C'}
> {'': '3'}
> {'': '6'}
>
>
> And given a file starting with a zero-length line, like the following:
>
> ---Start File 2---
>
> A,B,C
> 1,2,3
> 2,4,6
> ---End File 2---
>
> It yields the following:
>
> {None: ['A', 'B', 'C']}
> {None: ['1', '2', '3']}
> {None: ['2', '4', '6']}
>
> I think that in both cases, the proper response would be treat the A,B,C
> line as the header line. The change that makes this work is pretty
> simple. In the fieldnames getter property, the "if not
> self._fieldnames:" conditional becomes "while not self._fieldnames or
> not any(self._fieldnames):" As a subclass:
>
> import csv
>
>
> class DictReader(csv.DictReader):
>     @property
>     def fieldnames(self):
>         while self._fieldnames is None or not any(self._fieldnames):
>             try:
>                 self._fieldnames = next(self.reader)
>             except StopIteration:
>                 break
>         self.line_num = self.reader.line_num
>         return self._fieldnames
>
>     # Same as the original setter, just rewritten to associate with the
>     # new getter property
>     @fieldnames.setter
>     def fieldnames(self, value):
>         self._fieldnames = value
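[Editor's note: run against File 1, the subclass behaves as described; a self-contained restatement of the class for checking.]

```python
import csv
import io

# Leading rows that are empty or all-blank are skipped when hunting for
# the header, so A,B,C becomes the field names.
class DictReader(csv.DictReader):
    @property
    def fieldnames(self):
        while self._fieldnames is None or not any(self._fieldnames):
            try:
                self._fieldnames = next(self.reader)
            except StopIteration:
                break
        self.line_num = self.reader.line_num
        return self._fieldnames

    @fieldnames.setter
    def fieldnames(self, value):
        self._fieldnames = value

rows = list(DictReader(io.StringIO(",,\nA,B,C\n1,2,3\n2,4,6\n")))
print(dict(rows[0]))  # {'A': '1', 'B': '2', 'C': '3'}
```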
>
> There might be some issues with existing code that depends on the {None:
> ['1','2','3']} construction, but I can't imagine a time when programmers
> would want to see {'': '3'} with the 1 and 2 values getting lost.
>
> Thoughts? Do folks think this is worth adding to the csv library, or
> should I just keep using my subclass?
>
> Cheers,
> Cliff
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas

From drekin at gmail.com  Thu Jan 31 11:38:29 2013
From: drekin at gmail.com (drekin at gmail.com)
Date: Thu, 31 Jan 2013 02:38:29 -0800 (PST)
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <20130130172310.60b49ef4@pitrou.net>
Message-ID: <510a49a5.49d80e0a.1489.ffffebe2@mx.google.com>


Hello. It should also be possible to specify the values of enum constants explicitly. For the 'bitmask' type only powers of 2 should be allowed, or maybe the values could be the exponents (as your TypeFlag example indicates).

The same way the 'symbolic' type acts as str and the 'sequential' type acts as int, the 'bitmask' type could act both as int and as set (or frozenset), since its semantics is like that of a set. The enum value object would represent both the int value and the corresponding singleton set. OR-ing would produce the corresponding multi-value set.

>>> isinstance(TypeFlag.HEAPTYPE, int)
True
>>> isinstance(TypeFlag.HEAPTYPE, set)
True

>>> TypeFlag.HEAPTYPE | TypeFlag.HAS_GC
TypeFlag.HEAPTYPE|HAS_GC # or maybe <TypeFlag {HEAPTYPE, HAS_GC}>
>>> TypeFlag.HEAPTYPE in (TypeFlag.HEAPTYPE | TypeFlag.HAS_GC)
True
>>> TypeFlag.HEAPTYPE in TypeFlag.HEAPTYPE
True

>>> TypeFlag(1)
TypeFlag.HEAPTYPE
>>> TypeFlag(2)
TypeFlag.HAS_GC
>>> set(TypeFlag.HEAPTYPE)
{1}
>>> set(TypeFlag.HEAPTYPE | TypeFlag.HAS_GC)
{1, 2}
>>> int(TypeFlag.HEAPTYPE)
2
>>> int(TypeFlag.HEAPTYPE | TypeFlag.HAS_GC)
6

Note the difference between n and 2 ** n semantics. So there should be something like
>>> TypeFlag.decompose(2)
TypeFlag.HEAPTYPE
>>> TypeFlag.decompose(6)
TypeFlag.HEAPTYPE|HAS_GC
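A rough sketch of such an int-plus-set hybrid (the names and details are my own, and decomposition here yields flag names rather than the exponent sets described above):

```python
# Sketch of a flag value behaving as an int bitmask with set-like
# containment; Flag, HEAPTYPE and HAS_GC are illustrative names only.
class Flag(int):
    _names = {}

    def __new__(cls, value, name=None):
        self = super().__new__(cls, value)
        if name is not None:
            cls._names[value] = name
        return self

    def __or__(self, other):
        return Flag(int(self) | int(other))

    def __contains__(self, other):
        # set-like membership: all bits of `other` are present in self
        return int(self) & int(other) == int(other)

    def decompose(self):
        return {n for bit, n in Flag._names.items() if bit & self}

HEAPTYPE = Flag(2, "HEAPTYPE")
HAS_GC = Flag(4, "HAS_GC")

combined = HEAPTYPE | HAS_GC
print(int(combined))                 # 6
print(HEAPTYPE in combined)          # True
print(sorted(combined.decompose()))  # ['HAS_GC', 'HEAPTYPE']
```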


Regards, Drekin



From oscar.j.benjamin at gmail.com  Thu Jan 31 12:08:58 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Thu, 31 Jan 2013 11:08:58 +0000
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <CADiSq7daMv2in+ODV8kMs2A7VBNh+MvvmksiXDMZWx4jRhfSWA@mail.gmail.com>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org> <51072650.5090808@pearwood.info>
	<CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>
	<97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com>
	<CAJ78kjOOrcebp=Gkp++bf=k_tCKiCw36ibMw7j5PKCLPH=Xqwg@mail.gmail.com>
	<CAH0mxTSoAtG1FP6UMDEN964TEvjdRVPL9g3BOnFrJ61wSf8ehw@mail.gmail.com>
	<CADiSq7daMv2in+ODV8kMs2A7VBNh+MvvmksiXDMZWx4jRhfSWA@mail.gmail.com>
Message-ID: <CAHVvXxRvoo0trP-VWeZUdtBQqGQyRP8ORE5AWzyp-cdmEhv5Rw@mail.gmail.com>

On 31 January 2013 08:32, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On Tue, Jan 29, 2013 at 10:35 PM, Joao S. O. Bueno
> <jsbueno at python.org.br> wrote:
>> On 29 January 2013 09:51, yoav glazner <yoavglazner at gmail.com> wrote:
>>> Here is very similar version that works (tested on python27)
>>>>>> def stop():
>>> next(iter([]))
>>>
>>>>>> list((i if i<50 else stop()) for i in range(100))
>>> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
>>> 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
>>> 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
>
> This actually prompted an interesting thought for me. The
> statement-as-expression syntactic equivalent of the "else stop()"
> construct would actually be "else return", rather than "else break",
> since the goal is to say "we're done", regardless of the level of loop
> nesting.

I'm not sure if it is the goal to be able to break out of any level of
nesting or at least that's not how I interpreted the original
proposal. It is what happens for this stop() function but only because
there's no other way.

Personally I don't mind as I generally avoid multiple-for
comprehensions; by the time I've written one out I usually decide that
it would be more readable as ordinary for loops or with a separate
function.

> It just so happens that, inside a generator (or generator expression)
> raising StopIteration and returning from the generator are very close
> to being equivalent operations, which is why the "else stop()" trick
> works. In a 3.x container comprehension, the inner scope is an
> ordinary function, so the equivalence between returning from the
> function and raising StopIteration is lost.

I don't really understand what you mean here. What is the difference
between comprehensions in 2.x and 3.x?


Oscar


From jimjjewett at gmail.com  Thu Jan 31 17:10:33 2013
From: jimjjewett at gmail.com (Jim Jewett)
Date: Thu, 31 Jan 2013 11:10:33 -0500
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <CADiSq7cKrdgzBgENPYYXA5S-oCtT4Jxs5xnMwr750LUyKg+v6Q@mail.gmail.com>
References: <CAH0mxTQ9wfcR_R5SbXV0J63iVaB1PvYvSF06tRy=AibO2Qajfg@mail.gmail.com>
	<CADiSq7cKrdgzBgENPYYXA5S-oCtT4Jxs5xnMwr750LUyKg+v6Q@mail.gmail.com>
Message-ID: <CA+OGgf6oQZ8m5hUNLaYnoRbWWe8FmTEXiQq1o9vX4fOt-OjPqA@mail.gmail.com>

On Tue, Jan 29, 2013 at 6:50 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> FWIW, since that last discussion, I've switched to using strings for
> my special constants, dumping them in a container if I need some kind
> of easy validity checking or iteration.

Unfortunately, some of the problems with that involve unicode
normalization, and won't show up in English.

Python has defined a normalization for identifiers; this normalization
does not apply to quoted strings.  Essentially, this is the same
problem string exceptions caused, except that it (sometimes) applies
to '==' as well as to 'is'.

Essentially, we want the simplicity of:

    color=enum(red, green, blue)

except that we *also* want to be able to compare the symbols to (int or
str) constants, and to decide when they will be equal.  I don't see
any good way to support:

    color=enum(red=15, green, blue)

without requiring either that strings be used instead of symbols, or
that later entries be explicitly initialized.

-jJ


From jasonkeene at gmail.com  Thu Jan 31 17:35:28 2013
From: jasonkeene at gmail.com (Jason Keene)
Date: Thu, 31 Jan 2013 11:35:28 -0500
Subject: [Python-ideas] Definition Symmetry
Message-ID: <CAPAaX_g8uuU9jhnWD_G6Xc3Anw8Pjkeu7UO5R_hDbXQdArkpPA@mail.gmail.com>

Why do function definitions require parens?

>>> class MyClass:
...     pass
...
>>> def my_func:
  File "<stdin>", line 1
    def my_func:
               ^
SyntaxError: invalid syntax

This seems to me to break a symmetry with class definitions.  I assume this
is just a holdover from C, though perhaps there is a non-historical reason.

I believe in the past we've forced parens in list comprehensions to create
a symmetry between comprehensions/generator expressions.  Why not for this?

From jasonkeene at gmail.com  Thu Jan 31 17:43:19 2013
From: jasonkeene at gmail.com (Jason Keene)
Date: Thu, 31 Jan 2013 11:43:19 -0500
Subject: [Python-ideas] Definition Symmetry
In-Reply-To: <CAPAaX_g8uuU9jhnWD_G6Xc3Anw8Pjkeu7UO5R_hDbXQdArkpPA@mail.gmail.com>
References: <CAPAaX_g8uuU9jhnWD_G6Xc3Anw8Pjkeu7UO5R_hDbXQdArkpPA@mail.gmail.com>
Message-ID: <CAPAaX_gDsVaxXNOaCJTfF7zwwhLMvgtXcnYO+uxoThxUp4gacA@mail.gmail.com>

Just to be clear, I wasn't suggesting forcing parens for class
definitions.  Rather make them optional for functions!


On Thu, Jan 31, 2013 at 11:35 AM, Jason Keene <jasonkeene at gmail.com> wrote:

> Why do function definitions require parens?
>
> >>> class MyClass:
> ...     pass
> ...
> >>> def my_func:
>   File "<stdin>", line 1
>     def my_func:
>                ^
> SyntaxError: invalid syntax
>
> This seems to me to break a symmetry with class definitions.  I assume
> this is just a hold off from C, perhaps there is a non-historical reason
> tho.
>
> I believe in the past we've forced parens in list comprehensions to create
> a symmetry between comprehensions/generator expressions.  Why not for this?
>

From python at mrabarnett.plus.com  Thu Jan 31 18:15:09 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Thu, 31 Jan 2013 17:15:09 +0000
Subject: [Python-ideas] Definition Symmetry
In-Reply-To: <CAPAaX_g8uuU9jhnWD_G6Xc3Anw8Pjkeu7UO5R_hDbXQdArkpPA@mail.gmail.com>
References: <CAPAaX_g8uuU9jhnWD_G6Xc3Anw8Pjkeu7UO5R_hDbXQdArkpPA@mail.gmail.com>
Message-ID: <510AA69D.1060300@mrabarnett.plus.com>

On 2013-01-31 16:35, Jason Keene wrote:
> Why do function definitions require parens?
>
>>>> class MyClass:
> ...     pass
> ...
>>>> def my_func:
>    File "<stdin>", line 1
>      def my_func:
>                 ^
> SyntaxError: invalid syntax
>
> This seems to me to break a symmetry with class definitions.  I assume
> this is just a hold off from C, perhaps there is a non-historical reason
> tho.
>
> I believe in the past we've forced parens in list comprehensions to
> create a symmetry between comprehensions/generator expressions.  Why not
> for this?
>
The parentheses are always required when calling the function, so it
makes sense to always require them when defining the function.

The case with class definitions is different; they are used in the
definition only when you want to specify the superclass.

They are always required when creating an instance of the class and in
method definitions.


From ethan at stoneleaf.us  Thu Jan 31 18:26:08 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 31 Jan 2013 09:26:08 -0800
Subject: [Python-ideas] Definition Symmetry
In-Reply-To: <510AA69D.1060300@mrabarnett.plus.com>
References: <CAPAaX_g8uuU9jhnWD_G6Xc3Anw8Pjkeu7UO5R_hDbXQdArkpPA@mail.gmail.com>
	<510AA69D.1060300@mrabarnett.plus.com>
Message-ID: <510AA930.9020704@stoneleaf.us>

On 01/31/2013 09:15 AM, MRAB wrote:
> On 2013-01-31 16:35, Jason Keene wrote:
>> Why do function definitions require parens?
>>
>>>>> class MyClass:
>> ...     pass
>> ...
>>>>> def my_func:
>>    File "<stdin>", line 1
>>      def my_func:
>>                 ^
>> SyntaxError: invalid syntax
>>
>> This seems to me to break a symmetry with class definitions.  I assume
>> this is just a hold off from C, perhaps there is a non-historical reason
>> tho.
>>
> The parentheses are always required when calling the function, so it
> makes sense to always require them when defining the function.
>
> The case with class definitions is different; they are used in the
> definition only when you want to specify the superclass.

... they are required in the definition when you want to specify the 
superclass, and optional otherwise.

~Ethan~


From ned at nedbatchelder.com  Thu Jan 31 19:11:55 2013
From: ned at nedbatchelder.com (Ned Batchelder)
Date: Thu, 31 Jan 2013 13:11:55 -0500
Subject: [Python-ideas] Definition Symmetry
In-Reply-To: <510AA69D.1060300@mrabarnett.plus.com>
References: <CAPAaX_g8uuU9jhnWD_G6Xc3Anw8Pjkeu7UO5R_hDbXQdArkpPA@mail.gmail.com>
	<510AA69D.1060300@mrabarnett.plus.com>
Message-ID: <510AB3EB.9020806@nedbatchelder.com>


On 1/31/2013 12:15 PM, MRAB wrote:
> On 2013-01-31 16:35, Jason Keene wrote:
>> Why do function definitions require parens?
>>
>>>>> class MyClass:
>> ...     pass
>> ...
>>>>> def my_func:
>>    File "<stdin>", line 1
>>      def my_func:
>>                 ^
>> SyntaxError: invalid syntax
>>
>> This seems to me to break a symmetry with class definitions.  I assume
>> this is just a hold off from C, perhaps there is a non-historical reason
>> tho.
>>
>> I believe in the past we've forced parens in list comprehensions to
>> create a symmetry between comprehensions/generator expressions. Why not
>> for this?
>>
> The parentheses are always required when calling the function, so it
> makes sense to always require them when defining the function.
>
> The case with class definitions is different; they are used in the
> definition only when you want to specify the superclass.
>

I think parens for super class are an unfortunate syntax, since it looks 
just like arguments to the class and is confusing for some beginners:

     def function(arg):
         ...
     function(10)            # Similar syntax: 10 corresponds to arg

     class Thing(Something):
         ...
     thing = Thing(10)    # How does 10 relate to Something? It doesn't.

A better syntax (which I AM NOT PROPOSING) would be:

     class Thing from Something:

--Ned.

> They are always required when creating an instance of the class and in
> method definitions.
>



From tjreedy at udel.edu  Thu Jan 31 20:00:35 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 31 Jan 2013 14:00:35 -0500
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <CAHVvXxRvoo0trP-VWeZUdtBQqGQyRP8ORE5AWzyp-cdmEhv5Rw@mail.gmail.com>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
	<ke71ge$u1m$1@ger.gmane.org> <51072650.5090808@pearwood.info>
	<CADiSq7dGdQU4fj0Oc+x6LUz4GEiSA2pVcwzztHiHCT_cV6TVyg@mail.gmail.com>
	<97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com>
	<CAJ78kjOOrcebp=Gkp++bf=k_tCKiCw36ibMw7j5PKCLPH=Xqwg@mail.gmail.com>
	<CAH0mxTSoAtG1FP6UMDEN964TEvjdRVPL9g3BOnFrJ61wSf8ehw@mail.gmail.com>
	<CADiSq7daMv2in+ODV8kMs2A7VBNh+MvvmksiXDMZWx4jRhfSWA@mail.gmail.com>
	<CAHVvXxRvoo0trP-VWeZUdtBQqGQyRP8ORE5AWzyp-cdmEhv5Rw@mail.gmail.com>
Message-ID: <keef0q$e1o$1@ger.gmane.org>

On 1/31/2013 6:08 AM, Oscar Benjamin wrote:
> On 31 January 2013 08:32, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> It just so happens that, inside a generator (or generator expression)
>> raising StopIteration and returning from the generator are very close
>> to being equivalent operations, which is why the "else stop()" trick
>> works. In a 3.x container comprehension, the inner scope is an
>> ordinary function, so the equivalence between returning from the
>> function and raising StopIteration is lost.
>
> I don't really understand what you mean here. What is the difference
> between comprehensions in 2.x and 3.x?

In 2.x, (list) comprehensions are translated to the equivalent nested 
for and if statements and compiled and executed in place. In 3.x, the 
translation is wrapped in a temporary function that is called and then 
discarded. The main effect is to localize the loop names, the 'i' in 
'[i*2 for i in iterable]', for instance.
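The localization is easy to observe (under Python 3; the same code on Python 2 would leave i bound after the comprehension):

```python
# On Python 3 the comprehension body runs in a temporary function scope,
# so the loop variable does not leak into the surrounding namespace.
squares = [i * 2 for i in range(3)]
try:
    i  # NameError on Python 3; would be 2 on Python 2
    leaked = True
except NameError:
    leaked = False
print(squares)  # [0, 2, 4]
print(leaked)   # False
```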

-- 
Terry Jan Reedy



From tjreedy at udel.edu  Thu Jan 31 20:04:52 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 31 Jan 2013 14:04:52 -0500
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <510A289F.4090904@canterbury.ac.nz>
References: <5108A2F1.5010006@canterbury.ac.nz>
	<20130130221926.GA20372@cskk.homeip.net>
	<510A289F.4090904@canterbury.ac.nz>
Message-ID: <keef8r$gcf$1@ger.gmane.org>

On 1/31/2013 3:17 AM, Greg Ewing wrote:
> Cameron Simpson wrote:
>> How about this:
>>
>>   Color = enum(RED=None, WHITE=None, BLUE=None, yellow=9)
>
> You see, this is the problem -- there are quite a number
> of these solutions, all about as good as each other, with
> none of them standing out as obviously the right choice
> for stdlib inclusion.
>
> Michael Foord's solution has promise, though, as it manages
> to eliminate *all* of the extraneous cruft and look almost
> like it's built into the language.
>
> Plus it has the bonus of making you go "...??? How the
> blazes does *that* work?" the first time you see it. :-)

Yeah, I was thinking that if it were added to stdlib, the current 
metaclass discussion in the reference should be augmented by referring 
to it as a non-toy example of metaclasses at work.

-- 
Terry Jan Reedy



From andrew at ei-grad.ru  Thu Jan 31 20:33:05 2013
From: andrew at ei-grad.ru (Andrew Grigorev)
Date: Thu, 31 Jan 2013 23:33:05 +0400
Subject: [Python-ideas] Definition Symmetry
In-Reply-To: <CAPAaX_g8uuU9jhnWD_G6Xc3Anw8Pjkeu7UO5R_hDbXQdArkpPA@mail.gmail.com>
References: <CAPAaX_g8uuU9jhnWD_G6Xc3Anw8Pjkeu7UO5R_hDbXQdArkpPA@mail.gmail.com>
Message-ID: <510AC6F1.1060503@ei-grad.ru>

Another strange thing is that the `raise` statement doesn't require you to 
instantiate an Exception object; it also accepts an Exception class.

raise NotImplementedError
raise NotImplementedError()

Is there any difference between these two lines of code?

And there is nothing about that fact in the Python docs (or have I just 
not found it?).

-- 
Andrew


31.01.2013 20:35, Jason Keene wrote:
> Why do function definitions require parens?
>
> >>> class MyClass:
> ...     pass
> ...
> >>> def my_func:
>   File "<stdin>", line 1
>     def my_func:
>                ^
> SyntaxError: invalid syntax
>
> This seems to me to break a symmetry with class definitions.  I assume 
> this is just a hold off from C, perhaps there is a non-historical 
> reason tho.
>
> I believe in the past we've forced parens in list comprehensions to 
> create a symmetry between comprehensions/generator expressions.  Why 
> not for this?
>
>


From ericsnowcurrently at gmail.com  Thu Jan 31 20:56:04 2013
From: ericsnowcurrently at gmail.com (Eric Snow)
Date: Thu, 31 Jan 2013 12:56:04 -0700
Subject: [Python-ideas] Extend module objects to support properties
In-Reply-To: <CADiSq7cj8xLPXa-YjE-H9DEzXRbgjhm_B40ELQW8QjfGT3+htA@mail.gmail.com>
References: <51087225.3040801@hastings.org>
	<CADiSq7fRZhD8kWoHfpxg7HKriVr6qET6QRvCLGirXYdsUJHEfA@mail.gmail.com>
	<51094D8D.606@hastings.org>
	<CADiSq7cj8xLPXa-YjE-H9DEzXRbgjhm_B40ELQW8QjfGT3+htA@mail.gmail.com>
Message-ID: <CALFfu7CTxbGQwdr9+oaChcvUD_Pd0c5S2hxEcQZSfTNPqjawXg@mail.gmail.com>

On Thu, Jan 31, 2013 at 1:56 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Looking at the problem from a different direction:
>
> Currently, modules are *instances* of a normal type
> (types.ModuleType). Thus, anything stored in their global namespace is
> like anything else stored in a normal instance dictionary: no
> descriptor behaviour.
>
> The request in this thread is basically for a way to:
>
> 1. Define a custom type
> 2. Put an instance of that type in sys.modules instead of the ordinary
> module object
>
> Now here's the thing: we already support this, because the import
> system is designed to cope with modules replacing
> "sys.modules[__name__]" while they're being loaded. The way this
> happens is that, after we finish loading a module, we usually don't
> trust what the loader gave us. Instead, we go look at what's in
> sys.modules under the name being loaded.
>
> So if, in your module code, you do this:
>
>     import sys, types
>     class MyPropertyUsingModule(types.ModuleType):
>         def __init__(self, original):
>             # Keep a reference to the original module to avoid the
>             # destructive cleanup of the global namespace
>             self._original = original
>
>         @property
>         def myglobal(self):
>             return theglobal
>
>         @myglobal.setter
>         def myglobal(self, value):
>             global theglobal
>             theglobal = value
>
>     sys.modules[__name__] = MyPropertyUsingModule(sys.modules[__name__])
>
> Then what you end up with in sys.modules is a module with a global
> property, "myglobal".
>
> I'd prefer to upgrade this from "begrudged backwards compatibility
> hack" to "supported feature", rather than doing anything more
> complicated.

+1

At this point I don't see this behavior of the import system changing,
even for Python 4.  Making it part of the spec is the best fit for
this class of problem (not-terribly-sophisticated solution for a
relatively uncommon case).  Otherwise we'd need a way to allow a
module definition (.py, etc.) to dictate which class to use, which
seems unnecessary and even overly complicated given the scale of the
target audience.
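A minimal, self-contained illustration of the trick Nick describes (the module and attribute names here are invented for the demo):

```python
import sys
import types

# A ModuleType subclass with a property, installed into sys.modules;
# a later import is satisfied straight from sys.modules, so attribute
# access goes through the descriptor.
class PropertyModule(types.ModuleType):
    @property
    def answer(self):
        return self._value

    @answer.setter
    def answer(self, value):
        self._value = value

mod = PropertyModule("demo_propmod")
mod.answer = 42
sys.modules["demo_propmod"] = mod

import demo_propmod  # found in sys.modules, no loader involved
print(demo_propmod.answer)  # 42
```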

That said, Larry's original proposal relates to sys, a built-in module
written in C (in CPython of course).  In that case the solution is not
quite the same, since module initialization interacts with sys.modules
differently. [1][2]  Accommodating the original request would require
more work, whether to muck with the import C-API or making sys an
instance of another type, as someone suggested.

-eric


[1] See http://mail.python.org/pipermail/python-dev/2012-November/122599.html
[2] http://bugs.python.org/msg174704


From timothy.c.delaney at gmail.com  Thu Jan 31 21:19:55 2013
From: timothy.c.delaney at gmail.com (Tim Delaney)
Date: Fri, 1 Feb 2013 07:19:55 +1100
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <CAN8CLg=e9DAqdpgd4=0chU=dubVEr66LtnWUwYdJb6O2O4Vbag@mail.gmail.com>
References: <CAH0mxTQ9wfcR_R5SbXV0J63iVaB1PvYvSF06tRy=AibO2Qajfg@mail.gmail.com>
	<CADiSq7cKrdgzBgENPYYXA5S-oCtT4Jxs5xnMwr750LUyKg+v6Q@mail.gmail.com>
	<CAF-Rda8B1Cw1vNZxmwuCP0AxHxhjdNQvoCWGqJsKmZsj4g0C3g@mail.gmail.com>
	<51085AAB.6090303@canterbury.ac.nz>
	<CAF-Rda9Oup0fmJapmAg9kFv=YaGwjph2ZyMXrMB3Q0bEhWz2eg@mail.gmail.com>
	<CAP7+vJLT+KxjqNT6=c6ZYmBWw55U_88W8MFqW+_3OH7AqTVw4Q@mail.gmail.com>
	<5108A87D.9000207@canterbury.ac.nz>
	<20130130082639.0b28d7eb@pitrou.net>
	<CAKCKLWyegkHo15dcVvhC23WaY=3Mnr6P-JT5M_Lm6UKCMahUhg@mail.gmail.com>
	<CAKCKLWxTt0kwjc4dpRJUz27_0bt+u+AHkEqzTJEoZNC7rshKkQ@mail.gmail.com>
	<kec3h1$3rv$1@ger.gmane.org>
	<CAN8CLg=e9DAqdpgd4=0chU=dubVEr66LtnWUwYdJb6O2O4Vbag@mail.gmail.com>
Message-ID: <CAN8CLgnNB9iOY8EX9ouJ+Jq_t1BP0BRwWR-_xQgVgL4ivZ4F1w@mail.gmail.com>

On 31 January 2013 12:27, Tim Delaney <timothy.c.delaney at gmail.com> wrote:

> On 31 January 2013 08:32, Terry Reedy <tjreedy at udel.edu> wrote:
>
>> On 1/30/2013 10:30 AM, Michael Foord wrote:
>>
>>> On 30 January 2013 15:22, Michael Foord
>>>
>>
>>      With a Python 3 metaclass that provides default values for *looked
>>>     up* entries you could have this:
>>>
>>>     class Color(Enum):
>>>          RED, WHITE, BLUE
>>>
>>>     The lookup would create the member - with the appropriate value.
>>>
>>> class values(dict):
>>>      def __init__(self):
>>>          self.value = 0
>>>      def __getitem__(self, key):
>>>
>>
>>
>> So RED, WHITE, BLUE are 1, 2, 3; not 0, 1, 2 as I and many readers might
>> expect. That aside (which can be fixed), this is very nice.
>>
>
> Here is a version that I think creates an enum with most of the features
> of traditional and modern enums.
>
> - Enum values are subclasses of int;
>
> - Only need to declare the enum key name;
>
> - Starts at zero by default;
>
> - Can change the start value;
>
> - Can have discontiguous values (e.g. 0, 1, 5, 6);
>
> - Can have other types of class attributes;
>
> - Ensures that there is a 1:1 mapping between key:value (throws an
> exception if either of these is violated;
>
> - Able to obtain the keys, values and items as per the mapping interface
> (sorted by value);
>
> - Lookup an enum by key or value;
>
> One thing to note is that *any* class attribute assigned a value which
> implements __index__ will be considered an enum value assignment.
>

Forgot about making it iterable - an easy-to-add feature. Obviously it would
iterate over the EnumValue instances.

Thought I'd better make it explicit as well that this was based on Michael
Foord's brilliant work.

Tim Delaney
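[Editor's note: for readers without the scrubbed attachment, the core of the lookup trick Tim built on can be sketched as follows. This is a reconstruction, not either author's actual code, and it counts from 0 per Terry's comment.]

```python
# __prepare__ returns a namespace whose failed lookups mint sequential
# enum values, so bare names in the class body become enum members.
class EnumDict(dict):
    def __init__(self):
        super().__init__()
        self._next = 0

    def __missing__(self, key):
        if key.startswith('_'):
            raise KeyError(key)  # leave dunder machinery alone
        value = self._next
        self._next += 1
        self[key] = value
        return value

class EnumMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases):
        return EnumDict()

    def __new__(mcls, name, bases, ns):
        return super().__new__(mcls, name, bases, dict(ns))

class Enum(metaclass=EnumMeta):
    pass

class Color(Enum):
    RED, WHITE, BLUE

print(Color.RED, Color.WHITE, Color.BLUE)  # 0 1 2
```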

From barry at python.org  Thu Jan 31 21:46:30 2013
From: barry at python.org (Barry Warsaw)
Date: Thu, 31 Jan 2013 15:46:30 -0500
Subject: [Python-ideas] constant/enum type in stdlib
References: <5108A2F1.5010006@canterbury.ac.nz>
	<20130130221926.GA20372@cskk.homeip.net>
Message-ID: <20130131154630.23903b07@anarchist.wooz.org>

On Jan 31, 2013, at 09:19 AM, Cameron Simpson wrote:

>  Color = enum(RED=None, WHITE=None, BLUE=None, yellow=9)

Oh, I forgot to mention that flufl.enum has an alternative API that's fairly
close to this, although it does not completely eliminate DRY[1]:

>>> from flufl.enum import make
>>> make('Animals', ('ant', 'bee', 'cat', 'dog'))
<Animals {ant: 1, bee: 2, cat: 3, dog: 4}>

You can also supply the elements as a 2-tuples if you want to specify the
values.  An example from the docs providing bit flags:

>>> def enumiter():
...     start = 1
...     while True:
...         yield start
...         start <<= 1
>>> make('Flags', zip(list('abcdefg'), enumiter()))
<Flags {a: 1, b: 2, c: 4, d: 8, e: 16, f: 32, g: 64}>

Cheers,
-Barry

[1] The first argument is currently necessary in order to give the right
printed representation of the enum.

From barry at python.org  Thu Jan 31 22:00:32 2013
From: barry at python.org (Barry Warsaw)
Date: Thu, 31 Jan 2013 16:00:32 -0500
Subject: [Python-ideas] constant/enum type in stdlib
References: <CAH0mxTQ9wfcR_R5SbXV0J63iVaB1PvYvSF06tRy=AibO2Qajfg@mail.gmail.com>
	<20130129202730.6ea6d0d5@anarchist.wooz.org>
	<CAKCKLWxddOp-0k3b9Gdeg+WfNhJOnTPtJh50D62sVSDpf-zcaw@mail.gmail.com>
	<20130130103548.12bce67d@anarchist.wooz.org>
	<CAF-Rda-6vDo4tG5+ByASSDUm9Z=em4KjLL6CWsfwgO2hF9X+fg@mail.gmail.com>
	<20130130112707.5cf60dfc@anarchist.wooz.org>
	<CAF-Rda9V1mv0UWuFCS17eKW+mL6tOHpycwtR=foGJ2Dzujy=pQ@mail.gmail.com>
	<CAFYqXL9vvGZA+f+WCVxEoFO5BviGS8dSybRkoger_+0jYcv2OQ@mail.gmail.com>
Message-ID: <20130131160032.63baef0a@anarchist.wooz.org>

I'll agree that enums are subject to personal taste, and I am opinionated
about the syntax and semantics, as should be evident in my library :).

On Jan 30, 2013, at 09:13 PM, Giampaolo Rodol? wrote:

>1) a const/enum type looks like something which is subject to personal
>taste to me. I personally don't like, for example, how flufl requires
>to define constants by using a class.

In practice, I find this quite nice.  In my larger projects, I define the
enum class in the interface module and often intersperse comments among the
enum values so that more documentation is provided to the reader.

>It's just a matter of taste but to me module.FOO looks more "right"
>than module.Bar.FOO.

I almost always 'from module import MyEnum' so typical use looks something
like:

    if thing.color is Color.red:
        ...
    elif thing.color is Color.blue:
        ...

Again, in practice, I find it quite readable and just the right level of
verbosity.

>Also "Colors.red < Colors.blue" raising an exception is something
>subject to personal taste.

I guess, if you like blue more than red, but what if you like red more than
blue? :)

Ordered enums just don't usually make sense, and if they really did, you could
coerce to int to do the comparison (but again, I've never needed it, so YAGNI).

>2) introducing something like that (class-based) wouldn't help
>migrating the existent module-level constants we have in the stdlib.
>Only new projects or new stdlib modules would benefit from it.

Sure, but I don't think this is necessarily about converting the stdlib.  We
rarely do such mass conversions anyway.

>3) other than being subject to personal taste, a const/enum type is
>also pretty easy to implement.

True, depending on the semantics, syntax, and feature you want.

>4) I'm getting the impression that the language is growing too big. To
>me, this looks like yet another thing that infrequent users have to
>learn before being able to read and understand Python code.
>Also consider that people lived without const/enum for 2 decades now.

Well, I would agree that the *language* doesn't need them, but that's
different than the stdlib.  Maybe the stdlib still doesn't need them either.
I don't personally care either way except to save me the trouble of writing up
another PEP. :)

As for the language growing too big, maybe Pycon 2013 is time for another one
of Guido's infamous polls!

Cheers,
-Barry

From barry at python.org  Thu Jan 31 22:06:57 2013
From: barry at python.org (Barry Warsaw)
Date: Thu, 31 Jan 2013 16:06:57 -0500
Subject: [Python-ideas] Definition Symmetry
References: <CAPAaX_g8uuU9jhnWD_G6Xc3Anw8Pjkeu7UO5R_hDbXQdArkpPA@mail.gmail.com>
	<510AC6F1.1060503@ei-grad.ru>
Message-ID: <20130131160657.4620f918@anarchist.wooz.org>

On Jan 31, 2013, at 11:33 PM, Andrew Grigorev wrote:

>Other strange thing is that the `raise` statement doesn't require to
>instantiate an Exception object, allowing to pass an Exception class to it.
>
>raise NotImplementedError
>raise NotImplementedError()
>
>Is there any difference between this two lines of code?

The main difference (I *think* this is still true) is that in the first
example, if the exception is caught in C it can avoid instantiation.
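At the Python level the two spellings are observably equivalent: raising the class instantiates it with no arguments at raise time.

```python
# Raising the class and raising an instance both deliver an instance
# of NotImplementedError to the except block.
try:
    raise NotImplementedError
except NotImplementedError as exc:
    print(type(exc).__name__)  # NotImplementedError

try:
    raise NotImplementedError("explicit instance")
except NotImplementedError as exc:
    print(exc)  # explicit instance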

Cheers,
-Barry

From greg.ewing at canterbury.ac.nz  Thu Jan 31 22:31:32 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 01 Feb 2013 10:31:32 +1300
Subject: [Python-ideas] Definition Symmetry
In-Reply-To: <CAPAaX_gDsVaxXNOaCJTfF7zwwhLMvgtXcnYO+uxoThxUp4gacA@mail.gmail.com>
References: <CAPAaX_g8uuU9jhnWD_G6Xc3Anw8Pjkeu7UO5R_hDbXQdArkpPA@mail.gmail.com>
	<CAPAaX_gDsVaxXNOaCJTfF7zwwhLMvgtXcnYO+uxoThxUp4gacA@mail.gmail.com>
Message-ID: <510AE2B4.8070202@canterbury.ac.nz>

Jason Keene wrote:
> Just to be clear, I wasn't suggesting forcing parens for class 
> definitions.  Rather make them optional for functions!

That would introduce an asymmetry between function definitions
and function calls -- parens would be required in the call but
not the definition.

And before you say that this asymmetry currently exists between
class definitions and class instantiations, it's not the same
situation. What goes between the parens in a class definition
is the base classes, not the arguments to the constructor.
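To spell out the distinction with a small example (mine, not from the
thread) -- the names in the definition's parens and the arguments in the
instantiation's parens go to entirely different places:

```python
class Something:
    pass

class Thing(Something):            # parens here name the base class
    def __init__(self, value):     # constructor args are a separate matter
        self.value = value

thing = Thing(10)                  # 10 goes to __init__, not to Something
assert isinstance(thing, Something)
assert thing.value == 10
```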

-- 
Greg


From rosuav at gmail.com  Thu Jan 31 22:43:51 2013
From: rosuav at gmail.com (Chris Angelico)
Date: Fri, 1 Feb 2013 08:43:51 +1100
Subject: [Python-ideas] Definition Symmetry
In-Reply-To: <510AB3EB.9020806@nedbatchelder.com>
References: <CAPAaX_g8uuU9jhnWD_G6Xc3Anw8Pjkeu7UO5R_hDbXQdArkpPA@mail.gmail.com>
	<510AA69D.1060300@mrabarnett.plus.com>
	<510AB3EB.9020806@nedbatchelder.com>
Message-ID: <CAPTjJmpMd04baDfiyLPpUKSs64HtCbjfDu4VcoiV-8zyhiLZZw@mail.gmail.com>

On Fri, Feb 1, 2013 at 5:11 AM, Ned Batchelder <ned at nedbatchelder.com> wrote:
> I think parens for superclasses are an unfortunate syntax, since they look
> just like arguments to the class and are confusing for some beginners:
>
>     def function(arg):
>         ...
>     function(10)            # Similar syntax: 10 corresponds to arg
>
>     class Thing(Something):
>         ...
>     thing = Thing(10)    # How does 10 relate to Something? It doesn't.
>
> A better syntax (which I AM NOT PROPOSING) would be:
>
>     class Thing from Something:

What about

class Thing = Something:
  pass

I am not proposing this either, but it would emphasize the difference
between superclasses and __init__ args.

But really, parens are used in many different ways. There doesn't need
to be a logical parallel between generator expressions and function
calls, for instance.
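To illustrate (my example): the very same pair of parens can be a call or
a generator expression depending on context, and nobody expects those two
roles to line up:

```python
# Inside a call, a genexp needs no extra parens of its own:
total = sum(x * x for x in range(5))    # 0 + 1 + 4 + 9 + 16
assert total == 30

# Standing alone, the same parens *are* the generator expression:
gen = (x * x for x in range(5))
assert list(gen) == [0, 1, 4, 9, 16]
```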

ChrisA


From rosuav at gmail.com  Thu Jan 31 23:00:14 2013
From: rosuav at gmail.com (Chris Angelico)
Date: Fri, 1 Feb 2013 09:00:14 +1100
Subject: [Python-ideas] Extend module objects to support properties
In-Reply-To: <5109CE95.7060104@hastings.org>
References: <51087225.3040801@hastings.org>
	<CADiSq7fRZhD8kWoHfpxg7HKriVr6qET6QRvCLGirXYdsUJHEfA@mail.gmail.com>
	<51094D8D.606@hastings.org> <5109BEC4.4050604@pearwood.info>
	<5109CE95.7060104@hastings.org>
Message-ID: <CAPTjJmrT_kvEN2L5g=nWNCdE+ktaxhK5SjYwE1OcWBQgdQVQiw@mail.gmail.com>

On Thu, Jan 31, 2013 at 12:53 PM, Larry Hastings <larry at hastings.org> wrote:
> But if the user assigns a different value to math.__dict__['pi'], math.pi
> will diverge, which again could break code.  (Who might try to assign a
> different value to pi?  The 1897 House of Representatives of Indiana for
> one!)
>
>
> More generally, it's often useful to monkeypatch "constants" at runtime, for
> testing purposes (and for less justifiable purposes).  Why prevent that?  I
> cite the Consenting Adults rule.

I've never actually been in the situation of doing it, but wouldn't it
be reasonable to switch out math.pi to be (say) a decimal.Decimal
rather than a float?
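For what it's worth, here's roughly what that monkeypatch would look like
(just a sketch, restoring the original afterwards -- the digits below are
pi to 36 places):

```python
import math
from decimal import Decimal

# Consenting-adults monkeypatch: swap math.pi for a higher-precision value.
original_pi = math.pi
math.pi = Decimal("3.14159265358979323846264338327950288")
try:
    assert isinstance(math.pi, Decimal)
finally:
    math.pi = original_pi          # put the float back
assert math.pi == original_pi
```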

ChrisA


From ethan at stoneleaf.us  Thu Jan 31 23:13:38 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 31 Jan 2013 14:13:38 -0800
Subject: [Python-ideas] Extend module objects to support properties
In-Reply-To: <5109FB5D.2090109@hastings.org>
References: <51087225.3040801@hastings.org>
	<CADiSq7fRZhD8kWoHfpxg7HKriVr6qET6QRvCLGirXYdsUJHEfA@mail.gmail.com>
	<51094D8D.606@hastings.org> <5109BEC4.4050604@pearwood.info>
	<5109CE95.7060104@hastings.org> <5109F19A.3060902@stoneleaf.us>
	<5109FB5D.2090109@hastings.org>
Message-ID: <510AEC92.1060809@stoneleaf.us>

On 01/30/2013 09:04 PM, Larry Hastings wrote:
> On 01/30/2013 08:22 PM, Ethan Furman wrote:
>> On 01/30/2013 05:53 PM, Larry Hastings wrote:
>>> If we change math.pi to be a property it wouldn't be in the dict
>>> anymore.  So that has the possibility of breaking code.
>> So make the property access the __dict__:
>
> In which case, it behaves exactly like it does today without a
> property.  Okay... so why bother?  If your answer is "so it can have
> code behind it", maybe you find a better example than math.pi, which
> will never need code behind it.

math.pi wasn't my example; I was just showing how you could use the
__dict__ as well.

Why bother?  Backwards compatibility.

I think I missed your main point about __dict__ access, though -- if it is
set directly, the property never gets a chance to update whatever is
supposed to be updated at the right moment, leading to weird (and most
likely buggy) behavior.
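A toy demonstration of that bypass (using a class attribute rather than a
module one, but the mechanism is the same -- writing to __dict__ directly
skips the property's setter and whatever bookkeeping it does):

```python
class Box:
    def __init__(self):
        self._log = []

    @property
    def value(self):
        return self.__dict__.get('value')

    @value.setter
    def value(self, v):
        self._log.append(v)          # bookkeeping the property relies on
        self.__dict__['value'] = v

b = Box()
b.value = 1                          # goes through the setter: logged
b.__dict__['value'] = 2              # bypasses the setter: not logged
assert b.value == 2                  # the getter still sees the new value
assert b._log == [1]                 # ...but the second write was never seen
```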

~Ethan~