PEP 3148 ready for pronouncement
The PEP is here: http://www.python.org/dev/peps/pep-3148/

I think the PEP is ready for pronouncement, and the code is pretty much ready for submission into py3k (I will have to make some minor changes in the patch like changing the copyright assignment): http://code.google.com/p/pythonfutures/source/browse/#svn/branches/feedback/python3/futures%3Fstate%3Dclosed

The tests are here and pass on W2K, Mac OS X and Linux: http://code.google.com/p/pythonfutures/source/browse/branches/feedback/pytho...

The docs (which also need some minor changes) are here: http://code.google.com/p/pythonfutures/source/browse/branches/feedback/docs/...

Cheers, Brian
On 2010-05-21, Brian Quinlan wrote:
The PEP is here: http://www.python.org/dev/peps/pep-3148/ [snip]
Hi Brian,

Could I suggest a small, subtle change in naming: replace "executor" with "executer"? I guess this suggestion is doomed though since Java uses executor:-(

I'd also be tempted to rename "submit()" to "apply()" in view of Python's history.

Also, maybe change "done()" to "finished()" since the function returns True if the call was cancelled (so the job can't have been "done"), as well as if the call was finished. Actually, having read further, maybe the best name would be "completed()" since that's a term used throughout.

Perhaps call the "not_finished" set "pending" since presumably these are still in progress? (My understanding is that if they were cancelled or finished they'd be in the "finished" set. I'd also rename "finished" to "completed" if you have a "completed()" method.)

I think FIRST_COMPLETED is misleading since it implies (to me anyway) the first one passed. How about ONE_COMPLETED; and similarly ONE_EXCEPTION?

I think it would be helpful to clarify whether the timeout value (which you specify as being in seconds) can meaningfully accept a float, e.g., 0.5?

Anyway, it looks like it will be a really nice addition to the standard library:-)

-- Mark Summerfield, Qtrac Ltd, www.qtrac.eu C++, Python, Qt, PyQt - training and consultancy "C++ GUI Programming with Qt 4" - ISBN 0132354160
Hey Mark, This really isn't the time to propose changes. The PEP has been discussed extensively on stdlib-sig and python-dev. On May 21, 2010, at 9:29 PM, Mark Summerfield wrote:
On 2010-05-21, Brian Quinlan wrote:
The PEP is here: http://www.python.org/dev/peps/pep-3148/ [snip]
Hi Brian,
Could I suggest a small, subtle change in naming: replace "executor" with "executer"? I guess this suggestion is doomed though since Java uses executor:-(
I'd also be tempted to rename "submit()" to "apply()" in view of Python's history.
Also, maybe change "done()" to "finished()" since the function returns True if the call was cancelled (so the job can't have been "done"), as well as if the call was finished. Actually, having read further, maybe the best name would be "completed()" since that's a term used throughout.
Perhaps call the "not_finished" set "pending" since presumably these are still in progress? (My understanding is that if they were cancelled or finished they'd be in the "finished" set. I'd also rename "finished" to "completed" if you have a "completed()" method.)
I think FIRST_COMPLETED is misleading since it implies (to me anyway) the first one passed. How about ONE_COMPLETED; and similarly ONE_EXCEPTION?
I think it would be helpful to clarify whether the timeout value (which you specify as being in seconds) can meaningfully accept a float, e.g., 0.5?
I've updated the docs to clarify that float args are acceptable. Cheers, Brian
Anyway, it looks like it will be a really nice addition to the standard library:-)
-- Mark Summerfield, Qtrac Ltd, www.qtrac.eu C++, Python, Qt, PyQt - training and consultancy "C++ GUI Programming with Qt 4" - ISBN 0132354160
Brian Quinlan wrote:
The PEP is here: http://www.python.org/dev/peps/pep-3148/
I think the PEP is ready for pronouncement, and the code is pretty much ready for submission into py3k (I will have to make some minor changes in the patch like changing the copyright assignment): http://code.google.com/p/pythonfutures/source/browse/#svn/branches/feedback/...
Your example here:

    for number, is_prime in zip(PRIMES, executor.map(is_prime, PRIMES)):
        print('%d is prime: %s' % (number, is_prime))

Overwrites the 'is_prime' function with the return value of the function. Probably better to use a different variable name. John =:->
On May 21, 2010, at 9:44 PM, John Arbash Meinel wrote:
Brian Quinlan wrote:
The PEP is here: http://www.python.org/dev/peps/pep-3148/
I think the PEP is ready for pronouncement, and the code is pretty much ready for submission into py3k (I will have to make some minor changes in the patch like changing the copyright assignment): http://code.google.com/p/pythonfutures/source/browse/#svn/branches/feedback/python3/futures%3Fstate%3Dclosed
Your example here:

    for number, is_prime in zip(PRIMES, executor.map(is_prime, PRIMES)):
        print('%d is prime: %s' % (number, is_prime))
Overwrites the 'is_prime' function with the return value of the function. Probably better to use a different variable name.
Good catch. I've updated the example. Cheers, Brian
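For reference, a minimal sketch of what the corrected loop might look like, with the loop variable renamed so it no longer shadows the is_prime function (the import name and PRIMES list follow the PEP's example; the primality test here is a simplified stand-in):

    import math
    import futures  # the proposed package; concurrent.futures in the eventual stdlib

    PRIMES = [112272535095293, 112582705942171, 115280095190773, 115797848077099]

    def is_prime(n):
        # Simplified trial-division test, standing in for the PEP's version.
        if n < 2:
            return False
        for i in range(2, int(math.sqrt(n)) + 1):
            if n % i == 0:
                return False
        return True

    def main():
        with futures.ProcessPoolExecutor() as executor:
            # 'prime' rather than 'is_prime', so the function isn't overwritten.
            for number, prime in zip(PRIMES, executor.map(is_prime, PRIMES)):
                print('%d is prime: %s' % (number, prime))

    if __name__ == '__main__':
        main()  # the guard matters for ProcessPoolExecutor on Windows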
Brian Quinlan wrote:
The PEP is here: http://www.python.org/dev/peps/pep-3148/
I think the PEP is ready for pronouncement, and the code is pretty much ready for submission into py3k (I will have to make some minor changes in the patch like changing the copyright assignment): http://code.google.com/p/pythonfutures/source/browse/#svn/branches/feedback/...
The tests are here and pass on W2K, Mac OS X and Linux: http://code.google.com/p/pythonfutures/source/browse/branches/feedback/pytho...
The docs (which also need some minor changes) are here: http://code.google.com/p/pythonfutures/source/browse/branches/feedback/docs/...
Cheers, Brian
I also just noticed that your example uses:

    zip(PRIMES, executor.map(is_prime, PRIMES))

But your doc explicitly says:

    map(func, *iterables, timeout=None)

    Equivalent to map(func, *iterables) but executed asynchronously and possibly out-of-order.

So it isn't safe to zip() against something that can return out of order.

Which opens up a discussion about how these things should be used. Given that your other example uses a dict to get back to the original arguments, and this example uses zip() [incorrectly], it seems that the Futures object should have the arguments easily accessible. It certainly seems like a common use case that if things are going to be returned in arbitrary order, you'll want an easy way to distinguish which one you have. Having to write a dict map before each call can be done, but seems suboptimal.

John =:->
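For context, the dict idiom John refers to looks roughly like this (a sketch loosely modelled on the PEP's URL-loading example; the load_url function, URLS list and worker count are placeholders, not quotes from the PEP):

    import futures  # the proposed package; concurrent.futures in the eventual stdlib

    URLS = ['http://www.example.com/', 'http://www.python.org/']  # placeholders

    def load_url(url, timeout):
        # Placeholder; a real version would fetch the URL with the given timeout.
        return 'contents of %s' % url

    with futures.ThreadPoolExecutor(max_workers=5) as executor:
        # Key each future by the argument it was made from, so that results
        # arriving in completion order can still be matched to their inputs.
        future_to_url = dict((executor.submit(load_url, url, 30), url)
                             for url in URLS)
        for future in futures.as_completed(future_to_url):
            url = future_to_url[future]
            print('%s returned %d bytes' % (url, len(future.result())))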
On May 21, 2010, at 9:47 PM, John Arbash Meinel wrote:
Brian Quinlan wrote:
The PEP is here: http://www.python.org/dev/peps/pep-3148/
I think the PEP is ready for pronouncement, and the code is pretty much ready for submission into py3k (I will have to make some minor changes in the patch like changing the copyright assignment): http://code.google.com/p/pythonfutures/source/browse/#svn/branches/ feedback/python3/futures%3Fstate%3Dclosed
The tests are here and pass on W2K, Mac OS X and Linux: http://code.google.com/p/pythonfutures/source/browse/branches/feedback/pytho...
The docs (which also need some minor changes) are here: http://code.google.com/p/pythonfutures/source/browse/branches/feedback/docs/...
Cheers, Brian
I also just noticed that your example uses: zip(PRIMES, executor.map(is_prime, PRIMES))
But your doc explicitly says: map(func, *iterables, timeout=None)
Equivalent to map(func, *iterables) but executed asynchronously and possibly out-of-order.
So it isn't safe to zip() against something that can return out of order.
The docs don't say that the return value can be out-of-order, just that execution can be out-of-order. But I agree that the phrasing is confusing so I've changed it to: Equivalent to ``map(func, *iterables)`` but *func* is executed asynchronously and several calls to *func* may be made concurrently.
Which opens up a discussion about how these things should be used.
Except that now isn't the time for that discussion. This PEP has been discussed on-and-off for several months on both stdlib-sig and python-dev. If you think that storing the args (e.g. with the future) is a good idea then you can propose a patch after the PEP is integrated (if it is rejected then it probably isn't worth discussing ;-)). Cheers, Brian
Given that your other example uses a dict to get back to the original arguments, and this example uses zip() [incorrectly], it seems that the Futures object should have the arguments easily accessible. It certainly seems like a common use case that if things are going to be returned in arbitrary order, you'll want an easy way to distinguish which one you have. Having to write a dict map before each call can be done, but seems suboptimal.
John =:->
On Fri, May 21, 2010 at 8:26 AM, Brian Quinlan <brian@sweetapp.com> wrote:
Except that now isn't the time for that discussion. This PEP has been discussed on-and-off for several months on both stdlib-sig and python-dev.
I think any time till the PEP is accepted is a good time to discuss changes to the API.

Issues with the PEP:
1) Examples as written fail on Windows. Patch to fix @ http://code.google.com/p/pythonfutures/issues/detail?id=5

Issues with Implementation:
1) Globals are used for tracking running threads (but not processes) and shutdown state, but are not accessed as globals everywhere they are modified, so the state could be inconsistent.
2) The atexit handler randomly throws an exception on exit on Windows when running the tests or examples:

    Error in atexit._run_exitfuncs:
    TypeError: print_exception(): Exception expected for value, str found

Issues 1 & 2 would be solved by moving thread tracking back into the executor responsible for the threads, or making a singleton that tracked threads / processes for all executors. http://code.google.com/p/pythonfutures/issues/detail?id=6 is one such implementation
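A rough sketch of the per-executor tracking being suggested (all names here are hypothetical; this is one possible shape for the fix in issue 6, not the pythonfutures code):

    import atexit
    import threading

    class ThreadPoolExecutor:          # hypothetical, heavily abbreviated
        def __init__(self, max_workers):
            self._max_workers = max_workers
            self._threads = set()       # per-instance state, not module-level globals
            self._shutdown = False
            self._shutdown_lock = threading.Lock()
            atexit.register(self._atexit)

        def _add_worker(self, target):
            t = threading.Thread(target=target)
            self._threads.add(t)
            t.start()

        def _atexit(self):
            # Each executor cleans up only its own workers, so interpreter
            # shutdown no longer depends on shared module state.
            with self._shutdown_lock:
                self._shutdown = True
            for t in self._threads:
                t.join()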
On May 22, 2010, at 5:30 AM, Dj Gilcrease wrote:
On Fri, May 21, 2010 at 8:26 AM, Brian Quinlan <brian@sweetapp.com> wrote:
Except that now isn't the time for that discussion. This PEP has discussed on-and-off for several months on both stdlib-sig and python-dev.
I think any time till the PEP is accepted is a good time to discuss changes to the API
I disagree. If a PEP is being updated continuously then there is nothing stable to pronounce on.
Issues with the PEP: 1) Examples as written fail on windows. Patch to fix @ http://code.google.com/p/pythonfutures/issues/detail?id=5
Updated, thanks!
Issues with Implementation: 1) Globals are used for tracking running threads (but not processes) and shutdown state, but are not accessed as globals everywhere they are modified so it could be inconsistent.
2) The atexit handler randomly throws an exception on exit on Windows when running the tests or examples.
Error in atexit._run_exitfuncs: TypeError: print_exception(): Exception expected for value, str found
Lets take this off-list. Cheers, Brian
Issues 1 & 2 would be solved by moving thread tracking back into the executor responsible for the threads, or making a singleton that tracked threads / processes for all executors. http://code.google.com/p/pythonfutures/issues/detail?id=6 is one such implementation
On Sat, 22 May 2010 19:12:05 +1000, Brian Quinlan <brian@sweetapp.com> wrote:
On May 22, 2010, at 5:30 AM, Dj Gilcrease wrote:
On Fri, May 21, 2010 at 8:26 AM, Brian Quinlan <brian@sweetapp.com> wrote:
Except that now isn't the time for that discussion. This PEP has discussed on-and-off for several months on both stdlib-sig and python-dev.
I think any time till the PEP is accepted is a good time to discuss changes to the API
I disagree. If a PEP is being updated continuously then there is nothing stable to pronounce on.
Well, you've been making updates as a result of this round of discussion. If there is still discussion then perhaps the PEP isn't ready for pronouncement yet. At some point someone can decide it is all bikeshedding and ask for pronouncement on that basis, but I don't think it is appropriate to cut off discussion by saying "it's ready for pronouncement" unless you want to increase the chances of its getting rejected. The usual way of doing this (at least so far as I have observed, which granted hasn't been too many cases) is to say something like "I think this PEP is ready for pronouncement" and then wait for feedback on that assertion or for the pronouncement. It's especially good if you can answer any concerns that are raised with "that was discussed already and we concluded X". Bonus points for finding a thread reference and adding it to the PEP :) -- R. David Murray www.bitdance.com
On Sat, May 22, 2010 at 9:59 AM, R. David Murray <rdmurray@bitdance.com> wrote:
On Sat, 22 May 2010 19:12:05 +1000, Brian Quinlan <brian@sweetapp.com> wrote:
On May 22, 2010, at 5:30 AM, Dj Gilcrease wrote:
On Fri, May 21, 2010 at 8:26 AM, Brian Quinlan <brian@sweetapp.com> wrote:
Except that now isn't the time for that discussion. This PEP has discussed on-and-off for several months on both stdlib-sig and python-dev.
I think any time till the PEP is accepted is a good time to discuss changes to the API
I disagree. If a PEP is being updated continuously then there is nothing stable to pronounce on.
Well, you've been making updates as a result of this round of discussion.
If there is still discussion then perhaps the PEP isn't ready for pronouncement yet. At some point someone can decide it is all bikeshedding and ask for pronouncement on that basis, but I don't think it is appropriate to cut off discussion by saying "it's ready for pronouncement" unless you want to increase the chances of its getting rejected.
I commiserate with Brian here - he's been very patient, and has been working on things, taking in input, etc. for a while now on this. In his mind, it is done (or at least incredibly close to done) and opening the door in the conversation for more API nitpicking and debate about the exact verbiage on method names means we're never going to be done splashing paint.
The usual way of doing this (at least so far as I have observed, which granted hasn't been too many cases) is to say something like "I think this PEP is ready for pronouncement" and then wait for feedback on that assertion or for the pronouncement. It's especially good if you can answer any concerns that are raised with "that was discussed already and we concluded X". Bonus points for finding a thread reference and adding it to the PEP :)
While sure, this is true - I'd actually back Brian up on trying to avoid more "why didn't you call it a banana" style discussions. At some point the constant back and forth has to stop, and to his credit, Brian has made a lot of changes, listened to a lot of feedback, etc. It's fair for him to just ask that a decision be made. jesse
On Sat, May 22, 2010 at 7:09 AM, Jesse Noller <jnoller@gmail.com> wrote:
On Sat, May 22, 2010 at 9:59 AM, R. David Murray <rdmurray@bitdance.com> wrote:
On Sat, 22 May 2010 19:12:05 +1000, Brian Quinlan <brian@sweetapp.com> wrote:
On May 22, 2010, at 5:30 AM, Dj Gilcrease wrote:
On Fri, May 21, 2010 at 8:26 AM, Brian Quinlan <brian@sweetapp.com> wrote:
Except that now isn't the time for that discussion. This PEP has discussed on-and-off for several months on both stdlib-sig and python-dev.
I think any time till the PEP is accepted is a good time to discuss changes to the API
I disagree. If a PEP is being updated continuously then there is nothing stable to pronounce on.
Well, you've been making updates as a result of this round of discussion.
If there is still discussion then perhaps the PEP isn't ready for pronouncement yet. At some point someone can decide it is all bikeshedding and ask for pronouncement on that basis, but I don't think it is appropriate to cut off discussion by saying "it's ready for pronouncement" unless you want to increase the chances of its getting rejected.
I commiserate with Brian here - he's been very patient, and has been working on things, taking in input, etc. for a while now on this. In his mind, it is done (or at least incredibly close to done) and opening the door in the conversation for more API nitpicking and debate about the exact verbiage on method names means we're never going to be done splashing paint.
The usual way of doing this (at least so far as I have observed, which granted hasn't been too many cases) is to say something like "I think this PEP is ready for pronouncement" and then wait for feedback on that assertion or for the pronouncement. It's especially good if you can answer any concerns that are raised with "that was discussed already and we concluded X". Bonus points for finding a thread reference and adding it to the PEP :)
While sure, this is true - I'd actually back Brian up on trying to avoid more "why didn't you call it a banana" style discussions. At some point the constant back and forth has to stop, and to his credit, Brian has made a lot of changes, listened to a lot of feedback, etc. It's fair for him to just ask that a decision be made.
Great points Jesse! Since I really don't have the time or expertise to make a judgment on this PEP, I hereby appoint you chair of the approval process for this PEP. That basically means that when you think it's ready to be approved, you say so, and it's a done deal. The remaining feedback cycle is up to you now -- it sounds like you're ready for closure, which sounds good to me (again, without having read the PEP or tried to write something using the proposed code). You can do it however you like: you can declare it approved now, or read it over once more yourself and suggest some final changes, or set a period (e.g. 48 hours) during which final comments have to be received, which you then will judge by merit or by your whim, or you can flip a coin or say a prayer... (I've tried most of those myself in the past and haven't done too badly if I say so myself. :-) You're the boss now. I know you will do the right thing for this PEP. -- --Guido van Rossum (python.org/~guido)
Hey all, Jesse, the designated pronouncer for this PEP, has decided to keep discussion open for a few more days. So fire away! Cheers, Brian
On Sat, May 22, 2010 at 8:47 PM, Brian Quinlan <brian@sweetapp.com> wrote:
Hey all,
Jesse, the designated pronouncer for this PEP, has decided to keep discussion open for a few more days.
So fire away!
Man, everyone's faster on the email thing lately than me :) Yes, I spoke to Brian, and since we're not in a rush - please do bring up serious issues you might have, assume for the moment that the names, unless they're "def lol_python", are going to stay pretty consistent unless everyone breaks out the pitchforks, and also assume the API won't see much changing. So please, do bring up issues. I'm obviously biased towards accepting it - however, nothing is ever set in stone. jesse
On May 22, 2010, at 8:47 PM, Brian Quinlan wrote:
Jesse, the designated pronouncer for this PEP, has decided to keep discussion open for a few more days.
So fire away!
As you wish!

The PEP should be consistent in its usage of terminology about callables. It alternately calls them "callables", "functions", and "functions or methods". It would be nice to clean this up and be consistent about what can be called where. I personally like "callables".

The execution context of callable code is not made clear. Implicitly, submit() or map() would run the code in threads or processes as defined by the executor, but that's not spelled out clearly.

More relevant to my own interests, the execution context of the callables passed to add_done_callback and remove_done_callback is left almost completely to the imagination. If I'm reading the sample implementation correctly, <http://code.google.com/p/pythonfutures/source/browse/branches/feedback/python3/futures/process.py#241>, it looks like in the multiprocessing implementation, the done callbacks are invoked in a random local thread. The fact that they are passed the future itself *sort* of implies that this is the case, but the multiprocessing module plays fast and loose with object identity all over the place, so it would be good to be explicit and say that it's *not* a pickled copy of the future sitting in some arbitrary process (or even on some arbitrary machine).

This is really minor, I know, but why does it say "NOTE: This method can be used to create adapters from Futures to Twisted Deferreds"? First of all, what's the deal with "NOTE"; it's the only "NOTE" in the whole PEP, and it doesn't seem to add anything. This sentence would read exactly the same if that word were deleted. Without more clarity on the required execution context of the callbacks, this claim might not actually be true anyway; Deferred callbacks can only be invoked in the main reactor thread in Twisted. But even if it is perfectly possible, why leave so much of the adapter implementation up to the imagination? If it's important enough to mention, why not have a reference to such an adapter in the reference Futures implementation, since it *should* be fairly trivial to write?

The fact that add_done_callback is implemented using a set is weird, since it means you can't add the same callback more than once. The set implementation also means that the callbacks get called in a semi-random order, potentially creating even _more_ hard-to-debug order of execution issues than you'd normally have with futures. And I think that this documentation will be unclear to a lot of novice developers: many people have trouble with the idea that "a = Foo(); b = Foo(); a.bar_method != b.bar_method", but "import foo_module; foo_module.bar_function == foo_module.bar_function".

It's also weird that you can remove callbacks - what's the use case? Deferreds have no callback-removal mechanism and nobody has ever complained of the need for one, as far as I know. (But lots of people do add the same callback multiple times.)

I suggest keeping add_done_callback, implementing it with a list so that callbacks are always invoked in the order that they're added, and getting rid of remove_done_callback.

futures._base.Executor isn't exposed publicly, but it needs to be. The PEP kinda makes it sound like it is ("Executor is an abstract class..."). Plus, a third party library wanting to implement an executor of its own shouldn't have to copy and paste the implementation of Executor.map.
One minor suggestion on the "internal future methods" bit - something I wish we'd done with Deferreds was to put 'callback()' and 'addCallbacks()' on separate objects, so that it was very explicit whether you were on the emitting side of a Deferred or the consuming side. That seems to be the case with these internal methods - they are not so much "internal" as they are for the producer of the Future (whether a unit test or executor), so you might want to put them on a different object that it's easy for the thing creating a Future() to get at but hard for any subsequent application code to fiddle with by accident. Off the top of my head, I suggest naming it "Invoker()". A good way to do this would be to have an Invoker class which can't be instantiated (raises an exception from __init__ or somesuch), then a Future.create() method which returns an Invoker, which itself has a '.future' attribute.

Finally, why isn't this just a module on PyPI? It doesn't seem like there's any particular benefit to making this a stdlib module and going through the whole PEP process - except maybe to prompt feedback like this :). Issues like the ones I'm bringing up could be fixed pretty straightforwardly if it were just a matter of filing a bug on a small package, but fixing a stdlib module is a major undertaking.
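A minimal sketch of the producer/consumer split suggested above; the Invoker and Future.create() names are Glyph's, but every detail of the implementation below is assumed rather than taken from the PEP or the pythonfutures code:

    class Future:
        """Consumer side: code handed a Future can only wait for and read results."""

        def __init__(self):
            self._done = False
            self._result = None
            self._exception = None

        @classmethod
        def create(cls):
            # The producer calls this and keeps the Invoker; it hands out .future.
            return Invoker._new(cls())

        def done(self):
            return self._done

        def result(self):
            if not self._done:
                raise RuntimeError('result() called before completion')
            if self._exception is not None:
                raise self._exception
            return self._result

    class Invoker:
        """Producer side: only the executor (or a test) should ever hold one."""

        def __init__(self):
            raise TypeError('use Future.create() to obtain an Invoker')

        @classmethod
        def _new(cls, future):
            self = object.__new__(cls)
            self.future = future
            return self

        def set_result(self, result):
            self.future._result = result
            self.future._done = True

        def set_exception(self, exception):
            self.future._exception = exception
            self.future._done = True

    # Example: the producer fills in the result; consumers only ever see the Future.
    invoker = Future.create()
    invoker.set_result(42)
    print(invoker.future.result())  # 42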
On May 23, 2010, at 2:44 PM, Glyph Lefkowitz wrote:
On May 22, 2010, at 8:47 PM, Brian Quinlan wrote:
Jesse, the designated pronouncer for this PEP, has decided to keep discussion open for a few more days.
So fire away!
As you wish!
I retract my request ;-)
The PEP should be consistent in its usage of terminology about callables. It alternately calls them "callables", "functions", and "functions or methods". It would be nice to clean this up and be consistent about what can be called where. I personally like "callables".
Did you find the terminology confusing? If not then I propose not changing it. But changing it in the user docs is probably a good idea. I like "callables" too.
The execution context of callable code is not made clear. Implicitly, submit() or map() would run the code in threads or processes as defined by the executor, but that's not spelled out clearly.
More relevant to my own interests, the execution context of the callables passed to add_done_callback and remove_done_callback is left almost completely to the imagination. If I'm reading the sample implementation correctly, <http://code.google.com/p/pythonfutures/source/browse/branches/feedback/pytho...
, it looks like in the multiprocessing implementation, the done callbacks are invoked in a random local thread. The fact that they are passed the future itself *sort* of implies that this is the case, but the multiprocessing module plays fast and loose with object identity all over the place, so it would be good to be explicit and say that it's *not* a pickled copy of the future sitting in some arbitrary process (or even on some arbitrary machine).
The callbacks will always be called in a thread other than the main thread in the process that created the executor. Is that a strong enough contract?
This is really minor, I know, but why does it say "NOTE: This method can be used to create adapters from Futures to Twisted Deferreds"? First of all, what's the deal with "NOTE"; it's the only "NOTE" in the whole PEP, and it doesn't seem to add anything. This sentence would read exactly the same if that word were deleted. Without more clarity on the required execution context of the callbacks, this claim might not actually be true anyway; Deferred callbacks can only be invoked in the main reactor thread in Twisted. But even if it is perfectly possible, why leave so much of the adapter implementation up to the imagination? If it's important enough to mention, why not have a reference to such an adapter in the reference Futures implementation, since it *should* be fairly trivial to write?
I'm a bit surprised that this doesn't allow for better interoperability with Deferreds given this discussion: At 02:36 PM 3/16/2010 -0700, Brian Quinlan wrote: """ From P.J Eby:
On Mar 7, 2010, at 11:56 AM, P.J. Eby wrote:
At 10:59 AM 3/7/2010 -0800, Jeffrey Yasskin wrote:
Given a way to register "on-done" callbacks with the future, it would be straightforward to wait for a future without blocking, too.
Yes, and with a few more additions besides that one, you might be on the way to an actual competitor for Deferreds. For example: retry support, chaining, logging, API for transparent result processing, coroutine support, co-ordination tools like locks, semaphores and queues, etc.
OK, but lets just think about making the APIs compatible e.g. you have some code that uses Futures and now you want to integrate it with some code that uses Deferreds.
I think Jeff's suggestion of having a completion callback on Futures would make it possible to write a Future-to-Deferred adapter. Is that correct?
As long as the callback signature included a way to pass in an error, then yes, that'd probably be sufficient. """ If add_done_callback doesn't help with Twisted interoperability then I'd suggest removing it, to allow for something more useful to be added later.
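For what it's worth, a rough sketch of such an adapter, assuming (as discussed above) that the done callback fires in an executor-owned thread rather than the reactor thread; the adapter itself is hypothetical, and only the Deferred/reactor calls are standard Twisted API:

    from twisted.internet import defer, reactor

    def future_to_deferred(future):
        # Wrap a PEP 3148-style future in a Twisted Deferred.  The done
        # callback runs in a worker thread, so the Deferred must be fired
        # from the reactor thread via callFromThread.
        d = defer.Deferred()

        def on_done(completed):
            error = completed.exception()
            if error is not None:
                reactor.callFromThread(d.errback, error)
            else:
                reactor.callFromThread(d.callback, completed.result())

        future.add_done_callback(on_done)
        return d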
The fact that add_done_callback is implemented using a set is weird, since it means you can't add the same callback more than once. The set implementation also means that the callbacks get called in a semi-random order, potentially creating even _more_ hard-to-debug order of execution issues than you'd normally have with futures. And I think that this documentation will be unclear to a lot of novice developers: many people have trouble with the idea that "a = Foo(); b = Foo(); a.bar_method != b.bar_method", but "import foo_module; foo_module.bar_function == foo_module.bar_function".
It's also weird that you can remove callbacks - what's the use case? Deferreds have no callback-removal mechanism and nobody has ever complained of the need for one, as far as I know. (But lots of people do add the same callback multiple times.)
I suggest keeping add_done_callback, implementing it with a list so that callbacks are always invoked in the order that they're added, and getting rid of remove_done_callback.
Sounds good to me!
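A sketch of what the list-based variant might look like (internal names and locking are assumed, and this is deliberately stripped down; it is not the pythonfutures implementation):

    import threading

    class Future:
        def __init__(self):
            self._lock = threading.Lock()
            self._done = False
            self._done_callbacks = []   # list, not set: keeps add order, allows repeats

        def add_done_callback(self, fn):
            with self._lock:
                if not self._done:
                    self._done_callbacks.append(fn)
                    return
            fn(self)  # already finished: invoke immediately, outside the lock

        def _invoke_callbacks(self):
            # Called by the executor after the result or exception has been set.
            for fn in self._done_callbacks:
                fn(self)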
futures._base.Executor isn't exposed publicly, but it needs to be. The PEP kinda makes it sound like it is ("Executor is an abstract class..."). Plus, A third party library wanting to implement an executor of its own shouldn't have to copy and paste the implementation of Executor.map.
That was a bug that I've fixed. Thanks!
One minor suggestion on the "internal future methods" bit - something I wish we'd done with Deferreds was to put 'callback()' and 'addCallbacks()' on separate objects, so that it was very explicit whether you were on the emitting side of a Deferred or the consuming side. That seems to be the case with these internal methods - they are not so much "internal" as they are for the producer of the Future (whether a unit test or executor) so you might want to put them on a different object that it's easy for the thing creating a Future() to get at but hard for any subsequent application code to fiddle with by accident. Off the top of my head, I suggest naming it "Invoker()". A good way to do this would be to have an Invoker class which can't be instantiated (raises an exception from __init__ or somesuch), then a Future.create() method which returns an Invoker, which itself has a '.future' attribute.
Finally, why isn't this just a module on PyPI? It doesn't seem like there's any particular benefit to making this a stdlib module and going through the whole PEP process - except maybe to prompt feedback like this :).
We've already had this discussion before. Could you explain why this module should *not* be in the stdlib e.g. does it have significantly less utility than other modules in stdlib? Is it significantly higher risk? etc?
Issues like the ones I'm bringing up could be fixed pretty straightforwardly if it were just a matter of filing a bug on a small package, but fixing a stdlib module is a major undertaking.
True but I don't think that is a convincing argument. A subset of the functionality provided by this module is already available in Java and C++ and (at least in Java) it is used extensively and without too much trouble. If there are implementation bugs then we can fix them just like we would with any other module. Cheers, Brian
On Sun, May 23, 2010 at 2:37 AM, Brian Quinlan <brian@sweetapp.com> wrote: <snip>
Finally, why isn't this just a module on PyPI? It doesn't seem like there's any particular benefit to making this a stdlib module and going through the whole PEP process - except maybe to prompt feedback like this :).
We've already had this discussion before. Could you explain why this module should *not* be in the stdlib e.g. does it have significantly less utility than other modules in stdlib? Is it significantly higher risk? etc?
Inclusion in the stdlib is the exception, not the rule, and every exception should be issued for a good reason. I'd like to know what that reason is in this case, if only to get a clearer understanding of why the PEP was accepted.
Issues like the ones I'm bringing up could be fixed pretty straightforwardly if it were just a matter of filing a bug on a small package, but fixing a stdlib module is a major undertaking.
True but I don't think that is a convincing argument. A subset of the functionality provided by this module is already available in Java and C++ and (at least in Java) it is used extensively and without too much trouble. If there are implementation bugs then we can fix them just like we would with any other module.
Guido made exactly the opposite argument during his keynote at PyCon. It seemed fairly reasonable at the time - why do you think it doesn't apply here? Geremy Condra
On May 23, 2010, at 7:15 PM, geremy condra wrote:
On Sun, May 23, 2010 at 2:37 AM, Brian Quinlan <brian@sweetapp.com> wrote:
<snip>
Finally, why isn't this just a module on PyPI? It doesn't seem like there's any particular benefit to making this a stdlib module and going through the whole PEP process - except maybe to prompt feedback like this :).
We've already had this discussion before. Could you explain why this module should *not* be in the stdlib e.g. does it have significantly less utility than other modules in stdlib? Is it significantly higher risk? etc?
Inclusion in the stdlib is the exception, not the rule, and every exception should be issued for a good reason. I'd like to know what that reason is in this case,
This package eliminates the need to construct the boilerplate present in many Python applications, i.e. a thread or process pool, a work queue and a result queue. It also makes it easy to take an existing Python application that executes operations (e.g. I/O) in sequence and execute them in parallel. The package provides common idioms for two existing modules, i.e. multiprocessing offers map functionality while threading doesn't. Those idioms are well understood and already present in Java and C++.
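To make the boilerplate claim concrete, a hedged before/after sketch (do_work, its inputs and the worker count are placeholders):

    import queue
    import threading

    import futures  # the proposed package; concurrent.futures in the eventual stdlib

    ITEMS = list(range(20))  # placeholder inputs

    def do_work(item):
        return item * item   # placeholder for real (e.g. I/O-bound) work

    # Without the package: a hand-rolled pool, work queue and result queue.
    work_queue = queue.Queue()
    result_queue = queue.Queue()

    def worker():
        while True:
            item = work_queue.get()
            if item is None:
                break
            result_queue.put((item, do_work(item)))

    threads = [threading.Thread(target=worker) for _ in range(5)]
    for t in threads:
        t.start()
    for item in ITEMS:
        work_queue.put(item)
    for _ in threads:
        work_queue.put(None)   # one sentinel per worker
    for t in threads:
        t.join()
    results = {}
    while not result_queue.empty():
        item, value = result_queue.get()
        results[item] = value

    # With the package: the pool and both queues are implied by the executor.
    with futures.ThreadPoolExecutor(max_workers=5) as executor:
        results = dict(zip(ITEMS, executor.map(do_work, ITEMS)))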
if only to get a clearer understanding of why the PEP was accepted.
It hasn't been accepted.
Issues like the ones I'm bringing up could be fixed pretty straightforwardly if it were just a matter of filing a bug on a small package, but fixing a stdlib module is a major undertaking.
True but I don't think that is a convincing argument. A subset of the functionality provided by this module is already available in Java and C++ and (at least in Java) it is used extensively and without too much trouble. If there are implementation bugs then we can fix them just like we would with any other module.
Guido made exactly the opposite argument during his keynote at PyCon. It seemed fairly reasonable at the time- why do you think it doesn't apply here?
Could you be a little more specific about Guido's argument at PyCon? Cheers, Brian
On Sun, May 23, 2010 at 11:39, Brian Quinlan <brian@sweetapp.com> wrote:
This package eliminates the need to construct the boilerplate present in many Python applications, i.e. a thread or process pool, a work queue and a result queue. It also makes it easy to take an existing Python application that executes operations (e.g. I/O) in sequence and execute them in parallel. The package provides common idioms for two existing modules, i.e. multiprocessing offers map functionality while threading doesn't. Those idioms are well understood and already present in Java and C++.
It can do that as a separate package as well. And not only that, it could then be available on PyPI for earlier versions of Python as well, making it much more likely to gain widespread acceptance.
Could you be a little more specific about Guido's argument at PyCon?
A module in stdlib has to be "dead". After it's included in the stdlib it cannot go through any major changes since that would mean loss of backwards compatibility. Also, you can't fix bugs except by releasing new versions of Python. Therefore the API must be completely stable, and the product virtually bugfree, before it should be in stdlib. The best way of ensuring that is to release it as a separate module on PyPI, and let it stabilize for a couple of years. -- Lennart Regebro: Python, Zope, Plone, Grok http://regebro.wordpress.com/ +33 661 58 14 64
On May 23, 2010, at 7:54 PM, Lennart Regebro wrote:
On Sun, May 23, 2010 at 11:39, Brian Quinlan <brian@sweetapp.com> wrote:
This package eliminates the need to construct the boilerplate present in many Python applications i.e. a thread or process pool, a work queue and result queue. It also makes it easy to take an existing Python application that executes (e.g. IO operations) in sequence and execute them in parallel. It package provides common idioms for two existing modules i.e. multiprocessing offers map functionality while threading doesn't. Those idioms are well understood and already present in Java and C++.
It can do that as a separate package as well.
You could make the same argument about any module in the stdlib.
And not only that, it could then be available on PyPI for earlier versions of Python as well, making it much more likely to gain widespread acceptance.
I doubt it. Simple modules are unlikely to develop a following because it is too easy to partially replicate their functionality. urlparse and os.path are very useful modules but I doubt that they would have been successful on PyPI.
Could you be a little more specific about Guido's argument at PyCon?
A module in stdlib has to be "dead". After it's included in the stdlib it cannot go through any major changes since that would mean loss of backwards compatibility.
The good news in this case is that the same API has been used successfully in Java and C++ for years so it is unlikely that any major changes will need to be made.
Also, you can't fix bugs except by releasing new versions of Python. Therefore the API must be completely stable, and the product virtually bugfree before it should be in stdlib. The best way of ensuring that is to release it as a separate module on PyPI, and let it stabilize for a couple of years.
Yeah but that model isn't likely to work with this package. Cheers, Brian
On Sun, May 23, 2010 at 10:15 PM, Brian Quinlan <brian@sweetapp.com> wrote:
Also, you can't fix bugs except by releasing new versions of Python. Therefore the API must be completely stable, and the product virtually bugfree before it should be in stdlib. The best way of ensuring that is to release it as a separate module on PyPI, and let it stabilize for a couple of years.
Yeah but that model isn't likely to work with this package. Cheers, Brian
Forgive my ignorance, but why do you say that that model won't work with this package?
On May 23, 2010, at 8:43 PM, Robert Collins wrote:
On Sun, May 23, 2010 at 10:15 PM, Brian Quinlan <brian@sweetapp.com> wrote:
Also, you can't fix bugs except by releasing new versions of Python. Therefore the API must be completely stable, and the product virtually bugfree before it should be in stdlib. The best way of ensuring that is to release it as a separate module on PyPI, and let it stabilize for a couple of years.
Yeah but that model isn't likely to work with this package. Cheers, Brian
Forgive my ignorance, but why do you say that that model won't work with this package?
As I said in my last message: """Simple modules are unlikely to develop a following because it is too easy to partially replicate their functionality. urlparse and os.path are very useful modules but I doubt that they would have been successful on PyPI.""" Cheers, Brian
Is there any reason to have Future.cancelled, .done, .running as methods?
From my perspective they are really readonly properties.
BTW, is 'cancelled' the correct name? Spell-checkers like only the single-'l' form: 'canceled'. On Sun, May 23, 2010 at 2:47 PM, Brian Quinlan <brian@sweetapp.com> wrote:
On May 23, 2010, at 8:43 PM, Robert Collins wrote:
On Sun, May 23, 2010 at 10:15 PM, Brian Quinlan <brian@sweetapp.com> wrote:
Also, you can't fix bugs except by releasing new versions of Python. Therefore the API must be completely stable, and the product virtually bugfree before it should be in stdlib. The best way of ensuring that is to release it as a separate module on PyPI, and let it stabilize for a couple of years.
Yeah but that model isn't likely to work with this package. Cheers, Brian
Forgive my ignorance, but why do you say that that model won't work with this package?
As I said in my last message:
"""Simple modules are unlikely to develop a following because it is too easy to partially replicate their functionality. urlparse and os.path are very useful modules but I doubt that they would have been successful on PyPI."""
Cheers, Brian
On Sun, May 23, 2010 at 03:16:27PM +0400, Andrew Svetlov wrote:
Is there any reason to have Future .cancelled, .done, .running as methods?
From my perspective they are really readonly properties.
BTW, is 'cancelled' the correct name? Spell-checkers like only the single-'l' form: 'canceled'.
In English, only "cancelled" is correct. In American, either is correct.
Andrew Svetlov wrote:
BTW, is 'cancelled' the correct name? Spell-checkers like only the single-'l' form: 'canceled'.
I think this is an English vs. American thing. Double 'l' looks right to me, but then I was brought up as a loyal subject of the antipodean branch of the British Empire. :-) -- Greg
Brian Quinlan wrote:
Simple modules are unlikely to develop a following because it is too easy to partially replicate their functionality.
I don't think it needs a particularly large following. What it does need is at least a few people using it in some real projects. No matter how much discussion there is and how much apparent agreement is reached, it's no substitute for practical experience. Often API design mistakes are only found when trying to use the library for real. -- Greg
On Sun, May 23, 2010 at 12:15, Brian Quinlan <brian@sweetapp.com> wrote:
I doubt it. Simple modules are unlikely to develop a following because it is too easy to partially replicate their functionality. urlparse and os.path are very useful modules but I doubt that they would have been successful on PyPI.
simplejson was also fairly simple, but still developed a following.
The good news in this case is that the same API has been used successfully in Java and C++ for years so it is unlikely that any major changes will need to be made.
I would agree that having prior versions in other languages should make the API more stable, but I wouldn't agree that it doesn't need changes (and even minor changes can be a PITA in the stdlib).
Also, you can't fix bugs except by releasing new versions of Python. Therefore the API must be completely stable, and the product virtually bugfree before it should be in stdlib. The best way of ensuring that is to release it as a separate module on PyPI, and let it stabilize for a couple of years.
Yeah but that model isn't likely to work with this package.
Okay, I'll bite: why is your package different? In general, this reminds me of the ipaddr discussions. I read through the thread from March real quick to see if there was reasoning there why this package should be an exception from the "normal" standards track (that is, ripen on PyPI, then move it into the stdlib when it's mature -- where "mature" is another word for dead, really). But then this is just another instance of the fat-stdlib vs lean-stdlib discussion, I guess, so we can go on at length. Cheers, Dirkjan
On Sun, 23 May 2010 12:43:57 +0200 Dirkjan Ochtman <dirkjan@ochtman.nl> wrote:
In general, this reminds me of the ipaddr discussions. I read through the thread from March real quick to see if there was reasoning there why this package should be an exception from the "normal" standards track (that is, ripen on PyPI, then moving it in the stdlib when it's mature -- where "mature" is another word for dead, really).
I disagree that a stdlib module is a dead module. It is perfectly possible to augment the API with new functionality without breaking compatibility. You can also deprecate old APIs if you want. Regards Antoine.
On Sun, May 23, 2010 at 12:51, Antoine Pitrou <solipsis@pitrou.net> wrote:
I disagree that a stdlib module is a dead module. It is perfectly possible to augment the API with new functionality without breaking compatibility. You can also deprecate old APIs if you want.
Right, it wasn't intended as that harsh... but it does come with a rather impressive set of constraints in terms of what you can do with the API. Cheers, Dirkjan
(Sending again - I didn't mean to drop python-dev from the cc list when I originally sent this via the gmail web interface)

On Sun, May 23, 2010 at 9:00 PM, Dirkjan Ochtman <dirkjan@ochtman.nl> wrote: Right, it wasn't intended as that harsh... but it does come with a rather impressive set of constraints in terms of what you can do with the API.

True, but in some cases (especially low level infrastructure), it is worth accepting those constraints in order to achieve other aims (such as standardisation of techniques). Things like itertools, collections, functools, unittest owe their existence largely to the choice of gains in standardisation over flexibility of API updates.

Besides, popular PyPI modules don't have that much more freedom than the stdlib when it comes to API changes. The only real difference is that the 18-24 month release cycle for the stdlib is a lot *slower* than that of many PyPI packages, so feedback on any changes we make is correspondingly delayed. Hence the existence of projects like distutils2 and unittest2 to enable that faster feedback cycle to inform the updates passed back into the more slowly evolving stdlib modules, as well as the desire to copy prior art wherever it makes sense to do so (whether that is other languages, existing PyPI modules or the internal code bases of large corporate contributors).

Cheers, Nick.
On May 23, 2010, at 8:43 PM, Dirkjan Ochtman wrote:
On Sun, May 23, 2010 at 12:15, Brian Quinlan <brian@sweetapp.com> wrote:
I doubt it. Simple modules are unlikely to develop a following because it is too easy to partially replicate their functionality. urlparse and os.path are very useful modules but I doubt that they would have been successful on PyPI.
simplejson was also fairly simple, but still developed a following.
The API is simple but writing a JSON parser is hard enough that people will check to see if someone has already done the work for them (especially since JSON is fairly topical). If you are familiar with threads then writing a "good enough" solution without futures probably won't take you very long. Also, unless you are familiar with another futures implementation, you aren't likely to know where to look.
The good news in this case is that the same API has been used successfully in Java and C++ for years so it is unlikely that any major changes will need to be made.
I would agree that having prior versions in other languages should make the API more stable, but I wouldn't agree that it doesn't need changes (and even minor changes can be a PITA in the stdlib).
Some changes are hard (e.g. changing the semantics of an existing method) but some are pretty easy (e.g. adding new methods). Cheers, Brian
Also, you can't fix bugs except by releasing new versions of Python. Therefore the API must be completely stable, and the product virtually bugfree before it should be in stdlib. The best way of ensuring that is to release it as a separate module on PyPI, and let it stabilize for a couple of years.
Yeah but that model isn't likely to work with this package.
Okay, I'll bite: why is your package different?
In general, this reminds me of the ipaddr discussions. I read through the thread from March real quick to see if there was reasoning there why this package should be an exception from the "normal" standards track (that is, ripen on PyPI, then moving it in the stdlib when it's mature -- where "mature" is another word for dead, really). But then this is just another instance of the fat-stdlib vs lean-stdlib discussion, I guess, so we can go on at length.
Brian Quinlan writes:
If you are familiar with threads then writing a "good enough" solution without futures probably won't take you very long. Also, unless you are familiar with another futures implementation, you aren't likely to know where to look.
That looks like an argument *against* your module, to me. Why would people look for it in the stdlib if they're not looking for it at all, and specifically because anybody who would know enough to look for "something like" it is also able to devise a good-enough solution? You're describing a solution in search of a user, not a user in search of a solution, and it would appear to violate "not every three-line function" as well as TOOWTDI. I personally plan to defer to the people who know and use such constructs (specifically Glyph and Jesse), and who seem to be in favor (at least +0) of stabilizing an API for this in the stdlib. But you may want to rethink your sales pitch if you want to avoid giving ammo to the opposition. It sounds like you agree with them, except on the vote you cast.<wink>
On 24May2010 10:47, Stephen J. Turnbull <stephen@xemacs.org> wrote: | Brian Quinlan writes: | > If you are familiar with threads then writing a "good enough" solution | > without futures probably won't take you very long. Also, unless you | > are familiar with another futures implementation, you aren't likely to | > know where to look. | | That looks like an argument *against* your module, to me. Why would | people look for it in the stdlib if they're not looking for it at all, | and specifically because anybody who would know enough to look for | "something like" it is also able to devise a good-enough solution? | You're describing a solution in search of a user, not a user in search | of a solution, and it would appear to violate "not every three-line | function" as well as TOOWTDI. This might be a terminology problem. I think, above, Brian means "good enough" to mean "looks ok at first cut but doesn't handle the corner cases". Which usually means obscure breakage later. I almost am Brian's hypothetical user. I've got a "FuncMultiQueue" that accepts callables in synchronous and asynchronous modes for future possibly-concurrent execution, just as the futures module does. I've spent a _lot_ of time debugging it. There's a lot to be said for a robust implementation of a well defined problem. Brian's module, had it been present and presuming it robust and debugged, would have been quite welcome. Cheers, -- Cameron Simpson <cs@zip.com.au> DoD#743 http://www.cskk.ezoshosting.com/cs/ I am a Bear of Very Little Brain and long words Bother Me. - Winnie-the-Pooh
On 24 May 2010 03:58, Cameron Simpson <cs@zip.com.au> wrote:
I almost am Brian's hypothetical user. I've got a "FuncMultiQueue" that accepts callables in synchronous and asynchronous modes for future possibly-concurrent execution, just as the futures module does. I've spent a _lot_ of time debugging it.
I pretty much am that user as well (whether or not I am hypothetical, I'll leave to others to determine...) I have a set of scripts that needed to do precisely the sort of thing that the futures module offers. I searched for a fair while for a suitable offering (this was before futures had been published) and found nothing suitable. So in the end I implemented my own - and I hit corner cases, and they needed a lot of work to fix. I now have a working solution, but it's too tangled in the application logic to be reusable :-( If futures had been in the stdlib, I'd have used it like a shot, and saved myself a lot of wasted time.
There's a lot to be said for a robust implementation of a well defined problem. Brian's module, had it been present and presuming it robust and debugged, would have been quite welcome.
Precisely my view. Paul.
Cameron Simpson writes:
There's a lot to be said for a robust implementation of a well defined problem. Brian's module, had it been present and presuming it robust and debugged, would have been quite welcome.
That, of course, is the consensus view, both in general and with respect to this particular module. The difference is over what constitutes sufficient evidence for your presumption of "robust and debugged" from the point of view of the users of the stdlib.
On 24/05/10 20:46, Stephen J. Turnbull wrote:
Cameron Simpson writes:
There's a lot to be said for a robust implementation of a well defined problem. Brian's module, had it been present and presuming it robust and debugged, would have been quite welcome.
That, of course, is the consensus view, both in general and with respect to this particular module.
The difference is over what constitutes sufficient evidence for your presumption of "robust and debugged" from the point of view of the users of the stdlib.
At the very least, we'll be offering a promise to be "more robust and more debugged than what you came up with in that coding marathon last night" ;) Having a decent test suite that is regularly executed on multiple platforms (which will be the case for any accepted module by the time it is included in a Python release) also places anything we release a cut above a *lot* of in-house code. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
On Sun, May 23, 2010 at 12:15, Brian Quinlan <brian@sweetapp.com> wrote:
You could make the same argument about any module in the stdlib.
Yeah, and that's exactly what I did.
I doubt it. Simple modules are unlikely to develop a following because it is too easy to partially replicate their functionality. urlparse and os.path are very useful modules but I doubt that they would have been successful on PyPI.
Are you saying your proposed module is so simple that anyone can easily replicate it with just a couple of lines of code?
The good news in this case is that the same API has been used successfully in Java and C++ for years so it is unlikely that any major changes will need to be made.
Good. Then the time it takes to "mature" on PyPI would be very short. -- Lennart Regebro: Python, Zope, Plone, Grok http://regebro.wordpress.com/ +33 661 58 14 64
On 23 May 2010, at 21:17, Lennart Regebro wrote:
On Sun, May 23, 2010 at 12:15, Brian Quinlan <brian@sweetapp.com> wrote:
You could make the same argument about any module in the stdlib.
Yeah, and that's exactly what I did.
I doubt it. Simple modules are unlikely to develop a following because it is too easy to partially replicate their functionality. urlparse and os.path are very useful modules but I doubt that they would have been successful on PyPI.
Are you saying your proposed module is so simple that anyone can easily replicate it with just a couple of lines of code?
Parts of it, yes. Just like I can replace most operations in os.path and urlparse with a few lines of code.
The good news in this case is that the same API has been used successfully in Java and C++ for years so it is unlikely that any major changes will need to be made.
Good. Then the time it takes to "mature" on PyPI would be very short.
How would you define "very short"? I've had the project on PyPI for about a year now: http://pypi.python.org/pypi/futures3 Cheers, Brian
On Sun, May 23, 2010 at 13:29, Brian Quinlan <brian@sweetapp.com> wrote:
Parts of it, yes. Just like I can replace most operations in os.path and urlparse with a few lines of code.
Yeah, but "parts of" is not the question. I've read the PEP, and I do *not* know how to implement it. That means it's not a trivial module, so that argument doesn't hold up here, even if we accept it as valid (which I actually don't). I don't think any module in the stdlib is entirely trivial. Yes, even parsing an URL is non-trivial, as shown by the fact that the urlparse module apparently has a bug in it for urls like svn+ssh://foo.bar/frotz. ;-) Also, even trivial modules can be useful if you use them a lot.
How would you define "very short"?
That's not up to me to decide. -- Lennart Regebro: Python, Zope, Plone, Grok http://regebro.wordpress.com/ +33 661 58 14 64
On 23/05/10 21:56, Lennart Regebro wrote:
On Sun, May 23, 2010 at 13:29, Brian Quinlan<brian@sweetapp.com> wrote:
Parts of it, yes. Just like I can replace most operations in os.path and urlparse with a few lines of code.
Yeah, but "parts of" is not the question. I've read the PEP, and I do *not* know how to implement it. That means it's not a trivial module, so that argument doesn't hold up here, even if we accept it as valid (which I actually don't). I don't think any module in the stdlib is entirely trivial. Yes, even parsing an URL is non-trivial, as shown by the fact that the urlparse module apparently has a bug in it for urls like svn+ssh://foo.bar/frotz. ;-)
In this case, the "trivial" refers to being able to write something that will get the job done for a specific task or application, but that isn't as solid from a reliability/maintainability/portability/scalability point of view. By providing solid infrastructure in the standard library, we can remove that choice between "do it fast" vs "do it right", by providing ready-made robust infrastructure. Those that say "just put it on PyPI" may not recognise the additional overhead that can be involved in identifying, obtaining approval to use and ongoing management of additional dependencies in a corporate environment that is actually performing appropriate due diligence in regards to IP licensing. This overhead can be especially significant (and, depending on licence and contract details, even a dealbreaker) for companies with specific IP licensing provisions in their contracts with their clients. It doesn't matter *how* easy we make it to download PyPI packages, we can't do anything about such company IP management policies (except for making it easier for programmers to violate them thoughtlessly, of course). To use myself as an example, I have several utilities that I probably would have written differently if the futures module had been available in the standard library at the time I wrote them. As it is, they work well enough, but their usage of the threading module is fairly ad hoc (and migrating them to use multiprocessing would be a fairly complex task, and making that decision run-time selectable even more complicated). In the near-term, backports of future standard library modules are much easier to get through a corporate review process as the licensing is typically similar to the PSF license (or is even the PSF license itself) and the modules come with a clear roadmap for eliminating the dependency (i.e. once the baseline Python version employed by the company includes the module in the standard library, the external dependency is no longer needed). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
On Wed, May 26, 2010 at 02:10, Nick Coghlan <ncoghlan@gmail.com> wrote:
Those that say "just put it on PyPI" may not recognise the additional ...
Just a note, so we don't get sidelined by misunderstandings: I don't think anybody said that. ;-)

There are two issues here, one generic and one specific:

Generic: Modules should go on PyPI first, for a time, to stabilize (and so they can be used in earlier versions of Python) before they end up in stdlib. I suspect everyone actually agrees on that (but I could be wrong).

Specific: Has futures been on PyPI long enough, and is it stable? I'm staying out of that discussion. :-)

-- Lennart Regebro: Python, Zope, Plone, Grok http://regebro.wordpress.com/ +33 661 58 14 64
On 26/05/10 12:29, Lennart Regebro wrote:
On Wed, May 26, 2010 at 02:10, Nick Coghlan<ncoghlan@gmail.com> wrote:
Those that say "just put it on PyPI" may not recognise the additional ...
Just a note, so we don't get sidelined by misunderstandings: I don't think anybody said that. ;-)
Nah, that pseudo-quote wasn't from this discussion in particular. It's a reference to the ongoing tension between the "batteries included" advocates and the "make the standard library as streamlined as possible" crowd. Both sides have valid points, so the "included battery" vs "optional download" question needs to be decided on a case-by-case basis.
There are two issues here, one generic and one specific:
Generic: Modules should go on PyPI first, for a time, to stabilize (and so they can be used in earlier versions of Python) before they end up in stdlib. I suspect everyone actually agrees on that (but I could be wrong).
That's the point I'm disagreeing with. For most modules it makes sense to do things that way, but for some low-level infrastructure elements, it is going to be less effective (because people will quickly throw together their own solutions instead of adding a new dependency for something "simple"). Other times we'll invent a new module because *we* need it for something (e.g. runpy). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
Having read through the PEP again, here are my thoughts.

* I'm bothered by the term "future". To my mind, it's too long on cleverness and too short on explanativeness. I think that the standard library is no place for cuteness of naming. The name of a stdlib module should reflect its functionality in some straightforward and obvious way. If I were looking for a thread pool or process pool implementation, the word "future" is not something that would spring readily to mind. The stated purpose of the module is to "execute computations asynchronously", so perhaps a name such as "asyntask" would be appropriate, following the pattern of existing modules dealing with asynchronous matters, asyncore and asynchat. For the Future object itself, I'd suggest something like "Task" or "Job".

* It seems unnecessarily verbose to tack "Executor" onto the end of every Executor subclass. They could simply be called ThreadPool and ProcessPool without losing anything.

* I don't see a strong reason to put this module inside a newly-created namespace. If there were a namespace called "concurrent", I would expect to find other existing concurrency-related modules there as well, such as threading and multiprocessing. But we can't move them there without breaking existing code. (More generally, I'm inclined to think that introducing a namespace package for a category of modules having existing members in the stdlib is an anti-pattern, unless it's done during the kind of namespace refactoring that we won't get another chance to perform until Py4k.)

Concerning the structure of the PEP:

* A section titled 'Specification' should *not* start with a bunch of examples. It may be appropriate to include short examples *following* items in the specification in order to illustrate the features concerned. Extended examples such as these belong in a section of their own.

* I found the examples included to be rather difficult to follow, and they served more to confuse than elucidate. I think this is partly because they are written in a fairly compressed style, burying important things being illustrated inside complicated expressions. Rewriting them in a more straightforward style might help.

Concerning details of the specification:

* Is it possible to have more than one Executor active at a time? The fact that as_completed() is a module-level function rather than an Executor method suggests that it is, but it would be good to have this spelled out one way or the other in the PEP.

-- Greg
On May 26, 2010, at 8:57 PM, Greg Ewing wrote:
Having read through the PEP again, here are my thoughts.
* I'm bothered by the term "future". To my mind, it's too long on cleverness and too short on explanativeness.
I think that the standard library is no place for cuteness of naming. The name of a stdlib module should reflect its functionality in some straightforward and obvious way. If I were looking for a thread pool or process pool implementation, the word "future" is not something that would spring readily to mind.
The stated purpose of the module is to "execute computations asynchronously", so perhaps a name such as "asyntask" would be appropriate, following the pattern of existing modules dealing with asynchronous matters, asyncore and asynchat. For the Future object itself, I'd suggest something like "Task" or "Job".
"future" is a computing science term of art, like "thread". Anyway, this has been discussed in the past and Guido was happy with the name.
* It seems unnecessarily verbose to tack "Executor" onto the end of every Executor subclass. They could simply be called ThreadPool and ProcessPool without losing anything.
You could have general thread pools that aren't related to executors (actually, it would be great if Python had a good built-in thread pool implementation) and I'd like to avoid using an overly generic name.
* I don't see a strong reason to put this module inside a newly-created namespace. If there were a namespace called "concurrent", I would expect to find other existing concurrency-related modules there as well, such as threading and multiprocessing. But we can't move them there without breaking existing code.
I think that Jesse was planning to add some functionality to this namespace. I don't really have an opinion on this.
(More generally, I'm inclined to think that introducing a namespace package for a category of modules having existing members in the stdlib is an anti-pattern, unless it's done during the kind of namespace refactoring that we won't get another chance to perform until Py4k.)
Concerning the structure of the PEP:
* A section titled 'Specification' should *not* start with a bunch of examples. It may be appropriate to include short examples *following* items in the specification in order to illustrate the features concerned. Extended examples such as these belong in a section of their own.
I thought that the specification would be difficult to follow without examples to pave the way. Anyone else have an opinion on this?
* I found the examples included to be rather difficult to follow, and they served more to confuse than elucidate. I think this is partly because they are written in a fairly compressed style, burying important things being illustrated inside complicated expressions. Rewriting them in a more straightforward style might help.
Do you think starting with a simpler example would help? I think that idiomatic future use will end up looking similar to my examples. If that is too complex for most users then we have a problem.
Concerning details of the specification:
* Is it possible to have more than one Executor active at a time?
Of course.
The fact that as_completed() is a module-level function rather than an Executor method suggests that it is, but it would be good to have this spelled out one way or the other in the PEP.
I'll add a note to the global functions that they can accept futures from different executors in the same call. Cheers, Brian
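For example, a minimal sketch (hypothetical code, using the concurrent.futures names proposed for the module; not an excerpt from the PEP) that mixes futures from a thread-based and a process-based executor in a single as_completed() call:

from concurrent.futures import (ThreadPoolExecutor, ProcessPoolExecutor,
                                as_completed)

def square(x):
    return x * x

if __name__ == '__main__':
    thread_pool = ThreadPoolExecutor(max_workers=2)
    process_pool = ProcessPoolExecutor(max_workers=2)
    try:
        fs = [thread_pool.submit(square, i) for i in range(5)]
        fs += [process_pool.submit(square, i) for i in range(5, 10)]
        # as_completed() is a module-level function, so it accepts futures
        # created by different executors in the same call.
        for future in as_completed(fs):
            print(future.result())
    finally:
        thread_pool.shutdown()
        process_pool.shutdown()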
-- Greg
Brian Quinlan wrote:
I think that Jesse was planning to add some functionality to this namespace.
Even if that happens, the existing threading and multiprocessing modules would remain outside of it.
You could have general thread pools that aren't related to executors
Yes, but it should be fairly obvious that the ones defined in the futures module have to do with futures. Namespaces are only a honking great idea if you actually let them do the job they're designed for.
I thought that the specification would be difficult to follow without examples to pave the way.
Well, for me, what happened was that I saw the examples and thought "WTF is going on here?" Then I read the specification to figure out how the examples worked. It might be better to have a tutorial section preceding the specification section, containing explanation interspersed with examples.
I think that idiomatic future use will end up looking similar to my examples.
Maybe, but code written for pedagogical purposes needs to meet a particularly high standard of clarity. Remember that the reader isn't yet familiar with the idioms, so idiomatic code isn't necessarily going to be easy for him to follow.
* Is it possible to have more than one Executor active at a time?
Of course.
That's good, but I don't think that the "of course" is at all obvious, considering that things such as GUI event loops generally can't be mixed easily. -- Greg
On Wed, May 26, 2010 at 7:36 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Brian Quinlan wrote:
I think that Jesse was planning to add some functionality to this namespace.
Even if that happens, the existing threading and multiprocessing modules would remain outside of it.
Not entirely; once concurrent.* comes into existence, I will seriously begin looking at what we can move out of multiprocessing, into concurrent.* alongside futures.
You could have general thread pools that aren't related to executors
Yes, but it should be fairly obvious that the ones defined in the futures module have to do with futures. Namespaces are only a honking great idea if you actually let them do the job they're designed for.
concurrent.* is the namespace, futures is the package within the namespace - concurrent.futures is highly descriptive of the items contained therein. jesse
On 27/05/10 12:04, Jesse Noller wrote:
Namespaces are only a honking great idea if you actually let them do the job they're designed for.
concurrent.* is the namespace, futures is the package within the namespace - concurrent.futures is highly descriptive of the items contained therein.
I was referring to the issue of ThreadPool vs. ThreadPoolExecutor etc. By your own argument above, concurrent.futures.ThreadPool is quite descriptive enough of what it provides. It's not a problem if some other module also provides something called a ThreadPool. -- Greg
On May 27, 2010, at 1:21 PM, Greg Ewing wrote:
On 27/05/10 12:04, Jesse Noller wrote:
Namespaces are only a honking great idea if you actually let them do the job they're designed for.
concurrent.* is the namespace, futures is the package within the namespace - concurrent.futures is highly descriptive of the items contained therein.
I was referring to the issue of ThreadPool vs. ThreadPoolExecutor etc. By your own argument above, concurrent.futures.ThreadPool is quite descriptive enough of what it provides. It's not a problem if some other module also provides something called a ThreadPool.
I think that the "Executor" suffix is a good indicator of the interface being provided. "Pool" is not because you can could have Executor implementations that don't involve pools. Cheers, Brian
Brian Quinlan wrote:
I think that the "Executor" suffix is a good indicator of the interface being provided.
It's not usually considered necessary for the name of a type to indicate its interface. We don't have 'listsequence' and 'dictmapping' for example. I think what bothers me most about these names is their longwindedness. Two parts to a name is okay, but three or more starts to sound pedantic. And for me, "Pool" is a more important piece of information than "Executor". The fact that it manages a pool is the main reason I'd use such a module rather than just spawning a thread myself for each task. -- Greg
On 28 May 2010, at 09:18, Greg Ewing wrote:
Brian Quinlan wrote:
I think that the "Executor" suffix is a good indicator of the interface being provided.
It's not usually considered necessary for the name of a type to indicate its interface. We don't have 'listsequence' and 'dictmapping' for example.
I think what bothers me most about these names is their longwindedness. Two parts to a name is okay, but three or more starts to sound pedantic. And for me, "Pool" is a more important piece of information than "Executor". The fact that it manages a pool is the main reason I'd use such a module rather than just spawning a thread myself for each task.
Actually, an executor implementation that created a new thread per task would still be useful - it would save you the hassle of developing a mechanism to wait for the thread to finish and to collect the results. We actually have such an implementation at Google and it is quite popular. Cheers, Brian
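As a rough illustration (a hypothetical sketch, not the Google implementation, built on the Future methods the PEP provides for executor authors), such an executor can be very small:

import threading
from concurrent.futures import Executor, Future

class ThreadPerTaskExecutor(Executor):
    """Start a fresh thread for every submitted call."""

    def submit(self, fn, *args, **kwargs):
        future = Future()

        def worker():
            # The Future supplies the waiting and result-collection
            # machinery, so callers never touch the thread directly.
            if not future.set_running_or_notify_cancel():
                return
            try:
                future.set_result(fn(*args, **kwargs))
            except BaseException as exc:
                future.set_exception(exc)

        threading.Thread(target=worker).start()
        return future

Callers use it exactly like the pooled executors: submit() a callable, get a Future back, and wait on the Future for the result.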
On 27/05/10 09:36, Greg Ewing wrote:
Brian Quinlan wrote:
I think that Jesse was planning to add some functionality to this namespace.
Even if that happens, the existing threading and multiprocessing modules would remain outside of it.
You could have general thread pools that aren't related to executors
Yes, but it should be fairly obvious that the ones defined in the futures module have to do with futures. Namespaces are only a honking great idea if you actually let them do the job they're designed for.
futures.ThreadPoolExecutor would likely be refactored to inherit from the mooted pool.ThreadPool. I'd like to leave that option open, and having two classes with the same name from different modules in a single inheritance tree is one of the places where module namespacing still isn't quite as tidy as we might wish. I'd also consider a simple thread pool and an actual executor different things. I'm fine with the longer names, but if I was going to drop a word from the names, it would actually be "Pool" (i.e. ThreadExecutor, ProcessExecutor). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
On Thu, 27 May 2010 10:19:50 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
futures.ThreadPoolExecutor would likely be refactored to inherit from the mooted pool.ThreadPool.
There still doesn't seem to be reason to have two different thread pool APIs, though. Shouldn't there be one obvious way to do it?
I'd also consider a simple thread pool and an actual executor different things. I'm fine with the longer names, but if I was going to drop a word from the names, it would actually be "Pool" (i.e. ThreadExecutor, ProcessExecutor).
To me, ThreadPool looks a lot more obvious than ThreadExecutor ("obvious" in that I can easily find it again, and I don't need to read some documentation to know what it is). Regards Antoine.
On 27/05/10 10:29, Antoine Pitrou wrote:
On Thu, 27 May 2010 10:19:50 +1000 Nick Coghlan<ncoghlan@gmail.com> wrote:
futures.ThreadPoolExecutor would likely be refactored to inherit from the mooted pool.ThreadPool.
There still doesn't seem to be reason to have two different thread pool APIs, though. Shouldn't there be one obvious way to do it?
Executors and thread pools are not the same thing. I might create a thread pool for *anything*. An executor will always have a specific execution model associated with it (whether it be called futures, as in this case, or runnables or something else). This confusion is making me think that dropping the "Pool" from the names might even be beneficial (since, to my mind, it currently emphasises a largely irrelevant implementation detail). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
On Fri, 28 May 2010 02:05:14 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
Executors and thread pools are not the same thing.
I might create a thread pool for *anything*. An executor will always have a specific execution model associated with it (whether it be called futures, as in this case, or runnables or something else).
I'm sorry, but what is the specific execution model you are talking about, and how is it different from what you usually do with a thread pool? Why would you do something other with a thread pool than running tasks and (optionally) collecting their results? Thanks Antoine.
On 28/05/10 02:16, Antoine Pitrou wrote:
On Fri, 28 May 2010 02:05:14 +1000 Nick Coghlan<ncoghlan@gmail.com> wrote:
Executors and thread pools are not the same thing.
I might create a thread pool for *anything*. An executor will always have a specific execution model associated with it (whether it be called futures, as in this case, or runnables or something else).
I'm sorry, but what is the specific execution model you are talking about, and how is it different from what you usually do with a thread pool? Why would you do something other with a thread pool than running tasks and (optionally) collecting their results?
Both the execution and communications models may be different. The components may be long-lived state machines, they may be active objects, they may communicate by message passing or by manipulating shared state, who knows. Executors are designed around a model of "go do this and let me know when you're done". A general purpose pool needs to support other execution models, and hence will look different (and is harder to design and write, since it needs to be more flexible). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
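To make the distinction concrete, here is a hypothetical sketch (not code from the PEP) contrasting a long-lived, stateful worker of the kind a general-purpose pool has to accommodate with the one-shot "go do this and let me know when you're done" call an executor deals in:

import threading
import queue
from concurrent.futures import ThreadPoolExecutor

# General-purpose pool style: a long-lived worker that consumes messages
# from a queue and maintains its own state until told to shut down.
def stateful_worker(inbox):
    total = 0
    while True:
        message = inbox.get()
        if message is None:        # shutdown sentinel
            print("worker total:", total)
            return
        total += message           # manipulate long-lived state

inbox = queue.Queue()
threading.Thread(target=stateful_worker, args=(inbox,)).start()
for n in range(10):
    inbox.put(n)
inbox.put(None)

# Executor style: hand over a callable, get notified (via a Future) when done.
executor = ThreadPoolExecutor(max_workers=1)
print("executor result:", executor.submit(sum, range(10)).result())
executor.shutdown()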
Brian Quinlan <brian <at> sweetapp.com> writes:
"future" is a computing science term of art, like "thread". Anyway, this has been discussed in the past and Guido was happy with the name.
I've not seen this mentioned, but on such a long thread I might have missed it: we already have a "__future__" module, as in

from __future__ import with_statement

and to my mind, this is a potential point of confusion with the term "future".
* It seems unnecessarily verbose to tack "Executor" onto the end of every Executor subclass. They could simply be called ThreadPool and ProcessPool without losing anything.
You could have general thread pools that aren't related to executors (actually, it would be great if Python had a good built-in thread pool implementation) and I'd like to avoid using an overly generic name.
Aren't executors and pools both housekeepers around a bunch of threads which execute code as a service for other threads? A thread is useless unless it executes code, isn't it? I'm not sure I understand the fine distinction between an executor and a pool. Having Executor as a suffix will give a point of familiarity to those who already know java.util.concurrent. And ProcessPool might cause confusion with multiprocessing.Pool because at first glance they seem to be the same thing.
* I don't see a strong reason to put this module inside a newly-created namespace. If there were a namespace called "concurrent", I would expect to find other existing concurrency-related modules there as well, such as threading and multiprocessing. But we can't move them there without breaking existing code.
I think that Jesse was planning to add some functionality to this namespace. I don't really have an opinion on this.
I'm not sure of the benefit of a "concurrent" namespace, since it wouldn't contain the existing concurrency stuff. I think it's something to consider only for a big reorg which would break backward compatibility. IMO it would make more sense to leave this module as a top-level module for now (a sibling to "threading", "multiprocessing"). Regards, Vinay Sajip
On Sat, 29 May 2010 08:28:46 am Vinay Sajip wrote:
I've not seen this mentioned, but on such a long thread I might have missed it: we already have a "__future__" module, as in
from __future__ import with_statement
and to my mind, this is a potential point of confusion with the term "future". [...] I'm not sure of the benefit of a "concurrent" namespace, since it wouldn't contain the existing concurrency stuff. I think it's something to consider only for a big reorg which would break backward compatibility. IMO it would make more sense to leave this module as a top-level module for now (a sibling to "threading", "multiprocessing").
I have suggested a way to move the existing concurrency stuff without breaking backwards compatibility, and Terry Reedy asked if it would work. I haven't seen any responses, either positive or negative.

For the record, my suggestion was:

for each concurrency module:
    move it into the concurrency package
    add a top level module with the same name containing:
        # e.g. for threading
        from concurrency.threading import *

Then in some future Python version, each top level module gets a PendingDeprecation warning, followed by a Deprecation warning some time later, and eventually in the indefinite future removal of the top level module.

Leaving the futures module in the top level of the std lib is far more likely to confuse users than adding it to its own namespace. Compare:

import __future__
import futures

with:

import __future__
import concurrency.futures

In my opinion, it is high time for the std lib to pay more attention to the final Zen:

Namespaces are one honking great idea -- let's do more of those!

-- Steven D'Aprano
On May 28, 2010, at 8:12 PM, Steven D'Aprano <steve@pearwood.info> wrote:
On Sat, 29 May 2010 08:28:46 am Vinay Sajip wrote:
I've not seen this mentioned, but on such a long thread I might have missed it: we already have a "__future__" module, as in
from __future__ import with_statement
and to my mind, this is a potential point of confusion with the term "future". [...] I'm not sure of the benefit of a "concurrent" namespace, since it wouldn't contain the existing concurrency stuff. I think it's something to consider only for a big reorg which would break backward compatibility. IMO it would make more sense to leave this module as a top-level module for now (a sibling to "threading", "multiprocessing").
I have suggested a way to move the existing concurrency stuff without breaking backwards compatibility, and Terry Reedy asked if it would work. I haven't seen any responses, either positive or negative.
For the record, my suggestion was:
for each concurrency module:
    move it into the concurrency package
    add a top level module with the same name containing:
        # e.g. for threading
        from concurrency.threading import *
Then in some future Python version, each top level module gets a PendingDeprecation warning, followed by a Deprecation warning some time later, and eventually in the indefinite future removal of the top level module.
Leaving the futures module in the top level of the std lib is far more likely to confuse users than adding it to its own namespace. Compare:
import __future__
import futures
with:
import __future__
import concurrency.futures
In my opinion, it is high time for the std lib to pay more attention to the final Zen:
Namespaces are one honking great idea -- let's do more of those!
Yes, your suggestion for how to move things is the way we would want to do it, and in the back of my head, what we should do long term - just not right now.
On 29/05/10 10:19, Jesse Noller wrote:
In my opinion, it is high time for the std lib to pay more attention to the final Zen:
Namespaces are one honking great idea -- let's do more of those!
Yes, your suggestion for how to move things is the way we would want to do it, and in the back of my head, what we should do long term - just not right now.
Yep, this is what I have been saying as well.

1. Using concurrency.futures rather than a top level futures module resolves the potential confusion with __future__ and stock market futures without inventing our own name for a well established computer science concept.

2. With the concurrency package in place following PEP 3148, we can separately consider the question of if/when/how to move other concurrency related modules (e.g. threading, multiprocessing, Queue) into that package at a later date.

Since this topic keeps coming up, some reasoning along these lines should go into PEP 3148. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
On May 28, 2010, at 11:31 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 29/05/10 10:19, Jesse Noller wrote:
In my opinion, it is high time for the std lib to pay more attention to the final Zen:
Namespaces are one honking great idea -- let's do more of those!
Yes, your suggestion for how to move things is the way we would want to do it, and in the back of my head, what we should do long term - just not right now.
Yep, this is what I have been saying as well.
1. Using concurrency.futures rather than a top level futures module resolves the potential confusion with __future__ and stock market futures without inventing our own name for a well established computer science concept.
2. With the concurrency package in place following PEP 3148, we can separately consider the question of if/when/how to move other concurrency related modules (e.g. threading, multiprocessing, Queue) into that package at a later date.
Since this topic keeps coming up, some reasoning along these lines should go into PEP 3148.
I'll type something up this weekend and shoot it to Brian for inclusion. I was hoping to be able to keep it out of the futures pep itself, but it seems that won't work :) Jesse
On 29/05/10 22:46, Jesse Noller wrote:
On May 28, 2010, at 11:31 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Since this topic keeps coming up, some reasoning along these lines should go into PEP 3148.
I'll type something up this weekend and shoot it to Brian for inclusion. I was hoping to be able to keep it out of the futures pep itself, but it seems that won't work :)
Well, punting on whether or not we actually *do* part 2 is still fine. As Eric pointed out, there are issues with unpickling that make the wisdom of following through with renaming any existing modules fairly questionable. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
Steven D'Aprano wrote:
I have suggested a way to move the existing concurrency stuff without breaking backwards compatibility, and Terry Reedy asked if it would work. I haven't seen any responses, either positive or negative.
For the record, my suggestion was:
for each concurrency module:
    move it into the concurrency package
    add a top level module with the same name containing:
        # e.g. for threading
        from concurrency.threading import *
In the past the problem identified with this approach has been that pickles produced with new pythons would not be readable by older pythons. I think this was the main reason that Brett's 3.0 library reorganization wasn't more radical. There's a discussion of this here: http://mail.python.org/pipermail/python-dev/2008-May/079535.html http://mail.python.org/pipermail/stdlib-sig/2008-May/000303.html and a little more here: http://bugs.python.org/issue2775 Eric.
On Fri, May 28, 2010 at 17:12, Steven D'Aprano <steve@pearwood.info> wrote:
On Sat, 29 May 2010 08:28:46 am Vinay Sajip wrote:
I've not seen this mentioned, but on such a long thread I might have missed it: we already have a "__future__" module, as in
from __future__ import with_statement
and to my mind, this is a potential point of confusion with the term "future". [...] I'm not sure of the benefit of a "concurrent" namespace, since it wouldn't contain the existing concurrency stuff. I think it's something to consider only for a big reorg which would break backward compatibility. IMO it would make more sense to leave this module as a top-level module for now (a sibling to "threading", "multiprocessing").
I have suggested a way to move the existing concurrency stuff without breaking backwards compatibility, and Terry Reedy asked if it would work. I haven't seen any responses, either positive or negative.
For the record, my suggestion was:
for each concurrency module:
    move it into the concurrency package
    add a top level module with the same name containing:
        # e.g. for threading
        from concurrency.threading import *
Then in some future Python version, each top level module gets a PendingDeprecation warning, followed by a Deprecation warning some time later, and eventually in the indefinite future removal of the top level module.
This was the procedure we used for about a month for Python 2.6 in order to help renamed modules migrate to their new names in Python 3. The issue that came up (and forced us to revert this approach and fully rely on 2to3) was that anything pickled by the older interpreters is not going to be happy with that shift. Luckily the stuff being moved most likely does not contain things that have been pickled and stored to disk for ages and thus would break in a transition.
Vinay Sajip wrote:
Brian Quinlan <brian <at> sweetapp.com> writes:
"future" is a computing science term of art, like "thread". Anyway, this has been discussed in the past and Guido was happy with the name.
I've not seen this mentioned, but on such a long thread I might have missed it: we already have a "__future__" module, as in
from __future__ import with_statement
and to my mind, this is a potential point of confusion with the term "future".
* It seems unnecessarily verbose to tack "Executor" onto the end of every Executor subclass. They could simply be called ThreadPool and ProcessPool without losing anything. You could have general thread pools that aren't related to executors (actually, it would be great if Python had a good built-in thread pool implementation) and I'd like to avoid using an overly generic name.
Aren't executors and pools both housekeepers around a bunch of threads which execute code as a service for other threads? A thread is useless unless it executes code, isn't it? I'm not sure I understand the fine distinction between an executor and a pool. Having Executor as a suffix will give a point of familiarity to those who already know java.util.concurrent. And ProcessPool might cause confusion with multiprocessing.Pool because at first glance they seem to be the same thing.
* I don't see a strong reason to put this module inside a newly-created namespace. If there were a namespace called "concurrent", I would expect to find other existing concurrency-related modules there as well, such as threading and multiprocessing. But we can't move them there without breaking existing code. I think that Jesse was planning to add some functionality to this namespace. I don't really have an opinion on this.
I'm not sure of the benefit of a "concurrent" namespace, since it wouldn't contain the existing concurrency stuff. I think it's something to consider only for a big reorg which would break backward compatibility. IMO it would make more sense to leave this module as a top-level module for now (a sibling to "threading", "multiprocessing").
Unless there's some way of having the two namespaces (existing and concurrent-oriented) simultaneously coexist. A single implementation with two separate namespace mappings? regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video! http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000
On Wed, 26 May 2010 08:57:15 pm Greg Ewing wrote:
* I'm bothered by the term "future". To my mind, it's too long on cleverness and too short on explanativeness.
"Futures" is a standard term in computer science which has been around for 33 years. http://en.wikipedia.org/wiki/Futures_and_promises
I think that the standard library is no place for cuteness of naming.
You mean like pickle, marshal, shelve, turtle, and even dict?
* I don't see a strong reason to put this module inside a newly-created namespace. If there were a namespace called "concurrent", I would expect to find other existing concurrency-related modules there as well, such as threading and multiprocessing. But we can't move them there without breaking existing code.
I'm sure that it can be done easily, although not quickly. For instance, we move threading into the concurrent namespace, and leave behind in its place a stub:

from concurrent.threading import *

Then for (say) 3.3 the stub could gain a PendingDeprecation warning, then in 3.4 a Deprecation warning, and finally in 3.5 or 3.6 it could be removed. -- Steven D'Aprano
On 26/05/10 20:57, Greg Ewing wrote:
Having read through the PEP again, here are my thoughts. * It seems unnecessarily verbose to tack "Executor" onto the end of every Executor subclass. They could simply be called ThreadPool and ProcessPool without losing anything.
We would lose the ability to add general purpose thread and process pools under the obvious names later.
* I don't see a strong reason to put this module inside a newly-created namespace. If there were a namespace called "concurrent", I would expect to find other existing concurrency-related modules there as well, such as threading and multiprocessing. But we can't move them there without breaking existing code.
(More generally, I'm inclined to think that introducing a namespace package for a category of modules having existing members in the stdlib is an anti-pattern, unless it's done during the kind of namespace refactoring that we won't get another chance to perform until Py4k.)
_thread, threading, Queue and multiprocessing do likely belong here, but moving them isn't likely to be worth the pain. Does it help to know that at least Jesse and I (and probably others) would like to see concurrent.pool added eventually with robust general purpose ThreadPool and ProcessPool implementations? The specific reason the new package namespace was added was to help avoid confusion with stock market futures without using an unduly cumbersome module name, but I don't know how well the PEP explains that. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
On 26 May 2010, at 22:42, Nick Coghlan wrote:
On 26/05/10 20:57, Greg Ewing wrote:
Having read through the PEP again, here are my thoughts. * It seems unnecessarily verbose to tack "Executor" onto the end of every Executor subclass. They could simply be called ThreadPool and ProcessPool without losing anything.
We would lose the ability to add general purpose thread and process pools under the obvious names later.
* I don't see a strong reason to put this module inside a newly-created namespace. If there were a namespace called "concurrent", I would expect to find other existing concurrency-related modules there as well, such as threading and multiprocessing. But we can't move them there without breaking existing code.
(More generally, I'm inclined to think that introducing a namespace package for a category of modules having existing members in the stdlib is an anti-pattern, unless it's done during the kind of namespace refactoring that we won't get another chance to perform until Py4k.)
_thread, threading, Queue and multiprocessing do likely belong here, but moving them isn't likely to be worth the pain. Does it help to know that at least Jesse and I (and probably others) would like to see concurrent.pool added eventually with robust general purpose ThreadPool and ProcessPool implementations?
The specific reason the new package namespace was added was to help avoid confusion with stock market futures without using an unduly cumbersome module name, but I don't know how well the PEP explains that.
It doesn't at all. Are these plans formalized anywhere that I can link to? Cheers, Brian
On Wed, May 26, 2010 at 9:01 AM, Brian Quinlan <brian@sweetapp.com> wrote:
On 26 May 2010, at 22:42, Nick Coghlan wrote:
On 26/05/10 20:57, Greg Ewing wrote:
Having read through the PEP again, here are my thoughts. * It seems unnecessarily verbose to tack "Executor" onto the end of every Executor subclass. They could simply be called ThreadPool and ProcessPool without losing anything.
We would lose the ability to add general purpose thread and process pools under the obvious names later.
* I don't see a strong reason to put this module inside a newly-created namespace. If there were a namespace called "concurrent", I would expect to find other existing concurrency-related modules there as well, such as threading and multiprocessing. But we can't move them there without breaking existing code.
(More generally, I'm inclined to think that introducing a namespace package for a category of modules having existing members in the stdlib is an anti-pattern, unless it's done during the kind of namespace refactoring that we won't get another chance to perform until Py4k.)
_thread, threading, Queue and multiprocessing do likely belong here, but moving them isn't likely to be worth the pain. Does it help to know that at least Jesse and I (and probably others) would like to see concurrent.pool added eventually with robust general purpose ThreadPool and ProcessPool implementations?
The specific reason the new package namespace was added was to help avoid confusion with stock market futures without using an unduly cumbersome module name, but I don't know how well the PEP explains that.
It doesn't at all. Are these plans formalized anywhere that I can link to?
Cheers, Brian
Nope; and I don't think we need to worry about it right now.
On 26/05/10 23:01, Brian Quinlan wrote:
_thread, threading, Queue and multiprocessing do likely belong here, but moving them isn't likely to be worth the pain. Does it help to know that at least Jesse and I (and probably others) would like to see concurrent.pool added eventually with robust general purpose ThreadPool and ProcessPool implementations?
The specific reason the new package namespace was added was to help avoid confusion with stock market futures without using an unduly cumbersome module name, but I don't know how well the PEP explains that.
It doesn't at all. Are these plans formalized anywhere that I can link to?
Just the previous lot of discussions. The main point that should be mentioned in the PEP is that "futures" on its own was ambiguous as to the applicable domain, but "concurrent.futures" was perfectly clear, without causing any readability problems the way a longer name could. Moving the general purpose pools out to their own module was just an example that occurred to us as something else that could go in that package rather than a concrete plan for implementation. Yes, we're setting ourselves up for inevitable questions as to why the existing modules are top level rather than part of this package, but the minimal pain response there would be to link to them from the package documentation with a note along the lines of "for historical reasons, some modules you might reasonably expect to find in this package are instead provided as top level modules". Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
On Wed, May 26, 2010 at 8:42 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 26/05/10 20:57, Greg Ewing wrote:
Having read through the PEP again, here are my thoughts. * It seems unnecessarily verbose to tack "Executor" onto the end of every Executor subclass. They could simply be called ThreadPool and ProcessPool without losing anything.
We would lose the ability to add general purpose thread and process pools under the obvious names later.
* I don't see a strong reason to put this module inside a newly-created namespace. If there were a namespace called "concurrent", I would expect to find other existing concurrency-related modules there as well, such as threading and multiprocessing. But we can't move them there without breaking existing code.
(More generally, I'm inclined to think that introducing a namespace package for a category of modules having existing members in the stdlib is an anti-pattern, unless it's done during the kind of namespace refactoring that we won't get another chance to perform until Py4k.)
_thread, threading, Queue and multiprocessing do likely belong here, but moving them isn't likely to be worth the pain. Does it help to know that at least Jesse and I (and probably others) would like to see concurrent.pool added eventually with robust general purpose ThreadPool and ProcessPool implementations?
The specific reason the new package namespace was added was to help avoid confusion with stock market futures without using an unduly cumbersome module name, but I don't know how well the PEP explains that.
I assume(d) it's sufficient to link to the mailing list threads where we hashed this out already ;)

The namespace serves a few purposes:

1. We put futures where it makes sense - into a concurrent package. Futures are a concurrency construct; therefore it simply makes sense to put them within a sub package rather than at the top level.

2. We carve out a box to move to, and add other concurrent things, such as generic pools, Actor implementations, etc. See java.util.concurrent. Things within multiprocessing that don't start with P and rhyme with "rocess" can go here too.

Admittedly, it's mainly my own long-term vision to see python-core grow more concurrency tidbits - although I don't know too many people who would complain about it.

jesse
On 26/05/10 20:57, Greg Ewing wrote:
(More generally, I'm inclined to think that introducing a namespace package for a category of modules having existing members in the stdlib is an anti-pattern,
As a user, I agree with this.
unless it's done during the kind of namespace refactoring that we won't get another chance to perform until Py4k.)
Is Steven D'Aprano's suggestion (in another post) possible?
I'm sure that it can be done easily, although not quickly. For instance, we move threading into the concurrent namespace, and leave behind in its place a stub:
from concurrent.threading import *
Then for (say) 3.3 the stub could gain a PendingDeprecation warning, then in 3.4 a Deprecation warning, and finally in 3.5 or 3.6 it could be removed.
On 5/26/2010 8:42 AM, Nick Coghlan wrote:
_thread, threading, Queue and multiprocessing do likely belong here, but moving them isn't likely to be worth the pain.
As a user, I disagree and think collecting together these and future modules (pun intended) would be beneficial. There are, from my viewpoint, too many top level modules already. [and in another thread]
Yes, we're setting ourselves up for inevitable questions as to why the existing modules are top level rather than part of this package,
Yes, forever until they are put in the package.
but the minimal pain response there
For the developers, for the short term
would be to link to them from the package documentation with a note along the lines of "for historical reasons, some modules you might reasonably expect to find in this package are instead provided as top level modules".
You are, in my opinion, overly discounting the cognitive load and confusion on the part of new users. It would be much better to link *to* subpackage documentation *from* the top level entries (until deleted) and just say that the top level synonyms are present for the obvious historical reason that there was once no package, just modules. I am suggesting that if we add a package, we do it right, from the beginning. Terry Jan Reedy
On 27/05/10 02:27, Terry Reedy wrote:
I am suggesting that if we add a package, we do it right, from the beginning.
This is a reasonable point of view, but I wouldn't want to hold up PEP 3148 over it (call it a +0 for the idea in general, but a -1 for linking it to the acceptance of PEP 3148). A separate short PEP proposing a migration plan that could be accepted or rejected independently of PEP 3148 would likely be valuable. E.g.

- no change in 2.x (obviously)
- add concurrent.* alternate names in 3.x
- rearrange documentation in 3.x, with pointers from old names to new names
- put a PendingDeprecationWarning on the old names, but otherwise leave them alone indefinitely
- add 2to3 fixers to translate from the old names to the new names in import statements

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
On Wed, May 26, 2010 at 8:03 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 27/05/10 02:27, Terry Reedy wrote:
I am suggesting that if we add a package, we do it right, from the beginning.
This is a reasonable point of view, but I wouldn't want to hold up PEP 3148 over it (call it a +0 for the idea in general, but a -1 for linking it to the acceptance of PEP 3148).
A separate short PEP proposing a migration plan that could be accepted or rejected independently of PEP 3148 would likely be valuable.
E.g.
- no change in 2.x (obviously)
- add concurrent.* alternate names in 3.x
- rearrange documentation in 3.x, with pointers from old names to new names
- put a PendingDeprecationWarning on the old names, but otherwise leave them alone indefinitely
- add 2to3 fixers to translate from the old names to the new names in import statements
Cheers, Nick.
Agreed; and intended as a different PEP.
On 5/26/2010 8:03 PM, Nick Coghlan wrote:
On 27/05/10 02:27, Terry Reedy wrote:
I am suggesting that if we add a package, we do it right, from the beginning.
This is a reasonable point of view, but I wouldn't want to hold up PEP 3148 over it (call it a +0 for the idea in general, but a -1 for linking it to the acceptance of PEP 3148).
That sounds backward. How can you justify accepting PEP 3148 into a "concurrent" namespace without also accepting the demand for such a namespace? What is the contingency if this TBD migration PEP is not accepted; what happens to PEP 3148? After all, there were some complaints about just calling it "futures", without putting it in a "concurrent" namespace. -- Scott Dial scott@scottdial.com scodial@cs.indiana.edu
On 27/05/10 12:48, Scott Dial wrote:
On 5/26/2010 8:03 PM, Nick Coghlan wrote:
On 27/05/10 02:27, Terry Reedy wrote:
I am suggesting that if we add a package, we do it right, from the beginning.
This is a reasonable point of view, but I wouldn't want to hold up PEP 3148 over it (call it a +0 for the idea in general, but a -1 for linking it to the acceptance of PEP 3148).
That sounds backward. How can you justify accepting PEP 3148 into a "concurrent" namespace without also accepting the demand for such a namespace? What is the contingency if this TBD migration PEP is not accepted; what happens to PEP 3148? After all, there were some complaints about just calling it "futures", without putting it in a "concurrent" namespace.
We can accept PEP 3148 by saying that we're happy to add the extra namespace level purely for disambiguation purposes, even if we never follow through on adding anything else to the package (although I consider such an outcome to be highly unlikely). Any future additions or renames to move things into the concurrent package would then be handled as their own PEPs. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
On 28/05/10 09:52, Greg Ewing wrote:
Nick Coghlan wrote:
We can accept PEP 3148 by saying that we're happy to add the extra namespace level purely for disambiguation purposes,
If that's the only rationale for the namespace, it makes it sound like a kludge to work around a poor choice of name.
It's the right name though (it really is a futures implementation - I don't know what else you would even consider calling it). The problem is that the same word is used to mean different things in other programming domains (most obviously finance). Resolving that kind of ambiguity is an *excellent* use of a package namespace - you remove the ambiguity without imposing any significant long term cognitive overhead. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
On Wed, May 26, 2010 at 3:57 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Having read through the PEP again, here are my thoughts.
* I'm bothered by the term "future". To my mind, it's too long on cleverness and too short on explanativeness.
I think that the standard library is no place for cuteness of naming. The name of a stdlib module should reflect its functionality in some straightforward and obvious way. If I were looking for a thread pool or process pool implementation, the word "future" is not something that would spring readily to mind.
Please go re-read the older threads on this. For example, http://mail.python.org/pipermail/python-dev/2010-March/098279.html.
* It seems unnecessarily verbose to tack "Executor" onto the end of every Executor subclass. They could simply be called ThreadPool and ProcessPool without losing anything.
+0
* I don't see a strong reason to put this module inside a newly-created namespace. If there were a namespace called "concurrent", I would expect to find other existing concurrency-related modules there as well, such as threading and multiprocessing. But we can't move them there without breaking existing code.
Again, please re-read the older thread (which you participated in). For example, http://mail.python.org/pipermail/python-dev/2010-March/098200.html. Jeffrey
Nick Coghlan writes:
Those that say "just put it on PyPI"
Nobody is saying that, AFAICS. Nobody is saying that *some* futures module shouldn't *eventually* go into the stdlib. The question is whether *this* futures module should go into the stdlib *now*. And it is clear that more time on PyPI would provide valuable information. This is a general principle that has served us well: put best current practice backed up by actual widespread usage in the stdlib, not theoretical wins based on the developer's experience. PyPI is a way to broaden usage to determine BCP, not an end in itself. People have been asking "what's special about this module, to violate the BCP principle?" There's nothing special about the fact that several people would use a "robust and debugged" futures module if it were in the stdlib. That's true of *every* module that is worth a PEP. But remember, in the case of ipaddr it was the people who wanted some such module badly who were also the most vocal opponents, because they could see offhand that it was going to serve their use cases badly. (It turned out that this was equally trivial to fix despite a week of hot debate, and everyone lived happily ever after. But that was smiling Luck, not inexorable Fate.) For this module, three people have said "I 'would have' used it if it were available," but none of you has announced that you've started refactoring and the PEP 3148 API meets all expectations. I call that "damning with faint praise". OTOH, Glyph has changed from "why not more time on PyPI?" to "let's see if we can improve this a bit, then let's do it". He has published code (showing how to turn futures into Twisted Deferreds), and argues that based on download stats to date and the nature of the use cases it would take a lot of time on PyPI to demonstrate a BCP. Those are good arguments for an exception, IMHO.
On 26/05/10 13:51, Stephen J. Turnbull wrote:
People have been asking "what's special about this module, to violate the BCP principle?" There's nothing special about the fact that several people would use a "robust and debugged" futures module if it were in the stdlib. That's true of *every* module that is worth a PEP.
I actually wrote a reply to that question earlier in the week, but failed at using gmail's web interface correctly and only sent it to Steve Holden.

===================

The trick with futures and executor pools is that they're a *better* way of programming with threads in many cases. However, given the choices of:

- hack together something simple with some worker Threads and a Queue (or two)
- write my own futures and executor infrastructure
- download a futures module from PyPI and live with the additional dependency

I'll choose the first option every time, and my programs will be the worse for it.

Put the capability to use futures and an executor into the stdlib, and it becomes something I can reach for without having to worry about additional dependencies beyond specifying a minimal Python version. It provides a higher level API that can be more readily switched between threading and multiprocessing back ends. It becomes something that can be taught as a standard Python technique for enabling concurrency in imperative code. This is something that is irrelevant to me as a module on PyPI, but has the potential to significantly affect my programming in the future as a standard library module.

Even in the near term, backports of future standard library modules are often perceived differently when being discussed as potential additional dependencies for an application (i.e. I believe it would be worthwhile for a backport of the module to earlier Python versions than 3.2 to be made available on PyPI).

===================

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
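As a small sketch of the "more readily switched" point (hypothetical code, assuming the concurrent.futures names), the same program can run its tasks on threads or processes just by choosing a different executor class at run time:

import sys
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def crunch(n):
    return sum(i * i for i in range(n))

def run(use_processes=False):
    executor_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    executor = executor_cls(max_workers=4)
    try:
        # Executor.map() keeps the calling code identical for both back ends.
        return list(executor.map(crunch, [10 ** 5] * 8))
    finally:
        executor.shutdown()

if __name__ == '__main__':
    print(run(use_processes='--processes' in sys.argv))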
On Wed, May 26, 2010 at 06:22, Nick Coghlan <ncoghlan@gmail.com> wrote:
- download a futures module from PyPI and live with the additional dependency
Why would that be a problem? -- Lennart Regebro: Python, Zope, Plone, Grok http://regebro.wordpress.com/ +33 661 58 14 64
On 26 May 2010 08:11, Lennart Regebro <regebro@gmail.com> wrote:
On Wed, May 26, 2010 at 06:22, Nick Coghlan <ncoghlan@gmail.com> wrote:
- download a futures module from PyPI and live with the additional dependency
Why would that be a problem?
That has been hashed out repeatedly on this and other lists. Can it please be stipulated that for *some* people, in *some* cases, it is a problem?

It seems to me that if you've experienced the sort of culture that makes it a problem, you understand the point immediately, but if you haven't, you never will (that's not disparaging anyone; the idiosyncrasies of corporate culture are widespread and bizarre - if it helps, just remember that Dilbert is a documentary :-))

Paul.
On Wed, May 26, 2010 at 09:37, Paul Moore <p.f.moore@gmail.com> wrote:
It seems to me that if you've experienced the sort of culture that makes it a problem,
Ah, it's a culture problem. In a heterogeneous world, every action will benefit some and hurt some. Another arbitrary corporate ruleset could also mean you might be stuck on ancient Python versions, and might not see a new module added to the stdlib in 3.2 until 2015 or so.

Some corporations go through a lot of trouble to prevent their employees from doing their job. Python's core developers cannot and should not let that hinder *them* from doing what is best for Python. Decisions on inclusion in the stdlib must be made on what benefits Python and its users in general.

Since even small mistakes in a stdlib module will hurt far more people than would be inconvenienced by having the module mature on PyPI until the worst API issues and bugs are ironed out, it's clear to me that letting a module mature on PyPI before inclusion is the better policy here, although how long obviously must be decided on a case by case basis.

--
Lennart Regebro: Python, Zope, Plone, Grok
http://regebro.wordpress.com/
+33 661 58 14 64
On May 26, 2010, at 3:37 AM, Paul Moore wrote:
On 26 May 2010 08:11, Lennart Regebro <regebro@gmail.com> wrote:
On Wed, May 26, 2010 at 06:22, Nick Coghlan <ncoghlan@gmail.com> wrote:
- download a futures module from PyPI and live with the additional dependency
Why would that be a problem?
That has been hashed out repeatedly on this and other lists. Can it please be stipulated that for *some* people, in *some* cases, it is a problem?
Sure, but I for one fully support Lennart asking the question, because while in the short term this *is* a problem with packaging tools in the Python ecosystem, in the long term (as you do note) it's an organizational dysfunction that can be addressed with better tools.

I think it would be bad to ever concede the point that sane factoring of dependencies and code re-use aren't worth it because some jerk in Accounting or System Operations wants you to fill out a requisition form for a software component that's free and liberally licensed anyway.

To support the unfortunate reality that such jerks in such departments really do in fact exist, there should be simple tools to glom a set of small, nicely factored dependencies into a giant monolithic ball of crud that installs all at once, and slap a sticker on the side of it that says "I am only filling out your stupid form once, okay". This should be as distant as possible from the actual decision to package things in sensibly-sized chunks.

In other words, while I kinda-sorta buy Brian's argument that having this module in easy reach will motivate more people to use a standard, tested idiom for parallelization, I *don't* think that the stdlib should be expanded simply to accommodate those who just don't want to install additional packages for anything.
On Wed, 26 May 2010 04:25:18 -0400 Glyph Lefkowitz <glyph@twistedmatrix.com> wrote:
In other words, while I kinda-sorta buy Brian's argument that having this module in easy reach will motivate more people to use a standard, tested idiom for parallelization, I *don't* think that the stdlib should be expanded simply to accommodate those who just don't want to install additional packages for anything.
+1. Why don't the castrated-by-the-corporation people offer to maintain a "Sumo" distribution of Python on python.org instead? The rest of the world shouldn't have to be impacted by their corporate culture woes. cheers Antoine.
On Wed, 26 May 2010 07:39:15 pm Antoine Pitrou wrote:
On Wed, 26 May 2010 04:25:18 -0400
Glyph Lefkowitz <glyph@twistedmatrix.com> wrote:
In other words, while I kinda-sorta buy Brian's argument that having this module in easy reach will motivate more people to use a standard, tested idiom for parallelization, I *don't* think that the stdlib should be expanded simply to accommodate those who just don't want to install additional packages for anything.
+1. Why don't the castrated-by-the-corporation people offer to maintain a "Sumo" distribution of Python on python.org instead? The rest of the world shouldn't have to be impacted by their corporate culture woes.
It's not just the corporate culture. For many people, the standard library is the first introduction to even the existence of a particular technique or technology. You can't go looking for something on PyPI if you don't know that there's something to look for. And for many beginners and not-so-beginners, the idea and practice of installing additional packages is simply problematic.

I'm not saying that Python-Dev should bend over backwards to accommodate such people to the exclusion of all else, but these folks are stakeholders too, and their wants and needs are just as worthy as the wants and needs of those who prefer a more conservative approach to the standard library.

This is a Python implementation of a stable Java API; Brian has said the futures package has been on PyPI for about a year, and it's been flagged as a production/stable release since October last year.

http://pypi.python.org/pypi/futures3

Given that there does seem to be a general agreement that futures should go into the std lib at some point, is this not sufficient exposure?

--
Steven D'Aprano
On Wed, 26 May 2010 20:42:12 +1000 Steven D'Aprano <steve@pearwood.info> wrote:
I'm not saying that Python-Dev should bend over backwards to accommodate such people to the exclusion of all else, but these folks are stakeholders too, and their wants and needs are just as worthy as the wants and needs of those who prefer a more conservative approach to the standard library.
Well, my "Sumo" proposal was a serious one. (not serious in that I would offer to give a hand, but in that I think it could help those people; also, wouldn't it be sensible for users in a corporate environment to share their efforts and produce something that can benefit all of them? it's the free software spirit after all)
This is a Python implementation of a stable Java API, Brian has said the futures package has been on PyPI for about a year, and it's been flagged as a production/stable release since October last year.
I'm not against futures being in the stdlib; I was just pointing out that I don't agree with the "corporate culture issues should be accommodated by including more modules in the stdlib" argument.

Regards

Antoine.
On 26 May 2010 11:56, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Wed, 26 May 2010 20:42:12 +1000 Steven D'Aprano <steve@pearwood.info> wrote:
I'm not saying that Python-Dev should bend over backwards to accommodate such people to the exclusion of all else, but these folks are stakeholders too, and their wants and needs are just as worthy as the wants and needs of those who prefer a more conservative approach to the standard library.
Well, my "Sumo" proposal was a serious one. (not serious in that I would offer to give a hand, but in that I think it could help those people; also, wouldn't it be sensible for users in a corporate environment to share their efforts and produce something that can benefit all of them? it's the free software spirit after all)
I'm not sure how a "Sumo" approach would work in practical terms, and this thread isn't really the place to discuss, but there are a couple of points I think are worth making:

* For a "Sumo" distribution to make sense, some relatively substantial chunk of the standard library would need to be moved *out* to reside in the sumo distribution. Otherwise it's not really a "sumo", just a couple of modules that "nearly made it into the stdlib", at least for the near-to-medium term. I've yet to see any sort of consensus that python-dev is willing to undertake that decoupling work. (Which would include extracting the various tests, migrating bugs out of the Python tracker, etc etc).

* If the decoupled modules aren't simply being abandoned, python-dev needs to continue to commit to supporting them "in the wild" (i.e., on PyPI and in the sumo distribution). Otherwise we're just abandoning existing users and saying "support it yourself". I've seen no indication that python-dev members would expect to follow bug trackers for various decoupled modules - so in practice, this sounds more like abandonment than decoupling.

Until a stdlib-decoupling proposal which takes these aspects into account is on the table, I'm afraid that suggesting there's a "Sumo distribution" style middle ground between stdlib and PyPI isn't really true...

Paul.
On Wed, May 26, 2010 at 8:19 AM, Paul Moore <p.f.moore@gmail.com> wrote:
On 26 May 2010 11:56, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Wed, 26 May 2010 20:42:12 +1000 Steven D'Aprano <steve@pearwood.info> wrote:
I'm not saying that Python-Dev should bend over backwards to accommodate such people to the exclusion of all else, but these folks are stakeholders too, and their wants and needs are just as worthy as the wants and needs of those who prefer a more conservative approach to the standard library.
Well, my "Sumo" proposal was a serious one. (not serious in that I would offer to give a hand, but in that I think it could help those people; also, wouldn't it be sensible for users in a corporate environment to share their efforts and produce something that can benefit all of them? it's the free software spirit after all)
I'm not sure how a "Sumo" approach would work in practical terms, and this thread isn't really the place to discuss, but there's a couple of points I think are worth making:
* For a "Sumo" distribution to make sense, some relatively substantial chunk of the standard library would need to be moved *out* to reside in the sumo distribution. Otherwise it's not really a "sumo", just a couple of modules that "nearly made it into the stdlib", at least for the near-to-medium term. I've yet to see any sort of consensus that python-dev is willing to undertake that decoupling work. (Which would include extracting the various tests, migrating bugs out of the pythion tracker, etc etc).
* If the decoupled modules aren't simply being abandoned, python-dev needs to continue to commit to supporting them "in the wild" (i.e., on PyPI and in the sumo distribution). Otherwise we're just abandoning existing users and saying "support it yourself". I've seen no indication that python-dev members would expect to follow bug trackers for various decoupled modules - so in practice, this sounds more like abandonment than decoupling.
Until a stdlib-decoupling proposal which takes these aspects into account is on the table, I'm afraid that suggesting there's a "Sumo distribution" style middle ground between stdlib and PyPI isn't really true...
Paul.
The fat vs. thin stdlib was discussed on stdlib-sig some time ago (I am generally +1 to having a thin dist and a secondary "fatter" dist); however, right now it doesn't make sense: packaging and dependency management is still a mess (but getting better), and there's a ton of other things to take into consideration, some of which have been iterated in this thread.

That being said, we've now evolved into meta-meta-meta-discussion - if people seriously want to discuss the fat vs. thin subject, it should probably go to stdlib-sig.

jesse
On 26/05/2010 13:19, Paul Moore wrote:
On 26 May 2010 11:56, Antoine Pitrou<solipsis@pitrou.net> wrote:
On Wed, 26 May 2010 20:42:12 +1000 Steven D'Aprano<steve@pearwood.info> wrote:
I'm not saying that Python-Dev should bend over backwards to accommodate such people to the exclusion of all else, but these folks are stakeholders too, and their wants and needs are just as worthy as the wants and needs of those who prefer a more conservative approach to the standard library.
Well, my "Sumo" proposal was a serious one. (not serious in that I would offer to give a hand, but in that I think it could help those people; also, wouldn't it be sensible for users in a corporate environment to share their efforts and produce something that can benefit all of them? it's the free software spirit after all)
I'm not sure how a "Sumo" approach would work in practical terms, and this thread isn't really the place to discuss, but there's a couple of points I think are worth making:
* For a "Sumo" distribution to make sense, some relatively substantial chunk of the standard library would need to be moved *out* to reside in the sumo distribution. Otherwise it's not really a "sumo", just a couple of modules that "nearly made it into the stdlib", at least for the near-to-medium term. I've yet to see any sort of consensus that python-dev is willing to undertake that decoupling work. (Which would include extracting the various tests, migrating bugs out of the pythion tracker, etc etc).
* If the decoupled modules aren't simply being abandoned, python-dev needs to continue to commit to supporting them "in the wild" (i.e., on PyPI and in the sumo distribution). Otherwise we're just abandoning existing users and saying "support it yourself". I've seen no indication that python-dev members would expect to follow bug trackers for various decoupled modules - so in practice, this sounds more like abandonment than decoupling.
Until a stdlib-decoupling proposal which takes these aspects into account is on the table, I'm afraid that suggesting there's a "Sumo distribution" style middle ground between stdlib and PyPI isn't really true...
Well... a middle ground certainly could exist; perhaps in the form of an "Extended Standard Library" (community distribution), with simple installation and management tools.

It could be "blessed" by python-dev and maintain a high standard (only well established best-of-breed modules with a commitment of ongoing maintenance and more than one maintainer - something that the stdlib itself doesn't stick to). A common license could even be chosen, potentially allowing corporations to approve the extended package in a single pass.

Lots of details to flesh out, obviously - but it would be great to see something like this come into being. Obviously this would need to be a community initiative and would take some time to establish.

A "fat" distribution like this, based on tools like pip and distribute, would be great for both newbies and for experienced programmers in making it easier to find "best" solutions for standard problems. It could also act as an incubator for the standard library (perhaps with stable and experimental streams where stable has a more conservative update policy).

All the best,

Michael Foord
Paul.
--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog
Well... a middle ground certainly could exist; perhaps in the form of an "Extended Standard Library" (community distribution), with simple installation and management tools.
It could be "blessed" by python-dev and maintain a high standard (only well established best-of-breed modules with a commitment of ongoing maintenance and more than one maintainer - something that the stdlib itself doesn't stick to). A common license could even be chosen, potentially allowing corporations to approve the extended package in a single pass.
I read the 'sumo' thread before I read this (and replied in depth there), but I think Michael and I mean similar things. - Yaniv
On Wed, May 26, 2010 at 7:15 PM, Yaniv Aknin <yaniv@aknin.name> wrote:
Well... a middle ground certainly could exist; perhaps in the form of an "Extended Standard Library" (community distribution), with simple installation and management tools.
I'm not sure about the 'installation and management tools' part, but this is basically the idea I was trying to articulate: a middle ground between a 'fat' stdlib and a 'lean' one.
It could be "blessed" by python-dev and maintain a high standard (only well established best-of-breed modules with a commitment of ongoing maintenance and more than one maintainer - something that the stdlib itself doesn't stick to). A common license could even be chosen, potentially allowing corporations to approve the extended package in a single pass.
If we could do it that would be great, IMHO.
I read the 'sumo' thread before I read this (and replied in depth there), but I think Michael and I mean similar things. - Yaniv
I don't think I'm understanding you correctly in that thread then, ISTM that you're advocating better packaging systems as an alternative to this. Would you mind clarifying? Geremy Condra
I don't think I'm understanding you correctly in that thread then, ISTM that you're advocating better packaging systems as an alternative to this. Would you mind clarifying?
Gladly. In my mind, 'better packaging' is not "just" about something that will let you do 'pypkg install foo' and 'pypkg remove foo' along with dependencies; it's also about meta-packages (packages that have nothing but dependencies) and about separate repositories with different natures, some endorsed by python-dev, some not. It's not so much the packaging system that does the trick - it's the ecosystem around it.

When you install something from the 'recommended' repository (should find a better name), you know it's a 'blessed' package. You know it has unit tests, documentation, active development, a good license, a bug tracker and the general soundness which I hope users have come to expect of python-dev. It is not stdlib, it's not made by python-dev, it doesn't come with python.org's default build by default, but you have a lot of important assurances with ease. You know 'it could have been in stdlib, but happens not to be'.

To make this really work, the future tool I called "pypkg" will come with all future Python versions (so people can rely on it), and will arrive configured by default to use only the 'recommended' repository. From a developer's standpoint, relying on package 'foo' from the 'recommended' repository is safe: even if their end users don't happen to have 'foo', the developer is certain that 'recommended' is configured on their users' Python and is a quick (automagical?) download away.

- Yaniv

(p.s.: again, think Ubuntu core vs universe vs multiverse vs. non-Ubuntu repositories vs. PPAs; forgive me for waving the Ubuntu flag - of the packaging systems /and accompanying ecosystems/ I know, they did it best)
On Wednesday, 26 May 2010 at 13:19 +0100, Paul Moore wrote:
I'm not sure how a "Sumo" approach would work in practical terms, and this thread isn't really the place to discuss, but there's a couple of points I think are worth making:
* For a "Sumo" distribution to make sense, some relatively substantial chunk of the standard library would need to be moved *out* to reside in the sumo distribution. Otherwise it's not really a "sumo", just a couple of modules that "nearly made it into the stdlib", at least for the near-to-medium term.
This is not what I'm suggesting at all. The stdlib wouldn't shrink (well, we could dump outdated modules but that's a separate decision).

A hypothetical "Sumo" distribution would be made of those more or less useful modules that many people (application or framework developers; for example, take a look at the dozens of direct and indirect dependencies TurboGears pulls in) need and install.

The whole point is that it would *not* be supported by python-dev itself but by a separate group of people (such as you :-)). And it would have its own rules (surely more relaxed) over inclusion or deprecation delays, the ability to make compatibility-breaking changes, release timings, multiple obvious ways to do the same thing, etc.

And it means it could really bundle a lot of third-party stuff: small helpers (things like the decorator module), event loops, template engines, network address abstractions, argument parsers, ORMs, UI toolkits, etc.

(a side-effect would be that it could, if it works well, behave as a good intermediate stress test - a purgatory, if you want - for modules before they integrate the stdlib)

If you want an existing analogy, you could look at EasyPHP. Or think of Python as the Gnome or KDE project (consistent and aiming at providing the most important everyday tools, but quite focused), and "Sumo" as an entire distribution of disparate Linux GUI apps.

Regards

Antoine.
On Wed, May 26, 2010 at 5:46 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Wednesday, 26 May 2010 at 13:19 +0100, Paul Moore wrote:
I'm not sure how a "Sumo" approach would work in practical terms, and this thread isn't really the place to discuss, but there's a couple of points I think are worth making:
* For a "Sumo" distribution to make sense, some relatively substantial chunk of the standard library would need to be moved *out* to reside in the sumo distribution. Otherwise it's not really a "sumo", just a couple of modules that "nearly made it into the stdlib", at least for the near-to-medium term.
This is not what I'm suggesting at all. The stdlib wouldn't shrink (well, we could dump outdated modules but that's a separate decision).
A hypothetical "Sumo" distribution would be made of those more or less useful modules that many people (application or framework developers; for example, take a look at the dozens of direct and indirect dependencies TurboGears pulls in) need and install.
The whole point is that it would *not* be supported by python-dev itself but by a separate group of people (such as you :-)). And it would have its own rules (surely more relaxed) over inclusion or deprecation delays, the ability to make compatibility-breaking changes, release timings, multiple obvious ways to do the same thing, etc.
And it means it could really bundle a lot of third-party stuff: small helpers (things like the decorator module), event loops, template engines, network address abstractions, argument parsers, ORMs, UI toolkits, etc.
(a side-effect would be that it could, if it works well, behave as a good intermediate stress test - a purgatory, if you want - for modules before they integrate the stdlib)
If you want an existing analogy, you could look at EasyPHP. Or think of Python as the Gnome or KDE project (consistent and aiming at providing the most important everyday tools, but quite focused), and "Sumo" as an entire distribution of disparate Linux GUI apps.
Regards
Antoine.
I'd also point out that creating a sumo distribution would focus attention on the need to port those libraries which were a part of it to python3, which would help to weaken the argument that there aren't any packages written for it. Geremy Condra
On 26 May 2010 13:46, Antoine Pitrou <solipsis@pitrou.net> wrote:
This is not what I'm suggesting at all. The stdlib wouldn't shrink (well, we could dump outdated modules but that's a separate decision).
Ah, OK. In that case, I see the argument for a "Sumo" distribution as weak for a different reason - for general use, the standard library is (nearly!) sufficient, and ignoring specialised use cases, there aren't enough generally useful modules to warrant a "Sumo" distribution (you'd essentially be talking about stuff that "nearly made it into the stdlib", and there's not a huge amount of that).

Specialised distributions are another matter - I can see a "web stack" distribution comprising your TurboGears example (or should it be Django, or...?). Enthought essentially do that for a "Scientific Python" distribution. There could easily be others. But a general purpose "Sumo" distribution *on top of* the stdlib? I'm skeptical. (Personally, my "essential extras" are pywin32, cx_Oracle and that's about it - futures might make it if it doesn't get into the stdlib, but that's about all).

I'm genuinely struggling to see how a Sumo distribution ever comes into being under your proposal. There's no evidence that anyone wants it (otherwise it would have been created by now!!) and until it exists, it's not a plausible "place" to put modules that don't make it into the stdlib.

So (unless I'm missing something) your argument seems to be that if enough good stuff is rejected for stdlib inclusion, this will prompt the people who wanted that stuff included to create a sumo distribution, which addresses the "too many dependencies is bad" argument for inclusion in the stdlib. That sounds like a suspiciously circular argument to me...

Paul.
On Wednesday, 26 May 2010 at 23:41 +0100, Paul Moore wrote:
But a general purpose "Sumo" distribution *on top of* the stdlib? I'm skeptical. (Personally, my "essential extras" are pywin32, cx_Oracle and that's about it - futures might make it if it doesn't get into the stdlib, but that's about all).
Well, unless you are short on megabytes or on download bandwidth, would you really care to get some modules you never use? (there are probably some in the stdlib too!)
I'm genuinely struggling to see how a Sumo distribution ever comes into being under your proposal. There's no evidence that anyone wants it (otherwise it would have been created by now!!) and until it exists, it's not a plausible "place" to put modules that don't make it into the stdlib.
I don't think all package owners dream of putting their software in the stdlib. It's likely some don't care, and it's likely some would actively refuse it (because e.g. they don't want to lose control). So the suggestion is not to somehow salvage packages which "don't make it into the stdlib", but to build a broader distribution of packages without necessarily having people bow to the constraints of stdlib inclusion. Of course, I agree with you that someone has to do it if they want it to happen :-) Regards Antoine.
On Wed, May 26, 2010 at 3:41 PM, Paul Moore <p.f.moore@gmail.com> wrote:
On 26 May 2010 13:46, Antoine Pitrou <solipsis@pitrou.net> wrote:
This is not what I'm suggesting at all. The stdlib wouldn't shrink (well, we could dump outdated modules but that's a separate decision).
Ah, OK. In that case, I see the argument for a "Sumo" distribution as weak for a different reason - for general use, the standard library is (nearly!) sufficient, and ignoring specialised use cases, there aren't enough generally useful modules to warrant a "Sumo" distribution (you'd essentially be talking about stuff that "nearly made it into the stdlib", and there's not a huge amount of that).
Specialised distributions are another matter - I can see a "web stack" distribution comprising your TurboGears example (or should it be Django, or...?). Enthought essentially do that for a "Scientific Python" distribution. There could easily be others. But a general purpose "Sumo" distribution *on top of* the stdlib? I'm skeptical. (Personally, my "essential extras" are pywin32, cx_Oracle and that's about it - futures might make it if it doesn't get into the stdlib, but that's about all).
I'm not clear, you seem to be arguing that there's a market for many augmented python distributions but not one. Why not just have one that includes the best from each domain?
I'm genuinely struggling to see how a Sumo distribution ever comes into being under your proposal. There's no evidence that anyone wants it (otherwise it would have been created by now!!)
Everything worth making has already been made?
and until it exists, it's not a plausible "place" to put modules that don't make it into the stdlib.
Of course its implausible to put something somewhere that doesn't exist... until it does.
So (unless I'm missing something) your argument seems to be that if enough good stuff is rejected for stdlib inclusion, this will prompt the people who wanted that stuff included to create a sumo distribution, which addresses the "too many dependencies is bad" argument for inclusion in the stdlib. That sounds like a suspiciously circular argument to me...
I'd say rather that there are a large number of specialized tools which aren't individually popular enough to be included in Python, but which when taken together greatly increase its utility, and that sumo offers a way to provide that additional utility to python's users without forcing python core devs to shoulder the maintenance burden. Geremy Condra
On 27/05/10 09:11, geremy condra wrote:
Specialised distributions are another matter - I can see a "web stack" distribution comprising your TurboGears example (or should it be Django, or...?). Enthought essentially do that for a "Scientific Python" distribution. There could easily be others. But a general purpose "Sumo" distribution *on top of* the stdlib? I'm skeptical. (Personally, my "essential extras" are pywin32, cx_Oracle and that's about it - futures might make it if it doesn't get into the stdlib, but that's about all).
I'm not clear, you seem to be arguing that there's a market for many augmented python distributions but not one. Why not just have one that includes the best from each domain?
Because scientists, financial analysts, web designers, etc. all have different needs.

A targeted distribution like Scientific Python will include nearly all the stuff a scientist is likely to need, but a financial analyst or web designer would find it lacking.

As Paul points out, the current size of the set of modules that are sufficiently general purpose and of high enough quality to qualify for python-dev's blessing, but wouldn't be suitable for inclusion in the normal standard library, is fairly small. Particularly when most developers are able to get sufficiently valuable modules from PyPI if they genuinely need them.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
---------------------------------------------------------------
On Wed, May 26, 2010 at 4:57 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 27/05/10 09:11, geremy condra wrote:
Specialised distributions are another matter - I can see a "web stack" distribution comprising your TurboGears example (or should it be Django, or...?). Enthought essentially do that for a "Scientific Python" distribution. There could easily be others. But a general purpose "Sumo" distribution *on top of* the stdlib? I'm skeptical. (Personally, my "essential extras" are pywin32, cx_Oracle and that's about it - futures might make it if it doesn't get into the stdlib, but that's about all).
I'm not clear, you seem to be arguing that there's a market for many augmented python distributions but not one. Why not just have one that includes the best from each domain?
Because scientists, financial analysts, web designers, etc all have different needs.
My point is just that a web designer probably doesn't care if he's got numpy, nor does a mathematician care if he has cherrypy onboard. They only care when the tools they need aren't there, which is where sumo can help.
A targeted distribution like Scientific Python will include nearly all the stuff a scientist is likely to need, but a financial analyst or web designer would find it lacking.
Seems like the same point as above, I might be missing something.
As Paul points out, the current size of the set of modules that are sufficiently general purpose and of high enough quality to qualify for python-dev's blessing, but wouldn't be suitable for inclusion in the normal standard library is fairly small. Particularly when most developers are able to get sufficiently valuable modules from PyPI if they genuinely need them.
Seems like the point is not to focus on the general purpose ones, but rather to include domain or task specific libraries, and libraries that are close to (but not at) the level where they would be considered for inclusion. Geremy Condra
Because scientists, financial analysts, web designers, etc all have different needs.
My point is just that a web designer probably doesn't care if he's got numpy, nor does a mathematician care if he has cherrypy onboard. They only care when the tools they need aren't there, which is where sumo can help.
Why do we want "distributions" (whether 'sumo' or domain specific) in the first place? Obviously because we have some pains to alleviate, but I think Linux distributions, particularly Ubuntu, have been coping with similar pains for a while now with good results. I used to be rather informed on the state of Linux distributions circa the end of the 90's. Should I use this distribution or that one? Now I don't care; the answer is always 'Ubuntu, and apt-get it to fit your needs'.

The pain list, as I see it:

(a) Make it easy to find functionality that you'd typically be tempted to implement alone
(b) Make it trivial, mind-bogglingly easy, to add this functionality to your interpreter once you know of it
(b.1) Make it easy to package a Python 'distribution' so end users don't need to muck around with (b)
(c) Make it easy for software to specify its requirements in code and get them mostly automagically (see the setup.py sketch below)
(d) Make it *legally* least painful to acquire and use endorsed functionality sets
... (pipe in if I missed something important)

I'm wondering if expanding the efforts in Python packaging ("easy_install" and PEP 376 come to mind, only on steroids) isn't the correct solution for alleviating these pains in a much better fashion. I'm not saying it's easy, but I think that superb packaging systems are a well proven technology, that they're well within the community's capabilities/needs and that maximizing flexibility is a move in a better direction than creating one or two or seven 'distributions'.

I hope you would intuitively grasp what I mean, even if I fail to articulate it well, if you have ever used dpkg and apt-get (or equivalents), along with the ability to add sources and the simple distinctions between sources (core vs. universe vs. multiverse), along with ecosystems like Launchpad and PPAs, etc. To my 1999 self these features of Ubuntu seem like miracles, not so much because Debian and dpkg/apt-get weren't there back then (they were life saving back then too), but because of the whole system's huge support base and centrally-designed, decentrally-implemented nature. 'apt-cache search' is usually how I find new software these days, and the same could be true for Python when I need futures or multiprocessing or whatever. I could write lots about what I think we should have in such a system, but maybe it's more befitting for python-ideas or a blog post than here; for this argument, 'mirror Ubuntu's winnage' is a good enough description. I think a good and lively packaging system beats 'distributions' any day.

This leaves us only with the legal pain, which I felt harshly on my own flesh following an acquisition by a corporation; it's painful and frustrating and mostly unreasonable. I suspect, though I'm a legal n00b (and proud of it), that packaging in multiple repositories like Ubuntu does could help a lot, along with taking some responsibility over the most recommended repo (candidates are the 'almost made it into stdlib' packages, maybe even the giants like Django, NumPy, TurboGears, Twisted, etc). This is not about development responsibility and surely not about taking legal liability; it's about actively helping facilitate these packages' use in hostile legal environments. While the hackers in the trenches working on trunk and py3k may not care much (and probably needn't care much), I humbly think the PSF may care more and is best equipped to help.
Maybe by requiring certain licenses for 'recommended' repo inclusion, maybe by working with the likes of Google or IBM on publishing legal reviews of select Python distributions - I don't know; it depends on what would help a corporate lawyer make a speedy approval.

- Yaniv
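The sketch referred to in pain point (c) above: declaring requirements in code is already roughly possible today with Distribute/setuptools, and a 'pypkg'-style tool could build on the same idea. The project and package names below are placeholders, not real requirements:

    # setup.py - a minimal sketch using Distribute/setuptools.
    from setuptools import setup, find_packages

    setup(
        name='example-app',          # placeholder project name
        version='0.1',
        packages=find_packages(),
        # Declared dependencies are resolved at install time (e.g. by
        # easy_install or pip), which is the "mostly automagical" part.
        install_requires=[
            'futures>=1.0',          # hypothetical pin on the PyPI backport
            'simplejson>=2.0',       # placeholder dependency
        ],
    )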
Hello, sorry to interrupt your discussion but.. On Thu, May 27, 2010 at 04:09, Yaniv Aknin <yaniv@aknin.name> wrote:
Because scientists, financial analysts, web designers, etc all have different needs.
My point is just that a web designer probably doesn't care if he's got numpy, nor does a mathematician care if he has cherrypy onboard. They only care when the tools they need aren't there, which is where sumo can help.
Why do we want "distributions" (whether 'sumo' or domain specific) in the first place? Obviously because we have some pains to alleviate, but I think Linux distributions, particularly Ubuntu, have been coping with similar pains for a while now with good results. I used to be rather informed on the state of Linux distributions circa the end of the 90's. Should I use this distribution or that one? Now I don't care, the answer is always 'Ubuntu, and apt-get it to fit your needs'.
please note that the huge bunch of work for Python third-party modules is done in *Debian*, and Ubuntu just takes those packages without advertising enough where they come from and who did the work (and not just the merchandising).
I hope you would intuitively grasp what I mean, even if I fail to articulate it well, if you ever used dpkg and apt-get (or equivalents), along with the ability to add sources and the simple distinctions between sources (core vs. universe vs. multiverse), along with eco systems like launchpad and PPAs, etc. To my 1999 self these features of Ubuntu seem like miracles, not so much because Debian and dpkg/apt-get weren't there back then (they were life saving back then too), but because of the whole system's huge support base and centrally-designed decentrally-implemented nature.
mh? Debian was not present in 1999? Debian started in 1993 (dpkg in 1994 and apt-get in 1998), while Ubuntu started only in 2004, as a layer over Debian packages (and hiring several Debian Developers for its core positions).

Also, let me remind you that the transition to Python 2.6 was a huge mess for Ubuntu, where several packages were just left broken (e.g. failing unit tests were disabled to make the package build...), and only now that Debian has started to migrate to 2.6 too can you see a "flow" of packages that work for 2.6 coming to Ubuntu. Debian can be slow, but we care about quality.

End of "give credit where it's due" post :)

Regards,
--
Sandro Tosi (aka morph, morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi
One worry with an official sumo distribution is that it could become an excuse for *not* putting something in the stdlib. Otherwise it's an interesting idea. -- Lennart Regebro: Python, Zope, Plone, Grok http://regebro.wordpress.com/ +33 661 58 14 64
Lennart Regebro writes:
One worry with an official sumo distribution is that it could become an excuse for *not* putting something in the stdlib. Otherwise it's an interesting idea.
On the contrary, that is the meat of why it's an interesting idea.

I really don't think the proponents of ipaddr and futures (to take two recent PEPs) would have been willing to stop with a hypothetical sumo. Both of those packages were designed with general use in mind. Substantial effort was put into making them TOOWTDI-able. Partly that's pride ("my stuff is good enough for the stdlib"), and partly there's a genuine need for it to be there (for your customers or just to pay back the community). Of course there was a lot of criticism of both that they don't really come up to that standard, but even opponents would credit the proponents for good intentions and making the necessary effort, I think. And it's the stdlib that (in a certain sense) puts the "OO" in "TOOWTDI".

On the other hand, some ideas deserve widespread exposure, but they need real experience because the appropriate requirements and specs are unclear. It would be premature to put in the effort to make them TOOWTDI. However, to get the momentum to become BCP, and thus an obvious candidate for stdlib inclusion, it's helpful to be *already* available on *typical* installations. PyPI is great, but it's not quite there; it's not as discoverable and accessible as simply putting "import stuff" based on some snippet you found on the web. And the stdlib itself can't be the means, it's the end.

At present, such ideas face the alternative "stdlib or die". The sumo would give them a place to be.
On 27 May 2010 00:11, geremy condra <debatem1@gmail.com> wrote:
I'm not clear, you seem to be arguing that there's a market for many augmented python distributions but not one. Why not just have one that includes the best from each domain?
Because that's "bloat". You later argue that a web designer wouldn't care if his "distribution" included numpy. OK, maybe, but if my needs are simply futures, cx_Oracle and pywin32, I *would* object to downloading many megabytes of other stuff just to get those three. It's a matter of degree.
I'm genuinely struggling to see how a Sumo distribution ever comes into being under your proposal. There's no evidence that anyone wants it (otherwise it would have been created by now!!)
Everything worth making has already been made?
Not what I'm saying. But if/when it happens, something will trigger it. I see no sign of such a trigger. That's all I'm saying.
and until it exists, it's not a plausible "place" to put modules that don't make it into the stdlib.
Of course its implausible to put something somewhere that doesn't exist... until it does.
Hence my point - people are saying futures don't belong in the stdlib but they could go in a sumo distribution. The second half of that statement is (currently) content free if not self-contradictory.
I'd say rather that there are a large number of specialized tools which aren't individually popular enough to be included in Python, but which when taken together greatly increase its utility, and that sumo offers a way to provide that additional utility to python's users without forcing python core devs to shoulder the maintenance burden.
I don't believe that there's evidence that aggregation (except in the context of specialist areas) does provide additional utility. (In the context of the discussion that sparked this debate, that contrasts with inclusion in the stdlib, which *does* offer additional utility - "batteries included", guaranteed and tested cross-platform functioning, a statement of best practice, etc etc).

Paul.

PS One thing I haven't seen made clear - in my view, the hypothetical "sumo" is a single aggregated distribution of Python modules/packages/extensions. It would NOT include core Python and the stdlib (in contrast to Enthought or ActivePython). I get the impression that other people may be thinking in terms of a full Python distribution, like those 2 cases. We probably ought to be clear which we're talking about.
OK, I had an idea here: How about the people affected by difficulties in getting software approved get together to put together not a sumo-python, but a python-extras package?

That package could include all the popular stuff, like SciPy, Numpy, twisted, distribute, buildout, virtualenv, pip, pytz, PIL, openid, docutils, simplejson, nose, genshi, and tons of others. That would be a big download. But here's the trick: You don't *have* to install them! Just bundle all of it.

If licensing is a problem I guess you'd need to have permission to relicense them all to the Python license, which would be problematic. But otherwise having a team of people overseeing and bundling all this might not be that much work, and you'd avoid the bloat by not installing all of it. :-)

Or would this not fool the company trolls?

--
Lennart Regebro: Python, Zope, Plone, Grok
http://regebro.wordpress.com/
+33 661 58 14 64
Lennart Regebro writes:
If licensing is a problem I guess you'd need to have permission to relicense them all to the Python license,
Licensing compatibility is only a problem for copyleft, but most copyleft licenses have "mere aggregation is not derivation" clauses. Corporate concern about *knowing* what the license is, is a problem. The XEmacs experience I've referred to elsewhere doesn't apply because all our stuff is GPL, and therefore all our stuff has to be GPL. :-(

It's not obvious to me what the resolution is, although lots of distributions now have some way to find out what licenses are. GCC (and soon GNU Emacs) even have a way to check GPL-compatibility at runtime (inspired by the Linux kernel feature, maybe?)

Perhaps the sumo infrastructure could provide a license-ok-or-fatal feature. Eg, the application would do something like

    sumo_ok_licenses = ['gplv2', 'bsd', 'microsoft_eula']

and the sumo version of the package's __init__.py would do

    sumo_check_my_license('artistic')

and raise LicenseError if it's not in sumo_ok_licenses. In theory it might be able to do more complex stuff like keep track of declared licenses and barf if they're incompatible.

This scheme probably doesn't save lawyer time. The lawyers would still have to go over the actual packages to make sure the licenses are what they say etc. before going into distribution. Its selling point is that the developers would be warned of problems that need corporate legal's attention early in 90% of the cases, thus not wasting developer time on using packages that were non-starters because of license issues.
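A minimal sketch of that license-ok-or-fatal hook; every name here (sumo_ok_licenses, sumo_check_my_license, LicenseError) is the hypothetical one from the paragraph above, not an existing API:

    class LicenseError(Exception):
        # Raised when a package declares a license the application rejects.
        pass

    # The application declares which licenses it is willing to accept.
    sumo_ok_licenses = ['gplv2', 'bsd', 'microsoft_eula']

    def sumo_check_my_license(license_name):
        # Called from the sumo version of a package's __init__.py; fails fast
        # so developers hear about license problems early, before the package
        # is built into anything that needs corporate legal review.
        if license_name.lower() not in sumo_ok_licenses:
            raise LicenseError('license %r is not in the approved list %r'
                               % (license_name, sumo_ok_licenses))

    # In a package's __init__.py:
    sumo_check_my_license('artistic')   # raises LicenseError given the list above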
which would be problematic. But otherwise having a team of people overseeing and bundling all this might not be that much work, and you'd avoid the bloat by not installing all of it. :-)
As I've argued elsewhere, bloat is good, for some purposes.
Or would this not fool the company trolls?
It will satisfy some, and not others, in my experience, described elsewhere.
Paul Moore writes:
On 27 May 2010 00:11, geremy condra <debatem1@gmail.com> wrote:
I'm not clear, you seem to be arguing that there's a market for many augmented python distributions but not one. Why not just have one that includes the best from each domain?
Because that's "bloat". You later argue that a web designer wouldn't care if his "distribution" included numpy. OK, maybe, but if my needs are simply futures, cx_Oracle and pywin32, I *would* object to downloading many megabytes of other stuff just to get those three. It's a matter of degree.
So don't do that. Go to PyPI and get just what you need.

The point of the sumo is that there are people and organizations with more bandwidth/diskspace than brains (or to be more accurate, they have enough bandwidth that optimizing bandwidth is a poor use of their brains).

XEmacs has used a separate distribution for packages for over a decade, and it's been a very popular feature. Since originally all packages were part of Emacs (and still are in GNU Emacs), the package distribution is a single source hierarchy (like the Python stdlib). So there are three ways of acquiring packages: check out the sources and build and install them, download individual pre-built packages, and download the sumo of all pre-built packages. The sumos are very popular.

The reason is simple. A distribution of all Emacs packages ever made would still probably be under 1GB. This just isn't a lot of bandwidth or disk space when people are schlepping around DVD images, even BD images. A Python sumo would probably be much bigger (multiple GB) than XEmacs's (about 120MB, IIRC), but it would still be a negligible amount of resources *for some people/organizations*.

And I have to support the "organizational constraints" argument here. Several people have told me that (strangely enough, given its rather random nature, both in what is provided and the quality) getting the sumo certified by their organization was less trouble than getting XEmacs itself certified, and neither was all that much more effort than getting a single package certified.

Maintaining a sumo would be a significant effort. The XEmacs rule is that we allow someone to add a package to the distro if they promise to maintain it for a couple years, or if we think it matters enough that we'll accept the burden. We're permissive enough that there are at least 4 different MUAs in the distribution, several IRC clients, two TeX modes, etc, etc. Still, just maintaining contact with "external maintainers" (who do go AWOL regularly), and dealing with issues where somebody wants to upgrade (eg) "vcard" which is provided by "gnus" but doesn't want to deal with "gnus", etc takes time, thought, and sometimes improvement in the distribution infrastructure.

It's not clear to me that Python users would benefit that much over and above the current stdlib, which provides a huge amount of functionality, of somewhat uneven but generally high quality. But I certainly think significant additional benefit would be gained; the question is whether it's worth the effort. It's worth discussing.
I don't believe that there's evidence that aggregation (except in the context of specialist areas) does provide additional utility.
We'll just have to agree to disagree, then. Plenty of evidence has been provided; it just doesn't happen to apply to you. Fine, but I wish you'd make the "to me" part explicit, because I know that it does apply to others, many of them, from their personal testimony, both related to XEmacs and to Python.
PS One thing I haven't seen made clear - in my view, the hypothetical "sumo" is a single aggregated distribution of Python modules/packages/extensions. It would NOT include core Python and the stdlib (in contrast to Enthought or ActivePython). I get the impression that other people may be thinking in terms of a full Python distribution, like those 2 cases. We probably ought to be clear which we're talking about.
On the XEmacs model, it would not include core Python, but it would include much of the stdlib. The reason is that the stdlib makes commitments to compatibility that the sumo would not need to. So the sumo might include (a) recent, relatively experimental versions of stdlib packages (yes, this kind of duplication is a pain, but (some) users do want it) and (b) packages which are formally separate but duplicate functionality in the stdlib (eg, ipaddr and netaddr) -- in some cases the sumo distro would want to make adjustments so they can co-exist. I wouldn't recommend building a production system on top of a sumo in any case. But (given resources to maintain multiple Python development installations) it is a good environment for experimentation, because not only batteries but screwdrivers and duct tape are supplied.
On 27/05/2010 16:56, Stephen J. Turnbull wrote:
Paul Moore writes:
On 27 May 2010 00:11, geremy condra<debatem1@gmail.com> wrote:
I'm not clear, you seem to be arguing that there's a market for many augmented python distributions but not one. Why not just have one that includes the best from each domain?
Because that's "bloat". You later argue that a web designer wouldn't care if his "distribution" included numpy. OK, maybe, but if my needs are simply futures, cx_Oracle and pywin32, I *would* object to downloading many megabytes of other stuff just to get those three. It's a matter of degree.
So don't do that. Go to PyPI and get just what you need.
The point of the sumo is that there are people and organizations with more bandwidth/diskspace than brains (or to be more accurate, they have enough bandwidth that optimizing bandwidth is a poor use of their brains).
To my mind one of the most important benefits of a "sumo" style distribution is not just that it easily provides a whole bunch of useful modules - but that it *highlights* which modules are the community blessed "best of breed".

At the moment if a new user wants to work out how to achieve a particular task (work with images for example) they have to google around and try and work out what the right module to use is. For some problem domains there are a host of modules on PyPI, many of which are unmaintained, immature or simply rubbish. A standardised solution makes choosing solutions for common problems *dramatically* easier, and may save people much heartache and frustration.

For that to work though it needs to be well curated and genuinely have the substantial backing of the Python development community.

All the best,

Michael

--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog
Michael Foord writes:
To my mind one of the most important benefits of a "sumo" style distribution is not just that it easily provides a whole bunch of useful modules - but that it *highlights* which modules are the community blessed "best of breed".
That has several problems. (1) There is a lot of overlap with the mission of the stdlib, and I think confusion over roles would be quite costly. (2) As the stdlib demonstrates, picking winners is expensive. I greatly doubt that running *two* such processes is worthwhile. (3) Very often there is no best of breed.
On 27 May 2010 16:56, Stephen J. Turnbull <stephen@xemacs.org> wrote:
We'll just have to agree to disagree, then. Plenty of evidence has been provided; it just doesn't happen to apply to you. Fine, but I wish you'd make the "to me" part explicit, because I know that it does apply to others, many of them, from their personal testimony, both related to XEmacs and to Python.
Sorry, you're right. There's a very strong "to me" in all of this, but I more or less assumed it was obvious, as I was originally responding to comments implying that a sumo distribution was a solution to a problem I stated that I have. In trying to trim things, and keep things concise, I completely lost the context. My apologies.
I wouldn't recommend building a production system on top of a sumo in any case. But (given resources to maintain multiple Python development installations) it is a good environment for experimentation, because not only batteries but screwdrivers and duct tape are supplied.
That's an interesting perspective that I hadn't seen mentioned before. For experimentation, I'd *love* a sumo distribution as you describe. But I thought this whole discussion focussed around building production systems. For that, the stdlib's quality guarantees are a major benefit, and the costs of locating and validating appropriately high-quality external packages are (sometimes prohibitively) high. But I think I'm getting to the point where I'm adding more confusion than information, so I'll bow out of this discussion at this point. Paul.
On 26/05/10 20:56, Antoine Pitrou wrote:
On Wed, 26 May 2010 20:42:12 +1000 Steven D'Aprano<steve@pearwood.info> wrote:
I'm not saying that Python-Dev should bend over backwards to accommodate such people to the exclusion of all else, but these folks are stakeholders too, and their wants and needs are just as worthy as the wants and needs of those who prefer a more conservative approach to the standard library.
Well, my "Sumo" proposal was a serious one. (not serious in that I would offer to give a hand, but in that I think it could help those people; also, wouldn't it be sensible for users in a corporate environment to share their efforts and produce something that can benefit all of them? it's the free software spirit after all)
That's actually what happens with groups like Enthought and ActiveState - bundles with extra batteries.

However, note that this isn't just a dysfunctional corporate culture issue, and I object to it being characterised as such (although dysfunctional cultures can certainly make it much, much worse). Vetting licenses for due diligence reasons, tracking releases of an external module, familiarising yourself with an additional API and code base, the risk of encountering bugs in that code base... these are all real costs that don't go away no matter how good the Python packaging ecosystem becomes.

There is a trade off between "do the simplest thing that could possibly work (but may cause you problems later)" and spending the time to investigate third party solutions (with the risk that you end up rolling your own later anyway if you don't find anything suitable or, worse, find something that initially seems suitable but proves unworkable in practice).

A module that makes it into the standard library, however, carries python-dev's stamp of approval. Except for some older legacy libraries, that means a module will have at least half decent documentation and an automated test suite that is regularly run on multiple platforms. Its design will also have run the gauntlet of python-dev approval.

If we identify a good solution to a standard problem, and we have reason to believe that posting it on PyPI as a separate module won't lead to a significant amount of additional real world testing, then it makes sense for it to go straight into the standard library. Such modules are going to be rare (since most non-trivial modules *will* benefit from some time on PyPI, and most trivial modules won't be added to the standard library at all), but they do exist (runpy, contextlib, collections, itertools and abc spring to mind).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
---------------------------------------------------------------
On Wed, May 26, 2010 at 6:56 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Wed, 26 May 2010 20:42:12 +1000 Steven D'Aprano <steve@pearwood.info> wrote:
I'm not saying that Python-Dev should bend over backwards to accommodate such people to the exclusion of all else, but these folks are stakeholders too, and their wants and needs are just as worthy as the wants and needs of those who prefer a more conservative approach to the standard library.
Well, my "Sumo" proposal was a serious one. (not serious in that I would offer to give a hand, but in that I think it could help those people; also, wouldn't it be sensible for users in a corporate environment to share their efforts and produce something that can benefit all of them? it's the free software spirit after all)
Not in a corporate environment, but I would gladly help with this. Geremy Condra
Nick Coghlan writes:
On 26/05/10 13:51, Stephen J. Turnbull wrote:
People have been asking "what's special about this module, to violate the BCP principle?" There's nothing special about the fact that several people would use a "robust and debugged" futures module if it were in the stdlib. That's true of *every* module that is worth a PEP.
The trick with futures and executor pools is that they're a *better* way of programming with threads in many cases.
and
However, given the choices of [...]. I'll choose the first option every time, and my programs will be the worse for it.
Again, nothing all that special about those; lots of proposed changes satisfy similar conditions. I don't think anyone denies the truth or applicability of those arguments. But are they enough? Really, what you're arguing is "now is better than never." Indeed, that is so. But you shouldn't forget that is immediately followed by "although never is often better than *right* now."
On 26 May 2010, at 18:44, Stephen J. Turnbull wrote:
Nick Coghlan writes:
On 26/05/10 13:51, Stephen J. Turnbull wrote:
People have been asking "what's special about this module, to violate the BCP principle?" There's nothing special about the fact that several people would use a "robust and debugged" futures module if it were in the stdlib. That's true of *every* module that is worth a PEP.
The trick with futures and executor pools is that they're a *better* way of programming with threads in many cases.
and
However, given the choices of [...]. I'll choose the first option every time, and my programs will be the worse for it.
Again, nothing all that special about those; lots of proposed changes satisfy similar conditions. I don't think anyone denies the truth or applicability of those arguments. But are they enough?
Really, what you're arguing is "now is better than never." Indeed, that is so. But you shouldn't forget that is immediately followed by "although never is often better than *right* now."
I've been trying to stay out of the meta-discussions but "*right* now" would be >6 months if it applies in this context. If that is what "*right* now" means to you then I hope that I never have a heart attack in your presence and need an ambulance *right* now :-) Cheers, Brian
Brian Quinlan wrote:
The good news in this case is that the same API has been used successfully in Java and C++ for years so it is unlikely that any major changes will need to be made.
That doesn't follow. An API that's appropriate for Java or C++ is not necessarily appropriate for Python. Slavishly copying an API from another language is often not the best approach when designing an API for a Python module. -- Greg
Am 24.05.2010 01:51, schrieb Greg Ewing:
Brian Quinlan wrote:
The good news in this case is that the same API has been used successfully in Java and C++ for years so it is unlikely that any major changes will need to be made.
That doesn't follow. An API that's appropriate for Java or C++ is not necessarily appropriate for Python. Slavishly copying an API from another language is often not the best approach when designing an API for a Python module.
*cough* unittest *cough* Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.
Brian Quinlan wrote:
On May 23, 2010, at 2:44 PM, Glyph Lefkowitz wrote:
[...]
One minor suggestion on the "internal future methods" bit - something I wish we'd done with Deferreds was to put 'callback()' and 'addCallbacks()' on separate objects, so that it was very explicit whether you were on the emitting side of a Deferred or the consuming side. That seems to be the case with these internal methods - they are not so much "internal" as they are for the producer of the Future (whether a unit test or executor) so you might want to put them on a different object that it's easy for the thing creating a Future() to get at but hard for any subsequent application code to fiddle with by accident. Off the top of my head, I suggest naming it "Invoker()". A good way to do this would be to have an Invoker class which can't be instantiated (raises an exception from __init__ or somesuch), then a Future.create() method which returns an Invoker, which itself has a '.future' attribute.
Finally, why isn't this just a module on PyPI? It doesn't seem like there's any particular benefit to making this a stdlib module and going through the whole PEP process - except maybe to prompt feedback like this :).
We've already had this discussion before. Could you explain why this module should *not* be in the stdlib e.g. does it have significantly less utility than other modules in stdlib? Is it significantly higher risk? etc?
Given that its author was ready to go for pronouncement and is still responding to pretty serious philosophical questions about the API I'd say that it was at least worth talking about. The thing that's needed (isn't it?) of stdlib modules is API stability.
Issues like the ones I'm bringing up could be fixed pretty straightforwardly if it were just a matter of filing a bug on a small package, but fixing a stdlib module is a major undertaking.
True but I don't think that is a convincing argument. A subset of the functionality provided by this module is already available in Java and C++ and (at least in Java) it is used extensively and without too much trouble. If there are implementation bugs then we can fix them just like we would with any other module.
I don't see the availability of this functionality in those languages as any kind of reason why this needs to go into the stdlib now. Is there some desperate rush to get it in? If it were used extensively from PyPi *that* would be a recommendation ... regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video! http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000
On Sun, May 23, 2010 at 7:52 AM, Steve Holden <steve@holdenweb.com> wrote: ...snip...
Issues like the ones I'm bringing up could be fixed pretty straightforwardly if it were just a matter of filing a bug on a small package, but fixing a stdlib module is a major undertaking.
True but I don't think that is a convincing argument. A subset of the functionality provided by this module is already available in Java and C++ and (at least in Java) it is used extensively and without too much trouble. If there are implementation bugs then we can fix them just like we would with any other module.
I don't see the availability of this functionality in those languages as any kind of reason why this needs to go into the stdlib now. Is there some desperate rush to get it in? If it were used extensively from PyPi *that* would be a recommendation ...
Not picking Steve's particular comments out - but Brian cites the previous discussions in the PEP itself: http://www.python.org/dev/peps/pep-3148/ All of you questioning "Why should this be in the standard library" should go read those old threads, where that question was answered numerous times. Now I suddenly regret leaving the floodgates open, as we're rapidly rehashing discussions from months ago. For this same mailing list only a few months ago (brian, I think this link should be added to the PEP, I didn't see it): http://mail.python.org/pipermail/python-dev/2010-March/098169.html Specifically: http://mail.python.org/pipermail/python-dev/2010-March/098173.html Quote: "Baloney. A young library providing some syntactic sugar which uses primitives in the standard library to implement a common pattern is fine for a PEP. We've hashed this out pretty heavily on the stdlib-sig list prior to bringing it here. By the same argument, we should shunt all of the recent unittest changes and improvements into space, since golly, other people did it, why should we. This is something relatively simple, which I would gladly add in an instant to the multiprocessing package - but Brian's one-upped me in that regard and is providing something which works with both threads and processes handily. Take a look at multiprocessing.Pool for example - all that is some sugar on top of the primitives, but it's good sugar, and is used by a fair number of people. Let me also state - "my" vision of where futures would live would be in a concurrent package - for example: from concurrent import futures The reason *why* is that I would like to also move the abstractions I have in multiprocessing *out* of that module, make them work with both threads and processes (if it makes sense) and reduce the multiprocessing module to the base primitive Process object. A concurrent package which implements common patterns built on top of the primitives we support is an objectively Good Thing. For example, how many of us have sat down and implemented a thread pool on top of threading, I would hazard to say that most of us who use threading have done this, and probably more than once. It stands to reason that this is a common enough pattern to include in the standard library. " Brian has already agreed to name spacing it to "concurrent.futures" - this means it will be a small part to a much larger concurrent.* implementation ala Java. So, in short - given we've already hashed the reasoning out. jesse
On Sun, 23 May 2010 08:34:22 -0400 Jesse Noller <jnoller@gmail.com> wrote:
Brian has already agreed to name spacing it to "concurrent.futures" - this means it will be a small part to a much larger concurrent.* implementation ala Java.
What I would question here is what other things will be part of the "concurrent" package, and who will implement them. Are there plans for that? (or even tracker issues open?) Apart from that, it seems to me that the only serious issues blocking PEP approval are Glyph's interesting remarks. Regards Antoine.
On 23/05/10 22:47, Antoine Pitrou wrote:
On Sun, 23 May 2010 08:34:22 -0400 Jesse Noller<jnoller@gmail.com> wrote:
Brian has already agreed to name spacing it to "concurrent.futures" - this means it will be a small part to a much larger concurrent.* implementation ala Java.
What I would question here is what other things will be part of the "concurrent" package, and who will implement them. Are there plans for that? (or even tracker issues open?)
I'm not sure it is called out explicitly in the PEP, but the specific example that came up in the previous discussions was something like "concurrent.pool" to hold a thread vs process agnostic worker pool interface based on the existing Pool interface in multiprocessing (with concrete implementations for both threading and multiprocessing). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
On Tue, May 25, 2010 at 7:54 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 23/05/10 22:47, Antoine Pitrou wrote:
On Sun, 23 May 2010 08:34:22 -0400 Jesse Noller<jnoller@gmail.com> wrote:
Brian has already agreed to name spacing it to "concurrent.futures" - this means it will be a small part to a much larger concurrent.* implementation ala Java.
What I would question here is what other things will be part of the "concurrent" package, and who will implement them. Are there plans for that? (or even tracker issues open?)
I'm not sure it is called out explicitly in the PEP, but the specific example that came up in the previous discussions was something like "concurrent.pool" to hold a thread vs process agnostic worker pool interface based on the existing Pool interface in multiprocessing (with concrete implementations for both threading and multiprocessing).
Nick is correct - there's plenty of things in multiprocessing which belong in a more abstract package as they're useful for more things than just multiprocessing. I don't think they need to be called out as part of the PEP though. jesse
On Wed, 26 May 2010 09:54:13 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
What I would question here is what other things will be part of the "concurrent" package, and who will implement them. Are there plans for that? (or even tracker issues open?)
I'm not sure it is called out explicitly in the PEP, but the specific example that came up in the previous discussions was something like "concurrent.pool" to hold a thread vs process agnostic worker pool interface based on the existing Pool interface in multiprocessing (with concrete implementations for both threading and multiprocessing).
Ha, I'm a bit surprised. Isn't it what "futures" already provides? (except that for some reason it insists on the "SomeExecutor" naming scheme) http://www.python.org/dev/peps/pep-3148/#processpoolexecutor Regards Antoine.
On 26/05/10 17:38, Antoine Pitrou wrote:
On Wed, 26 May 2010 09:54:13 +1000 Nick Coghlan<ncoghlan@gmail.com> wrote:
What I would question here is what other things will be part of the "concurrent" package, and who will implement them. Are there plans for that? (or even tracker issues open?)
I'm not sure it is called out explicitly in the PEP, but the specific example that came up in the previous discussions was something like "concurrent.pool" to hold a thread vs process agnostic worker pool interface based on the existing Pool interface in multiprocessing (with concrete implementations for both threading and multiprocessing).
Ha, I'm a bit surprised. Isn't it what "futures" already provides? (except that for some reason it insists on the "SomeExecutor" naming scheme) http://www.python.org/dev/peps/pep-3148/#processpoolexecutor
Not really - a general purpose pool would be a lot more agnostic about how you give the pooled threads/processes work to do and get the results back. Executors are the kind of thing you would build on top of one though. If concurrent.pool was added, then the existing processing pools in multiprocessing and the executors in concurrent.futures would be the first use cases for it. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
On Wed, 26 May 2010 22:32:33 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
Ha, I'm a bit surprised. Isn't it what "futures" already provides? (except that for some reason it insists on the "SomeExecutor" naming scheme) http://www.python.org/dev/peps/pep-3148/#processpoolexecutor
Not really - a general purpose pool would be a lot more agnostic about how you give the pooled threads/processes work to do and get the results back.
Executors are the kind of thing you would build on top of one though. If concurrent.pool was added, then the existing processing pools in multiprocessing and the executors in concurrent.futures would be the first use cases for it.
I think I'm a bit ignorant, but how is the Executor abstraction (and its proposed implementations) not generic enough? You have a pool, submit one or several tasks, and can either repeatedly poll for completion or do a blocking wait. (after all, Glyph pointed out that it should be quite easy to wrap the resulting Futures into Deferred objects) cheers Antoine.
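A minimal sketch of that submit-then-wait-or-poll pattern, assuming the concurrent.futures spelling agreed to elsewhere in the thread (this only illustrates the PEP's API; it is not code from the reference implementation):

import time
from concurrent import futures

def work(n):
    time.sleep(n)
    return n

executor = futures.ThreadPoolExecutor(max_workers=2)
fs = [executor.submit(work, n) for n in (1, 2, 3)]

# Blocking wait: returns the (done, not_done) sets described in the PEP.
done, not_done = futures.wait(fs, return_when=futures.FIRST_COMPLETED)

# Or consume results as they complete.
for f in futures.as_completed(fs):
    print(f.result())

executor.shutdown(wait=True)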
On 26 May 2010, at 22:50, Antoine Pitrou wrote:
On Wed, 26 May 2010 22:32:33 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
Ha, I'm a bit surprised. Isn't it what "futures" already provides? (except that for some reason it insists on the "SomeExecutor" naming scheme) http://www.python.org/dev/peps/pep-3148/#processpoolexecutor
Not really - a general purpose pool would be a lot more agnostic about how you give the pooled threads/processes work to do and get the results back.
Executors are the kind of thing you would build on top of one though. If concurrent.pool was added, then the existing processing pools in multiprocessing and the executors in concurrent.futures would be the first use cases for it.
I think I'm a bit ignorant, but how is the Executor abstraction (and its proposed implementations) not generic enough? You have a pool, submit one or several tasks, and can either repeatedly poll for completion or do a blocking wait.
(after all, Glyph pointed out that it should be quite easy to wrap the resulting Futures into Deferred objects)
Interesting. Executor.submit() returns a Future, which might not be useful in some ThreadPool fire-and-forget use cases but having them doesn't seem harmful. Java does take this approach and it gives you a lot more ways to customize the Executor thread pool e.g. the minimum number of threads running, the maximum number, the amount of time that a thread can be idle before it is killed, the queueing strategy to use (e.g. LIFO, FIFO, priority). Cheers, Brian
On 26/05/10 23:29, Brian Quinlan wrote:
On 26 May 2010, at 22:50, Antoine Pitrou wrote:
I think I'm a bit ignorant, but how is the Executor abstraction (and its proposed implementations) not generic enough? You have a pool, submit one or several tasks, and can either repeatedly poll for completion or do a blocking wait.
(after all, Glyph pointed out that it should be quite easy to wrap the resulting Futures into Deferred objects)
Interesting. Executor.submit() returns a Future, which might not be useful in some ThreadPool fire-and-forget use cases but having them doesn't seem harmful.
Java does take this approach and it gives you a lot more ways to customize the Executor thread pool e.g. the minimum number of threads running, the maximum number, the amount of time that a thread can be idle before it is killed, the queueing strategy to use (e.g. LIFO, FIFO, priority).
I would say it is precisely that extra configurability which separates the executor pools in the PEP implementation from more flexible general purpose pools. It's something to investigate somewhere along the line, but, as Jesse pointed out, not something we need to worry about specifically for this PEP (except as an example of another module that may eventually end up in the concurrent package) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
On 27/05/10 01:48, Nick Coghlan wrote:
I would say it is precisely that extra configurability which separates the executor pools in the PEP implementation from more flexible general purpose pools.
Wouldn't this be better addressed by adding the relevant options to the futures pools, rather than adding another module that provides almost exactly the same thing with different options? -- Greg
On Thu, 27 May 2010 14:29:28 +1200 Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
On 27/05/10 01:48, Nick Coghlan wrote:
I would say it is precisely that extra configurability which separates the executor pools in the PEP implementation from more flexible general purpose pools.
Wouldn't this be better addressed by adding the relevant options to the futures pools, rather than adding another module that provides almost exactly the same thing with different options?
+1.
On 27/05/10 12:29, Greg Ewing wrote:
On 27/05/10 01:48, Nick Coghlan wrote:
I would say it is precisely that extra configurability which separates the executor pools in the PEP implementation from more flexible general purpose pools.
Wouldn't this be better addressed by adding the relevant options to the futures pools, rather than adding another module that provides almost exactly the same thing with different options?
It would depend on the details, but my instinct says no (instead, the futures pools would be refactored to be task specific tailorings of the general purpose pools). However, this is all very hypothetical at this point and not really relevant to the PEP review. We may never even bother creating these more general purpose threading pools - it was just an example of the kind of thing that may go alongside the futures module. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
Jesse Noller wrote:
On Sun, May 23, 2010 at 7:52 AM, Steve Holden <steve@holdenweb.com> wrote: ...snip...
Issues like the ones I'm bringing up could be fixed pretty straightforwardly if it were just a matter of filing a bug on a small package, but fixing a stdlib module is a major undertaking. True but I don't think that is a convincing argument. A subset of the functionality provided by this module is already available in Java and C++ and (at least in Java) it is used extensively and without too much trouble. If there are implementation bugs then we can fix them just like we would with any other module.
I don't see the availability of this functionality in those languages as any kind of reason why this needs to go into the stdlib now. Is there some desperate rush to get it in? If it were used extensively from PyPi *that* would be a recommendation ...
Not picking Steve's particular comments out - but Brian cites the previous discussions in the PEP itself:
http://www.python.org/dev/peps/pep-3148/
All of you questioning "Why should this be in the standard library" should go read those old threads, where that question was answered numerous times. Now I suddenly regret leaving the floodgates open, as we're rapidly rehashing discussions from months ago.
Yes, it might have been better to call for participation from those who had contributed to the original discussion, and therefore knew what they were talking about. No flood from me, though, all my questions have been answered. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video! http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000
On May 23, 2010, at 2:37 AM, Brian Quinlan wrote:
On May 23, 2010, at 2:44 PM, Glyph Lefkowitz wrote:
On May 22, 2010, at 8:47 PM, Brian Quinlan wrote:
Jesse, the designated pronouncer for this PEP, has decided to keep discussion open for a few more days.
So fire away!
As you wish!
I retract my request ;-)
May you get what you wish for, may you find what you are seeking :).
The PEP should be consistent in its usage of terminology about callables. It alternately calls them "callables", "functions", and "functions or methods". It would be nice to clean this up and be consistent about what can be called where. I personally like "callables".
Did you find the terminology confusing? If not then I propose not changing it.
Yes, actually. Whenever I see references to the multiprocessing module, I picture a giant "HERE BE (serialization) DRAGONS" sign. When I saw that some things were documented as being "functions", I thought that maybe there was intended to be a restriction like the "these can only be top-level functions so they're easy for different executors to locate and serialize". I didn't realize that the intent was "arbitrary callables" until I carefully re-read the document and noticed that the terminology was inconsistent.
But changing it in the user docs is probably a good idea. I like "callables" too.
Great. Still, users will inevitably find the PEP and use it as documentation too.
The execution context of callable code is not made clear. Implicitly, submit() or map() would run the code in threads or processes as defined by the executor, but that's not spelled out clearly.
Any response to this bit? Did I miss something in the PEP?
More relevant to my own interests, the execution context of the callables passed to add_done_callback and remove_done_callback is left almost completely to the imagination. If I'm reading the sample implementation correctly, <http://code.google.com/p/pythonfutures/source/browse/branches/feedback/python3/futures/process.py#241>, it looks like in the multiprocessing implementation, the done callbacks are invoked in a random local thread. The fact that they are passed the future itself *sort* of implies that this is the case, but the multiprocessing module plays fast and loose with object identity all over the place, so it would be good to be explicit and say that it's *not* a pickled copy of the future sitting in some arbitrary process (or even on some arbitrary machine).
The callbacks will always be called in a thread other than the main thread in the process that created the executor. Is that a strong enough contract?
Sure. Really, almost any contract would work, it just needs to be spelled out. It might be nice to know whether the thread invoking the callbacks is a daemon thread or not, but I suppose it's not strictly necessary.
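For what it's worth, a small sketch showing where a done callback ends up running (assuming the concurrent.futures spelling; it simply prints the name of the thread that invokes the callback):

import threading
from concurrent import futures

def work():
    return 42

def on_done(future):
    # Prints the thread that invoked the callback and the future's result.
    print('callback ran in', threading.current_thread().name, '->', future.result())

executor = futures.ThreadPoolExecutor(max_workers=1)
f = executor.submit(work)
f.add_done_callback(on_done)
executor.shutdown(wait=True)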
This is really minor, I know, but why does it say "NOTE: This method can be used to create adapters from Futures to Twisted Deferreds"? First of all, what's the deal with "NOTE"; it's the only "NOTE" in the whole PEP, and it doesn't seem to add anything. This sentence would read exactly the same if that word were deleted. Without more clarity on the required execution context of the callbacks, this claim might not actually be true anyway; Deferred callbacks can only be invoked in the main reactor thread in Twisted. But even if it is perfectly possible, why leave so much of the adapter implementation up to the imagination? If it's important enough to mention, why not have a reference to such an adapter in the reference Futures implementation, since it *should* be fairly trivial to write?
I'm a bit surprised that this doesn't allow for better interoperability with Deferreds given this discussion:
<discussion snipped>
I did not communicate that well. As implemented, it's quite possible to implement a translation layer which turns a Future into a Deferred. What I meant by that comment was, the specification in the PEP was too loose to be sure that such a layer would work with arbitrary executors. For what it's worth, the Deferred translator would look like this, if you want to include it in the PEP (untested though, you may want to run it first):

from twisted.internet.defer import Deferred
from twisted.internet.reactor import callFromThread

def future2deferred(future):
    d = Deferred()
    def invoke_deferred():
        try:
            result = future.result()
        except:
            d.errback()
        else:
            d.callback(result)
    def done_callback(same_future):
        callFromThread(invoke_deferred)
    future.add_done_callback(done_callback)
    return d

This does beg the question of what the traceback will look like in that except: block though. I guess the multi-threaded executor will use python3 exception chaining so Deferred should be able to show a sane traceback in case of an error, but what about exceptions in other processes?
I suggest having add_done_callback, implementing it with a list so that callbacks are always invoked in the order that they're added, and getting rid of remove_done_callback.
Sounds good to me!
Great! :-)
futures._base.Executor isn't exposed publicly, but it needs to be. The PEP kinda makes it sound like it is ("Executor is an abstract class..."). Plus, a third party library wanting to implement an executor of its own shouldn't have to copy and paste the implementation of Executor.map.
That was a bug that I've fixed. Thanks!
Double-great!
One minor suggestion on the "internal future methods" bit - something I wish we'd done with Deferreds was to put 'callback()' and 'addCallbacks()' on separate objects, so that it was very explicit whether you were on the emitting side of a Deferred or the consuming side. That seems to be the case with these internal methods - they are not so much "internal" as they are for the producer of the Future (whether a unit test or executor) so you might want to put them on a different object that it's easy for the thing creating a Future() to get at but hard for any subsequent application code to fiddle with by accident. Off the top of my head, I suggest naming it "Invoker()". A good way to do this would be to have an Invoker class which can't be instantiated (raises an exception from __init__ or somesuch), then a Future.create() method which returns an Invoker, which itself has a '.future' attribute.
No reaction on this part? I think you'll wish you did this in a couple of years when you start bumping into application code that calls "set_result" :).
Finally, why isn't this just a module on PyPI? It doesn't seem like there's any particular benefit to making this a stdlib module and going through the whole PEP process - except maybe to prompt feedback like this :).
We've already had this discussion before. Could you explain why this module should *not* be in the stdlib e.g. does it have significantly less utility than other modules in stdlib? Is it significantly higher risk? etc?
You've convinced me, mainly because I noticed later on in the discussion that it *has* been released to pypi for several months, and does have a bunch of downloads. It doesn't have quite the popularity I'd personally like to see for stdlib modules, but it's not like you didn't try, and you do (sort of) have a point about small modules being hard to get adoption. I'm sorry that this, my least interesting point in my opinion, is what has seen the most discussion so far. I'd appreciate it if you could do a release to pypi with the bugfixes you mentioned here, to make sure that the released version is consistent with what eventually gets into Python. Oh, and one final nitpick: <http://www.rfc-editor.org/rfc/rfc2606.txt> says you really should not put real domain names into your "web crawl example", especially not "some-made-up-domain.com".
On May 24, 2010, at 5:16 AM, Glyph Lefkowitz wrote:
On May 23, 2010, at 2:37 AM, Brian Quinlan wrote:
On May 23, 2010, at 2:44 PM, Glyph Lefkowitz wrote:
On May 22, 2010, at 8:47 PM, Brian Quinlan wrote:
Jesse, the designated pronouncer for this PEP, has decided to keep discussion open for a few more days.
So fire away!
As you wish!
I retract my request ;-)
May you get what you wish for, may you find what you are seeking :).
The PEP should be consistent in its usage of terminology about callables. It alternately calls them "callables", "functions", and "functions or methods". It would be nice to clean this up and be consistent about what can be called where. I personally like "callables".
Did you find the terminology confusing? If not then I propose not changing it.
Yes, actually. Whenever I see references to the multiprocessing module, I picture a giant "HERE BE (serialization) DRAGONS" sign. When I saw that some things were documented as being "functions", I thought that maybe there was intended to be a restriction like the "these can only be top-level functions so they're easy for different executors to locate and serialize". I didn't realize that the intent was "arbitrary callables" until I carefully re-read the document and noticed that the terminology was inconsistent.
ProcessPoolExecutor has the same serialization perils that multiprocessing does. My original plan was to link to the multiprocessing docs to explain them but I couldn't find them listed.
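To make the peril concrete, a small sketch (assuming the concurrent.futures spelling; the constraint comes from pickling, not from anything futures-specific): module-level functions can be pickled and sent to worker processes, lambdas cannot.

from concurrent import futures

def double(x):                      # module-level function: picklable
    return x * 2

if __name__ == '__main__':          # needed on platforms that spawn rather than fork
    executor = futures.ProcessPoolExecutor(max_workers=2)
    print(executor.submit(double, 21).result())       # works: prints 42
    # executor.submit(lambda x: x * 2, 21).result()   # would fail: lambdas
    #                                                 # cannot be pickled
    executor.shutdown(wait=True)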
But changing it in the user docs is probably a good idea. I like "callables" too.
Great. Still, users will inevitably find the PEP and use it as documentation too.
The execution context of callable code is not made clear. Implicitly, submit() or map() would run the code in threads or processes as defined by the executor, but that's not spelled out clearly.
Any response to this bit? Did I miss something in the PEP?
Yes, the execution context is Executor-dependent. The section under ProcessPoolExecutor and ThreadPoolExecutor spells this out, I think.
More relevant to my own interests, the execution context of the callables passed to add_done_callback and remove_done_callback is left almost completely to the imagination. If I'm reading the sample implementation correctly, <http://code.google.com/p/pythonfutures/source/browse/branches/feedback/python3/futures/process.py#241>, it looks like in the multiprocessing implementation, the done callbacks are invoked in a random local thread. The fact that they are passed the future itself *sort* of implies that this is the case, but the multiprocessing module plays fast and loose with object identity all over the place, so it would be good to be explicit and say that it's *not* a pickled copy of the future sitting in some arbitrary process (or even on some arbitrary machine).
The callbacks will always be called in a thread other than the main thread in the process that created the executor. Is that a strong enough contract?
Sure. Really, almost any contract would work, it just needs to be spelled out. It might be nice to know whether the thread invoking the callbacks is a daemon thread or not, but I suppose it's not strictly necessary.
Your concern is that the thread will be killed when the interpreter exits? It won't be.
This is really minor, I know, but why does it say "NOTE: This method can be used to create adapters from Futures to Twisted Deferreds"? First of all, what's the deal with "NOTE"; it's the only "NOTE" in the whole PEP, and it doesn't seem to add anything. This sentence would read exactly the same if that word were deleted. Without more clarity on the required execution context of the callbacks, this claim might not actually be true anyway; Deferred callbacks can only be invoked in the main reactor thread in Twisted. But even if it is perfectly possible, why leave so much of the adapter implementation up to the imagination? If it's important enough to mention, why not have a reference to such an adapter in the reference Futures implementation, since it *should* be fairly trivial to write?
I'm a bit surprised that this doesn't allow for better interoperability with Deferreds given this discussion:
<discussion snipped>
I did not communicate that well. As implemented, it's quite possible to implement a translation layer which turns a Future into a Deferred. What I meant by that comment was, the specification in the PEP was too loose to be sure that such a layer would work with arbitrary executors.
For what it's worth, the Deferred translator would look like this, if you want to include it in the PEP (untested though, you may want to run it first):
from twisted.internet.defer import Deferred
from twisted.internet.reactor import callFromThread

def future2deferred(future):
    d = Deferred()
    def invoke_deferred():
        try:
            result = future.result()
        except:
            d.errback()
        else:
            d.callback(result)
    def done_callback(same_future):
        callFromThread(invoke_deferred)
    future.add_done_callback(done_callback)
    return d
This does beg the question of what the traceback will look like in that except: block though. I guess the multi-threaded executor will use python3 exception chaining so Deferred should be able to show a sane traceback in case of an error, but what about exceptions in other processes?
I suggest having add_done_callback, implementing it with a list so that callbacks are always invoked in the order that they're added, and getting rid of remove_done_callback.
Sounds good to me!
Great! :-)
futures._base.Executor isn't exposed publicly, but it needs to be. The PEP kinda makes it sound like it is ("Executor is an abstract class..."). Plus, a third party library wanting to implement an executor of its own shouldn't have to copy and paste the implementation of Executor.map.
That was a bug that I've fixed. Thanks!
Double-great!
One minor suggestion on the "internal future methods" bit - something I wish we'd done with Deferreds was to put 'callback()' and 'addCallbacks()' on separate objects, so that it was very explicit whether you were on the emitting side of a Deferred or the consuming side. That seems to be the case with these internal methods - they are not so much "internal" as they are for the producer of the Future (whether a unit test or executor) so you might want to put them on a different object that it's easy for the thing creating a Future() to get at but hard for any subsequent application code to fiddle with by accident. Off the top of my head, I suggest naming it "Invoker()". A good way to do this would be to have an Invoker class which can't be instantiated (raises an exception from __init__ or somesuch), then a Future.create() method which returns an Invoker, which itself has a '.future' attribute.
No reaction on this part? I think you'll wish you did this in a couple of years when you start bumping into application code that calls "set_result" :).
My reactions are mixed ;-) Your proposal is to add a level of indirection to make it harder for people to call implementation methods. The downside is that it makes it a bit harder to write tests and Executors. I also can't see a big problem in letting people call set_result in client code though it is documented as being only for Executor implementations and tests.

On the implementation side, I don't see why an Invoker needs a reference to the future. Each Invoker could own one Future. A reference to the Invoker is kept by the Executor and its future is returned to the client i.e.

class Invoker(object):
    def __init__(self):
        """Should only be called by Executor implementations."""
        self.future = Future()

    def set_running_or_notify_cancel(self):
        # Messes with self.future's internals

    def set_result(self):
        # Messes with self.future's internals

    def set_exception(self):
        # Messes with self.future's internals

Cheers, Brian
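To make the producer/consumer split concrete, a runnable toy version of that sketch (the Future below is a stand-in with just enough behaviour to illustrate the idea, not the PEP's Future):

class Future(object):
    def __init__(self):
        self._result = None
        self._done = False
    def result(self):
        return self._result      # the real Future blocks and can raise
    def done(self):
        return self._done

class Invoker(object):
    def __init__(self):
        """Should only be called by Executor implementations."""
        self.future = Future()
    def set_result(self, result):
        # Producer-side API: only the executor (which holds the Invoker)
        # can complete the future.
        self.future._result = result
        self.future._done = True

invoker = Invoker()          # kept by the executor
future = invoker.future      # handed to the client
invoker.set_result(42)       # the executor finishes the work
print(future.done(), future.result())   # the client only ever sees the Future API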
Finally, why isn't this just a module on PyPI? It doesn't seem like there's any particular benefit to making this a stdlib module and going through the whole PEP process - except maybe to prompt feedback like this :).
We've already had this discussion before. Could you explain why this module should *not* be in the stdlib e.g. does it have significantly less utility than other modules in stdlib? Is it significantly higher risk? etc?
You've convinced me, mainly because I noticed later on in the discussion that it *has* been released to pypi for several months, and does have a bunch of downloads. It doesn't have quite the popularity I'd personally like to see for stdlib modules, but it's not like you didn't try, and you do (sort of) have a point about small modules being hard to get adoption. I'm sorry that this, my least interesting point in my opinion, is what has seen the most discussion so far.
I'd appreciate it if you could do a release to pypi with the bugfixes you mentioned here, to make sure that the released version is consistent with what eventually gets into Python.
Oh, and one final nitpick: <http://www.rfc-editor.org/rfc/rfc2606.txt> says you really should not put real domain names into your "web crawl example", especially not "some-made-up-domain.com".
On May 24, 2010, at 5:36 AM, Brian Quinlan wrote:
On May 24, 2010, at 5:16 AM, Glyph Lefkowitz wrote:
On May 23, 2010, at 2:37 AM, Brian Quinlan wrote:
On May 23, 2010, at 2:44 PM, Glyph Lefkowitz wrote:
ProcessPoolExecutor has the same serialization perils that multiprocessing does. My original plan was to link to the multiprocessing docs to explain them but I couldn't find them listed.
Linking to the pickle documentation might be a good start.
Yes, the execution context is Executor-dependent. The section under ProcessPoolExecutor and ThreadPoolExecutor spells this out, I think.
I suppose so. I guess I'm just looking for more precise usage of terminology. (This is a PEP, after all. It's a specification that multiple VMs may have to follow, not just some user documentation for a package, even if they'll *probably* be using your code in all cases.) I'd be happier if there were a clearer term than "calls" for the things being scheduled ("submissions"?), since the done callbacks aren't called in the subprocess for ProcessPoolExecutor, as we just discussed.
Sure. Really, almost any contract would work, it just needs to be spelled out. It might be nice to know whether the thread invoking the callbacks is a daemon thread or not, but I suppose it's not strictly necessary.
Your concern is that the thread will be killed when the interpreter exits? It won't be.
Good to know. Tell it to the PEP though, not me ;).
No reaction on [invoker vs. future]? I think you'll wish you did this in a couple of years when you start bumping into application code that calls "set_result" :).
My reactions are mixed ;-)
Well, you are not obliged to take my advice, as long as I am not obliged to refrain from mocking you mercilessly if it happens that I was right in a couple of years ;-).
Your proposal is to add a level of indirection to make it harder for people to call implementation methods. The downside is that it makes it a bit harder to write tests and Executors.
Both tests and executors will still create and invoke methods directly on one object; the only additional difficulty seems to be the need to type '.future' every so often on the executor/testing side of things, and that seems a cost well worth paying to avoid confusion over who is allowed to call those methods and when.
I also can't see a big problem in letting people call set_result in client code though it is documented as being only for Executor implementations and tests.
On the implementation side, I don't see why an Invoker needs a reference to the future.
Well, uh...
class Invoker(object):
    def __init__(self):
        """Should only be called by Executor implementations."""
        self.future = Future()
        ^ this is what I'd call a "reference to the future"
On 26 May 2010, at 18:09, Glyph Lefkowitz wrote:
On May 24, 2010, at 5:36 AM, Brian Quinlan wrote:
On May 24, 2010, at 5:16 AM, Glyph Lefkowitz wrote:
On May 23, 2010, at 2:37 AM, Brian Quinlan wrote:
On May 23, 2010, at 2:44 PM, Glyph Lefkowitz wrote:
ProcessPoolExecutor has the same serialization perils that multiprocessing does. My original plan was to link to the multiprocessing docs to explain them but I couldn't find them listed.
Linking to the pickle documentation might be a good start.
Will do.
Yes, the execution context is Executor-dependent. The section under ProcessPoolExecutor and ThreadPoolExecutor spells this out, I think.
I suppose so. I guess I'm just looking for more precise usage of terminology. (This is a PEP, after all. It's a specification that multiple VMs may have to follow, not just some user documentation for a package, even if they'll *probably* be using your code in all cases.) I'd be happier if there were a clearer term than "calls" for the things being scheduled ("submissions"?), since the done callbacks aren't called in the subprocess for ProcessPoolExecutor, as we just discussed.
Sure. Really, almost any contract would work, it just needs to be spelled out. It might be nice to know whether the thread invoking the callbacks is a daemon thread or not, but I suppose it's not strictly necessary.
Your concern is that the thread will be killed when the interpreter exits? It won't be.
Good to know. Tell it to the PEP though, not me ;).
Will do.
No reaction on [invoker vs. future]? I think you'll wish you did this in a couple of years when you start bumping into application code that calls "set_result" :).
My reactions are mixed ;-)
Well, you are not obliged to take my advice, as long as I am not obliged to refrain from mocking you mercilessly if it happens that I was right in a couple of years ;-).
I was looking for your reasoning rather than trying to negotiate the circumstances under which you would mock me.
Your proposal is to add a level of indirection to make it harder for people to call implementation methods. The downside is that it makes it a bit harder to write tests and Executors.
Both tests and executors will still create and invoke methods directly on one object; the only additional difficulty seems to be the need to type '.future' every so often on the executor/testing side of things, and that seems a cost well worth paying to avoid confusion over who is allowed to call those methods and when.
I also can't see a big problem in letting people call set_result in client code though it is documented as being only for Executor implementations and tests.
On the implementation side, I don't see why an Invoker needs a reference to the future.
Well, uh...
class Invoker(object):
    def __init__(self):
        """Should only be called by Executor implementations."""
        self.future = Future()
        ^ this is what I'd call a "reference to the future"
I said exactly the opposite of what I meant: futures don't need a reference to the invoker. Cheers, Brian
On May 26, 2010, at 4:55 AM, Brian Quinlan wrote:
I said exactly the opposite of what I meant: futures don't need a reference to the invoker.
Indeed they don't, and they really shouldn't have one. If I wrote that they did, then it was an error. ... and that appears to be it! Thank you for your very gracious handling of a pretty huge pile of criticism :). Good luck with the PEP, -glyph
Glyph Lefkowitz wrote:
Finally, why isn't this just a module on PyPI? It doesn't seem like there's any particular benefit to making this a stdlib module and going through the whole PEP process
I'm inclined to agree. This needs to be field-tested before being considered for stdlib inclusion. -- Greg
Hi On Sun, May 23, 2010 at 10:47:08AM +1000, Brian Quinlan wrote:
Jesse, the designated pronouncer for this PEP, has decided to keep discussion open for a few more days.
So fire away!
In thread.py the module automatically registers a handler with atexit. I don't think I'm alone in thinking libraries should not be doing this sort of thing unconditionally behind a user's back. I'm also not so sure how comfortable I am with the module-level globals. Would it not be possible to have an exit handler on each thread pool which the documentation recommends you register with atexit if it suits your application? I think that would get rid of the global singletons and hidden atexit in a fairly elegant way. Lastly _base.py creates a LOGGER (connected to sys.stderr if I understand correctly) and only logs a critical message to it at the same time as a RuntimeError is raised. While I don't necessarily dislike that it uses a logger, I don't like that it's wired up to sys.stderr. I rather think it's the application's duty to create a handler if it wants one. But given that it's only used at the same time as a RuntimeError it does seem redundant. Regards Floris PS: I've only looked at the threading part of the implementation. -- Debian GNU/Linux -- The Power of Freedom www.debian.org | www.gnu.org | www.kernel.org
On 26 May 2010, at 22:06, Floris Bruynooghe wrote:
Hi
On Sun, May 23, 2010 at 10:47:08AM +1000, Brian Quinlan wrote:
Jesse, the designated pronouncer for this PEP, has decided to keep discussion open for a few more days.
So fire away!
In thread.py the module automatically registers a handler with atexit. I don't think I'm alone in thinking libraries should not be doing this sort of thing unconditionally behind a user's back. I'm also not so sure how comfortable I am with the module-level globals.
Would it not be possible to have an exit handler on each thread pool which the documentation recommends you register with atexit if it suits your application? I think that would get rid of the global singletons and hidden atexit in a fairly elegant way.
First let me explain why I install an atexit handler. Imagine that you write a script like this:

t = ThreadPoolExecutor(1)
t.submit(lambda url: print(urllib.request.urlopen(url).read()), 'http://www.apple.com/')

You have two semantic choices here:
1. let the interpreter exit with the future still running
2. wait until the future finishes and then exit

I chose (2) but can be convinced otherwise. The obvious way to accomplish this is to make the worker thread non-daemon so the interpreter won't exit while it is running. But since the worker thread is part of a pool, it won't stop while its executor is alive. So my approach was to make worker threads daemon and install an atexit handler that sets a global indicating that the interpreter is exiting so any workers should exit when their work queues are empty. It then calls join on each worker thread so the interpreter will not exit until they are finished. I think that this approach is reasonable assuming that you want (2). I also don't have the aversion to globals that you do :-)
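Roughly, the mechanism described above looks like the following simplified sketch (an illustration of the approach, not the reference implementation's actual code):

import atexit
import queue
import threading

_shutdown = False
_work_queue = queue.Queue()

def _worker():
    while True:
        try:
            task = _work_queue.get(timeout=0.1)
        except queue.Empty:
            if _shutdown:       # interpreter is exiting and the queue is drained
                return
            continue
        task()

_thread = threading.Thread(target=_worker)
_thread.daemon = True           # the pool itself never blocks interpreter exit...
_thread.start()

def _python_exit():
    global _shutdown
    _shutdown = True
    _thread.join()              # ...but the atexit hook waits for pending work

atexit.register(_python_exit)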
Lastly _base.py creates a LOGGER (connected to sys.stderr if I understand correctly) and only logs a critical message to it at the same time as a RuntimeError is raised. While I don't necessarily dislike that it uses a logger, I don't like that it's wired up to sys.stderr I rather think it's the application's duty to create a handler if it wants one. But given that it's only used at the same time as a RuntimeError it does seem redundant.
The LOGGER is only used for "impossible" exceptions (exceptions in the machinery of the module itself) that won't be propagated because they occur in worker threads. Cheers, Brian
Regards Floris
PS: I've only looked at the threading part of the implementation.
-- Debian GNU/Linux -- The Power of Freedom www.debian.org | www.gnu.org | www.kernel.org
On 27/05/10 00:31, Brian Quinlan wrote:
You have two semantic choices here: 1. let the interpreter exit with the future still running 2. wait until the future finishes and then exit
I'd go for (1). I don't think it's unreasonable to expect a program that wants all its tasks to finish to explicitly wait for that to happen. Also, automatically doing (2) would seem to make it difficult for a program to bail out if something unexpected happens. It would have to explicitly shut down the thread pool instead of just letting an exception propagate. -- Greg
On Thu, May 27, 2010 at 01:46:07PM +1200, Greg Ewing wrote:
On 27/05/10 00:31, Brian Quinlan wrote:
You have two semantic choices here: 1. let the interpreter exit with the future still running 2. wait until the future finishes and then exit
I'd go for (1). I don't think it's unreasonable to expect a program that wants all its tasks to finish to explicitly wait for that to happen.
I'd go for (1) as well, it's no more than reasonable that if you want a result you wait for it. And I dislike libraries doing magic you can't see, I'd prefer if I explicitly had to shut a pool down. And yes, if you shut the interpreter down while threads are running they sometimes wake up at the wrong time to find the world around them destroyed. But that's part of programming with threads so it's not like the futures lib suddenly makes things behave differently. I'm glad I'm not alone in preferring (1) though. Regards Floris -- Debian GNU/Linux -- The Power of Freedom www.debian.org | www.gnu.org | www.kernel.org
On 27 May 2010, at 17:53, Floris Bruynooghe wrote:
On Thu, May 27, 2010 at 01:46:07PM +1200, Greg Ewing wrote:
On 27/05/10 00:31, Brian Quinlan wrote:
You have two semantic choices here: 1. let the interpreter exit with the future still running 2. wait until the future finishes and then exit
I'd go for (1). I don't think it's unreasonable to expect a program that wants all its tasks to finish to explicitly wait for that to happen.
I'd go for (1) as well, it's no more than reasonable that if you want a result you wait for it. And I dislike libraries doing magic you can't see, I'd prefer if I explicitly had to shut a pool down. And yes, if you shut the interpreter down while threads are running they sometimes wake up at the wrong time to find the world around them destroyed. But that's part of programming with threads so it's not like the futures lib suddenly makes things behave differently.
I'm glad I'm not alone in preferring (1) though.
Keep in mind that this library magic is consistent with the library magic that the threading module does - unless the user sets Thread.daemon to True, the interpreter does *not* exit until the thread does. Cheers, Brian
On 27/05/10 18:13, Brian Quinlan wrote:
On 27 May 2010, at 17:53, Floris Bruynooghe wrote:
I'm glad I'm not alone in preferring (1) tough.
Keep in mind that this library magic is consistent with the library magic that the threading module does - unless the user sets Thread.daemon to True, the interpreter does *not* exit until the thread does.
Along those lines, an Executor.daemon option may be a good idea. That way the default behaviour is to wait until things are done (just like threading itself), but it is easy for someone to turn that behaviour off for a specific executor. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
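A hypothetical spelling of that suggestion (not part of PEP 3148; setting the attribute on today's executor has no effect and is shown only to illustrate the proposed API, mirroring Thread.daemon):

from concurrent import futures

executor = futures.ThreadPoolExecutor(max_workers=4)
executor.daemon = True   # proposed: don't keep the interpreter alive for
                         # futures still pending at exit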
On Thu, May 27, 2010 at 4:13 AM, Brian Quinlan <brian@sweetapp.com> wrote:
Keep in mind that this library magic is consistent with the library magic that the threading module does - unless the user sets Thread.daemon to True, the interpreter does *not* exit until the thread does.
Is there a compelling reason to make the threads daemon threads? If not, perhaps they can just be normal threads, and you can rely on the threading module to wait for them to finish. Unrelatedly, I feel like this behavior of waiting for the thread to terminate usually manifests as deadlocks when the main thread throws an uncaught exception. The application then no longer responds properly to interrupts, since it's stuck waiting on a semaphore. I guess it's better than the alternative of random crashes when daemon threads wake up during interpreter shutdown, though. Reid
On May 28, 2010, at 11:57 AM, Reid Kleckner wrote:
On Thu, May 27, 2010 at 4:13 AM, Brian Quinlan <brian@sweetapp.com> wrote:
Keep in mind that this library magic is consistent with the library magic that the threading module does - unless the user sets Thread.daemon to True, the interpreter does *not* exit until the thread does.
Is there a compelling reason to make the threads daemon threads? If not, perhaps they can just be normal threads, and you can rely on the threading module to wait for them to finish.
Did you read my explanation of the reasoning behind my approach? Cheers, Brian
Unrelatedly, I feel like this behavior of waiting for the thread to terminate usually manifests as deadlocks when the main thread throws an uncaught exception. The application then no longer responds properly to interrupts, since it's stuck waiting on a semaphore. I guess it's better than the alternative of random crashes when daemon threads wake up during interpreter shutdown, though.
Reid
On Thu, May 27, 2010 at 8:06 PM, Brian Quinlan <brian@sweetapp.com> wrote:
On May 28, 2010, at 11:57 AM, Reid Kleckner wrote:
On Thu, May 27, 2010 at 4:13 AM, Brian Quinlan <brian@sweetapp.com> wrote:
Keep in mind that this library magic is consistent with the library magic that the threading module does - unless the user sets Thread.daemon to True, the interpreter does *not* exit until the thread does.
Is there a compelling reason to make the threads daemon threads? If not, perhaps they can just be normal threads, and you can rely on the threading module to wait for them to finish.
Did you read my explanation of the reasoning behind my approach?
You should try to link to explanations. These have been long threads, and it's often hard to find the previous message where a subject was addressed.
On Thu, 27 May 2010 21:57:48 -0400 Reid Kleckner <rnk@mit.edu> wrote:
Unrelatedly, I feel like this behavior of waiting for the thread to terminate usually manifests as deadlocks when the main thread throws an uncaught exception. The application then no longer responds properly to interrupts, since it's stuck waiting on a semaphore.
I think the internal low-level lock implementation should be fixed so that it runs PyErr_CheckSignals() and is able to signal an error on function return (rather than the current binary "lock succeeded" / "lock timed out" status). Actually, it would be nice if you could open a bug entry for that :) Regards Antoine.
On 5/27/2010 4:13 AM, Brian Quinlan wrote:
On 27 May 2010, at 17:53, Floris Bruynooghe wrote:
On Thu, May 27, 2010 at 01:46:07PM +1200, Greg Ewing wrote:
On 27/05/10 00:31, Brian Quinlan wrote:
You have two semantic choices here:
1. let the interpreter exit with the future still running
2. wait until the future finishes and then exit
I'd go for (1). I don't think it's unreasonable to expect a program that wants all its tasks to finish to explicitly wait for that to happen.
I'd go for (1) as well; it's no more than reasonable that if you want a result you wait for it. And I dislike libraries doing magic you can't see; I'd prefer it if I explicitly had to shut a pool down.
I'm glad I'm not alone in preferring (1), though.
Keep in mind that this library magic is consistent with the library magic that the threading module does - unless the user sets Thread.daemon to True, the interpreter does *not* exit until the thread does.
Given your rationale, I don't understand from the PEP:
shutdown(wait=True)
Signal the executor that it should free any resources that it is using when the currently pending futures are done executing. Calls to Executor.submit and Executor.map made after shutdown will raise RuntimeError.
If wait is True then the executor will not return until all the pending futures are done executing and the resources associated with the executor have been freed.
Can you tell me what is the expected execution time of the following:
executor = ThreadPoolExecutor(max_workers=1)
executor.submit(lambda: time.sleep(1000))
executor.shutdown(wait=False)
sys.exit(0)
I believe it's 1000 seconds, which seems to defy my request of shutdown(wait=False) because "secretly" the Python exit is going to wait anyways. ISTM, it is much easier to get behavior #2 if you have behavior #1, and it would also seem rather trivial to make ThreadPoolExecutor take an optional argument specifying which behavior you want. Your reference implementation does not actually implement the specification given in the PEP, so it's quite impossible to check this myself. There is no wait=True option for shutdown() in the reference implementation, so I can only guess what that implementation might look like. -- Scott Dial scott@scottdial.com scodial@cs.indiana.edu
On May 28, 2010, at 1:39 PM, Scott Dial wrote:
On 5/27/2010 4:13 AM, Brian Quinlan wrote:
On 27 May 2010, at 17:53, Floris Bruynooghe wrote:
On Thu, May 27, 2010 at 01:46:07PM +1200, Greg Ewing wrote:
On 27/05/10 00:31, Brian Quinlan wrote:
You have two semantic choices here:
1. let the interpreter exit with the future still running
2. wait until the future finishes and then exit
I'd go for (1). I don't think it's unreasonable to expect a program that wants all its tasks to finish to explicitly wait for that to happen.
I'd go for (1) as well; it's no more than reasonable that if you want a result you wait for it. And I dislike libraries doing magic you can't see; I'd prefer it if I explicitly had to shut a pool down.
I'm glad I'm not alone in preferring (1), though.
Keep in mind that this library magic is consistent with the library magic that the threading module does - unless the user sets Thread.daemon to True, the interpreter does *not* exit until the thread does.
Given your rationale, I don't understand from the PEP:
shutdown(wait=True)
Signal the executor that it should free any resources that it is using when the currently pending futures are done executing. Calls to Executor.submit and Executor.map made after shutdown will raise RuntimeError.
If wait is True then the executor will not return until all the pending futures are done executing and the resources associated with the executor have been freed.
Can you tell me what is the expected execution time of the following:
executor = ThreadPoolExecutor(max_workers=1)
executor.submit(lambda: time.sleep(1000))
executor.shutdown(wait=False)
sys.exit(0)
I believe it's 1000 seconds, which seems to defy my request of shutdown(wait=False) because "secretly" the Python exit is going to wait anyways.
It would take 1000 seconds. "...then the executor will not return..." should read "...then the method will not return...".
ISTM, it is much easier to get behavior #2 if you have behavior #1, and it would also seem rather trivial to make ThreadPoolExecutor take an optional argument specifying which behavior you want.
Adding a daemon option would be reasonable. If you don't shut down your executors you are pretty much guaranteed to get random traceback output on exit, though.
Your reference implementation does not actually implement the specification given in the PEP, so it's quite impossible to check this myself. There is no wait=True option for shutdown() in the reference implementation, so I can only guess what that implementation might look like.
Look at around line 129 in: http://code.google.com/p/pythonfutures/source/browse/branches/feedback/pytho... Cheers, Brian
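For readers who don't want to chase the link, a rough sketch of how a wait=True variant of shutdown() can be layered on joining the worker threads (the attribute names here are made up for the sketch, not taken from the reference implementation):

    def shutdown(self, wait=True):
        with self._shutdown_lock:
            self._shutdown = True
            for _ in self._threads:
                self._work_queue.put(None)  # one sentinel per worker thread
        if wait:
            for t in self._threads:
                t.join()  # block until every worker has drained and exited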
On Sat, 22 May 2010 10:09:37 -0400, Jesse Noller <jnoller@gmail.com> wrote:
On Sat, May 22, 2010 at 9:59 AM, R. David Murray <rdmurray@bitdance.com> wrote:
On Sat, 22 May 2010 19:12:05 +1000, Brian Quinlan <brian@sweetapp.com> wrote:
On May 22, 2010, at 5:30 AM, Dj Gilcrease wrote:
On Fri, May 21, 2010 at 8:26 AM, Brian Quinlan <brian@sweetapp.com> wrote:
Except that now isn't the time for that discussion. This PEP has been discussed on-and-off for several months on both stdlib-sig and python-dev.
I think any time till the PEP is accepted is a good time to discuss changes to the API
I disagree. If a PEP is being updated continuously then there is nothing stable to pronounce on.
Well, you've been making updates as a result of this round of discussion.
If there is still discussion then perhaps the PEP isn't ready for pronouncement yet. At some point someone can decide it is all bikeshedding and ask for pronouncement on that basis, but I don't think it is appropriate to cut off discussion by saying "it's ready for pronouncement" unless you want to increase the chances of its getting rejected.
I commiserate with Brian here - he's been very patient, and has been working on things, taking in input, etc. for a while now on this. In his mind, it is done (or at least incredibly close to done) and opening the door in the conversation for more API nitpicking and debate about the exact verbiage on method names means we're never going to be done splashing paint.
OK, so you are saying that the comments in question are bikeshedding. I can accept that easily. What I was trying to point out was that saying "discussion is closed" is not the best way to nurture community consensus. Saying "we've reached the bikeshedding point, let's pronounce" is a very different thing to my mind, even if it is only a matter of tone. Have fun pronouncing ;) -- R. David Murray www.bitdance.com
On 22 May 2010, at 23:59, R. David Murray wrote:
On Sat, 22 May 2010 19:12:05 +1000, Brian Quinlan <brian@sweetapp.com> wrote:
On May 22, 2010, at 5:30 AM, Dj Gilcrease wrote:
On Fri, May 21, 2010 at 8:26 AM, Brian Quinlan <brian@sweetapp.com> wrote:
Except that now isn't the time for that discussion. This PEP has been discussed on-and-off for several months on both stdlib-sig and python-dev.
I think any time till the PEP is accepted is a good time to discuss changes to the API
I disagree. If a PEP is being updated continuously then there is nothing stable to pronounce on.
Well, you've been making updates as a result of this round of discussion.
Yes, I've been making documentation and PEP updates to clarify points that people found confusing and will continue to do so.
If there is still discussion then perhaps the PEP isn't ready for pronouncement yet. At some point someone can decide it is all bikeshedding and ask for pronouncement on that basis, but I don't think it is appropriate to cut off discussion by saying "it's ready for pronouncement" unless you want to increase the chances of its getting rejected.
Here are the new proposed non-documentation changes that I've collected (let me know if I've missed any):
Rename "executor" => "executer"
Rename "submit" to "apply"
Rename "done" to "finished"
Rename "not_finished" to "pending"
Rename "FIRST_COMPLETED" to "ONE_COMPLETED"
We can discuss naming for all eternity and never reach a point where even half of the participants are satisfied. Since naming has been discussed extensively here and in stdlib-sig, I think that we have to decide that it is good enough and move on. Or decide that it isn't good enough and reject the PEP. Cheers, Brian
The usual way of doing this (at least so far as I have observed, which granted hasn't been too many cases) is to say something like "I think this PEP is ready for pronouncement" and then wait for feedback on that assertion or for the pronouncement. It's especially good if you can answer any concerns that are raised with "that was discussed already and we concluded X". Bonus points for finding a thread reference and adding it to the PEP :)
-- R. David Murray www.bitdance.com
I think the PEP's overall API is good to go. On Sat, May 22, 2010 at 4:12 PM, Brian Quinlan <brian@sweetapp.com> wrote:
On 22 May 2010, at 23:59, R. David Murray wrote:
If there is still discussion then perhaps the PEP isn't ready for pronouncement yet. At some point someone can decide it is all bikeshedding and ask for pronouncement on that basis, but I don't think it is appropriate to cut off discussion by saying "it's ready for pronouncement" unless you want to increase the chances of its getting rejected.
Here are the new proposed non-documentation changes that I've collected (let me know if I've missed any):
...
I propose to rename the Future.result method to Future.get. "get" is what Java (http://java.sun.com/javase/7/docs/api/java/util/concurrent/Future.html) and C++ (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3092.pdf section 30.6.6 para 12) use, and the word "result" doesn't seem particularly better or worse than "get" for our purposes, which inclines me to stay consistent.
We can discuss naming for all eternity and never reach a point where even half of the participants are satisfied.
Agreed. To reduce the length of the discussion, I'm not going to reply to counter-arguments to my proposal, but I think it'll be useful to Jesse if people who agree or disagree speak up briefly. I'll reply to the other naming proposals in another message. Jeffrey
On May 23, 2010, at 9:43 AM, Jeffrey Yasskin wrote:
I think the PEP's overall API is good to go.
On Sat, May 22, 2010 at 4:12 PM, Brian Quinlan <brian@sweetapp.com> wrote:
On 22 May 2010, at 23:59, R. David Murray wrote:
If there is still discussion then perhaps the PEP isn't ready for pronouncement yet. At some point someone can decide it is all bikeshedding and ask for pronouncement on that basis, but I don't think it is appropriate to cut off discussion by saying "it's ready for pronouncement" unless you want to increase the chances of its getting rejected.
Here are the new proposed non-documentation changes that I've collected (let me know if I've missed any):
...
I propose to rename the Future.result method to Future.get. "get" is what Java (http://java.sun.com/javase/7/docs/api/java/util/concurrent/Future.html ) and C++ (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3092.pdf section 30.6.6 para 12) use, and the word "result" doesn't seem particularly better or worse than "get" for our purposes, which inclines me to stay consistent.
In C++ and Java, there is only one result-retrieving method so "get" seems like a reasonable name. My implementation has a second method .exception(), which returns the exception raised by the submitted function (or None if no exception was raised). I thought that having multiple getter methods, where one is called .get() would be a bit confusing. But I don't really care so I'm -0. Cheers, Brian
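To make the distinction concrete, a small usage sketch of the two getters (assuming the concurrent.futures module name used in the PEP):

    import concurrent.futures

    def risky(x):
        if x < 0:
            raise ValueError("negative input")
        return x * 2

    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    ok = executor.submit(risky, 21)
    bad = executor.submit(risky, -1)
    print(ok.result())      # 42; result() re-raises if the call failed
    print(ok.exception())   # None, since no exception was raised
    print(bad.exception())  # the ValueError instance, returned rather than raised
    executor.shutdown()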
On Sat, May 22, 2010 at 4:12 PM, Brian Quinlan <brian@sweetapp.com> wrote:
Rename "executor" => "executer"
-1 for consistency with Java.
Rename "submit" to "apply"
"apply" focuses attention on the function object, while "submit" focuses attention, properly I think, on the fact that you're handing something to the executor to run. So -1.
Rename "done" to "finished"
"done" is nice and short, and I don't think "finished" or "completed" will be any less prone to people thinking the task actually ran. So -1.
Rename "not_finished" to "pending"
+0.5. Doesn't matter that much, but pending is used elsewhere in the proposal for this concept. On the other hand, "pending" could be thought to refer to the state before "running". Possibly "finished" should be renamed to "done" here, since it's described as '"finished", contains the futures that completed (finished or were cancelled)', which uses "finished" for two different concepts.
Rename "FIRST_COMPLETED" to "ONE_COMPLETED"
"ONE_COMPLETED" could imply that the first result set must contain exactly one element, but in fact, if multiple tasks finish before the waiting thread has a chance to wake up, multiple futures could be returned as done. So -1. And like my other post, I won't argue about these, leaving the actual decision up to Brian and Jesse.
On May 23, 2010, at 10:06 AM, Jeffrey Yasskin wrote:
On Sat, May 22, 2010 at 4:12 PM, Brian Quinlan <brian@sweetapp.com> wrote:
Rename "executor" => "executer"
-1 for consistency with Java.
-1 pending an explanation of why "executer" is better
Rename "submit" to "apply"
"apply" focuses attention on the function object, while "submit" focuses attention, properly I think, on the fact that you're handing something to the executor to run. So -1.
-1
Rename "done" to "finished"
"done" is nice and short, and I don't think "finished" or "completed" will be any less prone to people thinking the task actually ran. So -1.
-0
Rename "not_finished" to "pending"
+0.5. Doesn't matter that much, but pending is used elsewhere in the proposal for this concept. On the other hand, "pending" could be thought to refer to the state before "running". Possibly "finished" should be renamed to "done" here, since it's described as '"finished", contains the futures that completed (finished or were cancelled)', which uses "finished" for two different concepts.
I think that using "finished" is bad terminology here. So +1 to "finished" => "done". I don't have a preference for "not_done" vs. "pending".
Rename "FIRST_COMPLETED" to "ONE_COMPLETED"
"ONE_COMPLETED" could imply that the first result set must contain exactly one element, but in fact, if multiple tasks finish before the waiting thread has a chance to wake up, multiple futures could be returned as done. So -1.
A logician would probably call it "SOME_COMPLETED". What about "ANY_COMPLETED"? Though I think that "FIRST_COMPLETED" still reads better. Cheers, Brian
On 5/22/2010 8:06 PM, Jeffrey Yasskin wrote:
On Sat, May 22, 2010 at 4:12 PM, Brian Quinlan<brian@sweetapp.com> wrote:
Rename "executor" => "executer"
-1 for consistency with Java.
-10 for 'executer'. As far as I can find out, it is a misspelling of 'executor'. If the designers of some other language made a stupid mistake, let them correct it instead of us following them over a cliff. Unlike this one, the other name suggestions look at least plausible and worth a couple of minutes of consideration. As for the module itself, part of the justification to me for accepting it would be if it is part of a larger plan, even if currently vague, to refactor and improve Python's support for concurrent execution, as implied by the name 'concurrent.futures'. If Jesse accepts it, I would take it as some kind of commitment to help with at least one other concurrent.x module so this one is not an orphan. While concurrent execution does not *currently* affect me, I am convinced that better support will be important for Python's future. Terry Jan Reedy
On 2010-05-23, Terry Reedy wrote:
On 5/22/2010 8:06 PM, Jeffrey Yasskin wrote:
On Sat, May 22, 2010 at 4:12 PM, Brian Quinlan<brian@sweetapp.com> wrote:
Rename "executor" => "executer"
-1 for consistency with Java.
-10 for 'executer'. As far as I can find out, it is a misspelling of 'executor'. If the designers of some other language made a stupid mistake, let them correct it instead of us following them over a cliff.
I'd suggested this because it seemed obvious to me, but clearly not. Compare: http://www.thefreedictionary.com/executor http://www.thefreedictionary.com/executer However, as I mentioned in the first place, I didn't expect any change on this since Java uses the first spelling. [snip] -- Mark Summerfield, Qtrac Ltd, www.qtrac.eu C++, Python, Qt, PyQt - training and consultancy "Advanced Qt Programming" - ISBN 0321635906
participants (35)
- Andrew Svetlov
- Antoine Pitrou
- Brett Cannon
- Brian Quinlan
- Cameron Simpson
- Dirkjan Ochtman
- Dj Gilcrease
- Eric Smith
- Floris Bruynooghe
- Georg Brandl
- geremy condra
- Glyph Lefkowitz
- Greg Ewing
- Guido van Rossum
- Jeffrey Yasskin
- Jesse Noller
- John Arbash Meinel
- John Arbash Meinel
- Jon Ribbens
- Lennart Regebro
- Mark Summerfield
- Michael Foord
- Nick Coghlan
- Paul Moore
- R. David Murray
- Reid Kleckner
- Robert Collins
- Sandro Tosi
- Scott Dial
- Stephen J. Turnbull
- Steve Holden
- Steven D'Aprano
- Terry Reedy
- Vinay Sajip
- Yaniv Aknin