Thoughts fresh after EuroPython
While the EuroPython sprints are still going on, I am back home, and after a somewhat restful night of sleep, I have some thoughts I'd like to share before I get distracted. Note, I am jumping wildly between topics.

- Commit privileges: Maybe we've been too careful with only giving commit privileges to experienced and trusted new developers. I spoke to Ezio Melotti and from his experience with getting commit privileges, it seems to be a case of "the lion is much more afraid of you than you are afraid of the lion". I.e. having got privileges he was very concerned about doing something wrong, worried about the complexity of SVN, and so on. Since we've got lots of people watching the commit stream, I think that there really shouldn't need to be a worry at all about a new committer doing something malicious, and there shouldn't be much worry about honest beginners' mistakes either -- the main worry remains that new committers don't use their privileges enough. So, my recommendation (which surely is a turn-around of my *own* attitude in the past) is to give out more commit privileges sooner.

- Concurrency and parallelism: Russell Winder and Sarah Mount pushed the idea of CSP (http://en.wikipedia.org/wiki/Communicating_sequential_processes) in several talks at the conference. They (at least Russell) emphasized the difference between concurrency (interleaved event streams) and parallelism (using many processors to speed things up). Their prediction is that as machines with many processing cores become more prevalent, the relevant architecture will change from cores sharing a single coherent memory (the model on which threads are based) to one where each core has a limited amount of private memory, and communication is done via message passing between the cores. This gives them (and me :-) hope that the GIL won't be a problem as long as we adopt a parallel processing model. Two competing models are the Actor model, which is based on asynchronous communication, and CSP, which is synchronous (when a writer writes to a channel, it blocks until a reader reads that value -- a rendezvous; a minimal illustrative sketch follows below). At least Sarah suggested that both models are important. She also mentioned that a merger is under consideration between the two major CSP-for-Python packages, Py-CSP and Python-CSP. I also believe that the merger will be based on the stdlib multiprocessing package, but I'm not sure. I do expect that we may get some suggestions from that corner to make some minor changes to details of multiprocessing (and perhaps threading), and I think we should be open to those (I expect these will be good suggestions for small tweaks, not major overhauls).

- After seeing Raymond's talk about monocle (search for it on PyPI) I am getting excited again about PEP 380 (yield from, return values from generators). Having read the PEP on the plane back home I didn't see anything wrong with it, so it could just be accepted in its current form. Implementation will still have to wait for Python 3.3 because of the moratorium. (Although I wouldn't mind making an exception to get it into 3.2.)

- This made me think of how the PEP process should evolve so as to not require my personal approval for every PEP. I think the model for future PEPs should be the one we used for PEP 3148 (futures, which was just approved by Jesse): the discussion is led and moderated by one designated "PEP handler" (a different one for each PEP) and the PEP handler, after reviewing the discussion, decides when the PEP is approved. A PEP handler should be selected for each PEP as soon as possible; without a PEP handler, discussing a PEP is not all that useful. The PEP handler should be someone respected by the community with an interest in the subject of the PEP but at arm's length (at least) from the PEP author. The PEP handler will have to moderate feedback, separating useful comments from (too much) bikeshedding, repetitious lines of questioning, and other forms of obstruction. The PEP handler should also set and try to maintain a schedule for the discussion. Note that a schedule should not be used to break a tie -- it should be used to stop bikeshedding and repeat discussions, while giving all interested parties a chance to comment. (I should say that this is probably similar to the role of an IETF working group director with respect to RFCs.)

- Specifically, if Raymond is interested, I wouldn't mind seeing him as the PEP handler for PEP 380. For some of Martin von Löwis's PEPs (382, 384) I think a PEP handler is sorely lacking -- from the language summit it appeared as if nobody besides Martin understands these PEPs.

- A lot of things seem to be happening to make PyPI better. Is this being summarized somewhere? Based on some questions I received during my keynote Q&A (http://bit.ly/bdflqa) I think not enough people are aware of what we are already doing in this area. Frankly, I'm not sure I do, either: I think I've heard of a GSOC student and of plans to take over pypi.appspot.com (with the original developer's permission) to become a full and up-to-date mirror. Mirroring apparently also requires some client changes. Oh, and there's a proposed solution for the "register user" problem where apparently the clients had been broken by a unilateral change to the server to require a certain "yes I agree" checkbox.

For a hopefully eventually exhaustive overview of what was accomplished at EuroPython, go to http://wiki.europython.eu/After -- and if you know some blog about EuroPython not yet listed, please add it there.

-- --Guido van Rossum (python.org/~guido)
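[For readers unfamiliar with the rendezvous semantics mentioned above, here is a minimal, illustrative sketch of a synchronous channel in Python. This is not the Py-CSP or python-csp API -- their actual interfaces may differ -- and it assumes exactly one writer and one reader; it exists only to show what "the writer blocks until a reader takes the value" means in code.]

    import threading

    class Channel:
        # A synchronous (rendezvous) channel: write() hands a value to
        # exactly one read() and blocks until the reader has taken it.
        # Assumes a single writer and a single reader; real CSP libraries
        # offer much richer channel types.
        def __init__(self):
            self._value = None
            self._ready = threading.Semaphore(0)   # a value is waiting
            self._taken = threading.Semaphore(0)   # the reader has taken it

        def write(self, value):
            self._value = value
            self._ready.release()
            self._taken.acquire()   # rendezvous: wait until the reader has it

        def read(self):
            self._ready.acquire()   # wait for a writer
            value = self._value
            self._taken.release()
            return value

    def producer(channel):
        for i in range(3):
            channel.write(i)        # blocks until the consumer reads each value

    if __name__ == "__main__":
        ch = Channel()
        threading.Thread(target=producer, args=(ch,)).start()
        for _ in range(3):
            print(ch.read())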
On 7/24/2010 10:08 AM, Guido van Rossum wrote:
- Commit privileges: Maybe we've been too careful with only giving commit privileges to to experienced and trusted new developers. I spoke to Ezio Melotti and from his experience with getting commit privileges, it seems to be a case of "the lion is much more afraid of you than you are afraid of the lion". I.e. having got privileges he was very concerned about doing something wrong, worried about the complexity of SVN, and so on. Since we've got lots of people watching the commit stream, I think that there really shouldn't need to be a worry at all about a new committer doing something malicious, and there shouldn't be much worry about honest beginners' mistakes either -- the main worry remains that new committers don't use their privileges enough.
My initial inclination is to start with 1 or 2 line patches that I am 99.99% certain are correct. But it has occurred to me that it might be better for Python if I were willing to take a greater than 1/10000 chance of making a mistake. But how much greater? What error rate do *you* consider acceptable?
- Concurrency and parallelism: Russel Winder and Sarah Mount pushed the idea of CSP (http://en.wikipedia.org/wiki/Communicating_sequential_processes) in several talks at the conference. They (at least Russell) emphasized the difference between concurrency (interleaved event streams)
This improves perceived, and sometimes actual, responsiveness to user input.
and parallelism (using many processors to speed things up).
This reduces total time.
Their prediction is that as machines with many processing cores become more prevalent, the relevant architecture will change from cores sharing a single coherent memory (the model on which threads are based) to one where each core has a limited amount of private memory, and communication is done via message passing between the cores.
I take this as a prediction that current prototypes, if not current products, will be both technically and commercially successful. My impression is that the odds are enough better than 50/50 to be worth taking into account. It does not seem like much of a leap from private caches that write through to common memory to private memory that is not written through, especially on 64-bit machines with memory space to spare.
- After seeing Raymond's talk about monocle (search for it on PyPI) I am getting excited again about PEP 380 (yield from, return values from generators). Having read the PEP on the plane back home I didn't see anything wrong with it, so it could just be accepted in its current form. Implementation will still have to wait for Python 3.3 because of the moratorium. (Although I wouldn't mind making an exception to get it into 3.2.)
While initially -0, I now think the moratorium was a good idea. It seems to be successful at letting, and even encouraging, people to target 3.2 by working with 3.1. A big exception like this would probably annoy lots of people who had *their* 'equally good' ideas put off, and might annoy alternative implementors counting on core 3.2 staying as is. So the only exception I would make would be one that had a really good technical reason, like making Python work better on multi-core processors.
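[For context on what the quoted PEP 380 item proposes, a minimal sketch: today a generator that delegates to a sub-generator has to re-yield its values by hand, and there is no direct way to receive a return value from it. The names below are invented for illustration; the PEP's proposed "yield from" expression would collapse the loop and also forward send(), throw() and close() to the sub-generator.]

    def read_chunks():
        # A sub-generator producing part of a larger stream.
        yield "spam"
        yield "eggs"

    def read_all():
        # Manual delegation, as required before PEP 380. The proposed syntax
        # would replace this loop with:  result = yield from read_chunks()
        # and would also let read_chunks() return a value to read_all().
        for chunk in read_chunks():
            yield chunk
        yield "done"

    print(list(read_all()))   # ['spam', 'eggs', 'done']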
- This made me think of how the PEP process should evolve so as to not require my personal approval for every PEP. I think the model for future PEPs should be the one we used for PEP 3148 (futures, which was just approved by Jesse):
+1 -- Terry Jan Reedy
On Sat, Jul 24, 2010 at 2:05 PM, Terry Reedy <tjreedy@udel.edu> wrote:
On 7/24/2010 10:08 AM, Guido van Rossum wrote:
- Commit privileges: Maybe we've been too careful with only giving commit privileges to to experienced and trusted new developers. I spoke to Ezio Melotti and from his experience with getting commit privileges, it seems to be a case of "the lion is much more afraid of you than you are afraid of the lion". I.e. having got privileges he was very concerned about doing something wrong, worried about the complexity of SVN, and so on. Since we've got lots of people watching the commit stream, I think that there really shouldn't need to be a worry at all about a new committer doing something malicious, and there shouldn't be much worry about honest beginners' mistakes either -- the main worry remains that new committers don't use their privileges enough.
My initial inclination is to start with 1 or 2 line patches that I am 99.99% certain are correct. But it has occurred to me that it might be better for Python if I were willing to take a greater than 1/10000 chance of making a mistake. But how much greater? What error rate do *you* consider acceptable?
Mistakes get made all the time, mostly by experienced committers. When caught quickly they are easy to roll back (that's arguably much of the point of source control :-). New committers can also start with things like docs, where there are fewer risks and more little things that could easily be fixed but aren't, for lack of attention. This will help them figure out the source control tools and workflow, which will build up their (and our) confidence, making future success even more likely. -- --Guido van Rossum (python.org/~guido)
On 7/24/2010 3:08 PM, Guido van Rossum wrote:
While the EuroPython sprints are still going on, I am back home, and after a somewhat restful night of sleep, I have some thoughts I'd like to share before I get distracted. Note, I am jumping wildly between topics.
- Commit privileges: Maybe we've been too careful with only giving commit privileges to to experienced and trusted new developers. I spoke to Ezio Melotti and from his experience with getting commit privileges, it seems to be a case of "the lion is much more afraid of you than you are afraid of the lion". I.e. having got privileges he was very concerned about doing something wrong, worried about the complexity of SVN, and so on. Since we've got lots of people watching the commit stream, I think that there really shouldn't need to be a worry at all about a new committer doing something malicious, and there shouldn't be much worry about honest beginners' mistakes either -- the main worry remains that new committers don't use their privileges enough. So, my recommendation (which surely is a turn-around of my *own* attitude in the past) is to give out more commit privileges sooner.
+1. I think this would have a very positive effect on the way that the Python development community is perceived from the outside. In reality it's probably mostly an acceptance of the fact that new protocols are appropriate for a rather larger development community.
- Concurrency and parallelism: Russel Winder and Sarah Mount pushed the idea of CSP (http://en.wikipedia.org/wiki/Communicating_sequential_processes) in several talks at the conference. They (at least Russell) emphasized the difference between concurrency (interleaved event streams) and parallelism (using many processors to speed things up). Their prediction is that as machines with many processing cores become more prevalent, the relevant architecture will change from cores sharing a single coherent memory (the model on which threads are based) to one where each core has a limited amount of private memory, and communication is done via message passing between the cores. This gives them (and me :-) hope that the GIL won't be a problem as long as we adopt a parallel processing model. Two competing models are the Actor model, which is based on asynchronous communication, and CSP, which is synchronous (when a writer writes to a channel, it blocks until a reader reads that value -- a rendezvous). At least Sarah suggested that both models are important. She also mentioned that a merger is under consideration between the two major CSP-for-Python packages, Py-CSP and Python-CSP. I also believe that the merger will be based on the stdlib multiprocessing package, but I'm not sure. I do expect that we may get some suggestions from that corner to make some minor changes to details of multiprocessing (and perhaps threading), and I think we should be open to those (I expect these will be good suggestions for small tweaks, not major overhauls).
- After seeing Raymond's talk about monocle (search for it on PyPI) I am getting excited again about PEP 380 (yield from, return values from generators). Having read the PEP on the plane back home I didn't see anything wrong with it, so it could just be accepted in its current form. Implementation will still have to wait for Python 3.3 because of the moratorium. (Although I wouldn't mind making an exception to get it into 3.2.)
I can understand the temptation, but hope you can manage to resist it. The downside of allowing such exceptions is that people won't take these pronouncements seriously if they see that a sufficiently desirable goal is a reason for ignoring them. Everyone should be subject to the same rules.
- This made me think of how the PEP process should evolve so as to not require my personal approval for every PEP. I think the model for future PEPs should be the one we used for PEP 3148 (futures, which was just approved by Jesse): the discussion is led and moderated by one designated "PEP handler" (a different one for each PEP) and the PEP handler, after reviewing the discussion, decides when the PEP is approved. A PEP handler should be selected for each PEP as soon as possible; without a PEP handler, discussing a PEP is not all that useful. The PEP handler should be someone respected by the community with an interest in the subject of the PEP but at an arms' length (at least) from the PEP author. The PEP handler will have to moderate feedback, separating useful comments from (too much) bikeshedding, repetitious lines of questioning, and other forms of obstruction. The PEP handler should also set and try to maintain a schedule for the discussion. Note that a schedule should not be used to break a tie -- it should be used to stop bikeshedding and repeat discussions, while giving all interested parties a chance to comment. (I should say that this is probably similar to the role of an IETF working group director with respect to RFCs.)
I think the process where Jesse steered PEP 3148 worked well.
- Specifically, if Raymond is interested, I wouldn't mind seeing him as the PEP handler for PEP 380. For some of Martin von Löwis's PEPs (382, 384) I think a PEP handler is sorely lacking -- from the language summit it appeared as if nobody besides Martin understands these PEPs.
- A lot of things seem to be happening to make PyPI better. Is this being summarized somewhere? Based on some questions I received during my keynote Q&A (http://bit.ly/bdflqa) I think not enough people are aware of what we are already doing in this area. Frankly, I'm not sure I do, either: I think I've heard of a GSOC student and of plans to take over pypi.appspot.com (with the original developer's permission) to become a full and up-to-date mirror. Mirroring apparently also requires some client changes. Oh, and there's a proposed solution for the "register user" problem where apparently the clients had been broken by a unilateral change to the server to require a certain "yes I agree" checkbox.
There is indeed. PyPI has been troublesome in the past mostly because, while many people have strong opinions on how it should work, only Martin has put in much time to try and improve it (and has had precious little thanks for that). The site will definitely benefit from wider community involvement, and I think that your suggestions about opening up the developer community will also help relay the message that people are welcome to help on other aspects of Python such as the cheese shop.
For a hopefully eventually exhaustive overview of what was accomplished at EuroPython, go to http://wiki.europython.eu/After -- and if you know some blog about EuroPython not yet listed, please add it there.
Cool idea. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 DjangoCon US September 7-9, 2010 http://djangocon.us/ See Python Video! http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/
On Sun, 25 Jul 2010 10:04:57 am Steve Holden wrote:
- After seeing Raymond's talk about monocle (search for it on PyPI) I am getting excited again about PEP 380 (yield from, return values from generators). Having read the PEP on the plane back home I didn't see anything wrong with it, so it could just be accepted in its current form. Implementation will still have to wait for Python 3.3 because of the moratorium. (Although I wouldn't mind making an exception to get it into 3.2.)
I can understand the temptation, but hope you can manage to resist it.
The downside of allowing such exceptions is that people won't take these pronouncements seriously if they see that a sufficiently desirable goal is a reason for ignoring them. Everyone should be subject to the same rules.
I have no opinion on PEP 380 specifically, but surely a *sufficiently* desirable goal *should* be a reason for breaking the rules? Obedience to some abstract rule just because it is the rule is not a virtue. The moratorium is there to advance Python as a whole, and if (a big "if") it becomes a hindrance instead, then Guido should make an exception. I promise that I won't cease taking his pronouncements seriously if he does :) -- Steven D'Aprano
On Sat, Jul 24, 2010 at 9:16 PM, Steven D'Aprano <steve@pearwood.info> wrote:
On Sun, 25 Jul 2010 10:04:57 am Steve Holden wrote:
- After seeing Raymond's talk about monocle (search for it on PyPI) I am getting excited again about PEP 380 (yield from, return values from generators). Having read the PEP on the plane back home I didn't see anything wrong with it, so it could just be accepted in its current form. Implementation will still have to wait for Python 3.3 because of the moratorium. (Although I wouldn't mind making an exception to get it into 3.2.)
I can understand the temptation, but hope you can manage to resist it.
The downside of allowing such exceptions is that people won't take these pronouncements seriously if they see that a sufficiently desirable goal is a reason for ignoring them. Everyone should be subject to the same rules.
I have no opinion on PEP 380 specifically, but surely a *sufficiently* desirable goal *should* be a reason for breaking the rules? Obedience to some abstract rule just because it is the rule is not a virtue. The moratorium is there to advance Python as a whole, and if (a big "if") it becomes a hindrance instead, then Guido should make an exception.
I promise that I won't cease taking his pronouncements seriously if he does :)
I wasn't for the moratorium in the first place, so take this with a grain of salt, but ISTM that if you feel this doesn't impact the moratorium's goals then there's nothing logically inconsistent about allowing it through. Of course, if you feel like it does and you decide to let it through anyway, I think it would be worth explaining really well why exactly that happened. Geremy Condra
On Sun, Jul 25, 2010 at 2:16 PM, Steven D'Aprano <steve@pearwood.info> wrote:
On Sun, 25 Jul 2010 10:04:57 am Steve Holden wrote:
- After seeing Raymond's talk about monocle (search for it on PyPI) I am getting excited again about PEP 380 (yield from, return values from generators). Having read the PEP on the plane back home I didn't see anything wrong with it, so it could just be accepted in its current form. Implementation will still have to wait for Python 3.3 because of the moratorium. (Although I wouldn't mind making an exception to get it into 3.2.)
I can understand the temptation, but hope you can manage to resist it.
The downside of allowing such exceptions is that people won't take these pronouncements seriously if they see that a sufficiently desirable goal is a reason for ignoring them. Everyone should be subject to the same rules.
I have no opinion on PEP 380 specifically, but surely a *sufficiently* desirable goal *should* be a reason for breaking the rules? Obedience to some abstract rule just because it is the rule is not a virtue. The moratorium is there to advance Python as a whole, and if (a big "if") it becomes a hindrance instead, then Guido should make an exception.
I promise that I won't cease taking his pronouncements seriously if he does :)
We knew PEP 380 would be hurt by the moratorium when the moratorium PEP went through. The goals of the moratorium itself, in making it possible to have a 3.2 release that is fully supported by all of the major Python implementations, still apply, and I believe making an exception for PEP 380 *would* make those goals much harder to achieve. So, while I can understand Guido's temptation (PEP 380 *is* pretty cool), I'm among those that hope he resists that temptation. Letting these various ideas bake a little longer without syntactic support likely won't hurt either. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
Nick Coghlan, 25.07.2010 08:29:
We knew PEP 380 would be hurt by the moratorium when the moratorium PEP went through.
The goals of the moratorium itself, in making it possible to have a 3.2 release that is fully supported by all of the major Python implementations, still apply, and I believe making an exception for PEP 380 *would* make those goals much harder to achieve.
IMO, it would be worth asking the other implementations if that is the case. It may well be that they are interested in implementing it anyway, so getting it into CPython and the other implementations at the same time may actually be possible. It wouldn't meet the moratorium as such, but it would absolutely comply with its goals. Stefan
Am 25.07.2010 08:54, schrieb Stefan Behnel:
Nick Coghlan, 25.07.2010 08:29:
We knew PEP 380 would be hurt by the moratorium when the moratorium PEP went through.
The goals of the moratorium itself, in making it possible to have a 3.2 release that is fully supported by all of the major Python implementations, still apply, and I believe making an exception for PEP 380 *would* make those goals much harder to achieve.
IMO, it would be worth asking the other implementations if that is the case. It may well be that they are interested in implementing it anyway, so getting it into CPython and the other implementations at the same time may actually be possible. It wouldn't meet the moratorium as such, but it would absolutely comply with its goals.
+1. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.
2010/7/25 Stefan Behnel <stefan_ml@behnel.de>:
Nick Coghlan, 25.07.2010 08:29:
We knew PEP 380 would be hurt by the moratorium when the moratorium PEP went through.
The goals of the moratorium itself, in making it possible to have a 3.2 release that is fully supported by all of the major Python implementations, still apply, and I believe making an exception for PEP 380 *would* make those goals much harder to achieve.
IMO, it would be worth asking the other implementations if that is the case. It may well be that they are interested in implementing it anyway, so getting it into CPython and the other implementations at the same time may actually be possible. It wouldn't meet the moratorium as such, but it would absolutely comply with its goals.
Speaking from the PyPy perspective, syntax is not really a problem. It, for example, took me ~1 week to move PyPy from 2.5 to 2.7 syntax. A more interesting moratorium for us would be one on tests that are not implementation portable. :) -- Regards, Benjamin
On 28/07/2010 22:20, Benjamin Peterson wrote:
2010/7/25 Stefan Behnel<stefan_ml@behnel.de>:
Nick Coghlan, 25.07.2010 08:29:
We knew PEP 380 would be hurt by the moratorium when the moratorium PEP went through.
The goals of the moratorium itself, in making it possible to have a 3.2 release that is fully supported by all of the major Python implementations, still apply, and I believe making an exception for PEP 380 *would* make those goals much harder to achieve.
IMO, it would be worth asking the other implementations if that is the case. It may well be that they are interested in implementing it anyway, so getting it into CPython and the other implementations at the same time may actually be possible. It wouldn't meet the moratorium as such, but it would absolutely comply with its goals.
Speaking from the PyPy perspective, syntax is not really a problem. It, for example, took me ~1 week to more PyPy from 2.5 to 2.7 syntax. A more interesting moratorium for us would be one on tests that are not implementation portable. :)
At the PyCon language summit the IronPython guys said that syntax wasn't an issue for them either, but changes to builtins *could* be an issue. All the best, Michael -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog
On Wed, Jul 28, 2010 at 5:20 PM, Benjamin Peterson <benjamin@python.org> wrote:
2010/7/25 Stefan Behnel <stefan_ml@behnel.de>:
Nick Coghlan, 25.07.2010 08:29:
We knew PEP 380 would be hurt by the moratorium when the moratorium PEP went through.
The goals of the moratorium itself, in making it possible to have a 3.2 release that is fully supported by all of the major Python implementations, still apply, and I believe making an exception for PEP 380 *would* make those goals much harder to achieve.
IMO, it would be worth asking the other implementations if that is the case. It may well be that they are interested in implementing it anyway, so getting it into CPython and the other implementations at the same time may actually be possible. It wouldn't meet the moratorium as such, but it would absolutely comply with its goals.
Speaking from the PyPy perspective, syntax is not really a problem. It, for example, took me ~1 week to more PyPy from 2.5 to 2.7 syntax. A more interesting moratorium for us would be one on tests that are not implementation portable. :)
I thought at the last two pycons, we've all discussed that we should have a system in place for marking tests *and* modules within the stdlib as "will only work on FooPython". I suspect that it's waiting on the shared-stdlib effort, which is waiting on mercurial (and time). jesse
On 28/07/2010 23:57, Jesse Noller wrote:
On Wed, Jul 28, 2010 at 5:20 PM, Benjamin Peterson<benjamin@python.org> wrote:
2010/7/25 Stefan Behnel<stefan_ml@behnel.de>:
Nick Coghlan, 25.07.2010 08:29:
We knew PEP 380 would be hurt by the moratorium when the moratorium PEP went through.
The goals of the moratorium itself, in making it possible to have a 3.2 release that is fully supported by all of the major Python implementations, still apply, and I believe making an exception for PEP 380 *would* make those goals much harder to achieve.
IMO, it would be worth asking the other implementations if that is the case. It may well be that they are interested in implementing it anyway, so getting it into CPython and the other implementations at the same time may actually be possible. It wouldn't meet the moratorium as such, but it would absolutely comply with its goals.
Speaking from the PyPy perspective, syntax is not really a problem. It, for example, took me ~1 week to more PyPy from 2.5 to 2.7 syntax. A more interesting moratorium for us would be one on tests that are not implementation portable. :)
I thought at the last two pycons, we've all discussed that we should have a system in place for marking tests *and* modules within the stdlib as "will only work on FooPython". I suspect that it's waiting on the shared-stdlib effort, which is waiting on mercurial (and time).
It is also one of the things Brett intends to work on if his proposal is accepted by the PSF board. Michael
jesse
-- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog
On Thu, Jul 29, 2010 at 8:57 AM, Jesse Noller <jnoller@gmail.com> wrote:
I thought at the last two pycons, we've all discussed that we should have a system in place for marking tests *and* modules within the stdlib as "will only work on FooPython". I suspect that it's waiting on the shared-stdlib effort, which is waiting on mercurial (and time).
@skipIf, @cpython_only and @test_impl_detail have been getting sprinkled fairly liberally throughout the test suite, so that part of the effort is already in progress. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
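[A small illustration of the kind of marking Nick describes, using unittest's skip decorators and the cpython_only helper from CPython's own test package. The class and test names are invented, and the import path (test.support in the 3.x tree, test.test_support on 2.x) should be treated as an assumption rather than a guarantee for every branch.]

    import sys
    import unittest
    from test import support   # test.test_support on the 2.x branch

    class WidgetTests(unittest.TestCase):

        @unittest.skipIf(sys.platform == "win32", "POSIX-specific behaviour")
        def test_path_separator(self):
            import os
            self.assertEqual(os.sep, "/")

        @support.cpython_only
        def test_refcount_detail(self):
            # Relies on reference counting, a CPython implementation detail;
            # other implementations will skip this test.
            self.assertGreater(sys.getrefcount(object()), 0)

    if __name__ == "__main__":
        unittest.main()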
On Thu, Jul 29, 2010 at 8:10 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
On Thu, Jul 29, 2010 at 8:57 AM, Jesse Noller <jnoller@gmail.com> wrote:
I thought at the last two pycons, we've all discussed that we should have a system in place for marking tests *and* modules within the stdlib as "will only work on FooPython". I suspect that it's waiting on the shared-stdlib effort, which is waiting on mercurial (and time).
@skipIf, @cpython_only and @test_impl_detail have been getting sprinkled fairly liberally throughout the test suite, so that part of the effort is already in progress.
<mr. burns>Excellent</mr. burns> I have some sprinkling of my own to do then, I guess.
2010/7/29 Nick Coghlan <ncoghlan@gmail.com>:
On Thu, Jul 29, 2010 at 8:57 AM, Jesse Noller <jnoller@gmail.com> wrote:
I thought at the last two pycons, we've all discussed that we should have a system in place for marking tests *and* modules within the stdlib as "will only work on FooPython". I suspect that it's waiting on the shared-stdlib effort, which is waiting on mercurial (and time).
@skipIf, @cpython_only and @test_impl_detail have been getting sprinkled fairly liberally throughout the test suite, so that part of the effort is already in progress.
Note that as I port PyPy to 2.7, I'm marking tests. -- Regards, Benjamin
On Sat, Jul 24, 2010 at 10:08 AM, Guido van Rossum <guido@python.org> wrote:
While the EuroPython sprints are still going on, I am back home, and after a somewhat restful night of sleep, I have some thoughts I'd like to share before I get distracted. Note, I am jumping wildly between topics.
- Commit privileges: Maybe we've been too careful with only giving commit privileges to to experienced and trusted new developers. I spoke to Ezio Melotti and from his experience with getting commit privileges, it seems to be a case of "the lion is much more afraid of you than you are afraid of the lion". I.e. having got privileges he was very concerned about doing something wrong, worried about the complexity of SVN, and so on. Since we've got lots of people watching the commit stream, I think that there really shouldn't need to be a worry at all about a new committer doing something malicious, and there shouldn't be much worry about honest beginners' mistakes either -- the main worry remains that new committers don't use their privileges enough. So, my recommendation (which surely is a turn-around of my *own* attitude in the past) is to give out more commit privileges sooner.
I'd agree with this as well; speaking as someone who was terrified when I first got privileges, I can really back this up. However, I wonder if voluntary code reviews for bigger ticket items would be a good thing to encourage more. I know some people use rietveld - but maybe we should be using it more, plus emails to python-committers asking for reviews. As time has gone on, I've become more and more a firm believer in doing peer reviews.
- Concurrency and parallelism: Russel Winder and Sarah Mount pushed the idea of CSP (http://en.wikipedia.org/wiki/Communicating_sequential_processes) in several talks at the conference. They (at least Russell) emphasized the difference between concurrency (interleaved event streams) and parallelism (using many processors to speed things up). Their prediction is that as machines with many processing cores become more prevalent, the relevant architecture will change from cores sharing a single coherent memory (the model on which threads are based) to one where each core has a limited amount of private memory, and communication is done via message passing between the cores. This gives them (and me :-) hope that the GIL won't be a problem as long as we adopt a parallel processing model. Two competing models are the Actor model, which is based on asynchronous communication, and CSP, which is synchronous (when a writer writes to a channel, it blocks until a reader reads that value -- a rendezvous). At least Sarah suggested that both models are important. She also mentioned that a merger is under consideration between the two major CSP-for-Python packages, Py-CSP and Python-CSP. I also believe that the merger will be based on the stdlib multiprocessing package, but I'm not sure. I do expect that we may get some suggestions from that corner to make some minor changes to details of multiprocessing (and perhaps threading), and I think we should be open to those (I expect these will be good suggestions for small tweaks, not major overhauls).
I'm open to changes, but remain skeptical that CSP will suddenly take over the world :) There are other, competing philosophies and toolkits out there as well. Additionally, the patches would have to be relicensed - python-csp is under GPLv2 AFAICT.
- After seeing Raymond's talk about monocle (search for it on PyPI) I am getting excited again about PEP 380 (yield from, return values from generators). Having read the PEP on the plane back home I didn't see anything wrong with it, so it could just be accepted in its current form. Implementation will still have to wait for Python 3.3 because of the moratorium. (Although I wouldn't mind making an exception to get it into 3.2.)
I, like others, want PEP 380 to be in and done (it's exciting!). However, we knew going into the moratorium that it would negatively affect PEP 380 - as a co-author of the moratorium PEP, this was one of the few things which made me second-guess it. So, in this case I'd have to vote no; we knew going in that it would do this.
- This made me think of how the PEP process should evolve so as to not require my personal approval for every PEP. I think the model for future PEPs should be the one we used for PEP 3148 (futures, which was just approved by Jesse): the discussion is led and moderated by one designated "PEP handler" (a different one for each PEP) and the PEP handler, after reviewing the discussion, decides when the PEP is approved. A PEP handler should be selected for each PEP as soon as possible; without a PEP handler, discussing a PEP is not all that useful. The PEP handler should be someone respected by the community with an interest in the subject of the PEP but at an arms' length (at least) from the PEP author. The PEP handler will have to moderate feedback, separating useful comments from (too much) bikeshedding, repetitious lines of questioning, and other forms of obstruction. The PEP handler should also set and try to maintain a schedule for the discussion. Note that a schedule should not be used to break a tie -- it should be used to stop bikeshedding and repeat discussions, while giving all interested parties a chance to comment. (I should say that this is probably similar to the role of an IETF working group director with respect to RFCs.)
This reminds me of discussions I've had in some venues about Python adopting a lieutenant-based system à la the Linux kernel. I'm a fan of what you're suggesting, but would take it one step further - the PEP handler should also be a committer, someone with experience working with the current core group, so that the PEP handler can eventually help mentor (if needed) the new code (and possibly the new committer) through the integration process.
On Sun, Jul 25, 2010 at 2:26 PM, Jesse Noller <jnoller@gmail.com> wrote:
On Sat, Jul 24, 2010 at 10:08 AM, Guido van Rossum <guido@python.org> wrote:
- After seeing Raymond's talk about monocle (search for it on PyPI) I am getting excited again about PEP 380 (yield from, return values from generators). Having read the PEP on the plane back home I didn't see anything wrong with it, so it could just be accepted in its current form. Implementation will still have to wait for Python 3.3 because of the moratorium. (Although I wouldn't mind making an exception to get it into 3.2.)
I, like others, want PEP 380 to be in and done (it's exciting!). However, we knew going into the moratorium that it would negatively affect PEP 380 - as a co-author, it was one of the few things which made me second-guess the implementation of the moratorium. So; in this case I'd have to vote no, we knew going in it would do this.
I was/am pro PEP 380 and pro moratorium. We knew going into the moratorium that PEP 380 wouldn't be included and talked about it extensively. We should honor that now for the same reasons we talked about then: declaring "no syntax changes" allows for a focus on the stdlib. -Jack
On 25 July 2010 19:26, Jesse Noller <jnoller@gmail.com> wrote:
On Sat, Jul 24, 2010 at 10:08 AM, Guido van Rossum <guido@python.org> wrote:
- Concurrency and parallelism: Russel Winder and Sarah Mount pushed the idea of CSP (http://en.wikipedia.org/wiki/Communicating_sequential_processes) in several talks at the conference. They (at least Russell) emphasized the difference between concurrency (interleaved event streams) and parallelism (using many processors to speed things up). Their prediction is that as machines with many processing cores become more prevalent, the relevant architecture will change from cores sharing a single coherent memory (the model on which threads are based) to one where each core has a limited amount of private memory, and communication is done via message passing between the cores. This gives them (and me :-) hope that the GIL won't be a problem as long as we adopt a parallel processing model. Two competing models are the Actor model, which is based on asynchronous communication, and CSP, which is synchronous (when a writer writes to a channel, it blocks until a reader reads that value -- a rendezvous). At least Sarah suggested that both models are important. She also mentioned that a merger is under consideration between the two major CSP-for-Python packages, Py-CSP and Python-CSP. I also believe that the merger will be based on the stdlib multiprocessing package, but I'm not sure. I do expect that we may get some suggestions from that corner to make some minor changes to details of multiprocessing (and perhaps threading), and I think we should be open to those (I expect these will be good suggestions for small tweaks, not major overhauls).
I'm open to changes; but remain skeptical the CSP will suddenly take over the world :)
There's other, competing philosophies and toolkits out there as well. Additionally; the patches would have to be relicensed - python-csp is under GPLv2 AFAICT.
Thanks for this write-up. Just a few things to follow up on from the comments here...

* There is a discussion taking place, as we speak, on the python-csp list about merging PyCSP and python-csp. Like I said in the talk, it's really a matter of details more than anything else and will likely go ahead, hopefully soon. The issue of licensing is one thing we are talking about.

* We do currently use threading and multiprocessing. I think we could achieve a big performance boost by losing mp and moving to a C implementation of much of our own work, but only because so much of mp is pure Python and implementing a message-passing library requires some rather idiosyncratic programming. That's really an implementation detail for us though, so please don't read it as a comment about mp or threading in general.

* In terms of the Python standard library, python-csp was very much written with our own purposes in mind, and in order to provide a platform for the CSP "paradigm" of programming to be used in a Pythonic style. Multiprocessing as it currently stands is great, and provides good support for the sort of design patterns that are currently very common in concurrent and parallel programming, like process-safe queues and so on. That's brilliant, but I would like to see the Python stdlib support a wider range of programming styles, including but not exclusive to message-passing concurrency. java.util.concurrent does this very nicely and has a wide range of constructs such as barriers, futures and so on. Python has a wide range of packages outside the standard library (trellis, pycells, python-csp, STM implementations, and so on) but not so much inside the stdlib. I think that's a shame. I won't bang on about this too much here; we'll write up some code over the next few months, and when there's something more concrete and fully baked to discuss I think it would be a really useful discussion to have.

Thanks, Sarah -- Sarah Mount, Senior Lecturer, University of Wolverhampton website: http://www.snim2.org/ twitter: @snim2
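[Sarah mentions futures among the constructs java.util.concurrent provides. One such construct is already headed for Python's stdlib: the executor/future API of the just-approved PEP 3148, the concurrent.futures package. A minimal sketch based on the PEP follows; at the time of this thread the package had not yet shipped in a release, and the work function below is invented for illustration.]

    import concurrent.futures

    def simulate(task_id):
        # Stand-in for a unit of work submitted to the executor.
        return task_id * task_id

    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        # submit() returns Future objects immediately; as_completed() yields
        # them as their results become available.
        futures = [executor.submit(simulate, n) for n in range(8)]
        for future in concurrent.futures.as_completed(futures):
            print(future.result())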
On Sat, Jul 24, 2010 at 7:08 AM, Guido van Rossum <guido@python.org> wrote:
- Commit privileges: Maybe we've been too careful with only giving commit privileges to to experienced and trusted new developers. I spoke to Ezio Melotti and from his experience with getting commit privileges, it seems to be a case of "the lion is much more afraid of you than you are afraid of the lion". I.e. having got privileges he was very concerned about doing something wrong, worried about the complexity of SVN, and so on. Since we've got lots of people watching the commit stream, I think that there really shouldn't need to be a worry at all about a new committer doing something malicious, and there shouldn't be much worry about honest beginners' mistakes either -- the main worry remains that new committers don't use their privileges enough. So, my recommendation (which surely is a turn-around of my *own* attitude in the past) is to give out more commit privileges sooner.
+1 agreed!
- Concurrency and parallelism: Russel Winder and Sarah Mount pushed the idea of CSP (http://en.wikipedia.org/wiki/Communicating_sequential_processes) in several talks at the conference. They (at least Russell) emphasized the difference between concurrency (interleaved event streams) and parallelism (using many processors to speed things up). Their prediction is that as machines with many processing cores become more prevalent, the relevant architecture will change from cores sharing a single coherent memory (the model on which threads are based) to one where each core has a limited amount of private memory, and communication is done via message passing between the cores. This
I do not believe this prediction. The dominant systems being deployed today have 4, 6, 8, or 12 cores on a single memory bus with coherent memory. Sure, NUMA is clearly the dominant architecture, but any subdivisions will still have multiple cores with access to the same coherent memory. We'll just end up with multiples of that in one system. The future architecture is _not_ the Cell processor.

gives them (and me :-) hope that the GIL won't be a problem as long as we adopt a parallel processing model. Two competing models are the Actor model, which is based on asynchronous communication, and CSP, which is synchronous (when a writer writes to a channel, it blocks until a reader reads that value -- a rendezvous). At least Sarah suggested that both models are important. She also mentioned that a merger is under consideration between the two major CSP-for-Python packages, Py-CSP and Python-CSP. I also believe that the merger will be based on the stdlib multiprocessing package, but I'm not sure. I do expect that we may get some suggestions from that corner to make some minor changes to details of multiprocessing (and perhaps threading), and I think we should be open to those (I expect these will be good suggestions for small tweaks, not major overhauls).
The async communication model is good regardless of individual architecture because it more readily grows out beyond a single computer to a network of computers when you want to scale an application up. So yes, anything we can do to help that is good. Hoping that the GIL won't be a problem has been a strategy for over a decade. It has failed. It is limiting what people can and will do with Python today. It isn't going to magically un-become a problem. -gps
On 7/25/10 3:19 PM, "Gregory P. Smith" <greg@krypto.org> wrote:
On Sat, Jul 24, 2010 at 7:08 AM, Guido van Rossum <guido@python.org> wrote:
- Commit privileges: Maybe we've been too careful with only giving commit privileges to to experienced and trusted new developers. I spoke to Ezio Melotti and from his experience with getting commit privileges, it seems to be a case of "the lion is much more afraid of you than you are afraid of the lion". I.e. having got privileges he was very concerned about doing something wrong, worried about the complexity of SVN, and so on. Since we've got lots of people watching the commit stream, I think that there really shouldn't need to be a worry at all about a new committer doing something malicious, and there shouldn't be much worry about honest beginners' mistakes either -- the main worry remains that new committers don't use their privileges enough. So, my recommendation (which surely is a turn-around of my *own* attitude in the past) is to give out more commit privileges sooner.
+1 agreed!
- Concurrency and parallelism: Russel Winder and Sarah Mount pushed the idea of CSP (http://en.wikipedia.org/wiki/Communicating_sequential_processes) in several talks at the conference. They (at least Russell) emphasized the difference between concurrency (interleaved event streams) and parallelism (using many processors to speed things up). Their prediction is that as machines with many processing cores become more prevalent, the relevant architecture will change from cores sharing a single coherent memory (the model on which threads are based) to one where each core has a limited amount of private memory, and communication is done via message passing between the cores. This
I do not believe this prediction. The dominant systems being deployed today have 4,6,8,12 cores on a single memory bus with coherent memory. Sure, NUMA is clearly the dominant architecture but any subdivisions will still have multiple cores with access to the same coherent memory. We'll just end up with multiples of that in one system. The future architecture is _not_ the Cell processor.
+1 And there is so much legacy software out there that depends on single-coherency systems that it is unlikely such a memory architecture would become pervasive, given the hurdle of rewriting those programs, Python programs included.
gives them (and me :-) hope that the GIL won't be a problem as long as we adopt a parallel processing model. Two competing models are the Actor model, which is based on asynchronous communication, and CSP, which is synchronous (when a writer writes to a channel, it blocks until a reader reads that value -- a rendezvous). At least Sarah suggested that both models are important. She also mentioned that a merger is under consideration between the two major CSP-for-Python packages, Py-CSP and Python-CSP. I also believe that the merger will be based on the stdlib multiprocessing package, but I'm not sure. I do expect that we may get some suggestions from that corner to make some minor changes to details of multiprocessing (and perhaps threading), and I think we should be open to those (I expect these will be good suggestions for small tweaks, not major overhauls).
The async communication model is good regardless of individual architecture because it more readily grows out beyond a single computer to a network of computers when you want to scale an application up. So yes, anything we can do to help that is good.
+1
Hoping that the GIL won't be a problem has been a strategy for over a decade. It has failed. It is limiting what people can and will do with Python today. It isn't going to magically un-become a problem.
+1 FWIW: We use Python at Tabblo, straddled across Python 2.5.4 and 2.6.5. They work. And they work well. But we make light use of threads (mostly background I/O handling), and heavy use of multiple processes because we can't take advantage of our multi-core systems otherwise. -peter
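[A concrete picture of the workaround Peter describes: pushing an embarrassingly parallel, pure-Python workload onto processes instead of threads, so each worker gets its own interpreter (and core) despite the GIL. The workload function is invented for illustration.]

    import multiprocessing

    def crunch(n):
        # Pure-Python CPU-bound work; threads would serialize on the GIL here.
        return sum(i * i for i in range(n))

    if __name__ == "__main__":          # the guard is required on Windows
        pool = multiprocessing.Pool()   # defaults to one worker per core
        results = pool.map(crunch, [1000000] * 8)   # fanned out across processes
        pool.close()
        pool.join()
        print(results)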
On Sun, Jul 25, 2010 at 8:31 PM, Peter Portante <peter.a.portante@gmail.com> wrote:
FWIW: We use Python at Tabblo, straddled across Python 2.5.4 and 2.6.5. They work. And they work well. But we make light use of threads (mostly background I/O handling), and heavy use of multiple processes because we can't take advantage of our multi-core systems otherwise.
Isn't this an indication that the GIL is, in fact, not (much of) a problem? I wish those trying to get rid of the GIL well. But it may not be the panacea some folks are hoping for. Multi-threaded programming remains hard (and removing the GIL might actually make it harder). Jython and IronPython don't have a GIL, and I think PyPy may not either. Does anyone have experience with GIL-free programming in one of those? -- --Guido van Rossum (python.org/~guido)
On 7/25/10 11:42 PM, "Guido van Rossum" <guido@python.org> wrote:
On Sun, Jul 25, 2010 at 8:31 PM, Peter Portante <peter.a.portante@gmail.com> wrote:
FWIW: We use Python at Tabblo, straddled across Python 2.5.4 and 2.6.5. They work. And they work well. But we make light use of threads (mostly background I/O handling), and heavy use of multiple processes because we can't take advantage of our multi-core systems otherwise.
Isn't this an indication that the GIL is, in fact, not (much of) a problem?
Meaning, we have a working system, so the GIL is not (much of) a problem? Or that we have successfully spent a lot of time and effort rewriting embarrassingly parallel multithreaded algorithms into somewhat more complex message-passing multi-process algorithms because we can't get the language implementation to make efficient use of multiple CPUs, thus preventing the GIL from being (much of) a problem? Perhaps we have to ask what it means to say the GIL is a problem. If what we mean is that the existence of the GIL does not cause a CPython based program to fail, then yes, it is not a problem at all. In fact, it is a testament to the level of excellence the code has achieved through the hard work folks have put in over the years. If what we mean is that the existence of the GIL prevents a multithreaded CPython application from taking advantage of multiple CPUs, then yes, it is a "problem". So the above statement says that the GIL is not a problem, and that it is THE problem, depending on your definition. :)
I wish those trying to get rid of the GIL well. But it may not be the panacea some folks are hoping for.
You are right, getting rid of the GIL is not a panacea for anything. Removing the GIL means that there will be other changes to the behavioral landscape of the language implementation which folks will have to learn and understand well to write multi-threaded programs that perform well. Anybody wishing to make a whole system run well must engage in that process of learning and discovery. Yet, shouldn't we be able to write a simple embarrassingly parallel multithreaded algorithm in python (no C-extensions) and have its execution use all the cores on a system using CPython? Python is a beautiful language in which to express algorithms. Having to resort to other languages, C extensions, or other implementations of Python, in order to express those algorithms that rely on execution entities sharing a coherent memory space is a limitation imposed by the existence of the GIL in CPython. Is that limitation worth the effort to remove? Perhaps. Perhaps not. Perhaps Jython, or IronPython, or other implementations of Python that don't have a GIL provide a path forward for folks that need that. Those implementations don't currently provide a path forward for what we are doing, so we avoid the use of threads with CPython.
Multi-threaded programming remains hard (and removing the GIL might actually make it harder).
Could we make a statement that the perceived difficulty of multithreaded programming would only increase if a CPython implementation had undocumented behaviors, or undefined behaviors that should be defined? In other words, the difficulty of multithreaded programming is independent of the existence of a/the GIL, but is dependent on the thorough documentation of all language implementation behaviors.
Jython and IronPython don't have a GIL, and I think PyPy may not either.
FWIW: We have considered switching to Jython because it does not have a GIL. Unfortunately, we'd have to find replacements for some of the C-extension modules we use. Sincerely, -peter
Does anyone have experience with GIL-free programming in one of those?
On 7/26/2010 2:40 AM, Peter Portante wrote:
Yet, shouldn't we be able to write a simple embarrassingly parallel multithreaded algorithm in python (no C-extensions) and have its execution use all the cores on a system using CPython?
Abstractly, yes, and I believe you can do that now with some implementations. The actual questions are along the lines of: What would be the cost of making that happen with CPython? Who would be disadvantaged by making that happen with CPython? And, for both of those, is the tradeoff worth it? Another way to put it: should CPython be optimized for 1, 2, 3, or 4 or more cores? The answer to this is obviously changing. I will soon replace a single core with a 4/6 core machine, so I would be right in the middle on that, except that my current work is all single-threaded anyway. But that could change. Should all implementations be optimized the same way? Of course, with several developers focused on these issues, we could have a compile-time switch and distribute multiple Windows binaries, but this does not seem like fun, volunteer-type stuff. -- Terry Jan Reedy
Terry Reedy wrote:
Should CPython be optimized for 1, 2, 3, or 4 or more cores? The answer to this is obviously changing. I will soon replace a single core with a 4/6 core machine,
I don't think you can answer that just by considering the average number of cores in a CPU. Even if my CPU has 4 cores, most of the time the Python code I run on it isn't going to take advantage of more than one of them, simply because it's not written to be multi-threaded. I would not like to be in a position where I *have* to use some number of cores in order to get reasonable performance from my Python code. -- Greg
On 26/07/2010 04:42, Guido van Rossum wrote:
On Sun, Jul 25, 2010 at 8:31 PM, Peter Portante <peter.a.portante@gmail.com> wrote:
FWIW: We use Python at Tabblo, straddled across Python 2.5.4 and 2.6.5. They work. And they work well. But we make light use of threads (mostly background I/O handling), and heavy use of multiple processes because we can't take advantage of our multi-core systems otherwise.
Isn't this an indication that the GIL is, in fact, not (much of) a problem?
I wish those trying to get rid of the GIL well. But it may not be the panacea some folks are hoping for. Multi-threaded programming remains hard (and removing the GIL might actually make it harder).
Jython and IronPython don't have a GIL, and I think PyPy may not either. Does anyone have experience with GIL-free programming in one of those?
At Resolver Systems we created a "calculation system" that does large calculations on background threads using IronPython. Doing them on a background thread allows the ui to remain responsive. Several calculations could run simultaneously using multiple cores. As the calculation operates on a large object graph (which the ui then needs access to in order to display it) using multiprocessing would have imposed a very big overhead due to serialization / deserialization (the program runs on windows). Using CPython would have made the program a lot slower due to the GIL. All the best, Michael -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog
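The serialization cost Michael mentions is easy to see in isolation. The sketch below is mine, not Resolver's code, and the toy object graph is made up; it just times a pickle round trip of a large object graph, which is roughly the price every cross-process hand-off of such a graph would pay, while threads sharing the same graph pay nothing extra.

    import pickle
    import time

    class Cell(object):
        # A toy stand-in for a node in a large calculation result graph.
        def __init__(self, value):
            self.value = value
            self.depends_on = []

    # Build a toy object graph; real graphs can be much larger and richer.
    cells = [Cell(i) for i in range(100000)]
    for i in range(1, len(cells)):
        cells[i].depends_on.append(cells[i - 1])

    start = time.time()
    blob = pickle.dumps(cells, pickle.HIGHEST_PROTOCOL)
    restored = pickle.loads(blob)
    print("pickle round trip: %.2fs, %d bytes" % (time.time() - start, len(blob)))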
On 26 Jul, 2010, at 12:00 PM, Michael Foord <fuzzyman@voidspace.org.uk> wrote: On 26/07/2010 04:42, Guido van Rossum wrote:
On Sun, Jul 25, 2010 at 8:31 PM, Peter Portante <peter.a.portante@gmail.com> wrote:
FWIW: We use Python at Tabblo, straddled across Python 2.5.4 and 2.6.5. They work. And they work well. But we make light use of threads (mostly background I/O handling), and heavy use of multiple processes because we can't take advantage of our multi-core systems otherwise.
Isn't this an indication that the GIL is, in fact, not (much of) a problem?
I wish those trying to get rid of the GIL well. But it may not be the panacea some folks are hoping for. Multi-threaded programming remains hard (and removing the GIL might actually make it harder).
Jython and IronPython don't have a GIL, and I think PyPy may not either. Does anyone have experience with GIL-free programming in one of those?
At Resolver Systems we created a "calculation system" that does large calculations on background threads using IronPython. Doing them on a background thread allows the ui to remain responsive. Several calculations could run simultaneously using multiple cores. As the calculation operates on a large object graph (which the ui then needs access to in order to display it) using multiprocessing would have imposed a very big overhead due to serialization / deserialization (the program runs on windows). Using CPython would have made the program a lot slower due to the GIL. I have a similar use case, although in my case it's more about using large blobs instead of complicated data structures. I'm not hurt by the GIL because most threads run C code most of the time, which enables us to use multiple CPU cores without getting hurt by the GIL. In my opinion the GIL is a weak point of CPython and it would be nice if it could be fixed. That is however easier said than done; a number of people have tried in the past and ran into implementation limitations, like our refcounting garbage collector, that make it hard to remove the GIL without either rewriting lots of code or running into a brick wall performance-wise. The HotPy presentation at EuroPython shows that it is possible to remove the GIL, although at the cost of replacing the garbage collector and most likely breaking existing C extensions (although the HotPy author seemed to have a possible workaround for that). Ronald
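A minimal sketch of the pattern Ronald describes, using zlib as the stand-in "C extension" since its compressor can release the GIL while it runs. Whether and for how long any particular extension drops the GIL is up to that extension and the CPython version, so treat this as an illustration rather than a guarantee.

    import os
    import threading
    import zlib

    PAYLOAD = os.urandom(8 * 1024 * 1024)  # a large blob, as in the use case above

    def crunch():
        for _ in range(5):
            # Most of the time is spent inside the C compressor, which can
            # drop the GIL, so these threads may really run on separate cores.
            zlib.compress(PAYLOAD, 9)

    threads = [threading.Thread(target=crunch) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("done")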
On Mon, Jul 26, 2010 at 3:00 AM, Michael Foord <fuzzyman@voidspace.org.uk> wrote:
At Resolver Systems we created a "calculation system" that does large calculations on background threads using IronPython. Doing them on a background thread allows the ui to remain responsive. Several calculations could run simultaneously using multiple cores.
As the calculation operates on a large object graph (which the ui then needs access to in order to display it) using multiprocessing would have imposed a very big overhead due to serialization / deserialization (the program runs on windows).
Using CPython would have made the program a lot slower due to the GIL.
Sure. Note that using threads with the GIL, it is not a problem to keep the UI responsive even if background calculations are going on (at worst it requires some tweaking of sys.setcheckinterval() or its new-GIL equivalent). However, with the GIL, multiple calculations would be limited to a single core. According to CSP advocates, this approach will break down when you need more than 8-16 cores since cache coherence breaks down at 16 cores. Then you would have to figure out a message-passing approach (but the messages would have to be very fast). -- --Guido van Rossum (python.org/~guido)
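For what it's worth, the "responsive UI alongside a background calculation" case really is fine under the GIL. The sketch below is a bare illustration of the shape of it (a polling loop stands in for both the UI and the calculation); sys.setcheckinterval(), or the switch-interval setting that replaced it under the new GIL, only tunes how eagerly the interpreter offers to switch threads.

    import threading
    import time

    result = {}

    def calculate():
        # Stand-in for a long pure-Python calculation.
        total = 0
        for i in range(5 * 10 ** 6):
            total += i * i
        result["value"] = total

    worker = threading.Thread(target=calculate)
    worker.start()

    while worker.is_alive():
        # The "UI" thread keeps getting scheduled while the calculation runs;
        # it just cannot use a second core for Python bytecode.
        print("still responsive...")
        time.sleep(0.2)

    print("done:", result["value"])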
According to CSP advocates, this approach will break down when you need more than 8-16 cores since cache coherence breaks down at 16 cores. Then you would have to figure out a message-passing approach (but the messages would have to be very fast).
It does break down, and probably always will. IMHO this gets worse with NUMA architectures becoming more prevalent. But even with 50 cores you may be happy to have something run with 4-8 threads and shared memory from time to time. Developing good message-based schemes is important for the long run, but I think multithreaded parallelization will become more common before we see a general switch to messages. Regards, Joerg Blank
On 7/26/2010 7:36 AM, Guido van Rossum wrote:
According to CSP advocates, this approach will break down when you need more than 8-16 cores since cache coherence breaks down at 16 cores. Then you would have to figure out a message-passing approach (but the messages would have to be very fast).
Catching up on Python-Dev after 3 months of travel (lucky me!), so apologies for a "blast from the past" as I'm 6 weeks late in replying here. Think of the hardware implementation of cache coherence as a MIL - memory interleave lock, or a micro interpreter lock (the hardware is interpreting what the compiled software is doing). That is not so different from Python's GIL, just at a lower level. I didn't read the CSP advocacy papers, but experience with early parallel systems at CMU, Tandem Computers, and Teradata strongly implies that multiprocessing of some sort will always be able to scale larger than memory-coherent cores -- if the application can be made parallel at all. It is interesting to note that all the parallel systems mentioned above implemented fast message passing hardware of various sorts (affected by the available technologies of their times). It is interesting to note the similarities between some of the extreme multi-way cache coherence approaches and the various message passing hardware, also... some of the papers that talk about exceeding 16 cores were going down a message passing road to achieve it. Maybe something new has been discovered in the last 8 years since I've not been following the research... the only thing I've read about that in the last 8 years is the loss of Jim Gray at sea... but the IEEE paper you posted later seems to confirm my suspicions that there has not yet been a breakthrough. The point of the scalability remark, though, is that while lots of problems can be solved on a multi-core system, problems also grow bigger, and there will likely always be problems that cannot be solved on a multi-core (single cache coherent memory) system. Those problems will require message passing solutions. Experience with the systems above has shown that switching from a multi-core (semaphore based) design to a message passing design is usually a rewrite. Perhaps the existence of the GIL, forcing a message passing solution to be created early, is a blessing in disguise for the design of large scale applications. I've been hearing for years about problems for which the data is too large to share and the calculation is too complex to parallelize, but once the available hardware is exhausted as the problem grows, the only path to larger scale is message passing parallelism... forcing a redesign of applications that outgrew the available hardware. That said, applications that do fit in available hardware generally can run a little faster with some sort of shared memory approach: message passing does have overhead. -- Glenn
On Mon, Jul 26, 2010 at 12:00 PM, Michael Foord <fuzzyman@voidspace.org.uk> wrote:
At Resolver Systems we created a "calculation system" that does large calculations on background threads using IronPython. Doing them on a background thread allows the ui to remain responsive. Several calculations could run simultaneously using multiple cores.
As the calculation operates on a large object graph (which the ui then needs access to in order to display it) using multiprocessing would have imposed a very big overhead due to serialization / deserialization (the program runs on windows). [...] All the best,
Michael
Hey, (De)serialization being a much bigger cost than cache invalidation for a small number of threads that each do a lot of work is definitely a common "problem" (in quotes, because as you mentioned: it actually *works*!). There are a number of ways that CSP tries to solve that (generally involving more locking), but they are not currently applicable to CPython because of the state of the GIL. Unfortunately, CSP theory appears to predict this is something that starts breaking down around 16 or so cores. Since x86-64 CPUs (Opterons) are currently available with 12 cores and their 16-core bigger brother is coming in 2011, I guess now would be a good time to start worrying about it :-) I'd like to chime in from my experience with E, because they've run into this problem (processors want many processes to perform, but (de)serialization makes that prohibitive) and tried to solve it (and I think they did well). As always when I talk about E, I'm not suggesting everyone drops everything and does this, but it might be interesting to look at. (Disclaimer: the following explanation makes minor concessions to pedant-proof levels of purity in the interest of giving everyone an idea of what something is that's correct enough to reason about it on an abstract level -- people who are interested, please read the Wikipedia bits, they're surprisingly good :-)) E introduces a concept called "vats". They have an event queue, their own stack and N objects. Vats run on top of real processes, which have 0..N vats. The advantage is that vats don't share namespaces but can (but don't necessarily) share memory spaces. So, messaging between vats *can* be cheap (I'm unfamiliar with threading under .NET, but if it's similar to how it happens on the JVM: same ballpark), but the vat is completely oblivious to whether it's running on the same process as a different vat or on a completely different one running on a CPU on the other side of the world. Because inter-vat message passing is explicit, these vats can also run in parallel with no issues. The simplest way to implement this would be a vat-local GIL (I realise the name GIL no longer makes sense there) for each vat, with the process (most likely written in C(ython)) and the objects inside each vat contesting it. Or, in closing, but less exciting sounding: we've reinvented threads and they're called vats now! (The advantage is that you get the distributed nature, and only pay for it when you actually need it.) Computers are reasonably good at this sort of scheduling (putting the appropriate vats together), but it wouldn't be unthinkable to have the programmer hint at it. You just have to be careful not to take it too far and get into gcc realm, where higher levels of optimization include things like "ignore programmer hints". Caveat emptor: E has always cared much more about capabilities (so the security aspect) than parallel execution. Thanks for reading, Laurens
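To make the vat idea a bit more tangible, here is a very rough sketch of my own (not E's implementation; the class and names are made up): each vat owns its objects and an inbox, runs its own loop, and is only ever reached by putting a message on that inbox. Nothing in the calling code would need to change if a vat were backed by another process or another machine instead of a thread.

    import threading
    try:
        import queue           # Python 3
    except ImportError:
        import Queue as queue  # Python 2

    class Vat(object):
        def __init__(self, name):
            self.name = name
            self.inbox = queue.Queue()   # the event queue
            self.objects = {}            # state private to this vat
            self._thread = threading.Thread(target=self._loop)
            self._thread.start()

        def send(self, message):
            # The only way in from outside: asynchronous message passing.
            self.inbox.put(message)

        def _loop(self):
            while True:
                target, payload = self.inbox.get()
                if target is None:
                    break
                self.objects.setdefault(target, []).append(payload)
                print("%s handled %r for %r" % (self.name, payload, target))

        def stop(self):
            self.inbox.put((None, None))
            self._thread.join()

    a, b = Vat("vat-a"), Vat("vat-b")
    a.send(("counter", 1))
    b.send(("counter", 2))
    a.stop()
    b.stop()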
On 26/07/2010 04:42, Guido van Rossum wrote:
On Sun, Jul 25, 2010 at 8:31 PM, Peter Portante <peter.a.portante@gmail.com> wrote:
FWIW: We use Python at Tabblo, straddled across Python 2.5.4 and 2.6.5. They work. And they work well. But we make light use of threads (mostly background I/O handling), and heavy use of multiple processes because we can't take advantage of our multi-core systems otherwise.
Isn't this an indication that the GIL is, in fact, not (much of) a problem?
I wish those trying to get rid of the GIL well. But it may not be the panacea some folks are hoping for. Multi-threaded programming remains hard (and removing the GIL might actually make it harder).
Jython and IronPython don't have a GIL, and I think PyPy may not either. Does anyone have experience with GIL-free programming in one of those?
Oh, and PyPy does have a GIL but the developers say it wouldn't be a huge amount of work to remove it. Presumably they would have to add locking in the right places - which would then impact performance. As PyPy doesn't use reference counting adding locking shouldn't impact performance as much as previous attempts with CPython have. Michael -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog
On Mon, Jul 26, 2010 at 12:02 PM, Michael Foord <fuzzyman@voidspace.org.uk> wrote:
On 26/07/2010 04:42, Guido van Rossum wrote:
On Sun, Jul 25, 2010 at 8:31 PM, Peter Portante <peter.a.portante@gmail.com> wrote:
FWIW: We use Python at Tabblo, straddled across Python 2.5.4 and 2.6.5. They work. And they work well. But we make light use of threads (mostly background I/O handling), and heavy use of multiple processes because we can't take advantage of our multi-core systems otherwise.
Isn't this an indication that the GIL is, in fact, not (much of) a problem?
I wish those trying to get rid of the GIL well. But it may not be the panacea some folks are hoping for. Multi-threaded programming remains hard (and removing the GIL might actually make it harder).
Jython and IronPython don't have a GIL, and I think PyPy may not either. Does anyone have experience with GIL-free programming in one of those?
Oh, and PyPy does have a GIL but the developers say it wouldn't be a huge amount of work to remove it.
It wouldn't be as huge as on CPython, since we don't have reference counting, but it's still *a lot* of work and someone would have to step up and take on this task (since no core PyPy dev is that interested in it).
Presumably they would have to add locking in the right places - which would then impact performance. As PyPy doesn't use reference counting adding locking shouldn't impact performance as much as previous attempts with CPython have.
That's one thing, but the other thing is that the JIT can remove a lot of locks (like it does on the JVM), but that's yet another batch of work to be done. Cheers, fijal
Guido van Rossum wrote:
While the EuroPython sprints are still going on, I am back home, and after a somewhat restful night of sleep, I have some thoughts I'd like to share before I get distracted. Note, I am jumping wildly between topics.
- Commit privileges: Maybe we've been too careful with only giving commit privileges to to experienced and trusted new developers. I spoke to Ezio Melotti and from his experience with getting commit privileges, it seems to be a case of "the lion is much more afraid of you than you are afraid of the lion". I.e. having got privileges he was very concerned about doing something wrong, worried about the complexity of SVN, and so on. Since we've got lots of people watching the commit stream, I think that there really shouldn't need to be a worry at all about a new committer doing something malicious, and there shouldn't be much worry about honest beginners' mistakes either -- the main worry remains that new committers don't use their privileges enough. So, my recommendation (which surely is a turn-around of my *own* attitude in the past) is to give out more commit privileges sooner.
I would like to highlight that other open source projects have used more liberal commit right policies without the project breaking into pieces; quite the contrary. For example, in KDE, you usually get commit rights on your second patch submission. The contributors have reported that it really helped to convert "occasional contributors" into "active contributors", and that they were encouraged by the trust given by the project community. They felt a new sense of responsibility toward the project with the ability to contribute directly. There were never any malicious commits done to KDE using this liberal policy. The newcomers tend to be extremely careful. If you keep the newcomers under the umbrella of a mentor for a few months and with the additional security of post-commit reviews, I am sure that you are not taking any real risks on the codebase. cheers, Philippe
On Sat, Jul 24, 2010 at 4:08 PM, Guido van Rossum <guido@python.org> wrote: [...]
- A lot of things seem to be happening to make PyPI better. Is this being summarized somewhere? Based on some questions I received during my keynote Q&A (http://bit.ly/bdflqa) I think not enough people are aware of what we are already doing in this area.
Even people very involved in packaging are not fully aware of what's going on. I am not, for instance. I think that we fail to communicate and synchronize our efforts on PyPI development. The last example I have in mind is that I have announced here that I was working on a patch for the checkbox problem, then Martin announced today on catalog-sig it was fixed by Georg and updated in production :) I think we need to improve this: it can be a very frustrating experience to contribute to PyPI. Possible improvements: - Have a PyPI component at bugs.python.org so all work on bugs/new features would be known and followed at the same level as other packaging components we maintain that depend on PyPI (distutils, distutils2) -- e.g. drop the SourceForge tracker - Make it easier to contribute by moving the PyPI code base to hg.python.org. Unlike Python, this is a very simple move.
Frankly, I'm not sure I do, either: I think I've heard of a GSOC student and of plans to take over pypi.appspot.com (with the original developer's permission) to become a full and up-to-date mirror.
That would be great if the student could promote his work at Catalog-SIG.
Mirroring apparently also requires some client changes.
Mirrors can be used as long as you manually point to a mirror when using them. We are working on making the switch automatic. Regards, Tarek -- Tarek Ziadé | http://ziade.org
On Mon, Jul 26, 2010 at 4:02 AM, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
On Sat, Jul 24, 2010 at 4:08 PM, Guido van Rossum <guido@python.org> wrote:
<snip>
Mirroring apparently also requires some client changes.
Mirrors can be used as long as you manually point to a mirror when using them. We are working on making the switch automatic.
I think we've talked briefly about this before, but let me reiterate that getting this right from a security point of view is quite a bit harder than it at first appears, and IMHO it is worth getting right. Geremy Condra
On Mon, Jul 26, 2010 at 1:20 PM, geremy condra <debatem1@gmail.com> wrote:
On Mon, Jul 26, 2010 at 4:02 AM, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
On Sat, Jul 24, 2010 at 4:08 PM, Guido van Rossum <guido@python.org> wrote:
<snip>
Mirroring apparently also requires some client changes.
Mirrors can be used as long as you manually point to a mirror when using them. We are working on making the switch automatic.
I think we've talked briefly about this before, but let me reiterate that getting this right from a security point of view is quite a bit harder than it at first appears, and IMHO it is worth getting right.
FWIW, Martin has added a section about mirror authenticity in the PEP: http://www.python.org/dev/peps/pep-0381/#mirror-authenticity
Geremy Condra
-- Tarek Ziadé | http://ziade.org
On Mon, Jul 26, 2010 at 4:52 AM, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
On Mon, Jul 26, 2010 at 1:20 PM, geremy condra <debatem1@gmail.com> wrote:
On Mon, Jul 26, 2010 at 4:02 AM, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
On Sat, Jul 24, 2010 at 4:08 PM, Guido van Rossum <guido@python.org> wrote:
<snip>
Mirroring apparently also requires some client changes.
Mirrors can be used as long as you manually point to a mirror when using them. We are working on making the switch automatic.
I think we've talked briefly about this before, but let me reiterate that getting this right from a security point of view is quite a bit harder than it at first appears, and IMHO it is worth getting right.
FWIW, Martin has added a section about mirror authenticity in the PEP:
http://www.python.org/dev/peps/pep-0381/#mirror-authenticity
This is more-or-less what was discussed earlier, and from what's described here I think the concerns I voiced stand. What's the right way to do disclosure on this sort of issue? Geremy Condra
On Mon, Jul 26, 2010 at 2:10 PM, geremy condra <debatem1@gmail.com> wrote:
On Mon, Jul 26, 2010 at 4:52 AM, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
On Mon, Jul 26, 2010 at 1:20 PM, geremy condra <debatem1@gmail.com> wrote:
On Mon, Jul 26, 2010 at 4:02 AM, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
On Sat, Jul 24, 2010 at 4:08 PM, Guido van Rossum <guido@python.org> wrote:
<snip>
Mirroring apparently also requires some client changes.
Mirrors can be used as long as you manually point to a mirror when using them. We are working on making the switch automatic.
I think we've talked briefly about this before, but let me reiterate that getting this right from a security point of view is quite a bit harder than it at first appears, and IMHO it is worth getting right.
FWIW, Martin has added a section about mirror authenticity in the PEP:
http://www.python.org/dev/peps/pep-0381/#mirror-authenticity
This is more-or-less what was discussed earlier, and from what's described here I think the concerns I voiced stand. What's the right way to do disclosure on this sort of issue?
I would recommend discussing it in Distutils-SIG and proposing a change to that PEP. Notice that this PEP is not accepted yet. I am not sure what would be the best moment to have it accepted. I guess once we have experimented enough on the client side. Tarek -- Tarek Ziadé | http://ziade.org
On Mon, Jul 26, 2010 at 7:21 AM, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
On Mon, Jul 26, 2010 at 2:10 PM, geremy condra <debatem1@gmail.com> wrote:
On Mon, Jul 26, 2010 at 4:52 AM, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
On Mon, Jul 26, 2010 at 1:20 PM, geremy condra <debatem1@gmail.com> wrote:
On Mon, Jul 26, 2010 at 4:02 AM, Tarek Ziadé <ziade.tarek@gmail.com> wrote:
On Sat, Jul 24, 2010 at 4:08 PM, Guido van Rossum <guido@python.org> wrote:
<snip>
Mirroring apparently also requires some client changes.
Mirrors can be used as long as you manually point to a mirror when using them. We are working on making the switch automatic.
I think we've talked briefly about this before, but let me reiterate that getting this right from a security point of view is quite a bit harder than it at first appears, and IMHO it is worth getting right.
FWIW, Martin has added a section about mirror authenticity in the PEP:
http://www.python.org/dev/peps/pep-0381/#mirror-authenticity
This is more-or-less what was discussed earlier, and from what's described here I think the concerns I voiced stand. What's the right way to do disclosure on this sort of issue?
I would recommend discussing it in Distutils-SIG and proposing a change to that PEP.
I've noticed that I don't have a lot of success in shifting this kind of debate, so I'm not sure it's a good idea to publicly discuss vulnerabilities in something that may wind up being implemented as-is, but it's up to you guys. Geremy Condra
geremy condra, 26.07.2010 16:29:
I've noticed that I don't have a lot of success in shifting this kind of debate, so I'm not sure it's a good idea to publicly discuss vulnerabilities in something that may wind up being implemented as-is, but it's up to you guys.
Hmm, security by obscurity? That's a good idea. Let's do that more often. Stefan
On Mon, Jul 26, 2010 at 7:36 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:
geremy condra, 26.07.2010 16:29:
I've noticed that I don't have a lot of success in shifting this kind of debate, so I'm not sure it's a good idea to publicly discuss vulnerabilities in something that may wind up being implemented as-is, but it's up to you guys.
Hmm, security by obscurity? That's a good idea. Let's do that more often.
FWIW, security by obscurity has a bad rep in some circles, but it is an essential component of any serious security policy. It just should never be the *only* component. (In fact, any serious security policy should have multiple disparate components.) In this case, it looks like (a) the cat is already out of the bag, and (b) it's easy to figure out from the PEPs where the vulnerabilities lie, so I don't think we'll gain much by shushing it up. -- --Guido van Rossum (python.org/~guido)
On Mon, Jul 26, 2010 at 7:36 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:
geremy condra, 26.07.2010 16:29:
I've noticed that I don't have a lot of success in shifting this kind of debate, so I'm not sure it's a good idea to publicly discuss vulnerabilities in something that may wind up being implemented as-is, but it's up to you guys.
Hmm, security by obscurity? That's a good idea. Let's do that more often.
Usually it's termed responsible disclosure, but I'm a lot more interested in fixing things than playing semantics. Geremy Condra
On Tue, 27 Jul 2010 12:36:37 am Stefan Behnel wrote:
geremy condra, 26.07.2010 16:29:
I've noticed that I don't have a lot of success in shifting this kind of debate, so I'm not sure it's a good idea to publicly discuss vulnerabilities in something that may wind up being implemented as-is, but it's up to you guys.
Hmm, security by obscurity? That's a good idea. Let's do that more often.
Shhh! Don't tell anybody! *wink* But seriously, I don't think Geremy is suggesting security by obscurity. It seems to me that he's merely suggesting that we are discreet about discussing vulnerabilities unless we have a plan to fix them. Whether such discretion is useful is an open question. It may be that the cat is already out of the bag and it's too late to be discreet, so we might as well not bother. -- Steven D'Aprano
On Mon, Jul 26, 2010 at 4:29 PM, geremy condra <debatem1@gmail.com> wrote: ...
I've noticed that I don't have a lot of success in shifting this kind of debate, so I'm not sure it's a good idea to publicly discuss vulnerabilities in something that may wind up being implemented as-is, but it's up to you guys.
I think it's best to have this discussed there publicly. In any case, mirrors are run by trusted people, so the risks are not very high AFAIK. I think this discussion didn't have a lot of participants because most of us (that includes me) are not experts at all, if not ignorant, in this topic. A complete patch to the PEP, including a detailed description, is the best thing to do, I think, to move this forward. Regards Tarek
Am 26.07.2010 13:02, schrieb Tarek Ziadé:
On Sat, Jul 24, 2010 at 4:08 PM, Guido van Rossum <guido@python.org> wrote: [...]
- A lot of things seem to be happening to make PyPI better. Is this being summarized somewhere? Based on some questions I received during my keynote Q&A (http://bit.ly/bdflqa) I think not enough people are aware of what we are already doing in this area.
Even people very involved in packaging are not fully aware of what's going on. I am not, for instance. I think that we fail to communicate and synchronize our efforts on PyPI development.
Basically, I think what you'd like to have is Martin saying "I'm going to work on this feature", in addition to "I implemented this feature now" afterwards. That shouldn't be too hard. In related news, PyPI now has a JSONRPC interface. I'll leave it to you to figure out how to use it, and that it wasn't Martin who added it in secret ;)
The last example I have in mind is that I have announced here that I was working on a patch for the checkbox problem, then Martin announced today on catalog-sig it was fixed by Georg and updated in production :)
I'd like to add to this that Martin didn't know I was working on the patch (I wrote the patch on the day after I came home from EP), and having worked a bit in the PyPI codebase during the sprints, I just decided to fix this issue, which I had perceived to be quite urgent to some people. (Also, the patch really wasn't a huge thing.)
I think we need to improve this: it can be a very frustrating experience to contribute to PyPI.
I did not experience it this way. On the contrary, I tried to run PyPI locally for testing purposes, but didn't want to compile and run Postgres, so we figured how hard it was to use Sqlite instead. Martin put in quite an effort to make it possible to have a local instance run with an sqlite db, and I could sprint productively on PyPI.
Possible improvements:
- Have a PyPI component at bugs.python.org so all work on bugs/new features would be known and followed at the same level as other packaging components we maintain that depend on PyPI (distutils, distutils2) -- e.g. drop the SourceForge tracker
I wouldn't do that -- PyPI is not distributed with Python. (I'm equally skeptical about Distutils2, but it will at least be part of Python at some point in the future.) I would support a move to a separate bugs.python.org/pypi tracker, however. Not having to deal with SourceForge is still a good thing.
- Make it easier to contribute by moving the PyPI code base to hg.python.org. Unlike Python, this is a very simple move.
+1 to that. Georg
On Mon, Jul 26, 2010 at 10:39 PM, Georg Brandl <g.brandl@gmx.net> wrote: ...
I think we need to improve this: it can be a very frustrating experience to contribute to PyPI.
I did not experience it this way. On the contrary, I tried to run PyPI locally for testing purposes, but didn't want to compile and run Postgres, so we figured how hard it was to use Sqlite instead. Martin put in quite an effort to make it possible to have a local instance run with an sqlite db, and I could sprint productively on PyPI.
I wasn't talking about the technical gap. This is improving all the time (I've improved the quick wsgi launching script so we can run an instance without extra dependencies). I am talking about synchronization and communication, i.e., the project management. For instance, you didn't know that I had started this patch, and Martin didn't know that you started on your side. :-)
Possible improvements:
- Have a PyPI component at bugs.python.org so all work on bugs/new features would be known and followed at the same level as other packaging components we maintain that depend on PyPI (distutils, distutils2) -- e.g. drop the SourceForge tracker
I wouldn't do that -- PyPI is not distributed with Python. (I'm equally skeptical about Distutils2, but it will at least be part of Python at some point in the future.)
I would support a move to a separate bugs.python.org/pypi tracker, however. Not having to deal with SourceForge is still a good thing.
My PoV is that PyPI is part of the Python ecosystem we provide. The other part is currently in Python (distutils) and can be impacted by changes. The checkbox fix, for instance, is one atomic change across Python and PyPI. But a separate Roundup tracker would already be progress.
- Make it easier to contribute by moving the PyPI code base to hg.python.org. Unlike Python, this is a very simple move.
+1 to that.
Am 26.07.2010 23:03, schrieb Tarek Ziadé:
On Mon, Jul 26, 2010 at 10:39 PM, Georg Brandl <g.brandl@gmx.net> wrote: ....
I think we need to improve this: it can be a very frustrating experience to contribute to PyPI.
I did not experience it this way. On the contrary, I tried to run PyPI locally for testing purposes, but didn't want to compile and run Postgres, so we figured how hard it was to use Sqlite instead. Martin put in quite an effort to make it possible to have a local instance run with an sqlite db, and I could sprint productively on PyPI.
I wasn't talking about the technical gap. This is improving all the time (I've improved the quick wsgi launching script so we can run an instance without extra dependencies).
I am talking about synchronization and communication, i.e., the project management.
For instance, you didn't know that I had started this patch, and Martin didn't know that you started on your side. :-)
This isn't specific to PyPI though. I usually don't announce "I'm going to work on this issue" for a specific Python tracker item, except if I expect it to take more than a few hours. The patch in question is really pretty minor.
Possible improvements:
- Have a PyPI component at bugs.python.org so all work on bugs/new features would be known and followed at the same level as other packaging components we maintain that depend on PyPI (distutils, distutils2) -- e.g. drop the SourceForge tracker
I wouldn't do that -- PyPI is not distributed with Python. (I'm equally skeptical about Distutils2, but it will at least be part of Python at some point in the future.)
I would support a move to a separate bugs.python.org/pypi tracker, however. Not having to deal with SourceForge is still a good thing.
My PoV is that PyPI is part of the Python ecosystem we provide. The other part is currently in Python (distutils) and can be impacted by changes. The checkbox fix, for instance, is one atomic change across Python and PyPI.
Sure PyPI is part of the ecosystem. But so are quite a lot of other tools, and none of them are tracked in bugs.python.org. (This is also the case for the website.) I'd really like bugs.python.org to remain a tracker for what we ship as the CPython distribution, and nothing else. There's enough content in there already. Georg
On Mon, Jul 26, 2010 at 11:15 PM, Georg Brandl <g.brandl@gmx.net> wrote: ..
Sure PyPI is part of the ecosystem. But so are quite a lot of other tools, and none of them are tracked in bugs.python.org. (This is also the case for the website.) I'd really like bugs.python.org to remain a tracker for what we ship as the CPython distribution, and nothing else. There's enough content in there already.
PyPI is the only tool that gets called from Python AFAIK.
On 7/26/2010 5:15 PM, Georg Brandl wrote:
Sure PyPI is part of the ecosystem. But so are quite a lot of other tools, and none of them are tracked in bugs.python.org. (This is also the case for the website.) I'd really like bugs.python.org to remain a tracker for what we ship as the CPython distribution, and nothing else. There's enough content in there already.
How about one other tracker, say bugs.python.org/tools (or projects, or ???), for everything else: PyPI, distutils2 (until it is part of a release), web site, sandbox projects? It would have to be taught how to turn revxxxx + component into a link to the appropriate repository. -- Terry Jan Reedy
Am 27.07.2010 04:43, schrieb Terry Reedy:
On 7/26/2010 5:15 PM, Georg Brandl wrote:
Sure PyPI is part of the ecosystem. But so are quite a lot of other tools, and none of them are tracked in bugs.python.org. (This is also the case for the website.) I'd really like bugs.python.org to remain a tracker for what we ship as the CPython distribution, and nothing else. There's enough content in there already.
How about one other tracker, say bugs.python.org/tools (or projects, or ???), for everything else: PyPI, distutils2 (until it is part of a release), web site, sandbox projects? It would have to be taught how to turn revxxxx + component into a link to the appropriate repository.
I still think that one tracker per project/site is the better way. However, I would be +0 on a common "infrastructure" tracker (also subsuming the meta-tracker). Georg
On Tue, 27 Jul 2010 09:57:22 +0200 Georg Brandl <g.brandl@gmx.net> wrote:
Am 27.07.2010 04:43, schrieb Terry Reedy:
On 7/26/2010 5:15 PM, Georg Brandl wrote:
Sure PyPI is part of the ecosystem. But so are quite a lot of other tools, and none of them are tracked in bugs.python.org. (This is also the case for the website.) I'd really like bugs.python.org to remain a tracker for what we ship as the CPython distribution, and nothing else. There's enough content in there already.
How about one other tracker, say bugs.python.org/tools (or projects, or ???), for everything else: PyPI, distutils2 (until it is part of a release), web site, sandbox projects? It would have to be taught how to turn revxxxx + component into a link to the appropriate repository.
I still think that one tracker per project/site is the better way.
Only if they have similar look and feel, and don't require you to register the same login N times, though. Regards Antoine.
On 7/27/2010 11:02 AM, Antoine Pitrou wrote:
On Tue, 27 Jul 2010 09:57:22 +0200 Georg Brandl <g.brandl@gmx.net> wrote:
Am 27.07.2010 04:43, schrieb Terry Reedy:
On 7/26/2010 5:15 PM, Georg Brandl wrote:
Sure PyPI is part of the ecosystem. But so are quite a lot of other tools, and none of them are tracked in bugs.python.org. (This is also the case for the website.) I'd really like bugs.python.org to remain a tracker for what we ship as the CPython distribution, and nothing else. There's enough content in there already.
How about one other tracker, say bugs.python.org/tools (or projects, or ???), for everything else: PyPI, distutils2 (until it is part of a release), web site, sandbox projects? It would have to be taught how to turn revxxxx + component into a link to the appropriate repository.
I still think that one tracker per project/site is the better way.
Only if they have similar look and feel, and don't require you to register the same login N times, though.
Is it really time to give devs a distributed identity good for a range of systems? Sounds like a potentially hairy management task. regards Steve -- Steve Holden, Holden Web LLC http://www.holdenweb.com/
Am 27.07.2010 12:49, schrieb Steve Holden:
On 7/27/2010 11:02 AM, Antoine Pitrou wrote:
On Tue, 27 Jul 2010 09:57:22 +0200 Georg Brandl <g.brandl@gmx.net> wrote:
Am 27.07.2010 04:43, schrieb Terry Reedy:
On 7/26/2010 5:15 PM, Georg Brandl wrote:
Sure PyPI is part of the ecosystem. But so are quite a lot of other tools, and none of them are tracked in bugs.python.org. (This is also the case for the website.) I'd really like bugs.python.org to remain a tracker for what we ship as the CPython distribution, and nothing else. There's enough content in there already.
How about one other tracker, say bugs.python.org/tools (or projects, or ???), for everything else: PyPI, distutils2 (until it is part of a release), web site, sandbox projects? It would have to be taught how to turn revxxxx + component into a link to the appropriate repository.
I still think that one tracker per project/site is the better way.
Only if they have similar look and feel, and don't require you to register the same login N times, though.
Is it really time to give devs a distributed identity good for a range of systems? Sounds like a potentially hairy management task.
IMO supporting OpenID is good enough. Georg
Steve Holden writes:
Only if they have similar look and feel, and don't require you to register the same login N times, though.
Is it really time to give devs a distributed identity good for a range of systems? Sounds like a potentially hairy management task.
Sure, but Python can leave management up to Google, Yahoo, and several other well-known providers. This is what "OpenId" is all about.
Basically, I think what you'd like to have is Martin saying "I'm going to work on this feature", in addition to "I implemented this feature now" afterwards. That shouldn't be too hard.
I'm not very good at blogging (more specifically, I never blog). People interested in following even the tiniest changes to PyPI should watch the commit list; it is unlikely that I will post each and every change to catalog-sig. People are then free to blog about the changes as they please (and I would appreciate any help I can get in announcing changes). Regards, Martin
On Mon, Jul 26, 2010 at 11:57 PM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
Basically, I think what you'd like to have is Martin saying "I'm going to work on this feature", in addition to "I implemented this feature now" afterwards. That shouldn't be too hard.
I'm not very good at blogging (more specifically, I never blog).
People interested in following even the tiniest changes to PyPI should watch the commit list; it is unlikely that I will post each and every change to catalog-sig.
People are then free to blog about the changes as they please (and I would appreciate any help I can get in announcing changes).
I think what would be good is to have you discuss the "I'm going to work on this feature" part on the ML before it's started, rather than us discovering it randomly on PyPI or in the commit list, leading to a heated discussion or a poll. I would classify the changes in three kinds: - minor: a new feature, a UI bugfix, etc. - important: a new feature that changes the end-user experience a lot (like the rating system) - major: a change to the APIs (HTTP/XML-RPC) I think you should briefly present your plans for important or major changes in catalog-SIG prior to starting them, so we can discuss them. Regards Tarek -- Tarek Ziadé | http://ziade.org
I would classify the changes in three kinds:
- minor: a new feature, a UI bugfix, etc. - important: a new feature that changes the end-user experience a lot (like the rating system) - major: a change to the APIs (HTTP/XML-RPC)
I think you should briefly present your plans for important or major changes in catalog-SIG prior to starting them, so we can discuss them.
So would you consider the addition of JSON a major feature (as it introduces a new API)? I doubt Richard would have been willing to wait for the end of some discussion before implementing it. Regards, Martin
2010/7/27 "Martin v. Löwis" <martin@v.loewis.de>:
I would classify the changes in three kinds:
- minor: a new feature, a UI bugfix, etc. - important: a new feature that changes the end-user experience a lot (like the rating system) - major: a change to the APIs (HTTP/XML-RPC)
I think you should briefly present your plans for important or major changes in catalog-SIG prior to starting them, so we can discuss them.
So would you consider the addition of JSON a major feature (as it introduces a new API)? I doubt Richard would have been willing to wait for the end of some discussion before implementing it.
That's "minor" since it's a new feature that does not interfere with existing features -- I hope it doesn't :) "major" is a change to existing APIs that could potentially break existing software. PyPI is not different from other software in that respect, e.g. we need to be careful with the changes we make and push in production. But, if Richard sprints again and change the JSON output --let's say it use to return a mapping, but now it returns a list with a timestamp as the first member and the original mapping as the second member--, he should first explain that change to the ML so people that potentially uses the JSON version can be aware of that. In case of doubt, a change to an existing piece should be mentioned. Ideally, as I said in a previous mail, we should document in a single place (a PEP I guess) the PyPI specification, and maybe version it. Regards, Tarek -- Tarek Ziadé | http://ziade.org
On Jul 24, 2010, at 07:08 AM, Guido van Rossum wrote:
privileges enough. So, my recommendation (which surely is a turn-around of my *own* attitude in the past) is to give out more commit privileges sooner.
+1, though I'll observe that IME, actual commit privileges become much less of a special badge once a dvcs-based workflow is put in place. In the absence of that, I agree that we have enough checks and balances in place to allow more folks to commit changes.
approved. A PEP handler should be selected for each PEP as soon as possible; without a PEP handler, discussing a PEP is not all that useful. The PEP handler should be someone respected by the community with an interest in the subject of the PEP but at an arms' length (at least) from the PEP author. The PEP handler will have to moderate
This is a good idea, and certainly helps "scale Guido" better. We might also consider designating experts who can collaborate on PEP wrangling for certain topics. For example, if Martin would normally be the Cheeseshop PEP handler, but submits his own PEPs on the topic, the handful of experts can take up the slack when Martin recuses himself on his own PEPs. I'd hope that we could always find at least two people to wrangle any PEP, or we've got a bigger problem to deal with! Sounds like EuroPython was fun! -Barry
On Mon, Jul 26, 2010 at 9:06 AM, Barry Warsaw <barry@python.org> wrote:
On Jul 24, 2010, at 07:08 AM, Guido van Rossum wrote:
privileges enough. So, my recommendation (which surely is a turn-around of my *own* attitude in the past) is to give out more commit privileges sooner.
+1, though I'll observe that IME, actual commit privileges become much less of a special badge once a dvcs-based workflow is put in place. In the absence of that, I agree that we have enough checks and balances in place to allow more folks to commit changes
Even with DVCS in place, commit privileges allow the person who cares about a change to move it forward, including the more mechanical aspects. E.g. if there are positive reviews of a person's changes in their fork, they can push those changes in. Or more generally, there's a lot of ways of getting approval, but limited commit privileges means all approval must ultimately be funneled through someone with commit. Also different parts of the codebase should have different levels of review and conservativism; e.g., adding clarifications to the docs requires a different level of review than changing stuff in the core. We could try to build that into the tools, but it's a lot easier to make the tools permissive and build these distinctions into social structures. -- Ian Bicking | http://blog.ianbicking.org
On Jul 26, 2010, at 10:50 AM, Ian Bicking wrote:
On Mon, Jul 26, 2010 at 9:06 AM, Barry Warsaw <barry@python.org> wrote:
On Jul 24, 2010, at 07:08 AM, Guido van Rossum wrote:
privileges enough. So, my recommendation (which surely is a turn-around of my *own* attitude in the past) is to give out more commit privileges sooner.
+1, though I'll observe that IME, actual commit privileges become much less of a special badge once a dvcs-based workflow is put in place. In the absence of that, I agree that we have enough checks and balances in place to allow more folks to commit changes
Even with DVCS in place, commit privileges allow the person who cares about a change to move it forward, including the more mechanical aspects. E.g. if there are positive reviews of a person's changes in their fork, they can push those changes in. Or more generally, there's a lot of ways of getting approval, but limited commit privileges means all approval must ultimately be funneled through someone with commit.
Right, but with a dvcs workflow, it's really only the very last step that requires commit privileges. There is much less chance of having those fork branches get stale, much greater ability for those branches to be reviewed, tested, and commented on, etc. You can more easily do everything right up until the final merge to the master branch without commit privileges, so it's much less of a blocker to progress. -Barry
FWIW, a leading magazine (IEEE Spectrum) this week has an interesting opinion piece about multicore. http://spectrum.ieee.org/computing/software/the-trouble-with-multicore -- --Guido van Rossum (python.org/~guido)
participants (28): Martin v. Löwis, Antoine Pitrou, Barry Warsaw, Benjamin Peterson, Georg Brandl, geremy condra, Glenn Linderman, Greg Ewing, Gregory P. Smith, Guido van Rossum, Ian Bicking, Jack Diederich, Jesse Noller, Jörg Blank, Laurens Van Houtven, Maciej Fijalkowski, Michael Foord, Nick Coghlan, Peter Portante, Philippe Fremy, Ronald Oussoren, Sarah Mount, Stefan Behnel, Stephen J. Turnbull, Steve Holden, Steven D'Aprano, Tarek Ziadé, Terry Reedy