Hi everyone,

CPython is slow. We all know that, yet little is done to fix it. I'd like to change that. I have a plan to speed up CPython by a factor of five over the next few years. But it needs funding.

I am aware that there have been several promised speed ups in the past that have failed. You might wonder why this is different. Here are three reasons:

1. I already have working code for the first stage.
2. I'm not promising a silver bullet. I recognize that this is a substantial amount of work and needs funding.
3. I have extensive experience in VM implementation, not to mention a PhD in the subject.

My ideas for possible funding, as well as the actual plan of development, can be found here: https://github.com/markshannon/faster-cpython

I'd love to hear your thoughts on this.

Cheers,
Mark.
On Tue, 20 Oct 2020 13:53:34 +0100 Mark Shannon <mark@hotpy.org> wrote:
[...]
Do you plan to do all this in C, or would you switch to C++ (or something else)?

Regards,
Antoine.
Hi Antoine,

On 20/10/2020 2:32 pm, Antoine Pitrou wrote:
[...]
Do you plan to do all this in C, or would you switch to C++ (or something else)?
All C, no C++. I promise :) Cheers, Mark.
On Tue, 20 Oct 2020 16:10:27 +0100 Mark Shannon <mark@hotpy.org> wrote:
[...]
Do you plan to do all this in C, or would you switch to C++ (or something else)?
All C, no C++. I promise :)
Interesting, I was mostly expecting/suggesting the opposite. Once you pass a certain level of complexity, C is really a burden.

Regards,
Antoine.
A concern I have about this is what effect it will have on the complexity of CPython's implementation.

CPython is currently very simple and straightforward. Some parts are not quite as simple as they used to be, but on the whole it's fairly easy to understand, and I consider this to be one of its strengths.

I worry that adding four layers of clever speedup tricks will completely destroy this simplicity, leaving us with something that can no longer be maintained or contributed to by ordinary mortals.

-- Greg
On Wed, Oct 21, 2020 at 4:04 PM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
[...]
I have never considered ceval.c to be simple and straightforward. Nor our parser(s). Or our optimizers. Or the regular expression implementation. Or the subprocess internals. (We may differ on our lists of what isn't simple and straightforward, but I guarantee you we've all got something in mind.) So this is not a major concern for me. During the review phases we'd decide whether there is something we find to be dark magic that seems challenging to maintain, and decide what, if anything, needs to be done to ameliorate such issues.

-gps
Hi Greg,

On 21/10/2020 11:57 pm, Greg Ewing wrote:
A concern I have about this is what effect it will have on the complexity of CPython's implementation.
CPython is currently very simple and straightforward. Some parts are not quite as simple as they used to be, but on the whole it's fairly easy to understand, and I consider this to be one of its strengths.
I'm not sure that it is "very simple and straightforward".
I worry that adding four layers of clever speedup tricks will completely destroy this simplicity, leaving us with something that can no longer be maintained or contributed to by ordinary mortals.
The plan is that everything will be accessible to someone with a CS degree. Any code base takes time and work to get familiar with. There is no reason why this code should be any easier or harder to understand than any other domain-specific code.

Cheers,
Mark.
A very interesting proposal.

A couple of thoughts...

Can we have an executive summary of how your proposed approach differs from those of PyPy, Unladen Swallow, and various other attempts?

You suggest that payment should be on delivery, or meeting the target, rather than up-front. That's good for the PSF, but it also means that the contractor not only takes all the risk of failure, but also needs an independent source of income, or at least substantial savings (enough for, what, eighteen months development per stage?). Doesn't that limit the available pool of potential contractors?

I think there's always tension between community driven development and paid work. If the PSF pays person A to develop something, might not people B, C, D and E feel slighted that they didn't get paid?

On the other hand, I guess we already deal with that. There are devs who are paid by their employers to work on Python for N hours a month, for some value of N, or to develop something and then open source it. And then there are devs who aren't.

You have suggested that the cost of each stage be split 50:50 between development and maintenance. But development is a one-off cost; maintenance is a forever cost, and unpredictable, and presumably some of that maintenance will be done by people other than the contractor.

A minor point, and I realise that the costs are all in very round figures, but they don't quite match up: $2 million split over five stages is $400K per stage, not $500K.
1. I already have working code for the first stage.
I don't mean to be negative, or hostile, but this sounds like you are saying "I have a patch for Python that will make it 1.5 times faster, but you will never see it unless you pay me!"

I realise that is a very uncharitable way of looking at it, sorry about that, it's nothing personal. But $500K is a lot of money. If the PSF says "No thanks", what happens to your code?

- delete it;
- donate it to Python for free;
- fork Python and try to make a commercial, non-FOSS version that you can sell to recoup your development time;
- something else?

If this was a closed-source proprietary project, there would be no question in my mind. You took a bet that you could sell the code, and you lost. You swallow your loss and move on, that's how the proprietary world works. But this is FOSS and community driven, and I don't think that fits well with that model.

-- Steve
On 20/10/2020 2:47 pm, Steven D'Aprano wrote:
A very interesting proposal.
A couple of thoughts...
Can we have an executive summary of how your proposed approach differs from those of PyPy, Unladen Swallow, and various other attempts?
https://github.com/markshannon/faster-cpython/blob/master/tiers.md should cover it.
You suggest that payment should be on delivery, or meeting the target, rather than up-front. That's good for the PSF, but it also means that the contractor not only takes all the risk of failure, but also needs an independent source of income, or at least substantial savings (enough for, what, eighteen months development per stage?). Doesn't that limit the available pool of potential contractors?
We only need one. I don't think financial constraints are the main problem. I think domain knowledge is probably more of a constraint.
I think there's always tension between community driven development and paid work. If the PSF pays person A to develop something, might not people B, C, D and E feel slighted that they didn't get paid?
The PSF already pays people to work on PyPI.
On the other hand, I guess we already deal with that. There are devs who are paid by their employers to work on Python for N hours a month, for some value of N, or to develop something and then open source it. And then there are devs who aren't.
You have suggested that the cost of each stage be split 50:50 between development and maintenance. But development is a one-off cost; maintenance is a forever cost, and unpredictable, and presumably some of that maintenance will be done by people other than the contractor.
Any new feature will require ongoing maintenance. I'm just suggesting that we budget for it. Who is going to pay for the maintenance of PEP 634?
A minor point, and I realise that the costs are all in very round figures, but they don't quite match up: $2 million split over five stages is $400K per stage, not $500K.
I meant four stages. Did I write "five" somewhere?
1. I already have working code for the first stage.
I don't mean to be negative, or hostile, but this sounds like you are saying "I have a patch for Python that will make it 1.5 times faster, but you will never see it unless you pay me!"
I believe that's how business works ;) I have this thing, e.g. an iPhone; if you want it you must pay me. I think that speeding up CPython by 50% is worth a few hundred iPhones.
I realise that is a very uncharitable way of looking at it, sorry about that, it's nothing personal. But $500K is a lot of money.
Remember the contractor only gets roughly half of that. The rest stays with the PSF to fund maintenance of CPython. $250k only pays for one engineer for one year at one of the big tech firms. Cheers, Mark.
On Wed, Oct 21, 2020 at 12:55 AM Steven D'Aprano <steve@pearwood.info> wrote:
A minor point, and I realise that the costs are all in very round figures, but they don't quite match up: $2 million split over five stages is $400K per stage, not $500K.
The proposal is for four stages. ChrisA
On Wed, Oct 21, 2020 at 02:38:25AM +1100, Chris Angelico wrote:
On Wed, Oct 21, 2020 at 12:55 AM Steven D'Aprano <steve@pearwood.info> wrote:
A minor point, and I realise that the costs are all in very round figures, but they don't quite match up: $2 million split over five stages is $400K per stage, not $500K.
The proposal is for four stages.
D'oh! I mean, I knew that, I was just checking to see if others were paying attention. Well done, you pass! -- Steve
Where is your working code for the first stage?

October 20, 2020 8:53 AM, "Mark Shannon" <mark@hotpy.org> wrote:
[...]
On 10/20/20 2:53 PM, Mark Shannon wrote:
I'd love to hear your thoughts on this.
A VM needs a separate backend for each architecture (maybe even OS).

- Which architectures do you include in your proposal? What's your estimate for a new port?
- Do you plan for a fall-back to a slow non-VM mode, e.g. the existing one? Do you plan to support that in parallel, such that Python still can be used on architectures with a working compiler (and a libffi port)? E.g. OpenJDK has the zero port, which is slow (interpreted), but runs on practically all architectures.

Thanks, Matthias
On Wed, Oct 21, 2020 at 12:03 AM Mark Shannon <mark@hotpy.org> wrote:
[...]
The overall aim is to speed up CPython by a factor of (approximately) five. We aim to do this in four distinct stages, each stage increasing the speed of CPython by (approximately) 50%.
This is a very bold estimate. Particularly, you're proposing a number of small tweaks in stage 2 and expecting that (combined) they can give a 50% improvement in overall performance?

Do you have any details to back this up? You're not just asking for a proposal to be accepted, you're actually asking for (quite a bit of) money, and then hoping to find a contractor to do the actual work. That means you're expecting that anyone would be able to achieve this, given sufficient development time.

BIG BIG concern: You're basically assuming that all this definition of performance is measured for repeated executions of code. That's how PyPy already works, and it most often suffers quite badly in startup performance to make this happen. Will your proposed changes mean that CPython has to pay the same startup costs that PyPy does?

What would happen if $2M were spent on improving PyPy3 instead?

ChrisA
Hi Chris,

On 20/10/2020 4:37 pm, Chris Angelico wrote:
On Wed, Oct 21, 2020 at 12:03 AM Mark Shannon <mark@hotpy.org> wrote:
[...]
The overall aim is to speed up CPython by a factor of (approximately) five. We aim to do this in four distinct stages, each stage increasing the speed of CPython by (approximately) 50%.
This is a very bold estimate. Particularly, you're proposing a number of small tweaks in stage 2 and expecting that (combined) they can give a 50% improvement in overall performance?
20 tweaks each providing 2% is a 49% speedup. Stage 1 will open up optimizations that are not currently worthwhile.
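(To spell out the compounding: 1.02**20 ≈ 1.486, so twenty independent 2% gains multiply out to roughly a 49% overall speedup; it takes a twenty-first to clear 50%, since 1.02**21 ≈ 1.516.)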
Do you have any details to back this up? You're not just asking for a proposal to be accepted, you're actually asking for (quite a bit of) money, and then hoping to find a contractor to do the actual work.
I am offering to do the work.
That means you're expecting that anyone would be able to achieve this, given sufficient development time.

No, I can (with paid help) achieve this. What matters is that someone can, not that anyone can.
BIG BIG concern: You're basically assuming that all this definition of performance is measured for repeated executions of code. That's how PyPy already works, and it most often suffers quite badly in startup performance to make this happen. Will your proposed changes mean that CPython has to pay the same startup costs that PyPy does?
Could you clarify what you think I'm assuming?

When you say start up, do you mean this?

$ time python3 -S -c ""
real 0m0.010s
$ time pypy -S -c ""
real 0m0.017s

No, there would be no slower startup. In fact the tier 0 interpreter should start a fraction faster than 3.9.
What would happen if $2M were spent on improving PyPy3 instead?
The PSF loses $1M to spend on CPython maintenance, to start with. What would happen to PyPy? I have no idea.

Partial success of speeding up CPython is very valuable. Partial success in getting PyPy to support C extensions well, and perform well when it does, is much less valuable. CPython that is "only" 2 or 3 times faster is a major improvement, but a PyPy that supports 80% of the C extensions that it currently does not is still not a replacement for CPython.

Cheers, Mark.
On Wed, Oct 21, 2020 at 3:31 AM Mark Shannon <mark@hotpy.org> wrote:
Hi Chris,
On 20/10/2020 4:37 pm, Chris Angelico wrote:
On Wed, Oct 21, 2020 at 12:03 AM Mark Shannon <mark@hotpy.org> wrote:
[...]
The overall aim is to speed up CPython by a factor of (approximately) five. We aim to do this in four distinct stages, each stage increasing the speed of CPython by (approximately) 50%.
This is a very bold estimate. Particularly, you're proposing a number of small tweaks in stage 2 and expecting that (combined) they can give a 50% improvement in overall performance?
20 tweaks each providing 2% is a 49% speedup. Stage 1 will open up optimizations that are not currently worthwhile.
Yes, I understand mathematics. Do you have evidence that shows that each of the twenty tweaks can give a two percent speedup though?
Do you have any details to back this up? You're not just asking for a proposal to be accepted, you're actually asking for (quite a bit of) money, and then hoping to find a contractor to do the actual work.
I am offering to do the work.
Sure, that takes away some of the uncertainty, but you're still asking for a considerable amount of money sight-unseen.
BIG BIG concern: You're basically assuming that all this definition of performance is measured for repeated executions of code. That's how PyPy already works, and it most often suffers quite badly in startup performance to make this happen. Will your proposed changes mean that CPython has to pay the same startup costs that PyPy does?
Could you clarify what you think I'm assuming?
When you say start up, do you mean this?
$ time python3 -S -c ""
real 0m0.010s
$ time pypy -S -c ""
real 0m0.017s
No, there would be no slower startup. In fact the tier 0 interpreter should start a fraction faster than 3.9.
That's a microbenchmark, but yes, that's the kind of thing I'm talking about. For short scripts, will "python3.13 script.py" be slower than "python3.9 script.py"?
What would happen if $2M were spent on improving PyPy3 instead?
The PSF loses $1M to spend on CPython maintenance, to start with.
What would happen to PyPy? I have no idea.
Partial success of speeding up CPython is very valuable. Partial success in getting PyPy to support C extensions well, and perform well when it does, is much less valuable.
CPython that is "only" 2 or 3 times faster is a major improvement, but a PyPy that supports 80% of the C extensions that it currently does not is still not a replacement for CPython.
True, but it does sound like you're pushing for the sorts of changes that PyPy already has. Not sure whether your proposed changes would be better done to PyPy or to CPython. ChrisA
On 20/10/2020 5:48 pm, Chris Angelico wrote:
On Wed, Oct 21, 2020 at 3:31 AM Mark Shannon <mark@hotpy.org> wrote:
Hi Chris,
On 20/10/2020 4:37 pm, Chris Angelico wrote:
On Wed, Oct 21, 2020 at 12:03 AM Mark Shannon <mark@hotpy.org> wrote:
[...]
The overall aim is to speed up CPython by a factor of (approximately) five. We aim to do this in four distinct stages, each stage increasing the speed of CPython by (approximately) 50%.
This is a very bold estimate. Particularly, you're proposing a number of small tweaks in stage 2 and expecting that (combined) they can give a 50% improvement in overall performance?
20 tweaks each providing 2% is a 49% speedup. Stage 1 will open up optimizations that are not currently worthwhile.
Yes, I understand mathematics. Do you have evidence that shows that each of the twenty tweaks can give a two percent speedup though?
My point was that small changes can easily add up to a large change. And yes, I have a long list of possible small optimizations.
Do you have any details to back this up? You're not just asking for a proposal to be accepted, you're actually asking for (quite a bit of) money, and then hoping to find a contractor to do the actual work.
I am offering to do the work.
Sure, that takes away some of the uncertainty, but you're still asking for a considerable amount of money sight-unseen.
I'm not asking for money up front. I'm asking for some promise of payment, once the work is done. If I fail, only I suffer a loss.
[...]
No, there would be no slower startup. In fact the tier 0 interpreter should start a fraction faster than 3.9.
That's a microbenchmark, but yes, that's the kind of thing I'm talking about. For short scripts, will "python3.13 script.py" be slower than "python3.9 script.py"?
Tiered execution means that 3.10+ should be no slower than 3.9 for any program, and faster for all but really short ones. Tier 0 would be a bit slower than 3.9, but will start faster. Tier 1 should kick in before 3.9 would catch up. Cheers, Mark.
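P.S. For anyone unfamiliar with tiering, the general shape is profile-then-promote. A toy sketch (invented names and threshold; not the actual design):

    /* Code starts in a cheap-to-start tier-0 interpreter.  Once a code
       unit has run enough times to be worth optimizing, it is promoted
       to a faster tier. */
    enum tier { TIER0, TIER1 };

    struct code_unit {
        int exec_count;      /* times this code unit has been entered */
        enum tier tier;      /* which execution tier currently runs it */
    };

    #define TIER1_THRESHOLD 16   /* invented number */

    static void maybe_promote(struct code_unit *cu)
    {
        if (cu->tier == TIER0 && ++cu->exec_count >= TIER1_THRESHOLD) {
            /* e.g. quicken/specialize bytecodes for the types observed */
            cu->tier = TIER1;
        }
    }

This is why startup doesn't pay the optimizer's cost: code that only runs a handful of times never leaves tier 0.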
On Wed, Oct 21, 2020 at 02:37:02AM +1100, Chris Angelico wrote:
Do you have any details to back this up? You're not just asking for a proposal to be accepted, you're actually asking for (quite a bit of) money, and then hoping to find a contractor to do the actual work.
Payment is on delivery. At each stage, if the contractor fails to deliver the promised gains, they get nothing. (I believe that Mark is being polite by referring to a generic contractor. I think he is referring to himself.)
That means you're expecting that anyone would be able to achieve this, given sufficient development time.
With sufficient time, maybe the horse will learn to sing. https://everything2.com/title/And+maybe+the+horse+will+learn+how+to+sing But I don't think Mark believes *anyone* will be able to improve performance. If it were that easy that anyone could do it, Python would already be blazingly fast.
BIG BIG concern: You're basically assuming that all this definition of performance is measured for repeated executions of code.
That's not what the proposal says.

"Performance should improve for all code, not just loop-heavy and numeric code."

In fact Mark goes further: he says that he's willing to allow some performance degradation on loop heavy code, if the overall performance increases.

But why are we nitpicking Stage Two? The beauty of Mark's proposal is:

1. there is no commitment to any stage until the previous stage is complete;
2. there is no commitment to pay for any stage unless the contractor actually delivers the goods.

If you don't think that Mark, or the contractor, will be able to deliver a 50% speed up in Stage 2, what have you got to lose? If he fails to deliver, you pay nothing and don't commit to Stage 3. If he does deliver, you get the desired result.

(The details allow for some slack: if the speed up is only 49%, the contractor still gets paid. Presumably it will be on some sort of sliding scale.)
That's how PyPy already works, and it most often suffers quite badly in startup performance to make this happen. Will your proposed changes mean that CPython has to pay the same startup costs that PyPy does?
Good question.
What would happen if $2M were spent on improving PyPy3 instead?
Then both of the PyPy3 users will be very happy *wink*

(I know, that's a terrible, horrible, inaccurate slight on the PyPy community, which is very probably thriving, and I would delete it if I hadn't already hit Send.)

I think this isn't a matter of just throwing money at a project. Mark knows the CPython architecture. I don't know if he knows the PyPy architecture, and unless a PyPy expert comes up with a counter proposal, we don't know that spending $2M on PyPy will see any sort of comparable gains.

-- Steve
On Wed, Oct 21, 2020 at 3:38 AM Steven D'Aprano <steve@pearwood.info> wrote:
On Wed, Oct 21, 2020 at 02:37:02AM +1100, Chris Angelico wrote:
Do you have any details to back this up? You're not just asking for a proposal to be accepted, you're actually asking for (quite a bit of) money, and then hoping to find a contractor to do the actual work.
Payment is on delivery. At each stage, if the contractor fails to deliver the promised gains, they get nothing.
(I believe that Mark is being polite by referring to a generic contractor. I think he is referring to himself.)
It's a little unclear from the proposal, as there was something about whether a suitable contractor could be found, but sure. TBH I'd be happier with this proposal as a direct offer/request for money than as "hey let's go look for potential coders", but it sounds like that's the plan anyway?
That means you're expecting that anyone would be able to achieve this, given sufficient development time.
With sufficient time, maybe the horse will learn to sing.
https://everything2.com/title/And+maybe+the+horse+will+learn+how+to+sing
But I don't think Mark believes *anyone* will be able to improve performance. If it were that easy that anyone could do it, Python would already be blazingly fast.
Yeah. And the "anyone" part is the concern I had - that the proposal was asking for funding and then for a search for a contractor. But if it's "pay me and I'll write this", then it's a bit more concrete.
BIG BIG concern: You're basically assuming that all this definition of performance is measured for repeated executions of code.
That's not what the proposal says.
"Performance should improve for all code, not just loop-heavy and numeric code."
In fact Mark goes further: he says that he's willing to allow some performance degradation on loop heavy code, if the overall performance increases.
"Overall performance" is a myth, and there's no way that CPython will magically be able to execute any code with the exact same performance improvement. So my question is: what happens to startup performance, what happens to short scripts, what happens to the interpreter's load times, etc, etc, etc? It could be that all code becomes faster, but only after it's been run a few times. That would be great for, say, a web app - it handles a request, goes back and waits for another, and then handles the next one a bit faster - but not for a command-line script. (And yes, I'm aware that it'd theoretically be possible to dump the compiler state to disk, but that has its own risks.)
What would happen if $2M were spent on improving PyPy3 instead?
Then both of the PyPy3 users will be very happy *wink*
(I know, that's a terrible, horrible, inaccurate slight on the PyPy community, which is very probably thriving, and I would delete it if I hadn't already hit Send.)
Yes, you're being horribly insulting to the third user of PyPy3, who is probably right now warming up his interpreter so he can send you an angry response :)

I guess my biggest concern with this proposal is that it's heavy on mathematically pure boasts and light on actual performance metrics, and I'm talking here about the part where (so we're told) the code is all done and it just takes a swipe of a credit card to unlock it. And without the ability to run it myself, I can't be sure that it'd actually give *any* performance improvement on my code or use-cases. So there's a lot that has to be taken on faith, and I guess I'm just a bit dubious of how well it'd fulfil that.

ChrisA
On Tue, Oct 20, 2020 at 9:33 AM Steven D'Aprano <steve@pearwood.info> wrote:
What would happen if $2M were spent on improving PyPy3 instead?
Then both of the PyPy3 users will be very happy *wink*
Wow, I didn't know I was 50% of PyPy3 users :)

Anyway, PyPy3 is already pretty great. I'm sure it can be improved, but I suspect what it needs most is HPy work, which could benefit a lot of Python language implementations in the long term: https://github.com/hpyproject/hpy

$2e6 spent on HPy could be pretty amazing.
Since HPy was mentioned, hello from the HPy team! If anyone is thinking about Python performance or new Python VMs, we'd love them to take a look at HPy and come and talk to us.

HPy is meant to provide a new C API layer that any Python VM could implement in order to efficiently support Python extensions written on top of HPy. Amazingly, such extensions would also be backwards compatible (i.e. they would also run on CPython as it exists today).

If you'd like to talk to us (and we would like to talk to you) you can find us at:

* Mailing list: hpy-dev@python.org
* IRC: hpy-dev@python.org

If you would like to contribute as a developer (or any other way), we're here to help (and at the moment I am prioritising landing other people's contributions pretty much full-time). If someone has money to donate, they can donate to the project at https://opencollective.com/hpy/contribute. We have quite a few contributors who could contribute more time than they do in exchange for money.

HPy is currently in quite an early stage of development (we can already run some simple Python C extensions without any performance penalties on CPython). The upside is that right now we can consider almost any suggestion for improving HPy because none of the design is set in stone. In a few months it might be helpful to have people try porting some (many) C extensions to HPy to see how that goes.

Yours happily,
Simon Cross (random person working on HPy)
On 10/20/2020 2:49 PM, Dan Stromberg wrote:
I suspect what it needs most is HPy work, which could benefit a lot of Python language implementations in the long term: https://github.com/hpyproject/hpy

$2e6 spent on HPy could be pretty amazing.
I don't think the two projects are mutually exclusive. I would like to see the PSF raise and spend a few million a year on improvements. -- Terry Jan Reedy
On Wed, Oct 21, 2020 at 4:28 AM Terry Reedy <tjreedy@udel.edu> wrote:
I don't think the two projects are mutually exclusive.
100% agreed. I would even go as far as to say that HPy and other proposals to improve Python are mutually beneficial. HPy aims to remove dependencies between C extensions and Python internals, so that Python internals can evolve more easily. Someone else still needs to do the evolving.
On Wed, Oct 21, 2020 at 3:38 AM Steven D'Aprano steve@pearwood.info <mailto:steve@pearwood.info> wrote:
some insulting FUD that is not worth repeating, and an apology
Just to set the record straight, PyPy has been available on conda-forge [0] since March, and has seen close to 70,000 downloads [1] from that channel alone, in addition to the downloads from https://downloads.python.org/pypy and the other channels where it is available. This is far from CPython's wild popularity, but people are successfully using it with the scientific Python stack. It is true there is more work to be done; that does not mean it is useless.

It is in CPython's interest to have a viable alternative interpreter for many reasons. The code of conduct [2] should guide discourse when relating to all people and projects in the Python ecosystem, not just internally to CPython.

Matti

[0] https://conda-forge.org/blog/posts/2020-03-10-pypy/
[1] https://anaconda.org/conda-forge/pypy3.6
[2] https://www.python.org/psf/conduct/
On Wed, Oct 21, 2020 at 8:23 PM Matti Picus <matti.picus@gmail.com> wrote:
Just to set the record straight, PyPy has been available on conda-forge [0] since March, and has seen close to 70,000 downloads [1] from that channel alone, in addition to the downloads from https://downloads.python.org/pypy and the other channels where it is available. This is far from CPython's wild popularity, but people are successfully using it with the scientific python stack. It is true there is more work to be done, that does not mean it is useless.
When I go looking for PyPy performance stats, everything seems to be Python 2.7. Is there anywhere that compares PyPy3 to CPython 3.6 (or whichever specific version)? Or maybe it's right there on https://speed.pypy.org/ and I just can't see it - that's definitely possible :) ChrisA
On Tue, 20 Oct 2020 at 14:01, Mark Shannon <mark@hotpy.org> wrote:
[...]
A very blunt (apologies if it's too blunt) restatement of this proposal (at least stage 1) seems to me to be "Give me $250k, and donate $250k to the PSF for ongoing support, and I'll let you have code that I've already written that gives CPython a 50% speedup. If my code doesn't deliver a 50% speedup, I'll give the money back. No money, no code. We can also discuss 3 more incremental steps of the same sort of magnitude, but I don't have code already written for them".

Honestly, if someone's able to get together $500k, that sounds like a fairly good deal (no risk). If you're asking for a commitment to the full $2M, even if stages 2-4 are contingent on delivery, that's a bit of a harder ask (because the cashflow implication of committing that sort of money becomes relevant). But maybe someone can do it.

I'm fine with this, I guess. I don't have $500k to offer myself, so all my agreement is worth is that I don't have a problem with this much work being funded/provided via one big donation.

I assume that part of "delivery" would involve code review, etc. - we wouldn't be bypassing normal development workflow. So there's still work to be done in putting the code through review, responding to review comments, etc, and ensuring that the code is delivered in a form that the core devs are happy to maintain (PSF donation for support notwithstanding). Actually, a chunk of that support allocation to the PSF might be needed to pay for reviewers, if (as I suspect) this is a large and complex PR.

What I don't see is where the money's coming from. It's fine to ask, but will anyone come up with that sort of funding?

Paul
On Tue, Oct 20, 2020 at 06:04:37PM +0100, Paul Moore wrote:
What I don't see is where the money's coming from. It's fine to ask, but will anyone come up with that sort of funding?
I don't think Mark is asking for you or I to fund the exercise. He's asking for the PSF to fund it.

I think that Mark's asking price is perhaps a bit optimistic. With the Covid-19 pandemic, lockdowns and the shutting down of international travel, I expect that PSF conference income will be way down. And income in the previous financial year was not exactly at Facebook levels:

Revenue: $3.1 million
Expenses: $2.6 million
Net income: $520,000

https://www.python.org/psf/annual-report/2019/

Mark is asking for half of that, justified as the going rate for a full-time developer at one of the top-tier tech firms.

-- Steve
On 21/10/2020 00.14, Steven D'Aprano wrote:
On Tue, Oct 20, 2020 at 06:04:37PM +0100, Paul Moore wrote:
What I don't see is where the money's coming from. It's fine to ask, but will anyone come up with that sort of funding?
I don't think Mark is asking for you or I to fund the exercise. He's asking for the PSF to fund it.
No, he is not. Mark is asking the PSF to organize a fund raiser and keep half the money. Christian
On Wed, 21 Oct 2020 at 08:14, Christian Heimes <christian@python.org> wrote:
On 21/10/2020 00.14, Steven D'Aprano wrote:
On Tue, Oct 20, 2020 at 06:04:37PM +0100, Paul Moore wrote:
What I don't see is where the money's coming from. It's fine to ask, but will anyone come up with that sort of funding?
I don't think Mark is asking for you or I to fund the exercise. He's asking for the PSF to fund it.
No, he is not. Mark is asking the PSF to organize a fund raiser and keep half the money.
Right. I'd misinterpreted the fact that the PSF was to get half the money as meaning they weren't doing the fundraising. My misunderstanding, thanks for the clarification. Paul
On 21/10/2020 09.35, Paul Moore wrote:
On Wed, 21 Oct 2020 at 08:14, Christian Heimes <christian@python.org> wrote:
On 21/10/2020 00.14, Steven D'Aprano wrote:
On Tue, Oct 20, 2020 at 06:04:37PM +0100, Paul Moore wrote:
What I don't see is where the money's coming from. It's fine to ask, but will anyone come up with that sort of funding?
I don't think Mark is asking for you or I to fund the exercise. He's asking for the PSF to fund it.
No, he is not. Mark is asking the PSF to organize a fund raiser and keep half the money.
Right. I'd misinterpreted the fact that the PSF was to get half the money as meaning they weren't doing the fundraising. My misunderstanding, thanks for the clarification.
You are welcome! IMHO you got it right in your initial posting.

I'm irritated that Steven is spreading FUD although Mark's document clearly states his intention [1]: "The PSF seems to be the obvious organization to coordinate funding." Mark is asking the PSF to organize a fund raiser and to act as a trustee. This is the most logical and reasonable approach. We don't want to have another requests incident, do we?

For stages 2-4 Mark is also willing to put his reputation and financial security in jeopardy *and* give the PSF half of the income in exchange for little risk.

Christian

[1] https://github.com/markshannon/faster-cpython/blob/master/funding.md
On Wed, Oct 21, 2020 at 8:37 AM Paul Moore <p.f.moore@gmail.com> wrote:
On Wed, 21 Oct 2020 at 08:14, Christian Heimes <christian@python.org> wrote:
On 21/10/2020 00.14, Steven D'Aprano wrote:
On Tue, Oct 20, 2020 at 06:04:37PM +0100, Paul Moore wrote:
What I don't see is where the money's coming from. It's fine to ask, but will anyone come up with that sort of funding?
I don't think Mark is asking for you or I to fund the exercise. He's asking for the PSF to fund it.
No, he is not. Mark is asking the PSF to organize a fund raiser and keep half the money.
Right. I'd misinterpreted the fact that the PSF was to get half the money as meaning they weren't doing the fundraising. My misunderstanding, thanks for the clarification.
Paul
In 2004 a single company paid me to organise the "Need for Speed" sprint, held in Iceland. A lot was achieved, particularly in string searching and the re module, though I can't honestly say how much impact it had on "overall performance". The work we did with pybench that week definitely moved Python benchmarking along some.

Sixteen years later, the PSF's income may be down due to external factors, but its connectivity at a high level with Python users has improved immeasurably. Need for Speed cost ~$300,000 in inflation-adjusted dollars. If one relatively small trading house could fund at that level, it seems likely the PSF could fund this suggested project quite separately from its existing revenues, as long as the development community was behind it and prepared to help with materials for the "sell."
On Wed, Oct 21, 2020 at 09:06:58AM +0200, Christian Heimes wrote:
On 21/10/2020 00.14, Steven D'Aprano wrote:
On Tue, Oct 20, 2020 at 06:04:37PM +0100, Paul Moore wrote:
What I don't see is where the money's coming from. It's fine to ask, but will anyone come up with that sort of funding?
I don't think Mark is asking for you or I to fund the exercise. He's asking for the PSF to fund it.
No, he is not. Mark is asking the PSF to organize a fund raiser and keep half the money.
I think that's inaccurate. The funding.md document doesn't mention "fund raiser", it doesn't specify how the PSF is to collect the funds, and I wouldn't expect it to. The PSF has various income streams, one of which is donations, and how they collect the money for this proposal is up to them, not Mark. If the PSF decide that the best way to make the money is to have a bake sale, that's their prerogative. The *how* is not really relevant or up to us (except to the degree that some of us may be members of the PSF). And they won't be *keeping* half the money, they will be spending it on on-going maintenance. (Or at least that is Mark's expectation.) So the intent is for the money to be spent at some point, and not for general expenses, but specifically on this project. -- Steve
On 21/10/2020 11.37, Steven D'Aprano wrote:
On Wed, Oct 21, 2020 at 09:06:58AM +0200, Christian Heimes wrote:
On 21/10/2020 00.14, Steven D'Aprano wrote:
On Tue, Oct 20, 2020 at 06:04:37PM +0100, Paul Moore wrote:
What I don't see is where the money's coming from. It's fine to ask, but will anyone come up with that sort of funding?
I don't think Mark is asking for you or I to fund the exercise. He's asking for the PSF to fund it.
No, he is not. Mark is asking the PSF to organize a fund raiser and keep half the money.
I think that's inaccurate. The funding.md document doesn't mention "fund raiser", it doesn't specify how the PSF is to collect the funds, and I wouldn't expect it to.
You are spreading FUD. Stop speculating and give Mark the benefit of the doubt. This kind of negative attitude is a main cause of contributor burnout.

If you are unsure about Mark's intentions and goals then please open an issue on his repository.
I'd love to hear more about what workloads you're targeting and how you came up with the anticipated numbers for the improvements. For comparison, our new JIT provides a single-digit-percentage speedup on our Django and Flask benchmarks.

On Tue, Oct 20, 2020 at 9:03 AM Mark Shannon <mark@hotpy.org> wrote:
[...]
On Tue, Oct 20, 2020 at 5:59 AM Mark Shannon <mark@hotpy.org> wrote:
[...]
+1 Overall I think you are making quite a reasonable proposal. It sounds like effectively bringing your hotpy2 concepts into the CPython interpreter with an intent to help maintain them over the long term.

People worried that you are doing this out of self interest may not know who you are. Sure, you want to be paid to do work that you appear to love and have been mulling over for a decade+. There is nothing wrong with that. Payment is proposed as on delivery per phase. I like the sound of that, nice!

Challenges I expect we'll face, that seem tractable to me, are mostly around what potential roadblocks we - us existing python-committers and our ultimate decider, the steering council - might introduce, intentionally or not, that prevent landing such work. Payment on delivery helps that a lot: if we opt out of some work, it is both our losses. One potential outcome is that you'd burn out and go away if we didn't accept something, meaning payment wasn't going to happen. That already happens amongst all core devs today, so I don't have a problem with this even though it isn't what we'd rightfully want to happen. Middle grounds are quite reasonable renegotiations. The deciders on this would be the PSF (because money), and the PSF would presumably involve the Steering Council in decisions of terms and judgements.

Some background materials for those who don't already know Mark's past work along these lines, which is where this proposal comes from: https://sites.google.com/site/makingcpythonfast/ (hotpy) and the associated presentation in 2012: https://www.youtube.com/watch?v=c6PYnZUMF7o

The amount of money seems entirely reasonable to me. Were it to be taken on, part of the PSF's job is to drum up the money. This would be a contract with outcomes that could effectively be sold to funders in order to do so. There are many companies who use CPython a lot that we could solicit funding from, many of whom have employees among our core devs already. Will they bite? It doesn't even have to be from a single source or be all proposed phases up front; that is what the PSF exists to decide and coordinate on.

We've been discussing on and off in the past many years how to pay people for focused work on CPython and the ecosystem, and how to balance that with being an open source community project. We've got some people employed along these lines already; this would become more of that, and in many ways it just makes sense to me.

Summarizing some responses to points others have brought up:

Performance estimates:
* Don't fret about claimed speedups of each phase. We're all going to doubt different things or expect others to be better. The proof is ultimately in the future pudding.

JIT topics:
* JITs rarely stand alone. The phase 1+2 improved interpreter will always exist. It is normal to start with an interpreter for fast startup and initial tracing before performing JIT compilations, and as a fallback mechanism when the JIT isn't appropriate or available. (My background: Transmeta. We had an interpreter and at least two levels of translators behind our x86-compatible CPUs; all were necessary.)
* Sometimes you simply want to turn tracing and JIT stuff off to save memory. That knob always winds up existing. If nothing else it is normal to run our test suite with such a knob in multiple positions for proper coverage.
* It is safe to assume any later-phase actual JIT would target at least one important platform (ex: amd64 or aarch64) and, if successful, should easily elicit contributions supporting others, either as work or as funding to create it.

"Why this, why not fund XyZ?" whataboutism:
* This conversation is separate from other projects. The way attracting funding for a project works can involve spelling out what it is for. It isn't my decision, but I'd be amazed if anything beyond maybe phase 1 came solely out of a PSF general no-obligation fund. CPython is the most used Python VM in the world. A small amount of funding is not going to get maintainers and users to switch to PyPy. There is unlikely to be a major this-or-that situation here.

Unladen Swallow:
* That was a fixed-time, one-year attempt at speeding up CPython by Google. IIRC CPython's computed-goto support came out of that(?), as did a ton of improvements to the LLVM internals that we don't see in python-dev land, as that project was not yet anywhere near ready to take on dynamic language VMs at the time. At the end, the LLVM-backed side was not something that was deemed maintainable or necessarily a win, so it was not accepted by us and was shelved. It wasn't a clear win and carried a very complicated cross-project maintenance burden. I still work with many of the people involved in that project, at least one of whom works full time on LLVM today. Nobody involved that I'm aware of is bitter about it. It was a fixed-time experiment; a few projects got some good out of it.
* Another reason it did not continue: the motivating internal application we attempted Unladen Swallow for ultimately found they were more memory- than CPU-constrained in terms of their compute resources planning... 5-6 years ago an attempt at getting the same internal application up and running on PyPy (which led to many contributions to PyPy's cpyext) ran in part into the memory constraint. (There were more issues with PyPy - cpyext vs performance being but one; this isn't the place for that and I'm not the right person to ask.)

meta: i've written too many words and edited so often i can't see my own typos and misedits anymore. i'll stop now. :)

-gps
On Wed, Oct 21, 2020 at 3:51 AM Gregory P. Smith <greg@krypto.org> wrote:
meta: i've written too many words and edited so often i can't see my own typos and misedits anymore. i'll stop now. :)
Haha! Very interesting background, thank you for writing down all of this!
Let me explain an impression I'm getting. It is *just one aspect* of my opinion, one that doesn't make sense to me. Please tell me where it is wrong.

In the C API, there's a somewhat controversial refactoring going on, which involves passing around tstate arguments. I'm not saying [the first discussion] was perfect, and there are still issues, but, however flawed the "do-ocracy" process is, it is the best way we found to move forward. No one who can/wants to do the work has a better solution.

Later, Mark says there is an even better way – or at least, a less intrusive one! In [the second discussion], he hints at it vaguely (from the limited info I have, it involves switching to C11 and/or using compiler-specific extensions -- not an easy change to do). But frustratingly, Mark doesn't reveal any actual details, and a lot of the complaints are about churn and merge conflicts. And now, there's news -- the better solution won't be revealed unless the PSF pays for it!

That's a very bad situation to be in for having discussions: basically, either we disregard Mark and go with the not-ideal solution, or virtually all work on changing the C API and internal structures is blocked.

I sense a similar thing happening here: https://github.com/ericsnowcurrently/multi-core-python/issues/69 -- there's a vague proposal to do things very differently, but I find it hard to find anything actionable. I would like to change my plans to align with Mark's fork, or to better explain some of the non-performance reasons for recent/planned changes. But I can't, because details are behind a paywall.

[the first discussion]: https://mail.python.org/archives/list/python-dev@python.org/thread/PQBGECVGV...
[the second discussion]: https://mail.python.org/archives/list/python-dev@python.org/thread/KGBXVVJQZ...

On 10/20/20 2:53 PM, Mark Shannon wrote:
[...]
On Wed, 21 Oct 2020 12:49:54 +0200 Petr Viktorin <encukou@gmail.com> wrote:
Later, Mark says there is an even better way – or at least, a less intrusive one! In [the second discussion], he hints at it vaguely (from that limited info I have, it involves switching to C11 and/or using compiler-specific extensions -- not an easy change to do). But frustratingly, Mark doesn't reveal any actual details, and a lot of the complaints are about churn and merge conflicts. And now, there's news -- the better solution won't be revealed unless the PSF pays for it!
That's a very bad situation to be in for having discussions: basically, either we disregard Mark and go with the not-ideal solution, or virtually all work on changing the C API and internal structures is blocked.
The disagreement is basically on the promises of the "not-ideal solution". Victor claims it will help improve performance. People like Mark and I are skeptical that the C API is really an important concern (apart from small fixes relating to borrowed references, and that's mostly to make PyPy's life easier).

Personally, I trust that Mark's proposed plan is workable. That doesn't mean it *will* work (that depends a lot on being able to maintain motivation and integrate changes at a good pace - which may be a challenge given the technical conservativeness in the CPython community), but it's technically sound.

Regards,
Antoine.
On 10/21/20 4:04 AM, Antoine Pitrou wrote:
(apart from small fixes relating to borrowed references, and that's mostly to make PyPy's life easier).
Speaking as the Gilectomy guy: borrowed references are evil. The definition of the valid lifetime of a borrowed reference doesn't exist, because they are a hack (baked into the API!) that we mostly "get away with" just because of the GIL. If I still had wishes left on my monkey's paw I'd wish them away*.

/arry

* Unfortunately, I used my last wish back in February, wishing I could spend more time at home.
On 21 Oct 2020, at 14:39, Larry Hastings <larry@hastings.org> wrote:
On 10/21/20 4:04 AM, Antoine Pitrou wrote:
(apart from small fixes relating to borrowed references, and that's mostly to make PyPy's life easier).
Speaking as the Gilectomy guy: borrowed references are evil. The definition of the valid lifetime of a borrowed reference doesn't exist, because they are a hack (baked into the API!) that we mostly "get away with" just because of the GIL. If I still had wishes left on my monkey's paw I'd wish them away*.
Even with the GIL, borrowed references are problematic. There are a lot of cases where using a borrowed reference after calling an API that might run Python code can invalidate the borrowed reference. In general, the only safe thing to do with a borrowed reference is to turn it into a strong reference as soon as possible.

Ronald

—
Twitter / micro.blog: @ronaldoussoren
Blog: https://blog.ronaldoussoren.net/
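To make that concrete, here is a minimal sketch of the safe pattern, using real C API calls (the function name and the "mutate" method are invented for this example):

    #include <Python.h>

    /* PyList_GetItem returns a *borrowed* reference: the list, not the
     * caller, owns it. Anything that can run Python code may mutate the
     * list and free the item, so take a strong reference first. */
    static PyObject *
    first_item_survives(PyObject *seq, PyObject *obj)
    {
        PyObject *item = PyList_GetItem(seq, 0);  /* borrowed; NULL on error */
        if (item == NULL)
            return NULL;
        Py_INCREF(item);  /* convert to a strong reference immediately */

        /* This call may execute arbitrary Python code, which could clear
         * 'seq' and -- without the Py_INCREF above -- free 'item'. */
        PyObject *r = PyObject_CallMethod(obj, "mutate", NULL);
        if (r == NULL) {
            Py_DECREF(item);
            return NULL;
        }
        Py_DECREF(r);
        return item;  /* a strong reference, handed to the caller */
    }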
Hi Petr, On 21/10/2020 11:49 am, Petr Viktorin wrote:
Let me explain an impression I'm getting. It is *just one aspect* of my opinion, one that doesn't make sense to me. Please tell me where it is wrong.
In the C API, there's a somewhat controversial refactoring going on, which involves passing around tstate arguments. I'm not saying [the first discussion] was perfect, and there are still issues, but, however flawed the "do-ocracy" process is, it is the best way we found to move forward. No one who can/wants to do the work has a better solution.
Later, Mark says there is an even better way – or at least, a less intrusive one! In [the second discussion], he hints at it vaguely (from that limited info I have, it involves switching to C11 and/or using compiler-specific extensions -- not an easy change to do). But frustratingly, Mark doesn't reveal any actual details, and a lot of the complaints are about churn and merge conflicts. And now, there's news -- the better solution won't be revealed unless the PSF pays for it!
There's no secret. C thread locals are well documented. I even provided a code example last time we discussed it. You reminded me of it yesterday ;) https://godbolt.org/z/dpSo-Q

The "even faster" solution I mentioned yesterday is, as I stated, to use an aligned stack. If you wanted more info, you could have asked :)

First, you ensure that the stack is in a 2**N aligned block. Assuming that the C stack grows down from the top, the threadstate struct goes at the bottom. It's probably a good idea to put a guard page between the C stack and the threadstate struct. The struct's address can then be found by masking off the bottom N bits from the stack pointer. This approach uses 0 registers and costs 1 ALU instruction. Can't get cheaper than that :)

It's not portable and probably a pain to implement, but it is fast.

But it doesn't matter how it's implemented. The implementation is hidden behind `PyThreadState_GET()`; it can be changed to use a thread local, or some fancy aligned stack, without the rest of the codebase changing.
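For illustration, here is a minimal sketch of both variants. The names (_Py_tstate_tls, tstate_from_stack) and the 1 MiB block size are invented for this example; they are not CPython's actual implementation:

    #include <stdint.h>

    typedef struct PyThreadState PyThreadState;  /* opaque here */

    /* Variant 1: a C thread-local (C11 _Thread_local; GCC/Clang also
     * accept __thread), hidden behind the macro. */
    static _Thread_local PyThreadState *_Py_tstate_tls;
    #define PyThreadState_GET() (_Py_tstate_tls)

    /* Variant 2: the aligned-stack trick. If each thread's C stack lives
     * in a 2**N-aligned block with the threadstate struct at the bottom
     * (plus a guard page), masking the stack pointer recovers it. */
    #define STACK_BITS 20  /* hypothetical: 1 MiB aligned blocks */

    static inline PyThreadState *
    tstate_from_stack(void)
    {
        char probe;  /* address of a local approximates the stack pointer */
        uintptr_t sp = (uintptr_t)&probe;
        return (PyThreadState *)(sp & ~(((uintptr_t)1 << STACK_BITS) - 1));
    }

Either definition can sit behind PyThreadState_GET(), which is the point: the rest of the codebase never needs to know which one is in use.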
That's a very bad situation to be in for having discussions: basically, either we disregard Mark and go with the not-ideal solution, or virtually all work on changing the C API and internal structures is blocked.
The existence of multiple interpreters should be orthogonal to speeding up those interpreters, provided the separation is clean and well designed. But it should be clean and well designed anyway, IMO.
I sense a similar thing happening here: https://github.com/ericsnowcurrently/multi-core-python/issues/69 --
The title of that issue is 'Clarify what is a "sub-interpreter" and what is an "interpreter"'?
there's a vague proposal to do things very differently, but I find it
This? https://github.com/ericsnowcurrently/multi-core-python/issues/69#issuecommen...
hard to find anything actionable. I would like to change my plans to align with Mark's fork, or to better explain some of the non-performance reasons for recent/planned changes. But I can't, because details are behind a paywall.
Let's make this very clear. My objections to the way multiple interpreters are being implemented have very little to do with speeding up the interpreter, and everything to do with the long term maintenance and ultimate success of the project.

Obviously, I would like it if multiple interpreters didn't slow down CPython. But that has always been the case.

Cheers, Mark.
[the first discussion]: https://mail.python.org/archives/list/python-dev@python.org/thread/PQBGECVGV...
[the second discussion]: https://mail.python.org/archives/list/python-dev@python.org/thread/KGBXVVJQZ...
On 10/20/20 2:53 PM, Mark Shannon wrote:
Hi everyone,
CPython is slow. We all know that, yet little is done to fix it.
I'd like to change that. I have a plan to speed up CPython by a factor of five over the next few years. But it needs funding.
I am aware that there have been several promised speed ups in the past that have failed. You might wonder why this is different.
Here are three reasons: 1. I already have working code for the first stage. 2. I'm not promising a silver bullet. I recognize that this is a substantial amount of work and needs funding. 3. I have extensive experience in VM implementation, not to mention a PhD in the subject.
My ideas for possible funding, as well as the actual plan of development, can be found here:
https://github.com/markshannon/faster-cpython
I'd love to hear your thoughts on this.
Cheers, Mark.
On 10/21/20 1:40 PM, Mark Shannon wrote:
Hi Petr,
On 21/10/2020 11:49 am, Petr Viktorin wrote:
Let me explain an impression I'm getting. It is *just one aspect* of my opinion, one that doesn't make sense to me. Please tell me where it is wrong.
In the C API, there's a somewhat controversial refactoring going on, which involves passing around tstate arguments. I'm not saying [the first discussion] was perfect, and there are still issues, but, however flawed the "do-ocracy" process is, it is the best way we found to move forward. No one who can/wants to do the work has a better solution.
Later, Mark says there is an even better way – or at least, a less intrusive one! In [the second discussion], he hints at it vaguely (from that limited info I have, it involves switching to C11 and/or using compiler-specific extensions -- not an easy change to do). But frustratingly, Mark doesn't reveal any actual details, and a lot of the complaints are about churn and merge conflicts. And now, there's news -- the better solution won't be revealed unless the PSF pays for it!
There's no secret. C thread locals are well documented. I even provided a code example last time we discussed it.
You reminded me of it yesterday ;) https://godbolt.org/z/dpSo-Q
At the risk of going off topic: That's for GCC. As far as I know, MSVC uses something like __declspec( thread ). What are the options for generic C99 compilers, other than staying slow?
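For what it's worth, the usual answer is a small compiler-dispatch macro; this is a hedged sketch (the Py_THREAD_LOCAL name is hypothetical), and the generic C99 fallback really is a function call, hence slower:

    /* Pick the fastest thread-local storage the compiler offers. */
    #if defined(_MSC_VER)
    #  define Py_THREAD_LOCAL __declspec(thread)
    #elif defined(__GNUC__) || defined(__clang__)
    #  define Py_THREAD_LOCAL __thread
    #elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
    #  define Py_THREAD_LOCAL _Thread_local
    #endif

    #ifdef Py_THREAD_LOCAL
    static Py_THREAD_LOCAL struct PyThreadState *tstate_current;
    #else
    /* Plain C99: fall back to pthread_getspecific()/TlsGetValue(),
     * paying a function call on every access. */
    #endif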
The "even faster" solution I mentioned yesterday, is as I stated yesterday to use an aligned stack. If you wanted more info, you could have asked :)
First, you ensure that the stack is in a 2**N aligned block. Assuming that the C stack grows down from the top, then the threadstate struct goes at the bottom. It's probably a good idea to put a guard page between the C stack and the threadstate struct.
The struct's address can then be found by masking off the bottom N bits from the stack pointer. This approach uses 0 registers and costs 1 ALU instruction. Can't get cheaper than that :)
It's not portable and probably a pain to implement, but it is fast.
But it doesn't matter how it's implemented. The implementation is hidden behind `PyThreadState_GET()`, it can be changed to use a thread local, or to some fancy aligned stack, without the rest of the codebase changing.
Not portable and hard to implement is a pain for maintenance – especially porting CPython to new compilers/platforms/situations. The alternative is changing the codebase, which (it seems from the discussions) would give us comparable performance, everywhere, and the result can be maintained by many more people.
That's a very bad situation to be in for having discussions: basically, either we disregard Mark and go with the not-ideal solution, or virtually all work on changing the C API and internal structures is blocked.
The existence of multiple interpreters should be orthogonal to speeding up those interpreters, provided the separation is clean and well designed. But it should be clean and well designed anyway, IMO.
+1
I sense a similar thing happening here: https://github.com/ericsnowcurrently/multi-core-python/issues/69 --
The title of that issue is 'Clarify what is a "sub-interpreter" and what is an "interpreter"'?
there's a vague proposal to do things very differently, but I find it
This? https://github.com/ericsnowcurrently/multi-core-python/issues/69#issuecommen...
I'll continue there.
hard to find anything actionable. I would like to change my plans to align with Mark's fork, or to better explain some of the non-performance reasons for recent/planned changes. But I can't, because details are behind a paywall.
Let's make this very clear. My objections to the way multiple interpreters are being implemented have very little to do with speeding up the interpreter, and everything to do with the long term maintenance and ultimate success of the project.

Obviously, I would like it if multiple interpreters didn't slow down CPython. But that has always been the case.
Thank you for clearing my doubts!
On 10/21/20 5:58 AM, Petr Viktorin wrote:
At the risk of going off topic: That's for GCC. As far as I know, MSVC uses something like __declspec( thread ). What are the options for generic C99 compilers, other than staying slow?
As a practical matter: does CPython even support "generic C99 compilers"? AFAIK we support three specific compilers: GCC, Clang, and MSVC. (Maybe we also support icc? I think mostly because it supports GCC language extensions.)

/arry
On 21/10/2020 20.55, Larry Hastings wrote:
On 10/21/20 5:58 AM, Petr Viktorin wrote:
At the risk of going off topic: That's for GCC. As far as I know, MSVC uses something like __declspec( thread ). What are the options for generic C99 compilers, other than staying slow?
As a practical matter: does CPython even support "generic C99 compilers"? AFAIK we support three specific compilers: GCC, Clang, and MSVC.
(Maybe we also support icc? I think mostly because it supports GCC language extensions.)
We don't prohibit users from using exotic compilers. Some users maintain Python on platforms like AIX and Solaris with closed source compilers.

In my opinion it would be fine to focus on X86_64 and GCC first. That will cover the majority of servers and consumer PCs. Clang and GCC have a similar feature set and extensions, so Clang should be doable with a manageable amount of effort, too. After X86_64 I'd consider AArch64 (ARM64) and MSVC next.

Christian
Hi Mark,

This sounds really cool. Can you give us more details? Some questions that occurred to me while reading:

- You're suggesting that the contractor would only be paid if the desired 50% speedup is achieved, so I guess we'd need some objective Python benchmark that boils down to a single speedup number. Did you have something in mind for this?
- How much of the work has already been completed?
- Do you have any preliminary results of applying that work to that benchmark? Even if it's preliminary, it would still help a lot in making the case for this being a realistic plan.

-n

On Tue, Oct 20, 2020 at 6:00 AM Mark Shannon <mark@hotpy.org> wrote:
Hi everyone,
CPython is slow. We all know that, yet little is done to fix it.
I'd like to change that. I have a plan to speed up CPython by a factor of five over the next few years. But it needs funding.
I am aware that there have been several promised speed ups in the past that have failed. You might wonder why this is different.
Here are three reasons: 1. I already have working code for the first stage. 2. I'm not promising a silver bullet. I recognize that this is a substantial amount of work and needs funding. 3. I have extensive experience in VM implementation, not to mention a PhD in the subject.
My ideas for possible funding, as well as the actual plan of development, can be found here:
https://github.com/markshannon/faster-cpython
I'd love to hear your thoughts on this.
Cheers, Mark.
-- Nathaniel J. Smith -- https://vorpus.org
Hi Nathaniel, On 22/10/2020 7:36 am, Nathaniel Smith wrote:
Hi Mark,
This sounds really cool. Can you give us more details? Some questions that occurred to me while reading:
- You're suggesting that the contractor would only be paid if the desired 50% speedup is achieved, so I guess we'd need some objective Python benchmark that boils down to a single speedup number. Did you have something in mind for this?
- How much of the work has already been completed?
A fair bit of stage 1, and much research and design for the later stages.
- Do you have any preliminary results of applying that work to that benchmark? Even if it's preliminary, it would still help a lot in making the case for this being a realistic plan.
Getting a PGO/LTO comparison against 3.10 is tricky. Mainly because I'm relying on merging a bunch of patches and expecting it to work :) However, on a few simple benchmarks I'm seeing about a 70% speedup vs master for both default and LTO configures. I would expect a lower speedup on a wider range of benchmarks with a PGO/LTO build. But 50% is definitely achievable. Cheers, Mark.
-n
On Tue, Oct 20, 2020 at 6:00 AM Mark Shannon <mark@hotpy.org> wrote:
Hi everyone,
CPython is slow. We all know that, yet little is done to fix it.
I'd like to change that. I have a plan to speed up CPython by a factor of five over the next few years. But it needs funding.
I am aware that there have been several promised speed ups in the past that have failed. You might wonder why this is different.
Here are three reasons: 1. I already have working code for the first stage. 2. I'm not promising a silver bullet. I recognize that this is a substantial amount of work and needs funding. 3. I have extensive experience in VM implementation, not to mention a PhD in the subject.
My ideas for possible funding, as well as the actual plan of development, can be found here:
https://github.com/markshannon/faster-cpython
I'd love to hear your thoughts on this.
Cheers, Mark.
-- Nathaniel J. Smith -- https://vorpus.org
On Thu, 22 Oct 2020 at 12:52, Mark Shannon <mark@hotpy.org> wrote:
Getting a PGO/LTO comparison against 3.10 is tricky. Mainly because I'm relying on merging a bunch of patches and expecting it to work :)
However, on a few simple benchmarks I'm seeing about a 70% speedup vs master for both default and LTO configures.
I would expect a lower speedup on a wider range of benchmarks with a PGO/LTO build. But 50% is definitely achievable.
Apologies if this is already mentioned somewhere, but is this across all supported platforms (I'm specifically interested in Windows) or is it limited to only some? I assume the long-term expectation would be to get the speedup on all supported platforms, I'm mainly curious as to whether your current results are platform-specific or general. Paul
Hi Paul, On 22/10/2020 1:18 pm, Paul Moore wrote:
On Thu, 22 Oct 2020 at 12:52, Mark Shannon <mark@hotpy.org> wrote:
Getting a PGO/LTO comparison against 3.10 is tricky. Mainly because I'm relying on merging a bunch of patches and expecting it to work :)
However, on a few simple benchmarks I'm seeing about a 70% speedup vs master for both default and LTO configures.
I would expect a lower speedup on a wider range of benchmarks with a PGO/LTO build. But 50% is definitely achievable.
Apologies if this is already mentioned somewhere, but is this across all supported platforms (I'm specifically interested in Windows) or is it limited to only some? I assume the long-term expectation would be to get the speedup on all supported platforms, I'm mainly curious as to whether your current results are platform-specific or general.
There is nothing platform specific. I've only tested on Linux. I hope that the speedup on Windows will be a bit more, as MSVC seems to do better jump fusion than GCC. (I haven't tested Clang.)

Cheers, Mark.
On Thu, 22 Oct 2020 at 14:25, Mark Shannon <mark@hotpy.org> wrote:
MSVC seems to do better jump fusion than GCC.
Maybe I'm wrong, since I only took a look at the dict, tuple and set C code, but it does not seem to me that there are more than 1-2 GOTOs in any CPython function, and they can't be merged.
On 22Oct2020 1341, Marco Sulla wrote:
On Thu, 22 Oct 2020 at 14:25, Mark Shannon <mark@hotpy.org> wrote:
MSVC seems to do better jump fusion than GCC.
Maybe I'm wrong, since I only took a look at the dict, tuple and set C code, but it does not seem to me that there are more than 1-2 GOTOs in any CPython function, and they can't be merged.
There are vastly more jumps generated than what you see in the source code. You'll need to compare assembly language to get a proper read on this.

But I don't think that's necessary, since processors do other kinds of clever things with jumps anyway, which can variously improve/degrade performance relative to what the compilers generate. Benchmarks on consistent hardware are what matter, not speculation about generated code.

Cheers, Steve
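To illustrate the point about generated jumps: an interpreter dispatch loop written with computed gotos (a GCC/Clang extension that CPython's ceval loop uses when available) emits one indirect jump per opcode handler, none of which appear as a visible goto when skimming the source. A toy sketch, not CPython's actual code:

    #include <stdio.h>

    /* Toy bytecode interpreter using the GCC/Clang labels-as-values
     * extension. Every DISPATCH() compiles to its own indirect jump,
     * so the generated code has far more jumps than the source shows. */
    enum { OP_INC, OP_DEC, OP_HALT };

    static int
    run(const unsigned char *code)
    {
        static void *dispatch[] = { &&op_inc, &&op_dec, &&op_halt };
        int acc = 0;
    #define DISPATCH() goto *dispatch[*code++]
        DISPATCH();
    op_inc: acc++; DISPATCH();
    op_dec: acc--; DISPATCH();
    op_halt: return acc;
    #undef DISPATCH
    }

    int
    main(void)
    {
        const unsigned char prog[] = { OP_INC, OP_INC, OP_DEC, OP_HALT };
        printf("%d\n", run(prog));  /* prints 1 */
        return 0;
    }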
I don’t have much to add to this thread, except to ask whether Mark has been in contact with Carl Shapiro. Carl’s posted here before, but I don’t think he’s an active mailing list participant. Carl has a lot of experience with VMs and has been doing interesting work on performant Python implementations at Facebook. You might want to reach out to him. Cheers, -Barry
On 20 Oct 2020, at 14:53, Mark Shannon <mark@hotpy.org> wrote:
Hi everyone,
CPython is slow. We all know that, yet little is done to fix it.
I'd like to change that. I have a plan to speed up CPython by a factor of five over the next few years. But it needs funding.
I am aware that there have been several promised speed ups in the past that have failed. You might wonder why this is different.
Here are three reasons: 1. I already have working code for the first stage. 2. I'm not promising a silver bullet. I recognize that this is a substantial amount of work and needs funding. 3. I have extensive experience in VM implementation, not to mention a PhD in the subject.
My ideas for possible funding, as well as the actual plan of development, can be found here:
https://github.com/markshannon/faster-cpython
I'd love to hear your thoughts on this.
I don't have anything useful to add to the discussion, other than to say that I'm happy to see that someone is willing to spend a significant amount of effort on making CPython faster. Especially when that someone has worked on a faster Python implementation before (look for a HotPy talk at EuroPython).

I'm not too worried about the technical part, and have no expertise in funding at all. I am worried that merging this work will take a significant amount of effort. This is likely to result in fairly significant changes to the core interpreter, and it might be hard to find enough core devs that are willing and able to review the changes in a timely manner.

Ronald
(For the record, I'm not replying as a PSF Director in this; I haven't discussed this with the rest of the Board yet. This just comes from the Steering Council.)

The Steering Council discussed this proposal in our weekly meeting, last week. It's a complicated subject with a lot of different facets to consider. First of all, though, we want to thank you, Mark, for bringing this to the table. The Steering Council and the PSF have been looking for these kinds of proposals for spending money on CPython development. We need ideas like this to have something to spend money on that we might collect (e.g. via the new GitHub sponsors page), and also to have a good story to potential (corporate) sponsors.

That said, we do have a number of things to consider here.

For background, funding comes in a variety of flavours. Most donations to the PSF are general fund donations; the foundation is free to use it for whatever purpose it deems necessary (within its non-profit mission). The PSF Board and staff decide where this money has the biggest impact, as there are a lot of things the PSF could spend it on.

Funds can also be earmarked for a specific purpose. Donations to PyPI (donate.pypi.org) work this way, for example. The donations go to the PSF, but are set aside specifically for PyPI expenses and development. Fiscal sponsorship (https://www.python.org/psf/fiscal-sponsorees/) is similar, but even more firmly restricted (and the fiscal sponsorees, not the PSF, decide what to spend the money on).

A third way of handling funding is more targeted donations: sponsors donate for a specific program. For example, GitHub donated money specifically for the PSF to hire a project manager to handle the migration from bugs.python.org to GitHub Issues. Ezio Melotti was contracted by the PSF for this job, not GitHub, even though the funds are entirely donated by GitHub. Similar to such targeted donations are grant requests, like the several grants PyPI received and the CZI grant request for CPython that was recently rejected (https://github.com/python/steering-council/issues/26). The mechanics are a little different, but the end result is the same: the PSF receives funds to achieve very specific goals.

Regarding donations to CPython development (as earmarked donations, or from the PSF's general fund), the SC drew up a plan for investment that is centered around maintenance: reducing the maintenance burden, easing the load on volunteers where desired, working through our bug and PR backlog. (The COVID-19 impact on PyCon and PSF funds put a damper on our plans, but we used much of the original plan for the CZI grant request, for example. Since that, too, fell through, we're hoping to collect funds for a reduced version of the plan through the PSF, which is looking to add it as a separate track in the sponsorship program.) Speeding up pure-Python programs is not something we consider a priority at this point, at least not until we can address the larger maintenance issues.

And it may not be immediately obvious from Mark's plans, but as far as we can tell, the proposal is for speeding up pure-Python code. It will do little for code that is hampered, speed-wise, by CPython's object model, or threading model, or the C API. We have no idea how much this will actually matter to users. Making pure-Python code execution faster is always welcome, but it depends on the price. It may not be a good place to spend $500k or more, and it may even not be considered worth the implementation complexity.
Thinking specifically of corporate sponsorship, it's very much the question if pure-Python code speedup is something companies would be willing to invest serious funds in. Google's Unladen Swallow was such an investment, and though it did deliver speedups (which were included in Python 2.7) and even though Google has a lot of Python code, there was not enough interest to keep it going. This may be different now, but finding out what "customers" (in the broadest sense) actually want is an important first step in asking for funding for a project like this. It's the kind of thing normally done by a product manager, at least in the corporate world, and we need that same effort and care put into it.

If we can potentially find the funds for this project, via the PSF's general fund, earmarked funds or a direct corporate sponsor, we also have to consider what we are actually delivering. Which performance metrics are we improving? How are we measuring them, what benchmarks? What if the sponsor has their own benchmarks they want to use? What about effects on other performance metrics, ones the project isn't seeking to improve, are they allowed to worsen? To what extent? How will that be measured? How will we measure progress as the project continues? What milestones will we set? What happens when there's disagreement about the result between the sponsor and the people doing the work? What if the Steering Council or the core developers -- as a body -- declines to merge the work even if it does produce the desired result for the sponsor and the people doing the work?

And this is about more than just agreements between the sponsor and the people doing the work. What is the position of the Steering Council in this? Are they managing the people doing the work or not? Are they evaluating the end result or not? What about the rest of the core developers? And how will development take place? Will the design or implementation of the performance improvements go through the PEP process? Will the SC or other core developers have input in the design or implementation? Who will do code review of the changes? Will the work be merged in small increments, or will it happen in a separate branch until the project is complete? All of these questions, and more, will need to be answered in some way, and it really requires a project manager to take this on. We've seen how much impact good management can have on a project with the PyPI work overseen by Sumana. A project of this scale really can't do without it.

I don't doubt all of these questions can be answered, but it's going to take time and effort -- and probably concessions -- to get to a good proposal to put before interested corporations, and then more adjustments to accommodate them. The PSF and the SC can't fund the work at this time. If we can find a sponsor willing to just shell out the $2M (or just $500k) for the current plan, the SC is not against it -- but without the product management and project management work mentioned above, I doubt this will happen. If we want the SC or the PSF to go shopping for sponsors, soliciting donations for this project, we need more of the product/project management work done as well.

If people want to work on the product and project management part of the proposal, that'd be great. We'd be happy to provide guidance. We also can -- and will! -- certainly mention this proposal as the kind of work we would want to fund when talking to potential sponsors.
We can gauge interest, to see how worthwhile it would be to flesh out the proposal. Who knows, maybe someone will be willing to outright fund this as-is. But as it is, the SC doesn't think we can fund this directly, even if we had the money available.

For the SC,
Thomas.

--
Thomas Wouters <thomas@python.org>

Hi! I'm an email virus! Think twice before sending your email to help me spread!
Hello, On Wed, 4 Nov 2020 13:27:50 +0100 Thomas Wouters <thomas@python.org> wrote:
And it may not be immediately obvious from Mark's plans, but as far as we can tell, the proposal is for speeding up pure-Python code. It will do little for code that is hampered, speed-wise, by CPython's object model, or threading model, or the C API. We have no idea how much this will actually matter to users. Making pure-Python code execution faster is always welcome, but it depends on the price. It may not be a good place to spend $500k or more, and it may even not be considered worth the implementation complexity.
FWIW, I think it would definitely be worth it. Performance will be a *major* hurdle for Python in the years to come (the other hurdle being ease of deployment).
Thinking specifically of corporate sponsorship, it's very much the question if pure-Python code speedup is something companies would be willing to invest serious funds in.
I would suggest, for example, talking to Quansight, Numfocus, the NVidia Rapids team, and/or coiled.io. There are areas of scientific computing where better pure-Python performance would help (one potential area is the Dask scheduler, another is the Numba JIT compiler). Another prominent area is server-side Web development, but I have no one to suggest there :-)

Best regards

Antoine.
On Wed, 4 Nov 2020 at 13:14, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Wed, 4 Nov 2020 13:27:50 +0100 Thomas Wouters <thomas@python.org> wrote:
And it may not be immediately obvious from Mark's plans, but as far as we can tell, the proposal is for speeding up pure-Python code. It will do little for code that is hampered, speed-wise, by CPython's object model, or threading model, or the C API. We have no idea how much this will actually matter to users. Making pure-Python code execution faster is always welcome, but it depends on the price. It may not be a good place to spend $500k or more, and it may even not be considered worth the implementation complexity.
FWIW, I think it would definitely be worth it. Performance will be a *major* hurdle for Python in the years to come (the other hurdle being ease of deployment).
I agree on both of these points, and I would love to see funding be available for both of these items. But having said that, I agree with the SC's position here. Getting funding is only one part of the problem; project management and co-ordination are absolutely necessary (we're talking about a $2M project!) and would be a significant overhead. Even if the cost of such a resource could come from the funding, there's still a significant cashflow problem with committing that resource prior to getting funding, as well as a risk that the funding doesn't materialise and the investment is lost. I hope that we can find some way to realise the benefits Mark has identified, but I can see why the SC has to prioritise the way they have.

Paul
Hi Thomas, I have to assume that this isn't a rejection of my proposal, since I haven't actually made a proposal to the SC yet :) Thanks for the feedback though, it's very valuable to know the SC's thinking on this matter. I have a few comments inline below. On 04/11/2020 12:27 pm, Thomas Wouters wrote:
(For the record, I’m not replying as a PSF Director in this; I haven’t discussed this with the rest of the Board yet. This just comes from the Steering Council.)
The Steering Council discussed this proposal in our weekly meeting, last week. It's a complicated subject with a lot of different facets to consider. First of all, though, we want to thank you, Mark, for bringing this to the table. The Steering Council and the PSF have been looking for these kinds of proposals for spending money on CPython development. We need ideas like this to have something to spend money on that we might collect (e.g. via the new GitHub sponsors page), and also to have a good story to potential (corporate) sponsors.
That said, we do have a number of things to consider here.
For background, funding comes in a variety of flavours. Most donations to the PSF are general fund donations; the foundation is free to use it for whatever purpose it deems necessary (within its non-profit mission). The PSF Board and staff decide where this money has the biggest impact, as there are a lot of things the PSF could spend it on.
Funds can also be earmarked for a specific purpose. Donations to PyPI (donate.pypi.org) work this way, for example. The donations go to the PSF, but are set aside specifically for PyPI expenses and development. Fiscal sponsorship (https://www.python.org/psf/fiscal-sponsorees/) is similar, but even more firmly restricted (and the fiscal sponsorees, not the PSF, decide what to spend the money on).
A third way of handling funding is more targeted donations: sponsors donate for a specific program. For example, GitHub donated money specifically for the PSF to hire a project manager to handle the migration from bugs.python.org to GitHub Issues. Ezio Melotti was contracted by the PSF for this job, not GitHub, even though the funds are entirely donated by GitHub. Similar to such targeted donations are grant requests, like the several grants PyPI received and the CZI grant request for CPython that was recently rejected (https://github.com/python/steering-council/issues/26). The mechanics are a little different, but the end result is the same: the PSF receives funds to achieve very specific goals.
I really don't want to take money away from the PSF. Ideally I would like the PSF to have more money.
Regarding donations to CPython development (as earmarked donations, or from the PSF's general fund), the SC drew up a plan for investment that is centered around maintenance: reducing the maintenance burden, easing the load on volunteers where desired, working through our bug and PR backlog. (The COVID-19 impact on PyCon and PSF funds put a damper on our plans, but we used much of the original plan for the CZI grant request, for example. Since that, too, fell through, we're hoping to collect funds for a reduced version of the plan through the PSF, which is looking to add it as a separate track in the sponsorship program.) Speeding up pure-Python programs is not something we consider a priority at this point, at least not until we can address the larger maintenance issues.
I too think we should improve the maintenance story. But maintenance doesn't get anyone excited; performance does. By allocating part of the budget to maintenance, we get performance *and* a better maintenance story. That's my goal anyway.

I think it is a lot easier to say to corporations "give us X dollars to speed up Python and you save Y dollars" than "give us X dollars to improve maintenance", with no quantifiable benefit to them.
And it may not be immediately obvious from Mark's plans, but as far as we can tell, the proposal is for speeding up pure-Python code. It will do little for code that is hampered, speed-wise, by CPython's object model, or threading model, or the C API. We have no idea how much this will actually matter to users. Making pure-Python code execution faster is always welcome, but it depends on the price. It may not be a good place to spend $500k or more, and it may even not be considered worth the implementation complexity.
I'll elaborate:

1. There will be a large total diff, but not that large an increase in code size; less than 1% of the current size of the C code base. There would be an increase in the conceptual complexity of the interpreter, but I'm hoping to largely offset that with better code organization. It is perfectly possible to *improve* code quality, if not necessarily size, while increasing performance. Simpler code is often faster, and better algorithms do not make worse code.

2. The object model and C-API are an inherent part of CPython. It's not really meaningful to say that some piece of code's performance is hampered by the C-API or object model. What matters is how much faster it goes.

3. Regarding threading, all CPU-bound code will be sped up. Whether code is limited by being single threaded or not, it will still be sped up. The speedup of a single interpreter is (largely) independent of the number of threads running. Eric, Petr and Victor's work will still be relevant for concurrent performance.

Please, just ask me if you need more details on any of these points.
Thinking specifically of corporate sponsorship, it's very much the question if pure-Python code speedup is something companies would be willing to invest serious funds in. Google's Unladen Swallow was such an investment, and though it did deliver speedups (which were included in Python 2.7) and even though Google has a lot of Python code, there was not enough interest to keep it going. This may be different now, but finding out what "customers" (in the broadest sense) actually want is an important first step in asking for funding for a project like this. It's the kind of thing normally done by a product manager, at least in the corporate world, and we need that same effort and care put into it.
It makes sense that a single corporate sponsor would be unwilling to fund this. But why not several corporations? It keeps their costs down and they get the same benefit. I have no idea how to go about organizing that, however.
If we can potentially find the funds for this project, via the PSF's general fund, earmarked funds or a direct corporate sponsor, we also have to consider what we are actually delivering. Which performance metrics are we improving? How are we measuring them, what benchmarks? What if the sponsor has their own benchmarks they want to use? What about effects on other performance metrics, ones the project isn't seeking to improve, are they allowed to worsen? To what extent? How will that be measured? How will we measure progress as the project continues? What milestones will we set? What happens when there's disagreement about the result between the sponsor and the people doing the work? What if the Steering Council or the core developers -- as a body -- declines to merge the work even if it does produce the desired result for the sponsor and the people doing the work?
We already have a standard benchmark suite. I would propose using that as a start. If corporate sponsors want to add their own benchmarks that's a double win. They get more confidence that they will see performance improvements and we get a more comprehensive benchmark suite. I wouldn't worry about anything getting slower. But, if a sponsor only sees a 20% speedup on their code, despite a general speed up of 50%, then what happens? I guess that's up to the sponsor, although they probably should state their conditions up front.
And this is about more than just agreements between the sponsor and the people doing the work. What is the position of the Steering Council in this? Are they managing the people doing the work or not? Are they evaluating the end result or not? What about the rest of the core developers? And how will development take place? Will the design or implementation of the performance improvements go through the PEP process? Will the SC or other core developers have input in the design or implementation? Who will do code review of the changes? Will the work be merged in small increments, or will it happen in a separate branch until the project is complete? All of these questions, and more, will need to be answered in some way, and it really requires a project manager to take this on. We've seen how much impact good management can have on a project with the PyPI work overseen by Sumana. A project of this scale really can't do without it.
I don't think that the SC or PSF should be managing the work. How do you price and allocate research work? Which is why I am offering to subcontract. I am willing to take on the risk and, having done the research, know that I can deliver.

As for reviewing and merging, I would expect to pay someone for reviewing and some other maintenance tasks. Note that the payment would be for the review, not for a favorable review. Obviously reviews from other core devs would be most welcome, but I don't want to rely on using up other people's spare time. I can merge the code myself. Merges would be in small units and as often as is practical. There is no need for long-lived branches, at least not for stage 1.
I don't doubt all of these questions can be answered, but it's going to take time and effort -- and probably concessions -- to get to a good proposal to put before interested corporations, and then more adjustments to accommodate them. The PSF and the SC can't fund the work at this time. If we can find a sponsor willing to just shell out the $2M (or just $500k) for the current plan, the SC is not against it -- but without the product management and project management work mentioned above, I doubt this will happen. If we want the SC or the PSF to go shopping for sponsors, soliciting donations for this project, we need more of the product/project management work done as well.
Just the $500k, or thereabouts. The first stage should not rely on later stages ever happening.

As for project management, that's why I suggested a cash-on-delivery contract. Obviously whoever gets hired by the PSF for maintenance will need managing, but that needs to happen anyway.
If people want to work on the product and project management part of the proposal, that’d be great. We'd be happy to provide guidance. We also can -- and will! -- certainly mention this proposal as the kind of work we would want to fund when talking to potential sponsors. We can gauge interest, to see how worthwhile it would be to flesh out the proposal. Who knows, maybe someone will be willing to outright fund this as-is. But as it is, the SC doesn't think we can fund this directly, even if we had the money available.
Again, I really don't want to take money away from the PSF. Cheers, Mark.
On Tue, Oct 20, 2020 at 01:53:34PM +0100, Mark Shannon wrote:
Hi everyone,
CPython is slow. We all know that, yet little is done to fix it.
I'd like to change that. I have a plan to speed up CPython by a factor of five over the next few years. But it needs funding.
I've noticed a lot of optimization-related b.p.o. issues created by Mark, which is great. What happened with Mark's proposal here? Did the funding issue get sorted? -- Steve
On Fri, May 7, 2021 at 6:51 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Tue, Oct 20, 2020 at 01:53:34PM +0100, Mark Shannon wrote:
Hi everyone,
CPython is slow. We all know that, yet little is done to fix it.
I'd like to change that. I have a plan to speed up CPython by a factor of five over the next few years. But it needs funding.
I've noticed a lot of optimization-related b.p.o. issues created by Mark, which is great. What happened with Mark's proposal here? Did the funding issue get sorted?
I believe Guido has Mark contracting on Python performance through Microsoft? -Greg
On Fri, May 7, 2021 at 8:20 PM Gregory P. Smith <greg@krypto.org> wrote:
On Fri, May 7, 2021 at 6:51 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Tue, Oct 20, 2020 at 01:53:34PM +0100, Mark Shannon wrote:
Hi everyone,
CPython is slow. We all know that, yet little is done to fix it.
I'd like to change that. I have a plan to speed up CPython by a factor of five over the next few years. But it needs funding.
I've noticed a lot of optimization-related b.p.o. issues created by Mark, which is great. What happened with Mark's proposal here? Did the funding issue get sorted?
I believe Guido has Mark contracting on Python performance through Microsoft?
For those who didn't attend the Language Summit yesterday, this is indeed the case. We've been in stealth mode until the summit, but the cat is now definitely out of the bag -- Microsoft is thanking the Python community by funding work to speed up CPython. Besides Mark and myself, Eric Snow (a MS employee like myself) is also full-time on this project. We expect to be adding a few more people to the team.

Mark has already revealed his PEP 659 (Specializing Adaptive Interpreter). We've also created a small GitHub org: https://github.com/faster-cpython/, containing several repos:

- https://github.com/faster-cpython/cpython, a fork of cpython where we do the work (PRs will mostly come from here)
- https://github.com/faster-cpython/tools, a set of tools we're using for benchmarking and analysis and the like (the README contains some stats we've gathered on bytecode occurrence)
- https://github.com/faster-cpython/ideas, a tracker where we're discussing various plans and ideas

Contributions are welcome! I've also published the slides of my language summit presentation: https://github.com/faster-cpython/ideas/blob/main/FasterCPythonDark.pdf

--
--Guido van Rossum (python.org/~guido)
Pronouns: he/him (why is my pronoun here?)
<http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
Great news! Just a tiny bit from me. I read the other day, in the Open Source report sponsored by the Ford Foundation, a CPython contributor stating that we have an all-time high count of Python users but an all-time low number of contributors to CPython. I don't know how, but we certainly need a fake path to help people start contributing and level up, to gain a pool of resources. We don't need to wait for easy issues or things like that, or wait for a PR merge, to level up.

Yet you always see it: new people not knowing where to start, highly skilled contributors drowning, and intermediate contributors moving slowly. I know all contributors are doing awesome work, but maybe something can be done to have a smarter skilling-up stream.
On 5/12/2021 2:50 PM, Abdur-Rahmaan Janhangeer wrote:
Great news, just a tiny bit from me. I read the other day in the OpenSource report sponsored by the Ford Foundation a CPython contributor stating that we have an all time high count of Python users but an all time low number of contributors to CPython. I don't know how but we certainly need a fake path to help people start
I presume you mean 'fast path'?
contributing and level up to gain a pool of resources We don't need to wait for easy issues or things like that or wait for PR merge to level up.
Yet you always see it: new people not knowing where to start, highly skilled contributors drowning and intermediate contributors moving slowly
I have multiple times strongly recommended that people review issues and PRs, and sometimes given details, but most won't or don't. Do you have any idea why? -- Terry Jan Reedy
On Wed, 12 May 2021 17:05:03 -0400 Terry Reedy <tjreedy@udel.edu> wrote:
contributing and level up to gain a pool of resources We don't need to wait for easy issues or things like that or wait for PR merge to level up.
Yet you always see it: new people not knowing where to start, highly skilled contributors drowning and intermediate contributors moving slowly
I have multiple times strongly recommended that people review issues and PRs, and sometimes given details, but most won't or don't.
I don't know who "people" are in your sentence, but reviewing issues and PRs generally requires a high familiarity with a project, and enough confidence to speak with a voice of (seeming) authority. I'm not convinced it's generally easier than submitting a patch for a particular issue you're comfortable with. Regards Antoine.
On 5/12/2021 5:14 PM, Antoine Pitrou wrote:
On Wed, 12 May 2021 17:05:03 -0400 Terry Reedy <tjreedy@udel.edu> wrote:
Yet you always see it: new people not knowing where to start, highly skilled contributors drowning and intermediate contributors moving slowly
I have multiple times strongly recommended that people review issues and PRs, and sometimes given details, but most won't or don't.
I don't know who "people" are in your sentence, but reviewing issues and PRs generally requires a high familiarity with a project, and
Much can be done without what I think you mean by 'high familiarity'.

Bug issues: bpo: "On macOS with 3.8.3 I see this buggy behavior". If there is not enough info to reproduce, ask for it. If there is, try to reproduce on the latest release or, even better, a repository build. Sometimes, trying on a different OS is helpful. PR: make a local PR branch and test whether the proposed fix works.

Enhancement issues: bpo: if the proposal is for core Python or a module one has used, does the proposal seem like an actual improvement? Enough to be worth the likely bother? PR: does the PR work as promised? Do you like it?

PR Python code: read it. See any possible improvements?
enough confidence to speak with a voice of (seeming) authority.
I prefer honesty to pretend authority. Nearly 2 years ago, a 'new contributor' commented on an IDLE PR, "Would x be better?" It was a minor improvement, but a real one, so I made it, thanked the person, and encouraged further efforts. That person did so and is now a core developer. I would welcome more eyes on IDLE patches and use testing thereof.
I'm not convinced it's generally easier than submitting a patch for a particular issue you're comfortable with.
Some review actions are easier, some are not. But the only way to learn to review other people's code is to do it, and it is a skill we need in at least some new coredevs. -- Terry Jan Reedy
On 5/12/21 4:10 PM, Terry Reedy wrote:
On 5/12/2021 5:14 PM, Antoine Pitrou wrote:
On Wed, 12 May 2021 17:05:03 -0400 Terry Reedy <tjreedy@udel.edu> wrote:
Yet you always see it: new people not knowing where to start, highly skilled contributors drowning and intermediate contributors moving slowly
I have multiple times strongly recommended that people review issues and PRs, and sometimes given details, but most won't or don't.
I don't know who "people" are in your sentence, but reviewing issues and PRs generally requires a high familiarity with a project, and
Much can be done without what I think you mean by 'high familiarity'.
Bug issues: bpo: "On macOS with 3.8.3 I see this buggy behavior" If not enough info to reproduce, ask for it. If there is, try to reproduce on latest release or even better, repository build. Sometimes, trying on a different OS is helpful. PR: make local PR branch and test whether proposed fix works.
Enhancement issues: bpo: if proposal is for core python or a module one has used, does proposal seem like an actual improvement? enough to be worth the likely bother? PR: does the PR work as promised? Do you like it?
PR Python code: read it. See any possible improvements?
In addition, starting by working on others' issues and PRs will build a degree of familiarity with how Python development works - what sorts of questions get asked, what changes to a PR tend to get asked for, etc.
On Thu, 13 May 2021, 01:09 Terry Reedy, <tjreedy@udel.edu> wrote:
On 5/12/2021 2:50 PM, Abdur-Rahmaan Janhangeer wrote:
Great news, just a tiny bit from me. I read the other day in the OpenSource report sponsored by the Ford Foundation a CPython contributor stating that we have an all time high count of Python users but an all time low number of contributors to CPython. I don't know how but we certainly need a fake path to help people start
I presume you mean 'fast path'?
No, I mean a fake path, in the sense of a fork of CPython with issues for learning purposes. People would then work on solving the issues on their own, without PRing. It helps them get close to the CPython source without waiting for merges or comments, since the fix will be documented. It allows people to skill up without involving other people.
Abdur-Rahmaan Janhangeer writes:
No i mean fake path in the sense of a fork of CPython with issues for learning purposes
*Creating* plausible issues is hard work, I assure you as a university professor. Coming up with "exercises" that are not makework requires expertise in both the domain and in educational psychology. (Some people are "just good at it", of course, but it's quite clear from popular textbooks that most are not.) I think that would be a very unproductive use of developer time, especially since "git clone; git checkout some-tag-in-2017" is pretty much what you're asking for otherwise.
Then people work on solving the issues on their own without PRing.
The problem is not a lack of issues to practice on. It's that (1) the PR process itself is a barrier, or at least an annoyance, and (2) many new contributors need mentoring. (Or think they do. Some just need encouragement, others need help on technique, but both groups are more or less blocked without the mentoring.)

And, of course, real contribution involves a lot of unfun work. Writing tests, writing documentation, explaining to other developers who start out -1 because they don't get it, overcoming your own mental blocks to changing your submission because *you* don't get it, and on and on. A lot of newcomers think "I'm not good at that, if I have to do it I can't contribute" (and a few selfishly think they can just do the fun parts and achieve fame and fortune), but you know, "if not you, then who? If you don't do it for Python, where are you going to be able to contribute?"

To be honest, although I'm not a specialist in organizational behavior and am operating with a small sample, I can say that from the point of view of identifying tasks, finding solutions, and implementing them, Python is the most effective non-hierarchical organization I've ever seen. I can't say I've seen more than one or two hierarchical organizations that are significantly better at implementing solutions and don't burn up their workers in the process -- and the ones I'm thinking of are way smaller than Python. (Yes, I know that there are people who have gotten burned up in Python, too. We can do better on that, but Python does not deliberately sacrifice people to the organization.)

ISTM that Terry is right. What we need to do better is encourage people to just start contributing, and help them to get over the initial humps: git, the PR process, requests from the QA police for docs and tests and NEWS entries, etc. Terry's approach seems good to me on the face of it, and it's "battle-tested". Terry uses it and has had some successes. Maybe that process can be tweaked, but it's a good recipe. I suspect that the main reason it doesn't work for Terry outside of IDLE is that IDLE is where Terry has expertise and motivation to do emotional work: handholding at the beginning, deeper involvement in mentoring as necessary. *And that's as it should be.* It's up to the rest of us to do that work on areas *we* care about.

I have to point out that there's a whole crew over on core-mentorship doing this work, and at least one Very Senior Developer with their own private mentoring program.[1] IMO, that is a big part of why Python is as successful as it is. If more senior developers would take on these tasks it would have a big effect downstream. But emotional work is hard, and it comes in big chunks. In many situations you have to follow through, on the mentee's schedule, or the mentee will "slip the hook and swim away." So it's a big ask. I'm willing to make that ask in the abstract, but there's not even one senior developer I'm able to point to and say "definitely that person would do more for Python by mentoring than by hacking". It's a very hard problem.

Footnotes:
[1] Why "private"? Well, why should the junior developers have all the fun? The VSDs want to hack too! So those programs are small and not terribly well-publicized (and they often have strong "inclusion" focuses as well as specific focus on areas of improvement).
On Thu, May 13, 2021 at 10:03 PM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
*Creating* plausible issues is hard work, I assure you as a university professor. Coming up with "exercises" that are not makework requires expertise in both the domain and in educational psychology. (Some people are "just good at it", of course, but it's quite clear from popular textbooks that most are not.) I think that would be a very unproductive use of developer time, especially since "git clone; git checkout some-tag-in-2017" is pretty much what you're asking for otherwise.
Maybe selecting already-solved issues would theoretically take away the pain of mimicking real-world scenarios. It's great to have insights from someone behind the scenes of exercise creation.

The problem is not a lack of issues to practice on. It's that (1) the
PR process itself is a barrier, or at least an annoyance, and (2) many new contributors need mentoring. (Or think they do. Some just need encouragement, others need help on technique, but both groups are more or less blocked without the mentoring.)
I think setting up is not that hard. VStinner contributed a great piece along these lines: https://cpython-core-tutorial.readthedocs.io/en/latest/ . If someone gets stuck, they can ping the list or something like that. Once you have the project running, what you need is either to contribute or to explore and understand; both are theoretically addressed by the educational repo. You need to find something to do before the interest wanes. As Terry Reedy encourages, getting more and more people to contribute ensures that at least a couple of them pass through the vital processes needed to get going and become regular contributors. This idea aims to make that process easier.

And, of course, real contribution involves a lot of unfun work.
Writing tests, writing documentation, explaining to other developers who start out -1 because they don't get it, overcoming your own mental blocks to changing your submission because *you* don't get it, and on and on. A lot of newcomers think "I'm not good at that, if I have to do it I can't contribute" (and a few selfishly think they can just do the fun parts and achieve fame and fortune), but you know, "if not you, then who? If you don't do it for Python, where are you going to be able to contribute?"
Having past solved issues picked out and documented in increasing levels of difficulty seems to iron out these problems.
To be honest, although I'm not a specialist in organizational behavior and am operating with a small sample, I can say that from the point of view of identifying tasks, finding solutions, and implementing them, Python is the most effective non-hierarchical organization I've ever seen. I can't say I've seen more than one or two hierarchical organizations that are significantly better at implementing solutions and don't burn up their workers in the process -- and the ones I'm thinking of are way smaller than Python. (Yes, I know that there are people who have gotten burned up in Python, too. We can do better on that, but Python does not deliberately sacrifice people to the organization.)
I agree that the Python community is awesome: the different WGs act like great departments, and people do give a lot of time. But being subscribed here for some years has let me see some recurring patterns. Also, while organising FlaskCon, we got some really great insights into the community. The page where usergroups are listed paints a misleading picture: though it lists every usergroup ever initiated, the real situation is way different. We contacted a great many of them. Here and there, there is room for improvement in the machinery.
I have to point out that there's a whole crew over on core-mentorship doing this work, and at least one Very Senior Developer with their own private mentoring program.[1] IMO, that is a big part of why Python is as successful as it is. If more senior developers would take on these tasks, it would have a big effect downstream. But emotional work is hard, and it comes in big chunks. In many situations you have to follow through, on the mentee's schedule, or the mentee will "slip the hook and swim away." So it's a big ask. I'm willing to make that ask in the abstract, but there's not even one senior developer I'm able to point to and say "definitely that person would do more for Python by mentoring than by hacking". It's a very hard problem.
That's why I guess what I am proposing might seem simple, but it's fundamentally putting CPython contribution mentoring on auto-pilot. As I said, I've seen VStinner's initiative, and initiatives like these pay off far more than the docs alone (though they can be included in the docs); having some liberty with tidbits addresses issues on the fly. But not all people have time for that, as juggling work, life, and open source is a hard problem to solve. Personally, I intend to help set up the basics of it, but that requires me to become a regular contributor; in the meantime, I'm sharing some observations and ticking off some todos until I resume tackling issues.

Kind Regards, Abdur-Rahmaan Janhangeer about <https://compileralchemy.github.io/> | blog <https://www.pythonkitchen.com> github <https://github.com/Abdur-RahmaanJ> Mauritius
Abdur-Rahmaan Janhangeer writes:
That's why I guess what I am proposing might seem simple
I'm saying that we already have the simple version, spelled

    git clone; git checkout main~5000

then

    git log -U0 main~5000..main | grep -v '^[-+ ]'

which provides very nice hints for the diligent student, including description, file, and even line numbers.[1] Variations on that theme will provide less detailed hints, or you can cheat just a little by replacing the "| grep" with "; sleep 2; clear". The possibilities are endless! In my "I do that for $DAYJOB" opinion, the curated version you propose would find it very hard to do better than that.

Steve

Footnotes: [1] git is, in my opinion, the 42 of software development. It is the answer to life, the universe, and EVERYTHING. :-)
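To make that concrete, here is a minimal sketch of the same exercise workflow as a Python script rather than shell one-liners. It is a hypothetical illustration, not anyone's actual tooling: it assumes a local CPython clone with git on PATH, and the 5000-commit offset and the "practice" branch name are arbitrary choices.

```
# Hypothetical "exercise picker" for practicing on historical fixes.
# It resets a practice branch to the state just before a real past
# commit and prints that commit's message as the exercise statement.
import subprocess

def make_exercise(repo=".", offset=5000):
    base = f"main~{offset}"
    # Oldest-first list of commits after `base`; the first one is the
    # historical fix the student will try to rediscover.
    target = subprocess.check_output(
        ["git", "-C", repo, "rev-list", "--reverse", f"{base}..main"],
        text=True,
    ).splitlines()[0]
    # Show the commit subject and body (the "hint"), but not the diff.
    hint = subprocess.check_output(
        ["git", "-C", repo, "log", "-1", "--format=%s%n%n%b", target],
        text=True,
    )
    # Check out the tree as it was just before the fix.
    subprocess.check_call(
        ["git", "-C", repo, "checkout", "-B", "practice", f"{target}~1"]
    )
    print(hint)
    return target

if __name__ == "__main__":
    make_exercise()
```

When the student is done, `git show <target>` reveals the model answer, i.e. the diff the core developers actually committed.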
On Wed, May 12, 2021 at 8:51 PM Abdur-Rahmaan Janhangeer < arj.python@gmail.com> wrote:
Great news! Just a tiny bit from me: I read the other day, in the open-source report sponsored by the Ford Foundation, a CPython contributor stating that we have an all-time high count of Python users but an all-time low number of contributors to CPython.
There's also (probably; I didn't count myself) a record number of Python implementations that are not CPython (including Cython, Pythran and similar projects), as well as CPython forks (Pyjion, Pyston, the project announced by Guido, etc.).

Also, judging from https://github.com/python/cpython/graphs/contributors, the "all-time low number of contributors to CPython" assertion doesn't seem to hold. In terms of committers, and according to this page, I count a similar or slightly higher number of committers (50+) over the last year compared with previous years (including years that seem more active in terms of commit counts on the graph, like around 2010).

S.

-- Stefane Fermigier - http://fermigier.com/ - http://twitter.com/sfermigier - http://linkedin.com/in/sfermigier Founder & CEO, Abilian - Enterprise Social Software - http://www.abilian.com/ Chairman, National Council for Free & Open Source Software (CNLL) - http://cnll.fr/ Founder & Organiser, PyParis & PyData Paris - http://pyparis.org/ & http://pydata.fr/
Actual quote by "a Python Software Foundation fellow and contributor to Python infrastructure projects":

    What frustrates me most is that we have an all-time high of Python
    developers and an all-time low on high quality contributions. [...]
    As soon as pivotal developers like Armin Ronacher slow down their
    churn, the whole community feels it immediately. The moment Paul
    Kehrer stops working on PyCA we’re screwed. If Hawkowl stops
    porting, Twisted will never be on Python 3 and git. So we’re
    bleeding due to people who cause more work than they provide. [...]
    Right now everyone is benefitting from what has been built but due
    to lack of funding and contributions it’s deteriorating. I find
    that worrying, because Python might be super popular right now but
    once the consequences hit us, the opportunists will leave as fast
    as they arrived.

Book: Roads and Bridges: The Unseen Labor Behind Our Digital Infrastructure, page 76. Link: https://www.fordfoundation.org/media/2976/roads-and-bridges-the-unseen-labor...
On Thu, May 13, 2021 at 8:42 AM Abdur-Rahmaan Janhangeer < arj.python@gmail.com> wrote:
Actual quote by "a Python Software Foundation fellow and contributor to Python infrastructure projects"
Ah, this is what you were referring to. The document was published 5 years ago, so this may or may not reflect the current situation.
What frustrates me most is that we have an all-time high of Python developers and an all-time low on high quality contributions. [...] As soon as pivotal developers like Armin Ronacher slow down their churn, the whole community feels it immediately.
That's true but, AFAIK, Armin was never a direct contributor to CPython (confirmed by looking at https://github.com/python/cpython/graphs/contributors ), so I guess that's another issue.

But to add a specific comment on this topic: Armin was indeed a very creative and prolific developer of Python libraries and frameworks, and since he's mostly left the Python world, this has indeed been an issue. Some of his projects, like Lektor (which I'm using for my blog), have lost their traction, and some of their users are moving to other tools. But the good news is that some of his projects have been picked up by other contributors under the "Pallets" project, and, by coincidence, they made major new releases of most of Armin's old projects just a couple of days ago: https://www.palletsprojects.com/blog/flask-2-0-released/

S.
-- Stefane Fermigier - http://fermigier.com/ - http://twitter.com/sfermigier - http://linkedin.com/in/sfermigier Founder & CEO, Abilian - Enterprise Social Software - http://www.abilian.com/ Chairman, National Council for Free & Open Source Software (CNLL) - http://cnll.fr/ Founder & Organiser, PyParis & PyData Paris - http://pyparis.org/ & http://pydata.fr/
Greetings,

One crucial missing piece in the Python world is the focus on the internals of projects. You have many talks on usage and scaling, but not enough on internals, and even fewer workshops. For open source to thrive, you need people who master the codebase. It's a long process: you get there by having core contributors over time. How does a contributor become a core one? By getting their feet wet in the codebase and tackling more difficult issues as time passes. That's why, instead of waiting for people to find issues, work on them, and wait for validation, we can improve the training process without damage to the codebase. People get the educational version of the repo and solve the issues at their own pace, up to the level where they feel confident enough to try a meaningful PR. Seeing it with the eye of a knowledgeable person will make them PR not just for the sake of a PR, but because of a real need.

One practical way is also to point the intermediate steps to resources on the internet: this or that C article to get started with C, an article to understand a particular C behaviour, a talk at a conference to understand a given part of the C API. I built a tool specifically to document those intermediate steps by gathering resources on the internet, and will start using it soon: https://linkolearn.com/.

I am part of the Flask Community Workgroup (it's due to be announced soon, but here is the link: https://flaskcwg.github.io/). One of its aims is education, a good deal of it about internals. We aim to roll out some initiatives by next year.

What caused me to write the first post is that there seems to be a bottleneck somewhere when you see contributors overwhelmed by open-source tasks. If it were some obscure project I would understand, but not one of the most popular open-source products of today.

Kind Regards, Abdur-Rahmaan Janhangeer about <https://compileralchemy.github.io/> | blog <https://www.pythonkitchen.com> github <https://github.com/Abdur-RahmaanJ> Mauritius
On Thu, May 13, 2021 at 5:37 PM Abdur-Rahmaan Janhangeer <arj.python@gmail.com> wrote:
Greetings,
One crucial missing piece in the Python world is the focus on the internals of projects. You have many talks on usage and scaling, but not enough on internals, and even fewer workshops. For open source to thrive, you need people who master the codebase. It's a long process: you get there by having core contributors over time. How does a contributor become a core one? By getting their feet wet in the codebase and tackling more difficult issues as time passes. That's why, instead of waiting for people to find issues, work on them, and wait for validation, we can improve the training process without damage to the codebase. People get the educational version of the repo and solve the issues at their own pace, up to the level where they feel confident enough to try a meaningful PR. Seeing it with the eye of a knowledgeable person will make them PR not just for the sake of a PR, but because of a real need.
How is this "educational version" different from a forked git repository? I'm confused here. ChrisA
Greetings, On Thu, May 13, 2021 at 11:43 AM Chris Angelico <rosuav@gmail.com> wrote:
How is this "educational version" different from a forked git repository? I'm confused here.
Oh, I mean a forked git repository with internals-focused documentation, issues opened with descriptions of the changes to be made, and the repo then set to read-only. A way to view what the solved issues look like is included under "internals-focused documentation".

Kind Regards, Abdur-Rahmaan Janhangeer about <https://compileralchemy.github.io/> | blog <https://www.pythonkitchen.com> github <https://github.com/Abdur-RahmaanJ> Mauritius
Have you heard of this book? It's an excellent companion to the source code. https://realpython.com/products/cpython-internals-book/ On Thu, May 13, 2021 at 12:30 AM Abdur-Rahmaan Janhangeer < arj.python@gmail.com> wrote:
Greetings,
One crucial missing piece in the Python world is the focus on the internals of projects. You have many talks on usage and scaling, but not enough on internals, and even fewer workshops. For open source to thrive, you need people who master the codebase. It's a long process: you get there by having core contributors over time. How does a contributor become a core one? By getting their feet wet in the codebase and tackling more difficult issues as time passes. That's why, instead of waiting for people to find issues, work on them, and wait for validation, we can improve the training process without damage to the codebase. People get the educational version of the repo and solve the issues at their own pace, up to the level where they feel confident enough to try a meaningful PR. Seeing it with the eye of a knowledgeable person will make them PR not just for the sake of a PR, but because of a real need.

One practical way is also to point the intermediate steps to resources on the internet: this or that C article to get started with C, an article to understand a particular C behaviour, a talk at a conference to understand a given part of the C API. I built a tool specifically to document those intermediate steps by gathering resources on the internet, and will start using it soon: https://linkolearn.com/.

I am part of the Flask Community Workgroup (it's due to be announced soon, but here is the link: https://flaskcwg.github.io/). One of its aims is education, a good deal of it about internals. We aim to roll out some initiatives by next year.

What caused me to write the first post is that there seems to be a bottleneck somewhere when you see contributors overwhelmed by open-source tasks. If it were some obscure project I would understand, but not one of the most popular open-source products of today.
Kind Regards,
Abdur-Rahmaan Janhangeer about <https://compileralchemy.github.io/> | blog <https://www.pythonkitchen.com> github <https://github.com/Abdur-RahmaanJ> Mauritius
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
A really awesome book! I was proposing an in-house training, though. The community is awesome; just some more tweaks are needed, as you always see the lost beginner wanting mentorship, the contributors contributing, and the core devs having no time to cater to a whole community of mentorship seekers.

Kind Regards, Abdur-Rahmaan Janhangeer about <https://compileralchemy.github.io/> | blog <https://www.pythonkitchen.com> github <https://github.com/Abdur-RahmaanJ> Mauritius
Recently I found Cinder on GitHub, created by Instagram <https://github.com/facebookincubator/cinder>. It looks like they have the same interest as you (speeding up CPython), and it might be useful to team up with them.

```
We've made Cinder publicly available in order to facilitate conversation about potentially upstreaming some of this work to CPython and to reduce duplication of effort among people working on CPython performance.
```

On Mon, May 17, 2021 at 12:52, Abdur-Rahmaan Janhangeer <arj.python@gmail.com> wrote:
A really awesome book! I was proposing an in-house training, though. The community is awesome; just some more tweaks are needed, as you always see the lost beginner wanting mentorship, the contributors contributing, and the core devs having no time to cater to a whole community of mentorship seekers.
Kind Regards,
Abdur-Rahmaan Janhangeer about <https://compileralchemy.github.io/> | blog <https://www.pythonkitchen.com> github <https://github.com/Abdur-RahmaanJ> Mauritius
-- Att, Diego da Silva Péres.
On Fri, 14 May 2021, 1:47 am Stéfane Fermigier, <sf@fermigier.com> wrote:
On Thu, May 13, 2021 at 8:42 AM Abdur-Rahmaan Janhangeer < arj.python@gmail.com> wrote:
Actual quote by "a Python Software Foundation fellow and contributor to Python infrastructure projects"
Ah, this is what you were referring to. The document was published 5 years ago, so this may or may not reflect the current situation.
What frustrates me most is that we have an all-time high of Python developers and an all-time low on high quality contributions. [...] As soon as pivotal developers like Armin Ronacher slow down their churn, the whole community feels it immediately.
That's true, but, AFAIK, Armin was never a direct contributor to CPython (confirmed by looking at https://github.com/python/cpython/graphs/contributors ) so I guess that's another issue.
The problems mentioned in the Ford Foundation report definitely aren't solved, but one of the contributing factors that the PSF identified at the time was that when core projects like CPython and pypi.org are underfunded, that poor precedent severely hurts fundraising and other sustainability efforts in the wider Python ecosystem. Hence efforts like the "developer in residence" role that the PSF is currently recruiting for, as well as the increased focus on (and investment in) ecosystem sustainability from commercial users and redistributors. Cheers, Nick.
participants (33)

- Abdur-Rahmaan Janhangeer
- Antoine Pitrou
- Antoine Pitrou
- Barry Warsaw
- Chris Angelico
- Christian Heimes
- Dan Stromberg
- Diego Peres
- edwin@211mainstreet.net
- Greg Ewing
- Gregory P. Smith
- Guido van Rossum
- Kevin Modzelewski
- Larry Hastings
- Marco Sulla
- Mark Shannon
- Mats Wichmann
- Matthias Klose
- Matti Picus
- Nathaniel Smith
- Nick Coghlan
- Paul Moore
- Petr Viktorin
- Ronald Oussoren
- Simon Cross
- Stefan Ring
- Stephen J. Turnbull
- Steve Dower
- Steve Holden
- Steven D'Aprano
- Stéfane Fermigier
- Terry Reedy
- Thomas Wouters