On Fri, Apr 17, 2020 at 3:57 PM Eric Snow <ericsnowcurrently@gmail.com> wrote:
> On Fri, Apr 17, 2020 at 2:59 PM Nathaniel Smith <njs@pobox.com> wrote:
>> I think some perspective might be useful here :-).
>>
>> The last time we merged a new concurrency model in the stdlib, it was asyncio.
>> [snip]
>> OTOH, AFAICT the new concurrency model in PEP 554 has never actually been used, and it isn't even clear whether it's useful at all.
> Perhaps I didn't word things quite right. PEP 554 doesn't provide a new concurrency model so much as it provides functionality that could probably be used as the foundation for one.
That makes it worse, right? If I wrote a PEP saying "here's some features that could possibly someday be used to make a new concurrency model", that wouldn't make it past the first review.
> Ultimately the module proposed in the PEP does the following:
>
> * exposes the existing subinterpreters functionality almost as-is
So I think this is a place where we see things really differently. I guess your perspective is: subinterpreters are already a CPython feature, so we're not adding anything, and we don't really need to talk about whether CPython should support subinterpreters. But this simply isn't true. Yes, there are some APIs for subinterpreters that were added back in the 1.x days, but they were never really thought through, and have never actually worked. There are exactly 3 users, all of them have serious issues, and each has a strategy for avoiding subinterpreters because of the brokenness. In practice, the existing ecosystem of C extensions has never supported subinterpreters.

This is clearly not a great state of affairs -- we should either support them or not support them. Shipping a broken feature doesn't help anyone. But the current status isn't terribly harmful, because the general consensus across the ecosystem is that they don't work and aren't used.

If we start exposing them in the stdlib and encouraging people to use them, though, that's a *huge* change. Our users trust us. If we tell them that subinterpreters are a real thing now, then they'll spend lots of effort trying to support them. Since subinterpreters are confusing, and break the C API/ABI, this means that every C extension author will have to spend a substantial amount of time figuring out what subinterpreters are, how they work, squinting at PEP 489, asking questions, auditing their code, etc. This will take years, and in the meantime users will expect subinterpreters to work, be confused when they break, yell at random third-party maintainers, spend days trying to track down mysterious problems that turn out to be caused by subinterpreters, etc. There will be many, many blog posts trying to explain subinterpreters and understand when they're useful (if ever), arguments about whether to support them, Twitter threads, production experiments.
If you consider that we have thousands of existing C extensions and millions of users, accepting PEP 554 means forcing people you don't know to collectively spend many person-years on subinterpreters.

Random story time: NumPy deprecated some C APIs some years ago, a little bit before I got involved. Unfortunately, it wasn't fully thought through; the new APIs were a bit nicer-looking, but didn't enable any new features, didn't provide any path to getting rid of the old APIs, and in fact it turned out that there were some critical use cases that still required the old API. So in practice, the deprecation was never going anywhere: the old APIs work just as well and are never going to get removed, so spending time migrating to the new APIs was, unfortunately, a completely pointless waste of time that provided zero value to anyone.

Nonetheless, our users trusted us, so lots and lots of projects spent substantial effort on migrating to the new API: figuring out how it worked, making PRs, reviewing them, writing shims to work across both the old and new APIs, having big discussions about how to make the new API work with Cython, debating what to do about the cases where the new APIs were inadequate, etc. None of this served any purpose: they just did it because they trusted us, and we misled them. It's pretty shameful, honestly. Everyone meant well, but in retrospect it was a terrible betrayal of our users' trust.

Now, that only affected projects that were using the NumPy C API, and even then only developers who were diligent and trying to follow the latest updates; there were no runtime warnings, nothing visible to end users, etc. Your proposal has something like 100x-1000x more impact, because you want all C extensions in Python to get updated or at least audited, and projects that aren't updated will produce mysterious crashes, incorrect output, or loud error messages that cause users to come after the developers and demand fixes. Now maybe that's worth it.
I think on net the Py3 transition was worth it, and that was even more difficult. But Py3 had an incredible amount of scrutiny and rationale. Here you're talking about breaking the C API, and your rationales so far are, I'm sorry, completely half-assed. You've never even tried to address the most difficult objections, the rationales you have written down are completely hand-wave-y, and AFAICT none of them stand up to any serious scrutiny.

(For one random example: have you even measured how much subinterpreters might improve startup time on Windows versus subprocesses? I did, and AFAICT in any realistic scenario it's completely irrelevant: the majority of startup cost is importing modules, not spawning a subprocess, and in any case where subinterpreters make sense to use, startup costs are only a tiny fraction of total runtime. Maybe I'm testing the wrong scenario, and you can come up with a better one. But how are you at the point of asking for PEP acceptance without any test results at all?!)

Yes, subinterpreters are a neat idea, and a beautiful dream. But on its own, that's not enough to justify burning up many person-years of our users' lives. You can do better than this, and you need to.
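As a rough illustration of the kind of measurement being asked for (this is a sketch, not the benchmark referenced above; the particular stdlib imports are an arbitrary stand-in for a real workload):

```python
import subprocess
import sys
import time

def timed(args):
    """Run a child Python with the given arguments and return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run([sys.executable, *args], check=True)
    return time.perf_counter() - start

# Bare interpreter startup: process spawn plus core runtime init.
bare = timed(["-c", "pass"])

# Startup plus some typical stdlib imports. The extra cost is pure import
# time, which a subinterpreter would pay just the same as a subprocess does.
with_imports = timed(["-c", "import json, logging, argparse, email, http.client"])

print(f"spawn + init:           {bare:.3f}s")
print(f"spawn + init + imports: {with_imports:.3f}s")
```

If import time dominates the difference, then shaving off the spawn cost (which is roughly what subinterpreters would save) changes little; either way, actual numbers beat hand-waving.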
> * provides a minimal way to pass information between subinterpreters (which you don't need in C but do in Python code)
> * adds a few minor conveniences like propagating exceptions and making it easier to share buffers safely
These are a new API, and the current draft does seem like, well, a draft. Probably there's not much point in talking about it until the points above are resolved. But even if CPython should support subinterpreters, it would still be better to evolve the API outside the stdlib until it's more mature. Or at least have some users! Every API sucks in its first draft, that's just how API design works.
> Are you concerned about users reporting bugs that surface when an incompatible extension is used in a subinterpreter? That shouldn't be a problem if we raise ImportError if an extension that does not support PEP 489 is imported in a subinterpreter.
Making subinterpreter support opt-in would definitely be better than making it opt-out. When C extensions break with subinterpreters, it's often in super-obscure ways where it's not at all clear that subinterpreters are involved. But notice that this means that no-one can use subinterpreters at all, until all of their C extensions have had significant reworks to use the new API, which will take years and tons of work -- it's similar to the Python 3 transition. Many libraries will never make the jump. And why should anyone bother to wait?
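For what it's worth, the opt-in behavior Eric describes could look something like the following pure-Python sketch. All names here are invented for illustration; real enforcement would live inside the extension-import machinery, keyed on whether a module uses PEP 489 multi-phase init rather than on a hand-maintained allowlist:

```python
import importlib

# Hypothetical registry of extensions known to support subinterpreters.
# In a real implementation this would be derived from PEP 489 metadata.
SUBINTERPRETER_SAFE = {"math", "array"}

def guarded_import(name, *, in_subinterpreter):
    """Import `name`, refusing unsupported extensions inside a subinterpreter."""
    if in_subinterpreter and name not in SUBINTERPRETER_SAFE:
        raise ImportError(
            f"{name!r} does not declare subinterpreter support (PEP 489); "
            "refusing to import it in a subinterpreter"
        )
    return importlib.import_module(name)

mod = guarded_import("math", in_subinterpreter=True)
print(mod.sqrt(2))
```

The loud, early ImportError is clearly better than a silent crash later -- but note it also means a subinterpreter is only usable once every extension in your dependency tree has opted in.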
> Saying it's "obviously" the "only" reason is a bit much. :) PEP 554 exposes existing functionality that hasn't been all that popular (until recently for some reason <wink>) mostly because it is old, was never publicized (until recently), and involved using the C-API. As soon as folks learn about it they want it, for various reasons including (relative) isolation and reduced resource usage in large-scale deployment scenarios. It becomes even more attractive if you say subinterpreters allow you to work around the GIL in a single process, but that isn't the only reason.
I'm worried that you might be too close to this, and convincing yourself that there's some pent-up demand that doesn't actually exist. Subinterpreters have always been documented in the C API docs, and folks have had decades to try them out and/or improve support if that was useful. CPython has seen *huge* changes in that time, with massive investments on many fronts. But no serious work happened on subinterpreters until you started advocating for the GIL-splitting idea.

But anyway, you say here that they're useful for "(relative) isolation and reduced resource usage". That's great: I'm asking for rationale, and there's some rationale! Can you expand that into something that's detailed enough to actually evaluate? We already have robust support for threads for low isolation and subprocesses for high isolation. Can you name some use cases where neither of these is appropriate, and you instead want an in-between isolation -- like subprocesses, but more fragile and with odd edge cases where state leaks between them?

Why do you believe that subinterpreters will have reduced resource usage? I assume you're comparing them to subprocesses here. Subinterpreters are "shared-nothing": all code, data, etc. has to be duplicated, except for static C code ... which is exactly the same as how subprocesses work. So I don't see any theoretical reason why they should have reduced resource usage. And theory aside, have you measured the resource usage? Can you share your measurements?
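To make the two existing points on that isolation spectrum concrete, here's a tiny runnable contrast (a sketch: the thread mutates shared state in place, while the subprocess has its own address space and can only hand a result back explicitly):

```python
import subprocess
import sys
import threading

counter = {"n": 0}

def bump():
    counter["n"] += 1

# Low isolation (threads): the worker mutates the very same dict the
# parent holds, with no copying and no serialization.
t = threading.Thread(target=bump)
t.start()
t.join()
print(counter["n"])  # 1 -- the thread's write is directly visible

# High isolation (subprocess): the child cannot touch this process's
# objects at all; any result must cross the boundary explicitly
# (here, serialized over stdout).
out = subprocess.run(
    [sys.executable, "-c", "print(1 + 1)"],
    capture_output=True, text=True, check=True,
).stdout.strip()
print(out)  # "2" -- the child's result, passed back explicitly
```

The open question in the email is what concrete workload needs a third point between these two, given that subinterpreters still copy code and data like subprocesses do.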
>> Or if PEP 554 is really a good idea on its own merits, purely as a new concurrency API, then why not build that concurrency API on top of multiprocessing and put it on PyPI and let real users try it out?
> As I said, the aim of PEP 554 isn't to provide a full concurrency model, though it could facilitate something like CSP. FWIW, there are CSP libraries on PyPI already, but they are limited due to their reliance on threads or multiprocessing.
What are these limitations? Can you name some?
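For concreteness, the core abstraction such CSP libraries provide can be sketched in a few lines on top of threads and `queue.Queue` (the `Channel` class here is invented for illustration, not any particular library's API):

```python
import queue
import threading

class Channel:
    """A bounded, blocking CSP-style channel: send/recv through a queue."""

    def __init__(self, capacity=1):
        self._q = queue.Queue(maxsize=capacity)

    def send(self, value):
        self._q.put(value)  # blocks while the channel is full

    def recv(self):
        return self._q.get()  # blocks while the channel is empty

def producer(ch):
    for i in range(3):
        ch.send(i * i)
    ch.send(None)  # sentinel: no more values

ch = Channel()
t = threading.Thread(target=producer, args=(ch,))
t.start()

received = []
while (item := ch.recv()) is not None:
    received.append(item)
t.join()
print(received)  # [0, 1, 4]
```

Naming the specific way a thread- or multiprocessing-based version of this falls short is exactly what the question above is asking for.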
>> etc. is stupendously complex,
> The project involves lots of little pieces, each supremely tractable. So if by "stupendously complex" you mean "stupendously tedious/boring" then I agree. :) It isn't something that requires a big brain so much as a willingness to stick with it.
I think you're being over-optimistic here :-/. The two of us have had a number of conversations about this project over the last few years. And as I remember it, I've repeatedly pointed out that there were several fundamental unanswered questions, any one of which could easily sink the whole project, and also a giant pile of boring straightforward work, and I encouraged you to start with the high-risk parts to prove out the idea before investing all that time in the tedious parts. And you've explicitly told me that no, you wanted to work on the easy parts first, and defer the big questions until later.

So, well... you're asking for a PEP to be accepted. I think that means it's "later". And I feel like a bit of a jerk raising these difficult questions after all the work you and others have poured into this, but... that's kind of what you explicitly decided to set yourself up for? I'm not sure what you were expecting.

tl;dr: accepting PEP 554 is effectively a C API break, and will force many thousands of people worldwide to spend many hours wrangling with subinterpreter support. I've spent a ton of time thinking about it, talking to folks about it, etc., over the last few years, and I still just can't see any rationale that stands up to scrutiny. So I think accepting PEP 554 now would be a betrayal of our users' trust, harm our reputation, and lead to a situation where a few years down the road we all look back and think "why did we waste so much energy on that?"

-n

--
Nathaniel J. Smith -- https://vorpus.org