On Fri, Apr 17, 2020 at 3:57 PM Eric Snow <ericsnowcurrently@gmail.com> wrote:
> On Fri, Apr 17, 2020 at 2:59 PM Nathaniel Smith <njs@pobox.com> wrote:
>> I think some perspective might be useful here :-).
>>
>> The last time we merged a new concurrency model in the stdlib, it was asyncio.
>> [snip]
>> OTOH, AFAICT the new concurrency model in PEP 554 has never actually been used, and it isn't even clear whether it's useful at all.
> Perhaps I didn't word things quite right. PEP 554 doesn't provide a new concurrency model so much as it provides functionality that could probably be used as the foundation for one.
That makes it worse, right? If I wrote a PEP saying "here's some features that could possibly someday be used to make a new concurrency model", that wouldn't make it past the first review.
> Ultimately the module proposed in the PEP does the following:
>
> * exposes the existing subinterpreters functionality almost as-is
So I think this is a place where we see things really differently. I guess your perspective is: subinterpreters are already a CPython feature, so we're not adding anything, and we don't really need to talk about whether CPython should support subinterpreters. But this simply isn't true. Yes, there are some APIs for subinterpreters that were added back in the 1.x days, but they were never really thought through, and have never actually worked. There are exactly 3 users, all of them have serious issues, and each has a strategy for avoiding subinterpreters because of the brokenness. In practice, the existing ecosystem of C extensions has never supported subinterpreters.

This is clearly not a great state of affairs -- we should either support them or not support them. Shipping a broken feature doesn't help anyone. But the current status isn't terribly harmful, because the general consensus across the ecosystem is that they don't work and aren't used.

If we start exposing them in the stdlib and encouraging people to use them, though, that's a *huge* change. Our users trust us. If we tell them that subinterpreters are a real thing now, then they'll spend lots of effort trying to support them. Since subinterpreters are confusing, and break the C API/ABI, this means that every C extension author will have to spend a substantial amount of time figuring out what subinterpreters are, how they work, squinting at PEP 489, asking questions, auditing their code, etc. This will take years, and in the meantime users will expect subinterpreters to work, be confused when they break, yell at random third-party maintainers, spend days trying to track down mysterious problems that turn out to be caused by subinterpreters, etc. There will be many, many blog posts trying to explain subinterpreters and understand when they're useful (if ever), arguments about whether to support them, Twitter threads, production experiments.
If you consider that we have thousands of existing C extensions and millions of users, accepting PEP 554 means forcing people you don't know to collectively spend many person-years on subinterpreters.

Random story time: NumPy deprecated some C APIs some years ago, a little bit before I got involved. Unfortunately, it wasn't fully thought through; the new APIs were a bit nicer-looking, but didn't enable any new features, didn't provide any path to getting rid of the old APIs, and in fact it turned out that there were some critical use cases that still required the old API. So in practice, the deprecation was never going anywhere: the old APIs work just as well and are never going to get removed, so spending time migrating to the new APIs was, unfortunately, a completely pointless waste of time that provided zero value to anyone.

Nonetheless, our users trusted us, so lots and lots of projects spent substantial effort on migrating to the new API: figuring out how it worked, making PRs, reviewing them, writing shims to work across both the old and new APIs, having big discussions about how to make the new API work with Cython, debating what to do about the cases where the new APIs were inadequate, etc. None of this served any purpose: they just did it because they trusted us, and we misled them. It's pretty shameful, honestly. Everyone meant well, but in retrospect it was a terrible betrayal of our users' trust.

Now, that only affected projects that were using the NumPy C API, and even then only developers who were diligent and trying to follow the latest updates; there were no runtime warnings, nothing visible to end users, etc. Your proposal has something like 100x-1000x more impact, because you want all C extensions in Python to get updated or at least audited, and projects that aren't updated will produce mysterious crashes, incorrect output, or loud error messages that cause users to come after the developers and demand fixes. Now maybe that's worth it.
I think on net the Py3 transition was worth it, and that was even more difficult. But Py3 had an incredible amount of scrutiny and rationale. Here you're talking about breaking the C API, and your rationales so far are, I'm sorry, completely half-assed. You've never even tried to address the most difficult objections, the rationales you have written down are completely hand-wave-y, and AFAICT none of them stand up to any serious scrutiny.

(For one random example: have you even measured how much subinterpreters might improve startup time on Windows versus subprocesses? I did, and AFAICT in any realistic scenario it's completely irrelevant: the majority of startup cost is importing modules, not spawning a subprocess, and in any case where subinterpreters make sense to use, startup costs are only a tiny fraction of total runtime. Maybe I'm testing the wrong scenario, and you can come up with a better one. But how are you at the point of asking for PEP acceptance without any test results at all?!)

Yes, subinterpreters are a neat idea, and a beautiful dream. But on its own, that's not enough to justify burning up many person-years of our users' lives. You can do better than this, and you need to.
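As a rough illustration of the kind of measurement being asked for (this is a sketch, not the benchmark referenced above; the particular stdlib imports are an arbitrary stand-in for a real workload):

```python
import subprocess
import sys
import time

def timed(args):
    """Run a child Python with the given arguments and return wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run([sys.executable, *args], check=True)
    return time.perf_counter() - start

# Bare interpreter startup: process spawn plus core runtime init.
bare = timed(["-c", "pass"])

# Startup plus some typical stdlib imports. The extra cost is pure import
# time, which a subinterpreter would pay just the same as a subprocess does.
with_imports = timed(["-c", "import json, logging, argparse, email, http.client"])

print(f"spawn + init:           {bare:.3f}s")
print(f"spawn + init + imports: {with_imports:.3f}s")
```

If import time dominates the difference, then shaving off the spawn cost (which is roughly what subinterpreters would save) changes little; either way, actual numbers beat hand-waving.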
> * provides a minimal way to pass information between subinterpreters (which you don't need in C but do in Python code)
> * adds a few minor conveniences like propagating exceptions and making it easier to share buffers safely
These are a new API, and the current draft does seem like, well, a draft. Probably there's not much point in talking about it until the points above are resolved. But even if CPython should support subinterpreters, it would still be better to evolve the API outside the stdlib until it's more mature. Or at least have some users! Every API sucks in its first draft, that's just how API design works.
> Are you concerned about users reporting bugs that surface when an incompatible extension is used in a subinterpreter? That shouldn't be a problem if we raise ImportError if an extension that does not support PEP 489 is imported in a subinterpreter.
Making subinterpreter support opt-in would definitely be better than making it opt-out. When C extensions break with subinterpreters, it's often in super-obscure ways where it's not at all clear that subinterpreters are involved. But notice that this means that no-one can use subinterpreters at all, until all of their C extensions have had significant reworks to use the new API, which will take years and tons of work -- it's similar to the Python 3 transition. Many libraries will never make the jump. And why should anyone bother to wait?
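For what it's worth, the opt-in behavior Eric describes could look something like the following pure-Python sketch. All names here are invented for illustration; real enforcement would live inside the extension-import machinery, keyed on whether a module uses PEP 489 multi-phase init rather than on a hand-maintained allowlist:

```python
import importlib

# Hypothetical registry of extensions known to support subinterpreters.
# In a real implementation this would be derived from PEP 489 metadata.
SUBINTERPRETER_SAFE = {"math", "array"}

def guarded_import(name, *, in_subinterpreter):
    """Import `name`, refusing unsupported extensions inside a subinterpreter."""
    if in_subinterpreter and name not in SUBINTERPRETER_SAFE:
        raise ImportError(
            f"{name!r} does not declare subinterpreter support (PEP 489); "
            "refusing to import it in a subinterpreter"
        )
    return importlib.import_module(name)

mod = guarded_import("math", in_subinterpreter=True)
print(mod.sqrt(2))
```

The loud, early ImportError is clearly better than a silent crash later -- but note it also means a subinterpreter is only usable once every extension in your dependency tree has opted in.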
> Saying it's "obviously" the "only" reason is a bit much. :) PEP 554 exposes existing functionality that hasn't been all that popular (until recently for some reason <wink>) mostly because it is old, was never publicized (until recently), and involved using the C-API. As soon as folks learn about it they want it, for various reasons including (relative) isolation and reduced resource usage in large-scale deployment scenarios. It becomes even more attractive if you say subinterpreters allow you to work around the GIL in a single process, but that isn't the only reason.
I'm worried that you might be too close to this, and convincing yourself that there's some pent-up demand that doesn't actually exist. Subinterpreters have always been documented in the C API docs, and folks have had decades to try them out and/or improve support if that was useful. CPython has seen *huge* changes in that time, with massive investments on many fronts. But no serious work happened on subinterpreters until you started advocating for the GIL-splitting idea.

But anyway, you say here that they're useful for "(relative) isolation and reduced resource usage". That's great: I'm asking for rationale, and there's some rationale! Can you expand that into something that's detailed enough to actually evaluate? We already have robust support for threads for low isolation and subprocesses for high isolation. Can you name some use cases where neither of these is appropriate, and you instead want an in-between isolation -- like subprocesses, but more fragile and with odd edge cases where state leaks between them?

Why do you believe that subinterpreters will have reduced resource usage? I assume you're comparing them to subprocesses here. Subinterpreters are "shared-nothing": all code, data, etc. has to be duplicated, except for static C code ... which is exactly the same as how subprocesses work. So I don't see any theoretical reason why they should have reduced resource usage. And theory aside, have you measured the resource usage? Can you share your measurements?
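To make the two existing points on that isolation spectrum concrete, here's a tiny runnable contrast (a sketch: the thread mutates shared state in place, while the subprocess has its own address space and can only hand a result back explicitly):

```python
import subprocess
import sys
import threading

counter = {"n": 0}

def bump():
    counter["n"] += 1

# Low isolation (threads): the worker mutates the very same dict the
# parent holds, with no copying and no serialization.
t = threading.Thread(target=bump)
t.start()
t.join()
print(counter["n"])  # 1 -- the thread's write is directly visible

# High isolation (subprocess): the child cannot touch this process's
# objects at all; any result must cross the boundary explicitly
# (here, serialized over stdout).
out = subprocess.run(
    [sys.executable, "-c", "print(1 + 1)"],
    capture_output=True, text=True, check=True,
).stdout.strip()
print(out)  # "2" -- the child's result, passed back explicitly
```

The open question in the email is what concrete workload needs a third point between these two, given that subinterpreters still copy code and data like subprocesses do.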
>> Or if PEP 554 is really a good idea on its own merits, purely as a new concurrency API, then why not build that concurrency API on top of multiprocessing and put it on PyPI and let real users try it out?
> As I said, the aim of PEP 554 isn't to provide a full concurrency model, though it could facilitate something like CSP. FWIW, there are CSP libraries on PyPI already, but they are limited due to their reliance on threads or multiprocessing.
What are these limitations? Can you name some?
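For concreteness, the core abstraction such CSP libraries provide can be sketched in a few lines on top of threads and `queue.Queue` (the `Channel` class here is invented for illustration, not any particular library's API):

```python
import queue
import threading

class Channel:
    """A bounded, blocking CSP-style channel: send/recv through a queue."""

    def __init__(self, capacity=1):
        self._q = queue.Queue(maxsize=capacity)

    def send(self, value):
        self._q.put(value)  # blocks while the channel is full

    def recv(self):
        return self._q.get()  # blocks while the channel is empty

def producer(ch):
    for i in range(3):
        ch.send(i * i)
    ch.send(None)  # sentinel: no more values

ch = Channel()
t = threading.Thread(target=producer, args=(ch,))
t.start()

received = []
while (item := ch.recv()) is not None:
    received.append(item)
t.join()
print(received)  # [0, 1, 4]
```

Naming the specific way a thread- or multiprocessing-based version of this falls short is exactly what the question above is asking for.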
>> etc. is stupendously complex,
> The project involves lots of little pieces, each supremely tractable. So if by "stupendously complex" you mean "stupendously tedious/boring" then I agree. :) It isn't something that requires a big brain so much as a willingness to stick with it.
I think you're being over-optimistic here :-/. The two of us have had a number of conversations about this project over the last few years. And as I remember it, I've repeatedly pointed out that there were several fundamental unanswered questions, any one of which could easily sink the whole project, and also a giant pile of boring straightforward work, and I encouraged you to start with the high-risk parts to prove out the idea before investing all that time in the tedious parts. And you've explicitly told me that no, you wanted to work on the easy parts first, and defer the big questions until later.

So, well... you're asking for a PEP to be accepted. I think that means it's "later". And I feel like a bit of a jerk raising these difficult questions after all the work you and others have poured into this, but... that's kind of what you explicitly decided to set yourself up for? I'm not sure what you were expecting.

tl;dr: accepting PEP 554 is effectively a C API break, and will force many thousands of people worldwide to spend many hours wrangling with subinterpreter support. I've spent a ton of time thinking about it, talking to folks about it, etc., over the last few years, and I still just can't see any rationale that stands up to scrutiny. So I think accepting PEP 554 now would be a betrayal of our users' trust, harm our reputation, and lead to a situation where a few years down the road we all look back and think "why did we waste so much energy on that?"

-n

--
Nathaniel J. Smith -- https://vorpus.org