Hello! Sorry for the delay; PyCon is keeping me busy. On the other hand, I did get to talk to a lot of smart people here!
I'm leaning toward accepting PEP 590 (with some changes still). Let's start focusing on it. As for the changes, I have these 4 points:
I feel that the API needs some contact with real users before it's set in stone. That was the motivation behind my proposal for PEP 590 with additional flags. At PyCon, Nick Coghlan suggested another option: make the API "provisional" by making it formally private. Py_TPFLAGS_HAVE_VECTORCALL would be underscore-prefixed, and the docs would say that it can change. In Python 3.9, the semantics would be finalized and the underscore removed. This would allow high-maintenance projects (like Cython) to start using it and give their feedback, and we'd have a chance to respond to the feedback.
tp_vectorcall_offset should be what's replacing tp_print in the struct. The current implementation has tp_vectorcall there. This way, Cython can create vectorcall callables for older Pythons. (See PEP 580: https://www.python.org/dev/peps/pep-0580/#replacing-tp-print).
Subclassing should not be forbidden. Jeroen, do you want to write a section on how subclassing should work?
Given Jeroen's research and ideas that went into the PEP (and hopefully, we'll incorporate some PEP 580 text as well), it seems fair to list him as co-author of the accepted PEP, instead of just listing PEP 580 in the acknowledgement section.
On some other points:
- Single bound method class for all kinds of function classes: This would be a cleaner design, yes, but I don't see a pressing need. As PEP 579 says, "this is a compounding issue", not a goal. As I recall, that is the only major reason for CCALL_DEFARG. PEP 590 says that x64 Windows passes 4 arguments in registers. Admittedly, I haven't checked this, nor the performance implications (so this would be a good point to argue!), but it seems like a good reason to keep the argument count down. So, no CCALL_DEFARG.
- In reply to this note from Mark:
PEP 590 is fully universal, it supports callables that can do anything with anything. There is no need for it to be extended because it already supports any possible behaviour.
I don't buy this point. The current tp_call also supports any possible behavior. Here we want to support any behavior *efficiently*. As a specific example: for calling a PEP 590 callable with a kwarg dict, there'll need to be an extra allocation. That's inefficient relative to PEP 580 (or PEP 590 plus allowing a dict in "kwnames"). But I'm willing to believe the inefficiency is acceptable.
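To make the kwarg dict case concrete, here is a rough Python model of the conversion. It is only a sketch: the real work happens in C, and the names call_with_kwarg_dict and example_callee are made up for illustration. It assumes the PEP 590 convention of a flat argument array, a positional count, and a tuple of keyword names.

```python
def call_with_kwarg_dict(vectorcall, args, kwargs):
    """Adapt a (tuple, dict) call to a vectorcall-style convention.

    Hypothetical helper, not part of any real API.
    """
    # Extra allocation 1: a new flat sequence holding the positional
    # values followed by the keyword values.
    flat = [*args, *kwargs.values()]
    # Extra allocation 2: a tuple of keyword names, in matching order.
    kwnames = tuple(kwargs)
    return vectorcall(flat, len(args), kwnames)


def example_callee(flat, nargs, kwnames):
    """Hypothetical callee that unpacks vectorcall-style arguments."""
    positional = tuple(flat[:nargs])
    keywords = dict(zip(kwnames, flat[nargs:]))
    return positional, keywords


result = call_with_kwarg_dict(example_callee, (1, 2), {"x": 3, "y": 4})
print(result)
```

If a dict were allowed in place of "kwnames", the caller could hand the existing dict through and skip both new objects, which is the trade-off I mean above.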