[Python-Dev] PEP 590 discussion

Sun Apr 14 07:30:48 EDT 2019

Hi, Petr

On 10/04/2019 5:25 pm, Petr Viktorin wrote:
> Hello!
> I've had time for a more thorough reading of PEP 590 and the reference 
> implementation. Thank you for the work!
> Overall, I like PEP 590's direction. I'd now describe the fundamental 
> difference between PEP 580 and PEP 590 as:
> - PEP 580 tries to optimize all existing calling conventions
> - PEP 590 tries to optimize (and expose) the most general calling 
> convention (i.e. fastcall)
> 
> PEP 580 also does a number of other things, as listed in PEP 579. But I 
> think PEP 590 does not block future PEPs for the other items.
> On the other hand, PEP 580 has a much more mature implementation -- and 
> that's where it picked up real-world complexity.
> 
> PEP 590's METH_VECTORCALL is designed to handle all existing use cases, 
> rather than mirroring the existing METH_* varieties.
> But both PEPs require the callable's code to be modified, so requiring 
> it to switch calling conventions shouldn't be a problem.
> 
> Jeroen's analysis from 
> https://mail.python.org/pipermail/python-dev/2018-July/154238.html seems 
> to miss a step at the top:
> 
> a. CALL_FUNCTION* / CALL_METHOD opcode
>        calls
> b. _PyObject_FastCallKeywords()
>        which calls
> c. _PyCFunction_FastCallKeywords()
>        which calls
> d. _PyMethodDef_RawFastCallKeywords()
>        which calls
> e. the actual C function (*ml_meth)()
> 
> I think it's more useful to say that both PEPs bridge a->e (via 
> _Py_VectorCall or PyCCall_Call).
> 
> 
> PEP 590 is built on a simple idea, formalizing fastcall. But it is 
> complicated by PY_VECTORCALL_ARGUMENTS_OFFSET and 
> Py_TPFLAGS_METHOD_DESCRIPTOR.
> As far as I understand, both are there to avoid intermediate 
> bound-method object for LOAD_METHOD/CALL_METHOD. (They do try to be 
> general, but I don't see any other use case.)
> Is that right?

Not quite.
Py_TPFLAGS_METHOD_DESCRIPTOR is for LOAD_METHOD/CALL_METHOD, it allows 
any callable descriptor to benefit from the LOAD_METHOD/CALL_METHOD 
optimisation.

PY_VECTORCALL_ARGUMENTS_OFFSET exists so that callables that make onward 
calls with an additional argument can do so efficiently. The obvious 
example is bound-methods, but classes are at least as important.
cls(*args) -> cls.new(cls, *args) -> cls.__init__(self, *args)

> (I'm running out of time today, but I'll write more on why I'm asking, 
> and on the case I called "impossible" (while avoiding creation of a 
> "bound method" object), later.)
> 
> 
> The way `const` is handled in the function signatures strikes me as too 
> fragile for public API.
> I'd like if, as much as possible, PY_VECTORCALL_ARGUMENTS_OFFSET was 
> treated as a special optimization that extension authors can either opt 
> in to, or blissfully ignore.
> That might mean:
> - vectorcall, PyObject_VectorCallWithCallable, PyObject_VectorCall, 
> PyCall_MakeTpCall all formally take "PyObject *const *args"
> - a naïve callee must do "nargs &= ~PY_VECTORCALL_ARGUMENTS_OFFSET" 
> (maybe spelled as "nargs &= PY_VECTORCALL_NARGS_MASK"), but otherwise 
> writes compiler-enforced const-correct code.
> - if PY_VECTORCALL_ARGUMENTS_OFFSET is set, the callee may modify 
> "args[-1]" (and only that, and after the author has read the docs).

The updated minimal implementation now uses `const` arguments.
Code that uses args[-1] must explicitly cast away the const.
https://github.com/markshannon/cpython/blob/vectorcall-minimal/Objects/classobject.c#L55

> 
> 
> Another point I'd like some discussion on is that vectorcall function 
> pointer is per-instance. It looks this is only useful for type objects, 
> but it will add a pointer to every new-style callable object (including 
> functions). That seems wasteful.
> Why not have a per-type pointer, and for types that need it (like 
> PyTypeObject), make it dispatch to an instance-specific function?

Firstly, each callable has different behaviour, so it makes sense to be 
able to do the dispatch from caller to callee in one step. Having a 
per-object function pointer allows that.
Secondly, callables are either large or transient. If large, then the 
extra few bytes makes little difference. If transient then, it matters 
even less.
The total increase in memory is likely to be only a few tens of 
kilobytes, even for a large program.

> 
> 
> Minor things:
> - "Continued prohibition of callable classes as base classes" -- this 
> section reads as a final. Would you be OK wording this as something 
> other PEPs can tackle?
> - "PyObject_VectorCall" -- this looks extraneous, and the reference 
> imlementation doesn't need it so far. Can it be removed, or justified?

Yes, removing it makes sense. I can then rename the clumsily named 
"PyObject_VectorCallWithCallable" as "PyObject_VectorCall".

> - METH_VECTORCALL is *not* strictly "equivalent to the currently 
> undocumented METH_FASTCALL | METH_KEYWORD flags" (it has the 
> ARGUMENTS_OFFSET complication).

METH_VECTORCALL is just making METH_FASTCALL | METH_KEYWORD documented 
and public.
Would you prefer that it has a different name to prevent confusion with 
over PY_VECTORCALL_ARGUMENTS_OFFSET?

I don't like calling things "fast" or "new" as the names can easily 
become misleading.
New College, Oxford is over 600 years old. Not so "new" any more :)

> - I'd like to officially call this PEP "Vectorcall", see 
> https://github.com/python/peps/pull/984
> 
> 
> 
> Mark, what are your plans for next steps with PEP 590? If a volunteer 
> wanted to help you push this forward, what would be the best thing to 
> work on?

The minimal implementation is also a complete implementation. Third 
party code can use the vectorcall protocol immediately use and be called 
efficiently from the interpreter.
I think it is very close to being mergeable.

To gain the promised performance improvements is obviously a lot more 
work, but can be done incrementally over the next few months.

Cheers,
Mark.