On 27/03/2019 1:50 pm, Petr Viktorin wrote:
On Sun, Mar 24, 2019 at 4:22 PM Mark Shannon firstname.lastname@example.org wrote:
Regarding PEPs 576 and 580. Over the new year, I did a thorough analysis of possible calling conventions for use in the CPython ecosystem and came up with a new PEP. The draft can be found here: https://github.com/markshannon/peps/blob/new-calling-convention/pep-9999.rst
I was hoping to profile a branch with the various experimental changes cherry-picked together, but don't seem to have found the time :(
I'd like to have a testable branch before formally submitting the PEP, but I thought you should be aware of the PEP.
Hello Mark, Thank you for letting me know! I wish I had known of this back in January, when you committed the first draft. This is unfair to the competing PEP, which is ready and was waiting for the new governance. We have lost three months that could have been spent pondering the ideas in the pre-PEP.
I realize this is less than ideal. I had planned to publish this in December, but life intervened. Nothing bad, just too busy.
Do you think you will find the time to piece things together? Is there anything that you already know should be changed?
I've submitted the final PEP and minimal implementation https://github.com/python/peps/pull/960 https://github.com/python/cpython/compare/master...markshannon:vectorcall-mi...
Do you have any comments on [Jeroen's comparison]?
It is rather out of date, but two comments.
1. `_PyObject_FastCallKeywords()` is used as an example of a call in CPython. It is an internal implementation detail and not a common path.
2. The claim that PEP 580 allows "certain optimizations because other code can make assumptions" is flawed. In general, the caller cannot make assumptions about the callee or vice-versa. Python is a dynamic language.
The pre-PEP is simpler than PEP 580, because it solves simpler issues.
The fundamental issue being addressed is the same, and it is this: Currently third-party C code can either be called quickly or have access to the callable object, not both. Both PEPs address this.
I'll need to confirm that it won't paint us into a corner -- that there's a way to address all the issues in PEP 579 in the future.
PEP 579 is mainly a list of supposed flaws with the 'builtin_function_or_method' class. The general thrust of PEP 579 seems to be that builtin-functions and builtin-methods should be more flexible and extensible than they are. I don't agree. If you want different behaviour, then use a different object. Don't try to cram all this extra behaviour into a pre-existing object.
However, if we assume that we are talking about callables implemented in C, in general, then there are 3 key issues covered by PEP 579.
1. Inspection and documentation; it is hard for extensions to have docstrings and signatures. Worth addressing, but completely orthogonal to PEP 590.
2. Extensibility and performance; extensions should have the power of Python functions without suffering slow calls. Allowing the C code access to the callable object is a general solution to this problem. Both PEP 580 and PEP 590 do this.
3. Exposing the underlying implementation and signature of the C code, so that optimisers can avoid unnecessary boxing. This may be worth doing, but until we have an adaptive optimiser capable of exploiting this information, this is premature. Neither PEP 580 nor PEP 590 explicitly allows or prevents this.
The pre-PEP claims speedups of 2% in initial experiments, with expected overall performance gain of 4% for the standard benchmark suite. That's pretty big.
That's because there is a lot of code around calls in CPython, and it has grown in a rather haphazard fashion. Victor's work to add the "FASTCALL" protocol has helped. PEP 590 seeks to formalise and extend that, so that it can be used more consistently and efficiently.
As far as I can see, PEP 580 claims not much improvement in CPython, but rather large improvements for extensions (Mistune with Cython).
Calls to and from extension code are slow because they have to use the `tp_call` calling convention (or lose access to the callable object). With a calling convention that does not have any special cases, extensions can be as fast as builtin functions. Both PEP 580 and PEP 590 attempt to do this, but PEP 590 is more efficient.
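To make the difference between the two conventions concrete, here is a rough Python model of the argument layouts, not code from either PEP. It assumes the vector layout as I understand it from PEP 590: one flat run of values, positional first and keyword values after, plus a tuple of keyword names. The helper names `pack_tp_call`, `pack_vector` and `unpack_vector` are invented for illustration.

```python
# Hypothetical model of the two argument layouts; all names are illustrative.

def pack_tp_call(*args, **kwargs):
    """tp_call style: the caller builds a tuple and a dict for every call."""
    return tuple(args), dict(kwargs)


def pack_vector(*args, **kwargs):
    """Vector style: one flat list of values plus a tuple of keyword names."""
    kwnames = tuple(kwargs)
    vector = list(args) + [kwargs[name] for name in kwnames]
    return vector, len(args), kwnames


def unpack_vector(vector, nargs, kwnames):
    """Recover (args, kwargs) from the flat layout, as a callee might."""
    args = tuple(vector[:nargs])
    kwargs = dict(zip(kwnames, vector[nargs:]))
    return args, kwargs


vector, nargs, kwnames = pack_vector(1, 2, x=3, y=4)
print(vector, nargs, kwnames)
```

Both layouts carry the same information. The point of the flat form is that a caller which already has its arguments in a contiguous array does not need to allocate a tuple and a dict just to make the call.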
The pre-PEP has a complication around offsetting arguments by 1 to allow bound methods to forward calls cheaply. I fear that this optimizes for current usage with its limitations.
It's optimising for the common case, while allowing the less common. Bound methods and classes need to add one additional argument. Other rarer cases, like `partial` may need to allocate memory, but can still add or remove any number of arguments.
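A small Python sketch of the offset idea, under simplifying assumptions: the argument "vector" is a list, and a spare slot is signalled by `start > 0`, whereas the PEP uses a flag on the argument count. `call_vector` and `make_bound` are hypothetical names, not part of any API.

```python
# Hypothetical model of the one-slot offset; names are illustrative.

def call_vector(func, buf, start, nargs):
    """Call func with the nargs arguments stored at buf[start:start + nargs]."""
    return func(buf, start, nargs)


def make_bound(func, self_obj):
    """Return a callable that forwards to func with self_obj prepended."""
    def bound(buf, start, nargs):
        if start > 0:
            # Common case: the caller left a spare slot, so borrow it
            # instead of copying, and restore it before returning.
            saved = buf[start - 1]
            buf[start - 1] = self_obj
            try:
                return func(buf, start - 1, nargs + 1)
            finally:
                buf[start - 1] = saved
        # Less common case: no spare slot, so allocate a new vector.
        new_buf = [self_obj] + buf[start:start + nargs]
        return func(new_buf, 0, nargs + 1)
    return bound


def add_all(buf, start, nargs):
    return sum(buf[start:start + nargs])


bound = make_bound(add_all, 10)
buf = [None, 1, 2, 3]                  # slot 0 is the spare slot
print(call_vector(bound, buf, 1, 3))   # 10 + 1 + 2 + 3
```

The copying branch is the path that something like `partial` would take when it needs to add more than one argument.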
PEP 580's cc_parent allows bound methods to have access to the class, and through that, the module object where they are defined and the corresponding module state. To support this, vector calls would need a two-argument offset.
Not true. The first argument in the vector call is the callable itself. Through it, any callable can access its class, its module or any other object it wants.
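As a loose Python analogy (the real mechanism is a C function pointer, and `NativeCallable`, `count_calls` and `module_state` are made-up names): if the implementation receives the callable as its first argument, it can reach whatever the callable refers to, with no extra argument slot.

```python
# Hypothetical stand-in for a callable implemented in C; names are illustrative.

class NativeCallable:
    def __init__(self, impl, module_state):
        self.impl = impl
        self.module_state = module_state   # assumed per-module state

    def __call__(self, *args):
        # The implementation gets the callable first, then the arguments.
        return self.impl(self, args)


def count_calls(callable_obj, args):
    """Reach the class and the module state through the callable alone."""
    state = callable_obj.module_state
    state["calls"] += 1
    return type(callable_obj).__name__, state["calls"], len(args)


state = {"calls": 0}
f = NativeCallable(count_calls, state)
print(f(1, 2))
```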
(That seems to illustrate the main difference between the motivations of the two PEPs: one focuses on extensibility; the other on optimizing existing use cases.)
I'll reiterate that PEP 590 is more general than PEP 580 and that once the callable's code has access to the callable object (as both PEPs allow) then anything is possible. You can't get more extensible than that.
The pre-PEP's "any third-party class implementing the new call interface will not be usable as a base class" looks quite limiting.
PEP 580 has the same limitation for the same reasons. The limitation is necessary for correctness if an object supports calls via `__call__` and through another calling convention.
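The correctness problem can be modelled in a few lines of Python. This is only an analogy: `fast_call` stands in for the second calling convention and `interpreter_call` for a caller that prefers it, and neither name comes from the PEPs.

```python
# Hypothetical model of two call paths that must agree; names are illustrative.

class FastBase:
    """Stand-in for an extension class with a fast path besides __call__."""

    def __call__(self, *args):
        return ("base", args)

    def fast_call(self, args):
        # Must always give the same result as __call__.
        return ("base", tuple(args))


class Sub(FastBase):
    """A subclass that overrides __call__ but inherits the fast path."""

    def __call__(self, *args):
        return ("sub", args)


def interpreter_call(obj, *args):
    """A caller that uses the fast path whenever the type provides one."""
    fast = getattr(type(obj), "fast_call", None)
    if fast is not None:
        return fast(obj, args)
    return obj(*args)


print(interpreter_call(Sub(), 1))   # takes the inherited fast path
print(Sub()(1))                     # takes the overridden __call__
```

The two paths disagree for `Sub`, which is the situation the restriction on subclassing is there to rule out.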