[Python-Dev] PEP 580 and PEP 590 comparison.

Sat Apr 27 05:26:29 EDT 2019

Hi,

On 15/04/2019 9:34 am, Jeroen Demeyer wrote:
> On 2019-04-14 13:34, Mark Shannon wrote:
>> I'll address capability first.
> 
> I don't think that comparing "capability" makes a lot of sense since 
> neither PEP 580 nor PEP 590 adds any new capabilities to CPython. They 
> are meant to allow doing things faster, not to allow more things.
> 
> And yes, the C call protocol can be implemented on top of the vectorcall 
> protocol and conversely, but that doesn't mean much.

That isn't true. You cannot implement PEP 590 on top of PEP 580. PEP 580 
isn't as general.
Specifically, and this is important, PEP 580 cannot implement efficient 
calls to class objects without breaking the ABI.

> 
>> Now performance.
>>
>> Currently the PEP 590 implementation is intentionally minimal. It does
>> nothing for performance.
> 
> So, we're missing some information here. What kind of performance 
> improvements are possible with PEP 590 which are not in the reference 
> implementation?

Performance improvements include, but aren't limited to:

1. Much faster calls to common classes: range(), set(), type(), list(), etc.
2. Modifying argument clinic to produce C functions compatible with the 
vectorcall, allowing the interpreter to call the C function directly, 
with no additional overhead beyond the vectorcall call sequence.
3. Customization of the C code for function objects depending on the 
Python code. This would probably be limited to treating closures and 
generator functions differently, but optimizing other aspects of the 
Python function call is possible.
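
As a rough way to see what item 1 is about, a sketch along the following 
lines times calls to a few builtin classes. The helper name time_call and 
the choice of statements are illustrative assumptions, not part of either 
PEP or of any reference implementation, and the numbers depend entirely 
on the machine and the CPython build.

```python
import timeit

# Calls to class objects, the case item 1 is about.
# The selection of statements is illustrative only.
STATEMENTS = ["range(10)", "list()", "set()", "type(1)"]


def time_call(stmt, number=100_000, repeat=5):
    """Return the best observed time per call, in nanoseconds."""
    # Take the minimum over several repeats to reduce scheduling noise.
    best = min(timeit.repeat(stmt, number=number, repeat=repeat))
    return best / number * 1e9


def run():
    """Time every statement and return a {statement: ns_per_call} dict."""
    return {stmt: time_call(stmt) for stmt in STATEMENTS}


if __name__ == "__main__":
    for stmt, ns in run().items():
        print(f"{stmt:12s} {ns:8.1f} ns per call")
```

Running the same script against two builds of the interpreter gives a 
crude before/after comparison for calls to classes. Like any 
micro-benchmark it says nothing about whole-program performance.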

> 
>> The benchmark Jeroen provides is a
>> micro-benchmark that calls the same functions repeatedly. This is
>> trivial and unrealistic.
> 
> Well, it depends what you want to measure... I'm trying to measure 
> precisely the thing that makes PEP 580 and PEP 590 different from the 
> status-quo, so in that sense those benchmarks are very relevant.
> 
> I think that the following 3 statements are objectively true:
> 
> (A) Both PEP 580 and PEP 590 add a new calling convention, which is 
> equally fast as builtin functions (and hence faster than tp_call).
Yes

> (B) Both PEP 580 and PEP 590 keep roughly the same performance as the 
> status-quo for existing function/method calls.
For the minimal implementation of PEP 590, yes. I would expect a small 
improvement with an implementation of PEP 590 that includes optimizations.

> (C) While the performance of PEP 580 and PEP 590 is roughly the same,
> PEP 580 is slightly faster (based on the reference implementations 
> linked from PEP 580 and PEP 590)

I quite deliberately used the term "minimal" to describe the 
implementation of PEP 590 you have been using.
PEP 590 allows many optimizations.
Comparing the performance of the four hundred line minimal diff for PEP 
590 with the full four thousand line diff for PEP 580 is misleading.

> 
> Two caveats concerning (C):
> - the difference may be too small to matter. Relatively, it's a few 
> percent of the call time but in absolute numbers, it's less than 10 CPU 
> clock cycles.
> - there might be possible improvements to the reference implementation 
> of either PEP 580/PEP 590. I don't expect big differences though.
> 
>> To repeat an example
>> from an earlier email, which may have been overlooked, this code reduces
>> the time to create ranges and small lists by about 30%
> 
> That's just a special case of the general fact (A) above and using the 
> new calling convention for "type". It's an argument in favor of both PEP 
> 580 and PEP 590, not for PEP 590 specifically.

It very much is an argument in favor of PEP 590. PEP 580 cannot do this.

Cheers,
Mark.