[Python-ideas] Why CPython is still behind in performance for some widely used patterns ?

Steven D'Aprano steve at pearwood.info
Fri Jan 26 19:25:35 EST 2018


On Fri, Jan 26, 2018 at 02:18:53PM -0800, Chris Barker wrote:

[...]
> sure, but I would argue that you do need to write code in a clean style
> appropriate for the language at hand.

Indeed. If you write Java-esque code in Python with lots of deep chains 
obj.attr.spam.eggs.cheese.foo.bar.baz expecting that the compiler will 
resolve them at compile-time, your code will be slow.
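
As a rough illustration of why such chains cost something, here is a
minimal sketch. The Node class and its counting property are made up
for this example; the point is only that every link in the chain is
looked up again each time the expression runs.

```python
class Node:
    """Hypothetical class whose `child` property counts every lookup."""

    lookups = 0  # shared counter, for illustration only

    def __init__(self, child=None):
        self._child = child

    @property
    def child(self):
        Node.lookups += 1
        return self._child


# Build a chain so that root.child.child.child is valid.
root = Node(Node(Node(Node())))

for _ in range(1000):
    leaf = root.child.child.child  # all three links are resolved again

chain_lookups = Node.lookups
print(chain_lookups)  # 3000: three lookups per iteration, nothing cached
```

Nothing is cached between iterations, because any of the intermediate
objects could have been rebound in the meantime.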

No language is immune from this: it is possible to write bad code in any 
language, and if you write Pythonesque highly dynamic code using lots of 
runtime dispatching in Java, your Java benchmarks will be slow too.

But having agreed with your general principle, I'm afraid I have to
disagree with your specific example:

> For instance, the above creates a function that is a simple one-liner --
> there is no reason to do that, and the fact that function calls have
> significant overhead in Python is going to bite you.

I disagree that there is no reason to write simple "one-liners". As soon 
as you are calling that one-liner from more than two, or at most three, 
places, the DRY principle strongly suggests you move it into a function.

Even if you're only calling the one-liner from the one place, there can 
still be reasons to refactor it out into a separate function, such as 
for testing and maintainability.

Function call overhead is a genuine pain-point for Python code which 
needs to be fast. I'm fortunate that I rarely run into this in practice: 
most of the time either my code doesn't need to be fast (if it takes 3 
ms instead of 0.3 ms, I'm never going to notice the difference) or the 
function call overhead is trivial compared to the rest of the 
computation. But it has bitten me once or twice, in the intersection of:

- code that needs to be as fast as possible;
- code that needs to be factored into subroutines;
- code where the cost of the function calls is a significant 
  fraction of the overall cost.

When all three happen at the same time, it is painful and there's no 
good solution.
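
One way to get a feel for the overhead is a timeit comparison along
these lines. This is only a sketch: is_match, allowed and data are
invented names, and the timings will vary with the machine and the
Python version, so no figures are quoted here.

```python
import timeit


def is_match(x, allowed):
    # Hypothetical one-liner factored out into its own function.
    return x in allowed


allowed = {1, 2, 3}
data = list(range(1000))


def with_call():
    # One extra function call per item.
    return [x for x in data if is_match(x, allowed)]


def inlined():
    # Same test written inline, no call overhead.
    return [x for x in data if x in allowed]


# Both versions must agree before their timings mean anything.
assert with_call() == inlined()

t_call = timeit.timeit(with_call, number=200)
t_inline = timeit.timeit(inlined, number=200)
print(f"with call: {t_call:.4f}s  inlined: {t_inline:.4f}s")
```

When the body of the function is this cheap, the call itself can
account for a noticeable share of the total, which is exactly the third
condition above.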


> "inlining" the filter call is making the code more pythonic and readable --
> a no-brainer. I wouldn't call that an optimization.

In this specific case of "if rule.x in whatever.x", I might agree with 
you, but if the code is a bit more complex but still a one-liner:

    if rules[key].matcher.lower() in data[key].person.history:

I would much prefer to see it factored out into a function or method. So 
we have to judge each case on its merits: it isn't a no-brainer that 
inline code is always more Pythonic and readable.
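
For example, the one-liner above might be factored out roughly like
this. The function name rule_matches and the sample data are
assumptions made for the sketch, not anything from the original code.

```python
from types import SimpleNamespace


def rule_matches(rules, data, key):
    """Return True if the rule's lowercased matcher is in the person's history."""
    matcher = rules[key].matcher.lower()
    history = data[key].person.history
    return matcher in history


# Hypothetical stand-ins for the real rule and data objects.
rules = {"a": SimpleNamespace(matcher="Spam")}
data = {"a": SimpleNamespace(person=SimpleNamespace(history=["eggs", "spam"]))}

if rule_matches(rules, data, "a"):
    print("rule matched")
```

The call site now reads as a statement of intent, and the lookup logic
can be tested on its own.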


> making rule.x local is an optimization -- that is, the only reason you'd do
> it is to make the code go faster. how much difference did that really make?

I assumed that rule.x could be a stand-in for a longer, Java-esque chain 
of attribute accesses.
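
A sketch of what I have in mind, with a made-up chain
rule.config.limits.x and made-up "whatever" objects standing in for
the real ones:

```python
from types import SimpleNamespace

# Hypothetical nested objects standing in for a long attribute chain.
rule = SimpleNamespace(config=SimpleNamespace(limits=SimpleNamespace(x=3)))
whatevers = [
    SimpleNamespace(x=[1, 2, 3]),
    SimpleNamespace(x=[4, 5]),
    SimpleNamespace(x=[3]),
]


def count_matches_chained(rule, whatevers):
    # The full chain is walked once per item.
    return sum(1 for w in whatevers if rule.config.limits.x in w.x)


def count_matches_local(rule, whatevers):
    # Walk the chain once, then reuse the local name inside the loop.
    x = rule.config.limits.x
    return sum(1 for w in whatevers if x in w.x)


print(count_matches_chained(rule, whatevers))
print(count_matches_local(rule, whatevers))
```

The two versions are only equivalent if nothing rebinds the attributes
while the loop is running, which is why the interpreter cannot do this
hoisting for you.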


> I also don't know what type your "whatevers" are, but "x in something" can
> be O(n) if they're sequences, and using a dict or set would give much
> better performance.

Indeed. That's a good point.
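
A quick sketch of the difference, using an arbitrary size and a
worst-case needle chosen for illustration. Timings are printed rather
than quoted, since they depend on the machine.

```python
import timeit

items = list(range(100_000))
as_list = items
as_set = set(items)
needle = 99_999  # last element: worst case for the linear list scan

# Same answer either way; only the cost differs.
assert (needle in as_list) == (needle in as_set)

# List membership scans element by element; set membership is a hash lookup.
t_list = timeit.timeit(lambda: needle in as_list, number=100)
t_set = timeit.timeit(lambda: needle in as_set, number=100)
print(f"list: {t_list:.5f}s  set: {t_set:.5f}s")
```

Building the set is itself O(n), so this pays off when the same
collection is tested many times.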


-- 
Steve

