Why is CPython still behind in performance for some widely used patterns?

Hi,

This mail is the consequence of a true story, a story where CPython got defeated by JavaScript, Java, C# and Go.

One of the teams at the company where I'm working had a kind of benchmark to compare the different languages on top of their respective "official" web servers, such as Node.js, aiohttp, Dropwizard and so on. The test itself was pretty simple and tried to exercise the happy path of the logic: a piece of code that fetches N rules from another system and then applies them to X whatevers, also fetched from another system. Something like this:

    def filter(rule, whatever):
        if rule.x in whatever.x:
            return True

    rules = get_rules()
    whatevers = get_whatevers()
    for rule in rules:
        for whatever in whatevers:
            if filter(rule, whatever):
                cnt = cnt + 1
    return cnt

The performance of Python compared with the other languages was almost 10x slower. It's true that they didn't optimize the code, but they didn't for any language, so the test had the same cost in terms of iterations for all of them.

Once I saw the code I proposed a pair of changes: remove the call to the filter function, making it "inline", and caching the rule's attribute. Something like this:

    for rule in rules:
        x = rule.x
        for whatever in whatevers:
            if x in whatever.x:
                cnt += 1

The performance of CPython boosted 3x/4x just from doing these "silly" things.

The case of the rule cache is, IMHO, very striking. We have plenty of examples in many repositories where caching non-local variables is a widely used pattern; why hasn't a way to do it implicitly and by default been considered?

The case of the slowness of calling functions in CPython is quite recurrent, and it looks like it is still an unsolved problem.

Sure, I'm missing many things, and I do not have all of the information. This mail wants to gather the information that might help me understand why we are here - CPython - regarding these two slow patterns.

This could be considered an unimportant thing, but it's more relevant than one might expect, at least IMHO.
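A rough way to reproduce the effect with toy data (the Thing class and the data sizes here are invented for illustration; the absolute numbers will vary by machine):

```python
import time

class Thing:
    def __init__(self, x):
        self.x = x

# Hypothetical stand-ins for the two systems the real code fetched from.
rules = [Thing(str(i)) for i in range(200)]
whatevers = [Thing(str(i) * 2) for i in range(2000)]

def count_slow():
    # Original shape: a one-line helper called on every iteration.
    def filter(rule, whatever):
        if rule.x in whatever.x:
            return True
    cnt = 0
    for rule in rules:
        for whatever in whatevers:
            if filter(rule, whatever):
                cnt = cnt + 1
    return cnt

def count_fast():
    # Proposed shape: inline the test and cache the attribute per rule.
    cnt = 0
    for rule in rules:
        x = rule.x
        for whatever in whatevers:
            if x in whatever.x:
                cnt += 1
    return cnt

assert count_slow() == count_fast()

t0 = time.perf_counter()
count_slow()
slow = time.perf_counter() - t0

t0 = time.perf_counter()
count_fast()
fast = time.perf_counter() - t0

print(f"slow: {slow:.3f}s  fast: {fast:.3f}s")
```

On CPython the inlined, attribute-cached version reliably comes out ahead, though the exact ratio depends on the data.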
If the default code that you can write in a language is slow by default and an alternative exists to make it faster, the language is doing something wrong.

BTW: PyPy looks like it is immune [1]

[1] https://gist.github.com/pfreixes/d60d00761093c3bdaf29da025a004582

--
--pau

There are several reasons for the issues you are mentioning.

1. Attribute lookup is much more complicated than you would think. (If you have the time, watch https://www.youtube.com/watch?v=kZtC_4Ecq1Y which will explain things better than I can.) The series of operations that happens with every `obj.attr` occurrence can be complicated. It goes something like:

    def get_attr(obj, attr):
        if attr in obj.__dict__:
            value = obj.__dict__[attr]
            if is_descriptor(value):
                return value(obj)
            else:
                return value
        else:
            for cls in type(obj).mro():
                if attr in cls.__dict__:
                    value = cls.__dict__[attr]
                    if is_descriptor(value):
                        return value(obj)
                    else:
                        return value
            else:
                raise AttributeError('Attribute %s not found' % attr)

Therefore, the caching means this operation is only done once instead of n times (where n = len(whatevers)).

2. Function calls

3. Dynamic code makes things harder to optimize. Python's object model allows for constructs that are very hard to optimize without knowing about the structure of the data ahead of time. For instance, if an attribute is defined by a property, there are no guarantees that obj.attr will return the same thing on each access. So in simple terms, the power Python gives you over the language makes the language harder to optimize.

4. CPython's compiler makes (as a rule) no optimizations. CPython's compiler is a fairly direct source-to-bytecode compiler, not an actual optimizing compiler. So beyond constant folding and deletion of some types of debug code, the language isn't going to worry about optimizing things for you.

So in simple terms: of the languages you mentioned, JavaScript's object model is substantially less powerful than Python's, but it is also more straightforward in terms of what obj.attr means, and the other three you mentioned all have statically typed, optimizing compilers with a straightforward method resolution order.
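The property point above can be made concrete with a small (hypothetical) class: nothing guarantees that two reads of the same attribute return the same value, so the interpreter cannot silently cache them for you:

```python
import random

class Rule:
    @property
    def x(self):
        # A property can run arbitrary code on every single access.
        return random.choice(["a", "b"])

r = Rule()
# Read the "same" attribute 100 times and collect the distinct results.
values = {r.x for _ in range(100)}
print(values)  # almost certainly {'a', 'b'}: the attribute varied per read
```

Any implicit caching of `r.x` would silently change the behavior of code like this, which is exactly why CPython has to repeat the full lookup every time.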
The things you see as flaws end up being the way Pythonistas can add more dynamic systems into their APIs (and since we don’t have macros, most of our dynamic operations must be done at run-time). - Ed On Jan 26, 2018, 16:36 -0500, Pau Freixes <pfreixes@gmail.com>, wrote:

On Sat, Jan 27, 2018 at 8:35 AM, Pau Freixes <pfreixes@gmail.com> wrote:
Did you consider using a set instead of a list for your inclusion checks? I don't have the full details of what the code is doing, but the "in" check on a large set can be incredibly fast compared to the equivalent on a list/array.
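A minimal sketch of that difference, with synthetic data (the sizes and the worst-case needle are chosen purely for illustration):

```python
import timeit

haystack_list = list(range(100_000))
haystack_set = set(haystack_list)
needle = 99_999  # worst case for the list: it has to scan everything

# Membership test on a list is O(n); on a set it is an O(1) hash lookup.
t_list = timeit.timeit(lambda: needle in haystack_list, number=200)
t_set = timeit.timeit(lambda: needle in haystack_set, number=200)

print(f"list: {t_list:.4f}s  set: {t_set:.6f}s")
```

With data this size the set lookup is typically orders of magnitude faster, regardless of which language implementation is running the loop.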
Are you sure it's the language's fault? Failing to use a better data type simply because some other language doesn't have it is a great way to make a test that's "fair" in the same way that Balance and Armageddon are "fair" in Magic: The Gathering. They reset everyone to the baseline, and the baseline's equal for everyone right? Except that that's unfair to a language that prefers to work somewhere above the baseline, and isn't optimized for naive code. ChrisA

On Sat, Jan 27, 2018 at 10:07 AM, Steven D'Aprano <steve@pearwood.info> wrote:
When you push everyone to an identical low level, you're not truly being fair. Let's say you try to benchmark a bunch of programming languages against each other by having them use no more than four local variables, all integers, one static global array for shared storage, and no control flow other than conditional GOTOs. (After all, that's all you get in some machine languages!) It's perfectly fair, all languages have to compete on the same grounds. But it's also completely UNfair on high level languages, because you're implementing things in terribly bad ways. "Fair" is a tricky concept, and coding in a non-Pythonic way is not truly "fair" to Python. ChrisA

If there are robust and simple optimizations that can be added to CPython, great, but: This mail is the consequence of a true story, a story where CPython
got defeated by Javascript, Java, C# and Go.
at least those last three are statically compiled languages -- they are going to be faster than Python for this sort of thing -- particularly for code written in a non-pythonic style... def filter(rule, whatever):
sure, but I would argue that you do need to write code in a clean style appropriate for the language at hand. For instance, the above creates a function that is a simple one-liner -- there is no reason to do that, and the fact that function calls do have significant overhead in Python is going to bite you.

for rule in rules:
"inlining" the filter call is making the code more pythonic and readable -- a no brainer. I wouldn't call that an optimization. Making rule.x local is an optimization -- that is, the only reason you'd do it is to make the code go faster. How much difference did that really make? I also don't know what type your "whatevers" are, but "x in something" can be order(n) if they're sequences, and using a dict or set would give much better performance. And perhaps collections.Counter would help here, too. In short, it is a non-goal to get Python to run as fast as static languages for simple nested loop code like this :-)

The case of the rule cache IMHO is very striking, we have plenty
you can bet it's been considered -- the Python core devs are a pretty smart bunch :-) The fundamental reason is that rule.x could change inside that loop -- so you can't cache it unless you know for sure it won't. -- Again, dynamic language. The case of the slowness to call functions in CPython is quite
recurrent and looks like its an unsolved problem at all.
dynamic language again ... If the default code that you
can write in a language is slow by default and an alternative exists to make it faster, the language is doing something wrong.
yes, that's true -- but your example shouldn't be the default code you write in Python.

BTW: PyPy looks like it is immune [1]
[1] https://gist.github.com/pfreixes/d60d00761093c3bdaf29da025a004582
PyPy uses a JIT -- which is the way to make a dynamic language run faster -- That's kind of why it exists.... -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
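Coming back to the point that rule.x could change inside the loop: a contrived sketch (invented classes, purely illustrative) of why hoisting the attribute read out of the loop is not a semantics-preserving transformation the compiler could do on its own:

```python
class Rule:
    def __init__(self):
        self.x = "a"

class Whatever:
    def __init__(self):
        self.x = "abc"

rule = Rule()
whatevers = [Whatever() for _ in range(3)]

cnt = 0
for whatever in whatevers:
    if rule.x in whatever.x:
        cnt += 1
    rule.x = "z"  # the loop body mutates the attribute mid-loop

# Only the first iteration saw "a"; the next two saw "z", which never matches.
print(cnt)  # 1 -- hoisting rule.x before the loop would wrongly give 3
```

The compiler has no way to prove the loop body (or another thread, or a signal handler) won't rebind the attribute, so it must re-read it on every iteration.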

On 2018-01-26 08:18 PM, Chris Barker wrote:
Java and C#? Statically compiled? Haha. No. Java compiles to bytecode. While, yes, Java doesn't need to compile your code at runtime, the compilation time in CPython is usually minimal anyway, unless you're using eval. You can precompile your Python into bytecode, but it's usually not worth it. Java can also load bytecode at runtime and do bytecode-manipulation stuff. The only "real" benefit of Java is that object layout is pretty much static. (This can be simulated with __slots__ I think? idk.) See also, for example: http://luajit.org/ext_ffi.html#cdata (The same goes for C#. Idk about Go.) (Ofc, their JITs do also help. But even with the JIT off, it's still pretty good.)
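For reference, a small sketch of the __slots__ mechanism mentioned above, which does pin down the instance layout, at the cost of losing the per-instance __dict__:

```python
class Point:
    __slots__ = ("x", "y")  # fixed attribute layout, no per-instance __dict__

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
print(p.x, p.y)

try:
    p.z = 3  # not declared in __slots__, so assignment is rejected
except AttributeError as exc:
    print("rejected:", exc)

# Slotted instances have no __dict__ at all (which also saves memory).
print(hasattr(p, "__dict__"))  # False
```

This is roughly the trade Java makes everywhere by default: a fixed field layout in exchange for giving up the ability to attach arbitrary attributes at runtime.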

On 26Jan2018 20:59, Soni L. <fakedme+py@gmail.com> wrote:
However, both Java and Go are statically typed; I think C# is too, but don't know. The compiler has full knowledge of the types of almost every symbol, and can write machine-optimal code for operations (even though the initial machine is the JVM for Java -- I gather the JVM bytecode is also type annotated, so JITs can in turn do a far better job of making machine-optimal machine code when used). This isn't really an option for "pure" Python. Cheers, Cameron Simpson <cs@cskk.id.au> (formerly cs@zip.com.au)

On Fri, Jan 26, 2018 at 02:18:53PM -0800, Chris Barker wrote: [...]
sure, but I would argue that you do need to write code in a clean style appropriate for the language at hand.
Indeed. If you write Java-esque code in Python with lots of deep chains obj.attr.spam.eggs.cheese.foo.bar.baz expecting that the compiler will resolve them at compile-time, your code will be slow. No language is immune from this: it is possible to write bad code in any language, and if you write Pythonesque highly dynamic code using lots of runtime dispatching in Java, your Java benchmarks will be slow too. But having agreed with your general principle, I'm afraid I have to disagree with your specific:
I disagree that there is no reason to write simple "one-liners". As soon as you are calling that one-liner from more than two, or at most three, places, the DRY principle strongly suggests you move it into a function. Even if you're only calling the one-liner from the one place, there can still be reasons to refactor it out into a separate function, such as for testing and maintainability. Function call overhead is a genuine pain-point for Python code which needs to be fast. I'm fortunate that I rarely run into this in practice: most of the time either my code doesn't need to be fast (if it takes 3 ms instead of 0.3 ms, I'm never going to notice the difference) or the function call overhead is trivial compared to the rest of the computation. But it has bit me once or twice, in the intersection of: - code that needs to be as fast as possible; - code that needs to be factored into subroutines; - code where the cost of the function calls is a significant fraction of the overall cost. When all three happen at the same time, it is painful and there's no good solution.
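A quick way to see that overhead in isolation (a toy example; the absolute numbers vary by machine and Python version, but the gap itself is robust on CPython):

```python
import timeit

def add(a, b):
    return a + b

def with_call():
    # The work is trivial, so the per-call overhead dominates.
    total = 0
    for i in range(10_000):
        total = add(total, i)
    return total

def inlined():
    # Same computation with the one-liner inlined into the loop.
    total = 0
    for i in range(10_000):
        total = total + i
    return total

assert with_call() == inlined()

t_call = timeit.timeit(with_call, number=100)
t_inline = timeit.timeit(inlined, number=100)
print(f"with call: {t_call:.3f}s  inlined: {t_inline:.3f}s")
```

When the called function does real work, this overhead disappears into the noise; it only bites when, as here, the body is cheaper than the call machinery itself.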
"inlining" the filter call is making the code more pythonic and readable -- a no brainer. I wouldn't call that a optimization.
In this specific case of "if rule.x in whatever.x", I might agree with you, but if the code is a bit more complex but still a one-liner: if rules[key].matcher.lower() in data[key].person.history: I would much prefer to see it factored out into a function or method. So we have to judge each case on its merits: it isn't a no-brainer that inline code is always more Pythonic and readable.
making rule.x local is an optimization -- that is, the only reason you'd do it to to make the code go faster. how much difference did that really make?
I assumed that rule.x could be a stand-in for a longer, Java-esque chain of attribute accesses.
Indeed. That's a good point. -- Steve

Hi,

Well, I wrote the https://faster-cpython.readthedocs.io/ website to answer such questions. See for example https://faster-cpython.readthedocs.io/mutable.html "Everything in Python is mutable".

Victor

2018-01-26 22:35 GMT+01:00 Pau Freixes <pfreixes@gmail.com>:

On Friday, January 26, 2018, Victor Stinner <victor.stinner@gmail.com> wrote:
Thanks!

"""
Contents:

- Projects to optimize CPython 3.7 <https://faster-cpython.readthedocs.io/cpython37.html>
- Projects to optimize CPython 3.6 <https://faster-cpython.readthedocs.io/cpython36.html>
- Notes on Python and CPython performance, 2017 <https://faster-cpython.readthedocs.io/notes_2017.html>
- FAT Python <https://faster-cpython.readthedocs.io/fat_python.html>
- Everything in Python is mutable <https://faster-cpython.readthedocs.io/mutable.html>
- Optimizations <https://faster-cpython.readthedocs.io/optimizations.html>
- Python bytecode <https://faster-cpython.readthedocs.io/bytecode.html>
- Python C API <https://faster-cpython.readthedocs.io/c_api.html>
- AST Optimizers <https://faster-cpython.readthedocs.io/ast_optimizer.html>
- Register-based Virtual Machine for Python <https://faster-cpython.readthedocs.io/registervm.html>
- Read-only Python <https://faster-cpython.readthedocs.io/readonly.html>
- History of Python optimizations <https://faster-cpython.readthedocs.io/history.html>
- Misc <https://faster-cpython.readthedocs.io/misc.html>
- Kill the GIL? <https://faster-cpython.readthedocs.io/gil.html>
- Implementations of Python <https://faster-cpython.readthedocs.io/implementations.html>
- Benchmarks <https://faster-cpython.readthedocs.io/benchmarks.html>
- Random notes about PyPy <https://faster-cpython.readthedocs.io/pypy.html>
- Talks <https://faster-cpython.readthedocs.io/talks.html>
- Links <https://faster-cpython.readthedocs.io/links.html>
"""

Pandas & Cython: https://pandas.pydata.org/pandas-docs/stable/enhancingperf.html

"Vaex uses memory mapping, zero memory copy policy and lazy computations for best performance (no memory wasted)." https://github.com/maartenbreddels/vaex

On 27 January 2018 at 07:35, Pau Freixes <pfreixes@gmail.com> wrote:
Not really, as we've seen with the relatively slow adoption of PyPy over the past several years.

CPython, as an implementation, emphasises C/C++ compatibility, and internal interpreter simplicity. That comes at a definite cost in runtime performance (especially where attribute access and function calls are concerned), but has also enabled an enormous orchestration ecosystem, originally around C/C++/FORTRAN components, but now increasingly around Rust components within the same process, as well as out-of-process Java, C#, and JavaScript components. In this usage model, if Python code becomes the throughput bottleneck, it's only because something has gone wrong at the system architecture level.

PyPy, by contrast, emphasises raw speed, sacrificing various aspects of CPython's C/C++ interoperability in order to attain it. It's absolutely the implementation you want to be using if your main concern is the performance of your Python code in general, and there aren't any obvious hotspots that could be more selectively accelerated.

To date, the CPython model of "Use (C)Python to figure out what kind of problem you have, then rewrite your performance bottlenecks in a language more specifically tailored to that problem space" has proven relatively popular. There's likely still more we can do within CPython to make typical code faster without increasing the interpreter complexity too much (e.g. Yury's idea of introducing an implicit per-opcode result cache into the eval loop), but opt-in solutions that explicitly give up some of Python's language-level dynamism are always going to be able to do less work at runtime than typical Python code does.

Cheers, Nick.

P.S. You may find https://www.curiousefficiency.org/posts/2015/10/languages-to-improve-your-py... interesting in the context of considering some of the many factors other than raw speed that may influence people's choice of programming language.
Similarly, https://www.curiousefficiency.org/posts/2017/10/considering-pythons-target-a... provides some additional info on the scope of Python's use cases (for the vast majority of which, "How many requests per second can I serve in a naive loop in a CPU bound process?" isn't a particularly relevant characteristic) -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Hi all,

I would like to remark that, in my opinion, the question of CPython's performance cannot be decoupled from the extremely wide selection of packages which provide optimized code for almost any imaginable task. For example: JavaScript may be faster than (C)Python on simple benchmarks, but as soon as the task is somewhat amenable to scipy, and I can use scipy in Python, the resulting performance will completely cream JavaScript in a way that isn't funny anymore. And scipy is just an example; there are tons of such libraries for all kinds of tasks.

I am not aware of any language ecosystem with a similarly wide scope of packages; at least Java and Node both fall short. (Node may have more packages by number, but the quality is definitely lower and there is tons of overlap.)

Stephan

2018-01-27 7:42 GMT+01:00 Nick Coghlan <ncoghlan@gmail.com>:

Hi,

Thanks to all of you for your responses, the points of view and the information that you shared to back up your rationales. I have had time to visit a few of them, but I will try to find the time to review all of them.

It's hard to keep the discussion organized by responding to each reply individually, so if you don't mind I will do it with just this email. If you believe that I'm missing something important, shoot.

First of all, my fault for starting the discussion with the language-battle side; this didn't help focus the conversation on the point that I wanted to discuss.

So, the intention was to raise two use cases which both have a performance cost that can be explicitly circumvented by the developer, taking into account that both are, let's say, well known by the community.

Correct me if I'm wrong, but most of you argue that the proper Zen of Python - can we say mutability [1], as Victor pointed out? - that allows the user the freedom to mutate objects at runtime goes in the opposite direction of allowing the *compiler* to generate optimized code. Or, more specifically for the ceval *interpreter*, of applying some hacks that would help reduce the footprint of some operations.

I'm wondering if a solution might be to have something like [2] but for generic attributes. Should it be possible? Has it been discussed before? Is there any red flag that you think would make a well-balanced solution too complicated?

Regarding the cost of calling a function, which I guess is not related to the previous stuff, what is the impediment right now to making it faster?

[1] https://faster-cpython.readthedocs.io/mutable.html
[2] https://bugs.python.org/issue28158

On Fri, Jan 26, 2018 at 10:35 PM, Pau Freixes <pfreixes@gmail.com> wrote:
-- --pau

On 28 January 2018 at 07:18, Pau Freixes <pfreixes@gmail.com> wrote:
At a technical level, the biggest problems relate to the way we manipulate frame objects at runtime, including the fact that we expose those frames programmatically for the benefit of debuggers and other tools. More broadly, the current lack of perceived commercial incentives for large corporations to invest millions in offering a faster default Python runtime, the way they have for the other languages you mentioned in your initial post :) Cheers, Nick. P.S. Fortunately for Python users in general, those incentives are in the process of changing, as we see the rise of platforms like AWS Lambda (where vendors and platforms charging by the RAM-second gives a clear financial incentive to investing in software performance improvements). -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Shouldn't this be something that could be tackled with the introduction of a kind of "-g" flag? Asking the user to make explicit that they are willing to have all of this extra information that in normal situations won't be there.
Agree; at least from my understanding, Google has had a lot of initiatives to improve the JS runtime. But at the same time, in these last years and with the rise of asyncio, many companies such as Facebook are implementing their systems on top of CPython, meaning that they are indirectly investing in it.

--
--pau

On 28 January 2018 at 17:35, Pau Freixes <pfreixes@gmail.com> wrote:
This is exactly what some other Python runtimes do, although some of them are also able to be clever about it and detect at runtime if you're doing something that relies on access to frame objects (e.g. PyPy does that). That's one of the biggest advantages of making folks opt-in to code acceleration measures, whether it's using a different interpreter implementation (like PyPy), or using some form of accelerator in combination with CPython (like Cython or Numba): because those tools are opt-in, they don't necessarily need to execute 100% of the software that runs on CPython, they only need to execute the more speed sensitive software that folks actually try to run on them. And because they're not directly integrated into CPython, they don't need to abide by our design and implementation constraints either. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sat, Jan 27, 2018, 23:36 Pau Freixes, <pfreixes@gmail.com> wrote:
I find that's a red herring. There are plenty of massive companies that have relied on Python for performance-critical workloads in timespans measuring in decades and they have not funded core Python development or the PSF in a way even approaching the other languages Python was compared against in the original email. It might be the feeling of community ownership that keeps companies from making major investments in Python, but regardless it's important to simply remember the core devs are volunteers so the question of "why hasn't this been solved" usually comes down to "lack of volunteer time". -Brett

Maybe it is happening, but not in the way that you would expect:
https://mail.python.org/pipermail/python-dev/2018-January/152029.html

Anyway, do we conclude, or at least a significant part of us, that this is something desirable but some constraints do not allow work on it?

Also, more technically, I would like to have your point of view on two questions, sorry if they sound kind of stupid.

1) Is CPython 4 a good place to start thinking about making the default execution mode non-debuggable? Having an explicit -g option that is disabled by default, shouldn't that open a window for changing many things behind the scenes?

2) Regarding Yury's proposal to cache builtin functions, why can't this strategy be used for objects and their attributes within the function scope? Technically, what is the red flag?

Cheers,

El 29/01/2018 18:10, "Brett Cannon" <brett@python.org> escribió:

On Thu, 1 Feb 2018 at 07:34 Pau Freixes <pfreixes@gmail.com> wrote:
Maybe it is happening but not in the way that you would expect
https://mail.python.org/pipermail/python-dev/2018-January/152029.html
As one of the people who works at Microsoft and has Steve as a teammate I'm well aware of what MS contributes. :) My point is even with the time Steve, me, and our fellow core devs at MS get to spend on Python, it still pales in comparison to what some other languages get with dedicated staffing.
Anyway, do we conclude, or at least a significant part of us, that this is something desirable but some constraints do not allow work on it?
I'm not sure what you're referencing as "something desirable", but I think we all want to see Python improve if possible.
Don't view Python 4 as a magical chance to do a ton of breaking changes like Python 3.
Descriptors are the issue for attributes. After that it's a question of whether it's worth the overhead of other scope levels (built-ins are somewhat unique in that they are very rarely changed). The key point is that all of this requires people's time and we just don't have tons of that available at the moment. -Brett
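A tiny illustration of why even builtin lookups can't be cached unconditionally: builtins are rarely rebound, but "rarely" is not "never", so any cache needs a guard that notices changes like this:

```python
import builtins

data = [1, 2, 3]
print(len(data))  # 3, via the real builtin

# Anyone can rebind a builtin at runtime, so a naive cache would go stale.
real_len = builtins.len
builtins.len = lambda obj: 42
hijacked = len(data)  # the plain name lookup now finds the replacement
builtins.len = real_len  # restore the real builtin

print(hijacked)   # 42 -- the rebound builtin was picked up
print(len(data))  # 3 again after restoring
```

Yury's opcode-cache approach handles this by checking a version tag on the relevant namespaces, which is cheap for builtins but gets progressively more expensive for instance attributes, where descriptors can intervene on every access.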

On Thu, Feb 1, 2018 at 12:55 PM, Brett Cannon <brett@python.org> wrote:
On Thu, 1 Feb 2018 at 07:34 Pau Freixes <pfreixes@gmail.com> wrote:
[..]
I'm not sure I understand Pau's question but I can assure that my optimizations were fully backwards compatible and preserved all of Python attribute lookup semantics. And they made some macrobenchmarks up to 10% faster. Unfortunately I failed at merging them in 3.7. Will do that for 3.8. Yury

I'm not sure I understand Pau's question but I can assure that my optimizations were fully backwards compatible and preserved all of Python attribute lookup semantics. And they made some macrobenchmarks up to 10% faster. Unfortunately I failed at merging them in 3.7. Will do that for 3.8. I was referring to that https://bugs.python.org/issue28158

On Sat, Jan 27, 2018 at 10:14 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
sure, though there have been a few high profile (failed?) efforts, by Google and DropBox, yes? Unladen Swallow was one -- not sure of the code name for the other. Turns out it's really hard :-) And, of course, PyPy is one such successful effort :-) And to someone's point, IIUC, PyPy has been putting a lot of effort into C-API compatibility, to the point where it can run numpy: https://pypy.org/compat.html Who knows -- maybe we will all be running PyPy some day :-) -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Sat, 27 Jan 2018 22:18:08 +0100 Pau Freixes <pfreixes@gmail.com> wrote:
Allow me to disagree. It's true that the extremely dynamic and flexible nature of Python makes it much harder to optimize Python code than, say, PHP code (I'm not entirely sure about this, but still). Still, I do think it's a collective failure that we've (*) made little progress in interpreter optimization in the last 10-15 years, compared to other languages. (*) ("we" == the CPython team here; PyPy is another question) Regards Antoine.

This code seems almost certainly broken as written. Not just suboptimal, but just plain wrong. As a start, it has a return statement outside a function or method body, so it's not even syntactically valid (also, `cnt` is never initialized). But assuming we just print out `cnt`, it still gives an answer we probably don't want.

For example, I wrote get_rules() and get_whatevers() functions that produce a list of "things" that have an x attribute. Each thing holds a (distinct) word from a Project Gutenberg book (Title: Animal Locomotion: Or walking, swimming, and flying, with a dissertation on aëronautics; Author: J. Bell Pettigrew). Whatevers omit a few of the words, since the code suggests there should be more rules. In particular:

    In [1]: len(get_rules()), len(get_whatevers())
    Out[1]: (12306, 12301)

Running the probably wrong code is indeed slow:

    In [2]: %%time
       ...: def filter(rule, whatever):
       ...:     if rule.x in whatever.x:
       ...:         return True
       ...: rules = get_rules()
       ...: whatevers = get_whatevers()
       ...: cnt = 0
       ...: for rule in rules:
       ...:     for whatever in whatevers:
       ...:         if filter(rule, whatever):
       ...:             cnt = cnt + 1
       ...: print(cnt)
    110134
    CPU times: user 53.1 s, sys: 190 ms, total: 53.3 s
    Wall time: 53.6 s

It's hard for me to imagine why this is the question one would want answered. It seems much more likely you'd want to know:

    In [3]: %%time
       ...: len({thing.x for thing in get_rules()} -
       ...:     {thing.x for thing in get_whatevers()})
    CPU times: user 104 ms, sys: 4.89 ms, total: 109 ms
    Wall time: 112 ms
    Out[3]: 5

So that's 500 times faster, more Pythonic, and seems to actually answer the question one would want answered. However, let's suppose there really is a reason to answer the question in the original code.
Using more sensible basic datatypes, we only get about a 3x speedup:

    In [4]: %%time
       ...: rules = {thing.x for thing in get_rules()}
       ...: whatevers = {thing.x for thing in get_whatevers()}
       ...: cnt = 0
       ...: for rule in rules:
       ...:     for whatever in whatevers:
       ...:         if rule in whatever:
       ...:             cnt += 1
       ...: print(cnt)
    110134
    CPU times: user 18.3 s, sys: 96.9 ms, total: 18.4 s
    Wall time: 18.5 s

I'm sure there is room for speedup if the actual problem being solved was better described. Maybe something involving itertools.product(), but it's hard to say without knowing what the correct behavior actually is. Overall this is similar to saying you could implement bogosort in Python, and it would be much slower than calling Timsort with `sorted(my_stuff)`.

Just to add another perspective, I find many "performance" problems in the real world can often be attributed to factors other than the raw speed of the CPython interpreter. Yes, I'd love it if the interpreter were faster, but in my experience a lot of other things dominate. At least they do provide low hanging fruit to attack first. This can be anything from poorly written algorithms, a lack of understanding about the way Python works, use of incorrect or inefficient data structures, doing network accesses or other unpredictable work at import time, etc. The bottom line I think is that you have to measure what you've got in production, and attack the hotspots. For example, I love and can't wait to use Python 3.7's `-X importtime` flag to measure regressions in CLI start up times due to unfortunate things appearing in module globals. But there's something else that's very important to consider, which rarely comes up in these discussions, and that's the developer's productivity and programming experience. One of the things that makes Python so popular and effective I think, is that it scales well in the human dimension, meaning that it's a great language for one person, a small team, and scales all the way up to very large organizations. I've become convinced that things like type annotations helps immensely at those upper human scales; a well annotated code base can help ramp up developer productivity very quickly, and tools and IDEs are available that help quite a bit with that. This is often undervalued, but shouldn't be! Moore's Law doesn't apply to humans, and you can't effectively or cost efficiently scale up by throwing more bodies at a project. Python is one of the best languages (and ecosystems!) that make the development experience fun, high quality, and very efficient. Cheers, -Barry
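For anyone who hasn't tried the flag yet, a minimal invocation looks like this (the timing columns are machine-specific, so they are elided below):

```shell
# Available since Python 3.7; the per-module import report goes to stderr.
python3 -X importtime -c "import json" 2>&1 | head -n 3
# The columns look like:
# import time: self [us] | cumulative | imported package
```

Each imported module gets one line, with nested imports indented under their parent, which makes it easy to spot a module whose import-time work is blowing up CLI start-up.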

On 30 January 2018 at 14:24, Barry Warsaw <barry@python.org> wrote:
I'll also note that one of the things we (and others) *have* been putting quite a bit of time into is the question of "Why do people avoid using extension modules for code acceleration?". While there are definitely still opportunities to speed up the CPython interpreter itself, they're never going to compete for raw speed with the notion of "Let's just opt out of Python's relatively expensive runtime bookkeeping for lower level code units, and use native machine types instead". (Before folks say "But what about PyPy/Numba/etc?": this is what those tools do as well, they're just able to analyse your running code and do it on the fly, rather than having an outer feedback loop of humans doing it explicitly in version control based on benchmarks, flame graphs, and other performance profiling tools) And on that front, things have progressed quite a bit in the past few years: * wheel files & the conda ecosystem mean that precompiled binaries are often readily available for Windows/Mac OS X/Linux x86_64 * tools like pysip and milksnake are working to reduce the combinatorial explosion of potential wheel build targets * tools like Cython work to lower barriers to entry between working with dynamic compilation and working with precompiled systems (this is also where Grumpy fits in, but targeting Go rather than C/C++) * at the CPython interpreter level, we continue to work to reduce the differences between what extension modules can do and what regular source and bytecode modules can do There are still lots of other things it would be nice to have (such as transplanting the notion of JavaScript source maps so that debuggers can more readily map Python pyc files and extension modules back to the corresponding lines of source code), but the idea of "What about precompiling an extension module?" is already markedly less painful than it used to be. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Mon, Jan 29, 2018 at 9:45 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Well, the scientific computing community does do that a lot -- with f2py, Cython, and more recently numba. The current state of the art makes it fairly easy and practical for number crunching (and to a somewhat lesser extent basic text crunching), but not so much for manipulating higher order data structures. For example, running the OP's code through Cython would likely buy you very little performance. I don't think numba would do much for you either (though I don't have real experience with that). PyPy is the only one I know of that is targeting general "Python" code per se.

-CHB
-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

I think that is simple. Those that try give up because it's a difficult API to call correctly. At PYCON UK one speaker explained how she, a PhD-level researcher, had failed to get a C extension working. I was contacted to improve PyCXX by a contractor for the US Army who stated that he was called in to help the internal developers get a big library wrapped for use from Python. After 6 months they were nowhere near working code. He did what was needed with PyCXX in 3 weeks, plus the time getting me to make some nice additions to help him. It seems that if people find a C++ library that will do the heavy lifting, they end up with extensions. But those that attempt the C API as-is seem to fail and give up. It also seems that people do not go looking for the helper libraries. Next year at PYCON I hope to give a talk on PyCXX and encourage people to write extensions.

Barry
PyCXX maintainer.

On 2 February 2018 at 06:15, Barry Scott <barry@barrys-emacs.org> wrote:
Aye, indeed. That's a big part of why we've never had much motivation to fill in the "How to do this by hand" instructions on https://packaging.python.org/guides/packaging-binary-extensions/: it's so hard to get the refcounting and GIL management right by hand that it's almost never the right answer vs either using a dedicated extension-module-writing language like Cython, or writing a normal shared external library and then using a wrapper generator like cffi/SWIG/milksnake, or using a helper library like PySIP/PyCXX/Boost to do the heavy lifting for you.

So while wheels and conda have helped considerably with the cross-platform end user UX of extension modules, there's still a lot of work to be done around the publisher UX, both for existing publishers (to get more tools to work the way PySIP does and allow a single wheel build to target multiple Python versions), and for new publishers (to make the various extension module authoring tools easier to discover, rather than having folks assume that handcrafted calls directly into the CPython C API are their only option).

Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Nick Coghlan schrieb am 02.02.2018 um 06:47:
Or even a competitive option. Tools like Cython or pybind11 go to great lengths to shave off every bit of overhead from C-API calls, commonly replacing high-level C-API functionality with macros and direct access to data structures. The C/C++ code that they generate is so complex and tuned that it would be infeasible to write and maintain something like that by hand, but it can perfectly well be generated, and it usually performs visibly better than most hand-written modules, definitely much better than anything a non-expert could write. Basically, by not learning the C-API you can benefit from all that highly tuned and specialised code written by C-API experts that the documentation doesn't even tell you about. Stefan

On Mon, 19 Feb 2018 20:15:27 +0100 Stefan Behnel <stefan_ml@behnel.de> wrote:
Doesn't the documentation ever mention Cython? It probably should (no idea about pybind11, which I've never played with). Perhaps you can open an issue about that? As a sidenote, you can certainly use Cython without learning the C API, but to extract maximum performance it's better to know the C API anyway, to be aware of what kind of optimizations are available in which situations. Regards Antoine.

On 20 February 2018 at 16:17, Antoine Pitrou <solipsis@pitrou.net> wrote:
We mention them in the Extending & Embedding guide, and link out to the page on packaging.python.org that describes them in more detail: https://docs.python.org/3/extending/index.html#recommended-third-party-tools Cheers, Nick. P.S. There are also a number of open issues at https://github.com/pypa/python-packaging-user-guide/issues regarding additional projects that should be mentioned in https://packaging.python.org/guides/packaging-binary-extensions/ -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Tue, Feb 20, 2018 at 12:54 PM, Barnstone Worthy <barry@barrys-emacs.org> wrote: I'm pretty sure that's an alias for Barry Warsaw. :-) -- --Guido van Rossum (python.org/~guido)

On 21 February 2018 at 08:40, Guido van Rossum <guido@python.org> wrote:
Different Barry :) I've expanded the existing issue at https://github.com/pypa/python-packaging-user-guide/issues/355 to note that there are more options we should at least mention in the binary extensions guide, even if we don't go into the same level of detail as we do for cffi/Cython/SWIG. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 2018-01-29 20:24, Barry Warsaw wrote:
You are quite right. I think, however, that that is precisely why it's important to improve the speed of Python. It's easier to make a good language fast than it is to make a fast language good. It's easier to hack a compiler or an interpreter to run slow code faster than it is to hack the human brain to understand confusing code more easily. So I think the smart move is to take the languages that have intrinsically good design from cognitive/semantic perspective (such as Python) and support that good design with performant implementations. To be clear, this is just me philosophizing since I don't have the ability to do that myself. And I imagine many people on this list already think that anyone who is spending time making JavaScript faster would do better to make Python faster instead! :-) But I think it's an important corollary to the above. Python's excellence in developer-time "speed" is a sort of latent force multiplier that makes execution-time improvements all the more powerful. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown

It's easier to make a good language fast than it is to make a fast language good. It's easier to hack a compiler or an interpreter to run slow code faster than it is to hack the human brain to understand confusing code more easily. So I think the smart move is to take the languages that have intrinsically good design from cognitive/semantic perspective (such as Python) and support that good design with performant implementations.
A lot of smart people have worked on this -- and this is where we are. It turns out that keeping Python's fully dynamic nature while making it run faster is hard! Maybe if more time/money/brains were thrown at CPython, it could be made to run much faster -- but it doesn't look good. -CHB

There are several reasons for the issues you are mentioning.

1. Attribute lookup is much more complicated than you would think.

(If you have the time, watch https://www.youtube.com/watch?v=kZtC_4Ecq1Y - it will explain things better than I can.)

The series of operations that happen with every `obj.attr` occurrence can be complicated. It goes something like:

    def get_attr(obj, attr):
        if attr in obj.__dict__:
            value = obj.__dict__[attr]
            if is_descriptor(value):
                return value.__get__(obj, type(obj))
            else:
                return value
        else:
            for cls in type(obj).mro():
                if attr in cls.__dict__:
                    value = cls.__dict__[attr]
                    if is_descriptor(value):
                        return value.__get__(obj, type(obj))
                    else:
                        return value
            else:
                raise AttributeError('Attribute %s not found' % attr)

(This is only a sketch - in the real lookup, data descriptors found on the type actually take precedence over the instance dict - but it shows the amount of work involved.)

Therefore, the caching means this operation is only done once instead of n times (where n = len(whatevers)).

2. Function calls.

3. Dynamic code makes things harder to optimize.

Python's object model allows for constructs that are very hard to optimize without knowing about the structure of the data ahead of time. For instance, if an attribute is defined by a property, there is no guarantee that obj.attr will return the same thing twice. So in simple terms, the power Python gives you over the language makes it harder to optimize the language.

4. CPython's compiler makes (as a rule) no optimizations.

CPython's compiler is a fairly direct source-to-bytecode compiler, not an actual optimizing compiler. So beyond constant folding and the deletion of some types of debug code, the language isn't going to worry about optimizing things for you.

So in simple terms: of the languages you mentioned, JavaScript's object model is substantially less powerful than Python's, but it also is more straightforward in terms of what obj.attr means, and the other three you mentioned all have statically-typed, optimizing compilers with a straightforward method resolution order.
The things you see as flaws end up being the way Pythonistas can add more dynamic systems into their APIs (and since we don’t have macros, most of our dynamic operations must be done at run-time). - Ed On Jan 26, 2018, 16:36 -0500, Pau Freixes <pfreixes@gmail.com>, wrote:
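To make the cost of point 1 concrete, here is a small self-contained sketch (the `Rule` class and sample data are made up for illustration) showing how hoisting `rule.x` out of the loop replaces a full attribute lookup per iteration with a cheap local-variable read:

```python
class Rule:
    def __init__(self, x):
        self.x = x

rule = Rule("a")
whatevers = ["abc", "xyz", "cba", "bca"]

# Uncached: the full attribute lookup machinery runs on every iteration.
cnt = 0
for whatever in whatevers:
    if rule.x in whatever:
        cnt += 1

# Cached: the lookup happens once; the loop body reads a fast local.
cnt_cached = 0
x = rule.x
for whatever in whatevers:
    if x in whatever:
        cnt_cached += 1

assert cnt == cnt_cached == 3
```

Both loops count the same matches; the only difference is how many times the interpreter has to walk the attribute-lookup path described above.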

On Sat, Jan 27, 2018 at 8:35 AM, Pau Freixes <pfreixes@gmail.com> wrote:
Did you consider using a set instead of a list for your inclusion checks? I don't have the full details of what the code is doing, but the "in" check on a large set can be incredibly fast compared to the equivalent on a list/array.
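A quick sketch of the difference being described here (hypothetical data; exact timings vary by machine):

```python
import timeit

haystack_list = list(range(100_000))
haystack_set = set(haystack_list)
needle = 99_999  # worst case for the list: a full scan

t_list = timeit.timeit(lambda: needle in haystack_list, number=100)
t_set = timeit.timeit(lambda: needle in haystack_set, number=100)

# Membership on a list is a linear O(n) scan; on a set it is an
# O(1) average-case hash probe, typically orders of magnitude faster.
print(f"list: {t_list:.4f}s  set: {t_set:.6f}s")
```

For the original benchmark's `rule.x in whatever.x` test, the same change would apply if `whatever.x` were built as a set instead of a list.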
Are you sure it's the language's fault? Failing to use a better data type simply because some other language doesn't have it is a great way to make a test that's "fair" in the same way that Balance and Armageddon are "fair" in Magic: The Gathering. They reset everyone to the baseline, and the baseline's equal for everyone right? Except that that's unfair to a language that prefers to work somewhere above the baseline, and isn't optimized for naive code. ChrisA

On Sat, Jan 27, 2018 at 10:07 AM, Steven D'Aprano <steve@pearwood.info> wrote:
When you push everyone to an identical low level, you're not truly being fair. Let's say you try to benchmark a bunch of programming languages against each other by having them use no more than four local variables, all integers, one static global array for shared storage, and no control flow other than conditional GOTOs. (After all, that's all you get in some machine languages!) It's perfectly fair, all languages have to compete on the same grounds. But it's also completely UNfair on high level languages, because you're implementing things in terribly bad ways. "Fair" is a tricky concept, and coding in a non-Pythonic way is not truly "fair" to Python. ChrisA

If there are robust and simple optimizations that can be added to CPython, great, but: This mail is the consequence of a true story, a story where CPython
got defeated by Javascript, Java, C# and Go.
at least those last three are statically compiled languages -- they are going to be faster than Python for this sort of thing -- particularly for code written in a non-pythonic style... def filter(rule, whatever):
sure, but I would argue that you do need to write code in a clean style appropriate for the language at hand. For instance, the above creates a function that is a simple one-liner -- there is no reason to do that, and the fact that function calls do have significant overhead in Python is going to bite you. for rule in rules:
"inlining" the filter call is making the code more pythonic and readable -- a no-brainer. I wouldn't call that an optimization. Making rule.x local is an optimization -- that is, the only reason you'd do it is to make the code go faster. How much difference did that really make? I also don't know what type your "whatevers" are, but "x in something" can be O(n) if they're sequences, and using a dict or set would give much better performance. And perhaps collections.Counter would help here, too. In short, it is a non-goal to get Python to run as fast as static languages for simple nested loop code like this :-) The case of the rule cache IMHO is very striking, we have plenty
you can bet it's been considered -- the Python core devs are a pretty smart bunch :-) The fundamental reason is that rule.x could change inside that loop -- so you can't cache it unless you know for sure it won't. -- Again, dynamic language. The case of the slowness to call functions in CPython is quite
recurrent and looks like its an unsolved problem at all.
dynamic language again ... If the default code that you
can write in a language is by default slow and exists an alternative to make it faster, this language is doing something wrong.
yes, that's true -- but your example shouldn't be the default code you write in Python. BTW: PyPy looks like it is immune [1]
[1] https://gist.github.com/pfreixes/d60d00761093c3bdaf29da025a004582
PyPy uses a JIT -- which is the way to make a dynamic language run faster -- That's kind of why it exists.... -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
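The point above about why CPython can't hoist `rule.x` for you can be shown directly: nothing stops `x` from being a property whose value changes between reads. A contrived example (not from the original benchmark):

```python
class Rule:
    def __init__(self):
        self._reads = 0

    @property
    def x(self):
        # Each access can legally return something different, so
        # rule.x is not loop-invariant as far as the interpreter knows.
        self._reads += 1
        return "a" if self._reads % 2 else "b"

rule = Rule()
values = [rule.x for _ in range(4)]
print(values)  # ['a', 'b', 'a', 'b'] -- caching the first read would be wrong
```

Implicitly caching the first read would silently change this program's behaviour, which is why the caching has to stay an explicit, opt-in decision by the developer (or be done by a JIT that can detect and undo the assumption, as PyPy does).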

On 2018-01-26 08:18 PM, Chris Barker wrote:
Java and C#? Statically compiled? Haha. No. Java compiles to bytecode. While yes, Java doesn't need to compile your code before running it, the compilation time in CPython is usually minimal anyway, unless you're using eval. You can precompile your Python into bytecode, but it's usually not worth it. Java can also load bytecode at runtime and do bytecode manipulation stuff. The only "real" benefit of Java is that object layout is pretty much static. (This can be simulated with __slots__, I think? I don't know.) See also, for example: http://luajit.org/ext_ffi.html#cdata (The same goes for C#. I don't know about Go.) (Of course, their JITs do also help. But even with the JIT off, it's still pretty good.)
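A short sketch of the `__slots__` point: declaring slots fixes the set of instance attributes and drops the per-instance `__dict__`, giving a struct-like layout (class names here are made up for illustration):

```python
class Plain:
    def __init__(self):
        self.x = 1

class Slotted:
    __slots__ = ("x",)
    def __init__(self):
        self.x = 1

p, s = Plain(), Slotted()
p.anything = 42                    # plain instances grow attributes freely
assert hasattr(p, "__dict__")
assert not hasattr(s, "__dict__")  # slotted layout is fixed

try:
    s.anything = 42                # undeclared attributes are rejected
except AttributeError as e:
    print("rejected:", e)
```

This is only a partial simulation of Java-style static layout: the attribute values themselves remain fully dynamic, only their names are fixed.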

On 26Jan2018 20:59, Soni L. <fakedme+py@gmail.com> wrote:
However, both Java and Go are statically typed; I think C# is too, but don't know. The compiler has full knowledge of the types of almost every symbol, and can write machine-optimal code for operations (even though the initial target machine is the JVM for Java -- I gather JVM bytecode is also type annotated, so JITs can in turn do a far better job of making machine-optimal machine code when used). This isn't really an option for "pure" Python. Cheers, Cameron Simpson <cs@cskk.id.au> (formerly cs@zip.com.au)

On Fri, Jan 26, 2018 at 02:18:53PM -0800, Chris Barker wrote: [...]
sure, but I would argue that you do need to write code in a clean style appropriate for the language at hand.
Indeed. If you write Java-esque code in Python with lots of deep chains obj.attr.spam.eggs.cheese.foo.bar.baz expecting that the compiler will resolve them at compile-time, your code will be slow. No language is immune from this: it is possible to write bad code in any language, and if you write Pythonesque highly dynamic code using lots of runtime dispatching in Java, your Java benchmarks will be slow too. But having agreed with your general principle, I'm afraid I have to disagree with your specific:
I disagree that there is no reason to write simple "one-liners". As soon as you are calling that one-liner from more than two, or at most three, places, the DRY principle strongly suggests you move it into a function. Even if you're only calling the one-liner from the one place, there can still be reasons to refactor it out into a separate function, such as for testing and maintainability. Function call overhead is a genuine pain-point for Python code which needs to be fast. I'm fortunate that I rarely run into this in practice: most of the time either my code doesn't need to be fast (if it takes 3 ms instead of 0.3 ms, I'm never going to notice the difference) or the function call overhead is trivial compared to the rest of the computation. But it has bit me once or twice, in the intersection of: - code that needs to be as fast as possible; - code that needs to be factored into subroutines; - code where the cost of the function calls is a significant fraction of the overall cost. When all three happen at the same time, it is painful and there's no good solution.
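A rough illustration of that pain point, modelled on the OP's benchmark (made-up data; the exact ratio varies across CPython versions):

```python
import timeit

def rule_matches(x, whatever):
    return x in whatever

def count_with_call(whatevers, x):
    cnt = 0
    for whatever in whatevers:
        if rule_matches(x, whatever):  # one frame setup/teardown per iteration
            cnt += 1
    return cnt

def count_inlined(whatevers, x):
    cnt = 0
    for whatever in whatevers:
        if x in whatever:              # same test, no call overhead
            cnt += 1
    return cnt

whatevers = ["abc"] * 10_000
assert count_with_call(whatevers, "a") == count_inlined(whatevers, "a") == 10_000

t_call = timeit.timeit(lambda: count_with_call(whatevers, "a"), number=50)
t_inline = timeit.timeit(lambda: count_inlined(whatevers, "a"), number=50)
print(f"with call: {t_call:.3f}s  inlined: {t_inline:.3f}s")
```

The two functions are semantically identical; the gap between them is purely the cost of half a million Python-level function calls, which is exactly the tension with DRY described above.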
"inlining" the filter call is making the code more pythonic and readable -- a no brainer. I wouldn't call that a optimization.
In this specific case of "if rule.x in whatever.x", I might agree with you, but if the code is a bit more complex but still a one-liner: if rules[key].matcher.lower() in data[key].person.history: I would much prefer to see it factored out into a function or method. So we have to judge each case on its merits: it isn't a no-brainer that inline code is always more Pythonic and readable.
making rule.x local is an optimization -- that is, the only reason you'd do it to to make the code go faster. how much difference did that really make?
I assumed that rule.x could be a stand-in for a longer, Java-esque chain of attribute accesses.
Indeed. That's a good point. -- Steve

Hi, Well, I wrote the https://faster-cpython.readthedocs.io/ website to answer such questions. See for example https://faster-cpython.readthedocs.io/mutable.html "Everything in Python is mutable". Victor 2018-01-26 22:35 GMT+01:00 Pau Freixes <pfreixes@gmail.com>:

On Friday, January 26, 2018, Victor Stinner <victor.stinner@gmail.com> wrote:
Thanks!

"""
Contents:
- Projects to optimize CPython 3.7 <https://faster-cpython.readthedocs.io/cpython37.html>
- Projects to optimize CPython 3.6 <https://faster-cpython.readthedocs.io/cpython36.html>
- Notes on Python and CPython performance, 2017 <https://faster-cpython.readthedocs.io/notes_2017.html>
- FAT Python <https://faster-cpython.readthedocs.io/fat_python.html>
- Everything in Python is mutable <https://faster-cpython.readthedocs.io/mutable.html>
- Optimizations <https://faster-cpython.readthedocs.io/optimizations.html>
- Python bytecode <https://faster-cpython.readthedocs.io/bytecode.html>
- Python C API <https://faster-cpython.readthedocs.io/c_api.html>
- AST Optimizers <https://faster-cpython.readthedocs.io/ast_optimizer.html>
- Register-based Virtual Machine for Python <https://faster-cpython.readthedocs.io/registervm.html>
- Read-only Python <https://faster-cpython.readthedocs.io/readonly.html>
- History of Python optimizations <https://faster-cpython.readthedocs.io/history.html>
- Misc <https://faster-cpython.readthedocs.io/misc.html>
- Kill the GIL? <https://faster-cpython.readthedocs.io/gil.html>
- Implementations of Python <https://faster-cpython.readthedocs.io/implementations.html>
- Benchmarks <https://faster-cpython.readthedocs.io/benchmarks.html>
- Random notes about PyPy <https://faster-cpython.readthedocs.io/pypy.html>
- Talks <https://faster-cpython.readthedocs.io/talks.html>
- Links <https://faster-cpython.readthedocs.io/links.html>
"""

Pandas & Cython: https://pandas.pydata.org/pandas-docs/stable/enhancingperf.html

"Vaex uses memory mapping, zero memory copy policy and lazy computations for best performance (no memory wasted)." https://github.com/maartenbreddels/vaex

On 27 January 2018 at 07:35, Pau Freixes <pfreixes@gmail.com> wrote:
Not really, as we've seen with the relatively slow adoption of PyPy over the past several years.

CPython, as an implementation, emphasises C/C++ compatibility, and internal interpreter simplicity. That comes at a definite cost in runtime performance (especially where attribute access and function calls are concerned), but has also enabled an enormous orchestration ecosystem, originally around C/C++/FORTRAN components, but now increasingly around Rust components within the same process, as well as out-of-process Java, C#, and JavaScript components. In this usage model, if Python code becomes the throughput bottleneck, it's only because something has gone wrong at the system architecture level.

PyPy, by contrast, emphasises raw speed, sacrificing various aspects of CPython's C/C++ interoperability in order to attain it. It's absolutely the implementation you want to be using if your main concern is the performance of your Python code in general, and there aren't any obvious hotspots that could be more selectively accelerated.

To date, the CPython model of "Use (C)Python to figure out what kind of problem you have, then rewrite your performance bottlenecks in a language more specifically tailored to that problem space" has proven relatively popular. There's likely still more we can do within CPython to make typical code faster without increasing the interpreter complexity too much (e.g. Yury's idea of introducing an implicit per-opcode result cache into the eval loop), but opt-in solutions that explicitly give up some of Python's language level dynamism are always going to be able to do less work at runtime than typical Python code does.

Cheers, Nick.

P.S. You may find https://www.curiousefficiency.org/posts/2015/10/languages-to-improve-your-py... interesting in the context of considering some of the many factors other than raw speed that may influence people's choice of programming language.
Similarly, https://www.curiousefficiency.org/posts/2017/10/considering-pythons-target-a... provides some additional info on the scope of Python's use cases (for the vast majority of which, "How many requests per second can I serve in a naive loop in a CPU bound process?" isn't a particularly relevant characteristic) -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Hi all, I would like to remark that, in my opinion, the question of CPython's performance cannot be decoupled from the extremely wide selection of packages which provide optimized code for almost any imaginable task. For example: JavaScript may be faster than (C)Python on simple benchmarks, but as soon as the task is somewhat amenable to scipy, and I can use scipy in Python, the resulting performance will completely cream JavaScript in a way that isn't funny anymore. And scipy is just an example; there are tons of such libraries for all kinds of tasks. I am not aware of any language ecosystem with a similarly wide scope of packages; at least Java and Node both fall short. (Node may have more packages by number, but the quality is definitely lower and there is tons of overlap.) Stephan 2018-01-27 7:42 GMT+01:00 Nick Coghlan <ncoghlan@gmail.com>:
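To illustrate Stephan's point: once a workload is amenable to the numeric stack, handing the inner loop to a compiled library dwarfs any interpreter-level difference. A minimal sketch, assuming NumPy is installed (scipy builds on the same machinery):

```python
import numpy as np

def sum_squares_python(n):
    # Interpreted loop: one round of bytecode dispatch per element.
    total = 0
    for i in range(n):
        total += i * i
    return total

def sum_squares_numpy(n):
    # The whole loop runs in compiled C inside NumPy.
    a = np.arange(n, dtype=np.int64)
    return int((a * a).sum())

assert sum_squares_python(10_000) == sum_squares_numpy(10_000)
```

The two functions compute the same value, but the NumPy version does the per-element work outside the interpreter entirely, which is why "slow interpreter" benchmarks often say little about real numeric workloads in Python.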

Hi,

Thanks to all of you for your responses, the points of view and the information that you shared to back up your rationales. I had some time to visit a few of them, and I will try to find the time to review all of them.

It's hard to keep the discussion organized by responding to each message individually, so if you don't mind I'll do it with just this email. If you believe that I'm missing something important, shoot.

First of all, my fault for starting the discussion on the language-battle side; this didn't help focus the conversation on the point that I wanted to discuss. The intention was to raise two use cases which both have a performance cost that can be explicitly circumvented by the developer, taking into account that both are, let's say, well known by the community.

Correct me if I'm wrong, but most of you argue that the very Zen of Python - can we call it mutability [1]? as Victor pointed out - which gives the user the freedom to mutate objects at runtime, goes in the opposite direction of allowing the *compiler* to generate optimized code. Or, more specifically for the ceval - *interpreter*? - to apply some hacks that would help to reduce the footprint of some operations.

I'm wondering if a solution might be to have something like [2] but for generic attributes. Would that be possible? Has it been discussed before? Is there any red flag that you think would make a well-balanced solution too complicated?

Regarding the cost of calling a function, which I guess is not related to the previous stuff, what is the impediment right now to making it faster?

[1] https://faster-cpython.readthedocs.io/mutable.html
[2] https://bugs.python.org/issue28158

On Fri, Jan 26, 2018 at 10:35 PM, Pau Freixes <pfreixes@gmail.com> wrote:
-- --pau
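Regarding [2] (caching LOAD_GLOBAL lookups): the long-standing manual workaround is to bind a global or builtin into a default argument, so the function body does a fast local read instead of a dict lookup on every iteration. A sketch of the pattern only, not an endorsement (and note that newer CPython versions cache LOAD_GLOBAL adaptively, which shrinks this gap):

```python
def total_len(items):
    total = 0
    for item in items:
        total += len(item)       # LOAD_GLOBAL "len" on each iteration
    return total

def total_len_cached(items, len=len):  # builtin bound once, at def time
    total = 0
    for item in items:
        total += len(item)       # now a fast read of the local "len"
    return total

items = ["abc"] * 1_000
assert total_len(items) == total_len_cached(items) == 3_000
```

The proposal in [2] is essentially about making this kind of caching happen automatically and safely inside the interpreter, without the API wart of a fake keyword argument.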

On 28 January 2018 at 07:18, Pau Freixes <pfreixes@gmail.com> wrote:
At a technical level, the biggest problems relate to the way we manipulate frame objects at runtime, including the fact that we expose those frames programmatically for the benefit of debuggers and other tools. More broadly, the current lack of perceived commercial incentives for large corporations to invest millions in offering a faster default Python runtime, the way they have for the other languages you mentioned in your initial post :) Cheers, Nick. P.S. Fortunately for Python users in general, those incentives are in the process of changing, as we see the rise of platforms like AWS Lambda (where vendors and platforms charging by the RAM-second gives a clear financial incentive to investing in software performance improvements). -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Shouldn't this be something that could be tackled with the introduction of a kind of "-g" flag? Asking the user to make explicit that they are willing to have all of this extra information that in normal situations won't be there.
Agreed; at least from my understanding, Google has had a lot of initiatives to improve the JS runtime. But at the same time, in these last years and with the irruption of asyncio, many companies such as Facebook are implementing their systems on top of CPython, meaning that they are indirectly investing in it. -- --pau

On 28 January 2018 at 17:35, Pau Freixes <pfreixes@gmail.com> wrote:
This is exactly what some other Python runtimes do, although some of them are also able to be clever about it and detect at runtime if you're doing something that relies on access to frame objects (e.g. PyPy does that). That's one of the biggest advantages of making folks opt-in to code acceleration measures, whether it's using a different interpreter implementation (like PyPy), or using some form of accelerator in combination with CPython (like Cython or Numba): because those tools are opt-in, they don't necessarily need to execute 100% of the software that runs on CPython, they only need to execute the more speed sensitive software that folks actually try to run on them. And because they're not directly integrated into CPython, they don't need to abide by our design and implementation constraints either. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sat, Jan 27, 2018, 23:36 Pau Freixes, <pfreixes@gmail.com> wrote:
I find that's a red herring. There are plenty of massive companies that have relied on Python for performance-critical workloads in timespans measured in decades, and they have not funded core Python development or the PSF in a way even approaching the other languages Python was compared against in the original email. It might be the feeling of community ownership that keeps companies from making major investments in Python, but regardless it's important to simply remember that the core devs are volunteers, so the question of "why hasn't this been solved" usually comes down to "lack of volunteer time". -Brett

Maybe it is happening, but not in the way that you would expect: https://mail.python.org/pipermail/python-dev/2018-January/152029.html

Anyway, do we conclude, or at least a significant part of us, that this is something desirable but some constraints don't allow work on it?

Also, more technically, I would like to have your point of view on two questions - sorry if these sound kind of stupid.

1) Is CPython 4 a good place to start thinking about making the default execution mode less debuggable? Having an explicit -g option that is disabled by default - wouldn't that open a window for changing many things behind the scenes?

2) Regarding Yury's proposal to cache builtin functions, why can't this strategy be used for objects and their attributes within the function scope? Technically, which is the red flag?

Cheers,

On 29/01/2018 18:10, "Brett Cannon" <brett@python.org> wrote:
participants (22)
- Antoine Pitrou
- Barry
- Barry Scott
- Barry Warsaw
- Brendan Barnwell
- Brett Cannon
- Cameron Simpson
- Chris Angelico
- Chris Barker
- Chris Barker - NOAA Federal
- David Mertz
- Edward Minnix
- Guido van Rossum
- Nick Coghlan
- Pau Freixes
- Soni L.
- Stefan Behnel
- Stephan Houben
- Steven D'Aprano
- Victor Stinner
- Wes Turner
- Yury Selivanov