Re: [Python-Dev] Speeding up CPython 5-10%

Hi Yury, (Sorry for misspelling your name previously!)
> Yes, we'll need to add CALL_METHOD{_VAR|_KW|etc} opcodes to optimize all kinds of method calls. However, I'm not sure how big the impact will be; I need to do more benchmarking.
I never did such fine-grained analysis with MicroPython. I don't think there are enough uses of * and ** for it to be worth it, but there are definitely lots of uses of plain keywords. Also, you'd want to consider how simple or complex it is to treat all these different opcodes in the compiler. For us, it's simpler to treat everything the same. Otherwise your LOAD_METHOD part of the compiler will need to peek deep into the AST to see what kind of call it is.
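The LOAD_METHOD/CALL_METHOD idea discussed here can be illustrated in pure Python. This is only a minimal sketch of the concept, not code from CPython or MicroPython: `load_method`, `call_method` and `Greeter` are hypothetical names, it assumes no custom `__getattribute__`, and it covers only positional arguments (the keyword, * and ** cases are exactly what the extra opcode variants would be for).

```python
import types


def load_method(obj, name):
    """Look up obj.name without building a bound method when that is safe.

    Returns (callable, receiver). receiver is obj when the plain function
    found on the class can be called directly, otherwise None.
    """
    func = None
    for klass in type(obj).__mro__:
        if name in klass.__dict__:
            func = klass.__dict__[name]
            break
    instance_dict = getattr(obj, "__dict__", {})
    # Plain functions are non-data descriptors, so an instance attribute of
    # the same name would shadow them; only take the fast path if none exists.
    if isinstance(func, types.FunctionType) and name not in instance_dict:
        return func, obj             # fast path: no bound method allocated
    return getattr(obj, name), None  # slow path: ordinary attribute lookup


def call_method(func, receiver, *args):
    """Call what load_method returned, passing the receiver when needed."""
    if receiver is None:
        return func(*args)
    return func(receiver, *args)


class Greeter:
    def hello(self, who):
        return "hello " + who


g = Greeter()
func, receiver = load_method(g, "hello")
result = call_method(func, receiver, "world")
print(result)
```

The saving in a real interpreter would come from the fast path: the function and the receiver sit on the stack separately, so no temporary bound-method object has to be created for each call.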
> BTW, how do you benchmark MicroPython?
Haha, good question! Well, we use Pystone 1.2 (unmodified) to do basic benchmarking, and find it to be quite good. We track our code live at: http://micropython.org/resources/code-dashboard/

You can see there the red line, which is the Pystone result. There was a big jump around Jan 2015 which is when we introduced opcode dictionary caching. And since then it's been very gradually increasing due to small optimisations here and there.

Pystone is actually a great benchmark for embedded systems because it gives very reliable results there (almost zero variation across runs) and if we can squeeze 5 more Pystones out with some change then we know that it's a good optimisation (for efficiency at least). For us, low RAM usage and small code size are the most important factors, and we track those meticulously. But in fact, smaller code size quite often correlates with more efficient code because there's less to execute and it fits in the CPU cache (at least on the desktop).

We do have some other benchmarks, but they are highly specialised for us. For example, how fast can you bit bang a GPIO pin using pure Python code. Currently we get around 200kHz on a 168MHz MCU, which shows that pure (Micro)Python code is about 100 times slower than C.
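The bit-banging figures can be turned into a rough cycle budget. The sketch below only uses the numbers quoted above (200kHz, 168MHz, about 100x); treating 200kHz as the frequency of the generated square wave, and the variable names themselves, are assumptions made for illustration.

```python
# Figures quoted in the message above.
cpu_hz = 168_000_000   # MCU clock
python_hz = 200_000    # GPIO frequency reached from pure (Micro)Python
slowdown_vs_c = 100    # "about 100 times slower than C"

# CPU cycles the Python loop spends per output period.
cycles_per_period_python = cpu_hz / python_hz

# What the ~100x figure implies for an equivalent C loop.
implied_c_hz = python_hz * slowdown_vs_c
cycles_per_period_c = cpu_hz / implied_c_hz

print(f"Python: {cycles_per_period_python:.0f} cycles per period")
print(f"C (implied): {implied_c_hz / 1e6:.0f} MHz, "
      f"{cycles_per_period_c:.1f} cycles per period")
```

So the Python loop has a budget of several hundred cycles per period, while the implied C loop would need only a handful, which is consistent with a tight register-write loop.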
> That's a neat idea! You're right, it does require bytecode to become writeable. I considered implementing a similar strategy, but this would be a big change for CPython. So I decided to minimize the impact of the patch and leave the opcodes untouched.
I think you need to consider "big" changes, especially ones like this that can have a great (and good) impact. But really, this is a behind-the-scenes change that *should not* affect end users, and so you should not have any second thoughts about doing it.

One problem I see with CPython is that it exposes way too much to the user (both Python programmer and C extension writer) and this hurts both language evolution (you constantly need to provide backwards compatibility) and ability to optimise.

Cheers,
Damien.

Damien,

On 2016-01-27 4:20 PM, Damien George wrote:
> Hi Yury,
> (Sorry for misspelling your name previously!)
NP. As long as the first letter is "y" I don't care ;)
>> Yes, we'll need to add CALL_METHOD{_VAR|_KW|etc} opcodes to optimize all kinds of method calls. However, I'm not sure how big the impact will be; I need to do more benchmarking.
> I never did such fine-grained analysis with MicroPython. I don't think there are enough uses of * and ** for it to be worth it, but there are definitely lots of uses of plain keywords. Also, you'd want to consider how simple or complex it is to treat all these different opcodes in the compiler. For us, it's simpler to treat everything the same. Otherwise your LOAD_METHOD part of the compiler will need to peek deep into the AST to see what kind of call it is.
>> BTW, how do you benchmark MicroPython?
> Haha, good question! Well, we use Pystone 1.2 (unmodified) to do basic benchmarking, and find it to be quite good. We track our code live at:
> http://micropython.org/resources/code-dashboard/
The dashboard is cool! Off-topic: have you ever tried hg.python.org/benchmarks, or compared MicroPython with CPython? I'm curious whether MicroPython is faster; if it is, we'll try to copy some optimization ideas.
> You can see there the red line, which is the Pystone result. There was a big jump around Jan 2015 which is when we introduced opcode dictionary caching. And since then it's been very gradually increasing due to small optimisations here and there.
Do you use opcode dictionary caching only for LOAD_GLOBAL-like opcodes? Do you have an equivalent of LOAD_FAST, or do you use dicts to store local variables?
>> That's a neat idea! You're right, it does require bytecode to become writeable. I considered implementing a similar strategy, but this would be a big change for CPython. So I decided to minimize the impact of the patch and leave the opcodes untouched.
> I think you need to consider "big" changes, especially ones like this that can have a great (and good) impact. But really, this is a behind-the-scenes change that *should not* affect end users, and so you should not have any second thoughts about doing it.
If we change the opcode size, it will probably affect libraries that compose or modify code objects. Modules like "dis" will also need to be updated. And that's probably just the tip of the iceberg.

We can still implement your approach if we add a separate private 'unsigned char' array to each code object, so that LOAD_GLOBAL can store the key offsets. It should be a bit faster than my current patch, since it has one less level of indirection. But this way we lose the ability to optimize LOAD_METHOD, simply because it requires more memory for its cache. In any case, I'll experiment!
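A toy model may make the "private array of key offsets" idea more concrete. This is a minimal sketch under stated assumptions, not the actual CPython patch or the MicroPython implementation: `SlotTable`, `CodeObject`, `slot_hints` and `load_global` are hypothetical names, the table is a simple fixed-size open-addressing hash table without deletion, and each instruction gets one byte that remembers the slot where its name was last found.

```python
class SlotTable:
    """Toy open-addressing hash table (linear probing, fixed size, no deletion)."""

    def __init__(self, size=64):
        self.keys = [None] * size
        self.values = [None] * size

    def _probe(self, key):
        size = len(self.keys)
        i = hash(key) % size
        for _ in range(size):
            if self.keys[i] is None or self.keys[i] == key:
                return i
            i = (i + 1) % size
        raise MemoryError("table full")

    def store(self, key, value):
        i = self._probe(key)
        self.keys[i] = key
        self.values[i] = value

    def find(self, key):
        """Return the slot index holding key, or raise KeyError."""
        i = self._probe(key)
        if self.keys[i] is None:
            raise KeyError(key)
        return i


class CodeObject:
    """Holds the side array: one 'unsigned char' hint per instruction."""

    def __init__(self, n_instructions):
        self.slot_hints = bytearray(n_instructions)
        self.hits = 0
        self.misses = 0


def load_global(code, offset, name, table):
    """Try the remembered slot first; fall back to a full lookup on a miss."""
    hint = code.slot_hints[offset]
    if hint < len(table.keys) and table.keys[hint] == name:
        code.hits += 1
        return table.values[hint]
    code.misses += 1
    i = table.find(name)
    code.slot_hints[offset] = i & 0xFF  # a stale or truncated hint only costs a miss
    return table.values[i]


globals_table = SlotTable()
globals_table.store("answer", 42)
code = CodeObject(n_instructions=4)
for _ in range(3):
    value = load_global(code, 0, "answer", globals_table)
print(value, code.hits, code.misses)
```

Because the hint is always validated against the key before use, a wrong hint can never return a wrong value; it only forces the slow path. One byte per instruction is enough for a slot hint, which is why a cache that needs more state per call site would not fit this scheme.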
> One problem I see with CPython is that it exposes way too much to the user (both Python programmer and C extension writer) and this hurts both language evolution (you constantly need to provide backwards compatibility) and ability to optimise.
Right. Even though CPython explicitly states that opcodes and code objects might change in the future, we still have to be careful about changing them.

Yury
Participants (2): Damien George, Yury Selivanov