Damien George mentioned [1] that in MicroPython there are 16 dedicated
opcodes for loading from positions 0-15, and 16 for storing to these
positions. This not only makes the bytecode more compact, but might make
CPython slightly faster too.
I have collected statistics on the opcodes generated by CPython while
running the compileall module on the Lib/ directory. The raw data file is
attached.
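For anyone who wants to reproduce similar numbers, something along these
lines should be close (this is a simplified sketch, not the script that
produced the attached data; it compiles the sources directly instead of
hooking into compileall):

    # Simplified sketch: compile each stdlib source and count
    # (opcode, argument) pairs, recursing into nested code objects.
    import collections, dis, glob

    counts = collections.Counter()

    def walk(code):
        for instr in dis.get_instructions(code):
            counts[(instr.opname, instr.arg)] += 1
        for const in code.co_consts:
            if hasattr(const, 'co_code'):   # nested functions, classes, ...
                walk(const)

    for path in glob.glob('Lib/**/*.py', recursive=True):
        try:
            with open(path, 'rb') as f:
                walk(compile(f.read(), path, 'exec'))
        except SyntaxError:
            pass

    for (name, arg), n in counts.most_common(16):
        print(name, arg, n)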
Here are the top 16 most used opcodes. Column 4 is the percentage among
all opcodes. Columns 5-9 are the percentages of opcodes whose argument
exceeds the specified width.
 #   op  name              %      >2bit  >3bit  >4bit  >8bit  >16bit
 1.  100 LOAD_CONST        22.2   57.8   39.7   25.8   3.6    0
 2.  124 LOAD_FAST         14.7   19.3   5.95   1.04   0      0
 3.  106 LOAD_ATTR         10.6   49.5   23.1   6.61   0      0
 4.  131 CALL_FUNCTION     10.3   8.54   5.68   5.6    5.6    0
 5.  116 LOAD_GLOBAL       5.8    39.2   16.7   4.21   0      0
 6.    1 POP_TOP           5.11
 7.   90 STORE_NAME        4.04   69.5   50.4   33.1   4.03   0
 8.  125 STORE_FAST        3.8    39.4   12.8   2.24   0      0
 9.   83 RETURN_VALUE      2.68
10.  132 MAKE_FUNCTION     2.16   1.33   0.861  0.782  0.776  0.23
11.  101 LOAD_NAME         1.81   64     49.9   35.3   2.35   0
12.  114 POP_JUMP_IF_FALSE 1.39   100    100    99.5   27.4   0
13.  102 BUILD_TUPLE       1.28   12.4   2.54   0.211  0      0
14.  107 COMPARE_OP        1.15   56.6   36.3   0      0      0
15.   87 POP_BLOCK         0.89
16.  110 JUMP_FORWARD      0.734  75.1   69.6   58.2   2.36   0
13.7% of opcodes don't have an argument. No argument exceeds 24 bits.
Only 0.005% of opcodes have an argument exceeding 16 bits and therefore
requiring EXTENDED_ARG (these opcodes are MAKE_FUNCTION and MAKE_CLOSURE).
Only 2% of opcodes have an argument exceeding 8 bits; these are mostly
jump opcodes.
More than every fifth opcode is LOAD_CONST. Only a few files, such as
Lib/html/__init__.py or Lib/locale.py, contain LOAD_CONST with an
argument larger than 1024; they contain a large number of constants.
Arguments larger than 256 occur mostly in encoding files, for
constructing mappings that are never used.
The mean size of an opcode is 2.73 bytes.
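That figure is consistent with the current encoding if you assume 1 byte
per argumentless opcode, 3 bytes per opcode with an argument, and 3 extra
bytes per EXTENDED_ARG prefix:

    # Rough sanity check of the 2.73-byte mean under the current encoding:
    # 1 byte without an argument, 3 bytes with one, plus 3 bytes for each
    # EXTENDED_ARG prefix (needed by about 0.005% of opcodes).
    no_arg = 0.137
    ext_arg = 0.00005
    print(round(no_arg * 1 + (1 - no_arg) * 3 + ext_arg * 3, 2))   # 2.73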
Here are the top 32 most used opcodes with arguments:
# op name arg % cum%
1. 124 LOAD_FAST 0 5.76 5.76
2. 131 CALL_FUNCTION 1 4.27 10.03
3. 124 LOAD_FAST 1 3.10 13.13
4. 131 CALL_FUNCTION 2 2.87 16.00
5. 100 LOAD_CONST 1 2.81 18.80
6. 100 LOAD_CONST 0 2.67 21.47
7. 100 LOAD_CONST 2 2.20 23.67
8. 132 MAKE_FUNCTION 0 1.98 25.64
9. 124 LOAD_FAST 2 1.86 27.51
10. 100 LOAD_CONST 3 1.68 29.19
11. 131 CALL_FUNCTION 0 1.54 30.72
12. 106 LOAD_ATTR 0 1.44 32.16
13. 106 LOAD_ATTR 1 1.39 33.55
14. 106 LOAD_ATTR 2 1.36 34.91
15. 116 LOAD_GLOBAL 0 1.34 36.24
16. 100 LOAD_CONST 4 1.31 37.56
17. 124 LOAD_FAST 3 1.18 38.73
18. 106 LOAD_ATTR 3 1.17 39.91
19. 100 LOAD_CONST 5 1.09 41.00
20. 125 STORE_FAST 1 0.95 41.94
21. 106 LOAD_ATTR 4 0.94 42.88
22. 116 LOAD_GLOBAL 1 0.93 43.81
23. 100 LOAD_CONST 6 0.89 44.69
24. 124 LOAD_FAST 4 0.78 45.48
25. 131 CALL_FUNCTION 3 0.76 46.23
26. 106 LOAD_ATTR 5 0.74 46.98
27. 125 STORE_FAST 2 0.74 47.72
28. 100 LOAD_CONST 7 0.73 48.45
29. 116 LOAD_GLOBAL 2 0.73 49.18
30. 102 BUILD_TUPLE 2 0.68 49.86
31. 106 LOAD_ATTR 6 0.62 50.47
32. 100 LOAD_CONST 8 0.61 51.08
There are two ways to make Python opcodes more compact.
1. Add 1-byte dedicated opcodes for the most used opcodes with arguments.
If we add 32 short opcodes for the 6 most popular opcodes (LOAD_CONST and
LOAD_ATTR with arguments from 0 to 7, and LOAD_FAST, STORE_FAST,
LOAD_GLOBAL and CALL_FUNCTION with arguments from 0 to 3), this will
decrease the mean size of an opcode from 2.73 to 1.75 bytes (a factor of
1.56). It might also make decoding 49% of opcodes slightly faster, since
no argument has to be decoded for them.
This is a relatively simple and largely compatible way, but it is not
fully compatible: we would have to either shift the numbers of opcodes
with arguments to free space for the additional opcodes, or add a new
range of argumentless opcodes at the end of the 8-bit space. Third-party
tools that work with CPython bytecode would have to be updated.
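To make the idea concrete, here is a hypothetical sketch of how the 32
short opcodes could be laid out (the opcode numbers are made up; this is
an illustration, not a patch):

    # Hypothetical layout: each short opcode stands for an existing opcode
    # plus an implied argument, so no argument bytes follow it.
    SHORT_OPS = {}
    _op = 160                          # made-up start of a free opcode range
    for base, nargs in [('LOAD_CONST', 8), ('LOAD_ATTR', 8),
                        ('LOAD_FAST', 4), ('STORE_FAST', 4),
                        ('LOAD_GLOBAL', 4), ('CALL_FUNCTION', 4)]:
        for arg in range(nargs):
            SHORT_OPS[_op] = (base, arg)   # e.g. 160 -> ('LOAD_CONST', 0)
            _op += 1
    # ceval would dispatch SHORT_OPS[op] exactly like the base opcode with
    # that argument, but the instruction occupies a single byte.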
2. Use 16-bit opcodes as in WPython.
This will decrease the mean size of an opcode from 2.73 to 2.044 bytes (a
factor of 1.33), but it might make decoding 85% of opcodes slightly
faster. Raymond showed a roughly 10+% speedup in code exercising simple
and common opcodes that have an oparg, just from optimizing the oparg
decoding [2]. Using 16-bit opcodes with an embedded oparg might have an
even larger effect.
This change needs a more serious reworking of the compiler and the
interpreter, not only of the disassembler and third-party tools.
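For illustration, decoding such a 16-bit stream could look roughly like
this on the Python side, assuming the low byte holds the opcode and the
high byte the argument, with EXTENDED_ARG prefixes for anything wider
than 8 bits (a sketch of the layout, not code from an implementation):

    import dis

    def decode_words(code_bytes):
        result = []
        prefix = 0
        for i in range(0, len(code_bytes), 2):
            op, arg = code_bytes[i], code_bytes[i + 1] | prefix
            if op == dis.EXTENDED_ARG:     # accumulate wide arguments
                prefix = arg << 8
                continue
            prefix = 0
            result.append((i, dis.opname[op], arg))
        return result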
[1] http://permalink.gmane.org/gmane.comp.python.devel/156162
[2] http://bugs.python.org/issue25823
One defect of a mailing list is the difficulty of viewing a weighted average of opinions. The benefit is that anyone can voice an opinion. This is more like the Senate than the House -- Rhode Island appears (on paper) to have as much influence as California. Luckily, we have a form of President. I'm guessing a House occurs in a more private mode of communication?
Perhaps as the community gets larger, a system like StackOverflow might be a better tool for handling things like Python-Ideas.
> On Jan 27, 2016, at 12:58 PM, Sjoerd Job Postmus <sjoerdjob(a)sjec.nl> wrote:
> (not sure if I even have the right to vote here, given that I'm not a
> core developer, but just giving my opinion)
On Feb 4, 2016, at 10:55, Serhiy Storchaka <storchaka(a)gmail.com> wrote:
>> On 04.02.16 19:27, Sven R. Kunze wrote:
>> On 04.02.2016 17:55, Serhiy Storchaka wrote:
>>>>> 1. Add 1-byte dedicated opcodes for most used opcodes with arguments.
>>
>> Do you have a working CPython to showcase this?
>
> Not yet. The interpreter is easy, but the compiler is more complicated in this case.
>
>>>>> 2. Use 16-bit opcodes as in WPython.
>>
>> Same question as above + does it make sense to combine those two options?
>
> Of course not. Having some opcodes be 8-bit kills the benefit of making all opcodes 16-bit.
>
> This option needs more code to be rewritten, but the resulting code can be simpler. I'm afraid there might be a problem with bootstrapping and generating frozen modules.
I just tried it out to see what's hard. And it seems like it's all easy. I was able to get it working (not passing all tests, of course--you'd have to fix dis, pdb, etc. for that) on my lunch hour.
I didn't try to be smart anywhere: every opcode gets 8 bits for an arg, and uses up to three EXTENDED_ARG ops if it needs more. I didn't do any simplification, nor did I cast the bytecode to unsigned short *.
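In other words, roughly this splitting rule, written out as a sketch
rather than the actual patch:

    # One (opcode, arg-byte) word per instruction, with up to three
    # EXTENDED_ARG words prepended when the argument is wider than 8 bits.
    import dis

    def emit(opcode, arg):
        words = []
        for shift in (24, 16, 8):
            chunk = (arg >> shift) & 0xFF
            if chunk or words:             # skip leading zero prefixes
                words.append((dis.EXTENDED_ARG, chunk))
        words.append((opcode, arg & 0xFF))
        return words

    # emit(dis.opmap['LOAD_CONST'], 300) -> [(EXTENDED_ARG, 1), (LOAD_CONST, 44)]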
The hardest changes are in ceval (plus frameobject--I didn't realize lasti has to start at -2 instead of -1 until I tried to run it and immediately got errors about unknown opcodes at odd offsets).
The compiler is pretty easy by comparison.
Nothing at all has to be done for bootstrap, freeze, marshal, etc. (unless you want to add backward-compat code to load non-word-based .pyc files).
You'd also need to fix at least peephole (I just disabled it), dis, and pdb, but none of them look too hard.
The bootstrap bytecode and stdlib .pyc files end up about 5% smaller (and that's with the peephole optimizer disabled, so presumably it would be better in real life).
If anyone wants to look at or play with the code, I'm in the process of regenerating my github fork, but it'll be branch wpy on https://github.com/abarnert/cpython once that's done (maybe 20 minutes? I forget how long cloning takes).