Damien George mentioned [1] that in MicroPython there are 16 dedicated opcodes for loading from positions 0-15, and 16 for storing to these positions. This not only makes a bytecode more compact, but might make CPython slightly faster too.

I have collected statistics for opcodes generated in CPython while running the compileall module on the Lib/ directory. The raw data file is attached. Here are the top 16 most used opcodes. Column 4 is the percentage among all opcodes. Columns 5-9 are the percentages of those opcodes whose argument exceeds the specified width.

     #   op  name                  %   >2bit  >3bit  >4bit  >8bit  >16bit
     1. 100  LOAD_CONST         22.2    57.8   39.7   25.8    3.6    0
     2. 124  LOAD_FAST          14.7    19.3    5.95   1.04   0      0
     3. 106  LOAD_ATTR          10.6    49.5   23.1    6.61   0      0
     4. 131  CALL_FUNCTION      10.3     8.54   5.68   5.6    5.6    0
     5. 116  LOAD_GLOBAL         5.8    39.2   16.7    4.21   0      0
     6.   1  POP_TOP             5.11
     7.  90  STORE_NAME          4.04   69.5   50.4   33.1    4.03   0
     8. 125  STORE_FAST          3.8    39.4   12.8    2.24   0      0
     9.  83  RETURN_VALUE        2.68
    10. 132  MAKE_FUNCTION       2.16    1.33   0.861  0.782  0.776  0.23
    11. 101  LOAD_NAME           1.81   64     49.9   35.3    2.35   0
    12. 114  POP_JUMP_IF_FALSE   1.39  100    100     99.5   27.4    0
    13. 102  BUILD_TUPLE         1.28   12.4    2.54   0.211  0      0
    14. 107  COMPARE_OP          1.15   56.6   36.3    0      0      0
    15.  87  POP_BLOCK           0.89
    16. 110  JUMP_FORWARD        0.734  75.1   69.6   58.2    2.36   0

13.7% of opcodes don't have an argument. No argument exceeds 24 bits. Only 0.005% of opcodes have an argument exceeding 16 bits and requiring EXTENDED_ARG (these opcodes are MAKE_FUNCTION and MAKE_CLOSURE). Only 2% of opcodes have an argument exceeding 8 bits; these are mostly jumping opcodes.

More than every 5th opcode is LOAD_CONST. Only a few files, like Lib/html/__init__.py or Lib/locale.py, contain a LOAD_CONST with an argument larger than 1024; they contain a large number of constants. Arguments larger than 256 are used mostly in encoding files, for constructing mappings that are never used.

The mean size of an opcode is 2.73 bytes.

Here is the top 32 of most used opcodes with arguments:

     #   op  name           arg     %   cum%
     1. 124  LOAD_FAST        0   5.76   5.76
     2. 131  CALL_FUNCTION    1   4.27  10.03
     3. 124  LOAD_FAST        1   3.10  13.13
     4. 131  CALL_FUNCTION    2   2.87  16.00
     5. 100  LOAD_CONST       1   2.81  18.80
     6. 100  LOAD_CONST       0   2.67  21.47
     7. 100  LOAD_CONST       2   2.20  23.67
     8. 132  MAKE_FUNCTION    0   1.98  25.64
     9. 124  LOAD_FAST        2   1.86  27.51
    10. 100  LOAD_CONST       3   1.68  29.19
    11. 131  CALL_FUNCTION    0   1.54  30.72
    12. 106  LOAD_ATTR        0   1.44  32.16
    13. 106  LOAD_ATTR        1   1.39  33.55
    14. 106  LOAD_ATTR        2   1.36  34.91
    15. 116  LOAD_GLOBAL      0   1.34  36.24
    16. 100  LOAD_CONST       4   1.31  37.56
    17. 124  LOAD_FAST        3   1.18  38.73
    18. 106  LOAD_ATTR        3   1.17  39.91
    19. 100  LOAD_CONST       5   1.09  41.00
    20. 125  STORE_FAST       1   0.95  41.94
    21. 106  LOAD_ATTR        4   0.94  42.88
    22. 116  LOAD_GLOBAL      1   0.93  43.81
    23. 100  LOAD_CONST       6   0.89  44.69
    24. 124  LOAD_FAST        4   0.78  45.48
    25. 131  CALL_FUNCTION    3   0.76  46.23
    26. 106  LOAD_ATTR        5   0.74  46.98
    27. 125  STORE_FAST       2   0.74  47.72
    28. 100  LOAD_CONST       7   0.73  48.45
    29. 116  LOAD_GLOBAL      2   0.73  49.18
    30. 102  BUILD_TUPLE      2   0.68  49.86
    31. 106  LOAD_ATTR        6   0.62  50.47
    32. 100  LOAD_CONST       8   0.61  51.08

There are two ways to make Python opcodes more compact.

1. Add 1-byte dedicated opcodes for most used opcodes with arguments. If we add 32 short opcodes for the 6 most popular opcodes (LOAD_CONST and LOAD_ATTR with arguments from 0 to 7; LOAD_FAST, STORE_FAST, LOAD_GLOBAL and CALL_FUNCTION with arguments from 0 to 3), this will decrease the mean size of an opcode from 2.73 to 1.75 bytes (1.56 times). This might make decoding of 49% of opcodes slightly faster, since their argument no longer needs to be decoded. This is a relatively simple and mostly compatible way, but not fully compatible.
We would either have to shift the numbers of opcodes with arguments to free space for the additional opcodes, or add a new range for argumentless opcodes at the end of the 8-bit space. Third-party tools that work with CPython bytecode would need to be updated.

2. Use 16-bit opcodes as in WPython. This will decrease the mean size of an opcode from 2.73 to 2.044 bytes (1.33 times). But this might make decoding of 85% of opcodes slightly faster. Raymond showed a roughly 10+% speedup in code exercising simple and common opcodes that have an oparg, just by optimizing the oparg decoding. [2] Using 16-bit opcodes with an embedded oparg might have an even larger effect. This change needs a more serious reworking of the compiler and interpreter, not only of the disassembler and third-party tools.

[1] http://permalink.gmane.org/gmane.comp.python.devel/156162
[2] http://bugs.python.org/issue25823
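As a rough illustration of option 1 from the compiler side (a sketch only: the LOAD_FAST_SHORT0 opcode number and the rewrite pass are invented, not taken from any patch), a peephole-style pass could turn a classic 3-byte LOAD_FAST with a small argument into a dedicated 1-byte form:

    #include <stddef.h>

    #define HAVE_ARGUMENT     90    /* real CPython value */
    #define NOP                9    /* real CPython value */
    #define LOAD_FAST        124    /* real CPython value */
    #define LOAD_FAST_SHORT0 235    /* invented; 235..238 assumed free */

    /* Rewrite "LOAD_FAST n" (1 opcode byte + 2 little-endian argument
       bytes in the classic format) into a hypothetical 1-byte
       LOAD_FAST_SHORT0+n opcode when n < 4.  The freed bytes become
       NOPs; a real pass would drop them and fix up jump targets, as the
       peephole optimizer already does.  Assumes well-formed bytecode. */
    static void shorten_load_fast(unsigned char *code, size_t len)
    {
        size_t i = 0;
        while (i < len) {
            unsigned char op = code[i];
            if (op < HAVE_ARGUMENT) {
                i += 1;
                continue;
            }
            unsigned int arg = code[i + 1] | (code[i + 2] << 8);
            if (op == LOAD_FAST && arg < 4) {
                code[i] = (unsigned char)(LOAD_FAST_SHORT0 + arg);
                code[i + 1] = NOP;
                code[i + 2] = NOP;
            }
            i += 3;
        }
    }

Each of the 32 short forms proposed above costs one opcode number, which is why the message talks about either shifting existing opcode numbers or opening a new argumentless range at the end of the 8-bit space.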
2016-02-04 14:03 GMT+01:00 Serhiy Storchaka <storchaka@gmail.com>:
Damien George mentioned [1] that in MicroPython there are 16 dedicated opcodes for loading from positions 0-15, and 16 for storing to these positions. This not only makes a bytecode more compact, but might make CPython slightly faster too.
MicroPython has very important requirements on disk and memory footprint. I don't think that's the case for CPython.
1. Add 1-byte dedicated opcodes for most used opcodes with arguments.
It means duplicating a lot of code in ceval.c, no? (Even if we use C #define "templates"). I dislike this option.
2. Use 16-bit opcodes as in WPython.
IMHO this is a much more interesting option. It would be great to remove "if (HAS_ARG(opcode)) oparg = NEXTARG();" and always get the argument from the 16-bit opcode. This if() adds more work to the CPU, which has to flush the pipeline on a branch misprediction. Usually it doesn't matter, but this if() is part of the hot path of Python: it's in the most important loop, the one running the bytecode. So any instruction matters here ;-) You just have to handle EXTENDED_ARG correctly for large arguments. Victor
On 04.02.16 18:16, Victor Stinner wrote:
1. Add 1-byte dedicated opcodes for most used opcodes with arguments.
It means duplicating a lot of code in ceval.c, no? (Even if we use C #define "templates"). I dislike this option.
This needs adding less than 40 lines of code to ceval.c: a 5-line macro and 1 line per new opcode. A few more lines need to be added to the compiler, the peephole optimizer, opcode.py, opcode.h and the documentation.
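For illustration only (a guess at the shape of that macro, with invented opcode numbers; not code from an actual patch), the dispatch could look roughly like this, with each dedicated opcode setting oparg and reusing the existing opcode body:

    /* Sketch of a ceval.c-style dispatch loop with dedicated short
       opcodes: SHORT_ARG_OPCODE is the 5-line part, and each new opcode
       then costs a single case line.  Opcode numbers 235..238 are
       assumed free; the LOAD_FAST body is a stand-in for the real one. */
    #include <stddef.h>

    enum {
        HAVE_ARGUMENT = 90, LOAD_FAST = 124,
        LOAD_FAST_0 = 235, LOAD_FAST_1, LOAD_FAST_2, LOAD_FAST_3
    };

    #define SHORT_ARG_OPCODE(op, n) \
        case op##_##n:              \
            oparg = n;              \
            goto target_##op;

    static long run(const unsigned char *code, size_t len, const long *fast)
    {
        long acc = 0;
        size_t i = 0;
        while (i < len) {
            unsigned char opcode = code[i++];
            int oparg = 0;
            /* the short opcodes live in a new argumentless range, so the
               HAS_ARG-style test must exclude them */
            if (opcode >= HAVE_ARGUMENT && opcode < LOAD_FAST_0) {
                oparg = code[i] | (code[i + 1] << 8);
                i += 2;
            }
            switch (opcode) {
            SHORT_ARG_OPCODE(LOAD_FAST, 0)
            SHORT_ARG_OPCODE(LOAD_FAST, 1)
            SHORT_ARG_OPCODE(LOAD_FAST, 2)
            SHORT_ARG_OPCODE(LOAD_FAST, 3)
            case LOAD_FAST:
            target_LOAD_FAST:
                acc += fast[oparg];     /* stand-in for the real body */
                break;
            default:
                break;
            }
        }
        return acc;
    }

In the real ceval.c the opcode bodies already exist, so the added code is essentially the macro plus one case line per new opcode, roughly the line count estimated above.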
2. Use 16-bit opcodes as in WPython.
IMHO this is a much more interesting option.
It would be great to remove "if (HAS_ARG(opcode)) oparg = NEXTARG();" and always get the argument from the 16-bit opcode. This if() adds more work to the CPU, which has to flush the pipeline on a branch misprediction. Usually it doesn't matter, but this if() is part of the hot path of Python: it's in the most important loop, the one running the bytecode. So any instruction matters here ;-)
Actually, in the common case this "if" is executed at compile time. But I expect a benefit from using a simple read of a 16-bit value instead of 3 reads of 8-bit values.
On 04.02.2016 17:55, Serhiy Storchaka wrote:
On 04.02.16 18:16, Victor Stinner wrote:
1. Add 1-byte dedicated opcodes for most used opcodes with arguments.
It means duplicating a lot of code in ceval.c, no? (Even if we use C #define "templates"). I dislike this option.
This needs adding less than 40 lines of code to ceval.c: a 5-line macro and 1 line per new opcode. A few more lines need to be added to the compiler, the peephole optimizer, opcode.py, opcode.h and the documentation.
Do you have a working CPython to showcase this?
2. Use 16-bit opcodes as in WPython.
IMHO this is a much more interesting option.
It would be great to remove "if (HAS_ARG(opcode)) oparg = NEXTARG();" and always get the argument from the 16-bit opcode. This if() adds more work to the CPU, which has to flush the pipeline on a branch misprediction. Usually it doesn't matter, but this if() is part of the hot path of Python: it's in the most important loop, the one running the bytecode. So any instruction matters here ;-)
Actually, in the common case this "if" is executed at compile time. But I expect a benefit from using a simple read of a 16-bit value instead of 3 reads of 8-bit values.
Same question as above + does it make sense to combine those two options? Best, Sven
On 04.02.16 19:27, Sven R. Kunze wrote:
On 04.02.2016 17:55, Serhiy Storchaka wrote:
1. Add 1-byte dedicated opcodes for most used opcodes with arguments.
Do you have a working CPython to showcase this?
Not yet. The interpreter is easy, but the compiler is more complicated in this case.
2. Use 16-bit opcodes as in WPython.
Same question as above + does it make sense to combine those two options?
Of course not. Having some opcodes be 8-bit kills the benefit of all 16-bit opcodes. This option needs rewriting more code, but the resulting code can be simpler. I'm afraid there might be a problem with bootstrapping, generating frozen modules.
On 04.02.2016 19:55, Serhiy Storchaka wrote:
On 04.02.16 19:27, Sven R. Kunze wrote:
On 04.02.2016 17:55, Serhiy Storchaka wrote:
1. Add 1-byte dedicated opcodes for most used opcodes with arguments.
Do you have a working CPython to showcase this?
Not yet. The interpreter is easy, but the compiler is more complicated in this case.
2. Use 16-bit opcodes as in WPython.
Same question as above + does it make sense to combine those two options?
Of course not. Having some opcodes be 8-bit kills the benefit of all 16-bit opcodes.
Are you sure? I still could imagine some benefit.
This option needs rewriting more code, but the resulting code can be simpler. I'm afraid there might be a problem with bootstrapping, generating frozen modules.
What option do you prefer then?
On Thursday, February 4, 2016 12:13 PM, Sven R. Kunze <srkunze@mail.de> wrote:
On 04.02.2016 19:55, Serhiy Storchaka wrote:
On 04.02.16 19:27, Sven R. Kunze wrote:
On 04.02.2016 17:55, Serhiy Storchaka wrote:
1. Add 1-byte dedicated opcodes for most used opcodes with arguments.
Do you have a working CPython to showcase this?
Not yet. The interpreter is easy, but the compiler is more complicated in this case.
2. Use 16-bit opcodes as in WPython.
Same question as above + does it make sense to combine those two options?
Of course not. Having some opcodes be 8-bit kills the benefit of all 16-bit opcodes.
Are you sure? I still could imagine some benefit.
The only benefits I can see for using 16-bit opcodes are:

* You can treat the bytecode as an (unsigned short *) instead of (unsigned char *) and do 16-bit reads everywhere instead of 8-bit reads. This will no longer be true if you have to do an 8-bit read and then conditionally read more bits.

* The ceval loop can be simplified if it doesn't have to deal with variable lengths. This will no longer be true if you have to deal with variable lengths.

So, unless you're imagining something else, I don't think there is any benefit to be gotten by directly combining the two.

You may be able to import some of the "spirit" of the first one into the second, but I'm not sure it's worth it. One of the minor problems with wordcode is that args in range [256, 65536) now take 4 bytes instead of 3. If you cram a few bits into the opcode byte, you can push the range farther back. I'm guessing it's pretty rare to do LOAD_CONST 256, but giving, say, 5 bits to JUMP_ABSOLUTE, and 2 bits to each of the various relative jumps, would cut down the need for EXTENDED_ARGS dramatically.
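As an illustration of that idea (a sketch only; the 0xC0 opcode block and the decode helper are invented, and the word layout with the opcode in the low byte follows the wordcode scheme discussed in this thread):

    #include <stdint.h>

    #define JUMP_ABSOLUTE_BASE 0xC0   /* invented block 0xC0..0xDF */

    /* Decode a hypothetical "wide" JUMP_ABSOLUTE that borrows 5 bits
       from the opcode byte: the opcode occupies 32 consecutive values,
       and those 5 bits become the high bits of a 13-bit argument. */
    static int decode_jump_absolute(uint16_t word, unsigned int *target)
    {
        unsigned int opcode = word & 0xff;
        if (opcode >= JUMP_ABSOLUTE_BASE && opcode < JUMP_ABSOLUTE_BASE + 32) {
            *target = ((opcode - JUMP_ABSOLUTE_BASE) << 8) | (word >> 8);
            return 1;
        }
        return 0;
    }

With 13 bits, absolute jump arguments up to 8191 fit in a single word, so EXTENDED_ARG would only be needed for unusually large functions.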
On Fri, Feb 5, 2016 at 9:22 AM, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
You may be able to import some of the "spirit" of the first one into the second, but I'm not sure it's worth it. One of the minor problems with wordcode is that args in range [256, 65536) now take 4 bytes instead of 3. If you cram a few bits into the opcode byte, you can push the range farther back. I'm guessing it's pretty rare to do LOAD_CONST 256, but giving, say, 5 bits to JUMP_ABSOLUTE, and 2 bits to each of the various relative jumps, would cut down the need for EXTENDED_ARGS dramatically.
At the cost of complexity, possibly including less readable code. This change can always be done later, if and only if it proves to make enough difference in performance. ChrisA
On Thursday, February 4, 2016 6:57 PM, Chris Angelico <rosuav@gmail.com> wrote:
On Fri, Feb 5, 2016 at 9:22 AM, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
You may be able to import some of the "spirit" of the first one into the second, but I'm not sure it's worth it. One of the minor problems with wordcode is that args in range [256, 65536) now take 4 bytes instead of 3. If you cram a few bits into the opcode byte, you can push the range farther back. I'm guessing it's pretty rare to do LOAD_CONST 256, but giving, say, 5 bits to JUMP_ABSOLUTE, and 2 bits to each of the various relative jumps, would cut down the need for EXTENDED_ARGS dramatically.
At the cost of complexity, possibly including less readable code. This change can always be done later, if and only if it proves to make enough difference in performance.
Definitely. And, of course, the extra complexity may actually slow things down. I'm just saying it _might_ be worth prototyping and testing if the wordcode experiment turns out to be worth pursuing[^1], and someone produces profiles showing that jumps are now a bigger bottleneck than they used to be.

[^1]: We already knew that it was the starting point of a 2.6 fork that claimed to be faster than stock CPython, but that fork had lots of other optimizations, and I don't know of any serious benchmarking or profiling results from it in the first place. We now also know that changing to wordcode is probably easier than it looks. That may add up to "worth a few more hours to build a complete prototype", but it doesn't add up to "do it and merge it and start planning further changes around it" just yet. :)
On Fri, Feb 5, 2016 at 3:25 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
That may add up to "worth a few more hours to build a complete prototype", but it doesn't add up to "do it and merge it and start planning further changes around it" just yet. :)
Yep! But also, I'm inclined to reduce, rather than expand, the scope of the change - it's a few hours followed by a few more to do the jumps-with-some-bits-in-opcode, etc, etc. ChrisA
On Thursday, February 4, 2016 8:55 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
On 04.02.16 18:16, Victor Stinner wrote:
2. Use 16-bit opcodes as in WPython.
IMHO this is a much more interesting option.
It would be great to remove "if (HAS_ARG(opcode)) oparg = NEXTARG();" and always get the argument from the 16-bit opcode. This if() adds more work to the CPU, which has to flush the pipeline on a branch misprediction. Usually it doesn't matter, but this if() is part of the hot path of Python: it's in the most important loop, the one running the bytecode. So any instruction matters here ;-)
Actually, in the common case this "if" is executed at compile time. But I expect a benefit from using a simple read of a 16-bit value instead of 3 reads of 8-bit values.
Only if the 16-bit reads are aligned. Can that be guaranteed somehow?

If you change co_code to some type that's like bytes, but an immutable array of shorts instead of bytes, that would certainly do it. And it would be easier to inspect/hack 16-bit bytecodes from Python if you didn't have to look at them as bytes. But that implies creating the new type, exposing it to Python, and changing what co_code returns.

Or it could just be a simple switch when constructing a code object: if the code argument is aligned, use it as-is; if not, copy its contents into an aligned array, build a new bytes around that, and store that instead. (Would that significantly slow down .pyc reads and other marshaling uses, if half of all code objects have to copy their contents?)
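A minimal sketch of that switch, assuming the code object keeps a bytes object for co_code (the PyBytes_* calls are the existing C API; the helper itself is invented for illustration):

    #include <Python.h>
    #include <stdint.h>

    /* Return a bytes object whose buffer is safe for 16-bit reads:
       reuse the argument if its data is already 2-byte aligned,
       otherwise return a copy (a freshly allocated bytes object's
       buffer is suitably aligned). */
    static PyObject *ensure_aligned_code(PyObject *code_bytes)
    {
        const char *data = PyBytes_AS_STRING(code_bytes);
        if (((uintptr_t)data & 1) == 0) {
            Py_INCREF(code_bytes);
            return code_bytes;          /* already aligned: no copy */
        }
        return PyBytes_FromStringAndSize(data, PyBytes_GET_SIZE(code_bytes));
    }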
On 2016-02-04 17:28:45, "Andrew Barnert via Python-ideas" <python-ideas@python.org> wrote:
On Thursday, February 4, 2016 8:55 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
On 04.02.16 18:16, Victor Stinner wrote:
2. Use 16-bit opcodes as in WPython.
IMHO this is a much more interesting option.
It would be great to remove "if (HAS_ARG(opcode)) oparg = NEXTARG();" and always get the argument from the 16-bit opcode. This if() adds more work to the CPU, which has to flush the pipeline on a branch misprediction. Usually it doesn't matter, but this if() is part of the hot path of Python: it's in the most important loop, the one running the bytecode. So any instruction matters here ;-)
Actually, in the common case this "if" is executed at compile time. But I expect a benefit from using a simple read of a 16-bit value instead of 3 reads of 8-bit values.
Only if the 16-bit reads are aligned. Can that be guaranteed somehow?
Aren't allocated blocks usually aligned to n-byte boundaries (or multiples thereof) anyway, where n is the number of bytes needed to hold an address?
If you change co_code to some type that's like bytes, but an immutable array of shorts instead of bytes, that would certainly do it. And it would be easier to inspect/hack 16-bit bytecodes from Python if you didn't have to look at them as bytes. But that implies creating the new type, exposing it to Python, and changing what co_code returns.
Or it could just be a simple switch when constructing a code object: if the code argument is aligned, use it as-is; if not, copy its contents into an aligned array, build a new bytes around that, and store that instead. (Would that significantly slow down .pyc reads and other marshaling uses, if half of all code objects have to copy their contents?)
On 04.02.16 19:28, Andrew Barnert via Python-ideas wrote:
Actually, in the common case this "if" is executed at compile time. But I expect a benefit from using a simple read of a 16-bit value instead of 3 reads of 8-bit values.
Only if the 16-bit reads are aligned. Can that be guaranteed somehow?
Yes, we can. On common platforms the start of bytes data is at least 32-bit aligned. Other code depends on this. It can be unaligned only on platforms where the alignment doesn't matter.
Serhiy, Victor, Andrew,

What do you think about turning constant slice objects into actual constants?

Right now:

    def a(): return a[:1]

    dis.dis(a)
      1       0 LOAD_GLOBAL       0 (a)
              3 LOAD_CONST        0 (None)
              6 LOAD_CONST        1 (1)
              9 BUILD_SLICE       2
             12 BINARY_SUBSCR
             13 RETURN_VALUE

With this optimization we could just have:

      1       0 LOAD_GLOBAL       0 (a)
              3 LOAD_CONST        0 (:1)
              6 BINARY_SUBSCR
              7 RETURN_VALUE

Yury
2016-02-05 23:46 GMT+01:00 Yury Selivanov <yselivanov.ml@gmail.com>:
What do you think about turning constant slice objects into actual constants?
It doesn't look interesting.

    bench$ python3 -m timeit -s 'def f(list):' -s ' list[:1]; list[:1]; list[:1]; list[:1]; list[:1]' -s 'l=list("hello")' 'f(l)'
    1000000 loops, best of 3: 1.34 usec per loop

    bench$ python3 -m timeit -s 'def f(list):' -s ' list["whatever"]; list["whatever"]; list["whatever"]; list["whatever"]; list["whatever"]' -s 'import types; co=f.__code__; consts=[slice(0, 1, None) if const == "whatever" else const for const in co.co_consts]; f.__code__ = types.CodeType(co.co_argcount, co.co_kwonlyargcount, co.co_nlocals, co.co_stacksize, co.co_flags, co.co_code, tuple(consts), co.co_names, co.co_varnames, co.co_filename, co.co_name, co.co_firstlineno, co.co_lnotab, co.co_freevars, co.co_cellvars); l=list("hello")' 'f(l)'
    1000000 loops, best of 3: 1.3 usec per loop

It's only 3% faster on a microbenchmark. (On my laptop: 687 ns => 658 ns: 4% faster.) I'm not sure that the benchmark is "fair" since the slice object has a free list of 1 item which is probably always used in this benchmark.

But you will be able to easily implement such a "hack" in fatoptimizer, which already has similar optimizations. IMHO it's ok for an optimizer like fatoptimizer; it's not worth it for the default builtin optimizer in CPython.

Victor
On 2016-02-05 7:49 PM, Serhiy Storchaka wrote:
On 06.02.16 00:46, Yury Selivanov wrote:
What do you think about turning constant slice objects into actual constants?
Slices are not supported by marshal.
That should be fixable, but in any case -- this is a nano-optimization, which is nice to have when there is nothing left to optimize ;) Yury
2016-02-06 1:49 GMT+01:00 Serhiy Storchaka <storchaka@gmail.com>:
Slices are not supported by marshal.
It's possible to inject arbitrary objects into code constants at runtime if you replace the __code__ attribute of a function. I'm using that in FAT Python to inject builtin functions into constants. Later, if the optimization has been proven to be efficient enough, we may consider implementing it directly in Python ;-) Victor
Serhiy Storchaka <storchaka@...> writes:
2. Use 16-bit opcodes as in WPython.
This will decrease the mean size of an opcode from 2.73 to 2.044 bytes (1.33 times). But this might make decoding of 85% of opcodes slightly faster.
It sounds like, by 16-bit opcodes, you mean combine the opcode and the argument in a single 16-bit word. But that doesn't solve the issue you want to solve: you still have to decode the argument encoded in the 16-bit word. I don't see where the benefit is.

The *byte* size of bytecode is IMO largely unimportant. Python bytecode is synthetic and high-level, its expressive density is much higher than that of native code. I don't think the cache occupancy of bytecode is a significant contributor to Python performance (certainly much less so than the memory consumption of *objects*).

(as for the disk occupancy of pyc files, if that is a concern we could actually compress those files; there are very fast compressors with decent efficiency these days, such as lzo or snappy)

It is generally estimated the overhead of bytecode dispatch and decoding is around 10-30% for CPython. You cannot hope to eliminate that overhead entirely without writing a (JIT or AOT) compiler, so any heroic effort to restructure the current opcode space and structure will at best win 5 to 20% on select benchmarks.

Regards

Antoine.
On Feb 6, 2016, at 11:18, Antoine Pitrou <antoine@python.org> wrote:
Serhiy Storchaka <storchaka@...> writes:
2. Use 16-bit opcodes as in WPython.
This will decrease the mean size of an opcode from 2.73 to 2.044 bytes (1.33 times). But this might make decoding of 85% of opcodes slightly faster.
It sounds like, by 16-bit opcodes, you mean combine the opcode and the argument in a single 16-bit word. But that doesn't solve the issue you want to solve: you still have to decode the argument encoded in the 16-bit word. I don't see where the benefit is.
If (at least most of the time) the argument is just the second byte of the two bytes, that decoding is trivial. The cost should be less than the savings of doing one short read instead of three byte reads. (Of course we won't know that for sure until we try.) It should also let us simplify the eval loop a bit, and simplify all other code that has to deal with bytecode (although I'd like to simplify it even further, as discussed in the other thread).
It is generally estimated the overhead of bytecode dispatch and decoding is around 10-30% for CPython. You cannot hope to eliminate that overhead entirely without writing a (JIT or AOT) compiler, so any heroic effort to restructure the current opcode space and structure will at best win 5 to 20% on select benchmarks.
To be honest, unlike everyone else on this thread, I'm actually more interested in the simplicity gains than the performance gains. When I'm running CPU-bound code that spends most of its time in CPython itself, it's usually more than fast enough--and, when it isn't, it's almost always an order of magnitude too slow, so adding up all these little 5-10% gains is never going to get us near the point where I can stop using PyPy or Cython or something.

I don't know whether other people have different use cases where these speedups really do matter, or if they're just chasing micro-optimization of the CPython core for its own sake, but I'm willing to cynically use their desire for an 8% speedup to get them onboard with selling my simplification. :)

(Plus, adding up all these little gains could soon get us to the point where 3.7 finally beats 2.7 in every benchmark, instead of just most of them, which would kill off an annoying source of FUD.)
On Sat, 6 Feb 2016 16:21:11 -0800 Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
To be honest, unlike everyone else on this thread, I'm actually more interested in the simplicity gains than the performance gains.
It is a laudable goal, but what is proposed here wouldn't simplify much if anything, since some opcodes need more than 8 bits of arguments (typically the opcodes that have two logical 8-bit arguments packed in the 16-bit word). So you'll still get variable-sized opcodes and need adequate decoding machinery for them.

So perhaps bytecode should be processed by the compile chain in the form of:

    typedef struct {
        int opcode;
        int operand;
    } instruction_t;

    instruction_t *bytecode;

And then get packed at the end, when creating the code object.
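A hedged sketch of what that packing step could look like if the target is 16-bit words (illustrative only; the EXTENDED_ARG layout follows the wordcode scheme discussed in this thread, and the jump-target fixups a real implementation needs are omitted):

    #include <stdint.h>
    #include <stddef.h>

    #define EXTENDED_ARG 144   /* CPython 3.x opcode number */

    typedef struct { int opcode; int operand; } instruction_t;

    /* Pack instructions into 16-bit words (opcode in the low byte,
       8 bits of argument in the high byte), emitting EXTENDED_ARG
       prefixes for wider operands.  Returns the number of words
       written; the caller must leave room for up to 4 words per
       instruction. */
    static size_t pack(const instruction_t *in, size_t n, uint16_t *out)
    {
        size_t w = 0;
        for (size_t i = 0; i < n; i++) {
            uint32_t arg = (uint32_t)in[i].operand;
            if (arg > 0xffffff)
                out[w++] = (uint16_t)(EXTENDED_ARG | (((arg >> 24) & 0xff) << 8));
            if (arg > 0xffff)
                out[w++] = (uint16_t)(EXTENDED_ARG | (((arg >> 16) & 0xff) << 8));
            if (arg > 0xff)
                out[w++] = (uint16_t)(EXTENDED_ARG | (((arg >> 8) & 0xff) << 8));
            out[w++] = (uint16_t)(in[i].opcode | ((arg & 0xff) << 8));
        }
        return w;
    }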
When I'm running CPU-bound code that spends most of its time in CPython itself, it's usually more than fast enough--and, when it isn't, it's almost always an order of magnitude too slow, so adding up all these little 5-10% gains is never going to get us near the point where I can stop using PyPy or Cython or something.
I think that mirrors the experience of many people. Small perf improvements are not useless either, because they can make e.g. command-line tools feel a bit more "snappy".
(Plus, adding up all these little gains could soon get us to the point where 3.7 finally beats 2.7 in every benchmark, instead of just most of them, which would kill off an annoying source of FUD.)
I think that FUD is very tired now and very few people pay attention to it. The only major factor slowing down the 2->3 migration is the effort required for porting. (also, the breadth of new features and improvements in the 3.x line massively offsets small degradations on micro-benchmarks) Regards Antoine.
On Feb 7, 2016, at 04:53, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Sat, 6 Feb 2016 16:21:11 -0800 Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
To be honest, unlike everyone else on this thread, I'm actually more interested in the simplicity gains than the performance gains.
It is a laudable goal, but what is proposed here wouldn't simplify much if anything, since some opcodes need more than 8 bits of arguments (typically the opcodes that have two logical 8-bit arguments packed in the 16-bit word). So you'll still get variable-sized opcodes and need adequate decoding machinery for them.
With "wordcode" (at least as I've implemented it), EXTENDED_ARG code doesn't get any simpler (or more complicated)--but the main loop can just read 2 bytes (and then add in any extended arg value) instead of reading 1 or 3, which is definitely simpler. (As implemented in my patch, it's not actually much simpler, because I use all the same macros, I just redefine them, but that's because my initial goal was to change as little as possible--which turns out to be very little. If it seems promising, we can do a slightly larger change that will simplify things more, and possible also improve performance, but I don't want to do that until I've proven it's worth trying.)
So perhaps bytecode should be processed by the compile chain in the form of:
typedef struct { int opcode; int operand; } instruction_t;
instruction_t *bytecode;
And then get packed at the end, when creating the code object.
Yes, there's another thread on that, with a few different variations. This is actually independent of the wordcode change, so I'm doing it on a separate branch.

The variation I'm currently working on (based on suggestions by Serhiy) just uses a 32-bit "unpacked bytecode" format (1 byte opcode, 3 bytes arg, no EXTENDED_ARG). The compiler flattens its blocks of instruction objects into unpacked bytecode, gives that to the optimizer (currently that's the peephole optimizer, but this is also where Victor's PEP 511 bytecode transformer hooks fit in), and then packs the bytecode, removing NOPs and fixing up jump targets and lnotab. This makes the peephole optimizer a lot simpler. I've also exposed unpack and pack-and-fixup functions via the C API and the dis module, which means bytecode-hacking decorators, import hooks, etc. can work on unpacked bytecode, and usually won't need a third-party module like byteplay to be readable.

The big question is whether it's acceptable to limit args to 24 bits. I don't think jumps > 4 million are an issue, but > 255 annotations might be? I'll test it with as much code as possible to see; if not, unpacking to 64 bits (whether with a struct or just an int64) is the obvious answer.

Also, it may turn out that bytecode processing is still too painful even with the "unpacked" format. In which case we'll need something different--labels, a tree of blocks similar to what the compiler already has, or something else. But I think this will be sufficient.
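As a rough sketch of the unpacked format described above (illustrative only, not the actual branch): each classic instruction becomes one 32-bit value with the opcode in the low byte, up to 24 bits of argument above it, and EXTENDED_ARG folded away.

    #include <stdint.h>
    #include <stddef.h>

    #define HAVE_ARGUMENT 90    /* real CPython values */
    #define EXTENDED_ARG 144

    /* Unpack classic variable-length bytecode (1 or 3 bytes per
       instruction) into uint32 instructions.  Arguments wider than
       24 bits would be truncated here, which is exactly the limitation
       discussed above; jump targets are still byte offsets into the
       packed input and would need renumbering in a real implementation. */
    static size_t unpack(const unsigned char *code, size_t len, uint32_t *out)
    {
        size_t n = 0;
        uint32_t ext = 0;
        for (size_t i = 0; i < len; ) {
            unsigned char op = code[i++];
            uint32_t arg = 0;
            if (op >= HAVE_ARGUMENT) {
                arg = ext | code[i] | ((uint32_t)code[i + 1] << 8);
                i += 2;
            }
            if (op == EXTENDED_ARG) {
                ext = arg << 16;        /* folded into the next instruction */
                continue;
            }
            out[n++] = (uint32_t)op | (arg << 8);
            ext = 0;
        }
        return n;                       /* number of unpacked instructions */
    }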
(Plus, adding up all these little gains could soon get us to the point where 3.7 finally beats 2.7 in every benchmark, instead of just most of them, which would kill off an annoying source of FUD.)
I think that FUD is very tired now and very few people pay attention to it.
Well, it spawned a thread on -ideas with dozens of replies just a week or two ago...
(also, the breadth of new features and improvements in the 3.x line massively offsets small degradations on micro-benchmarks)
You don't have to convince me; as far as I'm concerned, the barrier was crossed with 3.2.
On 06.02.16 21:18, Antoine Pitrou wrote:
It sounds like, by 16-bit opcodes, you mean combine the opcode and the argument in a single 16-bit word. But that doesn't solve the issue you want to solve: you still have to decode the argument encoded in the 16-bit word. I don't see where the benefit is.
Current code uses 3 read operations:

    opcode = *next_instr++;
    next_instr += 2;
    oparg = (next_instr[-1]<<8) + next_instr[-2];

Even combining the latter two operations in one read operation gives a 10% gain in the microbenchmark (see http://bugs.python.org/issue25823):

    opcode = *next_instr++;
    oparg = *(unsigned short *)next_instr;
    next_instr += 2;

With combining the opcode and the argument in an always aligned 16-bit word I expect a larger gain:

    word = *(unsigned short *)next_instr;
    next_instr += 2;
    opcode = word & 0xff;
    oparg = word >> 8;
It is generally estimated the overhead of bytecode dispatch and decoding is around 10-30% for CPython. You cannot hope to eliminate that overhead entirely without writing a (JIT or AOT) compiler, so any heroic effort to restructure the current opcode space and structure will at best win 5 to 20% on select benchmarks.
This would be an awesome result.
On Sun, 7 Feb 2016 09:18:18 +0200 Serhiy Storchaka <storchaka@gmail.com> wrote:
On 06.02.16 21:18, Antoine Pitrou wrote:
It sounds like, by 16-bit opcodes, you mean combine the opcode and the argument in a single 16-bit word. But that doesn't solve the issue you want to solve: you still have to decode the argument encoded in the 16-bit word. I don't see where the benefit is.
Current code uses 3 read operations:

    opcode = *next_instr++;
    next_instr += 2;
    oparg = (next_instr[-1]<<8) + next_instr[-2];

Even combining the latter two operations in one read operation gives a 10% gain in the microbenchmark (see http://bugs.python.org/issue25823):

    opcode = *next_instr++;
    oparg = *(unsigned short *)next_instr;
    next_instr += 2;
So it remains to be seen how much performance is won on actual non-silly benchmarks ;-) Regards Antoine.
On 2016-02-07 07:18, Serhiy Storchaka wrote:
On 06.02.16 21:18, Antoine Pitrou wrote:
It sounds like, by 16-bit opcodes, you mean combine the opcode and the argument in a single 16-bit word. But that doesn't solve the issue you want to solve: you still have to decode the argument encoded in the 16-bit word. I don't see where the benefit is.
Current code uses 3 read operations:

    opcode = *next_instr++;
    next_instr += 2;
    oparg = (next_instr[-1]<<8) + next_instr[-2];

Even combining the latter two operations in one read operation gives a 10% gain in the microbenchmark (see http://bugs.python.org/issue25823):

    opcode = *next_instr++;
    oparg = *(unsigned short *)next_instr;
    next_instr += 2;

The previous code is big-endian, whereas this code's endianness is processor-dependent.

With combining the opcode and the argument in an always aligned 16-bit word I expect a larger gain:

    word = *(unsigned short *)next_instr;
    next_instr += 2;
    opcode = word & 0xff;
    oparg = word >> 8;
It is generally estimated the overhead of bytecode dispatch and decoding is around 10-30% for CPython. You cannot hope to eliminate that overhead entirely without writing a (JIT or AOT) compiler, so any heroic effort to restructure the current opcode space and structure will at best win 5 to 20% on select benchmarks.
This would be an awesome result.
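Regarding the endianness point raised above: an explicit two-byte read keeps the original byte order on any host, and compilers such as GCC and Clang typically recognize the idiom and emit a single 16-bit load anyway. An illustrative sketch, not code from the thread:

    #include <stdint.h>

    /* Combine the two argument bytes in a host-independent way; on
       little-endian targets this usually compiles to one 16-bit load. */
    static inline unsigned int fetch_oparg(const uint8_t *next_instr)
    {
        return (unsigned int)next_instr[0]
             | ((unsigned int)next_instr[1] << 8);
    }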