To give a bit more detail on our approach, I've written up the internals of our shadow bytecode implementation and landed the write-up in our Cinder repo: https://github.com/facebookincubator/cinder/blob/cinder/3.8/CinderDoc/shadowcode.rst.  It might at least spark discussion or ideas about possible implementation details, or about things that could be different or more efficient in our implementation.

I've also had a version of it against 3.10 going for a while (internally we're still on 3.8), and I've updated it to a relatively recent merge of 3.11 main.  I've pushed the latest version here: https://github.com/DinoV/cpython/tree/shadowcode_rebase_2021_05_12.  The 3.11 version obviously isn't as battle-tested as what we've been running in production for some time now, but it's pretty much the same.  It is missing our improved global caching, which uses dictionary watches, though.  It's also a rather large PR (almost 7k lines), but over a third of that is test cases.
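
For anyone unfamiliar with the idea behind watch-based global caching, here is a rough pure-Python sketch. It is only an illustration of the concept, not the Cinder code: the real thing lives in C inside the interpreter, and the names WatchedDict and GlobalCache are made up for this example. The cache keeps the result of a global lookup and relies on the dictionary notifying it when the key changes, so the fast path needs no dictionary lookup at all.

```python
class WatchedDict(dict):
    """Toy dict that notifies registered watchers when a key is set or deleted.

    Only ``d[key] = value`` and ``del d[key]`` are covered here; a real
    implementation would have to hook every mutation path.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._watchers = {}  # key -> list of zero-argument callbacks

    def watch(self, key, callback):
        self._watchers.setdefault(key, []).append(callback)

    def _notify(self, key):
        for callback in self._watchers.get(key, ()):
            callback()

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self._notify(key)

    def __delitem__(self, key):
        super().__delitem__(key)
        self._notify(key)


class GlobalCache:
    """Caches one global lookup; the watcher drops it when the dict changes."""

    _MISSING = object()

    def __init__(self, namespace, name):
        self.namespace = namespace
        self.name = name
        self.value = self._MISSING
        self.misses = 0  # counts slow-path lookups, for illustration only
        namespace.watch(name, self.invalidate)

    def invalidate(self):
        self.value = self._MISSING

    def load(self):
        if self.value is self._MISSING:
            self.misses += 1
            self.value = self.namespace[self.name]
        return self.value


globals_ns = WatchedDict(answer=41)
cache = GlobalCache(globals_ns, "answer")
first = cache.load()       # slow path: real dict lookup
second = cache.load()      # fast path: cached value, no lookup
globals_ns["answer"] = 42  # the watcher invalidates the cache
third = cache.load()       # slow path again, sees the new value
```

The trade-off is that every write to a watched key pays for the notification, which is usually fine because globals are read far more often than they are rebound.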

Also, just to inform the discussion around potential performance benefits, here's how that change alone currently benchmarks against the base commit:

cpython_310_opt_rig.json
========================

Performance version: 1.0.1
Report on Linux-5.2.9-229_fbk15_hardened_4185_g357f49b36602-x86_64-with-glibc2.28
Number of logical CPUs: 48
Start date: 2021-05-17 21:57:08.095822
End date: 2021-05-17 22:40:33.374232

cpython_ghdino_opt_rig.json
===========================

Performance version: 1.0.1
Report on Linux-5.2.9-229_fbk15_hardened_4185_g357f49b36602-x86_64-with-glibc2.28
Number of logical CPUs: 48
Start date: 2021-05-21 17:25:24.410644
End date: 2021-05-21 18:02:53.524314

+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| Benchmark               | cpython_310_opt_rig.json | cpython_ghdino_opt_rig.json | Change       | Significance          |
+=========================+==========================+=============================+==============+=======================+
| 2to3                    | 498 ms                   | 459 ms                      | 1.09x faster | Significant (t=15.60) |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| chameleon               | 13.4 ms                  | 12.6 ms                     | 1.07x faster | Significant (t=11.10) |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| chaos                   | 163 ms                   | 135 ms                      | 1.21x faster | Significant (t=33.07) |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| crypto_pyaes            | 171 ms                   | 147 ms                      | 1.16x faster | Significant (t=24.93) |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| deltablue               | 11.7 ms                  | 8.38 ms                     | 1.40x faster | Significant (t=70.51) |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| django_template         | 73.7 ms                  | 68.1 ms                     | 1.08x faster | Significant (t=13.12) |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| dulwich_log             | 108 ms                   | 98.6 ms                     | 1.10x faster | Significant (t=18.11) |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| fannkuch                | 734 ms                   | 731 ms                      | 1.00x faster | Not significant       |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| float                   | 166 ms                   | 140 ms                      | 1.18x faster | Significant (t=29.38) |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| go                      | 345 ms                   | 305 ms                      | 1.13x faster | Significant (t=31.29) |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| hexiom                  | 14.4 ms                  | 13.1 ms                     | 1.10x faster | Significant (t=15.95) |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| json_dumps              | 19.6 ms                  | 18.1 ms                     | 1.09x faster | Significant (t=13.85) |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| json_loads              | 37.5 us                  | 34.8 us                     | 1.08x faster | Significant (t=16.23) |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| logging_format          | 14.5 us                  | 10.9 us                     | 1.33x faster | Significant (t=43.42) |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| logging_silent          | 274 ns                   | 238 ns                      | 1.15x faster | Significant (t=23.00) |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| logging_simple          | 13.4 us                  | 10.2 us                     | 1.31x faster | Significant (t=46.73) |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| mako                    | 23.1 ms                  | 22.3 ms                     | 1.04x faster | Significant (t=5.78)  |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| meteor_contest          | 151 ms                   | 152 ms                      | 1.01x slower | Not significant       |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| nbody                   | 217 ms                   | 208 ms                      | 1.04x faster | Significant (t=6.52)  |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| nqueens                 | 153 ms                   | 145 ms                      | 1.06x faster | Significant (t=10.43) |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| pathlib                 | 29.2 ms                  | 24.5 ms                     | 1.19x faster | Significant (t=27.86) |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| pickle                  | 14.6 us                  | 14.6 us                     | 1.00x slower | Not significant       |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| pickle_dict             | 36.3 us                  | 35.4 us                     | 1.03x faster | Significant (t=6.24)  |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| pickle_list             | 5.55 us                  | 5.44 us                     | 1.02x faster | Significant (t=3.42)  |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| pickle_pure_python      | 708 us                   | 576 us                      | 1.23x faster | Significant (t=56.02) |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| pidigits                | 262 ms                   | 255 ms                      | 1.03x faster | Significant (t=6.37)  |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| pyflate                 | 1.02 sec                 | 919 ms                      | 1.11x faster | Significant (t=24.26) |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| python_startup          | 13.1 ms                  | 13.1 ms                     | 1.01x faster | Not significant       |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| python_startup_no_site  | 8.69 ms                  | 8.56 ms                     | 1.01x faster | Not significant       |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| raytrace                | 758 ms                   | 590 ms                      | 1.28x faster | Significant (t=62.09) |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| regex_compile           | 256 ms                   | 227 ms                      | 1.13x faster | Significant (t=29.88) |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| regex_dna               | 256 ms                   | 256 ms                      | 1.00x faster | Not significant       |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| regex_effbot            | 4.29 ms                  | 4.35 ms                     | 1.01x slower | Not significant       |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| regex_v8                | 35.7 ms                  | 35.5 ms                     | 1.00x faster | Not significant       |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| richards                | 117 ms                   | 98.3 ms                     | 1.19x faster | Significant (t=31.70) |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| scimark_fft             | 559 ms                   | 573 ms                      | 1.02x slower | Significant (t=-6.02) |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| scimark_lu              | 254 ms                   | 249 ms                      | 1.02x faster | Not significant       |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| scimark_monte_carlo     | 162 ms                   | 126 ms                      | 1.29x faster | Significant (t=41.31) |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| scimark_sor             | 305 ms                   | 281 ms                      | 1.09x faster | Significant (t=19.82) |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| scimark_sparse_mat_mult | 7.51 ms                  | 7.59 ms                     | 1.01x slower | Not significant       |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| spectral_norm           | 218 ms                   | 220 ms                      | 1.01x slower | Not significant       |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| telco                   | 9.65 ms                  | 9.56 ms                     | 1.01x faster | Not significant       |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| unpack_sequence         | 82.4 ns                  | 75.5 ns                     | 1.09x faster | Significant (t=15.12) |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| unpickle                | 21.0 us                  | 19.9 us                     | 1.05x faster | Significant (t=8.02)  |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| unpickle_list           | 6.49 us                  | 6.76 us                     | 1.04x slower | Significant (t=-7.46) |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| unpickle_pure_python    | 494 us                   | 419 us                      | 1.18x faster | Significant (t=26.60) |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| xml_etree_generate      | 144 ms                   | 140 ms                      | 1.03x faster | Significant (t=3.75)  |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| xml_etree_iterparse     | 167 ms                   | 159 ms                      | 1.04x faster | Significant (t=7.17)  |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| xml_etree_parse         | 212 ms                   | 209 ms                      | 1.02x faster | Not significant       |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+
| xml_etree_process       | 114 ms                   | 102 ms                      | 1.11x faster | Significant (t=16.92) |
+-------------------------+--------------------------+-----------------------------+--------------+-----------------------+

Skipped 5 benchmarks only in cpython_310_opt_rig.json: sympy_expand, sympy_integrate, sympy_str, sympy_sum, tornado_http
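
A note on reading the Change column: as far as I understand it, it's the ratio of the two mean timings, reported as "faster" when the second run is quicker and "slower" otherwise. The small helper below (the name change is just for this example) reproduces a few rows from the displayed values. The tool works from the unrounded means, so recomputing from the rounded numbers in the table can be off by 0.01 on some rows (2to3 is one such case).

```python
def change(base, new):
    """Format the ratio of two timings the way the Change column reads."""
    if new <= base:
        return f"{base / new:.2f}x faster"
    return f"{new / base:.2f}x slower"


# (base, new) pairs copied from the table above; units cancel in the ratio.
rows = {
    "deltablue": (11.7, 8.38),      # ms
    "raytrace": (758, 590),         # ms
    "unpickle_list": (6.49, 6.76),  # us
}

for name, (base, new) in rows.items():
    print(f"{name}: {change(base, new)}")
```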


And here are the memory benchmarks, which are almost entirely non-significant:

cpython_310_mem.json
====================

Performance version: 1.0.1
Report on Linux-5.2.9-229_fbk15_hardened_4185_g357f49b36602-x86_64-with-glibc2.28
Number of logical CPUs: 48
Start date: 2021-05-18 13:09:32.100009
End date: 2021-05-18 13:46:54.655953

cpython_ghdino_mem.json
=======================

Performance version: 1.0.1
Report on Linux-5.2.9-229_fbk15_hardened_4185_g357f49b36602-x86_64-with-glibc2.28
Number of logical CPUs: 48
Start date: 2021-05-19 17:17:30.891269
End date: 2021-05-20 10:44:09.117795

+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| Benchmark               | cpython_310_mem.json | cpython_ghdino_mem.json | Change        | Significance          |
+=========================+======================+=========================+===============+=======================+
| 2to3                    | 21.2 MB              | 21.6 MB                 | 1.02x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| chameleon               | 16.5 MB              | 16.5 MB                 | 1.00x smaller | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| chaos                   | 8303.8 kB            | 8170.0 kB               | 1.02x smaller | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| crypto_pyaes            | 7630.8 kB            | 7549.6 kB               | 1.01x smaller | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| deltablue               | 9620.0 kB            | 9839.4 kB               | 1.02x larger  | Significant (t=-8.20) |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| django_template         | 22.3 MB              | 22.6 MB                 | 1.01x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| dulwich_log             | 11.6 MB              | 11.7 MB                 | 1.00x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| fannkuch                | 7174.6 kB            | 7195.0 kB               | 1.00x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| float                   | 16.7 MB              | 18.3 MB                 | 1.10x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| go                      | 9132.4 kB            | 9170.4 kB               | 1.00x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| hexiom                  | 8311.8 kB            | 8372.6 kB               | 1.01x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| json_dumps              | 9406.6 kB            | 9413.0 kB               | 1.00x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| json_loads              | 7444.0 kB            | 7453.0 kB               | 1.00x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| logging_format          | 11.0 MB              | 10.1 MB                 | 1.08x smaller | Significant (t=17.51) |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| logging_silent          | 7651.0 kB            | 7706.2 kB               | 1.01x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| logging_simple          | 10.3 MB              | 10.4 MB                 | 1.01x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| mako                    | 13.7 MB              | 13.9 MB                 | 1.02x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| meteor_contest          | 9474.6 kB            | 9512.0 kB               | 1.00x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| nbody                   | 7365.4 kB            | 7461.4 kB               | 1.01x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| nqueens                 | 7471.0 kB            | 7487.4 kB               | 1.00x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| pathlib                 | 8682.4 kB            | 8732.0 kB               | 1.01x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| pickle                  | 7935.2 kB            | 7942.8 kB               | 1.00x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| pickle_dict             | 7930.6 kB            | 7933.2 kB               | 1.00x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| pickle_list             | 7934.2 kB            | 7956.6 kB               | 1.00x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| pickle_pure_python      | 7962.4 kB            | 7971.2 kB               | 1.00x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| pidigits                | 7396.4 kB            | 7435.0 kB               | 1.01x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| pyflate                 | 36.9 MB              | 37.2 MB                 | 1.01x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| python_startup          | 9499.6 kB            | 9624.0 kB               | 1.01x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| python_startup_no_site  | 9479.6 kB            | 9630.8 kB               | 1.02x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| raytrace                | 8239.0 kB            | 8273.0 kB               | 1.00x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| regex_compile           | 8602.2 kB            | 8662.6 kB               | 1.01x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| regex_dna               | 15.0 MB              | 15.1 MB                 | 1.01x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| regex_effbot            | 8054.6 kB            | 8094.8 kB               | 1.00x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| regex_v8                | 13.0 MB              | 13.0 MB                 | 1.00x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| richards                | 7837.2 kB            | 7841.2 kB               | 1.00x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| scimark_fft             | 8037.0 kB            | 8118.8 kB               | 1.01x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| scimark_lu              | 8059.2 kB            | 8107.2 kB               | 1.01x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| scimark_monte_carlo     | 7968.2 kB            | 8020.2 kB               | 1.01x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| scimark_sor             | 7995.0 kB            | 8065.0 kB               | 1.01x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| scimark_sparse_mat_mult | 8512.2 kB            | 8549.4 kB               | 1.00x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| spectral_norm           | 7184.4 kB            | 7217.8 kB               | 1.00x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| telco                   | 7857.2 kB            | 7672.2 kB               | 1.02x smaller | Significant (t=38.26) |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| unpack_sequence         | 8809.6 kB            | 8835.8 kB               | 1.00x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| unpickle                | 7943.4 kB            | 7965.8 kB               | 1.00x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| unpickle_list           | 7948.6 kB            | 7925.6 kB               | 1.00x smaller | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| unpickle_pure_python    | 7922.0 kB            | 7955.8 kB               | 1.00x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| xml_etree_generate      | 11.5 MB              | 11.7 MB                 | 1.02x larger  | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| xml_etree_iterparse     | 12.1 MB              | 12.0 MB                 | 1.01x smaller | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| xml_etree_parse         | 11.6 MB              | 11.5 MB                 | 1.01x smaller | Not significant       |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+
| xml_etree_process       | 12.1 MB              | 12.5 MB                 | 1.03x larger  | Significant (t=-3.04) |
+-------------------------+----------------------+-------------------------+---------------+-----------------------+


On Tue, May 25, 2021 at 2:05 PM Guido van Rossum <guido@python.org> wrote:
On Tue, May 25, 2021 at 1:50 PM Łukasz Langa <lukasz@langa.pl> wrote:

On 25 May 2021, at 21:57, Guido van Rossum <guido@python.org> wrote:

On Tue, May 25, 2021 at 12:34 PM Brett Cannon <brett@python.org> wrote:

I personally think it should be a Standards Track PEP. This PEP isn't documenting some detail like PEP 13 or some release schedule, but is instead proposing a rather major change to the interpreter which a lot of us will need to understand in order to support the code (and I do realize the entire area of "what requires a PEP and what doesn't" is very hazy).

Now, we've done similar things before (for example, the pattern matching implementation was a long-living branch), but the difference is that for pattern matching, the implementation followed the design, whereas for the changes to the bytecode interpreter that we're undertaking here, much of the architecture will be designed as the implementation proceeds, based on what we learn during the implementation.

Good point. We've also done long-living branching during Gilectomy which saved a lot of pain when it turned out not to be worth pursuing after all. Do you think this case is qualitatively different?

I think it's different -- the problems with the Gilectomy were pretty predictable (slower single-core perf due to way more locking calls), but it was not predictable whether Larry would be able to overcome them (I was rooting for him the whole time).

Here, we're looking at something where Mark has prototyped the proposed approach extensively (HotPy, HotPy2), and the question is more whether Python 3.11 is going to be 15% faster or 50%. And some of the ideas have also been prototyped by the existing inline caches (some of the proposal is just to do more of those, and to reduce the overhead by specializing opcodes), and further validated by Dino's work at Facebook/Instagram on Shadowcode (part of Cinder), which also specializes opcodes.

--
--Guido van Rossum (python.org/~guido)
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/WOODDS3VR5AWKWXRZC4XU26F44H2CC4W/
Code of Conduct: http://python.org/psf/codeofconduct/