
I created a simple benchmark:
https://gist.github.com/methane/abb509e5f781cc4a103cc450e1e7925d

This benchmark creates 1000 annotated functions and measures the time to load and exec them.

Here are the results. All interpreters were built without --with-pydebug, --enable-optimizations, and --with-lto.

```
# Python 3.9 w/ stock semantics
$ python3 ~/ann_test.py 1
code size: 121011
unmarshal: avg: 0.33605549649801103 +/- 0.007382938279889738
exec: avg: 0.395090194279328 +/- 0.001004608380122509

# Python 3.9 w/ PEP 563 semantics
$ python3 ~/ann_test.py 2
code size: 121070
unmarshal: avg: 0.3407619891455397 +/- 0.0011833618746421965
exec: avg: 0.24590165729168803 +/- 0.0003123404336687428

# master branch w/ PEP 563 semantics
$ ./python ~/ann_test.py 2
code size: 149086
unmarshal: avg: 0.45410854648798704 +/- 0.00107521956753799
exec: avg: 0.11281821667216718 +/- 0.00011939747308270317

# master branch + optimization (*) w/ PEP 563 semantics
$ ./python ~/ann_test.py 2
code size: 110488
unmarshal: avg: 0.3184352931333706 +/- 0.0015278719180908732
exec: avg: 0.11042822999879717 +/- 0.00018108884723599264

# co_annotations reference implementation w/ PEP 649 semantics
$ ./python ~/ann_test.py 3
code size: 229679
unmarshal: avg: 0.6402394526172429 +/- 0.0006400500128250688
exec: avg: 0.09774857209995388 +/- 9.275466265195788e-05

# co_annotations reference implementation + optimization (*) w/ PEP 649 semantics
$ ./python ~/ann_test.py 3
code size: 204963
unmarshal: avg: 0.5824743471574039 +/- 0.007219086642131638
exec: avg: 0.09641968684736639 +/- 0.0001416784753249878
```

(*) I found that constant folding creates a new tuple every time, even though the same tuple is already in the constant table.
See https://github.com/python/cpython/pull/25419

For the co_annotations builds, I also cherry-picked https://github.com/python/cpython/pull/23056.

--
Inada Naoki <songofacandy@gmail.com>
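
For readers who do not have the gist at hand, a benchmark of the kind described above might look roughly like the sketch below. This is not the code from the gist. The function signature used for the generated functions, the loop and repeat counts, and the helper names (`make_source`, `bench`, `report`, `run`) are all assumptions made for illustration, and only the stock and PEP 563 modes are covered.

```python
import marshal
import statistics
import time


def make_source(n_funcs=1000):
    """Build module source text containing n_funcs annotated functions."""
    lines = []
    for i in range(n_funcs):
        # Hypothetical signature; the real benchmark may annotate differently.
        lines.append(f"def f{i}(a: int, b: str = '', *args: float) -> bool:")
        lines.append("    pass")
    return "\n".join(lines) + "\n"


def bench(data, loops=10, repeat=5):
    """Time marshal.loads() and exec() on a marshalled module code object."""
    load_times, exec_times = [], []
    for _ in range(repeat):
        t0 = time.perf_counter()
        for _ in range(loops):
            code = marshal.loads(data)
        t1 = time.perf_counter()
        for _ in range(loops):
            exec(code, {})  # fresh globals, so each run defines the functions anew
        t2 = time.perf_counter()
        load_times.append(t1 - t0)
        exec_times.append(t2 - t1)
    return load_times, exec_times


def report(name, samples):
    """Print the mean and sample standard deviation of the timings."""
    print(f"{name}: avg: {statistics.mean(samples)} +/- {statistics.stdev(samples)}")


def run(n_funcs=1000, future_annotations=False):
    """Compile, marshal, and time the generated module; return the raw numbers."""
    src = make_source(n_funcs)
    if future_annotations:
        # PEP 563 semantics: annotations are stored as strings.
        src = "from __future__ import annotations\n" + src
    data = marshal.dumps(compile(src, "<ann_test>", "exec"))
    print("code size:", len(data))
    load_times, exec_times = bench(data)
    report("unmarshal", load_times)
    report("exec", exec_times)
    return len(data), load_times, exec_times


if __name__ == "__main__":
    run()
```

Calling `run(future_annotations=True)` prepends the `__future__` import to the generated source, which is one way to compare stock and PEP 563 semantics on the same interpreter. The absolute numbers will not match the ones above, since the loop counts and function signatures differ.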