Thanks for doing this!  I don't think PEP 649 is going to be accepted or rejected based on either performance or memory usage, but it's nice to see you confirmed that its performance and memory impact is acceptable.

If I run "ann_test.py 1", the annotations are already turned into strings.  Why do you do it that way?  It makes stock semantics look better, because manually stringized annotations are much faster than evaluating real expressions.

It seems to me that the test would be more fair if test 1 used real annotations.  So I added this to "lines":

    from types import SimpleNamespace
    foo = SimpleNamespace()
    foo.bar = SimpleNamespace()
    foo.bar.baz = float

I also changed quote(t) so it always returned t unchanged.  When I ran it that way, stock semantics "exec" time got larger.

Cheers,

//arry/

On 4/14/21 6:44 PM, Inada Naoki wrote:
I added memory usage data, measured with tracemalloc.
```
# Python 3.9 w/ old semantics
$ python3 ann_test.py 1
code size: 121011
memory: (385200, 385200)
unmarshal: avg: 0.3341682574478909 +/- 3.700437551781949e-05
exec: avg: 0.4067857594229281 +/- 0.0006858555167675445

# Python 3.9 w/ PEP 563 semantics
$ python3 ann_test.py 2
code size: 121070
memory: (398675, 398675)
unmarshal: avg: 0.3352349083404988 +/- 7.749102039824168e-05
exec: avg: 0.24610224328935146 +/- 0.0008628035427956459

# master + optimization w/ PEP 563 semantics
$ ./python ~/ann_test.py 2
code size: 110488
memory: (193572, 193572)
unmarshal: avg: 0.31316645480692384 +/- 0.00011766086337841035
exec: avg: 0.11456295938696712 +/- 0.0017481202239372398

# co_annotations + optimization w/ PEP 649 semantics
$ ./python ~/ann_test.py 3
code size: 204963
memory: (208273, 208273)
unmarshal: avg: 0.597023528907448 +/- 0.00016614519056599577
exec: avg: 0.09546191191766411 +/- 0.00018099485135812695
```
Summary:
* Both PEP 563 and PEP 649 use less memory than Python 3.9.
* Import time (unmarshal + exec) is about 0.7 sec with the old semantics and with PEP 649, and about 0.43 sec with PEP 563.
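
For readers who want to reproduce the "memory:" lines, here is a minimal sketch of one way to take such a measurement. It assumes the two numbers are the (current, peak) tuple returned by tracemalloc.get_traced_memory(); the helper name traced_exec and the generated source are hypothetical and are not taken from ann_test.py.

```python
import tracemalloc


def traced_exec(source):
    """Exec `source` and return tracemalloc's (current, peak) byte counts."""
    code = compile(source, "<mem>", "exec")
    namespace = {}  # keeps the created functions alive while we measure
    tracemalloc.start()
    try:
        exec(code, namespace)
        current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return current, peak


if __name__ == "__main__":
    # Hypothetical workload: 1000 small annotated functions.
    src = "".join(
        f"def f{i}(a: int, b: str) -> float: pass\n" for i in range(1000)
    )
    print("memory:", traced_exec(src))
```

Compiling before tracemalloc.start() keeps compiler allocations out of the numbers, so only the objects created while executing the module body are counted.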
On Thu, Apr 15, 2021 at 10:31 AM Inada Naoki <songofacandy@gmail.com> wrote:
I created simple benchmark: https://gist.github.com/methane/abb509e5f781cc4a103cc450e1e7925d
This benchmark creates 1000 annotated functions and measures the time to load and exec them. Here are the results. All interpreters are built without --with-pydebug, --enable-optimizations, and --with-lto.
```
# Python 3.9 w/ stock semantics
$ python3 ~/ann_test.py 1
code size: 121011
unmarshal: avg: 0.33605549649801103 +/- 0.007382938279889738
exec: avg: 0.395090194279328 +/- 0.001004608380122509

# Python 3.9 w/ PEP 563 semantics
$ python3 ~/ann_test.py 2
code size: 121070
unmarshal: avg: 0.3407619891455397 +/- 0.0011833618746421965
exec: avg: 0.24590165729168803 +/- 0.0003123404336687428

# master branch w/ PEP 563 semantics
$ ./python ~/ann_test.py 2
code size: 149086
unmarshal: avg: 0.45410854648798704 +/- 0.00107521956753799
exec: avg: 0.11281821667216718 +/- 0.00011939747308270317

# master branch + optimization (*) w/ PEP 563 semantics
$ ./python ~/ann_test.py 2
code size: 110488
unmarshal: avg: 0.3184352931333706 +/- 0.0015278719180908732
exec: avg: 0.11042822999879717 +/- 0.00018108884723599264

# co_annotations reference implementation w/ PEP 649 semantics
$ ./python ~/ann_test.py 3
code size: 229679
unmarshal: avg: 0.6402394526172429 +/- 0.0006400500128250688
exec: avg: 0.09774857209995388 +/- 9.275466265195788e-05

# co_annotations reference implementation + optimization (*) w/ PEP 649 semantics
$ ./python ~/ann_test.py 3
code size: 204963
unmarshal: avg: 0.5824743471574039 +/- 0.007219086642131638
exec: avg: 0.09641968684736639 +/- 0.0001416784753249878
```
(*) I found that constant folding creates a new tuple every time, even though the same tuple is already in the constant table. See https://github.com/python/cpython/pull/25419
For co_annotations, I cherry-picked https://github.com/python/cpython/pull/23056 too.
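
For anyone who wants to try this kind of measurement without opening the gist, here is a rough sketch of a benchmark with the shape described above: generate annotated functions, marshal the compiled code, then time unmarshal and exec separately. It is not the code from the gist. The function signatures, the repeat and loop counts, and the names make_source and bench are all assumptions made for illustration.

```python
import marshal
import statistics
import time


def make_source(n_funcs, future_import=False):
    """Build source text defining n_funcs annotated functions (made-up layout)."""
    lines = ["from __future__ import annotations"] if future_import else []
    for i in range(n_funcs):
        lines.append(f"def f{i}(a: int, b: str, c: float) -> list:")
        lines.append("    pass")
    return "\n".join(lines) + "\n"


def bench(source, repeat=5, loops=20):
    """Return (marshalled size, unmarshal timings, exec timings).

    Each timing is the wall-clock seconds for `loops` consecutive runs.
    """
    data = marshal.dumps(compile(source, "<bench>", "exec"))
    unmarshal_times, exec_times = [], []
    for _ in range(repeat):
        start = time.perf_counter()
        for _ in range(loops):
            code = marshal.loads(data)
        unmarshal_times.append(time.perf_counter() - start)

        start = time.perf_counter()
        for _ in range(loops):
            exec(code, {})  # fresh globals each time, like a new import
        exec_times.append(time.perf_counter() - start)
    return len(data), unmarshal_times, exec_times


if __name__ == "__main__":
    for label, flag in (("default semantics", False), ("PEP 563 semantics", True)):
        size, unm, exe = bench(make_source(1000, future_import=flag))
        print(f"# {label}")
        print("code size:", size)
        print("unmarshal: avg:", statistics.mean(unm), "+/-", statistics.stdev(unm))
        print("exec: avg:", statistics.mean(exe), "+/-", statistics.stdev(exe))
```

Splitting the two phases matters because the variants trade one against the other: a larger marshalled code object costs more to load, even when executing it is cheaper.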
-- Inada Naoki <songofacandy@gmail.com>