Thanks for doing this! I don't think PEP 649 is going to be accepted or rejected based on either performance or memory usage, but it's nice to see you confirmed that its performance and memory impact is acceptable.

If I run "ann_test.py 1", the annotations are already turned into strings. Why do you do it that way? It makes stock semantics look better, because manually stringized annotations are much faster than evaluating real expressions.

It seems to me that the test would be more fair if test 1 used real annotations. So I added this to "lines":

```
from types import SimpleNamespace
foo = SimpleNamespace()
foo.bar = SimpleNamespace()
foo.bar.baz = float
```

I also changed quote(t) so it always returned t unchanged. When I ran it that way, the "exec" time for stock semantics got larger.
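
To make the comparison concrete, here is a minimal sketch of the kind of experiment I mean. It is not the code from ann_test.py: make_source and time_exec are hypothetical helpers, and the generated function signatures are just an assumption about what a typical annotated function looks like. The gap only shows up on an interpreter that evaluates annotations eagerly (stock semantics, e.g. 3.9).

```python
import timeit
from types import SimpleNamespace

# Same names as the lines added to the benchmark above.
foo = SimpleNamespace()
foo.bar = SimpleNamespace()
foo.bar.baz = float


def make_source(annotation, count=1000):
    """Hypothetical helper: source for `count` functions using `annotation`."""
    return "\n".join(
        f"def f{i}(a: {annotation}, b: {annotation}) -> {annotation}: pass"
        for i in range(count)
    )


# Real expressions versus manually stringized annotations.
real_code = compile(make_source("foo.bar.baz"), "<real>", "exec")
quoted_code = compile(make_source("'foo.bar.baz'"), "<quoted>", "exec")


def time_exec(code, repeat=20):
    """Hypothetical helper: best-of-`repeat` time to exec `code` once."""
    # Every run gets a fresh namespace that only contains `foo`.
    return min(
        timeit.repeat(lambda: exec(code, {"foo": foo}), number=1, repeat=repeat)
    )


real_time = time_exec(real_code)
quoted_time = time_exec(quoted_code)
print(f"real annotations:   {real_time:.6f}s")
print(f"quoted annotations: {quoted_time:.6f}s")
```

With eager evaluation, every "def" in the first variant has to look up foo, then .bar, then .baz for each annotation, while the second variant only loads string constants.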

Cheers,

/arry

On 4/14/21 6:44 PM, Inada Naoki wrote:

I added memory usage data measured with tracemalloc.

```
# Python 3.9 w/ old semantics
$ python3 ann_test.py 1
code size: 121011
memory: (385200, 385200)
unmarshal: avg: 0.3341682574478909 +/- 3.700437551781949e-05
exec: avg: 0.4067857594229281 +/- 0.0006858555167675445

# Python 3.9 w/ PEP 563 semantics
$ python3 ann_test.py 2
code size: 121070
memory: (398675, 398675)
unmarshal: avg: 0.3352349083404988 +/- 7.749102039824168e-05
exec: avg: 0.24610224328935146 +/- 0.0008628035427956459

# master + optimization w/ PEP 563 semantics
$ ./python ~/ann_test.py 2
code size: 110488
memory: (193572, 193572)
unmarshal: avg: 0.31316645480692384 +/- 0.00011766086337841035
exec: avg: 0.11456295938696712 +/- 0.0017481202239372398

# co_annotations + optimization w/ PEP 649 semantics
$ ./python ~/ann_test.py 3
code size: 204963
memory: (208273, 208273)
unmarshal: avg: 0.597023528907448 +/- 0.00016614519056599577
exec: avg: 0.09546191191766411 +/- 0.00018099485135812695
```
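
For readers who want to reproduce the "memory" line, the sketch below shows one way a (current, peak) pair like the ones above could be collected with tracemalloc. It is an assumption about the approach, not the code from the gist: the generated source and the names source, data, and namespace are made up for illustration.

```python
import marshal
import tracemalloc

# Illustrative workload: 1000 annotated functions (not the gist's exact source).
source = "\n".join(
    f"def f{i}(a: int, b: str) -> float: pass" for i in range(1000)
)
data = marshal.dumps(compile(source, "<bench>", "exec"))

# Trace only the allocations made while loading and executing the module code.
tracemalloc.start()
code = marshal.loads(data)
namespace = {}
exec(code, namespace)
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print("code size:", len(data))
print("memory:", (current, peak))
```

tracemalloc.get_traced_memory() returns the current and peak traced sizes in bytes, which matches the shape of the tuples in the results.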

Summary:

* Both PEP 563 and PEP 649 have lower memory consumption than Python 3.9.
* Import time (unmarshal + exec) is about 0.7 sec with old semantics and
with PEP 649, and about 0.43 sec with PEP 563.
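
The import-time figures in the summary are just the sum of the two averages in each result. A small sketch of that arithmetic, with the numbers copied from the results above (the dictionary keys are only labels chosen for this example):

```python
# (unmarshal, exec) averages in seconds, copied from the results above.
results = {
    "3.9, old semantics": (0.3341682574478909, 0.4067857594229281),
    "3.9, PEP 563": (0.3352349083404988, 0.24610224328935146),
    "master + optimization, PEP 563": (0.31316645480692384, 0.11456295938696712),
    "co_annotations + optimization, PEP 649": (0.597023528907448, 0.09546191191766411),
}

# Import time is taken here as unmarshal time plus exec time.
totals = {
    name: unmarshal + exec_time
    for name, (unmarshal, exec_time) in results.items()
}

for name, total in totals.items():
    print(f"{name}: {total:.2f} sec")
```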

On Thu, Apr 15, 2021 at 10:31 AM Inada Naoki <songofacandy@gmail.com> wrote:

I created a simple benchmark:
https://gist.github.com/methane/abb509e5f781cc4a103cc450e1e7925d

This benchmark creates 1000 annotated functions and measures the time to
load and exec them.
Here are the results. All interpreters were built without --pydebug,
--enable-optimizations, and --with-lto.
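
In case the gist is not at hand, the overall shape of such a benchmark can be sketched as follows. This is a simplified illustration, not the gist itself: the function signatures, the report helper, the repeat counts, and the use of a sample standard deviation for the "+/-" column are all assumptions.

```python
import marshal
import statistics
import timeit

NUM_FUNCS = 1000

# Illustrative source: 1000 small annotated functions.
source = "\n".join(
    f"def f{i}(a: int, b: str, c: float) -> bool: pass" for i in range(NUM_FUNCS)
)
code = compile(source, "<bench>", "exec")
data = marshal.dumps(code)  # roughly what ends up in a .pyc file
print("code size:", len(data))


def report(label, func, repeat=5, number=100):
    """Hypothetical helper: time `func` and print mean and spread."""
    times = timeit.repeat(func, repeat=repeat, number=number)
    print(f"{label}: avg: {statistics.mean(times)} +/- {statistics.stdev(times)}")
    return times


# "unmarshal" approximates loading the code object from a .pyc file;
# "exec" approximates running the module body, which creates the functions.
unmarshal_times = report("unmarshal", lambda: marshal.loads(data))
exec_times = report("exec", lambda: exec(code, {}))
```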

```
# Python 3.9 w/ stock semantics

$ python3 ~/ann_test.py 1
code size: 121011
unmarshal: avg: 0.33605549649801103 +/- 0.007382938279889738
exec: avg: 0.395090194279328 +/- 0.001004608380122509

# Python 3.9 w/ PEP 563 semantics

$ python3 ~/ann_test.py 2
code size: 121070
unmarshal: avg: 0.3407619891455397 +/- 0.0011833618746421965
exec: avg: 0.24590165729168803 +/- 0.0003123404336687428

# master branch w/ PEP 563 semantics

$ ./python ~/ann_test.py 2
code size: 149086
unmarshal: avg: 0.45410854648798704 +/- 0.00107521956753799
exec: avg: 0.11281821667216718 +/- 0.00011939747308270317

# master branch + optimization (*) w/ PEP 563 semantics
$ ./python ~/ann_test.py 2
code size: 110488
unmarshal: avg: 0.3184352931333706 +/- 0.0015278719180908732
exec: avg: 0.11042822999879717 +/- 0.00018108884723599264

# co_annotations reference implementation w/ PEP 649 semantics

$ ./python ~/ann_test.py 3
code size: 229679
unmarshal: avg: 0.6402394526172429 +/- 0.0006400500128250688
exec: avg: 0.09774857209995388 +/- 9.275466265195788e-05

# co_annotations reference implementation + optimization (*) w/ PEP 649 semantics

$ ./python ~/ann_test.py 3
code size: 204963
unmarshal: avg: 0.5824743471574039 +/- 0.007219086642131638
exec: avg: 0.09641968684736639 +/- 0.0001416784753249878
```

(*) I found that constant folding creates a new tuple every time, even
when the same tuple is already in the constant table.
See https://github.com/python/cpython/pull/25419
For co_annotations, I cherry-picked
https://github.com/python/cpython/pull/23056 too.


--
Inada Naoki  <songofacandy@gmail.com>