Please don't confuse Inada Naoki's benchmark results with the effect PEP 649 would have on a real-world codebase.  His artificial benchmark constructs a thousand empty functions, each taking three parameters with randomly chosen annotations--the results provide some insight but are not directly applicable to reality.

PEP 649's effects on code size / memory / import time depend on the number of annotations and the number of annotated objects, not on the overall code size of the module.  Expressing the results as across-the-board percentages, and suggesting that Python users would see the same numbers with real-world code, is highly misleading.
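To make the point concrete, here's a rough sketch of the *kind* of synthetic module such a benchmark generates (this is my own illustrative reconstruction, not Inada Naoki's actual script; the function and type names are made up):

```python
# Hypothetical reconstruction of the benchmark shape described above:
# many tiny functions whose three parameters carry randomly chosen
# annotations.  In a module like this, essentially ALL of the code is
# annotations, so annotation overhead dominates every measurement.
import random

TYPES = ["int", "str", "float", "bytes", "bool"]

def make_module(n_funcs=1000):
    lines = []
    for i in range(n_funcs):
        a, b, c = (random.choice(TYPES) for _ in range(3))
        lines.append(f"def func_{i}(a: {a}, b: {b}, c: {c}) -> None: pass")
    return "\n".join(lines)

source = make_module(1000)
# Compiling the generated source is enough to compare code-object sizes
# under different annotation semantics.
code = compile(source, "<bench>", "exec")
```

A real application, by contrast, spends most of its bytes on function bodies, so the same percentage deltas would not carry over.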

I too would be interested to know the effect PEP 649 would have on a real-world codebase currently using PEP 563, but AFAIK nobody has reported such results.


On 4/16/21 11:05 AM, Jukka Lehtosalo wrote:
On Fri, Apr 16, 2021 at 5:28 PM Łukasz Langa <> wrote:
[snip] I say "compromise" because as Inada Naoki measured, there's still a non-zero performance cost of PEP 649 versus PEP 563:

- code size: +63%
- memory: +62%
- import time: +60%

Will this hurt some current users of typing? Yes, I can name you multiple past employers of mine where this will be the case. Is it worth it for Pydantic? I tend to think that yes, it is, since it is a significant community, and the operations on type annotations it performs are in the sensible set for which `typing.get_type_hints()` was proposed.

Just to give some more context: in my experience, both import time and memory use tend to be real issues in large Python codebases (code size less so), and I think that the relative efficiency of PEP 563 is an important feature. If PEP 649 can't be made more efficient, this could be a major regression for some users. Python server applications need to run multiple processes because of the GIL, and since code objects generally aren't shared between processes (GC and reference counting make it tricky, I understand), code size increases tend to be amplified on large servers. Even having a lot of RAM doesn't necessarily help, since a lot of RAM typically implies many CPU cores, and thus many processes are needed as well.

I can see how both PEP 563 and PEP 649 bring significant benefits, but typically for different user populations. I wonder if there's a way of combining the benefits of both approaches. I don't like the idea of having toggles for different performance tradeoffs indefinitely, but I can see how this might be a necessary compromise if we don't want to make things worse for any user groups.


Python-Dev mailing list