PEP 563 in light of PEP 649
Hi all,
I got pinged to voice my opinion on PEP 649 as the instigator of PEP 563. I'm sorry, this is long, and a separate thread, because it deals with three things:
- the goals set for PEP 563 and how it did in practice;
- PEP 649 and how it addresses those same goals;
- whether we can cleanly adopt PEP 649.

First off, it looks like this isn't worded clearly enough in the PEP itself, so let me summarize what the goals of PEP 563 were:

Goal 1. To get rid of the forward reference problem, e.g. when a type is declared lower in the file than its use. A cute special case of this is when a class has a method that accepts or returns objects of its own type.

Goal 2. To somewhat decouple the syntax of type annotations from the runtime requirements, allowing for better expressibility.

Goal 3. To make annotations affect the runtime characteristics of typed modules less, namely import time and memory usage.

Now, did PEP 563 succeed in its goals? Well, partially at best. Let's see.

In terms of Goal 1, it turned out that `typing.get_type_hints()` has limits that make its use in general costly at runtime and, more importantly, insufficient to resolve all types. The most common example deals with the non-global context in which types are generated (e.g. inner classes, classes within functions, etc.). But one of the crown examples of forward references -- classes with methods accepting or returning objects of their own type -- also isn't properly handled by `typing.get_type_hints()` if a class generator is used. There's some trickery we can do to connect the dots, but in general it's not great.

As for Goal 2, it became painfully obvious that a number of types used for static typing purposes live outside of the type annotation context. So while PEP 563 tried to enable a backdoor for more usable static typing syntax, it ultimately couldn't. This is where PEP 585 and later PEP 604 came in, filling the gap by doing the sad but necessary work of enabling this extended typing syntax in a proper runtime Python context. This is what should have been done all along, and it makes PEP 563 irrelevant in this context as of Python 3.9 (for PEP 585) and 3.10 (for PEP 604). However, for types used within annotations, the PEP 563 future-import already allows using the new typing syntax in Python 3.7+ compatible code. So library authors can already adopt the lowercase type syntax of PEP 585 and the handy pipe syntax for unions of PEP 604 -- and even use them in non-annotation contexts that can be guarded by an `if TYPE_CHECKING` block, like type aliases, type variables, and such. Of course that has no chance of working with `typing.get_type_hints()`.

Now, Goal 3 is a different matter. As Inada Naoki demonstrated earlier in this discussion, PEP 563 made a fully type-annotated codebase import pretty much as fast as non-annotated code and, through the joys of string interning, use relatively little extra memory. At the time of PEP 563, a popular concern around static typing in Python was that it slows down runtime while only being useful for static analysis. While we (the typing crowd) were always sure the "only useful as a better linter" take is dismissive, the performance argument had to go.

So where does this leave us today? Runtime use of types was treated, somewhat over-optimistically, as solvable with `typing.get_type_hints()`. Pydantic and other similar tools now show that sadly this isn't the case. Without the future-import, they could ignore the problem until Python 3.10, but no longer.
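To make the Goal 1 limitation concrete, here is a minimal sketch of the failure mode described above; the `make_model`/`Point` names are hypothetical, and the behaviour assumes the PEP 563 future-import semantics:

```python
# Minimal sketch (hypothetical names): under PEP 563, annotations on a class
# defined inside a function become strings, and typing.get_type_hints() later
# tries to evaluate them against module globals -- where the local "Point"
# does not exist.
from __future__ import annotations
import typing

def make_model():
    class Point:
        def shifted(self, dx: int, dy: int) -> Point:  # forward reference to itself
            ...
    return Point

cls = make_model()
try:
    typing.get_type_hints(cls.shifted)
except NameError as exc:
    print("get_type_hints() failed:", exc)  # name 'Point' is not defined
```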
I was somewhat surprised this was the case, because forward references as strings could always be used. So I guess the answer there was to just not use them if you want your runtime tool to work. Fair enough.

PEP 649 addresses this runtime usage of type annotations head on, in a way that eluded me when I first set out to solve this problem. Back then, Larry and Mark Shannon did voice their opinion that through some clever frame object storage -- "implicit lambdas" -- we could address the issue of forward references. This seemed unattractive to me at the time because it didn't deal with our Goal 2, and our understanding was that it actually made Goal 3 worse by holding on to all frames in memory where type annotations appear, and by creating massive duplication of equivalent annotations in memory due to the lack of a mechanism similar to string interning. Those issues are somewhat solved in the final PEP 649, and this makes for an interesting compromise for us to make.

I say "compromise" because, as Inada Naoki measured, there's still a non-zero performance cost of PEP 649 versus PEP 563:
- code size: +63%
- memory: +62%
- import time: +60%

Will this hurt some current users of typing? Yes, I can name multiple past employers of mine where this will be the case. Is it worth it for Pydantic? I tend to think that yes, it is, since it is a significant community, and the operations it performs on type annotations are in the sensible set for which `typing.get_type_hints()` was proposed.

However, there are some big open questions about how to adopt PEP 649.

Question 1. What should happen to code that already adopted `from __future__ import annotations`? Future imports were never thought of as feature toggles, so if PEP 563 isn't going to become the default, it should get removed. However, removing it needs a deprecation period, otherwise code that already adopted the future-import will fail to execute (even if you leave a dummy future-import there -- forward references, remember?). So deprecation, and with that, a rather tricky situation where PEP 649 will have to support files with the future-import. PEP 649 says that an object cannot have both __annotations__ and __co_annotations__ set at the same time, though, so I guess files with the future-import would necessarily be treated as some deprecated code that might or might not be translatable to PEP 649 co-annotations.

Question 2. If PEP 649 deprecates PEP 563, will the use of the future-import for early adoption of PEP 585 and PEP 604 syntax become disallowed? Ultimately this is my biggest worry -- there doesn't seem to be a clear adoption path for this change. If we just take it wholesale and declare PEP 563 a failure, that bars library authors from using PEP 585/604 syntax until Python 3.10 becomes their lowest supported version. That's October 2025 at the earliest if we're looking at 3.9's lifespan. That would be a significant wrench thrown into typing adoption, as PEP 585 and 604 provide significant usability advantages, to the point where modules with non-trivial annotations often don't have to import the typing module at all.

Question 3. Additionally, it's unclear to me whether PEP 649 allows for any relaxed syntax in type annotations that might not be immediately valid in a given version of Python. Say, hypothetically (please don't bikeshed this!), we adopt PEP 649 in Python 3.10 and in Python 3.11 we come up with a shorthand for optional types using a question mark. Can I put the question mark there in Python 3.10 at all?
Second made-up example: let's say in Python 3.12 we allow passing a function as a type (meaning "a callable just LIKE this one"). Can I put this callable there now, in Python 3.10?

Now, should we postpone this until Python 3.11? We should not, at least not in terms of figuring out a clear migration path from here to there, ideally such that existing end users of the PEP 563 future-import can just live their lives as if nothing ever happened. Given that the goal of the future-import was largely to say "I don't care about those annotations at runtime", maybe PEP 649 could somehow adopt the existing future-import? `typing.get_type_hints()` would do what it does today anyway. The only users affected would be those directly using the strings in PEP 563 annotations, and it's unclear whether this is a real problem. There are a hundred pages of results on GitHub [1] that include `__annotations__`, so this would have to be sifted through.

If we don't make it in time for including this in Python 3.10, then so be it. But let's not wait until April 2022 ;-)

i-regret-nothing'ly yours,
Łukasz

[1] https://github.com/search?l=Python&q=__annotations__&type=Code
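For readers who haven't followed the PEP 649 design: below is a rough, hand-written illustration of the "implicit lambdas" idea Łukasz mentions above. It is not the compiler's actual output; the decorator and the `Node`/`clone` names are made up for the example.

```python
# Rough illustration (made-up names) of deferring annotation evaluation:
# instead of building the annotations dict at definition time, store a small
# function and only run it when the annotations are actually requested, so a
# forward reference resolves to whatever the name means at that later point.
def lazy_annotations(compute):
    def wrap(func):
        func.__co_annotations__ = compute  # PEP 649 has the compiler generate this
        return func
    return wrap

@lazy_annotations(lambda: {"node": Node, "return": Node})
def clone(node):                 # stand-in for: def clone(node: Node) -> Node: ...
    return Node(node.value)

class Node:                      # defined *after* clone: a classic forward reference
    def __init__(self, value):
        self.value = value

print(clone.__co_annotations__())   # {'node': <class 'Node'>, 'return': <class 'Node'>}
```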
Thanks so much for this, I think it makes a lot of sense. You've saved me by explaining why PEP 563 semantics are problematic for pydantic far more articulately than I could have done.

I can entirely see why PEP 563 made sense at the time. Myself (and others in the "runtime use of annotations" community) should have engaged with the PEP before it was accepted; we might have been able to smooth out these wrinkles then.

*One other route occurs to me:* switch from `from __future__ import annotations` to a per-module feature flag (say, something like `from __switches__ import postponed_annotations`). I don't know if there's a precedent for this, but I think it has a lot of advantages:

* those who want type annotations as strings get their way (just a one-statement switch at the beginning of files)
* those who want runtime access to type annotations (pydantic etc.) get their way
* if you have one module in a big code base that uses pydantic or similar, you only need to pay the price of runtime type annotations in that file
* `from __future__ import annotations` can continue to provide PEP 563 semantics (for now or indefinitely); independently, it can also continue to work for PEP 585 and PEP 604 as I believe it does now
* there's room for PEP 649 or similar in the future to improve forward refs and performance when the switch is not set, without the rush to get it into 3.10
* we could even allow the default to be changed with an env variable or command line flag like -O / PYTHONOPTIMIZE -- but maybe this is too complicated

I understand that adding another switch to Python is not ideal, but given that we are where we are, it seems like a pragmatic solution. For me, I'd be happy if there was any way whatsoever to keep the current (or PEP 649) behaviour in 3.10.

Samuel

--
Samuel Colvin
On Fri, Apr 16, 2021 at 5:28 PM Łukasz Langa <lukasz@langa.pl> wrote:
[snip] I say "compromise" because as Inada Naoki measured, there's still a non-zero performance cost of PEP 649 versus PEP 563:
- code size: +63%
- memory: +62%
- import time: +60%
Will this hurt some current users of typing? Yes, I can name you multiple past employers of mine where this will be the case. Is it worth it for Pydantic? I tend to think that yes, it is, since it is a significant community, and the operations on type annotations it performs are in the sensible set for which `typing.get_type_hints()` was proposed.
Just to give some more context: in my experience, both import time and memory use tend to be real issues in large Python codebases (code size less so), and I think that the relative efficiency of PEP 563 is an important feature. If PEP 649 can't be made more efficient, this could be a major regression for some users. Python server applications need to run multiple processes because of the GIL, and since code objects generally aren't shared between processes (GC and reference counting make it tricky, I understand), code size increases tend to be amplified on large servers. Even having a lot of RAM doesn't necessarily help, since a lot of RAM typically implies many CPU cores, and thus many processes are needed as well.

I can see how both PEP 563 and PEP 649 bring significant benefits, but typically for different user populations. I wonder if there's a way of combining the benefits of both approaches. I don't like the idea of having toggles for different performance tradeoffs indefinitely, but I can see how this might be a necessary compromise if we don't want to make things worse for any user groups.

Jukka
Please don't confuse Inada Naoki's benchmark results with the effect PEP 649 would have on a real-world codebase. His artificial benchmark constructs a thousand empty functions that take three parameters with randomly-chosen annotations--the results provide some insights but are not directly applicable to reality.

PEP 649's effects on code size / memory / import time are contingent on the number of annotations and the number of objects annotated, not the overall code size of the module. Expressing it that way, and suggesting that Python users would see the same results with real-world code, is highly misleading.

I too would be interested to know the effects PEP 649 had on a real-world codebase currently using PEP 563, but AFAIK nobody has reported such results.

//arry/

On 4/16/21 11:05 AM, Jukka Lehtosalo wrote:
On Fri, Apr 16, 2021 at 5:28 PM Łukasz Langa <lukasz@langa.pl <mailto:lukasz@langa.pl>> wrote:
[snip] I say "compromise" because as Inada Naoki measured, there's still a non-zero performance cost of PEP 649 versus PEP 563:
- code size: +63%
- memory: +62%
- import time: +60%
Will this hurt some current users of typing? Yes, I can name you multiple past employers of mine where this will be the case. Is it worth it for Pydantic? I tend to think that yes, it is, since it is a significant community, and the operations on type annotations it performs are in the sensible set for which `typing.get_type_hints()` was proposed.
Just to give some more context: in my experience, both import time and memory use tend to be real issues in large Python codebases (code size less so), and I think that the relative efficiency of PEP 563 is an important feature. If PEP 649 can't be made more efficient, this could be a major regression for some users. Python server applications need to run multiple processes because of the GIL, and since code objects generally aren't shared between processes (GC and reference counting makes it tricky, I understand), code size increases tend to be amplified on large servers. Even having a lot of RAM doesn't necessarily help, since a lot of RAM typically implies many CPU cores, and thus many processes are needed as well.
I can see how both PEP 563 and PEP 649 bring significant benefits, but typically for different user populations. I wonder if there's a way of combining the benefits of both approaches. I don't like the idea of having toggles for different performance tradeoffs indefinitely, but I can see how this might be a necessary compromise if we don't want to make things worse for any user groups.
Jukka
On Fri, Apr 16, 2021 at 12:32 PM Larry Hastings <larry@hastings.org> wrote:
Please don't confuse Inada Naoki's benchmark results with the effect PEP 649 would have on a real-world codebase. His artificial benchmark constructs a thousand empty functions that take three parameters with randomly-chosen annotations--the results provide some insights but are not directly applicable to reality.
PEP 649's effects on code size / memory / import time are contingent on the number of annotations and the number of objects annotated, not the overall code size of the module. Expressing it that way, and suggesting that Python users would see the same results with real-world code, is highly misleading.
I too would be interested to know the effects PEP 649 had on a real-world codebase currently using PEP 563, but AFAIK nobody has reported such results.
I'm not going to report results, but we could use mypy itself as an example real-world code base. Mypy is almost 100% annotated. It does not include `from __future__ import annotations` lines, but those could easily be added mechanically for an experiment.

ISTM that the unmarshal times reported by Inada are largely proportional to the code size numbers, so perhaps the following three-way experiment would give an indication:

(1) Add the sizes of all pyc files for mypy run with Python 3.9 (classic)
(2) Ditto run with Python 3.10a7 (PEP 563)
(3) Ditto run with Larry's branch (PEP 649, assuming it's on by default there -- otherwise, modify the source by inserting the needed future import at the top)

The repo is github.com/python/mypy, the subdirectory to look at is mypy, WITH THE EXCLUSION OF THE typeshed SUBDIRECTORY THEREOF.

--
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him* *(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
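A rough sketch of the tally Guido describes, assuming a checkout of github.com/python/mypy in the current directory; run it once under each interpreter being compared:

```python
# Sketch: byte-compile mypy/ (skipping typeshed/) with the running interpreter,
# then sum the sizes of the generated .pyc files.
import compileall
import pathlib
import re

SRC = pathlib.Path("mypy")  # assumes the mypy repo's "mypy" subdirectory
compileall.compile_dir(SRC, rx=re.compile(r"typeshed"), quiet=1, force=True)

total = sum(
    p.stat().st_size
    for p in SRC.rglob("__pycache__/*.pyc")
    if "typeshed" not in p.parts
)
print(f"total .pyc size: {total} bytes")
```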
On 4/16/21 5:00 PM, Guido van Rossum wrote:
(3) Ditto run with Larry's branch (PEP 649, assuming it's on by default there -- otherwise, modify the source by inserting the needed future import at the top)
The co_annotations stuff in my branch is gated with "from __future__ import co_annotations". Without that import, my branch has stock semantics.

Also, in case somebody does do this testing: don't use my branch for "from __future__ import annotations" testing. There are neat new optimizations for stringized annotations, but my branch is too out-of-date to have them.

Cheers,

//arry/
On Sat, Apr 17, 2021 at 1:38 PM Guido van Rossum <guido@python.org> wrote:
I'm not going to report results, but we could use mypy itself as an example real-world code base. Mypy is almost 100% annotated. It does not include `from __future__ import annotations` lines but those could easily be added mechanically for some experiment.
ISTM that the unmarshal times reported by Inada are largely proportional to the code size numbers, so perhaps the following three-way experiment would give an indication:
(1) Add the sizes of all pyc files for mypy run with Python 3.9 (classic)
(2) Ditto run with Python 3.10a7 (PEP 563)
(3) Ditto run with Larry's branch (PEP 649, assuming it's on by default there -- otherwise, modify the source by inserting the needed future import at the top)
Please don't use 3.10a7; use the latest master branch. The CFG optimizer broke some PEP 563 optimizations, and I fixed that yesterday. https://github.com/python/cpython/pull/25419
The repo is github.com/python/mypy, the subdirectory to look is mypy, WITH THE EXCLUSION OF THE typeshed SUBDIRECTORY THEREOF.
I want to measure import time and memory usage. Will `import mypy.main` import all important modules?

This is my quick result of (1) and (2). I cannot try (3) because of a memory error (see below).

## memory usage

```
$ cat a.py
import tracemalloc
tracemalloc.start()
import mypy.main
print("memory:", tracemalloc.get_traced_memory()[0])

# (1)
$ python3 a.py
memory: 8963137
$ python3 -OO a.py
memory: 8272848

# (2)
$ ~/local/python-dev/bin/python3 a.py
memory: 8849216
$ ~/local/python-dev/bin/python3 -OO a.py
memory: 8104730

(8963137-8849216)/8963137 = 0.012709947421310196
(8272848-8104730)/8272848 = 0.020321659481716575
```

PEP 563 saved 1~2% memory.

## GC time

```
$ pyperf timeit -s 'import mypy.main, gc' -- 'gc.collect()'
3.9:  ..................... 2.68 ms +- 0.02 ms
3.10: ..................... 2.23 ms +- 0.01 ms

Mean +- std dev: [3.9] 2.68 ms +- 0.02 ms -> [3.10] 2.23 ms +- 0.01 ms: 1.20x faster
```

PEP 563 is 1.2x faster!

## import time

```
$ python3 -m pyperf command python3 -c 'import mypy.main'
(1) command: Mean +- std dev: 99.6 ms +- 0.3 ms
(2) command: Mean +- std dev: 93.3 ms +- 1.2 ms

(99.6-93.3)/99.6 = 0.06325301204819275
```

PEP 563 reduced import time by 6%.

## memory error on co_annotations

I modified py_compile to add `from __future__ import co_annotations` automatically.

```
$ ../co_annotations/python -m compileall mypy
Listing 'mypy'...
Compiling 'mypy/checker.py'...
free(): corrupted unsorted chunks
Aborted

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff7c73859 in __GI_abort () at abort.c:79
#2  0x00007ffff7cde3ee in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff7e08285 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#3  0x00007ffff7ce647c in malloc_printerr (str=str@entry=0x7ffff7e0a718 "free(): corrupted unsorted chunks") at malloc.c:5347
#4  0x00007ffff7ce81c2 in _int_free (av=0x7ffff7e39b80 <main_arena>, p=0x555555d1db30, have_lock=<optimized out>) at malloc.c:4356
#5  0x0000555555603906 in PyMem_RawFree (ptr=<optimized out>) at Objects/obmalloc.c:1922
#6  _PyObject_Free (ctx=<optimized out>, p=<optimized out>) at Objects/obmalloc.c:1922
#7  _PyObject_Free (ctx=<optimized out>, p=<optimized out>) at Objects/obmalloc.c:1913
#8  0x000055555567caa9 in compiler_unit_free (u=0x555555ef0fd0) at Python/compile.c:583
#9  0x000055555568aea5 in compiler_exit_scope (c=0x7fffffffc3d0) at Python/compile.c:760
#10 compiler_function (c=0x7fffffffc3d0, s=<optimized out>, is_async=0) at Python/compile.c:2529
#11 0x000055555568837d in compiler_visit_stmt (s=<optimized out>, c=0x7fffffffc3d0) at Python/compile.c:3665
#12 compiler_body (c=c@entry=0x7fffffffc3d0, stmts=0x555556222450) at Python/compile.c:1977
#13 0x0000555555688e51 in compiler_class (c=c@entry=0x7fffffffc3d0, s=s@entry=0x555556222a60) at Python/compile.c:2623
#14 0x0000555555687ce3 in compiler_visit_stmt (s=<optimized out>, c=0x7fffffffc3d0) at Python/compile.c:3667
#15 compiler_body (c=c@entry=0x7fffffffc3d0, stmts=0x5555563014c0) at Python/compile.c:1977
#16 0x000055555568db00 in compiler_mod (filename=0x7ffff72e6770, mod=0x5555563017b0, c=0x7fffffffc3d0) at Python/compile.c:2001
```

--
Inada Naoki <songofacandy@gmail.com>
Obviously that's a bug. Can you send me this test case? Anything works--GitHub, private email, whatever is most convenient for you.

Thank you!

//arry/

On 4/16/21 11:22 PM, Inada Naoki wrote:
## memory error on co_annotations
I modified py_compile to add `from __future__ import co_annotations` automatically.
```
$ ../co_annotations/python -m compileall mypy
Listing 'mypy'...
Compiling 'mypy/checker.py'...
free(): corrupted unsorted chunks
Aborted
```
I noticed something this morning: there's another way in which Inada Naoki's benchmark here is--possibly?--unrealistic.

As mentioned, his benchmark generates a thousand functions, each of which takes exactly three parameters, and each of those parameters randomly chooses one of three annotations. In current trunk (not in my branch, I'm behind), there's an optimization for stringized annotations that compiles the annotations into a tuple, and then when you pull out __annotations__ on the object at runtime it converts it into a dict on demand.

This means that even though there are a thousand functions, they only ever generate one of nine possible tuples for these annotation tuples. And here's the thing: our lovely marshal module is smart enough to notice that these tuples /are/ duplicates, and it'll throw away the duplicates and replace them with references to the original.

Something analogous /could/ happen in the PEP 649 branch but currently doesn't. When running Inada Naoki's benchmark, there are a total of nine possible annotations code objects. Except, each function generated by the benchmark has a unique name, and I incorporate that name into the name given to the code object (f"{function_name}.__co_annotations__"). Since each function name is different, each code object name is different, so each code object /hash/ is different, and since they aren't /exact/ duplicates they are never consolidated.

Inada Naoki has suggested changing this, so that all the annotations code objects have the same name ("__co_annotations__"). If we made that change, I'm pretty sure the code size delta in this synthetic benchmark would drop. I haven't done it because the current name of the code object might be helpful in debugging, and I'm not convinced this would have an effect in real-world code.

But... would it? Someone, and again I think it's Inada Naoki, suggests that in real-world applications, there are often many, many functions in a single module that have identical signatures. The annotation-tuples optimization naturally takes advantage of that. PEP 649 doesn't. Should it? Would this really be beneficial to real-world code bases?

Cheers,

//arry/

On 4/16/21 12:26 PM, Larry Hastings wrote:
Please don't confuse Inada Naoki's benchmark results with the effect PEP 649 would have on a real-world codebase. His artificial benchmark constructs a thousand empty functions that take three parameters with randomly-chosen annotations--the results provide some insights but are not directly applicable to reality.
PEP 649's effects on code size / memory / import time are contingent on the number of annotations and the number of objects annotated, not the overall code size of the module. Expressing it that way, and suggesting that Python users would see the same results with real-world code, is highly misleading.
I too would be interested to know the effects PEP 649 had on a real-world codebase currently using PEP 563, but AFAIK nobody has reported such results.
//arry/
Oops: where I said nine, I should have said twenty-seven. 3 cubed. Should have had my coffee /before/ posting. Carry on!

//arry/

On 4/19/21 10:51 AM, Larry Hastings wrote:
I noticed something this morning: there's another way in which Inada Naoki's benchmark here is--possibly?--unrealistic.
As mentioned, his benchmark generates a thousand functions, each of which takes exactly three parameters, and each of those parameters randomly chooses one of three annotations. In current trunk (not in my branch, I'm behind), there's an optimization for stringized annotations that compiles the annotations into a tuple, and then when you pull out __annotations__ on the object at runtime it converts it into a dict on demand.
This means that even though there are a thousand functions, they only ever generate one of nine possible tuples for these annotation tuples. And here's the thing: our lovely marshal module is smart enough to notice that these tuples /are/ duplicates, and it'll throw away the duplicates and replace them with references to the original.
Something analogous /could/ happen in the PEP 649 branch but currently doesn't. When running Inada Noki's benchmark, there are a total of nine possible annotations code objects. Except, each function generated by the benchmark has a unique name, and I incorporate that name into the name given to the code object (f"{function_name}.__co_annotations__"). Since each function name is different, each code object name is different, so each code object /hash/ is different, and since they aren't /exact/ duplicates they are never consolidated.
Inada Naoki has suggested changing this, so that all the annotations code objects have the same name ("__co_annotations__"). If we made that change, I'm pretty sure the code size delta in this synthetic benchmark would drop. I haven't done it because the current name of the code object might be helpful in debugging, and I'm not convinced this would have an effect in real-world code.
But... would it? Someone, and again I think it's Inada Naoki, suggests that in real-world applications, there are often many, many functions in a single module that have identical signatures. The annotation-tuples optimization naturally takes advantage of that. PEP 649 doesn't. Should it? Would this really be beneficial to real-world code bases?
Cheers,
//arry/
On 4/19/21 10:51 AM, Larry Hastings wrote:
Something analogous /could/ happen in the PEP 649 branch but currently doesn't. When running Inada Noki's benchmark, there are a total of nine possible annotations code objects. Except, each function generated by the benchmark has a unique name, and I incorporate that name into the name given to the code object (f"{function_name}.__co_annotations__"). Since each function name is different, each code object name is different, so each code object /hash/ is different, and since they aren't /exact/ duplicates they are never consolidated.
I hate anonymous functions, so the name is very important to me. The primary code base I work on does have hundreds of methods with the same signature -- unfortunately, many of them also have the same name (four levels of super() calls is not unusual, and all to the same read/write/create parent methods from read/write/create child methods). In such a case would the name make a meaningful difference?

Or maybe the name can be stored when running in debug mode, and not stored with -O?

--
~Ethan~
On Mon, 19 Apr 2021 13:37:56 -0700 Ethan Furman <ethan@stoneleaf.us> wrote:
On 4/19/21 10:51 AM, Larry Hastings wrote:
Something analogous /could/ happen in the PEP 649 branch but currently doesn't. When running Inada Noki's benchmark, there are a total of nine possible annotations code objects. Except, each function generated by the benchmark has a unique name, and I incorporate that name into the name given to the code object (f"{function_name}.__co_annotations__"). Since each function name is different, each code object name is different, so each code object /hash/ is different, and since they aren't /exact/ duplicates they are never consolidated.
I hate anonymous functions, so the name is very important to me.
You are unlikely to notice the name of the code object underlying __co_annotations__, aren't you?
Or maybe the name can be store when running in debug mode, and not stored with -O ?
Almost nobody uses -O. Optimizations that are enabled only in -O are useless. Regards Antoine.
On 19/04/2021 22:01, Antoine Pitrou wrote:
Almost nobody uses -O. Optimizations that are enabled only in -O are useless.

Data point: I use -O.¹ Not frequently, not usually, but I have a few large² programs that I add features to from time to time and will doubtless continue to do so. I need their __debug__-enabled diagnostics and asserts during development and testing. When I run them "for real", I use -O.

Of course, since I never use annotations, type hints or whatever, this may not be very relevant to this thread.

Best wishes
Rob Cliffe
¹ Which makes me "almost nobody" - well, I knew that already. 😁 ² Your idea of "large" may not be the same as mine - I'm talking a few thousand lines.
On 4/19/21 1:37 PM, Ethan Furman wrote:
On 4/19/21 10:51 AM, Larry Hastings wrote:
Something analogous /could/ happen in the PEP 649 branch but currently doesn't. When running Inada Noki's benchmark, there are a total of nine possible annotations code objects. Except, each function generated by the benchmark has a unique name, and I incorporate that name into the name given to the code object (f"{function_name}.__co_annotations__"). Since each function name is different, each code object name is different, so each code object /hash/ is different, and since they aren't /exact/ duplicates they are never consolidated.
I hate anonymous functions, so the name is very important to me. The primary code base I work on does have hundreds of methods with the same signature -- unfortunately, many of the also have the same name (four levels of super() calls is not unusual, and all to the same read/write/create parent methods from read/write/create child methods). In such a case would the name make a meaningful difference?
Or maybe the name can be store when running in debug mode, and not stored with -O ?
I think it needs to have /a/ name. But if it made a difference, perhaps it could use f"{function_name}.__co_annotations__" normally, and simply "__co_annotations__" with -O.

Note also that this is the name of the annotations code object, although I think the annotations function object reuses the name too. Anyway, under normal circumstances, the Python programmer would have no reason to interact directly with the annotations code/function object, so it's not likely it will affect them one way or another. The only time they would see it would be, say, if the calculation of an annotation threw an exception, in which case it seems like seeing f"{function_name}.__co_annotations__" in the traceback might be a helpful clue in diagnosing the problem.

I'd want to see some real numbers before considering changes here. If it has a measurable and beneficial effect on real-world code, okay! let's change it! But my suspicion is that it doesn't really matter.

Cheers,

//arry/
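A tiny illustration of the sharing question being discussed, using only stock `compile()` and `CodeType.replace()` (this is not PEP 649's machinery): two otherwise identical code objects that differ only in co_name compare unequal, so they can't be collapsed into one shared constant.

```python
# Two "annotation-like" code objects compiled from the same source; only the
# name differs.  Equal code objects could be consolidated, unequal ones can't.
def make_co(name):
    code = compile("{'a': int, 'b': str, 'return': float}", "<ann>", "eval")
    return code.replace(co_name=name)

a = make_co("f.__co_annotations__")
b = make_co("g.__co_annotations__")
c = make_co("g.__co_annotations__")
print(a == b)  # False: names differ, so no sharing
print(b == c)  # True: identical in every field, so sharing would be possible
```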
On Mon, Apr 19, 2021 at 14:17, Larry Hastings (<larry@hastings.org>) wrote:
On 4/19/21 1:37 PM, Ethan Furman wrote:
On 4/19/21 10:51 AM, Larry Hastings wrote:
Something analogous /could/ happen in the PEP 649 branch but currently doesn't. When running Inada Noki's benchmark, there are a total of nine possible annotations code objects. Except, each function generated by the benchmark has a unique name, and I incorporate that name into the name given to the code object (f"{function_name}.__co_annotations__"). Since each function name is different, each code object name is different, so each code object /hash/ is different, and since they aren't /exact/ duplicates they are never consolidated.
I hate anonymous functions, so the name is very important to me. The primary code base I work on does have hundreds of methods with the same signature -- unfortunately, many of the also have the same name (four levels of super() calls is not unusual, and all to the same read/write/create parent methods from read/write/create child methods). In such a case would the name make a meaningful difference?
Or maybe the name can be store when running in debug mode, and not stored with -O ?
I think it needs to have *a* name. But if it made a difference, perhaps it could use f"{function_name}.__co_annotations__" normally, and simply "__co_annotations__" with -O.
Note also that this is the name of the annotations code object, although I think the annotations function object reuses the name too. Anyway, under normal circumstances, the Python programmer would have no reason to interact directly with the annotations code/function object, so it's not likely it will affect them one way or another. The only time they would see it would be, say, if the calculation of an annotation threw an exception, in which case it seems like seeing f"{function_name}.__co_annotations__" in the traceback might be a helpful clue in diagnosing the problem.
If the line numbers in the traceback are right, I'm not sure the function name would make much of a difference.
I'd want to see some real numbers before considering changes here. If it has a measurable and beneficial effect on real-world code, okay! let's change it! But my suspicion is that it doesn't really matter.
Cheers,
*/arry*
Just an idea: do not save co_name and co_firstlineno in the code object for function annotations. When creating a function object from the code object, they can be copied from the annotated function.

I think co_name and co_firstlineno are what prevents the code object from being shared at compile time. We can share only co_names and co_consts for now. If we can share the entire code object, it will reduce pyc file size and unmarshal time (e.g. part of import time). (A rough sketch of the idea follows after the quoted message below.)

On Tue, Apr 20, 2021 at 6:15 AM Larry Hastings <larry@hastings.org> wrote:
On 4/19/21 1:37 PM, Ethan Furman wrote:
On 4/19/21 10:51 AM, Larry Hastings wrote:
Something analogous /could/ happen in the PEP 649 branch but currently doesn't. When running Inada Noki's benchmark, there are a total of nine possible annotations code objects. Except, each function generated by the benchmark has a unique name, and I incorporate that name into the name given to the code object (f"{function_name}.__co_annotations__"). Since each function name is different, each code object name is different, so each code object /hash/ is different, and since they aren't /exact/ duplicates they are never consolidated.
I hate anonymous functions, so the name is very important to me. The primary code base I work on does have hundreds of methods with the same signature -- unfortunately, many of the also have the same name (four levels of super() calls is not unusual, and all to the same read/write/create parent methods from read/write/create child methods). In such a case would the name make a meaningful difference?
Or maybe the name can be store when running in debug mode, and not stored with -O ?
I think it needs to have a name. But if it made a difference, perhaps it could use f"{function_name}.__co_annotations__" normally, and simply "__co_annotations__" with -O.
Note also that this is the name of the annotations code object, although I think the annotations function object reuses the name too. Anyway, under normal circumstances, the Python programmer would have no reason to interact directly with the annotations code/function object, so it's not likely it will affect them one way or another. The only time they would see it would be, say, if the calculation of an annotation threw an exception, in which case it seems like seeing f"{function_name}.__co_annotations__" in the traceback might be a helpful clue in diagnosing the problem.
I'd want to see some real numbers before considering changes here. If it has a measurable and beneficial effect on real-world code, okay! let's change it! But my suspicion is that it doesn't really matter.
Cheers,
/arry
-- Inada Naoki <songofacandy@gmail.com>
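A hand-wavy, Python-level sketch of the idea above (the real change would live in CPython's compiler and function-creation code; `bind_co_annotations` and the sample function are made up for illustration): keep the annotations code object generic so equal ones can be merged, and re-attach the per-function name and line number only when the function object is created.

```python
from types import FunctionType

# One generic, shareable code object for every function annotated "(x: int) -> str".
generic = compile("{'x': int, 'return': str}", "<ann>", "eval")

def bind_co_annotations(func, generic_code):
    # Copy the identity back from the annotated function at function-creation time.
    named = generic_code.replace(
        co_name=f"{func.__qualname__}.__co_annotations__",
        co_firstlineno=func.__code__.co_firstlineno,
    )
    return FunctionType(named, func.__globals__)

def f(x: int) -> str: ...
f.__co_annotations__ = bind_co_annotations(f, generic)
print(f.__co_annotations__())           # {'x': <class 'int'>, 'return': <class 'str'>}
print(f.__co_annotations__.__name__)    # f.__co_annotations__
```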
On Tue, Apr 20, 2021 at 4:24 PM Inada Naoki <songofacandy@gmail.com> wrote:
Just an idea: do not save co_name and co_firstlineno in code object for function annotations. When creating a function object from a code object, they can be copied from annotated function.
I created a pull request. It uses `__co_annotations__` for the name, but `<func.__qualname__>.__co_annotations__` for the qualname. https://github.com/larryhastings/co_annotations/pull/11

--
Inada Naoki <songofacandy@gmail.com>
And I tried removing co_firstlineno in the optimize branch. https://github.com/larryhastings/co_annotations/pull/9

Microbenchmarks (https://gist.github.com/methane/abb509e5f781cc4a103cc450e1e7925d):

```
# co_annotations branch (63b415c3)
$ ./python ~/ann_test.py 3
code size: 229679 bytes
memory: 209077 bytes
unmarshal: avg: 639.631ms +/-0.254ms
exec: avg: 95.979ms +/-0.033ms

$ ./python ~/ann_test_method.py 3
code size: 245729 bytes
memory: 339109 bytes
unmarshal: avg: 672.997ms +/-9.039ms
exec: avg: 259.286ms +/-4.841ms

# optimize branch (fbf0ad725f)
$ ./python ~/ann_test.py 3
code size: 113082 bytes
memory: 209077 bytes
unmarshal: avg: 318.437ms +/-0.171ms
exec: avg: 100.187ms +/-0.141ms

$ ./python ~/ann_test_method.py 3
code size: 129134 bytes
memory: 284565 bytes
unmarshal: avg: 357.157ms +/-0.971ms
exec: avg: 262.066ms +/-5.258ms
```

By the way, this microbenchmark uses 3 arguments and 1 return value, and each annotation value is chosen from 3 (e.g. ["int", "str", "foo.bar.baz"]). So there are 3*3*3*3=81 signatures, not only 27. Anyway, 81/1000 may not be realistic. When I changed ann_test to choose annotation values from 5 (i.e. 625/1000):

```
# co_annotations
$ ./python ~/ann_test.py 3
code size: 236106 bytes
memory: 208261 bytes
unmarshal: avg: 653.788ms +/-1.257ms
exec: avg: 95.783ms +/-0.169ms

# optimize
$ ./python ~/ann_test.py 3
code size: 162097 bytes
memory: 208261 bytes
unmarshal: avg: 458.959ms +/-0.163ms
exec: avg: 98.327ms +/-0.065ms
```

--
Inada Naoki <songofacandy@gmail.com>
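A quick back-of-the-envelope on the ann_test.py numbers above (just arithmetic on the figures Inada posted, comparing the optimize branch to the plain co_annotations branch):

```python
# Ratios from the ann_test.py run: optimize branch vs. plain co_annotations branch.
code_size = (229679, 113082)     # bytes: co_annotations, optimize
unmarshal = (639.631, 318.437)   # ms:    co_annotations, optimize
print(f"code size: {1 - code_size[1] / code_size[0]:.0%} smaller")      # ~51% smaller
print(f"unmarshal time: {1 - unmarshal[1] / unmarshal[0]:.0%} less")    # ~50% less
```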
participants (10)
- Antoine Pitrou
- Ethan Furman
- Guido van Rossum
- Inada Naoki
- Jelle Zijlstra
- Jukka Lehtosalo
- Larry Hastings
- Rob Cliffe
- Samuel Colvin
- Łukasz Langa