Implementation of PEP-0604

To implement a full version of PEP 604 <https://www.python.org/dev/peps/pep-0604/>, I analyzed the typing module, starting with _GenericAlias.

1) I must rewrite:
- def _type_check(arg, msg, is_argument=True)
- def _type_repr(obj)
- def _collect_type_vars(types)
- def _subs_tvars(tp, tvars, subs)
- def _check_generic(cls, parameters)
- def _remove_dups_flatten(parameters)
- def _tp_cache(func)
- class _Final
- class _Immutable
- class _SpecialForm(_Final, _Immutable, _root=True)
- class ForwardRef(_Final, _root=True)
- class TypeVar(_Final, _Immutable, _root=True)
- def _is_dunder(attr)
- class _GenericAlias(_Final, _root=True)
- class Generic
- class _TypingEmpty
- class _TypingEllipsis
- def _get_protocol_attrs(cls)
- def _is_callable_members_only(cls)
- def _allow_reckless_class_cheks()
- class _ProtocolMeta(ABCMeta)
- class Protocol(Generic, metaclass=_ProtocolMeta)

2) The function _tp_cache uses functools.lru_cache():
    def _tp_cache(func):
        cached = functools.lru_cache()(func)
It's not reasonable to move lru_cache() into the core.

3) The method TypeVar.__init__() uses:
    def_mod = sys._getframe(1).f_globals['__name__']  # for pickling

4) The function _allow_reckless_class_cheks() uses:
    return sys._getframe(3).f_globals['__name__'] in ['abc', 'functools']

5) The _proto_hook() helper inside Protocol.__init_subclass__() uses:
    if (isinstance(annotations, collections.abc.Mapping)
It's not reasonable to move the Mapping type into the core.

It's not enough to move the typing classes: I would also have to move functools.lru_cache() and its dependencies, collections.abc.Mapping and its dependencies, and track the frame level. *It's too big for me.*

Maybe the approach with only PEP 563 is enough:
    from __future__ import annotations
    a: int | str = 3
This new syntax would only be usable in annotations. Without runtime evaluation, and without modifying issubclass() and isinstance(), it may be acceptable. Only mypy (and other tools like it) would need to be updated.

Philippe
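For readers following along, the PEP 563 fallback described above can be illustrated with a small sketch. It is only an illustration, not part of the original message: the function name `scale` is made up, and the source is run through `exec()` purely so that the `__future__` import sits at the top of its own compilation unit.

```python
# Sketch: with PEP 563 semantics the annotation is stored as a string and is
# never evaluated, so `int | str` needs no runtime support for `|` on types.
source = '''
from __future__ import annotations

def scale(a: int | str = 3):   # hypothetical function, for illustration only
    return a
'''

namespace = {}
exec(source, namespace)

# The annotation is kept as the literal source text of the expression.
annotations = namespace["scale"].__annotations__
print(annotations)
```

A static checker would still have to understand the `|` syntax, which is the point made above about updating mypy and similar tools.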

On Thu., 31 Oct. 2019, 11:45 pm Philippe Prados, <philippe.prados@gmail.com> wrote:
However, there may be C level caching machinery that can be used instead (we have a lot of low level caches, lru_cache is just the flexible option that's available to Python code).

> 3) The method TypeVar.__init__() use:
_getframe is already a Python level wrapper around an underlying C API. A C implementation of this code would skip the wrapper and call the underlying API directly.
I believe the collections ABCs are already available to builtin code through the frozen "_collections_abc" module. If I'm misremembering, and that module is just imported early rather than being frozen as precompiled bytecode, then freezing it as part of the interpreter build process (rather than rewriting it in C) would be the way to handle this (and a proof of concept could likely get away with importing it just before it is needed).
It's certainly not an easy project to tackle. For some of the specific points you raise though, you wouldn't translate the existing Python code directly to C. Instead, you would want to look for existing builtin code that does something similar (e.g. by searching for references to "_collections_abc"), and then use C level code that gives the same external behavior, even though it works differently internally. Cheers, Nick.
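As a point of reference for the `_getframe` discussion, here is a minimal Python-level sketch of the frame lookup that `TypeVar.__init__()` performs. The helper name `caller_module_name` is hypothetical, and `sys._getframe` is a CPython implementation detail, so this should be read as an illustration of the behavior a C implementation would need to reproduce rather than as the way to write it.

```python
import sys


def caller_module_name(depth=1):
    """Return the __name__ found in the globals of the frame `depth` levels up.

    depth=0 would be this function's own frame; depth=1 is its caller, which
    mirrors the sys._getframe(1) call quoted from TypeVar.__init__().
    """
    # '__name__' can be absent in unusual contexts (for example code run via
    # exec() with a bare dict), hence .get() instead of indexing.
    return sys._getframe(depth).f_globals.get("__name__")


# Called at module level, this reports the name of the current module.
print(caller_module_name())
```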

On Thu, 31 Oct 2019 at 23:20, Nick Coghlan <ncoghlan@gmail.com> wrote:
I agree with Nick, this is indeed big, but not impossible. If you are not sure yet whether you will work on the implementation, you can focus on polishing the PEP text. Then, if it is accepted and you decide to hand the implementation over to someone else, we will find a volunteer.

-- Ivan

Unless I'm misunderstanding something, it seems that the OP is under the impression that because they authored the PEP, they're obligated and responsible for the entire implementation. Instead, we should be able to relatively easily split up the implementation into multiple self-contained PRs. The OP can still work on some of them if they want to and are able to, but other contributors can work on other parts of it.

The easiest way of doing this would likely be separating each self-contained PR into a bullet point in a bpo issue with a brief description (similar to the layout posted above), and then the OP can work through them as they have time to do so. Any other contributor can simply pick up where the OP left off, or on other parts of it while the OP is working on something else. This would of course be after/if the PEP is accepted.

On Mon, Nov 4, 2019 at 7:33 AM Ivan Levkivskyi <levkivskyi@gmail.com> wrote:

Hi Philippe and list,

Long time lurker here. I've had a few spare cycles at $work this week that I spent exploring an alternate implementation of PEP 604, starting from Philippe's implementation. I don't say "reference implementation", because it doesn't implement PEP 604 faithfully, but it does try to implement its goal.

My thinking was that the "runtime" component only needs to support the isinstance/issubclass methods and so can be pretty minimal, without needing to import all the typing.py machinery. Then, when actual type checking is done (anything in typing.py or mypy), the `union` type can be special cased, or promoted to a `typing.Union`.

I started by creating a built-in "union" type that was a thin wrapper around a tuple, such that type.__or__ returns `union(lhs, rhs)`, which stores the two items in a 2-length tuple. I also changed the methods in `typing.Union` to return a `union`. However, I hit my first issue: `int | int | str` would then return `union(int, union(int, str))`, whereas it should deduplicate the types. So I implemented logic to detect and merge union | union, and changed the underlying data to a plain flat tuple.

The next issue I hit was that some of the other `typing.py` types weren't being detected as valid types (probably due to my limited understanding of the PyObject data structure/flags). Specifically, for `int | "forwardref"`, I needed a way to signal to typing.py that I had a forwardref without importing it.

This is where I changed my approach to the problem. I wanted to find a way to have these metatypes (Union/TypeVar/ForwardRef) without re-implementing them in C, or importing them. My idea was to change the builtin type to what I called a "shadow type". In the C layer it just holds a tuple of objects/parameters, and has a type field to say what it is. Then, in the type checking/typing.py layer, if it detects that an object is a "shadowobject", it promotes it to a full type.

What I liked about this approach is that for isinstance/issubclass, when it sees a "shadowobject" it can just grab the underlying types from the shadowobject->params tuple, and no import magic needs to happen. All of the type magic can be left to mypy/typing.py without much knowledge of the shadowobject.

I've mostly got through this implementation, with all but one test passing. At this point, I've got as far as I can go without help, so I'm posting here, looking for feedback or discussion. I expect that maybe the next step should be me commenting on PEP 604 itself about implementation concerns regarding `_GenericAlias`?

Implementation is located at https://github.com/Naddiseo/cpython/tree/PEP604

Richard
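The shadow-type idea described above can be approximated in pure Python. The sketch below is not the code from the linked branch, which is written in C: the class name `ShadowUnion`, the `params` attribute and the `to_typing()` method are all assumptions chosen to mirror the description (a flat, deduplicated tuple; isinstance/issubclass answered from that tuple; promotion to `typing.Union` only on request).

```python
import typing


class ShadowUnion:
    """Hypothetical stand-in for the 'shadow' union described in the message."""

    def __init__(self, *params):
        flat = []
        for param in params:
            # Merge nested unions so the stored data stays a flat tuple.
            members = param.params if isinstance(param, ShadowUnion) else (param,)
            for member in members:
                if member not in flat:  # drop duplicates, keep first-seen order
                    flat.append(member)
        self.params = tuple(flat)

    def __or__(self, other):
        return ShadowUnion(self, other)

    def __ror__(self, other):
        return ShadowUnion(other, self)

    # isinstance()/issubclass() look these up on type(second_argument), so
    # defining them here makes instances of ShadowUnion usable directly.
    def __instancecheck__(self, obj):
        return isinstance(obj, self.params)

    def __subclasscheck__(self, cls):
        return issubclass(cls, self.params)

    def to_typing(self):
        """Promote to a full typing.Union only when a type checker needs it."""
        return typing.Union[self.params]

    def __repr__(self):
        return " | ".join(getattr(p, "__name__", repr(p)) for p in self.params)


u = ShadowUnion(int, ShadowUnion(int, str))
print(u, isinstance(1, u), isinstance(1.5, u))
```

Only plain classes are handled in this sketch; a forward reference such as `int | "forwardref"` would need the extra signalling discussed in the message.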

Thanks for the encouragement! I've been working some more on it, since I had some more free cycles in the last few days, and I think I've reached the limit of my capabilities. I think I've got it to a point where it needs more eyes, because my experience writing code in the Python interpreter is pretty low, so I know that I have holes in there, especially around using INCREF/DECREF.

I'm currently stuck on 3 things:

1. repr( pickle.loads( pickle.dumps(int | str) ) ) causes an infinite recursion that I can't figure out. I might end up re-specializing `repr()` again, since it isn't defined in the PEP.

2.
- `None | "forwardref"` and `None | None`
- `"forwardref" | None` and `"forwardRefA" | "forwardRefB"`
I've had a go at adding a `type.__ror__` as suggested by GvR, but I think I'm missing something.

3. type parameters - I'm not sure how to handle this:
```
MaybeT = None | "T"  # creates shadow union
T = TypeVar('T')
maybe_int = MaybeT[int]  # should this stay in shadow land, or go back to typing.py
isinstance(1, maybe_int)
isinstance(1, MaybeT[int])
```
I have a couple of options for this:
- implement the type substitution in C; which can be done, but feels like overkill?
- store the type parameters on the shadow object and vivify when needed (which will end up being in the isinstance call)
- implement the __getitem__ as a call into the typing.py library and promote as it's requested. This is how I've currently implemented it.

I'm happy to keep going on this if/when needed,
Richard

To add some more input about forward references, here are a couple of thoughts:

1. I think ideally we should support forward references to the same extent they are currently supported: i.e. Union[None, "ForwardRef"], Union["OtherForwardRef", None], and Union["ForwardRef", "OtherForwardRef"] should all be supported. The last one looks the hardest, in the sense that it looks like we will need to add `str.__or__()` to support these forms.

2. Forward references to type variables were never supported, and we shouldn't start supporting them (even if they accidentally worked in some situations). Type variables should always be defined before use. So I think we should not worry about point 3 in Richard's e-mail.

Philippe, I think it makes sense to reflect these points in the PEP draft.

-- Ivan

On Thu, 14 Nov 2019 at 02:23, Richard Eames <github@naddiseo.ca> wrote:
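The forms listed in point 1 can be checked against the existing `typing.Union` behavior with a short sketch. Nothing here is new API: it only uses `typing.Union` and `typing.get_args`, and the variable names are chosen for illustration.

```python
import typing

# typing.Union already accepts strings and wraps them in ForwardRef objects.
optional_ref = typing.Union[None, "ForwardRef"]
both_refs = typing.Union["ForwardRef", "OtherForwardRef"]

# Each string argument is stored as a ForwardRef carrying the original name.
ref_names = [arg.__forward_arg__ for arg in typing.get_args(both_refs)]
print(ref_names)

# The operator form has no equivalent today: str defines neither __or__ nor
# __ror__ for this purpose, so combining two strings raises TypeError.
try:
    "ForwardRef" | "OtherForwardRef"
    str_or_supported = True
except TypeError:
    str_or_supported = False
print(str_or_supported)
```

This is why the last form is the hard one: with two string operands there is no type on either side whose `__or__` or `__ror__` could be taught about unions.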

On 2019-11-17 20:09, Ivan Levkivskyi wrote:
I'm not sure I like the thought of adding `str.__or__()` because it would mean that "Foo" | "Bar" becomes valid and returns a type, which is unexpected.

Perhaps an alternative would be to add 'Forward' to make it explicit:

    Forward("ForwardRef") | Forward("OtherForwardRef")

or just:

    Forward("ForwardRef") | "OtherForwardRef"
[snip]
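One way the explicit `Forward` marker suggested above might behave is sketched below. `Forward` does not exist in the standard library; the class, its `name` attribute and its helper methods are assumptions made for illustration, and the sketch simply delegates to the existing `typing.Union` and `typing.ForwardRef`.

```python
import typing


class Forward:
    """Hypothetical explicit forward-reference marker (not a real API)."""

    def __init__(self, name):
        self.name = name

    def _ref(self):
        return typing.ForwardRef(self.name)

    @staticmethod
    def _coerce(obj):
        # Forward markers become ForwardRef objects; plain strings are left
        # alone because typing.Union converts them to ForwardRef itself.
        return obj._ref() if isinstance(obj, Forward) else obj

    def __or__(self, other):
        return typing.Union[self._ref(), self._coerce(other)]

    def __ror__(self, other):
        # Reached when the left operand cannot handle `|`, e.g. a plain string.
        return typing.Union[self._coerce(other), self._ref()]


pair = Forward("ForwardRef") | Forward("OtherForwardRef")
mixed = Forward("ForwardRef") | "OtherForwardRef"
reversed_mixed = "OtherForwardRef" | Forward("ForwardRef")
print(pair, mixed, reversed_mixed)
```

With this shape, "Foo" | "Bar" stays invalid, because at least one operand has to be a `Forward`. Chaining a further `Forward` onto the right of the resulting `typing.Union` is not guaranteed to work in this sketch.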

participants (6)
- Ivan Levkivskyi
- Kyle Stanley
- MRAB
- Nick Coghlan
- Philippe Prados
- Richard Eames