Runtime types vs static types
data:image/s3,"s3://crabby-images/9d55a/9d55a9d1915303c24fcf368a61919b9d2c534a18" alt=""
There has been some discussion here and there concerning the differences between runtime types and static types (mypy etc.). What I write below is not really an idea or proposal, just a perspective, or a topic that people may want to discuss. The discussion on this is currently very fuzzy and scattered, and not really happening either AFAICT (though I've probably missed many discussions), so I thought I'd give it a shot.

Clearly, there needs to be some sort of distinction between runtime classes/types and static types, because static types can be more precise than Python's dynamic runtime semantics. For example, Iterable[int] is an iterable that contains integers. For a static type checker, it is clear what this means. But at runtime, it may be impossible to find out whether an iterable really is of this type without consuming the whole iterable and checking that each yielded element is an integer, and even that is not possible if the iterable is infinite. Even Sequence[int] is problematic, because checking the types of all elements of the sequence could take a long time.

Since things like isinstance(it, Iterable[int]) cannot guarantee a proper answer, one easily arrives at the conclusion that static types and runtime classes are just two separate things, and that one cannot require all types to support something like isinstance at runtime.
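A minimal sketch of the problem described above (the helper name `all_ints` is made up for illustration): the only way to verify `Iterable[int]` at runtime is to iterate, which is harmless for a list but silently exhausts a one-shot iterator such as a generator.

```python
from typing import Iterable


def all_ints(items: Iterable[object]) -> bool:
    """Return True if every element of *items* is an int.

    The only way to find out is to iterate, so a generator is exhausted
    by the check (and an infinite iterator would never finish).
    """
    return all(isinstance(x, int) for x in items)


numbers = [1, 2, 3]
print(all_ints(numbers))  # True; a list can be iterated again afterwards

gen = (n for n in numbers)
result = all_ints(gen)
remaining = list(gen)
print(result, remaining)  # True [] (the check consumed the generator)
```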
On the other hand, there are many runtime things that can or could be done using (type) annotations, for example multidispatch (with hypothetical syntax):

    @overload
    def concatenate(parts: Iterable[str]) -> str:
        return "".join(parts)

    @overload
    def concatenate(parts: Iterable[bytes]) -> bytes:
        return b"".join(parts)

    @overload
    def concatenate(parts: Iterable[Iterable]) -> Iterable:
        return itertools.chain(*parts)

or runtime type checking:

    @check_types
    def load_from_file(filename: Union[os.PathLike, str, bytes]):
        with open(filename) as f:
            return do_stuff_with(f.read())

which would automatically give a nice error message if, say, a file object is passed as the argument instead of a path to a file.

However useful (and efficient) these things might be, the runtime type checks are problematic, as discussed above. Furthermore, other differences between runtime and static typing may emerge (or have emerged), which will complicate matters further. For instance, the runtime __annotations__ of classes, modules and functions may in some cases contain something completely different from what a type checker thinks the type should be.

These and other incompatibilities between runtime and static typing will create two (or more) different kinds of type-annotated Python: runtime-oriented Python and Python with static type checking. These may be incompatible in both directions: a static type checker may complain about code that is perfectly valid for the runtime folks, and code written for static type checking may not be able to use new Python techniques that make use of type hints at runtime. There may not even be a fully functional subset of the two "languages". Different libraries will adhere to different standards and will not be compatible with each other. The split will be much worse and more difficult to understand than Python 2 vs 3, people around the world will suffer like never before, and programming in Python will become a very complicated mess.
One way of solving the problem would be to make type annotations a purely static concept, as with stubs or comment-based type annotations. This would also be nice from a memory and performance perspective, as evaluating and storing the annotations would not occupy memory (although both issues, and some more, might be nicely solved by making the annotations lazily evaluated). However, leaving out runtime effects of type annotations is not the approach that has been taken, and runtime introspection of annotations seems to have some promising applications as well. And in many cases, the traditional Python class actually acts very nicely as both the runtime and the static type.

So if type annotations are to serve both runtime and static checking, how can everything be made to work for both? Since the writer of a library does not know what the library's users will use the type hints for, it is very important that there is only one way of writing type annotations, one that works regardless of what the annotations are used for in the end. This will also make Python typing much easier to learn.

Regarding runtime types and isinstance, let's look at the Iterable[int] example. For this case, there are a few options:

1) Don't implement isinstance

This is problematic for runtime uses of annotations.

2) isinstance([1, '2', 'three'], Iterable[int]) returns True

This is in fact now the case. This is OK for many runtime situations, but lacks precision compared to the static version. One may want to distinguish between Iterable[int] and Iterable[str] at runtime (e.g. the multidispatch example above).

3) Check as much as you can at runtime

There could be something like Reiterable, which would mean the object is not consumed by iterating over it, so one could actually check whether all elements are instances of int. This would be useful in some situations, but not available for every object. Furthermore, the check could take an arbitrary amount of time, so it is not really suitable for things like multidispatch or some matching constructs, where the performance overhead of the type check really matters.

4) Do a deeper check than in (2) but trust the annotations

For example, an instance of a class that has a method like

    def __iter__(self) -> Iterator[int]:
        ...  # some code

could be identified as Iterable[int] at runtime, even if it is not guaranteed that all elements really are integers. On the other hand, an object returned by

    def get_ints() -> Iterable[int]:
        ...  # some code

does not know its own annotations, so the check is difficult to do at runtime. And of course, there may not be any annotations available.

5) Something else?

And what about PEP 544 (protocols), which is being drafted? The PEP seems to aim for having type objects that represent duck-typing protocols/interfaces. Checking whether an object or type implements a protocol is clearly a useful thing to do at runtime, but it is not really clear whether isinstance would be a guaranteed feature for PEP 544 Protocols.

So one question is: is it possible to draw the lines between what works with isinstance and what doesn't, and between which details isinstance checks and which it doesn't? Or should isinstance be reserved for a more limited purpose, with another check function added, say `implements(...)`, which would perhaps guarantee some answer for all combinations of object and type?

I'll stop here; this email is probably already much longer than a single email should be ;)

-- Koos

+ Koos Zevenhoven + http://twitter.com/k7hoven +
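Options 3 and 4 from the list above can be sketched roughly as follows, assuming Python 3.8+. This is not a proposal from the thread: `collections.abc.Sequence` merely stands in for the hypothetical `Reiterable`, and `check_elements`, `declared_element_type` and `Countdown` are illustrative names.

```python
import collections.abc
import typing
from typing import Iterator


# Option 3 (sketch): check the elements only when iterating is harmless.
def check_elements(obj, elem_type):
    """Return True/False for re-iterable sequences, None if unknown."""
    if isinstance(obj, collections.abc.Sequence):
        return all(isinstance(x, elem_type) for x in obj)  # O(len(obj))
    return None  # a full check could consume a one-shot iterator


# Option 4 (sketch): trust the return annotation of type(obj).__iter__.
def declared_element_type(obj):
    """Return T if type(obj).__iter__ is annotated as -> Iterator[T]."""
    iter_method = getattr(type(obj), "__iter__", None)
    try:
        hints = typing.get_type_hints(iter_method)
    except Exception:  # builtins, missing __iter__, unresolvable names...
        return None
    ret = hints.get("return")
    if typing.get_origin(ret) is collections.abc.Iterator:
        args = typing.get_args(ret)
        return args[0] if args else None
    return None


class Countdown:
    def __init__(self, start: int) -> None:
        self.start = start

    def __iter__(self) -> Iterator[int]:
        return iter(range(self.start, 0, -1))


print(check_elements([1, 2, 3], int))       # True
print(check_elements(iter([1, 2]), int))    # None
print(declared_element_type(Countdown(3)))  # <class 'int'>
print(declared_element_type([1, 2, 3]))     # None
```

Option 4 is O(1) but only as trustworthy as the annotation; option 3 is exact but costs a full pass and refuses to answer for one-shot iterators.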
data:image/s3,"s3://crabby-images/2f884/2f884aef3ade483ef3f4b83e3a648e8cbd09bb76" alt=""
I'm guessing that to implement PEP 544, many of the `__instancecheck__` and `__subclasscheck__` methods in `typing.py` would need to be updated to check the `__annotations__` of the class of the object they're passed against their own definition (covered in this section <https://www.python.org/dev/peps/pep-0544/#runtime-decorator-and-narrowing-ty...> of the PEP).

I've been somewhat surprised that many of the `__instancecheck__` implementations do not work at runtime, even when the implementation would be trivial (e.g. for `Union`), or would not have subtle edge cases thanks to immutability (e.g. for `Tuple`, which cannot be used for checking parameterized instances). This seems like counterintuitive behavior that would be straightforward to fix, unless there are subtleties and edge cases I'm missing.

If people are amenable to updating those cases, I'd be interested in submitting a patch to that effect.

Best,
Lucas

On Sat, Jun 24, 2017 at 12:42 PM, Koos Zevenhoven <k7hoven@gmail.com> wrote:
data:image/s3,"s3://crabby-images/9d55a/9d55a9d1915303c24fcf368a61919b9d2c534a18" alt=""
On Sat, Jun 24, 2017 at 11:30 PM, Lucas Wiman <lucas.wiman@gmail.com> wrote:
I may have missed something, but I believe PEP544 is not suggesting that annotations would have any effect on isinstance. Instead, isinstance would by default not work.
Tuple is an interesting case, because for small tuples (say 2- or 3-tuples), it makes perfect sense to check the types of all elements for some runtime purposes. Regarding Union, I believe the current situation has a lot to do with the fact that the relation between type annotations and runtime behavior hasn't really settled yet.

> If people are amenable to updating those cases, I'd be interested in
> submitting a patch to that effect.
Thanks for letting us know. (There may not be an instant decision on this particular case, though, but who knows :)

-- Koos
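To illustrate Lucas's point that a `Union` check would be trivial, and Koos's remark about checking every element of small tuples, here is a minimal sketch assuming Python 3.8+. `union_isinstance` and `tuple_isinstance` are hypothetical helpers, not part of `typing`, and they assume the members are plain classes (`isinstance()` rejects parameterized generics).

```python
import types
import typing
from typing import Optional, Tuple, Union

_UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


def union_isinstance(obj, hint) -> bool:
    """isinstance() against a Union whose members are plain classes."""
    origin = typing.get_origin(hint)
    if not any(origin is u for u in _UNION_ORIGINS):
        raise TypeError(f"not a Union: {hint!r}")
    # isinstance() already accepts a tuple of classes.
    return isinstance(obj, typing.get_args(hint))


def tuple_isinstance(obj, hint) -> bool:
    """Element-wise check for Tuple[A, B, ...] and Tuple[A, ...] hints."""
    if typing.get_origin(hint) is not tuple:
        raise TypeError(f"not a Tuple hint: {hint!r}")
    if not isinstance(obj, tuple):
        return False
    args = typing.get_args(hint)
    if len(args) == 2 and args[1] is Ellipsis:  # variadic: Tuple[A, ...]
        return all(isinstance(x, args[0]) for x in obj)
    return len(obj) == len(args) and all(
        isinstance(x, t) for x, t in zip(obj, args)
    )


print(union_isinstance(None, Optional[int]))        # True
print(tuple_isinstance((1, "a"), Tuple[int, str]))  # True
print(tuple_isinstance((1, 2), Tuple[int, str]))    # False
```

Since tuples are immutable, checking their elements once gives an answer that stays valid, which is what makes the fixed-length case cheap and safe.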
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Sat, Jun 24, 2017 at 10:42:19PM +0300, Koos Zevenhoven wrote: [...]
I think that's backwards: runtime types can be more precise than static types. Runtime types can make use of information known at compile time *and* at runtime, while static types can only make use of information known at compile time. Consider:

    List[str if today == 'Tuesday' else int]

The best that the compile-time checker can do is treat it as List[Union[str, int]], if even that, but at runtime we can tell whether or not [1, 2, 3] is legal.

But in any case, *static types* and *dynamic types* (runtime types, classes) are distinct concepts, with significant overlap. Static types apply to *variables* (or expressions), while dynamic types apply to *values*. Values are, in general, only known at runtime.
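A small sketch of Steven's example, with `today` as a hypothetical stand-in for a value known only at runtime. On current CPython the conditional expression inside the annotation is evaluated by the interpreter (when `def` runs up to 3.13, or on first access of `__annotations__` under the deferred evaluation of Python 3.14); with `from __future__ import annotations` it would instead be stored as a string.

```python
from typing import List

today = "Tuesday"  # stands in for a value that is only known at runtime


def process(items: List[str if today == "Tuesday" else int]) -> None:
    pass


hint = process.__annotations__["items"]
print(hint)  # typing.List[str]
```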
There's a difference between *requesting* an object's runtime type and *verifying* that it is what it says it is. Of course, if we try to verify that an iterator yields nothing but ints, we can't do so without consuming the iterator, or possibly even entering an infinite loop. But we can ask an object what type it is, it can tell us that it's an Iterable[int], and this could be an extremely fast check, assuming you trust the object not to lie. ("Consenting adults" may apply here.)
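The requesting/verifying distinction can be sketched like this. The `element_type` class attribute is a purely hypothetical convention (nothing in Python defines it), and `claims`/`verify` are illustrative names: the first is O(1) and trusts the object, the second is O(n) and would exhaust a one-shot iterator.

```python
from typing import Iterator


class IntStream:
    # Hypothetical convention: the object *declares* its element type.
    # Nothing enforces the claim; callers simply trust it.
    element_type = int

    def __init__(self, n: int) -> None:
        self.n = n

    def __iter__(self) -> Iterator[int]:
        return iter(range(self.n))


def claims(obj, elem_type) -> bool:
    """Requesting: O(1); never touches the elements."""
    return getattr(obj, "element_type", None) is elem_type


def verify(obj, elem_type) -> bool:
    """Verifying: O(n), and it would exhaust a one-shot iterator."""
    return all(isinstance(x, elem_type) for x in obj)


stream = IntStream(5)
print(claims(stream, int), verify(stream, int))  # True True
```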
That's way too strong. I agree that static types and runtime types (I don't use the term "class" because in principle at least this could include types not implemented as a class, e.g. a struct or record or primitive unboxed value) are distinct, but they do overlap. To describe them as "separate" implies that they are unconnected and that one could sensibly have things which are statically typed as (let's say) Sequence[bool] but runtime typed as float. Gradual typing is useful because the static types are at least an approximation to the runtime types. If they had no connection at all, we'd learn nothing from static type checking and there would be no reason to do it. So static types and runtime types must be at least closely related to be useful. [...]
Yes? What's your point? Consenting adults certainly applies here. There are lots of reasons why people might avoid "new Python techniques" for *anything*, not just type hints:

- they have to support older versions of Python;
- they're stuck on an older version and can't upgrade;
- they just don't like those new techniques.

Nobody forces you to run a static type-checker. If you choose to run one, and it gives the wrong answers, then you can:

- stop using it;
- use a better one that gives the right answer;
- fix the broken code that the checker says is broken (regardless of whether it is genuinely broken or not);
- add, remove or modify annotations to satisfy the checker;
- disable type-checking for that code unit (module?) alone.

But the critical thing here is that so long as Python is a dynamically typed language, you cannot eliminate runtime type checks. You can choose *not* to write them in your code, and rely on duck typing and exceptions, but the type checks are still there in the implementation. E.g. you have x + 1 in your code. Even if *you* don't guard it with a type check:

    # if isinstance(x, int):
    y = x + 1

there's still a runtime check, performed when the interpreter executes the byte-code, which prevents low-level machine code errors that could lead to a segmentation fault or worse.
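A tiny runnable illustration of Steven's `x + 1` point (`add_one` is just an example name): the Python code contains no explicit check, yet an unsupported operand type is still rejected at runtime with a `TypeError` instead of crashing the process.

```python
def add_one(x):
    # No explicit isinstance() guard in the Python code...
    return x + 1


print(add_one(41))  # 42

error = None
try:
    add_one("41")
except TypeError as exc:
    # ...yet the interpreter still checks operand types when it executes +.
    error = exc
print(type(error).__name__)  # TypeError
```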
> There may not even be a fully functional subset of the two "languages".
What do you mean by "fully functional"? Of course there will be working code that can pass both the static checks and run without error. Here's a trivial example:

    print("Hello World")

On the other hand, it's trivially true that code which works at runtime cannot *always* be statically checked:

    s = input("Type some Python code: ")
    exec(s)

The static type checker cannot possibly check code that doesn't even exist until runtime!

I don't think it is plausible to say that there is, or could be, no overlap between (a) legal Python code that runs under a type-checker, and (b) legal Python code that runs without it. That's literally impossible, since the type-checker is not part of the Python interpreter, so you can always just *not run the type-checker* to turn (a) into (b).
I think this is Chicken Little "The Sky Is Falling" FUD.
> One way of solving the problem would be that type annotations are only a static concept, like with stubs or comment-based type annotations.
I don't agree that there's a problem that needs to be solved.
Sounds like premature optimization to me. How many distinct annotations do you have? How much memory do you think they will use? If you're running 64-bit Python, each pointer to the annotation takes a full eight bytes. If we assume that every annotation is distinct, and we allow 1000 bytes for each annotation, a thousand annotations would only use 1MB of memory. On modern machines, that's trivial. I don't think this will be a problem for the average developer. (Although people programming on embedded devices may be different.) If we want to support that optimization, we could add an optimization flag that strips annotations at runtime, just as the -OO flag strips docstrings. That becomes a matter of *consenting adults* -- if you don't want annotations, you don't need to keep them, but it then becomes your responsibility that you don't try to use them. (If you do, you'll get a runtime AttributeError.)
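For a rough sense of scale, a minimal sketch (the function `scale` is an arbitrary example): `sys.getsizeof()` measures only the annotations dict itself, whose values are references to type objects shared by the whole program, and the last lines redo the email's back-of-the-envelope estimate. Exact byte counts vary between Python versions and platforms, so none are claimed here.

```python
import sys


def scale(values: list, factor: float = 1.0) -> list:
    return [v * factor for v in values]


hints = scale.__annotations__
# sys.getsizeof() reports only the dict object itself; its values (list,
# float) are references to type objects shared by the whole program.
hints_size = sys.getsizeof(hints)
print(len(hints), hints_size)

# The estimate from the email: 1000 distinct annotations at a
# (generous) 1000 bytes each.
estimate_bytes = 1000 * 1000
print(estimate_bytes / 2**20)  # roughly 0.95 MiB
```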
No, that's backwards. The library creator gets to decide what their library uses annotations for: type-hints, or something else. As the user of a library, I don't get to decide what the library does with its own annotations.
I don't understand this.
That's clearly a bug. If isinstance(... Iterable[int]) is supported at all, then clearly the result should be False. [...]
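For comparison, a quick check of what current Python does. As far as I know, CPython 3.7 and later neither return True nor False here: `typing` raises `TypeError` for instance checks against subscripted generics, while the plain ABC answers based on `__iter__` alone.

```python
import collections.abc
from typing import Iterable

values = [1, "2", "three"]

# The unparameterized ABC is fine: it only checks for __iter__.
plain = isinstance(values, collections.abc.Iterable)
print(plain)  # True

try:
    parameterized = isinstance(values, Iterable[int])
except TypeError as exc:
    # Recent CPython versions refuse to answer rather than say True.
    parameterized = exc
print(repr(parameterized))
```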
> 3) Check as much as you can at runtime
For what purpose?
I suggested something similar to this earlier in this post.
Right -- when annotations are not available, the type checker will either infer types, if it can, or default to the Any type. I don't really understand where you are going with this. The premise, that statically-type-checked Python is fundamentally different from Python-without-static-checks, and therefore we have to bring in a bunch of extra runtime checks to make them the same, seems wrong to me. Perhaps I have not understood you. -- Steve
data:image/s3,"s3://crabby-images/0f8ec/0f8eca326d99e0699073a022a66a77b162e23683" alt=""
On Sun, Jul 2, 2017 at 9:16 PM, Steven D'Aprano <steve@pearwood.info> wrote:
IMO people should act as if this will eventually be the case. Annotations should be evaluated solely for the purpose of populating __annotations__, and not for any sort of side effects - just like with assertions. ChrisA
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Sun, Jul 02, 2017 at 09:38:11PM +1000, Chris Angelico wrote:
Avoiding side-effects is generally a good idea, but I think that's probably taking it too far. I think that we should assume that

    def func(x: Spam()): ...

will always look up and call Spam when the function is defined. But we should be prepared for func.__annotations__ not to exist, if we're running in a highly-optimized mode, or MicroPython, or similar.

-- Steve
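Whether `Spam()` runs at definition time can be observed directly. A hedged sketch (the counter on `Spam` is only for illustration): on CPython up to 3.13 the call happens when `def` executes, as Steven assumes; under `from __future__ import annotations` (PEP 563) it never happens automatically; and with the deferred evaluation of PEP 649 in Python 3.14 it happens on first access of `__annotations__`. The values printed therefore depend on the Python version.

```python
class Spam:
    created = 0  # how many Spam objects have been built so far

    def __init__(self) -> None:
        Spam.created += 1


def func(x: Spam()) -> None:
    pass


created_at_def = Spam.created    # 1 with eager evaluation, 0 if deferred
hints = func.__annotations__     # forces evaluation if it was deferred
created_after_access = Spam.created
print(created_at_def, created_after_access, type(hints["x"]).__name__)
```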
data:image/s3,"s3://crabby-images/e2594/e259423d3f20857071589262f2cb6e7688fbc5bf" alt=""
On 7/2/2017 7:57 AM, Steven D'Aprano wrote:
Code that does not control the compilation of the file with func should also not assume the existence of func.__doc__. On the other hand, programs, such as IDEs, that do control compilation, by calling the standard compile(), can assume both attributes if they pass the appropriate compile flags. -- Terry Jan Reedy
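A sketch of a program that controls compilation itself through the built-in `compile()`, as Terry describes. `optimize=2` corresponds to the `-OO` level and drops docstrings, while annotations survive (in the CPython versions I'm aware of, no optimization level strips them). The names `source` and `build` are illustrative.

```python
source = '''
def func(x: int) -> int:
    """A docstring that optimize=2 (the -OO level) removes."""
    return x
'''


def build(optimize: int):
    """Compile *source* at the given optimization level; return func."""
    namespace = {}
    # dont_inherit=True keeps the caller's __future__ flags out of the way.
    code = compile(source, "<example>", "exec", dont_inherit=True,
                   optimize=optimize)
    exec(code, namespace)
    return namespace["func"]


plain, optimized = build(0), build(2)
print(plain.__doc__)              # A docstring that optimize=2 (the -OO level) removes.
print(optimized.__doc__)          # None
print(optimized.__annotations__)  # {'x': <class 'int'>, 'return': <class 'int'>}
```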
data:image/s3,"s3://crabby-images/9d55a/9d55a9d1915303c24fcf368a61919b9d2c534a18" alt=""
On Sun, Jul 2, 2017 at 2:16 PM, Steven D'Aprano <steve@pearwood.info> wrote:
This is not backwards, just a different interpretation of the same situation. In fact, the problem is that 'type' already means too many different things. Clarity of terminology for a concept helps a lot in making the concept itself simpler and easier for both the designers and the users. This is analogous to the problem in the English language that the verb 'argue' does not have a single meaning. Sometimes the concepts of an argument and a productive discussion get mixed up, and people who want to discuss productively just end up going away because others are turning the discussion into an argument.

-- Koos
data:image/s3,"s3://crabby-images/2f884/2f884aef3ade483ef3f4b83e3a648e8cbd09bb76" alt=""
I'm guessing to implement PEP 544, many of the `__instancecheck__` and `__subclasscheck__` methods in `typing.py` would need to be updated to check the `__annotations__` of the class of the object it's passed against its own definition, (covered in this section <https://www.python.org/dev/peps/pep-0544/#runtime-decorator-and-narrowing-ty...> of the PEP). I've been somewhat surprised that many of the `__instancecheck__` implementations do not work at runtime, even when the implementation would be trivial (e.g. for `Union`), or would not have subtle edge cases due to immutability (e.g. for `Tuple`, which cannot be used for checking parameterized instances). This seems like counterintuitive behavior that would be straightforward to fix, unless there are subtleties & edge cases I'm missing. If people are amenable to updating those cases, I'd be interested in submitting a patch to that effect. Best, Lucas On Sat, Jun 24, 2017 at 12:42 PM, Koos Zevenhoven <k7hoven@gmail.com> wrote:
data:image/s3,"s3://crabby-images/9d55a/9d55a9d1915303c24fcf368a61919b9d2c534a18" alt=""
On Sat, Jun 24, 2017 at 11:30 PM, Lucas Wiman <lucas.wiman@gmail.com> wrote:
I may have missed something, but I believe PEP544 is not suggesting that annotations would have any effect on isinstance. Instead, isinstance would by default not work.
Tuple is an interesting case, because for small tuples (say 2- or 3-tuples), it makes perfect sense to check the types of all elements for some runtime purposes. Regarding Union, I believe the current situation has a lot to do with the fact that the relation between type annotations and runtime behavior hasn't really settled yet. If people are amenable to updating those cases, I'd be interested in
submitting a patch to that effect.
Thanks for letting us know. (There may not be an instant decision on this particular case, though, but who knows :) -- Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven +
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Sat, Jun 24, 2017 at 10:42:19PM +0300, Koos Zevenhoven wrote: [...]
I think that's backwards: runtime types can be more precise than static types. Runtime types can make use of information known at compile time *and* at runtime, while static types can only make use of information known at compile time. Consider: List[str if today == 'Tuesday' else int] The best that the compile-time checker can do is treat it as List[Union[str, int]] if even that, but at runtime we can tell whether or not [1, 2, 3] is legal or not. But in any case, *static types* and *dynamic types* (runtime types, classes) are distinct concepts, but with significant overlap. Static types apply to *variables* (or expressions) while dynamic types apply to *values*. Values are, in general, only known at runtime.
There's a difference between *requesting* an object's runtime type and *verifying* that it is what it says it is. Of course if we try to verify that an iterator yields nothing but ints, we can't do so without consuming the iterator, or possibly even entering an infinite loop. But we can ask an object what type they are, they can tell you that they're an Iterable[int], and this could be an extremely fast check. Assuming you trust the object not to lie. ("Consenting adults" may apply here.)
That's way too strong. I agree that static types and runtime types (I don't use the term "class" because in principle at least this could include types not implemented as a class, e.g. a struct or record or primitive unboxed value) are distinct, but they do overlap. To describe them as "separate" implies that they are unconnected and that one could sensibly have things which are statically typed as (let's say) Sequence[bool] but runtime typed as float. Gradual typing is useful because the static types are at least an approximation to the runtime types. If they had no connection at all, we'd learn nothing from static type checking and there would be no reason to do it. So static types and runtime types must be at least closely related to be useful. [...]
Yes? What's your point? Consenting adults certainly applies here. There are lots of reasons why people might avoid "new Python techniques" for *anything*, not just type hints: - they have to support older versions of Python; - they're stuck on an older version and can't upgrade; - they just don't like those new techniques. Nobody forces you to run a static type-checker. If you choose to run one, and it gives the wrong answers, then you can: - stop using it; - use a better one that gives the right answer; - fix the broken code that the checker says is broken (regardless of whether it is genuinely broken or not); - add, remove or modify annotations to satisfy the checker; - disable type-checking for that code unit (module?) alone. But the critical thing here is that so long as Python is a dynamically typed language, you cannot eliminate runtime type checks. You can choose *not* to write them in your code, and rely on duck typing and exceptions, but the type checks are still there in the implementation. E.g. you have x + 1 in your code. Even if *you* don't guard with an type check: # if isinstance(x, int): y = x + 1 there's still a runtime check in the byte-code which prevents low-level machine code errors that could lead to a segmentation fault or worse.
There may not even be a fully functional subset of the two "languages".
What do you mean by "fully functional"? Of course there will be working code that can pass both the static checks and run without error. Here's a trivial example: print("Hello World") On the other hand, it's trivially true that code which works at runtime cannot *always* be statically checked: s = input("Type some Python code: ") exec(s) The static type checker cannot possibly check code that doesn't even exist until runtime! I don't think it is plausible to say that there is, or could be, no overlap between (a) legal Python code that runs under a type-checker, and (b) legal Python code that runs without it. That's literally impossible since the type-checker is not part of the Python interpreter, so you can always just *not run the type-checker* to turn (a) into (b).
I think this is Chicken Little "The Sky Is Falling" FUD.
One way of solving the problem would be that type annotations are only a static concept, like with stubs or comment-based type annotations.
I don't agree that there's a problem that needs to be solved.
Sounds like premature optimization to me. How many distinct annotations do you have? How much memory do you think they will use? If you're running 64-bit Python, each pointer to the annotation takes a full eight bytes. If we assume that every annotation is distinct, and we allow 1000 bytes for each annotation, a thousand annotations would only use 1MB of memory. On modern machines, that's trivial. I don't think this will be a problem for the average developer. (Although people programming on embedded devices may be different.) If we want to support that optimization, we could add an optimization flag that strips annotations at runtime, just as the -OO flag strips docstrings. That becomes a matter of *consenting adults* -- if you don't want annotations, you don't need to keep them, but it then becomes your responsibility that you don't try to use them. (If you do, you'll get a runtime AttributeError.)
No, that's backwards. The library creator gets to decide what their library uses annotations for: type-hints, or something else. As the user of a library, I don't get to decide what the library does with its own annotations.
I don't understand this.
That's clearly a bug. If isinstance(... Iterable[int]) is supported at all, then clearly the result should be False. [...]
3) Check as much as you can at runtime
For what purpose?
I suggested something similar to this earlier in this post.
Right -- when annotations are not available, the type checker will either infer types, if it can, or default to the Any type. I don't really understand where you are going with this. The premise, that statically-type-checked Python is fundamentally different from Python-without-static-checks, and therefore we have to bring in a bunch of extra runtime checks to make them the same, seems wrong to me. Perhaps I have not understood you. -- Steve
data:image/s3,"s3://crabby-images/0f8ec/0f8eca326d99e0699073a022a66a77b162e23683" alt=""
On Sun, Jul 2, 2017 at 9:16 PM, Steven D'Aprano <steve@pearwood.info> wrote:
IMO people should act as if this will eventually be the case. Annotations should be evaluated solely for the purpose of populating __annotations__, and not for any sort of side effects - just like with assertions. ChrisA
data:image/s3,"s3://crabby-images/6a9ad/6a9ad89a7f4504fbd33d703f493bf92e3c0cc9a9" alt=""
On Sun, Jul 02, 2017 at 09:38:11PM +1000, Chris Angelico wrote:
Avoiding side-effects is generally a good idea, but I think that's probably taking it too far. I think that we should assume that def func(x:Spam()): ... will always look up and call Spam when the function is defined. But we should be prepared that func.__annotations__ might not exist, if we're running in a highly-optimized mode, or MicroPython, or similar. -- Steve
data:image/s3,"s3://crabby-images/e2594/e259423d3f20857071589262f2cb6e7688fbc5bf" alt=""
On 7/2/2017 7:57 AM, Steven D'Aprano wrote:
Code that does not control the compilation of the file with func should also not assume the existence of func.__doc__. On the other hand, programs, such as IDEs, that do control compilation, by calling the standard compile(), can assume both attributes if they pass the appropriate compile flags. -- Terry Jan Reedy
data:image/s3,"s3://crabby-images/9d55a/9d55a9d1915303c24fcf368a61919b9d2c534a18" alt=""
On Sun, Jul 2, 2017 at 2:16 PM, Steven D'Aprano <steve@pearwood.info> wrote:
This is not backwards -- just a different interpretation of the same situation. In fact, the problem is that 'type' already means too many different things. Clarity of terminology for a concept helps a lot in making the concept itself simpler and easier for both the designers and the users. This is analogous to the problem in English language that the verb 'argue' does not have a single meaning. Sometimes the concepts of an argument and a productive discussion get mixed up, and people who want to discuss productively just end up going away because others are turning the discussion into an argument. -- Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven +
participants (6)
- Chris Angelico
- Greg Ewing
- Koos Zevenhoven
- Lucas Wiman
- Steven D'Aprano
- Terry Reedy