Lessons from typing hinting Whoosh (PEP484)

Since PyCharm added support for PEP 484, I've been working on adding type hints to the Whoosh code base. It makes it a lot more enjoyable to work with the code in the IDE. As I add more and more hints, it has already revealed bugs where arguments were accidentally swapped, or interfaces weren't updated everywhere. I feel much more confident working in the (quite large) codebase with typing.

Basically, I'm loving the idea of PEP 484. For me, it's a huge improvement in the development experience. However, I've already run into a few issues. I'm not sure if these were discussed during the development of PEP 484.

1. The biggest, most perplexing problem for me is circular imports. That type hints must actually be imported at the top level is a MASSIVE limitation, at least in a big codebase. I started off importing the types I needed, then switched to the "import the module, specify the type in a string" trick, but even that has failed. In a large/complex codebase, the more things you have to import at the top level (to use for typing), the more potential import circles you create. Now I've hit an impasse where every step in the circle is using the module/string trick, but it doesn't help because I need to subclass something in the circle.

For now, I'm going to have to work around this, either by trying to move "interface" base classes out to their own side package, or just not type hinting some things. Neither is appetizing at all.

I think a good solution might be if the typing system could have its own special "imports" that are only used by the typing system, e.g. something like this at the top of a file:

    # "real" import
    from foo import bar

    # imports just for type checking
    __typeimports__ = """
    from baz import Qux
    """

2. It would be really nice if we could have "type aliasing" (or whatever it's called). I have a lot of types that are just something like "Tuple[int, int]", so type checking doesn't help much. It would be much more useful if I have a value that Python sees as (e.g.) an int, but have the type system track it as something more specific. Something like this:

    DocId = typing.TypeAlias(int)
    DocLength = typing.TypeAlias(int)

    def id_and_length() -> Tuple[DocId, DocLength]:
        docid = 5  # type: DocId
        length = 10  # type: DocLength
        return docid, length

3. I can't tell from the PEP 484 documentation... If I have a fully hinted base class, and subclass it with the same signature for every method, am I still supposed to hint all the arguments and returns in the subclass? That's what I've been doing, but it's pretty tedious.

Cheers,
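
The "imports only for the type checker" idea in point 1 can be approximated with `typing.TYPE_CHECKING`, a constant that was added to the typing module after this thread (Python 3.5.2) and is `False` at runtime. What follows is a minimal sketch, not the `__typeimports__` syntax proposed above: `baz`, `Qux`, and `describe` are hypothetical names, and `baz` is never actually imported when the code runs. It does not help with the subclassing case, which still needs a real import of the base class.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by the type checker; skipped at runtime, so this import
    # cannot take part in an import cycle. `baz` and `Qux` are hypothetical.
    from baz import Qux


def describe(q: "Qux") -> str:
    # The annotation is a string, so `Qux` is never looked up at runtime.
    return "got a " + type(q).__name__


# At runtime any object is accepted; only the type checker cares about Qux.
print(describe(5))
```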

Matt Chaput wrote:
like... type aliases? https://www.python.org/dev/peps/pep-0484/#type-aliases -- By ZeD

> like... type aliases? https://www.python.org/dev/peps/pep-0484/#type-aliases

As I read it, that's just a way to avoid having to spell out the same type definition more than once. It's not the same as being able to define a new type for the type system which Python sees as a plain old primitive type. I'm probably not using the right programming language theory word for this concept :)

Matt

On Tue, Nov 17, 2015 at 2:48 AM, Matt Chaput <matt@whoosh.ca> wrote:
> As I read it, that's just a way to avoid having to spell out the same type definition more than once. It's not the same as being able to define a new type for the type system which Python sees as a plain old primitive type.
> I'm probably not using the right programming language theory word for this concept :)

I'm not entirely sure, but I think you might be able to subclass something to create a special form. A length might be a subclass of integer, and a docid could be a different subclass of integer. Lengths and document IDs could then be counted as distinct from each other, although any function that expects an integer would happily accept either. (That might not be appropriate for docid, but it would for length (e.g. range() should accept them), so you might want to do docid some other way.)

ChrisA

Or like structural typing - there has already been some discussion on this list: https://github.com/ambv/typehinting/issues/11#issuecomment-138133867

On 16.11.2015 07:51, Vito De Tullio wrote:

On Nov 15, 2015, at 22:17, Matt Chaput <matt@whoosh.ca> wrote:
It sounds like you want a subtype that adds no new semantics or other runtime effects. Like this:

    class DocId(int): pass
    class DocLength(int): pass

    def id_and_length() -> Tuple[DocId, DocLength]:
        return DocId(5), DocLength(10)

These will behave exactly like int objects, except that you can (dynamically or statically) type check them. It does mean that if someone uses "type(spam[0]) == int" it will fail, but I think if you care either way, you'd actually want it to fail. Meanwhile, "isinstance(spam[0], int)" or "spam[0] + eggs" or even using it in a function that requires something usable as a C long will work as expected.

The object will also be the same size as an int in memory (although it will pickle a little bigger). It can't be optimized into a constant at compile time, but I doubt that's ever an issue. And it makes your intentions perfectly clear.
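
The runtime behaviour described above can be checked with a short, self-contained sketch. It only restates the example from this message and adds the `Tuple` import; `spam` is the illustrative variable name used in the text.

```python
from typing import Tuple


class DocId(int):
    pass


class DocLength(int):
    pass


def id_and_length() -> Tuple[DocId, DocLength]:
    return DocId(5), DocLength(10)


spam = id_and_length()

# Subclass instances still count as ints for isinstance().
print(isinstance(spam[0], int))
# An exact-type comparison no longer matches plain int.
print(type(spam[0]) == int)
# Arithmetic and APIs that need an integer keep working.
print(spam[0] + 1)
print(list(range(spam[1]))[:3])
```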

If you really want DocId to be as small as int you should add `__slots__ = ()` to the class def. It's still a bit slower (some code special-cases exactly int) and it also has some other imperfections -- e.g. DocId(2) + DocId(2) == int(4), not DocId(4). But that's probably okay for a document ID (and the solution is way too messy to use).

But in general this is a pretty verbose option. If the main purpose is for documentation, maybe type aliases, only used in annotations and other types, are enough (`DocId = int`). But yeah, the type checker won't track it. (That's a possible future feature though.)

On Mon, Nov 16, 2015 at 9:19 AM, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
-- --Guido van Rossum (python.org/~guido)
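
Two of the points in this reply can be observed directly: an empty `__slots__` removes the per-instance `__dict__`, and arithmetic on the subclass falls back to plain `int`. This is a rough sketch; `PlainDocId` and `SlotDocId` are hypothetical names chosen here to compare the two variants side by side, and no sizes are asserted because they vary between interpreter versions.

```python
class PlainDocId(int):
    pass


class SlotDocId(int):
    # An empty __slots__ means instances get no per-instance __dict__.
    __slots__ = ()


plain = PlainDocId(2)
slotted = SlotDocId(2)

# Without __slots__, every instance carries its own attribute dict.
print(hasattr(plain, "__dict__"))
# With the empty __slots__, that dict is gone.
print(hasattr(slotted, "__dict__"))

# int.__add__ returns a plain int, so the subclass is lost after arithmetic.
total = slotted + slotted
print(type(total).__name__, total)
```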

On Nov 16, 2015, at 09:31, Guido van Rossum <guido@python.org> wrote:
Good points. But I don't think any of those things matter for the OP's use case.

And surely being able to runtime-check the type and get the same results as compile-time checks, and not requiring any new language features, are advantages?

I'm sure there are cases where the performance matters, but are there enough cases where the performance matters, runtime typing (e.g., in logs and debugger) doesn't matter, and static typing does matter?

As a side note, most of the C++ family have something akin to "typedef" and/or "using" type aliases that are explicitly _not_ treated as different types by the static typing algorithm: if your function wants a DocId and I pass an int, you get neither an error nor a warning, because they are literally just different names for the same type.

So, if Python really does need a feature that's stricter than what C++ and friends do, using a name that evokes comparisons to their feature is probably not a good idea. The way C++ does what the OP is asking for is exactly what I suggested: an empty class that publicly inherits the type. (Of course it only works with class types, not simple types like int, in C++. But Python only has class types, so that wouldn't be a problem.)

> But in general this is a pretty verbose option.

Even with __slots__, it's fewer characters, and the same number of concepts, as the OP's proposed solution.

I think it's also more obvious to read: you're saying that DocId is a brand-new type that's a subtype of int (that adds nothing), which by definition means it can be used anywhere an int can be used, but not vice-versa. You don't have to know anything about MyPy or isinstance or anything else to figure out how they'll handle it (except the basic knowledge that Python's type system is generally a sensible OO system).

On Mon, Nov 16, 2015 at 2:16 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
Well, *you* claimed there was no size difference.
> And surely being able to runtime-check the type and get the same results as compile-time checks, and not requiring any new language features, are advantages?

Now you're changing the subject -- Matt very specifically asked for something that at runtime was just an int but was tracked more specifically by the type checker. We currently don't have that: there's either some runtime overhead (your solution) or the type checker doesn't track it (type aliasing as PEP 484 defines it). Anyway, lots of things PEP 484 tracks cannot be checked at runtime (e.g. anything involving TypeVar).

> I'm sure there are cases where the performance matters, but are there enough cases where the performance matters, runtime typing (e.g., in logs and debugger) doesn't matter, and static typing does matter?

A key requirement for PEP 484 is that there's no runtime overhead (apart from modules loading a tiny bit slower because the annotations are evaluated). Otherwise people will be afraid of using it. Having zillions of subclasses of builtin types at runtime just so the type checker can track them separately is in direct contradiction to this requirement. (Note that the BDFL-delegate specifically insisted that we remove isinstance() support from PEP 484's types.)

> As a side note, most of the C++ family have something akin to "typedef" and/or "using" type aliases that are explicitly _not_ treated as different types by the static typing algorithm: if your function wants a DocId and I pass an int, you get neither an error nor a warning, because they are literally just different names for the same type.

Yes, that's what PEP 484 type aliases do too.

> So, if Python really does need a feature that's stricter than what C++ and friends do, using a name that evokes comparisons to their feature is probably not a good idea. The way C++ does what the OP is asking for is exactly what I suggested: an empty class that publicly inherits the type. (Of course it only works with class types, not simple types like int, in C++. But Python only has class types, so that wouldn't be a problem.)

But C++ classes like that have no runtime overhead. The equivalent Python syntax does, alas.

>> But in general this is a pretty verbose option.
> Even with __slots__, it's fewer characters, and the same number of concepts, as the OP's proposed solution.

But such a hack -- every time someone sees that they'll wonder why the `__slots__ = ()`. And AFAICT Matt didn't propose any solution -- he just showed some example code indicating what he wanted to be taken care of automatically.

> I think it's also more obvious to read: you're saying that DocId is a brand-new type that's a subtype of int (that adds nothing), which by definition means it can be used anywhere an int can be used, but not vice-versa. You don't have to know anything about MyPy or isinstance or anything else to figure out how they'll handle it (except the basic knowledge that Python's type system is generally a sensible OO system).

Yeah, but I'd still prefer a solution that is only read by the type checker. And defining subclasses of int is pretty uncommon, so it'll confuse the heck out of a lot of readers -- much more so than other parts of type annotations (which are ignorable).

Anyway, let's stop bickering until Matt has had the time to read the thread and respond.

-- --Guido van Rossum (python.org/~guido)
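
For readers coming to this thread later: the typing module gained `typing.NewType` in Python 3.5.2, which looks very close to what is being asked for here (a distinct type for the checker, a plain int at runtime). The sketch below assumes that helper is available; it was not part of the typing module when these messages were written, and it is offered as an illustration rather than as the conclusion of this discussion.

```python
from typing import NewType, Tuple

# Distinct names for the type checker; at runtime each call simply
# returns its argument unchanged.
DocId = NewType("DocId", int)
DocLength = NewType("DocLength", int)


def id_and_length() -> Tuple[DocId, DocLength]:
    return DocId(5), DocLength(10)


docid, length = id_and_length()

# No subclass instance is created: the value is an ordinary int.
print(type(docid) is int)
print(docid + length)
```

A type checker that understands `NewType` is expected to reject passing a bare `int` or a `DocLength` where a `DocId` is annotated, while the running program only ever sees ints.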

>> As a side note, most of the C++ family have something akin to "typedef" and/or "using" type aliases that are explicitly _not_ treated as different types by the static typing algorithm:
...
> Yes, that's what PEP 484 type aliases do too.

Darn -- C typedefs have their uses, but improving type safety is not one of them. PEP 484 is all about type checking/safety -- it seems it would be a lot more useful if aliases were treated as different types. Oh well.

CHB
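
To make the typedef comparison concrete, here is a small sketch of aliases as PEP 484 defines them: plain assignments, so every alias is the very same object as the type it names. `lookup` is a hypothetical function added only to show that a value annotated as one alias is accepted where the other is expected.

```python
from typing import Tuple

# PEP 484 type aliases are ordinary assignments.
DocId = int
DocLength = int


def id_and_length() -> Tuple[DocId, DocLength]:
    return 5, 10


def lookup(docid: DocId) -> str:
    # Hypothetical consumer of a document ID.
    return "doc #" + str(docid)


docid, length = id_and_length()

# Both aliases are literally int, so nothing can tell them apart.
print(DocId is DocLength)
# Passing a length where a DocId is expected is accepted everywhere.
print(lookup(length))
```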

No, PEP 484 is about a pragmatic compromise that doesn't require adding new syntax and yet allows a reasonable amount of checking. Type checking religion has no place here.

On Mon, Nov 16, 2015 at 5:27 PM, Chris Barker - NOAA Federal <chris.barker@noaa.gov> wrote:
-- --Guido van Rossum (python.org/~guido)

I forgot to say, there's been a couple of places where I haven't been able to add hinting that hopefully can be improved:

1. Standard library types that should be public but aren't (regular expression object)
2. Specifying an argument/return that's a class (that should be a subclass of some class).

There might be a way to do 2 that I don't know about :)

Cheers,
Matt

#1: typing.re defines the re types. For the others, the typeshed project takes contributions.

#2: You can use def f() -> type: ... to specify a class as return type; but we currently don't have a way to constrain that class.

On Mon, Nov 16, 2015 at 6:41 PM, Matt Chaput <matt@whoosh.ca> wrote:
-- --Guido van Rossum (python.org/~guido)

There's an existing (but postponed) proposal to use Type[X]: https://github.com/ambv/typehinting/issues/107

On Mon, Nov 16, 2015 at 7:36 PM, Ryan Gonzalez <rymg19@gmail.com> wrote:
-- --Guido van Rossum (python.org/~guido)
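
The difference between the unconstrained `-> type` annotation and the `Type[X]` form can be sketched as follows. `typing.Type` was added in Python 3.5.2, after these messages, so this assumes a later typing module; `Codec`, `JsonCodec`, `pick_any`, and `make` are hypothetical names used only for illustration.

```python
from typing import Type


class Codec:
    name = "base"


class JsonCodec(Codec):
    name = "json"


def pick_any() -> type:
    # Unconstrained: the checker only knows that some class is returned.
    return JsonCodec


def make(codec_cls: Type[Codec]) -> Codec:
    # Constrained: the argument must be Codec or one of its subclasses.
    return codec_cls()


print(pick_any().__name__)
print(make(JsonCodec).name)
```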

Participants (8): Andrew Barnert, Chris Angelico, Chris Barker - NOAA Federal, Guido van Rossum, Matt Chaput, Ryan Gonzalez, Sven R. Kunze, Vito De Tullio