Type hinting for path-related functions
I actually proposed this already in one of the pathlib threads on python-dev, but I decided to repost here, because this is easily seen as a separate issue. I'll start with some introduction, then move on to the actual type hinting part.
Our seemingly never-ending discussions about pathlib support in the stdlib, held in various threads (first here on python-ideas, then even more extensively on python-dev), have perhaps almost converged. The required changes involve a protocol method, probably named __fspath__, which any path-like type could implement to return a more, let's say, "classical" path object such as a str. However, the protocol is polymorphic and may also return bytes, which has a lot to do with the fact that the stdlib itself is polymorphic and currently accepts str as well as bytes paths almost everywhere, including the newly-introduced os.scandir + DirEntry combination. The upcoming improvements will further allow passing pathlib path objects as well as DirEntry objects to any stdlib function that takes paths.
It came up, for instance here [1], that the function associated with the protocol, potentially named os.fspath, will end up needing type hints. This function takes path-like objects and turns them into str or bytes. There are various different scenarios [2] that can be considered for code dealing with paths, but let's consider the case of os.path.* and other traditional Python path-related functions.
Some examples:
os.path.join
Currently, it takes str or bytes paths and returns a joined path of the same type (mixing different types raises an exception).
In the future, it will also accept pathlib objects (underlying type always str), DirEntry objects (underlying type str or bytes) and third-party path objects (underlying type str or bytes). The function will then return a pathname of the underlying type.
os.path.dirname
Currently, it takes a str or bytes and returns the dirname of the same type. In the future, it will also accept Path and DirEntry objects and return the underlying type.
Let's consider the type hint of os.path.dirname at present and in the future:
Currently, one could write
    def dirname(p: Union[str, bytes]) -> Union[str, bytes]: ...
While this is valid, it could be more precise:
    pathstring = typing.TypeVar('pathstring', str, bytes)
    def dirname(p: pathstring) -> pathstring: ...
This now contains the information that the return type is the same as the argument type. The name 'pathstring' may be considered slightly misleading because "byte strings" are not actually strings in Python 3, but at least it does not advertise the use of bytes as paths, which is very rarely desirable.
But what about the future? There are two kinds of rich path objects: those with an underlying type of str and those with an underlying type of bytes. These should implement the __fspath__() protocol and return their underlying type. However, we do care about which (underlying) type the protocol provides, so we might want to introduce something like typing.FSPath[underlying_type]:
    FSPath[str]    # str-based pathlike, including str
    FSPath[bytes]  # bytes-based pathlike, including bytes
And now, using the TypeVar pathstring defined above, the future version of dirname would be annotated as follows:
    def dirname(p: FSPath[pathstring]) -> pathstring: ...
It's getting late. I hope this made sense :).
-Koos
[1] https://mail.python.org/pipermail/python-dev/2016-April/144246.html
[2] https://mail.python.org/pipermail/python-dev/2016-April/144239.html
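A minimal runtime sketch of the two annotations above, assuming posixpath.dirname as the underlying implementation. RichPath and the pure-Python fspath() are hypothetical stand-ins for a rich path class and the proposed os.fspath; because the proposed FSPath[str] also covers plain str, the sketch spells that as Union[pathstring, RichPath[pathstring]].

```python
import posixpath
from typing import Generic, TypeVar, Union

# The constrained TypeVar from the proposal (typing.AnyStr is defined the same way).
pathstring = TypeVar('pathstring', str, bytes)


def dirname(p: pathstring) -> pathstring:
    # A checker binds pathstring to str or bytes per call, so the
    # result type follows the argument type instead of being a Union.
    return posixpath.dirname(p)


class RichPath(Generic[pathstring]):
    """Hypothetical rich path object with an underlying str or bytes type."""

    def __init__(self, raw: pathstring) -> None:
        self._raw = raw

    def __fspath__(self) -> pathstring:
        # The protocol method returns the underlying type unchanged.
        return self._raw


def fspath(p: Union[pathstring, RichPath[pathstring]]) -> pathstring:
    # Hypothetical pure-Python fspath: str/bytes pass through,
    # rich paths are unwrapped via __fspath__.
    if isinstance(p, (str, bytes)):
        return p
    return p.__fspath__()


def dirname_future(p: Union[pathstring, RichPath[pathstring]]) -> pathstring:
    # Stands in for the proposed dirname(p: FSPath[pathstring]) -> pathstring.
    return posixpath.dirname(fspath(p))
```

A checker would infer str for dirname_future(RichPath('/srv/data/file.txt')) and bytes for the bytes variant, which is the extra precision the TypeVar buys over Union[str, bytes].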
Your pathstring seems to be the same as the predefined (in typing.py, and PEP 484) AnyStr.
You are indeed making sense, except that for various reasons the stdlib is not likely to adopt in-line signature annotations yet -- not even for new code. However once there's agreement on os.fspath() it can be added to the stubs in github.com/python/typeshed.
Is there going to be a PEP for os.fspath()? (I muted most of the discussions so I'm not sure where it stands.)
On Mon, Apr 18, 2016 at 5:40 PM, Koos Zevenhoven <k7hoven@gmail.com> wrote:
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
-- --Guido van Rossum (python.org/~guido)
On 04/18/2016 06:27 PM, Guido van Rossum wrote:
Is there going to be a PEP for os.fspath()? (I muted most of the discussions so I'm not sure where it stands.)
We're nearing the end of the discussions. Brett Cannon and Chris Angelico will draw up an amendment to the pathlib PEP. -- ~Ethan~
On Tue, Apr 19, 2016 at 4:27 AM, Guido van Rossum <guido@python.org> wrote:
Your pathstring seems to be the same as the predefined (in typing.py, and PEP 484) AnyStr.
Oh, there too! :) I thought I would need a TypeVar, so I turned to help(typing.TypeVar) to look up how to do that, and there it was, right in front of me, just with a different name 'A':
    A = TypeVar('A', str, bytes)
Anyway, it might make sense to consider defining 'pathstring' (or 'PathStr' for consistency?), even if it would be the same as AnyStr. Then, hypothetically, if at any point in the far future, bytes paths would be deprecated, it could be considered to make PathStr just str. After all, we don't want just Any String, we want something that represents a path (in a documentation sense).
You are indeed making sense, except that for various reasons the stdlib is not likely to adopt in-line signature annotations yet -- not even for new code.
However once there's agreement on os.fspath() it can be added to the stubs in github.com/python/typeshed.
I see, and I did have that impression already about the stdlib and type hints, probably based on some of your writings. My intention was to write these in the stub format, but apparently I need to look up the stub syntax once more.
Is there going to be a PEP for os.fspath()? (I muted most of the discussions so I'm not sure where it stands.)
It has not seemed like a good idea to discuss this (too?), but now that you ask, I have been wondering how optimal it is to add this to the pathlib PEP. While the changes do affect pathlib (even the code of the module itself), this will affect ntpath, posixpath, os.scandir, os.[other stuff], DirEntry (tempted to say os.DirEntry, but that is not true), shutil.[stuff], (io.)open, and potentially all kinds of random places in the stdlib, such as fileinput, filecmp, zipfile, tarfile, tempfile (for the 'dir' keyword arguments), maybe even glob, and fnmatch, to name a few :).
And now, if the FSPath[underlying_type] I just proposed ends up being added to typing (by whatever name), this will even affect typing.py.
-Koos
On Tue, Apr 19, 2016 at 3:41 AM, Koos Zevenhoven <k7hoven@gmail.com> wrote:
On Tue, Apr 19, 2016 at 4:27 AM, Guido van Rossum <guido@python.org> wrote:
Your pathstring seems to be the same as the predefined (in typing.py, and PEP 484) AnyStr.
Oh, there too! :) I thought I would need a TypeVar, so I turned to help(typing.TypeVar) to look up how to do that, and there it was, right in front of me, just with a different name 'A':
A = TypeVar('A', str, bytes)
Anyway, it might make sense to consider defining 'pathstring' (or 'PathStr' for consistency?), even if it would be the same as AnyStr. Then, hypothetically, if at any point in the far future, bytes paths would be deprecated, it could be considered to make PathStr just str. After all, we don't want just Any String, we want something that represents a path (in a documentation sense).
Unfortunately, until we implement something like "NewType" (https://github.com/python/typing/issues/189) the type checkers won't check whether you're actually using the right thing, so while the separate name would add a bit of documentation, I doubt that you'll ever be able to change the meaning of PathStr. Also, I don't expect a future where bytes paths don't make sense, unless Linux starts enforcing a normalized UTF-8 encoding in the kernel.
You are indeed making sense, except that for various reasons the stdlib is not likely to adopt in-line signature annotations yet -- not even for new code.
However once there's agreement on os.fspath() it can be added to the stubs in github.com/python/typeshed.
I see, and I did have that impression already about the stdlib and type hints, probably based on some of your writings. My intention was to write these in the stub format, but apparently I need to look up the stub syntax once more.
Once there's a PEP, updating the stubs will be routine.
Is there going to be a PEP for os.fspath()? (I muted most of the discussions so I'm not sure where it stands.)
It has not seemed like a good idea to discuss this (too?), but now that you ask, I have been wondering how optimal it is to add this to the pathlib PEP. While the changes do affect pathlib (even the code of the module itself), this will affect ntpath, posixpath, os.scandir, os.[other stuff], DirEntry (tempted to say os.DirEntry, but that is not true), shutil.[stuff], (io.)open, and potentially all kinds of random places in the stdlib, such as fileinput, filecmp, zipfile, tarfile, tempfile (for the 'dir' keyword arguments), maybe even glob, and fnmatch, to name a few :).
And now, if the FSPath[underlying_type] I just proposed ends up being added to typing (by whatever name), this will even affect typing.py.
Personally I think it's better off as a separate PEP, unless it turns out that it can be compressed to just the addition of a few paragraphs to the original PEP 428. -- --Guido van Rossum (python.org/~guido)
On Tue, Apr 19, 2016, at 12:20, Guido van Rossum wrote:
Also, I don't expect a future where bytes paths don't make sense, unless Linux starts enforcing a normalized UTF-8 encoding in the kernel.
Well, OSX does that now, but that's a whole other topic. Whether it is useful to represent paths as the bytes type in Python code is orthogonal to whether you can have paths that aren't valid strings in an encoding, considering that surrogateescape lets you represent any sequence of bytes as a str.
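To illustrate the surrogateescape point: decoding arbitrary bytes with the 'surrogateescape' error handler never fails, and encoding the result with the same handler restores the original bytes exactly. The file name below is an arbitrary example; on POSIX, os.fsdecode() and os.fsencode() apply this handler together with the filesystem encoding.

```python
# A file name containing the Latin-1 byte 0xE9, which is not valid UTF-8.
raw = b'caf\xe9.txt'

# surrogateescape maps each undecodable byte to a lone surrogate (U+DC80..U+DCFF)...
name = raw.decode('utf-8', 'surrogateescape')
assert name == 'caf\udce9.txt'

# ...and encoding with the same handler restores the original bytes exactly.
assert name.encode('utf-8', 'surrogateescape') == raw
```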
On Tue, Apr 19, 2016 at 10:36 AM, Random832 <random832@fastmail.com> wrote:
On Tue, Apr 19, 2016, at 12:20, Guido van Rossum wrote:
Also, I don't expect a future where bytes paths don't make sense, unless Linux starts enforcing a normalized UTF-8 encoding in the kernel.
Well, OSX does that now, but that's a whole other topic. Whether it is useful to represent paths as the bytes type in Python code is orthogonal to whether you can have paths that aren't valid strings in an encoding, considering that surrogateescape lets you represent any sequence of bytes as a str.
I'm sorry, I just don't see support for bytes paths going away any time soon (regardless of the alternative you bring up), so I think it's a waste of breath to discuss it further. -- --Guido van Rossum (python.org/~guido)
On Tue, Apr 19, 2016 at 7:20 PM, Guido van Rossum <guido@python.org> wrote:
On Tue, Apr 19, 2016 at 3:41 AM, Koos Zevenhoven <k7hoven@gmail.com> wrote:
On Tue, Apr 19, 2016 at 4:27 AM, Guido van Rossum <guido@python.org> wrote:
Is there going to be a PEP for os.fspath()? (I muted most of the discussions so I'm not sure where it stands.)
It has not seemed like a good idea to discuss this (too?), but now that you ask, I have been wondering how optimal it is to add this to the pathlib PEP. While the changes do affect pathlib (even the code of the module itself), this will affect ntpath, posixpath, os.scandir, os.[other stuff], DirEntry (tempted to say os.DirEntry, but that is not true), shutil.[stuff], (io.)open, and potentially all kinds of random places in the stdlib, such as fileinput, filecmp, zipfile, tarfile, tempfile (for the 'dir' keyword arguments), maybe even glob, and fnmatch, to name a few :).
And now, if the FSPath[underlying_type] I just proposed ends up being added to typing (by whatever name), this will even affect typing.py.
Personally I think it's better off as a separate PEP, unless it turns out that it can be compressed to just the addition of a few paragraphs to the original PEP 428.
While I could imagine the discussions having been shorter, it does not seem like compressing everything into a few paragraphs is a good idea either. And there are things that have not really been discussed, such as the details of the 'typing' part and the list of affected modules, which I tried to sketch above. Anyway, after all this, I wouldn't mind also working on the PEP if there is going to be a separate one, if that makes any sense.
-Koos
It turns out it has been almost a month since this, and the PEP draft is already looking good. It seems we might now be ready to discuss it. Should we add the generic type FSPath[str]? Again, there is a naming issue, and the question of whether to include plain str and bytes. We'll need to address this, unless we want the type checker to be unable to tell whether os.path.* etc. return str or bytes, and to carry around Union[str, bytes]. In theory, it would be possible to infer whether it is str or bytes, as described.
-- Koos
On Tue, Apr 19, 2016 at 3:40 AM, Koos Zevenhoven <k7hoven@gmail.com> wrote:
On 05/13/2016 01:30 PM, Koos Zevenhoven wrote:
It turns out it has been almost a month since this, and the PEP draft is already looking good. It seems we might now be ready to discuss it. Should we add the generic type FSPath[str]?
Guido's post on one of the other threads: ----------------------------------------
There's no need for typing.PathLike.
So I'm gonna say no. ;) -- ~Ethan~
On Fri, May 13, 2016 at 11:50 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
On 05/13/2016 01:30 PM, Koos Zevenhoven wrote:
It turns out it has been almost a month since this, and the PEP draft is already looking good. It seems we might now be ready to discuss it. Should we add the generic type FSPath[str]?
Guido's post on one of the other threads: ----------------------------------------
There's no need for typing.PathLike.
So I'm gonna say no. ;)
Oh, it looks like I read those two emails in the wrong order ;). Anyway, I was going to suggest making the abstract base class subscriptable too, like this: PathABC[str] is a str-based path ABC, and PathABC[bytes] a bytes-based one ;). I don't know if that should be called a generic type or not, though.
-- Koos
On Sat, May 14, 2016 at 1:08 AM, Koos Zevenhoven <k7hoven@gmail.com> wrote:
On Fri, May 13, 2016 at 11:50 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
On 05/13/2016 01:30 PM, Koos Zevenhoven wrote:
It turns out it has been almost a month since this, and the PEP draft is already looking good. It seems we might now be ready to discuss it. Should we add the generic type FSPath[str]?
Guido's post on one of the other threads: ----------------------------------------
There's no need for typing.PathLike.
So I'm gonna say no. ;)
Oh, it looks like I read those two emails in the wrong order ;).
Anyway, I was going to suggest making the abstract base class subscriptable too like this: PathABC[str] is a str-based path ABC, and PathABC[bytes] a bytes-based one ;). I don't know if that should be called a generic type or not, though.
But maybe making it a generic type would be the way to make the type checker understand it? Of course, if subscripting the ABC is not desired, there could be three ABCs, something like os.StrPath (for str-based path types), os.BytesPath (for bytes-based path types) and os.StrBytesPath (for types like DirEntry that can be either, unless such classes are split in two). But I suppose there is nothing that would help combine this with a TypeVar-like thing. Then again, when working in pure Python with a pure-Python fspath and with the __fspath__ method properly annotated in the path class, the type checker should probably be able to infer the types. Maybe Guido was referring to this.
-- Koos
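A rough sketch of the subscriptable-ABC idea, assuming the ABC simply inherits from both abc.ABC and Generic[AnyStr]. PathABC, StrBasedPath, BytesBasedPath and this pure-Python fspath() are hypothetical names illustrating the proposal as stated, not anything the PEP adopted. Each concrete class annotates __fspath__ with its underlying type, which is what would let a checker infer the result of fspath().

```python
import abc
from typing import AnyStr, Generic, Union


class PathABC(abc.ABC, Generic[AnyStr]):
    """Hypothetical subscriptable path ABC: PathABC[str] or PathABC[bytes]."""

    @abc.abstractmethod
    def __fspath__(self) -> AnyStr:
        ...


class StrBasedPath(PathABC[str]):
    """Hypothetical str-based path type."""

    def __init__(self, raw: str) -> None:
        self._raw = raw

    def __fspath__(self) -> str:
        return self._raw


class BytesBasedPath(PathABC[bytes]):
    """Hypothetical bytes-based path type."""

    def __init__(self, raw: bytes) -> None:
        self._raw = raw

    def __fspath__(self) -> bytes:
        return self._raw


def fspath(p: Union[AnyStr, PathABC[AnyStr]]) -> AnyStr:
    # Plain str/bytes pass through; otherwise defer to the annotated
    # __fspath__, so StrBasedPath gives str and BytesBasedPath gives bytes.
    if isinstance(p, (str, bytes)):
        return p
    return p.__fspath__()
```

Subclassing PathABC[str] works at runtime, but isinstance() checks have to use the unsubscripted PathABC, since subscripted generics cannot be used with isinstance().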
On Fri, 13 May 2016 at 15:09 Koos Zevenhoven <k7hoven@gmail.com> wrote:
On Fri, May 13, 2016 at 11:50 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
On 05/13/2016 01:30 PM, Koos Zevenhoven wrote:
It turns out it has been almost a month since this, and the PEP draft is already looking good. It seems we might now be ready to discuss it. Should we add the generic type FSPath[str]?
Guido's post on one of the other threads: ----------------------------------------
There's no need for typing.PathLike.
So I'm gonna say no. ;)
Oh, it looks like I read those two emails in the wrong order ;).
Anyway, I was going to suggest making the abstract base class subscriptable too like this: PathABC[str] is a str-based path ABC, and PathABC[bytes] a bytes-based one ;). I don't know if that should be called a generic type or not, though.
The PEP already addresses this and says "no".
On Sat, May 14, 2016 at 2:05 AM, Brett Cannon <brett@python.org> wrote:
On Fri, 13 May 2016 at 15:09 Koos Zevenhoven <k7hoven@gmail.com> wrote:
On Fri, May 13, 2016 at 11:50 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
On 05/13/2016 01:30 PM, Koos Zevenhoven wrote:
It turns out it has been almost a month since this, and the PEP draft is already looking good. It seems we might now be ready to discuss it. Should we add the generic type FSPath[str]?
Guido's post on one of the other threads: ----------------------------------------
There's no need for typing.PathLike.
So I'm gonna say no. ;)
Oh, it looks like I read those two emails in the wrong order ;).
Anyway, I was going to suggest making the abstract base class subscriptable too like this: PathABC[str] is a str-based path ABC, and PathABC[bytes] a bytes-based one ;). I don't know if that should be called a generic type or not, though.
The PEP already addresses this and says "no".
I obviously know the PEP very well, and it doesn't. But I'm probably just doing a bad job explaining what I mean right now, and should probably go to bed. Sorry. -- Koos
On Fri, 13 May 2016 at 16:14 Koos Zevenhoven <k7hoven@gmail.com> wrote:
On Sat, May 14, 2016 at 2:05 AM, Brett Cannon <brett@python.org> wrote:
On Fri, 13 May 2016 at 15:09 Koos Zevenhoven <k7hoven@gmail.com> wrote:
On Fri, May 13, 2016 at 11:50 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
On 05/13/2016 01:30 PM, Koos Zevenhoven wrote:
It turns out it has been almost a month since this, and the PEP draft is already looking good. It seems we might now be ready to discuss it. Should we add the generic type FSPath[str]?
Guido's post on one of the other threads: ----------------------------------------
There's no need for typing.PathLike.
So I'm gonna say no. ;)
Oh, it looks like I read those two emails in the wrong order ;).
Anyway, I was going to suggest making the abstract base class subscriptable too like this: PathABC[str] is a str-based path ABC, and PathABC[bytes] a bytes-based one ;). I don't know if that should be called a generic type or not, though.
The PEP already addresses this and says "no".
I obviously know the PEP very well, and it doesn't. But I'm probably just doing a bad job explaining what I mean right now, and should probably go to bed. Sorry.
Ah, I now see what you're proposing: somehow making the ABC a generic like the generics types in the typing module are (which would be a new feature for ABCs and why I didn't realize what you were initially asking for; sorry about that). The answer is still "no". :) The generics support from the typing module is specific to that module and not applicable to ABCs themselves. Think of ABCs as helping guarantee that you implement specific methods and attributes. If your typing needs are met simply by that level of type information, then ABCs are fine as a type hint. But if you need something more specific like generics support then that's when it goes into the typing module (IOW don't think of the types in the typing module as ABCs or vice-versa but as separate things both dealing with duck-typing for their own purposes). Since the PEP already ruled out adding generics support for a special type representing os.PathLike or path-like objects, that also means it shouldn't be pushed into the ABC as an end-run around not going into the typing module.
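Brett's distinction can be seen with os.PathLike, the ABC that, assuming Python 3.6 or later, ships in the os module: it recognises any class defining __fspath__ even without inheritance, and it refuses to instantiate a subclass that forgets the method, but at runtime it says nothing about whether __fspath__ returns str or bytes. Duck and Incomplete are illustrative names.

```python
import os


class Duck:
    """Never inherits from os.PathLike; it just implements the protocol method."""

    def __fspath__(self) -> str:
        return '/tmp/duck'


class Incomplete(os.PathLike):
    """Inherits the ABC but never implements the abstract __fspath__."""


# The ABC's __subclasshook__ accepts any class that defines __fspath__.
assert isinstance(Duck(), os.PathLike)
assert os.fspath(Duck()) == '/tmp/duck'

# The ABC's guarantee: a subclass missing the abstract method cannot be created.
try:
    Incomplete()
except TypeError as exc:
    print('refused:', exc)
```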
On Sat, May 14, 2016 at 2:41 AM, Brett Cannon <brett@python.org> wrote:
On Fri, 13 May 2016 at 16:14 Koos Zevenhoven <k7hoven@gmail.com> wrote:
On Sat, May 14, 2016 at 2:05 AM, Brett Cannon <brett@python.org> wrote:
On Fri, 13 May 2016 at 15:09 Koos Zevenhoven <k7hoven@gmail.com> wrote:
Anyway, I was going to suggest making the abstract base class subscriptable too like this: PathABC[str] is a str-based path ABC, and PathABC[bytes] a bytes-based one ;). I don't know if that should be called a generic type or not, though.
The PEP already addresses this and says "no".
I obviously know the PEP very well, and it doesn't. But I'm probably just doing a bad job explaining what I mean right now, and should probably go to bed. Sorry.
Ah, I now see what you're proposing: somehow making the ABC a generic like the generics types in the typing module are (which would be a new feature for ABCs and why I didn't realize what you were initially asking for; sorry about that). The answer is still "no". :)
I'm not sure that this is strictly a new feature, although I suppose there is no example of such an ABC at the moment in the stdlib. But I suppose there is a reason why, for instance, typing.Sequence and collections.abc.Sequence are not merged together. Maybe that is to limit the complexity of the already complex type stuff at this point. The question of whether the ABC could be subscripted to determine the underlying type can be viewed as separate from whether it inherits from Generic[...] or not. But IIUC, inheriting from Generic[...] is the thing that Mypy understands.
The generics support from the typing module is specific to that module and not applicable to ABCs themselves.
Specific to that module? Maybe you mean the convention of having the stdlib generics in typing.py.
Think of ABCs as helping guarantee that you implement specific methods and attributes.
Yes, that's pretty much what I do ;-). And as I already suggested, one could also avoid the subscripting part by defining separate ABCs, os.StrPath, os.BytesPath. This still wouldn't allow parametrizing with a TypeVar, but at least one could write [for os(.path) functions]
    @overload
    def dirname(p: Union[str, StrPath]) -> str: ...
    @overload
    def dirname(p: Union[bytes, BytesPath]) -> str: ...
and
    @overload
    def fspath(p: Union[str, StrPath]) -> str: ...
    @overload
    def fspath(p: Union[bytes, BytesPath]) -> str: ...
- Koos
P.S. The situation with DirEntry requires more considerations, because it can have either underlying type.
A copy-paste error of mine corrected below, sorry. On Sun, May 15, 2016 at 1:51 PM, Koos Zevenhoven <k7hoven@gmail.com> wrote:
And as I already suggested, one could also avoid the subscripting part by defining separate ABCs, os.StrPath, os.BytesPath. This still wouldn't allow parametrizing with a TypeVar, but at least one could write [for os(.path) functions]
    @overload
    def dirname(p: Union[str, StrPath]) -> str: ...
    @overload
    def dirname(p: Union[bytes, BytesPath]) -> bytes: ...
corrected to "-> bytes"
and
    @overload
    def fspath(p: Union[str, StrPath]) -> str: ...
    @overload
    def fspath(p: Union[bytes, BytesPath]) -> bytes: ...
corrected to "-> bytes"
- Koos
P.S. The situation with DirEntry requires more considerations, because it can have either underlying type.
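The separate-ABC alternative can be sketched the same way. `StrPath` and `BytesPath` are Koos's proposed names, not existing `os` members. The runtime body of `dirname` (unwrap with `os.fspath`, then delegate to `os.path.dirname`) is an assumption about how such a wrapper might be written. The two `@overload` stubs are what a type checker would see.

```python
import os
from abc import ABC, abstractmethod
from typing import Union, overload


class StrPath(ABC):
    """Proposed (hypothetical) ABC: __fspath__ returns str."""

    @abstractmethod
    def __fspath__(self) -> str: ...


class BytesPath(ABC):
    """Proposed (hypothetical) ABC: __fspath__ returns bytes."""

    @abstractmethod
    def __fspath__(self) -> bytes: ...


@overload
def dirname(p: Union[str, StrPath]) -> str: ...
@overload
def dirname(p: Union[bytes, BytesPath]) -> bytes: ...
def dirname(p):
    # Runtime body (assumed): unwrap to str/bytes, then delegate to os.path.
    return os.path.dirname(os.fspath(p))


class ConfigPath(StrPath):
    """Illustrative str-backed path object."""

    def __fspath__(self) -> str:
        return "etc/app/config.ini"


print(dirname(ConfigPath()))      # etc/app
print(dirname(b"etc/app/x.ini"))  # b'etc/app'
```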
I didn't have time to read the thread, but I read the PEP and thought about this a little bit.
One key thing is that we can write the code CPython sees at runtime one way, and write the stubs that type checkers (like mypy) see a different way. The stubs go in the typeshed repo (https://github.com/python/typeshed) and I would add something like the following to the os module there (stdlib/3/os/__init__.pyi in the repo).
First we need to add scandir() and DirEntry (this is not entirely unrelated -- DirEntry is an example of something that is PathLike). Disregarding the PathLike protocol for the moment, I think they can be defined like this:
    if sys.version_info >= (3, 5):
        class DirEntry(Generic[AnyStr]):
            name = ...  # type: AnyStr
            path = ...  # type: AnyStr
            def inode(self) -> int: ...
            def is_dir(self, *, follow_symlinks: bool = ...) -> bool: ...
            def is_file(self, *, follow_symlinks: bool = ...) -> bool: ...
            def is_symlink(self) -> bool: ...
            def stat(self, *, follow_symlinks: bool = ...) -> stat_result: ...
        @overload
        def scandir(path: str = ...) -> Iterator[DirEntry[str]]: ...
        @overload
        def scandir(path: bytes) -> Iterator[DirEntry[bytes]]: ...
Note that the docs claim there's a type os.DirEntry, even though it doesn't currently exist -- I think we should fix that in 3.6 even if it may not make sense to instantiate it.
Also note that a slightly different overload is also possible -- I think these are for all practical purposes the same:
    @overload
    def scandir() -> Iterator[DirEntry[str]]: ...
    @overload
    def scandir(path: AnyStr) -> Iterator[DirEntry[AnyStr]]: ...
The reason we need the overload in all cases is that os.scandir() without arguments yields str-based entries.
Finally, a reminder that this is all stub code -- it's only ever seen by typecheckers. What we put in the actual os.py file in the stdlib can be completely different, and it doesn't need type annotations (type checkers always prefer the stubs over the real code).
Now let's add PathLike. This first attempt doesn't address DirEntry yet:
    if sys.version_info >= (3, 6):
        from abc import abstractmethod
        class PathLike(Generic[AnyStr]):
            @abstractmethod
            def __fspath__(self) -> AnyStr: ...
        @overload
        def fspath(path: PathLike[AnyStr]) -> AnyStr: ...
        @overload
        def fspath(path: AnyStr) -> AnyStr: ...
This tells a type checker enough so that it will know that e.g. os.fspath(b'.') returns a bytes object. Also, if we have a class C that derives from PathLike we can make it non-generic, e.g. the stubs for pathlib.Path would start with something like
    class Path(os.PathLike[str]): ...
and now the type checker will know that in the following code `c` is always a str:
    a = ...  # type: Any
    b = pathlib.Path(a)
    c = os.fspath(b)
Finally let's redefine scandir(). We'll have to redefine DirEntry to inherit from PathLike, and it will remain generic:
    class DirEntry(PathLike[AnyStr], Generic[AnyStr]):
        # Everything else unchanged!
Now the type checker should understand the following:
    for a in os.scandir('.'):
        b = os.fspath(a)
        ...
Here it will know that `a` is a DirEntry[str] (because the argument given to os.scandir() is a str) and hence it will also know that b is a str. Now if you then pass b to pathlib it will understand this cannot be a type error, and if you pass b to some os.path.* function (e.g. os.path.basename()) it will understand the return value is a str.
If you pass some variable to os.scandir() then if the type checker can deduce that that variable is a str (e.g. because you've gotten it from pathlib) it will know that the results are DirEntry[str] instances. If you pass something to os.scandir() that's a bytes object it will know that the results are DirEntry[bytes] objects, and it knows that calling os.fspath() on those will return bytes. (And it will know that you can't pass those to pathlib, but you *can* pass them to most os and os.path functions.)
Next, if the variable passed to os.scandir() has the declared or inferred type AnyStr then mypy will know that it can be either str or bytes and the types of results will also use AnyStr. I think in that case you'll get an error if you pass it to pathlib. Note that this can only happen inside a generic class or a generic function that has AnyStr as one of its parameters. (AnyStr is itself a type variable.)
The story ought to be similar if the variable has the type Union[str, bytes], except that this can occur in non-generic code and the resulting types are similarly fuzzy. (I think there's a bug in mypy around this though, follow https://github.com/python/mypy/issues/1533 if you're interested in how that turns out.)
-- --Guido van Rossum (python.org/~guido)
On Sun, 15 May 2016 at 10:21 Guido van Rossum <guido@python.org> wrote:
I didn't have time to read the thread, but I read the PEP and thought about this a little bit.
One key thing is that we can write the code CPython sees at runtime one way, and write the stubs that type checkers (like mypy) see a different way. The stubs go in the typeshed repo (https://github.com/python/typeshed) and I would add something like the following to the os module there (stdlib/3/os/__init__.pyi in the repo).
First we need to add scandir() and DirEntry (this is not entirely unrelated -- DirEntry is an example of something that is PathLike). Disregarding the PathLike protocol for the moment, I think they can be defined like this:
    if sys.version_info >= (3, 5):
        class DirEntry(Generic[AnyStr]):
            name = ...  # type: AnyStr
            path = ...  # type: AnyStr
            def inode(self) -> int: ...
            def is_dir(self, *, follow_symlinks: bool = ...) -> bool: ...
            def is_file(self, *, follow_symlinks: bool = ...) -> bool: ...
            def is_symlink(self) -> bool: ...
            def stat(self, *, follow_symlinks: bool = ...) -> stat_result: ...
        @overload
        def scandir(path: str = ...) -> Iterator[DirEntry[str]]: ...
        @overload
        def scandir(path: bytes) -> Iterator[DirEntry[bytes]]: ...
Note that the docs claim there's a type os.DirEntry, even though it doesn't currently exist -- I think we should fix that in 3.6 even if it may not make sense to instantiate it.
http://bugs.python.org/issue27038 (and AnyStr isn't documented, so http://bugs.python.org/issue26141).
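A quick runtime check of what the `DirEntry[str]` / `DirEntry[bytes]` overloads encode: the type of the argument to `os.scandir()` decides the type of each entry's `name` and `path`. This sketch uses a throwaway temporary directory and assumes Python 3.6 or later, where `os.DirEntry` is exposed and `os.scandir()` works as a context manager.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # One file, so each listing has exactly one entry.
    open(os.path.join(tmp, "example.txt"), "w").close()

    with os.scandir(tmp) as it:                 # str argument
        str_entries = list(it)
    with os.scandir(os.fsencode(tmp)) as it:    # bytes argument
        bytes_entries = list(it)

# No argument: scans '.', and the entries are str-based.
with os.scandir() as it:
    cwd_entries = list(it)

print(type(str_entries[0]).__name__)  # DirEntry
print(repr(str_entries[0].name))      # 'example.txt'
print(repr(bytes_entries[0].name))    # b'example.txt'
```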
Also note that a slightly different overload is also possible -- I think these are for all practical purposes the same:
    @overload
    def scandir() -> Iterator[DirEntry[str]]: ...
    @overload
    def scandir(path: AnyStr) -> Iterator[DirEntry[AnyStr]]: ...
The reason we need the overload in all cases is that os.scandir() without arguments yields str-based entries.
Finally, a reminder that this is all stub code -- it's only ever seen by typecheckers. What we put in the actual os.py file in the stdlib can be completely different, and it doesn't need type annotations (type checkers always prefer the stubs over the real code).
Now let's add PathLike. This first attempt doesn't address DirEntry yet:
    if sys.version_info >= (3, 6):
        from abc import abstractmethod
        class PathLike(Generic[AnyStr]):
            @abstractmethod
            def __fspath__(self) -> AnyStr: ...
        @overload
        def fspath(path: PathLike[AnyStr]) -> AnyStr: ...
        @overload
        def fspath(path: AnyStr) -> AnyStr: ...
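At runtime, `os.fspath()` (Python 3.6+) behaves the way these overloads describe: str and bytes pass through unchanged, and anything with `__fspath__` is unwrapped to whatever that method returns. `StrThing` and `BytesThing` below are illustrative classes, not part of any library.

```python
import os


class StrThing:
    """Illustrative path-like object whose underlying type is str."""

    def __fspath__(self) -> str:
        return "data.txt"


class BytesThing:
    """Illustrative path-like object whose underlying type is bytes."""

    def __fspath__(self) -> bytes:
        return b"data.txt"


# Prints 'data.txt', b'data.txt', 'data.txt', b'data.txt' on separate lines:
# the result type always follows the input's underlying type.
for value in ("data.txt", b"data.txt", StrThing(), BytesThing()):
    print(repr(os.fspath(value)))
```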
This tells a type checker enough so that it will know that e.g. os.fspath(b'.') returns a bytes object. Also, if we have a class C that derives from PathLike we can make it non-generic, e.g. the stubs for pathlib.Path would start with something like
class Path(os.PathLike[str]): ...
and now the type checker will know that in the following code `c` is always a str:
    a = ...  # type: Any
    b = pathlib.Path(a)
    c = os.fspath(b)
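The pathlib example can be checked directly. Here `a` is just a string standing in for the stub's `Any`-typed value. Note that pathlib refuses bytes outright, which is why `pathlib.Path` can reasonably be declared as `os.PathLike[str]`.

```python
import os
import pathlib
from typing import Any

a: Any = "some/dir"      # stands in for `a = ...  # type: Any`
b = pathlib.Path(a)
c = os.fspath(b)         # a type checker would infer str here

print(type(c).__name__)            # str
print(isinstance(b, os.PathLike))  # True

try:
    pathlib.Path(b"some/dir")      # bytes are not accepted by pathlib
    rejected_bytes = False
except TypeError:
    rejected_bytes = True
print(rejected_bytes)              # True
```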
Finally let's redefine scandir(). We'll have to redefine DirEntry to inherit from PathLike, and it will remain generic:
    class DirEntry(PathLike[AnyStr], Generic[AnyStr]):
        # Everything else unchanged!
Now the type checker should understand the following:
    for a in os.scandir('.'):
        b = os.fspath(a)
        ...
Here it will know that `a` is a DirEntry[str] (because the argument given to os.scandir() is a str)
Which works because AnyStr is a TypeVar (if anyone else was wondering like I was why that worked since AnyStr isn't documented yet).
and hence it will also know that b is a str. Now if you then pass b to pathlib it will understand this cannot be a type error, and if you pass b to some os.path.* function (e.g. os.path.basename()) it will understand the return value is a str.
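Continuing the loop example, here is a small runnable version with a made-up `notes.txt` in a temporary directory. `os.fspath()` of a DirEntry is its `path` attribute, and the os.path functions hand back the same type they were given, refusing to mix str and bytes.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "notes.txt"), "w").close()
    with os.scandir(tmp) as it:          # str argument, so str-based entries
        for a in it:
            b = os.fspath(a)             # a DirEntry's fspath is its .path
            print(os.path.basename(b))   # notes.txt

# os.path functions return the type they were given...
print(os.path.basename(b"dir/notes.txt"))  # b'notes.txt'

# ...and refuse to mix str and bytes.
try:
    os.path.join("dir", b"notes.txt")
    mixing_rejected = False
except TypeError:
    mixing_rejected = True
print(mixing_rejected)  # True
```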
If you pass some variable to os.scandir() then if the type checker can deduce that that variable is a str (e.g. because you've gotten it from pathlib) it will know that the results are DirEntry[str] instances. If you pass something to os.scandir() that's a bytes object it will know that the results are DirEntry[bytes] objects, and it knows that calling os.fspath() on those will return bytes. (And it will know that you can't pass those to pathlib, but you *can* pass them to most os and os.path functions.)
Next, if the variable passed to os.scandir() has the declared or inferred type AnyStr then mypy will know that it can be either str or bytes and the types of results will also use AnyStr. I think in that case you'll get an error if you pass it to pathlib. Note that this can only happen inside a generic class or a generic function that has AnyStr as one of its parameters. (AnyStr is itself a type variable.)
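To see the AnyStr case from the inside, here is a minimal sketch of a generic function; the name `entry_names` is hypothetical. Within its body the argument is only known to be str or bytes, so the result stays typed as `List[AnyStr]`. At each call site the checker binds AnyStr to one concrete type.

```python
import os
import tempfile
from typing import AnyStr, List


def entry_names(path: AnyStr) -> List[AnyStr]:
    """Hypothetical helper: sorted entry names, same type as `path`.

    Inside the body AnyStr is only known to be str *or* bytes, so the
    result stays List[AnyStr]; each call site binds it to one type.
    """
    with os.scandir(path) as it:
        return sorted(entry.name for entry in it)


with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.txt", "a.txt"):
        open(os.path.join(tmp, name), "w").close()
    str_names = entry_names(tmp)                  # List[str]
    bytes_names = entry_names(os.fsencode(tmp))   # List[bytes]

print(str_names)    # ['a.txt', 'b.txt']
print(bytes_names)  # [b'a.txt', b'b.txt']
```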
The story ought to be similar if the variable has the type Union[str, bytes], except that this can occur in non-generic code and the resulting types are similarly fuzzy. (I think there's a bug in mypy around this though, follow https://github.com/python/mypy/issues/1533 if you're interested in how that turns out.)
This might make a nice example in the docs and/or blog post since this is hitting the intermediate/advanced space for typing that almost none of us have hit.
On Mon, May 16, 2016 at 9:15 AM, Brett Cannon <brett@python.org> wrote:
This might make a nice example in the docs and/or blog post since this is hitting the intermediate/advanced space for typing that almost none of us have hit.
I'll see if I can follow up on this idea. But if anyone else feels like blogging about this, please go ahead! If you post a link here I'll retweet. -- --Guido van Rossum (python.org/~guido)
On Mon, 16 May 2016 at 11:38 Guido van Rossum <guido@python.org> wrote:
On Mon, May 16, 2016 at 9:15 AM, Brett Cannon <brett@python.org> wrote:
This might make a nice example in the docs and/or blog post since this is hitting the intermediate/advanced space for typing that almost none of us have hit.
I'll see if I can follow up on this idea. But if anyone else feels like blogging about this, please go ahead! If you post a link here I'll retweet.
If you don't get to it then maybe I will in a blog post on PEP 519, but I don't think I will be much faster than you for the foreseeable future. :)
On its way. It's longer than I expected. Hopefully to be finished tomorrow...
On Mon, May 16, 2016 at 1:48 PM, Brett Cannon <brett@python.org> wrote:
[...]
For the mailing list folks here's a draft of the promised blog post: https://paper.dropbox.com/doc/Adding-type-annotations-for-PEP-519-SQuovLc1Zy... Hopefully some of you can actually add comments in the sidebar (it requires a Dropbox login). I'll convert it to my usual Blogger format tomorrow. --Guido
On Mon, May 16, 2016 at 9:08 PM, Guido van Rossum <guido@python.org> wrote:
[...]
On Mon, 16 May 2016 at 22:09 Guido van Rossum <guido@python.org> wrote:
For the mailing list folks here's a draft of the promised blog post:
https://paper.dropbox.com/doc/Adding-type-annotations-for-PEP-519-SQuovLc1Zy...
Hopefully some of you can actually add comments in the sidebar (it requires a Dropbox login).
No comments from me, although it's going to be interesting making this all work in Typeshed if we backport the PathLike support to the pathlib constructor to 3.4 and 3.5 but (obviously) not os.PathLike itself. -Brett
I'll convert it to my usual Blogger format tomorrow.
--Guido
[...]
On Tue, May 17, 2016 at 9:33 AM, Brett Cannon <brett@python.org> wrote:
No comments from me, although it's going to be interesting making this all work in Typeshed if we backport the PathLike support to the pathlib constructor to 3.4 and 3.5 but (obviously) not os.PathLike itself.
By then mypy should support "if sys.version_info >= ..." checks. (There's some work required still: https://github.com/python/mypy/issues/698) -- --Guido van Rossum (python.org/~guido)
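The same `if sys.version_info >= ...` pattern that type checkers evaluate in stubs can also gate real code. A minimal sketch, assuming a hypothetical helper `to_native`: the 3.6+ branch relies on `os.fspath()`, while the fallback branch is only a guess at what a 3.4/3.5 backport might accept (plain str/bytes and always-str-based pathlib objects), not something decided in this thread.

```python
import os
import pathlib
import sys
from typing import Union

if sys.version_info >= (3, 6):
    # PEP 519 interpreters: os.fspath() and os.PathLike are available.
    def to_native(p: Union[str, bytes, os.PathLike]) -> Union[str, bytes]:
        return os.fspath(p)
else:
    # Hypothetical fallback for 3.4/3.5: only plain str/bytes and
    # pathlib objects (which are always str-based) are handled.
    def to_native(p):
        if isinstance(p, pathlib.PurePath):
            return str(p)
        return p


print(to_native(pathlib.PurePosixPath("a/b")))  # a/b
print(to_native(b"a/b"))                        # b'a/b'
```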
On Mon, May 16, 2016 at 10:09 PM, Guido van Rossum <guido@python.org> wrote:
For the mailing list folks here's a draft of the promised blog post: https://paper.dropbox.com/doc/Adding-type-annotations-for-PEP-519-SQuovLc1Zy...
Hopefully some of you can actually add comments in the sidebar (it requires a Dropbox login).
I'll convert it to my usual Blogger format tomorrow.
I ended up breaking this in two. Part 1, about AnyStr, is now posted: http://neopythonic.blogspot.com/2016/05/the-anystr-type-variable.html Part 2 is still in draft and I'm hoping to get more feedback: https://paper.dropbox.com/doc/Adding-type-annotations-for-PEP-519-SQuovLc1Zy... -- --Guido van Rossum (python.org/~guido)
Here's part 2: http://neopythonic.blogspot.com/2016/05/adding-type-annotations-for-fspath.h...
On Tue, May 17, 2016 at 10:33 AM, Guido van Rossum <guido@python.org> wrote:
[...]
This seems to be what I thought too, except completely in the stubs. I might add some comments in the blog post draft. -- Koos
On Sun, May 15, 2016 at 8:21 PM, Guido van Rossum <guido@python.org> wrote:
[...]
Executive summary:
Inference of str vs bytes (in particular, which component of a "sum" type in general) is not always possible, and it's not obvious how beneficial it is. Proposal: have typecheckers recognize "suspicious" unions, and complain if they are not "converted" to a component type before passing to a polymorphic function.
Koos Zevenhoven writes:
We'll need to address this, unless we want the type checker to not know whether os.path.* etc. return str or bytes and to carry around Union[str, bytes]. In theory, it would be possible to infer whether it is str or bytes, as described.
I'm -0.5 = "more trouble than it's worth". Rationale:
*Sometimes* it would be possible. OTOH, modules like os.path *will* end up carrying around Union[str, bytes], precisely because they will be called in both contexts, and that's not predictable at the time of type-checking os.path.
So what you really want is a Sum type constructor, where Sum is in the sense of category theory's sum functor. That is, not only is the result Type a (disjoint) union, but you're also guaranteed that each operation respects the component type.
I could argue that this is why Guido was right to remove the restriction that os.fspath return str. The __fspath__ method is part of the "sum category os.path"[1], which is a collection of operations on bytes and on str that parallel each other because they are equivalent representations of filesystem paths. os.fspath therefore is also part of this category. If you want to take the result out of this category and make it str, use os.fsdecode (which is *not* part of this category *because* the codomain is str, not Sum[bytes,str]).
Note that Union is still a useful type if we have Sum. For example, the operator division / : Tuple(Union(Float,Int),Union(Float,Int)) -> Union(Float,Int) might take 1.0/1.0 -> 1 (in Common Lisp it does, although in Python it doesn't). So the "sum" requirement that the map respect "original" type is a language design question, not a restriction we should apply for the fun of it.
Now, how would the type checker treat __fspath__? Borrowing a type from Ethan, it needs to observe that there was a bytes object in the main function that got passed to the Antipathy Sum[str,bytes] -> Path constructor. It would then tag that Path p as a bytes-y Path, and further tag the value of os.fspath(p) as bytes. So we would need to add the concept of a ComponentType-y object of Type SumType to the typing module.
That sounds complicated to me, and typically not all that useful: people living at the level of bytes will be swimming in bytes, ditto str-people. Polymorphism "just works" for them. People working at the interface will be using functions like os.fsdecode that ensure type in the nature of things. They have to be more careful, but use of a Union[str, bytes] as an argument to foo(b: bytes) will already be flagged.
I think TRT here is to provide a way to tell the type checker that certain Unions should be converted as soon as possible, and not allow passing them even to polymorphic functions. So:
    def foo(d: DirEntry) -> int:
        s = os.fspath(d)
        with open(s) as f:      # Flag use of s as Union[str, bytes].
                                # open() doesn't care, but the type
                                # checker knows that such Unions
                                # should only be passed to conversion
                                # functions.
            return len(f.read())
while
    def foo(d: DirEntry) -> int:
        s = os.fspath(d)
        s = os.fsdecode(s)      # Converted to str immediately, clear since
                                # fsdecode Type is "SuspiciousUnion -> str".
        with open(s) as f:
            return len(f.read())
is OK. This is on the "catch bugs early" principle, and respects the generally accepted principle that in "most" applications, encoded bytes should be decoded to str "at the boundary". Of course this would be optional (eg, it would be off in a standalone check of os.path for type sanity), and not all Unions are subject to this treatment (eg, it would make no sense for numbers).
On the other hand, *inside* a function that is respectfully polymorphic, the annotation foo(s: Sum[str, bytes]) -> Sum[str, bytes] would tell the typechecker to ensure that in foo's implementation, if bytes go in, bytes come out, if str goes in, str comes out. That might be very useful (but I don't have a strong opinion).
A nit: what would len(s: Sum[str, bytes]) -> Int mean? In this case it's obviously equivalent to len(s: Union[str, bytes]) -> Int, but I'm not sure it's all that obvious in other possible cases.
Finally, in many cases Sum types just are not going to be useful.
You might want to alias Number to Sum[Float, Int] because (in Python) Float + Float -> Float and Int + Int -> Int, but then you have two problems. First, you can't do inference on mixed-type arithmetic, Sum doesn't permit that.[2] Second, you can't do inference on Int / Int.
Footnotes:
[1] The quotes mean "I think I could make this precise but it would be boring. Bear with me, and don't be distracted by the fact that lots of stuff in this 'category' aren't in the os.path *module* -- eg, open() and __fspath__ itself." I hope that the categorical language will be a useful metaphor for those who understand category theory, but really the whole thing is a hand-wavy path to "what implementation would look like".
[2] This is also a handwavy statement. Technically, the arithmetic we're talking about is binary:
    Tuple[Sum, Sum] -> Sum
and it could do anything with mixed types, as Sum can only provide restrictions on unary:
    Sum -> Sum
operation. So a more accurate way to put the point is that it's not really obvious what properties Tuple[Sum, Sum] should have, especially in the context of a function whose value is of the same Sum Type. And of course the problem with Int/Int is real -- it's hard to imagine how that could be handled by a general rule about Tuple[Sum, Sum] -> Sum: surely if both operands are of the same type, the value should be of that type! But we decided otherwise when designing Python 3.
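Stephen's "sum" requirement (bytes in, bytes out; str in, str out) is what a constrained TypeVar like AnyStr expresses statically, e.g. `def parent(p: AnyStr) -> AnyStr`. As a rough runtime approximation, one could wrap polymorphic functions in a checker like the hypothetical `respects_component_type` decorator below. It compares exact types, so str or bytes subclasses would also be flagged; treat it as an illustration of the property rather than anything proposed in the thread.

```python
import functools
import os
from typing import Any, Callable, TypeVar, cast

F = TypeVar("F", bound=Callable[..., Any])


def respects_component_type(func: F) -> F:
    """Hypothetical decorator: the result must have exactly the same type
    as the first argument (str in -> str out, bytes in -> bytes out)."""

    @functools.wraps(func)
    def wrapper(arg: Any, *args: Any, **kwargs: Any) -> Any:
        result = func(arg, *args, **kwargs)
        if type(result) is not type(arg):
            raise TypeError(
                f"{func.__name__} took {type(arg).__name__} "
                f"but returned {type(result).__name__}"
            )
        return result

    return cast(F, wrapper)


@respects_component_type
def parent(p):
    # Stays inside the "sum": os.path.dirname preserves str/bytes.
    return os.path.dirname(p)


@respects_component_type
def leaky_parent(p):
    # Deliberately leaves the "sum" by always decoding to str.
    return os.fsdecode(os.path.dirname(p))


print(parent(b"dir/file"), parent("dir/file"))  # b'dir' dir

try:
    leaky_parent(b"dir/file")
    leak_detected = False
except TypeError:
    leak_detected = True
print(leak_detected)  # True
```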
First of all, thanks for posting, Stephen. I will be responding more thoroughly later; first I'll comment on some points.
On Sat, May 14, 2016 at 8:28 AM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
[...]
I could argue that this is why Guido was right to remove the restriction that os.fspath return str. The __fspath__ method is part of the "sum category os.path"[1], which is a collection of operations on bytes and on str that parallel each other because they are equivalent representations of filesystem paths. os.fspath therefore is also part of this category. If you want to take the result out of this category and make it str, use os.fsdecode (which is *not* part of this category *because* the codomain is str, not Sum[bytes,str]).
Yes, indeed. This was also the reason why, in the python-dev discussions, I have been arguing for __fspath__ to be able to return both str and bytes *from the beginning*, although I have phrased it quite differently and I think in several different ways. And this is also the reason why, more recently, I've been arguing that os.fspath should *always* allow this too. Before that, I was compromising towards desires to enforce str, while making the polymorphism optionally available. I was very happy to learn Guido thought the same, because it meant I could stop arguing for that. If only the path discussion had had a better signal-to-noise ratio, this would perhaps have happened faster and many, many, many man-hours would have been saved. -- Koos
Footnotes: [1] The quotes mean "I think I could make this precise but it would be boring. Bear with me, and don't be distracted by the fact that lots of stuff in this 'category' aren't in the os.path *module* -- eg, open() and __fspath__ itself." I hope that the categorical language will be a useful metaphor for those who understand category theory, but really the whole thing is a hand-wavy path to "what implementation would look like".
I'm confused by the terminology in this discussion. PEP 484 has type variables with constraints, and the typing module predefines
    AnyStr = TypeVar('AnyStr', str, bytes)
This is then used to define various polymorphic functions, e.g. in os.path:
    def abspath(path: AnyStr) -> AnyStr: ...
This means that the type checker will know that the type of abspath(b'.') is bytes, the type of abspath('.') is str. There's also e.g.
    def relpath(path: AnyStr, start: AnyStr = ...) -> AnyStr: ...
which means that path and start must be either both bytes or both str. For reference, this code lives in typeshed: https://github.com/python/typeshed/blob/master/stdlib/3/os/path.pyi
Is that at all useful?
On Fri, May 13, 2016 at 10:28 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Executive summary:
Inference of str vs bytes (in particular, which component of a "sum" type in general) is not always possible, and it's not obvious how beneficial it is. Proposal: have typecheckers recognize "suspicious" unions, and complain if they are not "converted" to a component type before passing to a polymorphic function.
Koos Zevenhoven writes:
We'll need to address this, unless we want the type checker to not know whether os.path.* etc. return str or bytes and to carry around Union[str, bytes]. In theory, it would be possible to infer whether it is str or bytes, as described.
I'm -0.5 = "more trouble than it's worth". Rationale:
*Sometimes* it would be possible. OTOH, modules like os.path *will* end up carrying around Union[str, bytes], precisely because they will be called in both contexts, and that's not predictable at the time of type-checking os.path.
So what you really want is a Sum type constructor, where Sum is in the sense of category theory's sum functor. That is, not only is the result Type a (disjoint) union, but you're also guaranteed that each operation respects the component type.
I could argue that this is why Guido was right to remove the restriction that os.fspath return str. The __fspath__ method is part of the "sum category os.path"[1], which is a collection of operations on bytes and on str that parallel each other because they are equivalent representations of filesystem paths. os.fspath therefore is also part of this category. If you want to take the result out of this category and make it str, use os.fsdecode (which is *not* part of this category *because* the codomain is str, not Sum[bytes,str]).
Note that Union is still a useful type if we have Sum. For example, the operator division / : Tuple(Union(Float,Int),Union(Float,Int)) -> Union(Float,Int) might take 1.0/1.0 -> 1 (in Common Lisp it does, although in Python it doesn't). So the "sum" requirement that the map respect "original" type is a language design question, not a restriction we should apply for the fun of it.
Now, how would the type checker treat __fspath__? Borrowing a type from Ethan, it needs to observe that there was a bytes object in the main function that got passed to the Antipathy Sum[str,bytes] -> Path constructor. It would then tag that Path p as a bytes-y Path, and further tag the value of os.fspath(p) as bytes. So we would need to add the concept of a ComponentType-y object of Type SumType to the typing module. That sounds complicated to me, and typically not all that useful: people living at the level of bytes will be swimming in bytes, ditto str-people. Polymorphism "just works" for them. People working at the interface will be using functions like os.fsdecode that ensure type in the nature of things. They have to be more careful, but use of a Union[str, bytes] as an argument to foo(b: bytes) will already be flagged.
I think TRT here is to provide a way to tell the type checker that certain Unions should be converted as soon as possible, and not allow passing them even to polymorphic functions. So:
    def foo(d: DirEntry) -> int:
        s = os.fspath(d)
        with open(s) as f:  # Flag use of s as Union[str, bytes].
                            # open() doesn't care, but the type
                            # checker knows that such Unions
                            # should only be passed to conversion
                            # functions.
            return len(f.read())
while
    def foo(d: DirEntry) -> int:
        s = os.fspath(d)
        s = os.fsdecode(s)  # Converted to str immediately, clear since
                            # fsdecode Type is "SuspiciousUnion -> str".
        with open(s) as f:
            return len(f.read())
is OK. This is on the "catch bugs early" principle, and respects the generally accepted principle that in "most" applications, encoded bytes should be decoded to str "at the boundary". Of course this would be optional (eg, it would be off in a standalone check of os.path for type sanity), and not all Unions are subject to this treatment (eg, it would make no sense for numbers).
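The "decode at the boundary" style can be sketched as follows, assuming Python 3.6+. The helper name to_text_path is hypothetical. os.fspath yields str or bytes, and os.fsdecode always returns str, so the value leaving the helper no longer carries the str/bytes union.

```python
import os
from pathlib import PurePosixPath
from typing import Union


def to_text_path(p: Union[str, bytes, os.PathLike]) -> str:
    # os.fspath() yields str or bytes; os.fsdecode() collapses that to str
    # right at the boundary, so later code only ever sees str.
    return os.fsdecode(os.fspath(p))


print(to_text_path(b"data/in.txt"))
print(to_text_path(PurePosixPath("data/in.txt")))
```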
On the other hand, *inside* a function that is respectfully polymorphic, the annotation foo(s: Sum[str, bytes]) -> Sum[str, bytes] would tell the typechecker to ensure that in foo's implementation, if bytes go in, bytes come out, if str goes in, str comes out. That might be very useful (but I don't have a strong opinion).
A nit: what would len(s: Sum[str, bytes]) -> Int mean? In this case it's obviously equivalent to len(s: Union[str, bytes]) -> Int, but I'm not sure it's all that obvious in other possible cases.
Finally, in many cases Sum types just are not going to be useful. You might want to alias Number to Sum[Float, Int] because (in Python) Float + Float -> Float and Int + Int -> Int, but then you have two problems. First, you can't do inference on mixed-type arithmetic, Sum doesn't permit that.[2] Second, you can't do inference on Int / Int.
Footnotes: [1] The quotes mean "I think I could make this precise but it would be boring. Bear with me, and don't be distracted by the fact that lots of stuff in this 'category' aren't in the os.path *module* -- eg, open() and __fspath__ itself." I hope that the categorical language will be a useful metaphor for those who understand category theory, but really the whole thing is a hand-wavy path to "what implementation would look like".
[2] This is also a handwavy statement. Technically, the arithmetic we're talking about is binary, Tuple[Sum, Sum] -> Sum, and it could do anything with mixed types, as Sum can only place restrictions on unary Sum -> Sum operations. So a more accurate way to put the point is that it's not really obvious what properties Tuple[Sum, Sum] should have, especially in the context of a function whose value is of the same Sum Type. And of course the problem with Int/Int is real -- it's hard to imagine how that could be handled by a general rule about Tuple[Sum, Sum] -> Sum: surely if both operands are of the same type, the value should be of that type! But we decided otherwise when designing Python 3.
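For comparison, PEP 484 can already express "same type in, same type out" for numbers with a constrained TypeVar, while true division shows why no such rule covers "/" in Python 3. This is a small sketch. The names Num and add are hypothetical.

```python
from typing import TypeVar

# Constrained type variable: each call binds Num to int or to float.
Num = TypeVar("Num", int, float)


def add(a: Num, b: Num) -> Num:
    # int + int -> int and float + float -> float, so the signature holds.
    return a + b


print(type(add(1, 2)).__name__)      # int
print(type(add(1.0, 2.0)).__name__)  # float
# No analogous signature fits "/": in Python 3, int / int is a float.
print(type(1 / 1).__name__)          # float
```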
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
-- --Guido van Rossum (python.org/~guido)
Sorry Stephen, I feel like I've fallen down the rabbit hole here... your posts are usually a paragon of clarity but this one is just beyond me :( On Sat, May 14, 2016 at 02:28:51PM +0900, Stephen J. Turnbull wrote:
Executive summary:
Inference of str vs bytes (in particular, which component of a "sum" type in general) is not always possible, and it's not obvious how beneficial it is. Proposal: have typecheckers recognize "suspicious" unions, and complain if they are not "converted" to a component type before passing to a polymorphic function. [...] So what you really want is a Sum type constructor, where Sum is in the sense of category theory's sum functor. That is, not only is the result Type a (disjoint) union, but you're also guaranteed that each operation respects the component type.
Are we supposed to know what category theory's sum functor is? I'm warning you, if you start talking about Monads I'm just going to hit delete on your post :-) I'm afraid I don't understand what you mean by a Sum type constructor. Can you explain in more detail? I also don't understand what you mean by '"suspicious" unions' (quotes in original) or what makes them suspicious, or "suspicious" as the case may be. What's suspicious about Union[bytes, str]? Are you proposing a Sum constructor for the typing module, as an alternative to Union? What will it do? If not, why are you talking about Sum[bytes, str]? -- Steve
Steven D'Aprano writes:
Are we supposed to know what category theory's sum functor is?
No. There are Pythonistas who do know, that was for their benefit. But to clarify what I mean, I'll quote Guido's example (aside to Guido: yes, that helped!):
PEP 484 has type variables with constraints, and the typing module predefines AnyStr = TypeVar('AnyStr', str, bytes). This is then used to define various polymorphic functions, e.g. in os.path:
def abspath(path: AnyStr) -> AnyStr: ...
This means that the type checker will know that the type of abspath(b'.') is bytes, the type of abspath('.') is str.
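Constrained type variables are not limited to the predefined AnyStr. The same mechanism lets you declare your own, as in the pathstring example that opened this thread. This is a sketch assuming the standard typing module. The function name parent_name is made up for illustration.

```python
import os.path
from typing import TypeVar

# Equivalent to typing.AnyStr: the variable may be bound to str or to bytes,
# but to the same one everywhere in a given call's signature.
pathstring = TypeVar("pathstring", str, bytes)


def parent_name(p: pathstring) -> pathstring:
    # A checker infers parent_name("/a/b/c") as str and parent_name(b"/a/b/c") as bytes.
    return os.path.basename(os.path.dirname(p))


print(parent_name("/a/b/c"))   # b
print(parent_name(b"/a/b/c"))  # b'b'
```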
My bad here. This has the effect of what I called a Sum, I just didn't recognize it as that when I last read PEP 484.
I also don't understand what you mean by '"suspicious" unions' (quotes in original) or what makes them suspicious, or "suspicious" as the case may be. What's suspicious about Union[bytes, str]?
Union types don't allow you to trace the source of an unexpected type backward (or forward); thus, all instances of a Union type must be considered potential sources of the "wrong" type, even if you can infer some of the types of source variables. Compare Number, another union type: consider trying to figure out where an "unexpected" float came from in a function performing a long sequence of arithmetic operations including many divisions. You can't do it knowing only the types of the arguments to the function; you need to know the values as well.

But I shouldn't have taken the OP's Union[bytes,str] seriously. In fact the os.path functions and other such functions are defined with AnyStr, which is a *variable* type, not a Union type. In typeshed, we write

    os.path.basename(path: AnyStr) -> AnyStr: ...

I.e., AnyStr can be either str or bytes, but it must be the same in all places it appears in the annotated signature. This allows us to reason forward from an appearance of bytes or str to determine the type of any composition of os or os.path AnyStr->AnyStr functions. E.g., in

    def foo(path: str) -> str:
        return os.path.basename(os.path.dirname(os.path.realpath(path)))

the composition is str->AnyStr->AnyStr->AnyStr, which resolves to str->str->str->str, and so foo() typechecks successfully.

Back to typing and the __fspath__ PEP. The remaining question dealing with typing of __fspath__ as currently specified is determining the subtype of DirEntry you're looking at when its __fspath__ gets invoked. (A similar problem remains for the name attribute of file objects, but they aren't scheduled to get a __fspath__.) But AFAICT there is no way to do that yet (I can't find DirEntry in the typing module). Maybe I can contribute to resolving these issues.
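Because AnyStr must bind to a single type per call, it also catches str/bytes mixing that a plain Union[str, bytes] signature would let through. This is a sketch. join_under is a hypothetical helper.

```python
import os.path
from typing import AnyStr


def join_under(base: AnyStr, name: AnyStr) -> AnyStr:
    # Both parameters share one AnyStr binding, so a checker rejects
    # join_under("dir", b"file"); at runtime os.path.join raises TypeError too.
    return os.path.join(base, name)


print(join_under("dir", "file"))
print(join_under(b"dir", b"file"))
try:
    join_under("dir", b"file")  # a static checker would flag this call
except TypeError as exc:
    print("mixed str/bytes rejected:", type(exc).__name__)
```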
Are you proposing a Sum constructor for the typing module, as an alternative to Union? What will it do?
Not any more. It would have done basically what constrained TypeVars like AnyStr do, but more obscurely, and it would have required much more effort and notation to work with functions with multiple arguments of different types and the like.
If not, why are you talking about Sum[bytes, str]?
Because I didn't know better then. :-)
On Sun, May 15, 2016 at 8:20 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
But I shouldn't have taken the OP's Union[bytes,str] seriously. In fact the os.path functions and other such functions are defined with AnyStr, which is a *variable* type, not a Union type. In typeshed, we write
If you mean my OP in the other thread, I was indeed just introducing the readers (potentially not familiar with PEP 484, mypy etc.) to the basic concepts such as Unions and TypeVars while going towards the actual proposal at the very end of the post, which was to avoid having to go from TypeVars "back to" Unions in os.fspath and os.path functions. -- Koos
On Sat, May 14, 2016 at 8:28 AM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Executive summary:
Inference of str vs bytes (in particular, which component of a "sum" type in general) is not always possible, and it's not obvious how beneficial it is. Proposal: have typecheckers recognize "suspicious" unions, and complain if they are not "converted" to a component type before passing to a polymorphic function.
Maybe you mean to propose that the concept of sum types should be introduced in typing/mypy. I don't think we want that, because annotating types with them would probably be clumsy, and there is too much overlap with things that are already available (unless of course some of that gets thrown away in favor of Sum).
Koos Zevenhoven writes:
We'll need to address this, unless we want the type checker to not know whether os.path.* etc. return str or bytes and to carry around Union[str, bytes]. In theory, it would be possible to infer whether it is str or bytes, as described.
I'm -0.5 = "more trouble than it's worth". Rationale:
I could imagine that being the conclusion at this point. I just feel bad for reducing the precision in the "type-hintability" of things like os.path.* as a consequence of the new fspath PEP, which I'm otherwise already quite happy with. That's the problem I'm trying to solve. I'm perfectly fine with it if the conclusion is that it's not worth it. I just don't want to say I broke the type hinting for os.path.* for people that want mypy to catch their errors of mixing bytes and str paths in the wrong way, or errors of mixing str/bytes-polymorphic path code with str concatenation etc. in the wrong way. But if I'm told Unions are sufficient, I'll be happy with that, because I'm not a significant user of type hints anyway. So please, someone tell me that. In my OP in the original thread, I describe how the os.path functions could presently be hinted (I guess someone may have already done this, but I did not check). In the future, it may be preferable to tell the type checker that passing a PurePath into os.path functions results in str being returned, as I also explain using a parametrizable type.
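One way to write the parametrizable type with today's typing module is to parametrize os.PathLike by AnyStr, so that os.PathLike[AnyStr] plays the role of the proposed FSPath[pathstring]. This is a sketch under stated assumptions. It relies on os.PathLike being generic for the type checker, as it is in typeshed. The annotation is quoted because subscripting os.PathLike at runtime needs Python 3.9+. dirname_any is a hypothetical name.

```python
import os
from pathlib import PurePosixPath
from typing import AnyStr, Union


def dirname_any(p: Union[AnyStr, "os.PathLike[AnyStr]"]) -> AnyStr:
    # A PurePath (underlying type str) yields str; a bytes path or a
    # bytes-based path-like yields bytes.
    return os.path.dirname(os.fspath(p))


print(dirname_any(PurePosixPath("a/b/c")))
print(dirname_any(b"a/b/c"))
```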
*Sometimes* it would be possible. OTOH, modules like os.path *will* end up carrying around Union[str, bytes], precisely because they will be called in both contexts, and that's not predictable at the time of type-checking os.path.
My proposal is aimed at making it possible to *always* have mypy know the types.
So what you really want is a Sum type constructor, where Sum is in the sense of category theory's sum functor. That is, not only is the result Type a (disjoint) union, but you're also guaranteed that each operation respects the component type.
(Side note: I don't know why you capitalize T in "Type" here; it does not seem to make sense in the context that Guido has just been describing as Type. Is this confusion an argument against calling Type "Type"?) My point was to parametrize the type (hint) for path objects [pretty much exactly (like) a generic], which roughly means that the type checker can be told how one disjoint union as the argument type gets turned into another disjoint union as the return type. I suppose one might write this in English as "In a typical os.path.* function such as os.path.dirname, Union[PathABC[str], str] and Union[PathABC[bytes], bytes] get turned into str and bytes, *respectively*." Mypy/PEP483-4 approaches this problem by using Generics and TypeVars. Parametrizing the path type by the underlying type such as str in PathABC[str] would allow using a TypeVar to indicate "respectively". However, there is also @overload to indicate this.
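The @overload route can be sketched like this, using os.PathLike as a stand-in for the hypothetical PathABC. basename_any is an illustrative name. The quoted annotations avoid subscripting os.PathLike at runtime, which needs Python 3.9+.

```python
import os
from pathlib import PurePosixPath
from typing import Union, overload


@overload
def basename_any(p: Union[str, "os.PathLike[str]"]) -> str: ...


@overload
def basename_any(p: Union[bytes, "os.PathLike[bytes]"]) -> bytes: ...


def basename_any(p):
    # Single runtime implementation; the overloads above only tell the checker
    # that str-based inputs give str and bytes-based inputs give bytes, respectively.
    return os.path.basename(os.fspath(p))


print(basename_any(PurePosixPath("a/b/c")))  # c
print(basename_any(b"a/b/c"))                # b'c'
```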
I could argue that this is why Guido was right to remove the restriction that os.fspath return str. The __fspath__ method is part of the "sum category os.path"[1], which is a collection of operations on bytes and on str that parallel each other because they are equivalent representations of filesystem paths. os.fspath therefore is also part of this category. If you want to take the result out of this category and make it str, use os.fsdecode (which is *not* part of this category *because* the codomain is str, not Sum[bytes,str]).
Indeed, as mentioned in my previous post, you seem to be describing exactly my earlier arguments, but using a different (or additional) set of terminology.
Note that Union is still a useful type if we have Sum. For example, the operator division / : Tuple(Union(Float,Int),Union(Float,Int)) -> Union(Float,Int) might take 1.0/1.0 -> 1 (in Common Lisp it does, although in Python it doesn't). So the "sum" requirement that the map respect "original" type is a language design question, not a restriction we should apply for the fun of it.
Yes, Unions will always be needed.
Now, how would the type checker treat __fspath__? Borrowing a type from Ethan, it needs to observe that there was a bytes object in the main function that got passed to the Antipathy Sum[str,bytes] -> Path constructor. It would then tag that Path p as a bytes-y Path, and further tag the value of os.fspath(p) as bytes. So we would need to add the concept of a ComponentType-y object of Type SumType to the typing module. That sounds complicated to me, and typically not all that useful: people living at the level of bytes will be swimming in bytes, ditto str-people. Polymorphism "just works" for them. People working at the interface will be using functions like os.fsdecode that ensure type in the nature of things. They have to be more careful, but use of a Union[str, bytes] as an argument to foo(b: bytes) will already be flagged.
Indeed, as mentioned, there are already approaches for this.
I think TRT here is to provide a way to tell the type checker that certain Unions should be converted as soon as possible, and not allow passing them even to polymorphic functions. So:
    def foo(d: DirEntry) -> int:
        s = os.fspath(d)
        with open(s) as f:  # Flag use of s as Union[str, bytes].
                            # open() doesn't care, but the type
                            # checker knows that such Unions
                            # should only be passed to conversion
                            # functions.
            return len(f.read())
while
    def foo(d: DirEntry) -> int:
        s = os.fspath(d)
        s = os.fsdecode(s)  # Converted to str immediately, clear since
                            # fsdecode Type is "SuspiciousUnion -> str".
        with open(s) as f:
            return len(f.read())
There is nothing suspicious about that Union there. "Union[str, bytes] -> str" is a perfectly good type hint for os.fsdecode. You can't get any more precise than that.

Some nitpicking: your function foo should probably be hinted to work with any path, not just DirEntry; besides, there seems to be no right way to currently import DirEntry, so you won't actually be able to use that as a static type hint.

I'm not sure this example has anything to do with what you (or I) are proposing, except that os.fsdecode and os.fsencode indeed make sure that the output type is no longer a Union. So maybe you are just calling Unions in general suspicious, because without a context you can't know exactly which runtime type it should be. Luckily our PEP will turn your example into:

    def foo(d: TheTypeHintForAnyPath) -> int:
        with open(d) as f:
            return len(f.read())
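With the PEP in place, open() accepts path-like objects directly, so no conversion is needed before opening. This is a sketch assuming Python 3.6+. AnyPath is a hypothetical alias standing in for TheTypeHintForAnyPath and is not a name from the typing module. file_length is also made up.

```python
import os
import pathlib
import tempfile
from typing import Union

# Hypothetical alias standing in for "TheTypeHintForAnyPath".
AnyPath = Union[str, bytes, os.PathLike]


def file_length(d: AnyPath) -> int:
    # open() accepts str, bytes and os.PathLike objects directly (Python 3.6+),
    # so no os.fspath()/os.fsdecode() call is needed here.
    with open(d, "rb") as f:
        return len(f.read())


if __name__ == "__main__":
    fd, name = tempfile.mkstemp()
    os.write(fd, b"hello")
    os.close(fd)
    try:
        print(file_length(pathlib.Path(name)))  # 5
        print(file_length(name))                # 5
    finally:
        os.remove(name)
```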
is OK. This is on the "catch bugs early" principle, and respects the generally accepted principle that in "most" applications, encoded bytes should be decoded to str "at the boundary". Of course this would be optional (eg, it would be off in a standalone check of os.path for type sanity), and not all Unions are subject to this treatment (eg, it would make no sense for numbers).
One definitely should not use os.fsdecode or os.fsencode just to help the type checker. I hope that is not what you meant.
On the other hand, *inside* a function that is respectfully polymorphic, the annotation foo(s: Sum[str, bytes]) -> Sum[str, bytes] would tell the typechecker to ensure that in foo's implementation, if bytes go in, bytes come out, if str goes in, str comes out. That might be very useful (but I don't have a strong opinion).
Indeed, this is almost the same as TypeVar or an overload.
A nit: what would len(s: Sum[str, bytes]) -> Int mean? In this case it's obviously equivalent to len(s: Union[str, bytes]) -> Int, but I'm not sure it's all that obvious in other possible cases.
Also here. - Koos
Finally, in many cases Sum types just are not going to be useful. You might want to alias Number to Sum[Float, Int] because (in Python) Float + Float -> Float and Int + Int -> Int, but then you have two problems. First, you can't do inference on mixed-type arithmetic, Sum doesn't permit that.[2] Second, you can't do inference on Int / Int.
Footnotes: [1] The quotes mean "I think I could make this precise but it would be boring. Bear with me, and don't be distracted by the fact that lots of stuff in this 'category' aren't in the os.path *module* -- eg, open() and __fspath__ itself." I hope that the categorical language will be a useful metaphor for those who understand category theory, but really the whole thing is a hand-wavy path to "what implementation would look like".
[2] This is also a handwavy statement. Technically, the arithmetic we're talking about is binary, Tuple[Sum, Sum] -> Sum, and it could do anything with mixed types, as Sum can only place restrictions on unary Sum -> Sum operations. So a more accurate way to put the point is that it's not really obvious what properties Tuple[Sum, Sum] should have, especially in the context of a function whose value is of the same Sum Type. And of course the problem with Int/Int is real -- it's hard to imagine how that could be handled by a general rule about Tuple[Sum, Sum] -> Sum: surely if both operands are of the same type, the value should be of that type! But we decided otherwise when designing Python 3.
Koos Zevenhoven writes:
I just don't want to say I broke the type hinting for os.path.*
Blame it on Brett. :-)
But if I'm told Unions are sufficient,
Unions aren't. TypeVars are (and they're basically what you proposed as a solution, I just didn't recognize them in the context of "Union").
There is nothing suspicious about that Union there. "Union[str, bytes] -> str" is a perfectly good type hint for os.fsdecode. You can't get any more precise than that.
Unions are suspicious as *values*, not as arguments. Fortunately, AnyStr is not a Union.
Koos Zevenhoven wrote:
Maybe you mean to propose that the concept of sum types should be introduced in typing/mypy.
I'm not deeply into category theory, but the proposal seems to be that Sum would be a special kind of type that's assumed to be the same type wherever it appears in the signature of a function. That might work for the special case of a function of one parameter that returns the same type as its argument, but that's about all. As has been pointed out, type parameters allow the same thing to be expressed, and a lot more besides, so a Sum type is not needed. We already have everything we need. -- Greg
There seems to be a movement (inspired by Haskell no doubt) that argues that sum types are the next fashion in language design. E.g. Googling for "sum types python" I found https://chadaustin.me/2015/07/sum-types/ (I should look up the author, he works a floor away from me :-). If you read that carefully, a sum type is mostly syntactic sugar for a tagged union. They're easy enough to add to compiled languages.

I find it interesting that Austin mentions several times that the "Maybe" type is equivalent to the convention of returning NULL/null/None, and he uses it as a simple example (the simplest, in fact) of a sum type. So in PEP 484 that's called Optional[X], and indeed Optional[X] is syntactic sugar for Union[X, None], and of course unions in Python are always tagged (since you can always introspect the object type). So Python's equivalent for Sum types seems to be PEP 484's Union types. For matching, a sequence of isinstance() checks doesn't look too shabby:

    def add1(a: Union[int, str, bytes]) -> Union[int, str, bytes]:
        if isinstance(a, int):
            # In this block, the type of a is int
            return a + 1
        if isinstance(a, str):
            # Here a is a str
            return a + 'x'
        # In this block, the type of a is bytes (because it can't be anything else!)
        return a + b'x'

Of course in this specific example, using a constrained type variable (similar to AnyStr) would be much better, since it declares that the return type in this case is the same as the argument type. However that's not what Sum types do in Haskell either, so we can't really blame them for this -- in Haskell as in PEP 484, Sum types and generics are pretty much orthogonal concepts. I think Sum/Union shine when they are used either for the argument (so the function must have a list of cases) or for the return value (so the caller must have a list of cases), but not for both.

That's not to say that it wouldn't be an interesting exercise to see if we can add a nice pattern matching syntax to Python.
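The constrained-type-variable alternative mentioned for add1 might look like this. It is a sketch, and the TypeVar name T is arbitrary. The runtime body is unchanged, but the signature now promises that each call returns the same type it was given. A checker such as mypy is expected to check the body once per constraint, with the isinstance() branches narrowing accordingly.

```python
from typing import TypeVar

# Constrained type variable, in the spirit of AnyStr but also covering int.
T = TypeVar("T", int, str, bytes)


def add1(a: T) -> T:
    # Same cases as the Union version; only the signature differs.
    if isinstance(a, int):
        return a + 1
    if isinstance(a, str):
        return a + "x"
    return a + b"x"


print(add1(41))    # 42
print(add1("a"))   # ax
print(add1(b"a"))  # b'ax'
```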
Austin claims that one of the advantages of Sum types in Haskell is that it supports a recursive matching syntax. It would probably look different than it looks in compiled languages though, because a Python compiler doesn't know much about the types occurring at runtime, so e.g. it couldn't tell you when you're missing a case. However, a type checker could -- this might be the first new Python feature that was in some sense enabled by PEP 484. (And here I thought I was off-topic for the thread, but this thread was actually started specifically to discuss Sum types, so now I feel better about this post.) PS. I recall vividly the classes in category theory I took in college. They were formative in my education -- I knew for sure then that I would never be a mathematician. On Sun, May 15, 2016 at 4:37 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Koos Zevenhoven wrote:
Maybe you mean to propose that the concept of sum types should be introduced in typing/mypy.
I'm not deeply into category theory, but the proposal seems to be that Sum would be a special kind of type that's assumed to be the same type wherever it appears in the signature of a function.
That might work for the special case of a function of one parameter that returns the same type as its argument, but that's about all.
As has been pointed out, type parameters allow the same thing to be expressed, and a lot more besides, so a Sum type is not needed. We already have everything we need.
-- Greg
-- --Guido van Rossum (python.org/~guido)
On Mon, May 16, 2016 at 4:57 AM, Guido van Rossum <guido@python.org> wrote: [...]
unions in Python are always tagged (since you can always introspect the object type).
I suppose that is true at runtime, at least if the pairwise intersections of the "arguments" of the union are empty as they usually are. But for a static type checker, the "tag" for the Union may be missing (which is the thing I was worried about). [...]
in Haskell as in PEP 484, Sum types and generics are pretty much orthogonal concepts.
Although the parametrization provided by generics makes TypeVars more powerful for annotating functions that deal with the equivalent of sum types. -- Koos
On Mon, May 16, 2016 at 7:45 AM, Koos Zevenhoven <k7hoven@gmail.com> wrote:
On Mon, May 16, 2016 at 4:57 AM, Guido van Rossum <guido@python.org> wrote: [...]
unions in Python are always tagged (since you can always introspect the object type).
I suppose that is true at runtime, at least if the pairwise intersections of the "arguments" of the union are empty as they usually are. But for a static type checker, the "tag" for the Union may be missing (which is the thing I was worried about).
Have you thought about how a type checker works? *Of course* the tags are "missing" for it. When it sees that an argument (e.g.) is a union, it has to type check the following code with the assumption that it can be any of those types. (However, certain flow control using isinstance() can refine this knowledge, like the "if isinstance(...)" example I've given before; also "assert isinstance(...)" has a similar effect.)

When a type checker like mypy has a variable that it knows to be a str, and this variable is passed to a function whose signature is (Union[str, bytes]) -> Union[str, bytes], then the result has to be assumed to be the given union, because such a signature does not guarantee that the return type matches the input type. (E.g. if I had a function that turned bytes into str and str into bytes, this could be its signature!) For this reason, the type variable AnyStr exists, and the mechanism of constrained TypeVars that makes AnyStr possible -- you can make your own type variables with such behavior too.
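The str/bytes-swapping function described above can be written out to show why a Union -> Union signature loses the link between input and output types. This is a sketch. swap and identity are illustrative names.

```python
import os
from typing import AnyStr, Union


def swap(p: Union[str, bytes]) -> Union[str, bytes]:
    # Satisfies its Union signature, yet bytes come out for str and vice versa,
    # so a checker cannot assume the result has the argument's type.
    if isinstance(p, str):
        return os.fsencode(p)
    return os.fsdecode(p)


def identity(p: AnyStr) -> AnyStr:
    # With the constrained TypeVar, a checker infers identity("x") as str.
    return p


print(swap("a"))      # b'a'
print(swap(b"a"))     # a
print(identity("a"))  # a
```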
[...]
in Haskell as in PEP 484, Sum types and generics are pretty much orthogonal concepts.
Although the parametrization provided by generics makes TypeVars more powerful for annotating functions that deal with the equivalent of sum types.
Yeah, it's not uncommon for two orthogonal features to combine into something even more powerful like that. It's like Hydrogen plus Oxygen... :-) -- --Guido van Rossum (python.org/~guido)
Guido van Rossum wrote:
On Mon, May 16, 2016 at 7:45 AM, Koos Zevenhoven <k7hoven@gmail.com> wrote:
On Mon, May 16, 2016 at 4:57 AM, Guido van Rossum <guido@python.org> wrote: [...]
unions in Python are always tagged (since you can always introspect the object type).
The "Sum" types talked about in the referenced article are what Haskell calls "algebraic types". They're not really the same as the Union[X,Y] types we're talking about here, because a Union type simply tells the type checker that one of a number of different types could be present at run time. The code might introspect on the type, but it doesn't have to do anything special to access one of the branches of the union -- it just goes ahead and uses the value. An algebraic type, on the other hand, is a new type of run-time object that has to be explicitly unpacked to access its contents. It's more like a class in that respect. -- Greg
Guido van Rossum wrote:
... https://chadaustin.me/2015/07/sum-types/ ... unions in Python are always tagged (since you can always introspect the object type).
Greg Ewing replied:
The "Sum" types talked about in the referenced article are what Haskell calls "algebraic types". They're not really the same as the Union[X,Y] types we're talking about here, because a Union type simply tells the type checker that one of a number of different types could be present at run time. The code might introspect on the type, but it doesn't have to do anything special to access one of the branches of the union -- it just goes ahead and uses the value.
An algebraic type, on the other hand, is a new type of run-time object that has to be explicitly unpacked to access its contents. It's more like a class in that respect.
I know. But this could be considered syntactic sugar. And the article insists that Sum types are not really the same as classes either (they're also a form of syntactic sugar). Anyway, I did set up a chat with Chad (that article's author) and maybe we'll all get some more clarity. -- --Guido van Rossum (python.org/~guido)
Guido van Rossum wrote:
I know. But this could be considered syntactic sugar.
I don't think it is just syntactic sugar. There is a real difference between having a box containing either an apple or a banana, and having just an apple or a banana. -- Greg
Hmm... In Java there is a form of syntactic sugar that automatically deals with such boxes called auto-(un)boxing, IIUC. So I still think it can be called syntactic sugar. On Monday, May 16, 2016, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Guido van Rossum wrote:
I know. But this could be considered syntactic sugar.
I don't think it is just syntactic sugar. There is a real difference between having a box containing either an apple or a banana, and having just an apple or a banana.
-- Greg
-- --Guido (mobile)
Guido van Rossum wrote:
Hmm... In Java there is a form of syntactic sugar that automatically deals with such boxes called auto-(un)boxing, IIUC. So I still think it can be called syntactic sugar.
That's not the same thing either. Boxing in Java is a hack to make up for the fact that some types are not objects, and the auto boxing and unboxing is there so that you can forget about the boxes and pretend that e.g. int and Integer are the same type (at least for some purposes).

But with algebraic types, the boxes are entities in their own right whose presence conveys information. You can have more than one kind of box with the same contents:

    data Box = Matchbox Int | Shoebox Int

Not only is a Matchbox distinguishable from a Shoebox at run time, but Box is a distinct type from Int -- you can't pass an Int directly to something expecting a Box or vice versa.

A realisation of algebraic types in Python (or any other language, for that matter) would require carrying information at run time about which kind of box is present. This is in contrast to a union type, which is purely a compile-time concept and has no run-time implications. -- Greg
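A rough Python approximation of the Box declaration uses one small class per kind of box, with a Union over them. This is a sketch assuming Python 3.7+ dataclasses. It emulates the tagged boxes with ordinary classes and was not proposed in the thread. describe is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Matchbox:
    contents: int


@dataclass(frozen=True)
class Shoebox:
    contents: int


# Rough analogue of the Haskell declaration: data Box = Matchbox Int | Shoebox Int
Box = Union[Matchbox, Shoebox]


def describe(box: Box) -> str:
    # The kind of box is run-time information that must be inspected before
    # the contents are used; a checker would reject describe(3), since an int is not a Box.
    if isinstance(box, Matchbox):
        return f"matchbox holding {box.contents}"
    return f"shoebox holding {box.contents}"


print(describe(Matchbox(3)))      # matchbox holding 3
print(Matchbox(3) == Shoebox(3))  # False: same contents, different boxes
```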
On Tue, May 17, 2016 at 12:38 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote: [...]
A realisation of algebraic types in Python (or any other language, for that matter) would require carrying information at run time about which kind of box is present. This is in contrast to a union type, which is purely a compile-time concept and has no run-time implications.
The point about tagged unions was that Python *always* has the type information available at runtime. This type information corresponds to the "tag" in "tagged union". For a tagged union at runtime, a language does not need to carry around the information of what kind of box you have, but what kind of beast you have in that box. Unless of course you want a different kind of box type corresponding to each type, which would just be stupid. (Maybe there would be a separate box type for wrapping each of the boxes too ;-) Indeed I, on the other hand, was referring to the ambiguity of compile-time or static-type-checking unions, where the "tag" of a Union can only be known when a non-union/unambiguous type is explicitly passed to something that expects a union. But if the type hints do not "match the tags" [1], then the ambiguity of ("untagged") unions can "spread" in the static type-checking phase, while the runtime types are of course always well-defined. If a language simulates tagged unions by putting things in a box as you describe, and then comes up with a way to automatically release the bird out of a "Box" at runtime when you try to make it .quack(), then they would seem to be reinventing duck typing. - Koos [1] The way TypeVars or @overloads do, or my hypothetical TagMatcher sum example.
The difference between a box with one apple in it and a single apple is critical. In fact, I think that is the source of the most common type issue in real Python code: You can't tell the difference (without type checking) between a sequence of strings and a string. Because, of course, a string IS a sequence of strings. In this case, there is no character type, i.e. no type to represent a single apple. (Kind of an infinite Russian doll of Apple boxes...) Type hinting will help address the issue for strings, but it seems a very useful distinction for all types. -CHB
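A minimal sketch of the string-vs-sequence-of-strings pitfall, with a hypothetical `shout_all` function. Because a bare `str` itself satisfies `Iterable[str]`, a static checker accepts `shout_all("ab")`, and without a guard the call would silently return `['A', 'B']`; the explicit runtime check below is one way to catch it.

```python
from typing import Iterable, List


def shout_all(words: Iterable[str]) -> List[str]:
    # A str is itself an Iterable[str] (of one-character strings),
    # so the annotation alone cannot reject it; check at run time.
    if isinstance(words, str):
        raise TypeError("expected an iterable of strings, not a single string")
    return [w.upper() for w in words]
```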
On Tue, May 17, 2016 at 2:38 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Guido van Rossum wrote:
Hmm... In Java there is a form of syntactic sugar that automatically deals with such boxes called auto-(un)boxing, IIUC. So I still think it can be called syntactic sugar.
That's not the same thing either. Boxing in Java is a hack to make up for the fact that some types are not objects, and the auto boxing and unboxing is there so that you can forget about the boxes and pretend that e.g. int and Integer are the same type (at least for some purposes).
But with algebraic types, the boxes are entities in their own right whose presence conveys information. You can have more than one kind of box with the same contents:
data Box = Matchbox Int | Shoebox Int
Not only is a Matchbox distinguishable from a Shoebox at run time, but Box is a distinct type from Int -- you can't pass an Int directly to something expecting a Box or vice versa.
A realisation of algebraic types in Python (or any other language, for that matter) would require carrying information at run time about which kind of box is present. This is in contrast to a union type, which is purely a compile-time concept and has no run-time implications.
I'm sorry, I wasn't trying to claim that Java's auto-(un)boxing was anything like ADTs; I know better. I was just quipping that just because there's a difference between a box containing a piece of fruit and the piece of fruit itself, that doesn't mean handling the box can't be considered syntactic sugar -- your original remark claimed something wasn't syntactic sugar because of the difference between the box and its contents, and that's what I disagree with. -- --Guido van Rossum (python.org/~guido)
Guido van Rossum wrote:
your original remark claimed something wasn't syntactic sugar because of the difference between the box and its contents, and that's what I disagree with.
Maybe I misunderstood -- you seemed to be saying that algebraic types were just syntactic sugar for something. Perhaps I should have asked what you thought they were syntactic sugar *for*? I was also responding to a comment that values in Python are already tagged with their type, so tagged unions are unnecessary. But the type tag of a Python object is not equivalent to the tag of an algebraic type, because the latter conveys information over and above the type of its payload. -- Greg
On Wed, May 18, 2016 at 3:41 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Guido van Rossum wrote:
your original remark claimed something wasn't syntactic sugar because of the difference between the box and its contents, and that's what I disagree with.
Maybe I misunderstood -- you seemed to be saying that algebraic types were just syntactic sugar for something. Perhaps I should have asked what you thought they were syntactic sugar *for*?
That's a good question, for which I don't have a great answer. I've just discussed some of this with the author of that blog article (Chad Austin) and we came to the conclusion that there isn't a great future for ADTs or Sum types in Python, because almost everything you can do with them already has a way to do it in Python that's good enough.
Sometimes what you need is Haskell's Maybe (i.e. 'Nothing' or 'Just x'), and in most cases just checking for None is good enough. The strict optional checking that's coming to mypy soon will help here because it'll catch you when you're not checking for None before doing something else with the value.
Sometimes you really do want to distinguish between the box and what's in it, and then you can use a bunch of simple classes, or maybe a bunch of namedtuples, to represent the different kinds of boxes. If you declare the unopened box type as a Union you can get mypy to check that you've covered all your bases. (If you don't want/need a check that you're handling all cases you can use a shared superclass instead of a union.)
There are probably a few other cases. The one thing that Python doesn't have (and mypy doesn't add) would be a match statement. The design of a Pythonic match statement would be an interesting exercise; perhaps we should see how far we can get with that for Python 3.7.
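A minimal sketch of the "just check for None" alternative to Haskell's Maybe, assuming hypothetical helpers `find_index` and `describe_position`: `Optional[int]` stands in for `Maybe Int`, with `None` as `Nothing` and a plain `int` as `Just x` (without the box).

```python
from typing import List, Optional


def find_index(items: List[str], target: str) -> Optional[int]:
    # None plays the role of Nothing; a plain int plays the role of Just x.
    for i, item in enumerate(items):
        if item == target:
            return i
    return None


def describe_position(items: List[str], target: str) -> str:
    idx = find_index(items, target)
    # Under strict optional checking, using idx as an int before this
    # None check (e.g. idx + 1) is reported as an error by mypy.
    if idx is None:
        return "missing"
    return "at %d" % idx
```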
I was also responding to a comment that values in Python are already tagged with their type, so tagged unions are unnecessary. But the type tag of a Python object is not equivalent to the tag of an algebraic type, because the latter conveys information over and above the type of its payload.
Right. And if you need that feature you can wrap it in a class or namedtuple. -- --Guido van Rossum (python.org/~guido)
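A minimal sketch of wrapping payloads in namedtuples to get Greg's `data Box = Matchbox Int | Shoebox Int` in Python. The class names come from the thread; `Box` as a `Union` alias and the `label` function are illustrative assumptions.

```python
from typing import NamedTuple, Union


class Matchbox(NamedTuple):
    contents: int


class Shoebox(NamedTuple):
    contents: int


# The "unopened box" type: a Union of the concrete box classes.
Box = Union[Matchbox, Shoebox]


def label(box: Box) -> str:
    # The class of the box is the extra information the tag conveys,
    # over and above the type of the payload (int in both cases).
    if isinstance(box, Matchbox):
        return "matchbox holding %d" % box.contents
    return "shoebox holding %d" % box.contents
```

One caveat of the namedtuple choice: these are still tuples, so `Matchbox(3) == Shoebox(3)` compares equal. Tell the kinds of box apart with `isinstance`, not `==`.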
On 19 May 2016 at 08:53, Guido van Rossum <guido@python.org> wrote:
The one thing that Python doesn't have (and mypy doesn't add) would be a match statement. The design of a Pythonic match statement would be an interesting exercise; perhaps we should see how far we can get with that for Python 3.7.
If it's syntactic sugar for a particular variety of if/elif/else statement, I think that may be feasible, but you'd presumably want to avoid the "Can we precompute a lookup table?" quagmire that doomed PEP 3103's switch statement. That said, for the pre-computed lookup table case, whatever signature deconstruction design you came up with for a match statement might also be usable as the basis for a functools.multidispatch() decorator design. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
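The `functools.multidispatch()` mentioned here is hypothetical. The standard library does provide `functools.singledispatch`, which dispatches on the type of the first argument only. A minimal sketch of using it to "open" different kinds of box, with hypothetical `Matchbox`/`Shoebox` classes and an `open_box` function:

```python
from functools import singledispatch
from typing import NamedTuple


class Matchbox(NamedTuple):
    contents: int


class Shoebox(NamedTuple):
    contents: int


@singledispatch
def open_box(box: object) -> str:
    # Fallback used when no implementation is registered for type(box).
    raise TypeError("unsupported box type: %s" % type(box).__name__)


@open_box.register(Matchbox)
def _(box: Matchbox) -> str:
    return "match #%d" % box.contents


@open_box.register(Shoebox)
def _(box: Shoebox) -> str:
    return "shoe size %d" % box.contents
```

Unlike a match statement, this only looks at the first argument's type and does not deconstruct its contents.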
Guido van Rossum wrote:
Sometimes you really do want to distinguish between the box and what's in it, and then you can use a bunch of simple classes, or maybe a bunch of namedtuples, to represent the different kinds of boxes. If you declare the unopened box type as a Union you can get mypy to check that you've covered all your bases.
Well, sort of. It will check that you don't pass anything outside of the set of allowed types, but it can't tell whether you've handled all the possible cases in the code. That's something Haskell can do because (1) it has special constructs for case analysis, and (2) its algebraic types can't be extended or subclassed.
The one thing that Python doesn't have (and mypy doesn't add) would be a match statement. The design of a Pythonic match statement would be an interesting exercise;
Yes, if there were dedicated syntax for it, mypy could check that you covered all the branches of a Union. I've thought about this off and on a bit. Maybe I'll write about some ideas in another post. -- Greg
On Thu, May 19, 2016 at 12:00 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Guido van Rossum wrote:
Sometimes you really do want to distinguish between the box and what's in it, and then you can use a bunch of simple classes, or maybe a bunch of namedtuples, to represent the different kinds of boxes. If you declare the unopened box type as a Union you can get mypy to check that you've covered all your bases.
Well, sort of. It will check that you don't pass anything outside of the set of allowed types, but it can't tell whether you've handled all the possible cases in the code. That's something Haskell can do because (1) it has special constructs for case analysis, and (2) its algebraic types can't be extended or subclassed.
I'm still not sure I understand why this check that you've handled all cases is so important (I've met a few people who obsessed about it in other languages, but I don't really feel the need in my gut yet). I also don't think that subclasses cause problems (if there's a match for a particular class, it will match the subclass too).
Anyway, I believe mypy could easily check that you're always returning a value in situations like this (even though it doesn't do that now):
    def foo(arg: Union[X, Y, Z]) -> int:
        if isinstance(arg, X):
            return arg.counter
        if isinstance(arg, Y):
            return 0
It should be possible to figure out that Z is missing here.
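One way to get this exhaustiveness check from a static checker is the `assert_never` pattern: a helper whose parameter is annotated `NoReturn`, so any Union member that survives the `isinstance` narrowing cannot legally be passed to it. This is a minimal sketch; the bodies of `X`, `Y`, `Z` are assumptions, and Python 3.11+ ships a ready-made `typing.assert_never`.

```python
from typing import NoReturn, Union


class X:
    counter = 1


class Y:
    pass


class Z:
    pass


def assert_never(value: NoReturn) -> NoReturn:
    # A static checker narrows `arg` after each isinstance branch; if a
    # Union member is left unhandled, passing it here is a type error.
    # At run time the call simply fails loudly.
    raise AssertionError("unhandled type: %s" % type(value).__name__)


def foo(arg: Union[X, Y, Z]) -> int:
    if isinstance(arg, X):
        return arg.counter
    if isinstance(arg, Y):
        return 0
    assert_never(arg)  # Z is not handled, so a checker should flag this line
```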
The one thing that Python doesn't have (and mypy doesn't add) would be a match statement. The design of a Pythonic match statement would be an interesting exercise;
Yes, if there were dedicated syntax for it, mypy could check that you covered all the branches of a Union.
I've thought about this off and on a bit. Maybe I'll write about some ideas in another post.
Would love to hear your thoughts! -- --Guido van Rossum (python.org/~guido)
Guido van Rossum wrote:
I'm still not sure I understand why this check that you've handled all cases is so important (I've met a few people who obsessed about it in other languages, but I don't really feel the need in my gut yet).
If you're programming in the usual OO style, then when you add a new subclass, most of the code needed to support it goes into that class. You can easily go through all the methods of the base class and make sure you've overridden the ones you need to. But if you add a new branch to an algebraic type, you need to chase down all the pieces of code scattered about your program that operate on that type and update them. If you're happy to rely on testing to do so, that's fine. But if you're a static checking kind of person, I can see the attraction of having some help from your tools for it.
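A minimal sketch of the OO side of this contrast, using hypothetical `Shape`/`Square`/`Circle` classes: declaring abstract methods on the base class turns it into a checklist, and `abc` refuses to instantiate a new subclass that forgot an override.

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    @abstractmethod
    def area(self) -> float:
        ...


class Square(Shape):
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side * self.side


class Circle(Shape):
    # Forgot to override area(): instantiating Circle raises TypeError.
    def __init__(self, radius: float) -> None:
        self.radius = radius
```

With an algebraic type there is no single class to inspect; the code handling each branch is spread across whatever functions operate on the type.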
I also don't think that subclasses cause problems (if there's a match for a particular class, it will match the subclass too).
It's not subclassing an existing branch that's the problem, it's adding a new branch to the union. -- Greg
On Mon, May 16, 2016 at 2:37 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
I'm not deeply into category theory, but the proposal seems to be that Sum would be a special kind of type that's assumed to be the same type wherever it appears in the signature of a function.
Indeed, that's TypeVar.
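A minimal sketch of what a TypeVar expresses here, reusing the constrained `pathstring` TypeVar from the start of the thread; `parent_dir` is a hypothetical wrapper. For any given call, the argument and the return value are both `str` or both `bytes`.

```python
import os.path
from typing import TypeVar

# Constrained TypeVar: within one call, argument and result share a type.
pathstring = TypeVar('pathstring', str, bytes)


def parent_dir(p: pathstring) -> pathstring:
    return os.path.dirname(p)
```

For example, `parent_dir("a/b")` returns `"a"` and `parent_dir(b"a/b")` returns `b"a"`.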
That might work for the special case of a function of one parameter that returns the same type as its argument, but that's about all.
I was actually trying to interpret the (vague) proposal as going even a step further, closer to what may have helped slightly in the fspath type hinting problem. Perhaps something like this:
    M = TagMatcher(3)  # 3 is the number of "summed" types
    # (and yes, I just made up the term TagMatcher)
    def onehalf(value: M.sum[int, float, complex]) -> M.sum[float, float, complex]:
        return 0.5 * value
That would then match the "tag" between the argument and return types (within the sum types, which are tagged unions):
    int -> float
    float -> float
    complex -> complex
As you can see, "onehalf(...)" will turn int, float and complex into float, float and complex, respectively. I suppose one or more "functors" could be seen in there to make this happen in theory XD. This is not solved by a TypeVar.
As has been pointed out, type parameters allow the same thing to be expressed, and a lot more besides, so a Sum type is not needed. We already have everything we need.
So, despite what I write above, you still seem to agree with me, assuming you did not literally mean *everything* ;-). The reason why we agree is that we can already do:
    @overload
    def onehalf(value: int) -> float: ...
    @overload
    def onehalf(value: float) -> float: ...
    @overload
    def onehalf(value: complex) -> complex: ...
So, the sum type, even with my "TagMatcher" described above, does not provide anything new. After all, the whole point of sum types is to give *statically typed* languages something that resembles dynamic typing.
-- Koos
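A minimal runnable version of the `@overload` approach. The overload signatures are the ones from the message. The single implementation below them is an assumption added so the module actually runs, because `@overload` stubs by themselves are only seen by the type checker.

```python
from typing import Union, overload


@overload
def onehalf(value: int) -> float: ...
@overload
def onehalf(value: float) -> float: ...
@overload
def onehalf(value: complex) -> complex: ...


def onehalf(value: Union[int, float, complex]) -> Union[float, complex]:
    # The one runtime implementation; the stubs above only tell the
    # checker how each argument type maps to a return type.
    return 0.5 * value
```

For example, `onehalf(1)` returns the float `0.5`, and `onehalf(2j)` returns the complex `1j`.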
-- Greg
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
Koos Zevenhoven writes:
On Mon, May 16, 2016 at 2:37 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
I'm not deeply into category theory, but the proposal seems to be that Sum would be a special kind of type that's assumed to be the same type wherever it appears in the signature of a function.
Indeed, that's TypeVar.
True as far as the statement itself makes sense (categorically, a type is what it is, so of course it is the same type wherever it appears!), but a Sum would not be a TypeVar. It's a type, and it's far more limited[1] for analyzing Python programs. I would like to withdraw the term itself, as well as any proposal to add it to typing, from this discussion. All concerned have my apologies for bringing it up at all.
Footnotes:
[1] And useful in category theory for that very reason -- it's easier to reason about in categorical contexts. But it makes things harder in Mypy.
participants (10)
- Brett Cannon
- Chris Barker - NOAA Federal
- Ethan Furman
- Greg Ewing
- Guido van Rossum
- Koos Zevenhoven
- Nick Coghlan
- Random832
- Stephen J. Turnbull
- Steven D'Aprano