[Python-ideas] Type hinting for path-related functions

Brett Cannon brett at python.org
Mon May 16 12:15:50 EDT 2016

On Sun, 15 May 2016 at 10:21 Guido van Rossum <guido at python.org> wrote:

> I didn't have time to read the thread, but I read the PEP and thought
> about this a little bit.
> One key thing is that we can write the code CPython sees at runtime one
> way, and write the stubs that type checkers (like mypy) see a different
> way. The stubs go in the typeshed repo (https://github.com/python/typeshed)
> and I would add something like the following to the os module there
> (stdlib/3/os/__init__.pyi in the repo).
> First we need to add scandir() and DirEntry (this is not entirely
> unrelated -- DirEntry is an example of something that is PathLike).
> Disregarding the PathLike protocol for the moment, I think they can be
> defined like this:
> # (assumes sys, stat_result, AnyStr, Generic, Iterator and overload are
> # already imported/defined in the stub)
> if sys.version_info >= (3, 5):
>     class DirEntry(Generic[AnyStr]):
>         name = ...  # type: AnyStr
>         path = ...  # type: AnyStr
>         def inode(self) -> int: ...
>         def is_dir(self, *, follow_symlinks: bool = ...) -> bool: ...
>         def is_file(self, *, follow_symlinks: bool = ...) -> bool: ...
>         def is_symlink(self) -> bool: ...
>         def stat(self, *, follow_symlinks: bool = ...) -> stat_result: ...
>     # scandir() returns an iterator of entries, not a single entry.
>     @overload
>     def scandir(path: str = ...) -> Iterator[DirEntry[str]]: ...
>     @overload
>     def scandir(path: bytes) -> Iterator[DirEntry[bytes]]: ...
> Note that the docs claim there's a type os.DirEntry, even though it
> doesn't currently exist -- I think we should fix that in 3.6 even if it may
> not make sense to instantiate it.
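The overloads encode a runtime fact that is easy to check: a str argument yields entries whose name and path are str, and a bytes argument yields bytes. A minimal sketch, assuming a recent Python 3 (os.scandir() as a context manager needs 3.6+); entry_field_types is a hypothetical helper, not part of os.

```python
import os
import tempfile


def entry_field_types(path):
    """Collect the runtime types of DirEntry.name and DirEntry.path under path."""
    with os.scandir(path) as it:
        return {(type(entry.name), type(entry.path)) for entry in it}


# Build a throwaway directory with one file so there is something to scan.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "example.txt"), "w").close()
    str_types = entry_field_types(tmp)                  # str argument
    bytes_types = entry_field_types(os.fsencode(tmp))   # bytes argument

print(str_types)    # {(<class 'str'>, <class 'str'>)}
print(bytes_types)  # {(<class 'bytes'>, <class 'bytes'>)}
```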


(And AnyStr isn't documented; see http://bugs.python.org/issue26141.)

> Also note that a slightly different overload is also possible -- I think
> these are for all practical purposes the same:
>     @overload
>     def scandir() -> Iterator[DirEntry[str]]: ...
>     @overload
>     def scandir(path: AnyStr) -> Iterator[DirEntry[AnyStr]]: ...
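> The reason we need the overload in all cases is that os.scandir() called
> without arguments scans '.' and therefore yields DirEntry[str] entries;
> with no argument there is nothing from which to infer AnyStr.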
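The same overload pattern can be tried on an ordinary function. A sketch using a hypothetical list_names() helper (not part of os) that copies the second overload style: a type checker reads only the @overload signatures, while at runtime the final undecorated definition is the one that runs.

```python
import os
import tempfile
from typing import AnyStr, List, overload


@overload
def list_names() -> List[str]: ...
@overload
def list_names(path: AnyStr) -> List[AnyStr]: ...
def list_names(path=None):
    # With no argument, fall back to the str path "." just as os.scandir()
    # does, which is why the zero-argument overload must promise List[str].
    if path is None:
        path = "."
    with os.scandir(path) as it:
        return sorted(entry.name for entry in it)


with tempfile.TemporaryDirectory() as tmp:
    for filename in ("a.txt", "b.txt"):
        open(os.path.join(tmp, filename), "w").close()
    str_names = list_names(tmp)                  # ['a.txt', 'b.txt']
    bytes_names = list_names(os.fsencode(tmp))   # [b'a.txt', b'b.txt']

cwd_names = list_names()  # names in the current directory, all str
```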
> Finally, a reminder that this is all stub code -- it's only ever seen by
> type checkers. What we put in the actual os.py file in the stdlib can be
> completely different, and it doesn't need type annotations (type checkers
> always prefer the stubs over the real code).
> Now let's add PathLike. This first attempt doesn't address DirEntry yet:
> if sys.version_info >= (3, 6):
>     from abc import abstractmethod
>     class PathLike(Generic[AnyStr]):
>         @abstractmethod
>         def __fspath__(self) -> AnyStr: ...
>     @overload
>     def fspath(path: PathLike[AnyStr]) -> AnyStr: ...
>     @overload
>     def fspath(path: AnyStr) -> AnyStr: ...
> This tells a type checker enough to know that, e.g.,
> os.fspath(b'.') returns a bytes object. Also, if we have a class C that
> derives from PathLike, we can make it non-generic; e.g. the stubs for
> pathlib.Path would start with something like
> class Path(os.PathLike[str]):
>     ...
> and now the type checker will know that in the following code `c` is
> always a str:
> a = ...  # type: Any
> b = pathlib.Path(a)
> c = os.fspath(b)
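To see the generic protocol and a non-generic subclass working together at runtime, here is a sketch with hypothetical stand-ins: MyPathLike plays the role of the proposed PathLike[AnyStr], my_fspath mimics os.fspath(), and MyPath mimics a class like pathlib.Path that fixes AnyStr to str. Stand-ins are used so the sketch doesn't depend on whether os.PathLike itself can be subscripted at runtime.

```python
from abc import ABC, abstractmethod
from typing import AnyStr, Generic, overload


class MyPathLike(ABC, Generic[AnyStr]):
    """Hypothetical stand-in for the proposed PathLike[AnyStr]."""

    @abstractmethod
    def __fspath__(self) -> AnyStr: ...


@overload
def my_fspath(path: MyPathLike[AnyStr]) -> AnyStr: ...
@overload
def my_fspath(path: AnyStr) -> AnyStr: ...
def my_fspath(path):
    # Like os.fspath(): str and bytes pass through unchanged; anything else
    # must provide __fspath__ returning str or bytes.
    if isinstance(path, (str, bytes)):
        return path
    method = getattr(type(path), "__fspath__", None)
    if method is None:
        raise TypeError(f"expected str, bytes or path-like, not {type(path).__name__}")
    result = method(path)
    if not isinstance(result, (str, bytes)):
        raise TypeError("__fspath__() must return str or bytes")
    return result


class MyPath(MyPathLike[str]):
    """Non-generic subclass: a checker knows my_fspath(MyPath(...)) is str."""

    def __init__(self, raw: str) -> None:
        self._raw = raw

    def __fspath__(self) -> str:
        return self._raw


print(my_fspath(MyPath("data/report.txt")))  # data/report.txt
print(my_fspath(b"."))                       # b'.'
```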
> Finally, let's redefine scandir(). We'll have to redefine DirEntry to
> inherit from PathLike, and it will remain generic:
> class DirEntry(PathLike[AnyStr], Generic[AnyStr]):
>     ...  # Everything else unchanged!
> Now the type checker should understand the following:
> for a in os.scandir('.'):
>     b = os.fspath(a)
>     ...
> Here it will know that `a` is a DirEntry[str] (because the argument given
> to os.scandir() is a str)

Which works because AnyStr is a TypeVar (in case anyone else was wondering,
as I was, why that works given that AnyStr isn't documented yet).

> and hence it will also know that b is a str. Now if you then pass b to
> pathlib, it will understand that this cannot be a type error, and if you
> pass b to some os.path.* function (e.g. os.path.basename()) it will
> understand that the return value is a str.
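The runtime behaviour behind that reasoning can be checked directly on Python 3.6+, where DirEntry implements __fspath__. A sketch with a hypothetical describe_entries() helper: for a str directory, os.fspath(entry) gives a str equal to entry.path, os.path.basename() returns a str, and pathlib accepts the value.

```python
import os
import pathlib
import tempfile
from typing import List, Tuple


def describe_entries(directory: str) -> List[Tuple[str, str, str]]:
    """For each entry: (os.fspath(entry), its basename, the pathlib name)."""
    rows = []
    with os.scandir(directory) as it:
        for entry in it:              # entry is a DirEntry[str]
            b = os.fspath(entry)      # a str, same value as entry.path
            rows.append((b, os.path.basename(b), pathlib.Path(b).name))
    return rows


with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "notes.txt"), "w").close()
    rows = describe_entries(tmp)
    expected_path = os.path.join(tmp, "notes.txt")
```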
> If you pass some variable to os.scandir() then if the type checker can
> deduce that that variable is a str (e.g. because you've gotten it from
> pathlib) it will know that the results are DirEntry[str] instances. If you
> pass something to os.scandir() that's a bytes object it will know that the
> results are DirEntry[bytes] objects, and it knows that calling os.fspath()
> on those will return bytes. (And it will know that you can't pass those to
> pathlib, but you *can* pass them to most os and os.path functions.)
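The bytes side can be sketched the same way, assuming a recent Python 3: os.fspath() on a DirEntry[bytes] returns bytes, os.path.basename() keeps it bytes, and pathlib raises TypeError for a bytes argument. Both helpers below are hypothetical.

```python
import os
import pathlib
import tempfile


def bytes_entry_info(directory: bytes):
    """For a bytes directory, return (fspath, basename) pairs; both are bytes."""
    with os.scandir(directory) as it:
        return [(os.fspath(e), os.path.basename(os.fspath(e))) for e in it]


def pathlib_accepts(value) -> bool:
    """pathlib accepts str (or a PathLike returning str), but not bytes."""
    try:
        pathlib.PurePath(value)
    except TypeError:
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "data.bin"), "w").close()
    info = bytes_entry_info(os.fsencode(tmp))
```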
> Next, if the variable passed to os.scandir() has the declared or inferred
> type AnyStr then mypy will know that it can be either str or bytes and the
> types of results will also use AnyStr. I think in that case you'll get an
> error if you pass it to pathlib. Note that this can only happen inside a
> generic class or a generic function that has AnyStr as one of its
> parameters. (AnyStr is itself a type variable.)
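A sketch of the generic-function case, with a hypothetical child_paths() helper. typing defines AnyStr as a TypeVar constrained to str and bytes, so a checker solves it separately at each call site; inside the function, values derived from the argument stay AnyStr.

```python
import os
import tempfile
from typing import AnyStr, List


def child_paths(directory: AnyStr) -> List[AnyStr]:
    """Generic over AnyStr: each entry is a DirEntry[AnyStr] and each
    os.fspath() result is AnyStr, solved as str or bytes per call."""
    with os.scandir(directory) as it:
        paths = sorted(os.fspath(entry) for entry in it)
    # pathlib.Path(directory) would likely be flagged by a checker here,
    # because AnyStr may be bytes inside this function.
    return paths


with tempfile.TemporaryDirectory() as tmp:
    for filename in ("x.txt", "y.txt"):
        open(os.path.join(tmp, filename), "w").close()
    str_paths = child_paths(tmp)
    bytes_paths = child_paths(os.fsencode(tmp))
```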
> The story ought to be similar if the variable has the type Union[str,
> bytes], except that this can occur in non-generic code and the resulting
> types are similarly fuzzy. (I think there's a bug in mypy around this
> though; follow https://github.com/python/mypy/issues/1533 if you're
> interested in how that turns out.)
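With a plain Union[str, bytes], nothing ties the argument type to other types in the signature the way a TypeVar does, so non-generic code usually has to narrow explicitly. A sketch of one way to do that before calling pathlib; to_pathlib() is hypothetical, and decoding with os.fsdecode() is one choice among several.

```python
import os
import pathlib
from typing import Union


def to_pathlib(path: Union[str, bytes]) -> pathlib.PurePath:
    """Convert a str-or-bytes path to a PurePath.

    isinstance() narrowing lets a checker see that only a str reaches
    pathlib, which rejects bytes.
    """
    if isinstance(path, bytes):
        path = os.fsdecode(path)  # decode using the filesystem encoding
    return pathlib.PurePath(path)


print(to_pathlib(b"logs/app.log").name)  # app.log
```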

This might make a nice example for the docs and/or a blog post, since it
reaches into intermediate/advanced typing territory that almost none of us
have explored.