[Python-ideas] Type hinting for path-related functions

Koos Zevenhoven k7hoven at gmail.com
Tue May 17 10:44:43 EDT 2016


This seems to be what I was thinking too, except done entirely in the stubs. I
might add some comments in the blog post draft.

-- Koos

On Sun, May 15, 2016 at 8:21 PM, Guido van Rossum <guido at python.org> wrote:
> I didn't have time to read the thread, but I read the PEP and thought about
> this a little bit.
>
> One key thing is that we can write the code CPython sees at runtime one way,
> and write the stubs that type checkers (like mypy) see a different way. The
> stubs go in the typeshed repo (https://github.com/python/typeshed) and I
> would add something like the following to the os module there
> (stdlib/3/os/__init__.pyi in the repo).
>
> First we need to add scandir() and DirEntry (this is not entirely unrelated
> -- DirEntry is an example of something that is PathLike). Disregarding the
> PathLike protocol for the moment, I think they can be defined like this:
>
> if sys.version_info >= (3, 5):
>     class DirEntry(Generic[AnyStr]):
>         name = ...  # type: AnyStr
>         path = ...  # type: AnyStr
>         def inode(self) -> int: ...
>         def is_dir(self, *, follow_symlinks: bool = ...) -> bool: ...
>         def is_file(self, *, follow_symlinks: bool = ...) -> bool: ...
>         def is_symlink(self) -> bool: ...
>         def stat(self, *, follow_symlinks: bool = ...) -> stat_result: ...
>
>     @overload
>     def scandir(path: str = ...) -> Iterator[DirEntry[str]]: ...
>     @overload
>     def scandir(path: bytes) -> Iterator[DirEntry[bytes]]: ...
>
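At runtime (assuming Python 3.6+, where the iterator returned by os.scandir() is also a context manager), the behaviour these stubs describe is easy to observe: scandir() hands back an iterator of entries, and each entry's name and path have the same type as the argument. The helper `entry_types` is hypothetical and exists only for this illustration.

```python
import os
import tempfile
from collections.abc import Iterator


def entry_types(path):
    """Hypothetical helper: (is the result an iterator?, name types, path types)."""
    with os.scandir(path) as it:
        is_iter = isinstance(it, Iterator)  # scandir() returns an iterator, not one entry
        entries = list(it)
    return is_iter, {type(e.name) for e in entries}, {type(e.path) for e in entries}


with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, 'example.txt'), 'w').close()
    str_result = entry_types(d)                 # str argument -> str names and paths
    bytes_result = entry_types(os.fsencode(d))  # bytes argument -> bytes names and paths
```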
> Note that the docs claim there's a type os.DirEntry, even though it doesn't
> currently exist -- I think we should fix that in 3.6 even if it may not make
> sense to instantiate it.
>
> Also note that a slightly different overload is also possible -- I think
> these are for all practical purposes the same:
>
>     @overload
>     def scandir() -> Iterator[DirEntry[str]]: ...
>     @overload
>     def scandir(path: AnyStr) -> Iterator[DirEntry[AnyStr]]: ...
>
> The reason we need the overload in all cases is that os.scandir() without
> arguments scans '.' and therefore always yields str entries (DirEntry[str]).
>
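The same overload pattern can be written in ordinary runtime code with typing.overload. This is a sketch around a hypothetical wrapper, `entry_names`, not part of the os module: the first overload covers the no-argument call (which, like os.scandir(), defaults to '.'), and the second ties the result type to the argument type.

```python
import os
import tempfile
from typing import AnyStr, List, overload


@overload
def entry_names() -> List[str]: ...
@overload
def entry_names(path: AnyStr) -> List[AnyStr]: ...
def entry_names(path=None):
    """Hypothetical wrapper: sorted entry names, defaulting to '.' like os.scandir()."""
    target = '.' if path is None else path
    with os.scandir(target) as it:
        return sorted(entry.name for entry in it)


with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, 'a.txt'), 'w').close()
    text_names = entry_names(d)               # a checker infers List[str]
    byte_names = entry_names(os.fsencode(d))  # a checker infers List[bytes]
default_names = entry_names()                 # a checker infers List[str]; scans '.'
```

Only the undecorated implementation runs; the @overload signatures exist for type checkers, which mirrors the stub/runtime split described next.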
> Finally, a reminder that this is all stub code -- it's only ever seen by
> type checkers. What we put in the actual os.py file in the stdlib can be
> completely different, and it doesn't need type annotations (type checkers
> always prefer the stubs over the real code).
>
> Now let's add PathLike. This first attempt doesn't address DirEntry yet:
>
> if sys.version_info >= (3, 6):
>     from abc import abstractmethod
>     class PathLike(Generic[AnyStr]):
>         @abstractmethod
>         def __fspath__(self) -> AnyStr: ...
>
>     @overload
>     def fspath(path: PathLike[AnyStr]) -> AnyStr: ...
>     @overload
>     def fspath(path: AnyStr) -> AnyStr: ...
>
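At runtime (Python 3.6+), os.fspath() behaves the way these two overloads describe. A small sketch with a hypothetical non-generic PathLike subclass, `ProjectDir`, which always exposes a str path:

```python
import os


class ProjectDir(os.PathLike):
    """Hypothetical PathLike that always exposes a str path (non-generic)."""

    def __init__(self, root: str) -> None:
        self._root = root

    def __fspath__(self) -> str:
        return self._root


from_pathlike = os.fspath(ProjectDir('src'))  # first overload: unwrapped via __fspath__()
from_bytes = os.fspath(b'.')                  # second overload: bytes passed through unchanged
from_str = os.fspath('.')                     # second overload: str passed through unchanged

try:
    os.fspath(42)                             # neither str, bytes nor PathLike
    rejected = False
except TypeError:
    rejected = True
```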
> This tells a type checker enough so that it will know that e.g.
> os.fspath(b'.') returns a bytes object. Also, if we have a class C that
> derives from PathLike we can make it non-generic, e.g. the stubs for
> pathlib.Path would start with something like
>
> class Path(os.PathLike[str]):
>     ...
>
> and now the type checker will know that in the following code `c` is always
> a str:
>
> a = ...  # type: Any
> b = pathlib.Path(a)
> c = os.fspath(b)
>
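The same three lines as a runnable snippet (Python 3.6+). The value 'data' is an arbitrary placeholder, and the types in the comments describe what a checker using such stubs would conclude; the interpreter itself does not check them.

```python
import os
import pathlib
from typing import Any

a: Any = 'data'      # placeholder value whose static type is unknown
b = pathlib.Path(a)  # statically a pathlib.Path, i.e. a PathLike of str
c = os.fspath(b)     # so a checker can infer str here
```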
> Finally let's redefine scandir(). We'll have to redefind DirEntry to inherit
> from PathLike, and it will remain generic:
>
> class DirEntry(PathLike[AnyStr], Generic[AnyStr]):
>     ...  # Everything else unchanged!
>
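A runtime sketch of the same shape: a hypothetical `FakeEntry` class that, like DirEntry in the stub, is PathLike and stays generic in AnyStr. It inherits from the unsubscripted os.PathLike ABC and takes its type parameter from typing.Generic, which keeps it runnable at runtime; it is an illustration, not how DirEntry is implemented.

```python
import os
from typing import AnyStr, Generic


class FakeEntry(os.PathLike, Generic[AnyStr]):
    """Hypothetical stand-in for DirEntry: a PathLike that is generic in AnyStr."""

    def __init__(self, path: AnyStr) -> None:
        self.path = path
        self.name = os.path.basename(path)  # same type as path

    def __fspath__(self) -> AnyStr:
        return self.path


text_entry: FakeEntry[str] = FakeEntry('docs/readme.txt')
byte_entry: FakeEntry[bytes] = FakeEntry(b'docs/readme.txt')
```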
> Now the type checker should understand the following:
>
> for a in os.scandir('.'):
>     b = os.fspath(a)
>     ...
>
> Here it will know that `a` is a DirEntry[str] (because the argument given to
> os.scandir() is a str) and hence it will also know that b is a str. Now if you
> then pass b to pathlib it will understand this cannot be a type error, and
> if you pass b to some os.path.* function (e.g. os.path.basename()) it will
> understand the return value is a str.
>
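Running that loop against a real directory (Python 3.6+, where DirEntry implements `__fspath__`) shows the runtime counterpart of the inference: os.fspath(a) returns a str, which pathlib accepts and which os.path.basename() maps to a str. The temporary directory and the `results` list are illustrative only.

```python
import os
import pathlib
import tempfile

results = []
with tempfile.TemporaryDirectory() as top:
    open(os.path.join(top, 'notes.txt'), 'w').close()
    with os.scandir(top) as it:
        for a in it:              # a: DirEntry[str]
            b = os.fspath(a)      # b: str
            p = pathlib.Path(b)   # accepted: pathlib takes str
            results.append((isinstance(a, os.PathLike), b, os.path.basename(b), p.name))
```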
> If you pass some variable to os.scandir() then if the type checker can
> deduce that that variable is a str (e.g. because you've gotten it from
> pathlib) it will know that the results are DirEntry[str] instances. If you
> pass something to os.scandir() that's a bytes object it will know that the
> results are DirEntry[bytes] objects, and it knows that calling os.fspath()
> on those will return bytes. (And it will know that you can't pass those to
> pathlib, but you *can* pass them to most os and os.path functions.)
>
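The bytes case, sketched the same way (Python 3.6+): entries from a bytes argument give bytes from os.fspath(), os.path functions accept them, and pathlib raises TypeError, which is the error a checker would report statically.

```python
import os
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as top:
    open(os.path.join(top, 'notes.txt'), 'w').close()
    with os.scandir(os.fsencode(top)) as it:         # bytes argument -> DirEntry[bytes]
        entries = list(it)
    raw = [os.fspath(e) for e in entries]            # bytes paths
    basenames = [os.path.basename(r) for r in raw]   # os.path works on bytes
    try:
        pathlib.Path(raw[0])                         # pathlib only accepts str
        pathlib_rejected = False
    except TypeError:
        pathlib_rejected = True
```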
> Next, if the variable passed to os.scandir() has the declared or inferred
> type AnyStr then mypy will know that it can be either str or bytes and the
> types of results will also use AnyStr. I think in that case you'll get an
> error if you pass it to pathlib. Note that this can only happen inside a
> generic class or a generic function that has AnyStr as one of its
> parameters. (AnyStr is itself a type variable.)
>
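A sketch of that situation: a hypothetical generic function, `child_paths`, parameterised by AnyStr. A checker accepts it for both str and bytes callers, and each caller gets back a list of its own string type; the comment marks where pathlib would be rejected inside the function.

```python
import os
import tempfile
from typing import AnyStr, List


def child_paths(top: AnyStr) -> List[AnyStr]:
    """Hypothetical generic helper: full paths of the entries directly under `top`."""
    with os.scandir(top) as it:
        # Each entry is DirEntry[AnyStr], so os.fspath() gives back AnyStr.
        # Calling pathlib.Path(os.fspath(entry)) here would be flagged by a
        # checker, because AnyStr may be bytes.
        return sorted(os.fspath(entry) for entry in it)


with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, 'x.py'), 'w').close()
    text_paths = child_paths(d)                # caller sees List[str]
    byte_paths = child_paths(os.fsencode(d))   # caller sees List[bytes]
```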
> The story ought to be similar if the variable has the type Union[str,
> bytes], except that this can occur in non-generic code and the resulting
> types are similarly fuzzy. (I think there's a bug in mypy around this
> though, follow https://github.com/python/mypy/issues/1533 if you're
> interested how that turns out.)
>
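With Union[str, bytes] in ordinary, non-generic code, the checker cannot tell which of the two it has. One common way to recover a precise type is an isinstance() check, which type checkers use to narrow the union; this is a general checker feature, not something specific to these stubs. A sketch assuming that approach, with `as_pathlib` as a hypothetical name:

```python
import os
import pathlib
from typing import Optional, Union


def as_pathlib(p: Union[str, bytes]) -> Optional[pathlib.Path]:
    """Hypothetical helper: convert to pathlib.Path only when p holds text."""
    fs = os.fspath(p)          # statically Union[str, bytes]
    if isinstance(fs, str):    # narrowed to str in this branch
        return pathlib.Path(fs)
    return None                # bytes cannot be handed to pathlib


text_result = as_pathlib('logs')
bytes_result = as_pathlib(b'logs')
```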
> --
> --Guido van Rossum (python.org/~guido)

