[Python-ideas] Type hinting for path-related functions

Guido van Rossum guido at python.org
Sun May 15 13:21:29 EDT 2016

I didn't have time to read the thread, but I read the PEP and thought about
this a little bit.

One key thing is that we can write the code CPython sees at runtime one
way, and write the stubs that type checkers (like mypy) see a different
way. The stubs go in the typeshed repo (https://github.com/python/typeshed)
and I would add something like the following to the os module there
(stdlib/3/os/__init__.pyi in the repo).

First we need to add scandir() and DirEntry (this is not entirely unrelated
-- DirEntry is an example of something that is PathLike). Disregarding the
PathLike protocol for the moment, I think they can be defined like this:

if sys.version_info >= (3, 5):
    class DirEntry(Generic[AnyStr]):
        name = ...  # type: AnyStr
        path = ...  # type: AnyStr
        def inode(self) -> int: ...
        def is_dir(self, *, follow_symlinks: bool = ...) -> bool: ...
        def is_file(self, *, follow_symlinks: bool = ...) -> bool: ...
        def is_symlink(self) -> bool: ...
        def stat(self, *, follow_symlinks: bool = ...) -> stat_result: ...

    @overload
    def scandir(path: str = ...) -> Iterator[DirEntry[str]]: ...
    @overload
    def scandir(path: bytes) -> Iterator[DirEntry[bytes]]: ...

Note that the docs claim there's a type os.DirEntry, even though it doesn't
currently exist -- I think we should fix that in 3.6 even if it may not
make sense to instantiate it.

Also note that a slightly different overload is also possible -- I think
these are for all practical purposes the same:

    @overload
    def scandir() -> Iterator[DirEntry[str]]: ...
    @overload
    def scandir(path: AnyStr) -> Iterator[DirEntry[AnyStr]]: ...

The reason we need the overload in all cases is that os.scandir() without
arguments yields str-based entries, i.e. DirEntry[str].
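
As a rough runtime check of the behaviour the overloads describe (a sketch,
not part of the stubs; the helper name_types() is made up for this example,
and the `with` form of os.scandir() assumes Python 3.6 or later):

```python
import os
import tempfile


def name_types(path=None):
    """Return the set of types of DirEntry.name seen when scanning `path`.

    Hypothetical helper: with path=None it calls os.scandir() with no
    argument, which scans the current directory.
    """
    it = os.scandir() if path is None else os.scandir(path)
    with it:
        return {type(entry.name) for entry in it}


with tempfile.TemporaryDirectory() as tmp:
    # Create one file so that every scan yields at least one entry.
    open(os.path.join(tmp, "example.txt"), "w").close()

    # No-argument call: temporarily make tmp the current directory.
    old_cwd = os.getcwd()
    os.chdir(tmp)
    try:
        no_arg = name_types()
    finally:
        os.chdir(old_cwd)

    from_str = name_types(tmp)
    from_bytes = name_types(os.fsencode(tmp))

print(no_arg, from_str, from_bytes)
```

The no-argument and str cases both produce str names, and only the bytes
case produces bytes names, which is what the two overloads encode.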

Finally, a reminder that this is all stub code -- it's only ever seen by
typecheckers. What we put in the actual os.py file in the stdlib can be
completely different, and it doesn't need type annotations (type checkers
always prefer the stubs over the real code).

Now let's add PathLike. This first attempt doesn't address DirEntry yet:

if sys.version_info >= (3, 6):
    from abc import abstractmethod
    class PathLike(Generic[AnyStr]):
        @abstractmethod
        def __fspath__(self) -> AnyStr: ...

    @overload
    def fspath(path: PathLike[AnyStr]) -> AnyStr: ...
    @overload
    def fspath(path: AnyStr) -> AnyStr: ...

This tells a type checker enough so that it will know that e.g.
os.fspath(b'.') returns a bytes object. Also, if we have a class C that
derives from PathLike we can make it non-generic, e.g. the stubs for
pathlib.Path would start with something like

class Path(os.PathLike[str]):

and now the type checker will know that in the following code `c` is always
a str:

a = ...  # type: Any
b = pathlib.Path(a)
c = os.fspath(b)
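
To make the idea concrete outside of a stub file, here is a minimal runnable
sketch of the same shape. The names MyPathLike, my_fspath and StrPath are
invented for this example and are not the proposed stdlib API; the runtime
body of my_fspath() is an assumption about how such a function might behave.

```python
import os
from typing import AnyStr, Generic, overload


class MyPathLike(Generic[AnyStr]):
    """Hypothetical stand-in for the generic PathLike described above."""

    def __fspath__(self) -> AnyStr:
        raise NotImplementedError


@overload
def my_fspath(path: MyPathLike[AnyStr]) -> AnyStr: ...
@overload
def my_fspath(path: AnyStr) -> AnyStr: ...
def my_fspath(path):
    # str and bytes pass through unchanged; anything else is asked
    # for its file system representation.
    if isinstance(path, (str, bytes)):
        return path
    return path.__fspath__()


class StrPath(MyPathLike[str]):
    """Non-generic subclass: __fspath__ always returns str."""

    def __init__(self, raw: str) -> None:
        self._raw = raw

    def __fspath__(self) -> str:
        return self._raw


c = my_fspath(StrPath("spam/eggs.txt"))   # a checker should infer str
d = my_fspath(b".")                       # a checker should infer bytes
e = os.fspath(StrPath("spam/eggs.txt"))   # the real os.fspath (3.6+) accepts it too
```

Because StrPath fixes the type parameter to str, a type checker no longer
needs to track AnyStr for it, just as with the Path stub above.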

Finally let's redefine scandir(). We'll have to redefine DirEntry to
inherit from PathLike, and it will remain generic:

class DirEntry(PathLike[AnyStr], Generic[AnyStr]):
    # Everything else unchanged!

Now the type checker should understand the following:

for a in os.scandir('.'):
    b = os.fspath(a)

Here it will know that `a` is a DirEntry[str] (because the argument given
to os.scandir() is a str) and hence it will also know that b is a str. Now
if you then pass b to pathlib it will understand this cannot be a type error,
and if you pass b to some os.path.* function (e.g. os.path.basename()) it
will understand the return value is a str.

If you pass some variable to os.scandir() then if the type checker can
deduce that that variable is a str (e.g. because you've gotten it from
pathlib) it will know that the results are DirEntry[str] instances. If you
pass something to os.scandir() that's a bytes object it will know that the
results are DirEntry[bytes] objects, and it knows that calling os.fspath()
on those will return bytes. (And it will know that you can't pass those to
pathlib, but you *can* pass them to most os and os.path functions.)
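
The runtime behaviour matches what the checker is expected to infer. A small
sketch, assuming Python 3.6 or later (where DirEntry supports os.fspath());
the helper fspath_results() is made up for this example:

```python
import os
import pathlib
import tempfile


def fspath_results(directory):
    """Return os.fspath() of every entry in `directory` (hypothetical helper)."""
    with os.scandir(directory) as it:
        return [os.fspath(entry) for entry in it]


with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "example.txt"), "w").close()
    str_paths = fspath_results(tmp)                  # list of str
    bytes_paths = fspath_results(os.fsencode(tmp))   # list of bytes

# os.path functions accept both kinds and return the same kind.
str_base = os.path.basename(str_paths[0])
bytes_base = os.path.basename(bytes_paths[0])

# pathlib accepts the str result but rejects the bytes one.
as_path = pathlib.Path(str_paths[0])
try:
    pathlib.Path(bytes_paths[0])
    bytes_rejected = False
except TypeError:
    bytes_rejected = True
```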

Next, if the variable passed to os.scandir() has the declared or inferred
type AnyStr then mypy will know that it can be either str or bytes and the
types of results will also use AnyStr. I think in that case you'll get an
error if you pass it to pathlib. Note that this can only happen inside a
generic class or a generic function that has AnyStr as one of its
parameters. (AnyStr is itself a type variable.)
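
For instance, a generic function along these lines (entry_names() is a
made-up name, and the annotations are only a sketch of what a checker would
be asked to verify) keeps the str/bytes flavour of its argument:

```python
import os
import tempfile
from typing import AnyStr, List


def entry_names(directory: AnyStr) -> List[AnyStr]:
    """Return the sorted entry names, in the same string type as `directory`."""
    with os.scandir(directory) as it:
        # Inside this function the checker only knows the type is AnyStr.
        return sorted(entry.name for entry in it)


with tempfile.TemporaryDirectory() as tmp:
    for filename in ("a.txt", "b.txt"):
        open(os.path.join(tmp, filename), "w").close()
    str_names = entry_names(tmp)
    bytes_names = entry_names(os.fsencode(tmp))
```

At each call site the checker can substitute str or bytes for AnyStr, but
within the body of entry_names() the names cannot be assumed to be str.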

The story ought to be similar if the variable has the type Union[str,
bytes], except that this can occur in non-generic code and the resulting
types are similarly fuzzy. (I think there's a bug in mypy around this
though; follow https://github.com/python/mypy/issues/1533 if you're
interested in how that turns out.)

--Guido van Rossum (python.org/~guido)