On Sun, 15 May 2016 at 10:21 Guido van Rossum <guido@python.org> wrote:
I didn't have time to read the thread, but I read the PEP and thought about this a little bit.

One key thing is that we can write the code CPython sees at runtime one way, and write the stubs that type checkers (like mypy) see a different way. The stubs go in the typeshed repo (https://github.com/python/typeshed) and I would add something like the following to the os module there (stdlib/3/os/__init__.pyi in the repo).

First we need to add scandir() and DirEntry (this is not entirely unrelated -- DirEntry is an example of something that is PathLike). Disregarding the PathLike protocol for the moment, I think they can be defined like this:

if sys.version_info >= (3, 5):
    class DirEntry(Generic[AnyStr]):
        name = ...  # type: AnyStr
        path = ...  # type: AnyStr
        def inode(self) -> int: ...
        def is_dir(self, *, follow_symlinks: bool = ...) -> bool: ...
        def is_file(self, *, follow_symlinks: bool = ...) -> bool: ...
        def is_symlink(self) -> bool: ...
        def stat(self, *, follow_symlinks: bool = ...) -> stat_result: ...

    @overload
    def scandir(path: str = ...) -> Iterator[DirEntry[str]]: ...
    @overload
    def scandir(path: bytes) -> Iterator[DirEntry[bytes]]: ...

Note that the docs claim there's a type os.DirEntry, even though it doesn't currently exist -- I think we should fix that in 3.6 even if it may not make sense to instantiate it.

http://bugs.python.org/issue27038

(and AnyStr isn't documented, so http://bugs.python.org/issue26141).
 

Also note that a slightly different overload is also possible -- I think these are for all practical purposes the same:

    @overload
    def scandir() -> Iterator[DirEntry[str]]: ...
    @overload
    def scandir(path: AnyStr) -> Iterator[DirEntry[AnyStr]]: ...

The reason we need the overload in all cases is that os.scandir() without arguments yields entries whose name and path are str, so the no-argument case has to be pinned to str explicitly.
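To see the same overload pattern outside a stub, here is a minimal sketch of an ordinary function annotated that way. The helper name entry_names is hypothetical (it is not part of os), and the snippet assumes Python 3.6+ so that os.scandir() can be used as a context manager.

```python
import os
import tempfile
from typing import AnyStr, List, overload


# Hypothetical helper, annotated with the "no argument means str" overload.
@overload
def entry_names() -> List[str]: ...
@overload
def entry_names(path: AnyStr) -> List[AnyStr]: ...


def entry_names(path=None):
    """Return the sorted entry names found in a directory."""
    # With no argument we scan '.', a str, so the names come back as str.
    with os.scandir("." if path is None else path) as entries:
        return sorted(entry.name for entry in entries)


with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "spam.txt"), "w").close()
    str_names = entry_names(tmp)                 # str directory
    bytes_names = entry_names(os.fsencode(tmp))  # bytes directory
```

At runtime only the last, undecorated definition is called; the two @overload signatures exist purely for the type checker.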

Finally, a reminder that this is all stub code -- it's only ever seen by type checkers. What we put in the actual os.py file in the stdlib can be completely different, and it doesn't need type annotations (type checkers always prefer the stubs over the real code).

Now let's add PathLike. This first attempt doesn't address DirEntry yet:

if sys.version_info >= (3, 6):
    from abc import abstractmethod
    class PathLike(Generic[AnyStr]):
        @abstractmethod
        def __fspath__(self) -> AnyStr: ...

    @overload
    def fspath(path: PathLike[AnyStr]) -> AnyStr: ...
    @overload
    def fspath(path: AnyStr) -> AnyStr: ...

This tells a type checker enough so that it will know that e.g. os.fspath(b'.') returns a bytes object. Also, if we have a class C that derives from PathLike we can make it non-generic, e.g. the stubs for pathlib.Path would start with something like

class Path(os.PathLike[str]):
    ...

and now the type checker will know that in the following code `c` is always a str:

a = ...  # type: Any
b = pathlib.Path(a)
c = os.fspath(b)
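The runtime side of this can be checked directly. The following is a small sketch, assuming Python 3.6+ (where os.fspath() and os.PathLike exist); the class ConfigLocation is a made-up example of a non-generic, str-only PathLike and is not part of any library.

```python
import os
import pathlib


class ConfigLocation(os.PathLike):
    """Hypothetical str-only path object, analogous to pathlib.Path."""

    def __init__(self, name: str) -> None:
        self._name = name

    def __fspath__(self) -> str:
        # Always returns str, so os.fspath() on this class gives str.
        return os.path.join("etc", self._name)


from_pathlib = os.fspath(pathlib.Path("spam"))      # str
from_custom = os.fspath(ConfigLocation("app.ini"))  # str
from_bytes = os.fspath(b".")                        # bytes, passed through
```

This only shows what happens when the code runs; the static guarantee that the first two results are str comes from the stubs.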

Finally let's redefine scandir(). We'll have to redefine DirEntry to inherit from PathLike, and it will remain generic:

class DirEntry(PathLike[AnyStr], Generic[AnyStr]):
    # Everything else unchanged!
    ...

Now the type checker should understand the following:

for a in os.scandir('.'):
    b = os.fspath(a)
    ...

Here it will know that `a` is a DirEntry[str] (because the argument given to os.scandir() is a str)

Which works because AnyStr is a TypeVar (if anyone else was wondering like I was why that worked since AnyStr isn't documented yet).
 
and hence it will also know that b is a str. Now if you then pass b to pathlib it will understand this cannot be a type error, and if you pass b to some os.path.* function (e.g. os.path.basename()) it will understand the return value is a str.

If you pass some variable to os.scandir() and the type checker can deduce that the variable is a str (e.g. because you've gotten it from pathlib), it will know that the results are DirEntry[str] instances. If you pass something to os.scandir() that's a bytes object it will know that the results are DirEntry[bytes] objects, and it knows that calling os.fspath() on those will return bytes. (And it will know that you can't pass those to pathlib, but you *can* pass them to most os and os.path functions.)
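The runtime behaviour matches what the type checker is being told here. A quick sketch, assuming Python 3.6+ (DirEntry supports os.fspath() from that version on); the temporary directory and the file name spam.txt are just scaffolding for the demonstration.

```python
import os
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "spam.txt"), "w").close()

    # str directory in, str paths out.
    with os.scandir(tmp) as entries:
        str_paths = [os.fspath(entry) for entry in entries]

    # bytes directory in, bytes paths out.
    with os.scandir(os.fsencode(tmp)) as entries:
        bytes_paths = [os.fspath(entry) for entry in entries]

# os.path functions accept either kind and return the same kind.
str_base = os.path.basename(str_paths[0])
bytes_base = os.path.basename(bytes_paths[0])

# pathlib accepts the str result...
str_as_path = pathlib.Path(str_paths[0])

# ...but refuses the bytes one.
try:
    pathlib.Path(bytes_paths[0])
except TypeError:
    bytes_rejected = True
else:
    bytes_rejected = False
```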

Next, if the variable passed to os.scandir() has the declared or inferred type AnyStr then mypy will know that it can be either str or bytes and the types of results will also use AnyStr. I think in that case you'll get an error if you pass it to pathlib. Note that this can only happen inside a generic class or a generic function that has AnyStr as one of its parameters. (AnyStr is itself a type variable.)
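As a sketch of the generic-function case: the helper child_paths below is hypothetical, and it assumes Python 3.6+. I would expect a checker using stubs along the lines discussed here to accept it, but I have not verified that against any particular mypy version.

```python
import os
import tempfile
from typing import AnyStr, List


def child_paths(directory: AnyStr) -> List[AnyStr]:
    """Hypothetical generic helper: str in gives str out, bytes in gives bytes out."""
    # Inside this function `directory` is "either str or bytes", so the
    # entries and the os.fspath() results are typed in terms of AnyStr too.
    with os.scandir(directory) as entries:
        return sorted(os.fspath(entry) for entry in entries)


with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "eggs.txt"), "w").close()
    from_str = child_paths(tmp)
    from_bytes = child_paths(os.fsencode(tmp))
```

Passing one of these results to pathlib inside child_paths() is where I would expect the checker to complain, since the value might be bytes.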

The story ought to be similar if the variable has the type Union[str, bytes], except that this can occur in non-generic code and the resulting types are similarly fuzzy. (I think there's a bug in mypy around this though; follow https://github.com/python/mypy/issues/1533 if you're interested in how that turns out.)

This might make a nice example in the docs and/or blog post since this is hitting the intermediate/advanced space for typing that almost none of us have hit.