
This seems to be what I thought too, except completely in the stubs. I might add some comments in the blog post draft.

-- Koos

On Sun, May 15, 2016 at 8:21 PM, Guido van Rossum <guido@python.org> wrote:
I didn't have time to read the thread, but I read the PEP and thought about this a little bit.
One key thing is that we can write the code CPython sees at runtime one way, and write the stubs that type checkers (like mypy) see a different way. The stubs go in the typeshed repo (https://github.com/python/typeshed) and I would add something like the following to the os module there (stdlib/3/os/__init__.pyi in the repo).
First we need to add scandir() and DirEntry (this is not entirely unrelated -- DirEntry is an example of something that is PathLike). Disregarding the PathLike protocol for the moment, I think they can be defined like this:
    if sys.version_info >= (3, 5):
        class DirEntry(Generic[AnyStr]):
            name = ...  # type: AnyStr
            path = ...  # type: AnyStr
            def inode(self) -> int: ...
            def is_dir(self, *, follow_symlinks: bool = ...) -> bool: ...
            def is_file(self, *, follow_symlinks: bool = ...) -> bool: ...
            def is_symlink(self) -> bool: ...
            def stat(self, *, follow_symlinks: bool = ...) -> stat_result: ...

        @overload
        def scandir(path: str = ...) -> DirEntry[str]: ...
        @overload
        def scandir(path: bytes) -> DirEntry[bytes]: ...
Note that the docs claim there's a type os.DirEntry, even though it doesn't currently exist -- I think we should fix that in 3.6 even if it may not make sense to instantiate it.
Also note that a slightly different overload is also possible -- I think these are for all practical purposes the same:
    @overload
    def scandir() -> DirEntry[str]: ...
    @overload
    def scandir(path: AnyStr) -> DirEntry[AnyStr]: ...
The reason we need the overload in all cases is that os.scandir() without arguments produces str-based results, i.e. DirEntry[str].
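As a runtime illustration of why the no-argument case needs its own overload, here is a minimal sketch. It is not stub code; the temporary directory and the file name are made up for the example, and using os.scandir() as a context manager assumes Python 3.6 or later. It shows that the string type of the entries follows the type of the argument, and that omitting the argument gives str-based entries:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create one file so that the directory listing is not empty.
    with open(os.path.join(tmp, "example.txt"), "w") as f:
        f.write("hello")

    # A str argument yields entries whose name and path are str.
    with os.scandir(tmp) as it:
        str_names = [entry.name for entry in it]

    # A bytes argument yields entries whose name and path are bytes.
    with os.scandir(os.fsencode(tmp)) as it:
        bytes_names = [entry.name for entry in it]

    # No argument means the current directory; the entries are str-based.
    old_cwd = os.getcwd()
    os.chdir(tmp)
    try:
        with os.scandir() as it:
            default_names = [entry.name for entry in it]
    finally:
        # Leave the directory again so that it can be removed.
        os.chdir(old_cwd)

print(str_names, bytes_names, default_names)
```

A single signature with a str default cannot express the bytes case, and a single AnyStr signature has nothing to bind AnyStr to when the argument is omitted, hence the two overloads.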
Finally, a reminder that this is all stub code -- it's only ever seen by typecheckers. What we put in the actual os.py file in the stdlib can be completely different, and it doesn't need type annotations (type checkers always prefer the stubs over the real code).
Now let's add PathLike. This first attempt doesn't address DirEntry yet:
    if sys.version_info >= (3, 6):
        from abc import abstractmethod

        class PathLike(Generic[AnyStr]):
            @abstractmethod
            def __fspath__(self) -> AnyStr: ...

        @overload
        def fspath(path: PathLike[AnyStr]) -> AnyStr: ...
        @overload
        def fspath(path: AnyStr) -> AnyStr: ...
This tells a type checker enough so that it will know that e.g. os.fspath(b'.') returns a bytes object. Also, if we have a class C that derives from PathLike we can make it non-generic, e.g. the stubs for pathlib.Path would start with something like
class Path(os.PathLike[str]): ...
and now the type checker will know that in the following code `c` is always a str:
    a = ...  # type: Any
    b = pathlib.Path(a)
    c = os.fspath(b)
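The runtime behavior that these stubs describe can be checked with a small sketch, assuming Python 3.6 or later (where os.fspath() exists). BytesLocation is a hypothetical class invented for this example; it only illustrates that a path-like class may be bound to bytes in the same way that pathlib.Path is bound to str:

```python
import os
import pathlib


class BytesLocation:
    """Hypothetical path-like class whose __fspath__ returns bytes."""

    def __init__(self, raw: bytes) -> None:
        self.raw = raw

    def __fspath__(self) -> bytes:
        return self.raw


from_str = os.fspath("spam/eggs.txt")        # str is returned unchanged
from_bytes = os.fspath(b".")                 # bytes is returned unchanged
from_path = os.fspath(pathlib.Path("spam"))  # pathlib paths always give str
from_custom = os.fspath(BytesLocation(b"/tmp/data"))  # bytes via __fspath__

print(type(from_str), type(from_bytes), type(from_path), type(from_custom))
```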
Finally let's redefine scandir(). We'll have to redefine DirEntry to inherit from PathLike, and it will remain generic:
    class DirEntry(PathLike[AnyStr], Generic[AnyStr]):
        # Everything else unchanged!
Now the type checker should understand the following:
    for a in os.scandir('.'):
        b = os.fspath(a)
        ...
Here it will know that `a` is a DirEntry[str] (because the argument given to os.scandir() is a str) and hence it will also know that b is a str. Now if you then pass b to pathlib, it will understand that this cannot be a type error, and if you pass b to some os.path.* function (e.g. os.path.basename()), it will understand that the return value is a str.
If you pass some variable to os.scandir() then if the type checker can deduce that that variable is a str (e.g. because you've gotten it from pathlib) it will know that the results are DirEntry[str] instances. If you pass something to os.scandir() that's a bytes object it will know that the results are DirEntry[bytes] objects, and it knows that calling os.fspath() on those will return bytes. (And it will know that you can't pass those to pathlib, but you *can* pass them to most os and os.path functions.)
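The str and bytes cases described above can also be observed at runtime. The following is a minimal sketch, assuming Python 3.6 or later (where DirEntry implements __fspath__); the temporary directory and the file name are made up for the example:

```python
import os
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "example.txt"), "w") as f:
        f.write("hello")

    # str in, so os.fspath() on each entry gives str.
    with os.scandir(tmp) as it:
        str_paths = [os.fspath(entry) for entry in it]

    # bytes in, so os.fspath() on each entry gives bytes.
    with os.scandir(os.fsencode(tmp)) as it:
        bytes_paths = [os.fspath(entry) for entry in it]

# os.path functions accept either type and return the same type.
str_base = os.path.basename(str_paths[0])
bytes_base = os.path.basename(bytes_paths[0])

# pathlib only accepts str-based paths; bytes raises TypeError at runtime.
try:
    pathlib.Path(bytes_paths[0])
    bytes_accepted = True
except TypeError:
    bytes_accepted = False

print(str_base, bytes_base, bytes_accepted)
```

With the stubs sketched in this message, a type checker would be expected to flag the pathlib.Path() call statically instead of leaving it to fail at runtime.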
Next, if the variable passed to os.scandir() has the declared or inferred type AnyStr then mypy will know that it can be either str or bytes and the types of results will also use AnyStr. I think in that case you'll get an error if you pass it to pathlib. Note that this can only happen inside a generic class or a generic function that has AnyStr as one of its parameters. (AnyStr is itself a type variable.)
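A generic function of the kind mentioned here might look like the following sketch. The name entry_paths is hypothetical and chosen only for illustration; the temporary directory and the file name are likewise made up, and Python 3.6 or later is assumed:

```python
import os
import tempfile
from typing import AnyStr, List


def entry_paths(directory: AnyStr) -> List[AnyStr]:
    """Return the paths of the entries in `directory`, in the same string type."""
    with os.scandir(directory) as it:
        # Inside this function the entries are DirEntry[AnyStr], so
        # os.fspath() gives AnyStr: str for a str call, bytes for a bytes call.
        return sorted(os.fspath(entry) for entry in it)


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "example.txt"), "w") as f:
        f.write("hello")

    as_str = entry_paths(tmp)
    as_bytes = entry_paths(os.fsencode(tmp))

print(as_str, as_bytes)
```

Inside entry_paths() the type checker cannot assume str, which is why passing one of the entries to pathlib there would be expected to produce an error, while each call site gets a concrete List[str] or List[bytes].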
The story ought to be similar if the variable has the type Union[str, bytes], except that this can occur in non-generic code and the resulting types are similarly fuzzy. (I think there's a bug in mypy around this though; follow https://github.com/python/mypy/issues/1533 if you're interested in how that turns out.)
--
--Guido van Rossum (python.org/~guido)