[Python-ideas] Type hinting for path-related functions

Koos Zevenhoven k7hoven at gmail.com
Mon Apr 18 20:40:41 EDT 2016


I actually proposed this already in one of the pathlib threads on
python-dev, but I decided to repost here, because this is easily seen
as a separate issue. I'll start with some introduction, then move on
to the actual type hinting part.

Our seemingly never-ending discussions about pathlib support in the
stdlib, spread over various threads, first here on python-ideas, then
even more extensively on python-dev, have perhaps almost converged. The
required changes involve a protocol method, probably named __fspath__,
which any path-like type could implement to return a more, let's say,
"classical" path object such as a str. However, the protocol is
polymorphic and may also return bytes, which has a lot to do with the
fact that the stdlib itself is polymorphic and currently accepts str as
well as bytes paths almost everywhere, including the newly introduced
os.scandir + DirEntry combination. The upcoming improvements will
further allow passing pathlib path objects as well as DirEntry objects
to any stdlib function that takes paths.

It came up, for instance here [1], that the function associated with
the protocol, potentially named os.fspath, will end up needing type
hints. This function takes path-like objects and turns them into str or
bytes. There are various scenarios [2] that can be considered for code
dealing with paths, but let's consider the case of os.path.* and other
traditional Python path-related functions.
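
To make the intended behaviour of that function concrete, here is a
rough sketch. The name fspath, the error messages and the exact checks
are my assumptions, not an agreed design, and the class Wrapped is a
purely hypothetical path object used only for illustration.

```python
def fspath(path):
    """Rough stand-in for the proposed os.fspath (name and details assumed)."""
    # Plain str and bytes paths pass through unchanged.
    if isinstance(path, (str, bytes)):
        return path
    # Otherwise look up the protocol method on the object's type.
    method = getattr(type(path), "__fspath__", None)
    if method is None:
        raise TypeError("expected str, bytes or a path-like object, not "
                        + type(path).__name__)
    result = method(path)
    # The protocol is polymorphic: both str and bytes are acceptable results.
    if not isinstance(result, (str, bytes)):
        raise TypeError("__fspath__ must return str or bytes")
    return result


class Wrapped:
    """Hypothetical third-party path object wrapping a str or bytes path."""

    def __init__(self, raw):
        self._raw = raw

    def __fspath__(self):
        return self._raw
```

With this, fspath(Wrapped("spam")) gives back a str and
fspath(Wrapped(b"spam")) gives back bytes, while an object without
__fspath__ is rejected with a TypeError.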

Some examples:

os.path.join

Currently, it takes str or bytes paths and returns a joined path of
the same type (mixing different types raises an exception).
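
The current behaviour is easy to check in an interpreter. A small
sketch (the path components are arbitrary examples, and the variable
names are only for illustration):

```python
import os.path

# Both arguments str: the result is str.
joined_str = os.path.join("usr", "lib")

# Both arguments bytes: the result is bytes.
joined_bytes = os.path.join(b"usr", b"lib")

# Mixing str and bytes is rejected.
try:
    os.path.join("usr", b"lib")
except TypeError as exc:
    mixed_error = exc
else:
    mixed_error = None
```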

In the future, it will also accept pathlib objects (underlying type
always str) and DirEntry (underlying type str or bytes) or third-party
path objects (underlying type str or bytes). The function will then
return a pathname of the underlying type.

os.path.dirname

Currently, it takes a str or bytes and returns the dirname of the same type.
In the future, it will also accept Path and DirEntry and return the
underlying type.
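
As a sketch of what that future behaviour could amount to, assuming the
protocol method ends up being called __fspath__: the wrapper below
unwraps a rich path object and then delegates to today's
os.path.dirname. The names future_dirname and StrPath are hypothetical
and exist only for this example.

```python
import os.path


class StrPath:
    """Hypothetical rich path object with an underlying type of str."""

    def __init__(self, raw):
        self._raw = raw

    def __fspath__(self):
        return self._raw


def future_dirname(p):
    # str and bytes are used as they are; anything else is asked for
    # its underlying path through the (assumed) __fspath__ method.
    if isinstance(p, (str, bytes)):
        raw = p
    else:
        raw = type(p).__fspath__(p)
    # The result has the underlying type, not the type of the rich object.
    return os.path.dirname(raw)
```

So future_dirname(StrPath("spam/eggs.txt")) returns a plain str, not a
StrPath.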

Let's consider the type hint of os.path.dirname at present and in the future:

Currently, one could write

from typing import Union

def dirname(p: Union[str, bytes]) -> Union[str, bytes]:
    ...

While this is valid, it could be more precise:

import typing

pathstring = typing.TypeVar('pathstring', str, bytes)

def dirname(p: pathstring) -> pathstring:
    ...

This now contains the information that the return type is the same as
the argument type. The name 'pathstring' may be considered slightly
misleading because "byte strings" are not actually strings in Python
3, but at least it does not advertise the use of bytes as paths, which
is very rarely desirable.
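
To illustrate the difference, here is a small runnable sketch. The
function name parent is made up for the example, and the remarks about
what a static checker infers reflect my understanding of how
constrained TypeVars are meant to work, not the output of a particular
tool.

```python
import os.path
from typing import TypeVar

pathstring = TypeVar('pathstring', str, bytes)


def parent(p: pathstring) -> pathstring:
    # os.path.dirname already returns the same type it was given.
    return os.path.dirname(p)


# With the TypeVar, a checker can conclude str in, str out ...
text_parent: str = parent("spam/eggs.txt")

# ... and bytes in, bytes out. With Union[str, bytes] as the return
# type, both of these assignments would need a cast or isinstance check.
bytes_parent: bytes = parent(b"spam/eggs.txt")
```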

But what about the future? There are two kinds of rich path objects,
those with an underlying type of str and those with an underlying type
of bytes. These should implement the __fspath__() protocol and return
their underlying type. Because we do care about what (underlying) type
is provided by the protocol, we might want to introduce something like
typing.FSPath[underlying_type]:

FSPath[str]    # str-based pathlike, including str
FSPath[bytes]  # bytes-based pathlike, including bytes

And now, using the TypeVar pathstring defined above, the future
version of dirname would be type annotated as follows:

def dirname(p: FSPath[pathstring]) -> pathstring:
    ...
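
Since typing.FSPath does not exist, the following is only a rough
approximation of the idea with what typing offers today. The generic
class FSPath below is a hypothetical stand-in, and because I cannot
make plain str and bytes count as members of it, they are admitted
through an explicit Union instead.

```python
import os.path
from typing import Generic, TypeVar, Union

pathstring = TypeVar('pathstring', str, bytes)


class FSPath(Generic[pathstring]):
    """Hypothetical stand-in for the proposed typing.FSPath."""

    def __init__(self, raw: pathstring) -> None:
        self._raw = raw

    def __fspath__(self) -> pathstring:
        return self._raw


def dirname(p: Union[pathstring, FSPath[pathstring]]) -> pathstring:
    # Rich path objects are unwrapped to their underlying type first.
    if isinstance(p, FSPath):
        return os.path.dirname(p.__fspath__())
    # Plain str and bytes are passed through as before.
    return os.path.dirname(p)
```

Here dirname(FSPath("spam/eggs.txt")) is annotated to return str and
dirname(FSPath(b"spam/eggs.txt")) to return bytes, which is the
information the proposed FSPath[pathstring] annotation is meant to
carry.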

It's getting late. I hope this made sense :).

-Koos

[1] https://mail.python.org/pipermail/python-dev/2016-April/144246.html
[2] https://mail.python.org/pipermail/python-dev/2016-April/144239.html

