[Python-Dev] Defining a path protocol

Brett Cannon brett at python.org
Fri Apr 8 17:53:18 EDT 2016

On Fri, 8 Apr 2016 at 14:23 Koos Zevenhoven <k7hoven at gmail.com> wrote:

> On Fri, Apr 8, 2016 at 8:34 PM, Brett Cannon <brett at python.org> wrote:
> > On Fri, 8 Apr 2016 at 09:39 Ethan Furman <ethan at stoneleaf.us> wrote:
> >> > I thought the whole point of all this is that not any old string can be
> >> > a path! (whereas any int can be an index). Unless we go with Chris A's
> >> > suggestion that this be a more generic lossless string protocol, rather
> >> > than just for paths.
> >>
> >> That does seem to be a valid point against str.__fspath__.
> >
> > Yep, and I'm expecting we won't want that at this point. The fact that paths
> > need strings for low-level OS stuff is a historical and technical detail, so
> > no need to drag the entire str type into it if we can provide a reasonable
> > helper function (for either the ABC or magic method solution).
>
> I'm not sure I understand what these points are about.

It means we most likely won't add a new method to str for this.

> Anyway,
> disallowing str or bytes as pathnames will break backwards
> compatibility if done at some point in the future. There's no way
> around that.

No one is proposing disallowing str or bytes for a pre-existing API that
supports either. The whole point of this is to make APIs work with strings
and pathlib.

> But all this talk of mine about bytes is because it has not
> been completely clear to me if something can break when converting a
> bytes path to str. I did originally propose guaranteeing a str, but I
> am so far only 85% convinced that that does not cause any problems.

Depends on your definition of "problem". If you somehow blindly converted a
bytes object representing a path to a str without knowing its encoding you
will definitely break someone silently (and even os.fsdecode() isn't
fool-proof thanks to multiple encodings on a single file system).
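
To illustrate what "silently" means here, a minimal sketch. The file name and
the choice of encodings are made up for the example; the point is only that a
wrong guess never raises.

```python
# A name that was written to disk as UTF-8 bytes (illustrative value).
name_on_disk = "café.txt".encode("utf-8")

# Guessing the wrong encoding does not raise: latin-1 maps every byte to
# some character, so the result is just quietly wrong.
guessed = name_on_disk.decode("latin-1")

# The guessed str is not the original name, and encoding it back as UTF-8
# does not give the original bytes either, so the file can no longer be found.
print(guessed == "café.txt")
print(guessed.encode("utf-8") == name_on_disk)
```

Both comparisons come out False, and nothing along the way signals a problem.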

> I
> understand that fsencode(fsdecode(bytes_path)) should always be equal
> to bytes_path. But can some other path operations fail when there are
> surrogates in the strings? And again, not to forget DirEntry, which
> may have a byte string path.

At this point no one wants to touch bytes paths. If you need that level of
control because of multiple encodings within a single file system then you
will probably have to stick with managing bytes paths on your own to get
the encoding right.
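
On the round-trip question quoted above, here is a small sketch of where
surrogates can bite. On POSIX, os.fsdecode() uses the surrogateescape error
handler; the sketch spells out UTF-8 explicitly so that it does not depend on
the platform's file system encoding, and the byte string is invented for the
example.

```python
# Bytes that are not valid UTF-8 (illustrative file name).
raw = b"report-\xff.txt"

# surrogateescape smuggles the undecodable byte through as a lone surrogate.
text = raw.decode("utf-8", "surrogateescape")

# The round trip is lossless as long as the same error handler is used.
round_trips = text.encode("utf-8", "surrogateescape") == raw

# Anything that encodes strictly (writing the name to a UTF-8 log file,
# for instance) fails on the lone surrogate.
try:
    text.encode("utf-8")
except UnicodeEncodeError:
    strict_encode_failed = True
else:
    strict_encode_failed = False
```

So the round trip itself holds, but code that treats the result as ordinary
text can still hit an error later.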

And just because DirEntry supports bytes doesn't mean that any magic method
it gains has to carry that forward (it can always raise a TypeError if

> Either way, I suppose os.fspath should accept anything that has
> __fspath__ or is a str or bytes (whether these have the dunder method
> or not).

Maybe. I'm not sure if we will want to go down that route of both bytes and
str being supported out of the same function as that gets messy quickly.
The main reason os.scandir() supports it is so it can be a drop-in
replacement for os.listdir(). It really depends on how we choose to
structure the function in terms of just doing the right thing for objects
that follow the protocol or if we want to introduce some required structure
for the resulting path and implement some type guarantees so you have a
better idea of what you will be working with after calling the function.
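
As a rough sketch of the "just do the right thing" end of that spectrum, the
permissive variant might look something like this. The names fspath_sketch
and ExamplePath are hypothetical and exist only for illustration; none of
this is a settled API.

```python
class ExamplePath:
    """Hypothetical path-like object used only for this illustration."""

    def __init__(self, path):
        self._path = path

    def __fspath__(self):
        return self._path


def fspath_sketch(obj):
    """Return obj as a str or bytes path (permissive variant, an assumption)."""
    # str and bytes pass through unchanged, whether or not they have the
    # dunder method.
    if isinstance(obj, (str, bytes)):
        return obj
    # Look the method up on the type, as is usual for special methods.
    try:
        method = type(obj).__fspath__
    except AttributeError:
        raise TypeError(
            "expected str, bytes or an object with __fspath__, not "
            + type(obj).__name__
        ) from None
    return method(obj)
```

With this shape the caller gets back either str or bytes and has to cope with
both, which is the messiness mentioned above.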

> Then the options are either to return Union[str, bytes] or to
> always return str. And if the latter does not cause any problems, I
> like it way better, and it seems others would do too.

You don't have to convert bytes paths to str; you can simply raise an
exception in the face of them.
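
A minimal sketch of that str-only behaviour, assuming a helper named
fspath_str_only (a made-up name, not a proposed API):

```python
def fspath_str_only(obj):
    """Return obj as a str path or raise TypeError (strict variant, an assumption)."""
    if isinstance(obj, str):
        return obj
    # No decoding is attempted: bytes are rejected outright.
    if isinstance(obj, bytes):
        raise TypeError("bytes paths are not supported by this function")
    method = getattr(type(obj), "__fspath__", None)
    if method is None:
        raise TypeError(
            "expected str or an object with __fspath__, not "
            + type(obj).__name__
        )
    result = method(obj)
    # Enforce the type guarantee on whatever the object returned.
    if not isinstance(result, str):
        raise TypeError(
            "__fspath__ returned " + type(result).__name__ + ", expected str"
        )
    return result
```

Callers then always know they are holding a str, and anyone who needs bytes
keeps handling them explicitly.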

> And in that case
> it would probably be time to deprecate bytes paths on posix too (on
> Windows, this is already the case).

Can't do that as Stephen Turnbull will tell you. :) At best we can
marginalize the support of bytes-based paths to only low-level APIs exposed
through the os package.
