[Python-Dev] pathlib - current status of discussions

Thu Apr 14 13:56:54 EDT 2016

On Thu, Apr 14, 2016 at 7:46 PM, Ethan Furman <ethan at stoneleaf.us> wrote:
>
> What many folks seem to be missing is that *you* (generic you) have control
> of your data.
>
> If you are not working at the bytes layer, you shouldn't be getting bytes
> objects because:
>
> - you specified str when asking for data from the OS, or
> - you transformed the incoming bytes from whatever external source
>   to str when you received them.

There is an apparent contradiction of the above with some previous
posts, including your own. Let me try to fix it:

Code that deals with paths can be divided in groups as follows:

(1) Code that has access to pathname/filename data and has some level
of control over what data type comes in. This code may for instance
choose to deal with either bytes or str

(2) Code that takes the path or file name that it happens to get and
does something with it. This type of code can be divided into
subgroups as follows:

  (2a) Code that accepts only one type of paths (e.g. str, bytes or
pathlib) and fails if it gets something else.

  (2b) Code that wants to support different types of paths such as
str, bytes or pathlib objects. This includes os.path.*, os.scandir,
and various other standard library code. Presumably there is also
third-party code that does the same. These functions may want to
preserve the str-ness or bytes-ness of the paths in case they return
paths, as the stdlib now does. But new code may even want to return
pathlib objects when they get such objects as inputs. This is the
duck-typing or polymorphic code we have been talking about. Code of
this type (2b) may want to avoid implicit conversions because it makes
the life of code of the other types more difficult.

(feel free to fill in more categories of code)

So the code of type (2b) is trying to make all categories happy by
returning objects of the same type that it gets as input, while the
other categories are probably in the situation where they don't
necessarily need to make other categories of code happy.

And the question is this: Do we need to make code using both bytes
*and* scandir happy? This is largely the same question as whether we
have to support bytes in addition to str in the protocol.

(We may of course talk about third-party path libraries that have the
same problem as scandir's DirEntry. Ethan's library is not exactly in
the same category as DirEntry since its path objects *are* instances
of bytes or str and therefore do not need this protocol to begin with,
except perhaps for conversions from other high-level path types so
that different path libraries work together nicely).

-Koos