
On Tue, 29 Mar 2016 at 14:00 Sven R. Kunze srkunze@mail.de wrote:
On 29.03.2016 20:51, Brett Cannon wrote:
I may regret this, but I will give this a shot. :)
Never, Brett. I think everybody here appreciate a decent discussion. Who knows maybe, somebody advocating for path->str misses an important piece which in turn misses the PEP to describe clearly.
I wrote a blog post at http://www.snarky.ca/why-pathlib-path-doesn-t-inherit-from-str because my response was getting a bit long. I have pasted it below in CommonMark.
Thanks +1
So, let's see:
Over on [python-ideas](https://mail.python.org/mailman/listinfo/python-ideas) a discussion has broken out about somehow trying to make `p'/some/path/to/a/file` return an [instance of `pathlib.Path`](
https://docs.python.org/3/library/pathlib.html#pathlib.Path).
This led to a splinter discussion as to why `pathlib.Path` doesn't inherit from `str`? I figured instead of burying my response to this question in the thread I'd blog about it to try and explain one approach to API design.
I think the key question in all of this is whether paths are semantically equivalent to a sequence of characters? Obviously the answer is "no" since paths have structure, directly represent something like files and directories, etc. Now paths do have a serialized representation as strings -- at least most of the time, but I'm ignoring Linux and the crazy situation of binary paths -- which is why C APIs take in `char *` as the representation of paths (and because C tends to make one try to stick with the types that are part of the C standard). So not all strings represent paths, but all paths can be represented by strings. I think we can all agree with that.
OK, so if all paths can be represented as strings, why don't we just make `pathlib.Path` subclass `str`? Well, as I said earlier, not all strings are paths. You can't concatenate a string representing a path with some other random string and expect to get back a string that still represents a valid string (and I'm not talking about "valid" as in "the file doesn't exist", I'm talking about "syntactically not possible"). This is what [PEP 428](https://www.python.org/dev/peps/pep-0428/) is talking about when it says:
Not behaving like one of the basic builtin types [list str] also
minimizes the potential for confusion if a path is combined by accident with genuine builtin types [like str].
Valid string? I assume you mean path, right?
Yes.
Now at this point someone someone will start saying, "but Brett, '[practicality beats purity](https://www.python.org/dev/peps/pep-0020/)' and all of those pre-existing, old OS APIs want a string as an argument!" And you're right, in terms of practicality it would be easier for `pathlib.Path` to inherit from `str` ... for the short/medium term. One thing people are forgetting is that "[explicit is better than implicit](https://www.python.org/dev/peps/pep-0020/)" as well and as with most design decisions there's not one clear answer. Implicitly treating a path as a string currently works, but that's just because we have inherited a suboptimal representation for paths from C. Now if you use an explicit representation of paths like `pathlib.Path` then you gain semantic separation and understanding of what you are working with. You also avoid any issues that come with implicit `str` compatibility as I pointed out earlier.
I agree with the concatenation issue when using plain strings for paths.
However, something that I cannot leave uncommented is "suboptimal representation for paths". What would your optimal representation for paths look like?
I don't know, but I do like pathlib.Path more than a plain string.
I cannot believe that the current representation is so bad and has been for so long and nobody, really nobody, has anything done about it.
I never said it was bad, just suboptimal. If you want you can say using strings as paths is mediocre. I really don't want to argue about the label, just that pathlib.Path is a better representation of a path than a plain string and that strings on their own are nothing that special in terms of representing a string that demands we defend it on its own grounds as such.
And before anyone says, "so what?", think about this: Python 2/3. While I'm sure some will say I'm overblowing the comparison, but Python 3 came about because the implicit compatibility of binary and textual data in Python 2 caused major headaches to the point that we made a backwards-incompatible change that has caused widespread ramifications (but which we seem to be coming out on the other side of). By not inheriting from `str`, `pathlib.Path` has avoided a similar potential issue from the get-go. Or another way of looking at it is to ask why doesn't `dict` inherit from `str` so that it's easier to make it easier to stick in the body of an HTTP response so it can be implicitly treated as JSON? If you going, "eww, no because they are different types" then you at least understand the argument I'm trying to make (if you're going "I like that idea", then I think [Perl](https://www.perl.org/) might be more to your liking than Python and that's fine since the two languages just have different designs).
I think most "practicality beats purity" folks don't want that either. They are just bloody lazy. They actually want the benefits of both, the pure datastructure with its convenience methods and the dirty str-like thing with its convenience methods.
People don't like it when somebody takes one bag of convenience away from them if they easily can have both. ;-)
That's fine, but I also don't want to have the worry of latent bugs lurking in code either due to implicit type compatibility and so that clashes in this instance.
Better sell it differently. So, let's read on. ;)
Now back to that "practicality beats purity" side of this. Obviously there are lots of APIs out there that take a string as an argument for a file path. And I understand people don't like using `str(path)` or the upcoming/new [`getattr(path, 'path', path)` idiom](
https://docs.python.org/3/library/pathlib.html#pathlib.PurePath.path)
to work around the limitations forced upon them by other APIs that only take a `str` because they occasionally forget it. For this point, I have two answers.
Sorry to interrupt your here but you missed one important piece (didn't we agree on selling this better?): people do nasty little string manipulations/analysis/regex with their paths for whatever reasons. Right now, it's easier than ever because, well, because paths are strings. :)
Sure, but that doesn't mean the situation can't be improved. You can also get the string out of Path objects to hack on, but that doesn't mean you should be shipping paths around in your code as strings either. I'm not saying you *can't* use strings to represent a path, just that it isn't the best option available from a higher-level perspective.
Selling pathlib with the "getattr(path, 'path', path)" idiom is not going to help those fellows. I mean, are you serious? It looks ridiculous. :-/
I disagree.
Especially the point that a "path" has a "path" is not really getting into my head. Either a "path" is a "path" or it better be named "file/directory" which has a "path".
path.path vs file.path/dir.path
You need to work on your selling skills, Brett. ;-)
One is still "explicit is better than implicit" and you should have tests to begin with to catch when you forget to convert your `pathlib.Path` objects to `str` as needed. I'm just one of those programmers who's willing to type a bit more for easy-to-read code that's less error-prone within reason (and I obviously view this as reasonable).
But more importantly, why don't you work with the code and/or projects that are forcing you to convert to `str` to start accepting `pathlib` objects as well for those same APIs? If projects start to work on making `pathlib` objects acceptable anywhere a path is accepted then once Python 3.4 is the oldest version of Python that is supported by the project then they can start considering dropping support for `str` as paths in new releases (obviously this also includes dropping support for Python 2 unless people use [pathlib2](https://pypi.python.org/pypi/pathlib2/) https://pypi.python.org/pypi/pathlib2/%29). And I fully admit the stdlib is not exempt from a place that needs updating, so for `importlib` I have [opened an issue](http://bugs.python.org/issue26667) to update it to show this isn't just talk on my end (unfortunately there's some unique bootstrapping problems to import where stuff like `sys.path` probably can't be updated since that has to exist before you can import `pathlib` and that sort of thing ripples out, but I'm hoping I can update at least some things and this is a very unique case of potentially needing to stick with `str`). Hopefully if people start asking for `pathlib` support from projects they will add it, eventually leading to an alleviation of the desire to have `pathlib.Path` inherit from `str`.
I think updating the stdlib looks sufficiently simple to start hacking CPython. So, I would be willing to help here once the github repo is up and running. :)
I suspect simple fixes like updating the docs and using `getattr(p, 'path', p)` in the right places will be the kind of quick fixes the GitHub migration will encourage.
-Brett
Nice post, Brett. Seems you covered the most important point quite well and also gave an explanation of why people request path->str.
Best, Sven