Making pathlib paths inherit from str

Hi all, Recently, I have spent quite a lot of time thinking about what should be done to improve the situation of pathlib. Last week, after all kinds of wild ideas and trying hard every night to prove to myself that duck-typing-like compatibility with other path-representing objects is sufficient, I noticed that I had failed to prove that and that I had gone around a full circle back to where I started: "why can't paths subclass ``str``?" In the "Working with Path objects: p-strings?" thread, I said I was working on a proposal. Since it's been several days already, I think i should post it here and get some feedback before going any further. Maybe I should have done that even earlier. Anyway, there are some rough edges, and I will need to add links to references etc. So, do not hesitate to give feedback or criticism, which is especially appreciated it you take the time to read through the whole thing first :). You can read a github-rendered version here: https://github.com/k7hoven/strpathlib/blob/master/proposal.rst And the raw rst is here below: Proposal: Making path objects inherit from ``str``. =================================================== Abstract -------- This proposal addresses issues that limit the usability of the object-oriented filesystem path objects, most notably those of ``pathlib``, introduced in the standard library in Python 3.4 via PEP 0428. One issue being that the path objects are not directly compatible with nearly any libraries, including the standard library. A further goal of this proposal is to provide a smooth transition into a Python with better Path handling, while keeping backwards compatiblity concerns to a minimum. The approach involves making the path classes in ``pathlib`` (and optionally also DirEntry) subclasses of ``str``, but takes further measures to avoid problems and unnecessary additions. Introduction ------------ Filesystem paths are strings that give instructions for traversing a directory tree. In Python, they have traditionally been represented as byte strings, and more recently, unicode string. However, Python now has ``pathlib`` in the standard library, which is an object-oriented library for dealing with objects specialized in representing a path and working with it. In this proposal, such objects are generally referred to as *path objects*, or sometimes, in the specific context of instances of the ``pathlib`` path classes, they are explicitly referred to as ``pathlib`` objects. In ``pathlib`` there is a hierarchy of path classes with a common base class ``PurePath``. It has a subclass ``Path`` which essentially assumes the path is intended to represent a path on the current system. However, both of these classes, when called, instantiate a subclass of the ``Windows`` or ``Posix`` flavor, which have slightly different behavior. In total, there are thus five public classes: ``PurePath``, ``PurePosixPath``, ``PureWindowsPath``, ``Path``, ``PosixPath`` and ``WindowsPath``. Since Python 3.5 and the introduction of ``os.scandir``, the family of path classes has a new member, ``DirEntry``, which is a performance-oriented path object with significant duck-typing compatibility with ``pathlib`` objects. The adoption of the different types of path objects is still quite low, which is perhaps unsurprising, because they were only introduced very recently. However, it can also be inconvenient to work with these objects, because, they usually need to be explicitly converted into strings before passing them to functions, and path strings returned by functions need to be explicitly converted into path objects. Especially the latter issue is difficult in terms of backwards compatibility of APIs. While many things were recently discussed on Python ideas regarding the future of path-like objects, this proposal has a much more limited scope, to provide first steps in the right direction. However, the last part of this proposal considers possible future directions that this may optionally lead to. Rationale --------- Filesystem paths (or comparable things like URIs) are strings of characters that represent information needed to access a file or directory (or other resource). In other words, they form a subset of strings, involving specialized functionality such as joining absolute and relative paths together, accessing different parts of the path or file name, and even accessing the resources the path points to. In Python terms, for a path ``path``, one would have ``isinstance(path, str)``. It is also clear that not all strings are paths. On the one hand, this would make an ideal case for making all path-representing objects inherit from ``str``; while Python tries not to over-emphasize object-oriented programming and inheritance, it should not try to avoid class hierarchies when they are appropriate in terms of both purity and practicality. Regarding practicality, making specialized *path objects* also instances of ``str`` would make almost any stdlib or third-party function accept path objects as path arguments, assuming that they accept any instance of ``str``. Furthermore, functions now returning instances of ``str`` to represent paths could in future versions return path objects, with only minor backwards-incompatibility worries. On the other hand, strings are a very general concept, and the Python ``str`` class provides a large variety of methods to manipulate and work with them, including ``.split()``, ``.find()``, ``.isnumeric()`` and ``.join()``. These operations may be defined just as well for a string that represents a path than for any other string. In fact, this is the status quo in Python, as the adoption of ``pathlib`` is still quite limited and paths are in most cases represented as strings (sometimes byte strings). But while the string operations are *defined* on path-representing strings, the results of these operations may not be of any use in most cases, even if in some cases, they may be. While it is not the responsibility of the programming language to prevent doing things that are not useful, it may be practical in some cases. For instance, the string method ``.find()`` could be mistaken to mean finding files on the file system, while it in fact searches for a substring. String concatenation, in turn, can be a perfectly reasonable thing to do: ``show_msg("Data saved to file: " + file_path)``. The result of the concatenation of a string and a path is not a path, but a general string. Directly concatenating two path objects together as strings, however, likely has no sensible use cases. There is prior art in subclassing the Python ``str`` type to build a path object type. Packages on PyPI (TODO: list more?) that do this include ``path.py`` and ``antipathy``. The latter also supports ``bytes``-based paths by instantiating a different class, a subclass of ``bytes``. Since these libraries have existed for several years, experience from them is available for evaluating the potential benefits and weaknesses of this proposal (as well as other aspects regarding ``pathlib``). However, this proposal goes a step further to avoid potential problems and to provide a smooth transition plan that, if desired, can be followed to painlessly move towards a Python with a clear distinction between strings and paths. An optional long-term goal that this proposal facilitates may be to gradually move away from using strings (or even their subclasses) as paths. Specification of standard library changes ----------------------------------------- Making ``pathlib`` classes subclasses of ``str`` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Assuming the present class hierarchy in ``pathlib``, inheritance from ``str`` will be introduced by making the base class ``pathlib.PurePath`` a subclass of ``str``. Methods will further be overridden in ``PurePath`` as described in the following. Overriding all ``str``-specific methods ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Since most of the ``str`` methods are not of any use on paths and can be confusing, leading to undesired behavior, *most* ``str`` methods (including magic methods, but excluding methods listed below) are overridden in ``PurePath`` with methods that by default raise ``TypeError("str method '<name>' is not available for paths."``. This will help programmers to immediately notice when they are using the wrong method. The perhaps unusual practice of disabling most base-class methods can be regarded as being conservative in adding ``str`` functionality to path objects. All methods, including double-underscore methods are overridden, except for the following, which are *not* overridden: - Methods of the ``str`` or ``object`` types that are already overridden by ``PurePath`` - Methods of the ``object`` type that are not overridden by ``str`` - ``__getattribute__`` - ``__len__`` (this could be debated, but not having it might be weird for a str instance) - ``encode`` - ``startswith`` and ``endswith`` (TODO: override these with case-insensitive behavior on the windows flavor) - ``__add__`` will be overriden separately, as described in later subsections. This will allow ``open(...)`` as well as most ``os`` and ``os.path`` functionality to work immediately, although there are cases that need special handling. Later, if shown to be desirable, some additional string methods may be enabled on paths. Overriding ``.__add__`` to disable adding two path objects together ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Overloading of the ``+`` operator in ``str`` will be overridden with a version which disables concatenation of two path objects together while allowing other type combinations (TODO: consider also fully disabling +): .. code:: python def __add__(self, other): if isinstance(other, PurePath): raise TypeError("Operator + for two paths is not defined; use / for joining paths.") return str.__add__(self, other) Optional enabling of string methods ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Since many APIs currently have functions or methods that return paths as strings, existing code may expect to have all string functionality available on the returned objects. While most users are unlikely to use much of the ``str`` functionality, a library function may want to explicitly allow these operations on a path object that it returns. Therefore, the overridden ``str`` methods can be enabled by setting a ``._enable_str_functionality`` method on a path object as follows: - ``pathobj._enable_str_functionality = True #`` -- Enable ``str`` methods - ``pathobj._enable_str_functionality = 'warn' #`` -- Enable ``str`` methods, but emit a ``FutureWarning`` with the message ``"str method '<name>' may be disabled on paths in future versions."`` The warning will help the API users notice that the return value is no longer a plain path. .. code:: python def <name>(self, *args, **kwargs): """Method of str, not for use with pathlib path objects.""" try: enable = self._enable_str_functionality except AttributeError: raise TypeError("str method '{}' is not available for paths." .format('<name>')) from None if enable == 'warn': warnings.warn("str method '{}' may be disabled on paths in future versions." .format('<name>'), FutureWarning, stacklevel = 2) elif enable is True: pass else: raise ValueError("_enable_str_functionality can be True or 'warn'") return getattr(str, name)(self, *args, **kwargs) New APIs, however, do not need to enable ``str`` functionality and may return default path objects. Helping interactive python tools and IDEs ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Interactive Python tools such as Jupyter are growing in popularity. When they use ``dir(...)`` to give suggestions for code completion, it is harmful to have all the disabled ``str`` methods show up in the list, even if they typically would raise exceptions. Therefore, the ``__dir__`` method should be overridden on ``PurePath`` to only show the methods that are meaningful for paths. Some tools used for code completion, such as ``rope`` and ``jedi`` may need some changes for optimal code completion. This in fact includes also the standard Python REPL, which currently does not respect ``__dir__`` in tab completion. Changes needed to other stdlib modules ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In stdlib modules other than ``pathlib``, mainly ``os``, ``ntpath`` and ``posixpath``, The stdlib functions in modules that use the methods/functionality listed below on path or file names, will be modified to explicitly convert the name ``name`` to a plain string first, e.g., using ``getattr(name, 'path', name)``, which also works for ``DirEntry`` but may return ``bytes``: - ``split`` - ``find`` - ``rfind`` - ``partition`` - ``__iter__`` - ``__getitem__`` (However, if ``DirEntry`` is not made to subclass ``str``, the idiom ``getattr(name, 'path', name)`` which is already supported in the development version, should be implemented in stdlib functions to accept not only ``str`` and path objects, but also DirEntry.) Guidelines for third-party package maintainers ---------------------------------------------- Libraries that take paths as arguments or return them ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Since all of the standard library will accept path objects as path arguments, most third-party libraries will automatically do so. However, those that directly manipulate or examine the path name using ``str`` methods may not work. Those libraries will not immediately be ``pathlib``-compatible. To achieve full ``pathlib``-compatiblity, the libraries are advised to: 1. Make sure they do not explicitly check the ``type(...)`` of arguments, but use ``isinstance(...)`` instead, if needed. 2. See if their functions use disabled ``str``/``bytes`` methods on paths that they take as arguments. If so, they should either: \* change their code to, achieve the same using ``os.path`` functions (*this is the preferred option*), or \* convert the argument first using ``name = getattr(name, 'path', name)``, which does not require importing pathlib 3. Consider, when returning a path or file name, to convert it to a path object first if a ``str``-subclassing ``pathlib`` is available. During a transition period, the attribute ``._enable_str_functionality = 'warn'`` should be set before returning the object. For an even softer, transition period it is also possible to set ``._enable_str_functionality = True``, which enables ``str`` methods with no warnings. Pathlib-compatible or near-compatible libraries ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ To have the best level of compatibility, all path-like objects should preferably behave similarly to pathlib objects regarding subclassing ``str``. However, for the best level of *compatibility*, the safest options is to subclass ``str`` and *not* disable ``str`` functionality (which is already done by some known libraries). However, they may want to further disable methods of ``str`` to achieve the additional clarity that ``pathlib`` has regarding \* Having a ``.path`` attribute/property which gives a ``str`` (or ``bytes``) instance representing the path Older Python versions ~~~~~~~~~~~~~~~~~~~~~ The ``pathlib2`` module, which provides ``pathlib`` to pre-3.4 python versions, can also subclass ``str``, but it should by default have ``._enable_str_functionality = 'warn'`` or ``.enable_str_functionality = True``, because the stdlib in the older Python-versions is not compatible with paths that have ``str`` functionality disabled. Transition plans and future possibilities for long-term consideration --------------------------------------------------------------------- ``DirEntry`` ~~~~~~~~~~~~ ``DirEntry`` should also undergo a similar transition, which was, at first, part of this proposal, but it was removed to limit the scope (It could be added back, of course, if desired). Since ``DirEntry`` focuses on performance, it is important not to cause any significant performance drops. It would, however, simplify things if ``DirEntry`` did the same as ``pathlib`` regarding subclassing and disabling methods. A slight complication, however, arises from the fact that ``DirEntry`` may represent a path using ``bytes``, making the ``.path`` attribute also an instance of ``bytes`` instead of ``str``. This issue could be solved by at least two different approaches: 1. Make ``bytes``-kind DirEntry instances, interpreted as ``str`` instances, equivalent to ``os.fsdecode(direntry.path)``. 2. Instantiate a different ``DirEntry`` class for ``bytes`` paths, perhaps in a way similar to how the ``antipathy`` path library instantiates ``bPath`` when the ``bytes`` type is used. The future of plain string paths and ``os.path``? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ It is possible to imagine having both ``os.path`` and ``pathlib`` coexist, as long as they co-operate well. Potentially, things like ``open("filename.txt")`` with a plain string argument will always be accepted. However, if regardless of what people use Python for, they slowly adopt path objects as the way to represent a path, the support for plain string paths may be deprecated and eventually dropped. On the one hand, to support the former situation, ``os.path`` functions can choose their return type to match the type of the arguments; with multiple different types in the arguments, ``pathlib`` might 'win' because it is already imported. On the other hand, to support the latter, all path-returning functions in the stdlib can begin to return pathlib objects, at first with ``str`` methods enabled with or without warning, and eventually, with ``str`` methods disabled. Literal syntax for paths: p-strings? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Should Python choose the *path* towards not allowing plain strings as paths, a convenient way to instantiate a path is desperately needed. As discussed in the recent python-ideas thread "Working with path objects: p-strings?", one possibility would be a new syntax like ``p"/path/to/file.ext"``, which would instantiate a path object. Another way of turning a string into a path could be to have a ``.path`` property on ``str`` objects that instantiates and returns a path object. It can be debated whether this 'Pythonic' or not. See also the next section. The ``.path`` attribute on path-like objects ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ``DirEntry`` already had the ``.path`` attribute when it was introduced to the standard library in Python 3.5. It represents the absolute or relative path as a whole as a ``str`` or ``bytes`` instance. However, several people have raised the concern that the word ``path`` not referring to an actual path object may be misleading. However, if path objects are instances of str, the ``.path`` may in the future shift to mean the path object. In the case of ``pathlib`` paths, it would could be implemented as a property that returns ``self``, or during a transition phase, a path object with ``str`` functions enabled: .. code:: python @property def path(self): path = type(self)(self) path._enable_str_functionality = 'warn' return self ``DirEntry`` objects, on the other hand, could be converted to pathlib objects using the ``.path`` method. Similarly, ``str`` objects could have a similar property for conversion into a pathlib object (see previous section). Possibilities for making ``pathlib`` more lightweight ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ If path objects were to become the norm in handling paths and file names, there may be a need for optimizations in terms of the speed and memory usage of path objects as well as the import time and memory footprint. Dependencies that are not always used by pathlib objects could also be imported lazily. Another base class for path-like objects ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Python already has multiple types that can represent paths-like objects. There could be one common base class for all of them, which would (at least at first) inherit from ``str``. ``DirEntry`` and ``PurePath`` would both be subclasses of this class. One would, however, need to answer the questions of what this class would be called, what it would look like, and what module would it be in (if not builtin). For now, let us call it ``PyRL`` for Python (Pyniversal?-) Resource Locator. This could also be a base class for URLs/URIs. Generalized Resource Locator addresses: a-strings? l-strings? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ... not to mention g-strings! A generalized concept may be valuable in the future, because the distinction between local and remote is getting more and more vague. As discussed in the python-ideas thread "URLs/URIs + pathlib.Path + literal syntax = ?", it is possible to quite reliably distinguish common types of URLs from filesystem paths. If this became the norm, many Python-written programs could 'magically' accept URLs as input file paths by simply calling the ``PyRL(...)``, which could be equivalent to some literal syntax for use in a scripting, testing or interactive setting, or when loading config files from fixed locations.

On 04/06/2016 09:04 AM, Koos Zevenhoven wrote:
Fair enough, and done! :) Overall: excellent research and interesting ideas. However, before spending too much more time on this realize that the fate of pathlib in the stdlib is being discussed on Python-Dev, so you may want to move your efforts over there.
Proposal: Making path objects inherit from ``str``. ===================================================
Rationale ---------
An honorable mention for antipathy! :)
Specification of standard library changes -----------------------------------------
Overriding all ``str``-specific methods ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Have to be careful here -- the os module uses some of those string methods to work with string-paths.
Overriding ``.__add__`` to disable adding two path objects together ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
See above.
This would be good for the pathlib backport, which I think should inherit from str/unicode.
Good idea.
The 'warn' setting would be preferable.
I don't see the stdlib ever /not/ accepting string paths.
Literal syntax for paths: p-strings? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I don't see this as necessary.
Another way of turning a string into a path could be to have a ``.path`` property on ``str`` objects that instantiates and returns a path object.
And definitely not this.
Generalized Resource Locator addresses: a-strings? l-strings? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This was disproved. Just about anything can be a posix-path. -- ~Ethan~

On Thu, Apr 7, 2016 at 2:04 AM, Koos Zevenhoven <k7hoven@gmail.com> wrote:
I'm not sure I entirely understand what's going on here. Your rationale is "it should be possible to use a Path as a str", and that's supported by your proposal to subclass str; but then you want to override a bunch of methods to force users to be aware that a Path is *not* a str. Why subclass only to force people to distinguish? If you want to make a Path act "just a little bit" like a str, I'd expect to go the other way: don't subclass str, and add in a specific set of methods to provide str-like functionality. Or am I missing something here? ChrisA

On Wed, Apr 6, 2016 at 1:28 PM Chris Angelico <rosuav@gmail.com> wrote:
Indeed this was my thought as well. Originally (this was discussed on the python subreddit to some extent as well), my objection was that a simple program like say, left_pad.py would break depending on the implementation. This addresses that issue, but in my opinion creates even deeper problems. Its not hard to conceive some formatting or pretty printing library that at some point contains the code def concat(user_list: List[String]) -> String: return functools.reduce(lambda x, y: x + y, user_list) where user_list is given from some sort of user input. This change would lead to some nasty bugs, where this action will throw an error when any two adjacent args are Paths. What's worse is that type annotations actually exacerbates this issue since the objects disobey the typechecker. Java subclasses aren't allowed to just unimplement a parent's methods, they shouldn't in python either. This argument seems to boil to "well a string is one valid representation of a path, so a Path is a string" but by that definition a Path is also a node in a tree, and so we should create a Tree class and have Path subclass TreeNode, so that we could find out its children and depth from the root. There are so many surprising results of a change like this that I can't see a reason to do this. -Josh

On 6 April 2016 at 17:04, Koos Zevenhoven <k7hoven@gmail.com> wrote:
Thanks for putting this together. I don't agree with much of it, but it's good to have the proposal stated so clearly.
While I've read the whole proposal, there's a lot to digest, and honestly I don't have the time to spend on this right now - so my apologies if I missed anything relevant. Hopefully my comments will make sense anyway :-)
I'm not sure I agree with this. To me, "filesystem paths" are a things which define the location of a file in a filesystem. They are not strings, even though they can be represented by strings (actually, they can't, technically - POSIX allows nearly arbitrary bytestrings for for paths, whereas Python strings are Unicode). Saying a path is a string is no more true than saying that integers are strings that represent whole numbers. Traditionally, people haven't thought of paths as objects because not many languages provide *any* sort of abstraction of paths - doing so in a cross-platform way is *hard* and most languages duck the issue. Python is exceptional in providing good path manipulation functions (even os.path is streets ahead of what many other languages offer).
As noted above, this makes no sense to me. By this argument "integers are strings of characters that represent numbers". The string representation of an object is *not* the object.
You mention both practicality and purity here but only offer "practical" arguments. The practical arguments are fair, and as far as I can see are the crux of any proposal to make Path objects subclass str. You should focus on this, and not try to argue that subclassing str is "right" in any purity sense.
This seems to me to be a key point - if (many) of the operations that are part of the interface of a string don't make sense for a filesystem path, doesn't that very clearly make the point that filesystem paths are *not* strings?
There is prior art in subclassing the Python ``str`` type to build a path object type. Packages on PyPI (TODO: list more?) that do this
pylib's path.local object (used in pytest in particular) is another.
I don't think there's been any attempt made to collect or quantify that experience, though. All I've ever seen is hearsay "I've not heard of anyone reporting problems" evidence. While anecdotal evidence is a lot better than nothing, it's of limited value. Apart from anything else, there's a self-selection issue - people who *did* have problems may simply have stopped using the libraries.
This seems to me to be the biggest issue. You're proposing that Path objects will subclass strings, but code written to expect a string may fail if passed a Path object. Presumably though that code works if passed str(the_path_object) - as it works correctly right now. Maybe it's doing "string-like" things, but equally, it's presumably intended to. Consider a "make path uppercase" function that simply does .upper() on its argument. You are proposing a class that is a subclass of str, but calling str() on an instance gives an object that behaves differently. That's bizarre at best, and realistically I'd describe it as fundamentally broken. I don't want to argue type-theory here, but I'm pretty sure that violates most people's intuition of what inheritance means.
This is a huge chunk of extra complexity, both in terms of implementation, and even more so in terms of understanding. If someone wants a "real" string, just call cast using str() or use the .path attribute. This whole section of the proposal says to me that you haven't actually solved the problem you're trying to solve - you still expect people to have problems passing Path objects to functions that aren't expecting them, and you've had to consider how to work round that. The fact that you came up with (in effect) a "configuration flag" on an immutable object like a Path rather than just using the existing "give me a real string" options on Path, implies that your proposal is not well thought through in this area. Here's some questions for you (but IMO this section is unfixable - no matter what answers you give, I still consider this whole mechanism as a non-starter). * Are Path objects hashable, given they now have a mutable attribute? * If you change the _enable_str_functionality flag, does the object's hash change? * If it doesn't, what happens when you add 2 identical paths with different _enable_str_functionality flags to a set? * If you enable str methods do they return str or Path objects? If the latter, what is the flag set to on these objects? Basically, you broke a fundamental property of both Path and string objects - they are immutable.
This can be done with the current Path objects (and should). It is unrelated to this proposal. And it doesn't need to be restricted to "if overridden string functions are used". Just do it regardless, and all existing functions work immediately. The only issue is functions that *return* paths. And they are no harder under current Pathlib than under your proposal - a decision on what type to return has to be made either way.
Overcomplicated. If you accept paths, just do getattr(patharg, 'path', patharg) and you're fine. If you return paths, do nothing (or if you prefer, think about your API and make a more considered decision). Your proposal means that library authors have to actually consider whether the new path objects will cause subtle failures, because the string-like objects will not fail quickly, leading to bugs propogating into unrelated code. Overall, I'm a strong -1. If we subclass str, we should just do it and not over-complicate like this. I'm still not convinced we should do so, but your proposal *has* convinced me that any attempt to compromise is going to end up being worse than either option. Sorry I can't be more positive - but again, thanks for the thorough write-up. Paul

On 04/06/2016 09:04 AM, Koos Zevenhoven wrote:
Fair enough, and done! :) Overall: excellent research and interesting ideas. However, before spending too much more time on this realize that the fate of pathlib in the stdlib is being discussed on Python-Dev, so you may want to move your efforts over there.
Proposal: Making path objects inherit from ``str``. ===================================================
Rationale ---------
An honorable mention for antipathy! :)
Specification of standard library changes -----------------------------------------
Overriding all ``str``-specific methods ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Have to be careful here -- the os module uses some of those string methods to work with string-paths.
Overriding ``.__add__`` to disable adding two path objects together ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
See above.
This would be good for the pathlib backport, which I think should inherit from str/unicode.
Good idea.
The 'warn' setting would be preferable.
I don't see the stdlib ever /not/ accepting string paths.
Literal syntax for paths: p-strings? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I don't see this as necessary.
Another way of turning a string into a path could be to have a ``.path`` property on ``str`` objects that instantiates and returns a path object.
And definitely not this.
Generalized Resource Locator addresses: a-strings? l-strings? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This was disproved. Just about anything can be a posix-path. -- ~Ethan~

On Thu, Apr 7, 2016 at 2:04 AM, Koos Zevenhoven <k7hoven@gmail.com> wrote:
I'm not sure I entirely understand what's going on here. Your rationale is "it should be possible to use a Path as a str", and that's supported by your proposal to subclass str; but then you want to override a bunch of methods to force users to be aware that a Path is *not* a str. Why subclass only to force people to distinguish? If you want to make a Path act "just a little bit" like a str, I'd expect to go the other way: don't subclass str, and add in a specific set of methods to provide str-like functionality. Or am I missing something here? ChrisA

On Wed, Apr 6, 2016 at 1:28 PM Chris Angelico <rosuav@gmail.com> wrote:
Indeed this was my thought as well. Originally (this was discussed on the python subreddit to some extent as well), my objection was that a simple program like say, left_pad.py would break depending on the implementation. This addresses that issue, but in my opinion creates even deeper problems. Its not hard to conceive some formatting or pretty printing library that at some point contains the code def concat(user_list: List[String]) -> String: return functools.reduce(lambda x, y: x + y, user_list) where user_list is given from some sort of user input. This change would lead to some nasty bugs, where this action will throw an error when any two adjacent args are Paths. What's worse is that type annotations actually exacerbates this issue since the objects disobey the typechecker. Java subclasses aren't allowed to just unimplement a parent's methods, they shouldn't in python either. This argument seems to boil to "well a string is one valid representation of a path, so a Path is a string" but by that definition a Path is also a node in a tree, and so we should create a Tree class and have Path subclass TreeNode, so that we could find out its children and depth from the root. There are so many surprising results of a change like this that I can't see a reason to do this. -Josh

On 6 April 2016 at 17:04, Koos Zevenhoven <k7hoven@gmail.com> wrote:
Thanks for putting this together. I don't agree with much of it, but it's good to have the proposal stated so clearly.
While I've read the whole proposal, there's a lot to digest, and honestly I don't have the time to spend on this right now - so my apologies if I missed anything relevant. Hopefully my comments will make sense anyway :-)
I'm not sure I agree with this. To me, "filesystem paths" are a things which define the location of a file in a filesystem. They are not strings, even though they can be represented by strings (actually, they can't, technically - POSIX allows nearly arbitrary bytestrings for for paths, whereas Python strings are Unicode). Saying a path is a string is no more true than saying that integers are strings that represent whole numbers. Traditionally, people haven't thought of paths as objects because not many languages provide *any* sort of abstraction of paths - doing so in a cross-platform way is *hard* and most languages duck the issue. Python is exceptional in providing good path manipulation functions (even os.path is streets ahead of what many other languages offer).
As noted above, this makes no sense to me. By this argument "integers are strings of characters that represent numbers". The string representation of an object is *not* the object.
You mention both practicality and purity here but only offer "practical" arguments. The practical arguments are fair, and as far as I can see are the crux of any proposal to make Path objects subclass str. You should focus on this, and not try to argue that subclassing str is "right" in any purity sense.
This seems to me to be a key point - if (many) of the operations that are part of the interface of a string don't make sense for a filesystem path, doesn't that very clearly make the point that filesystem paths are *not* strings?
There is prior art in subclassing the Python ``str`` type to build a path object type. Packages on PyPI (TODO: list more?) that do this
pylib's path.local object (used in pytest in particular) is another.
I don't think there's been any attempt made to collect or quantify that experience, though. All I've ever seen is hearsay "I've not heard of anyone reporting problems" evidence. While anecdotal evidence is a lot better than nothing, it's of limited value. Apart from anything else, there's a self-selection issue - people who *did* have problems may simply have stopped using the libraries.
This seems to me to be the biggest issue. You're proposing that Path objects will subclass strings, but code written to expect a string may fail if passed a Path object. Presumably though that code works if passed str(the_path_object) - as it works correctly right now. Maybe it's doing "string-like" things, but equally, it's presumably intended to. Consider a "make path uppercase" function that simply does .upper() on its argument. You are proposing a class that is a subclass of str, but calling str() on an instance gives an object that behaves differently. That's bizarre at best, and realistically I'd describe it as fundamentally broken. I don't want to argue type-theory here, but I'm pretty sure that violates most people's intuition of what inheritance means.
This is a huge chunk of extra complexity, both in terms of implementation, and even more so in terms of understanding. If someone wants a "real" string, just call cast using str() or use the .path attribute. This whole section of the proposal says to me that you haven't actually solved the problem you're trying to solve - you still expect people to have problems passing Path objects to functions that aren't expecting them, and you've had to consider how to work round that. The fact that you came up with (in effect) a "configuration flag" on an immutable object like a Path rather than just using the existing "give me a real string" options on Path, implies that your proposal is not well thought through in this area. Here's some questions for you (but IMO this section is unfixable - no matter what answers you give, I still consider this whole mechanism as a non-starter). * Are Path objects hashable, given they now have a mutable attribute? * If you change the _enable_str_functionality flag, does the object's hash change? * If it doesn't, what happens when you add 2 identical paths with different _enable_str_functionality flags to a set? * If you enable str methods do they return str or Path objects? If the latter, what is the flag set to on these objects? Basically, you broke a fundamental property of both Path and string objects - they are immutable.
This can be done with the current Path objects (and should). It is unrelated to this proposal. And it doesn't need to be restricted to "if overridden string functions are used". Just do it regardless, and all existing functions work immediately. The only issue is functions that *return* paths. And they are no harder under current Pathlib than under your proposal - a decision on what type to return has to be made either way.
Overcomplicated. If you accept paths, just do getattr(patharg, 'path', patharg) and you're fine. If you return paths, do nothing (or if you prefer, think about your API and make a more considered decision). Your proposal means that library authors have to actually consider whether the new path objects will cause subtle failures, because the string-like objects will not fail quickly, leading to bugs propogating into unrelated code. Overall, I'm a strong -1. If we subclass str, we should just do it and not over-complicate like this. I'm still not convinced we should do so, but your proposal *has* convinced me that any attempt to compromise is going to end up being worse than either option. Sorry I can't be more positive - but again, thanks for the thorough write-up. Paul
participants (5)
-
Chris Angelico
-
Ethan Furman
-
Joshua Morton
-
Koos Zevenhoven
-
Paul Moore