Better stdlib support for Path objects

Over in issue 22570, I lament the fact that while pathlib is awesome, its wider support in the stdlib is pretty sparse. I've tried to convert parts of a medium-sized Python 3 application from os.path to pathlib and found this lack of support rather demotivating. Yes, it's fairly easy to wrap Path objects in str() to pass them to stdlib methods that expect only strings, but it's a lot of work in user code and I find that the resulting str()s are distracting. It's a disincentive. Antoine provided a link to a previous discussion[*] but that didn't go very far. One simple solution would be to sprinkle str() calls in various stdlib methods, but I'm not sure if that would fail miserably in the face of bytes paths (if the original stdlib API even accepts bytes paths). The suggestion in the issue is to add a "path protocol" and the referenced article suggests .strpath and .bytespath. OTOH, aren't str() and bytes() enough? I don't have any other brilliant ideas, but I opened the issue and am posting here to see if we can jump-start another discussion for Python 3.5. I'd *like* to use more Paths, but not at the expense of my own code's readability. Yes, I'd sacrifice a bit of readability in the stdlib, especially if that would cover more use cases. Cheers, -Barry [*] https://mail.python.org/pipermail/python-ideas/2014-May/027869.html
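To make the str()/bytes() question concrete, here is a minimal sketch of the wrapping Barry describes. `legacy_api` is a hypothetical stand-in for any function that only accepts strings, and `PurePosixPath` is used only so the output is identical on every platform.

```python
from pathlib import PurePosixPath


def legacy_api(filename):
    # Hypothetical stand-in for a stdlib or PyPI function that accepts only str.
    if not isinstance(filename, str):
        raise TypeError("expected str, got " + type(filename).__name__)
    return filename


p = PurePosixPath("/etc") / "hosts"

# Passing the Path object directly is rejected...
try:
    legacy_api(p)
except TypeError as exc:
    print(exc)  # expected str, got PurePosixPath

# ...so the caller has to wrap it in str().
print(legacy_api(str(p)))  # /etc/hosts

# pathlib also implements bytes(), which encodes the path via os.fsencode().
print(bytes(p))  # b'/etc/hosts'
```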

Previous attempts at a pathlib-like thing outside of the stdlib had the object inherit from str so it’d still work just fine in all those APIs. It even makes a certain bit of sense since a path really is just a specialized string. --- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On Oct 06, 2014, at 01:53 PM, Donald Stufft wrote:
Yeah, except: http://bugs.python.org/issue22570#msg228716 -Barry

On 10/06/2014 07:58 PM, Antoine Pitrou wrote:
If that's a concern, couldn't that be solved with an abstract base class "Path"? pathlib path classes could inherit from it, while modules like os, json and others that also wish to accept strings could register str, then call str() on all input that passes an isinstance(input, Path) check and raise an error otherwise. The remaining question then would be where the abstract base class should live. Wolfgang
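A minimal sketch of Wolfgang's ABC idea, assuming the ABC is called `AbstractPath` (a hypothetical name; the thread leaves both the name and its home open). Here the pathlib classes are registered as virtual subclasses rather than inheriting from it.

```python
import abc
from pathlib import PurePath, PurePosixPath


class AbstractPath(abc.ABC):
    """Hypothetical marker ABC for 'things that may be used as a path'."""


# pathlib classes would inherit from it; registering PurePath covers its subclasses.
AbstractPath.register(PurePath)
# A module that also wants to accept plain strings registers str as well.
AbstractPath.register(str)


def coerce_path(value):
    """Return a str path, or raise TypeError for anything not registered."""
    if isinstance(value, AbstractPath):
        return str(value)
    raise TypeError("expected a path or str, not {!r}".format(type(value).__name__))


print(coerce_path(PurePosixPath("/tmp/data.json")))  # /tmp/data.json
print(coerce_path("/tmp/data.json"))                 # /tmp/data.json
```

One design consequence worth noting: registration on a shared ABC is global, so once any module registers str, every module that checks `isinstance(x, AbstractPath)` accepts strings too.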

On Tue, Oct 7, 2014 at 8:03 AM, Wolfgang Maier < wolfgang.maier@biologie.uni-freiburg.de> wrote:
then call str() on all input that passes an isinstance(input, Path) check, raise an error otherwise.
Calling path=str(path) within all API functions that accept file paths is a good enough solution, and one that doesn't change dependencies in the stdlib. Most of the API functions I've had to call with str(path) don't return another path, but something more structured (json, csv, configparser, ...). Maintainers could silently start adding the path=str(path) prolog to the API, so declaring official support for Path can be postponed until it is certified that there's 100% coverage in the stdlib. Cheers, -- Juancarlo *Añez*
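The path=str(path) prolog Juancarlo suggests could look like this in a hypothetical `load_json` helper (the function name is illustrative, not a stdlib API):

```python
import json
import tempfile
from pathlib import Path


def load_json(path):
    # Hypothetical helper. Prolog: accept a str or a Path by converting up front.
    # Note that str() also accepts objects that are not paths at all.
    path = str(path)
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# The same function works whether the caller passes a Path or a str.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "settings.json"
    target.write_text('{"debug": true}', encoding="utf-8")
    from_path = load_json(target)
    from_str = load_json(str(target))

print(from_path == from_str == {"debug": True})  # True
```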

On Mon, Oct 6, 2014 at 10:47 AM, Barry Warsaw <barry@python.org> wrote:
I'd turn it around. You can construct a Path from an argument that can be either a string or another Path. Example:
So you could start refactoring stdlib code to use Path internally without forcing the callers to use Path, but still *allow* the callers to pass a Path. Though I'm not sure how this would work for return values without breaking backwards compatibility -- you'd have to keep returning strings and the callers would have to use the same mechanism to go back to using Paths. -- --Guido van Rossum (python.org/~guido)
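A sketch of the pattern Guido describes: normalize the argument with the Path constructor, work with the pathlib API internally, and keep returning a str for backwards compatibility. `backup_name` is a hypothetical function; `PurePosixPath` stands in for `Path` so the output is platform-independent.

```python
from pathlib import PurePosixPath


def backup_name(filename):
    """Hypothetical stdlib-style helper: takes a str or a Path, returns a str."""
    # The constructor accepts either a string or an existing path object.
    p = PurePosixPath(filename)
    # Use the pathlib API internally...
    backup = p.with_name(p.name + ".bak")
    # ...but keep returning a str so existing callers see no change.
    return str(backup)


print(backup_name("/srv/app.cfg"))                 # /srv/app.cfg.bak
print(backup_name(PurePosixPath("/srv/app.cfg")))  # /srv/app.cfg.bak
```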

On Oct 06, 2014, at 11:04 AM, Guido van Rossum wrote:
That's a very interesting perspective, and one I'd like to pursue further. I wonder if we can take a polymorphic approach similar to some bytes/str APIs, namely that if the argument is a pathlib object, a pathlib object could be returned, and if a str was passed, a str would be returned. An example is ConfigParser.read() which currently accepts only strings. I want to pass it a Path. It would be really useful if this method returned Paths when passed paths (they're used to verify which arguments were actually opened and read). There's probably a gazillion APIs that *could* be pathlib-ified, and I'm not looking to do a comprehensive expansion across the entire stdlib. But I think we could probably take a per-module approach, at least at first, to see if there are any hidden gotchas. Cheers, -Barry
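Barry's polymorphic-return idea can be prototyped without touching configparser, using a hypothetical wrapper `read_config`. `ConfigParser.read()` returns the list of files it successfully read; the sketch passes it strings and maps the result back to whatever objects (str or Path) the caller supplied.

```python
import configparser
import tempfile
from pathlib import Path


def read_config(parser, filenames):
    """Hypothetical wrapper: report successfully read files using the same
    objects (str or Path) that the caller passed in."""
    # Remember which original object each string form came from.
    originals = {str(name): name for name in filenames}
    # ConfigParser.read() skips unreadable files and returns the ones it read.
    read_ok = parser.read([str(name) for name in filenames])
    return [originals[name] for name in read_ok]


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "app.ini"
    good.write_text("[main]\nkey = value\n")
    missing = Path(tmp) / "missing.ini"
    parser = configparser.ConfigParser()
    found = read_config(parser, [good, missing])

print(found == [good])                # True
print(parser.get("main", "key"))      # value
```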

On 2014-10-06 19:33, Barry Warsaw wrote:
I wonder whether it might be cleaner to use a simple function for it, something like:

    from pathlib import Path

    def pathify(result_path, like):
        # Return result_path as the same kind of object as `like`.
        if isinstance(like, Path):
            return Path(result_path)
        if isinstance(like, str):
            return str(result_path)
        raise TypeError('original path must be a path or a string, not {!r}'.format(type(like)))

called as pathify(result_path, like=original_path).

On 06/10/14 19:29, Guido van Rossum wrote:
This is a strong solution for new code. A quick hack to fix old code would be to simply cast the return values from anything returning a "path" to a string. If it's already a string, it'll be the same string. If it's a pathlib path, it'll end up a string. That minimises the effort needed to bring an old codebase up to speed with pathlib while implementing new code in pathlib, right? And you can work downwards from there; as long as all callers expecting a path do an explicit str-cast, you're safe either way?
-- Twitter: @onetruecathal, @formabiolabs Phone: +353876363185 Blog: http://indiebiotech.com miniLock.io: JjmYYngs7akLZUjkvFkuYdsZ3PyPHSZRBKNm6qTYKZfAM

On 7 Oct 2014 04:26, "Barry Warsaw" <barry@python.org> wrote:
pathlib is quite high level, so there's a chance of introducing undesirable circular dependencies in introducing it too broadly. With ipaddress and, as far as I am aware, pathlib, the intent was for it to be useful as a manipulation library, but to drop back to a serialised representation for transfer to other libraries (including the rest of the stdlib). This helps avoid the monolithic object model coupling that tends to pervade large Java applications. If the current spelling of that is too verbose/awkward/error prone, then I'd prefer to see that tackled directly (e.g. by introducing some appropriate calculated properties), rather than switching to the highly coupled all pervasive standard library change we were trying to avoid. Regards, Nick.

On 7 Oct 2014 19:48, "Nick Coghlan" <ncoghlan@gmail.com> wrote:
Note that a path protocol (with appropriate C API support) would also address this concern with excessive coupling to a specific concrete type. A single dispatch generic function as an adapter API would be another option, but would likely pose bootstrapping problems for the lowest level interfaces like os.path and the open builtin. Cheers, Nick.
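Nick's "single dispatch generic function as an adapter API" could be sketched with `functools.singledispatch`. The function name `as_path_string` is hypothetical, and, as Nick notes, this kind of adapter would suit higher-level code better than the lowest-level interfaces like os.path and open().

```python
from functools import singledispatch
from pathlib import PurePath, PurePosixPath


@singledispatch
def as_path_string(obj):
    # Hypothetical adapter. Default: refuse anything not registered as path-like.
    raise TypeError("not a path: {!r}".format(type(obj).__name__))


@as_path_string.register(str)
def _(obj):
    return obj


@as_path_string.register(PurePath)
def _(obj):
    # Covers PurePosixPath, PosixPath, WindowsPath, etc. via the class hierarchy.
    return str(obj)


print(as_path_string("/tmp/x"))                 # /tmp/x
print(as_path_string(PurePosixPath("/tmp/x")))  # /tmp/x
```

Third-party path types could then opt in by calling `as_path_string.register(TheirPathType)` without depending on pathlib's concrete classes.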

On Oct 07, 2014, at 07:54 PM, Nick Coghlan wrote:
I wouldn't expect low level APIs like os.path and built-in open to accept Path objects. pathlib already covers most of those use cases, and whatever is missing from there can probably be easily added. It's higher level libraries accepting Path objects that is more interesting I think. Cheers, -Barry

I might agree with you if there weren’t 20 years of code that expects to be able to pass a “path” in to various places. Having to keep a mental note, or worse look up in the documentation every time, where I’m expected to pass a Path object and where I’m expected to pass a str is just about the worst UX I can imagine. --- Donald Stufft

Donald Stufft writes:
On Oct 7, 2014, at 9:25 AM, random832@fastmail.us wrote:
You wouldn't need a mental memo. Just do "import os" rather than "from os import *" and the "os." prefix will tell you you need a str. Otherwise use a Path. That would be the goal, I expect. How to get there, I'm not sure.

I never use star imports. The problem is that not only does tons of stuff in the stdlib currently accept only str, but so do tons of things on PyPI. Without looking at the implementation/docs for each one of these items I won’t know if I can pass it a str or a Path (or, more likely, I won’t know if I need to coerce my Path to a str, since I doubt anyone is going to make Path-only APIs). For me personally this means that pathlib will likely never be a useful tool, because I find the need to coerce all over the place far worse than using the relevant os.path functions. --- Donald Stufft

On 10/07/2014 04:59 PM, Stephen J. Turnbull wrote:
That's exactly what I'd be worried about. It would require a big effort to convert enough APIs to make the few that don't take Paths insignificant. That would also send a strong signal to third-party libs to become Path-aware. However, I'm skeptical that python-dev can muster enough energy for this effort. I believe that a .path attribute (name to be discussed) is probably as good as we can do. Conversely, it means that Path should grow many utility methods for common operations.
Note that Barry said: "I wouldn't expect low level APIs like os.path and built-in open to accept Path objects." which refers to open(), not os.open(). Georg

On 07.10.2014 11:48, Nick Coghlan wrote:
The approach to use pathlib internally in the stdlib while making sure that callers will get strings as return values should work fine. We've been using a similar approach with mxURL in some of our application server code. mxURL provides a parsed URL object that implements common tasks such as joining URLs, rebuilding, etc. The approach makes code more readable: you get the option of passing in a string or an already parsed URL object (saving some overhead), and code using the APIs gets strings, which prevents other code from complaining about a wrong type. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source

On 10/6/2014 2:04 PM, Guido van Rossum wrote:
To me, the first question is whether we 'believe' in pathlib enough to really support it in the stdlib and encourage its use.
If yes (which the above seems to hint), the second question is how to enlarge APIs while remaining backward compatible. For functions that take a pathstring in and do not return a pathstring, just allow a Path as an alternate input type. There are already functions that take either a pathstring or an open file-like object.
Some of the os functions that take a pathstring and return a pathstring are already 'duplicated' as pathlib functions or Path methods that map Path to Path. For others, there is a choice of duplicating the function in pathlib or making the output type depend on the input type. I do not remember the current scandir spec, but when introduced it should at least optionally accept and produce Paths, and perhaps live in pathlib instead of os. I suspect that functions that produce a pathstring without a pathstring input, such as int file descriptor to filename, are rare and low-level enough to leave as is. But that should be specified in any transition-plan PEP. -- Terry Jan Reedy

On Mon, Oct 6, 2014 at 2:39 PM, Terry Reedy <tjreedy@udel.edu> wrote:
To me, the first question is whether we 'believe' in pathlib enough to really support it in the stdlib and encourage its use.
os.path is cumbersome (horrible?) when compared to pathlib. The issue is prevalent throughout stdlib. For example, json won't take a pathlib.Path, so you have to pass str(mypath). Cheers, -- Juancarlo *Añez*

On 10/06/2014 10:20 PM, Juancarlo Añez wrote:
Horrible? Please.
The issue is prevalent throughout stdlib. For example, json won't take a pathlib.Path, so you have to pass str(mypath).
That it is prevalent is exactly the problem with fixing it, in my opinion. This is like the situation with context managers. We've taken about 3-4 minor releases to add "with" support to objects that can logically support it. Nobody remembers this, so people have to refer to the docs (or the code) to see if and when e.g. smtplib.SMTP gained "with" support. However, the context managers are a few dozen classes at most. With paths, there are hundreds of APIs that would have to be updated to take Paths in the stdlib alone. Granted, a good portion would probably work fine since they only pass through paths to lower level APIs, but still every one has to be checked. Going by precedent, that's not something that we would be able to do consistently, even throughout several releases. (Another precedent is Argument Clinic.) cheers, Georg

On Mon, Oct 6, 2014 at 7:26 PM, Juancarlo Añez <apalala@gmail.com> wrote:
What do you think would be the nastier impacts of making pathlib.Path inherit from str?
Duplication of storage. Currently, pathlib.Path objects store a list of path components. To inherit meaningfully from str, they would have to store the joined path string as well.

On Tue, 07 Oct 2014 00:52:55 +0100 MRAB <python@mrabarnett.plus.com> wrote:
Hi, Not inheriting from built-in classes such as str, list or tuple was one of the design points of pathlib. It will not change in the future ;-) PEP 428 outlines this, but you can probably find a more detailed discussion in the python-ideas archive. Regards Antoine.

On Mon, Oct 6, 2014 at 7:58 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
See also rejected PEP 355: "Subclassing from str is a particularly bad idea; many string operations make no sense when applied to a path." http://legacy.python.org/dev/peps/pep-0355/ (I would add that many str operations make no sense - period, so propagating them into newer designs would be a mistake.)
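A small illustration of the PEP 355 objection, using a hypothetical `StrPath` class that subclasses str (the rejected design, not anything in pathlib): path-aware joining has to be bolted on, while every inherited str operation keeps plain string semantics and silently returns a plain str.

```python
class StrPath(str):
    """Hypothetical path type built by subclassing str (the rejected design)."""

    def __truediv__(self, other):
        # Path-aware joining has to be added by hand...
        return StrPath(self.rstrip("/") + "/" + other)


p = StrPath("/usr")
print(p / "lib")                  # /usr/lib
# ...while every inherited str operation still applies, with string semantics:
print(p + "lib")                  # /usrlib
print(type(p + "lib").__name__)   # str
print(type(p.upper()).__name__)   # str
```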

Many str operations make no sense when applied to lots of different types of strings; even more so, many bytes methods make no sense when applied to any kind of bytes except a very narrow subset. I’m pretty sure that “does every single method make sense in every scenario” is not a useful distinction to make. However, the sheer weight of APIs out there that expect str, and only str, means that pathlib isn’t very useful without either wrapping every call in str() or using some attribute/method on it to convert to str. --- Donald Stufft

On Oct 06, 2014, at 11:19 PM, Georg Brandl wrote:
I appreciate that this is a problem with such transitions. Is it an argument for never doing so though? Is it better that the stdlib prohibit adoption of advanced features and libraries than to do so piecemeal? I'm not so sure; even though it's admittedly annoying when such support is missing, it's really nice when they're there. How useful is pathlib if it can't be used with the stdlib? Cheers, -Barry

On Mon, 6 Oct 2014 23:30:01 -0400 Donald Stufft <donald@stufft.io> wrote:
IMO it's reasonable, provided we devise a dedicated protocol for getting a path's representation (e.g. __strpath__). Concrete type checking should be avoided. I hope someone (Barry? :-)) can take the time to think it out. Regards Antoine.

On Oct 07, 2014, at 11:34 AM, Antoine Pitrou wrote:
What would __strpath__ do that __str__ wouldn't do? Or do you think it's better to explicitly disallow str-like objects that aren't path objects? What I'm trying to understand is whether str(argument) is that "path protocol" or whether there's a good reason that something else that's specifically not str-ification is required. Cheers, -Barry
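One possible answer to Barry's question, as a sketch: unlike str(), a dedicated hook is only present on types that opt in. Both `strpath()` and `__strpath__` are hypothetical names taken from the discussion, and `ThirdPartyPath` stands in for pathlib or any third-party path class. (For reference, Python 3.6 later added a protocol along these lines, `__fspath__` with `os.fspath()`, via PEP 519.)

```python
def strpath(obj):
    """Hypothetical helper: return a str path only for objects that opt in,
    instead of accepting anything str() can convert."""
    if isinstance(obj, str):
        return obj
    hook = getattr(type(obj), "__strpath__", None)
    if hook is None:
        raise TypeError("expected str or path-like object, not " + type(obj).__name__)
    result = hook(obj)
    if not isinstance(result, str):
        raise TypeError("__strpath__ must return str")
    return result


class ThirdPartyPath:
    """Hypothetical path type that opts in to the protocol."""

    def __init__(self, text):
        self._text = text

    def __strpath__(self):
        return self._text


print(strpath("/tmp/a"))                  # /tmp/a
print(strpath(ThirdPartyPath("/tmp/b")))  # /tmp/b
# str(42) happily returns "42", but strpath() refuses non-path objects.
try:
    strpath(42)
except TypeError as exc:
    print(exc)  # expected str or path-like object, not int
```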

On Wed, Oct 8, 2014 at 12:37 AM, Barry Warsaw <barry@python.org> wrote:
Currently, __str__ will happily return a string representation of basically anything. More generally than __strpath__, maybe what's needed is a method "give me a str that accurately represents this object", to be provided only by those types which are "virtually strings". It'd be an alternative to subclassing str, as a means of saying "I'm a string!". Or maybe this should be done as a collections.abc - there's ByteString, maybe there should be TextString. Then anything that registers itself as a TextString should be assumed to function as a string, and a path could just register itself. Would that make more sense than __strpath__ does? ChrisA

On 7 October 2014 14:37, Barry Warsaw <barry@python.org> wrote:
What would __strpath__ do that __str__ wouldn't do? Or do you think it's better to explicitly disallow str-like objects that aren't path objects?
What is a str-like object? A lot of objects are acceptable to str(); most of them aren't "str-like" in any reasonable sense of the term (e.g. function, int), and probably shouldn't be acceptable as paths. edk

On 7 October 2014 14:47, Barry Warsaw <barry@python.org> wrote:
Yeah - but whatever the term, I think being able to str() something is too weak a predicate for using it as a path. If there were a way for an object to declare itself str-like in the sense that Paths are, that'd be much more reasonable, imv.

On 7 October 2014 23:37, Barry Warsaw <barry@python.org> wrote:
It's mostly a matter of offering better failure modes - the reasons would be similar to why we added operator.index and __index__ in order to expand various APIs from "only int or long objects" without expanding them so far that they also accepted float objects or strings the way coercing via "int" would. Using str(x) implicitly allows lots of nonsense that should throw an immediate TypeError to instead throw OSError later on (or, worse, perhaps even appear to work). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
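The `__index__` precedent Nick cites is easy to see in the real `operator.index()` API: `int()` coerces far more than integers, while `operator.index()` only accepts objects that implement `__index__`. `Slot` is a hypothetical class used only to show the opt-in.

```python
import operator

# int() coerces broadly: floats are truncated, strings are parsed.
print(int(3.9))    # 3
print(int("42"))   # 42

# operator.index() accepts only objects that implement __index__.
print(operator.index(7))  # 7
for value in (3.9, "42"):
    try:
        operator.index(value)
    except TypeError:
        print("rejected", repr(value))


class Slot:
    """Hypothetical type that opts in to being used as an integer index."""

    def __init__(self, n):
        self.n = n

    def __index__(self):
        return self.n


# Sequence indexing uses __index__, so the opted-in type just works.
print(["a", "b", "c"][Slot(1)])  # b
```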

On 8 October 2014 00:40, Nick Coghlan <ncoghlan@gmail.com> wrote:
Combining this thought with Chris Angelico's reply, I actually wonder if the index vs int analogy is even more applicable than I first thought. What if the protocol was __text__ with a new text() builtin (or at least an operator.text() function), and it was advised to only be implemented by types where they were, at least conceptually, truly representable as strings? That's basically what was done with the __index__ method in http://www.python.org/dev/peps/pep-0357/ to introduce ducktyping to several APIs that previously only worked with builtin int/long objects. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Wed, Oct 8, 2014 at 2:39 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
I like the comparison with __index__. It's up to the class to decide whether it wants to be "truly representable as a string". Command-line arguments probably would be, in some canonical form. Config files probably not, as they need their structure, but if you want your config object to act exactly like a text string, then you define __text__. ChrisA

On Wed, Oct 8, 2014 at 2:55 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
You pass a Path object to anything whatsoever. If it wants a Path, it uses it. As soon as that Path gets picked up by something that was expecting a string, it'll easily and conveniently coerce to string. But if you pass something other than a string or string-equivalent, it'll throw an error rather than calling str() on it. Where is the definition of "able to convert to str implicitly"?
Because this could just call Path.__text__(), get back a string, and use that. Basically, the use-case is that you could create something that's not a string, but can be used like a string, and APIs unaware of it don't need to be made aware of it. Downside: These "text-aware" objects would require Python 3.whatever, and would be hard to backport. On older versions, they wouldn't implicitly become strings. ChrisA

On Oct 7, 2014, at 7:40, Nick Coghlan <ncoghlan@gmail.com> wrote:
The "worse" case would be pretty common. Any API that writes files would just try to create a file named str(path), and on most POSIX platforms that would succeed for almost anything--bytes, int, TextIOWrapper, list of paths that you forgot to open in a listcomp, sequence of chars that you forgot to ''.join, function that you meant to call instead of just referencing, arbitrary generic-repr'd or constructor-repr'd instance, ... Most of my memories of Alpha (a Tcl-programmable classic Mac text editor) are of debugging and cleaning up after errors exactly like this. It would be even more fun for novices trying to figure out how to pass names with angle brackets and quotes to find or rm in a shell they barely know how to use...
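A quick demonstration of the failure mode described above: str() produces some text for almost any object, and the results below contain neither "/" nor NUL, so each is a legal POSIX file name. The snippet only builds the names; it does not create any files.

```python
# str() accepts almost any object and returns some text for it.
accidental_names = {
    "bytes": str(b"report.txt"),              # "b'report.txt'"
    "list of paths": str(["a.txt", "b.txt"]),  # "['a.txt', 'b.txt']"
    "function": str(len),                      # "<built-in function len>"
}
for kind, name in accidental_names.items():
    # An API that did open(str(path), "w") would typically create a file
    # with one of these names instead of raising an error.
    print(kind, "->", name)
```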

On 6 October 2014 18:47, Barry Warsaw <barry@python.org> wrote:
I find it worse than a disincentive, it makes understanding the code perceptibly harder, which is a maintenance issue. Having an attribute that returns the string representation would be a substantial improvement (as it's the extra parentheses from the str call that I find the most distracting, that and the code smell that an "explicit cast" involves). Having more things accept Path objects would be good, but there will always be 3rd party libraries, as well as places where you want to pass a pathname that really *shouldn't* have to expect them. Paul

On Oct 07, 2014, at 02:55 PM, Paul Moore wrote:
I realize there's another thing that bugs me about sprinkling str() calls all over the place, and this relates to my other question about whether str()-ability is "the path protocol". The problem is that if I'm looking at some random code and see: my_parser.load(str(path)) I really don't have any idea what 'path' is. Maybe that's a good thing, but in the few cases where I did this, it seemed bad. ;) OTOH, if I saw this, it would be a strong clue that path were a pathlib object: my_parser.load(path.string_path) substituting .string_path for whatever color the shed gets painted. Cheers, -Barry
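Barry's explicit-attribute alternative can be prototyped today with a subclass. `SPath` and `string_path` are hypothetical names (the bikeshed color is still open); `PurePosixPath` is used so the output is the same on every platform.

```python
from pathlib import PurePosixPath


class SPath(PurePosixPath):
    """Hypothetical Path subclass; 'string_path' is a placeholder name."""

    @property
    def string_path(self):
        # Reads as "this is a path being handed to a str-only API".
        return str(self)


config = SPath("/etc") / "app" / "settings.ini"
print(config.string_path)  # /etc/app/settings.ini
```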

On 10/07/2014 07:24 AM, Barry Warsaw wrote:
Neither should be needed: my_parser.load(path) should do the trick. What was the point of adding pathlib to the stdlib if we cannot use it with the stdlib? Having a __strpath__ and/or __bytespath__ would also allow third-party path libraries to be utilized. -- ~Ethan~

That would be ideal, however very few (any?) APIs in the Python stdlib support pathlib as it stands. Passing paths around is sprinkled all over the stdlib, and history suggests it’ll be a fairly lengthy process to actually get all of those stdlib locations figured out. That doesn’t even begin to touch on all of the places *not* in the stdlib that expect str-based paths, some of which are C code, so making them use a Path object isn’t the easiest thing ever.
Having a __strpath__ and/or __bytespath__ would also allow third-party path libraries to be utilized.
--- Donald Stufft

On Oct 6, 2014 12:50 PM, "Barry Warsaw" <barry@python.org> wrote:
Does path.py have different performance in this respect? I like the Path.isdir(), Path.walk() methods; but maybe not for everyone. Way OT, but similar support for URIs (e.g. URLObject) would likely need to take a similar approach: http://www.reddit.com/r/Python/comments/1r7h1t/python_objects_for_working_wi...

On Tue, Oct 7, 2014 at 8:25 PM, Wes Turner <wes.turner@gmail.com> wrote:
URLObject is actually a good reason against __strpath__: path.open() should work whether path is a filesystem path, or a URL object (with pathlib API), or any other path-like object; while open(path) would not. I think path.open() should be encouraged.
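A sketch of the duck-typing argument: code that calls `path.open()` works with any object exposing that method, whereas `open(path)` only works with filesystem paths. `MemoryPath` and `first_line` are hypothetical names; `pathlib.Path.open()` is the real method.

```python
import io
import tempfile
from pathlib import Path


class MemoryPath:
    """Hypothetical non-filesystem path exposing a pathlib-style open()."""

    def __init__(self, data):
        self._data = data

    def open(self, mode="r"):
        # Only reading is supported in this sketch.
        if mode != "r":
            raise ValueError("MemoryPath is read-only")
        return io.StringIO(self._data)


def first_line(path):
    # Relies only on path.open(), so any object with that method works.
    with path.open() as f:
        return f.readline().rstrip("\n")


with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / "greeting.txt"
    p.write_text("hi there\n")
    disk_line = first_line(p)

memory_line = first_line(MemoryPath("hello\nworld\n"))
print(disk_line)    # hi there
print(memory_line)  # hello
```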

Previous attempts at a pathlib like thing outside of the stdlib had the object inherit from str so it’d still work just fine in all those APIs. It even makes a certain bit of sense since a path really is just a specialized string. --- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On Oct 06, 2014, at 01:53 PM, Donald Stufft wrote:
Yeah, except: http://bugs.python.org/issue22570#msg228716 -Barry

On 10/06/2014 07:58 PM, Antoine Pitrou wrote:
If that's a concern, couldn't that be solved with an abstract base class "Path" ? pathlib path classes could be inheriting from it, while modules like os, json and others that wish to accept also strings could register str, then call str() on all input that passes an isinstance(input, Path) check, raise an error otherwise. The remaining question then would be where the abstract base class should live. Wolfgang

On Tue, Oct 7, 2014 at 8:03 AM, Wolfgang Maier < wolfgang.maier@biologie.uni-freiburg.de> wrote:
then call str() on all input that passes an isinstance(input, Path) check, raise an error otherwise.
Calling: path=str(path) within all API functions that accept file paths is a good enough solution, and one that doesn't change dependencies in stdlib. Most of the API functions I've had to call with str(path) don't return another path, but some more-structured else (json, csv, configparser, ...). Maintainers could silently start adding the path=str(path) prolog to the API so declaring official support for Path can be postponed until it is certified that there's 100% coverage in stdlib. Cheers, -- Juancarlo *Añez*

On Mon, Oct 6, 2014 at 10:47 AM, Barry Warsaw <barry@python.org> wrote:
I'd turn it around. You can construct a Path from an argument that can be either a string or another Path. Example:
So you could start refactoring stdlib code to use Path internally without forcing the callers to use Path, but still *allow* the callers to pass a Path. Though I'm not sure how this would work for return values without breaking backwards compatibility -- you'd have to keep returning strings and the callers would have to use the same mechanism to go back to using Paths. -- --Guido van Rossum (python.org/~guido)

On Oct 06, 2014, at 11:04 AM, Guido van Rossum wrote:
That's a very interesting perspective, and one I'd like to pursue further. I wonder if we can take a polymorphic approach similar to some bytes/str APIs, namely that if the argument is a pathlib object, a pathlib object could be returned, and if a str was passed, a str would be returned. An example is ConfigParser.read() which currently accepts only strings. I want to pass it a Path. It would be really useful if this method returned Paths when passed paths (they're used to verify which arguments were actually opened and read). There's probably a gazillion APIs that *could* be pathlib-ified, and I'm not looking to do a comprehensive expansion across the entire stdlib. But I think we could probably take a per-module approach, at least at first, to see if there are any hidden gotchas. Cheers, -Barry

On 2014-10-06 19:33, Barry Warsaw wrote:
I wonder whether it might be cleaner to use a simple function for it, something like: def pathify(result_path, like=original_path): if isinstance(original_path, Path): return Path(result_path) if isinstance(original_path, str): return str(result_path) raise TypeError('original path must be a path or a string, not {!r}'.format(type(original_path)))

This is a strong solution for new code. A quick hack to fix old code would be to simply cast the return values from anything returning a "path" to a string. If it's already a string, it'll be the same string. If it's a pathlib path, it'll end up a string. That minimises the effort needed to bring an old codebase up to speed with pathlib while implementing new code in pathlib, right? And you can work downwards from there; as long as all callers expecting a path do an explicit str-cast, you're safe either way? On 06/10/14 19:29, Guido van Rossum wrote:
-- Twitter: @onetruecathal, @formabiolabs Phone: +353876363185 Blog: http://indiebiotech.com miniLock.io: JjmYYngs7akLZUjkvFkuYdsZ3PyPHSZRBKNm6qTYKZfAM

On 7 Oct 2014 04:26, "Barry Warsaw" <barry@python.org> wrote:
pathlib is quite high level, so there's a chance of introducing undesirable circular dependencies in introducing it too broadly. With ipaddress and, as far as I am aware, pathlib, the intent was for it to be useful as a manipulation library, but to drop back to a serialised representation for transfer to other libraries (including the rest of the stdlib). This helps avoid the monolithic object model coupling that tends to pervade large Java applications. If the current spelling of that is too verbose/awkward/error prone, then I'd prefer to see that tackled directly (e.g. by introducing some appropriate calculated properties), rather than switching to the highly coupled all pervasive standard library change we were trying to avoid. Regards, Nick.

On 7 Oct 2014 19:48, "Nick Coghlan" <ncoghlan@gmail.com> wrote:
or
Note that a path protocol (with appropriate C API support) would also address this concern with excessive coupling to a specific concrete type. A single dispatch generic function as an adapter API would be another option, but would likely pose bootstrapping problems for the lowest level interfaces like os.path and the open builtin. Cheers, Nick.
Regards, Nick.

On Oct 07, 2014, at 07:54 PM, Nick Coghlan wrote:
I wouldn't expect low level APIs like os.path and built-in open to accept Path objects. pathlib already covers most of those use cases, and whatever is missing from there can probably be easily added. It's higher level libraries accepting Path objects that is more interesting I think. Cheers, -Barry

I might agree with you if there wasn’t 20 years of code that expects to be able to pass a “path” in to various places. Having to keep a mental note, or worse look up in the documentation every time, where I’m expected to pass a Path object and where I’m expected to pass a str is just about the worst UX I can imagine. --- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

Donald Stufft writes:
On Oct 7, 2014, at 9:25 AM, random832@fastmail.us wrote:
You wouldn't need a mental memo. Just do "import os" rather than "from os import *" and the "os." prefix will tell you you need a str. Otherwise use a Path. That would be the goal, I expect. How to get there, I'm not sure.

I never use star imports. The problem is that not only does tons of stuff in the stdlib currently accept only str, but so do tons of things on PyPI. Without looking at the implementation/docs for each one of these items I won’t know if I can pass it a str or a Path (or more likely I won’t know if I need to coerce my Path to a str since i doubt anyone is going to make Path only APIs). For me personally this means that pathlib will likely never be a useful tool because I find the need to coerce all over the place far worse than using the relevant os.path functions. --- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On 10/07/2014 04:59 PM, Stephen J. Turnbull wrote:
That's exactly what I'd be worried about. It would require a big effort to convert enough APIs to make the few that don't take Paths insignificant. That would also signal a strong urge to third-party libs to become Path-aware. However, I'm skeptical that python-dev can muster enough energy for this effort. I believe that a .path attribute (name to be discussed) is probably as good as we can do. In reverse, it means that Path should grow many utility methods for common operations.
Note that Barry said: "I wouldn't expect low level APIs like os.path and built-in open to accept Path objects." which refers to open(), not os.open(). Georg

On 07.10.2014 11:48, Nick Coghlan wrote:
The approach to use pathlib internally in the stdlib while making sure that callers will get strings as return values should work fine. We've been using a similar approach with mxURL in some of our application server code. mxURL which provides a parsed URL object that implements common tasks such as joining URLs, rebuilding, etc. The approach makes code more readable, you get the option of passing in a string or an already parsed URL object (saving some overhead) and code using the APIs get strings which prevents other code from complaining about a wrong type. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source
::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/

On 10/6/2014 2:04 PM, Guido van Rossum wrote:
To me, the first question is whether we 'believe' in pathlib enough to really support it in the stdlib and encourage its use.
If yes (which the above seems to hint), the second question is how to extend APIs while remaining backward compatible. For functions that take a pathstring in and do not return a pathstring, just allow a Path as an alternate input type. There are already functions that take either a pathstring or an open file-like object.
Some of the os functions that take a pathstring and return a pathstring are already 'duplicated' as pathlib functions or Path methods that map Path to Path. For others, there is a choice between duplicating the function in pathlib and making the output type depend on the input type. I do not remember the current scandir spec, but when introduced it should at least optionally accept and produce Paths, and perhaps live in pathlib instead of os. I suspect that functions that produce a pathstring without a pathstring input, such as mapping an int file descriptor to a filename, are rare and low-level enough to leave as is. But that should be specified in any transition-plan PEP. -- Terry Jan Reedy
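Terry's second option, making the output type depend on the input type, might look roughly like this. The function `drop_suffix` is hypothetical, and treating str inputs with POSIX rules is an assumption made to keep the sketch deterministic:

```python
from pathlib import PurePath, PurePosixPath


def drop_suffix(path):
    """Remove the final suffix, returning the same kind of object that came in.

    Hypothetical function: str in gives str out (so existing callers are
    unaffected), PurePath in gives PurePath out.
    """
    if isinstance(path, PurePath):
        return path.with_suffix("")
    if isinstance(path, str):
        # Assumption: str inputs are interpreted with POSIX path rules.
        return str(PurePosixPath(path).with_suffix(""))
    raise TypeError("expected str or PurePath, not %s" % type(path).__name__)


print(repr(drop_suffix("/tmp/report.txt")))                # '/tmp/report'
print(repr(drop_suffix(PurePosixPath("archive.tar.gz"))))  # PurePosixPath('archive.tar')
```

The backward-compatibility argument rests on the str branch: old call sites never receive a Path they weren't expecting.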

On Mon, Oct 6, 2014 at 2:39 PM, Terry Reedy <tjreedy@udel.edu> wrote:
To me, the first question is whether we 'believe' in pathlib enough to really support it in the stdlib and encourage its use.
os.path is cumbersome (horrible?) when compared to pathlib. The issue is prevalent throughout stdlib. For example, json won't take a pathlib.Path, so you have to pass str(mypath). Cheers, -- Juancarlo *Añez*

On 10/06/2014 10:20 PM, Juancarlo Añez wrote:
Horrible? Please.
The issue is prevalent throughout stdlib. For example, json won't take a pathlib.Path, so you have to pass str(mypath).
That it is prevalent is exactly the problem with fixing it, in my opinion. This is like the situation with context managers. We've taken about 3-4 minor releases to add "with" support to objects that can logically support it. Nobody remembers this, so people have to refer to the docs (or the code) to see if and when e.g. smtplib.SMTP gained "with" support. However, the context managers are a few dozen classes at most. With paths, there are hundreds of APIs that would have to be updated to take Paths in the stdlib alone. Granted, a good portion would probably work fine since they only pass through paths to lower level APIs, but still every one has to be checked. Going by precedent, that's not something that we would be able to do consistently, even throughout several releases. (Another precedent is Argument Clinic.) cheers, Georg

On Mon, Oct 6, 2014 at 7:26 PM, Juancarlo Añez <apalala@gmail.com> wrote:
What do you think would be the nastier impacts of making pathlib.Path inherit from str?
Duplication of storage. Currently, pathlib.Path objects store a list of path components. To inherit meaningfully from str, they would have to store the joined path string as well.

Hi, Not inheriting from built-in classes such as str, list or tuple was one of the design points of pathlib. It will not change in the future ;-) PEP 428 outlines this, but you can probably find a more detailed discussion in the python-ideas archive. Regards Antoine. On Tue, 07 Oct 2014 00:52:55 +0100 MRAB <python@mrabarnett.plus.com> wrote:

On Mon, Oct 6, 2014 at 7:58 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
See also rejected PEP 355: "Subclassing from str is a particularly bad idea; many string operations make no sense when applied to a path." http://legacy.python.org/dev/peps/pep-0355/ (I would add that many str operations make no sense - period, so propagating them into newer designs would be a mistake.)
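To make the PEP 355 objection concrete, here is a sketch of a hypothetical `StrPath` that does subclass str. Its own path method works, but every inherited string operation is still available and silently produces plain strings with string semantics:

```python
class StrPath(str):
    """Hypothetical path type that subclasses str (the design pathlib avoided)."""

    def joinpath(self, *parts):
        # A path-specific method that preserves the subclass.
        return StrPath("/".join([self.rstrip("/"), *parts]))


p = StrPath("/usr/lib")
child = p.joinpath("python3")  # '/usr/lib/python3', still a StrPath, as intended

# Every inherited str operation is available too, and none of them know about paths:
sibling = p + "64"   # '/usr/lib64': a different directory, not a child, and a plain str
shouted = p.upper()  # '/USR/LIB': meaningless on a case-sensitive filesystem
print(type(child).__name__, type(sibling).__name__)  # StrPath str
```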

Many str operations make no sense when applied to lots of different types of strings; even more so, many bytes methods make no sense when applied to any kind of bytes except a very narrow subset. I’m pretty sure that “does every single method make sense in every scenario” is not a useful distinction to make. However, the sheer weight of APIs out there that expect str, and only str, means that pathlib isn’t very useful unless you either wrap every call in str() or use some attribute/method on it to convert to str. --- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

On Oct 06, 2014, at 11:19 PM, Georg Brandl wrote:
I appreciate that this is a problem with such transitions. Is it an argument for never doing so though? Is it better that the stdlib prohibit adoption of advanced features and libraries than to do so piecemeal? I'm not so sure; even though it's admittedly annoying when such support is missing, it's really nice when they're there. How useful is pathlib if it can't be used with the stdlib? Cheers, -Barry

On Mon, 6 Oct 2014 23:30:01 -0400 Donald Stufft <donald@stufft.io> wrote:
IMO it's reasonable, assuming we devise a dedicated protocol for getting a path's representation (e.g. __strpath__). Concrete type checking should be avoided. I hope someone (Barry? :-)) can take the time to think it out. Regards Antoine.
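The `__strpath__` name is only a proposal in this thread. For comparison, Python 3.6 later added a protocol of this shape under a different name: `__fspath__`, consumed by `os.fspath()` (PEP 519). A minimal sketch using that later API, assuming Python 3.6+; `SimplePath` is a made-up class. It also shows one answer to "what would it do that `__str__` wouldn't": only types that opt in are accepted.

```python
import os
from pathlib import PurePosixPath


class SimplePath:
    """Hypothetical third-party path class opting in to the protocol."""

    def __init__(self, *parts):
        self.parts = parts

    def __fspath__(self):
        return "/".join(self.parts)


print(os.fspath("plain/str"))                  # plain/str (str passes through)
print(os.fspath(PurePosixPath("/etc/hosts")))  # /etc/hosts (pathlib implements __fspath__)
print(os.fspath(SimplePath("srv", "data")))    # srv/data (so can third-party types)
try:
    os.fspath(42)  # unlike str(), arbitrary objects are rejected
except TypeError as exc:
    print("rejected:", exc)
```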

On Oct 07, 2014, at 11:34 AM, Antoine Pitrou wrote:
What would __strpath__ do that __str__ wouldn't do? Or do you think it's better to explicitly disallow str-like objects that aren't path objects? What I'm trying to understand is whether str(argument) is that "path protocol" or whether there's a good reason that something else that's specifically not str-ification is required. Cheers, -Barry

On Wed, Oct 8, 2014 at 12:37 AM, Barry Warsaw <barry@python.org> wrote:
Currently, __str__ will happily return a string representation of basically anything. More generally than __strpath__, maybe what's needed is a method "give me a str that accurately represents this object", to be provided only by those types which are "virtually strings". It'd be an alternative to subclassing str, as a means of saying "I'm a string!". Or maybe this should be done as a collections.abc - there's ByteString, maybe there should be TextString. Then anything that registers itself as a TextString should be assumed to function as a string, and a path could just register itself. Would that make more sense than __strpath__ does? ChrisA
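Chris's ABC variant can be sketched with the real `abc` machinery. `TextString` and `as_text` are hypothetical (there is no such class in `collections.abc`); the point is that a type declares itself string-like by registering, and anything unregistered is rejected:

```python
import abc
from pathlib import PurePath, PurePosixPath


class TextString(abc.ABC):
    """Hypothetical ABC; collections.abc has no such class."""


# str is trivially text; a path type opts in simply by registering.
TextString.register(str)
TextString.register(PurePath)  # also covers subclasses such as PurePosixPath


def as_text(obj):
    """Accept anything registered as a TextString and reject everything else."""
    if not isinstance(obj, TextString):
        raise TypeError("%s is not registered as a TextString" % type(obj).__name__)
    return str(obj)


print(as_text(PurePosixPath("/etc/hosts")))  # /etc/hosts
```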

On 7 October 2014 14:37, Barry Warsaw <barry@python.org> wrote:
What would __strpath__ do that __str__ wouldn't do? Or do you think it's better to explicitly disallow str-like objects that aren't path objects?
What is a str-like object? A lot of objects are acceptable to str(); most of them aren't "str-like" in any reasonable sense of the term (e.g. function, int), and probably shouldn't be acceptable as paths. edk

On 7 October 2014 14:47, Barry Warsaw <barry@python.org> wrote:
Yeah - but whatever the term, I think being able to str() something is too weak a predicate for using it as a path. If there were a way for an object to declare itself str-like in the sense that Paths are, that'd be much more reasonable, imv.

On 7 October 2014 23:37, Barry Warsaw <barry@python.org> wrote:
It's mostly a matter of offering better failure modes - the reasons would be similar to why we added operator.index and __index__ in order to expand various APIs from "only int or long objects" without expanding them so far that they also accepted float objects or strings the way coercing via "int" would. Using str(x) implicitly allows lots of nonsense that should throw an immediate TypeError to instead throw OSError later on (or, worse, perhaps even appear to work). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
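The `int` vs `operator.index` contrast Nick refers to is visible directly in the stdlib: `int()` coerces liberally, while `operator.index()` only accepts objects that declare themselves integers via `__index__`. The helper `coerce_both` and the class `Seven` exist only for this illustration:

```python
import operator


def coerce_both(value):
    """Return (int(value), operator.index(value) or the TypeError it raised)."""
    try:
        strict = operator.index(value)
    except TypeError as exc:
        strict = exc
    return int(value), strict


# int() happily truncates a float; operator.index() refuses it.
loose, strict = coerce_both(3.9)
print(loose)                          # 3
print(isinstance(strict, TypeError))  # True


class Seven:
    """Opts in to being treated as an integer."""

    def __index__(self):
        return 7


print(operator.index(Seven()))  # 7
```

A path protocol would play the role of `operator.index` here, with `str()` in the role of `int()`.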

On 8 October 2014 00:40, Nick Coghlan <ncoghlan@gmail.com> wrote:
Combining this thought with Chris Angelico's reply, I actually wonder if the index vs int analogy is even more applicable than I first thought. What if the protocol was __text__ with a new text() builtin (or at least an operator.text() function), and it was advised to only be implemented by types where they were, at least conceptually, truly representable as strings? That's basically what was done with the __index__ method in http://www.python.org/dev/peps/pep-0357/ to introduce ducktyping to several APIs that previously only worked with builtin int/long objects. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
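A minimal sketch of the proposed `text()` function, modelled on how `operator.index()` treats `__index__`: the special method is looked up on the type and its result must be a real str. Both `text()` and `__text__` are hypothetical (neither exists in Python), and `CommandLine` is a made-up example of a type that is "truly representable as a string":

```python
def text(obj):
    """Hypothetical text() modelled on operator.index() (PEP 357).

    str passes through unchanged; other objects must define __text__, which is
    looked up on the type (like other special methods) and must return a str.
    """
    if isinstance(obj, str):
        return obj
    try:
        method = type(obj).__text__
    except AttributeError:
        raise TypeError(
            "'%s' object cannot be interpreted as text" % type(obj).__name__
        ) from None
    result = method(obj)
    if not isinstance(result, str):
        raise TypeError("__text__ returned non-str (type %s)" % type(result).__name__)
    return result


class CommandLine:
    """Hypothetical type that is conceptually just a string."""

    def __init__(self, *argv):
        self.argv = argv

    def __text__(self):
        return " ".join(self.argv)


print(text(CommandLine("ls", "-l")))  # ls -l
```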

On Wed, Oct 8, 2014 at 2:39 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
I like the comparison with __index__. It's up to the class to decide whether it wants to be "truly representable as a string". Command-line arguments probably would be, in some canonical form. Config files probably not, as they need their structure, but if you want your config object to act exactly like a text string, then you define __text__. ChrisA

On Wed, Oct 8, 2014 at 2:55 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
You pass a Path object to anything whatsoever. If it wants a Path, it uses it. As soon as that Path gets picked up by something that was expecting a string, it'll easily and conveniently coerce to string. But if you pass something other than a string or string-equivalent, it'll throw an error rather than calling str() on it. Where is the definition of "able to convert to str implicitly"?
Because this could just call Path.__text__(), get back a string, and use that. Basically, the use-case is that you could create something that's not a string, but can be used like a string, and APIs unaware of it don't need to be made aware of it. Downside: These "text-aware" objects would require Python 3.whatever, and would be hard to backport. On older versions, they wouldn't implicitly become strings. ChrisA

On Oct 7, 2014, at 7:40, Nick Coghlan <ncoghlan@gmail.com> wrote:
The "worse" case would be pretty common. Any API that writes files would just try to create a file named str(path), and on most POSIX platforms that would succeed for almost anything--bytes, int, TextIOWrapper, list of paths that you forgot to open in a listcomp, sequence of chars that you forgot to ''.join, function that you meant to call instead of just referencing, arbitrary generic-repr'd or constructor-repr'd instance, ... Most of my memories of Alpha (a Tcl-programmable classic Mac text editor) are of debugging and cleaning up after errors exactly like this. It would be even more fun for novices trying to figure out how to pass names with angle brackets and quotes to find or rm in a shell they barely know how to use...
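Andrew's "worse" case is easy to reproduce. Each object below is a plausible bug (the wrong thing passed where a path was meant), yet `str()` turns every one into a perfectly legal filename. This assumes a POSIX filesystem; Windows would reject some of these characters:

```python
import os
import tempfile

# Plausible mistakes: an int, bytes, a list you forgot to iterate, a function you forgot to call.
mistakes = [42, b"data.txt", ["a.txt", "b.txt"], len]

with tempfile.TemporaryDirectory() as tmp:
    for obj in mistakes:
        name = str(obj)  # str() accepts all of them without complaint
        # Joining onto tmp only keeps the demo self-contained; a naive API doing
        # open(str(path), "w") would create these in the current directory.
        with open(os.path.join(tmp, name), "w") as f:
            f.write("oops\n")
    created = sorted(os.listdir(tmp))

print(created)
```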

On 6 October 2014 18:47, Barry Warsaw <barry@python.org> wrote:
I find it worse than a disincentive, it makes understanding the code perceptibly harder, which is a maintenance issue. Having an attribute that returns the string representation would be a substantial improvement (as it's the extra parentheses from the str call that I find the most distracting, that and the code smell that an "explicit cast" involves). Having more things accept Path objects would be good, but there will always be 3rd party libraries, as well as places where you want to pass a pathname that really *shouldn't* have to expect them. Paul

On Oct 07, 2014, at 02:55 PM, Paul Moore wrote:
I realize there's another thing that bugs me about sprinkling str() calls all over the place, and this relates to my other question about whether str()-ability is "the path protocol". The problem is that if I'm looking at some random code and see: my_parser.load(str(path)) I really don't have any idea what 'path' is. Maybe that's a good thing, but in the few cases where I did this, it seemed bad. ;) OTOH, if I saw this, it would be a strong clue that path were a pathlib object: my_parser.load(path.string_path) substituting .string_path for whatever color the shed gets painted. Cheers, -Barry
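Barry's attribute idea can be tried out today with a small subclass. `MyPath`, its `string_path` property, and the str-only `load()` stand-in are all hypothetical; the attribute returns exactly what `str()` would, the difference is only in what the call site tells the reader:

```python
from pathlib import PurePosixPath


class MyPath(PurePosixPath):
    """Hypothetical subclass adding the attribute Barry describes (name still to be bikeshedded)."""

    @property
    def string_path(self):
        # Same value as str(self), but the spelling signals "this is a path object".
        return str(self)


def load(filename):
    """Stand-in for a str-only API such as my_parser.load()."""
    if not isinstance(filename, str):
        raise TypeError("load() needs a str, not %s" % type(filename).__name__)
    return filename


path = MyPath("/etc/app/config.ini")
print(load(path.string_path))  # /etc/app/config.ini
```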

On 10/07/2014 07:24 AM, Barry Warsaw wrote:
Neither should be needed: my_parser.load(path) should do the trick. What was the point of adding pathlib to the stdlib if we cannot use it with the stdlib? Having a __strpath__ and/or __bytespath__ would also allow third-party path libraries to be utilized. -- ~Ethan~

That would be ideal; however, very few (any?) APIs in the Python stdlib support pathlib as it stands. Passing paths around is sprinkled all over the stdlib, and history suggests it’ll be a fairly lengthy process to actually get all of those stdlib locations figured out. That doesn’t even begin to touch on all of the places *not* in the stdlib that expect str-based paths, some of which are C code, so making them use a Path object isn’t the easiest thing ever.
Having a __strpath__ and/or __bytespath__ would also allow third-party path libraries to be utilized.
--- Donald Stufft PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

Does path.py have different performance in this respect? I like the Path.isdir(), Path.walk() methods; but maybe not for everyone. Way OT, but similar support for URIs (e.g. URLObject) would likely need to take a similar approach: http://www.reddit.com/r/Python/comments/1r7h1t/python_objects_for_working_wi... On Oct 6, 2014 12:50 PM, "Barry Warsaw" <barry@python.org> wrote:

On Tue, Oct 7, 2014 at 8:25 PM, Wes Turner <wes.turner@gmail.com> wrote:
URLObject is actually a good reason against __strpath__: path.open() should work whether path is a filesystem path, or a URL object (with pathlib API), or any other path-like object; while open(path) would not. I think path.open() should be encouraged.
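A minimal sketch of the duck-typing argument: code that calls `path.open()` instead of `open(path)` works with `pathlib.Path` and with any other object that offers a pathlib-style `open()`. `MemoryPath` and `first_line` are hypothetical:

```python
import io
import tempfile
from pathlib import Path


class MemoryPath:
    """Hypothetical non-filesystem path type that exposes a pathlib-style open()."""

    def __init__(self, contents):
        self._contents = contents

    def open(self, mode="r"):
        if mode != "r":
            raise ValueError("MemoryPath is read-only")
        return io.StringIO(self._contents)


def first_line(path):
    # Calling path.open() rather than open(path) works for any object with a
    # pathlib-style open() method, whether or not it lives on the filesystem.
    with path.open() as f:
        return f.readline().rstrip("\n")


print(first_line(MemoryPath("hello\nworld\n")))  # hello

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / "notes.txt"
    p.write_text("first\nsecond\n")
    print(first_line(p))  # first
```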
Participants (23):
- Alexander Belopolsky
- Andrew Barnert
- Antoine Pitrou
- Barry Warsaw
- Cathal Garvey
- Chris Angelico
- Donald Stufft
- Ed Kellett
- Ethan Furman
- Georg Brandl
- Guido van Rossum
- Juancarlo Añez
- M.-A. Lemburg
- MRAB
- Nick Coghlan
- Paul Moore
- Petr Viktorin
- random832@fastmail.us
- Stephen J. Turnbull
- Terry Reedy
- Wes Turner
- Wolfgang Maier
- Yury Selivanov