tweaking the file system path protocol

What do you think of this idea for a slight modification to os.fspath? The current version checks whether its arg is an instance of str, bytes or any subclass and, if so, returns the arg unchanged. In all other cases it tries to call the type's __fspath__ method to see if it can get str, bytes, or a subclass thereof this way.

My proposal is to change this to:

1) check whether the type of the argument is str or bytes *exactly*; if so, return the argument unchanged
2) check whether __fspath__ can be called on the type and returns an instance of str, bytes, or any subclass (just like in the current version)
3) check whether the type is a subclass of str or bytes and, if so, return it unchanged

This would have the following implications:

a) it would speed up the very common case when the arg is either a str or a bytes instance exactly
b) user-defined classes that inherit from str or bytes could control their path representation just like any other class
c) subclasses of str/bytes that don't define __fspath__ would still work like they do now, but their processing would be slower
d) subclasses of str/bytes that accidentally define a __fspath__ method would change their behavior

I think cases c) and d) could be sufficiently rare that the pros outweigh the cons?

Here's how the proposal could be implemented in the pure Python version (os._fspath):

    def _fspath(path):
        path_type = type(path)
        if path_type is str or path_type is bytes:
            return path

        # Work from the object's type to match method resolution of other magic
        # methods.
        try:
            path_repr = path_type.__fspath__(path)
        except AttributeError:
            if hasattr(path_type, '__fspath__'):
                raise
            elif issubclass(path_type, (str, bytes)):
                return path
            else:
                raise TypeError("expected str, bytes or os.PathLike object, "
                                "not " + path_type.__name__)
        if isinstance(path_repr, (str, bytes)):
            return path_repr
        else:
            raise TypeError("expected {}.__fspath__() to return str or bytes, "
                            "not {}".format(path_type.__name__,
                                            type(path_repr).__name__))
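To make implications b) and d) concrete, here is a small sketch that runs the proposed lookup order next to the real os.fspath. The names proposed_fspath and Shouty are made up for illustration; the function body just follows steps 1) to 3) as described in the proposal and is not part of the os module.

```python
import os


def proposed_fspath(path):
    """Hypothetical variant of os.fspath following steps 1) to 3)."""
    path_type = type(path)
    # Step 1: exact str/bytes take the fast path.
    if path_type is str or path_type is bytes:
        return path
    # Step 2: ask the type for __fspath__ before any subclass check.
    try:
        path_repr = path_type.__fspath__(path)
    except AttributeError:
        if hasattr(path_type, "__fspath__"):
            raise
        # Step 3: plain str/bytes subclasses are returned unchanged.
        if issubclass(path_type, (str, bytes)):
            return path
        raise TypeError("expected str, bytes or os.PathLike object, not "
                        + path_type.__name__)
    if isinstance(path_repr, (str, bytes)):
        return path_repr
    raise TypeError("expected {}.__fspath__() to return str or bytes, not {}"
                    .format(path_type.__name__, type(path_repr).__name__))


class Shouty(str):
    """Illustrative str subclass that defines __fspath__."""

    def __fspath__(self):
        return self.upper()


p = Shouty("data.txt")
current = os.fspath(p)         # current rules: __fspath__ on a str subclass is ignored
proposed = proposed_fspath(p)  # proposed rules: __fspath__ is consulted first
print(repr(current), repr(proposed))
```

The two results differ only for str/bytes subclasses that define __fspath__, which is exactly the group covered by b) and d).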

On Tue, May 23, 2017 at 12:12:11PM +0200, Wolfgang Maier wrote:
How about simplifying the implementation of fspath by giving str and bytes a __fspath__ method that returns str(self) or bytes(self)?

    class str:
        def __fspath__(self):
            return str(self)  # Must be str, not type(self).

(1) We can avoid most of the expensive type checks.

(2) Subclasses of str and bytes don't have to do anything to get a useful default behaviour.

    def fspath(path):
        try:
            dunder = type(path).__fspath__
        except AttributeError:
            raise TypeError(...) from None
        else:
            if dunder is not None:
                result = dunder(path)
                if type(result) in (str, bytes):
                    return result
        raise TypeError('expected a str or bytes, got ...')

The reason for the not None check is to allow subclasses to explicitly deny that they can be used for paths by setting __fspath__ to None in the subclass.

-- Steve
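Since the builtin str cannot be given a new method from Python code, the idea above can only be tried out with a stand-in. The following sketch assumes a hypothetical PathStr class playing the role of "str with a default __fspath__", and a NotAPath subclass that opts out by setting __fspath__ to None; neither name exists in the standard library.

```python
class PathStr(str):
    """Stand-in for a str that ships with a default __fspath__ (hypothetical)."""

    def __fspath__(self):
        return str(self)  # plain str, not type(self)


class NotAPath(PathStr):
    """Subclass that opts out of the protocol by setting __fspath__ to None."""

    __fspath__ = None


def fspath(path):
    """Protocol-only lookup: no isinstance checks on the argument itself."""
    try:
        dunder = type(path).__fspath__
    except AttributeError:
        raise TypeError("not a path: " + type(path).__name__) from None
    if dunder is not None:
        result = dunder(path)
        if type(result) in (str, bytes):
            return result
    raise TypeError("expected a str or bytes from " + type(path).__name__)


print(fspath(PathStr("spam.txt")))
try:
    fspath(NotAPath("spam.txt"))
except TypeError as exc:
    print("rejected:", exc)
```

Note that the result check uses type(result) in (str, bytes), so a subclass instance returned from __fspath__ would be rejected; that is stricter than the isinstance check in the current os.fspath.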

On Tue, May 23, 2017 at 1:49 PM, Steven D'Aprano <steve@pearwood.info> wrote:
How about simplifying the implementation of fspath by giving str and bytes a __fspath__ method that returns str(self) or bytes(self)?
The compatibility issue I mention in the other email I just sent as a response to the OP will appear if a subclass returns something other than str(self) or bytes(self). -- Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven +

On Tue, May 23, 2017 at 1:12 PM, Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> wrote:
The reason why this was not done was that a str or bytes subclass that implements __fspath__(self) would work in both pre-3.6 and 3.6+ but behave differently. This would also be incompatible with existing code using str(path) for compatibility with the stdlib (the old way, which people still use for pre-3.6 compatibility even in new code).
To get the same performance benefit for str and bytes, but without changing functionality, there could first be the exact type check and then the isinstance check. This would add some performance penalty for PathLike objects. Removing the isinstance part of the __fspath__() return value, which I find less useful, would compensate for that. (3) would not be necessary in this version. Are you asking for other reasons, or because you actually have a use case where this matters? If this performance really matters somewhere, the version I describe above could be considered. It would have 100% backwards compatibility, or a little less (99% ?) if the isinstance check of the __fspath__() return value is removed for performance compensation.
b) user-defined classes that inherit from str or bytes could control their path representation just like any other class
Again, this would cause differences in behavior between different Python versions, and based on whether str(path) is used or not. —Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven +
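The behavior-preserving alternative described in this message (exact type check first, then the isinstance check, then __fspath__) could look roughly like the sketch below. The name fspath_fast and the Odd class are made up for illustration, and the sketch keeps the isinstance check on the return value rather than dropping it.

```python
import os
import pathlib


def fspath_fast(path):
    """Same results as os.fspath, with an exact-type shortcut up front."""
    path_type = type(path)
    # Fast path for the common case: exact str or bytes.
    if path_type is str or path_type is bytes:
        return path
    # Subclasses of str/bytes are still returned unchanged, so any
    # __fspath__ they define keeps being ignored (no semantic change).
    if isinstance(path, (str, bytes)):
        return path
    try:
        path_repr = path_type.__fspath__(path)
    except AttributeError:
        if hasattr(path_type, "__fspath__"):
            raise
        raise TypeError("expected str, bytes or os.PathLike object, not "
                        + path_type.__name__)
    if isinstance(path_repr, (str, bytes)):
        return path_repr
    raise TypeError("expected {}.__fspath__() to return str or bytes, not {}"
                    .format(path_type.__name__, type(path_repr).__name__))


class Odd(str):
    """Illustrative str subclass with an __fspath__ that stays ignored."""

    def __fspath__(self):
        return "something else"


samples = ["a.txt", b"a.txt", Odd("a.txt"), pathlib.PurePosixPath("dir/a.txt")]
for sample in samples:
    # Both functions are expected to agree on every sample.
    print(repr(fspath_fast(sample)), repr(os.fspath(sample)))
```

The cost mentioned in the message is visible in the ordering: a PathLike object now passes through two checks before its __fspath__ method is reached.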

On 05/23/2017 06:17 PM, Koos Zevenhoven wrote:
Hi Koos and thanks for your detailed response,
I'm not sure that sounds very convincing because that exact problem exists, was discussed and accepted in your PEP 519 for all other classes. I do not really see why subclasses of str and bytes should require special backwards compatibility here. Is there a reason why you are thinking they should be treated specially?
Right, that was one thing I forgot to mention in my list. My proposal would also speed up processing of pathlike objects because it moves the __fspath__ call up in front of the isinstance check. Your alternative would speed up only str and bytes, but would slow down Path-like classes. In addition, I'm not sure that removing the isinstance check on the return value of __fspath__() is a good idea because that would mean giving up the guarantee that os.fspath returns an instance of str or bytes and would effectively force library code to do the isinstance check anyway even if the function may have performed it already, which would worsen performance further.
That use case question is somewhat difficult to answer. I had this idea when working on two bug tracker issues (one concerning fnmatch and a follow-up one on os.path.normcase, which is called by fnmatch.filter and, in turn, calls os.fspath). fnmatch.filter is a case where performance matters and the decision when and where to call the rather expensive os.path.normcase->os.fspath there is not entirely straightforward. So, yes, I was basically looking at this because of a potential use case, but I say potential because I'm far from sure that any speed gain in os.fspath will be big enough to be useful for fnmatch.filter in the end.

On 05/23/2017 06:41 PM, Wolfgang Maier wrote:
Ah, sorry, I misunderstood what you were trying to say, but now I'm getting it! Subclasses of str and bytes were of course usable as path arguments before simply because they were subclasses of them. Now they would be picked up based on their __fspath__ method, but old versions of Python executing code using them would still use them directly. Have to think about this one a bit, but thanks for pointing it out.

On Tue, May 23, 2017 at 7:53 PM, Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> wrote:
Yes, this is exactly what I meant. I noticed I had left out some of the details of the reasoning, sorry. I tried to fix that in my response to Steven. — Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven +

On Tue, 23 May 2017 at 03:13 Wolfgang Maier < wolfgang.maier@biologie.uni-freiburg.de> wrote:
What exactly is the performance issue you are having that is leading to this proposal? I ask because b) and d) change semantics and so it's not a small thing to make this change at this point since Python 3.6 has been released. So unless there's a major performance impact I'm reluctant to want to change it at this point.

I see no future for this proposal. Sorry Wolfgang! For future reference, the proposal was especially weak because it gave no concrete examples of code that was inconvenienced in any way by the current behavior. (And the performance hack of checking for exact str/bytes can be made without changing the semantics.) On Tue, May 23, 2017 at 10:04 AM, Brett Cannon <brett@python.org> wrote:
-- --Guido van Rossum (python.org/~guido)

23.05.17 20:04, Brett Cannon wrote:
It seems to me that the purpose of this proposition is not performance, but the possibility to use __fspath__ in str or bytes subclasses. Currently defining __fspath__ in str or bytes subclasses doesn't have any effect. I don't know a reasonable use case for this feature. The __fspath__ method of str or bytes subclasses returning something not equivalent to self looks confusing to me.

On Wed, May 24, 2017 at 12:18 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
Yes, that would be another reason. Only when Python drops support for strings as paths, can people start writing such subclasses. I'm sure many would now say dropping str/bytes path support won't even happen in Python 4. -- Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven +

On Wed, May 24, 2017 at 12:18:16AM +0300, Serhiy Storchaka wrote:
23.05.17 20:04, Brett Cannon wrote:
That's how I interpreted the proposal, with any performance issue being secondary. (I don't expect that converting path-like objects to strings would be the bottleneck in any application doing actual disk IO.)
I can imagine at least two:

- emulating something like DOS 8.3 versus long file names;
- case normalisation

but what would make this really useful is for debugging. For instance, I have used something like this to debug problems with int() being called wrongly:

    py> class MyInt(int):
    ...     def __int__(self):
    ...         print("__int__ called")
    ...         return super().__int__()
    ...
    py> x = MyInt(23)
    py> int(x)
    __int__ called
    23

It would be annoying and inconsistent if int(x) avoided calling __int__ on int subclasses. But that's exactly what happens with fspath and str. I see that as a bug, not a feature: I find it hard to believe that we would design an interface for string-like objects (paths) and then intentionally prohibit it from applying to strings. And if we did, surely it's a misfeature.

Why *shouldn't* subclasses of str get the same opportunity to customize the result of __fspath__ as they get to customize their __repr__ and __str__?

    py> class MyStr(str):
    ...     def __repr__(self):
    ...         return 'repr'
    ...     def __str__(self):
    ...         return 'str'
    ...
    py> s = MyStr('abcdef')
    py> repr(s)
    'repr'
    py> str(s)
    'str'

I don't think that backwards compatibility is an issue here. Nobody will have had reason to write str subclasses with __fspath__ methods, so changing the behaviour to no longer ignore them shouldn't break any code. But of course, we should treat this as a new feature, and only change the behaviour in 3.7.

-- Steve

On 05/24/2017 02:41 AM, Steven D'Aprano wrote:
This is almost exactly what I have been thinking (just that I couldn't have presented it so clearly)!

Let's look at a potential use case for this. Assume that in a package you want to handle several paths to different files and directories that are all located in a common package-specific parent directory. Then using the path protocol you could write this:

    import os

    class PackageBase(object):
        basepath = '/home/.package'

    class PackagePath(str, PackageBase):
        def __fspath__(self):
            return os.path.join(self.basepath, str(self))

    config_file = PackagePath('.config')
    log_file = PackagePath('events.log')
    data_dir = PackagePath('data')

    with open(log_file, 'w') as log:
        log.write('package paths initialized.\n')

Just that this wouldn't currently work because PackagePath inherits from str. Of course, there are other ways to achieve the above, but when you think about designing a Path-like object class, str is just a pretty attractive base class to start from.

Now let's look at compatibility of a class like PackagePath under this proposal:

- if client code uses e.g. str(config_file) and proceeds to treat the resulting object as a path, unexpected things will happen and, yes, that's bad. However, this is no different from any other Path-like object for which __str__ and __fspath__ don't define the same return value.

- if client code uses the PEP-recommended backwards-compatible way of dealing with paths,

      path.__fspath__() if hasattr(path, "__fspath__") else path

  things will just work. Interestingly, this would *currently* produce an unexpected result, namely that it would execute the __fspath__ method of the str subclass.

- if client code uses instances of PackagePath as paths directly, then in Python 3.6 and below that would lead to an unintended outcome, while in Python 3.7 things would work. This is *really* bad.
But what it means is that, under the proposal, using a str or bytes subclass with an __fspath__ method defined makes your code backwards-incompatible and the solution would be not to use such a class if you want to be backwards-compatible (and that should get documented somewhere). This restriction, of course, limits the usefulness of the proposal in the near future, but that disadvantage will vanish over time. In 5 years, not supporting Python 3.6 anymore maybe won't be a big deal anymore (for comparison, Python 3.2 was released 6 years ago and since last year pip is no longer supporting it). As Steven pointed out, the proposal is *very* unlikely to break existing code.

So to summarize, the proposal

- avoids an up-front isinstance check in the protocol and thereby speeds up the processing of exact strings and bytes and of anything that follows the path protocol.*
- slows down the processing of instances of regular str and bytes subclasses*
- makes the "path.__fspath__() if hasattr(path, "__fspath__") else path" idiom consistent for subclasses of str and bytes that define __fspath__
- opens up the opportunity to write str/bytes subclasses that represent a path other than just their self in the future**

Still sounds like a net win to me, but let's see what I forgot ...

* yes, speed is typically not your primary concern when it comes to IO; what's often neglected though is that not all path operations have to trigger actual IO (things in os.path for example don't typically perform IO)

** somebody on the list (I guess it was Koos?) mentioned that such classes would only make sense if Python ever disallowed the use of str/bytes as paths, but I don't think that is a prerequisite here.

Wolfgang
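The inconsistency between the backwards-compatible idiom and os.fspath that this message points out can be reproduced with a few lines. In this sketch Prefixed and compat_fspath are illustrative names, and the "/base/" prefix is an arbitrary placeholder chosen so that the two results are easy to tell apart.

```python
import os


class Prefixed(str):
    """Illustrative str subclass whose __fspath__ differs from its content."""

    def __fspath__(self):
        return "/base/" + str(self)


def compat_fspath(path):
    # The backwards-compatible one-liner quoted in the message.
    return path.__fspath__() if hasattr(path, "__fspath__") else path


p = Prefixed("events.log")
via_idiom = compat_fspath(p)  # the idiom calls the subclass's __fspath__
via_os = os.fspath(p)         # os.fspath returns the string content unchanged
print(via_idiom)
print(via_os)
```

Under the current rules the two spellings disagree for such a class; under the proposal both would go through __fspath__.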

On Wed, May 24, 2017 at 5:52 PM, Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> wrote:
Unfortunately, this thinking is also very shallow compared to what went into PEP519.
This is exactly the kind of code that causes the problems. It will do the wrong thing when code like open(str(log_file), 'w') is used for compatibility.
Isn't it great that it doesn't work, so it's not attractive anymore?
Yes, this is another way of shooting yourself in the foot. Luckily, this one is probably less attractive.
So people not testing for 3.6+ might think their code works while it doesn't. Luckily people not testing with 3.6+ are perhaps unlikely to try funny tricks with __fspath__.
Speedup for things with __fspath__ is the only virtue of this proposal, and it has not been shown that that speedup matters anywhere.
One can discuss whether this is the best idiom to use (I did not write it, so maybe someone else has comments). Anyway, some may want to use

    path.__fspath__() if hasattr(path, "__fspath__") else str(path)

and some may want

    path if isinstance(path, (str, bytes)) else path.__fspath__()

Or others may not be after oneliners like this and instead include the full implementation of fspath in their code, or even better, with some modifications. Really, the best thing to use in pre-3.6 might be more like:

    import pathlib

    def fspath(path):
        if isinstance(path, (str, bytes)):
            return path
        if hasattr(path, '__fspath__'):
            return path.__fspath__()
        if type(path).__name__ == 'DirEntry' or isinstance(path, pathlib.PurePath):
            return str(path)
        raise TypeError("Argument cannot be interpreted as a file system path: " + repr(path))

Note that
Yes, I wrote that, and I stick with it: str and bytes subclasses that return something different from the str/bytes content should not be written. If Python ever disallows str/bytes as paths, such a thing becomes less harmful, and there is no need to have special treatment for str and bytes. Until then, I'm very happy with the decision to ignore __fspath__ on str and bytes. —Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven +

Accidentally sent the email before it was done. Additions / corrections below: On Fri, May 26, 2017 at 3:58 PM, Koos Zevenhoven <k7hoven@gmail.com> wrote:
In the above, I have to check type(path).__name__, because DirEntry was not exposed as os.DirEntry in 3.5 yet. For pre-3.4 Python and for older third-party libraries that do inherit from str/bytes, one could even use something like:

    def fspath(path):
        if isinstance(path, (str, bytes)):
            return path
        if hasattr(type(path), '__fspath__'):
            return type(path).__fspath__(path)
        if type(path).__name__ == 'DirEntry':
            return path.path
        if "Path" in type(path).__name__:  # add whatever known names for path classes (what a hack!)
            return str(path)
        raise TypeError("Argument cannot be interpreted as a file system path: " + repr(path))

-- Koos
-- + Koos Zevenhoven + http://twitter.com/k7hoven +

On Fri, May 26, 2017 at 03:58:23PM +0300, Koos Zevenhoven wrote:
[...]
That is a rather rude comment. How would you feel if Wolfgang or I said that the PEP's thinking was "very shallow"? (I see you are listed as co-author.)

If you are going to criticise our reasoning, you better give reasons for why we are wrong, not just insult us:

"...this thinking is very shallow..."
"This is exactly the kind of code that causes the problems."
"Isn't it great that it doesn't work, so it's not attractive anymore?"
"Yes, this is another way of shooting yourself in the foot."

Let me look at your objections:
That's your opinion, other people might disagree. In another post, you said it would be "confusing". I think this argument is FUD ("Fear, Uncertainty, Doubt"). We can already write confusing code in a million other ways, why is this one to be prohibited?

I don't know of any other area of Python where a type isn't permitted to override its own dunders:

- strings have __str__ and __repr__
- floats have __float__
- ints have __int__
- tuples can override __getitem__ to return whatever they like

etc. This is legal:

    py> class ConfusingStr(str):
    ...     def __getitem__(self, i):
    ...         return 'x'
    ...
    py> s = ConfusingStr("Nobody expects the Spanish Inquisition!")
    py> s[5]
    'x'

People have had the ability to write "confusing" strings, floats and ints which could return something different from their own value. They either don't do it, or if they do, they have a good reason and it isn't so confusing.

And if somebody does use it to write a confusing class? So what? "consenting adults" applies here. We aren't responsible for every abuse of the language that somebody might do. Why is __fspath__ so special that we need to protect users from doing something confusing?

What *really is* confusing is to ignore __fspath__ methods in some objects but not other objects. If that decision was intentional, I don't think it was justified in the PEP. (At least, I didn't see it.)
Then don't do that. Using open(str(log_file), 'w') is not the right way to emulate the Path protocol for backwards compatibility. The whole reason the Path protocol exists is because calling str(obj) is the wrong way to convert an unknown object to a file system path string.

I think this argument about backwards compatibility is a storm in a tea cup. We can enumerate all the possibilities:

1. object that doesn't inherit from str/bytes: behaviour is unchanged;
2. object that does inherit from str/bytes, but doesn't override the __fspath__ method: behaviour is unchanged;
3. object that inherits from str/bytes, *and* overrides the __fspath__ method: behaviour is changed.

Okay, the behaviour changes. I doubt that there will be many classes that subclass str and override __fspath__ now, because that would have been a waste of time up to now. So the main risk is:

- classes created from Python 3.7 onwards;
- which inherit from str/bytes;
- and which override __fspath__;
- and are back-ported to 3.6;
- without taking into account that __fspath__ will be ignored in 3.6;
- and the users don't read the docs to learn about the difference.

The danger here is the possibility that the wrong pathname will be used, if str(obj) and fspath(obj) return a different string.

Personally I think this is unlikely and not worth worrying about beyond a note in the documentation, but if people really feel this is a problem we could make this a __future__ import. But that just feels like overkill.

-- Steve

On 28 May 2017 at 15:18, Steven D'Aprano <steve@pearwood.info> wrote:
It wouldn't even need to be a __future__ import, as we have a runtime warning category specifically for this kind of change: https://docs.python.org/3/library/exceptions.html#FutureWarning

So *if* a change like this was made, the appropriate transition plan would be:

Python 3.7: at *class definition time*, we emit FutureWarning for subclasses of str and bytes that define __fspath__, saying that it is currently ignored for such subclasses, but will be called in Python 3.8+

Python 3.8: os.fspath() is changed as Wolfgang proposes, such that explicit protocol support takes precedence over builtin inheritance

However, if we *did* make such a change, it should also be made for operator.index as well, since that is similarly inconsistent with the way the int/float/etc constructor protocols work:

    >>> from operator import index
    >>> class MyInt(int):
    ...     def __int__(self):
    ...         return 5
    ...     def __index__(self):
    ...         return 5
    ...
    >>> int(MyInt(10))
    5
    >>> index(MyInt(10))
    10
    >>> class MyFloat(float):
    ...     def __float__(self):
    ...         return 5.0
    ...
    >>> float(MyFloat(10))
    5.0
    >>> class MyComplex(complex):
    ...     def __complex__(self):
    ...         return 5j
    ...
    >>> complex(MyComplex(10j))
    5j
    >>> class MyStr(str):
    ...     def __str__(self):
    ...         return "Hello"
    ...
    >>> str(MyStr("Not hello"))
    'Hello'
    >>> class MyBytes(bytes):
    ...     def __bytes__(self):
    ...         return b"Hello"
    ...
    >>> bytes(MyBytes(b"Not hello"))
    b'Hello'

Regards, Nick.

P.S. I'll also echo Steven's observations that it is entirely inappropriate to describe the thinking of other posters to the list as being overly shallow. The entire reason we *have* python-ideas and the PEP process is because programming language design is a *hard problem*, especially for a language with as broad a set of use cases as Python. Rather than trying to somehow survey the entire world of Python developers, we instead provide them with an open forum where they can say "This surprises or otherwise causes problems for me" and describe their perspective.
That's neither deep nor shallow thinking, it's just different people using the same language in different ways, and hence encountering different pain points. As far as the specific point at hand goes, I think contrasting the behaviour of PEP 357 (__index__) and PEP 519 (__fspath__) with the behaviour of the builtin constructor protocols suggests that this is better characterised as an oversight in the design of the more recent protocols, since neither PEP explicitly discusses the problem, both PEPs were specifically designed to permit the use of objects that *don't* inherit from the relevant builtin types (since subclasses already worked), and both PEPs handle the "subclass that also implements the corresponding protocol" scenario differently from the way the builtin constructor protocols handle it. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
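The first step of the transition plan in this message (a FutureWarning at class definition time) would have to live in the interpreter, but its effect can be approximated in user code. The sketch below uses a hypothetical class decorator, warn_if_fspath_ignored, as a stand-in for that built-in check; the decorator name and the warning text are assumptions, not an existing API.

```python
import warnings


def warn_if_fspath_ignored(cls):
    """Hypothetical decorator standing in for a definition-time check."""
    # Only warn when the class itself defines __fspath__ and inherits
    # from str or bytes, the case where os.fspath() ignores the method.
    if issubclass(cls, (str, bytes)) and "__fspath__" in vars(cls):
        warnings.warn(
            "{}.__fspath__ is currently ignored by os.fspath() because the "
            "class inherits from str or bytes".format(cls.__name__),
            FutureWarning,
            stacklevel=2,
        )
    return cls


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")

    @warn_if_fspath_ignored
    class PackagePath(str):
        def __fspath__(self):
            return "/home/.package/" + str(self)

print([str(w.message) for w in caught])
```

FutureWarning is shown by default, which is why it suits a change aimed at end users of such classes rather than only at their authors.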

On Sun, May 28, 2017 at 9:15 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Part of this discussion seems to consider consistency as the only thing that matters, but consistency is only the surface here. I won't comment on the __index__ issue, and especially not call it a "misfeature", because I haven't thought about it deeply, and my comments on it would be very shallow. I might ask about it though, like the OP did. Don't get me wrong, I like consistency very much. But regarding the __fspath__ case, there are not that many people *writing* fspath-enabled classes. Instead, there are many many many more people *using* such classes (and dealing with their compatibility issues in different ways). For those people, the current behavior brings consistency---after all, it was of course designed by thinking about it from all angles and not just based on my or anyone else's own use cases only. -- Koos
-- + Koos Zevenhoven + http://twitter.com/k7hoven +

On Sun, May 28, 2017 at 05:35:38PM +0300, Koos Zevenhoven wrote:
What sort of compatibility issues are you referring to? os.fspath is new in 3.6, and 3.7 isn't out yet, so I'm having trouble understanding what compatibility issues you mean.
For those people, the current behavior brings consistency
That's a very unintuitive statement. How is it consistent for fspath to call the __fspath__ dunder method for some objects but ignore it for others?
Can you explain the reasoning to us? I don't think it is explained in the PEP. -- Steve

On 28.05.2017 18:32, Steven D'Aprano wrote:
As far as I'm aware the only such issue people had was with building interfaces that could deal with regular strings and pathlib.Path (introduced in 3.4 if I remember correctly) instances alike. Because calling str on a pathlib.Path instance returns the path as a regular string it looked like it could become a (bad) habit to just always call str on any received object for "compatibility" with both types of path representations. The path protocol is a response to this that provides an explicit and safe alternative.
The path protocol brings a standard way of dealing with diverse path representations, but only if you use it. If people keep using str(path_object) as before, then they are doing things wrongly and are no better or safer off than they were before! The path protocol does *not* use __fspath__ as an indicator that an object's str-representation is intended to be used as a path. If you had wanted this, the PEP should have defined __fspath__ not as a method, but as a flag and have the protocol check that flag, then call __str__ if appropriate. With __fspath__ being a method that can return whatever its author sees fit, calling str to get a path from an arbitrary object is just as wrong as it always was - it will work for pathlib.Path objects and might or might not work for some other types. Importantly, this has nothing to do with this proposal, but is in the nature of the protocol as it is defined *now*.
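The point that str() is not a substitute for the protocol can be shown with any path-like class whose __str__ and __fspath__ return different things. The Report class below is purely illustrative; its label format and the "reports/" location are made-up values.

```python
import os


class Report:
    """Illustrative path-like object whose __str__ is a display label."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        return "Report <" + self.name + ">"

    def __fspath__(self):
        return "reports/" + self.name + ".txt"


r = Report("may")
as_label = str(r)       # human-readable text, not a usable path
as_path = os.fspath(r)  # what the path protocol hands to file system APIs
print(as_label)
print(as_path)
```

Code that calls str(r) gets the label, so only os.fspath (or an API that calls it internally) recovers the intended path for such objects.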

On Wed, May 24, 2017 at 3:41 AM, Steven D'Aprano <steve@pearwood.info> wrote:
These are not reasonable use cases because they should not subclass str or bytes. That would be confusing.
You can monkeypatch the stdlib:

    import os
    from os import fspath as real_fspath

    mystr = "23"

    def fspath(path):
        if path is mystr:
            print("fspath was called on mystr")
        return real_fspath(path)

    os.fspath = fspath

    try_something_with(mystr)

Having __fspath__ on str and bytes by default would destroy the ability to distinguish between PathLike and non-PathLike, because all strings would appear to be PathLike. (Not to mention the important compatibility issues between different Python versions and different ways of dealing with pre-PEP519 path objects.)

-- Koos
-- + Koos Zevenhoven + http://twitter.com/k7hoven +
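The PathLike distinction mentioned at the end of this message can be checked directly with os.PathLike, which recognizes any class that provides an __fspath__ method. In the sketch below, Tagged is an illustrative str subclass; everything else is from the standard library.

```python
import os
import pathlib


class Tagged(str):
    """Illustrative str subclass that defines __fspath__."""

    def __fspath__(self):
        return str(self)


plain_is_pathlike = isinstance("spam.txt", os.PathLike)
path_is_pathlike = isinstance(pathlib.PurePosixPath("spam.txt"), os.PathLike)
tagged_is_pathlike = isinstance(Tagged("spam.txt"), os.PathLike)
print(plain_is_pathlike, path_is_pathlike, tagged_is_pathlike)
```

A plain str is not reported as PathLike today, while the subclass with __fspath__ is; giving str itself an __fspath__ method would make the first check true for every string.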

On Tue, May 23, 2017 at 12:12:11PM +0200, Wolfgang Maier wrote:
How about simplifying the implementation of fspath by giving str and bytes a __fspath__ method that returns str(self) or bytes(self)? class str: def __fspath__(self): return str(self) # Must be str, not type(self). (1) We can avoid most of the expensive type checks. (2) Subclasses of str and bytes don't have to do anything to get a useful default behaviour. def fspath(path): try: dunder = type(path).__fspath__ except AttributeError: raise TypeError(...) from None else: if dunder is not None: result = dunder(path) if type(result) in (str, byte): return result raise TypeError('expected a str or bytes, got ...') The reason for the not None check is to allow subclasses to explicitly deny that they can be used for paths by setting __fspath__ to None in the subclass. -- Steve

On Tue, May 23, 2017 at 1:49 PM, Steven D'Aprano <steve@pearwood.info> wrote:
How about simplifying the implementation of fspath by giving str and bytes a __fspath__ method that returns str(self) or bytes(self)?
The compatiblity issue I mention in the other email I just sent as a response to the OP will appear if a subclass returns something other than str(self) or bytes(self). —Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven +

On Tue, May 23, 2017 at 1:12 PM, Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> wrote:
The reason why this was not done was that a str or bytes subclass that implements __fspath__(self) would work in both pre-3.6 and 3.6+ but behave differently. This would be also be incompatible with existing code using str(path) for compatibility with the stdlib (the old way, which people still use for pre-3.6 compatibility even in new code).
To get the same performance benefit for str and bytes, but without changing functionality, there could first be the exact type check and then the isinstance check. This would add some performance penalty for PathLike objects. Removing the isinstance part of the __fspath__() return value, which I find less useful, would compensate for that. (3) would not be necessary in this version. Are you asking for other reasons, or because you actually have a use case where this matters? If this performance really matters somewhere, the version I describe above could be considered. It would have 100% backwards compatibility, or a little less (99% ?) if the isinstance check of the __fspath__() return value is removed for performance compensation.
b) user-defined classes that inherit from str or bytes could control their path representation just like any other class
Again, this would cause differences in behavior between different Python versions, and based on whether str(path) is used or not. —Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven +

On 05/23/2017 06:17 PM, Koos Zevenhoven wrote:
Hi Koos and thanks for your detailed response,
I'm not sure that sounds very convincing because that exact problem exists, was discussed and accepted in your PEP 519 for all other classes. I do not really see why subclasses of str and bytes should require special backwards compatibility here. Is there a reason why you are thinking they should be treated specially?
Right, that was one thing I forgot to mention in my list. My proposal would also speed up processing of pathlike objects because it moves the __fspath__ call up in front of the isinstance check. Your alternative would speed up only str and bytes, but would slow down Path-like classes. In addition, I'm not sure that removing the isinstance check on the return value of __fspath__() is a good idea because that would mean giving up the guarantee that os.fspath returns an instance of str or bytes and would effectively force library code to do the isinstance check anyway even if the function may have performed it already, which would worsen performance further.
That use case question is somewhat difficult to answer. I had this idea when working on two bug tracker issues (one concerning fnmatch and a follow-up one on os.path.normcase, which is called by fnmatch.filter and, in turn, calls os.fspath. fnmatchfilter is a case where performance matters and the decision when and where to call the rather expensive os.path.normcase->os.fspath there is not entirely straightforward. So, yes, I was basically looking at this because of a potential use case, but I say potential because I'm far from sure that any speed gain in os.fspath will be big enough to be useful for fnmatch.filter in the end.

On 05/23/2017 06:41 PM, Wolfgang Maier wrote:
Ah, sorry, I misunderstood what you were trying to say, but now I'm getting it! subclasses of str and bytes were of course usable as path arguments before simply because they were subclasses of them. Now they would be picked up based on their __fspath__ method, but old versions of Python executing code using them would still use them directly. Have to think about this one a bit, but thanks for pointing it out.

On Tue, May 23, 2017 at 7:53 PM, Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> wrote:
Yes, this is exactly what I meant. I noticed I had left out some of the details of the reasoning, sorry. I tried to fix that in my response to Steven. — Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven +

On Tue, 23 May 2017 at 03:13 Wolfgang Maier < wolfgang.maier@biologie.uni-freiburg.de> wrote:
What exactly is the performance issue you are having that is leading to this proposal? I ask because b) and d) change semantics and so it's not a small thing to make this change at this point since Python 3.6 has been released. So unless there's a major performance impact I'm reluctant to want to change it at this point.

I see no future for this proposal. Sorry Wolfgang! For future reference, the proposal was especially weak because it gave no concrete examples of code that was inconvenienced in any way by the current behavior. (And the performance hack of checking for exact str/bytes can be made without changing the semantics.) On Tue, May 23, 2017 at 10:04 AM, Brett Cannon <brett@python.org> wrote:
-- --Guido van Rossum (python.org/~guido)

23.05.17 20:04, Brett Cannon wrote:
It seems to me that the purpose of this proposition is not performance, but the possibility to use __fspath__ in str or bytes subclasses. Currently defining __fspath__ in str or bytes subclasses doesn't have any effect. I don't know a reasonable use case for this feature. The __fspath__ method of str or bytes subclasses returning something not equivalent to self looks confusing to me.
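A minimal sketch of the current behaviour described above, assuming CPython 3.6 or later. The Shouting class is hypothetical; the point is only that os.fspath returns a str subclass instance unchanged and never consults its __fspath__ method.

```python
import os


class Shouting(str):
    """Hypothetical str subclass that defines __fspath__."""

    def __fspath__(self):
        return self.upper()


p = Shouting("readme.txt")

# The str/bytes check comes first, so __fspath__ is not called here.
result = os.fspath(p)

# Calling the method directly still works, which is where the
# inconsistency discussed in this thread comes from.
direct = p.__fspath__()
```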

On Wed, May 24, 2017 at 12:18 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
Yes, that would be another reason. Only when Python drops support for strings as paths, can people start writing such subclasses. I'm sure many would now say dropping str/bytes path support won't even happen in Python 4. -- Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven +

On Wed, May 24, 2017 at 12:18:16AM +0300, Serhiy Storchaka wrote:
23.05.17 20:04, Brett Cannon wrote:
That's how I interpreted the proposal, with any performance issue being secondary. (I don't expect that converting path-like objects to strings would be the bottleneck in any application doing actual disk IO.)
I can imagine at least two:

- emulating something like DOS 8.3 versus long file names;
- case normalisation

but what would make this really useful is for debugging. For instance, I have used something like this to debug problems with int() being called wrongly:

    py> class MyInt(int):
    ...     def __int__(self):
    ...         print("__int__ called")
    ...         return super().__int__()
    ...
    py> x = MyInt(23)
    py> int(x)
    __int__ called
    23

It would be annoying and inconsistent if int(x) avoided calling __int__ on int subclasses. But that's exactly what happens with fspath and str. I see that as a bug, not a feature: I find it hard to believe that we would design an interface for string-like objects (paths) and then intentionally prohibit it from applying to strings. And if we did, surely it's a misfeature.

Why *shouldn't* subclasses of str get the same opportunity to customize the result of __fspath__ as they get to customize their __repr__ and __str__?

    py> class MyStr(str):
    ...     def __repr__(self):
    ...         return 'repr'
    ...     def __str__(self):
    ...         return 'str'
    ...
    py> s = MyStr('abcdef')
    py> repr(s)
    'repr'
    py> str(s)
    'str'

I don't think that backwards compatibility is an issue here. Nobody will have had reason to write str subclasses with __fspath__ methods, so changing the behaviour to no longer ignore them shouldn't break any code. But of course, we should treat this as a new feature, and only change the behaviour in 3.7.

-- Steve

On 05/24/2017 02:41 AM, Steven D'Aprano wrote:
This is almost exactly what I have been thinking (just that I couldn't have presented it so clearly)!

Let's look at a potential use case for this. Assume that in a package you want to handle several paths to different files and directories that are all located in a common package-specific parent directory. Then using the path protocol you could write this:

    import os

    class PackageBase(object):
        basepath = '/home/.package'

    class PackagePath(str, PackageBase):
        def __fspath__(self):
            return os.path.join(self.basepath, str(self))

    config_file = PackagePath('.config')
    log_file = PackagePath('events.log')
    data_dir = PackagePath('data')

    with open(log_file, 'w') as log:
        log.write('package paths initialized.\n')

Just that this wouldn't currently work because PackagePath inherits from str. Of course, there are other ways to achieve the above, but when you think about designing a Path-like object class, str is just a pretty attractive base class to start from.

Now let's look at compatibility of a class like PackagePath under this proposal:

- if client code uses e.g. str(config_file) and proceeds to treat the resulting object as a path, unexpected things will happen and, yes, that's bad. However, this is no different from any other Path-like object for which __str__ and __fspath__ don't define the same return value.

- if client code uses the PEP-recommended backwards-compatible way of dealing with paths,

      path.__fspath__() if hasattr(path, "__fspath__") else path

  things will just work. Interestingly, this would *currently* produce an unexpected result, namely that it would execute the __fspath__ method of the str subclass.

- if client code uses instances of PackagePath as paths directly, then in Python 3.6 and below that would lead to an unintended outcome, while in Python 3.7 things would work. This is *really* bad.
But what it means is that, under the proposal, using a str or bytes subclass with an __fspath__ method defined makes your code backwards-incompatible, and the solution would be not to use such a class if you want to be backwards-compatible (and that should get documented somewhere). This restriction, of course, limits the usefulness of the proposal in the near future, but that disadvantage will vanish over time. In 5 years, not supporting Python 3.6 anymore maybe won't be a big deal (for comparison, Python 3.2 was released 6 years ago and since last year pip no longer supports it). As Steven pointed out, the proposal is *very* unlikely to break existing code.

So to summarize, the proposal

- avoids an up-front isinstance check in the protocol and thereby speeds up the processing of exact strings and bytes and of anything that follows the path protocol.*

- slows down the processing of instances of regular str and bytes subclasses*

- makes the "path.__fspath__() if hasattr(path, "__fspath__") else path" idiom consistent for subclasses of str and bytes that define __fspath__

- opens up the opportunity to write str/bytes subclasses that represent a path other than just their self in the future**

Still sounds like a net win to me, but let's see what I forgot ...

* yes, speed is typically not your primary concern when it comes to IO; what's often neglected though is that not all path operations have to trigger actual IO (things in os.path for example don't typically perform IO)

** somebody on the list (I guess it was Koos?) mentioned that such classes would only make sense if Python ever disallowed the use of str/bytes as paths, but I don't think that is a prerequisite here.

Wolfgang
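For comparison, here is one of the "other ways" to get the PackagePath behaviour with the protocol as it stands today: a sketch that uses composition instead of inheriting from str. The class layout and the attribute name `name` are assumptions made for illustration; only os.fspath, os.PathLike and os.path.join are real APIs.

```python
import os


class PackagePath:
    """Hypothetical path-like class that does not inherit from str."""

    basepath = "/home/.package"

    def __init__(self, name):
        self.name = name  # path relative to the package directory

    def __fspath__(self):
        # os.fspath() calls this because the object is not a str or bytes.
        return os.path.join(self.basepath, self.name)

    def __repr__(self):
        return "PackagePath({!r})".format(self.name)


config_file = PackagePath(".config")
log_path = os.fspath(PackagePath("events.log"))
```

The trade-off is that such an object cannot be passed to code that expects a plain string and does not go through os.fspath, which is part of what makes str attractive as a base class.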

On Wed, May 24, 2017 at 5:52 PM, Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> wrote:
Unfortunately, this thinking is also very shallow compared to what went into PEP519.
This is exactly the kind of code that causes the problems. It will do the wrong thing when code like open(str(log_file), 'w') is used for compatibility.
Isn't it great that it doesn't work, so it's not attractive anymore?
Yes, this is another way of shooting yourself in the foot. Luckily, this one is probably less attractive.
So people not testing for 3.6+ might think their code works while it doesn't. Luckily people not testing with 3.6+ are perhaps unlikely to try funny tricks with __fspath__.
Speedup for things with __fspath__ is the only virtue of this proposal, and it has not been shown that that speedup matters anywhere.
One can discuss whether this is the best idiom to use (I did not write it, so maybe someone else has comments). Anyway, some may want to use

    path.__fspath__() if hasattr(path, "__fspath__") else str(path)

and some may want

    path if isinstance(path, (str, bytes)) else path.__fspath__()

Or others may not be after one-liners like this and instead include the full implementation of fspath in their code, or even better, with some modifications. Really, the best thing to use in pre-3.6 might be more like:

    import pathlib

    def fspath(path):
        if isinstance(path, (str, bytes)):
            return path
        if hasattr(path, '__fspath__'):
            return path.__fspath__()
        if type(path).__name__ == 'DirEntry' or isinstance(path, pathlib.PurePath):
            return str(path)
        raise TypeError("Argument cannot be interpreted as a file system path: " + repr(path))

Note that
Yes, I wrote that, and I stick with it: str and bytes subclasses that return something different from the str/bytes content should not be written. If Python ever disallows str/bytes as paths, such a thing becomes less harmful, and there is no need to have special treatment for str and bytes. Until then, I'm very happy with the decision to ignore __fspath__ on str and bytes. —Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven +

Accidentally sent the email before it was done. Additions / corrections below: On Fri, May 26, 2017 at 3:58 PM, Koos Zevenhoven <k7hoven@gmail.com> wrote:
In the above, I have to check type(path).__name__, because DirEntry was not exposed as os.DirEntry in 3.5 yet. For pre-3.4 Python and for older third-party libraries that do inherit from str/bytes, one could even use something like:

    def fspath(path):
        if isinstance(path, (str, bytes)):
            return path
        if hasattr(type(path), '__fspath__'):
            return type(path).__fspath__(path)
        if type(path).__name__ == 'DirEntry':
            return path.path
        if "Path" in type(path).__name__:
            # add whatever known names for path classes (what a hack!)
            return str(path)
        raise TypeError("Argument cannot be interpreted as a file system path: " + repr(path))

-- Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven +

On Fri, May 26, 2017 at 03:58:23PM +0300, Koos Zevenhoven wrote:
[...]
That is a rather rude comment. How would you feel if Wolfgang or I said that the PEP's thinking was "very shallow"? (I see you are listed as co-author.) If you are going to criticise our reasoning, you had better give reasons for why we are wrong, not just insult us:

    "...this thinking is very shallow..."
    "This is exactly the kind of code that causes the problems."
    "Isn't it great that it doesn't work, so it's not attractive anymore?"
    "Yes, this is another way of shooting yourself in the foot."

Let me look at your objections:
That's your opinion, other people might disagree. In another post, you said it would be "confusing". I think this argument is FUD ("Fear, Uncertainty, Doubt"). We can already write confusing code in a million other ways, why is this one to be prohibited?

I don't know of any other area of Python where a type isn't permitted to override its own dunders:

- strings have __str__ and __repr__
- floats have __float__
- ints have __int__
- tuples can override __getitem__ to return whatever they like

etc. This is legal:

    py> class ConfusingStr(str):
    ...     def __getitem__(self, i):
    ...         return 'x'
    ...
    py> s = ConfusingStr("Nobody expects the Spanish Inquisition!")
    py> s[5]
    'x'

People have had the ability to write "confusing" strings, floats and ints which could return something different from their own value. They either don't do it, or if they do, they have a good reason and it isn't so confusing.

And if somebody does use it to write a confusing class? So what? "Consenting adults" applies here. We aren't responsible for every abuse of the language that somebody might do. Why is __fspath__ so special that we need to protect users from doing something confusing?

What *really is* confusing is to ignore __fspath__ methods in some objects but not other objects. If that decision was intentional, I don't think it was justified in the PEP. (At least, I didn't see it.)
Then don't do that. Using open(str(log_file), 'w') is not the right way to emulate the Path protocol for backwards compatibility. The whole reason the Path protocol exists is because calling str(obj) is the wrong way to convert an unknown object to a file system path string.

I think this argument about backwards compatibility is a storm in a tea cup. We can enumerate all the possibilities:

1. object that doesn't inherit from str/bytes: behaviour is unchanged;

2. object that does inherit from str/bytes, but doesn't override the __fspath__ method: behaviour is unchanged;

3. object that inherits from str/bytes, *and* overrides the __fspath__ method: behaviour is changed.

Okay, the behaviour changes. I doubt that there will be many classes that subclass str and override __fspath__ now, because that would have been a waste of time up to now. So the main risk is:

- classes created from Python 3.7 onwards;
- which inherit from str/bytes;
- and which override __fspath__;
- and are back-ported to 3.6;
- without taking into account that __fspath__ will be ignored in 3.6;
- and the users don't read the docs to learn about the difference.

The danger here is the possibility that the wrong pathname will be used, if str(obj) and fspath(obj) return a different string.

Personally I think this is unlikely and not worth worrying about beyond a note in the documentation, but if people really feel this is a problem we could make this a __future__ import. But that just feels like overkill.

-- Steve
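The three possibilities can be checked side by side. The sketch below copies the pure-Python implementation proposed at the top of the thread into a function named proposed_fspath (the name is chosen here for illustration) and compares it with the real os.fspath; the three small classes are hypothetical.

```python
import os


def proposed_fspath(path):
    """Pure-Python version proposed at the top of this thread."""
    path_type = type(path)
    if path_type is str or path_type is bytes:
        return path
    try:
        path_repr = path_type.__fspath__(path)
    except AttributeError:
        if hasattr(path_type, "__fspath__"):
            raise
        elif issubclass(path_type, (str, bytes)):
            return path
        else:
            raise TypeError("expected str, bytes or os.PathLike object, "
                            "not " + path_type.__name__)
    if isinstance(path_repr, (str, bytes)):
        return path_repr
    raise TypeError("expected {}.__fspath__() to return str or bytes, "
                    "not {}".format(path_type.__name__,
                                    type(path_repr).__name__))


class Plain:            # case 1: does not inherit from str/bytes
    def __fspath__(self):
        return "plain"


class SubStr(str):      # case 2: str subclass without __fspath__
    pass


class Override(str):    # case 3: str subclass that overrides __fspath__
    def __fspath__(self):
        return "overridden"


objs = [Plain(), SubStr("s"), Override("o")]
# Each entry is (current behaviour, proposed behaviour).
results = [(os.fspath(o), proposed_fspath(o)) for o in objs]
```

Only the third pair differs: the current os.fspath returns the string content, while the proposed version returns whatever __fspath__ produces.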

On 28 May 2017 at 15:18, Steven D'Aprano <steve@pearwood.info> wrote:
It wouldn't even need to be a __future__ import, as we have a runtime warning category specifically for this kind of change: https://docs.python.org/3/library/exceptions.html#FutureWarning

So *if* a change like this was made, the appropriate transition plan would be:

Python 3.7: at *class definition time*, we emit FutureWarning for subclasses of str and bytes that define __fspath__, saying that it is currently ignored for such subclasses, but will be called in Python 3.8+

Python 3.8: os.fspath() is changed as Wolfgang proposes, such that explicit protocol support takes precedence over builtin inheritance

However, if we *did* make such a change, it should also be made for operator.index as well, since that is similarly inconsistent with the way the int/float/etc constructor protocols work:

    >>> from operator import index
    >>> class MyInt(int):
    ...     def __int__(self):
    ...         return 5
    ...     def __index__(self):
    ...         return 5
    ...
    >>> int(MyInt(10))
    5
    >>> index(MyInt(10))
    10
    >>> class MyFloat(float):
    ...     def __float__(self):
    ...         return 5.0
    ...
    >>> float(MyFloat(10))
    5.0
    >>> class MyComplex(complex):
    ...     def __complex__(self):
    ...         return 5j
    ...
    >>> complex(MyComplex(10j))
    5j
    >>> class MyStr(str):
    ...     def __str__(self):
    ...         return "Hello"
    ...
    >>> str(MyStr("Not hello"))
    'Hello'
    >>> class MyBytes(bytes):
    ...     def __bytes__(self):
    ...         return b"Hello"
    ...
    >>> bytes(MyBytes(b"Not hello"))
    b'Hello'

Regards, Nick.

P.S. I'll also echo Steven's observations that it is entirely inappropriate to describe the thinking of other posters to the list as being overly shallow. The entire reason we *have* python-ideas and the PEP process is because programming language design is a *hard problem*, especially for a language with as broad a set of use cases as Python. Rather than trying to somehow survey the entire world of Python developers, we instead provide them with an open forum where they can say "This surprises or otherwise causes problems for me" and describe their perspective.
That's neither deep nor shallow thinking, it's just different people using the same language in different ways, and hence encountering different pain points. As far as the specific point at hand goes, I think contrasting the behaviour of PEP 357 (__index__) and PEP 519 (__fspath__) with the behaviour of the builtin constructor protocols suggest that this is better characterised as an oversight in the design of the more recent protocols, since neither PEP explicitly discusses the problem, both PEPs were specifically designed to permit the use of objects that *don't* inherit from the relevant builtin types (since subclasses already worked), and both PEPs handle the "subclass that also implements the corresponding protocol" scenario differently from the way the builtin constructor protocols handle it. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
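To make the first step of the transition plan described in this message concrete, here is a rough approximation in user code. It is only a sketch: the interpreter would have to emit the warning itself during class creation, whereas this uses a hypothetical class decorator, warn_if_fspath_ignored, whose name and message text are invented for illustration. The warnings module and the FutureWarning category are real.

```python
import warnings


def warn_if_fspath_ignored(cls):
    """Hypothetical decorator approximating a class-definition-time warning."""
    # Only warn when the class itself defines __fspath__ and would
    # currently be short-circuited by the str/bytes check in os.fspath().
    if issubclass(cls, (str, bytes)) and "__fspath__" in vars(cls):
        warnings.warn(
            "{}.__fspath__ is currently ignored by os.fspath() because the "
            "class inherits from str or bytes".format(cls.__name__),
            FutureWarning,
            stacklevel=2,
        )
    return cls


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")

    @warn_if_fspath_ignored
    class PackagePath(str):
        def __fspath__(self):
            return "/home/.package/" + self

    @warn_if_fspath_ignored
    class PlainStr(str):
        pass

categories = [w.category for w in caught]
```

Only PackagePath triggers the warning; a str subclass without its own __fspath__ is unaffected.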

On Sun, May 28, 2017 at 9:15 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Part of this discussion seems to consider consistency as the only thing that matters, but consistency is only the surface here. I won't comment on the __index__ issue, and especially not call it a "misfeature", because I haven't thought about it deeply, and my comments on it would be very shallow. I might ask about it though, like the OP did. Don't get me wrong, I like consistency very much. But regarding the __fspath__ case, there are not that many people *writing* fspath-enabled classes. Instead, there are many many many more people *using* such classes (and dealing with their compatibility issues in different ways). For those people, the current behavior brings consistency---after all, it was of course designed by thinking about it from all angles and not just based on my or anyone else's own use cases only. -- Koos
-- + Koos Zevenhoven + http://twitter.com/k7hoven +

On Sun, May 28, 2017 at 05:35:38PM +0300, Koos Zevenhoven wrote:
What sort of compatibility issues are you referring to? os.fspath is new in 3.6, and 3.7 isn't out yet, so I'm having trouble understanding what compatibility issues you mean.
For those people, the current behavior brings consistency
That's a very unintuitive statement. How is it consistent for fspath to call the __fspath__ dunder method for some objects but ignore it for others?
Can you explain the reasoning to us? I don't think it is explained in the PEP. -- Steve

On 28.05.2017 18:32, Steven D'Aprano wrote:
As far as I'm aware the only such issue people had was with building interfaces that could deal with regular strings and pathlib.Path (introduced in 3.4 if I remember correctly) instances alike. Because calling str on a pathlib.Path instance returns the path as a regular string it looked like it could become a (bad) habit to just always call str on any received object for "compatibility" with both types of path representations. The path protocol is a response to this that provides an explicit and safe alternative.
The path protocol brings a standard way of dealing with diverse path representations, but only if you use it. If people keep using str(path_object) as before, then they are doing things wrongly and are no better or safer off than they were before! The path protocol does *not* use __fspath__ as an indicator that an object's str-representation is intended to be used as a path. If you had wanted this, the PEP should have defined __fspath__ not as a method, but as a flag and have the protocol check that flag, then call __str__ if appropriate. With __fspath__ being a method that can return whatever its author sees fit, calling str to get a path from an arbitrary object is just as wrong as it always was - it will work for pathlib.Path objects and might or might not work for some other types. Importantly, this has nothing to do with this proposal, but is in the nature of the protocol as it is defined *now*.
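A small sketch of why str() is not a substitute for the protocol. The Labelled class is hypothetical; it simply has a __str__ meant for display and an __fspath__ that returns the actual path, which the protocol as defined now permits.

```python
import os


class Labelled:
    """Hypothetical path-like object whose str() is not its path."""

    def __init__(self, label, path):
        self.label = label
        self.path = path

    def __str__(self):
        return "<{}>".format(self.label)  # human-readable, not a path

    def __fspath__(self):
        return self.path


obj = Labelled("log file", "/var/log/app.log")
via_str = str(obj)            # display text, wrong as a path
via_protocol = os.fspath(obj)  # the path the author intended
```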

On Wed, May 24, 2017 at 3:41 AM, Steven D'Aprano <steve@pearwood.info> wrote:
These are not reasonable use cases because they should not subclass str or bytes. That would be confusing.
You can monkeypatch the stdlib:

    import os
    from os import fspath as real_fspath

    mystr = "23"

    def fspath(path):
        if path is mystr:
            print("fspath was called on mystr")
        return real_fspath(path)

    os.fspath = fspath

    try_something_with(mystr)

Having __fspath__ on str and bytes by default would destroy the ability to distinguish between PathLike and non-PathLike, because all strings would appear to be PathLike. (Not to mention the important compatibility issues between different Python versions and different ways of dealing with pre-PEP519 path objects.)

-- Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven +
participants (9)
- Brett Cannon
- Guido van Rossum
- Juancarlo Añez
- Koos Zevenhoven
- Nick Coghlan
- Serhiy Storchaka
- Steven D'Aprano
- tritium-list@sdamon.com
- Wolfgang Maier