On Wed, May 24, 2017 at 5:52 PM, Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> wrote:
On 05/24/2017 02:41 AM, Steven D'Aprano wrote:
It would be annoying and inconsistent if int(x) avoided calling __int__ on int subclasses. But that's exactly what happens with fspath and str. I see that as a bug, not a feature: I find it hard to believe that we would design an interface for string-like objects (paths) and then intentionally prohibit it from applying to strings.
And if we did, surely it's a misfeature. Why *shouldn't* subclasses of str get the same opportunity to customize the result of __fspath__ as they get to customize their __repr__ and __str__?
py> class MyStr(str):
...     def __repr__(self):
...         return 'repr'
...     def __str__(self):
...         return 'str'
...
py> s = MyStr('abcdef')
py> repr(s)
'repr'
py> str(s)
'str'
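For comparison, here is a minimal sketch of the behaviour being objected to, assuming Python 3.6 or later (where os.fspath exists). The MyStr class and the 'customized' return value are illustrative only. Calling __fspath__ by hand runs the method, but os.fspath returns str and bytes instances unchanged, so the method is never consulted:

```python
import os


class MyStr(str):
    # Hypothetical subclass that tries to customize its filesystem representation.
    def __fspath__(self):
        return "customized"


s = MyStr("abcdef")

direct = s.__fspath__()  # calling the method by hand does run it
via_os = os.fspath(s)    # os.fspath hands back str/bytes instances as they are

print(direct)
print(via_os == "abcdef", via_os is s)
```

So repr() and str() honour the subclass's overrides, while os.fspath() does not.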
This is almost exactly what I have been thinking (just that I couldn't have presented it so clearly)!
Unfortunately, this thinking is also very shallow compared to what went into PEP 519.
Let's look at a potential use case for this. Assume that in a package you want to handle several paths to different files and directories that are all located in a common package-specific parent directory. Then using the path protocol you could write this:
    import os

    class PackageBase(object):
        basepath = '/home/.package'

    class PackagePath(str, PackageBase):
        def __fspath__(self):
            return os.path.join(self.basepath, str(self))

    config_file = PackagePath('.config')
    log_file = PackagePath('events.log')
    data_dir = PackagePath('data')

    with open(log_file, 'w') as log:
        log.write('package paths initialized.\n')
This is exactly the kind of code that causes the problems. It will do the wrong thing when code like open(str(log_file), 'w') is used for compatibility.
Just that this wouldn't currently work because PackagePath inherits from str. Of course, there are other ways to achieve the above, but when you think about designing a Path-like object class str is just a pretty attractive base class to start from.
Isn't it great that it doesn't work, so it's not attractive anymore?
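To make the disagreement concrete, here is a small sketch built on the PackagePath example above, assuming Python 3.6 or later. It drops the PackageBase mixin for brevity and never touches the filesystem. It compares the three ways client code might turn such an object into a path:

```python
import os


class PackagePath(str):
    basepath = "/home/.package"  # illustrative value from the example above

    def __fspath__(self):
        return os.path.join(self.basepath, str(self))


log_file = PackagePath("events.log")

via_str = str(log_file)             # what open(str(log_file), 'w') would see
via_method = log_file.__fspath__()  # what the subclass author intended
via_os = os.fspath(log_file)        # what open(log_file, 'w') effectively uses

print(via_str)
print(via_method)
print(via_os)
```

Only the explicit method call yields the path under basepath. Both str() and os.fspath() produce the bare 'events.log', which is resolved relative to the current working directory.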
Now let's look at compatibility of a class like PackagePath under this proposal:
- if client code uses e.g. str(config_file) and proceeds to treat the resulting object as a path, unexpected things will happen and, yes, that's bad. However, this is no different from any other Path-like object for which __str__ and __fspath__ don't define the same return value.
Yes, this is another way of shooting yourself in the foot. Luckily, this one is probably less attractive.
- if client code uses the PEP-recommended backwards-compatible way of dealing with paths,
path.__fspath__() if hasattr(path, "__fspath__") else path
things will just work. Interestingly, this would *currently* produce an unexpected result, namely that it would execute the __fspath__ method of the str subclass.
So people not testing for 3.6+ might think their code works while it doesn't. Luckily people not testing with 3.6+ are perhaps unlikely to try funny tricks with __fspath__.
- if client code uses instances of PackagePath as paths directly, then in Python 3.6 and below that would lead to an unintended outcome, while in Python 3.7 things would work. This is *really* bad.
But what it means is that, under the proposal, using a str or bytes subclass with an __fspath__ method defined makes your code backwards-incompatible, and the solution would be not to use such a class if you want to be backwards-compatible (and that should get documented somewhere). This restriction, of course, limits the usefulness of the proposal in the near future, but that disadvantage will vanish over time. In 5 years, not supporting Python 3.6 anymore maybe won't be a big deal (for comparison, Python 3.2 was released 6 years ago, and since last year pip no longer supports it). As Steven pointed out, the proposal is *very* unlikely to break existing code.
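The mismatch between the backwards-compatible idiom and the built-in protocol can be checked directly. The following is a sketch only, assuming Python 3.6 or later. The helper name compat_fspath and the Prefixed class are made up for illustration:

```python
import os
import pathlib


def compat_fspath(path):
    # The one-liner quoted in this thread, wrapped in a hypothetically named helper.
    return path.__fspath__() if hasattr(path, "__fspath__") else path


class Prefixed(str):
    # Hypothetical str subclass whose __fspath__ differs from its str content.
    def __fspath__(self):
        return "prefix/" + str(self)


plain = "data"
pure = pathlib.PurePosixPath("pkg/data")
sub = Prefixed("data")

for p in (plain, pure, sub):
    # True where the idiom and os.fspath agree, False where they diverge.
    print(type(p).__name__, compat_fspath(p) == os.fspath(p))
```

The idiom and os.fspath agree for the plain string and for the pathlib object. They disagree only for the str subclass, where the idiom runs __fspath__ and os.fspath does not.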
So to summarize, the proposal
- avoids an up-front isinstance check in the protocol and thereby speeds up the processing of exact strings and bytes and of anything that follows the path protocol.*
Speedup for things with __fspath__ is the only virtue of this proposal, and it has not been shown that that speedup matters anywhere.
- slows down the processing of instances of regular str and bytes subclasses*
- makes the "path.__fspath__() if hasattr(path, "__fspath__") else path" idiom consistent for subclasses of str and bytes that define __fspath__
One can discuss whether this is the best idiom to use (I did not write it, so maybe someone else has comments).

Anyway, some may want to use

    path.__fspath__() if hasattr(path, "__fspath__") else str(path)

and some may want

    path if isinstance(path, (str, bytes)) else path.__fspath__()

Or others may not be after one-liners like this and instead include the full implementation of fspath in their code, or even better, with some modifications. Really, the best thing to use in pre-3.6 might be more like:

    import pathlib

    def fspath(path):
        if isinstance(path, (str, bytes)):
            return path
        if hasattr(path, '__fspath__'):
            return path.__fspath__()
        if type(path).__name__ == 'DirEntry' or isinstance(path, pathlib.PurePath):
            return str(path)
        raise TypeError("Argument cannot be interpreted as a file system path: " + repr(path))

Note that
- opens up the opportunity to write str/bytes subclasses that represent a path other than just their self in the future**
Still sounds like a net win to me, but let's see what I forgot ...
* yes, speed is typically not your primary concern when it comes to IO; what's often neglected though is that not all path operations have to trigger actual IO (things in os.path for example don't typically perform IO)
** somebody on the list (I guess it was Koos?) mentioned that such classes would only make sense if Python ever disallowed the use of str/bytes as paths, but I don't think that is a prerequisite here.
Yes, I wrote that, and I stick with it: str and bytes subclasses that return something different from the str/bytes content should not be written. If Python ever disallows str/bytes as paths, such a thing becomes less harmful, and there is no need to have special treatment for str and bytes. Until then, I'm very happy with the decision to ignore __fspath__ on str and bytes.

-- Koos

--
+ Koos Zevenhoven + http://twitter.com/k7hoven +