
On 05/24/2017 02:41 AM, Steven D'Aprano wrote:
On Wed, May 24, 2017 at 12:18:16AM +0300, Serhiy Storchaka wrote:
It seems to me that the purpose of this proposition is not performance, but the possibility to use __fspath__ in str or bytes subclasses. Currently defining __fspath__ in str or bytes subclasses doesn't have any effect.
That's how I interpreted the proposal, with any performance issue being secondary. (I don't expect that converting path-like objects to strings would be the bottleneck in any application doing actual disk IO.)
I don't know a reasonable use case for this feature. The __fspath__ method of str or bytes subclasses returning something not equivalent to self looks confusing to me.
I can imagine at least two:
- emulating something like DOS 8.3 versus long file names;
- case normalisation
but what would make this really useful is for debugging. For instance, I have used something like this to debug problems with int() being called wrongly:
py> class MyInt(int):
...     def __int__(self):
...         print("__int__ called")
...         return super().__int__()
...
py> x = MyInt(23)
py> int(x)
__int__ called
23
It would be annoying and inconsistent if int(x) avoided calling __int__ on int subclasses. But that's exactly what happens with fspath and str. I see that as a bug, not a feature: I find it hard to believe that we would design an interface for string-like objects (paths) and then intentionally prohibit it from applying to strings.
And if we did, surely it's a misfeature. Why *shouldn't* subclasses of str get the same opportunity to customize the result of __fspath__ as they get to customize their __repr__ and __str__?
py> class MyStr(str):
...     def __repr__(self):
...         return 'repr'
...     def __str__(self):
...         return 'str'
...
py> s = MyStr('abcdef')
py> repr(s)
'repr'
py> str(s)
'str'
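For contrast, here is a minimal sketch of the behaviour complained about above: os.fspath() returning a str subclass instance unchanged instead of consulting its __fspath__ method. The class name LoudStr is made up for illustration, and the result assumes a CPython whose os.fspath() does the str/bytes check up front (3.6 does).

```python
import os

class LoudStr(str):
    """Hypothetical str subclass that tries to customise __fspath__."""
    def __fspath__(self):
        return "customised"

s = LoudStr("abcdef")

# os.fspath() checks for str/bytes first and returns the object as-is,
# so the __fspath__ defined above is never called.
result = os.fspath(s)
print(result)        # abcdef
print(result is s)   # True

# Calling the method directly still works, of course.
print(s.__fspath__())  # customised
```

So repr() and str() honour the subclass overrides, while os.fspath() does not.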
This is almost exactly what I have been thinking (just that I couldn't have presented it so clearly)!

Let's look at a potential use case for this. Assume that in a package you want to handle several paths to different files and directories that are all located in a common package-specific parent directory. Then using the path protocol you could write this:

    import os

    class PackageBase(object):
        basepath = '/home/.package'

    class PackagePath(str, PackageBase):
        def __fspath__(self):
            # resolve the stored name relative to the package directory
            return os.path.join(self.basepath, str(self))

    config_file = PackagePath('.config')
    log_file = PackagePath('events.log')
    data_dir = PackagePath('data')

    with open(log_file, 'w') as log:
        log.write('package paths initialized.\n')

Just that this wouldn't currently work because PackagePath inherits from str. Of course, there are other ways to achieve the above, but when you think about designing a path-like object class, str is just a pretty attractive base class to start from.

Now let's look at compatibility of a class like PackagePath under this proposal:

- If client code uses e.g. str(config_file) and proceeds to treat the resulting object as a path, unexpected things will happen and, yes, that's bad. However, this is no different from any other path-like object for which __str__ and __fspath__ don't return the same value.

- If client code uses the PEP-recommended backwards-compatible way of dealing with paths,

      path.__fspath__() if hasattr(path, "__fspath__") else path

  things will just work. Interestingly, this would *currently* produce an unexpected result, namely that it would execute the __fspath__ method of the str subclass.

- If client code uses instances of PackagePath as paths directly, then in Python 3.6 and below that would lead to an unintended outcome, while in Python 3.7 things would work. This is *really* bad.
But what it means is that, under the proposal, using a str or bytes subclass with an __fspath__ method defined makes your code backwards-incompatible, and the solution would be not to use such a class if you want to be backwards-compatible (and that should get documented somewhere). This restriction, of course, limits the usefulness of the proposal in the near future, but that disadvantage will vanish over time. In 5 years, not supporting Python 3.6 anymore maybe won't be a big deal (for comparison, Python 3.2 was released 6 years ago, and as of last year pip no longer supports it). As Steven pointed out, the proposal is *very* unlikely to break existing code.

So to summarize, the proposal

- avoids an up-front isinstance check in the protocol and thereby speeds up the processing of exact strings and bytes and of anything that follows the path protocol.*

- slows down the processing of instances of regular str and bytes subclasses*

- makes the "path.__fspath__() if hasattr(path, "__fspath__") else path" idiom consistent for subclasses of str and bytes that define __fspath__

- opens up the opportunity to write str/bytes subclasses that represent a path other than just their self in the future**

Still sounds like a net win to me, but let's see what I forgot ...

* Yes, speed is typically not your primary concern when it comes to IO; what's often neglected, though, is that not all path operations have to trigger actual IO (things in os.path, for example, don't typically perform IO).

** Somebody on the list (I guess it was Koos?) mentioned that such classes would only make sense if Python ever disallowed the use of str/bytes as paths, but I don't think that is a prerequisite here.

Wolfgang
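To make the inconsistency around the compatibility idiom concrete, here is a small sketch comparing the idiom with os.fspath() for a str subclass that defines __fspath__. The helper name compat_fspath is made up for illustration, PackagePath is simplified to a single class, and the os.fspath() result assumes the current up-front str/bytes check.

```python
import os

class PackagePath(str):
    # simplified, hypothetical version of the class discussed above
    basepath = "/home/.package"

    def __fspath__(self):
        return os.path.join(self.basepath, str(self))

def compat_fspath(path):
    """The backwards-compatible idiom, wrapped in a helper (name is illustrative)."""
    return path.__fspath__() if hasattr(path, "__fspath__") else path

log_file = PackagePath("events.log")

# The idiom finds __fspath__ via hasattr() and calls it.
via_idiom = compat_fspath(log_file)

# os.fspath() sees a str instance and returns it without calling __fspath__.
via_protocol = os.fspath(log_file)

print(via_idiom == os.path.join("/home/.package", "events.log"))  # True
print(via_protocol == "events.log")                               # True
```

The two ways of resolving the same object disagree today; under the proposal both would yield the joined path.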