[Python-ideas] tweaking the file system path protocol

Wolfgang Maier wolfgang.maier at biologie.uni-freiburg.de
Wed May 24 10:52:12 EDT 2017


On 05/24/2017 02:41 AM, Steven D'Aprano wrote:
> On Wed, May 24, 2017 at 12:18:16AM +0300, Serhiy Storchaka wrote:
>>
>> It seems to me that the purpose of this proposition is not performance,
>> but the possibility to use __fspath__ in str or bytes subclasses.
>> Currently defining __fspath__ in str or bytes subclasses doesn't have
>> any effect.
> 
> That's how I interpreted the proposal, with any performance issue being
> secondary. (I don't expect that converting path-like objects to strings
> would be the bottleneck in any application doing actual disk IO.)
> 
>   
>> I don't know a reasonable use case for this feature. The __fspath__
>> method of str or bytes subclasses returning something not equivalent to
>> self looks confusing to me.
> 
> I can imagine at least two:
> 
> - emulating something like DOS 8.3 versus long file names;
> - case normalisation
> 
> but what would make this really useful is for debugging. For instance, I
> have used something like this to debug problems with int() being called
> wrongly:
> 
> py> class MyInt(int):
> ...     def __int__(self):
> ...             print("__int__ called")
> ...             return super().__int__()
> ...
> py> x = MyInt(23)
> py> int(x)
> __int__ called
> 23
> 
> It would be annoying and inconsistent if int(x) avoided calling __int__
> on int subclasses. But that's exactly what happens with fspath and str.
> I see that as a bug, not a feature: I find it hard to believe that we
> would design an interface for string-like objects (paths) and then
> intentionally prohibit it from applying to strings.
> 
> And if we did, surely it's a misfeature. Why *shouldn't* subclasses of
> str get the same opportunity to customize the result of __fspath__ as
> they get to customize their __repr__ and __str__?
> 
> py> class MyStr(str):
> ...     def __repr__(self):
> ...             return 'repr'
> ...     def __str__(self):
> ...             return 'str'
> ...
> py> s = MyStr('abcdef')
> py> repr(s)
> 'repr'
> py> str(s)
> 'str'
> 

This is almost exactly what I have been thinking (just that I couldn't 
have presented it so clearly)!

Let's look at a potential use case for this. Assume that in a package you 
want to handle several paths to different files and directories that are 
all located in a common package-specific parent directory. Then, using 
the path protocol, you could write this:

import os

class PackageBase(object):
     basepath = '/home/.package'

class PackagePath(str, PackageBase):
     def __fspath__(self):
         # prepend the package-specific parent directory
         return os.path.join(self.basepath, str(self))

config_file = PackagePath('.config')
log_file = PackagePath('events.log')
data_dir = PackagePath('data')

# open for writing, since we want to write a log line
with open(log_file, 'w') as log:
     log.write('package paths initialized.\n')


Except that this wouldn't currently work, because PackagePath inherits 
from str and its __fspath__ method is therefore never called. Of course, 
there are other ways to achieve the above, but when you think about 
designing a path-like object class, str is just a pretty attractive base 
class to start from.
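
To make the current behaviour concrete, here is a minimal sketch 
(assuming CPython 3.6 or later, where os.fspath() exists). The class is a 
cut-down version of the PackagePath above, with basepath moved into the 
class only to keep the example self-contained:

```python
import os

class PackagePath(str):
    # Hypothetical base directory, as in the example above.
    basepath = '/home/.package'

    def __fspath__(self):
        return os.path.join(self.basepath, str(self))

log_file = PackagePath('events.log')

# os.fspath() hands str instances (including subclass instances) back
# unchanged and never looks up __fspath__ on them.
print(os.fspath(log_file) == 'events.log')

# Calling the method by hand gives the path the class author intended.
print(log_file.__fspath__() == os.path.join('/home/.package', 'events.log'))
```

Both comparisons come out true, which is exactly the mismatch described 
above: open(log_file) would look for 'events.log' relative to the current 
directory, not below basepath.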

Now let's look at the compatibility of a class like PackagePath under 
this proposal:

- if client code uses e.g. str(config_file) and proceeds to treat the 
resulting object as a path, unexpected things will happen and, yes, 
that's bad. However, this is no different from any other path-like 
object for which __str__ and __fspath__ don't return the same value.

- if client code uses the PEP-recommended backwards-compatible way of 
dealing with paths,

path.__fspath__() if hasattr(path, "__fspath__") else path

things will just work. Interestingly, this would *currently* produce an 
unexpected result, namely that it would execute the __fspath__ method of 
the str subclass, even though the stdlib path functions ignore that method.

- if client code uses instances of PackagePath as paths directly, then in 
Python 3.6 and below that would lead to an unintended outcome, while in 
Python 3.7 things would work. This is *really* bad.
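
The mismatch between the idiom and the built-in behaviour can be shown 
with a small sketch. compat_fspath is a made-up name that merely wraps 
the idiom quoted above, and Prefixed is a hypothetical str subclass used 
only for this demonstration (Python 3.6+ is assumed for os.fspath() and 
pathlib's __fspath__ support):

```python
import os
import pathlib

def compat_fspath(path):
    """The backwards-compatible idiom, wrapped in a (hypothetical) helper."""
    return path.__fspath__() if hasattr(path, "__fspath__") else path

class Prefixed(str):
    """Hypothetical str subclass whose __fspath__ differs from its str value."""
    def __fspath__(self):
        return 'prefix-' + str(self)

plain = 'data'
pure = pathlib.PurePosixPath('data')
odd = Prefixed('data')

print(compat_fspath(plain))  # a plain str has no __fspath__: returned unchanged
print(compat_fspath(pure))   # pathlib objects implement __fspath__
print(compat_fspath(odd))    # the idiom calls the subclass method: 'prefix-data'
print(os.fspath(odd))        # os.fspath() skips it for str subclasses: 'data'
```

So today the idiom and os.fspath() disagree for such a class, and only 
the idiom honours the subclass's __fspath__.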

But what it means is that, under the proposal, using a str or bytes 
subclass with an __fspath__ method defined makes your code 
backwards-incompatible, and the solution would be not to use such a class 
if you want to be backwards-compatible (and that should get documented 
somewhere). This restriction, of course, limits the usefulness of the 
proposal in the near future, but that disadvantage will vanish over 
time. In 5 years, not supporting Python 3.6 maybe won't be a big 
deal anymore (for comparison, Python 3.2 was released 6 years ago and 
since last year pip no longer supports it). As Steven pointed out, 
the proposal is *very* unlikely to break existing code.

So to summarize, the proposal

- avoids an up-front isinstance check in the protocol and thereby speeds 
up the processing of exact strings and bytes and of anything that 
follows the path protocol.*

- slows down the processing of instances of regular str and bytes 
subclasses*

- makes the "path.__fspath__() if hasattr(path, "__fspath__") else path" 
idiom consistent for subclasses of str and bytes that define __fspath__

- opens up the opportunity to write str/bytes subclasses that represent 
a path other than just themselves in the future**
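
One possible reading of the ordering behind these points, sketched in 
pure Python. current_fspath loosely follows the pure-Python equivalent of 
os.fspath() given in PEP 519; proposed_fspath is only an illustration of 
the idea, not actual or agreed-upon code. All function and class names 
here are made up for the example:

```python
def _call_fspath(path):
    """Look up __fspath__ on the type and validate what it returns."""
    path_type = type(path)
    try:
        method = path_type.__fspath__
    except AttributeError:
        raise TypeError("expected str, bytes or os.PathLike object, not "
                        + path_type.__name__) from None
    result = method(path)
    if not isinstance(result, (str, bytes)):
        raise TypeError("__fspath__ must return str or bytes, not "
                        + type(result).__name__)
    return result

def current_fspath(path):
    """Today's order: the isinstance check comes first."""
    if isinstance(path, (str, bytes)):
        return path
    return _call_fspath(path)

def proposed_fspath(path):
    """Sketched order: exact types, then __fspath__, then subclasses."""
    path_type = type(path)
    if path_type is str or path_type is bytes:
        return path                 # fast path for exact str/bytes
    if hasattr(path_type, '__fspath__'):
        return _call_fspath(path)   # the protocol wins, even for str subclasses
    if isinstance(path, (str, bytes)):
        return path                 # plain subclasses are reached last
    raise TypeError("expected str, bytes or os.PathLike object, not "
                    + path_type.__name__)

class Prefixed(str):
    """Hypothetical str subclass defining __fspath__."""
    def __fspath__(self):
        return 'prefix-' + str(self)

class Plain(str):
    """Hypothetical str subclass without __fspath__."""

print(current_fspath(Prefixed('x')))   # 'x'
print(proposed_fspath(Prefixed('x')))  # 'prefix-x'
print(proposed_fspath(Plain('x')))     # 'x'
```

In this sketch, exact str/bytes and protocol-following objects return 
early, while a plain subclass such as Plain has to fall through two 
checks first, which is the slowdown mentioned in the second point.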

Still sounds like a net win to me, but let's see what I forgot ...

* yes, speed is typically not your primary concern when it comes to IO; 
what's often neglected though is that not all path operations have to 
trigger actual IO (things in os.path for example don't typically perform IO)

** somebody on the list (I guess it was Koos?) mentioned that such 
classes would only make sense if Python ever disallowed the use of 
str/bytes as paths, but I don't think that is a prerequisite here.

Wolfgang


