[Python-ideas] tweaking the file system path protocol
Wolfgang Maier
wolfgang.maier at biologie.uni-freiburg.de
Wed May 24 10:52:12 EDT 2017
On 05/24/2017 02:41 AM, Steven D'Aprano wrote:
> On Wed, May 24, 2017 at 12:18:16AM +0300, Serhiy Storchaka wrote:
>>
>> It seems to me that the purpose of this proposition is not performance,
>> but the possibility to use __fspath__ in str or bytes subclasses.
>> Currently defining __fspath__ in str or bytes subclasses doesn't have
>> any effect.
>
> That's how I interpreted the proposal, with any performance issue being
> secondary. (I don't expect that converting path-like objects to strings
> would be the bottleneck in any application doing actual disk IO.)
>
>
>> I don't know a reasonable use case for this feature. The __fspath__
>> method of str or bytes subclasses returning something not equivalent to
>> self looks confusing to me.
>
> I can imagine at least two:
>
> - emulating something like DOS 8.3 versus long file names;
> - case normalisation
>
> but what would make this really useful is for debugging. For instance, I
> have used something like this to debug problems with int() being called
> wrongly:
>
> py> class MyInt(int):
> ...     def __int__(self):
> ...         print("__int__ called")
> ...         return super().__int__()
> ...
> py> x = MyInt(23)
> py> int(x)
> __int__ called
> 23
>
> It would be annoying and inconsistent if int(x) avoided calling __int__
> on int subclasses. But that's exactly what happens with fspath and str.
> I see that as a bug, not a feature: I find it hard to believe that we
> would design an interface for string-like objects (paths) and then
> intentionally prohibit it from applying to strings.
>
> And if we did, surely it's a misfeature. Why *shouldn't* subclasses of
> str get the same opportunity to customize the result of __fspath__ as
> they get to customize their __repr__ and __str__?
>
> py> class MyStr(str):
> ...     def __repr__(self):
> ...         return 'repr'
> ...     def __str__(self):
> ...         return 'str'
> ...
> py> s = MyStr('abcdef')
> py> repr(s)
> 'repr'
> py> str(s)
> 'str'
>
This is almost exactly what I have been thinking (just that I couldn't
have presented it so clearly)!
Let's look at a potential use case for this. Assume that in a package you
want to handle several paths to different files and directories that are
all located in a common package-specific parent directory. Then using
the path protocol you could write this:
    import os

    class PackageBase(object):
        basepath = '/home/.package'

    class PackagePath(str, PackageBase):
        def __fspath__(self):
            return os.path.join(self.basepath, str(self))

    config_file = PackagePath('.config')
    log_file = PackagePath('events.log')
    data_dir = PackagePath('data')

    # open for writing; the default mode 'r' would not allow log.write()
    with open(log_file, 'w') as log:
        log.write('package paths initialized.\n')
Except that this wouldn't currently work: because PackagePath inherits
from str, its __fspath__ method is never called and open() just sees the
bare relative name. Of course, there are other ways to achieve the above,
but when you think about designing a Path-like object class, str is a
pretty attractive base class to start from.
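
To make the current behaviour concrete, here is a minimal sketch. It
assumes the Python 3.6 rule that os.fspath() returns str and bytes
instances (subclass instances included) unchanged. The class is a
simplified, self-contained variant of PackagePath, and the directory
'/home/.package' is only illustrative:

```python
import os

class PackagePath(str):
    # Simplified, self-contained variant of the class above.
    basepath = '/home/.package'   # illustrative location only

    def __fspath__(self):
        return os.path.join(self.basepath, str(self))

log_file = PackagePath('events.log')

# os.fspath() checks for str/bytes first, so the override is never
# consulted and the bare relative name comes back.
seen_by_os = os.fspath(log_file)

# The path the class author actually wanted.
wanted = log_file.__fspath__()

print(seen_by_os)
print(wanted)
```

With this behaviour, open(log_file, 'w') targets events.log in the
current working directory rather than a file under basepath.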
Now let's look at compatibility of a class like PackagePath under this
proposal:
- if client code uses e.g. str(config_file) and proceeds to treat the
resulting object as a path, unexpected things will happen and, yes,
that's bad. However, this is no different from any other Path-like
object for which __str__ and __fspath__ don't return the same value.
- if client code uses the PEP-recommended backwards-compatible way of
dealing with paths,
path.__fspath__() if hasattr(path, "__fspath__") else path
things will just work. Interestingly, this idiom *currently* produces an
unexpected result, namely that it executes the __fspath__ method of
the str subclass, which the built-in path handling ignores.
- if client code uses instances of PackagePath as paths directly, then in
Python 3.6 and below that would lead to unintended outcomes, while in
Python 3.7 things would work. This is *really* bad.
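
The mismatch between the idiom and the built-in handling can be shown
with a small sketch. The helper name compat_fspath and the class
Shouting are made up for illustration; the only real API used is
os.fspath(), with its Python 3.6 behaviour:

```python
import os

def compat_fspath(path):
    # The backwards-compatible idiom, wrapped in a helper (hypothetical name).
    return path.__fspath__() if hasattr(path, "__fspath__") else path

class Shouting(str):
    # Hypothetical str subclass whose __fspath__ differs from its str value.
    def __fspath__(self):
        return str(self).upper()

p = Shouting('readme.txt')

idiom_result = compat_fspath(p)             # runs Shouting.__fspath__
builtin_result = os.fspath(p)               # __fspath__ is skipped for str subclasses
plain_result = compat_fspath('readme.txt')  # a plain str has no __fspath__

print(idiom_result, builtin_result, plain_result)
```

So the same object yields two different paths depending on which of the
two routes the client code happens to take.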
But what it means is that, under the proposal, using a str or bytes
subclass with an __fspath__ method defined makes your code
backwards-incompatible and the solution would be not to use such a class
if you want to be backwards-compatible (and that should get documented
somewhere). This restriction, of course, limits the usefulness of the
proposal in the near future, but that disadvantage will vanish over
time. In 5 years, dropping support for Python 3.6 probably won't be a big
deal anymore (for comparison, Python 3.2 was released 6 years ago and
pip stopped supporting it last year). As Steven pointed out,
the proposal is *very* unlikely to break existing code.
So to summarize, the proposal
- avoids an up-front isinstance check in the protocol and thereby speeds
up the processing of exact strings and bytes and of anything that
follows the path protocol.*
- slows down the processing of instances of regular str and bytes
subclasses*
- makes the "path.__fspath__() if hasattr(path, "__fspath__") else path"
idiom consistent for subclasses of str and bytes that define __fspath__
- opens up the opportunity to write str/bytes subclasses that represent
a path other than just themselves in the future**
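
As a rough pure-Python sketch of the lookup order these points describe:
this is one reading of the proposal, not the actual os.fspath()
implementation. The function name proposed_fspath, the exact-type fast
path, and the example classes are all assumptions made for illustration:

```python
def proposed_fspath(path):
    """Hypothetical lookup order under the proposal (not os.fspath)."""
    path_type = type(path)

    # Fast path: exact str/bytes objects are returned without any lookup.
    if path_type is str or path_type is bytes:
        return path

    # Look __fspath__ up on the type, also for str/bytes subclasses.
    method = getattr(path_type, '__fspath__', None)
    if method is not None:
        result = method(path)
        if isinstance(result, (str, bytes)):
            return result
        raise TypeError('__fspath__ must return str or bytes, not '
                        + type(result).__name__)

    # Fallback: str/bytes subclasses without __fspath__ pass through.
    # They now pay for the failed lookup above, hence the slowdown.
    if isinstance(path, (str, bytes)):
        return path

    raise TypeError('expected str, bytes or os.PathLike object, not '
                    + path_type.__name__)


class Plain(str):
    """str subclass without __fspath__."""

class Upper(str):
    """str subclass that customises its path."""
    def __fspath__(self):
        return str(self).upper()

print(proposed_fspath('a.txt'))
print(proposed_fspath(Plain('a.txt')))
print(proposed_fspath(Upper('a.txt')))
```

Under this ordering the backwards-compatible idiom and the protocol
function would agree for a class like Upper.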
Still sounds like a net win to me, but let's see what I forgot ...
* yes, speed is typically not your primary concern when it comes to IO;
what's often neglected though is that not all path operations have to
trigger actual IO (things in os.path for example don't typically perform IO)
** somebody on the list (I guess it was Koos?) mentioned that such
classes would only make sense if Python ever disallowed the use of
str/bytes as paths, but I don't think that is a prerequisite here.
Wolfgang