Re: [Python-ideas] tweaking the file system path protocol

May 28, 2017

On Fri, May 26, 2017 at 03:58:23PM +0300, Koos Zevenhoven wrote:
...
On Wed, May 24, 2017 at 5:52 PM, Wolfgang Maier
<wolfgang.maier@biologie.uni-freiburg.de> wrote:
...
On 05/24/2017 02:41 AM, Steven D'Aprano wrote:
[...]
...
...
This is almost exactly what I have been thinking (just that I couldn't have
presented it so clearly)!
Unfortunately, this thinking is also very shallow compared to what
went into PEP 519.
That is a rather rude comment. How would you feel if Wolfgang or I said 
that the PEP's thinking was "very shallow"? (I see you are listed as 
co-author.)

If you are going to criticise our reasoning, you had better give reasons for 
why we are wrong, not just insult us:

"...this thinking is very shallow..."

"This is exactly the kind of code that causes the problems."

"Isn't it great that it doesn't work, so it's not attractive anymore?"

"Yes, this is another way of shooting yourself in the foot."

Let me look at your objections:
...
str and bytes subclasses that
return something different from the str/bytes content should not be
written.
That's your opinion; other people might disagree. In another post, you 
said it would be "confusing". I think this argument is FUD ("Fear, 
Uncertainty, Doubt"). We can already write confusing code in a million 
other ways, so why is this one to be prohibited?

I don't know of any other area of Python where a type isn't permitted to 
override its own dunders:

strings have __str__ and __repr__
floats have __float__
ints have __int__
tuple subclasses can override __getitem__ to return whatever they like

etc. This is legal:

py> class ConfusingStr(str):
...     def __getitem__(self, i):
...             return 'x'
...
py> s = ConfusingStr("Nobody expects the Spanish Inquisition!")
py> s[5]
'x'
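For comparison, here is a minimal sketch (the class names are made up for illustration) showing that the built-in str(), float() and int() conversions already honour dunder overrides on their own subclasses, even when the result differs from the instance's value:

```python
class LoudStr(str):
    # str() calls the subclass's __str__ instead of copying the content
    def __str__(self):
        return self.upper()

class RoundedFloat(float):
    # float() calls the subclass's __float__
    def __float__(self):
        return float(round(self))

class OffByOneInt(int):
    # int() calls the subclass's __int__; int.__int__ avoids recursion
    def __int__(self):
        return int.__int__(self) + 1

print(str(LoudStr("spam")))       # SPAM
print(float(RoundedFloat(2.75)))  # 3.0
print(int(OffByOneInt(41)))       # 42
```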

People have had the ability to write "confusing" strings, floats and 
ints which could return something different from their own value. They 
either don't do it, or if they do, they have a good reason and it 
isn't so confusing.

And if somebody does use it to write a confusing class? So what? 
The "consenting adults" principle applies here. We aren't responsible for 
every abuse of the language that somebody might do. Why is __fspath__ so 
special that we need to protect users from doing something confusing?

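What *really is* confusing is that __fspath__ methods are honoured on 
some objects but silently ignored on others: os.fspath() returns str and 
bytes instances, including subclasses, unchanged without ever calling 
their __fspath__. If that decision was intentional, I don't think it was 
justified in the PEP. (At least, I didn't see it.)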
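A small sketch of the current behaviour, using two hypothetical classes that implement the same __fspath__; only the one that does not inherit from str has its method called by os.fspath():

```python
import os

class StrWithFspath(str):
    # A str subclass whose __fspath__ points somewhere else
    def __fspath__(self):
        return '/elsewhere/' + str(self)

class PlainWithFspath:
    # An ordinary object implementing the same protocol
    def __init__(self, name):
        self.name = name

    def __fspath__(self):
        return '/elsewhere/' + self.name

print(os.fspath(PlainWithFspath('data')))  # /elsewhere/data  (__fspath__ called)
print(os.fspath(StrWithFspath('data')))    # data             (__fspath__ ignored)
```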
...
...
Let's look at a potential use case for this. Assume that in a package you want
to handle several paths to different files and directories that are all
located in a common package-specific parent directory. Then, using the path
protocol, you could write this:
import os
class PackageBase(object):
    basepath = '/home/.package'
class PackagePath(str, PackageBase):
    # intended file system path: <basepath>/<name>
    def __fspath__(self):
        return os.path.join(self.basepath, str(self))
config_file = PackagePath('.config')
log_file = PackagePath('events.log')
data_dir = PackagePath('data')
# open in write mode so that log.write() is allowed
with open(log_file, 'w') as log:
    log.write('package paths initialized.\n')
This is exactly the kind of code that causes the problems. It will do
the wrong thing when code like open(str(log_file), 'w') is used for
compatibility.
Then don't do that.

Using open(str(log_file), 'w') is not the right way to emulate the Path 
protocol for backwards compatibility. The whole reason the Path protocol 
exists is because calling str(obj) is the wrong way to convert an 
unknown object to a file system path string.
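To illustrate why str(obj) is not a path conversion: for an arbitrary path-like object it just gives the default repr. Below is a minimal sketch of a backwards-compatible alternative; `Entry` and `fspath_compat` are hypothetical names, and the lookup order follows the pure-Python os.fspath() outline in PEP 519 (str/bytes pass through, otherwise __fspath__ is looked up on the type and its result type-checked):

```python
import pathlib

class Entry:
    # Hypothetical path-like object: implements __fspath__ but not __str__
    def __init__(self, path):
        self._path = path

    def __fspath__(self):
        return self._path

def fspath_compat(obj):
    """Hypothetical stand-in for os.fspath() on interpreters that lack it."""
    if isinstance(obj, (str, bytes)):
        return obj
    try:
        # Look the method up on the type, as for other special methods
        method = type(obj).__fspath__
    except AttributeError:
        raise TypeError('expected str, bytes or os.PathLike object, not '
                        + type(obj).__name__) from None
    result = method(obj)
    if not isinstance(result, (str, bytes)):
        raise TypeError('expected __fspath__() to return str or bytes, not '
                        + type(result).__name__)
    return result

entry = Entry('/tmp/events.log')
print(str(entry))                                    # default object repr, not a path
print(fspath_compat(entry))                          # /tmp/events.log
print(fspath_compat(pathlib.PurePosixPath('/a/b')))  # /a/b
```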

I think this argument about backwards compatibility is a storm in a 
teacup. We can enumerate all the possibilities:

1. object that doesn't inherit from str/bytes: behaviour is unchanged;

2. object that does inherit from str/bytes, but doesn't override
   the __fspath__ method: behaviour is unchanged;

3. object that inherits from str/bytes, *and* overrides the __fspath__ 
   method: behaviour is changed.
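The three cases can be checked against a sketch of the proposed rule. `proposed_fspath` is hypothetical and is NOT how os.fspath() behaves today; it simply consults __fspath__ before falling back to the str/bytes pass-through:

```python
import os
import pathlib

def proposed_fspath(obj):
    """Hypothetical sketch: honour __fspath__ even on str/bytes subclasses."""
    method = getattr(type(obj), '__fspath__', None)
    if method is not None:
        result = method(obj)
        if not isinstance(result, (str, bytes)):
            raise TypeError('__fspath__ must return str or bytes, not '
                            + type(result).__name__)
        return result
    if isinstance(obj, (str, bytes)):
        return obj
    raise TypeError('expected str, bytes or os.PathLike object, not '
                    + type(obj).__name__)

class PlainSub(str):     # case 2: inherits from str, no __fspath__
    pass

class RedirectSub(str):  # case 3: inherits from str, defines __fspath__
    def __fspath__(self):
        return '/base/' + str(self)

examples = [pathlib.PurePosixPath('/a/b'),  # case 1: not a str subclass
            PlainSub('rel/path'),           # case 2
            RedirectSub('rel/path')]        # case 3
for obj in examples:
    # Print today's result next to the proposed result
    print(type(obj).__name__, os.fspath(obj), proposed_fspath(obj))
```

Only the third case gives a different answer from today's os.fspath().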

Okay, the behaviour changes. But I doubt there are many existing 
classes that subclass str and override __fspath__, because up to now 
doing so has had no effect at all. So the main risk is:

- classes created from Python 3.7 onwards;
- which inherit from str/bytes;
- and which override __fspath__;
- and are back-ported to 3.6;
- without taking into account that __fspath__ will be ignored in 3.6;
- and the users don't read the docs to learn about the difference.

The danger here is that the wrong pathname will be used if str(obj) 
and fspath(obj) return different strings.
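A library that wanted to be defensive during such a transition could flag exactly the risky objects. The sketch below is hypothetical (`fspath_disagrees` and `Redirect` are illustrative names): it reports str/bytes subclasses whose own __fspath__ disagrees with their raw content, bypassing any __str__ override:

```python
def fspath_disagrees(obj):
    """Hypothetical check: True if obj is a str/bytes subclass whose
    __fspath__ returns something other than its own content."""
    if not isinstance(obj, (str, bytes)):
        return False
    method = getattr(type(obj), '__fspath__', None)
    if method is None:
        return False
    # Raw content: str.__str__ and memoryview ignore subclass overrides
    content = str.__str__(obj) if isinstance(obj, str) else bytes(memoryview(obj))
    return method(obj) != content

class Redirect(str):  # hypothetical str subclass that redirects its path
    def __fspath__(self):
        return '/home/.package/' + str(self)

print(fspath_disagrees(Redirect('events.log')))  # True
print(fspath_disagrees('events.log'))            # False
```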

Personally I think this is unlikely and not worth worrying about beyond a note in 
the documentation, but if people really feel this is a problem we could 
make this a __future__ import. But that just feels like overkill.

-- 
Steve