[Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()
Koos Zevenhoven
k7hoven at gmail.com
Sun Apr 17 09:58:19 EDT 2016
On Sun, Apr 17, 2016 at 11:03 AM, Stephen J. Turnbull
<stephen at xemacs.org> wrote:
> Nick Coghlan writes:
>
> > str and bytes aren't going to implement __fspath__ (since they're
> > only *sometimes* path objects), so asking people to call the
> > protocol method directly for any purpose would be a pain.
>
> It *should* be a pain. People who need bytes should call fsencode,
> people who need str should call fsdecode, and Ethan's antipathy checks
> for bytes and str, then calls __fspath__ if needed. Who's left? Just
> the bartender and the janitor, last call was hours ago. OK, maybe
> there are enough clients to make it worthwhile to provide the utility,
> but it should be clearly marked as "double opt-in, for experts only
> (consenting adults must show proof of insurance)".
My doubts, expressed several times in these threads, about the need
for a *public* os.fspath function to complement the __fspath__
protocol, are now perhaps gone. I'll explain why (and how). The
reasons for my doubts were that
(1) The audience outside the stdlib for such a function should be
small, because using existing tools in os.path.* or pathlib (or
similar) is preferred for manipulating paths.
(2) There are just too many different possible versions of this
function: rejecting str, rejecting bytes, coercion to str, coercion to
bytes, and accepting both str and bytes. That's a total of 5 different
cases. People also used to talk about versions that would not allow
passing through objects that are already bytes or str. That would make
it a total of 10 different versions!
(in principle, there could be even more, but let's not go there :-).
In other words, this argument was that it is probably best to
implement whatever flavor is needed for the context, perhaps based on
documented recipes.
Regarding (2), we can first rule out half of the 10 cases---the ones
that reject plain instances of bytes and/or str---because they would
not be very useful as all the isinstance/hasattr checking etc. would
be left to the caller. And here are the remaining five, explained
based on what they accept as argument, what they return, and where
they would be used:
(A) "polymorphic"
*Accept*: str and bytes, provided via __fspath__ as well as plain str
and bytes instances.
*Return*: str/bytes depending on input.
*Audience*: the stdlib, including os.path.things, os.things,
shutil.things, open, ... (some functions would need a C version).
There may even be a small audience outside the stdlib.
(B) "str-based only"
*Accept*: str, provided via __fspath__ as well as plain str.
*Return*: str.
*Audience*: relatively low-level code that works exclusively with str
paths but accepts specialized path objects as input.
(C) "bytes-based only"
*Accept*: bytes, provided via __fspath__ as well as plain bytes.
*Return*: bytes.
*Audience*: low-level code that explicitly deals with paths as bytes
(probably to deal with undefined/ill-defined encodings).
(D) "coerce to str"
*Accept*: str and bytes, provided via __fspath__ as well as plain str
and bytes instances.
*Return*: str (coerced / decoded if needed).
*Audience*: code that deals explicitly with str but wants to 'try'
supporting bytes-based path inputs too via implicit decoding (even if
it may result in surrogate escapes, which one cannot, for instance,
print(...)).
(E) "coerce to bytes"
*Accept*: str and bytes, provided via __fspath__ as well as plain str
and bytes instances.
*Return*: bytes (coerced / encoded if needed).
*Audience*: low-level code that explicitly deals with bytes paths but
wants to accept str-based path inputs too via implicit encoding.
Even if all options (A-E) probably have small audiences (compared to
e.g. os.path.*), some of them have larger audiences than others. But
all of them have at least *some* reasonable audience (as described
above).
Recently (well, a few days ago, but 'recently', considering the scale
of these discussions anyway ;-), Nick pointed out something I hadn't
realized---os.fsencode and os.fsdecode actually already implement
coercion to bytes and str, respectively. With those two functions made
compatible with the __fspath__ protocol [using (A) above], they would
in fact *be* (D) and (E), respectively.
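As a rough sketch of that observation: the names polymorphic_fspath, coerce_to_str, coerce_to_bytes and BytesPath below are made up for illustration and are not proposed API. The sketch assumes only that os.fsdecode and os.fsencode accept plain str and bytes, and layers them on top of a hand-written version of (A).

```python
import os


def polymorphic_fspath(pathlike):
    """Variant (A): str or bytes, via __fspath__ or passed through as-is."""
    if hasattr(pathlike, '__fspath__'):
        ret = pathlike.__fspath__()
    else:
        ret = pathlike
    if not isinstance(ret, (str, bytes)):
        raise TypeError("argument is not and does not provide an "
                        "acceptable pathname")
    return ret


def coerce_to_str(pathlike):
    """Variant (D): os.fsdecode applied on top of (A)."""
    return os.fsdecode(polymorphic_fspath(pathlike))


def coerce_to_bytes(pathlike):
    """Variant (E): os.fsencode applied on top of (A)."""
    return os.fsencode(polymorphic_fspath(pathlike))


class BytesPath:
    """Hypothetical path object whose __fspath__ provides bytes."""

    def __init__(self, raw):
        self._raw = raw

    def __fspath__(self):
        return self._raw


# bytes provided via __fspath__ are decoded to str by (D)
decoded = coerce_to_str(BytesPath(b"/tmp/spam"))
# a plain str is encoded to bytes by (E)
encoded = coerce_to_bytes("/tmp/spam")
print(type(decoded).__name__, type(encoded).__name__)
```

The point is only that (D) and (E) need no extra flavor of fspath: they are the existing coercion functions composed with (A).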
Now, we only have options (A-C) left. They could all be implemented
roughly as follows:
    def fspath(pathlike, *, output_types=(str,)):
        if hasattr(pathlike, '__fspath__'):
            ret = pathlike.__fspath__() # or pathlike.__fspath__ if it's not a method
        else:
            ret = pathlike
        if not isinstance(ret, output_types):
            raise TypeError("argument is not and does not provide an "
                            "acceptable pathname")
        return ret
With an implementation like the above, (A) would correspond to
output_types = (str, bytes), (B) to the default, and (C) to
output_types = (bytes,).
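To illustrate the three settings, here is a self-contained sketch that repeats the draft fspath and tries it on a few inputs. StrPath, BytesPath and the accepted helper are hypothetical names used only for this demonstration.

```python
def fspath(pathlike, *, output_types=(str,)):
    """Draft fspath: unwrap __fspath__ if present, then check the type."""
    if hasattr(pathlike, '__fspath__'):
        ret = pathlike.__fspath__()
    else:
        ret = pathlike
    if not isinstance(ret, output_types):
        raise TypeError("argument is not and does not provide an "
                        "acceptable pathname")
    return ret


class StrPath:
    """Hypothetical path object that provides a str path."""

    def __init__(self, path):
        self._path = path

    def __fspath__(self):
        return self._path


class BytesPath:
    """Hypothetical path object that provides a bytes path."""

    def __init__(self, path):
        self._path = path

    def __fspath__(self):
        return self._path


def accepted(pathlike, **kwargs):
    """Return True if fspath accepts the argument, False on TypeError."""
    try:
        fspath(pathlike, **kwargs)
    except TypeError:
        return False
    return True


inputs = ["a/b", StrPath("a/b"), b"a/b", BytesPath(b"a/b")]

# (B) default: only str gets through
only_str = [accepted(p) for p in inputs]
# (A) polymorphic: both str and bytes get through
polymorphic = [accepted(p, output_types=(str, bytes)) for p in inputs]
# (C) bytes-based only: only bytes gets through
only_bytes = [accepted(p, output_types=(bytes,)) for p in inputs]
```

With the default, the bytes inputs raise TypeError, so a caller has to opt in explicitly to see bytes at all.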
So, with the above considerations as a counterargument, I consider
argument (2) gone.
What about argument (1), that the audience for the os.fspath(...)
function (especially for one selected version of the 5 or 10
variations!) is quite small, and that we should encourage using
os.path.* or pathlib instead of manipulating pathnames by hand?
The counterargument for (1):
It seems to me we now "all" agree that __fspath__ should allow
str+bytes polymorphism. I could try to list who I mean by "all"
(Ethan, Brett, Stephen T, Nick, ... ?), but obviously I won't be able
to list all or speak for them so I won't even try :-). Anyway, for
this argument, I'm assuming we agree on that. So, __fspath__ can
provide either str or bytes, even if str is *highly preferred* in most
places. Therefore, the os.fspath function, as part of the protocol,
has the important role of *by default* rejecting bytes, so that the
protocol effectively becomes str-only by default. With an fspath
implementation like the one I drafted above, together with
os.fsencode+os.fsdecode, we in fact cover all cases (A-E).
So, as a summary: With a str+bytes-polymorphic __fspath__, with the
above argumentation and the rough implementation of os.fspath(...),
the conclusion is that the os.fspath function should indeed be public,
and that no further variations are needed.
-Koos
P.S. There is also the possibility of two dunder methods corresponding
to str and bytes, leading to one being preferred over the other in
some cases etc. I have gone through various aspects and possible
versions of that approach, but concluded it's not worth it, as some of
us may also have implied in earlier posts. After all, we want
something that's *almost* exclusively str.