[Python-Dev] Pathlib enhancements - improve fsdecode and fsencode

Wed Apr 20 09:30:39 EDT 2016

On Thu, Apr 14, 2016 at 9:55 AM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Please please please, junk both "filter out bytes" proposals.

If you were referring to some of the fspath versions, I think we will
need a bytes-rejecting version, for reasons explained in [1-2]. Of
course not everyone wants or has to use it.

> Since they involve an exception, they impose an unnecessary "try" on
> all text applications that fear death on bytes returns.  May as well
> just wrap all objects with __fspath__ in fsdecode, and all is
> happy.
>
> Counterproposal: make fsdecode and fsencode grok __fspath__.  Then:

Not being a native English speaker, I'm relying on a Wikipedia
explanation of "grok", but if you mean that fsdecode and fsencode
would accept objects that implement __fspath__, then I think we all
agree on this. Making the stdlib accept path objects, after all, is
the whole point of the pathlib discussions :).

Anyway, I am happy that Nick [3] (and you [4] ?) pointed out that
os.fsencode and os.fsdecode currently implement coercion, i.e., they
both accept both str and bytes, and return just one of them. This was
important for my conclusion in [1]. When these two functions are made
__fspath__ compatible using `fspath(patharg, output_types = (str,
bytes))`, like most os functions, they will indeed implement coercion
to bytes or str from "any pathlike object".
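
To make the coercion concrete, here is a minimal sketch. Note that the
fspath function below is only a hypothetical stand-in for the function
under discussion (with output_types spelled as in this message), and
DemoPath is an illustrative class; neither is an existing stdlib API.

```python
import os


def fspath(path, output_types=(str,)):
    """Hypothetical sketch of the proposed function; not a stdlib API."""
    if not isinstance(path, (str, bytes)):
        # Look up __fspath__ on the type, as is usual for protocol methods.
        method = getattr(type(path), "__fspath__", None)
        if method is None:
            raise TypeError("not a path-like object: %r" % (path,))
        path = method(path)
    if not isinstance(path, tuple(output_types)):
        raise TypeError("path type %s not accepted here" % type(path).__name__)
    return path


def fsdecode(path):
    # Accept str, bytes, or path objects; always return str.
    return os.fsdecode(fspath(path, output_types=(str, bytes)))


def fsencode(path):
    # Accept str, bytes, or path objects; always return bytes.
    return os.fsencode(fspath(path, output_types=(str, bytes)))


class DemoPath:
    """Illustrative path object wrapping either str or bytes."""

    def __init__(self, raw):
        self._raw = raw

    def __fspath__(self):
        return self._raw


# Coercion: whatever goes in, one type comes out.
print(fsdecode(DemoPath(b"spam/eggs.txt")))
print(fsencode(DemoPath("spam/eggs.txt")))
```

With the default output_types=(str,), the same sketch rejects a path
object that carries bytes, which is the bytes-rejecting behavior
mentioned above.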

[Side note: One may, for instance, ask why os.fsdecode passes str
objects through silently, even if they can't be decoded. Well, that's
the way it is, and I'm not expecting that to change. But maybe
fsdecode should have an additional keyword-only argument to tell it
that it should strictly return something it actually did decode. (And
similarly for os.fsencode.) But this has nothing to do with the path
protocol we are discussing.]

> (1) Bytes-lovers and str-addicts are both safe.

I don't think everyone is safe if you can't say "I don't want implicit
encoding/decoding".

> (2) They can omit fspath, too!

I think having *one* additional function for the
non-encoding/non-decoding cases is not too much, and as shown in [1],
one is enough.

> No, that doesn't work if the bytes objects aren't in the file system
> encoding, but these are *bytes*, mon ami: you have no way to find out
> what that encoding is, so you either know already and you substitute
> that + fspath for fsdecode, or you're hosed.  And in the only concrete
> use case so far, fsdecode Just Works.

Well, as you say yourself, fsdecode indeed works if your bytes are in
the default fs encoding, and when you know they are, go for it, use
fsdecode. But I, for instance, rarely have my paths as bytes.
Therefore, I would be happy to get an exception if I'm accidentally
passing bytes to some non-bytes-supporting function because I've
forgotten to decode some input that I got in an encoding other than
the file system encoding.
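
A small sketch of the failure mode I mean. To keep it deterministic, the
decode below spells out what fsdecode would do on a POSIX system whose
file system encoding is UTF-8 (that setup is an assumption), and
open_text_path is a hypothetical str-only function, not a real API.

```python
# A file name that arrived as Latin-1 bytes and was never decoded.
raw = "caf\u00e9".encode("latin-1")  # b'caf\xe9'

# Assumed setup: UTF-8 file system encoding with surrogateescape,
# written out explicitly instead of calling os.fsdecode.
guessed = raw.decode("utf-8", "surrogateescape")

# What the caller actually meant.
intended = raw.decode("latin-1")

# No exception was raised, but the result is not the intended name.
print(ascii(guessed), ascii(intended))


def open_text_path(path):
    """Hypothetical consumer that refuses implicit decoding."""
    if not isinstance(path, str):
        raise TypeError("expected str path, got %s" % type(path).__name__)
    return path


try:
    open_text_path(raw)
except TypeError as exc:
    # The forgotten decode is caught right away.
    print("rejected:", exc)
```

The implicit decode succeeds silently and hands back a name with a lone
surrogate in it, while the str-only function points at the mistake
immediately.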

> I suppose a similar argument holds for applications that want bytes
> and fsencode, but I leave that as an exercise for the reader.

A similar counterargument holds, too :).

Unrelated to this particular post, I believe these discussions are
almost done and I truly hope we at least won't have to keep addressing
the same questions that we have already gone through, unless there is
something new on the table.

I hope it takes a shorter time to read these emails than it takes to
write them :).

-Koos

[1] https://mail.python.org/pipermail/python-dev/2016-April/144239.html
[2] https://mail.python.org/pipermail/python-dev/2016-April/144290.html

And somewhat older ones:

[3] https://mail.python.org/pipermail/python-dev/2016-April/144101.html
[4] https://mail.python.org/pipermail/python-dev/2016-April/144107.html