[Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

Brett Cannon brett at python.org
Mon Apr 18 17:40:37 EDT 2016


On Mon, 18 Apr 2016 at 12:26 Stephen J. Turnbull <stephen at xemacs.org> wrote:

> Brett Cannon writes:
>
>  > If we continue with the "str is an encoding of file paths",
>
> It's not.  It's a representation, but not an encoding.  In Python 3,
> encoding means a representation of a character string using bytes.
> It's using "encoding" generically for "representation" that makes your
> head hurt.
>

Well, it makes *your* head hurt; for me it helped clarify some things. :)


>
>  > you can then build from "bytes is an encoding of str" to get a
>  > pyramid of file path encodings: Path -> str -> bytes. I don't think
>  > this is in any way a controversial view.
>
> Perhaps not.  But it's not particularly useful. ;-)  Here's the
> pyramid I think about:
>
>                                  Path
>                                 /    \
>                                /      \
>                               V        V
>                             str <-> bytes
>
> That is, str and bytes are interchangeable *without* any knowledge of
> paths, which are on a higher level of complexity and abstraction.
> Although in pathlib, there's an assumption that paths are serialized
> to str which is (implicitly) serialized to bytes when talking to the
> OS, this is not necessarily true for other structured path classes, in
> particular it is not true for DirEntry (which is an "enhanced
> degenerate" path containing only one path segment but also other
> useful information about the filesystem object addressed).
>
> I haven't looked at Antipathy, but I would guess from Ethan's
> promotion of bytes paths and concern with efficiency that "bytes
> antipaths" do *not* "go through" str to get to bytes, they already are
> bytes (in the sense of class inheritance).
>
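
To make the diagram concrete, here is a small sketch using only
os.scandir(), os.fsencode() and os.fsdecode() as they exist today. The
describe() helper and the file name are made up for illustration. It
shows a DirEntry handing back bytes directly when asked with bytes, and
a Path going through str on its way to bytes:

```python
import os
import pathlib
import tempfile


def describe(directory):
    """Return (str_names, bytes_names) as reported by os.scandir().

    Hypothetical helper: scans the same directory twice, once with a
    str argument and once with a bytes argument.
    """
    str_names = sorted(entry.name for entry in os.scandir(directory))
    bytes_names = sorted(
        entry.name for entry in os.scandir(os.fsencode(directory))
    )
    return str_names, bytes_names


with tempfile.TemporaryDirectory() as tmp:
    pathlib.Path(tmp, "example.txt").touch()

    # DirEntry mirrors the type it was asked with; no trip through str.
    str_names, bytes_names = describe(tmp)

    # pathlib serializes to str first; bytes need an explicit encode.
    as_str = str(pathlib.Path(tmp, "example.txt"))
    as_bytes = os.fsencode(as_str)

print(str_names, bytes_names)
print(type(as_str).__name__, type(as_bytes).__name__)
```
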
>  > But that's when I realized that adding __fspath__ support to
>  > os.fsdecode() and os.fsencode(), they become more coercion
>  > functions rather than encoding/decoding functions. It also means
>  > that os.fspath() has a place when you want to say "I only want to
>  > encode a file path to str" and avoid the decode bit that
>  > os.fsdecode() would do
>
> I don't understand what you're trying to say here.  fsdecode currently
> does not promise to decode anything, because it's polymorphic,
> accepting str and bytes.  fsdecode and fsencode already *are* coercion
> functions.
>

And they will continue to be coercion functions. My point is that
because they coerce, you cannot use them to say that you don't want any
str/bytes encoding or decoding to occur, short of checking the
arguments yourself before calling them (i.e. "no guessing about
encodings, please"). With os.fspath() I can say that I do not, under
any circumstances, want someone to guess at the encoding of some bytes
path in order to hand me a string; I want to start and end entirely in
a world of strings. IOW os.fspath() lets me work in such a way that the
instant bytes are introduced into my code for file paths, a TypeError
is triggered.
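
A minimal sketch of the distinction, assuming a str-only helper. The
names str_only_fspath() and Spam are made up for illustration, and this
is not an actual os.fspath() implementation; only os.fsdecode() is the
real function:

```python
import os


def str_only_fspath(path):
    """Hypothetical str-only coercion: return a str path or raise TypeError."""
    if isinstance(path, str):
        return path
    # Look the method up on the type, as is done for other protocols.
    fspath = getattr(type(path), "__fspath__", None)
    if fspath is not None:
        result = fspath(path)
        if isinstance(result, str):
            return result
    raise TypeError(
        "expected str or a path object yielding str, got "
        + type(path).__name__
    )


class Spam:
    """Hypothetical path-like object whose representation is a str."""

    def __fspath__(self):
        return "spam/eggs"


# os.fsdecode() coerces: bytes are silently decoded into a str.
print(os.fsdecode(b"spam"))

# The str-only helper passes str and str-yielding objects through ...
print(str_only_fspath("spam"), str_only_fspath(Spam()))

# ... and refuses bytes outright instead of guessing at an encoding.
try:
    str_only_fspath(b"spam")
except TypeError as exc:
    print("TypeError:", exc)
```
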


>
> It's this kind of semantic confusion and broken nomenclature that is
> *why* I dislike these polymorphic functions and objects so much.  It
> is impossible to reason correctly about them.  We're stuck with
> invoking "practicality" and muddling through.  And the names mislead
> even experienced Pythonistas.
>

Yep, we are stuck with the names unless you want to propose a new name and
deprecate the old one.