[Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

Stephen J. Turnbull stephen at xemacs.org
Thu Apr 14 10:52:29 EDT 2016


Nick Coghlan writes:

 > The use case for returning bytes from __fspath__ is DirEntry, so you
 > can write things like this in low level code:
 > 
 >     def myscandir(dirpath):
 >         for entry in os.scandir(dirpath):
 >             if entry.is_file():
 >                 with open(entry) as f:
 >                     # do something

Excuse me, but that is *not* a use case for returning bytes from
DirEntry.__fspath__.  open() is perfectly happy taking str (including
surrogate-encoded rawbytes).  If the trivial thing is for __fspath__
to return bytes, then implicitly applying os.fsencode to the value
being returned is almost as trivial, and just as safe.  A low price to
pay for ensuring that text applications don't crash just because a
bytes-oriented object decides to implement __fspath__.

If there's any cost to defining __fspath__ as str-only, it's some
other use case.  What consumer of __fspath__ that expects bytes but
not str do you envision?  Is it generalizable, so that applying
fsencode to the value of __fspath__ would lead to "unacceptably"
widespread sprinkling of fsencode all over bytes-oriented code?

The more I think about this, the more I like my proposal to junk
fspath, and have fsdecode and fsencode consume __fspath__.  That way
application code can request its native type.

 > By contrast, as soon as you type "import pathlib" at the top of your
 > file, you've stepped outside the world of potentially pure boundary
 > code,

"Potentially pure" is an odd term to apply to the boundary code IMO.
We are agreed that conceptually paths are text, for human consumption
(at least at last report we were).  Therefore, paths represented as
bytes are inherently an impure construct.  Viz, surrogateescape.

 > and are instead dealing with structured application level
 > objects (which means traversing the bytes->str boundary before the
 > str->Path one).

That assumes that pathlib.Path's str-only design is appropriate.  I'm
questioning that, primarily as a thought experiment.



More information about the Python-Dev mailing list