[Python-Dev] Pathlib enhancements - acceptable inputs and outputs for __fspath__ and os.fspath()

Nick Coghlan ncoghlan at gmail.com
Sun Apr 10 01:31:30 EDT 2016


On 10 April 2016 at 02:41, Ethan Furman <ethan at stoneleaf.us> wrote:
> If we add os.fspath(), but don't allow bytes to be returned from it, our
> above example looks more like:
>
>   if isinstance(a_path_thingy, bytes):
>       # because os can accept bytes
>       pass
>   else:
>       a_path_thingy = os.fspath(a_path_thingy)
>   # do something with the path
>
> Yes, it's better -- but it still requires a pre-check before calling
> os.fspath().
>
> It is my contention that this is better:
>
>   a_path_thingy = os.fspath(a_path_thingy)

That approach often doesn't work, though - by design, there are
situations where you can't transparently handle bytes and str with the
same code path in Python 3 the way you could in Python 2.

When somebody hands you bytes rather than text you need to worry about
the encoding, and you need to worry about returning bytes rather than
text yourself. https://hg.python.org/cpython/rev/e44410e5928e#l4.1
provides an illustration of how fiddly that can get, and that's in the
URL context - cross-platform filesystem path handling is worse, since
you need to worry about the significant differences between the way
Windows and *nix handle binary paths, and you can't use os.sep
directly any more (since that's always text).
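
As a rough sketch of the os.sep point (the helper name append_sep is made
up for illustration, it isn't an existing API), code that wants to handle
both types has to pick a separator that matches the type it was given:

```python
import os

def append_sep(path):
    """Return path with a trailing separator, keeping its type (str or bytes).

    Hypothetical helper, for illustration only.
    """
    if isinstance(path, bytes):
        # os.sep is always str, so it has to be encoded for bytes paths
        sep = os.fsencode(os.sep)
    else:
        sep = os.sep
    return path if path.endswith(sep) else path + sep

# Mixing the two types directly fails in Python 3
try:
    b"dir" + os.sep
except TypeError:
    mixing_failed = True
else:
    mixing_failed = False
```

Every operation that touches a separator or a literal needs the same kind
of branch, which is where the fiddliness comes from.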

> This raises two issues:
>
> 1) Part of the stdlib is the new scandir module, which can work
>    with, and return, both bytes and text -- if __fspath__ can only
>    hold text, DirEntry will not get the __fspath__ method added,
>    and the pre-check, boiler-plate code will flourish;

DirEntry can still get the method; it can just raise TypeError when it
represents a binary path (that's one of the advantages of using a
method-based protocol - exceptions on method calls are more acceptable
than exceptions on property access).
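
A minimal sketch of what that could look like, using a hypothetical Entry
class as a stand-in (this is not the real DirEntry implementation, and the
error message is invented):

```python
class Entry:
    """Hypothetical stand-in for a directory entry; not os.DirEntry."""

    def __init__(self, path):
        # path may be str or bytes, as with scandir results
        self.path = path

    def __fspath__(self):
        # Under a text-only protocol, a binary entry refuses the conversion
        if isinstance(self.path, bytes):
            raise TypeError("binary path cannot be returned from __fspath__")
        return self.path

text_entry = Entry("spam.txt")
binary_entry = Entry(b"spam.txt")
```

Calling text_entry.__fspath__() returns the str path, while
binary_entry.__fspath__() raises TypeError, so the caller finds out at the
point of conversion rather than from a failing attribute lookup.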

> 2) pathlib.Path accepts bytes -- so what happens when a byte-derived
>    Path is passed to os.fspath()?  Is a TypeError raised?  Do we
>    guess and auto-convert with fsdecode()?

pathlib is str-only (which makes sense, since it's a cross-platform
API and binary paths basically don't work on Windows):

>>> pathlib.Path(b".")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.4/pathlib.py", line 907, in __new__
    self = cls._from_parts(args, init=False)
  File "/usr/lib64/python3.4/pathlib.py", line 589, in _from_parts
    drv, root, parts = self._parse_args(args)
  File "/usr/lib64/python3.4/pathlib.py", line 581, in _parse_args
    % type(a))
TypeError: argument should be a path or str object, not <class 'bytes'>

The only specific mention of binary support in the pathlib docs is to
state that "bytes(p)" uses os.fsencode() to convert to the binary
representation.
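
For illustration, a short sketch of both behaviours using PurePosixPath so
it doesn't depend on the host platform (the variable names are arbitrary):

```python
import os
import pathlib

p = pathlib.PurePosixPath("some/dir")

# bytes(p) gives the os.fsencode()-encoded form of the text path
as_bytes = bytes(p)

# Constructing a path from bytes is rejected outright
try:
    pathlib.PurePosixPath(b"some/dir")
except TypeError:
    rejected = True
else:
    rejected = False
```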

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

