[Python-Dev] File system path PEP, 3rd draft

Guido van Rossum guido at python.org
Mon May 16 16:11:48 EDT 2016


Once you assign yourself a PEP number I'll do one more pass and then I
expect to accept it -- the draft looks good to me!

On Mon, May 16, 2016 at 1:00 PM, Brett Cannon <brett at python.org> wrote:

> Recent discussions have been about type hints which are orthogonal to the
> PEP, so things seem to have reached a steady state.
>
> Was there anything else that needed clarification, Guido, or are you ready
> to pronounce? Or did you want to wait until the language summit? Or did you
> want to assign a BDFL delegate?
>
>
> On Fri, 13 May 2016 at 11:37 Brett Cannon <brett at python.org> wrote:
>
>> Biggest changes since the second draft:
>>
>>    1. Resolve __fspath__() from the type, not the instance (for Guido)
>>    2. Updated the TypeError messages to say "os.PathLike object" instead
>>    of "path object" (implicitly for Steven)
>>    3. TODO item to define "path-like" in the glossary (for Steven)
>>    4. Various more things added to Rejected Ideas
>>    5. Added Koos as a co-author (for Koos :)
>>
>> ----------
>> PEP: NNN
>> Title: Adding a file system path protocol
>> Version: $Revision$
>> Last-Modified: $Date$
>> Author: Brett Cannon <brett at python.org>,
>>         Koos Zevenhoven <k7hoven at gmail.com>
>> Status: Draft
>> Type: Standards Track
>> Content-Type: text/x-rst
>> Created: 11-May-2016
>> Post-History: 11-May-2016,
>>               12-May-2016,
>>               13-May-2016
>>
>>
>> Abstract
>> ========
>>
>> This PEP proposes a protocol for classes which represent a file system
>> path to be able to provide a ``str`` or ``bytes`` representation.
>> Changes to Python's standard library are also proposed to utilize this
>> protocol where appropriate to facilitate the use of path objects where
>> historically only ``str`` and/or ``bytes`` file system paths are
>> accepted. The goal is to facilitate the migration of users towards
>> rich path objects while providing an easy way to work with code
>> expecting ``str`` or ``bytes``.
>>
>>
>> Rationale
>> =========
>>
>> Historically in Python, file system paths have been represented as
>> strings or bytes. This choice of representation has stemmed from C's
>> own decision to represent file system paths as
>> ``const char *`` [#libc-open]_. While that is a totally serviceable
>> format to use for file system paths, it's not necessarily optimal. At
>> issue is the fact that while all file system paths can be represented
>> as strings or bytes, not all strings or bytes represent a file system
>> path. This can lead to issues where, e.g., any string duck-types to a
>> file system path whether it actually represents a path or not.
>>
>> To help elevate the representation of file system paths from their
>> representation as strings and bytes to a richer object representation,
>> the pathlib module [#pathlib]_ was provisionally introduced in
>> Python 3.4 through PEP 428. While considered by some as an improvement
>> over strings and bytes for file system paths, it has suffered from a
>> lack of adoption. Typically the key issue listed for the low adoption
>> rate has been the lack of support in the standard library. This lack
>> of support required users of pathlib to manually convert path objects
>> to strings by calling ``str(path)`` which many found error-prone.
>>
>> One issue in converting path objects to strings comes from
>> the fact that the only generic way to get a string representation of
>> the path was to pass the object to ``str()``. This can pose a
>> problem when done blindly as nearly all Python objects have some
>> string representation whether they are a path or not, e.g.
>> ``str(None)`` will give a result that
>> ``builtins.open()`` [#builtins-open]_ will happily use to create a new
>> file.
>>
>> Exacerbating this whole situation is the
>> ``DirEntry`` object [#os-direntry]_. While path objects have a
>> representation that can be extracted using ``str()``, ``DirEntry``
>> objects expose a ``path`` attribute instead. Having no common
>> interface between path objects, ``DirEntry``, and any other
>> third-party path library has become an issue. A solution that allows
>> any path-representing object to declare that it is a path and a way
>> to extract a low-level representation that all path objects could
>> support is desired.
>>
>> This PEP then proposes to introduce a new protocol to be followed by
>> objects which represent file system paths. Providing a protocol allows
>> for explicit signaling of what objects represent file system paths as
>> well as a way to extract a lower-level representation that can be used
>> with older APIs which only support strings or bytes.
>>
>> Discussions regarding path objects that led to this PEP can be found
>> in multiple threads on the python-ideas mailing list archive
>> [#python-ideas-archive]_ for the months of March and April 2016 and on
>> the python-dev mailing list archives [#python-dev-archive]_ during
>> April 2016.
>>
>>
>> Proposal
>> ========
>>
>> This proposal is split into two parts. One part is the proposal of a
>> protocol for objects to declare and provide support for exposing a
>> file system path representation. The other part deals with changes to
>> Python's standard library to support the new protocol. These changes
>> will also lead to the pathlib module dropping its provisional status.
>>
>> Protocol
>> --------
>>
>> The following abstract base class defines the protocol for an object
>> to be considered a path object::
>>
>>     import abc
>>     import typing as t
>>
>>
>>     class PathLike(abc.ABC):
>>
>>         """Abstract base class for implementing the file system path
>>         protocol."""
>>
>>         @abc.abstractmethod
>>         def __fspath__(self) -> t.Union[str, bytes]:
>>             """Return the file system path representation of the
>>             object."""
>>             raise NotImplementedError
>>
>>
>> Objects representing file system paths will implement the
>> ``__fspath__()`` method which will return the ``str`` or ``bytes``
>> representation of the path. The ``str`` representation is the
>> preferred low-level path representation as it is human-readable and
>> what people historically represent paths as.
>>
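As a rough illustration of the protocol, here is a minimal sketch of a class opting in by subclassing the ABC above and implementing ``__fspath__()``. ``CachedPath`` and its attributes are hypothetical names used only for this example.

```python
import abc
import typing as t


class PathLike(abc.ABC):
    """Abstract base class for the file system path protocol (as drafted)."""

    @abc.abstractmethod
    def __fspath__(self) -> t.Union[str, bytes]:
        raise NotImplementedError


class CachedPath(PathLike):
    """Hypothetical path object pointing at a file inside a cache directory."""

    def __init__(self, cache_dir: str, name: str) -> None:
        self.cache_dir = cache_dir
        self.name = name

    def __fspath__(self) -> str:
        # Return str, the preferred low-level representation.
        return self.cache_dir + "/" + self.name


path = CachedPath("/tmp/cache", "data.txt")
```

Because ``__fspath__()`` is abstract, instantiating ``PathLike`` itself (or a subclass that forgets to define the method) raises ``TypeError``.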
>>
>> Standard library changes
>> ------------------------
>>
>> It is expected that most APIs in Python's standard library that
>> currently accept a file system path will be updated appropriately to
>> accept path objects (whether that requires code or simply an update
>> to documentation will vary). The modules mentioned below, though,
>> deserve specific details as they have either fundamental changes that
>> empower the ability to use path objects, or entail additions/removal
>> of APIs.
>>
>>
>> builtins
>> ''''''''
>>
>> ``open()`` [#builtins-open]_ will be updated to accept path objects as
>> well as continue to accept ``str`` and ``bytes``.
>>
>>
>> os
>> '''
>>
>> The ``fspath()`` function will be added with the following semantics::
>>
>>     import typing as t
>>
>>
>>     def fspath(path: t.Union[PathLike, str, bytes]) -> t.Union[str, bytes]:
>>         """Return the file system representation of the path.
>>
>>         If str or bytes is passed in, it is returned unchanged.
>>         """
>>         if isinstance(path, (str, bytes)):
>>             return path
>>
>>         # Work from the object's type to match method resolution of other
>>         # magic methods.
>>         path_type = type(path)
>>         try:
>>             return path_type.__fspath__(path)
>>         except AttributeError:
>>             # An AttributeError raised from inside __fspath__() propagates.
>>             if hasattr(path_type, '__fspath__'):
>>                 raise
>>
>>             raise TypeError("expected str, bytes or os.PathLike object, "
>>                             "not " + path_type.__name__)
>>
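For illustration, the behavior described above can be exercised with ``os.fspath()`` as it exists in Python 3.6 and later; this sketch assumes such a Python and uses ``pathlib.PurePosixPath`` so the result does not depend on the host platform.

```python
import os
import pathlib

# str and bytes are returned unchanged.
from_str = os.fspath("spam.txt")
from_bytes = os.fspath(b"spam.txt")

# A path object is demoted through its type's __fspath__().
from_path = os.fspath(pathlib.PurePosixPath("/tmp") / "spam.txt")

# Objects that are neither str, bytes, nor path-like are rejected.
try:
    os.fspath(None)
    rejected = False
except TypeError:
    rejected = True
```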
>> The ``os.fsencode()`` [#os-fsencode]_ and
>> ``os.fsdecode()`` [#os-fsdecode]_ functions will be updated to accept
>> path objects. As both functions coerce their arguments to
>> ``bytes`` and ``str``, respectively, they will be updated to call
>> ``__fspath__()`` if present to convert the path object to a ``str`` or
>> ``bytes`` representation, and then perform their appropriate
>> coercion operations as if the return value from ``__fspath__()`` had
>> been the original argument to the coercion function in question.
>>
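The coercion order can be sketched in a few lines. ``fsencode_sketch()`` is a hypothetical helper, not the standard library's code; it assumes Python 3.6+ for ``os.fspath()`` and ``sys.getfilesystemencodeerrors()``.

```python
import os
import sys


def fsencode_sketch(filename):
    """Hypothetical helper mirroring the proposed os.fsencode() semantics."""
    # Step 1: demote a path object to str or bytes via the protocol.
    filename = os.fspath(filename)
    # Step 2: coerce as if the __fspath__() result had been passed in directly.
    if isinstance(filename, str):
        return filename.encode(sys.getfilesystemencoding(),
                               sys.getfilesystemencodeerrors())
    return filename
```

An ``fsdecode`` counterpart would follow the same two steps, decoding ``bytes`` instead of encoding ``str``.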
>> The addition of ``os.fspath()``, the updates to
>> ``os.fsencode()``/``os.fsdecode()``, and the current semantics of
>> ``pathlib.PurePath`` provide the semantics necessary to
>> get the path representation one prefers. For a path object,
>> ``pathlib.PurePath``/``Path`` can be used. To obtain the ``str`` or
>> ``bytes`` representation without any coercion, ``os.fspath()``
>> can be used. If a ``str`` is desired and the encoding of ``bytes``
>> should be assumed to be the default file system encoding, then
>> ``os.fsdecode()`` should be used. If a ``bytes`` representation is
>> desired and any strings should be encoded using the default file
>> system encoding, then ``os.fsencode()`` should be used. This PEP recommends
>> using path objects when possible and falling back to string paths as
>> necessary and using ``bytes`` as a last resort.
>>
>> Another way to view this is as a hierarchy of file system path
>> representations (highest- to lowest-level): path → str → bytes. The
>> functions and classes under discussion can all accept objects on the
>> same level of the hierarchy, but they vary in whether they promote or
>> demote objects to another level. The ``pathlib.PurePath`` class can
>> promote a ``str`` to a path object. The ``os.fspath()`` function can
>> demote a path object to a ``str`` or ``bytes`` instance, depending
>> on what ``__fspath__()`` returns.
>> The ``os.fsdecode()`` function will demote a path object to
>> a string or promote a ``bytes`` object to a ``str``. The
>> ``os.fsencode()`` function will demote a path or string object to
>> ``bytes``. There is no function that provides a way to demote a path
>> object directly to ``bytes`` while bypassing string demotion.
>>
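A short sketch of moving between the levels of that hierarchy, assuming Python 3.6 or later, where ``os.fspath()``, ``os.fsencode()`` and ``os.fsdecode()`` accept path objects. ``PurePosixPath`` is used to keep the example platform-independent.

```python
import os
import pathlib

text = "/srv/data/report.txt"

path = pathlib.PurePosixPath(text)  # promote: str -> path
demoted = os.fspath(path)           # demote: path -> str (what __fspath__() returns)
decoded = os.fsdecode(path)         # demote: path -> str
encoded = os.fsencode(path)         # demote: path -> str -> bytes
restored = os.fsdecode(encoded)     # promote: bytes -> str
```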
>> The ``DirEntry`` object [#os-direntry]_ will gain an ``__fspath__()``
>> method. It will return the same value as currently found on the
>> ``path`` attribute of ``DirEntry`` instances.
>>
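A quick way to see this equivalence, assuming Python 3.6 or later where ``os.DirEntry`` implements ``__fspath__()``; the temporary directory and the file name are arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Create a single empty file so scandir() has an entry to report.
    with open(os.path.join(tmp, "example.txt"), "w"):
        pass

    with os.scandir(tmp) as entries:
        # For each entry, compare the protocol result with the path attribute.
        matches = [os.fspath(entry) == entry.path for entry in entries]
```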
>> The Protocol_ ABC will be added to the ``os`` module under the name
>> ``os.PathLike``.
>>
>>
>> os.path
>> '''''''
>>
>> The various path-manipulation functions of ``os.path`` [#os-path]_
>> will be updated to accept path objects. Polymorphic functions that
>> accept both bytes and strings will be updated to simply use
>> ``os.fspath()``.
>>
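As a sketch of what "simply use ``os.fspath()``" could look like, here is a hypothetical, POSIX-only reduction of ``os.path.basename()``: the path object is demoted first, and the rest of the function only has to handle ``str`` and ``bytes`` as before. It is not the standard library's implementation.

```python
import os


def basename_sketch(p):
    """Hypothetical POSIX-only basename() that accepts path objects."""
    # Demote path objects; str and bytes pass through unchanged.
    p = os.fspath(p)
    # Match the separator type to whatever representation we now hold.
    sep = b"/" if isinstance(p, bytes) else "/"
    return p.rpartition(sep)[2]
```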
>> During the discussions leading up to this PEP it was suggested that
>> ``os.path`` not be updated using an "explicit is better than implicit"
>> argument. The thinking was that since ``__fspath__()`` is polymorphic
>> itself it may be better to have code working with ``os.path`` extract
>> the path representation from path objects explicitly. There is also
>> the consideration that adding support this deep into the low-level OS
>> APIs will lead to code magically supporting path objects without
>> requiring any documentation updates, leading to potential complaints
>> when it doesn't work, unbeknownst to the project author.
>>
>> But it is the view of this PEP that "practicality beats purity" in
>> this instance. To help facilitate the transition to supporting path
>> objects, it is better to make the transition as easy as possible than
>> to worry about unexpected/undocumented duck typing support for
>> path objects by projects.
>>
>> There has also been the suggestion that ``os.path`` functions could be
>> used in a tight loop and the overhead of checking or calling
>> ``__fspath__()`` would be too costly. In this scenario only
>> path-consuming APIs would be directly updated and path-manipulating
>> APIs like the ones in ``os.path`` would go unmodified. This would
>> require library authors to update their code to support path objects
>> if they performed any path manipulations, but if the library code
>> passed the path straight through then the library wouldn't need to be
>> updated. It is the view of this PEP and Guido, though, that this is an
>> unnecessary worry and that performance will still be acceptable.
>>
>>
>> pathlib
>> '''''''
>>
>> The constructor for ``pathlib.PurePath`` and ``pathlib.Path`` will be
>> updated to accept ``PathLike`` objects. Both ``PurePath`` and ``Path``
>> will continue to reject ``bytes`` path representations, so if
>> ``__fspath__()`` returns ``bytes`` the constructor will raise an exception.
>>
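The constructor behavior described above can be checked on Python 3.6 or later, whose pathlib behaves this way; ``StrPath`` and ``BytesPath`` are hypothetical classes that implement only ``__fspath__()``.

```python
import pathlib


class StrPath:
    """Hypothetical path-like object whose __fspath__() returns str."""

    def __fspath__(self):
        return "/srv/data"


class BytesPath:
    """Hypothetical path-like object whose __fspath__() returns bytes."""

    def __fspath__(self):
        return b"/srv/data"


# A str-returning path-like object is accepted.
accepted = pathlib.PurePosixPath(StrPath())

# A bytes-returning one is refused.
try:
    pathlib.PurePosixPath(BytesPath())
    bytes_rejected = False
except TypeError:
    bytes_rejected = True
```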
>> The ``path`` attribute will be removed as this PEP makes it
>> redundant (it has not been included in any released version of Python
>> and so is not a backwards-compatibility concern).
>>
>>
>> C API
>> '''''
>>
>> The C API will gain an equivalent function to ``os.fspath()``::
>>
>>     /*
>>         Return the file system path of the object.
>>
>>         If the object is str or bytes, then allow it to pass through with
>>         an incremented refcount. If the object's type defines
>>         __fspath__(), then return the result of calling that method. All
>>         other types raise a TypeError.
>>     */
>>     PyObject *
>>     PyOS_FSPath(PyObject *path)
>>     {
>>         PyObject *func, *result;
>>
>>         if (PyUnicode_Check(path) || PyBytes_Check(path)) {
>>             Py_INCREF(path);
>>             return path;
>>         }
>>
>>         /* Look up __fspath__ on the type, as os.fspath() does. */
>>         func = PyObject_GetAttrString((PyObject *)Py_TYPE(path),
>>                                       "__fspath__");
>>         if (func == NULL) {
>>             if (!PyErr_ExceptionMatches(PyExc_AttributeError)) {
>>                 return NULL;
>>             }
>>             PyErr_Clear();
>>             return PyErr_Format(PyExc_TypeError,
>>                                 "expected str, bytes or os.PathLike "
>>                                 "object, not %.200s",
>>                                 Py_TYPE(path)->tp_name);
>>         }
>>
>>         result = PyObject_CallFunctionObjArgs(func, path, NULL);
>>         Py_DECREF(func);
>>         return result;
>>     }
>>
>>
>>
>> Backwards compatibility
>> =======================
>>
>> There are no explicit backwards-compatibility concerns. Unless an
>> object incidentally already defines a ``__fspath__()`` method there is
>> no reason to expect the pre-existing code to break or expect to have
>> its semantics implicitly changed.
>>
>> Libraries wishing to support path objects on versions of Python
>> prior to 3.6, which lack ``os.fspath()``, can use the idiom
>> ``path.__fspath__() if hasattr(path, "__fspath__") else path``.
>>
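Wrapped in a helper, the idiom might look like the sketch below. ``fspath_compat()`` and ``LegacyPath`` are hypothetical names; note that, unlike ``os.fspath()``, this helper passes any non-path object through instead of raising ``TypeError``, and it looks ``__fspath__`` up on the instance rather than on the type.

```python
def fspath_compat(path):
    """Hypothetical helper for libraries that also support Python < 3.6."""
    # Path objects provide __fspath__(); everything else is returned as-is.
    return path.__fspath__() if hasattr(path, "__fspath__") else path


class LegacyPath:
    """Hypothetical third-party path object implementing the protocol."""

    def __fspath__(self):
        return "/var/log/app.log"
```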
>>
>> Implementation
>> ==============
>>
>> This is the task list for what this PEP proposes:
>>
>> #. Remove the ``path`` attribute from pathlib
>> #. Remove the provisional status of pathlib
>> #. Add ``os.PathLike``
>> #. Add ``os.fspath()``
>> #. Add ``PyOS_FSPath()``
>> #. Update ``os.fsencode()``
>> #. Update ``os.fsdecode()``
>> #. Update ``pathlib.PurePath`` and ``pathlib.Path``
>> #. Update ``builtins.open()``
>> #. Update ``os.DirEntry``
>> #. Update ``os.path``
>> #. Add a glossary entry for "path-like"
>>
>>
>> Rejected Ideas
>> ==============
>>
>> Other names for the protocol's method
>> -------------------------------------
>>
>> Various names were proposed during discussions leading to this PEP,
>> including ``__path__``, ``__pathname__``, and ``__fspathname__``. In
>> the end people seemed to gravitate towards ``__fspath__`` for being
>> unambiguous without being unnecessarily long.
>>
>>
>> Separate str/bytes methods
>> --------------------------
>>
>> At one point it was suggested that ``__fspath__()`` only return
>> strings and another method named ``__fspathb__()`` be introduced to
>> return bytes. The thinking was that by making ``__fspath__()`` not be
>> polymorphic it could make dealing with the potential string or bytes
>> representations easier. But the general consensus was that returning
>> bytes will more than likely be rare and that the various functions in
>> the os module are the better abstraction to promote over direct
>> calls to ``__fspath__()``.
>>
>>
>> Providing a ``path`` attribute
>> ------------------------------
>>
>> To help deal with the issue of ``pathlib.PurePath`` not inheriting
>> from ``str``, originally it was proposed to introduce a ``path``
>> attribute to mirror what ``os.DirEntry`` provides. In the end,
>> though, it was determined that a protocol would provide the same
>> result while not directly exposing an API that most people will never
>> need to interact with directly.
>>
>>
>> Have ``__fspath__()`` only return strings
>> ------------------------------------------
>>
>> Much of the discussion that led to this PEP revolved around whether
>> ``__fspath__()`` should be polymorphic and return ``bytes`` as well as
>> ``str`` or only return ``str``. The general sentiment for this view
>> was that ``bytes`` are difficult to work with due to their
>> inherent lack of information about their encoding and PEP 383 makes
>> it possible to represent all file system paths using ``str`` with the
>> ``surrogateescape`` handler. Thus, it would be better to forcibly
>> promote the use of ``str`` as the low-level path representation for
>> high-level path objects.
>>
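The PEP 383 point can be seen in a couple of lines. This sketch hard-codes UTF-8 for illustration; on POSIX systems ``os.fsdecode()``/``os.fsencode()`` use the actual file system encoding together with this same ``surrogateescape`` handler.

```python
# A file name whose bytes are not valid UTF-8 (0xff never occurs in UTF-8).
raw = b"report-\xff.txt"

# PEP 383: each undecodable byte becomes a lone surrogate (0xff -> U+DCFF).
as_str = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_trip = as_str.encode("utf-8", "surrogateescape")
```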
>> In the end, it was decided that using ``bytes`` to represent paths is
>> simply not going to go away and thus they should be supported to some
>> degree. The hope is that people will gravitate towards path objects
>> like pathlib and that will move people away from operating directly
>> with ``bytes``.
>>
>>
>> A generic string encoding mechanism
>> -----------------------------------
>>
>> At one point there was a discussion of developing a generic mechanism
>> to extract a string representation of an object that had semantic
>> meaning (``__str__()`` does not necessarily return anything of
>> semantic significance beyond what may be helpful for debugging). In
>> the end, it was deemed to lack a motivating need beyond the one this
>> PEP is trying to solve in a specific fashion.
>>
>>
>> Have __fspath__ be an attribute
>> -------------------------------
>>
>> It was briefly considered to have ``__fspath__`` be an attribute
>> instead of a method. This was rejected for two reasons. One,
>> historically protocols have been implemented as "magic methods" and
>> not "magic methods and attributes". Two, there is no guarantee that
>> the lower-level representation of a path object will be pre-computed,
>> potentially misleading users into thinking there was no expensive
>> computation behind the scenes if the attribute was implemented as a property.
>>
>> This also indirectly ties into the idea of introducing a ``path``
>> attribute to accomplish the same thing. This idea has an added issue,
>> though, of accidentally having any object with a ``path`` attribute
>> meet the protocol's duck typing. Introducing a new magic method for
>> the protocol helpfully avoids any accidental opting into the protocol.
>>
>>
>> Provide specific type hinting support
>> -------------------------------------
>>
>> There was some consideration of providing a generic ``typing.PathLike``
>> class which would allow for e.g. ``typing.PathLike[str]`` to specify
>> a type hint for a path object which returned a string representation.
>> While potentially beneficial, the usefulness was deemed too small to
>> bother adding the type hint class.
>>
>> This also removed any desire to have a class in the ``typing`` module
>> which represented the union of all acceptable path-representing types
>> as that can be represented with
>> ``typing.Union[str, bytes, os.PathLike]`` easily enough and the hope
>> is users will slowly gravitate to path objects only.
>>
>>
>> Provide ``os.fspathb()``
>> ------------------------
>>
>> It was suggested that, to mirror the structure of e.g.
>> ``os.getcwd()``/``os.getcwdb()``, ``os.fspath()`` only return
>> ``str`` and that another function named ``os.fspathb()`` be
>> introduced that only returned ``bytes``. This was rejected as the
>> purposes of the ``*b()`` functions are tied to querying the file
>> system where there is a need to get the raw bytes back. As this PEP
>> does not work directly with data on a file system (but which *may*
>> be), the view was taken that this distinction is unnecessary. It's also
>> believed that the need for only bytes will not be common enough to
>> need to support in such a specific manner as ``os.fsencode()`` will
>> provide similar functionality.
>>
>>
>> Call ``__fspath__()`` off of the instance
>> -----------------------------------------
>>
>> An earlier draft of this PEP had ``os.fspath()`` calling
>> ``path.__fspath__()`` instead of ``type(path).__fspath__(path)``. This
>> was changed to be consistent with how other magic methods in Python are
>> resolved.
>>
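A small sketch of the difference between the two lookups. ``Example`` is a hypothetical class, and ``os.fspath()`` as released in Python 3.6+ is assumed to follow the type-based lookup, like other magic methods.

```python
import os


class Example:
    def __fspath__(self):
        return "from-the-type"


obj = Example()
# Shadow the method with an attribute on this one instance only.
obj.__fspath__ = lambda: "from-the-instance"

instance_lookup = obj.__fspath__()          # earlier draft's behavior
type_lookup = type(obj).__fspath__(obj)     # current draft's behavior
via_os = os.fspath(obj)                     # type-based, ignores the instance attribute
```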
>>
>> Acknowledgements
>> ================
>>
>> Thanks to everyone who participated in the various discussions related
>> to this PEP that spanned both python-ideas and python-dev. Special
>> thanks to Stephen Turnbull for direct feedback on early drafts of this
>> PEP. More special thanks to Koos Zevenhoven and Ethan Furman for not
>> only providing feedback on early drafts of this PEP but also helping to drive
>> the overall discussion on this topic across the two mailing lists.
>>
>>
>> References
>> ==========
>>
>> .. [#python-ideas-archive] The python-ideas mailing list archive
>>    (https://mail.python.org/pipermail/python-ideas/)
>>
>> .. [#python-dev-archive] The python-dev mailing list archive
>>    (https://mail.python.org/pipermail/python-dev/)
>>
>> .. [#libc-open] ``open()`` documentation for the C standard library
>>    (http://www.gnu.org/software/libc/manual/html_node/Opening-and-Closing-Files.html)
>>
>> .. [#pathlib] The ``pathlib`` module
>>    (https://docs.python.org/3/library/pathlib.html#module-pathlib)
>>
>> .. [#builtins-open] The ``builtins.open()`` function
>>    (https://docs.python.org/3/library/functions.html#open)
>>
>> .. [#os-fsencode] The ``os.fsencode()`` function
>>    (https://docs.python.org/3/library/os.html#os.fsencode)
>>
>> .. [#os-fsdecode] The ``os.fsdecode()`` function
>>    (https://docs.python.org/3/library/os.html#os.fsdecode)
>>
>> .. [#os-direntry] The ``os.DirEntry`` class
>>    (https://docs.python.org/3/library/os.html#os.DirEntry)
>>
>> .. [#os-path] The ``os.path`` module
>>    (https://docs.python.org/3/library/os.path.html#module-os.path)
>>
>>
>> Copyright
>> =========
>>
>> This document has been placed in the public domain.
>>
>>
>> ..
>>    Local Variables:
>>    mode: indented-text
>>    indent-tabs-mode: nil
>>    sentence-end-double-space: t
>>    fill-column: 70
>>    coding: utf-8
>>    End:
>>
>>


-- 
--Guido van Rossum (python.org/~guido)