[Python-ideas] PEP 428 - object-oriented filesystem paths
solipsis at pitrou.net
Fri Oct 5 20:25:34 CEST 2012
This PEP is a resurrection of the idea of having object-oriented
filesystem paths in the stdlib. It comes with a general API proposal
as well as a specific implementation (*). The implementation is young
and discussion is quite open.
PS: You can all admire my ASCII-art skills.
Title: The pathlib module -- object-oriented filesystem paths
Author: Antoine Pitrou <solipsis at pitrou.net>
Type: Standards Track
This PEP proposes the inclusion of a third-party module, `pathlib`_, in
the standard library. The inclusion is proposed under the provisional
label, as described in :pep:`411`. Therefore, API changes can be done,
either as part of the PEP process, or after acceptance in the standard
library (and until the provisional label is removed).
The aim of this library is to provide a simple hierarchy of classes to
handle filesystem paths and the common operations users do over them.
.. _`pathlib`: http://pypi.python.org/pypi/pathlib/
An object-oriented API for filesystem paths has already been proposed
and rejected in :pep:`355`. Several third-party implementations of the
idea of object-oriented filesystem paths exist in the wild:
* The historical `path.py module`_ by Jason Orendorff, Jason R. Coombs
and others, which provides a ``str``-subclassing ``Path`` class;
* Twisted's slightly specialized `FilePath class`_;
* An `AlternativePathClass proposal`_, subclassing ``tuple`` rather than ``str``;
* `Unipath`_, a variation on the str-subclassing approach with two public
classes, an ``AbstractPath`` class for operations which don't do I/O and a
``Path`` class for all common operations.
This proposal attempts to learn from these previous attempts and the
rejection of :pep:`355`.
.. _`path.py module`: https://github.com/jaraco/path.py
.. _`FilePath class`: http://twistedmatrix.com/documents/current/api/twisted.python.filepath.FilePath.html
.. _`AlternativePathClass proposal`: http://wiki.python.org/moin/AlternativePathClass
.. _`Unipath`: https://bitbucket.org/sluggo/unipath/overview
Why an object-oriented API
The rationale to represent filesystem paths using dedicated classes is the
same as for other kinds of stateless objects, such as dates, times or IP
addresses. Python has been slowly moving away from strictly replicating
the C language's APIs to providing better, more helpful abstractions around
all kinds of common functionality. Even if this PEP isn't accepted, it is
likely that another form of filesystem handling abstraction will be adopted
one day into the standard library.
Indeed, many people will prefer handling dates and times using the high-level
objects provided by the ``datetime`` module, rather than using numeric
timestamps and the ``time`` module API. Moreover, using a dedicated class
makes it possible to enable desirable behaviours by default, for example the case
insensitivity of Windows paths.
The `pathlib`_ module implements a simple hierarchy of classes::

                       +----------+
                       |          |
              ---------| PurePath |--------
              |        |          |        |
              |        +----------+        |
              |             |              |
              |             |              |
              v             |              v
       +---------------+    |        +------------+
       |               |    |        |            |
       | PurePosixPath |    |        | PureNTPath |
       |               |    |        |            |
       +---------------+    |        +------------+
              |             v              |
              |          +------+          |
              |          |      |          |
              |   -------| Path |------    |
              |   |      |      |     |    |
              |   |      +------+     |    |
              |   |                   |    |
              |   |                   |    |
              v   v                   v    v
         +-----------+             +--------+
         |           |             |        |
         | PosixPath |             | NTPath |
         |           |             |        |
         +-----------+             +--------+

This hierarchy divides path classes along two dimensions:
* a path class can be either pure or concrete: pure classes support only
operations that don't need to do any actual I/O, which are most path
manipulation operations; concrete classes support all the operations
of pure classes, plus operations that do I/O.
* a path class is of a given flavour according to the kind of operating
system paths it represents. `pathlib`_ implements two flavours: NT paths
for the filesystem semantics embodied in Windows systems, POSIX paths for
other systems (``os.name``'s terminology is re-used here).
Any pure class can be instantiated on any system: for example, you can
manipulate ``PurePosixPath`` objects under Windows, ``PureNTPath`` objects
under Unix, and so on. However, concrete classes can only be instantiated
on a matching system: indeed, it would be error-prone to start doing I/O
with ``NTPath`` objects under Unix, or vice-versa.
Furthermore, there are two base classes which also act as system-dependent
factories: ``PurePath`` will instantiate either a ``PurePosixPath`` or a
``PureNTPath`` depending on the operating system. Similarly, ``Path``
will instantiate either a ``PosixPath`` or a ``NTPath``.
It is expected that, in most uses, using the ``Path`` class is adequate,
which is why it has the shortest name of all.
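The factory behaviour can be tried with the ``pathlib`` module as it later shipped in the standard library. This is only a sketch under that assumption: the shipped module spells the NT flavour ``PureWindowsPath`` and ``WindowsPath`` rather than ``PureNTPath`` and ``NTPath``.

```python
import os
from pathlib import (
    Path, PosixPath, PurePath, PurePosixPath, PureWindowsPath, WindowsPath,
)

# Pure classes of either flavour can be instantiated on any system.
posix_pure = PurePosixPath("/usr/lib")
windows_pure = PureWindowsPath("c:/windows")  # PureNTPath in this draft

# The two base classes pick the flavour of the running system.
expected_pure = PureWindowsPath if os.name == "nt" else PurePosixPath
expected_concrete = WindowsPath if os.name == "nt" else PosixPath
factory_pure = PurePath("setup.py")
factory_concrete = Path("setup.py")

# A concrete class of the foreign flavour refuses to be instantiated.
foreign = PosixPath if os.name == "nt" else WindowsPath
try:
    foreign("setup.py")
    foreign_ok = True
except NotImplementedError:
    foreign_ok = False
```

Concrete classes inherit from the pure class of the same flavour, so every pure operation remains available on a ``Path``.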
No confusion with builtins
In this proposal, the path classes do not derive from a builtin type. This
contrasts with some other Path class proposals which were derived from
``str``. They also do not pretend to implement the sequence protocol:
if you want a path to act as a sequence, you have to look up a dedicated
attribute (the ``parts`` attribute).
By not passing as builtin types, the path classes minimize the potential
for confusion if they are combined by accident with genuine builtin types.
Path objects are immutable, which makes them hashable and also prevents a
class of programming errors.
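A short sketch of these two properties, written against the ``pathlib`` module that later shipped in the standard library (an assumption; the draft implementation may differ in details):

```python
from pathlib import PurePosixPath

p = PurePosixPath("/home/antoine")

# Not a str and not a sequence, so no accidental mixing with builtins.
is_str = isinstance(p, str)
equals_str = (p == "/home/antoine")
try:
    len(p)
    has_len = True
except TypeError:
    has_len = False

# Hashable: equal paths collapse to a single set element.
unique = {PurePosixPath("docs/Makefile"), PurePosixPath("docs//Makefile")}
```

Comparing a path with a plain string is simply ``False`` rather than an error, which keeps mixed collections safe to search.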
Little of the functionality from os.path is reused. Many os.path functions
are tied by backwards compatibility to confusing or plain wrong behaviour
(for example, the fact that ``os.path.abspath()`` simplifies ".." path
components without resolving symlinks first).
Also, using classes instead of plain strings helps make system-dependent
behaviours natural. For example, comparing and ordering Windows path
objects is case-insensitive, and path separators are automatically converted
to the platform default.
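As an illustration, assuming the standard library's later ``pathlib`` module (where the NT flavour is named ``PureWindowsPath``), comparison and ordering behave as follows:

```python
from pathlib import PurePosixPath, PureWindowsPath

# Windows paths compare without regard to case; POSIX paths do not.
same_on_windows = PureWindowsPath("C:/Program Files") == PureWindowsPath("c:/program files")
same_on_posix = PurePosixPath("/Etc") == PurePosixPath("/etc")

# Ordering is case-insensitive as well.
ordered = [str(p) for p in sorted(
    [PureWindowsPath("b"), PureWindowsPath("A"), PureWindowsPath("c")]
)]

# Forward slashes in the input are converted to the flavour's separator.
as_text = str(PureWindowsPath("c:/windows/system32"))
```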
The API tries to provide useful notations all the while avoiding magic.
>>> p = Path('/home/antoine/pathlib/setup.py')
>>> p.parts
<PosixPath.parts: ['/', 'home', 'antoine', 'pathlib', 'setup.py']>
>>> list(p.parents())
[PosixPath('/home/antoine/pathlib'), PosixPath('/home/antoine'), PosixPath('/home'), PosixPath('/')]
Pure paths API
The philosophy of the ``PurePath`` API is to provide a consistent array of
useful path manipulation operations, without exposing a hodge-podge of
functions like ``os.path`` does.
First a couple of conventions:
* All paths can have a drive and a root. For POSIX paths, the drive is
always empty.
* A relative path has neither drive nor root.
* A POSIX path is absolute if it has a root. A Windows path is absolute if
it has both a drive *and* a root. A Windows UNC path (e.g.
``\\some\\share\\myfile.txt``) always has a drive and a root
(here, ``\\some\\share`` and ``\\``, respectively).
* A path which has either a drive *or* a root is said to be anchored.
Its anchor is the concatenation of the drive and root. Under POSIX,
"anchored" is the same as "absolute".
Construction and joining
We will present construction and joining together since they expose
similar semantics.
The simplest way to construct a path is to pass it its string representation::
Extraneous path separators and ``"."`` components are eliminated::
If you pass several arguments, they will be automatically joined::
>>> PurePath('docs', 'Makefile')
Joining semantics are similar to os.path.join, in that anchored paths ignore
the information from the previously joined components::
>>> PurePath('/etc', '/usr', 'bin')
However, with Windows paths, the drive is retained as necessary::
>>> PureNTPath('c:/foo', '/Windows')
>>> PureNTPath('c:/foo', 'd:')
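The construction rules above can be checked with the standard library's later ``pathlib`` module. This sketch assumes that ``PureWindowsPath`` there plays the role of ``PureNTPath``:

```python
from pathlib import PurePosixPath, PureWindowsPath

from_string = PurePosixPath("docs/Makefile")
cleaned = PurePosixPath("docs//./Makefile")     # extra "/" and "." dropped
joined = PurePosixPath("docs", "Makefile")      # several arguments are joined

# An anchored argument discards what came before it ...
anchored_wins = PurePosixPath("/etc", "/usr", "bin")

# ... but a Windows root alone keeps the earlier drive,
drive_kept = PureWindowsPath("c:/foo", "/Windows")
# while a new drive replaces everything.
drive_replaced = PureWindowsPath("c:/foo", "d:")
```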
Calling the constructor without any argument creates a path object pointing
to the logical "current directory"::
A path can be joined with another using the ``__getitem__`` operator::
>>> p = PurePosixPath('foo')
As with constructing, multiple path components can be specified at once::
A join() method is also provided, with the same behaviour. It can serve
as a factory function::
>>> path_factory = p.join
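The indexing notation can be emulated on top of the standard library's later ``pathlib`` module, which spells joining as ``joinpath()`` (and the ``/`` operator) instead. ``DraftStylePath`` below is a hypothetical helper written for illustration only; it is not part of any library:

```python
from pathlib import PurePosixPath


class DraftStylePath(PurePosixPath):
    """Hypothetical subclass adding the draft's p['child'] notation."""

    def __getitem__(self, key):
        # p['a', 'b'] arrives as a tuple, p['a'] as a single string.
        parts = key if isinstance(key, tuple) else (key,)
        return self.joinpath(*parts)


p = DraftStylePath("foo")
one = p["bar"]
many = p["bar", "baz"]

# The bound method doubles as a factory for paths located under p.
path_factory = p.joinpath  # spelled p.join in this draft
made = path_factory("bar", "baz")
```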
To represent a path (e.g. to pass it to third-party libraries), just call
``str()`` on it::
>>> p = PurePath('/home/antoine/pathlib/setup.py')
>>> p = PureNTPath('c:/windows')
To force the string representation with forward slashes, use the ``as_posix()``
method::
To get the bytes representation (which might be useful under Unix systems),
call ``bytes()`` on it, or use the ``as_bytes()`` method::
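A sketch of the three representations, assuming the standard library's later ``pathlib`` module. ``str()``, ``as_posix()`` and ``bytes()`` exist there; an ``as_bytes()`` method is not assumed:

```python
from pathlib import PurePosixPath, PureWindowsPath

posix_text = str(PurePosixPath("/home/antoine/pathlib/setup.py"))
windows_text = str(PureWindowsPath("c:/windows"))        # native separators
forward_slashes = PureWindowsPath("c:/windows").as_posix()
raw = bytes(PurePosixPath("/etc/passwd"))                # filesystem encoding
```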
Five simple properties are provided on every path (each can be empty)::
>>> p = PureNTPath('c:/pathlib/setup.py')
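For illustration, the same properties can be read with the standard library's later ``pathlib`` module. Two assumptions apply: the NT flavour is named ``PureWindowsPath`` there, and the extension property is called ``suffix``. ``describe`` is a hypothetical helper.

```python
from pathlib import PureWindowsPath


def describe(path):
    """Return (drive, root, anchor, name, suffix) for a pure path."""
    return (path.drive, path.root, path.anchor, path.name, path.suffix)


full = describe(PureWindowsPath("c:/pathlib/setup.py"))
bare = describe(PureWindowsPath("Makefile"))            # no drive, root or suffix
unc = describe(PureWindowsPath("//some/share/myfile.txt"))
```

The UNC case matches the convention stated earlier: the drive is ``\\some\share`` and the root is a single backslash.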
The ``parts`` property provides read-only sequence access to a path object::
>>> p = PurePosixPath('/etc/init.d')
>>> p.parts
<PurePosixPath.parts: ['/', 'etc', 'init.d']>
Simple indexing returns the individual path component as a string, while
slicing returns a new path object constructed from the selected components::
Windows paths handle the drive and the root as a single path component::
>>> p = PureNTPath('c:/setup.py')
>>> p.parts
<PureNTPath.parts: ['c:\\', 'setup.py']>
(separating them would be wrong, since ``C:`` is not the parent of ``C:\\``).
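A sketch using the standard library's later ``pathlib`` module, where ``parts`` is a plain tuple. Slicing therefore yields a tuple rather than a path, and rebuilding a path from the slice (as done below) is an assumption about how to emulate the behaviour described here:

```python
from pathlib import PurePosixPath, PureWindowsPath

p = PurePosixPath("/etc/init.d")
parts = p.parts                        # read-only sequence of components
second = p.parts[1]                    # indexing gives a string
head = PurePosixPath(*p.parts[:2])     # rebuild a path from a slice

# Drive and root form a single component on Windows.
win_parts = PureWindowsPath("c:/setup.py").parts
```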
The ``parent()`` method returns an ancestor of the path::
The ``parents()`` method automates repeated invocations of ``parent()``, until
the anchor is reached::
>>> p = PureNTPath('c:/python33/bin/python.exe')
>>> for parent in p.parents(): parent
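With the standard library's later ``pathlib`` module, the same walk up to the anchor can be sketched as below. One assumption to note: ``parent`` and ``parents`` are properties there rather than methods.

```python
from pathlib import PureWindowsPath

p = PureWindowsPath("c:/python33/bin/python.exe")

parent = p.parent
# Every ancestor, nearest first, ending at the anchor.
ancestors = [str(a) for a in p.parents]
```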
``is_relative()`` returns True if the path is relative (see definition
above), False otherwise.
``is_reserved()`` returns True if a Windows path is a reserved path such
as ``CON`` or ``NUL``. It always returns False for POSIX paths.
``match()`` matches the path against a glob pattern::
``relative()`` returns a new relative path by stripping the drive and root::
``relative_to()`` computes the relative difference of a path to another::
``normcase()`` returns a case-folded version of the path for NT paths::
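Several of these queries can be sketched with the standard library's later ``pathlib`` module. The names differ slightly and are assumptions relative to this draft: ``is_absolute()`` is the inverse of ``is_relative()``, and stripping the anchor is written as ``relative_to(p.anchor)``.

```python
from pathlib import PurePosixPath, PureWindowsPath

p = PurePosixPath("/usr/lib/python3/os.py")

is_relative = not p.is_absolute()
matches_suffix = p.match("*.py")
matches_dir = p.match("python3/*.py")    # relative patterns match from the right
no_match = p.match("*.txt")

without_anchor = p.relative_to(p.anchor)
difference = p.relative_to("/usr/lib")

# A path that is not located under the other one has no relative form.
try:
    p.relative_to("/etc")
    unrelated_ok = True
except ValueError:
    unrelated_ok = False

# A Windows path needs both a drive and a root to be absolute.
win_flags = [PureWindowsPath(s).is_absolute() for s in ("c:/x", "c:x", "/x")]
```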
Concrete paths API
In addition to the operations of the pure API, concrete paths provide
additional methods which actually access the filesystem to query or mutate
information.
The classmethod ``cwd()`` creates a path object pointing to the current
working directory in absolute form::
The ``stat()`` method caches and returns the file's stat() result;
``restat()`` forces refreshing of the cache. ``lstat()`` is also provided,
but doesn't have any caching behaviour::
>>> p.stat()
posix.stat_result(st_mode=33277, st_ino=7483155, st_dev=2053, st_nlink=1, st_uid=500, st_gid=500, st_size=928, st_atime=1343597970, st_mtime=1328287308, st_ctime=1343597964)
For ease of use, direct attribute access to the fields of the stat structure
is provided over the path object itself::
Higher-level methods help examine the kind of the file::
The file owner and group names (rather than numeric ids) are queried
through matching properties::
>>> p = Path('/etc/shadow')
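The query methods can be tried on a scratch file with the standard library's later ``pathlib`` module. This sketch assumes that module, which does not cache ``stat()`` results or expose stat fields as attributes of the path; owner and group lookups are left out because they depend on the platform.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / "setup.py"
    p.write_bytes(b"x" * 928)

    size = p.stat().st_size
    # What kind of file is it?
    kinds = (p.exists(), p.is_file(), p.is_dir(), p.is_symlink())
    parent_is_dir = p.parent.is_dir()
    # lstat() agrees with stat() for anything that is not a symlink.
    same_as_lstat = p.lstat().st_size == size
```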
The ``resolve()`` method makes a path absolute, resolving any symlink on
the way. It is the only operation which will remove "``..``" path components.
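A minimal sketch, assuming the standard library's later ``pathlib`` module, showing that ``..`` survives path construction and only disappears once ``resolve()`` has consulted the filesystem:

```python
import tempfile
from pathlib import Path

cwd_is_absolute = Path.cwd().is_absolute()

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp).resolve()          # normalise the scratch directory first
    (base / "docs").mkdir()

    indirect = base / "docs" / ".." / "docs"
    kept_dotdot = ".." in indirect.parts    # joining does not simplify ".."

    resolved = indirect.resolve()
    dotdot_gone = ".." not in resolved.parts
    same = resolved == base / "docs"
```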
Simple (non-recursive) directory access is done by iteration::
>>> p = Path('docs')
>>> for child in p: child
This allows simple filtering through list comprehensions::
>>> p = Path('.')
>>> [child for child in p if child.is_dir()]
[PosixPath('.hg'), PosixPath('docs'), PosixPath('dist'), PosixPath('__pycache__'), PosixPath('build')]
Simple and recursive globbing is also provided::
>>> for child in p.glob('**/*.py'): child
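Directory listing and globbing can be sketched with the standard library's later ``pathlib`` module. One assumption to note: that module exposes directory iteration as ``iterdir()`` instead of iterating over the path object itself. The file names are made up for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "docs").mkdir()
    (root / "docs" / "conf.py").touch()
    (root / "setup.py").touch()
    (root / "README").touch()

    # Non-recursive listing, then filtering with a comprehension.
    children = sorted(child.name for child in root.iterdir())
    subdirs = [child.name for child in root.iterdir() if child.is_dir()]

    # Simple and recursive globbing.
    top_level = sorted(p.name for p in root.glob("*.py"))
    recursive = sorted(
        p.relative_to(root).as_posix() for p in root.glob("**/*.py")
    )
```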
The ``open()`` method provides a file opening API similar to the builtin
``open()`` function::
>>> p = Path('setup.py')
>>> with p.open() as f: f.readline()
The ``raw_open()`` method, on the other hand, is similar to ``os.open``::
>>> fd = p.raw_open(os.O_RDONLY)
>>> os.read(fd, 15)
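Both styles of opening can be sketched with today's standard library. The sketch assumes the later ``pathlib`` module, which has no ``raw_open()`` method, so the low-level variant passes the path object to ``os.open()`` directly:

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / "setup.py"
    p.write_text("#!/usr/bin/env python3\nprint('hi')\n")

    # High-level: a file object, as with the builtin open().
    with p.open() as f:
        first_line = f.readline()

    # Low-level: a raw file descriptor, as with os.open().
    fd = os.open(p, os.O_RDONLY)
    try:
        first_bytes = os.read(fd, 15)
    finally:
        os.close(fd)
```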
Several common filesystem operations are provided as methods: ``touch()``,
``mkdir()``, ``rename()``, ``replace()``, ``unlink()``, ``rmdir()``,
``chmod()``, ``lchmod()``, ``symlink_to()``. More operations could be
provided, for example some of the functionality of the shutil module.
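A few of these operations, sketched against the standard library's later ``pathlib`` module inside a scratch directory (the file names are made up for the example):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    build = Path(tmp) / "build"
    build.mkdir()

    draft = build / "draft.txt"
    final = build / "final.txt"
    draft.touch()
    draft.rename(final)
    after_rename = (draft.exists(), final.exists())

    other = build / "other.txt"
    other.touch()
    other.replace(final)                 # overwrites the existing target
    after_replace = (other.exists(), final.exists())

    final.unlink()
    build.rmdir()                        # only succeeds on an empty directory
    cleaned_up = not build.exists()
```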
Experimental openat() support
On compatible POSIX systems, the concrete PosixPath class can take advantage
of \*at() functions (`openat()`_ and friends), and manages the bookkeeping of
open file descriptors as necessary. Support is enabled by passing the
*use_openat* argument to the constructor::
>>> p = Path(".", use_openat=True)
Then all paths constructed by navigating this path (either by iteration or
indexing) will also use the openat() family of functions. The point of using
these functions is to avoid race conditions whereby a given directory is
silently replaced with another (often a symbolic link to a sensitive system
location) between two accesses.
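The underlying idea can be sketched without pathlib, using the ``dir_fd`` parameter of ``os.open()``. This is an illustration of the \*at() technique only, not of the *use_openat* implementation; ``read_relative`` is a hypothetical helper, and the example is skipped on platforms without ``dir_fd`` support.

```python
import os
import tempfile


def read_relative(dir_fd, name, size=64):
    """Read up to `size` bytes of `name`, looked up relative to an open directory."""
    fd = os.open(name, os.O_RDONLY, dir_fd=dir_fd)
    try:
        return os.read(fd, size)
    finally:
        os.close(fd)


data = None  # stays None where dir_fd is unsupported (e.g. Windows)
if os.open in os.supports_dir_fd:
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "config"), "wb") as f:
            f.write(b"hello")

        # Pin the directory once; later lookups go through the descriptor,
        # so swapping the directory's path for a symlink cannot redirect them.
        dir_fd = os.open(tmp, os.O_RDONLY)
        try:
            data = read_relative(dir_fd, "config")
        finally:
            os.close(dir_fd)
```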
.. _`openat()`: http://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html
This document has been placed into the public domain.