[Python-ideas] PEP 428 - object-oriented filesystem paths
Calvin Spealman
ironfroggy at gmail.com
Sat Oct 6 18:14:40 CEST 2012
Responding late, but I didn't get a chance to get my very strong
feelings on this proposal in yesterday.
I do not like it. I'll give full disclosure and say that I think our
earlier failure to include the path library in the stdlib has been a
loss for Python and I'll always hope we can fix that one day. I still
hold out hope.
It feels like this proposal is "make it object oriented, because
object oriented is good" without any actual justification or obvious
problem this solves. The API looks clunky and redundant, and does not
appear to actually improve anything over the facilities in the os.path
module. This takes a lot of things we can already do with paths and
files and remixes them into a not-so-intuitive API for the sake of
change, not for the sake of solving a real problem.
As for specific problems I have with the proposal:
Frankly, I think not keeping the / operator for joining is a huge
mistake. This is the number one best feature of path and, even though
many people don't like it, it makes sense. It makes our most common
path operation read very close to the actual representation of
what you're creating. This is great.
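To show what I mean, here is a minimal sketch of slash joining.
SlashPath is a name I made up for illustration; it is not the path
library's actual class and not anything in the PEP, and it assumes
POSIX-style separators via posixpath.

```python
import posixpath


class SlashPath:
    """Hypothetical path wrapper; only illustrates joining with '/'."""

    def __init__(self, value):
        self.value = str(value)

    def __truediv__(self, other):
        # Delegate to posixpath.join so the result does not depend on the host OS.
        return SlashPath(posixpath.join(self.value, str(other)))

    def __str__(self):
        return self.value


home = SlashPath("/home/antoine")
setup = home / "pathlib" / "setup.py"
print(setup)  # /home/antoine/pathlib/setup.py
```

The expression on the right-hand side looks almost exactly like the
path it produces, which is the whole appeal.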
Not inheriting from str means that we can't directly pass these path
objects to existing code that just expects a string, so we have a
really hard boundary around the edges of this new API. It does not
lend itself well to incrementally transitioning to it from existing
code.
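A rough sketch of that boundary, with made-up names throughout:
StrPath stands in for a str-subclassing path, OpaquePath for a path
that does not inherit from str, and legacy_basename for any existing
code that assumes it was handed a string.

```python
class StrPath(str):
    """Hypothetical str-subclassing path."""


class OpaquePath:
    """Hypothetical non-str path; callers must convert explicitly."""

    def __init__(self, value):
        self._value = value

    def __str__(self):
        return self._value


def legacy_basename(filename):
    # Stand-in for existing code that only knows about strings.
    return filename.rsplit("/", 1)[-1].upper()


old_style = StrPath("/tmp/report.txt")
new_style = OpaquePath("/tmp/report.txt")

print(legacy_basename(old_style))       # REPORT.TXT
print(legacy_basename(str(new_style)))  # REPORT.TXT, but only after str()

try:
    legacy_basename(new_style)
except AttributeError as exc:
    # The non-str object has no string methods, so old code breaks here.
    print("boundary:", exc)
```

Every call site that crosses into old code needs that str() call,
which is what makes an incremental migration painful.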
The stat operations and other file-facilities tacked on feel out of
place, and limited. Why does it make sense to add these facilities to
path and not other file operations? Why not give me a read method on
paths? Or maybe a copy? Putting lots of file facilities on a path
object feels wrong because you can't extend it easily. This is one
place that function(thing) works better than thing.function().
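As a sketch of the function(thing) style (read_text and copy_text are
hypothetical helpers I am inventing here, not existing stdlib
functions), new operations can be added by anyone without touching or
subclassing a path class:

```python
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Hypothetical free function; works on anything str() can render."""
    with open(str(path), encoding=encoding) as f:
        return f.read()


def copy_text(src, dst, encoding="utf-8"):
    """Another free function, added without modifying any path class."""
    with open(str(dst), "w", encoding=encoding) as f:
        f.write(read_text(src, encoding))


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "a.txt")
    dst = os.path.join(tmp, "b.txt")
    with open(src, "w", encoding="utf-8") as f:
        f.write("hello\n")
    copy_text(src, dst)
    copied = read_text(dst)

print(repr(copied))  # 'hello\n'
```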
Overall, I'm completely -1 on the whole thing.
On Fri, Oct 5, 2012 at 2:25 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>
> Hello,
>
> This PEP is a resurrection of the idea of having object-oriented
> filesystem paths in the stdlib. It comes with a general API proposal
> as well as a specific implementation (*). The implementation is young
> and discussion is quite open.
>
> (*) http://pypi.python.org/pypi/pathlib/
>
> Regards
>
> Antoine.
>
> PS: You can all admire my ASCII-art skills.
>
>
>
> PEP: 428
> Title: The pathlib module -- object-oriented filesystem paths
> Version: $Revision$
> Last-Modified: $Date$
> Author: Antoine Pitrou <solipsis at pitrou.net>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 30-July-2012
> Python-Version: 3.4
> Post-History:
>
>
> Abstract
> ========
>
> This PEP proposes the inclusion of a third-party module, `pathlib`_, in
> the standard library. The inclusion is proposed under the provisional
> label, as described in :pep:`411`. Therefore, API changes can be done,
> either as part of the PEP process, or after acceptance in the standard
> library (and until the provisional label is removed).
>
> The aim of this library is to provide a simple hierarchy of classes to
> handle filesystem paths and the common operations users do over them.
>
> .. _`pathlib`: http://pypi.python.org/pypi/pathlib/
>
>
> Related work
> ============
>
> An object-oriented API for filesystem paths has already been proposed
> and rejected in :pep:`355`. Several third-party implementations of the
> idea of object-oriented filesystem paths exist in the wild:
>
> * The historical `path.py module`_ by Jason Orendorff, Jason R. Coombs
> and others, which provides a ``str``-subclassing ``Path`` class;
>
> * Twisted's slightly specialized `FilePath class`_;
>
> * An `AlternativePathClass proposal`_, subclassing ``tuple`` rather than
> ``str``;
>
> * `Unipath`_, a variation on the str-subclassing approach with two public
> classes, an ``AbstractPath`` class for operations which don't do I/O and a
> ``Path`` class for all common operations.
>
> This proposal attempts to learn from these previous attempts and the
> rejection of :pep:`355`.
>
>
> .. _`path.py module`: https://github.com/jaraco/path.py
> .. _`FilePath class`: http://twistedmatrix.com/documents/current/api/twisted.python.filepath.FilePath.html
> .. _`AlternativePathClass proposal`: http://wiki.python.org/moin/AlternativePathClass
> .. _`Unipath`: https://bitbucket.org/sluggo/unipath/overview
>
>
> Why an object-oriented API
> ==========================
>
> The rationale to represent filesystem paths using dedicated classes is the
> same as for other kinds of stateless objects, such as dates, times or IP
> addresses. Python has been slowly moving away from strictly replicating
> the C language's APIs to providing better, more helpful abstractions around
> all kinds of common functionality. Even if this PEP isn't accepted, it is
> likely that another form of filesystem handling abstraction will be adopted
> one day into the standard library.
>
> Indeed, many people will prefer handling dates and times using the high-level
> objects provided by the ``datetime`` module, rather than using numeric
> timestamps and the ``time`` module API. Moreover, using a dedicated class
> makes it possible to enable desirable behaviours by default, for example
> the case insensitivity of Windows paths.
>
>
> Proposal
> ========
>
> Class hierarchy
> ---------------
>
> The `pathlib`_ module implements a simple hierarchy of classes::
>
>                       +----------+
>                       |          |
>              ---------| PurePath |--------
>              |        |          |       |
>              |        +----------+       |
>              |             |             |
>              |             |             |
>              v             |             v
>      +---------------+     |       +------------+
>      |               |     |       |            |
>      | PurePosixPath |     |       | PureNTPath |
>      |               |     |       |            |
>      +---------------+     |       +------------+
>              |             v             |
>              |          +------+         |
>              |          |      |         |
>              |   -------| Path |------   |
>              |   |      |      |     |   |
>              |   |      +------+     |   |
>              |   |                   |   |
>              |   |                   |   |
>              v   v                   v   v
>          +-----------+             +--------+
>          |           |             |        |
>          | PosixPath |             | NTPath |
>          |           |             |        |
>          +-----------+             +--------+
>
>
> This hierarchy divides path classes along two dimensions:
>
> * a path class can be either pure or concrete: pure classes support only
> operations that don't need to do any actual I/O, which are most path
> manipulation operations; concrete classes support all the operations
> of pure classes, plus operations that do I/O.
>
> * a path class is of a given flavour according to the kind of operating
> system paths it represents. `pathlib`_ implements two flavours: NT paths
> for the filesystem semantics embodied in Windows systems, POSIX paths for
> other systems (``os.name``'s terminology is re-used here).
>
> Any pure class can be instantiated on any system: for example, you can
> manipulate ``PurePosixPath`` objects under Windows, ``PureNTPath`` objects
> under Unix, and so on. However, concrete classes can only be instantiated
> on a matching system: indeed, it would be error-prone to start doing I/O
> with ``NTPath`` objects under Unix, or vice-versa.
>
> Furthermore, there are two base classes which also act as system-dependent
> factories: ``PurePath`` will instantiate either a ``PurePosixPath`` or a
> ``PureNTPath`` depending on the operating system. Similarly, ``Path``
> will instantiate either a ``PosixPath`` or a ``NTPath``.
>
> It is expected that, in most uses, using the ``Path`` class is adequate,
> which is why it has the shortest name of all.
>
>
> No confusion with builtins
> --------------------------
>
> In this proposal, the path classes do not derive from a builtin type. This
> contrasts with some other Path class proposals which were derived from
> ``str``. They also do not pretend to implement the sequence protocol:
> if you want a path to act as a sequence, you have to look up a dedicated
> attribute (the ``parts`` attribute).
>
> By not passing themselves off as builtin types, the path classes minimize
> the potential for confusion if they are combined by accident with genuine
> builtin types.
>
>
> Immutability
> ------------
>
> Path objects are immutable, which makes them hashable and also prevents a
> class of programming errors.
>
>
> Sane behaviour
> --------------
>
> Little of the functionality from os.path is reused. Many os.path functions
> are tied by backwards compatibility to confusing or plain wrong behaviour
> (for example, the fact that ``os.path.abspath()`` simplifies ".." path
> components without resolving symlinks first).
>
> Also, using classes instead of plain strings helps make system-dependent
> behaviours natural. For example, comparing and ordering Windows path
> objects is case-insensitive, and path separators are automatically converted
> to the platform default.
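
For reference, the abspath() behaviour the PEP complains about here
can be reproduced with a short sketch. It assumes a platform where
os.symlink() works without special privileges (e.g. Linux or OS X);
the directory names are made up.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Resolve the temp dir itself first, in case it sits behind a symlink.
    tmp = os.path.realpath(tmp)
    os.makedirs(os.path.join(tmp, "real", "sub"))
    # "link" points at real/sub, so link/.. is really "real", not tmp.
    os.symlink(os.path.join(tmp, "real", "sub"), os.path.join(tmp, "link"))

    tricky = os.path.join(tmp, "link", "..")
    lexical = os.path.abspath(tricky)    # collapses ".." textually
    resolved = os.path.realpath(tricky)  # follows the symlink first

    print(lexical == tmp)                         # True
    print(resolved == os.path.join(tmp, "real"))  # True
```
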
>
>
> Useful notations
> ----------------
>
> The API tries to provide useful notations all the while avoiding magic.
> Some examples::
>
> >>> p = Path('/home/antoine/pathlib/setup.py')
> >>> p.name
> 'setup.py'
> >>> p.ext
> '.py'
> >>> p.root
> '/'
> >>> p.parts
> <PosixPath.parts: ['/', 'home', 'antoine', 'pathlib', 'setup.py']>
> >>> list(p.parents())
> [PosixPath('/home/antoine/pathlib'), PosixPath('/home/antoine'), PosixPath('/home'), PosixPath('/')]
> >>> p.exists()
> True
> >>> p.st_size
> 928
>
>
> Pure paths API
> ==============
>
> The philosophy of the ``PurePath`` API is to provide a consistent array of
> useful path manipulation operations, without exposing a hodge-podge of
> functions like ``os.path`` does.
>
>
> Definitions
> -----------
>
> First a couple of conventions:
>
> * All paths can have a drive and a root. For POSIX paths, the drive is
> always empty.
>
> * A relative path has neither drive nor root.
>
> * A POSIX path is absolute if it has a root. A Windows path is absolute if
> it has both a drive *and* a root. A Windows UNC path (e.g.
> ``\\some\\share\\myfile.txt``) always has a drive and a root
> (here, ``\\some\\share`` and ``\\``, respectively).
>
> * A path which has either a drive *or* a root is said to be anchored.
> Its anchor is the concatenation of the drive and root. Under POSIX,
> "anchored" is the same as "absolute".
>
>
> Construction and joining
> ------------------------
>
> We will present construction and joining together since they expose
> similar semantics.
>
> The simplest way to construct a path is to pass it its string representation::
>
> >>> PurePath('setup.py')
> PurePosixPath('setup.py')
>
> Extraneous path separators and ``"."`` components are eliminated::
>
> >>> PurePath('a///b/c/./d/')
> PurePosixPath('a/b/c/d')
>
> If you pass several arguments, they will be automatically joined::
>
> >>> PurePath('docs', 'Makefile')
> PurePosixPath('docs/Makefile')
>
> Joining semantics are similar to os.path.join, in that anchored paths ignore
> the information from the previously joined components::
>
> >>> PurePath('/etc', '/usr', 'bin')
> PurePosixPath('/usr/bin')
>
> However, with Windows paths, the drive is retained as necessary::
>
> >>> PureNTPath('c:/foo', '/Windows')
> PureNTPath('c:\\Windows')
> >>> PureNTPath('c:/foo', 'd:')
> PureNTPath('d:')
>
> Calling the constructor without any argument creates a path object pointing
> to the logical "current directory"::
>
> >>> PurePosixPath()
> PurePosixPath('.')
>
> A path can be joined with another using the ``__getitem__`` operator::
>
> >>> p = PurePosixPath('foo')
> >>> p['bar']
> PurePosixPath('foo/bar')
> >>> p[PurePosixPath('bar')]
> PurePosixPath('foo/bar')
>
> As with constructing, multiple path components can be specified at once::
>
> >>> p['bar/xyzzy']
> PurePosixPath('foo/bar/xyzzy')
>
> A join() method is also provided, with the same behaviour. It can serve
> as a factory function::
>
> >>> path_factory = p.join
> >>> path_factory('bar')
> PurePosixPath('foo/bar')
>
>
> Representing
> ------------
>
> To represent a path (e.g. to pass it to third-party libraries), just call
> ``str()`` on it::
>
> >>> p = PurePath('/home/antoine/pathlib/setup.py')
> >>> str(p)
> '/home/antoine/pathlib/setup.py'
> >>> p = PureNTPath('c:/windows')
> >>> str(p)
> 'c:\\windows'
>
> To force the string representation with forward slashes, use the ``as_posix()``
> method::
>
> >>> p.as_posix()
> 'c:/windows'
>
> To get the bytes representation (which might be useful under Unix systems),
> call ``bytes()`` on it, or use the ``as_bytes()`` method::
>
> >>> p = PurePath('/home/antoine/pathlib/setup.py')
> >>> bytes(p)
> b'/home/antoine/pathlib/setup.py'
>
>
> Properties
> ----------
>
> Five simple properties are provided on every path (each can be empty)::
>
> >>> p = PureNTPath('c:/pathlib/setup.py')
> >>> p.drive
> 'c:'
> >>> p.root
> '\\'
> >>> p.anchor
> 'c:\\'
> >>> p.name
> 'setup.py'
> >>> p.ext
> '.py'
>
>
> Sequence-like access
> --------------------
>
> The ``parts`` property provides read-only sequence access to a path object::
>
> >>> p = PurePosixPath('/etc/init.d')
> >>> p.parts
> <PurePosixPath.parts: ['/', 'etc', 'init.d']>
>
> Simple indexing returns the individual path component as a string, while
> slicing returns a new path object constructed from the selected components::
>
> >>> p.parts[-1]
> 'init.d'
> >>> p.parts[:-1]
> PurePosixPath('/etc')
>
> Windows paths handle the drive and the root as a single path component::
>
> >>> p = PureNTPath('c:/setup.py')
> >>> p.parts
> <PureNTPath.parts: ['c:\\', 'setup.py']>
> >>> p.root
> '\\'
> >>> p.parts[0]
> 'c:\\'
>
> (separating them would be wrong, since ``C:`` is not the parent of ``C:\\``).
>
> The ``parent()`` method returns an ancestor of the path::
>
> >>> p = PureNTPath('c:/python33/bin/python.exe')
> >>> p.parent()
> PureNTPath('c:\\python33\\bin')
> >>> p.parent(2)
> PureNTPath('c:\\python33')
> >>> p.parent(3)
> PureNTPath('c:\\')
>
> The ``parents()`` method automates repeated invocations of ``parent()``, until
> the anchor is reached::
>
> >>> p = PureNTPath('c:/python33/bin/python.exe')
> >>> for parent in p.parents(): parent
> ...
> PureNTPath('c:\\python33\\bin')
> PureNTPath('c:\\python33')
> PureNTPath('c:\\')
>
>
> Querying
> --------
>
> ``is_relative()`` returns True if the path is relative (see definition
> above), False otherwise.
>
> ``is_reserved()`` returns True if a Windows path is a reserved path such
> as ``CON`` or ``NUL``. It always returns False for POSIX paths.
>
> ``match()`` matches the path against a glob pattern::
>
> >>> PureNTPath('c:/PATHLIB/setup.py').match('c:*lib/*.PY')
> True
>
> ``relative()`` returns a new relative path by stripping the drive and root::
>
> >>> PurePosixPath('setup.py').relative()
> PurePosixPath('setup.py')
> >>> PurePosixPath('/setup.py').relative()
> PurePosixPath('setup.py')
>
> ``relative_to()`` computes the relative difference of a path to another::
>
> >>> PurePosixPath('/usr/bin/python').relative_to('/usr')
> PurePosixPath('bin/python')
>
> ``normcase()`` returns a case-folded version of the path for NT paths::
>
> >>> PurePosixPath('CAPS').normcase()
> PurePosixPath('CAPS')
> >>> PureNTPath('CAPS').normcase()
> PureNTPath('caps')
>
>
> Concrete paths API
> ==================
>
> In addition to the operations of the pure API, concrete paths provide
> additional methods which actually access the filesystem to query or mutate
> information.
>
>
> Constructing
> ------------
>
> The classmethod ``cwd()`` creates a path object pointing to the current
> working directory in absolute form::
>
> >>> Path.cwd()
> PosixPath('/home/antoine/pathlib')
>
>
> File metadata
> -------------
>
> The ``stat()`` method caches and returns the file's stat() result;
> ``restat()`` forces refreshing of the cache. ``lstat()`` is also provided,
> but doesn't have any caching behaviour::
>
> >>> p.stat()
> posix.stat_result(st_mode=33277, st_ino=7483155, st_dev=2053, st_nlink=1, st_uid=500, st_gid=500, st_size=928, st_atime=1343597970, st_mtime=1328287308, st_ctime=1343597964)
>
> For ease of use, direct attribute access to the fields of the stat structure
> is provided over the path object itself::
>
> >>> p.st_size
> 928
> >>> p.st_mtime
> 1328287308.889562
>
> Higher-level methods help examine the kind of the file::
>
> >>> p.exists()
> True
> >>> p.is_file()
> True
> >>> p.is_dir()
> False
> >>> p.is_symlink()
> False
>
> The file owner and group names (rather than numeric ids) are queried
> through matching properties::
>
> >>> p = Path('/etc/shadow')
> >>> p.owner
> 'root'
> >>> p.group
> 'shadow'
>
>
> Path resolution
> ---------------
>
> The ``resolve()`` method makes a path absolute, resolving any symlink on
> the way. It is the only operation which will remove "``..``" path components.
>
>
> Directory walking
> -----------------
>
> Simple (non-recursive) directory access is done by iteration::
>
> >>> p = Path('docs')
> >>> for child in p: child
> ...
> PosixPath('docs/conf.py')
> PosixPath('docs/_templates')
> PosixPath('docs/make.bat')
> PosixPath('docs/index.rst')
> PosixPath('docs/_build')
> PosixPath('docs/_static')
> PosixPath('docs/Makefile')
>
> This allows simple filtering through list comprehensions::
>
> >>> p = Path('.')
> >>> [child for child in p if child.is_dir()]
> [PosixPath('.hg'), PosixPath('docs'), PosixPath('dist'), PosixPath('__pycache__'), PosixPath('build')]
>
> Simple and recursive globbing is also provided::
>
> >>> for child in p.glob('**/*.py'): child
> ...
> PosixPath('test_pathlib.py')
> PosixPath('setup.py')
> PosixPath('pathlib.py')
> PosixPath('docs/conf.py')
> PosixPath('build/lib/pathlib.py')
>
>
> File opening
> ------------
>
> The ``open()`` method provides a file opening API similar to the builtin
> ``open()`` function::
>
> >>> p = Path('setup.py')
> >>> with p.open() as f: f.readline()
> ...
> '#!/usr/bin/env python3\n'
>
> The ``raw_open()`` method, on the other hand, is similar to ``os.open``::
>
> >>> fd = p.raw_open(os.O_RDONLY)
> >>> os.read(fd, 15)
> b'#!/usr/bin/env '
>
>
> Filesystem alteration
> ---------------------
>
> Several common filesystem operations are provided as methods: ``touch()``,
> ``mkdir()``, ``rename()``, ``replace()``, ``unlink()``, ``rmdir()``,
> ``chmod()``, ``lchmod()``, ``symlink_to()``. More operations could be
> provided, for example some of the functionality of the shutil module.
>
>
> Experimental openat() support
> -----------------------------
>
> On compatible POSIX systems, the concrete PosixPath class can take advantage
> of \*at() functions (`openat()`_ and friends), and manages the bookkeeping of
> open file descriptors as necessary. Support is enabled by passing the
> *use_openat* argument to the constructor::
>
> >>> p = Path(".", use_openat=True)
>
> Then all paths constructed by navigating this path (either by iteration or
> indexing) will also use the openat() family of functions. The point of using
> these functions is to avoid race conditions whereby a given directory is
> silently replaced with another (often a symbolic link to a sensitive system
> location) between two accesses.
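
For readers who haven't used the *at() functions: the os module in
Python 3.3 exposes them through the dir_fd parameter, so the idea can
be sketched without pathlib. This is not the PEP's implementation; it
assumes a platform where os.open supports dir_fd, and the file name is
made up.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "data.txt"), "w") as f:
        f.write("payload")

    # Pin the directory once; later lookups are relative to this descriptor
    # rather than to the directory's path name.
    dir_fd = os.open(tmp, os.O_RDONLY)
    try:
        # Equivalent to openat(dir_fd, "data.txt", O_RDONLY).
        fd = os.open("data.txt", os.O_RDONLY, dir_fd=dir_fd)
        try:
            content = os.read(fd, 100)
        finally:
            os.close(fd)
    finally:
        os.close(dir_fd)

print(content)  # b'payload'
```
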
>
> .. _`openat()`: http://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html
>
>
> Copyright
> =========
>
> This document has been placed into the public domain.
>
>
> ..
> Local Variables:
> mode: indented-text
> indent-tabs-mode: nil
> sentence-end-double-space: t
> fill-column: 70
> coding: utf-8
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
--
Read my blog! I depend on your acceptance of my opinion! I am interesting!
http://techblog.ironfroggy.com/
Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy