[Python-3000] Path Reform: Get the ball rolling

Mike Orr sluggoster at gmail.com
Wed Nov 1 07:22:29 CET 2006


Talin wrote:
> 1) Does os.path need to be refactored at all?

Yes.  Functions are scattered arbitrarily across six modules: os,
os.path, shutil, stat, glob, fnmatch.  You have to search through five
scattered doc pages in the Python library to find your function, plus
the os module doc is split into five sections.  You may think
'shlutil' has to do with shells, not paths.  shutil.copy2 is
riduculously named: what's so "2" about it?  Why is 'split' in os.path
but 'stat' and 'mkdir' and 'remove' are in os?  Don't they all operate
on paths?

The lack of method chaning means you have to use nested functions,
which must be read "inside out" rather than left-to-right like paths
normally go. Say you want to add the absolute path of "../../lib" to
the Python path in a platform-independent manner, relative to an
absolute path (__file__):

    # Assume __file__ is "/toplevel/app1/bin/main_program.py".
    # Result is "/toplevel/app1/lib".
    p = os.path.join(os.path.dirname(os.path.dirname(__file__)), "lib")

PEP 355 proposes a much easier-to-read:

    # The Path object emulates "/toplevel/app1/bin/main_program.py".
    p = Path(__file__).parent.parent.join("lib")

Noam Raphael's directory-component object would make this even more
straightforward:

    # The Path object emulates ("/", "toplevel", "app1", "bin",
"main_program.py")
    p = Path(__file__)[:-2] + "lib"

Stat handling has grown cruft over the years. To check the modify time
of a file:

    os.path.getmtime("/FOO")
    os.stat("/FOO").st_mtime
    os.stat("/FOO")[stat.ST_MTIME]  # List subscript, deprecated usage.

If you want to check whether a file is a type for which there is no
os.path.is*() method:

    stat.S_ISSOCK( os.stat("/FOO").st_mode )  # Is the file a socket?

Compare to the directory-component proposal:

    Path("/foo").stat().issock

os.path functions are too low-level.  Say you want to recursively
delete a path you're about to overwrite, no matter whether it exists
or is a file or directory.  You can't do it in one line of code, darn,
you gotta write this function or inline the code everywhere you use
it:

    def purge(p):
        if os.path.isdir(p):
            shutil.rmtree(p)       # Raises error if nonexistent or
not a directory.
        elif os.path.exists():
            # isfile follows symlinks and returns False for special
files, so it's
            # not a reliable guide of whether we can call os.remove.
            os.remove(p)          # Raises error if nonexistent or a directory.
        if os.path.isfile(p):  # Includes all symlinks.
            os.remove(p)

> 2) is there anything that the existing os.path *won't do* that we desperately need it to do?

For filesystem files, no.  Though you really mean all six modules
above and not just os.path.  It has been proposed to support
non-filesystem directories (zip files, CSV/Subversion sandboxes, URLs,
FTP objects) under a new Path API.

>  3) Assuming that the answer to #1 is "yes", the next question is:
"evolution or revolution?"

Revolution.  It needs a clean new API.  However, this can live
alongside the existing functions if necessary:  posixpath.PosixPath,
path.Path, etc.

> 4) My third question is: Who are we going to steal our ideas from?
Boost, Java, C# and others - all are worthy of being the, ahem, target
of our inspiration. Or we have some alternative that's so cool that it
makes sense to "Think Different(ly)"?

Java is the only one I'm familiar with.  The existing Python proposals
are summarized below.

> 5) Must there be one ring to rule them all? I suggested earlier that we
might have a "low-level" and a "high-level" API, one built on top of the
other. Is this a good idea, or a non-starter?

It's worth discussing.  One question is whether the dichotomy does
anything useful or just adds unnecessary complexity.  But that can
only be answered for a specific API proposal.  Whatever we do will be
"low-level" compared to third-party extensions that will be built on
top of it, so we should plan for extensibility.

* * * *
Here's a summary of the existing Python proposals in chronological order.

Jason Orendorff's path.py
http://www.jorendorff.com/articles/python/path/src/path.py
This provides an OO path object encompassing the six modules above.
The object is a string subclass.

PEP 355
http://www.python.org/dev/peps/pep-0355/
An update to Orendorff's code.  This was rejected by the BDFL because:
(A) we haven't proven an OO interface is superior to os.path et al,
(B) it mixes abstract path operations and filesystem-dependent
operations, and (C) it's not radical enough for many OO proponents.
However, it does separate filesystem-dependent operations to an
extent: they must be methods, while abstract operations may be
attributes.  Orendorff's module does not separate these at all.

Noam Raphael's directory-based class
Introduction:  http://wiki.python.org/moin/AlternativePathClass
Feature discussion:  http://wiki.python.org/moin/AlternativePathDiscussion
Reference implementation:  http://wiki.python.org/moin/AlternativePathModule

Noam's emulates a sequence of components (a la os.path.split) rather
than a string.  This expresses slicing and joining by Python's [] and
+ operators, eliminating several named methods. Each component
emulates a unicode with methods to extract the name and extension(s).
The first component of absolute paths is a "root object" representing
the Posix root ("/"), a Windows drive-absolute ("C:\"), a Windows
drive-relative ("C:"), a network share, etc.

This proposal has a reference implementation but a PEP has not been
written, and the discussion page has many feature alternatives that
aren't coded.  A robust reference implementation would have methods
for all alternatives so they can be compared in real programs.

There have been a few additional ideas on python-dev but no code.
Nick Coghlan suggested separate classes for (A) string manipulation,
(B) abstract path operations, (C) read-only inspection of filesystem,
(D) add/remove files/directories/links.  Another suggested four
classes for (A) abstract path operations, (B) file, (C) directory, (D)
symlink.  Talin proposed an elaborate set of classes and functions to
do this.  I pointed out that each class would need 3+ versions to
accomodate platform differences, so somebody wanting to make a generic
subclass would potentially have to make 12 versions to accommodate all
the superclass possibilities (4 path classes * 3 platforms). Functions
would cut down the need for multiple classes and duplicated methods
between them, but functions would make "subclassing Path" more
difficult.

I would like to see one or more implementations tested and widely used
as soon as possible, so that we'd have confidence using them in our
programs until a general Python solution emerges.  But first we need
to see if we can achieve a common API, as Talin started this thread
saying.

-- 
Mike Orr <sluggoster at gmail.com>


More information about the Python-3000 mailing list