[Python-Dev] Path object design

Mike Orr sluggoster at gmail.com
Wed Nov 1 21:14:53 CET 2006


Argh, it's difficult to respond to one topic that's now spiraling into
two conversations on two lists.

glyph at divmod.com wrote:
> On 03:14 am, sluggoster at gmail.com wrote:
>
> >One thing is sure -- we urgently need something better than os.path.
> >It functions well but it makes hard-to-read and unpythonic code.
>
> I'm not so sure.  The need is not any more "urgent" today than it was
> 5 years ago, when os.path was equally "unpythonic" and unreadable.
> The problem is real but there is absolutely no reason to hurry to a
> premature solution.

Except that people have had to spend five years putting hard-to-read
os.path functions in the code, or reinventing the wheel with their own
libraries that they're not sure they can trust.  I started to use
path.py last year when it looked like it was emerging as the basis of
a new standard, but yanked it out again when it became clear the API
would be different by the time it was accepted.  I've gone back to
os.path for now until something stable emerges, but I really wish I
didn't have to.

> I've already recommended Twisted's twisted.python.filepath module as a
> possible basis for the implementation of this feature....

> *It is already used in a large body of real, working code, and
> therefore its limitations are known.*

This is an important consideration.  However, to me a clean API is more
important.  Since we haven't agreed on an API there is no widely-used
module that implements it... it's a chicken-and-egg problem since it
takes significant time to write and test an implementation.  So I'd
like to start from the standpoint of an ideal API rather than just
taking the API of the most widely-used implementation.  os.path is
clearly the most widely-used implementation, but that doesn't mean
that OOizing it as-is would be my favorite choice.

I took a quick look at filepath.  It looks similar in concept to PEP
355.  Four concerns:
    - unfamiliar method names (createDirectory vs mkdir, child vs join)
    - basename/dirname/parent are methods rather than properties:
leads to () overproliferation in user code.
    - the "secure" features may not be necessary.  If they are, this
should be a separate discussion, and perhaps implemented as a
subclass.
    - stylistic objection to verbose camelCase names like createDirectory
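
To illustrate the second concern, here is a minimal sketch comparing the
two styles.  Both classes and their names are hypothetical, written only
for this comparison; neither is filepath's nor PEP 355's actual API.  It
is written for present-day Python 3 and uses posixpath (what os.path is
on Unix) so that it behaves the same on every platform.

```python
import posixpath


class MethodStylePath(str):
    """Hypothetical: accessors are methods, so each step needs ()."""

    def parent(self):
        return MethodStylePath(posixpath.dirname(self))

    def basename(self):
        return posixpath.basename(self)


class PropertyStylePath(str):
    """Hypothetical: the same accessors exposed as read-only properties."""

    @property
    def parent(self):
        return PropertyStylePath(posixpath.dirname(self))

    @property
    def basename(self):
        return posixpath.basename(self)


m = MethodStylePath("/usr/lib/python/site.py")
p = PropertyStylePath("/usr/lib/python/site.py")

# Same traversal, two spellings: three pairs of () versus none.
method_result = m.parent().parent().basename()
property_result = p.parent.parent.basename
```

Both expressions walk two levels up and take the last component; the
only difference is the punctuation the caller has to type.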


> Proposals for extending the language are contentious and it is very
> difficult to do experimentation with non-trivial projects because
> nobody wants to do that and then end up with a bunch of code written
> in a language that is no longer supported when the experiment fails.

True.

> Path representation is a bike shed.  Nobody would have proposed
> writing an entirely new embedded database engine for Python: python
> 2.5 simply included SQLite because its utility was already proven.

There's a quantum level of difference between path/file manipulation
-- which has long been considered a requirement for any full-featured
programming language -- and a database engine which is much more
complex.

Georg Brandl <g.brandl at gmx.net> wrote:
> I have been a supporter of the full-blown Path object in the past, but the
> recent discussions have convinced me that it is just too big and too confusing,
> and that you can't kill too many birds with one stone in this respect.
> Most of the ugliness really lies in the path name manipulation functions, which
> nicely map to methods on a path name object.

Fredrik has convinced me that it's more urgent to OOize the pathname
conversions than the filesystem operations.  Pathname conversions are
the ones that frequently get nested or chained, whereas filesystem
operations are usually done at the top level of a program statement,
or return a different "kind" of value (stat, true/false, etc).

However, it's interesting that all the proposals I've seen in the past
three years have been a "monolithic" OO class.  Clearly there are a
lot of people who prefer this way, or at least have never heard of
anything different.  Where have all the proponents of non-OO or
limited-OO strategies been?  The first proposal of that sort I've seen
was Nick Coghlan's on October 1.  Have y'all just been ignoring the
monolithic OO efforts without offering any alternatives?


Fredrik Lundh <fredrik at pythonware.com> wrote:
> > This is fully backwards compatible, can go right into 2.6 without
> > breaking anything, allows people to update their code as they go,
> > and can be incrementally improved in future releases:
> >
> >      1) Add a pathname wrapper to "os.path", which lets you do basic
> >         path "algebra".  This should probably be a subclass of unicode,
> >         and should *only* contain operations on names.
> >
> >      2) Make selected "shutil" operations available via the "os" name-
> >         space; the old POSIX API vs. POSIX SHELL distinction is pretty
> >         irrelevant.  Also make the os.path predicates available via the
> >         "os" namespace.
> >
> > This gives a very simple conceptual model for the user; to manipulate
> > path *names*, use "os.path.<op>(string)" functions or the "<path>"
> > wrapper.  To manipulate *objects* identified by a path, given either as
> > a string or a path wrapper, use "os.<op>(path)".  This can be taught in
> > less than a minute.


Making this more concrete, I think Fredrik is suggesting:
    - Make (os.path) abspath, basename, commonprefix, dirname,
expanduser, expandvars, isabs, join, normcase, normpath, split,
splitdrive, splitext, splitunc methods of a Path object.
    - Copy functions into os: (os.path) exists, lexists,
get{atime,mtime,ctime,size}, is{file,dir,link,mount}, realpath,
samefile, sameopenfile, samestat, (shutil) copy, copy2,
copy{file,fileobj,mode,stat,tree}, rmtree, move.
    - Deprecate the old functions to remove in 3.0.
    - Abandon os.path.walk because os.walk is better.
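
A rough sketch of what the first item could look like, under stated
assumptions: the class name Path and the method names joinpath, dirname,
basename, splitext and normpath are hypothetical, only a handful of the
listed operations are shown, and the code targets present-day Python 3,
where str plays the role the proposal assigns to unicode.  posixpath is
used instead of os.path so the results are the same on every platform.

```python
import os
import posixpath


class Path(str):
    """Hypothetical wrapper holding only pure pathname operations."""

    def joinpath(self, *parts):
        return Path(posixpath.join(self, *parts))

    def dirname(self):
        return Path(posixpath.dirname(self))

    def basename(self):
        return Path(posixpath.basename(self))

    def splitext(self):
        root, ext = posixpath.splitext(self)
        return Path(root), ext

    def normpath(self):
        return Path(posixpath.normpath(self))


raw = "/srv/app/./conf/site.ini"

# Nested function calls, read inside-out.
nested = posixpath.join(posixpath.dirname(posixpath.normpath(raw)), "backup")

# The same conversion chained left to right on the wrapper.
chained = Path(raw).normpath().dirname().joinpath("backup")

# Filesystem operations stay plain functions; because Path is a str,
# existing functions accept it unchanged.
exists = os.path.exists(chained)
```

One wrinkle the sketch sidesteps: str already has join and split
methods with different meanings, so a str subclass cannot reuse those
two names without overriding them.  That is why the joining method is
called joinpath here.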

This is worth considering as a start.  It does mean moving a lot of
functions that may be moved again at some point in the future.

If we do move shutil functions into os, I'd at least like to make some
tiny improvements in them.  Adding four lines to the beginning of
rmtree would make it behave like my purge() function without
detracting from its existing use:

    if not os.exists(p):
        return
    if not os.isdir(p):
        return os.remove(p)
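
For reference, a runnable sketch of the same idea using only functions
that exist today (os.path.exists, os.path.isdir, os.remove,
shutil.rmtree).  The name purge comes from the text above; the body is
an assumption about its behavior, not the original function, and it
gives symlinks no special treatment.

```python
import os
import shutil
import tempfile


def purge(p):
    """Remove p whether it is a file or a directory tree; ignore if absent."""
    if not os.path.exists(p):
        return              # nothing there: not an error
    if not os.path.isdir(p):
        os.remove(p)        # plain file
        return
    shutil.rmtree(p)        # directory: unchanged rmtree behavior


# Small demonstration in a scratch directory.
base = tempfile.mkdtemp()
file_path = os.path.join(base, "a.txt")
dir_path = os.path.join(base, "sub")
with open(file_path, "w") as f:
    f.write("x")
os.makedirs(os.path.join(dir_path, "deeper"))

purge(file_path)                       # a file
purge(dir_path)                        # a directory tree
purge(os.path.join(base, "missing"))   # absent: silently ignored
remaining = os.listdir(base)
shutil.rmtree(base)
```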

Also, do we really need six copy methods?  copy2 can be handled by a
third argument, etc.
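
A sketch of the "third argument" idea, assuming a hypothetical keyword
named preserve_metadata; it simply dispatches to the existing
shutil.copy and shutil.copy2, and is written for present-day Python 3,
where both return the destination path.

```python
import os
import shutil
import tempfile


def copy(src, dst, preserve_metadata=False):
    """Hypothetical merged copy: the flag selects copy2-style behavior."""
    if preserve_metadata:
        return shutil.copy2(src, dst)   # data, mode bits and timestamps
    return shutil.copy(src, dst)        # data and mode bits only


# Small demonstration in a scratch directory.
base = tempfile.mkdtemp()
src = os.path.join(base, "src.txt")
with open(src, "w") as f:
    f.write("data")
os.utime(src, (1_000_000_000, 1_000_000_000))  # give src an old timestamp

plain = copy(src, os.path.join(base, "plain.txt"))
exact = copy(src, os.path.join(base, "exact.txt"), preserve_metadata=True)

plain_mtime = int(os.path.getmtime(plain))
exact_mtime = int(os.path.getmtime(exact))
shutil.rmtree(base)
```

Only the copy made with the flag set keeps the source's modification
time; the plain copy gets the time at which it was written.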

-- 
Mike Orr <sluggoster at gmail.com>

