[Python-3000] Path Reform: Get the ball rolling

Wed Nov 1 09:17:25 CET 2006

More comments...

Mike Orr wrote:
> Talin wrote:
>> 1) Does os.path need to be refactored at all?
> 
> Yes.  Functions are scattered arbitrarily across six modules: os,
> os.path, shutil, stat, glob, fnmatch.  You have to search through five
> scattered doc pages in the Python library to find your function, plus
> the os module doc is split into five sections.  You may think
> 'shutil' has to do with shells, not paths.  shutil.copy2 is
> ridiculously named: what's so "2" about it?  Why is 'split' in os.path
> but 'stat' and 'mkdir' and 'remove' are in os?  Don't they all operate
> on paths?
> 
> The lack of method chaining means you have to use nested functions,
> which must be read "inside out" rather than left-to-right like paths
> normally go. Say you want to add the absolute path of "../../lib" to
> the Python path in a platform-independent manner, relative to an
> absolute path (__file__):
> 
>     # Assume __file__ is "/toplevel/app1/bin/main_program.py".
>     # Result is "/toplevel/app1/lib".
>     p = os.path.join(os.path.dirname(os.path.dirname(__file__)), "lib")
> 
> PEP 355 proposes a much easier-to-read:
> 
>     # The Path object emulates "/toplevel/app1/bin/main_program.py".
>     p = Path(__file__).parent.parent.join("lib")

Actually I generally use:

       p = os.path.normpath( os.path.join( __file__, "../..", "lib" ) )

or even:

       p = os.path.normpath( os.path.join( __file__, "../../lib" ) )

...which isn't quite as concise as what you wrote, but is better than 
the first example. (This works because 'normpath' doesn't know whether 
the last component is a file or a directory -- it simply interprets 
each ".." as an instruction to strip off the preceding component.)

What I'd like to see is a version of "join" that automatically 
simplifies as it goes. Let's call it "combine":

       p = os.path.combine( __file__, "../..", "lib" )

or:

       p = os.path.combine( __file__, "../../lib" )

That's even easier to read than any of the above versions IMHO.
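
A minimal sketch of how "combine" might behave, assuming it is nothing more than a join followed by a normpath. The name `combine` is hypothetical and does not exist in os.path; posixpath is used here only so the result is the same on every platform, and this version simplifies once at the end rather than after each component:

```python
import posixpath

def combine(base, *parts):
    """Hypothetical helper: join the parts, then collapse '.' and '..'."""
    # normpath works purely on the string and never looks at the disk,
    # so ".." strips a trailing file name just as it would a directory.
    return posixpath.normpath(posixpath.join(base, *parts))

# Stands in for __file__ from the example above.
main_program = "/toplevel/app1/bin/main_program.py"

print(combine(main_program, "../..", "lib"))   # /toplevel/app1/lib
print(combine(main_program, "../../lib"))      # /toplevel/app1/lib
```

Because this is pure string manipulation, the result can differ from what the operating system would resolve when a component is a symbolic link.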

> Noam Raphael's directory-component object would make this even more
> straightforward:
> 
>     # The Path object emulates
>     # ("/", "toplevel", "app1", "bin", "main_program.py")
>     p = Path(__file__)[:-2] + "lib"

I don't know if I would describe this as 'straightforward'. 'Concise', 
certainly; 'terse', yes, and 'clever'. But also 'cute', and 'tricky'. I 
see a couple of problems with it:

-- It's only intuitive if you remember that array elements are path 
components and not characters. In other words, if you attempt to "read" 
the [:-2] as if the path were a string, you'll get the wrong answer.

-- Is path[ 0 ] a string or a path? What if I really do want to get the 
first two *characters* of the path, and not the first two components? Do 
I have to say something like:

    str( path )[ :2 ]
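
To make the ambiguity concrete, here is a small sketch of a component-based path. `ComponentPath` is a hypothetical stand-in written for this illustration, not the class from any of the proposals:

```python
class ComponentPath(tuple):
    """Hypothetical path stored as a tuple of components."""

    @classmethod
    def parse(cls, text):
        # Keep the root "/" as its own component, as in the quoted example.
        parts = [p for p in text.split("/") if p]
        if text.startswith("/"):
            parts.insert(0, "/")
        return cls(parts)

    def __str__(self):
        if self and self[0] == "/":
            return "/" + "/".join(self[1:])
        return "/".join(self)

path = ComponentPath.parse("/toplevel/app1/bin/main_program.py")

print(path[:2])        # ('/', 'toplevel')  first two components
print(str(path)[:2])   # /t                 first two characters
print(type(path[0]))   # <class 'str'>      an item is a plain string here
```

In this sketch a slice comes back as a plain tuple and an item as a plain string; a real design would have to decide both questions explicitly.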

> Stat handling has grown cruft over the years. To check the modify time
> of a file:
> 
>     os.path.getmtime("/FOO")
>     os.stat("/FOO").st_mtime
>     os.stat("/FOO")[stat.ST_MTIME]  # List subscript, deprecated usage.
> 
> If you want to check whether a file is a type for which there is no
> os.path.is*() method:
> 
>     stat.S_ISSOCK( os.stat("/FOO").st_mode )  # Is the file a socket?
> 
> Compare to the directory-component proposal:
> 
>     Path("/foo").stat().issock

This is the part I really don't like. A path is not a file.

Imagine that, instead of paths, we were doing SQL queries. Take 
SQLObject for example; say we have a table of addresses:

    Address.select( query_string )

Now, suppose you say that you want to be able to perform manipulations 
on the query string, and therefore it should be an object. So we'll 
define a new class, SQLQuery( string ):

    Address.select( SQLQuery( string ) )

And we will allow queries to be conjoined using boolean operators:

    Address.select( SQLQuery( string ) | SQLQuery( string2 ) )

Nothing controversial so far - this is the way many such systems work 
already.

But now you say "Well, since SQLQuery is an object, it would be more 
elegant to have all of the query-related functions be methods of the 
query object." So for example, if you wanted to run the query string 
against the Address table, and see how many records came back, you would 
have to do something like:

    SQLQuery( string ).select( Address ).count()

...which is exactly backwards, and here's why: Generally when creating 
member functions of objects, you arrange them in the form:

    actor.operation( subject )

Where 'actor' is considered to be the 'active agent' of the operation, 
while the 'subject' is the passive input parameter.

I would argue that both paths and query strings are passive, whereas 
tables and file systems are, if not exactly lively, at least more 
'actor-like' than paths or queries.

Now, that being said, I wouldn't have a problem with there being an 
"abstract filesystem object" that represents an entity on disk (be it 
file, directory, etc.), which would have a path inside it that would do 
some of the things you suggest.
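
A rough sketch of what I mean, assuming a class that wraps a path string and asks the operating system about it. `FsEntry` and its method names are hypothetical; only the os and stat calls are real:

```python
import os
import stat

class FsEntry:
    """Hypothetical object for an entity on disk; it holds a path."""

    def __init__(self, path):
        self.path = path   # the passive part: just a string

    def mtime(self):
        # Same value as os.path.getmtime(self.path).
        return os.stat(self.path).st_mtime

    def is_dir(self):
        return stat.S_ISDIR(os.stat(self.path).st_mode)

    def is_socket(self):
        return stat.S_ISSOCK(os.stat(self.path).st_mode)

entry = FsEntry(os.getcwd())
print(entry.path, entry.is_dir(), entry.is_socket())
```

The path stays a plain value that can be manipulated without touching the disk, while every operation that does touch the disk lives on the object that represents the thing on disk.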

> os.path functions are too low-level.  Say you want to recursively
> delete a path you're about to overwrite, no matter whether it exists
> or is a file or directory.  You can't do it in one line of code, darn,
> you gotta write this function or inline the code everywhere you use
> it:
> 
>     def purge(p):
>         if os.path.isdir(p):
>             # Raises error if nonexistent or not a directory.
>             shutil.rmtree(p)
>         elif os.path.exists(p):
>             # isfile follows symlinks and returns False for special
>             # files, so it's not a reliable guide of whether we can
>             # call os.remove.
>             os.remove(p)  # Raises error if nonexistent or a directory.

I don't deny that such a function ought to exist. But it shouldn't be a 
member function on a path object IMHO.
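
As a sketch of the free-function form, assuming POSIX-style symlink behavior: the version below treats a symlink as a file to unlink rather than a tree to descend into, and uses os.path.lexists so that a broken symlink is still removed. The directory and file names in the usage part are made up for the example:

```python
import os
import shutil
import tempfile

def purge(p):
    """Remove p if present, whether file, symlink, or directory tree."""
    if os.path.isdir(p) and not os.path.islink(p):
        shutil.rmtree(p)
    elif os.path.lexists(p):
        # lexists is also true for broken symlinks, which exists() misses.
        os.remove(p)

with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "old_output")
    os.makedirs(os.path.join(target, "sub"))
    with open(os.path.join(target, "sub", "data.txt"), "w") as f:
        f.write("stale")

    purge(target)                                # removes the whole tree
    purge(os.path.join(root, "never_existed"))   # nothing there: no error
    still_there = os.path.exists(target)

print(still_there)
```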

>> 2) is there anything that the existing os.path *won't do* that we desperately need it to do?
> 
> For filesystem files, no.  Though you really mean all six modules
> above and not just os.path.  It has been proposed to support
> non-filesystem directories (zip files, CVS/Subversion sandboxes, URLs,
> FTP objects) under a new Path API.
> 
>> 3) Assuming that the answer to #1 is "yes", the next question is:
>> "evolution or revolution?"
> 
> Revolution.  It needs a clean new API.  However, this can live
> alongside the existing functions if necessary:  posixpath.PosixPath,
> path.Path, etc.
> 
>> 4) My third question is: Who are we going to steal our ideas from?
>> Boost, Java, C# and others - all are worthy of being the, ahem, target
>> of our inspiration. Or do we have some alternative that's so cool that
>> it makes sense to "Think Different(ly)"?
> 
> Java is the only one I'm familiar with.  The existing Python proposals
> are summarized below.
> 
>> 5) Must there be one ring to rule them all? I suggested earlier that we
>> might have a "low-level" and a "high-level" API, one built on top of the
>> other. Is this a good idea, or a non-starter?
> 
> It's worth discussing.  One question is whether the dichotomy does
> anything useful or just adds unnecessary complexity.  But that can
> only be answered for a specific API proposal.  Whatever we do will be
> "low-level" compared to third-party extensions that will be built on
> top of it, so we should plan for extensibility.

Actually, I was considering PEP 355 to be "high-level" and the 
current os.path to be "low-level".

-- Talin