[Python-3000] Mini Path object

Mike Orr sluggoster at gmail.com
Sun Nov 26 10:55:44 CET 2006


Status update and questions about root splitting.

I've got a path module written (called unipath) and am in the middle
of the unit tests.  When they're done I'll release it for review.
It's been slow because I was setting up two computers at the same time.

I tried to make a separate PathAlgebra class and FSPath class, but it
got so unwieldy to use that I made the latter a subclass.  They're now
called PathName and Path.  The main issues are:

    - Should FSPath.cwd(), .walk(), .rel_path_to(), etc. return a
      PathAlgebraObject or an FSPath?  The use cases seem evenly
      divided.

    - It gets ugly when you have to deconstruct an FSPath to modify
      it, then call the constructor again to create a new FSPath to
      use it.  Users would complain at:
             my_directory = FSPath(PathName("...")).mkdir()
      or:
             my_directory = FSPath("...")
             FSPath(my_directory.path, "subdir").mkdir()

Since I really don't think I want to use the non-subclass version in
my own applications and thus have little incentive to work on it,
maybe someone else who really wants that model for the stdlib can take
over implementing that part.

* * *
But the main question I wanted to ask about was root splitting.  I'm
going for the following API:

       p = PathName(*joinable_components)
       p.parent
       p.name
       p.ext
       p.name_without_ext
       p.drive
       p.unc
       p.split_root()  ->  (root, rest)
       p.components()  ->  [root, component1, component2, ...]

The idea being that you'll often want to make a path relative and then
attach it to a new parent, add/remove components, etc.
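
A minimal sketch of that workflow, for Posix-style paths only.  The
components() helper below is a hypothetical stand-in for the proposed
method, not unipath's code, and it ignores drive and UNC roots:

```python
import posixpath

def components(path):
    """Hypothetical helper: return [root, component1, component2, ...]."""
    root = "/" if path.startswith("/") else ""
    rest = path[len(root):]
    # Drop empty pieces produced by repeated or trailing slashes.
    return [root] + [c for c in rest.split("/") if c]

parts = components("/usr/lib/python/site.py")

# Make the path relative by dropping the root, then attach a new parent.
moved = posixpath.join("/backup", *parts[1:])

# Rejoining all components, root included, gives back the original path.
rebuilt = posixpath.join(*parts)
```

Keeping the root as the first element means a relative path simply has
"" in that slot, so the same slicing works for both kinds of path.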

My idea, based on Noam's proposal, was to consider the root:
    /
    \
    c:\
    C:
    \\unc_share\
or "" for a relative path.  That way, root + rest would always be
concatenation without a separator.  But os.path.split* don't work like
this.

>>> ntpath.splitdrive(r"C:\dir\subdir")
('C:', '\\dir\\subdir')
>>> ntpath.splitunc(r"\\unc_share\foo\bar\baz")
('\\\\unc_share\\foo', '\\bar\\baz')

There is no method to split a slash root (/ or \) so I have to do that manually.

To realize my original idea, in the drive and UNC cases I'd have to
move the intervening slash from the remainder to the root.  But this
would make the root different from what p.drive and p.unc return -- if
that matters.  Do we even need p.drive and p.unc if we have
p.split_root() that handles all kinds of roots?
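
One way the slash-moving idea could look, as a sketch rather than the
actual unipath implementation.  The function name split_root is taken
from the API list above; everything else is an assumption.  It relies
only on ntpath.splitdrive(), which in later Python versions also
handles UNC prefixes (ntpath.splitunc() was eventually removed), so
the UNC root here includes the share name:

```python
import ntpath

def split_root(path):
    """Sketch: split an NT-style path so that root + rest == path."""
    drive, rest = ntpath.splitdrive(path)
    # Move one leading separator from the remainder into the root,
    # so the two halves can be rejoined by plain concatenation.
    if rest[:1] in ("\\", "/"):
        return drive + rest[0], rest[1:]
    # Relative path, or drive-relative such as "C:foo".
    return drive, rest

drive_abs = split_root(r"C:\dir\subdir")
drive_rel = split_root(r"C:foo")
slash_root = split_root(r"\dir\subdir")
relative = split_root(r"dir\subdir")
unc = split_root(r"\\unc_share\foo\bar\baz")
```

With this shape the drive-absolute root ("C:" plus a backslash) differs
from what splitdrive() reports ("C:"), which is exactly the
inconsistency raised above.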

Do we need the .split* methods if the properties/methods above handle
all the use cases?  It would be cleaner not to include them but
somebody will whine "where is my method?"  It's also slightly less
efficient to do (p.parent, p.name) or (p.name_without_ext, p.ext)
because you're implicitly doing the same split twice -- if that
matters.
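
For comparison, a small illustration of the two spellings using the
existing posixpath functions (the example path is made up).  A single
split call returns both pieces, while the property-style spelling asks
for each piece separately:

```python
import posixpath

p = "/home/user/report.tar.gz"

# One call yields both halves.
parent, name = posixpath.split(p)
stem, ext = posixpath.splitext(name)

# dirname() and basename() each locate the last separator on their own,
# so getting both pieces this way scans the path twice.
same = (posixpath.dirname(p), posixpath.basename(p))
```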

On the other hand, os.path.join() handles all this, adding a separator
when necessary, and even handling "" as the first argument.  So maybe
I should just let os.path.join() do its magic and be happy.
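
A quick check of that behaviour with a relative ("") root and a slash
root, calling posixpath and ntpath directly so it runs the same on any
platform.  The path values are only illustrative:

```python
import ntpath
import posixpath

# An empty first argument leaves the path relative.
relative = posixpath.join("", "foo/bar.py")
nt_relative = ntpath.join("", "foo\\bar.py")

# A root that already ends in a separator does not get a second one.
absolute = posixpath.join("/", "foo/bar.py")
nt_absolute = ntpath.join("\\", "foo\\bar.py")
```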

Noam distinguished between drive-relative (r"C:foo") and
drive-absolute (r"C:\foo") paths.  I'm treating the former as absolute
because I don't think it matters in this context.  os.path.join() does
not add a slash with drive letters:

>>> ntpath.join(r"C:", "foo")
'C:foo'
>>> ntpath.join(r"C:\ "[:-1], "foo")
'C:\\foo'

(The latter funny syntax is to get around the fact that you can't have
a backslash at the end of a raw string literal.)

So with drive letters it's vital not to add a backslash, and to trust
that the root portion ends with a backslash if necessary.

* * *
I've put some work into converting paths:
NTPath(PosixPath("foo/bar.py")), using the .components() format as a
universal intermediary, and refusing to convert absolute paths.  This
has taken some work to implement and several paragraphs to document,
and I still haven't tested to see if it's correct.  I'm wondering if
it's becoming overkill for the original goal and anticipated use
cases.  We thought, "Yeah, it would be a good idea to support
non-native paths and to convert paths."  Then Talin, who wanted it,
said he was mainly concerned about putting Posix paths in config files
and having them work on other platforms (Windows on embedded systems
that don't have a current directory), without having to manually
convert them.  But os.path.normpath() and os.path.normcase() already
convert Posix slashes to Windows backslashes.  Maybe that's enough?
Is there a need to go the other way or do other conversions?
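
To make the question concrete, here is what the existing functions do
with a Posix-style relative path such as one read from a config file
(the path is made up).  ntpath and posixpath are called explicitly so
the result does not depend on the platform running it:

```python
import ntpath
import posixpath

config_value = "plugins/images/logo.png"

# The NT flavour rewrites forward slashes as backslashes.
as_windows = ntpath.normpath(config_value)

# normcase() does the same and also lowercases the path.
lowered = ntpath.normcase("Plugins/Images/Logo.PNG")

# The Posix flavour does not go the other way: a backslash is just an
# ordinary filename character there, so the string comes back unchanged.
unchanged = posixpath.normpath("plugins\\images\\logo.png")
```

So the Posix-to-Windows direction is already covered for relative
paths; the reverse direction is the one that would need new code.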

* * *
Path APIs are really like web frameworks in Python: there's no way
everybody's going to agree on one, and it's so easy to write a
different one.  So I'm factoring the more complex code into a function
library that can be used as a back end to other path APIs.

-- 
Mike Orr <sluggoster at gmail.com>

