[Python-Dev] Path object design

Sun Nov 5 17:59:33 CET 2006

On 11/5/06, Andrew Dalke <dalke at dalkescientific.com> wrote:
>
>    I agree that supporting non-filesystem directories (zip files,
>    CSV/Subversion sandboxes, URLs) would be nice, but we already have a
>    big enough project without that.  What constraints should a Path
>    object keep in mind in order to be forward-compatible with this?
>
> Is the answer therefore that URLs and URI behaviour should not
> place constraints on a Path object becuse they are sufficiently
> dissimilar from file-system paths?  Do these other non-FS hierarchical
> structures have similar differences causing a semantic mismatch?

This discussion has renforced my belief that os.path.join's behavior
is correct with non-initial absolute args:

    os.path.join('/usr/bin', '/usr/local/bin/python')

I've used that in applications and haven't found it a burden.

Its behavior with '..' seems justifiable too, and Talin's trick of
wrapping everything in os.path.normpath is a great one.

I do think join should take more care to avoid multiple slashes
together in the middle of a path, although this is really the
responsibility of the platform library, not a generic function/method.
 Join is true to its documentation of only adding separators and never
than deleting them, but that seems like a bit of sloppiness.   On the
other hand, the filesystems don't care; I don't think anybody has
mentioned a case where it actually creates a path the filesystem can't
handle.

urljoin clearly has a different job.  When we talked about extending
path to URLs, I was thinking more in terms of opening files, fetching
resources, deleting, renaming, etc. rather than split-modify-rejoin.
A hypothetical urlpath module would clearly have to follow the URL
rules.  I don't see a contradition in supporting both URL joining
rules and having a non-initial absolute argument, just to avoid
cross-"platform" surprises.  But urlpath would also need methods to
parse the scheme and host on demand, query strings, #fragments, a
class method for building a URL from the smallest parts, etc.

As for supporting path fragments and '..' in join arguments (for
filesystem paths), it's clearly too widely used to eliminate.  Users
can voluntarily refrain from passing arguments containing separators.
For cases involving a user-supplied -- possibly hostile -- path,
either a separate method (safe_join, child) could achieve this, or a
subclass implemetation that allows only safe arguments.

Regarding pathname-manipulation methods and filesystem-access methods,
I'm not sure how workable it is to have separate objects for them.

    os.mkdir(   Path("/usr/local/lib/python/Cheetah/Template.py").parent   )
    Path("/usr/local/lib/python/Cheetah/Template.py").parent.mkdir()
    FileAccess(
Path("/usr/local/lib/python/Cheetah/Template.py").parent   ).mkdir()

The first two are reasonable.  The third... who would want to do this
for every path?  How often would you reuse the FileAccess object?  I
typically create Path objects from configuration values and keep them
around for the entire application; e.g., data_dir.  Then I create
derived paths as necessary. I suppose if the FileAccess object has a
.path attribute, it could do double-duty so you wouldn't have to store
the path separately.  Is this what the advocates of two classes have
in mind?  With usage like this?

    my_file = FileAccess(   file_access_obj.path.joinpath("my_file")   )
    my_file = FileAccess(   Path(file_access_obj,path, "my_file")   )

Working on my Path implementation.  (Yes it's necessary, Glyph, at
least to me.)  It's going slow because I just got a Macintosh laptop
and am still rounding up packages to install.

-- 
Mike Orr <sluggoster at gmail.com>