[Python-3000] Mini Path object

Talin talin at acm.org
Mon Nov 6 08:39:50 CET 2006


Mike Orr wrote:
> Posted to python-dev and python-3000.  Follow-ups to python-dev only please.
> 
> So, let's say we strip this Path class to:

I'm finally taking the time to sit down and go over this in detail. Here 
are some suggestions.

> class Path(unicode):
>     Path("foo")
>     Path(  Path("directory"),   "subdirectory", "file")    # Replaces
> .joinpath().
>     Path()

For the constructor, I would write it as:

   Path( *components )

'components' is an arbitrary number of path components, either strings 
or path objects. The components are joined together into a single path.

Strings can also be wrapped with an object that indicates that the Path 
is in a platform- or application-specific format:

    # Explicitly indicate that the path string is in Windows NTFS format.
    Path( Path.format.NTFS( "C:\\Program Files" ) )

Note that it's OK to include path separators in the string that's passed 
to the format wrapper - these will get converted.

Not including a format wrapper is equivalent to using the "local" wrapper:

    Path( Path.format.local( "C:\\Program Files" ) )

Where 'local' is an alias to the native path format for the host's 
default filesystem.

The wrapper objects are themselves classes, and need not be in the 
"Path" namespace. For example:

    import p4
    Path( p4.path.format( "//depot/files/..." ) )

This makes the set of specific path formats open-ended and extensible. 
Path format wrappers need not be built into the "Path" module. Each 
format wrapper will have a "to_path" method, that converts the specific 
path encoding into the universal path representation.

Note that if there are multiple components, they don't have to be 
wrapped the same way:

    Path( Path.format.NTFS( "C:\\Program Files" ),
          Path.format.local( "Gimp" ) )

...because the conversion to universal representation is done before the 
components are combined.
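
A minimal sketch of how such a constructor could work, assuming the
"universal representation" is simply a tuple of component strings. The
NTFS and Posix wrapper classes, the 'parts' attribute and the
'default_format' parameter are hypothetical names chosen for
illustration; the sketch also ignores absolute roots and treats a drive
such as "C:" as an ordinary component.

```python
class NTFS:
    """Hypothetical wrapper for Windows-style path strings."""

    def __init__(self, text):
        self.text = text

    def to_path(self):
        # Accept either separator and drop empty pieces.
        pieces = self.text.replace("\\", "/").split("/")
        return [p for p in pieces if p]


class Posix:
    """Hypothetical wrapper for POSIX-style path strings."""

    def __init__(self, text):
        self.text = text

    def to_path(self):
        return [p for p in self.text.split("/") if p]


class Path:
    def __init__(self, *components, default_format=Posix):
        parts = []
        for c in components:
            if isinstance(c, Path):
                # Already in universal form.
                parts.extend(c.parts)
            elif isinstance(c, str):
                # Bare strings go through the default wrapper.
                parts.extend(default_format(c).to_path())
            else:
                # Anything else is assumed to be a format wrapper.
                parts.extend(c.to_path())
        self.parts = tuple(parts)

    def __eq__(self, other):
        return isinstance(other, Path) and self.parts == other.parts

    def __hash__(self):
        return hash(self.parts)

    def __repr__(self):
        return "Path(%s)" % ", ".join(repr(p) for p in self.parts)
```

Because each component is converted before joining, mixing wrappers in
one call needs no special handling.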

One question to be asked is whether the path should be simplified or 
not. There are cases where you *don't* want the path to be simplified, 
and other cases where you do. Perhaps a keyword argument?

    Path( "C:\\Program Files", "../../Gimp", normalize = True )

>     Path.cwd()

No objection here.

>     Path("ab") + "c"  => Path("abc")

Wouldn't that be:

    Path( "ab" ) + "c" => Path( "ab", "c" )

?

It seems that the most common operation is concatenating components, not 
characters, although both should be easy.

>     .abspath()

I've always thought this was a strange function. To be honest, I'd 
rather explicitly pass in the cwd().

>     .normcase()

So the purpose of this function is to get around the fact that on some 
platforms, comparisons between paths are case-sensitive, and on other 
platforms not. However, the reason this function seems weird to me is 
that most case-insensitive filesystems are case-preserving, which makes 
me think that the real solution is to fix the comparison functions 
rather than mangling the string. (Although there's a hitch - it's hard to 
make a case-insensitive dictionary that doesn't require a downcased 
copy of the key; something I've long wanted is a userdict that allows 
both the comparison and hash functions to be replaced, but that's a 
different topic.)
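
To illustrate the hitch, here is a rough sketch of a mapping with a
replaceable key function. KeyedDict and its 'keyfunc' parameter are
hypothetical names. Note that it does keep a transformed copy of each
key internally, which is exactly the cost described above; what it adds
is that the original spelling is preserved for iteration.

```python
from collections.abc import MutableMapping


class KeyedDict(MutableMapping):
    """Mapping that compares keys via keyfunc but remembers the original key."""

    def __init__(self, keyfunc):
        self._keyfunc = keyfunc
        self._data = {}  # keyfunc(key) -> (original key, value)

    def __getitem__(self, key):
        return self._data[self._keyfunc(key)][1]

    def __setitem__(self, key, value):
        # The most recently written spelling of the key is the one kept.
        self._data[self._keyfunc(key)] = (key, value)

    def __delitem__(self, key):
        del self._data[self._keyfunc(key)]

    def __iter__(self):
        return (original for original, _ in self._data.values())

    def __len__(self):
        return len(self._data)
```

With str.casefold as the key function, "README.txt" and "readme.TXT"
refer to the same entry while iteration still yields the stored
spelling.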

>     .normpath()

I'd rename this to "simplify", since it no longer needs to normalize the 
separator chars. (That's done by the wrappers.)
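
A sketch of what "simplify" might do once the path is already a
sequence of components. The function name follows the suggestion
above; operating on a plain tuple of strings is an assumption made for
illustration. The simplification is purely textual, so it can change
the meaning of a path that crosses symbolic links, which is one reason
to keep it optional.

```python
def simplify(parts):
    """Collapse '.' and '..' components without touching the filesystem."""
    out = []
    for part in parts:
        if part == ".":
            continue
        if part == ".." and out and out[-1] != "..":
            # Cancel the previous real component.
            out.pop()
        else:
            # Keep ordinary names, and leading '..' that cannot be cancelled.
            out.append(part)
    return tuple(out)
```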

>     .realpath()

Rename to resolve() or resolvelinks().

>     .expanduser()
>     .expandvars()
>     .expand()

Replace with expand( user=True, vars=True )
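
One possible shape for the combined call, written as a free function
over plain strings for brevity. It only delegates to the existing
os.path.expanduser and os.path.expandvars; the order in which the two
are applied is an assumption.

```python
import os.path


def expand(path, user=True, vars=True):
    """Expand '~' and/or environment variables, as selected by the flags."""
    if user:
        path = os.path.expanduser(path)
    if vars:
        path = os.path.expandvars(path)
    return path
```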

>     .parent

If parent were a function, you could pass in the number of levels to go 
up, e.g. parent( 2 ) to get the grandparent.
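
A small sketch of that idea over a tuple of components. The default of
one level and the behaviour of stopping at the empty path instead of
raising are assumptions.

```python
def parent(parts, levels=1):
    """Return the components with the last `levels` entries removed."""
    if levels < 0:
        raise ValueError("levels must be non-negative")
    parts = tuple(parts)
    # Never go above the top: clamp at the empty tuple.
    return parts[:max(len(parts) - levels, 0)]
```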

>     .name                 # Full filename without path
>     .namebase        # Filename without extension

I find the term 'name' ambiguous. How about:

     .filepart
     .basepart

or:

     .filename
     .basename

>     .ext

No problem with this.

>     .drive

Do we need to somehow unify the concept of 'drive' and 'unc' part? Maybe 
'.device' could return the part before the first directory name.

>     .splitpath()

I'd like to replace this with:

    .component( slice_object )

where the semantics of 'component' are identical to __getitem__ on an 
array or tuple. So for example:

    Path( "a", "b" ).component( 0 ) => "a"
    Path( "a", "b" ).component( 1 ) => "b"
    Path( "a", "b" ).component( -1 ) => "b"
    Path( "a", "b" ).component( 0:2 ) => Path( "a", "b" )
    Path( "a", "b" ).component( 1: ) => Path( "b" )

This is essentially the same as the "slice notation" proposal given 
earlier, except that it explicitly tells the user that we are dealing 
with path components, not characters.
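
A sketch of how component() could be written. Python only accepts the
start:stop notation inside square brackets, so a method call has to be
given an explicit slice object. The Components class is a hypothetical
stand-in for Path that just stores a tuple.

```python
class Components:
    """Hypothetical stand-in for Path holding a tuple of components."""

    def __init__(self, *parts):
        self.parts = tuple(parts)

    def component(self, index):
        if isinstance(index, slice):
            # A slice yields a new path object, like slicing a tuple.
            return Components(*self.parts[index])
        # An integer yields a single component string.
        return self.parts[index]

    def __eq__(self, other):
        return isinstance(other, Components) and self.parts == other.parts

    def __hash__(self):
        return hash(self.parts)
```

Usage would then look like component( slice( 1, None ) ) for "everything
after the first component".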

>     .stripext()

How about:

     path.ext = ''

>     .splitunc()
>     .uncshare

See above - UNC shouldn't be a special case.

>     .splitall()

Something sadly lacking in os.path.

>     .relpath()

Again, I'd rather that they pass in the cwd() explicitly. But I would 
like to see something like:

    .relativeto( path )

...which computes the minimal relative path that goes from 'self' to 'path'.
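
A sketch of the computation on component tuples, assuming both paths
are expressed from the same root, are already simplified, and that
'self' names a directory. The function form and argument names are
illustrative only.

```python
def relativeto(start, target):
    """Return the components leading from `start` to `target`."""
    start, target = tuple(start), tuple(target)
    # Count the leading components the two paths share.
    common = 0
    for a, b in zip(start, target):
        if a != b:
            break
        common += 1
    # Climb out of what is left of `start`, then descend into `target`.
    return ("..",) * (len(start) - common) + target[common:]
```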

>     .relpathto()

Not sure what this does, since there's no argument defined.

Additional methods:

     .format( wrapper_class )

...converts the path into a filesystem-specific format. You can also get 
the same effect by "wrapping" the path object and calling str():

    str( Path.format.NTFS( Path( "a", "b", "c" ) ) )

Although it's a bit cumbersome.
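
A sketch of how the two spellings could relate, assuming the wrapper's
str() joins the components with the format's separator. The class
names, the 'separator' attribute and the 'parts' tuple are hypothetical.

```python
class FormatWrapper:
    """Hypothetical base class: renders a path with a given separator."""

    separator = "/"

    def __init__(self, path):
        self.path = path

    def __str__(self):
        return self.separator.join(self.path.parts)


class NTFS(FormatWrapper):
    separator = "\\"


class Posix(FormatWrapper):
    separator = "/"


class Path:
    def __init__(self, *parts):
        self.parts = tuple(parts)

    def format(self, wrapper_class):
        # Shorthand for wrapping the path and calling str() on the result.
        return str(wrapper_class(self))
```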

-- Talin

