[Python-ideas] PEP 428 - object-oriented filesystem paths
Guido van Rossum
guido at python.org
Sat Oct 6 19:44:37 CEST 2012
On Fri, Oct 5, 2012 at 11:25 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> This PEP is a resurrection of the idea of having object-oriented
> filesystem paths in the stdlib. It comes with a general API proposal
> as well as a specific implementation (*). The implementation is young
> and discussion is quite open.
Thanks for getting this started! I haven't read the whole PEP or the
whole thread, but I like many of the principles, such as not deriving
from existing built-in types (str or tuple), immutability, explicitly
caring about OS differences, and distinguishing between pure and
impure (I/O-using) operations. (Though admittedly I'm not super-keen
on the specific term "pure".)
I can't say I'm thrilled about overloading p[s], but I can't get too
excited about p/s either; p+s makes more sense but that would raise the
question of how to append an extension to a path (transforming e.g.
'foo/bar' to 'foo/bar.py' by appending '.py'). At the same time I'm
not in the camp that says you can't use / because it's not division.
But rather than diving right into the syntax, I would like to focus on
some use cases. (Some of this may already be in the PEP, my
apologies.) Some things I care about (based on path manipulations I
remember I've written at some point or another):
- Distinguishing absolute paths from relative paths; this affects
joining behavior as for os.path.join().
- Various normal forms that can be used for comparing paths for
equality; there should be a pure normalization as well as an impure
one (like os.path.realpath()).
- An API that encourages Unix lovers to write code that is most likely
also to make sense on Windows.
- An API that encourages Windows lovers to write code that is most
likely also to make sense on Unix.
- Integration with fnmatch (pure) and glob (impure).
- In addition to stat(), some simple derived operations like
getmtime(), getsize(), islink().
- Easy checks and manipulations (applying to the basename) like "ends
with .pyc", "starts with foo", "ends with .tar.gz", "replace .pyc
extension with .py", "remove trailing ~", "append .tmp", "remove
leading @", and so on.
- While it's nice to be able to ask for "the extension" it would be
nice if the checks above would not be hardcoded to use "." as a
separator; and it would be nice if the extension-parsing code could
deal with multiple extensions and wasn't confused by names starting or
ending with a dot.
- Matching patterns against directory names (e.g. "does not contain a
segment named .hg").
- A matching notation based on glob/fnmatch syntax instead of regular
expressions.
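
A rough sketch of a few of these use cases, written against modules that already exist (posixpath and fnmatch) rather than against the API proposed in the PEP. The helper names replace_suffix, split_extensions and has_segment are hypothetical, invented here for illustration, and the rule that every separator after the first non-dot character starts an extension is only one possible assumption:

```python
import fnmatch
import posixpath


def replace_suffix(name, old, new):
    """Swap a trailing `old` for `new`; return `name` unchanged if it does not end with `old`."""
    if old and name.endswith(old):
        return name[:-len(old)] + new
    return name


def split_extensions(name, sep="."):
    """Split a basename into (stem, [extensions]).

    Assumption: leading separators belong to the stem, and a name that
    ends with the separator has no extension at all.
    """
    lead = len(name) - len(name.lstrip(sep))
    body = name[lead:]
    if not body or body.endswith(sep):
        return name, []
    parts = body.split(sep)
    return name[:lead] + parts[0], [sep + p for p in parts[1:]]


def has_segment(path, segment):
    """Pure check: does any '/'-separated component equal `segment`?"""
    return segment in path.split("/")


path = "src/pkg/module.pyc"
head, base = posixpath.split(path)

# "replace .pyc extension with .py", applied to the basename only
renamed = posixpath.join(head, replace_suffix(base, ".pyc", ".py"))

# "remove trailing ~" and "append .tmp"
without_tilde = replace_suffix("backup~", "~", "")
with_tmp = base + ".tmp"

# glob-style matching on the basename (pure, no filesystem access)
is_pyc = fnmatch.fnmatchcase(base, "*.pyc")

# absolute vs. relative paths change joining behavior
joined_relative = posixpath.join("/usr", "lib")
joined_absolute = posixpath.join("/usr", "/etc")

# pure normalization (no symlink resolution, unlike os.path.realpath())
normalized = posixpath.normpath("a/./b/../c")
```

With these definitions, split_extensions("archive.tar.gz") gives ("archive", [".tar", ".gz"]), while ".bashrc" and "foo." are both reported as having no extension. A name such as "version-1.2.tar.gz" would still be split at every dot, which is the kind of ambiguity a real API would have to decide on.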
PS. Another occasional use for "posix" style paths I have found is
manipulating the path portion of a URL. There are some posix-like
features, e.g. the interpretation of trailing / as "directory", the
requirement of leading / as root, the interpretation of "." and "..",
and the notion of relative paths (although path joining is different).
It would be nice if the "pure" posix path class could be reused for
this purpose, or if a related class with a subset or superset of the
same methods existed. This may influence the basic design somewhat in
showing the need for custom subclasses etc.
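
To illustrate the URL case with what exists today, here is a small sketch using posixpath together with urllib.parse (the example.com URLs are placeholders). It shows where posix path semantics carry over to URL paths and where they do not:

```python
import posixpath
from urllib.parse import urljoin, urlsplit

url = "http://example.com/a/b/../c/?q=1"

# The path portion of the URL, untouched.
url_path = urlsplit(url).path

# Pure posix normalization resolves ".." but also drops the trailing "/",
# which in a URL carries the "directory" meaning.
normalized = posixpath.normpath(url_path)

# Joining differs: a filesystem join appends to the last component,
# while URL joining replaces it unless the base ends with "/".
joined_as_path = posixpath.join("/a/b", "c")
joined_as_url = urlsplit(urljoin("http://example.com/a/b", "c")).path
```

Here url_path is "/a/b/../c/", normalized is "/a/c", joined_as_path is "/a/b/c" and joined_as_url is "/a/c". The lost trailing slash and the different joining rule are two places where a URL-oriented class would presumably need to override the pure posix behavior.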
--
--Guido van Rossum (python.org/~guido)