On Sun, Oct 7, 2012 at 10:37 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
On Sat, 6 Oct 2012 10:44:37 -0700 Guido van Rossum <guido@python.org> wrote:
But rather than diving right into the syntax, I would like to focus on some use cases. (Some of this may already be in the PEP, my apologies.) Some things I care about (based on path manipulations I remember I've written at some point or another):
- Distinguishing absolute paths from relative paths; this affects joining behavior as for os.path.join().
The proposed API does function like os.path.join() in that respect: when joining a relative path to an absolute path, the relative path is simply discarded:
    >>> p = PurePath('a')
    >>> q = PurePath('/b')
    >>> p[q]
    PurePosixPath('/b')
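For comparison, here is a minimal sketch of the same joining rule using the pathlib module as it later shipped in the standard library (Python 3.4+). This is an assumption about which API the reader has available: the shipped module spells joining with the `/` operator (or joinpath()) rather than the p[q] indexing shown in the draft above.

```python
import posixpath
from pathlib import PurePosixPath

# os.path-style joining: an absolute second argument discards the first.
joined_str = posixpath.join('a', '/b')

# Shipped pathlib spelling (not the draft p[q] syntax): same rule applies.
joined_path = PurePosixPath('a') / PurePosixPath('/b')

# Joining a relative path onto an absolute one keeps both parts.
joined_rel = PurePosixPath('/a') / 'b'

print(joined_str)   # /b
print(joined_path)  # /b
print(joined_rel)   # /a/b
```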
- Various normal forms that can be used for comparing paths for equality; there should be a pure normalization as well as an impure one (like os.path.realpath()).
Impure normalization is done with the resolve() method:
    >>> os.chdir('/etc')
    >>> Path('ssl/certs').resolve()
    PosixPath('/etc/pki/tls/certs')
(/etc/ssl/certs being a symlink to /etc/pki/tls/certs on my system)
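Since the /etc layout above is specific to one machine, a self-contained sketch may help. It assumes the shipped pathlib module (Python 3.4+) and a platform where the current user may create symlinks; the directory names `real_certs` and `certs` are made up for illustration.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Resolve the base first, in case the temp dir itself sits behind a symlink.
    base = Path(tmp).resolve()

    target = base / 'real_certs'   # hypothetical real directory
    target.mkdir()
    link = base / 'certs'          # hypothetical symlink pointing at it
    link.symlink_to(target, target_is_directory=True)

    # Impure normalization: both calls consult the filesystem.
    resolved = link.resolve()
    realpath = Path(os.path.realpath(link))

    same = (resolved == target) and (realpath == target)

print(same)
```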
Pure comparison already obeys case-sensitivity rules as well as the different path separators:
    >>> PureNTPath('a/b') == PureNTPath('A\\B')
    True
    >>> PurePosixPath('a/b') == PurePosixPath('a\\b')
    False
Note the case information isn't lost either:
    >>> str(PureNTPath('a/b'))
    'a\\b'
    >>> str(PureNTPath('A/B'))
    'A\\B'
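The same comparisons can be tried with the shipped pathlib module, with one naming caveat: the class called PureNTPath in the draft above is named PureWindowsPath in the standard library. A small sketch under that assumption:

```python
from pathlib import PurePosixPath, PureWindowsPath

# Windows flavour: case-insensitive, and '/' and '\' are both separators.
windows_equal = PureWindowsPath('a/b') == PureWindowsPath('A\\B')

# POSIX flavour: case-sensitive, and a backslash is an ordinary character.
posix_equal = PurePosixPath('a/b') == PurePosixPath('a\\b')

# The original case survives; only the separator is normalized.
preserved = str(PureWindowsPath('A/B'))

print(windows_equal, posix_equal, preserved)
```

Because both flavours are pure, this runs the same way on any operating system.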
- An API that encourages Unix lovers to write code that is most likely also to make sense on Windows.
- An API that encourages Windows lovers to write code that is most likely also to make sense on Unix.
I agree on these goals, that's why I'm trying to avoid system-specific methods. For example is_reserved() is also defined under Unix, it just always returns False:
    >>> PurePosixPath('CON').is_reserved()
    False
    >>> PureNTPath('CON').is_reserved()
    True
- Integration with fnmatch (pure) and glob (impure).
This is provided indeed, with the match() and glob() methods respectively.
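As a rough illustration of the pure / impure split, here is a sketch using the shipped pathlib module together with fnmatch. The file names are invented, and a temporary directory stands in for a real project tree.

```python
import fnmatch
import tempfile
from pathlib import Path, PurePosixPath

# Pure: pattern matching on the path string only, no filesystem access.
pure_match = PurePosixPath('pkg/setup.py').match('*.py')
fn_match = fnmatch.fnmatch('setup.py', '*.py')

# Impure: glob() lists what actually exists on disk.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'setup.py').write_text('')
    (root / 'README').write_text('')
    found = sorted(p.name for p in root.glob('*.py'))

print(pure_match, fn_match, found)
```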
- In addition to stat(), some simple derived operations like getmtime(), getsize(), islink().
The PEP proposes properties mimicking the stat object attributes:
    >>> p = Path('setup.py')
    >>> p.st_size
    977
    >>> p.st_mtime
    1349461817.8768747
And methods to query the file type:
    >>> p.is_symlink()
    False
    >>> p.is_file()
    True
Perhaps the properties / methods mix isn't very consistent.
I would warn about caching these results on the path object. I can easily imagine cases where I want to repeatedly call stat() because I'm waiting for a file to change (e.g. tail -f does something like this). I would prefer to have a stat() method that always calls os.stat(), and no caching of the results; the user can cache the stat() return value. (Maybe we can add is_file() etc. as methods on stat() results now they are no longer just tuples?)
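To make the no-caching point concrete, here is a minimal sketch in which every check issues a fresh os.stat() call and the caller decides what to remember. The helper name `file_signature` and the file name `log.txt` are hypothetical; the file-type check uses the existing stat.S_ISREG() helper, not any proposed method on stat results.

```python
import os
import stat
import tempfile

def file_signature(path):
    """Return (mtime_ns, size) from a fresh os.stat() call; nothing is cached."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'log.txt')
    with open(path, 'w') as f:
        f.write('first\n')

    # The caller, not the path object, holds on to the earlier result.
    before = file_signature(path)

    with open(path, 'a') as f:
        f.write('second\n')

    after = file_signature(path)
    changed = before != after

    # File-type query derived from a stat result.
    is_regular = stat.S_ISREG(os.stat(path).st_mode)

print(changed, is_regular)
```

A tail -f style loop would call file_signature() repeatedly and compare against the last value it saw.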
- Easy checks and manipulations (applying to the basename) like "ends with .pyc", "starts with foo", "ends with .tar.gz", "replace .pyc extension with .py", "remove trailing ~", "append .tmp", "remove leading @", and so on.
I'll try to reconcile this with Ben Finney's suffix / suffixes proposal.
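For reference, several of the basename manipulations listed above can be sketched with the shipped pathlib module, which exposes `name`, `suffix`, `suffixes`, with_suffix() and with_name(). The helper names below are made up for illustration and are not part of any proposal in this thread.

```python
from pathlib import PurePosixPath

def strip_trailing(path, text):
    """Remove one trailing occurrence of text from the basename, if present."""
    if text and path.name.endswith(text):
        return path.with_name(path.name[:-len(text)])
    return path

def strip_leading(path, text):
    """Remove one leading occurrence of text from the basename, if present."""
    if text and path.name.startswith(text):
        return path.with_name(path.name[len(text):])
    return path

archive = PurePosixPath('build/foo.tar.gz')
ends_with_tar_gz = archive.suffixes[-2:] == ['.tar', '.gz']
starts_with_foo = archive.name.startswith('foo')

pyc = PurePosixPath('pkg/mod.pyc')
ends_with_pyc = pyc.suffix == '.pyc'
as_py = pyc.with_suffix('.py')                 # replace .pyc with .py
with_tmp = pyc.with_name(pyc.name + '.tmp')    # append .tmp

without_tilde = strip_trailing(PurePosixPath('notes.txt~'), '~')
without_at = strip_leading(PurePosixPath('args/@options'), '@')

print(as_py, with_tmp, without_tilde, without_at)
```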
- Matching on patterns on directory names (e.g. "does not contain a segment named .hg").
Sequence-like access on the parts property provides this:
    >>> p = PurePath('foo/.hg/hgrc')
    >>> '.hg' in p.parts
    True
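A short sketch of the "does not contain a segment named .hg" case, again assuming the shipped pathlib module. The helper `outside_hg` and the sample paths are hypothetical; the point is that membership in `parts` tests whole segments, so `.hgignore` is not mistaken for `.hg`.

```python
from pathlib import PurePosixPath

def outside_hg(paths):
    """Keep only the paths that have no segment named exactly '.hg'."""
    return [p for p in paths if '.hg' not in PurePosixPath(p).parts]

kept = outside_hg(['foo/.hg/hgrc', 'foo/src/main.py', 'foo/.hgignore'])
print(kept)
```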
Sounds cool. I will try to refrain from bikeshedding much more on this proposal; I'd rather focus on reactors and futures...

--
--Guido van Rossum (python.org/~guido)