[Python-ideas] PEP 428 - object-oriented filesystem paths

Guido van Rossum guido at python.org
Sun Oct 7 22:24:59 CEST 2012


On Sun, Oct 7, 2012 at 10:37 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> On Sat, 6 Oct 2012 10:44:37 -0700
> Guido van Rossum <guido at python.org> wrote:
>>
>> But rather than diving right into the syntax, I would like to focus on
>> some use cases. (Some of this may already be in the PEP, my
>> apologies.) Some things I care about (based on path manipulations I
>> remember I've written at some point or another):
>>
>> - Distinguishing absolute paths from relative paths; this affects
>> joining behavior as for os.path.join().
>
> The proposed API does function like os.path.join() in that respect:
> when joining a relative path to an absolute path, the relative path is
> simply discarded:
>
> >>> p = PurePath('a')
> >>> q = PurePath('/b')
> >>> p[q]
> PurePosixPath('/b')
>
>> - Various normal forms that can be used for comparing paths for
>> equality; there should be a pure normalization as well as an impure
>> one (like os.path.realpath()).
>
> Impure normalization is done with the resolve() method:
>
> >>> os.chdir('/etc')
> >>> Path('ssl/certs').resolve()
> PosixPath('/etc/pki/tls/certs')
>
> (/etc/ssl/certs being a symlink to /etc/pki/tls/certs on my system)
>
> Pure comparison already obeys case-sensitivity rules as well as the
> different path separators:
>
> >>> PureNTPath('a/b') == PureNTPath('A\\B')
> True
> >>> PurePosixPath('a/b') == PurePosixPath('a\\b')
> False
>
> Note the case information isn't lost either:
>
> >>> str(PureNTPath('a/b'))
> 'a\\b'
> >>> str(PureNTPath('A/B'))
> 'A\\B'
>
>> - An API that encourages Unix lovers to write code that is most likely
>> also to make sense on Windows.
>>
>> - An API that encourages Windows lovers to write code that is most
>> likely also to make sense on Unix.
>
> I agree on these goals, that's why I'm trying to avoid system-specific
> methods. For example is_reserved() is also defined under Unix, it just
> always returns False:
>
> >>> PurePosixPath('CON').is_reserved()
> False
> >>> PureNTPath('CON').is_reserved()
> True
>
>> - Integration with fnmatch (pure) and glob (impure).
>
> This is provided indeed, with the match() and glob() methods
> respectively.
>
>> - In addition to stat(), some simple derived operations like
>> getmtime(), getsize(), islink().
>
> The PEP proposes properties mimicking the stat object attributes:
>
> >>> p = Path('setup.py')
> >>> p.st_size
> 977
> >>> p.st_mtime
> 1349461817.8768747
>
> And methods to query the file type:
>
> >>> p.is_symlink()
> False
> >>> p.is_file()
> True
>
> Perhaps the properties / methods mix isn't very consistent.

I would warn about caching these results on the path object. I can
easily imagine cases where I want to repeatedly call stat() because
I'm waiting for a file to change (e.g. tail -f does something like
this). I would prefer to have a stat() method that always calls
os.stat(), and no caching of the results; the user can cache the
stat() return value. (Maybe we can add is_file() etc. as methods on
stat() results now that they are no longer just tuples?)
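
To make the polling case concrete, here is a minimal sketch that uses
only os.stat(). The helper name wait_for_change() and its interval and
timeout parameters are made up for illustration; they are not part of
the PEP. Every pass through the loop makes a fresh os.stat() call, so a
change made by another process is seen on the next poll:

```python
import os
import time


def wait_for_change(path, interval=0.5, timeout=10.0):
    """Poll os.stat() until the size or mtime of `path` changes.

    Hypothetical helper, not part of the PEP. Returns the new stat
    result, or None if nothing changed before the timeout expired.
    """
    before = os.stat(path)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        time.sleep(interval)
        # Fresh system call on every iteration; nothing is cached.
        now = os.stat(path)
        if (now.st_mtime_ns, now.st_size) != (before.st_mtime_ns, before.st_size):
            return now
    return None
```

A caller who does want caching just keeps the return value around
(st = os.stat(path), then st.st_size, st.st_mtime and so on) and decides
for itself when that snapshot has gone stale.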

>> - Easy checks and manipulations (applying to the basename) like "ends
>> with .pyc", "starts with foo", "ends with .tar.gz", "replace .pyc
>> extension with .py", "remove trailing ~", "append .tmp", "remove
>> leading @", and so on.
>
> I'll try to reconcile this with Ben Finney's suffix / suffixes proposal.
>
>> - Matching on patterns on directory names (e.g. "does not contain a
>> segment named .hg").
>
> Sequence-like access on the parts property provides this:
>
> >>> p = PurePath('foo/.hg/hgrc')
> >>> '.hg' in p.parts
> True

Sounds cool. I will try to refrain from bikeshedding much more on this
proposal; I'd rather focus on reactors and futures...

-- 
--Guido van Rossum (python.org/~guido)


