[Python-ideas] PEP 428 - object-oriented filesystem paths

Paul Moore p.f.moore at gmail.com
Fri Oct 5 21:19:12 CEST 2012


On 5 October 2012 19:25, Antoine Pitrou <solipsis at pitrou.net> wrote:
> A path can be joined with another using the ``__getitem__`` operator::
>
>     >>> p = PurePosixPath('foo')
>     >>> p['bar']
>     PurePosixPath('foo/bar')
>     >>> p[PurePosixPath('bar')]
>     PurePosixPath('foo/bar')

There is a risk that this is too "cute". However, it's probably better
than overloading the '/' operator, and you do need something short.

> As with constructing, multiple path components can be specified at once::
>
>     >>> p['bar/xyzzy']
>     PurePosixPath('foo/bar/xyzzy')

That's risky. Are you proposing always using '/' regardless of OS? I'd
have expected os.sep (so \ on Windows). On the other hand, that would
make

p['bar\\baz']

mean two different things on Windows and Unix - two extra path levels on
Windows, but only one on Unix (a single filename containing a backslash).
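
To make the ambiguity concrete, here is a small sketch. It assumes the pathlib module as it later shipped in the standard library, where the Windows class is called PureWindowsPath rather than the PureNTPath of this draft, and it uses the constructor rather than the proposed ``__getitem__`` syntax. The variable names are just for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath

# One string containing a single backslash between 'bar' and 'baz'.
component = 'bar\\baz'

# On the POSIX flavour the backslash is an ordinary filename character.
posix_parts = PurePosixPath('foo', component).parts

# On the Windows flavour the backslash is a separator.
windows_parts = PureWindowsPath('foo', component).parts

print(posix_parts)    # ('foo', 'bar\\baz')
print(windows_parts)  # ('foo', 'bar', 'baz')
```

So the same string adds one level under POSIX rules and two under Windows rules.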

It would probably be better to allow tuples as arguments:

p['bar', 'baz']
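
A rough sketch of how the tuple form could work. SketchPath is a made-up stand-in, not the PEP's class, and it deliberately does no separator parsing at all, so each argument is exactly one component.

```python
class SketchPath:
    """Hypothetical toy path class, only to illustrate tuple indexing."""

    def __init__(self, *components):
        self._components = tuple(components)

    def __getitem__(self, key):
        # p['bar'] passes a str; p['bar', 'baz'] passes a tuple.
        if not isinstance(key, tuple):
            key = (key,)
        return SketchPath(*self._components, *key)

    def __repr__(self):
        return "SketchPath(%r)" % "/".join(self._components)


p = SketchPath('foo')
joined = p['bar', 'baz']
print(joined)  # SketchPath('foo/bar/baz')
```

With this approach the caller never has to spell a separator, so the Windows/Unix question does not come up for the multi-component case.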


> Properties
> ----------
>
> Five simple properties are provided on every path (each can be empty)::
>
>     >>> p = PureNTPath('c:/pathlib/setup.py')
>     >>> p.drive
>     'c:'
>     >>> p.root
>     '\\'
>     >>> p.anchor
>     'c:\\'
>     >>> p.name
>     'setup.py'
>     >>> p.ext
>     '.py'

I don't like the way the distinction between "root" and "anchor" works
here. Unix users are never going to use "anchor", as "root" is the
natural term, and it does exactly the right thing on Unix. So code
written on Unix will tend to do the wrong thing on Windows (where
generally you'd want to use "anchor" or you'll find yourself switching
accidentally to the current drive).
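
A sketch of the kind of bug I mean. It assumes the pathlib module as it later shipped in the standard library (PureWindowsPath instead of this draft's PureNTPath, with the same ``drive``, ``root`` and ``anchor`` properties). The from_root and from_anchor names are just for illustration.

```python
from pathlib import PureWindowsPath

p = PureWindowsPath('c:/pathlib/setup.py')

# Rebuilding from "root" drops the drive, so the result is relative to
# whatever drive happens to be current when the path is used.
from_root = PureWindowsPath(p.root, 'pathlib')

# Rebuilding from "anchor" keeps the drive.
from_anchor = PureWindowsPath(p.anchor, 'pathlib')

print(repr(from_root.drive), str(from_root))      # '' \pathlib
print(repr(from_anchor.drive), str(from_anchor))  # 'c:' c:\pathlib
```

On Unix both spellings give the same result, which is exactly why the mistake would go unnoticed there.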

It's a rare situation where it would matter, which on the one hand
makes it much less worth worrying about, but on the other hand means
that when bugs *do* occur, they will be very obscure :-(

Also, there is no good terminology in current use here. The only
concrete thing I can suggest is that "root" would be better used as
the term for what you're calling "anchor" as Windows users would
expect the root of "C:\foo\bar\baz" to be "C:\". The term "drive"
would be right for "C:" (although some might expect that to mean "C:\"
as well, but there's no point wasting two terms on the one concept).
It might be more practical to use a new, but explicit, term like
"driveroot" for "\". It's the same as root on Unix, and on Windows
it's fairly obviously "the root on the current drive". And by using
the coined term for the less common option, it might act as a reminder
to people that something not entirely portable is going on.

But there's no really simple answer - Windows and Unix are just different here.

> The ``parts`` property provides read-only sequence access to a path object::
>
>     >>> p = PurePosixPath('/etc/init.d')
>     >>> p.parts
>     <PurePosixPath.parts: ['/', 'etc', 'init.d']>

+1. There are lots of times I have wished os.path had this.
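
For comparison, this is roughly what you have to write today to get the same list out of os.path. It is only a sketch: split_all is a made-up helper name, and it uses posixpath directly so that it behaves the same on any platform.

```python
import posixpath


def split_all(path):
    """Split a POSIX-style path into its components using only posixpath.split()."""
    parts = []
    while True:
        head, tail = posixpath.split(path)
        if head == path:
            # split() made no progress: we are at the root or at an empty string.
            if head:
                parts.append(head)
            break
        if tail:
            parts.append(tail)
        path = head
    parts.reverse()
    return parts


print(split_all('/etc/init.d'))  # ['/', 'etc', 'init.d']
```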

> Windows paths handle the drive and the root as a single path component::
>
>     >>> p = PureNTPath('c:/setup.py')
>     >>> p.parts
>     <PureNTPath.parts: ['c:\\', 'setup.py']>
>     >>> p.root
>     '\\'
>     >>> p.parts[0]
>     'c:\\'
>
> (separating them would be wrong, since ``C:`` is not the parent of ``C:\\``).

This again suggests to me that "C:\" is more closely allied to the
term "root" here.

Also, I assume that paths will be comparable, using case sensitivity
appropriate to the platform. Presumably a PurePath and a Path are
comparable, too. What about a PosixPath and an NTPath? Would you
expect them to be comparable or not?
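
One possible set of answers, shown as a sketch. It assumes the behaviour of the pathlib module as it later shipped in the standard library (with PureWindowsPath in place of this draft's PureNTPath), not anything stated in the draft itself. The variable names are just for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Same flavour: case handling follows the flavour's rules.
windows_equal = PureWindowsPath('FOO') == PureWindowsPath('foo')  # case-insensitive
posix_equal = PurePosixPath('FOO') == PurePosixPath('foo')        # case-sensitive

# Different flavours: never equal, and ordering is refused.
cross_equal = PurePosixPath('foo') == PureWindowsPath('foo')
try:
    PurePosixPath('foo') < PureWindowsPath('foo')
    cross_ordering = 'allowed'
except TypeError:
    cross_ordering = 'TypeError'

print(windows_equal, posix_equal, cross_equal, cross_ordering)
# True False False TypeError
```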

But in general, this looks like a pretty good proposal. Having a
decent path abstraction in the stdlib would be great.

Paul.


