PEP on path module for standard library

Reinhold Birkenfeld reinhold-birkenfeld-nospam at
Fri Jul 22 21:38:09 CEST 2005

Andrew Dalke wrote:
> Duncan Booth wrote:
>> Personally I think the concept of a specific path type is a good one, but 
>> subclassing string just cries out to me as the wrong thing to do.
> I disagree.  I've tried using a class which wasn't derived from
> a basestring and kept running into places where it didn't work well.
> For example, "open" and "mkdir" take strings as input.  There is no
> automatic coercion.

Well, as a Path object provides both open() and mkdir() functions, these use
cases are covered. And that's the point of the Path class: Every common use
you may have for a path is implemented as a method.

So, it's maybe a good thing that for uncommon uses you have to explicitly
"cast" the path to a string.

>>>> class Spam:
> ...   def __getattr__(self, name):
> ...     print "Want", repr(name)
> ...     raise AttributeError, name
> ... 
>>>> open(Spam())
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> TypeError: coercing to Unicode: need string or buffer, instance found
>>>> import os
>>>> os.mkdir(Spam())
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> TypeError: coercing to Unicode: need string or buffer, instance found
> The solutions to this are:
>   1) make the path object be derived from str or unicode.  Doing
> this does not conflict with any OO design practice (eg, Liskov
> substitution).
>   2) develop a new "I represent a filename" protocol, probably done
> via adapt().
> I've considered the second of these but I think it's a more
> complicated solution and it won't fit well with existing APIs
> which do things like
>   if isinstance(input, basestring):
>     input = open(input, "rU")
>   for line in input:
>     print line
> I showed several places in the stdlib and in 3rd party packages
> where this is used.

That's a valid point. However, if Path is not introduced as a string,
most new users will not try to use a Path instance where a string is
needed, just as you wouldn't try to pass a list where a string is wanted.

>> You should need an explicit call to convert a path to a string and that 
>> forces you when passing the path to something that requires a string to 
>> think whether you wanted the string relative, absolute, UNC, uri etc.
> You are broadening the definition of a file path to include URIs?
> That's making life more complicated.  Eg, the rules for joining
> file paths may be different than the rules for joining URIs.
> Consider if I have a file named "mail:dalke at" and I
> join that with "file://home/dalke/badfiles/".
> Additionally, the actions done on URIs are different than on file
> paths.  What should os.listdir("") do?

I agree. Path is only for local filesystem paths (well, in UNIX they could
as well be remote, but that's thanks to the abstraction the filesystem
provides, not Python).

> As I mentioned, I tried some classes which emulated file
> paths.  One was something like
> class TempDir:
>   """removes the directory when the refcount goes to 0"""
>   def __init__(self):
>     self.filename = ... use a function from the tempfile module
>   def __del__(self):
>     if os.path.exists(self.filename):
>       shutil.rmtree(self.filename)
>   def __str__(self):
>     return self.filename
> I could do
>   dirname = TempDir()
> but then instead of
>   os.mkdir(dirname)
>   tmpfile = os.path.join(dirname, "blah.txt")
> I needed to write it as
>   os.mkdir(str(dirname))
>   tmpfile = os.path.join(str(dirname), "blah.txt"))
> or have two variables, one which could delete the
> directory and the other for the name.  I didn't think
> that was good design.

I can't follow. That's clearly not a Path but a custom object of yours.
However, I would have done it differently: provide a "name" property
for the object, and don't call the variable "dirname", which is confusing.

> If I had derived from str/unicode then things would
> have been cleaner.
> Please note, btw, that some filesystems are unicode
> based and others are not.  As I recall, one nice thing
> about the path module is that it chooses the appropriate
> base class at import time.  My "str()" example above
> does not and would fail on a Unicode filesystem aware
> Python build.

There's no difference. The only points where the type of a Path
object' underlying string is decided are Path.cwd() and the
Path constructor.


More information about the Python-list mailing list