[Python-Dev] Path object design

Thu Nov 2 02:46:49 CET 2006

On 11/1/06, glyph at divmod.com <glyph at divmod.com> wrote:
>
> On 06:14 pm, fredrik at pythonware.com wrote:
> >glyph at divmod.com wrote:
> >
> >> I assert that it needs a better[1] interface because the current
> >> interface can lead to a variety of bugs through idiomatic, apparently
> >> correct usage.  All the more because many of those bugs are related to
> >> critical errors such as security and data integrity.
>
> >instead of referring to some esoteric knowledge about file systems that
> >us non-twisted-using mere mortals may not be evolved enough to
> >understand,
>
> On the contrary, twisted users understand even less, because (A) we've been
> demonstrated to get it wrong on numerous occasions in highly public and
> embarrassing ways and (B) we already have this class that does it all for us
> and we can't remember how it works :-).

This is ironic coming from one of Python's celebrity geniuses.  "We
made this class but we don't know how it works."  Actually, it's
downright alarming coming from someone who knows Twisted inside and
out yet still can't make sense of platform-specific path oddities.

>  * This is confusing as heck:
>    >>> os.path.join("hello", "/world")
>    '/world'

That's in the documentation.  I'm not sure it's "wrong".  What should
it do in this situation?  Pretend the slash isn't there?

This came up in the directory-tuple proposal.  I said there was no
reason to change the existing behavior of join.  Noam favored an
exception.
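A minimal sketch of the two positions, using posixpath so the result is the same on every platform. posixpath.join shows the existing behavior; strict_join is a hypothetical helper (not in the standard library) showing roughly what an exception-raising join could look like.

```python
import posixpath


def strict_join(base, *parts):
    """Join path components, refusing absolute ones (hypothetical helper).

    posixpath.join() silently discards everything before an absolute
    component; this variant raises ValueError instead.
    """
    for part in parts:
        if posixpath.isabs(part):
            raise ValueError(f"absolute component not allowed: {part!r}")
    return posixpath.join(base, *parts)


print(posixpath.join("hello", "/world"))  # /world  (existing behavior)
print(strict_join("hello", "world"))      # hello/world
try:
    strict_join("hello", "/world")
except ValueError as exc:
    print(exc)  # absolute component not allowed: '/world'
```

Raising is the safer choice when a component may come from untrusted input. The cost is that code relying on the current "absolute component wins" behavior would break, which is the argument for leaving join alone.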

>    >>> os.path.join("hello", "slash/world")
>    'hello/slash/world'

That has always been a loophole in the function, and many programs
depend on it.  Again, is it "wrong"?  Should an embedded separator in
an argument be an error?  Obviously this depends on the user's
knowledge that the separator happens to be slash.
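If an embedded separator should be an error, a caller can validate each component before joining. join_names is a hypothetical helper, sketched for POSIX paths only. A Windows version would also have to reject backslash, because ntpath accepts both separators. Note that the check itself depends on knowing what the separator is (posixpath.sep here).

```python
import posixpath


def join_names(base, *names):
    """Append single path components to base (hypothetical helper).

    Each name must be exactly one component: an embedded separator, an
    empty string, "." or ".." raises ValueError instead of silently
    changing which directory the result points into.
    """
    for name in names:
        if posixpath.sep in name or name in ("", ".", ".."):
            raise ValueError(f"not a single path component: {name!r}")
    return posixpath.join(base, *names)


print(posixpath.join("hello", "slash/world"))  # hello/slash/world
print(join_names("hello", "slash", "world"))   # hello/slash/world
```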

>    >>> os.path.join("hello", "slash//world")
>    'hello/slash//world'

Again, it's a question of what it "should" do.  The filesystem treats it as a
single slash.  The user didn't call normpath, so should we normalize
it anyway?
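As things stand, normalization is opt-in: join leaves the doubled slash alone and posixpath.normpath collapses it. One reason not to normalize automatically is that normpath is purely lexical. It also folds away "..", which can change the meaning of a path when the preceding component is a symbolic link (the "hello/link" name below is only illustrative).

```python
import posixpath

raw = posixpath.join("hello", "slash//world")
print(raw)                      # hello/slash//world  (join does not normalize)
print(posixpath.normpath(raw))  # hello/slash/world

# normpath looks only at the string: if hello/link were a symlink to some
# other directory, "hello/link/.." on disk would not be "hello".
print(posixpath.normpath("hello/link/../world"))  # hello/world
```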

>  * Sometimes a path isn't a path; the zip "paths" in sys.path are a good
> example.  This is why I'm a big fan of including a polymorphic interface of
> some kind: this information is *already* being persisted in an ad-hoc and
> broken way now, so it needs to be represented; it would be good if it were
> actually represented properly.  URL
> manipulation-as-path-manipulation is another; the recent
> perforce use-case mentioned here is a special case of that, I think.

Good point, but exactly what functionality do you want to see for zip
files and URLs?  Just pathname manipulation?  Or the ability to see
whether a file exists and extract it, copy it, etc?
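One concrete data point for zip files: later Python versions (3.8 and up) ship zipfile.Path, a pathlib-like view of an archive that supports both pathname manipulation and existence checks and reads. The sketch builds a throwaway archive (lib.zip, pkg/mod.py and inspect_member are made-up names for the example) and contrasts it with os.path, which only sees the archive file itself.

```python
import os
import tempfile
import zipfile


def inspect_member(archive, member):
    """Return (os_sees_it, zip_sees_it, text) for a "/"-separated member name."""
    # Treat the archive like a directory: the OS has no such file.
    os_sees_it = os.path.exists(os.path.join(archive, *member.split("/")))

    with zipfile.ZipFile(archive) as zf:
        # Build the member path one component at a time with "/".
        entry = zipfile.Path(zf)
        for part in member.split("/"):
            entry = entry / part
        zip_sees_it = entry.exists()
        text = entry.read_text() if zip_sees_it else None
    return os_sees_it, zip_sees_it, text


with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "lib.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("pkg/mod.py", "VALUE = 42\n")
    result = inspect_member(archive, "pkg/mod.py")

print(result)  # (False, True, 'VALUE = 42\n')
```

This is the polymorphic-interface idea in miniature: the same "/" and exists() operations are answered by the archive instead of the filesystem.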

>  * you have to care about unicode sometimes.  rarely enough that none of
> your tests will ever account for it, but often enough that _some_ users will
> notice breakage if your code is ever widely distributed.

This is a Python-wide problem.  The move to universal unicode will
lessen this, or at least move the problem to *one* place (creating the
unicode object), where every Python programmer will get bitten by it
and we'll develop a few standard strategies to deal with it.

(The problem is that if str and unicode are mixed in expressions,
Python will implicitly decode the str to unicode using the default
ASCII codec, and you'll get a UnicodeDecodeError if it contains
non-ASCII characters.  Figuring out all the ways such strings can
slip into a program is difficult if you're dealing with user strings
from an unknown charset, or your MySQL server is configured
differently than you thought it was, or the string contains Windows
curly quotes and similar cp1252 characters, which Latin-1 cannot
represent.)
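One common strategy is to decode bytes at the boundary by trying a short list of encodings. decode_text is a hypothetical helper written for modern Python 3, where str and bytes no longer mix implicitly; the encoding order is an assumption. UTF-8 goes first because cp1252 would often "successfully" turn UTF-8 bytes into mojibake such as "Ã©", and latin-1 goes last because it accepts every byte, so the default list always returns something, even if it is wrong.

```python
def decode_text(data, encodings=("utf-8", "cp1252", "latin-1")):
    """Decode bytes with the first encoding that succeeds (hypothetical helper).

    Returns (text, encoding_used). latin-1 maps every byte to a character,
    so with the default list this never fails, though the result may still
    be wrong (for example, C1 control characters instead of curly quotes).
    """
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError(f"none of {encodings!r} could decode the data")


for raw in (b"caf\xc3\xa9", b"\x93quoted\x94", b"\x81"):
    text, used = decode_text(raw)
    print(used, ascii(text))
# utf-8 'caf\xe9'
# cp1252 '\u201cquoted\u201d'
# latin-1 '\x81'
```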

>  * the documentation really can't emphasize enough how bad using
> 'os.path.exists/isfile/isdir', and then assuming the file continues to exist
> when it is a contended resource, is.  It can be handy, but it is _always_ a
> race condition.

What else can you do?  It's either os.path.exists()/os.remove() or "do
it anyway and catch the exception".  And sometimes you have to check
the filetype in order to determine *what* to do.
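"Do it anyway and catch the exception" can still accommodate a filetype check, as long as the check is a single call whose result is used immediately and a vanished path is treated as a normal outcome. remove_path is a hypothetical helper sketching this. It narrows the race but does not eliminate it: the path could still be replaced by something of a different type between the lstat() call and the removal.

```python
import os
import shutil
import stat
import tempfile


def remove_path(path):
    """Remove a file, symlink or directory tree; return False if already gone.

    Hypothetical helper: instead of os.path.exists()/isdir() followed by a
    separate action, it stats once and treats FileNotFoundError as
    "someone else got there first" rather than as a failure.
    """
    try:
        mode = os.lstat(path).st_mode  # lstat: do not follow symlinks
    except FileNotFoundError:
        return False
    try:
        if stat.S_ISDIR(mode):
            shutil.rmtree(path)
        else:
            os.remove(path)  # removes a symlink itself, not its target
    except FileNotFoundError:
        # Removed by another process between lstat() and the removal.
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "data.txt")
    with open(target, "w") as f:
        f.write("x")
    print(remove_path(target))  # True
    print(remove_path(target))  # False: already gone, no exception
```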

-- 
Mike Orr <sluggoster at gmail.com>