[Python-Dev] file() or open()?
mike at skew.org
Wed Jul 7 06:51:54 CEST 2004
Nick Bastin wrote:
> I could see a future where open() would support any valid URI
This already exists, mostly, in urllib.urlopen(), which happens to be very
lenient in what it will accept as a "URL" to open. It attempts to guess at the
intended behavior, regardless of whether the argument was a URI (possibly
relative) or an OS path (possibly relative) or some hybrid of the two. Often
it is impossible to tell based on syntax alone what type of
thing-to-be-dereferenced the argument is, so the function errs on the side of
what's most likely -- not that it goes to great lengths.
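To make the ambiguity concrete, here is a minimal sketch using Python 3's urllib.parse.urlsplit (a module name that postdates this message, so treat it as an assumption about your environment). The sample strings and the helper name looks_like are made up for illustration; the point is only that a purely syntactic parse cannot tell a drive letter from a URI scheme.

```python
from urllib.parse import urlsplit

# Hypothetical inputs: each is plausible both as a URI reference and as an OS path.
candidates = [
    "C:\\temp\\notes.txt",  # Windows path, but "C" is syntactically a valid scheme
    "data/report.txt",      # relative URI reference or relative POSIX path
    "mailto:fred",          # URI, yet also a legal file name on POSIX
]

def looks_like(s):
    """Report what a purely syntactic URI parse makes of the string."""
    parts = urlsplit(s)
    return {"scheme": parts.scheme, "path": parts.path}

results = {s: looks_like(s) for s in candidates}

for s, r in results.items():
    print("%-22r scheme=%r path=%r" % (s, r["scheme"], r["path"]))
```

The Windows path comes back with scheme "c", and nothing in the parse result says whether that was what the caller meant.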
Overall it does an OK job, but it has bugs, conformance issues, and Unicode
issues; its behavior depends on the platform it is running on; and heaven
help you if you have dot segments, a UNC path, a colon in the wrong place, or
used ":" rather than "|" in your Windows 'file' URI.
Even if these peripheral issues/bugs are addressed, it's still just too messy
to take an arbitrary string and guess as to whether it is a URI or an OS path
(and for which OS?) and to handle it as the user intended. The best you can
do, in the absence of making the user assert exactly what the string is, is
subject it to syntax checks that rule OUT the possibility of it being one or
the other. Then, if it's still ambiguous, what to do? Fall back on some
well-documented behavior such as trying it one way, then the other? Still
rather messy, IMHO, and makes it difficult to use such a function in a context
which requires that the argument be handled ONLY as one or the other. For
example, in a URI resolver, you don't want href="/etc/passwd" in a document at
http://myhost/doc.html to be interpreted as if it were "file:///etc/passwd"
just because the attempt to open "http://myhost/etc/passwd" happened to fail.
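As a rough sketch of the "URI only" handling I mean, the following uses Python 3's urllib.parse.urljoin. The function name resolve_reference is hypothetical, and this is not how any existing resolver is implemented; it only illustrates that the reference is resolved as a URI and never retried as an OS path.

```python
from urllib.parse import urljoin, urlsplit

def resolve_reference(base_uri, reference):
    """Resolve a URI reference against an absolute base URI.

    Both arguments are treated strictly as URIs. Nothing here ever
    consults the local file system or reinterprets the reference as a path.
    """
    if not urlsplit(base_uri).scheme:
        raise ValueError("base must be an absolute URI: %r" % (base_uri,))
    return urljoin(base_uri, reference)

resolved = resolve_reference("http://myhost/doc.html", "/etc/passwd")
print(resolved)  # http://myhost/etc/passwd
```

If retrieving the resolved URI then fails, the caller should report that failure rather than quietly try the same string as a local path.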
So I'd rather keep URIs and IRIs isolated from OS paths as much as possible.
Make the user understand the differences between them and discourage treatment
of them as interchangeable strings. Provide functions for converting between an
OS path and a proper absolute URI, with no underlying platform influence
(urllib.pathname2url does not meet these criteria). See examples in 4Suite's
Ft.Lib.Uri in current CVS for how I think it should be done. A more formal
proposal is forthcoming.
(I'm still not happy with some of the Unicode related API decisions I
made in this module, but encoding issues aside, I believe the general
methodology is sound.)
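To illustrate the kind of conversion function I have in mind (this is not the Ft.Lib.Uri API), here is a minimal sketch in Python 3 in which the caller states which OS the path belongs to, so the result does not depend on the platform the code runs on. The name path_to_file_uri and the flavor argument are hypothetical. It assumes Python 3's pathlib pure path classes and, to stay short, rejects UNC paths and says nothing about encodings beyond percent-encoding as UTF-8.

```python
from pathlib import PurePosixPath, PureWindowsPath
from urllib.parse import quote

def path_to_file_uri(path, flavor):
    """Convert an absolute OS path to a 'file' URI.

    flavor is "posix" or "windows"; the caller asserts it explicitly,
    and nothing is inferred from the host platform.
    """
    if flavor == "posix":
        p = PurePosixPath(path)
    elif flavor == "windows":
        p = PureWindowsPath(path)
    else:
        raise ValueError("unknown path flavor: %r" % (flavor,))
    if not p.is_absolute():
        raise ValueError("path must be absolute: %r" % (path,))
    text = p.as_posix()  # forward slashes for either flavor
    if flavor == "windows":
        # This sketch handles drive-letter paths only; UNC paths are rejected.
        if not (len(p.drive) == 2 and p.drive[1] == ":"):
            raise ValueError("only drive-letter paths are handled: %r" % (path,))
        text = "/" + text  # "C:/temp" becomes "/C:/temp"
    return "file://" + quote(text, safe="/:")

print(path_to_file_uri("/etc/passwd", "posix"))
print(path_to_file_uri("C:\\temp\\my file.txt", "windows"))
```

Because only the pure path classes are used, the same call gives the same answer whether it runs on Windows or on Unix.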