[Python-Dev] file() or open()?

Mike Brown mike at skew.org
Wed Jul 7 06:51:54 CEST 2004


Nick Bastin wrote:
> I could see a future where open() would support any valid URI

-1.

This already exists, mostly, in urllib.urlopen(), which happens to be very 
lenient in what it will accept as a "URL" to open. It attempts to guess at the 
intended behavior, regardless of whether the argument was a URI (possibly 
relative) or an OS path (possibly relative) or some hybrid of the two. Often 
it is impossible to tell based on syntax alone what type of 
thing-to-be-dereferenced the argument is, so the function errs on the side of
what's most likely -- not that it goes to great lengths.
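To make the "syntax alone" problem concrete, here is a small sketch. It does not use urllib.urlopen() itself, which is Python 2. It uses Python 3's urllib.parse.urlsplit (which postdates this message) to show what generic URI syntax makes of a few strings a caller might hand to an "open anything" function. The candidate strings are illustrative assumptions.

```python
from urllib.parse import urlsplit

# Strings a caller might hand to a hypothetical "open anything" function.
candidates = [
    "c:/temp/report.txt",      # intended as a Windows path with a drive letter
    "/etc/passwd",             # a POSIX path, or a relative URI reference?
    "http://myhost/doc.html",  # unambiguously an absolute URI
]

for s in candidates:
    parts = urlsplit(s)
    # urlsplit applies generic URI syntax only and knows nothing about OS paths,
    # so the drive letter "c" comes back as if it were a URI scheme.
    print(f"{s!r}: scheme={parts.scheme!r}, path={parts.path!r}")
```

The drive-letter path parses as a URI with scheme "c", and "/etc/passwd" is equally valid as a POSIX path or as a relative URI reference; nothing in the string says which was meant.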

Overall it does an OK job, but it has bugs, conformance issues and Unicode 
issues; its behavior depends on the platform it is running on; and heaven 
help you if you have dot segments, a UNC path, a colon in the wrong place or 
didn't use "|" instead of ":" in your Windows 'file' URI.

Even if these peripheral issues/bugs are addressed, it's still just too messy 
to take an arbitrary string and guess as to whether it is a URI or an OS path 
(and for which OS?) and to handle it as the user intended. The best you can 
do, in the absence of making the user assert exactly what the string is, is 
subject it to syntax checks that rule OUT the possibility of it being one or 
the other. Then, if it's still ambiguous, what to do? Fall back on some 
well-documented behavior such as trying it one way, then the other? Still 
rather messy, IMHO, and makes it difficult to use such a function in a context 
which requires that the argument be handled ONLY as one or the other. For 
example, in a URI resolver, you don't want href="/etc/passwd" in a document at 
http://myhost/doc.html to be interpreted as if it were "file:///etc/passwd" 
just because the attempt to open "http://myhost/etc/passwd" happened to fail.
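A resolver that treats every reference strictly as a URI might look like the sketch below. The helper name resolve_href is hypothetical, and it relies on Python 3's urllib.parse.urljoin, which performs RFC 3986-style reference resolution; it is not taken from 4Suite or any proposal mentioned here.

```python
from urllib.parse import urljoin, urlsplit

def resolve_href(base_uri: str, href: str) -> str:
    """Resolve href against base_uri using URI rules only (hypothetical helper).

    The result is always a URI. An href is never reinterpreted as a local
    file path, so a later failure to fetch the result is the caller's to
    report, not a cue to try the filesystem instead.
    """
    if not urlsplit(base_uri).scheme:
        raise ValueError(f"base must be an absolute URI: {base_uri!r}")
    return urljoin(base_uri, href)

print(resolve_href("http://myhost/doc.html", "/etc/passwd"))
```

Here href="/etc/passwd" resolves to http://myhost/etc/passwd and nothing else; "file:///etc/passwd" can only come out if the document actually says so.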

So I'd rather keep URIs and IRIs isolated from OS paths as much as possible. 
Make the user understand the differences between them and discourage treatment 
of them as interchangeable strings. Provide functions for converting between an 
OS path and a proper absolute URI, with no underlying platform influence 
(urllib.pathname2url does not meet these criteria). See examples in 4Suite's 
Ft.Lib.Uri in current CVS [1] for how I think it should be done. A more formal
proposal is forthcoming.
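As an illustration of "no underlying platform influence", here is a minimal sketch of a path-to-URI converter in which the caller must name the path syntax explicitly. The function path_to_file_uri and its flavor argument are hypothetical; this is not the Ft.Lib.Uri API. It assumes Python 3, where the ntpath and posixpath modules can be imported on any OS, and it handles only absolute paths.

```python
import ntpath
import posixpath
from urllib.parse import quote

def path_to_file_uri(path: str, flavor: str) -> str:
    """Convert an absolute OS path to a 'file' URI (hypothetical helper).

    The caller names the path syntax ("posix" or "windows"), so the result
    does not depend on the OS this code happens to run on.
    """
    if flavor == "posix":
        if not posixpath.isabs(path):
            raise ValueError(f"not an absolute POSIX path: {path!r}")
        # quote() percent-encodes as UTF-8 and leaves "/" alone by default.
        return "file://" + quote(path)
    if flavor == "windows":
        drive, rest = ntpath.splitdrive(path)
        if not drive or not ntpath.isabs(path):
            raise ValueError(f"not an absolute Windows path: {path!r}")
        rest = rest.replace("\\", "/")
        if drive.endswith(":"):
            # Drive letter: emit file:///c:/..., keeping ":" rather than "|".
            return "file:///" + drive + quote(rest)
        # UNC path \\server\share\...: the server becomes the URI authority.
        return "file:" + quote(drive.replace("\\", "/") + rest)
    raise ValueError(f"unknown path flavor: {flavor!r}")
```

Relative paths are rejected rather than silently resolved against the current directory, which keeps the conversion a pure function of its arguments.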

-Mike


[1] http://cvs.4suite.org/cgi-bin/viewcvs.cgi/4Suite/Ft/Lib/Uri.py
    (I'm still not happy with some of the Unicode related API decisions I
    made in this module, but encoding issues aside, I believe the general
    methodology is sound.)

