Tim asked me to look into test_urllib2 failure. I notice that Guido's name is in the relevant RFC so I guess he's the real expert <0.5 wink>:

http://www.faqs.org/rfcs/rfc1738.html

Anyhow, there are a variety of problems. :(

First, test_urllib2 says:

    file_url = "file://%s" % urllib2.__file__

This is not going to construct a strictly standards conforming URL on Windows but that form is still common enough and obvious enough that maybe we should support it.

So that's problem #1, we aren't compatible with mildly broken Windows file URLs.

Problem #2 is that the test program generates mildly broken URLs on Windows.

That begs the question of what IS the right way to construct file urls in a cross-platform manner. I would have thought that urllib.pathname2url was the way but I note that it isn't documented. Plus it is poorly named. A function that does this:

    """Convert a DOS path name to a file url.

    C:\foo\bar\spam.foo

    becomes

    ///C|/foo/bar/spam.foo
    """

is not really constructing a URL! And the semantics of the function on multiple platforms do not seem to me to be identical. On Windows it adds a bunch of leading slashes and mac and Unix seem not to. So you can't safely paste a "file:" or "file://" on the front.

I don't know how widely pathname2url has been used even though it is undocumented....should we fix it and document it or write a new function?

--
Take a recipe. Leave a recipe. Python Cookbook! http://www.ActiveState.com/pythoncookbook
[Paul Prescod]
Tim asked me to look into test_urllib2 failure.
Wow! I'm going to remember that. Have to ask people to do things more often <ahem>.
I notice that Guido's name is in the relevant RFC so I guess he's the real expert <0.5 wink>:
http://www.faqs.org/rfcs/rfc1738.html
Anyhow, there are a variety of problems. :(
I'm going to add one more. The spec says this is a file URL:

    fileurl = "file://" [ host | "localhost" ] "/" fpath

But on Windows, urllib2.urlopen() throws up even on URLs like:

    file:///c:/bootlog.txt

and

    file://localhost/c:/bootlog.txt

AFAICT, those conform to the spec (the first with an empty host, the second with the special reserved hostname), Windows has no problem with either of them (heck, in Outlook I can click on them while I'm typing this email -- works fine), but urllib2 mangles them into (repr) '\\c:\\bootlog.txt', which Windows has no idea what to do with. Hard to see why it should, either.
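To make the grammar above concrete, here is a minimal sketch that splits both example URLs into their host and fpath parts. It assumes Python 3's urllib.parse rather than the urllib2 under discussion, and split_file_url is a hypothetical helper name, not an existing library function.

```python
from urllib.parse import urlsplit

def split_file_url(url):
    """Return (host, fpath) for a URL shaped like file://[host]/fpath.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    if parts.scheme != "file":
        raise ValueError("not a file URL: %r" % url)
    # The "/" after the host is a separator in the grammar, not part of
    # fpath, so drop exactly one leading slash from the parsed path.
    return parts.netloc, parts.path[1:]

empty_host = split_file_url("file:///c:/bootlog.txt")
local_host = split_file_url("file://localhost/c:/bootlog.txt")

print(empty_host)   # ('', 'c:/bootlog.txt')
print(local_host)   # ('localhost', 'c:/bootlog.txt')
```

Both forms yield the same fpath; only the host differs (empty versus the reserved name "localhost").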
First, test_urllib2 says:
file_url = "file://%s" % urllib2.__file__
This is not going to construct a strictly standards conforming URL on Windows but that form is still common enough and obvious enough that maybe we should support it.
Common among what?
So that's problem #1, we aren't compatible with mildly broken Windows file URLs.
I haven't found a sense in which Windows file URLs are broken. test_urllib2 creates bad URLs on Windows, and urllib2 itself transforms legit file URLs into broken ones on Windows, but both of those appear to be our (Python's) fault. Until std stuff works, worrying about extensions to the std seems premature.
Problem #2 is that the test program generates mildly broken URLs on Windows.
Yup.
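A small sketch of why the test's string formatting goes wrong on Windows. It assumes Python 3's urllib.parse for the parsing step, and both file paths are made-up stand-ins for urllib2.__file__; naive_file_url is a hypothetical name.

```python
from urllib.parse import urlsplit

def naive_file_url(path):
    # Same string formatting as the test quoted above.
    return "file://%s" % path

# Hypothetical install locations, for illustration only.
posix_url = naive_file_url("/usr/lib/python/urllib2.py")
windows_url = naive_file_url(r"C:\Python\Lib\urllib2.py")

posix_parts = urlsplit(posix_url)
windows_parts = urlsplit(windows_url)

# A POSIX path starts with "/", so the result has an empty host
# and the whole path lands where the grammar expects it.
print(posix_url, "->", repr(posix_parts.netloc), repr(posix_parts.path))

# A Windows path has no leading "/" and no forward slashes at all,
# so everything after "file://" is read as the host and the path is empty.
print(windows_url, "->", repr(windows_parts.netloc), repr(windows_parts.path))
```

The formatting only happens to work on Unix because absolute paths there supply the third slash themselves.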
That begs the question of what IS the right way to construct file urls in a cross-platform manner.
The spec seems vaguely clear to me on this point (it's vaguely unclear to me whether a colon is allowed in an fpath -- the text seems to say one thing but the BNF another).
I would have thought that urllib.pathname2url was the way but I note that it isn't documented. Plus it is poorly named. A function that does this:
"""Convert a DOS path name to a file url.
C:\foo\bar\spam.foo
becomes
///C|/foo/bar/spam.foo """
is not really constructing a URL!
Or anything else recognizable <wink>.
And the semantics of the function on multiple platforms do not seem to me to be identical. On Windows it adds a bunch of leading slashes and mac and Unix seem not to. So you can't safely paste a "file:" or "file://" on the front. I don't know how widely pathname2url has been used even though it is undocumented....should we fix it and document it or write a new function?
Maybe it's just time to write urllib3.py <0.8 wink>.

no-conclusions-from-me-ly y'rs - tim
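On the "write a new function?" question, one possible shape is sketched below. This is an assumption-laden illustration, not a proposal from the thread: both helper names are hypothetical, it uses Python 3's urllib.parse.quote, it handles only drive-letter Windows paths (no UNC shares), and it leaves the drive colon unquoted, which is exactly the point the spec is unclear about.

```python
import ntpath
from urllib.parse import quote

def windows_path_to_file_url(path):
    """Hypothetical helper: drive-letter DOS path -> file URL with empty host."""
    drive, rest = ntpath.splitdrive(path)   # "C:", "\\foo\\bar\\spam.foo"
    rest = rest.replace("\\", "/")          # URL paths use forward slashes
    # Assumption: the drive colon is kept literally, as in file:///c:/bootlog.txt.
    return "file:///" + drive + quote(rest)

def posix_path_to_file_url(path):
    """Hypothetical helper: absolute POSIX path -> file URL with empty host."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute path: %r" % path)
    return "file://" + quote(path)

win_url = windows_path_to_file_url(r"C:\foo\bar\spam.foo")
posix_url = posix_path_to_file_url("/foo/bar/spam.foo")

print(win_url)     # file:///C:/foo/bar/spam.foo
print(posix_url)   # file:///foo/bar/spam.foo
```

Unlike the pathname2url docstring quoted above, both helpers return a complete URL including the scheme, so nothing has to be pasted on the front afterwards.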
[Tim & Paul on file URLs]

[Tim]
But on Windows, urllib2.urlopen() throws up even on URLs like:
file:///c:/bootlog.txt
Curiously enough,

    url = "file:///" + urllib.quote_plus(fnm)

seems to work on Windows. It even seems to work on mac, if you first turn '/' into '%2f', then undo the double quoting (turn '%252f' back into '%2f' in the ensuing url). It even seems to work on mac directory names with Unicode characters in them (though I haven't looked too closely, in fear of jinxing it).

eye-of-newt-considered-helpful-ly y'rs - Gordon
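For reference, a sketch of what that quoting does to a path, assuming Python 3, where the function lives in urllib.parse. The file name is made up, and this only shows the string transformation; it does not claim that any particular urlopen accepts the result.

```python
from urllib.parse import quote_plus, unquote_plus

fnm = "c:/boot log.txt"   # hypothetical file name with a space in it

# quote_plus escapes ":" and "/" as well, and turns spaces into "+",
# so the whole path collapses into a single opaque segment.
url = "file:///" + quote_plus(fnm)
print(url)                # file:///c%3A%2Fboot+log.txt

# Undoing the quoting recovers the original name exactly.
recovered = unquote_plus(url[len("file:///"):])
print(recovered)          # c:/boot log.txt
```

Because every slash is escaped, whatever consumes the URL has to unquote the path before treating it as a file name.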
participants (3)
- Gordon McMillan
- Paul Prescod
- Tim Peters