[Python-Dev] urllib2

Tim Peters tim.one@home.com
Thu, 7 Jun 2001 00:54:42 -0400


[Paul Prescod]
> Tim asked me to look into the test_urllib2 failure.

Wow!  I'm going to remember that.  Have to ask people to do things more
often <ahem>.

> I notice that Guido's name is in the relevant RFC so I guess he's the
> real expert <0.5 wink>:
>
> http://www.faqs.org/rfcs/rfc1738.html
>
> Anyhow, there are a variety of problems. :(

I'm going to add one more.  The spec says this is a file URL:

    fileurl = "file://" [ host | "localhost" ] "/" fpath

But on Windows, urllib2.urlopen() throws up even on URLs like:

    file:///c:/bootlog.txt

and

    file://localhost/c:/bootlog.txt

AFAICT, those conform to the spec (the first with an empty host, the second
with the special reserved hostname), and Windows has no problem with either
of them (heck, in Outlook I can click on them while I'm typing this email --
works fine).  But urllib2 mangles them into (repr) '\\c:\\bootlog.txt', which
Windows has no idea what to do with.  Hard to see why it should, either.
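
For concreteness, here is a minimal sketch of the mapping one might expect
for the two URLs above.  It is written against modern Python 3's
urllib.parse rather than the urllib2 under discussion, and the function name
file_url_to_windows_path is hypothetical, made up for illustration.  It
assumes only an empty host or "localhost" should be accepted, and that a
leading "/c:" denotes a drive letter; it is not what urllib2 does.

```python
from urllib.parse import urlsplit, unquote


def file_url_to_windows_path(url):
    """Hypothetical helper: turn a file URL into a Windows-style path.

    Assumption: only an empty host or "localhost" is accepted, matching
    the two examples in the message above.
    """
    parts = urlsplit(url)
    if parts.scheme != "file":
        raise ValueError("not a file URL: %r" % url)
    if parts.netloc not in ("", "localhost"):
        raise ValueError("remote hosts are not handled in this sketch")

    path = unquote(parts.path)  # e.g. "/c:/bootlog.txt"

    # Assumption: "/c:" at the start means a drive letter, so drop the
    # slash that only separates the (possibly empty) host from the path.
    if len(path) >= 3 and path[0] == "/" and path[1].isalpha() and path[2] == ":":
        path = path[1:]

    return path.replace("/", "\\")


print(file_url_to_windows_path("file:///c:/bootlog.txt"))           # c:\bootlog.txt
print(file_url_to_windows_path("file://localhost/c:/bootlog.txt"))  # c:\bootlog.txt
```

Both spellings come out as the same drive-qualified path, with no stray
leading backslashes of the kind seen in the mangled '\\c:\\bootlog.txt'.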

> First, test_urllib2 says:
>
>    file_url = "file://%s" % urllib2.__file__
>
> This is not going to construct a strictly standards conforming URL on
> Windows but that form is still common enough and obvious enough that
> maybe we should support it.

Common among what?

> So that's problem #1, we aren't compatible with mildly broken Windows
> file URLs.

I haven't found a sense in which Windows file URLs are broken.  test_urllib2
creates bad URLs on Windows, and urllib2 itself transforms legit file URLs
into broken ones on Windows, but both of those appear to be our (Python's)
fault.  Until std stuff works, worrying about extensions to the std seems
premature.

> Problem #2 is that the test program generates mildly broken URLs
> on Windows.

Yup.

> That begs the question of what IS the right way to construct file urls
> in a cross-platform manner.

The spec seems vaguely clear to me on this point (it's vaguely unclear to me
whether a colon is allowed in an fpath -- the text seems to say one thing
but the BNF another).
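
One possible reading of the spec, sketched in modern Python 3 (not the
urllib of this thread): always emit "file://", an empty host, then a path
that starts with "/".  The name path_to_file_url is hypothetical, and
leaving the drive-letter colon unescaped is an assumption, given the doubt
above about whether a colon is allowed in an fpath.

```python
from urllib.parse import quote


def path_to_file_url(path):
    """Hypothetical helper: build a file URL with an empty host.

    Accepts either a Windows-style path (backslashes, drive letter) or a
    Unix-style absolute path, and returns the same URL shape for both.
    """
    forward = path.replace("\\", "/")

    # The URL path must begin with "/" so that it is separated from the
    # (empty) host; a drive-letter path like "C:/foo" needs one added.
    if not forward.startswith("/"):
        forward = "/" + forward

    # Assumption: ":" is left as-is so the drive letter stays readable.
    return "file://" + quote(forward, safe="/:")


print(path_to_file_url(r"C:\foo\bar\spam.foo"))   # file:///C:/foo/bar/spam.foo
print(path_to_file_url("/usr/lib/urllib2.py"))    # file:///usr/lib/urllib2.py
```

Because the "file://" prefix and the leading slash are added in one place,
the caller does not have to guess how many slashes a given platform needs.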

> I would have thought that urllib.pathname2url was the way but I note
> that it isn't documented.  Plus it is poorly named. A function that
> does this:
>
> """Convert a DOS path name to a file url.
>
>             C:\foo\bar\spam.foo
>
>                     becomes
>
>             ///C|/foo/bar/spam.foo
>     """
>
> is not really constructing a URL!

Or anything else recognizable <wink>.

> And the semantics of the function on multiple platforms do not seem
> to me to be identical. On Windows it adds a bunch of leading slashes
> and mac and Unix seem not to. So you can't safely paste a "file:" or
> "file://" on the front. I don't know how widely pathname2url has been
> used even though it is undocumented....should we fix it and document
> it or write a new function?

Maybe it's just time to write urllib3.py <0.8 wink>.

no-conclusions-from-me-ly y'rs  - tim