[XML-SIG] file urls in urllib

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Fri, 9 Mar 2001 08:27:44 +0100


> I just tested the following combinations using IE5.5 on Win98:
> 
> OK (i.e., it works):
> file:///D:\temp\xxx.html
> D:\temp\xxx.html
> D:/temp/xxx.html
> file:/D:\temp\xxx.html
> file:D:/temp/xxx.html
> file:///D:/temp/xxx.html
> file:///D|/temp/xxx.html
> file:///D|\temp\xxx.html
> file://localhost/D:/temp/xxx.html
> file://localhost/D:\temp\xxx.html

Thanks for these investigations. That seems to confirm that at least

file:///D:/temp/xxx.html

is accepted as a URL, so I think urllib should accept it as well. As
for the others, I noticed one aspect that seems to have escaped (pun
intended) the discussion so far: According to RFC 1738, both | and
\ are *unsafe*. That means they MUST be escaped in a URL (although
the RFC only writes "must"); in turn, the proper forms of the
variants that use them would be

file:///D%7C/temp/xxx.html
file:///D%7C%5Ctemp%5Cxxx.html
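
As a rough illustration, the two escaped forms above can be reproduced
with the quoting function in the Python 3 standard library,
urllib.parse.quote. Using Python 3 here is an assumption of this sketch;
the function corresponds to the urllib.quote discussed in this thread.
The variable names are made up for the example.

```python
from urllib.parse import quote, unquote

# Paths as a user might type them (illustrative values only).
path_with_bar = "/D|/temp/xxx.html"
path_with_backslashes = "/D|\\temp\\xxx.html"

# quote() leaves "/" alone by default and percent-encodes the
# VERTICAL LINE (%7C) and the REVERSE SOLIDUS (%5C).
url_bar = "file://" + quote(path_with_bar)
url_backslashes = "file://" + quote(path_with_backslashes)

print(url_bar)          # file:///D%7C/temp/xxx.html
print(url_backslashes)  # file:///D%7C%5Ctemp%5Cxxx.html

# unquote() reverses the escaping.
print(unquote(url_bar))  # file:///D|/temp/xxx.html
```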

> Pretty amazing, eh?  Looks like they are following the maxim, write
> strict, accept loose.

I'd like urllib to follow that as well; the strict case probably being
the one with the forward slashes (as the required escaping for the
REVERSE SOLIDUS and the VERTICAL LINE looks ugly). Please note that
urllib.quote quotes the COLON, although this is not required by the
RFC: the colon would need to be quoted only if it were reserved by
the scheme.
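
The colon behaviour can be seen in a small sketch, again assuming the
Python 3 spelling urllib.parse.quote rather than the urllib.quote of
this thread. By default only "/" is treated as safe, so the colon is
escaped; passing it in the `safe` argument keeps it literal.

```python
from urllib.parse import quote

path = "/D:/temp/xxx.html"

# Default safe set is "/", so the COLON becomes %3A.
default_quoted = quote(path)

# Adding ":" to the safe set leaves the drive-letter colon untouched.
colon_kept = quote(path, safe="/:")

print(default_quoted)  # /D%3A/temp/xxx.html
print(colon_kept)      # /D:/temp/xxx.html
```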

As for accepting: We should at least accept what clearly conforms to
the RFC, i.e. the forms starting with file://<optional host>/; we
should probably also tolerate input in which characters that ought to
be quoted are not. We also need backwards compatibility, so the forms
using the vertical line should be accepted.
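
One way the "accept loose" side could look is sketched below. This is
not how urllib does it; normalize_file_url is a hypothetical helper
written for this example. It handles only the file://<optional host>/
forms, tolerates unescaped or escaped | and \, and returns the
forward-slash form. It does not re-escape any other characters.

```python
import re
from urllib.parse import unquote


def normalize_file_url(url):
    """Hypothetical helper: map loose file URLs to file://host/D:/path."""
    prefix = "file://"
    if not url.lower().startswith(prefix):
        raise ValueError("only file://<optional host>/ forms are handled")
    # Undo any escaping so %7C and %5C are treated like | and \.
    rest = unquote(url[len(prefix):])
    # Accept backslashes as path separators.
    rest = rest.replace("\\", "/")
    # Everything before the first "/" is the (possibly empty) host.
    host, slash, path = rest.partition("/")
    if not slash:
        raise ValueError("missing path after host")
    # Backwards compatibility: D| after the host means drive D:.
    path = re.sub(r"^([A-Za-z])\|", r"\1:", path)
    return prefix + host + "/" + path


for loose in (
    "file:///D:/temp/xxx.html",
    "file:///D|/temp/xxx.html",
    "file:///D|\\temp\\xxx.html",
    "file:///D%7C%5Ctemp%5Cxxx.html",
    "file://localhost/D:\\temp\\xxx.html",
):
    print(normalize_file_url(loose))
```

The forms without "//" (such as file:D:/temp/xxx.html) and bare paths
are deliberately rejected here, since they fall outside what the RFC
clearly allows.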

Regards,
Martin