Urlnames in urllib2
Gabriel Genellina
gagsl-py at yahoo.com.ar
Thu Oct 5 00:55:29 EDT 2006
At Wednesday 4/10/2006 21:03, goyatlah wrote:
>I'm trying to figure out how to get the exact opened url after a
>urlopen in urllib2.
>Say you have a link : http://myhost/mypath : what do I get back,
>- the file mypath on myhost
>- the file index.html on myhost/mypath,
>- or maybe something else.
You get whatever the webserver chooses to serve at that URI.
Usually:
- if mypath is a directory (or something the server treats as one), you get a
redirect to mypath/ (otherwise relative references won't work)
- for mypath/ you get the default document for that directory, maybe
index.html or index.php or default.html or ...
- for mypath/myname you should get the best matching variant of the document
according to the Accept, Accept-Language and Accept-Encoding request headers
(though few people/servers use them completely).
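As a rough illustration of that last point, a client can state its preferences explicitly by setting those headers on the request. This is only a sketch: build_negotiated_request and its default values are hypothetical names made up for this example, and whether the server honours the headers is entirely up to the server. The import fallback is there because urllib2 was renamed urllib.request in Python 3.

```python
try:
    import urllib2                      # Python 2, as used in this thread
except ImportError:
    import urllib.request as urllib2    # Python 3 home of the same API


def build_negotiated_request(url, language="es", media_type="text/html"):
    """Return a Request carrying explicit content-negotiation headers.

    The function name and the default values are illustrative assumptions.
    """
    request = urllib2.Request(url)
    request.add_header("Accept", media_type)
    request.add_header("Accept-Language", language)
    return request


# Usage (needs a reachable server; myhost is the placeholder from the question):
# response = urllib2.urlopen(build_negotiated_request("http://myhost/mypath/myname"))
# print(response.info().get("Content-Language"))
```

If the server does negotiate, response headers such as Content-Language or Vary may hint at which variant was chosen, but nothing obliges it to send them.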
>And what about the following: http://myhost/index.htm where index.htm
>is actually a directory.
Probably you would get a redirect to http://myhost/index.htm/
>With urllib2.geturl() I can find out if the name is changed to
>mypath/ or index.htm/ but it seems that is the only thing I can find
>out.
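This is the HTTPRedirectHandler doing its work: it follows the redirect
for you, and geturl() on the response object reports the URL that was
finally retrieved. You could also look at the Content-Location header,
but I doubt you could get much more info about the actual object
retrieved - there are proxies, rewrite rules, virtual hosts...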
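To see a bit more of what happens along the way, one option is to subclass HTTPRedirectHandler so that it records each redirect before following it. The following is a minimal sketch under that assumption: RecordingRedirectHandler and trace_open are hypothetical names invented for this example, not part of urllib2. The import fallback covers Python 3, where urllib2 became urllib.request.

```python
try:
    import urllib2                      # Python 2, as used in this thread
except ImportError:
    import urllib.request as urllib2    # Python 3 home of the same API


class RecordingRedirectHandler(urllib2.HTTPRedirectHandler):
    """Follow redirects as usual, but remember every hop (illustrative)."""

    def __init__(self):
        self.hops = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Note the status code and target, then defer to the stock behaviour.
        self.hops.append((code, newurl))
        return urllib2.HTTPRedirectHandler.redirect_request(
            self, req, fp, code, msg, headers, newurl)


def trace_open(url):
    """Open url and report what the response reveals about the result."""
    recorder = RecordingRedirectHandler()
    # Passing the instance replaces the default redirect handler.
    opener = urllib2.build_opener(recorder)
    response = opener.open(url)
    try:
        return {
            # URL after all redirects were followed
            "final": response.geturl(),
            # (status code, target URL) for each redirect, in order
            "hops": list(recorder.hops),
            # Optional header; None when the server does not send it
            "content_location": response.info().get("Content-Location"),
        }
    finally:
        response.close()


# Usage (myhost is the placeholder from the question):
# print(trace_open("http://myhost/mypath"))
```

This still only shows what the server chooses to tell the client; anything done internally by rewrite rules or a proxy stays invisible.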
Gabriel Genellina
Softlab SRL