Urlnames in urllib2

Gabriel Genellina gagsl-py at yahoo.com.ar
Thu Oct 5 06:55:29 CEST 2006

At Wednesday 4/10/2006 21:03, goyatlah wrote:

>I'm trying to figure out how to get the exact opened url after a
>urlopen in urllib2.
>Say you have a link : http://myhost/mypath : what do I get back,
>- the file mypath on myhost
>- the file index.html on myhost/mypath,
>- or maybe something else.

You get whatever the webserver chooses to serve at that URI:
- if mypath is a directory (or something the server treats like one), 
you get a redirect to mypath/ (otherwise relative references won't work)
- for mypath/ you get the default document for that directory, maybe 
index.html or index.php or default.html or ...
- for mypath/myname you should get the document that best matches the 
Accept, Accept-Language and Accept-Encoding request headers (but few 
people/servers honor them fully).

>And what about the following: http://myhost/index.htm where index.htm
>is actually a directory.

Probably you would get a redirect to http://myhost/index.htm/

>With urllib2.geturl() I can find out if the name is changed to
>mypath/ or index.htm/ but it seems that is the only thing I can find

That is the HTTPRedirectHandler doing its work: it follows the 
redirect, and geturl() on the response object reports the final URL. 
You could look at the Content-Location header, but I doubt you could 
get much more info about the actual object retrieved - there are 
proxies, rewrite rules, virtual hosts...
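
To see both pieces of information side by side, here is a small self-contained sketch. It uses urllib.request (the Python 3 name for urllib2) and a throwaway local server, DemoHandler, invented purely for the demo: it redirects /mypath to /mypath/ and sets a Content-Location header on the default document. A real server may send no Content-Location at all, so treat that value as optional.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical server used only for this demo."""

    def do_GET(self):
        if self.path == "/mypath":
            # Directory-like name without trailing slash: redirect.
            self.send_response(301)
            self.send_header("Location", "/mypath/")
            self.send_header("Content-Length", "0")
            self.end_headers()
        elif self.path == "/mypath/":
            body = b"default document"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            # Tells the client which document was actually served.
            self.send_header("Content-Location", "/mypath/index.html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the demo quiet

def fetch(url):
    """Return (final URL, Content-Location or None, body) for url."""
    # An empty ProxyHandler makes sure the local demo server is hit directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url) as resp:
        return resp.geturl(), resp.headers.get("Content-Location"), resp.read()

# Port 0 lets the OS pick a free port for the demo server.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

final_url, content_location, body = fetch(base + "/mypath")

server.shutdown()
server.server_close()
```

Here final_url ends in /mypath/ (the redirect was followed silently) while content_location names the document the demo server claims to have served. Against a real site, that second value is often simply None.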

Gabriel Genellina
Softlab SRL 

