about parse a link

Steve Holden sholden at holdenweb.com
Sat Sep 7 17:30:49 EDT 2002


"Fredrik Lundh" <fredrik at pythonware.com> wrote in message
news:HXXd9.9644$HY3.2204710 at newsc.telia.net...
> "koko" wrote:
>
> > if I have extracted the links on the page:
> > e.g:
> >
> > http://www.uic.edu/index.htm
> > on this page: there are
> > a.htm
> > b.htm
> > c.htm
> > http://www.uic.edu/home/e.htm
> >
> > how can I log the a.htm, b.htm, c.htm with the full web address?
>
>     base = "http://www.uic.edu/index.htm"
>
>     url_list = [
>         "a.htm",
>         "b.htm",
>         "c.htm",
>         "http://www.uic.edu/home/e.htm"
>     ]
>
>     import urlparse
>
>     for url in url_list:
>         print urlparse.urljoin(base, url)
>
> prints
>
>     http://www.uic.edu/a.htm
>     http://www.uic.edu/b.htm
>     http://www.uic.edu/c.htm
>     http://www.uic.edu/home/e.htm
>
> </F>
>
> <!-- (the eff-bot guide to) the python standard library:
> http://www.pythonware.com/people/fredrik/librarybook.htm
> -->
>

In case you haven't done much of this kind of stuff, note that you should use
the URL of the page you are inspecting as the base, unless the page contains a
<BASE...> tag, in which case you need to use the URL from that tag instead.
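
A rough sketch of that rule, in case it helps. It assumes a current Python 3,
where urljoin lives in urllib.parse rather than in the old urlparse module.
LinkCollector and absolute_links are made-up names for illustration, and only
<A HREF=...> links are collected:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: gather <a href> values and the first <base href>."""

    def __init__(self):
        super().__init__()
        self.base_href = None   # stays None when the page has no <base> tag
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <BASE HREF=...>
        # arrives here as tag "base" with an "href" attribute.
        href = dict(attrs).get("href")
        if href is None:
            return
        if tag == "base" and self.base_href is None:
            self.base_href = href
        elif tag == "a":
            self.hrefs.append(href)


def absolute_links(page_url, html):
    """Resolve every collected link against the effective base URL."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    # Use the page's own URL unless a <base> tag overrides it. The <base>
    # value is itself joined to the page URL in case it is relative.
    if parser.base_href:
        base = urljoin(page_url, parser.base_href)
    else:
        base = page_url
    return [urljoin(base, href) for href in parser.hrefs]
```

Calling absolute_links("http://www.uic.edu/index.htm", page_source) on a page
with no <BASE...> tag should give the same results as the urljoin loop quoted
above. If the page does carry a <BASE...> tag, the relative links are resolved
against that tag's URL instead.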

regards
-----------------------------------------------------------------------
Steve Holden                                  http://www.holdenweb.com/
Python Web Programming                        pydish.holdenweb.com/pwp/
Previous .sig file retired to                    www.homeforoldsigs.com
-----------------------------------------------------------------------





