about parse a link
Steve Holden
sholden at holdenweb.com
Sat Sep 7 17:30:49 EDT 2002
"Fredrik Lundh" <fredrik at pythonware.com> wrote in message
news:HXXd9.9644$HY3.2204710 at newsc.telia.net...
> "koko" wrote:
>
> > if I have extracted the links on the page:
> > e.g:
> >
> > http://www.uic.edu/index.htm
> > on this page: there are
> > a.htm
> > b.htm
> > c.htm
> > http://www.uic.edu/home/e.htm
> >
> > how can I log the a.htm, b.htm, c.htm with the full web address?
>
> base = "http://www.uic.edu/index.htm"
>
> url_list = [
> "a.htm",
> "b.htm",
> "c.htm",
> "http://www.uic.edu/home/e.htm"
> ]
>
> import urlparse
>
> for url in url_list:
>     print urlparse.urljoin(base, url)
>
> prints
>
> http://www.uic.edu/a.htm
> http://www.uic.edu/b.htm
> http://www.uic.edu/c.htm
> http://www.uic.edu/home/e.htm
>
> </F>
>
> <!-- (the eff-bot guide to) the python standard library:
> http://www.pythonware.com/people/fredrik/librarybook.htm
> -->
>
In case you haven't done much of this kind of stuff, note that you should use
the URL of the page you are inspecting as the base, unless the page contains a
<BASE...> tag, in which case you need to use the URL from that tag's HREF
attribute instead.
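
A rough sketch of that rule, for illustration only. It assumes a current
Python, where urljoin lives in urllib.parse rather than the urlparse module
quoted above. The names LinkCollector and absolute_links are made up for this
example, and any HTML you feed it is your own; this is one possible approach,
not a tested crawler.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect <a href> values and the first <base href>."""

    def __init__(self):
        super().__init__()
        self.base_href = None   # href of the first <base> tag, if any
        self.hrefs = []         # raw href values from <a> tags, in page order

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        href = dict(attrs).get("href")
        if href is None:
            return
        if tag == "base":
            if self.base_href is None:   # only the first <base> counts
                self.base_href = href
        elif tag == "a":
            self.hrefs.append(href)


def absolute_links(page_url, html):
    """Resolve every <a href> against <base href> if present, else page_url."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    if parser.base_href:
        # The <base> href may itself be relative, so resolve it first.
        base = urljoin(page_url, parser.base_href)
    else:
        base = page_url
    return [urljoin(base, href) for href in parser.hrefs]
```

Call it as absolute_links(url_of_the_page, text_of_the_page). Links that are
already absolute pass through urljoin unchanged, so they need no special case.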
regards
-----------------------------------------------------------------------
Steve Holden http://www.holdenweb.com/
Python Web Programming pydish.holdenweb.com/pwp/
Previous .sig file retired to www.homeforoldsigs.com
-----------------------------------------------------------------------