about parse a link

Fredrik Lundh fredrik at pythonware.com
Fri Sep 6 08:29:59 CEST 2002


"koko" wrote:

> if I have extracted the links on the page:
> e.g:
>
> http://www.uic.edu/index.htm
> on this page: there are
> a.htm
> b.htm
> c.htm
> http://www.uic.edu/home/e.htm
>
> how can I log the a.htm, b.htm, c.htm with the full web address?

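    import urlparse

    # the page the links were extracted from
    base = "http://www.uic.edu/index.htm"

    # links as found on that page (relative and absolute)
    url_list = [
        "a.htm",
        "b.htm",
        "c.htm",
        "http://www.uic.edu/home/e.htm"
    ]

    # urljoin resolves each link against the base;
    # links that are already absolute pass through unchanged
    for url in url_list:
        print urlparse.urljoin(base, url)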

prints

    http://www.uic.edu/a.htm
    http://www.uic.edu/b.htm
    http://www.uic.edu/c.htm
    http://www.uic.edu/home/e.htm
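
For readers on Python 3, where the urlparse module became urllib.parse, a roughly equivalent sketch might look like the following. The helper name absolutize and the variable full_urls are made up for illustration and are not part of the standard library; only urljoin is.

```python
from urllib.parse import urljoin


def absolutize(base, links):
    """Resolve each link against the page it was found on (hypothetical helper)."""
    return [urljoin(base, link) for link in links]


base = "http://www.uic.edu/index.htm"

url_list = [
    "a.htm",
    "b.htm",
    "c.htm",
    "http://www.uic.edu/home/e.htm",
]

# Relative links are joined to the base; the absolute one is left as it is.
full_urls = absolutize(base, url_list)

for url in full_urls:
    print(url)
```

Relative links are resolved against the directory of the base page, so the same call with a base of "http://www.uic.edu/home/index.htm" would give "http://www.uic.edu/home/a.htm" for "a.htm".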

</F>

<!-- (the eff-bot guide to) the python standard library:
http://www.pythonware.com/people/fredrik/librarybook.htm
-->




