More regex help
support.desk.ipg at gmail.com
Wed Sep 24 18:25:02 CEST 2008
I am working on a Python webcrawler that will extract all links from an
HTML page and add them to a queue. The problem I am having is building
absolute links from relative links, as there are so many different types of
relative links. If I just append the relative links to the current URL, some
websites will send the crawler into a never-ending loop.
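
For illustration, here is a minimal sketch of one way to resolve relative links without appending strings by hand. It assumes Python 3's urllib.parse module (in Python 2 the same functions live in the urlparse module). The helper name absolutize and the "seen" set are hypothetical additions, not part of the crawler described above.

```python
from urllib.parse import urljoin, urldefrag


def absolutize(page_url, hrefs):
    """Resolve each href against the URL of the page it was found on.

    Hypothetical helper: returns absolute URLs in order, skipping any
    that have already been produced.
    """
    seen = set()
    result = []
    for href in hrefs:
        # urljoin handles "../x", "/x", "x" and already-absolute links.
        absolute = urljoin(page_url, href)
        # Drop "#fragment" so the same page is not queued twice.
        absolute, _fragment = urldefrag(absolute)
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result
```

Keeping a set of URLs already seen is one common guard against the loop described above, since a link that resolves to a page already queued is skipped instead of being queued again.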
What I am looking for is a regexp that will extract the root url from any
url string I pass to it, such as
Regexp = 'http://example.com'
Regexp = 'http://anotherexample.com/'
Regexp = 'http://example.com'
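
As a rough sketch of what such an extraction could look like, the block below shows two variants: one using urlsplit from Python 3's urllib.parse (an assumption; Python 2 has it in urlparse), and one using a regexp as asked for above. The function names root_url and root_url_re and the pattern ROOT_RE are hypothetical, and the pattern only covers URLs of the form scheme://host.

```python
import re
from urllib.parse import urlsplit


def root_url(url):
    """Return scheme://host for an absolute URL (hypothetical helper)."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        raise ValueError("not an absolute URL: %r" % (url,))
    return "%s://%s" % (parts.scheme, parts.netloc)


# Assumed pattern: a scheme, "://", then everything up to the first
# "/", "?" or "#".
ROOT_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://[^/?#]+")


def root_url_re(url):
    """Regexp variant; returns None when the URL does not match."""
    match = ROOT_RE.match(url)
    return match.group(0) if match else None
```

Both variants return the root without a trailing slash, so 'http://anotherexample.com/' would come back as 'http://anotherexample.com'.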
More information about the Python-list mailing list