Regex single quotes in scraper script?

Rock rock at py-nosp
Sat Jul 17 04:32:39 CEST 2004

Hi, I started using a Python-based screen scraper called newsscraper, which I
downloaded from SourceForge. I have created many Python templates from its
examples that work just fine; however, I ran into a roadblock with sites that
use single quotes instead of double quotes around the URLs in their web pages.

For example, <a href=''> instead of the usual <a href="">.
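HTML allows an attribute value in double quotes, single quotes, or no quotes at all. As an alternative to patching a regex (this is not what newsscraper itself does), Python's standard-library `html.parser` already strips either quote style. A minimal sketch in Python 3; `HrefCollector` and `collect_hrefs` are hypothetical names:

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect href values from <a> tags, whatever quoting they use."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; values arrive with
        # their surrounding quotes (double or single) already removed.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def collect_hrefs(html):
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


print(collect_hrefs("""<a href="a.html">A</a> <a href='b.html'>B</a> <a href=c.html>C</a>"""))
# ['a.html', 'b.html', 'c.html']
```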

Being a real newbie with this, I think I found the part of the code that parses
the href. It is in a file called …; the full excerpt is listed below, but here
is the regex line that I believe is not handling single quotes:

m = re.search(r'href\s*=\s*"?([^>" ]+)["> ]', text, re.I)
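The pattern above only treats `"` as a quote character, so a single-quoted value still matches, but the quotes end up inside the captured link. A quick check in Python 3 (the two tags are made-up examples):

```python
import re

# The pattern quoted above: an optional '"', the value, then '"', '>' or a space.
PATTERN = re.compile(r'href\s*=\s*"?([^>" ]+)["> ]', re.I)

captured = [PATTERN.search(tag).group(1)
            for tag in ['<a href="page.html">', "<a href='page.html'>"]]
print(captured)
# ['page.html', "'page.html'"]
```

With a base_url set, `urljoin` then treats `'page.html'` as a relative path, so the resolved link keeps the literal quote characters and points nowhere useful.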

I have tried many different variations without success, and I have had no luck
reaching the author either. Any ideas? Thanks.

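import re

try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2, as in the original excerpt


def get_href(text, base_url=None):
    """get_href(text[, base_url]) -> href or None

    Extract the URL out of an HREF tag.  If base_url is provided,
    will attempt to resolve relative links.
    """
    # Optional double quote, the URL itself, then '"', '>' or a space.
    m = re.search(r'href\s*=\s*"?([^>" ]+)["> ]', text, re.I)
    if not m:
        return None
    link = m.group(1)
    # Resolve relative links against the page URL, if one was given.
    if base_url and not link.lower().startswith("http"):
        link = urljoin(base_url, link)
    return link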
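One way to let the regex in `get_href` accept both quote styles is to match each style as its own alternative and take whichever group matched. This is a minimal sketch in Python 3 (so `urllib.parse.urljoin` rather than the Python 2 `urlparse` module); `HREF_RE` and `get_href_any_quote` are hypothetical names, not part of newsscraper:

```python
import re
from urllib.parse import urljoin

# Three alternatives, exactly one of which captures:
#   "..."  double-quoted value -> group 1
#   '...'  single-quoted value -> group 2
#   bare   unquoted value      -> group 3
HREF_RE = re.compile(
    r"""href\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>"']+))""",
    re.IGNORECASE,
)


def get_href_any_quote(text, base_url=None):
    """Return the href value from a tag, or None; accepts ", ' or no quotes."""
    m = HREF_RE.search(text)
    if not m:
        return None
    # Pick the one group that participated in the match.
    link = next(g for g in m.groups() if g is not None)
    # Resolve relative links against the page URL, as get_href does.
    if base_url and not link.lower().startswith("http"):
        link = urljoin(base_url, link)
    return link


print(get_href_any_quote("<a href='story.html'>", "http://example.com/news/"))
# http://example.com/news/story.html
```

Matching each quote style as a separate alternative, rather than an optional `["']?` on each side, means an apostrophe inside a double-quoted URL does not end the match early.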
