a regular expression question

Jp Calderone exarkun at intarweb.us
Sat Mar 22 03:19:48 EST 2003


On Fri, Mar 21, 2003 at 11:31:38PM -0800, Luke wrote:
> I suppose this isn't really a python question as much as an R.E. question,
> but I'm using python to do it, so... I'm trying to parse link data
> from a webpage that looks like this:
> 

  You should probably use sgmllib to do this.
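  sgmllib no longer exists in Python 3 (it was removed in 3.0); the standard
library's html.parser module offers a similar callback-driven interface.  Here
is a minimal sketch along those lines, assuming all you want is the
(href, link text) pairs from markup like the sample quoted below.  LinkCollector
is a hypothetical name of my own, not part of any library.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects (href, text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (href, text) pairs
        self._href = None  # href of the <a> we are currently inside, if any
        self._text = []    # text chunks seen inside the current <a>

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; tag names are lowercased
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text while we are inside a link that has an href
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


text = ('<a href="foo1">1</a> abc <a href="foo2">2</a> def '
        '<a href="foo3">3</a>\nghi <a href="foo4">4</a> jkl')
parser = LinkCollector()
parser.feed(text)
parser.close()  # flush any buffered trailing text
print(parser.links)
# [('foo1', '1'), ('foo2', '2'), ('foo3', '3'), ('foo4', '4')]
```

  Unlike a regular expression, this doesn't care about attribute order or
extra whitespace inside the tag, and it still collects the link text if the
link contains nested tags such as <b>.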

> 
> <a href="foo1">1</a> abc <a href="foo2">2</a> def <a href="foo3">3</a>
> ghi <a href="foo4">4</a> jkl
> 
 
  Groups are determined by the placement of parentheses.  Since you have
placed parentheses around "[0-9]+?", that group is the only part being
returned: when a pattern contains exactly one group, findall returns only what
that group matched.  You probably wanted non-grouping parentheses, (?: ... ),
here (since you are only making order of operations explicit), so that findall,
with no groups to report, returns the entire matched text.

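    >>> import re
    >>> text = ('<a href="foo1">1</a> abc <a href="foo2">2</a> def '
    ...         '<a href="foo3">3</a>\nghi <a href="foo4">4</a> jkl')
    >>> re1 = re.compile(r"<a .*?>(?:[0-9]+?)</a>(?:.*?)")
    >>> re1.findall(text)
    ['<a href="foo1">1</a>', '<a href="foo2">2</a>',
     '<a href="foo3">3</a>', '<a href="foo4">4</a>']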
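  To see how the number of groups changes what findall hands back, compare
the three patterns below.  Your original pattern isn't quoted above, so the
one-group version is only my guess at its shape; the two-group version is an
extra illustration in case you want the href and the link number together.

```python
import re

text = ('<a href="foo1">1</a> abc <a href="foo2">2</a> def '
        '<a href="foo3">3</a>\nghi <a href="foo4">4</a> jkl')

# One capturing group: findall returns only what that group matched.
one_group = re.findall(r'<a .*?>([0-9]+?)</a>', text)

# No capturing groups: findall returns the whole match.
no_groups = re.findall(r'<a .*?>(?:[0-9]+?)</a>', text)

# Two capturing groups: findall returns one tuple per match.
two_groups = re.findall(r'<a href="(.*?)">([0-9]+)</a>', text)

print(one_group)
# ['1', '2', '3', '4']
print(no_groups)
# ['<a href="foo1">1</a>', '<a href="foo2">2</a>',
#  '<a href="foo3">3</a>', '<a href="foo4">4</a>']
print(two_groups)
# [('foo1', '1'), ('foo2', '2'), ('foo3', '3'), ('foo4', '4')]
```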

  Jp

-- 
"There is no reason for any individual to have a computer in their
home."
                -- Ken Olsen, President of DEC, World Future Society
                   Convention, 1977
-- 
 up 2 days, 3:59, 6 users, load average: 0.19, 0.07, 0.02





More information about the Python-list mailing list