regular expressions: grabbing variables from multiple matches

D-Man dsh8290 at rit.edu
Wed Jan 3 22:53:39 EST 2001


At first I was confused by what you meant about findall returning a
list of matched strings rather than match objects.

Here's my interpreter session (excuse the poor variable naming, it's
just for testing):

>>> import re
>>> r = re.compile( "asdf" )
>>> m = r.match( "asdf" )
>>> m
<SRE_Match object at 0x8130ca8>
>>> ml = r.findall( "asdfasdf" )
>>> ml
['asdf', 'asdf']
>>> for str in ml :
...     m = r.match( str )
...     print m
... 
<SRE_Match object at 0x8111108>
<SRE_Match object at 0x812ee28>
>>> 


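Since findall gives you back the strings that match the regex, you can
go through them and call match() to get a match object for each one.
You know (in the loop) that every string will match, so the second
match() call always succeeds.

Note that this only holds when the pattern has no groups.  With groups
(as in your MetaTag pattern), findall returns tuples of the group
contents rather than the whole matched text, so you would read the
values out of each tuple by position instead.

Running the regex a second time over every match will probably hurt
performance, but at least it will be functional.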
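
If your Python is new enough (2.2 or later), finditer() should avoid
the second pass entirely, since it yields one match object per match.
Here is a rough sketch of that idea.  The pattern below is my own
simplified stand-in, not your MetaTag regex: it assumes the tag is
written as name="..." followed by content="...", and META_TAG,
extract_meta and sample are just names I made up for the example.

```python
import re

# Hypothetical, simplified pattern for illustration only (an assumption,
# not the pattern from the original question).  It expects name="..."
# followed by content="...", in that order.
META_TAG = re.compile(
    r'<\s*meta\s+name\s*=\s*"(?P<name>.*?)"'
    r'\s+content\s*=\s*"(?P<content>.*?)"\s*/?>',
    re.IGNORECASE,
)


def extract_meta(body):
    """Return a list of (name, content) pairs, one per meta tag in body."""
    pairs = []
    # finditer yields a match object for every non-overlapping match,
    # so groupdict() works exactly as it does with a single match.
    for m in META_TAG.finditer(body):
        flds = m.groupdict()
        pairs.append((flds["name"], flds["content"]))
    return pairs


# Made-up sample document for the demonstration.
sample = '''<html><head>
<meta name="author" content="heather">
<META NAME="keywords" CONTENT="python, regex">
</head></html>'''

for name, content in extract_meta(sample):
    print("%s: %s" % (name, content))
```

Because this pattern has two groups, META_TAG.findall(sample) would
return the same pairs as plain tuples, just without the group names.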

HTH,
-D


On Wed, Jan 03, 2001 at 07:16:49PM -0500, Heather Lynn White wrote:
> 
> Suppose I have a regular expression to grab all variations on a meta tag,
> and I will want to extract from any matches the name and content values
> for this tag.
> 
> I use the following re
> 
> MetaTag=re.compile(
> 
> 
> r'''<\s*?(meta|META)\s*?=\s*?"(?P<name>.*?)"\s*?(content|CONTENT)\s*?=\s*?"(?P<content>.*?)"\s*?>'''
> 
> )
> 
> now suppose I have an html document and I want to iterate through all the
> meta tags in that document. If I only catch one, I would say
> 
> matches=MetaTag.match(body)
> if matches:
> 	flds=matches.groupdict()
> 	name=flds["name"]
> 	content=flds["content"]
> 	print name, content
> 
> but this does not work if I use instead findall, to get multiple matches,
> because findall returns a list of matches rather than a list of match
> objects, unlike all the other functions.  Is there a way to extract these
> variables in the way I have done above, but with many matches?
> 
> -heather
> 



