re.findall() is skipping matching characters
Gustaf Liljegren
gustafl at algonet.se
Mon Oct 15 16:17:45 EDT 2001
Thanks for helping me out with matching/searching before. Unfortunately,
the example I gave was a little too basic, so I need some more help.
>>> re.search(r'<(a)', '<a href="page.html">').group()
'<a'
The search() function matches the full expression: both the '<' and the
'(a)', which stands in for an alternation between several HTML element
names. The match() function behaves the same way:
>>> re.match(r'<(a)', '<a href="page.html">').group()
'<a'
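For comparison, here is a small sketch using only the standard re module. The match object returned by search() or match() keeps both the whole match and each parenthesised group, and group() with no argument is the same as group(0), the entire match. The variable names are just for illustration.

```python
import re

m = re.search(r'<(a)', '<a href="page.html">')
whole = m.group()    # same as m.group(0): the entire match, '<a'
inner = m.group(1)   # only the text captured by the parentheses, 'a'
print(whole, inner)
```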
But look what happens when I use the findall() function:
>>> re.findall(r'<(a)', '<a href="page.html">')
['a']
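A minimal sketch of what seems to be going on: per the re documentation, when a pattern contains capturing groups, findall() returns the groups' text instead of the whole match (a list of tuples if there is more than one group). A non-capturing group (?:...) keeps the alternation without creating a group, so findall() falls back to whole matches.

```python
import re

text = '<a href="page.html">'

# One capturing group: findall() returns what the group captured.
with_group = re.findall(r'<(a)', text)

# Non-capturing group: no groups at all, so findall() returns whole matches.
no_group = re.findall(r'<(?:a)', text)

# Two groups (outer one wraps everything): findall() returns tuples.
tuples = re.findall(r'(<(a))', text)

print(with_group, no_group, tuples)
```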
Why does findall() skip the '<'? I want to extract full tags like
'<a href="page.html">' or '<area ... href="page.html">' and put them in a list.
I imagine the full regex should look something like this according to
today's standards:
re_link = re.compile(r'<(a|area)\s[^>]*href[^>]*/?>', re.I | re.M)
Where's the problem?
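Applied to the regex above, a sketch of two possible fixes, assuming the goal is a list of complete tags. The sample HTML string is made up for illustration. Option 1 turns (a|area) into (?:a|area) so findall() returns whole matches; re.M is dropped because the pattern has no ^ or $ anchors, so it would not change anything. Option 2 keeps the original capturing pattern and uses finditer(), taking group(0) from each match object.

```python
import re

# Hypothetical sample input: one <a>, one upper-case <AREA>, and an <abbr>
# that should NOT match because "a" must be followed by whitespace.
html = ('<p><a href="page.html">x</a> '
        '<AREA shape="rect" href="map.html"> <abbr title="t"></p>')

# Option 1: non-capturing alternation, so findall() returns whole tags.
re_link = re.compile(r'<(?:a|area)\s[^>]*href[^>]*/?>', re.I)
links = re_link.findall(html)

# Option 2: original capturing pattern, whole match taken via group(0).
re_link_orig = re.compile(r'<(a|area)\s[^>]*href[^>]*/?>', re.I)
via_finditer = [m.group(0) for m in re_link_orig.finditer(html)]

print(links)
print(via_finditer)
```

Option 2 is handy if the element name is still wanted as well: m.group(1) gives 'a' or 'AREA' for each match.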