Regex problem

andres andres at corrada.com
Mon Oct 15 09:02:32 EDT 2001


Hi Gustaf,

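re.match only tries to match at the beginning of the string. That is the
start of the whole string, not of each line, and the re.M flag doesn't
change it. Use "re.search" to scan the whole string instead. Alternatively,
you could put "\s*" at the beginning of your pattern, but that only skips
leading whitespace (not "any other character"), and the matched text will
then include that whitespace.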
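Here is a small sketch of the difference. It is written for current Python 3, not the 2.x interpreter in your session, and reuses your pattern unchanged. The names s4 and re_link_ws are just made up for the illustration.

```python
import re

# Pattern from the question, unchanged. re.M only affects ^ and $,
# and this pattern uses neither, so the flag changes nothing here.
re_link = re.compile(r'<(a|area)\s+[^>]*href[^>]*/?>', re.I | re.M)

s3 = ' <a href="mypage.html">'

# match() only tries position 0 of the string; the leading space
# cannot match '<', so it returns None.
print(re_link.match(s3))            # prints: None

# search() tries each position in turn until the pattern matches.
m = re_link.search(s3)
print(m.group())                    # prints: <a href="mypage.html">
print(m.start())                    # prints: 1

# re.M does not make match() try the start of every line either.
s4 = 'some text\n<a href="mypage.html">'
print(re_link.match(s4))            # prints: None
print(re_link.search(s4).group())   # prints: <a href="mypage.html">

# The "\s*" alternative: leading whitespace is now part of the pattern,
# so match() succeeds, but the matched text includes that whitespace.
re_link_ws = re.compile(r'\s*<(a|area)\s+[^>]*href[^>]*/?>', re.I)
m2 = re_link_ws.match(s3)
print(repr(m2.group()))             # prints: ' <a href="mypage.html">'

# "\s*" cannot skip a non-whitespace prefix, so this still fails.
print(re_link_ws.match('x<a href="mypage.html">'))  # prints: None
```

Since the "\s*" variant still fails on a non-whitespace prefix, re.search is the more general fix.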

Gustaf Liljegren writes: 

> I'm having a problem with a regex. I'm trying to match <a> or <area>
> elements containing the 'href' attribute. Here's the regex: 
> 
>>>> import re
>>>> re_link = re.compile(r'<(a|area)\s+[^>]*href[^>]*/?>', re.I | re.M)
> 
> It works fine when I try it on these two strings: 
> 
>>>> s1 = '<a href="mypage.html">'
>>>> re.match(re_link, s1).group()
> '<a href="mypage.html">' 
> 
>>>> s2 = '<area coords="0,0,10,10" href="mypage.html">'
>>>> re.match(re_link, s2).group()
> '<area coords="0,0,10,10" href="mypage.html">' 
> 
> But look what happens as soon as I add a space (or any other character)
> before: 
> 
>>>> s3 = ' <a href="mypage.html">'
>>>> re.match(re_link, s3).group()
> Traceback (most recent call last):
>   File "<pyshell#7>", line 1, in ?
>     re.match(re_link, s3).group()
> AttributeError: 'None' object has no attribute 'group'
>>>>
> 
> What's wrong here? Matches shouldn't have to start from the beginning of a
> string. 
> 
> Gustaf Liljegren
> 
> 
> -- 
> http://mail.python.org/mailman/listinfo/python-list
 




More information about the Python-list mailing list