Reg Exp: Need advice concerning "greediness"

Alex the_brain at mit.edu
Sat Sep 30 10:47:06 EDT 2000


> I tried:
> sRslt = "<h1><font COLOR="#FF0000">Heading Level 1</font></h1>";
> print re.findall(re.compile(r'(.*?FONT.*?)(COLOR=.*?)*([ |>].*)', re.I |
> re.S), sRslt);
> 
> This returns [("<h1><font, , COLOR="#FF0000">Heading Level 1</font></h1>)].
> I'd expected to receive [("<h1><font , COLOR="#FF0000", >Heading Level
> 1</font></h1>)].

For me, using Python 2.0, I get this answer:

[('<h1><font', '', ' COLOR=')]

which differs both from what you got and from what you expected.  Also,
what you got is not syntactically correct, I think.  Could you paste the
output directly from the interpreter?
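Here is a minimal sketch of what the pattern is doing. It uses Python 3, not the 2.0 of this thread. My guess about the differing outputs is the quoting. In the quoted assignment, the inner double quotes close the string at `COLOR=`. Python then reads `#FF0000">...` as a comment, so the string actually searched is just `<h1><font COLOR=`.

Either way, the pattern never needs the middle group:

* `(COLOR=.*?)*` may repeat zero times.
* The lazy `.*?` lets group 1 stop right after `font`.
* The following space satisfies `[ |>]`. Inside a character class, `|` is a literal pipe, not alternation.

The variable names and the `fixed` pattern are my own illustration. `fixed` assumes the COLOR value is double-quoted.

```python
import re

# Correctly quoted input: single quotes around a string containing double quotes.
s = '<h1><font COLOR="#FF0000">Heading Level 1</font></h1>'
# What the quoted assignment most likely yields: '#FF0000">...' became a comment.
truncated = '<h1><font COLOR='

# The pattern from the question. Note [ |>] matches a space, a pipe, or '>'.
original = re.compile(r'(.*?FONT.*?)(COLOR=.*?)*([ |>].*)', re.I | re.S)

print(original.findall(truncated))
# [('<h1><font', '', ' COLOR=')]
print(original.findall(s))
# [('<h1><font', '', ' COLOR="#FF0000">Heading Level 1</font></h1>')]

# Illustrative fix (assumes a double-quoted COLOR value): force group 1 to
# consume the whitespace after FONT, and bound group 2 by the closing quote.
fixed = re.compile(r'(.*?FONT\s+)(COLOR="[^"]*")?([ >].*)', re.I | re.S)
print(fixed.findall(s))
# [('<h1><font ', 'COLOR="#FF0000"', '>Heading Level 1</font></h1>')]
```

The first result appears to match my 2.0 output above, and the second matches what you reported. That suggests the quoting, rather than greediness alone, explains the difference.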

In general, for this sort of thing, you are better off learning to use
the htmllib module, imo.  Learning it will take you about as long as
getting this regexp right, and you'll have a far more appropriate
framework for the next such problem that comes along.
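htmllib only exists in Python 2. In Python 3 the closest standard-library tool is `html.parser.HTMLParser`. Below is a minimal sketch under that substitution, not anything from the original thread. It pulls out the COLOR attribute of each `<font>` tag. The class name `FontColorFinder` is hypothetical. The parser gives `handle_starttag` lowercased tag and attribute names, with quotes already stripped from attribute values, so the greediness questions above never come up.

```python
from html.parser import HTMLParser


class FontColorFinder(HTMLParser):
    """Hypothetical helper: records the COLOR value of every <font> start tag."""

    def __init__(self):
        super().__init__()
        self.colors = []

    def handle_starttag(self, tag, attrs):
        # tag and attribute names arrive lowercased; attrs is a list of
        # (name, value) pairs with surrounding quotes removed from values.
        if tag == "font":
            for name, value in attrs:
                if name == "color":
                    self.colors.append(value)


parser = FontColorFinder()
parser.feed('<h1><font COLOR="#FF0000">Heading Level 1</font></h1>')
parser.close()
print(parser.colors)  # ['#FF0000']
```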

Alex.

-- 
Speak softly but carry a big carrot.
