help with cr in reg exp...

GrelEns grelens at NOSPAMyahoo.NOTNEEDEDfr
Sat Jan 17 05:30:09 EST 2004


hello,

I ran into a problem with re that I don't understand (this is just a silly
example to illustrate it; to parse HTML I use sgmllib).
Given this string:

>>> import re
>>> s = """<form name="test" method="post" action="test.php">
<input type="text" name="title" size="1." value="test...">
<br>
<a href="help.php">help</a>
</form>"""

why do I get:

>>> p = re.compile("(?=<form|<FORM).*(?=</form>|</FORM>)"); p.findall(s)
[]

while I was expecting this kind of behaviour:
['form name="test" method="post" action="test.php">\n<input type="text"
name="title" size="1." value="test...">\n<br>\n<a href="help.php">help</a>']

which is nearly what I get with:
>>> p = re.compile("(?=<form|<FORM).*(?=</form>|</FORM>)"); p.findall(s.replace('\n', ''))
['<form name="test" method="post" action="test.php"><input type="text"
name="title" size="1." value="test..."><br><a href="help.php">help</a> ']

It looks like \n isn't matched by . (dot) in my re, while I thought (and
need) it should be. I must be missing something.
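
For reference, the behaviour described above is consistent with the documented
default of the re module, where . matches any character except a newline; the
re.DOTALL flag (also spelled re.S) lifts that restriction. A minimal sketch,
reusing the string and pattern from this post (the names pattern,
default_matches and dotall_matches are only illustrative):

```python
import re

s = """<form name="test" method="post" action="test.php">
<input type="text" name="title" size="1." value="test...">
<br>
<a href="help.php">help</a>
</form>"""

# Same pattern as in the post: lookahead for the opening tag,
# then anything, up to a lookahead for the closing tag.
pattern = "(?=<form|<FORM).*(?=</form>|</FORM>)"

# Default flags: '.' stops at every newline, so '.*' can never
# reach '</form>' from the first line and nothing matches.
default_matches = re.compile(pattern).findall(s)

# re.DOTALL: '.' also matches '\n', so the match can span lines.
dotall_matches = re.compile(pattern, re.DOTALL).findall(s)

print(default_matches)
print(dotall_matches)
```

Because both ends of the pattern are lookaheads, the DOTALL match should start
at the opening <form (including the "<") and stop just before </form>.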

thanks!




