re.DOTALL
Bengt Richter
bokr at oz.net
Wed Nov 27 16:02:11 EST 2002
On Wed, 27 Nov 2002 13:46:56 -0500, Irina Szabo <irina at simbiosys.ca> wrote:
>
>I need to remove all tags from HTML files.
>
>matchstr = re.compile(r'''<.*?>''',re.DOTALL|re.MULTILINE)
>print matchstr.sub(" ", str )
^^^--- BTW, using builtin names for variables is not a good idea ;-)
>
>The program works well for tags located on one line,
>but doesn't delete tags if the brackets <> are on different lines, like
>
><!--
>body { font-family: Arial, Helvetica, sans-serif; font-size: 10pt; color:
>#000000}
>-->
>
>What is wrong?
>
If you had posted what you actually did (noodge ;-),
it would probably have been clear, but ...
====< interactive snip >=======
>>> aString = """
... something, followed by your html comment:
... <!--
... body { font-family: Arial, Helvetica, sans-serif; font-size: 10pt; color:
... #000000}
... -->
... followed by this line.
... """
>>> import re
>>> matchstr = re.compile(r'''<.*?>''',re.DOTALL|re.MULTILINE)
>>> print matchstr.sub(" ", aString)
something, followed by your html comment:
followed by this line.
===============================
... works, so you must have done something else.
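For comparison, here is a minimal sketch of what probably happens when re.DOTALL is left out, which may be what the original script actually did. It is written with Python 3's print() rather than the print statement used in the session above, and the names html, single_line and multi_line are made up for illustration. Note that re.MULTILINE only changes how ^ and $ behave, so it makes no difference to this pattern; re.DOTALL is the flag that lets '.' cross a newline.

```python
import re

# Hypothetical sample: a multi-line comment plus two single-line tags.
html = """before
<!--
body { color:
#000000}
-->
after <b>bold</b> text
"""

# Without DOTALL, '.' stops at a newline, so the comment is never matched.
single_line = re.compile(r'<.*?>')
# With DOTALL, '.' also matches '\n', so the whole comment is one match.
multi_line = re.compile(r'<.*?>', re.DOTALL)

without_dotall = single_line.sub(" ", html)
with_dotall = multi_line.sub(" ", html)

print(without_dotall)   # comment survives, <b> and </b> are removed
print(with_dotall)      # comment and tags are all removed
```

If the first result looks like the output you were getting, check that the flags argument really reaches re.compile in your script.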
Regards,
Bengt Richter