[Tutor] RE problems
Kent Johnson
kent_johnson at skillsoft.com
Sat Aug 7 19:25:33 CEST 2004
I'm not sure what you are trying to do. If you want to match all the tags
in the HTML, try re.findall() instead of re.match(). match() only tries to
match at the very start of the string, so it returns at most one result.
For example:
>>> import re
>>> text = '''<p>Here is some text<br>\nOn two lines</p>'''
>>> p=re.compile(r"(<.*?>)",re.DOTALL)
>>> print p.match(text).group()
<p>
>>> print p.findall(text)
['<p>', '<br>', '</p>']
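The same comparison as a standalone script, in case that is easier to experiment with. This is only a sketch: it uses Python 3 print syntax rather than the interpreter session above, and the names tag, first, all_tags, spans and greedy are my own for illustration. It also shows finditer(), which gives match objects when you need positions, and what happens if the non-greedy ? is left out.

```python
import re

text = '<p>Here is some text<br>\nOn two lines</p>'

# Non-greedy .*? stops at the first '>' after each '<'.
tag = re.compile(r'<.*?>', re.DOTALL)

# match() only tries at position 0, so it yields at most one result.
first = tag.match(text).group()

# findall() scans the whole string and returns every non-overlapping match.
all_tags = tag.findall(text)

# finditer() yields match objects, handy when the positions are needed too.
spans = [(m.group(), m.start()) for m in tag.finditer(text)]

# A greedy .* with DOTALL runs on to the last '>' in the string,
# so the whole text comes back as a single "tag".
greedy = re.findall(r'<.*>', text, re.DOTALL)

print(first)
print(all_tags)
print(spans)
print(greedy)
```

If only the tag strings are needed, findall() is the simplest choice; finditer() is worth the extra typing only when you also want to know where each tag sits in the text.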
If this is not what you meant, please post a short snippet of HTML and the
result you are trying to get from it.
At 05:23 PM 8/6/2004 -0400, James Alexander McCarney wrote:
>Hi tutors,
>I am having problems returning everything I want from a regular expression.
>I am merely getting the first string in the html text file, which stands to
>reason as per the code.
>Could someone give me the magic to put all the strings tagged < > thus.
>Thanks for any tips you can provide. As for the document,
>I am reading amk's RE how-to; and I know it's all in there; it's just that
>I've cudgeled my brains a lot today. ;-(
>Best regards,
>import pythoncom
>from win32com.client import Dispatch
>import re
>app = Dispatch('Word.Application')
>app.Visible = 1
>doc = app.Documents.Add()
>f = open(r"C:\myfile.html")
>allines = f.read()
>s1 = doc.Sentences(1)
>s1.Text = m.group()
>app = None
>Tutor maillist - Tutor at python.org