[Tutor] RE problems

Kent Johnson kent_johnson at skillsoft.com
Sat Aug 7 19:25:33 CEST 2004


James,

I'm not sure what you are trying to do. If you want to match all the tags 
in the HTML, try re.findall() instead of re.match() - match() only tries to 
match at the beginning of the string, so it returns at most one result.

For example:
 >>> import re
 >>> text = '''<p>Here is some text<br>\nOn two lines</p>'''
 >>> p = re.compile(r"(<.*?>)", re.DOTALL)
 >>> print(p.match(text).group())
<p>
 >>> print(p.findall(text))
['<p>', '<br>', '</p>']
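
Strictly speaking, match() only looks at the very start of the string, so it
would return None if the text began with anything other than a tag. Here is a
small sketch of how match(), search(), findall() and finditer() differ on the
same pattern. The names `tag` and `shifted` are made up for this example.

```python
import re

text = '<p>Here is some text<br>\nOn two lines</p>'
tag = re.compile(r"(<.*?>)", re.DOTALL)

# Put some plain text in front so the string no longer starts with a tag.
shifted = "intro " + text

# match() only tries at position 0, so here it finds nothing at all.
at_start = tag.match(shifted)

# search() scans forward and returns the first match anywhere in the string.
first = tag.search(shifted).group()

# findall() returns every non-overlapping match as a list of strings.
all_tags = tag.findall(shifted)

# finditer() yields match objects, useful when the positions matter too.
spans = [(m.start(), m.group()) for m in tag.finditer(shifted)]

print(at_start)
print(first)
print(all_tags)
```

search() is the one to use if you only want the first tag but it may not be
at the start of the file. findall() is the one to use if you want them all.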


If this is not what you meant, please post a short snippet of HTML and the 
result you are trying to get from it.
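
If it is what you meant, a rough sketch of how it might fit into your script
follows. The helper names find_tags and tags_as_text are hypothetical, and
joining the tags with newlines is only an assumption about how you want them
to appear. I have not run this against Word.

```python
import re

# Same pattern as in your script; DOTALL lets a tag span a line break.
TAG = re.compile(r"(<.*?>)", re.DOTALL)

def find_tags(html):
    """Return every <...> tag in html, in order of appearance."""
    return TAG.findall(html)

def tags_as_text(html):
    """Join all the tags into one string, one tag per line."""
    return "\n".join(find_tags(html))

# Stand-in for the contents of your HTML file (allines in your script).
sample = "<html><body><p>Hi</p></body></html>"
print(tags_as_text(sample))
```

In your script you would pass allines to tags_as_text() and assign the result
to s1.Text in place of m.group().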

Kent

At 05:23 PM 8/6/2004 -0400, James Alexander McCarney wrote:
>Hi tutors,
>
>I am having problems returning everything I want from a regular expression.
>I am merely getting the first string in the html text file, which stands to
>reason as per the code.
>
>Could someone give me the magic to return all the strings tagged < > like this?
>
>Thanks for any tips you can provide.
>
>As for the documentation, I am reading amk's RE how-to, and I know it's all
>in there; it's just that I've cudgeled my brains a lot today. ;-(
>
>Best regards,
>James
>
>##
>##
>import pythoncom
>from win32com.client import Dispatch
>import re
>
>
>app = Dispatch('Word.Application')
>app.Visible = 1
>
>doc = app.Documents.Add()
>
>f = open(r"C:\myfile.html")
>
>##
>##
>##
>##
>
>allines = f.read()
>p=re.compile(r"(<.*?>)",re.DOTALL)
>m=p.match(allines)
>
>##
>
>
>s1 = doc.Sentences(1)
>s1.Text = m.group()
>
>doc.SaveAs(r"C:\myTestPy.doc")
>
>app.Quit()
>app = None
>
>pythoncom.CoUninitialize()
>f.close()
>


