[Tutor] RE problems

Brandon Bennett bennetb at gmail.com
Fri Aug 6 23:38:52 CEST 2004


I think this is a classic example of the greedy .* problem.

Use "(<[^>]*>)"

This matches every character between < and > that is not > (the end of the tag).
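To see the difference, here is a small sketch comparing a greedy .*, the non-greedy .*? used in the quoted code, and the [^>]* class above. The HTML string is made up for illustration. It uses re.findall(), which returns every non-overlapping match. Note also that match() only tries at the very start of the string and returns at most one match, which on its own would explain getting only the first tag, whichever pattern is used.

```python
import re

# A small, made-up HTML snippet used only for illustration.
html = "<p>Hello <b>world</b></p>"

# Greedy: .* runs as far as it can, so one match spans from the first < to the last >.
greedy = re.findall(r"<.*>", html)

# Non-greedy: .*? stops at the first > it can reach, giving one match per tag.
lazy = re.findall(r"<.*?>", html)

# Negated class: [^>]* can never cross a >, so each match is exactly one tag.
# With a single capture group, findall() returns the group text (here, the whole tag).
negated = re.findall(r"(<[^>]*>)", html)

# match() only tries at position 0 and returns a single match object (or None).
first = re.compile(r"(<[^>]*>)").match(html).group()

print(greedy)   # ['<p>Hello <b>world</b></p>']
print(lazy)     # ['<p>', '<b>', '</b>', '</p>']
print(negated)  # ['<p>', '<b>', '</b>', '</p>']
print(first)    # <p>
```

On tag-shaped input like this, the non-greedy and negated-class patterns give the same result; the negated class simply states "no > inside" directly instead of relying on backtracking.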

~Brandon

On Fri, 6 Aug 2004 17:23:07 -0400, James Alexander McCarney
<james.mccarney at cgi.com> wrote:
> Hi tutors,
> 
> I am having problems returning everything I want from a regular expression.
> I am merely getting the first string in the html text file, which stands to
> reason as per the code.
> 
> Could someone give me the magic to return all the strings tagged < >?
> 
> Thanks for any tips you can provide. As for documentation,
> I am reading amk's RE how-to, and I know it's all in there; it's just that
> I've cudgeled my brains a lot today. ;-(
> 
> Best regards,
> James
> 
> ##
> ##
> import pythoncom
> from win32com.client import Dispatch
> import re
> 
> app = Dispatch('Word.Application')
> app.Visible = 1
> 
> doc = app.Documents.Add()
> 
> f = open(r"C:\myfile.html")
> 
> ##
> ##
> ##
> ##
> 
> allines = f.read()
> p=re.compile(r"(<.*?>)",re.DOTALL)
> m=p.match(allines)
> 
> ##
> 
> s1 = doc.Sentences(1)
> s1.Text = m.group()
> 
> doc.SaveAs(r"C:\myTestPy.doc")
> 
> app.Quit()
> app = None
> 
> pythoncom.CoUninitialize()
> f.close()
> 
> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> http://mail.python.org/mailman/listinfo/tutor
>
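Applied to the quoted script, a minimal sketch of just the reading-and-matching part could look like the following. It leaves out the Word/COM automation (which needs Windows and pywin32), and extract_tags and TAG_RE are hypothetical names chosen for illustration, not part of any library.

```python
import re

# Same idea as the pattern suggested above: <, then anything but >, then >.
TAG_RE = re.compile(r"<[^>]*>")


def extract_tags(path):
    """Return every <...> tag in the file at `path`, in document order.

    extract_tags is a hypothetical helper name, not part of any library.
    """
    # 'with' closes the file even if something goes wrong while reading.
    with open(path) as f:
        text = f.read()
    # findall() scans the whole string; match() would stop after the first tag.
    return TAG_RE.findall(text)
```

In the quoted script, something like s1.Text = "\n".join(extract_tags(r"C:\myfile.html")) could then stand in for the p.match(...) and m.group() lines, putting one tag per line into the Word document.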

