[Tutor] RE help

Ron Nixon nixonron at yahoo.com
Tue Feb 15 21:37:40 CET 2005


Problem solved. Thanks


--- Kent Johnson <kent37 at tds.net> wrote:

> Try it with non-greedy matches. You are matching
> everything from the first <hX><a to the last </p> 
> in one match. Also I think you want to escape the .
> before </p> (you want just paragraphs that end 
> in a period?)
> 
> pattern = re.compile("""<h[1-2]><a href="/(.*?)">(.*?)\.</p>""", re.DOTALL)
> 
> Kent
> 
> Ron Nixon wrote:
> > Trying to scrape a newspaper site for articles using
> > this code, which was done with help from the list:
> > 
> > import urllib, re
> > pattern = re.compile("""<h[1-2]><a href="/(.*)">(.*).</p>""", re.DOTALL)
> > page = urllib.urlopen("http://www.startribune.com").read()
> > 
> > for headline, body in pattern.findall(page):
> >     print body
> > 
> > It should grab articles from this:
> > 
> > <h2><a href="/stories/507/5240764.html">Sid Hartman:
> > Franchise could be moved</a></h2><p>If Reggie Fowler
> > and his business partners from New Jersey are approved
> > to buy the Vikings franchise from Red McCombs, it is
> > my opinion the franchise remains in danger of
> > eventually being relocated.</p>
> > 
> > and give me this: Sid Hartman: Franchise could be
> > moved</a></h2><p>If Reggie Fowler and his business
> > partners from New Jersey are approved to buy the
> > Vikings franchise from Red McCombs, it is my opinion
> > the franchise remains in danger of eventually being
> > relocated.
> > 
> > Instead it gives me this: <b>Boxerjam</b></a>.
> > from this:
> >
> > href="http://www.startribune.com/stories/1559/4773140.html"><b>Boxerjam</b></a>.
> > </p></div>
> > 
> > I know the re works in other programs I've tried. Is
> > there something different about re's in Python?
> > 
> > _______________________________________________
> > Tutor maillist  -  Tutor at python.org
> > http://mail.python.org/mailman/listinfo/tutor
> > 
> 
> 
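
To illustrate Kent's suggestion, here is a minimal sketch comparing the greedy and non-greedy patterns. The page string is an assumption: it reuses the article markup quoted above and adds a made-up trailing Boxerjam link in place of the real Star Tribune page. It is written for Python 3, so print is a function.

```python
import re

# Hypothetical sample page: the article markup from the original post,
# followed by an unrelated link standing in for the rest of the site.
page = (
    '<h2><a href="/stories/507/5240764.html">Sid Hartman: Franchise could be moved</a></h2>'
    '<p>If Reggie Fowler and his business partners from New Jersey are approved '
    'to buy the Vikings franchise from Red McCombs, it is my opinion the '
    'franchise remains in danger of eventually being relocated.</p>\n'
    '<div><p>Play <a href="http://www.startribune.com/stories/1559/4773140.html">'
    '<b>Boxerjam</b></a>.\n</p></div>'
)

# Greedy: each (.*) grabs as much as it can, so the first group runs to the
# last "> on the page, and the unescaped . matches the newline before </p>.
greedy = re.compile(r'<h[1-2]><a href="/(.*)">(.*).</p>', re.DOTALL)

# Non-greedy: each (.*?) stops at the earliest point where the rest of the
# pattern can match, and \. requires a literal period before </p>.
lazy = re.compile(r'<h[1-2]><a href="/(.*?)">(.*?)\.</p>', re.DOTALL)

greedy_matches = greedy.findall(page)
lazy_matches = lazy.findall(page)

for headline, body in greedy_matches:
    print("greedy body:", body)

for headline, body in lazy_matches:
    print("lazy body:", body)
```

With this sample, the greedy pattern yields a single match whose body is the Boxerjam fragment, while the non-greedy pattern stops at the first "> and the first .</p>, so the body starts at the headline text. Because the period sits outside the second group, it is not included in the captured body.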



More information about the Tutor mailing list