[Tutor] Value Error solved. Another question
Kent Johnson
kent37 at tds.net
Mon Feb 14 12:06:38 CET 2005
Ron Nixon wrote:
> Ignore my first posting. Here's what I'm trying to do.
> I want to extract headlines from a newspaper's website
> using this code. It works, but I want to match the
> second group in <h2><a href="(.*)">(.*)</p> and print
> that out.
> Suggestions?
>
>
> import urllib, re
> pattern = re.compile("""<h2><a href="(.*)">(.*)</p>""", re.DOTALL)
> page = urllib.urlopen("http://www.startribune.com").read()
> for headline in pattern.findall(page):
>     print headline
I think you want:

    for headline, body in pattern.findall(page):
        print body

When the pattern contains more than one group, pattern.findall() returns a list of tuples, one
tuple per match with one item per group. Your regex has two groups, so in your code headline is
bound to a two-item tuple on each pass through the loop. In my code the tuple is unpacked into two
names, so you can print just the second item.
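
To make the tuple behaviour concrete, here is a small self-contained sketch. It is written for Python 3 (so print is a function), and the sample HTML string and the names link and body are made up for illustration; they are not taken from the real page. It also uses non-greedy .*? instead of .* so that each headline is matched separately, which is a change from the pattern in the question.

```python
import re

# Hypothetical sample HTML standing in for the downloaded page.
page = (
    '<h2><a href="/news/1">First headline</a></h2><p>First summary</p>\n'
    '<h2><a href="/news/2">Second headline</a></h2><p>Second summary</p>\n'
)

# Same shape as the pattern in the question, but with non-greedy .*?
# (an assumption of this sketch) so each match stops at the nearest </p>.
pattern = re.compile(r'<h2><a href="(.*?)">(.*?)</p>', re.DOTALL)

# With two groups in the pattern, findall() returns a list of 2-tuples.
matches = pattern.findall(page)

# Unpack each tuple into two names and print only the second group.
for link, body in matches:
    print(body)
```

Looping with a single name, as in the original code, would bind that name to the whole (link, body) tuple instead.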
PS You might want to look at BeautifulSoup:
http://www.crummy.com/software/BeautifulSoup/
Kent