[Tutor] An idea for a script
witham.ian at gmail.com
Thu Oct 11 02:17:50 CEST 2007
On 10/11/07, Dick Moores <rdm at rcblue.com> wrote:
> At 04:20 PM 10/10/2007, Dick Moores wrote:
> >How about a hint of how to get those ">jcooley<" things from the
> >source? (I'm able to have the script get the source, using urllib2.)
> >BTW I thought I wouldn't try to use BeautifulSoup right now, but
> >take the hard way.
> I asked for a hint too soon. A light went on, and I think I'm on the way
> from urllib2 import *
> u = 'http://starship.python.net/crew/index.html'
> f = urlopen(u)
> a = f.read()
> b = a.split('"')
> print b
> for x in b:
>     if '<' not in x:
>         print x
> This gets all, but not only, those ">jcooley<" things, I believe.
That looks like it will work...
Try starting with a couple of 'splits' so that you are only working with the
data between "The Crew" and "Looking for the official"
a = f.read()
a = a.split("The Crew")[1].split("Looking for")[0]
Now you are only examining the relevant block of HTML.
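In case the double split is hard to follow, here is a rough sketch of the same idea as a small helper. The name between() and the sample string are made up for illustration (I have not copied the real page source), and it assumes both marker strings are actually present; if the first one is missing, the [1] raises an IndexError.

```python
def between(text, start, end):
    # Keep what follows the first occurrence of `start`...
    after_start = text.split(start, 1)[1]
    # ...then keep what precedes the first occurrence of `end` in that remainder.
    return after_start.split(end, 1)[0]

# Hypothetical stand-in for the page source, not the real HTML.
sample = ('intro <h2>The Crew</h2> <a href="jcooley/">jcooley</a> '
          '<p>Looking for the official site?')

block = between(sample, "The Crew", "Looking for")
```

Passing 1 as the second argument to split() stops after the first match, so a later repeat of either marker does not chop the block up further.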
You can now filter the list with a list comprehension:
b = a.split('"')
b = [x for x in b if '<' not in x]
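To see what the filter keeps, it may help to try it on a short string first. This is only a sketch on invented HTML (the "someone" entry is hypothetical), so the real page may well behave differently:

```python
# Hypothetical block of HTML, standing in for the trimmed page source.
block = ('</h2> <a href="jcooley/">jcooley</a> '
         '<a href="someone/">someone</a> <p>')

# Splitting on the double quote alternates between markup and quoted values.
pieces = block.split('"')

# Keep only the pieces that contain no '<' character.
kept = [x for x in pieces if '<' not in x]
```

On this made-up sample, kept holds the quoted attribute values, while the pieces with the link text still carry the surrounding tags, so it is worth printing pieces and kept side by side before relying on the result.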