[Tutor] An idea for a script

Ian Witham witham.ian at gmail.com
Thu Oct 11 02:17:50 CEST 2007


On 10/11/07, Dick Moores <rdm at rcblue.com> wrote:
>
> At 04:20 PM 10/10/2007, Dick Moores wrote:
> >How about a hint of how to get those ">jcooley<" things from the
> >source? (I'm able to have the script get the source, using urllib2.)
> >
> >BTW I thought I wouldn't try to use BeautifulSoup right now, but
> >take the hard way.
> >
> >Dick
>
> I asked for a hint too soon. A light went on, and I think I'm on the way
> with
>
> from urllib2 import *
> u = 'http://starship.python.net/crew/index.html'
> f = urlopen(u)
> a =  f.read()
> b = a.split('"')
> print b
> for x in b:
>      if '<' not in x:
>          print x
>
> This gets all, but not only, those ">jcooley<" things, I believe.


That looks like it will work.
Try starting with a couple of splits, so that you are only working with the
data between "The Crew" and "Looking for the official":

a = f.read()
a = a.split("The Crew")[1].split("Looking for")[0]

Now you are only examining the relevant block of HTML.
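
One caveat with the chained splits: if either marker is missing from the page, indexing with [1] raises an IndexError. Here is a minimal sketch of a more forgiving version using str.partition. The helper name between() and the sample HTML are made up for illustration; the real page's markup may well differ.

```python
# Hypothetical sample standing in for the page source; the real markup may differ.
sample = (
    '<h1>The Crew</h1>'
    '<a href="~jcooley/">jcooley</a> '
    '<a href="~amk/">amk</a>'
    '<p>Looking for the official Python site?</p>'
)

def between(text, start, end):
    """Return the text after the first `start` marker and before the next `end` marker.

    Returns an empty string if either marker is missing.
    """
    # str.partition never raises; the middle item is '' when the marker is absent
    _, found_start, rest = text.partition(start)
    if not found_start:
        return ''
    block, found_end, _ = rest.partition(end)
    return block if found_end else ''

block = between(sample, "The Crew", "Looking for")
print(block)
```

Note that this is not quite identical to the split version: if "The Crew" appeared more than once, split(...)[1] would stop at the second occurrence, while partition keeps everything after the first.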
You can then split that block on the double quote, as before, and filter the
resulting list with a list comprehension:

b = a.split('"')
b = [s for s in b if '<' not in s]  # 's' rather than 'u', so the URL in u is not overwritten
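
To see what the filter actually keeps, here is a small sketch run on a made-up block of HTML (the markup is an assumption, not copied from the real page). With links written this way, the pieces that survive are the quoted href values, so a little stripping may still be needed to get bare names.

```python
# Hypothetical block of HTML; the real page's markup is an assumption here.
block = '</h1><a href="~jcooley/">jcooley</a> <a href="~amk/">amk</a><p>'

# Splitting on the double quote alternates between markup and attribute values
pieces = block.split('"')

# Keep only the pieces that contain no tag markup
kept = [s for s in pieces if '<' not in s]
print(kept)

# For this sample, strip the leading '~' and trailing '/' to get bare names
names = [s.strip('~/') for s in kept]
print(names)
```

On a real page, other quoted attribute values (image sources, class names and so on) would pass the same filter, which may be why the original loop printed more than just the names.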

Ian.