[Tutor] Accessing a Website

Joel Goldstick joel.goldstick at gmail.com
Thu Jul 12 20:42:38 CEST 2012


On Thu, Jul 12, 2012 at 2:03 PM, Fred G <bayespokerguy at gmail.com> wrote:
> Hi--
>
> My pseudocode is the following
>
> new_dictionary = {}
> for name in file:
>  #1) log into university account
>  #2) go to website with data
>  #3) type in search box: name
>  #4) click search
>  #5) if name is exact match with name of one of the hits:
>     line.find("Code Number")
>     #6) remove the number directly after "Code Number: " and stop at the
> next space
> new_dictionary[name] = Code Number
>
> With the exception of step 6, I'm not quite sure how to do this in Python.
> Is it very complicated to write a script that logs onto a website that
> requires a user name and password that I have, and then repeatedly enters
> names and gets their associated id's that we want?  I used to work at a
> cancer lab where we decided we couldn't do this kind of thing to search
> PubMed, and that a human would be more accurate even though our criterion was
> simply (is there survival data?).  I don't think that this has to be the
> case here, but would greatly appreciate any guidance.
>
> Thanks so much.
>
Python's standard library has a couple of modules (urllib, urllib2) that
let your program access a website, and there is a third-party one that I
like called requests.  (http://docs.python-requests.org/en/latest/index.html)
Read through its examples and see if they help you on your way.
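
For the login step, here is a rough sketch of what posting a login form
could look like.  The form field names ("username", "password") are
guesses on my part; look at the real login page's HTML to find the actual
ones.  The session argument is meant to be a requests.Session(), which
keeps the login cookies for later requests.

```python
def log_in(session, login_url, username, password):
    """Post credentials to a login form and return the same session.

    The field names "username" and "password" are assumptions; the real
    form may use different ones.
    """
    response = session.post(
        login_url,
        data={"username": username, "password": password},
    )
    # Raise an exception if the server answered with an HTTP error status.
    response.raise_for_status()
    return session
```

You would call it as log_in(requests.Session(), login_url, user, pw) and
then reuse the returned session for the searches.  Some sites also need a
hidden token from the login page, so this may not be enough on its own.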
Also, go to the web page with the search box, enter a search term,
submit it, and look at the URL afterwards.  If the form uses a 'get'
request, your search term will appear at the tail of the URL as a query
string.  In that case you can build those URLs in your code and request
the results directly (with the requests module).
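
A minimal sketch of building such a URL, assuming the search really is a
'get' request.  The address http://example.edu/search and the parameter
name "q" are made up; use whatever you see in your browser's address bar
after a search.  This is written for Python 3 (in Python 2, urlencode
lives in urllib).

```python
from urllib.parse import urlencode

# Hypothetical values: replace them with what the real search page uses.
SEARCH_URL = "http://example.edu/search"
QUERY_PARAM = "q"


def build_search_url(name):
    """Return the URL a 'get' search form would produce for this name."""
    return SEARCH_URL + "?" + urlencode({QUERY_PARAM: name})


def fetch_results(name):
    """Download the results page for this name as text."""
    import requests  # third-party; imported here so the rest runs without it

    # requests builds the same query string from the params dictionary.
    response = requests.get(SEARCH_URL, params={QUERY_PARAM: name})
    response.raise_for_status()
    return response.text


print(build_search_url("Jane Doe"))  # http://example.edu/search?q=Jane+Doe
```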
There is also a great module called Beautiful Soup (use version 4) that can
help you parse through your results.
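
For your step 6, something along these lines might work.  It assumes the
results page contains the literal text "Code Number: " followed by the
code, as in your pseudocode; the sample string below is invented.
Beautiful Soup is only used to strip the HTML down to plain text, and a
regular expression then picks out the code up to the next whitespace.

```python
import re

# Matches "Code Number:" and captures everything up to the next whitespace.
CODE_PATTERN = re.compile(r"Code Number:\s*(\S+)")


def page_text(html):
    """Strip the tags from an HTML page and return its visible text."""
    from bs4 import BeautifulSoup  # third-party: Beautiful Soup 4

    return BeautifulSoup(html, "html.parser").get_text(" ")


def extract_code(text):
    """Return the code that follows "Code Number:", or None if absent."""
    match = CODE_PATTERN.search(text)
    return match.group(1) if match else None


# Invented sample text, just to show the extraction step.
print(extract_code("Name: Jane Doe Code Number: 12345 Dept: Biology"))  # 12345
```

In the loop over names you would store extract_code(page_text(html))
under new_dictionary[name] whenever it is not None.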


-- 
Joel Goldstick

