[Tutor] Automating web page parsing
Kent Johnson
kent37 at tds.net
Thu Mar 30 01:33:46 CEST 2006
Srinivas Iyyer wrote:
> For a third party to analyze data from that chip, we
> need to know the design of that chip, and that
> information is in one file, in this case a GAL file.
> It is difficult and cumbersome to identify the design
> file for every chip and get it into your directory.
> However, on their website SMD
> (http://genome-www5.stanford.edu/), it is possible to
> go to each design file and obtain the data. Since
> this is a time-consuming procedure, I wrote a socket
> script that would give me the URL of the file and
> allow me to download it. The first barrier is that
> their database does not allow socket programming.
What did you try? How did it fail?
The website requires a form-based login which probably returns a cookie
to your browser. Your socket solution needs to take this into account.
There are some articles here with more info:
http://www.voidspace.org.uk/python/articles.shtml#http
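As a rough sketch of the usual approach, you can let the standard library
(urllib2 plus cookielib) handle the cookie instead of working with raw
sockets. This is untested against SMD; the login URL and the form field
names below are guesses, so view the HTML source of the real login form to
find the correct ones.

import urllib, urllib2, cookielib

# A cookie jar that remembers the session cookie set at login
cookies = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookies))

# Hypothetical login URL and field names - check the real form's HTML
login_data = urllib.urlencode({'login': 'your_username',
                               'password': 'your_password'})
opener.open('http://genome-www5.stanford.edu/cgi-bin/login.pl', login_data)

# Requests made through the same opener send the cookie along, so
# pages behind the login should now be reachable
file_url = 'http://genome-www5.stanford.edu/path/to/a/design/file'
data = opener.open(file_url).read()
open('design.gal', 'wb').write(data)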
>
> Unfortunately, I have to access each file (there could
> be 40 - 100 files), get redirected to another page, and
> only then can I download it.
>
> Is there a way to automate this procedure through a
> browser?
>
> Is there any alternative to all this clicking?
There are several packages intended to help script web sites; take a look at
the following (a short mechanize sketch follows the list):
twill http://www.idyll.org/~t/www-tools/twill/
mechanize and ClientForm http://wwwsearch.sf.net/
http://python.org/pypi/mechanoid
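For example, with mechanize the cookie handling and link following are done
for you. This is just an untested sketch; the form index and the field names
are assumptions, so inspect the actual login page to fill in the real ones.

import mechanize

br = mechanize.Browser()
br.open('http://genome-www5.stanford.edu/')  # cookies handled automatically
br.select_form(nr=0)             # assuming the first form is the login form
br['login'] = 'your_username'    # hypothetical field names
br['password'] = 'your_password'
br.submit()

# Follow every link whose URL looks like a GAL design file and save it
for link in br.links(url_regex=r'\.gal'):
    response = br.follow_link(link)
    fname = link.url.split('/')[-1]
    open(fname, 'wb').write(response.read())
    br.back()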
Kent