[Tutor] Automating web page parsing

Srinivas Iyyer srini_iyyer_bio at yahoo.com
Wed Mar 29 23:59:38 CEST 2006


Dear group, 

***Disclaimer***Not suitable for BioPython list***

I work with GeneChips to analyze human gene expression
patterns. These chips come in several kinds, one of
which is made at Stanford University. A typical
experiment uses roughly 40 chips or more.

For a third party to analyze data from a chip, we
need to know the design of that chip, and that
information is in one file. In this case it is a GAL
file. It is difficult and cumbersome to identify the
design file for every chip and get it into your
directory. However, on their website, SMD
(http://genome-www5.stanford.edu/), it is possible to
go to each design file and obtain the data. Since
this is a time-consuming procedure, I wrote a socket
script that would give me the URL of the file and
allow me to download it. The first barrier is that
their database does not allow socket programming.

Unfortunately, I have to access each file (there could
be 40 - 100 files) and get redirected to another page,
and only there am I able to download it.

Is there a method to automate this procedure through a
browser?

Is there any alternative to such clicks?

Example:
http://smd.stanford.edu/cgi-bin/data/viewDetails.pl?fullID=32898GENEPIX0
At the bottom of this page there is a link, 'Generate
GAL file'; that URL will allow me to get the GAL
file.

I cannot sit for a whole evening and click ~40x30 times
to download them. It is painful. Are there any smart
ways to hack this process?
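
To make the question concrete, here is a rough sketch of
the procedure I would like to automate, using only the
standard library. It rests on assumptions I have not
verified: that the details page is plain HTML, that the
link text really reads 'Generate GAL file', that no login
is needed, and that the server accepts scripted requests
at all (it may well refuse them, as it did my socket
script). The names LinkFinder, find_gal_url and
download_gal are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumption: link text as described for the example page above.
LINK_TEXT = "Generate GAL file"


class LinkFinder(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace so wrapped link text still matches.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def find_gal_url(page_html, page_url, link_text=LINK_TEXT):
    """Return the absolute URL of the first link containing link_text, or None."""
    finder = LinkFinder()
    finder.feed(page_html)
    for href, text in finder.links:
        if link_text.lower() in text.lower():
            return urljoin(page_url, href)
    return None


def download_gal(details_url, out_path):
    """Fetch one details page, follow its GAL link, save the response body."""
    with urlopen(details_url) as resp:
        page_html = resp.read().decode("utf-8", errors="replace")
    gal_url = find_gal_url(page_html, details_url)
    if gal_url is None:
        raise ValueError("no GAL link found on " + details_url)
    with urlopen(gal_url) as resp, open(out_path, "wb") as out:
        out.write(resp.read())
```

The idea would be to call download_gal() once per
viewDetails.pl URL in a loop, with a pause between
requests so as not to hammer their server. Whether SMD
permits this kind of access is something I would still
have to check.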

Thanks
Sri.


