[Tutor] Automating web page parsing

Thu Mar 30 06:34:27 CEST 2006

Srinivas Iyyer wrote:
> Dear group, 
>
> ***Disclaimer***Not suitable for BioPython list***
>
> I work with GeneChips to analyze human gene expression
> patterns. These chips come in several varieties, one of
> which is made at Stanford University. A typical
> experiment uses 40 or more chips.
>
> For a third party to analyze data from a chip, we need
> to know the design of that chip, and that information
> is kept in a single file. In this case it is a GAL file.
> It is difficult and cumbersome to identify the design
> file for every chip and get it into your directory.
> However, on their website SMD
> (http://genome-www5.stanford.edu/), it is possible to
> go to each design file and obtain the data.  Since
> this is a time-consuming procedure, I wrote a socket
> script that would give me the URL of the file and
> allow me to download it.  The first barrier is that
> their database does not allow socket programming.
>
> Unfortunately, I have to access each file (there could
> be 40 - 100 files) and get redirected to another page,
> and only there am I able to download it.
>
> Is there a method to automate this procedure through a
> browser?
>
> Is there any alternative to all these clicks?
>
> Example:
> http://smd.stanford.edu/cgi-bin/data/viewDetails.pl?fullID=32898GENEPIX0
>   
Unfortunately for me when I go to that link I get a login form, not what 
you describe. So I can't help until I know how to log in.
> At the bottom of this page there is a link,
> 'Generate GAL file'; that URL will allow me to get the
> GAL file.
>
> I cannot sit for a whole evening, click ~40x30 times,
> and download each file. It is painful. Are there any
> smart ways to automate this process?
>
> Thanks
> Sri.
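
Once logging in is sorted out, something along these lines might be a
starting point. It is only a sketch using the standard library: it keeps
the session cookie in a cookie jar, fetches a details page, looks for the
link whose text is 'Generate GAL file', and saves whatever that link
returns (urllib follows redirects on its own). The login URL and the form
field names are assumptions, since I cannot see the form's HTML; you would
have to read them from the login page source. The helper names are made up
for illustration.

```python
from html.parser import HTMLParser
from http.cookiejar import CookieJar
from urllib.parse import urlencode, urljoin
from urllib.request import HTTPCookieProcessor, build_opener

# Taken from the example URL in the question; the ID is appended to it.
DETAILS_URL = "http://smd.stanford.edu/cgi-bin/data/viewDetails.pl?fullID="


class LinkFinder(HTMLParser):
    """Collect the href of every <a> whose text contains a given phrase."""

    def __init__(self, phrase):
        super().__init__()
        self.phrase = phrase.lower()
        self.matches = []
        self._href = None  # href of the <a> we are currently inside
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if self.phrase in "".join(self._text).lower():
                self.matches.append(self._href)
            self._href = None


def find_gal_link(page_url, html):
    """Return the absolute URL of the 'Generate GAL file' link, or None."""
    finder = LinkFinder("Generate GAL file")
    finder.feed(html)
    if not finder.matches:
        return None
    return urljoin(page_url, finder.matches[0])


def make_opener():
    """Build an opener that remembers cookies between requests."""
    return build_opener(HTTPCookieProcessor(CookieJar()))


def login(opener, login_url, fields):
    """POST the login form. login_url and fields are assumptions."""
    data = urlencode(fields).encode("ascii")
    opener.open(login_url, data).close()


def download_gal(opener, full_id, out_path):
    """Fetch one details page and save the GAL file it links to."""
    page_url = DETAILS_URL + full_id
    with opener.open(page_url) as resp:
        html = resp.read().decode("latin-1")
    gal_url = find_gal_link(page_url, html)
    if gal_url is None:
        return False  # probably still looking at the login form
    with opener.open(gal_url) as resp, open(out_path, "wb") as out:
        out.write(resp.read())
    return True
```

With a list of IDs you would call make_opener() and login() once, then
loop over the IDs calling download_gal(opener, "32898GENEPIX0",
"32898GENEPIX0.gal") and so on. Please check that the site's terms allow
automated downloads, and put a short pause between requests.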