[Tutor] Automating web page parsing
bgailer at alum.rpi.edu
Thu Mar 30 06:34:27 CEST 2006
Srinivas Iyyer wrote:
> Dear group,
> ***Disclaimer***Not suitable for BioPython list***
> I work with GeneChips to analyze human gene expression
> patterns. These genechips are various kinds one of the
> variety is made at Stanford University. In a typical
> experiment, an experimenter uses roughly over 40
> For a third party to analyze data from such a chip, we
> need to know the design of that chip, and that
> information is stored in one file. In this case it is a GAL file.
> It is difficult and cumbersome to identify the design
> file for every chip and get it into your own
> directory. However, on their website SMD
> (http://genome-www5.stanford.edu/), it is possible to
> go to each design file and obtain the data. Since
> this is a time-consuming procedure, I wrote a socket
> script that takes the URL of a file and downloads it.
> The first barrier is that their database does not
> accept raw socket connections. Unfortunately, I have
> to access each file (there could be 40 - 100 files),
> get redirected to another page, and only there can I
> download it.
> Is there a method to automate this procedure with a
> script? Is there any alternative to all this clicking?
Unfortunately for me, when I go to that link I get a login form, not what
you describe. So I can't help until I know how to log in.
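If a login is all that stands in the way, one approach is a cookie-aware opener that POSTs the login form first, so the session cookie is sent on every later request. A minimal sketch in today's Python 3 standard library (the 2006-era equivalents were urllib2 and cookielib); the form's field names here are hypothetical and would have to be read from the login page's HTML:

```python
import urllib.request
from http.cookiejar import CookieJar
from urllib.parse import urlencode

def make_session():
    """Return an opener that remembers cookies across requests."""
    return urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar()))

def login(opener, login_url, form_fields):
    """POST the login form. On success the server's session cookie
    stays in the opener, so later opener.open() calls are logged in.
    form_fields maps the form's input names (hypothetical) to values."""
    data = urlencode(form_fields).encode("ascii")
    with opener.open(login_url, data) as resp:
        return resp.read()
```

After something like login(opener, login_url, {"username": ..., "password": ...}) succeeds, the same opener can fetch the experiment pages.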
> On this page, at the bottom, there is a link,
> 'Generate GAL file'; that URL will allow me to get the
> GAL file. I cannot sit for a whole evening and click
> ~40x30 times to download everything. It is painful.
> Are there any smart ways to hack this process?
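The clicking itself can be scripted. Here is a minimal sketch, assuming the experiment pages contain ordinary `<a>` links whose text is 'Generate GAL file' (the exact markup would need checking against a real page): collect those links from each page, then download each one. urllib follows the redirect to the download page automatically, and sending a browser-like User-Agent may get past whatever rejected the raw socket script:

```python
import re
import urllib.request
from urllib.parse import urljoin

# Hypothetical link text/markup -- adjust to what the SMD pages really contain.
GAL_LINK = re.compile(
    r'<a\s+[^>]*href="([^"]+)"[^>]*>[^<]*Generate GAL file[^<]*</a>',
    re.IGNORECASE)

def find_gal_links(html, base_url):
    """Return absolute URLs of every 'Generate GAL file' link in a page."""
    return [urljoin(base_url, href) for href in GAL_LINK.findall(html)]

def download(url, dest):
    """Fetch one URL (redirects are followed automatically) and save it."""
    # Some servers refuse requests without a browser-like User-Agent,
    # which may be why the raw socket approach was blocked.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req) as resp, open(dest, "wb") as out:
        out.write(resp.read())
```

Looping find_gal_links and download over the 40-100 experiment page URLs replaces the ~40x30 manual clicks; if the site requires login, the requests would need to go through a cookie-carrying opener instead of plain urlopen.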
> Tutor maillist - Tutor at python.org
More information about the Tutor mailing list