[Tutor] Automating web page parsing
Kent Johnson
kent37 at tds.net
Thu Mar 30 01:33:46 CEST 2006
Srinivas Iyyer wrote:
> For a third party to analyze data from that chip, we
> need to know the design of that chip, and that
> information is in one file, in this case a GAL file.
> It is difficult and cumbersome to identify the design
> file for every chip and get it into your directory.
> However, on their website SMD
> (http://genome-www5.stanford.edu/), it is possible to
> go to each design file and obtain the data. Since
> this is a time-consuming procedure, I wrote a socket
> script that would give me the URL of the file and
> allow me to download it. The first barrier is that
> their database does not allow socket programming.
What did you try? How did it fail?
The website requires a form-based login which probably returns a cookie
to your browser. Your socket solution needs to take this into account.
There are some articles here with more info:
http://www.voidspace.org.uk/python/articles.shtml#http
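As a rough sketch of the usual approach, you can let the standard library
(urllib2 plus cookielib) handle the cookie instead of working with raw
sockets. This is untested against SMD; the login URL and the form field
names below are guesses, so view the HTML source of the real login form to
find the correct ones.

import urllib, urllib2, cookielib

# A cookie jar that remembers the session cookie set at login
cookies = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookies))

# Hypothetical login URL and field names - check the real form's HTML
login_data = urllib.urlencode({'login': 'your_username',
                               'password': 'your_password'})
opener.open('http://genome-www5.stanford.edu/cgi-bin/login.pl', login_data)

# Requests made through the same opener send the cookie along, so
# pages behind the login should now be reachable
file_url = 'http://genome-www5.stanford.edu/path/to/a/design/file'
data = opener.open(file_url).read()
open('design.gal', 'wb').write(data)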
>
> Unfortunately, I have to access each file (there could
> be 40 - 100 files), get redirected to another page, and
> only then can I download it.
>
> Is there a way to automate this procedure through a
> browser?
>
> Is there any alternative to all this clicking?
There are several packages intended to help script web sites; take a look at
the following (a short mechanize sketch follows the list):
twill http://www.idyll.org/~t/www-tools/twill/
mechanize and ClientForm http://wwwsearch.sf.net/
http://python.org/pypi/mechanoid
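For example, with mechanize the cookie handling and link following are done
for you. This is just an untested sketch; the form index and the field names
are assumptions, so inspect the actual login page to fill in the real ones.

import mechanize

br = mechanize.Browser()
br.open('http://genome-www5.stanford.edu/')  # cookies handled automatically
br.select_form(nr=0)             # assuming the first form is the login form
br['login'] = 'your_username'    # hypothetical field names
br['password'] = 'your_password'
br.submit()

# Follow every link whose URL looks like a GAL design file and save it
for link in br.links(url_regex=r'\.gal'):
    response = br.follow_link(link)
    fname = link.url.split('/')[-1]
    open(fname, 'wb').write(response.read())
    br.back()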
Kent