[Tutor] screen scraping without the request

Rohan Deshpande rohan.deshpande at gmail.com
Sun Apr 22 13:20:31 CEST 2007


Hi All,

The previous thread on screen scraping got me thinking of starting a similar
project.  However, the problem is that I have no idea what the POST request
is, as no query string appears after the URL when the resulting page comes
up.  I essentially need to pull the HTML from a page that is generated on a
user's machine and pipe it into a Python script.  How should I go about doing
this?  Is it possible/feasible to decipher the POST request and get the
HTML, or should I use some screen-scraping Python libs, a la the JavaScript
DOM hacks?  I was leaning toward the former, but the interaction on
the site is such that the user enters a username/password and goes through a
couple of links before getting to the page I need.  Perhaps Python can use the
session cookie and then pull the right page?
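
To make the session-cookie idea concrete, here is a minimal sketch in
modern Python 3 using only the standard library.  It assumes the site uses
an ordinary HTML login form submitted by POST.  The URLs, the form field
names, and the helper names (make_session, encode_form, fetch_after_login)
are all hypothetical; the real field names would have to be read from the
login page's <form> markup.  It also skips the intermediate links, which may
or may not be required by the actual site.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return (opener, jar): an opener that stores cookies and resends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


def encode_form(fields):
    """Encode a dict of form fields as a URL-encoded POST body (bytes)."""
    return urllib.parse.urlencode(fields).encode("ascii")


def fetch_after_login(login_url, form_fields, target_url):
    """Log in with a POST, then fetch target_url using the same cookies.

    Assumption: the server sets a session cookie in response to the login
    POST and accepts that cookie on the later GET.
    """
    opener, _jar = make_session()

    # Passing data= makes urllib send a POST instead of a GET.
    with opener.open(login_url, data=encode_form(form_fields)) as resp:
        resp.read()  # body not needed; the cookie jar now holds the session

    # The same opener resends the stored cookies automatically.
    with opener.open(target_url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Usage would look something like
fetch_after_login("https://example.com/login",
{"username": "...", "password": "..."}, "https://example.com/report"),
with every one of those values replaced by whatever the real site expects.
If the target page is built by JavaScript in the browser rather than sent by
the server, this approach will not see the final HTML.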

Sorry this sounds so vague. I've never done anything like this so I'm not
sure where to begin.

Quite puzzled,
Rohan