[Tutor] screen scraping without the request
rohan.deshpande at gmail.com
Sun Apr 22 16:52:44 CEST 2007
Assuming I can find the POST, is mechanize the way to go to mimic browser
functionality? Or do I need other/extra libraries?
On 4/22/07, Luke Paireepinart <rabidpoobear at gmail.com> wrote:
> Kent Johnson wrote:
> > Rohan Deshpande wrote:
> >> Hi All,
> >> the previous thread on screen scraping got me thinking of starting a
> >> similar project. However, the problem is I have no idea what the POST
> >> request is as there is no escape string after the URL when the
> >> page comes up. I essentially need to pull the HTML from a page that is
> >> generated on a user's machine and pipe it into a Python script. How
> >> should I go about doing this? Is it possible/feasible to decipher the
> >> POST request and get the HTML, or use some screen scraping python libs
> >> former, but the interaction on the site is such that the user enters a
> >> username/password and goes through a couple links before getting to the
> >> page I need. Perhaps Python can use the session cookie and then pull
> >> the right page?
> Have you tried using Firebug? It's an extension for Firefox.
> You might be able to run it while you're navigating the site, and see
> the communication between you and the server and get the POST that way,
> but I'm not completely certain about that.
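
For reference, here is a minimal sketch of the session-cookie idea raised in the quoted question: log in with a POST, keep the cookie, then request the page that sits behind the login. It uses only the Python 3 standard library (urllib and http.cookiejar) rather than mechanize, so it is a swapped-in technique and not an answer to whether mechanize is the better tool. The function names, URLs, and form field names are hypothetical; the real field names would have to be read from the login form or from the captured POST.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return (opener, jar); the opener stores and resends cookies like a browser."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def encode_form(fields):
    """URL-encode form fields into the bytes body of a POST request."""
    return urllib.parse.urlencode(fields).encode("ascii")


def login_and_fetch(opener, login_url, fields, page_url):
    """POST the login form, then GET the target page with the same cookie jar."""
    # Passing data= makes urllib send a POST instead of a GET.
    with opener.open(login_url, data=encode_form(fields)) as resp:
        resp.read()  # any session cookie set here is kept in the jar
    with opener.open(page_url) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Hypothetical usage; URLs and field names are placeholders, not from the thread:
# opener, jar = make_session()
# html = login_and_fetch(
#     opener,
#     "https://example.com/login",
#     {"username": "me", "password": "secret"},
#     "https://example.com/members/report",
# )
```

This only covers sites that track the login with an ordinary cookie and a plain HTML form. If the site needs JavaScript or hidden form tokens, more work (or a library such as mechanize) would likely be needed.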