[Tutor] screen scraping without the request

Rohan Deshpande rohan.deshpande at gmail.com
Sun Apr 22 16:52:44 CEST 2007


Hi Luke/Kent,

Assuming I can find the POST, is mechanize the way to go to mimic browser
functionality?  Or do I need other/extra libraries?
Thanks,
Rohan

On 4/22/07, Luke Paireepinart <rabidpoobear at gmail.com> wrote:
>
> Kent Johnson wrote:
> > Rohan Deshpande wrote:
> >
> >> Hi All,
> >>
> >> the previous thread on screen scraping got me thinking of starting a
> >> similar project.  However, the problem is I have no idea what the POST
> >> request is, as there is no escape string after the URL when the
> >> resulting page comes up.  I essentially need to pull the HTML from a
> >> page that is generated on a user's machine and pipe it into a Python
> >> script.  How should I go about doing this?  Is it possible/feasible to
> >> decipher the POST request and get the HTML, or use some screen scraping
> >> Python libs a la the JavaScript DOM hacks?  I was thinking of the
> >> possibilities of the former, but the interaction on the site is such
> >> that the user enters a username/password and goes through a couple of
> >> links before getting to the page I need.  Perhaps Python can use the
> >> session cookie and then pull the right page?
> >>
> Have you tried using Firebug?  It's an extension for Firefox.
> You might be able to run it while you're navigating the site, and see
> the communication between you and the server and get the POST that way,
> but I'm not completely certain about that.
> -Luke
>
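
For reference, the session-cookie idea quoted above can be sketched roughly
as follows. This is only a minimal sketch, and it uses the standard library
(urllib.request plus http.cookiejar in current Python) rather than mechanize.
The URLs and the form field names "username" and "password" are made up for
illustration; the real ones would have to come from the site's login form or
from watching the traffic in Firebug. The helper names are hypothetical too.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Hypothetical URLs, for illustration only. The real ones must be read
# from the site's login form or from the captured POST.
LOGIN_URL = "http://example.com/login"
TARGET_URL = "http://example.com/report"


def make_session():
    """Return (opener, jar). The opener stores cookies it receives in the
    jar and sends them back on later requests, much like a browser."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


def build_login_request(url, username, password):
    """Build a POST request whose body carries the login form fields."""
    # "username" and "password" are assumed field names, not the real ones.
    body = urllib.parse.urlencode(
        {"username": username, "password": password}
    )
    # Supplying data makes urllib send a POST instead of a GET.
    return urllib.request.Request(url, data=body.encode("ascii"))


def fetch_after_login(opener, login_request, target_url):
    """Submit the login form, then fetch the target page with the same
    opener so the session cookie set at login is sent along."""
    opener.open(login_request).close()
    with opener.open(target_url) as response:
        return response.read().decode("utf-8", errors="replace")
```

Usage would be along the lines of calling make_session(), passing the result
of build_login_request(LOGIN_URL, user, password) to fetch_after_login()
together with TARGET_URL, and then handing the returned HTML to whatever
parser you like. If the site needs the intermediate links to be visited
first, those can be opened with the same opener before the final fetch.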