[Tutor] screen scraping without the request
Rohan Deshpande
rohan.deshpande at gmail.com
Sun Apr 22 16:52:44 CEST 2007
Hi Luke/Kent,
Assuming I can find the POST, is mechanize the way to go to mimic browser
functionality? Or do I need other/extra libraries?
Thanks,
Rohan
On 4/22/07, Luke Paireepinart <rabidpoobear at gmail.com> wrote:
>
> Kent Johnson wrote:
> > Rohan Deshpande wrote:
> >
> >> Hi All,
> >>
> >> the previous thread on screen scraping got me thinking of starting a
> >> similar project. However, the problem is I have no idea what the POST
> >> request is as there is no escape string after the URL when the resulting
> >> page comes up. I essentially need to pull the HTML from a page that is
> >> generated on a user's machine and pipe it into a Python script. How
> >> should I go about doing this? Is it possible/feasible to decipher the
> >> POST request and get the HTML, or use some screen scraping Python libs a
> >> la the javascript DOM hacks? I was thinking of the possibilities of the
> >> former, but the interaction on the site is such that the user enters a
> >> username/password and goes through a couple links before getting to the
> >> page I need. Perhaps Python can use the session cookie and then pull
> >> the right page?
> >>
> Have you tried using Firebug? It's an extension for Firefox.
> You might be able to run it while you're navigating the site, and see
> the communication between you and the server and get the POST that way,
> but I'm not completely certain about that.
> -Luke
>
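
For reference, the session-cookie idea raised in the quoted message can be sketched roughly as below. This is only a minimal sketch, not a tested recipe for any particular site. It uses the Python 3 standard library (urllib.request and http.cookiejar) rather than mechanize, and every URL and form field name in it (`username`, `password`, the example.invalid addresses) is a made-up placeholder. The real names would have to be read from the POST captured with a tool such as Firebug.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return an opener that stores cookies and resends them on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(url, fields):
    """Encode form fields as the body of a request; supplying data makes it a POST."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(url, data=body)


def fetch_after_login(login_url, fields, target_url):
    """Log in with a POST, then fetch target_url using the same cookie jar."""
    opener, _jar = make_session()
    # The login response normally sets the session cookie; the jar keeps it.
    opener.open(build_login_request(login_url, fields)).close()
    with opener.open(target_url) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Hypothetical values only: replace with the URLs and field names
    # observed in the real login POST.
    request = build_login_request(
        "http://example.invalid/login",
        {"username": "rohan", "password": "secret"},
    )
    print(request.get_method(), request.data)
```

If the site requires following a couple of links after logging in, each intermediate page could be opened with the same opener before the final one, so that any cookies set along the way are kept.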