[Tutor] screen scraping without the request

Luke Paireepinart rabidpoobear at gmail.com
Sun Apr 22 16:43:03 CEST 2007


Kent Johnson wrote:
> Rohan Deshpande wrote:
>   
>> Hi All,
>>
>> the previous thread on screen scraping got me thinking of starting a 
>> similar project.  However, the problem is I have no idea what the POST 
>> request is as there is no escape string after the URL when the resulting 
>> page comes up.  I essentially need to pull the HTML from a page that is 
>> generated on a user's machine and pipe it into a python script.  How
>> should I go about doing this?  Is it possible/feasible to decipher the 
>> POST request and get the HTML, or use some screen scraping python libs a 
>> la the javascript DOM hacks? I was thinking of the possibilities of the 
>> former, but the interaction on the site is such that the user enters a 
>> username/password and goes through a couple links before getting to the 
>> page I need.  Perhaps Python can use the session cookie and then pull 
>> the right page?
>>     
Have you tried using Firebug?  It's an extension for Firefox.
You might be able to run it while you're navigating the site, and see 
the communication between you and the server and get the POST that way,
but I'm not completely certain about that.
-Luke
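If Firebug does show the POST, the "use the session cookie and then pull the right page" idea can be sketched roughly as below. This is a minimal sketch, assuming Python 3, where Python 2's urllib2 and cookielib became urllib.request and http.cookiejar. It also assumes the login is a plain form POST with no JavaScript-generated or hidden token fields. The URLs and field names are hypothetical placeholders, not anything from the site in question. The cookie jar keeps whatever session cookie the login response sets and sends it back on the next request.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Hypothetical URLs: substitute the real ones observed in Firebug.
LOGIN_URL = "https://example.com/login"
TARGET_URL = "https://example.com/account/report"


def parse_post_body(body):
    """Turn a form-encoded POST body copied from Firebug into a dict.

    Blank fields are kept; for repeated keys the last value wins.
    """
    return dict(urllib.parse.parse_qsl(body, keep_blank_values=True))


def make_session():
    """Build an opener that stores cookies set by the server and resends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch_after_login(opener, login_url, form_fields, target_url):
    """POST the login form, then GET the target page with the same cookies."""
    data = urllib.parse.urlencode(form_fields).encode("ascii")
    # Supplying data makes urllib send a form-encoded POST instead of a GET.
    with opener.open(login_url, data) as resp:
        resp.read()  # any Set-Cookie headers are captured by the jar
    # Same opener, so the stored session cookie goes out with this request.
    with opener.open(target_url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Usage would look something like `fields = parse_post_body("username=alice&password=secret")`, then `opener, jar = make_session()`, then `html = fetch_after_login(opener, LOGIN_URL, fields, TARGET_URL)`. If the site needs the intermediate links visited first, or sets a cookie on the login page itself, extra `opener.open(...)` calls on the same opener may be needed before the POST.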

