[Tutor] screen scraping without the request

Martin Walsh mwalsh at groktech.org
Sun Apr 22 19:02:48 CEST 2007


Hi Rohan,

You might also try the LiveHTTPHeaders Firefox extension; it is also
very good for this type of reverse engineering.

http://livehttpheaders.mozdev.org/index.html
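
Once the extension shows you the login POST, replaying it from Python
might look roughly like the sketch below. The URLs and the form field
names ("username", "password") are placeholders made up for illustration,
not anything taken from your site; swap in whatever the captured headers
show. The cookie-aware opener is what carries the session cookie from the
login over to the later page request.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Hypothetical values: substitute the URL and form field names that
# LiveHTTPHeaders (or Firebug) shows for the real login POST.
LOGIN_URL = "http://example.com/login"
TARGET_URL = "http://example.com/members/report"


def make_opener():
    """Return an opener that stores and resends cookies, plus its jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(username, password):
    """Build the login POST; the field names here are assumptions."""
    body = urllib.parse.urlencode(
        {"username": username, "password": password}).encode("ascii")
    # Supplying data makes urllib send a POST instead of a GET.
    return urllib.request.Request(LOGIN_URL, data=body)


def fetch_page(username, password):
    """Log in, then fetch the target page using the same session cookie."""
    opener, _jar = make_opener()
    opener.open(build_login_request(username, password)).close()
    with opener.open(TARGET_URL) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling fetch_page("you", "secret") would then hand back the HTML as a
string. If the site makes you click through a couple of links first, you
may need extra opener.open() calls in between, in the same order the
captured headers show.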

HTH,
Marty

Luke Paireepinart wrote:
> Kent Johnson wrote:
>> Rohan Deshpande wrote:
>>   
>>> Hi All,
>>>
>>> the previous thread on screen scraping got me thinking of starting a 
>>> similar project.  However, the problem is I have no idea what the POST 
>>> request is as there is no escape string after the URL when the resulting 
>>> page comes up.  I essentially need to pull the HTML from a page that is 
>>> generated on a user's machine and pipe it into a Python script.  How 
>>> should I go about doing this?  Is it possible/feasible to decipher the 
>>> POST request and get the HTML, or use some screen scraping Python libs a 
>>> la the javascript DOM hacks? I was thinking of the possibilities of the 
>>> former, but the interaction on the site is such that the user enters a 
>>> username/password and goes through a couple links before getting to the 
>>> page I need.  Perhaps Python can use the session cookie and then pull 
>>> the right page?
>>>     
> Have you tried using Firebug?  It's an extension for Firefox.
> You might be able to run it while you're navigating the site, and see 
> the communication between you and the server and get the POST that way,
> but I'm not completely certain about that.
> -Luke


