urllib equivalent for HTTP requests
Diez B. Roggisch
deets at nospam.web.de
Wed Oct 8 03:34:40 EDT 2008
K wrote:
> Hello everyone,
>
> I understand that urllib and urllib2 serve as really simple page
> request libraries. I was wondering if there is a library out there
> that can get the HTTP requests for a given page.
>
> Example:
> URL: http://www.google.com/test.html
>
> Something like: urllib.urlopen('http://www.google.com/test.html').files()
>
> Lists HTTP Requests attached to that URL:
> => http://www.google.com/test.html
> => http://www.google.com/css/google.css
> => http://www.google.com/js/js.css
There are no "requests attached" to a URL. There is an HTML document
behind it, which may contain further external references.
> The other fun part is the inclusion of JS within <script> tags, i.e.
> the new Google Analytics script
> => http://www.google-analytics.com/ga.js
>
> or css, @imports
> => http://www.google.com/css/import.css
>
> I would like to keep track of that but I realize that py does not have
> a JS engine. :( Anyone with ideas on how to track these items or am I
> out of luck.
You can use e.g. BeautifulSoup to extract all links from the page's HTML.
What you can't get that way, though, are the requests issued by
JavaScript while it is *running*.
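
A minimal sketch of the static extraction described above. To stay free of
dependencies it swaps in the standard library's html.parser for
BeautifulSoup; the names ReferenceCollector and list_references, and the
choice of which tags to inspect, are assumptions made for illustration. It
only sees references written into the markup, not requests issued by
running JavaScript.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tag/attribute pairs that usually point at an external resource
# (an illustrative selection, not an exhaustive list).
RESOURCE_ATTRS = {"script": "src", "img": "src", "link": "href"}

# Matches @import "x.css"; and @import url(x.css); inside <style> blocks.
CSS_IMPORT = re.compile(r"""@import\s+(?:url\(\s*)?['"]?([^'"\)\s;]+)""")


class ReferenceCollector(HTMLParser):
    """Collects the external references found in an HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.references = []
        self._in_style = False

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self._in_style = True
        wanted = RESOURCE_ATTRS.get(tag)
        value = dict(attrs).get(wanted) if wanted else None
        if value:
            # Resolve relative references against the page URL.
            self.references.append(urljoin(self.base_url, value))

    def handle_endtag(self, tag):
        if tag == "style":
            self._in_style = False

    def handle_data(self, data):
        # Inline stylesheets can pull in further files via @import.
        if self._in_style:
            for target in CSS_IMPORT.findall(data):
                self.references.append(urljoin(self.base_url, target))


def list_references(html, base_url):
    """Return absolute URLs referenced by the given HTML, in document order."""
    collector = ReferenceCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.references
```

The HTML string would come from the page itself, for example the decoded
body returned by urllib.request.urlopen(url).read(). Stylesheets fetched
through <link> can contain @import rules of their own, so following those
would mean downloading each stylesheet and applying the same regular
expression to it.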
Diez