difference between urllib2.urlopen and firefox view 'page source'?
nagle at animats.com
Tue Mar 20 18:32:33 CET 2007
Here's a useful online tool that might help you see what's happening:
We use this to help webmasters see what our web crawler is seeing.
This reads a page, using Python and FancyURLopener, with a
User-Agent string of "SiteTruth.com site rating system."
Then it parses the page with BeautifulSoup, removes all
<SCRIPT>, <EMBED>, and <OBJECT> tags, makes all the links
absolute, then writes the page back out in UTF-8 Unicode.
The resulting cleaned-up page is displayed.
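If you want to reproduce that cleanup step locally, here is a minimal sketch. It is not the SiteTruth code. The tool described above uses BeautifulSoup, but this version swaps in Python 3's standard-library html.parser so it runs without extra packages. The names clean_page and PageCleaner are hypothetical, and treating only href and src as link attributes is an assumption.

```python
import html
from html.parser import HTMLParser
from urllib.parse import urljoin

# Elements dropped together with everything inside them.
STRIP_CONTAINERS = {"script", "object"}
# <embed> is a void element (no closing tag), so only the tag itself is dropped.
STRIP_VOID = {"embed"}
# Assumption: only these attributes hold URLs that need to be made absolute.
LINK_ATTRS = {"href", "src"}


class PageCleaner(HTMLParser):
    """Hypothetical helper: re-emits HTML minus script/object/embed, with absolute links."""

    def __init__(self, base_url):
        # convert_charrefs=False so entities such as &amp; are passed through unchanged.
        super().__init__(convert_charrefs=False)
        self.base_url = base_url
        self.out = []
        self.skip_depth = 0  # > 0 while inside a stripped container element

    def _render(self, tag, attrs, self_closing=False):
        parts = [tag]
        for name, value in attrs:
            if value is None:  # boolean attribute, e.g. <option selected>
                parts.append(name)
                continue
            if name in LINK_ATTRS:
                value = urljoin(self.base_url, value)
            # The parser unescapes attribute values, so escape them again on output.
            parts.append('%s="%s"' % (name, html.escape(value, quote=True)))
        return "<%s%s>" % (" ".join(parts), " /" if self_closing else "")

    def handle_starttag(self, tag, attrs):
        if tag in STRIP_VOID:
            return
        if tag in STRIP_CONTAINERS:
            self.skip_depth += 1
        elif self.skip_depth == 0:
            self.out.append(self._render(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <embed src="x" /> or <br/>.
        if tag not in STRIP_VOID and tag not in STRIP_CONTAINERS and self.skip_depth == 0:
            self.out.append(self._render(tag, attrs, self_closing=True))

    def handle_endtag(self, tag):
        if tag in STRIP_CONTAINERS:
            if self.skip_depth > 0:
                self.skip_depth -= 1
        elif tag not in STRIP_VOID and self.skip_depth == 0:
            self.out.append("</%s>" % tag)

    def _emit(self, text):
        if self.skip_depth == 0:
            self.out.append(text)

    def handle_data(self, data):
        self._emit(data)  # script bodies arrive here too and are dropped via skip_depth

    def handle_entityref(self, name):
        self._emit("&%s;" % name)

    def handle_charref(self, name):
        self._emit("&#%s;" % name)

    def handle_comment(self, data):
        self._emit("<!--%s-->" % data)

    def handle_decl(self, decl):
        self._emit("<!%s>" % decl)


def clean_page(page_text, base_url):
    """Hypothetical entry point: return the cleaned page as UTF-8 encoded bytes."""
    cleaner = PageCleaner(base_url)
    cleaner.feed(page_text)
    cleaner.close()
    return "".join(cleaner.out).encode("utf-8")
```

Usage: decode the fetched bytes with the page's charset, then call clean_page(text, page_url). Streaming through the parser and re-emitting tags, rather than rebuilding a tree, leaves the rest of the markup close to what the server sent, which should make the output easier to compare with a browser's "page source".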
If the page you're trying to read looks OK with our viewer,
you should be able to read it from Python with no problems.
> I am trying to screen scrape some stock data from yahoo, so I am
> trying to use urllib2 to retrieve the html and beautiful soup for the
> [...]
> Maybe (most likely) I am doing something wrong, but when I use
> urllib2.urlopen to fetch a page, and when I view 'page source' of the
> exact same URL in firefox, I am seeing slight differences in the raw
> [...]
> Do I need to set a browser agent so yahoo thinks urllib2 is firefox?
> [...]
> passing different data?
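On the quoted question about setting a browser agent: urllib2 became urllib.request in Python 3, and this sketch assumes Python 3. It shows the User-Agent urllib sends by default and how to replace it for a single request. FIREFOX_LIKE_UA, default_user_agent, build_request and fetch_raw are illustrative names, and the agent string is an example value, not necessarily what your copy of Firefox sends.

```python
import urllib.request

# Example browser-style User-Agent (illustrative value; substitute your own).
FIREFOX_LIKE_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0"


def default_user_agent():
    """Hypothetical helper: the agent urllib sends if none is set, e.g. 'Python-urllib/3.x'."""
    return dict(urllib.request.build_opener().addheaders)["User-agent"]


def build_request(url, user_agent=FIREFOX_LIKE_UA):
    """Hypothetical helper: a Request whose User-Agent replaces urllib's default."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_raw(url, user_agent=FIREFOX_LIKE_UA, timeout=30):
    """Hypothetical helper: return the response body as raw bytes, before decoding or parsing."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
        return resp.read()
```

One way to check whether the server varies its HTML by agent is to compare fetch_raw(url, default_user_agent()) with fetch_raw(url). If the pages still differ, the other headers a browser sends (cookies, Accept-Language and so on) are the next thing to compare.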