Web page data and urllib2.urlopen

Dave Angel davea at ieee.org
Fri Aug 7 08:34:00 EDT 2009


Piet van Oostrum wrote:
>>>>>> <snip>
>>>>>>             
> <snip>
>> DA> But the raw page didn't have any javascript.  So what about that original
>> DA> raw page triggered additional stuff to be loaded?
>> DA> Is it "user agent", as someone else brought out?  And is there somewhere I
>> DA> can read more about that aspect of things?  I've mostly built very static
>> DA> html pages, where the server yields the same page to everybody.  And some
>> DA> form stuff, where the user clicks on a "submit" button to trigger a script
>> DA> that's not shown on the URL line.
>>     
>
> Yes, if you specify a 'normal' web browser as user agent you do get the
> Javascript:
>
> import urllib2
>
> # Identify as Firefox instead of urllib2's default "Python-urllib/2.x"
> request = urllib2.Request('http://www.marketwatch.com/story/mondays-biggest-gaining-and-declining-stocks-2009-07-27')
> request.add_header('User-Agent', 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.13) Gecko/2009073021 Firefox/3.0.13')
>
> opener = urllib2.build_opener()
> page = opener.open(request).read()  # urllib2.urlopen(request) also works
> print page
>
>   
Thanks much.  That's a key I didn't understand.

DaveA
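The server decides what to send partly from the User-Agent request header, so the same URL can return a different page to a script than to a browser. A minimal sketch, assuming Python 3 (where `urllib2` became `urllib.request`), that shows the User-Agent urllib announces by default and builds a request that claims to be Firefox instead. The helper names (`default_user_agent`, `make_browser_request`, `fetch`) are hypothetical, and the MarketWatch URL from the thread may no longer exist; only `fetch` touches the network.

```python
import urllib.request

# URL from the thread above; it may no longer resolve to the same page.
URL = "http://www.marketwatch.com/story/mondays-biggest-gaining-and-declining-stocks-2009-07-27"

# Browser User-Agent string copied from the thread (any browser-like string would do).
BROWSER_UA = ("Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; "
              "rv:1.9.0.13) Gecko/2009073021 Firefox/3.0.13")


def default_user_agent():
    """Return the User-Agent urllib sends when none is set explicitly."""
    opener = urllib.request.build_opener()
    # OpenerDirector stores its default headers as (name, value) pairs.
    return dict(opener.addheaders).get("User-agent")


def make_browser_request(url, user_agent=BROWSER_UA):
    """Build a Request that identifies itself as a web browser."""
    # headers= is equivalent to calling add_header() afterwards;
    # urllib normalises the header name to 'User-agent' internally.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=BROWSER_UA, timeout=10):
    """Download the page body as text (requires network access)."""
    req = make_browser_request(url, user_agent)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


if __name__ == "__main__":
    print("urllib default:", default_user_agent())
    print("this request:  ", make_browser_request(URL).get_header("User-agent"))
    # Uncomment to fetch the page (needs network access):
    # print(fetch(URL)[:500])
```

Comparing `fetch(URL)` against `fetch(URL, user_agent=default_user_agent())` is one way to see whether a given server varies its response on this header, as the thread describes.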



More information about the Python-list mailing list