<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns="http://www.w3.org/TR/REC-html40">


<head>


<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii">


<meta name=Generator content="Microsoft Word 11 (filtered medium)">


<!--[if !mso]>


<style>


v\:* {behavior:url(#default#VML);}


o\:* {behavior:url(#default#VML);}


w\:* {behavior:url(#default#VML);}


.shape {behavior:url(#default#VML);}


</style>


<![endif]-->


<style>


<!--


 /* Font Definitions */


 @font-face


        {font-family:Tahoma;


        panose-1:2 11 6 4 3 5 4 4 2 4;}


 /* Style Definitions */


 p.MsoNormal, li.MsoNormal, div.MsoNormal


        {margin:0in;


        margin-bottom:.0001pt;


        font-size:12.0pt;


        font-family:"Times New Roman";}


a:link, span.MsoHyperlink


        {color:blue;


        text-decoration:underline;}


a:visited, span.MsoHyperlinkFollowed


        {color:blue;


        text-decoration:underline;}


p


        {mso-margin-top-alt:auto;


        margin-right:0in;


        mso-margin-bottom-alt:auto;


        margin-left:0in;


        font-size:12.0pt;


        font-family:"Times New Roman";}


span.EmailStyle17


        {mso-style-type:personal-reply;


        font-family:Arial;


        color:navy;}


@page Section1


        {size:8.5in 11.0in;


        margin:1.0in 1.25in 1.0in 1.25in;}


div.Section1


        {page:Section1;}


-->


</style>


</head>


<body lang=EN-US link=blue vlink=blue>


<div class=Section1>


<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:


12.0pt'>If you're only interested in the images, perhaps you want to use wget, like:<o:p></o:p></span></font></p>


<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:


12.0pt'><o:p> </o:p></span></font></p>


<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:


12.0pt'>wget -r --accept=jpg,jpeg www.xyz.org</span></font><font size=2


color=navy face=Arial><span style='font-size:10.0pt;font-family:Arial;


color:navy'><o:p></o:p></span></font></p>
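<p class=MsoNormal>If you would rather stay in Python, the filtering part of that wget call can be sketched as below. This is only a rough illustration, not a wget replacement: it looks at the img tags of a single page that you have already fetched and keeps the ones ending in .jpg or .jpeg. The names ImageCollector and find_images are made up for this example.</p>

<pre>
```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Extensions to keep, mirroring wget's --accept=jpg,jpeg
ACCEPTED = (".jpg", ".jpeg")


class ImageCollector(HTMLParser):
    """Collect absolute URLs of img tags whose path has an accepted extension."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if not src:
            return
        # Resolve relative links against the page URL
        url = urljoin(self.base_url, src)
        # Compare only the path, so query strings do not hide the extension
        if urlparse(url).path.lower().endswith(ACCEPTED):
            self.images.append(url)


def find_images(html, base_url):
    """Return the accepted image URLs found in html, in document order."""
    collector = ImageCollector(base_url)
    collector.feed(html)
    return collector.images
```
</pre>

<p class=MsoNormal>Unlike wget -r, this does not follow links to other pages or download anything; it only lists candidate URLs.</p>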


<p class=MsoNormal><font size=2 color=navy face=Arial><span style='font-size:


10.0pt;font-family:Arial;color:navy'><o:p> </o:p></span></font></p>


<p class=MsoNormal><font size=2 color=navy face=Arial><span style='font-size:


10.0pt;font-family:Arial;color:navy'>or maybe this<o:p></o:p></span></font></p>


<p class=MsoNormal><font size=2 color=navy face=Arial><span style='font-size:


10.0pt;font-family:Arial;color:navy'><o:p> </o:p></span></font></p>


<p class=MsoNormal><font size=2 color=navy face=Arial><span style='font-size:


10.0pt;font-family:Arial;color:navy'><a


href="http://www.vex.net/~x/python_stuff.html">http://www.vex.net/~x/python_stuff.html</a><o:p></o:p></span></font></p>


<p class=MsoNormal><font size=2 color=navy face=Arial><span style='font-size:


10.0pt;font-family:Arial;color:navy'><o:p> </o:p></span></font></p>


<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:


12.0pt'><a href="http://www.vex.net/%7Ex/files/backcrawler.zip">BackCrawler 1.1</a><o:p></o:p></span></font></p>


<p><font size=3 face="Times New Roman"><span style='font-size:12.0pt'>A crude


web spider with only one purpose: mercilessly suck the background images from


all web pages it can find. Understands frames and redirects, uses MD5 to


eliminate duplicates. Need web page backgrounds? This'll get lots of them. Sadly,


most are very tacky, and Backcrawler can't help with that. <i><span


style='font-style:italic'>Requires Threads.</span></i><o:p></o:p></span></font></p>
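<p class=MsoNormal>The MD5 trick mentioned there is easy to reproduce. The sketch below is my own guess at the general idea, not BackCrawler's actual code: hash each downloaded image and keep it only if that digest has not been seen before. The function name unique_by_md5 is invented for this example.</p>

<pre>
```python
import hashlib


def unique_by_md5(blobs):
    """Return the byte strings whose MD5 digest was not seen earlier, in order."""
    seen = set()
    unique = []
    for data in blobs:
        digest = hashlib.md5(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(data)
    return unique
```
</pre>

<p class=MsoNormal>This catches byte-for-byte copies only; the same picture saved at a different size or quality would hash differently.</p>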


<p class=MsoNormal><font size=2 color=navy face=Arial><span style='font-size:


10.0pt;font-family:Arial;color:navy'><o:p> </o:p></span></font></p>


<p class=MsoNormal><font size=2 color=navy face=Arial><span style='font-size:


10.0pt;font-family:Arial;color:navy'><o:p> </o:p></span></font></p>


<div>


<div class=MsoNormal align=center style='text-align:center'><font size=3


face="Times New Roman"><span style='font-size:12.0pt'>


<hr size=2 width="100%" align=center tabindex=-1>


</span></font></div>


<p class=MsoNormal><b><font size=2 face=Tahoma><span style='font-size:10.0pt;


font-family:Tahoma;font-weight:bold'>From:</span></font></b><font size=2


face=Tahoma><span style='font-size:10.0pt;font-family:Tahoma'> Ronn Ross


[mailto:ronn.ross@gmail.com] <br>


<b><span style='font-weight:bold'>Sent:</span></b> Tuesday, April 07, 2009 9:37


AM<br>


<b><span style='font-weight:bold'>To:</span></b> Support Desk<br>


<b><span style='font-weight:bold'>Subject:</span></b> Re: Scraping a web page</span></font><o:p></o:p></p>


</div>


<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:


12.0pt'><o:p> </o:p></span></font></p>


<p class=MsoNormal style='margin-bottom:12.0pt'><font size=3


face="Times New Roman"><span style='font-size:12.0pt'>This works great, but is


there a way to do this with Firefox or something similar so I can also print


the images from the site? <o:p></o:p></span></font></p>


<div>


<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:


12.0pt'>On Tue, Apr 7, 2009 at 9:58 AM, Support Desk &lt;<a


href="mailto:support.desk.ipg@gmail.com">support.desk.ipg@gmail.com</a>&gt;


wrote:<o:p></o:p></span></font></p>


<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:


12.0pt'>You could do something like below to get the rendered page.<br>


<br>


import os<br>


site = '<a href="http://website.com" target="_blank">website.com</a>'<br>


X = os.popen('lynx --dump %s' % site).readlines()<o:p></o:p></span></font></p>
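<p class=MsoNormal>The same call can also be written with the subprocess module, which passes the URL as a separate argument instead of pasting it into a shell command. This is a sketch that assumes lynx is installed and on the PATH and that Python 3.7 or later is in use; the function names are made up for illustration.</p>

<pre>
```python
import subprocess


def lynx_dump_command(url):
    """Build the argument list for a plain-text dump of url."""
    return ["lynx", "-dump", url]


def lynx_dump(url, timeout=30):
    """Run lynx and return the rendered text as a list of lines."""
    result = subprocess.run(
        lynx_dump_command(url),
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,            # decode the output to str
        timeout=timeout,      # give up if lynx hangs
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout.splitlines()
```
</pre>

<p class=MsoNormal>Note that splitlines() drops the trailing newlines that readlines() keeps, and that the result is still text only, so it will not help with the images.</p>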


<div>


<div>


<p class=MsoNormal style='margin-bottom:12.0pt'><font size=3


face="Times New Roman"><span style='font-size:12.0pt'><br>


<br>


<br>


<br>


<br>


<br>


<br>


-----Original Message-----<br>


From: Tim Chase [mailto:<a href="mailto:python.list@tim.thechases.com">python.list@tim.thechases.com</a>]<br>


Sent: Tuesday, April 07, 2009 7:45 AM<br>


To: Ronn Ross<br>


Cc: <a href="mailto:python-list@python.org">python-list@python.org</a><br>


Subject: Re: Scraping a web page<br>


<br>


> f = urllib.urlopen("<a href="http://www.google.com" target="_blank">http://www.google.com</a>")<br>


> s = f.read()<br>


><br>


> It is working, but it's returning the source of the page. Is there any way I<br>


> can get almost a screen capture of the page?<br>


<br>


This is the job of a browser -- to render the source HTML.  As<br>


such, you'd want to look into any of the browser-automation<br>


libraries to hook into IE, Firefox, Opera, or maybe using the<br>


WebKit/KHTML control.  You may then be able to direct it to<br>


render the HTML into a canvas you can then treat as an image.<br>


<br>


Another alternative might be provided by some web-services that<br>


will render a page as HTML with various browsers and then send<br>


you the result.  However, these are usually either (1)<br>


asynchronous or (2) paid services (or both).<br>


<br>


-tkc<br>


<br>


<br>


<br>


<br>


<br>


<br>


<o:p></o:p></span></font></p>


</div>


</div>


</div>


<p class=MsoNormal><font size=3 face="Times New Roman"><span style='font-size:


12.0pt'><o:p> </o:p></span></font></p>


</div>


</body>


</html>