Urllib's urlopen and urlretrieve
davea at davea.name
Thu Feb 21 19:04:37 CET 2013
On 02/21/2013 12:47 PM, rh wrote:
> On Thu, 21 Feb 2013 10:56:15 -0500
> Dave Angel <davea at davea.name> wrote:
>> On 02/21/2013 07:12 AM, qoresucks at gmail.com wrote:
>>> I only just started Python and given that I know nothing about
>>> network programming or internet programming of any kind really, I
>>> thought it would be interesting to try write something that could
>>> create an archive of a website for myself.
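For reference, the urlopen/urlretrieve route named in the subject line looks roughly like this in Python 3's urllib.request (in the Python 2 of this thread, the same names lived directly in the urllib module). This is a minimal sketch; a local file:// URL stands in for a real http:// one so the snippet runs without network access:

```python
# Minimal sketch of the urlopen/urlretrieve approach.
# A file:// URL is used here as a stand-in for a real http:// URL,
# so the snippet is self-contained; swap in the site's URL in practice.
import os
import tempfile
from urllib.request import urlopen, urlretrieve

# Build a throwaway local "page" to fetch.
page = os.path.join(tempfile.mkdtemp(), "index.html")
with open(page, "w") as f:
    f.write("<html>hello</html>")
url = "file://" + page               # e.g. "http://example.com/" instead

# urlopen returns a file-like response object.
with urlopen(url) as resp:
    html = resp.read()               # raw bytes of the page

# urlretrieve downloads straight to a local file.
saved, headers = urlretrieve(url, page + ".saved")
```

Note that this only captures what the server sends to a client, which is exactly the limitation Dave's rsync suggestion below is meant to avoid.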
>> To archive your website, use the rsync command. No need to write any
>> code, as rsync will descend into all the directories as needed, and
>> it'll get the actual website data, not the stuff that the web server
>> feeds to the browsers.
> How many websites let you suck down their content using rsync???
> The request was for creating their own copy of a website.
Clearly this was his own website, since it's usually unethical to "suck
down" someone else's. And my message specifically said "To archive
*your* website..."  As to the implied question of why, since he
presumably has the original sources, I can only relate my own
experience. I generate mine with a Python program, but over time
obsolete files are left behind. Additionally, an overzealous SEO person
hand-edited my files. And finally, I reinstalled my system from scratch
a couple of months ago. So about two weeks ago, in order to see exactly
what was actually out there, I used rsync.