url directory listing

Pete Shinners pete at visionart.com
Thu Jun 22 21:44:44 EDT 2000


> Curtis Jensen wrote:
> >
> > It is possible to get the files off of a site.  softbytelabs did it:
> > http://www.softbytelabs.com/
> > Their black widow program does a fine job of copying site files.  Maybe
> > it's not done with HTTP, but does anyone know how it's done?  What does
> > WebSucker do?  Thanks.

i have not seen websucker in action, but i am assuming it is
downloading all the files referenced from within the .HTML
pages it gets. i don't think this would be too hard:

parse the HTML and read all the <img> tags
to get image filenames, then recurse through all the
<a href=> tags to find more pages to fetch.
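something like this would do it, using the stdlib html.parser
module (a minimal sketch in modern python; back then you'd have
reached for htmllib or sgmllib instead, and the sample HTML here
is just made up for illustration):

```python
from html.parser import HTMLParser

class LinkGrabber(HTMLParser):
    """collect image filenames from <img src> and page links from <a href>."""
    def __init__(self):
        super().__init__()
        self.images = []   # filenames to download
        self.pages = []    # pages to recurse into

    def handle_starttag(self, tag, attrs):
        # html.parser hands us lowercased tag names and (name, value) pairs
        attrs = dict(attrs)
        if tag == "img" and "src" in attrs:
            self.images.append(attrs["src"])
        elif tag == "a" and "href" in attrs:
            self.pages.append(attrs["href"])

grabber = LinkGrabber()
grabber.feed('<a href="page2.html"><img src="pic.gif"></a>')
print(grabber.images)  # ['pic.gif']
print(grabber.pages)   # ['page2.html']
```

from there it's a loop: fetch each page in `pages`, feed it through
the same parser, and keep going until you run out of unseen links.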

heh, of course there's a bit more glue to put into something
like this, but it shouldn't be too taxing. you'll probably
want to peek into the <body> <table> <tr> and <td> tags
to see if they have a "background" attribute.
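that check is just another case in the same parser; a rough sketch
(the tag set and sample HTML here are only illustrative):

```python
from html.parser import HTMLParser

class BackgroundGrabber(HTMLParser):
    """pick up "background" image attributes on body/table/tr/td tags."""
    TAGS = {"body", "table", "tr", "td"}

    def __init__(self):
        super().__init__()
        self.backgrounds = []

    def handle_starttag(self, tag, attrs):
        if tag in self.TAGS:
            # attrs is a list of (name, value) pairs, names lowercased
            for name, value in attrs:
                if name == "background" and value:
                    self.backgrounds.append(value)

g = BackgroundGrabber()
g.feed('<body background="tile.gif"><table background="wood.jpg">'
       '</table></body>')
print(g.backgrounds)  # ['tile.gif', 'wood.jpg']
```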


