url directory listing
Pete Shinners
pete at visionart.com
Thu Jun 22 21:44:44 EDT 2000
> Curtis Jensen wrote:
> >
> > It is possible to get the files off of a site. softbytelabs did it:
> > http://www.softbytelabs.com/
> > Their black widow program does a fine job of copying site files. Maybe
> > it's not done with HTTP, but does anyone know how it's done? What does
> > WebSucker do? Thanks.
i have not seen websucker in action, but i am assuming it
downloads all the files referenced from within the .HTML
pages it gets. i don't think this would be too hard;
parse the HTML and read all the <img> tags
to get image filenames, then recurse through all the
<a href=> tags.
heh, of course there's a bit more glue to put into something
like this, but it shouldn't be too taxing. you'll probably
want to peek into the <body> <table> <tr> and <td> tags
to see if they have a "background" attribute.
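something like the above can be sketched with the standard library's html.parser (the modules of the day were htmllib/sgmllib, but the idea is identical). the class name LinkCollector is just made up for illustration:

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect files referenced by a page: <img src=...>,
    <a href=...>, and "background" attributes on
    <body>/<table>/<tr>/<td> tags."""

    def __init__(self):
        super().__init__()
        self.images = []   # image files to download
        self.pages = []    # linked pages, candidates for recursion

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and "src" in attrs:
            self.images.append(attrs["src"])
        elif tag == "a" and "href" in attrs:
            self.pages.append(attrs["href"])
        elif tag in ("body", "table", "tr", "td") and "background" in attrs:
            self.images.append(attrs["background"])

parser = LinkCollector()
parser.feed('<body background="bg.gif">'
            '<a href="next.html"><img src="x.png"></a></body>')
print(parser.images)  # ['bg.gif', 'x.png']
print(parser.pages)   # ['next.html']
```

the real glue would be resolving relative URLs (urllib helps there), fetching each page, and keeping a "seen" set so the recursion terminates.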
More information about the Python-list mailing list