request: web site copy utility

Simon B. brunns at beer.com
Tue Jun 20 07:24:53 EDT 2000


In article <393207A4.C04E6BDB at earthlink.net>,
  Simon <tomega at earthlink.net> wrote:
> websucker.py, while an awesome utility, doesn't actually suck the entire
> site.  For instance, it will pull the image files on a page if those
> images are statically SRC-ed.  If there is an onmouseover directive, for
> instance, which changes the image source to another file, it will not
> grab the other file, even if it lies in the same directory as the first
> image.
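For what it's worth, here is a rough standalone sketch (not a patch to websucker.py, whose internals I'm not reproducing here) of one way to catch those rollover images. It uses Python 3's html.parser and urllib.parse. The assumption is that the rollover script names the image as a quoted literal ending in .gif, .jpg/.jpeg or .png. URLs assembled by string concatenation or held in separate <script> blocks will still be missed. The names ImageRefParser, image_refs, IMAGE_LITERAL and SCRIPT_ATTRS are made up for illustration.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Heuristic (an assumption): a quoted string ending in a common image extension.
IMAGE_LITERAL = re.compile(r"""['"]([^'"]+\.(?:gif|jpe?g|png))['"]""", re.IGNORECASE)

# Inline event-handler attributes that commonly swap image sources.
SCRIPT_ATTRS = {"onmouseover", "onmouseout", "onclick", "onload"}


class ImageRefParser(HTMLParser):
    """Collect image URLs from static SRC attributes and from string
    literals inside inline JavaScript event-handler attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = set()

    def handle_starttag(self, tag, attrs):
        # html.parser hands us lowercased tag and attribute names.
        for name, value in attrs:
            if value is None:
                continue
            if tag == "img" and name == "src":
                self.found.add(urljoin(self.base_url, value))
            elif name in SCRIPT_ATTRS:
                for literal in IMAGE_LITERAL.findall(value):
                    self.found.add(urljoin(self.base_url, literal))


def image_refs(html, base_url):
    """Return the sorted absolute URLs of every image referenced on the page."""
    parser = ImageRefParser(base_url)
    parser.feed(html)
    parser.close()
    return sorted(parser.found)


if __name__ == "__main__":
    page = ('<a href="x.html" onmouseover="document.images[0].src=\'img/on.gif\'">'
            '<img src="img/off.gif"></a>')
    print(image_refs(page, "http://example.com/dir/page.html"))
    # ['http://example.com/dir/img/off.gif', 'http://example.com/dir/img/on.gif']
```

A crawler would then queue the extra URLs alongside the ones it already gets from SRC attributes. A regex over the handler text is crude, but short of running the JavaScript it is about the best a static copier can do.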

I have problems with some sites (see
<http://publib.boulder.ibm.com/cgi-bin/bookmgr/BOOKS/QBKAQV00/CCONTENTS>
for example) where the URLs are fully qualified rather than relative.
At least I *think* that's what the problem is. Oh, and Java applets
aren't sucked, either.
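If the fully qualified URLs really are the culprit, one fix is to resolve every link to an absolute URL before deciding whether it belongs to the site. That way "ch1.html" and "http://host/docs/ch1.html" are treated alike. Below is a minimal sketch of that idea, plus a parser that lists the class and archive files an <applet> tag would load. Both are Python 3 illustrations under stated assumptions, not websucker.py's actual logic. The assumptions are that "belongs to the site" means same scheme, same host and a path under a chosen root URL, and that CODE, CODEBASE and ARCHIVE are read as in HTML 4. The names is_internal and AppletRefParser are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


def is_internal(link, page_url, root_url):
    """True if `link`, as found on `page_url`, lies under `root_url`.

    Relative and fully qualified links are both resolved to absolute URLs
    first, so they compare the same when they point at the same place.
    """
    target = urlsplit(urljoin(page_url, link))
    root = urlsplit(root_url)
    return (target.scheme == root.scheme
            and target.netloc.lower() == root.netloc.lower()
            and target.path.startswith(root.path))


class AppletRefParser(HTMLParser):
    """Collect the class and archive files referenced by <applet> tags."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "applet":
            return
        attrs = dict(attrs)
        # CODEBASE, if given, is the directory CODE and ARCHIVE are relative to.
        base = self.page_url
        codebase = attrs.get("codebase")
        if codebase:
            if not codebase.endswith("/"):
                codebase += "/"  # treat it as a directory for urljoin
            base = urljoin(self.page_url, codebase)
        code = attrs.get("code")
        if code:
            # "demo.Viewer.class" or "demo.Viewer" -> "demo/Viewer.class"
            if code.endswith(".class"):
                code = code[:-len(".class")]
            self.found.append(urljoin(base, code.replace(".", "/") + ".class"))
        # ARCHIVE is treated here as a comma-separated list of JAR files.
        for archive in (attrs.get("archive") or "").split(","):
            if archive.strip():
                self.found.append(urljoin(base, archive.strip()))
```

With a root of "http://example.com/docs/", both "ch1.html" and "http://example.com/docs/ch1.html" found on http://example.com/docs/index.html count as internal, while "http://other.org/x.html" does not. The applet parser's output would simply be added to the fetch queue like any other link.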

I've had a quick look into the websucker source, but I haven't spotted
anything yet. I'll keep looking, but can anyone help me out here?

--
Simon B

