[Tutor] saving webpage as webarchive

Danny Yoo dyoo at hashcollision.org
Tue Mar 1 03:42:24 EST 2016


> I want to save a webpage as a webarchive, and not just get the text.
> I hope there’s a way to do it without saving all of the images separately.
> And even if I do have to download them separately, then how would I combine everything into the HTM webarchive?


If I understand your question properly, I think you're asking for
something like the use of the 'wget' utility, which knows how to
download an entire web site:

    http://www.linuxjournal.com/content/downloading-entire-web-site-wget
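
If you would rather drive wget from Python than type the command by hand, here is a minimal sketch of one way it might look. It assumes GNU wget is installed and on your PATH; the function names and the "saved_page" directory are just made up for illustration. Note that this gives you a directory containing the page and its images, not a single archive file.

```python
import subprocess


def build_wget_command(url, dest_dir="saved_page"):
    """Return an argument list asking wget to save one page plus its assets."""
    return [
        "wget",
        "--page-requisites",   # also fetch the images, CSS and scripts the page needs
        "--convert-links",     # rewrite links so the local copy works offline
        "--adjust-extension",  # add .html/.css suffixes where they are missing
        "--span-hosts",        # allow assets that live on other hosts
        "--directory-prefix=" + dest_dir,
        url,
    ]


def save_page(url, dest_dir="saved_page"):
    """Run wget; needs wget on PATH and raises CalledProcessError on failure."""
    subprocess.run(build_wget_command(url, dest_dir), check=True)
```

Calling save_page("http://example.com/") should then leave the page and whatever it needs under saved_page/.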


Doing this in a Python program is not a simple task; it's equivalent
to writing a web crawler.  The following page explains some basic
ideas:

    http://www-rohan.sdsu.edu/~gawron/python_for_ss/course_core/book_draft/web/web_intro.html
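
To give a feel for what the crawler-style approach involves, here is a rough sketch using only the standard library. It is not a complete solution: it looks at one page only, finds the images, scripts and stylesheets that page refers to, and downloads them into memory. It does not follow links to other pages, does not look inside CSS files, and does not combine anything into a single archive. All of the names (ResourceCollector, find_resources, fetch_page_and_resources) are hypothetical ones chosen for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class ResourceCollector(HTMLParser):
    """Collect absolute URLs of images, scripts and stylesheets in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        if tag in ("img", "script") and attrs.get("src"):
            target = attrs["src"]
        elif tag == "link" and "stylesheet" in rel and attrs.get("href"):
            target = attrs["href"]
        else:
            return
        # Resolve relative references against the page's own URL.
        absolute = urljoin(self.base_url, target)
        if absolute not in self.resources:
            self.resources.append(absolute)


def find_resources(html, base_url):
    """Return the resource URLs referenced by an HTML string, in page order."""
    parser = ResourceCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.resources


def fetch_page_and_resources(url):
    """Download a page; return (html_text, {resource_url: raw_bytes})."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    downloaded = {}
    for resource_url in find_resources(html, url):
        with urlopen(resource_url) as response:
            downloaded[resource_url] = response.read()
    return html, downloaded
```

Even this small piece leaves out error handling, rewriting the links in the saved HTML, and choosing local file names, which is a good part of why a tool like wget is the easier route.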

