[Tutor] Using an XML file for web crawling

Alan Gauld alan.gauld at yahoo.co.uk
Fri Mar 31 10:11:11 EDT 2017


On 31/03/17 12:23, Igor Alexandre wrote:

> I have a sitemap in XML and I want to use it to save as text the various pages

What about non-text pages such as images and media files?

> I'm looking for some code on the web where I can just type the xml address
> and wait for the crawler to do its job, saving all the pages
> indicated in the sitemap as a text file in my computer.

I assume you mean multiple text files? And I assume you want to recreate
the site structure too - with folders etc?
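
If you do want the folder structure, one way to get it is to turn each
URL into a local file path. This is only a rough sketch: url_to_path,
the "site" root folder and the "index.html" default name are all names
I made up for illustration, and it ignores query strings completely.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def url_to_path(url, root="site", default_name="index.html"):
    """Map a URL to a relative file path that mirrors the site layout.

    Hypothetical helper: 'root' and 'default_name' are arbitrary choices.
    Query strings and fragments are dropped, and the path is not checked
    for things like '..', so treat this as a starting point only.
    """
    parts = urlsplit(url)
    path = parts.path
    # A URL that ends in '/' (or has no path) names a folder, so give
    # the saved file a name of its own inside that folder.
    if path == "" or path.endswith("/"):
        path += default_name
    return PurePosixPath(root, parts.netloc, path.lstrip("/"))
```

You would still need to create the parent folders (os.makedirs with
exist_ok=True does that) before writing each file.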

There are tools around to do that, but I don't know of any Python code
that you can just pick up and use. You will need to do a bit of work.
But I'm not an expert in web crawling so I could be wrong! :-)
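
To give an idea of the work involved, here is a minimal sketch using
only the standard library. It assumes your sitemap uses the standard
sitemaps.org format (a <urlset> containing <url><loc>...</loc></url>
entries). The function names and the page_0001.txt naming scheme are
my own inventions, and it saves whatever the server returns, so HTML
pages will still contain their markup.

```python
import os
import urllib.request
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org files (an assumption about yours).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_urls(sitemap_xml):
    """Return the address inside every <url><loc> element of the sitemap."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip()
            for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
            if loc.text]


def fetch(url):
    """Download one page and return it as a string."""
    with urllib.request.urlopen(url) as response:
        # Fall back to UTF-8 if the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def save_pages(sitemap_xml, out_dir, fetch_page=fetch):
    """Save each listed page as page_0001.txt, page_0002.txt, ... in out_dir.

    fetch_page can be swapped for another function, which is handy for
    testing without touching the network.
    """
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for number, url in enumerate(extract_urls(sitemap_xml), start=1):
        filename = os.path.join(out_dir, "page_%04d.txt" % number)
        with open(filename, "w", encoding="utf-8") as out:
            out.write(fetch_page(url))
        saved.append(filename)
    return saved
```

You would call it with something like
save_pages(fetch("http://example.com/sitemap.xml"), "pages"), where the
address is just a placeholder for your own sitemap. It makes no attempt
to handle images, errors, or polite delays between requests.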

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos
