[Tutor] Using an XML file for web crawling
Alan Gauld
alan.gauld at yahoo.co.uk
Fri Mar 31 10:11:11 EDT 2017
On 31/03/17 12:23, Igor Alexandre wrote:
> I have a sitemap in XML and I want to use it to save as text the various pages
What about non-text pages such as images and media files?
> I'm looking for some code on the web where I can just type the xml address
> and wait for the crawler to do its job, saving all the pages
> indicated in the sitemap as a text file in my computer.
I assume you mean multiple text files? And I assume you want to recreate
the site structure too - with folders etc?
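If you do want the folders, one way to picture it is a function that turns each page address into a local file name. This is only a rough sketch: the name url_to_path and the "site_copy" root folder are made up for illustration, and it simply appends ".txt" to whatever the last part of the address is.

```python
from pathlib import Path
from urllib.parse import urlparse


def url_to_path(url, root="site_copy"):
    """Map a page URL to a local file path that mirrors the site's folders."""
    parsed = urlparse(url)
    # Drop empty segments and ".." so the result cannot escape the root folder
    parts = [p for p in parsed.path.split("/") if p and p != ".."]
    # An address ending in "/" (or with no path at all) has no file name,
    # so use "index" as a stand-in
    if not parts or parsed.path.endswith("/"):
        parts.append("index")
    # Keep the original name and add ".txt" so different pages never collide
    return Path(root, parsed.netloc, *parts[:-1], parts[-1] + ".txt")
```

So an address like http://example.com/docs/intro.html would land in site_copy/example.com/docs/intro.html.txt, and you would create the parent folders with Path.mkdir(parents=True, exist_ok=True) before writing.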
There are tools around to do that, but I don't know of any Python code
that you can just pick up and use; you will need to do a bit of work.
But I'm not an expert in web crawling so I could be wrong! :-)
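To give a feel for the work involved, here is a minimal sketch using only the standard library. It assumes the sitemap follows the usual sitemaps.org layout, with each page address in a <loc> element. The function names and the "pages" output folder are my own inventions, and it saves the raw page source to numbered files rather than stripping out the HTML, which would be a further step.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.request import urlopen

# Namespace used by the sitemaps.org protocol (assumed to apply to your file)
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the addresses found in the <loc> elements of a sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip()
            for loc in root.findall(".//sm:loc", SITEMAP_NS)
            if loc.text]


def save_pages(urls, out_dir="pages"):
    """Fetch each URL and write the response body to a numbered text file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for n, url in enumerate(urls, start=1):
        with urlopen(url) as resp:
            # Fall back to UTF-8 when the server does not declare a charset
            charset = resp.headers.get_content_charset() or "utf-8"
            text = resp.read().decode(charset, errors="replace")
        (out / f"page_{n:04d}.txt").write_text(text, encoding="utf-8")


def crawl_sitemap(sitemap_url, out_dir="pages"):
    """Download the sitemap itself, then every page it lists."""
    with urlopen(sitemap_url) as resp:
        save_pages(sitemap_urls(resp.read()), out_dir)
```

You would then call something like crawl_sitemap("http://www.example.com/sitemap.xml") with your own address in place of that placeholder. There is no error handling or politeness delay between requests here, so treat it as a starting point only.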
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos