web page retrieve problems

Alex metallourlante at gmail.com
Mon Jul 27 03:40:09 EDT 2009


On Jul 26, 8:57 am, golu <bhardwajjaye... at gmail.com> wrote:
> The following function retrieves pages from the web and saves them in
> a specified dir. I want to extract the respective filenames from the
> URLs, e.g. the page code.google.com should be saved as code-google.htm or
> something similar. Can you suggest a way to do it?

Try urllib.urlretrieve from the standard library. Its optional second
argument is the local filename to save to:

urllib.urlretrieve(url[, filename[, reporthook[, data]]])
Copy a network object denoted by a URL to a local file, if necessary.
If the URL points to a local file, or a valid cached copy of the
object exists, the object is not copied. Return a tuple (filename,
headers) where filename is the local file name under which the object
can be found, and headers is whatever the info() method of the object
returned by urlopen() returned (for a remote object, possibly cached).
Exceptions are the same as for urlopen().
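
To get names like code-google.htm, something along these lines might work.
It is only a rough sketch: url_to_filename and save_page are made-up helper
names, and the naming rule (drop "www." and the top-level domain, turn
everything else that is not a letter or digit into a dash) is just my guess
at what you want. The import fallback is there because in Python 3 the
function lives in urllib.request instead of urllib.

```python
import os
import re

try:
    from urllib.request import urlretrieve  # Python 3 location
    from urllib.parse import urlparse
except ImportError:
    from urllib import urlretrieve  # Python 2, as in the docs quoted above
    from urlparse import urlparse


def url_to_filename(url, extension=".htm"):
    """Hypothetical helper: build a filesystem-friendly name from a URL."""
    if "://" not in url:
        url = "http://" + url  # urlparse needs a scheme to find the host
    parts = urlparse(url)
    host = parts.netloc
    if host.startswith("www."):
        host = host[len("www."):]
    # Assumed rule: drop the top-level domain (code.google.com -> code.google)
    if "." in host:
        host = host.rsplit(".", 1)[0]
    raw = host + parts.path
    # Replace every run of non-alphanumeric characters with a single dash.
    name = re.sub(r"[^A-Za-z0-9]+", "-", raw).strip("-")
    return (name or "index") + extension


def save_page(url, directory):
    """Hypothetical helper: download url into directory, return the path."""
    if "://" not in url:
        url = "http://" + url
    target = os.path.join(directory, url_to_filename(url))
    local_path, headers = urlretrieve(url, target)
    return local_path
```

With this, url_to_filename("code.google.com") gives "code-google.htm", and
save_page("code.google.com", "pages") would save the page under that name in
the pages directory (which has to exist already). Two different URLs can
still map to the same name, so check for that if it matters to you.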


