mirror dir using urllib
Brian Kranson
bk at whack.org
Tue Feb 4 01:38:24 EST 2003
I have a small script that mirrors the directory pub/irs-pdf on
www.irs.gov, as follows:
import urllib, re, string

# fetch the directory listing and split it into lines
uf=urllib.urlopen("http://www.irs.gov/pub/irs-pdf/")
data=uf.read()
data1=string.split(data,"\012")

# collect the file name from every NAME="....pdf"> entry
pat=re.compile(r'NAME=.*\.pdf">')
mylist=[]
for i in data1:
    a=pat.search(i,1)
    if a:
        mylist.append(string.split(i[a.start():a.end()],'"')[1])

# download the files one at a time
for i in mylist:
    file=open(i,'wb')
    uf=urllib.urlopen("http://www.irs.gov/pub/irs-pdf/" + i)
    file.write(uf.read())
    file.close()
I want to know if there is a good way to make this faster. My
internet connection is fast enough that I don't need to download the
files serially, but this is the only way I currently understand how to
make it work.
Thanks
Bk
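
One possible way to run the downloads in parallel, sketched in present-day Python 3 rather than the urllib API used in the script above: a small pool of worker threads, each fetching one file at a time. The names pdf_names, fetch, save, mirror, workers, fetch_fn and dest_dir are made up for illustration, the worker count of 8 is an arbitrary guess, and the sketch assumes the listing still marks each file as NAME="something.pdf">.

```python
import os
import re
import urllib.request
from concurrent.futures import ThreadPoolExecutor

BASE_URL = "http://www.irs.gov/pub/irs-pdf/"  # URL taken from the original script

# Assumption: each file appears in the listing as NAME="something.pdf">
NAME_RE = re.compile(r'NAME="([^"]+\.pdf)">')


def pdf_names(listing):
    """Return the .pdf file names found in the directory listing text."""
    return NAME_RE.findall(listing)


def fetch(url):
    """Download one URL and return its body as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def save(name, fetch_fn=fetch, dest_dir="."):
    """Download BASE_URL + name and write it to dest_dir/name."""
    data = fetch_fn(BASE_URL + name)
    with open(os.path.join(dest_dir, name), "wb") as out:
        out.write(data)
    return name


def mirror(names, workers=8, fetch_fn=fetch, dest_dir="."):
    """Download every name using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() gives each name to a free thread; results keep the input order
        return list(pool.map(lambda n: save(n, fetch_fn, dest_dir), names))


# Usage (this part touches the network):
#   listing = fetch(BASE_URL).decode("latin-1")
#   mirror(pdf_names(listing))
```

Threads fit here because each download spends most of its time waiting on the network. The fetch_fn parameter is only there so the download step can be swapped out, for example when trying the code without a connection.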