mirror dir using urllib

Brian Kranson bk at whack.org
Tue Feb 4 01:38:24 EST 2003


I have a small script that mirrors the directory pub/irs-pdf on
www.irs.gov:

import re
import urllib.request

BASE_URL = "http://www.irs.gov/pub/irs-pdf/"

# Fetch the directory listing and decode it so the regex works on text.
with urllib.request.urlopen(BASE_URL) as uf:
    data = uf.read().decode("latin-1")

# Compile the pattern once and capture the file name between the quotes
# of every NAME="....pdf"> attribute on the page.
pdf_re = re.compile(r'NAME="([^"]*\.pdf)">')
mylist = pdf_re.findall(data)

# Download each file, one after another.
for name in mylist:
    with urllib.request.urlopen(BASE_URL + name) as uf, open(name, "wb") as f:
        f.write(uf.read())

I want to know whether there is a good way to make this faster. My
internet connection is fast enough that I don't need to download the
files serially, but this is the only way I currently understand how to
make it work.
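One possible approach is to hand each download to a small pool of worker threads. The sketch below assumes a modern Python 3 interpreter, because `concurrent.futures` did not exist when this was posted. Because the downloads spend most of their time waiting on the network, threads can overlap that waiting. The names `BASE_URL`, `fetch`, `mirror` and the default of 8 workers are illustrative assumptions, not part of the original script.

```python
import os
import shutil
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from functools import partial

# Assumed base URL, taken from the script above.
BASE_URL = "http://www.irs.gov/pub/irs-pdf/"


def fetch(name, base_url=BASE_URL, dest_dir="."):
    """Download base_url + name into dest_dir/name and return the local path."""
    path = os.path.join(dest_dir, name)
    with urllib.request.urlopen(base_url + name) as resp, open(path, "wb") as out:
        # Copy in chunks so a large PDF is not held in memory all at once.
        shutil.copyfileobj(resp, out)
    return path


def mirror(names, base_url=BASE_URL, dest_dir=".", max_workers=8):
    """Download all names concurrently with a pool of worker threads.

    max_workers=8 is an arbitrary, hypothetical default. Results come back
    in the same order as names, and an exception raised by any download is
    re-raised here when its result is collected.
    """
    worker = partial(fetch, base_url=base_url, dest_dir=dest_dir)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, names))
```

With the list built by the script above, `mirror(mylist)` would replace the serial download loop. Keeping `max_workers` modest avoids opening so many simultaneous connections that the server starts refusing or throttling them.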

Thanks
Bk




More information about the Python-list mailing list