[Tutor] help with web programming
Daniel Coughlin
kauphlyn@speakeasy.org
Fri, 31 Aug 2001 16:53:13 -0700 (PDT)
Hello Pythoneers,
I am writing a script that grabs a web page and returns a list of a
certain type of link. Each link is then appended to my main URL to collect
the associated pages.
It works fine for the first 10 or so links and then it breaks. (The list
of valid links has about 85 items in it.)
Here is the script:
import urllib
import re
sUrl = 'http://mydomain.com/'
sCmd = 'cmd'
sUrlcmd = sUrl+sCmd
lCmds = []
tupU = urllib.urlretrieve(sUrlcmd) #get main page
fTmp = open(tupU[0]) #open source file for main page
sTmp = fTmp.read()
lTmpHrefs = re.findall(r'href="(.*?)"', sTmp) #filter source for hrefs
for item in lTmpHrefs:
    if item[0:3] == sCmd: lCmds.append(item) #make list of special hrefs
for cmd in lCmds:
    tupTmp = urllib.urlretrieve(sUrl+cmd) #retrieve source of special links
    print tupTmp[0] #print location of source files
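
The link-filtering step can be checked on its own, without touching the network. This is only a minimal sketch: the function name filter_cmd_links and the sample page source are made up for illustration, and the regular expression is the same one used in the script.

```python
import re

def filter_cmd_links(html, prefix):
    """Return every href value in html that starts with prefix."""
    hrefs = re.findall(r'href="(.*?)"', html)
    return [h for h in hrefs if h.startswith(prefix)]

# Hypothetical page source, for illustration only.
sample = ('<a href="cmd?id=1">one</a> '
          '<a href="about.html">about</a> '
          '<a href="cmd?id=2">two</a>')
print(filter_cmd_links(sample, 'cmd'))
```

Only the two hrefs beginning with 'cmd' come back, so the filtering itself does not look like the part that breaks.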
-----------------------------------------------------
Here is the result:
C:\DOCUME~1\DANIEL~1.SEA\LOCALS~1\Temp\~1204-21
C:\DOCUME~1\DANIEL~1.SEA\LOCALS~1\Temp\~1204-22
C:\DOCUME~1\DANIEL~1.SEA\LOCALS~1\Temp\~1204-24
C:\DOCUME~1\DANIEL~1.SEA\LOCALS~1\Temp\~1204-25
C:\DOCUME~1\DANIEL~1.SEA\LOCALS~1\Temp\~1204-26
C:\DOCUME~1\DANIEL~1.SEA\LOCALS~1\Temp\~1204-27
C:\DOCUME~1\DANIEL~1.SEA\LOCALS~1\Temp\~1204-28
C:\DOCUME~1\DANIEL~1.SEA\LOCALS~1\Temp\~1204-29
C:\DOCUME~1\DANIEL~1.SEA\LOCALS~1\Temp\~1204-30
C:\DOCUME~1\DANIEL~1.SEA\LOCALS~1\Temp\~1204-31
C:\DOCUME~1\DANIEL~1.SEA\LOCALS~1\Temp\~1204-32
C:\DOCUME~1\DANIEL~1.SEA\LOCALS~1\Temp\~1204-33
Traceback (most recent call last):
  File "C:\Python21\Pythonwin\pywin\framework\scriptutils.py", line 301, in RunScript
    exec codeObject in __main__.__dict__
  File "C:\daniel\test.py", line 28, in ?
  File "C:\Python21\lib\urllib.py", line 78, in urlretrieve
    return _urlopener.retrieve(url, filename, reporthook, data)
  File "C:\Python21\lib\urllib.py", line 208, in retrieve
    fp = self.open(url, data)
  File "C:\Python21\lib\urllib.py", line 176, in open
    return getattr(self, name)(url)
  File "C:\Python21\lib\urllib.py", line 290, in open_http
    errcode, errmsg, headers = h.getreply()
  File "C:\Python21\lib\httplib.py", line 705, in getreply
    response = self._conn.getresponse()
  File "C:\Python21\lib\httplib.py", line 559, in getresponse
    response.begin()
  File "C:\Python21\lib\httplib.py", line 117, in begin
    line = self.fp.readline()
  File "C:\Python21\lib\socket.py", line 233, in readline
    new = self._sock.recv(self._rbufsize)
IOError: [Errno socket error] (10054, 'Connection reset by peer')
------
Can anyone out there help me interpret this error? Any help at all will
be appreciated!
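
In the meantime, here is one workaround I may try. Error 10054 is the Winsock code for a connection reset, meaning the remote end closed the connection. Assuming the server is dropping connections because the requests arrive too quickly (that is a guess on my part, nothing in the traceback confirms it), pausing and retrying a failed request might get past it. A minimal sketch; retrieve_with_retry, attempts and delay are hypothetical names, and fetch stands in for urllib.urlretrieve:

```python
import time

def retrieve_with_retry(fetch, url, attempts=3, delay=2.0):
    """Call fetch(url); on IOError, wait `delay` seconds and try again.

    Gives up and re-raises the error after `attempts` failed tries.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except IOError:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            time.sleep(delay)  # assumed pause; the right value is unknown
```

In the script, the call inside the second loop would become retrieve_with_retry(urllib.urlretrieve, sUrl+cmd). If the resets continue even with a long delay, the cause is probably something else.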
Thanks,
Daniel