[Tutor] help with web programming

Daniel Coughlin kauphlyn@speakeasy.org
Fri, 31 Aug 2001 16:53:13 -0700 (PDT)


Hello Pythoneers,

I am writing a script that grabs a web page and returns a list of a
certain type of link. Each link is then appended to my main URL to collect
the associated pages.
It works fine for the first 10 or so links and then it breaks. (The list
of valid links has about 85 items in it.)

Here is the script:


import urllib
import re


sUrl = 'http://mydomain.com/'
sCmd = 'cmd'
sUrlcmd = sUrl+sCmd
lCmds = []


tupU = urllib.urlretrieve(sUrlcmd)            # get main page
fTmp = open(tupU[0])                          # open source file for main page
sTmp = fTmp.read()
lTmpHrefs = re.findall(r'href="(.*?)"', sTmp) # filter source for hrefs

for item in lTmpHrefs:
    if item.startswith(sCmd): lCmds.append(item) #make list of special hrefs


for cmd in lCmds:
    tupTmp = urllib.urlretrieve(sUrl+cmd) #retrieve source of special links
    print tupTmp[0]                       #print location of source files

-----------------------------------------------------
Here is the result:

C:\DOCUME~1\DANIEL~1.SEA\LOCALS~1\Temp\~1204-21
C:\DOCUME~1\DANIEL~1.SEA\LOCALS~1\Temp\~1204-22
C:\DOCUME~1\DANIEL~1.SEA\LOCALS~1\Temp\~1204-24
C:\DOCUME~1\DANIEL~1.SEA\LOCALS~1\Temp\~1204-25
C:\DOCUME~1\DANIEL~1.SEA\LOCALS~1\Temp\~1204-26
C:\DOCUME~1\DANIEL~1.SEA\LOCALS~1\Temp\~1204-27
C:\DOCUME~1\DANIEL~1.SEA\LOCALS~1\Temp\~1204-28
C:\DOCUME~1\DANIEL~1.SEA\LOCALS~1\Temp\~1204-29
C:\DOCUME~1\DANIEL~1.SEA\LOCALS~1\Temp\~1204-30
C:\DOCUME~1\DANIEL~1.SEA\LOCALS~1\Temp\~1204-31
C:\DOCUME~1\DANIEL~1.SEA\LOCALS~1\Temp\~1204-32
C:\DOCUME~1\DANIEL~1.SEA\LOCALS~1\Temp\~1204-33
Traceback (most recent call last):
  File "C:\Python21\Pythonwin\pywin\framework\scriptutils.py", line 301, in RunScript
    exec codeObject in __main__.__dict__
  File "C:\daniel\test.py", line 28, in ?
  File "C:\Python21\lib\urllib.py", line 78, in urlretrieve
    return _urlopener.retrieve(url, filename, reporthook, data)
  File "C:\Python21\lib\urllib.py", line 208, in retrieve
    fp = self.open(url, data)
  File "C:\Python21\lib\urllib.py", line 176, in open
    return getattr(self, name)(url)
  File "C:\Python21\lib\urllib.py", line 290, in open_http
    errcode, errmsg, headers = h.getreply()
  File "C:\Python21\lib\httplib.py", line 705, in getreply
    response = self._conn.getresponse()
  File "C:\Python21\lib\httplib.py", line 559, in getresponse
    response.begin()
  File "C:\Python21\lib\httplib.py", line 117, in begin
    line = self.fp.readline()
  File "C:\Python21\lib\socket.py", line 233, in readline
    new = self._sock.recv(self._rbufsize)
IOError: [Errno socket error] (10054, 'Connection reset by peer')
------
Can anyone out there help me interpret this error? Any help at all will
be appreciated!
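
For reference, here is a rough sketch of one workaround that might apply: retry a fetch after a short pause when the connection is reset. This assumes the server is dropping connections because the requests arrive too quickly, which is only a guess and not something the traceback confirms. It is written for Python 3 (urllib.request) rather than the Python 2.1 urllib used in the script, and the names fetch_page and fetch_with_retry are made up for illustration.

```python
import time
import urllib.request


def fetch_page(url):
    """Download url and return the response body as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def fetch_with_retry(url, fetch=fetch_page, retries=3, delay=1.0,
                     sleep=time.sleep):
    """Call fetch(url), retrying when a connection-level error occurs.

    Hypothetical helper: waits `delay` seconds before each retry and
    re-raises the last error if every attempt fails.
    """
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except OSError:
            # ConnectionResetError (error 10054 on Windows) and
            # urllib.error.URLError are both subclasses of OSError.
            if attempt == retries:
                raise
            sleep(delay)
```

In the loop over lCmds, each urlretrieve call would be replaced by something like fetch_with_retry(sUrl + cmd). Whether the delay and retry count shown are sensible depends on the server, so treat them as placeholders.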

Thanks,

Daniel