[Python-checkins] python/dist/src/Tools/webchecker webchecker.py,1.29,1.30

mhammond@users.sourceforge.net
Wed, 26 Feb 2003 22:59:13 -0800


Update of /cvsroot/python/python/dist/src/Tools/webchecker
In directory sc8-pr-cvs1:/tmp/cvs-serv13525

Modified Files:
	webchecker.py 
Log Message:
When bad HTML is encountered, ignore the page rather than failing with
a traceback.


Index: webchecker.py
===================================================================
RCS file: /cvsroot/python/python/dist/src/Tools/webchecker/webchecker.py,v
retrieving revision 1.29
retrieving revision 1.30
diff -C2 -d -r1.29 -r1.30
*** webchecker.py	12 Nov 2002 22:19:34 -0000	1.29
--- webchecker.py	27 Feb 2003 06:59:10 -0000	1.30
***************
*** 401,405 ****
              self.markdone(url_pair)
              return
!         page = self.getpage(url_pair)
          if page:
              # Store the page which corresponds to this URL.
--- 401,413 ----
              self.markdone(url_pair)
              return
!         try:
!             page = self.getpage(url_pair)
!         except sgmllib.SGMLParseError, msg:
!             msg = self.sanitize(msg)
!             self.note(0, "Error parsing %s: %s",
!                           self.format_url(url_pair), msg)
!             # Don't actually mark the URL as bad - it exists, we just
!             # can't parse it!
!             page = None
          if page:
              # Store the page which corresponds to this URL.
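
For readers without the full source at hand, the pattern this change introduces can be sketched in isolation. The following is a minimal illustration only, written in current Python rather than the Python 2 syntax of the patch. The names PageParseError, MiniChecker, dopage, pages, stored and bad are hypothetical stand-ins and are not taken from webchecker.py; only note() and getpage() mirror names visible in the diff, and their bodies here are assumptions.

```python
class PageParseError(Exception):
    """Hypothetical stand-in for sgmllib.SGMLParseError."""


class MiniChecker:
    """Toy checker over an in-memory dict of URL -> HTML text (assumed setup)."""

    def __init__(self, pages):
        self.pages = pages   # hypothetical "web": url -> page text
        self.stored = {}     # pages that parsed successfully
        self.bad = {}        # URLs considered broken (left untouched on parse errors)
        self.notes = []      # collected log messages

    def note(self, level, fmt, *args):
        # Record a formatted message; the real tool prints by verbosity level.
        self.notes.append(fmt % args)

    def getpage(self, url):
        # Pretend parser: any text containing "<!bad" counts as unparseable.
        text = self.pages[url]
        if "<!bad" in text:
            raise PageParseError("malformed declaration")
        return text

    def dopage(self, url):
        try:
            page = self.getpage(url)
        except PageParseError as msg:
            self.note(0, "Error parsing %s: %s", url, msg)
            # The URL exists, so it is not recorded as bad; the page is
            # simply skipped because it cannot be parsed.
            page = None
        if page:
            self.stored[url] = page


checker = MiniChecker({
    "http://example.invalid/ok": "<html>fine</html>",
    "http://example.invalid/broken": "<!bad markup",
})
for url in list(checker.pages):
    checker.dopage(url)
```

After the loop, the unparseable page has produced a note and been skipped, while the run continues with the remaining URLs instead of ending in a traceback.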