<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type content=text/html;charset=Windows-1252>
<META content="MSHTML 6.00.6000.16825" name=GENERATOR></HEAD>
<BODY id=MailContainerBody
style="PADDING-RIGHT: 10px; PADDING-LEFT: 10px; PADDING-TOP: 15px" leftMargin=0
topMargin=0 CanvasTabStop="true" name="Compose message area">
<DIV><FONT face=Garamond color=#000080>This is the error and
traceback:</FONT></DIV>
<DIV><FONT face=Garamond><FONT color=#000080><FONT face=Garamond
color=#000080></FONT></FONT></FONT> </DIV>
<DIV><FONT face=Garamond color=#000080>Unexpected error opening J:/F2/....html:
mismatched tag: line 124, column 8</FONT></DIV>
<DIV><FONT face=Garamond color=#000080></FONT> </DIV>
<DIV><FONT face=Garamond color=#000080>Traceback (most recent call
last):<BR> File "C:\....py", line 492, in
&lt;module&gt;<BR> raw = extractText(xhtmlfile)<BR> File
"C:\....py", line 334, in extractText<BR> tree =
make_tree(xhtmlfile)<BR> File "....py", line 169, in
make_tree<BR> return tree<BR>UnboundLocalError: local variable
'tree' referenced before assignment<BR></FONT><FONT face=Garamond
color=#000080> <BR></FONT></DIV>
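<DIV><FONT face=Garamond color=#000080>The UnboundLocalError at the end looks like a side effect rather than the real problem: make_tree presumably catches the parse error, prints the message above, and then still reaches "return tree" even though tree was never assigned. A minimal sketch of one way to avoid that, assuming a recent Python where xml.etree.ElementTree raises ParseError (the real make_tree is not shown here, so this body is a hypothetical stand-in):</FONT></DIV>
<PRE>
```python
import xml.etree.ElementTree as ET


def make_tree(path):
    """Hypothetical stand-in for make_tree: parse a file, or return None on failure."""
    try:
        tree = ET.parse(path)
    except (ET.ParseError, OSError) as err:
        # Report the problem and return explicitly, so that the code never
        # falls through to a "return tree" with tree still unassigned.
        print("Unexpected error opening %s: %s" % (path, err))
        return None
    return tree
```
</PRE>
<DIV><FONT face=Garamond color=#000080>The caller then has to check for None before using the result; re-raising the exception would be another reasonable choice.</FONT></DIV>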
<DIV style="FONT: 10pt Tahoma"><FONT face=Garamond color=#000080 size=3>Here is
line 124 (the error is reported at column 8). I cannot see any obviously missing
or mismatched tags:</FONT></DIV>
<DIV style="FONT: 10pt Tahoma"><FONT face=Garamond color=#000080
size=3></FONT> </DIV>
<DIV style="FONT: 10pt Tahoma"><FONT face=Garamond color=#000080
size=3>"&lt;p&gt;As to the present time I am unable physical and mentally to
secure all this information at present.&lt;/p&gt;"</FONT></DIV>
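<DIV><FONT face=Garamond color=#000080>To pull out the reported line automatically rather than by hand, the exception itself can be used. A rough sketch, assuming a recent Python where ElementTree raises ParseError with a position attribute holding (line, column); find_parse_error is a hypothetical helper name, and decoding the file as utf-8 with replacement characters is an assumption about the input files:</FONT></DIV>
<PRE>
```python
import xml.etree.ElementTree as ET


def find_parse_error(path):
    """Return (line, column, text) for the first parse error in path, or None."""
    try:
        ET.parse(path)
    except ET.ParseError as err:
        # position is a (line, column) tuple; the line number is 1-based.
        line_no, column = err.position
        # Assumption: utf-8 with replacement is good enough to display the line.
        with open(path, encoding="utf-8", errors="replace") as handle:
            lines = handle.readlines()
        # Slicing avoids an IndexError if the error is reported past the last line.
        text = "".join(lines[line_no - 1:line_no]).rstrip("\n")
        return line_no, column, text
    return None
```
</PRE>
<DIV><FONT face=Garamond color=#000080>The parser reports where it noticed the mismatch, which is usually the closing tag that failed to match, so the tag that was left open may sit on an earlier line than the one printed.</FONT></DIV>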
<DIV style="FONT: 10pt Tahoma"><FONT face=Garamond><FONT size=3><FONT
color=#000080><FONT face=Garamond color=#000080
size=3></FONT></FONT></FONT></FONT> </DIV>
<DIV style="FONT: 10pt Tahoma"><FONT face=Garamond><FONT size=3><FONT
color=#000080><FONT face=Garamond color=#000080
size=3>Dinesh</FONT></FONT></FONT></FONT></DIV>
<DIV style="FONT: 10pt Tahoma"><FONT face=Garamond><FONT size=3><FONT
color=#000080><FONT face=Garamond color=#000080
size=3></FONT></FONT></FONT></FONT> </DIV>
<DIV><FONT face=Garamond color=#000080><BR></FONT></DIV>
<DIV style="FONT: 10pt Tahoma; font-color: black"><FONT face=Garamond><FONT
size=3><FONT color=#000080><B>From:</B> </FONT></FONT></FONT><A
title="mailto:kent37@tds.net CTRL + Click to follow link"
href="mailto:kent37@tds.net"><FONT face=Garamond color=#000080 size=3>Kent
Johnson</FONT></A><FONT face=Garamond color=#000080 size=3> </FONT></DIV>
<DIV><FONT face=Garamond><FONT color=#000080><B>Sent:</B> Tuesday, April 28,
2009 7:13 AM</FONT></FONT></DIV>
<DIV><FONT face=Garamond><FONT color=#000080><B>To:</B> </FONT></FONT><A
title=dineshbvadhia@hotmail.com href="mailto:dineshbvadhia@hotmail.com"><FONT
face=Garamond color=#000080>Dinesh B Vadhia</FONT></A><FONT face=Garamond
color=#000080> </FONT></DIV>
<DIV><FONT face=Garamond><FONT color=#000080><B>Cc:</B> </FONT></FONT><A
title=tutor@python.org href="mailto:tutor@python.org"><FONT face=Garamond
color=#000080>tutor@python.org</FONT></A><FONT face=Garamond color=#000080>
</FONT></DIV>
<DIV><FONT face=Garamond><FONT color=#000080><B>Subject:</B> Re: [Tutor] finding
mismatched or unpaired html tags</FONT></FONT></DIV>
<DIV><FONT face=Garamond color=#000080><BR></FONT></DIV>
<DIV><FONT face=Garamond color=#000080>On Tue, Apr 28, 2009 at 8:54 AM, Dinesh B
Vadhia<BR>&lt;</FONT><A href="mailto:dineshbvadhia@hotmail.com"><FONT
face=Garamond color=#000080>dineshbvadhia@hotmail.com</FONT></A><FONT
face=Garamond color=#000080>> wrote:<BR>> I'm processing tens of thousands
of html files and a few of them contain<BR>> mismatched tags and ElementTree
throws the error:<BR>><BR>> "Unexpected error opening
J:/F2/663/blahblah.html: mismatched tag: line 124,<BR>> column
8"<BR>><BR>> I now want to scan each file and simply identify each
mismatched or unpaired<BR>> tag (by line number) in each file. I've
read the ElementTree docs and<BR>> cannot see any obvious way to do
this. I know this is a common problem<BR>> but I'm feeling a bit clueless
here - any ideas?<BR><BR>It seems like the exception gives you the line number.
What kind of<BR>exception is raised? The exception object may contain the line
and<BR>column in a more accessible form, so you could catch the
exception,<BR>get the line number, then read that line out of the file and show
it.<BR><BR>Kent<BR></FONT></DIV></BODY></HTML>