[Expat-discuss] parsing error

Nikolai Koudelia nikoudel at gmail.com
Mon Oct 29 11:43:48 CET 2007


Hi!

I am trying to parse an "xml" document with Python and Expat. I need to
scan through the xml and collect the values that match a pattern. Example:

pattern:
<tr option1="GROUP1"><td>GROUP2</td></tr>

With the pattern above I need to fetch "asdf" and "qwerty" from the material below:

<table>
  <tr option1="asdf"><td>qwerty</td></tr>
</table>
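
A minimal sketch of this kind of extraction on well-formed input, assuming
Python's standard xml.parsers.expat module. The function name collect and
the handler names are illustrative only, not part of the original post:

```python
import xml.parsers.expat

def collect(xml_text):
    """Return (option1, td_text) pairs for each <tr option1=...><td>...</td></tr>."""
    results = []
    # Hypothetical bookkeeping: the current row's attribute and the cell text.
    state = {"option": None, "in_td": False, "text": []}

    def start(name, attrs):
        if name == "tr":
            state["option"] = attrs.get("option1")      # GROUP1
        elif name == "td" and state["option"] is not None:
            state["in_td"] = True
            state["text"] = []

    def chars(data):
        # Expat may deliver text in several chunks, so accumulate them.
        if state["in_td"]:
            state["text"].append(data)

    def end(name):
        if name == "td" and state["in_td"]:
            results.append((state["option"], "".join(state["text"])))  # GROUP2
            state["in_td"] = False
        elif name == "tr":
            state["option"] = None

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end
    parser.Parse(xml_text, True)   # True marks this as the final chunk
    return results
```

Calling collect() on the table above should yield one pair per matching row.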

The problem is that the material may not be well-formed. It may look like this:

<table>
  <tr option1="asdf"><td>qwerty</td></tr>
  </brokentag>
  <tr option1="rtyu"><td>fgh 16</td></tr>
</table>

When the Expat parser reaches </brokentag>, it raises an exception and
stops parsing. Is there a way to handle a situation like that? Is there
some option telling Expat to skip broken closing tags? Or should I repair
the material before parsing? The last option could be quite tricky, because
Expat could not be used for that... Any ideas?
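
The failure can be reproduced with a short sketch like the one below,
assuming Python's xml.parsers.expat module. The names BROKEN and
first_error are made up for illustration. It only shows where parsing
stops; it does not recover from the error:

```python
import xml.parsers.expat

BROKEN = """<table>
  <tr option1="asdf"><td>qwerty</td></tr>
  </brokentag>
  <tr option1="rtyu"><td>fgh 16</td></tr>
</table>"""

def first_error(xml_text):
    """Return (line number, message) of the first parse error, or None."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(xml_text, True)
    except xml.parsers.expat.ExpatError as err:
        # err.code is Expat's numeric error code; ErrorString turns it into text.
        return err.lineno, xml.parsers.expat.ErrorString(err.code)
    return None

print(first_error(BROKEN))
```

The error is reported on the line holding </brokentag>, and no handler is
called for the row that follows it.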

-NK

