[XML-SIG] Error handling in PyExpat

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Wed, 21 Mar 2001 14:58:34 +0100


> I am using PyExpat to parse XML files, and sometimes these files are
> not correct.  If I find an error in my handler (start_element,
> end_element or characters), I raise an exception and abort
> processing the XML file.  If I raise the exception myself in the
> handler, parser.ErrorLineNumber (and the other variables describing
> the error position) are not available to the exception handler that
> catches my exception; ErrorLineNumber contains a random value.

Yes, expat does not support user-identified error lines. However, it
should be possible to propagate such information with the exception
that you raise.
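
A minimal sketch of what I mean, in Python. It assumes a pyexpat that
exposes the CurrentLineNumber and CurrentColumnNumber attributes on the
parser object (the xml.parsers.expat module in current Python does; an
older PyXML build may not). HandlerError and parse are made-up names
for illustration only.

```python
import xml.parsers.expat


class HandlerError(Exception):
    """Hypothetical exception type that carries the parser position."""

    def __init__(self, message, lineno, column):
        super().__init__(message)
        self.lineno = lineno
        self.column = column


def parse(data):
    parser = xml.parsers.expat.ParserCreate()

    def start_element(name, attrs):
        # Application-level check; "bad" is just an example condition.
        if name == "bad":
            # Read the position while still inside the callback and
            # attach it to the exception (assumes the Current*
            # attributes are available in this pyexpat).
            raise HandlerError("unexpected element %r" % name,
                               parser.CurrentLineNumber,
                               parser.CurrentColumnNumber)

    parser.StartElementHandler = start_element
    parser.Parse(data, True)


try:
    parse("<root>\n  <bad/>\n</root>")
except HandlerError as exc:
    print("line %d, column %d: %s" % (exc.lineno, exc.column, exc))
```

The position is captured inside the callback, so the code that catches
the exception never needs to look at ErrorLineNumber.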

> It should be possible to detect the exception in the expat parser
> module and call set_error() in pyexpat.c if the information is
> available from expat.

Not sure what you mean. set_error generates a Python exception when
the expat parser has produced an error. That has nothing to do with
errors that callback functions might have found.

> Unfortunately the (C level) handlers are void functions so there
> must be another way to tell expat that processing has failed.

I don't think so. This is C, so there is no means of exception
handling. Once a callback is invoked, it is safe to assume that expat
considers the XML itself correct up to that point. You have to let
expat finish parsing before it returns to you (AFAIK).

Of course, once pyexpat has seen a Python exception, all callbacks are
cleared, so no further events get reported.
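
The visible effect can be checked with a small sketch. It assumes the
xml.parsers.expat module of a current Python; parse_names is a made-up
helper. It only shows that no further events arrive after the
exception, not how pyexpat achieves that internally.

```python
import xml.parsers.expat


def parse_names(data, fail_on):
    """Return the element names reported up to and including the failure."""
    seen = []
    parser = xml.parsers.expat.ParserCreate()

    def start_element(name, attrs):
        seen.append(name)
        if name == fail_on:
            raise ValueError("handler rejected <%s>" % name)

    parser.StartElementHandler = start_element
    try:
        parser.Parse(data, True)
    except ValueError:
        pass  # the handler's own exception comes back out of Parse()
    return seen


# <c/> and <d/> follow the failing element and should not be reported.
print(parse_names("<a><b/><c/><d/></a>", "b"))
```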

> I have checked my (between PyXML-0.6.3 and 0.6.4) PyExpat source and
> the xmlplus sources for the SAX implementation but did not find the
> code I am looking for.  Are there plans to implement this or should
> I do it myself?

In expat proper? Not my plan, certainly. In pyexpat? Don't know how.
If you can come up with some code to do what you want, that would be
good.

> If I raise an exception inside a handler, pyexpat.c.set_error()
> should be called (or some other function that gets line number,
> column number, byte position etc.).

flag_error is called in that case; I don't think it should manipulate
the user's exception object.

Regards,
Martin