[XML-SIG] SAX exceptions are odd

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Thu, 5 Oct 2000 23:36:18 +0200

> If I call parse on an empty file, I get no exception.  Is this
> desirable?  I assume it means that "" is well-formed XML, but that
> doesn't seem like a very helpful definition.  Is this right?

No, that looks like a bug in the expat parser. The xmlproc parser (in
PyXML) properly reports

FATAL ERROR in /tmp/foo:1:0: Premature document end, no root element

(when foo is an empty file)
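
For comparison, here is a minimal sketch of the same check against the
xml.sax package in the Python standard library, assuming a recent CPython
where the default driver is expat. The helper name parse_empty is made up
for illustration. On such versions I would expect the empty document to be
rejected rather than silently accepted.

```python
import xml.sax
from xml.sax.handler import ContentHandler


def parse_empty():
    """Parse an empty document; return the SAXParseException raised, or None."""
    try:
        # An empty byte string stands in for the empty file /tmp/foo.
        xml.sax.parseString(b"", ContentHandler())
    except xml.sax.SAXParseException as exc:
        return exc
    return None


if __name__ == "__main__":
    err = parse_empty()
    print("no exception" if err is None else "rejected: %s" % err)
```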

> If I get almost any other exception I get an error message that says
> something like: "not well-formed at None:1:7"
> Why is None being printed?  It gave me the initial impression that my
> error was not setting up the parse call correctly.  I assumed that the None
> was the cause of the exception and that under normal circumstances it
> would have said something like "not well-formed at foo.xml:1:7".

If the InputSource object has a proper system identifier, the message
should include it. It may be useful to print something different when
the identifier is None, for example
   "not well-formed at <unknown>:1:7"

If you did provide a file name, and it got lost somewhere - then that
is a bug.
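
To illustrate, here is a rough sketch of how the system identifier ends up
in the message, assuming the xml.sax package of a recent CPython. The
helper parse_with_system_id and the name "foo.xml" are only illustrative;
the input is deliberately not well-formed so that an exception is raised.

```python
import io
import xml.sax
from xml.sax.xmlreader import InputSource


def parse_with_system_id(data, system_id=None):
    """Parse bytes; return str() of the SAXParseException raised, or None."""
    source = InputSource()
    # Supplying the byte stream ourselves means the system identifier is
    # used only for reporting, not for opening a file.
    source.setByteStream(io.BytesIO(data))
    if system_id is not None:
        source.setSystemId(system_id)
    parser = xml.sax.make_parser()
    try:
        parser.parse(source)
    except xml.sax.SAXParseException as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    bad = b"<a><b></a>"  # mismatched tags, so not well-formed
    print(parse_with_system_id(bad, "foo.xml"))  # location starts with foo.xml
    print(parse_with_system_id(bad))             # no system identifier set
```

If the first message does not start with the name you passed in, the
identifier was lost somewhere between your code and the parser.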

> What is a system identifier and why should it be reported in an
> exception when it is None?

I believe it is the SGML term for "file name". In SGML, a document may
have a "public identifier", a globally well-known string that names the
document, and a "system identifier", whose meaning is understood only
on the local computer system.

I also believe XML treats system identifiers more specifically as
URLs - although it is common to allow strings which are not valid URLs
according to the RFC.

> There are three different pieces of
> information separated by colons.  I am accustomed to the notation
> filename:line number, but not another colon for the cursor position.

That's a matter of taste - you can write your own ErrorHandler if you
don't like the output. I personally understood the notation
immediately, as it is the format Emacs supports for file locations.
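
A custom handler along those lines could look roughly like this. It is
only a sketch, assuming the xml.sax package of a recent CPython; the class
VerboseErrorHandler, the helper verbose_parse and the wording of the
message are all invented for illustration.

```python
import xml.sax
from xml.sax.handler import ContentHandler, ErrorHandler


class VerboseErrorHandler(ErrorHandler):
    """Hypothetical handler that spells out each field of the location."""

    def describe(self, exc):
        # exc is a SAXParseException, which carries the location fields.
        return "file %s, line %s, column %s: %s" % (
            exc.getSystemId() or "<unknown>",
            exc.getLineNumber(),
            exc.getColumnNumber(),
            exc.getMessage(),
        )

    def error(self, exc):
        raise xml.sax.SAXException(self.describe(exc))

    def fatalError(self, exc):
        raise xml.sax.SAXException(self.describe(exc))

    def warning(self, exc):
        print("warning:", self.describe(exc))


def verbose_parse(data):
    """Parse bytes; return the verbose error message, or None on success."""
    try:
        xml.sax.parseString(data, ContentHandler(), VerboseErrorHandler())
    except xml.sax.SAXException as exc:
        return exc.getMessage()
    return None


if __name__ == "__main__":
    print(verbose_parse(b"<a><b></a>"))
```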

> It would have been clearer, I think, if the message were more
> verbose and explained what each field was.

For reproducibility, it is probably best if it is terse - we would
probably have a long debate on what it should look like if it had to