[XML-SIG] SAX exceptions are odd
Martin v. Loewis
Thu, 5 Oct 2000 23:36:18 +0200
> If I call parse on an empty file, I get no exception. Is this
> desirable? I assume it means that "" is well-formed XML, but that
> doesn't seem like a very helpful definition. Is this right?
No, that looks like a bug in the expat parser. The xmlproc parser (in
PyXML) properly reports
FATAL ERROR in /tmp/foo:1:0: Premature document end, no root element
(when foo is an empty file)
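A quick way to check how a given parser treats an empty document is to feed it one and see whether an exception comes back. The following is only a sketch using the standard-library xml.sax module with its default parser; the helper name try_parse is made up for illustration, and the result for the empty input depends on the parser and version installed.

```python
import xml.sax


def try_parse(data: bytes):
    """Return None if parsing succeeds, else the SAXParseException raised."""
    try:
        # A do-nothing ContentHandler is enough to exercise well-formedness checks.
        xml.sax.parseString(data, xml.sax.ContentHandler())
    except xml.sax.SAXParseException as exc:
        return exc
    return None


if __name__ == "__main__":
    # Compare an empty document with an obviously malformed one.
    for label, data in [("empty", b""), ("mismatched tag", b"<a><b></a>")]:
        print(label, "->", try_parse(data))
```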
> If I get almost any other exception I get an error message that says
> something like: "not well-formed at None:1:7"
> Why is None being printed? It gave me the initial impression that my
> error was not setting up the parse call correctly. I assumed that the None
> was the cause of the exception and that under normal circumstances it
> would have said something like "not well-formed at foo.xml:1:7".
If the InputSource object has a proper system identifier, it should
print it. It may be useful to print something different if it is None,
for example:
"not well-formed at <unknown>:1:7"
If you did provide a file name, and it got lost somewhere - then that
is a bug.
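To see whether the system identifier reaches the exception, one can build the InputSource by hand and inspect what the exception reports. This is a sketch against the standard-library xml.sax module, not the code under discussion; the function name locate_error and the name "foo.xml" are assumptions made for the example.

```python
import io
import xml.sax
from xml.sax.xmlreader import InputSource


def locate_error(data: bytes, system_id=None):
    """Parse data; return (system id, line, column) of the first fatal error, or None."""
    source = InputSource(system_id)          # system_id may be None
    source.setByteStream(io.BytesIO(data))   # content comes from memory, not from disk
    parser = xml.sax.make_parser()
    try:
        parser.parse(source)
    except xml.sax.SAXParseException as exc:
        return exc.getSystemId(), exc.getLineNumber(), exc.getColumnNumber()
    return None


if __name__ == "__main__":
    bad = b"<a><b></a>"
    print(locate_error(bad, "foo.xml"))  # system id was supplied
    print(locate_error(bad))             # no system id, so the first field is None
```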
> What is a system identifier and why should it be reported in an
> exception when it is None?
I believe it is the SGML term for "file name". In SGML, a document may
have a "public identifier", a globally well-known string that names the
document, and a "system identifier", whose meaning is understood only
on the local computer system.
I also believe XML more specifically thinks of system identifiers as
URLs - although it is common to allow strings which are not URLs
(according to the RFC).
> There are three different pieces of
> information separated by colons. I am accustomed to the notation
> filename:line number, but not another colon for the cursor position.
That's a matter of taste - you can write your own ErrorHandler if you
don't like the output. I personally understood that notation
immediately, as this is what Emacs supports as file locations.
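As a rough illustration of writing your own ErrorHandler, the sketch below labels each field of the location instead of using the colon notation. It relies on the standard-library xml.sax module; the class name VerboseErrorHandler, the helper parse_verbose, and the message layout are invented for this example and are not part of any library.

```python
import io
import xml.sax
from xml.sax.handler import ErrorHandler
from xml.sax.xmlreader import InputSource


class VerboseErrorHandler(ErrorHandler):
    """Hypothetical handler that spells out file, line and column."""

    def __init__(self):
        self.messages = []

    def _record(self, exc):
        sysid = exc.getSystemId()
        if sysid is None:
            sysid = "<unknown>"
        self.messages.append("file %s, line %s, column %s: %s" % (
            sysid, exc.getLineNumber(), exc.getColumnNumber(), exc.getMessage()))

    def warning(self, exc):
        self._record(exc)      # warnings are recorded, parsing continues

    def error(self, exc):
        self._record(exc)
        raise exc

    def fatalError(self, exc):
        self._record(exc)
        raise exc              # well-formedness errors still stop the parse


def parse_verbose(data: bytes, system_id=None):
    """Parse data and return the list of verbose messages collected."""
    handler = VerboseErrorHandler()
    source = InputSource(system_id)
    source.setByteStream(io.BytesIO(data))
    parser = xml.sax.make_parser()
    parser.setErrorHandler(handler)
    try:
        parser.parse(source)
    except xml.sax.SAXParseException:
        pass                   # already recorded by the handler
    return handler.messages


if __name__ == "__main__":
    for message in parse_verbose(b"<a><b></a>", "foo.xml"):
        print(message)
```

The handler re-raises after recording, so callers that expect an exception on malformed input still get one.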
> It would have been clearer, I think, if the message were more
> verbose and explained what each field was.
For reproducibility, it is probably best if it is terse - we would
probably have a long debate on what it should look like if it had to