[XML-SIG] Re: SAX exceptions are odd
Thu, 5 Oct 2000 16:59:08 -0400 (EDT)
[Lars M. writes:]
>* Jeremy Hylton
>| If I call parse on an empty file, I get no exception. Is this
>| desirable? I assume it means that "" is well-formed XML, but that
>| doesn't seem like a very helpful definition. Is this right?
>No, it's not right. You should get an error telling you that the
>document element is required.
Ok. Then consider it a bug report :-). Can you fix this and add a
test case to the test suite?
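A regression check for the empty-document case could look roughly like the sketch below. It is written against the xml.sax module as it ships in current Python releases, not the 2.0b2 code discussed here, and the function name rejects_empty_document is made up for illustration.

```python
from xml.sax import SAXParseException, parseString
from xml.sax.handler import ContentHandler


def rejects_empty_document():
    """Return True if parsing an empty document raises a parse error."""
    try:
        # An empty document has no document element, so it is not well-formed.
        parseString(b"", ContentHandler())
    except SAXParseException:
        return True
    return False


print(rejects_empty_document())
```

A test suite would simply assert that this function returns a true value.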
>| If I get almost any other exception I get an error message that says
>| something like: "not well-formed at None:1:7"
>Expat is not very good at providing informative error messages, so I
>don't think you can expect much more. If you want better error
>messages you should probably use xmlproc or xmllib.
I think the explanation part of the error message is okay, could be
better but not terrible. The part that's confusing is the
>As for the None that should imply that you just gave the parser a
>string to parse and didn't provide it with a system identifier (ie:
>URL or file name).
How does it know when I pass it a string and when I pass it a system
identifier? In Python, system identifiers are strings?!? What if I
have a file called "<foo>", will it open that file or attempt to parse
it as a string?
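For what it is worth, in the xml.sax module as it exists in current Python releases the two cases go through different entry points: parse() takes a file name, URL, or file-like object, while parseString() takes the document text itself. The sketch below assumes that modern API; the ElementNames handler is a hypothetical helper, not something from this thread.

```python
import io
from xml.sax import parse, parseString
from xml.sax.handler import ContentHandler


class ElementNames(ContentHandler):
    """Hypothetical handler that records element names as they open."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


handler = ElementNames()

# parseString() always treats its argument as document text.
parseString(b"<foo/>", handler)

# parse() accepts a file-like object as well as a file name or URL;
# a plain string passed to parse() is taken as a system identifier.
parse(io.BytesIO(b"<foo/>"), handler)

print(handler.names)
```

Under that split, a plain string handed to parse() is never parsed as markup, so a file literally named "<foo>" would be looked up as a file rather than read as XML.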
>| Why is None being printed? It gave me the initial impression that my
>| error was not setting up the parse call correctly. I assumed that the None
>| was the cause of the exception and that under normal circumstances it
>| would have said something like "not well-formed at foo.xml:1:7".
>If you told it that you were parsing from foo.xml it should definitely
>return that information in the error message. Can you show us the
>exact call to parse?
I have a file foo in my current directory. I fire up Python:
> ls -l foo
-rw-rw-r-- 1 jeremy admin 0 Oct 5 16:57 foo
Python 2.0b2 (#18, Oct 5 2000, 09:53:11)
[GCC 2.95.2 19991024 (release)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> from xml.sax import parse, ContentHandler
>>> parse("foo", ContentHandler())
>| What is a system identifier and why should it be reported in an
>| exception when it is None?
>The system identifier is SGML-speak (and XML-speak) for the location
>of the document being parsed. I guess we could leave it out in the
>cases where it is None, if people prefer that. (I personally have no
>opinion on that.)
I personally prefer that.
>| I also think the format is odd. There are three different pieces of
>| information separated by colons. I am accustomed to the notation
>| filename:line number, but not another colon for the cursor position.
>| It would have been clearer, I think, if the message were more
>| verbose and explained what each field was.
>How about this:
> "Not well-formed in foo.xml at line %d, column %d."
>If you prefer that I'd be happy to change both that and the lost
>system identifier (if that is indeed the problem).
I would like this a lot better. It will be appreciated by novice
programmers and whiners like me.
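As a rough sketch of the proposed format, the helper below builds the more verbose message and leaves the system identifier out when it is None. It relies on the accessor methods of SAXParseException in current Python releases (getMessage, getSystemId, getLineNumber, getColumnNumber); format_parse_error is a hypothetical name, and this is not what the library itself prints.

```python
from xml.sax import SAXParseException, parseString
from xml.sax.handler import ContentHandler


def format_parse_error(exc):
    """Format a SAXParseException in the verbose style proposed above."""
    where = "line %s, column %s" % (exc.getLineNumber(), exc.getColumnNumber())
    system_id = exc.getSystemId()
    if system_id is None:
        # No file name or URL is known, so leave that part out entirely.
        return "%s at %s." % (exc.getMessage(), where)
    return "%s in %s at %s." % (exc.getMessage(), system_id, where)


try:
    # Mismatched tags, parsed from a string, so there is no system identifier.
    parseString(b"<a><b></a>", ContentHandler())
except SAXParseException as exc:
    print(format_parse_error(exc))
```

When the document comes from a named file, the same helper would include the file name between the message and the position.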