[Expat-bugs] [ expat-Bugs-683681 ] XML_GetCurrent* functions for doctype declaration/DTD events

SourceForge.net noreply at sourceforge.net
Mon Feb 10 08:09:15 EST 2003


Bugs item #683681, was opened at 2003-02-10 01:12
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=110127&aid=683681&group_id=10127

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Rolf Ade (pointsman)
Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: XML_GetCurrent* functions for doctype declaration/DTD events

Initial Comment:
With Expat 1.95.6, I find the return values of the
XML_GetCurrent* functions surprising, and at the very
least underdocumented, when they are called from a
doctype declaration or DTD event handler
(XML_StartDoctypeDeclHandler,
XML_EndDoctypeDeclHandler, XML_ElementDeclHandler,
etc.).

The reference.html file is rather sparse about the
XML_GetCurrent* functions. For example, the
documentation of XML_GetCurrentLineNumber() says only:
"Return the line number of the position." What exactly
is 'the position' if the function is called from an
event handler?

The comments in the expat.h file are more explicit.
In particular, they mention:

   They may be called from any callback called to report
   some parse event; in this case the location is the
   location of the first of the sequence of characters
   that generated the event.


Now consider for example the following simple xml data:

<!DOCTYPE test SYSTEM "file:///boo.baz"     [
   <!ELEMENT test EMPTY>
   <!ATTLIST test attr CDATA #IMPLIED>
]>
<test attr="value"/>

A simple demo program that calls all the
XML_GetCurrent* functions from the doctype declaration
start and end handlers, the element declaration
handler, the attlist declaration handler and the start
element handler gives the following output:

doctypeStart: line 1 column 44 index  44 count  1
elementDecl:  line 2 column 18 index  64 count  0
attlistDecl:  line 3 column 29 index 100 count  0
doctypeEnd:   line 4 column  1 index 111 count  1
elementStart: line 5 column  0 index 113 count 20

If called in an elementStart handler, the
XML_GetCurrent* functions return sensible values. Line
5 column 0 is the opening "<" of that tag, as the
comment in expat.h says, and the reported byte count
of 20 covers the complete tag. So far, so good.

If called in the doctype declaration start handler,
element declaration handler or attlist declaration
handler, the results get stranger. The position
reported by XML_GetCurrentLine/ColumnNumber is
somewhere inside the reported markup, and the results
of XML_GetCurrentByteCount look really weird. At
least the result of XML_GetCurrentByteIndex always
points to the same character as
XML_GetCurrentLine/ColumnNumber.
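
The claim that byte index and line/column agree can be
checked by hand against the sample data. The following
is only a sketch of such a check, not anything Expat
provides: position_of() and sample_doc are hypothetical
names, and the code assumes "\n" line endings and one
byte per character. With those assumptions, the indexes
44, 64, 100, 111 and 113 from the output above land on
"[", the "E" of EMPTY, the "#" of #IMPLIED, the ">"
closing the doctype declaration, and the "<" of the
root element tag.

```c
#include <assert.h>
#include <stddef.h>

/* The sample document from this report, one '\n' per line. */
static const char sample_doc[] =
    "<!DOCTYPE test SYSTEM \"file:///boo.baz\"     [\n"
    "   <!ELEMENT test EMPTY>\n"
    "   <!ATTLIST test attr CDATA #IMPLIED>\n"
    "]>\n"
    "<test attr=\"value\"/>\n";

/* Line is 1-based and column is 0-based, as in the quoted output. */
struct position {
    long line;
    long column;
};

/*
 * Hypothetical helper (not part of Expat): derive line and column
 * from a byte index by counting newlines before that index.
 * Assumes '\n' line endings and single-byte characters.
 */
static struct position position_of(const char *text, size_t index)
{
    struct position pos = { 1, 0 };
    size_t i;

    assert(text != NULL);
    for (i = 0; i < index && text[i] != '\0'; i++) {
        if (text[i] == '\n') {
            pos.line++;
            pos.column = 0;
        } else {
            pos.column++;
        }
    }
    return pos;
}
```

For example, position_of(sample_doc, 64) yields line 2,
column 18, which matches the elementDecl line of the
output.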

The current behavior seems to allow me to do what I
want (preserve the internal subset as found in the
original XML data by copying the parts of the input
stream indicated by XML_GetCurrent* function calls in
the doctype declaration start/end handlers). All in
all, though, this behavior can't really be considered
stable or 'the right one', and it certainly isn't
documented, so one can't bank on it.
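
To make the idea concrete, here is a rough sketch of
the copying step only. It assumes the byte indexes seen
in the output above, i.e. that the doctype start handler
reports the "[" and the doctype end handler reports the
closing ">"; that is observed behavior in 1.95.6, not
documented behavior. copy_internal_subset() and
doctype_sample are hypothetical names, not Expat API,
and the whole document is assumed to be in one buffer.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* The sample document from this report, one '\n' per line. */
static const char doctype_sample[] =
    "<!DOCTYPE test SYSTEM \"file:///boo.baz\"     [\n"
    "   <!ELEMENT test EMPTY>\n"
    "   <!ATTLIST test attr CDATA #IMPLIED>\n"
    "]>\n"
    "<test attr=\"value\"/>\n";

/*
 * Hypothetical helper (not part of Expat).
 * start_index: byte index recorded in the doctype start handler,
 *              assumed to point at '['.
 * end_index:   byte index recorded in the doctype end handler,
 *              assumed to point at the closing '>'.
 * Returns a malloc'ed, NUL-terminated copy of the text between
 * '[' and ']', or NULL if the offsets do not look as assumed.
 */
static char *copy_internal_subset(const char *doc, size_t doc_len,
                                  size_t start_index, size_t end_index)
{
    size_t close, len;
    char *out;

    assert(doc != NULL);
    if (end_index >= doc_len || start_index + 2 > end_index)
        return NULL;
    if (doc[start_index] != '[' || doc[end_index] != '>')
        return NULL;

    /* XML allows whitespace between ']' and '>'; skip it backwards. */
    close = end_index - 1;
    while (close > start_index && isspace((unsigned char)doc[close]))
        close--;
    if (doc[close] != ']')
        return NULL;

    len = close - start_index - 1;
    out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, doc + start_index + 1, len);
    out[len] = '\0';
    return out;
}
```

Called with the indexes 44 and 111 from the output
above, this returns the two declarations including the
surrounding newlines. The checks on "[" and ">" are
there precisely because the reported positions are not
guaranteed; if a later release reports something else,
the helper returns NULL instead of copying garbage.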

rolf


----------------------------------------------------------------------

>Comment By: Rolf Ade (pointsman)
Date: 2003-02-10 16:09

Message:
Logged In: YES 
user_id=13222


Ah, right, Karl. I missed those sentences at the start of the
chapter "Parse position and error reporting functions",
because I jumped from the TOC via the in-page link to
XML_GetCurrentLineNumber(). Sorry for that :-(.

For my concrete problem the current behaviour seems to be
'just right': what XML_GetCurrentLine/ColumnNumber return
in a doctype declaration start handler is exactly the
information I need. But since it isn't documented (or
rather, it is documented as 'bogus information'), this may
change at any point and is nothing to rely on.

This isn't really important to me at the moment, though, so
no action is needed (except perhaps a small addition to the
related comment in expat.h, to clarify the topic there as
well).

Thanks
rolf


----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2003-02-10 02:32

Message:
Logged In: YES 
user_id=290026

This is what reference.html has to say:
<quote>
The position reporting functions are accurate only outside
of the DTD. In other words, they usually return bogus 
information when called from within a DTD declaration handler.
</quote>

I don't know why nothing like that is mentioned in expat.h.

Strictly speaking this works as documented and is not a bug.
However, it would be nice if it worked as Rolf wanted it to.
I haven't had a chance yet to investigate to which degree 
it would be possible to fix that. Any ideas?

Leave open for Fred to comment (here and in expat.h <g>).

Karl



----------------------------------------------------------------------
