[XML-SIG] Proposed Expat API changes

Fred L. Drake, Jr. fdrake@acm.org
Thu, 8 Aug 2002 14:46:59 -0400




I've proposed some changes to Expat's C API on the expat-discuss list;
these changes would allow pull-based and mixed-mode parsers to be
built on top of Expat.

Unfortunately, the message hasn't appeared in the online archives;
this is the cost of using SF's mailing lists.  ;-(  I've attached the
proposal to this email, in case anyone is interested.  Followups
pertaining to Expat's C API should be directed to the expat-discuss
list:

        http://sourceforge.net/mail/?group_id=10127


  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Zope Corporation



Implementing a blocking mode in Expat
=====================================

Requests for a pull-based API for Expat have surfaced a few times over
(at least) the last couple of years; there is a feature request for
this on SourceForge (issue #544682):

http://sourceforge.net/tracker/index.php?func=detail&aid=544682&group_id=10127&atid=110127

An additional motivation is that we'd like to be able to share a
codebase with the Mozilla project, which is currently using a
substantially modified copy of an older version of Expat.

Pull-based parsers have become increasingly popular as the limitations
of DOM- or SAX-like APIs have become better known.  The pull-based
APIs provide an opportunity to build each part of an application in
the way that's most appropriate, allowing a mixture of DOM- and
SAX-like behaviors.

Expat could provide the basis for an efficient pull-based API if it
offered an opportunity to suspend parsing temporarily, allowing
parsing to resume when the application is ready for additional
information from the document.  A .NET-like API could easily be built
on top of such a feature.

Karl Waclawek and I have been having discussions about this, and think
we have a good idea of how to introduce such a feature into Expat.
There are questions and issues regarding the possible API that would
need to be exposed; I've summarized our ideas and analysis below in the
form of two alternate API proposals.

We welcome feedback and discussion, including the introduction of
additional API proposals, on the expat-discuss list.


Supporting Information
----------------------

Expat 1.95.6 / 1.96 will include a new enumeration, XML_Status,
specifying return values for the XML_Parse() and XML_ParseBuffer()
functions.  Our recommendation is that the result of XML_Parse() and
XML_ParseBuffer() be tested for these values specifically, even when
using older versions of Expat 1.95.x -- this will be completely
equivalent in practice.  This change allows us to extend the number of
possible return values in the future; the documented API in Expat 1.95
through 1.95.4 really only defines a boolean interpretation of these
return values, but only the two specific values, now named by
XML_Status enum names, were actually used.
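
As a minimal sketch of this recommendation, the following compares the
boolean interpretation with an explicit comparison.  The enumeration is
a stand-in rather than Expat's header: the Toy names and the numeric
values (error = 0, OK = 1) are assumptions made for illustration.

```c
/* Stand-in for the XML_Status enumeration; names and values are assumed. */
enum ToyStatus {
    TOY_STATUS_ERROR = 0,
    TOY_STATUS_OK = 1
};

/* Boolean interpretation, as documented for Expat 1.95 through 1.95.4. */
int toy_ok_boolean(int status)
{
    return status != 0;
}

/* Explicit comparison against the named value, as recommended above. */
int toy_ok_explicit(int status)
{
    return status == TOY_STATUS_OK;
}
```

For the two values in use today the tests agree.  They differ only for
a value outside that pair, which is what leaves room to add return
values later.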


API Option 1
------------

This alternative introduces two new functions and three new constants.
These are only needed if an application uses the new functionality.

XML_STATUS_SUSPENDED

    New value in the XML_Status enumeration.  This is only used if
    XML_SuspendParser() has been called.

XML_ERROR_NOT_SUSPENDED
XML_ERROR_SUSPENDED

    These new error codes would be used to indicate that a call to the
    parser was made when the parser was not in the expected internal
    state, and indicate programming errors in the application.

XML_Status
XML_SuspendParser(XML_Parser parser)

    Inform the parser that parsing should be suspended when the
    currently active callback returns.  It should only be called from
    a callback.  Returns XML_STATUS_OK or XML_STATUS_ERROR.  Multiple
    calls to XML_SuspendParser() during a callback are allowed, and
    are equivalent to a single call to XML_SuspendParser().  It is an
    error to call this function while a callback function is not
    active.

XML_Status
XML_ResumeParser(XML_Parser parser)

    Resume parsing using a suspended parser.  Returns XML_STATUS_OK,
    XML_STATUS_ERROR, or XML_STATUS_SUSPENDED.  If the parser has not
    been suspended, this returns XML_STATUS_ERROR, and
    XML_GetErrorCode() returns XML_ERROR_NOT_SUSPENDED.  The parser is
    not invalidated in this case, and parsing may be continued with
    additional input using XML_Parse() or XML_ParseBuffer().

The following functions change:

XML_Status
XML_Parse(XML_Parser parser, const char *s, int len, int isFinal)

XML_Status
XML_ParseBuffer(XML_Parser parser, int len, int isFinal)

    These two existing functions will change the meaning of their
    return value slightly.  If parsing is suspended using
    XML_SuspendParser(), they will return XML_STATUS_SUSPENDED,
    otherwise the current values of XML_STATUS_OK and XML_STATUS_ERROR
    may be returned.

    If XML_STATUS_SUSPENDED is returned, the parse of the input
    document can only be resumed using XML_ResumeParser().  If
    XML_Parse() or XML_ParseBuffer() is called on a suspended parser,
    XML_STATUS_ERROR will be returned, and XML_GetErrorCode() will
    return XML_ERROR_SUSPENDED.  The parser is not invalidated in this
    case, and parsing may still be resumed.

void *
XML_GetBuffer(XML_Parser parser, int len)

    If the parser has been suspended, returns NULL and
    XML_GetErrorCode() returns XML_ERROR_SUSPENDED.  Parsing the input
    which has already been passed into Expat should be continued using
    XML_ResumeParser().  No changes if the parser was not suspended.
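
The state rules above can be sketched with a toy model.  This is not
Expat code: every Toy name is hypothetical, each input character stands
in for one parse event, and the numeric enum values are assumed.  The
model omits isFinal, XML_ParseBuffer() and XML_GetBuffer().  The
proposal does not name an error code for calling XML_SuspendParser()
outside a callback, so the model leaves the error code untouched in
that case.

```c
#include <stddef.h>

/* Toy stand-ins for the proposed constants; names and values are assumed. */
enum ToyStatus { TOY_STATUS_ERROR = 0, TOY_STATUS_OK = 1, TOY_STATUS_SUSPENDED = 2 };
enum ToyError { TOY_ERROR_NONE = 0, TOY_ERROR_SUSPENDED, TOY_ERROR_NOT_SUSPENDED };

typedef struct ToyParser ToyParser;
typedef void (*ToyHandler)(ToyParser *parser, char token);

struct ToyParser {
    const char *pending;    /* input accepted but not yet processed */
    size_t remaining;       /* number of unprocessed characters */
    int in_callback;        /* nonzero while the handler is running */
    int suspend_requested;  /* set by Toy_SuspendParser() */
    int suspended;          /* nonzero between suspension and resume */
    enum ToyError error;    /* last error code */
    ToyHandler handler;     /* called once per input character */
};

/* Models XML_SuspendParser(): only valid while a callback is active. */
enum ToyStatus Toy_SuspendParser(ToyParser *parser)
{
    if (!parser->in_callback)
        return TOY_STATUS_ERROR;
    parser->suspend_requested = 1;  /* repeated calls are harmless */
    return TOY_STATUS_OK;
}

/* Process pending input until it runs out or a callback asks to suspend. */
static enum ToyStatus toy_run(ToyParser *parser)
{
    while (parser->remaining > 0) {
        char token = *parser->pending++;
        parser->remaining--;
        parser->in_callback = 1;
        parser->handler(parser, token);
        parser->in_callback = 0;
        if (parser->suspend_requested) {
            parser->suspend_requested = 0;
            parser->suspended = 1;
            return TOY_STATUS_SUSPENDED;
        }
    }
    return TOY_STATUS_OK;
}

/* Models XML_Parse(): refuses new input while suspended. */
enum ToyStatus Toy_Parse(ToyParser *parser, const char *s, size_t len)
{
    if (parser->suspended) {
        parser->error = TOY_ERROR_SUSPENDED;
        return TOY_STATUS_ERROR;
    }
    parser->pending = s;
    parser->remaining = len;
    return toy_run(parser);
}

/* Models XML_ResumeParser(): continues with input already passed in. */
enum ToyStatus Toy_ResumeParser(ToyParser *parser)
{
    if (!parser->suspended) {
        parser->error = TOY_ERROR_NOT_SUSPENDED;
        return TOY_STATUS_ERROR;
    }
    parser->suspended = 0;
    return toy_run(parser);
}

/* Example handler: suspend after every '<' character. */
void toy_suspend_on_lt(ToyParser *parser, char token)
{
    if (token == '<')
        Toy_SuspendParser(parser);
}

/* Main loop for one buffer: returns the number of suspensions, -1 on error. */
int toy_drive(ToyParser *parser, const char *s, size_t len)
{
    int suspensions = 0;
    enum ToyStatus status = Toy_Parse(parser, s, len);

    while (status == TOY_STATUS_SUSPENDED) {
        suspensions++;  /* the application would consume its event here */
        status = Toy_ResumeParser(parser);
    }
    return status == TOY_STATUS_OK ? suspensions : -1;
}
```

A pull-style wrapper would return to its caller at the point where
toy_drive() increments its counter, and call the resume function the
next time the caller asks for an event.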


Potential Issues
----------------

The risk inherent in this API variant is that it does change the
interpretation of the return code for XML_Parse() and
XML_ParseBuffer().  This is only significant if any callback ever
calls XML_SuspendParser().  In the case of suspension,
XML_STATUS_SUSPENDED would be returned, but an existing main loop will
recognize this as a successful parse.  This would be a programming
error in the revised API, but not the old API.  If the buffer being
parsed was not the last buffer, a reasonable error would be returned
when the main loop calls XML_Parse() or XML_ParseBuffer() again, but
if the last input buffer was already passed (isFinal is true), there
would be no opportunity to report the error, possibly making it
difficult to diagnose application errors introduced by this change.
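
The two cases can be illustrated with a small stub.  Nothing here is
Expat code: the Toy names, the numeric values, and the "suspend during
chunk N" behaviour are assumptions chosen to reproduce the scenario.
The last chunk plays the role of the buffer passed with isFinal true.

```c
/* Toy stand-ins; names and numeric values are assumed. */
enum ToyStatus { TOY_STATUS_ERROR = 0, TOY_STATUS_OK = 1, TOY_STATUS_SUSPENDED = 2 };

typedef struct {
    int suspend_at;  /* index of the chunk during which a callback suspends */
    int suspended;   /* nonzero once the parser has been suspended */
} ToyStub;

/* Models XML_Parse() under Option 1 for the chunk with the given index. */
enum ToyStatus toy_parse_chunk(ToyStub *parser, int index)
{
    if (parser->suspended)
        return TOY_STATUS_ERROR;  /* the XML_ERROR_SUSPENDED case */
    if (index == parser->suspend_at) {
        parser->suspended = 1;
        return TOY_STATUS_SUSPENDED;
    }
    return TOY_STATUS_OK;
}

/* Pre-suspension main loop: returns 1 if it believes the parse succeeded. */
int toy_legacy_loop(ToyStub *parser, int nchunks)
{
    int i;

    for (i = 0; i < nchunks; i++) {
        /* Boolean test: a nonzero "suspended" status looks like success. */
        if (!toy_parse_chunk(parser, i))
            return 0;
    }
    return 1;
}
```

With three chunks, a suspension during chunk 0 surfaces as an error on
chunk 1.  A suspension during chunk 2 leaves the loop reporting success
while the stub is still suspended.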

We don't know how important this change is in practice for Expat
1.95.x users; we would appreciate feedback on the expat-discuss list.


API Option 2
------------

This version of the API changes provides increased backward
compatibility, at the cost of a cruftier API to Expat.

This alternate version of the API also adds the XML_SuspendParser()
and XML_ResumeParser() functions, and the new XML_ERROR_* constants,
but not the new XML_Status value.  This variant would describe
suspension as a pseudo-error from the XML_Parse() and
XML_ParseBuffer() functions.  An existing application whose main loop
had not been prepared for the suspension feature would then report an
"error" if some callback function called XML_SuspendParser().  This
would only be expected to occur during development, but applications
that only suspend parsing occasionally may find that poorly tested
code paths expose problems late in the development cycle or even after
the application has entered production.

The alternate version uses this description for XML_Parse() and
XML_ParseBuffer():

XML_Status
XML_Parse(XML_Parser parser, const char *s, int len, int isFinal)

XML_Status
XML_ParseBuffer(XML_Parser parser, int len, int isFinal)

    If XML_STATUS_ERROR is returned, a main loop which supports the
    suspension feature needs to check whether XML_GetErrorCode(parser)
    == XML_ERROR_SUSPENDED.  If so, the parse was suspended and the
    call to continue the parse needs to be XML_ResumeParser().
    Otherwise, the error is "real".
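
A main loop written for this variant might look like the sketch below.
The stub and all Toy names are hypothetical; a single toy_step()
function stands in for both XML_Parse() and XML_ResumeParser(), and
TOY_ERROR_SYNTAX is an invented placeholder for any "real" error.

```c
/* Toy stand-ins; under Option 2 there is no "suspended" status value. */
enum ToyStatus { TOY_STATUS_ERROR = 0, TOY_STATUS_OK = 1 };
enum ToyError { TOY_ERROR_NONE = 0, TOY_ERROR_SYNTAX, TOY_ERROR_SUSPENDED };

typedef struct {
    int suspensions_left;  /* how many times a callback will still suspend */
    int syntax_error;      /* nonzero if the input ends in a real error */
    enum ToyError error;   /* models the result of XML_GetErrorCode() */
} ToyStub;

/* Stands in for XML_Parse() and XML_ResumeParser() under Option 2. */
enum ToyStatus toy_step(ToyStub *parser)
{
    if (parser->suspensions_left > 0) {
        parser->suspensions_left--;
        parser->error = TOY_ERROR_SUSPENDED;  /* the pseudo-error */
        return TOY_STATUS_ERROR;
    }
    if (parser->syntax_error) {
        parser->error = TOY_ERROR_SYNTAX;
        return TOY_STATUS_ERROR;
    }
    parser->error = TOY_ERROR_NONE;
    return TOY_STATUS_OK;
}

/* Parse one buffer: returns suspensions handled, or -1 on a real error. */
int toy_option2_parse(ToyStub *parser)
{
    int handled = 0;
    enum ToyStatus status = toy_step(parser);  /* role of XML_Parse() */

    /* An "error" whose code is SUSPENDED is really a suspension. */
    while (status == TOY_STATUS_ERROR && parser->error == TOY_ERROR_SUSPENDED) {
        handled++;
        status = toy_step(parser);  /* role of XML_ResumeParser() */
    }
    return status == TOY_STATUS_OK ? handled : -1;
}
```

Every error path has to consult the error code before deciding whether
anything actually went wrong.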

This approach conflates error codes with the state of the parse, and
labels the normal operation of the parser as an error.
