[Expat-discuss] Discussion: InternalEntityRefHandler

rolf@pointsman.de rolf@pointsman.de
Wed Jun 12 19:45:03 2002


(I'm a bit shy to answer, because I mostly have some minor objections
but no clear counterproposal. I'm following up in the hope that some
of my doubts and questions may help to keep the discussion rolling.)

On 11 Jun, Karl Waclawek wrote:
> We plan to add an InternalEntityRefHandler for release 1.95.4.
> Among other reasons, such a handler would make Expat more fully
> SAX compliant.

I think this would be a valuable addition to the expat API. For
example, building an XML editor application that aims to preserve the
document's internal entity references through editing is really hard
with expat as the parser. It may not be impossible, using
XML_EntityDeclHandler() and listening on the default handler, plus a
lot of fuss, but it's much too hard. An InternalEntityRefHandler could
be a lot of help here.

> Here is what we have so far:
> 
> An internal entity reference handler should probably
> do two things:
> 
> 1) pass name, entity value and entity type (PE or GE)

I guess by entity value you mean the "replacement text" (in the sense
of the first paragraph of XML rec 4.5), since even the
XML_EntityDeclHandler() gives that, right? Given an
XML_EntityDeclHandler() that stores the entity name/value pairs in a
lookup table, and given the name passed to the new handler, it isn't
strictly necessary to pass the value as well, but it's fine with me
if expat does this work as a service.
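
To make the lookup-table idea concrete, here is a rough sketch in C.
It is only an illustration: the struct and function names are made up,
and on_entity_decl merely copies the parameter list of expat's
XML_EntityDeclHandler, using plain char where expat uses XML_Char. It
assumes that a NULL value means "not an internal entity" and, following
the XML rec, lets the first declaration of a name win.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One remembered internal entity (hypothetical application type). */
struct entity {
    char *name;
    char *value;                 /* replacement text, NUL-terminated copy */
    int is_parameter_entity;
    struct entity *next;
};

/* Hypothetical application state; would be handed to expat as userData. */
struct entity_table {
    struct entity *head;
};

/* Copy len bytes of s into a fresh NUL-terminated string. */
char *copy_n(const char *s, size_t len)
{
    char *p = malloc(len + 1);
    if (p != NULL) {
        memcpy(p, s, len);
        p[len] = '\0';
    }
    return p;
}

/* Return the stored replacement text, or NULL if name is unknown. */
const char *entity_lookup(const struct entity_table *table,
                          const char *name, int is_parameter_entity)
{
    const struct entity *e;
    for (e = table->head; e != NULL; e = e->next)
        if (e->is_parameter_entity == is_parameter_entity
            && strcmp(e->name, name) == 0)
            return e->value;
    return NULL;
}

/* Same parameter list as expat's XML_EntityDeclHandler. The value is
   not NUL-terminated, which is why value_length is passed along. */
void on_entity_decl(void *user_data, const char *entity_name,
                    int is_parameter_entity, const char *value,
                    int value_length, const char *base,
                    const char *system_id, const char *public_id,
                    const char *notation_name)
{
    struct entity_table *table = user_data;
    struct entity *e;

    (void)base; (void)system_id; (void)public_id; (void)notation_name;
    assert(table != NULL && entity_name != NULL);
    if (value == NULL)
        return;                  /* external or unparsed entity */
    if (entity_lookup(table, entity_name, is_parameter_entity) != NULL)
        return;                  /* first declaration is binding */
    e = malloc(sizeof *e);
    if (e == NULL)
        return;
    e->name = copy_n(entity_name, strlen(entity_name));
    e->value = copy_n(value, (size_t)value_length);
    if (e->name == NULL || e->value == NULL) {
        free(e->name);
        free(e->value);
        free(e);
        return;
    }
    e->is_parameter_entity = is_parameter_entity;
    e->next = table->head;
    table->head = e;
}
```

With such a table filled in, a handler that is only told the entity
name could still find the replacement text via entity_lookup(), which
is why passing the value is a convenience rather than a necessity.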

> 2) allow the application to return a value that
>    indicates if the entity should be expanded or not

I have mixed feelings about that. What exactly should this return
value control? For a GE, if the return value is true, is the
replacement text then reported again through the characterDataHandler?
That may make sense, but what is the point of returning false for a
PE? I can't imagine a sensible reason; can anybody else?

> The second one is necessary, since - contrary
> to the external entity ref handler - the entity value
> is already determined, whereas in the external
> entity ref handler it is the handler that "creates"
> that data, so by doing nothing it achieves the
> effect of ignoring the entity automatically.

Ignoring an internal entity completely is probably not the best of
ideas. The question is: are there reasons (even if only convenience)
to report the entity value twice (first through the
InternalEntityRefHandler and, if that has returned true, through the
characterDataHandler)?

I think an InternalEntityRefHandler could simply return void. Just see
the InternalEntityRefHandler as a variant of the
characterDataHandler. The document data reported through the
InternalEntityRefHandler is special only in that it has two different
forms: the 'reference form' (the entity name) and the 'expanded form'
(the replacement text). Both forms are reported through the planned
InternalEntityRefHandler, so I don't see the point of reporting the
replacement text again (depending on the return value of the
InternalEntityRefHandler).
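
A sketch of what I mean by the void variant, in C. Everything here is
hypothetical: expat has no such handler yet, so the typedef, the
parameter order and the out_buf helper are just assumptions made for
illustration. The point is that one call carries both forms, and the
application picks the one it wants.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical signature, not part of expat: one call per internal
   entity reference, carrying the reference form (entity_name) and the
   expanded form (value, value_length bytes, not NUL-terminated). */
typedef void (*internal_entity_ref_handler)(void *user_data,
                                            const char *entity_name,
                                            int is_parameter_entity,
                                            const char *value,
                                            int value_length);

/* Hypothetical fixed-size output buffer of an application. */
struct out_buf {
    char text[256];
    size_t used;
};

/* Append len bytes, truncating instead of overflowing the buffer. */
void buf_append(struct out_buf *b, const char *s, size_t len)
{
    size_t room;

    assert(b != NULL && b->used < sizeof b->text);
    room = sizeof b->text - 1 - b->used;
    if (len > room)
        len = room;
    if (len > 0)
        memcpy(b->text + b->used, s, len);
    b->used += len;
    b->text[b->used] = '\0';
}

/* Editor-style handler: keep the reference, ignore the expansion. */
void keep_reference(void *user_data, const char *entity_name,
                    int is_parameter_entity, const char *value,
                    int value_length)
{
    struct out_buf *b = user_data;

    (void)value; (void)value_length;
    buf_append(b, is_parameter_entity ? "%" : "&", 1);
    buf_append(b, entity_name, strlen(entity_name));
    buf_append(b, ";", 1);
}

/* Reader-style handler: use the replacement text, ignore the name. */
void use_expansion(void *user_data, const char *entity_name,
                   int is_parameter_entity, const char *value,
                   int value_length)
{
    struct out_buf *b = user_data;

    (void)entity_name; (void)is_parameter_entity;
    if (value != NULL)
        buf_append(b, value, (size_t)value_length);
}
```

Neither handler needs a return value: the editor writes the reference
back out, the reader appends the replacement text itself, and in both
cases nothing has to be reported a second time through the
characterDataHandler.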

> Here are a few problems that need to be resolved:
> 
> 1) There is possible interference with the SetDefaultHandler
> and SetDefaultHandlerExpand functions. The former will
> turn off expansion of internal general entities.
> However, this could now also be done with the InternalEntityRefHandler.
> So, what happens when expansion is turned off?

Without an InternalEntityRefHandler set, leave it as it is (it's not
perfect, I agree). That guarantees backward compatibility (I'm
personally not very keen on that in this area, but some may be).

If the proposed InternalEntityRefHandler is set, it depends on the
answer to the question of what InternalEntityRefHandler should return,
if anything. With a void InternalEntityRefHandler, I would say: report
neither the entity name nor the replacement text through the default
handler, regardless of whether SetDefaultHandler or
SetDefaultHandlerExpand is set. There is a handler for this data, so
why should it also be reported through the default handler, which (in
theory; some DTD data is a counterexample) reports only data for which
no handler is registered? A bit clumsy, yes, but it doesn't feel any
better if InternalEntityRefHandler returns a value. The reason is:

> Personally, I don't like to have such a feature as a side-effect
> of setting a handler.

I second this. 
 
> My vote would go for having only
> one function - SetDefaultHandler - *and* removing the influence
> it has on internal entity expansion completely.
> The InternalEntityRefHandler could deal with that on a per-call basis.

Yes.

> 2) How does that work with the skippedEntityHandler if
> the InternalEntityRefHandler returns with a value indicating
> that the entity reference should be ignored?

Let me first ask what the InternalEntityRefHandler should report for
an entity that also triggers the skippedEntityHandler. The entity
value could only be NULL in this case, yes? There are several options,
and I'm not completely sure which is best. I would say: just don't
call the InternalEntityRefHandler at all for undeclared entities
(where that isn't an error) if a skippedEntityHandler is set. Call it
with entity value NULL if no skippedEntityHandler is set. (Well, this
way an InternalEntityRefHandler could supersede the
skippedEntityHandler...)
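
Spelled out as a small decision function in C, the rule I have in mind
looks roughly like this. It is only a sketch: the enum, the struct and
choose_ref_action are invented names, not expat internals. It also
assumes that an undeclared entity still goes to the
skippedEntityHandler when one is set, and that nothing changes when no
InternalEntityRefHandler is registered.

```c
#include <assert.h>
#include <stddef.h>

/* What the parser would do for one internal entity reference
   (hypothetical names, for illustration only). */
enum ref_action {
    ACTION_EXISTING_BEHAVIOUR,      /* no InternalEntityRefHandler set */
    ACTION_SKIPPED_HANDLER,         /* undeclared, skipped handler set */
    ACTION_REF_HANDLER_NULL_VALUE,  /* undeclared, no skipped handler  */
    ACTION_REF_HANDLER_WITH_VALUE   /* declared internal entity        */
};

/* Which of the two handlers the application has registered. */
struct handlers {
    int has_internal_entity_ref_handler;
    int has_skipped_entity_handler;
};

/* Decide how a reference is reported. Only meant for references where
   an undeclared entity is not a well-formedness error. */
enum ref_action choose_ref_action(const struct handlers *h,
                                  int entity_is_declared)
{
    assert(h != NULL);
    if (!h->has_internal_entity_ref_handler)
        return ACTION_EXISTING_BEHAVIOUR;
    if (entity_is_declared)
        return ACTION_REF_HANDLER_WITH_VALUE;
    if (h->has_skipped_entity_handler)
        return ACTION_SKIPPED_HANDLER;
    return ACTION_REF_HANDLER_NULL_VALUE;
}
```

The last branch is the one where the InternalEntityRefHandler takes
over the job of the skippedEntityHandler, so an application would have
to check for a NULL value there.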

rolf