[Expat-discuss] Discussion: InternalEntityRefHandler

rolf@pointsman.de rolf@pointsman.de
Thu Jun 13 10:16:07 2002


>> > 2) allow the application to return a value that
>> >    indicates if the entity should be expanded or not
>> 
>> I have mixed feelings about that. What exactly should this return value
>> control? For a GE, if the return value is true, is the replacement text
>> reported again through the characterDataHandler? There may be sense
>> in this, but what is the sense in returning false for a PE? I can't
>> imagine a sensible reason. Can anybody else?
> 
> Good points, but it isn't the character handler, it's potentially all
> content handlers. For GEs there have been requests to suppress expansion
> of certain types of entities, e.g. predefined ones. For PEs - I don't know.
> However, I am also missing a good argument for *not* wanting
> to ever ignore/suppress them.

I don't see how any handler besides the characterDataHandler could
handle unexpanded entities in a sensible way.

Maybe a few examples could make things a bit clearer. Consider
a document like this:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE foo SYSTEM "1.ent">
<foo/>


with 1.ent:

<!-- 1 -->
<!ENTITY % draft 'INCLUDE' >
<![%draft;[
<!ELEMENT book (comments*, title, body, supplements?)>
]]>

<!-- 2 -->
<!ENTITY % someElement  "<!ELEMENT element ANY>">
%someElement;

<!-- 3 -->
<!ENTITY % fooContent "EMPTY" >
<!ELEMENT foo %fooContent;>

<!-- 4 -->
<!ENTITY % someBarChilds "boo,baz">
<!ELEMENT bar (foe,%someBarChilds;,bom)>

<!-- 5 -->
<!ENTITY % pub    "&#xc9;ditions Gallimard" >
<!ENTITY   rights "All rights reserved" >
<!ENTITY   book   "La Peste: Albert Camus, &#xA9; 1947 %pub; &rights;" >


What should happen in example 1 if the InternalEntityRefHandler
returns 0? Skip the whole conditional section? Completely skipping
the PE, as well as leaving the PE in unexpanded form, results in XML
that is not well-formed in this case.

Example 2 is probably the least critical. If InternalEntityRefHandler
returns 0, does the unexpanded PE go through the defaultHandler?

In examples 3 and 4 I guess the InternalEntityRefHandler must be called
before the elementDeclHandler. If the InternalEntityRefHandler returns
0, what should the elementDeclHandler report: how does the
%fooContent; fit into an XML_Content? Same question for the bar
content.

Even example 5 doesn't look like a sensible case. Sure,
InternalEntityRefHandler should be called before the
EntityDeclHandler. If it returns 0, would the replacement text reported
through the EntityDeclHandler be "La Peste: Albert Camus, © 1947
%pub; &rights;"? But the XML rec says clearly in 4.5: "The replacement
text is the content of the entity, after replacement of character
references and parameter-entity references."
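
To make the rule from 4.5 concrete for example 5, here is a minimal
sketch in plain Python. It does not use Expat at all: the function
replacement_text and its regular expression are hypothetical helpers
written only for this illustration, and they handle just the simple
cases that occur in the example (one pass, no recursion or error
checking).

```python
import re

# Matches hex character references, decimal character references,
# and parameter-entity references. General-entity references such as
# "&rights;" are deliberately NOT matched, so they stay untouched.
_REF = re.compile(r"&#x(?P<hex>[0-9A-Fa-f]+);|&#(?P<dec>[0-9]+);|%(?P<pe>[^;\s]+);")

def replacement_text(literal, param_entities):
    """Simplified sketch of XML 1.0 section 4.5 (hypothetical helper)."""
    def substitute(match):
        if match.group("hex"):
            return chr(int(match.group("hex"), 16))
        if match.group("dec"):
            return chr(int(match.group("dec")))
        # Parameter-entity reference: insert its replacement text.
        return param_entities[match.group("pe")]
    return _REF.sub(substitute, literal)

# The declarations from example 5, processed in document order.
pub = replacement_text("&#xc9;ditions Gallimard", {})
book = replacement_text(
    "La Peste: Albert Camus, &#xA9; 1947 %pub; &rights;", {"pub": pub}
)
```

Under these assumptions, book keeps "&rights;" as a reference while
%pub; and the character references are already replaced, which is
what 4.5 requires of the replacement text.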

Still, this optional non-expansion of PEs feels to me like a can of
worms. But OK, do it; I have no problem letting the
InternalEntityRefHandler always return 1 for PEs. There may be
valuable use cases for this that I just don't see at the moment.

> If we return "False" the character handler simply returns the unexpanded
> reference, so that makes sense, but if we return "True", then we report
> it twice. However, the parser will not just report it through the
> character handler, it will parse it like any other XML input - generating
> various callbacks.
> So, the parser is still doing something we can't do (easily).
> So, we really are not reporting it twice the same way.

Yes. You're right.

>> > 1) There is possible interference with the SetDefaultHandler
>> > and SetDefaultHandlerExpand functions. The former will
>> > turn off expansion of internal general entities.
>> > However, this could now also be done with the InternalEntityRefHandler.
>> > So, what happens when expansion is turned off?
> [..]
> Even with the InternalEntityRefHandler we might leave both in,

Yes. And without InternalEntityRefHandler set, let them behave as
now. Easiest way for backward compatibility.
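
For reference, the current difference between the two functions can
be observed with a small sketch. It uses Python's standard
xml.parsers.expat binding rather than the C API, on the assumption
that its DefaultHandler and DefaultHandlerExpand attributes map onto
XML_SetDefaultHandler and XML_SetDefaultHandlerExpand. The helper run
and the sample document are made up for the illustration.

```python
from xml.parsers import expat

# One internal general entity, referenced once in content.
DOC = '<!DOCTYPE d [<!ENTITY e "text">]><d>a &e; b</d>'

def run(expand):
    """Parse DOC; return (character data, data seen by the default handler)."""
    chars, default = [], []
    parser = expat.ParserCreate()
    parser.CharacterDataHandler = chars.append
    if expand:
        # Default handler that leaves expansion of internal entities on.
        parser.DefaultHandlerExpand = default.append
    else:
        # Default handler that turns expansion of internal entities off;
        # the reference itself is passed to the default handler.
        parser.DefaultHandler = default.append
    parser.Parse(DOC, True)
    return "".join(chars), "".join(default)

chars_off, default_off = run(expand=False)
chars_on, default_on = run(expand=True)
```

With expansion off, the reference "&e;" shows up in the default
handler data and the replacement text never reaches the character
data handler; with expansion on, it is the other way round.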

> but declare one of them as deprecated (and possibly make them
> behave the same - but that might break some apps).

Yes, get rid of one of them in the long run. But make them behave
the same way only if InternalEntityRefHandler is set.


>> Call InternalEntityRefHandler
>> with entity value NULL, if skippedEntityHandler isn't
>> set. (Well, this way an InternalEntityRefHandler could supersede the
>> skippedEntityHandler..).
> 
> That's an interesting idea - only have an InternalEntityRefHandler,
> and set entityValue = NULL if skipped (when not an error, of course).

Suddenly, I feel happy that the skippedEntityHandler hasn't made
it into 1.95.3. But I'm not perfectly sure. Let's hear other opinions.
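
To have something concrete to argue about, here is a toy model of the
contract under discussion, in plain Python. Nothing in it is Expat
API: expand_references, the handler signature (name, value) and the
use of None in place of NULL are all assumptions made for this
sketch. A real parser would also parse the replacement text as XML
instead of pasting it in as a string.

```python
import re

# Simplified general-entity reference syntax, good enough for the sketch.
_GE_REF = re.compile(r"&([A-Za-z_][\w.-]*);")

def expand_references(text, entities, handler):
    """Toy model: the handler decides per reference whether to expand.

    handler(name, value) is called once per reference. value is None
    when the entity is unknown (the "skipped" case). A true return
    value requests expansion; unknown entities are never expanded.
    """
    def substitute(match):
        name = match.group(1)
        value = entities.get(name)
        wants_expansion = handler(name, value)
        if value is not None and wants_expansion:
            return value
        return match.group(0)  # leave the reference as written
    return _GE_REF.sub(substitute, text)

seen = []

def handler(name, value):
    # Record every call, then ask for expansion of all known entities.
    seen.append((name, value))
    return value is not None

result = expand_references(
    "&rights; &unknown;", {"rights": "All rights reserved"}, handler
)
```

In this model a single handler covers both the "expand or not"
decision and the skipped-entity notification, which is the
superseding effect mentioned above.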

> Anyway, you have raised doubts in my mind if it makes sense to
> report the entity value at all. What does it mean examining it?
> Isn't that what the parser does?

Depends. For sure, the parser will not be able to examine the
semantics of the replacement text; this can only be done by the
application. But the application could always manage its own entity
value lookup table (using entityDeclHandler etc.), if needed. There
may be a minimal speed penalty if the InternalEntityRefHandler
returns 0 (do not expand), because the entity value lookup is
'unnecessary' in this case, but this may only be measurable in extreme
cases. On the other hand, it would be a nice service to get the entity
value without managing a lookup table of one's own. I could live with
both.
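
The application-managed lookup table mentioned above could look
roughly like this. The sketch uses the EntityDeclHandler of Python's
standard xml.parsers.expat binding instead of the C callback;
collect_entities and the sample document are invented for the
illustration.

```python
from xml.parsers import expat

def collect_entities(document):
    """Return {name: value} for the internal general entities declared."""
    table = {}

    def on_entity_decl(name, is_parameter, value, base,
                       system_id, public_id, notation):
        # value is None for external and unparsed entities; skip those
        # and skip parameter entities. setdefault keeps the first
        # declaration, which is the binding one in XML.
        if not is_parameter and value is not None:
            table.setdefault(name, value)

    parser = expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(document, True)
    return table

table = collect_entities(
    '<!DOCTYPE d [<!ENTITY rights "All rights reserved">]><d>&rights;</d>'
)
```

The table is filled while the DTD is parsed, so a later handler that
only receives an entity name can still look up the value itself.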

rolf