[Expat-bugs] [ expat-Bugs-1109116 ] Optimize implementation of XML_ParserReset

SourceForge.net noreply at sourceforge.net
Thu Jan 27 05:06:19 CET 2005


Bugs item #1109116, was opened at 2005-01-25 09:46
Message generated for change (Comment added) made by fdrake
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=110127&aid=1109116&group_id=10127

Category: None
Group: Feature Request
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Karl Waclawek (kwaclaw)
Summary: Optimize implementation of XML_ParserReset

Initial Comment:
The current implementation of XML_ParserReset sets the
user data pointer and the handlers to NULL. So, the
user has to set these again. At first glance, there
is no need to do things this way. Instead,
XML_ParserReset could reset the internal parser data
only and leave user data and handlers untouched. So
after this function has been called, the parser would
be ready to start parsing a new document (just like
stated in the documentation of this function). To
completely reset the parser, the user would still be
able to call XML_ParserFree and XML_ParserCreate.

Stefan Letz.
stefan.letz <at> de.ibm.com

----------------------------------------------------------------------

>Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2005-01-26 23:06

Message:
Logged In: YES 
user_id=3066

Either adding it now or after 2.0 works for me, but after 
probably makes more sense given that we're trying to wrap 
up a stable 2.0. 
 
My own suspicion is that the cost of re-initializing a 
parser should be thought of as two separate parts: 
re-initializing the parse state within Expat, and 
re-initializing the user-provided information (callbacks 
and user-data).  The parse state involves a whole pile of 
memory structures that the user has no direct access to 
(or control over the cost of re-initialization), and the 
user-provided information is most a collection of pointers 
for which the initialization cost is dominated by the 
function call overhead. 
 
This leads me to think that there's no substantial 
performance benefit in this for most applications, but 
there may be substantial benefit for apps running on 
embedded processors. 
 
It would be interesting to know if Sebastian's application 
is running in an embedded environment, and if not, what 
the motivation for the original question might be.  If the 
motivation is not efficiency but convenience, then it can 
certainly wait for Expat 3. 
 
FTR, the original post that generated this tracker issue 
can be found in the expat-discuss mailing list archives: 
 
http://mail.libexpat.org/pipermail/expat-discuss/2005-January/001726.html 
 

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2005-01-26 22:08

Message:
Logged In: YES 
user_id=290026

Your last suggestion seems reasonable.

What about this - let's wait until after Expat 2.0, as the
subsequent releases won't be backwards compatible anyway.
Then we can simply add the "aspects" parameter to the
existing XML_ParserReset() function without having to worry
about compatibility.

----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2005-01-26 01:52

Message:
Logged In: YES 
user_id=3066

The new function could be XML_ParserResetAspects(parser,
aspects), where aspects is a mask of bits defined in the
enum.  The initial set of aspects could include

XML_ASPECT_USERDATA
XML_ASPECT_HANDLERS
XML_ASPECT_DOCUMENT_CONTEXT

I think that covers the aspects that are currently dealt
with, but I haven't reviewed the current code or the patch.
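A minimal sketch of how an aspects mask like this might be consumed. Everything here is hypothetical: XML_ParserResetAspects() was only proposed in this thread, so the example uses a toy parser struct (toy_parser, toy_parser_reset_aspects) standing in for Expat's private state. The enum names are the ones proposed above; their bit values are an assumption, chosen so they can be OR-ed together.

```c
#include <stddef.h>

/* Hypothetical aspect bits using the names proposed in this thread.
   These are NOT part of Expat; the bit values are an assumption. */
enum xml_reset_aspect {
    XML_ASPECT_USERDATA         = 1 << 0,
    XML_ASPECT_HANDLERS         = 1 << 1,
    XML_ASPECT_DOCUMENT_CONTEXT = 1 << 2
};

/* Hypothetical toy parser standing in for Expat's private state. */
typedef void (*toy_start_handler)(void *user_data, const char *name);

struct toy_parser {
    void *user_data;            /* what XML_SetUserData would set */
    toy_start_handler on_start; /* one representative handler */
    int depth;                  /* document context: element nesting */
    long bytes_seen;            /* document context: input consumed */
};

/* Reset only the aspects whose bits are set in the mask. */
static void toy_parser_reset_aspects(struct toy_parser *p, unsigned aspects) {
    if (aspects & XML_ASPECT_USERDATA)
        p->user_data = NULL;
    if (aspects & XML_ASPECT_HANDLERS)
        p->on_start = NULL;
    if (aspects & XML_ASPECT_DOCUMENT_CONTEXT) {
        p->depth = 0;
        p->bytes_seen = 0;
    }
}
```

Under this sketch, a full reset like today's XML_ParserReset() would pass all three bits, while the document-only reset requested in this item would pass just XML_ASPECT_DOCUMENT_CONTEXT.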


----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2005-01-25 13:02

Message:
Logged In: YES 
user_id=290026

Maybe it's the new member that could gain an extra argument.
This could be an enum, so that we could add more choices
later (if ever needed).

XML_ParserResetDocument() isn't that clear to me either.
What about XML_ParserResetCustom()? Or some name that
shows that there are choices?

----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2005-01-25 12:21

Message:
Logged In: YES 
user_id=3066

Perhaps XML_ParserResetDocument()?  "Partial" isn't clear 
as to the intent. 
 
I understand the concern about too many API members.  I 
think a good rule of thumb is that if you have a simple 
boolean that will (in practice) always be passed a 
constant value, it makes more sense to use separate entry 
points.  If we expected someone to compute the value (or 
load from preferences, or whatever), it would make more 
sense to support it as an argument. 
 
The backward compatibility issue would still require a 
separate entry point in this case, however. 
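To make the rule of thumb concrete, here is a hypothetical sketch (demo_parser and its functions are invented for illustration, not Expat API). One internal worker takes the flag, while callers get two named entry points, so no call site has to pass a literal true or false. Keeping the original entry point's behavior unchanged is what preserves backward compatibility.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical parser with one user setting and one piece of parse state. */
struct demo_parser {
    void *user_data;
    int depth;
};

/* Single internal implementation; the flag is never exposed to callers. */
static void demo_reset_impl(struct demo_parser *p, bool clear_user_settings) {
    p->depth = 0;                 /* always ready for a fresh document */
    if (clear_user_settings)
        p->user_data = NULL;
}

/* Existing entry point: behavior unchanged, so old callers are unaffected. */
void demo_parser_reset(struct demo_parser *p) {
    demo_reset_impl(p, true);
}

/* New entry point with the new semantics: keep user settings. */
void demo_parser_reset_document(struct demo_parser *p) {
    demo_reset_impl(p, false);
}
```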
 

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2005-01-25 11:48

Message:
Logged In: YES 
user_id=290026

So, you suggest a new API member, let's call it
XML_ParserPartialReset(parser, ...), which would
then keep the handlers and certain other settings.

I guess this is for backward compatibility; otherwise
I would have suggested adding another argument to the
call, like "XML_BOOL clearAll", as I don't like adding too
many API members.

----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2005-01-25 11:27

Message:
Logged In: YES 
user_id=3066

While I don't expect this change to affect most 
applications, it may well affect several in unexpected 
ways that are difficult to debug. 
 
For applications which set the handlers and never change 
them, it's not a problem, and does save some overhead when 
initializing the reset parser. 
 
A different approach to using the handlers, however, is to 
change the handlers as information is loaded.  This is 
certainly done in at least one language binding (Python's 
PyExpat) to deal with error conditions; it's also a 
capability exposed to the user (which I know I've used). 
 
I think a new name should be assigned for the new 
semantics; there's no reason we can't support both. 
 

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2005-01-25 10:20

Message:
Logged In: YES 
user_id=290026

I have created a patch against current CVS (xmlparse.c).
XML_ParserReset will not clear any of the settings
caused by these API functions:

- all call-back handler setters (XML_Set...Handler)
- XML_SetUserData
- XML_SetBase
- XML_UseParserAsHandlerArg
- XML_SetExternalEntityRefHandlerArg

Please try the patch and report any problems it may cause.


----------------------------------------------------------------------


