[ expat-Bugs-549014 ] May cause memory error in dtdCopy.

noreply@sourceforge.net noreply@sourceforge.net
Sat May 4 07:06:03 2002


Bugs item #549014, was opened at 2002-04-26 06:51
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=110127&aid=549014&group_id=10127

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Jun Huang (huangjun_se)
Assigned to: Nobody/Anonymous (nobody)
Summary: May cause memory error in dtdCopy.

Initial Comment:
This problem may not a bug.If not ,I want somebody to
tell me how to use the XML_ExternalEntityParserCreate
and XML_ParserFree.Thank you.

In function "dtdCopy",there is a comment "/* Don't want
deep copying for scaffolding */".I don't understand
it's meaning.But the following code set
oldDtd->scaffIndex to newDtd->scaffIndex.I found it may
cause a memory error.
If a parentParser has allocated the memory pointed by
scaffIndex,I use XML_ExternalEntityParserCreate to
create a subParser.So the subParser will get the
scaffIndex of the parentParser.And then I call
XML_ParserFree to free the subParser,it will free the
memory pointed by scaffIndex of the subParser.But the
scaffIndex of the parentParser still pointed the memory
freed.Then if the following code visit the memory
pointed by the scaffIndex
,it will cause a memory error.



----------------------------------------------------------------------

>Comment By: Karl Waclawek (kwaclaw)
Date: 2002-05-04 10:05

Message:
Logged In: YES 
user_id=290026

Here is some more analysis (I am fairly new to Expat, so
I may repeat what you already know):

dtd.scaffold is used as a "working memory" pool for
building content models passed to an element declaration
handler.  It is passed between parsers only so that the
working memory does not have to be allocated again (speed).
The passing from parent to child is done using dtdCopy
or dtdSwap. From all of that I would say that it should
only be freed at the very end, when the root parser gets
destroyed.

So, to achieve that, simply change the code
in dtdDestroy from:
...
  if (p->scaffIndex)
    FREE(p->scaffIndex);
  if (p->scaffold)
    FREE(p->scaffold);
...

to:

...
  if (!parentParser) {
    if (p->scaffIndex)
      FREE(p->scaffIndex);
    if (p->scaffold)
      FREE(p->scaffold);
  }
...

Karl

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2002-05-04 01:25

Message:
Logged In: YES 
user_id=290026

Just tested my solution for the dangling pointers:
Won't work, since it should only be applied when
a dtdCopy was performed (external GE), not when
a dtdSwap was done (external PE). However, we don't
have a flag that tells us which type of external
entity was parsed.

Another approach could be to clear the old pointers right
when dtdCopy is performed (and also reset scaffCount and 
scaffSize). That way, even if dtd.scaffold was used again,
it would be allocated fram scratch, since the code will 
detect the null pointers.

A lot depends on how dtdCopy and dtd.scaffold are used. 
Currently dtdCopy is only called for creating an external
GE parser, and dtd.scaffold is only used for building the
content model of an element declaration, which means it is
not shared across parsers.

Karl

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2002-05-03 23:45

Message:
Logged In: YES 
user_id=290026

There is already a flag: look at the parentParser macro.
However, as I said, this dangling pointer does not
cause an actual problem. I found this out by looking at
the code as well as creating a specific test situation,
where an external parser would be created, then freed,
and then element declaration would be parsed which would
require use of the dtd.scaffold member.

However, as it appeared to me, all of this happens
during parsing of the DTD, but the dtdCopy function
will not have been be called since it is only called
when the childparser has context !=0, which means
it parses a general entity. And that never happens
before the DTD parsing is completed.

The dangling pointers could simply be set to null
like this in XML_ParserFree:

...
#ifdef XML_DTD
  if (parentParser) {
    dtdSwap(&dtd, &((Parser *)parentParser)->m_dtd);
  }
#endif /* XML_DTD */
  dtdDestroy(&dtd, parser);
// now, add this code:
((Parser *)parentParser)->m_dtd.scaffold = 0;
((Parser *)parentParser)->m_dtd.scaffIndex = 0;

Karl

----------------------------------------------------------------------

Comment By: Rolf Ade (pointsman)
Date: 2002-05-03 23:06

Message:
Logged In: YES 
user_id=13222

An internal flag for this would be great; the simplest
solution, very low cost in speed and memory. I would love,
to have a this way fixed expat.


----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2002-05-03 22:54

Message:
Logged In: YES 
user_id=3066

Since the "master" and descendent parsers are created using
different functions, we should just add an internal field
that indicates whether the parser is the master or not.  I
think we just need to know whether the parser is the master,
but don't need a pointer to the master.

Karl, does this sound sufficient?  I think we can do this
with no public API changes.

----------------------------------------------------------------------

Comment By: Rolf Ade (pointsman)
Date: 2002-05-03 22:49

Message:
Logged In: YES 
user_id=13222


Exactly. I came to the same conclusion as Jun
Huang, while debugging some rare seg faults of an
expat based application, I work with. You have to
work with external entities (must use
XML_ExternalEntityParserCreate()) to see the
error.

I second this bug report, saying loud that this
man is right, not only in that there is a very
hard problem (seg fault) with using external
enitities with expat but also with his analysis of
the reason for this problem.

OK, given that, the obvious question is, how to
fix that? As far as I see, there isn't a simple
way to determine, if a parser is, well, the
'master', or the 'main' parser or if it's a
parser, created to parse an external entity. It's
OK, to collect the DTD Information of the internal
subset and the parts in the (could be) multiple
parameter entities in one memory structure, ie I
think, it's OK to not deep dopying the
scaffolding. But the memory has to be freed in the
outmost parser.


----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2002-05-03 14:57

Message:
Logged In: YES 
user_id=290026

Looking closer at the code: dtdCopy will only be called
for a child parser, if the entity is a general entity,
not a parameter entity. dtd.scaffold will only be used
when the parser is processing the external or internal
subset, which always happens *before* any general external
entity is processed. So, by planning (or coincidence <g>) 
dtd.scaffold will not get used after being freed as 
described.

However, we still have a dangling pointer, which should
be set to null.

Karl

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2002-05-03 12:37

Message:
Logged In: YES 
user_id=290026

It seems your observation is correct.
This can cause memory errors.
I am just curious why I haven't seen them yet.

Karl

----------------------------------------------------------------------

You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=110127&aid=549014&group_id=10127