[ expat-Bugs-549014 ] May cause memory error in dtdCopy.

noreply@sourceforge.net noreply@sourceforge.net
Tue Jun 11 07:18:02 2002


Bugs item #549014, was opened at 2002-04-26 06:51
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=110127&aid=549014&group_id=10127

Category: None
Group: None
>Status: Closed
Resolution: None
Priority: 5
Submitted By: Jun Huang (huangjun_se)
Assigned to: Nobody/Anonymous (nobody)
Summary: May cause memory error in dtdCopy.

Initial Comment:
This problem may not a bug.If not ,I want somebody to
tell me how to use the XML_ExternalEntityParserCreate
and XML_ParserFree.Thank you.

In function "dtdCopy",there is a comment "/* Don't want
deep copying for scaffolding */".I don't understand
it's meaning.But the following code set
oldDtd->scaffIndex to newDtd->scaffIndex.I found it may
cause a memory error.
If a parentParser has allocated the memory pointed by
scaffIndex,I use XML_ExternalEntityParserCreate to
create a subParser.So the subParser will get the
scaffIndex of the parentParser.And then I call
XML_ParserFree to free the subParser,it will free the
memory pointed by scaffIndex of the subParser.But the
scaffIndex of the parentParser still pointed the memory
freed.Then if the following code visit the memory
pointed by the scaffIndex
,it will cause a memory error.



----------------------------------------------------------------------

>Comment By: Karl Waclawek (kwaclaw)
Date: 2002-06-11 10:17

Message:
Logged In: YES 
user_id=290026

No complaints received so far.
Let's close it now.

Karl

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2002-05-18 14:13

Message:
Logged In: YES 
user_id=290026

The fix in patch # 551599 seems to fix this,
but I suggest we leave this bug open until
more people have tested it.

Karl

----------------------------------------------------------------------

Comment By: Rolf Ade (pointsman)
Date: 2002-05-04 17:16

Message:
Logged In: YES 
user_id=13222

In other words, your mean:

  if (prologState.documentEntity) {
      if (p->scaffIndex)
          FREE(p->scaffIndex);
      if (p->scaffold)
          FREE(p->scaffold);
  }

at the end of dtdDestroy?

Nope, still seg faults. And it seems for 
the same reason, why parentParser could not used
for this.

As far, as I see, documentEntity is only set in
XmlPrologStateInit() (to 1) and in
XmlPrologStateInitExternalEntity() (to 0). Now
take again a look at the already quoted end of
XML_ExternalEntityParserCreate():

  if (context) {
#endif /* XML_DTD */
    if (!dtdCopy(&dtd, oldDtd, parser) ||
!setContext(parser, context)) {
      XML_ParserFree(parser);
      return 0;
    }
    processor = externalEntityInitProcessor;
#ifdef XML_DTD
  }
  else {
    dtdSwap(&dtd, oldDtd);
    parentParser = oldParser;
    XmlPrologStateInitExternalEntity(&prologState);
    dtd.complete = 1;
    hadExternalDoctype = 1;
  }

XmlPrologStateInitExternalEntity() is only called
for external parameter entities and the external
DTD subset (for this, context is 0). For external
general entities, documentEntity isn't set to
0. And therefor the test in dtdDestroy() above
doesn't work.

I can track this in my test case. This test case
has an external subset (for which documentEntity
is set to 0) and various external general entities
in the content. documentEntity is set to 0 for the
parser, that parses the external subset.  The
external entity parser created to parse the
external general entities doesn't have set
documentEntity to 0.

Consequently, the scaffolding stuff is free'ed in
dtdDestroy() - in the current CVS the same as with
both proposed tests - at the moment, the parser
for the first external general entity is free'ed.
If the 'child' parser created for the second
external general entity is freed, it makes *bang*:
seg fault.


----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2002-05-04 16:44

Message:
Logged In: YES 
user_id=290026

No, documentEntity does not fit the bill either,
so it seems we need a new member.

After having a closer look: parentParser is not
set for external general entities probably because none
of the related code uses it. As far as I can tell,
parentParser is used exclusively by DTD processing code,
except for the XML_ParserFree function.

I have put together a patch (based on my patch # 551599)
that does the following:
- adds a new member m_isParamEntity and the associated macro
- sets parentParser to oldParser for both, external GEs
  and PEs, in XML_ExternalEntityParserCreate
- initializes isParamEntity to 0, but sets it to 1 where
  parentParser used to be set to oldParser
- replaces "if (parentParser)" with "if (isParamEntity)"
  in XML_ParserFree
- keeps the previously suggested code in dtdDestroy:
  if (!parentParser) {  
    if (p->scaffIndex)
      FREE(p->scaffIndex);
    if (p->scaffold)
      FREE(p->scaffold);
  }
  which should guarantee that dtd.scaffold is freed for
  the root parser only, since parentParser will always
  be set for child parsers
- all other checks of parentParentParser in the DTD
  handling code should probably be replaced with checks
  for isParamEntity, however, in DTD  handling code,
  whenever parentParser is set, isParamEntity = 1,
  and whenever parentParser is null, isParamEntity = 0,
  so the two are equivalent. Only outside of DTD handling
  code can they be different

If anybody is interested, I can e-mail the patch.
If it works - for those that can test it - then I will
probably add it as another improvement to patch # 551599.

Karl


----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2002-05-04 14:44

Message:
Logged In: YES 
user_id=290026

You are right - damn!
I have to revisit my recent patch for PE handling!
Anyway, if parentParser does not fit the bill,
then prologState.documentEntity might.

Could you please check that too.

Karl

----------------------------------------------------------------------

Comment By: Rolf Ade (pointsman)
Date: 2002-05-04 12:23

Message:
Logged In: YES 
user_id=13222


I'm sure, I don't understand all the internals in
every detail. But: I would say, Karl Waclawek's
code

  if (!parentParser) {
    if (p->scaffIndex)
      FREE(p->scaffIndex);
    if (p->scaffold)
      FREE(p->scaffold);
  }

goes into the right direction, but this alone
doesn't fix the problem (at least doesn't fix the
problem completely).

I've an example, which triggers the seg fault
(it's a bit complex, a hole application, to much,
to attach it, but if somebody is interested
enough, I can point you to the code and supply
test data.)

Even with Karls modification, this example still
seg faults. But, as said, where at the right
trace. If I remove the freeing of p->scaffIndex
and p->scaffold from dtdDestroy(), everthing works
as expected (but with a memory leak, of course).

But how could this be? Well, as far as I see,
parentParser is _not_ set in every case in
XML_ExternalEntityParserCreate(). Take a look at
the end of this function:

  if (context) {
#endif /* XML_DTD */
    if (!dtdCopy(&dtd, oldDtd, parser) ||
!setContext(parser, context)) {
      XML_ParserFree(parser);
      return 0;
    }
    processor = externalEntityInitProcessor;
#ifdef XML_DTD
  }
  else {
    dtdSwap(&dtd, oldDtd);
    parentParser = oldParser;
    XmlPrologStateInitExternalEntity(&prologState);
    dtd.complete = 1;
    hadExternalDoctype = 1;
  }

parentParser is only set (to a value != 0), if
context is false.

If I use 

  parentParser = oldParser;
  if (context) {
#endif /* XML_DTD */
    if (!dtdCopy(&dtd, oldDtd, parser) ||
!setContext(parser, context)) {
      XML_ParserFree(parser);
      return 0;
    }
    processor = externalEntityInitProcessor;
#ifdef XML_DTD
  }
  else {
    dtdSwap(&dtd, oldDtd);
    XmlPrologStateInitExternalEntity(&prologState);
    dtd.complete = 1;
    hadExternalDoctype = 1;
  }

(moved the parentParser setting outside the if, so
that it's always set). This, together with Karl's
code fixes the problem for me.

But I'm really not sure, if this could (should) be
done. I don't recall all details of my debugging
sessions (it was some month ago), but if I recall
right, there are reasons, for that the
parentParser is not always set. It's a way to
distinguish, if the external entity parser parses
an external parameter entity or a general external
entity (for parameter entities context is 0). This
distinction seems to be used at some places and it
looks like, we need it (please correct me, if I'm
wrong).

So, what to do? Back to Fred's propose of a new
interal flag, that's only set for the master
parser? Looks like this would be the simplest
solution, with the lowest risk of unexpected side
effects.

Does somebody knows, if this distinction is
necessary somewhere, for the parser, to work
correct?



----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2002-05-04 10:05

Message:
Logged In: YES 
user_id=290026

Here is some more analysis (I am fairly new to Expat, so
I may repeat what you already know):

dtd.scaffold is used as a "working memory" pool for
building content models passed to an element declaration
handler.  It is passed between parsers only so that the
working memory does not have to be allocated again (speed).
The passing from parent to child is done using dtdCopy
or dtdSwap. From all of that I would say that it should
only be freed at the very end, when the root parser gets
destroyed.

So, to achieve that, simply change the code
in dtdDestroy from:
...
  if (p->scaffIndex)
    FREE(p->scaffIndex);
  if (p->scaffold)
    FREE(p->scaffold);
...

to:

...
  if (!parentParser) {
    if (p->scaffIndex)
      FREE(p->scaffIndex);
    if (p->scaffold)
      FREE(p->scaffold);
  }
...

Karl

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2002-05-04 01:25

Message:
Logged In: YES 
user_id=290026

Just tested my solution for the dangling pointers:
Won't work, since it should only be applied when
a dtdCopy was performed (external GE), not when
a dtdSwap was done (external PE). However, we don't
have a flag that tells us which type of external
entity was parsed.

Another approach could be to clear the old pointers right
when dtdCopy is performed (and also reset scaffCount and 
scaffSize). That way, even if dtd.scaffold was used again,
it would be allocated fram scratch, since the code will 
detect the null pointers.

A lot depends on how dtdCopy and dtd.scaffold are used. 
Currently dtdCopy is only called for creating an external
GE parser, and dtd.scaffold is only used for building the
content model of an element declaration, which means it is
not shared across parsers.

Karl

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2002-05-03 23:45

Message:
Logged In: YES 
user_id=290026

There is already a flag: look at the parentParser macro.
However, as I said, this dangling pointer does not
cause an actual problem. I found this out by looking at
the code as well as creating a specific test situation,
where an external parser would be created, then freed,
and then element declaration would be parsed which would
require use of the dtd.scaffold member.

However, as it appeared to me, all of this happens
during parsing of the DTD, but the dtdCopy function
will not have been be called since it is only called
when the childparser has context !=0, which means
it parses a general entity. And that never happens
before the DTD parsing is completed.

The dangling pointers could simply be set to null
like this in XML_ParserFree:

...
#ifdef XML_DTD
  if (parentParser) {
    dtdSwap(&dtd, &((Parser *)parentParser)->m_dtd);
  }
#endif /* XML_DTD */
  dtdDestroy(&dtd, parser);
// now, add this code:
((Parser *)parentParser)->m_dtd.scaffold = 0;
((Parser *)parentParser)->m_dtd.scaffIndex = 0;

Karl

----------------------------------------------------------------------

Comment By: Rolf Ade (pointsman)
Date: 2002-05-03 23:06

Message:
Logged In: YES 
user_id=13222

An internal flag for this would be great; the simplest
solution, very low cost in speed and memory. I would love,
to have a this way fixed expat.


----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2002-05-03 22:54

Message:
Logged In: YES 
user_id=3066

Since the "master" and descendent parsers are created using
different functions, we should just add an internal field
that indicates whether the parser is the master or not.  I
think we just need to know whether the parser is the master,
but don't need a pointer to the master.

Karl, does this sound sufficient?  I think we can do this
with no public API changes.

----------------------------------------------------------------------

Comment By: Rolf Ade (pointsman)
Date: 2002-05-03 22:49

Message:
Logged In: YES 
user_id=13222


Exactly. I came to the same conclusion as Jun
Huang, while debugging some rare seg faults of an
expat based application, I work with. You have to
work with external entities (must use
XML_ExternalEntityParserCreate()) to see the
error.

I second this bug report, saying loud that this
man is right, not only in that there is a very
hard problem (seg fault) with using external
enitities with expat but also with his analysis of
the reason for this problem.

OK, given that, the obvious question is, how to
fix that? As far as I see, there isn't a simple
way to determine, if a parser is, well, the
'master', or the 'main' parser or if it's a
parser, created to parse an external entity. It's
OK, to collect the DTD Information of the internal
subset and the parts in the (could be) multiple
parameter entities in one memory structure, ie I
think, it's OK to not deep dopying the
scaffolding. But the memory has to be freed in the
outmost parser.


----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2002-05-03 14:57

Message:
Logged In: YES 
user_id=290026

Looking closer at the code: dtdCopy will only be called
for a child parser, if the entity is a general entity,
not a parameter entity. dtd.scaffold will only be used
when the parser is processing the external or internal
subset, which always happens *before* any general external
entity is processed. So, by planning (or coincidence <g>) 
dtd.scaffold will not get used after being freed as 
described.

However, we still have a dangling pointer, which should
be set to null.

Karl

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2002-05-03 12:37

Message:
Logged In: YES 
user_id=290026

It seems your observation is correct.
This can cause memory errors.
I am just curious why I haven't seen them yet.

Karl

----------------------------------------------------------------------

You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=110127&aid=549014&group_id=10127