[Expat-discuss] CharacterDataHandler question.

Karl Waclawek karl@waclawek.net
Mon, 23 Sep 2002 09:25:26 -0400


> I read the documentation about the CharacterDataHandler and I think
> this event is called when there's data inside tags. I'm using the C++
> wrapper written by Tim Smith.
> 
> While testing it, I discovered that this handler is
> called even if there's no data inside the tags,
> e.g.:
> <A></A>

If <A></A> is on a line by itself, then Expat will report the line
breaks (and any indentation) before and after it! Those characters
belong to the enclosing element, not to <A>.
 
> It is called twice: the data is first a CR and then a whitespace character.
> I also know that this handler can be called more than once
> for the same element, delivering the text in pieces (why is that?).

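What if the character data is "interrupted" by child elements?
Also, Expat does not promise to deliver a contiguous run of text in a
single call; it may split it, e.g. at line breaks or at the boundaries
of the buffers you pass to XML_Parse(). So accumulate the pieces.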
 
> What I cannot solve is distinguishing between valid whitespace and this
> strange, spurious whitespace.
> 
> e.g.
> <A>THIS HAS THREE SPACES</A>
> 
> My first idea:
> On start element: initialize a variable.
> On character data: append the data to the variable.
> On end element: strip the CR and the trailing space.
> 
> Do I have to do all that work?
> If so, I think I can write a subclass of Tim's wrapper so it's done only once.
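
If you also want to skip whitespace-only text that sits between elements, one option is to test the complete accumulated text rather than each individual chunk. This is an application-level choice, not something Expat does for you: a non-validating parser such as Expat reports that whitespace as ordinary character data. The helper name below is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if text[0..len) contains only XML whitespace
   (space, tab, carriage return, line feed), 0 otherwise.
   An empty run counts as whitespace-only. */
static int is_xml_whitespace_only(const char *text, size_t len) {
    assert(text != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        char c = text[i];
        if (c != ' ' && c != '\t' && c != '\r' && c != '\n')
            return 0;
    }
    return 1;
}
```

Applied to the accumulated text of <A>THIS HAS THREE SPACES</A>, the check returns 0, so the spaces are kept; a run such as "\n  " between elements returns 1 and could be ignored.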

If you are sure that the extra whitespace is reported *between*
the start and end tags, then I suggest you try Expat directly.
If the behaviour persists, please file a bug report and
attach a small example that reproduces it.

You can also try the other C++ wrapper, which is SAX2 compliant:
http://www.jezuk.co.uk/cgi-bin/view/arabica

Karl