[Expat-discuss] Fw: Extra character inserted in CharacterData Handler?

Subramanian, Binu binu.subramanian@barconet.com
Wed Jul 24 04:46:03 2002


Hello,

I am facing exactly the same problem. In my case the characters are the
Euro, trademark, etc.
When I write the XML file, I replace the Euro character with its numerical
entity € 
I have specified the encoding for my XML file as UTF-8.

Now when the expat parser parses the file, it appends the  character. So it
is  followed by the Euro character.
What should I do to get rid of the extra character?
Am I missing something here?
Binu


> 
 > The "." character in your file - 0xB7 - is invalid UTF-8.
 > Maybe it is valid ISO-8859-1?
 > In that case you must add an XML declaration.
 > 
 > Actually, 1.95.3 should reject it (and it does so on my system).
 
 Rolf Ade just pointed out to me that I didn't read your code.
 You passed the ISO-8859-1 encoding to the parser, so there
 was no error on your side.
 
 However, what you reported looks exactly like what a word processor
 would show you when it expects ISO-8859-1 but gets UTF-8 (tested with
 Wordpad).
 Now, this would be a correct result, since Expat only passes UTF-8
 or UTF-16 to its handlers, no matter what the input.
 
 Karl
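
A minimal sketch of the effect Karl describes, using Python's standard
xml.parsers.expat binding rather than the C API. The helper name
handler_text and the sample documents are hypothetical. Python's binding
hands the handler a str, so the .encode("utf-8") step below stands in for
the UTF-8 bytes that a C handler would receive; decoding those bytes as
ISO-8859-1 imitates a viewer that assumes the wrong encoding.

```python
import xml.parsers.expat


def handler_text(xml_bytes):
    """Return all character data Expat reports for xml_bytes."""
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append
    parser.Parse(xml_bytes, True)  # True marks this as the final chunk
    return "".join(chunks)


# Input declared and encoded as ISO-8859-1; byte 0xB7 is the middle dot.
latin1_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><a>\xb7</a>'
dot_text = handler_text(latin1_doc)

# Stand-in for what the C handler would be given: UTF-8, not ISO-8859-1.
dot_utf8 = dot_text.encode("utf-8")

# A viewer that wrongly assumes ISO-8859-1 sees two characters, not one.
dot_misread = dot_utf8.decode("iso-8859-1")

# A numeric character reference in a UTF-8 document behaves the same way:
# the handler gets the Euro sign, which is three bytes in UTF-8.
euro_doc = b'<?xml version="1.0" encoding="UTF-8"?><a>&#8364;</a>'
euro_utf8 = handler_text(euro_doc).encode("utf-8")

print(repr(dot_utf8), repr(dot_misread), repr(euro_utf8))
```

If the output of the handler is written to a file or displayed, treating it
as UTF-8 (or converting it to the desired encoding first) should avoid the
apparent extra character.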