[Expat-discuss] Fw: Extra character inserted in CharacterData Handler?
Subramanian, Binu
binu.subramanian@barconet.com
Wed Jul 24 04:46:03 2002
Hello,
I am facing exactly the same problem. In my case the characters are the
Euro, trademark, etc.
When I write the XML file, I replace the Euro character with its numeric
character reference (such as &#8364;).
I have specified the encoding for my XML file as UTF-8.
Now, when the Expat parser parses the file, it inserts an extra character, so
the Euro character comes out preceded by that extra character.
What should I do to get rid of the extra character?
Am I missing something here?
Binu
>
> The "·" character in your file - 0xB7 - is invalid UTF-8.
> Maybe it is valid ISO-8859-1?
> In that case you must add an XML declaration.
>
> Actually, 1.95.3 should reject it (and it does so on my system).
Rolf Ade just pointed out to me that I didn't read your code.
You passed the ISO-8859-1 encoding to the parser, so there
was no error on your side.
However, what you reported looks exactly like what a word processor
would show you when it expects ISO-8859-1, but gets UTF-8 (tested with
Wordpad).
Now, this would be a correct result, since Expat only passes UTF-8
or UTF-16 to its handlers, no matter what the input.
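
To illustrate the point above: U+00B7 is the single byte 0xB7 in ISO-8859-1,
but the two bytes 0xC2 0xB7 in UTF-8, and a viewer that expects ISO-8859-1
shows the 0xC2 as an extra character in front of it. If the application needs
ISO-8859-1 output, it has to convert the handler data itself. Below is a
minimal sketch of such a conversion. The name utf8_to_latin1 is hypothetical
and not part of the Expat API, and the sketch assumes the buffer contains
whole UTF-8 sequences. Note that the Euro (U+20AC) has no ISO-8859-1 code at
all, so here it is replaced with '?'.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of Expat): convert UTF-8 text, such as the
 * buffer passed to a character data handler, to ISO-8859-1.
 * Code points above U+00FF and malformed sequences are written as '?'.
 * Minimal sketch: it does not reject overlong encodings.
 * Returns the number of bytes written to out (at most out_cap).
 */
size_t utf8_to_latin1(const char *in, size_t in_len,
                      unsigned char *out, size_t out_cap)
{
    size_t i = 0, n = 0;

    assert(in != NULL && out != NULL);

    while (i < in_len && n < out_cap) {
        unsigned char b = (unsigned char)in[i];
        unsigned long cp;   /* decoded code point */
        size_t extra;       /* continuation bytes expected */
        size_t k;
        int ok = 1;

        if (b < 0x80) {
            cp = b; extra = 0;              /* ASCII */
        } else if ((b & 0xE0) == 0xC0) {
            cp = b & 0x1F; extra = 1;       /* 2-byte sequence */
        } else if ((b & 0xF0) == 0xE0) {
            cp = b & 0x0F; extra = 2;       /* 3-byte sequence */
        } else if ((b & 0xF8) == 0xF0) {
            cp = b & 0x07; extra = 3;       /* 4-byte sequence */
        } else {
            out[n++] = '?';                 /* stray or invalid lead byte */
            i++;
            continue;
        }
        i++;

        if (in_len - i < extra) {           /* input ends mid-sequence */
            out[n++] = '?';
            break;
        }
        for (k = 0; k < extra; k++) {
            unsigned char c = (unsigned char)in[i + k];
            if ((c & 0xC0) != 0x80) {       /* not a continuation byte */
                ok = 0;
                break;
            }
            cp = (cp << 6) | (c & 0x3F);
        }
        if (!ok) {
            out[n++] = '?';                 /* resume at the offending byte */
            continue;
        }
        i += extra;

        /* ISO-8859-1 maps one-to-one onto U+0000..U+00FF. */
        out[n++] = (cp <= 0xFF) ? (unsigned char)cp : (unsigned char)'?';
    }
    return n;
}
```

Called with the two bytes 0xC2 0xB7, this writes the single byte 0xB7. For
characters such as the Euro, a target encoding that contains them would be
needed instead, for example ISO-8859-15 (0xA4) or Windows-1252 (0x80).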
Karl