[Expat-bugs] [ expat-Bugs-768010 ] Extended characters not parsed correctly...
SourceForge.net
noreply at sourceforge.net
Wed Jul 9 09:19:49 EDT 2003
Bugs item #768010, was opened at 2003-07-08 20:13
Message generated for change (Comment added) made by etpalmer
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=110127&aid=768010&group_id=10127
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Tony Palmer (etpalmer)
Assigned to: Nobody/Anonymous (nobody)
Summary: Extended characters not parsed correctly...
Initial Comment:
We have been using the expat parser for a couple of years
and recently needed to parse extended
characters (128-255). Even though I specify "iso-8859-1"
in my parser create call, it still returns incorrect
character data. I am not sure what I am doing wrong.
I am on a Windows 2000 platform with 1.95.6 now.
I feed in the following:
<?xml version="1.0" encoding="iso-8859-1"?>
<MESSAGE>TÖNY</MESSAGE>
What I get out of the parse is (start tag, character data,
hex of character data, end tag):
MESSAGE
TÖNY
54 ffffffc3 ffffff96 4e 59
MESSAGE
My code is as follows:
void startElement(void *userData, const char *name, const char **atts)
{
    printf("%s\n", name);
}

void endElement(void *userData, const char *name)
{
    printf("%s\n", name);
}

void characterData(void *userData, const XML_Char *s, int len)
{
    char s2[256000];
    int i;

    strncpy(s2, s, len);
    s2[len] = 0;
    printf("%s\n", s2);
    for (i = 0; i < len; ++i)
        printf("%x ", s2[i]);
    printf("\n");
}

void ParseXML(const char* csInput)
{
    XML_Parser parser = XML_ParserCreate("iso-8859-1");
    int len = 0;

    XML_SetElementHandler(parser, startElement, endElement);
    XML_SetCharacterDataHandler(parser, characterData);
    len = strlen(csInput);
    printf("started XML parse\n");
    if (!XML_Parse(parser, csInput, len, 1))
    {
        printf("%s at line %d column %d\n",
               XML_ErrorString(XML_GetErrorCode(parser)),
               XML_GetCurrentLineNumber(parser),
               XML_GetCurrentColumnNumber(parser));
        return;
    }
    XML_ParserFree(parser);
}
Any help would be appreciated.
Thanks,
Tony
----------------------------------------------------------------------
>Comment By: Tony Palmer (etpalmer)
Date: 2003-07-09 15:19
Message:
Logged In: YES
user_id=818598
Thanks for the clarification. I guess then I am not sure how I
am supposed to convert from UTF-8 back to the iso-8859-1
character that was passed into the parser.
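
One possible way to do that conversion is sketched below. It is only an illustration, not something Expat provides: the helper name utf8_to_latin1 is hypothetical, it assumes XML_Char is char (the UTF-8 build), and it replaces anything outside U+0000..U+00FF with '?'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Expat.
 * Converts len bytes of UTF-8 in src to ISO-8859-1 in dst.
 * Code points above U+00FF and malformed sequences become '?'.
 * dst must have room for at least len + 1 bytes.
 * Returns the number of bytes written, not counting the trailing NUL. */
static size_t utf8_to_latin1(const char *src, size_t len, char *dst)
{
    size_t i = 0, n = 0;

    assert(src != NULL && dst != NULL);
    while (i < len) {
        unsigned char c = (unsigned char)src[i];

        if (c < 0x80) {
            /* Plain ASCII: one byte in both encodings. */
            dst[n++] = (char)c;
            i += 1;
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < len &&
                   ((unsigned char)src[i + 1] & 0xC0) == 0x80) {
            /* Two-byte sequence covering U+0080..U+00FF. */
            unsigned char c2 = (unsigned char)src[i + 1];
            dst[n++] = (char)(((c & 0x03) << 6) | (c2 & 0x3F));
            i += 2;
        } else {
            /* Not representable in ISO-8859-1: emit '?' and skip
             * the lead byte plus any continuation bytes. */
            dst[n++] = '?';
            i += 1;
            while (i < len && ((unsigned char)src[i] & 0xC0) == 0x80)
                i += 1;
        }
    }
    dst[n] = '\0';
    return n;
}
```

Called from the character data handler as utf8_to_latin1(s, (size_t)len, buf), this would turn the bytes c3 96 from the dump above back into the single byte d6. The string Expat passes to the handler is not NUL-terminated, so len has to be used rather than strlen.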
----------------------------------------------------------------------
Comment By: Karl Waclawek (kwaclaw)
Date: 2003-07-08 20:20
Message:
Logged In: YES
user_id=290026
Are you aware that Expat reports XML content encoded
as UTF-8 or UTF-16 only? The document's encoding
is irrelevant for that.
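
To illustrate with the dump from the initial comment: in ISO-8859-1 the character Ö is the single byte d6, and its UTF-8 form is the two bytes c3 96, which is what the handler received. A minimal sketch of that mapping follows; the helper name latin1_byte_to_utf8 is hypothetical and not an Expat function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Expat.
 * Encodes one ISO-8859-1 byte as UTF-8 into out.
 * Returns the number of bytes written (1 or 2). */
static size_t latin1_byte_to_utf8(unsigned char c, unsigned char out[2])
{
    assert(out != NULL);
    if (c < 0x80) {
        /* ASCII range is identical in both encodings. */
        out[0] = c;
        return 1;
    }
    /* U+0080..U+00FF: lead byte 110000xx, continuation byte 10xxxxxx. */
    out[0] = (unsigned char)(0xC0 | (c >> 6));
    out[1] = (unsigned char)(0x80 | (c & 0x3F));
    return 2;
}
```

For the byte d6 this yields c3 96. The ffffff prefix in the reported dump is most likely sign extension: where char is signed, passing s2[i] to printf("%x") promotes a negative value to int. Casting to unsigned char before printing would show c3 96.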
----------------------------------------------------------------------