[Expat-bugs] [ expat-Bugs-1190887 ] bad charset management in DTD

SourceForge.net noreply at sourceforge.net
Tue May 3 21:59:29 CEST 2005


Bugs item #1190887, was opened at 2005-04-27 05:57
Message generated for change (Comment added) made by kwaclaw
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=110127&aid=1190887&group_id=10127

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: bad charset management in DTD

Initial Comment:
Run the example below and you will get this message:
not well-formed (invalid token) (line 1, offset 17)

The character "é" is valid in ISO-8859-1, but expat rejects it.

e-mail : lemathias at gmail.com



#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "expat.h"


static int XMLCALL test(XML_Parser parser,
                       const XML_Char *context,
                       const XML_Char *base,
                       const XML_Char *systemId,
                       const XML_Char *publicId)
{
    char *text = (char *)XML_GetUserData(parser);
    XML_Parser extparser;

    extparser = XML_ExternalEntityParserCreate(parser, context, NULL);
    if (extparser == NULL)
    {
      fprintf(stderr, "fine\n");
      return 0;
    }
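    if (XML_Parse(extparser, text, (int)strlen(text), XML_TRUE)
          == XML_STATUS_ERROR) {
      /* line/column are cast because their integer type varies by Expat build */
      fprintf(stderr, "%s (line %lu, offset %lu)\n",
            XML_ErrorString(XML_GetErrorCode(extparser)),
            (unsigned long)XML_GetCurrentLineNumber(extparser),
            (unsigned long)XML_GetCurrentColumnNumber(extparser));
      XML_ParserFree(extparser);
      return XML_STATUS_ERROR;
    }
    XML_ParserFree(extparser);
    return XML_STATUS_OK;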
}

int main(int argc, char *argv[])
{
    char *text =
        "<?xml version='1.0' encoding='ISO-8859-1' standalone='yes'?>\n"
        "<!DOCTYPE doc SYSTEM 'foo'>\n"
        "<doc>&entity;</doc>";
    /* "é" is in the ISO-8859-1 charset */
    char *foo_text =
        "<!ELEMENT Employés (#PCDATA)*>";

    XML_Parser  parser = XML_ParserCreate(NULL);
    if (parser == NULL)
    {
      return 0;
    }

    XML_SetParamEntityParsing(parser, XML_PARAM_ENTITY_PARSING_ALWAYS);
    XML_SetUserData(parser, foo_text);
    XML_SetExternalEntityRefHandler(parser, test);

    if (XML_Parse(parser, text, (int)strlen(text), XML_TRUE) == XML_STATUS_OK)
    {
      fprintf(stderr, "fine\n");
      return 0;
    }
 
    if (XML_GetErrorCode(parser) != XML_ERROR_UNDEFINED_ENTITY)
    {
      fprintf(stderr, "fine\n");
      return 0;
    }

  return 1;
}

----------------------------------------------------------------------

>Comment By: Karl Waclawek (kwaclaw)
Date: 2005-05-03 15:59

Message:

The DTD contained in foo_text is an external entity.
As such it is treated separately (it does not inherit
the document entity's encoding declaration).

You need to add an XML declaration with the appropriate
encoding to foo_text, and your program should work.
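
The suggestion above can be sketched roughly as follows. This is only an
illustration, not code from the report: prepend_text_decl is a hypothetical
helper (it is not part of Expat) that puts a declaration of the form
<?xml encoding='...'?> in front of the external entity text, so that the
entity states its own encoding instead of relying on the document entity.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not an Expat function: returns a newly allocated
   copy of `body` prefixed with a declaration naming `encoding`.
   The caller frees the result. Returns NULL on failure. */
static char *prepend_text_decl(const char *encoding, const char *body)
{
    /* First pass: measure the formatted length (C99 snprintf). */
    int n = snprintf(NULL, 0, "<?xml encoding='%s'?>%s", encoding, body);
    char *out;

    if (n < 0)
        return NULL;
    out = malloc((size_t)n + 1);
    if (out == NULL)
        return NULL;
    /* Second pass: write the declaration followed by the entity text. */
    snprintf(out, (size_t)n + 1, "<?xml encoding='%s'?>%s", encoding, body);
    return out;
}
```

In the test program, the string returned by
prepend_text_decl("ISO-8859-1", foo_text) would be passed to XML_SetUserData
in place of foo_text and freed once parsing is done.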

----------------------------------------------------------------------
