[Expat-bugs] [ expat-Bugs-1690883 ] Expat rejects xml: prefix with "unbound prefix" error.

SourceForge.net noreply at sourceforge.net
Mon May 7 09:31:36 CEST 2007


Bugs item #1690883, was opened at 2007-03-29 15:31
Message generated for change (Settings changed) made by klemscot
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=110127&aid=1690883&group_id=10127

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
>Status: Open
Resolution: Works For Me
Priority: 5
Private: No
Submitted By: Scott Klement (klemscot)
Assigned to: Nobody/Anonymous (nobody)
Summary: Expat rejects xml: prefix with "unbound prefix" error.

Initial Comment:
When namespace handling is enabled, Expat considers the following document invalid:

<Text xml:lang="en">Foo</Text>

The error is "Unbound prefix", referring to the "xml:" prefix.  However, according the W3C spec, it's not necessary to declare that particular prefix.  I'm looking at the following document:

http://www.w3.org/TR/REC-xml-names/

Under chapter 3, subheading "Namespace constraint: Reserved Prefixes and Namespace Names"

This is the section I'm reading:

The prefix xml is by definition bound to the namespace name http://www.w3.org/XML/1998/namespace. ****It MAY, but need not, be declared****, and MUST NOT be bound to any other namespace name. Other prefixes MUST NOT be bound to this namespace name, and it MUST NOT be declared as the default namespace.

----------------------------------------------------------------------

Comment By: Scott Klement (klemscot)
Date: 2007-05-07 02:30

Message:
Logged In: YES 
user_id=223473
Originator: YES

Ooops.  Sorry about the duplicate comment! 

Here's a patch (Unified Diff) that solves the problem for me. 

--- /home/klemscot/as400/expat-2.0.0.orig/lib/ascii.h	Tue Nov 12 15:45:57
2002
+++ ./ascii.h	Mon May  7 01:25:36 2007
@@ -83,3 +83,10 @@
 #define ASCII_LSQB 0x5B
 #define ASCII_RSQB 0x5D
 #define ASCII_UNDERSCORE 0x5F
+#define ASCII_LPAREN 0x28
+#define ASCII_RPAREN 0x29
+#define ASCII_FF 0x0C
+#define ASCII_SLASH 0x2F
+#define ASCII_HASH 0x23
+#define ASCII_PIPE 0x7C
+#define ASCII_COMMA 0x2C
--- /home/klemscot/as400/expat-2.0.0.orig/lib/xmlparse.c	Fri Dec 23
08:45:27 2005
+++ ./xmlparse.c	Mon May  7 01:21:55 2007
@@ -5,6 +5,7 @@
 #include <stddef.h>
 #include <string.h>                     /* memset(), memcpy() */
 #include <assert.h>
+#include <ascii.h>
 
 #define XML_BUILDING_EXPAT 1
 
@@ -665,10 +666,12 @@
 }
 
 static const XML_Char implicitContext[] = {
-  'x', 'm', 'l', '=', 'h', 't', 't', 'p', ':', '/', '/',
-  'w', 'w', 'w', '.', 'w', '3', '.', 'o', 'r', 'g', '/',
-  'X', 'M', 'L', '/', '1', '9', '9', '8', '/',
-  'n', 'a', 'm', 'e', 's', 'p', 'a', 'c', 'e', '\0'
+  ASCII_x, ASCII_m, ASCII_l, ASCII_EQUALS, ASCII_h, ASCII_t, ASCII_t,
ASCII_p,
+  ASCII_COLON, ASCII_SLASH, ASCII_SLASH, ASCII_w, ASCII_w, ASCII_w, 
+  ASCII_PERIOD, ASCII_w, ASCII_3, ASCII_PERIOD, ASCII_o, ASCII_r,
ASCII_g,
+  ASCII_SLASH, ASCII_X, ASCII_M, ASCII_L, ASCII_SLASH, ASCII_1, ASCII_9,
+  ASCII_9, ASCII_8, ASCII_SLASH, ASCII_n, ASCII_a, ASCII_m, ASCII_e,
+  ASCII_s, ASCII_p, ASCII_a, ASCII_c, ASCII_e, '\0'
 };
 
 XML_Parser XMLCALL
@@ -761,7 +764,7 @@
   unknownEncodingHandler = NULL;
   unknownEncodingHandlerData = NULL;
 
-  namespaceSeparator = '!';
+  namespaceSeparator = ASCII_EXCL;
   ns = XML_FALSE;
   ns_triplets = XML_FALSE;
 
@@ -2806,7 +2809,7 @@
             return XML_ERROR_NO_MEMORY;
           uriHash = CHAR_HASH(uriHash, c);
         }
-        while (*s++ != XML_T(':'))
+        while (*s++ != XML_T(ASCII_COLON))
           ;
         do {  /* copies null terminator */
           const XML_Char c = *s;
@@ -2880,7 +2883,7 @@
     if (!binding)
       return XML_ERROR_UNBOUND_PREFIX;
     localPart = tagNamePtr->str;
-    while (*localPart++ != XML_T(':'))
+    while (*localPart++ != XML_T(ASCII_COLON))
       ;
   }
   else if (dtd->defaultPrefix.binding) {
@@ -2935,17 +2938,21 @@
            const XML_Char *uri, BINDING **bindingsPtr)
 {
   static const XML_Char xmlNamespace[] = {
-    'h', 't', 't', 'p', ':', '/', '/',
-    'w', 'w', 'w', '.', 'w', '3', '.', 'o', 'r', 'g', '/',
-    'X', 'M', 'L', '/', '1', '9', '9', '8', '/',
-    'n', 'a', 'm', 'e', 's', 'p', 'a', 'c', 'e', '\0'
+    ASCII_h, ASCII_t, ASCII_t, ASCII_p, ASCII_COLON, ASCII_SLASH,
ASCII_SLASH,
+    ASCII_w, ASCII_w, ASCII_w, ASCII_PERIOD, ASCII_w, ASCII_3,
ASCII_PERIOD,
+    ASCII_o, ASCII_r, ASCII_g, ASCII_SLASH, ASCII_X, ASCII_M, ASCII_L, 
+    ASCII_SLASH, ASCII_1, ASCII_9, ASCII_9, ASCII_8, ASCII_SLASH,
+    ASCII_n, ASCII_a, ASCII_m, ASCII_e, ASCII_s, ASCII_p, ASCII_a,
ASCII_c,
+    ASCII_e, '\0'
   };
   static const int xmlLen = 
     (int)sizeof(xmlNamespace)/sizeof(XML_Char) - 1;
   static const XML_Char xmlnsNamespace[] = {
-    'h', 't', 't', 'p', ':', '/', '/',
-    'w', 'w', 'w', '.', 'w', '3', '.', 'o', 'r', 'g', '/',
-    '2', '0', '0', '0', '/', 'x', 'm', 'l', 'n', 's', '/', '\0'
+    ASCII_h, ASCII_t, ASCII_t, ASCII_p, ASCII_COLON, ASCII_SLASH,
ASCII_SLASH,
+    ASCII_w, ASCII_w, ASCII_w, ASCII_PERIOD, ASCII_w, ASCII_3,
ASCII_PERIOD,
+    ASCII_o, ASCII_r, ASCII_g, ASCII_SLASH, ASCII_2, ASCII_0, ASCII_0, 
+    ASCII_0, ASCII_SLASH, ASCII_x, ASCII_m, ASCII_l, ASCII_n, ASCII_s, 
+    ASCII_SLASH, '\0'
   };
   static const int xmlnsLen = 
     (int)sizeof(xmlnsNamespace)/sizeof(XML_Char) - 1;
@@ -2962,13 +2969,13 @@
     return XML_ERROR_UNDECLARING_PREFIX;
 
   if (prefix->name
-      && prefix->name[0] == XML_T('x')
-      && prefix->name[1] == XML_T('m')
-      && prefix->name[2] == XML_T('l')) {
+      && prefix->name[0] == XML_T(ASCII_x)
+      && prefix->name[1] == XML_T(ASCII_m)
+      && prefix->name[2] == XML_T(ASCII_l)) {
 
     /* Not allowed to bind xmlns */
-    if (prefix->name[3] == XML_T('n')
-        && prefix->name[4] == XML_T('s')
+    if (prefix->name[3] == XML_T(ASCII_n)
+        && prefix->name[4] == XML_T(ASCII_s)
         && prefix->name[5] == XML_T('\0'))
       return XML_ERROR_RESERVED_PREFIX_XMLNS;
 
@@ -3628,23 +3635,30 @@
          XML_Bool haveMore)
 {
 #ifdef XML_DTD
-  static const XML_Char externalSubsetName[] = { '#' , '\0' };
+  static const XML_Char externalSubsetName[] = { ASCII_HASH , '\0' };
 #endif /* XML_DTD */
-  static const XML_Char atypeCDATA[] = { 'C', 'D', 'A', 'T', 'A', '\0'
};
-  static const XML_Char atypeID[] = { 'I', 'D', '\0' };
-  static const XML_Char atypeIDREF[] = { 'I', 'D', 'R', 'E', 'F', '\0'
};
-  static const XML_Char atypeIDREFS[] = { 'I', 'D', 'R', 'E', 'F', 'S',
'\0' };
-  static const XML_Char atypeENTITY[] = { 'E', 'N', 'T', 'I', 'T', 'Y',
'\0' };
+  static const XML_Char atypeCDATA[] = { ASCII_C, ASCII_D, ASCII_A,
ASCII_T, 
+                                         ASCII_A, '\0' };
+  static const XML_Char atypeID[] = { ASCII_I, ASCII_D, '\0' };
+  static const XML_Char atypeIDREF[] = { ASCII_I, ASCII_D, ASCII_R,
ASCII_E, 
+                                         ASCII_F, '\0' };
+  static const XML_Char atypeIDREFS[] = { ASCII_I, ASCII_D, ASCII_R,
ASCII_E,
+                                          ASCII_F, ASCII_S, '\0' };
+  static const XML_Char atypeENTITY[] = { ASCII_E, ASCII_N, ASCII_T,
ASCII_I,
+                                          ASCII_T, ASCII_Y, '\0' };
   static const XML_Char atypeENTITIES[] =
-      { 'E', 'N', 'T', 'I', 'T', 'I', 'E', 'S', '\0' };
+      { ASCII_E, ASCII_N, ASCII_T, ASCII_I, ASCII_T, ASCII_I, ASCII_E, 
+        ASCII_S, '\0' };
   static const XML_Char atypeNMTOKEN[] = {
-      'N', 'M', 'T', 'O', 'K', 'E', 'N', '\0' };
+      ASCII_N, ASCII_M, ASCII_T, ASCII_O, ASCII_K, ASCII_E, ASCII_N, '\0'
};
   static const XML_Char atypeNMTOKENS[] = {
-      'N', 'M', 'T', 'O', 'K', 'E', 'N', 'S', '\0' };
+      ASCII_N, ASCII_M, ASCII_T, ASCII_O, ASCII_K, ASCII_E, ASCII_N, 
+      ASCII_S, '\0' };
   static const XML_Char notationPrefix[] = {
-      'N', 'O', 'T', 'A', 'T', 'I', 'O', 'N', '(', '\0' };
-  static const XML_Char enumValueSep[] = { '|', '\0' };
-  static const XML_Char enumValueStart[] = { '(', '\0' };
+      ASCII_N, ASCII_O, ASCII_T, ASCII_A, ASCII_T, ASCII_I, ASCII_O,
ASCII_N,
+      ASCII_LPAREN, '\0' };
+  static const XML_Char enumValueSep[] = { ASCII_PIPE, '\0' };
+  static const XML_Char enumValueStart[] = { ASCII_LPAREN, '\0' };
 
   /* save one level of indirection */
   DTD * const dtd = _dtd; 
@@ -3950,11 +3964,11 @@
                              0, parser))
           return XML_ERROR_NO_MEMORY;
         if (attlistDeclHandler && declAttributeType) {
-          if (*declAttributeType == XML_T('(')
-              || (*declAttributeType == XML_T('N')
-                  && declAttributeType[1] == XML_T('O'))) {
+          if (*declAttributeType == XML_T(ASCII_LPAREN)
+              || (*declAttributeType == XML_T(ASCII_N)
+                  && declAttributeType[1] == XML_T(ASCII_O))) {
             /* Enumerated or Notation type */
-            if (!poolAppendChar(&tempPool, XML_T(')'))
+            if (!poolAppendChar(&tempPool, XML_T(ASCII_RPAREN))
                 || !poolAppendChar(&tempPool, XML_T('\0')))
               return XML_ERROR_NO_MEMORY;
             declAttributeType = tempPool.start;
@@ -3987,11 +4001,11 @@
                              declAttributeIsCdata, XML_FALSE, attVal,
parser))
           return XML_ERROR_NO_MEMORY;
         if (attlistDeclHandler && declAttributeType) {
-          if (*declAttributeType == XML_T('(')
-              || (*declAttributeType == XML_T('N')
-                  && declAttributeType[1] == XML_T('O'))) {
+          if (*declAttributeType == XML_T(ASCII_LPAREN)
+              || (*declAttributeType == XML_T(ASCII_N)
+                  && declAttributeType[1] == XML_T(ASCII_O))) {
             /* Enumerated or Notation type */
-            if (!poolAppendChar(&tempPool, XML_T(')'))
+            if (!poolAppendChar(&tempPool, XML_T(ASCII_RPAREN))
                 || !poolAppendChar(&tempPool, XML_T('\0')))
               return XML_ERROR_NO_MEMORY;
             declAttributeType = tempPool.start;
@@ -4318,14 +4332,14 @@
       }
       break;
     case XML_ROLE_GROUP_SEQUENCE:
-      if (groupConnector[prologState.level] == '|')
+      if (groupConnector[prologState.level] == ASCII_PIPE)
         return XML_ERROR_SYNTAX;
-      groupConnector[prologState.level] = ',';
+      groupConnector[prologState.level] = ASCII_COMMA;
       if (dtd->in_eldecl && elementDeclHandler)
         handleDefault = XML_FALSE;
       break;
     case XML_ROLE_GROUP_CHOICE:
-      if (groupConnector[prologState.level] == ',')
+      if (groupConnector[prologState.level] == ASCII_COMMA)
         return XML_ERROR_SYNTAX;
       if (dtd->in_eldecl
           && !groupConnector[prologState.level]
@@ -4337,7 +4351,7 @@
         if (elementDeclHandler)
           handleDefault = XML_FALSE;
       }
-      groupConnector[prologState.level] = '|';
+      groupConnector[prologState.level] = ASCII_PIPE;
       break;
     case XML_ROLE_PARAM_ENTITY_REF:
 #ifdef XML_DTD
@@ -5267,7 +5281,7 @@
   DTD * const dtd = _dtd;  /* save one level of indirection */
   const XML_Char *name;
   for (name = elementType->name; *name; name++) {
-    if (*name == XML_T(':')) {
+    if (*name == XML_T(ASCII_COLON)) {
       PREFIX *prefix;
       const XML_Char *s;
       for (s = elementType->name; s != name; s++) {
@@ -5314,12 +5328,12 @@
     poolFinish(&dtd->pool);
     if (!ns)
       ;
-    else if (name[0] == XML_T('x')
-        && name[1] == XML_T('m')
-        && name[2] == XML_T('l')
-        && name[3] == XML_T('n')
-        && name[4] == XML_T('s')
-        && (name[5] == XML_T('\0') || name[5] == XML_T(':'))) {
+    else if (name[0] == XML_T(ASCII_x)
+        && name[1] == XML_T(ASCII_m)
+        && name[2] == XML_T(ASCII_l)
+        && name[3] == XML_T(ASCII_n)
+        && name[4] == XML_T(ASCII_s)
+        && (name[5] == XML_T('\0') || name[5] == XML_T(ASCII_COLON))) {
       if (name[5] == XML_T('\0'))
         id->prefix = &dtd->defaultPrefix;
       else
@@ -5330,7 +5344,7 @@
       int i;
       for (i = 0; name[i]; i++) {
         /* attributes without prefix are *not* in the default namespace
*/
-        if (name[i] == XML_T(':')) {
+        if (name[i] == XML_T(ASCII_COLON)) {
           int j;
           for (j = 0; j < i; j++) {
             if (!poolAppendChar(&dtd->pool, name[j]))
@@ -5352,7 +5366,7 @@
   return id;
 }
 
-#define CONTEXT_SEP XML_T('\f')
+#define CONTEXT_SEP XML_T(ASCII_FF)
 
 static const XML_Char *
 getContext(XML_Parser parser)
@@ -5364,7 +5378,7 @@
   if (dtd->defaultPrefix.binding) {
     int i;
     int len;
-    if (!poolAppendChar(&tempPool, XML_T('=')))
+    if (!poolAppendChar(&tempPool, XML_T(ASCII_EQUALS)))
       return NULL;
     len = dtd->defaultPrefix.binding->uriLen;
     if (namespaceSeparator)
@@ -5390,7 +5404,7 @@
     for (s = prefix->name; *s; s++)
       if (!poolAppendChar(&tempPool, *s))
         return NULL;
-    if (!poolAppendChar(&tempPool, XML_T('=')))
+    if (!poolAppendChar(&tempPool, XML_T(ASCII_EQUALS)))
       return NULL;
     len = prefix->binding->uriLen;
     if (namespaceSeparator)
@@ -5442,7 +5456,7 @@
       context = s;
       poolDiscard(&tempPool);
     }
-    else if (*s == XML_T('=')) {
+    else if (*s == XML_T(ASCII_EQUALS)) {
       PREFIX *prefix;
       if (poolLength(&tempPool) == 0)
         prefix = &dtd->defaultPrefix;




----------------------------------------------------------------------

Comment By: Scott Klement (klemscot)
Date: 2007-05-07 02:26

Message:
Logged In: YES 
user_id=223473
Originator: YES

I suspect that the problem is due to the fact that I'm running Expat on an
EBCDIC-based system.  In most cases, Expat uses the macros defined in
ascii.h, which work correctly, as they explicitly specify a hex code point.
 However, there are places where Expat will reference the character
directly, expecting that the hex code point of the document will match the
hex code point of a character typed into the source code.  On an ASCII
system, they match, on an EBCDIC system they don't.  


----------------------------------------------------------------------

Comment By: Scott Klement (klemscot)
Date: 2007-05-06 17:10

Message:
Logged In: YES 
user_id=223473
Originator: YES

I suspect that the problem is due to the fact that I'm running Expat on an
EBCDIC-based system.  In most cases, Expat uses the macros defined in
ascii.h, which work correctly, as they explicitly specify a hex code point.
 However, there are places where Expat will reference the character
directly, expecting that the hex code point of the document will match the
hex code point of a character typed into the source code.  On an ASCII
system, they match, on an EBCDIC system they don't.  

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2007-05-06 11:01

Message:
Logged In: YES 
user_id=290026
Originator: NO

Closing this as "Works for me", no follow-up from original poster.

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2007-04-28 21:36

Message:
Logged In: YES 
user_id=290026
Originator: NO

Just tried with expat 2.0.0 and had no problems.
Maybe you are picking up an old version that is still installed on your
system?

----------------------------------------------------------------------

Comment By: Scott Klement (klemscot)
Date: 2007-04-28 18:52

Message:
Logged In: YES 
user_id=223473
Originator: YES

Version 2.0.0.

----------------------------------------------------------------------

Comment By: Karl Waclawek (kwaclaw)
Date: 2007-04-28 10:37

Message:
Logged In: YES 
user_id=290026
Originator: NO

I have no issues parsing this document with Expat CVS head. What version
of Expat are you using?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=110127&aid=1690883&group_id=10127


More information about the Expat-bugs mailing list