[XML-SIG] Attribute value normalization

Thu, 2 Aug 2001 08:44:43 +0200

> What does your system produce when trying "eg3" ?

The output looks like this on the screen:

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE doc>
<doc>
&#10;rm    ' "/>
&#10;rmN   B   '/>'   A
&#10;rmI   ' "/>
</doc>

but a character dump shows what it actually is (the \r characters
make the terminal overwrite part of each line):

0000000   <   ?   x   m   l       v   e   r   s   i   o   n   =   '   1
0000020   .   0   '       e   n   c   o   d   i   n   g   =   '   U   T
0000040   F   -   8   '   ?   >  \n   <   !   D   O   C   T   Y   P   E
0000060       d   o   c   >  \n   <   d   o   c   >  \n           <   n
0000100   o   r   m       a   t   t   r   =   "       '
0000120  \r   &   #   1   0   ;  \t               '       "   /   >  \n
0000140           <   n   o   r   m   N   a   m   e   s       a   t   t
0000160   r   =   '               A                  \r   &   #   1   0
0000200   ;  \t               B               '   /   >  \n           <
0000220   n   o   r   m   I   d       i   d   =   "       '
0000240      \r   &   #   1   0   ;  \t               '       "   /   >
0000260  \n   <   /   d   o   c   >  \n
0000270

> Anyhow, I expect multiple spaces collapsed into one.

What specific text in the XML recommendation makes you expect this?
Perhaps this:

# If the attribute type is not CDATA, then the XML processor must
# further process the normalized attribute value ... by replacing
# sequences of space (#x20) characters by a single space (#x20)
# character.

If so, pyexpat behaves correctly, as far as I can tell, because the
recommendation also says:

# All attributes for which no declaration has been read should be
# treated by a non-validating processor as if declared CDATA.

Since your documents contain no attribute declarations, all attributes
are treated as CDATA, and therefore spaces must *not* be collapsed.
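
To see the difference directly, here is a minimal sketch using Python's
xml.parsers.expat module (the interface to pyexpat). The helper
attr_value and the element and attribute names are made up for
illustration; the second document declares the attribute as NMTOKENS in
an internal subset, which is an assumption added for comparison and is
not something your test files contain.

```python
from xml.parsers import expat


def attr_value(document, attr_name):
    """Return the value expat reports for the first attribute attr_name."""
    found = []

    def start_element(name, attrs):
        # attrs is a dict mapping attribute names to normalized values
        if attr_name in attrs and not found:
            found.append(attrs[attr_name])

    parser = expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.Parse(document, True)
    return found[0]


# No declaration: a non-validating processor treats attr as CDATA.
undeclared = "<doc><e attr='  A   B  '/></doc>"

# Hypothetical variant: attr declared as NMTOKENS in the internal subset.
declared = (
    "<!DOCTYPE doc [<!ATTLIST e attr NMTOKENS #IMPLIED>]>"
    "<doc><e attr='  A   B  '/></doc>"
)

print(repr(attr_value(undeclared, "attr")))  # '  A   B  '
print(repr(attr_value(declared, "attr")))    # 'A B'
```

Only the declared, non-CDATA attribute has its runs of spaces collapsed
and its leading and trailing spaces dropped.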

Regards,
Martin