[XML-SIG] Attribute value normalization
Martin v. Loewis
martin@loewis.home.cs.tu-berlin.de
Thu, 2 Aug 2001 08:44:43 +0200
> What does your system produce when trying "eg3" ?
The output looks like this on the screen:
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE doc>
<doc>
rm ' "/>
rmN B '/>' A
rmI ' "/>
</doc>
but it actually is this:
0000000 < ? x m l v e r s i o n = ' 1
0000020 . 0 ' e n c o d i n g = ' U T
0000040 F - 8 ' ? > \n < ! D O C T Y P E
0000060 d o c > \n < d o c > \n < n
0000100 o r m a t t r = " '
0000120 \r & # 1 0 ; \t ' " / > \n
0000140 < n o r m N a m e s a t t
0000160 r = ' A \r & # 1 0
0000200 ; \t B ' / > \n <
0000220 n o r m I d i d = " '
0000240 \r & # 1 0 ; \t ' " / >
0000260 \n < / d o c > \n
0000270
> Anyhow, I expect multiple spaces collapsed into one.
What specific text in the XML recommendation makes you expect this?
Perhaps this:
# If the attribute type is not CDATA, then the XML processor must
# further process the normalized attribute value ... by replacing
# sequences of space (#x20) characters by a single space (#x20)
# character.
Then pyexpat behaves correctly, as far as I can tell:
# All attributes for which no declaration has been read should be
# treated by a non-validating processor as if declared CDATA.
Since your documents contain no attribute declarations, all attributes
are treated as CDATA, and therefore spaces must *not* be collapsed.
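
To see the difference for yourself, here is a minimal sketch using
xml.parsers.expat. The helper name parse_attrs, the element name "e" and
the two sample documents are made up for illustration and are not your
test files. The first document declares nothing; the second declares the
attribute as NMTOKENS in the internal subset, which is the case where the
further normalization step quoted above should apply.

```python
import xml.parsers.expat


def parse_attrs(document):
    """Return {element name: attribute dict} as reported by expat."""
    seen = {}
    parser = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        # attrs holds the attribute values after the parser's normalization
        seen[name] = attrs

    parser.StartElementHandler = start
    parser.Parse(document, True)
    return seen


# No ATTLIST declaration: the attribute is treated as CDATA.
undeclared = "<doc><e a='  A   B  '/></doc>"

# Same content, but the attribute is declared with a tokenized type.
declared = (
    "<!DOCTYPE doc [<!ATTLIST e a NMTOKENS #IMPLIED>]>"
    "<doc><e a='  A   B  '/></doc>"
)

if __name__ == "__main__":
    print(repr(parse_attrs(undeclared)["e"]["a"]))
    print(repr(parse_attrs(declared)["e"]["a"]))
```

If I read the recommendation correctly, the first value should come back
with all of its spaces intact, and only the second one should have
leading and trailing spaces dropped and inner runs collapsed.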
Regards,
Martin