[XML-SIG] Normalized AttVals

Paul Prescod paul@prescod.net
Mon, 14 Dec 1998 17:55:52 -0600


John Day wrote:
> 
> Re: quoted attribute contents ("AttVal")
> When '>' is encountered e.g. <code op=">"> it is "normalized"
> to '&gt;', however, when '&' is encountered it is a fatal
> error e.g. <a href="www.zzz.com?a=1&b=3">

That's what the XML spec says.

AttValue ::=  '"' ([^<&"] | Reference)* '"'  
   |  "'" ([^<&'] | Reference)* "'" 

That means a literal "<" is never allowed in an attribute value, and "&"
is allowed only as the start of an entity or character reference.
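
You can check this against pyexpat directly. Here is a minimal sketch,
assuming Python 3 and the standard xml.parsers.expat module; the helper
names first_attrs and is_well_formed are mine, not part of any library:

```python
import xml.parsers.expat

def first_attrs(doc):
    """Parse doc with expat and return the first element's attributes."""
    found = []
    parser = xml.parsers.expat.ParserCreate()
    # The handler receives attribute values after references are expanded.
    parser.StartElementHandler = lambda name, attrs: found.append(attrs)
    parser.Parse(doc, True)  # True marks this as the final chunk of input
    return found[0]

def is_well_formed(doc):
    """Return False if expat rejects doc with a well-formedness error."""
    try:
        first_attrs(doc)
    except xml.parsers.expat.ExpatError:
        return False
    return True

# A bare "&" that does not start a Reference is a fatal error.
print(is_well_formed('<a href="www.zzz.com?a=1&b=3"/>'))      # False
# The same URL with the ampersand written as a reference is accepted.
print(is_well_formed('<a href="www.zzz.com?a=1&amp;b=3"/>'))  # True
# ">" is not excluded by the AttValue production, "<" is.
print(is_well_formed('<code op=">"/>'))                       # True
print(is_well_formed('<code op="<"/>'))                       # False
```

Note that the application sees the expanded value: for the second
document first_attrs returns an href of www.zzz.com?a=1&b=3, so only the
markup changes, not the URL the program works with.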

> Is this pyexpat behavior correct? Why can't the parser tell that
> '&b' above is _not_ a defined entity because it is not terminated
> by ';'? 

That's what full SGML does, but that's not what XML does. XML is supposed
to be easier to implement.

> It seems to me that this usage could be normalized to
> '&amp;b', just like pyexpat did for '>'. Then it would be backward
> compatible with HTML (sort of).

There are several ways that it isn't backwards compatible with HTML.

> The impact of this seems to be enormous. All of the existing HTML
> parameter generators will have to change the way they post arguments,
> when HTML is replaced by XML, right?

This has been a known problem for a long time.

http://www.uni-ulm.de/uni/fak/natwis/strudo/ampersand.html
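
For generators, the fix is to escape the ampersand when the URL is
written into the attribute. A small sketch, assuming Python 3 and the
standard xml.sax.saxutils.quoteattr function; the anchor helper is a
hypothetical name used only for illustration:

```python
from xml.sax.saxutils import quoteattr

def anchor(url):
    """Return an <a> start tag with url quoted and escaped as its href."""
    # quoteattr escapes "&", "<" and ">" and adds the surrounding quotes.
    return "<a href=%s>" % quoteattr(url)

print(anchor("www.zzz.com?a=1&b=3"))
# prints: <a href="www.zzz.com?a=1&amp;b=3">
```

The parser expands &amp; back to "&" before the application sees the
value, so the query string that is eventually sent is unchanged.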

 Paul Prescod  - ISOGEN Consulting Engineer speaking for only himself
 http://itrc.uwaterloo.ca/~papresco

"Sports utility vehicles are gated communities on wheels" - Anon