[Python-bugs-list] [ python-Bugs-440351 ] saxutils.escape needs to escape "quotes"

noreply@sourceforge.net noreply@sourceforge.net
Tue, 07 Aug 2001 10:20:46 -0700


Bugs item #440351, was opened at 2001-07-11 03:32
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=440351&group_id=5470

Category: XML
Group: None
>Status: Open
Resolution: Fixed
Priority: 5
Submitted By: Andrew Dalke (dalke)
Assigned to: Fred L. Drake, Jr. (fdrake)
>Summary: saxutils.escape needs to escape "quotes"

Initial Comment:
XML attributes containing a value with a double
quote are not properly escaped.  Consider

from xml.sax import saxutils
gen = saxutils.XMLGenerator()
gen.startDocument()
gen.startElement('spam', {'width': '12"'})

This produces

<?xml version="1.0" encoding="iso-8859-1"?>
<spam width="12"">

That second line should more likely be

<spam width="12&quot;">

or perhaps use the hex escape, which I think
is '&#x22;'.  But I'm not an XML guru so don't
trust me on either one!

                               Andrew Dalke
                               dalke@acm.org
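
For reference, the double quote is code point 34 (hex 22), so its
numeric character references are '&#34;' and '&#x22;', alongside the
predefined entity '&quot;'.  A minimal sketch, assuming a current
Python 3 interpreter, of asking escape() for the entity form through
its 'entities' argument:

```python
from xml.sax.saxutils import escape

# The double quote is code point 34, which is 22 in hexadecimal.
assert ord('"') == 0x22 == 34

# By default escape() only handles &, < and >, so the quote passes through.
plain = escape('12"')

# Extra replacements can be supplied through the entities mapping.
quoted = escape('12"', {'"': '&quot;'})

print(plain)   # 12"
print(quoted)  # 12&quot;
```

The caller still has to supply the surrounding quote characters when
writing the attribute by hand this way.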


----------------------------------------------------------------------

>Comment By: Andrew Dalke (dalke)
Date: 2001-08-07 10:20

Message:
Logged In: YES 
user_id=190903

So now there are two related functions in that module,

def escape(data, entities={}):
    """Escape &, <, and > in a string of data."""

def quoteattr(data, entities={}):
    """Escape and quote an attribute value."""

I'm fine with that.  I even think that's correct, since
I see two types of quotings.  What I'm curious about is why
XMLGenerator.startElement and startElementNS should use
'escape' (that is, remain unchanged) as compared to
using 'quoteattr' (the new function).

Here is the relevant code in startElement:

        for (name, value) in attrs.items():
            self._out.write(' %s="%s"' % (name, escape(value)))

Here's what I think it should be:

        for (name, value) in attrs.items():
            self._out.write(' %s=%s' % (name, quoteattr(value)))

(similar change for startElementNS() - characters() remains
unchanged.)

Consider this test case:

from xml.sax import saxutils 
gen = saxutils.XMLGenerator() 
gen.startDocument() 
gen.startElement('spam', {'width': '5\'3"'}) 

With the saxutils module straight out of CVS I get

<?xml version="1.0" encoding="iso-8859-1"?>
<spam width="5'3"">

That is not what I expected.  The second line should be

<spam width="5'3&quot;">

Now how do I get proper output using XMLGenerator?  I can't
quote the " myself, since the call to 'escape' escapes the &:

>>> gen.startElement('spam', {'width': '5\'3&quot;'})
<spam width="5'3&amp;quot;">

The only solution is to derive from XMLGenerator to make
it do the right thing.  So why shouldn't the right thing
be in XMLGenerator proper?  Or is there some other way I
can generically convert SAX startElement events to proper
XML?
                    Andrew
                    dalke@dalkescientific.com
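
The change proposed above can be sketched without touching
XMLGenerator's internals.  The class below is a hypothetical,
self-contained handler (the name QuotingWriter is an assumption, not
part of saxutils) that writes start tags with quoteattr() and
character data with escape(), assuming a current Python 3
interpreter:

```python
import io
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import escape, quoteattr


class QuotingWriter(ContentHandler):
    """Hypothetical handler that serialises SAX events to a stream."""

    def __init__(self, out):
        super().__init__()
        self._out = out

    def startElement(self, name, attrs):
        self._out.write('<' + name)
        for attr_name, value in attrs.items():
            # quoteattr() returns the value already wrapped in quotes,
            # so the format string adds none of its own.
            self._out.write(' %s=%s' % (attr_name, quoteattr(value)))
        self._out.write('>')

    def endElement(self, name):
        self._out.write('</%s>' % name)

    def characters(self, content):
        # Character data only needs &, < and > escaped.
        self._out.write(escape(content))


buf = io.StringIO()
writer = QuotingWriter(buf)
writer.startElement('spam', {'width': '5\'3"'})
writer.endElement('spam')
result = buf.getvalue()
print(result)  # <spam width="5'3&quot;"></spam>
```

Keeping the quote characters inside quoteattr() is what lets the
attribute loop stay a one-liner; characters() is left on escape(), as
suggested in the comment.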


----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2001-07-19 09:11

Message:
Logged In: YES 
user_id=3066

Fixed, but not in the way requested.  ;-)

I've added a new function to the saxutils module,
quoteattr().  It prepares an attribute value for inclusion
as part of markup by doing "just enough" escaping of quote
characters, and supplies the proper quote characters for the
escaping it actually did.
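
A short illustration of the "just enough" behaviour, as it can be
observed with the quoteattr() shipped in a current Python 3 (the
variable names are only for illustration):

```python
from xml.sax.saxutils import quoteattr

# No quotes in the value: wrap it in double quotes.
no_quotes = quoteattr('12')

# Only a double quote: switch to single-quote delimiters, no escaping.
double_only = quoteattr('12"')

# Both kinds of quote: keep double-quote delimiters, escape the inner ".
both = quoteattr('5\'3"')

# &, < and > are still escaped as in escape().
ampersand = quoteattr('a & b')

print(no_quotes)    # "12"
print(double_only)  # '12"'
print(both)         # "5'3&quot;"
print(ampersand)    # "a &amp; b"
```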

The addition was checked in as Lib/xml/sax/saxutils.py
revision 1.15, with corresponding documentation and test
updates.

----------------------------------------------------------------------
