[XML-SIG] problem with elementtree 1.2.6
Fredrik Lundh
fredrik at pythonware.com
Thu Nov 29 00:33:08 CET 2007
Chris Withers wrote:
>> That's how escaping works, be it in XML, encodings, compression, whatever.
>
> Well yes and no. I'd expect escaping to work such that whatever we're
> dealing with can be round tripped, ie: parsed, serialized, parsed
> again, etc.
that's exactly how it works in ET, of course. you put Python strings in
the tree, the ET parsers and serializers take care of the rest.

    elem = ET.Element("tag")
    elem.text = value # ASCII or Unicode string
    ... write to disk ...
    ... read it back ...
    assert elem.text == value

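A minimal, runnable sketch of that round trip, assuming the standard-library `xml.etree.ElementTree` module rather than the standalone elementtree 1.2.6 package discussed in this thread. The in-memory `tostring`/`fromstring` pair stands in for writing to and reading from disk, and the sample `value` is made up for illustration:

```python
import xml.etree.ElementTree as ET

# hypothetical sample value: markup characters plus a non-ASCII character
value = 'a < b && "c" > d \u00e5'

elem = ET.Element("tag")
elem.text = value

# serialize to bytes (stand-in for "write to disk"); escaping happens here
data = ET.tostring(elem, encoding="utf-8")

# parse the bytes again (stand-in for "read it back"); unescaping happens here
elem2 = ET.fromstring(data)
roundtrip_ok = (elem2.text == value)
```

The serialized bytes contain `&lt;` and `&amp;`, but the caller never has to see or produce them; only the plain string goes in and comes out.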
>> You can read the SGML spec regarding CDATA.
>
> Not sure what that's supposed to mean. CDATA for me means stuff inside a
> <![CDATA[ ]]> section. _escape_cdata is used for everything inside any
> tag that isn't another tag.
cdata is character data; see
http://www.w3.org/TR/html401/types.html#h-6.2
that's not the same thing as a "CDATA section" (which is just one of
several ways to store character data in an XML file). how things are
stored doesn't matter; that's just a serialization detail:
http://www.w3.org/TR/xml-infoset/#omitted

    What is not in the Information Set

    6. Whether characters are represented by character references.
    19. The boundaries of CDATA marked sections.
    ...

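To illustrate the infoset point, here is a small sketch (again assuming the standard-library `xml.etree.ElementTree`; the three sample documents are invented for this example) showing that predefined entities, numeric character references, and a CDATA section all produce the same character data after parsing:

```python
import xml.etree.ElementTree as ET

# three serializations of the same character data
docs = [
    "<tag>a &lt; b &amp; c</tag>",        # predefined entities
    "<tag>a &#60; b &#38; c</tag>",       # numeric character references
    "<tag><![CDATA[a < b & c]]></tag>",   # CDATA section
]

# the parser hands back plain text; how it was stored is not preserved
texts = [ET.fromstring(doc).text for doc in docs]
```

All three entries in `texts` are the same string, so code working on the tree cannot tell which form the file used.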
> I and many others do not ;-) When writing content into an html template,
> that content often comes from other sources that spit out lumps of html.
> Being able to insert them without escaping is a common use case.
HTML might be similar to XML, but an XML parser cannot parse HTML, so
you cannot insert HTML fragments into an XML document without either
escaping them, or pre-processing them to make sure they're well-formed.
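As a rough illustration of why, the following sketch feeds a perfectly ordinary HTML fragment to an XML parser. It assumes the standard-library `xml.etree.ElementTree` (where parse failures raise `ET.ParseError`); the fragment itself is a made-up example:

```python
import xml.etree.ElementTree as ET

# valid HTML, but <br> is never closed, so this is not well-formed XML
html_fragment = "<p>line one<br>line two</p>"

try:
    ET.fromstring(html_fragment)
    parsed = True
except ET.ParseError:
    # the XML parser rejects the fragment (mismatched tag)
    parsed = False
```
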
if you want to insert literal XML fragments in an ET tree, use the XML
factory function:

    fragment = "<tag>...</tag>"
    elem.append(ET.XML(fragment))

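A short sketch contrasting the two cases, assuming the standard-library `xml.etree.ElementTree`; the element names and strings are illustrative only. Assigning a string to `.text` stores it as character data (escaped on output), while `ET.XML` parses the string into a real subelement:

```python
import xml.etree.ElementTree as ET

elem = ET.Element("div")

# a string assigned to .text is character data, so markup in it is escaped
elem.text = "<b>escaped</b>"

# ET.XML parses a well-formed fragment into an Element that can be appended
elem.append(ET.XML("<b>real element</b>"))

out = ET.tostring(elem, encoding="unicode")
# out == '<div>&lt;b&gt;escaped&lt;/b&gt;<b>real element</b></div>'
```
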
if you want to embed HTML fragments in an ET tree, use ElementTidy or
ElementSoup (or equivalent) to turn the fragment into properly nested
and properly namespaced XHTML.
if you want to do unstructured string handling, use a template library
or Python strings. don't use an XML library if you don't want to work
with XML.
> That's true, sometimes. That inserted lump may have come from a process
> which can only spit out perfect html fragments, in which case you're
> fine, or it may come from user input, in which case you're doomed but
> will likely have happy customers ;-)
the hackers will be happy, at least:
http://en.wikipedia.org/wiki/Cross_site_scripting
</F>