xml input sanitizing method in standard lib?
afri at afri.cz
Mon Mar 9 18:30:31 CET 2009
> > Is there some method provided in python standard library to sanitize
> > strings used as input to xml documents? (=remove form-feeds and whatever
> > else). I've searched docs and google, found only 4Suite project. I
> > cannot rely on something not in standard lib, so I'm wondering if I've
> > just overlooked something in the docs to this (imo) important task...
> What do you mean by "sanitize strings used as input to xml
> Do you have a string that you're going to parse as an XML document? Either
> it is valid XML, or not. I would not "sanitize" it, you risk changing the
> data inside.
Thanks for response and sorry for I wasn't clear first time. I have a
heap of data (logs), from which I build a XML document using
xml.dom.minidom. In this data, some xml invalid characters may occur -
form feed (\x0c) character is one example.
I don't know what else is illegal in xml, so I've searched if there's
some method how to prepare strings for insertion to a xml doc before I
start research on a xml spec and write such function on my own.
More information about the Python-list