xml input sanitizing method in standard lib?

Petr Muller afri at afri.cz
Mon Mar 9 18:30:31 CET 2009


> > Is there some method provided in python standard library to sanitize
> > strings used as input to xml documents? (=remove form-feeds and whatever
> > else). I've searched docs and google, found only 4Suite project. I
> > cannot rely on something not in standard lib, so I'm wondering if I've
> > just overlooked something in the docs to this (imo) important task...

> What do you mean by "sanitize strings used as input to xml
> documents"? 
> Do you have a string that you're going to parse as an XML document? Either  
> it is valid XML, or not. I would not "sanitize" it, you risk changing the  
> data inside.

Thanks for response and sorry for I wasn't clear first time. I have a
heap of data (logs), from which I build a XML document using
xml.dom.minidom. In this data, some xml invalid characters may occur -
form feed (\x0c) character is one example. 

I don't know what else is illegal in xml, so I've searched if there's
some method how to prepare strings for insertion to a xml doc before I
start research on a xml spec and write such function on my own.


More information about the Python-list mailing list