Preventing control characters from entering an XML file

Scott David Daniels scott.daniels at acm.org
Sun Jan 1 18:37:55 EST 2006


Frank Niessink wrote:
> ...
> Character Range
> Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | 
> [#x10000-#x10FFFF]"
> 
> - What is the easiest/most pythonic (preferably build-in) way of 
> checking a unicode string for control characters and weeding those 
> characters out?

     drop_controls = [None] * 0x20
     for c in '\t\r\n':
         drop_controls[c] = unichr(c)
     ...
     some_unicode_string = some_unicode_string.translate(drop_controls)

--Scott David Daniels
scott.daniels at acm.org



More information about the Python-list mailing list