Preventing control characters from entering an XML file

Frank Niessink frank at
Thu Jan 5 22:52:31 CET 2006

Scott David Daniels wrote:
> Frank Niessink wrote:
>>- What is the easiest/most pythonic (preferably built-in) way of 
>>checking a unicode string for control characters and weeding those 
>>characters out?
>      drop_controls = [None] * 0x20
>      for c in '\t\r\n':
>          drop_controls[c] = unichr(c)
>      ...
>      some_unicode_string = some_unicode_string.translate(drop_controls)

Hi Scott,

Your code gave me a "TypeError: an integer is required". Anyway, it was 
sufficient to push me in the right direction. This is my version:

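UNICODE_CONTROL_CHARACTERS_TO_WEED = {}
for ordinal in range(0x20):
    if chr(ordinal) not in '\t\r\n':
        UNICODE_CONTROL_CHARACTERS_TO_WEED[ordinal] = None  # None means: delete
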
Which lets you do:

 >>> u'T\x04est\x09'.translate(UNICODE_CONTROL_CHARACTERS_TO_WEED)
 u'Test\t'
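
For the archives, here is a rough sketch of the same idea wrapped in a small helper. The names KEPT_WHITESPACE, CONTROL_CHARACTERS_TO_WEED and weed_control_characters are made up for illustration. It also includes a list-based table along the lines of Scott's suggestion; as far as I can tell the TypeError came from passing the character itself to unichr(), so the sketch assumes ord(c) is what was meant for the index.

```python
# Hypothetical names, for illustration only.
KEPT_WHITESPACE = u'\t\r\n'

# Dict variant: every ordinal below 0x20 except tab, CR and LF maps to
# None, which tells translate() to delete that character.
CONTROL_CHARACTERS_TO_WEED = dict(
    (ordinal, None)
    for ordinal in range(0x20)
    if ordinal not in [ord(c) for c in KEPT_WHITESPACE])

# List variant: index the table by ord(c), not by the character itself.
# Ordinals past the end of the list are left untouched by translate().
drop_controls = [None] * 0x20
for c in KEPT_WHITESPACE:
    drop_controls[ord(c)] = c


def weed_control_characters(text, table=CONTROL_CHARACTERS_TO_WEED):
    """Return text with the control characters in table removed."""
    return text.translate(table)
```

With either table, weed_control_characters(u'T\x04est\x09') gives u'Test\t': the \x04 is dropped and the tab is kept.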

Thanks, Frank
