Preventing control characters from entering an XML file

Frank Niessink frank at niessink.com
Thu Jan 5 22:52:31 CET 2006


Scott David Daniels wrote:
> Frank Niessink wrote:
> 
>>- What is the easiest/most pythonic (preferably built-in) way of 
>>checking a unicode string for control characters and weeding those 
>>characters out?
> 
> 
>      drop_controls = [None] * 0x20
>      for c in '\t\r\n':
>          drop_controls[c] = unichr(c)
>      ...
>      some_unicode_string = some_unicode_string.translate(drop_controls)

Hi Scott,

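Your code gave me a "TypeError: an integer is required": the loop 
variable c is a one-character string, so it can be neither passed to 
unichr() nor used as a list index. Anyway, it was sufficient to push me 
in the right direction. This is my version: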

UNICODE_CONTROL_CHARACTERS_TO_WEED = {}
for ordinal in range(0x20):
     if chr(ordinal) not in '\t\r\n':
         UNICODE_CONTROL_CHARACTERS_TO_WEED[ordinal] = None

Which lets you do:

 >>> u'T\x04est\x09'.translate(UNICODE_CONTROL_CHARACTERS_TO_WEED)
u'Test\t'
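
For anyone on Python 3, here is a minimal sketch of the same idea. It 
assumes Python 3, where str is already Unicode and unichr() no longer 
exists; the helper name weed_control_characters is made up for 
illustration and is not part of any library.

```python
# Sketch for Python 3: map every control character below 0x20,
# except tab, carriage return and line feed, to None (None = delete).
CONTROL_CHARACTERS_TO_WEED = {
    ordinal: None
    for ordinal in range(0x20)
    if chr(ordinal) not in '\t\r\n'
}


def weed_control_characters(text):  # hypothetical helper name
    """Return text with the unwanted control characters removed."""
    return text.translate(CONTROL_CHARACTERS_TO_WEED)


print(repr(weed_control_characters('T\x04est\x09')))  # 'Test\t'
```

The table is built once and reused, so each call is a single 
translate() pass over the string.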

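The list-based table from the quoted code can probably be made to work 
too, by indexing with ord(c) instead of c. The following is only a 
sketch, written for Python 3 (an assumption; the original thread predates 
it), and it relies on translate() accepting any sequence indexed by code 
point:

```python
# Sequence-based table: index = code point, value = replacement.
# A value of None means "delete this character".
drop_controls = [None] * 0x20
for c in '\t\r\n':
    # ord(c) gives the integer index; the character maps to itself.
    drop_controls[ord(c)] = c

# Code points >= 0x20 fall outside the list; translate() treats the
# resulting IndexError as "leave this character unchanged".
result = 'T\x04est\x09 ok'.translate(drop_controls)
print(repr(result))  # 'Test\t ok'
```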

Thanks, Frank


