Preventing control characters from entering an XML file
Frank Niessink
frank at niessink.com
Thu Jan 5 16:52:31 EST 2006
Scott David Daniels wrote:
> Frank Niessink wrote:
>
>>- What is the easiest/most pythonic (preferably built-in) way of
>>checking a unicode string for control characters and weeding those
>>characters out?
>
>
> drop_controls = [None] * 0x20
> for c in '\t\r\n':
> drop_controls[c] = unichr(c)
> ...
> some_unicode_string = some_unicode_string.translate(drop_controls)
Hi Scott,
Your code gave me a "TypeError: an integer is required". Anyway, it was
sufficient to push me in the right direction. This is my version:
UNICODE_CONTROL_CHARACTERS_TO_WEED = {}
for ordinal in range(0x20):
    if chr(ordinal) not in '\t\r\n':
        UNICODE_CONTROL_CHARACTERS_TO_WEED[ordinal] = None
Which lets you do:
>>> u'T\x04est\x09'.translate(UNICODE_CONTROL_CHARACTERS_TO_WEED)
u'Test\t'
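
For readers on Python 3, where unichr() and the u'' prefix are no longer needed, the same idea might look roughly like the sketch below. The names KEEP, CONTROL_CHARS_TO_DROP and strip_control_chars are made up for illustration and are not from the thread. The second half is one guess at how the list-based suggestion could be made to work: indexing the list with ord(c) instead of the character itself avoids the TypeError.

```python
# Python 3 sketch; all names here are illustrative, not from the original post.
KEEP = '\t\r\n'  # control characters that should survive

# Dict-based table: map every unwanted control ordinal to None (= delete).
CONTROL_CHARS_TO_DROP = {
    ordinal: None
    for ordinal in range(0x20)
    if chr(ordinal) not in KEEP
}


def strip_control_chars(text):
    """Return text with control characters below 0x20 removed, except KEEP."""
    return text.translate(CONTROL_CHARS_TO_DROP)


# List-based table: a list must be indexed by integers, hence ord(c).
# Ordinals beyond the end of the list raise IndexError, which
# str.translate treats as "leave this character unchanged".
drop_controls = [None] * 0x20
for c in KEEP:
    drop_controls[ord(c)] = c

print(repr(strip_control_chars('T\x04est\x09')))
print(repr('T\x04est\x09'.translate(drop_controls)))
```

Both tables only cover ordinals below 0x20, as in the code above; anything else passes through untouched.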
Thanks, Frank