It seems that Python treats the non-breaking space (\xa0) as a normal whitespace character, e.g. when splitting a string. See below:

    >>> s = 'hello\xa0there'
    >>> s.split()
    ['hello', 'there']

Surely this is not intended behaviour?
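For comparison, here is a minimal sketch, assuming Python 3 (where str is Unicode), of how one might check how a character is classified and split on ASCII whitespace only, so that \xa0 stays inside its token. The helper name split_ascii_whitespace is hypothetical and not part of the standard library.

```python
import re

# Hypothetical helper: split only on ASCII whitespace, leaving U+00A0 intact.
_ASCII_WS = re.compile(r'[ \t\n\r\f\v]+')

def split_ascii_whitespace(text):
    # Drop empty strings so leading/trailing whitespace is ignored,
    # similar to how str.split() with no arguments behaves.
    return [part for part in _ASCII_WS.split(text) if part]

s = 'hello\xa0there general kenobi'

# str.isspace() reports whether Python classifies the character as whitespace.
print('\xa0'.isspace())

# Default split: breaks on every character Python classifies as whitespace.
print(s.split())

# ASCII-only split: the non-breaking space is kept inside the first token.
print(split_ascii_whitespace(s))
```

With this approach the non-breaking space is treated as ordinary text, which may be closer to what I expected from split().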