Python treats non-breaking space wrong?

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Sat Jun 5 04:59:16 EDT 2010


On Sat, 05 Jun 2010 01:30:40 -0700, magnus.lycka at gmail.com wrote:

> It seems that Python treats non-breaking space (\xa0) as a normal
> whitespace character, e.g. when splitting a string. See below:
> 
> >>> s='hello\xa0there'
> >>> s.split()
> ['hello', 'there']
> 
> Surely this is not intended behaviour?


Yes it is.

str.split() with no argument breaks on whitespace, and \xa0 is whitespace
according to the Unicode standard. To put it another way, str.split() is
not a word-wrapping split: it doesn't care whether a line break is allowed
at that character. This has been reported before, and rejected as a
won't-fix.

http://mail.python.org/pipermail/python-bugs-list/2006-January/031531.html
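
If you do want a split that leaves non-breaking spaces alone, you have to say so explicitly. Here is a minimal sketch, assuming Python 3 (where str is Unicode) and assuming that "normal" whitespace means only space, tab, newline, carriage return, form feed and vertical tab. The name split_keep_nbsp is made up for illustration; it is not part of the standard library.

```python
import re

# Assumption: only these ASCII characters count as word separators.
# U+00A0 (non-breaking space) is deliberately left out.
ASCII_WS = re.compile(r'[ \t\n\r\f\v]+')

def split_keep_nbsp(text):
    """Split on runs of ASCII whitespace, keeping \\xa0 inside words.

    Hypothetical helper: like str.split() with no argument, it drops
    the empty strings produced by leading or trailing whitespace.
    """
    return [piece for piece in ASCII_WS.split(text) if piece]

s = 'hello\xa0there world'

# Built-in behaviour: \xa0 is whitespace, so it is a split point.
print(s.split())

# Explicit separator set: \xa0 stays inside the first word.
print(split_keep_nbsp(s))
```

Passing an explicit separator, as in s.split(' '), also avoids splitting on \xa0, but it behaves differently on runs of spaces: each extra space yields an empty string in the result.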



-- 
Steven


