Python treats non-breaking space wrong?
Steven D'Aprano
steve at REMOVE-THIS-cybersource.com.au
Sat Jun 5 04:59:16 EDT 2010
On Sat, 05 Jun 2010 01:30:40 -0700, magnus.lycka at gmail.com wrote:
> It seems that Python treats non-breaking space (\xa0) as a normal
> whitespace character, e.g. when splitting a string. See below:
>
>>>> s='hello\xa0there'
>>>> s.split()
> ['hello', 'there']
>
> Surely this is not intended behaviour?
Yes it is.
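With no argument, str.split() splits on runs of whitespace, and \xa0
(U+00A0 NO-BREAK SPACE) is whitespace according to the Unicode standard.
To put it another way, str.split() is not a word-wrapping split: it has
no notion that a non-breaking space is meant to keep two words together.
This has been reported before, and rejected as won't fix.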
http://mail.python.org/pipermail/python-bugs-list/2006-January/031531.html
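A minimal sketch of the behaviour, assuming Python 3 (where str is Unicode; the thread above predates it). The first part shows why \xa0 is split on. The helper split_ascii_whitespace is hypothetical, not a standard-library function. It shows one way to split only on ASCII whitespace so that non-breaking spaces stay inside words.

```python
import re

s = 'hello\xa0there'

# U+00A0 NO-BREAK SPACE is classified as whitespace, so str.isspace() is True...
print('\xa0'.isspace())   # True
# ...and str.split() with no argument therefore splits on it.
print(s.split())          # ['hello', 'there']

# Hypothetical helper: split only on runs of ASCII whitespace characters.
_ASCII_WS = re.compile(r'[ \t\n\r\f\v]+')

def split_ascii_whitespace(text):
    # re.split leaves empty strings for leading/trailing separators;
    # drop them to mimic str.split() with no argument.
    return [part for part in _ASCII_WS.split(text) if part]

print(split_ascii_whitespace(s))  # ['hello\xa0there']
```

The explicit character class is used instead of `\s` because `\s` in a str pattern also matches Unicode whitespace, including \xa0.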
--
Steven