split on NO-BREAK SPACE
wildemar at freakmail.de
Sun Jul 22 21:27:23 CEST 2007
Peter Kleiweg wrote:
> Define white space to isspace()
Explain that phrase.
> Here is another "space":
> >>> u'\uFEFF'.isspace()
> isspace() is inconsistent
I don't really know much about Unicode, but Google tells me that \uFEFF
is a byte order mark. I thought we were implicitly in agreement that
"whitespace" (whatever the formal definition) means "the stuff we put
into text to visually separate words".
So what is *your* definition of whitespace?
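To make the question concrete, here is a small sketch (Python 3, whereas this thread predates it and used u'' literals). It asks the unicodedata module for the name and general category of each character under discussion, along with what str.isspace() reports. The helper `describe` is hypothetical, written only for this illustration.

```python
import unicodedata

# Characters discussed in the thread, plus a plain space for comparison.
SAMPLES = {
    "SPACE": " ",
    "NO-BREAK SPACE": "\u00a0",
    "BYTE ORDER MARK": "\ufeff",
}

def describe(ch):
    """Return (Unicode name, general category, str.isspace()) for one character."""
    return unicodedata.name(ch), unicodedata.category(ch), ch.isspace()

for label, ch in SAMPLES.items():
    print(label, describe(ch))

# split() with no separator splits on the same characters isspace() accepts.
print("a\u00a0b".split())   # ['a', 'b']: NO-BREAK SPACE is category Zs
print("a\ufeffb".split())   # one token: U+FEFF is category Cf, not whitespace
```

In CPython, str.split() with no separator splits on exactly the characters for which isspace() is True, so the two agree with each other: U+00A0 (officially named NO-BREAK SPACE, category Zs) counts as whitespace, while U+FEFF (officially named ZERO WIDTH NO-BREAK SPACE, category Cf) does not.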
>>> Why does split() split when it says NO-BREAK?
>> Precisely. It says NO-BREAK. It doesn't say NO-SPLIT.
> That is a stupid answer.
I fail to see why you deem it a good idea to become insulting at this point.
It is a perfectly valid answer: NO-BREAK means "when wrapping text into
lines, do not break at this space".
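The distinction can be sketched with the standard textwrap module (Python 3; the sample text and width are arbitrary choices for illustration). In current CPython, textwrap only treats ASCII whitespace as a possible line break, so a NO-BREAK SPACE is never used as a break point, whereas split() still splits on it:

```python
import textwrap

# A NO-BREAK SPACE between "aaa" and "bbb", an ordinary space before "ccc".
text = "aaa\u00a0bbb ccc"

# Wrapping: only the ordinary space is a break opportunity,
# so "aaa" and "bbb" stay together on the first line.
lines = textwrap.wrap(text, width=8)
print(lines)          # ['aaa\xa0bbb', 'ccc']

# Splitting: NO-BREAK SPACE is whitespace as far as split() is concerned.
print(text.split())   # ['aaa', 'bbb', 'ccc']
```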
split(), however, does not wrap text; it /splits/ it (at whitespace
characters, as it happens). The NO-BREAK semantics have no meaning here.
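If you do want a split that leaves NO-BREAK SPACE alone, one option is a small regular expression. This is a minimal sketch, assuming Python 3 str input; `split_keep_nbsp` is a hypothetical helper, not a standard library function. For str patterns, CPython's \s matches the same Unicode whitespace that str.split() uses, so \S is everything else:

```python
import re

# One token = a run of characters that are either non-whitespace or U+00A0.
_TOKEN = re.compile(r"[\S\u00a0]+")

def split_keep_nbsp(text):
    """Hypothetical helper: like str.split(), but U+00A0 stays inside words."""
    return _TOKEN.findall(text)

print(split_keep_nbsp("a\u00a0b c"))  # ['a\xa0b', 'c']
print("a\u00a0b c".split())           # ['a', 'b', 'c']
```

Like str.split() with no argument, findall() never produces empty strings for leading or trailing whitespace. One difference: a string made up only of NO-BREAK SPACEs comes back as a single token instead of an empty list.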