schizophrenic view of what is white space
tjreedy at udel.edu
Thu Dec 4 20:37:58 CET 2008
> Robin Becker wrote:
>> Jean-Paul Calderone wrote:
>>> You have to give the re module an additional hint that you care about
>>> exarkun@charm:~$ python
>>> Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52) [GCC 4.2.3 (Ubuntu
>>> 4.2.3-2ubuntu7)] on linux2
>>> Type "help", "copyright", "credits" or "license" for more information.
>>> >>> import re
>>> >>> print re.compile(r'\s').search(u'a\xa0b')
>>> None
>>> >>> print re.compile(r'\s', re.U).search(u'a\xa0b')
>>> <_sre.SRE_Match object at 0xb7dbb3a0>
>> so the default behaviour differs for unicode and re working on
>> unicode. I suppose that won't be true in Python 3.
> I'm not sure why the Unicode flag is needed in the API. I reckon that it
> should just look at the text that the regular expression is being
> applied to: if it's Unicode then follow the Unicode rules, if not then
I presume it is because \b is interpreted and replaced when the re is
compiled into its internal state machine form, before it ever sees the
text it will be applied to.
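
For what it is worth, here is a minimal sketch of how the same check
looks in Python 3, where str patterns follow the Unicode rules by
default and re.ASCII is the opt-out. The helper name matches_space and
the variable names are only illustrative, not anything from the re
module. It also looks at the flags attribute of each compiled pattern,
which suggests the choice is recorded at compile time rather than taken
from the text being searched.

```python
import re

NBSP = "\xa0"  # U+00A0 NO-BREAK SPACE, the character from the session above
text = "a" + NBSP + "b"


def matches_space(pattern, subject):
    """Return True if the compiled pattern finds a match in subject."""
    return pattern.search(subject) is not None


unicode_ws = re.compile(r"\s")          # str pattern: Unicode rules by default
ascii_ws = re.compile(r"\s", re.ASCII)  # str pattern limited to ASCII whitespace
bytes_ws = re.compile(rb"\s")           # bytes pattern: ASCII rules only

results = {
    "str default": matches_space(unicode_ws, text),
    "str re.ASCII": matches_space(ascii_ws, text),
    "bytes": matches_space(bytes_ws, b"a\xa0b"),
}

# The Unicode/ASCII choice is stored on the compiled pattern object.
flag_fixed = (
    bool(unicode_ws.flags & re.UNICODE),
    bool(ascii_ws.flags & re.UNICODE),
)

print(results)
print(flag_fixed)
```

Only the default str pattern reports a match on the no-break space; the
re.ASCII and bytes patterns do not.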