schizophrenic view of what is white space
Terry Reedy
tjreedy at udel.edu
Thu Dec 4 14:37:58 EST 2008
MRAB wrote:
> Robin Becker wrote:
>> Jean-Paul Calderone wrote:
>> .........
>>>
>>> You have to give the re module an additional hint that you care about
>>> unicode:
>>>
>>> exarkun@charm:~$ python
>>> Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52) [GCC 4.2.3 (Ubuntu
>>> 4.2.3-2ubuntu7)] on linux2
>>> Type "help", "copyright", "credits" or "license" for more information.
>>> >>> import re
>>> >>> print re.compile(r'\s').search(u'a\xa0b')
>>> None
>>> >>> print re.compile(r'\s', re.U).search(u'a\xa0b')
>>> <_sre.SRE_Match object at 0xb7dbb3a0>
>>> >>>
>>>
>>> Jean-Paul
>> .......
>>
>> so the default behaviour differs for unicode and re working on
>> unicode. I suppose that won't be true in Python 3.
>
> I'm not sure why the Unicode flag is needed in the API. I reckon that it
> should just look at the text that the regular expression is being
> applied to: if it's Unicode then follow the Unicode rules, if not then
> don't.
I presume it is because \b (like \s and \w) is interpreted and replaced
when the re is compiled into its internal state-machine form, before the
string it will be applied to is known.
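For comparison, here is a minimal sketch of the Python 3 behaviour, assuming a Python 3 interpreter. Patterns compiled from str use Unicode rules by default, so \s matches U+00A0 just as str.isspace() does. re.ASCII opts back into the ASCII-only classes. The mode is still fixed at compile time and recorded in the pattern's flags, so a str pattern refuses bytes input outright instead of switching rules to suit the subject.

```python
import re

NBSP = "\xa0"  # U+00A0 NO-BREAK SPACE, the character from the example above

# str patterns use Unicode rules by default in Python 3.
unicode_ws = re.compile(r"\s")
# re.ASCII restores the ASCII-only meaning of \s, \w, \d and \b.
ascii_ws = re.compile(r"\s", re.ASCII)

print(bool(unicode_ws.search("a" + NBSP + "b")))  # True
print(bool(ascii_ws.search("a" + NBSP + "b")))    # False
print(NBSP.isspace())                             # True

# The rule set is part of the compiled pattern, visible in its flags.
print(bool(unicode_ws.flags & re.UNICODE))        # True
print(bool(ascii_ws.flags & re.UNICODE))          # False

# A str pattern cannot be applied to bytes at all; the pattern's type,
# not the subject's, decides which rules apply.
try:
    unicode_ws.search(b"a b")
except TypeError as exc:
    print("TypeError:", exc)
```

So Python 3 removes the mismatch between str.isspace() and \s, but it resolves the question at compile time from the pattern's type rather than at match time from the subject's.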