schizophrenic view of what is white space

Terry Reedy tjreedy at udel.edu
Thu Dec 4 14:37:58 EST 2008


MRAB wrote:
> Robin Becker wrote:
>> Jean-Paul Calderone wrote:
>> .........
>>>
>>> You have to give the re module an additional hint that you care about
>>> unicode:
>>>
>>>  exarkun at charm:~$ python
>>>  Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52)
>>>  [GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
>>>  Type "help", "copyright", "credits" or "license" for more information.
>>>  >>> import re
>>>  >>> print re.compile(r'\s').search(u'a\xa0b')
>>>  None
>>>  >>> print re.compile(r'\s', re.U).search(u'a\xa0b')
>>>  <_sre.SRE_Match object at 0xb7dbb3a0>
>>>  >>>
>>>
>>> Jean-Paul
>> .......
>>
>> So the default behaviour of the unicode type differs from that of re
>> working on unicode. I suppose that won't be true in Python 3.
>
> I'm not sure why the Unicode flag is needed in the API. I reckon that it 
> should just look at the text that the regular expression is being 
> applied to: if it's Unicode then follow the Unicode rules, if not then 
> don't.

I presume it is because escapes such as \b are interpreted and replaced
when the re is compiled into its internal state-machine form, which
happens before the re ever sees the text it will be applied to.
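
A minimal sketch of that point, assuming Python 3 rather than the 2.5.2
session quoted above. There, str patterns follow Unicode rules by default
and re.ASCII opts back into ASCII-only matching. The names TEXT,
unicode_ws and ascii_ws are only illustrative. Either way the choice is
fixed on the compiled pattern, not taken from the text being searched:

```python
import re

# Assumption: Python 3, where str patterns use Unicode rules by default
# and re.ASCII selects ASCII-only rules instead.
TEXT = 'a\xa0b'  # U+00A0 NO-BREAK SPACE between 'a' and 'b'

unicode_ws = re.compile(r'\s')           # Unicode rules chosen at compile time
ascii_ws = re.compile(r'\s', re.ASCII)   # ASCII-only rules chosen at compile time

# The same text gives different results depending on the compiled pattern.
unicode_hit = unicode_ws.search(TEXT) is not None
ascii_hit = ascii_ws.search(TEXT) is not None

# The choice is recorded on the pattern object itself.
unicode_flag_set = bool(unicode_ws.flags & re.UNICODE)
ascii_flag_set = bool(ascii_ws.flags & re.ASCII)

print(unicode_hit, ascii_hit)
print(unicode_flag_set, ascii_flag_set)
```

Since both patterns are handed the identical string, the difference can
only come from what was decided when each one was compiled.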




More information about the Python-list mailing list