[Tutor] search and replace
Danny Yoo
dyoo at hkn.eecs.berkeley.edu
Wed Mar 8 01:20:37 CET 2006
> > Slightly more robust is the '\b' boundary assertion pattern:
>
> And I never understood why there were two...
>
> Which begs the question: under what circumstance would we prefer
> to use \W instead of \b?
In terms of code:
######
>>> import re
>>> re.search(r'(\W\w+\W)', '||||hello|||').group(1)
'|hello|'
>>> re.search(r'(\b\w+\b)', '||||hello|||').group(1)
'hello'
######
'\b' is subtly different: it doesn't really capture whatever we're pattern
matching against. It's a zero-width "assertion". '\W', on the other
hand, has a meaning as any character that's not alphanumeric or underscore
('[^a-zA-Z0-9_]').
Hope this helps!
More information about the Tutor
mailing list