[Tutor] search and replace

Danny Yoo dyoo at hkn.eecs.berkeley.edu
Wed Mar 8 01:20:37 CET 2006


> > Slightly more robust is the '\b' boundary assertion pattern:
>
> And I never understood why there were two...
>
> Which begs the question: under what circumstance would we prefer
> to use \W instead of \b?


In terms of code:

######
>>> import re
>>> re.search(r'(\W\w+\W)', '||||hello|||').group(1)
'|hello|'
>>> re.search(r'(\b\w+\b)', '||||hello|||').group(1)
'hello'
######


'\b' is subtly different: it doesn't really capture whatever we're pattern
matching against.  It's a zero-width "assertion".  '\W', on the other
hand, has a meaning as any character that's not alphanumeric or underscore
('[^a-zA-Z0-9_]').

Hope this helps!



More information about the Tutor mailing list