[Tutor] How to find a word in a string - nearly correct
Cameron Simpson
cs at cskk.id.au
Tue May 4 18:02:43 EDT 2021
On 04May2021 12:09, Phil <phillor9 at gmail.com> wrote:
>On 4/5/21 11:20 am, Mats Wichmann wrote:
>>On 5/3/21 7:09 PM, Phil wrote:
>import re
>
>result = re.search(r'\b' + 'this' + '\W', test)
You can write that in one go like this:
result = re.search(r'\bthis\W', test)
OTOH, keeping them separate may aid debugging.
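As a rough sketch of why separate pieces can help: if the word lives in a variable you can print the assembled pattern before using it. The name "word" and the use of re.escape() are my own additions here, not something from your code.

```python
import re

# Hypothetical variable holding the word to search for.
word = 'this'

# Build the pattern from parts; raw strings keep the backslashes intact,
# and re.escape() guards against regexp metacharacters in the word.
pattern = r'\b' + re.escape(word) + r'\W'

# Printing the assembled pattern is a cheap debugging aid.
print(pattern)
```

For a plain word like 'this' the assembled pattern is the same text as the one-piece version above.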
>The output is 'this,' ,which is based on a white-space between words
>rather than punctuation. The search continues.
That's because you included the trailing "nonword" character in your
regexp. Had you considered that "\b" matches a word boundary at the
start _or_ the end?
\bthis\b
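A small sketch comparing the two patterns. The sample text is made up, since your message doesn't show what "test" contains.

```python
import re

# Hypothetical sample text; the original post does not show the value of `test`.
test = "Is this, perhaps, the word?"

# The trailing \W consumes the comma, so it ends up in the match.
with_nonword = re.search(r'\bthis\W', test)
print(repr(with_nonword.group()))   # 'this,'

# \b only asserts a boundary and consumes nothing.
with_boundary = re.search(r'\bthis\b', test)
print(repr(with_boundary.group()))  # 'this'

# A longer word that merely starts with "this" does not match.
print(re.search(r'\bthis\b', "thistle"))  # None
```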
Bear in mind that "word" is a lexical idea in regexps and may not be
meaningful in non-English settings.
For reference, the special flavour of regexps in Python is documented (at
length, alas) here:
https://docs.python.org/3/library/re.html#regular-expression-syntax
The other thing to keep in mind is that if you're matching something
dependent on the surrounding context you can:
Use a group:
\W(this)\W
and access the grouped part, eg in the above result.group(1) is "this".
Use a look-ahead or look-behind match:
(?<!\w)this(?!\w)
These are tests, and are not included in the matched text. The leading
one tests for _not_ matching a "word character \w" before the text and
the trailing one tests for _not_ matching a word character after the
text. But only the stuff in between is part of the result ("this").
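Here is a sketch of both approaches side by side, again with made-up sample text. One difference worth noticing: \W has to consume a real character, so the group version misses a word at the very start or end of the string, while the look-around version does not.

```python
import re

# Hypothetical sample text, not from the original post.
test = "Is this, perhaps, the word?"

# Group: the whole match includes the surrounding nonword characters,
# but group(1) is just the parenthesised part.
grouped = re.search(r'\W(this)\W', test)
print(repr(grouped.group(0)))  # ' this,'
print(repr(grouped.group(1)))  # 'this'

# Look-behind and look-ahead: only the word itself is matched.
lookaround = re.search(r'(?<!\w)this(?!\w)', test)
print(repr(lookaround.group()))  # 'this'

# With the word at the start of the string there is nothing for the
# leading \W to consume, so the group pattern finds no match.
at_start_grouped = re.search(r'\W(this)\W', "this is it")
at_start_lookaround = re.search(r'(?<!\w)this(?!\w)', "this is it")
print(at_start_grouped)                    # None
print(repr(at_start_lookaround.group()))   # 'this'
```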
You can see this can get pretty complex pretty fast. But \b for a "word
boundary" is very useful.
Cheers,
Cameron Simpson <cs at cskk.id.au>