[Tutor] How to find a word in a string - nearly correct

Cameron Simpson cs at cskk.id.au
Tue May 4 18:02:43 EDT 2021


On 04May2021 12:09, Phil <phillor9 at gmail.com> wrote:
>On 4/5/21 11:20 am, Mats Wichmann wrote:
>>On 5/3/21 7:09 PM, Phil wrote:
>import re
>
>result = re.search(r'\b' + 'this' + '\W', test)

You can write that in one go like this:

    result = re.search(r'\bthis\W', test)

OTOH, keeping them separate may aid debugging.
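
A small sketch of the "separate pieces" form, in case it helps. The name `word` is my own placeholder, not something from your code, and the re.escape() call is an optional extra for search words containing regexp punctuation. Note that each backslash piece is a raw string; a plain '\W' draws an invalid-escape warning in recent Pythons.

```python
import re

# "word" is a hypothetical variable name, used only for illustration.
word = 'this'

# Build the pattern from separate pieces; every piece with a backslash
# is a raw string so Python does not try to interpret the escape itself.
pattern = r'\b' + re.escape(word) + r'\W'

# For a plain alphabetic word this is the same text as the one-go form.
print(pattern)
print(pattern == r'\bthis\W')
```

Printing the assembled pattern like this is often the quickest way to see what the regexp engine is really being given.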

>The output is 'this,' ,which is based on a white-space between words 
>rather than punctuation. The search continues.

That's because you included the trailing "nonword" character in your 
regexp. Had you considered that "\b" matches a word boundary at the 
start _or_ the end?

    \bthis\b

Bear in mind that "word" is a lexical idea in regexps (a run of "\w" 
characters: letters, digits and underscore) and may not be meaningful 
in non-English settings.
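
To make the difference concrete, here is a minimal comparison of the two patterns. The sample text is made up, since your post doesn't show what `test` contains.

```python
import re

# Hypothetical sample text; the original "test" string was not posted.
test = "Is this, or is it not, this?"

# A trailing \W consumes the character after the word...
with_nonword = re.search(r'\bthis\W', test)
# ...while a trailing \b only asserts a boundary and consumes nothing.
with_boundary = re.search(r'\bthis\b', test)

print(repr(with_nonword.group()))   # 'this,'
print(repr(with_boundary.group()))  # 'this'

# \W needs an actual character to match, so it fails at the end of a string;
# \b is satisfied there.
print(re.search(r'\bthis\W', "like this"))          # None
print(re.search(r'\bthis\b', "like this").group())  # this

# Neither pattern is fooled by a longer word.
print(re.search(r'\bthis\b', "thistle"))            # None
```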

For reference, the special flavour of regexps in Python is documented (at 
length, alas) here:

    https://docs.python.org/3/library/re.html#regular-expression-syntax

The other thing to keep in mind is that if you're matching something 
dependent on the surrounding context you can:

Use a group:

    \W(this)\W

and access the grouped part, eg in the above result.group(1) is "this".
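
A short sketch of the group approach, again with a made-up sample string. It also shows the catch: "\W" must match a real character on each side, so a word at the very start or end of the string is missed.

```python
import re

# Hypothetical sample text, for illustration only.
test = "Is this, or is it not, this?"

result = re.search(r'\W(this)\W', test)

# group(0) is the whole match, including the surrounding nonword characters.
print(repr(result.group(0)))  # ' this,'
# group(1) is just the parenthesised part.
print(repr(result.group(1)))  # 'this'
# span(1) gives the position of the grouped part within the string.
print(result.span(1))         # (3, 7)

# No character precedes "this" here, so the leading \W cannot match.
print(re.search(r'\W(this)\W', "this is it"))  # None
```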

Use a look-ahead or look-behind match:

    (?<!\w)this(?!\w)

These are tests, and are not included in the matched text. The leading 
one tests for _not_ matching a "word character \w" before the text and 
the trailing one tests for _not_ matching a word character after the 
text. But only the stuff in between is part of the result ("this").
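
Here is a quick sketch of the look-behind/look-ahead form run against a few made-up strings. The sample list and the helper name `find_this` are mine, purely for illustration.

```python
import re

# Negative look-behind and look-ahead: no word character on either side.
pattern = re.compile(r'(?<!\w)this(?!\w)')

def find_this(text):
    """Return the matched text, or None if "this" is not a standalone word."""
    m = pattern.search(text)
    return m.group() if m else None

# Hypothetical sample strings.
print(find_this("this"))              # this   (start and end of string are fine)
print(find_this("Is this, or not?"))  # this   (the comma is not in the match)
print(find_this("thistle"))           # None   (followed by a word character)
print(find_this("not_this"))          # None   (underscore counts as \w)
```

Unlike the group form, this matches at the very start or end of the string, because the tests only require that no word character be present there.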

You can see this can get pretty complex pretty fast. But \b for a "word 
boundary" is very useful.

Cheers,
Cameron Simpson <cs at cskk.id.au>

