
On Wed, 16 Feb 2022 at 00:55, Steven D'Aprano <steve@pearwood.info> wrote:
On Tue, Feb 15, 2022 at 05:39:33AM -0600, Tim Peters wrote:
([^s]|s(?!pam))*spam
Bingo. That pattern is easy enough to understand
You and I have very different definitions of the word "easy" :-)
(if not to invent the first time): we can chew up a character if it's not an "s", or if it is an "s" but one _not_ followed immediately by "pam".
It is times like this that I am reminded why I prefer to just call string.find("spam") :-)
Yeah, regexes always look terrible when they're used for simple examples :) But try matching a line that has (somewhere in it) the word "spam", then whitespace, then a number (or if you prefer: then a sequence of ASCII digits). It's easy to write "spam\s+[0-9]+" and not nearly as easy to write it with method calls. So it makes sense that, when you add a restriction like "the word spam must be the first instance of that in the line" (maybe not common with words, but it certainly would be if you're scanning for a colon or other separator), it should still be written that way. To be honest, I don't think I've ever used method calls for complicated parsing. It's just way too messy. Much easier to reach for a regex, sscanf pattern, or other tool - even if it's not technically perfect. ChrisA