
On Wed, 16 Feb 2022 at 09:28, Steven D'Aprano <steve@pearwood.info> wrote:
On Wed, Feb 16, 2022 at 01:02:44AM +1100, Chris Angelico wrote:
Yeah, regexes always look terrible when they're used for simple examples :) But try matching a line that has (somewhere in it) the word "spam", then whitespace, then a number (or if you prefer: then a sequence of ASCII digits). It's easy to write "spam\s+[0-9]+"
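For reference, here is a minimal sketch of that pattern using Python's re module. The helper name find_spam_number and the sample strings are made up for illustration; only the pattern itself comes from the message above. Note that [0-9] matches ASCII digits only, whereas \d in a Python 3 str pattern also matches other Unicode decimal digits.

```python
import re

# Pattern from the message: the word "spam", one or more whitespace
# characters, then one or more ASCII digits.
SPAM_NUMBER = re.compile(r"spam\s+[0-9]+")


def find_spam_number(line):
    """Return the matched text if the pattern occurs anywhere in line, else None.

    Hypothetical helper, named here only for illustration.
    """
    match = SPAM_NUMBER.search(line)  # search() scans the whole line
    return match.group(0) if match else None


if __name__ == "__main__":
    for sample in ["order 12: spam 42 eggs", "spam and eggs", "spam\t\t7"]:
        print(repr(sample), "->", repr(find_spam_number(sample)))
```

Using search() rather than match() is what gives the "somewhere in it" behaviour, since match() only anchors at the start of the string.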
After this thread, I no longer trust that "easy" regexes will do what they "obviously" look like they should do :-(
I'm not trying to be funny or snarky.
(That must be rare!)
I *thought* I had a reasonable understanding of regexes, and now I have learned that I don't, and that the regexes I've been writing don't do what I thought they did, and presumably the only reason they haven't blown up in my face (either performance-wise or by producing the wrong output) is blind luck.
Now I have *three* problems :-(
I think it's one of those cases where it normally doesn't matter that they don't technically do quite what you thought. Pretending that a regex matches in a simpler way than it actually does is like pretending that the earth is a sphere: technically wrong, but almost always close enough. It's only in the rare cases that it matters, and they usually only show up with the regexps that are so complicated that I wouldn't trust them to not be buggy anyway. (Debugging a regexp is a PAIN, when your main response is just "nope didn't match".)

ChrisA