[Python-ideas] Re: Regex timeouts

Feb. 15, 2022


On Mon, Feb 14, 2022 at 05:13:38PM -0600, Tim Peters wrote:

> An interesting lesson nobody wants to learn: the original major
> string-processing language, SNOBOL, had powerful pattern matching but
> no regexps. Griswold's more modern successor language, Icon, found no
> reason to change that.

I've been interested in the existence of SNOBOL string scanning for 
a long time, but I know very little about it.

How does it differ from regexes, and why have programming languages 
pretty much standardised on regexes rather than other forms of string 
matching?

> Naive regexps are both clumsy and prone to bad timing in many tasks
> that "should be" very easy to express. For example, "now match up to
> the next occurrence of 'X'". In SNOBOL and Icon, that's trivial. 75%
> of regexp users will write ".*X", with scant understanding that it
> may match waaaay more than they intended.

Indeed, I've been bitten by that many times :-)

> Another 20% will write ".*?X", with scant understanding that it may
> extend beyond _just_ "the next" X in some cases.

But this surprises me. Do you have an example?
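
One case where this can happen, as a minimal sketch using only the standard `re` module: the sample string `"aXbX1"` and the trailing `\d` are illustrative assumptions, not taken from the thread. When more pattern follows the X, backtracking lets the lazy `.*?` grow past the first X until the rest can match.

```python
import re

text = "aXbX1"  # hypothetical sample input

# Greedy: runs to the last X, not the next one.
greedy = re.match(r".*X", text)

# Lazy, with nothing after the X: stops at the first X.
lazy = re.match(r".*?X", text)

# Lazy, followed by more pattern: the first X is not followed by a digit,
# so the engine backtracks and lets .*? swallow "aXb" to reach the second X.
lazy_then_digit = re.match(r".*?X\d", text)

# The negated class cannot cross an X, so this fails instead of stretching.
strict = re.match(r"[^X]*X\d", text)

print(greedy.group())           # aXbX
print(lazy.group())             # aX
print(lazy_then_digit.group())  # aXbX1
print(strict)                   # None
```
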

> That leaves the happy 5% who write "[^X]*X", which finally says what
> they intended from the start.

Doesn't that only work if X is literally a single character?

>>> import re
>>> string = "This is some spam and extra spam."
>>> re.search('[^spam]*spam', string)
<re.Match object; span=(11, 17), match='e spam'>

Whereas this seems to do what I expected:

>>> re.search('.*?spam', string)
<re.Match object; span=(0, 17), match='This is some spam'>

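
The first result follows from `[^spam]` being a character class: it matches any single character other than 's', 'p', 'a' or 'm', not "anything other than the word spam". A rough sketch of two ways to express "up to the next occurrence" for a multi-character delimiter, assuming the same sample string; the negative-lookahead idiom and the `str.find` fallback are additions for illustration, not something proposed in the thread:

```python
import re

string = "This is some spam and extra spam."

# Consume one character at a time, but only while the word "spam"
# does not start at the current position; then match "spam" itself.
tempered = re.search(r"(?:(?!spam).)*spam", string)

# For a literal delimiter, plain string methods avoid regexes altogether.
end = string.find("spam")
prefix = string[: end + len("spam")] if end != -1 else None

print(tempered.group())  # This is some spam
print(prefix)            # This is some spam
```

Unlike `.*?spam`, the lookahead version cannot be stretched past the first "spam" by whatever pattern follows it.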
-- 
Steve

Steven D'Aprano