
[Tim]
In SNOBOL, as I recall, it could be spelled
ARB "spam" FENCE
[Chris]
Ah, so that's a bit more complicated than the "no-backtracking" parsing style of REXX and scanf.
[Tim]
Oh, a lot more complex. In SNOBOL, arbitrary computation can be performed at any point in pattern matching. Yet _simple_ things are simple to say. Going back to the original one-letter "find the leftmost 'x' after the current point", it's just

    BREAK('x') 'x'

BREAK is a builtin function that returns a pattern that consumes all the characters up to the first occurrence of one of the letters in its argument. If it's backed up into via backtracking, it fails: the shortest string is the only one it will consume.

If you want something fancier, fine - you can code anything you like. For example, here's a pattern object (BAL) that matches the longest non-empty substring balanced with respect to parentheses:

    BALEXP = NOTANY('()') | '(' ARBNO(*BALEXP) ')'
    BAL = BALEXP ARBNO(BALEXP)

ARBNO is akin to a regexp's *? quantifier (zero or more occurrences, fewest first). The '*' in front of BALEXP in the first line tells SNOBOL to delay evaluating BALEXP until the pattern runs; at the time BALEXP is _defined_ by that line, BALEXP isn't yet bound to anything useful.
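For comparison, here's a rough Python sketch of the "balanced" idea BAL captures. It doesn't model SNOBOL's lazy backtracking at all; it just returns the longest non-empty parenthesis-balanced span starting at index i (the name longest_bal is made up for illustration):

    def longest_bal(s, i):
        # Track parenthesis depth; every point where the running depth is 0
        # ends a balanced span, and an unmatched ')' ends any hope of
        # extending further to the right.
        depth = 0
        best = None
        for j in range(i, len(s)):
            c = s[j]
            if c == '(':
                depth += 1
            elif c == ')':
                depth -= 1
                if depth < 0:
                    break
            if depth == 0:
                best = j + 1
        return s[i:best] if best is not None else None

    longest_bal("a(b(c)d)e)f", 0)   # -> 'a(b(c)d)e'  (stops before the stray ')')
    longest_bal(")x", 0)            # -> None         (no non-empty balanced prefix)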
[Chris]
Does this guarantee anything about performance, or is it primarily to allow you to write patterns that can't mismatch strings?
[Tim]
They were inventing "pattern matching" in the 1960s, and were looking for expressive notations that were easy for experts to use. Regular expressions were much more of theoretical interest at the time, and offered far feebler abilities than SNOBOL's pattern abilities anyway. It's still interesting to me that they settled on a relatively "wordy" notation, which is pretty easy to pick up. This is perhaps because regexps are so feeble in comparison that they allowed proving mountains of properties, so a hyper-concise notation for them was invented to make publishing long-winded proofs possible ;-)

But while SNOBOL was wildly inventive for its time, it was always intended to be used "to get real work done". Note too that potential backtracking isn't hiding in notation: it's quite explicit in names like ARB and ARBNO, and, of course, in pattern alternation (the infix binary "|" operator). Most builtin pattern functions and objects in SNOBOL are "one and done" (they match or they don't, and fail if backtracked into). Want a string of digits? SPAN('0123456789'). Like a regexp's \d+, but will _only_ match the longest string of digits starting at the current point.

The possibility for backtracking at the _lexical_ level is almost always a misfeature. Which didn't matter in the seminal lex+yacc scheme: yacc consumed the tokens lex's regexps produced, and if yacc failed to match the grammar, lex didn't go on to try a different way of tokenizing. In effect, \d+ in lex acted like SNOBOL's SPAN. regexps didn't manage the transition from "a tool" to "a solution" gracefully ;-)
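As a minimal sketch of that "one and done" behavior in regexp terms: Python 3.11 added possessive quantifiers and atomic groups to re, which behave like SPAN in that they never give back what they consumed (the example string is made up for illustration; this needs 3.11+):

    import re

    s = "123456"
    re.fullmatch(r"\d+\d", s)    # matches: \d+ gives a digit back when backtracked into
    re.fullmatch(r"\d++\d", s)   # None: the possessive \d++ keeps every digit, like SPAN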