
[Tim]
That leaves the happy 5% who write "[^X]*X", which finally says what they intended from the start.
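Here is the single-character case in concrete form. This is a small sketch that assumes Python's `re` module (the thread doesn't name an engine) and takes X to be a comma. It compares `[^,]*,` with the greedy `.*,` and the lazy `.*?,` that people often reach for instead.

```python
import re

text = "a,b,c"

# Greedy ".*," runs to the end, then backtracks to the LAST comma.
greedy = re.match(r".*,", text).group()

# "[^,]*," can never cross a comma, so it stops at the FIRST one.
negated = re.match(r"[^,]*,", text).group()

# Lazy ".*?," also finds the first comma on its initial attempt...
lazy = re.match(r".*?,", text).group()

# ...but if the rest of the pattern fails, backtracking lets it grow past
# that comma. "[^,]*," has no such escape hatch, so the match just fails.
lazy_then_c = re.match(r".*?,c", text).group()
negated_then_c = re.match(r"[^,]*,c", text)

print(greedy)          # a,b,
print(negated)         # a,
print(lazy)            # a,
print(lazy_then_c)     # a,b,c
print(negated_then_c)  # None
```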
[Steven]
Doesn't that only work if X is literally a single character?
Right. It was an example, not a meta-example. Even for a _single character_, "match up to the next, but never more or less than that" is a puzzle for most regexp users.
[Chris]
Yes, but if X is actually "spam", then you can probably do other assertions to guarantee the right match. It gets pretty clunky though.
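A common assertion-based approach uses a negative-lookahead loop, though Chris may not have meant this specific one. The loop consumes one character at a time, but only after checking that "spam" doesn't start at that position. Here is a sketch in Python's `re`, which is again an assumption:

```python
import re

# Before consuming each character, assert that "spam" does not start here.
# The prefix can therefore never swallow an occurrence of "spam".
FIRST_SPAM = re.compile(r"(?:(?!spam).)*spam", re.DOTALL)

first = FIRST_SPAM.match("eggs spam ham spam").group()

# When what follows fails, backtracking still can't push past the first "spam"...
tempered_fail = re.match(r"(?:(?!spam).)*spam!", "spam spam!")

# ...whereas a lazy ".*?" happily grows until it reaches a later "spam!".
lazy_escape = re.match(r".*?spam!", "spam spam!").group()

print(first)          # eggs spam
print(tempered_fail)  # None
print(lazy_escape)    # spam spam!
```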
Assertions aren't needed, but it is nightmarish to get right.

    (|[^s]|s(|[^p]|p(|[^a]|a(|[^m]))))*spam

The "spam" at the end is the only obvious part ;-) Before then, we match 0 or more instances of

    nothing or not 's'
    or 's' followed by
        nothing or not 'p'
        or 'p' followed by
            nothing or not 'a'
            or 'a' followed by
                nothing or not 'm'

"spam" itself can't get through that maze, so backtracking into it after its first match can't consume the matched "spam" to find a later one.

In SNOBOL, as I recall, it could be spelled

    ARB "spam" FENCE

Those are all pattern objects, and infix whitespace is a binary pattern catenation operator. ARB is a builtin pattern that matches the empty string at first, and extends what it matches by one character each time it's backtracked into. "spam" matches the obvious string. Then FENCE is a builtin pattern that matches an empty string, but acts as a backtracking barrier: if the overall match attempt fails, backtracking will not move "to the left" of FENCE. So, here, ARB will not get a chance to consume more characters after the leftmost "spam" is found.
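The following toy backtracking matcher in Python shows how ARB and FENCE interact. It is a sketch under stated assumptions, not a description of how any SNOBOL implementation works. Each pattern is a generator function that yields the positions where it can end, and asking it for another value counts as backtracking. The names `lit`, `ARB`, `FENCE`, `cat`, and `anchored_match` are hypothetical stand-ins. The sketch only tries a match starting at position 0, with no unanchored retries at later positions. In this sketch, backing into `FENCE` raises an exception that fails the whole attempt.

```python
from typing import Callable, Iterator, Optional

# Hypothetical model: a pattern maps (subject, cursor) to an iterator of end
# positions; requesting the next value from that iterator = backtracking.
Pattern = Callable[[str, int], Iterator[int]]


class _FenceAbort(Exception):
    """Raised when backtracking tries to move left across FENCE."""


def lit(s: str) -> Pattern:
    # Hypothetical helper: matches the literal string s, exactly one way.
    def match(subject: str, pos: int) -> Iterator[int]:
        if subject.startswith(s, pos):
            yield pos + len(s)
    return match


def ARB(subject: str, pos: int) -> Iterator[int]:
    # Empty at first, then one more character each time it's backtracked into.
    for end in range(pos, len(subject) + 1):
        yield end


def FENCE(subject: str, pos: int) -> Iterator[int]:
    yield pos           # succeed once, consuming nothing
    raise _FenceAbort   # being backtracked into fails the whole attempt


def cat(*pats: Pattern) -> Pattern:
    # Catenation: for each way the first pattern can end, try the rest.
    def match(subject: str, pos: int) -> Iterator[int]:
        if not pats:
            yield pos
            return
        first, rest = pats[0], cat(*pats[1:])
        for mid in first(subject, pos):
            yield from rest(subject, mid)
    return match


def anchored_match(pat: Pattern, subject: str) -> Optional[int]:
    """Return the end of the first successful match at position 0, or None."""
    try:
        return next(pat(subject, 0), None)
    except _FenceAbort:
        return None


with_fence = anchored_match(cat(ARB, lit("spam"), FENCE), "eggs spam ham spam")
blocked = anchored_match(cat(ARB, lit("spam"), FENCE, lit("!")), "spam spam!")
unfenced = anchored_match(cat(ARB, lit("spam"), lit("!")), "spam spam!")

print(with_fence)  # 9    (just past the leftmost "spam")
print(blocked)     # None (FENCE stops ARB from growing)
print(unfenced)    # 10   (ARB grows until "spam!" matches)
```

With FENCE, the trailing "!" fails right after the leftmost "spam", and the attempt gives up. Without FENCE, ARB keeps extending until it reaches the later "spam!". Python 3.11+ regexes offer a similar "don't backtrack into this part" effect through atomic groups such as `(?>.*?spam)`.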