
On Wed, 30 Mar 2022 at 10:08, Steven D'Aprano <steve@pearwood.info> wrote:
> Here's the version of grab I used:
>
> def grab(text, start, end):
>     a = text.index(start)
>     b = text.index(end, a+len(start))
>     return text[a+len(start):b]
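For reference, a quick usage sketch - the OP's actual data isn't
reproduced here, so the sample string below is made up purely for
illustration:

    # Assumed sample data, not the OP's original input.
    sample = "colour: red\nfruit: apple\nprice: 5\n"
    grab(sample, "fruit:", "\n")    # -> ' apple'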
This is where Python would benefit from an sscanf-style parser.
Instead of regexps, something this simple could be written like this:

    [fruit] = sscanf(sample, "%*sfruit:%s\n")

It's simple left-to-right tokenization, so it's faster than a regex
(due to the lack of backtracking). It's approximately as clear, and
doesn't require playing with the index and remembering to skip
len(start).

That said, though - I do think the OP's task is better served by a
tokenization pass that transforms the string into something easier to
look things up in.

ChrisA
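
P.S. For concreteness, here's a rough sketch of what such an sscanf
helper might look like. Purely illustrative - no such function exists
in the stdlib - and it assumes one particular reading of the format:
%*s skips ahead to the next literal, %s captures up to the next
literal, and everything else must match exactly. The scan over the
data is plain str.index, left to right, with no backtracking (the
regex is only used to break up the format string itself):

    import re

    def sscanf(text, fmt):
        # Split the format into %s / %*s specifiers and literal chunks.
        parts = re.split(r'(%\*?s)', fmt)
        pos = 0
        results = []
        pending = None      # specifier waiting for its terminating literal
        for part in parts:
            if part in ('%s', '%*s'):
                pending = part
            elif part:      # literal chunk: locate it left-to-right
                end = text.index(part, pos)
                if pending == '%s':
                    results.append(text[pos:end])
                pending = None
                pos = end + len(part)
        if pending == '%s':  # a trailing %s captures the rest
            results.append(text[pos:])
        return results

    [fruit] = sscanf(sample, "%*sfruit:%s\n")  # fruit == ' apple' here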
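
P.P.S. The tokenization pass mentioned above could be as small as
splitting "key: value" lines into a dict, after which lookups are
just indexing. Again a sketch against the same assumed sample data,
not the OP's actual format:

    # Build a lookup table once, then ask for fields by name.
    fields = {}
    for line in sample.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()

    fields["fruit"]    # -> 'apple'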