
On 25 Mar 2020, at 9:48, Stephen J. Turnbull wrote:
Walter Dörwald writes:
A `find()` that supports multiple search strings (and returns the leftmost position where a search string can be found) is a great help in implementing some kind of tokenizer:
In other words, you want the equivalent of Emacs's "(search-forward (regexp-opt list-of-strings))", which also meets the requirement of returning which string was found (as "(match-string 0)").
Sounds like it. I'm not familiar with Emacs.
Since Python already has a functionally similar API for regexps, we can add a regexp-opt (with appropriate name) method to re, perhaps as .compile_string_list(), and provide a convenience function re.search_string_list() for your application.
If you're using regexps anyway, building the appropriate or-expression shouldn't be a problem. I guess that's what most lexers/tokenizers do.
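For illustration, a minimal sketch of building such an or-expression with the existing `re` module. The names `compile_string_list()` and `search_string_list()` are only the ones floated above; they are hypothetical and not part of `re`. Unlike Emacs's regexp-opt, this does no optimization of the resulting pattern. Sorting the alternatives longest first is an assumption about the desired behavior (a longer string should win over its own prefix at the same position).

```python
import re

def compile_string_list(strings):
    # Hypothetical helper, not part of the re module.
    # Longest first, so "<!--" is tried before "<" at the same position
    # (an assumption about what a tokenizer would want).
    alternatives = sorted(set(strings), key=len, reverse=True)
    if not alternatives:
        raise ValueError("need at least one search string")
    # re.escape() makes every string match literally.
    return re.compile("|".join(re.escape(s) for s in alternatives))

def search_string_list(strings, text, pos=0):
    # Hypothetical convenience function: returns (position, string found),
    # or (-1, None) if none of the strings occurs at or after pos.
    match = compile_string_list(strings).search(text, pos)
    if match is None:
        return -1, None
    return match.start(), match.group(0)
```

Here `match.group(0)` plays the role of "(match-string 0)": it tells the caller which of the strings was found.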
I'm applying practicality before purity, of course. To some extent we want to encourage simple string approaches, and putting this in regex is not optimal for that.
Exactly. I'm always a bit hesitant to use regexps when there's a simpler string approach.
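As a rough sketch of what the simple string approach could look like today, using only `str.find()`: the name `find_any()` and its return value are hypothetical, not an existing or proposed API, and preferring the longer string when two start at the same position is an assumption.

```python
def find_any(text, strings, start=0):
    # Hypothetical helper: returns (position, string found) for the leftmost
    # occurrence of any of the strings, or (-1, None) if none is found.
    best_pos, best_string = -1, None
    for s in strings:
        pos = text.find(s, start)
        if pos == -1:
            continue
        # Keep the leftmost hit; on a tie, prefer the longer string
        # (an assumption, so that "<!--" wins over "<").
        if (best_pos == -1 or pos < best_pos
                or (pos == best_pos and len(s) > len(best_string))):
            best_pos, best_string = pos, s
    return best_pos, best_string
```

This scans the text once per search string, which is fine for a handful of strings but is the kind of cost a built-in multi-string `find()` could avoid.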
Steve
Servus, Walter