How to escape strings for re.finditer?
Thomas Passin
list1 at tompassin.net
Tue Feb 28 10:41:11 EST 2023
On 2/28/2023 10:05 AM, Roel Schroeven wrote:
> Op 28/02/2023 om 14:35 schreef Thomas Passin:
>> On 2/28/2023 4:33 AM, Roel Schroeven wrote:
>>> [...]
>>> (2) Searching for a string in another string, in a performant way, is
>>> not as simple as it first appears. Your version works correctly, but
>>> slowly. In some situations it doesn't matter, but in other cases it
>>> will. For better performance, string searching algorithms jump ahead
>>> either when they found a match or when they know for sure there isn't
>>> a match for some time (see e.g. the Boyer–Moore string-search
>>> algorithm). You could write such a more efficient algorithm, but then
>>> it becomes more complex and more error-prone. Using a well-tested
>>> existing function becomes quite attractive.
>>
>> Sure, it all depends on what the real task will be. That's why I
>> wrote "Without knowing how general your expressions will be". For the
>> example string, it's unlikely that speed will be a factor, but who
>> knows what target strings and keys will turn up in the future?
> In hindsight I think I was overthinking things a bit. "It all depends
> on what the real task will be" you say, and indeed I think that should
> be the main conclusion here.
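To make the quoted points concrete, here is a minimal sketch of the Horspool simplification of Boyer–Moore (not the full Boyer–Moore algorithm). It pre-processes the pattern into a skip table so the search window can jump ahead when the last aligned character rules out a match. For comparison it also shows the "well-tested existing function" route the subject line asks about: re.escape() makes regex metacharacters in the key literal, and a zero-width lookahead lets re.finditer() report overlapping matches. The function names horspool_find_all and regex_find_all are hypothetical, chosen only for this illustration.

```python
import re


def horspool_find_all(text: str, pattern: str) -> list[int]:
    """Start indices of all (possibly overlapping) occurrences of pattern."""
    m, n = len(pattern), len(text)
    if m == 0:
        return list(range(n + 1))  # empty pattern matches at every position
    if m > n:
        return []
    # Pre-processing: for each character in pattern[:-1], the distance from
    # its last occurrence to the end of the pattern. Later occurrences
    # overwrite earlier ones, so the smallest (safe) shift is kept.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    positions = []
    i = 0
    while i <= n - m:
        if text[i:i + m] == pattern:
            positions.append(i)
        # Jump based on the text character under the window's last slot;
        # a character not in pattern[:-1] allows a full jump of m.
        i += shift.get(text[i + m - 1], m)
    return positions


def regex_find_all(text: str, key: str) -> list[int]:
    """Same result using the re module: escape the key, then search."""
    # re.escape turns '.', '*', '(' etc. into literals; the lookahead is
    # zero-width, so overlapping occurrences are all reported.
    return [mo.start() for mo in re.finditer("(?=" + re.escape(key) + ")", text)]


if __name__ == "__main__":
    print(horspool_find_all("abababa", "aba"))  # [0, 2, 4]
    print(regex_find_all("a.b axb a.b", "a.b"))  # [0, 8]; unescaped "a.b" would also hit "axb"
```

In practice str.find() or the re route is usually the better choice in Python, since the skip loop above runs in interpreted code.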
It is interesting, though, how pre-processing the search pattern can
improve search times if you can afford the pre-processing. Here's a
paper on rapidly finding matches when there may be up to one misspelled
character. It's easy enough to implement, though in Python you can't
take the additional step of tuning it to stay in cache.
https://Robert.Muth.Org/Papers/1996-Approx-Multi.Pdf
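As a rough illustration of how cheap pre-processing can help with the "one misspelled character" case, here is a minimal sketch that is NOT the algorithm from the paper above. It assumes substitutions only (no insertions or deletions) and uses a simple pigeonhole filter: split the pattern into two halves, and since a single substitution can spoil at most one half, the other half must occur exactly. Exact hits on either half (found with str.find) give candidate windows, which are then verified by counting mismatches. The name find_with_one_mismatch is hypothetical.

```python
def find_with_one_mismatch(text: str, pattern: str) -> list[int]:
    """Start indices where text differs from pattern in at most one position."""
    m, n = len(pattern), len(text)
    if m > n:
        return []
    if m < 2:
        # With one substitution allowed, every window of length 0 or 1 qualifies.
        return list(range(n - m + 1))
    # Pre-processing: split the pattern into two non-empty halves.
    half = m // 2
    pieces = ((pattern[:half], 0), (pattern[half:], half))
    candidates = set()
    for piece, offset in pieces:
        # Every exact occurrence of a half suggests a window start.
        j = text.find(piece)
        while j != -1:
            start = j - offset
            if 0 <= start <= n - m:
                candidates.add(start)
            j = text.find(piece, j + 1)
    # Verification: count substitutions in each candidate window.
    result = []
    for start in sorted(candidates):
        window = text[start:start + m]
        if sum(a != b for a, b in zip(window, pattern)) <= 1:
            result.append(start)
    return result


if __name__ == "__main__":
    print(find_with_one_mismatch("the cat sat on the mat", "cat"))  # [4, 8, 19]
```

The filter only pays off when exact hits on the halves are rare; for very short patterns almost every position becomes a candidate.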