How to escape strings for re.finditer?

Tue Feb 28 11:48:29 EST 2023

On 2023-02-28, Thomas Passin <list1 at tompassin.net> wrote:
> On 2/28/2023 10:05 AM, Roel Schroeven wrote:
>> Op 28/02/2023 om 14:35 schreef Thomas Passin:
>>> On 2/28/2023 4:33 AM, Roel Schroeven wrote:
>>>> [...]
>>>> (2) Searching for a string in another string, in a performant way, is 
>>>> not as simple as it first appears. Your version works correctly, but 
>>>> slowly. In some situations it doesn't matter, but in other cases it 
>>>> will. For better performance, string searching algorithms jump ahead 
>>>> either when they found a match or when they know for sure there isn't 
>>>> a match for some time (see e.g. the Boyer–Moore string-search 
>>>> algorithm). You could write such a more efficient algorithm, but then 
>>>> it becomes more complex and more error-prone. Using a well-tested 
>>>> existing function becomes quite attractive.
>>>
>>> Sure, it all depends on what the real task will be.  That's why I 
>>> wrote "Without knowing how general your expressions will be". For the 
>>> example string, it's unlikely that speed will be a factor, but who 
>>> knows what target strings and keys will turn up in the future?
>> On hindsight I think it was overthinking things a bit. "It all depends 
>> on what the real task will be" you say, and indeed I think that should 
>> be the main conclusion here.
>
> It is interesting, though, how pre-processing the search pattern can 
> improve search times if you can afford the pre-processing.  Here's a 
> paper on rapidly finding matches when there may be up to one misspelled 
> character.  It's easy enough to implement, though in Python you can't 
> take the additional step of tuning it to stay in cache.
>
> https://Robert.Muth.Org/Papers/1996-Approx-Multi.Pdf

You've somehow title-cased that URL. The correct URL is:

https://robert.muth.org/Papers/1996-approx-multi.pdf