How to escape strings for re.finditer?
Jon Ribbens
jon+usenet at unequivocal.eu
Tue Feb 28 11:48:29 EST 2023
On 2023-02-28, Thomas Passin <list1 at tompassin.net> wrote:
> On 2/28/2023 10:05 AM, Roel Schroeven wrote:
>> Op 28/02/2023 om 14:35 schreef Thomas Passin:
>>> On 2/28/2023 4:33 AM, Roel Schroeven wrote:
>>>> [...]
>>>> (2) Searching for a string in another string, in a performant way, is
>>>> not as simple as it first appears. Your version works correctly, but
>>>> slowly. In some situations it doesn't matter, but in other cases it
>>>> will. For better performance, string searching algorithms jump ahead
>>>> either when they found a match or when they know for sure there isn't
>>>> a match for some time (see e.g. the Boyer–Moore string-search
>>>> algorithm). You could write such a more efficient algorithm, but then
>>>> it becomes more complex and more error-prone. Using a well-tested
>>>> existing function becomes quite attractive.
>>>
>>> Sure, it all depends on what the real task will be. That's why I
>>> wrote "Without knowing how general your expressions will be". For the
>>> example string, it's unlikely that speed will be a factor, but who
>>> knows what target strings and keys will turn up in the future?
>> On hindsight I think it was overthinking things a bit. "It all depends
>> on what the real task will be" you say, and indeed I think that should
>> be the main conclusion here.
>
> It is interesting, though, how pre-processing the search pattern can
> improve search times if you can afford the pre-processing. Here's a
> paper on rapidly finding matches when there may be up to one misspelled
> character. It's easy enough to implement, though in Python you can't
> take the additional step of tuning it to stay in cache.
>
> https://Robert.Muth.Org/Papers/1996-Approx-Multi.Pdf
You've somehow title-cased that URL. The correct URL is:
https://robert.muth.org/Papers/1996-approx-multi.pdf
More information about the Python-list
mailing list