Most efficient method to search text?
Bengt Richter
bokr at oz.net
Fri Oct 18 14:32:20 EDT 2002
On Fri, 18 Oct 2002 10:03:10 GMT, Michael Hudson <mwh at python.net> wrote:
>bokr at oz.net (Bengt Richter) writes:
>
>> On Thu, 17 Oct 2002 11:14:09 GMT, Michael Hudson <mwh at python.net> wrote:
>>
>> >Tim Peters <tim.one at comcast.net> writes:
>> >
>> >> Especially for purposes of building lexers, it might be useful if the re
>> >> package could recognize when a DFA approach was sufficient and practical,
>> >> and switch to a different scheme entirely then. Or it might not. Build
>> >> code to try it both ways, and let us know how it turns out ...
>> >
>> >Indeed, my code canes re when the wordlist gets long. Here's some
>>
>> If this is easy to add to your test harness, I'd be interested to see what
>> this search does in comparison, with the longer word lists (it's probably
>> faster than has_word.py that I posted elsewhere, depending on relative lengths
>> of word lists and strings, and internal vs external looping and allocation.
>
>It's quick:
>
>>>> robin.do_comp(1000)
>compile... 3.42289197445
>compile2... 2.49848008156
>compile3... 0.696313977242
>compile4... 4.04265594482
>compile_re... 0.627331018448
>compile_bengt... 0.00175499916077
>
>test... 1.39854204655
>test2... 2.93543899059
>test3... 3.2231388092
>test4... 2.15867292881
>test_re... 8.38554108143
>test_bengt... 0.437232971191
>
>I'm sure Tim once said something along the lines of "Python doesn't
>give much advice for getting good performance, beyond a not-so-subtle
>hint to exploit dicts for all they're worth" but I can't find it now.
>
Thanks. I wonder what would happen if you used word lists and files
guaranteed *not* to have any hits, so they'd all be searched to the end.
Or what was your mix (no of search words, length of files, hit distribution)?
>I'm sure Tim once said something along the lines of "Python doesn't
>give much advice for getting good performance, beyond a not-so-subtle
>hint to exploit dicts for all they're worth" but I can't find it now.
>
There's probably a lot to learn from the dictionary implementation code
... one of these days ;-)
>Cheers,
>M.
>hmm, look at that sig...
>
>--
> Premature optimization is the root of all evil.
> -- Donald E. Knuth, Structured Programming with goto Statements
Premature celebration is a related problem ;-)
Regards,
Bengt Richter
More information about the Python-list
mailing list