Most efficient method to search text?

Bengt Richter bokr at oz.net
Fri Oct 18 14:32:20 EDT 2002


On Fri, 18 Oct 2002 10:03:10 GMT, Michael Hudson <mwh at python.net> wrote:

>bokr at oz.net (Bengt Richter) writes:
>
>> On Thu, 17 Oct 2002 11:14:09 GMT, Michael Hudson <mwh at python.net> wrote:
>> 
>> >Tim Peters <tim.one at comcast.net> writes:
>> >
>> >> Especially for purposes of building lexers, it might be useful if the re
>> >> package could recognize when a DFA approach was sufficient and practical,
>> >> and switch to a different scheme entirely then.  Or it might not.  Build
>> >> code to try it both ways, and let us know how it turns out ...
>> >
>> >Indeed, my code canes re when the wordlist gets long.  Here's some
>> 
>> If this is easy to add to your test harness, I'd be interested to see what
>> this search does in comparison, with the longer word lists (it's probably
>> faster than has_word.py that I posted elsewhere, depending on relative lengths
>> of word lists and strings, and internal vs external looping and allocation.
>
>It's quick:
>
>>>> robin.do_comp(1000)
>compile...       3.42289197445
>compile2...      2.49848008156
>compile3...      0.696313977242
>compile4...      4.04265594482
>compile_re...    0.627331018448
>compile_bengt... 0.00175499916077
>
>test...          1.39854204655
>test2...         2.93543899059
>test3...         3.2231388092
>test4...         2.15867292881
>test_re...       8.38554108143
>test_bengt...    0.437232971191
>
>I'm sure Tim once said something along the lines of "Python doesn't
>give much advice for getting good performance, beyond a not-so-subtle
>hint to exploit dicts for all they're worth" but I can't find it now.
>
Thanks. I wonder what would happen if you used word lists and files
guaranteed *not* to have any hits, so they'd all be searched to the end.
Or what was your mix (no of search words, length of files, hit distribution)?

>I'm sure Tim once said something along the lines of "Python doesn't
>give much advice for getting good performance, beyond a not-so-subtle
>hint to exploit dicts for all they're worth" but I can't find it now.
>
There's probably a lot to learn from the dictionary implementation code
... one of these days ;-)

>Cheers,
>M.
>hmm, look at that sig...
>
>-- 
>  Premature optimization is the root of all evil.
>       -- Donald E. Knuth, Structured Programming with goto Statements
Premature celebration is a related problem ;-)

Regards,
Bengt Richter



More information about the Python-list mailing list