union in Python

Tue Aug 19 21:10:48 EDT 2003

On Tue, 19 Aug 2003 08:17:57 +0000 (UTC), Duncan Booth <duncan at NOSPAMrcp.co.uk> wrote:

>bokr at oz.net (Bengt Richter) wrote in news:bhrsuj$mdv$0 at 216.39.172.122:
>
>> But I don't see any reason why the performances couldn't be made to
>> match. 
>
>For searching short strings there probably isn't any advantage to using 
>Boyer-Moore, and for short search strings the advantage is probably quite 
>small. I think you would need to spend some time working out just where the 
>breakpoints are that make it worth switching from a plain search to one 
>that skips ahead.
Sure. No question. IWT the deciding question would be if the check can be done
with CPU cycles that are otherwise going to be wasted anyway. You wouldn't want
to incur a cost for 99.44% of cases for the sake of a big improvement in 0.56%.

Sometimes you can also rewrite the code and get a check that doesn't interfere
with normal fast cases.

If the check can be made "for free," then I only see programming cost in doing it.
I think there's a chance a simple length check could be done by the CPU in the time
shadow of waiting for data to load in its cache, what with all the pipeline and scheduling
magic CPUs do these days. But this is handwaving imaginings. I couldn't be sure without testing.

>
>The regular expression search may still win, since regular expressions, 
>even if not precompiled, are cached. That means that if you perform the 
>regex search millions of times, the setup cost is only incurred once. 
>String searches don't have this advantage, and it seems unlikely to be 
>worth the effort of caching arguments passed to string.find!
But Gustavo's test had the regex compiled outside the loop, so that shouldn't
have affected the result. A programmer who leaves a constant regex compilation
inside the loop deserves what s/he gets (cache-mitigated or not, the calls would
still be there) ;-)

Regards,
Bengt Richter