List Count
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Tue Apr 23 10:49:58 EDT 2013
On Tue, 23 Apr 2013 08:05:53 +0100, Blind Anagram wrote:
> I did a lot of work comparing the overall performance of the sieve when
> using either lists or arrays and I found that lists were a lot faster
> for the majority use case when the sieve is not large.
And when the sieve is large?
I don't actually believe that the bottleneck is the cost of taking a list
slice. That's pretty fast, even for huge lists, and all efforts to skip
making a copy by using itertools.islice actually ended up slower. But
suppose it is the bottleneck. Then *sooner or later* arrays will win over
lists, simply because they're smaller.
Of course, "sooner or later" might be much later.
I expect that you will not find a single algorithm or data structure that
works optimally for both small and huge inputs. In general, there are two
strategies you might take:
1) While the input is small, use an algorithm or data structure which is
efficient for small inputs, and after some cut-off size, swap to a different
one which is efficient for large inputs. That swap-over may require a one-off
conversion cost, but provided your sieve never shrinks, this may not matter
(one possible shape for this is sketched after point 2 below).
2) Use only the algorithm for large inputs. For small inputs, the
difference between the two is insignificant in absolute terms (who cares
if the operation takes 5ms instead of 1ms?), but for large N, there is a
clear winner.
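
To make option 1) concrete, here is one possible shape for the swap-over.
The class name, the cut-off and the bytearray representation are all
illustrative choices, not a claim about how the original sieve is written:

# A sketch of strategy 1: keep the sieve flags in a plain list while the
# sieve is small, and pay a one-off conversion to a more compact bytearray
# once it grows past a cut-off. Names and the cut-off are made up.

CUTOFF = 10**6  # arbitrary threshold; tune by measurement

class SieveFlags:
    def __init__(self, size):
        self._small = size <= CUTOFF
        # plain list of bools for small sieves, compact bytearray for large ones
        self._flags = [True] * size if self._small else bytearray([1]) * size

    def clear(self, index):
        # Mark index as composite.
        self._flags[index] = False if self._small else 0

    def grow(self, new_size):
        # Extend the sieve, converting once if it crosses the cut-off.
        if self._small and new_size > CUTOFF:
            self._flags = bytearray(self._flags)   # one-off conversion cost
            self._small = False
        extra = new_size - len(self._flags)
        if extra > 0:
            self._flags.extend([True] * extra if self._small else b"\x01" * extra)

    def count_upto(self, n):
        # Count flags still set below n (e.g. candidate primes).
        return self._flags[:n].count(True if self._small else 1)

Something like flags = SieveFlags(1000) followed by flags.grow(5 * 10**6)
pays the conversion exactly once, after which clear() and count_upto() carry
on unchanged. Whether the extra bookkeeping earns its keep is exactly the
question below.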
There's nothing that says you're limited to two algorithms. You may find
that to really optimize things, you need three or more algorithms, each
one optimized for a particular subset of inputs. Of course, all this
added complexity is itself very costly. Is it worth it?
--
Steven