List Count
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Tue Apr 23 10:49:58 EDT 2013
On Tue, 23 Apr 2013 08:05:53 +0100, Blind Anagram wrote:
> I did a lot of work comparing the overall performance of the sieve when
> using either lists or arrays and I found that lists were a lot faster
> for the majority use case when the sieve is not large.
And when the sieve is large?
I don't actually believe that the bottleneck is the cost of taking a list
slice. That's pretty fast, even for huge lists, and all efforts to skip
making a copy by using itertools.islice actually ended up slower. But
suppose it is the bottleneck. Then *sooner or later* arrays will win over
lists, simply because they're smaller.
Of course, "sooner or later" might be much later.
I expect that you will not find a single algorithm or data structure that
works optimally for both small and huge inputs. In general, there are two
strategies you might take:
1) While the input is small, use an algorithm or data structure which is
efficient for small inputs, and after some cut-off size, swap to a different
one which is efficient for large inputs. That swap-over may require a one-off
conversion cost, but provided your sieve never shrinks, this may not matter
(one possible shape for this is sketched after point 2 below).
2) Use only the algorithm for large inputs. For small inputs, the
difference between the two is insignificant in absolute terms (who cares
if the operation takes 5ms instead of 1ms?), but for large N, there is a
clear winner.
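
To make option 1) concrete, here is one possible shape for the swap-over.
The class name, the cut-off and the bytearray representation are all
illustrative choices, not a claim about how the original sieve is written:

# A sketch of strategy 1: keep the sieve flags in a plain list while the
# sieve is small, and pay a one-off conversion to a more compact bytearray
# once it grows past a cut-off. Names and the cut-off are made up.

CUTOFF = 10**6  # arbitrary threshold; tune by measurement

class SieveFlags:
    def __init__(self, size):
        self._small = size <= CUTOFF
        # plain list of bools for small sieves, compact bytearray for large ones
        self._flags = [True] * size if self._small else bytearray([1]) * size

    def clear(self, index):
        # Mark index as composite.
        self._flags[index] = False if self._small else 0

    def grow(self, new_size):
        # Extend the sieve, converting once if it crosses the cut-off.
        if self._small and new_size > CUTOFF:
            self._flags = bytearray(self._flags)   # one-off conversion cost
            self._small = False
        extra = new_size - len(self._flags)
        if extra > 0:
            self._flags.extend([True] * extra if self._small else b"\x01" * extra)

    def count_upto(self, n):
        # Count flags still set below n (e.g. candidate primes).
        return self._flags[:n].count(True if self._small else 1)

Something like flags = SieveFlags(1000) followed by flags.grow(5 * 10**6)
pays the conversion exactly once, after which clear() and count_upto() carry
on unchanged. Whether the extra bookkeeping earns its keep is exactly the
question below.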
There's nothing that says you're limited to two algorithms. You may find
that to really optimize things, you need three or more algorithms, each
one optimized for a particular subset of inputs. Of course, all this
added complexity is itself very costly. Is it worth it?
--
Steven