On 11 June 2017 at 13:35, David Mertz email@example.com wrote:
You are right. I made a thinko.
List construction from an iterator is O(N) just as is
sum(1 for _ in it).
Both of them need to march through every element. But as a constant
multiplier, just constructing the list should be faster than needing an
addition (Python append is O(1) because of smart dynamic memory
So the "just read the iterator" is about 2-3 times faster than read-then-accumulate). Of course, it the run-lengths are LARGE, we can start worrying about the extra memory allocation needed as a tradeoff. Your sum uses constant memory.
This would be another argument in favour of providing an itertools.counted_len function, as that would be able to avoid all the overheads currently associated with the "sum(1 for __ in iterable)" counting strategy.
Without that, the best you can do in pure Python is to use __length_hint__ to choose your preferred counting strategy at runtime based on the specific input.
from operator import length_hint # 10k 64-bit pointers ~= 640k max temporary list
_USE_COUNTED_SUM = 10_001
def counted_len(iterable): # For sized containers, just return their length try: return len(iterable) except TypeError: pass # For probably-large inputs & those with no length hint, count them hint = length_hint(iterable, default=_USE_COUNTED_SUM) if hint >= _USE_COUNTED_SUM: return sum(1 for __ in iter(iterable)) # Otherwise, make a temporary list and report its length # as the specifics of the CPython implementation make this # faster than using the generator expression return len(list(iterable))
P.S. itertools._grouper objects don't currently provide a length hint, and it would be tricky to add one, since it would need to be based on the number of remaining items in the original sequence, which would it turn depend on that defining either __len__ or __length_hint__.
-- Nick Coghlan | firstname.lastname@example.org | Brisbane, Australia