On Thu, Aug 13, 2020 at 12:38:38PM -0700, Christopher Barker wrote:
> On Thu, Aug 13, 2020 at 12:27 PM Ben Rudiak-Gould benrudiak@gmail.com wrote:
> > I think islice should implement __length_hint__, though. As of 3.8.5 it doesn't.
>
> And it could support __len__, and raise an Exception when the underlying iterable doesn’t support it.
"Iterators should support len" is one of those features that everyone thinks they want, but nobody can show how it is actually workable as a language or library feature in the most general case.
It might, sometimes, be workable for a specific custom iterator that you control yourself, in an application. But as a library feature, it is unusable. The problem is that the concept of "the length of an iterator" is unworkable in general, leading to confusing and contradictory behaviour.
This seems reasonable at first:
    orig = 'abcd'
    it = iter(orig)
    assert len(it) == 4
but as soon as you begin to iterate over the iterator, we run into trouble. Should `len(it)` return the length of the original sequence, or track the number of currently remaining elements? Both cases are troublesome.
(1) We return the length of the original sequence. Then we have the surprising result that `len(it) != len(list(it))`.
    item = next(it)
    assert item == 'a'
    assert len(it) == len(orig)
    list(it)  # ['b', 'c', 'd'] has length 3, not 4
This violates the critical invariant that if an iterable has length N, then iterating over it (with no early exit) will take N loops.
    count = 0
    n = len(it)
    for x in it:
        count += 1
    assert count == n
If the assertion fails, then your len function lied to you, and we will have a lot of bug reports that len is inaccurate.
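To make the failure concrete, here is a minimal sketch of strategy (1). The wrapper class `OrigLenIter` is hypothetical, invented purely for illustration; nothing like it exists in the standard library.

```python
class OrigLenIter:
    """Hypothetical iterator whose len() always reports the original size."""

    def __init__(self, seq):
        self._it = iter(seq)
        self._n = len(seq)  # fixed at creation, never updated

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

    def __len__(self):
        return self._n


it = OrigLenIter('abcd')
next(it)                     # consume 'a'
n = len(it)                  # still reports the original length
count = sum(1 for _ in it)   # counts only the items that are left
print(n, count)              # the two numbers disagree
```

Any caller that trusted `n` to size a buffer or a progress bar would be off by however many items had already been consumed.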
(2) We track the remaining items in the iterator. Then we violate the critical invariant that the length of a sequence (or sequence-like object) should not depend on whether you have iterated over it or not.
    it = iter(orig)
    assert len(it) == 4
    for x in it:
        pass
    assert len(it) == 0
Why is this a critical invariant? Because otherwise we introduce a surprising temporal coupling in your code, making algorithms fragile and likely buggy. The length of the iterator depends on whether we check it before or after the loop, which is bad:
    def average(iterable):
        return sum(iterable)/len(iterable)

    def average(iterable):
        n = len(iterable)
        return sum(iterable)/n
If those two functions don't give the same result, then your length calculation is broken.
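A rough sketch of how strategy (2) breaks that equivalence. `RemainingLenIter` is a hypothetical class made up for this example, and the two functions are renamed (`average_len_last`, `average_len_first`) only so that both can live in one script.

```python
class RemainingLenIter:
    """Hypothetical iterator whose len() counts the items not yet consumed."""

    def __init__(self, seq):
        self._it = iter(seq)
        self._left = len(seq)

    def __iter__(self):
        return self

    def __next__(self):
        item = next(self._it)   # raises StopIteration when exhausted
        self._left -= 1
        return item

    def __len__(self):
        return self._left


def average_len_last(iterable):
    # sum() drains the iterator before len() is evaluated
    return sum(iterable) / len(iterable)


def average_len_first(iterable):
    n = len(iterable)           # measured while the iterator is still fresh
    return sum(iterable) / n


print(average_len_first(RemainingLenIter([1, 2, 3, 4])))
try:
    average_len_last(RemainingLenIter([1, 2, 3, 4]))
except ZeroDivisionError:
    print("length was 0 by the time it was checked")
```

With a real list both functions agree; with this iterator the result depends purely on the order of two expressions.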
Whichever strategy we pick for the length of an iterator, we're going to surprise people and lead to fragile, buggy code.
The easy cases:
    it = iter(sequence)
    n = len(it)
    for item in it:
        process(item)
    print(f"Processed {n} items")
where we work with a fresh iterator, retrieve the length *before* iterating, and then *only* iterate fully to completion, might work okay, but as soon as you get to more complex cases the idea of len for iterators is a minefield.
The only generally correct solution is to refuse to guess: pick neither strategy (1) nor strategy (2), each of which is sometimes what the caller expects but sometimes leads to surprising results and fragile, broken code.
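For reference, this is roughly what refusing to guess looks like today. The sketch below assumes CPython, where list iterators happen to implement `__length_hint__`; the hint is only an estimate and other iterators may not provide one at all.

```python
import operator

it = iter([10, 20, 30, 40])

# len() refuses outright: iterators do not define __len__
refused = False
try:
    len(it)
except TypeError as exc:
    refused = True
    print(f"len() refused: {exc}")

# operator.length_hint() gives a non-binding estimate instead
fresh_hint = operator.length_hint(it)
next(it)                                # consume one item
later_hint = operator.length_hint(it)   # the estimate shrinks accordingly
print(fresh_hint, later_hint)
```

Because the hint is explicitly advisory, nobody is entitled to rely on it the way they rely on `len`, which is why it can track the remaining items without breaking any invariant.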