[Python-ideas] Make len() usable on a generator
Steven D'Aprano
steve at pearwood.info
Sat Oct 11 07:11:40 CEST 2014
On Fri, Oct 10, 2014 at 02:06:20PM -0400, random832 at fastmail.us wrote:
> On Fri, Oct 10, 2014, at 11:09, Adam Jorgensen wrote:
> > I don't think it makes much sense for len() to work on generators and the
> > fact that sum() works isn't a good argument.
> >
> > Summing the contents of a generator can make sense whereas attempting to
> > obtain the length of something which specifically does not define a
> > length
> > seems a little nonsensical to me...
>
> Why doesn't it define a length? No, hear me out. Is there any reason
> that, for example, generator expressions on lists or ranges shouldn't be
> read-only views instead of generators?
The important question is not "Why *doesn't* it define a length?" but
"Why *should* it define a length?". What advantage does it give you?
Let's take the case where you are both the producer and consumer of the
generator. You might like to write something like this:
    it = (func(x) for x in some_list)
    n = len(it)
    consume(it, n)
But that's no better or easier than:
    it = (func(x) for x in some_list)
    n = len(some_list)
    consume(it, n)
so there is no benefit to giving the generator a length. It does no
harm either, but it would require a very specific special case: it
only works when the generator walks directly over a fixed-length
sequence, and can't be used in cases like these:
    it = (func(x) for x in some_list if condition(x))
    it = (func(x) for x in some_iterator_of_unpredictable_length)
So from the perspective of the producer, generators cannot always be
given a length, and when they can, you can determine the length just as
easily as Python could. There's no advantage to having the generator
type do so that I can see.
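To make the producer's side concrete, here is a minimal sketch. The
bodies of func and condition and the contents of some_list are
placeholders invented for illustration; only the shape of the generator
expressions comes from the examples above.

```python
some_list = [1, 2, 3, 4, 5]

def func(x):
    # Placeholder transformation (an assumption for this sketch).
    return x * 10

def condition(x):
    # Placeholder filter (an assumption): keep only the even values.
    return x % 2 == 0

# Direct walk over a sequence: the producer already knows the length.
it = (func(x) for x in some_list)
n = len(some_list)

# Generators do not support len(), so asking raises TypeError.
try:
    len(it)
except TypeError:
    generator_has_len = False
else:
    generator_has_len = True

# With a filter, the number of items is only known after consuming
# the generator, and then the items themselves are gone.
filtered = (func(x) for x in some_list if condition(x))
filtered_count = sum(1 for _ in filtered)
```

In the unfiltered case the producer gets n for free from some_list; in
the filtered case nobody, Python included, can know the count up front.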
Now consider from the perspective of a consumer of an iterator. You
don't know where it comes from or how it is produced, so you don't know
if it has a predictable length or not. Since you can't rely on it having
a length, optional len() support doesn't actually give you any benefit.
Perhaps you're already writing code that supports sequences and
iterators, with a separate branch for sequences based on the fact that
they have a known length:
    def function(it):
        try:
            n = len(it)
        except TypeError:
            # process the cases with no known length
            pass
        else:
            # process cases with a known length
            pass
It might be nice if some generators were processed by the "known
length" branch, but that doesn't save you from having to write the
"unknown length" branch. Again, the benefit is minimal or zero.
There may be cases where you *require* a predictable length, in which
case you probably should support only sequences.
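As an illustration of the two-branch pattern, here is a small sketch.
The function name describe and its return values are made up for this
example; the point is only that the "unknown length" branch has to exist
whether or not some generators could report a length.

```python
def describe(it):
    """Return (number_of_items, branch_name) for any iterable."""
    try:
        n = len(it)
    except TypeError:
        # No known length: the only way to count is to consume the iterable.
        n = sum(1 for _ in it)
        return n, "unknown length"
    else:
        # Known length: nothing needs to be consumed.
        return n, "known length"

# A list takes the "known length" branch.
seq_result = describe([1, 2, 3])

# A generator over the same data takes the "unknown length" branch.
gen_result = describe(x * 2 for x in [1, 2, 3])
```

Both calls count three items, but only the list gets there without
being consumed.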
So as far as I can see, there is no real benefit to having generators
support len(), even in the cases where they could.
--
Steven