[Python-ideas] Make len() usable on a generator

Andrew Barnert abarnert at yahoo.com
Sat Oct 4 16:45:01 CEST 2014


On Oct 4, 2014, at 15:27, Steven D'Aprano <steve at pearwood.info> wrote:

> On Fri, Oct 03, 2014 at 05:09:20PM +0200, Thomas Chaumeny wrote:
>> Hi!
>> 
>> I have just come across some code counting a generator comprehension
>> expression by doing len([foo for foo in bar if some_condition]) and I
>> realized it might be better if we could just use len(foo for foo in bar if
>> some_condition) as it would avoid a list allocation in memory.
>> 
>> Another possibility is to write sum(1 for foo in bar if some_condition),
>> but that is not optimal either as it generates a lot of intermediate
>> additions which should not be needed.
> 
> I don't understand this reasoning. Why do you think that they are 
> unnecessary?
> 
> I believe that, in the general case of an arbitrary generator 
> expression, there are only two ways to tell what the length will be. The 
> first is to produce a list (or other sequence), then return the 
> length of the list:  len([obj for obj in generator if condition]). The 
> second is to count each item as it is produced, but without storing them 
> all: sum(1 for obj in generator if condition). The first is optimized 
> for readability, the second for memory.

Also note that the itertools recipes include a function for doing exactly this:

def quantify(iterable, pred=bool):
    "Count how many times the predicate is true"
    return sum(map(pred, iterable))

The more-itertools library on PyPI has this, plus an even simpler `ilen` function that just does, IIRC, sum(1 for _ in iterable). Of course they're both trivial one-liners, but maybe calling more_itertools.ilen is a nice way to clarify that you're intentionally consuming an iterator just to get its length.
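
A minimal sketch of the two one-liners side by side. The `ilen` here is defined locally from the description above, so treat it as an assumption rather than the library's actual implementation; the sample data is made up for illustration.

```python
def quantify(iterable, pred=bool):
    "Count how many times the predicate is true"
    return sum(map(pred, iterable))


def ilen(iterable):
    # Local stand-in written from memory of what the library does;
    # not necessarily the real more-itertools code.
    return sum(1 for _ in iterable)


# Count the items a generator expression yields without building a list.
evens = (n for n in range(10) if n % 2 == 0)
count = ilen(evens)

# Counting consumes the generator: nothing is left afterwards.
leftover = list(evens)

# quantify sums the predicate results, so this counts the odd numbers.
odd_count = quantify(range(10), lambda n: n % 2)
```

Either way the generator is used up by the count, which is exactly why a name like `ilen` helps signal the intent.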

> I don't believe that there is any other general way to work out the 
> length of an arbitrary generator. (Apart from trivial, or obfuscated, 
> variations on the above two, of course.)

Would teeing the iterator and consuming the extra copy count as an obfuscated variation? In practice, it's kind of the worst of both worlds--you're constructing a sequence (a deque instead of a list) in memory, but can only access it as a one-shot iterator. Conceptually, it looks somewhat nice--you're copying the iterator to count it without destroying it--but I think that's only an illusion for those who don't understand lists as iterables.
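
To make the tee variation concrete, here is a rough sketch. `count_with_tee` is a hypothetical helper name, not something from itertools or more-itertools.

```python
from itertools import tee


def count_with_tee(iterator):
    """Hypothetical helper: return (length, iterator over the same items)."""
    counted, kept = tee(iterator)
    # Exhausting `counted` forces tee to buffer every item for `kept`,
    # so memory use ends up comparable to building a list.
    length = sum(1 for _ in counted)
    return length, kept


gen = (n * n for n in range(5))
length, gen = count_with_tee(gen)

# The surviving copy still yields every item, but only once.
items = list(gen)
```

Compare that with simply doing `items = list(gen)` and calling `len(items)`: same memory cost, but you end up with a reusable sequence instead of a one-shot iterator.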

> How would you tell what the 
> length of this generator should be, apart from actually running it to 
> exhaustion?

Obviously we just need to rewrite Python around lazy dataflow variables, so whatever you assign len(generator()) to doesn't consume the generator until necessary, meaning you could still use the iterator's values elsewhere in the meantime. :)