On Fri, Oct 03, 2014 at 05:09:20PM +0200, Thomas Chaumeny wrote:
> Hi!
> I have just come across some code counting a generator comprehension expression by doing len([foo for foo in bar if some_condition]) and I realized it might be better if we could just use len(foo for foo in bar if some_condition) as it would avoid a list allocation in memory.
> Another possibility is to write sum(1 for foo in bar if some_condition), but that is not optimal either as it generates a lot of intermediate additions which should not be needed.

I don't understand this reasoning. Why do you think that they are unnecessary?

I believe that, in the general case of an arbitrary generator expression, there are only two ways to tell what the length will be. The first is to produce a list (or other sequence), then return the length of the list:

    len([obj for obj in generator if condition])

The second is to count each item as it is produced, but without storing them all:

    sum(1 for obj in generator if condition)

The first is optimized for readability, the second for memory.

If I have missed a third way, please tell me. I don't believe that there is any other general way to work out the length of an arbitrary generator. (Apart from trivial, or obfuscated, variations on the above two, of course.)

How would you tell what the length of this generator should be, apart from actually running it to exhaustion?

    import random

    def generator():
        while True:
            if random.random() < 0.5:
                return
            yield "spam"

Since there is no general way to know what the length of an arbitrary generator will be, it is better to be explicit that it has to be calculated by running through the generator and exhausting it.

-- 
Steven
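
For reference, the two counting approaches described above can be compared side by side. This is only a minimal sketch: the helper names count_via_list and count_via_sum are made up for illustration, and the generator takes an explicit random.Random instance (an assumption, not part of the original post) so that the same sequence of coin flips can be replayed for both approaches.

```python
import random


def generator(rng):
    # Same shape as the generator in the post, but with an explicit
    # random source (an assumption) so that runs are repeatable.
    while True:
        if rng.random() < 0.5:
            return
        yield "spam"


def count_via_list(iterable):
    # Hypothetical helper: build the whole list, then take its length.
    return len([item for item in iterable])


def count_via_sum(iterable):
    # Hypothetical helper: count items one by one without storing them.
    return sum(1 for _ in iterable)


# The same seed replays the same coin flips, so both counts agree.
n_list = count_via_list(generator(random.Random(2014)))
n_sum = count_via_sum(generator(random.Random(2014)))
assert n_list == n_sum

# Calling len() directly on a generator is not supported.
try:
    len(x for x in range(3))
except TypeError:
    len_supported = False
else:
    len_supported = True
```

Either way, the generator has to be run to exhaustion before the count is known; the two helpers differ only in whether the items are kept in memory along the way.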