
On Tuesday 21 October 2003 03:57 pm, Skip Montanaro wrote:
>> [Alex measures speed improvements]
Guido> Great!  This is a plus for iterator comprehensions (we need a
Guido> better term BTW).
> Here's an alternate suggestion.  Instead of inventing new syntax, why
> not change the semantics of list comprehensions to be lazy?  They
> haven't been in use that long, and while they are popular, the
> semantic tweakage would probably cause minimal disruption.  In
> situations where laziness wasn't wanted, the most that a particular
> use would have to change (I think) is to pass it to list().
Well, yes, the _most_ one could ever have to change is to move from
[ ... ] to list([ ... ]) to get back today's semantics.  But any use
NOT so changed may break, in general; any perfectly correct program
coded with Python 2.1 to Python 2.3 -- several years' worth of
"current Python", by the time 2.4 comes out -- might break.

I think we should keep the user-observable semantics as they are now,
BUT maybe an optimization IS possible if all the user code does with
the LC is loop on it (or anyway just get its iter(...) and nothing
else).  Perhaps a _variant_ of "The Trick" MIGHT be practicable (since
I don't believe the "call from C holding just one ref" IS a real risk
here).  Again it would be based on the reference count being 1 at a
certain point.

The LC itself _might_ just build a generator and wrap it in a
"pseudolist" object.  Said pseudolist object, IF reacting to a tp_iter
when its reference count is one, NEED NOT "unfold" itself.  But for
ANY other operation, it must generate the real list and "get out of
the way" as much as possible.  Note that this includes a tp_iter WITH
rc>1.  For example:

    x = [ a.strip().upper() for a in thefile if len(a)>7 ]
    for y in x: blah(y)
    for z in x: bluh(z)

the first 'for' implicitly calls iter(x), but that must NOT be allowed
to "consume" thefile in a throwaway fashion -- because x can be used
again later (e.g. in the 2nd for).  This works fine today and has
worked for years, and I would NOT like it to break in 2.4...  If LCs
had been lazy from the start (just as they are in Haskell), that would
have been wonderful, but, alas, we didn't have the iterator protocol
then...:-(

As to whether the optimization is worth this complication, I dunno.
I'd rather have "iterator literals", I think -- simpler and more
explicit.  That way, when I see

    [x.bah() for x in someiterator]

I KNOW the iterator is consumed right then and there; I don't need to
look at the surrounding context...  Context-dependent semantics is not
Python's most normal and usual approach, after all...
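[Editor's note: a minimal sketch of the breakage described above, written with today's generator-expression syntax (which did not yet exist at the time of this post); `thefile` is stood in for by a small list of strings:]

```python
# What a *lazy* LC would amount to: a one-shot iterator.
thefile = ["short\n", "a longer line\n", "tiny\n", "another long one\n"]

x = (a.strip().upper() for a in thefile if len(a) > 7)

first = [y for y in x]   # consumes the iterator
second = [z for z in x]  # the iterator is already spent

# first  == ['A LONGER LINE', 'ANOTHER LONG ONE']
# second == []  -- whereas today's LC gives the full list both times
```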
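[Editor's note: a pure-Python sketch of the "pseudolist" idea, with hypothetical names of my own choosing -- this is not a real CPython patch. Reference counts cannot be tested reliably from Python code, so the sketch always takes the safe branch and unfolds before any operation; at the C level, tp_iter could additionally hand out the generator directly when the refcount is 1:]

```python
class PseudoList:
    """Wraps a generator; 'unfolds' into a real list before any
    operation, so the wrapped iterator is never visibly consumed."""

    def __init__(self, gen):
        self._gen = gen
        self._materialized = None  # the real list, once generated

    def _unfold(self):
        # Generate the real list and "get out of the way".
        if self._materialized is None:
            self._materialized = list(self._gen)
        return self._materialized

    def __iter__(self):
        # At the C level, a tp_iter seen with refcount 1 could return
        # self._gen directly (the throwaway case) and skip unfolding.
        return iter(self._unfold())

    def __len__(self):
        return len(self._unfold())

    def __getitem__(self, index):
        return self._unfold()[index]


# Reuse is safe, matching the two-loop example above:
thefile = ["short\n", "a longer line\n", "tiny\n", "another long one\n"]
pl = PseudoList(a.strip().upper() for a in thefile if len(a) > 7)
ys = [y for y in pl]  # iter(pl) unfolds; thefile is not "consumed" away
zs = [z for z in pl]  # same contents the second time around
```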
Alex