[Python-Dev] accumulator display syntax

Alex Martelli aleaxit at yahoo.com
Tue Oct 21 11:49:21 EDT 2003


On Tuesday 21 October 2003 03:57 pm, Skip Montanaro wrote:
>     >> [Alex measures speed improvements]
>
>     Guido> Great!  This is a plus for iterator comprehensions (we need a
>     Guido> better term BTW).
>
> Here's an alternate suggestion.  Instead of inventing new syntax, why not
> change the semantics of list comprehensions to be lazy?  They haven't been
> in use that long, and while they are popular, the semantic tweakage would
> probably cause minimal disruption.  In situations where laziness wasn't
> wanted, the most that a particular use would have to change (I think) is to
> pass it to list().

Well, yes, the _most_ one could ever have to change is to move from
[ ... ] to list([ ... ]) to get back today's semantics.  But any use NOT so
changed may break, in general; any perfectly correct program coded
with Python 2.1 to Python 2.3 -- several years' worth of "current Python",
by the time 2.4 comes out -- might break.

I think we should keep the user-observable semantics as now, BUT
maybe an optimization IS possible if all the user code does with the
LC is loop on it (or anyway just get its iter(...) and nothing else).

Perhaps a _variant_ of "The Trick" MIGHT be practicable (since I
don't believe the "call from C holding just one ref" IS a real risk here).
Again it would be based on reference-count being 1 at a certain point.

The LC itself _might_ just build a generator and wrap it in a
"pseudolist" object.  Said pseudolist object, IF reacting to a tp_iter
when its reference count is one, NEED NOT "unfold" itself.  But
for ANY other operation, it must generate the real list and "get out
of the way" as much as possible.
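
A rough sketch of the "pseudolist" idea, in pure Python rather than at
the C level where it would really have to live.  Everything here is
hypothetical: the class name PseudoList, the method iter_once, and the
sole_reference flag (which merely stands in for "reference count is one
when tp_iter is called") are assumptions made for illustration, not an
existing or proposed API.

```python
class PseudoList:
    """Hypothetical stand-in for the 'pseudolist'; not a real CPython type."""

    def __init__(self, gen):
        self._gen = gen      # the lazy source built by the LC
        self._items = None   # the real list, once it has been built

    def _unfold(self):
        # Build the real list on first need, then always reuse it.
        if self._items is None:
            self._items = list(self._gen)
            self._gen = None
        return self._items

    def iter_once(self, sole_reference):
        # Stand-in for tp_iter.  'sole_reference' models "refcount is 1":
        # nobody else can look at this object later, so it is safe to hand
        # out the lazy source without ever building the list.
        if sole_reference and self._items is None:
            gen, self._gen = self._gen, iter(())
            return gen
        return iter(self._unfold())

    # ANY other operation must produce the real list first.
    def __iter__(self):
        return iter(self._unfold())

    def __len__(self):
        return len(self._unfold())

    def __getitem__(self, index):
        return self._unfold()[index]
```

The point of the sketch is only the split between the two paths: one
cheap path for the "loop once and throw away" case, and a fallback that
behaves exactly like today's list for everything else.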

Note that this includes a tp_iter WITH rc>1.  For example:

x = [ a.strip().upper() for a in thefile if len(a)>7 ]
for y in x: blah(y)
for z in x: bluh(z)

the first 'for' implicitly calls iter(x) but that must NOT be allowed
to "consume" thefile in a throwaway fashion -- because x can be
used again later (e.g. in the 2nd for).  This works fine today and
has worked for years, and I would NOT like it to break in 2.4... if 
LC's had been lazy from the start (just as they are in Haskell),
that would have been wonderful, but, alas, we didn't have the 
iterator protocol then...:-(
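
To make the breakage concrete, here is a small sketch.  It assumes a
generator expression (syntax that did not exist when this thread was
written) as a stand-in for a hypothetically lazy LC, and an in-memory
io.StringIO object in place of a real file; the sample lines are made
up.

```python
import io

thefile = io.StringIO("short\nlong enough line\nanother long line\n")

# Today's eager semantics: the list is built once and can be looped twice.
x = [a.strip().upper() for a in thefile if len(a) > 7]
first = [y for y in x]
second = [z for z in x]

# Stand-in for a lazy LC: the first loop consumes thefile for good.
thefile.seek(0)
lazy = (a.strip().upper() for a in thefile if len(a) > 7)
lazy_first = [y for y in lazy]
lazy_second = [z for z in lazy]   # nothing left to iterate over

print(first == second)    # the eager list survives both loops
print(lazy_second)        # the lazy version comes back empty
```

With the eager list both loops see the same two items; with the lazy
stand-in the second loop silently sees nothing, which is exactly the
kind of quiet breakage existing programs would suffer.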

As to whether the optimization is worth this complication, I dunno.
I'd rather have "iterator literals", I think -- simpler and more explicit.
That way when I see [x.bah() for x in someiterator] I KNOW the
iterator is consumed right then and there, I don't need to look at
the surrounding context... context-dependent semantics is not
Python's usual approach, after all...


Alex



