
On Mon, May 03, 2021 at 09:04:51PM +1000, Chris Angelico wrote:
> > My understanding of the situation is that the list comprehension [ x*x for x in range(5) ] is a shorthand for list( x*x for x in range(5) ).
> Sorta-kinda. It's not a shorthand in the sense that you can't simply replace one with the other,
Only because the `list` name could be shadowed or rebound to something else. Syntactically and functionally, aside from the lazy vs eager difference, a comprehension is a comprehension and there is nothing generator comprehensions can do that list comprehensions can't. In Python 2 there were scoping differences between the two, but I believe that in Python 3 those have been eliminated.
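
Something like this quick interactive sketch illustrates the point (the rebinding of `list` is purely for demonstration):

>>> data = range(5)
>>> list = tuple   # shadow the builtin
>>> list(x*x for x in data)
(0, 1, 4, 9, 16)
>>> [x*x for x in data]
[0, 1, 4, 9, 16]
>>> del list   # restore access to the builtin

The list(...) call has to look the name up at runtime, so it follows the rebinding, while the comprehension syntax always builds a list.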
> but they do have very similar behaviour, yes. A genexp is far more flexible than a list comp,
Aside from the lazy nature of generator comprehensions, what else?
> so the compiled bytecode for list(genexp) has to go to a lot of unnecessary work to permit that flexibility, whereas the list comp can simplify things down.
I don't think so. The bytecode in 3.9 is remarkably similar.

>>> dis.dis('list(spam for spam in eggs)')
  1           0 LOAD_NAME                0 (list)
              2 LOAD_CONST               0 (<code object <genexpr> at 0x7fc185ce0870, file "<dis>", line 1>)
              4 LOAD_CONST               1 ('<genexpr>')
              6 MAKE_FUNCTION            0
              8 LOAD_NAME                1 (eggs)
             10 GET_ITER
             12 CALL_FUNCTION            1
             14 CALL_FUNCTION            1
             16 RETURN_VALUE

Disassembly of <code object <genexpr> at 0x7fc185ce0870, file "<dis>", line 1>:
  1           0 LOAD_FAST                0 (.0)
        >>    2 FOR_ITER                10 (to 14)
              4 STORE_FAST               1 (spam)
              6 LOAD_FAST                1 (spam)
              8 YIELD_VALUE
             10 POP_TOP
             12 JUMP_ABSOLUTE            2
        >>   14 LOAD_CONST               0 (None)
             16 RETURN_VALUE

The bytecode for the list comp `[spam for spam in eggs]` is only three bytecodes shorter, so that doesn't support your comment about "a lot of unnecessary work". Comparing with `dis.dis('[spam for spam in eggs]')`, the list comp can:

- skip the name lookup for list (LOAD_NAME);
- and the CALL_FUNCTION that ends up calling it.

The disassemblies of the two code objects, "<genexpr>" and "<listcomp>", have slightly different implementations but only differ by one bytecode overall.

As far as runtime efficiency goes, list comps are a little faster. Iterating over a 1000-item sequence is 33% faster for a list comp, but for a 100000-item sequence that drops to 25% faster. But as soon as you do a significant amount of work inside the comprehension, that work is likely to dominate the other costs.

There's definitely some overhead needed to support starting and stopping a generator, but we can argue that is an implementation detail. A sufficiently clever interpreter could avoid that overhead.
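
For anyone who wants to check the timings on their own machine, a minimal sketch using the timeit module is something like this (the exact percentages will vary with machine, sequence size and Python version):

from timeit import Timer

setup = "data = list(range(1000))"   # also try 100000
listcomp = Timer("[x for x in data]", setup=setup)
genexp = Timer("list(x for x in data)", setup=setup)

# Take the best of several repeats to reduce timing noise.
print("list comp:", min(listcomp.repeat(repeat=5, number=10000)))
print("genexp:   ", min(genexp.repeat(repeat=5, number=10000)))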
> That said, I think the only way you'd actually detect a behavioural difference is if the name "list" has been rebound.
That and timing.

-- 
Steve