[Python-ideas] Temporary variables in comprehensions

Steven D'Aprano steve at pearwood.info
Thu Feb 15 19:57:40 EST 2018


Hi fhsxfhsx, and welcome.

My comments below, interleaved with yours.


On Thu, Feb 15, 2018 at 01:56:44PM +0800, fhsxfhsx wrote:

[quoted out of order]
> And I hope the discussion could focus more on whether we should allow 
> assigning temporary variables in comprehensions rather than how to 
> solve the specific example I mentioned above.

Whether or not to allow this proposal will depend on what alternate 
solutions to the problem already exist, so your specific example is very 
relevant. Any proposed change has to compete with existing solutions.


> As far as I can see, a comprehension like
> alist = [f(x) for x in range(10)]
> is better than a for-loop
> for x in range(10):
>   alist.append(f(x))
> because the previous one shows every element of the list explicitly so 
> that we don't need to handle `append` mentally.

While I personally agree with you, many others disagree. I know quite a 
few experienced, competent Python programmers who avoid list 
comprehensions because they consider them harder to read and reason 
about. They consider a regular for-loop better precisely because you do 
see the explicit call to append.

(In my experience, those of us who get functional-programming idioms 
often forget that others find them tricky.)

The point is that list comprehensions are already complex enough that 
they are difficult for many people to learn, and some people never come 
to grips with them. Adding even more features comes with a cost.

The bottom line is that it isn't clear to me that allowing local 
variables inside comprehensions will make them more readable.


> But when it comes to something like
> [f(x) + g(f(x)) for x in range(10)]
> you find you have to sacrifice some readableness if you don't want two 
> f(x) which might slow down your code.

The usual comments about premature optimisation apply here.

Setting a new comprehension variable is not likely to be free, and may even be 
more costly than calling f(x) twice if f(x) is a cheap expression such as x+1:

    [x+1 + some_func(x+1) for x in range(10)]

could be faster than

    [y + some_func(y) for x in range(10) let y = x + 1]

or whatever syntax we come up with.
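The proposed `let` clause doesn't exist, so it can't be timed directly. As a rough stand-in, this sketch binds `y` once with a one-element inner loop, `for y in [x + 1]`, which current Python already accepts inside a comprehension; `some_func` is a hypothetical cheap function. Timings depend on interpreter version and machine, so treat this as a way to measure rather than as evidence that either form wins.

```python
import timeit


def some_func(n):
    # Hypothetical cheap function standing in for some_func() above.
    return n * 2


def call_twice():
    # Recompute x+1 in both places.
    return [x+1 + some_func(x+1) for x in range(10)]


def bind_once():
    # Bind y = x+1 once via a one-element inner loop (stand-in for "let").
    return [y + some_func(y) for x in range(10) for y in [x + 1]]


# Both forms build the same list; only their speed may differ.
assert call_twice() == bind_once()

for fn in (call_twice, bind_once):
    seconds = timeit.timeit(fn, number=10_000)
    print(f"{fn.__name__}: {seconds:.4f}s")
```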


> Someone may argue that one can write
> [y + g(y) for y in [f(x) for x in range(10)]]

Indeed. This would be the functional-programming solution, and I 
personally think it is an excellent one. The only changes are that I'd 
use a generator expression for the intermediate value, avoiding the need 
to make a full list, and I would lay it out more nicely, using 
whitespace to make the structure more clear:

    result = [y + g(y) for y in 
                 (f(x) for x in range(10))
                 ]
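To make the nested version concrete, here is a small sketch with hypothetical stand-ins for `f` and `g`. It counts calls to `f`, confirming that the nested generator expression builds the same list as the original while evaluating `f(x)` once per element instead of twice.

```python
calls = 0


def f(x):
    # Hypothetical "expensive" function that counts how often it runs.
    global calls
    calls += 1
    return x * x


def g(y):
    # Hypothetical second function applied to f's result.
    return y + 1


# Original form: f(x) is evaluated twice per element.
calls = 0
doubled = [f(x) + g(f(x)) for x in range(10)]
calls_doubled = calls

# Nested form: the inner generator expression yields each f(x) once,
# and no intermediate list is built.
calls = 0
result = [y + g(y) for y in
             (f(x) for x in range(10))
             ]
calls_nested = calls

assert result == doubled
print(calls_doubled, calls_nested)  # 20 10
```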


> but it's not as clear as to show what `y` is in a subsequent clause, 
> not to say there'll be another temporary list built in the process.

There's no need to build the temporary list. Use a generator 
expression. And I disagree that the value of y isn't as clear.

An alternative is simply to refactor your list comprehension. Move the 
calls to f() and g() into a helper function:

    def func(x):
        y = f(x)
        return y + g(y)

and now you can write the extremely clear comprehension

    [func(x) for x in range(10)]

that needs no extra variable.
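A runnable version of the helper-function refactoring, with `f` and `g` again as hypothetical stand-ins and `func` as given above:

```python
def f(x):
    # Hypothetical expensive function.
    return x * x


def g(y):
    # Hypothetical second function.
    return y + 1


def func(x):
    # Compute f(x) once, then reuse the result.
    y = f(x)
    return y + g(y)


result = [func(x) for x in range(10)]
print(result[:4])  # [1, 3, 9, 19]
```

A side benefit is that `func` has a name and can be tested on its own, independently of the comprehension that uses it.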



[...]
> In a word, what I'm arguing is that we need a way to assign temporary 
> variables in a comprehension.

"Need" is very strong. I think that the two alternatives I mention above 
cover 95% of the cases where you might use a local variable in a 
comprehension. And of the remaining cases, many of them will be so 
complex that they should be re-written as an explicit for-loop. So in my 
opinion, we're only talking about a "need" to solve the problem for a 
small proportion of cases:

- most comprehensions don't need a local variable (apart from 
  the loop variable) at all;

- of those which do need a local variable, most can be easily 
  solved using a nested comprehension or a helper function;

- of those which cannot be solved that way, most are complicated
  enough that they should use a regular for-loop;

- leaving only a small number of cases which are complicated enough
  to genuinely benefit from local variables but not too complicated.

So this is very much a borderline feature. Occasionally it would be 
"nice to have", but on the negative side:

- it adds complexity to the language;

- makes comprehensions harder to read;

- and people will use it unnecessarily where there is no readability 
  or speed benefit (premature optimisation again).

It is not clear to me that we should burden *all* Python programmers 
with additional syntax and complexity of an *already complex* feature 
for such a marginal improvement.


> In my opinion, code like
> [y + g(y) for x in range(10) **some syntax for `y=f(x)` here**]
> is more natural than any solution we now have.
> And that's why I pro the new syntax, it's clear, explicit and readable

How can you say that the new syntax is "clear, explicit and readable" 
when you haven't proposed any new syntax yet?

For lack of anything better, I'm going to suggest "let y = f(x)" as the 
syntax, although personally I don't like it even a bit.

Where should the assignment go?

    [(y, y**2) let y = x+1 for x in (1, 2, 3, 4)]

    [(y, y**2) for x in (1, 2, 3, 4) let y = x+1]

I think they're both pretty ugly, but I can't think of anything else.

Can we rename the loop variable, or is that an error?

    [(x, x**2) let x = x+1 for x in (1, 2, 3, 4)]

How do they interact when you have multiple loops and if-clauses?

    [(w, w**2) for x in (1, 2, 3, 4) let y = x+1 
               for a in range(y) let z = a+1 if z > 2 
               for b in range(z) let w = z+1]
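One plausible reading, and it is only an assumption since no semantics have been specified, is that each `let` acts like an assignment statement at that point in the nesting, just as `for` and `if` clauses already expand into nested statements. Under that assumption the example above would mean the explicit loop below (the `let` syntax itself is hypothetical and not valid Python):

```python
def let_example():
    # Assumed expansion of the hypothetical comprehension
    #   [(w, w**2) for x in (1, 2, 3, 4) let y = x+1
    #              for a in range(y) let z = a+1 if z > 2
    #              for b in range(z) let w = z+1]
    # with each clause nested inside the previous one, left to right.
    result = []
    for x in (1, 2, 3, 4):
        y = x + 1
        for a in range(y):
            z = a + 1
            if z > 2:
                for b in range(z):
                    w = z + 1
                    result.append((w, w**2))
    return result


print(let_example()[:3])  # [(4, 16), (4, 16), (4, 16)]
```

Even this reading has to decide that the `if` clause can see the `z` bound just before it, which is one of the ordering questions a real proposal would need to settle.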


For simplicity, perhaps we should limit any such local assignments to 
the very end of the comprehension:

    [expression for name in sequence 
                <zero or more for-loops and if-clauses>
                <zero or more let-clauses>
                ]

but that means we can't optimise this sort of comprehension:

    [expression for x in sequence 
                for y in (something_expensive(x) + function(something_expensive(x)))
                ]

Or these:

    [expression for x in sequence 
                if something_expensive(x) or condition(something_expensive(x))
                ]
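Both of these can already avoid the duplicate call by pairing each `x` with its expensive value in an inner generator expression, the same nesting technique as the nested comprehension earlier. In this sketch `something_expensive`, `function` and `condition` are hypothetical stand-ins that make the examples runnable, and `(x, y)` and `x` stand in for `expression`. The call counts show how often the expensive function runs in each form.

```python
calls = 0


def something_expensive(x):
    # Hypothetical costly computation that counts its own calls. It returns
    # a list so the first comprehension can iterate over the result.
    global calls
    calls += 1
    return list(range(x))


def function(items):
    # Hypothetical follow-up function applied to the expensive result.
    return [sum(items)]


def condition(items):
    # Hypothetical predicate for the filtering comprehension.
    return len(items) > 5


sequence = range(4)

# First case, naive: something_expensive(x) runs twice per x.
calls = 0
naive_loop = [(x, y) for x in sequence
              for y in (something_expensive(x) + function(something_expensive(x)))]
naive_loop_calls = calls

# First case, paired: compute it once per x, then reuse it as s.
calls = 0
paired_loop = [(x, y) for x, s in ((x, something_expensive(x)) for x in sequence)
               for y in (s + function(s))]
paired_loop_calls = calls

# Second case, naive and paired.
calls = 0
naive_filter = [x for x in sequence
                if something_expensive(x) or condition(something_expensive(x))]
naive_filter_calls = calls

calls = 0
paired_filter = [x for x, s in ((x, something_expensive(x)) for x in sequence)
                 if s or condition(s)]
paired_filter_calls = calls

assert naive_loop == paired_loop and naive_filter == paired_filter
print(naive_loop_calls, paired_loop_calls, naive_filter_calls, paired_filter_calls)  # 8 4 5 4
```

The naive filter makes five calls rather than eight because `or` skips the second call whenever the first result is truthy. The price of the paired form is that the `(x, s)` tuples make the comprehension noticeably harder to read.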


I think these are very hard questions to answer.


-- 
Steve

