[Python-ideas] Map-then-filter in comprehensions

Paul Moore p.f.moore at gmail.com
Tue Mar 8 12:18:15 EST 2016


On 8 March 2016 at 16:20, Allan Clark <allan.clark at gmail.com> wrote:
> Again, fair. Not sure I can quite articulate why I would prefer a
> comprehension here.
> A very weak argument would be that such code tends to change, and when it
> does it may get morphed back into something that *can* be done with a
> comprehension, but you might end up leaving it as a for-loop with append.

That's actually a very good point - there's quite a distinct "break
point" where you have to rewrite code as a loop rather than a
comprehension, and for maintainability purposes that switch tends to
be permanent. So delaying the point where you need to switch (assuming
the comprehension form is readable and maintainable) is a fair goal.
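
To make that break point concrete, here is a rough sketch using the
thread's artificial example (absolute values greater than 5). The
input list and the variable names are made up for illustration.

```python
numbers = [3, -7, 10, -2, 6]  # made-up sample input

# Comprehension form: compact, but abs(x) is computed twice for every
# element that is kept (once in the filter, once for the result).
as_comprehension = [abs(x) for x in numbers if abs(x) > 5]

# Loop form: abs(x) is computed once per element, at the cost of
# switching to the loop-and-append shape.
as_loop = []
for x in numbers:
    y = abs(x)
    if y > 5:
        as_loop.append(y)

print(as_comprehension)
print(as_loop)
```

Both forms produce the same list; the difference is only in shape and
in how often abs() is called.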

There's also the point that much code is actually one-off scripts, and
the break point can be very different in such code - "how I think of
it" has a much higher weight in such a situation (often even greater
than "is it maintainable" for completely throwaway code).

The problem then becomes one of finding a syntax that is natural and
not forced. That's hard, particularly because the high bar on
introducing new keywords means you end up trying to reuse words that
only "sort of" suit the situation.

>> Obviously real world examples would be better than artificial ones, as
>> artificially simple examples make terse notation look better... And in
>> the example using a generator, you'd be able to give it a far more
>> meaningful name with a bit of real-life domain terminology
>
> Yeah agreed.
> I would note that in a real-world example you are giving a name to something
> that is forced to calculate *and* filter. So your name is going to end up
> being something like, say "is_registered_and_voting_for", or
> "is_highest_tax_bracket_and_is_taxed". Which you might not actually write,
> and instead opt for something like "tax_amount". Even in your code for my
> artificial example your generator is named "bounded" but that does not sound
> like it is doing any filtering at all, it sounds like it is simply bounding
> all values. Of course to be fair, I didn't give a name for that at all, and
> probably I want to give a name for the resulting list. Although of course we
> both need to do that, but at least with a comprehension you only have to
> come up with a name for the result, not for the result and the associated
> generator.

Again, a good point. Compound clauses typically make for bad names, so
if you're thinking in terms of "calculate and filter" it'll be hard to
think of a really good name.

Maybe a better approach is to look at chaining of individual building
blocks. The

    [y for y in (abs(x) for x in numbers) if y > 5]

approach takes that form, but the nesting hides the fact.

In some code I wrote today, I used the form

    data = [abs(x) for x in numbers]
    data = [x for x in data if x > 5]
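
As a quick sanity check that the two-step form is the same pipeline as
the nested one, a small sketch (the sample input is made up):

```python
numbers = [3, -7, 10, -2, 6]  # made-up sample input

# Nested form: the map is the inner generator, the filter is outside.
nested = [y for y in (abs(x) for x in numbers) if y > 5]

# Two-step form: the same pipeline, one stage per line.
data = [abs(x) for x in numbers]
data = [x for x in data if x > 5]

print(nested == data)
```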

You can even use generators

    data = (abs(x) for x in numbers)
    data = (x for x in data if x > 5)
    list(data)

to make the calculations lazy.

To me, that makes the step by step "calculate, then filter" pipeline
explicit, and is actually really readable. Of course, once again this
is the sort of thing that's very much about personal opinion, and you
may hate that style (particularly using the generic variable name
"data" over and over).
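
On the lazy variant, a small sketch of what "lazy" means in practice.
Here traced_abs is a hypothetical stand-in for abs() that records
when it is called, and the sample input is made up:

```python
calls = []  # records each value as the first stage processes it

def traced_abs(x):
    """Hypothetical stand-in for abs() that logs when it runs."""
    calls.append(x)
    return abs(x)

numbers = [3, -7, 10, -2, 6]  # made-up sample input

# Building the pipeline only creates generator objects.
data = (traced_abs(x) for x in numbers)
data = (x for x in data if x > 5)
calls_before = list(calls)  # snapshot: traced_abs has not run yet

# Consuming the pipeline is what actually does the work.
result = list(data)

print(calls_before, calls, result)
```

Rebinding "data" is safe here because each generator expression
captures its source iterable at the point where it is created.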

So, overall I'd say I'm not against the idea, but there are some
reasonably good alternatives already available, and so coming up with
something that's compellingly better than the status quo is going to
be a hard job. I'm glad we're having the discussion, though - we need
to (re-) explore questions like this to avoid the language stagnating.
And who knows when the inspiration will strike? :-)

Paul

