[Python-Dev] Re: accumulator display syntax

Guido van Rossum guido at python.org
Thu Oct 23 01:43:40 EDT 2003


[Guido]
> > I have to stand on my head to understand what it
> > does.  This is even the case for examples like
> > 
> >   reduce(lambda x, y: x + y.foo, seq)

[Greg]
> It occurs to me that, with generator expressions,
> such cases could be rewritten as
> 
>     reduce(lambda x, y: x + y, (z.foo for z in seq))
> 
> i.e. any part of the computation that only depends on
> the right argument can be factored out into the
> generator. So I might have to take back some of what
> I said earlier about generator comprehensions being
> independent of reduce.
> 
> But if I understand you correctly, what you're saying
> is that the interesting cases are the ones where there
> isn't a ready-made binary function that does what
> you want, in which case you're going to have to spell
> everything out explicitly anyway one way or another.

(And then spelling it out so that it works with reduce() reduces
clarity.)
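
To make Greg's factoring concrete, here is a runnable sketch (spelled
so it runs on Python 3, where reduce lives in functools; the Item
class is my stand-in, and I've added an explicit initial value of 0,
which the one-liner needs anyway so the first step doesn't try to add
two items):

  from functools import reduce
  from operator import add

  class Item:
      def __init__(self, foo):
          self.foo = foo

  seq = [Item(1), Item(2), Item(3)]

  # Original shape: the lambda mixes accumulation with attribute access.
  a = reduce(lambda x, y: x + y.foo, seq, 0)

  # Factored shape: the generator handles .foo, leaving a ready-made
  # binary function (operator.add) to do the accumulation.
  b = reduce(add, (z.foo for z in seq), 0)

  assert a == b == 6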

> In that case, the most you could gain from a reduce
> syntax would be that it's an expression rather than
> a sequence of statements.
> 
> But the same could be said of list comprehensions --
> and *was* said quite loudly by many people in the early
> days, if I recall correctly. What's the point, people
> asked, when writing out a set of nested loops is just
> about as easy?

Some people still hate LC's for this reason.

> Somehow we came to the conclusion that being able to
> write a list comprehension as an expression was a
> valuable thing to have, even if it wasn't significantly
> shorter or clearer. What about reductions? Do we feel
> differently? If so, why?

IMO LC's *are* significantly clearer because the notation lets you
focus on what goes into the list (e.g. the expression "x**2") and under
what conditions (e.g. the condition "x%2 == 1") rather than how you
get it there (i.e. the initializer "result = []" and the call
"result.append(...)").

This is an incredibly common idiom in the use of loops; for
experienced programmers the boilerplate disappears when they read the
code, but for less experienced readers it takes more time to recognize
the idiom.  I think this is at least in part due to the fact that
there are more details that can be written differently, e.g. the name
of the result variable, and exactly at which point it is initialized.

I think that for reductions the gains are less clear.  The initializer
for the result variable and the call that updates it are no longer
boilerplate, because they vary for each use; plus the name of the
result variable should be chosen carefully because it indicates what
kind of result it is (e.g. a sum or product).  So, leaving out the
condition for now, the pattern or idiom is:

  <result> = <initializer>
  for <variable> in <iterable>:
      <result> = <expression>

(Where <expression> uses <result> and <variable>.)

If we think of this as a template with parameters, there are five
parameters!  (A LC without a condition only has 3: <expression>,
<variable> and <iterable>.)  No matter how hard you try, a macro with
5 parameters will have a hard time conveying the meaning of each
without being at least as verbose as the full template.
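
For comparison, here is how the five slots map onto reduce() for a
sum-of-squares instantiation (my example, again spelled for Python 3;
note that <expression> has to become the body of a two-argument
function):

  from functools import reduce

  S = range(10)

  # <initializer> = 0, <iterable> = S, <variable> = x,
  # <result> = total, <expression> = total + x**2
  total = reduce(lambda total, x: total + x**2, S, 0)

  assert total == sum(x**2 for x in S)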

We could reduce the number of template parameters to 4 by leaving
<result> anonymous; we could then refer to it by e.g. "_" in
<expression>, which is more concise and perhaps acceptable, but makes
certain uses more strained (e.g. mean() below).

Just for fun, let me try to propose a macro syntax:

  reduction(<initializer>, <expression>, <variable>, <iterable>)

(I think it's better to have <initializer> as the first parameter, but
you could quibble about the parameter order.)

For example:

  reduction(0, _+x**2, x, S)

Lavishly sprinkle syntactic sugar, and perhaps it can become this
('reduction' would have to be a reserved word):

  reduction(0, _+x**2 for x in S)

A few more examples using this notation:

  # product(S), if Raymond's product() builtin is accepted
  reduction(1, _*x for x in S)

  # mean of f(x); uses result tuple and needs result postprocessing
  total, n = reduction((0, 0), (_[0]+f(x), _[1]+1) for x in S)
  mean = total/n

  # horner(S, x): evaluate a polynomial over x: [6, 3, 4] => 6*x**2 + 3*x + 4
  reduction(0, _*x + c for c in S)

In each of these cases I have the same gut response as I do to the
reduce() spellings: the notation is too "concentrated"; I have to
think so hard before I understand what it does that I wouldn't mind
having it spread over three lines.  Compare the above four examples
to:

  sum = 0
  for x in S:
      sum += x**2

  product = 1
  for x in S:
      product *= x

  total, n = 0, 0
  for x in S:
      total += f(x)
      n += 1
  mean = total/n

  horner = 0
  for c in S:
      horner = horner*x + c

I find that these cause much less strain on the eyes.
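
(For completeness: the closest you can get to the proposed macro today
is a plain function, with a lambda standing in for the "_" expression.
This sketch of mine shows the semantics, but it loses exactly the
conciseness the macro was after:)

  from functools import reduce

  def reduction(initializer, expression, iterable):
      # Hypothetical helper, not a real builtin: <expression> must be
      # passed as a function of (_, <variable>).
      return reduce(expression, iterable, initializer)

  S = [6, 3, 4]
  x = 10              # for the horner example: S holds the coefficients
  def f(v): return v  # stand-in for the f() in the mean example

  sum_sq = reduction(0, lambda _, x: _ + x**2, S)
  product = reduction(1, lambda _, x: _ * x, S)
  total, n = reduction((0, 0), lambda _, x: (_[0] + f(x), _[1] + 1), S)
  mean = total / n
  horner = reduction(0, lambda _, c: _*x + c, S)

  assert horner == 6*x**2 + 3*x + 4  # == 634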

(BTW the horner example shows that insisting on augmented assignment
would reduce the power: its update "horner = horner*x + c" is not of
the form "horner OP= <expr>".)

Concluding, I think the reduce() pattern is doomed -- the template is
too complex to capture in special syntax.

--Guido van Rossum (home page: http://www.python.org/~guido/)


