Re: [Python-ideas] Proposal: A Reduce-Map Comprehension and a "last" builtin

9 Apr 2018

Kyle, you sounded so reasonable when you were trashing itertools.accumulate
(which I now agree is horrible).  But then you go and support Serhiy's
madness:

    smooth_signal = [average for average in [0] for x in signal for average in [(1-decay)*average + decay*x]]

which I agree is clever, but reads more like a riddle than readable code.

Anyway, I continue to stand by:

    (y:= f(y, x) for x in iter_x from y=initial_y)

And, if that's not offensive enough, to its extension:

    (z, y := f(z, x) -> y for x in iter_x from z=initial_z)

Which carries state "z" forward but only yields "y" at each iteration.
(see proposal: https://github.com/petered/peps/blob/master/pep-9999.rst)
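
For comparison, here is roughly what I would expect the first form to mean
in today's Python.  This is only a sketch of the intended semantics, not
text from the proposal; the helper name `scan` and the example values are
made up for illustration.

```python
def scan(f, iter_x, initial_y):
    """Yield each updated y, where y = f(y, x) is carried across iterations.

    Hypothetical helper: a plain-Python reading of
        (y := f(y, x) for x in iter_x from y=initial_y)
    """
    y = initial_y
    for x in iter_x:
        y = f(y, x)   # update the carried state
        yield y       # and emit it at every step


# Illustrative use: running sums of a short list.
running_sums = list(scan(lambda y, x: y + x, [1, 2, 3, 4], 0))
print(running_sums)
```

The proposed syntax would express the same thing inline, without a named
helper.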

Why am I so obsessed?  Because it will allow you to conveniently replace
classes with cleaner, more concise, functional code.  People who thought they
never needed such a construct may suddenly start finding it indispensable
once they get used to it.

How many times have you written something of this form?

    class StatefulThing(object):

        def __init__(self, initial_state, param_1, param_2):
            self._param_1 = param_1
            self._param_2 = param_2
            self._state = initial_state

        def update_and_get_output(self, new_observation):  # (or just __call__)
            self._state = do_some_state_update(
                self._state, new_observation, self._param_1)
            output = transform_state_to_output(self._state, self._param_2)
            return output

    processor = StatefulThing(initial_state=initial_state, param_1=1, param_2=4)
    processed_things = [processor.update_and_get_output(x) for x in x_gen]

I've done this many times.  Video encoding, robot controllers, neural
networks, any iterative machine learning algorithm, and probably lots of
things I don't know about - they all tend to have this general form.

And how many times have I had issues like "Oh no now I want to change
param_1 on the fly instead of just setting it on initialization, I guess I
have to refactor all usages of this class to pass param_1 into
update_and_get_output instead of __init__".

What if instead I could just write:

    def update_and_get_output(last_state, new_observation, param_1, param_2):
        new_state = do_some_state_update(last_state, new_observation, param_1)
        # Output is computed from the updated state, as in the class version.
        output = transform_state_to_output(new_state, param_2)
        return new_state, output

    processed_things = [
        state, output := update_and_get_output(state, x, param_1=1, param_2=4) -> output
        for x in observations
        from state=initial_state
    ]

Now we have:
- No mutable objects (which cuts off a whole slew of potential bugs and
anti-patterns familiar to people who do OOP).
- Fewer lines of code.
- Looser assumptions on usage and less refactoring (if I now want to pass
in param_1 at each iteration instead of just at initialization, I need to
make no changes to update_and_get_output).
- No need for state getters/setters, since state is passed around
explicitly.
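
Until something like this exists, the same shape can be approximated with a
small generator.  The sketch below is an assumption about the intended
semantics, not part of the proposal: `scan_outputs` is a hypothetical
helper, and the bodies of `do_some_state_update` and
`transform_state_to_output` are toy stand-ins so the example runs.

```python
from functools import partial


def scan_outputs(step, observations, initial_state):
    """Carry `state` forward but yield only `output` at each iteration.

    `step(state, x)` must return a `(new_state, output)` pair.
    """
    state = initial_state
    for x in observations:
        state, output = step(state, x)
        yield output


# Toy stand-ins (assumptions, only here to make the sketch runnable).
def do_some_state_update(state, observation, param_1):
    return state + param_1 * observation


def transform_state_to_output(state, param_2):
    return state * param_2


def update_and_get_output(last_state, new_observation, param_1, param_2):
    new_state = do_some_state_update(last_state, new_observation, param_1)
    output = transform_state_to_output(new_state, param_2)
    return new_state, output


# Parameters are bound at the call site, not baked into an object.
step = partial(update_and_get_output, param_1=1, param_2=4)
processed_things = list(scan_outputs(step, [1, 2, 3], 0))
print(processed_things)
```

Passing a different param_1 at each iteration would only change how `step`
is built at the call site; update_and_get_output itself stays untouched.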

I realize that calling for changes to syntax is a lot to ask - but I still
believe that the main objections to this syntax would also have been raised
as objections to the now-ubiquitous list-comprehensions - they seem hostile
and alien-looking at first, but very lovable once you get used to them.

On Sun, Apr 8, 2018 at 1:41 PM, Kyle Lahnakoski wrote:
...
On 2018-04-05 21:18, Steven D'Aprano wrote:
...
(I don't understand why so many people have such an aversion to writing
functions and seek to eliminate them from their code.)

I think I am one of those people that have an aversion to writing
functions!

I hope you do not mind that I attempt to explain my aversion here. I
want to clarify my thoughts on this, and maybe others will find
something useful in this explanation, maybe someone has wise words for
me. I think this is relevant to python-ideas because someone with this
aversion will make different language suggestions than those that don't.

Here is why I have an aversion to writing functions: Every unread
function represents multiple unknowns in the code. Every function adds
to code complexity by mapping an inaccurate name to specific
functionality.

When I read code, this is what I see:
...
    x = you_will_never_guess_how_corner_cases_are_handled(a, b, c)
    y = you_dont_know_I_throw_a_BaseException_when_I_do_not_like_your_arguments(j, k, l)

Not everyone sees code this way: I see people read method calls, make a
number of wild assumptions about how those methods work, AND THEY ARE
CORRECT!  How do they do it!?  It is as if there is some unspoken
convention about how code should work that's opaque to me.

For example, before I read the docs on
itertools.accumulate(list_of_length_N, func), here are the unknowns I see:

* Does it return N, or N-1 values?
* How are initial conditions handled?
* Must `func` perform the initialization by accepting just one
parameter, and accumulate with more-than-one parameter?
* If `func` is a binary function, and `accumulate` returns N values,
what's the Nth value?
* If `func` is a non-commutative binary function, what order are the
arguments passed?
* Maybe accumulate expects func(*args)?
* Is there a window size? Is it equal to the number of arguments of `func`?

These are not all answered by reading the docs, they are answered by
reading the code. The code tells me the first value is a special case;
the first parameter of `func` is the accumulated `total`; `func` is
applied in order; and an iterator is returned.  Despite all my
questions, notice I missed asking what `accumulate` returns? It is the
unknown unknowns that get me most.
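
A quick experiment makes those answers concrete.  This is a small check
against the standard library; the recording function and the input values
are chosen here purely for illustration, with subtraction picked because it
is non-commutative.

```python
from itertools import accumulate

calls = []


def record_sub(total, element):
    # Record the argument order so it can be inspected afterwards.
    calls.append((total, element))
    return total - element


values = [1, 2, 3, 4]
result = accumulate(values, record_sub)

is_iterator = iter(result) is result  # an iterator, not a list
as_list = list(result)

print(is_iterator)  # True
print(as_list)      # [1, -1, -4, -8]: N values, the first passed through as-is
print(calls)        # [(1, 2), (-1, 3), (-4, 4)]: func(total, element), N-1 calls
```
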

So, `itertools.accumulate` is a kinda-inaccurate name given to a
specific functionality: Not a problem on its own, and even delightfully
useful if I need it often.

What if I am in a domain where I see `accumulate` only a few times a
year? Or how about a program that uses `accumulate` in only one place?
For me, I must (re)read the `accumulate` source (or run the caller
through the debugger) before I know what the code is doing. In these
cases I advocate for in-lining the function code to remove these
unknowns. Instead of an inaccurate name, there is explicit code. If we
are lucky, that explicit code follows idioms that make the increased
verbosity easier to read.

Consider Serhiy Storchaka's elegant solution, which I reformatted for
readability
...
    smooth_signal = [
        average
        for average in [0]
        for x in signal
        for average in [(1-decay)*average + decay*x]
    ]

We see the initial conditions, we see the primary function, we see how
the accumulation happens, we see the number of returned values, and we
see it's a list. It is a compact, easy read, from top to bottom. Yes, we
must know `for x in [y]` is an idiom for assignment, but we can reuse
that knowledge in all our other list comprehensions.  So, in the
specific case of this Reduce-Map thread, I would advocate using the list
comprehension.
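
To see that the `for average in [...]` idiom really does act as assignment,
the comprehension can be placed next to an explicit loop.  This is a sketch
for comparison only; the function names and the sample signal are
assumptions made for illustration.

```python
def smooth_with_comprehension(signal, decay):
    # `for average in [value]` binds `average` to `value`, once per pass.
    return [
        average
        for average in [0]
        for x in signal
        for average in [(1 - decay) * average + decay * x]
    ]


def smooth_with_loop(signal, decay):
    # The same computation written as an ordinary loop.
    average = 0
    smoothed = []
    for x in signal:
        average = (1 - decay) * average + decay * x
        smoothed.append(average)
    return smoothed


sample = [1.0, 1.0, 1.0]
print(smooth_with_comprehension(sample, 0.5))  # [0.5, 0.75, 0.875]
print(smooth_with_loop(sample, 0.5))           # [0.5, 0.75, 0.875]
```
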

In general, all functions introduce non-trivial code debt: This debt is
worth it if the function is used enough; but, in single-use or rare-use
cases, functions can obfuscate.

Thank you for your time.
_______________________________________________


Peter O'Connor