[Python-ideas] Proposal: A Reduce-Map Comprehension and a "last" builtin

Peter O'Connor peter.ed.oconnor at gmail.com
Mon Apr 9 18:54:59 EDT 2018


Kyle, you sounded so reasonable when you were trashing itertools.accumulate
(which I now agree is horrible).  But then you go and support Serhiy's
madness:  "smooth_signal = [average for average in [0] for x in signal for
average in [(1-decay)*average + decay*x]]" which I agree is clever, but
reads more like a riddle than readable code.
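
For anyone who wants to check what that riddle actually computes, here is a minimal sketch in current Python. The function names are mine, purely for illustration, and the explicit loop is just my reading of what the comprehension does (an exponential moving average starting from 0):

```python
def smooth_with_comprehension(signal, decay):
    # Serhiy's form: "for average in [0]" initialises the state, and
    # "for average in [...]" rebinds it once per element of signal.
    return [average
            for average in [0]
            for x in signal
            for average in [(1 - decay) * average + decay * x]]


def smooth_with_loop(signal, decay):
    # The same computation written as an ordinary loop.
    average = 0
    smoothed = []
    for x in signal:
        average = (1 - decay) * average + decay * x
        smoothed.append(average)
    return smoothed
```

Both versions return one value per input element, so the riddle is at least a correct riddle.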

Anyway, I continue to stand by:

    (y := f(y, x) for x in iter_x from y=initial_y)

And, if that's not offensive enough, to its extension:

    (z, y := f(z, x) -> y for x in iter_x from z=initial_z)

Which carries state "z" forward but only yields "y" at each iteration.
(see proposal: https://github.com/petered/peps/blob/master/pep-9999.rst)
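
To make the intended semantics concrete, here is a rough sketch of what I mean by the two forms, written as ordinary generator functions that run today. The names reduce_map and reduce_map_with_output are hypothetical helpers for this email only; they are not part of the proposal or of any library:

```python
def reduce_map(f, iter_x, initial_y):
    """Roughly: (y := f(y, x) for x in iter_x from y=initial_y)."""
    y = initial_y
    for x in iter_x:
        y = f(y, x)  # the new state is also the value yielded
        yield y


def reduce_map_with_output(f, iter_x, initial_z):
    """Roughly: (z, y := f(z, x) -> y for x in iter_x from z=initial_z)."""
    z = initial_z
    for x in iter_x:
        z, y = f(z, x)  # z is carried forward, y is what gets yielded
        yield y
```

For example, list(reduce_map(lambda y, x: y + x, [1, 2, 3], 0)) gives the running sums [1, 3, 6].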

Why am I so obsessed?  Because it will allow you to conveniently replace
classes with cleaner, more concise, functional code.  People who thought they
never needed such a construct may suddenly start finding it indispensable
once they get used to it.

How many times have you written something of this form?

    class StatefulThing(object):

        def __init__(self, initial_state, param_1, param_2):
            self._param_1 = param_1
            self._param_2 = param_2
            self._state = initial_state

        def update_and_get_output(self, new_observation):  # (or just __call__)
            self._state = do_some_state_update(
                self._state, new_observation, self._param_1)
            output = transform_state_to_output(self._state, self._param_2)
            return output

    processor = StatefulThing(initial_state=initial_state, param_1=1, param_2=4)
    processed_things = [processor.update_and_get_output(x) for x in x_gen]

I've done this many times.  Video encoding, robot controllers, neural
networks, any iterative machine learning algorithm, and probably lots of
things I don't know about - they all tend to have this general form.

And how many times have I had issues like "Oh no now I want to change
param_1 on the fly instead of just setting it on initialization, I guess I
have to refactor all usages of this class to pass param_1 into
update_and_get_output instead of __init__".

What if instead I could just write:

    def update_and_get_output(last_state, new_observation, param_1, param_2):
        new_state = do_some_state_update(last_state, new_observation, param_1)
        output = transform_state_to_output(new_state, param_2)
        return new_state, output

    processed_things = [
        state, output := update_and_get_output(state, x, param_1=1, param_2=4) -> output
        for x in observations from state=initial_state
    ]

Now we have:
- No mutable objects (which cuts off a whole slew of potential bugs and
anti-patterns familiar to people who do OOP.)
- Fewer lines of code
- Looser assumptions on usage and less refactoring. (if I want to now pass
in param_1 at each iteration instead of just initialization, I need to make
no changes to update_and_get_output).
- No need for state getters/setters, since state is passed around
explicitly.
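
Until such syntax exists, the same functional style can be sketched with a plain loop. Everything below is illustrative: the bodies of do_some_state_update and transform_state_to_output are stand-ins I made up (a moving average and a scaling), and process and param_1_schedule are hypothetical names. The point is only that param_1 can vary per iteration without touching update_and_get_output:

```python
def do_some_state_update(state, new_observation, param_1):
    # Stand-in update rule: exponential moving average with decay param_1.
    return (1 - param_1) * state + param_1 * new_observation


def transform_state_to_output(state, param_2):
    # Stand-in output transform: scale the state by param_2.
    return state * param_2


def update_and_get_output(last_state, new_observation, param_1, param_2):
    new_state = do_some_state_update(last_state, new_observation, param_1)
    output = transform_state_to_output(new_state, param_2)
    return new_state, output


def process(observations, initial_state, param_1_schedule, param_2):
    # State lives in a local variable and is passed around explicitly;
    # param_1 is supplied fresh at each step.
    state = initial_state
    outputs = []
    for x, param_1 in zip(observations, param_1_schedule):
        state, output = update_and_get_output(state, x, param_1, param_2)
        outputs.append(output)
    return outputs
```

Calling process([1.0, 2.0], 0.0, [0.5, 0.5], 4) and then process([1.0, 2.0], 0.0, [0.5, 1.0], 4) changes param_1 on the fly with no refactoring of the update function.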

I realize that calling for changes to syntax is a lot to ask - but I still
believe that the main objections to this syntax would also have been raised
as objections to the now-ubiquitous list-comprehensions - they seem hostile
and alien-looking at first, but very lovable once you get used to them.




On Sun, Apr 8, 2018 at 1:41 PM, Kyle Lahnakoski <klahnakoski at mozilla.com>
wrote:

>
>
> On 2018-04-05 21:18, Steven D'Aprano wrote:
> > (I don't understand why so many people have such an aversion to writing
> > functions and seek to eliminate them from their code.)
> >
>
> I think I am one of those people that have an aversion to writing
> functions!
>
> I hope you do not mind that I attempt to explain my aversion here. I
> want to clarify my thoughts on this, and maybe others will find
> something useful in this explanation, maybe someone has wise words for
> me. I think this is relevant to python-ideas because someone with this
> aversion will make different language suggestions than those that don't.
>
> Here is why I have an aversion to writing functions: Every unread
> function represents multiple unknowns in the code. Every function adds
> to code complexity by mapping an inaccurate name to specific
> functionality.
>
> When I read code, this is what I see:
>
> >    x = you_will_never_guess_how_corner_cases_are_handled(a, b, c)
> >    y = you_dont_know_I_throw_a_BaseException_when_I_do_not_like_your_arguments(j, k, l)
>
> Not everyone sees code this way: I see people read method calls, make a
> number of wild assumptions about how those methods work, AND THEY ARE
> CORRECT!  How do they do it!?  It is as if there is some unspoken
> convention about how code should work that's opaque to me.
>
> For example before I read the docs on
> itertools.accumulate(list_of_length_N, func), here are the unknowns I see:
>
> * Does it return N, or N-1 values?
> * How are initial conditions handled?
> * Must `func` perform the initialization by accepting just one
> parameter, and accumulate with more-than-one parameter?
> * If `func` is a binary function, and `accumulate` returns N values,
> what's the Nth value?
> * If `func` is a non-commutative binary function, in what order are the
> arguments passed?
> * Maybe accumulate expects func(*args)?
> * Is there a window size? Is it equal to the number of arguments of `func`?
>
> These are not all answered by reading the docs, they are answered by
> reading the code. The code tells me the first value is a special case;
> the first parameter of `func` is the accumulated `total`; `func` is
> applied in order; and an iterator is returned.  Despite all my
> questions, notice I missed asking what `accumulate` returns? It is the
> unknown unknowns that get me most.
>
> So, `itertools.accumulate` is a kinda-inaccurate name given to a
> specific functionality: Not a problem on its own, and even delightfully
> useful if I need it often.
>
> What if I am in a domain where I see `accumulate` only a few times a
> year? Or how about a program that uses `accumulate` in only one place?
> For me, I must (re)read the `accumulate` source (or run the caller
> through the debugger) before I know what the code is doing. In these
> cases I advocate for in-lining the function code to remove these
> unknowns. Instead of an inaccurate name, there is explicit code. If we
> are lucky, that explicit code follows idioms that make the increased
> verbosity easier to read.
>
> Consider Serhiy Storchaka's elegant solution, which I reformatted for
> readability
>
> > smooth_signal = [
> >     average
> >     for average in [0]
> >     for x in signal
> >     for average in [(1-decay)*average + decay*x]
> > ]
>
> We see the initial conditions, we see the primary function, we see how
> the accumulation happens, we see the number of returned values, and we
> see it's a list. It is a compact, easy read, from top to bottom. Yes, we
> must know `for x in [y]` is an idiom for assignment, but we can reuse
> that knowledge in all our other list comprehensions.  So, in the
> specific case of this Reduce-Map thread, I would advocate using the list
> comprehension.
>
> In general, all functions introduce non-trivial code debt: This debt is
> worth it if the function is used enough; but, in single-use or rare-use
> cases, functions can obfuscate.
>
>
>
> Thank you for your time.