[Python-ideas] A more readable way to nest functions

Steven D'Aprano steve at pearwood.info
Sat Jan 28 21:30:13 EST 2017


On Sat, Jan 28, 2017 at 03:16:27PM +0100, zmo via Python-ideas wrote:
> Hi list o/
> 
> This idea sounds fun, so as a thought experiment why not imagine one
> way of integrating it in what I believe would be pythonic enough.

This idea is sometimes called "the Collection Pipeline" design pattern, 
and is used in various command shells. Martin Fowler wrote about this 
design pattern here:

https://martinfowler.com/articles/collection-pipeline/

and I wrote a recipe for it:

https://code.activestate.com/recipes/580625-collection-pipeline-in-python/

with a working, although basic, implementation.

The recipe shows that we don't need new syntax for this sort of feature. 
I'm rather partial to either the | or >> operators, both of which are 
rarely used except by ints. Nor does it need to be a built-in part of 
the language. It could be a third-party module, or a library module.

I think that the most important feature of pipeline syntax is that we 
write the functions in the same order that they are applied, instead of 
backwards. Instead of:

    print(list(map(float, filter(lambda n: 20 < n < 30, data))))

where you have to read all the way to the right to find out what you are 
operating on, and then read backwards to the left in order to follow the 
execution order, a pipeline starts with the argument and then applies 
the functions in execution order:

    data | Filter(lambda n: 20 < n < 30) | Map(float) | List | Print

(In principle, Python built-ins could support this sort of syntax so I 
could write filter, map, list, print rather than custom versions Filter, 
Map, etc. That would feel very natural to a language like Haskell, for 
example, where partial function application is a fundamental part of the 
language. But for Python that would be a *major* change, and not one I 
wish to propose. Easier to just have a separate, parallel set of 
pipeline functions, with an easy way to create new ones. A module is 
perfect for that.)
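
As a rough illustration of what such a module could contain (a minimal 
sketch, not the recipe linked above; the names Pipe, Filter, Map and 
List are assumptions made up for this example), the reflected-or method 
__ror__ is enough to make "data | stage" work without new syntax:

```python
class Pipe:
    """Wrap a one-argument callable so that `value | Pipe(f)` calls f(value)."""

    def __init__(self, func):
        self.func = func

    def __ror__(self, value):
        # Python falls back to this for `value | self` when type(value)
        # does not know how to "or" itself with a Pipe.
        return self.func(value)


def Filter(predicate):
    # Hypothetical pipe-compatible counterpart of the built-in filter.
    return Pipe(lambda iterable: filter(predicate, iterable))


def Map(func):
    # Hypothetical pipe-compatible counterpart of the built-in map.
    return Pipe(lambda iterable: map(func, iterable))


List = Pipe(list)

data = [5, 21, 25, 40, 29]
result = data | Filter(lambda n: 20 < n < 30) | Map(float) | List
print(result)  # [21.0, 25.0, 29.0]
```

This only works when the left-hand operand does not define its own | 
for arbitrary objects, which is one more reason to experiment in a 
module first.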

Now we can see that these sorts of pipelines are best suited for a 
particular style of programming. They don't work so well for arbitrary 
function calls where the data arg could end up in any argument position:

    aardvark(1, 2, cheese('a', eggs(spam(arg), 'b')), 4)

But I don't see that as a problem. This is not a replacement for regular 
function call syntax in its full generality, but a powerful design 
pattern for solving certain kinds of problems.


> On Sat, Jan 28, 2017 at 12:41:24PM +0000, Ed Kellett wrote:
> > FWIW, I'd spell it without the (), so it's simply a right-associative
> > binary operator on expressions, (a -> b, a) -> b, rather than magic syntax.
> >     print XYZ some_func XYZ another_func("Hello")
> 
> I agree this would look a bit more elegant. To focus on the feature of
> that operator, instead of how to write it, I'll use XYZ instead of <| in
> this post.
> 
> So, considering it's decided that the RHS is in charge of filling up all
> the arguments of the LHS, 

Is that actually decided?

That seems to break the advantage of a pipeline: the left-to-right 
order. To understand your syntax, you have to read from the right 
backwards to the left:

    # print(list(map(float, filter(lambda n: 20 < n < 30, data))))
    print XYZ list XYZ map(float) XYZ filter(lambda n: 20 < n < 30, data)

That's actually longer than the current syntax.

Actually, I don't think this would work using your idea. filter would 
need to pass on *all* of map's arguments, not just the data argument:

    filter(float, lambda n: 20 < n < 30, data,)
    # returns a tuple (float, FilterObject)

which gives us:

    print XYZ list XYZ map XYZ filter(float, lambda n: 20 < n < 30, data)

But of course filter doesn't actually have that syntax, so either we 
have a new, parallel series of functions including Filter(...) or we 
write something like:

    print XYZ list XYZ map XYZ lambda (f1, f2, arg): (f1, filter(f2, arg))(float, lambda n: 20 < n < 30, data)


which is simply horrid. Maybe there could be a series of helper 
functions, but I don't think this idea is workable. See below.


> how to deal with positional and keyword
> arguments without introducing new syntax? Should it be by returning a
> tuple of positional iterable and keyword dict? i.e.:
> 
>     def fn_a(*args, **kwarg):
>         print("args: {}, kwarg: {}".format(args, kwarg))
> 
>     def fn_b():
>         return (1,2,3), {'a':1, 'b':2, 'c':3}
> 
>     fn_a XYZ fn_b()

The problem is that each function needs to know what arguments the 
*next* function expects. That means that the function on the right needs 
to have every argument used by the entire pipeline, and each function 
has to take the arguments it needs and pass on the rest.

It also means that everything is very sensitive to the order that 
arguments are expected:

    def spam(func, data): ...
    def ham(argument, function): ...

    spam XYZ foo(bar, data)
    ham XYZ foo(bar, data)

What should foo() return?
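
To make the ordering problem concrete, here is a small sketch that 
emulates the proposed operator with an ordinary helper. Everything in 
it is an assumption for illustration: xyz() stands in for "func XYZ 
packed", and the bodies of spam, ham and foo are invented, since the 
proposal does not define them.

```python
def xyz(func, packed):
    # Hypothetical stand-in for `func XYZ packed`: the right-hand side
    # supplies an (args, kwargs) pair which is unpacked into func.
    args, kwargs = packed
    return func(*args, **kwargs)


def spam(func, data):
    return [func(x) for x in data]


def ham(argument, function):
    return [function(x) for x in argument]


def foo(bar, data):
    # foo has to commit to one positional order for its consumer.
    return (bar, data), {}


spam_result = xyz(spam, foo(str.upper, ["a", "b"]))
print(spam_result)  # ['A', 'B']

try:
    xyz(ham, foo(str.upper, ["a", "b"]))
    ham_ok = True
except TypeError:
    # ham received the function where it expected the data.
    ham_ok = False
print(ham_ok)  # False
```

Whatever order foo picks, it suits only one of the two consumers 
unless it returns keyword arguments, and then it has to know their 
parameter names instead.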


[...]
> In practice, such a scheme would make it possible to have:
> 
>     print XYZ (("Hello World",), {"file": sys.stderr})

In what way is this even close to an improvement over the existing 
function call syntax?

    print XYZ (("Hello World",), {"file": sys.stderr})
    print("Hello World", file=sys.stderr)


If "Hello World" wasn't a literal, but came from somewhere else:

    print XYZ ((greetings(),), {"file": sys.stderr})
    print(greetings(), file=sys.stderr)


so you're not even avoiding nested parentheses.


> All in all, it can be a nice syntactic sugar to have which could make it
> more flexible working with higher order functions, but it with the way
> I'm suggesting to comply with python's arguments handling, it offers
> little advantages when the RHS is not filling LHS arguments:
> 
>     >>> print(all(map(lambda x: x>2, filter(lambda x: isinstance(x, int), range(0,3)))))
>     True
> 
> vs
> 
>     >>> print XYZ all XYZ map XYZ (lambda x: x>2, filter(lambda x: isinstance(x, int), range(0,3))),
>     True

I think that "little advantage" is being very kind. The best you can 
say is that you save two pairs of parentheses at the cost of three 
operators and moving arguments away from the functions that use them.


> Here, applying map onto all onto print offers a great readability, 

I don't think so. At *best*, it is no better than what we already have:

    print XYZ all XYZ map XYZ ...
    print (   all (   map (   ...


but moving the arguments away from where they are used makes it 
worse. Consider:

    def double(values):
        for v in values:
            yield 2*v

    print(max(map(float, double(range(5)))))

How would I use your syntax?

    print XYZ max XYZ map float XYZ double XYZ range XYZ 5


doesn't work without new syntax, and 

    print XYZ max XYZ map XYZ double XYZ range XYZ (float, 5)


doesn't work without re-writing range and double to pass on unused 
arguments. I'd need partial application:

    from functools import partial
    print XYZ max XYZ partial(map, float) XYZ double XYZ range XYZ 5


which is now starting to look like a collection pipeline written out 
backwards:

    5 | Range | Apply(double) | Map(float) | Max | Print

where (again) the capitalised functions are pipe-compatible versions 
of the usual range, map, etc. They don't necessarily have to be 
prepared beforehand: many could be a simple wrapper around the 
built-in:

    Max = Apply(max)

There may be ways to avoid even that. A third-party library is a good 
place to experiment with these questions; this is in no way ready for 
the standard library, let alone a new operator.
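
For anyone who wants to experiment, here is one possible shape for such 
a wrapper. It is a sketch under my own assumptions, not a finished 
design: this Apply accepts optional leading arguments, so Map needs no 
class of its own, and Range, Map and Max are names defined only for 
this example.

```python
class Apply:
    """`value | Apply(f, *args)` calls f(*args, value)."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args

    def __ror__(self, value):
        # Reached for `value | self`; int.__or__ returns NotImplemented
        # for a non-int operand, so `5 | Range` lands here as well.
        return self.func(*self.args, value)


def double(values):
    for v in values:
        yield 2 * v


# Thin wrappers around the built-ins.
Range = Apply(range)
Max = Apply(max)


def Map(func):
    return Apply(map, func)


result = 5 | Range | Apply(double) | Map(float) | Max
print(result)  # 8.0
```

Adding Print = Apply(print) as a final stage would complete the 
pipeline as written above.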


[...]
> But then it would be just another way to introduce currying as a
> language feature with an operator, so we should then just discuss on how
> to add currying as a language syntax "by the book", but I'm pretty sure
> that's a topic already discussed before I joined this list ;-)

The easiest way to support currying, or at least some form of it, is:

    from functools import partial as p
    p(map, float)  # partially applies map to a single argument, float

which is not quite the map(float) syntax Haskell programmers expect, 
but it's not awful.
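
A short sketch of the difference, for comparison. functools.partial is 
the real standard library function; curry2 is a hypothetical helper 
written only for this example.

```python
from functools import partial

# Partial application: fix map's first argument now, supply the rest later.
to_floats = partial(map, float)
partial_result = list(to_floats(["1", "2.5"]))
print(partial_result)  # [1.0, 2.5]


def curry2(func):
    """Hypothetical helper: hand-rolled currying of a two-argument call."""
    return lambda a: lambda b: func(a, b)


# Currying: each call takes exactly one argument, closer to Haskell's style.
cmap = curry2(map)
curried_result = list(cmap(float)(["1", "2.5"]))
print(curried_result)  # [1.0, 2.5]
```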


-- 
Steve

