[Python-ideas] The pipe protocol, a convention for extensible method chaining

Tue May 26 04:54:55 CEST 2015

On Mon, May 25, 2015 at 04:38:20PM -0700, Stephan Hoyer wrote:
> In the PyData community, we really like method chaining for data analysis
> pipelines:
> 
> (iris.query('SepalLength > 5')
>  .assign(SepalRatio = lambda x: x.SepalWidth / x.SepalLength,
>          PetalRatio = lambda x: x.PetalWidth / x.PetalLength)
>  .plot(kind='scatter', x='SepalRatio', y='PetalRatio'))
> 
> 
> Unfortunately, method chaining isn't very extensible -- short of monkey
> patching, every method we want to use has to exist on the original object. If
> a user wants to supply their own plotting function, they can't use method
> chaining anymore.

It's not really *method* chaining any more if they do that :-)

> You may recall that we brought this up a few months ago on python-ideas as
> an example of why we would like macros.
> 
> To get around this issue, we are contemplating adding a pipe method to
> pandas DataFrames. It looks like this:
> 
> def pipe(self, func, *args, **kwargs):
>     pipe_func = getattr(func, '__pipe_func__', func)
>     return pipe_func(self, *args, **kwargs)

Are you sure this actually works in practice?

Since pipe() returns the result of calling the passed-in function, not 
the dataframe, it seems to me that you can't actually chain this unless 
it's the last call in the chain or the function itself returns a 
dataframe. This should work:

(iris.query('SepalLength > 5')
    .assign(SepalRatio = lambda x: x.SepalWidth / x.SepalLength,
            PetalRatio = lambda x: x.PetalWidth / x.PetalLength)
    .pipe(myplot, kind='scatter', x='SepalRatio', y='PetalRatio')
    )

but I don't think this will work:

(iris.query('SepalLength > 5')
    .assign(SepalRatio = lambda x: x.SepalWidth / x.SepalLength,
            PetalRatio = lambda x: x.PetalWidth / x.PetalLength)
    .pipe(myexport, spam=True, eggs=False)
    .plot(kind='scatter', x='SepalRatio', y='PetalRatio')
    )
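
To make the difference concrete, here is a minimal sketch using a toy 
stand-in class rather than a real DataFrame. Frame, add_one and export 
are hypothetical names made up for this illustration; only pipe() 
follows the definition quoted above.

```python
class Frame:
    """Toy stand-in for a DataFrame (hypothetical, illustration only)."""

    def __init__(self, rows):
        self.rows = list(rows)

    def pipe(self, func, *args, **kwargs):
        # Same logic as the proposed pipe method quoted above.
        pipe_func = getattr(func, '__pipe_func__', func)
        return pipe_func(self, *args, **kwargs)

    def count(self):
        return len(self.rows)


exported = []


def add_one(frame):
    # Returns a new Frame, so the chain can continue after it.
    return Frame(row + 1 for row in frame.rows)


def export(frame, spam=True):
    # Works purely by side effect and implicitly returns None.
    exported.extend(frame.rows)


# Chaining continues because add_one returns a Frame.
n = Frame([1, 2, 3]).pipe(add_one).count()

# Chaining stops here: export returns None, which has no count method.
try:
    Frame([1, 2, 3]).pipe(export, spam=False).count()
except AttributeError as err:
    failure = str(err)
```

So whether .pipe() can sit in the middle of a chain depends entirely on 
what the piped function returns, and a function like myexport that 
returns None ends the chain.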

That makes it somewhat less of a general purpose pipelining method and 
more of a special case "replace the plotter with a different plotter" 
helper method. And for that special case, I'd prefer to give the plot 
method an extra argument, which if not None, is a function to delegate 
to:

    .plot(kind='scatter', x='SepalRatio', y='PetalRatio', plotter=myplot)
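
As a rough sketch of what I mean by delegating (this is not how pandas 
implements plot; Frame, _default_plot and myplot are hypothetical names 
used only for illustration):

```python
class Frame:
    """Toy stand-in for a DataFrame (hypothetical, illustration only)."""

    def __init__(self, data):
        self.data = data

    def _default_plot(self, **kwargs):
        # Placeholder for the built-in plotting code.
        return ('default', kwargs)

    def plot(self, plotter=None, **kwargs):
        # If a plotter is supplied, delegate to it; otherwise use the default.
        if plotter is None:
            return self._default_plot(**kwargs)
        return plotter(self, **kwargs)


def myplot(frame, **kwargs):
    # User-supplied plotting function (hypothetical).
    return ('custom', frame.data, kwargs)


builtin = Frame([1, 2]).plot(kind='line')
custom = Frame([1, 2]).plot(kind='scatter', x='a', y='b', plotter=myplot)
```

The caller keeps writing .plot(...) in the chain, and only the last 
argument changes when they want their own plotter.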

What's the point of the redirection to __pipe_func__? Under what 
circumstances would somebody use __pipe_func__ instead of just passing a 
callable (a function or other object with __call__ method)? If you don't 
have a good use case for it, then "You Ain't Gonna Need It" applies.

I think the redirection is completely unnecessary. (It also abuses a 
reserved namespace, but you've already said you don't care about that.) 
Instead of passing:

    .pipe(myobject, args)  # myobject has a __pipe_func__ method

just make it explicit and write:

    .pipe(myobject.some_method, args)
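
Here is a small sketch of the two spellings side by side, assuming a 
free-function version of the proposed pipe and a made-up Scaler class 
(both hypothetical, for illustration only):

```python
def pipe(obj, func, *args, **kwargs):
    # Free-function version of the proposed pipe method.
    pipe_func = getattr(func, '__pipe_func__', func)
    return pipe_func(obj, *args, **kwargs)


class Scaler:
    """Hypothetical object that wants to control how it is piped."""

    def __init__(self, factor):
        self.factor = factor

    def scale(self, values):
        return [v * self.factor for v in values]

    # The "magic" route: pipe() looks this attribute up implicitly.
    __pipe_func__ = scale


scaler = Scaler(10)
implicit = pipe([1, 2, 3], scaler)        # goes through __pipe_func__
explicit = pipe([1, 2, 3], scaler.scale)  # plain bound method, no magic
```

Both calls give the same result, and the explicit one tells the reader 
which method is actually being run.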

And for what it's worth, apart from the dunder issue, I think it's silly 
to have a *method* called "*_func__".

> The business with __pipe_func__ is more magical, and frankly we aren't sure
> it's worth the complexity. The idea is to create a "pipe protocol" that
> allows functions to decide how they are called when piped. This is useful
> in some cases, because it doesn't always make sense for functions that act
> on piped data to accept that data as their first argument.

Just use a wrapper function that reorders the arguments. If the 
reordering is simple enough, you can do it in place with a lambda:

    .pipe(lambda *args, **kwargs: myplot(args[1], args[0], *args[2:], **kwargs))
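
If the reordering is needed more than once, a named wrapper reads 
better than the lambda. A minimal sketch, where data_second, myplot and 
the free-function pipe are all hypothetical names for illustration:

```python
import functools


def data_second(func):
    """Wrap func so the piped object is passed as its second argument."""
    @functools.wraps(func)
    def wrapper(data, first, *args, **kwargs):
        return func(first, data, *args, **kwargs)
    return wrapper


def myplot(style, data, scale=1):
    # Hypothetical function that expects the data as its second argument.
    return (style, [x * scale for x in data])


def pipe(obj, func, *args, **kwargs):
    # Plain pipe with no __pipe_func__ lookup.
    return func(obj, *args, **kwargs)


result = pipe([1, 2], data_second(myplot), 'dots', scale=2)
```

No protocol is needed: the wrapper is an ordinary callable, and pipe 
passes the data first as usual.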

> Obviously, this sort of protocol would not be an official part of the
> Python language. But because we are considering creating a de-facto
> standard, we would love to get feedback from other Python communities that
> use method chaining:

Because you are considering creating a de-facto standard, I think it is 
especially rude to trespass on the reserved dunder namespace. (Unless, 
of course, the core developers decide that they don't mind.)

> 1. Have you encountered or addressed the problem of extensible method
> chaining?

Yes. I love chaining in, say, bash, and it works well in Ruby, but it's 
less useful in Python. My attempt to help bring chaining to Python is 
here:

http://code.activestate.com/recipes/578770-method-chaining/

but it relies on methods operating by side-effect, not returning a new 
result. Generally speaking, I don't like methods that operate by 
side-effect, so I don't use chaining much in practice. I'm always on the 
look-out for opportunities where it makes sense, though.
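
To show what chaining over side-effect methods can look like, here is a 
rough sketch of the general idea. It is not the code from the recipe 
linked above, and the Chained class is a hypothetical name used only 
for this illustration:

```python
class Chained:
    """Wrap an object so methods that return None return the wrapper."""

    def __init__(self, obj):
        self.obj = obj

    def __getattr__(self, name):
        # Only called for names not found on the wrapper itself.
        attr = getattr(self.obj, name)
        if not callable(attr):
            return attr

        def method(*args, **kwargs):
            result = attr(*args, **kwargs)
            # Side-effect methods return None; keep the chain going instead.
            return self if result is None else result
        return method


# list.append, list.sort and list.reverse all mutate in place.
items = Chained([3, 1, 2]).append(0).sort().reverse().obj
```

This only helps with methods that mutate the wrapped object in place, 
which is exactly the style I tend to avoid.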

> 2. Would this pipe protocol be useful to you?

I don't think so.

> 3. Is it worth allowing piped functions to override how they are called by
> defining something like __pipe_func__?

No, I think it is completely unnecessary.

-- 
Steve