On Jul 14, 2019, at 13:46, Nima Hamidi <hamidi@stanford.edu> wrote:
Andrew Barnert wrote:
But in your proposal, wouldn’t this have to be written as dt[`price < 1`]? I think the cost of putting the expression in ticks is at least as bad as the cost of naming the dt. Also: dt.price < 1 is a perfectly valid expression, with a useful value. You can store it in a temporary variable to avoid repeating it, or stash it for later, or print it out to see what’s happening. But price < 1 on its own is a NameError, and I’m not sure what `price < 1` is worth on its own. Would this invite code that’s hard to refactor and even harder to debug?
In this particular example, the payoff is negligible. But look at the following example:
tips[(tips['size'] >= 5) | (tips['total_bill'] > 45)]
This could be simplified to:
```
tips[(size >= 5) | (total_bill > 45)]
```
That’s not a fair example, because you’re ignoring the dot syntax that Pandas already provides, and also leaving out the backticks. So it’s really:
tips[(tips.size >= 5) | (tips.total_bill > 45)]
tips[`(size >= 5) | (total_bill > 45)`]
So, while there is still some advantage, it’s not nearly as big. And again, the tradeoff is that you don’t have useful intermediate values anymore. For example, if I want to use tips.size >= 5 repeatedly, or print it out for debugging before using it, etc., I can just do hightips = tips.size >= 5. There’s no way to do the same thing with your version.
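To make the point about intermediate values concrete, here is a minimal sketch in plain Python. Table and Column are hypothetical stand-ins written only for this illustration, not the pandas API, and the column is named party rather than size (in real pandas, DataFrame.size is already an attribute holding the element count, so a column with that name would need tips['size']).

```python
# Hypothetical stand-ins for a DataFrame and its columns; NOT the pandas API.
class Column:
    def __init__(self, values):
        self.values = list(values)

    def __ge__(self, threshold):
        # A comparison yields an ordinary list of booleans: a reusable mask.
        return [v >= threshold for v in self.values]


class Table:
    def __init__(self, **columns):
        self._columns = {name: Column(vals) for name, vals in columns.items()}

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails; gives "dot syntax".
        columns = self.__dict__.get("_columns", {})
        if name in columns:
            return columns[name]
        raise AttributeError(name)

    def __getitem__(self, mask):
        # Keep the rows whose mask entry is True.
        return Table(**{
            name: [v for v, keep in zip(col.values, mask) if keep]
            for name, col in self._columns.items()
        })


tips = Table(party=[2, 6, 5, 3], total_bill=[20.0, 50.0, 30.0, 48.0])

# The mask is a normal value: it can be named, printed, and reused.
big_party = tips.party >= 5
print(big_party)                   # [False, True, True, False]
selected = tips[big_party]
print(selected.total_bill.values)  # [50.0, 30.0]

# The bare expression has no meaning outside the table.
try:
    party >= 5
except NameError as exc:
    print("NameError:", exc)
```

The explicit tips.party spelling costs a few characters, but big_party exists as a value I can inspect before indexing with it.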
Pipelining in R is also much cleaner. Dplyr provides an operator %>% which passes the return value of its LHS as the first argument of its RHS. In other words, f() %>% g() is equivalent to g(f()). This is pretty useful for long pipelines. The way that it works is that the operator %>% changes the AST and then evaluates the modified expression. In this example, evaluating g() is undesirable.
This doesn’t seem necessary in a language with first-class functions. Why can’t you just write the pipeline as something like f %>% g, much as you would in, say, Haskell, which would just define a function (presumably equivalent to either lambda: g(f()) or lambda a, **kw: g(f(a, **kw))) that represents the pipeline that you then just call normally? I don’t see the benefit in being able to write g() instead of g here, and in fact it seems actively misleading, because it implies calling g on no arguments instead of one. Also, given that Python doesn’t have such an operator, or a way to define custom operators, and that proposals for even simpler operators on functions like @ for compose have been rejected every time they’ve been suggested, I wouldn’t expect much traction from this example. Is there something similar that could plausibly be done in Python, and feel Pythonic?
Please look at the "partials" example by Xavier. I think it illustrates how this is different than function composition.
I know the difference between function pipelines and function composition, but if nobody was interested in adding an operator for the even simpler compose, I think it’s unlikely that anyone will be interested in adding an operator for pipeline. And certainly not something that looks like %>%. So, this example doesn’t really help sell your proposal. Also, you didn’t answer any of the other issues that have nothing to do with that comparison with @ for compose. Why do you want to make people spell “feed to g” as “feed to g()”? Why shouldn’t it create a function that can be called (or otherwise used) normally? And so on.
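For what it’s worth, here is a rough sketch of what I mean by a pipeline that is just a function you call normally. The helper name pipeline and the toy stages f and g are made up for this illustration; nothing like this exists in the stdlib under that name, and it is only one of several ways to write it.

```python
from functools import reduce


def pipeline(*funcs):
    """Return a function that calls funcs[0], then feeds each result to the next."""
    if not funcs:
        raise ValueError("pipeline() needs at least one function")
    first, *rest = funcs

    def run(*args, **kwargs):
        # Only the first stage sees the caller's arguments; every later
        # stage receives the previous stage's return value.
        return reduce(lambda value, func: func(value), rest, first(*args, **kwargs))

    return run


# Hypothetical stages, passed as plain function objects (g, not g()).
def f(text):
    return text.split()


def g(words):
    return len(words)


count_words = pipeline(f, g)
print(count_words("feed to g"))  # 3
```

Note that the stages are named without parentheses, and the result is an ordinary callable that can be stored, passed around, or called later, with no AST rewriting involved.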