Re: Non-standard evaluation for Python

(Re-sending, because this was originally a reply to an off-list message by Nima Hamidi) On Jul 13, 2019, at 14:12, Nima Hamidi <hamidi@stanford.edu> wrote:
Sometimes it's necessary not to evaluate the expression. Two such applications of NSE in R are as follows:
1. Data-tables have cleaner syntax. For example, letting dt be a data-table with a column called price, one can retrieve items cheaper than $1 with dt[price < 1]. Pandas syntax requires something like dt[dt.price < 1]. This is currently inevitable, as the expression is evaluated *before* __getitem__ is invoked. Using NSE, dt.__getitem__ can first add its columns to the locals() dictionary and then evaluate the expression in the new context.
This one looks good. I can also imagine it being useful for SQLAlchemy, appscript, etc. just as it is for Pandas. But in your proposal, wouldn’t this have to be written as dt[`price < 1`]? I think the cost of putting the expression in ticks is at least as bad as the cost of naming the dt. Also: dt.price < 1 is a perfectly valid expression, with a useful value. You can store it in a temporary variable to avoid repeating it, or stash it for later, or print it out to see what’s happening. But price < 1 on its own is a NameError, and I’m not sure what `price < 1` is worth on its own. Would this invite code that’s hard to refactor and even harder to debug?
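To make the status-quo constraint concrete, here is a stdlib-only toy (Table is a hypothetical illustration class, not the Pandas API): the subscript expression is evaluated in the caller's scope before __getitem__ ever runs, so a bare column name simply raises NameError.

```python
# Toy table: __getitem__ takes an already-computed boolean mask.
# Table is a hypothetical illustration class, not part of any library.
class Table:
    def __init__(self, **columns):
        self.columns = columns

    def __getitem__(self, mask):
        # Keep the rows whose mask entry is True.
        return {name: [v for v, keep in zip(col, mask) if keep]
                for name, col in self.columns.items()}

dt = Table(price=[0.5, 1.5, 0.75])

try:
    dt[price < 1]  # `price < 1` is evaluated before __getitem__: NameError
except NameError:
    pass

# Today you must name the table to build the mask yourself:
mask = [p < 1 for p in dt.columns["price"]]
assert dt[mask] == {"price": [0.5, 0.75]}
```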
2. Pipelining in R is also much cleaner. Dplyr provides the %>% operator (originally from the magrittr package), which passes the return value of its LHS as the first argument of its RHS. In other words, f() %>% g() is equivalent to g(f()). This is pretty useful for long pipelines. It works by rewriting the AST and then evaluating the modified expression. In this example, evaluating g() on its own is undesirable.
This doesn’t seem necessary in a language with first-class functions. Why can’t you just write the pipeline as something like f %>% g, much as you would in, say, Haskell? That would just define a function (presumably equivalent to either lambda: g(f()) or lambda *a, **kw: g(f(*a, **kw))) that represents the pipeline, which you then call normally. I don’t see the benefit in being able to write g() instead of g here, and in fact it seems actively misleading, because it implies calling g on no arguments instead of one. Also, given that Python doesn’t have such an operator, or a way to define custom operators, and that proposals for even simpler operators on functions, like @ for compose, have been rejected every time they’ve been suggested, I wouldn’t expect much traction from this example. Is there something similar that could plausibly be done in Python, and feel Pythonic?

---

A couple more things I thought of since the initial reply:

I’m pretty sure Python’s AST objects don’t contain the original source text. So what is your plot function actually going to do with its arguments to get the axes? What if it’s called with plot(`x[..., 3:]`)? Will plot—and every other function that wants to do something similar—need to come up with a way to generate the nicest source text that could produce the given AST? Or do we need to add a decompiler to the stdlib for them? I suppose you could solve this by just adding more fields to BoundExpression, but I’m not sure that wouldn’t make the backtick feature a lot harder to implement.

Backticks are supposed to be banned for the life of Python, since 3.0 eliminated them as shorthand for repr. That could be revisited, but it might be a tough sell. Maybe the original “grit on Tim’s screen” reason is no longer as compelling, because of higher-res screens and more uniform console fonts, but the rise of markdown to ubiquity seems like an even better reason not to use them.
Today, you can paste Python code between backticks to mark it as code in markdown; if Python code can contain backticks, that’s no longer true. People who use languages that rely on backticks have been complaining about this for years; do we really want to join them?

Finally, I think you need a fully worked-through example, not just a description of one. Show what the implementation of plot would look like if it could be handed BoundExpression objects. (Although pd.DataFrame.__getitem__ seems like the killer use case here, so maybe show that one instead, even though it’s probably more complicated.)

14.07.19 07:06, Andrew Barnert via Python-ideas wrote:
The more interesting problem is that in the general case you don’t have a simple `price < 1`, but `price < x`, where x is a variable or a more complex expression. price should be evaluated in the callee context while x should be evaluated in the caller context. And how can Python know which is which?

Thank you for your question! It would depend on the implementation of DataFrame.__getitem__. Note that BoundExpression is endowed with the locals and globals of the caller, so it does have access to x in your example. I think the way data.table in R handles this is that, before evaluating the expression, __getitem__ simply adds the columns to locals and then evaluates the expression. In your example, x already exists in locals but price doesn’t, so __getitem__ adds it, and everything needed to evaluate the expression correctly is in place. I think this feature is called "non-standard evaluation" because it lets programmers evaluate expressions in a context other than the standard one.

14.07.19 23:20, Nima Hamidi wrote:
Thank you for your question! It would depend on the implementation of DataFrame.__getitem__. Note that BoundExpression is endowed with the locals and globals of the caller, so it does have access to x in your example. I think the way data.table in R handles this is that, before evaluating the expression, __getitem__ simply adds the columns to locals and then evaluates the expression. In your example, x already exists in locals but price doesn’t, so __getitem__ adds it, and everything needed to evaluate the expression correctly is in place. I think this feature is called "non-standard evaluation" because it lets programmers evaluate expressions in a context other than the standard one.
The problem with this is that you have to know all the column names to avoid conflicts, even the ones you don't use. If new columns are added that conflict with your locals, you could silently get an unexpected result. This is as bad as a star import that overrides your globals or locals. It would be better to mark either free or bound variables explicitly. For example, dt[\price < x].
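A stdlib-only sketch of this hazard (the two dicts are hypothetical stand-ins for a caller's locals and a table's columns): with columns chained in front of the caller's namespace, a newly added column silently shadows a caller variable of the same name.

```python
from collections import ChainMap

caller_locals = {"x": 10, "total": 999}   # the caller's own variable `total`
columns = {"price": 5, "total": 42}       # the table later grew a `total` column

# Columns take priority, as in the data.table-style scheme described above;
# ChainMap looks maps up left to right.
env = dict(ChainMap(columns, caller_locals))

assert eval("price < x", {}, env)       # 5 < 10: works as intended
assert eval("total", {}, env) == 42     # silently the column, not the caller's 999
```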

On Jul 15, 2019, at 01:27, Serhiy Storchaka <storchaka@gmail.com> wrote:
The feature as described allows the library to do whatever it wants with the namespaces, and letting locals take priority over columns, or raising an exception if there’s an ambiguity, are just as easy as letting columns take priority over locals. If one of those options is clearly better, then libraries like Pandas or SQLAlchemy or whatever are going to implement the better one, not the worse one.
It would be better to mark either free or bound variables explicitly. For example, dt[\price < x].
At that point I think you’re better off with the existing syntax, dt[dt.price < x]. When you want to explicitly specify a namespace, that’s what dot syntax already means. Consider the case where dt is a join of two tables d1 and d2. Today you can write dt[d1.price * d2.taxrate < x]. With the proposed new feature, you could presumably write dt[price * taxrate < x], and get an exception if, say, both tables have price columns, but otherwise get exactly what you expected. I assume you think that’s too unclear or magical or whatever? But then I’m not sure how dt[\price * \taxrate < x] is much better.

On Mon, Jul 15, 2019 at 7:17 PM Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
Consider the case where dt is a join of two tables d1 and d2. Today you can write dt[d1.price * d2.taxrate < x]. With the proposed new feature, you could presumably write dt[price * taxrate < x], and get an exception if, say, both tables have price columns, but otherwise get exactly what you expected. I assume you think that’s too unclear or magical or whatever? But then I’m not sure how dt[\price * \taxrate < x] is much better.
I'm not 100% sure how joins work in Pandas, but wouldn't it just be dt[dt.price * dt.taxrate < x] ? Once they're joined, you'd just reference columns from the combined table, surely? ChrisA

On Jul 14, 2019, at 13:13, Serhiy Storchaka <storchaka@gmail.com> wrote:
The more interesting problem is that in the general case you don’t have a simple `price < 1`, but `price < x`, where x is a variable or a more complex expression.
I don’t think this one is a problem. (I mean, it does demonstrate the problem I was talking about, that price<1 is a useful, refactorable, etc. value but `price<1` is not, but you don’t need x for that.) I think this is exactly the kind of case the OP was referring to with “eval in a modified environment”, and it works fine.
price should be evaluated in the callee context
It’s actually not even the callee context, but a custom one, maybe something like self.columns. But that isn’t a problem.
while x should be evaluated in the caller context. And how can Python know which is which?
I think the benefit of this proposal is that Python doesn’t _need_ to know which is which; that’s up to the person implementing pandas.DataFrame.__getitem__, and it should be not just possible but easy to implement it as desired. As far as Python is concerned, `price < x` just gets compiled to an AST, and the caller context gets bound to that AST. The callee code then gets to decide how to eval it. The OP didn’t explain exactly how this would be done, but it seems like it should be easy:

    from collections import ChainMap

    def __getitem__(self, key):
        if isinstance(key, BoundExpression):
            localcontext = ChainMap(self.columns, key.locals)
            key = eval(key, key.globals, localcontext)
        # ... existing __getitem__ code from here on

(Note that eval's signature is eval(expr, globals, locals), so localcontext goes in the locals slot, where it takes priority in name lookup.) Since price is in columns and x is in key.locals, they’re both in localcontext, and everything works. And if you (the author of pandas) want locals to take precedence over columns, or want to flag it as an error to use something that’s ambiguous like that, or… any more complicated thing I can come up with, they’re all just as easy to write.
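The sketch can be exercised today with stand-ins. BoundExpression here is a hypothetical class carrying an expression string plus captured namespaces (the proposal would carry a compiled AST), and Col and Frame are toy illustration types; none of these exist in any library.

```python
from collections import ChainMap

class BoundExpression:
    # Hypothetical stand-in: source text plus the caller's namespaces.
    def __init__(self, source, globals_, locals_):
        self.source, self.globals, self.locals = source, globals_, dict(locals_)

class Col(list):
    # Toy column supporting element-wise comparison, vaguely like a Series.
    def __lt__(self, other):
        return [v < other for v in self]

class Frame:
    def __init__(self, **columns):
        self.columns = {k: Col(v) for k, v in columns.items()}

    def __getitem__(self, key):
        if isinstance(key, BoundExpression):
            # Columns win over the caller's locals, as in the sketch above.
            env = dict(ChainMap(self.columns, key.locals))
            key = eval(key.source, key.globals, env)
        # "existing" __getitem__ code: key is now a boolean mask
        return {k: [v for v, keep in zip(col, key) if keep]
                for k, col in self.columns.items()}

x = 1.0
dt = Frame(price=[0.5, 1.5, 0.75])
expr = BoundExpression("price < x", globals(), {"x": x})
assert dt[expr] == {"price": [0.5, 0.75]}  # price from columns, x from the caller
```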

On Jul 14, 2019, at 13:46, Nima Hamidi <hamidi@stanford.edu> wrote:
That’s not a fair example, because you’re ignoring the dot syntax that Pandas already provides, and also leaving out the backticks. So it’s really:

    tips[(tips.size >= 5) | (tips.total_bill > 45)]
    tips[`(size >= 5) | (total_bill > 45)`]

So, while there is still some advantage, it’s not nearly as big. And again, the tradeoff is that you don’t have useful intermediate values anymore. For example, if I want to use tips.size >= 5 repeatedly, or print it out for debugging before using it, etc., I can just do hightips = tips.size >= 5. There’s no way to do the same thing with your version.
I know the difference between function pipelines and function composition, but if nobody was interested in adding an operator for the even simpler compose, I think it’s unlikely that anyone will be interested in adding an operator for pipeline. And certainly not something that looks like %>%. So, this example doesn’t really help sell your proposal. Also, you didn’t answer any of the other issues that have nothing to do with that comparison with @ for compose. Why do you want to make people spell “feed to g” as “feed to g()”? Why shouldn’t it create a function that can be called (or otherwise used) normally? And so on.
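The first-class-function alternative argued for here (a plain left-to-right pipeline, no NSE and no new operator) can be sketched in a few lines; `pipeline` is a hypothetical helper, not an existing stdlib function.

```python
def pipeline(*funcs):
    """Compose left to right: pipeline(f, g)(x) == g(f(x))."""
    def piped(*args, **kwargs):
        result = funcs[0](*args, **kwargs)
        for f in funcs[1:]:
            result = f(result)
        return result
    return piped

double = lambda n: n * 2
increment = lambda n: n + 1

# "feed to increment" is spelled `increment`, not `increment()`:
process = pipeline(double, increment)
assert process(10) == 21  # increment(double(10))
```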

Andrew Barnert wrote:
Also, I disagree that there is no way to get intermediate values. A data-frame can simply have a method like "get" that evaluates its argument in the data-frame's context and _returns_ the value, instead of _subsetting_ the data-frame with it.
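A sketch of that "get" idea under the proposal's assumptions: Frame and its get method are hypothetical, and a plain expression string stands in for a BoundExpression.

```python
from collections import ChainMap

class Frame:
    # Hypothetical data-frame whose `get` evaluates an expression in the
    # frame's context and returns the value, rather than subsetting.
    def __init__(self, **columns):
        self.columns = columns

    def get(self, expr, caller_locals=None):
        env = dict(ChainMap(self.columns, caller_locals or {}))
        return eval(expr, {}, env)

tips = Frame(size=[3, 6, 5], total_bill=[20.0, 50.0, 44.0])

# The intermediate value is now a normal object: reusable, printable.
hightips = tips.get("[s >= 5 for s in size]")
assert hightips == [False, True, True]
```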

Participants (4):
- Andrew Barnert
- Chris Angelico
- Nima Hamidi
- Serhiy Storchaka