On Jul 15, 2019, at 01:27, Serhiy Storchaka
14.07.19 23:20, Nima Hamidi wrote:
Thank you for your question! It would depend on the implementation of DataFrame.__getitem__. Note that a BoundExpression carries the locals and globals of the caller, so it does have access to x in your example. I think the way data.table in R handles this is that, before evaluating the expression, __getitem__ simply adds the columns to locals and then evaluates the expression. In your example, x already exists in locals but price doesn't, so __getitem__ adds it, and everything needed to evaluate the expression correctly is there. I think this feature is called "non-standard evaluation" because it lets programmers evaluate expressions in a context other than the standard one.
The problem with this is that you have to know all the column names to avoid conflicts, even the ones you do not use. If new columns are added that conflict with your locals, you could silently get an unexpected result. This is as bad as using a star import that overrides your globals or locals.
The feature as described allows the library to do whatever it wants with the namespaces, and letting locals take priority over columns, or raising an exception if there’s an ambiguity, are just as easy as letting columns take priority over locals. If one of those options is clearly better, then libraries like Pandas or SQLAlchemy or whatever are going to implement the better one, not the worse one.
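To make the three options concrete, here is a rough sketch in plain Python of how a library might resolve names under each policy. Everything in it is hypothetical: select_rows, AmbiguousNameError, and the policy argument are names I made up for illustration, the "table" is just a dict of lists, and none of this is the proposed BoundExpression API or anything Pandas actually does.

```python
import ast
from collections import ChainMap


class AmbiguousNameError(NameError):
    """Hypothetical error: a name is both a column and a caller variable."""


def select_rows(expr, columns, caller_vars, policy="error"):
    """Return indices of rows for which expr is true.

    columns maps column name -> list of values; caller_vars stands in for
    the caller's locals. policy is one of "columns", "locals", or "error".
    """
    # Collect the bare names that the expression actually uses.
    tree = ast.parse(expr, mode="eval")
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    clashes = used & set(columns) & set(caller_vars)
    if clashes and policy == "error":
        raise AmbiguousNameError(f"ambiguous names: {sorted(clashes)}")

    code = compile(tree, "<expr>", "eval")
    n_rows = len(next(iter(columns.values()), []))
    selected = []
    for i in range(n_rows):
        row = {name: values[i] for name, values in columns.items()}
        # The first mapping in a ChainMap wins on lookup.
        if policy == "locals":
            namespace = ChainMap(caller_vars, row)
        else:
            namespace = ChainMap(row, caller_vars)
        if eval(code, {"__builtins__": {}}, namespace):
            selected.append(i)
    return selected


columns = {"price": [5, 15, 25], "x": [100, 100, 100]}
x = 10
print(select_rows("price < x", columns, {"x": x}, policy="columns"))  # [0, 1, 2]
print(select_rows("price < x", columns, {"x": x}, policy="locals"))   # [0]
```

With policy="columns" the new x column silently shadows the local, which is exactly the failure mode described above; with policy="error" the same call raises instead. Only names the expression actually uses are checked, so unrelated columns never cause a conflict.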
It would be better to mark either free or bound variables explicitly. For example, dt[\price < x].
At that point I think you’re better off with the existing syntax, dt[dt.price < x]. When you want to explicitly specify a namespace, that’s what dot syntax already means. Consider the case where dt is a join of two tables d1 and d2. Today you can write dt[d1.price * d2.taxrate < x]. With the proposed new feature, you could presumably write dt[price * taxrate < x], and get an exception if, say, both tables have price columns, but otherwise get exactly what you expected. I assume you think that’s too unclear or magical or whatever? But then I’m not sure how dt[\price * \taxrate < x] is much better.
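For the join case, the "raise if ambiguous" behavior could be as small as the following sketch. Again this is hypothetical: resolve_bare_name is a made-up helper, and the joined tables are modeled as a dict of dicts rather than real DataFrames.

```python
def resolve_bare_name(name, tables):
    """Look up an unqualified column name across the tables of a join.

    tables maps table name -> {column name -> values}. Raises KeyError if
    no table has the column and LookupError if more than one does.
    """
    owners = [tname for tname, cols in tables.items() if name in cols]
    if not owners:
        raise KeyError(name)
    if len(owners) > 1:
        raise LookupError(f"{name!r} is ambiguous: found in {owners}")
    return tables[owners[0]][name]


tables = {
    "d1": {"price": [10.0, 20.0]},
    "d2": {"taxrate": [0.1, 0.2]},
}
price = resolve_bare_name("price", tables)      # unambiguous: comes from d1
taxrate = resolve_bare_name("taxrate", tables)  # unambiguous: comes from d2
```

If d2 later grows a price column, the bare lookup starts raising rather than silently picking one, and the qualified spelling (tables["d1"]["price"] here, d1.price in the real thing) keeps working unchanged.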