On 2020-02-23 16:32, Guido van Rossum wrote:
> Assuming that the reader is familiar with the example `Lottery ~ Literacy + Wealth + Region` is *not* going to work. I have literally no idea from what field that is taken or what the purpose of the example is. Please don't expect that I can just Google it: I did, found https://www.statsmodels.org/stable/example_formulas.html, and I still have no idea what it's about.

Sorry, perhaps I should have given a bit more explanation. As I said, "~" means "depends on". So in R, you do something like:

    model = some_statistical_model_function(Lottery ~ Literacy + Wealth + Region, some_data_table)

This means "make a model that predicts the value of Lottery based on the values of Literacy, Wealth and Region", where the names Lottery, Literacy etc. refer to columns in some_data_table, which is a tabular data structure akin to a pandas DataFrame.

So, again, `Lottery ~ Literacy + Wealth + Region` means "Lottery depends on Literacy, Wealth, and Region". It doesn't really matter what names we use; we could use "A ~ B + C" just as well. The point is that it defines a relationship between variables whose measurements we have as columns in a tabular structure, and it says we want a model where the variables on the right of the tilde are the independent variables and the one on the left is the dependent variable. "Y ~ X" means "predict Y using X".

As you mentioned (in a part of your response I snipped), the precedence of the operator is important. In this case we would want the operator to have very low precedence, because we want it to mean `Lottery ~ (Literacy + Wealth + Region)`. That is, the dependent variable may depend on some complicated expression involving combinations of the independent variables.

It's also worth noting that the tilde here isn't notation for any of the work that the statistical model does. It's just a way of writing a "formula" that relates the independent and dependent variables; you still have to pass that formula to some function that actually runs the model.

All that said, given that we can already achieve the desired precedence with parentheses, I'll reiterate that I don't think the lack of a tilde operator is a real blocker to doing this kind of model specification with Python expressions, so I don't think I'm in favor of this proposal as it is.

-- 
Brendan Barnwell
"Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail."
   --author unknown
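
To make the precedence point concrete, here is a minimal sketch using Python's standard `ast` module. It only inspects how Python groups an expression; it does not fit any model. The operators `<=` and `@` are purely hypothetical stand-ins for a "depends on" operator, chosen because one binds more loosely than `+` and the other more tightly. Neither is part of any proposal, and this says nothing about how R or statsmodels parse formulas.

```python
import ast

# Python's "~" is unary only, so the R-style formula is not valid
# Python syntax as written.
try:
    ast.parse("Lottery ~ Literacy + Wealth + Region", mode="eval")
    tilde_parses = True
except SyntaxError:
    tilde_parses = False

# Hypothetical stand-in 1: a comparison operator binds more loosely than "+",
# so the whole right-hand side is grouped without parentheses.
low = ast.parse("Lottery <= Literacy + Wealth + Region", mode="eval").body

# Hypothetical stand-in 2: "@" binds more tightly than "+", so without
# parentheses it would apply to Lottery and Literacy only.
high = ast.parse("Lottery @ Literacy + Wealth + Region", mode="eval").body

# Parentheses restore the intended grouping for the tighter operator.
high_paren = ast.parse("Lottery @ (Literacy + Wealth + Region)", mode="eval").body

print(tilde_parses)                   # False
print(type(low).__name__)             # Compare: top-level node is the stand-in operator
print(type(high.op).__name__)         # Add: top-level node is "+", not the stand-in
print(type(high_paren.op).__name__)   # MatMult: stand-in is top-level again
```

The third case is the one I have in mind above: even with an operator whose precedence is "wrong", parentheses give the grouping `Lottery ~ (Literacy + Wealth + Region)` would need.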