Supporting ~ as a binary operator is an interesting idea, especially given
the relatively limited usage of unary ~. However, the big hole in this
proposal for formulas is that there is a de facto standard "minilanguage"
for writing such formulas in Python, namely what Patsy supports:
https://patsy.readthedocs.io/en/latest/formulas.html
Patsy is used by statsmodels and other tools to support the same formulas
as we see in R (or S).
We immediately see the problem with the interaction operator : (colon),
which conflicts with how it is used to support annotations in Python. Given
that this formula minilanguage is comprehensive, this seems to be a fatal
objection.
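To make the conflict concrete, here is a small sketch using only the
standard-library ast module. The names a and b are placeholders I made up,
not anything from Patsy itself. It checks that "a: b" already parses as an
annotation statement, and that the same text is rejected wherever an
expression is expected.

```python
import ast

# As a statement, "a: b" is already valid Python: it annotates the
# name a with the expression b, without assigning a value.
tree = ast.parse("a: b")
is_annotation = isinstance(tree.body[0], ast.AnnAssign)


def parses_as_expression(source):
    """Return True if source is accepted by the parser as an expression."""
    try:
        ast.parse(source, mode="eval")
        return True
    except SyntaxError:
        return False


# In expression position the colon is rejected, so a Patsy-style
# interaction term written as a:b cannot sit inside a larger formula.
colon_ok = parses_as_expression("a:b")
plus_ok = parses_as_expression("a + b")
```

So + could carry over to a pure-Python formula, but : is already taken at
the grammar level.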
Note that deferred evaluation is something of a red herring: it is
straightforward to defer execution in Python's object model, as we see in
SymPy, pandas DataFrame where clauses, and SQLAlchemy, among other
examples.
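As a rough illustration of what I mean by deferring execution, here is a
minimal sketch with operator overloading. The classes Expr, Var and Add are
hypothetical names invented for this example; this is not how SymPy, pandas
or SQLAlchemy are actually implemented, only the same general idea.

```python
class Expr:
    """Hypothetical base class: + builds a tree instead of computing."""

    def __add__(self, other):
        return Add(self, other)


class Var(Expr):
    """A named placeholder whose value is supplied later."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name

    def evaluate(self, env):
        return env[self.name]


class Add(Expr):
    """A recorded addition of two sub-expressions."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def __repr__(self):
        return f"({self.left!r} + {self.right!r})"

    def evaluate(self, env):
        return self.left.evaluate(env) + self.right.evaluate(env)


# Nothing is added here; the expression is only recorded as a tree.
expr = Var("Literacy") + Var("Wealth") + Var("Region")

# The values are supplied, and the sum computed, only on request.
total = expr.evaluate({"Literacy": 1, "Wealth": 2, "Region": 3})
```

The point is that none of this needs new syntax; only the operators
themselves have to exist in the grammar.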
On Sun, Feb 23, 2020 at 5:37 PM Aaron Hall wrote:
Currently, Python only has ~ (tilde) in the context of a unary operation (like -, with __neg__(self), and +, __pos__(self)). ~ currently calls __invert__(self) in the unary context.
I think it would be awesome to have in the language, as it would allow modelling along the lines of R that we currently only get with text, e.g.:
    smf.ols(formula='Lottery ~ Literacy + Wealth + Region', data=df)
With a binary context for ~, we could write the above string as pure Python, with implications for symbolic evaluation (with SymPy) and statistical modelling (such as with sklearn or statsmodels) - and other use-cases/DSLs.
In LaTeX we call this \sim (Wikipedia indicates this is for "similar to"). I'm not too particular, but __sim__(self, other) would have the benefits of being both short and consistent with LaTeX.
This is not a fully baked idea, perhaps there's a good reason we haven't added a binary ~. It seems like I've seen discussion in the past. But I couldn't find such discussion. And as I'm currently taking some statistics courses, I'm getting R-feature-envy again...
What do you think?
Aaron Hall
I do not fully understand your proposal. I know nothing about R, and my knowledge of statistics faded long ago.
However, I do not think we can expect Python to accommodate every existing domain. Let me explain: Python has no special features, syntax, or operators for dealing with SQL, HTML, ini files, OpenGL, etc. These domains, and others, are supported via libraries, outside of the language core.
~ exists in a bitwise context and, as far as I know, it comes from C; indeed, I have never used it in Python. It is a unary operator because that is how it works as a bitwise operator.
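For reference, here is a small sketch of what the unary ~ does today. The Flag class is a hypothetical example of mine, used only to show that ~ dispatches to __invert__ on user-defined types.

```python
# For ints, ~x is the bitwise complement, which equals -x - 1.
inverted = ~5


class Flag:
    """Hypothetical class showing that ~ calls __invert__."""

    def __init__(self, on):
        self.on = on

    def __invert__(self):
        # Return a new Flag with the opposite state.
        return Flag(not self.on)


off = ~Flag(True)
```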
I cannot see any improvement in turning ~ into a binary operator. I imagine that a binary ~ would have a completely different meaning from the unary ~, and I can foresee many problems there.
In my opinion, you should show that binary ~ has a relevant benefit for the whole language, not just for R-style tasks. It should be useful in several different domains and behave consistently, or at least as consistently as possible, across those domains.
Can you, for instance, envision other uses of binary ~ beyond R?
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-leave@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/UMXBIM...
Code of Conduct: http://python.org/psf/codeofconduct/