<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Hi David<div class=""><br class=""><div><blockquote type="cite" class=""><div class="">On 25 Mar 2017, at 20:08, David Mertz <<a href="mailto:mertz@gnosis.cx" class="">mertz@gnosis.cx</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class="">I think it's extraordinarily unlikely that a big change in Python syntax to support query syntax will ever happen. Moreover, I would oppose such a change myself.<div class=""><br class=""></div><div class="">But such a change is also really not necessary. Pandas already abstracts all the things mentioned using only Python methods. It is true that Pandas sometimes does some black magic within those methods to get there, and it also uses a somewhat non-Pythonic style of long chains of method calls. But it does everything PythonQL does, as well as much, much more. Pandas builds in DataFrame readers for every data source you are likely to encounter, including leveraging all the abstractions provided by RDBMS drivers, etc. It does groupby, join, etc.</div><div class=""><br class=""></div></div></div></blockquote><div><br class=""></div><div>I work with pandas daily, and of course it does have the functionality that PythonQL introduces, but it's a completely different beast. One of the reasons I started</div><div>PythonQL is that pandas is so difficult to master (just as any function-based database API would be).</div><div><br class=""></div><div>The key benefit of PythonQL is that with minimal grammar extensions you get the power of a real query language. 
So a Python programmer who knows </div><div>comprehensions well and has a good idea of SQL or other query languages can start writing complex queries right away.</div><div><br class=""></div><div>With pandas you need to read the docs all the time and complex data transformations become incredibly cryptic.</div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class="">See, e.g.:</div><div class=""><br class=""></div><div class=""> <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html" class="">http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html</a></div><div class=""><br class=""></div><div class="">Now there's one reasonable objection to Pandas: It doesn't handle larger-than-memory datasets well. I don't see that PythonQL is better in that regard. But there is an easy next step for that larger data. Blaze provides generic interfaces to many, many larger-than-memory data sources. It is largely a subset of the Pandas API, although not precisely that. See, e.g.:</div><div class=""><br class=""></div><div class=""> <a href="http://blaze.readthedocs.io/en/latest/rosetta-sql.html" class="">http://blaze.readthedocs.io/en/latest/rosetta-sql.html</a><br class=""></div><div class=""><br class=""></div><div class="">Moreover, within the Blaze "orbit" is Dask. This is a framework for parallel computation, one of whose abstractions is a DataFrame based on Pandas. This gives you 90% of those methods for slicing-and-dicing data that Pandas does, but deals seamlessly with larger-than-memory datasets. See, e.g.:</div><div class=""><br class=""></div><div class=""> <a href="http://dask.pydata.org/en/latest/dataframe.html" class="">http://dask.pydata.org/en/latest/dataframe.html</a><br class=""></div><div class="gmail_extra"><br class=""></div><div class="gmail_extra">So I think your burden is even higher than showing the usefulness of PythonQL. 
You have to show why it's worth adding new syntax to do somewhat LESS than is available in very widely used 3rd party tools that avoid new syntax.</div></div></div></blockquote><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class="gmail_extra"><br class=""><div class="gmail_quote">On Fri, Mar 24, 2017 at 8:10 AM, Pavel Velikhov <span dir="ltr" class=""><<a href="mailto:pavel.velikhov@gmail.com" target="_blank" class="">pavel.velikhov@gmail.com</a>></span> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word" class="">Hi folks!<div class=""><br class=""><div class=""> We started a project to extend Python with a full-blown query language about a year ago. The project is called PythonQL; the links are given below in the References section. We have implemented what is kind of an alpha version now, and gained some experience and insights about why and where this is really useful. So I’d like to share those with you and gather some opinions on whether you think we should try to include these extensions in the Python core.</div><div class=""><br class=""></div><div class=""><b class="">Intro</b></div><div class=""><br class=""></div><div class=""> What we have done is (mostly) extended Python’s comprehensions with group by, order by, let and window clauses, which can come in any order, so that comprehensions become a query language that is a bit cleaner and more powerful than SQL. And we added a couple of small convenience extensions, like a We have identified three top motivations for folks to use these extensions:</div><div class=""><br class=""></div><div class=""><b class="">Our Motivations</b></div><div class=""><br class=""></div><div class="">1. This can become a standard for running queries against database systems. 
Instead of learning a large number of different SQL dialects (the pain points here are the libraries of functions and operators, which differ for each vendor), the Python developer needs only to learn PythonQL and he can query any SQL or NoSQL database.</div><div class=""><br class=""></div><div class="">2. A single PythonQL expression can integrate a number of databases/files/memory structures seamlessly, with the PythonQL optimizer figuring out which pieces of plans to ship to which databases. This is a cool virtual database integration story that can be very convenient, especially now, when a lot of data scientists use Python to wrangle data all day long.</div><div class=""><br class=""></div><div class="">3. Querying data structures inside Python with the full power of SQL (and a bit more) is also really convenient on its own. Usually folks who are well-versed in SQL have to resort to completely different means when they need to run a query in Python on top of some data structures.</div><div class=""><br class=""></div><div class=""><b class="">Current Status</b></div><div class=""><br class=""></div><div class="">We have PythonQL running; it’s installed via pip and an encoding hack that runs our preprocessor. We currently compile PythonQL into Python using our executor functions and execute Python subexpressions via eval. We don’t do any optimization / rewriting of queries into languages of underlying systems. And the query processor is basic too, with naive implementations of operators. But we’ve built DBMS systems before, so if there is a good amount of support for this project, we’ll be able to build a real system here.</div><div class=""><br class=""></div><div class=""><b class="">Your take on this</b></div><div class=""><br class=""></div><div class="">Extending Python’s grammar is surely a painful thing for the community. We’re now convinced that it is well worth it, because of all the wonderful functionality and convenience this extension offers. 
We’d like to get your feedback on this and maybe you’ll suggest some next steps for us.</div><div class=""><br class=""></div><div class=""><b class="">References</b></div><div class=""><br class=""></div><div class="">PythonQL GitHub page: <a href="https://github.com/pythonql/pythonql" target="_blank" class="">https://github.com/<wbr class="">pythonql/pythonql</a></div><div class="">PythonQL Intro and Tutorial (this is all User Documentation we have right now): <a href="https://github.com/pythonql/pythonql/wiki/PythonQL-Intro-and-Tutorial" target="_blank" class="">https://github.com/<wbr class="">pythonql/pythonql/wiki/<wbr class="">PythonQL-Intro-and-Tutorial</a></div><div class="">A use-case of querying Event Logs and doing Process Mining with PythonQL: <a href="https://github.com/pythonql/pythonql/wiki/Event-Log-Querying-and-Process-Mining-with-PythonQL" target="_blank" class="">https://github.com/<wbr class="">pythonql/pythonql/wiki/Event-<wbr class="">Log-Querying-and-Process-<wbr class="">Mining-with-PythonQL</a></div><div class="">PythonQL demo site: <a href="http://www.pythonql.org/" target="_blank" class="">www.pythonql.org</a></div><div class=""><br class=""></div><div class="">Best regards,</div><div class="">PythonQL Team</div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""></div></div></div><br class="">______________________________<wbr class="">_________________<br class="">
Python-ideas mailing list<br class="">
<a href="mailto:Python-ideas@python.org" class="">Python-ideas@python.org</a><br class="">
<a href="https://mail.python.org/mailman/listinfo/python-ideas" rel="noreferrer" target="_blank" class="">https://mail.python.org/<wbr class="">mailman/listinfo/python-ideas</a><br class="">
Code of Conduct: <a href="http://python.org/psf/codeofconduct/" rel="noreferrer" target="_blank" class="">http://python.org/psf/<wbr class="">codeofconduct/</a><br class="">
<br class=""></blockquote></div><br class=""><br clear="all" class=""><div class=""><br class=""></div>-- <br class=""><div class="gmail_signature">Keeping medicines from the bloodstreams of the sick; food <br class="">from the bellies of the hungry; books from the hands of the <br class="">uneducated; technology from the underdeveloped; and putting <br class="">advocates of freedom in prisons. Intellectual property is<br class="">to the 21st century what the slave trade was to the 16th.<br class=""></div>
</div></div>
</div></blockquote></div><br class=""></div></body></html>