[Python-ideas] Proposal: Query language extension to Python (PythonQL)

Pavel Velikhov pavel.velikhov at gmail.com
Mon Mar 27 04:55:32 EDT 2017


Hi Brice,

> On 27 Mar 2017, at 10:17, Brice PARENT <contact at brice.xyz> wrote:
> 
> I prefer this a lot to the original syntax, and I really think this has much better chances to be integrated (if such an integration had to be done, and not kept as a separate module).
> 
> Also, maybe managing this with classes instead of syntax could also be done easily (without any change to Python), like this: 
> from pyql import PQL, Select, For, Where, GroupBy, Let
> 
> result = PQL(
>     Select("x", "sum_y"),
>     For("x", range(1, 8)),
>     For("y", range(1, 7)),
>     Where(lambda x, y: x % 2 == 0 and y % 2 != 0 and x > y, "x", "y"),  # function, *[arguments to pass to the function]
>     Where("sum_y", lambda sum_y: sum_y % 2 != 0)
>     GroupBy("x"),
>     Let("sum_y", lambda y: sum(y), "y")
> )
> 
> 

So here’s the deal: small queries will look pretty decent in pretty much all paradigms, ORM, or PythonQL or your proposal.
Once they get bigger and combine multiple pain points (say outerjoins, grouping and nested data) - then unless you have a
really clear and minimal language, folks will get confused and lost.

We’ve gone through a few query languages that failed, including XQuery and others, and the main reason was the need to learn
a whole new language and a bunch of libraries, nobody wanted to do it. So the main selling point behind PythonQL is: its Python
that folks hopefully know already, with just a few extensions.
> (to be defined more precisely, I don't really like to rely on case to differentiate the "for" keyword and the "For" class, which by the way could be inherited from a more general "From" class, allowing to get the data from a database, pure python, a JSON/csv/xml file/object, or anything else.)
> 
> With a nice lazy evaluation, in the order of the arguments of the constructor, I suppose you could achieve everything you need, and have an easily extendable syntax (create new objects between versions of the module, without ever having to create new keywords). There is no new parsing, and you already have the autocompletion of your IDE if you annotate correctly your code.
> 
> You could even have this : 
> 
> query = PQL(
>     Select("x", "sum_y"),
>     Where("x", "y", lambda x, y: x % 2 == 0 and y % 2 != 0 and x > y),
>     Where("sum_y", lambda sum_y: sum_y % 2 != 0)
>     GroupBy("x"),
>     Let("sum_y", lambda y: sum(y), "y")  # [name of the new var], function, *[arguments to pass to the function]
> )
> query.execute(x=range(1, 8), y=range(1, 7))
> or 
> query.execute(PgDatabase(**dbsettings))
> 
> -Brice
> 
> Le 25/03/17 à 16:40, Kyle Lahnakoski a écrit :
>> 
>> Pavel,
>> 
>> I like PythonQL. I perform a lot of data transformation, and often find Python's list comprehensions too limiting; leaving me wishing for LINQ-like language features. 
>> As an alternative to extending Python with PythonQL, Terry Reedy suggested interpreting a DSL string, and Pavel Velikhov alluded to using magic method tricks found in ORM libraries. I can see how both these are not satisfactory.  
>> A third alternative could be to encode the query clauses as JSON objects. For example:  
>> result = [ select (x, sum_y)
>>            for x in range(1,8), 
>>                y in range(1,7)
>>            where x % 2 == 0 and y % 2 != 0 and x > y
>>            group by x
>>            let sum_y = sum(y)
>>            where sum_y % 2 != 0
>>            ]
>> result = pq([
>>     {"select":["x", "sum_y"]},
>>     {"for":{"x": range(1,8), "y": range(1,7)}},
>>     {"where": lambda x,y: x % 2 == 0 and y % 2 != 0 and x > y},
>>     {"groupby": "x"},
>>     {"with":{"sum_y":{"SUM":"y"}},
>>     {"where": {"neq":[{"mod":["sum_y", 2]}, 0]}}
>>  ])
>> This representation does look a little lispy, and it may resemble PythonQL's parse tree. I think the benefits are:
>> 
>> 1) no python language change
>> 2) easier to parse
>> 3) better than string-based DSL for catching syntax errors
>> 4) {"clause": parameters} format is flexible for handling common query patterns **
>> 5) works in javascript too
>> 6) easy to compose with automation (my favorite) 
>> 
>> It is probably easy for you to see the drawbacks.
>> 
>> 
>> ** The `where` clause can accept a native lambda function, or an expression tree
>> 
>> 
>> 
>> 
>> 
>> "If you are writing a loop, you are doing it wrong!" :)
>> 
>> 
>> On 2017-03-24 11:10, Pavel Velikhov wrote:
>>> Hi folks!
>>> 
>>>   We started a project to extend Python with a full-blown query language about a year ago. The project is call PythonQL, the links are given below in the references section. We have implemented what is kind of an alpha version now, and gained some experience and insights about why and where this is really useful. So I’d like to share those with you and gather some opinions whether you think we should try to include these extensions in the Python core.
>>> 
>>> Intro
>>> 
>>>   What we have done is (mostly) extended Python’s comprehensions with group by, order by, let and window clauses, which can come in any order, thus comprehensions become a query language a bit cleaner and more powerful than SQL. And we added a couple small convenience extensions, like a  We have identified three top motivations for folks to use these extensions:
>>> 
>>> Our Motivations
>>> 
>>> 1. This can become a standard for running queries against database systems. Instead of learning a large number of different SQL dialects (the pain point here are libraries of functions and operators that are different for each vendor), the Python developer needs only to learn PythonQL and he can query any SQL and NoSQL database.
>>> 
>>> 2. A single PythonQL expression can integrate a number of databases/files/memory structures seamlessly, with the PythonQL optimizer figuring out which pieces of plans to ship to which databases. This is a cool virtual database integration story that can be very convenient, especially now, when a lot of data scientists use Python to wrangle the data all day long.
>>> 
>>> 3. Querying data structures inside Python with the full power of SQL (and a bit more) is also really convenient on its own. Usually folks that are well-versed in SQL have to resort to completely different means when they need to run a query in Python on top of some data structures.
>>> 
>>> Current Status
>>> 
>>> We have PythonQL running, its installed via pip and an encoding hack, that runs our preprocessor. We currently compile PythonQL into Python using our executor functions and execute Python subexpressions via eval. We don’t do any optimization / rewriting of queries into languages of underlying systems. And the query processor is basic too, with naive implementations of operators. But we’ve build DBMS systems before, so if there is a good amount of support for this project, we’ll be able to build a real system here.
>>> 
>>> Your take on this
>>> 
>>> Extending Python’s grammar is surely a painful thing for the community. We’re now convinced that it is well worth it, because of all the wonderful functionality and convenience this extension offers. We’d like to get your feedback on this and maybe you’ll suggest some next steps for us.
>>> 
>>> References
>>> 
>>> PythonQL GitHub page: https://github.com/pythonql/pythonql <https://github.com/pythonql/pythonql>
>>> PythonQL Intro and Tutorial (this is all User Documentation we have right now): https://github.com/pythonql/pythonql/wiki/PythonQL-Intro-and-Tutorial <https://github.com/pythonql/pythonql/wiki/PythonQL-Intro-and-Tutorial>
>>> A use-case of querying Event Logs and doing Process Mining with PythonQL: https://github.com/pythonql/pythonql/wiki/Event-Log-Querying-and-Process-Mining-with-PythonQL <https://github.com/pythonql/pythonql/wiki/Event-Log-Querying-and-Process-Mining-with-PythonQL>
>>> PythonQL demo site: www.pythonql.org <http://www.pythonql.org/>
>>> 
>>> Best regards,
>>> PythonQL Team
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> Python-ideas mailing list
>>> Python-ideas at python.org <mailto:Python-ideas at python.org>
>>> https://mail.python.org/mailman/listinfo/python-ideas <https://mail.python.org/mailman/listinfo/python-ideas>
>>> Code of Conduct: http://python.org/psf/codeofconduct/ <http://python.org/psf/codeofconduct/>
>> 
>> 
>> 
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org <mailto:Python-ideas at python.org>
>> https://mail.python.org/mailman/listinfo/python-ideas <https://mail.python.org/mailman/listinfo/python-ideas>
>> Code of Conduct: http://python.org/psf/codeofconduct/ <http://python.org/psf/codeofconduct/>
> 
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20170327/8eb61c39/attachment-0001.html>


More information about the Python-ideas mailing list