[Python-ideas] Proposal: Query language extension to Python (PythonQL)

Pavel Velikhov pavel.velikhov at gmail.com
Sun Mar 26 07:14:56 EDT 2017


Terry,

> On 26 Mar 2017, at 07:23, Terry Reedy <tjreedy at udel.edu> wrote:
> 
> On 3/25/2017 11:40 AM, Kyle Lahnakoski wrote:
>> 
>> Pavel,
>> 
>> I like PythonQL. I perform a lot of data transformation, and often find
>> Python's list comprehensions too limiting; leaving me wishing for
>> LINQ-like language features.
>> 
>> As an alternative to extending Python with PythonQL, Terry Reedy
>> suggested interpreting a DSL string, and Pavel Velikhov alluded to using
>> magic method tricks found in ORM libraries. I can see how both these are
>> not satisfactory.
>> 
>> A third alternative could be to encode the query clauses as JSON
>> objects. For example:
> 
> PythonQL version
> 
>> result = [ select (x, sum_y)
>>           for x in range(1,8),
>>               y in range(1,7)
>>           where x % 2 == 0 and y % 2 != 0 and x > y
>>           group by x
>>           let sum_y = sum(y)
>>           where sum_y % 2 != 0
>>           ]
> 
> Someone mentioned the problem of adding multiple new keywords.  Even 1 requires a proposal to meet a high bar; I think we average less than 1 new keyword per release in the last 20 years.
> 
> Searching '\bgroup\b' just in /lib (the 3.6 stdlib on Windows) gets over 300 code hits in about 30 files.  I think this makes it ineligible to be a keyword.  re's match.group() accounts for many.  'select' has a fair number of code uses also.  I also see 'where', 'let', and 'by' in the above.

Yes, we add quite a few keywords. If you look at the window clause we have, there are even more keywords there.
This is definitely a huge concern and, in my view, the main reason the community would oppose the change.

I’m not too experienced with the Python parser, but could we make all of these context-sensitive rather than real keywords
(interpreted as keywords only inside a comprehension, so that no other code breaks)?
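
To make the question concrete, here is a rough token-level sketch of what "keywords only inside a comprehension" could mean. It uses the standard tokenize module and is only an illustration, not how the CPython parser works: CLAUSE_WORDS and classify_clause_words are made-up names, and the rule (inside square brackets and not an attribute access) is a deliberately crude assumption. It would misclassify a plain variable named `group` used inside a comprehension, for example.

```python
import io
import tokenize

# Hypothetical clause words, taken from the query example in this thread.
CLAUSE_WORDS = {"select", "where", "group", "by", "let"}


def classify_clause_words(source):
    """Return (word, role) pairs for every clause word found in source.

    Crude heuristic (an assumption for illustration): a clause word is a
    'keyword' when it sits inside square brackets and is not preceded by
    a dot; everywhere else it stays an ordinary 'identifier'.
    """
    results = []
    square_depth = 0   # how many '[' are currently open
    prev = None        # text of the previous token
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.OP and tok.string == "[":
            square_depth += 1
        elif tok.type == tokenize.OP and tok.string == "]":
            square_depth -= 1
        elif tok.type == tokenize.NAME and tok.string in CLAUSE_WORDS:
            inside = square_depth > 0 and prev != "."
            results.append((tok.string, "keyword" if inside else "identifier"))
        prev = tok.string
    return results


sample = (
    "group = m.group(1)\n"
    "result = [select (x, y) for x in xs where x > 1]\n"
)
roles = classify_clause_words(sample)
```

With this rule, existing code such as `m.group(1)` or a variable called `group` outside brackets keeps working, while `select` and `where` inside the brackets are picked up as clause words.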

> 
>> result = pq([
>>    {"select":["x", "sum_y"]},
>>    {"for":{"x": range(1,8), "y": range(1,7)}},
>>    {"where": lambda x,y: x % 2 == 0 and y % 2 != 0 and x > y},
>>    {"groupby": "x"},
>>    {"with":{"sum_y":{"SUM":"y"}}},
>>    {"where": {"neq":[{"mod":["sum_y", 2]}, 0]}}
>> ])
>> 
>> This representation does look a little lispy, and it may resemble
>> PythonQL's parse tree. I think the benefits are:
>> 
>> 1) no python language change
>> 2) easier to parse
>> 3) better than string-based DSL for catching syntax errors
>> 4) {"clause": parameters} format is flexible for handling common query
>> patterns **
>> 5) works in javascript too
>> 6) easy to compose with automation (my favorite)
>> 
>> It is probably easy for you to see the drawbacks.
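
For comparison, the same query can be written in standard Python today. This is a minimal sketch, assuming that `group by x` followed by `let sum_y = sum(y)` means "sum the y values that share the same x"; the function name query is made up for the example.

```python
from itertools import product


def query():
    # group by x: accumulate the sum of y for each x that passes the filter
    sums = {}
    for x, y in product(range(1, 8), range(1, 7)):
        if x % 2 == 0 and y % 2 != 0 and x > y:
            sums[x] = sums.get(x, 0) + y
    # second where clause: keep only the groups whose sum is odd
    return [(x, sum_y) for x, sum_y in sums.items() if sum_y % 2 != 0]


result = query()
print(result)  # [(2, 1), (6, 9)]
```

The grouping step needs an explicit dictionary here, which is the part that the PythonQL syntax and the JSON-style encoding both try to express declaratively.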
> 
> 
> -- 
> Terry Jan Reedy
> 
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/