[Python-ideas] Proposal: Query language extension to Python (PythonQL)

Sun Mar 26 07:40:41 EDT 2017

Hi Nick,

  Thanks for such a detailed response!

> On 25 Mar 2017, at 19:40, Nick Coghlan <ncoghlan at gmail.com> wrote:
> 
> First off, I think PythonQL (and PonyORM before it) is a very
> interesting piece of technology. However, I think some of the answers
> so far suggest we may need to discuss a couple of meta-issues around
> target audiences and available technical options before continuing on.
> 
> I'm quoting Gerald's post here because it highlights the "target
> audience" problem, but my comments apply to the thread generally.
> 
> On 25 March 2017 at 22:51, Gerald Britton <gerald.britton at gmail.com> wrote:
>> 
>> I see lots of C# code, but (thankfully) not so much LINQ to SQL.  Yes, it is a cool technology.  But I sometimes have a problem with the SQL it generates.  Since I'm also a SQL developer, I'm sensitive to how queries are constructed, for performance reasons, as well as how they look, for readability and aesthetic reasons.
>> 
>> LINQ queries can generate poorly-performing SQL, since LINQ is basically a translator, but not an AI.  As far as appearances go, LINQ queries can look pretty gnarly, especially if they include sub queries or a few joins.  That makes it hard for the SQL dev (me!) to read and understand if there are performance problems (which there often are, in my experience).
>> 
>> So, I would tend to code the SQL separately and put it in a SQL view, function or stored procedure.  I can still parse the results with LINQ (not LINQ to SQL), which is fine.
>> 
>> For similar reasons, I'm not a huge fan of ORMs either.  Probably my bias towards designing the database first and building up queries to meet the business goals before writing a line of Python, C#, or the language du jour.
> 
> 
> Right, the target audience here *isn't* folks who already know how to
> construct their own relational queries in SQL, and it definitely isn't
> folks that know how to tweak their queries to get optimal performance
> from the specific database they're using. Rather, it's folks that
> already know Python's comprehensions, and perhaps some of the
> itertools features, and helping to provide them with a smoother
> on-ramp into the world of relational data processing.

Actually, I am a PythonQL user myself, even though I’m an SQL expert. I work in data science, so
I do a lot of ad-hoc querying, and we are always getting new datasets that we need to check out and work with.
Some things, like nested data models, are also handled much better by PythonQL, and data such as
JSON or XML is easier to work with as well.

I see even more use-cases coming up, once we get further with smart database wrappers in PythonQL.

> 
> There's no question that folks dealing with sufficiently large data
> sets with sufficiently stringent performance requirements are
> eventually going to want to reach for handcrafted SQL or a distributed
> computation framework like dask, but that's not really any different
> from our standard position that when folks are attempting to optimise
> a hot loop, they're eventually going to have to switch to something
> that can eliminate the interpreter's default runtime object management
> overhead (whether that's Cython, PyPy's or Numba's JIT, or writing an
> extension module in a different language entirely). It isn't an
> argument against making it easier for folks to postpone the point
> where they find it necessary to reach for the "something else" that
> takes them beyond Python's default capabilities.

I’m not so sure. For example, one of the wrappers is going to be an Apache Spark
wrapper, so you could quickly hack up a PythonQL query that would run
on a distributed platform.

> 
> However, at the same time, PythonQL *is* a DSL for data manipulation
> operations, and map and filter are far and away the most common of
> those. Even reduce, which was previously a builtin, was pushed into
> functools for Python 3.0, with the preferred alternative being to just
> write a suitably named function that accepts an iterable and returns a
> single value. And while Python is a very popular tool for data
> manipulation, it would be a big stretch to assume that that was its
> primary use case in all contexts.
> 
> So it makes sense to review some of the technical options that are
> available to help make projects like PythonQL more maintainable,
> without necessarily gating improvements to them on the relatively slow
> update and rollout cycle of new Python versions.
> 
> = Option 1 =
> 
> Fully commit to the model of allowing alternate syntactic dialects to
> run atop Python interpreters. In Hylang and PythonQL we have at least
> two genuinely interesting examples of that working through the text
> encoding system, as well as other examples like Cython that work
> through the extension module system.
> 
> So that's an opportunity to take this from "Possible, but a bit hacky"
> to "Pluggable source code translation is supported at all levels of
> the interpreter, including debugger source maps, etc" (perhaps by
> borrowing ideas from other ecosystems like Java, JavaScript, and .NET,
> where this kind of thing is already a lot more common).
> 
> The downside of this approach is that actually making it happen would
> be getting pretty far afield from the original PythonQL goal of
> "provide nicer data manipulation abstractions in Python", and it
> wouldn't actually deliver anything new that can't already be done with
> existing import and codec system features.

This would be great anyway: it would be nice if we could rely on some
preprocessor directive instead of hacking encodings.
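
For readers who have not seen the encoding trick, here is a minimal sketch of
the general idea, not PythonQL's actual implementation. The codec name
"toyql", the toy "unless" dialect, and every function name below are
hypothetical; only the codecs calls are real standard-library API. A codec
usable through a real source-file coding declaration would need more than
this (the sketch only decodes a bytes object by hand).

```python
import codecs


def translate(source):
    """Rewrite a toy dialect into plain Python (hypothetical example).

    Lines of the form 'unless COND:' become 'if not (COND):'.
    """
    lines = []
    for line in source.splitlines():
        body = line.lstrip()
        if body.startswith("unless ") and body.endswith(":"):
            indent = line[: len(line) - len(body)]
            condition = body[len("unless "):-1]
            line = f"{indent}if not ({condition}):"
        lines.append(line)
    return "\n".join(lines) + "\n"


def toy_decode(data, errors="strict"):
    # Decode as UTF-8 first, then run the source-to-source translation.
    text = bytes(data).decode("utf-8", errors)
    return translate(text), len(data)


def search(name):
    # Codec search function: only answer for our hypothetical codec name.
    if name != "toyql":
        return None
    utf8 = codecs.lookup("utf-8")
    return codecs.CodecInfo(name="toyql", encode=utf8.encode, decode=toy_decode)


codecs.register(search)

# "Decoding" the bytes now also translates the dialect into regular Python.
source = b"x = 3\nunless x > 5:\n    result = 'small'\n"
namespace = {}
exec(source.decode("toyql"), namespace)
```

The translation step is invisible to the rest of the interpreter, which is
what makes the approach both convenient and a bit hacky: tracebacks and
debuggers see the translated text, not what the user wrote.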

> 
> = Option 2 =
> 
> Back when f-strings were added for 3.6, I wrote PEP 501 to generalise
> the idea as "i-strings": exposing the intermediate interpolated form
> of f-strings, such that you could write code like `myquery =
> sql(i"SELECT {column} FROM {table};")` where the "sql" function
> received an "InterpolationTemplate" object that it could render
> however it wanted, but the "column" and "table" references were just
> regular Python expressions.
> 
> It's currently deferred indefinitely, as I didn't have any concrete
> use cases that Guido found sufficiently compelling to make the
> additional complexity worthwhile. However, given optionally delayed
> rendering of interpolated strings, PythonQL could be used in the form:
> 
>    result = pyql(i"""
>        (x,y)
>        for x in {range(1,8)}
>        for y in {range(1,7)}
>        if x % 2 == 0 and
>           y % 2 != 0 and
>           x > y
>    """)
> 
> I personally like this idea (otherwise I wouldn't have written PEP 501
> in the first place), and the necessary technical underpinnings to
> enable it are all largely already in place to support f-strings. If
> the PEP were revised to show examples of using it to support
> relatively seamless calling back and forth between Hylang, PythonQL
> and regular Python code in the same process, that might be intriguing
> enough to pique Guido's interest (and I'm open to adding co-authors
> that are interested in pursuing that).

For us, what would be the difference between this and just executing a PythonQL
string, with the local and global variables pulled into the PythonQL scope?
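
To make the question concrete, here is a rough sketch of the two approaches
as I understand them. It is only an illustration under assumptions:
QueryTemplate is a hypothetical stand-in for the PEP 501 template object
(i-strings do not exist in Python today), both function names are made up,
and an ordinary list comprehension stands in for a PythonQL query.

```python
import sys
from dataclasses import dataclass


def run_query_string(query):
    """Approach A: evaluate a plain string against the caller's scopes."""
    caller = sys._getframe(1)
    # Merge globals and locals into one namespace so that names used
    # inside the comprehension body can be resolved as well.
    scope = dict(caller.f_globals)
    scope.update(caller.f_locals)
    return eval(query, scope)


@dataclass
class QueryTemplate:
    """Hypothetical stand-in for an interpolation template."""
    text: str     # query text with named placeholders
    values: dict  # placeholder values, already evaluated at the call site


def run_template(template):
    """Approach B: the renderer only sees the text and the captured values."""
    return eval(template.text, dict(template.values))


def demo():
    limit = 4
    a = run_query_string("[x for x in range(limit) if x % 2 == 0]")
    b = run_template(
        QueryTemplate("[x for x in xs if x % 2 == 0]", {"xs": range(limit)})
    )
    return a, b
```

In approach A the query function has to inspect the caller's frame and the
whole query, including the Python expressions, is opaque text. In approach B
the embedded expressions are compiled and evaluated by Python itself, and the
query function never needs frame access.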

> 
> = Option 3 =
> 
> Go all the way to expanding comprehensions to natively be a full data
> manipulation DSL.
> 
> I'm personally not a fan of that approach, as syntax is really hard to
> search for help on (keywords are better for that than punctuation, but
> not by much), while methods and functions get to have docstrings. It
> also means the query language gets tightly coupled to the Python
> grammar, which not only makes the query language difficult to update,
> but also makes Python's base syntax harder for new users to learn.
> 
> By contrast, when DSLs are handled as interpolation templates with
> delayed rendering, then the rendering function gets to provide runtime
> documentation, and the definition of the DSL is coupled to the update
> cycle of the rendering function, *not* that of the Python language
> definition.
> 
> Cheers,
> Nick.
> 
> -- 
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia