On Tue, Jul 27, 2010 at 12:25 PM, Bruce Leban <bruce@leapyear.org> wrote:

The idea of LINQ is that you write the expression directly in the language and it translates into a query expression. It's going to be operating on an expression parse tree, right? Rather than trying to change the allowable expressions maybe the question is to figure out how to translate what we have and find what we can't express with what we have (and that's an orthogonal question and has nothing to do with __xxx__ functions).

On Tue, Jul 27, 2010 at 9:42 AM, Masklinn <masklinn@masklinn.net> wrote:

What about french quotes

expr = «x + y * z»

Isn't there already a syntax for this?

expr = lambda: x + y * z

Maybe you want some conversion of that lambda into a different form:

expr = @ast lambda: x + y * z

There's also an unexecuted expression in generator expressions, which is prettier than lambda.

There are two places where I've seen people doing this in Python (not counting the operator overloading, of which there are many examples). The first is DejaVu (http://www.aminus.net/dejavu/chrome/common/doc/trunk/managing.html#Querying) which decompiles lambdas. Then it does partial translation to SQL, and I think it will actually execute things in Python when they can't be translated (e.g., if you are using a Python function on a database-derived result). But it can easily break between Python versions, and only works with CPython. It also seems to have some fairly complex rules about partial evaluation.

The other place is peak.rules (http://pypi.python.org/pypi/PEAK-Rules) which uses strings for conditions. My understanding is that the string is compiled to an AST and then analyzed, so partial expressions shared by many conditions can be efficiently evaluated together. It also changes scopes (the expression is defined outside the function, but evaluated in the context of specific function arguments).
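To make the "string compiled to an AST and then analyzed" idea concrete, here is a minimal sketch using only the standard library's ast module. This is not how peak.rules itself is implemented; the helper name names_in_condition is made up for illustration. It just shows that a condition string can be parsed without being executed, and that the names it would need from some scope can be read off the tree.

```python
import ast

def names_in_condition(condition):
    """Parse a condition string (without running it) and return the bare
    names it refers to. Hypothetical helper, for illustration only."""
    tree = ast.parse(condition, mode="eval")
    return {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}

names = names_in_condition("user.last_login < (datetime.now() - time)")
print(sorted(names))
```

A mapper could use a set like this to decide which names refer to the row being queried and which have to be looked up in the caller's scope.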

Maybe it'd be helpful to consider actual examples in the context of SQL...

# timedelta has no "years" argument, so approximate 18 years in days
def users_over_age(minimum_age=timedelta(days=18 * 365)):
    return User.select("datetime.now() - user.birth_date > minimum_age")
    # or...
    return User.select(datetime.now() - User.birth_date > minimum_age)

def users_with_addresses():
    return User.select("sql_exists(Address.select('address.user_id == user.id'))")
    # or ...
    return User.select(sql_exists(Address.select(Address.user_id == User.id)))

def users_in_list(list_of_users_or_ids):
    list_of_ids = [item.id if isinstance(item, User) else item for item in list_of_users_or_ids]
    return User.select("user.id in list_of_ids")
    # or ...
    return User.select(sql_in(User.id, list_of_ids))

Well, I'm not seeing any advantage. You could do things like:

def valid_email(email):
    # Obviously write it better...
    return re.match(r'[a-z0-9]+@[a-z0-9.-]+', email)

def users_with_valid_email():
    return User.select("valid_email(user.email)")

and then have it detect (à la DejaVu) that valid_email() cannot be translated to SQL, so select everything then filter it with that function. This looks clever, but usually this kind of transparency will only bite you; as in this example, what looks like it might be a normal kind of query is actually an extremely expensive query that might take a very long time to complete. (My impression is that LINQ is clever like this too, allowing expressions that are evaluated in part in different locations?)
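As a rough sketch of the detection step (not DejaVu's actual logic, which works on decompiled lambdas), one could walk the parsed condition and flag calls that take row data but have no known SQL rendering. The set SQL_TRANSLATABLE, the helper untranslatable_calls, and the convention that the row is named "user" are all assumptions made for this example. Keyword arguments and method calls are ignored to keep it short.

```python
import ast

# Hypothetical: function names the mapper knows how to render as SQL.
SQL_TRANSLATABLE = {"sql_exists", "sql_in"}

def untranslatable_calls(condition, row_name="user"):
    """Return names of plain functions that are called on row data
    but have no SQL equivalent (so they would force a Python-side filter)."""
    tree = ast.parse(condition, mode="eval")
    found = []
    for node in ast.walk(tree):
        # Only look at calls of bare names, e.g. valid_email(...);
        # method calls such as datetime.now() are skipped here.
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            continue
        uses_row = any(
            isinstance(inner, ast.Name) and inner.id == row_name
            for arg in node.args
            for inner in ast.walk(arg)
        )
        if uses_row and node.func.id not in SQL_TRANSLATABLE:
            found.append(node.func.id)
    return found

print(untranslatable_calls("valid_email(user.email)"))
print(untranslatable_calls("user.last_login_ip == encode_ip(ip)"))
```

The first condition is flagged because valid_email() needs each row's email; the second is not, because encode_ip(ip) takes nothing from the row. Whether a mapper should silently fall back to fetching everything in the flagged case is exactly the design question raised above.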

I was worried about binding arguments, but potentially it can work nicely. E.g., all these only take a single variable from the outer scope, but imagine something like:

def expired_users(time=timedelta(days=30)):
    return User.select("user.last_login < (datetime.now() - time)")

if you were clever you could detect that "datetime.now() - time" can be statically computed. If you weren't clever you might send the expression to the database (which actually isn't terrible). But maybe consider a case:

def users_with_ip(ip):
    return User.select("user.last_login_ip == encode_ip(ip)")

where encode_ip does something like turn dotted numbers into an integer. If the mapper is clever it might tell that there are no SQL expressions in the arguments to encode_ip, and it can evaluate it early. Except... what if the function does something like return a random number? Then you've changed things by evaluating it once instead of for every user. So maybe you can't do that optimization, and so the only way to make this work is to create a local variable to make explicit that you only want to evaluate the argument once. As such, the status quo is better (User.select(User.last_login_ip == encode_ip(ip))) because the way it is evaluated is more obvious, and the constraints are clearer. This is managed because "magic" stuff is very specific (those column objects, which have all the operator overloading), and everything else is plain Python.
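For comparison, here is a minimal sketch of the status quo style, where only the column objects are "magic". The Expr and Column classes and the body of encode_ip are assumptions written for this example and do not reflect any particular ORM's API. The point is that encode_ip(ip) is an ordinary Python call, so it is obviously evaluated once, before the query is built, and its result is passed along as a bound parameter.

```python
class Expr:
    """A fragment of SQL plus the parameters bound into it (hypothetical)."""

    def __init__(self, sql, params=()):
        self.sql = sql
        self.params = tuple(params)

    def _binary(self, op, other):
        if isinstance(other, Expr):
            # Both sides are SQL expressions: combine them.
            return Expr(f"({self.sql} {op} {other.sql})",
                        self.params + other.params)
        # The other side is a plain Python value: bind it as a parameter.
        return Expr(f"({self.sql} {op} ?)", self.params + (other,))

    def __eq__(self, other):
        return self._binary("=", other)

    def __lt__(self, other):
        return self._binary("<", other)

    def __gt__(self, other):
        return self._binary(">", other)


class Column(Expr):
    def __init__(self, table, name):
        super().__init__(f"{table}.{name}")


def encode_ip(ip):
    """Turn a dotted quad into an integer. Plain Python, runs exactly once."""
    a, b, c, d = (int(part) for part in ip.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d


last_login_ip = Column("user", "last_login_ip")
query = last_login_ip == encode_ip("10.0.0.1")
print(query.sql)     # (user.last_login_ip = ?)
print(query.params)  # (167772161,)
```

There is no ambiguity here about when encode_ip runs, which is the clarity the string-based form gives up.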

--
Ian Bicking | http://blog.ianbicking.org