Re: [Python-ideas] Non-boolean return from __contains__

27 Jul 2010

      On Tue, Jul 27, 2010 at 12:25 PM, Bruce Leban  wrote:
...
The idea of LINQ is that you write the expression directly in the language
and it translates into a query expression. It's going to be operating on an
expression parse tree, right? Rather than trying to change
the allowable expressions maybe the question is to figure out how to
translate what we have and find what we can't express with what we have (and
that's an orthogonal question and has nothing to do with __xxx__ functions).
On Tue, Jul 27, 2010 at 9:42 AM, Masklinn  wrote:
...
What about french quotes
expr = «x + y * z»
Isn't there already a syntax for this?
expr = lambda: x + y * z
Maybe you want some conversion of that lambda into a different form:
expr = @ast lambda: x + y + z
There's also an unexecuted expression in generator expressions, which is
prettier than lambda.

There are two places where I've seen people doing this in Python (not counting
the operator overloading, of which there are many examples).  The first is
DejaVu (
http://www.aminus.net/dejavu/chrome/common/doc/trunk/managing.html#Querying)
which decompiles lambdas.  Then it does partial translation to SQL, and I
think will actually execute things in Python when they can't be translated
(e.g., if you are using a Python function on a database-derived result).
But it can easily break between Python versions, and only works with
CPython.  It also seems to have some fairly complex rules about partial
evaluation.

The other place is peak.rules (http://pypi.python.org/pypi/PEAK-Rules) which
uses strings for conditions.  My understanding is that the string is
compiled to an AST and then analyzed, so partial expressions shared by many
conditions can be efficiently evaluated together.  Also it changes scopes
(the expression is defined outside the function, but evaluated in the
context of specific function arguments).
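
To make the string-to-AST step concrete, here is a minimal sketch using only the standard library's ast module.  It is not how peak.rules is implemented; the helper name free_names and the condition string are made up for illustration.  It only shows that once a condition is a string, the names it depends on can be discovered without evaluating it:

```python
import ast

def free_names(condition):
    """Return the sorted bare names that a condition string refers to."""
    # mode="eval" parses a single expression rather than a module of statements.
    tree = ast.parse(condition, mode="eval")
    return sorted({node.id for node in ast.walk(tree)
                   if isinstance(node, ast.Name)})

# Hypothetical condition; nothing here is evaluated, only parsed.
names = free_names("user.age > minimum_age and user.active")
print(names)
```

An analyzer could then decide which of those names are row aliases (user) and which have to be looked up in the caller's scope (minimum_age).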

Maybe it'd be helpful to consider actual examples in the context of SQL...

def users_over_age(minimum_age=timedelta(days=18 * 365)):  # roughly 18 years; timedelta takes no years argument
    return User.select("datetime.now() - user.birth_date > minimum_age")
    # or...
    return User.select(datetime.now() - User.birth_date > minimum_age)

def users_with_addresses():
    return User.select(
        "sql_exists(Address.select('address.user_id == user.id'))")
    # or ...
    return User.select(
        sql_exists(Address.select(Address.user_id == User.id)))

def users_in_list(list_of_users_or_ids):
    list_of_ids = [item.id if isinstance(item, User) else item
                   for item in list_of_users_or_ids]
    return User.select("user.id in list_of_ids")
    # or ...
    return User.select(sql_in(User.id, list_of_ids))

Well, I'm not seeing any advantage.  You could do things like:

def valid_email(email):
    # Obviously write it better...
    return re.match(r'[a-z0-9]+@[a-z0-9.-]+', email)

def users_with_valid_email():
    return User.select("valid_email(user.email)")

and then have it detect (à la DejaVu) that valid_email() cannot be translated
to SQL, so select everything then filter it with that function.  This looks
clever, but usually this kind of transparency will only bite; as in this
example, what looks like it might be a normal kind of query is actually an
extremely expensive query that might take a very long time to complete.  (My
impression is that LINQ is clever like this too, allowing expressions that
are evaluated in part in different locations?)
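
For a sense of what that fallback amounts to, here is a rough sketch using sqlite3 from the standard library.  The table layout and sample rows are invented for the example, and this is not DejaVu's actual behavior, just the "select everything, then filter" strategy written out by hand:

```python
import re
import sqlite3

def valid_email(email):
    # Same deliberately naive pattern as above.
    return re.match(r'[a-z0-9]+@[a-z0-9.-]+', email) is not None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO user (email) VALUES (?)",
    [("ian@example.org",), ("not an address",), ("bruce@example.com",)])

def users_with_valid_email(conn):
    # The predicate cannot be pushed into SQL, so every row crosses the
    # wire and is filtered in Python: there is no WHERE clause at all.
    rows = conn.execute("SELECT id, email FROM user").fetchall()
    return [row for row in rows if valid_email(row[1])]

matches = users_with_valid_email(conn)
print(matches)
```

With three rows nobody notices; with a few million the same innocent-looking call reads the whole table.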

I was worried about binding arguments, but potentially it can work nicely.
E.g., all these only take a single variable from the outer scope, but
imagine something like:

def expired_users(time=timedelta(days=30)):
    return User.select("user.last_login < (datetime.now() - time)")

if you were clever you could detect that "datetime.now() - time" can be
statically computed.  If you weren't clever you might send the expression to
the database (which actually isn't terrible).  But maybe consider a case:

def users_with_ip(ip):
    return User.select("user.last_login_ip == encode_ip(ip)")

where encode_ip does something like turn dotted numbers into an integer.  If
the mapper is clever it might tell that there are no SQL expressions in the
arguments to encode_ip, and it can evaluate it early.  Except... what if the
function does something like return a random number?  Then you've changed
things by evaluating it once instead of for every user.  So maybe you can't
do that optimization, and so the only way to make this work is to create a
local variable to make explicit that you only want to evaluate the argument
once.  As such, the status quo is better (User.select(User.last_login_ip ==
encode_ip(ip))) because the way it is evaluated is more obvious, and the
constraints are clearer.  This works because the "magic" is confined to
something very specific (those column objects, which carry all the operator
overloading), and everything else is plain Python.
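
A minimal sketch of that division of labor, assuming a toy Expr/Column pair (hypothetical classes, not any real ORM's API) and a simple encode_ip:

```python
class Expr:
    """A fragment of SQL text plus the parameters bound to it."""
    def __init__(self, sql, params=()):
        self.sql = sql
        self.params = tuple(params)

class Column(Expr):
    """The only 'magic' object: comparing it builds SQL instead of a bool."""
    def __init__(self, name):
        super().__init__(name)

    def __eq__(self, other):
        if isinstance(other, Expr):
            return Expr(f"{self.sql} = {other.sql}", self.params + other.params)
        # Anything else is an ordinary Python value, already computed,
        # so it becomes a bound parameter.
        return Expr(f"{self.sql} = ?", (other,))

def encode_ip(ip):
    # Turn a dotted quad into a single integer.
    a, b, c, d = (int(part) for part in ip.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

# encode_ip runs exactly once, here, by normal Python evaluation rules.
expr = Column("user.last_login_ip") == encode_ip("10.0.0.1")
print(expr.sql, expr.params)
```

There is nothing for a mapper to guess: encode_ip(ip) is called before Column.__eq__ ever sees it, so a function returning a random number would still be evaluated once, visibly, at the call site.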

-- 
Ian Bicking  |  http://blog.ianbicking.org
