On Mon, 2011-10-17 at 22:05 +1000, Nick Coghlan wrote:
On Mon, Oct 17, 2011 at 5:52 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
> Nick Coghlan wrote:
>>
>> Yeah, that's a large part of why I now think the given clause needs to
>> be built on the same semantics that we already use internally for
>> implicit out of order evaluation (i.e. decorators, comprehensions and
>> generator expressions), such that it merely exposes the unifying
>> mechanic underlying existing constructs rather than creating a
>> completely new way of doing things.
>
> I'm not sure what you mean by that. If you're talking about
> the implementation, all three of those use rather different
> underlying mechanics. What exactly do you see about these
> that unifies them?

Actually, comprehensions and generator expressions are almost
identical in 3.x (they only differ in the details of the inner loop in
the anonymous function).

For comprehensions, the parallel with the proposed given statement
would be almost exact:

    seq = [x*y for x in range(10) for y in range(5)]

would map to:

    seq = _list_comp given _outermost_iter = range(10):
        _list_comp = []
        for x in _outermost_iter:
            for y in range(5):
                _list_comp.append(x*y)


Ok, here's a way to look at this that I think you will find interesting.


It looks to me like the 'given' keyword is setting up a local namespace in the way it's used.  So rather than taking an expression, maybe it should take a mapping (which could itself come from an expression).



    mapping = dict(iter1=range(10), iter2=range(5))
    given mapping:
        # mapping as local scope
        list_comp=[]
        for x in iter1:
            for y in iter2:
                list_comp.append(x*y)
    seq = mapping['list_comp']

(We could stop here.)

This doesn't do anything out of order.  It shows that the statement-local namespace and the out-of-order assignment are two completely different things.  But let's continue...
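(A rough, runnable approximation of that in today's Python, using exec() with the mapping as the block's local namespace -- just a sketch of the proposed semantics, not the proposal itself:)

```python
# Emulating "given mapping" with exec(): the mapping acts as the local
# namespace of the block, so names assigned inside it land in the mapping.
mapping = dict(iter1=range(10), iter2=range(5))
block = """
list_comp = []
for x in iter1:
    for y in iter2:
        list_comp.append(x * y)
"""
exec(block, {}, mapping)

seq = mapping['list_comp']   # the final "get the value out" step
assert seq == [x * y for x in range(10) for y in range(5)]
```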


Suppose we use a two-suite pattern to make getting values out easier.

    mapping = dict(iter1=range(10), iter2=range(5))
    given mapping:
        list_comp=[]
        for x in iter1:
            for y in iter2:
                list_comp.append(x*y)
    get:
        list_comp as seq         # seq = mapping['list_comp']
       
That saves us from having to refer to 'mapping' multiple times, especially if we need to get a lot of values from it.


So now we can change the above to ...

    given dict(iter1=range(10), iter2=range(5)):
        list_comp=[]
        for x in iter1:
            for y in iter2:
                list_comp.append(x*y)
    get:
        list_comp as seq


And then finally put the 'get' block first.

    get:
        list_comp as seq
    given dict(iter1=range(10), iter2=range(5)):
        list_comp=[]
        for x in iter1:
            for y in iter2:
                list_comp.append(x*y)

This is very close to the example you gave above, but more readable because it puts the keywords up front.  That also makes it read more like a statement than an expression.

Note that if you use a named mapping with given, you can inspect it after the given block is done, and/or reuse it multiple times.  I think that will be very useful for unit tests.


This creates a nice way to express some kinds of blocks that have local-only names in pure Python, rather than just saying it's magic dust sprinkled here and there to make it work like that.

(That doesn't mean we should actually change those, but the semantics could match.)

And similarly for set and dict comprehensions:

    # unique = {x*y for x in range(10) for y in range(5)}
    unique = _set_comp given _outermost_iter = range(10):
        _set_comp = set()
        for x in _outermost_iter:
            for y in range(5):
                _set_comp.add(x*y)

    get:
        set_comp as unique
    given dict(iter1=range(10), iter2=range(5)):
        set_comp = set()
        for x in iter1:
            for y in iter2:
                set_comp.add(x*y)


    # map = {(x, y):x*y for x in range(10) for y in range(5)}
    map = _dict_comp given _outermost_iter = range(10):
        _dict_comp = {}
        for x in _outermost_iter:
            for y in range(5):
                _dict_comp[x, y] = x*y

    get:
        dict_comp as map
    given dict(iter1=range(10), iter2=range(5)):
        dict_comp = {}
        for x in iter1:
            for y in iter2:
                dict_comp[x, y] = x*y


I'm not sure if I prefer the "get" block first or last.

    given dict(iter1=range(10), iter2=range(5)):
        dict_comp = {}
        for x in iter1:
            for y in iter2:
                dict_comp[x, y] = x*y
    get:
        dict_comp as map

But the given/get order is a detail you can put to a final vote at some later time.


Note that this lays bare some of the quirks of comprehension scoping -
at class scope, the outermost iterator expression can sometimes see
names that the inner iterator expressions miss.

For generator expressions, the parallel isn't quite as strong, since
the compiler is able to avoid the redundant anonymous function
involved in the given clause and just emit an anonymous generator
directly. However, the general principle still holds:

    # gen_iter = (x*y for x in range(10) for y in range(5))
    gen_iter = _genexp() given _outermost_iter = range(10):
        def _genexp():
            for x in _outermost_iter:
                for y in range(5):
                    yield x*y


    given dict(iter1=range(10), iter2=range(5)):
        def genexp():
            for x in iter1:
                for y in iter2:
                    yield x*y
    get:
        genexp as gen_iter



Interestingly, if we transform the given blocks a bit more, we get something that is nearly a function.

   given Signature(<signature>).bind(mapping):
       ... function body ...
   get:
       ... return values ...


('def' would wrap it in an object, and give it a name.)
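(The real inspect module can illustrate this: Signature.bind() already turns call arguments into exactly the kind of mapping such a given block could use as its namespace.  `_sig_of` and its parameters below are hypothetical stand-ins for `<signature>`:)

```python
# bind() produces a mapping of parameter names to argument values,
# which can then serve as the "local scope" of the block.
import inspect

def _sig_of(a, b, scale=1):
    """Only the signature matters here; the body is never called."""

bound = inspect.signature(_sig_of).bind(2, 3)
bound.apply_defaults()
namespace = dict(bound.arguments)    # {'a': 2, 'b': 3, 'scale': 1}

# "... function body ..." runs with that mapping as its local scope:
exec("result = (a + b) * scale", {}, namespace)
assert namespace['result'] == 5      # the "get" suite reads results back out
```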


So it looks like it has the potential to unify some underlying mechanisms, as well as create a nice local-only statement space.

What I like about it is that it appears to complement Python very well and doesn't feel like something tacked on.  I think having 'given' take a mapping is what did that for me.


Cheers,
    Ron


For decorated functions, the parallel is actually almost as weak as it
is for classes, since so many of the expressions involved (decorator
expressions, default arguments, annotations) get evaluated in order in
the current scope.  Even a given statement can't reproduce the actual
function statement's behaviour of not being bound at *all* in the
current scope while decorators are being applied, even though the
function already knows what it is going to be called:

It's hard to beat a syntax that is only one character long. ;-)


>>> def call(f):
...     print(f.__name__)
...     return f()
...
>>> @call
... def func():
...     return func.__name__
...
func
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in call
  File "<stdin>", line 3, in func
NameError: global name 'func' is not defined
So it's really only the machinery underlying comprehensions that is
being exposed by the PEP rather than anything more far reaching.
Exposing the generator expression machinery directly would require the
ability to turn the given clause into a generator (via a top level
yield expression) and then a means to reference that from the header
line, which gets us back into cryptic and unintuitive PEP 403
territory. Better to settle for the named alternative.

Cheers,
Nick.