[Patches] [ python-Patches-872326 ] generator expression implementation

SourceForge.net noreply at sourceforge.net
Sun Mar 21 11:59:14 EST 2004


Patches item #872326, was opened at 2004-01-07 21:10
Message generated for change (Comment added) made by jiwon
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=872326&group_id=5470

Category: Parser/Compiler
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Jiwon Seo (jiwon)
Assigned to: Raymond Hettinger (rhettinger)
Summary: generator expression implementation

Initial Comment:
Since I was interested in pep 289(generator
expression), I dabbled with it, and implemented a
working version of it. I'm not sure if I did it right,
but maybe someone who knows better can fix it right.

1. Grammar has two changes, which is
 a. atom: '(' [testlist] ')' | '[' [listmaker] ']' | ...

     changes to 

    atom: '(' [testgenexpr] ')' | '[' [listmaker] ']' | ...

     where testgenexpr defines like this.
 
    testgenexpr: test ( gen_for | (',' test)* [','] )

 b. argument: [test '='] test
 
     changes to

    argument: [test '='] test [gen_for]

 (gen_for, gen_if, gen_iter is similar to list_for,
list_if, list_iter respectively.)

2. Instead of changing rule of arglist in Grammar to
accept generator expression, I changed argument rule
like 1.b. This means Grammar accepts generator
expression without parenthesis in function call even
there are several arguments, like

reduce(operator.add, (x for x in range(10))) 

This is against what pep requires, so I made
parsermodule.c and compile.c to catch that and throw
error message when there's more than one argument in a
function. The reason I did it that way is to go around
a problem of ambiguity in the grammar by adding
generator expression to arglist.


3. I implemented generator expression as equivalent to
a call to a function which has variable capturing code
and generator code. For example,

x = 1
g = (x for i in range(10))

is equivalent to

x = 1
def __gen_wrapper():
    _[x] = x # variable capture here
    def __gen():
        for i in range(10):
            yield _[x]
    return __gen()

g = __gen_wrapper()

4. Since I implemented generator expression equivalent
to a function call, symbol table should enter new scope
when it meets generator expression. So I ended up with
adding following code to symtable.c

PyObject *
PySymtableEntry_New(struct symtable *st, char *name,
int type, int lineno)
{

...
        switch (type) {
        case funcdef:
        case lambdef:
!       case testgenexpr: /* generator expression */
!       case argument:    /* generator expression */
                ste->ste_type = TYPE_FUNCTION;
                break;
...

so that generator expression can be dealt as function
type. This is kind of stupid, but I couldn't find other
easy for it, so I just left it like that.

5. this patch does not include diff of
src/Lib/symbol.py, so you must run python Lib/symbol.py
to get it right.

----------------------------------------------------------------------

>Comment By: Jiwon Seo (jiwon)
Date: 2004-03-22 01:59

Message:
Logged In: YES 
user_id=595483

Hi, quiver. I don't think we can easily go around this problem 
if we have to capture iterators in generator expression.
If you run following, you'll know what I mean.

>>> a = iter("abcd")
>>> b = iter("abcd")
>>> [(x, y) for x in a for y in b]
[('a', 'a'), ('a', 'b'), ('a', 'c'), ('a', 'd')]

I think one possible solution could be, instead of passing 
evaluations of iterators in generator expression,  make a list 
from iterator and, then pass it as argument. For instance,

g = (x for x in iter("abcd"))

will be equivalent to,

def __gen(_[1]):
    for x in _[1]:
        yield x
g = __gen(list(iter("abcd"))) # see 'list'

- instead of g = __gen(iter("abcd"))  .

I'm not sure if I'm in a position to decide to do that way or 
not. If the current reviewer (rhettinger) approves it, I'll do 
that way. Or else, I think I will post it on the mailing list.

----------------------------------------------------------------------

Comment By: George Yoshida (quiver)
Date: 2004-03-19 22:37

Message:
Logged In: YES 
user_id=671362

Hi, Jiwon. This is another bug candidate.
If you use genexp with iterators, it doesn't work well.

# test Perky's previous post using iterators
>>> list((x,y) for x in iter('abcd') for y in iter('abcd'))
[('a', 'a'), ('a', 'b'), ('a', 'c'), ('a', 'd')]

Thanks.

----------------------------------------------------------------------

Comment By: Jiwon Seo (jiwon)
Date: 2004-03-17 17:45

Message:
Logged In: YES 
user_id=595483

To fix the bug that perky picked out, I hand-made ast.py
instead of using auto-generated code. But, still I don't
understand why Tools/compiler/regrtest didn't tell me about
it. (Maybe I didn't look at the output carefully enough.)
Ah, and it would be nice if somebody tell me how to run
test_grammar.py(only) with python-compiler package.

----------------------------------------------------------------------

Comment By: Hye-Shik Chang (perky)
Date: 2004-03-17 15:28

Message:
Logged In: YES 
user_id=55188

The compiler package patch has some problems in its
compiler/ast.py currently.
jiwon said he regenerated it using Tools/compiler/astgen.py.
But it made some incompatibilities against ast.py on current
CVS. For example, class Expression.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2004-03-17 15:21

Message:
Logged In: YES 
user_id=80475

I'll give it a second review.

----------------------------------------------------------------------

Comment By: Hye-Shik Chang (perky)
Date: 2004-03-17 14:37

Message:
Logged In: YES 
user_id=55188

Okay. Here's my draft for documentation.
Any review/fix/enhance is very welcome!
I think it's ready to putting it into CVS now.

----------------------------------------------------------------------

Comment By: Jiwon Seo (jiwon)
Date: 2004-03-17 01:24

Message:
Logged In: YES 
user_id=595483

ah, the following bug was due to the patch I uploaded
2004-02-04 15:13.

In order to get an error instantly from an expression like
"g=(x for x in None)", I made it equivalent to,

def __gen(_[1]):
    for x in _[1]:
        yield x
g=__gen(iter(None)) 
                 ^^^^

But when there is two or more iterator expression of same
symbol(or string), that patch produces error, because
currently, "((x, y) for x in 'abcd' for y in 'abcd') " is
equivalent to,

def __gen(_[1]):
    for x in _[1]:
        for y in _[1]:
            yield (x,y)

g = __gen(iter("abcd")) # passing only one parameter

I could make it pass every iterator expressions respectively
 even when they are same symbol(or string, ...), but for
now, I just dropped that patch (patch of 2004-02-04) since
that's the simplest thing now. If somebody says getting
instance error for cases like "(x for x in None)" is
important, I'll make another patch for it.

Btw, added more test case that covers what perky mentioned.

----------------------------------------------------------------------

Comment By: Hye-Shik Chang (perky)
Date: 2004-03-16 13:16

Message:
Logged In: YES 
user_id=55188

Another bug in the implementation.

>>> list((x, y) for x in 'abcd' for y in 'abcd')
[('a', 'a'), ('a', 'b'), ('a', 'c'), ('a', 'd')]

Expected behavior may be:
>>> [(x, y) for x in 'abcd' for y in 'abcd']
[('a', 'a'), ('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'a'),
('b', 'b'), ('b', 'c'), ('b', 'd'), ('c', 'a'), ('c', 'b'),
('c', 'c'), ('c', 'd'), ('d', 'a'), ('d', 'b'), ('d', 'c'),
('d', 'd')]


----------------------------------------------------------------------

Comment By: Hye-Shik Chang (perky)
Date: 2004-03-14 11:59

Message:
Logged In: YES 
user_id=55188

I'm writing docs for tutorial and language reference. It'll
be completed in a day or two.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2004-03-14 06:17

Message:
Logged In: YES 
user_id=80475

Any recent progress?

----------------------------------------------------------------------

Comment By: Jiwon Seo (jiwon)
Date: 2004-02-24 01:34

Message:
Logged In: YES 
user_id=595483

Whoa! I finally patched compiler package in pure python. (I
was a bit busy recently :)
I've run regression test with 'Tools/compiler/regrtest.py'
before I submit this patch, and there was one failure (which
is test_dis.py). I'm not sure if that's a problem, so I'm
just submitting this in. 

Now, I think I'll refactor things a bit!

----------------------------------------------------------------------

Comment By: Hye-Shik Chang (perky)
Date: 2004-02-04 22:53

Message:
Logged In: YES 
user_id=55188

As it mentioned in PEP, even listcomp's leaking of loop
value will be dropped in Python 3. I'm sorry but I don't see
that the usage is solid in the future.

----------------------------------------------------------------------

Comment By: George Yoshida (quiver)
Date: 2004-02-04 19:45

Message:
Logged In: YES 
user_id=671362

What about this code?
In the currently implementation, loop variables inside a list 
comprehension is not visible outside if you use it inside a 
generator expression.
For example:

>>> (a*b for a in [b for b in range(5)])
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
NameError: name 'b' is not defined

Its list comprehension counterpart is:

>>> [a*b for a in [b for b in range(5)]]
[0, 4, 8, 12, 16]

----------------------------------------------------------------------

Comment By: Jiwon Seo (jiwon)
Date: 2004-02-04 15:13

Message:
Logged In: YES 
user_id=595483

Again, I fixed the patch so that it can get error from '(x
for x in None)' immediately. (upto now, error does not occur
until generator expression is evaluated)

----------------------------------------------------------------------

Comment By: Armin Rigo (arigo)
Date: 2004-02-03 20:46

Message:
Logged In: YES 
user_id=4771

After thinking a bit more on the issue, note that the generator version of your example is equivalent to:

def g():
    for y in range(3):
        yield lambda x: x+y

which means that it can suffer from the same problem as the first example, but more subtly (and even more confusingly):

for f in g(): print f(0)         # 0, 1, 2
for f in list(g()): print f(0)   # 2, 2, 2

This is because due to Python's nested scope rules the above generator is equivalent to:

def g():
    result = lambda x: x+y
    for y in range(3):
        yield result

at which point I think it gets pretty confusing...

----------------------------------------------------------------------

Comment By: George Yoshida (quiver)
Date: 2004-02-03 18:07

Message:
Logged In: YES 
user_id=671362

Thanks, Arigo and Perky.

Hmm, maybe I should read PEP and the thread about 
namespace more carefully, not just skim the surface.
Anyway, PEP has not been updated since last October, so I 
think it's good time to merge recent progress of genexpr and 
update PEP-289.

----------------------------------------------------------------------

Comment By: Armin Rigo (arigo)
Date: 2004-02-02 21:58

Message:
Logged In: YES 
user_id=4771

The behavior is indeed the one described by the PEP but it is still surprising.  As far as I'm concerned it is yet another reason why free variable bindings ("nested scopes") are done wrong in Python currently :-(

If you use the older trick "lambda x,y=y:x+y" instead, then both cases will agree (with the sensible result in my opinion).  A few more issues and I'll start a campain for definition-time bindings of free variables :-(

----------------------------------------------------------------------

Comment By: Hye-Shik Chang (perky)
Date: 2004-02-02 21:37

Message:
Logged In: YES 
user_id=55188

I think it's right for namespace difference between listcomp
and genexpr.

----------------------------------------------------------------------

Comment By: George Yoshida (quiver)
Date: 2004-02-02 20:47

Message:
Logged In: YES 
user_id=671362

I am not sure if I should call this a bug, but here it goes:

>>> lst = [lambda x:x+y for y in range(3)]
>>> for f in lst:print f(0)
...
2
2
2
>>> gen = (lambda x:x+y for y in range(3))
>>> for f in gen:print f(0)
...
0
1
2

Is this the intended behavior?

----------------------------------------------------------------------

Comment By: Jiwon Seo (jiwon)
Date: 2004-02-02 17:29

Message:
Logged In: YES 
user_id=595483

ok. I fixed another bug reported by perky, and added related
test to test_grammar.py.


----------------------------------------------------------------------

Comment By: Hye-Shik Chang (perky)
Date: 2004-01-30 14:09

Message:
Logged In: YES 
user_id=55188

Yet another Fatal Python error;

>>> (a for a in (b for b in (a for a in 'xxx' if True) if
False) if True)
lookup 'True' in ? 3 -1
freevars of <generator expression>: ('True',)
Fatal Python error: com_make_closure()
zsh: abort (core dumped)  ./python

# applied patch as of 2004-01-30

----------------------------------------------------------------------

Comment By: Jiwon Seo (jiwon)
Date: 2004-01-30 12:01

Message:
Logged In: YES 
user_id=595483

ok, I've fixed the bug quiver commented, and added related
test code to test_grammar.py.

----------------------------------------------------------------------

Comment By: George Yoshida (quiver)
Date: 2004-01-29 20:48

Message:
Logged In: YES 
user_id=671362

yet another Fatal Python error;

>>> (a for a in (b for b in (0,1)))
<generator object at 0x40170f8c>
>>> (a for a in (b for b in range(2)))
Fatal Python error: unknown scope for range in ?(0) in <stdin>
symbols: {'range': 8}
locals: {}
globals: {}
 
Aborted

# applied patch as of 2004-01-27

----------------------------------------------------------------------

Comment By: Jiwon Seo (jiwon)
Date: 2004-01-27 16:44

Message:
Logged In: YES 
user_id=595483

I fixed the patch for the bug that arigo mentioned, and for
what perky mentioned - PyList_GetSlice error handling - .

now, I'll tackle python compiler package :)

----------------------------------------------------------------------

Comment By: Armin Rigo (arigo)
Date: 2004-01-27 00:15

Message:
Logged In: YES 
user_id=4771

>>> (a for d in a)
Fatal Python error: unknown scope for a in <generator
expression>(1) in <stdin>
symbols: {'a': 2048, '_[1]': 4, 'd': 2}
locals: {'_[1]': 0, 'd': 1}
globals: {}


----------------------------------------------------------------------

Comment By: Hye-Shik Chang (perky)
Date: 2004-01-26 15:17

Message:
Logged In: YES 
user_id=55188

Please ignore the previous comment.
It was a result from an old revision of patches. It works on
the recent.


----------------------------------------------------------------------

Comment By: Hye-Shik Chang (perky)
Date: 2004-01-26 08:44

Message:
Logged In: YES 
user_id=55188

BTW, does unavaliability of yield (genexpr) and return
(genexpr) an intended behavior?

----------------------------------------------------------------------

Comment By: Michael Hudson (mwh)
Date: 2004-01-22 03:13

Message:
Logged In: YES 
user_id=6656

Documentation would be nice!

----------------------------------------------------------------------

Comment By: Hye-Shik Chang (perky)
Date: 2004-01-21 23:14

Message:
Logged In: YES 
user_id=55188

Okay. My review is done. The revised patch "gexp.diff" looks fine 
for me.
There're few minor tweaks still needed to get into CVS.
- Some error exit cases are ignored. You need to provide safe exit 
paths for MemoryError. (eg. PyList_GetSlice usage of Python/
compiler.c)
- A patch for 'compiler' pure python package. (there's an old 
implementation on SF #795947)

----------------------------------------------------------------------

Comment By: Jiwon Seo (jiwon)
Date: 2004-01-19 20:10

Message:
Logged In: YES 
user_id=595483

ok. I modified the patch so that it evaluates iterator expr in 
generator expression creation time. That means, 

g = (x for x in range(10))

is equivalent to

def __gen(iter1):
    for x in iter1:
        yield x
g = __gen(range(10))

so, evalution of range(10) is not deferred anymore.

I also changed testgenexpr to testlist_gexp in 
Grammar/Grammar, since Hye-Shik Chang advised as such.
I'm willing to take any advice about this patch, so please do. 

----------------------------------------------------------------------

Comment By: Jiwon Seo (jiwon)
Date: 2004-01-12 12:26

Message:
Logged In: YES 
user_id=595483

ok. I've implemented capturing variables part as arigo
suggested. File genexpr-capture-vars-in-args.diff is that.
If you look at the code, you'll find that the code for
generator expression is much shorter, and there's lesser
modification in the existing code.

----------------------------------------------------------------------

Comment By: Jiwon Seo (jiwon)
Date: 2004-01-09 10:49

Message:
Logged In: YES 
user_id=595483

What arigo wrote sounds reasonable, and not very difficult
to implement. I'll try to do that way.

----------------------------------------------------------------------

Comment By: Armin Rigo (arigo)
Date: 2004-01-09 03:50

Message:
Logged In: YES 
user_id=4771

We may not need two levels of nested anonymous functions.  It seems to me that the following equivalent code would do, because it captures the variable in an argument instead of via nested scopes:

x = 1
def __gen(x):
  for i in range(10):
    yield x
g = __gen(x)

I don't know though if this is easy to implement in compile.c.  Alternatively:

x = 1
def __gen(x=x):
  for i in range(10):
    yield x
g = __gen()


----------------------------------------------------------------------

Comment By: Hye-Shik Chang (perky)
Date: 2004-01-08 14:44

Message:
Logged In: YES 
user_id=55188

Okay. I verified that basic characteristics mentioned on PEP
are working.
I started to review the implementation.

----------------------------------------------------------------------

Comment By: Jiwon Seo (jiwon)
Date: 2004-01-08 13:50

Message:
Logged In: YES 
user_id=595483

I added diff of Lib/symbol.py, so no need to run python
Lib/symbol.py now.

----------------------------------------------------------------------

Comment By: Hye-Shik Chang (perky)
Date: 2004-01-07 21:25

Message:
Logged In: YES 
user_id=55188

Assigned to me.
The originator is my friend and I have much interest on
this. :-)


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=872326&group_id=5470



More information about the Patches mailing list