Please comment on PEP XXX: Enhanced Generators

Oren Tirosh oren-py-l at hishome.net
Fri Feb 1 04:42:14 EST 2002


> 1.  Four new built-in functions:  xmap, xfilter, xzip, and indexed

I have already implemented these functions (except indexed) two months ago 
in my dataflow library (http://www.tothink.com/python/dataflow).  

My implementation has a very significant difference: just like the xrange 
function, the resulting object is a restartable source, not a one-time 
iterator.  Naturally, the source is restartable only if all arguments to 
the functions are restartable sources themselves.
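
For illustration only (this is not the actual dataflow library code, just a 
minimal sketch of the idea): a restartable source holds the function and its 
source object instead of a one-time iterator, so every pass over it starts 
afresh.

    class xmap:
        def __init__(self, func, source):
            self.func = func
            self.source = source        # assumed restartable, like xrange
        def __iter__(self):
            # every call to __iter__ starts a fresh pass over the source
            for item in self.source:
                yield self.func(item)

Iterating over xmap(int, xrange(10)) twice produces the same values twice; 
wrapping a one-shot iterator instead would exhaust it after the first pass.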

> 2.  Generator comprehensions:  [yield int(x) for x in floatgen()]

The only problem I see with this syntax is that the square brackets are
misleading - the resulting object is not a list.  How about using parens
instead?  They are not strictly required, but just like for tuples, they
help disambiguate the syntax.
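
Either way, as I read the proposal, the comprehension is just shorthand for 
an anonymous generator function, roughly along these lines (floatgen is the 
placeholder from the quoted example, _intgen a made-up name):

    def _intgen():
        for x in floatgen():
            yield int(x)

    ints = _intgen()    # an iterator over ints, not a list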

> 3.  Feeding data into a generator using an optional argument to .next()

Python currently has generators that produce data.  Below are three other
significant dataflow configurations:

1. consumers
2. transformations - receive data from an upstream source, perform some 
transformation on it and send it to their downstream consumer.
3. dialogues - like transformations, dialogues have an input and an 
output, but they are both connected to the same logical entity, not an 
upstream and a downstream.

Consumer functions are relatively simple. Like generators, they do not 
require full-fledged multithreading, just the suspension of a single 
function context.  Like generator functions, consumer functions would allow 
the state machines of the producer process and the consumer to run 
independently.

Dialogues can be implemented without threads, too, but in that case they
would require the producer process and the consumer process to run in 
lockstep, producing and consuming an item of data for each step.  Letting
them produce and consume data at different rates would require either 
the use of null values, full threading support or decoupling the rates 
with a queue.
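
A rough sketch of the queue approach (all names here are made up, just to 
show where the decoupling happens): outgoing replies are buffered, so a 
single request may produce zero, one or several replies, and the peer polls 
at its own rate.

    class Dialogue:
        def __init__(self):
            self.pending = []           # queue of outgoing replies
        def send(self, request):        # peer pushes a request in
            if interesting(request):    # hypothetical predicate
                self.pending.append(make_reply(request))    # hypothetical
        def receive(self):              # peer polls for a reply
            if self.pending:
                return self.pending.pop(0)
            return None                 # null value: nothing to say yet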

Transformations can be implemented easily using a generator function that
takes an upstream iterator as an argument.  Transformations may be chained
together, if necessary.  The input and output rates of a transformation are 
decoupled without any need for threads or queuing: the transformation may 
call the upstream iterator's next more or less than once per yield statement.
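
A small example: a transformation that pairs up its input calls the upstream 
iterator's next twice per yield, so the input and output rates differ with 
no threads or queues involved.

    def pairs(upstream):
        while 1:
            first = upstream.next()     # StopIteration here ends the chain
            second = upstream.next()
            yield (first, second)

Chaining works as expected: pairs(pairs(iter(range(8)))) yields 
((0, 1), (2, 3)) and then ((4, 5), (6, 7)).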

Transformations could also be implemented as a consumer function that
takes a downstream sink object as an argument, but the upstream view looks
more natural and can be implemented today using generator functions.

My opinion:

We know enough about consumers to implement them now, if a good syntax is 
found and they are deemed important enough by the BDFL.  Transformations 
are easy with the current generators.  Dialogues are a form of dataflow 
more often found in communication systems than as a method of data flow 
between two parts of the same program.  Communication systems have many 
other issues to deal with beyond the two-way exchange of data, so two-way 
generators are probably not the right way to support them.

   dataflowingly yours,  

	Oren


Appendix: Emulating consumer functions with generator functions: 

  def consumer(args):
      <initialize>
      more = [None]        # mailbox: the driver puts each item in more[0]
      yield more
      while more:          # an emptied mailbox signals end of data
          <use data in more[0]>
          yield more
      <cleanup>

driving a consumer function:

    for mailbox in consumer(args):
        if <more data>:
            mailbox[0] = <data>
        else:
            del mailbox[:]

the long version:

    itr = consumer(args)
    mailbox = itr.next()
    while <more data>:
      mailbox[0] = <data>
      itr.next()
    del mailbox[:]
    try:
      itr.next()
      raise "Shouldn't be here"
    except StopIteration: pass
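
A concrete instance of the pattern, for illustration: a consumer that sums 
whatever is dropped into its mailbox and reports the total during cleanup.

    def summer():
        total = 0
        more = [None]
        yield more
        while more:
            total = total + more[0]
            yield more
        print 'total:', total           # cleanup

    data = [1, 2, 3, 4]
    for mailbox in summer():
        if data:
            mailbox[0] = data.pop(0)
        else:
            del mailbox[:]              # end of data; prints total: 10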




