request help with Pipe class in iterwrap.py

Wed May 3 03:40:18 EDT 2006

On Wed, 03 May 2006 08:01:12 +0200, Marc 'BlackJack' Rintsch wrote:
> Maybe one with less "magic" syntax.  What about using a function that
> takes the iterators and an iterable and returns an iterator of the chained
> iterators::
> 
>   new_list = pipe(grep('larch'), grep('parrot', 'v'), list)(my_list)

This is a good idea.  But not quite magic enough, I'm afraid.

One of the features of Pipe() is that it automatically pastes in the first
argument of each function call (namely, the iterator returned by the
previous function call). It is able to do this because of a special
__getattr__ that grabs the function reference but returns the "self"
instance of the Pipe class, to allow the dot-chain to continue.

Any supplied options will then be pasted in after that first argument.
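
For readers who have not seen iterwrap.py, here is a minimal sketch of the
dot-chaining mechanism described above. It is not the actual Pipe code: the
grep() and uniq() stages are hypothetical stand-ins, and instead of the real
name lookup it assumes the stage functions are passed in as a plain dict
(the "namespace" parameter is an invention of this sketch).

```python
from itertools import groupby


def grep(iterable, pattern, flags=""):
    """Hypothetical stage: keep items containing pattern; 'v' inverts the test."""
    invert = "v" in flags
    return (item for item in iterable if (pattern in item) != invert)


def uniq(iterable):
    """Hypothetical stage: drop adjacent duplicates."""
    return (key for key, _group in groupby(iterable))


class Pipe:
    def __init__(self, iterable, namespace):
        self._it = iter(iterable)
        self._ns = namespace      # assumption: name -> function mapping
        self._pending = None      # function grabbed but not yet called

    def _flush(self, *args):
        # Call the remembered function, pasting in the current iterator
        # as the first argument, followed by any supplied options.
        if self._pending is not None:
            self._it = iter(self._pending(self._it, *args))
            self._pending = None

    def __getattr__(self, name):
        # Only reached for names that are not real attributes.
        if name.startswith("_"):
            raise AttributeError(name)
        self._flush()                   # previous stage had no options
        self._pending = self._ns[name]  # grab the function reference
        return self                     # let the dot-chain continue

    def __call__(self, *args):
        self._flush(*args)
        return self

    def __iter__(self):
        self._flush()
        return self._it


stages = {"grep": grep, "sort": sorted, "uniq": uniq}
words = ["larch tree", "parrot on larch", "larch tree", "oak"]
result = list(Pipe(words, stages).grep("larch").grep("parrot", "v").sort.uniq)
```

In this sketch the chain is finished by iterating over the Pipe object
(here with list()), which flushes the last remembered function.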

In your example, "grep('larch')" is going to be evaluated by Python, and
immediately called.  And it will then complain because its first argument
is not an iterator.  I cannot see any way to modify this call before it
happens.

If we take your basic idea, and apply just a little bit of magic, we could
do this:

new_list = Pipe(my_list, grep, 'larch', grep, ('parrot', 'v'), list)

The rules would be:

* the first argument to Pipe is always the initial iterable sequence.

* each argument after that is tested to see if it is callable.  If it is,
it's remembered; if not, it is presumed to be an argument for the
remembered callable.  Multiple arguments must be packaged up into a tuple
or list.  Once Pipe() has a callable and an argument or sequence of
arguments, Pipe() can paste in all arguments and make the call;
alternatively, once Pipe() sees another callable, it can safely assume
that there aren't going to be any extra arguments for the remembered
callable, and paste in that one iterator argument and make the call.

Now Pipe always knows when it has reached the last callable, because it
will have reached the end of the supplied arguments!  Then it can safely
assume there aren't going to be any extra arguments, and make the call to
the last remembered callable.
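
The rules above can be sketched roughly as follows. This is only an
illustration under stated assumptions, not finished code: grep() is a
hypothetical stage, and a bare non-callable argument (such as a string) is
treated as a single extra argument, while a tuple or list is unpacked.

```python
def grep(iterable, pattern, flags=""):
    """Hypothetical stage: keep items containing pattern; 'v' inverts the test."""
    invert = "v" in flags
    return (item for item in iterable if (pattern in item) != invert)


def Pipe(iterable, *args):
    current = iterable
    pending = None  # the remembered callable, if any
    for arg in args:
        if callable(arg):
            # Seeing another callable means the remembered one
            # gets no extra arguments: call it now.
            if pending is not None:
                current = pending(current)
            pending = arg
        else:
            if pending is None:
                raise TypeError("argument supplied with no callable to receive it")
            # Multiple arguments arrive packaged in a tuple or list.
            extra = arg if isinstance(arg, (tuple, list)) else (arg,)
            current = pending(current, *extra)
            pending = None
    # End of the supplied arguments: call the last remembered callable.
    if pending is not None:
        current = pending(current)
    return current


my_list = ["larch tree", "parrot on larch", "larch tree", "oak"]
new_list = Pipe(my_list, grep, 'larch', grep, ('parrot', 'v'), list)
```

One limitation of the callable test: an argument that is itself callable
(a key function, say) would be mistaken for the next stage.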

However, I remain fond of the dot-chaining syntax.  For interactively
playing around with data, I think the dot-chaining syntax is more natural
for most people.

newlist = Pipe(mylist).sort.uniq.list()

newlist = Pipe(mylist, sort, uniq, list)

Hmmm.  The second one really isn't bad...  Also, the second one doesn't
require my tricky e_eval() to work; it just lets Python figure out all the
function references.

Thinking about it, I realize that "list" is a very common thing with
which to end a dot-chain.  I think perhaps if my code would just notice
that the last function reference is "list", which takes exactly one
argument and thus cannot be waiting for additional arguments, it could
just call list() right away.

If there were a general way to know that a function reference only expects
a single argument, I could generalize this idea.  But it may be enough to
just do the special case for list().
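
In later Python versions (3.3 and up), inspect.signature() gets part of the
way toward such a general test. The sketch below is an assumption-laden
illustration, not part of iterwrap.py: the helper name is made up, and it
returns None when no signature can be found, which does happen for some
built-in callables. So a special case for list() may still be needed.

```python
import inspect


def takes_one_argument(func):
    """Return True or False when it can be determined, None when unknown."""
    try:
        params = list(inspect.signature(func).parameters.values())
    except (TypeError, ValueError):
        # No signature available (some built-ins) or not a callable.
        return None
    # A *args parameter means more arguments could still be coming.
    if any(p.kind is p.VAR_POSITIONAL for p in params):
        return False
    positional = [
        p for p in params
        if p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD)
    ]
    return len(positional) == 1


def uniq(iterable):          # hypothetical one-argument stage
    return iterable


def grep(iterable, pattern, flags=""):   # hypothetical multi-argument stage
    return iterable


def anything(*args):
    return args
```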

I think I'll keep Pipe(), hackish as it is, but I will also add a new one
based on your idea.  Maybe I'll call it "Chain()".

newlist = Chain(mylist, sort, uniq, list)

I did kind of want a way to make a "reusable pipe".  If you come up with a
useful chain, it might be nice if you could use it again with convenient
syntax. Maybe like so:

sort_u = [sort, uniq, list]
newlist = Chain(mylist, sort_u)
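
A rough sketch of how the reusable form might work. To keep it short, this
version leaves out the extra-argument handling entirely, so a list or tuple
can only mean "a saved sequence of stages". In a full Chain() a list that
follows a callable would be ambiguous (arguments or sub-chain?), and that
would need its own rule. The uniq() stage and the use of sorted() for
"sort" are assumptions of this sketch.

```python
from itertools import groupby


def uniq(iterable):
    """Hypothetical stage: drop adjacent duplicates."""
    return (key for key, _group in groupby(iterable))


def Chain(iterable, *stages):
    current = iterable
    for stage in stages:
        if isinstance(stage, (list, tuple)):
            # A saved chain: run its stages in order.
            current = Chain(current, *stage)
        else:
            current = stage(current)
    return current


sort_u = [sorted, uniq, list]
mylist = ["b", "a", "b", "c", "a"]
newlist = Chain(mylist, sort_u)
```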

Thank you very much for making a helpful suggestion!
-- 
Steve R. Hastings    "Vita est"
steve at hastings.org    http://www.blarg.net/~steveha