Python 3000, zip, *args and iterators

Steven Bethard steven.bethard at gmail.com
Mon Dec 27 20:14:55 CET 2004


Raymond Hettinger wrote:
> [Steven Bethard]
> 
>>What I would prefer is something like:
>>
>> >>> zip(*g(4))
>><iterator object at ...>
>> >>> x, y, z = zip(*g(4))
>> >>> x, y, z
>>(<iterator object at ...>, <iterator object at ...>, <iterator object at ...>)
>
> 2.  It is instructive to look at Guido's reactions to other *args
> proposals.  His receptivity to a,b,*c=it wanes whenever someone then
> requests support for a,*b,c=it.  

Yeah, I've seen his responses to those kinds of suggestions.  I don't 
think what I'm suggesting (at least in terms of *args) is quite as 
extreme though -- I'm still only talking about *args in function 
definitions.  I'm just suggesting that in a function with a *args in the 
def, the args variable be an iterator instead of a tuple.  (This doesn't 
entirely solve my zip problem of course, but it's the only *args change 
I was suggesting.)
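
To make the current behavior concrete, here is a minimal sketch in Python 3 
syntax.  The generator g and the function first_arg_only are hypothetical 
stand-ins (the original g is not shown in the thread), and the pulled list 
exists only to record when rows are produced.  It illustrates that args 
arrives as a tuple, so the whole generator has been consumed before the 
function body runs:

```python
pulled = []  # records each row index as the generator produces it


def g(n):
    # Hypothetical stand-in for g(4) in the quoted example: yields 3-tuples.
    for i in range(n):
        pulled.append(i)
        yield (i, i * 10, i * 100)


def first_arg_only(*args):
    # args is already a fully built tuple by the time the body runs,
    # even though only the first element is used here.
    return type(args), len(pulled), args[0]


args_type, rows_pulled, first = first_arg_only(*g(4))
```

Here args_type is tuple and rows_pulled is 4, even though the function only 
looks at args[0].  The same eager unpacking happens in zip(*g(4)), which is 
why it cannot be used as-is on a very large input.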

> Likewise, he considers zip(*args) as a
> transpose function to be an abuse of the *arg protocol.

Ahh, I didn't know that.  Is there another (preferred) way to do this?

> 3.  The recipe discussion and newsgroup posting present only toy
> examples -- real use cases have not yet emerged.

Ok, I'll try to give you one of my use cases.  It's a little 
complicated, so sorry if my explanation goes on for a bit here.

Basically, I'm converting one file format to another.  The files can be 
quite large, so it's important to use iterators wherever possible.  My 
conversion function is a generator that generates a (label, 
feature_dict) pair for each line in the input file.

Now, two possible things can happen at this point (depending on 
parameters from the user):

CASE 1: I output the (label, feature_dict) pairs as is, with code 
something like:

     for label, feature_dict in generator:
         write_instance(label, feature_dict)

This is, of course, the simple case.

CASE 2: I need to apply a windowing function to the iterables so that 
each line includes not only its feature_dict's values, but also the 
values of some of the surrounding feature_dicts.  Note that I only want 
to window the feature_dicts, not the labels.  This gives me code 
something like:

     labels, feature_dicts = starzip(generator)
     for label, feature_window in izip(labels, window(feature_dicts)):
         write_instance(label, combine_dicts(feature_window))

Note that I can't write the code like:

     for label, feature_dict in generator:
         feature_dict = combine_dicts(window(feature_dict)) # WRONG!
         write_instance(label, feature_dict)

because window produces an iterable from an *iterable* of feature_dicts, 
not from a single feature_dict.  So basically what I've done here is to 
"transpose" (to use your word) the iterators, apply my function, and 
then transpose the iterators back.
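
A minimal sketch of how the CASE 2 pipeline could fit together, written for 
Python 3 (where the builtin zip is lazy and takes the place of izip).  None 
of this is the actual code behind the post: the n parameter of starzip, the 
_column helper, the radius parameter of window, and the (position, key) 
scheme in combine_dicts are all assumptions made for illustration.  This 
starzip is built on itertools.tee, so it only buffers as many rows as one 
column runs ahead of the other.

```python
from collections import deque
from itertools import tee


def _column(rows, i):
    # Generator function, so i is bound per column at call time.
    for row in rows:
        yield row[i]


def starzip(iterable, n=2):
    """Lazily split an iterable of n-tuples into n column iterators."""
    return tuple(_column(copy, i) for i, copy in enumerate(tee(iterable, n)))


def window(iterable, radius=1):
    """Yield one tuple per item: the item plus up to `radius` neighbors
    on each side (truncated at the start and end of the input)."""
    buf = deque()
    center = 0  # index in buf of the item whose window is emitted next

    def advance():
        # Slide forward: grow the left context until it reaches radius,
        # then start dropping the oldest item instead.
        nonlocal center
        if center == radius:
            buf.popleft()
        else:
            center += 1

    for item in iterable:
        buf.append(item)
        if len(buf) - 1 - center == radius:  # enough lookahead collected
            yield tuple(buf)
            advance()
    while center < len(buf):  # flush items that lack full right context
        yield tuple(buf)
        advance()


def combine_dicts(dicts):
    """Merge the dicts of one window, keyed by (position in window, key)."""
    combined = {}
    for pos, d in enumerate(dicts):
        for key, value in d.items():
            combined[(pos, key)] = value
    return combined


# Tiny stand-in for the conversion generator of (label, feature_dict) pairs.
pairs = iter([("A", {"w": 1}), ("B", {"w": 2}), ("C", {"w": 3})])
labels, feature_dicts = starzip(pairs)
instances = [(label, combine_dicts(win))
             for label, win in zip(labels, window(feature_dicts))]
```

Because window yields exactly one window per input item, zip keeps each 
label aligned with the window centered on its own feature_dict.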


Hopefully this gives a little better justification for starzip?  If you 
have a cleaner way to do this kind of thing, I'd welcome any suggestions 
of course.


If zip(*) is discouraged as a transpose function, maybe I should be 
lobbying for adding a transpose function instead?  (For now, of course, 
it would go into itertools, but when iterators become the standard in 
Python 3.0, maybe it could be moved into the builtins...)


Thanks for your comments!

Steve


