[Python-ideas] [Python-Dev] Proposal: new list function: pack

Raymond Hettinger python at rcn.com
Fri Mar 20 18:01:15 CET 2009


>> I propose a new function for lists that packs the values of a list and
>> slides over them:
>>
>> then we can do things like this:
>> for i, j, k in pack(range(10), 3, partialend=False):
>>     print i, j, k
 . . .
>> def pack(l, size=2, slide=2, partialend=True):
>>     lenght = len(l)
>>     for p in range(0,lenght-size,slide):
>>         def packet():
>>             for i in range(size):
>>                 yield l[p+i]
>>         yield packet()
>>     p = p + slide
>>     if partialend or lenght-p == size:
>>         def packet():
>>             for i in range(lenght-p):
>>                 yield l[p+i]
>>         yield packet()

This has been discussed before and rejected.

There were several considerations.  The itertools recipes already 
include simple patterns for grouper() and pairwise() that are easy
to use as primitives in your code or to serve as models for variants. 
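
For reference, the two recipes can be sketched roughly as follows.  This
is a Python 3 spelling (zip and zip_longest rather than izip and
izip_longest), and the argument order of grouper() is an assumption
that follows later versions of the itertools documentation; treat it
as an illustration rather than the exact text of the recipes.

```python
from itertools import tee, zip_longest

def pairwise(iterable):
    "s -> (s0, s1), (s1, s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)          # advance the second iterator by one element
    return zip(a, b)

def grouper(iterable, n, fillvalue=None):
    "grouper('ABCDEFG', 3, 'x') -> ABC DEF Gxx"
    # n references to the same iterator, so each tuple consumes n items
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

print(list(pairwise(range(4))))
print(list(grouper('ABCDEFG', 3, 'x')))
```

pairwise() covers the slide-by-one case for a window of two, and
grouper() covers the case where the window length equals the slide
length.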

The design of pack() itself is questionable.  It attempts to be a 
Swiss Army Knife by parameterizing all possible variations
(length of window, length to slide, and how to handle end-cases).
This design makes the tool harder to learn and use, and it makes
the implementation more complex.  

That complexity isn't necessary.  Use cases would typically fall
into grouper cases where the window length equals the slide
length or into cases that slide one element at a time.  You don't
win anything by combining the two cases except for making
the tool harder to learn and use.

The pairwise() recipe could be generalized to larger windows,
but that seemed like less of a good idea after closely examining potential
use cases.  For cases that used a larger window, there always
seemed to be a better solution than extending pairwise().  For
instance, a twenty-day moving average is better implemented with
a deque(maxlen=20) and a running sum than with an iterator
returning tuples of length twenty -- that approach does a lot of
unnecessary work shifting elements in the tuple, turning an
O(n) process into an O(m*n) process, where m is the window length.
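
A minimal sketch of the deque-plus-running-sum approach might look
like the following.  The function name moving_average and its
parameters are hypothetical, chosen only for illustration; the point
is that each new value costs a constant amount of work regardless of
the window length.

```python
from collections import deque

def moving_average(values, n=20):
    "Yield the mean of the most recent n values, once n have been seen."
    window = deque(maxlen=n)
    total = 0.0
    for value in values:
        if len(window) == n:
            # window[0] is the value the append below will push out
            total -= window[0]
        window.append(value)
        total += value
        if len(window) == n:
            yield total / n

print(list(moving_average([1, 2, 3, 4, 5], n=3)))
```

With long runs of floating-point data the running sum can accumulate
rounding error, so a real implementation may want to recompute the
total from the window periodically.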

For short windows, like pairwise() itself, the issue is not one of
total running time; instead, the problem is that almost every 
proposed use case was better coded as a simple Python loop,
saving the previous value with a step like:  oldvalue = value.
Having pairwise() or tripletwise() tended to be a distraction away
from better solutions.  Also, the pure Python approach was more
general as it allowed accumulations:  total += value.
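
As an illustration, a plain loop of that kind might look like this.
The function summarize() and the quantities it computes are made up
for the example; it tracks the previous value and an accumulation in
the same pass, which a pairwise() iterator alone would not give you.

```python
def summarize(values):
    "Return (sum of values, largest rise between consecutive values)."
    total = 0
    largest_rise = None
    oldvalue = None
    for value in values:
        total += value                      # accumulation
        if oldvalue is not None:
            rise = value - oldvalue         # compare with previous value
            if largest_rise is None or rise > largest_rise:
                largest_rise = rise
        oldvalue = value                    # remember for the next pass
    return total, largest_rise

print(summarize([3, 5, 4, 9]))
```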

While your proposed function has been re-invented a number of
times, that doesn't mean it's a good idea.  It is more an exercise
in what can be done, not in what should be done.


Raymond


