[Python-ideas] Looking for a "batch" function

Nick Coghlan ncoghlan at gmail.com
Sun Jul 18 03:02:15 CEST 2010


On Sun, Jul 18, 2010 at 6:30 AM, Tal Einat <taleinat at gmail.com> wrote:
> This kind of operation often has slightly different requirements in
> different scenarios. It is very simple to implement a version of this
> to meet your exact needs. Sometimes in these kinds of situations it is
> better not to have a built-in generic function, to force programmers
> to decide explicitly how they want it to work.
>
> You mentioned efficiency; to do this kind of operation efficiently
> one really needs to know what kind of sequence/iterator is being
> "packetized", and implement accordingly.

Indeed. There's actually a reasonably decent general windowing recipe
on ASPN (http://code.activestate.com/recipes/577196-windowing-an-iterable-with-itertools/),
but even that isn't appropriate for every use case.

The OP, for example, has rather different requirements to what is
implemented there:
- non-overlapping windows, so tee() isn't needed
- return type should match original container type

A custom generator for that task is actually pretty trivial (note:
untested, so may contain typos):

def windowed(seq, window_len):
  for slice_start in range(0, len(seq), window_len): # use xrange() in 2.x
    slice_end = slice_start + window_len
    yield seq[slice_start:slice_end]
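
As a quick sanity check of the two requirements above, here is a small sketch that repeats the generator and runs it over a few built-in sequence types. The variable names are only illustrative. Because the windows are produced by slicing, each one has the same type as the input, and the final window is simply shorter when len(seq) isn't a multiple of window_len:

```python
def windowed(seq, window_len):
    # Non-overlapping windows; slicing preserves the container type.
    for slice_start in range(0, len(seq), window_len):
        slice_end = slice_start + window_len
        yield seq[slice_start:slice_end]

# Illustrative inputs (not from the original thread)
list_windows = list(windowed([1, 2, 3, 4, 5], 2))
str_windows = list(windowed("abcdef", 4))
tuple_windows = list(windowed((1, 2, 3), 3))
```

If a short trailing window is unwanted, the caller would need to pad or drop it; which of those is right is exactly the kind of application-specific decision being discussed.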

Even adding support for overlapped windows is fairly easy:

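def windowed(seq, window_len, overlap=0):
  # overlap must be smaller than window_len, or the step would be <= 0
  if not 0 <= overlap < window_len:
    raise ValueError("need 0 <= overlap < window_len")
  slice_step = window_len - overlap
  for slice_start in range(0, len(seq), slice_step): # use xrange() in 2.x
    slice_end = slice_start + window_len
    yield seq[slice_start:slice_end]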
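
To make the overlap behaviour concrete, here is a sketch of the same idea with a guard added for overlap >= window_len (the guard is my assumption about sensible behaviour, not something the OP asked for). Note that with overlap, the tail of the sequence produces progressively shorter windows, which may or may not be what a given application wants:

```python
def windowed(seq, window_len, overlap=0):
    # Assumed guard: a non-positive step would make range() fail or yield nothing.
    if not 0 <= overlap < window_len:
        raise ValueError("need 0 <= overlap < window_len")
    slice_step = window_len - overlap
    for slice_start in range(0, len(seq), slice_step):
        yield seq[slice_start:slice_start + window_len]

# Illustrative inputs (not from the original thread)
overlap_one = list(windowed([1, 2, 3, 4, 5, 6], 3, overlap=1))
overlap_two = list(windowed("abcdef", 3, overlap=2))
```

Here overlap_one steps by 2 and overlap_two steps by 1, so the latter ends with the short windows "ef" and "f".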

However, those approaches don't support arbitrary iterators (i.e.
those without __len__); they only support sequences. To support
arbitrary iterators, you'd need to do something fancier with
collections.deque (either directly or via itertools.tee), but again,
the most appropriate approach is going to be application specific (for
byte data, you're probably going to want to use buffer or memoryview
rather than the original container type).
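
For what it's worth, one possible shape for the deque-based version is sketched below. This is only one of several reasonable designs: the name windowed_iter is made up for illustration, and yielding tuples is an assumption on my part, since an arbitrary iterator has no "original container type" to match. It is written to produce the same windows as the slicing version, including the short trailing ones:

```python
from collections import deque
from itertools import islice

def windowed_iter(iterable, window_len, overlap=0):
    # Hypothetical helper: windows over any iterable, yielded as tuples.
    if not 0 <= overlap < window_len:
        raise ValueError("need 0 <= overlap < window_len")
    step = window_len - overlap
    it = iter(iterable)
    window = deque(islice(it, window_len))
    while window:
        yield tuple(window)
        # Discard the items the next window no longer covers...
        for _ in range(min(step, len(window))):
            window.popleft()
        # ...and top the window back up from the underlying iterator.
        window.extend(islice(it, window_len - len(window)))

# Works on a generator, which has neither __len__ nor slicing.
batches = list(windowed_iter((n * n for n in range(5)), 2))
sliding = list(windowed_iter(iter(range(6)), 3, overlap=1))
```

Only window_len items are held in memory at a time, but every window is copied into a new tuple, which is one example of a cost that a more specialised solution could avoid.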

It isn't that this is an uncommon problem - it's that any
appropriately general solution is going to be suboptimal in many
specific applications, while an optimal solution for specific
applications is going to be insufficiently general to be appropriate
for the standard library.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


