flatten a level one list

Peter Otten __peter__ at web.de
Fri Jan 13 03:39:57 EST 2006


Michael Spencer wrote:

> Peter Otten wrote:
> 
>> If you require len(xdata) == len(ydata) there's an easy way to move the
>> loop into C:
>> 
>> def flatten7():
>>     n = len(xdata)
>>     assert len(ydata) == n
>>     result = [None] * (2*n)
>>     result[::2] = xdata
>>     result[1::2] = ydata
>>     return result
>> 
>> $ python -m timeit 'from flatten import flatten6 as f' 'f()'
>> 1000 loops, best of 3: 847 usec per loop
>> $ python -m timeit 'from flatten import flatten7 as f' 'f()'
>> 10000 loops, best of 3: 43.9 usec per loop

> Very nice observation, Peter.  

Thank you :-)

> Easily the "winner".  It reminds me of 
> str.translate beating python loops in filtering applications:
> http://groups.google.com/group/comp.lang.python/msg/e23cdc374144a4bd

> What's more, you can generalize your approach to any number of sequences
> and un-equal lengths, with only modest loss of speed:
> 
> def interleave(*args, **kw):
>      """Peter Otten flatten7 (generalized by Michael Spencer)
>      Interleave any number of sequences, padding shorter sequences if kw
>      pad is supplied"""
>      dopad = "pad" in kw
>      pad = dopad and kw["pad"]
>      count = len(args)
>      lengths = map(len, args)
>      maxlen = max(lengths)
>      result = maxlen*count*[None]
>      for ix, input in enumerate(args):
>          try:
>              result[ix::count] = input
>          except ValueError:
>              if dopad:
>                  result[ix::count] = input + [pad]*(maxlen-lengths[ix])
>              else:
>                  raise
>      return result

You don't loose speed because the expensive operation is only executed if
list lengths differ. Two observations:

- list arithmetic like 'input + [pad]*(maxlen-lengths[ix])' should and can
be avoided

- You are padding twice -- once with None, and then with the real thing.

def interleave2(*args, **kw):
     dopad = "pad" in kw
     pad = kw.get("pad")
     count = len(args)
     lengths = map(len, args)
     maxlen = max(lengths)
     if not dopad and min(lengths) != maxlen:
         raise ValueError
     result = maxlen*count*[pad]
     for ix, input in enumerate(args):
         result[ix:len(input)*count:count] = input
     return result

$ python -m timeit -s 'from interleave_spencer import xdata, ydata,
interleave as i;xdata=xdata[:-1]' 'i(xdata, ydata, pad=None)'
10000 loops, best of 3: 69.7 usec per loop
$ python -m timeit -s 'from interleave_spencer import xdata, ydata,
interleave2 as i;xdata=xdata[:-1]' 'i(xdata, ydata, pad=None)'
10000 loops, best of 3: 46.4 usec per loop

Not overwhelming, but I expect the difference to grow when the arguments
occupy a significant portion of the available memory.

Peter



More information about the Python-list mailing list