[Numpy-discussion] Repeated dot products

denis denis-bz-py at t-online.de
Thu Jan 7 09:05:19 EST 2010

```On 12/12/2009 22:55, T J wrote:
> Hi,
>
> Suppose I have an array of shape:  (n, k, k).  In this case, I have n
> k-by-k matrices.  My goal is to compute the product of a (potentially
> large) user-specified selection (with replacement) of these matrices.
> For example,
>
>     x = [0,1,2,1,3,3,2,1,3,2,1,5,3,2,3,5,2,5,3,2,1,3,5,6]

TJ,
what are your n, k, len(x) ?

_dotblas.dot is fast: dot( 10x10 matrices ) takes ~ 22 usec on my g4 ppc,
which is ~ 15 clock cycles (700 MHz) per mem access * +.

A hack to find repeated pairs (or triples ...) follows.
Your sequence above has only (3,2) 4 times, no win.

(Can someone give a probabilistic estimate of the number of non-overlapping pairs
in N letters from an alphabet of size A ?)

#!/usr/bin/env python
# numpy-discuss 2009 12dec TJ repeated dot products

from __future__ import division
from collections import defaultdict
import numpy as np

__version__ = "2010 7jan denis"

def pairs( s, Len=2 ):
""" repeated non-overlapping pairs (substrings, subwords)
"abracadabra" -> ab ra [[0 7] [2 9]], not br
Len=3: triples, 4 ...
"""
# bruteforce
# grow repeated 2 3 ... ?
pairs = defaultdict(list)
for j in range(len(s)-Len+1):
pairs[ s[j:j+Len] ].append(j)
min2 = filter( lambda x: len(x) > 1, pairs.values() )
min2.sort( key = lambda x: len(x), reverse=True )
# remove overlaps --
# (if many, during init scan would be faster)
runs = np.zeros( len(s), np.uint8 )
run = np.ones( Len, np.uint8 )
run[0] = Len
chains = []
for ovchain in min2:
chain = []
for c in ovchain:
if not runs[c:c+Len].any():
runs[c:c+Len] = run
chain.append(c)
if len(chain) > 1:
chains.append(chain)
return (chains, runs)

#...............................................................................
if __name__ == "__main__":
import sys
alph = 5
randlen = 100
randseed = 1
exec( "\n".join( sys.argv[1:] ))  # Test= ...

print "pairs( %s ) --" % abra
print pairs( abra )  # ab [0, 7], br [2, 9]]
print pairs( abra, 3 )  # abr [0, 7]

np.random.seed( randseed )
r = np.random.random_integers( 1, alph, randlen )
chains, runs = pairs( tuple(r) )
npair = sum([ len(c) for c in chains ])
print "%d repeated pairs in %d random %d" % (npair, randlen, alph)
# 35 repeated pairs in 100 random 5  (prob estimate this ?)
# 25 repeated pairs in 100 random 10

```