getting a submatrix of all true
Anton Vredegoor
anton at vredegoor.doge.nl
Sat Jul 5 12:49:33 CEST 2003
John Hunter <jdhunter at ace.bsd.uchicago.edu> wrote:
>>>>>> "Bengt" == Bengt Richter <bokr at oz.net> writes:
> Bengt> Brute force seems to work on this little example. Maybe it
> Bengt> can be memoized and optimized and/or whatever to handle
> Bengt> your larger matrix fast enough?
>
>Thanks for the example. Unfortunately, it is too slow even for
>moderate size matrices (30,10). I've been running it for over two
>hours for a 30x10 matrix and it hasn't finished. And my data are
>1000x100!
I posted a somewhat long version before (inspired by Bengt's idea), but
I remembered an easier way to generate all combinations, and I also
noticed that there is freedom in choosing to analyze rows or columns,
which can be computationally advantageous. Below is a version whose
running time depends mostly on 2**N, where N is the smaller of the
number of rows and the number of columns. Unfortunately 2**100 is still
way too big a number, but my code can now routinely solve 100x15
matrices ...
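To make the 2**N count concrete, here is a minimal sketch (Python 3) of how a single integer can stand for one subset of N rows. The helper name subset_from_index is invented for this illustration and is not part of the posted code; it only assumes that bit j of the index means "row j is selected".

```python
def subset_from_index(i, n):
    """Return the indices j in range(n) whose bit is set in the integer i."""
    return [j for j in range(n) if (1 << j) & i]


n = 3
# Walk through all 2**n indices; each one decodes to a different subset.
subsets = [subset_from_index(i, n) for i in range(2**n)]
for i, s in enumerate(subsets):
    print(i, s)
```

Every subset appears exactly once, so N = 15 gives 2**15 = 32768 candidates, while N = 100 gives 2**100.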
>Last night, I began to formulate the problem as a logic statement,
>hoping this would give me an idea of how to proceed. But no progress
>yet. But I have come to the conclusion that with P ones, brute force
>requires 2^P combinations. With 1000x100 with 5% missing that gives
>me 2^5000. Not good.
IMO it depends (apart from the amount of time spent copying data, of
course) on the smaller of the number of rows and the number of columns,
so that's 2**100 in this case. Maybe for some matrices your number is
preferable?
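The two estimates can be compared with a few lines of arithmetic. This is only a rough sketch (Python 3): the variable names are made up here, and the 1000x100 size and 5% figure are taken from the quoted message.

```python
rows, cols = 1000, 100
ones = rows * cols * 5 // 100        # P: number of cells holding a 1 at 5% missing

per_cell_exponent = ones             # brute force over the ones: 2**P
per_axis_exponent = min(rows, cols)  # subsets of the shorter dimension: 2**N

# Compare the sizes by counting decimal digits of each search-space size.
digits_per_cell = len(str(2**per_cell_exponent))
digits_per_axis = len(str(2**per_axis_exponent))
print(per_cell_exponent, digits_per_cell)
print(per_axis_exponent, digits_per_axis)
```

Both numbers are far out of reach; the second is merely much smaller (31 decimal digits instead of 1506).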
>Thanks for your suggestion, though. If you have any more thoughts,
>let me know.
I hope you don't mind me following this too :-) Nice problem, and I'm
not as sure about my math as I sound above; if anyone can prove me
right or wrong I'd be happy anyway ...
Anton
---
class ScoreMatrix:
    """Index i in range(2**n) encodes a subset of the shorter dimension."""

    def __init__(self, X):
        n1, n2 = len(X), len(X[0])
        if n2 < n1:
            # fewer columns than rows: transpose, so that subsets are
            # taken over the smaller dimension
            self.X = list(zip(*X))
            self.rotated = True
            n = n2
        else:
            self.X = X
            n = n1
            self.rotated = False
        self.R = range(n)
        self.count = 2**n

    def __getitem__(self, i):
        # score selected rows and columns by index
        if not (-1 < i < self.count):
            raise IndexError
        # bit j of i set means row j (of the possibly transposed X) is selected
        rows = [x for j, x in enumerate(self.R) if (1 << j) & i]
        if rows:
            Y = [self.X[k] for k in rows]
            # keep the columns that contain no 1 in any selected row
            Z = [(k, z) for k, z in enumerate(zip(*Y)) if 1 not in z]
            cols = [k for k, z in Z]
            # score = number of cells in the all-zero submatrix
            score = sum(map(len, [z for k, z in Z]))
            if self.rotated:
                return score, cols, rows
            else:
                return score, rows, cols
        else:
            return 0, [], []


def test():
    from random import random
    f = .05
    r, c = 100, 15
    X = []
    for i in range(r):
        X.append([])
        for j in range(c):
            X[i].append(int(random() < f))
    M = ScoreMatrix(X)
    mi, highscore = 0, (0, [], [])
    # iterating M calls __getitem__ with 0, 1, 2, ... until IndexError
    for i, sc in enumerate(M):
        if sc > highscore:
            mi, highscore = i, sc
    sc, rows, cols = highscore
    print("maximum score: %s" % sc)
    print("index of this score: %s" % mi)
    print("tabledata:")
    for i, row in enumerate(X):
        for j, cell in enumerate(row):
            if i in rows and j in cols:
                print(cell, end=' ')
            else:
                print('x', end=' ')
        print()


if __name__ == '__main__':
    test()
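
A result reported by the code above can be checked independently. The sketch below (Python 3; selection_is_clean is a hypothetical helper, not part of the post) tests that a chosen set of rows and columns contains no 1, in which case the score should equal the number of selected cells. The 3x3 matrix is an invented toy example.

```python
def selection_is_clean(X, rows, cols):
    """True if no selected cell of X holds a 1 (a missing value)."""
    return all(X[i][j] == 0 for i in rows for j in cols)


X = [[0, 0, 1],
     [0, 0, 0],
     [1, 0, 0]]

rows, cols = [0, 1], [0, 1]
clean = selection_is_clean(X, rows, cols)
area = len(rows) * len(cols)   # expected score for a clean selection
print(clean, area)
```

Adding row 2 to this selection would pull in the 1 at X[2][0], so the helper would reject it.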