getting a submatrix of all true

Anton Vredegoor anton at vredegoor.doge.nl
Sat Jul 5 12:49:33 CEST 2003


John Hunter <jdhunter at ace.bsd.uchicago.edu> wrote:

>>>>>> "Bengt" == Bengt Richter <bokr at oz.net> writes:
>    Bengt> Brute force seems to work on this little example. Maybe it
>    Bengt> can be memoized and optimized and/or whatever to handle
>    Bengt> your larger matrix fast enough?
>
>Thanks for the example.  Unfortunately, it is too slow even for
>moderate size matrices (30,10).  I've been running it for over two
>hours for a 30x10 matrix and it hasn't finished.  And my data are
>1000x100!

I posted a somewhat long version before (inspired by Bengts idea) but
I remembered an easier way to generate all combinations, and also I
noticed that there is freedom in choosing to analyze rows or columns,
which can be computationally advantageous. Below is a version that
depends mostly on 2**N where N is the smallest of those two values.
Unfortunately 2**100 is still way to big a number, but my code can now
routinely solve 100x15 matrices ...

>Last night, I began to formulate the problem as a logic statement,
>hoping this would give me an idea of how to proceed.  But no progress
>yet.  But I have come to the conclusion that with P ones, brute force
>requires 2^P combinations.  With 1000x100 with 5% missing that gives
>me 2^5000.  Not good.  

IMO it depends (except for the amount time spent in copying data of
course) on the smallest of the number of rows or columns, so that's
2**100 in this case. Maybe for some matrices, your number is
preferable?

>Thanks for your suggestion, though.  If you have any more thoughts,
>let me know.

I hope you don't mind me following this too :-) Nice problem, and I'm
not so sure about my math as I sound above, if anyone can prove me
right or wrong I'd be happy anyway ...

Anton

---

class ScoreMatrix:
    
    def __init__(self, X):
        n1,n2 = len(X),len(X[0])
        if n2<n1:
            self.X = zip(*X)
            self.rotated = True
            n = n2
        else:
            self.X = X
            n = n1
            self.rotated = False
        self.R = range(n)
        self.count = 2**n
    
    def __getitem__(self,i):
        #score selected rows and columns by index
        if not (-1<i<self.count): raise IndexError
        rows =[x for j,x in enumerate(self.R) if 1<<j &i] 
        if rows:
            Y = [self.X[i] for i in rows]
            Z = [(i,z) for i,z in enumerate(zip(*Y)) if 1 not in z]
            cols = [i for i,z in Z]
            score = sum(map(len,[z for i,z in Z]))
            if self.rotated:  return score,cols,rows
            else: return score,rows,cols
        else: return 0,[],[]

def test():
    from random import random,randint
    f = .05
    r,c = 100,15
    X=[]
    for i in range(r):
        X.append([])
        for j in range(c):
            X[i].append(int(random()<f))
    M = ScoreMatrix(X)
    highscore = 0,[],[]
    for i,sc in enumerate(M):
        if sc > highscore:
            mi,highscore = i,sc
    sc,rows,cols = highscore
    print "maximum score:       %s" %(sc)
    print "index of this score: %s" %(mi)
    print "tabledata:"
    for i,r in enumerate(X):
        for j,c in enumerate(r):
            if i in rows and j in cols: print c,
            else: print 'x',
        print

if __name__=='__main__':
    test()







More information about the Python-list mailing list