[Numpy-discussion] Help using numPy to create a very large multi dimensional array

Bruno Santos bacmsantos at gmail.com
Fri Apr 13 06:50:30 EDT 2007


Dear Sirs,
I'm trying to use NumPy to solve a speed problem in Python. I need to
perform agglomerative clustering as a first step to k-means clustering.
My problem is that I'm using a very large list in Python, and the script
takes more than 9 minutes to process all the information, so I'm trying to
use NumPy to create a matrix instead.
I'm reading the vectors from a text file and I end up with an array of
115*2634 float elements. How can I create this structure with NumPy?
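
A minimal sketch of how such a structure could be built as a 2-D NumPy
array, assuming each line of the text file holds one vector as
whitespace-separated numbers and that the first field on each line is not
part of the vector. The helper name read_vectors and the sample data are
made up for illustration only.

```python
import io

import numpy as np


def read_vectors(lines):
    """Parse whitespace-separated lines into a 2-D float array.

    The first field of every line is skipped; blank lines are ignored.
    """
    rows = [
        [float(tok) for tok in line.split()[1:]]
        for line in lines
        if line.strip()
    ]
    return np.array(rows, dtype=float)


# Tiny stand-in for the real file: two vectors with three values each.
sample = io.StringIO("doc1 0.5 1.0 2.0\ndoc2 1.5 0.0 4.0\n")
matrix = read_vectors(sample)
print(matrix.shape)
```

With the real file, the shape of the resulting array gives the number of
vectors and the number of elements per vector directly.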

Here is my code in Python:
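    # Read each document vector into a matrix (a list of rows)
    doclist = []
    matrix = []
    fields = []
    for line in vecfile:
        fields = line.split()
        # The first field is skipped; the rest are converted to floats
        matrix.append([float(elem) for elem in fields[1:]])
    vecfile.close()

    # Matrix dimensions used by the clustering loop below
    rows = len(matrix)
    columns = len(matrix[0])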

    # Read the desired number of final clusters
    numclust = int(input('Input the desired number of clusters: '))

    # Clustering process
    clust = rows
    ind = [-1, -1]
    list_j = []
    list_k = []
    while clust > numclust:
        min_dist = float('inf')
        print('Number of Clusters %d \n' % clust)
        # Find the 2 most similar vectors in the file
        for j in range(0, clust):
            list_j = matrix[j]
            for k in range(j + 1, clust):
                list_k = matrix[k]
                # Squared Euclidean distance between rows j and k
                dist = 0
                for e in range(0, columns):
                    result = list_j[e] - list_k[e]
                    dist += result * result
                if dist < min_dist:
                    ind[0] = j
                    ind[1] = k
                    min_dist = dist

        # Combine the two most similar vectors by averaging them
        for e in range(0, columns):
            matrix[ind[0]][e] = (matrix[ind[0]][e] + matrix[ind[1]][e]) / 2.0
        clust = clust - 1

        # Move up all the remaining vectors
        for k in range(ind[1], rows - 1):
            for e in range(0, columns):
                matrix[k][e] = matrix[k + 1][e]
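
For comparison, a rough sketch of how the same loop might look with the
inner loops replaced by NumPy array operations. This is only an
illustration under assumptions: the function names closest_pair and
agglomerate are hypothetical, the pairwise distances are computed through
the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b (so rounding can differ
slightly from the explicit loop), and the merged-away row is deleted
instead of shifting the later rows up.

```python
import numpy as np


def closest_pair(matrix):
    """Return (j, k) with j < k for the two rows that are closest
    in squared Euclidean distance."""
    sq_norms = (matrix ** 2).sum(axis=1)
    # All pairwise squared distances at once.
    dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * matrix.dot(matrix.T)
    # Mask the diagonal and the lower triangle so only pairs j < k remain.
    dist[np.tril_indices(len(matrix))] = np.inf
    j, k = np.unravel_index(np.argmin(dist), dist.shape)
    return int(j), int(k)


def agglomerate(matrix, numclust):
    """Merge the closest pair of rows until only numclust rows are left."""
    matrix = np.array(matrix, dtype=float)  # work on a copy
    while len(matrix) > numclust:
        j, k = closest_pair(matrix)
        # Combine the two rows by averaging, then drop the second one.
        matrix[j] = (matrix[j] + matrix[k]) / 2.0
        matrix = np.delete(matrix, k, axis=0)
    return matrix


points = [[0.0, 0.0], [10.0, 10.0], [0.0, 2.0], [10.0, 11.0]]
merged = agglomerate(points, 2)
print(merged)
```

The distance table has one entry per pair of vectors, so for a few hundred
vectors it stays small, and each pass of the while loop needs only a handful
of array operations.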