[Numpy-discussion] Help using numPy to create a very large multi dimensional array
Bruno Santos
bacmsantos at gmail.com
Fri Apr 13 06:50:30 EDT 2007
Dear Sirs,
I'm trying to use NumPy to solve a speed problem in Python. I need to
perform agglomerative clustering as a first step to k-means clustering.
My problem is that I'm using a very large list in Python and the script
takes more than 9 minutes to process all the information, so I'm trying to
use NumPy to create a matrix instead.
I'm reading the vectors from a text file and I end up with an array of
115*2634 float elements. How can I create this structure with NumPy?
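For illustration, one possible way to build such a structure is sketched here. It assumes each line of the file holds a leading field that is skipped (as the script in this message does) followed by whitespace-separated numbers; the helper name load_vectors and the sample data are hypothetical, not part of the original script.

```python
import io

import numpy as np


def load_vectors(lines):
    """Parse lines of the form '<first field> v1 v2 ...' into a 2-D float array.

    Hypothetical helper: the first field of every line is dropped,
    mirroring the list[1:] slice in the original script.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        rows.append([float(x) for x in fields[1:]])
    # np.array turns the list of equal-length rows into one 2-D array
    return np.array(rows, dtype=float)


# Small in-memory stand-in for the real vector file (made-up data).
sample = io.StringIO(u"doc1 1.0 2.0 3.0\ndoc2 4.0 5.0 6.0\n")
matrix = load_vectors(sample)
print(matrix.shape)
```

With the real file, the same function could be called on the open file object in place of sample; every row must have the same number of values for the result to be a proper 2-D array.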
Here is my code in Python:
#Read each document vector into a matrix
doclist = []
matrix = []
for line in vecfile:
    fields = line.split()
    #The first field is skipped; the remaining ones are converted to floats
    matrix.append([float(x) for x in fields[1:]])
vecfile.close()
#Read the desired number of final clusters
numclust = int(input('Input the desired number of clusters: '))
#Clustering process
rows = len(matrix)
columns = len(matrix[0])
clust = rows
ind = [-1, -1]
while clust > numclust:
    min_dist = float('inf')
    print('Number of Clusters %d \n' % clust)
    #Find the 2 most similar vectors (smallest squared Euclidean distance)
    for j in range(0, clust):
        list_j = matrix[j]
        for k in range(j+1, clust):
            list_k = matrix[k]
            dist = 0
            for e in range(0, columns):
                result = list_j[e] - list_k[e]
                dist += result * result
            if dist < min_dist:
                ind[0] = j
                ind[1] = k
                min_dist = dist
    #Combine the two most similar vectors by averaging them
    for e in range(0, columns):
        matrix[ind[0]][e] = (matrix[ind[0]][e] + matrix[ind[1]][e]) / 2.0
    clust = clust - 1
    #Move up all the remaining vectors
    for k in range(ind[1], rows - 1):
        for e in range(0, columns):
            matrix[k][e] = matrix[k+1][e]
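
The clustering loop above could perhaps be expressed with NumPy array operations instead of nested Python loops. The following is only a sketch under assumptions: the function names closest_pair and agglomerate are hypothetical, the distance is the same squared Euclidean distance used in the script, and the pairwise distances are computed through the identity |a-b|^2 = |a|^2 + |b|^2 - 2*a.b, which can differ from the explicit loop by floating-point round-off.

```python
import numpy as np


def closest_pair(data):
    """Return (j, k) with j < k for the two rows of data that are closest
    in squared Euclidean distance."""
    sq = np.sum(data * data, axis=1)
    # Pairwise squared distances: |a|^2 + |b|^2 - 2 * a.b
    dist = sq[:, None] + sq[None, :] - 2.0 * data.dot(data.T)
    # Mask the diagonal and lower triangle so only pairs with j < k remain
    dist[np.tril_indices(len(data))] = np.inf
    j, k = np.unravel_index(np.argmin(dist), dist.shape)
    return int(j), int(k)


def agglomerate(data, numclust):
    """Merge the closest pair of rows by averaging until numclust rows remain."""
    data = np.array(data, dtype=float)  # work on a copy
    while len(data) > numclust:
        j, k = closest_pair(data)
        data[j] = (data[j] + data[k]) / 2.0  # average the two vectors
        data = np.delete(data, k, axis=0)    # drop the merged row
    return data


# Made-up example: rows 0 and 2 are the closest pair and get merged.
points = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [20.0, 20.0]]
result = agglomerate(points, 3)
print(result)
```

np.delete returns a new array on each pass, which replaces the manual "move up" loop; for a matrix of the size mentioned in this message that copy should be small compared with the distance computation.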