Comparing a lot of data efficiently

Mike C. Fletcher mcfletch at home.com
Mon Aug 27 12:07:29 CEST 2001


Um, why not use the (index, index, index) values as _keys_ instead of values
in the dictionary?

That is:
trione = { (i1,i2,i3): triNumber }
tritwo = { (i1,i2,i3): triNumber } # not really necessary, see below

Then, to iterate:

for line in file2:
	key, localName = getIndices( line )
	# key being (i1,i2,i3) for the line
	otherName = trione.get( key )
	if otherName is not None:
		out.write("%i;%i \n" % (otherName, localName,))

So you don't load up all of the second file, just one line at a time.  If
you need to check for 1,2,3 == 2,3,1 then just check the various
combinations by doing gets for the other possible combinations.

HTH,
Mike

-----Original Message-----
From: python-list-admin at python.org
[mailto:python-list-admin at python.org]On Behalf Of Marcus Stojek
Sent: August 27, 2001 05:34
To: python-list at python.org
Subject: Comparing a lot of data efficiently


Hi,
first let me say that I'm new to Python and that I really love it.
Finally I can spend my time solving my problems and not fighting with the
programming language.

Okay, here is what I do. I have two files (ASCII), each of them
containing a huge amount (let's say 100.000) triangles. The format is:

Number of triangle,number of first node, number of second node , number
of third node \n

node are corners of the triangle here. Each node has three coordinates in
space, but we don't need them here.

The two files are describing exactly the same group of triangles, but the
triangles are numbered differently in the two files. So the triangle
containing the nodes (12,49583,331) has the number 5 in one file and the
number 23094 in the other one. As two triangles can share two nodes I
have to compare all three nodes for all triangles.

I have two dictonaries:

trione={number triangel:1.node,2.node,3.node}
(key and values are integer)
tritwo=as trione

def common(list1,list2):
    hit=0
    for i in list1:
        for j in list2:
            if i==j:
                hit +=1
    return hit

for t1 in trione.keys():
    for t2 in tritwo.keys():
        if common(trione[t1],tritwo[t2])>=3:
            out.write("%i;%i \n" % (t1,t2))



Works fine but it takes years. I tried to remove from the dictionaries
the triangles that already have been identified to reduce data.
(using pop or slicing) but this slowed the process even more. I know I
could speed up things by breaking the loops in common() when one node
from list1 is not in list2, but for future use I have to deal with shapes
containing more than three nodes.

Any idea how I could significantly improve perfomance? In "Programming
Python" I learned about tuple stacks, binary search trees and graphs but
I'm not sure whether one of those will help. Do I really have to creep
back to C? If so, can anyone tell me how to get my dictonaries into a C
routine? (I know I have to use SWIG, but a little example would help a
lot.)

Thanks in advance.

Marcus

--
http://mail.python.org/mailman/listinfo/python-list





More information about the Python-list mailing list