# Comparing a lot of data efficiently

Emile van Sebille emile at fenx.com
Mon Aug 27 12:21:31 CEST 2001

```Using the two dicts you already have, you could rip through one, building a
new dict where the data from the first forms the key of the new:

matches = {}
for ky in trione.keys():
matches[trione[ky]] = [ky]

Then add the matching triangle from the second to matches:

for ky in tritwo.keys():
matches[tritwo[ky]] .append(ky)

matches then contains the nodes as the keys with the data per key being the
triangle numbers from trione and tritwo.

This works as long as the data can serve as a key, and the sequence of the
nodes is the same in trione as tritwo.  You'll need to adapt as required.

HTH,

--

Emile van Sebille
emile at fenx.com

---------
"Marcus Stojek" <stojek at part-gmbh.de> wrote in message
news:MPG.15f43279f613af34989682 at news.easynews.net...
> Hi,
> first let me say that I'm new to Python and that I really love it.
> Finally I can spend my time solving my problems and not fighting with the
> programming language.
>
> Okay, here is what I do. I have two files (ASCII), each of them
> containing a huge amount (let's say 100.000) triangles. The format is:
>
> Number of triangle,number of first node, number of second node , number
> of third node \n
>
> node are corners of the triangle here. Each node has three coordinates in
> space, but we don't need them here.
>
> The two files are describing exactly the same group of triangles, but the
> triangles are numbered differently in the two files. So the triangle
> containing the nodes (12,49583,331) has the number 5 in one file and the
> number 23094 in the other one. As two triangles can share two nodes I
> have to compare all three nodes for all triangles.
>
> I have two dictonaries:
>
> trione={number triangel:1.node,2.node,3.node}
> (key and values are integer)
> tritwo=as trione
>
> def common(list1,list2):
>     hit=0
>     for i in list1:
>         for j in list2:
>             if i==j:
>                 hit +=1
>     return hit
>
> for t1 in trione.keys():
>     for t2 in tritwo.keys():
>         if common(trione[t1],tritwo[t2])>=3:
>             out.write("%i;%i \n" % (t1,t2))
>
>
>
> Works fine but it takes years. I tried to remove from the dictionaries
> the triangles that already have been identified to reduce data.
> (using pop or slicing) but this slowed the process even more. I know I
> could speed up things by breaking the loops in common() when one node
> from list1 is not in list2, but for future use I have to deal with shapes
> containing more than three nodes.
>
> Any idea how I could significantly improve perfomance? In "Programming
> Python" I learned about tuple stacks, binary search trees and graphs but
> I'm not sure whether one of those will help. Do I really have to creep
> back to C? If so, can anyone tell me how to get my dictonaries into a C
> routine? (I know I have to use SWIG, but a little example would help a
> lot.)
>