Best match searching

Jeff Epler jepler at unpythonic.net
Mon Dec 29 15:08:12 CET 2003


Searching 600*400 values by simply going in order is going to take
awhile.  It's O(N), linear in the number of values.

If your "array" is relatively unchanging, then you can store it in a
different way so that searching takes much less time.  I think that the
method I'm going to describe is O(log N) complexity.

You want to choose a tree representation for your data with the
following property: at each node of the tree, there are N children, and
each of the N children has approximately 1/N of the leaves below that
node.  Furthermore, there must be a simple test that will let you choose
which of those children may hold the closest node under your distance
measure.

I think that one algorithm you could follow is described at
http://www.cs.sunysb.edu/~algorith/files/kd-trees.shtml

Beware that if you write your k-d trees in Python but compare to a
linear search using numpy that effectively runs in C, the numpy approach
may win on your example.  But if the search space grows larger the
tree approach will eventually win over a brute-force approach.

Jeff





More information about the Python-list mailing list