Data structure recommendation?

Jochen Schulz usenet-nospam at well-adjusted.de
Tue Apr 8 17:02:05 CEST 2008


* Steven Clark:
> Hi all-
> 
> I'm looking for a data structure that is a bit like a dictionary or a
> hash map. In particular, I want a mapping of floats to objects.
> However, I want to map a RANGE of floats to an object.

This solution may be more than you actually need, but I implemented two
metric space indexes which would do what you want (and I wanted to plug
it anyway :)):

http://well-adjusted.de/mspace.py/

You can do arbitrary range searches and nearest-neighbour search in
these indexes provided you have a metric distance function for the
objects you want to index. The distance function in your case would
simply be the absolute difference between your floats.

If every log message was contained in objects of this type:

class LogMessage(object)
    def __init__(self, timestamp, object):
        self.timestamp = timestamp
        self.message = message

you could write a distance function for these objects like this:

def log_distance(log1, log2):
    return abs(log1.timestamp - log2.timestamp)

and then you would create and search an index like this:

from mspace import VPTree
index = VPTree(all_log_messages, log_distance)
index.search(single_msg, 10)

The last line would yield all log messages within ten seconds (or
whatever the unit of your timestamps is) of the given LogMessage
single_msg. You could also use

index.nn_search(single_msg, 3)

To find the three messages closest to the given single_msg.

Searching should be quite fast (O(log(n)), but indexing might take some
time (O(n*log(n))). The problem is that VPTrees have to be constructed
from the complete data set initially. They cannot grow. The alternative
from mspace.py, BKTrees, may grow over time but they don't work with
non-discrete distances. If you want to use them, you'd have to convert
your timestamps to int/long first.

J.
-- 
In the west we kill people like chickens.
[Agree]   [Disagree]
                 <http://www.slowlydownward.com/NODATA/data_enter2.html>



More information about the Python-list mailing list