don't need dictionary's keys - hash table?
Fredrik Lundh
fredrik at pythonware.com
Thu Jul 13 04:54:43 EDT 2006
kdotsky at gmail.com wrote:
>> depending on your application, a bloom filter might be a good enough:
>>
>> http://en.wikipedia.org/wiki/Bloom_filter
>>
>
> Thanks (everyone) for the comments. I like the idea of the bloom
> filter or using an md5 hash, since a rare collision will not be a
> show-stopper in my case.
btw, since my brain is on vacation, could anyone who already has all the necessary
background information in their head (Paul?) tell me if using a chopped-up MD5 or
SHA-1 (or whatever) digest as the hash functions for a bloom filter is a good idea...
i.e. something like:
import sha
try:
all
except NameError:
def all(S):
for x in S:
if not x:
return False
return True
def digest(data):
d = sha.sha(data).digest()
return [d[i:i+4] for i in range(0, len(d), 4)]
class bloom(object):
def __init__(self):
self.filter = [set() for i in range(len(digest("")))]
def add(self, data):
for s, p in zip(self.filter, digest(data)):
s.add(p)
def __contains__(self, data):
return all(p in s for s, p in zip(self.filter, digest(data)))
b = bloom()
b.add("showbiz")
b.add("absolution")
print "hullaballo" in b
print "showbiz" in b
print "origin of symmetry" in b
(and if this isn't a good idea, why it isn't).
</F>
More information about the Python-list
mailing list