External Hashing [was Re: matching strings in a large set of strings]

Tim Chase python.list at tim.thechases.com
Fri Apr 30 15:45:17 EDT 2010

On 04/30/2010 12:51 PM, Helmut Jarausch wrote:
> I think one could apply an external hashing technique which would require only
> very few disk accesses per lookup.
> Unfortunately, I'm not aware of an implementation in Python.
> Does anybody know about a Python implementation of external hashing?

While you don't detail what you're hashing, Stefan Behnel
already suggested (in the parent thread) using one of Python's 
native dbm modules (I just use anydbm and let it choose).  The 
underlying implementations should be fairly efficient (assuming
you don't use the dumbdbm last-resort fallback).  With the anydbm
interface, you can implement dict/set semantics as long as you 
take care that everything is marshalled into and out of strings 
for keys/values.
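
As a rough sketch of the set semantics (not a definitive
implementation): the snippet below uses the Python 3 "dbm" module,
which is what anydbm was renamed to.  The DiskSet class, the demo()
function and the "strings.db" filename are made up for illustration
and are not part of any library; keys are encoded to bytes on the
way in, and the stored value is just a placeholder.

```python
import dbm
import os
import tempfile


class DiskSet:
    """Hypothetical set-like wrapper around a dbm file (illustration only)."""

    def __init__(self, path):
        # "c" opens the database read-write, creating it if it is missing.
        self._db = dbm.open(path, "c")

    def add(self, item):
        # dbm stores bytes, so marshal the string key; the value is a dummy.
        self._db[item.encode("utf-8")] = b"1"

    def __contains__(self, item):
        # Membership test is a keyed lookup in the on-disk database.
        return item.encode("utf-8") in self._db

    def close(self):
        self._db.close()


def demo():
    # Build a small on-disk set in a throwaway directory and probe it.
    with tempfile.TemporaryDirectory() as tmp:
        strings = DiskSet(os.path.join(tmp, "strings.db"))
        for word in ("spam", "eggs", "ham"):
            strings.add(word)
        found = ("eggs" in strings, "bacon" in strings)
        strings.close()
        return found
```

Calling demo() returns a pair of booleans for the two membership
tests.  For dict semantics you would store a real value instead of
the dummy, serialized to a string or bytes by whatever means suits
your data.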

