Databases and python

Fri Feb 17 06:18:02 EST 2006

Dan Stromberg wrote:
>  Bryan Olson wrote:
[...]
>> Well, you could use simple files instead of fancy database tables.
> 
> That's an interesting thought.  Perhaps especially if australopithecine
> were saved in a filename like:
> 
> ~/indices/au/st/ra/lo/pi/th/ec/in/e 

Right, though the better filesystems can efficiently support large
numbers of files per directory.

>>Below is a demo of an alternate technique that uses bsddb B-Trees,
>>and puts both the word and the file-id in the key. I don't know
>>how efficient it is for real data, but at least the time won't grow
>>as Theta(n**2).
> 
> 
> Perhaps I'm missing something, but is it not roughly O(1) for
> individual insertions, but O(n*m) (n == number of files, m == number of
> words) for lookups?

Insertion into a B-Tree with n elements is Theta(lg(n)). Finding the
files associated with a word, call the number of them 'k', should
be Theta(lg(n) + k). That assumes words and files are roughly
constant size.

-- 
--Bryan