dict vs kjBuckets vs ???

MK mark_removethis_ at _removethis_btweng.krakow.pl
Fri Jun 11 10:59:46 CEST 1999


On Thu, 10 Jun 1999 22:23:15 -0400, "Tim Peters"
<tim_one at email.msn.com> wrote:
>[MK]
>> In some book on algorithms I've read that after inserting a limited
>> number of items, the performance of operations on hash tables
>> drops dramatically.

>Depends on the details.  What you read is true of some kinds of hash tables.
>Python's dicts dynamically expand to keep the "load factor" under 2/3, so
>what you read isn't applicable to Python in normal use.

Great.
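
As an illustration of the resizing idea described above, here is a toy
hash table that doubles its slot array whenever an insert would leave it
two-thirds full or more. It is only a sketch: the class name ToyTable, the
starting size of 8, and the linear probing are assumptions made for the
example, and CPython's real dict uses a different probing scheme.

```python
class ToyTable:
    """Toy open-addressing hash table that keeps its load factor under 2/3."""

    def __init__(self):
        self.slots = [None] * 8   # assumed starting size; always a power of two
        self.used = 0             # number of occupied slots

    def _find(self, slots, key):
        # Linear probing: walk forward until we hit the key or an empty slot.
        mask = len(slots) - 1
        i = hash(key) & mask
        while slots[i] is not None and slots[i][0] != key:
            i = (i + 1) & mask
        return i

    def __setitem__(self, key, value):
        i = self._find(self.slots, key)
        if self.slots[i] is None:
            self.used += 1
        self.slots[i] = (key, value)
        # Grow as soon as the table is two-thirds full or more.
        if self.used * 3 >= len(self.slots) * 2:
            self._grow()

    def __getitem__(self, key):
        entry = self.slots[self._find(self.slots, key)]
        if entry is None:
            raise KeyError(key)
        return entry[1]

    def _grow(self):
        # Double the slot array and re-insert every existing entry.
        new_slots = [None] * (len(self.slots) * 2)
        for entry in self.slots:
            if entry is not None:
                new_slots[self._find(new_slots, entry[0])] = entry
        self.slots = new_slots

    def load_factor(self):
        return self.used / len(self.slots)


table = ToyTable()
for n in range(1000):
    table[n] = n * n
print(table.load_factor() < 2 / 3)
```

Because the table grows before it fills up, probe sequences stay short no
matter how many items go in, which is why the slowdown described in the
book does not show up here.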

>> I plan to write a program that would store lots (in the range of 10M or even
>> more) of relatively small objects (a few hundred bytes at most), so what
>> do you think I should use?

>Let's do a little math <wink>:  10M * 100 = ?, a lower bound on what you're
>contemplating.  Do you have gigabytes of RAM?

I'm opening a boutique.
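
For the record, the math above works out as follows. This is a rough
sketch that counts only the raw item data; the 300-byte figure is an
assumed stand-in for "a few hundred bytes", and the helper names are made
up for the example. Per-object and container overhead would come on top.

```python
def raw_payload_bytes(n_items, bytes_per_item):
    """Bytes needed for the item data alone, ignoring per-object overhead."""
    return n_items * bytes_per_item


def as_gib(n_bytes):
    """Convert a byte count to gibibytes (2**30 bytes)."""
    return n_bytes / 2**30


n = 10_000_000
for size in (100, 300):  # 100 is the lower bound from the thread; 300 is assumed
    total = raw_payload_bytes(n, size)
    print(f"{n} items x {size} bytes = {total} bytes (~{as_gib(total):.2f} GiB)")
```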

>> I thought about dictionaries, kjBuckets, or maybe even a library called
>> Metakit for Python (http://www.equi4.com/metakit/info/README-Python.html).

>> what-do-you-think-ly y'rs

>You don't really want to know <wink>.  Memory-based data structures aren't
>going to work for the size of thing you have in mind.  If you can make it
>fly at all, you'll likely require a powerful database, so of those choices
>Metakit is the only approach that's not dead on arrival.

Some additional information: the items stored would be natural-language
text fragments (several sentences at most, typically several words)
plus binary descriptions, and the primary operation would be lots of
searching. Is there anything else that would be better for this kind of
program? An object database?
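
To make the disk-based option concrete, here is a minimal sketch of
storing text fragments with binary descriptions and searching them. It
uses Python's standard sqlite3 module purely as a stand-in for "a
database"; SQLite is not one of the options named in this thread, and the
table, column, and function names are assumptions for the example.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the fragment store; pass a file path to keep the data on disk."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fragments ("
        "id INTEGER PRIMARY KEY, text TEXT NOT NULL, descr BLOB)"
    )
    return conn


def add_fragment(conn, text, descr):
    """Store one text fragment with its binary description; return its id."""
    cur = conn.execute(
        "INSERT INTO fragments (text, descr) VALUES (?, ?)", (text, descr)
    )
    return cur.lastrowid


def search(conn, word):
    """Return (id, text) for every fragment whose text contains the word."""
    rows = conn.execute(
        "SELECT id, text FROM fragments WHERE text LIKE ? ORDER BY id",
        (f"%{word}%",),
    )
    return rows.fetchall()


conn = open_store()
add_fragment(conn, "the quick brown fox", b"\x01")
add_fragment(conn, "a lazy dog", b"\x02")
print(search(conn, "fox"))
```

A substring match like this scans every row, so at 10M fragments it would
probably be too slow for "lots of searching"; some kind of word or
full-text index would be the likely next step.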

>better-still-write-it-in-perl<wink>-ly y'rs  - tim

it's-a-stiff-ly yours




--------------------------------------------------
Reality is something that does not disappear after
you cease believing in it - VALIS, Philip K. Dick
--------------------------------------------------

Delete _removethis_ from address to email me



