dict vs kjBuckets vs ???

Tim Peters tim_one at email.msn.com
Tue Jun 8 22:27:39 EDT 1999


[Aahz Maruch]
> I'm currently using a dict for storing references to objects (call them
> customer objects, referenced by customer ID (a number)).  I'm using a
> dict (instead of SQL or something similar) because we're willing to
> trade lots of memory to get speed.  However, I've heard that after about
> 10K items in a dict, it starts having problems.

11,523 to be exact.  After that, dicts drink to excess and show up for work
late the morning after.  We don't like to talk about it, though.

> Is this a specious rumor,

It wasn't before you spread it <wink>.  Seriously, haven't heard a peep
about this before, and routinely use dicts with 40K objects -- performance
stays O(1) all the way.  It's certainly possible -- but unlikely -- that you
assign "customer IDs" in a pattern that interacts badly with the hash
function, leading to lots of collisions.
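If you want to check for that, here's a rough sketch of a collision
sniffer.  The names are mine, and the table size is an assumption
(CPython dicts use power-of-2 tables, so a key's initial slot is
hash & (size - 1)):

    # Count how many IDs land in each initial probe slot for a
    # hypothetical power-of-2 table size.  "customer_ids" is a
    # placeholder -- feed it your real IDs.
    def slot_spread(customer_ids, table_size=32768):
        counts = {}
        for cid in customer_ids:
            slot = hash(cid) & (table_size - 1)   # initial slot
            counts[slot] = counts.get(slot, 0) + 1
        return max(counts.values())

If the result is close to len(customer_ids) / table_size, the hash is
spreading your keys fine; if one slot soaks up a big chunk of them,
you've found your bad pattern.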

> should I switch to kjBuckets, or should I do something else?

The latter:  investigate!  Create a reproducible bad case and we'll fix it,
or retract your slander <wink>.
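Something along these lines would do as a reproducible case -- a
minimal sketch, with arbitrary sizes and probe count; the per-lookup
time should stay flat as n grows:

    import random
    import time

    def time_lookups(n, probes=100000):
        # Build a dict with n integer keys, then time random lookups.
        d = {}
        for i in range(n):
            d[i] = i
        keys = [random.randrange(n) for i in range(probes)]
        t0 = time.time()
        for k in keys:
            d[k]                      # lookup only; result discarded
        return (time.time() - t0) / probes

    for n in (10000, 100000, 1000000):
        print("%7d keys: %g sec/lookup" % (n, time_lookups(n)))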

> Does the answer change if we have 100K objects?

No.

>  A million?

No.

It all depends on how much memory you have and how well the hash function is
working.  Note that dicts work hard to "randomize" the addresses at which
they store keys, so memory locality can be terrible.  But if there's no
pattern to customer ID accesses, locality would be terrible in a 1M array
too.

> (The only operations are object insert, update, delete, and lookup.)

Kinda figured that, else you wouldn't be using a dict <wink>.
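For the record, each of those is a one-liner on a dict (the key and
values here are made up):

    customers = {}
    customers[1001] = "first version"    # insert
    customers[1001] = "second version"   # update: same key, new value
    obj = customers.get(1001)            # lookup; None if key is absent
    del customers[1001]                  # delete; KeyError if absent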

dicts-are-great-use-more-of-them-ly y'rs  - tim