[Python-Dev] Dictionary sparseness

Tim Peters tim.one@comcast.net
Mon, 05 May 2003 13:40:18 -0400


[Jeremy Hylton]
> Have you seen the work on gray-box systems?
>
> http://www.cs.wisc.edu/graybox/
>
> The philosophy of this project seems to be "You can observe an awful lot
> just by watching."  (Apologies to Yogi.)  The approach is to learn how a
> particular service is implemented, e.g. what buffer-replacement
> algorithm is used, by observing its behavior.  Then write an application
> that exploits that knowledge to drive the system into optimized behavior
> for  the application.  No madvise() necessary.
>
> I wonder if the same can be done for dicts?  My first guess would be no,
> because the sparseness is a fixed policy.

Well, a dict suffers damaging collisions or it doesn't.  If it does, the
best thing a user can do is rebuild the dict from scratch, inserting keys in
decreasing order of access frequency.  Then the most frequently accessed
keys come earliest in their collision chains.  Collisions simply don't
matter for rarely referenced keys.  (And, for example, if there *are* any
truly damaging collisions in __builtin__.__dict__, I expect this gimmick
would remove the damage.)
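
For instance, a minimal sketch of that rebuild, assuming you've already
gathered per-key access counts somehow (the access_counts mapping below is
hypothetical):

    def rebuild_by_frequency(d, access_counts):
        # access_counts is a made-up {key: hit_count} mapping you'd have
        # to gather yourself, e.g. by instrumenting your hot lookups.
        new = {}
        for k in sorted(d, key=lambda k: access_counts.get(k, 0),
                        reverse=True):
            new[k] = d[k]   # hottest keys go in first, so they land
                            # earliest in any collision chain they hit
        return new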

The size of the dict can be forced larger by inserting artificial keys, if a
user is insane <wink>.  It's always been possible to eliminate dummy entries
by doing "dict = dict.copy()".
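
Roughly, the two tricks look like this; the pad keys are invented, and
whether the forced sparseness survives later operations is an
implementation detail you'd have to check:

    # Trick 1: force the table to grow by stuffing in throwaway keys,
    # then delete them; CPython doesn't shrink the table on deletion.
    def pad_dict(d, n=1000):
        for i in range(n):
            d[("__pad__", i)] = None
        for i in range(n):
            del d[("__pad__", i)]

    # Trick 2: deletions leave dummy slots in the probe chains; copying
    # rebuilds the table from scratch with no dummies.
    def drop_dummies(d):
        return d.copy()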

Note that because Python exposes the hash function used by dicts, you can
write a faithful low-level dict emulator in Python, and deduce what effects
a sequence of dict inserts and deletes will have.
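
A bare-bones sketch of such an emulator, assuming the scheme described in
the dictobject.c comments (8-slot initial table, resize at two-thirds full,
perturb-based probing); treat the details as approximations to verify, not
gospel:

    class DictModel:
        # Deletion/dummy handling is omitted for brevity, and the
        # growth factor below is a simplification.
        FREE = object()

        def __init__(self):
            self.size = 8
            self.slots = [self.FREE] * self.size
            self.fill = 0

        def _find(self, key):
            # Return (slot index, collisions hit along the way), using
            # the probe recurrence i = 5*i + 1 + perturb; perturb >>= 5.
            mask = self.size - 1
            h = hash(key) & 0xFFFFFFFF      # treat the hash as unsigned
            perturb = h
            i = h & mask
            collisions = 0
            while True:
                if self.slots[i] is self.FREE or self.slots[i] == key:
                    return i, collisions
                collisions += 1
                i = (5 * i + 1 + perturb) & mask
                perturb >>= 5

        def insert(self, key):
            i, collisions = self._find(key)
            if self.slots[i] is self.FREE:
                self.slots[i] = key
                self.fill += 1
                if self.fill * 3 >= self.size * 2:
                    self._resize()
            return collisions

        def _resize(self):
            old = [k for k in self.slots if k is not self.FREE]
            self.size *= 4
            self.slots = [self.FREE] * self.size
            self.fill = 0
            for k in old:
                self.insert(k)

Feed it the same key sequence your program builds, and the collision counts
and resize points tell you roughly what the real dict will do.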

So, overall, I expect there's more you *could* do yourself to speed dict
access (in the handful of bad cases where it's not already good enough) than
Python could do for you.  You'd have to be nuts, though -- or writing papers
on gray-box systems.