[Python-3000] Two proposals for a new list-like type: one modest, one radical

Daniel Stutzbach daniel at stutzbachenterprises.com
Tue Apr 24 00:54:00 CEST 2007


On 4/23/07, Mike Klaas <mike.klaas at gmail.com> wrote:
> The init-from-iterable case is especially interesting: a 20% speed increase
> for list comps is not to be scoffed at.

That one is because list()'s tp_new method is PyType_GenericNew
instead of a wrapper around the more efficient PyList_New.  List
comprehensions and the [] syntax probably already call PyList_New, but
I haven't checked.
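For anyone who wants to probe this from Python, here is a small timeit
sketch comparing the three construction paths mentioned above.  The exact
ratios depend on the interpreter version (the 20% figure was measured
against the patched tree discussed in this thread), so treat this as a
probe, not a benchmark:

```python
import timeit

# Three ways of materializing the same sequence of 1000 ints.
# list(data) goes through the type's constructor machinery;
# the comprehension and the slice build the list more directly.
setup = "data = list(range(1000))"
t_call = timeit.timeit("list(data)", setup=setup, number=10000)
t_comp = timeit.timeit("[x for x in data]", setup=setup, number=10000)
t_slice = timeit.timeit("data[:]", setup=setup, number=10000)
print(t_call, t_comp, t_slice)
```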

> Have you analyzed the memory consumption of the two data structures?

I haven't done any empirical studies, but I can give you a pretty good picture.

For small lists, a BList uses a fixed amount of memory rather than the
"just large enough" allocation that Python lists aim for.  So, for
very small lists, BLists are currently rather wasteful.  The BList
code could be changed to do the "just large enough" thing for small
lists as well, at the expense of a bit more code complexity.

For large lists, the *worst* case for BLists is around twice the
memory requirements of a regular Python list (each BList node is
required to be at least half-full).  I know that sounds horrible, but
hear me out. :-)  The *best* case for BLists is when each node is
completely full, in which case they use negligibly more memory than a
regular Python list.

If a user creates a BList from whole cloth (e.g., BList(iterable)),
they will get a best-case BList.  This is probably the common case for
users who don't plan to modify their list much.

If a user is doing lots of inserts and deletes, the BList's nodes will
typically be three-quarters full, translating into 33% extra memory
over a regular Python list.  However (and this is a big however!),
these users will greatly appreciate the BList's additional speed for
those operations.
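The three cases above all follow from one piece of arithmetic: with
every node an f-fraction full, storing n element pointers takes
roughly n / f pointer slots, i.e. an overhead factor of 1/f over a
perfectly packed array (leaf nodes dominate; internal nodes add a
small extra term ignored in this sketch):

```python
# Toy model of BList pointer overhead as a function of node fill.
def overhead_factor(fill):
    # fill = fraction of each node's slots that hold elements
    return 1.0 / fill

for fill in (0.5, 0.75, 1.0):
    print(fill, overhead_factor(fill))
# 0.5  -> 2.0x  (worst case: nodes exactly half-full)
# 0.75 -> 1.33x (typical after many inserts/deletes: ~33% extra)
# 1.0  -> 1.0x  (best case: nodes completely full)
```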

I hasten to add that these memory estimates are only about the space
required to store the list structure as well as pointers to the
objects in the list.  The objects themselves (e.g., integers, class
instances, whatever) will take up substantially more memory in any
case.  For example, for the regular 32-bit integer type, a single
object takes up 12 bytes.  A regular python list will spend an extra 4
bytes per integer (16 bytes total).  A worst-case BList spends an
extra 8 bytes per integer (20 bytes total).  So, worst case is only a
25% increase.
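Plugging the 32-bit figures from the paragraph above into the model:

```python
# Per-element cost on a 32-bit build, using the numbers in the text:
# a small int object is 12 bytes; each list slot adds one 4-byte
# pointer; a worst-case (half-full) BList node averages two pointer
# slots, i.e. 8 bytes, per element.
obj = 12
regular = obj + 4        # 16 bytes per integer in a plain list
blist_worst = obj + 8    # 20 bytes per integer, worst-case BList
print(blist_worst / regular)  # 1.25 -> a 25% increase overall
```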

If the user makes use of operations that leverage the copy-on-write
feature, then BLists are *much* more memory efficient than regular
lists.  __getslice__ is probably the most commonly used function in
this category.  A slice of a large BList uses only O(log n) memory in
the worst case.

-- 
Daniel Stutzbach, Ph.D.             President, Stutzbach Enterprises LLC

