[Python-Dev] deque alternative (was: Linked lists)

Tim Peters tim.peters at gmail.com
Mon Dec 26 18:56:27 CET 2005


[restoring context and attributions lost in the original]

[Tim Peters]
>>>> Like Phillip Eby, I use 2-tuples for this when I feel the need
>>>> (usually during a backtracking graph search, to keep track of paths
>>>> back to the root in a space-efficient way), and am happy with that.

[Josiah Carlson]
>>> Then there's the whole Python list with append() and pop().  It
>>> takes a method resolution and function call, but at least in
>>> Python 2.3, it is a hair faster...

[Martin Blais]
>> Josiah, that's beside the point, because it is not space-efficient,
>> the list is actually an dynamic array/vector, and I would expect an
>> efficient array implementation to grow exponentially.  One of the
>> reasons we're talking about lists is to avoid exactly this problem...

[James Y Knight]
> Okay, now *that* reason is a pretty crazy one. Consider what you're
> saying here.

I'm sure he did ;-)  Consider a very simple graph, a skinny tree
rooted at `A` that branches once "at the end", represented by a dict
of adjacency lists:

kids[A] = [B]
kids[B] = [C]
kids[C] = [D]
...
kids[W] = [X]
kids[X] = [Y, Z]

A natural representation of the path from B back to the root is:

    Bpath = B, (A, None)

and from C back to the root:

    Cpath = C, Bpath

and from D back to the root:

    Dpath = D, Cpath

...

and from X back to the root:

    Xpath = X, Wpath

A "flat" list or tuple would certainly be more space-efficient up to
this point.  But when the graph branches, the 2-tuple representation
allows *sharing* the common path suffix (which may be very long!):

    Ypath = Y, Xpath
and
    Zpath = Z, Xpath

If there's an N-way branch, using 2-tuples allows recording the N new
paths back to the root with incremental memory burden N *
some_constant.  You can't do this kind of sharing with a flat list or
tuple, and the incremental memory burden at an N-way branch zooms to
(N + some_other_constant) * (the amount of memory needed to record the
path from the branch point back to the root), due to duplicating the
back-to-the-root info N times.   The potential memory saving from
using 2-tuples instead is unbounded.

...

> Plus, in python, the overhead per object in the pyobject memory allocation
> system, which I don't know how to quantify.

For objects requiring a number of bytes divisible by 8, a few percent
at worst.  The per-object space overhead in obmalloc consists entirely
of internal padding needed to reach a multiple of 8 bytes per
allocation unit, and that's 0 when the # of bytes needed is divisible
by 8.  The only obmalloc space overheads then are due to per-pool and
per-arena bookkeeping space, and those are small fractions of total
pool and arena sizes.

...

> So, the list will generally be 1/8th of its size overallocated, or
> 112 elements plus 9 words overhead is 121 words. Better than any cons-
> linked list could be, and *insanely better* than a cons-linked list
> would be in python.

You seem to be ignoring possiblities for sharing across lists, and
such sharing is natural in many graph algorithms.


More information about the Python-Dev mailing list