[Python-3000] BList PEP
Josiah Carlson
jcarlson at uci.edu
Tue May 1 09:53:51 CEST 2007
"Daniel Stutzbach" <daniel at stutzbachenterprises.com> wrote:
> Title: BList: A faster list-like type
> 1. Add it to the collections module, or
+1
> 2. Replace the existing list type
-1
For types that are used every day, I can't help but prefer a simple
implementation. Among the features of the current Python list
implementation is that small lists (0, 4, 8, 16 elements) use very
little space. Your current BList implementation uses a fixed size for
the smallest sequence, 128, which would offer worse memory performance
for applications where many small lists are common.
> ========= ================ ====================
> Operation Array-based list BList
> ========= ================ ====================
> Copy O(n) **O(1)**
> Append **O(1)** O(log n)
> Insert O(n) **O(log n)**
> Get Item **O(1)** O(log n)
> Set Item **O(1)** **O(log n)**
what's going on with this pair? ^^ ^^
> Del Item O(n) **O(log n)**
> Iteration O(n) O(n)
> Get Slice O(k) **O(log n)**
> Del Slice O(n) **O(log n)**
> Set Slice O(n+k) **O(log k + log n)**
> Extend O(k) **O(log k + log n)**
> Sort O(n log n) O(n log n)
> Multiply O(nk) **O(log k)**
> ========= ================ ====================
> The performance for the LIFO use case could be improved to O(n) time,
You probably want to mention "over n appends/pop(-1)s". You also may
want to update the above chart to take into consideration that you plan
on doing that modification.
Generally, the BList is as fast or faster asymptotically than a list for
everything except random getitem/setitem; at which point it is O(logn)
rather than O(1). You may want to explicitly state this in some later
version.
> Implementation
> ==============
>
> The BList is based on the B+Tree data structure. The BList is a wide,
> bushy tree where each node contains an array of up to 128 pointers to
> its children. If the node is a leaf, its children are the
> user-visible objects that the user has placed in the list. If node is
> not a leaf, its children are other BList nodes that are not
> user-visible. If the list contains only a few elements, they will all
> be a children of single node that is both the root and a leaf. Since
> a node is little more than array of pointers, small lists operate in
> effectively the same way as an array-based data type and share the
> same good performance characteristics.
In the collections module, there exists a deque type. This deque type
more or less uses a sequence of 64 pointers, the first two of which are
linked list pointers to the previous and next block of pointers. I
don't know how much tuning was done to choose this value of 64, but you
may consider reducing the number of pointers to 64 for the the same
cache/allocation behavior.
- Josiah
More information about the Python-3000
mailing list