[Python-3000] Two proposals for a new list-like type: one modest, one radical

Mon Apr 23 18:58:25 CEST 2007

I like the implementation of the BList, however I'm not so sure whether
replacing a list with BList is a good idea since there are many instances
where python scripts will iterate through endless lists of items. In such
case time complexity (to iterate a whole list) as you have mentioned is O
(nlog(n)) instead of O(n). You will probably find that such cases (iterating
through long lists) are very common and at the same time the developer still
wants his lists to be mutable. If a developer wants to optimise his lists
for insertion and deletion speed he can then use an optimised implementation
like the one you're proposing anyway.

What's the performance of concatenation in these BLists? O(n) ?

Are you suggesting BList as part of the standard library?

-Neville

On 4/23/07, Daniel Stutzbach <daniel at stutzbachenterprises.com> wrote:
>
> I have two proposals for Python 3000 that I'll write PEP(s) for if
> there's interest.  The first proposal is, I think, a modest one.  The
> second proposal is related, but more radical.  I fully expect the
> second proposal to be rejected for that alone, especially since I am a
> relatively an outsider to the Python developer community (though it
> was great to meet some of you at PyCon this year).  I bring up the
> more radical proposal primarily for completeness.
>
> The Modest Proposal
> -------------------------------
>
> As a pet project, I've written a container type that looks, acts, and
> quacks like a Python list(), but has better asymptotic performance.
> Specifically, O(log n) time for inserts and deletes in the middle of
> the list.  I'd like to offer it for inclusion in the collections
> module.
>
> Probably, you have encountered some other proposals for types that
> meet this description.  Here are some of the unique features of mine,
> called a BList, which is based on B+Trees.  I'm open to suggestions
> for a better name.
>
> - just as fast as a Python list() when the list is small
> - getslice runs in O(log n) time
> - making a shallow copy runs in O(1) time
> - setslice runs in O(log n + log k) time if the inserted slice is a
> BList of length k
> - multipling a BList by k takes O(log k) time and O(log k) memory
>
> Example:
>
> >>> from blist import *
> >>> x = BList([0])
> >>> x *= 2**29
> >>> x.append(5)
> >>> y = x[4:-234234]
> >>> del x[3:1024]
>
> None of the above operations have a noticeable delay, even though the
> lists have over 500 million elements due to line 3.
>
> The BList has two key features that allow it to pull this off this
> performance:
>
> 1) The BList type features transparent copy-on-write.  If a non-root
> node needs to be copied (as part of a getslice, copy, setslice, etc.),
> the node is shared between multiple parents instead of being copied.
> If it needs to be modified later, it will be copied at that time.
> This is completely behind-the-scenes; from the user's point of view,
> the BList works just like a regular Python list.
>
> 2) A B+Tree is a wide, squat tree.  The maximum number of children per
> node is defined by a compile-time parameter (LIMIT).  If the list is
> shorter than LIMIT, then there is only one node, which simply contains
> an array of the objects in the list.  In other words, for short lists,
> a BList works just like Python's array-based list() type.  Thus, it
> has the same good performance on small lists.
>
> So you can see the performance of the BList in more detail, I've made
> several performance graphs available at the following link:
>    http://stutzbachenterprises.com/blist/
>
> Each webpage shows two graphs:
> - The raw execution time for the operation as a function of the list size
> - The execution time for the operation, normalized by the execution
> time of Python's list()
>
> These graphs were made using LIMIT=128.
>
> The only operation where list() is much faster for small lists is
> getitem through the x[5] syntax.  There's a special case for that
> operation deep inside the Python bytecode interpreter.  My extension
> module can't compete with that. :-)  BList is just as fast as list()
> for small lists via .__getitem__().
>
> The source code for the type is available for compilation as an external
> module:
>     http://stutzbachenterprises.com/blist/blist.tar.gz
>
> I wrote it against Python 2.5.
>
> The Radical Proposal
> -------------------------------
>
> Replace list() with the BList.
>
> There are only a few use-cases (that I can think of) where Python's
> list() regularly outperforms the BList.  These are:
>
> 1. A large LIFO stack, where there are many .append() and .pop(-1)
> operations.  These are O(1) for a Python list, but O(log n) for the
> BList().  (BList's are just as fast for small lists)
>
> 2. An large, unchanging list, where there are many getitem() calls,
> and none of the inserts or deletes where the BList() shines.  Again,
> these are O(1) for a Python list, but O(log n) for the BList(), and
> the BList() is just as fast if the list is small.
>
> Use case #1 is covered by collections.deque.  Use case #2 is covered by
> tuples.
>
> When I first started learning Python, I often ended up inadvertently
> writing O(n**2) algorithms by putting a O(n) list operation inside a
> loop (which is a problem when n=100,000).  Since nothing in the
> signature of list() informs the user when an operation is going to be
> slow, it took me a while to learn how to write my code to work around
> this constraint.  In fact, I initially assumed "list" meant "linked
> list" so most of my assumptions about which operations would be fast
> vs. slow were backwards.  I think it would be more Pythonic if the
> user can trust that the implementation will always be fast (at least
> O(log n)).
>
> I recognize that the Python developer community is understandably very
> attached to the current array-based list, so I expect this to get shot
> down.  I hope this doesn't reflect badly on the more modest proposal
> of including a new type in the collections module.  Also, please don't
> kill the messenger. :-)
>
> --
> Daniel Stutzbach, Ph.D.             President, Stutzbach Enterprises LLC
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe:
> http://mail.python.org/mailman/options/python-3000/nevillegrech%40gmail.com
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/python-3000/attachments/20070423/f0bd8136/attachment.html